Data Visualization and Analysis of COVID-19

Keshab Acharya

1. Introduction

The novel coronavirus has disrupted our social and academic lives. The disease it causes, commonly referred to as COVID-19, has taken the lives of many people. As cases rose, the United States was affected more than any other country in the world. Cases started rising around late January and grew rapidly over the following months, a sudden increase that surprised everyone. A close look at the impact of COVID-19 across countries shows that the United States was hit the hardest.

This project is a simple analysis of COVID-19 cases throughout the world. We will start by visualizing data for the United States alone, and then explore further using additional datasets. The project is divided into multiple parts, each with its own purpose. It consists largely of visualization and analysis, since plotting the number of cases and deaths from the datasets is the most direct way to see the impact of COVID-19 around the world. Therefore, the majority of this project consists of plots.

In addition to visualization, we will try to analyze the data with predictions. One of the fundamentals of data science is predicting values from models. We will fit a model to our data and try to predict cases for the next couple of days. We will also build some visualizations using maps, which give interesting results and a different way to interpret our data.

2. Preparing the Data for the U.S.

First, we will use a dataset from Kaggle that lists counties with their respective state, the count of COVID-19 cases, and the count of deaths. We can delete all entries with null values, since they won't contribute to the plots. The dataset is updated every day, because we are still in a pandemic and cases are still rising. Therefore, the analysis might not match the latest case counts, but it should be accurate as of this writing.
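As a minimal sketch of the null-removal step, the snippet below loads county-level rows with pandas and drops every row that has a missing value. The column names (date, county, state, fips, cases, deaths), the sample rows, and the helper name load_and_clean are assumptions made for illustration; the actual Kaggle file may use a different layout.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the Kaggle file.
# The column names are assumed, not taken from the dataset itself.
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2020-03-01,King,Washington,53033,10,1
2020-03-01,Unknown,Rhode Island,,2,0
2020-03-02,King,Washington,53033,14,
2020-03-02,Cook,Illinois,17031,4,0
"""


def load_and_clean(source):
    """Read county-level data and drop every row containing a null value."""
    df = pd.read_csv(source, parse_dates=["date"])
    # dropna() removes any row with at least one missing field.
    return df.dropna().reset_index(drop=True)


counties = load_and_clean(io.StringIO(SAMPLE_CSV))
print(counties)
```

With a downloaded copy of the dataset, the same helper could be called with the path to the CSV file instead of the in-memory sample.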

The dataset for the counties can be accessed on Kaggle as the US Counties COVID-19 Dataset.

We will start by importing some basic libraries that we need to begin visualizing the data. We will import more libraries later as the analysis grows. To install a library, we can simply run pip install [library name].