Submitted by
Yugan S
19BCE1072
in
School of Computer Science and Engineering
DECLARATION
Place:
Chennai
Date:
11/12/2021
CERTIFICATE
Guide/Supervisor HoD
(Seal of SCSE)
Acknowledgement
I am obliged to express my appreciation to a number of people without whom I
could not have completed this thesis successfully. I would like to place on record
my deep sense of gratitude and thanks to my internal guide, Prof. Dr. Arunkumar
Sivaraman, School of Computer Science and Engineering (SCOPE), Vellore
Institute of Technology, Chennai, whose esteemed support and immense
guidance encouraged me to complete the project successfully. I would like to
thank our HoD, Dr. Nithiyanandam P, School of Computer Science and
Engineering (SCOPE), and Project Co-ordinator, Dr. Arunkumar Sivaraman,
Vellore Institute of Technology, Chennai, for their valuable support and
encouragement to take up and complete this thesis. Special mention to our Dean,
Dr. Ganesan R, School of Computer Science and Engineering (SCOPE), Vellore
Institute of Technology, Chennai, for motivating us in every aspect of software
engineering. I thank the management of Vellore Institute of Technology,
Chennai, for permitting me to use the library and laboratory resources. I also
thank all the faculty members for giving me the courage and the strength that I
needed to achieve my goal. This acknowledgement would be incomplete
without expressing my wholehearted thanks to my family and friends, who
motivated me during the course of my work.
Yugan S
Reg. No. 19BCE1072
Abstract
This paper focuses on the incidence of COVID-19 in India and analyses the
most affected states in the country by population. Data analytics can help in
understanding different aspects of the pandemic, such as the cumulative number
of confirmed cases, deaths, and recoveries in different states. We use libraries
such as Scrapy, Pandas, NumPy, Matplotlib, and Plotly for dataset extraction,
scientific calculations, and visualizations. We also use Tableau, a powerful
visualization tool that allows us to plot geographic data and create dashboards.
Contents
Declaration
Certificate
Acknowledgement
Abstract
1 Introduction
1.1 Background
1.2 Statement
1.3 Motivation
1.4 Challenges
2 Planning & Requirements Specification
2.1 Literature Review
2.3 Requirements
2.4 System Requirements
2.3.1 Hardware Requirements
2.3.2 Software Requirements
3 System Design
Figure 1: System design
Figure 3: Pie chart
Figure 4: Dumbbell chart
Introduction
1.1 Background
Coronaviruses are a large family of viruses that are known to cause illness ranging from the
common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS)
and Severe Acute Respiratory Syndrome (SARS).
1.2 Statement
The COVID-19 outbreak was first reported in Wuhan, China, and has spread to more than 50 countries.
The first case in India was reported on 27th January 2020 in Kerala. WHO declared COVID-19
a Public Health Emergency of International Concern (PHEIC) on 30th January 2020. Naturally,
an emerging infectious disease spreads fast, endangers the health of large numbers of
people, and thus requires immediate action to prevent the disease at the community level.
CoronaTracker was therefore created as an online platform that provides the latest reliable news,
as well as statistics and analysis on COVID-19. This paper aims to forecast COVID-19 cases,
deaths, and recoveries through time series analysis, and to analyse whether the lockdown period
in India was effective. The model helps to interpret patterns of public sentiment on the
dissemination of related health information, and to assess the political and economic influence
of the spread of the virus.
1.3 Motivation
Data analytics can be used to understand COVID-19 trends, so that decisions and precautions
can be taken according to the observed results. States are classified based on their number of
active cases, recovered cases, deaths, and other attributes. We use time series analysis
on the data to predict future cases from the previously observed cases across all states in
India.
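The report does not state which time series model was used. As a minimal sketch, assuming Holt's linear-trend exponential smoothing as a stand-in, the function below projects a cumulative case series forward. The function name, the smoothing parameters `alpha` and `beta`, and the sample counts are all illustrative assumptions, not values from the dataset.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` future points with Holt's linear-trend smoothing.

    Assumed model: the report does not name its forecasting method.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the first step.
    level = float(series[0])
    trend = float(series[1] - series[0])
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the newly observed slope with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical cumulative confirmed counts for one state (not real data).
cumulative_cases = [100, 120, 140, 160, 180, 200]
forecast = holt_forecast(cumulative_cases, horizon=3)
print(forecast)
```

For a perfectly linear series like the sample above, the forecast simply continues the line; real case curves would need the smoothing parameters tuned against held-out data.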
1.4 Challenges
The retrieved dataset contained some misleading Indian state names and duplicated state names.
A large amount of data was collected, so its legitimacy could not be fully verified, and processing
it was time-consuming because all COVID-19 counts were given as cumulative numbers.
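The cleaning steps implied above can be sketched with Pandas as follows. This is an illustrative sketch only: the column names (`date`, `state`, `confirmed`), the misspelt state labels, and the counts are hypothetical and not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical raw rows; column names and spellings are assumptions.
raw = pd.DataFrame({
    "date": ["2020-04-01", "2020-04-02", "2020-04-02", "2020-04-03",
             "2020-04-01", "2020-04-02", "2020-04-03"],
    "state": ["Maharashtra", "Maharashtra***", "Maharashtra", "maharashtra ",
              "Kerala", "Kerala", "Kerala"],
    "confirmed": [300, 335, 335, 423, 265, 286, 295],
})


def clean_state_counts(frame):
    """Normalise state names, drop duplicates, derive daily new cases."""
    out = frame.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Strip stray symbols and whitespace, then unify the casing.
    out["state"] = (out["state"]
                    .str.replace(r"[^A-Za-z &]", "", regex=True)
                    .str.strip()
                    .str.title())
    # Rows that collapse to the same (date, state) pair are duplicates.
    out = out.drop_duplicates(subset=["date", "state"], keep="last")
    out = out.sort_values(["state", "date"]).reset_index(drop=True)
    # Cumulative totals -> daily new cases within each state.
    out["daily_new"] = out.groupby("state")["confirmed"].diff()
    # The first row of each state has no predecessor; keep its cumulative count.
    out["daily_new"] = out["daily_new"].fillna(out["confirmed"]).astype(int)
    return out


cleaned = clean_state_counts(raw)
print(cleaned)
```

Differencing within each state matters here because the source counts are cumulative; charts of daily trends would otherwise show totals that can only grow.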
[4] Prediction of new active cases of coronavirus disease (COVID-19) pandemic using
multiple linear regression model.
In this paper, regression models, namely linear and multiple linear regression, are applied
to the dataset to visualize the trend of affected cases. The multiple linear regression model
is found to be the more accurate of the two in predicting the outcome. The strength of the
model is its R² value of 1.0, which indicates a strong predictor model that takes all the
factors into consideration.
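To make the R² figure concrete, the sketch below fits a multiple linear regression by ordinary least squares with NumPy and computes R². The data are hypothetical, and the choice of predictors (confirmed, recovered, deaths) with active cases as the target is an assumption made for illustration, not the cited paper's setup. It shows that R² approaches 1.0 whenever the target is an almost exact linear combination of the predictors.

```python
import numpy as np


def fit_multiple_linear_regression(features, target):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
    predictions = design @ coefficients
    ss_res = np.sum((target - predictions) ** 2)    # residual sum of squares
    ss_tot = np.sum((target - target.mean()) ** 2)  # total sum of squares
    return coefficients, 1.0 - ss_res / ss_tot


# Hypothetical cumulative counts (not real data).
confirmed = np.array([100, 150, 230, 320, 450, 600], dtype=float)
recovered = np.array([10, 30, 60, 110, 180, 260], dtype=float)
deaths = np.array([2, 4, 7, 10, 14, 19], dtype=float)
# Assumed target: active cases, an exact linear function of the predictors.
active = confirmed - recovered - deaths

features = np.column_stack([confirmed, recovered, deaths])
coefficients, r_squared = fit_multiple_linear_regression(features, active)
print(coefficients, r_squared)
```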
[6] Statistical analysis and visualization of the potential cases of pandemic coronavirus
This paper generates and disseminates detailed information to the scientific community
and to the public, especially at the peak phase, in order to understand the growth and
impact of the novel coronavirus. The dataset is taken from the Johns Hopkins CSSE data
repository. The paper analyses and compares deaths and recoveries in each country.
In the United States, deaths accounted for 5% and recoveries for 8% of reported cases.
In Spain, deaths accounted for 10% and recoveries for 40% of confirmed cases.
2.2 Requirements
Hardware Requirement
3. System Design
4. Implementation of the System
Analysing the state-wise dataset of confirmed COVID-19 cases in India from 31st January 2020
to 31st October 2021, it is evident that confirmed cases are higher in states and union
territories with larger populations and higher population density. A dual bar-line chart visualizes
the total number of confirmed and deceased cases. Over this interval, Maharashtra reported
around 6.3 million confirmed cases and 0.14 million deaths, the highest number of cases and
the highest number of deaths of any state. Other states with high confirmed counts include
Kerala, Karnataka, Tamil Nadu, Andhra Pradesh, Uttar Pradesh, and Delhi. Confirmed cases
are high where the population density is high; Maharashtra, for example, has a population of
123,144,223 (approx. 123 million).
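The Maharashtra figures quoted above can be put on a per-capita basis with a short calculation. This is a sketch using the rounded values from the text (6.3 million confirmed, 0.14 million deaths), so the results are approximate; the variable names are illustrative.

```python
# Rounded figures quoted in the text for Maharashtra.
confirmed = 6_300_000
deaths = 140_000
population = 123_144_223

# Confirmed cases per million inhabitants.
cases_per_million = confirmed / population * 1_000_000
# Deaths as a percentage of confirmed cases.
case_fatality_pct = deaths / confirmed * 100
# Confirmed cases as a percentage of the population.
share_infected_pct = confirmed / population * 100

print(f"{cases_per_million:,.0f} cases per million")
print(f"{case_fatality_pct:.2f}% of confirmed cases died")
print(f"{share_infected_pct:.2f}% of the population confirmed")
```

Normalising by population in this way allows states of very different sizes to be compared on the same scale.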
Figure 3: Pie chart
The pie chart shows the state-wise confirmed cases in India. Maharashtra has the most
confirmed cases of any state. Each state is indicated by a different shade. Kerala, Karnataka,
and Tamil Nadu also have high confirmed counts and, together with Maharashtra, occupy the
top four places.
Figure 4: Dumbbell chart
Comparing the impact of COVID-19 across states in 2020 and 2021, the number of confirmed cases
increased dramatically in Maharashtra, to approximately 3,980k, and the state has the highest
average number of confirmed cases. States such as Kerala, Chhattisgarh, and Punjab had fewer
cases in 2020 and saw an increase in 2021. Even though Kerala has a comparatively small population,
COVID-19 spread widely there due to the migration of people.
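The two endpoints of each dumbbell (cases added in 2020 and in 2021) can be derived from cumulative data as sketched below. The state labels, dates, and counts are hypothetical, and the sketch assumes every state has at least one record in each year.

```python
import pandas as pd

# Hypothetical cumulative confirmed totals (not real data).
data = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-30", "2020-12-31", "2021-10-31",
                            "2020-06-30", "2020-12-31", "2021-10-31"]),
    "state": ["State A", "State A", "State A",
              "State B", "State B", "State B"],
    "confirmed": [400, 1000, 4000, 50, 200, 2500],
})


def cases_added_per_year(frame):
    """Return a state-by-year table of cases added within each year."""
    ordered = frame.sort_values(["state", "date"])
    # Last cumulative value reported by each state in each calendar year.
    year_end = (ordered.assign(year=ordered["date"].dt.year)
                .groupby(["state", "year"])["confirmed"].last()
                .unstack("year"))
    # Cases added in a year = year-end total minus the previous year-end total.
    added = year_end.diff(axis=1)
    # The first year has no predecessor, so its year-end total is used as is.
    first_year = year_end.columns[0]
    added[first_year] = year_end[first_year]
    return added.astype(int)


yearly = cases_added_per_year(data)
print(yearly)
```

Each row of `yearly` then supplies the two points that a dumbbell chart joins with a line.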
Figure 5: Geographical Heatmap
Maharashtra stands out on the heatmap, with 62,29,596 (about 6.2 million) average
confirmed cases. The main reasons behind this large figure are the population of the state
and the district-wise population density. In Mumbai, the capital of Maharashtra, the
population density is approx. 73,000 people per sq mile, hence the outcome is
proportional.
Figure 6: Line chart (Time Series)
Analysing the total recovered cases throughout India state-wise and comparing them with
the state-wise population, it is evident that states and union territories with higher
population and population density have more cases, and that recovered cases are
proportional to confirmed cases. However, Odisha and Chhattisgarh are not proportional
in the comparison between confirmed and recovered cases: Odisha has a 12.6% higher
recovery rate than Chhattisgarh.
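A recovery-rate comparison of this kind can be sketched as follows. The counts are hypothetical placeholders, not the Odisha or Chhattisgarh figures, and since the text does not say whether the 12.6% gap is in percentage points or relative terms, the sketch computes both.

```python
def recovery_rate(recovered, confirmed):
    """Recovered cases as a percentage of confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return recovered / confirmed * 100


# Hypothetical counts for two states (not real data).
rate_a = recovery_rate(recovered=900, confirmed=1000)
rate_b = recovery_rate(recovered=800, confirmed=1000)

# Gap in percentage points: a simple difference of the two rates.
point_gap = rate_a - rate_b
# Relative gap: how much higher rate_a is as a fraction of rate_b.
relative_gap_pct = point_gap / rate_b * 100

print(f"{point_gap:.1f} percentage points, {relative_gap_pct:.1f}% relative")
```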
ANALYSING LOCKDOWN:
2020
On 25 March, the first day of the lockdown, nearly all services and factories were
suspended. People hurried to stock essentials in some parts of the country. Arrests were
made across the states for violating lockdown norms, such as venturing out without an
emergency, opening businesses, and breaking home quarantine. The government held meetings
with e-commerce websites and vendors to ensure a seamless supply of essential goods
across the nation during the lockdown period.
Phase 2: 15 April 2020 – 3 May 2020 (19 days)
On 14 April, PM Modi extended the nationwide lockdown till 3 May, with a conditional
relaxation promised after 20 April for the regions where the spread had been contained by
then. He said that every town, every police station area, and every state would be carefully
evaluated to see if it had contained the spread. The areas that were able to do so would be
released from the lockdown on 20 April. If any new cases emerged in those areas,
lockdown could be reimposed. On 16 April, lockdown areas were classified as "red zone",
indicating the presence of infection hotspots, "orange zone", indicating some infection, and
"green zone", with no infections.
Analysing the chart for the lockdown period (Mar-May 2020), there is no large increase
in confirmed cases during the lockdown, but after the lockdown ended (Jun-Dec 2020) the
number of cases became very large. We conclude that cases were under control during the
lockdown and rose sharply once it was lifted.
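The comparison described above can be expressed as a mean of daily new cases over each window. The sketch below uses a synthetic daily series with placeholder values, not the actual dataset, and takes the Mar-May and Jun-Dec 2020 windows from the text. The function name is illustrative.

```python
import pandas as pd


def mean_daily_cases(daily, start, end):
    """Average daily new cases between two dates (both inclusive)."""
    return float(daily.loc[start:end].mean())


dates = pd.date_range("2020-03-01", "2020-12-31", freq="D")
# Synthetic daily new cases: low through May, higher from June onwards.
values = [100 if day < pd.Timestamp("2020-06-01") else 1000 for day in dates]
daily = pd.Series(values, index=dates)

during_lockdown = mean_daily_cases(daily, "2020-03-01", "2020-05-31")
after_lockdown = mean_daily_cases(daily, "2020-06-01", "2020-12-31")
ratio = after_lockdown / during_lockdown

print(during_lockdown, after_lockdown, ratio)
```

With real data, `daily` would be the differenced cumulative series for India or for a single state.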
Figure 8: Bar chart (Time Series)
From the above charts we can observe the impact of COVID-19 on Indian states. The most
affected states were Maharashtra, Kerala, Karnataka, Tamil Nadu, etc. Maharashtra
suffered the most during this period, but we can see that proper measures have been taken
and the recovery rate has increased. Even though the confirmed cases of COVID-19 were
high, the death rate is comparatively low and has been declining since the start of the
COVID-19 period, so we can assume that there was good medical support.
This research presented current trends of the COVID-19 outbreak in India from 31st January 2020
to 31st October 2021, visualized in different charts using the Tableau visualization tool. The
trajectory of the outbreak is also forecast using a time series analysis model. COVID-19
is still an infectious disease with some unclear or unknown properties, which means
accurate SEIR prediction can only be obtained once the outbreak has been successfully
contained. The spread of the outbreak is largely influenced by each country's policy and social
responsibility. In a pandemic like this, providing timely information to the public is
paramount.
Further work could analyse the district-wise effect of COVID-19 in each state to find the
reasons behind the most affected areas, and study the role of vaccination in reducing the
impact of COVID-19 in each state.
References:
[1] Mapping the spread of COVID-19 outbreak in India (Vanshika Bidhan, Bhavini
Malhotra, Mansi Pandit and N. Latha).
[3] Statistical Explorations and Univariate Time series Analysis on COVID-19 Datasets to
Understand the Trend of Disease Spreading and Death.
[4] Prediction of new active cases of coronavirus disease (COVID-19) pandemic using
multiple linear regression model.
[6] Statistical analysis and visualization of the potential cases of pandemic coronavirus.