
IMPACT OF AIR POLLUTION ON OUR LIVES

Contents:

1. Abstract
2. Objective
3. Problem Statement
4. Introduction
5. Data Analysis
5 a. Part I
5 b. Part II
6. Methodology - I
6 a. Impact of Air pollution - Pre COVID.
7. Coding and Results.
8. Methodology - II
8 a. Impact of Air pollution - Post COVID.
9. Summary
10. Conclusion
ABSTRACT
Air pollution is increasing day by day. Chemical pollutants such as CO2, SO2, NH3, and particulate
matter (PM) are its main causes, and the principal sources of these pollutants are industries,
vehicles, and the burning of fossil fuels. This document provides a detailed description and analysis
of these factors, their proportions, and the harm they cause to people and other living organisms.
The Air Quality Index (AQI) is the most important measure to consider: from it, we can estimate the
severity of air pollution, for example severe, poor, or good. The data is provided by the Central
Pollution Control Board. I estimated the AQI by applying machine learning techniques such as Random
Forest and Support Vector Machine, and then used cluster analysis to group the effect of the
pollutants based on the AQI. Data analysis was also carried out with the Tableau tool to predict the
impact of pollution after COVID-19 and to examine the percentage contribution of each pollutant.
Finally, based on the resulting pollution group, I describe the harmful effects we are likely to face.
OBJECTIVE
The main objective of this project is to describe the harmful effects of air pollution and the sources
that cause it. Our goal is to analyze air pollution in the three years before COVID-19 and to predict
its impact in the three years after COVID-19.
PROBLEM STATEMENT
Predict the Air Quality Index (AQI) of the current data and compare it with the existing data. Group
the effect of pollution into Good (0-50), Satisfactory (51-100), Moderate (101-200), Poor (201-300),
and Very Poor (above 300). Describe the impact of air pollution and predict air pollution for the
upcoming years.
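
As a minimal illustration of this grouping, the sketch below maps a predicted AQI value onto the five categories listed above; the function name and the exact boundary handling are choices made for this example, not part of the project code.

def aqi_category(aqi):
    # Pollution categories as defined in this problem statement
    if aqi <= 50:
        return "Good"
    elif aqi <= 100:
        return "Satisfactory"
    elif aqi <= 200:
        return "Moderate"
    elif aqi <= 300:
        return "Poor"
    else:
        return "Very Poor"

print(aqi_category(184))   # -> Moderate (an AQI of 184 falls in the 101-200 band)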
INTRODUCTION

Air pollution may be described as contamination of the atmosphere by gaseous, liquid, or solid
wastes or by-products that can endanger human health, harm the welfare of plants and animals, attack
materials, reduce visibility, or produce undesirable odors. Although some pollutants are released
by natural sources like volcanoes, coniferous forests, and hot springs, the effect of this pollution
is very small when compared to that caused by emissions from industrial sources, power and heat
generation, waste disposal, and the operation of internal combustion engines. Fuel combustion is
the largest contributor to air pollutant emissions, caused by man, with stationary and mobile
sources equally responsible. The air pollution problem is encountered outdoors as well as indoors.

Indoor air pollution came to our attention during the 1980s, while outdoor air pollution has been
recognized for some time. The major pollutants that contribute to indoor air pollution include
radon, volatile organic compounds, formaldehyde, biological contaminants, and combustion by-
products such as carbon monoxide, carbon dioxide, sulfur dioxide, and hydrocarbons.
The major pollutants which contribute to outdoor air pollution are sulfur dioxide, carbon
monoxide, nitrogen oxides, ozone, total suspended particulate matter, lead, carbon dioxide, and
toxic pollutants.

There are several reasons to worry about air pollution. Some are:

Air pollution affects every one of us.


Air pollution can cause health problems and, in some cases, death.
Air pollution reduces crop yields and affects animal life.
Air pollution can contaminate soil and corrode materials.
DATA ANALYSIS

PART-I
Tool Used: Tableau

In this part, data on the chemical pollutants that cause air pollution is collected, entered into a
CSV file, and analyzed using the Tableau tool.

This part of the data analysis presents brief historical data on air pollution, such as chemical
factors, annual death rates, and different kinds of air pollution.

Fig 5a.1.1 Data:

Fig 5a.1.2 Tableau-tool analysis:


Smoke air pollution

Fig 5a.2.1 Data:

Fig 5a.2.2 Tableau analysis:

Transport and Industry Effects:

Fig 5a.3.1 Data:


Fig 5a.3.2 Tableau Analysis:

Fig 5a.4.1 Annual death rates:


Fig 5a.4.2 Tableau Analysis death rates:
PART-II
Technology &Tool Used: Python (Machine Learning) & Jupyter Notebook.

In this part, we discuss the chemical pollutants that cause air pollution and the AQI. The Air
Quality Index is the main measure used to detect the type of pollution-related diseases that affect
the lives of people and other living organisms.

The data is taken from the Central Pollution Control Board of India and entered into a CSV file.

The number of instances is 24,022 (city.csv).

Training Data:

Testing Data(i):

Samples are taken and the Air Quality Index is then predicted.

Instances: 90

On this data, we want to predict the air quality index and then group it into the five disease
stages discussed earlier.

Testing Data(ii):

Samples are taken and the Air Quality Index is then predicted.

Instances: 21
Factor:

Air Quality Index: the total of all chemical pollutants, multiplied by 1.5.
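
A minimal pandas sketch of this factor, assuming a DataFrame whose pollutant columns match those shown later in the notebook listing (PM2.5, PM10, NO, NO2, Nox, NH3, CO, SO2, O3, Benzene, Toluene, Xylene); the file path and column list are illustrative assumptions, not the exact project files.

import pandas as pd

pollutants = ['PM2.5', 'PM10', 'NO', 'NO2', 'Nox', 'NH3',
              'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene']

df = pd.read_csv('city.csv')                               # assumed path
# AQI factor as defined above: sum of all chemical pollutants * 1.5
df['air_quality_index'] = df[pollutants].sum(axis=1) * 1.5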

Let’s go to the Methodology to understand better.


METHODOLOGY I
Tool Used: Tableau.

Dataset: city.csv

Impact of Air pollution - Pre COVID.

Fig 6.1- AQI vs Year

Fig 6.2- AQI vs Pollution Remark


Fig 6.3 – AQI vs Cities

Methodology – I conclusion:

We can conclude that pollution decreased by about 95% by 2019-2020.
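
As a rough check of this kind of conclusion outside Tableau, the sketch below compares the mean AQI between two years. It assumes the city_day.csv file and the DATE and air_quality_index columns that appear in the notebook listing later in this report, and it is only an illustration, not the Tableau calculation behind the figure above.

import pandas as pd

df = pd.read_csv(r'C:\sravan\city_day.csv', parse_dates=['DATE'])   # assumed path
yearly = df.groupby(df['DATE'].dt.year)['air_quality_index'].mean()

# Percentage change in mean AQI from 2019 to 2020
change = (yearly[2020] - yearly[2019]) / yearly[2019] * 100
print(f"Mean AQI change 2019 -> 2020: {change:.1f}%")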


CODING AND RESULTS
Technology Used: Python (Machine Learning)

Tool Used: Jupyter Notebook.

The model is trained on the training data, and the test data is then given as input to predict the results.

We carry out three kinds of analysis:

(i) Prediction of Air Quality Index


(ii) Clustering the Air Quality Index and COVID
(iii) Marking the effect of pollution and the associated disease message, as per the central government
standards.

Prediction of Air Quality Index


Train data: City.csv
No. of instances: 24022

Test data: TEST file


No. of instances: 90

Language: Python
Technique: Regression (Random Forest Regressor & Support Vector Machine)

Explanations are provided as comments within the code fragments.


In [3]: import pandas as pd

In [4]: # loading the train data set of air quality (24022 instances)
        traindata=pd.read_csv(r'C:\sravan\city_day.csv')

In [7]: # rows with missing data are removed
        traindata1=traindata.dropna()

In [37]: # first we predict the air quality index by splitting our data into 70% train data and 30% test data
         # then we apply a regression technique to predict the air quality index based on all chemical pollutants
         # after that we apply cluster analysis
         # and finally we predict the harmful effects that you are going to face, like good, very poor, etc.

In [38]: #first drop unwanted columns.

In [8]: traindata1.head(3)

Out[8]:
            CITY        DATE  PM2.5    PM10    NO    NO2    Nox    NH3    CO    SO2      O3  Benzene ...
1969   Amaravati  11/25/2017  81.40  124.50  1.44  20.50  12.08  10.72  0.12  15.24  127.09     0.20 ...
1970   Amaravati  11/26/2017  78.32  129.06  1.26  26.00  14.85  10.28  0.14  26.96  117.44     0.22 ...
1971   Amaravati  11/27/2017  88.76  135.32  6.60  30.85  21.77  12.91  0.11  33.59  111.81     0.29 ...

In [9]: traindata2=traindata1.drop(['CITY','DATE','pollution range'],axis='columns')

In [10]: traindata2.head(2)

Out[10]:
       PM2.5    PM10    NO   NO2    Nox    NH3    CO    SO2      O3  Benzene  Toluene  Xylene  air_quality_index
1969   81.40  124.50  1.44  20.5  12.08  10.72  0.12  15.24  127.09     0.20     6.50    0.06              184.0
1970   78.32  129.06  1.26  26.0  14.85  10.28  0.14  26.96  117.44     0.22     7.95    0.08              197.0

In [11]: # the prediction value (class label) is the air quality index, so make it the target variable
         target=traindata2['air_quality_index']
         print(len(traindata2))
         print(len(target))

4646
4646

In [47]: #traindata=traindata.drop(['air_quality_index'],axis='columns')

In [13]: # then split our train data into training (70%) and testing (30%) sets

In [12]: from sklearn.model_selection import train_test_split

In [13]: X_train,x_test,Y_train,y_test=train_test_split(traindata2,target,test_size=0.3)
         # making our data into test and train sets

In [14]: len(X_train)

Out[14]: 3252

In [15]: from sklearn.ensemble import RandomForestRegressor

In [16]: r=RandomForestRegressor(n_estimators=50)

In [21]: #model
r.fit(X_train,Y_train)

Out[21]: RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                               max_features='auto', max_leaf_nodes=None,
                               min_impurity_decrease=0.0, min_impurity_split=None,
                               min_samples_leaf=1, min_samples_split=2,
                               min_weight_fraction_leaf=0.0, n_estimators=50,
                               n_jobs=None, oob_score=False, random_state=None,
                               verbose=0, warm_start=False)

In [18]: r.score(x_test,y_test)

Out[18]: 0.999758744001202

In [90]: # the score obtained is 99.9% (R^2 on the held-out test set)

In [22]: res=r.predict(X_train)
res

Out[22]: array([374.12, 156. , 266.72, ..., 46. , 140. , 247.98])

In [23]: print(traindata)

            CITY        DATE  PM2.5    PM10     NO    NO2    Nox    NH3    CO \
1969   Amaravati  11/25/2017  81.40  124.50   1.44  20.50  12.08  10.72  0.12
1970   Amaravati  11/26/2017  78.32  129.06   1.26  26.00  14.85  10.28  0.14
1971   Amaravati  11/27/2017  88.76  135.32   6.60  30.85  21.77  12.91  0.11
1972   Amaravati  11/28/2017  64.18  104.09   2.56  28.07  17.01  11.42  0.09
1973   Amaravati  11/29/2017  72.47  114.84   5.23  23.20  16.59  12.25  0.16
...          ...         ...    ...     ...    ...    ...    ...    ...   ...
24018      Patna   4/27/2020  19.03   50.03  77.24  14.17  57.37  11.30  0.43
24019      Patna   4/28/2020  12.37   39.29  66.20  11.68  58.88  11.30  0.39
24020      Patna   4/29/2020  15.21   41.96  79.67  13.50  69.42  10.13  0.42
24021      Patna   4/30/2020  30.93   60.26  69.32  14.46  61.62  10.08  0.52
24022      Patna    5/1/2020  29.26   76.89  75.87  11.84  65.66  12.02  0.52

         SO2      O3  Benzene  Toluene  Xylene  air_quality_index  pollution range
1969   15.24  127.09     0.20     6.50    0.06              184.0         Moderate
1970   26.96  117.44     0.22     7.95    0.08              197.0         Moderate
1971   33.59  111.81     0.29     7.63    0.12              198.0         Moderate
1972   19.00  138.18     0.17     5.02    0.07              188.0         Moderate
1973   10.55  109.74     0.21     4.71    0.08              173.0         Moderate
...      ...     ...      ...      ...     ...                ...              ...
24018   9.83   23.31     0.66     3.22    0.16              109.0         Moderate
24019   8.63   31.79     0.55     3.05    0.14               98.0     Satisfactory
24020   9.37   33.08     0.69     1.24    0.73              111.0         Moderate
24021  11.96   41.62     1.67     1.82    2.62              118.0         Moderate
24022   7.86   35.56     2.28     1.93    2.75              118.0         Moderate

[4646 rows x 16 columns]

In [24]: # now let's take the other test data for predicting the air quality index
         testdata=pd.read_csv(r'C:\sravan\TEST.csv')

In [25]: testdata

Out[25]:
             STATE               CITY       DATE  PM2.5  PM10   NO  NO2  Nox  NH3   CO  SO2 ...
0   Andhra Pradesh  Rajamahendravaram  27/2/2019     31    49   16    4   10    0   49    0 ...
1            assam            gauhati   5/1/2019     18    19   10   29   16   44   19    0 ...
2            assam            gauhati   5/2/2019     30    31   12    2   20   17   31    0 ...
3            assam            gauhati  5/10/2019     43    42   11    2   24   19   42    0 ...
4            assam            gauhati  23/5/2019     31    31   12    2   20   17   31    0 ...
..             ...                ...        ...    ...   ...  ...  ...  ...  ...  ...  ... ...
86   Andhrapradesh      Visakhapatnam  21/1/2020     90     0   22    6    8   23    0    0 ...
87           Delhi              Delhi  25/1/2020     89     0   67    0    0   23    0    0 ...
88           Delhi              Delhi  26/1/2020     88     0   45    4    5   35    0    0 ...
89  Andhra Pradesh         amaravathi   1/4/2019    302   181  144    2   39    0  181    0 ...
90     Maharashtra             Mumbai  12/2/2017    330     0   41    0    6   86    0    0 ...

91 rows × 18 columns

In [26]: testdata=testdata.drop(['STATE','CITY','DATE','REMARK','HEALTH-IMPACT'],axis='columns')
         testdata

Out[26]:
    PM2.5  PM10   NO  NO2  Nox  NH3   CO  SO2  predicted air quality index  O3  Benzene  Toulene  Xylene
0      31    49   16    4   10    0   49    0                       287.80   3        0        0       0
1      18    19   10   29   16   44   19    0                       287.80  44        0        0       0
2      30    31   12    2   20   17   31    0                       439.18  50        0        0       0
3      43    42   11    2   24   19   42    0                       446.16  57        0        0       0
4      31    31   12    2   20   17   31    0                       436.68  49        0        0       0
..    ...   ...  ...  ...  ...  ...  ...  ...                          ...  ..      ...      ...     ...
86     90     0   22    6    8   23    0    0                       252.76  67        0        0       0
87     89     0   67    0    0   23    0    0                       130.52  45        0        0       0
88     88     0   45    4    5   35    0    0                       241.26  67        0        0       0
89    302   181  144    2   39    0  181    0                       152.94  78        0        0       0
90    330     0   41    0    6   86    0    0                       160.54  52        0        0       0

91 rows × 13 columns

In [28]: target1=traindata['air_quality_index']
traindata3=traindata2.drop(['air_quality_index'],axis='columns')
traindata3

Out[28]:
PM2.5 PM10 NO NO2 Nox NH3 CO SO2 O3 Benzene Toluene Xylene

1969 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 0.20 6.50 0.06

1970 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 0.22 7.95 0.08

1971 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 0.29 7.63 0.12

1972 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 0.17 5.02 0.07

1973 72.47 114.84 5.23 23.20 16.59 12.25 0.16 10.55 109.74 0.21 4.71 0.08

... ... ... ... ... ... ... ... ... ... ... ... ...

24018 19.03 50.03 77.24 14.17 57.37 11.30 0.43 9.83 23.31 0.66 3.22 0.16

24019 12.37 39.29 66.20 11.68 58.88 11.30 0.39 8.63 31.79 0.55 3.05 0.14

24020 15.21 41.96 79.67 13.50 69.42 10.13 0.42 9.37 33.08 0.69 1.24 0.73

24021 30.93 60.26 69.32 14.46 61.62 10.08 0.52 11.96 41.62 1.67 1.82 2.62

24022 29.26 76.89 75.87 11.84 65.66 12.02 0.52 7.86 35.56 2.28 1.93 2.75

4646 rows × 12 columns

In [29]: testing=RandomForestRegressor(n_estimators=50)

In [30]: testing.fit(traindata3,target1)

Out[30]: RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                               max_features='auto', max_leaf_nodes=None,
                               min_impurity_decrease=0.0, min_impurity_split=None,
                               min_samples_leaf=1, min_samples_split=2,
                               min_weight_fraction_leaf=0.0, n_estimators=50,
                               n_jobs=None, oob_score=False, random_state=None,
                               verbose=0, warm_start=False)

In [121]: res=testing.predict(testdata)
res

Out[121]: array([287.8 , 287.8 , 439.18, 446.16, 436.68, 374.54, 439.46, 257.6 ,


261.3 , 151.88, 154.24, 182.34, 167.78, 158.42, 216.1 , 235.48,
159.02, 84.38, 120.54, 139.36, 122.84, 259. , 163.4 , 271.24,
302.16, 284.88, 220.02, 214.92, 290. , 232.42, 107.88, 158.8 ,
151.68, 219.86, 262.74, 376.44, 303.76, 286.04, 116.36, 117.02,
151.28, 139.9 , 86.6 , 157. , 218.88, 344.84, 246.8 , 131.38,
185.02, 339.94, 384.86, 159.1 , 406.88, 264.78, 283.36, 162.58,
131.34, 224.74, 249.44, 130.32, 129.5 , 158.94, 166.8 , 281.24,
178.24, 140.36, 187.14, 153.94, 334. , 145.48, 505.54, 494.8 ,
170.4 , 88.22, 183.48, 265.9 , 146.84, 146.14, 170.68, 141.84,
168.1 , 162.5 , 170.2 , 186.08, 170.52, 162.24, 252.76, 130.52,
241.26, 152.94, 160.54])

In [31]: testing.score(traindata3,target1)

Out[31]: 0.9924580809809389

In [32]: res=pd.DataFrame(res)

In [33]: res

Out[33]:
0

0 374.12
1 156.00

2 266.72

3 174.98

4 79.00
... ...

3247 41.02

3248 49.00

3249 46.00
3250 140.00

3251 247.98

3252 rows × 1 columns

In [34]: # now keep this in the test (result) data set
         testdata["predicted air quality index"]=res

In [35]: testdata.to_csv(r'C:\sravan\predicted_airquality_final.csv', index=False, header=True)

In [36]: testdata

Out[36]:
    PM2.5  PM10   NO  NO2  Nox  NH3   CO  SO2  predicted air quality index  O3  Benzene  Toulene  Xylene
0      31    49   16    4   10    0   49    0                       374.12   3        0        0       0
1      18    19   10   29   16   44   19    0                       156.00  44        0        0       0
2      30    31   12    2   20   17   31    0                       266.72  50        0        0       0
3      43    42   11    2   24   19   42    0                       174.98  57        0        0       0
4      31    31   12    2   20   17   31    0                        79.00  49        0        0       0
..    ...   ...  ...  ...  ...  ...  ...  ...                          ...  ..      ...      ...     ...
86     90     0   22    6    8   23    0    0                        45.00  67        0        0       0
87     89     0   67    0    0   23    0    0                       222.02  45        0        0       0
88     88     0   45    4    5   35    0    0                        69.00  67        0        0       0
89    302   181  144    2   39    0  181    0                       105.00  78        0        0       0
90    330     0   41    0    6   86    0    0                       137.00  52        0        0       0

91 rows × 13 columns

In [37]: traindata3
FINAL RESULT:

Clustering the Air Quality Index vs COVID

We consider AQI vs COVID to cluster the data and then group it into five clusters.
They are Good, Satisfactory, Moderate, Poor, and Very Poor:
good = cluster(0), moderate = cluster(1), satisfactory = cluster(2), poor = cluster(3), very poor = cluster(4).

Data:

We used the K-Means clustering algorithm to cluster the data and a scatter plot to visualize it.
In [3]: import pandas as pd

In [42]: # loading the train data set of air quality (90 instances)
         data=pd.read_csv(r'C:\sravan\internship\airpollution_cluster_analysis.csv')

In [43]: data

Out[43]:
             STATE           CITY       DATE  PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG ...
0   Andhra Pradesh     amaravathi   1/1/2019        190       131      107        4      42   0         63 ...
1   Andhra Pradesh     amaravathi   1/2/2019        188       131      110        4      40   0         62 ...
2   Andhra Pradesh     amaravathi   1/3/2019        280       174      155        2      37   0         52 ...
3   Andhra Pradesh     amaravathi   1/4/2019        302       181      144        2      39   0         78 ...
4   Andhra Pradesh     amaravathi   1/6/2019        285       160      121        3      19   0         71 ...
..             ...            ...        ...        ...       ...      ...      ...     ...  ..        ... ...
86   Andhrapradesh  Visakhapatnam   2/4/2020        123         0       56        6       0   0         56 ...
87           Delhi          Delhi   4/1/2020         43         0       76        4       0   0         76 ...
88           Delhi          Delhi  23/1/2020        111         0       46        7       0   0         78 ...
89           Delhi          Delhi  25/1/2020         89         0       67        0       0  23         45 ...
90           Delhi          Delhi  26/1/2020         88         0       45        4       5  35         67 ...

91 rows × 15 columns

In [44]: inputs=data.drop('AIR_QUALITY_INDEX',axis='columns')

In [45]: target=data['AIR_QUALITY_INDEX']

In [46]: target

Out[46]: 0 190
1 188
2 280
3 302
4 285
...
86 123
87 43
88 111
89 89
90 88
Name: AIR_QUALITY_INDEX, Length: 91, dtype: int64

In [47]: inputs

Out[47]:
             STATE           CITY       DATE  PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG ...
0   Andhra Pradesh     amaravathi   1/1/2019        190       131      107        4      42   0         63 ...
1   Andhra Pradesh     amaravathi   1/2/2019        188       131      110        4      40   0         62 ...
2   Andhra Pradesh     amaravathi   1/3/2019        280       174      155        2      37   0         52 ...
3   Andhra Pradesh     amaravathi   1/4/2019        302       181      144        2      39   0         78 ...
4   Andhra Pradesh     amaravathi   1/6/2019        285       160      121        3      19   0         71 ...
..             ...            ...        ...        ...       ...      ...      ...     ...  ..        ... ...
86   Andhrapradesh  Visakhapatnam   2/4/2020        123         0       56        6       0   0         56 ...
87           Delhi          Delhi   4/1/2020         43         0       76        4       0   0         76 ...
88           Delhi          Delhi  23/1/2020        111         0       46        7       0   0         78 ...
89           Delhi          Delhi  25/1/2020         89         0       67        0       0  23         45 ...
90           Delhi          Delhi  26/1/2020         88         0       45        4       5  35         67 ...

91 rows × 14 columns

In [ ]:

In [48]: from sklearn.preprocessing import LabelEncoder
         # converting the nominal COVID column to numeric labels using LabelEncoder

In [49]: le_fever = LabelEncoder()
         inputs['covid'] = le_fever.fit_transform(inputs['COVID'])

In [50]: inputs

Out[50]:
             STATE           CITY       DATE  PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG ...
0   Andhra Pradesh     amaravathi   1/1/2019        190       131      107        4      42   0         63 ...
1   Andhra Pradesh     amaravathi   1/2/2019        188       131      110        4      40   0         62 ...
2   Andhra Pradesh     amaravathi   1/3/2019        280       174      155        2      37   0         52 ...
3   Andhra Pradesh     amaravathi   1/4/2019        302       181      144        2      39   0         78 ...
4   Andhra Pradesh     amaravathi   1/6/2019        285       160      121        3      19   0         71 ...
..             ...            ...        ...        ...       ...      ...      ...     ...  ..        ... ...
86   Andhrapradesh  Visakhapatnam   2/4/2020        123         0       56        6       0   0         56 ...
87           Delhi          Delhi   4/1/2020         43         0       76        4       0   0         76 ...
88           Delhi          Delhi  23/1/2020        111         0       46        7       0   0         78 ...
89           Delhi          Delhi  25/1/2020         89         0       67        0       0  23         45 ...
90           Delhi          Delhi  26/1/2020         88         0       45        4       5  35         67 ...

91 rows × 15 columns

In [51]: target

Out[51]: 0 190
1 188
2 280
3 302
4 285
...
86 123
87 43
88 111
89 89
90 88
Name: AIR_QUALITY_INDEX, Length: 91, dtype: int64

In [52]: # making the data set for cluster analysis
         thenres = inputs.drop(['STATE','CITY','DATE','PLACE','REMARK','HEALTH-IMPACT','COVID'],axis='columns')
         thenres

Out[52]:
    PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG  covid
0         190       131      107        4      42   0         63      0
1         188       131      110        4      40   0         62      0
2         280       174      155        2      37   0         52      0
3         302       181      144        2      39   0         78      0
4         285       160      121        3      19   0         71      0
..        ...       ...      ...      ...     ...  ..        ...    ...
86        123         0       56        6       0   0         56      1
87         43         0       76        4       0   0         76      1
88        111         0       46        7       0   0         78      1
89         89         0       67        0       0  23         45      1
90         88         0       45        4       5  35         67      1

91 rows × 8 columns

In [53]: thentarget=thenres['covid']

In [54]: thentarget

Out[54]: 0 0
1 0
2 0
3 0
4 0
..
86 1
87 1
88 1
89 1
90 1
Name: covid, Length: 91, dtype: int32

In [55]: from sklearn.svm import SVC
         svm=SVC()
         # predicting data within the train data using a support vector machine

In [56]: svm.fit(thenres,thentarget)

C:\Users\rajesh\anaconda3\lib\site-packages\sklearn\svm\base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)

Out[56]: SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
             decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
             kernel='rbf', max_iter=-1, probability=False, random_state=None,
             shrinking=True, tol=0.001, verbose=False)

In [57]: svm.score(thenres,thentarget)

Out[57]: 1.0

In [61]: from sklearn.cluster import KMeans
         # I am using the K-Means algorithm for clustering

In [63]: from matplotlib import pyplot as plt
         thenres

Out[63]:
    PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG  covid
0         190       131      107        4      42   0         63      0
1         188       131      110        4      40   0         62      0
2         280       174      155        2      37   0         52      0
3         302       181      144        2      39   0         78      0
4         285       160      121        3      19   0         71      0
..        ...       ...      ...      ...     ...  ..        ...    ...
86        123         0       56        6       0   0         56      1
87         43         0       76        4       0   0         76      1
88        111         0       46        7       0   0         78      1
89         89         0       67        0       0  23         45      1
90         88         0       45        4       5  35         67      1

91 rows × 8 columns

In [64]: thenres['air_quality_index']=target
         thenres

Out[64]:
    PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG  covid  air_quality_index
0         190       131      107        4      42   0         63      0                190
1         188       131      110        4      40   0         62      0                188
2         280       174      155        2      37   0         52      0                280
3         302       181      144        2      39   0         78      0                302
4         285       160      121        3      19   0         71      0                285
..        ...       ...      ...      ...     ...  ..        ...    ...                ...
86        123         0       56        6       0   0         56      1                123
87         43         0       76        4       0   0         76      1                 43
88        111         0       46        7       0   0         78      1                111
89         89         0       67        0       0  23         45      1                 89
90         88         0       45        4       5  35         67      1                 88

91 rows × 9 columns

In [66]: plt.scatter(thenres['covid'],thenres['air_quality_index'])
         # visualizing a scatter plot before and after corona
         plt.title('AIR QUALITY VS COVID')
         plt.xlabel('COVID')
         plt.ylabel('AIR QUALITY INDEX')

Out[66]: Text(0, 0.5, 'AIR QUALITY INDEX')

In [67]: km=KMeans(n_clusters=5)
         km
         # dividing into 5 clusters

Out[67]: KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
                n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
                random_state=None, tol=0.0001, verbose=0)

In [68]: clus=km.fit_predict(thenres[['covid','air_quality_index']])
         clus
         # displaying the cluster data group

Out[68]: array([1, 1, 3, 3, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 4, 4, 0, 0, 0, 0, 0, 4,
                0, 4, 4, 4, 4, 4, 4, 0, 0, 4, 0, 4, 4, 1, 1, 4, 0, 0, 0, 0, 0, 0,
                4, 1, 0, 0, 4, 1, 3, 1, 3, 4, 4, 4, 4, 1, 1, 0, 0, 4, 1, 4, 0, 0,
                0, 0, 3, 0, 2, 3, 1, 0, 3, 4, 0, 0, 0, 0, 0, 4, 0, 4, 0, 4, 4, 0,
                4, 4, 4])

In [69]: thenres['grouped_pollutuion']=clus
         thenres
         # displaying in the dataset
         # good=cluster(0), satisfactory=cluster(2), poor=cluster(3), moderate=cluster(1), very poor=cluster(4)

Out[69]:
    PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG  covid  air_quality_index  grouped_pollutuion
0         190       131      107        4      42   0         63      0                190                   1
1         188       131      110        4      40   0         62      0                188                   1
2         280       174      155        2      37   0         52      0                280                   3
3         302       181      144        2      39   0         78      0                302                   3
4         285       160      121        3      19   0         71      0                285                   3
..        ...       ...      ...      ...     ...  ..        ...    ...                ...                 ...
86        123         0       56        6       0   0         56      1                123                   4
87         43         0       76        4       0   0         76      1                 43                   0
88        111         0       46        7       0   0         78      1                111                   4
89         89         0       67        0       0  23         45      1                 89                   4
90         88         0       45        4       5  35         67      1                 88                   4

91 rows × 10 columns

In [76]: df1=thenres[thenres.grouped_pollutuion==0]
         df2=thenres[thenres.grouped_pollutuion==1]
         df3=thenres[thenres.grouped_pollutuion==2]
         df4=thenres[thenres.grouped_pollutuion==3]
         df5=thenres[thenres.grouped_pollutuion==4]

         plt.scatter(df1.covid,df1['air_quality_index'],color="green")
         plt.scatter(df2.covid,df2['air_quality_index'],color="blue")
         plt.scatter(df3.covid,df3['air_quality_index'],color="yellow")
         plt.scatter(df4.covid,df4['air_quality_index'],color="red")
         plt.scatter(df5.covid,df5['air_quality_index'],color="black")
         plt.xlabel('covid')
         plt.ylabel('air quality')
         plt.legend('23401')

Out[76]: <matplotlib.legend.Legend at 0xe6d0f50>

RESULT:

Marking the effect of pollution and the associated disease as per the central government
standards (category prediction)

Central Government Standards

Technology Used: Python (Random Forest Classifier)

Train data:
Test data:

Predicted Result:
In [51]: import pandas as pd
         from matplotlib import pyplot as plt

In [137]: # loading the train data set of air quality (90 instances)
          traindata=pd.read_csv(r'C:\sravan\internship\airpollution_effect_cause_traindata.csv')
          traindata

Out[137]:
             STATE           CITY       DATE  PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AG  CO  OZONE-AVG ...
0   Andhra Pradesh     amaravathi   1/1/2019        190     131.0      107        4      42   0         63 ...
1   Andhra Pradesh     amaravathi   1/2/2019        188     131.0      110        4      40   0         62 ...
2   Andhra Pradesh     amaravathi   1/3/2019        280     174.0      155        2      37   0         52 ...
3   Andhra Pradesh     amaravathi   1/4/2019        302     181.0      144        2      39   0         78 ...
4   Andhra Pradesh     amaravathi   1/6/2019        285     160.0      121        3      19   0         71 ...
..             ...            ...        ...        ...       ...      ...      ...     ...  ..        ... ...
86   Andhrapradesh  Visakhapatnam   2/4/2020        123       0.0       56        6       0   0         56 ...
87           Delhi          Delhi   4/1/2020         43       0.0       76        4       0   0         76 ...
88           Delhi          Delhi  23/1/2020        111       0.0       46        7       0   0         78 ...
89           Delhi          Delhi  25/1/2020         89       0.0       67        0       0  23         45 ...
90           Delhi          Delhi  26/1/2020         88       NaN       45        4       5  35         67 ...

91 rows × 15 columns

In [138]: # loading the test data set of air quality (19 instances)
          testdata=pd.read_csv(r'C:\sravan\internship\airpollution_effect_cause_testdata.csv')
          testdata

Out[138]:
             STATE               CITY       DATE  PM2.5-AVG  PM10-AVG  NO2-AVG  NH3-AVG  SO2-AVG  CO-AVG  OZONE-AVG ...
0        Telangana          Hyderabad   4/1/2020        110        94       25        3        2      32         32 ...
1        Telangana          Hyderabad   4/2/2020        117        95       25        4        1      39         27 ...
2        Telangana          Hyderabad   4/3/2020         66        73        7        3        5      27         17 ...
3        Telangana          Hyderabad   4/4/2020         57        65        5        2        6      25         19 ...
4        Telangana          Hyderabad   4/5/2020         61        68        8        2        6      23         17 ...
5        Telangana          Hyderabad   4/6/2020         51        61       10        2        9      24         17 ...
6        Telangana          Hyderabad   4/7/2020         39        55       24        7       16      25         26 ...
7        Telangana          Hyderabad   4/8/2020         31        43       25        8       23      22         26 ...
8        Telangana          Hyderabad   4/9/2020         49        58       23        6       20      26         25 ...
9        Telangana          Hyderabad  4/10/2020         38        40       19        5       12      19         24 ...
10  Andhra pradesh          Amaravati   4/1/2020         64        69        6        2       32      18         34 ...
11  Andhra pradesh          Amaravati   4/2/2020         48        57        6        2       27       -         26 ...
12  Andhra pradesh          Amaravati   4/3/2020         50        59        5        2       28       -         17 ...
13  Andhra pradesh  Rajamahendravaram   4/4/2020         56        56        9        2       10      28         37 ...
14  Andhra pradesh  Rajamahendravaram   4/5/2020         43        48        8        2        9      27         33 ...
15  Andhra pradesh  Rajamahendravaram   4/6/2020         34        40        7        2        9      27         17 ...
16  Andhra pradesh           Tirupati   4/7/2020         35        38        7        1        8      26         27 ...
17  Andhra pradesh           Tirupati   4/8/2020         37        33        7        1        7      22         63 ...
18  Andhra pradesh      visakhapatnam   4/9/2020         23        37       33        2        9       6         26 ...
19  Andhra pradesh      visakhapatnam  4/10/2020         42        71       48        2        7       6         22 ...

In [139]: # scatter plot showing the air quality index against the pollution remark
          plt.scatter(traindata['AIR_QUALITY_INDEX'],traindata['REMARK'])
          plt.title('POLLUTION REMARK')
          plt.xlabel('AIR_QUALITY_INDEX')
          plt.ylabel('POLLUTION REMARK')

Out[139]: Text(0, 0.5, 'POLLUTION REMARK')

In [140]: # goal: based on air pollution, predict which level of pollution you will be affected by

In [141]: # we are using a classification technique for this

In [142]: train_dataset=traindata.drop(['HEALTH-IMPACT','SO2-AG','CO','DATE','CITY','STATE','PLACE','COVID','PM2.5-AVG','PM10-AVG','NO2-AVG','NH3-AVG','OZONE-AVG'],axis='columns')
          #test_dataset=testdata.drop(['HEALTH-IMPACT','SO2-AG','CO','DATE','CITY','STATE','PLACE','COVID','PM2.5-AVG','PM10-AVG','NO2-AVG','NH3-AVG','OZONE-AVG'],axis='columns')

In [143]: train_dataset

Out[143]:
     AIR_QUALITY_INDEX        REMARK

0 190 moderate

1 188 moderate

2 280 poor

3 302 very poor

4 285 poor

... ... ...

86 123 moderate

87 43 good

88 111 moderate

89 89 satisfactory

90 88 satisfactory

91 rows × 2 columns

In [ ]:

In [144]: from sklearn.preprocessing import LabelEncoder
          # converting the REMARK categories to numeric labels using LabelEncoder

In [145]: le_var=LabelEncoder()
          train_dataset['pollution_effect_category']=le_var.fit_transform(train_dataset['REMARK'])

In [146]: train_dataset
          # it is categorized as 1=moderate, 2=poor, 0=good, 3=satisfactory and 4=very poor
          train_dataset1=train_dataset.drop(['REMARK'],axis='columns')

In [147]: train_dataset1

Out[147]:
     AIR_QUALITY_INDEX  pollution_effect_category

0 190 1

1 188 1

2 280 2

3 302 4

4 285 2

... ... ...

86 123 1

87 43 0

88 111 1

89 89 3

90 88 3

91 rows × 2 columns

In [136]: train_dataset3=train_dataset1.drop(['pollution_effect_category'],axis='columns')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-136-e8af53e925a5> in <module>
----> 1 train_dataset3=train_dataset1.drop(['pollution_effect_category'],axis='columns')

~\anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
~\anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
~\anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
-> 5018    raise KeyError(f"{labels[mask]} not found in axis")

KeyError: "['pollution_effect_category'] not found in axis"

In [125]: # here I am classifying the pollution remark
          from sklearn import tree

In [126]: from sklearn.ensemble import RandomForestClassifier

In [127]: ram=RandomForestClassifier(n_estimators=100)

In [128]: ram.fit(train_dataset1,target_train_dataset1)

Out[128]: RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                                 max_depth=None, max_features='auto', max_leaf_nodes=None,
                                 min_impurity_decrease=0.0, min_impurity_split=None,
                                 min_samples_leaf=1, min_samples_split=2,
                                 min_weight_fraction_leaf=0.0, n_estimators=100,
                                 n_jobs=None, oob_score=False, random_state=None,
                                 verbose=0, warm_start=False)

In [148]: testing=testdata.drop(['HEALTH-IMPACT','SO2-AVG','CO-AVG','DATE','CITY','STATE','PLACE','PM2.5-AVG','PM10-AVG','NO2-AVG','NH3-AVG','OZONE-AVG'],axis='columns')

In [152]: target_train_dataset=train_dataset['pollution_effect_category']
          target_train_dataset

Out[152] : 0 1
1 1
2 2
3 4
4 2
..
86 1
87 0
88 1
89 3
90 3
Name: po lluti on_ef fect_c atego ry, L ength: 91, dtype: int3 2

In [155]: train_dataset1=train_dataset.drop(['pollution_effect_category','REMARK'],axis='columns')
          train_dataset1

Out[155]:
     AIR_QUALITY_INDEX

0 190

1 188

2 280

3 302

4 285

... ...

86 123

87 43

88 111
89 89

90 88

91 rows × 1 columns

In [97 ]:

In [98]: testing

Out[98]:
     AIR_QUALITY_INDEX

0 110

1 117

2 73

3 65

4 68

5 61

6 55

7 43

8 58

9 40

10 69

11 57

12 59

13 56

14 48

15 40

16 38

17 63

18 37

19 71

In [ ]:

In [103]: testing

Out[103]:
     AIR_QUALITY_INDEX

0 110

1 117

2 73

3 65

4 68

5 61

6 55

7 43

8 58

9 40

10 69

11 57

12 59

13 56

14 48
METHODOLOGY II

We have seen results about air pollution by considering different attributes, such as AQI and COVID
status, both before and during COVID.

Now, in this methodology, we want to predict the air pollution and the deaths of people (after
COVID).

Tool Used: Tableau.

So we use Tableau to predict the pollution and death rate for the coming years, considering each
attribute in the city.csv file. Let us recap the data set.

This dataset contains data from the year 2015 to May 2020 (the present, at the time of writing).

Let’s move on…..

Fig 8.1.1- AQI vs Year


Description:

AQI – 2015: 386,337

AQI – 2016: 489,903

AQI – 2017: 564,131

AQI – 2018: 1,005,646

AQI – 2019: 1,050,165

AQI – 2020: 359,407

Fig 8.1.2: Predicting 2021, 2022, 2023 and 2024

AQI – 2021: 277,570

AQI – 2022: 267,210

AQI – 2023: 211,211

AQI – 2024: 234,345
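
As a rough illustration of how a year-level forecast like this can be reproduced outside Tableau, the sketch below fits a simple linear trend to the yearly AQI sums listed above and extrapolates it to 2021-2024. This is only an assumed stand-in for Tableau's built-in forecasting, so its numbers will not match the predicted values above exactly.

import numpy as np

# Yearly AQI sums taken from the description above (2015-2020)
years = np.array([2015, 2016, 2017, 2018, 2019, 2020])
aqi_sum = np.array([386337, 489903, 564131, 1005646, 1050165, 359407])

# Fit a degree-1 polynomial (linear trend) and extrapolate to 2021-2024
slope, intercept = np.polyfit(years, aqi_sum, 1)
for year in (2021, 2022, 2023, 2024):
    print(year, round(slope * year + intercept))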


Fig 8.2.1: Each chemical pollutant's effect on the environment and its predicted rate up to 2024

Fig 8.2.2: Each chemical pollutant's effect on the environment and its predicted rate up to 2024

Summary of the data:


SUM (Benzene)

Sum: 51,465

Average: 10,293

Minimum: 4,956

Maximum: 19,768

Median: 9,281

Standard deviation: 6,118

First quartile: 5,154

Third quartile: 12,306

Skewness: 0.70

Excess Kurtosis: -0.86

SUM (NH3)

Sum: 358,869

Average: 71,774

Minimum: 44,766

Maximum: 107,020

Median: 62,112

Standard deviation: 27,659

First quartile: 50,192

Third quartile: 94,778

Skewness: 0.33

Excess Kurtosis: -1.62

SUM (NO)

Sum: 362,816

Average: 72,563
Minimum: 38,347

Maximum: 111,688

Median: 58,267

Standard deviation: 33,752

First quartile: 48,913

Third quartile: 105,601

Skewness: 0.29

Excess Kurtosis: -1.73

SUM (Toluene)

Sum: 142,619

Average: 28,524

Minimum: 12,710

Maximum: 52,022

Median: 16,467

Standard deviation: 19,040

First quartile: 15,012

Third quartile: 46,409

Skewness: 0.43

Excess Kurtosis: -1.75

SUM(Xylene)

Sum: 68,693

Average: 6,869

Minimum: 720

Maximum: 10,626

Median: 8,219
Standard deviation: 3,375

First quartile: 5,046

Third quartile: 8,219

Skewness: -0.80

Excess Kurtosis: -0.78

SUM (Air Quality Index)

Sum: 8,421,167

Average: 842,116.70

Minimum: 386,337

Maximum: 1,050,165

Median: 984,997.00

Standard deviation: 254,114

First quartile: 669,347.50

Third quartile: 984,997.00

Skewness: -0.94

Excess Kurtosis: -0.95
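
A minimal pandas sketch of how summary statistics like the ones above can be computed; the idea of first summing each pollutant per year and then describing the yearly totals, as well as the file path and column names, are assumptions made for illustration, so the exact numbers may differ from the Tableau summary.

import pandas as pd

df = pd.read_csv('city.csv', parse_dates=['DATE'])        # assumed path and date column
yearly = df.groupby(df['DATE'].dt.year)[['Benzene', 'NH3', 'NO',
                                         'Toluene', 'Xylene']].sum()

for col in yearly.columns:
    s = yearly[col]
    print(col, 'sum:', round(s.sum()), 'mean:', round(s.mean()),
          'median:', round(s.median()), 'std:', round(s.std()),
          'skew:', round(s.skew(), 2), 'excess kurtosis:', round(s.kurt(), 2))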

Fig 8.3.1: Predicting the remark for industry and traffic air pollution

Mostly we got Satisfactory results, i.e. a pollution range above 50 but less than 100.

Fig 8.4: Predicting industry and air pollution, 2020-2024

We found that the results are mostly Satisfactory for the next four years.

Fig 8.5: Cities vs Remark

Similarly, we obtained a majority of Satisfactory results for the given cities for the next four years.

Fig 8.6.1 : Industry Pollution


Fig 8.7 : COVID vs Air pollution
Fig 8.8 Industry smoke prediction

Year = 2020

Lower prediction interval for Suspended Particulate Matter (SPM) = -100.197425345

Upper prediction interval for SPM = 161.558892074

Predicted SPM = 30.680733365

Year = 2034

Lower prediction interval for SPM = -186.356139481

Upper prediction interval for SPM = 247.717606211

Predicted SPM = 30.680733365


PREDICTION CONCLUSION

FINALLY, FOR THE NEXT FOUR YEARS, CONSIDERING ALL THE FACTORS, THE PREDICTION IS
"SATISFACTORY" (A POLLUTION RANGE OF 50-100).

EFFECT: Minor breathing discomfort to sensitive people.


SUMMARY
1. The major sources of air pollution are traffic and industry, whose main chemical contributions
are PM2.5 and PM10.
2. Pollution and its effects on living organisms are estimated from the Air Quality Index (AQI).
Central Government standards are followed for formulating the AQI.
3. The Tableau analysis tool is used to analyze this data.
4. Air quality is predicted from the chemical pollutants; the model is fitted to the training data
using a Random Forest Regressor and trained on the 2020 dataset.
5. After predicting the AQI, based on the COVID estimation, the records are clustered into five
categories: good, satisfactory, moderate, poor, and very poor.
6. Finally, classification techniques (Support Vector Machine and Random Forest Classifier) are
applied to the dataset to predict the type of disease.
7. For future prediction of air pollution, Tableau is used to forecast the data up to 2024,
including the occurrence of each chemical and the overall AQI.
8. Industry pollution is also forecast up to 2050.
9. Finally, we can analyze and predict that for the upcoming years the air pollution will be
"SATISFACTORY", with a pollution range of 50-100, caused mainly by industry, traffic, or both.
10. So the effect would be "minor breathing discomfort to sensitive people".
11. On average, based on the results we obtained, there are no major problems to be faced from air
pollution.
CONCLUSION

The data is taken from the Central Government of India. Ensemble regression techniques such as
Random Forest and bagging are used. The data is analyzed using the Tableau tool. The prediction
results are approximately correct. There is no plagiarism in the code or the analysis.
