
Abstract

Today, environmental sustainability is a crucial component of daily life, and one of the biggest threats to environmental sustainability is air pollution. Delhi, India's capital, has had a poor air quality index for a long time, and the poor air quality has been detrimental to residents' quality of life. As is often said, prevention is preferable to treatment, so it would be beneficial to anticipate future conditions in order to prepare for and handle them. This thesis applies several time series forecasting techniques to forecast Delhi's Air Quality Index for the upcoming time periods. The levels of the pollutants Particulate Matter (PM2.5, PM10), Sulphur Dioxide (SO2), Carbon Monoxide (CO), and Nitrogen Dioxide (NO2), among others, have been predicted for a single selected location in Delhi, and the errors for the various techniques have been computed. In addition to citing secondary sources that provide insight into the fundamental problems of air pollution, this paper's findings may be used for a variety of related future studies on the smog in Delhi. In the proposed design, two models were integrated: all data was passed to a gated recurrent unit in the first layer, and two dense layers came after the Long Short-Term Memory layer. The proposed model is evaluated against standalone Gated Recurrent Unit, Long Short-Term Memory, decision-tree, and linear regression models. Mean Absolute Error, Mean Square Error, and Root Mean Square Error are the performance metrics whose error rates are calculated. When the two models are combined, performance on some metrics improves.

1 Introduction

A statistic for evaluating the quality of the air in our immediate surroundings is the Air Quality Index
(AQI). It gauges the potential health effects of air pollution over a given time frame. Particulate
Matter (PM2.5, PM10), Nitrogen Dioxide (NO2), Carbon Monoxide (CO), Sulfur Dioxide (SO2), and
Ozone (O3) are the most prevalent air pollutants. The top three global risks are high cholesterol,
tobacco use, and dietary hazards, and there are 87 health risk factors when compared to the
number of fatalities in the year 2019. Polluted air is one of these risk factors. But in India, air
pollution now ranks first among risk factors for premature death, surpassing all others. More than
90% of people reside in areas where the World Health Organization's (WHO) recommendation for
healthy air is exceeded. The WHO's strictest standard for air quality is not met where more than half
of the population resides. In 2019, India continued to rank among the top ten nations with the greatest exposure to ozone (O3). India's ozone (O3) concentrations have increased by the most
(17%) in the past ten years. Nine out of ten people, according to recent WHO data, breathe air that is
significantly contaminated. According to the World Health Organization, breathing polluted air
causes nearly 7 million deaths worldwide annually. As a result, it's crucial to keep an eye on the city's
air quality and alert the populace as soon as possible.

The majority of Indian cities have seen a sharp decline in air quality in recent years. Along with traditional pollutants like carbon dioxide (CO2), many more recent pollutants like lead, carbon monoxide, sulphur dioxide, ozone, and particulate matter (PM10 and PM2.5) have been added to the atmosphere. The majority of air pollutants harm us. The
most dangerous gas, however, is CO. It is also referred to as the Silent Killer because it kills quickly
and quietly. It directly enters blood cells and replaces the oxygen there, depriving the brain and heart of the oxygen they need to function. If it is in the air, it quickly invades blood cells and causes symptoms like headache, nausea, confusion, and dizziness. People experience unconsciousness
and vomiting as the pollutant level rises; if exposure is prolonged, however, it can also kill or damage
brain cells. All that is around us is what we call the environment. Human activity and natural
disasters are causing the environment to become more polluted; pollution is one of these dangers.
The amount of air pollutants in the surrounding atmosphere is determined by meteorological factors
like atmospheric wind speed, wind direction, humidity ratio, and temperature. Because sweat won't evaporate into the air when the humidity is higher, we experience much greater heat.

Raw data can be difficult to understand when evaluating air pollution, which is why air quality indices are created. The US Environmental Protection Agency (US EPA) developed the air quality
index (AQI) for reporting daily air quality. The AQI primarily focuses on the health effects of air pollution. Governmental organizations should adhere to the EPA's guidelines for AQI
computations and provide the public with accurate and reliable AQIs to avoid underestimating air
pollution. During the research period, air quality index (AQI) readings were computed using the
EPA's methodology. An air quality index is a grading system that shows the risks related to each level
of air pollution as well as how polluted the atmosphere is. Citizens of all ages can use an air quality
index (AQI) to convert numerical data into a qualitative grading system that will help them better
understand how many contaminants are present in the air they breathe.

Following the EPA's method, the sub-index for each pollutant is first computed from its averaged concentration, and the pollutant with the highest sub-index across all monitors determines the overall AQI. Two breakpoints are located around the observed concentration Cp, where I_Hi and I_Lo stand for the AQI values at the breakpoints and BP_Hi and BP_Lo for the higher and lower breakpoint concentrations, respectively:

    I_p = (I_Hi - I_Lo) / (BP_Hi - BP_Lo) * (C_p - BP_Lo) + I_Lo

1.1 Background and motivation

Environmental squalor, pollution, and contamination brought on by solid waste management are global problems that need immediate attention and resolution, both to address the current pollution wave and to ensure economic sustainability. Since there are uncoordinated daily activities capable of prompting, encouraging, and
stimulating unsustainable waste management, much attention should be paid to emerging and
developing countries on these issues. Whether in urban or rural settings, the development of
facilities for waste management depends on the environment at hand as well as the type of waste
present there. Unfavorable operational precincts, political, technical, and economic laws prevailed in
various operation systems, regardless of the arrangement mentioned earlier. Because of human
activities that cause significant pollution, which affects both the terrestrial and aquatic domains,
developing and emerging countries are subject to ongoing, unchecked waste disposal.
1.2 Research Question

This study is concerned with the research questions:

"How successfully can Deep Neural Networks and machine learning control air pollu-tion and
provide good health to living beings by forecasting air quality data?"
Table 1 shows the research question's objectives. The first goal is to determine which model is best for prediction based on data on India's air pollution. The second goal is to compare several models in order to determine the best forecasting model.

The results will be analysed and the underlying causes of this state can be studied in detail. Following this, a roadmap can be created for dealing with this situation in a pragmatic manner. To find out the underlying causes, data will be collected and studied from reliable sources for meaningful deductions.

This research thesis has been divided into segments as per the following sequence.

2 Literature Review

2.1 Introduction
This section includes a brief explanation of the methods used to predict air pollution data using different models, and of the research that has been conducted using these data.

The section is divided into three parts; the first part discusses the various deep learning techniques
that have been used to forecast the air quality index, and the second part is devoted to machine
learning research. Related work that was finished in India is included in the final section.

2.2 Deep Learning techniques

Pollution prevention and control benefit from air quality forecasting. This study focuses on a multi-
time forecasting model that was used to analyze data from Beijing's air quality. The data from Beijing
take into account a variety of pollutants, meteorological data, and spatiotemporal data. Deep
learning models like LSTM, CNN, and BPNN were compared by the researchers; LSTM performed better, making it suitable for data forecasting. Overall, Yan and Li (2021) find that clustering-based forecasting that is either spatial or seasonal is more suitable for enhancing forecasting in a specific cluster or season.

India has a central and state control board that uses monitoring programs to collect data on the air
quality in 240 cities. The board used 342 monitoring stations, and daily and hourly data were recorded for analysis. Although statistics for the entire nation were provided by the central and state control boards, the researcher only looked at Chennai, applying preprocessing methods to the Chennai data and removing any missing variables. The data were classified by AQI using LSTM and SVR models. The deep learning process helps with long-term sustainability planning in metropolitan areas by accurately forecasting AQI values, for example by introducing synchronized traffic signals, encouraging the use of public transportation, and increasing the number of trees in designated areas.
2.3 Machine Learning techniques

In this study, PM10 and PM2.5 pollution levels are predicted using machine learning methods like SVM, BTR, and ANN. Temperature and polluting-gas data were gathered from the AQM site, and the data were pre-processed for feature selection using principal component analysis. A range of metrics, including RMSE, was used to evaluate the project, and the results from the training and testing datasets were compared to identify the best outcome.
3 Methodology

In order to identify patterns, particularly from vast amounts of data, data mining is extremely
important. Data mining typically provides correlations for data that was overlooked or was not
recognized, in addition to insights. CRISP-DM has been used frequently, particularly in the creation
and improvement of materials. The use of CRISP-DM has a significant impact in many fields. It is
divided into six phases: business understanding, data or information understanding, data
preparation, modeling, data evaluation, and data deployment. There are specific processes and
algorithms that are typically incorporated into the system at each phase. The CRISP-DM technique has been used in a variety of fields, including engineering and the medical field. Considering the complexity of the underlying systems, it can manage enormous amounts of data, and it enables organizations to make crucial operational decisions.

CRISP-DM Model

3.1 Business Understanding

Before starting any analysis, the data scientist must have a firm understanding of the industry. Due
to industrial pollutants that contaminate the atmosphere, air pollution is the biggest concern in
today's society. Pollution-related health issues lead to environmental challenges. For future
applications, such as limiting air pollution and averting health issues, it is advantageous to analyze
the significance of air pollution prediction data.
3.2 Data Understanding

Finding out what is present in the data that the client has already obtained is the first step in understanding data. The primary goal at this stage is understanding the data provided by the customers, consisting of, for example, geographic location and prior purchases, which also help to ascertain the customer's interests that may be used later as the business develops.

3.3 Data Preparation

In the process of preparing data, raw information from the business understanding is cleaned as well
as prepared. Data mining algorithms are applied to the sampled and prepared data, and the
outcome of the developed and implemented solutions depends on the data quality.
3.4 Modelling

The fourth stage involves data modeling, where a variety of modeling techniques are selected and used to execute data features, model the data, activate the model, and calibrate its parameters to optimal values. The stage mainly entails applying a suitable data mining or machine learning algorithm to the given dataset, using either a single approach or a composite of several techniques to evaluate the model and solve the current problem.

Time series forecasting is a statistical technique that examines the distribution of past data points and uses time series data to predict future values. Time series
data are data collected on a specific feature over a period of time at regular intervals. Error
measures include mean absolute error (MAE), root mean square error (RMSE), and mean square
error (MSE). For better results, we employ a hybrid model that combines recurrent neural networks
(RNN) and their specialized types of Gated Recurrent Units and Long-Short Term Memory (LSTM).
3.4.1 Long short-term memory

The three types of gates that are available in the LSTM model are input gates, forget gates, and
output gates. The memory cell receives the output of the input modulation gate and is responsible
for collecting all additional data from the outside world. At each step, the forget gate decides which data should be retained and which should be deleted; in this manner, the ideal delays for the given data series are chosen. The output gate receives the calculated outcomes as input and creates the signal that leaves the Long Short-Term Memory cell. In most language models, the output layer of the LSTM is stacked on top of a softmax layer; our approach, by contrast, adds a dense layer on top of the output layer of the LSTM cell.

The most significant element in an LSTM is the cell state, represented by the horizontal line at the top of the figure. The cell state behaves like a conveyor belt: it runs along the chain of LSTM cells with only some linear transformations, which retains the previous information. An activation function layer (typically sigmoid) and a point-wise multiplication operation are commonly used to produce the three gates that allow the LSTM to add or remove data from the cell's state.
3.4.2 Gated recurrent units

Sequence-to-sequence models are not restricted to LSTM units, even though the term LSTM autoencoder is widely used in the machine-learning community; recurrent units come in a range of sizes and shapes. The most widely used variant today is the gated recurrent unit (GRU). The unit is based on the LSTM unit but combines the input and forget gates into a single update gate z_t.
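The role of the update gate z_t can be sketched with a toy, one-dimensional GRU step, analogous to the LSTM sketch above. The weights are made up for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, w):
    """One step of a 1-dimensional GRU cell with toy weights."""
    z = sigmoid(w["z"][0] * x + w["z"][1] * h_prev + w["z"][2])  # update gate z_t
    r = sigmoid(w["r"][0] * x + w["r"][1] * h_prev + w["r"][2])  # reset gate r_t
    h_cand = math.tanh(w["h"][0] * x + w["h"][1] * (r * h_prev) + w["h"][2])
    # z_t blends the old state with the candidate state in one operation,
    # replacing the separate input and forget gates of the LSTM.
    return (1.0 - z) * h_prev + z * h_cand

w = {"z": (0.5, 0.1, 0.0), "r": (0.6, 0.2, 0.0), "h": (0.9, 0.3, 0.0)}
h = 0.0
for x in [0.2, 0.5, 0.8]:   # e.g. three scaled AQI readings
    h = gru_step(x, h, w)
print(h)
```

With no separate cell state and one fewer gate, the GRU has fewer parameters than the LSTM, which is one reason it often trains faster.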
3.4.3 Linear Regression

The most basic and frequently used type of data analytics is linear regression. Regression's goal is to
examine two things: (1) Is it possible to predict an outcome (dependent) variable using a group of
predictor variables? (2) What traits in particular are significantly predictive of the dependent
variables, and what influence do they have on the outcome variable, as indicated by the size and sign of the beta estimates?

    Y = β0 + β1·X + ε
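For simple linear regression, β0 and β1 have a closed-form least-squares solution, which can be sketched without any library. The data below is a made-up, noise-free example so the fit recovers the line exactly.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for Y = b0 + b1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by variance of X.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x   # intercept from the means
    return b0, b1

# Toy data lying exactly on y = 2 + 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)  # -> 2.0 3.0
```

On the real dataset, sklearn's LinearRegression performs the multivariate version of this computation.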

3.4.4 Decision Tree

The decision tree is a type of classification algorithm. Both categorical and continuous inputs and outputs are supported. Using a Decision Tree and the best decision across the response variable, we split the
data set into two or more homogeneous sets. Although it has a number of intriguing features
that address problems like missing values, outliers, and identifying the most important
dimensions, it does not perform as well with continuous target variables as it does with
categorical ones. Since it doesn't require a constant data type and is a non-parametric method, it
is advantageous for data exploration.

A decision tree is a tool used to project decisions in the shape of a tree. The tool is incredibly helpful when debugging algorithms with conditional control statements. The main and most popular algorithm for making decision trees is ID3, which employs information gain and entropy to construct the tree.
    E(S) = − Σ_{i=1}^{c} p_i log2(p_i)

    Gain(T, X) = Entropy(T) − Entropy(T, X)
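The entropy and information-gain formulas above can be computed directly. The labels below are made-up AQI classes; a perfect split gives the maximum gain of 1 bit.

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    """Gain(T, X) = Entropy(T) - weighted entropy after splitting on X."""
    n = len(labels)
    split_entropy = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - split_entropy

labels = ["high", "high", "low", "low"]       # toy AQI classes
split = [["high", "high"], ["low", "low"]]    # a perfect split on some attribute
print(entropy(labels))            # -> 1.0 (two equally likely classes)
print(info_gain(labels, split))   # -> 1.0 (the split removes all uncertainty)
```

ID3 evaluates this gain for every candidate attribute and splits on the one with the highest value, recursing until the leaves are pure.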

3.5 Evaluation

Data evaluation is the second-to-last stage and serves as a monitoring stage for the modelling stage, since the model building needs to be carefully and thoroughly reviewed beforehand. Once the model building and data evaluation are finished and everyone is satisfied with the entire process, the deployment stage, the last stage, is then engaged.

R-squared (R²) measures the proportion of the outcome's variance that the predictor variables can adequately explain.

The Root Mean Square Error (RMSE) measures the average error a model makes when making predictions based on observations. Mathematically, it is the square root of the Mean Square Error (MSE), which is the average squared difference between observed actual output values and the values predicted by the model.

The Mean Absolute Error (MAE) measures the average absolute difference between actual and predicted values.
    MAE = (1/n) Σ_{i=1}^{n} |Y_ACT(i) − Y_PRED(i)|

    MSE = (1/n) Σ_{i=1}^{n} (Y_ACT(i) − Y_PRED(i))²

    RMSE = √[ (1/n) Σ_{i=1}^{n} (Y_ACT(i) − Y_PRED(i))² ]
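The three error measures translate directly into code. The readings below are made-up AQI values, used only to show the computation.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Square Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

y_act = [120.0, 150.0, 90.0]    # toy actual AQI readings
y_pred = [110.0, 160.0, 95.0]   # toy model predictions
print(mse(y_act, y_pred))       # (100 + 100 + 25) / 3 = 75.0
print(rmse(y_act, y_pred))
```

Because MSE squares each residual, RMSE penalizes large forecasting misses more heavily than MAE does, which is why both are reported in the comparison tables later.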

4 Design Specification

Several parts make up the entire project. We first choose data to comprehend business and
environmental needs. We then choose data on air quality, which is crucial for both the environment
and human life. For business and pollution control, air quality forecasting is crucial. The second stage is data pre-processing: because the data we have is raw, the results of a model that uses it directly may be unreliable, so pre-processing is a crucial step in the data forecasting process. A transformation step is necessary when the values are too large for the model to comprehend. After data preparation we must employ a variety of models. The next step is evaluation, which is crucial because, regardless of the models we choose, we must compare the outcomes on the basis of errors in order to show which model is superior.
5 Implementation

5.1 Environmental setup

The project was built using Python programming. The main libraries contained the neural network components TensorFlow, Keras, and the Dense and Sequential classes, and Google Colab was used to implement the LSTM and GRU models.

5.2 Selection of Data

The data for our study can be found on the Central Pollution Control Board website, https://cpcb.nic.in/, which is the portal of the Indian government. Particulate matter (PM2.5 and PM10),
nitrogen monoxide (NO), nitrogen dioxide (NO2), nitrogen oxides (NOx), ammonia (NH3), carbon
monoxide (CO), sulphur dioxide (SO2), and ozone (O3) were among the various parameters included
in the data set.
5.3 Data Pre-processing

In order to handle various functions, I set up all required libraries and packages on Google Colab. All of the data was obtained from the Central Pollution Control Board, which has 342 monitoring stations. Because the data set was so big, I restricted myself to only using data from the Delhi region.

Plotting each pollutant year by year makes it possible to analyze the data and see how each pollutant evolves over time. Except for CO, almost all pollutants show a seasonal graph. The most important element in this scenario is the air quality index (AQI), which varies seasonally.
5.3.1 Data Cleaning

An essential step in the data preparation process is data cleaning. The results of the model could
suffer if the raw data is not cleaned, and the model might not be appropriate for the dataset. Since
manually cleaning data is insufficient, I use the following procedures:

Check for missing data and null values, and fill all null values using the median function for better results.
Check for duplicate values and eliminate them.
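The two cleaning steps above can be sketched with the standard library; on the real dataframe, pandas' drop_duplicates() and fillna() perform the same operations. The rows below are made-up readings for illustration.

```python
from statistics import median

def clean_rows(rows):
    """Drop duplicate (timestamp, value) rows, then fill missing values (None)
    with the median of the observed values."""
    seen, deduped = set(), []
    for row in rows:              # keep the first occurrence of each row
        if row not in seen:
            seen.add(row)
            deduped.append(row)
    fill = median(v for _, v in deduped if v is not None)
    return [(t, fill if v is None else v) for t, v in deduped]

rows = [("2020-01-01", 120.0), ("2020-01-02", None),
        ("2020-01-02", None), ("2020-01-03", 90.0)]
print(clean_rows(rows))
# -> [('2020-01-01', 120.0), ('2020-01-02', 105.0), ('2020-01-03', 90.0)]
```

The median is preferred over the mean here because AQI data contains extreme pollution spikes that would drag the mean upward.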
5.3.2 Data Transformation

We must check the datatype of all variables since some datatypes are incompatible with the operation. In our scenario, Date is in object format, but when we plot graphs we want the date to be in datetime format, so we change the date variable's format to Datetime. When applying a model to data, we must ensure that the array dimension is compatible with our model implementation. For a good outcome, we transform the data using a Min-Max scaler. The data was divided into two variables, X and Y, using the drop function: X serves as the input variable while Y serves as the output variable. Since there are nine independent variables and one dependent variable, we use the drop function to place all nine parameters into the X variable and only the AQI into the Y variable.

Data must be split into train and test datasets so that the various models can be applied. I chose 20% of the data for the test set and 80% for the train set. Prediction techniques first use a dataset to train the model, and then require a further dataset to test the predictions. We can tell whether a model performed well or poorly after testing it by calculating its accuracy or error rate.
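The scaling and splitting steps above can be sketched without libraries; sklearn's MinMaxScaler and train_test_split do the equivalent on the real dataframe. The series below is made up, and the split is chronological (the test set is the tail), which is the usual assumption for time series.

```python
def min_max_scale(values):
    """Rescale values to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(values, test_ratio=0.2):
    """Chronological split: the last test_ratio share becomes the test set."""
    cut = int(len(values) * (1 - test_ratio))
    return values[:cut], values[cut:]

aqi = [90.0, 120.0, 150.0, 180.0, 210.0]   # toy AQI series
scaled = min_max_scale(aqi)
train, test = train_test_split(scaled)
print(scaled)                  # -> [0.0, 0.25, 0.5, 0.75, 1.0]
print(len(train), len(test))   # -> 4 1
```

A random shuffle before splitting, as sklearn does by default, would leak future observations into the training set, so for forecasting the chronological split is the safer choice.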
5.4 Model Implementation and Evaluation

5.4.1 Implementation using LSTM

An LSTM cell consists of three parts: the Input gate, the Forget gate, and the Output gate. The first section decides whether the content of the earlier timestamp should be kept or ignored. In the second section, the cell makes an effort to gain new knowledge from the input. The cell then sends updated data about the current timestamp to the subsequent timestamp in the third section.

Data scaling must be carefully considered before LSTM modeling. The neural network converges
more quickly and performs more accurately when it is exposed to the same scaled features. In order
to keep attribute variance within the same range, we divided the data into a train and test dataset
and used the Min-Max scaler method. The performance of the forecasting model can also be affected if
one feature has more variance than others. The dataset is multivariate, and the date is an index
variable, so we used AQI to forecast the data. In order to maintain records over a pertinent time
period and assess the model's accuracy, the air quality attributes were scaled using Python's
MinMaxScaler() method.
Table 3: Performance Comparison

Model   R-square   RMSE    MAE
LSTM    0.69       65.23   52.54

The model was built with an input layer, and then the next LSTM layer received sequence data
rather than random data. To avoid overfitting the model, a dropout layer is used. Finally, we output
one hot encoded result using a dense layer. Set up the model and begin training with a checkpoint
and early stopping. When the monitored loss exceeds the tolerance, early stopping ends training,
and checkpoint stores the model weight when it reaches the minimal loss.
5.4.2 Implementation using GRU

The GRU model uses the same dataset as the LSTM model, and the MinMax function is used to scale the data as before. We start by passing data in sequential order, and in the GRU model we pass 64 units.

We utilized the dropout function to avoid overfitting the data. We employed a dense layer with one unit to get a single variable output. Finally, we compile the model with the optimizer "adam" and use Mean Square Error to check for loss.

Table 4: Performance Comparison

Model   R-square   RMSE    MAE
GRU     0.84       45.32   37.56

In one epoch, the full dataset is fed forward and backpropagated through the model. It's tricky to figure out how many epochs to use so that the model fits the data well without becoming overfitted. A model is considered underfitted if it fails to capture the correlation between the independent (x) and dependent (y) variables; when a model fits the training data so closely that it fails to generalize to the test data, it is overfitted.

5.4.3 Implementation using LR

For visualization, plot an actual-versus-predicted graph for better understanding.

Table 5: Performance Comparison

Model   R-square   RMSE    MAE
LR      0.85       45.23   34.81

For evaluation, I utilized the LinearRegression function in Python to fit the model with the train dataset and forecast with the test dataset, along with the sklearn metrics library. As metrics I used the mean absolute error and the R-square function, passing the test and predicted values.
5.4.4 Implementation using DT

I implemented the decision tree using the DecisionTreeRegressor function, with the random state parameter set to zero. The train data was fitted using the fit function, and the test data was used to forecast. The same evaluation process as for linear regression is used; the root mean square error was calculated using the sqrt function from the math library.
Table 6: Performance Comparison

Model   R-square   RMSE    MAE
DT      0.84       47.26   33.22

5.4.5 Implementation using GRU-LSTM

In this study we created a two-stage framework for forecasting data on air quality. In the first phase, the input data is pre-processed: we check for missing values and fill them in with the median function, and we applied the MinMax scaler algorithm to normalize the input dataset. Then we created a hybrid model that combined the GRU and LSTM deep learning models.
Actual vs Prediction GRU-LSTM

Train vs Validation Loss GRU-LSTM

We have historical data to prepare the model, and the model can be enhanced with additional data. The GRU layer in our hybrid model receives the input data and sends it to the LSTM layer, which employs a dropout function to prevent overfitting. Two dense layers were used for best performance, with the final dense layer supplying a single output value. The fit function was then used to fit the model using the train dataset. 50 epochs were run for evaluation, with early stopping used to prevent additional epochs.

We plot the train and validation loss graphs; because dropout was used in our model, the train loss is greater than the validation loss: some features are set to zero during training, while in the test set every neuron is in use, so the validation process remains robust. A prediction model could be used to analyze the entire scenario.
Table 7: Performance Comparison

Model            R-square   RMSE    MAE
Proposed model   0.85       44.50   33.12

6 Model performance Comparison

The calculated error against the actual values above is approximately correct. As a result, the forecast values can be used to inform future Air Quality Index (AQI) forecasts. In order to reduce errors and increase forecast accuracy, the data can also be smoothed using double and triple exponential methods. After a model is implemented, evaluation is crucial so that we can choose the best model. Three measures have been used in the analysis, with the calculations finished by comparing the actual values to the predicted results. In this investigation, the model performance was assessed using the Mean Absolute Error (MAE), R-Square, and Root Mean Square Error (RMSE). There have been numerous discussions, as well as many disagreements, regarding which technique or strategy should be used to assess model performance.
Table 8: Performance Comparison

Model            R-square   RMSE    MAE
Proposed model   0.82       42.47   37.33
LSTM             0.64       65.24   51.78
GRU              0.81       44.49   39.55
LR               0.81       45.13   33.82
DT               0.88       48.26   33.10


It can be seen that the proposed model's RMSE (42.47) is lower than that of the other models, implying that the proposed model is good for projecting air quality data. Its R-square (0.82) is also much higher than the LSTM's (0.64). However, the decision tree achieves a higher R-square (0.88) and a lower MAE (33.10), so on some measures the proposed model's outcomes are lower than those of other models.

7 Conclusion and Future Work

The performance of the model is increased when deep learning and machine learning models are combined. The data in this model is static, although the data is updated hourly by the government; real-time data analysis in the cloud could result in better performance. For further processing, the predicted AQI values can be categorized in accordance with AQI and health standards, which can definitively indicate whether the air quality is hazardous or not. Both future research and the steps that will be taken can benefit from this prediction.

People who lived close to these dumpsites suffered long-term effects as a result of the ambient air
pollution, which included highly toxic substances. For the remediation of contaminated sites, it is
urgent to address the suspended particulate matter (SPM), hydrogen sulphide (H2S), and oxides of
nitrogen (NOx), and new technology for waste management needs to be adopted.

To predict AQI in highly polluted cities, a hybrid model that combines GRU and LSTM was proposed
in this study. If we can accurately predict the AQI, we can reduce pollution. DT, LR, LSTM, and GRU
models were among those that were compared using the MAE and RMSE parameters. Readings
show that the suggested hybrid model produces fewer errors than the standalone models, proving
its superiority.

The suggested method can be used in the future to forecast data from other cities. Prediction can
also be used to identify the polluted area and its origin. Some pollutants pose a serious threat to
human health in the future.
