
Air Quality Prediction

Guide: Ms. Nitha K P

Group Members:
Sruthi K S
Teslin Rose P V
Vini Sasidharan
Vishnu V U
Outline
➟ Introduction
➟ Objectives
➟ Methodology
➟ Modules
➟ Architecture
➟ Status
Problem statement
➟ Air pollution is one of the great killers of our time. Polluted air was responsible for 6.4 million deaths worldwide: 2.8 million from household air pollution and 4.2 million from ambient air pollution.
➟ Data from that year shows that air pollution worldwide caused:
• 19% of all cardiovascular deaths
• 24% of ischemic heart disease deaths
• 21% of stroke deaths
• 23% of lung cancer deaths
Benefits of air pollution information and prediction
➟ If people are aware of variations in the quality of the air they breathe, there is a greater likelihood of motivating changes in both individual behavior and public policy.
➟ Such awareness has the potential to create a cleaner environment and a healthier population.
➟ Governments also make use of early prediction to establish procedures to reduce the severity of local pollution levels.
Accuracy in air quality prediction
➟ When predicting air quality, there are many variables to take into account, some of which are quite unpredictable.
➟ Air pollution levels are strongly correlated with local weather conditions and nearby pollution emissions.
➟ However, long-range transport of pollution through strong winds is also a significant influencing factor and must be taken into consideration when forecasting local AQI readings.
Accuracy in air quality prediction
➟ Predicting air quality, therefore, not only involves the difficulties of weather forecasting; it also requires data on and knowledge of:
○ Local pollutant concentration
○ Emissions from distant locations
○ Possible transformations of pollutants
○ Prevailing winds
➟ The many factors at play make air pollution forecasting both subjective and objective.
Air quality prediction techniques
➟ There are many predictive models for air quality, and all require more complexity than weather forecast models.
➟ The first step to an accurate air quality forecast is an excellent weather forecast.
➟ These models are mathematical simulations of how airborne pollutants disperse in the air.
Data Pre-processing
➟ Data Cleaning
○ Missing Data
○ Noisy Data
➟ Data Transformation
○ Normalization
○ Discretization
Data Cleaning
➟ The data can have many irrelevant and missing parts. To handle this, data cleaning is done. It involves handling missing data and noisy data.
➟ Missing data arises when some values are absent from the dataset. It can be handled in various ways:
○ Ignore the tuples
○ Fill in the missing values
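Both strategies are one-liners in pandas, for example. This is a minimal sketch on made-up pollutant readings, not the project's actual dataset:

```python
import pandas as pd
import numpy as np

# Toy air-quality readings with gaps (illustrative values only)
df = pd.DataFrame({
    "PM2.5": [35.0, np.nan, 48.0, 52.0],
    "NO2":   [18.0, 22.0, np.nan, 25.0],
})

# Option 1: ignore the tuples (drop any row with a missing value)
dropped = df.dropna()

# Option 2: fill the missing values (here, with each column's mean)
filled = df.fillna(df.mean())
```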
Data Cleaning
➟ Multivariate Imputation by Chained Equations (MICE)
➟ Multivariate imputation by chained equations (MICE) has emerged as a principled method of dealing with missing data, with properties that make it particularly useful for large imputation procedures.
➟ MICE operates under the assumption that, given the variables used in the imputation procedure, the missing data are Missing At Random, which means that the probability that a value is missing depends only on observed values.
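The chained-equations idea can be sketched with scikit-learn's `IterativeImputer`, which is modeled on MICE; the project itself more likely used the R `mice` package, so treat this as an illustrative stand-in on synthetic data:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Each feature with missing values is regressed on the other features,
# cycling round-robin until the imputations stabilise (MICE-style).
X = np.array([
    [25.0, 60.0, np.nan],
    [27.0, np.nan, 41.0],
    [np.nan, 55.0, 38.0],
    [26.0, 58.0, 40.0],
])

imputer = IterativeImputer(max_iter=10, random_state=0)
X_complete = imputer.fit_transform(X)
```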
Data Transformation
➟ This step is taken in order to transform the data into forms suitable for the mining process.
➟ Normalization is done in order to scale the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).
○ Z Normalization (Standardization)
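Z normalization rescales each feature to zero mean and unit variance; a minimal sketch with hypothetical temperature readings:

```python
import numpy as np

def z_normalize(x):
    """Z normalization (standardization): subtract the mean, divide by the std."""
    return (x - x.mean()) / x.std()

temps = np.array([24.0, 26.0, 30.0, 28.0])  # hypothetical values
z = z_normalize(temps)
```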
Feature Selection
➟ Temperature
➟ Humidity
➟ Wind
➟ Pressure
➟ Rainfall
➟ SO2
➟ NO2
➟ O3
➟ PM10
➟ PM2.5
Predictive Models
➟ Random Forest
➟ Deep Learning (Multilayer Perceptron)
➟ Extreme Gradient Boost (XGBoost)
➟ CatBoost
The proposed best air pollution prediction model.
Random Forest
➟ Random forests, or random decision forests, are an ensemble learning method for classification and regression.
➟ Random forest is a bagging technique. The trees in a random forest are built in parallel, with no interaction between the trees during construction. It operates by constructing a multitude of decision trees at training time and outputting the mean prediction (regression) of the individual trees.
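A minimal scikit-learn sketch of the idea, on synthetic data standing in for the weather/pollutant features (the slides do not show the project's actual code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: 5 features, AQI-like numeric target
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Bagging: each tree sees a bootstrap sample; predictions are averaged
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

pred = model.predict(X[:5])
importances = model.feature_importances_  # basis for a feature-importance plot
```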
Error and Feature Importance
Deep Learning
➟ AQI is set as the dependent variable, and the rest of the features are set as the independent variables.
➟ The data is converted into matrix form and split into two independent samples, training and testing, 70% and 30% respectively.
➟ The test and train samples are then normalised using Z Normalization.
➟ The tuned model has two hidden layers, with ten and five neurons. The activation function used is the Rectified Linear Unit (ReLU).
Deep Learning

➟ The input layer has one neuron for each of the independent variables, and the output layer has one neuron for the response.
➟ As the task is regression and the output variable is numeric, MSE is used as the loss function, RMSprop as the optimizer, and MAE as the metric to compile the model.
➟ To fit the model, the training data is run for 100 epochs; the testing data is then used to evaluate the model and to generate the predictions.
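The pipeline above (70/30 split, Z normalization, a 10-and-5-neuron ReLU network) can be sketched with scikit-learn's `MLPRegressor` on synthetic data; note this stand-in does not expose an RMSprop optimizer, so its default Adam solver is used instead:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))          # stand-in independent variables
y = X @ np.array([2.0, -1.0, 0.5, 1.5, -0.5]) + rng.normal(scale=0.1, size=300)

# 70/30 split, then Z normalization fitted on the training sample only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Two hidden layers with ten and five neurons, ReLU activation
mlp = MLPRegressor(hidden_layer_sizes=(10, 5), activation="relu",
                   max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
preds = mlp.predict(X_test)
```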
Neural Network Model
Prediction and Loss
eXtreme Gradient Boosting
➔ XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
➔ It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting that solves many data science problems in a fast and accurate way.
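The gradient boosting framework itself can be sketched as follows; scikit-learn's `GradientBoostingRegressor` is used here as a stand-in so the example runs without the `xgboost` package (XGBoost's `XGBRegressor` exposes a very similar fit/predict interface):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for xgboost.XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Boosting fits each new tree to the residual errors of the ensemble so far
gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
gbm.fit(X, y)
preds = gbm.predict(X)
```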
How it works
Feature Importance
CatBoost
➔ CatBoost is a gradient boosting machine learning algorithm. It reduces the need for extensive hyper-parameter tuning and lowers the chances of overfitting, which leads to more generalized models. CatBoost still has multiple parameters to tune, including the number of trees, learning rate, regularization, tree depth, fold size, bagging and others.

➔ It is especially powerful in two ways:
◆ It yields state-of-the-art results without the extensive data training typically required by other machine learning methods.
◆ It provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems.
Error and Feature Importance
RMSE of each Model
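The per-model comparison behind this slide rests on a single formula; a minimal sketch with made-up AQI values (not the project's results):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical actual vs. predicted AQI, to show the mechanics
actual = [50.0, 80.0, 120.0]
predicted = [48.0, 83.0, 118.0]
error = rmse(actual, predicted)
```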
Interface Using Shiny R web app
➔ Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards.

➔ You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.
User Interface
Status
Completed:
➟ Data Pre-processing
➟ Random Forest
➟ Deep Learning (MLP)

In progress:
➟ XGBoost (90%)
➟ CatBoost (80%)
➟ User Interface (60%)
Thank you!
Any questions?
