
Air Quality Prediction

Guide: Ms. Nitha K P

Group Members:
Sruthi K S
Teslin Rose P V
Vini Sasidharan
Vishnu V U
Outline
➟ Introduction
➟ Objectives
➟ Methodology
➟ Modules
➟ Architecture
➟ Status
Problem statement
➟ Air pollution is one of the great killers of our time. Polluted air was responsible for 6.4 million deaths worldwide: 2.8 million from household air pollution and 4.2 million from ambient air pollution.
➟ Data from that year shows that air pollution worldwide caused:
• 19% of all cardiovascular deaths
• 24% of ischemic heart disease deaths
• 21% of stroke deaths
• 23% of lung cancer deaths
Benefits of air pollution information and prediction
➟ If people are aware of variations in the quality of the air they breathe, there is a greater likelihood of motivating changes in both individual behavior and public policy.
➟ Such awareness has the potential to create a cleaner environment and a healthier population.
➟ Governments also make use of early prediction to establish procedures to reduce the severity of local pollution levels.
Accuracy in air quality prediction
➟ When predicting air quality, there are many variables to take into account, some of which are quite unpredictable.
➟ Air pollution levels are strongly correlated with local weather conditions and nearby pollution emissions.
➟ However, long-range transport of pollution through strong winds is also a significant influencing factor and must be taken into consideration when forecasting local AQI readings.
Accuracy in air quality prediction
➟ Predicting air quality, therefore, not only involves the difficulties of weather forecasting; it also requires data on and knowledge of:
○ Local pollutant concentration
○ Emissions from distant locations
○ Possible transformations of pollutants
○ Prevailing winds
➟ The many factors at play make air pollution forecasting both subjective and objective.
Air quality prediction techniques
➟ There are many predictive models for air quality, and all require more complexity than weather forecast models.
➟ The first step to an accurate air quality forecast is an excellent weather forecast.
➟ These models are mathematical simulations of how airborne pollutants disperse in the air.
Data Pre-processing
➟ Data Cleaning
○ Missing Data
○ Noisy Data
➟ Data Transformation
○ Normalization
○ Discretization
Data Cleaning
➟ The data can have many irrelevant and missing parts. To handle this, data cleaning is done. It involves handling missing data and noisy data.
➟ Missing data arises when some values are absent from the dataset. It can be handled in various ways:
○ Ignore the tuples
○ Fill in the missing values
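Both strategies are one-liners in pandas, for example. This is a minimal sketch on made-up pollutant readings, not the project's actual dataset:

```python
import pandas as pd
import numpy as np

# Toy air-quality readings with gaps (illustrative values only)
df = pd.DataFrame({
    "PM2.5": [35.0, np.nan, 48.0, 52.0],
    "NO2":   [18.0, 22.0, np.nan, 25.0],
})

# Option 1: ignore the tuples (drop any row with a missing value)
dropped = df.dropna()

# Option 2: fill the missing values (here, with each column's mean)
filled = df.fillna(df.mean())
```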
Data Cleaning
➟ Multivariate Imputation by Chained Equations (MICE)
➟ Multivariate imputation by chained equations (MICE) has emerged as a principled method of dealing with missing data, with properties that make it particularly useful for large imputation procedures.
➟ MICE operates under the assumption that, given the variables used in the imputation procedure, the missing data are Missing At Random, which means that the probability that a value is missing depends only on observed values.
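The chained-equations idea can be sketched with scikit-learn's `IterativeImputer`, which is modeled on MICE; the project itself more likely used the R `mice` package, so treat this as an illustrative stand-in on synthetic data:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Each feature with missing values is regressed on the other features,
# cycling round-robin until the imputations stabilise (MICE-style).
X = np.array([
    [25.0, 60.0, np.nan],
    [27.0, np.nan, 41.0],
    [np.nan, 55.0, 38.0],
    [26.0, 58.0, 40.0],
])

imputer = IterativeImputer(max_iter=10, random_state=0)
X_complete = imputer.fit_transform(X)
```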
Data Transformation
➟ This step is taken in order to transform the data into forms suitable for the mining process.
➟ Normalization is done in order to scale the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).
○ Z Normalization (Standardization)
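Z normalization rescales each feature to zero mean and unit variance; a minimal sketch with hypothetical temperature readings:

```python
import numpy as np

def z_normalize(x):
    """Z normalization (standardization): subtract the mean, divide by the std."""
    return (x - x.mean()) / x.std()

temps = np.array([24.0, 26.0, 30.0, 28.0])  # hypothetical values
z = z_normalize(temps)
```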
Feature Selection
➟ Temperature
➟ Humidity
➟ Wind
➟ Pressure
➟ Rainfall
➟ SO2
➟ NO2
➟ O3
➟ PM10
➟ PM2.5
Predictive Models
➟ Random Forest
➟ Deep Learning (Multilayer Perceptron)
➟ Extreme Gradient Boost (XGBoost)
➟ CatBoost
The proposed best air pollution prediction model.
Random Forest
➟ Random forests, or random decision forests, are an ensemble learning method for classification and regression.
➟ Random forest is a bagging technique. The trees in a random forest are built in parallel, with no interaction between the trees during construction. It operates by constructing a multitude of decision trees at training time and outputting the mean prediction (regression) of the individual trees.
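A minimal scikit-learn sketch of the idea, on synthetic data standing in for the weather/pollutant features (the slides do not show the project's actual code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: 5 features, AQI-like numeric target
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Bagging: each tree sees a bootstrap sample; predictions are averaged
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

pred = model.predict(X[:5])
importances = model.feature_importances_  # basis for a feature-importance plot
```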
Error and Feature Importance
Deep Learning
➟ AQI is set as the dependent variable, and the rest of the features are set as the independent variables.
➟ The data is converted into matrix form and split into two independent samples, training and testing, 70% and 30% respectively.
➟ The test and train samples are then normalised using Z Normalization.
➟ The tuned model has two hidden layers, with ten and five neurons. The activation function used is the Rectified Linear Unit (ReLU).
Deep Learning

➟ The input layer has one neuron for each of the independent variables, and the output layer has one neuron for the response.
➟ As the task is regression and the output variable is numeric, MSE is used as the loss function, RMSprop as the optimizer, and MAE as the metric to compile the model.
➟ To fit the model, the training data is run for 100 epochs; the testing data is then used to evaluate the model and to generate the predictions.
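The pipeline above (70/30 split, Z normalization, a 10-and-5-neuron ReLU network) can be sketched with scikit-learn's `MLPRegressor` on synthetic data; note this stand-in does not expose an RMSprop optimizer, so its default Adam solver is used instead:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))          # stand-in independent variables
y = X @ np.array([2.0, -1.0, 0.5, 1.5, -0.5]) + rng.normal(scale=0.1, size=300)

# 70/30 split, then Z normalization fitted on the training sample only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Two hidden layers with ten and five neurons, ReLU activation
mlp = MLPRegressor(hidden_layer_sizes=(10, 5), activation="relu",
                   max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
preds = mlp.predict(X_test)
```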
Neural Network Model
Prediction and Loss
eXtreme Gradient Boosting
➔ XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
➔ It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting that solves many data science problems in a fast and accurate way.
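The gradient boosting framework itself can be sketched as follows; scikit-learn's `GradientBoostingRegressor` is used here as a stand-in so the example runs without the `xgboost` package (XGBoost's `XGBRegressor` exposes a very similar fit/predict interface):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for xgboost.XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Boosting fits each new tree to the residual errors of the ensemble so far
gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
gbm.fit(X, y)
preds = gbm.predict(X)
```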
How it works
Feature Importance
CatBoost
➔ CatBoost is a gradient boosting machine learning algorithm. It reduces the need for extensive hyper-parameter tuning and lowers the chances of overfitting, which leads to more generalized models. CatBoost still has multiple parameters to tune, including the number of trees, learning rate, regularization, tree depth, fold size, bagging and others.

➔ It is especially powerful in two ways:
◆ It yields state-of-the-art results without the extensive data training typically required by other machine learning methods.
◆ It provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems.
Error and Feature Importance
RMSE of each Model
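The per-model comparison behind this slide rests on a single formula; a minimal sketch with made-up AQI values (not the project's results):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical actual vs. predicted AQI, to show the mechanics
actual = [50.0, 80.0, 120.0]
predicted = [48.0, 83.0, 118.0]
error = rmse(actual, predicted)
```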
Interface Using Shiny R web app
➔ Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards.

➔ You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.
User Interface
Status
Completed:
➟ Data Pre-processing
➟ Random Forest
➟ Deep Learning (MLP)

In progress:
➟ XGBoost (90%)
➟ CatBoost (80%)
➟ User Interface (60%)
Thank you!
Any questions?
