The regulation of air pollutant levels is rapidly becoming one of the most important tasks for
the governments of developing countries, especially China and India. Among the pollutant
indices, fine particulate matter (PM2.5) is a significant one because it poses a serious risk to
people's health when its level in the air is relatively high. PM2.5 refers to tiny particles in the
air that reduce visibility and cause the air to appear hazy when levels are elevated.
However, the relationships between the concentration of these particles and meteorological
and traffic factors are poorly understood. To shed some light on these connections, some
machine learning techniques have been introduced into air quality research. These studies
used selected techniques, such as Decision Tree, K-Nearest Neighbour, and Naïve
Bayesian Classifier, to predict ambient air pollutant levels based mostly on weather data.
This project attempts to apply some machine learning techniques to predict PM2.5 levels
based on a dataset consisting of daily weather parameters in Delhi, India. Because the exact
numeric PM2.5 level is uncertain, we simplified the problem to a binary classification:
the PM2.5 level is classified as "High" (> 120 ug/m3) or "low" (<= 120 ug/m3). We also
consider a 6-class classification. The threshold is chosen based on the Air
Quality Level standard in India, which sets 115 ug/m3 as mild-level pollution.
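The binary labelling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_pm25` and the sample readings are hypothetical, and only the 120 ug/m3 threshold comes from the text.

```python
# Threshold taken from the text: "High" if PM2.5 > 120 ug/m3, otherwise "low".
HIGH_THRESHOLD = 120.0


def label_pm25(value, threshold=HIGH_THRESHOLD):
    """Return "High" when the PM2.5 reading exceeds the threshold, else "low"."""
    return "High" if value > threshold else "low"


# Hypothetical daily PM2.5 readings in ug/m3 (illustrative values, not project data).
readings = [85.0, 120.0, 120.5, 240.0]
labels = [label_pm25(v) for v in readings]
print(labels)  # ['low', 'low', 'High', 'High']
```

A reading of exactly 120 ug/m3 falls into the "low" class, because the "High" condition is a strict inequality.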
For 6 Classes Classification:-
Problem Statement: -
The aim of this project is to predict the PM2.5 levels of a specific city in advance, based on
previous meteorological and PM2.5 (air quality) data obtained from various sources.
Predicting the air quality level in advance will help the government take early measures to
control the air pollution of that city, and it will help people take precautions
to avoid the various diseases that are caused by increased air pollution.
Objectives of the proposed Work: -
This project attempts to apply some machine learning techniques such as Naïve Bayesian
Classifier, K-Nearest Neighbour, and Decision Tree Classifier to predict PM2.5 levels based
on a dataset consisting of daily weather data for the chosen city. Because the exact numeric
PM2.5 level is uncertain, we have simplified the problem to a binary classification:
the PM2.5 level is classified as "High" (> x_threshold ug/m3) or "low" (<= x_threshold ug/m3).
The value of x_threshold is chosen based on the Air Quality Level standard in the chosen city.
Some of the classifiers are trained on six class labels {1,2,3,4,5,6}, in which labels 1-4
are considered non-harmful pollution levels whereas labels 5-6 represent severe pollution
levels.
In order to identify and forecast key parameters affecting air quality and propose
appropriate preventive strategies and policies, it is essential to systematically collect data
characterizing air quality. The data consist of two parts: a training data set and a test data set.
Proposed Methodology: -
Where μ = mean of the attribute over all tuples that belong to the same class
and σ = standard deviation of the attribute over all tuples that belong to the same class
4) Compute the posterior probability of each class label for the test tuple. Since the Naïve
Bayesian Classifier assumes that all attributes contribute independently, the likelihood is
the product of the per-attribute likelihoods.
Posterior Probability = Likelihood * Prior Probability (up to a normalizing constant that is
the same for every class)
5) Assign the test tuple to the class label that has the highest posterior probability
among all class labels.
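The Naïve Bayesian steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a Gaussian likelihood built from the per-class mean and standard deviation, uses the population standard deviation, and the function names and the tiny two-attribute dataset (temperature, wind speed) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior and the per-attribute (mean, std) for every class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            # Fall back to a tiny std so a constant attribute does not divide by zero.
            stats.append((mean, math.sqrt(var) or 1e-9))
        model[label] = (prior, stats)
    return model


def gaussian_pdf(x, mean, std):
    """Gaussian density, used here as the likelihood of one attribute value."""
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * std)


def predict(model, row):
    """Return the class label with the highest (unnormalized) posterior."""
    best_label, best_score = None, -1.0
    for label, (prior, stats) in model.items():
        likelihood = 1.0
        for x, (mean, std) in zip(row, stats):
            likelihood *= gaussian_pdf(x, mean, std)  # independence assumption
        posterior = likelihood * prior
        if posterior > best_score:
            best_label, best_score = label, posterior
    return best_label


# Hypothetical training tuples: [temperature, wind speed] (illustrative values only).
rows = [[30.0, 2.0], [32.0, 1.5], [31.0, 1.0], [20.0, 6.0], [22.0, 5.5], [21.0, 7.0]]
labels = ["High", "High", "High", "low", "low", "low"]
model = fit_gaussian_nb(rows, labels)
print(predict(model, [31.0, 1.6]), predict(model, [21.0, 6.0]))
```

With many attributes the product of likelihoods can underflow, so a practical version would likely sum log-probabilities instead of multiplying.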
Where p_j is the proportion of tuples in the data set that belong to class label j.
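The formula that this definition belongs to is not present in the text, so the sketch below is an assumption: it shows how the class proportions p_j might be computed and how they would feed two common decision tree impurity measures, entropy and the Gini index. Which measure the project actually used is not stated, and the function names are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return p_j for every class label j: the fraction of tuples with that label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def entropy(labels):
    """Entropy = sum over classes of -p_j * log2(p_j) (assumed measure)."""
    return sum(-p * math.log2(p) for p in class_proportions(labels).values())


def gini(labels):
    """Gini index = 1 - sum over classes of p_j squared (assumed measure)."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels).values())


# Hypothetical class labels for four tuples (illustrative only).
sample = ["High", "High", "low", "low"]
print(class_proportions(sample), entropy(sample), gini(sample))
```

Both measures are zero for a node whose tuples all share one label and largest when the classes are evenly mixed.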
The attribute with the largest standard deviation reduction is chosen for the decision
node.
4) The dataset is divided based on the values of the selected attribute. This process is run
recursively on the non-leaf branches until all data is processed. When a leaf node contains
more than one instance, we take the average of their target values as the final value for
the target.
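The attribute selection and leaf value described above can be sketched as follows. This is an illustration under assumptions: attributes are treated as discrete categories (continuous weather readings would first need to be binned or split on a threshold), the population standard deviation is used, and the function names and sample data are hypothetical.

```python
import math
from collections import defaultdict


def std_dev(values):
    """Population standard deviation of a list of target values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sd_reduction(rows, targets, attr_index):
    """Std of all targets minus the size-weighted std of each attribute-value group."""
    groups = defaultdict(list)
    for row, y in zip(rows, targets):
        groups[row[attr_index]].append(y)
    weighted = sum(len(g) / len(targets) * std_dev(g) for g in groups.values())
    return std_dev(targets) - weighted


def best_attribute(rows, targets):
    """Index of the attribute with the largest standard deviation reduction."""
    return max(range(len(rows[0])), key=lambda i: sd_reduction(rows, targets, i))


def leaf_value(targets):
    """Prediction at a leaf: the average of the target values that reach it."""
    return sum(targets) / len(targets)


# Hypothetical tuples: [wind, humidity] with PM2.5 targets (illustrative only).
rows = [["calm", "dry"], ["calm", "humid"], ["windy", "dry"], ["windy", "humid"]]
targets = [200.0, 220.0, 80.0, 100.0]
print(best_attribute(rows, targets), leaf_value([200.0, 220.0]))  # 0 210.0
```

In this toy data the wind attribute separates high and low targets almost completely, so it yields the larger reduction and would become the decision node.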
5) Linear Regressor: -
Algorithm:-
1) Split the data 60-40% into training data and testing data respectively.
2) Compute the coefficients a, b, c, d, e of the following equation from the training data:
Y = a + b*X1 + c*X2 + d*X3 + e*X4
Where Y = dependent attribute, i.e. the one to be predicted,
and Xi = independent attributes
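The two steps above can be sketched in plain Python. This is a minimal illustration, assuming the coefficients are estimated by ordinary least squares through the normal equations and that the split simply takes the first 60% of the rows as training data; the source does not say how either was done. All function names and the synthetic data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows, targets):
    """Return [a, b, c, d, e] for Y = a + b*X1 + c*X2 + d*X3 + e*X4."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate Y = a + b*X1 + c*X2 + d*X3 + e*X4 for one tuple."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


def split_60_40(rows, targets):
    """First 60% of the tuples for training, remaining 40% for testing."""
    cut = len(rows) * 6 // 10
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]


# Synthetic, noise-free data generated from known coefficients (illustrative only).
TRUE = [10.0, 2.0, -3.0, 0.5, 4.0]
rows = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1],
        [2, 1, 0, 3], [3, 2, 1, 0], [1, 2, 3, 4], [0, 2, 0, 2], [4, 0, 1, 1]]
targets = [predict(TRUE, r) for r in rows]
train_x, train_y, test_x, test_y = split_60_40(rows, targets)
coeffs = fit_linear(train_x, train_y)
errors = [abs(predict(coeffs, r) - y) for r, y in zip(test_x, test_y)]
print(coeffs, max(errors))
```

Because the synthetic targets contain no noise, the fitted coefficients should match the generating ones up to floating-point error; real weather data would of course leave a residual on the test set.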
Scope of Improvement: -
Results depend on the amount and quality of the data, so improving the data should
improve the results.
Accuracy may be increased by trying other available classifiers.
Before applying a classifier, remove the outliers and the noise present in
the data.
In some cases normalized data generates better results than the original data.
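The last two suggestions can be sketched as simple preprocessing steps. This is only one possible approach under assumptions: outliers are dropped with a z-score cutoff and normalization is min-max scaling to [0, 1]. Neither choice is stated in the source, and the function names, the cutoff, and the sample values are hypothetical.

```python
import statistics


def remove_outliers(values, z_max=3.0):
    """Drop values whose z-score (distance from the mean in std units) exceeds z_max."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)  # all values identical, nothing to drop
    return [v for v in values if abs(v - mean) / std <= z_max]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute column with one suspicious reading (illustrative only).
column = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0, 13.0, 11.0, 500.0]
cleaned = remove_outliers(column, z_max=2.0)
print(len(cleaned), min_max_normalize([10.0, 20.0, 30.0]))  # 10 [0.0, 0.5, 1.0]
```

Any scaling parameters would normally be computed on the training set only and then reused on the test set, so that no information leaks from the test data.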
References: -
[3] Ioannis N. Athanasiadis, Kostas D. Karatzas and Pericles A. Mitkas. "Classification techniques for
air quality forecasting." Fifth ECAI Workshop on Binding Environmental Sciences and Artificial
Intelligence, 17th European Conference on Artificial Intelligence, Riva del Garda, Italy, August 2006.