
CHAPTER 1

INTRODUCTION

1.1 BACKGROUND

Air pollution is a problem for much of the developing world and is believed to kill
more people worldwide than AIDS, malaria, breast cancer or tuberculosis. India adopted
ambient air quality standards and began developing a nationwide programme of
ambient air quality monitoring known as the National Air Quality Monitoring Programme (NAMP).
The network consists of 683 operating stations covering three hundred cities/towns in 29
states and four union territories of the country. Pollutants are monitored for
24 hr (4-hourly sampling for gaseous pollutants and 8-hourly sampling for particulate matter)
with a frequency of twice a week, giving 104 observations in a year. The environmental
regulatory authorities have prescribed guidelines for the maximum permissible levels of
various air pollutants: SO2 (80 µg m-3), NO2 (80 µg m-3) and PM10 (100 µg m-3).
Urban air pollution in India has increased rapidly with population growth, the
number of motor vehicles, the use of fuels with poor environmental performance, badly
maintained transportation systems, poor land use patterns and, above all, ineffective
environmental regulations. As for the health impact of air pollutants, the air quality index (AQI)
is an important indicator that lets the general public easily understand how bad or good the air
quality is for their health, and assists in data interpretation for decision-making processes
related to pollution mitigation measures and environmental management. Basically, the AQI
is defined as an index or rating scale for reporting the daily combined effect of the ambient air
pollutants recorded at a monitoring site.

Machine learning aims to predict the future from past data. Machine learning (ML) is a
branch of artificial intelligence (AI) that gives computers the ability to learn
without being explicitly programmed. Machine learning focuses on the development
of computer programs that can change when exposed to new data; this project covers the
basics of machine learning and the implementation of a simple machine learning algorithm
using Python. The process of training and prediction involves the use of specialised
algorithms: training data is fed to an algorithm, and the algorithm uses this training data to
make predictions on brand-new test data. Machine learning can be roughly separated into
three categories: supervised learning, unsupervised learning and reinforcement learning.

In supervised learning, the program is given both the input data and the corresponding
labels, so the training data must be labelled by a person beforehand. In unsupervised
learning, no labels are provided to the learning algorithm; the algorithm has to discover
the clustering or structure of the input data on its own. Finally, reinforcement learning
dynamically interacts with its environment and receives positive or negative feedback to
improve its performance.
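As a minimal sketch of this train-then-predict workflow (scikit-learn is assumed, and the pollutant readings and class labels below are invented purely for illustration):

from sklearn.linear_model import LogisticRegression

# Labeled training data: pollutant readings -> air quality class (illustrative values).
X_train = [[40, 30, 80], [120, 90, 250], [60, 45, 110]]   # e.g. SO2, NO2, PM10
y_train = ["GOOD", "POOR", "MODERATE"]                    # labels supplied by a person

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)       # learn from past (training) data

X_new = [[55, 40, 100]]           # a brand-new test observation
print(model.predict(X_new))       # predicted class for the new observation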

The necessity of healthy air has always been of great importance. As air is vital for all
living beings on earth, it is our responsibility to keep the air clean. Rapid urbanization and
industrialization have led the world into a new era of air pollution, which is seen as a modern-
day curse. Air pollution refers to the contamination of the air by excessive quantities of
harmful substances. Most air pollution comes from energy use and production, with
emissions from traffic and industry as major contributors. Air pollution is a widespread
problem due to its impact on both humans and the environment. Urban cities usually have the
worst air pollution due to human activities [Kampa and Castanas (2008)].
Clear links between pollution and health effects have been revealed, including
both short- and long-term consequences [Brauer et al. (2012)]: associations with reduced
lung function and increased heart attacks [Arden et al. (2002)], direct impacts on people with
asthma and other types of pneumonia [Guarnieri and Balmes (2014)], and, once inhaled, fine
particulate matter that can hardly be cleared by the immune system [Becker (2002)]. The
overall effect of ambient air pollution on premature human mortality shows a falling global
trend, but in many smaller geographical areas the levels still do not meet WHO's guidelines [WHO
et al. (2006)]. Due to these severe problems, there are national requirements and objectives
that each city must meet. Air quality has increasingly attracted attention from environmental
managers and citizens all over the world. New tools continue to emerge to raise air quality
awareness worldwide.
Continuous improvements in air quality mapping are happening alongside the
advancement of smart cities and the growing number of internet-of-things sensor devices. The
increase in data produced contributes further momentum to air pollution research. A hot
research topic is air pollution forecasting: the prediction of the atmospheric composition of
pollutants for a given time and location. With an accurate air quality forecast, one can decide
how to act in view of air pollution health effects. At the national level, accurate forecasting
contributes to planning and establishing procedures that reduce the severity of local pollution
levels. With better knowledge at the individual level, one can choose the
cleanest routes for the commute, the best time for outdoor activity and other daily outdoor
activities. Awareness like this has the potential to create a cleaner environment and a
healthier population. Accurate time series forecasting of air quality is a continuous research
area, and much effort has been made by researchers to create models capable of fitting the
underlying time series. Often, air quality prediction involves a noisy and limited amount of
historical data. Furthermore, the prediction of a single observation usually depends on many
events that rely on each other. The models are then forced to include specially adapted
techniques to deal with erroneous or missing data.
These complex problems make it hard to generalize a solution so that it is transferable to
other locations. Besides, the air changes rapidly over short time frames, with hourly data more
uncertain than monthly and yearly trends and seasonality. The lack of data, poor data
quality, a low spatial resolution of data points, and the cost of high-quality sensors
add to the list of obstacles. Figure 1.1 presents the observations of particulate
matter smaller than 2.5 microns (PM2.5) in the period from the end of 2017 to May 2019. The
graph shows the typical pollution trends in Trondheim: rapid changes in pollution levels
during the winter months, while the summer shows relatively good levels. These trends and
patterns are typical characteristics of the air quality in cities in Norway and Scandinavia.
Consequently, the challenge of these time series is the prediction of sudden changes
in harmful pollution.

1.2 MOTIVATION
Air pollution is a problem for much of the developing world and is believed to kill
more people worldwide than AIDS, malaria, breast cancer or tuberculosis. India adopted
ambient air quality standards and began developing a nationwide programme of
ambient air quality monitoring known as the National Air Quality Monitoring Programme (NAMP).
The network consists of 683 operating stations covering three hundred cities/towns in 29
states and four union territories of the country. Pollutants are monitored for
24 hr (4-hourly sampling for gaseous pollutants and 8-hourly sampling for particulate matter)
with a frequency of twice a week, giving 104 observations in a year. The environmental
regulatory authorities have prescribed guidelines for the maximum permissible levels of
various air pollutants: SO2 (80 µg m-3), NO2 (80 µg m-3) and PM10 (100 µg m-3).
Urban air pollution in India has increased rapidly with population growth, the
number of motor vehicles, the use of fuels with poor environmental performance, badly
maintained transportation systems, poor land use patterns and, above all, ineffective
environmental regulations. As for the health impact of air pollutants, the air quality index (AQI)
is an important indicator that lets the general public easily understand how bad or good the air
quality is for their health, and assists in data interpretation for decision-making processes
related to pollution mitigation measures and environmental management. Basically, the AQI
is defined as an index or rating scale for reporting the daily combined effect of the ambient air
pollutants recorded at a monitoring site.

1.3 OBJECTIVE
 To predict real-time air pollution levels in urban cities.
 To provide air quality information to the public.
 To identify and report different categories and levels of air pollution.
 To develop a more accurate and cost-effective air quality prediction system using the
logistic regression algorithm and the AQI.

1.4 SCOPE OF THE PROJECT

The scope of our project is to apply data mining techniques based on classification
and regression functions to enumerate the influence of pollutants on air quality and to
predict the air quality index in the study region using the primary pollutants (SO2, NO2 and
PM10) as the estimators. Accordingly, in this project we predict air quality using
logistic regression methods for classification and regression.

1.5 LIMITATIONS
Conventional monitoring instruments are limited by their large size, heavy weight and
high cost, which lead to sparse deployment of the monitoring stations. In
order to be effective, the locations of the monitoring stations need careful placement, because
the air pollution situation in urban areas is highly related to human activities (e.g.,
construction activities) and is location-dependent (e.g., traffic choke-points have much
worse air quality than average). Changes in urban arrangement, activities or regulation may
affect both the species and the concentrations of air pollutants, which may require relocating
stations or adding new stations. These requirements are typically hard or even impossible to
fulfil due to the cost of acquiring and maintaining the monitoring stations.

1.6 IMPORTANCE OF PROJECT


Monitoring helps in assessing the level of pollution in relation to the ambient air
quality standards. Standards are a regulatory measure to set the target for pollution reduction
and achieve clean air. Robust monitoring helps to guard against extreme events by alerting
people and initiating action. A total of 12 pollutants are regulated, including SO2, NO2, PM10,
PM2.5 (particulate matter of up to 10 micron and up to 2.5 micron size), ozone, lead, arsenic,
nickel, CO, NH3, benzene, and BaP (particulate phase). Across cities, only SO2, NO2 and
RSPM/PM10 are monitored regularly.

1.7 ORGANIZATION OF THE REPORT

The rest of the report is organized as follows. Chapter 2 gives a survey of work
related to this project. Chapter 3 describes the architecture design and gives the
algorithms used in the module design. Chapter 4 describes the implementation and results, with
screenshots of inputs and outputs; this chapter also gives the evaluation metrics used to identify
air pollution areas. Chapter 5 presents the conclusion and future work.

CHAPTER 2
LITERATURE SURVEY

Verma, Ishan, Rahul Ahuja, Hardik Meisheri, and Lipika Dey. "Air pollutant severity
prediction using Bi-directional LSTM Network." In 2021 IEEE/WIC/ACM International
Conference on Web Intelligence (WI), pp. 651-654. IEEE, 2021.

The authors describe the benefits of the Bidirectional Long Short-Term Memory (BiLSTM)
method for forecasting the severity of air pollution. The proposed technique achieved better
predictions by modelling the long-term, short-term and critical consequences of PM2.5
severity levels. In the proposed method, predictions are made at 6 h, 12 h and 24 h. The results
obtained for 12 h are consistent, but the results obtained for 6 h and 24 h are not.

Zhang, Chao, Baoxian Liu, Junchi Yan, Jinghai Yan, Lingjun Li, Dawei Zhang,
Xiaoguang Rui, and Rongfang Bie. "Hybrid Measurement of Air Quality as a
Mobile Service: An Image Based Approach."
In 2020 IEEE International Conference on Web Services (ICWS), pp. 853-856.
IEEE, 2020.

The authors propose a web service methodology to predict air quality. They provide a service
for mobile devices that lets the user send photos of air pollution. The proposed method includes two
modules: a) GPS location data is used to retrieve air quality assessments from nearby
air quality stations; b) dictionary learning and a convolutional neural network are applied
to the photos uploaded by the user to predict the air quality. The proposed methodology has
a lower error rate than other algorithms such as PAPLE, DL and PCALL, but this method has
a disadvantage in learning stability, due to which the results are less accurate.

Yang, Ruijun, Feng Yan, and Nan Zhao. "Urban air quality based on Bayesian
network." In 2019 IEEE 9th International Conference on Communication Software
and Networks (ICCSN), pp. 1003-1006. IEEE, 2019.

The authors use a Bayesian network to determine air quality, forming a DAG from a dataset
of the city of Shanghai. The dataset is divided into training and testing sets. The
disadvantage of this approach is that geographical and social environment characteristics
are not considered, so the results may vary depending on these factors.

Ayele, Temesegan Walelign, and Rutvik Mehta. "Air pollution monitoring and prediction
using IoT." In 2019 Second International Conference on Inventive Communication
and Computational Technologies (ICICCT), pp. 1741-1745. IEEE, 2019.

The authors propose an IoT-based technique to obtain an air quality dataset. They use the
Long Short-Term Memory (LSTM) technique to predict air quality; the proposed
technique achieved better accuracy while reducing the time taken to train the model. Still,
the accuracy could be improved by comparison with other techniques such as the random forest
method.

Djebbri, Nadjet, and Mounira Rouainia. "Artificial neural networks based air pollution
monitoring in industrial sites." In 2019 International Conference on Engineering and
Technology (ICET), pp. 1-5. IEEE, 2019.
The authors propose a nonlinear regressive model based on artificial neural networks to
predict two major air pollutants, carbon monoxide and nitrogen oxides. They consider
variables such as wind speed, wind direction, temperature and humidity, along with the toxic
emissions from an industrial site, Skikda. They use RMSE and MAE to
evaluate performance, but the method considers only two pollutants (NO and CO);
other major pollutants such as sulfur dioxide, PM2.5 and PM10 are not considered.

D. Van Le, C. K. Tham, Machine Learning (ML)-Based Air Quality Monitoring Using
Vehicular Sensor Networks. In Parallel and Distributed Systems (ICPADS), 2019 IEEE
23rd International Conference on 2019 Dec 15 (pp. 65–72). IEEE.
The authors propose a machine learning based air quality monitoring system which aims to
reduce the sensing and communication cost by allowing vehicles to process the collected data
in a distributed fashion. Sensing locations are assigned to vehicles such that
the probability of successfully measuring all sensing sub-areas is maximized while the
vehicles learn models with good prediction accuracy.

S. Kumar, A. Jasuja, Air quality monitoring system based on IoT using Raspberry Pi.
In Computing, Communication and Automation (ICCCA), 2017 International
Conference on 2017 May 5 (pp. 1341–1346). IEEE.
The authors propose a real-time AQI monitoring system using various parameters such as
CO, CO2, humidity, PM2.5, temperature and air pressure. The system was tested in Delhi, and
the measurements were compared with the data provided by the local environmental control
authority. A pollution monitoring sensor, an Arduino Uno and a Raspberry Pi are used to monitor
the air quality accurately, affordably and in an easy-to-use manner. Furthermore, persistent
pollution patterns can be detected and relationships between the air pollutants can be found.

K. S. Goverdhan Rathla, T. Sankarappa, J. S. Ashwajeet and R. Ramanna, "Air Quality
Analysis using Machine Learning Algorithm."

The ambient air quality of Belagavi city and its surroundings was monitored using
a set of dust samplers, viz. APM-460, APM-550 and APM-433. The pollutants considered
for the assessment of air quality were sulphur dioxide (SO2), nitrogen dioxide (NO2),
ammonia (NH3) and particulate matter (PM10 and PM2.5), which gave a measure of the
pollution content in the air. The sampling sites were monitored for 24 hours on an 8-hourly basis
in three different seasons in 2013 and 2014. Air quality was quantified in terms of the
concentrations of PM10, PM2.5, NO2, SO2 and NH3. The results revealed that locations near
the bus station and the Macchae industrial area lie in the range of moderate air pollution, while
Autonagar, Sadashivnagar and Kangralli village lie in the range of light air pollution.
The annual air quality index of the selected sites of Belagavi was determined to
assess the degree of atmospheric pollution.

Ioannis N. Athanasiadis, Vassilis G. Kaburlasos, Pericles A. Mitkas and Vassilios
Petridis: Predicting and Analyzing Air Quality using Machine Learning:
Comprehensive Model.
Air quality is typically assessed based on either expert meteorologist knowledge or on
sophisticated "first principles" mathematical models. Air quality operational centers have
been established worldwide in areas with (potential) air pollution problems. These centers
monitor critical atmospheric variables and regularly publish their analysis results
[ΚΑ99]. Currently, real-time decisions are made by human experts, whereas mathematical
models are used for off-line study and understanding of the atmospheric phenomena
involved.

Huiping Peng: Classification Model for Air Quality using Machine Learning
Techniques
With economic development and population rise in cities, environmental pollution
problems involving air pollution, water pollution, noise and the shortage of land resources
have attracted increasing attention. Among these, air pollution's direct impact on human
health through exposure to pollutants has resulted in an increased public awareness in both
developing and developed countries (Kim et al., 2013; Kurt and Oktay, 2010; McGranahan
and Murray, 2003). Air pollution is usually caused by energy production from power plants,
industries, residential heating, fuel burning vehicles, natural disasters, etc. Human health
concern is one of the important consequences of air pollution, especially in urban areas. The
global warming from anthropogenic greenhouse gas emissions is a long-term consequence of
air pollution (Nordiska, 2008; Ramanathan and Feng, 2009; Kumar and Goyal, 2013).
Accurate air quality forecasting can reduce the effect of a pollution peak on the surrounding
population and ecosystem, hence improving air quality forecasting is an important goal for
society.
V. M. Niharika and P. S. Rao, "A survey on air quality forecasting
techniques," International Journal of Computer Science and Information Technologies,
vol. 5, no. 1, pp. 103-107, 2020.
Recurrent neural networks (RNNs) have shown their effectiveness in dealing with
temporal data. However, information from the future, which only becomes available after the
present time, may be needed for prediction. RNNs can partly acquire this by delaying the
output by a certain number of time frames to incorporate future information. Hypothetically, a
large delay could be applied, but in practice it is found that prediction results degrade if the
delay is too large. While delaying the output by a few frames has been applied successfully to
improve results on sequential data, the best delay is task-dependent and has to be obtained by
trial and error. Alternatively, two distinct networks, one for each direction, could be trained on
all input data and their outputs combined by an arithmetic or geometric mean for the final
forecast. Nevertheless, it is difficult to obtain an optimal combination, since different networks
trained on the same data cannot be viewed as independent. To overcome these obstacles, a
bidirectional recurrent neural network (BRNN) was proposed that can be trained using all
available input information in the past and future of a specific time frame. Pollutant data, like
any other sensor data, is not free from missing, normal and anomalous values. These
irregularities may occur due to instrumental error or other external reasons such as power
outages or loss of connectivity. There have been occasions where pollutant data was not
reported by a source monitoring post. These missing values were imputed using the values
recorded at previous times. If a value lies outside the allowable range for an attribute, it is
treated as an anomalous value; anomalous values can be replaced with the rolling mean of the
prior three instances. The work offers a robust way of predicting the severity of pollution
from the readings received from different sensors 6, 12 and 24 hours in advance using deep
learning models. This claim is verified on data from the pollution department of New Delhi,
India, by forecasting the concentration of the PM2.5 pollutant. The investigations are
presented with comparisons to baseline methods over distinct posts and over different time
phases. Additionally, a collaborative approach is presented which performs well in most of
the cases and proves to be more effective.

E. Kalapanidas and N. Avouris, "Applying machine learning techniques in air quality
prediction," in Proc. ACAI, vol. 99, September 2017.
It is apparent that people who work in factories or industrial concerns are likely to be exposed
to the hazard of breathing harmful chemicals and air because of their constant exposure to
pollutants. Air pollution adds to the hazardous situations that negatively influence living
things, and it is a real concern for the whole earth. Contamination of the air is a global
problem, even for multinational companies, governments and the mass media. Using natural
resources at a rate beyond nature's capacity to rebuild them can cause pollution of vegetation,
air and water. Besides human activities, there are other causes that result in the release of
harmful toxins: natural calamities of many kinds can also contaminate the air. Technology
has advanced in almost all areas and activities of living organisms. In the current world,
everything is done using new technology in order to satisfy the demands of individuals,
institutions, manufacturers and so on. The Internet of Things (IoT) is one of the predominant
information exchange trends of the last ten years. By means of this concept, it is feasible to
connect numerous embedded objects that consume little power.

D. J. Nowak, D. E. Crane, and J. C. Stevens, "Air pollution removal by urban trees and shrubs
in the United States," Urban Forestry & Urban Greening, vol. 4, no. 3, pp. 115-123, 2019.

Outdoor air quality plays an essential role in human wellbeing. Air pollution causes
colossal increases in medical costs and disease, and is estimated to cause nearly 800,000
premature deaths globally each year (Cohen et al., 2005). Outdoor air commonly contains
physiologically significant levels of numerous pollutants, including particulates (PM10 or
PM2.5), ozone, carbon monoxide, oxides of nitrogen and sulfur, bioaerosols, metals, volatile
organics and pesticides. A large proportion of these pollutants are formed by anthropogenic
activities. Though most people spend nearly every hour inside their houses, outdoor air
quality can disturb the air inside to a large extent. Furthermore, many patients, such as
asthmatics, patients with allergies and chemical sensitivities, COPD patients, heart and stroke
sufferers, diabetics, pregnant women, the elderly and children, are especially vulnerable. It is
widely believed that urban air pollution has a direct impact on human wellbeing, mainly in
developing and industrial nations where air quality measures are not available or are
minimally applied or enforced. Contemporary reports have shown vast evidence that
exposure to atmospheric pollutants is strongly linked to hostile diseases, including asthma
and lung irritation. The modules of the proposed system are in charge of getting and loading
the data, preprocessing and translating the data into knowledge, estimating the pollution based
on historical information, and subsequently offering the obtained knowledge through distinct
channels, such as a mobile application and a web gateway. The focus of the research work is
on the monitoring and estimating.
Three machine learning (ML) algorithms are analysed to construct precise estimation
models for single and multiple steps ahead of concentrations of ground-level ozone (O3),
nitrogen dioxide (NO2), and sulfur dioxide (SO2). These ML algorithms are support vector
machines, M5P model trees, and artificial neural networks (ANN). Two forms of
simulation are tried out: 1) univariate and 2) multivariate. The parameters used for analyzing
the performance are prediction accuracy and root mean square error (RMSE). The
outcome proves that using distinct attributes in multivariate simulation with the M5P algorithm
yields good estimation. Air quality is a fundamental concern that directly influences
human health. Air quality data are gathered wirelessly from monitoring motes.
These data are analyzed and used for forecasting concentration values of pollutants using an
intelligent machine-to-machine platform. The platform makes use of ML-based algorithms to
build the forecasting models by learning from the amassed data.

D. J. Nowak, D. E. Crane, and J. C. Stevens, "Air pollution removal by urban trees and
shrubs in the United States," Urban Forestry & Urban Greening, vol. 4, no. 3, pp. 115-123,
2019.
Assessing air pollution in cities is a critical task. Source records are not up to date
and are often unavailable. Because of this, the forecast results of numerical models may not be
accurate or sufficient, so we cannot benefit from such methods. To overcome
this problem, this research work proposes a complete analysis system to make the forecast
results stronger and more accurate. Experiments on distinct attribute combinations have given
findings which are very helpful. The system was tested with air pollution data from China
and some other metropolitan cities from around the world. The air quality index
(AQI) parameter is used to measure the amount of pollution in the air in a particular urban area.
This will be helpful for the government to assess the condition of air quality and to take the
necessary steps to control pollution in urban areas. Thus this work is helpful to society as a
whole and to all living organisms.

2.1 INFERENCE OF THE LITERATURE SURVEY

Air quality prediction has been widely researched using popular
machine learning techniques; however, several gaps in the literature have been discovered.
The most evident is the lack of a strict evaluation framework: researchers use different
problem definitions and evaluation methods to showcase their results. The problems span a
large space of combinations: univariate or multivariate, fine-grained or single-target
predictions, and a window horizon of a few hours or a range of multiple days. Besides, the
datasets used in the literature are tied to the research location. The cities of interest each
introduce a new set of data with different distributions and variables; the data is characterized
by each city's unique geographical factors, climate and magnitude. The dataset from
one town might be different from another's, or challenging to obtain. These limitations make the
promising solutions from the literature challenging to reproduce for other locations.
However, this is not the focus of this work. As far as we are aware, the impact of events and
human mobility data on air quality patterns has not yet been studied. This kind of data
might be challenging to acquire or might include inaccurate information, and thus not be
applicable for experiments. The city population tends to gather around more significant events,
and therefore the air quality could be correlated with this movement of population density.
Despite this interesting direction, it was not followed through, because we were not able
to acquire the necessary data in time. The literature includes comprehensive experiments with
meteorological, temporal and spatial techniques. These features are further used to highlight
temporal and spatial relations by means of feature engineering: the spatial relations are
described with neighboring air quality measurements, while the temporal relations are
represented with historical data and with information from the timestamp of the measurement.
Another approach is to include statistical calculations of the time series to complement the
features. This approach is seen less often in the research, is believed to add even more hidden
relations of the complexity in air quality, and is studied in this work.

CHAPTER 3
SYSTEM ANALYSIS

3.1 PROBLEM DEFINITION


Observing and retaining high-standard air has become a crucial challenge in metropolitan
areas with many industries, companies and a large population. As the population rises,
there is an increase in transportation and in the usage of electricity and fuels. A lot of waste
is dumped on the land, as we are well aware. The air is also highly contaminated, which
poses a serious threat to all kinds of living organisms on the earth. This gives rise to the
need for monitoring and assessing the quality of air, and accordingly the government should
be alerted to take necessary actions. This research work concentrates on performing an
effective analysis of the major works done in this area using machine learning
algorithms.

3.2 ADVANTAGES
The basic aim of our project is to apply machine learning techniques based on classification and
regression functions to enumerate the influence of pollutants on air quality and to predict
the air quality index in the study region using the primary pollutants (SO2, NO2 and PM10) as
the estimators. Accordingly, in this project we predict air quality using multiple
classifiers from data mining methods (random forest) for classification and regression.

3.3 SYSTEM SPECIFICATION

HARDWARE REQUIREMENTS
The hardware requirements for executing this model are:
 RAM – 4 GB
 Operating System – Windows 10
 Processor – Intel(R) Core(TM) i3
 Processor speed – 3.60 GHz

SOFTWARE REQUIREMENTS
The programming language used to develop this application is Python, and the IDE used is
Jupyter Notebook.
 Programming Language – Python
 Python IDE – Jupyter Notebook
 Deep Learning Framework – TensorFlow

CHAPTER 4
SYSTEM ARCHITECTURE

4.1 OVERALL ARCHITECTURE

Figure 4.1 Overall Architecture of Proposed System

The basic aim of our project is to apply machine learning techniques based on classification and
regression functions to enumerate the influence of pollutants on air quality and to predict
the air quality index in the study region using the primary pollutants (SO2, NO2 and PM10) as
the estimators. Accordingly, in this project we predict air quality using multiple
classifiers from machine learning methods (decision tree, KNN, SVM, random forest, logistic
regression, naive Bayes) for classification and regression.

4.2 LIST OF MODULES

1. Build Data Set
2. Data Pre-processing
3. Feature Selection
4. Train the Model
5. Test the Model
6. Data Analysis
7. Prediction

4.2.1. Build Data Set
Tamil Nadu is the eleventh-largest state in India by area and the sixth-most populous.
Under the NAMP, ambient air quality is monitored by the CPCB in association with the Tamil
Nadu Pollution Control Board (TNPCB) at 28 locations covering cities, major towns and
major industrial areas, viz. Chennai, Salem, Coimbatore, Madurai, Trichy, Cuddalore, Mettur
and Thoothukudi. All of these are manually operated stations. The ambient air samples
are collected with high-volume samplers running for 24 hours, twice a week. Thus, at
each station, not less than 108 samplings are done in a year. PM10, SO2 and NO2 are
monitored. Out of these 28 stations, 10 stations were selected to calculate the historical AQI so as
to know the air quality of the cities and towns. From this, the air quality index ranges are
generated and organized in a CSV (Comma Separated Values) file.
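As a minimal sketch of loading such a file for the later modules (the file name and columns here are illustrative, and pandas is assumed):

import pandas as pd

# Load the AQI ranges organized in the CSV file.
data = pd.read_csv("tn_air_quality.csv")   # hypothetical file name

print(data.shape)    # number of samples and variables
print(data.head())   # first rows, e.g. Location, SO2, NO2, PM10, AQI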

Figure 4.2 Air Quality Parameters-Chennai

Table-2 Air Quality Parameters-Madurai

Table-3 Air Quality Parameters-Coimbatore

Table-4 Air Quality Index scale and its categories

Table-5 Health Statements for IND-AQI Categories

4.2.2 DATA PRE-PROCESSING

The data we get from different sources may contain inconsistent data, missing values and
repeated data. To get proper prediction results, the dataset must be cleaned: missing values
must be taken care of, either by deleting them or by filling them in with mean values or some
other method. Redundant data must also be removed or eliminated so as to avoid biasing the
results. Some datasets may have outliers or extreme values, which also have to be removed to
get good prediction accuracy. Classification and clustering algorithms and other data mining
methods will work well only if all this pre-processing is done on the data.
Data preprocessing in Machine Learning is a crucial step that helps enhance the
quality of data to promote the extraction of meaningful insights from the data. It
refers to the technique of preparing (cleaning and organizing) the raw data to make it suitable
for building and training Machine Learning models. In simple words, data preprocessing
is a data mining technique that transforms raw data into an understandable and readable
format. When it comes to creating a Machine Learning model, data preprocessing is the first
step, marking the initiation of the process. Typically, real-world data is incomplete,
inconsistent, inaccurate (contains errors or outliers), and often lacks specific attribute
values/trends. This is where data preprocessing enters the scenario: it helps to clean, format
and organize the raw data, thereby making it ready to go for Machine Learning models.

Identifying and handling the missing values


In data preprocessing, it is pivotal to identify and correctly handle the missing values;
failing to do this, you might draw inaccurate and faulty conclusions and inferences from the
data. Needless to say, this will hamper your ML project.
Basically, there are two ways to handle missing data (a pandas sketch of these steps follows the list):
 Deleting a particular row – In this method, you remove a specific row that has a null
value for a feature, or a particular column where more than 75% of the values are
missing. However, this method is not 100% efficient, and it is recommended that you
use it only when the dataset has adequate samples. You must ensure that deleting
the data does not introduce bias.
 Calculating the mean – This method is useful for features having numeric data like
age, salary, year, etc. Here, you can calculate the mean, median or mode of a
particular feature, column or row that contains a missing value and substitute the
result for the missing value. This method can add variance to the dataset, and any loss
of data can be efficiently negated. Hence, it yields better results compared to the first
method (omission of rows/columns). Another way of approximation is through the
deviation of neighbouring values; however, this works best for linear data.
 Duplicate values – A dataset may include data objects which are duplicates of one
another. This may happen when, say, the same person submits a form more than once. The
term deduplication is often used to refer to the process of dealing with duplicates. In most
cases, the duplicates are removed so as to not give that particular data object an
advantage or bias when running machine learning algorithms.
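A minimal pandas sketch of these steps (the frame below is invented purely for illustration):

import pandas as pd

# Hypothetical raw readings with missing and duplicate rows.
df = pd.DataFrame({"SO2":  [12.0, None, 15.0, 15.0],
                   "NO2":  [30.0, 28.0, None, None],
                   "PM10": [90.0, 85.0, 110.0, 110.0]})

df = df.drop_duplicates()                    # remove duplicate records
df = df.dropna(thresh=2)                     # delete rows where most values are missing
df = df.fillna(df.mean(numeric_only=True))   # impute the rest with column means
print(df)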

ENCODING THE CATEGORICAL DATA


Categorical data refers to information that has specific categories within the
dataset. In the dataset cited above, there are two categorical variables: country and
purchased. Machine Learning models are primarily based on mathematical equations. Thus,
you can intuitively understand that keeping categorical data in the equations will cause
certain issues, since you would only need numbers in the equations.
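A small sketch of such encoding (pandas assumed; the frame mirrors the country/purchased example above):

import pandas as pd

df = pd.DataFrame({"country":   ["India", "Norway", "India"],
                   "purchased": ["yes", "no", "yes"]})

# One-hot encode 'country' and map 'purchased' to 0/1 so the model sees only numbers.
encoded = pd.get_dummies(df, columns=["country"])
encoded["purchased"] = encoded["purchased"].map({"no": 0, "yes": 1})
print(encoded)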

The air quality data initially obtained must be processed or organized for analysis.
For instance, this may involve placing data into rows and columns in a table format.
In this study, both single-pollutant and combined-pollutant AQIs were calculated
for all the data points. The single-pollutant AQI was calculated using the IND-AQI method:

Ip = [(IHI - ILO) / (BPHI - BPLO)] x (Cp - BPLO) + ILO

where Ip is the AQI (sub-index) for pollutant p; Cp is the actual ambient concentration of
pollutant p; BPHI is the breakpoint for the pollutant that is greater than or equal to Cp; BPLO
is the breakpoint for the pollutant that is less than or equal to Cp; IHI is the sub-index value
corresponding to BPHI; and ILO is the sub-index value corresponding to BPLO. The IND-AQI
values for each of the pollutants considered here were calculated using the above equation
[1]. Air quality was considered as the dependent variable in the regression and
classification modelling, whereas the air pollutant concentrations (in µg m-3 or ppm)
and the location level were taken as independent variables.
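A direct translation of this formula into Python (the breakpoint values in the example are taken from Table 1 below):

def sub_index(cp, bp_lo, bp_hi, i_lo, i_hi):
    # IND-AQI sub-index Ip: linear interpolation between the breakpoints.
    return (i_hi - i_lo) / (bp_hi - bp_lo) * (cp - bp_lo) + i_lo

# Example: a PM10 concentration of 180 ug/m3 falls in the 101-250 breakpoint row,
# whose sub-index range is 101-200.
print(round(sub_index(180, 101, 250, 101, 200)))  # -> 153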
Table 1. IND-Air Quality Index scale and its categories
(Concentrations in µg m-3; CO in mg m-3. Per the CPCB guidelines below, O3 and CO use 8-hourly averages; all other pollutants use 24-hourly averages.)

AQI Category (Range)          | PM10    | PM2.5   | NO2     | O3       | CO      | SO2      | NH3       | Pb
Good (0-50)                   | 0-50    | 0-30    | 0-40    | 0-50     | 0-1.0   | 0-40     | 0-200     | 0-0.5
Satisfactory (51-100)         | 51-100  | 31-60   | 41-80   | 51-100   | 1.1-2.0 | 41-80    | 201-400   | 0.5-1.0
Moderately polluted (101-200) | 101-250 | 61-90   | 81-180  | 101-168  | 2.1-10  | 81-380   | 401-800   | 1.1-2.0
Poor (201-300)                | 251-350 | 91-120  | 181-280 | 169-208  | 10-17   | 381-800  | 801-1200  | 2.1-3.0
Very Poor (301-400)           | 351-430 | 121-250 | 281-400 | 209-748* | 17-34   | 801-1600 | 1200-1800 | 3.1-3.5
Severe (401-500)              | 430+    | 250+    | 400+    | 748+*    | 34+     | 1600+    | 1800+     | 3.5+

*One-hourly monitoring (for mathematical calculation only)

Calculation of AQI
The CPCB has given guidelines on calculating AQI as follows [2]:
1. The Sub-indices for individual pollutants at a monitoring location are calculated using its
24-hourly average concentration value (8-hourly in case of CO and O3) and health breakpoint
concentration range. The worst sub-index is the AQI for that location.
2. All eight pollutants may not be monitored at all the locations. The overall AQI is calculated
only if data are available for a minimum of three pollutants, of which one should necessarily
be either PM2.5 or PM10. Otherwise, data are considered insufficient for calculating the AQI.
Similarly, a minimum of 16 hours' data is considered necessary for calculating a sub-index.
3. The sub-indices for monitored pollutants are calculated and disseminated even if data are
inadequate for determining the AQI. The individual pollutant-wise sub-index will provide the
air quality status for that pollutant.
4. The web-based system is designed to provide AQI on real time basis. It is an automated
system that captures data from continuous monitoring stations without human intervention,
and displays AQI based on running average values (e.g. AQI at 6 A.M on a day will
incorporate data from 6 A.M on previous day to the current day).

5. For manual monitoring stations, an AQI calculator is developed wherein data can be fed
manually to get AQI value.
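A small sketch of rules 1 and 2 above (illustrative; `sub_indices` maps pollutant names to sub-index values computed as in the formula earlier):

def overall_aqi(sub_indices):
    # Overall AQI is the worst sub-index, computed only when at least three
    # pollutants are available and one of them is PM2.5 or PM10.
    if len(sub_indices) < 3 or not ({"PM2.5", "PM10"} & set(sub_indices)):
        return None   # data insufficient for calculating AQI
    return max(sub_indices.values())

print(overall_aqi({"PM10": 153, "SO2": 45, "NO2": 60}))  # -> 153
print(overall_aqi({"SO2": 45, "NO2": 60}))               # -> None (insufficient)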

Table 2. Health statements for IND-AQI categories

AQI Category                  | Associated Health Impacts
Good (0-50)                   | Minimal impact
Satisfactory (51-100)         | May cause minor breathing discomfort to sensitive people
Moderately polluted (101-200) | May cause breathing discomfort to people with lung disease such as asthma, and discomfort to people with heart disease, children and older adults
Poor (201-300)                | May cause breathing discomfort to people on prolonged exposure, and discomfort to people with heart disease
Very Poor (301-400)           | May cause respiratory illness to people on prolonged exposure; the effect may be more pronounced in people with lung and heart diseases
Severe (401-500)              | May cause respiratory effects even in healthy people, and serious health impacts on people with lung/heart diseases; the health impacts may be experienced even during light physical activity

The data were then partitioned into two subsets: a training set and a test set. In the present
study, the complete data set (433 samples x 16 variables) was partitioned into a training set
(433 samples x 16 variables) and a test set (337 samples x 16 variables); thus the training and
test sets comprised 100% and 65% of the samples, respectively. Since all the independent
variables (air pollutant concentrations in µg m-3, type of location) in the data exhibited
significant correlations with the dependent variables, all of them were considered for
classification and regression.

DATA CLEANING
Once processed and organized, the data may be incomplete, contain duplicates, or
contain errors. The need for data cleaning arises from problems in the way that data is
entered and stored. Data cleaning is the process of preventing and correcting these errors.
Common tasks include record matching, identifying inaccuracies in the data, assessing the
overall quality of existing data, deduplication, and column segmentation. Such data problems
can also be identified through a variety of analytical techniques. For example, with financial
information, the totals for particular variables may be compared against separately published
numbers believed to be reliable. Unusual amounts above or below pre-determined thresholds
may also be reviewed. Quantitative methods for outlier detection can be used to get rid of
likely incorrectly entered data. Textual data spellcheckers can be used to lessen the amount
of mistyped words, but it is harder to tell if the words themselves are correct.

EXPLORATORY DATA ANALYSIS


Once the data is cleaned, it can be analyzed. Analysts may apply a variety of
techniques referred to as exploratory data analysis to begin understanding the messages
contained in the data. The process of exploration may result in additional data cleaning or
additional requests for data, so these activities may be iterative in nature. Descriptive
statistics such as the average or median may be generated to help understand the data. Data
visualization may also be used to examine the data in graphical format, to obtain additional
insight regarding the messages within the data.
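As a small illustration of such exploratory steps (the file and column names are hypothetical; pandas and Matplotlib are assumed):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("tn_air_quality.csv")   # hypothetical cleaned dataset

print(df.describe())                     # descriptive statistics: mean, median (50%), spread
df.groupby("Location")["PM10"].mean().plot(kind="bar")  # compare PM10 across locations
plt.show()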

AIR QUALITY CALCULATED USING AQI

Air quality index is a 100-point scale that summarizes results from a total of nine
different measurements when complete: Year, Month, Season, Location, SO2, NO2, PM10
and Sub-Index PM10. Each parameter's Q-value changes depending upon its value.

AQI = ΣAnWn / ΣWn

where An is the quality rating of the nth air quality parameter and Wn is the unit weight of
the nth air quality parameter.
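The same weighted-average formula expressed as a short function (the ratings and weights below are invented purely for illustration):

def weighted_aqi(q_values, weights):
    # AQI = sum(An * Wn) / sum(Wn) over the n monitored parameters.
    return sum(a * w for a, w in zip(q_values, weights)) / sum(weights)

# Hypothetical quality ratings and unit weights for SO2, NO2 and PM10.
print(weighted_aqi([60, 75, 120], [1.0, 1.0, 2.0]))  # -> 93.75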
Location   | Count | Unique | Top      | Freq
Chennai    | 108   | 3      | MODERATE | 84
Madurai    | 108   | 2      | GOOD     | 84
Coimbatore | 108   | 4      | MODERATE | 64
Table-6 Overall performance variations of air quality in the study areas

4.2.4 TRAIN THE MODEL


Training set denotes the subset of a dataset that is used for training the machine
learning model; here, you are already aware of the output. A test set, on the other hand, is the
subset of the dataset that is used for testing the machine learning model. The ML model uses
the test set to predict outcomes.
Usually, the dataset is split in a 70:30 or 80:20 ratio. This means that you take either
70% or 80% of the data for training the model while leaving out the remaining 30% or 20%. The
splitting process varies according to the shape and size of the dataset in question.

Figure 4.3 Train set and Test Set representation

from sklearn.model_selection import train_test_split  # import needed for the call below

# Split the feature array x and the label array y into random train and test subsets.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)


Here, the call to train_test_split() splits the arrays of the dataset into random train and test
subsets and returns four variables:
 x_train – features for the training data
 x_test – features for the test data
 y_train – dependent variable for the training data
 y_test – dependent variable for the test data
Thus, the train_test_split() function takes four parameters, the first two of which are the
arrays of data. The test_size parameter specifies the size of the test set; it may be 0.5, 0.3, or
0.2, specifying the dividing ratio between the training and test sets. The last parameter,
random_state, sets the seed for a random generator so that the output is always
the same. In machine learning, training data is the data you use to train a machine learning
algorithm or model. Training data requires some human involvement to analyse or process
the data for machine learning use. How people are involved depends on the type of machine
learning algorithms you are using and the type of problem that they are intended to solve.
 With supervised learning, people are involved in choosing the data features to be
used for the model. Training data must be labelled - that is, enriched or annotated - to
teach the machine how to recognize the outcomes your model is designed to detect.

 Unsupervised learning uses unlabelled data to find patterns in the data, such as
inferences or clustering of data points. There are hybrid machine learning models that
allow you to use a combination of supervised and unsupervised learning.

Figure 4.4 Training Model
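As a rough sketch of this training step (scikit-learn is assumed, and the snippet continues from the train/test split shown above; the choice of two classifiers from Section 4.1 is illustrative):

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Fit two of the classifiers named in Section 4.1 on the labeled training split.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(x_train, y_train)   # learn from the labeled training data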

4.2.5 TEST MODEL


This part of the dataset is used to test our model hypothesis. It is left untouched and
unseen until the model and hyper parameters are decided, and only after that the model is
applied on the test data to get an accurate measure of how it would perform when deployed on
real-world data.
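Continuing the sketch above (again assuming scikit-learn; accuracy is one of several possible metrics), the held-out test split can be scored as follows:

from sklearn.metrics import accuracy_score

# Score each trained model on the untouched test split.
for name, model in models.items():
    y_pred = model.predict(x_test)               # predictions on unseen data
    print(name, accuracy_score(y_test, y_pred))  # fraction of correct predictions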

4.2.6 AIR QUALITY DATA ANALYSIS


The air quality data in the CSV file is used to perform analysis using machine
learning algorithms. The analysis performed with the air quality data is the end result of the
paper. There are five major analyses performed with the scraped results.

Table-7 Overall performance variations of air quality month-wise

CHAPTER 5

SYSTEM IMPLEMENTATION

HARDWARE AND SOFTWARE SPECIFICATION

5.1 HARDWARE REQUIREMENTS:

 Hard Disk: 500 GB and above
 RAM: 4 GB and above
 Processor: Intel Core i3 and above

5.2 SOFTWARE REQUIREMENTS:

 Operating System: Windows 7, 8, 10 (64-bit)
 Software: Python 3.7
 Tools: Anaconda (Jupyter Notebook IDE), Teachable Machine, COLAB

5.3 TECHNOLOGIES USED:

 Python
 Deep learning
 TensorFlow

5.3.1 Introduction to Python

Python is a widely used general-purpose, high-level programming language. It was initially
designed by Guido van Rossum in 1991 and is developed by the Python Software Foundation. It
was mainly developed with an emphasis on code readability, and its syntax allows programmers
to express concepts in fewer lines of code.

Python is a programming language that lets you work quickly and integrate systems more
efficiently.

It is used for:

 Web development (server-side)
 Software development
 Mathematics
 System Scripting

Uses of Python:

 Python can be used on a server to create web applications.
 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready software
development.

5.3.2 Characteristics of Python:

 Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.
 Python runs on an interpreter system, meaning that code can be executed as soon as it
is written. This means that prototyping can be very quick.
 Python can be treated in a procedural way, an object-oriented way or a functional
way.

Python 2 Vs Python 3:

 The most recent major version of Python is Python 3, which we shall be using in this
tutorial. However, Python 2, although not being updated with anything other than
security updates, is still quite popular.
 Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until
December 2008. At that time, the development team made the decision to release
version 3.0, which contained a few relatively small but significant changes that were
not backward compatible with the 2.x versions. Python 2 and 3 are very similar, and
some features of Python 3 have been backported to Python 2. But in general, they
remain not quite compatible.
 Both Python 2 and 3 have continued to be maintained and developed, with periodic
release updates for both. As of this writing, the most recent versions available are
2.7.15 and 3.6.5. However, an official End of Life date of January 1, 2020 has been
established for Python 2, after which time it will no longer be maintained.

 Python is still maintained by a core development team at the Institute, and Guido is
still in charge, having been given the title of BDFL (Benevolent Dictator For Life) by
the Python community. The name Python, by the way, derives not from the snake, but
from the British comedy troupe Monty Python’s Flying Circus, of which Guido was,
and presumably still is, a fan. It is common to find references to Monty Python
sketches and movies scattered throughout the Python documentation.
 It is possible to write Python in an Integrated Development Environment, such as
Thonny, Pycharm, Netbeans or Eclipse which are particularly useful when managing
larger collections of Python files.

Python Syntax compared to other programming languages:

 Python was designed for readability, and has some similarities to the English language
with influence from mathematics.
 Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
 Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets for
this purpose.

5.3.3 Python is Interpreted:


 Many languages are compiled, meaning the source code you create needs to be
translated into machine code, the language of your computer’s processor, before it can
be run. Programs written in an interpreted language are passed straight to an
interpreter that runs them directly.
 This makes for a quicker development cycle because you just type in your code and
run it, without the intermediate compilation step.
 One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly
than interpreted programs. For some applications that are particularly computationally
intensive, like graphics processing or intense number crunching, this can be limiting.
 In practice, however, for most programs, the difference in execution speed is
measured in milliseconds, or seconds at most, and not appreciably noticeable to a
human user. The expediency of coding in an interpreted language is typically worth it
for most applications.
 For all its syntactic simplicity, Python supports most constructs that would be
expected in a very high-level language, including complex dynamic data types,
structured and functional programming, and object-oriented programming.
 Additionally, a very extensive library of classes and functions is available that
provides capability well beyond what is built into the language, such as database
manipulation or GUI programming.
 Python accomplishes what many programming languages don’t: the language itself is
simply designed, but it is very versatile in terms of what you can accomplish with it.

5.3.4 TENSOR FLOW


Commonly used software for ML tasks is TensorFlow (TF). It is widely adopted, as it provides an
interface to express common ML algorithms and executable code for the models. Models
created in TF can be ported to heterogeneous systems with little or no change, with devices
ranging from mobile phones to distributed servers. TF was created and is maintained by
Google, and is used internally within the company for ML purposes. TF expresses
computations as a stateful dataflow graph, enabling easy scaling of ANN training with
parallelisation and replication. As the model described in this work is to be trained on
computational servers and later ported to mobile devices, TF is highly suitable.

5.3.5 TENSOR FLOW MOBILE


During design, Google developed TF to be able to run on heterogeneous systems,
including mobile devices. This was due to the problems of sending data back and forth
between devices and data centres when computations could be executed on the device
instead. TFM enables developers to create interactive applications without the need for
network round-trip delays for ML computations. As ML tasks are computationally
expensive, model optimisation is used to improve performance. The minimum hardware
requirements of TFM in terms of Random-Access Memory (RAM) size and CPU speed are
low, and the primary bottleneck is the calculation speed of the computations, as the desired
latency for mobile applications is low. For example, a mobile device with hardware capable
of running 10 giga floating point operations per second (GFLOPS) is limited to running a 5
GFLOPS model at 2 FPS, which might impede desired application performance.

5.3.6 TENSORFLOW FRAMEWORK:

TensorFlow is an open-source software library. It was originally developed by researchers
and engineers working on the Google Brain Team within Google's Machine Intelligence
research organization for the purposes of conducting machine learning and deep neural
networks research. It is an open-source framework to run deep learning and other statistical
and predictive analytics workloads, and a Python library that supports many classification
and regression algorithms and, more generally, deep learning.

TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google; TensorFlow is Google Brain's second-generation system.
Version 1.0.0 was released on February 11, 2017. While the reference implementation runs on
single devices, TensorFlow can run on multiple CPUs and GPUs (with optional CUDA and
SYCL extensions for general-purpose computing on graphics processing units).
TensorFlow is available on 64-bit Linux, macOS, Windows, and mobile computing
platforms including Android and iOS.

Its flexible architecture allows for the easy deployment of computation across a variety of
platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge
devices. The name TensorFlow derives from the operations that such neural networks perform
on multidimensional data arrays, which are referred to as tensors.
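A tiny illustration of operations on such tensors (TensorFlow 2.x is assumed; the values are arbitrary):

import tensorflow as tf

# A tensor is one of the multidimensional arrays the name refers to.
x = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
print(tf.reduce_mean(x))   # an operation over the tensor -> 2.5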

5.3 OPENCV:
1. It is a cross-platform library using which we can develop real-time computer vision
applications.
2. It mainly focuses on image processing, and on video capture and analysis, including
features like face detection and object detection.
3. Currently OpenCV supports a wide variety of programming languages like C++, Python,
Java etc. and is available on different platforms including Windows, Linux, OS X, Android,
iOS etc.
4. Interfaces based on CUDA and OpenCL are also under active development for high-speed
GPU operations. OpenCV-Python is the Python API of OpenCV.
5. It combines the best qualities of the OpenCV C++ API and the Python language.
6. OpenCV (Open Source Computer Vision Library) is an open-source computer vision and
machine learning software library. OpenCV was built to provide a common infrastructure for
computer vision applications and to accelerate the use of machine perception in commercial
products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and
modify the code.
7. The library has more than 2500 optimized algorithms, which include a comprehensive set
of both classic and state-of-the-art computer vision and machine learning algorithms.
8. These algorithms can be used to detect and recognize faces, identify objects, classify
human actions in videos, track camera movements, track moving objects, extract 3D models of
objects, produce 3D point clouds from stereo cameras, stitch images together to produce a
high-resolution image of an entire scene, find similar images from an image database, remove
red eyes from images taken using flash, follow eye movements, recognize scenery and
establish markers to overlay it with augmented reality, etc.
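
To illustrate the face-detection capability mentioned above, here is a minimal sketch (our
own illustration; the image path is hypothetical) using the Haar-cascade classifier bundled
with OpenCV:

import cv2

# Load the bundled frontal-face Haar cascade.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read an image (hypothetical path) and convert it to grayscale.
img = cv2.imread('sample.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces and draw a rectangle around each one.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('faces.jpg', img)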

5.5 NUMPY:
NumPy is a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was
originally created by Jim Hugunin with contributions from several other developers. In 2005,
Travis Oliphant created NumPy by incorporating features of the competing Numarray into
Numeric, with extensive modifications. NumPy is open-source software and has many
contributors. The Python programming language was not initially designed for numerical
computing, but attracted the attention of the scientific and engineering community early on,
so that a special interest group called matrix-sig was founded in 1995 with the aim of
defining an array computing package. Among its members was Python designer and
maintainer Guido van Rossum, who implemented extensions to Python's syntax (in particular
the indexing syntax) to make array computing easier.

33
An implementation of a matrix package was completed by Jim Fulton, then
generalized by Jim Hugunin to become Numeric, also variously called Numerical Python
extensions or NumPy. Hugunin, a graduate student at the Massachusetts Institute of
Technology (MIT), joined the Corporation for National Research Initiatives (CNRI) to work
on JPython in 1997, leaving Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to
take over as maintainer. In early 2005, NumPy developer Travis Oliphant wanted to unify the
community around a single array package and ported Numarray's features to Numeric,
releasing the result as NumPy 1.0 in 2006. This new project was part of SciPy. To avoid
installing the large SciPy package just to get an array object, this new package was
separated and called NumPy.
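
As a brief illustration of the array support described above (our own sketch, with made-up
numbers):

import numpy as np

# A 2-D array (matrix) of pollutant readings.
readings = np.array([[52.0, 14.0, 88.0],
                     [47.0, 12.0, 95.0]])

# High-level mathematical functions operate on the whole array at once.
print(readings.mean(axis=0))  # column-wise means
print(readings.max())         # overall maximum: 95.0
print(readings * 1.1)         # element-wise scaling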

5.6 MATPLOTLIB:
Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
There is also a procedural "pylab" interface based on a state machine (like OpenGL),
designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes
use of Matplotlib.
Matplotlib was originally written by John D. Hunter; it has an active development
community and is distributed under a BSD-style license. Michael Droettboom was nominated
as Matplotlib's lead developer shortly before John Hunter's death in August 2012, and was
further joined by Thomas Caswell.
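
A minimal plotting sketch using the object-oriented API described above (our own example;
the values are made up for illustration):

import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
pm10 = [92, 85, 78, 70, 66, 61]   # illustrative monthly averages

# The object-oriented API: a Figure containing one Axes.
fig, ax = plt.subplots()
ax.plot(months, pm10, marker='o')
ax.set_xlabel('Month')
ax.set_ylabel('PM10 (µg/m3)')
ax.set_title('Monthly average PM10 (illustrative data)')
plt.show()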

5.7 PYTHON
What exactly is Python? You may be wondering about that. You may have turned to this
chapter because you wish to learn programming but are not familiar with programming
languages. Alternatively, you may be familiar with programming languages such as C, C++,
C#, or Java and wish to learn more about the Python language and how it compares to these
"big name" languages.

5.8 PYTHON CONCEPTS


You can skip to the next chapter if you are not interested in the how and why of Python. In
this chapter, I will try to explain why I think Python is one of the best programming
languages available and why it is such a great place to start.

Python was developed to be an easy-to-use programming language. It uses English keywords
where other languages use punctuation, and it has fewer syntactic constructions than other
languages. Python is a high-level, interpreted, interactive, and object-oriented language.
Python is interpreted - Python is processed at runtime by the interpreter; you do not need
to compile your program before executing it. This is similar to the PERL and PHP scripting
languages.
Python is interactive - You can sit at a Python prompt and interact with the interpreter
directly to write your programs.
Python is object-oriented - Python supports the object-oriented style or technique of
programming, which encapsulates code within objects.
Python is a beginner's language - Python is an excellent language for beginners, as it
allows for the creation of a variety of programs, from simple text applications to web
browsers and games.
5.8.1 Python Features
Python features include -
1. Easy to learn - Python has a small number of keywords, a simple structure, and a
clearly defined syntax. This allows the student to learn the language quickly.
2. Easy to read - Python code is clearly defined and easy to follow.
3. Easy to maintain - Python source code is easy to maintain.
4. A broad standard library - the bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
5. Interactive mode - Python supports an interactive mode that allows interactive testing
and debugging of snippets of code.
6. Portable - Python runs on a wide variety of computer systems and has the same interface
on all of them.
7. Extensible - Low-level modules can be added to the Python interpreter. These modules
allow developers to improve the efficiency of their tools either by extending or
customizing them.
8. Databases - Python provides interfaces to all major commercial databases.
9. GUI programming - Python supports GUI applications that can be created and ported to
many system calls, libraries, and windowing systems, including Windows MFC, Macintosh, and
the X Window system of Unix.
10. Scalable - Python provides a better structure and support for large programs than
shell scripting does.

Aside from the characteristics stated above, Python offers a long list of useful features,
some of which are described below. −
 It supports OOP as well as functional and structured programming methodologies.
 It can be used as a scripting language or compiled into byte-code for large-scale
application development.
 It allows dynamic type checking and provides very high-level dynamic data types.
 Automatic garbage collection is supported.

ADVANTAGES/BENEFITS OF PYTHON:
The diverse application of the Python language is a result of the combination of features
which give this language an edge over others. Some of the benefits of programming in
Python include:
1. Presence of Third-Party Modules:
The Python Package Index (PyPI) contains numerous third-party modules that make Python
capable of interacting with most of the other languages and platforms.
2. Extensive Support Libraries:
Python provides a large standard library which includes areas like internet protocols, string
operations, web services tools and operating system interfaces. Many high use programming
tasks have already been scripted into the standard library which reduces length of code to be
written significantly.
3. Open Source and Community Development:
Python language is developed under an OSI-approved open-source license, which makes it
free to use and distribute, including for commercial purposes. Further, its development is
driven by the community which collaborates for its code through hosting conferences and
mailing lists, and provides for its numerous modules.
4. Learning Ease and Support Available:
Python offers excellent readability and uncluttered simple-to-learn syntax which helps
beginners to utilize this programming language. The code style guidelines, PEP 8, provide a
set of rules to facilitate the formatting of code. Additionally, the wide base of users and active
developers has resulted in a rich internet resource bank to encourage development and the
continued adoption of the language.
5. User-friendly Data Structures:

Python has built-in list and dictionary data structures which can be used to construct fast
runtime data structures. Further, Python also provides the option of dynamic high-level data
typing which reduces the length of support code that is needed.
6. Productivity and Speed:
Python has a clean object-oriented design, provides enhanced process control capabilities,
and possesses strong integration and text processing capabilities and its own unit testing
framework, all of which contribute to the increase in its speed and productivity. Python is
considered a viable option for building complex multi-protocol network applications. As can
be seen from the above-mentioned points, Python offers a number of advantages for software
development. As upgrading of the language continues, its loyalist base could grow as well.
Python has five standard data types:
 Numbers
 String
 List
 Tuple
 Dictionary

PYTHON NUMBERS
Number data types store numeric values. A number object is created when you assign a value
to it.
PYTHON STRINGS
In Python, a string is defined as a contiguous set of characters enclosed in quotation
marks; Python allows pairs of either single or double quotes. Subsets of a string can be
taken with the slice operator ([] and [:]), with indexes starting at 0 at the beginning of
the string and working their way to -1 at the end.
PYTHON LISTS
Lists are the most versatile of Python's data types. A list contains items separated by
commas and enclosed within square brackets ([]). To some extent, lists are similar to
arrays in C; one difference between them is that a list can contain items of different
data types.
The values stored in a list can be accessed using the slice operator ([] and [:]), with
indexes starting at 0 at the beginning of the list and working their way to the end. The
plus sign (+) is the list concatenation operator, and the asterisk (*) is the repetition
operator.

37
PYTHON TUPLES
A tuple is another sequence data type that is similar to the list. A tuple consists of a
number of values separated by commas. Unlike lists, however, tuples are enclosed within
parentheses: lists are enclosed in square brackets ([]) and their elements and size can be
changed, while tuples are enclosed in parentheses (()) and cannot be updated. Tuples can
best be thought of as read-only lists.

PYTHON DICTIONARY
Python's dictionaries are a kind of hash-table type. They work like associative arrays or
hashes found in Perl, and consist of key-value pairs. A dictionary key can be almost any
Python type, but is usually a number or a string; values, on the other hand, can be any
arbitrary Python object. Dictionaries are enclosed by curly braces ({}), and values can be
assigned and accessed using square braces ([]).
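
The following short sketch (our own illustration) shows all five standard data types in
use:

# Numbers
count = 10

# Strings, with slicing
city = "Chennai"
print(city[0:4])        # prints 'Chen'

# Lists: mutable, may mix data types
pollutants = ["SO2", "NO2", "PM10", 100]
pollutants[3] = 120     # list elements can be updated

# Tuples: like lists, but read-only
limits = (80, 80, 100)

# Dictionaries: key-value pairs in curly braces, accessed with square braces
aqi_category = {1: "GOOD", 2: "SATISFACTORY", 3: "MODERATE", 4: "POOR"}
print(aqi_category[3])  # prints 'MODERATE'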
Different modes in Python
Python has two basic modes: normal and interactive. Normal mode is the mode in which
scripted and finished .py files are run in the Python interpreter.
Interactive mode is a command-line shell which gives immediate feedback for each statement,
while simultaneously running previously provided statements in active memory.
As new lines are fed into the interpreter, the fed program is evaluated both in part and in
whole.
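
For example, a short interactive-mode session might look like this (an illustrative sketch
only):

>>> pm10 = 95
>>> pm10 > 100            # compare against the 24-hour PM10 standard of 100 µg/m3
False
>>> round(pm10 / 100, 2)  # fraction of the permissible limit
0.95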

PANDAS

Pandas is an open-source library that is built on top of the NumPy library. It is a Python
package that offers various data structures and operations for manipulating numerical data
and time series. It is popular mainly because it makes importing and analyzing data much
easier. Pandas is fast and offers high performance and productivity to its users.
Pandas is a python package providing fast, flexible, and expressive data structures designed
to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real-world data analysis in
Python. Additionally, it has the broader goal of becoming the most powerful and flexible
open-source data analysis/manipulation tool available in any language. It is already well
on its way toward this goal.
Explore data analysis with Python. Pandas Data Frames make manipulating your data easy,
from selecting or replacing columns and indices to reshaping your data.
Pandas is well suited for many different kinds of data:

Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet.
Ordered and unordered (not necessarily fixed-frequency) time series data.
Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.
Any other form of observational/statistical data sets. The data need not be labeled at all
to be placed into a Pandas data structure.
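
A small sketch (our own illustration; the column names are hypothetical) of the DataFrame
operations described above:

import pandas as pd

# Tabular data with heterogeneously-typed columns.
df = pd.DataFrame({
    'Month': ['january', 'february', 'march'],
    'PM10': [92.0, 85.5, 78.2],
    'AQI': ['MODERATE', 'MODERATE', 'SATISFACTORY'],
})

print(df.head())                    # first rows of the table
print(df['PM10'].mean())            # column statistics
print(df[df['AQI'] == 'MODERATE'])  # select rows by condition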

KERAS
KERAS is an API designed for human beings, not machines. Keras follows best practices for
reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user
actions required for common use cases, and it provides clear & actionable error messages.
It also has extensive documentation and developer guides. Keras contains numerous
implementations of commonly used neural network building blocks such as layers,
objectives, activation functions, optimizers, and a host of tools to make working with image
and text data easier to simplify the coding necessary for writing deep neural network code.
The code is hosted on GitHub, and community support forums include the GitHub issues
page, and a Slack channel. Keras is a minimalist Python library for deep learning that can
run on top of Theano or TensorFlow.
It was developed to make implementing deep learning models as fast and easy as possible for
research and development.

FOUR PRINCIPLES:
 Modularity: A model can be understood as a sequence or a graph alone. All the
concerns of a deep learning model are discrete components that can be combined in
arbitrary ways.
 Minimalism: The library provides just enough to achieve an outcome, no frills and
maximizing readability.
 Extensibility: New components are intentionally easy to add and use within the
framework, intended for researchers to trial and explore new ideas.
 Python: No separate model files with custom file formats. Everything is native
Python. Keras is designed for minimalism and modularity allowing you to very
quickly define deep learning models and run them on top of a Theano or TensorFlow
backend.
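
As an illustration of this minimalism, the sketch below (our own example; the layer sizes
are assumptions, and the seven inputs and four AQI classes mirror this project's data)
defines and compiles a small classifier in a few lines:

from tensorflow import keras
from tensorflow.keras import layers

# 7 input features (year, month, season, location, SO2, NO2, PM10)
# mapped to 4 AQI classes (GOOD, SATISFACTORY, MODERATE, POOR).
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(7,)),
    layers.Dense(4, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

# Training would then be a single call, with class labels encoded 0-3, e.g.:
# model.fit(X_train, Y_train, epochs=50, batch_size=8)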

CHAPTER 6

RESULTS AND DISCUSSION

The values of the monthly PM10, SO2 and NO2 concentrations measured in the major cities
of Tamil Nadu during the study period (2020-2021) are shown in Tables 5 and 6 respectively:
a moderate level was noted in Tamil Nadu during the period 2007-2013 and a satisfactory
level during the period 2020-2021. The high concentrations may be even more problematic in
terms of human health, due to the fact that most of the particles in these locations are
derived from anthropogenic sources. Concentrations tended to increase at locations in
various cities of Tamil Nadu, where they reached the moderate level during 2020 to 2021.
On the contrary, the concentrations decreased significantly in Chennai-Adayar (Table 5).
The higher PM10 concentrations might be due to suspension of road dust and soil dust,
vehicular emissions, and poor road conditions. Table 7 presents a summary of the monthly
averaged concentrations of NO2, SO2 and PM10 monitored during the whole period of the
study. It can be seen that the concentration levels are well below the prescribed Indian
air quality index limits; air pollution in Tamil Nadu was under the permissible level. The
level of PM10 was found beyond the permissible limit (satisfactory to moderate level) at
some locations, but SO2 and NO2 were below the permissible limit at all the stations.

Accuracy is the number of correct predictions divided by the total number of predictions
made for a dataset. It can tell us immediately whether a model is being trained correctly
and how it may perform in general; nevertheless, it does not give detailed information
concerning its application to the problem. Precision, also called positive predictive value
(PPV), is a suitable measure when the cost of false positives is high. Recall is the metric
used to select the best model when there is a high cost linked with false negatives; recall
helps when the cost of false negatives is high. The F1-score is required when you want to
strike a balance between precision and recall. It is a general measure of the accuracy of
the model that combines precision and recall: a good F1-score means both low false
positives and low false negatives.
True Positives (TP):- These are the correctly predicted positive values, which means that the
value of the actual class is yes and the value of the predicted class is also yes.
True Negatives (TN):- These are the correctly predicted negative values, which means that
the value of the actual class is no and value of the predicted class is also no.
False positives and false negatives occur when the actual class contradicts the predicted
class.
False Positives (FP):- When the actual class is no and the predicted class is yes.
False Negatives (FN):- When the actual class is yes but the predicted class is no.

Where,
 True positive (TP) = correctly identified
 False positive (FP) = incorrectly identified
 True negative (TN) = correctly rejected
 False negative (FN) = incorrectly rejected
Precision
Precision measures the number of positive class predictions that actually belong to the
positive class.
Precision = TP / (TP + FP)
Recall
Recall measures the number of positive class predictions made out of all positive samples
in the dataset.
Recall = TP / (TP + FN)
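
For completeness, the other two measures discussed above can be written in the same
notation (these are the standard definitions):
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)

In code, all of these metrics can be obtained directly from scikit-learn. A minimal sketch
(our own illustration; it assumes true test labels Y_test are available, which the appendix
code does not load):

from sklearn.metrics import accuracy_score, classification_report

# Y_test: true AQI classes for the test set (assumed available).
# Y_pred: predictions from any fitted classifier in the appendix.
print(accuracy_score(Y_test, Y_pred))
print(classification_report(Y_test, Y_pred))  # per-class precision, recall, F1-score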

The air quality categories of the air pollutants (PM10, SO2, NO2) of the major cities of
Tamil Nadu, based on the AQI, are presented in Table 5 for a few cities categorized by type
of location: Chennai-Adayar (residential area), Coimbatore (mixed zone), Chennai-T.Nagar
(commercial area), Madurai (mixed zone) and Trichy (traffic zone). The AQI values in this
project were calculated using the concentrations of PM10, SO2 and NO2 (by the standard
formula mentioned under data processing). Air pollution in many cities was under the
control level. The level of PM10 at all locations ranged from satisfactory to moderate,
while the levels of SO2 and NO2 were below the permissible limit at all locations.

APPENDIX – A (SOURCE CODE)


# In[ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
get_ipython().magic(u'matplotlib inline')
from matplotlib.gridspec import GridSpec
from matplotlib.offsetbox import AnchoredText
aqchtr = pd.read_csv('G:/air pollution chennai with AQI train.csv',encoding='latin1')
aqchte = pd.read_csv('G:/air pollution chennai with AQI test.csv',encoding='latin1')
sns.set(style="white", color_codes=True)
aqchtr.head()

# In[5]:

aqchte.head()

# In[6]:
aqchtr['AQI'].value_counts()

# In[7]:
aqchte['AQI'].value_counts()

# In[8]:
Loc_Data = aqchtr[["Year",'Month','Season','Location',"SO2","NO2 ","PM10",'AQI']]

Loc_Data.head()

# In[9]:
Loc_Datate = aqchte[["Year",'Month','Season','Location',"SO2 ","NO2 ","PM10 ",'AQI']]
Loc_Datate.head(12)

# In[15]:
aqchtr['AQI'].groupby(aqchtr['Season']).describe()
# In[16]:
aqchtr['AQI'].groupby(aqchtr['Location']).describe()

# In[17]:
aqchtr['AQI'].groupby(aqchtr['Year']).describe()

# In[18]:
aqchtr['AQI'].groupby(aqchtr['Month']).describe()

# In[19]:
aqchtr['AQI'].groupby(aqchtr['PM10']).describe()

# In[13]:
aqchtr.describe(include=['object'])

# In[32]:
aqchte.describe(include=['object'])
# In[10]:
aqchtr['AQI'].groupby(aqchtr['Location']).describe()

# In[34]:
import seaborn as sns
import matplotlib.pyplot as plt
get_ipython().magic(u'matplotlib inline')

# Count of records in each AQI category in the training set


AQIcounts=aqchtr['AQI'].value_counts()
print(AQIcounts)
# Using a matplotlib pie chart, labelling slices from the actual category order
plt.pie(AQIcounts, labels=AQIcounts.index);

# In[35]:
import seaborn as sns
import matplotlib.pyplot as plt
get_ipython().magic(u'matplotlib inline')

# Count of records in each AQI category in the test set


AQIcounts1=aqchte['AQI'].value_counts()
print(AQIcounts1)

# Using a matplotlib pie chart, labelling slices from the actual category order
plt.pie(AQIcounts1, labels=AQIcounts1.index);

# In[57]:
clsal22 = {"Month": {"january": 1, "february":2, "march": 3, "april": 4,"may": 5,"june":
6,"july": 7,"august": 8,"september": 9,"october": 10,"november": 11,"december": 12}}
Loc_Data.replace(clsal22,inplace= True)
Loc_Data.head()

# In[58]:
Loc_Data.head()
# In[59]:
clsal222 = {"Season": {"winter": 1, "summer":2, "monsoon": 3}}
Loc_Data.replace(clsal222,inplace= True)
Loc_Data.head()

# In[60]:
clsal1219 = {"AQI": {"GOOD": 1, "SATISFACTORY":2, "MODERATE": 3,"POOR": 4}}
Loc_Data.replace(clsal1219,inplace= True)
Loc_Data.head()

# In[61]:
clsal12019 = {"Location": {"anna nagar": 1, "adayar":2, "T.nagar": 3,"kilpauk": 4}}
Loc_Data.replace(clsal12019,inplace= True)
Loc_Data.head()

# In[62]:
clsal2262 = {"Month": {"january": 1, "february":2, "march": 3, "april": 4,"may": 5,"june":
6,"july": 7,"august": 8,"september": 9,"october": 10,"november": 11,"december": 12}}
Loc_Datate.replace(clsal2262,inplace= True)
Loc_Datate.head()

# In[63]:
clsal2221 = {"Season": {"winter": 1, "summer":2, "monsoon": 3}}
Loc_Datate.replace(clsal2221,inplace= True)
Loc_Datate.head()

# In[64]:
clsal1201 = {"Location": {"anna nagar": 1, "adayar":2, "T.nagar": 3,"kilpauk": 4}}
Loc_Datate.replace(clsal1201,inplace= True)
Loc_Datate.head()

# In[65]:

clsal12193 = {"AQI": {"GOOD": 1, "SATISFACTORY":2, "MODERATE": 3,"POOR": 4}}
Loc_Datate.replace(clsal12193,inplace= True)
Loc_Datate.head()

# In[67]:
X_train = Loc_Data.drop("AQI", axis=1)
Y_train = Loc_Data["AQI"]
X_test = Loc_Datate.drop("AQI", axis=1).copy()  # drop the target so test features match X_train
X_train.shape, Y_train.shape, X_test.shape

# In[68]:
from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train, Y_train)
Y_pred = svc.predict(X_test)
acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
acc_svc
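# Note: score(X_train, Y_train) here and in the cells below reports accuracy on the
# training data; labelled test data would be needed to measure true test accuracy.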

# In[69]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
Y_pred = logreg.predict(X_test)
acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
acc_log

# In[70]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(X_train, Y_train)
Y_pred = knn.predict(X_test)
acc_knn = round(knn.score(X_train, Y_train) * 100, 2)
acc_knn

# In[71]:
from sklearn.naive_bayes import GaussianNB
gaussian = GaussianNB()
gaussian.fit(X_train, Y_train)
Y_pred = gaussian.predict(X_test)
acc_gaussian = round(gaussian.score(X_train, Y_train) * 100, 2)
acc_gaussian

# In[72]:
from sklearn.linear_model import Perceptron
perceptron = Perceptron()
perceptron.fit(X_train, Y_train)
Y_pred = perceptron.predict(X_test)
acc_perceptron = round(perceptron.score(X_train, Y_train) * 100, 2)
acc_perceptron

# In[73]:
from sklearn.svm import SVC, LinearSVC
linear_svc = LinearSVC()
linear_svc.fit(X_train, Y_train)
Y_pred = linear_svc.predict(X_test)
acc_linear_svc = round(linear_svc.score(X_train, Y_train) * 100, 2)
acc_linear_svc

# In[74]:
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier()
sgd.fit(X_train, Y_train)
Y_pred = sgd.predict(X_test)
acc_sgd = round(sgd.score(X_train, Y_train) * 100, 2)
acc_sgd

# In[75]:
from sklearn.tree import DecisionTreeClassifier
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
Y_pred = decision_tree.predict(X_test)
acc_decision_tree = round(decision_tree.score(X_train, Y_train) * 100, 2)
acc_decision_tree

# In[76]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)
Y_pred = random_forest.predict(X_test)
random_forest.score(X_train, Y_train)
acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)
acc_random_forest

# In[77]:
models = pd.DataFrame({
'Model': ['Support Vector Machines', 'KNN', 'Logistic Regression',
'Random Forest', 'Naive Bayes', 'Perceptron',
'Stochastic Gradient Decent', 'Linear SVC',
'Decision Tree'],
'Score': [acc_svc, acc_knn, acc_log,
acc_random_forest, acc_gaussian, acc_perceptron,
acc_sgd, acc_linear_svc, acc_decision_tree]})
models.sort_values(by='Score', ascending=False)

# In[90]:
sns.set_color_codes("muted")
sns.barplot(x='Score', y='Model', data=models, color="b")
plt.xlabel('Score %')
plt.title('Classifier Accuracy')
plt.show()

CHAPTER 7

CONCLUSION
AQI of cities and towns in Tamil Nadu reveals that PM10 is the main contributor for
higher value of index. SO2 and NO2 are well within the NAAQ standards for 24 hours. The
higher value of PM10 is mainly due to vehicular pollution. Vehicular emissions are of
particular concern because these are ground level sources and thus have the maximum impact
on general population. Also, vehicles contribute significantly to the total air pollution load in
many urban areas. It is to be noted that the AQI system is based on a maximum-operator
function, selecting the maximum of the sub-indices of the various pollutants as the overall
AQI. Ideally, eight parameters (i.e., PM10, PM2.5, NO2, SO2, CO, O3, NH3 and Pb) having
short-term standards should be considered for near real-time dissemination of the AQI.
The regulating agencies should establish source-receptor relationships in terms of
impact of emissions on air quality. Adopting comprehensive policies in an integrated manner
and addressing the root causes rather than focusing on issues in isolation and seeking
remedies is the key to managing air quality in urban areas. In case AQI category is severe or
very poor, necessary steps need to be taken by further regulating the emissions which are
causing maximum impact to ambient air quality. Specific actions, such as (i) strict vigilance
and no-tolerance to visible polluting vehicles, industries, open burning, construction activities
etc.; (ii) regulating traffic; and (iii) identifying sources contributing significantly to rising air
quality levels and actions for reducing emissions from such sources are to be taken. In the
cities well-constructed clean roads, flyovers, cleaner transport fuel will reduce the ambient air
pollution level. AQI is an initiative intended to enhance public awareness and involvement in
efforts to improve air quality. People can contribute by maintaining vehicles properly
(e.g. getting PUC checks, replacing the car air filter, maintaining the right tyre
pressure), following lane discipline and speed limits, and avoiding prolonged idling by
turning off engines at red traffic signals.

FUTURE ENHANCEMENT

 The India Meteorological Department wants to automate the detection of whether the air
quality is good or not, as a real-time process.
 To automate this process by showing the prediction results in a web application or
desktop application.
 To optimize the work for implementation in an Artificial Intelligence environment.

REFERENCES

[1] National Air Quality Index, Control of Urban Pollution Series: CUPS/82/2014-15,
Central Pollution Control Board (Ministry of Environment, Forest & Climate Change),
New Delhi.

[2] Lidia Contreras and Cesar Ferri, Wind-Sensitive Interpolation of Urban Air Pollution
Forecasts, Elsevier, ICCS 2016.

[3] CPCB, 2009. National Ambient Air Quality Standards. CPCB No. B-29016/20/90/PCI-L.

[4] T. Senthilnathan, Measurements of Urban Ambient Air Quality of Chennai City, Indian
Journal of Air Pollution Control, Vol. VIII, No. I, March 2008, pp. 35-47.

[5] Gorunescu, F., 2011. Data Mining: Concepts, Models and Techniques, Intelligent Systems
Reference Library. Springer-Verlag, Heidelberg. http://dx.doi.org/10.1007/978-3-642-19721-5

[6] Akpinar, E.K., Akpinar, S., Oztop, H.F., 2009. Statistical analysis of meteorological
factors and air pollution at winter months in Elazig, Turkey. Journal of Urban and
Environmental Engineering 3, 7-16.

[7] Suresh Kumar Sharma, Vinod Sharma, Comparative Analysis of Machine Learning Techniques
in Sales Forecasting, International Journal of Computer Applications (0975-8887),
Volume 53, No. 6, September 2012.

[8] Kumar, A., Goyal, P., 2011. Forecasting of air quality in Delhi using principal
component regression technique. Atmospheric Pollution Research 2, 436-444.

[9] Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.

[10] Krzysztof Siwek, Stanislaw Osowski, Data Mining Methods for Prediction of Air
Pollution, International Journal of Applied Mathematics and Computer Science, 2016,
Vol. 26, No. 2, 467-478.

[11] I.N. Athanasiadis, K. Karatzas and P. Mitkas, Contemporary Air Quality Forecasting
Methods: A Comparative Analysis between Classification Algorithms and Statistical Methods,
5th International Conference on Urban Air Quality Measurement, Modelling and Management
(Sokhi, R. and Brexhler, J., Eds.), Valencia, Spain, March 2005.

[12] Ruiyun Yu, Yu Yang, Leyou Yang, Guangjie Han and Oguti Ann Move, RAQ: A Random Forest
Approach for Predicting Air Quality in Urban Sensing Systems, MDPI Sensors 2016, 16, 86.

[13] Josip Mesaric and Dario Sebalj, Decision trees for predicting the academic success of
students, Croatian Operational Research Review (CRORR) 7 (2016), 367-388.

[14] Asha B. Chelani, D.G. Gajghate and M.Z. Hasan, Prediction of Ambient PM10 and Toxic
Metals Using Artificial Neural Networks, Journal of the Air & Waste Management Association,
2126-2906.

[15] Robert Damo, Pirro Icka, A Comparative Study of Ambient Air Quality Status in Major
Cities of Albania, The Annals of Valahia University of Targoviste, 2014.

[16] Jan Kleine Deters, Rasa Zalakeviciute, Mario Gonzalez and Yves Rybarczyk, Modeling
PM2.5 Urban Pollution Using Machine Learning and Selected Meteorological Parameters,
Hindawi Journal of Electrical and Computer Engineering, Volume 2017.

[17] Fahimeh Hosseinibalam and Azadeh, Influence of Meteorological Parameters on Air
Pollution in Isfahan, 3rd International Conference on Biology, Environment and Chemistry,
IPCBEE, 46, 7-12 (2012).

[18] Madhukar R and Srikantaswamy S, Assessment of Air Quality in Bidadi Industrial Area,
Ramanagaram District, Karnataka, India, Journal of Environmental Science, Computer Science
and Engineering and Technology, 2(4), 1135-1140 (2013).

[19] Carrie V. Breton and Amy N. Marutani, Air Pollution and Epigenetics: Recent Findings,
Current Environmental Health Reports 1, 35-45, DOI 10.1007/s40572-013-0001-9 (2014).

[20] Tanushree Bhattacharya, Leena Kriplani, Sukalyan Chakraborty, Seasonal Variation in
Air Pollution Tolerance Index of Various Plant Species of Baroda City, Universal Journal
of Environmental Research and Technology, 3(2), 199-208 (2013).