
Deep Learning Implementation for Air Quality Prediction

December 20, 2022

Sakina A. ASHOUR

Istanbul Aydin University, Institute of Graduate Studies


Computer Engineering Program, Artificial Intelligence Course

saliashour@stu.aydin.edu.tr

ABSTRACT

This paper presents an example of applying a deep learning approach to prediction. Air quality is predicted
from meteorological information (Dew, temperature, pressure, and wind speed). The model predicts whether the air
is polluted or unpolluted (clean) and reports the result as a binary label (0 = clean and 1 = polluted).
The model is built from dense layers and implemented in Python. It has five dense layers in total: three layers
with 20 nodes each, one layer with 10 nodes, and an output layer with two nodes. 70% of the sampled data is used
to train the model, while the remaining 30% is used for testing. The model is a Sequential model, constructed as
a stack of layers. On the test data, the model's prediction confidence exceeds 90%, which indicates that it is
robust for this kind of prediction.

Keywords: Deep learning, dense layers, air quality prediction, Sequential model, Python.

1. INTRODUCTION
Deep learning (DL) is currently receiving a lot of attention as an effective approach within the
field of artificial intelligence. Importantly, deep learning has been applied successfully to
several application problems, and it has been demonstrated that deep learning applications improve
accuracy and returns in forecasting and prediction. The popularity of deep learning has led to
improvements in the neural networks used in these applications. Deep learning does not describe a
single technique for learning large prediction models, such as multi-layer neural networks with
several hidden units; rather, it provides a variety of learning techniques [1].
In this project, the deep learning dense layers technique is used to predict air pollution based on
a set of pre-recorded data. Air quality data (particulate matter, PM2.5) is analyzed in combination
with meteorological data.

2. RELATED WORK
Artificial intelligence is applied in many domains. One such application is an algorithm that
identifies retinopathy conditions from retinal image data [2]. The use of deep learning for
prediction has been covered in a number of studies and publications. Long short-term memory
(LSTM) networks have recently been classified among the basic structures of deep learning [3].
Moreover, the multi-layer perceptron (MLP) classifier, the random forest classifier (RFC), and the
decision tree (DT) achieved the highest accuracies, 88.2%, 83.3%, and 82.5% respectively [4]. The
support vector machine (SVM), however, was found to be less accurate, at 80.5% [4]. The
effectiveness of hidden dense layers in convolutional neural networks has been studied with the aim
of enhancing the performance of classification models [5]. A deep learning model with fully
connected layers has also been applied to vehicle classification on highways, which is an example
of implementing dense layers in classification [6].
Meteorological factors have significant impacts on the dilution and diffusion of air pollutants, and
further affect the distribution and concentration of pollutants. For example, it was reported that the
wind and rainfall could strongly affect the concentration of PM2.5, while relative humidity did not
have similar effects [7], [8], [9]. Meteorological conditions play a crucial role in ambient air
pollution by affecting both directly and indirectly the emissions, transport, formation, and
deposition of air pollutants [10]. Meteorological conditions were the primary factor determining
day-to-day variations in pollutant concentrations, explaining more than 70% of the variance of
daily average pollutant concentrations over China [11].
Particulate matter (PM) is a complex mixture of extremely small particles and liquid droplets
(classified as PM2.5 and PM10). Particle pollution is made up of a number of components,
including acids (such as nitrates and sulfates), organic chemicals, metals, and soil or dust particles.
PM can be smoke, dirt, and dust from factories, farming, and roads, or it can be formed by crushing
and grinding rocks and soil that are then blown by the wind [12]. The air quality in cities varies
depending on several factors, one of which is the meteorological properties of the region [13].

3. METHODOLOGY: DEEP LEARNING


The method applied in this work is an artificial neural network deep learning model, constructed
from a number of fully connected hidden layers. The model is implemented in Python and uses several
library functions to generate a prediction by analyzing the given data.
The data used in this project is a CSV file [14] consisting of observations of PM2.5 (particulate
matter with diameters that are generally 2.5 micrometers and smaller). These values are recorded
alongside meteorological data (Dew, temperature, pressure, and wind speed). The data was collected
on a daily basis in Beijing, China in 2010. PM2.5 is classified according to the air quality index
[15].
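
The binary target can be derived from the PM2.5 readings with a simple threshold. The following is
a minimal sketch only: the text does not state which AQI breakpoint separates clean from polluted,
so the cut-off of 35.4 µg/m³ (the upper bound of the "Moderate" category in the AQI document [15])
and the names `ASSUMED_THRESHOLD` and `label_pm25` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical cut-off in µg/m³; the paper does not specify the value it used.
ASSUMED_THRESHOLD = 35.4

def label_pm25(pm25_values, threshold=ASSUMED_THRESHOLD):
    """Return 1 (polluted) where PM2.5 exceeds the threshold, else 0 (clean)."""
    pm25_values = np.asarray(pm25_values, dtype=float)
    return (pm25_values > threshold).astype(int)

# Made-up readings for demonstration, not taken from the dataset.
labels = label_pm25([8.0, 35.4, 36.0, 129.0])
print(labels)
```

Keeping the threshold as a parameter makes it easy to substitute whichever AQI category boundary
was actually used.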

4. PROPOSED MODEL
The model in this project is implemented in Python. It predicts air quality in terms of whether
the air is polluted or unpolluted (clean). The model is trained on 70% of the sampled data, and the
remaining 30% is used for testing. The model is described in detail in the following
subsections.
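
The split can be sketched with shuffled row indices. This is an illustrative sketch rather than the
code used in the project: the function name `split_indices`, the random seed, and the rounding rule
are assumptions, and the second stage reflects the validation split described in Section 4.5.

```python
import numpy as np

def split_indices(n_rows, test_fraction=0.3, val_fraction=0.3, seed=0):
    """Shuffle row indices, hold out a test set, then hold out a validation set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)

    # Stage 1: 30% of all rows are reserved for testing.
    n_test = int(round(n_rows * test_fraction))
    test_idx, remaining = order[:n_test], order[n_test:]

    # Stage 2: 30% of the remaining (training) rows are used for validation.
    n_val = int(round(len(remaining) * val_fraction))
    val_idx, train_idx = remaining[:n_val], remaining[n_val:]
    return train_idx, val_idx, test_idx

# 7873 is the number of rows reported for the dataset.
train_idx, val_idx, test_idx = split_indices(7873)
print(len(train_idx), len(val_idx), len(test_idx))
```

With this rounding rule the three parts are disjoint and together cover every row; a library
splitter may round the part sizes slightly differently.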
4.1 Dense Layers
Since the dense layer is the layer most frequently employed in such models, and an efficient one,
it is preferred for this model. A dense layer is a fully connected neural network layer: each
neuron in it receives input from all the neurons in the preceding layer. Internally, the dense
layer performs a matrix-vector multiplication.
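
The matrix-vector multiplication behind a dense layer can be written out in a few lines of NumPy.
This is a framework-free sketch for illustration; the function names, the random weights, and the
sample input are assumptions and do not come from the trained model.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative values are set to zero."""
    return np.maximum(z, 0.0)

def dense_forward(x, weights, bias, activation=None):
    """Compute activation(weights @ x + bias) for a single input vector x."""
    z = weights @ x + bias
    return activation(z) if activation is not None else z

rng = np.random.default_rng(0)
x = np.array([0.2, -1.0, 0.5, 1.3])   # four made-up, already-scaled input features
W = rng.normal(size=(20, 4))          # one row of weights per node in the layer
b = np.zeros(20)                      # one bias per node

h = dense_forward(x, W, b, activation=relu)
print(h.shape)
```

Each of the 20 rows of `W` holds the weights of one node, so every node sees all four inputs,
which is what "fully connected" means.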
4.2 Importing the Libraries and Data
The libraries and functions required for this project are imported: NumPy, pandas, Keras.models,
Keras.utils, Sklearn.model_selection, and the preprocessing function, which is imported from the
Sklearn library. The data is loaded into the program as a CSV file consisting of five (5) columns
and 7873 rows.
4.3 Data Normalization
Since the value ranges of the columns in this dataset differ widely (some columns contain numbers
in the thousands, while others contain small negative numbers), the model applies the preprocessing
function of the Sklearn library. This preprocessing step converts every column of numeric values in
the dataset to a common scale, without losing information or distorting the differences in the
value ranges.
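
One common way to bring columns to a common scale is standardization (zero mean and unit variance
per column). The text does not say which scaling the preprocessing step applies, so the sketch
below is an assumption; the function name and the sample rows are made up for illustration.

```python
import numpy as np

def standardize_columns(data):
    """Scale each column to zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unchanged instead of dividing by zero
    return (data - mean) / std

# Made-up rows: Dew, temperature, pressure, wind speed.
sample = np.array([
    [-16.0, -4.0, 1020.0, 1.79],
    [-11.0, -5.0, 1021.0, 6.71],
    [ -7.0, -6.0, 1034.0, 9.84],
    [-20.0,  1.0, 1017.0, 0.89],
])
scaled = standardize_columns(sample)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, the pressure column (values in the thousands) and the temperature column (small
negative values) occupy comparable ranges, so neither dominates the weighted sums in the first
dense layer.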
4.4 Building the Model
The model employed is a Sequential model, which is constructed as a stack of layers. It is imported
from the Keras models library. To apply deep learning in the suggested model, four (4) hidden
layers were created in this project, in addition to the four-node input layer. Each of the first
three hidden layers contains twenty (20) nodes, the fourth hidden layer has ten (10) nodes, and the
output layer has two (2) nodes. One node is for the output when it is one (1 = Polluted), and the
other is for the output when it is zero (0 = Clean). The model is illustrated in Figure 1 and
summarized in Table 1.
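
The size of this architecture can be checked with a little arithmetic. The sketch below assumes
that every dense layer has one bias per node (the usual default, which the text does not state) and
counts the trainable parameters layer by layer; `layer_sizes` and `dense_param_count` are names
introduced here for illustration.

```python
# Layer widths as described in the text: 4 inputs followed by the five dense layers.
layer_sizes = [4, 20, 20, 20, 10, 2]

def dense_param_count(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output node."""
    return n_inputs * n_outputs + n_outputs

per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total = sum(per_layer)
print(per_layer, total)
```

Under these assumptions, a stack of fully connected layers with the same widths would have this
many trainable parameters, which gives a quick sanity check against a model summary.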

Figure (1): Structure of the model designed for air quality prediction

Table (1): Summary of the model construction

4.5 Compiling the Model


The sigmoid function is used as the activation function in the last (output) layer, and every node
of the hidden layers uses the ReLU activation function. Categorical cross-entropy is applied as the
cost function, since prediction in this work is a classification task. The optimizer used is Adam.
Prior to training, the data allocated for training (70% of the whole dataset) is divided into two
parts: 30% of it is used for validation and the rest is used to fit the model.
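
With a two-node output, the 0/1 labels are typically one-hot encoded before the categorical
cross-entropy is computed. The following NumPy sketch shows the textbook form of both steps; it is
an illustration under that assumption, not the library implementation, and the predictions are
made-up values.

```python
import numpy as np

def one_hot(labels, num_classes=2):
    """Turn integer labels (0 = clean, 1 = polluted) into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes)[labels]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

y_true = one_hot([0, 1])                      # [[1, 0], [0, 1]]
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])   # made-up network outputs
loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```

Only the predicted value at the position of the true class contributes to the loss, so the loss
falls as the network assigns higher values to the correct node.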
4.6 Results
The following outcomes were obtained when the model was trained on the training data; only the
last five epochs are summarized in this article (Table 2).
Table (2): Results of the last five training epochs

The validation loss over the training epochs is shown in Figure 2.

Figure (2): Validation loss plotted against the number of training epochs

5. DISCUSSION AND CONCLUSION


The number of nodes in the hidden layers of the model has almost no influence on training
accuracy, and the value of the cost function shows the same behavior in each case, falling from
around 0.4982 to about 0.2822. The learning capability of the model reached 87%. Although the model
is programmed to stop training after ten (10) epochs without any progress, it runs beyond that
limit because learning keeps improving. With regard to the two (2) activation functions used in
this model, the ReLU function was found to work best for the hidden layers, while the sigmoid
function is the most suitable for the output layer. Additionally, the model's prediction confidence
on the test data exceeds 90%, which means that the model is robust in this kind of prediction.
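
The stopping rule mentioned above (stop after ten epochs without progress) can be sketched as a
simple patience counter. This is an illustrative sketch: the function name and the loss values are
made up, and it assumes that progress is measured on the validation loss.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0  # progress resets the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up losses: improvement stops after the second epoch.
stalled = early_stopping_epoch([0.50, 0.40, 0.45, 0.46, 0.47], patience=3)
# Made-up losses: every epoch improves, so the rule never triggers.
improving = early_stopping_epoch([0.50, 0.45, 0.40, 0.35, 0.30], patience=3)
print(stalled, improving)
```

Because every improvement resets the counter, a model whose loss keeps falling trains past the
patience window, which matches the behavior reported above.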

REFERENCES
[1] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521:436
[2] Ali Okatan, Bekir Karlık, Fatma Demirezen Yağmur (2009). Detection of Retinopathy Diseases Using Artificial
Neural Network Based on Discrete Cosine Transform. Article in Neural Network World.

[3] Zhen Yang, Han Feng, Shailesh Tripathi and Matthias Dehmer (2020). An Introductory Review of Deep Learning
for Prediction Models with Big Data.
[4] E. Y. Kalafi, N. A. M. Nor, N. A. Taib, M. D. Ganggayah, C. Town, S. K. Dhillon (2019). Machine Learning
and Deep Learning Approaches in Breast Cancer Survival Prediction Using Clinical Data. Folia Biologica (Praha) 65,
212-220. Received June 4, 2019. Accepted August 28, 2019.
[5] Helen Josephine V L, A. P. Nirmala, Vijaya Lakshmi Alluri (2021). Impact of Hidden Dense Layers in
Convolutional Neural Network to enhance Performance of Classification Model. IOP Conf. Series: Materials Science
and Engineering 1131 (2021) 012007, IOP Publishing, doi:10.1088/1757-899X/1131/1/012007.
[6] Ahmet Dogan, Ali Okatan, Ali Cetinkaya (2021). Vehicle Classification and Tracking Using Convolutional Neural
Network Based on Darknet Yolo with Coco Dataset. Proceedings of International Conference on AI and Big Data in
Engineering Applications.
[7] Xiaoyu Qi, Gang Mei, Salvatore Cuomo, Chun Liu, Nengxiong Xu (2021). Data analysis and mining of the
correlations between meteorological conditions and air quality: A case study in Beijing. Elsevier, Volume 14, June
2021, 100127.
[8] Pengfei Wang, Hao Guo, Jianlin Hu, Sri Harsha Kota, Qi Ying, Hongliang Zhang (2019). Responses of
PM2.5 and O3 concentrations to changes of meteorology and emissions in China. Elsevier, Science of The Total
Environment, Volume 662, 20 April 2019, Pages 297-306.
[9] Jinjin Sun, Mingjie Liang, Zhihao Shi, Fuzhen Shen, Jingyi Li, Lin Huang, Xinlei Ge, Qi Chen, Yele Sun, Yanlin
Zhang, Yunhua Chang, Dongsheng Ji, Qi Ying, Hongliang Zhang, Sri Harsha Kota, Jianlin Hu (2019). Investigating
the PM2.5 mass concentration growth processes during 2013–2016 in Beijing and Shanghai. Elsevier, Chemosphere,
Volume 221, April 2019, Pages 452-463.
[10] Hongliang Zhang, Yungang Wang, Jianlin Hu, Qi Ying, Xiao-Ming Hu (2015). Relationships between
meteorological parameters and criteria air pollutants in three megacities in China. Elsevier. Environmental Research.
Volume 140, July 2015, Pages 242-254
[11] Jianjun He, Sunling Gong, Ye Yu, Lijuan Yu, Lin Wu, Hongjun Mao, Congbo Song, Suping Zhao, Hongli Liu,
Xiaoyu Li, Ruipeng Li (2017). Air pollution characteristics and their relation to meteorological conditions during
2014–2015 in major Chinese cities. Elsevier, Environmental Pollution, Volume 223, April 2017, Pages 484-496.
[12] EPA United States environmental protection agency website.
[13] M. B. Çelik, İ. Kadi (2007). The Relation Between Meteorological Factors and Pollutants Concentrations in
Karabük City. G.U. Journal of Science, 20(4): 87-95 (2007)
[14] https://www.kaggle.com website
[15] Technical Assistance Document for the Reporting of Daily Air Quality – the Air Quality Index (AQI). EPA 454/B-
18-007 September 2018. U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Air
Quality Assessment Division, Research Triangle Park, NC.
