
8514 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 12, DECEMBER 2021

Multivariate Air Quality Forecasting With Nested Long Short Term Memory Neural Network

Ning Jin, Yongkang Zeng, Ke Yan, Member, IEEE, and Zhiwei Ji

Abstract: Artificial intelligence-based air quality index (AQI) forecasting is a hot research topic in the fields of sustainable and smart industrial environment design. Two main obstacles hinder existing machine learning (ML) and deep learning (DL) technologies from providing accurate forecasting results to protect the environment: the intercorrelation between different AQI components and the highly volatile AQI pattern changes. In this article, a novel DL framework combining multiple nested long short term memory networks (MTMC-NLSTM) is proposed for accurate AQI forecasting enlightened with the federated learning. The performance of the proposed MTMC-NLSTM model is compared with conventional ML models, DL methods, as well as hybrid DL models. The experimental results show that the performance of the proposed method is superior to those of all compared models.

Index Terms: Discrete stationary wavelet transform (DSWT), multichannel neural network, multitask neural network, nested long short term memory (NLSTM).

Manuscript received November 27, 2020; revised February 27, 2021; accepted March 9, 2021. Date of publication March 17, 2021; date of current version August 20, 2021. Paper no. TII-20-5352. (Corresponding author: Ke Yan.)
Ning Jin and Yongkang Zeng are with the Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, College of Information Engineering, China Jiliang University, Hangzhou 310018, China (e-mail: jinning1117@cjlu.edu.cn; ykzeng899@gmail.com).
Ke Yan is with the National University of Singapore, Singapore 117566, Singapore (e-mail: yanke@cjlu.edu.cn).
Zhiwei Ji is with the School of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China (e-mail: zhiwei.ji@njau.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3065425.
Digital Object Identifier 10.1109/TII.2021.3065425

I. INTRODUCTION

Air quality control, as one of the most important and long-standing issues in the process of human civilization, has received increasing attention globally in recent years [1]. Air pollution not only destroys the ecological balance of nature, but also affects people's health, economy, and civilization developments [2]. Air quality index (AQI), as one of the major indicators of the air quality, is comprised of six major pollution components: fine particulate matter (PM2.5), respirable particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), and ozone (O3) [3]. The research and forecasting techniques of AQI enable us to better prevent and control air pollution issues. With the guidance of more precise AQI forecasting, more effective measures can be taken, including sending travel warnings, providing self-protection suggestions, reorganizing industrial production, and promoting relevant legislation. Such guidance is vital to ensure social welfare and protect the environment. However, internal factors, including industrial production, energy extraction, and transportation, make the AQI change irregularly, which makes it difficult for forecasting techniques to handle [4].

Data-driven time series forecasting methods, including deep learning (DL) and big data analysis, provide crucial forecasting results for many industrial areas [5]–[7]. For the AQI forecasting problem, traditional methods include regression analysis, support vector machine (SVM), and random forest (RF) [8], [9]. For more precise control of the environmental indices, and the next-generation smart environment setup, DL techniques, such as the long short term memory (LSTM), are adopted for the AQI forecasting problem [10]–[12].

In this study, the LSTM network was upgraded to the nested LSTM network (NLSTM), which adds an extra LSTM unit nested in every LSTM unit [13], [14]. An innovative multitask multichannel (MTMC) NLSTM network (MTMC-NLSTM) is proposed for multivariate AQI data forecasting. Discrete stationary wavelet transform (DSWT) is employed to decompose the original data into multiple subsignals according to frequency. Each subsignal is then attached with an NLSTM model for short-term AQI prediction. The proposed MTMC-NLSTM framework has the following contributions in the related fields.

1) Adopting NLSTM in AQI component prediction. The NLSTM neural network was proposed in 2018. To the best of our knowledge, it is the first work utilizing NLSTM for AQI component prediction. The original NLSTM is also further extended with a multichannel NLSTM structure.
2) A multitask multichannel NLSTM neural network enlightened with federated learning. Instead of training and testing the six components of the AQI separately, an MTMC algorithm is proposed to forecast the six AQI components, i.e., PM2.5, PM10, SO2, NO2, CO, and O3, simultaneously, enlightened with the federated learning. The proposed MTMC learning structure enhances the data-driven prediction performance by considering the internal correlation between different components of the AQI concurrently.
3) A comprehensive comparative study. In Section IV, the performance of the proposed MTMC-NLSTM model is compared with those of the conventional machine learning (ML) models, including the multilayer perceptron (MLP), the support vector regression (SVR), and various

1551-3203 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: University of Hyderabad IG Memorial Library. Downloaded on September 13,2022 at 04:36:42 UTC from IEEE Xplore. Restrictions apply.
JIN et al.: MULTIVARIATE AIR QUALITY FORECASTING WITH NESTED LSTM NEURAL NETWORK 8515

Fig. 1. General flowchart of the proposed AQI forecasting framework.

DL models. The performance of the proposed method is superior to all compared existing models.

In the experimental simulation phase, a real-world dataset collected in Beijing, China, is utilized to justify the superior performance of the proposed method with four evaluation metrics. The performances of multiple existing time series forecasting methods are compared. The results show that the proposed method outperforms the compared methods in both fitting performance and prediction errors.

II. RELATED WORKS

DL techniques for time series data are among the most important and popular topics in AI. Wen et al. [15] implemented an LSTM model to predict solar energy generation in the short term. The proposed singular LSTM neural network outperformed existing models in results. Das et al. [16] employed bidirectional LSTM, a state-of-the-art method, to predict electric loads of different household devices. Yan et al. [17] combined stationary wavelet transform (SWT) with LSTM to predict energy consumption for individual households. Sugiartawan et al. [18] proposed a similar method predicting the growth in the number of visitors and tourism investments. Zheng et al. [19] proposed a hybrid DL model combining LSTM and empirical mode decomposition (EMD), decomposing the original data into multiple intrinsic mode functions (IMFs) for better forecasting analysis. Zang et al. [20] employed a novel data preprocessing method, variational mode decomposition (VMD), and a convolutional neural network (CNN) in photovoltaic power forecasting and achieved better performance.

As research on air quality prediction receives increasing attention, various ML and DL methods have been adopted and reinvented for AQI forecasting. Kok et al. [21] proposed LSTM neural networks for air quality prediction in smart cities and compared the performance of LSTM to that of SVR. Song et al. [22] proposed an air quality prediction method based on the hybrid model of LSTM and Kalman filtering to predict the concentration of air pollutants.

As time series data become increasingly volatile, federated learning enlightened models are proposed to handle those tedious problems. Cheng et al. [23] developed a multitask and multiview learning model to perform short-term traffic forecasting. A particle swarm algorithm was implemented to optimize the model. Tian et al. [24] developed a multichannel DL framework to conduct electrical load time series forecasting. The framework consists of two parallel channels and a feature fusion module. One of the channels is composed of CNN layers and the other is composed of LSTM layers. The two channels are joined in the feature fusion module and the final output is produced after it. Shao et al. [25] proposed a similar multichannel DL framework based on CNN and LSTM. The statistic components were included as part of the features for better performance.

III. METHODOLOGY

The general flowchart of the proposed AQI forecasting framework is illustrated in Fig. 1. The input AQI data consist of six components, including PM2.5, PM10, SO2, NO2, CO, and O3. For each component, data normalization is applied followed by a three-stage wavelet transform, dividing the original volatile data into more stationary subsignals (see Fig. 2). The training part of the subsignals is learned by the proposed MTMC-NLSTM DL neural network to produce forecasting results. The forecasted results are evaluated by four different evaluation metrics.

A. Discrete Stationary Wavelet Transform

The original AQI dataset contains extremely frequent and drastic fluctuations. Without data preprocessing, these fluctuations cannot be well learned and predicted merely by a DL model. Wavelet transform (WT) is a data preprocessing method with excellent performance for data stabilization [26]. In this article, DSWT with the Daubechies wavelet as basis function is implemented to decompose the original data into multiple subsignals including the denoised lower frequency component and deseasonalized higher frequency component (see Fig. 2) [13]. After decomposition, the information mixed in the original data is resolved into several subsignals, resulting in improved prediction accuracy.

By wavelet transform decomposition of level $m$, the original data series can be decomposed into

$$y(t) = A_m(t) + \sum_{i=1}^{m} D_i(t) \tag{1}$$

where $A_m$ is an approximate information set, indicating the overall pattern and longer term trend characteristics of the original data. $D_i$ are high-frequency information sets, representing detailed high-frequency fluctuations caused by sorts of transient factors, which are a noise portion of the original data.

The variable $m$ is initialized with 1. With each level of decomposition, the low-frequency data $A_m$ is decomposed into a higher frequency set $D_{m+1}$ and a lower frequency set $A_{m+1}$. In the proposed model, the decomposition level of WT is set to three, decomposing the original raw data $y(t)$ into four subsignals: $A_3$, $D_1$, $D_2$, and $D_3$ (see Fig. 2). The original data series $y(t)$ is decomposed into

$$y(t) = A_3(t) + D_3(t) + D_2(t) + D_1(t). \tag{2}$$

It is noted that when the number of decomposition levels increases beyond three, the prediction accuracy tends to be stable. However, the complexity of the network and the training time surge, thus the overall efficiency and performance decrease. In Fig. 2, the


Fig. 2. Decomposition component after wavelet transform. The signals from left to right represent original data, A3 , D3 , D2 , and D1 .

four subsignals are modeled separately with the NLSTM neural networks for better forecasting performance.
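The additive form of (2) can be illustrated with a minimal sketch. It assumes a simplified undecimated ("à trous") scheme with the Haar filter (the first member of the Daubechies family) and circular boundary handling; the article does not state the Daubechies order or boundary mode it uses, so the function name `haar_atrous_decompose` and these choices are illustrative assumptions, not the authors' implementation.

```python
def haar_atrous_decompose(y, levels=3):
    """Split y into [A_levels, D_levels, ..., D_1]; the parts sum back to y.

    Assumption: Haar smoothing with circular indexing, details defined as the
    difference between successive approximations (additive a-trous variant).
    """
    n = len(y)
    approx = [float(v) for v in y]
    details = []
    for j in range(levels):
        step = 2 ** j  # the gap between averaged samples doubles per level
        # Undecimated low-pass: every subsignal keeps the length of y.
        smoother = [(approx[t] + approx[(t - step) % n]) / 2.0 for t in range(n)]
        # The detail at this level is whatever the smoothing removed.
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    # details holds D_1, D_2, ...; reverse to match the A3, D3, D2, D1 order of (2).
    return [approx] + details[::-1]


# Hypothetical hourly readings, used only to exercise the function.
series = [35.0, 42.0, 58.0, 120.0, 97.0, 60.0, 44.0, 39.0]
a3, d3, d2, d1 = haar_atrous_decompose(series, levels=3)
rebuilt = [sum(parts) for parts in zip(a3, d3, d2, d1)]
```

Because each detail is the difference of two successive approximations, the sum telescopes and `rebuilt` matches `series` up to floating-point rounding, which is the property expressed by (2). Each of the four lists could then feed one channel of a forecasting model.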

B. NLSTM Neural Network


The very recently proposed NLSTM neural network [14] is
adopted to improve the prediction performance by memorizing
additional information in the historical data.
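As a preview of the cell update that this subsection defines in (3)–(16), the following is a minimal scalar sketch of one NLSTM time step. It assumes logistic sigmoid gates and tanh for the $\sigma_c$ and $\sigma_h$ activations, which the article does not specify; the function names and the dictionary-based parameter layout are hypothetical and chosen only for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_gates(x, h_prev, p):
    """Input, forget, and output gates plus the candidate value for scalar inputs."""
    i = sigmoid(x * p["Wxi"] + h_prev * p["Whi"] + p["bi"])
    f = sigmoid(x * p["Wxf"] + h_prev * p["Whf"] + p["bf"])
    o = sigmoid(x * p["Wxo"] + h_prev * p["Who"] + p["bo"])
    g = math.tanh(x * p["Wxc"] + h_prev * p["Whc"] + p["bc"])
    return i, f, o, g


def nlstm_step(x, h_prev, c_prev, inner_c_prev, outer, inner):
    """One NLSTM step; returns (h_t, c_t, inner c_t)."""
    i, f, o, g = lstm_gates(x, h_prev, outer)    # outer gates, (3), (4), (6)
    inner_h_prev = f * c_prev                    # (8): long-term input to inner cell
    inner_x = i * g                              # (9): short-term input to inner cell
    # A plain LSTM would stop here with c_t = inner_h_prev + inner_x, as in (10).
    ii, fi, oi, gi = lstm_gates(inner_x, inner_h_prev, inner)  # (11), (12), (14)
    inner_c = fi * inner_c_prev + ii * gi        # (13)
    inner_h = oi * math.tanh(inner_c)            # (15)
    c = inner_h                                  # (16): outer state is the inner output
    h = o * math.tanh(c)                         # (7)
    return h, c, inner_c
```

The only structural difference from an ordinary LSTM step is that the additive update of the outer cell state is replaced by a call into a second, inner LSTM cell that carries its own state `inner_c` across time steps.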
An NLSTM memory cell nests one more LSTM memory cell into the original LSTM cell. The external storage cell is free to selectively read and write the relevant long-term information of the internal cell. Overall, this structure improves the robustness of the original LSTM neural network structure, enabling memorizing and processing longer term history information. In LSTM, the output gate follows a principle that information that is not relevant to the current time step is still worth remembering. Following the abovementioned logic, NLSTM is more advantageous in the prediction of the time series data with volatile changes [14].

The structure of a common LSTM memory cell is as follows:

$$i_t = \sigma_i\left(x_t W_{xi} + h_{t-1} W_{hi} + b_i\right) \tag{3}$$
$$f_t = \sigma_f\left(x_t W_{xf} + h_{t-1} W_{hf} + b_f\right) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \sigma_c\left(x_t W_{xc} + h_{t-1} W_{hc} + b_c\right) \tag{5}$$
$$o_t = \sigma_o\left(x_t W_{xo} + h_{t-1} W_{ho} + b_o\right) \tag{6}$$
$$h_t = o_t \odot \sigma_h\left(c_t\right). \tag{7}$$

Considering (5), the update of the memory cell state $c_t$ is made by adding two parts, that is

$$\tilde{h}_{t-1} = f_t \odot c_{t-1} \tag{8}$$
$$\tilde{x}_t = i_t \odot \sigma_c\left(x_t W_{xc} + h_{t-1} W_{hc} + b_c\right). \tag{9}$$

Using another LSTM cell to replace the $c_t$ calculation equation in the ordinary LSTM model constitutes an NLSTM cell. The outer LSTM is called the outer memory, and the inner LSTM is called the inner memory. In an ordinary LSTM cell, the memory cell status $c_t$ is updated as follows:

$$c_t = \tilde{h}_{t-1} + \tilde{x}_t. \tag{10}$$

In the NLSTM cell depicted in Fig. 3, this process is replaced by an inner LSTM cell, where $\tilde{x}_t$ and $\tilde{h}_{t-1}$ are short-term and long-term memory inputs, respectively. The structure of the inner LSTM cell, identical with a common one, is as follows:

$$\tilde{i}_t = \tilde{\sigma}_i\left(\tilde{x}_t \tilde{W}_{xi} + \tilde{h}_{t-1} \tilde{W}_{hi} + \tilde{b}_i\right) \tag{11}$$
$$\tilde{f}_t = \tilde{\sigma}_f\left(\tilde{x}_t \tilde{W}_{xf} + \tilde{h}_{t-1} \tilde{W}_{hf} + \tilde{b}_f\right) \tag{12}$$
$$\tilde{c}_t = \tilde{f}_t \odot \tilde{c}_{t-1} + \tilde{i}_t \odot \tilde{\sigma}_c\left(\tilde{x}_t \tilde{W}_{xc} + \tilde{h}_{t-1} \tilde{W}_{hc} + \tilde{b}_c\right) \tag{13}$$
$$\tilde{o}_t = \tilde{\sigma}_o\left(\tilde{x}_t \tilde{W}_{xo} + \tilde{h}_{t-1} \tilde{W}_{ho} + \tilde{b}_o\right) \tag{14}$$
$$\tilde{h}_t = \tilde{o}_t \odot \tilde{\sigma}_h\left(\tilde{c}_t\right). \tag{15}$$

Fig. 3. Structure of an NLSTM memory cell.

Due to the existence of an inner memory, the update of the outer memory cell status $c_t$ becomes

$$c_t = \tilde{h}_t. \tag{16}$$

With the structure of outer and inner memory, the NLSTM network can form a hierarchy of memory, and can remember and process longer term information. Therefore, the NLSTM network is applied to achieve better prediction with higher accuracy and lower error instead of LSTM or other extensions.

C. MTMC-NLSTM Neural Network Structure

In this article, the multiple AQI time series components are trained and predicted simultaneously using a novel MTMC-NLSTM neural network structure, considering the correlations between different AQI components.

Referring to the overall flowchart shown in Fig. 1, each AQI component data series is decomposed into multiple subsignals using WT and input into the MTMC-NLSTM model. For each AQI component, a multichannel NLSTM neural network module is attached to learn the subsignals separately, and each subsignal corresponds to one of the channels. Subsequently, all


Fig. 4. Internal structure of the proposed MTMC-NLSTM model. The blue boxes represent composing layers of the neural network.

AQI components are input into the multitask NLSTM neural network, where each component corresponds to one of the task channels. All outputs of the multichannel models are concatenated using a feature fusion layer for the dense layers' outputs. Internally, the prediction output of each component is influenced by other components in the same AQI. Considering the physical properties of the AQI, the correlative influence potentially improves the final prediction results of the AQI components. The overall network structure of the proposed MTMC-NLSTM model is illustrated in Fig. 4.

We introduce the multichannel NLSTM network and the multitask structure separately. Each multichannel NLSTM network consists of four channels. Each channel has two layers, namely, an NLSTM layer and a fully connected layer (dense layer). The number of units for each NLSTM layer and the number of units for the dense layer are shown in Fig. 4. The "ADD" operation sums the four prediction outputs and outputs a 16×1 vector as the prediction result for each AQI component.

In the multitask framework, in the feature fusion layer, the six 16×1 vectors output by the six multichannel NLSTM networks are concatenated into a 96×1 vector. The concatenated vector is then input into different dense layers for final prediction results. The proposed MTMC framework is, overall, multi-input multioutput (MIMO), and is composed of six parallel multichannel NLSTM modules, a feature fusion layer, and an output layer. In the multichannel modules, the variables are learned separately and in parallel. In the feature fusion layer, the features of all variables are merged. In the output layer, the final predictions of the AQI components are generated.

It is noted that in this MTMC-NLSTM neural network structure, the proposed model enables the NLSTM neural networks to comprehensively learn and analyze the features between different AQI variables.

D. Proposed Forecasting Model

The detailed procedure of the proposed forecasting model is presented in Algorithm 1. The overall process of the proposed forecasting method involves the following three main steps.

Step 1: The six AQI time series data are normalized using zero-score normalization. Each sample of the normalized AQI data is then decomposed into four subsignals using wavelet transform.

Step 2: The data processed in step 1 are divided into a training set and a testing set as input of the MTMC neural network. The network has 24 inputs corresponding to the 24 subsignals, and six outputs corresponding to the predicted values of the six AQI components. The network is trained by fitting the actual values of the six AQI components based on the 24 subsignals from the training set. During the training process, 5% of the training set is split off as the validation set. When the neural network is trained, the predictions of the AQI components are generated based on the testing set.

Step 3: The predictions obtained in step 2 are denormalized to the original magnitude. The forecasting results are then evaluated to check the forecasting performance according to the seriousness of error and effectiveness of fitting.

The detailed hyperparameter settings of the proposed multichannel NLSTM model (see Fig. 4) can be found in Table I. The RMSprop optimizer is adopted to minimize the mean-squared error (MSE) loss function. The learning rate is set to 0.001 without weight decay. The neural network is trained for 16 epochs on the training set and the result is output. All hyperparameters are tuned and optimized in Section IV.

IV. EXPERIMENTAL PROCESS AND RESULTS

A. Data Description

The original AQI data employed in this article are collected from 12 observing stations around Beijing from year 2013 to 2017. The data are accessible at The University of California, Irvine (UCI) Machine Learning Repository [27]. The original sensor data are collected in an hourly time step, across 1461 days. The total number of hourly data samples is 35 064. Each hourly recorded dataset consists of multiple variables and six of them are utilized in this research, namely, PM2.5, PM10, SO2, NO2, CO, and O3. The six AQI components data are visualized in Fig. 5. It is evident that these components have vital influence


Fig. 5. Visualization of the six AQI components.

upon the air quality status and are important components of air pollutants [23].

The training set in the experiment consists of the first 22 800 data samples (950 days) in the dataset. For the purpose of validation, 1200 data samples (50 days) are utilized, and 1200 data samples (50 days) of the remaining data are used for testing. The proportion between training, validation, and testing dataset is approximately 19:1:1.

Algorithm 1: Overall Process of the Proposed Model, Where i is the Index for the AQI Components and k is the Index for the Decomposed Subsignals.
Input: Original data of the 6 AQIs $F^{1\sim 6}$, size of training set (including validation set) $n_{train}$, size of testing set $n_{test}$.
Output: The prediction of the 6 AQIs $\hat{F}^{1\sim 6}$.
for i = 1 to 6 do
    Data normalization using Zero-Score method: $FS^{i} = f_S^{i}(F^{i})$
    Data decomposing using Wavelet transform: $\{FSD^{i1\sim i4}\} = f_{WT}(FS^{i})$
    According to $n_{train}$ and $n_{test}$, divide $FS^{i}$ into: $\{TrainY^{i}, TestY^{i}\}$
    Obtain actual value: $RealY^{i} = (f_S^{i})^{-1}(TestY^{i})$
    for k = 1 to 4 do
        According to $n_{train}$ and $n_{test}$, divide $FSD^{ik}$ into: $\{TrainX^{ik}, TestX^{ik}\}$
    end
end
Obtain training set: $\mathbf{TrainX} = \{TrainX^{ik}\}$, $\mathbf{TrainY} = \{TrainY^{i}\}$
Obtain network testing set: $\mathbf{TestX} = \{TestX^{ik}\}$, $\mathbf{TestY} = \{TestY^{i}\}$
    where $i \in \{1, 2, 3, 4, 5, 6\}$, $k \in \{1, 2, 3, 4\}$
Build the proposed neural network $NN$ according to Table I.
Input $\mathbf{TrainX}$ and $\mathbf{TrainY}$, and fit $\mathbf{TrainY}$ by minimizing loss function: $MSE(NN(\mathbf{TrainX}), \mathbf{TrainY})$
Input $\mathbf{TestX}$ to obtain the forecasting results: $\hat{FS}^{1\sim 6} = NN(\mathbf{TestX})$
for i = 1 to 6 do
    Denormalization: $\hat{F}^{i} = (f_S^{i})^{-1}(\hat{FS}^{i})$
end

B. Experimental Setup

The experiments were conducted on a server equipped with an Intel i7-8700K CPU and an NVIDIA GeForce GTX 1080 GPU. The software environment is Python 3.6. The source code and dataset of this article are freely available for download.¹

Multiple existing methods are compared to further verify the superiority of the proposed method in multivariate AQI time series forecasting. Using the same evaluation metrics, the prediction errors and fitting effect of different methods predicting each AQI variable are calculated and compared. The universality of the methods is challenged because different AQI components show different patterns and characteristics.

The comparative methods include decision tree, random forest, multilayer perceptron (MLP), SVR, the traditional LSTM neural network [28], NLSTM, and the stacked LSTM (SLSTM) [29]. The proposed method is also compared with the hybrid DL models combining various LSTM extensions and data preprocessing methods, namely, EMD and VMD [19], [20]. EMD and VMD are implemented with a purpose similar to that of wavelet transform [17], [18]. A multitask NLSTM model not combining wavelet transform is also compared with the proposed MTMC learning framework. The four evaluation metrics introduced in Section IV-C are applied to quantify the effect of forecasting, and evaluate the performances of these forecasting methods.

C. Evaluation Metrics

Four evaluation metrics, namely, mean absolute error (MAE), root-mean squared error (RMSE), mean absolute percentage error (MAPE), and r-square ($R^2$), are employed to evaluate the accuracy of the prediction and verify the effectiveness of the proposed method. The specific formulas of the abovementioned four metrics are listed as follows:

$$\mathrm{MAE}\left(f, \hat{f}\right) = \frac{1}{n} \sum_{i=1}^{n} \left| f_i - \hat{f}_i \right| \tag{17}$$

¹ [Online]. Available: https://github.com/761133412/AQI_Predction_MTMC.git


TABLE I
HYPERPARAMETER SETTING OF THE MULTICHANNEL NLSTM MODEL

TABLE II
FORECASTING RESULTS OF THE PROPOSED MODEL AND THE COMPARED METHODS (AQI COMPONENT PM2.5)

TABLE III
FORECASTING RESULTS OF THE PROPOSED MODEL AND THE COMPARED METHODS (AQI COMPONENT PM10)

TABLE IV
FORECASTING RESULTS OF THE PROPOSED MODEL AND THE COMPARED METHODS (AQI COMPONENT SO2)

$$\mathrm{RMSE}\left(f, \hat{f}\right) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( f_i - \hat{f}_i \right)^2} \tag{18}$$

$$\mathrm{MAPE}\left(f, \hat{f}\right) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{f_i - \hat{f}_i}{f_i} \right| \tag{19}$$

$$R^2\left(f, \hat{f}\right) = 1 - \frac{\sum_{i=1}^{n} \left( f_i - \hat{f}_i \right)^2}{\sum_{i=1}^{n} \left( f_i - \bar{f} \right)^2} \tag{20}$$

where $f$ refers to the actual data value and $\hat{f}$ is the predicted value.

Amongst the four metrics, MAE, RMSE, and MAPE evaluate the level of error of the prediction results. Lower MAE, RMSE, and MAPE values indicate higher forecasting accuracy. $R^2$ evaluates how well the prediction results fit the actual data. The closer the value of $R^2$ is to 1, the better the fit of the prediction model.
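The four metrics in (17)–(20) can be sketched in a few lines of plain Python. This is an illustrative helper set, not the authors' evaluation code; the function names are assumptions, and `mape` returns a fraction exactly as written in (19), so it would need to be multiplied by 100 to be read as a percentage.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, as in (17)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean squared error, as in (18)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction, as in (19).

    Undefined when any actual value is zero.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_square(actual, predicted):
    """Coefficient of determination, as in (20)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, used only to exercise the functions.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [1.0, 5.0, 6.0, 10.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_square(actual, predicted),
}
```

For this toy input the absolute errors are 1, 1, 0, and 2, so the MAE is 1.0, the MAPE is 0.25, and $R^2$ is 1 − 6/20 = 0.7.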
D. Results of Experiment

The prediction results of the six AQI components produced by all compared methods with four evaluation metrics are shown in Tables II–VII, respectively. The averaged training time of all compared methods is recorded in seconds. Where the training time is short, as it is for most of the ML methods, it is denoted using a "-" sign. We also display prediction results of all compared methods visually in Fig. 6.

TABLE V
FORECASTING RESULTS OF THE PROPOSED MODEL AND THE COMPARED METHODS (AQI COMPONENT NO2)

TABLE VI
FORECASTING RESULTS OF THE PROPOSED MODEL AND THE COMPARED METHODS (AQI COMPONENT CO)

TABLE VII
FORECASTING RESULTS OF THE PROPOSED MODEL AND THE COMPARED METHODS (AQI COMPONENT O3)

Fig. 6. Comparison of final prediction result and the actual value. The six subfigures share the same legend and from top to bottom are the forecasting results for PM2.5, PM10, SO2, NO2, CO, and O3, respectively.

According to the prediction results shown in Tables II–VII, the proposed model outperforms all compared methods in terms of the four evaluation metrics, namely, MAE, RMSE, MAPE, and R2. The proposed MTMC-NLSTM neural network framework provides better fitting to the original data, resulting in more accurate prediction results. The training time of the proposed DL model is less than 2 min, which is within the acceptable range for real-time air pollution monitoring.
Most of the performance gain is achieved with the unique inner memory structure of the NLSTM neural network without increasing the training time drastically. Although the SLSTM neural networks also improve the prediction accuracy by stacking multiple LSTM layers, the stacking architecture doubles the


number of parameters in the optimization process. The computational time of the SLSTM training process is much longer than that of NLSTM. Therefore, according to our results, the NLSTM model is more suitable for AQI time series forecasting compared with other LSTM extensions.

In the proposed and compared methods, data decomposition using EMD, WT, and VMD is implemented to stabilize the original AQI variable data. According to Tables II–VII, after combining DL models with data preprocessing methods, including DSWT (WT), EMD, and VMD, the prediction performance is improved notably and the prediction latency is reduced. The result justifies the necessity of the implementation of data preprocessing methods to stabilize the original data.

Compared to EMD and VMD, DL models combined with WT perform more accurate prediction and produce less error. VMD, EMD, and any other widely employed data processing techniques, including Kalman filtering [22] and Fourier transform [30], utilize a global and infinitely long wave to fit the original data series, and decompose the original data.

Different from those methods, WT utilizes a local and adaptively long wavelet to analyze the local fluctuations in the original data. In this study, the air pollutant concentration record consists of plentiful frequent and drastic fluctuations, which have no obvious global temporal pattern. The characteristics of the dataset make WT more suitable for decomposing the AQI data. In Tables II–VII, a multitask NLSTM model that is not attached with WT is compared with other methods. The results of the multitask model do not match those of the proposed model. In the multitask model, the history information of all AQI variables is accessible to the neural network. However, without the WT data preprocessing, the comprehensive and multivariate information is too mixed and too concentrated, making the DL model overwhelmed at certain points. The obstructed NLSTM networks cannot recognize the temporal features without WT, and subsequently, produce less accurate prediction.

One of the contributions of this article is that an innovative method of data preprocessing by DSWT is proposed. In the DSWT, the high-frequency signal is not simply considered as noise and filtered out, but learned and analyzed by neural networks respectively. This process significantly increases the attention that the DL networks pay to every part of the features. Since more than one LSTM network is implemented to learn parts of the data, the prediction accuracy can be significantly improved.

Furthermore, the prediction vectors of the subsignals are first summed to produce the final prediction results, and then optimized by the DL model. In the existing studies, the data are reconstructed by simply summing the results of the subsignals without any further optimization, causing insensitivity to the peaks [17], [18]. In the proposed method, the summation of the subsignal prediction results is optimized during the training process.

V. CONCLUSION

The contribution of this article was an innovative MTMC learning framework for multivariate air pollutant concentration prediction. The training and forecasting process of the multivariate AQI data was performed in parallel, enlightened with the federated learning. According to the experimental results, the proposed method was capable of tracing the actual AQI very closely, with MAE, RMSE, and MAPE values all less than three. The accurate prediction of the air pollution supported environmental management decisions and was crucial for human health by providing timely environmental information to the government.

In the experimental phase, a real-world dataset collected by weather stations located around Beijing, China, was utilized. A comprehensive comparative study with various existing methods in the literature was conducted and the results and comparisons demonstrated that the proposed method could outperform most of the existing methods compared and showed notable superiority in multivariate AQI time series forecasting. According to Tables II–VII and Fig. 6, the proposed MTMC model could perform predictions with far less error and almost no latency in trend compared to the other methods. The result proved that the proposed DL framework is very suitable for multivariate AQI time series forecasting. The proposed method separated the features and trends by decomposing the original data into a high-frequency part and a low-frequency part to learn them, respectively, in a multichannel module. The prediction of each variable was produced by merging and optimizing the prediction of the subsignals during the training process. The capability of learning multiple AQI variables in parallel was obtained by the multitask structure. In the multitask architecture, the prediction of each variable was performed and optimized after analysis of the information of the other variables.

REFERENCES

[1] L. Wan, Y. Sun, I. Lee, W. Zhao, and F. Xia, “Industrial pollution areas detection and location via satellite-based IIoT,” IEEE Trans. Ind. Inform., vol. 17, no. 3, pp. 1785–1794, Mar. 2021.
[2] S. Lei, Y. Hou, X. Wang, and K. Liu, “Unit commitment incorporating spatial distribution control of air pollutant dispersion,” IEEE Trans. Ind. Inform., vol. 13, no. 3, pp. 995–1005, Jun. 2017.
[3] M. M. Rathore, A. Ahmad, A. Paul, and S. Rho, “Urban planning and building smart cities based on the Internet of things using big data analytics,” Comput. Netw., vol. 101, no. C, pp. 63–80, 2016.
[4] X. Lü, T. Lu, C. J. Kibert, and M. Viljanen, “Modeling and forecasting energy consumption for heterogeneous buildings using a physical–statistical approach,” Appl. Energy, vol. 144, pp. 261–275, 2015.
[5] I. Alawe, A. Ksentini, Y. Hadjadj-Aoul, and P. Bertin, “Improving traffic forecasting for 5G core network scalability: A machine learning approach,” IEEE Netw., vol. 32, no. 6, pp. 42–49, Nov./Dec. 2018.
[6] L. Ren, Z. Meng, X. Wang, L. Zhang, and L. T. Yang, “A data-driven approach of product quality prediction for complex production systems,” IEEE Trans. Ind. Inform., to be published, doi: 10.1109/TII.2020.3001054.
[7] J. Chang, Y. Du, X. Chen, E. G. Lim, and K. Yan, “Forecasting based virtual inertia control of PV systems for islanded micro-grid,” in Proc. IEEE 29th Australas. Univ. Power Eng. Conf., 2019, pp. 1–6.
[8] Y. Han, J. C. Lam, V. O. Li, and Q. Zhang, “A domain-specific Bayesian deep-learning approach for air pollution forecast,” IEEE Trans. Big Data, to be published, doi: 10.1109/TBDATA.2020.3005368.
[9] K. Gu, J. Qiao, and W. Lin, “Recurrent air quality predictor based on meteorology- and pollution-related factors,” IEEE Trans. Ind. Inform., vol. 14, no. 9, pp. 3946–3955, Sep. 2018.
[10] Z. Zhao, W. Chen, X. Wu, P. C. Chen, and J. Liu, “LSTM network: A deep learning approach for short-term traffic forecast,” IET Intell. Transport Syst., vol. 11, no. 2, pp. 68–75, 2017.


[11] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting—A novel pooling deep RNN,” IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 5271–5280, Sep. 2017.
[12] M. Ma and Z. Mao, “Deep convolution-based LSTM network for remaining useful life prediction,” IEEE Trans. Ind. Inform., vol. 17, no. 3, pp. 1658–1667, Mar. 2021.
[13] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural Comput., vol. 12, no. 10, pp. 2451–2471, 2000.
[14] J. R. A. Moniz and D. Krueger, “Nested LSTMs,” 2018, arXiv:1801.10308.
[15] L. Wen, K. Zhou, S. Yang, and X. Lu, “Optimal load dispatch of community microgrid with deep learning based solar power and load forecasting,” Energy, vol. 171, pp. 1053–1065, Mar. 2019.
[16] A. Das, M. K. Annaqeeb, E. Azar, V. Novakovic, and M. B. Kjærgaard, “Occupant-centric miscellaneous electric loads prediction in buildings using state-of-the-art deep learning methods,” Appl. Energy, vol. 269, 2020, Art. no. 115135.
[17] K. Yan, W. Li, Z. Ji, M. Qi, and Y. Du, “A hybrid LSTM neural network for energy consumption forecasting of individual households,” IEEE Access, vol. 7, pp. 157633–157642, 2019.
[18] P. Sugiartawan, R. Pulungan, and A. K. Sari, “Prediction by a hybrid of wavelet transform and long-short-term-memory neural network,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 2, pp. 326–332, 2017.
[19] Z. Huiting, Y. Jiabin, and C. Long, “Short-term load forecasting using EMD-LSTM neural networks with a XGBoost algorithm for feature importance evaluation,” Energies, vol. 10, no. 8, 2017, Art. no. 1168.
[20] H. Zang et al., “Hybrid method for short-term photovoltaic power forecasting based on deep convolutional neural network,” IET Gener., Transmiss. Distrib., vol. 12, no. 20, pp. 4557–4567, 2018.
[21] I. Kok, M. U. Simsek, and S. Ozdemir, “A deep learning model for air quality prediction in smart cities,” in Proc. IEEE Int. Conf. Big Data, Boston, MA, USA, 2017, pp. 1983–1990.
[22] X. Song, J. Huang, and D. Song, “Air quality prediction based on LSTM-Kalman model,” in Proc. IEEE 8th Joint Int. Inf. Technol. Artif. Intell. Conf., 2019, pp. 695–699.
[23] S. Cheng, F. Lu, P. Peng, and S. Wu, “Multi-task and multi-view learning based on particle swarm optimization for short-term traffic forecasting,” Knowl. Based Syst., vol. 180, no. 15, pp. 116–132, Sep. 2019.
[24] C. Tian, J. Ma, C. Zhang, and P. Zhan, “A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network,” Energies, vol. 11, 2018, Art. no. 3493.
[25] S. Xiaorui, C.-S. Kim, and P. Sontakke, “Accurate deep model for electricity consumption forecasting using multi-channel and multi-scale feature fusion CNN-LSTM,” Energies, vol. 13, 2020, Art. no. 1881.
[26] A. J. Conejo, M. A. Plazas, R. Espinola, and A. B. Molina, “Day-ahead electricity price forecasting using the wavelet transform and ARIMA models,” IEEE Trans. Power Syst., vol. 20, no. 2, pp. 1035–1042, May 2005.
[27] S. Zhang, B. Guo, A. Dong, J. He, Z. Xu, and S. X. Chen, “Cautionary tales on air-quality improvement in Beijing,” Proc. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 473, no. 2205, 2017, Art. no. 20170457.
[28] Q. Wang, S. Bu, and Z. He, “Achieving predictive and proactive maintenance for high-speed railway power equipment with LSTM-RNN,” IEEE Trans. Ind. Inform., vol. 16, no. 10, pp. 6509–6517, Oct. 2020.
[29] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017.
[30] R. N. Bracewell, The Fourier Transform and its Applications. New York, NY, USA: McGraw-Hill, 2005.

Ning Jin received the B.S. degree and the master’s degree in information and electronic engineering from Zhejiang University, Hangzhou, China, in 1988 and 1991, respectively.
She is currently a Professor and the Dean with the Information Engineering College, China Jiliang University, Hangzhou. She is a Senior Member of the China Institute of Communications (CIC) and the Chair of the Hangzhou Institute of Electronics, Hangzhou. Her research interests include intelligent systems, wireless networks, and signal processing.

Yongkang Zeng is currently working toward the postgraduation degree in automation and control engineering with the Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, College of Information Engineering, China Jiliang University, Hangzhou, China.
His main research interests include AI, machine learning, and deep learning technologies for multiple industrial applications.

Ke Yan (Member, IEEE) received the bachelor’s and Ph.D. degrees in computer science from the School of Computing (SoC), National University of Singapore (NUS), Singapore, in 2006 and 2012, respectively.
He is currently an Assistant Professor with NUS. He has authored or coauthored more than 70 full length papers with highly ranked conferences and journals, including the Association for the Advancement of Artificial Intelligence (AAAI), IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS (TII), IEEE TRANSACTIONS ON SUSTAINABLE ENERGY (TSE), IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS: SYSTEMS (SMCA), and Applied Energy (AE). He is actively engaged in cross-discipline research fields. His research interests include machine learning, artificial intelligence, cyber intelligence, applied mathematics, sustainability, and applied energy.

Zhiwei Ji received the bachelor’s degree from Zhejiang A&F University (ZAFU), Linan, China, in 2003, the master’s degree from Shanghai University, Shanghai, China, in 2009, and the Ph.D. degree from the School of Electronics and Information Engineering, Tongji University, Shanghai, in 2016.
He is currently a Full Professor with the School of Artificial Intelligence, Nanjing Agricultural University (NJAU), Nanjing, China. Prior to this position, he was an Assistant Professor with the University of Texas Health Science Center, Houston (UTHealth), Houston, TX, USA. He has been working on systems biology, bioinformatics, pattern recognition, big data analysis and modeling for over ten years. He is currently the Director of the Center for Data Science and Intelligent Computing, NJAU. He has authored or coauthored more than 50 papers in top journals, e.g., PloS Computational Biology, Information Sciences, during recent years.
