
Atmospheric Environment 44 (2010) 4481–4488

Contents lists available at ScienceDirect

Atmospheric Environment
journal homepage: www.elsevier.com/locate/atmosenv

Prediction of hourly O3 concentrations using support vector regression algorithms


E.G. Ortiz-García a, S. Salcedo-Sanz a, *, Á.M. Pérez-Bellido a, J.A. Portilla-Figueras a, L. Prieto b
a Department of Signal Theory and Communications, Universidad de Alcalá, 28871 Alcalá de Henares, Madrid, Spain
b Department of Physics of the Earth, Astronomy and Astrophysics II, Universidad Complutense de Madrid, Spain

Article info

Article history:
Received 28 April 2010
Received in revised form 2 July 2010
Accepted 14 July 2010

Keywords:
O3 concentration prediction
Support vector regression algorithms
Air quality

Abstract

In this paper we present an application of the Support Vector Regression algorithm (SVMr) to the prediction of hourly ozone values in the Madrid urban area. In order to improve the training capacity of SVMrs, we have used a recently proposed approach, based on reductions of the SVMr hyper-parameters search space. Using the modified SVMr, we study different influences which may modify the ozone prediction, such as previous ozone measurements in a given station, measurements in neighboring stations, and the influence of meteorological variables. We use statistical tests to verify the significance of incorporating different variables into the SVMr. A comparison with the results obtained using a neural network (multi-layer perceptron) is also carried out. This study has been carried out in 5 different stations of the air pollution monitoring network of Madrid, so the conclusions drawn are backed by real data. The final result of the work is a robust and powerful software tool for tropospheric ozone prediction in Madrid. Also, the prediction tool based on SVMr is flexible enough to incorporate any other prediction variable, such as city models or traffic patterns, which may improve the prediction obtained with the SVMr.

© 2010 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +34 91 885 6731; fax: +34 91 885 6699.
E-mail address: sancho.salcedo@uah.es (S. Salcedo-Sanz).

1352-2310/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.atmosenv.2010.07.024

1. Introduction

Nowadays, ozone (O3) is, together with nitrogen oxides (NOx), one of the most relevant air pollutants in urban areas of all medium and large cities of the world (Agirre-Basurko et al., 2006). It is well known that ozone is a secondary pollutant, since it is not directly emitted into the air. On the contrary, tropospheric ozone is produced when the primary pollutants, mainly nitrogen oxides (NOx) and Volatile Organic Compounds (VOC), interact under the action of sunlight (Ionescu et al., 2000; Barrero et al., 2006). In addition, O3 is recognized as one of the key pollutants degrading the air quality in urban areas (Al-Alawi et al., 2008; Kanaroglou et al., 2005), responsible for increases in mortality rates during episodes of high concentration, mainly in summer. The study of O3 concentrations, and especially O3 maxima, is therefore of major interest.

Several works on modeling and forecasting O3 can be found in the literature (Massart and Kvalheim, 1998a, 1998b; Palacios et al., 2002; Wang and Lu, 2006; Felipe-Sotelo et al., 2004), many of them about the problem of modeling or forecasting the complete concentration of O3 in a column, or the distribution of the pollutant in a study area. There are also specific works on ground O3 forecasting from air quality stations in different cities of the world (Brunelli et al., 2007; Garfias-Vázquez et al., 2005; Balaguer-Ballester et al., 2002; Lu and Wang, 2008). One important problem previously tackled is the time forecasting of pollutants, in which several factors such as meteorology, past concentrations, traffic or city structure are considered in order to provide a concentration value at a given point in the future; usually, specific measuring points are selected. This specific problem has been successfully tackled with different methods, such as physical approaches (Massart and Kvalheim, 1998a, 1998b), classical statistical approaches (Sousa et al., 2006) or soft-computing methods such as neural networks (Lu et al., 2006; Dutot et al., 2007; Lu and Wang, 2008; Aneiros-Perez et al., 2004).

Recently, the Support Vector Machines paradigm (SVMs) (Smola and Schölkopf, 1998) has gained importance in forecasting problems related to the environment (Wang et al., 2008; Luan et al., 2005; Lu and Wang, 2005; Osowski and Garanty, 2007). Specifically, the Support Vector Regression algorithms (SVMrs), i.e., SVMs specifically developed for regression problems, are appealing algorithms for a large variety of regression problems, since they take into account not only the error approximation to the data, but also the generalization of the model, i.e., their capability to improve the prediction of the model when new data are evaluated by it. Several previous works have applied the SVM or SVMr methodology for O3 forecasting or related problems. In (Salazar-Ruiz et al., 2008) the SVM methodology is compared to other artificial intelligence and

statistics methods in a problem of tropospheric ozone prediction in California. In (Lu and Wang, 2008) the SVM approach is applied to the forecasting of O3 at ground level in Hong Kong. The authors propose an interesting modification of the standard SVM for classification problems in order to be able to tackle regression problems with it. In (Wang et al., 2008) an online forecasting system for pollutants based on SVMs is presented. The experimental test of the approach is also carried out in Hong Kong and surrounding areas. In (Chelani, 2010) the performance of the SVMr algorithm is compared to that of a MLP and a multiple regression technique in a problem of daily maximum ozone prediction in Delhi (India). The SVMr obtained better results in terms of mean square error than the other approaches applied to solve the problem. In (Luan et al., 2005) the prediction of the retention time of VOC at ground level is carried out with a SVM. The performance of the SVM algorithm is compared with that of a heuristic algorithm for the same purpose. In (Lu and Wang, 2005) the performance of a SVM algorithm is tested in the forecasting of different atmospheric pollutants, including O3, and in (Osowski and Garanty, 2007) the SVM is mixed with wavelets for improving the performance of the SVM approach in a problem of meteorological pollutants forecasting.

In this paper we present the application of an SVMr algorithm to the forecasting of hourly O3 values from data of the Madrid air quality network. We use a SVMr algorithm which incorporates a mechanism based on bounds to better estimate the corresponding hyper-parameters of the SVMr. We study the effect of including different input variables to the SVMr in the O3 prediction: specifically, we evaluate the effect of including different numbers of previous measures in a given station, the effect of using data from neighboring measuring stations and the effect of incorporating meteorological variables from the meteorological network of Madrid. Statistical tests are performed in order to characterize the significance of the different input variables in the SVMr. Finally, we test the performance of the SVMr approach against the results provided by a neural network, obtaining interesting improvements in the prediction system.

The structure of the rest of the paper is the following: the next section presents the Materials and methods used in the paper, including the description of the $\varepsilon$-SVMr approach, the bounds for a better estimation of the SVMr hyper-parameters, the description of the Madrid air quality network and how to use the high number of measuring stations to improve the O3 forecasting. Section 3 presents the experimental part of the paper, where we provide the main results obtained with the SVMr, and a comparison with a multi-layer perceptron algorithm. Section 4 closes the paper giving some final conclusions.

2. Materials and methods

In this section we briefly describe the main characteristics of SVMr, including a brief discussion of the hyper-parameters search space reductions considered. We also describe in this section the air quality network of the Madrid urban area, whose data have been used in the experimental section.

2.1. Support vector regression algorithms

Support vector machines for regression (SVMr) (Smola and Schölkopf, 1998) are one of the most important statistical forecasting models of the last few years. The SVMrs are appealing algorithms for a large variety of regression problems (Mohandes et al., 2004; Akay, 2009; Hou and Li, 2009), since they take into account not only the error approximation to the data, but also the generalization of the model, i.e., their capability to improve the prediction of the model when new data must be evaluated. Although there are several versions of the SVMr, in this case we describe the classical model, presented in (Smola and Schölkopf, 1998) ($\varepsilon$-SVM).

The $\varepsilon$-SVM method for regression consists of, given a set of training vectors $C = \{(x_i, y_i),\ i = 1, \ldots, l\}$, training a model of the form $y(x) = f(x) + b = w^T \phi(x) + b$, to minimize a general risk function of the form

$$R[f] = \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{l} L(y_i, f(x_i)) \tag{1}$$

where $w$ controls the smoothness of the model, $\phi(x)$ is a function of projection of the input space to the feature space, $b$ is a parameter of bias, $x_i$ is a feature vector of the input space with dimension $N$, $y_i$ is the output value to be estimated and $L(y_i, f(x_i))$ is the loss function selected. In this paper, we use the L1-SVRr (L1 support vector regression), characterized by an $\varepsilon$-insensitive loss function (Smola and Schölkopf, 1998)

$$L(y_i, f(x_i)) = \lvert y_i - f(x_i) \rvert_{\varepsilon} \tag{2}$$

In order to train this model, it is necessary to solve the following optimization problem (Smola and Schölkopf, 1998):

$$\min \left( \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \right) \tag{3}$$

subject to

$$y_i - w^T \phi(x_i) - b \leq \varepsilon + \xi_i, \quad i = 1, \ldots, l \tag{4}$$

$$-y_i + w^T \phi(x_i) + b \leq \varepsilon + \xi_i^*, \quad i = 1, \ldots, l \tag{5}$$

$$\xi_i, \xi_i^* \geq 0, \quad i = 1, \ldots, l \tag{6}$$

The dual form of this optimization problem is usually obtained through the minimization of the Lagrange function, constructed from the objective function and the problem constraints. In this case, the dual form of the optimization problem is the following:

$$\max \left( -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \right) \tag{7}$$

subject to

$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \tag{8}$$

$$\alpha_i, \alpha_i^* \in [0, C] \tag{9}$$

In addition to these constraints, the Karush–Kuhn–Tucker conditions must be fulfilled, and also the bias variable, $b$, must be obtained. We do not detail this process for simplicity; the interested reader can consult (Smola and Schölkopf, 1998) for reference. In the dual formulation of the problem the function $K(x_i, x_j)$ is the kernel matrix, which is formed by the evaluation of a kernel function, equivalent to the dot product $\langle \phi(x_i), \phi(x_j) \rangle$. A usual choice for this kernel function is a Gaussian function:

$$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^2 \right) \tag{10}$$
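As an illustration of the quantities defined in this section, the sketch below evaluates the ε-insensitive loss of Eq. (2), the Gaussian kernel of Eq. (10) and the regularized risk of Eq. (1) for given values. It is a minimal example in plain Python, not the implementation used in the paper; the function names and the toy numbers are hypothetical.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Eq. (2): zero inside the epsilon-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def gaussian_kernel(x_i, x_j, gamma):
    """Eq. (10): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def regularized_risk(w, targets, predictions, C, epsilon):
    """Eq. (1): 0.5 * ||w||^2 + C * sum of epsilon-insensitive losses."""
    norm_squared = sum(w_k ** 2 for w_k in w)
    total_loss = sum(
        epsilon_insensitive_loss(y, f, epsilon)
        for y, f in zip(targets, predictions)
    )
    return 0.5 * norm_squared + C * total_loss


# Toy values (hypothetical): two targets and two model outputs.
targets = [10.0, 10.0]
predictions = [10.3, 12.0]

print(epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5))  # inside the tube: 0.0
print(epsilon_insensitive_loss(10.0, 12.0, epsilon=0.5))  # outside the tube: 1.5
print(gaussian_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1)
print(regularized_risk([3.0, 4.0], targets, predictions, C=2.0, epsilon=0.5))  # 15.5
```

The example makes the role of the hyper-parameters visible: ε sets the width of the tube in which errors are ignored, C weights the data-fit term against the smoothness term, and γ controls how quickly the kernel decays with distance.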

The final form of function $f(x)$ depends on the Lagrange multipliers $\alpha_i$, $\alpha_i^*$, as follows:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) \tag{11}$$

In this way it is possible to obtain a SVMr model by solving a quadratic problem, given a set of hyper-parameters ($C$, $\varepsilon$ and $\gamma$ in this case). However, obtaining these hyper-parameters is not an easy task, and it is necessary to implement search algorithms to estimate them (Ortiz-García et al., 2009).

2.2. Hyper-parameters search space reductions

In the previous section we introduced the quadratic optimization problem which must be solved to train a SVMr. As previously mentioned, in order to solve this optimization problem, we need to define the hyper-parameters $C$, $\varepsilon$ and $\gamma$. This is a hard task in practice, because there is no theoretical way to define them. Thus, it is necessary to use some algorithm which finds the best set of hyper-parameters, i.e., the set which produces a SVMr model with the best performance. In addition, if we want to train many models with many different data sets, we need a fast and robust hyper-parameter search algorithm that does not lose accuracy. In (Ortiz-García et al., 2009) a novel methodology which obtains a good balance between training time and accuracy is introduced. This methodology is based on a classical grid search, which divides the search space into a uniform distribution of points over the whole space. Then, it evaluates the validation accuracy of the model trained by using the hyper-parameters at each point, and finally the model with the smallest validation error is chosen as the final one. The most important characteristic of this novel algorithm is the addition of hyper-parameters search space reductions. These reductions enclose the search space in a smaller subspace where the grid search is carried out. The search space reductions proposed in (Ortiz-García et al., 2009) are described by the following equations:

$$C \geq \frac{y_{\max} - b - \varepsilon}{1 - \frac{1}{l-1} \sum_{j=1,\, j \neq i}^{l} K(x_j, x_i)} \tag{12}$$

$$\gamma \leq \frac{-\log_e(0.001)}{\left( \frac{1}{l} \sum_{i=1}^{l} \min_{j,\, i \neq j} d(x_j, x_i) \right)^2} \tag{13}$$

$$\varepsilon < \sigma_y \tag{14}$$

Equation (12) describes the relationship between the regularization hyper-parameter $C$ and the rest of the hyper-parameters. The relationship of parameter $C$ with parameter $\gamma$ is especially important, because it generates the most important reduction in the search space. The remaining bounds (Equations (13) and (14)) are related, respectively, to the minimum influence between support vectors and to the close relationship between the $\varepsilon$ hyper-parameter and the variance of the noise in the data. After applying these reductions, a grid search algorithm is used in the experimental part of the paper to find the hyper-parameters of the SVMr in the O3 prediction problem.

2.3. The air pollution monitoring network of Madrid

The air pollution monitoring network of Madrid is the largest in Spain, and one of the largest in Europe. It is currently formed by 27 measuring (fixed) stations spread out in the city. Fig. 1 and Table 1 show the location and other characteristics of the air pollution measuring stations. The automatic air pollution monitoring network of Madrid started in 1978. At the beginning the network was formed by 16 measuring stations, connected by the telephone network to a data control center, which depended on the department of air quality of Madrid City Council. In 1989 the network was completely renewed, and new "intelligent" stations were acquired. At this point the systematic measurement of NOx and O3 started. The monitoring network in its current form was finished in 2001, when the last 2 stations were added to the network, and several other stations were moved from their original location for technical reasons.

Studies on the spatial distribution of contaminants in cities consider two types of monitoring networks to obtain data: routine networks and purpose-designed networks (Hoek et al., 2008). The air pollution monitoring network of Madrid used in this study is a routine network, which provides hourly O3 measurements at the different points of the city where stations are deployed. Several other studies have used data from routine monitoring networks, such as (Beelen et al., 2007; Agirre-Basurko et al., 2006; Coman et al., 2008; Ibarra-Berastegi et al., 2008; Moore et al., 2007).

3. Experimental part

3.1. Data available and methodology of this study

The available database is formed by hourly measures of O3 taken in the 27 stations of the air quality monitoring network of Madrid corresponding to six years, from 2002 to 2007. In order to reduce the number of experiments we only study the prediction of ozone concentrations in the measuring stations where the levels of ozone are the highest in general, in this case, stations 5, 9, 10, 14 and 24. In addition, we only evaluate the accuracy of models trained with data from months in which the ozone concentration is the highest, i.e., the summer months July and August. In the paper we try to obtain the best set of features to predict O3 concentration in a given station, by analyzing several models trained with different features. Note that each variable considered (previous O3 measures, O3 measures in other stations, meteorological variables etc.) can present different availability, because of lost measurements in different periods of time. Thus, we need to define each analysis depending on the features selected. Therefore, for each evaluated feature selection, we have chosen the set of input vectors where all features have been measured, discarding times in which any of the features are missing. On the other hand, we need to compare the real differences among the models trained with different feature sets by means of statistical tests. Thus, we divided the experiments by the six years of available data and, within each year, applied a K-fold cross validation to the two months considered. In this way, with K = 5 we obtain 30 different data sets. All the experiments carried out have been run on an Intel Xeon 2.66 GHz, with 4 cores and 16 GB of RAM. The computation time of the experiments carried out (SVM training and the corresponding t-test) is slightly different depending on the number of variables included in each training process. It varies from less than 1 min in the simplest case, up to 3 min in the hardest case with all the variables included. In any case, note that the complete process to evaluate the SVM performance in the problem is quite fast, and quite reasonable to assume in a real application of the proposed algorithm.

3.2. Influence of previous ozone concentrations in the studied measuring station

We carry out a first analysis of the number of O3 measurements in previous hours needed to improve the forecasting in different

measuring stations. The number of previous hours has been chosen from 1 to 4 h. The results obtained by changing this number of hours are shown in Table 2. In this table it is easy to see that the best prediction result is obtained when we consider 4 previous O3 measurements as input of the SVMr. This result is consistent in all the measuring stations studied.

Fig. 1. Location of the measuring stations of the air quality monitoring network of Madrid (in red), and meteorological stations (in grey). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Table 1
Characteristics of the air quality monitoring network stations in Madrid urban area.

Number  District           Latitude         Longitude       Altitude
1       Centro             40°25′21.36″ N   3°41′31.00″ W   648 m
2       Retiro             40°24′33.35″ N   3°41′29.51″ W   629 m
3       Centro             40°25′09.15″ N   3°42′11.42″ W   657 m
4       Moncloa            40°25′26.37″ N   3°42′44.40″ W   637 m
5       Fuencarral         40°28′41.62″ N   3°42′41.55″ W   673 m
6       Chamberí           40°26′15.39″ N   3°41′27.00″ W   669 m
7       Salamanca          40°25′47.81″ N   3°40′49.19″ W   679 m
8       Salamanca          40°25′17.63″ N   3°40′56.35″ W   672 m
9       Arganzuela         40°24′07.68″ N   3°41′36.35″ W   679 m
10      Chamberí           40°26′43.95″ N   3°42′25.66″ W   699 m
11      Chamartín          40°27′05.03″ N   3°40′38.47″ W   708 m
12      Salamanca          40°25′43.70″ N   3°40′06.78″ W   678 m
13      Puente Vallecas    40°23′17.34″ N   3°39′05.48″ W   677 m
14      Usera              40°23′06.28″ N   3°42′59.71″ W   605 m
15      Tetuan             40°28′05.73″ N   3°41′19.29″ W   729 m
16      Ciudad Lineal      40°26′24.17″ N   3°38′21.24″ W   698 m
17      Villaverde         40°20′54.25″ N   3°42′41.42″ W   593 m
18      Carabanchel        40°23′41.20″ N   3°43′54.60″ W   625 m
19      Latina             40°24′28.29″ N   3°44′30.83″ W   632 m
20      Moratalaz          40°24′28.64″ N   3°38′43.06″ W   671 m
21      Moncloa            40°26′27.51″ N   3°43′04.54″ W   672 m
22      Arganzuela         40°24′22.95″ N   3°42′46.56″ W   622 m
23      San Blas           40°26′55.44″ N   3°36′34.62″ W   637 m
24      Moncloa            40°25′09.68″ N   3°44′50.44″ W   645 m
25      Villa de Vallecas  40°22′44.48″ N   3°36′09.18″ W   652 m
26      Barajas            40°27′33.56″ N   3°34′48.42″ W   620 m
27      Barajas            40°28′36.94″ N   3°34′48.10″ W   631 m

Table 2
Mean and standard deviation of the accuracy for the experiments where the number of previous hours in the studied stations is changed.

         Number of previous hours
         1              2              3              4
Station  Mean   Std     Mean   Std     Mean   Std     Mean   Std
5        14.36  2.87    12.15  2.54    12.02  2.52    12.02  2.44
9        11.69  2.00    10.68  2.04    10.70  2.42    10.67  2.49
10       13.50  2.62    11.53  2.68    11.44  2.57    11.32  2.47
14       13.60  3.26    12.36  3.07    12.17  2.95    12.06  2.99
24       12.83  2.15    10.91  2.03    10.97  2.17    11.00  2.31

To evaluate the statistical significance of the differences in accuracy in these experiments we use some statistical tests. Specifically, we use a t-test or a Fisher sign test depending on the result of a normality test, carried out by means of a Kolmogorov–Smirnov test, all of them with a significance level α = 0.05. The results of these statistical tests are shown in Table 3. This table shows the statistical comparison between the model trained by using a number of O3 measurements in previous hours and the best model so far (trained with a smaller

number of O3 measurements in previous hours). For example, columns 2 and 3 show the comparison between the models trained by using 2 O3 measurements in previous hours and the models trained with only one. Since the differences are statistically significant, we fix the 2 h models as the best model and we compare it against the other models. The results are also significant when we compare the 3 h case, and increasing this number does not statistically improve the results again. Therefore, the best model obtained is the case of considering 3 previous O3 measurements in the previous hours.

Table 3
Statistical tests for the experiments where the number of previous hours in the studied stations is changed.

         Compared numbers of previous hours in the studied station
         2 vs. 1             3 vs. 2             4 vs. 3
Station  P-value  W-L-T      P-value  W-L-T      P-value  W-L-T
5        0.00*    30-0-0     0.07*    20-10-0    1.00*    16-14-0
9        0.00*    28-2-0     0.00**   26-4-0     0.33*    17-13-0
10       0.00*    29-1-0     0.01**   23-7-0     0.36**   18-12-0
14       0.00*    28-2-0     0.02*    25-5-0     0.23*    19-11-0
24       0.00*    30-0-0     0.36**   18-12-0    0.68*    19-11-0

*t-test α = 0.05.
**Fisher test α = 0.05.

3.3. Influence of neighbor measuring stations in the studied station

We also study the influence of different neighbor measuring stations on the prediction in a given station. In a first step, we analyze the number of previous O3 measurement hours in the nearest station, to fix the interval of time which yields a good mean accuracy. We use a number of hours from 0 (the case in which we only use information of the studied station) to 3 h. The results are shown in Table 4. This table shows that the mean accuracy is improved by using a larger number of previous O3 measurement hours in three stations (5, 14, 24), and it is also improved in stations 9 and 10 when 2 h of previous O3 measurements are used to train the model.

Table 4
Mean and standard deviation of the accuracy for the experiments where 3 previous hours in the studied station and a different number of previous hours of the nearest stations are used to train the SVMr model considered.

         Number of previous hours in the nearest station
         0              1              2              3
Station  Mean   Std     Mean   Std     Mean   Std     Mean   Std
5        12.24  2.42    12.03  2.44    11.84  2.35    11.79  2.36
9        10.45  2.20    10.58  2.53    9.90   1.89    9.91   1.96
10       11.52  2.49    11.38  2.51    11.31  2.49    11.41  2.56
14       11.89  2.86    11.81  2.78    11.49  2.64    11.48  2.67
24       10.88  2.07    10.67  2.26    10.34  2.07    10.28  1.93

We use statistical tests similar to those of the previous analysis to evaluate statistical differences. In this case, as Table 5 shows, the results obtained by using O3 measurements in 2 previous hours in the nearest station are statistically significant in all stations considered. On the other hand, the results obtained by using 3 h are quite similar, but they are worse in station 10. Therefore, we have chosen to use 2 previous hours of O3 measurements as the optimal value in neighbor stations.

Table 5
Statistical tests for the experiments where 3 previous hours in the studied station and a different number of previous hours of the nearest stations are used to train the model.

         Compared numbers of previous hours of ozone concentration in nearest station
         1 vs. 0             2 vs. 1             3 vs. 2
Station  P-value  W-L-T      P-value  W-L-T      P-value  W-L-T
5        0.00*    19-11-0    0.02*    22-8-0     0.54*    17-13-0
9        0.86**   14-16-0    0.00**   29-1-0     0.67*    17-13-0
10       0.02**   22-8-0     0.02*    20-10-0    0.00*    7-23-0
14       0.19*    21-9-0     0.00*    28-2-0     0.72*    19-11-0
24       0.11*    25-5-0     0.00*    25-5-0     0.20**   11-19-0

*t-test α = 0.05.
**Fisher test α = 0.05.

Next, we analyze the influence of several stations near to the studied one (not only the nearest one), by evaluating the accuracy of several models which are trained by using two previous hours of a different number of neighbor stations, from 1 to 4. Note that in the previous analysis we showed that the use of the nearest station is statistically significant compared to not using any information from neighbor stations. Thus, we take this model as the reference in this analysis. The results are shown in Table 6. In this table the mean accuracy is improved in all studied stations by increasing the number of neighbor stations considered.

Table 6
Mean and standard deviation of the accuracy for the experiments where the considered number of neighbor stations is changed from 1 to 4, and using O3 measurements at 2 previous hours from all of them.

         Number of near stations
         1              2              3              4
Station  Mean   Std     Mean   Std     Mean   Std     Mean   Std
5        11.90  2.50    11.62  2.51    11.60  2.45    11.51  2.52
9        9.96   1.79    9.83   1.88    9.78   1.99    9.71   1.90
10       11.35  2.65    11.15  2.58    10.75  2.32    10.66  2.30
14       11.56  2.71    11.54  2.65    11.50  2.81    11.33  2.70
24       10.33  2.11    10.13  2.20    9.81   1.95    9.94   2.27

As the statistical tests show (Table 7), when we use two neighbor stations we obtain statistically significant improvements in 3 out of 5 studied stations. On the other hand, with 3 and 4 stations we obtain statistically significant differences in some stations, but the models are similar in the rest of the cases. In addition, the winner value is larger in all the studied stations. Therefore, by using four near stations we obtain better or similar results than using a different number of neighbor stations.

3.4. Influence of meteorological variables

In the same way as in the previous analysis, we search for the number of hours needed to improve the performance of the best model so far when we use some meteorological variables such as solar radiation and temperature. Note that the meteorological values used in a given air quality station correspond to the nearest meteorological station. In both analyses (solar radiation and temperature), we have seen that the most significant number of hours to include solar radiation and temperature is the 2 previous hours. With this value, we compare the use of solar radiation, the use of temperature and both of them. The results of this analysis are shown in Table 8. These results show that the use of both meteorological variables improves the accuracy of the rest of the models in almost all stations (all stations but 14, where the accuracy is not improved).

Table 9 shows the statistical tests corresponding to the previous experiments, comparing the use of temperature, or of solar radiation and temperature, to the case in which only solar radiation is used. In this case, the results are statistically improved in two stations (5, 9) when both meteorological variables are included in the

training of the SVMr prediction model. In addition, in stations 10 and 24 it is easy to see that the winner values are larger than the lost values, and thus, using both meteorological variables can also be considered as the best choice, as expected.

Table 7
Statistical tests for the experiments where the considered number of neighbor stations is changed from 1 to 4, using O3 measurement at 2 previous hours from all of them.

         Compared numbers of near stations
         2 vs. 1             3 vs. 2             4 vs. 3
Station  P-value  W-L-T      P-value  W-L-T      P-value  W-L-T
5        0.00*    23-7-0     0.82*    17-13-0    0.29*    19-11-0
9        0.08*    20-10-0    0.51*    19-11-0    0.27*    15-15-0
10       0.02*    21-9-0     0.00*    26-4-0     0.21*    20-10-0
14       0.87*    18-12-0    0.67*    21-9-0     0.02*    25-5-0
24       0.03*    23-7-0     0.04**   21-9-0     0.20**   19-11-0

*t-test α = 0.05.
**Fisher test α = 0.05.

Table 8
Mean and standard deviation of the accuracy for the experiments where solar radiation, temperature or both are considered with measurements at 2 previous hours.

         Solar radiation   Temperature      Both
Station  Mean   Std        Mean   Std       Mean   Std
5        11.10  2.45       11.16  2.47      10.89  2.47
9        9.48   1.86       9.55   1.89      9.29   1.83
10       10.37  2.28       10.46  2.21      10.23  2.24
14       10.69  2.87       10.87  2.79      10.77  2.76
24       9.80   2.23       9.59   1.82      9.60   1.84

Table 9
Statistical tests for the experiments where solar radiation, temperature or both are considered, with measurements at 2 previous hours.

         Temperature vs. Solar Radiation   Both vs. Solar Radiation
Station  P-value  W-L-T                    P-value  W-L-T
5        0.40*    12-18-0                  0.00*    25-5-0
9        0.38*    13-17-0                  0.03*    20-10-0
10       0.19*    10-20-0                  0.06*    18-12-0
14       0.06*    9-21-0                   0.32*    11-19-0
24       0.58**   17-13-0                  0.36**   18-12-0

*t-test α = 0.05.
**Fisher test α = 0.05.

3.5. Performance evaluation of the best SVMr model: comparison with a multi-layer perceptron

We evaluate the performance of the best SVMr model obtained in the previous analysis: the model trained using three previous hours of ozone concentrations in the studied station, 2 previous hours of ozone measurements in the 4 nearest stations and 2 previous hours of solar radiation and temperature. To evaluate the performance we use several methods: the first one is the evaluation of the well-known $R^2$ coefficient, which introduces a ratio between the variance of the studied model and that of the model formed by the mean of the output variable. The results of this coefficient in all experiments carried out are summarized in Table 10. In this table the mean and standard deviation of the $R^2$ coefficient, as well as its minimum and maximum values in the 30 experiments in each station, show a good performance of our model. The $R^2$ is less than one in all experiments and stations, and the mean shows more than a 68% reduction of the mean square error.

Table 10
Mean, standard deviation, minimum and maximum R2 coefficient for 30 experiments with the best SVMr model.

         Coefficient R2
Station  Mean   Std    Min    Max
5        0.27   0.17   0.07   0.79
9        0.32   0.21   0.08   0.88
10       0.26   0.20   0.07   0.93
14       0.30   0.18   0.10   0.85
24       0.24   0.13   0.05   0.53

Fig. 2. Structure of the MLP used for comparison.

Table 11
Mean and standard deviation of the accuracy obtained by training different models, including the previous hour predictor, a multi-layer perceptron and our approach.

         Previous hour predictor   MLP              SVMr
Station  Mean   Std                Mean   Std       Mean   Std
5        14.97  3.34               35.20  10.01     10.89  2.47
9        12.12  2.30               38.17  15.94     9.29   1.83
10       14.03  3.18               34.77  9.04      10.23  2.24
14       13.63  3.55               37.26  10.57     10.77  2.76
24       13.23  2.43               29.72  11.27     9.60   1.84

Table 12
Statistical test for the accuracy obtained by training different models, including the previous hour predictor, a multi-layer perceptron and our approach.

         SVMr vs. Previous hour predictor   SVMr vs. MLP
Station  P-value  W-L-T                     P-value  W-L-T
5        0.00*    30-0-0                    0.00*    30-0-0
9        0.00*    29-1-0                    0.00*    30-0-0
10       0.00*    30-0-0                    0.00*    30-0-0
14       0.00*    29-1-0                    0.00*    30-0-0
24       0.00*    29-1-0                    0.00*    30-0-0

*t-test α = 0.05.

Next, we evaluate the performance of the proposed SVMr approach by comparing it to other prediction techniques. In our case, we compare it to the prediction given by the ozone concentration in the previous hour and to the prediction given by a multi-layer perceptron (MLP) trained with the same features as the ones used in our model. We have considered a MLP with the classical structure


Fig. 3. Evolution of ozone concentration during first two weeks in July for the station 9 and years from 2002 to 2007.

of one input layer, a hidden layer and one output, as can be seen in Fig. 2, similar to the MLP used recently in other papers in the literature (Salazar-Ruiz et al., 2008; Chelani, 2010). We have applied the well-known Levenberg-Marquardt (LM) algorithm to train the network, with the best set of input variables obtained in the previous study for the SVM approach, i.e., 3 previous hours of ozone concentrations in the studied station, 2 previous hours of ozone measurements in the 4 nearest stations and 2 previous hours of solar radiation and temperature.

The obtained results are shown in Table 11. It can be seen that the SVMr model obtained much better mean accuracy than the multi-layer perceptron approach, which is even worse than the previous hour predictor. This is interesting because it shows that the database presents non-linear characteristics which a multi-layer perceptron is not able to learn. The statistical tests in Table 12 show that the differences in mean accuracy are significant in all experiments carried out. Following these results, the proposed SVMr approach is a very good choice to carry out the hourly ozone prediction in Madrid's air quality monitoring system.

Note that in this paper we have compared the SVM results against those of an MLP, since this approach has been used as a reference in previous studies about ozone prediction. Of course, much more recent alternatives exist that could obtain excellent results in ozone prediction problems, and it could be an interesting topic for future research to test them in similar problems. For example, Gaussian processes (Sun and Yao, 2006a,b,c) are a technique which could provide very good results, since it has been proven to be effective in regression problems before. Other techniques such as kernel models with boosting or marginal likelihood (Sun and Yao, 2010; Sun and Yao, 2006a,b,c) have been recently discussed, and could be successfully applied to prediction problems in atmospheric science.

Finally, an example of the prediction obtained with the model is shown in Fig. 3. Here we can see the good approximation of the prediction to the real data for the month of July of all studied years.

4. Conclusions

In this paper we have presented a study of the application of Support Vector Regression algorithms (SVMrs) in the hourly prediction of ozone measures in Madrid urban area. Specifically, data from the air quality network of Madrid have been used. The influence of different input variables (previous ozone measures in the station, values in neighborhood stations and meteorologic variables) has been analyzed by means of statistical tests which allow measuring the statistical significance of each set of variables in the ozone prediction. Finally, a comparison with a multi-layer perceptron algorithm has shown the superiority of the SVMr in this particular problem. We have shown that the SVMr model is able to obtain accurate prediction of hourly ozone in the network of Madrid, and this model could be extended by considering data of any other monitoring network in a different city.

References

Agirre-Basurko, E., Ibarra-Berastegui, G., Madariaga, I., 2006. Regression and multilayer perceptron-based methods models to forecast hourly O3 and NO2 levels in the Bilbao area. Environmental Modelling & Software 21, 430-446.
Akay, M.F., 2009. Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems with Applications 36 (2), 3240-3247.
Al-Alawi, S.M., Abdul-Wahab, S.A., Bakheit, C.S., 2008. Combining principal component regression and artificial neural networks for more accurate predictions of ground-level ozone. Environmental Modelling & Software 23, 396-403.
Aneiros-Perez, G., Cardot, H., Estevez-Perez, G., Vieu, P., 2004. Maximum ozone concentration forecasting by functional non-parametric approaches. Environmetrics 15 (7), 675-685.
Balaguer-Ballester, E., Camps i Valls, G., Carrasco-Rodriguez, J.L., Soria-Olivas, E., Valle-Tascon, S., 2002. Effective 1-day ahead prediction of hourly surface ozone concentrations in eastern Spain using linear models and neural networks. Ecological Modelling 156, 27-41.
Barrero, M.A., Grimalt, J.O., Cantón, L., 2006. Prediction of daily ozone concentration maxima in the urban atmosphere. Chemometrics and Intelligent Laboratory Systems 80 (1), 67-76.
Beelen, R., Hoek, G., Fischer, P., Van der Brandt, P.A., Brunekreef, B., 2007. Estimated long-term outdoor air pollution concentrations in a cohort study. Atmospheric Environment 41, 1343-1358.

Brunelli, U., Piazza, V., Pignato, L., Sorbello, F., Vitabile, S., 2007. Two-days ahead prediction of daily maximum concentrations of SO2, O3, PM10, NO2, CO in the urban area of Palermo, Italy. Atmospheric Environment 41, 2967-2995.
Chelani, A.B., 2010. Prediction of daily maximum ground ozone concentration using support vector machine. Environmental Monitoring and Assessment 162 (1-4), 169-176.
Coman, A., Ionescu, A., Candau, Y., 2008. Hourly ozone prediction for a 24-h horizon using neural networks. Environmental Modelling & Software 23, 1407-1421.
Dutot, A.L., Rynkiewicz, J., Steiner, F.E., Rude, J., 2007. A 24-h forecast of ozone peaks and exceedance levels using neural classifiers and weather predictions. Environmental Modelling & Software 22, 1261-1269.
Felipe-Sotelo, M., Gustems, L., Hernández, I., Terrado, M., Tauler, R., 2004. Investigation of geographical and temporal distribution of tropospheric ozone in Catalonia (North-East Spain) during the period 2000-2004 using multivariate data analysis methods. Atmospheric Environment 40, 7421-7436.
Garfias-Vázquez, M., Audry-Sánchez, J., Garfias-Ayala, F.J., 2005. Tropospheric ozone prediction in Mexico city. Journal of the Mexican Chemistry Society 49 (1), 2-9.
Hoek, G., Beelen, R., Hoogh, K., Vienneau, D., Gulliver, J., Fischer, P., Briggs, D., 2008. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmospheric Environment 42, 7561-7578.
Hou, S., Li, Y., 2009. Short-term fault prediction based on support vector machines with parameter optimization by evolution strategy. Expert Systems with Applications 36, 12383-12391.
Ibarra-Berastegi, G., Elias, A., Barona, A., Saenz, J., Ezcurra, A., Diaz de Argandoña, J., 2008. From diagnosis to prognosis for forecasting air pollution using neural networks: air pollution monitoring in Bilbao. Environmental Modelling & Software 23, 622-637.
Ionescu, A., Candau, Y., Mayer, E., Colda, I., 2000. Analytical determination and classification of pollutant concentration fields using air pollution monitoring network data: methodology and application in the Paris area, during episodes with peak nitrogen dioxide levels. Environmental Modelling & Software 15 (6-7), 565-573.
Kanaroglou, P., Jerret, M., Morrison, J., Beckerman, B., Arain, M.A., Gilbert, N.L., Brook, J.R., 2005. Establishing an air pollution monitoring network for intra urban population exposure assessment: a location-allocation approach. Atmospheric Environment 39, 2399-2409.
Lu, W.-Z., Wang, W.-J., 2005. Potential assessment of the support vector machine method in forecasting ambient air pollutant trends. Chemosphere 59 (5), 693-701.
Lu, W., Wang, D., 2008. Ground-level ozone prediction by support vector machine approach with a cost-sensitive classification scheme. Science of the Total Environment 395, 109-116.
Lu, H.-C., Hsieh, J.-C., Chang, T.-S., 2006. Prediction of daily maximum ozone concentrations from meteorological conditions using a two-stage neural network. Atmospheric Research 81, 124-139.
Luan, F., Xue, C., Zhang, R., Zhao, C., Liu, M., Hu, Z., Fan, B., 2005. Prediction of retention time of a variety of volatile organic compounds based on the heuristic method and support vector machine. Analytica Chimica Acta 537 (1-2), 101-110.
Massart, B., Kvalheim, O.M., 1998a. Ozone forecasting from meteorological variables: part I. Predictive models by moving window and partial least squares regression. Chemometrics and Intelligent Laboratory Systems 42 (1-2), 179-190.
Massart, B., Kvalheim, O.M., 1998b. Ozone forecasting from meteorological variables: part I. Predictive models by moving window and partial least squares regression. Chemometrics and Intelligent Laboratory Systems 42 (1-2), 191-197.
Mohandes, M.A., Halawani, T.O., Rehman, S., Hussain, A.A., 2004. Support vector machines for wind speed prediction. Renewable Energy 29 (6), 939-947.
Moore, D.K., Jerrett, M., Mack, W.J., Kunzli, N., 2007. A land use regression model for predicting ambient fine particulate matter across Los Angeles. Journal of Environmental Monitoring 9, 246-252.
Ortiz-García, E., Salcedo-Sanz, S., Pérez-Bellido, A., Portilla-Figueras, J.A., 2009. Improving the training time of support vector regression algorithms through novel hyper-parameters search space reductions. Neurocomputing 72, 3683-3691.
Osowski, S., Garanty, K., 2007. Forecasting of the daily meteorological pollution using wavelets and support vector machine. Engineering Applications of Artificial Intelligence 20 (6), 745-755.
Palacios, M., Kirchner, F., Martilli, A., Clappier, A., Martín, F., Rodríguez, M.E., 2002. Summer ozone episodes in the Greater Madrid area. Analyzing the ozone response to abatement strategies by modelling. Atmospheric Environment 36, 5323-5333.
Salazar-Ruiz, E., Ordieres, J.E., Vergara, E.P., Capuz-Rizo, S.E., 2008. Development and comparative analysis of tropospheric ozone prediction models using linear and artificial intelligence-based models in Mexicali, Baja California (Mexico) and Calexico, California (US). Environmental Modeling & Software 23 (8), 1056-1069.
Smola, A.J., Schölkopf, B., 1998. A tutorial on support vector regression. Statistics and Computing 14 (3), 199-222.
Sousa, S., Martins, F.G., Pereira, M.C., Alvim-Ferraz, M., 2006. Prediction of ozone concentrations in Oporto city with statistical approaches. Chemosphere 64, 1141-1149.
Sun, P., Yao, X., 2006a. Boosting kernel models for regression. In: Proc. of the 6th IEEE Conference on Data Mining, Hong-Kong, pp. 583-591.
Sun, P., Yao, X., 2006b. Efficient forward kernel regression with marginal likelihood. In: Proc. of the 14th European Symposium on Artificial Neural Networks, Bruges, Belgium, pp. 485-490.
Sun, P., Yao, X., 2006c. Greedy forward selection algorithms to sparse Gaussian process regression. In: Proc. of the International Joint Conference on Artificial Neural Networks, Vancouver, Canada, pp. 159-165.
Sun, P., Yao, X., 2010. Sparse approximation through boosting for learning large-scale kernel machines. IEEE Transactions on Neural Networks 21 (6), 883-894.
Wang, D., Lu, W., 2006. Ground-level ozone prediction using multilayer perceptron trained with an innovative hybrid approach. Ecological Modelling 198, 332-340.
Wang, W., Men, C., Lu, W., 2008. Online prediction model based on support vector machine. Neurocomputing 71 (4-6), 550-558.
