Atmospheric Environment
journal homepage: www.elsevier.com/locate/atmosenv
Highlights
A new computational code has been implemented for retrieving pollutant sources.
The code (GAIM) employs a genetic algorithm scheme and has been parallelized.
Effective identification of emission rates and positions of up to three sources.
Article info
Article history:
Received 24 September 2014
Received in revised form 12 May 2015
Accepted 15 May 2015
Available online 16 May 2015
Keywords:
Atmospheric pollution
Inverse model
Multi-source
Genetic algorithm
Abstract
A computational model is developed for retrieving the positions and the emission rates of unknown
pollution sources, under steady-state conditions, starting from measurements of the pollutant
concentration. The approach is based on the minimization of a fitness function using a genetic
algorithm paradigm. The model is tested on both pollutant concentrations generated through a
Gaussian model at 25 points in a 3-D test case domain (1000 m × 1000 m × 50 m) and experimental data,
namely the Prairie Grass field experiments, in which about 600 receptors were located along five
concentric semicircular arcs, and the Fusion Field Trials 2007. The results show that the computational
model is capable of efficiently retrieving up to three different unknown sources.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Air pollution is still one of the most significant problems in
modern society. Urban and industrial development has a strong impact on
pollutant emissions into the atmosphere, with
severe implications for the environment and for public
health (Tiwary and Colls, 2010). As a consequence, many efforts
(Hart and Martinez, 2006) have focused on the development
of methodologies that take advantage of the measurements
carried out by sensor networks for the monitoring and
identification of pollutant sources.
The recognition of the number, the location and the emission
rate of unknown sources, starting from measurements of the
pollutant concentration, is referred to as an inverse model.
Models of this kind can be used to recognize the most relevant
pollution sources or to identify illegal releases into the atmosphere.
* Corresponding author.
E-mail address: luca.cedola@uniroma1.it (L. Cedola).
http://dx.doi.org/10.1016/j.atmosenv.2015.05.030
1352-2310/© 2015 Elsevier Ltd. All rights reserved.
requirement of a priori information and to the expensive computational efforts. Moreover, Allen et al. (2007a) developed a hybrid
optimization technique that overcomes the limitations deriving
from the initialization process, which requires several simulations.
In particular, in the case of ill-posed inverse problems, where a
multi-modal solution exists, the above-mentioned techniques
may not always be able to provide the exact solution. With the
introduction of the adjoint functions, as defined by Marchuk et al.
(2005) and re-elaborated by Issartel et al. (2007) and Sharan et al.
(2009), ill-posed problems do not need initialization or a priori information regarding the release. In a very recent paper, Sharan
et al. (2012) used adjoint functions to identify multiple-point emissions with a two-step inversion algorithm. However, the application of this algorithm to large domains with several
sources becomes highly demanding in terms of computational effort, as pointed out by the authors. Even more recently, a least-squares inversion algorithm was implemented by Singh
et al. (2014) in order to reconstruct the unknown height of the source together with its projected
horizontal location and intensity. The proposed algorithm has been successfully applied to
identify an elevated point source from both synthetic and real data.
In the present study, an approach employing the adjoint functions and a genetic algorithm (GA) has been implemented in order
to solve the problem of retrieving simultaneous multiple pollutant
releases. GAs (Holland, 1992; Goldberg, 1989; Mitchell, 1998;
Whitley, 2001; Hamblin, 2013) are a family of computational
models inspired by evolution. Since their first formulation, one of
their most important applications has been in global search problems, owing to their robustness, fast convergence and intrinsic
parallelism. For these reasons the GA approach is well suited to
function minimization. Since the problem of detecting the
positions of pollutant sources can be formulated in general terms as
the minimization of an appropriate cost function, GAs have been
used recently for the characterization of atmospheric contaminant
sources, as in the series of articles by Haupt (2005), Haupt et al.
(2006) and Allen et al. (2007a,b). There, a thorough investigation is carried out in order to develop methodologies that couple a dispersion and transport model with a pollution receptor model for
identifying emission sources and that can also take
meteorological data into account to improve the pollutant source
characterization. It is in this direction that we have implemented a
GA procedure in a new computational code for the identification of
multiple pollutant sources, retrieving both their three-dimensional
positions and their release intensities, i.e. for the solution of the
inverse model.
2. Computational method
In this work a least-squares formulation, as described by Sharan
et al. (2012), has been used to define the fitness (cost) function
J to be minimized in our GA Inverse Model (GAIM) code. The
identification of several simultaneous sources has been carried out
starting from a finite set of concentration measurements
m_1, m_2, …, m_n. The number of simultaneous releases is assumed to be
known a priori. The initial requirement of the formulation is a
source-sensor relationship, which describes the mapping between
the sources and the measurements obtained by the sensors. The relationship between sources and measurements m_i is described by
introducing the adjoint functions (Marchuk et al., 2005) as:
\[ m_i = q\,a_i \quad \text{for} \quad i = 1, 2, \ldots, n \tag{1} \]
where n is the number of sensors, a_i is the adjoint function corresponding to the i-th sensor and q is the unknown source intensity.
The adjoint function describes backward transport of the
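\[ J(x_1, x_2, \ldots, q) = \frac{1}{2} \left[ \mathbf{m} - \sum_{i=1}^{m} q_i\, a(x_i) \right]^{T} \left[ \mathbf{m} - \sum_{i=1}^{m} q_i\, a(x_i) \right] \tag{2} \]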
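As an illustration of how the least-squares fitness of Eq. (2) can be evaluated, the following Python sketch assumes that the adjoint functions have already been sampled at the candidate source positions and stored column-wise in a matrix (one column per source, one row per sensor). The names `fitness`, `adjoint_columns` and `intensities` are hypothetical and are not taken from the GAIM code; the numbers form a toy example only.

```python
import numpy as np


def fitness(measurements, adjoint_columns, intensities):
    """Least-squares cost J = 0.5 * r^T r, with r = m - sum_i q_i a(x_i).

    measurements    : measured concentrations, shape (n_sensors,)
    adjoint_columns : adjoint values a(x_i) at each candidate source
                      position, shape (n_sensors, n_sources) (assumed given)
    intensities     : candidate source intensities q_i, shape (n_sources,)
    """
    m = np.asarray(measurements, dtype=float)
    a = np.asarray(adjoint_columns, dtype=float)
    q = np.asarray(intensities, dtype=float)
    residual = m - a @ q  # measured minus modeled concentrations
    return 0.5 * float(residual @ residual)


# Toy example: 3 sensors, 2 sources (values are illustrative only).
A = np.array([[1.0, 0.0],
              [0.5, 2.0],
              [0.0, 1.0]])
q_true = np.array([2.0, 3.0])
m_obs = A @ q_true                          # synthetic measurements

j_exact = fitness(m_obs, A, q_true)         # candidate equal to the truth
j_wrong = fitness(m_obs, A, [2.0, 2.0])     # second intensity is off by 1
```

In this sketch a candidate that reproduces the measurements exactly has zero cost, which is the condition the GA search tries to reach.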
Fig. 1. a) Sketch of the grid-domain showing the correspondence between tuples and grid points. The chosen axes orientation is also shown; b) Example of crossover considering a
run with 4 unknown sources (i.e. sets of 4 tuples); c) Example of mutation considering a run with 4 unknown sources.
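A minimal sketch of the two operators illustrated in panels b) and c) of Fig. 1 is given below. It assumes that an individual is a list of integer grid-index tuples (one tuple per unknown source), that crossover is a one-point exchange of tuples between two parents, and that mutation shifts one index of one tuple by a single grid step. These choices, and all function names, are assumptions made for illustration and may differ from the operators actually implemented in GAIM.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover: the two children swap the tails of the parents."""
    cut = rng.randrange(1, len(parent_a))  # cut point between two tuples
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def creep_mutation(individual, grid_shape, rng):
    """Shift one index of one randomly chosen tuple by +/-1 grid step.

    The new index is clamped to the grid, so a point on the boundary
    may be left unchanged.
    """
    mutated = list(individual)
    source = rng.randrange(len(mutated))   # which source tuple to mutate
    axis = rng.randrange(3)                # which coordinate (x, y or z)
    step = rng.choice((-1, 1))
    point = list(mutated[source])
    point[axis] = min(max(point[axis] + step, 0), grid_shape[axis] - 1)
    mutated[source] = tuple(point)
    return mutated


# Example with 4 unknown sources, i.e. individuals made of 4 tuples.
rng = random.Random(0)
parent_a = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)]
parent_b = [(9, 9, 9), (8, 8, 8), (7, 7, 7), (6, 6, 6)]
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = creep_mutation(parent_a, (10, 10, 10), rng)
```

Crossover recombines whole source tuples, so it explores new combinations of positions already present in the population, whereas the creep step only refines a position locally.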
Fig. 3. Example of the migration process among 4 CPUs at the i-th iteration.
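\[ c(x, y, z) = \frac{Q}{2\pi u \sigma_y \sigma_z} \exp\!\left( -\frac{y^2}{2\sigma_y^2} \right) \left[ \exp\!\left( -\frac{(z - h)^2}{2\sigma_z^2} \right) + \exp\!\left( -\frac{(z + h)^2}{2\sigma_z^2} \right) \right] \tag{3} \]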
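For illustration, Eq. (3) can be evaluated with a few lines of Python. The sketch below assumes that the dispersion parameters sigma_y and sigma_z have already been computed for the downwind distance of the receptor (their parametrization is not reproduced here), that y is the crosswind offset from the plume axis and that h is the source height. The function name and the numerical values are hypothetical.

```python
import math


def plume_concentration(y, z, Q, u, h, sigma_y, sigma_z):
    """Steady-state Gaussian plume with ground reflection, as in Eq. (3).

    y, z             : crosswind offset and height of the receptor
    Q                : emission rate of the source
    u                : wind speed
    h                : source height
    sigma_y, sigma_z : dispersion parameters evaluated at the downwind
                       distance of the receptor (assumed to be given)
    """
    lateral = math.exp(-y**2 / (2.0 * sigma_y**2))
    vertical = (math.exp(-(z - h)**2 / (2.0 * sigma_z**2))
                + math.exp(-(z + h)**2 / (2.0 * sigma_z**2)))
    return Q / (2.0 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


# Illustrative values: ground-level source and receptor on the plume axis.
c_axis = plume_concentration(y=0.0, z=0.0, Q=100.0, u=2.0, h=0.0,
                             sigma_y=10.0, sigma_z=5.0)
# The same receptor displaced by 15 m on either side of the plume axis.
c_left = plume_concentration(y=-15.0, z=0.0, Q=100.0, u=2.0, h=0.0,
                             sigma_y=10.0, sigma_z=5.0)
c_right = plume_concentration(y=15.0, z=0.0, Q=100.0, u=2.0, h=0.0,
                              sigma_y=10.0, sigma_z=5.0)
```

The concentration is linear in Q, which is what allows a source-sensor relationship of the form of Eq. (1) to be written with a unit-emission response.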
and the obtained results. Fig. 4 shows the planar view of the 3D test
case domain, in which the receptors are represented by red crosses
and the unknown sources (to be identified), Qin:i, are represented
by blue stars. The values of the employed meteorological variables
are reported in Table 1, while the positions and the emission rates of
the sources for the chosen test case are reported in Table 2. Typical
summer conditions in a Mediterranean climate have been taken into
account, with low wind speed, westerly wind direction and a very
unstable stability class. The test case shown in Fig. 4 has been
chosen to highlight the performance of our inverse model when a
difficult configuration has to be solved. In fact, in this particular
case two of the sources are located close to the boundary along the
y axis of the domain (Qin:1 and Qin:2) and two sources (Qin:2 and
Qin:3) are aligned along the wind
direction, so that Qin:3 turns out to be hidden by Qin:2. Furthermore, the emission rate of one of the three sources is much lower
than those of the other two (source No. 3 in Table 2).
The input parameters needed for our GAIM code are reported in
Table 3. Eight CPUs have been used and a population of 50,000
individuals has been assigned to each CPU. The maximum number
of iterations has been set to 500. The mutation probability pm has been set to 0.5, the individuals that
enter the elite are 10% of the population and the migration
module has been activated. The computational time was
74 min on a four-core Intel(R) Xeon(R) dual CPU E3-1245 V2 @
3.40 GHz (32 GiB of memory), and the obtained results are presented in Figs. 5–7.
Fig. 5 shows the results obtained by our code for the described test case. Panel a) shows the planar view of the
domain. The positions of the unknown sources given as input (Qin:i)
are represented as blue stars and the source positions obtained as
output (Qout:i) as green full circles. The employed meteorological
parameters are indicated in the figure. As one can see, both the
emission rates (see the right side of panel a)) and the positions of
the unknown sources are exactly identified by the model. In panel
b) the three-dimensional domain is sketched along with the identified sources. In both panels, the red crosses represent the
receptors.
Fig. 6 reports the lowest fitness value, as defined in Eq. (2),
corresponding to each generation, i.e. to each iteration. One can see
that the fitness value decreases very quickly in a few iterations
and tends to stabilize for the remaining ones, a common feature of
any type of GA procedure. In fact, at the beginning of the run the
population is more heterogeneous, being randomly generated, and
it is the crossover operator that creates offspring better (i.e. with
lower fitness values) than their parents. After several iterations, the population tends to be more homogeneous and only the mutation creep
operator can produce significant effects on the fitness value. As
shown in Fig. 6, starting from the 62nd iteration, the fitness value
drops by several orders of magnitude in a few generations
due to the mutation creep module. This is a paradigmatic case in
which the mutation creep step is essential for finding the exact
solution. For each generation, a new elite is defined. At the end of
the process, the individuals with the lowest fitness value in the elite
will represent the solution. Fig. 7 shows in a two-dimensional cut
(z = 2 m) the evolution of the individuals in the elite from the 1st
Fig. 4. Planar view of the 3D test case domain (1000 m × 1000 m × 50 m) with the
presence of three randomly chosen sources (Q_in:i, i = 1, 2, 3) represented by blue
stars. The crosses represent the position of the receptors. (For interpretation of the
references to colour in this figure legend, the reader is referred to the web version of
this article.)
Table 1
Meteorological parameters employed in synthetic (test cases) calculations.
Variable                   Value
Wind speed                 2.0 [m s⁻¹]
Wind direction             270 [° N]
Pasquill stability class   Very unstable
Table 2
Data of the three sources for the selected test case.
Source no.   Q [g m⁻³]   x [m]   y [m]   z [m]
1            100.0       17      773     18
2            47.0        97      374     43
3            1.0         430     379     11
Table 3
GAIM parameters used for all the synthetic calculations.
Parameter              Value
No. of CPUs            8
Population size        50,000
No. of iterations      500
Mutation probability   0.5
Elite size             5000
Migration flag         True
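The settings of Table 3 can be collected in a small configuration object, which also makes the relation between elite size and population size explicit. The class and field names below are hypothetical and do not reflect the actual GAIM input format; only the numerical values come from Table 3, with the population size understood as the number of individuals assigned to each CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaimParams:
    """Hypothetical container for the GA settings listed in Table 3."""
    n_cpus: int = 8
    population_size: int = 50_000      # individuals assigned to each CPU
    n_iterations: int = 500            # maximum number of generations
    mutation_probability: float = 0.5
    elite_size: int = 5_000
    migration: bool = True

    @property
    def elite_fraction(self) -> float:
        """Share of each population that enters the elite."""
        return self.elite_size / self.population_size

    @property
    def total_individuals(self) -> int:
        """Individuals evolved per generation over all CPUs."""
        return self.n_cpus * self.population_size


params = GaimParams()
elite_fraction = params.elite_fraction        # elite size over population size
total_individuals = params.total_individuals  # population summed over the CPUs
```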
Fig. 6. Evolution of the lowest fitness value as a function of the number of iterations (i.e.
generations). At the 62nd iteration the mutation creep module starts. See main text for
details.
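[Equation (4): definition of the estimator as an average over the m sources, (1/m) Σ_{i=1}^{m}, in terms of the relative emission-rate difference (Q_in,i − Q_out,i)/Q_in,i and the square root of the squared coordinate differences (x_in,i − x_out,i)², (y_in,i − y_out,i)² and (z_in,i − z_out,i)².] (4)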
Fig. 5. Results obtained with our GAIM code for the selected test case. a) Planar view. The blue stars represent the unknown emission sources given as input (Qin:i), the green full
circles represent the source positions obtained as output (Qout:i) and the crosses represent the receptors. The numerical values of the input and output emission rates are displayed
on the right of this panel. b) Three-dimensional view of the same results. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version
of this article.)
Fig. 7. Evolution of the elite individuals from the 1st generation up to the 299th. Red crosses indicate the receptors, blue stars represent the three unknown sources and the black dots are
tuples representing the elite individuals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Bar plot for all the synthetic runs taking into account the calculated values of
the estimator. The blue, red and yellow bars indicate one-, two- and three-source
runs, respectively. See main text for details. (For interpretation of the references to colour in
this figure legend, the reader is referred to the web version of this article.)
model is able to retrieve both position and emission rate for all
runs, that is, the estimator is equal to zero for all the 100 configurations (blue bar in Fig. 8). For the multiple-source cases it turns
out that the estimator can be greater than 10 m. In particular, for two sources 95
configurations are in the first class (0–10) and 81 test cases out of
these 95 give the exact solution (0). For the three-source case, 82
configurations are in the class 0–10 and for 47 test cases out of
these 82 we obtained the exact solution. It was verified that the test
cases for which the estimator is greater than 10 m occur when one
or two sources are shadowed by another source along the wind direction. Furthermore, we have to consider that many of the configurations
taken into account are particularly difficult to solve since the
sources are located along the x axis boundary or at the right-hand
side of the domain (see Fig. 4), so that few sensors are affected by
the sources. This means that only a small subset of the 25 receptor
concentration values can be used in the computational procedure.
In Fig. 9 we report the correlations of the model input values
against the obtained output variables. Looking at panels b) and
c) of that figure, one can see that GAIM is able to locate efficiently
the sources in the x-y plane. On the other hand, the
correlation along the z axis (see panel d) of Fig. 9) is marked by a
greater spread than that for the x-y plane. This is due to
our choice to place sensors only in the x-y plane at a height of 2 m
in order to simulate a realistic experimental setup; with this
configuration we have a rather rough sampling of concentrations
along the z axis. This issue in turn affects the errors on the emission
rates for the three-source case, as one can see in panel a) of
Fig. 9.
Nevertheless, in spite of these particular critical situations the
model performance in these synthetic runs is overall quite good.
Fig. 9. Correlations between input values (unknown sources) and output results. In panel a) the emission rates Qin vs Qout for all the synthetic calculations for 1, 2 and 3 sources are
reported. Panels b), c) and d) show the correlations between the coordinates Xin (synthetic unknown source coordinates) and Xout (obtained with our GAIM code). In all the panels
blue crosses, red stars and yellow dots indicate one-, two- and three-source runs, respectively. Emission rates are in g s⁻¹ and coordinates in m. (For interpretation of the references
to colour in this figure legend, the reader is referred to the web version of this article.)
3.2. Application of GAIM to single-source and three-source experimental data
In order to verify the model capability to retrieve the unknown
sources in a real environment, the Prairie Grass field experiments
data (Barad, 1958) have been employed with our GAIM code for the
case with a single-point source, and a trial from the Fusion Field Trials
2007, FFT07, (Storwald, 2007) for the case with three different
sources. In the Prairie Grass campaign approximately 600 receptors
were located along five concentric semicircular arcs at
radial distances of 50, 100, 200, 400 and 800 m downwind of the
release. Sixty-eight runs were performed in summer conditions, both during the day and at night. For each run, a sulfur
dioxide (SO2) tracer was released from a source placed at 0.46 m agl.
The sampler height was 1.5 m agl. In Fig. 10 the
sampler positions are depicted along with the concentration
contour map for one of the 68 experiments.
Fig. 10. Prairie Grass No. 4 field experiment: contour map indicating the interpolated
SO2 concentration emitted by a surface source (red star) and measured at the sensors
(red crosses). Concentrations are in mg/m³. (For interpretation of the references to
colour in this figure legend, the reader is referred to the web version of this article.)
In order to use the experimental data, a three-dimensional
model domain of 4000 m × 4000 m × 50 m has been used. All the
68 runs have been examined and employed for the calculations
with our GAIM code. The source has been placed in the center of the
computational domain to avoid restricting the model solutions to a
limited portion of the area. For each run, the source position error (SPE)
and the source strength difference (SSD) have been calculated. The
SPE indicates the distance between the position of the source
retrieved by the model and the real source position, while the SSD,
defined as (Qobs − Qmod)/Qobs, gives the relative error between
the modeled source strength and the observed one. Both of them
are represented in the first panel of Fig. 11, in which the SPE is less
than 100 m for the majority of the runs (55 out of 68). The SSD
exceeds the value |SSD| = 1 in 18 runs, which were mainly
carried out during slightly unstable, neutral and stable atmospheric
conditions. In panel b) of Fig. 11 SPEs and SSDs are reported as a
function of the atmospheric Pasquill stability classes (Pasquill,
1961), where A, B, C, D, E, F indicate very unstable, unstable,
slightly unstable, neutral, slightly stable and stable atmosphere,
Fig. 11. Panel a): source position error (SPE) (blue circles) and source strength difference (SSD) (red triangles) calculated for each Prairie Grass field experiment. Panel b): SPE (blue
circles) and SSD (red triangles) as a function of atmospheric stability classes. SPEs are in m. See details in the main text. (For interpretation of the references to colour in this figure
legend, the reader is referred to the web version of this article.)
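The two error measures used for the Prairie Grass runs can be sketched in Python as follows. The SPE is taken as the Euclidean distance between the retrieved and the true source position, and the SSD as (Qobs − Qmod)/Qobs, as defined in the main text. The function names and the numerical values in the example are hypothetical and do not correspond to any specific run.

```python
import math


def source_position_error(retrieved, true):
    """SPE: Euclidean distance (in m) between retrieved and true position."""
    return math.dist(retrieved, true)


def source_strength_difference(q_obs, q_mod):
    """SSD: relative difference between observed and modeled strength."""
    return (q_obs - q_mod) / q_obs


# Illustrative values only: a true source at the center of a
# 4000 m x 4000 m domain, 0.46 m above ground level.
true_position = (2000.0, 2000.0, 0.46)
retrieved_position = (2030.0, 2040.0, 0.46)

spe = source_position_error(retrieved_position, true_position)
ssd = source_strength_difference(q_obs=100.0, q_mod=80.0)
```

With this sign convention a positive SSD means that the model underestimates the source strength, and |SSD| = 1 corresponds to an error as large as the observed strength itself.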
Fig. 12. Correlation between the SPE and the 10 min averaged wind direction standard
deviation (WDSD10). Orange stars, blue diamonds, red up-triangles, black squares,
purple circles and green down-triangles represent runs conducted in very unstable (A),
unstable (B), slightly unstable (C), neutral (D), slightly stable (E) and stable atmosphere
(F), respectively. The exponential regression fit of the neutral runs is shown by the
black line. SPEs are in m and WDSD10 in degrees. (For interpretation of the references to
colour in this figure legend, the reader is referred to the web version of this article.)
In spite of this, even with real data, GAIM is able to retrieve, for
a single-point release, the source strength and position with very
good accuracy, and our results compare well with those of Cervone
and Franzese (2011) when applied to the same Prairie Grass
experimental data. In the FFT07 series of experiments, a grid of 100
digital photoionization detectors (at 2 m above the ground) was
located in an area of 475 m × 450 m. Propylene gas, the tracer, was
released from multiple locations at 2 m above ground and at constant flow rates for approximately 10 min per trial. In order to test
our code we have chosen FFT07 trial 28, in which a continuous
release from 3 sources is carried out for 10 min. The experimental
layout, along with the obtained results, is shown in Fig. 14. The
same trial has already been used by Wade and Senocak (2013) to
test their multi-source event reconstruction tool (MERT). The
meteorological data employed as our input are taken from a
32 m ultrasonic tower with 5 different vertical levels. We time-averaged the concentration data from the sensors for the continuous
release of trial 28. The computational domain is 900 m × 900 m
with the same step sizes used for the single-point release calculations. The distances between the true sources and those retrieved
with our code are 0.99 m, 11.45 m and 23.05 m, as one can see in Fig.
14. Thus, we are able to successfully locate the position of the three
sources with a maximum error lower than that obtained by Wade
and Senocak (2013), who report a maximum error of 48.3 m for
the same trial 28. The intensities we have found for the three
sources are retrieved with factors of 0.8, 2.9 and 4.5, respectively. It
has to be considered that the inverse model is strongly affected by
Fig. 13. Correlation between the SPE and the SSD. Symbols follow the notation used in
Fig. 12. SPEs are in m.
Fig. 14. Layout of FFT07 trial 28. The sensor positions are represented by red crosses, the true source locations by blue stars and the estimated source locations retrieved by
our GAIM code by green circles. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Cantelli, A., Leuzzi, G., Monti, P., Viotti, P., 2012. An inverse modelling approach for estimating vehicular emissions in urban coastal areas of the Messina Strait. Int. J. Environ. Pollut. 50 (1), 274–282.
Cervone, G., Franzese, P., 2011. Non-Darwinian evolution for the source detection of atmospheric releases. Atmos. Environ. 45 (26), 4497–4506.
Cervone, G., Franzese, P., Grajdeanu, A., 2010. Characterization of atmospheric contaminant sources using adaptive evolutionary algorithms. Atmos. Environ. 44 (31), 3787–3796.
Correia, L., 2010. Computational evolution: taking liberties. Theory Biosci. 129 (2–3), 183–191.
Fang, Y., Li, J., 2010. A review of tournament selection in genetic programming. In: Cai, Z., Hu, C., Kang, Z., Liu, Y. (Eds.), Advances in Computation and Intelligence, Volume 6382 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, pp. 181–192.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Longman Publishing Co., Inc, Boston.
Goldberg, D.E., 2002. The Design of Innovation: Lessons from and for Competent Genetic Algorithms. Kluwer Academic, Norwell, MA, USA.
Hamblin, S., 2013. On the practical usage of genetic algorithms in ecology and evolution. Methods Ecol. Evol. 4 (2), 184–194.
Hart, J.K., Martinez, K., 2006. Environmental sensor networks: a revolution in the earth system science? Earth Sci. Rev. 78 (3–4), 177–191.
Haupt, S.E., 2005. A demonstration of coupled receptor/dispersion modeling with a genetic algorithm. Atmos. Environ. 39 (37), 7181–7189.
Haupt, S.E., Young, G.S., Allen, C.T., 2006. Validation of a receptor-dispersion model coupled with a genetic algorithm using synthetic data. J. Appl. Meteorol. Climatol. 45, 476–490.
Holland, J.H., 1992. Adaptation in Natural and Artificial Systems. MIT Press.
Issartel, J.-P., Sharan, M., Modani, M., 2007. An inversion technique to retrieve the source of a tracer with an application to synthetic satellite measurements. Proc. R. Soc. A Math. Phys. Eng. Sci. 463 (2087), 2863–2886.
Keats, A., Yee, E., Lien, F.-S., 2007. Bayesian inference for source determination with applications to a complex urban environment. Atmos. Environ. 41 (3), 465–479.
Krysta, M., Bocquet, M., Sportisse, B., Isnard, O., 2006. Data assimilation for short-range dispersion of radionuclides: an application to wind tunnel data. Atmos. Environ. 40 (38), 7267–7279.
Marchuk, G., Shutyaev, V., Bocharov, G., 2005. Adjoint equations and analysis of complex systems: application to virus infection modelling. J. Comput. Appl. Math. 184 (1), 177–204.
Mitchell, M., 1998. An Introduction to Genetic Algorithms. The MIT Press.
Pandey, H.M., Chaudhary, A., Mehrotra, D., 2014. A comparative review of approaches to prevent premature convergence in GA. Appl. Soft Comput. 24, 1047–1077.
Pasquill, F., 1961. The estimation of the dispersion of windborne material. Meteorol.