
Atmospheric Environment 115 (2015) 36-46

Contents lists available at ScienceDirect

Atmospheric Environment
journal homepage: www.elsevier.com/locate/atmosenv

Application of genetic algorithm for the simultaneous identification of atmospheric pollution sources
A. Cantelli, F. D'Orta, A. Cattini, F. Sebastianelli, L. Cedola*
Department of Mechanical and Aerospace Engineering, University of Rome Sapienza, Via Eudossiana, 18, 00184 Rome, Italy

Highlights
• A new computational code has been implemented for retrieving pollutant sources.
• The code (GAIM) employs a genetic algorithm scheme and has been parallelized.
• Effective identification of the emission rates and positions of up to three sources.

Article info

Article history:
Received 24 September 2014
Received in revised form 12 May 2015
Accepted 15 May 2015
Available online 16 May 2015

Abstract

A computational model is developed for retrieving the positions and the emission rates of unknown pollution sources, under steady-state conditions, starting from measurements of the pollutant concentrations. The approach is based on the minimization of a fitness function employing a genetic algorithm paradigm. The model is tested considering both pollutant concentrations generated through a Gaussian model at 25 points in a 3-D test case domain (1000 m × 1000 m × 50 m) and experimental data, namely the Prairie Grass field experiments, in which about 600 receptors were located along five concentric semicircular arcs, and the Fusion Field Trials 2007. The results show that the computational model is capable of efficiently retrieving up to three different unknown sources.
© 2015 Elsevier Ltd. All rights reserved.

Keywords:
Atmospheric pollution
Inverse model
Multi-source
Genetic algorithm

1. Introduction
Air pollution is still one of the most significant problems in modern society. Urban and industrial development strongly affects pollutant emissions into the atmosphere, with severe implications for the environment and for population health (Tiwary and Colls, 2010). As a consequence, many efforts (Hart and Martinez, 2006) have focused on the development of methodologies that take advantage of the measurements carried out by sensor networks for monitoring and identifying pollutant sources.
The recognition of the number, the location and the emission rate of unknown sources, starting from measurements of the pollutant concentrations, is referred to as inverse modelling. This kind of model can be used to recognize the most relevant pollution sources or to identify illegal releases into the atmosphere.

* Corresponding author.
E-mail address: luca.cedola@uniroma1.it (L. Cedola).
http://dx.doi.org/10.1016/j.atmosenv.2015.05.030
1352-2310/© 2015 Elsevier Ltd. All rights reserved.

In recent years, several researchers have focused on the problem of identifying unknown sources starting from concentration measurements. Significant efforts in this direction have been made by Pudykiewicz (1998), Penenko et al. (2002), Issartel et al. (2007), Allen et al. (2007b), Senocak et al. (2008), Sharan et al. (2009), Cervone et al. (2010), Cervone and Franzese (2011), Singh and Rani (2014) and Cantelli et al. (2012). All the above studies consider the identification of just one single-point source. The simultaneous identification of several sources is a more challenging task, as the measured concentrations are no longer due to a single source of pollutant. As reported by Singh et al. (2013), the identification of multiple-point releases has been addressed with techniques based on deterministic (Penenko et al., 2002; Krysta et al., 2006; Sharan et al., 2009, 2012) and probabilistic (Keats et al., 2007; Yee, 2008) methods.
In particular, Yee (2008) used a probabilistic approach for identifying multiple-point emissions, employing a reversible jump Markov process technique that efficiently samples the number of point sources, their origin and the corresponding source intensities. The drawbacks of these probabilistic methods are the requirement of a priori information and the expensive computational effort. Moreover, Allen et al. (2007a) developed a hybrid optimization technique that overcomes the limitations deriving from the initialization process, which requires several simulations. Particularly in the case of ill-posed inverse problems, where a multi-modal solution exists, all the above-mentioned techniques may not always be able to provide the exact solution. With the introduction of the adjoint functions, as defined by Marchuk et al. (2005) and re-elaborated by Issartel et al. (2007) and Sharan et al. (2009), ill-posed problems do not need initialization or a priori information regarding the release. In a very recent paper, Sharan et al. (2012) used adjoint functions for identifying multiple-point emissions with a two-step inversion algorithm. However, as pointed out by the authors, the application of this algorithm to large domains with several sources becomes highly demanding in terms of computational effort. Even more recently, a least-squares inversion algorithm has been implemented by Singh et al. (2014) in order to reconstruct the unknown height of the source together with its projected horizontal location and intensity. The proposed algorithm has been successfully applied to identify an elevated point source from both synthetic and real data.
In the present study, an approach employing the adjoint functions and a genetic algorithm (GA) has been implemented in order to solve the problem of retrieving simultaneous multiple pollutant releases. GAs (Holland, 1992; Goldberg, 1989; Mitchell, 1998; Whitley, 2001; Hamblin, 2013) are a family of computational models inspired by evolution. Since their first formulation, one of their most important applications has been in global search problems, owing to their robustness, fast convergence and intrinsic parallelism. For these reasons the GA approach is well suited to function minimization. Since the detection of pollutant source positions can be formulated in general terms as the minimization of an appropriate cost function, GAs have recently been used for the characterization of atmospheric contaminant sources, as in the series of articles by Haupt (2005), Haupt et al. (2006) and Allen et al. (2007a,b). These works carry out a thorough investigation aimed at methodologies that couple a dispersion and transport model with a pollution receptor model for identifying emission sources, and that can also take meteorological data into account to improve the characterization of the pollutant source. It is in this direction that we have implemented a GA procedure in a new computational code for the identification of multiple pollutant sources, both their three-dimensional spatial positions and their release intensities, i.e. for the solution of the inverse model.
2. Computational method
In this work a least-squares formulation, as described by Sharan et al. (2012), has been used for defining the fitness (cost) function J to be minimized in our GA Inverse Model (GAIM) code. The identification of several simultaneous sources has been carried out starting from a finite set of concentration measurements $m_1, m_2, \ldots, m_n$. The number of simultaneous releases is assumed to be known a priori. The formulation first requires a source-sensor relationship describing the mapping between the sources and the measurements obtained by the sensors. The relationship between sources and measurements $m_i$ is described by introducing the adjoint functions (Marchuk et al., 2005) as:

$$
m_i = q\,a_i, \qquad i = 1, 2, \ldots, n \tag{1}
$$

where n is the number of sensors, $a_i$ is the adjoint function corresponding to the i-th sensor and q is the unknown source intensity. The adjoint function describes the backward transport of the pollutant concentration from the sensors. These adjoint functions can be generated from a dispersion model; in our case a Gaussian dispersion model has been used (see Section 3 for details).
The least-squares method we have employed for estimating the source parameters (locations and intensities) from n observed concentration measurements $m_i$ and m unknown simultaneous point sources is based on the minimization of the sum of squared residuals, represented by the fitness function J (Sharan et al., 2012):

$$
J(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, q_1, \ldots, q_m) = \frac{1}{2}\left[\mathbf{m} - \sum_{i=1}^{m} q_i\,\mathbf{a}(\mathbf{x}_i)\right]^{T}\left[\mathbf{m} - \sum_{i=1}^{m} q_i\,\mathbf{a}(\mathbf{x}_i)\right] \tag{2}
$$

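where $q_i$ is the intensity of the i-th source, $\mathbf{x}_i$ is the position vector of the i-th source, $\mathbf{m} = (m_1, \ldots, m_n)^T$ is the vector of measurements and $\mathbf{a}(\mathbf{x}_i)$ is the vector of the n adjoint functions evaluated at $\mathbf{x}_i$. For each fixed set of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$, the conditions $\partial J/\partial q_i = 0$ ($i = 1, 2, \ldots, m$) identifying the critical points lead to a system of linear equations that can be solved with the Gauss elimination method in order to compute $\hat{\mathbf{q}}$. The computed $\hat{\mathbf{q}}$ minimizes J for that fixed set of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ (Sharan et al., 2012).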
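For fixed positions, setting $\partial J/\partial q_i = 0$ in Eq. (2) yields the normal equations $\sum_{j=1}^{m} [\mathbf{a}(\mathbf{x}_i)^T \mathbf{a}(\mathbf{x}_j)]\, q_j = \mathbf{a}(\mathbf{x}_i)^T \mathbf{m}$, $i = 1, \ldots, m$. The pure-Python sketch below is an illustration, not the GAIM implementation: it assumes a matrix `A` whose column i holds the adjoint values of all sensors for a source at candidate position $\mathbf{x}_i$, builds the normal equations, solves them by Gauss elimination with partial pivoting and returns $\hat{\mathbf{q}}$ together with J. The function names and the matrix layout are our assumptions.

```python
def solve_gauss(M, b):
    """Solve the square system M q = b by Gauss elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular system: rates not identifiable for these positions")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    q = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * q[c] for c in range(r + 1, n))
        q[r] = (aug[r][n] - acc) / aug[r][r]
    return q


def optimal_rates(A, meas):
    """Return (q_hat, J) for one fixed set of candidate positions.

    A[k][i]: adjoint value of sensor k for a source at candidate position x_i.
    meas[k]: concentration measured by sensor k.
    """
    n_sensors, m = len(A), len(A[0])
    # Normal equations from dJ/dq_i = 0:  (A^T A) q = A^T meas.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(n_sensors)) for j in range(m)]
           for i in range(m)]
    Atm = [sum(A[k][i] * meas[k] for k in range(n_sensors)) for i in range(m)]
    q_hat = solve_gauss(AtA, Atm)
    # Fitness of Eq. (2): half the sum of squared residuals.
    resid = [meas[k] - sum(A[k][i] * q_hat[i] for i in range(m)) for k in range(n_sensors)]
    return q_hat, 0.5 * sum(r * r for r in resid)


# Three sensors, two candidate positions, noise-free data generated with q = (3, 5).
q_hat, J = optimal_rates([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]], [3.0, 2.5, 5.0])
```

With noise-free synthetic data the rates used to generate them are recovered and J is essentially zero; in the exhaustive search this solve is repeated for every candidate combination of positions.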
This process is repeated for all the possible combinations of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ in the domain. Finally, the values $\hat{\mathbf{x}}_i$ and $\hat{\mathbf{q}}$ for which J takes the lowest value give the positions and the strengths of the unknown sources. Since this method requires calculating J for all the possible combinations of the m sources over the chosen grid points $\mathbf{x}_i$ of the domain, it quickly becomes time consuming as the number of grid points and of sources increases. In fact, the total number of combinations is $C_{N,m} = N!/[m!\,(N - m)!]$, where N is the total number of grid points and m is the number of sources. Even with a 3-D mesh of $N = 1000 \times 1000 \times 50$ points and only two sources ($m = 2$), the total number of combinations is $1.25 \times 10^{15}$, as already pointed out by Singh et al. (2013). In that paper the authors propose an inverse modelling methodology for the identification of multiple-point sources based on the geometry of the monitoring network, expressed in terms of weight functions, and employ a bracketing strategy to reduce the computational time due to the very large number of grid points to be visited (Singh et al., 2013). In the present work, in order to limit the computational effort, we have developed an inverse model based on GAs that employs the J function of Eq. (2) as the cost function and avoids its evaluation for all the possible combinations $C_{N,m}$.
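The count quoted above can be checked with Python's standard-library `math.comb` (available since Python 3.8); `n_combinations` is a hypothetical helper name used only for this illustration.

```python
from math import comb

def n_combinations(nx, ny, nz, m):
    """Number of unordered placements of m sources on an nx*ny*nz grid, C(N, m)."""
    return comb(nx * ny * nz, m)

print(f"{n_combinations(1000, 1000, 50, 2):.3e}")  # 1.250e+15
print(f"{n_combinations(1000, 1000, 50, 3):.3e}")  # three sources: several orders larger
```

Each additional source multiplies the count by roughly N/m, which is why an exhaustive evaluation of J becomes impractical beyond a single source on realistic grids.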
The GA scheme implemented in our model can be summarized as follows: (i) generate a population of random individuals of defined size; (ii) assign each individual a score (fitness function) and select the best individuals (elite); (iii) generate a new generation of individuals from the best individuals by crossover and mutation (mating procedure); (iv) repeat steps (ii) and (iii) until the chosen number of iterations is reached. In our computational framework the domain is discretized through a grid of $N = n_x \times n_y \times n_z$ points. The individuals are the positions that the sources can take over the discretized grid points. An integer number (tuple) is uniquely associated with the set of coordinates of each grid point, as shown in panel a) of Fig. 1. For example, the 1001st tuple represents the grid point x = 2, y = 1, z = 1 in the defined domain.
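The text fixes only one example of the tuple numbering. The Python sketch below uses one ordering consistent with it, with y varying fastest, then x, then z, and with 1-based tuples and coordinates; this ordering and the function names are assumptions and may differ from the convention drawn in Fig. 1a.

```python
def coords_to_tuple(x, y, z, nx, ny):
    """Map 1-based grid coordinates to a 1-based tuple index (y varies fastest)."""
    return 1 + (y - 1) + ny * (x - 1) + nx * ny * (z - 1)

def tuple_to_coords(t, nx, ny):
    """Inverse mapping: 1-based tuple index -> (x, y, z)."""
    k = t - 1
    y = k % ny + 1
    x = (k // ny) % nx + 1
    z = k // (nx * ny) + 1
    return x, y, z

# With the 1000 x 1000 x 50 test grid, tuple 1001 is the point x = 2, y = 1, z = 1.
print(tuple_to_coords(1001, 1000, 1000))  # (2, 1, 1)
```

Encoding each position as a single integer keeps individuals as short lists of integers, which makes the crossover and mutation operators simple list manipulations.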
The first population of individuals is created randomly, as usual in a GA procedure. With one unknown source, individuals are picked from the tuple domain until the population size is reached. If the number m of unknown sources is greater than one, m tuples are picked for each individual.
Hence, depending on the number of sources, which is one of the input parameters of the model, individuals look like [*] for one source, [*,*] for two sources and [*,*,…,*] for m sources, where * represents a generic tuple of the domain.
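A random first generation of this kind can be generated as below. The sketch assumes that the m tuples of one individual are distinct (two sources do not share a grid point), which the text does not state explicitly; the function names are hypothetical.

```python
import random

def random_individual(m, n_points, rng):
    """One individual: m distinct tuple indices drawn uniformly from 1..n_points."""
    return rng.sample(range(1, n_points + 1), m)

def initial_population(size, m, n_points, seed=None):
    """First generation: `size` random individuals, e.g. [t] for m = 1, [t1, t2] for m = 2."""
    rng = random.Random(seed)
    return [random_individual(m, n_points, rng) for _ in range(size)]

# Five random three-source individuals on the 1000 x 1000 x 50 test grid.
pop = initial_population(size=5, m=3, n_points=1000 * 1000 * 50, seed=0)
```

`random.Random.sample` accepts a `range` without materializing it, so drawing from the 5 × 10^7 grid tuples is cheap.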


Each individual of the population is assigned a fitness value calculated from the definition of J in Eq. (2). The individuals are sorted according to their fitness value and a selection method is then applied to extract 10% of the population as the elite. The tournament method proposed by Goldberg (1989) has been implemented in our code for the elite selection. As stated by Correia (2010), tournament selection is favoured in genetic programming over proportional methods, because the latter have a strong tendency to converge prematurely (see also Goldberg (2002)). Moreover, as pointed out by Fang and Li (2010), tournament selection is simpler to code and computationally efficient, since it does not require sorting the whole population first, as in the roulette wheel selection method. New individuals are generated from the elite by crossover and mutation in order to create a new population of the same initial size. The crossover process takes two parents, i.e. two lists of tuples, and generates a new individual (a new list of tuples) derived from both parents according to the one-point crossover definition, as shown in panel b) of Fig. 1 for the case of four unknown sources. The mutation process, on the other hand, generates a new individual with the same set of tuples as the parent except for one randomly mutated tuple, as shown in panel c) of Fig. 1. The probability with which new individuals are generated by crossover is defined as $p_c$, while the mutation probability is defined as $p_m$; generally $p_c > p_m$. In GAIM we assume that $p_c = 1 - p_m$.
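The selection and mating operators described above can be sketched in Python as follows. This is an illustrative sketch, not the GAIM source: the tournament size k, the function names and the use of Python's `random` module are our assumptions. Mutation follows panel c) of Fig. 1 (one tuple replaced), and crossover is chosen with probability $p_c = 1 - p_m$; lower J is better.

```python
import random

def tournament_select(population, fitness, n_select, k=2, rng=random):
    """Return n_select individuals; each is the lowest-J winner among k random contestants."""
    chosen = []
    for _ in range(n_select):
        contestants = rng.sample(range(len(population)), k)
        winner = min(contestants, key=lambda idx: fitness[idx])
        chosen.append(population[winner])
    return chosen

def one_point_crossover(parent_a, parent_b, rng=random):
    """Child = tuples of parent_a before a random cut + tuples of parent_b after it.
    Requires individuals with at least two tuples (m >= 2)."""
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]

def mutate(parent, n_points, rng=random):
    """Copy of parent with one randomly chosen tuple replaced by a random grid point."""
    child = list(parent)
    child[rng.randrange(len(child))] = rng.randint(1, n_points)
    return child

def make_offspring(elite, n_points, p_m, rng=random):
    """One mating step with p_c = 1 - p_m; single-source individuals are only mutated."""
    parent = rng.choice(elite)
    if len(parent) == 1 or rng.random() < p_m:
        return mutate(parent, n_points, rng)
    return one_point_crossover(parent, rng.choice(elite), rng)
```

For example, crossing [1, 2, 3, 4] with [5, 6, 7, 8] at a cut of 2 gives [1, 2, 7, 8]. Tournament selection only compares k fitness values per pick, which is the efficiency argument recalled in the text.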
After the mating process has been carried out, a new population is generated and each individual is assigned a new fitness value. The best individual of the last generation, i.e. the one with the lowest J, gives the positions of the unknown sources, for which the corresponding emission rates are then calculated. The GAIM code is described below and its flow diagram is shown in Fig. 2. GAIM has been written mainly in Python, with some modules written in Fortran 90. It is parallelized through the Parallel Python (PP) module, which provides a mechanism for the parallel execution of Python code on systems with multiple processors or cores and on clusters.
The input variables are the pollutant concentrations, the number of sources, the computational domain size ($N = n_x \times n_y \times n_z$), the grid resolution, the meteorological information (wind speed and direction, stability class and roughness length), the number of CPUs and the GA parameters (population size, number of iterations and mutation probability $p_m$). According to the number of CPUs, a first generation of populations, one for each CPU, is randomly created: in our code each CPU processes a single population, so that CPUs and populations correspond one to one. If more than one CPU is available, the parallel mode is used. The fitness value of each individual in each population is calculated and the individuals are sorted according to their fitness. The tournament module is then called to select other individuals among all the populations to be added to the elite. Before the mating process starts, a check verifies whether the populations are formed by one-tuple individuals, i.e. whether a single-source problem is being solved. In this case, only the mutation operation is performed. If more sources are present, the choice between crossover and mutation is made according to the $p_m$ value: if $p_m$ is greater than a random number drawn in [0, 1), mutation occurs and a random individual of the elite is replaced by an individual randomly picked from the total population. If mutation is chosen, an additional check selects either the normal mutation or the mutation creep module. The fitness value tends to decrease with the iterations. However, in the majority of cases the best fitness value stops changing after a certain number of iterations and a plateau occurs. This effect is due to the fast convergence of the model and depends on the homogeneity of the population at that stage of the calculation; crossover and mutation then usually no longer improve the solution. For this reason the mutation creep (Rapallo et al., 2005) has been implemented in the code. The mutation creep mutates a randomly chosen tuple of the best individuals only within a neighbourhood, i.e. a small portion of the 3-D domain, which increases the chances of identifying the unknown sources.
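A minimal sketch of a creep mutation and of a plateau check follows. It assumes a tuple ordering in which y varies fastest, then x, then z (1-based), one ordering consistent with the 1001st-tuple example given earlier in this section. The neighbourhood radius, the plateau window and all function names are hypothetical, since the paper does not give their values.

```python
import random

# Hypothetical tuple ordering: y varies fastest, then x, then z (1-based).
def encode(x, y, z, nx, ny):
    return 1 + (y - 1) + ny * (x - 1) + nx * ny * (z - 1)

def decode(t, nx, ny):
    k = t - 1
    return (k // ny) % nx + 1, k % ny + 1, k // (nx * ny) + 1  # (x, y, z)

def creep_mutate(individual, nx, ny, nz, radius=5, rng=random):
    """Move one randomly chosen tuple to a grid point within +/- radius cells,
    clamped to the domain, leaving the other tuples unchanged."""
    child = list(individual)
    idx = rng.randrange(len(child))
    x, y, z = decode(child[idx], nx, ny)
    x = min(nx, max(1, x + rng.randint(-radius, radius)))
    y = min(ny, max(1, y + rng.randint(-radius, radius)))
    z = min(nz, max(1, z + rng.randint(-radius, radius)))
    child[idx] = encode(x, y, z, nx, ny)
    return child

def on_plateau(best_history, window=20, tol=0.0):
    """True when the best J has improved by no more than tol over the last `window` iterations."""
    if len(best_history) <= window:
        return False
    return best_history[-window - 1] - best_history[-1] <= tol
```

Unlike the ordinary mutation, which can jump anywhere in the domain, the creep step only explores the close neighbourhood of an already good tuple, so it refines a nearly correct solution instead of restarting the search.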
The mutation creep module is activated both when the plateau condition is observed and after 75% of the iterations have been performed. The issue of premature convergence, introduced above, is of particular importance in all GA methodologies (see the recent paper of Pandey et al. (2014), and references therein, for a comparative review of approaches). One of the methods devised to avoid premature convergence is known as the Island Model (Whitley, 1994; Whitley et al., 1999); we implement this method in our code because of its efficiency and its straightforward application in a parallel environment. In our parallel implementation each CPU performs the GA optimization with its own population, and all the CPUs involved in the calculation exchange a portion of their populations, giving rise to a process of migration (Whitley et al., 1999). This migration process is defined by the migration interval, i.e. the number of generations between two migrations, and by the migration size, i.e. the number of individuals of the population undergoing this procedure. In our scheme migration is active after each iteration, and the portion of individuals migrating from one CPU-population to another is chosen to be 10% of our elite size, as schematically shown in Fig. 3. Our tests showed that the migration module improves the performance of the calculations. Each new generation is passed back as input and the process is repeated until the maximum number of iterations is reached, finally giving the computed emission rates and the positions of the unknown sources.
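The island-model migration can be sketched as follows. We assume a ring topology (island i sends to island i + 1) and that immigrants replace the worst individuals of the receiving population; the paper specifies only the migration interval (every iteration) and the migration size (10% of the elite), so both choices and all names are illustrative. The helper `should_creep` encodes the activation rule for the mutation creep stated above.

```python
def migrate_ring(populations, fitness, n_migrants):
    """Island-model migration on an assumed ring of islands (one island per CPU).

    Island i sends copies of its n_migrants best individuals (lowest J) to island
    i + 1, where they replace that island's n_migrants worst individuals.
    Lists are modified in place and also returned.
    """
    n_islands = len(populations)
    # Choose all emigrants before any replacement, so the exchange is simultaneous.
    emigrants = []
    for pop, fit in zip(populations, fitness):
        best = sorted(range(len(pop)), key=lambda i: fit[i])[:n_migrants]
        emigrants.append([(list(pop[i]), fit[i]) for i in best])
    for src in range(n_islands):
        dst = (src + 1) % n_islands
        pop, fit = populations[dst], fitness[dst]
        worst = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)[:n_migrants]
        for slot, (individual, f) in zip(worst, emigrants[src]):
            pop[slot] = individual
            fit[slot] = f
    return populations, fitness

def should_creep(iteration, max_iter, plateau):
    """Mutation creep is active on a plateau or once 75% of the iterations are done."""
    return plateau or iteration >= 0.75 * max_iter
```

With the parameters of Table 3 (elite of 5000 individuals), the migration size would be 500 individuals per island and per iteration.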

Fig. 1. a) Sketch of the grid domain showing the correspondence between tuples and grid points; the chosen axes orientation is also shown. b) Example of crossover for a run with 4 unknown sources (i.e. sets of 4 tuples). c) Example of mutation for a run with 4 unknown sources.

Fig. 2. Flow diagram of our GA inverse model.

Fig. 3. Example of the migration process among 4 CPUs at the i-th iteration.

3. Results and discussion

The proposed GAIM model has been developed to efficiently identify multiple-point emissions. In order to verify the capability of the model to retrieve strength and position for one, two and three sources, GAIM has been employed over a 3-D domain discretized through a grid of $N = n_x \times n_y \times n_z$ points. The unknown sources are located at the grid points, whereas the receptors for the pollutant measurements can be located anywhere inside the domain. As already said, the concentration values at the receptor points are the input data of the code. If no concentration measurements are available, a synthetic concentration field can be generated, assuming known locations and emission rates of the sources, in order to test the proposed inverse model. A direct dispersion model (Stockie, 2011) is employed to generate synthetic concentration values at the points where the receptors are supposed to be, thus simulating the measurements. In this case, the Gaussian plume solution of the advection-diffusion equation is used to generate the pollutant concentration c(x, y, z):

$$
c(x, y, z) = \frac{Q}{2\pi u \sigma_y \sigma_z} \exp\left(-\frac{y^2}{2\sigma_y^2}\right) \left[\exp\left(-\frac{(z-h)^2}{2\sigma_z^2}\right) + \exp\left(-\frac{(z+h)^2}{2\sigma_z^2}\right)\right] \tag{3}
$$

where Q is the source emission rate, u is the wind speed at the release height h, x is the downwind, y the crosswind and z the vertical direction. $\sigma_y$ and $\sigma_z$ are the crosswind and vertical dispersion coefficients (i.e. the plume spreads), which depend on the atmospheric stability class. The dispersion coefficients were computed from the tabulated curves of Briggs (Arya, 1999). In this study, u is assumed to be known and a single reflection at the ground is used.
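As an illustration of Eq. (3), the Python sketch below evaluates the plume with a single ground reflection and superposes several sources into a synthetic field. It is not the GAIM code. The default $\sigma_y$, $\sigma_z$ are the Briggs open-country (rural) formulas for the very unstable class A, assumed here as one possible stand-in for the curves cited above, and receptor coordinates are assumed to be expressed in a frame where the wind blows towards +x (a westerly wind, as in Table 1). With Q = 1 the same function gives a source-receptor coefficient of the kind entering Eq. (1), under the assumption that the adjoint functions are derived from this forward model.

```python
import math

def briggs_rural_class_a(x):
    """Assumed sigma_y, sigma_z [m] at downwind distance x [m]: Briggs open country, class A."""
    sigma_y = 0.22 * x / math.sqrt(1.0 + 0.0001 * x)
    sigma_z = 0.20 * x
    return sigma_y, sigma_z

def plume_concentration(Q, u, x, y, z, h, sigmas=briggs_rural_class_a):
    """Eq. (3): Q [g/s], u [m/s]; x downwind, y crosswind, z height, h release height [m].
    Returns the concentration in g/m^3 (zero at or upwind of the source)."""
    if x <= 0.0:
        return 0.0
    sy, sz = sigmas(x)
    lateral = math.exp(-y ** 2 / (2.0 * sy ** 2))
    # Real source plus its image at -h (single reflection at the ground).
    vertical = (math.exp(-(z - h) ** 2 / (2.0 * sz ** 2))
                + math.exp(-(z + h) ** 2 / (2.0 * sz ** 2)))
    return Q / (2.0 * math.pi * u * sy * sz) * lateral * vertical

def synthetic_field(sources, receptors, u):
    """Superpose the plumes of several sources at each receptor.
    sources: [((xs, ys, h), Q)], receptors: [(xr, yr, zr)]; wind blowing towards +x."""
    return [sum(plume_concentration(Q, u, xr - xs, yr - ys, zr, h)
                for (xs, ys, h), Q in sources)
            for xr, yr, zr in receptors]

# Source No. 1 of Table 2 seen by a receptor 2 m above ground, 183 m downwind.
field = synthetic_field([((17, 773, 18), 100.0)], [(200, 773, 2)], u=2.0)
```

Because Eq. (3) is linear in Q, contributions from several sources simply add, which is what makes the least-squares formulation of Eq. (2) linear in the unknown rates.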
3.1. Application of GAIM with a synthetic concentration field
In order to validate GAIM we have carried out a series of calculations considering a domain of 1000 m × 1000 m × 50 m with a resolution (step size) of 1 m. A regular network of 25 receptors is supposed to be positioned inside the domain at 2 m above ground level (agl), as shown in Fig. 4. The synthetic concentration field has been generated by positioning the unknown sources at different grid points and using a Gaussian dispersion model with ground reflection to create the concentration data. The standard deviations of the Gaussian distribution, which indicate the spread of the plume in the y and z directions, have been calculated following the Bailey and Schwede (1995) equations, which approximately fit the Pasquill-Gifford curves (Pasquill, 1961; Turner, 1973) for the rural mode. The concentration field is the input of the model, whereas the position and the emission rate of the sources are the unknown variables to be found by our GAIM code. In this work the inverse problem for one, two and three sources has been considered. For each of these problems, 100 different test cases have been taken into account. In each test case the source positions and emission rates are randomly chosen, but with two constraints that avoid too many sources being very close to each other and unlikely emission rates: the distance between any two sources has to be greater than 100 m, and the emission rates can vary from 1 to 100 g/s. A particularly relevant test case, chosen among the 100 runs for the identification of three sources, is presented here in detail to describe the GAIM procedure and the results obtained. Fig. 4 shows the planar view of the 3D test case domain, in which the receptors are represented by red crosses and the unknown sources (to be identified), Q_in:i, by blue stars. The values of the meteorological variables employed are reported in Table 1, while the positions and emission rates of the sources for the chosen test case are reported in Table 2. Typical summer conditions of a Mediterranean climate have been taken into account, with low wind speed, westerly wind direction and a very unstable stability class. The test case shown in Fig. 4 has been chosen to highlight the performance of our inverse model when a difficult configuration has to be solved. In this particular case, two of the sources (Q_in:1 and Q_in:2) are located close to the domain boundary along the y axis, and two sources (Q_in:2 and Q_in:3) lie on the same line along the wind direction, so that Q_in:3 turns out to be hidden by Q_in:2. Furthermore, the emission rate of one of the three sources is much lower than those of the other two (source No. 3 in Table 2).
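The constrained random choice of test cases described above can be sketched as rejection sampling. In the sketch below the uniform distributions, the restriction to integer grid points (1 m resolution) and the function name `random_sources` are our assumptions; the text only states the two constraints (pairwise distance above 100 m, rates between 1 and 100 g/s).

```python
import math
import random

def random_sources(m, domain=(1000, 1000, 50), min_dist=100.0, q_range=(1.0, 100.0),
                   x_max=None, rng=random, max_tries=10_000):
    """Draw m sources as ((x, y, z), Q) pairs on integer grid points.

    Positions are uniform over the grid (x optionally limited to 1..x_max) and are
    re-drawn until every pair of sources is more than min_dist apart; rates are
    uniform in q_range. Uniform sampling is an assumption of this sketch.
    """
    nx, ny, nz = domain
    x_hi = nx if x_max is None else x_max
    for _ in range(max_tries):
        points = [(rng.randint(1, x_hi), rng.randint(1, ny), rng.randint(1, nz))
                  for _ in range(m)]
        # Rejection step: keep the draw only if all pairwise distances exceed min_dist.
        if all(math.dist(p, q) > min_dist
               for i, p in enumerate(points) for q in points[i + 1:]):
            rates = [rng.uniform(*q_range) for _ in range(m)]
            return list(zip(points, rates))
    raise RuntimeError("no valid configuration found; relax the constraints")

three = random_sources(3, rng=random.Random(0))
```

The restriction of source positions to 1-700 m along the wind direction, used later for the statistical runs, can be reproduced with `x_max=700`.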
The input parameters needed by our GAIM code are reported in Table 3. Eight CPUs have been used and a population of 50,000 individuals has been assigned to each CPU. The maximum number of iterations has been set to 500. The mutation probability $p_m$ has been set to 0.5, the individuals entering the elite are 10% of the population, and the migration module has been activated. The computational time was 74 min on a four-core Intel(R) Xeon(R) dual CPU E3-1245 V2 @ 3.40 GHz (32 GiB of memory), and the results obtained are presented in Figs. 5-7.
Fig. 5 shows the results obtained by our code for the described test case. Panel a) shows the planar view of the domain. The positions of the unknown sources given as input (Q_in:i) are represented as blue stars and the source positions obtained as output (Q_out:i) as green full circles. The meteorological parameters employed are indicated in the figure. Both the emission rates (see the right side of panel a)) and the positions of the unknown sources are exactly identified by the model. Panel b) sketches the three-dimensional domain along with the identified sources. In both panels, the red crosses represent the receptors.
Fig. 6 reports the lowest fitness value, as defined in Eq. (2), for each generation, i.e. for each iteration. The fitness value decreases very quickly in a few iterations and then tends to stabilize for the remaining ones, a common feature of GA procedures. At the beginning of the run the population is more heterogeneous, being randomly generated, and it is the crossover operator that creates offspring better (i.e. with lower fitness values) than their parents. After several iterations the population tends to be more homogeneous, and only the mutation creep operator can produce significant effects on the fitness value. As shown in Fig. 6, starting from the 62nd iteration the fitness value drops by several orders of magnitude in a few generations because of the mutation creep module. This is a paradigmatic case in which the mutation creep step is essential for finding the exact solution. For each generation a new elite is defined; at the end of the process, the individual with the lowest fitness value in the elite represents the solution. Fig. 7 shows, in a two-dimensional cut (z = 2 m), the evolution of the individuals in the elite from the 1st generation up to the 299th generation. In all the graphs, each point represents the spatial position of a tuple (i.e. the set of the three coordinates). Hence, in the test case considered here, each individual of the population is represented by a set of three tuples. In the first panel, when the generation number is equal to one, the elite individuals are almost uniformly distributed in the domain. After about the 180th generation the individuals begin to localize in the neighbourhood of the unknown sources (the three stars shown in each panel), and at the 185th generation almost all the individuals identify the solution. In the last panel, when the 299th iteration is reached, all the individuals have already concentrated on the exact positions of the sources.

Fig. 4. Planar view of the 3D test case domain (1000 m × 1000 m × 50 m) with three randomly chosen sources (Q_in:i, i = 1, 2, 3) represented by blue stars. The crosses represent the positions of the receptors. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
Meteorological parameters employed in the synthetic (test case) calculations.
Variable | Value
Wind speed | 2.0 m s⁻¹
Wind direction | 270 [°N]
Pasquill stability class | Very unstable

Table 2
Data of the three sources for the selected test case.
Source no. | Q [g s⁻¹] | x [m] | y [m] | z [m]
1 | 100.0 | 17 | 773 | 18
2 | 47.0 | 97 | 374 | 43
3 | 1.0 | 430 | 379 | 11

Table 3
GAIM parameters used for all the synthetic calculations.
Parameter | Value
No. of CPUs | 8
Population size | 50,000
No. of iterations | 500
Mutation probability | 0.5
Elite size | 5000
Migration flag | True

Fig. 5. Results obtained with our GAIM code for the selected test case. a) Planar view. The blue stars represent the unknown emission sources given as input (Q_in:i), the green full circles represent the source positions obtained as output (Q_out:i) and the crosses represent the receptors. The numerical values of the input and output emission rates are displayed on the right of this panel. b) Three-dimensional view of the same results. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Evolution of the lowest fitness value as a function of the number of iterations (i.e. generations). At the 62nd iteration the mutation creep module starts. See main text for details.

Fig. 7. Evolution of the elite individuals from the 1st generation up to the 299th. Red crosses indicate the receptors, blue stars represent the three unknown sources and the black dots are tuples representing the elite individuals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

As said, in order to evaluate the overall performance of our GAIM code, 100 test cases each for one, two and three sources have been taken into account. We employed the GAIM code for each configuration (i.e. each of the 300 runs) setting only 8 different random populations (1 for each CPU). The coordinates of the sources have been randomly generated, the emission rates chosen within 1-100 g s⁻¹, and the source positions limited to the interval 1-700 m along the wind direction (x axis) in order to ensure that at least one column of sensors is influenced by the sources. To evaluate the results with respect to their efficiency in finding the exact solutions, a rather standard estimator has been devised. It has been defined as:

$$
\varepsilon = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{Q_{in,i} - Q_{out,i}}{Q_{in,i}} \right| \sqrt{(x_{in,i} - x_{out,i})^2 + (y_{in,i} - y_{out,i})^2 + (z_{in,i} - z_{out,i})^2} \tag{4}
$$

where m is the number of unknown sources, $Q_{in,i}$ and $Q_{out,i}$ are the input and retrieved emission rates of the i-th unknown source, $x_{in,i}$, $y_{in,i}$ and $z_{in,i}$ are the coordinates of the i-th unknown source and $x_{out,i}$, $y_{out,i}$ and $z_{out,i}$ those retrieved by the model. This estimator, ε [m], takes into account the distance error produced by the model, weighted by the absolute value of the relative error of the emission rate and by the number of sources, in order to have a consistent comparison among problems dealing with different numbers of sources to be identified.
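Eq. (4) can be evaluated directly in code. The sketch below assumes that each retrieved source has already been paired with the corresponding input source (the pairing rule is not stated in the text) and represents a source as a ((x, y, z), Q) pair; the function name is hypothetical.

```python
import math

def estimator(sources_in, sources_out):
    """Eq. (4): mean over the m sources of |relative rate error| times position error [m].

    Each source is ((x, y, z), Q); sources_out[i] is assumed to be already
    paired with sources_in[i].
    """
    m = len(sources_in)
    total = 0.0
    for (p_in, q_in), (p_out, q_out) in zip(sources_in, sources_out):
        rel_rate_error = abs((q_in - q_out) / q_in)
        total += rel_rate_error * math.dist(p_in, p_out)
    return total / m

# A retrieval that is 5 m off with half the true rate contributes 0.5 * 5 m = 2.5 m.
eps = estimator([((0, 0, 0), 10.0)], [((3, 4, 0), 5.0)])
```

Dividing by m keeps the values comparable between the one-, two- and three-source runs summarized in Fig. 8.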
In Fig. 8 we show the statistics of the results obtained with our code as a function of seven estimator classes. The first class reports all the results (number of occurrences) for which the calculated estimator is less than 10 m. For the single source case the model is able to retrieve both position and emission rate in all runs, i.e. the estimator is equal to zero for all the 100 configurations (blue bar in Fig. 8). For the multiple source cases the estimator can be greater than 10 m. In particular, for two sources 95 configurations are in the first class (0-10 m), and 81 of these 95 test cases give the exact solution (estimator equal to zero). For the three sources case, 82 configurations are in the 0-10 m class, and for 47 of these 82 test cases we obtained the exact solution. We verified that the test cases with an estimator greater than 10 m occur when one or two sources are shadowed by another source along the wind direction. Furthermore, many of the configurations considered are particularly difficult to solve, since the sources are located along the x axis boundary or at the right-hand side of the domain (see Fig. 4), so that few sensors are affected by the sources. This means that only a small subset of the 25 receptor concentration values can be used in the computational procedure.

Fig. 8. Bar plot for all the synthetic runs, based on the calculated values of the estimator. The blue, red and yellow bars indicate one, two and three source runs, respectively. See main text for details. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
In Fig. 9 we report the correlations between the model input values and the obtained output variables. Panels b) and c) of that figure show that GAIM efficiently locates the sources in the x-y plane. On the other hand, the correlation along the z axis (panel d) of Fig. 9) shows a greater spread than that in the x-y plane. This is due to our choice to place the sensors only in the x-y plane, at a height of 2 m, in order to simulate a realistic experimental setup; with this configuration the concentrations are sampled only roughly along the z axis. This in turn affects the errors on the emission rates for the three-source case, as can be seen in panel a) of Fig. 9.

Fig. 9. Correlations between input values (unknown sources) and output results. Panel a) reports the emission rates Qin vs Qout for all the synthetic calculations with 1, 2 and 3 sources. Panels b), c) and d) show the correlations between the coordinates Xin (synthetic unknown source coordinates) and Xout (obtained with our GAIM code). In all the panels blue crosses, red stars and yellow dots indicate one-, two- and three-source runs, respectively. Emission rates are in g s⁻¹ and coordinates in m. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Despite these critical situations, the overall performance of the model in the synthetic runs is quite good.
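Why sensors at a single height constrain the source height only weakly can be illustrated with the vertical factor of the reflected Gaussian plume. The sketch below is a minimal illustration, not part of GAIM: it assumes a hypothetical linear growth σz = 0.06 x and compares, for a sensor at z = 2 m, the concentration produced by a ground-level source with that produced by a source at 10 m. A ratio close to 1 means that the sensor can hardly tell the two heights apart.

```python
import math

def vertical_factor(z, h, sigma_z):
    """Vertical term of the reflected Gaussian plume:
    exp(-(z-h)^2 / 2 sz^2) + exp(-(z+h)^2 / 2 sz^2)."""
    return (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
            + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))

def height_contrast(x, h1, h2, z=2.0, a=0.06):
    """Ratio of the concentrations seen by a sensor at height z for source
    heights h2 and h1, all else equal. Assumes sigma_z = a * x, a
    hypothetical linear growth (not the parameterization used in GAIM)."""
    sz = a * x
    return vertical_factor(z, h2, sz) / vertical_factor(z, h1, sz)

# Ground-level source vs. a 10 m source, seen at increasing downwind distance.
for x in (50.0, 200.0, 800.0):
    print(x, round(height_contrast(x, 0.0, 10.0), 3))
```

In this simplified setting the ratio is small close to the source and approaches 1 far downwind, where σz is much larger than the source height; with all sensors at the same height, the vertical position is then informed mainly by the receptors nearest to the sources.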
3.2. Application of GAIM to single-source and three-source experimental data
In order to verify the capability of the model to retrieve unknown sources in a real environment, we applied our GAIM code to the Prairie Grass field experiment data (Barad, 1958) for the single-point-source case, and to a trial from the Fusion Field Trials 2007 (FFT07; Storwald, 2007) for the case with three different sources. In the Prairie Grass campaign approximately 600 receptors were placed along five concentric semicircular arcs at radial distances of 50, 100, 200, 400 and 800 m downwind of the release. Sixty-eight runs were performed in summer conditions, both during the day and at night. In each run, sulfur dioxide (SO2) tracer was released from a source placed at 0.46 m above ground level (agl). The samplers were at 1.5 m agl. Fig. 10 shows the sampler positions along with the concentration contour map for one of the 68 experiments.

Fig. 10. Prairie Grass No. 4 field experiment: contour map of the interpolated SO2 concentration emitted by a surface source (red star) and measured at the sensors (red crosses). Concentrations are in mg/m³. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

In order to use the experimental data, a three-dimensional model domain of 4000 m × 4000 m × 50 m has been used. All 68 runs have been examined and used for the calculations with our GAIM code. The source was placed at the center of the computational domain so as not to restrict the model solutions to a limited portion of the area. For each run, the source position error (SPE) and the source strength difference (SSD) have been calculated. The SPE is the distance between the position of the source retrieved by the model and the real source position, while the SSD, defined as (Qobs − Qmod)/Qobs, gives the relative error between the modeled source strength and the observed one. Both are shown in panel a) of Fig. 11, where the SPE is less than 100 m for the majority of the runs (55 out of 68). In 18 runs |SSD| exceeds 1; these runs were mainly carried out under slightly unstable, neutral and stable atmospheric conditions. Panel b) of Fig. 11 reports SPEs and SSDs as a function of the Pasquill atmospheric stability classes (Pasquill, 1961), where A, B, C, D, E and F indicate very unstable, unstable, slightly unstable, neutral, slightly stable and stable atmosphere, respectively. As shown in that panel, for the A, B, C and E classes the SPE and the SSD are mostly negligible, indicating a high accuracy of the GAIM solution. The worst runs (see Fig. 11) are associated with high values of the measured 10 min averaged wind direction standard deviation (WDSD10). This is clearly visible in Fig. 12, which shows the relationship between WDSD10 and the SPEs. Data from runs conducted under neutral conditions (black squares in that figure) are well fitted by an exponential regression line, which shows that the SPE increases with increasing WDSD10.

Fig. 11. Panel a): source position error (SPE) (blue circles) and source strength difference (SSD) (red triangles) calculated for each Prairie Grass field experiment. Panel b): SPE (blue circles) and SSD (red triangles) as a function of atmospheric stability classes. SPEs are in m. See details in the main text. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
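The two error measures used for the Prairie Grass runs, together with one common way of obtaining an exponential regression line like the one in Fig. 12, can be sketched as follows. This is a minimal sketch under stated assumptions: computing the SPE as a Euclidean distance over whatever coordinates are supplied, and fitting the exponential by least squares on ln(SPE), are our choices, since the text does not specify them. The numbers in the example are hypothetical.

```python
import math

def spe(retrieved, true):
    """Source position error (m): Euclidean distance between the retrieved
    and the true source position (2-D or 3-D coordinate tuples)."""
    return math.dist(retrieved, true)

def ssd(q_obs, q_mod):
    """Source strength difference, SSD = (Qobs - Qmod) / Qobs."""
    return (q_obs - q_mod) / q_obs

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).
    Assumes every y > 0 and at least two distinct x values."""
    n = len(xs)
    ly = [math.log(y) for y in ys]
    mx = sum(xs) / n
    my = sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (l - my) for x, l in zip(xs, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

# Hypothetical values, for illustration only.
print(spe((3.0, 4.0), (0.0, 0.0)))  # 5.0
print(ssd(90.0, 45.0))              # 0.5: emission rate underestimated by half
print(ssd(90.0, 225.0))             # -1.5: |SSD| > 1
```

Since Qmod cannot be negative, the SSD never exceeds 1, so |SSD| > 1 can only occur when the emission rate is overestimated by more than a factor of 2 (Qmod > 2 Qobs).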
Runs carried out in a stable atmosphere show a high SPE even with modest values of WDSD10 (green triangles in Fig. 12). These results are not surprising, since all these runs were conducted at night and in low-wind conditions. It should also be noted that under these atmospheric conditions the standard deviations σy and σz of the concentration distribution of the Gaussian plume are smaller than those calculated under unstable conditions. Because of the reduced σy and σz values, the Gaussian plume spreads over a smaller number of sensors, which makes it difficult to find the exact solution.
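The effect of stability on the number of sensors reached by the plume can be illustrated with a reflected Gaussian plume. The sketch below assumes the Briggs open-country fits for σy and σz, which are our choice for illustration (the text does not state which parameterization GAIM uses), and a hypothetical crosswind line of sensors spaced 5 m apart, 200 m downwind of a source at 0.46 m, sampled at 1.5 m. It counts the sensors whose concentration exceeds 1% of the maximum on that line.

```python
import math

# Briggs open-country dispersion coefficients (assumed for illustration).
# Each entry maps downwind distance x (m) -> (sigma_y, sigma_z) in m.
BRIGGS_RURAL = {
    "A": lambda x: (0.22 * x / math.sqrt(1 + 1e-4 * x), 0.20 * x),
    "B": lambda x: (0.16 * x / math.sqrt(1 + 1e-4 * x), 0.12 * x),
    "C": lambda x: (0.11 * x / math.sqrt(1 + 1e-4 * x),
                    0.08 * x / math.sqrt(1 + 2e-4 * x)),
    "D": lambda x: (0.08 * x / math.sqrt(1 + 1e-4 * x),
                    0.06 * x / math.sqrt(1 + 1.5e-3 * x)),
    "E": lambda x: (0.06 * x / math.sqrt(1 + 1e-4 * x),
                    0.03 * x / (1 + 3e-4 * x)),
    "F": lambda x: (0.04 * x / math.sqrt(1 + 1e-4 * x),
                    0.016 * x / (1 + 3e-4 * x)),
}

def gaussian_plume(q, u, h, x, y, z, stability):
    """Steady-state Gaussian plume with total ground reflection.
    q: emission rate (g/s), u: wind speed (m/s), h: source height (m);
    x downwind and y crosswind distance from the source (m)."""
    if x <= 0:
        return 0.0  # no concentration upwind of the source
    sy, sz = BRIGGS_RURAL[stability](x)
    lateral = math.exp(-y ** 2 / (2 * sy ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sz ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sz ** 2)))
    return q / (2 * math.pi * u * sy * sz) * lateral * vertical

def sensors_reached(stability, x=200.0, spacing=5.0, half_width=100.0,
                    z=1.5, h=0.46, rel_threshold=0.01):
    """Count sensors on a hypothetical crosswind line at distance x whose
    concentration exceeds rel_threshold times the maximum on the line."""
    n = int(round(2 * half_width / spacing)) + 1
    ys = [-half_width + i * spacing for i in range(n)]
    conc = [gaussian_plume(1.0, 3.0, h, x, y, z, stability) for y in ys]
    cmax = max(conc)
    return sum(1 for c in conc if c > rel_threshold * cmax)

for cls in "ABCDEF":
    print(cls, sensors_reached(cls))
```

Under these assumptions the count decreases steadily from class A to class F, because the narrower plume of a stable atmosphere covers fewer sensors.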
Fig. 13 shows the correlation between SPE and SSD as a function of atmospheric stability. Low SSD values correspond to low SPE values, and the worst results are related to those runs (green triangles and black squares in Fig. 13) for which the assumptions made to derive the Gaussian analytic solution (Stockie, 2011) are poorly satisfied.

Fig. 12. Correlation between the SPE and the 10 min averaged wind direction standard deviation (WDSD10). Orange stars, blue diamonds, red up-triangles, black squares, purple circles and green down-triangles represent runs conducted in very unstable (A), unstable (B), slightly unstable (C), neutral (D), slightly stable (E) and stable (F) atmosphere, respectively. The exponential regression fit of the neutral runs is shown by the black line. SPEs are in m and WDSD10 in degrees. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Nevertheless, even with real data GAIM retrieves the strength and position of a single-point release with very good accuracy, and our results compare well with those obtained by Cervone and Franzese (2011) on the same Prairie Grass experimental data.
In the FFT07 series of experiments, a grid of 100 digital photoionization detectors (at 2 m above the ground) was deployed over an area of 475 m × 450 m. Propylene gas, the tracer, was released from multiple locations at 2 m above ground and at constant flow rates for approximately 10 min per trial. To test our code we chose FFT07 trial 28, in which a continuous release from 3 sources is carried out for 10 min. The experimental layout, along with the obtained results, is shown in Fig. 14. The same trial was already used by Wade and Senocak (2013) to test their multi-source event reconstruction tool (MERT). The meteorological input data are taken from a 32 m ultrasonic tower with 5 vertical measurement levels. We time-averaged the sensor concentration data over the continuous release of trial 28. The computational domain is 900 m × 900 m, with the same step sizes used for the single-point release calculations. The distances between the true sources and those retrieved with our code are 0.99 m, 11.45 m and 23.05 m (Fig. 14). Thus, we successfully locate the three sources with a maximum error lower than that obtained by Wade and Senocak (2013), who report a maximum error of 48.3 m for the same trial 28. The retrieved intensities of the three sources differ from the true ones by factors of 0.8, 2.9 and 4.5, respectively. It should be noted that the inverse model is strongly affected by the dispersion model employed in the direct simulation. In this respect, the GAIM errors observed with the FFT07 data could be reduced by using a more sophisticated direct model than the Gaussian solution.

Fig. 13. Correlation between the SPE and the SSD. Symbols follow the notation used in Fig. 12. SPEs are in m.

Fig. 14. Layout of FFT07 trial 28. The sensor positions are represented by red crosses, the true source locations by blue stars and the source locations estimated by our GAIM code by green circles. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
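With several sources, distances such as those reported for trial 28 require each retrieved source to be paired with a true one. The text does not say how this pairing was made; a minimal sketch, assuming the pairing that minimizes the total distance, is a brute-force search over permutations, which is cheap for three sources. The coordinates in the example are hypothetical.

```python
import itertools
import math

def match_sources(true_xy, retrieved_xy):
    """Pair retrieved and true sources so that the summed distance is minimal.
    Returns the distances ordered as true_xy (brute force, O(n!))."""
    best = None
    for perm in itertools.permutations(range(len(retrieved_xy))):
        dists = [math.dist(true_xy[i], retrieved_xy[j])
                 for i, j in enumerate(perm)]
        if best is None or sum(dists) < sum(best):
            best = dists
    return best

true_sources = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
retrieved = [(100.0, 3.0), (0.0, 104.0), (0.0, 0.5)]
distances = match_sources(true_sources, retrieved)
print(distances, max(distances))  # [0.5, 3.0, 4.0] 4.0
```

For a larger number of sources, the same assignment problem can be solved in polynomial time with the Hungarian algorithm, for example through `scipy.optimize.linear_sum_assignment`.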
4. Conclusion
The proposed computational model has been implemented for retrieving the location and the emission rates of unknown pollution sources from measurements of the pollutant concentrations under steady-state conditions. The model is based on the minimization of a fitness function through a GA procedure. The GA has been used to reduce the computational time as the number of unknown sources and the number of grid points in the computational domain increase.
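The GA-based minimization can be outlined with a compact single-source sketch. It is not the GAIM implementation: the fitness (sum of squared differences between measured and modeled concentrations), the ground-level Gaussian plume with linear σy = 0.08 Δx and σz = 0.06 Δx, the 5 × 5 receptor layout, the search bounds and the GA operators (tournament selection, blend crossover, Gaussian mutation, elitism) are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical search bounds for one source: x (m), y (m), Q (g/s).
BOUNDS = [(1.0, 700.0), (0.0, 1000.0), (0.1, 200.0)]

def plume(q, xs, ys, xr, yr, u=3.0, z=2.0):
    """Ground-level source, reflected Gaussian plume, wind along +x.
    The linear sigmas are an illustrative assumption."""
    dx = xr - xs
    if dx <= 0:
        return 0.0
    sy, sz = 0.08 * dx, 0.06 * dx
    return (q / (math.pi * u * sy * sz)
            * math.exp(-(yr - ys) ** 2 / (2 * sy ** 2))
            * math.exp(-z ** 2 / (2 * sz ** 2)))

def fitness(ind, receptors, measured):
    """Sum of squared differences between measured and modeled values."""
    x, y, q = ind
    return sum((c - plume(q, x, y, xr, yr)) ** 2
               for (xr, yr), c in zip(receptors, measured))

def tournament(pop, fits, k, rng):
    """Return the fittest (lowest fitness) of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=fits.__getitem__)]

def ga(receptors, measured, pop_size=60, generations=150, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best fitness of each generation
    for _ in range(generations):
        fits = [fitness(ind, receptors, measured) for ind in pop]
        best = min(range(pop_size), key=fits.__getitem__)
        history.append(fits[best])
        new_pop = [pop[best][:]]  # elitism: the best individual survives
        while len(new_pop) < pop_size:
            a = tournament(pop, fits, 3, rng)
            b = tournament(pop, fits, 3, rng)
            w = rng.random()
            child = [w * p + (1 - w) * r for p, r in zip(a, b)]  # blend crossover
            for g, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < 0.2:  # Gaussian mutation
                    child[g] += rng.gauss(0.0, 0.05 * (hi - lo))
                child[g] = min(max(child[g], lo), hi)  # keep within bounds
            new_pop.append(child)
        pop = new_pop
    fits = [fitness(ind, receptors, measured) for ind in pop]
    best = min(range(pop_size), key=fits.__getitem__)
    return pop[best], history

# Synthetic test: hypothetical 5 x 5 receptor grid and true source.
receptors = [(x, y) for x in (200.0, 400.0, 600.0, 800.0, 1000.0)
             for y in (300.0, 400.0, 500.0, 600.0, 700.0)]
true_source = (150.0, 480.0, 50.0)
measured = [plume(true_source[2], true_source[0], true_source[1], xr, yr)
            for xr, yr in receptors]
best, history = ga(receptors, measured)
print("retrieved x, y, Q:", [round(v, 1) for v in best])
```

Because of elitism, the best fitness never increases from one generation to the next; whether a given run actually reaches the true source depends on how many receptors the plume touches, in line with the difficult configurations described for the synthetic runs.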
The GA Inverse Model (GAIM) has been tested for one, two and three sources in a domain of 1000 m × 1000 m × 50 m with a resolution of 1 m, using a regular network of 25 receptors at 2 m agl and a Gaussian dispersion model to generate the concentration values at the grid points. For each problem, 100 test cases were run. In each test case the source positions and emission rates were chosen randomly. The locations are limited to the interval 1–700 m along the wind direction, to ensure that at least one column of sensors is influenced by the sources. For the single-source case, the model retrieves both position and emission rate for all the synthetic runs, that is, the estimator calculated for all 100 configurations is equal to zero. For the multiple-source runs the error is slightly higher. In particular, for two sources 95 configurations are in the estimator class 0–10 m (81 of these 95 test cases give an estimator equal to zero). For three sources, 82 configurations are in the estimator class 0–10 m (47 of these 82 test cases give an estimator equal to zero). We verified that these errors occur when one or two sources are shadowed by another source along the wind direction. Furthermore, many of the configurations considered are particularly difficult to solve, since the sources are located along the x-axis boundary or at the far end of the domain, so that few sensors are affected by the sources.
Finally, GAIM has been tested with the Prairie Grass (single source) and FFT07 (three sources) experimental data. The performance of our code on the 68 Prairie Grass experimental runs has been evaluated through the source position error (SPE) and the source strength difference (SSD). The SPE is less than 100 m for 55 runs, while |SSD| exceeds 1 in 18 runs, which were mainly carried out under slightly unstable, neutral and stable atmospheric conditions. In unstable atmospheric conditions both SPE and SSD are negligible, indicating a high accuracy of the GAIM solution. For the FFT07 trial 28 test, we obtained good agreement for the retrieved locations (the distances between true and calculated sources are 0.99 m, 11.45 m and 23.05 m) and factors of 0.8, 2.9 and 4.5 for the emission intensities. We conclude that GAIM is able to retrieve the location of the sources under very different conditions. However, because the inverse model is strongly affected by the dispersion model employed in the direct simulation, we are working to add new modules to our code that account for more realistic dispersion models in addition to the Gaussian solution.
Acknowledgments
The authors are grateful to Professor P. Monti and Professor G. Leuzzi for helpful and stimulating discussions. One of the authors (L.C.) acknowledges the financial support of the Italian Ministry for University and Research (MIUR), grant PON & REC no. 01_02422 SNIFF (Sensor Network Infrastructure For Factors).
References
Allen, C.T., Haupt, S.E., Young, G.S., 2007a. Source characterization with a genetic algorithm-coupled dispersion-backward model incorporating SCIPUFF. J. Appl. Meteorol. Climatol. 46 (3), 273–287.
Allen, C.T., Young, G.S., Haupt, S.E., 2007b. Improving pollutant source characterization by better estimating wind direction with a genetic algorithm. Atmos. Environ. 41 (11), 2283–2289.
Arya, P., 1999. Air Pollution Meteorology and Dispersion. Oxford University Press.
Bailey, D., Schwede, D., 1995. EPA-454/B-95-003a, EPA-454/B-95-003b. User's Guide for the Industrial Source Complex (ISC3) Dispersion Model, Vols. I and II. US EPA, Research Triangle Park, North Carolina, 27711.
Barad, M., 1958. Geophysical Research Paper, No. 59. Project Prairie Grass. A Field
Program in Diffusion, Vols. I and II. Air Force Cambridge Research Center,
Bedford, USA.


Cantelli, A., Leuzzi, G., Monti, P., Viotti, P., 2012. An inverse modelling approach for estimating vehicular emissions in urban coastal areas of the Messina Strait. Int. J. Environ. Pollut. 50 (1), 274–282.
Cervone, G., Franzese, P., 2011. Non-Darwinian evolution for the source detection of atmospheric releases. Atmos. Environ. 45 (26), 4497–4506.
Cervone, G., Franzese, P., Grajdeanu, A., 2010. Characterization of atmospheric contaminant sources using adaptive evolutionary algorithms. Atmos. Environ. 44 (31), 3787–3796.
Correia, L., 2010. Computational evolution: taking liberties. Theory Biosci. 129 (2–3), 183–191.
Fang, Y., Li, J., 2010. A review of tournament selection in genetic programming. In: Cai, Z., Hu, C., Kang, Z., Liu, Y. (Eds.), Advances in Computation and Intelligence, Volume 6382 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, pp. 181–192.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley Longman Publishing Co., Inc, Boston.
Goldberg, D.E., 2002. The Design of Innovation: Lessons from and for Competent
Genetic Algorithms. Kluwer Academic, Norwell, MA, USA.
Hamblin, S., 2013. On the practical usage of genetic algorithms in ecology and evolution. Methods Ecol. Evol. 4 (2), 184–194.
Hart, J.K., Martinez, K., 2006. Environmental sensor networks: a revolution in the earth system science? Earth Sci. Rev. 78 (3–4), 177–191.
Haupt, S.E., 2005. A demonstration of coupled receptor/dispersion modeling with a genetic algorithm. Atmos. Environ. 39 (37), 7181–7189.
Haupt, S.E., Young, G.S., Allen, C.T., 2006. Validation of a receptor-dispersion model coupled with a genetic algorithm using synthetic data. J. Appl. Meteorol. Climatol. 45, 476–490.
Holland, J.H., 1992. Adaptation in Natural and Artificial Systems. MIT Press.
Issartel, J.-P., Sharan, M., Modani, M., 2007. An inversion technique to retrieve the source of a tracer with an application to synthetic satellite measurements. Proc. R. Soc. A Math. Phys. Eng. Sci. 463 (2087), 2863–2886.
Keats, A., Yee, E., Lien, F.-S., 2007. Bayesian inference for source determination with applications to a complex urban environment. Atmos. Environ. 41 (3), 465–479.
Krysta, M., Bocquet, M., Sportisse, B., Isnard, O., 2006. Data assimilation for short-range dispersion of radionuclides: an application to wind tunnel data. Atmos. Environ. 40 (38), 7267–7279.
Marchuk, G., Shutyaev, V., Bocharov, G., 2005. Adjoint equations and analysis of complex systems: application to virus infection modelling. J. Comput. Appl. Math. 184 (1), 177–204.
Mitchell, M., 1998. An Introduction to Genetic Algorithms. The MIT Press.
Pandey, H.M., Chaudhary, A., Mehrotra, D., 2014. A comparative review of approaches to prevent premature convergence in GA. Appl. Soft Comput. 24, 1047–1077.
Pasquill, F., 1961. The estimation of the dispersion of windborne material. Meteorol. Mag. 90 (1063), 33–49.
Penenko, V., Baklanov, A., Tsvetova, E., 2002. Methods of sensitivity theory and inverse modeling for estimation of source parameters. Future Gener. Comp. Syst. 18 (5), 661–671.
Pudykiewicz, J.A., 1998. Application of adjoint tracer transport equations for evaluating source parameters. Atmos. Environ. 32 (17), 3039–3050.
Rapallo, A., et al., 2005. Global optimization of bimetallic cluster structures. I. Size-mismatched Ag–Cu, Ag–Ni, and Au–Cu systems. J. Chem. Phys. 122, 194308.
Senocak, I., Hengartner, N.W., Short, M.B., Daniel, W.B., 2008. Stochastic event reconstruction of atmospheric contaminant dispersion using Bayesian inference. Atmos. Environ. 42 (33), 7718–7727.
Sharan, M., Issartel, J.-P., Singh, S.K., Kumar, P., 2009. An inversion technique for the retrieval of single-point emissions from atmospheric concentration measurements. Proc. R. Soc. A Math. Phys. Eng. Sci. 465 (2107), 2069–2088.
Sharan, M., Singh, S.K., Issartel, J.-P., 2012. Least square data assimilation for identification of the point source emissions. Pure Appl. Geophys. 169 (3), 483–497.
Singh, S.K., Rani, R., 2014. A least-squares inversion technique for identification of a point release: application to Fusion Field Trials 2007. Atmos. Environ. 92, 104–117.
Singh, S.K., Sharan, M., Issartel, J.-P., 2013. Inverse modelling for identification of multiple-point releases from atmospheric concentration measurements. Boundary-Layer Meteorol. 146 (2), 277–295.
Singh, S.K., Sharan, M., Singh, A.K., 2014. Reconstructing height of an unknown point release using least-squares data assimilation. Q. J. R. Meteorol. Soc. http://dx.doi.org/10.1002/qj.2446.
Stockie, J.M., 2011. The mathematics of atmospheric dispersion modeling. SIAM Rev. 53 (2), 349–372.
Storwald, D.P., 2007. Detailed Test Plan for the Fusing Sensor Information from Observing Networks (Fusion) Field Trial (FFT-07). Meteorology Division, West Desert Test Center. U.S. Army Dugway Proving Ground WDTC Document No. WDTC-TP-07-078.
Tiwary, A., Colls, J., 2010. Air Pollution. Routledge, London.
Turner, D.B., 1973. Workbook of Atmospheric Dispersion Estimates. US Government Printing Office.
Wade, D., Senocak, I., 2013. Stochastic reconstruction of multiple source atmospheric contaminant dispersion events. Atmos. Environ. 74, 45–51.
Whitley, D., 1994. A genetic algorithm tutorial. Stat. Comput. 4, 65–85.
Whitley, D., 2001. An overview of evolutionary algorithms: practical issues and common pitfalls. Inf. Softw. Technol. 43, 817–831.
Whitley, D., Rana, S., Heckendorn, R., 1999. The island model genetic algorithm: on separability, population size and convergence. J. Comput. Inf. Technol. 7, 33–48.
Yee, E., 2008. Theory for reconstruction of an unknown number of contaminant sources using probabilistic inference. Bound.-Layer Meteorol. 127 (3), 359–394.
