www.emeraldinsight.com/0264-4401.htm
EC 28,3

A robust data mining approach for formulation of geotechnical engineering systems
Amir Hossein Alavi
College of Civil Engineering, Iran University of Science and Technology,
Tehran, Iran, and
Amir Hossein Gandomi
College of Civil Engineering, Tafresh University, Tafresh, Iran
Abstract
Purpose – The complexity of analysis of geotechnical behavior is due to multivariable dependencies
of soil and rock responses. In order to cope with this complex behavior, traditional forms of
engineering design solutions are reasonably simplified. Incorporating simplifying assumptions into
the development of the traditional models may lead to very large errors. The purpose of this paper is to
illustrate capabilities of promising variants of genetic programming (GP), namely linear genetic
programming (LGP), gene expression programming (GEP), and multi-expression programming (MEP)
by applying them to the formulation of several complex geotechnical engineering problems.
Design/methodology/approach – LGP, GEP, and MEP are new variants of GP that make a clear
distinction between the genotype and the phenotype of an individual. Compared with the traditional GP, the
LGP, GEP, and MEP techniques are more compatible with computer architectures. This results in a
significant speedup in their execution. These methods have a great ability to directly capture the knowledge
contained in the experimental data without making assumptions about the underlying rules governing the
system. This is one of their major advantages over most of the traditional constitutive modeling methods.
Findings – In order to demonstrate the simulation capabilities of LGP, GEP, and MEP, they were
applied to the prediction of: relative crest settlement of concrete-faced rockfill dams; slope stability;
settlement around tunnels; and soil liquefaction. The results are compared with those obtained by
other models presented in the literature and found to be more accurate. LGP has the best overall
behavior for the analysis of the considered problems in comparison with GEP and MEP. The simple
and straightforward constitutive models developed using LGP, GEP and MEP provide valuable
analysis tools accessible to practicing engineers.
Originality/value – The LGP, GEP, and MEP approaches overcome the shortcomings of different
methods previously presented in the literature for the analysis of geotechnical engineering systems.
Contrary to artificial neural networks and many other soft computing tools, LGP, GEP, and MEP
provide prediction equations that can readily be used for routine design practice. The constitutive
models derived using these methods can efficiently be incorporated into the finite element or finite
difference analyses as material models. They may also be used as a quick check on solutions
developed by more time consuming and in-depth deterministic analyses.
Keywords Data collection, Geotechnical engineering, Programming and algorithm theory,
Systems analysis
Paper type Research paper
Engineering Computations: International Journal for Computer-Aided Engineering and Software, Vol. 28 No. 3, 2011, pp. 242-274. © Emerald Group Publishing Limited, 0264-4401. DOI 10.1108/02644401111118132

Introduction
In contrast with other civil engineering problems, many geotechnical engineering systems lack a precise analytical theory or model for their solutions. This is usually because of an inadequate understanding of the phenomena involved and the factors affecting them, as well as the limited quantity and poor quality of the available information. In order to cope with the complexity of geotechnical engineering problems, traditional forms of engineering design solutions have been widely developed. The information has usually been collected, synthesized and presented in the form of design charts, tables or empirical formulae (Shahin et al., 2001). The most commonly used regression analyses can have large uncertainties. Regression analysis has major drawbacks pertaining to the idealization of complex processes, approximation, and the averaging of widely
varying prototype conditions. In regression analyses, the nature of the corresponding
problem is modeled by a pre-defined linear or nonlinear equation. Another major
constraint in application of regression analysis is the assumption of normality of
residuals. The simulation capability of the classical constitutive modeling is also
limited for reasons pertaining to the formulation complexity, idealization of material
behavior, and excessive empirical parameters (Adeli, 2001).
Several computer-aided data mining approaches have emerged with developments in computational software and hardware. A pattern recognition system, for example, learns adaptively from experience and extracts various discriminators.
Artificial neural networks (ANNs) are the most widely used pattern recognition procedures. They emerged as a result of simulating the biological nervous system.
ANNs have extensively been used to capture the nonlinear interactions between various
parameters in geotechnical engineering systems (Juang et al., 2001; Javadi, 2006; Alavi
et al., 2009, 2010c). An overview of the ANN applications in geotechnical engineering and
current research directions of this approach have been recently presented by Shahin et al.
(2008, 2009) and Javadi and Rezania (2009). Despite the acceptable performance of ANNs
in most cases, they do not give a definite function to calculate the outcome using the
input values. Hence, they do not provide a better understanding of the nature of the
derived relationships. The ANN approach is mostly appropriate to be used as a part of a
computer program. However, more robust tools are required to assess the behavior of
geotechnical engineering problems due to their nonlinearity and complexity.
Genetic algorithm (GA) is a powerful stochastic search and optimization method
based on the principles of genetics and natural selection. GA has been shown to be
suitably robust for a wide variety of complex geotechnical problems (Simpson and
Priest, 1993; Pal et al., 1996; Goh, 1999; McCombie and Wilkinson, 2002; Cui and Sheng,
2005; Levasseur et al., 2007, 2009; Majdi and Beiki, 2009; Hashash et al., 2010). Genetic
programming (GP) (Koza, 1992; Banzhaf et al., 1998) is an alternative approach for the
behavior modeling of geotechnical engineering tasks. GP is a developing subarea of
evolutionary algorithms inspired from Darwin’s evolution theory. It may generally be
defined as a specialization of GA where the solutions are computer programs rather
than fixed-length binary strings. The main advantage of GP over the conventional
statistical methods and other soft computing tools is its ability to generate prediction
equations without assuming prior form of the existing relationship. The developed
equations can be easily manipulated in practical circumstances. In contrast with ANNs
and GA, application of GP in the field of civil engineering is quite new and original.
The classical GP technique has been recently used to derive greatly simplified
formulas for geotechnical engineering problems (Yang et al., 2004; Johari et al., 2006;
Javadi et al., 2006). Recent studies have also shown that GP and its variants possess
obvious superiority over ANNs in dealing with geotechnical problems (Narendra et al.,
2006; Rezania and Javadi, 2007; Kayadelen et al., 2009; Alavi et al., 2010b).
Linear genetic programming (LGP) (Brameier and Banzhaf, 2007) is a new subset of GP
with a linear structure similar to the DNA molecule in biological genomes. LGP is a
machine learning approach that uses sequences of imperative instructions as genetic
material. More specifically, LGP operates on programs that are represented as linear
sequences of instructions of an imperative programming language (Brameier and
Banzhaf, 2001, 2007). Gene expression programming (GEP) (Ferreira, 2001) is another
recent extension to GP that evolves computer programs of different sizes and shapes
encoded in linear chromosomes of fixed length. The GEP chromosomes are composed of
multiple genes, each gene encoding a smaller subprogram. Multi-expression programming
(MEP) (Oltean and Dumitrescu, 2002) is also a linear variant of GP that uses a linear
representation of chromosomes. MEP has a special ability to encode multiple computer
programs of a problem in a single chromosome. Based on numerical experiments, the LGP, GEP, and MEP approaches are able to significantly outperform similar techniques (Oltean and Groşan, 2003a; Brameier and Banzhaf, 2007). Some of the limited scientific efforts
directed at applying LGP, GEP, and MEP to geotechnical engineering tasks include
performance characteristics modeling of stabilized soil (Alavi et al., 2008), prediction of
compressive and tensile strength of limestone (Baykasoglu et al., 2008), prediction of peak
ground acceleration (Cabalar and Cevik, 2009), modeling damping ratio and shear
modulus of sand (Cevik and Cabalar, 2009), formulation of soil classification (Alavi et al.,
2010b), and soil liquefaction assessment (Alavi and Gandomi, 2010).
This study investigates the potential of LGP, GEP, and MEP in simulating the
nonlinear complex behavior of geotechnical engineering systems. In order to demonstrate
the formulation capabilities of LGP, GEP, and MEP, these techniques were applied to four
practical examples of geotechnical engineering. The obtained results were further
compared with those provided by the existing models in the literature. The LGP, GEP, and
MEP models were developed based on reliable experimental results collected through an
extensive literature review.
Genetic programming
GP is a symbolic optimization technique that creates computer programs to solve a
problem using the principle of Darwinian natural selection (Koza, 1992). Friedberg
(1958) left the first footprints in the area of GP by using a learning algorithm to
stepwise improve a program. Much later, Cramer (1985) applied GAs and tree-like
structures to evolve programs. The breakthrough in GP then came in the late 1980s
with the experiments of Koza (1992) on symbolic regression. GP was introduced by
Koza (1992) as an extension of GA. Most of the genetic operators used in GA can also
be implemented in GP with minor changes. The main difference between GP and GA is
the representation of the solution. The GP solutions are computer programs that are
represented as tree structures and expressed in a functional programming language
(like LISP) (Koza, 1992). GA creates a string of numbers that represent the solution.
In other words, in GP, the evolving programs (individuals) are parse trees that can vary in length throughout the run, rather than fixed-length binary strings. Essentially,
this is the beginning of computer programs that program themselves (Koza, 1992).
Since GP often evolves computer programs, the solutions can be executed without
post-processing, while coded binary strings typically evolved by GA require
post-processing (Ahmed et al., 2007). The traditional optimization techniques, like GA,
are generally used in parameter optimization to evolve the best values for a given
set of model parameters. GP, on the other hand, gives the basic structure of the Geotechnical
approximation model together with the values of its parameters (Javadi and Rezania, engineering
2009). GP optimizes a population of computer programs according to a fitness
landscape determined by a program ability to perform a given computational task. systems
The fitness of each program in the population is evaluated using a fitness function.
Thus, the fitness function is the objective function GP aims to optimize (Torres et al.,
2009). That is to say, the fitness function is a particular type of objective function that 245
prescribes the optimality of a solution (computer program) evolved by GP and ranks
the program against all the other generated programs.
This classical GP approach is referred to as tree-based GP. A population member in
tree-based GP is a hierarchically structured tree comprising functions and terminals. The
functions and terminals are selected from a set of functions and terminals. For example,
the function set F can contain the basic arithmetic operations (+, −, ×, /, etc.), Boolean logic functions (AND, OR, NOT, etc.), or any other mathematical functions. The terminal set T contains the arguments for the functions and can consist of numerical constants, logical constants, variables, etc. The functions and terminals are chosen at random and constructed together to form a computer model in a tree-like structure, with a root point and branches extending from each function and ending in a terminal. An example of a
simple tree representation of a GP model is shown in Figure 1. In addition to the traditional
tree-based GP, there are other types of GP where programs are shown in different ways
(Figure 2). These are linear and graph-based GP (Banzhaf et al., 1998). The emphasis of this
study is placed on the linear-based GP techniques.
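As a concrete illustration of the tree representation in Figure 1, the model √(X1 + 3/X2) can be encoded as nested tuples and evaluated recursively. The sketch below uses illustrative names of its own; it does not reproduce any particular GP package:

```python
import math

# A GP individual as nested tuples: (function, child, ...) or a terminal;
# terminals are variable names (strings) or numeric constants.
tree = ("sqrt", ("+", "X1", ("/", 3, "X2")))  # the Figure 1 model: sqrt(X1 + 3/X2)

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "sqrt": math.sqrt,
}

def evaluate(node, variables):
    # Recursively evaluate a GP tree for a given assignment of the terminals.
    if isinstance(node, tuple):   # functional node
        return FUNCTIONS[node[0]](*(evaluate(c, variables) for c in node[1:]))
    if isinstance(node, str):     # variable terminal
        return variables[node]
    return node                   # numeric constant
```

Crossover and mutation then amount to swapping or regenerating subtrees of such structures.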
[Figure 1. Tree representation of a GP model: √(X1 + 3/X2), with a root node, functional nodes and terminal nodes]
[Figure 2. Different types of genetic programming: tree-based GP, linear-based GP and graph-based GP]

Linear-based GP
There are several reasons for using linear GP. Basic computer architectures are fundamentally the same now as they were 20 years ago, when GP began. Almost all architectures represent computer programs in a linear fashion. Also, computers do not
naturally run tree-shaped programs. Hence, slow interpreters have to be used as part of tree-based GP. Conversely, by evolving the binary bit patterns actually obeyed by the computer, the use of an expensive interpreter (or compiler) is avoided and GP can run several orders of magnitude faster (Poli et al., 2007). Several linear variants of GP have recently been proposed. Some of them are (Oltean and Groşan, 2003a): LGP (Brameier and Banzhaf, 2007), GEP (Ferreira, 2001), MEP (Oltean and Dumitrescu, 2002), Cartesian genetic programming (Miller and Thomson, 2002), GA for deriving software (Patterson, 2002) and infix form genetic programming (Oltean and Groşan, 2003c). LGP, GEP and MEP are the most common linear-based GP methods. These variants make a clear distinction between the genotype and the phenotype of an individual. The individuals in these variants are represented as linear strings (Oltean and Groşan, 2003a).
[Figure 3. Comparison of the GP program structures (after Alavi et al. (2010c)): (a) the LGP program
f[0] = 0;
L0: f[0] += v[0];
L1: f[0] -= -5;
L2: f[0] /= v[2];
return f[0];
and (b) its tree-based GP equivalent, both encoding y = f[0] = (v[0] - (-5))/v[2]]
In LGP, instructions are restricted to operations that accept a minimum number of constants or memory variables, called registers (r), and assign the result to a destination register, e.g. r0 := r1 + 1. A part of a linear genetic program in C code is represented as follows (Brameier and Banzhaf, 2007):
void LGP (double r[5])
{ ...
r[0] = r[5] + 70;
r[5] = r[0] - 50;
if (r[1] > 0)
if (r[5] > 2)
r[4] = r[2] * r[1];
r[2] = r[5] + r[4];
r[0] = sin(r[2]);
}
where register r[0] holds the final program output. LGPs can be converted into a functional representation by successive replacements of variables, starting with the last effective instruction (Oltean and Groşan, 2003a). Automatic induction of machine code by genetic
programming (AIMGP) is a particular form of LGP. In AIMGP, evolved programs are
stored as linear strings of native binary machine code and are directly executed by the
processor during fitness calculation. The absence of an interpreter and complex memory
handling results in a significant speedup in the AIMGP execution compared to tree-based
GP. This machine-code-based LGP approach searches for the computer program and the
constants at the same time. Here are the steps the machine-code-based LGP follows for a
single run (Francone and Deschaine, 2004; Brameier and Banzhaf, 2007):
(1) Initializing a population of randomly generated programs and calculating their
fitness values.
(2) Running a tournament. In this step, four programs are selected from the population at random. They are compared and, based on their fitness values, two programs are picked as the winners and two as the losers.
(3) Transforming the winner programs. The two winner programs are copied and transformed probabilistically as follows:
• parts of the winner programs are exchanged with each other to create two new programs (crossover); and/or
• each of the tournament winners is changed randomly to create two new programs (mutation).
(4) Replacing the loser programs in the tournament with the transformed winner
programs. The winners of the tournament remain without change.
(5) Repeating steps (2) through (4) until convergence.
Comprehensive descriptions of the basic parameters used to direct a search for a linear
genetic program can be found in Brameier and Banzhaf (2007).
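Steps (1)-(5) can be sketched in Python under stated assumptions: a toy instruction set acting on a single register and an error-sum fitness. None of this reproduces Discipulus or the paper's actual instruction format:

```python
import random

random.seed(0)

# Toy setting: a "program" is a list of (operator, constant) instructions
# applied to one working register; fitness is the total error against a
# target function. All names here are illustrative.
TARGET = lambda x: 2 * x + 5
CASES = [(x, TARGET(x)) for x in range(-5, 6)]
OPS = ["+", "-", "*"]

def random_program(length=6):
    return [(random.choice(OPS), random.uniform(-2, 2)) for _ in range(length)]

def run(program, x):
    r = x  # single working register initialised with the input
    for op, const in program:
        if op == "+": r += const
        elif op == "-": r -= const
        else: r *= const
    return r

def fitness(program):  # lower is better
    return sum(abs(run(program, x) - y) for x, y in CASES)

# (1) initialise a random population and evaluate it
population = [random_program() for _ in range(50)]

for _ in range(200):
    # (2) run a tournament: four random programs, two winners, two losers
    contenders = random.sample(range(len(population)), 4)
    ranked = sorted(contenders, key=lambda i: fitness(population[i]))
    winners, losers = ranked[:2], ranked[2:]
    # (3) transform copies of the winners by crossover and mutation
    a, b = [list(population[i]) for i in winners]
    cut = random.randrange(1, len(a))
    a[cut:], b[cut:] = b[cut:], a[cut:]            # crossover
    for child in (a, b):                           # mutation
        i = random.randrange(len(child))
        child[i] = (random.choice(OPS), random.uniform(-2, 2))
    # (4) replace the losers with the transformed winners
    population[losers[0]], population[losers[1]] = a, b
    # (5) the loop repeats steps (2)-(4) until convergence

best = min(population, key=fitness)
```

The steady-state character of the method is visible here: only the tournament losers are ever replaced, so good programs persist across many tournaments.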
Gene expression programming

where x1, x2 and x3 are variables and 3 is a constant; "." is the element separator, included for ease of reading. The above expression is termed the Karva notation, or K-expression (Ferreira, 2006). A K-expression can be represented by a diagram known as an expression tree (ET). For example, the above sample gene can be shown as Figure 4.
The conversion starts from the first position in the K-expression, which corresponds
to the root of the ET, and reads through the string one by one. The above GEP gene can
also be expressed in a mathematical form as:
X1((X1 + 3) − (X2 × X3)) + √(X2 + X1)   (2)
An ET can inversely be converted into a K-expression by recording the nodes from left to right in each layer of the ET, from the root layer down to the deepest one, to form the string. As previously mentioned, GEP genes have a fixed length, which is predetermined for a given problem. Thus, what varies in GEP is not the length of the genes but the size of the corresponding ETs. This means that a certain number of redundant elements may exist that are not used in the genome mapping. Hence, the valid length of a K-expression may be equal to or less than the length of the GEP gene. To guarantee the
validity of a randomly selected genome, GEP employs a head-tail method. Each GEP gene is composed of a head and a tail. The head may contain both function and terminal symbols, whereas the tail may contain terminal symbols only.

[Figure 4. Example of ETs]

The GEP algorithm uses the following steps until a termination condition is reached (Ferreira, 2001):
(1) random generation of the fixed-length chromosome of each individual for the initial population;
(2) expressing the chromosomes as ETs and evaluating the fitness of each individual;
(3) selecting the best individuals according to their fitness to reproduce with modification; and
(4) repeating the above process for a definite number of generations or until a solution is found.
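The level-order reading that turns a K-expression into an ET, with the root first and each function consuming its arguments from the next unused symbols, can be sketched in Python. The arity table and example symbols below are illustrative (Q stands for the square root):

```python
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # Q denotes the square root

def karva_to_tree(symbols):
    # Decode a K-expression level by level: the first symbol is the root and
    # each function takes its arguments from the next unused symbols. Surplus
    # symbols are simply ignored, which is why the valid length of a
    # K-expression may be shorter than the gene itself.
    nodes = [[s] for s in symbols]
    queue, next_free = [nodes[0]], 1
    while queue:
        node = queue.pop(0)
        for _ in range(ARITY.get(node[0], 0)):
            child = nodes[next_free]
            next_free += 1
            node.append(child)
            queue.append(child)
    return nodes[0]

def tree_to_infix(node):
    # Render a decoded ET as a parenthesised expression string.
    if len(node) == 1:
        return node[0]
    if node[0] == "Q":
        return "sqrt(" + tree_to_infix(node[1]) + ")"
    return "(" + tree_to_infix(node[1]) + node[0] + tree_to_infix(node[2]) + ")"
```

For instance, the K-expression +.*.a.b.c decodes to ((b*c)+a): the root + takes * and a as arguments, and * then takes b and c.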
In GEP, the individuals are selected and copied into the next generation, according to their fitness, by roulette-wheel sampling with elitism. This guarantees the survival and
cloning of the best individual to the next generation. Variation in the population is
introduced by conducting single or several genetic operators on selected chromosomes,
which include crossover, mutation and rotation. The rotation operator rotates the two subparts of the element sequence in a genome with respect to a randomly chosen point. It can also drastically reshape the ETs. As an example, the following gene:
+.+.×.X2.X1.X3.3.X2.X3.+.×.√.X1.−
rotates the first five elements of gene (1) to the end. Only the first seven elements are used to construct the solution function (X2 + X1) + (X3 × 3), with the corresponding expression shown in Figure 5.
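The rotation operator itself is a simple cyclic shift of the fixed-length gene. A minimal Python sketch, using a hypothetical gene rather than gene (1) from the text:

```python
def rotate(gene, point):
    # GEP rotation: move the first `point` elements of the fixed-length gene
    # to its end. The decoded ET can change drastically, because a different
    # symbol becomes the root of the expression tree.
    return gene[point:] + gene[:point]

gene = list("Q+ab+cd")          # hypothetical gene, not gene (1) from the text
rotated = rotate(gene, 2)       # ['a', 'b', '+', 'c', 'd', 'Q', '+']
```

After this rotation the root symbol is the terminal a, so only one element of the gene is expressed, which illustrates how strongly rotation can reshape the decoded ET.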
Multi-expression programming
MEP is a subarea of GP developed by Oltean and Dumitrescu (2002). MEP uses linear chromosomes for solution encoding. It has a special ability to encode multiple solutions (computer programs) of a problem in a single chromosome. Based on the fitness values of the individuals, the best encoded solution is chosen to represent the chromosome. There is no increase in the complexity of the MEP decoding process compared with the other GP variants that store a single solution in a chromosome, except in situations where the set of training data is not known a priori (Oltean and Groşan, 2003a). The evolutionary steady-state MEP algorithm starts with the creation of a random population of individuals. In order to evolve the best expression from a data file of inputs and outputs along a specified number of generations, MEP uses the following steps until a termination condition is reached (Oltean and Groşan, 2003b):
(1) selecting two parents using a binary tournament procedure and recombining
them with a fixed crossover probability;
[Figure 5. Example of ETs: the tree for (X2 + X1) + (X3 × 3)]
(2) obtaining two offspring by the recombination of the two parents; and
(3) mutating the offspring and replacing the worst individual in the current population with the best of them (if the offspring is better than the worst individual in the current population).
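One steady-state generation of this kind can be sketched in Python, assuming generic crossover and mutation operators. The toy problem at the end is purely illustrative and is not a MEP encoding:

```python
import random

random.seed(1)

def binary_tournament(population, fitness):
    # Pick the better of two randomly chosen individuals (lower error wins).
    a, b = random.sample(population, 2)
    return a if fitness(a) < fitness(b) else b

def steady_state_step(population, fitness, crossover, mutate, p_crossover=0.9):
    # (1) select two parents and, with a fixed probability, recombine them
    p1 = binary_tournament(population, fitness)
    p2 = binary_tournament(population, fitness)
    if random.random() < p_crossover:
        o1, o2 = crossover(p1, p2)       # (2) two offspring by recombination
    else:
        o1, o2 = list(p1), list(p2)
    # (3) mutate the offspring; the best of them replaces the worst
    #     individual in the population, if it is better
    best_offspring = min((mutate(o1), mutate(o2)), key=fitness)
    worst = max(range(len(population)), key=lambda i: fitness(population[i]))
    if fitness(best_offspring) < fitness(population[worst]):
        population[worst] = best_offspring

# Toy demonstration: individuals are lists of numbers; the "error" is the
# distance of their sum from 10.
fitness = lambda ind: abs(sum(ind) - 10)
crossover = lambda a, b: (a[:1] + b[1:], b[:1] + a[1:])
mutate = lambda ind: [x + random.uniform(-0.1, 0.1) for x in ind]
population = [[random.uniform(0, 5) for _ in range(3)] for _ in range(20)]
for _ in range(300):
    steady_state_step(population, fitness, crossover, mutate)
best = min(population, key=fitness)
```

Because an offspring only ever replaces the worst individual, and only when it is better, the quality of the worst member of the population never deteriorates.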
MEP is represented similarly to the way in which C and Pascal compilers translate mathematical expressions into machine code. The number of MEP genes
per chromosome is constant and specifies the length of the chromosome. A terminal
(an element in the terminal set T) or a function symbol (an element in the function set F) is
encoded by each gene. A gene that encodes a function includes pointers towards the
function arguments. Function parameters always have indices of lower values than
the position of that function itself in the chromosome. The first symbol in a chromosome
must be a terminal symbol as stated by the proposed representation scheme.
An example of an MEP chromosome can be seen below. It should be noted that the numbers to the left stand for gene labels, which do not belong to the chromosome. Using the set of functions F = {+, ×, /} and the set of terminals T = {x1, x2, x3, x4}, the example is given as follows:
0: x1
1: x2
2: × 0, 1
3: x3
4: + 2, 3
5: x4
6: / 4, 5
Translation of the MEP individuals into computer programs can be obtained by reading the chromosome top-down, starting with the first position. A terminal symbol defines a simple expression, while each function symbol specifies a complex expression obtained by connecting the operands specified by its argument positions with the function symbol itself (Oltean and Groşan, 2003b). In the present example, genes 0, 1, 3 and 5 encode simple expressions formed by a single terminal symbol: E0 = x1, E1 = x2, E3 = x3, E5 = x4. Gene 2 indicates the operation × on the operands located at positions 0 and 1 of the chromosome; thus, gene 2 encodes the expression E2 = x1 × x2. Gene 4 indicates the operation + on the operands located at positions 2 and 3; therefore, gene 4 encodes E4 = (x1 × x2) + x3. Gene 6 indicates the operation / on the operands located at positions 4 and 5; hence, gene 6 encodes E6 = ((x1 × x2) + x3)/x4.
Because multiple solutions are encoded in a single chromosome, one of these expressions (E0, . . . , E6) must be chosen as the chromosome representer. Each MEP chromosome may be viewed as a forest of trees rather than a single tree, due to its multi-expression representation (Figure 6). Each of these expressions can be considered a possible solution of the problem. The fitness of each expression in an MEP chromosome is calculated to designate the best encoded expression in that chromosome.
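The top-down decoding of the example chromosome can be sketched in Python, with each gene either a terminal or an (operator, position, position) triple whose positions point to earlier genes:

```python
# The example chromosome from the text: positions always point to earlier
# genes, so a single top-down pass decodes every expression.
chromosome = [
    "x1",          # 0
    "x2",          # 1
    ("*", 0, 1),   # 2
    "x3",          # 3
    ("+", 2, 3),   # 4
    "x4",          # 5
    ("/", 4, 5),   # 6
]

def decode(chromosome):
    # Build the infix expression encoded by every gene, reading top-down.
    expressions = []
    for gene in chromosome:
        if isinstance(gene, tuple):
            op, i, j = gene
            expressions.append("(" + expressions[i] + op + expressions[j] + ")")
        else:
            expressions.append(gene)
    return expressions

E = decode(chromosome)  # E[6] is "(((x1*x2)+x3)/x4)"
```

The fittest of E[0] to E[6] is then designated as the representer of the chromosome.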
The LGP, GEP, and MEP models were developed based on the experimental results obtained from the literature. The various parameters involved in the LGP, GEP, and MEP algorithms are presented in Table I. The major task is to define the hidden function connecting the input and output variables. The parameter selection will affect the
[Figure 6. Expressions encoded by an MEP chromosome represented as trees]
Table I. Parameter settings for the LGP, GEP, and MEP algorithms

Common parameters (LGP, GEP, MEP):
  Number of generations: 100, 250, 500
  Population size: 500, 2,500, 5,000
  Function set: +, −, ×, /, √, power, exp, log, ln
  Mutation rate (%): 10, 90
  Fitness function: linear error function

Algorithm-specific parameters
MEP:
  Crossover rate (%): 50, 95
  Crossover type: uniform
  Chromosome length: 50-80 genes
GEP:
  Number of genes: 1-3
  Head size: 3, 5, 8
  Linking function: +
  One-point recombination rate (%): 30, 50
  Two-point recombination rate (%): 30
  Gene recombination rate (%): 10
  Gene transposition rate (%): 10
  Numerical constants: integer, floating-point
LGP:
  Crossover rate (%): 50, 95
  Block mutation rate (%): 30
  Instruction mutation rate (%): 30
  Data mutation rate (%): 40
  Homologous crossover (%): 95
  Program size: initial: 80, maximum: 256-512
  Number of demes: 20
  Numerical constants: integer, randomized
generalization capability of the LGP, GEP, and MEP models. Several runs were conducted to come up with a parameterization of LGP, GEP, and MEP that provided
enough robustness and generalization to solve the problems. The effective training time
specifies the number of generations in LGP, GEP, and MEP. For all the cases, three levels
were set for the number of generations. A fairly large number of generations were tested
on each run to find models with minimum error. For each case, the program was run until
there was no longer significant improvement in the performance of the models or the
runs terminated automatically. Three levels were also set for the population size. Large
populations were used with the runs to guarantee sufficient diversity. Note that a run will
take longer with a larger population size. Two levels were considered for the crossover
and mutation rates. The success of the algorithms usually increases with increasing the
maximum program size parameter in LGP, head size and number of genes in GEP,
and chromosome length in MEP. In this case, the complexity of the evolved functions
increases and the speeds of the algorithms decrease. Different optimal levels were
considered for these parameters as tradeoffs between the running time and the
complexity of the evolved solutions. Basic arithmetic operators and mathematical
functions were utilized to get the optimum models. The values considered for the other
parameters were based on some previously suggested values (Baykasoglu et al., 2008;
Cevik and Cabalar, 2009; Alavi and Gandomi, 2010; Gandomi et al., 2010) and also after
making several preliminary runs and observing the performance behavior. All of the
combinations of the parameters were tested and ten replications were carried out for
each combination.
A software package called Discipulus (Conrads et al., 2001), working on the basis of the AIMGP platform, was used for the LGP analysis. The GEP algorithm was implemented with the GeneXproTools software (GEPSOFT, 2006). The C++ source code of MEP (Oltean, 2004) was modified by the authors to make it usable for the problems at hand. For the LGP, GEP, and MEP analyses, the available datasets were randomly
divided into training and testing subsets. The GP-based models have difficulty
extrapolating beyond the range of the data used for their calibration. In order to develop
the best models, the statistical properties of the training and testing subsets need to be
similar to ensure that each subset represents the same statistical population (Masters,
1993). In order to obtain a consistent data division, several combinations of the training
and testing sets were considered. The selection was such that the maximum, minimum,
mean and standard deviation of parameters were consistent in the training and testing
datasets. Out of the available data for each problem, approximately 75 percent of the
data was used for the training process and the remaining 25 percent was taken for
testing of the generalization capability of the LGP, GEP and MEP models. In some of the
investigated cases, the input and output variables were normalized between 0 and 1 to
obtain better results. The best LGP, GEP, and MEP-based formulas were chosen on the basis of a multi-objective strategy:
(1) involving the maximum number of the input variables; and
(2) providing the best fitness value on the training set of data.
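The data-division procedure described above, namely trying several random 75/25 splits and keeping the one whose training and testing statistics agree best, can be sketched as follows. The scoring rule here is a simplified stand-in for the full max/min/mean/standard-deviation comparison:

```python
import random
import statistics

def split_consistently(data, train_fraction=0.75, attempts=50, seed=0):
    # Keep the random split whose training and testing subsets have the most
    # similar mean and standard deviation, so that both subsets represent
    # roughly the same statistical population.
    rng = random.Random(seed)
    n_train = int(len(data) * train_fraction)
    best_split, best_score = None, float("inf")
    for _ in range(attempts):
        shuffled = list(data)
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        score = (abs(statistics.mean(train) - statistics.mean(test))
                 + abs(statistics.stdev(train) - statistics.stdev(test)))
        if score < best_score:
            best_split, best_score = (train, test), score
    return best_split

data = [float(x) for x in range(40)]   # illustrative data, not a paper dataset
train, test = split_consistently(data)
```

In practice the same scoring would be applied to every input and output variable of a dataset, not to a single column.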
Correlation coefficient (R), root mean squared error (RMSE) and mean absolute error (MAE) were used to evaluate the capabilities of the proposed correlations. R, RMSE and MAE are given as follows:

R = Σᵢ (hᵢ − h̄)(tᵢ − t̄) / √[ Σᵢ (hᵢ − h̄)² Σᵢ (tᵢ − t̄)² ]   (4)

RMSE = √[ Σᵢ (hᵢ − tᵢ)² / n ]   (5)

MAE = (1/n) Σᵢ |hᵢ − tᵢ|   (6)

where the sums run over i = 1 to n; hᵢ and tᵢ are, respectively, the actual output and the calculated output value for the ith sample; h̄ and t̄ are the averages of the actual and calculated outputs; and n is the number of samples.
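These three error measures translate directly into code; a short Python sketch:

```python
from math import sqrt

def correlation_r(h, t):
    # Correlation coefficient between actual (h) and calculated (t) outputs.
    n = len(h)
    h_bar, t_bar = sum(h) / n, sum(t) / n
    num = sum((hi - h_bar) * (ti - t_bar) for hi, ti in zip(h, t))
    den = sqrt(sum((hi - h_bar) ** 2 for hi in h)
               * sum((ti - t_bar) ** 2 for ti in t))
    return num / den

def rmse(h, t):
    # Root mean squared error.
    return sqrt(sum((hi - ti) ** 2 for hi, ti in zip(h, t)) / len(h))

def mae(h, t):
    # Mean absolute error.
    return sum(abs(hi - ti) for hi, ti in zip(h, t)) / len(h)
```

Note that R measures linear association only and is insensitive to a constant offset, which is why it is reported alongside RMSE and MAE rather than on its own.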
For the analysis of the classification problem (soil liquefaction), the output variable
was decoded into binary code with a threshold value equal to 0.5. For a more detailed
performance analysis of the soil liquefaction prediction models, their sensitivity,
specificity, positive predictivity, and accuracy were obtained using the following
equations:
Sensitivity (%) = TP/(TP + FN) × 100   (7)

Specificity (%) = TN/(TN + FP) × 100   (8)

Positive predictivity (%) = TP/(TP + FP) × 100   (9)

Accuracy (%) = (TP + TN)/(TP + FP + FN + TN) × 100   (10)
where:
TP (true positive) The model predicts that the class is 1 and the class of the given
instance is indeed 1.
TN (true negative) The model predicts that the class is 0 and the class of the given
instance is indeed 0.
FP (false positive) The model predicts that the class is 1 but the class of the given
instance is 0.
FN (false negative) The model predicts that the class is 0 but the class of the given
instance is 1.
TP and TN are correct classifications, while FP and FN are incorrect classifications.
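The confusion-matrix counts defined above, together with the 0.5 decoding threshold mentioned earlier, give the four measures directly; a Python sketch:

```python
def confusion_counts(actual, outputs, threshold=0.5):
    # Decode the continuous model output into binary classes at the 0.5
    # threshold used in the text, then tally TP, TN, FP, FN.
    tp = tn = fp = fn = 0
    for a, p in zip(actual, outputs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and a == 1:
            tp += 1
        elif predicted == 0 and a == 0:
            tn += 1
        elif predicted == 1 and a == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    # Equations (7)-(10).
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "positive predictivity": 100.0 * tp / (tp + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + fp + fn + tn),
    }
```

Reporting sensitivity and specificity separately matters for liquefaction data, where accuracy alone can look good even when one class is systematically misclassified.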
The classification performance of the LGP, GEP, and MEP models was also evaluated
using receiver operating characteristic (ROC) analysis. The ROC curves plot the sensitivity and specificity versus the model output for a continuous range of decision thresholds. The selected index of performance was the area under the ROC curve (Az), which is a meaningful performance measure. Generally, a higher area index reflects a better classification performance.
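The area Az can be computed equivalently as the probability that a randomly chosen positive case receives a higher model output than a randomly chosen negative one, which is a rank-based restatement of the threshold sweep described above; a Python sketch:

```python
def roc_area(actual, outputs):
    # Az as the probability that a randomly chosen positive case is scored
    # above a randomly chosen negative one (ties count one half). This equals
    # the area under the ROC curve traced by sweeping the decision threshold.
    positives = [s for a, s in zip(actual, outputs) if a == 1]
    negatives = [s for a, s in zip(actual, outputs) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```

An Az of 1.0 indicates perfect separation of the two classes, while 0.5 corresponds to a classifier no better than chance.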
Problem I: relative crest settlement of concrete-faced rockfill dams
The concrete-face rockfill dam (CFRD) has become popular in the last four decades because of its good performance and low cost compared with rockfill dams with an inner earth core. Experience up to 1960 using dumped rockfill showed that the CFRD is a safe and economical type of dam. However, it is subject to concrete-face damage and leakage caused by the high compressibility of the segregated dumped rockfill. Many CFRD designs have been used for dam construction worldwide. These designs overcome technical difficulties such as dam construction on a soft foundation, complex dam erection, and related problems. The CFRD is the preferred construction type for new dam projects in many countries, including Australia, China, and Brazil. Standard CFRD design guidelines have already been produced in some of these countries (e.g. ANCLDI, 1991; BCD, 2000). Numerous studies have been carried out on CFRDs (BCD, 2000).
Cooke (1984) presented a chronicle of modern rockfill dam design, including a
description of current practice in the CFRD design. Clements (1984) proposed empirical
equations to investigate the actual crest settlements and deformations of several rockfill
dams after construction. It was observed that the values calculated using the empirical
formulas exhibit significant differences from the observed values. Liu et al. (1993)
presented a method to predict the maximum settlement at the end of construction and
the maximum face slab normal displacement during reservoir operation. The method
was based on the physical and mechanical properties of the rockfill, the load factor,
and the geometric profile of the CFRD section. The characteristics of rockfill behavior
using actual CFRD cases were explained by Hunter (2003) and Hunter and Fell (2003).
It is usually necessary to rely on the historic performance data from other dams to
estimate the dam properties. There is some published information on predicting the
deformation of CFRDs (Clements, 1984). However, such studies are based on limited
data. Also, researchers have often concentrated on only one or two factors which affect
either the rockfill modulus or the measured deformation. ANNs have also been applied to
the prediction of the relative crest settlement (RCS) of a CFRD (Kim and Kim, 2008). As mentioned previously,
ANNs have some fundamental disadvantages that limit their usage in practical
calculations.
Herein, the LGP, GEP, and MEP approaches were used as alternative ways to
simulate the behavior of the CFRD crest settlement. The models derived using these
methods can be used as quick and accurate tools for evaluating the RCS without any
need for manual testing.
RCS = aH^b  (11)

RCS = 0.0069H^0.655  (12)

where RCS is the crest settlement (m), H is the dam height (m), and a and b are
constants: a is equal to 0.0002 at initial impounding and 0.0000014 after ten years'
service, and b is 1.1 at initial impounding and 2.6 after ten years' service.
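As a worked check of the power-law form above, the sketch below evaluates RCS = aH^b and recovers a and b from (H, RCS) records by least squares on logarithms. The data in the test are hypothetical, generated from Clements' initial-impounding constants purely for illustration.

```python
import math

def rcs_power_law(H, a, b):
    """Relative crest settlement (m) from dam height H (m) via RCS = a * H**b."""
    return a * H ** b

def fit_power_law(heights, settlements):
    """Least-squares fit of log RCS = log a + b log H to (H, RCS) records."""
    xs = [math.log(h) for h in heights]
    ys = [math.log(s) for s in settlements]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = math.exp(ybar - b * xbar)
    return a, b

# With a = 0.0002 and b = 1.1 (initial impounding), a 100 m high dam
# gives RCS = 0.0002 * 100**1.1, roughly 0.03 m.
```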
[Figure 7. Predicted versus measured RCS of CFRDs using the LGP, GEP, and MEP
models. Panels (a) and (b): MAE_All = 0.061 and 0.059; panel (c), MEP: R_All = 0.948,
RMSE_All = 0.078, MAE_All = 0.060; panel (d), Clements (1984): R_All = 0.275,
RMSE_All = 0.320, MAE_All = 0.206; panel (e), Kim and Kim (2008): R_All = 0.237,
RMSE_All = 0.247, MAE_All = 0.118.]
training and entire database is, respectively, obtained by MEP and GEP. The results
clearly demonstrate that it is not appropriate to use the models proposed by Clements
(1984) and Kim and Kim (2008) to estimate the crest settlement because of their poor
performance.
Problem II: slope stability
Slope failure is a complex natural phenomenon that constitutes a serious natural hazard.
It is responsible for hundreds of millions of dollars of damage to public and private
property every year. To prevent or mitigate landslide damage, slope-stability analysis
requires an understanding and evaluation of the processes that govern the behavior of
slopes. The factor of safety (FS), as an index of stability, is required to evaluate the
slope stability. Many parameters are involved in the slope stability evaluation.
Calculating the FS values requires geometrical and physical data on the geologic
materials and their shear-strength parameters (cohesion and angle of internal friction),
information on pore-water pressures, etc. The methods available to solve for the FS of a
given slope are traditionally classified into the following categories (Nash, 1987;
Duncan, 1996):
. energy methods;
. limit-equilibrium methods;
. finite element or finite difference methods; and
. circular failure surface methods and non-circular failure surface methods.
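To make the FS index concrete, a minimal limit-equilibrium sketch for the special case of an infinite slope is given below. This is the standard textbook formula, included for illustration only; it is not one of the models developed in the paper.

```python
import math

def infinite_slope_fs(c, phi_deg, gamma, z, beta_deg, u=0.0):
    """Factor of safety of an infinite slope by limit equilibrium.

    c        effective cohesion (kPa)
    phi_deg  effective friction angle (degrees)
    gamma    soil unit weight (kN/m^3)
    z        vertical depth to the slip plane (m)
    beta_deg slope inclination (degrees)
    u        pore-water pressure on the slip plane (kPa)
    """
    beta, phi = math.radians(beta_deg), math.radians(phi_deg)
    # Available shear strength over mobilized shear stress on the slip plane.
    shear_strength = c + (gamma * z * math.cos(beta) ** 2 - u) * math.tan(phi)
    shear_stress = gamma * z * math.sin(beta) * math.cos(beta)
    return shear_strength / shear_stress
```

For a dry cohesionless slope this reduces to FS = tan(phi)/tan(beta), so a slope inclined at its friction angle sits exactly at FS = 1 (incipient failure).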
[Figure 8. Predicted versus measured FS values using the LGP, GEP, and MEP models.
Panel (c), MEP: R_All = 0.877, RMSE_All = 0.113, MAE_All = 0.100.]
Table V. Overall performance of different models for the assessment of FS

                Training                Testing
Models     R       RMSE    MAE     R       RMSE    MAE
LGP        0.917   0.128   0.106   0.920   0.174   0.147
GEP        0.890   0.132   0.098   0.884   0.168   0.141
MEP        0.858   0.148   0.116   0.947   0.203   0.158
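The R, RMSE, and MAE statistics reported in the performance tables follow their standard definitions (the paper does not print the formulas, so the usual ones are assumed here):

```python
import math

def error_measures(measured, predicted):
    """Correlation coefficient R, root-mean-square error, and mean absolute error."""
    n = len(measured)
    mbar = sum(measured) / n
    pbar = sum(predicted) / n
    cov = sum((m - mbar) * (p - pbar) for m, p in zip(measured, predicted))
    r = cov / math.sqrt(sum((m - mbar) ** 2 for m in measured)
                        * sum((p - pbar) ** 2 for p in predicted))
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    return r, rmse, mae
```

A model that reproduces the measurements exactly gives R = 1 with RMSE = MAE = 0; larger errors lower R and raise RMSE and MAE.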
Problem III: settlement around tunnels
Of these, the most difficult task is the prediction of the ground losses in the tunneling
process. Many studies have been carried out on the assessment of ground movement. Most
of these studies have followed the trend set by Peck (1969), who represented the
settlement trough over a single tunnel by the error function (normal or probability)
curve within reasonable limits. Empirical prediction methods were the first methods for
the prediction of surface subsidence (Peck, 1969; Atkinson and Potts, 1979; Attewell
and Farmer, 1974; Clough and Schmidt, 1981). These methods are based on the
correlation of measured data with the geometric parameters of the excavations.
The results obtained are valid only for the investigated area because these methods are
derived from measurements in a specific area. The second group of prediction methods
is based on influence functions, which are used to describe the impact of an elementary
part of the excavation on the formation of subsidence. These methods rest on several
assumptions or principles which simplify the calculus and make them generally
applicable. The principle of these methods is to select the influence function for each
mine and then determine the coefficients so that the calculated subsidence curve matches
the form of the subsidence observed in nature. Another group
of prediction models comprises mathematical-physical models. The behavior of the roof
and the development of subsidence are calculated in accordance with the laws of
mechanics. The elastic and plastic models of subsidence belong to this group. When these
models are used, the problem is usually solved by numerical methods, such as the finite
element, finite difference, or boundary element methods.
Progress has recently been made in the ability to predict the ground movements due
to tunneling. The state of the art is still deficient in many ways. ANNs have recently
been applied to the prediction of the tunneling-induced ground movement (Ambrozic
and Turk, 2003; Neaupane and Adhikari, 2006). Li et al. (2006) proposed fuzzy models
for the analysis of rock mass displacements due to underground mining. Li et al. (2007)
utilized a hybrid fuzzy and tree-based GP method to analyze the actual cases of
excavation, mining and ground surface movement.
On the basis of a detailed investigation, a viable approach is still necessary for the
prediction of the ground movement. In this paper, measurements of settlement
recorded in different tunnel projects were formulated by means of the LGP, GEP, and
MEP techniques.
[Figure 9. Typical section of a tunnel (figure labels: Smax, D).]
Similarly, CM was classified as 1, 2, and 3 for the hand-mined shield, mechanical
shield, and semi-mechanical type (compressed air support) shield, respectively. The
descriptive statistics of the data used in this study are given in Table VI. The data from
several tunneling case studies (e.g. Toronto subway; Regents Park, London; Bangkok,
GBC5; San Francisco; Brussels metro, etc.) presented by Neaupane and Adhikari (2006)
were used to develop the LGP, GEP, and MEP-based models. Of the available 40 data,
30 datasets were used for the training of the models and the rest were taken for
testing purposes.
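A random partition like the 30/10 split of the 40 tunneling records can be sketched as follows (illustrative only; the paper does not state how its split was drawn):

```python
import random

def split_dataset(records, n_train, seed=0):
    """Shuffle the records reproducibly, then partition them into
    training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 40 cases, 30 for training and the remaining 10 for testing.
train, test = split_dataset(range(40), 30)
```

Fixing the seed makes the split reproducible, so reported training and testing statistics refer to the same partition on every run.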
[Figure 10. Predicted versus measured Smax using the LGP, GEP, and MEP models.
Panel (c), MEP: R_All = 0.888, RMSE_All = 17.119, MAE_All = 8.802.]
Table VII. Overall performance of different models for the assessment of Smax

                Training                 Testing
Models     R       RMSE     MAE      R       RMSE     MAE
LGP        0.949   10.679   8.306    0.985   8.111    6.704
GEP        0.952   10.751   8.010    0.967   10.748   9.021
MEP        0.821   19.065   9.593    0.976   9.045    6.429
resulted from the application of noncyclic shear stresses, can be triggered. Liquefaction
usually occurs when the pore water pressure increases enough to carry the overburden
stress, i.e. the grain-to-grain stress equals zero. The soil then immediately loses most of
its strength, leading to extreme deformations, flow of water, and suspension of sediment
(Liang, 2005). This phenomenon is a source of damage and destructive failures in
various types of structures. The seriousness of potential failures of critical structures
due to liquefaction has led to massive research efforts to understand this phenomenon.
Several procedures have been developed to evaluate the liquefaction potential in the
field. The available liquefaction evaluation procedures are categorized into three main
groups:
(1) the stress-based procedures;
(2) the strain-based procedures; and
(3) the energy-based procedures.
The stress-based procedure is the most widely used liquefaction assessment method,
first proposed by Seed and Idriss (1971) and Whitman (1971). This approach is mainly
empirical and based on laboratory and field observations. The stress method has
continually been refined as a result of newer studies and an increase in the number of
liquefaction case histories (NRC, 1985; NCEER, 1997). The main criteria in the
stress-based procedure are the shear stress level and the number of cycles. To establish
a relationship between the actual earthquake motion and laboratory harmonic loading
conditions, the equivalent stress intensity and the number of cycles have to be defined.
Dobry et al. (1982) proposed a strain-based procedure as an alternative to the empirical
stress-based procedure. This method was derived from the mechanics of two
interacting idealized sand grains and then generalized for natural soil deposits. It is
based on the hypothesis that pore pressure begins to develop when the shear strain
surpasses a threshold shear strain. Use of the strain-based approach for liquefaction
evaluation is not as common as the stress-based method. The reason for its limited use
is that the strain procedure only predicts the initiation of pore pressure buildup, which
is necessary for liquefaction to occur but does not imply that liquefaction will occur.
The energy concept has been widely used in the theories of elasticity and plasticity,
potential energy surface for constitutive law and energy principles (Desai and
Siriwardane, 1984). Since the late 1970s, numerous energy-based procedures have
been proposed for evaluating the liquefaction potential of soil deposits (Liang, 2005).
The use of the energy concept is shown to be a logical step in the liquefaction
evaluation of soils (Alavi and Gandomi, 2010).
Modern techniques such as fuzzy systems and ANNs have been utilized to develop
liquefaction prediction models (Goh, 1994; Hanna et al., 2007). Pal (2006) and Goh and
Goh (2007) investigated the potential of the support vector machine (SVM) classification
approach to assess liquefaction potential based on actual standard penetration test
(SPT) and cone penetration test (CPT) field data. Recently, Baykasoglu et al. (2009)
proposed a hybrid ANN and ant colony optimization algorithm in order to extract
accurate rules for liquefaction classification. Unlike the other soft computing tools,
applications of GP and its variants to liquefaction assessment are scarce. In this
connection, Alavi and Gandomi (2010) derived generalized LGP and MEP models
relating the strain energy density required to trigger liquefaction to the factors
affecting the liquefaction characteristics of sands.
In this work, the potential of alternative data-induction tools, LGP, GEP, and MEP, is
demonstrated by applying them to the classification of several liquefied and
non-liquefied case records.
Discussion
Different LGP-, GEP-, and MEP-based constitutive relationships were obtained for the
assessment of four complex geotechnical engineering systems. The RCS values predicted
using the new CFRD models were in good agreement with the field measurements.
a wide area using nonparametric variables with large extension. The ground settlement
above tunnels was successfully formulated by means of the LGP, GEP, and MEP methods.
In the modeling procedure, the effects of several parameters with direct physical
significance on the ground behavior around tunnels were considered. These parameters
were identified through a detailed investigation of different tunnel projects published
in the literature. The results show that the LGP-, GEP-, and MEP-based models can
effectively be used for predicting the ground surface movements due to the soft ground
tunneling. The viability of LGP, GEP, and MEP to model the complex behavior of the
liquefaction phenomenon was further demonstrated. The derived correlations have
integrated the input parameters that account for all possible variations in the field. Several
soil and seismic parameters were included in the soil liquefaction analysis. The developed
constitutive models are expected to be very useful for the preliminary evaluation of the
liquefaction potential of sites for which the input parameters are not well defined.
It is known that the models derived using neural networks, GP-based approaches or
other similar techniques perform best when they do not extrapolate beyond the range
of the data used for their calibration (Shahin et al., 2008). Consequently, the amount of
data used for the model development is an important issue, as it heavily bears on the
reliability of the final models. In this context, Frank and Todeschini (1994) argue that
the minimum ratio of the number of total objects over the number of selected variables
for model acceptability is 3, but often a safer value of 5 is more reasonable. In the
present study, the ratios are higher, equal to 26/5 = 5.2 at the minimum
(Problem II: slope stability) and 226/6 = 37.7 at the maximum (Problem IV: soil
liquefaction). Note that the obtained constitutive models can easily be retrained and
improved to make more accurate predictions for a wider range by including the data
for other soil types or test conditions. Comparing the overall performance of the utilized
methods, LGP has generally provided the best results followed by GEP and MEP.
In most of the cases, the capability of LGP and GEP was better than MEP in
incorporating the effects of more influencing parameters.
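The acceptability check discussed above amounts to a one-line ratio; a sketch, with Frank and Todeschini's rule-of-thumb thresholds hard-coded as assumptions, is:

```python
def acceptability_ratio(n_objects, n_variables, minimum=3.0, safer=5.0):
    """Ratio of data objects to model input variables, compared against the
    thresholds of 3 (minimum) and 5 (safer) from Frank and Todeschini (1994)."""
    ratio = n_objects / n_variables
    return ratio, ratio >= minimum, ratio >= safer

# Problem II: 26 records over 5 variables; Problem IV: 226 records over 6 variables.
```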
The task faced by LGP-, GEP-, MEP-, and other GP-based approaches is mainly the
same as that faced by ANNs. GP and ANNs are machine learning techniques that can
effectively be applied to classification and approximation problems. They directly
learn from raw experimental (or field) data presented to them in order to extract the
subtle functional relationships among the data, even if the underlying relationships are
unknown or the physical meaning is difficult to explain. In contrast,
most conventional empirical and statistical methods need prior knowledge about the
nature of the relationships among the data. Classical constitutive models rely on
assuming the structure of the model in advance, which may be suboptimal. Therefore,
the GP and ANN-based approaches are well suited to modeling the complex behavior of
most geotechnical engineering problems with extreme variability in their nature (Shahin
et al., 2009). In spite of similarities, there are some important differences between GP and
ANNs. ANNs suffer from some shortcomings including lack of transparency and
knowledge extraction. That is, they do not explicitly explain the underlying physical
processes. The knowledge extracted by ANNs is stored in a set of weights that cannot
properly be interpreted. Owing to the large complexity of the network structure, ANNs
do not give a transparent function relating the inputs to the corresponding outputs.
The main advantage of GP over ANNs is that GP generates a transparent and structured
representation of the system being studied. An additional advantage of GP over ANNs is
that determining the ANN architecture is a difficult task. The structure and network
parameters of ANNs (e.g. number of inputs, transfer functions, number of hidden layers
and their number of nodes, etc.) should be identified a priori, which is usually done
through a time-consuming trial and error procedure. In GP, the number and combination
of terms are automatically evolved during model calibration (Shahin et al., 2009; Javadi
and Rezania, 2009). A notable limitation of GP and its variants is that these methods are
parameter sensitive, especially when difficult experimental training datasets like those
used in this paper are employed. Using any form of optimally controlling the parameters
of the run (e.g., GAs) can improve the performance of the LGP, GEP, and MEP
algorithms. Also, the underlying assumption that the input parameters are reliable is not
always the case. Since fuzzy logic can provide a systematic method to deal with imprecise
and incomplete information, the development of hybrid fuzzy and linear GP-based
models for such problems can be a suitable topic for further studies.
However, one of the goals of introducing the expert systems, such as the GP-based
approaches, into the design processes is better handling of the information in the
pre-design phase. In the initial steps of design, information about the features and
properties of the targeted output or process are often imprecise and incomplete
(Kraslawski et al., 1999). Nevertheless, it is desirable to have some initial estimates of
the outcome before performing any extensive laboratory or field work. The LGP, GEP,
and MEP approaches employed in this research are based on the data alone to
determine the structure and parameters of the models. Thus, the derived constitutive
models can particularly be valuable in the preliminary design stages. For more
reliability, the results of the LGP-, GEP-, and MEP-based analyses are suggested to
be treated as a complement to conventional computing techniques. In any case, the
importance of engineering judgment in interpretation of the obtained results should not
be underestimated. In order to develop a sophisticated prediction tool, LGP, GEP, and
MEP can be combined with advanced deterministic geomechanical models. Assuming
the geomechanical model captures the key physical mechanisms, it needs appropriate
initial conditions and carefully calibrated parameters to make accurate predictions.
An idea could be to calibrate the geomechanical parameters by the use of LGP, GEP,
and MEP, which take into account historic datasets as well as the laboratory or field
test results. This allows integrating the uncertainties related to in situ conditions which
the geomechanical model does not explicitly account for. LGP, GEP, and MEP provide
a structured representation for the constitutive material model that can readily be
incorporated into the finite element or finite difference analyses. In this case, it is
possible to use a suitably trained GP-based material model instead of a conventional
(analytical) constitutive model in a numerical analysis tool such as finite element code
or finite difference software (like FLAC). Consequently, the need for complex
yielding/plastic potential/failure functions, or flow rules is avoided. It is notable that
the numerical implementation of ANNs in the finite element analyses has already been
presented by several researchers (Shin and Pande, 2000; Javadi et al., 2005). This
strategy has led to some qualitative improvement in the application of the finite element
method in engineering practice (Javadi and Rezania, 2009).
Conclusions
In this paper, the LGP, GEP, and MEP paradigms were employed for the analysis of
complex geotechnical engineering systems. These methodologies were applied to the
assessment of the RCS of CFRD, slope stability, ground settlement above tunnels, and
soil liquefaction phenomenon. Reliable databases gathered from the literature were
used to develop the models. The following conclusions can be derived from the results
presented in this research:
(1) Despite high nonlinearity in the behavior of the investigated systems, the
proposed LGP, GEP and MEP models give reasonable estimates of the target
values. The validity of the models was verified for a part of test results beyond
the training data domain. The LGP models have the best overall behavior
followed by the GEP and MEP models.
(2) The proposed models efficiently take into consideration the effects of several
parameters representing the engineering behavior of the geotechnical problems.
(3) LGP, GEP, and MEP provide prediction equations that are relatively simple and
can be used for routine design practice via hand calculations.
(4) The LGP, GEP, and MEP-based models can be incorporated into the finite
element or finite difference methods in the same way as a conventional
constitutive model.
(5) The constitutive models derived using LGP, GEP, and MEP are basically
different from the conventional constitutive models based on the first
principles (e.g. elasticity and plasticity theories). One of the distinctive features
of the LGP, GEP, and MEP-based constitutive models is that they are based on
the experimental data rather than on assumptions made in developing the
conventional models. Consequently, as more data become available, these
models can be improved by re-training LGP, GEP, and MEP, without repeating
the development procedures from the beginning.
(6) It is possible to obtain more than one correlation for a complex phenomenon by
selecting various parameters and function sets involved in the LGP, GEP and
MEP predictive algorithms.
(7) LGP, GEP, and MEP can be regarded as efficient tools for the analysis of
geotechnical engineering problems because of their unique learning, training,
and prediction characteristics. These methods are particularly practical for
situations where:
. good experimental data are available;
. the behavior is too complex; and
. the conventional constitutive models are unable to effectively describe
various aspects of the behavior.
References
Adeli, H. (2001), “Neural networks in civil engineering: 1989-2000”, Computer-Aided Civil and
Infrastructure Engineering, Vol. 16 No. 2, pp. 126-42.
Ahmed, A.A., Ali, H.A., ElAraby, S.M., ElKateb, M. and Noureldin, S.M. (2008),
“Non-deterministic tunneling analysis using AI based techniques genetic programming
vs ANNs”, paper presented at 12th International Colloquium on Structural and
Geotechnical Engineering (ICSGE), Cairo.
Alavi, A.H. and Gandomi, A.H. (2010), “Energy-based numerical correlations for soil liquefaction
assessment”, Computers and Geotechnics, 8 July.
Alavi, A.H., Gandomi, A.H. and Heshmati, A.A.R. (2010a), “Discussion on soft computing
approach for real-time estimation of missing wave heights”, Ocean Engineering, Vol. 37
No. 13.
Alavi, A.H., Gandomi, A.H., Gandomi, M. and Sadat Hosseini, S.S. (2009), “Prediction of
maximum dry density and optimum moisture content of stabilized soil using RBF neural
networks”, The IES Journal Part A: Civil & Structural Engineering, Vol. 2 No. 2, pp. 98-106.
Alavi, A.H., Gandomi, A.H., Sahab, M.G. and Gandomi, M. (2010b), “Multi expression
programming: a new approach to formulation of soil classification”, Engineering with
Computers, Vol. 26 No. 2, pp. 111-18.
Alavi, A.H., Gandomi, A.H., Mollahasani, A., Heshmati, A.A.R. and Rashed, A. (2010c),
“Modeling of maximum dry density and optimum moisture content of stabilized soil using
artificial neural networks”, Journal of Plant Nutrition and Soil Science, Vol. 173 No. 3.
Alavi, A.H., Heshmati, A.A.R., Gandomi, A.H., Askarinejad, A. and Mirjalili, M. (2008),
“Utilisation of computational intelligence techniques for stabilised soil”, in Papadrakakis,
M. and Topping, B.H.V. (Eds), Engineering Computational Technology, Civil-Comp Press,
Edinburgh, paper 175.
Ambrozic, T. and Turk, G. (2003), “Prediction of subsidence due to underground mining by
artificial neural networks”, Computers & Geosciences, Vol. 29, pp. 627-37.
ANCLDI (1991), Guidelines on Concrete-faced Rockfill Dams, Australian National Committee on
Large Dams Incorporated, Yichang.
Atkinson, J.H. and Potts, D.M. (1979), “Subsidence above shallow tunnels in soft ground”,
Journal of Geotechnical and Geoenvironmental Engineering (ASCE), Vol. 103 No. 4,
pp. 307-25.
Attewell, P.B. and Farmer, I.W. (1974), “Ground deformations resulting from shield tunneling
in London clay”, Canadian Geotechnical Journal, Vol. 11, pp. 380-95.
Banzhaf, W., Nordin, P., Keller, R. and Francone, F. (1998), Genetic Programming –
An Introduction. On the Automatic Evolution of Computer Programs and its Application,
dpunkt/Morgan Kaufmann, San Francisco, CA.
Baykasoglu, A., Çevik, A., Özbakır, L. and Sinem, K. (2009), “Generating prediction
rules for liquefaction through data mining”, Expert Systems with Applications, Vol. 36
No. 10.
Baykasoglu, A., Gullub, H., Canakcı, H. and Ozbakır, L. (2008), “Prediction of compressive and
tensile strength of limestone via genetic programming”, Expert Systems with Applications,
Vol. 35 Nos 1/2, pp. 111-23.
BCD (2000), Highlights of Brazilian Dam Engineering, Brazilian Committee on Dams, Sao Paulo.
Brameier, M. and Banzhaf, W. (2001), “A comparison of linear genetic programming and neural
networks in medical data mining”, IEEE Transactions on Evolutionary Computation, Vol. 5
No. 1, pp. 17-26.
Brameier, M. and Banzhaf, W. (2007), Linear Genetic Programming, Springer Science+Business
Media LLC, New York, NY.
Cabalar, A.F. and Cevik, A. (2009), “Genetic programming-based attenuation relationships:
an application of recent earthquakes in Turkey”, Computers & Geosciences, Vol. 35,
pp. 1884-96.
Cevik, A. and Cabalar, A.F. (2009), “Modelling damping ratio and shear modulus of sand-mica
mixtures using genetic programming”, Expert Systems with Applications, Vol. 36 No. 4,
pp. 7749-57.
Clements, R.P. (1984), “Post-construction deformation of rockfill dams”, Journal of Geotechnical
Engineering (ASCE), Vol. 110 No. 7, pp. 821-40.
Clough, W. and Schmidt, B. (1981), “Design and performance of excavations and tunnels in
softclay”, Soft Clay Engineering, Elsevier, Amsterdam, pp. 100-4.
Conrads, M., Dolezal, O., Francone, F.D. and Nordin, P. (2001), Discipulus – Fast Genetic
Programming based on AIM Learning Technology, Register Machine Learning
Technologies, Littleton, CO.
Cooke, J.B. (1984), “Progress in rockfill dams”, Journal of Geotechnical Engineering (ASCE),
Vol. 110 No. 10, pp. 821-40.
Cramer, N.L. (1985), “A representation for the adaptive generation of simple sequential
programs”, Proceedings of the International Conference on Genetic Algorithms and Their
Applications, Hillsdale, NJ, July, pp. 183-7.
Cui, L. and Sheng, D. (2005), “Genetic algorithms in probabilistic finite element analysis
of geotechnical problems”, Computers and Geotechnics, Vol. 32 No. 8, pp. 555-63.
Darve, F. (1996), “Liquefaction phenomenon of granular materials and constitutive instability”,
Engineering Computations, Vol. 13 No. 7, pp. 5-28.
Desai, C.S. and Siriwardane, H.J. (1984), Constitutive Laws for Engineering Materials:
with Emphasis on Geologic Materials, Prentice-Hall, NJ.
Dobry, R., Ladd, R.S., Yokel, F.Y., Chung, R.M. and Powell, D. (1982), “Prediction of pore water
pressure buildup and liquefaction of sands during earthquakes by the cyclic strain
method”, Building Science Series, Vol. 138, National Bureau of Standards, US Department
of Commerce, US Governmental Printing Office, Washington, DC.
Duncan, J.M. (1996), “State of the art: limit equilibrium and finite element analysis of slopes”,
Journal of Geotechnical Engineering (ASCE), Vol. 122, pp. 577-96.
Ermini, L., Catani, F. and Casagli, N. (2005), “Artificial neural networks applied to landslide
susceptibility assessment”, Geomorphology, Vol. 66 Nos 1-4, pp. 327-43.
Ferreira, C. (2001), “Gene expression programming: a new adaptive algorithm for solving
problems”, Complex Systems, Vol. 13 No. 2, pp. 87-129.
Ferreira, C. (2006), Gene Expression Programming: Mathematical Modeling by an Artificial
Intelligence, 2nd ed., Springer, Heidelberg.
Francone, F.D. and Deschaine, L.M. (2004), “Extending the boundaries of design optimization by
integrating fast optimization techniques with machine-code-based, linear genetic
programming”, Information Sciences, Vol. 161, pp. 99-120.
Frank, I.E. and Todeschini, R. (1994), The Data Analysis Handbook, Elsevier, Amsterdam.
Friedberg, R.M. (1958), “A learning machine: Part I”, IBM Journal of Research and Development,
Vol. 2, pp. 2-13.
Gandomi, A.H., Alavi, A.H., Mirzahosseini, M.R. and Moghadas Nejad, F. (2010), “Nonlinear
genetic-based models for prediction of flow number of asphalt mixtures”, Journal of
Materials in Civil Engineering (ASCE), Vol. 23 No. 3, pp. 1-18.
GEPSOFT (2006), GeneXproTools Owner’s Manual, Version 4.0, available at: http://gepsoft.com/
Goh, A.T.C. (1994), “Seismic liquefaction potential assessed by neural networks”, Journal of
Geotechnical Engineering (ASCE), Vol. 120 No. 9, pp. 1467-80.
Goh, A.T.C. (1999), “Genetic algorithm search for critical slip surface in multiple-wedge stability
analysis”, Canadian Geotechnical Journal, Vol. 36 No. 2, pp. 382-91.
Goh, A.T.C. and Goh, S.H. (2007), “Support vector machines: their use in geotechnical engineering
as illustrated using seismic liquefaction data”, Computers and Geotechnics, Vol. 34,
pp. 410-21.
Hanna, A.M., Ural, D. and Saygili, G. (2007), “Evaluation of liquefaction potential of soil deposits
using artificial neural networks”, Engineering Computations, Vol. 24 No. 1, pp. 5-16.
Hashash, Y.M.A., Levasseurb, S., Osoulia, A., Finno, R. and Malecot, Y. (2010), “Comparison of
two inverse analysis techniques for learning deep excavation response”, Computers and
Geotechnics, Vol. 37 No. 3, pp. 323-33.
Hunter, G.J. (2003), “The pre- and post-failure deformation behavior of soil slopes”, PhD thesis,
University of New South Wales, Sydney.
Hunter, G.J. and Fell, R. (2003), “Rockfill modulus and settlement of concrete face rockfill dams”,
Journal of Geotechnical and Geoenvironmental Engineering (ASCE), Vol. 129 No. 10,
pp. 909-17.
Javadi, A.A. (2006), “Estimation of air losses in compressed air tunneling using neural network”,
Journal of Tunnelling and Underground Space Technology, Vol. 21 No. 1, pp. 9-20.
Javadi, A.A. and Rezania, M. (2009), “Applications of artificial intelligence and data mining
techniques in soil modeling”, Geomechanics and Engineering, Vol. 1 No. 1, pp. 53-74.
Javadi, A.A., Rezani, M. and Mousavi Nezhad, M. (2006), “Evaluation of liquefaction induced
lateral displacements using genetic programming”, Computers and Geotechnics, Vol. 33
Nos 4/5, pp. 222-33.
Javadi, A.A., Tan, T.P. and Elkassas, A.S.I. (2005), “Intelligent finite element method”, paper
presented at the 3rd MIT Conference on Computational Fluid and Solid Mechanics,
Cambridge, MA.
Johari, A., Habibagahi, G. and Ghahramani, A. (2006), “Prediction of soil-water characteristic
curve using genetic programming”, Journal of Geotechnical and Geoenvironmental
Engineering (ASCE), Vol. 132 No. 5, pp. 661-5.
Juang, C.H., Jiang, T. and Christopher, R.A. (2001), “Three-dimensional site characterisation:
neural network approach”, Geotechnique, Vol. 51 No. 9, pp. 799-809.
Juang, C.H., Yuan, H., Lee, D. and Lin, P. (2003), “Simplified cone penetration test-based method
for evaluating liquefaction resistance of soils”, Journal of Geotechnical and
Geoenvironmental Engineering (ASCE), Vol. 129 No. 11, pp. 66-80.
Kayadelen, C., Günaydın, O., Fener, M., Demir, A. and Özvan, A. (2009), “Modeling of the angle of
shearing resistance of soils using soft computing systems”, Expert Systems with
Applications, Vol. 36, pp. 11814-26.
Kim, Y.S. and Kim, B.T. (2008), “Prediction of relative crest settlement of concrete-faced rockfill
dams analyzed using an artificial neural network model”, Computers and Geotechnics,
Vol. 35, pp. 313-22.
Koza, J.R. (1992), Genetic Programming, on the Programming of Computers by Means of Natural
Selection, MIT Press, Cambridge, MA.
Kraslawski, A., Pedrycz, W. and Nyström, L. (1999), “Fuzzy neural network as instance generator
for case-based reasoning system: an example of selection of heat exchange equipment in
mixing”, Neural Computing & Applications, Vol. 8 No. 2, pp. 106-13.
Levasseur, S., Malécot, Y., Boulon, M. and Flavigny, E. (2007), “Soil parameter identification
using a genetic algorithm”, International Journal for Numerical and Analytical Methods in
Geomechanics, Vol. 32 No. 2, pp. 189-213.
Levasseur, S., Malécot, Y., Boulon, M. and Flavigny, E. (2009), “Statistical inverse analysis based
on genetic algorithm and principal component analysis: method and developments using
synthetic data”, International Journal for Numerical and Analytical Methods in
Geomechanics, Vol. 33 No. 12, pp. 1485-511.
Li, W., Dai, L., Hou, X. and Lei, W. (2007), “Fuzzy genetic programming method for analysis
of ground movements due to underground mining, technical note”, International Journal of
Rock Mechanics and Mining Sciences, Vol. 44, pp. 954-61.
Li, W., Mei, S., Zhai, S., Zhao, S. and Liang, X. (2006), “Fuzzy models for analysis of rock mass
displacements due to underground mining in mountainous areas”, International Journal of
Rock Mechanics and Mining Sciences, Vol. 43, pp. 503-11.
Liang, L. (2005), “Development of an energy method for evaluating the liquefaction potential of
a soil deposit”, PhD dissertation, Department of Civil Engineering, Case Western Reserve
University, Cleveland, OH.
Liu, F.M., Chen, Y.B., Liu, J. and Ni, Y.L. (1993), “Construction materials selection and
characteristics of Wan An Xi concrete faced rockfill dam”, High Earth-Rockfill Dams,
Beijing, Vol. 1, pp. 272-85.
McCombie, P. and Wilkinson, P. (2002), “The use of the simple genetic algorithm in finding
the critical factor of safety in slope stability analysis”, Computers and Geotechnics, Vol. 29
No. 8, pp. 699-714.
Majdi, A. and Beiki, M. (2009), “Evolving neural network using a genetic algorithm for predicting
the deformation modulus of rock masses”, International Journal of Rock Mechanics and
Mining Sciences, Vol. 47 No. 2, pp. 246-53.
Masters, T. (1993), Practical Neural Network Recipes in C++, Academic Press, San Diego, CA.
Miller, J. and Thomson, P. (2002), “Cartesian genetic programming”, in Poli, R., Banzhaf, W.,
Langdon, B., Miller, J., Nordin, P. and Fogarty, T.C. (Eds), Genetic Programming, Springer,
Berlin.
Narendra, B.S., Sivapullaiah, P.V., Suresh, S. and Omkar, S.N. (2006), “Prediction of unconfined
compressive strength of soft grounds using computational intelligence techniques:
a comparative study”, Computers and Geotechnics, Vol. 33, pp. 196-208.
Nash, D. (1987), “A comparative review of limit equilibrium methods of stability analysis”,
in Anderson, M.G. and Richards, K.S. (Eds), Slope Stability for Geotechnical Engineering
and Geomorphology, Wiley, New York, NY, pp. 11-75.
NCEER (1997), “Evaluation of liquefaction resistance of soils”, in Youd, T.L. and Idriss, I.M.
(Eds), Technical Report NCEER-97-0022, National Center for Earthquake Engineering
Research, State University of New York, New York, NY.
Neaupane, K.M. and Achet, S.H. (2004), “Use of backpropagation neural network for landslide
monitoring: a case study in the higher Himalaya”, Engineering Geology, Vol. 74, pp. 213-26.
Neaupane, K.M. and Adhikari, N.R. (2006), “Prediction of tunneling-induced ground movement
with the multi-layer perceptron”, Tunnelling and Underground Space Technology, Vol. 21,
pp. 151-9.
NRC (1985), Liquefaction of Soils During Earthquakes, Committee on Earthquake Engineering,
Commission on Engineering and Technical Systems, National Research Council, National
Academy Press, Washington, DC, p. 240.
Oltean, M. (2004), “Multi expression programming source code”, available at: http://mep.cs.
ubbcluj.ro/
Oltean, M. and Dumitrescu, D. (2002), “Multi expression programming”, Technical Report,
UBB-01-2002, Babeş-Bolyai University, Cluj-Napoca.
Oltean, M. and Groşan, C. (2003a), “A comparison of several linear genetic programming
techniques”, Advances in Complex Systems, Vol. 14 No. 4, pp. 1-29.
Oltean, M. and Groşan, C. (2003b), “Evolving evolutionary algorithms using multi
expression programming”, Artificial Life, LNAI 2801, Springer, Berlin, pp. 651-8.
Oltean, M. and Groşan, C. (2003c), “Solving classification problems using infix form genetic
programming”, in Berthold, M. (Ed.), Intelligent Data Analysis, LNCS 2810, Springer,
Berlin, pp. 242-52.
Pal, M. (2006), “Support vector machines-based modelling of seismic liquefaction potential”,
International Journal for Numerical and Analytical Methods in Geomechanics, Vol. 30,
pp. 983-96.
Pal, S., Wije Wathugala, G. and Kundu, S. (1996), “Calibration of a constitutive model using
genetic algorithms”, Computers and Geotechnics, Vol. 19 No. 4, pp. 325-48.
Patterson, N. (2002), “Genetic programming with context-sensitive grammars”, PhD thesis,
School of Computer Science, University of Scotland, London.
Peck, R.B. (1969), “Deep excavation and tunneling in soft ground”, Proceedings of the International
Conference in Soil Mechanics and Foundation Engineering, Mexico City, pp. 225-90.
Poli, R., Langdon, W.B., McPhee, N.F. and Koza, J.R. (2007), “Genetic programming:
an introductory tutorial and a survey of techniques and applications”, Technical Report
[CES-475], University of Essex, Colchester.
Rezania, M. and Javadi, A.A. (2007), “A new genetic programming model for predicting settlement of
shallow foundations”, Canadian Geotechnical Journal, Vol. 44 No. 12, pp. 1462-73.
Rumelhart, D.E. and McClelland, J. (1986), Parallel Distributed Processing: Explorations in
Microstructure of Cognition, Massachusetts Institute of Technology Press, Cambridge,
MA, pp. 11-30.
Seed, H.B. and Idriss, I.M. (1971), “Simplified procedure for evaluating soil liquefaction
potential”, Journal of the Soil Mechanics and Foundations Division (ASCE), Vol. 97, SM8,
pp. 1249-74.
Seed, R.B., Cetin, K.O., Moss, R.E.S., Kammerer, A., Wu, J., Pestana, J.M., Riemer, M.F.,
Sancio, R.B., Bray, J.D., Kayen, R.E. and Faris, A. (2003), “Recent advances in soil
liquefaction engineering: a unified and consistent framework”, Keynote Address,
26th Annual Geotechnical Spring Seminar, Los Angeles Section of the GeoInstitute,
Los Angeles, CA.
Shahin, M.A., Jaksa, M.B. and Maier, H.R. (2008), “State of the art of artificial neural networks in
geotechnical engineering”, Electronic Journal of Geotechnical Engineering, Vol. 8, pp. 1-26,
available at: www.ejge.com/Bouquet08/shahin
Shahin, M.A., Jaksa, M.B. and Maier, H.R. (2009), “Recent advances and future challenges for
artificial neural systems in geotechnical engineering applications”, Advances in Artificial
Neural Systems, Vol. 2009, p. 9.
Shahin, M.A., Maier, H.R. and Jaksa, M.B. (2001), “Artificial neural network applications
in geotechnical engineering”, Australian Geomechanics, Vol. 36 No. 1, pp. 49-62.
Shin, H.S. and Pande, G.N. (2000), “On self-learning finite element code based on monitored
response of structures”, Computers and Geotechnics, Vol. 27, pp. 161-78.
Simpson, A.R. and Priest, S.D. (1993), “The application of genetic algorithms to optimisation
problems in geotechnics”, Computers and Geotechnics, Vol. 15 No. 1, pp. 1-19.
Thongyot, T. (1995), “Ground movement associated with 11 km water transmission bored tunnel
in Bangkok subsoil”, Masters thesis (GE-95-7), Asian Institute of Technology (AIT),
Thailand.
Torres, R.S., Falcão, A.X., Gonçalves, M.A., Papa, J.P., Zhang, B., Fan, W. and Fox, E.A. (2009),
“A genetic programming framework for content-based image retrieval”, Pattern
Recognition, Vol. 42 No. 2, pp. 283-92.
Wang, H.B., Xu, W.Y. and Xu, R.C. (2005), “Slope stability evaluation using back propagation
neural networks”, Engineering Geology, Vol. 80, pp. 302-15.
Whitman, R.V. (1971), “Resistance of soil to liquefaction and settlement”, Soils and Foundations,
Vol. 11 No. 4, pp. 59-68.
Yang, C.X., Tham, L.G., Feng, X.T., Wang, Y.J. and Lee, P.K.K. (2004), “Two-stepped
evolutionary algorithm and its application to stability analysis of slopes”, Journal of
Computing in Civil Engineering (ASCE), Vol. 18 No. 2, pp. 145-53.
Yoshikoshi, W., Osamu, W. and Takagaki, N. (1978), “Prediction of ground settlements
associated with shield tunneling”, Soils and Foundations, Vol. 18 No. 4, pp. 47-59.
Corresponding author
Amir Hossein Alavi can be contacted at: ah_alavi@hotmail.com