www.emeraldinsight.com/0264-4401.htm
EC 28,3

A robust data mining approach for formulation of geotechnical engineering systems
Amir Hossein Alavi
College of Civil Engineering, Iran University of Science and Technology,
Tehran, Iran, and
Amir Hossein Gandomi
College of Civil Engineering, Tafresh University, Tafresh, Iran
Abstract
Purpose – The complexity of analysis of geotechnical behavior is due to multivariable dependencies
of soil and rock responses. In order to cope with this complex behavior, traditional forms of
engineering design solutions are reasonably simplified. Incorporating simplifying assumptions into
the development of the traditional models may lead to very large errors. The purpose of this paper is to
illustrate capabilities of promising variants of genetic programming (GP), namely linear genetic
programming (LGP), gene expression programming (GEP), and multi-expression programming (MEP)
by applying them to the formulation of several complex geotechnical engineering problems.
Design/methodology/approach – LGP, GEP, and MEP are new variants of GP that make a clear
distinction between the genotype and the phenotype of an individual. Compared with the traditional GP, the
LGP, GEP, and MEP techniques are more compatible with computer architectures. This results in a
significant speedup in their execution. These methods have a great ability to directly capture the knowledge
contained in the experimental data without making assumptions about the underlying rules governing the
system. This is one of their major advantages over most of the traditional constitutive modeling methods.
Findings – In order to demonstrate the simulation capabilities of LGP, GEP, and MEP, they were
applied to the prediction of: relative crest settlement of concrete-faced rockfill dams; slope stability;
settlement around tunnels; and soil liquefaction. The results are compared with those obtained by
other models presented in the literature and found to be more accurate. LGP has the best overall
behavior for the analysis of the considered problems in comparison with GEP and MEP. The simple
and straightforward constitutive models developed using LGP, GEP and MEP provide valuable
analysis tools accessible to practicing engineers.
Originality/value – The LGP, GEP, and MEP approaches overcome the shortcomings of different
methods previously presented in the literature for the analysis of geotechnical engineering systems.
Contrary to artificial neural networks and many other soft computing tools, LGP, GEP, and MEP
provide prediction equations that can readily be used for routine design practice. The constitutive
models derived using these methods can efficiently be incorporated into the finite element or finite
difference analyses as material models. They may also be used as a quick check on solutions
developed by more time consuming and in-depth deterministic analyses.
Keywords Data collection, Geotechnical engineering, Programming and algorithm theory,
Systems analysis
Paper type Research paper
Engineering Computations: International Journal for Computer-Aided Engineering and Software, Vol. 28 No. 3, 2011, pp. 242-274. © Emerald Group Publishing Limited, 0264-4401. DOI 10.1108/02644401111118132

Introduction
In contrast with other civil engineering problems, many geotechnical engineering systems lack a precise analytical theory or model for their solutions. This is usually because of an inadequate understanding of the phenomena involved and the factors affecting them, as well as the limited quantity and poor quality of the available information. In order to cope with the complexity of geotechnical engineering problems, traditional forms of engineering design solutions have been widely developed. The information has usually been collected, synthesized and presented in the form of design charts, tables or empirical formulae (Shahin et al., 2001). The most commonly used regression analyses can have large uncertainties. Regression analysis has major drawbacks pertaining to the idealization of complex processes, approximation, and the averaging of widely
varying prototype conditions. In regression analyses, the nature of the corresponding
problem is modeled by a pre-defined linear or nonlinear equation. Another major
constraint in application of regression analysis is the assumption of normality of
residuals. The simulation capability of the classical constitutive modeling is also
limited for reasons pertaining to the formulation complexity, idealization of material
behavior, and excessive empirical parameters (Adeli, 2001).
Several computer-aided data mining approaches have emerged with developments in computational software and hardware. A pattern recognition system, for example, learns adaptively from experience and extracts various discriminators.
Artificial neural networks (ANNs) are the most widely used pattern recognition procedures. They emerged as a result of simulating the biological nervous system.
ANNs have extensively been used to capture the nonlinear interactions between various
parameters in geotechnical engineering systems (Juang et al., 2001; Javadi, 2006; Alavi
et al., 2009, 2010c). An overview of the ANN applications in geotechnical engineering and
current research directions of this approach have been recently presented by Shahin et al.
(2008, 2009) and Javadi and Rezania (2009). Despite the acceptable performance of ANNs
in most cases, they do not give a definite function to calculate the outcome using the
input values. Hence, they do not provide a better understanding of the nature of the
derived relationships. The ANN approach is mostly appropriate to be used as a part of a
computer program. However, more robust tools are required to assess the behavior of
geotechnical engineering problems due to their nonlinearity and complexity.
Genetic algorithm (GA) is a powerful stochastic search and optimization method
based on the principles of genetics and natural selection. GA has been shown to be
suitably robust for a wide variety of complex geotechnical problems (Simpson and
Priest, 1993; Pal et al., 1996; Goh, 1999; McCombie and Wilkinson, 2002; Cui and Sheng,
2005; Levasseur et al., 2007, 2009; Majdi and Beiki, 2009; Hashash et al., 2010). Genetic
programming (GP) (Koza, 1992; Banzhaf et al., 1998) is an alternative approach for the
behavior modeling of geotechnical engineering tasks. GP is a developing subarea of
evolutionary algorithms inspired from Darwin’s evolution theory. It may generally be
defined as a specialization of GA where the solutions are computer programs rather
than fixed-length binary strings. The main advantage of GP over the conventional
statistical methods and other soft computing tools is its ability to generate prediction
equations without assuming prior form of the existing relationship. The developed
equations can be easily manipulated in practical circumstances. In contrast with ANNs
and GA, application of GP in the field of civil engineering is quite new and original.
The classical GP technique has been recently used to derive greatly simplified
formulas for geotechnical engineering problems (Yang et al., 2004; Johari et al., 2006;
Javadi et al., 2006). Recent studies have also shown that GP and its variants possess
obvious superiority over ANNs in dealing with geotechnical problems (Narendra et al.,
2006; Rezania and Javadi, 2007; Kayadelen et al., 2009; Alavi et al., 2010b).
Linear genetic programming (LGP) (Brameier and Banzhaf, 2007) is a new subset of GP
with a linear structure similar to the DNA molecule in biological genomes. LGP is a
machine learning approach that uses sequences of imperative instructions as genetic
material. More specifically, LGP operates on programs that are represented as linear
sequences of instructions of an imperative programming language (Brameier and
Banzhaf, 2001, 2007). Gene expression programming (GEP) (Ferreira, 2001) is another
recent extension to GP that evolves computer programs of different sizes and shapes
encoded in linear chromosomes of fixed length. The GEP chromosomes are composed of
multiple genes, each gene encoding a smaller subprogram. Multi-expression programming
(MEP) (Oltean and Dumitrescu, 2002) is also a linear variant of GP that uses a linear
representation of chromosomes. MEP has a special ability to encode multiple computer
programs of a problem in a single chromosome. Based on numerical experiments, the LGP, GEP, and MEP approaches are able to significantly outperform similar techniques (Oltean and Groşan, 2003a; Brameier and Banzhaf, 2007). Some of the limited scientific efforts
directed at applying LGP, GEP, and MEP to geotechnical engineering tasks include
performance characteristics modeling of stabilized soil (Alavi et al., 2008), prediction of
compressive and tensile strength of limestone (Baykasoglu et al., 2008), prediction of peak
ground acceleration (Cabalar and Cevik, 2009), modeling damping ratio and shear
modulus of sand (Cevik and Cabalar, 2009), formulation of soil classification (Alavi et al.,
2010b), and soil liquefaction assessment (Alavi and Gandomi, 2010).
This study investigates the potential of LGP, GEP, and MEP in simulating the
nonlinear complex behavior of geotechnical engineering systems. In order to demonstrate
the formulation capabilities of LGP, GEP, and MEP, these techniques were applied to four
practical examples of geotechnical engineering. The obtained results were further
compared with those provided by the existing models in the literature. The LGP, GEP, and
MEP models were developed based on reliable experimental results collected through an
extensive literature review.
Genetic programming
GP is a symbolic optimization technique that creates computer programs to solve a
problem using the principle of Darwinian natural selection (Koza, 1992). Friedberg
(1958) left the first footprints in the area of GP by using a learning algorithm to
stepwise improve a program. Much later, Cramer (1985) applied GAs and tree-like
structures to evolve programs. The breakthrough in GP then came in the late 1980s
with the experiments of Koza (1992) on symbolic regression. GP was introduced by
Koza (1992) as an extension of GA. Most of the genetic operators used in GA can also
be implemented in GP with minor changes. The main difference between GP and GA is
the representation of the solution. The GP solutions are computer programs that are
represented as tree structures and expressed in a functional programming language
(like LISP) (Koza, 1992). GA creates a string of numbers that represent the solution.
In other words, in GP, the evolving programs (individuals) are parse trees that can vary in length throughout the run, rather than fixed-length binary strings. Essentially,
this is the beginning of computer programs that program themselves (Koza, 1992).
Since GP often evolves computer programs, the solutions can be executed without
post-processing, while coded binary strings typically evolved by GA require
post-processing (Ahmed et al., 2007). The traditional optimization techniques, like GA,
are generally used in parameter optimization to evolve the best values for a given
set of model parameters. GP, on the other hand, gives the basic structure of the Geotechnical
approximation model together with the values of its parameters (Javadi and Rezania, engineering
2009). GP optimizes a population of computer programs according to a fitness
landscape determined by a program ability to perform a given computational task. systems
The fitness of each program in the population is evaluated using a fitness function.
Thus, the fitness function is the objective function GP aims to optimize (Torres et al.,
2009). That is to say, the fitness function is a particular type of objective function that 245
prescribes the optimality of a solution (computer program) evolved by GP and ranks
the program against all the other generated programs.
This classical GP approach is referred to as tree-based GP. A population member in
tree-based GP is a hierarchically structured tree comprising functions and terminals. The
functions and terminals are selected from a set of functions and terminals. For example,
the function set F can contain the basic arithmetic operations (+, −, ×, /, etc.), Boolean logic functions (AND, OR, NOT, etc.), or any other mathematical functions. The terminal set T contains the arguments for the functions and can consist of numerical constants, logical constants, variables, etc. The functions and terminals are chosen at random and constructed together to form a computer model in a tree-like structure, with a root point and branches extending from each function and ending in a terminal. An example of a
simple tree representation of a GP model is shown in Figure 1. In addition to the traditional
tree-based GP, there are other types of GP where programs are shown in different ways
(Figure 2). These are linear and graph-based GP (Banzhaf et al., 1998). The emphasis of this
study is placed on the linear-based GP techniques.
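As a concrete illustration of the tree representation in Figure 1, the model √(X1 + 3/X2) can be encoded as nested tuples and evaluated recursively. The sketch below uses illustrative names of its own; it does not reproduce any particular GP package:

```python
import math

# A GP individual as nested tuples: (function, child, ...) or a terminal;
# terminals are variable names (strings) or numeric constants.
tree = ("sqrt", ("+", "X1", ("/", 3, "X2")))  # the Figure 1 model: sqrt(X1 + 3/X2)

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "sqrt": math.sqrt,
}

def evaluate(node, variables):
    # Recursively evaluate a GP tree for a given assignment of the terminals.
    if isinstance(node, tuple):   # functional node
        return FUNCTIONS[node[0]](*(evaluate(c, variables) for c in node[1:]))
    if isinstance(node, str):     # variable terminal
        return variables[node]
    return node                   # numeric constant
```

Crossover and mutation then amount to swapping or regenerating subtrees of such structures.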
[Figure 1. Tree representation of a GP model: √(X1 + 3/X2), with a root node, functional nodes and terminal nodes]
[Figure 2. Different types of genetic programming: tree-based GP, linear-based GP and graph-based GP]

Linear-based GP
There are several reasons for using linear GP. Basic computer architectures are fundamentally the same now as they were 20 years ago, when GP began. Almost all architectures represent computer programs in a linear fashion. Also, computers do not
naturally run tree-shaped programs. Hence, slow interpreters have to be used as part of tree-based GP. Conversely, by evolving the binary bit patterns actually obeyed by the computer, the use of an expensive interpreter (or compiler) is avoided and GP can run several orders of magnitude faster (Poli et al., 2007). Several linear variants of GP have recently been proposed. Some of them are (Oltean and Groşan, 2003a): LGP (Brameier and Banzhaf, 2007), GEP (Ferreira, 2001), MEP (Oltean and Dumitrescu, 2002), Cartesian genetic programming (Miller and Thomson, 2002), GA for deriving software (Patterson, 2002) and infix form genetic programming (Oltean and Groşan, 2003c). LGP, GEP and MEP are the most common linear-based GP methods. These variants make a clear distinction between the genotype and the phenotype of an individual. The individuals in these variants are represented as linear strings (Oltean and Groşan, 2003a).
[Figure 3. Comparison of the GP program structures (after Alavi et al. (2010c)): (a) the LGP program
f[0] = 0;
L0: f[0] += v[0];
L1: f[0] -= -5;
L2: f[0] /= v[2];
return f[0];
and (b) its tree-based GP equivalent, both encoding y = f[0] = (v[0] - (-5))/v[2]]
In LGP, instructions are restricted to operations that accept a minimum number of constants or memory variables, called registers (r), and assign the result to a destination register, e.g. r0 := r1 + 1. A part of a linear genetic program in C code is represented as follows (Brameier and Banzhaf, 2007):
void LGP (double r[5])
{ ...
r[0] = r[5] + 70;
r[5] = r[0] - 50;
if (r[1] > 0)
if (r[5] > 2)
r[4] = r[2] * r[1];
r[2] = r[5] + r[4];
r[0] = sin(r[2]);
}
where register r[0] holds the final program output. LGPs can be converted into a functional representation by successive replacements of variables, starting with the last effective instruction (Oltean and Groşan, 2003a). Automatic induction of machine code by genetic
programming (AIMGP) is a particular form of LGP. In AIMGP, evolved programs are
stored as linear strings of native binary machine code and are directly executed by the
processor during fitness calculation. The absence of an interpreter and complex memory
handling results in a significant speedup in the AIMGP execution compared to tree-based
GP. This machine-code-based LGP approach searches for the computer program and the
constants at the same time. Here are the steps the machine-code-based LGP follows for a
single run (Francone and Deschaine, 2004; Brameier and Banzhaf, 2007):
(1) Initializing a population of randomly generated programs and calculating their
fitness values.
(2) Running a tournament. In this step, four programs are selected from the population at random. They are compared and, based on their fitness values, two programs are picked as the winners and two as the losers.
(3) Transforming the winner programs. The two winner programs are copied and transformed probabilistically as follows:
• parts of the winner programs are exchanged with each other to create two new programs (crossover); and/or
• each of the tournament winners is changed randomly to create two new programs (mutation).
(4) Replacing the loser programs in the tournament with the transformed winner
programs. The winners of the tournament remain without change.
(5) Repeating steps (2) through (4) until convergence.
Comprehensive descriptions of the basic parameters used to direct a search for a linear
genetic program can be found in Brameier and Banzhaf (2007).
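Steps (1)-(5) can be sketched in Python under stated assumptions: a toy instruction set acting on a single register and an error-sum fitness. None of this reproduces Discipulus or the paper's actual instruction format:

```python
import random

random.seed(0)

# Toy setting: a "program" is a list of (operator, constant) instructions
# applied to one working register; fitness is the total error against a
# target function. All names here are illustrative.
TARGET = lambda x: 2 * x + 5
CASES = [(x, TARGET(x)) for x in range(-5, 6)]
OPS = ["+", "-", "*"]

def random_program(length=6):
    return [(random.choice(OPS), random.uniform(-2, 2)) for _ in range(length)]

def run(program, x):
    r = x  # single working register initialised with the input
    for op, const in program:
        if op == "+": r += const
        elif op == "-": r -= const
        else: r *= const
    return r

def fitness(program):  # lower is better
    return sum(abs(run(program, x) - y) for x, y in CASES)

# (1) initialise a random population and evaluate it
population = [random_program() for _ in range(50)]

for _ in range(200):
    # (2) run a tournament: four random programs, two winners, two losers
    contenders = random.sample(range(len(population)), 4)
    ranked = sorted(contenders, key=lambda i: fitness(population[i]))
    winners, losers = ranked[:2], ranked[2:]
    # (3) transform copies of the winners by crossover and mutation
    a, b = [list(population[i]) for i in winners]
    cut = random.randrange(1, len(a))
    a[cut:], b[cut:] = b[cut:], a[cut:]            # crossover
    for child in (a, b):                           # mutation
        i = random.randrange(len(child))
        child[i] = (random.choice(OPS), random.uniform(-2, 2))
    # (4) replace the losers with the transformed winners
    population[losers[0]], population[losers[1]] = a, b
    # (5) the loop repeats steps (2)-(4) until convergence

best = min(population, key=fitness)
```

The steady-state character of the method is visible here: only the tournament losers are ever replaced, so good programs persist across many tournaments.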
Gene expression programming

where x1, x2 and x3 are variables and 3 is a constant; "." is the element separator, included for ease of reading. The above expression is termed the Karva notation, or K-expression (Ferreira, 2006). A K-expression can be represented by a diagram known as an expression tree (ET). For example, the above sample gene can be shown as Figure 4.
The conversion starts from the first position in the K-expression, which corresponds
to the root of the ET, and reads through the string one by one. The above GEP gene can
also be expressed in a mathematical form as:
X1((X1 + 3) − (X2 × X3)) + √(X2 + X1)   (2)
An ET can inversely be converted into a K-expression by recording the nodes from left to right in each layer of the ET, from the root layer down to the deepest one, to form the string. As previously mentioned, GEP genes have a fixed length, which is predetermined for a given problem. Thus, what varies in GEP is not the length of the genes but the size of the corresponding ETs. This means that a certain number of redundant elements may exist that are not used in the genome mapping. Hence, the valid length of a K-expression may be equal to or less than the length of the GEP gene. To guarantee the
validity of a randomly selected genome, GEP employs a head-tail method. Each GEP gene is composed of a head and a tail. The head may contain both function and terminal symbols, whereas the tail may contain terminal symbols only.

[Figure 4. Example of ETs]

The GEP algorithm uses the following steps until a termination condition is reached (Ferreira, 2001):
(1) random generation of the fixed-length chromosome of each individual for the initial population;
(2) expressing the chromosomes as ETs and evaluating the fitness of each individual;
(3) selecting the best individuals according to their fitness to reproduce with modification; and
(4) repeating the above process for a definite number of generations or until a solution is found.
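The level-order reading that turns a K-expression into an ET, with the root first and each function consuming its arguments from the next unused symbols, can be sketched in Python. The arity table and example symbols below are illustrative (Q stands for the square root):

```python
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # Q denotes the square root

def karva_to_tree(symbols):
    # Decode a K-expression level by level: the first symbol is the root and
    # each function takes its arguments from the next unused symbols. Surplus
    # symbols are simply ignored, which is why the valid length of a
    # K-expression may be shorter than the gene itself.
    nodes = [[s] for s in symbols]
    queue, next_free = [nodes[0]], 1
    while queue:
        node = queue.pop(0)
        for _ in range(ARITY.get(node[0], 0)):
            child = nodes[next_free]
            next_free += 1
            node.append(child)
            queue.append(child)
    return nodes[0]

def tree_to_infix(node):
    # Render a decoded ET as a parenthesised expression string.
    if len(node) == 1:
        return node[0]
    if node[0] == "Q":
        return "sqrt(" + tree_to_infix(node[1]) + ")"
    return "(" + tree_to_infix(node[1]) + node[0] + tree_to_infix(node[2]) + ")"
```

For instance, the K-expression +.*.a.b.c decodes to ((b*c)+a): the root + takes * and a as arguments, and * then takes b and c.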
In GEP, the individuals are selected and copied into the next generation, according to their fitness, by roulette-wheel sampling with elitism. This guarantees the survival and
cloning of the best individual to the next generation. Variation in the population is
introduced by conducting single or several genetic operators on selected chromosomes,
which include crossover, mutation and rotation. The rotation operator rotates the two subparts of the element sequence in a genome with respect to a randomly chosen point. It can also drastically reshape the ETs. As an example, the following gene:
+.+.×.X2.X1.X3.3.X2.X3.+.×.√.X1.−
rotates the first five elements of gene (1) to the end. Only the first seven elements are used to construct the solution function (X2 + X1) + (X3 × 3), with the corresponding expression shown in Figure 5.
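The rotation operator itself is a simple cyclic shift of the fixed-length gene. A minimal Python sketch, using a hypothetical gene rather than gene (1) from the text:

```python
def rotate(gene, point):
    # GEP rotation: move the first `point` elements of the fixed-length gene
    # to its end. The decoded ET can change drastically, because a different
    # symbol becomes the root of the expression tree.
    return gene[point:] + gene[:point]

gene = list("Q+ab+cd")          # hypothetical gene, not gene (1) from the text
rotated = rotate(gene, 2)       # ['a', 'b', '+', 'c', 'd', 'Q', '+']
```

After this rotation the root symbol is the terminal a, so only one element of the gene is expressed, which illustrates how strongly rotation can reshape the decoded ET.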
Multi-expression programming
MEP is a subarea of GP developed by Oltean and Dumitrescu (2002). MEP uses linear chromosomes for solution encoding. It has a special ability to encode multiple solutions (computer programs) of a problem in a single chromosome. Based on the fitness values of the individuals, the best encoded solution is chosen to represent the chromosome. There is no increase in the complexity of the MEP decoding process compared with the other GP variants that store a single solution in a chromosome, except in situations where the set of training data is not known a priori (Oltean and Groşan, 2003a). The evolutionary steady-state MEP algorithm starts with the creation of a random population of individuals. In order to evolve the best expression from a data file of inputs and outputs along a specified number of generations, MEP uses the following steps until a termination condition is reached (Oltean and Groşan, 2003b):
(1) selecting two parents using a binary tournament procedure and recombining
them with a fixed crossover probability;
[Figure 5. Example of ETs: the tree for (X2 + X1) + (X3 × 3)]
(2) obtaining two offspring by the recombination of the two parents; and
(3) mutating the offspring and replacing the worst individual in the current population with the best of them (if the offspring is better than the worst individual in the current population).
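One steady-state generation of this kind can be sketched in Python, assuming generic crossover and mutation operators. The toy problem at the end is purely illustrative and is not a MEP encoding:

```python
import random

random.seed(1)

def binary_tournament(population, fitness):
    # Pick the better of two randomly chosen individuals (lower error wins).
    a, b = random.sample(population, 2)
    return a if fitness(a) < fitness(b) else b

def steady_state_step(population, fitness, crossover, mutate, p_crossover=0.9):
    # (1) select two parents and, with a fixed probability, recombine them
    p1 = binary_tournament(population, fitness)
    p2 = binary_tournament(population, fitness)
    if random.random() < p_crossover:
        o1, o2 = crossover(p1, p2)       # (2) two offspring by recombination
    else:
        o1, o2 = list(p1), list(p2)
    # (3) mutate the offspring; the best of them replaces the worst
    #     individual in the population, if it is better
    best_offspring = min((mutate(o1), mutate(o2)), key=fitness)
    worst = max(range(len(population)), key=lambda i: fitness(population[i]))
    if fitness(best_offspring) < fitness(population[worst]):
        population[worst] = best_offspring

# Toy demonstration: individuals are lists of numbers; the "error" is the
# distance of their sum from 10.
fitness = lambda ind: abs(sum(ind) - 10)
crossover = lambda a, b: (a[:1] + b[1:], b[:1] + a[1:])
mutate = lambda ind: [x + random.uniform(-0.1, 0.1) for x in ind]
population = [[random.uniform(0, 5) for _ in range(3)] for _ in range(20)]
for _ in range(300):
    steady_state_step(population, fitness, crossover, mutate)
best = min(population, key=fitness)
```

Because an offspring only ever replaces the worst individual, and only when it is better, the quality of the worst member of the population never deteriorates.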
MEP is represented similarly to the way in which C and Pascal compilers translate mathematical expressions into machine code. The number of MEP genes
per chromosome is constant and specifies the length of the chromosome. A terminal
(an element in the terminal set T) or a function symbol (an element in the function set F) is
encoded by each gene. A gene that encodes a function includes pointers towards the
function arguments. Function parameters always have indices of lower values than
the position of that function itself in the chromosome. The first symbol in a chromosome
must be a terminal symbol as stated by the proposed representation scheme.
An example of an MEP chromosome can be seen below. It should be noted that the numbers to the left stand for gene labels, which do not belong to the chromosome. Using the set of functions F = {+, ×, /} and the set of terminals T = {x1, x2, x3, x4}, the example is given as follows:
0: x1
1: x2
2: × 0, 1
3: x3
4: + 2, 3
5: x4
6: / 4, 5
Translation of the MEP individuals into computer programs can be obtained by reading the chromosome top-down, starting with the first position. A terminal symbol defines a simple expression, while each function symbol specifies a complex expression obtained by connecting the operands specified by its argument positions with the function symbol itself (Oltean and Groşan, 2003b). In the present example, genes 0, 1, 3 and 5 encode simple expressions formed by a single terminal symbol: E0 = x1, E1 = x2, E3 = x3, E5 = x4. Gene 2 indicates the operation × on the operands located at positions 0 and 1 of the chromosome; thus, gene 2 encodes the expression E2 = x1 × x2. Gene 4 indicates the operation + on the operands located at positions 2 and 3; therefore, gene 4 encodes E4 = (x1 × x2) + x3. Gene 6 indicates the operation / on the operands located at positions 4 and 5; hence, gene 6 encodes E6 = ((x1 × x2) + x3)/x4.
Because multiple solutions are encoded in a single chromosome, one of these expressions (E0, . . . , E6) must be chosen as the chromosome representer. Each MEP chromosome may be viewed as a forest of trees rather than a single tree, due to its multi-expression representation (Figure 6). Each of these expressions can be considered a possible solution of the problem. The fitness of each expression in an MEP chromosome is calculated to designate the best encoded expression in that chromosome.
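The top-down decoding of the example chromosome can be sketched in Python, with each gene either a terminal or an (operator, position, position) triple whose positions point to earlier genes:

```python
# The example chromosome from the text: positions always point to earlier
# genes, so a single top-down pass decodes every expression.
chromosome = [
    "x1",          # 0
    "x2",          # 1
    ("*", 0, 1),   # 2
    "x3",          # 3
    ("+", 2, 3),   # 4
    "x4",          # 5
    ("/", 4, 5),   # 6
]

def decode(chromosome):
    # Build the infix expression encoded by every gene, reading top-down.
    expressions = []
    for gene in chromosome:
        if isinstance(gene, tuple):
            op, i, j = gene
            expressions.append("(" + expressions[i] + op + expressions[j] + ")")
        else:
            expressions.append(gene)
    return expressions

E = decode(chromosome)  # E[6] is "(((x1*x2)+x3)/x4)"
```

The fittest of E[0] to E[6] is then designated as the representer of the chromosome.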
The LGP, GEP, and MEP models were developed based on the experimental results obtained from the literature. The various parameters involved in the LGP, GEP, and MEP algorithms are presented in Table I. The major task is to define the hidden function connecting the input and output variables. The parameter selection will affect the
[Figure 6. Expressions encoded by an MEP chromosome represented as trees]
Table I. Parameter settings for the LGP, GEP, and MEP algorithms

Common parameters (LGP, GEP, MEP):
  Number of generations: 100, 250, 500
  Population size: 500, 2,500, 5,000
  Function set: +, −, ×, /, √, power, exp, log, ln
  Mutation rate (%): 10, 90
  Fitness function: linear error function

Algorithm-specific parameters
MEP:
  Crossover rate (%): 50, 95
  Crossover type: uniform
  Chromosome length: 50-80 genes
GEP:
  Number of genes: 1-3
  Head size: 3, 5, 8
  Linking function: +
  One-point recombination rate (%): 30, 50
  Two-point recombination rate (%): 30
  Gene recombination rate (%): 10
  Gene transposition rate (%): 10
  Numerical constants: integer, floating-point
LGP:
  Crossover rate (%): 50, 95
  Block mutation rate (%): 30
  Instruction mutation rate (%): 30
  Data mutation rate (%): 40
  Homologous crossover (%): 95
  Program size: initial: 80, maximum: 256-512
  Number of demes: 20
  Numerical constants: integer, randomized
generalization capability of the LGP, GEP, and MEP models. Several runs were conducted to come up with a parameterization of LGP, GEP, and MEP that provided
enough robustness and generalization to solve the problems. The effective training time
specifies the number of generations in LGP, GEP, and MEP. For all the cases, three levels
were set for the number of generations. A fairly large number of generations were tested
on each run to find models with minimum error. For each case, the program was run until
there was no longer significant improvement in the performance of the models or the
runs terminated automatically. Three levels were also set for the population size. Large
populations were used with the runs to guarantee sufficient diversity. Note that a run will
take longer with a larger population size. Two levels were considered for the crossover
and mutation rates. The success of the algorithms usually increases with increasing the
maximum program size parameter in LGP, head size and number of genes in GEP,
and chromosome length in MEP. In this case, the complexity of the evolved functions
increases and the speeds of the algorithms decrease. Different optimal levels were
considered for these parameters as tradeoffs between the running time and the
complexity of the evolved solutions. Basic arithmetic operators and mathematical
functions were utilized to get the optimum models. The values considered for the other
parameters were based on some previously suggested values (Baykasoglu et al., 2008;
Cevik and Cabalar, 2009; Alavi and Gandomi, 2010; Gandomi et al., 2010) and also after
making several preliminary runs and observing the performance behavior. All of the
combinations of the parameters were tested and ten replications were carried out for
each combination.
A software package called Discipulus (Conrads et al., 2001), working on the basis of the AIMGP platform, was used for the LGP analysis. The GEP algorithm was implemented with the GeneXproTools software (GEPSOFT, 2006). The C++ source code of MEP (Oltean, 2004) was modified by the authors to make it usable for the problems at hand. For the LGP, GEP, and MEP analyses, the available datasets were randomly
divided into training and testing subsets. The GP-based models have difficulty
extrapolating beyond the range of the data used for their calibration. In order to develop
the best models, the statistical properties of the training and testing subsets need to be
similar to ensure that each subset represents the same statistical population (Masters,
1993). In order to obtain a consistent data division, several combinations of the training
and testing sets were considered. The selection was such that the maximum, minimum,
mean and standard deviation of parameters were consistent in the training and testing
datasets. Out of the available data for each problem, approximately 75 percent of the
data was used for the training process and the remaining 25 percent was taken for
testing of the generalization capability of the LGP, GEP and MEP models. In some of the
investigated cases, the input and output variables were normalized between 0 and 1 to
obtain better results. The best LGP, GEP, and MEP-based formulas were chosen on the basis of a multi-objective strategy:
(1) involving the maximum number of the input variables; and
(2) providing the best fitness value on the training set of data.
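The data-division procedure described above, namely trying several random 75/25 splits and keeping the one whose training and testing statistics agree best, can be sketched as follows. The scoring rule here is a simplified stand-in for the full max/min/mean/standard-deviation comparison:

```python
import random
import statistics

def split_consistently(data, train_fraction=0.75, attempts=50, seed=0):
    # Keep the random split whose training and testing subsets have the most
    # similar mean and standard deviation, so that both subsets represent
    # roughly the same statistical population.
    rng = random.Random(seed)
    n_train = int(len(data) * train_fraction)
    best_split, best_score = None, float("inf")
    for _ in range(attempts):
        shuffled = list(data)
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        score = (abs(statistics.mean(train) - statistics.mean(test))
                 + abs(statistics.stdev(train) - statistics.stdev(test)))
        if score < best_score:
            best_split, best_score = (train, test), score
    return best_split

data = [float(x) for x in range(40)]   # illustrative data, not a paper dataset
train, test = split_consistently(data)
```

In practice the same scoring would be applied to every input and output variable of a dataset, not to a single column.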
Correlation coefficient (R), root mean squared error (RMSE) and mean absolute error (MAE) were used to evaluate the capabilities of the proposed correlations. R, RMSE and MAE are given as follows:

R = Σᵢ (hᵢ − h̄)(tᵢ − t̄) / √[ Σᵢ (hᵢ − h̄)² Σᵢ (tᵢ − t̄)² ]   (4)

RMSE = √[ Σᵢ (hᵢ − tᵢ)² / n ]   (5)

MAE = (1/n) Σᵢ |hᵢ − tᵢ|   (6)

where the sums run over i = 1 to n; hᵢ and tᵢ are, respectively, the actual output and the calculated output value for the ith sample; h̄ and t̄ are the averages of the actual and calculated outputs; and n is the number of samples.
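These three error measures translate directly into code; a short Python sketch:

```python
from math import sqrt

def correlation_r(h, t):
    # Correlation coefficient between actual (h) and calculated (t) outputs.
    n = len(h)
    h_bar, t_bar = sum(h) / n, sum(t) / n
    num = sum((hi - h_bar) * (ti - t_bar) for hi, ti in zip(h, t))
    den = sqrt(sum((hi - h_bar) ** 2 for hi in h)
               * sum((ti - t_bar) ** 2 for ti in t))
    return num / den

def rmse(h, t):
    # Root mean squared error.
    return sqrt(sum((hi - ti) ** 2 for hi, ti in zip(h, t)) / len(h))

def mae(h, t):
    # Mean absolute error.
    return sum(abs(hi - ti) for hi, ti in zip(h, t)) / len(h)
```

Note that R measures linear association only and is insensitive to a constant offset, which is why it is reported alongside RMSE and MAE rather than on its own.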
For the analysis of the classification problem (soil liquefaction), the output variable
was decoded into binary code with a threshold value equal to 0.5. For a more detailed
performance analysis of the soil liquefaction prediction models, their sensitivity,
specificity, positive predictivity, and accuracy were obtained using the following
equations:
Sensitivity (%) = TP/(TP + FN) × 100   (7)

Specificity (%) = TN/(TN + FP) × 100   (8)

Positive predictivity (%) = TP/(TP + FP) × 100   (9)

Accuracy (%) = (TP + TN)/(TP + FP + FN + TN) × 100   (10)
where:
TP (true positive) The model predicts that the class is 1 and the class of the given
instance is indeed 1.
TN (true negative) The model predicts that the class is 0 and the class of the given
instance is indeed 0.
FP (false positive) The model predicts that the class is 1 but the class of the given
instance is 0.
FN (false negative) The model predicts that the class is 0 but the class of the given
instance is 1.
TP and TN are correct classifications, while FP and FN are incorrect classifications.
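The confusion-matrix counts defined above, together with the 0.5 decoding threshold mentioned earlier, give the four measures directly; a Python sketch:

```python
def confusion_counts(actual, outputs, threshold=0.5):
    # Decode the continuous model output into binary classes at the 0.5
    # threshold used in the text, then tally TP, TN, FP, FN.
    tp = tn = fp = fn = 0
    for a, p in zip(actual, outputs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and a == 1:
            tp += 1
        elif predicted == 0 and a == 0:
            tn += 1
        elif predicted == 1 and a == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    # Equations (7)-(10).
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "positive predictivity": 100.0 * tp / (tp + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + fp + fn + tn),
    }
```

Reporting sensitivity and specificity separately matters for liquefaction data, where accuracy alone can look good even when one class is systematically misclassified.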
The classification performance of the LGP, GEP, and MEP models was also evaluated
using receiver operating characteristic (ROC) analysis. The ROC curves plot the sensitivity and specificity versus the model output for a continuous range of decision thresholds. The selected index of performance was the area under the ROC curve (Az), which is a meaningful performance measure. Generally, a higher area index reflects a better classification performance.
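The area Az can be computed equivalently as the probability that a randomly chosen positive case receives a higher model output than a randomly chosen negative one, which is a rank-based restatement of the threshold sweep described above; a Python sketch:

```python
def roc_area(actual, outputs):
    # Az as the probability that a randomly chosen positive case is scored
    # above a randomly chosen negative one (ties count one half). This equals
    # the area under the ROC curve traced by sweeping the decision threshold.
    positives = [s for a, s in zip(actual, outputs) if a == 1]
    negatives = [s for a, s in zip(actual, outputs) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```

An Az of 1.0 indicates perfect separation of the two classes, while 0.5 corresponds to a classifier no better than chance.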
Problem I: relative crest settlement of concrete-faced rockfill dams
The concrete-face rockfill dam (CFRD) has become popular in the last four decades because of its good performance and low cost compared with rockfill dams with an inner earth core. Experience up to 1960 using dumped rockfill showed that the CFRD is a safe and economical type of dam. However, it is subject to concrete-face damage and leakage caused by the high compressibility of the segregated dumped rockfill. Many CFRD designs have been used for dam construction worldwide. These designs overcome technical difficulties such as dam construction on a soft foundation, complex dam erection, and related problems. The CFRD is the preferred construction type for new dam projects in many countries, including Australia, China, and Brazil. Standard CFRD design guidelines have already been produced in some of these countries (e.g. ANCLDI, 1991; BCD, 2000). Numerous studies have been carried out on CFRDs (BCD, 2000).
Cooke (1984) presented a chronicle of modern rockfill dam design, including a
description of current practice in the CFRD design. Clements (1984) proposed empirical
equations to investigate the actual crest settlements and deformations of several rockfill
dams after construction. It was observed that the values calculated using the empirical
formulas exhibit significant differences from the observed values. Liu et al. (1993)
presented a method to predict the maximum settlement at the end of construction and
the maximum face slab normal displacement during reservoir operation. The method
was based on the physical and mechanical properties of the rockfill, the load factor,
and the geometric profile of the CFRD section. The characteristics of rockfill behavior
using actual CFRD cases were explained by Hunter (2003) and Hunter and Fell (2003).
It is usually necessary to rely on the historic performance data from other dams to
estimate the dam properties. There is some published information on predicting the
deformation of CFRDs (Clements, 1984). However, such studies are based on limited
data. Also, researchers have often concentrated on only one or two factors which affect
either the rockfill modulus or the measured deformation. ANNs have also been applied to
the prediction of the relative crest settlement (RCS) of a CFRD (Kim and Kim, 2008). As mentioned previously,
ANNs have some fundamental disadvantages that limit their usage in practical
calculations.
Herein, the LGP, GEP, and MEP approaches were used as alternative ways to
simulate the behavior of the CFRD crest settlement. The models derived using these
methods can be used as quick and accurate tools for evaluating the RCS without any
need for manual testing.
RCS = aH^b  (11)

RCS = 0.0069H^0.655  (12)

where RCS is the crest settlement (m), H is the dam height (m), and a and b are
constants: a is equal to 0.0002 at initial impounding and 0.0000014 after ten years'
service, and b is 1.1 at initial impounding and 2.6 after ten years' service.
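As a worked check of the power-law form above, the sketch below evaluates RCS = aH^b and recovers a and b from (H, RCS) records by least squares on logarithms. The data in the test are hypothetical, generated from Clements' initial-impounding constants purely for illustration.

```python
import math

def rcs_power_law(H, a, b):
    """Relative crest settlement (m) from dam height H (m) via RCS = a * H**b."""
    return a * H ** b

def fit_power_law(heights, settlements):
    """Least-squares fit of log RCS = log a + b log H to (H, RCS) records."""
    xs = [math.log(h) for h in heights]
    ys = [math.log(s) for s in settlements]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = math.exp(ybar - b * xbar)
    return a, b

# With a = 0.0002 and b = 1.1 (initial impounding), a 100 m high dam
# gives RCS = 0.0002 * 100**1.1, roughly 0.03 m.
```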
[Figure 7. Predicted versus measured RCS of CFRDs using the LGP, GEP, and MEP
models. Panels (a) and (b): MAE_All = 0.061 and 0.059; panel (c), MEP: R_All = 0.948,
RMSE_All = 0.078, MAE_All = 0.060; panel (d), Clements (1984): R_All = 0.275,
RMSE_All = 0.320, MAE_All = 0.206; panel (e), Kim and Kim (2008): R_All = 0.237,
RMSE_All = 0.247, MAE_All = 0.118.]
training and entire database is, respectively, obtained by MEP and GEP. The results
clearly demonstrate that it is not appropriate to use the models proposed by Clements
(1984) and Kim and Kim (2008) to estimate the crest settlement because of their poor
performance.
Problem II: slope stability
Slope failure is a complex natural phenomenon that constitutes a serious natural hazard.
It is responsible for hundreds of millions of dollars of damage to public and private
property every year. To prevent or mitigate landslide damage, slope-stability analysis
requires an understanding and evaluation of the processes that govern the behavior of
slopes. The factor of safety (FS), as an index of stability, is required to evaluate the
slope stability. Many parameters are involved in the slope stability evaluation.
Calculating the FS values requires geometrical and physical data on the geologic
materials and their shear-strength parameters (cohesion and angle of internal friction),
information on pore-water pressures, etc. The methods available to solve for the FS of a
given slope are traditionally classified into the following categories (Nash, 1987;
Duncan, 1996):
. energy methods;
. limit-equilibrium methods;
. finite element or finite difference methods; and
. circular failure surface methods and non-circular failure surface methods.
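To make the FS index concrete, a minimal limit-equilibrium sketch for the special case of an infinite slope is given below. This is the standard textbook formula, included for illustration only; it is not one of the models developed in the paper.

```python
import math

def infinite_slope_fs(c, phi_deg, gamma, z, beta_deg, u=0.0):
    """Factor of safety of an infinite slope by limit equilibrium.

    c        effective cohesion (kPa)
    phi_deg  effective friction angle (degrees)
    gamma    soil unit weight (kN/m^3)
    z        vertical depth to the slip plane (m)
    beta_deg slope inclination (degrees)
    u        pore-water pressure on the slip plane (kPa)
    """
    beta, phi = math.radians(beta_deg), math.radians(phi_deg)
    # Available shear strength over mobilized shear stress on the slip plane.
    shear_strength = c + (gamma * z * math.cos(beta) ** 2 - u) * math.tan(phi)
    shear_stress = gamma * z * math.sin(beta) * math.cos(beta)
    return shear_strength / shear_stress
```

For a dry cohesionless slope this reduces to FS = tan(phi)/tan(beta), so a slope inclined at its friction angle sits exactly at FS = 1 (incipient failure).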
[Figure 8. Predicted versus measured FS values using the LGP, GEP, and MEP models.
Panel (c), MEP: R_All = 0.877, RMSE_All = 0.113, MAE_All = 0.100.]
Table V. Overall performance of different models for the assessment of FS

                Training                Testing
Models     R       RMSE    MAE     R       RMSE    MAE
LGP        0.917   0.128   0.106   0.920   0.174   0.147
GEP        0.890   0.132   0.098   0.884   0.168   0.141
MEP        0.858   0.148   0.116   0.947   0.203   0.158
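The R, RMSE, and MAE statistics reported in the performance tables follow their standard definitions (the paper does not print the formulas, so the usual ones are assumed here):

```python
import math

def error_measures(measured, predicted):
    """Correlation coefficient R, root-mean-square error, and mean absolute error."""
    n = len(measured)
    mbar = sum(measured) / n
    pbar = sum(predicted) / n
    cov = sum((m - mbar) * (p - pbar) for m, p in zip(measured, predicted))
    r = cov / math.sqrt(sum((m - mbar) ** 2 for m in measured)
                        * sum((p - pbar) ** 2 for p in predicted))
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    return r, rmse, mae
```

A model that reproduces the measurements exactly gives R = 1 with RMSE = MAE = 0; larger errors lower R and raise RMSE and MAE.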
Problem III: settlement around tunnels
Of these, the most difficult task is the prediction of the ground losses in the tunneling
process. Many studies have been carried out on the assessment of ground movement. Most
of these studies have followed the trend set by Peck (1969), who represented the
settlement trough over a single tunnel by the error function (normal or probability)
curve within reasonable limits. Empirical prediction methods were the first methods for
the prediction of surface subsidence (Peck, 1969; Atkinson and Potts, 1979; Attewell
and Farmer, 1974; Clough and Schmidt, 1981). These methods are based on the
correlation of measured data with the geometric parameters of the excavations.
The results obtained are valid only for the investigated area because these methods are
derived from measurements in a specific area. The second group of prediction methods
is based on influence functions, which are used to describe the impact of an elementary
part of the excavation on the formation of subsidence. These methods rest on several
assumptions or principles which simplify the calculus and make them generally
applicable. The principle of these methods is to select the influence function for each
mine and then determine the coefficients so that the calculated subsidence curve matches
the form of the subsidence observed in nature. Another group
of prediction models comprises mathematical-physical models. The behavior of the roof
and the development of subsidence are calculated in accordance with the laws of
mechanics. The elastic and plastic models of subsidence belong to this group. When these
models are used, the problem is usually solved by numerical methods, such as the finite
element, finite difference, or boundary element methods.
Progress has recently been made in the ability to predict the ground movements due
to tunneling. The state of the art is still deficient in many ways. ANNs have recently
been applied to the prediction of the tunneling-induced ground movement (Ambrozic
and Turk, 2003; Neaupane and Adhikari, 2006). Li et al. (2006) proposed fuzzy models
for the analysis of rock mass displacements due to underground mining. Li et al. (2007)
utilized a hybrid fuzzy and tree-based GP method to analyze the actual cases of
excavation, mining and ground surface movement.
On the basis of a detailed investigation, a viable approach is still necessary for the
prediction of the ground movement. In this paper, measurements of settlement
recorded in different tunnel projects were formulated by means of the LGP, GEP, and
MEP techniques.
[Figure 9. Typical section of a tunnel (figure labels: Smax, D).]
Similarly, CM was classified as 1, 2, and 3 for the hand-mined shield, mechanical
shield, and semi-mechanical type (compressed air support) shield, respectively. The
descriptive statistics of the data used in this study are given in Table VI. The data from
several tunneling case studies (e.g. Toronto subway; Regents Park, London; Bangkok,
GBC5; San Francisco; Brussels metro, etc.) presented by Neaupane and Adhikari (2006)
were used to develop the LGP, GEP, and MEP-based models. Of the available 40 data,
30 datasets were used for the training of the models and the rest were taken for
testing purposes.
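A random partition like the 30/10 split of the 40 tunneling records can be sketched as follows (illustrative only; the paper does not state how its split was drawn):

```python
import random

def split_dataset(records, n_train, seed=0):
    """Shuffle the records reproducibly, then partition them into
    training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 40 cases, 30 for training and the remaining 10 for testing.
train, test = split_dataset(range(40), 30)
```

Fixing the seed makes the split reproducible, so reported training and testing statistics refer to the same partition on every run.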
[Figure 10. Predicted versus measured Smax using the LGP, GEP, and MEP models.
Panel (c), MEP: R_All = 0.888, RMSE_All = 17.119, MAE_All = 8.802.]
Table VII. Overall performance of different models for the assessment of Smax

                Training                 Testing
Models     R       RMSE     MAE      R       RMSE     MAE
LGP        0.949   10.679   8.306    0.985   8.111    6.704
GEP        0.952   10.751   8.010    0.967   10.748   9.021
MEP        0.821   19.065   9.593    0.976   9.045    6.429
resulted from the application of noncyclic shear stresses, can be triggered. Liquefaction
usually occurs when the pore water pressure increases enough to carry the overburden
stress, i.e. the grain-to-grain stress equals zero. The soil then immediately loses most of
its strength, leading to extreme deformations, flow of water, and suspension of sediment
(Liang, 2005). This phenomenon is a source of damage and destructive failures in
various types of structures. The seriousness of potential failures of critical structures
due to liquefaction has led to massive research efforts to understand this phenomenon.
Several procedures have been developed to evaluate the liquefaction potential in the
field. The available liquefaction evaluation procedures are categorized into three main
groups:
(1) the stress-based procedures;
(2) the strain-based procedures; and
(3) the energy-based procedures.
The stress-based procedure is the most widely used liquefaction assessment method,
first proposed by Seed and Idriss (1971) and Whitman (1971). This approach is mainly
empirical and based on laboratory and field observations. The stress method has
continually been refined as a result of newer studies and an increase in the number of
liquefaction case histories (NRC, 1985; NCEER, 1997). The main criteria in the
stress-based procedure are the shear stress level and the number of cycles. To establish
a relationship between the actual earthquake motion and laboratory harmonic loading
conditions, the equivalent stress intensity and the number of cycles have to be defined.
Dobry et al. (1982) proposed a strain-based procedure as an alternative to the empirical
stress-based procedure. This method was derived from the mechanics of two
interacting idealized sand grains and then generalized for natural soil deposits. It is
based on the hypothesis that pore pressure begins to develop when the shear strain
surpasses a threshold shear strain. Use of the strain-based approach for liquefaction
evaluation is not as common as the stress-based method. The reason for its limited use
is that the strain procedure only predicts the initiation of pore pressure buildup, which
is necessary for liquefaction to occur but does not imply that liquefaction will occur.
The energy concept has been widely used in the theories of elasticity and plasticity,
potential energy surface for constitutive law and energy principles (Desai and
Siriwardane, 1984). Since the late 1970s, numerous energy-based procedures have
been proposed for evaluating the liquefaction potential of soil deposits (Liang, 2005).
The use of the energy concept is shown to be a logical step in the liquefaction
evaluation of soils (Alavi and Gandomi, 2010).
Modern techniques such as fuzzy systems and ANNs have been utilized to develop
liquefaction prediction models (Goh, 1994; Hanna et al., 2007). Pal (2006) and Goh and
Goh (2007) investigated the potential of the support vector machine (SVM) classification
approach to assess liquefaction potential based on actual standard penetration test
(SPT) and cone penetration test (CPT) field data. Recently, Baykasoglu et al. (2009)
proposed a hybrid ANN and ant colony optimization algorithm in order to extract
accurate rules for liquefaction classification. Unlike the other soft computing tools,
applications of GP and its variants to liquefaction assessment are scarce. In this
connection, Alavi and Gandomi (2010) derived generalized LGP and MEP models
relating the strain energy density required to trigger liquefaction to the factors
affecting the liquefaction characteristics of sands.
In this work, the potential of alternative data-induction tools, LGP, GEP, and MEP, is
demonstrated by applying them to the classification of several liquefied and
non-liquefied case records.
Discussion
Different LGP-, GEP-, and MEP-based constitutive relationships were obtained for the
assessment of four complex geotechnical engineering systems. The RCS values predicted
using the new CFRD models were in good agreement with the field measurements.
a wide area using nonparametric variables with large extension. The ground settlement
above tunnels was successfully formulated by means of the LGP, GEP, and MEP methods.
In the modeling procedure, the effects of several parameters with direct physical
significance on the ground behavior around tunnels were considered. These parameters
were identified through a detailed investigation of different tunnel projects published
in the literature. The results show that the LGP-, GEP-, and MEP-based models can
effectively be used for predicting the ground surface movements due to the soft ground
tunneling. The viability of LGP, GEP, and MEP to model the complex behavior of the
liquefaction phenomenon was further demonstrated. The derived correlations have
integrated the input parameters that account for all possible variations in the field. Several
soil and seismic parameters were included in the soil liquefaction analysis. The developed
constitutive models are expected to be very useful for the preliminary evaluation of the
liquefaction potential of sites for which the input parameters are not well defined.
It is known that the models derived using neural networks, GP-based approaches or
other similar techniques perform best when they do not extrapolate beyond the range
of the data used for their calibration (Shahin et al., 2008). Consequently, the amount of
data used for the model development is an important issue, as it heavily bears on the
reliability of the final models. In this context, Frank and Todeschini (1994) argue that
the minimum ratio of the number of total objects over the number of selected variables
for model acceptability is 3, but often a safer value of 5 is more reasonable. In the
present study, the ratios are higher, equal to 26/5 = 5.2 at the minimum
(Problem II: slope stability) and 226/6 = 37.7 at the maximum (Problem IV: soil
liquefaction). Note that the obtained constitutive models can easily be retrained and
improved to make more accurate predictions for a wider range by including the data
for other soil types or test conditions. Comparing the overall performance of the utilized
methods, LGP has generally provided the best results followed by GEP and MEP.
In most of the cases, the capability of LGP and GEP was better than MEP in
incorporating the effects of more influencing parameters.
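The acceptability check discussed above amounts to a one-line ratio; a sketch, with Frank and Todeschini's rule-of-thumb thresholds hard-coded as assumptions, is:

```python
def acceptability_ratio(n_objects, n_variables, minimum=3.0, safer=5.0):
    """Ratio of data objects to model input variables, compared against the
    thresholds of 3 (minimum) and 5 (safer) from Frank and Todeschini (1994)."""
    ratio = n_objects / n_variables
    return ratio, ratio >= minimum, ratio >= safer

# Problem II: 26 records over 5 variables; Problem IV: 226 records over 6 variables.
```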
The task faced by LGP-, GEP-, MEP-, and other GP-based approaches is mainly the
same as that faced by ANNs. GP and ANNs are machine learning techniques that can
effectively be applied to classification and approximation problems. They directly
learn from raw experimental (or field) data presented to them in order to extract the
subtle functional relationships among the data, even if the underlying relationships are
unknown or the physical meaning is difficult to explain. In contrast,
most conventional empirical and statistical methods need prior knowledge about the
nature of the relationships among the data. Classical constitutive models rely on
assuming the structure of the model in advance, which may be suboptimal. Therefore,
the GP and ANN-based approaches are well suited to modeling the complex behavior of
most geotechnical engineering problems with extreme variability in their nature (Shahin
et al., 2009). In spite of similarities, there are some important differences between GP and
ANNs. ANNs suffer from some shortcomings including lack of transparency and
knowledge extraction. That is, they do not explicitly explain the underlying physical
processes. The knowledge extracted by ANNs is stored in a set of weights that cannot
properly be interpreted. Owing to the large complexity of the network structure, ANNs
do not give a transparent function relating the inputs to the corresponding outputs.
The main advantage of GP over ANNs is that GP generates a transparent and structured
representation of the system being studied. An additional advantage of GP over ANNs is
that determining the ANN architecture is a difficult task. The structure and network
parameters of ANNs (e.g. number of inputs, transfer functions, number of hidden layers
and their number of nodes, etc.) should be identified a priori, which is usually done
through a time-consuming trial and error procedure. In GP, the number and combination
of terms are automatically evolved during model calibration (Shahin et al., 2009; Javadi
and Rezania, 2009). A notable limitation of GP and its variants is that these methods are
parameter sensitive, especially when difficult experimental training datasets like those
used in this paper are employed. Using any form of optimally controlling the parameters
of the run (e.g., GAs) can improve the performance of the LGP, GEP, and MEP
algorithms. Also, the underlying assumption that the input parameters are reliable is not
always the case. Since fuzzy logic can provide a systematic method to deal with imprecise
and incomplete information, the development of hybrid fuzzy and linear GP-based
models for such problems can be a suitable topic for further studies.
However, one of the goals of introducing the expert systems, such as the GP-based
approaches, into the design processes is better handling of the information in the
pre-design phase. In the initial steps of design, information about the features and
properties of the targeted output or process are often imprecise and incomplete
(Kraslawski et al., 1999). Nevertheless, it is desirable to have some initial estimates of
the outcome before performing any extensive laboratory or field work. The LGP, GEP,
and MEP approaches employed in this research are based on the data alone to
determine the structure and parameters of the models. Thus, the derived constitutive
models can particularly be valuable in the preliminary design stages. For more
reliability, the results of the LGP-, GEP-, and MEP-based analyses are suggested to
be treated as a complement to conventional computing techniques. In any case, the
importance of engineering judgment in interpretation of the obtained results should not
be underestimated. In order to develop a sophisticated prediction tool, LGP, GEP, and
MEP can be combined with advanced deterministic geomechanical models. Assuming
the geomechanical model captures the key physical mechanisms, it needs appropriate
initial conditions and carefully calibrated parameters to make accurate predictions.
An idea could be to calibrate the geomechanical parameters by the use of LGP, GEP,
and MEP, which take into account historic datasets as well as the laboratory or field
test results. This allows integrating the uncertainties related to in situ conditions which
the geomechanical model does not explicitly account for. LGP, GEP, and MEP provide
a structured representation for the constitutive material model that can readily be
incorporated into the finite element or finite difference analyses. In this case, it is
possible to use a suitably trained GP-based material model instead of a conventional
(analytical) constitutive model in a numerical analysis tool such as finite element code
or finite difference software (like FLAC). Consequently, the need for complex
yielding/plastic potential/failure functions, or flow rules is avoided. It is notable that
the numerical implementation of ANNs in the finite element analyses has already been
presented by several researchers (Shin and Pande, 2000; Javadi et al., 2005). This
strategy has led to some qualitative improvement in the application of the finite element
method in engineering practice (Javadi and Rezania, 2009).
Conclusions
In this paper, the LGP, GEP, and MEP paradigms were employed for the analysis of
complex geotechnical engineering systems. These methodologies were applied to the
assessment of the RCS of CFRD, slope stability, ground settlement above tunnels, and
soil liquefaction phenomenon. Reliable databases gathered from the literature were
used to develop the models. The following conclusions can be derived from the results
presented in this research:
(1) Despite high nonlinearity in the behavior of the investigated systems, the
proposed LGP, GEP and MEP models give reasonable estimates of the target
values. The validity of the models was verified for a part of test results beyond
the training data domain. The LGP models have the best overall behavior
followed by the GEP and MEP models.
(2) The proposed models efficiently take into consideration the effects of several
parameters representing the engineering behavior of the geotechnical problems.
(3) LGP, GEP, and MEP provide prediction equations that are relatively simple and
can be used for routine design practice via hand calculations.
(4) The LGP, GEP, and MEP-based models can be incorporated into the finite
element or finite difference methods in the same way as a conventional
constitutive model.
(5) The constitutive models derived using LGP, GEP, and MEP are basically
different from the conventional constitutive models based on the first
principles (e.g. elasticity and plasticity theories). One of the distinctive features
of the LGP, GEP, and MEP-based constitutive models is that they are based on
the experimental data rather than on assumptions made in developing the
conventional models. Consequently, as more data become available, these
models can be improved by re-training LGP, GEP, and MEP, without repeating
the development procedures from the beginning.
(6) It is possible to obtain more than one correlation for a complex phenomenon by
selecting various parameters and function sets involved in the LGP, GEP and
MEP predictive algorithms.
(7) LGP, GEP, and MEP can be regarded as efficient tools for the analysis of
geotechnical engineering problems because of their unique learning, training,
and prediction characteristics. These methods are particularly practical for
situations where:
. good experimental data are available;
. the behavior is too complex; and
. the conventional constitutive models are unable to effectively describe
various aspects of the behavior.
References
Adeli, H. (2001), “Neural networks in civil engineering: 1989-2000”, Computer-Aided Civil and
Infrastructure Engineering, Vol. 16 No. 2, pp. 126-42.
Ahmed, A.A., Ali, H.A., ElAraby, S.M., ElKateb, M. and Noureldin, S.M. (2008),
“Non-deterministic tunneling analysis using AI based techniques genetic programming
vs ANNs”, paper presented at 12th International Colloquium on Structural and
Geotechnical Engineering (ICSGE), Cairo.
Alavi, A.H. and Gandomi, A.H. (2010), “Energy-based numerical correlations for soil liquefaction
assessment”, Computers and Geotechnics, 8 July.
Alavi, A.H., Gandomi, A.H. and Heshmati, A.A.R. (2010a), “Discussion on soft computing
approach for real-time estimation of missing wave heights”, Ocean Engineering, Vol. 37
No. 13.
Alavi, A.H., Gandomi, A.H., Gandomi, M. and Sadat Hosseini, S.S. (2009), “Prediction of
maximum dry density and optimum moisture content of stabilized soil using RBF neural
networks”, The IES Journal Part A: Civil & Structural Engineering, Vol. 2 No. 2, pp. 98-106.
Alavi, A.H., Gandomi, A.H., Sahab, M.G. and Gandomi, M. (2010b), “Multi expression
programming: a new approach to formulation of soil classification”, Engineering with
Computers, Vol. 26 No. 2, pp. 111-18.
Alavi, A.H., Gandomi, A.H., Mollahasani, A., Heshmati, A.A.R. and Rashed, A. (2010c),
“Modeling of maximum dry density and optimum moisture content of stabilized soil using
artificial neural networks”, Journal of Plant Nutrition and Soil Science, Vol. 173 No. 3.
Alavi, A.H., Heshmati, A.A.R., Gandomi, A.H., Askarinejad, A. and Mirjalili, M. (2008),
“Utilisation of computational intelligence techniques for stabilised soil”, in Papadrakakis,
M. and Topping, B.H.V. (Eds), Engineering Computational Technology, Civil-Comp Press,
Edinburgh, paper 175.
Ambrozic, T. and Turk, G. (2003), “Prediction of subsidence due to underground mining by
artificial neural networks”, Computers & Geosciences, Vol. 29, pp. 627-37.
ANCLDI (1991), Guidelines on Concrete-faced Rockfill Dams, Australian National Committee on
Large Dams Incorporated, Yichang.
Atkinson, J.H. and Potts, D.M. (1979), “Subsidence above shallow tunnels in soft ground”,
Journal of Geotechnical and Geoenvironmental Engineering (ASCE), Vol. 103 No. 4,
pp. 307-25.
Attewell, P.B. and Farmer, I.W. (1974), “Ground deformations resulting from shield tunneling
in London clay”, Canadian Geotechnical Journal, Vol. 11, pp. 380-95.
Banzhaf, W., Nordin, P., Keller, R. and Francone, F. (1998), Genetic Programming –
An Introduction. On the Automatic Evolution of Computer Programs and its Application,
dpunkt/Morgan Kaufmann, San Francisco, CA.
Baykasoglu, A., Çevik, A., Özbakır, L. and Sinem, K. (2009), “Generating prediction
rules for liquefaction through data mining”, Expert Systems with Applications, Vol. 36
No. 10.
Baykasoglu, A., Gullub, H., Canakcı, H. and Ozbakır, L. (2008), “Prediction of compressive and
tensile strength of limestone via genetic programming”, Expert Systems with Applications,
Vol. 35 Nos 1/2, pp. 111-23.
BCD (2000), Highlights of Brazilian Dam Engineering, Brazilian Committee on Dams, Sao Paulo.
Brameier, M. and Banzhaf, W. (2001), “A comparison of linear genetic programming and neural
networks in medical data mining”, IEEE Transactions on Evolutionary Computation, Vol. 5
No. 1, pp. 17-26.
Brameier, M. and Banzhaf, W. (2007), Linear Genetic Programming, Springer Science+Business
Media LLC, New York, NY.
Cabalar, A.F. and Cevik, A. (2009), “Genetic programming-based attenuation relationships:
an application of recent earthquakes in Turkey”, Computers & Geosciences, Vol. 35,
pp. 1884-96.
Cevik, A. and Cabalar, A.F. (2009), “Modelling damping ratio and shear modulus of sand-mica
mixtures using genetic programming”, Expert Systems with Applications, Vol. 36 No. 4,
pp. 7749-57.
Clements, R.P. (1984), “Post-construction deformation of rockfill dams”, Journal of Geotechnical
Engineering (ASCE), Vol. 110 No. 7, pp. 821-40.
Clough, W. and Schmidt, B. (1981), “Design and performance of excavations and tunnels in
softclay”, Soft Clay Engineering, Elsevier, Amsterdam, pp. 100-4.
Conrads, M., Dolezal, O., Francone, F.D. and Nordin, P. (2001), Discipulus – Fast Genetic
Programming based on AIM Learning Technology, Register Machine Learning
Technologies, Littleton, CO.
Cooke, J.B. (1984), “Progress in rockfill dams”, Journal of Geotechnical Engineering (ASCE),
Vol. 110 No. 10, pp. 821-40.
Cramer, N.L. (1985), “A representation for the adaptive generation of simple sequential
programs”, Proceedings of the International Conference on Genetic Algorithms and Their
Applications, Hillsdale, NJ, July, pp. 183-7.
Cui, L. and Sheng, D. (2005), “Genetic algorithms in probabilistic finite element analysis
of geotechnical problems”, Computers and Geotechnics, Vol. 32 No. 8, pp. 555-63.
Darve, F. (1996), “Liquefaction phenomenon of granular materials and constitutive instability”,
Engineering Computations, Vol. 13 No. 7, pp. 5-28.
Desai, C.S. and Siriwardane, H.J. (1984), Constitutive Laws for Engineering Materials:
with Emphasis on Geologic Materials, Prentice-Hall, NJ.
Dobry, R., Ladd, R.S., Yokel, F.Y., Chung, R.M. and Powell, D. (1982), “Prediction of pore water
pressure buildup and liquefaction of sands during earthquakes by the cyclic strain
method”, Building Science Series, Vol. 138, National Bureau of Standards, US Department
of Commerce, US Governmental Printing Office, Washington, DC.
Duncan, J.M. (1996), “State of the art: limit equilibrium and finite element analysis of slopes”,
Journal of Geotechnical Engineering (ASCE), Vol. 122, pp. 577-96.
Ermini, L., Catani, F. and Casagli, N. (2005), “Artificial neural networks applied to landslide
susceptibility assessment”, Geomorphology, Vol. 66 Nos 1-4, pp. 327-43.
Ferreira, C. (2001), “Gene expression programming: a new adaptive algorithm for solving
problems”, Complex Systems, Vol. 13 No. 2, pp. 87-129.
Ferreira, C. (2006), Gene Expression Programming: Mathematical Modeling by an Artificial
Intelligence, 2nd ed., Springer, Heidelberg.
Francone, F.D. and Deschaine, L.M. (2004), “Extending the boundaries of design optimization by
integrating fast optimization techniques with machine-code-based, linear genetic
programming”, Information Sciences, Vol. 161, pp. 99-120.
Frank, I.E. and Todeschini, R. (1994), The Data Analysis Handbook, Elsevier, Amsterdam.
Friedberg, R.M. (1958), “A learning machine: Part I”, IBM Journal of Research and Development,
Vol. 2, pp. 2-13.
Gandomi, A.H., Alavi, A.H., Mirzahosseini, M.R. and Moghadas Nejad, F. (2010), “Nonlinear
genetic-based models for prediction of flow number of asphalt mixtures”, Journal of
Materials in Civil Engineering (ASCE), Vol. 23 No. 3, pp. 1-18.
GEPSOFT (2006), GeneXproTools Owner’s Manual, Version 4.0, available at: http://gepsoft.com/
Goh, A.T.C. (1994), “Seismic liquefaction potential assessed by neural networks”, Journal of
Geotechnical Engineering (ASCE), Vol. 120 No. 9, pp. 1467-80.
Goh, A.T.C. (1999), “Genetic algorithm search for critical slip surface in multiple-wedge stability
analysis”, Canadian Geotechnical Journal, Vol. 36 No. 2, pp. 382-91.
Goh, A.T.C. and Goh, S.H. (2007), “Support vector machines: their use in geotechnical engineering
as illustrated using seismic liquefaction data”, Computers and Geotechnics, Vol. 34,
pp. 410-21.
Hanna, A.M., Ural, D. and Saygili, G. (2007), “Evaluation of liquefaction potential of soil deposits
using artificial neural networks”, Engineering Computations, Vol. 24 No. 1, pp. 5-16.
Hashash, Y.M.A., Levasseurb, S., Osoulia, A., Finno, R. and Malecot, Y. (2010), “Comparison of
two inverse analysis techniques for learning deep excavation response”, Computers and
Geotechnics, Vol. 37 No. 3, pp. 323-33.
Hunter, G.J. (2003), “The pre- and post-failure deformation behavior of soil slopes”, PhD thesis,
University of New South Wales, Sydney.
Hunter, G.J. and Fell, R. (2003), “Rockfill modulus and settlement of concrete face rockfill dams”,
Journal of Geotechnical and Geoenvironmental Engineering (ASCE), Vol. 129 No. 10,
pp. 909-17.
Javadi, A.A. (2006), “Estimation of air losses in compressed air tunneling using neural network”,
Journal of Tunnelling and Underground Space Technology, Vol. 21 No. 1, pp. 9-20.
Javadi, A.A. and Rezania, M. (2009), “Applications of artificial intelligence and data mining
techniques in soil modeling”, Geomechanics and Engineering, Vol. 1 No. 1, pp. 53-74.
Javadi, A.A., Rezani, M. and Mousavi Nezhad, M. (2006), “Evaluation of liquefaction induced
lateral displacements using genetic programming”, Computers and Geotechnics, Vol. 33
Nos 4/5, pp. 222-33.
Javadi, A.A., Tan, T.P. and Elkassas, A.S.I. (2005), “Intelligent finite element method”, paper
presented at the 3rd MIT Conference on Computational Fluid and Solid Mechanics,
Cambridge, MA.
Johari, A., Habibagahi, G. and Ghahramani, A. (2006), “Prediction of soil-water characteristic
curve using genetic programming”, Journal of Geotechnical and Geoenvironmental
Engineering (ASCE), Vol. 132 No. 5, pp. 661-5.
Juang, C.H., Jiang, T. and Christopher, R.A. (2001), “Three-dimensional site characterisation:
neural network approach”, Geotechnique, Vol. 51 No. 9, pp. 799-809.
Juang, C.H., Yuan, H., Lee, D. and Lin, P. (2003), “Simplified cone penetration test-based method
for evaluating liquefaction resistance of soils”, Journal of Geotechnical and
Geoenvironmental Engineering (ASCE), Vol. 129 No. 11, pp. 66-80.
Kayadelen, C., Günaydın, O., Fener, M., Demir, A. and Özvan, A. (2009), “Modeling of the angle of
shearing resistance of soils using soft computing systems”, Expert Systems with
Applications, Vol. 36, pp. 11814-26.
Kim, Y.S. and Kim, B.T. (2008), “Prediction of relative crest settlement of concrete-faced rockfill
dams analyzed using an artificial neural network model”, Computers and Geotechnics,
Vol. 35, pp. 313-22.
Koza, J.R. (1992), Genetic Programming, on the Programming of Computers by Means of Natural
Selection, MIT Press, Cambridge, MA.
Kraslawski, A., Pedrycz, W. and Nyström, L. (1999), “Fuzzy neural network as instance generator
for case-based reasoning system: an example of selection of heat exchange equipment in
mixing”, Neural Computing & Applications, Vol. 8 No. 2, pp. 106-13.
Levasseur, S., Malécot, Y., Boulon, M. and Flavigny, E. (2007), “Soil parameter identification
using a genetic algorithm”, International Journal for Numerical and Analytical Methods in
Geomechanics, Vol. 32 No. 2, pp. 189-213.
Levasseur, S., Malécot, Y., Boulon, M. and Flavigny, E. (2009), “Statistical inverse analysis based
on genetic algorithm and principal component analysis: method and developments using
synthetic data”, International Journal for Numerical and Analytical Methods in
Geomechanics, Vol. 33 No. 12, pp. 1485-511.
Li, W., Dai, L., Hou, X. and Lei, W. (2007), “Fuzzy genetic programming method for analysis
of ground movements due to underground mining, technical note”, International Journal of
Rock Mechanics and Mining Sciences, Vol. 44, pp. 954-61.
Li, W., Mei, S., Zhai, S., Zhao, S. and Liang, X. (2006), “Fuzzy models for analysis of rock mass
displacements due to underground mining in mountainous areas”, International Journal of
Rock Mechanics and Mining Sciences, Vol. 43, pp. 503-11.
Liang, L. (2005), “Development of an energy method for evaluating the liquefaction potential of
a soil deposit”, PhD dissertation, Department of Civil Engineering, Case Western Reserve
University, Cleveland, OH.
Liu, F.M., Chen, Y.B., Liu, J. and Ni, Y.L. (1993), “Construction materials selection and
characteristics of Wan An Xi concrete faced rockfill dam”, High Earth-Rockfill Dams,
Beijing, Vol. 1, pp. 272-85.
McCombie, P. and Wilkinson, P. (2002), “The use of the simple genetic algorithm in finding
the critical factor of safety in slope stability analysis”, Computers and Geotechnics, Vol. 29
No. 8, pp. 699-714.
Majdi, A. and Beiki, M. (2009), “Evolving neural network using a genetic algorithm for predicting
the deformation modulus of rock masses”, International Journal of Rock Mechanics and
Mining Sciences, Vol. 47 No. 2, pp. 246-53.
Masters, T. (1993), Practical Neural Network Recipes in C++, Academic Press, San Diego, CA.
Miller, J. and Thomson, P. (2002), “Cartesian genetic programming”, in Poli, R., Banzhaf, W.,
Langdon, B., Miller, J., Nordin, P. and Fogarty, T.C. (Eds), Genetic Programming, Springer,
Berlin.
Narendra, B.S., Sivapullaiah, P.V., Suresh, S. and Omkar, S.N. (2006), “Prediction of unconfined
compressive strength of soft grounds using computational intelligence techniques:
a comparative study”, Computers and Geotechnics, Vol. 33, pp. 196-208.
Nash, D. (1987), “A comparative review of limit equilibrium methods of stability analysis”,
in Anderson, M.G. and Richards, K.S. (Eds), Slope Stability for Geotechnical Engineering
and Geomorphology, Wiley, New York, NY, pp. 11-75.
NCEER (1997), “Evaluation of liquefaction resistance of soils”, in Youd, T.L. and Idriss, I.M.
(Eds), Technical Report NCEER-97-0022, National Center for Earthquake Engineering
Research, State University of New York, New York, NY.
Neaupane, K.M. and Achet, S.H. (2004), “Use of backpropagation neural network for landslide
monitoring: a case study in the higher Himalaya”, Engineering Geology, Vol. 74, pp. 213-26.
Neaupane, K.M. and Adhikari, N.R. (2006), “Prediction of tunneling-induced ground movement
with the multi-layer perceptron”, Tunnelling and Underground Space Technology, Vol. 21,
pp. 151-9.
NRC (1985), Liquefaction of Soils During Earthquakes, Committee on Earthquake Engineering,
Commission on Engineering and Technical Systems, National Research Council, National
Academy Press, Washington, DC, p. 240.
Oltean, M. (2004), “Multi expression programming source code”, available at: http://mep.cs.
ubbcluj.ro/
Oltean, M. and Dumitrescu, D. (2002), “Multi expression programming”, Technical Report,
UBB-01-2002, Babeş-Bolyai University, Cluj-Napoca.
Oltean, M. and Groşan, C. (2003a), “A comparison of several linear genetic programming
techniques”, Advances in Complex Systems, Vol. 14 No. 4, pp. 1-29.
Oltean, M. and Groşan, C. (2003b), “Evolving evolutionary algorithms using multi
expression programming”, Artificial Life, LNAI 2801, Springer, Berlin, pp. 651-8.
Oltean, M. and Groşan, C. (2003c), “Solving classification problems using infix form genetic
programming”, in Berthold, M. (Ed.), Intelligent Data Analysis, LNCS 2810, Springer,
Berlin, pp. 242-52.
Pal, M. (2006), “Support vector machines-based modelling of seismic liquefaction potential”,
International Journal for Numerical and Analytical Methods in Geomechanics, Vol. 30,
pp. 983-96.
Pal, S., Wije Wathugala, G. and Kundu, S. (1996), “Calibration of a constitutive model using
genetic algorithms”, Computers and Geotechnics, Vol. 19 No. 4, pp. 325-48.
Patterson, N. (2002), “Genetic programming with context-sensitive grammars”, PhD thesis,
School of Computer Science, University of Scotland, London.
Peck, R.B. (1969), “Deep excavation and tunneling in soft ground”, Proceedings of the International
Conference in Soil Mechanics and Foundation Engineering, Mexico City, pp. 225-90.
Poli, R., Langdon, W.B., McPhee, N.F. and Koza, J.R. (2007), “Genetic programming:
an introductory tutorial and a survey of techniques and applications”, Technical Report
[CES-475], University of Essex, Colchester.
Rezania, M. and Javadi, A.A. (2007), “A new genetic programming model for predicting settlement of
shallow foundations”, Canadian Geotechnical Journal, Vol. 44 No. 12, pp. 1462-73.
Rumelhart, D.E. and McClelland, J. (1986), Parallel Distributed Processing: Explorations in
Microstructure of Cognition, Massachusetts Institute of Technology Press, Cambridge,
MA, pp. 11-30.
Seed, H.B. and Idriss, I.M. (1971), “Simplified procedure for evaluating soil liquefaction
potential”, Journal of the Soil Mechanics and Foundations Division (ASCE), Vol. 97, SM8,
pp. 1249-74.
Seed, R.B., Cetin, K.O., Moss, R.E.S., Kammerer, A., Wu, J., Pestana, J.M., Riemer, M.F.,
Sancio, R.B., Bray, J.D., Kayen, R.E. and Faris, A. (2003), “Recent advances in soil
liquefaction engineering: a unified and consistent framework”, Keynote Address,
26th Annual Geotechnical Spring Seminar, Los Angeles Section of the GeoInstitute,
Los Angeles, CA.
Shahin, M.A., Jaksa, M.B. and Maier, H.R. (2008), “State of the art of artificial neural networks in
geotechnical engineering”, Electronic Journal of Geotechnical Engineering, Vol. 8, pp. 1-26,
available at: www.ejge.com/Bouquet08/shahin
Shahin, M.A., Jaksa, M.B. and Maier, H.R. (2009), “Recent advances and future challenges for
artificial neural systems in geotechnical engineering applications”, Advances in Artificial
Neural Systems, Vol. 2009, p. 9.
Shahin, M.A., Maier, H.R. and Jaksa, M.B. (2001), “Artificial neural network applications
in geotechnical engineering”, Australian Geomechanics, Vol. 36 No. 1, pp. 49-62.
Shin, H.S. and Pande, G.N. (2000), “On self-learning finite element code based on monitored
response of structures”, Computers and Geotechnics, Vol. 27, pp. 161-78.
Simpson, A.R. and Priest, S.D. (1993), “The application of genetic algorithms to optimisation
problems in geotechnics”, Computers and Geotechnics, Vol. 15 No. 1, pp. 1-19.
Thongyot, T. (1995), “Ground movement associated with 11 km water transmission bored tunnel
in Bangkok subsoil”, Masters thesis (GE-95-7), Asian Institute of Technology (AIT),
Thailand.
Torres, R.S., Falcão, A.X., Gonçalves, M.A., Papa, J.P., Zhang, B., Fan, W. and Fox, E.A. (2009),
“A genetic programming framework for content-based image retrieval”, Pattern
Recognition, Vol. 42 No. 2, pp. 283-92.
Wang, H.B., Xu, W.Y. and Xu, R.C. (2005), “Slope stability evaluation using back propagation
neural networks”, Engineering Geology, Vol. 80, pp. 302-15.
Whitman, R.V. (1971), “Resistance of soil to liquefaction and settlement”, Soils and Foundations,
Vol. 11 No. 4, pp. 59-68.
Yang, C.X., Tham, L.G., Feng, X.T., Wang, Y.J. and Lee, P.K.K. (2004), “Two-stepped
evolutionary algorithm and its application to stability analysis of slopes”, Journal of
Computing in Civil Engineering (ASCE), Vol. 18 No. 2, pp. 145-53.
Yoshikoshi, W., Osamu, W. and Takagaki, N. (1978), “Prediction of ground settlements
associated with shield tunneling”, Soils and Foundations, Vol. 18 No. 4, pp. 47-59.
Corresponding author
Amir Hossein Alavi can be contacted at: ah_alavi@hotmail.com