A thesis presented to
the faculty of
In partial fulfillment
Master of Science
Vishal V. Ghai
March 2006
This thesis entitled
by
Vishal V. Ghai
Gary R. Weckman
Dennis Irwin
Engineering
Knowledge Based Approach Using Neural Networks For Predicting Corrosion Rate
(135 pp.)
A number of CO2 corrosion models for the oil and gas industry exist. However, these
models lag significantly behind the needs of the industry. There is still a large knowledge
gap between the actual processes occurring in the field and the current mechanistic and
empirical models. The complexity of the phenomena is often such that our understanding is significantly lower than the level
required for mechanistic modeling. There is a need to develop a model that would
have both the capability to predict the CO2 corrosion rate with high accuracy, as well as
provide knowledge that would aid the understanding of the phenomena. This thesis
focuses on the development of an Artificial Neural Network model based on CO2 field
data used in predicting the corrosion rate of carbon steel. Further, rules are extracted
from the trained network using a TREPAN decision tree algorithm to translate the
hypothesis learnt into symbolic form. Network model performance is then evaluated by
comparing it to a linear regression model using MINITAB. The efficacy of the rule set is
then compared to the C4.5 machine learning algorithm. The interrelationship of input
variables is discussed based on the constructed network model and the generated rule set.
Approved:
Gary R. Weckman
TABLE OF CONTENTS
Abstract ............................................................................................................................... 3
Table of Contents................................................................................................................ 5
CHAPTER 1. Introduction................................................................................................ 13
1.6 Thesis Structure...................................................................................................... 19
2.5.2 Perturb Method................................................................................................. 44
3.3 Semi-Empirical Models.......................................................................................... 68
CHAPTER 4. Methodology.............................................................................................. 77
4.3 Sensitivity Analysis................................................................................................ 83
References....................................................................................................................... 127
LIST OF FIGURES
Figure 4.8 Separate Sensitivity for Total Acid Number (TAN) ....................................... 88
Figure 4.14 Sensitivity About the Mean for 1% Crude Oil Concentration ...................... 92
Figure 4.15 Sensitivity About the Mean for 20% Crude Oil Concentration .................... 92
Figure 4.16 Sensitivity About the Mean for 50% Crude Oil Concentration .................... 93
Figure 4.17 Sensitivity About the Mean for 80% Crude Oil Concentration .................... 93
Figure 4.18 Sensitivity About the Mean for 1% and 20% Combined .............................. 94
Figure 4.19 Sensitivity About the Mean for 50% and 80% Combined ............................ 95
Figure 4.20 Relationship Between % Crude Oil and Ni at Constant Inhibition Output... 96
Figure 4.21 Relationship Between % Crude Oil and TAN at Constant Inhibition Output ....... 97
Figure 4.22 Relationship Between API and Aromatics at Constant Inhibition Output .... 97
Figure 4.23 Relationship Curves Between % Crude Oil, Ni and TAN at Constant Inhibition ..... 98
Figure 4.27 Partially Expanded View of Trepan_tree Extracted From the UNEVEN Variables ... 101
Figure 5.1 Model Performance on Training vs. Test Data. ............................................ 111
LIST OF TABLES
Table 5.1 Prediction Accuracy of the 11-6-6-1 MLP Network. ..................................... 110
Table 5.2 Number of Features in the Rule Antecedent for the NN-Rule Set ................. 117
CHAPTER 1. INTRODUCTION
CO2 corrosion of carbon steel is an electrochemical process where iron is dissolved at the anode and hydrogen is evolved at the cathode.
This chemical reaction results in the formation of solid FeCO3 films. Depending on the
conditions during formation, these films can be protective or non-protective. One of the
main reactions involved is the anodic dissolution of iron:
Fe → Fe2+ + 2e−    (1.2)
The presence of CO2 acts as a catalyst increasing the hydrogen evolution, thereby
increasing the corrosion rate of carbon steel in aqueous solution. Even at pH > 5 the
hydrogen evolution increases in the presence of H2CO3. Some researchers
[1], [2] assume that H2CO3 either serves as an extra source of H+ ions or is
reduced directly. It has also been assumed [2], [3] that these two reactions are
independent of each other and that the total cathodic current is the aggregate of the currents
from both:
2H+ + 2e− → H2    (1.3)
For more details on CO2 corrosion, refer to a number of publications covering this
field [1]-[8]. Particular attention is drawn to the recent reviews of the main design
considerations [9] and of prediction techniques related to CO2 corrosion [10].
The majority of oil and gas pipelines are made of carbon steel. Pipelines, like
other structures in nature, deteriorate over time. This deterioration in metallic pipeline
usually occurs as a result of the damaging effects of the surrounding environment. For
carbon steel, one of the most dominant forms of such deterioration is corrosion. The
corrosion problem is a major concern and becomes critical as a pipeline ages. Pipeline
operators throughout the world are confronted with the expensive and risky task of
operating aged pipelines because of corrosion and its potential damaging effects. The
major effect of corrosion is the loss of metal cross-section. This results in a reduction of
the pipeline’s carrying capacity and its safety. For a pipeline carrying live corrosion
defects, the major concern for the operator is the need to have a simple and quick
technique which can be used to analyze the rate of corrosion when a particular type of oil
is flowing into the pipeline. This information can be used to evaluate the pipeline’s
current reliability, and the time-dependent changes in it. This would help in determining
the effective safe-life of the pipeline, and an estimate of the time when the pipeline needs
to be changed. Changing a pipeline in the oil and gas industry is a very time-consuming and expensive process.
The role of crude oil in CO2 corrosion has gained special attention in the last few
years due to its significance when predicting or modeling corrosion rates. Modeling the
effect of crude oil in CO2 corrosion is not an easy task. Though many researchers have
worked in this area, the complexity of and variations in the constituents of different crude oils
make it difficult to model their effects (properties such as wettability and corrosivity) on
carbon steel.
Efird [11] stressed the importance of testing the effect of specific crude oils and
including this in corrosion prediction and testing. He also introduced the definition of
Corrosion Rate Break as the level of produced water in crude oil production where
corrosion is accelerated and becomes a problem. Smart [12] in 1993 presented work
showing that crude oils have surface-active compounds (polar compounds containing
oxygen, nitrogen and sulfur) that strongly affect the wettability properties of brines.
Hernández et al. [13] provided insight into the variables in crude oil
composition that could be playing a major role in the inhibition offered by crude oils.
For years, researchers have presented various approaches detailing the process of
corrosion. The task of corrosion prediction has been identified as a key approach in
utilizing the knowledge of the corrosion process and applying it to industrial corrosion
related problems. Many corrosion models have been developed over the years. These
models can be categorized into three main categories: empirical, semi-empirical and
mechanistic models, based on how firmly they are grounded in theory. These models
predict the corrosion rate with sufficient accuracy, but provide little insight into the
corrosion process. It is also important to note that some of these models are so complex
that one needs a thorough understanding of the thermodynamic and electro-chemical
processes occurring during corrosion. Everyday industrial application calls for a
corrosion model which is relatively easy to use, has high prediction accuracy, aids in
understanding the interrelationship between the variables affecting the corrosion rate, and
can be interpreted without the need of extensive chemical and thermodynamic knowledge
of the corrosion process.
The success of a good model is based primarily on the consistency of a good data
set [14]. Corrosion data is generally expensive to produce, and large corrosion data sets
with sufficient consistency are difficult to find. The poor quality of the data may be due to:
• Errors in the data arising from poor experiment design, faulty equipment or
miscalculations.
• The summarization of data, e.g. by plotting lines without the data points on which
the lines are based, seriously limits the use of such data for further analysis.
The present empirical, semi-empirical and mechanistic models lack high accuracy
mainly due to their inability to model the corrosion process in the absence of large
consistent data sets. This leads to the necessity of developing a more robust model which
is able to predict the corrosion rate with high accuracy even in the presence of a limited and noisy data set.
1.5 CURRENT RESEARCH
• Developing a robust prediction model, capable of handling limited noisy data with
high accuracy.
• A model which can serve as a hybrid to the current mechanistic and empirical models.
This research utilizes Artificial Neural Networks (ANNs), an Artificial
Intelligence approach, for modeling the corrosion rate. ANNs are being recognized as a
powerful and general technique for machine learning because of their non-linear
modeling abilities. Further, their distributed architecture is more robust in handling the
noise-ridden data. The hypothesis or model learned by the neural network is not explicitly
stated, but is implicitly encoded in the network architecture and weights. However, ANNs can be
made more comprehensible through rule extraction. The objectives of this research are:
1. To construct an ANN which can be used to predict the corrosion rate in carbon
steel.
3. To understand the interrelationship between the input variables and the impact they have on the corrosion rate.
5. To compare the prediction accuracy of the trained ANN model with the extracted rule set.
1.6 THESIS STRUCTURE
The thesis is organized into six chapters as follows: Chapter 1 introduces the
corrosion problem and explains the motivation of the current research. Chapter 2
provides background material for the various soft computing methods utilized in this
thesis. Chapter 3 undertakes a survey of the various approaches to solve the corrosion
prediction problem. Chapter 4 presents the methods and tools used in this work to
achieve the research objectives and focuses on the development of the neural network
model and implementation of rule extraction methods. Chapter 5 discusses the results of
the various techniques employed for the analysis. This chapter also describes in brief the
statistical and machine learning methodology used for analysis and comparison of the
results of the current research. Chapter 6 provides conclusions and suggestions for future
research.
Soft Computing (SC) exploits the tolerance for imprecision and uncertainty. The adoption of this approach has led to the development of systems that
have high MIQ (Machine Intelligence Quotient) [15]. SC-methodologies have proven to
be more successful than classical modeling, reasoning and search techniques in a wide
variety of problem domains with the following characteristics:
• Modeling difficulties: Generally, real world problems are poorly defined and difficult to model precisely.
• Large-scale solution spaces: Problems with large-scale solution spaces are usually
intractable; the search effort is huge, and deterministic search does not employ mechanisms for
coping with spaces of this scale.
• Imprecise and uncertain information: In many real world problems,
crisp classifications and unambiguous definitions are not always possible. Also, in
some cases, there is a need to directly acquire knowledge from problem data.
SC-methodologies exploit the above described problem characteristics to yield tractable and robust intelligent systems at low
solution cost. The discipline of SC encompasses several paradigms such as fuzzy set
theory, neural networks, approximate reasoning, and stochastic optimization methods like
genetic algorithms, simulated annealing and machine learning techniques. SC unites these
paradigms and enables the construction of innovative hybrid intelligent systems. The key strengths of the constituent paradigms include:
• Fuzzy Set Theory allows for imprecise knowledge representation in the form of linguistic if-then rules.
• Neural Networks exhibit learning and adaptive behavior with non-linear modeling
capabilities.
• Genetic Algorithms provide systematic global search of the solution space and are robust in finding near-optimal solutions.
These methodologies have been successfully applied to
several real world problems in robotics, space flight, process control, production and
aerospace applications [17]. The remaining sections of this chapter provide the necessary
background material for the SC-methodologies utilized in the current research effort.
2.2 GENETIC ALGORITHMS
Genetic Algorithms (GAs), first proposed by John Holland [18], are stochastic
search techniques used for optimization problems. The basic methodology is rooted in the
principles of natural selection and evolution. As optimization tools, GAs offer several unique advantages over conventional optimization
techniques. They combine elements of directed and stochastic search methods, striking a
good balance between the exploration and exploitation of the solution space [19]. One of
their key advantages is that derivative information is not required for determining the search direction in these algorithms. This characteristic
makes them a flexible tool for optimizing a large number of objective functions, including
those that are discontinuous or non-differentiable.
Genetic Algorithms work with solution populations rather than single members, making the search more robust.
The solution or search space contains all feasible solutions. Each point in this
space, called a chromosome, has an associated fitness value that usually equals the
value of the objective function at that point. The algorithm starts with an initial population of chromosomes,
which is repeatedly evolved over generations towards better fitness. The next generation
is created from the current population by using genetic operators like crossover and
mutation. The chromosomes with a higher fitness value survive and participate in the
creation of new populations. This ensures that successful chromosomes pass their good
genes to the next generation. The population continuously evolves toward better fitness,
and the algorithm converges to the best chromosome after several generations.
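The generational loop just described (fitness-based survival, crossover, mutation) can be sketched in a few lines of Python. This is an illustrative toy, not code from the thesis: it maximizes the number of 1-bits in a bitstring (the classic OneMax problem), and the population size, mutation rate, and generation count are arbitrary choices.

```python
import random

random.seed(42)

def fitness(chromosome):
    # OneMax: fitness is the number of 1-bits in the chromosome.
    return sum(chromosome)

def crossover(a, b):
    # Single-point crossover between two parent chromosomes.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.01):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]

def evolve(pop_size=30, length=20, generations=60):
    # Initial random population of bitstrings.
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Fitter chromosomes survive and participate in creating offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best))  # converges toward the all-ones optimum (fitness 20)
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases, which mirrors the survival-of-the-fittest argument in the text.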
2.3 MACHINE LEARNING
The ability to learn from examples and construct a model of the world is the
foundation of biological intelligence. This model of the world, often implicit, allows us to
adapt to a dynamically changing environment and is necessary for our survival. Artificial
Intelligence (AI) aims at constructing artifacts (machines, programs) that have the
capability to learn, adapt and exhibit human-like intelligence. Hence, learning is the key
for practical applications of AI. The field of machine learning is the study of methods for
programming computers to learn [20]. Many important algorithms have been developed
and successfully applied to diverse learning tasks such as speech recognition and game
playing. The learning system is presented with examples in some
format, referred to as a training set, or input. The system then generates a model or
hypothesis (a decision tree, a rule set, a trained network,
etc). The model is evaluated based on its ability to correctly generalize to examples not
seen during training. Machine learning can also be used
to gain insight into the problem domain, laying a special emphasis on the criterion of
comprehensibility. It refers to the ease of understanding the model by a human user and
serves the purpose of validation, knowledge discovery and refinement. Fayyad et al. [22]
contributed to the development of the area of knowledge discovery in databases and data mining.
Although a wide choice of machine learning schemes is available, they differ
in their capabilities and requirements. Evaluating them typically involves applying
different learning algorithms and evaluating the induced model in terms of predictive
accuracy and comprehensibility. This research employs two symbolic machine learning
methods, decision tree induction and attribute oriented induction. Neural networks, the
main class of non-symbolic machine learning tools used in this research, are covered in a
later section.
Decision trees are among the most popular symbolic machine learning algorithms.
They express the learned hypothesis or target function using a unique representation
format known as a decision tree. Decision trees can easily be compiled into simple if-then
rules for improving human comprehensibility. Decision tree learning has been applied in a number of
domains, from diagnosis of medical cases to learning to assess the credit risk of loan
applicants [23]. The example shown in Table 2.1 [25] depicts the training data for
the target concept, PlayTennis. The corresponding decision tree classifies Saturday mornings as suitable or unsuitable for playing tennis.
Each node in the tree, indicated by an oval, specifies a logical test based on some
attribute or feature in the problem. The tree has a root node, outlook, having three
possible attribute values: sunny, overcast and rain. Each of the outgoing branches from a
node corresponds to one of the possible values of the attribute. Hence, the root node has
three branches. A tree also has a set of leaf nodes, which represent the outcome of the
classifier (i.e., decision to play tennis or not). The classification of an instance involves
traversal through the tree starting at the root node until a leaf node is encountered. The
example instance (outlook = sunny, humidity = high) would follow the leftmost branch of the tree.
[Figure: The PlayTennis decision tree, with root node Outlook and leaf nodes No, Yes, No, Yes.]
The tree predicts the target concept PlayTennis = no, indicating unsuitable
weather conditions for playing tennis. Also, the attribute temperature was not utilized in classifying this instance.
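As noted above, a decision tree compiles directly into simple if-then rules. A hypothetical Python sketch of the PlayTennis tree follows; the Humidity test under "sunny" matches the example instance in the text, while the Wind test under "rain" is an assumption borrowed from the standard version of this example, since the excerpt only shows the humidity branch:

```python
def classify(outlook, humidity=None, wind=None):
    # Hand-compiled if-then rules for the PlayTennis tree:
    # root node Outlook, with Humidity tested under "sunny"
    # and Wind (assumed) tested under "rain".
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    if outlook == "rain":
        return "no" if wind == "strong" else "yes"
    raise ValueError("unknown outlook value")

# The example instance from the text follows the leftmost branch.
print(classify("sunny", humidity="high"))  # -> "no"
```

Note that, just as the text observes, the attribute temperature never appears in the rules: classification only consults the attributes tested on the path actually traversed.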
Many decision tree induction methods have been developed in the last two decades
with different capabilities and requirements. The ID3 algorithm is the core algorithm on
which many variants have been developed. The algorithm constructs a decision tree in a
greedy, top-down fashion, the most common strategy for decision tree induction. ID3 uses a statistical property called information gain to select the splitting attribute at each node.
Let S denote the set of training instances. The information gain, InfoGain(T), of an
attribute T is the difference between info(S), the information needed to classify an
instance in S, and infoT(S), the corresponding measure after partitioning the set S based
on attribute T. If the set has k possible partitions for k classes, the information content is:

info(S) = − Σ (j = 1…k) [freq(Cj, S) / |S|] × log2 [freq(Cj, S) / |S|]    (2.2)
Here freq(Cj, S) is the number of examples of class Cj and j ranges over k classes.
Given a partition based on attribute T, the expected value of the information over the induced subsets is:

infoT(S) = Σ (i = 1…n) [|Si| / |S|] × info(Si)    (2.3)

In the above expression, Si is the subset of examples in S having the ith outcome of the test on attribute T.
The procedure of selecting a splitting attribute and partitioning the training instance
set is done recursively for each internal node. Only the examples that reach that node
(i.e., the examples that satisfy the logical tests on the path) are used in attribute selection. The recursion terminates when:
1. There are no remaining attributes to partition on, or
2. The training instances at a given node belong to the same class. If so, the node is made a leaf node labeled with that class.
The major flaw with the ID3 algorithm's gain criterion is that it has a
strong bias towards tests with more outcomes [104]. Let us consider a hypothetical
medical diagnosis task in which one of the attributes contains a patient identification.
Since every such identification is intended to be unique, partitioning any of the training
cases on the values of this attribute will lead to a large number of subsets, each
containing just one case. Since all of these one-case subsets necessarily contain cases of a
single class, InfoT (S)=0, so information gain from using this attribute to partition the set
of training cases is maximal. From the point of view of prediction, however, such a
partition is of little use.
The bias inherent in the gain criterion was later rectified in the C4.5 algorithm [25]
by employing the gain ratio. Let us take the scenario where the information about a case
indicates the outcome of a test rather than the class to which the case
belongs. By analogy with Equation 2.2, the split information is:

split info(S) = − Σ (i = 1…n) [|Ti| / |T|] × log2 [|Ti| / |T|]    (2.4)
Split info(S) represents the potential information generated
by dividing T into n subsets, while info(S) provides knowledge about the classes that are
generated by the division. The gain ratio (Equation 2.5) divides the information gain by this split information, representing the proportion of the information generated by the split that is useful for classification.
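The quantities of Equations 2.2–2.4 can be sketched directly in Python. The toy data set below is hypothetical; it reproduces the patient-identification pathology discussed above, where a unique-ID attribute achieves maximal information gain but a much lower gain ratio:

```python
from collections import Counter
from math import log2

def info(labels):
    # Information content (entropy) of a class-label list, per Equation 2.2.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    # InfoGain(T) = info(S) - infoT(S), with infoT(S) per Equation 2.3.
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    info_t = sum(len(part) / n * info(part) for part in partitions.values())
    return info(labels) - info_t

def split_info(rows, attr_index):
    # Equation 2.4: information generated by the partition itself.
    n = len(rows)
    counts = Counter(row[attr_index] for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(rows, labels, attr_index):
    # C4.5's gain ratio corrects ID3's bias toward many-valued attributes.
    return info_gain(rows, labels, attr_index) / split_info(rows, attr_index)

# Hypothetical data: attribute 0 is a unique patient ID, attribute 1 is weather.
rows = [("id1", "sunny"), ("id2", "sunny"), ("id3", "rain"), ("id4", "rain")]
labels = ["yes", "no", "yes", "no"]
print(info_gain(rows, labels, 0))   # 1.0: each ID isolates one case (maximal)
print(gain_ratio(rows, labels, 0))  # 0.5: the split information penalizes it
print(info_gain(rows, labels, 1))   # 0.0: the weather attribute is uninformative here
```

The ID attribute yields single-case subsets with infoT(S) = 0, exactly the pathology described in the text, and the gain ratio demotes it.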
2.3.2 ATTRIBUTE-ORIENTED INDUCTION
When the training set for learning is provided as a database, the task of inducing a
hypothesis describing the data is called data mining. In real world applications, databases
are predominantly used for representing and maintaining information. Often, the
information is enormous, noisy, uncertain, and can involve missing values. A growing
need for knowledge discovery in databases led to the rapid development and adaptation
of techniques such as Attribute-Oriented Induction (AOI), which applies generalization
operations to discover rules. Figure 2.2 shows the inputs and output of the AOI method.
[Figure 2.2: The AOI method takes database queries, a list of attributes and a concept hierarchy as inputs and produces a generalized relation.]
Concept hierarchies are given by the experts or automatically generated by data analysis [27]. AOI is capable of
utilizing these concept hierarchies to generate logical rules. The following are some of the key operations in AOI:
• Vote Aggregation: The number of identical tuples being merged during the tree
ascension is recorded as a vote, representing
the number of tuples in the initial relation that are generalized to the current
relation.
• Rule Transformation: The obtained final relation is transformed into a logical rule.
The two main AOI learning tasks are learning
characteristic rules (LCHR) and learning classification rules (LCLR) [28]. Their
induction procedures are similar, differing in the attribute generalization process. Further
details about the AOI methodology can be found in [26] and [28].
2.4.1 NEURAL COMPUTATION
The motivation for the early development of neural networks stemmed from the
desire to mimic the functionality of the human brain. A neural network is an intelligent
data-driven modeling tool that is able to capture and represent complex and non-linear
input/output relationships. Neural networks are used in many important applications, such as
prediction, optimization and noise-filtering. They are used in many commercial products such as
data mining, knowledge acquisition systems and medical instrumentation [29].
A neural network consists of many layers of nodes. These nodes are linked by
connections, with each connection having an associated weight, Wi. The weight of a
connection is a measure of its strength and its sign is indicative of the excitation or
inhibition potential. Figure 2.3 shows a simple perceptron having n inputs, {X1, X2… Xi…
Xn}.
[Figure 2.3: A simple perceptron with inputs X1…Xn, weights W1…Wn, and output f(Σ XiWi) − θ.]
The perceptron has a threshold or bias, θ, which is the value of the net input
required to produce non-zero activation. The net input to a perceptron, neti, is given by the weighted sum of its inputs, neti = Σ XiWi.
A transfer function, f, maps the net input to a range, O, which is the activation or output of the node.
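The perceptron computation just described — a weighted sum of the inputs reduced by the threshold θ, passed through a transfer function — can be sketched as follows. The logistic transfer function of Equation 2.8 is used, and the input and weight values are arbitrary illustrations:

```python
from math import exp

def logistic(net):
    # Logistic transfer function, output range [0, 1] (Equation 2.8).
    return 1.0 / (1.0 + exp(-net))

def perceptron(inputs, weights, theta):
    # Net input: weighted sum of inputs, reduced by the threshold/bias theta.
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return logistic(net)

# With these values the net input is 1*0.5 + 2*0.25 - 1.0 = 0,
# and the logistic function maps a net input of 0 to exactly 0.5.
print(perceptron([1.0, 2.0], [0.5, 0.25], theta=1.0))  # -> 0.5
```

The sign of each weight determines whether its input excites or inhibits the node, matching the description of the connection weights above.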
Neural networks have two distinct phases of operation: training and production.
Some design parameters need to be chosen before training the network. These include:
• Training Algorithm: The training algorithm and the performance measure or the
cost function.
Parameters like weights and biases are modified during the training phase. The
network uses problem data to assign values to these parameters. The distinguishing feature of the training phase is its performance-feedback
information flow design, as depicted in Figure 2.4 (adapted from Principe et al. [30]).
The performance feedback loop utilizes a cost function to provide a measure of deviation
between the calculated output and the desired output. This performance feedback is
utilized directly to adapt the parameters, W and θ, so that the system output improves.
[Figure 2.4: Training-phase information flow: the training algorithm uses the error between the desired and calculated outputs to adapt the network parameters.]
A Multilayer Perceptron (MLP) arranges its nodes in
layers. A single hidden layer network is illustrated in Figure 2.5. The input layer contains
nodes that represent the inputs of the given problem. Each input is represented by a single
node in the input layer. The hidden layer maps the input to an intermediate space, which is then mapped to the output space.
[Figure 2.5: A single-hidden-layer MLP with input layer X1–X4, one hidden layer, and output layer Y1, Y2.]
The output layer represents the response/output. The output node, as shown in
Figure 2.5, allows us to determine the response/output from the input variables. MLPs
have been proven to be universal approximators [31], capable of approximating any given
continuous function. This is only possible with the choice of non-linear transfer functions. Two of
the most commonly used functions are the logistic function and hyperbolic tangent
function. The two functions differ in the range of their output values.
The logistic function has an output range [0, 1], and the activation of a node, ai, is
given by:
ai = 1 / (1 + e^(−net input))    (2.8)
The hyperbolic tangent function compresses a unit's net input into an activation in the range [−1, 1].
The training phase in neural networks provides the answer to the following
questions: Is there a set of network parameters (weights and biases) that allows a network
to map a given set of input patterns to the desired outputs? If so, how are the parameters
obtained? The back-propagation algorithm takes its name from
the direction of propagation of error. The training regimen adjusts the weights and biases
of the network to minimize the cost function. Though several cost functions are available,
the function appropriate for prediction problems is the cross-entropy function [33]:

E = − Σp Σi tpi log(ypi)
In the above equation, E is the cross entropy, p indexes the training
patterns and i indexes the classes. The term ypi is the estimated probability
that an input pattern belongs to class i, and tpi is the target with the range [0, 1]. Network
output is interpreted as the probability that the given input pattern belongs to a certain
class.
The cost function, E, needs to be minimized, and its derivative with respect to each
weight is calculated and denoted by ∂E/∂w. Having obtained the derivative, the weight adjustment is:
Weight Update: Δwij = −η ∂E/∂wij    (2.11)
In this equation, wij represents the weight passing from node i to node j, η > 0
represents the learning rate, and ∂E/∂wij is the partial derivative of the error E with
respect to wij. In the initial phase, random weights are assigned to the network, and the
training algorithm modifies these weights according to the procedure discussed above.
Many alternative optimization techniques have been utilized; variations of the basic
method include methods like the conjugate-gradient method, momentum learning, etc.
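The weight-update rule of Equation 2.11 can be illustrated on the smallest possible case: one input feeding one logistic output node with a single weight and no bias. This simplified setting is an assumption for illustration only (with a single binary output the cross-entropy takes its two-term form, and the gradient has a simple closed form), not the network used in the thesis:

```python
from math import exp, log

def logistic(net):
    return 1.0 / (1.0 + exp(-net))

def cross_entropy(y, t):
    # Binary cross-entropy summed over patterns:
    # E = -sum[t*log(y) + (1 - t)*log(1 - y)].
    return -sum(t_p * log(y_p) + (1 - t_p) * log(1 - y_p)
                for y_p, t_p in zip(y, t))

def train_single_weight(xs, ts, eta=0.5, epochs=200):
    # For one input, one logistic output and no bias, the gradient of the
    # cross-entropy w.r.t. the weight simplifies to sum((y - t) * x).
    w = 0.0
    for _ in range(epochs):
        ys = [logistic(w * x) for x in xs]
        grad = sum((y - t) * x for y, t, x in zip(ys, ts, xs))
        w -= eta * grad   # Equation 2.11: delta_w = -eta * dE/dw
    return w

# Training patterns: negative inputs -> class 0, positive inputs -> class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ts = [0.0, 0.0, 1.0, 1.0]
w = train_single_weight(xs, ts)
ys = [logistic(w * x) for x in xs]
print(w > 0, cross_entropy(ys, ts) < 0.5)  # -> True True
```

Each epoch moves the weight a step of size η against the error gradient, so the cost decreases steadily, exactly the minimization behavior the text describes.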
Stochastic search algorithms such as genetic algorithms have been applied to avoid the
problem of convergence to poor local minima.
2.4.4 GENERALIZATION CONSIDERATION
The collection of input pattern-desired response pairs used to train the learning
system is called the training set. The testing set contains examples not used for the
training purpose and it is used to evaluate the generalization capabilities of the network.
Vapnik [36] indicates that the performance of a network trained with back-propagation
on the training set always improves with the number of training cycles. However, the error on the testing set
initially decreases with the number of cycles, and then increases as shown in Figure 2.7
[Figure 2.7: Prediction error vs. number of training cycles: the training-set error keeps decreasing, while the testing-set error reaches a minimum at the stopping point and then rises.]
This overtraining degrades the network's generalization
capabilities. One solution to this problem is to split the training set into two sets – the
training set and the validation set. After every fixed number of iterations, the error on the
validation set is calculated. Training is terminated when this error starts to increase. This technique is known as early stopping.
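The validation-based stopping rule described above can be sketched generically. The `patience` window and the simulated validation-error curve below are illustrative assumptions; in practice `train_step` and `validation_error` would wrap the actual network and held-out data:

```python
def train_with_early_stopping(train_step, validation_error, patience=5,
                              max_epochs=500):
    # train_step() performs one epoch of weight updates; validation_error()
    # measures error on the held-out validation set after that epoch.
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        error = validation_error()
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            # Stop once validation error has not improved for `patience` epochs.
            if epochs_without_improvement >= patience:
                return epoch + 1, best_error
    return max_epochs, best_error

# Simulated validation-error curve: decreases, then rises (overtraining).
errors = iter([0.9, 0.7, 0.5, 0.45, 0.44, 0.46, 0.48, 0.50, 0.52, 0.55])
stopped_at, best = train_with_early_stopping(lambda: None, lambda: next(errors),
                                             patience=3)
print(stopped_at, best)  # -> 8 0.44
```

Training halts shortly after the simulated curve turns upward, and the minimum validation error (0.44, at the stopping point of Figure 2.7's schematic) is retained.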
Once the neural network has been trained on a specific network topology, the next
step in modeling the process using an ANN involves extracting knowledge from the
trained network, which resides in its connection weights. To
understand the predictive modeling process it is imperative to analyze these weights and
extract information regarding the contribution of the input variables to the final output. Also,
these input variables may not always be independent of each other. The interrelationship
between the input variables significantly affects their contribution towards the final
response/output.
The following sections delineate some of the important methods used to determine the relative contribution of each input variable to the network output.
The Partial Derivative Method (PaD) consists of calculating the partial derivatives
of the response variables with respect to the values of the input variables [37], [38]. The method provides two results:
• A profile of the output variations for small changes in each input variable.
• A classification of the relative contribution of each variable to the network output.
A partial derivative of the output generated by the network is computed w.r.t. the
input to obtain the profile of the variations of the output for a small change in the input
variable [106]. For a network structure consisting of one hidden layer of nh neurons, ni
inputs and a single output variable (i.e. no = 1), the partial derivative of the response
variable yj w.r.t. input xi (with j = 1…N, where N is the total number of observations) is:
dji = Sj Σ (h = 1…nh) who Ihj (1 − Ihj) wih    (2.12)
In this equation Sj denotes the partial derivative of the resulting output neuron with
reference to the input. Ihj is the response of the hth hidden neuron, who and wih are the
weights between the output neuron and the hth hidden neuron, and between the ith input neuron and the hth hidden neuron, respectively.
A set of graphs of the partial derivatives versus each corresponding input variable
can then be plotted, which would enable a visual representation of the effect that
individual input variables have on the network output. The interpretation of such a graph
is that if the partial derivative is negative at a given value of the studied variable, the
output variable tends to decrease as the input variable increases; inversely, if
the partial derivative is positive, the output variable tends to increase as the
input variable increases. The second result of PaD concerns the relative contribution
of each input to the ANN output over the data set, calculated as the sum of the squared partial derivatives:
SSDi = Σ (j = 1…n) (dji)²    (2.13)
One SSD (Sum of Square Derivatives) value is obtained per input variable. The
SSD values allow classification of the variables according to their increasing contribution
to the output variable in the model. The input variable with the highest SSD value has the greatest influence on the model output.
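Equations 2.12 and 2.13 can be sketched for a one-hidden-layer network with logistic units. The weights below are hypothetical, biases are omitted for brevity, and Sj is taken as the derivative of a logistic output unit, out(1 − out) — one common choice; all of these are assumptions for illustration, not values from the thesis:

```python
from math import exp

def logistic(net):
    return 1.0 / (1.0 + exp(-net))

def pad_derivatives(X, w_ih, w_ho):
    # Partial derivative d_ji of the output w.r.t. input i for each pattern j
    # (Equation 2.12), for a bias-free 1-hidden-layer logistic network.
    derivs = []
    for x in X:
        hidden = [logistic(sum(w_ih[h][i] * x[i] for i in range(len(x))))
                  for h in range(len(w_ih))]
        out = logistic(sum(w_ho[h] * hidden[h] for h in range(len(hidden))))
        s_j = out * (1.0 - out)  # Sj: derivative of the logistic output unit
        derivs.append([s_j * sum(w_ho[h] * hidden[h] * (1 - hidden[h]) * w_ih[h][i]
                                 for h in range(len(w_ih)))
                       for i in range(len(x))])
    return derivs

def ssd(derivs):
    # Equation 2.13: one sum-of-squared-derivatives value per input variable.
    return [sum(d[i] ** 2 for d in derivs) for i in range(len(derivs[0]))]

# Hypothetical 2-input, 2-hidden network where input 0 carries larger weights.
w_ih = [[2.0, 0.1], [1.5, 0.2]]   # w_ih[h][i]: input i -> hidden h
w_ho = [1.0, -0.8]                # hidden h -> output
X = [[0.1, 0.5], [0.4, 0.2], [0.9, 0.7]]
contributions = ssd(pad_derivatives(X, w_ih, w_ho))
print(contributions[0] > contributions[1])  # input 0 dominates -> True
```

Ranking the inputs by their SSD values reproduces the classification step described in the text: the heavily weighted input receives the larger contribution.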
2.5.2 PERTURB METHOD
The ‘Perturb’ method corresponds to a perturbation of the input variables [39]. The
method aims to assess the effect of small changes in each input on the neural network
output. The approach is to add noise to a
particular input variable while maintaining the others constant and to record the
corresponding output. The variable with the greatest influence on the network output is
considered to be the most significant to the model. The mean square error (MSE) is
expected to increase as a larger amount of noise is added to the selected input variable
[39], [40]. The input variables can then be classified in order of their
importance.
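A minimal sketch of the Perturb method follows. The "trained model" here is a stand-in analytic function (an assumption for illustration), built so that the first input matters far more than the second; each input is perturbed in turn with Gaussian noise while the others are held fixed, and the resulting rise in MSE ranks the inputs:

```python
import random
from math import sin

random.seed(0)

def model(x):
    # Stand-in for a trained network: strong dependence on x[0], weak on x[1].
    return sin(3.0 * x[0]) + 0.1 * x[1]

def perturb_importance(model, X, targets, noise=0.1, trials=50):
    # For each input variable, add noise to that variable only (others held
    # constant) and record the increase in MSE over the unperturbed baseline.
    def mse(perturbed_index=None):
        total = 0.0
        for x, t in zip(X, targets):
            xp = list(x)
            if perturbed_index is not None:
                xp[perturbed_index] += random.gauss(0.0, noise)
            total += (model(xp) - t) ** 2
        return total / len(X)

    baseline = mse()
    return [sum(mse(i) for _ in range(trials)) / trials - baseline
            for i in range(len(X[0]))]

X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(40)]
targets = [model(x) for x in X]          # zero baseline error by construction
importance = perturb_importance(model, X, targets)
print(importance[0] > importance[1])     # x[0] is the more sensitive input
```

Sorting the inputs by this MSE increase yields exactly the importance ordering the method is meant to produce.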
2.5.3 SENSITIVITY ANALYSIS
Sensitivity analysis [41], [42], [43], [107] extracts the interrelationship between
the input and the output variables of the network. It is important to gather information
regarding the influence of the input variables on the network response during the training
cycle, as this indicates which input channels are most significant. The input space is
then pruned by removing the insignificant channels, which reduces the network size,
simplifies the network complexity and shortens training times.
Sensitivity analysis [108] helps in understanding the influence of each input variable on
the network response. The magnitude of one of the input variables is varied
over its entire range while all the other input variables are held constant at
their mean values. Network learning is disabled during this operation in order to
make sure that the network weights are not affected. The methodology of sensitivity
analysis becomes rapidly complex with an increase in the number of input variables for a
given neural network model. The common approach to simplify the process, as
described by Olden and Jackson [108], is to compute basic summary statistics such as
the minimum, maximum and mean values for each of the variables. The network response is
recorded as the value of the variable is varied over its entire range; this
information helps in understanding the relative contribution of each input variable.
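The sweep described above, one input varied over its observed range while the others stay clamped at their means and learning is disabled, can be sketched as follows. The model object `net` and its `predict` method are illustrative assumptions, not the NeuroSolutions API.

```python
# Sketch of sensitivity analysis: sweep each input over its range while
# holding the other inputs at their mean values, and record the spread of
# the network response. `net` is a hypothetical trained model.
import numpy as np

def sensitivity_sweep(net, X, n_steps=25):
    """Return, per input index, the range of network responses over its sweep."""
    means = X.mean(axis=0)
    sensitivities = {}
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), n_steps)
        probe = np.tile(means, (n_steps, 1))  # all inputs held at their means
        probe[:, j] = grid                    # except the one being varied
        response = net.predict(probe)
        sensitivities[j] = response.max() - response.min()
    return sensitivities
```

Inputs producing a wider response range are the ones the pruning step would retain.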
2. 5. 4 GARSON’S ALGORITHM
The knowledge acquired by a trained network is stored in the form of connection weights.
The input from each input variable is fed to the network model through these weights.
The contribution of each input variable to the output depends mainly on the magnitude
and the direction of the connection weights [48]. As described by Olden and Jackson
[108], a positive connection weight increases the magnitude of the network output
whereas a negative weight inhibits it. Inputs associated with larger weights are
considered to have a greater impact on the network output as compared to the others. The
mapping between the input variables and the predicted response in an MLP is a bi-level
process of information flow, involving weight transfer from the input to the hidden
layer and then from the hidden to the output layer. An important observation [48] is
that when the direction of the connection weight is the same (positive or negative)
between the input-hidden and the hidden-output layers, it has a positive effect on the
network output. Given the significant amount of knowledge that can be extracted by
studying the flow patterns of connection weights, the next step for researchers was to
partition the connection weights so as to establish the relative contribution of each
input variable toward the network output. In 1991 Garson [49] formulated an algorithm
that provides a percentage breakdown of the relative importance of each input variable
in a given network. Further enhancements to this algorithm were later proposed by Goh
[50]. An example for a simple Feed-Forward MLP with two Processing Elements (PEs) is
shown in Figure 2.8.
Step 2: The contribution of each input neuron to the output via each hidden neuron is
computed.
Step 3: The relative contribution of each input neuron to the outgoing signal of each
hidden neuron (RA1) and the sum of the input neuron contributions (S1) are shown in
Table 2.4:
RI1 = S1 / (S1 + S2 + S3) × 100    (2.17)
RI1 = 1.05 / (1.05 + 0.25 + 0.70) × 100 = 52.5%
Relative Importance:
Input 1: RI1 = 52.5%
Input 2: RI2 = 12.5%
Input 3: RI3 = 35.0%
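Garson's partitioning can be sketched in a few lines for a single-hidden-layer network with one output. `W` and `v` are illustrative names for the input-to-hidden weight matrix and the hidden-to-output weight vector.

```python
# Sketch of Garson's weight-partitioning algorithm for a single-hidden-layer
# MLP. W has shape (n_inputs, n_hidden); v has shape (n_hidden,).
import numpy as np

def garson_importance(W, v):
    """Percentage contribution of each input to the single network output."""
    c = np.abs(W) * np.abs(v)             # contribution of input i via hidden node j
    q = c / c.sum(axis=0, keepdims=True)  # share of each input within a hidden node
    s = q.sum(axis=1)                     # total share accumulated per input
    return 100.0 * s / s.sum()            # relative importance, sums to 100%
```

By construction the returned percentages always sum to 100, mirroring equation (2.17).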
With an increase in the number of input variables, or a network structure with
more than one hidden layer, it becomes difficult to judge the contribution of the input
variables to the network output from the magnitude of the connection weights alone. In
order to simplify this problem and provide a visual representation of the network and the
connection weights, Özesmi & Özesmi [53] developed the Neural Interpretation Diagram
(NID). The underlying methodology [108] is to represent the connection weights as
lines joining the neurons in each layer of the network. The line thickness represents
the magnitude of the connection weight, with thicker lines representing larger weights.
The direction of the connection weights is represented by the line shading: solid lines
signify excitatory signals and dashed lines inhibitory signals. Studying the magnitude
and direction of connection weights helps in predicting the variable contribution [51],
[52] as well as in understanding the interactions between the input variables. A sample
NID illustrating the direction and magnitude of the connection weights is presented in
[53], [108].
In their work [109], Olden and Jackson explain that when like (positive or negative)
connection weights occur in the input-hidden and hidden-output layers, the combined
effect on the output is positive. The product of the two connection weights
subsequently passing between the layers of the MLP gives the final effect of the input
on the network output.
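This connection-weight-product idea can be sketched as follows, again with illustrative `W` (input-to-hidden) and `v` (hidden-to-output) arrays: like signs multiply to a positive (excitatory) term, opposite signs to a negative (inhibitory) one.

```python
# Sketch of the Olden-and-Jackson connection weight product: the effect of
# input i via hidden node j is W[i, j] * v[j]; summing over hidden nodes
# gives the net signed contribution of each input to the output.
import numpy as np

def connection_weight_contribution(W, v):
    """Signed contribution of each input (positive = excitatory overall)."""
    return (W * v).sum(axis=1)
```

Unlike Garson's algorithm, which uses absolute values, this retains the sign and so distinguishes excitatory from inhibitory inputs.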
The wide range of problems to which ANNs have been successfully applied is a clear
testament to the enormous capabilities of the ANN paradigm. There are three salient
characteristics of ANNs:
• The ability to acquire knowledge about a given problem domain through the training
phase. The power to understand and learn linear as well as non-linear relationships
helps the ANN to model the problem with relative ease. This process is quite different
from that of symbolic AI systems.
• The ease and speed with which this ‘knowledge’ can be accessed and used, which aids
in ease of analysis.
• The robustness of an ANN solution in the presence of noise in the input data. This
helps to build network models that retain relatively high accuracy even with noisy
inputs.
Another advantage of a trained ANN is the high degree of accuracy reported
when an ANN solution generalizes over unseen examples from the problem domain. In
spite of these salient characteristics, ANNs have the major drawback of being unable to
clearly explain the process by which they generate their results [110]. The basic idea
of a learning and generalization tool is to explain the process and provide knowledge.
For ANNs to gain wider acceptance, the user must be provided with a tool that gives a
comprehensible explanation of the results and the underlying methodology. For
safety-critical applications such as airlines, medical diagnosis, power plants, the
life cycle of gas pipelines, etc., it is essential for the ANN system to have three
important capabilities:
1. Providing the capability for a user to validate the results generated by the ANN for
a given set of inputs.
2. Providing the capability for the user to define the boundary conditions for the
input variables under which the system would satisfactorily perform the task of
generating a desired output with sufficient reliability [110]. This would provide a
degree of transparency comparable to symbolic Artificial Intelligence.
3. Providing the capability, within a trained ANN, for the user to determine its
internal states; that is, for ANN solutions to not only be transparent but also provide
information about the internal states of the system. Satisfying this requirement would
help in excluding those ANN-based solutions that have the potential to give erroneous
or suboptimal results.
The knowledge represents the hypothesis or model learned by the network. Usually
these models are difficult to understand because the processing in a neural network is
distributed over a large set of real-valued parameters. It may not always be possible
to directly translate these large sets of real-valued parameters into symbols or
concepts that have semantic significance. The mapping between input features and the
target concept is represented by the hidden units in the network. Thus, hidden units
represent higher-level derived features, which may not have any obvious physical
interpretation.
The goal of rule extraction approaches is to express the knowledge gained from the
network as symbolic inference rules. The proliferation of rule extraction techniques
has prompted researchers [54], [55] to develop criteria to evaluate the proposed
algorithms, such as:
• Comprehensibility: The extent to which the extracted rules are humanly
comprehensible.
• Expressive power: The structure of the output presented to the end-user. Various
representations such as inference rules, decision trees, etc. can be used based on the
problem domain.
• Quality: The accuracy, fidelity and consistency of the extracted representations.
• Algorithmic complexity: The computational cost of the extraction process (time,
memory, etc).
• Portability: The extent to which the technique is independent of a particular network
architecture.
One of the earliest approaches for extracting comprehensible rules from an ANN can be
found in the work of Gallant [56] on connectionist expert systems. Classification rules
describing the network’s behavior were obtained by analyzing the role of the attributes
in the network. A number of techniques have been developed since then for addressing
the problem of comprehensibility in neural networks. According to Andrews et al. [57],
rule extraction methods can be classified into three categories, based on the view
taken by the algorithms of the underlying network: decompositional, pedagogical and
eclectic.
In the Decompositional methods, rules are extracted from the network at every
neuron of the hidden and output layers. These rules are then combined to describe the
behavior of the overall network. This approach can be considered a local approach to
rule extraction, as the analysis is primarily based on the architecture of the network. Most
approaches within this category employ a search procedure for finding subsets of
incoming weights that exceed the bias or threshold on a node. The identified subsets of
such activations are translated into propositional rules. The subset method by Fu [58] and
the M-of-N algorithm developed by Towell and Shavlik [59] are generic representatives
of this category. The subset method extracts simple propositional rules. The M-of-N
expression is satisfied when m of the possible n antecedents are satisfied. Setiono [60] in
his work describes the method to extract rules where first the activation values at the
hidden layer neurons are grouped together, and then the network is repeatedly split into
sub-networks for ease of analysis. The RULEX technique developed by Andrews and
Geva [61] directly interprets the weight vectors as rules. This technique can be used only
for a particular type of multilayer perceptron called the Constrained Error
Back-Propagation (CEBP) network. The decompositional approach to rule extraction has
various limitations. The algorithmic complexity increases exponentially with network
complexity, and various restrictions are imposed on the network architecture and the
training procedures, which adversely affect the generalization capability.
The approach of Saito and Nakano [62] was to select useful rules from a rule set
generated using the input activation values of the network that activate a given output
unit. Craven and Shavlik [63] proposed another pedagogical approach, which uses the
technique of querying the network. Their rule extraction process consists of
systematically sampling the network data and then generating queries to extract a rule
set from the network. This approach is less computationally intensive than search-based
methods. Validity Interval Analysis (VI-Analysis), proposed by Thrun [64], propagates
validity intervals throughout the network. Linear programming is used to determine
whether the set of proposed validity intervals is consistent with the network’s
activation values on all nodes. The RULENEG technique [65] extracts conjunctive rules
from a neural network. It is based on the principle that changing the truth value of
one of the antecedents in a conjunctive rule changes the consequent of the rule.
Several pedagogical approaches have also been developed for extracting decision
tree representations of the neural network. Craven and Shavlik [66] extract decision trees
from trained neural networks using a novel algorithm named TREPAN. This algorithm
employs a greedy gain ratio criterion for evaluating attribute splits. Binary and M-of-N
decision trees can be derived by this method. The ANN-DT (Artificial Neural Network -
Decision Tree) algorithm proposed by Schmitz et al. [67] is capable of growing binary
decision trees from neural networks by using attribute selection criteria based on
significance analysis for continuous valued features. The DecText (Decision Tree
Extractor) algorithm [68] is effective in extracting high fidelity trees from trained
networks. The paper also proposes different criteria for selecting an attribute to
partition the data.
The third, eclectic, category combines elements of the basic categories discussed
above. The BRAINNE system proposed by Sestito and Dillon [69] extracts simple if-then
rules. The method uses a unique approach to handle continuous data without
discretization. A genetic-algorithm-based rule extraction approach was developed by
Keedwell et al. [70]. Genes contain the weights between two adjacent layers, and
chromosomes are then constructed to represent a path from the input to the output
layer. The fitness function is calculated as the product of the weights being
transferred from the input to the output layer. The algorithm identifies the fittest
chromosomes, which are subsequently mapped into if-then rules. A major limitation of
this approach is the computational effort required by the genetic search.
One of the more popular rule extraction and refinement techniques is Validity
Interval Analysis (VI-Analysis) [64]. The basic procedure of the VI-Analysis algorithm
is as follows:
• Generation of a candidate rule set: For rule extraction, the first step is the
generation of candidate rules, which requires validity intervals (ranges of activation
values) to be specified on the input and output nodes.
• Interval refinement: The intervals are refined by propagating them through the
network in two phases: forward and backward. Linear programming is used to refine the
intervals.
• Rule validation based on convergence: There are two possible outcomes of the
analysis. Either the intervals converge and the rule is confirmed, or the validity
intervals are inconsistent with the behavior of the network; in that case the rule is
rejected and steps 2-4 are repeated with another candidate rule.
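The forward phase of the interval propagation can be sketched for a single sigmoid node. This is an illustrative fragment under simplifying assumptions, not Thrun's full algorithm: because the sigmoid is monotonic, bounds on the weighted sum carry directly through to bounds on the activation.

```python
# Forward interval propagation through one sigmoid neuron: given [lo, hi]
# validity intervals on the inputs, the pre-activation extremes are reached
# by pairing positive weights with matching bounds and negative weights with
# swapped bounds; monotonicity of the sigmoid then bounds the output.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_interval(w, b, lo, hi):
    """Output interval of sigmoid(w.x + b) when each x_i lies in [lo_i, hi_i]."""
    z_lo = b + np.sum(np.where(w >= 0, w * lo, w * hi))
    z_hi = b + np.sum(np.where(w >= 0, w * hi, w * lo))
    return sigmoid(z_lo), sigmoid(z_hi)
```

The backward phase, and the linear-programming consistency check across all nodes, are what make the full algorithm expensive, as noted below.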
VI-Analysis is computationally expensive because of the repeated calls to an
optimization module. In addition, the activation levels of the nodes are assumed to be
independent of one another. This assumption is not always valid, and the algorithm may
not find maximally general rules. Maire [71] shows that VI-Analysis always converges
in one run (forward and backward phase) for single-layer networks, and discusses its
limitations for multilayer networks.
2. 6. 4 ANN-DT ALGORITHM
The ANN-DT algorithm grows a binary decision tree from a trained network. A
representation of the algorithm adapted from [67] is shown in Figure 2.10. As illustrated
in the figure, the sampled data set S is split into two data sets S1 and S2, based on the
selected attribute. The main steps in the ANN-DT algorithm are as follows:
1. Sampling: An artificial data set is generated by sampling the feature space, and the
class labels are obtained by querying the trained network.
2. Selection of Attribute: For discrete output classes, the gain ratio is used for
selecting the splitting attribute; for continuous outputs, a significance measure can
also be used.
3. Stopping Criteria: The selected attribute splits the current set of data into two
subsets. For discrete classes, the process is terminated when an internal node contains
data with one class only.
2. 6. 5 TREPAN ALGORITHM
TREPAN is similar to conventional decision tree induction algorithms such as C4.5 [25],
but differs in that it treats the extraction of knowledge from a trained network as an
inductive learning task. The resulting decision tree approximates the network.
A major difference between TREPAN and other decision tree algorithms is the use of
an oracle that makes membership queries and returns the class labels. The network itself
serves as the oracle and answering the membership queries means using the network to
classify an instance. This information is used in developing the nodes and leafs of a tree.
Generally, an attribute is selected to be placed at the root node. In the next step a branch
is then added to this node of the tree for each possible value of this attribute. The
branching process splits the data set into a given number of subsets. The process is
recursively repeated at every branch, using only those data patterns that actually reach the
branch. If the number of instances reaching a node is less than a threshold, the oracle
is used to generate additional labeled instances. The branching
continues until all the patterns that reach a leaf node belong to the same class. No further
expansion of this leaf node is necessary and the node is designated with the appropriate
class label. The expansion then proceeds to other branches of the tree, until all possible
leaf nodes have been produced. For more detail on TREPAN see Craven and Shavlik [63].
TREPAN uses membership queries at each instance of the learner’s instance space,
to determine the class labels for each instance. This membership query is a question to
oracle (the network model) and returns the class label. TREPAN utilizes DRAWSAMPLE
routine to get a set of query instance to use for membership queries. These query
instances are subject to a set of constraints determined by the location of the node in the
64
tree. The constraints mainly state that instances should have outcomes for the tests at
nodes higher in the tree that cause the instances to follow the path from the root to the
given node.
The CONSTRUCTTEST function is used to determine the splitting test for a particular
node. TREPAN uses m-of-n expressions for its tests. An m-of-n expression is a Boolean
expression consisting of a threshold m and a set of n Boolean literals; it is satisfied
when at least m of its n literals are satisfied.
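An m-of-n test reduces to a simple count, as this minimal sketch shows:

```python
# An m-of-n test as used by TREPAN: satisfied when at least m of the n
# Boolean literals hold.

def m_of_n(m, literals):
    """True when at least m of the given boolean literals are satisfied."""
    return sum(bool(lit) for lit in literals) >= m
```

For example, the 2-of-3 expression over (True, False, True) is satisfied, while the 3-of-3 expression over the same literals is not.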
TREPAN ensures that a minimum number of instances exists at a given node before
giving a class label to the node or choosing a splitting test for it. The data set used
at a node is augmented with oracle-labeled instances until it reaches the user-specified
minimum sample size. This parameter controls both the size and the depth of the decision
tree and in turn influences the classification accuracy of the decision tree.
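TREPAN's central idea, using the network as an oracle that labels sampled instances and inducing splits from those labels rather than from the original data, can be sketched as follows. For brevity the sketch chooses a single binary split by plain information gain rather than TREPAN's gain ratio and m-of-n tests; the `oracle` object and its `predict` method are illustrative.

```python
# Minimal sketch of oracle-driven split selection: class labels come from
# membership queries answered by the trained network (`oracle.predict`),
# and the best (feature, threshold) pair is chosen by information gain.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(oracle, X):
    """Return (feature, threshold, gain) maximizing gain on oracle labels."""
    y = oracle.predict(X)  # membership queries answered by the network
    base, best = entropy(y), (None, None, -1.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (j, t, gain)
    return best
```

Applying this recursively to the instances reaching each branch, and drawing extra oracle-labeled samples when a node runs short of instances, gives the tree-growing loop described above.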
CHAPTER 3. THE CO2 CORROSION MODELING PROBLEM
Many different models of CO2 corrosion exist. These can be arbitrarily classified
into three categories based on how firmly they are grounded in theory:
• Mechanistic Models
• Semi-empirical Models
• Empirical Models
Mechanistic Models: these models describe the underlying electrochemical
reactions and have a strong theoretical background. All or most of the constants
appearing in this type of model have a clear physical meaning, and many of the
numerical values can be found in the corrosion literature. When calibrated on a
reliable experimental database, this type of model should, in principle, enable
accurate interpolation as well as extrapolation predictions. It is easy to add new
knowledge to these models with minimal modification of the existing model structure and
without having to recalibrate the existing constants.
Semi-empirical Models: these models are only partly based on firm theoretical
hypotheses. For practical purposes they are often extended to areas where insufficient
theoretical knowledge is available, so that the additional phenomena are described with
empirical functions. Some of the constants appearing in these models have a clear
physical meaning while the others are just best-fit parameters. Calibrated with a
sufficiently large and reliable experimental database, such models will enable good
interpolation, but extrapolation can lead to physically unrealistic results. New
knowledge can be added with some effort, usually by recalibrating the existing
constants.
Empirical Models: in these models most or all of the constants have little physical
meaning – they are just best-fit parameters to the available experimental results. When
calibrated with a large and reliable database, such models can perform well within the
calibration domain, but there is no assurance that the arbitrary empirical correlations
hold outside of it. The addition of any new knowledge to this type of model is rather
difficult: because of interactions with the existing empirical constants, it can be
added only with some degree of uncertainty.
The next section briefly discusses some of the significant models belonging to each
of these categories.
3. 2 MECHANISTIC MODELS
In CO2 corrosion of carbon steel, many different processes occur simultaneously. The
processes to be modeled are the electrochemical reactions and the flow of the different
components of the system, such as H+, CO2, H2CO3 and Fe++, as well as the chemical
reactions occurring between them. No single model is able to grasp all of these
complexities, but there have been studies which incorporate most of the important
processes occurring at the metal surface. One of the most significant and widely used
models, based on electrochemical reactions, was proposed by de Waard and Milliams [1],
[4]. Due to some of the basic assumptions
made by the authors in the modeling process, the model was questioned for validity
[72], [73], [3]. De Waard and Milliams [1], [4] later revised their model [74] based on
some of the constants determined by the experiments of Dugstad et al. [8]. This revised
model has been used on several occasions to extend its validity into areas concerning
corrosion in the presence of protective films [7], [74]. Another electrochemical model
was presented by Gray et al. [2], [6], which contained constants based on their own
glass cell and flow cell experiments. It
was a breakthrough in the scope and approach of CO2 corrosion modeling studies. Nesic
et al. [3] did a follow-up study and presented another electrochemical model. This
model had better predictive accuracy than the semi-empirical models of de Waard et al.
[7] and Dugstad et al. [8]. The model of Nesic et al. [3] described the electrochemical
processes occurring on the metal surface in detail, but the transport processes leading
to the observed currents were oversimplified. Pots [76] later presented a more
realistic model describing the transport processes in the boundary layer for the case
of CO2 corrosion. His model was based on the approach of Turgoose et al. [77], who were
the first to model the phenomena. Archour et al. [78] and Dalayan et al. [79] used
their own mechanistic models to simulate pit propagation of carbon steel in CO2
environments.
3. 3 SEMI-EMPIRICAL MODELS
Most of the models based on the mechanistic approach are called “worst-case”
models because they do not take into consideration the presence of protective surface
films, corrosion inhibitors, hydrocarbons, different steel types, high pressure and
other mitigating effects.
When the walls of the pipeline are wetted with oil (hydrophobic), no corrosion is
possible. In addition, some components of crude oil have inhibitive properties, which
help in forming more protective films. This crucial factor was incorporated in the
modeling process by de Waard and Lotz [7] by accounting for a water-wetting factor.
De Waard et al. [74] presented a semi-empirical model based on their initial study [1],
which considered the effects of protective films and corrosion inhibitors when modeling
corrosion. Dugstad et al. [8] presented a semi-empirical model of CO2 corrosion based
on a temperature-dependent basic equation (a best-fit polynomial function). Pots [75],
[76] presented a more realistic semi-empirical model describing the transport processes in
the boundary layer for the case of CO2 corrosion. His model was based on the approach
of Turgoose et al. [77], who were the first to model the phenomena.
Most pipelines and flow lines carrying oils operate under multi-phase flow conditions.
Modeling of multi-phase flow alone is a difficult task, and modeling its effect on CO2
corrosion even more so. Jepson et al. [80] presented a semi-empirical model suggesting
the importance of the flow regime to the corrosion rate.
3. 4 EMPIRICAL MODELS
It has been observed that CO2 corrosion rates in the field in the presence of crude oil
are much lower than those obtained in laboratory conditions where crude oil was not
used or where synthetic crude oils were used. One can identify two main effects of
crude oil on the CO2 corrosion rate. The first is a wettability effect, relating to a
hydrodynamic condition in which the crude oil entrains the water and prevents it from
wetting the steel surface. The second is attributed to components of crude oil that
reach the steel surface either by direct contact or by first partitioning into the
water phase.
Efird [11] stressed the importance of testing the effect of specific crude oils, and
defined the Corrosion Rate Break as the level of produced water in crude oil production
at which corrosion is accelerated and becomes a problem. Smart [12] in 1993 presented
work showing
that crude oils have surface active compounds (polar compounds containing oxygen,
nitrogen and sulfur) that strongly affect the wettability properties of brines. Adams et al.
[99] later presented work relating the water-wetting factor of corrosion to the velocities in
the flow pipe using a multiple regression model. Use of linear regression models to
describe the complex process of CO2 corrosion has always been a questionable approach.
In some recent studies [83], [84] the degree of inhibition was quantitatively modeled to
the chemical composition of crude oil and the concentration of saturates, aromatics,
resins, asphaltenes, nitrogen and sulfur. Hernández et al. [13] gave an insight about the
variables in crude oil composition that could be playing a major role in the inhibition
offered by crude oils. In this work, a statistical analysis was performed with several
Venezuelan crude oils evaluated experimentally under the same conditions. Crude oils
were separated into two groups: paraffinic and asphaltenic, depending on their
distribution of saturates, aromatics, resins and asphaltenes (SARA). The effects of basic
chemical and physical properties of crude oils were then evaluated by using multiple
linear regression analyses. A stochastic approach based on a Markov description for
modeling the phenomenon of pitting corrosion has been presented in the work of Provan
[81].
In recent years the field of artificial intelligence has been explored for modeling the
corrosion process. ANNs have been one of the most promising approaches to the
corrosion modeling process. The next section presents some of the important research
that has taken place in the field of modeling the corrosion process.
One of the earliest applications of ANNs in corrosion modeling was that of Smets and
Bogaerts [85]. They developed a series of neural networks to predict the SCC of type
304 stainless steel in near-neutral solutions as a function of chloride content, oxygen
content and temperature. They found that the neural network approach performed better
than conventional statistical models.
Urquidi-Macdonald [86] developed a neural network model used for predicting the
number and depth of pits in heat exchangers. No information was given about the network
size, other than that it had two hidden layers, nor about the number of training
points. The evolution of the pit depth and the number of pits were effectively modeled
and predicted.
Ben-Haim and Macdonald [87] described the use of neural network models to
predict the influence of various parameters on the acidity of simulated geological brines.
The solutions were based on NaCl + MgCl2. The network inputs were the Na+ and Mg2+
concentration and the temperature. The predicted output was the pH value. The data set
consisted of 101 points, of which 90 were used for training, with the remaining 11
retained as a test set. A simple network consisting of a single layer with two hidden nodes
was used. The network achieved good results, and the prediction error was of the same
order of magnitude as the experimental uncertainty.
Silverman and Rosen [88] combined artificial neural networks with an expert
system in order to predict the type of corrosion from a polarization curve. Inputs to the
networks included the passive current density, the pitting potential and the repassivation
potential, while outputs were the risks of crevice, pitting and general corrosion. Two
approaches were used: independent networks for each type of corrosion, and a single
combined network producing all three outputs. An expert system was used to interpret the
outputs produced by the two approaches. The relatively small size of the training data set
was one of the major concerns regarding the reliability of the model.
Trasatti and Mazza [90] developed a neural network to be used for the prediction
of crevice corrosion behavior of stainless steels. The network was trained from long-term
laboratory and field tests. Seventeen input variables were used with one hidden layer of
five nodes. Six hundred training examples were available; 450 of these were used for
training and the remaining 150 as a test set. The performance of the network was
reasonably good, but the very large number of input variables might be expected to
cause problems: a 17-dimensional hypercube has 2^17 (approx. 130,000) ‘corners’, so the
data space is inevitably very sparsely populated.
Another study combined wavelet analysis and ANNs to identify and quantify corrosion
damage from images. A classification algorithm was used to identify the corroded
regions from the non-corroded regions in each panel based on the extracted features.
Good accuracy was obtained in
identification of the corroded segments. A back propagation NN was used to predict the
material loss due to corrosion. Perturbing the images by changing the pixel values that
would correspond to the higher material loss due to growth in corrosion was simulated.
Experiments were conducted by perturbing the images of the damaged regions such that
growth in the extent of the material loss could be observed. A good trend was observed
between the predicted material loss and the experimental data. The results indicated that
the computational methods developed for corrosion analysis seem to provide reasonable
estimates.
Pidaparti et al. [97] presented a work that examined the residual strength of aging
aircraft panels in the presence of corrosion and fatigue damage. Both the residual strength
and the corrosion rates were predicted using a neural network consisting of two hidden
layer feed-forward architecture. Sensitivity analysis was performed for determining the
impact of input variables on the output. A series of simulations were also performed to
examine the generalization ability of the network in predicting the outputs for different
conditions of the input parameters. Each simulation tested the effect of a particular input
parameter on the predictions for a particular panel. The results obtained were in good
agreement with the experimental data. A similar work was carried out by Bailey et al.
[92]. A model was developed using neural networks to predict the ASTM G34 corrosion
rating and the resulting material loss in aging aircraft. Another model was also
constructed that would predict the cycles for final fatigue failure and the residual static
strength of a particular type of material, given the amount of material loss due to
corrosion.
Bucolo et al. [93] modeled the corrosion phenomena occurring in a pulp and paper
plant. In this study, two predictive models were constructed. Predictive models for both a
local and a global prediction were built to allow for the evaluation of the corrosion rate
taking place in the stainless steel used in the ozone bleaching devices used in the plant.
An MLP model was constructed and later merged with a neuro-fuzzy system (NF). The
performance of the adopted predictive monitoring showed that the neuro-fuzzy expert
system was able to improve the capability of the neural network model by both
improving the accuracy of the model and demonstrating a dramatic reduction in the
prediction error.
Leifer et al. [94] presented a model of pitting corrosion for the carbon
steel waste tanks containing aqueous radioactive waste, used for temporary storage of
spent nuclear fuel while permanent storage facilities for such materials were being
prepared. ANN was used to predict the corrosion rate. The back propagation of error
method was used to train and test the ANN model using archival pitting data. The
resulting predictions of the number of pits obtained from the neural network model were
in agreement with the results obtained from experimental methods [111]. In another of
his works, Leifer et al. [95] presented a predictive model to determine the rate of
pitting corrosion in carbon steel waste tanks used to store radioactive sludge. The
model covered several corrosion inhibitors and temperature ranges, and the
concentration levels of inhibitors such as nitrite were used to analyze the
experimental data. The network architecture selected was a back-propagation network,
trained on the experimental data. The results revealed good accuracy when predicting
conditions within the range of the training data.
Another study modeled the corrosion-fatigue crack growth rate in dual-phase (DP)
steels (primarily a low-carbon steel with micro-alloying additions of vanadium and
boron) using an artificial neural network. The training data consisted of
corrosion-fatigue crack growth rates at varying stress intensity ranges for martensite
contents between 32 and 76%. The ANN model used consisted of three hidden layers with a
back-propagation architecture. Even though a large number of input variables were used
during the training of the model, its predictions exhibited excellent agreement with
the experimental data.
Nesic and Vrhovac [91] developed a hybrid model combining the reliability of a
mechanistic model with the flexibility of the neural network approach. The model was
developed using the experimental database of Dugstad et al. [8]. The model architecture
consisted of a single hidden layer back propagation NN having 66 input neurons and 51
hidden neurons. Genetic algorithms were used for the network training. The inputs to
the network were indirect, crude or noisy parameters, called primitive descriptors,
such as: t, pH, PCO2, Fe++, HCO3-, and v (flow velocity of oils). Relations between
these primitive descriptors and the corrosion rate were learned by the network. The
prediction ability was found to be significantly better than that of conventional
models.
CHAPTER 4. METHODOLOGY
4. 1 CORROSION TESTS
The main goal of this research was to build a model based on actual data
collected from experimental results. A detailed description of the corrosion tests
and the resulting data was published in a previous paper by Hernández et al. [13]; a
brief summary is given here. A SARA analysis of the saturate components, aromatics,
resins and asphaltenes was performed on each crude oil. API density (°API), total
nitrogen content (NTOTAL), Total Acid Number (TAN), sulfur content (S%), Vanadium (V)
and Nickel (Ni) were measured according to ASTM standards.
Weight loss corrosion tests were performed on coupons. Three coupons were used for
each set of testing conditions; two of them were used for corrosion rate calculations and
the third for surface analysis and corrosion product characterization. After calculating
corrosion rates, these were then translated into inhibiting capacity by comparing them
against the corrosion rate of the blank:

Inhibiting Capacity = 1 − (corrosion rate with crude / corrosion rate blank)    (4.1)
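Equation (4.1) is a one-liner; the units cancel, so any consistent corrosion-rate unit works:

```python
# Equation (4.1): inhibiting capacity of a crude oil relative to the blank
# test. 1 means full inhibition; 0 means the crude offers no protection.

def inhibiting_capacity(rate_with_crude, rate_blank):
    """Dimensionless inhibiting capacity from two corrosion rates."""
    return 1.0 - rate_with_crude / rate_blank
```

For example, a crude that halves the blank corrosion rate has an inhibiting capacity of 0.5.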
The first step towards the development of the model was to create a network
structure that would be most efficient in capturing the complex nature of the corrosion
prediction problem. There are four major components related to the development of a
neural network model:
1. Choice of data and division (based on sizes) into training, cross-validation and
testing data sets
2. Choice of network architecture
3. Choice of transfer functions and training algorithm
4. Choice of learning constants
There are no fixed rules for determining these network selection parameters. Rigorous
experimentation and a number of trials with different types of network architecture
were performed to achieve a good network model for the given data. The software
NeuroSolutions version 4.21, developed by NeuroDimensions Incorporated, was used for
the development and testing of the neural network model.
The original data set was split into training, cross-validation and testing data sets
where:
• 15% of the exemplars, run concurrently with the training set, were used for cross-
validation, during which the MSE was computed on the held-out set
• 20% of the exemplars were used for testing the trained network
with the remaining exemplars used for training.
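The partitioning described above can be sketched as follows (a hypothetical helper, not the NeuroSolutions implementation; the training share is simply the remainder after the cross-validation and test sets are drawn):

```python
import numpy as np

def split_exemplars(n_exemplars: int, cv_frac: float = 0.15,
                    test_frac: float = 0.20, seed: int = 0):
    """Randomly partition exemplar indices into training, cross-validation
    and test sets; whatever is left after CV and test goes to training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_exemplars)
    n_test = int(round(test_frac * n_exemplars))
    n_cv = int(round(cv_frac * n_exemplars))
    test = idx[:n_test]
    cv = idx[n_test:n_test + n_cv]
    train = idx[n_test + n_cv:]
    return train, cv, test
```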
Several network types, including the Multilayer Perceptron, Generalized Feed Forward
Network, Modular Neural Network and Radial Basis Function network, were
experimented with to achieve the model with the best classification accuracy. The
results are summarized in Table 4.1.
Table 4.1 Comparison of Network Architectures

Type of Network | Number of Hidden Layers | Transfer Functions | Training Algorithm | Dimensionality | Classification Accuracy
Multilayer Perceptron | 1 | Hyperbolic Tangent | Gradient Descent | 11-8-1 | 75.82
Multilayer Perceptron | 1 | Logistic | Gradient Descent | 11-6-1 | 83.67
Multilayer Perceptron | 2 | Hyperbolic Tangent | Gradient Descent | 11-10-8-3 | 88.52
Multilayer Perceptron | 2 | Hyperbolic Tangent | Gradient Descent | 11-10-8-1 | 92.2
Multilayer Perceptron with Genetic Optimization | 2 | Hyperbolic Tangent | Gradient Descent | 11-6-6-1 | 96.7
Radial Basis Function | 1 | Gaussian, Hyperbolic Tangent | Gradient Descent | 11-7-1 | 80.14
Radial Basis Function | 2 | Gaussian, Hyperbolic Tangent | Gradient Descent | 11-8-6-1 | 86.77
Once the preliminary tests revealed that MLPs were more accurate than the others
in predicting the inhibition rates, the Genetic Control component of the software
was utilized in order to obtain the best network parameters. Steady State progression
genetic algorithms were used, in which only the worst member of the population is
replaced with each iteration. This method of progression tends to arrive at a near-optimal
solution quickly.
The genetic operator used for the algorithm is called Selection [112]. It selects
chromosomes from the population for reproduction; each selected chromosome undergoes
crossover and/or mutation to generate offspring, which are then added to the next
generation's population. The process of Crossover [112] develops a new chromosome
that combines the characteristics of both parents. The Crossover Probability controls the
process of crossover; in our model a crossover probability of 0.9 was used. Another
genetic operator called Mutation is used to alter one or more genes in a particular
chromosome, resulting in a new gene value. With the inclusion of these new gene values
the GA can obtain better results as compared to the ones obtained before the crossover
and mutation of the parent chromosomes. A mutation probability of 0.01 was used.
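The steady-state progression with selection, crossover (probability 0.9) and mutation (probability 0.01) described above can be sketched in a few lines (a toy illustration, not the Genetic Control component of NeuroSolutions; the fitness function and all names are illustrative):

```python
import random

def steady_state_ga(fitness, n_genes, pop_size=20, iters=200,
                    p_cross=0.9, p_mut=0.01, seed=1):
    """Steady-state GA: each iteration breeds one offspring and replaces
    only the worst member of the population with it."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(iters):
        # Selection: pick each parent by a binary tournament
        def pick():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b
        p1, p2 = pick(), pick()
        # Crossover with probability 0.9: child mixes genes of both parents
        if rng.random() < p_cross:
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
        else:
            child = list(p1)
        # Mutation with probability 0.01 per gene: perturb the gene value
        child = [g + rng.gauss(0, 0.1) if rng.random() < p_mut else g
                 for g in child]
        # Replace only the worst member of the population
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        pop[worst] = child
    return max(pop, key=fitness)

# Toy fitness: maximize the negative distance from the origin
best = steady_state_ga(lambda c: -sum(g * g for g in c), n_genes=3)
```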
Based on the comparison of the results, a two hidden layer MLP (11-6-6-1) with six
processing elements in each hidden layer and a hyperbolic tangent transfer function at the
hidden layers was selected. Gradient descent was used as the training algorithm. Step size
and momentum rate are the key learning parameters for this algorithm. In order to
accelerate the network 'learning' and to make sure that the probability of network
convergence is highest at the global minimum, both the momentum rates and the step
sizes were tuned over multiple runs.
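A forward pass through the selected 11-6-6-1 architecture, with hyperbolic tangent transfer functions at both hidden layers, can be sketched as follows (randomly initialized weights standing in for the trained NeuroSolutions model; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized weights for an 11-6-6-1 MLP
W1, b1 = rng.standard_normal((11, 6)), np.zeros(6)   # input -> hidden 1
W2, b2 = rng.standard_normal((6, 6)), np.zeros(6)    # hidden 1 -> hidden 2
W3, b3 = rng.standard_normal((6, 1)), np.zeros(1)    # hidden 2 -> output

def forward(x):
    """Forward pass: tanh at both hidden layers, linear output unit."""
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h2 @ W3 + b3

y = forward(rng.standard_normal((5, 11)))  # 5 exemplars, 11 inputs each
```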
4. 2. 4 TERMINATION CRITERIA
The gradient descent algorithm determines the weight vector which maps the
network input parameters to the desired output. This weight vector is randomly
initialized. The randomness of the initial weight vector is very important for learning, but
the trained network can exhibit different properties with different initial weight vectors.
Therefore, to increase the probability of a good initial solution (weight vector), a number
of runs are required. In each of these runs the network is retrained from a new random
initialization and checked for generalization. Four termination criteria were used to
determine when training should stop. The training parameters (i.e., the learning
parameters and the termination criteria) included:

Number of Runs: 5
Number of Epochs without improvement in CV error: 100
Once all of the network parameters were selected, six test runs (Test 1 to 6) were
conducted using exactly the same network architecture and network parameters, but a
different set of randomized training data. These test results provided a set of different
weight vectors that were randomly initialized and adapted during the training process.
4. 3 SENSITIVITY ANALYSIS
The next step was to analyze the interrelationship between the input variables and
their effect on the output of the network. Sensitivity analysis was performed for the
chosen MLP network and for all six test runs of that particular model. The sensitivity
was computed as the corresponding difference (delta) in the output(s), graphed using the
max-min criteria of the output (inhibition). The results for one of the six test runs are
shown in Figure 4.2 to Figure 4.12, which illustrate the separate sensitivities for each
variable. A cumulative sensitivity graph (Figure 4.13) was obtained by averaging the
sensitivity values over all six test runs (Test 1 to 6). From the analysis of the cumulative
sensitivity graph it is apparent that crude oil percentage had the most impact on the
output (inhibition rate). Because of this, the data was subdivided on the basis of crude
oil percentage.
[Figure 4.13: bar chart of cumulative sensitivity (approximately 0 to 0.6) for each input
variable: TAN, S%, Aromatics, Total Nitrogen, Ni, API, V, Saturates, Resins,
Asphaltenes and % Crude Oil]
Figure 4.13 Cumulative Sensitivity Graph
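The max-min sensitivity computation described above can be sketched as follows (a hypothetical helper: one input is swept over its observed range while the others are held at their means, and the spread of the network output is reported):

```python
import numpy as np

def sensitivity(predict, X, var_index, n_steps=50):
    """Vary one input from its minimum to its maximum while holding the
    other inputs at their means; return the max-min spread of the output."""
    x = X.mean(axis=0).copy()
    lo, hi = X[:, var_index].min(), X[:, var_index].max()
    outputs = []
    for v in np.linspace(lo, hi, n_steps):
        x[var_index] = v
        outputs.append(predict(x))
    return max(outputs) - min(outputs)
```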
The data was separated by crude oil percentage into 4 different groups: 1%, 20%,
50% and 80%. Sensitivity analysis was also performed on each of these groups, as shown
in Figure 4.14 to Figure 4.17.
Figure 4.14 Sensitivity About the Mean for 1% Crude Oil Concentration
Figure 4.15 Sensitivity About the Mean for 20% Crude Oil Concentration
Figure 4.16 Sensitivity About the Mean for 50% Crude Oil Concentration
Figure 4.17 Sensitivity About the Mean for 80% Crude Oil Concentration
Based on the results, similar behavior patterns were noted between the 1% and 20%
crude oil data, and similarly between the 50% and 80% crude oil data. The similar
groups were combined (1% with 20%, and 50% with 80%) and another sensitivity
analysis was performed (Figure 4.18 and Figure 4.19) to identify the similarities between
the combined groups.
Figure 4.18 Sensitivity About the Mean for 1% and 20% Combined
Figure 4.19 Sensitivity About the Mean for 50% and 80% Combined
From the results of the cumulative sensitivity analysis we found that Nickel (Ni),
crude oil and TAN were some of the important input variables affecting the output.
These results were in accordance with the earlier studies of Hernández et al. [13]. In
light of these results, to further explore the interrelationship between the input variables,
an Excel model was constructed. The model reproduced the network's computations and
generated the predicted inhibition rate for a given set of input values. The network output
was held constant and the values of the input variables were varied to explore the effects.
Figure 4.20 demonstrates the behavior of crude oil and Nickel while holding the network
output and the remaining inputs constant.
Figure 4.20 Relationship Between % Crude Oil and Ni at Constant Inhibition Output
Similarly, Figure 4.21 and Figure 4.22 illustrate the behavior patterns between crude oil
and TAN, and between API and aromatics, at constant inhibition output.
Figure 4.21 Relationship Between % Crude Oil and TAN at Constant Inhibition Output
Figure 4.22 Relationship Between API and Aromatics at Constant Inhibition Output
This same approach was further developed to show interactions between three
variables (Crude oil, Ni and TAN) at constant output inhibition as shown in Figure 4.23.
Figure 4.23 Relationship Curves Between % Crude Oil, Ni and TAN at Constant
Inhibition.
The Network Interpretation Diagram (NID) was constructed to track the direction and
magnitude of the influence of the input variables on the output parameter. Figure 4.24
represents the NID for the network model and illustrates the relative influence of each
input variable in predicting the output response.
4. 5 GARSON'S ALGORITHM
Garson's algorithm was applied to the network model in order to decipher the
relative importance of each input variable and its contribution to the predicted output.
Figure 4.25 displays the results of the algorithm in the form of a pie chart, partitioning
the relative importance among the input variables.
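Garson's algorithm itself is a simple weight-partitioning computation. The sketch below follows the original single-hidden-layer formulation (the two-hidden-layer network used here would require chaining the weight products through both layers); all names are illustrative:

```python
import numpy as np

def garson(W_ih, W_ho):
    """Garson's algorithm for one hidden layer: partition the prediction
    into relative contributions of each input via products of the absolute
    connection weights (W_ih: inputs x hidden, W_ho: hidden weights)."""
    # Contribution of input i routed through hidden unit j
    c = np.abs(W_ih) * np.abs(W_ho).reshape(1, -1)
    # Share of each hidden unit's signal attributable to each input
    r = c / c.sum(axis=0, keepdims=True)
    importance = r.sum(axis=1)
    return importance / importance.sum()  # relative importance, sums to 1
```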
4. 6 TREPAN ALGORITHM
The next step in the analysis was to extract rules which would translate the neural
network model into explicit symbolic form. For the TREPAN algorithm, the regression
problem with continuous output data was transformed into a classification problem. The
output (% inhibition) was divided into 5 classes. Two different data sets (EVEN and
UNEVEN) were generated, one having even class ranges and the other having uneven
class ranges; the class ranges for both data sets are shown in Table 4.3 and Table 4.4.
Varying the minimum_sample parameter produced a number of decision trees with
different sizes and classification accuracies; a minimum_sample size of one generated
the decision tree with the best accuracy.
Two different kinds of decision trees were extracted for each set of data. The
Trepan_tree is extracted from the loaded neural network model using the standard
TREPAN algorithm, which applies m-of-n tests at the internal nodes. The
Disjunctive_Trepan_tree extracts a tree from the loaded network model using a variant of
TREPAN that applies disjunctive (i.e. "or") tests instead of the general m-of-n tests at the
internal nodes of the extracted tree.
The performance statistics for the two variants of decision trees applied to the two data
sets are shown in Table 4.5.

Table 4.5 Classification Accuracy of the Extracted Decision Trees

Data Set | Trepan_tree Training Data | Trepan_tree Test Data | Disjunctive_Trepan_tree Training Data | Disjunctive_Trepan_tree Test Data
EVEN Classes | 87.30% | 62.70% | 89.60% | 64.80%
UNEVEN Classes | 91.80% | 69.75% | 92.30% | 71.80%
Figure 4.26 and Figure 4.27 show partially expanded views of the extracted decision trees.
Figure 4.27 Partially Expanded View of Trepan_tree Extracted From the UNEVEN Class
Data.
In the decision trees (Figure 4.26 and Figure 4.27) the circles represent the leaf
nodes and indicate the class label for the response variable (inhibition) predicted for a
particular set of values of the input variables represented in the path. The decision tree
can be easily decomposed into propositional rules; Table 4.6 shows the resulting set of
20 distinct rules.
Table 4.6 Rule Set Decomposed from the Decision Tree

Rule No. | Rule Text | Class Label
1 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 9.26) | CL5
2 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 13.15) AND (API > 9.26) AND (% Crude Oil <= 64.99) | CL4
3 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 13.15) AND (API > 9.26) AND (% Crude Oil > 64.99) | CL5
4 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API > 13.15) AND (% Crude Oil <= 34.99) | CL3
5 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API > 13.15) AND (% Crude Oil > 34.99) | CL5
6 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin <= 5.30) | CL3
7 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API <= 23.95) | CL2
8 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API <= 32.34) AND (API > 23.95) | CL4
9 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API > 32.34) | CL4
10 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 64.99) | CL4
11 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API <= 23.95) | CL3
12 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API > 23.95) AND (S % <= 0.57) | CL3
13 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API > 23.95) AND (S % > 0.57) | CL4
14 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil > 64.99) | CL2
15 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil <= 64.99) AND (API <= 32.90) | CL1
16 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil <= 64.99) AND (API > 32.90) | CL2
17 | (% Crude Oil <= 10.50) AND (Nitrogen >= 6514.38) | CL2
18 | (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates <= 57.15 OR TAN <= 4.22) | CL1
19 | (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates > 57.15 OR TAN > 4.22) AND (S % >= 0.58) | CL2
20 | (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates > 57.15 OR TAN > 4.22) AND (S % < 0.58) | CL1
The first rule implies that "IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API
<= 9.26) THEN the class label is CL5", i.e. the predicted inhibition rate would fall in the
range beginning at 0.98.
The first objective was to obtain a network architecture which accurately mimics the
data patterns. It is evident from Table 4.1 that the 11-6-6-1 MLP network has the best
prediction accuracy on the training data. The Mean Square Error (MSE), the mean
squared difference between the network output and the desired output, is an indirect
measure of the performance of the model. Table 5.1 shows the accuracy and the MSE
values for the selected model.
Table 5.1 Accuracy and MSE of the Selected Model

Model | Accuracy | MSE
11-6-6-1 MLP Neural Network Model | R = 97.60% | 0.0026
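The two figures of merit reported in Table 5.1 are computed as follows (a minimal sketch; the function names are illustrative):

```python
import numpy as np

def mse(desired, predicted):
    """Mean square error between the desired and the network output."""
    desired, predicted = np.asarray(desired), np.asarray(predicted)
    return float(np.mean((desired - predicted) ** 2))

def correlation_r(desired, predicted):
    """Linear correlation coefficient R between desired and predicted."""
    return float(np.corrcoef(desired, predicted)[0, 1])
```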
Figure 5.1 shows the model performance on training vs. test data. Another measure
of performance is the number of epochs required for the model to reach the minimum
MSE value; in this case the selected model was able to converge to the minimum MSE
value in the very first run after only 3500 epochs.
The initial sensitivity runs on the selected model (Figure 4.1) revealed that the
variables having the greatest influence on the output response were crude oil percentage,
Ni content and API gravity. The separate sensitivity analyses (Figure 4.2 to Figure 4.12)
further explain the effect of each variable on the final output. An increase in % Crude Oil
causes an increase in the inhibiting capacity of the oil (see Figure 4.2). An increase in
the Nickel content has a detrimental effect, reducing the inhibiting capacity of the oil (see
Figure 4.3). API (Figure 4.4) and total Nitrogen (Figure 4.5) tend to increase the
inhibition rate as their respective contents increase. Vanadium (Figure 4.6) and Total
Acid Number (Figure 4.8) resulted in an increase of the inhibiting capacity; however,
the effect is very small, as can be seen from the values on the y-axis. The S% content in
the range tested (Figure 4.7) was shown to decrease the inhibiting capacity. In regards
to the SARA components of the crude oil, none showed a significant effect; however,
saturates (Figure 4.9) were shown to decrease the inhibiting capacity as their content
increases, contrary to aromatics (Figure 4.10), resins (Figure 4.11) and asphaltenes
(Figure 4.12). The cumulative sensitivity graph, obtained by averaging the sensitivity
values from the six different test runs, was also consistent with the initial sensitivity
analysis and indicated that crude oil percentage, Ni content and API gravity were the
most important factors affecting the inhibition rate.
The tendency of % crude oil vs. inhibition was clear in both the data and the model:
an increase in crude oil content increases the degree of corrosion protectiveness provided
by the crude oil. With API density, even though the data are scattered, the model predicts
an increase in inhibition as API increases, implying that lighter crude oils provide higher
values of inhibiting capacity.
In order to see if this effect was repeatable, separate sensitivity analyses were
performed for the various crude oil contents evaluated: 1%, 20%, 50% and 80%.
• For 1% crude oil (see Figure 4.14) the model tends to predict a higher inhibiting
capacity than the real measured values, but the R (model accuracy) value is still
high. The most influential variables are Nickel, sulfur content, TAN and
asphaltenes; all but Nickel increase the inhibiting capacity.
• For 20% crude oil (Figure 4.15, R=0.98) Nickel is not as critical, and the
variables with the most influence are API, total Nitrogen, resins and TAN.
• For 50% crude oil (Figure 4.16, R=0.96) the four variables with the highest
sensitivity are Nickel, Vanadium, aromatics and sulfur. Nickel and aromatics
have an inverse effect; if only positive effects are considered, then V, S%,
asphaltenes and resins show the highest influence.
• For 80% crude oil (Figure 4.17, R=0.98) Nickel and Vanadium showed the
highest sensitivities, decreasing the inhibiting capacity as their content increases;
asphaltenes follow, and then aromatics, the latter also having an inverse
relationship. Note that the sensitivity values are a lot higher for the first two
cases.
An interesting result from the model is that it was able to point out notably different
behaviors when the crude oil concentration changes. By putting together the data for low
concentrations (1% and 20%, see Figure 4.18) and the data for higher concentrations
(50% and 80%, see Figure 4.19) and looking at the sensitivities, it can be concluded that
at higher concentrations the presence of crude oil has the greatest influence on the output
and the effects of the other variables are not as significant. At low crude oil
concentrations the sensitivities are a lot higher (up to 0.8), indicating that inhibition is
related not so much to the amount of crude oil as to the presence of oil, or to a
combination of two or more variables.
From the interrelationship graphs (see Figure 4.20) we can clearly see that over the
[1% - 20%] crude oil range the Nickel content of the oil tends to increase, while over the
[20% - 80%] range the Nickel content decreases, depicting an inverse relationship
between the two variables. Similarly, Figure 4.21 shows a linear relationship between
crude oil and TAN for crude oil ranging from 20% to 80%.
The Network Interpretation Diagram results serve as the basis for reiterating the fact
that certain variables have a positive effect on the output response whereas others tend to
have a negative effect. From Figure 4.24 we can see that thick continuous lines,
representing a strong positive excitatory signal, are generated from % crude oil, API and
total Nitrogen content, whereas thin continuous lines are generated from Vanadium and
TAN. These results are consistent with the patterns observed in the sensitivity analysis,
showing that all the above-mentioned variables positively affect the inhibition rate, i.e.
an increase in these variables tends to increase the inhibition rate. On the contrary,
variables such as Nickel, S% and saturates generate thick dashed lines, which represent
a negative or detrimental effect on the output response. The NID thus provided a clear
visual summary of the direction and relative magnitude of each variable's influence.
The main idea behind the implementation of Garson's algorithm was to partition the
relative share of the prediction associated with each input variable and determine whether
any of the input variables could be eliminated from further analysis. From Figure 4.25 we
can positively conclude that % Crude oil is by far the most influential factor affecting the
inhibition rate. The relative partitioning also revealed that, apart from crude oil and
Nickel, all the input variables have an almost similar effect on the output.
The TREPAN algorithm was applied to both the EVEN and UNEVEN class data
sets, and the extracted trees were evaluated on training and test data. From the results
shown in Table 4.5 the following inference can be made:
• The classification accuracy on both training and test data was found to be higher
for the UNEVEN class data set. This is mainly because most of the data points
had high inhibition rates, so dividing the output into uneven classes based on the
frequency distribution of the data within particular ranges proved to be the better
approach.
The efficacy of the rule extraction task can be tested along the following
dimensions:
• Comprehensibility: the rules decomposed from the Trepan decision tree were
successfully able to provide class labels from simple combinations of the values
of the input variables. Table 5.2 lists the number of input antecedents in a
particular rule (the left column) and the number of rules having that number of
features in their respective antecedents (right column). From the table it is clear
that almost 90% of the rules in the rule set have fewer than five input features in
their antecedents, indicating a simple and comprehensible rule set.
Table 5.2 Number of Features in the Rule Antecedent for the NN-Rule Set

Number of features in the rule antecedent | Number of rules (Total: 20)
2 | 1
3 | 9
4 | 8
5 | 2
• Accuracy and Fidelity: the rule set was generated from the decision tree created
by the TREPAN algorithm. Considering the fact that TREPAN uses the network as
an oracle to predict the class labels, the rule set accurately mimics the behavior of
the trained neural network. Hence, we can positively conclude that the fidelity of
the rule set to the network is high.
5. 6 COMPARISON METHODOLOGIES
5. 6. 1 STATISTICAL ANALYSIS
A multiple linear regression analysis was performed in MINITAB to produce a
regression equation that would show whether the model could be augmented by knowing
any possible linear relationships among each of the input variables and the output. The
fitted equation's coefficients and their statistics are interpreted as follows.
The Coef is the regression coefficient for a given variable and SE Coef is the standard
error of the coefficient. The t-value (T) is compared to the t-distribution to determine if
a predictor is significant. The bigger the absolute value of the t-value, the
more likely the predictor is significant. The p-value (P) is the probability value and it is
often used in hypothesis tests to help decide whether to reject or fail to reject a null
hypothesis. The p-value is the probability of obtaining a test statistic that is at least as
extreme as the actual calculated value, if the null hypothesis is true. The smaller the p-
value, the smaller the probability is that one would be making a mistake by rejecting the
null hypothesis. A commonly used cut-off value for the p-value is 0.05. For example, if
the calculated p-value of a test statistic is less than 0.05, the null hypothesis is rejected.
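The quantities discussed above (Coef, SE Coef, T and P) come from an ordinary least squares fit; a minimal sketch of their computation (not the MINITAB implementation; all names are illustrative) is:

```python
import numpy as np
from scipy import stats

def ols_summary(X, y):
    """Ordinary least squares: coefficient, standard error, t-value and
    two-sided p-value for the intercept and each predictor."""
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    dof = len(y) - X1.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    t = coef / se                                     # T = Coef / SE Coef
    p = 2 * stats.t.sf(np.abs(t), dof)                # P from the t-distribution
    return coef, se, t, p
```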
The p-values for the estimated coefficients of API, TAN and Crude Oil are 0.000,
indicating that they are significantly related to % Inhibition. The p-values for V, Ni,
NTOTAL and S% are > 0.05, indicating that these are not related to % Inhibition at an
α-level of 0.05. The R-square value obtained was 55%, which is fairly low, suggesting
that the relationship between the predictor and response variables is not linear. The
R-square value of 55% implies that only 55% of the variability in the output could be
captured by the linear model.
Many statistical tests and intervals are based on the assumption of normality.
Unfortunately, many real data sets are in fact not approximately normal. However, an
appropriate transformation of a data set can often yield a data set that does follow a
normal distribution. The Box-Cox power transformation was used:

    T(Y) = (Y^λ − 1) / λ    (5.2)

Transformations were also applied to some of the input variables, e.g. (API)^2 or
(Ni)^-0.5, to normalize the data. The Box-Cox transformation was applied to the response,
but the results did not improve, reinforcing the fact that the relationship between the
predictor and response variables is not linear. Lastly, Stepwise Regression [100] was
performed to consider reducing the model size by eliminating some of the input variables
(within the scope of the analysis). Once again, there was no significant improvement in
the fit.
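For reference, the Box-Cox transformation of Equation 5.2 can be expressed as follows (a minimal sketch; the log transform is the standard limiting case at λ = 0):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox power transformation (Equation 5.2) for positive data;
    at lambda = 0 the transformation reduces to the natural logarithm."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam
```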
It has been seen that neural networks are generally better at approximating the
complex relationships between continuous variables and their influence on the output.
Rules extracted from decision trees based on network parameters such as weights and
biases tend to be more accurate in some cases than those derived directly from the data
by other machine learning methods, such as ID3 [23], C4.5 [25] or CART [102].
The Waikato Environment for Knowledge Analysis (WEKA) [101] Java software
package provides a host of well-documented data structures, classes and tools for the
development of machine learning schemes. WEKA's J4.8 algorithm implements the
C4.5 algorithm to extract decision trees. In order to compare the results with TREPAN,
the C4.5 algorithm was applied to the data. The results of the C4.5 algorithm applied to
the EVEN and UNEVEN classified data sets are shown in Table 5.4.
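C4.5 grows its tree by choosing, at each node, the split with the highest gain ratio (information gain normalized by split information). A minimal sketch of that criterion (not the WEKA J4.8 implementation; names are illustrative) is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label sample, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, splits):
    """C4.5's split criterion: information gain divided by split info
    (the entropy of the split sizes themselves)."""
    n = len(parent)
    gain = entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)
    split_info = entropy([i for i, s in enumerate(splits) for _ in s])
    return gain / split_info if split_info else 0.0
```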
From the results shown in Table 5.4 we can see that the data with the EVEN
classification ranges had higher prediction accuracy on both the training and the test data.
From the table it is evident that the Neural Network model clearly outperforms the
traditional Multiple Regression Analysis, providing us with a model that captures the
data patterns and the interrelationship between the predictor variables and their effect on
the response. Again, the rule set extracted from the Disjunctive_Trepan decision tree,
based on network parameters such as synaptic weights and biases, has higher prediction
accuracy as compared to the C4.5 decision tree derived directly from the data values.
6. 1 CONCLUSIONS
The main aim was to develop a model that would generate high predictive accuracy
even with a limited amount of data. Another important driving factor for the research was
to come up with a robust model which could handle noisy data and still be able to explain
the complex relationships between the various constituents of the oil and provide
knowledge based on patterns in the data. This thesis covered several aspects of a
knowledge-based approach for predicting the corrosion rate based on the constituents of
the oil. Sensitivity analysis was clearly able to explain the interrelationships between the
input variables. The analysis revealed that variables such as crude oil, API, Nickel and
total Nitrogen content were the most important factors affecting the inhibition rate. These
results were in accordance with the experimental results obtained by Hernández et al.
[13]. Further analysis using the NID and Garson's algorithm determined the relative
share of importance of all the input variables in the predicted response. Finally, TREPAN
presented us with a rule set extracted from the decision tree, which was able to accurately
mimic the neural network in classifying the patterns in the rule space. The efficacy
analysis of the rule set showed that the rules were simple and comprehensible, with few
antecedents per rule.
A comparative study was also undertaken to test the performance of the neural
network based approach compared to statistical analysis and traditional data mining
methods. The encouraging results showed that a neural network based model, coupled
with rule extraction, outperforms these traditional methods. In summary, the research
successfully developed a rule-based, knowledge-driven approach to predicting the
corrosion inhibition behavior of crude oils.
6. 2 FUTURE RESEARCH
There are various directions that can effectively channel future research in this field.
One concern is the gap between the available data on corrosion behavior and the
unpredictable behavioral patterns of the network model that may occur in regions of the
problem domain where no data is available. Neural networks cannot produce reliable
predictions for input conditions that are outside the ranges of the data used to train them.
Standard interpolation techniques can be successfully applied to one or two variables, but
in the case of corrosion the number of variables is significant (11 variables in this
research). There is a need to develop interpolation methods that can generate data points
while accommodating a large number of input variables and still remain consistent with
the underlying behavior. Another direction is reducing the time and cost of training the
neural network; the training regimen can be improved through knowledge-based
initialization strategies. Such techniques can map the available domain knowledge into
the basic network structure.
Regression Trees
Most of the rule extraction methods depend on decision trees, which are primarily
classification tools. Discretizing the original continuous data introduces significant noise
into the learning data set, and such class-based trees cannot capture the actual regression
surface of the network. A proposed research direction is developing a regression tree,
which would have leaves characterized by real-valued functions rather than class labels.
The current research focuses on the extraction of rules from a trained neural network for
a classification problem. Much of the existing corrosion knowledge is available in the
form of equations that can be used for calculations. There is a need for extraction
methods that describe the modeled processes through a set of equations. Such a
representation would express both the existing knowledge (theoretical knowledge based
on chemical analysis) and the knowledge extracted from the network in a common form.
REFERENCES
[29]. Efraim, T., Jay E. A., Liang T. P., McCarthy, R. V., “Decision support systems
and intelligent systems,” Prentice Hall, Upper Saddle River, NJ, 2001.
[30]. Principe, J. C., Euliano, E. R., Lefebvre, W. C., “Neural and adaptive systems:
Fundamentals through simulations with cd-rom,” John Wiley & Sons, Inc., New
York, NY, 1999.
[31]. Reed, R. D., Marks, R. J., “Neural smithing: Supervised learning in feedforward
artificial neural networks,” MIT Press, Cambridge, MA, 1998.
[32]. Rumelhart, D. E., Hinton, G. E., Williams, R. J., "Learning representations by
back-propagating errors," Nature, 323, 1986, p. 533-536.
[33]. Hinton, G. E., "Connectionist learning procedures," Artificial Intelligence,
40(1-3), 1989, p. 185-234.
[34]. Whitley, D., Starkweather, T., Bogart, C., “Genetic algorithms and neural
networks: Optimizing connections and connectivity,” Parallel Computing, 14(3),
1990, p. 347-361.
[35]. Engel, J., “Teaching feed-forward neural networks by simulated annealing,”
Complex Systems, 2(6), 1988, p. 641-648.
[36]. Vapnik, V. N., “The nature of statistical learning theory,” Springer-Verlag, NY,
1995.
[37]. Dimopoulos, Y., Bourret, P., Lek, S., “Use of some sensitivity criteria for
choosing networks with good generalization ability,” Neural Processing Letters
Vol. 2, 1995, p. 1-4.
[38]. Dimopoulos, I., Chronopoulos, J., Chronopoulou Sereli, A., Lek, S., “Neural
network models to study relationships between lead concentration in grasses and
permanent urban descriptors in Athens city (Greece).” Ecological Modelling,
Paper no. 120, 1999, p. 157-165.
[39]. Scardi, M., Harding, L.W., “Developing an empirical model of phytoplankton
primary production: a neural network case study,” Ecological Modelling, 120
(2-3), 1999, p. 213-223.
[40]. Yao, J., Teng, N., Poh, H.L., Tan, C.L., “Forecasting and analysis of marketing
data using neural networks,” Journal of Information Science and Engineering
14, 1998, p. 843-862.
[41]. Lek, S., Belaud, A., Dimopoulos, I., Lauga, J., Moreau, J., “Improved
estimation, using neural networks, of the food consumption of fish populations,”
Marine Freshwater Research, 46, 1995, p. 1229-1236.
[42]. Lek, S., Belaud, A., Baran, P., Dimopoulos, I., Delacoste, M., “Role of some
environmental variables in trout abundance models using neural networks,”
Aquatic Living Resources, 9, 1996a, p. 23-29.
[43]. Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S.,
“Application of neural networks to modelling nonlinear relationships in
ecology,” Ecological Modelling, 90, 1996b, p. 39-52.
[44]. Mastrorillo, S., Lek, S., Dauba, F., “Predicting the abundance of minnow
Phoxinus phoxinus (Cyprinidae) in the River Ariege (France) using artificial
neural networks,” Aquat. Living Resour, 10, 1997a, p. 169–176.
[45]. Mastrorillo, S., Dauba, F., Oberdorff, T., Gue´gan, J.F., Lek, S., “Predicting
local fish species richness in the Garonne River basin,” C.R. Acad. Sci, Sciences
de la vie Paris, 321, 1998, p. 423–428.
[46]. Lek-Ang, S., Deharveng, L., Lek, S., “Predictive models of collembolan
diversity and abundance in a riparian habitat,” Ecological Modelling, 120, 1999,
p. 247–260.
[47]. Spitz, F., Lek, S., “Environmental impact prediction using neural network
modeling. An example in wildlife damage.” Journal of applied ecology, 36,
1999, p. 317–326.
[48]. Olden, J.D., “An artificial neural network approach for studying phytoplankton
succession.” Hydrobiology, 436, 2000, p. 131–143.
[49]. Garson, G.D., “Interpreting neural network connection weights,” Artificial
Intelligence Expert, 6, 1991, p. 47-51.
[50]. Goh, A.T.C., “Back-propagation neural networks for modeling complex
systems,” Artificial Intelligence in Engineering, 9, 1995, p. 143-151.
[51]. Aoki, I., Komatsu, T., “Analysis and prediction of the fluctuation of sardine
abundance using a neural network,” Oceanol. Acta, 20, 1999, p. 81–88.
[52]. Chen, D.G., Ware, D.M., “A neural network model for forecasting fish stock
recruitment,” Can. J. Fish, Aquat. Sci, Vol. 56, 1999, p. 2385–2396.
[53]. Özesmi, S. L., U. Özesmi, “An artificial neural network approach to spatial
habitat modelling with interspecific interaction,” Ecological Modelling, 116,
1999, p. 15–31.
[54]. Tickle, A., Andrews, R., Golea, M., Diederich, J., “The truth will come to light:
Directions and challenges in extracting the knowledge embedded within trained
artificial neural networks,” IEEE Transactions on Neural Networks, 9(6), 1998,
p. 1057-1068.
[55]. Craven, M., Shavlik, J., “Rule extraction: where do we go from here?”
University of Wisconsin Machine Learning Research Group working paper, 99-
1, 1999.
[56]. Gallant, S. I., “Connectionist expert systems,” Communications of the ACM, 31,
1988, p. 152-169.
[57]. Andrews, R., Diederich, J., Tickle, A. B., “Survey and critique of techniques for
extracting rules from trained artificial neural networks,” Knowledge Based
Systems, 8, 1995, p. 373-389.
[58]. Fu, L. M., “Rule learning by searching on adapted nets,” Proceedings of the
Ninth National Conference on Artificial Intelligence, AAAI Press, Anaheim,
CA, 1991, p. 590-595.
[59]. Towell, G. G., Shavlik, J. W., “Extracting refined rules from knowledge-based
neural networks,” Machine Learning, 13, 1993, p. 71-101.
[60]. Setiono, R., “Extracting rules from neural networks by pruning and hidden-unit
splitting,” Neural Computation, 9, 1997, p. 205-225.
[61]. Andrews, R., Geva, S., “Rule extraction from a constrained error back
propagation MLP,” Proceedings of Fifth Australian Conference on Neural
Networks, Brisbane, Queensland, 1994, p. 9-12.
[62]. Saito, K., Nakano, R., “Rule extraction from facts and neural networks,”
Proceedings of the International Neural Network Conference, San Diego, CA,
1990, p. 379-382.
[63]. Craven, M. W., Shavlik, J. W., “Using sampling and queries to extract rules
from trained neural networks,” Proceedings of the Eleventh International
Conference on Machine Learning, Morgan Kaufmann, New Brunswick, NJ,
1994, p. 37-45.
[64]. Thrun, S. B., “Extracting provably correct rules from artificial neural networks,”
Technical Report IAI-TR-93-5, University of Bonn, Bonn, Germany, 1993.
[65]. Pop, E., Hayward, R., Diederich, J., “RULENEG: Rule extraction from neural
networks by step-wise negation,” Technical report, Queensland University of
Technology, Neurocomputing Research Centre, 1994.
[66]. Craven, M.W., Shavlik, J. W., “Extracting tree-structured representations of
trained networks,” Advances in Neural Information Processing Systems, 8, 1996,
p. 24-30.
[67]. Schmitz, G. P. J., Aldrich, C., Gouws, F. S., “ANN-DT: An algorithm for
extraction of decision trees from artificial neural networks,” IEEE Transactions
on Neural Networks, Vol. 10(6), 1999, p. 1392-1401.
[68]. Boz, O., “Converting a trained neural network to a decision tree,” Proceedings
of the 2002 International Conference on Machine Learning and Applications
(ICMLA), Las Vegas, NV, CSREA Press, 2002, p. 110-116.
[110]. http://www.ics.uci.edu/~mlearn/MLlist/v7/20.html
[111]. http://sti.srs.gov/fulltext/ms9800653/ms9800653.pdf
[112]. http://www.nd.com/genetics/selection.html