A thesis presented to
the faculty of
In partial fulfillment
Master of Science
Vishal V. Ghai
March 2006
This thesis entitled
by
Vishal V. Ghai
Gary R. Weckman
Dennis Irwin
Engineering
Knowledge Based Approach Using Neural Networks For Predicting Corrosion Rate
(135 pp.)
A number of CO2 corrosion models for the oil and gas industry exist. However, these
models lag significantly behind the needs of the industry. There is still a large knowledge
gap between the actual processes occurring in the field and the current mechanistic and
empirical models. The complexity of the phenomena is often such that our understanding is significantly lower than the level
required for mechanistic modeling. There is a need to develop a model that would
have both the capability to predict the CO2 corrosion rate with high accuracy, as well as
provide knowledge that would aid the understanding of the phenomena. This thesis
focuses on the development of an Artificial Neural Network model based on CO2 field
data used in predicting the corrosion rate of carbon steel. Further, rules are extracted
from the trained network using a TREPAN decision tree algorithm to translate the
hypothesis learnt into symbolic form. Network model performance is then evaluated by
comparing it to a linear regression model using MINITAB. The efficacy of the rule set is
then compared to the C4.5 machine learning algorithm. The interrelationship of input
variables is discussed based on the constructed network model and the generated rule set.
Approved:
Gary R. Weckman
TABLE OF CONTENTS
Abstract ............................................................................................................................... 3
Table of Contents................................................................................................................ 5
CHAPTER 1. Introduction................................................................................................ 13
1.6 Thesis Structure...................................................................................................... 19
2.5.2 Perturb Method................................................................................................. 44
3.3 Semi-Empirical Models.......................................................................................... 68
CHAPTER 4. Methodology.............................................................................................. 77
4.3 Sensitivity Analysis................................................................................................ 83
References....................................................................................................................... 127
LIST OF FIGURES
Figure 4.8 Separate Sensitivity for Total Acid Number (TAN) ....................................... 88
Figure 4.14 Sensitivity About the Mean for 1% Crude Oil Concentration ...................... 92
Figure 4.15 Sensitivity About the Mean for 20% Crude Oil Concentration .................... 92
Figure 4.16 Sensitivity About the Mean for 50% Crude Oil Concentration .................... 93
Figure 4.17 Sensitivity About the Mean for 80% Crude Oil Concentration .................... 93
Figure 4.18 Sensitivity About the Mean for 1% and 20% Combined .............................. 94
Figure 4.19 Sensitivity About the Mean for 50% and 80% Combined ............................ 95
Figure 4.20 Relationship Between % Crude Oil and Ni at Constant Inhibition Output... 96
Figure 4.21 Relationship Between % Crude Oil and TAN at Constant Inhibition Output ....... 97
Figure 4.22 Relationship Between API and Aromatics at Constant Inhibition Output .... 97
Figure 4.23 Relationship Curves Between % Crude Oil, Ni and TAN at Constant Inhibition ..... 98
Figure 4.27 Partially Expanded View of Trepan_tree Extracted From the UNEVEN Variables ... 101
Figure 5.1 Model Performance on Training vs. Test Data. ............................................ 111
LIST OF TABLES
Table 5.1 Prediction Accuracy of the 11-6-6-1 MLP Network. ..................................... 110
Table 5.2 Number of Features in the Rule Antecedent for the NN-Rule Set ................. 117
CHAPTER 1. INTRODUCTION
CO2 corrosion of carbon steel is an electrochemical process where iron is dissolved at the anode and hydrogen is evolved at the cathode.
This chemical reaction results in the formation of solid FeCO3 films. Depending on the
conditions during formation, these films can be protective or non-protective. One of the
main reactions involved is the anodic dissolution of iron:
Fe → Fe2+ + 2e−    (1.2)
The presence of CO2 acts as a catalyst increasing the hydrogen evolution, thereby
increasing the corrosion rate of carbon steel in aqueous solution. Even at pH > 5 the
hydrogen evolution increases in the presence of H2CO3. Some researchers
[1], [2] assume that H2CO3 either serves as an extra source of H+ ions or is
reduced directly. It has also been assumed [2], [3] that these two reactions are
independent of each other and that the total cathodic current is the aggregate of the currents
from both:
2H+ + 2e− → H2    (1.3)
For more details on CO2 corrosion, refer to a number of publications covering this
field [1]-[8]. Particular attention is drawn to the recent reviews of the main design
considerations [9] and of prediction techniques related to CO2 corrosion [10].
The majority of oil and gas pipelines are made of carbon steel. Pipelines, like
other structures in nature, deteriorate over time. This deterioration in metallic pipeline
usually occurs as a result of the damaging effects of the surrounding environment. For
carbon steel, one of the most dominant forms of such deterioration is corrosion. The
corrosion problem is a major concern and becomes critical as a pipeline ages. Pipeline
operators throughout the world are confronted with the expensive and risky task of
operating aged pipelines because of corrosion and its potential damaging effects. The
major effect of corrosion is the loss of metal cross-section. This results in a reduction of
the pipeline’s carrying capacity and its safety. For a pipeline carrying live corrosion
defects, the major concern for the operator is the need to have a simple and quick
technique which can be used to analyze the rate of corrosion when a particular type of oil
is flowing into the pipeline. This information can be used to evaluate the pipeline’s
current reliability, and the time-dependent changes in it. This would help in determining
the effective safe-life of the pipeline, and an estimate of the time when the pipeline needs
to be changed. Changing a pipeline in the oil and gas industry is a very time-consuming and expensive process.
The role of crude oil in CO2 corrosion has gained special attention in the last few
years due to its significance when predicting or modeling corrosion rates. Modeling the
effect of crude oil in CO2 corrosion is not an easy task. Though many researchers have
worked in this area, the complexity of and variations in the constituents of different crude oils
make it difficult to model their effects (properties such as wettability and corrosivity) on
carbon steel.
Efird [11] stressed the importance of testing the effect of specific crude oils and
including this in corrosion prediction and testing. He also introduced the definition of
Corrosion Rate Break as the level of produced water in crude oil production where
corrosion is accelerated and becomes a problem. Smart [12] in 1993 presented work
showing that crude oils have surface-active compounds (polar compounds containing
oxygen, nitrogen and sulfur) that strongly affect the wettability properties of brines.
Hernández et al. [13] provided insight into the variables in crude oil
composition that could be playing a major role in the inhibition offered by crude oils.
For years, researchers have presented various approaches detailing the process of
corrosion. The task of corrosion prediction has been identified as a key approach in
utilizing the knowledge of the corrosion process and applying it to industrial corrosion
related problems. Many corrosion models have been developed over the years. These
models can be categorized into three main categories: empirical, semi-empirical and
mechanistic models, based on how firmly they are grounded in theory. These models
predict the corrosion rate with sufficient accuracy, but provide little insight into the
corrosion process. It is also important to note that some of these models are so complex
that one needs a thorough understanding of the thermodynamic and electro-chemical
processes occurring during corrosion. Everyday industrial application calls for a
corrosion model which is relatively easy to use, has high prediction accuracy, aids in
understanding the interrelationship between the variables affecting the corrosion rate, and
can be interpreted without the need of extensive chemical and thermodynamic knowledge
of the corrosion process.
The success of a good model is based primarily on the consistency of a good data
set [14]. Corrosion data is generally expensive to produce, and large corrosion data sets
with sufficient consistency are difficult to find. The poor quality of the data may be due to:
• Errors in the data arising from poor experiment design, faulty equipment or
miscalculations.
• The summarization of data, e.g. by plotting lines without the data points on which
the lines are based, seriously limits the use of such data for further analysis.
The present empirical, semi-empirical and mechanistic models lack high accuracy
mainly due to their inability to model the corrosion process in the absence of large
consistent data sets. This leads to the necessity of developing a more robust model which
is able to predict the corrosion rate with high accuracy even in the presence of a limited and noisy data set.
1.5 CURRENT RESEARCH
• Developing a robust prediction model, capable of handling limited noisy data with
high accuracy.
• A model which can serve as a hybrid to the current mechanistic and empirical models.
This research utilizes Artificial Neural Networks (ANNs), an Artificial
Intelligence approach, for modeling the corrosion rate. ANNs are being recognized as a
powerful and general technique for machine learning because of their non-linear
modeling abilities. Further, their distributed architecture is more robust in handling the
noise-ridden data. The hypothesis or model learned by the neural network is not explicitly
stated, but is implicitly encoded in the network architecture and weights. However, ANNs can be
made more comprehensible through rule extraction. The objectives of this research are:
1. To construct an ANN which can be used to predict the corrosion rate in carbon
steel.
3. To understand the interrelationship between the input variables and the impact they have on the corrosion rate.
5. To compare the prediction accuracy of the trained ANN model with the extracted rule set.
1.6 THESIS STRUCTURE
The thesis is organized into six chapters as follows: Chapter 1 introduces the
corrosion problem and explains the motivation of the current research. Chapter 2
provides background material for the various soft computing methods utilized in this
thesis. Chapter 3 undertakes a survey of the various approaches to solve the corrosion
prediction problem. Chapter 4 presents the methods and tools used in this work to
achieve the research objectives and focuses on the development of the neural network
model and implementation of rule extraction methods. Chapter 5 discusses the results of
the various techniques employed for the analysis. This chapter also describes in brief the
statistical and machine learning methodology used for analysis and comparison of the
results of the current research. Chapter 6 provides conclusions and suggestions for future
research.
Soft Computing (SC) exploits the tolerance for imprecision and uncertainty. The adoption of this approach has led to the development of systems that
have high MIQ (Machine Intelligence Quotient) [15]. SC-methodologies have proven to
be more successful than classical modeling, reasoning and search techniques in a wide
variety of problem domains with the following characteristics:
• Modeling difficulties: Generally, real world problems are poorly defined and difficult to model precisely.
• Large-scale solution spaces: Problems with large-scale solution spaces are usually
intractable; the search effort is huge, and deterministic search does not employ mechanisms for
coping with spaces of this scale.
• Imprecise and uncertain information: In many real world problems,
crisp classifications and unambiguous definitions are not always possible. Also, in
some cases, there is a need to directly acquire knowledge from problem data.
SC-methodologies exploit the above described problem characteristics to yield tractable and robust intelligent systems at low
solution cost. The discipline of SC encompasses several paradigms such as fuzzy set
theory, neural networks, approximate reasoning, and stochastic optimization methods like
genetic algorithms, simulated annealing and machine learning techniques. SC unites these
paradigms and enables the construction of innovative hybrid intelligent systems. The key strengths of the constituent paradigms include:
• Fuzzy Set Theory allows for imprecise knowledge representation in the form of linguistic if-then rules.
• Neural Networks exhibit learning and adaptive behavior with non-linear modeling
capabilities.
• Genetic Algorithms provide systematic global search of the solution space and are robust in finding near-optimal solutions.
These methodologies have been successfully applied to
several real world problems in robotics, space flight, process control, production and
aerospace applications [17]. The remaining sections of this chapter provide the necessary
background material for the SC-methodologies utilized in the current research effort.
2.2 GENETIC ALGORITHMS
Genetic Algorithms (GAs), first proposed by John Holland [18], are stochastic
search techniques used for optimization problems. The basic methodology is rooted in the
principles of natural selection and evolution. As optimization tools, GAs offer several unique advantages over conventional optimization
techniques. They combine elements of directed and stochastic search methods, striking a
good balance between the exploration and exploitation of the solution space [19]. One of
their key advantages is that derivative information is not required for determining the search direction in these algorithms. This characteristic
makes them a flexible tool for optimizing a large number of objective functions, including
those that are discontinuous or non-differentiable.
Genetic Algorithms work with solution populations rather than single members, making the search more robust.
The solution or search space contains all feasible solutions. Each point in this
space, called a chromosome, has an associated fitness value that usually equals the
value of the objective function at that point. The algorithm starts with an initial population of chromosomes,
which is repeatedly evolved over generations towards better fitness. The next generation
is created from the current population by using genetic operators like crossover and
mutation. The chromosomes with a higher fitness value survive and participate in the
creation of new populations. This ensures that successful chromosomes pass their good
genes to the next generation. The population continuously evolves toward better fitness,
and the algorithm converges to the best chromosome after several generations.
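The generational loop just described (fitness-based survival, crossover, mutation) can be sketched in a few lines of Python. This is an illustrative toy, not code from the thesis: it maximizes the number of 1-bits in a bitstring (the classic OneMax problem), and the population size, mutation rate, and generation count are arbitrary choices.

```python
import random

random.seed(42)

def fitness(chromosome):
    # OneMax: fitness is the number of 1-bits in the chromosome.
    return sum(chromosome)

def crossover(a, b):
    # Single-point crossover between two parent chromosomes.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.01):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]

def evolve(pop_size=30, length=20, generations=60):
    # Initial random population of bitstrings.
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Fitter chromosomes survive and participate in creating offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best))  # converges toward the all-ones optimum (fitness 20)
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases, which mirrors the survival-of-the-fittest argument in the text.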
2.3 MACHINE LEARNING
The ability to learn from examples and construct a model of the world is the
foundation of biological intelligence. This model of the world, often implicit, allows us to
adapt to a dynamically changing environment and is necessary for our survival. Artificial
Intelligence (AI) aims at constructing artifacts (machines, programs) that have the
capability to learn, adapt and exhibit human-like intelligence. Hence, learning is the key
for practical applications of AI. The field of machine learning is the study of methods for
programming computers to learn [20]. Many important algorithms have been developed
and successfully applied to diverse learning tasks such as speech recognition and game
playing. The learning system is presented with examples in some
format, referred to as a training set, or input. The system then generates a model or
hypothesis (a decision tree, a rule set, a trained network,
etc). The model is evaluated based on its ability to correctly generalize to examples not
seen during training. Machine learning can also be used
to gain insight into the problem domain, laying a special emphasis on the criterion of
comprehensibility. It refers to the ease of understanding the model by a human user and
serves the purpose of validation, knowledge discovery and refinement. Fayyad et al. [22]
contributed to the development of the area of knowledge discovery in databases and data mining.
Although a wide choice of machine learning schemes is available, they differ
in their capabilities and requirements. Evaluating them typically involves applying
different learning algorithms and evaluating the induced model in terms of predictive
accuracy and comprehensibility. This research employs two symbolic machine learning
methods, decision tree induction and attribute oriented induction. Neural networks, the
main class of non-symbolic machine learning tools used in this research, are covered in a
later section.
Decision trees are among the most popular symbolic machine learning algorithms.
They express the learned hypothesis or target function using a unique representation
format known as a decision tree. Decision trees can easily be compiled into simple if-then
rules for improving human comprehensibility. Decision tree learning has been applied in a number of
domains, from diagnosis of medical cases to learning to assess the credit risk of loan
applicants [23]. The example shown in Table 2.1 [25] depicts the training data for
the target concept, PlayTennis. The corresponding decision tree classifies Saturday mornings as suitable or unsuitable for playing tennis.
Each node in the tree, indicated by an oval, specifies a logical test based on some
attribute or feature in the problem. The tree has a root node, outlook, having three
possible attribute values: sunny, overcast and rain. Each of the outgoing branches from a
node corresponds to one of the possible values of the attribute. Hence, the root node has
three branches. A tree also has a set of leaf nodes, which represent the outcome of the
classifier (i.e., decision to play tennis or not). The classification of an instance involves
traversal through the tree starting at the root node until a leaf node is encountered. The
example instance (outlook = sunny, humidity = high) would follow the leftmost branch of the tree.
[Figure: The PlayTennis decision tree, with root node Outlook and leaf nodes No, Yes, No, Yes.]
The tree predicts the target concept PlayTennis = no, indicating unsuitable
weather conditions for playing tennis. Also, the attribute temperature was not utilized in classifying this instance.
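As noted above, a decision tree compiles directly into simple if-then rules. A hypothetical Python sketch of the PlayTennis tree follows; the Humidity test under "sunny" matches the example instance in the text, while the Wind test under "rain" is an assumption borrowed from the standard version of this example, since the excerpt only shows the humidity branch:

```python
def classify(outlook, humidity=None, wind=None):
    # Hand-compiled if-then rules for the PlayTennis tree:
    # root node Outlook, with Humidity tested under "sunny"
    # and Wind (assumed) tested under "rain".
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    if outlook == "rain":
        return "no" if wind == "strong" else "yes"
    raise ValueError("unknown outlook value")

# The example instance from the text follows the leftmost branch.
print(classify("sunny", humidity="high"))  # -> "no"
```

Note that, just as the text observes, the attribute temperature never appears in the rules: classification only consults the attributes tested on the path actually traversed.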
Many decision tree induction methods have been developed in the last two decades
with different capabilities and requirements. The ID3 algorithm is the core algorithm on
which many variants have been developed. The algorithm constructs a decision tree in a
greedy, top-down fashion, the most common strategy for decision tree induction. ID3 uses a statistical property called information gain to select the splitting attribute at each node.
Let S denote the set of training instances. The information gain, InfoGain(T), of an
attribute T is the difference between info(S), the information needed to classify an
instance in S, and infoT(S), the corresponding measure after partitioning the set S based
on attribute T. If the set has k possible partitions for k classes, the information content is:

info(S) = − Σ (j = 1…k) [freq(Cj, S) / |S|] × log2 [freq(Cj, S) / |S|]    (2.2)
Here freq(Cj, S) is the number of examples of class Cj and j ranges over k classes.
Given a partition based on attribute T, the expected value of the information over the induced subsets is:

infoT(S) = Σ (i = 1…n) [|Si| / |S|] × info(Si)    (2.3)

In the above expression, Si is the subset of examples in S having the ith outcome of the test on attribute T.
The procedure of selecting a splitting attribute and partitioning the training instance
set is done recursively for each internal node. Only the examples that reach that node
(i.e., the examples that satisfy the logical tests on the path) are used in attribute selection. The recursion terminates when:
1. There are no remaining attributes to partition on, or
2. The training instances at a given node belong to the same class. If so, the node is made a leaf node labeled with that class.
The major flaw with the ID3 algorithm's gain criterion is that it has a
strong bias towards tests with more outcomes [104]. Let us consider a hypothetical
medical diagnosis task in which one of the attributes contains a patient identification.
Since every such identification is intended to be unique, partitioning any of the training
cases on the values of this attribute will lead to a large number of subsets, each
containing just one case. Since all of these one-case subsets necessarily contain cases of a
single class, InfoT (S)=0, so information gain from using this attribute to partition the set
of training cases is maximal. From the point of view of prediction, however, such a
partition is of little use.
The bias inherent in the gain criterion was later rectified in the C4.5 algorithm [25]
by employing the gain ratio. Let us take the scenario where the information about a case
indicates the outcome of a test rather than the class to which the case
belongs. By analogy with Equation 2.2, the split information is:

split info(S) = − Σ (i = 1…n) [|Ti| / |T|] × log2 [|Ti| / |T|]    (2.4)
Split info(S) represents the potential information generated
by dividing T into n subsets, while info(S) provides knowledge about the classes that are
generated by the division. The gain ratio (Equation 2.5) divides the information gain by this split information, representing the proportion of the information generated by the split that is useful for classification.
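The quantities of Equations 2.2–2.4 can be sketched directly in Python. The toy data set below is hypothetical; it reproduces the patient-identification pathology discussed above, where a unique-ID attribute achieves maximal information gain but a much lower gain ratio:

```python
from collections import Counter
from math import log2

def info(labels):
    # Information content (entropy) of a class-label list, per Equation 2.2.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    # InfoGain(T) = info(S) - infoT(S), with infoT(S) per Equation 2.3.
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    info_t = sum(len(part) / n * info(part) for part in partitions.values())
    return info(labels) - info_t

def split_info(rows, attr_index):
    # Equation 2.4: information generated by the partition itself.
    n = len(rows)
    counts = Counter(row[attr_index] for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(rows, labels, attr_index):
    # C4.5's gain ratio corrects ID3's bias toward many-valued attributes.
    return info_gain(rows, labels, attr_index) / split_info(rows, attr_index)

# Hypothetical data: attribute 0 is a unique patient ID, attribute 1 is weather.
rows = [("id1", "sunny"), ("id2", "sunny"), ("id3", "rain"), ("id4", "rain")]
labels = ["yes", "no", "yes", "no"]
print(info_gain(rows, labels, 0))   # 1.0: each ID isolates one case (maximal)
print(gain_ratio(rows, labels, 0))  # 0.5: the split information penalizes it
print(info_gain(rows, labels, 1))   # 0.0: the weather attribute is uninformative here
```

The ID attribute yields single-case subsets with infoT(S) = 0, exactly the pathology described in the text, and the gain ratio demotes it.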
2.3.2 ATTRIBUTE-ORIENTED INDUCTION
When the training set for learning is provided as a database, the task of inducing a
hypothesis describing the data is called data mining. In real world applications, databases
are predominantly used for representing and maintaining information. Often, the
information is enormous, noisy, uncertain, and can involve missing values. A growing
need for knowledge discovery in databases led to the rapid development and adaptation
of techniques such as Attribute-Oriented Induction (AOI), which applies generalization
operations to discover rules. Figure 2.2 shows the inputs and output of the AOI method.
[Figure 2.2: The AOI method takes database queries, a list of attributes and a concept hierarchy as inputs and produces a generalized relation.]
Concept hierarchies are given by the experts or automatically generated by data analysis [27]. AOI is capable of
utilizing these concept hierarchies to generate logical rules. The following are some of the key operations in AOI:
• Vote Aggregation: The number of identical tuples being merged during the tree
ascension is recorded as a vote, representing
the number of tuples in the initial relation that are generalized to the current
relation.
• Rule Transformation: The obtained final relation is transformed into a logical rule.
The two main AOI learning tasks are learning
characteristic rules (LCHR) and learning classification rules (LCLR) [28]. Their
induction procedures are similar, differing in the attribute generalization process. Further
details about the AOI methodology can be found in [26] and [28].
2.4.1 NEURAL COMPUTATION
The motivation for the early development of neural networks stemmed from the
desire to mimic the functionality of the human brain. A neural network is an intelligent
data-driven modeling tool that is able to capture and represent complex and non-linear
input/output relationships. Neural networks are used in many important applications, such as
prediction, optimization and noise-filtering. They are used in many commercial products such as
data mining, knowledge acquisition systems and medical instrumentation [29].
A neural network consists of many layers of nodes. These nodes are linked by
connections, with each connection having an associated weight, Wi. The weight of a
connection is a measure of its strength and its sign is indicative of the excitation or
inhibition potential. Figure 2.3 shows a simple perceptron having n inputs, {X1, X2… Xi…
Xn}.
[Figure 2.3: A simple perceptron with inputs X1…Xn, weights W1…Wn, and output f(Σ XiWi) − θ.]
The perceptron has a threshold or bias, θ, which is the value of the net input
required to produce non-zero activation. The net input to a perceptron, neti, is given by the weighted sum of its inputs, neti = Σ XiWi.
A transfer function, f, maps the net input to a range, O, which is the activation or output of the node.
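The perceptron computation just described — a weighted sum of the inputs reduced by the threshold θ, passed through a transfer function — can be sketched as follows. The logistic transfer function of Equation 2.8 is used, and the input and weight values are arbitrary illustrations:

```python
from math import exp

def logistic(net):
    # Logistic transfer function, output range [0, 1] (Equation 2.8).
    return 1.0 / (1.0 + exp(-net))

def perceptron(inputs, weights, theta):
    # Net input: weighted sum of inputs, reduced by the threshold/bias theta.
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return logistic(net)

# With these values the net input is 1*0.5 + 2*0.25 - 1.0 = 0,
# and the logistic function maps a net input of 0 to exactly 0.5.
print(perceptron([1.0, 2.0], [0.5, 0.25], theta=1.0))  # -> 0.5
```

The sign of each weight determines whether its input excites or inhibits the node, matching the description of the connection weights above.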
Neural networks have two distinct phases of operation: training and production.
Some design parameters need to be chosen before training the network. These include:
• Training Algorithm: The training algorithm and the performance measure or the
cost function.
Parameters like weights and biases are modified during the training phase. The
network uses problem data to assign values to these parameters. The distinguishing feature of the training phase is its performance-feedback
information flow design, as depicted in Figure 2.4 (adapted from Principe et al. [30]).
The performance feedback loop utilizes a cost function to provide a measure of deviation
between the calculated output and the desired output. This performance feedback is
utilized directly to adapt the parameters, W and θ, so that the system output improves.
[Figure 2.4: Training-phase information flow: the training algorithm uses the error between the desired and calculated outputs to adapt the network parameters.]
A Multilayer Perceptron (MLP) arranges its nodes in
layers. A single hidden layer network is illustrated in Figure 2.5. The input layer contains
nodes that represent the inputs of the given problem. Each input is represented by a single
node in the input layer. The hidden layer maps the input to an intermediate space, which is then mapped to the output space.
[Figure 2.5: A single-hidden-layer MLP with input layer X1–X4, one hidden layer, and output layer Y1, Y2.]
The output layer represents the response/output. The output node, as shown in
Figure 2.5, allows us to determine the response/output from the input variables. MLPs
have been proven to be universal approximators [31], capable of approximating any given
continuous function. This is only possible with the choice of non-linear transfer functions. Two of
the most commonly used functions are the logistic function and hyperbolic tangent
function. The two functions differ in the range of their output values.
The logistic function has an output range [0, 1], and the activation of a node, ai, is
given by:
ai = 1 / (1 + e^(−net input))    (2.8)
The hyperbolic tangent function compresses a unit's net input into an activation in the range [−1, 1].
The training phase in neural networks provides the answer to the following
questions: Is there a set of network parameters (weights and biases) that allows a network
to map a given set of input patterns to the desired outputs? If so, how are the parameters
obtained? The back-propagation algorithm takes its name from
the direction of propagation of error. The training regimen adjusts the weights and biases
of the network to minimize the cost function. Though several cost functions are available,
the function appropriate for prediction problems is the cross-entropy function [33]:

E = − Σp Σi tpi log(ypi)
In the above equation, E is the cross entropy, p indexes the training
patterns and i indexes the classes. The term ypi is the estimated probability
that an input pattern belongs to class i, and tpi is the target with the range [0, 1]. Network
output is interpreted as the probability that the given input pattern belongs to a certain
class.
The cost function, E, needs to be minimized, and its derivative with respect to each
weight is calculated and denoted by ∂E/∂w. Having obtained the derivative, the weight adjustment is:
Weight Update: Δwij = −η ∂E/∂wij    (2.11)
In this equation, wij represents the weight passing from node i to node j, η > 0
represents the learning rate, and ∂E/∂wij is the partial derivative of the error E with
respect to wij. In the initial phase, random weights are assigned to the network, and the
training algorithm modifies these weights according to the procedure discussed above.
Many alternative optimization techniques have been utilized; variations of the basic
method include methods like the conjugate-gradient method, momentum learning, etc.
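The weight-update rule of Equation 2.11 can be illustrated on the smallest possible case: one input feeding one logistic output node with a single weight and no bias. This simplified setting is an assumption for illustration only (with a single binary output the cross-entropy takes its two-term form, and the gradient has a simple closed form), not the network used in the thesis:

```python
from math import exp, log

def logistic(net):
    return 1.0 / (1.0 + exp(-net))

def cross_entropy(y, t):
    # Binary cross-entropy summed over patterns:
    # E = -sum[t*log(y) + (1 - t)*log(1 - y)].
    return -sum(t_p * log(y_p) + (1 - t_p) * log(1 - y_p)
                for y_p, t_p in zip(y, t))

def train_single_weight(xs, ts, eta=0.5, epochs=200):
    # For one input, one logistic output and no bias, the gradient of the
    # cross-entropy w.r.t. the weight simplifies to sum((y - t) * x).
    w = 0.0
    for _ in range(epochs):
        ys = [logistic(w * x) for x in xs]
        grad = sum((y - t) * x for y, t, x in zip(ys, ts, xs))
        w -= eta * grad   # Equation 2.11: delta_w = -eta * dE/dw
    return w

# Training patterns: negative inputs -> class 0, positive inputs -> class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ts = [0.0, 0.0, 1.0, 1.0]
w = train_single_weight(xs, ts)
ys = [logistic(w * x) for x in xs]
print(w > 0, cross_entropy(ys, ts) < 0.5)  # -> True True
```

Each epoch moves the weight a step of size η against the error gradient, so the cost decreases steadily, exactly the minimization behavior the text describes.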
Stochastic search algorithms such as genetic algorithms have been applied to avoid the
problem of convergence to poor local minima.
2.4.4 GENERALIZATION CONSIDERATION
The collection of input pattern-desired response pairs used to train the learning
system is called the training set. The testing set contains examples not used for the
training purpose and it is used to evaluate the generalization capabilities of the network.
Vapnik [36] indicates that the performance of a network trained with back-propagation
on the training set always improves with the number of training cycles. However, the error on the testing set
initially decreases with the number of cycles, and then increases as shown in Figure 2.7
[Figure 2.7: Prediction error vs. number of training cycles: the training-set error keeps decreasing, while the testing-set error reaches a minimum at the stopping point and then rises.]
This overtraining degrades the network's generalization
capabilities. One solution to this problem is to split the training set into two sets – the
training set and the validation set. After every fixed number of iterations, the error on the
validation set is calculated. Training is terminated when this error starts to increase. This technique is known as early stopping.
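The validation-based stopping rule described above can be sketched generically. The `patience` window and the simulated validation-error curve below are illustrative assumptions; in practice `train_step` and `validation_error` would wrap the actual network and held-out data:

```python
def train_with_early_stopping(train_step, validation_error, patience=5,
                              max_epochs=500):
    # train_step() performs one epoch of weight updates; validation_error()
    # measures error on the held-out validation set after that epoch.
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        error = validation_error()
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            # Stop once validation error has not improved for `patience` epochs.
            if epochs_without_improvement >= patience:
                return epoch + 1, best_error
    return max_epochs, best_error

# Simulated validation-error curve: decreases, then rises (overtraining).
errors = iter([0.9, 0.7, 0.5, 0.45, 0.44, 0.46, 0.48, 0.50, 0.52, 0.55])
stopped_at, best = train_with_early_stopping(lambda: None, lambda: next(errors),
                                             patience=3)
print(stopped_at, best)  # -> 8 0.44
```

Training halts shortly after the simulated curve turns upward, and the minimum validation error (0.44, at the stopping point of Figure 2.7's schematic) is retained.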
Once the neural network has been trained on a specific network topology, the next
step in modeling the process using an ANN involves extracting knowledge from the
trained network, which resides in its connection weights. To
understand the predictive modeling process it is imperative to analyze these weights and
extract information regarding the contribution of the input variables to the final output. Also,
these input variables may not always be independent of each other. The interrelationship
between the input variables significantly affects their contribution towards the final
response/output.
The following sections delineate some of the important methods used to determine the relative contribution of each input variable to the network output.
The Partial Derivative Method (PaD) consists of calculating the partial derivatives
of the response variables with respect to the values of the input variables [37], [38]. The method provides two results:
• A profile of the output variations for small changes in each input variable.
• A classification of the relative contribution of each variable to the network output.
A partial derivative of the output generated by the network is computed w.r.t. the
input to obtain the profile of the variations of the output for a small change in the input
variable [106]. For a network structure consisting of one hidden layer of nh neurons, ni
inputs and a single output variable (i.e. no = 1), the partial derivative of the response
variable yj w.r.t. input xi (with j = 1…N, where N is the total number of observations) is:
dji = Sj Σ (h = 1…nh) who Ihj (1 − Ihj) wih    (2.12)
In this equation Sj denotes the partial derivative of the resulting output neuron with
reference to the input. Ihj is the response of the hth hidden neuron, who and wih are the
weights between the output neuron and the hth hidden neuron, and between the ith input neuron and the hth hidden neuron, respectively.
A set of graphs of the partial derivatives versus each corresponding input variable
can then be plotted, which would enable a visual representation of the effect that
individual input variables have on the network output. The interpretation of such a graph
is that if the partial derivative is negative at a given value of the studied variable, the
output variable tends to decrease as the input variable increases; inversely, if
the partial derivative is positive, the output variable tends to increase as the
input variable increases. The second result of PaD concerns the relative contribution
of each input to the ANN output over the data set, calculated as the sum of the squared partial derivatives:
SSDi = Σ (j = 1…n) (dji)²    (2.13)
One SSD (Sum of Square Derivatives) value is obtained per input variable. The
SSD values allow classification of the variables according to their increasing contribution
to the output variable in the model. The input variable with the highest SSD value has the greatest influence on the model output.
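Equations 2.12 and 2.13 can be sketched for a one-hidden-layer network with logistic units. The weights below are hypothetical, biases are omitted for brevity, and Sj is taken as the derivative of a logistic output unit, out(1 − out) — one common choice; all of these are assumptions for illustration, not values from the thesis:

```python
from math import exp

def logistic(net):
    return 1.0 / (1.0 + exp(-net))

def pad_derivatives(X, w_ih, w_ho):
    # Partial derivative d_ji of the output w.r.t. input i for each pattern j
    # (Equation 2.12), for a bias-free 1-hidden-layer logistic network.
    derivs = []
    for x in X:
        hidden = [logistic(sum(w_ih[h][i] * x[i] for i in range(len(x))))
                  for h in range(len(w_ih))]
        out = logistic(sum(w_ho[h] * hidden[h] for h in range(len(hidden))))
        s_j = out * (1.0 - out)  # Sj: derivative of the logistic output unit
        derivs.append([s_j * sum(w_ho[h] * hidden[h] * (1 - hidden[h]) * w_ih[h][i]
                                 for h in range(len(w_ih)))
                       for i in range(len(x))])
    return derivs

def ssd(derivs):
    # Equation 2.13: one sum-of-squared-derivatives value per input variable.
    return [sum(d[i] ** 2 for d in derivs) for i in range(len(derivs[0]))]

# Hypothetical 2-input, 2-hidden network where input 0 carries larger weights.
w_ih = [[2.0, 0.1], [1.5, 0.2]]   # w_ih[h][i]: input i -> hidden h
w_ho = [1.0, -0.8]                # hidden h -> output
X = [[0.1, 0.5], [0.4, 0.2], [0.9, 0.7]]
contributions = ssd(pad_derivatives(X, w_ih, w_ho))
print(contributions[0] > contributions[1])  # input 0 dominates -> True
```

Ranking the inputs by their SSD values reproduces the classification step described in the text: the heavily weighted input receives the larger contribution.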
2.5.2 PERTURB METHOD
The ‘Perturb’ method corresponds to a perturbation of the input variables [39]. The
method aims to assess the effect of small changes in each input on the neural network
output. The approach is to add noise to a
particular input variable while maintaining the others constant and to record the
corresponding output. The variable with the greatest influence on the network output is
considered to be the most significant to the model. The mean square error (MSE) is
expected to increase as a larger amount of noise is added to the selected input variable
[39], [40]. The input variables can then be classified in order of their
importance.
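A minimal sketch of the Perturb method follows. The "trained model" here is a stand-in analytic function (an assumption for illustration), built so that the first input matters far more than the second; each input is perturbed in turn with Gaussian noise while the others are held fixed, and the resulting rise in MSE ranks the inputs:

```python
import random
from math import sin

random.seed(0)

def model(x):
    # Stand-in for a trained network: strong dependence on x[0], weak on x[1].
    return sin(3.0 * x[0]) + 0.1 * x[1]

def perturb_importance(model, X, targets, noise=0.1, trials=50):
    # For each input variable, add noise to that variable only (others held
    # constant) and record the increase in MSE over the unperturbed baseline.
    def mse(perturbed_index=None):
        total = 0.0
        for x, t in zip(X, targets):
            xp = list(x)
            if perturbed_index is not None:
                xp[perturbed_index] += random.gauss(0.0, noise)
            total += (model(xp) - t) ** 2
        return total / len(X)

    baseline = mse()
    return [sum(mse(i) for _ in range(trials)) / trials - baseline
            for i in range(len(X[0]))]

X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(40)]
targets = [model(x) for x in X]          # zero baseline error by construction
importance = perturb_importance(model, X, targets)
print(importance[0] > importance[1])     # x[0] is the more sensitive input
```

Sorting the inputs by this MSE increase yields exactly the importance ordering the method is meant to produce.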
2.5.3 SENSITIVITY ANALYSIS
Sensitivity analysis [41], [42], [43], [107] extracts the interrelationship between
the input and the output variables of the network. It is important to gather information
regarding the influence of the input variables on the network response during the training
cycle, as this indicates which input channels are most significant. The input space is
then pruned by removing the insignificant channels, which reduces the network size,
simplifies the network complexity and shortens training times.
Sensitivity analysis [108] helps in understanding the influence of each input variable on
the network response. The magnitude of one of the input variables is varied
over its entire range while all the other input variables are held constant at
their mean values. Network learning is disabled during this operation in order to
make sure that the network weights are not affected. The methodology of sensitivity
analysis becomes rapidly complex with an increase in the number of input variables for a
given neural network model. The common approach to simplify the process, as
described by Olden and Jackson [108], is to compute basic summary statistics such as
the minimum, maximum and mean values for each of the variables. The network response is
recorded as the value of the variable is varied over its entire range; this
information helps in understanding the relative contribution of each input variable.
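The sweep described above, one input varied over its observed range while the others stay clamped at their means and learning is disabled, can be sketched as follows. The model object `net` and its `predict` method are illustrative assumptions, not the NeuroSolutions API.

```python
# Sketch of sensitivity analysis: sweep each input over its range while
# holding the other inputs at their mean values, and record the spread of
# the network response. `net` is a hypothetical trained model.
import numpy as np

def sensitivity_sweep(net, X, n_steps=25):
    """Return, per input index, the range of network responses over its sweep."""
    means = X.mean(axis=0)
    sensitivities = {}
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), n_steps)
        probe = np.tile(means, (n_steps, 1))  # all inputs held at their means
        probe[:, j] = grid                    # except the one being varied
        response = net.predict(probe)
        sensitivities[j] = response.max() - response.min()
    return sensitivities
```

Inputs producing a wider response range are the ones the pruning step would retain.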
2. 5. 4 GARSON’S ALGORITHM
The knowledge acquired by a trained network is stored in the form of connection weights.
The input from each input variable is fed to the network model through these weights.
The contribution of each input variable to the output depends mainly on the magnitude
and the direction of the connection weights [48]. As described by Olden and Jackson
[108], a positive connection weight increases the magnitude of the network output
whereas a negative weight inhibits it. Inputs associated with larger weights are
considered to have a greater impact on the network output as compared to the others. The
mapping between the input variables and the predicted response in an MLP is a bi-level
process of information flow, involving weight transfer from the input to the hidden
layer and then from the hidden to the output layer. An important observation [48] is
that when the direction of the connection weight is the same (positive or negative)
between the input-hidden and the hidden-output layers, it has a positive effect on the
network output. Given the significant amount of knowledge that can be extracted by
studying the flow patterns of connection weights, the next step for researchers was to
partition the connection weights so as to establish the relative contribution of each
input variable toward the network output. In 1991 Garson [49] formulated an algorithm
that provides a percentage breakdown of the relative importance of each input variable
in a given network. Further enhancements to this algorithm were later proposed by Goh
[50]. An example for a simple Feed-Forward MLP with two Processing Elements (PEs) is
shown in Figure 2.8.
Step 2: The contribution of each input neuron to the output via each hidden neuron is
computed.
Step 3: The relative contribution of each input neuron to the outgoing signal of each
hidden neuron (RA1) and the sum of the input neuron contributions (S1) are shown in
Table 2.4:
RI1 = S1 / (S1 + S2 + S3) × 100    (2.17)
RI1 = 1.05 / (1.05 + 0.25 + 0.70) × 100 = 52.5%
Relative Importance:
Input 1: RI1 = 52.5%
Input 2: RI2 = 12.5%
Input 3: RI3 = 35.0%
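Garson's partitioning can be sketched in a few lines for a single-hidden-layer network with one output. `W` and `v` are illustrative names for the input-to-hidden weight matrix and the hidden-to-output weight vector.

```python
# Sketch of Garson's weight-partitioning algorithm for a single-hidden-layer
# MLP. W has shape (n_inputs, n_hidden); v has shape (n_hidden,).
import numpy as np

def garson_importance(W, v):
    """Percentage contribution of each input to the single network output."""
    c = np.abs(W) * np.abs(v)             # contribution of input i via hidden node j
    q = c / c.sum(axis=0, keepdims=True)  # share of each input within a hidden node
    s = q.sum(axis=1)                     # total share accumulated per input
    return 100.0 * s / s.sum()            # relative importance, sums to 100%
```

By construction the returned percentages always sum to 100, mirroring equation (2.17).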
With an increase in the number of input variables, or a network structure with
more than one hidden layer, it becomes difficult to judge the contribution of the input
variables to the network output from the magnitude of the connection weights alone. In
order to simplify this problem and provide a visual representation of the network and the
connection weights, Özesmi & Özesmi [53] developed the Neural Interpretation Diagram
(NID). The underlying methodology [108] is to represent the connection weights as
lines joining the neurons in each layer of the network. The line thickness represents
the magnitude of the connection weight, with thicker lines representing larger weights.
The direction of the connection weights is represented by the line shading: solid lines
signify excitatory signals and dashed lines inhibitory signals. Studying the magnitude
and direction of connection weights helps in predicting the variable contribution [51],
[52] as well as in understanding the interactions between the input variables. A sample
NID illustrating the direction and magnitude of the connection weights is presented in
[53], [108].
In their work [109], Olden and Jackson explain that when like (positive or negative)
connection weights occur in the input-hidden and hidden-output layers, the combined
effect on the output is positive. The product of the two connection weights
subsequently passing between the layers of the MLP gives the final effect of the input
on the network output.
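This connection-weight-product idea can be sketched as follows, again with illustrative `W` (input-to-hidden) and `v` (hidden-to-output) arrays: like signs multiply to a positive (excitatory) term, opposite signs to a negative (inhibitory) one.

```python
# Sketch of the Olden-and-Jackson connection weight product: the effect of
# input i via hidden node j is W[i, j] * v[j]; summing over hidden nodes
# gives the net signed contribution of each input to the output.
import numpy as np

def connection_weight_contribution(W, v):
    """Signed contribution of each input (positive = excitatory overall)."""
    return (W * v).sum(axis=1)
```

Unlike Garson's algorithm, which uses absolute values, this retains the sign and so distinguishes excitatory from inhibitory inputs.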
The wide range of problems to which ANNs have been successfully applied is a clear
testament to the enormous capabilities of the ANN paradigm. There are three salient
characteristics of ANNs:
• The ability to acquire knowledge about a given problem domain through the training
phase. The power to understand and learn linear as well as non-linear relationships
helps the ANN to model the problem with relative ease. This process is quite different
from that of symbolic AI systems.
• The ease and speed with which this ‘knowledge’ can be accessed and used, which aids
in ease of analysis.
• The robustness of an ANN solution in the presence of noise in the input data. This
helps to build network models that retain relatively high accuracy even with noisy
inputs.
Another advantage of a trained ANN is the high degree of accuracy reported
when an ANN solution generalizes over unseen examples from the problem domain. In
spite of these salient characteristics, ANNs have the major drawback of being unable to
clearly explain the process by which they generate their results [110]. The basic idea
of a learning and generalization tool is to explain the process and provide knowledge.
For ANNs to gain wider acceptance, the user must be provided with a tool that gives a
comprehensible explanation of the results and the underlying methodology. For
safety-critical applications such as airlines, medical diagnosis, power plants, the
life cycle of gas pipelines, etc., it is essential for the ANN system to have three
important capabilities:
1. Providing the capability for a user to validate the results generated by the ANN for
a given set of inputs.
2. Providing the capability for the user to define the boundary conditions for the
input variables under which the system would satisfactorily perform the task of
generating a desired output with sufficient reliability [110]. This would provide a
degree of transparency comparable to symbolic Artificial Intelligence.
3. Providing the capability, within a trained ANN, for the user to determine its
internal states; that is, for ANN solutions to not only be transparent but also provide
information about the internal states of the system. Satisfying this requirement would
help in excluding those ANN-based solutions that have the potential to give erroneous
or suboptimal results.
The knowledge represents the hypothesis or model learned by the network. Usually
these models are difficult to understand because the processing in a neural network is
distributed over a large set of real-valued parameters. It may not always be possible
to directly translate these large sets of real-valued parameters into symbols or
concepts that have semantic significance. The mapping between input features and the
target concept is represented by the hidden units in the network. Thus, hidden units
represent higher-level derived features, which may not have any obvious physical
interpretation.
The goal of rule extraction approaches is to express the knowledge gained from the
network as symbolic inference rules. The proliferation of rule extraction techniques
has prompted researchers [54], [55] to develop criteria to evaluate the proposed
algorithms, such as:
• Comprehensibility: The extent to which the extracted rules are humanly
comprehensible.
• Expressive power: The structure of the output presented to the end-user. Various
representations such as inference rules, decision trees, etc. can be used based on the
problem domain.
• Quality: The accuracy, fidelity and consistency of the extracted representations.
• Algorithmic complexity: The computational cost of the extraction process (time,
memory, etc).
• Portability: The extent to which the technique is independent of a particular network
architecture.
One of the earliest approaches for extracting comprehensible rules from an ANN can be
found in the work of Gallant [56] on connectionist expert systems. Classification rules
describing the network’s behavior were obtained by analyzing the role of the attributes
in the network. A number of techniques have been developed since then for addressing
the problem of comprehensibility in neural networks. According to Andrews et al. [57],
rule extraction methods can be classified into three categories, based on the view
taken by the algorithms of the underlying network: decompositional, pedagogical and
eclectic.
In the Decompositional methods, rules are extracted from the network at every
neuron of the hidden and output layers. These rules are then combined to describe the
behavior of the overall network. This approach can be considered a local approach to
rule extraction, as the analysis is primarily based on the architecture of the network. Most
approaches within this category employ a search procedure for finding subsets of
incoming weights that exceed the bias or threshold on a node. The identified subsets of
such activations are translated into propositional rules. The subset method by Fu [58] and
the M-of-N algorithm developed by Towell and Shavlik [59] are generic representatives
of this category. The subset method extracts simple propositional rules. The M-of-N
expression is satisfied when m of the possible n antecedents are satisfied. Setiono [60] in
his work describes the method to extract rules where first the activation values at the
hidden layer neurons are grouped together, and then the network is repeatedly split into
sub-networks for ease of analysis. The RULEX technique developed by Andrews and
Geva [61] directly interprets the weight vectors as rules. This technique can be used only
for a particular type of multilayer perceptron called the Constrained Error
Back-Propagation (CEBP) network. The decompositional approach to rule extraction has
various limitations. The algorithmic complexity increases exponentially with network
complexity, and various restrictions are imposed on the network architecture and the
training procedures, which adversely affect the generalization capability.
The approach of Saito and Nakano [62] was to select useful rules from a rule set
generated using the input activation values of the network that activate a given output
unit. Craven and Shavlik [63] proposed another pedagogical approach, which uses the
technique of querying the network. Their rule extraction process consists of
systematically sampling the network data and then generating queries to extract a rule
set from the network. This approach is less computationally intensive than search-based
methods. Validity Interval Analysis (VI-Analysis), proposed by Thrun [64], propagates
validity intervals throughout the network. Linear programming is used to determine
whether the set of proposed validity intervals is consistent with the network’s
activation values on all nodes. The RULENEG technique [65] extracts conjunctive rules
from a neural network. It is based on the principle that changing the truth value of
one of the antecedents in a conjunctive rule changes the consequent of the rule.
Several pedagogical approaches have also been developed for extracting decision
tree representations of the neural network. Craven and Shavlik [66] extract decision trees
from trained neural networks using a novel algorithm named TREPAN. This algorithm
employs a greedy gain ratio criterion for evaluating attribute splits. Binary and M-of-N
decision trees can be derived by this method. The ANN-DT (Artificial Neural Network -
Decision Tree) algorithm proposed by Schmitz et al. [67] is capable of growing binary
decision trees from neural networks by using attribute selection criteria based on
significance analysis for continuous valued features. The DecText (Decision Tree
Extractor) algorithm [68] is effective in extracting high fidelity trees from trained
networks. The paper also proposes different criteria for selecting an attribute to
partition the data.
The third, eclectic, category combines elements of the basic categories discussed
above. The BRAINNE system proposed by Sestito and Dillon [69] extracts simple if-then
rules. The method uses a unique approach to handle continuous data without
discretization. A genetic-algorithm-based rule extraction approach was developed by
Keedwell et al. [70]. Genes contain the weights between two adjacent layers, and
chromosomes are then constructed to represent a path from the input to the output
layer. The fitness function is calculated as the product of the weights being
transferred from the input to the output layer. The algorithm identifies the fittest
chromosomes, which are subsequently mapped into if-then rules. A major limitation of
this approach is the computational effort required by the genetic search.
One of the more popular rule extraction and refinement techniques is Validity
Interval Analysis (VI-Analysis) [64]. The basic procedure of the VI-Analysis algorithm
is as follows:
• Generation of a candidate rule set: For rule extraction, the first step is the
generation of candidate rules, which requires validity intervals (ranges of activation
values) to be specified on the input and output nodes.
• Interval refinement: The intervals are refined by propagating them through the
network in two phases: forward and backward. Linear programming is used to refine the
intervals.
• Rule validation based on convergence: There are two possible outcomes of the
analysis. Either the intervals converge and the rule is confirmed, or the validity
intervals are inconsistent with the behavior of the network; in that case the rule is
rejected and steps 2-4 are repeated with another candidate rule.
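The forward phase of the interval propagation can be sketched for a single sigmoid node. This is an illustrative fragment under simplifying assumptions, not Thrun's full algorithm: because the sigmoid is monotonic, bounds on the weighted sum carry directly through to bounds on the activation.

```python
# Forward interval propagation through one sigmoid neuron: given [lo, hi]
# validity intervals on the inputs, the pre-activation extremes are reached
# by pairing positive weights with matching bounds and negative weights with
# swapped bounds; monotonicity of the sigmoid then bounds the output.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_interval(w, b, lo, hi):
    """Output interval of sigmoid(w.x + b) when each x_i lies in [lo_i, hi_i]."""
    z_lo = b + np.sum(np.where(w >= 0, w * lo, w * hi))
    z_hi = b + np.sum(np.where(w >= 0, w * hi, w * lo))
    return sigmoid(z_lo), sigmoid(z_hi)
```

The backward phase, and the linear-programming consistency check across all nodes, are what make the full algorithm expensive, as noted below.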
VI-Analysis is computationally expensive because of the repeated calls to an
optimization module. In addition, the activation levels of the nodes are assumed to be
independent of one another. This assumption is not always valid, and the algorithm may
not find maximally general rules. Maire [71] shows that VI-Analysis always converges
in one run (forward and backward phase) for single-layer networks, and discusses its
limitations for multilayer networks.
2. 6. 4 ANN-DT ALGORITHM
The ANN-DT algorithm grows a binary decision tree from a trained network. A
representation of the algorithm adapted from [67] is shown in Figure 2.10. As illustrated
in the figure, the sampled data set S is split into two data sets S1 and S2, based on the
selected attribute. The main steps in the ANN-DT algorithm are as follows:
1. Sampling: An artificial data set is generated by sampling the feature space, and the
class labels are obtained by querying the trained network.
2. Selection of Attribute: For discrete output classes, the gain ratio is used for
selecting the splitting attribute; for continuous outputs, a significance measure can
also be used.
3. Stopping Criteria: The selected attribute splits the current set of data into two
subsets. For discrete classes, the process is terminated when an internal node contains
data with one class only.
2. 6. 5 TREPAN ALGORITHM
TREPAN is similar to conventional decision tree induction algorithms such as C4.5 [25],
but differs in that it treats the extraction of knowledge from a trained network as an
inductive learning task. The resulting decision tree approximates the network.
A major difference between TREPAN and other decision tree algorithms is the use of
an oracle that makes membership queries and returns the class labels. The network itself
serves as the oracle and answering the membership queries means using the network to
classify an instance. This information is used in developing the nodes and leafs of a tree.
Generally, an attribute is selected to be placed at the root node. In the next step a branch
is then added to this node of the tree for each possible value of this attribute. The
branching process splits the data set into a given number of subsets. The process is
recursively repeated at every branch, using only those data patterns that actually reach the
branch. If the number of instances reaching a node is less than a threshold, the oracle
is used to generate additional labeled instances. The branching
continues until all the patterns that reach a leaf node belong to the same class. No further
expansion of this leaf node is necessary and the node is designated with the appropriate
class label. The expansion then proceeds to other branches of the tree, until all possible
leaf nodes have been produced. For more detail on TREPAN see Craven and Shavlik [63].
TREPAN uses membership queries at each instance of the learner’s instance space,
to determine the class labels for each instance. This membership query is a question to
oracle (the network model) and returns the class label. TREPAN utilizes DRAWSAMPLE
routine to get a set of query instance to use for membership queries. These query
instances are subject to a set of constraints determined by the location of the node in the
64
tree. The constraints mainly state that instances should have outcomes for the tests at
nodes higher in the tree that cause the instances to follow the path from the root to the
given node.
The CONSTRUCTTEST function is used to determine the splitting test for a particular
node. TREPAN uses m-of-n expressions for its tests. An m-of-n expression is a Boolean
expression consisting of a threshold m and a set of n Boolean literals; it is satisfied
when at least m of its n literals are satisfied.
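An m-of-n test reduces to a simple count, as this minimal sketch shows:

```python
# An m-of-n test as used by TREPAN: satisfied when at least m of the n
# Boolean literals hold.

def m_of_n(m, literals):
    """True when at least m of the given boolean literals are satisfied."""
    return sum(bool(lit) for lit in literals) >= m
```

For example, the 2-of-3 expression over (True, False, True) is satisfied, while the 3-of-3 expression over the same literals is not.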
TREPAN ensures that a minimum number of instances exists at a given node before
giving a class label to the node or choosing a splitting test for it. The data set used
at a node is augmented with oracle-labeled instances until it reaches the user-specified
minimum sample size. This parameter controls both the size and the depth of the decision
tree and in turn influences the classification accuracy of the decision tree.
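TREPAN's central idea, using the network as an oracle that labels sampled instances and inducing splits from those labels rather than from the original data, can be sketched as follows. For brevity the sketch chooses a single binary split by plain information gain rather than TREPAN's gain ratio and m-of-n tests; the `oracle` object and its `predict` method are illustrative.

```python
# Minimal sketch of oracle-driven split selection: class labels come from
# membership queries answered by the trained network (`oracle.predict`),
# and the best (feature, threshold) pair is chosen by information gain.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(oracle, X):
    """Return (feature, threshold, gain) maximizing gain on oracle labels."""
    y = oracle.predict(X)  # membership queries answered by the network
    base, best = entropy(y), (None, None, -1.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (j, t, gain)
    return best
```

Applying this recursively to the instances reaching each branch, and drawing extra oracle-labeled samples when a node runs short of instances, gives the tree-growing loop described above.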
CHAPTER 3. THE CO2 CORROSION MODELING PROBLEM
Many different models of CO2 corrosion exist. These can be arbitrarily classified
into three categories based on how firmly they are grounded in theory:
• Mechanistic Models
• Semi-empirical Models
• Empirical Models
Mechanistic Models: these models describe the underlying electrochemical
reactions and have a strong theoretical background. All or most of the constants
appearing in this type of model have a clear physical meaning, and many of the
numerical values can be found in the corrosion literature. When calibrated on a
reliable experimental database, this type of model should, in principle, enable
accurate interpolation as well as extrapolation predictions. It is easy to add new
knowledge to these models with minimal modification of the existing model structure and
without having to recalibrate the existing constants.
Semi-empirical Models: these models are only partly based on firm theoretical
hypotheses. For practical purposes they are often extended to areas where insufficient
theoretical knowledge is available, so that the additional phenomena are described with
empirical functions. Some of the constants appearing in these models have a clear
physical meaning while the others are just best-fit parameters. Calibrated with a
sufficiently large and reliable experimental database, such models will enable good
interpolation, but extrapolation can lead to physically unrealistic results. New
knowledge can be added with some effort, usually by recalibrating the existing
constants.
Empirical Models: in these models most or all of the constants have little physical
meaning – they are just best-fit parameters to the available experimental results. When
calibrated with a large and reliable database, such models can perform well within the
calibration domain, but there is no assurance that the arbitrary empirical correlations
hold outside of it. The addition of any new knowledge to this type of model is rather
difficult: because of interactions with the existing empirical constants, it can be
added only with some degree of uncertainty.
The next section briefly discusses some of the significant models belonging to each
of these categories.
3. 2 MECHANISTIC MODELS
In CO2 corrosion of carbon steel, many different processes occur simultaneously. The
processes to be modeled are the electrochemical reactions and the flow of the different
components of the system, such as H+, CO2, H2CO3 and Fe++, as well as the chemical
reactions occurring between them. No single model is able to grasp all of these
complexities, but there have been studies which incorporate most of the important
processes occurring at the metal surface. One of the most significant and widely used
models, based on electrochemical reactions, was proposed by de Waard and Milliams [1],
[4]. Due to some of the basic assumptions
made by the authors in the modeling process, the model was questioned for validity
[72], [73], [3]. De Waard and Milliams [1], [4] later revised their model [74] based on
some of the constants determined by the experiments of Dugstad et al. [8]. This revised
model has been used on several occasions to extend its validity into areas concerning
corrosion in the presence of protective films [7], [74]. Another electrochemical model
was presented by Gray et al. [2], [6], which contained constants based on their own
glass cell and flow cell experiments. It
was a breakthrough in the scope and approach of CO2 corrosion modeling studies. Nesic
et al. [3] did a follow-up study and presented another electrochemical model. This
model had better predictive accuracy than the semi-empirical models of de Waard et al.
[7] and Dugstad et al. [8]. The model of Nesic et al. [3] described the electrochemical
processes occurring on the metal surface in detail, but the transport processes leading
to the observed currents were oversimplified. Pots [76] later presented a more
realistic model describing the transport processes in the boundary layer for the case
of CO2 corrosion. His model was based on the approach of Turgoose et al. [77], who were
the first to model the phenomena. Archour et al. [78] and Dalayan et al. [79] used
their own mechanistic models to simulate pit propagation of carbon steel in CO2
environments.
3. 3 SEMI-EMPIRICAL MODELS
Most of the models based on the mechanistic approach are called “worst-case”
models because they do not take into consideration the presence of protective surface
films, corrosion inhibitors, hydrocarbons, different steel types, high pressure and
other mitigating effects.
When the walls of the pipeline are wetted with oil (hydrophobic), no corrosion is
possible. In addition, some components of crude oil have inhibitive properties, which
help in forming more protective films. This crucial factor was incorporated in the
modeling process by de Waard and Lotz [7] by accounting for a water-wetting factor.
De Waard et al. [74] presented a semi-empirical model based on their initial study [1],
which considered the effects of protective films and corrosion inhibitors when modeling
corrosion. Dugstad et al. [8] presented a semi-empirical model of CO2 corrosion based
on a temperature-dependent basic equation (a best-fit polynomial function). Pots [75],
[76] presented a more realistic semi-empirical model describing the transport processes in
the boundary layer for the case of CO2 corrosion. His model was based on the approach
of Turgoose et al. [77], who were the first to model the phenomena.
Most pipelines and flow lines carrying oils operate under multi-phase flow conditions.
Modeling of multi-phase flow alone is a difficult task, and modeling its effect on CO2
corrosion even more so. Jepson et al. [80] presented a semi-empirical model suggesting
the importance of the flow regime to the corrosion rate.
3. 4 EMPIRICAL MODELS
It has been observed that CO2 corrosion rates in the field in the presence of crude oil
are much lower than those obtained in laboratory conditions where crude oil was not
used or where synthetic crude oils were used. One can identify two main effects of
crude oil on the CO2 corrosion rate. The first is a wettability effect, relating to a
hydrodynamic condition in which the crude oil entrains the water and prevents it from
wetting the steel surface. The second is attributed to components of crude oil that
reach the steel surface either by direct contact or by first partitioning into the
water phase.
Efird [11] stressed the importance of testing the effect of specific crude oils, and
defined the Corrosion Rate Break as the level of produced water in crude oil production
at which corrosion is accelerated and becomes a problem. Smart [12] in 1993 presented
work showing
that crude oils have surface active compounds (polar compounds containing oxygen,
nitrogen and sulfur) that strongly affect the wettability properties of brines. Adams et al.
[99] later presented work relating the water-wetting factor of corrosion to the velocities in
the flow pipe using a multiple regression model. Use of linear regression models to
describe the complex process of CO2 corrosion has always been a questionable approach.
In some recent studies [83], [84] the degree of inhibition was quantitatively modeled to
the chemical composition of crude oil and the concentration of saturates, aromatics,
resins, asphaltenes, nitrogen and sulfur. Hernández et al. [13] gave an insight about the
variables in crude oil composition that could be playing a major role in the inhibition
offered by crude oils. In this work, a statistical analysis was performed with several
Venezuelan crude oils evaluated experimentally under the same conditions. Crude oils
were separated into two groups: paraffinic and asphaltenic, depending on their
distribution of saturates, aromatics, resins and asphaltenes (SARA). The effects of basic
chemical and physical properties of crude oils were then evaluated by using multiple
linear regression analyses. A stochastic approach based on a Markov description for
modeling the phenomenon of pitting corrosion has been presented in the work of Provan
[81].
In recent years the field of artificial intelligence has been explored for modeling the
corrosion process. ANNs have been one of the most promising approaches to the
corrosion modeling process. The next section presents some of the important research
that has taken place in the field of modeling the corrosion process.
One of the earliest applications of ANNs in corrosion modeling was that of Smets and
Bogaerts [85]. They developed a series of neural networks to predict the SCC of type
304 stainless steel in near-neutral solutions as a function of chloride content, oxygen
content and temperature. They found that the neural network approach performed better
than conventional statistical models.
Urquidi-Macdonald [86] developed a neural network model used for predicting the
number and depth of pits in heat exchangers. No information was given about the network
size, other than that it had two hidden layers, nor about the number of training
points. The evolution of the pit depth and the number of pits were effectively modeled
and predicted.
Ben-Haim and Macdonald [87] described the use of neural network models to
predict the influence of various parameters on the acidity of simulated geological brines.
The solutions were based on NaCl + MgCl2. The network inputs were the Na+ and Mg2+
concentration and the temperature. The predicted output was the pH value. The data set
consisted of 101 points, of which 90 were used for training, with the remaining 11
retained as a test set. A simple network consisting of a single layer with two hidden nodes
was used. The network achieved good results, and the prediction error was of the same
order of magnitude as the experimental uncertainty.
Silverman and Rosen [88] combined artificial neural networks with an expert
system in order to predict the type of corrosion from a polarization curve. Inputs to the
networks included the passive current density, the pitting potential and the repassivation
potential, while outputs were the risks of crevice, pitting and general corrosion. Two
approaches were used: independent networks for each type of corrosion, and a single
combined network producing all three outputs. An expert system was used to interpret the
outputs produced by the two approaches. The relatively small size of the training data set
was one of the major concerns regarding the reliability of the model.
Trasatti and Mazza [90] developed a neural network to be used for the prediction
of crevice corrosion behavior of stainless steels. The network was trained from long-term
laboratory and field tests. Seventeen input variables were used with one hidden layer of
five nodes. Six hundred training examples were available; 450 of these were used for
training and the remaining 150 as a test set. The performance of the network was
reasonably good, but the very large number of input variables might be expected to
cause problems: a 17-dimensional hypercube has 2^17 (approx. 130,000) ‘corners’, so the
data space is inevitably very sparsely populated.
Another study combined wavelet analysis and ANNs to identify and quantify corrosion
damage from images. A classification algorithm was used to identify the corroded
regions from the non-corroded regions in each panel based on the extracted features.
Good accuracy was obtained in
identification of the corroded segments. A back propagation NN was used to predict the
material loss due to corrosion. Perturbing the images by changing the pixel values that
would correspond to the higher material loss due to growth in corrosion was simulated.
Experiments were conducted by perturbing the images of the damaged regions such that
growth in the extent of the material loss could be observed. A good trend was observed
between the predicted material loss and the experimental data. The results indicated that
the computational methods developed for corrosion analysis seem to provide reasonable
estimates.
Pidaparti et al. [97] presented a work that examined the residual strength of aging
aircraft panels in the presence of corrosion and fatigue damage. Both the residual strength
and the corrosion rates were predicted using a neural network consisting of two hidden
layer feed-forward architecture. Sensitivity analysis was performed for determining the
impact of input variables on the output. A series of simulations were also performed to
examine the generalization ability of the network in predicting the outputs for different
conditions of the input parameters. Each simulation tested the effect of a particular input
parameter on the predictions for a particular panel. The results obtained were in good
agreement with the experimental data. A similar work was carried out by Bailey et al.
[92]. A model was developed using neural networks to predict the ASTM G34 corrosion
rating and the resulting material loss in aging aircraft. Another model was also
constructed that would predict the cycles for final fatigue failure and the residual static
strength of a particular type of material, given the amount of material loss due to
corrosion.
Bucolo et al. [93] modeled the corrosion phenomena occurring in a pulp and paper
plant. In this study, two predictive models were constructed. Predictive models for both a
local and a global prediction were built to allow for the evaluation of the corrosion rate
taking place in the stainless steel used in the ozone bleaching devices used in the plant.
An MLP model was constructed and later merged with a neuro-fuzzy system (NF). The
performance of the adopted predictive monitoring showed that the neuro-fuzzy expert
system was able to improve the capability of the neural network model by both
improving the accuracy of the model and demonstrating a dramatic reduction in the
prediction error.
Leifer et al. [94] presented a model of pitting corrosion for the carbon
steel waste tanks containing aqueous radioactive waste, used for temporary storage of
spent nuclear fuel while permanent storage facilities for such materials were being
prepared. ANN was used to predict the corrosion rate. The back propagation of error
method was used to train and test the ANN model using archival pitting data. The
resulting predictions of the number of pits obtained from the neural network model were
in agreement with the results obtained from experimental methods [111]. In another of
his works, Leifer et al. [95] presented a predictive model to determine the rate of
pitting corrosion in carbon steel waste tanks used to store radioactive sludge. The
model covered several corrosion inhibitors and temperature ranges, and the
concentration levels of inhibitors such as nitrite were used to analyze the
experimental data. The network architecture selected was a back-propagation network,
trained on the experimental data. The results revealed good accuracy when predicting
conditions within the range of the training data.
Another study modeled the corrosion-fatigue crack growth rate in dual-phase (DP)
steels (primarily a low-carbon steel with micro-alloying additions of vanadium and
boron) using an artificial neural network. The training data consisted of
corrosion-fatigue crack growth rates at varying stress intensity ranges for martensite
contents between 32 and 76%. The ANN model used consisted of three hidden layers with a
back-propagation architecture. Even though a large number of input variables were used
during the training of the model, its predictions exhibited excellent agreement with
the experimental data.
Nesic and Vrhovac [91] developed a hybrid model combining the reliability of a
mechanistic model with the flexibility of the neural network approach. The model was
developed using the experimental database of Dugstad et al. [8]. The model architecture
consisted of a single hidden layer back propagation NN having 66 input neurons and 51
hidden neurons. Genetic algorithms were used for the network training. The inputs to
the network were indirect, crude or noisy parameters, called primitive descriptors,
such as: t, pH, PCO2, Fe++, HCO3-, and v (flow velocity of oils). Relations between
these primitive descriptors and the corrosion rate were learned by the network. The
prediction ability was found to be significantly better than that of conventional
models.
CHAPTER 4. METHODOLOGY
4. 1 CORROSION TESTS
The main goal of this research was to build a model based on actual data
collected from experimental results. A detailed description of the corrosion tests
and the resulting data was published in a previous paper by Hernández et al. [13]; a
brief summary is given here. A SARA analysis of the saturate components, aromatics,
resins and asphaltenes was performed on each crude oil. API density (°API), total
nitrogen content (NTOTAL), Total Acid Number (TAN), sulfur content (S%), Vanadium (V)
and Nickel (Ni) were measured according to ASTM standards.
Weight loss corrosion tests were performed on coupons. Three coupons were used for
each set of testing conditions; two of them were used for corrosion rate calculations and
the third for surface analysis and corrosion product characterization. After calculating
corrosion rates, these were then translated into inhibiting capacity by comparing them
against the corrosion rate of the blank:

Inhibiting Capacity = 1 − (corrosion rate with crude / corrosion rate blank)    (4.1)
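Equation (4.1) is a one-liner; the units cancel, so any consistent corrosion-rate unit works:

```python
# Equation (4.1): inhibiting capacity of a crude oil relative to the blank
# test. 1 means full inhibition; 0 means the crude offers no protection.

def inhibiting_capacity(rate_with_crude, rate_blank):
    """Dimensionless inhibiting capacity from two corrosion rates."""
    return 1.0 - rate_with_crude / rate_blank
```

For example, a crude that halves the blank corrosion rate has an inhibiting capacity of 0.5.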
The first step towards the development of the model was to create a network
structure that would be most efficient in capturing the complex nature of the corrosion
prediction problem. There are four major components related to the development of a
neural network model:
1. Choice of data and division (based on sizes) into training, cross-validation and
testing data sets
2. Choice of network architecture
3. Choice of transfer functions and training algorithm
4. Choice of learning constants
There are no fixed rules for determining these network selection parameters. Rigorous
experimentation and a number of trials with different types of network architecture
were performed to achieve a good network model for the given data. The software
NeuroSolutions version 4.21, developed by NeuroDimensions Incorporated, was used for
the development and testing of the neural network model.
The original data set was split into training, cross-validation and testing data sets
where:
• 15% of the exemplars, run concurrently with the training set, were used for cross-
validation, during which the MSE was computed on the held-out set
• 20% of the exemplars were used for testing the trained network
with the remaining exemplars used for training.
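The partitioning described above can be sketched as follows (a hypothetical helper, not the NeuroSolutions implementation; the training share is simply the remainder after the cross-validation and test sets are drawn):

```python
import numpy as np

def split_exemplars(n_exemplars: int, cv_frac: float = 0.15,
                    test_frac: float = 0.20, seed: int = 0):
    """Randomly partition exemplar indices into training, cross-validation
    and test sets; whatever is left after CV and test goes to training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_exemplars)
    n_test = int(round(test_frac * n_exemplars))
    n_cv = int(round(cv_frac * n_exemplars))
    test = idx[:n_test]
    cv = idx[n_test:n_test + n_cv]
    train = idx[n_test + n_cv:]
    return train, cv, test
```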
Several network types, including the Multilayer Perceptron, Generalized Feed Forward
Network, Modular Neural Network and Radial Basis Function network, were
experimented with to achieve the model with the best classification accuracy. The
results are summarized in Table 4.1.
Table 4.1 Comparison of Network Architectures

Type of Network | Number of Hidden Layers | Transfer Functions | Training Algorithm | Dimensionality | Classification Accuracy
Multilayer Perceptron | 1 | Hyperbolic Tangent | Gradient Descent | 11-8-1 | 75.82
Multilayer Perceptron | 1 | Logistic | Gradient Descent | 11-6-1 | 83.67
Multilayer Perceptron | 2 | Hyperbolic Tangent | Gradient Descent | 11-10-8-3 | 88.52
Multilayer Perceptron | 2 | Hyperbolic Tangent | Gradient Descent | 11-10-8-1 | 92.2
Multilayer Perceptron with Genetic Optimization | 2 | Hyperbolic Tangent | Gradient Descent | 11-6-6-1 | 96.7
Radial Basis Function | 1 | Gaussian, Hyperbolic Tangent | Gradient Descent | 11-7-1 | 80.14
Radial Basis Function | 2 | Gaussian, Hyperbolic Tangent | Gradient Descent | 11-8-6-1 | 86.77
Once the preliminary tests revealed that MLPs were more accurate than the others
in predicting the inhibition rates, the Genetic Control component of the software
was utilized in order to obtain the best network parameters. Steady State progression
genetic algorithms were used, in which only the worst member of the population is
replaced with each iteration. This method of progression tends to arrive at a near-optimal
solution quickly.
The genetic operator used for the algorithm is called Selection [112]. It selects
chromosomes from the population for reproduction; each selected chromosome undergoes
crossover and/or mutation to generate offspring, which are then added to the next
generation's population. The process of Crossover [112] develops a new chromosome
that combines the characteristics of both parents. The Crossover Probability controls the
process of crossover; in our model a crossover probability of 0.9 was used. Another
genetic operator called Mutation is used to alter one or more genes in a particular
chromosome, resulting in a new gene value. With the inclusion of these new gene values
the GA can obtain better results as compared to the ones obtained before the crossover
and mutation of the parent chromosomes. A mutation probability of 0.01 was used.
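The steady-state progression with selection, crossover (probability 0.9) and mutation (probability 0.01) described above can be sketched in a few lines (a toy illustration, not the Genetic Control component of NeuroSolutions; the fitness function and all names are illustrative):

```python
import random

def steady_state_ga(fitness, n_genes, pop_size=20, iters=200,
                    p_cross=0.9, p_mut=0.01, seed=1):
    """Steady-state GA: each iteration breeds one offspring and replaces
    only the worst member of the population with it."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(iters):
        # Selection: pick each parent by a binary tournament
        def pick():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b
        p1, p2 = pick(), pick()
        # Crossover with probability 0.9: child mixes genes of both parents
        if rng.random() < p_cross:
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
        else:
            child = list(p1)
        # Mutation with probability 0.01 per gene: perturb the gene value
        child = [g + rng.gauss(0, 0.1) if rng.random() < p_mut else g
                 for g in child]
        # Replace only the worst member of the population
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        pop[worst] = child
    return max(pop, key=fitness)

# Toy fitness: maximize the negative distance from the origin
best = steady_state_ga(lambda c: -sum(g * g for g in c), n_genes=3)
```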
Based on the comparison of the results, a two hidden layer MLP (11-6-6-1) with six
processing elements in each hidden layer and a hyperbolic tangent transfer function at the
hidden layers was selected. Gradient descent was used as the training algorithm. Step size
and momentum rate are the key learning parameters for this algorithm. In order to
accelerate the network 'learning' and to make sure that the probability of network
convergence is highest at the global minimum, both the momentum rates and the step
sizes were tuned over multiple runs.
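A forward pass through the selected 11-6-6-1 architecture, with hyperbolic tangent transfer functions at both hidden layers, can be sketched as follows (randomly initialized weights standing in for the trained NeuroSolutions model; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized weights for an 11-6-6-1 MLP
W1, b1 = rng.standard_normal((11, 6)), np.zeros(6)   # input -> hidden 1
W2, b2 = rng.standard_normal((6, 6)), np.zeros(6)    # hidden 1 -> hidden 2
W3, b3 = rng.standard_normal((6, 1)), np.zeros(1)    # hidden 2 -> output

def forward(x):
    """Forward pass: tanh at both hidden layers, linear output unit."""
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h2 @ W3 + b3

y = forward(rng.standard_normal((5, 11)))  # 5 exemplars, 11 inputs each
```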
4. 2. 4 TERMINATION CRITERIA
The gradient descent algorithm determines the weight vector which maps the
network input parameters to the desired output. This weight vector is randomly
initialized. The randomness of the initial weight vector is very important for learning, but
the trained network can exhibit different properties with different initial weight vectors.
Therefore, to increase the probability of a good initial solution (weight vector), a number
of runs are required. In each of these runs the network is retrained from a new random
initialization and checked for generalization. Four termination criteria were used to
determine when training should stop. The training parameters (i.e., the learning
parameters and the termination criteria) included:

Number of Runs: 5
Number of Epochs without improvement in CV error: 100
Once all of the network parameters were selected, six test runs (Test 1 to 6) were
conducted using exactly the same network architecture and network parameters, but a
different set of randomized training data. These test results provided a set of different
weight vectors that were randomly initialized and adapted during the training process.
4. 3 SENSITIVITY ANALYSIS
The next step was to analyze the interrelationship between the input variables and
their effect on the output of the network. Sensitivity analysis was performed for the
chosen MLP network and for all six test runs of that particular model. The sensitivity
was computed as the corresponding difference (delta) in the output(s), graphed using the
max-min criteria of the output (inhibition). The results for one of the six test runs are
shown in Figure 4.2 to Figure 4.12, which illustrate the separate sensitivities for each
variable. A cumulative sensitivity graph (Figure 4.13) was obtained by averaging the
sensitivity values over all six test runs (Test 1 to 6). From the analysis of the cumulative
sensitivity graph it is apparent that crude oil percentage had the most impact on the
output (inhibition rate). Because of this, the data was subdivided on the basis of crude
oil percentage.
[Figure 4.13: bar chart of cumulative sensitivity (approximately 0 to 0.6) for each input
variable: TAN, S%, Aromatics, Total Nitrogen, Ni, API, V, Saturates, Resins,
Asphaltenes and % Crude Oil]
Figure 4.13 Cumulative Sensitivity Graph
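The max-min sensitivity computation described above can be sketched as follows (a hypothetical helper: one input is swept over its observed range while the others are held at their means, and the spread of the network output is reported):

```python
import numpy as np

def sensitivity(predict, X, var_index, n_steps=50):
    """Vary one input from its minimum to its maximum while holding the
    other inputs at their means; return the max-min spread of the output."""
    x = X.mean(axis=0).copy()
    lo, hi = X[:, var_index].min(), X[:, var_index].max()
    outputs = []
    for v in np.linspace(lo, hi, n_steps):
        x[var_index] = v
        outputs.append(predict(x))
    return max(outputs) - min(outputs)
```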
The data was separated by crude oil percentage into 4 different groups: 1%, 20%,
50% and 80%. Sensitivity analysis was also performed on each of these groups, as shown
in Figure 4.14 to Figure 4.17.
Figure 4.14 Sensitivity About the Mean for 1% Crude Oil Concentration
Figure 4.15 Sensitivity About the Mean for 20% Crude Oil Concentration
Figure 4.16 Sensitivity About the Mean for 50% Crude Oil Concentration
Figure 4.17 Sensitivity About the Mean for 80% Crude Oil Concentration
Based on the results, similar behavior patterns were noted between the 1% and 20%
crude oil data, and similarly between the 50% and 80% crude oil data. The similar
groups were combined (1% with 20%, and 50% with 80%) and another sensitivity
analysis was performed (Figure 4.18 and Figure 4.19) to identify the similarities between
the combined groups.
Figure 4.18 Sensitivity About the Mean for 1% and 20% Combined
Figure 4.19 Sensitivity About the Mean for 50% and 80% Combined
From the results of the cumulative sensitivity analysis we found that Nickel (Ni),
crude oil and TAN were some of the important input variables affecting the output.
These results were in accordance with the earlier studies of Hernández et al. [13]. In
light of these results, to further explore the interrelationship between the input variables,
an Excel model was constructed. The model reproduced the network's computations and
generated the predicted inhibition rate for a given set of input values. The network output
was held constant and the values of the input variables were varied to explore the effects.
Figure 4.20 demonstrates the behavior of crude oil and Nickel while holding the network
output and the remaining inputs constant.
Figure 4.20 Relationship Between % Crude Oil and Ni at Constant Inhibition Output
Similarly, Figure 4.21 and Figure 4.22 illustrate the behavior patterns between crude oil
and TAN, and between API and aromatics, at constant inhibition output.
Figure 4.21 Relationship Between % Crude Oil and TAN at Constant Inhibition Output
Figure 4.22 Relationship Between API and Aromatics at Constant Inhibition Output
This same approach was further developed to show interactions between three
variables (Crude oil, Ni and TAN) at constant output inhibition as shown in Figure 4.23.
Figure 4.23 Relationship Curves Between % Crude Oil, Ni and TAN at Constant
Inhibition.
The Network Interpretation Diagram (NID) was constructed to track the direction and
magnitude of the influence of the input variables on the output parameter. Figure 4.24
represents the NID for the network model and illustrates the relative influence of each
input variable in predicting the output response.
4. 5 GARSON'S ALGORITHM
Garson's algorithm was applied to the network model in order to decipher the
relative importance of each input variable and its contribution to the predicted output.
Figure 4.25 displays the results of the algorithm in the form of a pie chart, partitioning
the relative importance among the input variables.
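Garson's algorithm itself is a simple weight-partitioning computation. The sketch below follows the original single-hidden-layer formulation (the two-hidden-layer network used here would require chaining the weight products through both layers); all names are illustrative:

```python
import numpy as np

def garson(W_ih, W_ho):
    """Garson's algorithm for one hidden layer: partition the prediction
    into relative contributions of each input via products of the absolute
    connection weights (W_ih: inputs x hidden, W_ho: hidden weights)."""
    # Contribution of input i routed through hidden unit j
    c = np.abs(W_ih) * np.abs(W_ho).reshape(1, -1)
    # Share of each hidden unit's signal attributable to each input
    r = c / c.sum(axis=0, keepdims=True)
    importance = r.sum(axis=1)
    return importance / importance.sum()  # relative importance, sums to 1
```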
4. 6 TREPAN ALGORITHM
The next step in the analysis was to extract rules which would translate the neural
network model into explicit symbolic form. For the TREPAN algorithm, the regression
problem with continuous output data was transformed into a classification problem. The
output (% inhibition) was divided into 5 classes. Two different data sets (EVEN and
UNEVEN) were generated, one having even class ranges and the other having uneven
class ranges; the class ranges for both data sets are shown in Table 4.3 and Table 4.4.
Varying the minimum_sample parameter produced a number of decision trees with
different sizes and classification accuracies; a minimum_sample size of one generated
the decision tree with the best accuracy.
Two different kinds of decision trees were extracted for each set of data. The
Trepan_tree is extracted from the loaded neural network model using the standard
TREPAN algorithm, which applies m-of-n tests at the internal nodes. The
Disjunctive_Trepan_tree extracts a tree from the loaded network model using a variant of
TREPAN that applies disjunctive (i.e. "or") tests instead of the general m-of-n tests at the
internal nodes of the extracted tree.
The performance statistics for the two variants of decision trees applied to the two data
sets are shown in Table 4.5.

Table 4.5 Classification Accuracy of the Extracted Decision Trees

Data Set | Trepan_tree Training Data | Trepan_tree Test Data | Disjunctive_Trepan_tree Training Data | Disjunctive_Trepan_tree Test Data
EVEN Classes | 87.30% | 62.70% | 89.60% | 64.80%
UNEVEN Classes | 91.80% | 69.75% | 92.30% | 71.80%
Figure 4.26 and Figure 4.27 show partially expanded views of the extracted decision trees.
Figure 4.27 Partially Expanded View of Trepan_tree Extracted From the UNEVEN Class
Data.
In the decision trees (Figure 4.26 and Figure 4.27) the circles represent the leaf
nodes and indicate the class label for the response variable (inhibition) predicted for a
particular set of values of the input variables represented in the path. The decision tree
can be easily decomposed into propositional rules; Table 4.6 shows the resulting set of
20 distinct rules.
Table 4.6 Rule Set Decomposed from the Decision Tree

Rule No. | Rule Text | Class Label
1 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 9.26) | CL5
2 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 13.15) AND (API > 9.26) AND (% Crude Oil <= 64.99) | CL4
3 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 13.15) AND (API > 9.26) AND (% Crude Oil > 64.99) | CL5
4 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API > 13.15) AND (% Crude Oil <= 34.99) | CL3
5 | (% Crude Oil > 10.50) AND (Resin > 17) AND (API > 13.15) AND (% Crude Oil > 34.99) | CL5
6 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin <= 5.30) | CL3
7 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API <= 23.95) | CL2
8 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API <= 32.34) AND (API > 23.95) | CL4
9 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API > 32.34) | CL4
10 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 64.99) | CL4
11 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API <= 23.95) | CL3
12 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API > 23.95) AND (S % <= 0.57) | CL3
13 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API > 23.95) AND (S % > 0.57) | CL4
14 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil > 64.99) | CL2
15 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil <= 64.99) AND (API <= 32.90) | CL1
16 | (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil <= 64.99) AND (API > 32.90) | CL2
17 | (% Crude Oil <= 10.50) AND (Nitrogen >= 6514.38) | CL2
18 | (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates <= 57.15 OR TAN <= 4.22) | CL1
19 | (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates > 57.15 OR TAN > 4.22) AND (S % >= 0.58) | CL2
20 | (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates > 57.15 OR TAN > 4.22) AND (S % < 0.58) | CL1
The first rule implies that "IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API
<= 9.26) THEN the class label is CL5", i.e. the predicted inhibition rate would fall in the
range beginning at 0.98.
The first objective was to obtain a network architecture which accurately mimics the
data patterns. It is evident from Table 4.1 that the 11-6-6-1 MLP network has the best
prediction accuracy on the training data. The Mean Square Error (MSE), the mean
squared difference between the network output and the desired output, is an indirect
measure of the performance of the model. Table 5.1 shows the accuracy and the MSE
values for the selected model.
Table 5.1 Accuracy and MSE of the Selected Model

Model | Accuracy | MSE
11-6-6-1 MLP Neural Network Model | R = 97.60% | 0.0026
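The two figures of merit reported in Table 5.1 are computed as follows (a minimal sketch; the function names are illustrative):

```python
import numpy as np

def mse(desired, predicted):
    """Mean square error between the desired and the network output."""
    desired, predicted = np.asarray(desired), np.asarray(predicted)
    return float(np.mean((desired - predicted) ** 2))

def correlation_r(desired, predicted):
    """Linear correlation coefficient R between desired and predicted."""
    return float(np.corrcoef(desired, predicted)[0, 1])
```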
Figure 5.1 shows the model performance on training vs. test data. Another measure
of performance is the number of epochs required for the model to reach the minimum
MSE value; in this case the selected model was able to converge to the minimum MSE
value in the very first run after only 3500 epochs.
The initial sensitivity runs on the selected model (Figure 4.1) revealed that the
variables having the greatest influence on the output response were crude oil percentage,
Ni content and API gravity. The separate sensitivity analyses (Figure 4.2 to Figure 4.12)
further explain the effect of each variable on the final output. An increase in % Crude Oil
causes an increase in the inhibiting capacity of the oil (see Figure 4.2). An increase in
the Nickel content has a detrimental effect, reducing the inhibiting capacity of the oil (see
Figure 4.3). API (Figure 4.4) and total Nitrogen (Figure 4.5) tend to increase the
inhibition rate as their respective contents increase. Vanadium (Figure 4.6) and Total
Acid Number (Figure 4.8) resulted in an increase of the inhibiting capacity; however,
the effect is very small, as can be seen from the values on the y-axis. The S% content in
the range tested (Figure 4.7) was shown to decrease the inhibiting capacity. In regards
to the SARA components of the crude oil, none showed a significant effect; however,
saturates (Figure 4.9) were shown to decrease the inhibiting capacity as their content
increases, contrary to aromatics (Figure 4.10), resins (Figure 4.11) and asphaltenes
(Figure 4.12). The cumulative sensitivity graph, obtained by averaging the sensitivity
values from the six different test runs, was also consistent with the initial sensitivity
analysis and indicated that crude oil percentage, Ni content and API gravity were the
most important factors affecting the inhibition rate.
The tendency of % crude oil vs. inhibition was clear in both the data and the model:
an increase in crude oil content increases the degree of corrosion protectiveness provided
by the crude oil. With API density, even though the data are scattered, the model predicts
an increase in inhibition as API increases, implying that lighter crude oils provide higher
values of inhibiting capacity.
In order to see if this effect was repeatable, separate sensitivity analyses were
performed for the various crude oil contents evaluated: 1%, 20%, 50% and 80%.
• For 1% crude oil (see Figure 4.14) the model tends to predict a higher inhibiting
capacity than the real measured values, but the R (model accuracy) value is still
high. The most influential variables are Nickel, sulfur content, TAN and
asphaltenes; all but Nickel increase the inhibiting capacity.
• For 20% crude oil (Figure 4.15, R=0.98) Nickel is not as critical, and the
variables with the most influence are API, total Nitrogen, resins and TAN.
• For 50% crude oil (Figure 4.16, R=0.96) the four variables with the highest
sensitivity are Nickel, Vanadium, aromatics and sulfur. Nickel and aromatics
have an inverse effect; if only positive effects are considered, then V, S%,
asphaltenes and resins show the highest influence.
• For 80% crude oil (Figure 4.17, R=0.98) Nickel and Vanadium showed the
highest sensitivities, decreasing the inhibiting capacity as their content increases;
asphaltenes follow, and then aromatics, the latter also having an inverse
relationship. Note that the sensitivity values are a lot higher for the first two
cases.
An interesting result from the model is that it was able to point out notably different
behaviors when the crude oil concentration changes. By putting together the data for low
concentrations (1% and 20%, see Figure 4.18) and the data for higher concentrations
(50% and 80%, see Figure 4.19) and looking at the sensitivities, it can be concluded that
at higher concentrations the presence of crude oil has the greatest influence on the output
and the effects of the other variables are not as significant. At low crude oil
concentrations the sensitivities are a lot higher (up to 0.8), indicating that inhibition is
related not so much to the amount of crude oil as to the presence of oil, or to a
combination of two or more variables.
From the interrelationship graphs (see Figure 4.20) we can clearly see that over the
[1% - 20%] crude oil range the Nickel content of the oil tends to increase, while over the
[20% - 80%] range the Nickel content decreases, depicting an inverse relationship
between the two variables. Similarly, Figure 4.21 shows a linear relationship between
crude oil and TAN for crude oil ranging from 20% to 80%.
The Network Interpretation Diagram results serve as the basis for reiterating the fact
that certain variables have a positive effect on the output response whereas others tend to
have a negative effect. From Figure 4.24 we can see that thick continuous lines,
representing a strong positive excitatory signal, are generated from % crude oil, API and
total Nitrogen content, whereas thin continuous lines are generated from Vanadium and
TAN. These results are consistent with the patterns observed in the sensitivity analysis,
showing that all the above-mentioned variables positively affect the inhibition rate, i.e.
an increase in these variables tends to increase the inhibition rate. On the contrary,
variables such as Nickel, S% and saturates generate thick dashed lines, which represent
a negative or detrimental effect on the output response. The NID thus provided a clear
visual summary of the direction and relative magnitude of each variable's influence.
The main idea behind the implementation of Garson's algorithm was to partition the
relative share of the prediction associated with each input variable and determine whether
any of the input variables could be eliminated from further analysis. From Figure 4.25 we
can positively conclude that % Crude oil is by far the most influential factor affecting the
inhibition rate. The relative partitioning also revealed that, apart from crude oil and
Nickel, all the input variables have an almost similar effect on the output.
The TREPAN algorithm was applied to both the EVEN and UNEVEN class data
sets, and the extracted trees were evaluated on training and test data. From the results
shown in Table 4.5 the following inference can be made:
• The classification accuracy on both training and test data was found to be higher
for the UNEVEN class data set. This is mainly because most of the data points
had high inhibition rates, so dividing the output into uneven classes based on the
frequency distribution of the data within particular ranges proved to be the better
approach.
The efficacy of the rule extraction task can be tested along the following
dimensions:
• Comprehensibility: the rules decomposed from the Trepan decision tree were
successfully able to provide class labels from simple combinations of the values
of the input variables. Table 5.2 lists the number of input antecedents in a
particular rule (the left column) and the number of rules having that number of
features in their respective antecedents (right column). From the table it is clear
that almost 90% of the rules in the rule set have fewer than five input features in
their antecedents, indicating a simple and comprehensible rule set.
Table 5.2 Number of Features in the Rule Antecedent for the NN-Rule Set

Number of features in the rule antecedent | Number of rules (Total: 20)
2 | 1
3 | 9
4 | 8
5 | 2
• Accuracy and Fidelity: the rule set was generated from the decision tree created
by the TREPAN algorithm. Considering the fact that TREPAN uses the network as
an oracle to predict the class labels, the rule set accurately mimics the behavior of
the trained neural network. Hence, we can positively conclude that the fidelity of
the rule set to the network is high.
5. 6 COMPARISON METHODOLOGIES
5. 6. 1 STATISTICAL ANALYSIS
A multiple linear regression analysis was performed in MINITAB to produce a
regression equation that would show whether the model could be augmented by knowing
any possible linear relationships among each of the input variables and the output. The
fitted equation's coefficients and their statistics are interpreted as follows.
The Coef is the regression coefficient for a given variable and SE Coef is the standard
error of the coefficient. The t-value (T) is compared to the t-distribution to determine if
a predictor is significant. The bigger the absolute value of the t-value, the
more likely the predictor is significant. The p-value (P) is the probability value and it is
often used in hypothesis tests to help decide whether to reject or fail to reject a null
hypothesis. The p-value is the probability of obtaining a test statistic that is at least as
extreme as the actual calculated value, if the null hypothesis is true. The smaller the p-
value, the smaller the probability is that one would be making a mistake by rejecting the
null hypothesis. A commonly used cut-off value for the p-value is 0.05. For example, if
the calculated p-value of a test statistic is less than 0.05, the null hypothesis is rejected.
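The quantities discussed above (Coef, SE Coef, T and P) come from an ordinary least squares fit; a minimal sketch of their computation (not the MINITAB implementation; all names are illustrative) is:

```python
import numpy as np
from scipy import stats

def ols_summary(X, y):
    """Ordinary least squares: coefficient, standard error, t-value and
    two-sided p-value for the intercept and each predictor."""
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    dof = len(y) - X1.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    t = coef / se                                     # T = Coef / SE Coef
    p = 2 * stats.t.sf(np.abs(t), dof)                # P from the t-distribution
    return coef, se, t, p
```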
The p-values for the estimated coefficients of API, TAN and Crude Oil are 0.000,
indicating that they are significantly related to % Inhibition. The p-values for V, Ni,
NTOTAL and S% are > 0.05, indicating that these are not related to % Inhibition at an
α-level of 0.05. The R-square value obtained was 55%, which is fairly low, suggesting
that the relationship between the predictor and response variables is not linear. The
R-square value of 55% implies that only 55% of the variability in the output could be
captured by the linear model.
Many statistical tests and intervals are based on the assumption of normality.
Unfortunately, many real data sets are in fact not approximately normal. However, an
appropriate transformation of a data set can often yield a data set that does follow a
normal distribution. The Box-Cox power transformation was used:

    T(Y) = (Y^λ − 1) / λ    (5.2)

Transformations were also applied to some of the input variables, e.g. (API)^2 or
(Ni)^-0.5, to normalize the data. The Box-Cox transformation was applied to the response,
but the results did not improve, reinforcing the fact that the relationship between the
predictor and response variables is not linear. Lastly, Stepwise Regression [100] was
performed to consider reducing the model size by eliminating some of the input variables
(within the scope of the analysis). Once again, there was no significant improvement in
the fit.
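For reference, the Box-Cox transformation of Equation 5.2 can be expressed as follows (a minimal sketch; the log transform is the standard limiting case at λ = 0):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox power transformation (Equation 5.2) for positive data;
    at lambda = 0 the transformation reduces to the natural logarithm."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam
```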
It has been seen that neural networks are generally better at approximating the
complex relationships between continuous variables and their influence on the output.
Rules extracted from decision trees based on network parameters such as weights and
biases tend to be more accurate in some cases than those derived directly from the data
by other machine learning methods, such as ID3 [23], C4.5 [25] or CART [102].
The Waikato Environment for Knowledge Analysis (WEKA) [101] Java software
package provides a host of well-documented data structures, classes and tools for the
development of machine learning schemes. WEKA's J4.8 algorithm implements the
C4.5 algorithm to extract decision trees. In order to compare the results with TREPAN,
the C4.5 algorithm was applied to the data. The results of the C4.5 algorithm applied to
the EVEN and UNEVEN classified data sets are shown in Table 5.4.
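C4.5 grows its tree by choosing, at each node, the split with the highest gain ratio (information gain normalized by split information). A minimal sketch of that criterion (not the WEKA J4.8 implementation; names are illustrative) is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label sample, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, splits):
    """C4.5's split criterion: information gain divided by split info
    (the entropy of the split sizes themselves)."""
    n = len(parent)
    gain = entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)
    split_info = entropy([i for i, s in enumerate(splits) for _ in s])
    return gain / split_info if split_info else 0.0
```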
From the results shown in Table 5.4 we can see that the data with the EVEN
classification ranges had higher prediction accuracy on both the training and the test data.
From the table it is evident that the Neural Network model clearly outperforms the
traditional Multiple Regression Analysis, providing us with a model that captures the
data patterns and the interrelationship between the predictor variables and their effect on
the response. Again, the rule set extracted from the Disjunctive_Trepan decision tree,
based on network parameters such as synaptic weights and biases, has higher prediction
accuracy as compared to the C4.5 decision tree derived directly from the data values.
6. 1 CONCLUSIONS
The main aim was to develop a model that would generate high predictive accuracy
even with a limited amount of data. Another important driving factor for the research was
to come up with a robust model which could handle noisy data and still be able to explain
the complex relationships between the various constituents of the oil and provide
knowledge based on patterns in the data. This thesis covered several aspects of a
knowledge-based approach for predicting the corrosion rate based on the constituents of
the oil. Sensitivity analysis was clearly able to explain the interrelationships between the
input variables. The analysis revealed that variables such as crude oil, API, Nickel and
total Nitrogen content were the most important factors affecting the inhibition rate. These
results were in accordance with the experimental results obtained by Hernández et al.
[13]. Further analysis using the NID and Garson's algorithm determined the relative
share of importance of all the input variables in the predicted response. Finally, TREPAN
presented us with a rule set extracted from the decision tree, which was able to accurately
mimic the neural network in classifying the patterns in the rule space. The efficacy
analysis of the rule set showed that the rules were simple and comprehensible, with few
antecedents per rule.
A comparative study was also undertaken to test the performance of the neural
network based approach compared to statistical analysis and traditional data mining
methods. The encouraging results showed that a neural network based model, coupled
with rule extraction, outperforms these traditional methods. In summary, the research
successfully developed a rule-based, knowledge-driven approach to predicting the
corrosion inhibition behavior of crude oils.
6. 2 FUTURE RESEARCH
There are various directions that can effectively channel future research in this field.
One concern is the gap between the available data on corrosion behavior and the
unpredictable behavioral patterns of the network model that may occur in regions of the
problem domain where no data is available. Neural networks cannot produce reliable
predictions for input conditions that are outside the ranges of the data used to train them.
Standard interpolation techniques can be successfully applied to one or two variables, but
in the case of corrosion the number of variables is significant (11 variables in this
research). There is a need to develop interpolation methods that can generate data points
while accommodating a large number of input variables and still remain consistent with
the underlying behavior. Another direction is reducing the time and cost of training the
neural network; the training regimen can be improved through knowledge-based
initialization strategies. Such techniques can map the available domain knowledge into
the basic network structure.
Regression Trees
Most of the rule extraction methods depend on decision trees, which are primarily
classification tools. Discretizing the original continuous data introduces significant noise
into the learning data set, and such class-based trees cannot capture the actual regression
surface of the network. A proposed research direction is developing a regression tree,
which would have leaves characterized by real-valued functions rather than class labels.
The current research focuses on the extraction of rules from a trained neural network for
a classification problem. Much of the existing corrosion knowledge is available in the
form of equations that can be used for calculations. There is a need for extraction
methods that describe the modeled processes through a set of equations. Such a
representation would express both the existing knowledge (theoretical knowledge based
on chemical analysis) and the knowledge extracted from the network in a common form.
REFERENCES
[29]. Efraim, T., Jay E. A., Liang T. P., McCarthy, R. V., “Decision support systems
and intelligent systems,” Prentice Hall, Upper Saddle River, NJ, 2001.
[30]. Principe, J. C., Euliano, E. R., Lefebvre, W. C., “Neural and adaptive systems:
Fundamentals through simulations with cd-rom,” John Wiley & Sons, Inc., New
York, NY, 1999.
[31]. Reed, R. D., Marks, R. J., “Neural smithing: Supervised learning in feedforward
artificial neural networks,” MIT Press, Cambridge, MA, 1998.
[32]. Rumelhart, D. E., Hinton, G. E., Williams, R. J., "Learning representations by
back-propagating errors," Nature, 323, 1986, p. 533-536.
[33]. Hinton, G. E., "Connectionist learning procedures," Artificial Intelligence,
40(1-3), 1989, p. 185-234.
[34]. Whitley, D., Starkweather, T., Bogart, C., “Genetic algorithms and neural
networks: Optimizing connections and connectivity,” Parallel Computing, 14(3),
1990, p. 347-361.
[35]. Engel, J., “Teaching feed-forward neural networks by simulated annealing,”
Complex Systems, 2(6), 1988, p. 641-648.
[36]. Vapnik, V. N., “The nature of statistical learning theory,” Springer-Verlag, NY,
1995.
[37]. Dimopoulos, Y., Bourret, P., Lek, S., “Use of some sensitivity criteria for
choosing networks with good generalization ability,” Neural Processing Letters
Vol. 2, 1995, p. 1-4.
[38]. Dimopoulos, I., Chronopoulos, J., Chronopoulou Sereli, A., Lek, S., “Neural
network models to study relationships between lead concentration in grasses and
permanent urban descriptors in Athens city (Greece).” Ecological Modelling,
Paper no. 120, 1999, p. 157-165.
[39]. Scardi, M., Harding, L.W., “Developing an empirical model of phytoplankton
primary production: a neural network case study,” Ecological Modelling, 120
(2-3), 1999, p. 213-223.
[40]. Yao, J., Teng, N., Poh, H.L., Tan, C.L., “Forecasting and analysis of marketing
data using neural networks,” Journal of Information Science and Engineering
14, 1998, p. 843-862.
[41]. Lek, S., Belaud, A., Dimopoulos, I., Lauga, J., Moreau, J., “Improved
estimation, using neural networks, of the food consumption of fish populations,”
Marine Freshwater Research, 46, 1995, p. 1229-1236.
[42]. Lek, S., Belaud, A., Baran, P., Dimopoulos, I., Delacoste, M., “Role of some
environmental variables in trout abundance models using neural networks,”
Aquatic Living Resources, 9, 1996a, p. 23-29.
[43]. Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S.,
“Application of neural networks to modelling nonlinear relationships in
ecology,” Ecological Modelling, 90, 1996b, p. 39-52.
[44]. Mastrorillo, S., Lek, S., Dauba, F., “Predicting the abundance of minnow
Phoxinus phoxinus (Cyprinidae) in the River Ariege (France) using artificial
neural networks,” Aquat. Living Resour, 10, 1997a, p. 169–176.
[45]. Mastrorillo, S., Dauba, F., Oberdorff, T., Gue´gan, J.F., Lek, S., “Predicting
local fish species richness in the Garonne River basin,” C.R. Acad. Sci, Sciences
de la vie Paris, 321, 1998, p. 423–428.
[46]. Lek-Ang, S., Deharveng, L., Lek, S., “Predictive models of collembolan
diversity and abundance in a riparian habitat,” Ecological Modelling, 120, 1999,
p. 247–260.
[47]. Spitz, F., Lek, S., “Environmental impact prediction using neural network
modeling. An example in wildlife damage.” Journal of applied ecology, 36,
1999, p. 317–326.
[48]. Olden, J.D., “An artificial neural network approach for studying phytoplankton
succession.” Hydrobiology, 436, 2000, p. 131–143.
[49]. Garson, G.D., “Interpreting neural network connection weights,” Artificial
Intelligence Expert, 6, 1991, p. 47-51.
[50]. Goh, A.T.C., “Back-propagation neural networks for modeling complex
systems,” Artificial Intelligence in Engineering, 9, 1995, p. 143-151.
[51]. Aoki, I., Komatsu, T., “Analysis and prediction of the fluctuation of sardine
abundance using a neural network,” Oceanol. Acta, 20, 1999, p. 81–88.
[52]. Chen, D.G., Ware, D.M., “A neural network model for forecasting fish stock
recruitment,” Can. J. Fish, Aquat. Sci, Vol. 56, 1999, p. 2385–2396.
[53]. Özesmi, S. L., U. Özesmi, “An artificial neural network approach to spatial
habitat modelling with interspecific interaction,” Ecological Modelling, 116,
1999, p. 15–31.
[54]. Tickle, A., Andrews, R., Golea, M., Diederich, J., “The truth will come to light:
Directions and challenges in extracting the knowledge embedded within trained
artificial neural networks,” IEEE Transactions on Neural Networks, 9(6), 1998,
p. 1057-1068.
[55]. Craven, M., Shavlik, J., “Rule extraction: where do we go from here?”
University of Wisconsin Machine Learning Research Group working paper, 99-
1, 1999.
[56]. Gallant, S. I., “Connectionist expert systems,” Communications of the ACM, 31,
1988, p. 152-169.
[57]. Andrews, R., Diederich, J., Tickle, A. B., “Survey and critique of techniques for
extracting rules from trained artificial neural networks,” Knowledge Based
Systems, 8, 1995, p. 373-389.
[58]. Fu, L. M., “Rule learning by searching on adapted nets,” Proceedings of the
Ninth National Conference on Artificial Intelligence, AAAI Press, Anaheim,
CA, 1991, p. 590-595.
[59]. Towell, G. G., Shavlik, J. W., “Extracting refined rules from knowledge-based
neural networks,” Machine Learning, 13, 1993, p. 71-101.
[60]. Setiono, R., “Extracting rules from neural networks by pruning and hidden-unit
splitting,” Neural Computation, 9, 1997, p. 205-225.
[61]. Andrews, R., Geva, S., “Rule extraction from a constrained error back
propagation MLP,” Proceedings of Fifth Australian Conference on Neural
Networks, Brisbane, Queensland, 1994, p. 9-12.
[62]. Saito, K., Nakano, R., “Rule extraction from facts and neural networks,”
Proceedings of the International Neural Network Conference, San Diego, CA,
1990, p. 379-382.
[63]. Craven, M. W., Shavlik, J. W., “Using sampling and queries to extract rules
from trained neural networks,” Proceedings of the Eleventh International
Conference on Machine Learning, Morgan Kaufmann, New Brunswick, NJ,
1994, p. 37-45.
[64]. Thrun, S. B., “Extracting provably correct rules from artificial neural networks,”
Technical Report IAI-TR-93-5, University of Bonn, Bonn, Germany, 1993.
[65]. Pop, E., Hayward, R., Diederich, J., “RULENEG: Rule extraction from neural
networks by step-wise negation,” Technical report, Queensland University of
Technology, Neurocomputing Research Centre, 1994.
[66]. Craven, M.W., Shavlik, J. W., “Extracting tree-structured representations of
trained networks,” Advances in Neural Information Processing Systems, 8, 1996,
p. 24-30.
[67]. Schmitz, G. P. J., Aldrich, C., Gouws, F. S., “ANN-DT: An algorithm for
extraction of decision trees from artificial neural networks,” IEEE Transactions
on Neural Networks, Vol. 10(6), 1999, p. 1392-1401.
[68]. Boz, O., “Converting a trained neural network to a decision tree,” Proceedings
of the 2002 International Conference on Machine Learning and Applications
(ICMLA), Las Vegas, NV, CSREA Press, 2002, p. 110-116.
[110]. http://www.ics.uci.edu/~mlearn/MLlist/v7/20.html
[111]. http://sti.srs.gov/fulltext/ms9800653/ms9800653.pdf
[112]. http://www.nd.com/genetics/selection.html