
KNOWLEDGE BASED APPROACH USING NEURAL NETWORKS FOR

PREDICTING CORROSION RATE

A thesis presented to

the faculty of

the Fritz J. and Dolores H. Russ

College of Engineering and Technology of Ohio University

In partial fulfillment

of the requirements for the degree

Master of Science

Vishal V. Ghai

March 2006
This thesis entitled

KNOWLEDGE BASED APPROACH USING NEURAL NETWORKS FOR

PREDICTING CORROSION RATE

by

Vishal V. Ghai

has been approved for

the Department of Industrial and Manufacturing Systems Engineering

and the Russ College of Engineering and Technology by

Gary R. Weckman

Associate Professor of Industrial and Manufacturing Systems Engineering

Dennis Irwin

Dean, Russ College of Engineering and Technology


ABSTRACT
VISHAL V. GHAI. M.S. March 2006. Industrial and Manufacturing Systems

Engineering

Knowledge Based Approach Using Neural Networks For Predicting Corrosion Rate

(135 pp.)

Director of Thesis: Gary R. Weckman

A number of CO2 corrosion models for the oil and gas industry exist. However, these

models lag significantly behind the needs of the industry. There is still a large knowledge

gap between actual processes occurring in the field and the current mechanistic and

empirical models of CO2 corrosion. The complexity of the underlying physico-chemical

phenomena is often such that our understanding is significantly lower than the level

required for the mechanistic modeling. There is a need to develop a model that would

have both the capability to predict the CO2 corrosion rate with high accuracy, as well as

provide knowledge that would aid the understanding of the phenomena. This thesis

focuses on the development of an Artificial Neural Network model based on CO2 field

data used in predicting the corrosion rate of carbon steel. Further, rules are extracted

from the trained network using a TREPAN decision tree algorithm to translate the

hypothesis learnt into symbolic form. Network model performance is then evaluated by

comparing it to a linear regression model using MINITAB. The efficacy of the rule set is
then compared to the C4.5 machine learning algorithm. The interrelationship of input

variables is discussed based on the constructed network model and the generated rule set.

Approved:

Gary R. Weckman

Associate Professor of Industrial and Manufacturing Systems Engineering



TABLE OF CONTENTS

Abstract
Table of Contents
List of Figures
List of Tables
CHAPTER 1. Introduction
1. 1 CO2 Corrosion: Theoretical Background
1. 2 Corrosion in the Oil and Gas Industry
1. 3 Importance of Crude Oil in CO2 Corrosion of Carbon Steel
1. 4 Previous Research in the Corrosion Field
1. 5 Current Research
1. 6 Thesis Structure
CHAPTER 2. Soft Computing Methodologies
2. 1 What is Soft Computing?
2. 2 Genetic Algorithms
2. 2. 1 Methodology of Genetic Algorithms
2. 3 Machine Learning
2. 3. 1 Decision Tree Induction
2. 3. 2 Attribute-Oriented Induction
2. 4 Artificial Neural Networks
2. 4. 1 Neural Computation
2. 4. 2 The Multi-Layer Perceptron
2. 4. 3 Neural Network Training
2. 4. 4 Generalization Consideration
2. 5 Knowledge Extraction From Artificial Neural Networks
2. 5. 1 Partial Derivative Method
2. 5. 2 Perturb Method
2. 5. 3 Sensitivity Analysis
2. 5. 4 Garson’s Algorithm
2. 5. 5 Network Interpretation Diagram (NID)
2. 6 Rule Extraction in Neural Networks
2. 6. 1 The Rule Extraction Task
2. 6. 2 Approaches to Rule Extraction
2. 6. 3 Validity Interval Analysis
2. 6. 4 Extraction of Decision Tree Representations
2. 6. 5 Trepan Algorithm
CHAPTER 3. Approaches to Corrosion Prediction Problem
3. 1 Review of Approaches to Solving Corrosion Prediction Problem
3. 2 Mechanistic Models
3. 3 Semi-Empirical Models
3. 4 Empirical Models
3. 5 Neural Network Models
CHAPTER 4. Methodology
4. 1 Corrosion Tests
4. 2 Development of the Neural Network Model
4. 2. 1 Training, Cross-validation and Test Datasets
4. 2. 2 Network Architecture, Training Algorithm and Learning Parameters
4. 2. 3 Genetic Optimization of Network Parameters
4. 2. 4 Termination Criteria
4. 3 Sensitivity Analysis
4. 4 Network Interpretation Diagram (NID)
4. 5 Garson’s Algorithm
4. 6 Trepan Algorithm
CHAPTER 5. Results, Comparison and Discussion
5. 1 Accuracy of Selected Model
5. 2 Sensitivity Analysis Results
5. 3 Network Interpretation Diagram (NID) Results
5. 4 Results of Garson’s Algorithm
5. 5 Results of Trepan Algorithm
5. 5. 1 Efficacy of the Rule Extraction Task
5. 6 Comparison Methodologies
5. 6. 1 Statistical Analysis
5. 6. 2 C4.5 Decision Tree Using WEKA
CHAPTER 6. Conclusion and Future Research
6. 1 Conclusions
6. 2 Future Research
References
LIST OF FIGURES

Figure 2.1 Decision Tree Representation of PlayTennis Concept
Figure 2.2 Scheme of Attribute-Oriented Induction
Figure 2.3 Computation at a Node
Figure 2.4 Information Flow for Training Phase
Figure 2.5 A Multi-Layer Perceptron
Figure 2.6 Logistic and Hyperbolic Tangent Transfer Functions
Figure 2.7 Cross Validation for Termination
Figure 2.8 Network Diagram
Figure 2.9 Network Interpretation Diagram (NID)
Figure 2.10 Schematic Representation of the ANN-DT Algorithm
Figure 2.11 The TREPAN Algorithm
Figure 4.1 Sensitivity Analysis About the Mean
Figure 4.2 Separate Sensitivity for % Crude Oil
Figure 4.3 Separate Sensitivity for Nickel (Ni)
Figure 4.4 Separate Sensitivity for API
Figure 4.5 Separate Sensitivity for Total Nitrogen
Figure 4.6 Separate Sensitivity for Vanadium (V)
Figure 4.7 Separate Sensitivity for S %
Figure 4.8 Separate Sensitivity for Total Acid Number (TAN)
Figure 4.9 Separate Sensitivity for Saturates
Figure 4.10 Separate Sensitivity for Aromatics
Figure 4.11 Separate Sensitivity for Resins
Figure 4.12 Separate Sensitivity for Asphaltenes
Figure 4.13 Cumulative Sensitivity Graph
Figure 4.14 Sensitivity About the Mean for 1% Crude Oil Concentration
Figure 4.15 Sensitivity About the Mean for 20% Crude Oil Concentration
Figure 4.16 Sensitivity About the Mean for 50% Crude Oil Concentration
Figure 4.17 Sensitivity About the Mean for 80% Crude Oil Concentration
Figure 4.18 Sensitivity About the Mean for 1% and 20% Combined
Figure 4.19 Sensitivity About the Mean for 50% and 80% Combined
Figure 4.20 Relationship Between % Crude Oil and Ni at Constant Inhibition Output
Figure 4.21 Relationship Between % Crude Oil and TAN at Constant Inhibition Output
Figure 4.22 Relationship Between API and Aromatics at Constant Inhibition Output
Figure 4.23 Relationship Curves Between % Crude Oil, Ni and TAN at Constant Inhibition
Figure 4.24 NID for 11-6-6-1 MLP Network
Figure 4.25 Results of Garson’s Algorithm Showing Relative Importance of Input Variables
Figure 4.26 Partially Expanded View of Disjunctive_Trepan_tree Extracted From the UNEVEN Class Data
Figure 4.27 Partially Expanded View of Trepan_tree Extracted From the UNEVEN Class Data
Figure 5.1 Model Performance on Training vs. Test Data
LIST OF TABLES

Table 2.1 Training Set for the PlayTennis Concept
Table 2.2 Matrix Showing Connection Weights
Table 2.3 Matrix Showing Contribution of Each Input Neuron
Table 2.4 Relative and Sum of Input Neuron Contributions
Table 2.5 Relative Importance of Input Variables
Table 4.1 Evaluation of Different Neural Network Model Architectures
Table 4.2 Learning Parameters and the Termination Criteria
Table 4.3 Uneven Class Ranges
Table 4.4 Even Class Ranges
Table 4.5 Performance Statistics for Decision Trees
Table 4.6 Rules Extracted From the Disjunctive_Trepan_tree
Table 5.1 Prediction Accuracy of the 11-6-6-1 MLP Network
Table 5.2 Number of Features in the Rule Antecedent for the NN-Rule Set
Table 5.3 Results of Multiple Regression Analysis
Table 5.4 Results of C4.5 Algorithm
Table 5.5 Comparative Summary



CHAPTER 1. INTRODUCTION

1. 1 CO2 CORROSION: THEORETICAL BACKGROUND

Corrosion of carbon steel in the presence of CO2 involves an electrochemical

process where iron is dissolved at the anode and hydrogen is evolved at the cathode

[103]. The chemical reaction is:

Fe + CO2 + H2O → FeCO3 + H2    (1.1)

This chemical reaction results in formation of solid FeCO3 films. Depending on the

conditions during formation, these films can be protective or non-protective. One of the

most important individual reactions is the anodic dissolution of iron:

Fe → Fe2+ + 2e−    (1.2)

The presence of CO2 acts as a catalyst increasing the hydrogen evolution, thereby

increasing the corrosion rate of carbon steel in aqueous solution. Even at pH > 5 the

hydrogen evolution increases in the presence of H2CO3. Some researchers in their

work [1] and [2] assume that H2CO3 either serves as an extra source of H+ ions or is

reduced directly. It has also been assumed in [2] and [3] that both these reactions are

independent of each other and the total cathodic current is the aggregate of the current

produced by the two reactions:

2H+ + 2e− → H2    (1.3)

2H2CO3 + 2e− → H2 + 2HCO3−    (1.4)

For more details on CO2 corrosion, refer to a number of publications covering this

field [1]-[8]. Particular attention is drawn to the recent reviews of the main design

considerations [9] and prediction techniques related to CO2 corrosion [10] compiled by

the European Federation of Corrosion.

1. 2 CORROSION IN THE OIL AND GAS INDUSTRY

The majority of the oil and gas pipelines are made of carbon steel. Pipelines, like

other structures in nature, deteriorate over time. This deterioration in metallic pipeline

usually occurs as a result of the damaging effects of the surrounding environment. For

carbon steel, one of the most dominant forms of such deterioration is corrosion. The

corrosion problem is a major concern and becomes critical as a pipeline ages. Pipeline

operators throughout the world are confronted with the expensive and risky task of

operating aged pipelines because of corrosion and its potential damaging effects. The

major effect of corrosion is the loss of metal cross-section. This results in a reduction of

the pipeline’s carrying capacity and its safety. For a pipeline carrying live corrosion

defects, the major concern for the operator is the need to have a simple and quick

technique which can be used to analyze the rate of corrosion when a particular type of oil

is flowing through the pipeline. This information can be used to evaluate the pipeline’s

current reliability, and the time-dependent changes in it. This would help in determining

the effective safe-life of the pipeline, and an estimate of the time when the pipeline needs

to be replaced. Replacing a pipeline in the oil and gas industry is a very time-consuming and expensive procedure.

1. 3 IMPORTANCE OF CRUDE OIL IN CO2 CORROSION OF CARBON STEEL

The role of crude oil in CO2 corrosion has gained special attention in the last few

years due to its significance when predicting or modeling corrosion rates. Modeling the

effect of crude oil in CO2 corrosion is not an easy task. Though many researchers have

worked in the area, the complexity and variation in the constituents of different crude oils make it difficult to model their effects (properties such as wettability and corrosivity) on carbon steel.

Efird [11] stressed the importance of testing the effect of specific crude oils and

including this in corrosion prediction and testing. He also introduced the definition of

Corrosion Rate Break as the level of produced water in crude oil production where

corrosion is accelerated and becomes a problem. Smart [12] in 1993 presented his work

relating petrophysical and wettability properties to the corrosion. He indicated in his

work that crude oils have surface active compounds (polar compounds containing

oxygen, nitrogen and sulfur) that strongly affect the wettability properties of brines.

Hernández et al. [13] provided insight into the variables in crude oil composition that could play a major role in the inhibition offered by crude oils.

1. 4 PREVIOUS RESEARCH IN THE CORROSION FIELD

For years, researchers have presented various approaches detailing the process of

corrosion. The task of corrosion prediction has been identified as a key approach in

utilizing the knowledge of the corrosion process and applying it to industrial corrosion

related problems. Many corrosion models have been developed over the years. These

models can be grouped into three main categories: empirical, semi-empirical and

mechanistic models, based on how firmly they are grounded in theory. These models

predict the corrosion rate with sufficient accuracy, but provide little insight into the corrosion process. It is also important to note that some of these

models are so complex that one needs a thorough understanding of the thermodynamic

and electro-chemical processes occurring during corrosion. The everyday

industrial application calls for a corrosion model which is relatively easy to use, has high

predictive accuracy, provides insight regarding the modeling process, helps in

understanding the interrelationship between variables affecting the corrosion rate, and

can be interpreted without the need for extensive chemical and thermodynamic knowledge of the corrosion process.

The success of a good model is based primarily on the consistency of a good data

set [14]. Corrosion data is generally expensive to produce, and large corrosion data sets

with sufficient consistency are difficult to find. The poor quality of the data may be due

to one or more limitations in the recorded data:



• Errors in the data arising from poor experiment design, faulty equipment or

miscalculations.

• Failure to measure, control or report significant variables. For any work

concerned with the effect of alloy composition on corrosion behavior, the

reporting of nominal, rather than actual, compositions introduces a significant

uncertainty, and degrades the value of the result.

• Failure to report or control environmental variables such as flow rate, oxygen

concentration or temperature restricts the value of the data recorded.

• The summarization of data, e.g. by plotting lines without the data points on which

the lines are based, seriously limits the use of such data for further analysis.

The present empirical, semi-empirical and mechanistic models lack high accuracy

mainly due to their inability to model the corrosion process in the absence of large

consistent data sets. This leads to the necessity of developing a more robust model which

is able to predict the corrosion rate with high accuracy even in the presence of a limited

noisy data set.



1. 5 CURRENT RESEARCH

The desired improvements identified in the previous research are:

• Developing a robust prediction model, capable of handling limited noisy data with

high accuracy.

• A model which is relatively easy to use and provides an understanding of the

corrosion process by providing knowledge regarding the interrelationship between

variables affecting the corrosion rate.

• A model which can serve as a hybrid to the current mechanistic and empirical

models by providing knowledge in the form of rules.

The current research uses Artificial Neural Networks (ANNs) as an Artificial

Intelligence approach for modeling the corrosion rate. ANNs are being recognized as a

powerful and general technique for machine learning because of their non-linear

modeling abilities. Further, their distributed architecture is more robust in handling the

noise-ridden data. The hypothesis or model learned by the neural network is not explicitly

stated, but is implicitly enumerated in the network architecture. However, ANNs can be

made to yield comprehensible models by using rule extraction procedures.

This thesis has five main objectives:

1. To construct an ANN which can be used to predict the corrosion rate in carbon

steel.

2. To optimize network parameters using a genetic algorithm to create a model which would have high accuracy.

3. To understand the interrelationship between the input variables and their impact on the response using knowledge extraction methods such as Sensitivity

Analysis, Network Interpretation Diagram and Garson’s Algorithm.

4. To capture the embedded knowledge by extracting symbolic rules and decision

trees by using appropriate ANN rule extraction algorithms.

5. To compare prediction accuracy of the trained ANN model with the extracted

rule set and other machine learning algorithms.

1. 6 THESIS STRUCTURE

The thesis has been organized into six chapters as follows: Chapter 1 introduces the corrosion problem and explains the motivation for the current research. Chapter 2

provides background material for the various soft computing methods utilized in this

thesis. Chapter 3 undertakes a survey of the various approaches to solve the corrosion

prediction problem. Chapter 4 presents the methods and tools used in this work to

achieve the research objectives and focuses on the development of the neural network

model and implementation of rule extraction methods. Chapter 5 discusses the results of

the various techniques employed for the analysis. This chapter also describes in brief the

statistical and machine learning methodology used for analysis and comparison of the

results of the current research. Chapter 6 provides conclusions and suggestions for future

research.

CHAPTER 2. SOFT COMPUTING METHODOLOGIES

2. 1 WHAT IS SOFT COMPUTING?

Soft Computing (SC) refers to the evolving collection of methodologies used to

build intelligent systems exhibiting human-like reasoning and capable of tackling

uncertainty. The adoption of this approach has led to the development of systems that

have high MIQ (Machine Intelligence Quotient) [15]. SC-methodologies have proven to

be more successful than classical modeling, reasoning and search techniques in a wide

variety of problem domains. The characteristics of problems for which traditional

analytical approaches have proven deficient are:

• Modeling difficulties: Generally, real world problems are poorly defined and

information is empirically available as input-output patterns representing

instances of the problem’s behavior. Precise and accurate mathematical models

for such problems are either unavailable or restrictively expensive to build.

Further, such models exhibit non-linear behavior for which traditional

mathematical modeling tools are of limited utility.

• Large-scale solution spaces: Problems with large-scale solution spaces are usually

intractable with deterministic search techniques. The computational time and

effort is huge, and deterministic search does not employ mechanisms for

successfully navigating through local optima.



• Knowledge Acquisition: Expert knowledge in a problem domain is often fuzzy,

consisting of imprecise declarations, partial truths and approximations. Hence,

crisp classifications and unambiguous definitions are not always possible. Also, in

some cases, there is a need to directly acquire knowledge from problem data

without human intervention.

Soft computing consists of a suite of approaches capable of exploiting the above

described problem characteristics to yield tractable and robust intelligent systems at low

solution cost. The discipline of SC encompasses several paradigms such as fuzzy set

theory, neural networks, approximate reasoning, and stochastic optimization methods like

genetic algorithms, simulated annealing and machine learning techniques. SC unites these

complementary approaches into a cohesive structure, providing a scaffold for the

construction of innovative hybrid intelligent systems. The key strengths of the constituent

approaches are as follows:

• Fuzzy Set Theory allows for imprecise knowledge representation in the form of

fuzzy if-then rules.

• Neural Networks exhibit learning and adaptive behavior with non-linear modeling

capabilities.

• Genetic Algorithms provide systematic global search of solution space and are

capable of evolving better candidate solutions starting with random initial

solutions.

• Machine Learning methods are important for automated knowledge acquisition.



The above capabilities have allowed SC-approaches to successfully confront

several real world problems in robotics, space flight, process control, production and

aerospace applications [17]. The remaining sections of this chapter provide the necessary

background material for the SC-methodologies utilized in the current research effort.

2. 2 GENETIC ALGORITHMS

Genetic Algorithms (GAs), first proposed by John Holland [18], are stochastic

search techniques used for optimization problems. The basic methodology is rooted in the

principles of natural genetics and evolutionary processes. As general-purpose

optimization tools, GAs offer several unique advantages over conventional optimization

techniques. They combine elements of directed and stochastic search methods, striking a

good balance between the exploration and exploitation of the solution space [19]. One of

the unique characteristics of GAs is that functional derivative or gradient information is

not required for determining the search direction in these algorithms. This characteristic

makes them a flexible tool for optimizing a large number of objective functions, which

are either not differentiable or whose gradient calculation is computationally expensive.

Genetic Algorithms work with solution populations rather than single members, making

them capable of yielding multiple solutions of high quality.



2. 2. 1 METHODOLOGY OF GENETIC ALGORITHMS

The solution or search space contains all feasible solutions. Each point in this

space, called a chromosome, has an associated fitness value that usually equals the

objective function evaluated at that point. A GA maintains a population of chromosomes,

which is repeatedly evolved over generations towards better fitness. The next generation

is created from the current population by using genetic operators like crossover and

mutation. The chromosomes with a higher fitness value survive and participate in the

creation of new populations. This ensures that successful chromosomes pass their good

genes to the next generation. The population continuously evolves toward better fitness,

and the algorithm converges to the best chromosome after several generations.
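To make this loop concrete, the following minimal Python sketch evolves a population of bit-string chromosomes with tournament selection, single-point crossover and bit-flip mutation. The fitness function (a simple count of ones) and all parameter values are illustrative assumptions only, not settings used elsewhere in this thesis.

    import random

    def fitness(chromosome):
        # Illustrative objective: number of ones in the bit string ("OneMax")
        return sum(chromosome)

    def evolve(pop_size=20, n_genes=16, generations=50,
               crossover_rate=0.8, mutation_rate=0.02):
        # Random initial population of candidate solutions (chromosomes)
        population = [[random.randint(0, 1) for _ in range(n_genes)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            next_gen = []
            while len(next_gen) < pop_size:
                # Fitness-biased selection: best of a random sample of three
                p1 = max(random.sample(population, 3), key=fitness)
                p2 = max(random.sample(population, 3), key=fitness)
                child = list(p1)
                if random.random() < crossover_rate:
                    cut = random.randrange(1, n_genes)   # single-point crossover
                    child = p1[:cut] + p2[cut:]
                for i in range(n_genes):                 # bit-flip mutation
                    if random.random() < mutation_rate:
                        child[i] = 1 - child[i]
                next_gen.append(child)
            population = next_gen                        # next generation
        return max(population, key=fitness)

    best = evolve()
    print(best, fitness(best))

After several generations the population converges toward high-fitness chromosomes, illustrating how successful chromosomes pass their genes forward without any gradient information.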

2. 3 MACHINE LEARNING

The ability to learn from examples and construct a model of the world is the

foundation of biological intelligence. This model of the world, often implicit, allows us to

adapt to a dynamically changing environment and is necessary for our survival. Artificial

Intelligence (AI) aims at constructing artifacts (machines, programs) that have the

capability to learn, adapt and exhibit human-like intelligence. Hence, learning is the key

for practical applications of AI. The field of machine learning is the study of methods for

programming computers to learn [20]. Many important algorithms have been developed

and successfully applied to diverse learning tasks such as speech recognition, game

playing, medical diagnosis, financial forecasting and industrial control [21].



A learning system is given a set of examples encoded in a machine-readable

format, referred to as a training set, or input. The system then generates a model or

hypothesis to perform a task of interest (pattern recognition, classification, prediction,

etc). The model is evaluated based on its ability to correctly generalize with examples not

used for training purposes. Hence, predictive accuracy or generalization is an important

criterion in evaluating alternate machine learning schemes. An accurate model allows us

to gain insight into the problem domain, laying a special emphasis on the criterion of

comprehensibility. It refers to the ease of understanding the model by a human user and

serves the purpose of validation, knowledge discovery and refinement. Fayyad et al. [22]

assert that inductive learning with a focus on comprehensibility is a central activity in the developing area of knowledge discovery in databases and data mining.

Although a wide choice of machine learning schemes are available, they differ

significantly in terms of predictive accuracy, comprehensibility and ease of

implementation across different problem domains. The selection of a suitable learning

algorithm for a specific problem is based on considerable experimentation, using

different learning algorithms and evaluating the induced model in terms of predictive

accuracy, comprehensibility, and other possible criteria.

The following sub-sections present two important symbolic machine learning

methods, decision tree induction and attribute oriented induction. Neural networks, the

main class of non-symbolic machine learning tools used in this research are covered in a

later section in this chapter.



2. 3. 1 DECISION TREE INDUCTION

Decision trees are among the most popular symbolic machine learning algorithms.

They express the learned hypothesis or target function using a unique representation

format known as a decision tree. Decision trees can easily be compiled into simple if-then

rules for improving human comprehensibility. They have been applied in a number of domains, from diagnosing medical cases to assessing the credit risk of loan applicants [23]. The example shown in Table 2.1 [25] depicts the training data for the target concept, PlayTennis. The decision tree learned from this data classifies Saturday mornings as suitable or unsuitable for playing a game of tennis.



Table 2.1 Training Set for the PlayTennis Concept

Outlook Temperature Humidity Wind PlayTennis


Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No

Each internal node in the tree specifies a logical test based on some

attribute or feature in the problem. The tree has a root node, outlook, having three

possible attribute values: sunny, overcast and rain. Each of the outgoing branches from a

node corresponds to one of the possible values of the attribute. Hence, the root node has

three branches. A tree also has a set of leaf nodes, which represent the outcome of the

classifier (i.e., decision to play tennis or not). The classification of an instance involves

traversal through the tree starting at the root node until a leaf node is encountered. The

example instance (outlook = sunny, humidity = high) would follow the leftmost branch of

the depicted tree in Figure 2.1 [25].


Outlook
|-- Sunny --> Humidity
|      |-- High --> No
|      |-- Normal --> Yes
|-- Overcast --> Yes
|-- Rain --> Wind
       |-- Strong --> No
       |-- Weak --> Yes

Figure 2.1 Decision Tree Representation of PlayTennis Concept. Adapted From Quinlan [23].

For this instance the tree predicts PlayTennis = No, indicating weather conditions unsuitable for playing tennis. Also, the attribute temperature was not utilized in constructing the tree, indicating its insignificance in the decision-making process.

Many decision tree induction methods have been developed in the last two decades

with different capabilities and requirements. The ID3 algorithm is the core algorithm on

which many variants have been developed. The algorithm constructs a decision tree in a

top-down fashion by recursively partitioning the instances at each node. The

determination of an attribute for partitioning the instance space is an important aspect in

decision tree induction. ID3 uses a statistical property called information gain to select

among candidate attributes in constructing a tree. Information gain provides a criterion

that measures the effectiveness of an attribute in classifying the training instances.

Let S denote the set of training instances. The information gain, InfoGain(T), obtained by choosing an attribute T for splitting the set S, is given in [24] as:

InfoGain(T) = info(S) − info_T(S)    (2.1)

In the above equation, info(S) is the amount of information needed to classify an

instance in S, and info_T(S) is the corresponding measure after partitioning the set S based

on attribute T. If the set has k possible partitions for k classes, the information content is:

info(S) = -\sum_{j=1}^{k} \frac{freq(C_j, S)}{|S|} \log_2\!\left(\frac{freq(C_j, S)}{|S|}\right)    (2.2)

Here freq(Cj, S) is the number of examples of class Cj and j ranges over k classes.

Given a partition based on attribute T, the expected value of information over the induced

n subsets is given by:

info_T(S) = \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, info(S_i)    (2.3)

In the above expression, S_i is the subset of examples in S having the ith outcome, and i ranges over the n subsets.
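As a worked illustration of equations 2.1-2.3, the following Python sketch computes info(S) and the information gain of a chosen attribute; the dictionary encoding of the training instances is an assumption made for this example.

    from math import log2
    from collections import Counter

    def info(labels):
        # Equation 2.2: information needed to classify an instance in S
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, labels, attribute):
        # Equations 2.1 and 2.3: partition S on the attribute, then compare
        n = len(labels)
        partitions = {}
        for row, label in zip(rows, labels):
            partitions.setdefault(row[attribute], []).append(label)
        info_t = sum(len(subset) / n * info(subset)
                     for subset in partitions.values())
        return info(labels) - info_t

    # Two PlayTennis instances from Table 2.1, encoded for illustration
    rows = [{"Outlook": "Sunny", "Wind": "Weak"},
            {"Outlook": "Overcast", "Wind": "Weak"}]
    labels = ["No", "Yes"]
    print(info_gain(rows, labels, "Outlook"))   # 1.0: Outlook separates the classes
    print(info_gain(rows, labels, "Wind"))      # 0.0: Wind provides no information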

The procedure of selecting a splitting attribute and partitioning the training instance

set is recursively done for each internal node. Only the examples, which reach that node

(i.e., the examples that satisfy logical tests on the path), are used in attribute selection.

This process continues until either of these two criteria is met:

1. Every available attribute has been included in the tree path

2. The training instances at a given node belong to the same class. If so, the node is

labeled as a leaf node.

The major flaw with the ID3 algorithm based on the gain criterion is that it has a

strong bias towards tests with many outcomes [104]. Let us consider a hypothetical

medical diagnosis task in which one of the attributes contains a patient identification.

Since every such identification is intended to be unique, partitioning any of the training

cases on the values of this attribute will lead to a large number of subsets, each

containing just one case. Since all of these one-case subsets necessarily contain cases of a

single class, info_T(S) = 0, so information gain from using this attribute to partition the set

of training cases is maximal. From the point of view of prediction, however, such a

division is quite useless.

The bias inherent in the gain criterion was later rectified in the C4.5 algorithm [25]

by employing the gain ratio. Let us take the scenario where the information about a case

indicates the outcome of a test rather than the class information to which the case

belongs. By analogy with the definition of info(S), we have

split\ info(S) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2\!\left(\frac{|T_i|}{|T|}\right)    (2.4)

According to Klinkenberg [105], equation 2.4 represents the information generated by dividing T into n subsets, whereas info(S) provides knowledge about the classes produced by the division. Equation 2.5 then defines the gain ratio, which normalizes the gain by this split information [104].

gain ratio(S) = gain(S) ÷ split info(S)    (2.5)
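Continuing the previous sketch, the split information and gain ratio of equations 2.4 and 2.5 follow directly (again purely illustrative, reusing info_gain, Counter and log2 from the sketch above):

    def split_info(rows, attribute):
        # Equation 2.4: information generated by the split itself
        n = len(rows)
        sizes = Counter(row[attribute] for row in rows).values()
        return -sum((s / n) * log2(s / n) for s in sizes)

    def gain_ratio(rows, labels, attribute):
        # Equation 2.5: gain normalized by the split information
        # (note: undefined when the attribute has one value, split info = 0)
        return info_gain(rows, labels, attribute) / split_info(rows, attribute)

Because an attribute with many distinct values (such as a patient identification) also has a large split information, its gain ratio is penalized, counteracting the bias described above.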

2. 3. 2 ATTRIBUTE-ORIENTED INDUCTION

When the training set for learning is provided as a database, the task of inducing

hypotheses describing the data is called data mining. In real world applications, databases

are predominantly used for representing and maintaining information. Often, the

information is enormous, noisy, uncertain, and can involve missing values. A growing

need for knowledge discovery in databases led to the rapid development and adaptation

of special-purpose machine learning techniques suited for databases.

Attribute-oriented induction (AOI) is a technique for mining knowledge from

relational databases by inducing characteristic and classification rules describing the

hypothesis. It is a set-oriented method that generalizes the task-relevant subset of data,

attribute-by-attribute, into a general relation [26]. AOI is a data-driven induction process

capable of generalization to a desired level of abstraction. The method integrates machine

learning concepts like induction, generalization, concept hierarchies and database

operations to discover rules. Figure 2.2 shows the inputs and output of the AOI method.

[Figure 2.2 Scheme of Attribute-Oriented Induction: database queries, a list of attributes and a concept hierarchy are input to attribute-oriented induction, which outputs a generalized relation.]

This technique requires provision of domain knowledge in the form of concept

hierarchies for obtaining generalized relations. Concept hierarchies can be explicitly

given by the experts or automatically generated by data analysis [27]. AOI is capable of

utilizing these concept hierarchies to generate logical rules. The following are some of

the key steps in the induction algorithm:

• Concept Tree Ascension: Generalize the relation by eliminating identical tuples, using a predetermined threshold to control the generalization process.

• Vote Aggregation: The number of identical tuples being merged during the tree

ascension is important for the learning task. A counter is maintained indicating

the number of tuples in the initial relation that are generalized to the current

relation.

• Simplification: The generalized relation is simplified by merging nearly identical tuples (i.e., tuples differing in the value of one attribute).

• Rule Transformation: The final relation obtained is transformed into a logical rule in the format desired by the user.
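A toy Python sketch of the first two steps, with a hypothetical two-attribute relation and concept hierarchy invented purely for illustration: each value is lifted one level up its hierarchy (concept tree ascension) and identical generalized tuples are merged with a vote count (vote aggregation).

    def generalize(tuples, hierarchies):
        # Concept tree ascension: replace each value by its parent concept
        lifted = [tuple(hierarchies[a].get(v, v) for a, v in enumerate(row))
                  for row in tuples]
        # Vote aggregation: merge identical tuples, counting merged votes
        votes = {}
        for row in lifted:
            votes[row] = votes.get(row, 0) + 1
        return votes

    # Hypothetical hierarchy: city -> country for the first attribute only
    hierarchies = [{"Columbus": "USA", "Toronto": "Canada"}, {}]
    data = [("Columbus", "M.S."), ("Toronto", "M.S."), ("Columbus", "M.S.")]
    print(generalize(data, hierarchies))
    # {('USA', 'M.S.'): 2, ('Canada', 'M.S.'): 1}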



The method is capable of generating two types of induction rules: learning

characteristic rules (LCHR) and learning classification rules (LCLR) [28]. Their

induction procedures are similar, differing in the attribute generalization process. Further

details about the AOI methodology can be found in [26] and [28].

2. 4 ARTIFICIAL NEURAL NETWORKS

2. 4. 1 NEURAL COMPUTATION

The motivation for the early development of neural networks stemmed from the

desire to mimic the functionality of the human brain. A neural network is an intelligent

data-driven modeling tool that is able to capture and represent complex and non-linear

input/output relationships. Neural networks are used in many important applications, such

as function approximation, pattern recognition and classification, memory recall,

prediction, optimization and noise-filtering. They are used in many commercial products

such as modems, image-processing and recognition systems, speech recognition software,

data mining, knowledge acquisition systems and medical instrumentation, etc [29].

A neural network consists of many layers of nodes. These nodes are linked by

connections, with each connection having an associated weight, Wi. The weight of a

connection is a measure of its strength and its sign is indicative of the excitation or

inhibition potential. Figure 2.3 shows a simple perceptron having n inputs, {X1, X2… Xi…

Xn}.
[Figure 2.3 Computation at a Node: inputs X1, X2, ..., Xn enter over connections with weights W1, W2, ..., Wn; the node applies the transfer function f to the net input formed from the weighted sum and the threshold θ.]

The perceptron has a threshold or bias, θ, which is the value of the net input

required to produce non-zero activation. The net input to a perceptron, neti, is given by:

net_i = ∑ W_i X_i + θ    (2.6)

A transfer function, f, maps the net input to the output O, which is the activation of the perceptron. It is given by:

Output, O = f(net_i)    (2.7)
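Equations 2.6 and 2.7 amount to a weighted sum followed by a transfer function. A minimal Python rendering of the computation at a single node (the sample inputs, weights and bias are arbitrary illustrative values):

    import math

    def node_output(inputs, weights, bias, transfer=math.tanh):
        # Equation 2.6: net input is the weighted sum of inputs plus the bias
        net = sum(w * x for w, x in zip(weights, inputs)) + bias
        # Equation 2.7: the transfer function maps net input to the activation
        return transfer(net)

    print(node_output([0.5, -1.0], [0.8, 0.3], bias=0.1))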

Neural networks have two distinct phases of operation: training and production.

Some design parameters need to be chosen before training the network. These include:

• System architecture or topology: The number of nodes in each layer and

corresponding transfer functions.

• Training Algorithm: The training algorithm and the performance measure or the

cost function.

• Generalization Considerations: The number of epochs or cycles needed to ensure

good generalization and criteria for termination of training phase.

Parameters like weights and biases are modified during the training phase. The

network uses problem data to assign values to these parameters. The distinguishing

characteristic of neural networks is their adaptivity, which requires a unique information flow design, as depicted in Figure 2.4 (adapted from Principe et al. [30]).

The performance feedback loop utilizes a cost function to provide a measure of deviation

between the calculated output and the desired output. This performance feedback is

utilized directly to adapt the parameters, W and θ, so that the system output improves

with respect to the desired goal.


[Figure 2.4 Information Flow for Training Phase: the neural network maps the input to an output; a cost function compares this output with the desired response, and the resulting error drives the training algorithm, which adjusts the parameters (W, θ).]

2. 4. 2 THE MULTI-LAYER PERCEPTRON

A multi-layer perceptron (MLP) consists of a cascade of perceptrons arranged in

layers. A single hidden layer network is illustrated in Figure 2.5. The input layer contains

nodes that represent the inputs of the given problem. Each input is represented by a single

node in the input layer. The hidden layer maps the input to an intermediate space, which

serves as an input region for the output layer.


[Figure 2.5 A Multi-Layer Perceptron: inputs X1-X4 form the input layer, which feeds a hidden layer, which in turn feeds an output layer producing Y1 and Y2.]

The output layer represents the response/output. The output nodes, as shown in Figure 2.5, allow us to determine the response/output from the input variables. MLPs

have been proven to be universal approximators [31], capable of implementing any given

function. This is only possible with the choice of non-linear transfer functions. Two of

the most commonly used functions are the logistic function and hyperbolic tangent

function. The two functions are different in the range of their output values as illustrated

in Figure 2.6 for net input in the range [-4, 4].



Figure 2.6 Logistic and Hyperbolic Tangent Transfer Functions

The logistic function has an output range [0, 1], and the activation of a node, ai is

given by:

a_i = \frac{1}{1 + e^{-\text{net input}}}    (2.8)

The hyperbolic tangent function compresses a unit’s net input into an activation

value in the range [-1, 1]:

a_i = \frac{e^{\text{net input}} - e^{-\text{net input}}}{e^{\text{net input}} + e^{-\text{net input}}}    (2.9)
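Written out in Python, the two transfer functions of equations 2.8 and 2.9 differ only in their output ranges, which a quick sweep over the net input confirms:

    import math

    def logistic(net):
        # Equation 2.8: output range [0, 1]
        return 1.0 / (1.0 + math.exp(-net))

    def hyperbolic_tangent(net):
        # Equation 2.9: output range [-1, 1]
        return ((math.exp(net) - math.exp(-net)) /
                (math.exp(net) + math.exp(-net)))

    for net in (-4, 0, 4):
        print(net, round(logistic(net), 3), round(hyperbolic_tangent(net), 3))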

2. 4. 3 NEURAL NETWORK TRAINING

The training phase in neural networks provides the answer to the following

questions: Is there a set of network parameters (weights and biases) that allows a network

to map a given set of input patterns to the desired outputs? If so, how are the parameters

determined? The most commonly used training algorithm is the backpropagation

algorithm, first discussed by Rumelhart et al.[32]. The term, ‘backpropagation’ refers to

the direction of propagation of error. The training regimen adjusts the weights and biases

of the network to minimize the cost function. Though several cost functions are available,

the function appropriate for prediction problems is the cross-entropy function [33]:

E = -\sum_{p} \sum_{i} \left[ t_{pi} \ln(y_{pi}) + (1 - t_{pi}) \ln(1 - y_{pi}) \right]    (2.10)

In the above equation, E is the cross entropy, p indexes the training patterns and i indexes the classes. The term y_pi is the estimated probability

that an input pattern belongs to class i, and tpi is the target with the range [0, 1]. Network

output is interpreted as the probability that the given input pattern belongs to a certain

class.

The cost function E needs to be minimized, and its derivative with respect to the weights is calculated and denoted by ∂E/∂w. Having obtained the derivative, adjusting

weights is now an optimization problem. Back-propagation uses a form of gradient descent to update weights according to the formula:



Weight Update, \Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}    (2.11)

In this equation, w_ij represents the weight passing from node i to node j, η > 0 represents the learning rate, and ∂E/∂w_ij is the partial derivative of the error E with respect to w_ij. In the initial phase, random weights are assigned to the network, and the training algorithm modifies these weights according to the procedure discussed above.
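In outline, one gradient-descent step over a layer's weight matrix is just equation 2.11 applied element-wise; the following sketch assumes the gradients ∂E/∂w_ij have already been obtained by backpropagating the error:

    def update_weights(weights, gradients, learning_rate=0.1):
        # Equation 2.11: step each weight against its error gradient
        return [[w - learning_rate * g for w, g in zip(w_row, g_row)]
                for w_row, g_row in zip(weights, gradients)]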

Many alternative optimization techniques have been utilized; variations of the basic

method include methods like the conjugate-gradient method, momentum learning, etc.

Stochastic search algorithms such as genetic algorithms have been applied in [34] and [35] to avoid the problem of convergence to local minima.

2. 4. 4 GENERALIZATION CONSIDERATION

The collection of input pattern-desired response pairs used to train the learning

system is called the training set. The testing set contains examples not used for the

training purpose and it is used to evaluate the generalization capabilities of the network.

Vapnik [36] indicates that the performance of a network trained with back-propagation on the training set always improves with the number of training cycles. However, the error on the testing set

initially decreases with the number of cycles, and then increases, as shown in Figure 2.7.
[Figure 2.7 Cross Validation for Termination: prediction error versus number of cycles; the training-set error decreases steadily while the testing-set error first decreases and then rises, and the minimum of the testing-set error marks the stopping point.]

This phenomenon is called overtraining and is indicative of poor generalization

capabilities. One solution to this problem is to split the training set into two sets – the

training set and validation set. After every fixed number of iterations, the error on the

validation set is calculated. Training is terminated when this error starts to increase. This

method is called early stopping or stopping with cross-validation.
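A sketch of the early stopping loop in Python; train_one_epoch, validation_error and snapshot are hypothetical caller-supplied stand-ins for one pass of training, evaluation on the validation set, and copying the current weights:

    def train_with_early_stopping(train_one_epoch, validation_error, snapshot,
                                  max_epochs=1000, check_every=10, patience=5):
        best_error, best_weights, strikes = float("inf"), None, 0
        for epoch in range(max_epochs):
            train_one_epoch()                    # hypothetical training step
            if epoch % check_every == 0:
                error = validation_error()       # hypothetical evaluation
                if error < best_error:
                    # Validation error still falling: keep these weights
                    best_error, best_weights, strikes = error, snapshot(), 0
                else:
                    strikes += 1                 # error rising: count strikes
                    if strikes >= patience:
                        break                    # stopping point reached
        return best_weights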

2. 5 KNOWLEDGE EXTRACTION FROM ARTIFICIAL NEURAL NETWORKS

Once the neural network has been trained on a specific network topology, the next

step in modeling the process using an ANN involves extracting knowledge from the

network. This embedded knowledge is in the form of connection weights. In order to

understand the predictive modeling process it is imperative to analyze these weights and

extract information regarding the contribution of the input variables to the final output. Also, these input variables may not always be independent of each other. The interrelationship

between the input variables significantly affects the contribution towards the final

response/output.

The following sub-sections delineate some of the important methods used to determine the impact of the input variables on the output.

2. 5. 1 PARTIAL DERIVATIVE METHOD

The Partial Derivative Method (PaD) consists of calculating the partial derivatives

of the response variables with respect to the values of the input variables [37] and [38]. Two results can be obtained through this method:

• A profile of the output variations in response to small changes in each input variable.

• A classification of the relative contributions that each variable has towards the

output generated by the network.

A partial derivative of the network output is computed with respect to each input to obtain the profile of the variation of the output with a small change in that input variable [106]. For a network structure consisting of one hidden layer of n_h neurons, n_i inputs and a single output variable (i.e., n_o = 1), the partial derivative of the response variable y_j with respect to input x_i (with j = 1, ..., N, where N is the total number of observations) is:
d_{ji} = S_j \sum_{h=1}^{n_h} w_{ho} \, I_{hj} (1 - I_{hj}) \, w_{ih}    (2.12)

In this equation, S_j denotes the derivative of the output neuron with respect to its input, I_hj is the response of the hth hidden neuron, and w_ho and w_ih are the

weights between the output neuron and hth hidden neuron, and between the ith input

neuron and the hth hidden neuron [106].

A set of graphs of the partial derivatives versus each corresponding input variable

can then be plotted, which would enable a visual representation of the effect that

individual input variable has on the network output. Interpretation of one of these graphs

is that, if the partial derivative is negative then, for this value of the studied variable, the

output variable would tend to decrease, while the input variable increases. Inversely, if

the partial derivatives are positive, the output variable would tend to increase, while the

input variable also increases. The second result of PaD concerns the relative contribution

of the ANN output to the data set with respect to an input. It is calculated by a sum of the

square partial derivatives obtained per input variable:

SSD_i = \sum_{j=1}^{N} (d_{ji})^2    (2.13)

One SSD (Sum of Square Derivatives) value is obtained per input variable. The

SSD values allow classification of the variables according to their increasing contribution

to the output variable in the model. The input variable with the highest SSD value would

influence the output variable the most.
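For the single-hidden-layer case of equation 2.12, the derivatives and the SSD ranking of equation 2.13 can be sketched as follows; logistic hidden units are assumed and bias terms are omitted for brevity, so this is an illustration rather than the exact code used in this work:

    import math

    def pad_ssd(X, S, w_ih, w_ho):
        # X: N observations x n_i inputs; S[j]: output derivative for pattern j
        # w_ih[i][h]: input-to-hidden weights; w_ho[h]: hidden-to-output weights
        n_inputs, n_hidden = len(w_ih), len(w_ho)
        ssd = [0.0] * n_inputs
        for j, x in enumerate(X):
            # Logistic responses I_hj of the hidden neurons (biases omitted)
            net_h = [sum(w_ih[i][h] * x[i] for i in range(n_inputs))
                     for h in range(n_hidden)]
            I = [1.0 / (1.0 + math.exp(-n)) for n in net_h]
            for i in range(n_inputs):
                # Equation 2.12: partial derivative d_ji for this observation
                d_ji = S[j] * sum(w_ho[h] * I[h] * (1 - I[h]) * w_ih[i][h]
                                  for h in range(n_hidden))
                ssd[i] += d_ji ** 2      # Equation 2.13 accumulates SSD_i
        return ssd                        # largest SSD_i = most influential input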



2. 5. 2 PERTURB METHOD

The ‘Perturb’ method corresponds to a perturbation of the input variables [39]. This

method aims to assess the effect of small changes in each input of the neural network

output. The underlying methodology of the algorithms is to adjust the values of a

particular input variable while maintaining the others constant and to record the

corresponding output. The variable with greatest influence on the network output is

considered to be the most significant to the model. The mean square error (MSE) is

expected to increase as a larger amount of noise is added to the selected input variable

[39] and [40]. Classification of the input variables can then be obtained in order of their importance.
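A sketch of the Perturb procedure in Python: Gaussian noise is added to one input at a time while all others are left unchanged, and the inputs are ranked by the resulting increase in MSE (predict is a hypothetical stand-in for the trained network's forward pass):

    import random

    def perturb_ranking(predict, X, y, noise=0.05):
        def mse(rows):
            return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
        base = mse(X)                        # baseline error of the network
        ranking = []
        for i in range(len(X[0])):
            # Perturb input i only, holding every other input constant
            noisy = [r[:i] + [r[i] + random.gauss(0, noise)] + r[i + 1:]
                     for r in X]
            ranking.append((mse(noisy) - base, i))
        # Largest MSE increase first = most significant input variable
        return sorted(ranking, reverse=True)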

2. 5. 3 SENSITIVITY ANALYSIS

Sensitivity analysis [41], [42], [43], [107] extracts the interrelationship between

the input and the output variables of the network. It is significant to gather information

regarding the influence of the input variables on the network response during the training

cycle, as this would provide feedback as to which input channels are most significant.

The input space can then be pruned by removing the insignificant channels, resulting in a reduced network size, thereby simplifying the network complexity and reducing training times.

Sensitivity analysis [108] helps in understanding the influence of each input variable on

the network response generated. The magnitude of one of the input variables is varied

over its entire range. During this process all the other input variables are held constant at

their mean values. The network learning is disabled during this operation in order to

make sure that the network weights are not affected. The methodology of sensitivity

analysis becomes rapidly complex with the increase in the number of input variables for a

given neural network model. The common approach to simplify the process, as described by Olden and Jackson [108], is to compute basic summary statistics such as the min, max and mean values for each of the variables. The network response is

recorded as the value of the variable is varied over the entire range and this provides a

spectrum of output response at different values of a particular input variable. This

information helps in understanding the relative contribution of each input variable and

can be easily summarized into relative importance/contribution plots.
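In code, the procedure reduces to the following sketch: the chosen input is swept over its observed range while the remaining inputs are clamped at their mean values, and the network response is recorded at each probe point (predict is again a hypothetical forward-pass stand-in; no weights are modified):

    def sensitivity_sweep(predict, X, var_index, steps=50):
        # Summary statistics per input variable: mean, min and max
        columns = list(zip(*X))
        means = [sum(c) / len(c) for c in columns]
        lo, hi = min(columns[var_index]), max(columns[var_index])
        responses = []
        for k in range(steps + 1):
            probe = list(means)              # all other inputs held at mean
            probe[var_index] = lo + (hi - lo) * k / steps
            responses.append((probe[var_index], predict(probe)))
        return responses                     # response spectrum over the range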

2. 5. 4 GARSON’S ALGORITHM

As described earlier a neural network processes and provides information in the

form of connection weights. The input from each of the input variables is fed to the network model through weighted connections. The contribution of each of these input variables to the output

mainly depends on the magnitude and direction of the connection weights [48]. As

described by Olden and Jackson [108], a positive connection weight increases the

magnitude of the network output whereas a negative weight inhibits the value of the

response variable. Also, a variable with significantly higher connection weights is

considered to have a greater impact on the network output as compared to the others. The

mapping between the input variables and the predicted response in the case of an MLP is a bi-level process of information flow involving weight transfer from input to

hidden and then from hidden to output layer. An important fact which has been noticed [48] is that when the direction of the connection weight is the same (both positive or both negative) between the input-hidden and the hidden-output layers, it has a positive effect on the network output. With the significant amount of knowledge that could be extracted by studying the flow patterns of connection weights, the next step for researchers was to partition the connection weights so as to establish the relative contribution of each input variable toward the network output. In 1991, Garson [49] formulated an algorithm that produces a percentage breakup of the relative importance of each input variable in a given

network. Further enhancements to this algorithm were later proposed by Goh [50]. An

illustration of the application of Garson’s algorithm in a single hidden layer Feed

Forward MLP with two Processing Elements (PEs) is shown in Figure 2.8.

Figure 2.8 Network Diagram

Step 1: The matrix containing input-hidden-output neuron connection weights is

shown in Table 2.2.

Table 2.2 Matrix Showing Connection Weights

Inputs    Hidden A       Hidden B
Input 1   WA1 = -2.61    WB1 = -1.23
Input 2   WA2 = 0.13     WB2 = -0.91
Input 3   WA3 = -0.69    WB3 = -2.09
Output    WOA = 1.11     WOB = 0.39

Step 2: The contribution of each input neuron to the output via each hidden neuron is calculated as the product of the input-hidden and the hidden-output connection weights, as shown in Table 2.3. For example:

CA1 = WA1 × WOA = -2.61 × 1.11 = -2.90    (2.14)

Table 2.3 Matrix Showing Contribution of Each Input Neuron

Inputs    Hidden A       Hidden B
Input 1   CA1 = -2.90    CB1 = -0.48
Input 2   CA2 = 0.14     CB2 = -0.35
Input 3   CA3 = -0.77    CB3 = -0.82

Step 3: The relative contribution of each input neuron to the outgoing signal of each hidden neuron (e.g., RA1) is computed from the absolute values of the contributions, and these are then summed across hidden neurons for each input (e.g., S1), as shown in Table 2.4:

RA1 = |CA1| / (|CA1| + |CA2| + |CA3|) = 2.90 / (2.90 + 0.14 + 0.77) = 0.76    (2.15)

S1 = RA1 + RB1 = 0.76 + 0.29 = 1.05    (2.16)



Table 2.4 Relative and Sum of Input Neuron Contributions

Inputs    Hidden A      Hidden B      Sum
Input 1   RA1 = 0.76    RB1 = 0.29    S1 = 1.05
Input 2   RA2 = 0.04    RB2 = 0.21    S2 = 0.25
Input 3   RA3 = 0.20    RB3 = 0.50    S3 = 0.70

Step 4: The relative importance of each input variable, shown in Table 2.5, is its share of the total:

RI1 = S1 / (S1 + S2 + S3) × 100 = 1.05 / (1.05 + 0.25 + 0.70) × 100 = 52.5%    (2.17)

Table 2.5 Relative Importance of Input Variables

Input 1    RI1 = 52.5%
Input 2    RI2 = 12.5%
Input 3    RI3 = 35.0%
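The four steps above can be reproduced in a few lines; the following sketch (illustrative, not part of the original thesis toolchain) recovers the Table 2.5 values from the Table 2.2 weights:

    import numpy as np

    # Table 2.2: input-to-hidden weights (rows: inputs 1-3; columns: hidden A, B)
    W_ih = np.array([[-2.61, -1.23],
                     [ 0.13, -0.91],
                     [-0.69, -2.09]])
    W_ho = np.array([1.11, 0.39])          # hidden-to-output weights

    C = W_ih * W_ho                        # Step 2: contributions (Table 2.3)
    R = np.abs(C) / np.abs(C).sum(axis=0)  # Step 3: relative contributions (Table 2.4)
    S = R.sum(axis=1)                      # row sums S1, S2, S3
    RI = S / S.sum() * 100                 # Step 4: relative importance (Table 2.5)
    print(RI)                              # approximately [52.5, 12.5, 35.0]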

2. 5. 5 NETWORK INTERPRETATION DIAGRAM (NID)

As the number of input variables increases, and with network structures of more than one hidden layer, it becomes difficult to associate the contribution of the input variables to the network output based on the magnitude of the connection weights. In order to simplify this problem and provide a visual representation of the network and its connection weights, Özesmi & Özesmi [53] developed the Neural Interpretation Diagram (NID). The underlying methodology [108] is to represent each connection weight as a line joining neurons in adjacent layers of the network. The line thickness represents the magnitude of the connection weight, with thicker lines representing larger weights. The direction of the connection weight is represented by the line shading: solid lines signify excitatory signals and dashed lines inhibitory signals. Study of the magnitude and direction of the connection weights helps in predicting the variable contributions [51], [52] as well as in understanding the interactions between the input variables. A sample NID illustrating the direction and magnitude of the connection weights is shown in Figure 2.9.



Figure 2.9 Network Interpretation Diagram (NID)

In their work [109], Olden and Jackson explain that when like (positive or negative) connection weights are transferred from the input-hidden to the hidden-output layer in an MLP, the result is a positive/excitatory effect of the input variable on the network output, whereas a negative/inhibitory effect results when opposing connection weights flow from the input-hidden to the hidden-output layer. The product of the two connection weights passing successively between the layers of the MLP gives the final effect of the input variable on the output.
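For the Figure 2.8 example, this product rule can be evaluated directly; a sketch using the Table 2.2 weights:

    import numpy as np

    W_ih = np.array([[-2.61, -1.23],    # input-to-hidden weights (Table 2.2)
                     [ 0.13, -0.91],
                     [-0.69, -2.09]])
    W_ho = np.array([1.11, 0.39])       # hidden-to-output weights

    # Sum of connection-weight products per input: the sign gives the direction
    # (excitatory vs. inhibitory) and the magnitude the strength of the effect.
    overall_effect = (W_ih * W_ho).sum(axis=1)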



2. 6 RULE EXTRACTION IN NEURAL NETWORKS

The successful proliferation of applications incorporating ANN technology, in fields ranging from engineering and ecological science to industry, commerce and medicine, offers a clear testament to the enormous capabilities of the ANN paradigm. Three salient characteristics of ANNs underpin this success:

• The comparatively straightforward manner in which artificial neural networks acquire knowledge about a given problem domain through the training phase. The ability to learn both linear and non-linear relationships allows an ANN to model a problem with relative ease. This process is quite distinct from the more complicated knowledge engineering/acquisition process required for symbolic AI systems.

• The compact (albeit completely numerical) form in which the acquired information/knowledge is stored within the trained ANN, and the comparative ease and speed with which this ‘knowledge’ can be accessed and used, which aids analysis.

• The robustness of an ANN solution in the presence of noise in the input data. This helps to build network models that retain relatively high accuracy even with noisy data.

Another advantage of a trained ANN is the high degree of accuracy reported when an ANN solution generalizes over unseen examples from the problem domain. In spite of these salient characteristics, ANNs have a major drawback: they are unable to clearly explain the process by which they generate their results [110]. The basic purpose of a learning and generalization tool is to explain the process and provide knowledge. For ANNs to gain wider acceptance as learning and generalization tools, it is essential to integrate them with a tool that provides a comprehensible explanation of the results and the underlying methodology. Where an ANN is to be used for safety-critical applications such as airlines, medical diagnosis, power plants, the life cycle of gas pipelines, etc., it is essential for the ANN system to have three important capabilities:

1. Providing the capability for a user to validate the results generated by the ANN for all possible input values [110].

2. Providing the capability for the user to define the boundary conditions for the input variables under which the system will satisfactorily perform the task of generating a desired output with sufficient reliability [110]. This would provide transparency of the ANN solution and would expand the possibilities of profitably integrating symbolic and connectionist approaches to Artificial Intelligence.

3. Within a trained ANN, the capability should exist for the user to determine whether the ANN has an optimal structure or size. A concomitant requirement is for ANN solutions not only to be transparent but also to provide information about the internal states of the system. Satisfying this requirement would make a significant contribution to the task of identifying and, if possible, excluding those ANN-based solutions that have the potential to give erroneous results without any accompanying indication as to when and why a result is suboptimal.

Rule extraction offers the potential for providing such capabilities.

2. 6. 1 THE RULE EXTRACTION TASK

A neural network captures task-relevant knowledge as part of its training regimen.

This knowledge is encoded as:

• The architecture or topology of the network

• The transfer functions used for non-linear mapping

• A set of network parameters (weights and biases)

The knowledge represents the hypothesis or model learned by the network. Usually

these models are difficult to understand because the processing in a neural network

occurs at the sub-symbolic level as numerical estimation and manipulation of network

parameters. It may not always be possible to directly translate these large sets of real-valued parameters into symbols or concepts that have semantic significance. Mapping

between input features and target concept is represented by the hidden units in the

network. Thus, hidden units represent higher-level derived features, which may not

correspond to known features in the problem domain.

The goal of rule extraction approaches is to express the knowledge gained by the network as symbolic inference rules. The proliferation of rule extraction techniques has prompted researchers [54], [55] to develop criteria to evaluate the proposed algorithms and their extracted knowledge representations, as summarized below:

• Comprehensibility: The extent to which the extracted representations are humanly

comprehensible.

• Expressive power: The structure of the output presented to the end-user. Various

representation formats like simple propositional rules, M-of-N rules, fuzzy

inference rules, decision trees, etc. can be used based on the problem domain.

• Fidelity: The capability of the extracted representations to mimic the behavior of

the original network.

• Predictive Accuracy: The generalization capabilities of the resulting

representations.

• Scalability: The ability of the rule extraction method to adapt to different problem sizes (dimensionality of the input space, number of processing elements, etc.).

• Generality: The degree to which a rule extraction method imposes special

requirements like tailored training regimens or restrictions on network

architecture.

2. 6. 2 APPROACHES TO RULE EXTRACTION

One of the earliest approaches to extracting comprehensible rules from ANNs can be found in the work of Gallant [56] on connectionist expert systems. Classification rules describing the network’s behavior were obtained by analyzing the role of attribute ordering in correctly classifying a problem. A variety of rule extraction methods have been developed since then to address the problem of comprehensibility in neural networks. According to Andrews et al. [57], rule extraction methods can be classified into three categories based on the view the algorithms take of the underlying network topology: decompositional, pedagogical and eclectic.

In the decompositional methods, rules are extracted from the network at every neuron of the hidden and output layers. These rules are then combined to describe the behavior of the overall network. This approach can be considered a local approach to rule extraction, as the analysis is primarily based on the architecture of the network. Most approaches within this category employ a search procedure for finding subsets of incoming weights that exceed the bias or threshold on a node. The identified subsets of such activations are translated into propositional rules. The subset method by Fu [58] and the M-of-N algorithm developed by Towell and Shavlik [59] are generic representatives of this category. The subset method extracts simple propositional rules. The M-of-N algorithm, as the name suggests, is capable of extracting m-of-n rules. An m-of-n expression is satisfied when m of the possible n antecedents are satisfied. Setiono [60] describes a method in which the activation values at the hidden layer neurons are first grouped together, and the network is then repeatedly split into sub-networks for ease of analysis. The RULEX technique developed by Andrews and Geva [61] directly interprets the weight vectors as rules. This technique can be used only for a particular type of multilayer perceptron called the Constrained Error Back-Propagation (CEBP) perceptron. Though simple in conception, the decompositional approach to rule extraction has various limitations. The algorithmic complexity increases exponentially with network complexity, and various restrictions are imposed on the network architecture and the training procedures, which adversely affect the generalization capabilities of the neural network.

Pedagogical techniques extract rules that map network inputs to outputs directly. The approach of Saito and Nakano [62] was to select useful rules from the rule set generated using the input activation values of the network that activate a given output unit. Craven and Shavlik’s Rule-extraction-as-learning [63] is a pedagogical approach that works by querying the network. Their rule extraction process consists of systematic sampling of the network data and the generation of queries to extract a rule set from the network. This approach is less computationally intensive than search-based methods. Validity Interval Analysis (VI-Analysis), proposed by Thrun [64], extracts rules by a generate-and-test procedure, propagating validity intervals throughout the network. Linear programming is used to determine whether the set of proposed validity intervals is consistent with the network’s activation values at all nodes. The RULENEG approach developed by Pop et al. [65] focuses on extracting conjunctive rules from a neural network. It is based on the principle that changing the truth value of one of the antecedents in a conjunctive rule changes the consequent of the rule.

Several pedagogical approaches have also been developed for extracting decision tree representations of a neural network. Craven and Shavlik [66] extract decision trees from trained neural networks using a novel algorithm named TREPAN. This algorithm employs a greedy gain ratio criterion for evaluating attribute splits. Binary and M-of-N decision trees can be derived by this method. The ANN-DT (Artificial Neural Network - Decision Tree) algorithm proposed by Schmitz et al. [67] is capable of growing binary decision trees from neural networks by using attribute selection criteria based on significance analysis for continuous-valued features. The DecText (Decision Tree Extractor) algorithm [68] is effective in extracting high-fidelity trees from trained networks; the paper also proposes different criteria for selecting an attribute to partition the training data.

The third category of rule extraction techniques, labeled eclectic approaches, combines elements of the basic categories discussed above. The BRAINNE system proposed by Sestito and Dillon [69] extracts simple if-then rules. The method uses a unique approach to handle continuous data without discretization. A genetic algorithm based rule extraction approach was developed by Keedwell et al. [70]. Genes contain the weights between two adjacent layers, and chromosomes are constructed to represent a path from the input to the output layer. The fitness function is calculated as the product of the weights being transferred from the input to the output layer. The algorithm identifies the fittest chromosomes, which are subsequently mapped into if-then rules. A major limitation of this method is that only single-antecedent rules can be extracted.



2. 6. 3 VALIDITY INTERVAL ANALYSIS

One of the more popular rule extraction and refinement techniques is the Validity

Interval Analysis (VI-Analysis) [64]. The basic procedure of the VI-Analysis algorithm

adapted from [57] and [64] is as follows:

• Generation of candidate rule set: For rule extraction, the first step is the

generation of a feasible rule set.

• Validity interval assignment: A candidate rule is translated into a set of validity

intervals (range of its activation values) to be specified on the input and output

nodes of the network.

• Interval refinement: These proposed validity intervals are refined by propagating

them through the network in two phases: forward and backward. Linear

programming techniques are then employed to generate and refine validity

intervals.

• Rule validation based on convergence: There are two possible outcomes of the previous step. Either VI-Analysis converges, validating the proposed rule, or a contradiction is found, proving that the constraints imposed by the proposed validity intervals are inconsistent with the behavior of the network; in that case the rule is rejected and steps 2-4 are repeated with another candidate rule.

A major drawback of this method is its computational intensity, requiring many calls to an optimization module. In addition, the activation levels of the nodes are assumed to be independent of one another. This assumption is not always valid, and the algorithm may not find maximally general rules. Maire [71] shows that VI-Analysis always converges in one run (forward and backward phase) for single layer networks and has an exponential rate of convergence for multilayer networks.

2. 6. 4 EXTRACTION OF DECISION TREE REPRESENTATIONS

The ANN-DT algorithm generates univariate decision trees; a schematic representation of the algorithm, adapted from [67], is shown in Figure 2.10. As illustrated in the figure, the sampled data set S is split into two data sets S1 and S2 based on the selected attribute. The main steps in the ANN-DT algorithm are as follows:

1. Interpolation of Correlated Data: An artificial data set is prepared by random sampling of the feature space, and the class labels are obtained by querying the neural network that models the training data.

2. Selection of Attribute: For discrete output classes, the gain ratio is used for selecting the attribute. An alternative method based on analysis of attribute significance can also be used.

3. Stopping Criteria: The selected attribute splits the current set of data into two subsets, and a decision tree is generated by recursive splitting. For discrete classes, the process terminates when an internal node contains data of only one output class. For continuous outputs, termination occurs when the variance of the data at a node is zero.


(Figure: the neural network is trained on the original data set S; interpolated data, with neighborhood areas estimated for the interpolation, are used to sample the network; a binary decision tree is then extracted by selecting attributes and split points, splitting S into subsets S1 and S2.)
Figure 2.10 Schematic Representation of the ANN-DT Algorithm, Adapted From Schmitz et al. [67]

2. 6. 5 TREPAN ALGORITHM

The TREPAN algorithm is similar to conventional decision tree induction algorithms such as C4.5 [25], but differs in treating the extraction of knowledge from a trained network as an inductive learning task. The resulting decision tree approximates the network.

A major difference between TREPAN and other decision tree algorithms is the use of an oracle that answers membership queries and returns the class labels. The network itself serves as the oracle, and answering a membership query means using the network to classify an instance. This information is used in developing the nodes and leaves of the tree. Generally, an attribute is selected to be placed at the root node, and a branch is then added to this node for each possible value of the attribute. The branching process splits the data set into a given number of subsets. The process is recursively repeated at every branch, using only those data patterns that actually reach the branch. If the number of such patterns is less than a threshold, the oracle is used to label additional instances. The branching continues until all the patterns that reach a leaf node belong to the same class. No further expansion of such a leaf node is necessary and the node is designated with the appropriate class label. The expansion then proceeds to other branches of the tree until all possible leaf nodes have been produced. For more detail on TREPAN see Craven and Shavlik [66]. The TREPAN algorithm is shown in Figure 2.11.



Figure 2.11 The TREPAN Algorithm.

TREPAN uses membership queries over the learner’s instance space to determine the class label of each instance. A membership query is a question to the oracle (the network model), which returns the class label. TREPAN uses the DRAWSAMPLE routine to obtain a set of query instances for membership queries. These query instances are subject to a set of constraints determined by the location of the node in the tree: the instances must have outcomes for the tests at nodes higher in the tree that cause them to follow the path from the root to the given node.

The CONSTRUCTTEST function is used to determine the splitting test for a particular node. TREPAN uses m-of-n expressions for its tests. An m-of-n expression is a Boolean expression specified by an integer threshold m and a set of n Boolean literals; the expression is satisfied when at least m of its n literals are satisfied.
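For example, a 2-of-3 test fires when any two of its three literals hold. A small sketch, with thresholds borrowed from the rules extracted later in this thesis purely for illustration:

    def m_of_n(m, literals):
        # Satisfied when at least m of the n Boolean literals are true
        return sum(bool(lit) for lit in literals) >= m

    x = {'API': 12.0, 'Resin': 20.0, 'Saturates': 40.0}
    m_of_n(2, [x['API'] <= 13.15, x['Resin'] > 17, x['Saturates'] <= 63.40])  # True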

TREPAN ensures that a minimum number of instances exists at a given node before assigning a class label to the node or choosing a splitting test for it. The data set to be used at each of these nodes is determined by the user-specified minimum_sample parameter. This parameter controls both the size and the depth of the decision tree and in turn influences its classification accuracy.
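The oracle idea at the heart of TREPAN can be sketched as follows. This is not the TREPAN implementation used later in this thesis (which expands nodes best-first and uses m-of-n tests); it only illustrates labeling sampled instances with the network and fitting a tree that mimics it, with net.predict as a placeholder for any trained model:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def oracle_tree(net, X_train, n_queries=5000, seed=0):
        rng = np.random.default_rng(seed)
        # DRAWSAMPLE-style step: sample each feature's empirical marginal
        cols = [rng.choice(X_train[:, j], size=n_queries)
                for j in range(X_train.shape[1])]
        X_query = np.column_stack(cols)
        y_query = net.predict(X_query)   # membership queries answered by the network
        return DecisionTreeClassifier().fit(X_query, y_query)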

CHAPTER 3. APPROACHES TO CORROSION PREDICTION

PROBLEM

3. 1 REVIEW OF APPROACHES TO SOLVING CORROSION PREDICTION PROBLEM

Many different models of CO2 corrosion exist. These can be arbitrarily classified

into three categories based on how firmly they are grounded in theory:

• Mechanistic Models

• Semi-empirical Models

• Empirical Models

Mechanistic Models: these models describe the mechanism of the underlying

reactions and have a strong theoretical background. All or most of the constants

appearing in this type of model have a clear physical meaning. Many of the numerical

values can be found in the literature about corrosion. When calibrated on a reliable

experimental database this type of model should, in principle, enable accurate and

physically realistic interpolation (prediction within the bounds of calibrating parameters),

as well as extrapolation predictions. It is easy to add new knowledge to these models with

minimal modifications of the existing model structure and without having to recalibrate

any of the model constants.



Semi-empirical Models: these models are only partly based on firm theoretical hypotheses. They are, for practical purposes, often extended to areas where insufficient

theoretical knowledge is available so that the additional phenomena are described with

empirical functions. Some of the constants appearing in these models have a clear

physical meaning while the others are just best-fit parameters. Calibrated with a

sufficiently large and reliable experimental database, such models will enable good

interpolation predictions. However, extrapolation can lead to unreliable and sometimes

physically unrealistic results. New knowledge can be added with some effort usually by

adding correction factors and/or by performing a partial recalibration of the model

constants.

Empirical Models: these models have very little or no theoretical background.

Most or all of the constants have little physical meaning – they are just best-fit

parameters to the available experimental results. When calibrated with a large and

reliable experimental database, these models can have excellent interpolation

characteristics. However, any extrapolation may lead to incorrect predictions, as there is

no assurance that the arbitrary empirical correlations hold outside of their calibration

domain. The addition of any new knowledge to this model is rather difficult and requires

recalibration of the entire model. Alternatively, correction factors related to their interactions with the existing empirical constants can be added, with some degree of uncertainty.

The next section briefly discusses some of the significant models belonging to each

of these categories.

3. 2 MECHANISTIC MODELS

As described in the theoretical background section of chapter 1, CO2 corrosion is a

complex phenomenon where electrochemical, transport and chemical processes occur

simultaneously. Thus the processes to be modeled are electrochemical reactions and the

flow of the different components of the system such as H+, CO2, H2CO3 and Fe++, as well

as the chemical reactions occurring between them. There is no single model able to grasp all of these complexities, but there have been studies which individually incorporate different aspects of this complex process.

Since CO2 corrosion is an electrochemical phenomenon, a number of researchers have attempted to construct mechanistic models based on the electrochemical processes occurring on the metal surface. One of the most significant and widely used models was proposed by de Waard and Milliams [1], [4]. Due to some of the basic assumptions made by the authors in the modeling process, the validity of the model was questioned [72], [73], [3]. De Waard and Milliams later revised their model [74] based on constants determined by the experiments of Dugstad et al. [8]. This revised model has been used on several occasions to extend its validity into areas concerning corrosion in the presence of protective films [7], [74]. Another electrochemical model was presented by Gray et al. [2], [6], containing constants based on their own glass-cell and flow-cell experiments. It was a breakthrough in scope and approach in the field of CO2 corrosion modeling. Nesic et al. [3] did a follow-up study and presented another electrochemical model, which compared favorably in predictive accuracy with the semi-empirical models of de Waard et al. [7] and Dugstad et al. [8]. The model of Nesic et al. [3] described the electrochemical processes occurring on the metal surface in detail, but the transport processes giving rise to the currents were oversimplified. Pots [76] later presented a more realistic model describing the transport processes in the boundary layer for the case of CO2 corrosion. His model was based on the approach of Turgoose et al. [77], who were the first to model these phenomena. Archour et al. [78] and Dalayan et al. [79] used their own mechanistic models to simulate pit propagation of carbon steel in a CO2 environment under highly turbulent conditions.

3. 3 SEMI-EMPIRICAL MODELS

Most of the models based on the mechanistic approach are called “worst-case” models because they do not take into consideration the presence of protective surface films, corrosion inhibitors, hydrocarbons, different steel types, high pressures and other realistic conditions found in the oil and gas industry.

When the walls of the pipeline are wetted with oil (which is hydrophobic), no corrosion is possible. Also, some of the components of crude oil have inhibitive properties, which help in forming more protective films. This crucial factor was incorporated into the modeling process by de Waard and Lotz [7] through a water-wetting factor in their so-called “resistance model”, relating the effect of velocity to the corrosion process.

De Waard et al. [74] presented a semi-empirical model based on their initial study [1], which considered the effects of protective films and corrosion inhibitors when modeling corrosion. Dugstad et al. [8] presented their semi-empirical model of CO2 corrosion based on a temperature-dependent basic equation (a best-fit polynomial function). Pots [75], [76], as noted above, presented a semi-empirical model describing the transport processes in the boundary layer, building on the approach of Turgoose et al. [77].

An important problem in the modeling of CO2 corrosion is that many of the pipelines and flow lines carrying oil operate under multi-phase flow conditions. Modeling of multi-phase flow alone is a difficult task; modeling its effect on CO2 corrosion is even more so. Jepson et al. [80] presented a semi-empirical model suggesting the importance of the Froude number in characterizing the effects of multi-phase flow on CO2 corrosion.

3. 4 EMPIRICAL MODELS

It has been observed that CO2 corrosion rates in the field in the presence of crude oil are much lower than those obtained under laboratory conditions where crude oil was not used or where synthetic crude oils were used. One can identify two main effects of crude oil on the CO2 corrosion rate. The first is a wettability effect and relates to a hydrodynamic condition where crude oil entrains the water and prevents it from wetting the steel surface (continuously or intermittently). The second effect is corrosion inhibition by certain components of crude oil that reach the steel surface either by direct contact or by first partitioning into the water phase.

Efird [11] stressed the importance of testing the effect of specific crude oils and including it in corrosion prediction and testing. He also introduced the definition of the Corrosion Rate Break as the level of produced water in crude oil production at which corrosion is accelerated and becomes a problem. Smart [12] in 1993 presented work relating petrophysical and wettability properties to corrosion. He indicated that crude oils contain surface-active compounds (polar compounds containing oxygen, nitrogen and sulfur) that strongly affect the wettability properties of brines. Adams et al. [99] later presented work relating the water-wetting factor of corrosion to the velocities in the flow pipe using a multiple regression model. The use of linear regression models to describe the complex process of CO2 corrosion has always been a questionable approach. In some recent studies [83], [84] the degree of inhibition was quantitatively related to the chemical composition of crude oil and the concentrations of saturates, aromatics, resins, asphaltenes, nitrogen and sulfur. Hernández et al. [13] gave insight into the variables in crude oil composition that could be playing a major role in the inhibition offered by crude oils. In that work, a statistical analysis was performed on several Venezuelan crude oils evaluated experimentally under the same conditions. The crude oils were separated into two groups, paraffinic and asphaltenic, depending on their distribution of saturates, aromatics, resins and asphaltenes (SARA). The effects of basic chemical and physical properties of the crude oils were then evaluated using multiple linear regression analyses. A stochastic approach based on a Markov description of the phenomenon of pitting corrosion has been presented in the work of Provan [81].

In recent years the field of artificial intelligence has been explored for modeling the corrosion process, and ANNs have been one of the most promising approaches. The next section presents some of the important research that has taken place in the field of corrosion modeling.

3. 5 NEURAL NETWORK MODELS

An early published attempt to apply a neural network to a corrosion problem was

that of Smets and Bogaerts [85]. They developed a series of neural networks to predict

the SCC of type 304 stainless steel in near neutral solutions as a function of chloride

content, oxygen content and temperature. They found that the neural network approach

out-performed traditional regression techniques.

Urquidi-Macdonald [86] developed a neural network model for predicting the number and depth of pits in heat exchangers. No information was given about the network size, other than that it had two hidden layers, or about the number of training points. The evolution of the pit depth and the number of pits were effectively modeled and showed good agreement with the experimental results.

Ben-Haim and Macdonald [87] described the use of neural network models to predict the influence of various parameters on the acidity of simulated geological brines. The solutions were based on NaCl + MgCl2. The network inputs were the Na+ and Mg2+ concentrations and the temperature; the predicted output was the pH value. The data set consisted of 101 points, of which 90 were used for training, with the remaining 11 retained as a test set. A simple network consisting of a single hidden layer with two nodes was used. The network achieved good results, and the prediction error was of the same order as the experimental uncertainty.

Silverman and Rosen [88] combined artificial neural networks with an expert

system in order to predict the type of corrosion from a polarization curve. Inputs to the

networks included the passive current density, the pitting potential and the repassivation

potential, while outputs were the risks of crevice, pitting and general corrosion. Two

approaches were used: independent networks for each type of corrosion, and a single

combined network producing all three outputs. An expert system was used to interpret the

outputs produced by the two approaches. The relatively small size of the training data set

was one of the major concerns regarding the reliability of the model.

Trasatti and Mazza [90] developed a neural network for the prediction of the crevice corrosion behavior of stainless steels. The network was trained on long-term laboratory and field tests. Seventeen input variables were used with one hidden layer of five nodes. Six hundred training examples were available; 450 of these were used for training and the remaining 150 as a test set. The performance of the network was reasonably good, but the very large number of input variables might be expected to present difficulties in training with a relatively small data set: a 17-dimensional hypercube has 2^17 ≈ 130,000 ‘corners’, so the data space is inevitably very sparsely populated.

Palakal et al. [98] developed an intelligent computational approach based on wavelet analysis and ANNs to identify and quantify corrosion damage in images of panels obtained from nondestructive inspection (NDI) techniques. A K-means classification algorithm was used to separate the corroded regions from the non-corroded regions in the panels based on the extracted features, and good accuracy was obtained in identifying the corroded segments. A back-propagation NN was used to predict the material loss due to corrosion. Growth in corrosion was simulated by perturbing the images of the damaged regions, changing pixel values to correspond to higher material loss, so that growth in the extent of the material loss could be observed. A good trend was observed between the predicted material loss and the experimental data. The results indicated that the computational methods developed for corrosion analysis provide reasonable data for clustering material loss due to corrosion damage.

Pidaparti et al. [97] presented work examining the residual strength of aging aircraft panels in the presence of corrosion and fatigue damage. Both the residual strength and the corrosion rates were predicted using a feed-forward neural network with two hidden layers. Sensitivity analysis was performed to determine the impact of the input variables on the output. A series of simulations was also performed to examine the generalization ability of the network in predicting the outputs for different conditions of the input parameters; each simulation tested the effect of a particular input parameter on the predictions for a particular panel. The results obtained were in good agreement with the experimental data. Similar work was carried out by Bailey et al. [92]: a model was developed using neural networks to predict the ASTM G34 corrosion rating and the resulting material loss in aging aircraft. Another model was also constructed to predict the cycles to final fatigue failure and the residual static strength of a particular type of material, given the amount of material loss due to corrosion.

Bucolo et al. [93] modeled the corrosion phenomena occurring in a pulp and paper plant. In this study, two predictive models, one local and one global, were built to allow evaluation of the corrosion rate in the stainless steel of the plant’s ozone bleaching devices. An MLP model was constructed and later merged with a neuro-fuzzy (NF) system. The performance of the adopted predictive monitoring showed that the neuro-fuzzy expert system was able to improve on the neural network model, both by improving the accuracy of the model and by dramatically reducing the number of input parameters necessary for satisfactory accuracy.

Leifer et al. [94] presented a model of pitting corrosion for carbon steel waste tanks containing aqueous radioactive waste, used for temporary storage of spent nuclear fuel while permanent storage facilities for such materials were being prepared. An ANN was used to predict the corrosion rate. The back-propagation of error method was used to train and test the ANN model using archival pitting data. The numbers of pits obtained from the neural network model were in agreement with the results obtained from experimental methods [111]. In other work, Leifer et al. [95] presented a predictive model to determine the rate of pitting corrosion in carbon steel waste tanks used to store radioactive sludge. The other parameters taken into consideration were the different concentrations of corrosion inhibitors and temperature ranges. The concentration levels of inhibitors such as nitrite were determined experimentally using electrochemical polarization, and statistical methods were used to analyze the experimental data. The network architecture selected was a generalized feed-forward ANN, with back-propagation of error algorithms used for training the network. The results revealed greater accuracy in predicting conditions leading to pitting corrosion as compared to the regression model.

Haque et al. [96] developed a model for predicting the corrosion-fatigue crack growth rate in dual-phase (DP) steels (primarily a low-carbon steel with micro-alloying additions of vanadium and boron) using an artificial neural network. The training data consisted of corrosion-fatigue crack growth rates at varying stress intensity ranges for martensite contents between 32 and 76%. The ANN model used a three hidden layer back-propagation architecture. Even though a large number of input variables was used during training, the model showed excellent agreement with the experimental results.



Nesic and Vrhovac [91] developed a hybrid model combining the reliability of a mechanistic model with the flexibility of the neural network approach. The model was developed using the experimental database of Dugstad et al. [8]. The model architecture consisted of a single hidden layer back-propagation NN having 66 input neurons and 51 hidden neurons, with genetic algorithms used for the network training. The inputs to the network were indirect, crude or noisy parameters, called primitive descriptors, such as temperature t, pH, CO2 partial pressure PCO2, Fe++ and HCO3- concentrations, and flow velocity v. Relations between these primitive descriptors were studied by introducing additional problem descriptors called evolved descriptors. The prediction ability was found to be significantly better than that of conventional models.

CHAPTER 4. METHODOLOGY

4. 1 CORROSION TESTS

The main goal of this research was to build a model based on actual data collected from experimental results. A detailed description of the corrosion tests and the resulting data was published in a previous paper by Hernández et al. [13]. A brief overview of the experimental procedure is summarized below.

Fifteen Venezuelan crude oils were evaluated. An analysis of saturated

components, aromatics, resins and asphaltenes (SARA) was performed on each crude oil.

API density (°API), total nitrogen content (NTOTAL), Total Acid Number (TAN), Sulfur content (S%), Vanadium (V) and Nickel (Ni) were measured according to ASTM standards.

Weight loss corrosion tests were performed on coupons. Three coupons were used for

each set of testing conditions; two of them were used for corrosion rate calculations and

the third for surface analysis and corrosion product characterization. After calculating

corrosion rates, these were then translated into inhibiting capacity, by dividing the values

of each test by the value obtained in blank tests, so that:

Inhibiting Capacity = 1 − (corrosion rate with crude / corrosion rate blank)    (4.1)

4. 2 DEVELOPMENT OF THE NEURAL NETWORK MODEL

The first step in the development of the model was to create a network structure that would be most efficient at capturing the complex nature of the corrosion prediction problem. There are four major components in the development of a neural network model:

1. Choice of data and dividing them (based on sizes) into training, cross-validation

(CV) and testing data

2. Selection of an appropriate network architecture, training algorithm and learning

constants

3. Genetic optimization for the most suitable network parameters

4. Determination of the termination criteria

There are presently no definitive rules or formulae available to determine each of

these network selection parameters. Rigorous experimentation and a number of trials with

different types of network architecture were performed to achieve a good network model

for the given data. The software NeuroSolutions version 4.21, developed by NeuroDimensions Incorporated, was used for development and testing of the neural network model.

4. 2. 1 TRAINING, CROSS-VALIDATION AND TEST DATASETS

The original data set was split into training, cross-validation and testing data sets, where:

• 65% of the exemplars were presented to the network for training

• 15% of the exemplars were used for cross-validation, during which the MSE on this held-out set was computed concurrently with training

• 20% of the exemplars were used for testing the trained network.
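A sketch of this 65/15/20 split is shown below; the shuffling and index arithmetic are illustrative, since the thesis relied on NeuroSolutions’ own data partitioning:

    import numpy as np

    def split_data(X, y, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))                 # shuffle the exemplars
        n_train = int(0.65 * len(X))
        n_cv = int(0.15 * len(X))
        train, cv, test = np.split(idx, [n_train, n_train + n_cv])
        return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])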

4. 2. 2 NETWORK ARCHITECTURE, TRAINING ALGORITHM AND LEARNING PARAMETERS

A number of different network architectures, such as the Multilayer Perceptron, Generalized Feed Forward Network, Modular Neural Network and Radial Basis Function network, were experimented with to achieve the model with the best prediction accuracy. The different network architectures and their respective prediction accuracies are tabulated in Table 4.1.



Table 4.1 Evaluation of Different Neural Network Model Architectures

Type of Network                  Hidden Layers   Dimensionality            Transfer Functions              Training Algorithm   Classification Accuracy
Multilayer Perceptron            1               11-8-1                    Hyperbolic Tangent              Gradient Descent     75.82
Multilayer Perceptron            1               11-6-1                    Logistic                        Gradient Descent     83.67
Multilayer Perceptron            2               11-10-8-3                 Hyperbolic Tangent              Gradient Descent     88.52
Multilayer Perceptron            2               11-10-8-1                 Hyperbolic Tangent              Gradient Descent     92.2
MLP with Genetic Optimization    2               11-6-6-1                  Hyperbolic Tangent              Gradient Descent     96.7
Modular Neural Network           1               11-4(upper)-4(lower)-1    Hyperbolic Tangent              Gradient Descent     73.88
Radial Basis Function            1               11-7-1                    Gaussian, Hyperbolic Tangent    Gradient Descent     80.14
Radial Basis Function            2               11-8-6-1                  Gaussian, Hyperbolic Tangent    Gradient Descent     86.77

4. 2. 3 GENETIC OPTIMIZATION OF NETWORK PARAMETERS

Once the preliminary tests revealed that MLPs were more accurate than the other architectures in predicting the inhibition rates, the Genetic Control component of the software was used to obtain the best network parameters. Steady-state progression genetic algorithms were used, in which only the worst member of the population is replaced in each iteration. This method of progression tends to arrive at a near-optimal solution more quickly than generational progression.



The first genetic operator used in the algorithm is Selection [112], which selects the chromosomes to be included in the next generation’s population. A selected chromosome undergoes crossover and/or mutation to generate offspring, which are then added to the next generation’s population. Crossover [112] develops a new chromosome by combining/mating two parent chromosomes, so that the offspring inherits characteristics of both parents; the crossover probability controls this process, and a crossover probability of 0.9 was used in our model. Another genetic operator, Mutation, alters one or more genes in a particular chromosome, resulting in new gene values. With the inclusion of these new gene values the GA can obtain better results than those available before crossover and mutation of the parent chromosomes. A mutation probability of 0.01 was used to keep the search focused.
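A minimal sketch of a steady-state GA with these operator probabilities follows. The thesis used NeuroSolutions’ Genetic Control component, so the selection scheme (tournament) and the mutation step shown here are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def tournament(pop, fit, k=3):
        # Selection: the fittest of k randomly drawn chromosomes becomes a parent
        idx = rng.choice(len(pop), size=k, replace=False)
        return pop[idx[fit[idx].argmax()]].copy()

    def steady_state_ga(fitness, n_genes, pop_size=20, iters=500):
        pop = rng.uniform(-1.0, 1.0, size=(pop_size, n_genes))
        fit = np.array([fitness(c) for c in pop])
        for _ in range(iters):
            child = tournament(pop, fit)
            if rng.random() < 0.9:                   # crossover probability 0.9
                mate = tournament(pop, fit)
                mask = rng.random(n_genes) < 0.5
                child[mask] = mate[mask]
            mutate = rng.random(n_genes) < 0.01      # mutation probability 0.01
            child[mutate] += rng.normal(0.0, 0.1, size=int(mutate.sum()))
            worst = fit.argmin()                     # steady-state progression:
            pop[worst], fit[worst] = child, fitness(child)  # replace only the worst
        return pop[fit.argmax()]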

Based on a comparison of the results, a two hidden layer MLP (11-6-6-1) with six processing elements in each hidden layer and a hyperbolic tangent transfer function at the hidden layers was selected, with gradient descent as the training algorithm. Step size and momentum rate are the key learning parameters for this algorithm. In order to accelerate the network ‘learning’ and to maximize the probability that the network converges at the global minimum, both the momentum rates and the step sizes were varied simultaneously during the training regimen [112].



4. 2. 4 TERMINATION CRITERIA

The gradient descent algorithm determines the weight vector that maps the network input parameters to the desired output. This weight vector is randomly initialized and then adapted during the training process. The randomness of the initial weight vector is important for learning, but the inherent non-linear dynamics of the training process lead to different convergence properties for different initial weight vectors. Therefore, to increase the probability of a good initial solution (weight vector), a number of runs are required. In each of these runs, a number of training cycles (epochs) are needed to ensure adequate generalization. The following four termination criteria were used to determine convergence of the training algorithm:

• Number of runs before termination

• The maximum number of epochs/run

• Non-improvement of cross-validation error with training

• Increase in the cross-validation error with training

The training parameters (i.e., learning parameters and the termination criteria) for the 11-6-6-1 MLP network are given in Table 4.2.



Table 4.2 Learning Parameters and the Termination Criteria

Network Parameter                                     Value
Step size                                             0.01
Momentum factor                                       0.7
Number of Runs                                        5
Number of Epochs/Run                                  7,000
Number of Epochs without improvement in CV error      100
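For readers without NeuroSolutions, the selected configuration can be approximated with standard libraries; the sketch below mirrors Tables 4.1-4.2 (scikit-learn’s SGD details differ from NeuroSolutions’ implementation, so this is an approximation, not the original tool):

    from sklearn.neural_network import MLPRegressor

    model = MLPRegressor(hidden_layer_sizes=(6, 6),  # two hidden layers, 6 PEs each
                         activation='tanh',          # hyperbolic tangent transfer
                         solver='sgd',               # gradient descent training
                         learning_rate_init=0.01,    # step size (Table 4.2)
                         momentum=0.7,               # momentum factor (Table 4.2)
                         max_iter=7000,              # epochs per run (Table 4.2)
                         early_stopping=True,
                         n_iter_no_change=100,       # epochs without CV improvement
                         validation_fraction=0.15)   # cross-validation share
    # model.fit(X_train, y_train)  # 11 input variables -> inhibiting capacity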

Once all of the network parameters were selected, six test runs (Test 1 to 6) were

conducted using exactly the same network architecture and network parameters, but a

different set of randomized training data. These test results provided a set of different

weight vectors that were randomly initialized and adapted during the training process.

4. 3 SENSITIVITY ANALYSIS

The next step was to analyze the interrelationship between the input variables and their effect on the output of the network. Sensitivity analysis was performed on the chosen MLP network for all six test runs of that model. The sensitivity was computed from the corresponding difference (delta) in the output and graphed using the max-min criterion of the output (inhibition). The results for one of the six test runs are shown in Figure 4.1.



Figure 4.1 Sensitivity Analysis About the Mean



Figure 4.2 to Figure 4.12 illustrate the separate sensitivities for each variable.

Figure 4.2 Separate Sensitivity for % Crude Oil

Figure 4.3 Separate Sensitivity for Nickel (Ni)



Figure 4.4 Separate Sensitivity for API

Figure 4.5 Separate Sensitivity for Total Nitrogen



Figure 4.6 Separate Sensitivity for Vanadium (V)

Figure 4.7 Separate Sensitivity for S %



Figure 4.8 Separate Sensitivity for Total Acid Number (TAN)

Figure 4.9 Separate Sensitivity for Saturates



Figure 4.10 Separate Sensitivity for Aromatics

Figure 4.11 Separate Sensitivity for Resins



Figure 4.12 Separate Sensitivity for Asphaltenes

A cumulative sensitivity graph, shown in Figure 4.13, was constructed by averaging the sensitivity values over all six test runs (Test 1 to 6). From the analysis of the cumulative sensitivity graph it is apparent that crude oil percentage had the greatest impact on the output (inhibition rate). Because of this, the data was subdivided on the basis of crude oil percentage for further analysis.



(Bar chart: average sensitivity, on a scale of 0 to 0.6, of the output to each input variable: % Crude Oil, Ni, API, Total Nitrogen, V, S %, TAN, Saturates, Aromatics, Resins and Asphaltenes.)
Figure 4.13 Cumulative Sensitivity Graph

The data was separated by crude oil percentage into four groups: 1%, 20%, 50% and 80%. Sensitivity analysis was also performed on each of these groups, as shown in Figure 4.14 to Figure 4.17.



Figure 4.14 Sensitivity About the Mean for 1% Crude Oil Concentration

Figure 4.15 Sensitivity About the Mean for 20% Crude Oil Concentration

Figure 4.16 Sensitivity About the Mean for 50% Crude Oil Concentration

Figure 4.17 Sensitivity About the Mean for 80% Crude Oil Concentration

Based on the results, similar behavior patterns were noted between the 1% and 20% crude oil data, and likewise between the 50% and 80% crude oil data. The similar groups were combined (1% with 20%, and 50% with 80%) and another sensitivity analysis was performed (Figure 4.18 and Figure 4.19) to identify the similarities at low and at high crude oil concentrations.

Figure 4.18 Sensitivity About the Mean for 1% and 20% Combined

Figure 4.19 Sensitivity About the Mean for 50% and 80% Combined

From the results of the cumulative sensitivity analysis we found that Nickel (Ni), crude oil percentage and TAN were among the important input variables affecting the output. These results were in accordance with the earlier studies of Hernández et al. [13]. In light of these results, to further explore the interrelationships between the input variables, an Excel model was constructed. The model reproduced the network computation and generated the predicted inhibition rate for a given set of input values. The network output was held constant and the values of the input variables were varied to explore the effects. Figure 4.20 demonstrates the behavior of crude oil and Nickel while holding the network output and the remaining variables constant.
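Such constant-output curves can be traced numerically: for each fixed % crude oil, search for the Ni content at which the model returns the target inhibition. A sketch is given below; the predict function stands in for the Excel re-implementation of the network, and the approach assumes the response crosses the target exactly once within the Ni bracket:

    from scipy.optimize import brentq

    def ni_at_constant_inhibition(predict, inputs, crude_idx, ni_idx,
                                  crude_pct, target, ni_lo=5.0, ni_hi=85.0):
        def gap(ni):
            x = inputs.copy()                 # all other inputs held constant
            x[crude_idx], x[ni_idx] = crude_pct, ni
            return predict(x) - target
        return brentq(gap, ni_lo, ni_hi)      # root where prediction equals target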




Figure 4.20 Relationship Between % Crude Oil and Ni at Constant Inhibition Output

Similarly, Figure 4.21 and Figure 4.22 illustrate the behavior patterns between crude oil and TAN, and between API and Aromatics.




Figure 4.21 Relationship Between % Crude Oil and TAN at Constant Inhibition Output


Figure 4.22 Relationship Between API and Aromatics at Constant Inhibition Output

This same approach was further developed to show the interactions among three variables (crude oil, Ni and TAN) at constant output inhibition, as shown in Figure 4.23.

Figure 4.23 Relationship Curves Between % Crude Oil, Ni and TAN at Constant

Inhibition.

4. 4 NETWORK INTERPRETATION DIAGRAM (NID)

The Network Interpretation Diagram was constructed to track the direction and magnitude of the synaptic weights between neurons in the input/hidden/output layers, thereby providing a visual representation of the impact of individual and interacting input variables on the output parameter. Figure 4.24 presents the NID for the network model and illustrates the relative influence of each input variable in predicting the output response.

Figure 4.24 NID for 11-6-6-1 MLP Network



4. 5 GARSON’S ALGORITHM

Garson’s algorithm was applied to the network model in order to decipher the relative importance of each input variable and its contribution to the predicted output. Figure 4.25 displays the results of the algorithm in the form of a pie chart partitioning the relative importance of each input variable in the predicted response.

Figure 4.25 Results of Garson’s Algorithm Showing Relative Importance of Input

Variables.

4. 6 TREPAN ALGORITHM

The next step in the analysis was to extract rules that would translate the neural network model into explicit symbolic form. For the TREPAN algorithm, the regression problem with continuous output data was transformed into a classification problem: the output (% inhibition) was divided into 5 classes. Two different data sets (EVEN and UNEVEN) were generated, one having even class ranges and the other uneven class ranges. The class ranges for both data sets are shown in Table 4.3 and Table 4.4.

Table 4.3 Uneven Class Ranges

Class Label    Range
CL1            0 - 0.749
CL2            0.75 - 0.849
CL3            0.85 - 0.899
CL4            0.90 - 0.979
CL5            0.98 - 0.999

Table 4.4 Even Class Ranges

Class Label    Range
CL1            0 - 0.2
CL2            0.2 - 0.4
CL3            0.4 - 0.6
CL4            0.6 - 0.8
CL5            0.8 - 1.0

The minimum_sample parameter was varied from 1 to 4, resulting in a number of decision trees of different sizes and classification accuracies. A minimum_sample size of one generated the decision tree with the best accuracy.

Two different kinds of decision trees were extracted for each data set, Trepan_tree and Disjunctive_Trepan_tree. The Trepan_tree is extracted from the loaded neural network model using the TREPAN algorithm. The Disjunctive_Trepan_tree is extracted from the loaded network model using a variant of TREPAN that applies disjunctive (i.e., “or”) tests instead of the general m-of-n tests at the internal nodes of the extracted tree.

The performance statistics for the two variants of decision trees applied to the two data sets (EVEN and UNEVEN) are provided in Table 4.5.

Table 4.5 Performance Statistics for Decision Trees

                  Trepan_tree                    Disjunctive_Trepan_tree
                  Training Data   Test Data      Training Data   Test Data
EVEN Classes      87.30%          62.70%         89.60%          64.80%
UNEVEN Classes    91.80%          69.75%         92.30%          71.80%

Figure 4.26 and Figure 4.27 show a partially expanded view of the

Disjunctive_Trepan_tree and Trepan_tree extracted for the UNEVEN data set.



Figure 4.26 Partially Expanded View of Disjunctive_Trepan_tree Extracted From the

UNEVEN Class Data.



Figure 4.27 Partially Expanded View of Trepan_tree Extracted From the UNEVEN Class

Data.

In the decision trees (Figure 4.26 and Figure 4.27) the circles represent the leaf nodes and indicate the class label of the response variable (inhibition) predicted for the particular set of input variable values represented by the path. The decision tree can easily be decomposed into propositional rules. Table 4.6 shows the set of 20 distinct rules generated by decomposing the Disjunctive_Trepan_tree (see Figure 4.26).



Table 4.6 Rules Extracted From the Disjunctive_Trepan_tree

Rule 1:  IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 9.26) THEN CL5
Rule 2:  IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 13.15) AND (API > 9.26) AND (% Crude Oil <= 64.99) THEN CL4
Rule 3:  IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 13.15) AND (API > 9.26) AND (% Crude Oil > 64.99) THEN CL5
Rule 4:  IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API > 13.15) AND (% Crude Oil <= 34.99) THEN CL3
Rule 5:  IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API > 13.15) AND (% Crude Oil > 34.99) THEN CL5
Rule 6:  IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin <= 5.30) THEN CL3
Rule 7:  IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API <= 23.95) THEN CL2
Rule 8:  IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API <= 32.34) AND (API > 23.95) THEN CL4
Rule 9:  IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil <= 34.99) AND (Resin > 5.30) AND (API > 32.34) THEN CL4
Rule 10: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 64.99) THEN CL4
Rule 11: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API <= 23.95) THEN CL3
Rule 12: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API > 23.95) AND (S % <= 0.57) THEN CL3
Rule 13: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates <= 63.40) AND (% Crude Oil > 34.99) AND (% Crude Oil <= 64.99) AND (API > 23.95) AND (S % > 0.57) THEN CL4
Rule 14: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil > 64.99) THEN CL2
Rule 15: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil <= 64.99) AND (API <= 32.90) THEN CL1
Rule 16: IF (% Crude Oil > 10.50) AND (Resin <= 17) AND (Saturates > 63.40) AND (% Crude Oil <= 64.99) AND (API > 32.90) THEN CL2
Rule 17: IF (% Crude Oil <= 10.50) AND (Nitrogen >= 6514.38) THEN CL2
Rule 18: IF (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates <= 57.15 OR TAN <= 4.22) THEN CL1
Rule 19: IF (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates > 57.15 OR TAN > 4.22) AND (S % >= 0.58) THEN CL2
Rule 20: IF (% Crude Oil <= 10.50) AND (Nitrogen < 6514.38) AND (Saturates > 57.15 OR TAN > 4.22) AND (S % < 0.58) THEN CL1

The first rule implies that “IF (% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 9.26) THEN the class label is CL5”, i.e. the predicted inhibition would be in the range 0.98 - 0.999 as per Table 4.3.
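Such rules translate directly into executable logic. A sketch of Rule 1 in code (the dictionary keys are hypothetical names for the thesis's input variables):

```python
# Sketch of Rule 1 from Table 4.6; field names are hypothetical.
def rule_1(sample):
    return (sample["crude_oil_pct"] > 10.50
            and sample["resin"] > 17
            and sample["api"] <= 9.26)

sample = {"crude_oil_pct": 25.0, "resin": 19.0, "api": 8.5}
if rule_1(sample):
    print("Predicted class: CL5 (inhibition 0.98 - 0.999)")
```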



CHAPTER 5. RESULTS, COMPARISON AND DISCUSSION

5. 1 ACCURACY OF SELECTED MODEL

As discussed earlier, the most important task in neural network modeling is establishing a network architecture which accurately mimics the data patterns. It is evident from Table 4.1 that the 11-6-6-1 MLP network has the best prediction accuracy on the training data. The Mean Square Error (MSE), computed from the differences between the network output and the desired output, is an indirect measure of the performance of the model. Table 5.1 shows the accuracy and the MSE values for the selected model.

Table 5.1 Prediction Accuracy of the 11-6-6-1 MLP Network.

Network Model Architecture           Accuracy      Mean Square Error (MSE)
11-6-6-1 MLP Neural Network Model    R = 97.60%    0.0026
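For reference, the MSE reported in Table 5.1 is the standard mean of squared residuals over the N training samples:

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```

where y_i is the desired output and ŷ_i the network output for sample i.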

Figure 5.1 shows the model performance on training vs. test data.

Figure 5.1 Model Performance on Training vs. Test Data.

Another important parameter to be considered is the time (number of runs/epochs) required for the model to reach the minimum MSE value. In this case the selected model converged to the minimum MSE value in the very first run, after only 3500 epochs.

5. 2 SENSITIVITY ANALYSIS RESULTS

The initial sensitivity runs on the selected model (Figure 4.1) revealed that the variables having the greatest influence on the output response were Crude Oil percentage, Ni Content and API gravity. The separate sensitivity analyses (Figure 4.2 to Figure 4.12) further explain the effect of each variable on the final output. An increase in % Crude Oil causes an increase in the inhibiting capacity of the oil (see Figure 4.2). An increase in the Nickel content has a detrimental effect, reducing the inhibiting capacity of the oil (see Figure 4.3). API (Figure 4.4) and total Nitrogen (Figure 4.5) tend to increase the inhibition rate as their respective contents increase. Vanadium (Figure 4.6) and Total Acid Number (Figure 4.8) resulted in an increase of the inhibiting capacity; however, the effect is very small, as can be seen from the values on the y-axis. The S % content in the range tested (Figure 4.7) was shown to decrease the inhibiting capacity. In regards to the SARA components of the crude oil, none showed a significant effect; however, saturates (Figure 4.9) were shown to decrease the inhibiting capacity as their content increases, contrary to aromatics (Figure 4.10), resins (Figure 4.11) and asphaltenes (Figure 4.12), which increase inhibition as their content increases.

The cumulative sensitivity graph (Figure 4.13) constructed by averaging the

sensitivity values from six different test runs was also consistent with the initial

sensitivity analysis and indicated that Crude Oil percentage, Ni Content and API gravity

were the most important factors that affected the inhibition rate.
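The sensitivity values above were produced within NeuroSolutions; conceptually, sensitivity about the mean can be approximated by perturbing one input at a time while the others are held at their means, as in this sketch (an illustration under stated assumptions, not the package's exact procedure; `predict` is a hypothetical stand-in for the trained 11-6-6-1 network):

```python
import numpy as np

# One-at-a-time sensitivity about the mean for a trained model.
def sensitivity_about_mean(predict, X, delta=0.1):
    mean, std = X.mean(axis=0), X.std(axis=0)
    sens = []
    for i in range(X.shape[1]):
        lo, hi = mean.copy(), mean.copy()
        lo[i] -= delta * std[i]   # nudge input i below its mean
        hi[i] += delta * std[i]   # nudge input i above its mean
        sens.append((predict(hi) - predict(lo)) / (2 * delta * std[i]))
    return np.array(sens)

X = np.random.rand(200, 11)                  # 11 inputs, as in this thesis
predict = lambda x: 0.5 * x[0] - 0.2 * x[1]  # toy stand-in model
print(sensitivity_about_mean(predict, X).round(3))
```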

The tendency of % crude oil vs. inhibition was clear in both the data and the model: an increase in crude oil content increases the degree of corrosion protection provided by the crude oil. With API gravity, even though the data are scattered, the model predicts an increase in inhibition as API increases, implying that lighter crude oils provide higher inhibiting capacities.

In order to see if this effect was repeatable, separate sensitivity analyses were performed for the various crude oil contents evaluated: 1%, 20%, 50% and 80%.

• For 1% crude oil (see Figure 4.14) the model tends to predict a higher inhibiting capacity than the real measured values, but the R (model accuracy) value is still considerably high, 0.96. Nickel appears to be the most significant variable, followed by API, sulfur content, TAN and asphaltenes. All but Nickel increase the inhibiting capacity as their concentration increases.

• For 20% crude oil (Figure 4.15, R=0.98) Nickel is not as critical, and the most influential variables are API, total Nitrogen, resins and TAN. Saturates seem to have a detrimental effect.

• For 50% crude oil (Figure 4.16, R=0.96) the four variables with the highest sensitivity are Nickel, Vanadium, aromatics and sulfur. Nickel and aromatics decrease the value of the inhibiting capacity as their content increases. If only positive effects are considered, then V, S %, asphaltenes and resins show the highest influence.

• For 80% crude oil (Figure 4.17, R=0.98) Nickel and Vanadium showed the highest sensitivities, in both cases producing a decrease in the inhibiting capacity as their content increases. Asphaltenes follow, and then aromatics, the latter also having an inverse relationship. Note that the sensitivity values are much higher for the first two cases.

An interesting result from the model is that it was able to point out notably different behaviors when the crude oil concentration changes. By putting together the data for low concentrations (1 and 20%, see Figure 4.18) and the data for higher concentrations (50 and 80%, see Figure 4.19) and looking at the sensitivities, it can be concluded that at higher concentrations the presence of crude oil has the greatest influence on the output and the effects of the other variables are not as significant. At low crude oil concentrations the sensitivities are much higher (up to 0.8), indicating that inhibition is related not so much to the amount of crude oil as to the presence of oil or a combination of two or more variables.

From the interrelationship graphs (see Figure 4.20) we can clearly see that over the [1% - 20%] crude oil range the Nickel content of the oil tends to increase, while over the [20% - 80%] range the Nickel content decreases, depicting an inverse relationship between the two variables. Similarly, Figure 4.21 shows a linear relationship between crude oil and TAN for crude oil in the [20% - 80%] range.

5. 3 NETWORK INTERPRETATION DIAGRAM (NID) RESULTS

The Network Interpretation Diagram results reiterate the fact that certain variables have a positive effect on the output response whereas others tend to have a negative effect. From Figure 4.24 we can see that thick continuous lines representing a positive excitatory signal originate from % Crude Oil, API and total Nitrogen content, whereas thin continuous lines originate from Vanadium and TAN. These results are consistent with the patterns observed in the sensitivity analysis, showing that all the above mentioned variables positively affect the inhibition rate, i.e. an increase in these variables tends to increase the inhibition rate. In contrast, variables such as Nickel, S % and Saturates generate thick dashed lines, which represent a negative or detrimental effect on the output response. The NID thus provided a clear visual representation of the magnitude and direction of the synaptic weights.
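A diagram of this kind can be reproduced for any small network by drawing each connection with thickness proportional to |w| and line style keyed to the weight's sign. A minimal matplotlib sketch with hypothetical weights (layout only; not the figure-generation code used in this thesis):

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy NID for one weight layer: solid lines for positive (excitatory)
# weights, dashed for negative (inhibitory), thickness ~ |weight|.
W = np.random.default_rng(0).normal(size=(11, 6))  # hypothetical weights

fig, ax = plt.subplots()
for i in range(W.shape[0]):        # input units at x = 0
    for j in range(W.shape[1]):    # hidden units at x = 1
        w = W[i, j]
        ax.plot([0, 1], [i, j * 2],
                linestyle="-" if w > 0 else "--",
                linewidth=2 * abs(w), color="black")
ax.set_xticks([0, 1])
ax.set_xticklabels(["inputs", "hidden"])
plt.show()
```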

5. 4 RESULTS OF GARSON’S ALGORITHM

The main idea behind implementing Garson’s algorithm was to partition the relative share of the prediction associated with each input variable and determine whether any of the input variables could be eliminated from further analysis. From Figure 4.25 we can confidently conclude that % Crude Oil is by far the most influential factor affecting the inhibition rate. The relative partitioning also revealed that, apart from Crude Oil and Nickel, all the input variables have a similar effect on the output.
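For a single-hidden-layer network, Garson's partitioning [49] reduces to normalized products of absolute weights. The sketch below illustrates the idea on a simplified 11-6-1 architecture with random stand-in weights (the thesis's network has two hidden layers, so this is an illustration rather than the exact computation performed):

```python
import numpy as np

# Garson's algorithm for a single-hidden-layer network: the relative
# share of each input, from |input-hidden| x |hidden-output| weights.
def garson(W_ih, w_ho):
    contrib = np.abs(W_ih) * np.abs(w_ho)     # (inputs x hidden)
    contrib /= np.abs(W_ih).sum(axis=0)       # share within each hidden unit
    importance = contrib.sum(axis=1)          # sum over hidden units
    return importance / importance.sum()      # normalize: shares sum to 1

rng = np.random.default_rng(1)
W_ih = rng.normal(size=(11, 6))  # hypothetical input-to-hidden weights
w_ho = rng.normal(size=6)        # hypothetical hidden-to-output weights
print(garson(W_ih, w_ho).round(3))
```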

5. 5 RESULTS OF TREPAN ALGORITHM

The TREPAN algorithm was applied to both the EVEN and UNEVEN class data sets, and two different decision trees, Trepan_tree and Disjunctive_Trepan_tree, were extracted. From the results shown in Table 4.5 the following inferences can be made:

• The classification accuracy on both training and test data was higher for the UNEVEN class data set. This is mainly because most of the data points had high inhibition rates, so dividing into uneven classes based on the frequency distribution of the data within particular ranges proved to be the better approach.

• The Disjunctive_Trepan_tree extracted from the UNEVEN data had slightly better prediction accuracy than the Trepan_tree extracted from the UNEVEN data set.

5. 5. 1 EFFICACY OF THE RULE EXTRACTION TASK

The efficacy of the rule extraction task can be tested along the following

dimensions:

• Comprehensibility and Expressive Power: The propositional rules generated from the Trepan decision tree were successfully able to provide class labels, giving explicit results of the extracted knowledge. The antecedent of each rule is a simple combination of conditions on the input variables. The number of input variables in the antecedent of each rule serves as an indirect measure of the comprehensibility and expressive power of the rule set.

• Table 5.2 gives us a quick overview of the number of input variables in a rule's antecedent (left column) and the number of rules having that many features in their antecedents (right column). From the table it is clear that 90% of the rules in the rule set have fewer than five input variables in their antecedents, clearly signifying the comprehensibility of the rule set. (Such counts can be computed mechanically from the rule text, as sketched after this list.)

Table 5.2 Number of Features in the Rule Antecedent for the NN-Rule Set

Number of features in the rule antecedent    Number of rules (Total: 20)
2                                            1
3                                            9
4                                            8
5                                            2

• Accuracy and Fidelity: The rule set was generated from the decision tree created by the TREPAN algorithm. Considering that TREPAN uses the network as an oracle to predict the class labels, the rule set accurately mimics the behavior of the trained neural network. Hence, we can conclude that the fidelity of the rule extraction process was extremely high.
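As referenced above, the antecedent-length counts of Table 5.2 can be computed mechanically from the rule text; a small sketch (only two of the twenty rules shown):

```python
from collections import Counter

# Count distinct input variables in each rule antecedent (cf. Table 5.2).
# Single-letter variable names (V, Ni) are omitted here because naive
# substring matching would misfire (e.g. "Ni" inside "Nitrogen").
VARIABLES = ["% Crude Oil", "Resin", "API", "Saturates", "Aromatics",
             "Asphaltenes", "Nitrogen", "TAN", "S %"]

def n_features(rule_text):
    return sum(1 for v in VARIABLES if v in rule_text)

rules = [
    "(% Crude Oil > 10.50) AND (Resin > 17) AND (API <= 9.26)",  # Rule 1
    "(% Crude Oil <= 10.50) AND (Nitrogen >= 6514.38)",          # Rule 17
]
print(Counter(n_features(r) for r in rules))  # Counter({3: 1, 2: 1})
```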



5. 6 COMPARISON METHODOLOGIES

5. 6. 1 STATISTICAL ANALYSIS

Statistical analysis of the data was performed using MINITAB. Multiple regression analysis was performed to derive a regression equation that would reveal any possible linear relationships between the input variables and the output. The regression equation is:

%Inhibition = 71.7 + 0.023 API + 0.000086 Total + 0.0576 TAN
              - 0.722 Saturates - 0.714 Aromatics - 0.700 Resins
              - 0.727 Asphaltenes + 0.000227 V - 0.0047 Ni
              + 0.00365 %Crude Oil                                (5.1)

Details for the multiple regression are given in Table 5.3.

Table 5.3 Results of Multiple Regression Analysis

Predictor Coef SE Coef T P


Constant 71.72 34.64 2.07 0.041
API 0.02311 0.00546 4.24 0
S(%) -0.06953 0.0946 -0.73 0.464
Total 0.00009 0.00006 1.35 0.181
TAN 0.05759 0.01581 3.64 0
Saturates -0.7223 0.3464 -2.09 0.039
Aromatics -0.7136 0.345 -2.07 0.041
Resins -0.7 0.3446 -2.03 0.045
Asphaltenes -0.7273 0.3386 -2.15 0.034
V 0.00023 0.00022 1.04 0.299
Ni -0.00472 0.00341 -1.38 0.169
%Crude oil 0.00365 0.0004 9.05 0

The Coef is the regression coefficient for a given variable, and SE Coef is the standard error of the coefficient. The t-value (T) is compared against the t-distribution to determine whether a predictor is significant; the larger the absolute value of the t-value, the more likely the predictor is significant. The p-value (P) is often used in hypothesis tests to help decide whether to reject or fail to reject a null hypothesis. The p-value is the probability of obtaining a test statistic that is at least as extreme as the actual calculated value, if the null hypothesis is true. The smaller the p-value, the smaller the probability that one would be making a mistake by rejecting the null hypothesis. A commonly used cut-off value for the p-value is 0.05: if the calculated p-value of a test statistic is less than 0.05, the null hypothesis is rejected.

The p-values for the estimated coefficients of API, TAN and % Crude Oil are 0.000, indicating that they are significantly related to % Inhibition. The p-values for V, Ni, Total and S % are >0.05, indicating that these are not related to Inhibition at an α-level of 0.05.

The R-Square value obtained was 55%, which is fairly low, suggesting that the relationship between the predictor and response variables is not linear. An R-Square value of 55% implies that only 55% of the variability in the output could be captured and explained by this linear model.
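An equivalent fit can be reproduced with standard tooling. A sketch using statsmodels (the thesis used MINITAB; the data below are synthetic stand-ins, since the field data are not reproduced here):

```python
import numpy as np
import statsmodels.api as sm

# Multiple linear regression analogous to Section 5.6.1: X holds the
# 11 input variables, y the measured inhibition; both synthetic here.
rng = np.random.default_rng(2)
X = rng.random((120, 11))
y = 0.7 + (X @ rng.normal(size=11)) * 0.05 + rng.normal(scale=0.1, size=120)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(round(model.rsquared, 3))   # analogous to the 55% R-Square above
print(model.pvalues.round(3))     # per-coefficient p-values, cf. Table 5.3
```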

Many statistical tests and intervals are based on the assumption of normality.

Unfortunately, many real data sets are in fact not approximately normal. However, an

appropriate transformation of a data set can often yield a data set that does follow

approximately a normal distribution. This increases the applicability and usefulness of



statistical techniques based on the normality assumption. The BOX-COX transformation

[100] is a particularly useful family of transformations, applied to the response variable

(in our case inhibition). It is defined as:

T(Y) = (Y^λ − 1) / λ                                              (5.2)

where Y is the response variable and λ is the transformation parameter.
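In practice the transformation parameter is estimated from the data, e.g. by maximum likelihood. A sketch using SciPy (synthetic response values; note that Box-Cox requires strictly positive data, so an inhibition of exactly 0 would first need a small shift):

```python
import numpy as np
from scipy import stats

# Box-Cox transform of a (synthetic) response; lambda is chosen by
# maximum likelihood when not supplied.
y = np.random.default_rng(3).uniform(0.01, 0.999, size=120)
y_transformed, lam = stats.boxcox(y)
print(f"estimated lambda = {lam:.3f}")
```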

Similarly, the BOX-TIDWELL [100] method applies a power transformation to some of the input variables, e.g. (API)^2 or (Ni)^-0.5, to normalize the data. The BOX-COX and BOX-TIDWELL transformations were performed on the original regression model but the results did not improve, reinforcing the fact that the relationship between the predictor and response variables is not linear. Lastly, Stepwise Regression [100] was performed to consider reducing the model size by eliminating some of the input variables (within the scope of the analysis). Once again, there was no significant improvement in the R-Square value.

5. 6. 2 C4.5 DECISION TREE USING WEKA

It has been seen that neural networks are generally better at approximating complex relationships between continuous variables and their influence on the output. Rules extracted from decision trees based on network parameters such as weights and biases tend in some cases to be more accurate than those derived directly from the data by other machine learning methods, such as ID3 [23], C4.5 [25] or CART [102].

The Waikato Environment for Knowledge Analysis (WEKA) [101] Java software package provides a host of well documented data structures, classes and tools for the development of machine learning schemes. WEKA's J4.8 algorithm implements the C4.5 algorithm to extract decision trees. In order to compare against the results of TREPAN, the C4.5 algorithm was applied to the data. The results of the C4.5 algorithm applied to the EVEN and UNEVEN classified data sets are shown in Table 5.4.

Table 5.4 Results of C4.5 Algorithm.

                 C4.5 Decision Tree
                 Training Data   Test Data
EVEN Classes        72.96%        60.45%
UNEVEN Classes      74.31%        62.70%

From the results shown in Table 5.4 we can see that the data with the UNEVEN classification ranges had higher prediction accuracy on both the training and the test data.
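For readers without a WEKA installation, a comparable (though not identical) baseline can be sketched with scikit-learn, which implements CART rather than C4.5; the split criterion and pruning differ, and the data below are synthetic stand-ins for the thesis's 11-input, 5-class data sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Decision-tree baseline loosely analogous to the C4.5/J4.8 runs.
rng = np.random.default_rng(4)
X = rng.random((290, 11))
y = rng.integers(0, 5, size=290)  # classes CL1..CL5 encoded as 0..4

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2)
tree.fit(X_tr, y_tr)
print(f"train accuracy: {tree.score(X_tr, y_tr):.2%}")
print(f"test accuracy:  {tree.score(X_te, y_te):.2%}")
```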

Table 5.5 gives us a comparative summary of the prediction accuracy of the

different methodologies employed in this thesis.



Table 5.5 Comparative Summary

Prediction Methodology Employed             Prediction Accuracy on Test Data
Neural Network Model (MLP) 11-6-6-1                     96.70%
Disjunctive_Trepan_tree (Uneven Classes)                71.80%
C4.5 Decision Tree (Uneven Classes)                     62.70%
Multiple Regression Analysis                            55.00%

From the table it is evident that the Neural Network model clearly outperforms the traditional Multiple Regression Analysis, providing a model that captures the data patterns and the interrelationships between the predictor variables and their effect on the response. Again, the rule set extracted from the Disjunctive_Trepan decision tree, based on network parameters such as synaptic weights and biases, has higher prediction accuracy than the C4.5 decision tree derived directly from the data values.

CHAPTER 6. CONCLUSION AND FUTURE RESEARCH

6. 1 CONCLUSIONS

The main aim was to develop a model that would achieve high predictive accuracy even with a limited amount of data. Another important driving factor for the research was to come up with a robust model which could handle noisy data and still be able to explain the complex relationships between the various constituents of the oil and provide knowledge based on patterns in the data. This thesis covered several aspects of a knowledge based approach for predicting the corrosion rate based on the constituents of the oil. Sensitivity Analysis was clearly able to explain the interrelationships between the input variables. The analysis revealed that variables such as crude oil, API, Nickel and total Nitrogen content were the most important factors affecting the inhibition rate. These results were in accordance with the experimental results obtained by Hernández et al. [13]. Further analysis using the NID and Garson's algorithm determined the relative share of importance of each input variable in the predicted response. Finally, TREPAN presented a rule set extracted from the decision tree, which was able to accurately mimic the neural network in classifying the patterns in the rule space. The efficacy evaluation of the rule set showed that the rules were simple and comprehensible, with a small number of features in each rule antecedent.

A comparative study was also undertaken to test the performance of the neural network based approach against statistical analysis and traditional data mining methods. The encouraging results showed that the neural network based model coupled with rule extraction techniques accomplished better results.

In summary, the research successfully developed a rule based approach using artificial neural networks for predicting the corrosion rate.

6. 2 FUTURE RESEARCH

There are various directions that can effectively channel future research in this field.

Use of Interpolation Techniques for Data Farming:

The neural network's performance faces challenges such as the variability of corrosion behavior and the unpredictable behavior of the network model in regions of the problem domain where no data are available. Neural networks cannot produce reliable predictions for input conditions that are outside the ranges of the data used to train them. Standard interpolation techniques can be successfully applied to one or two variables, but in the case of corrosion the number of variables is significant (11 variables in this research). There is a need to develop interpolation methods that can generate data points while accommodating a large number of input variables and still maintaining the interrelationships between variables.



Knowledge Based Neurocomputing

An important consideration for using neural networks in practical applications is the time and cost of training the neural network. The training regimen can be significantly improved by introducing knowledge-primed or hint-based training strategies. Such techniques can map the available domain knowledge onto the basic architecture of the neural model to reduce the training effort.

Regression Trees

Most rule extraction methods depend on decision trees, which are primarily models learned in classification domains. Formulating a classification problem from the original continuous data introduces significant noise into the learning data set. For regression tasks it would be beneficial for the extracted model to be a representation of the actual regression surface of the network. A proposed research direction is developing a regression tree whose leaves are characterized by real valued functions rather than predicted classes.

Development of a Hybrid Mechanistic Model

The current research focuses on the extraction of rules from a trained neural network for corrosion prediction. For practical application, this technique needs to be readily available in the form of equations that can be used for calculations. There is a need to develop a hybrid mechanistic model which represents the complex physicochemical processes through a set of equations. It would express both the existing knowledge (theoretical knowledge based on chemical analysis) and the knowledge extracted in the form of rules from the neural network.



REFERENCES

[1]. C. de Waard, D. E. Milliams, “Carbonic Acid Corrosion of Steel,” Corrosion


31(5) 1975, p.131-177.
[2]. L. G. S. Gray, B. G. Anderson, M. J. Danysh, P. R. Tremaine, “Mechanism of
carbon steel corrosion in brines containing dissolved carbon dioxide at pH4,”
Corrosion/89, Paper no. 464, Houston, TX, NACE International, 1989.
[3]. S. Nesic, J. Postlethwaite, S. Olsen, “An electrochemical model for prediction of
CO2 corrosion,” Corrosion/95, Paper no. 131, Houston, TX, NACE
International, 1995.
[4]. C. de Waard, D. E. Milliams, “Predication of carbonic acid corrosion in natural
gas pipelines,” First International Conference on the Internal and External
Corrosion of Pipes, Paper FL, University of Durham, England, 1975.
[5]. G. Schmitt, B. Rothman, “Werkstoffe and Korrosion 28,” 1977, p. 816.
[6]. L. G. S. Gray, B. G. Anderson, M. J. Danysh, P. R. Tremaine, “Effect of pH and
temperature on the mechanism of carbon steel corrosion by aqueous carbon
dioxide,” Corrosion/90, Paper no. 40, Houston , TX, NACE International, 1990.
[7]. C. de Waard, U. Lotz, “Prediction of CO2 corrosion of carbon steel,”
Corrosion/93, Paper no. 69, Houston, TX, NACE International, 1993.
[8]. A. Dugstad, L. Lunde, K. Videm, “Parametric study of CO2 corrosion of carbon
steel,” Corrosion/94, Paper no. 14, Houston, TX, NACE International, 1994.
[9]. European Federation of Corrosion, “CO2 corrosion control in oil and gas
production, A Working Party Report,” Publication 23, The Institute of Materials,
London, England, 1997.
[10]. European Federation of Corrosion, “CO2 corrosion control in oil and gas
industry, A Working Party Report,” Publication 13, The Institute of Materials,
London, England, 1994.
[11]. K. D. Efird, “Preventive Corrosion Engineering in Crude Oil Production,”
Offshore Technology Conference (OTC), Paper no. 6599, 1991.
[12]. J. S. Smart, “Wettability – a Major Factor in Oils and Gas System Corrosion,”
NACE Corrosion/93, Paper no. 70, 1993.
[13]. S. Hernández, S. Duplat, J. R. Vera, E. Barón, “A statistical approach for
analyzing the inhibiting effects of different types of crude oil in CO2 corrosion
of carbon steel.” Corrosion/2002, Paper no. 02293. National Association of
Corrosion Engineers, 2002.

[14]. R. A. Cottis, L. Qing, G. Owen, S. J. Gartland, I.A. Helliwell, M. Turega,


“Neural networks for corrosion data reduction,” Materials and Design, Vol. 20,
1999.
[15]. Bonissone, P. P., “Soft computing: the convergence of emerging reasoning
technologies,” Soft Computing, 1(1), 1997, p. 6-18.
[16]. Zadeh, L. A., “Fuzzy logic: issues, contentions and perspectives,” IEEE
International Conference on Acoustics, Speech, and Signal Processing, Vol. 4,
1994, p. 19-22.
[17]. Dote, Y., Ovaska, S. J., “Industrial applications of soft computing: A review,”
Proceedings of the IEEE, 89(9), 2001, p. 1243-1265.
[18]. Holland, J. H., “Adaptation in natural and artificial systems,” University of
Michigan Press, Ann Arbor, MI, 1975.
[19]. Gen, M., Cheng, R., “Genetic algorithms and engineering optimization,” John
Wiley & Sons, Inc., New York, NY, 2000.
[20]. Dietterich, T. G., “Machine learning. Annual Review of Computer Science,”
Vol. 4, 1990, p. 255-306.
[21]. Mitchell, T., “Machine learning. 1st edition. Computer Science Series,” WCB
McGraw-Hill, Boston, MA, 1997.
[22]. Fayyad, U., Piatetsky S. G., Smyth, P., “From data mining to knowledge
discovery in databases,” AI Magazine, 17(3), 1996.
[23]. Quinlan, J. R., “Induction of decision trees. Machine Learning,” Vol. 1, 1986, p.
81-106.
[24]. Craven, M., “Extracting comprehensible models from trained neural networks,”
Ph.D. dissertation, University of Wisconsin, Madison, WI, 1996.
[25]. Quinlan, J. R., “C4.5: Programs in machine learning,” Morgan Kaufmann, San
Mateo, CA, 1993.
[26]. Han, J., Fu, Y., “Exploration of the power of attribute-oriented induction in data
mining in advances in knowledge discovery and data mining,” AAAI, MIT
Press, Cambridge, MA, 1996, p. 399-421.
[27]. Han, J., Cai, Y., Cercone, N., “Knowledge discovery in databases: An attribute-
oriented approach,” Proceedings of 1992 International Conference on Very
Large Data Bases (VLDB'92), Vancouver, Canada, 1992, p. 547-559.
[28]. Han, J., Cai, Y., Cercone, N., Huang, Y., “Discovery of data evolution
regularities in large databases,” Journal of Computer and Software Engineering,
1994, p. 1-29.

[29]. Efraim, T., Jay E. A., Liang T. P., McCarthy, R. V., “Decision support systems
and intelligent systems,” Prentice Hall, Upper Saddle River, NJ, 2001.
[30]. Principe, J. C., Euliano, E. R., Lefebvre, W. C., “Neural and adaptive systems:
Fundamentals through simulations with cd-rom,” John Wiley & Sons, Inc., New
York, NY, 1999.
[31]. Reed, R. D., Marks, R. J., “Neural smithing: Supervised learning in feedforward
artificial neural networks,” MIT Press, Cambridge, MA, 1998.
[32]. Rumelhart, D. E., Hinton, G. E., Williams, R. J., “Learning representations by back-propagating errors,” Nature, Vol. 323, 1986, p. 533-536.
[33]. Hinton, G. E., “Connectionist learning procedures,” Artificial Intelligence,
40(1-3), 1989, p. 185-234.
[34]. Whitley, D., Starkweather, T., Bogart, C., “Genetic algorithms and neural
networks: Optimizing connections and connectivity,” Parallel Computing, 14(3),
1990, p. 347-361.
[35]. Engel, J., “Teaching feed-forward neural networks by simulated annealing,”
Complex Systems, 2(6), 1988, p. 641-648.
[36]. Vapnik, V. N., “The nature of statistical learning theory,” Springer-Verlag, NY,
1995.
[37]. Dimopoulos, Y., Bourret, P., Lek, S., “Use of some sensitivity criteria for
choosing networks with good generalization ability,” Neural Processing Letters
Vol. 2, 1995, p. 1-4.
[38]. Dimopoulos, I., Chronopoulos, J., Chronopoulou Sereli, A., Lek, S., “Neural
network models to study relationships between lead concentration in grasses and
permanent urban descriptors in Athens city (Greece).” Ecological Modelling,
Paper no. 120, 1999, p. 157-165.
[39]. Scardi, M., Harding, L.W., “Developing an empirical model of phytoplankton
primary production: a neural network case study,” Ecological Modelling, 120
(2-3), 1999, p. 213-223.
[40]. Yao, J., Teng, N., Poh, H.L., Tan, C.L., “Forecasting and analysis of marketing
data using neural networks,” Journal of Information Science and Engineering
14, 1998, p. 843-862.
[41]. Lek, S., Belaud, A., Dimopoulos, I., Lauga, J., Moreau, J., “Improved
estimation, using neural networks, of the food consumption of fish populations,”
Marine Freshwater Research, 46, 1995, p. 1229-1236.
[42]. Lek, S., Belaud, A., Baran, P., Dimopoulos, I., Delacoste, M., “Role of some
environmental variables in trout abundance models using neural networks,”
Aquatic Living Resources, 9, 1996a, p. 23-29.

[43]. Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S.,
“Application of neural networks to modelling nonlinear relationships in
ecology,” Ecological Modelling, 90, 1996b, p. 39-52.
[44]. Mastrorillo, S., Lek, S., Dauba, F., “Predicting the abundance of minnow
Phoxinus phoxinus (Cyprinidae) in the River Ariege (France) using artificial
neural networks,” Aquat. Living Resour, 10, 1997a, p. 169–176.
[45]. Mastrorillo, S., Dauba, F., Oberdorff, T., Gue´gan, J.F., Lek, S., “Predicting
local fish species richness in the Garonne River basin,” C.R. Acad. Sci, Sciences
de la vie Paris, 321, 1998, p. 423–428.
[46]. Lek-Ang, S., Deharveng, L., Lek, S., “Predictive models of collembolan
diversity and abundance in a riparian habitat,” Ecological Modelling, 120, 1999,
p. 247–260.
[47]. Spitz, F., Lek, S., “Environmental impact prediction using neural network
modeling. An example in wildlife damage.” Journal of applied ecology, 36,
1999, p. 317–326.
[48]. Olden, J.D., “An artificial neural network approach for studying phytoplankton
succession.” Hydrobiology, 436, 2000, p. 131–143.
[49]. Garson, G.D., “Interpreting neural network connection weights,” Artificial
Intelligence Expert, 6, 1991, p. 47-51.
[50]. Goh, A.T.C., “Back-propagation neural networks for modeling complex
systems,” Artificial Intelligence in Engineering, 9, 1995, p. 143-151.
[51]. Aoki, I., Komatsu, T., “Analysis and prediction of the fluctuation of sardine
abundance using a neural network,” Oceanol. Acta, 20, 1999, p. 81–88.
[52]. Chen, D.G., Ware, D.M., “A neural network model for forecasting fish stock
recruitment,” Can. J. Fish, Aquat. Sci, Vol. 56, 1999, p. 2385–2396.
[53]. Özesmi, S. L., U. Özesmi, “An artificial neural network approach to spatial
habitat modelling with interspecific interaction,” Ecological Modelling, 116,
1999, p. 15–31.
[54]. Tickle, A., Andrews, R., Golea, M., Diederich, J., “The truth will come to light:
Directions and challenges in extracting the knowledge embedded within trained
artificial neural networks,” IEEE Transactions on Neural Networks, 9(6), 1998,
p. 1057-1068.
[55]. Craven, M., Shavlik, J., “Rule extraction: where do we go from here?”
University of Wisconsin Machine Learning Research Group working paper, 99-
1, 1999.
[56]. Gallant, S. I., “Connectionist expert systems,” Communications of the ACM, 31,
1988, p. 152-169.

[57]. Andrews, R., Diederich, J., Tickle, A. B., “Survey and critique of techniques for
extracting rules from trained artificial neural networks,” Knowledge Based
Systems, 8, 1995, p. 373-389.
[58]. Fu, L. M., “Rule learning by searching on adapted nets,” Proceedings of the
Ninth National Conference on Artificial Intelligence, AAAI Press, Anaheim,
CA, 1991, p. 590-595.
[59]. Towell, G. G., Shavlik, J. W., “Extracting refined rules from knowledge-based
neural networks,” Machine Learning, 13, 1993, p. 71-101.
[60]. Setiono, R., “Extracting rules from neural networks by pruning and hidden-unit
splitting,” Neural Computation, 9, 1997, p. 205-225.
[61]. Andrews, R., Geva, S., “Rule extraction from a constrained error back
propagation MLP,” Proceedings of Fifth Australian Conference on Neural
Networks, Brisbane, Queensland, 1994, p. 9-12.
[62]. Saito, K., Nakano, R., “Rule extraction from facts and neural networks,”
Proceedings of the International Neural Network Conference, San Diego, CA,
1990, p. 379-382.
[63]. Craven, M. W., Shavlik, J. W., “Using sampling and queries to extract rules
from trained neural networks,” Proceedings of the Eleventh International
Conference on Machine Learning, Morgan Kaufmann, New Brunswick, NJ,
1994, p. 37-45.
[64]. Thrun, S. B., “Extracting provably correct rules from artificial neural networks,”
Technical Report IAI-TR-93-5, University of Bonn, Bonn, Germany, 1993.
[65]. Pop, E., “RULENEG: Rule-extraction from neural networks by step-wise negation,” Technical report, Queensland University of Technology, Neurocomputing Research Center, 1994.
[66]. Craven, M.W., Shavlik, J. W., “Extracting tree-structured representations of
trained networks,” Advances in Neural Information Processing, Vol. 8, 1996,
p. 24-30.
[67]. Schmitz, G. P. J., Aldrich, C., Gouws, F. S., “ANN-DT: An algorithm for
extraction of decision trees from artificial neural networks,” IEEE Transactions
on Neural Networks, Vol. 10(6), 1999, p. 1392-1401.
[68]. Boz, O., “Converting a trained neural network to a decision tree,” Proceedings
of the 2002 International Conference on Machine Learning and Applications –
ICMLA., Las Vegas, NE, CSREA Press, 2002, p. 110-116.

[69]. Sestito, S., Dillon, T. “Automated knowledge acquisition of rules with


continuously valued attributes,” Proceedings of the Twelfth International
Conference on Expert Systems and their Application, Avignon, France, 1992,
p. 645-656.
[70]. Keedwell, E., Narayanan, A., Savic, D. A., “Using genetic algorithms to extract
rules from trained neural networks,” Proceedings of the Genetic and
Evolutionary Computing Conference, Orlando, FL, Morgan Kaufmann, 1999,
p. 793.
[71]. Maire, F., “The convergence of validity interval analysis.” IEEE Transactions on
Neural Networks, 11(3), 2000, p. 802-807.
[72]. Nesic, S., Thevenot, N., Crolet, J.L., “Electrochemical properties of iron
dissolution in CO2 solutions – basic revisited,” Corrosion /96, Paper No.3,
Houston, TX, NACE International, 1996.
[73]. Videm, K., “Fundamental studies aimed at improving models for prediction of
CO2 corrosion,” Progress in the Understanding and Prevention of Corrosion,
Proceedings of the 10th European Corrosion Congress, Vol. 1. Institute of
Metals, London, 1993, p. 513.
[74]. C. de Waard, U. Lotz, A.Dugstad, “Influence of liquid flow velocity on CO2
corrosion: a semi-empirical model,” NACE Corrosion/95, Paper no. 128, 1995.
[75]. Pots, C. D., Garber, J. D., Walters, F. H., Singh, C., “Verification of computer
modeled tubing life predictions by field data,” NACE International,
Corrosion/93, Paper no. 82, Houston, TX, 1993.
[76]. Pots, B. F. M., “Mechanistic models for prediction of CO2 corrosion rates under
multi-phase flow conditions,” NACE International, Corrosion /95, Houston, TX,
Paper no. 137, 1995.
[77]. Turgoose, S., Cottis, R. A., Lawson, K., “Modeling of electrode processes and
surface chemistry in carbon dioxide containing solutions,” ASTM Symposium
on Computer Modelling of Corrosion, San Antonio, TX, 1990.
[78]. Achour, M.H., Kolts, J., Johannes, A.H., Liu, G., “Mechanistic modeling of pit
propagation in CO2 environment under high turbulence effects,” NACE
International, Corrosion/93, Houston, TX, Paper no. 87, 1993.
[79]. Dayalan, E., Vani, G., Shadley, J. R, Shirazi, S. A., Rybicki, E. F., “Modelling
CO2 corrosion of carbon steel in pipe flow,” NACE International, Corrosion/95
Houston, TX, Paper no. 118, 1995.
[80]. Kanwar, S., Jepson, W. P., “A model to predict sweet corrosion of multiphase
flow in horizontal pipelines,” NACE International, Corrosion /94, Houston, TX,
Paper no. 24, 1994.

[81]. Provan, J. W., Rodriquez, E. S. III, "Part I: Development of a Markov


description of pitting corrosion,” Corrosion, Vol. 45, n 3, 1989, p. 178-192.
[82]. Sheikh, A. K., Boah, J. K., Hansen, D. A., “Statistical modeling of pitting
corrosion and pipeline reliability,” Corrosion, Vol. 46, n 3, 1990, p. 190-197.
[83]. C. Mendez, S. Duplat, S. Hernandez, J. Vera, “On The Mechanism Of Corrosion
Inhibition By Crude Oils,” NACE International, Corrosion/2001, Houston, TX,
Paper no. 01044, 2001.
[84]. S. Hernandez, J. Bruzual, F. Lopez-Linares, J. Luzon, “Isolation Of Potential
Corrosion Inhibiting Compounds in Crude Oils,” NACE International,
Corrosion/2003, Houston, TX, Paper No. 03330, 2003.
[85]. Smets, H. M. G., Bogaerts, W. F. L., “Neural network prediction of stress-
corrosion cracking,” Mater Perform, Vol. 31, 1992, p. 64-67.
[86]. Urquidi-Macdonald, M., Eiden, M. N., Macdonald D. D, “Development of a
neural network model for predicting damage functions for pitting corrosion in
condensing heat exchangers,” Modifications of Passive Films, Paris,1993,
p. 336-343.
[87]. Ben-Hain M., Macdonald D. D, “Modeling geological brines in salt-dome high
level nuclear waste isolation repositories by artificial neural networks,”
Corrosion Science., Vol. 36, 1994, p. 385-393.
[88]. Silverman D. C, Rosen E. M., “Corrosion prediction from polarization scans
using an artificial neural network integrated with an expert system,”
Corrosion/92, Vol. 48, 1992, p. 734 -745.
[89]. Silverman D. C., “Artificial neural network predictions of degradation of non-
metallic lining materials from laboratory tests,” Corrosion/94, Vol. 50, 1994, p.
411-418.
[90]. Trasatti S. P., Mazza F., “Crevice corrosion: a neural network approach,”
Corrosion Journal, Vol. 31, 1996, p. 105-112.
[91]. Nesic S., Vrhovac M., “Neural network model for CO2 corrosion of carbon
steel,” Journal Of Corrosion Science and Engineering, Vol. 1, paper 6, 1998.
[92]. R. A. Bailey, S. Jayanti, R. M. Pidaparti, M. J. Palakal, “Corrosion prediction in
aging aircraft materials using neural networks,” Proc. of the 41st
AIAA/ASME/ASCE/ACS Conference on Structures and Structural Dynamics
and Materials, April 2000.
[93]. M. Bucolo, L. Fortuna, M. Nelke, A. Rizzo, T. Sciacca, “Prediction Model for
the Corrosion Phenomena in Pulp & Paper Plant,” Control Engineering Practice,
Vol. 10, Elsevier Science, 2002, p. 227-237.

[94]. Leifer, J., Mickalonis, J. I., “Prediction of Pitting Corrosion in Aqueous


Environments via Artificial Neural Network Analysis,” Proceedings of Artificial
Neural Networks in Engineering (ANNIE) 98, St. Louis, MO, November’98.
[95]. Leifer, J., Zapp, P. E., Mickalonis, J. I., “Predictive Models for the
Determination of Pitting Corrosion versus Inhibitor Concentrations and
Temperature for Radioactive Sludge in Carbon Steel Tanks,” The Journal of
Engineering and Science Corrosion, Vol. 55, no. 1, January 1999, p. 31 - 37.
[96]. Haque, M. E., Sudhakar, K. V., "Prediction of Corrosion-Fatigue behavior of DP
steel through Artificial Neural Network,” International Journal of Fatigue, Vol.
23, Issue 1, Elsevier Science, Ltd., January 2001, p. 1-4.
[97]. Pidaparti R. M., Jayanti S., Palakal M. J., “Residual Strength and Corrosion Rate
Predictions of Aging Aircraft Panels: Neural Network Study,” Journal of
Aircraft, No. 1, Vol. 39, January 2002, p. 175-180.
[98]. M. J. Palakal, R. M. Pidaparti, S. Rebbapragada, C. R. Jones, “Intelligent
Computational Methods for Corrosion Damage Assessment,” AIAA Journal,
Vol. 39, No. 10, 2001, p. 1936-1943.
[99]. Adams, C. D., Garber, J. D., Walters, F. H., Singh, C., “Verification of computer
modeled tubing life predictions by field data,” NACE International,
Corrosion/93, Houston, TX, Paper no. 82, 1993.
[100]. D. C. Montgomery, E. A Peck, G. Vining, “Introduction to Linear Regression
Analysis,” Third Edition, John Wiley & Sons, New York, NY, 2001.
[101]. Stephen R. Garner, “WEKA: The Waikato Environment for Knowledge
Analysis,” 1995.
[102]. Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J., “Classification and Regression Trees,” Chapman & Hall, New York, 1984.
[103]. http://www2.umist.ac.uk/corrosion/JCSE/Volume1/paper6/v1p6.html
[104]. http://www.cs.nyu.edu/web/Research/Theses/li_bin.pdf
[105]. http://www-ai.informatik.uni-dortmund.de/DOKUMENTE/klinkenberg_96a.pdf
[106]. http://online.awma.org/journal/pdfs/2003/4/dimopoulos.pdf
[107]. http://www.neurosolutions.com/products/ns/features.html
[108]. Olden, J. D., Jackson, D. A., “Illuminating the “black box”: a randomization
approach understanding variable contributions in artificial neural networks,”
Ecological Modelling, No. 154, 2002, p.135-150.
[109]. Olden, J.D., and Jackson, D. A., “Fish-habitat relationships in lakes: Gaining
predictive and explanatory insight by using artificial neural networks,”
Transactions of the American Fisheries Society, 130, 2001, p. 878-897.

[110]. http://www.ics.uci.edu/~mlearn/MLlist/v7/20.html
[111]. http://sti.srs.gov/fulltext/ms9800653/ms9800653.pdf
[112]. http://www.nd.com/genetics/selection.html
