Advanced Engineering Informatics 19 (2005) 299–315

www.elsevier.com/locate/aei

Monitoring bridge health using fuzzy case-based reasoning


Yousheng Cheng*, Hani G. Melhem
Department of Civil Engineering, Kansas State University, Manhattan KS 66506, USA

Abstract

Case-based reasoning (CBR), one of the artificial intelligence (AI) learning approaches, is drawing the attention of many researchers in
Civil Engineering. However, due to vagueness and uncertainties in knowledge representation, attribute description, and similarity measures
in CBR—especially when dealing with similarity assessment—it is difficult to find the cases from a case base which exactly match the query
case. Therefore, fuzzy theories have been incorporated into CBR allowing for more robust, flexible, and accurate models. In this study, two
fuzzy membership functions (trapezoidal and step-wise) and fuzzy numbers are used to measure the similarity between attribute values. They
are integrated into CBR to develop a model used to monitor highway bridge health. This model’s learning capabilities have been validated
using five different error-metrics, based on the cross-validation method. The code is implemented using the programming language C++,
and all the cases used for both training and testing are extracted from the electronic bridge database of the Kansas Department of
Transportation. It is shown from the experimental results that it is feasible to apply fuzzy case-based reasoning to monitor bridge health.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Fuzzy case-based reasoning; Fuzzy membership functions; Bridge health monitoring

* Corresponding author. E-mail addresses: ych6565@ksu.edu (Y. Cheng), melhem@ksu.edu (H.G. Melhem).
1474-0346/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.aei.2005.07.002

1. Introduction

By the mid 90’s, almost all transportation agencies in North America had fully operational bridge management systems (BMSs) for managing the bridge network under their jurisdiction. At the project level, a BMS assists decision-makers in determining the optimal cost-effective maintenance actions for a specific bridge, while at the network level such a decision is made for an entire network of bridges considering the budget constraints. Making these decisions without predicting the future condition of these bridges and without identifying their future funding needs restricts the benefits obtained from these BMSs. Therefore, bridge deterioration models have been introduced to predict the future condition of different bridge elements in order to better identify their future funding requirements. The American Association of State Highway and Transportation Officials (AASHTO) prescribed the bridge deterioration model as one of the minimum requirements of any BMS [2].

Recently, there has been an increasing interest in the application of artificial intelligence (AI) and machine learning (ML) approaches to bridge engineering and management due to their non-traditional heuristic problem-solving capabilities. Melhem and Cheng [13] used k-nearest-neighbor instance-based learning as well as inductive learning to predict the remaining service life of bridges subject to corrosion of deck reinforcement. Combinations of wrapper methods and decision trees were applied to predict the general deterioration of bridge decks [14]. The model tree and the regression tree were employed to forecast bridge corrosion [8]. Another widely used AI approach is case-based reasoning (CBR). CBR has been used in transforming the raw data of several domains into usable knowledge that has been applied to solve planning, prediction, classification, and design problems [5,25]. Some applications of CBR to bridge engineering practice are: the use of a multimedia approach to case-based structural design [12]; BRIDGER, a system for the conceptual design of cable-stayed bridges [20]; the use of a case-based approach for steel bridge fabrication errors [21]; a CBR system for modeling infrastructure deterioration [17]; and modeling bridge deterioration using CBR [18]. The common advantages of these approaches are their versatility and ease of updating. However, there are some vagueness and uncertainties in knowledge representation,
retrieval, similarity measure, and reference of cases in CBR. Therefore, fuzzy theory concepts are incorporated into CBR to overcome its drawbacks.

Melhem et al. [15] proposed the general concept of using fuzzy case-based reasoning in bridge management systems, which addresses two objectives: (a) monitoring the health of bridge decks and predicting their future deterioration, and (b) identifying the necessary actions for maintenance, rehabilitation, and replacement (MR&R). Portions of the proposed concept have been implemented (a few months ago) and will be presented by Cheng and Melhem [9]. This paper discusses the detailed implementation and results validation of the model developed for the first objective, i.e., monitoring/predicting the deterioration of bridge decks. Additional details on this task, and complete details and validation of the second task (evaluation of MR&R actions), are presented in full elsewhere [7].

2. Fuzzy set concept

Given a universe $U$ of objects, a conventional crisp subset $S$ of $U$ is commonly defined by specifying the objects of the universe that are members of $S$. An equivalent way of defining $S$ is to specify the characteristic function of $S$, $\mu_S: U \to \{0, 1\}$, where for all $x \in U$:

$$\mu_S(x) = 1, \quad x \in S \qquad (1)$$

$$\mu_S(x) = 0, \quad x \notin S \qquad (2)$$

For a fuzzy set, the idea of vagueness is introduced by assigning an indicator function that may take on values in the range 0–1. This is called the membership level of $x$ in $S$ and indicates the strength of the belief that $x$ lies within $S$. If $S$ represents a fuzzy set, then the membership level of any element $x \in S$ is denoted as $\mu_S(x)$, which satisfies:

$$0 \le \mu_S(x) \le 1 \qquad (3)$$

A particular value of the membership function can be expressed as:

$$\mu_S : x \to [0, 1] \qquad (4)$$

An example of a fuzzy set is the set of real numbers much larger than zero, which can be defined with a membership function as follows:

$$\mu_S(x) = \frac{x^2}{x^2 + 1}, \quad x \ge 0 \qquad (5)$$

$$\mu_S(x) = 0, \quad x < 0 \qquad (6)$$

According to the above equations, the larger $x$ is, the greater $\mu_S(x)$ is and the closer it is to 1. For example, for $x = 0.5$, $\mu_S(0.5) = 0.2$, and for $x = 20$, $\mu_S(20) = 0.998$. The strength of the belief, or the membership level, of 20 being ‘much larger than zero’ is greater than that of 0.5. This fuzzy membership function indicates how much larger than zero these real numbers are. Thus, the impetus behind the introduction of fuzzy set theories was to provide a means of defining categories that are inherently imprecise [4].

3. Concept of case-based reasoning

In CBR, knowledge is represented as cases. A case is a conceptualized piece of knowledge representing an experience [11]. A case usually consists of a problem description and its corresponding outcome/solution. Whether the case includes all these parts or not depends on the specific problem to be solved. A representative set of cases comprises a case library for a problem domain.

CBR involves the following four cyclical processes: (1) retrieving the most similar case(s), (2) reusing the solutions of the retrieved case(s), (3) revising the proposed solution if necessary, and (4) retaining the new solution as part of a new case [1]. The CBR system retrieves one or more similar cases from the case library when a new problem is encountered. A solution proposed by the most similar case(s) is reused or adapted to solve the new problem. The problem may be retained as a new case in the case library to update the knowledge of the CBR system.

Among the tasks, retrieving the most similar case(s) is the first and most crucial step in CBR since the subsequent steps would not take place without it. Retrieving the most similar case(s) involves evaluating the degrees of similarity between any two cases being compared. The approach commonly used to assess similarity is the distance function. However, imprecision and uncertainties are inherent in this distance function [6]. Therefore, fuzzy membership functions are integrated here into CBR to deal with imprecision in the similarity measure. When fuzzy theories are incorporated into CBR, the algorithm is usually called fuzzy case-based reasoning (FCBR).

4. Data sets

Data sets have a great effect on how well a machine learning technique performs, so the extraction of data from the database is a very important task. The data set used here was extracted from the electronic bridge database of the Kansas Department of Transportation (KDOT). This database is used by PONTIS [19]. Such computerized data processing is crucial for managing any large inventory of infrastructure assets. PONTIS uses the concept of Commonly-Recognized (CoRe) bridge elements and measures, among other things, the deterioration of a bridge using the health index of the entire bridge. The health index of the bridge is derived from the health indices of the individual CoRe elements, which are a function of the CoRe elements’ condition states and corresponding coefficients. These are explained in more detail in the following paragraphs.

4.1. Commonly-recognized (CoRe) elements

Most states have successfully used the CoRe bridge elements as the basis for data collection, performance measurement, resource allocation, and management decision support [23]. The commonality aspect of CoRe elements depends on definitions that are widely understood by transportation agencies and that are stable over time. In this study, the CoRe element ‘Bridge Deck’ is used as the basis for the investigation.

4.2. Condition states

The health index for CoRe elements is based on their condition states. The condition states, whose definitions are from the element-level inspection, reflect the levels of deterioration and the influence of deterioration on serviceability. The levels of deterioration of each CoRe element are generally described as follows [23]:

1. Protected. The element’s protective materials or systems are sound and functioning as intended to prevent deterioration of the element.
2. Exposed. The element’s protective materials or systems have partially or completely failed, leaving the element vulnerable to deterioration.
3. Attacked. The element is experiencing active attack by physical or chemical processes, but is not yet damaged.
4. Damaged. The element has lost important amounts of material, such that its serviceability is suspect.
5. Failed. The element no longer serves its intended function.

The above levels define the condition states and are denoted State 1 through State 5, respectively. The total quantity (number, square meters, etc.) of a particular element is allocated among the condition states based on the visual observations of the inspector during the bridge inspection. For instance, if 30% of the total area of a bridge deck has spalled concrete (‘Exposed’), then the inspector would indicate 30% in State 2, and 70% in State 1.

4.3. Health index

The health index concept is used for the computation of a single integral indicator of the structural health of the bridge. This indicator is expressed as a percentage value. This value may change from 0%, corresponding to the worst possible condition, to 100% in the best condition. The premise of the health index is that each bridge element (when new) has an initial asset value in the best condition state. An element may deteriorate to a lower condition state due to traffic or a bad environment, thereby reducing its asset value. After a repair, maintenance, or rehabilitation action, the condition of the element is improved and the corresponding asset value increases. Therefore, it is helpful to think of the condition of an element at a given time as a point along a continuous timeline from 100% in the best state to 0% in the worst state, even though element condition states are categorical. The element condition state distribution (among states 1 through 5) at any point in time can be obtained from field inspection. The health index of any specific element is the ratio of the sum of the current quantities in each condition state multiplied by a corresponding coefficient, over the initial total quantity of the element. It can be given by the following formula [19, Section 4.2]:

$$H_e = \frac{\sum_s k_s q_s}{\sum_s q_s} \times 100\% \qquad (7)$$

where $H_e$ is the health index of an individual element, $s$ denotes the index of the condition state, $q_s$ represents the element quantity in the $s$th condition state, and $k_s$ is the health index coefficient corresponding to the $s$th condition state. The health index coefficients of the condition states $\{k_s\}$ are fractional values calculated as follows:

$$k_s = \frac{n - s}{n - 1} \qquad (8)$$

in which $s = 1, 2, \ldots, n$, and $n$ represents the number of applicable condition states, which can be 3, 4, or 5 depending upon the type of CoRe element. The health index coefficients are given in Table 1 [19] according to Eq. (8). The health index of the entire bridge can be evaluated as follows:

$$H = \frac{\sum_e H_e Q_e W_e}{\sum_e Q_e W_e} \qquad (9)$$

where $e$ denotes the index of an element, $W_e$ is the weighting factor of that element, and $Q_e$ represents the total quantity of element $e$ on the bridge. $W_e$ is determined either by the element’s failure cost to the agency or as an empirically assigned value.

As presented by Shepard and Johnson [22], the health index of an entire bridge over time may be plotted as shown in Fig. 1, where the x-axis represents the time (years), and the y-axis denotes the health index (%). The health index of the entire bridge should decrease over time. It will be increased if major maintenance actions are undertaken at a certain point in time. The number of years it would take till the health index reverts to the same level as before the last remedial action is the extended life that this action has given to the bridge.

Table 1
The health index coefficients [19]

| No. of condition states | State 1 | State 2 | State 3 | State 4 | State 5 |
|---|---|---|---|---|---|
| 3 | 1.00 | 0.50 | 0.00 | | |
| 4 | 1.00 | 0.67 | 0.33 | 0.00 | |
| 5 | 1.00 | 0.75 | 0.50 | 0.25 | 0.00 |
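The health index computation of Eqs. (7)–(9) can be illustrated with a short sketch. It is written in Python for brevity and is not the authors' C++ code; the function names and the second element in the usage lines are hypothetical values chosen only for illustration.

```python
def condition_coefficients(n):
    """Eq. (8): k_s = (n - s) / (n - 1) for s = 1..n condition states."""
    return [(n - s) / (n - 1) for s in range(1, n + 1)]


def element_health_index(quantities):
    """Eq. (7): health index (in percent) of one element.

    quantities[s-1] is the quantity q_s observed in condition state s;
    the number of applicable states n is taken from the list length.
    """
    coefficients = condition_coefficients(len(quantities))
    weighted = sum(k * q for k, q in zip(coefficients, quantities))
    return 100.0 * weighted / sum(quantities)


def bridge_health_index(elements):
    """Eq. (9): elements is a list of (H_e, Q_e, W_e) tuples."""
    numerator = sum(h * q * w for h, q, w in elements)
    denominator = sum(q * w for _, q, w in elements)
    return numerator / denominator


# Deck example from Section 4.2: 70% of the area in State 1, 30% in State 2.
deck = element_health_index([70.0, 30.0, 0.0, 0.0, 0.0])
print(deck)  # 92.5

# Hypothetical second element with equal quantity and weight (illustrative only).
print(bridge_health_index([(deck, 100.0, 1.0), (50.0, 100.0, 1.0)]))
```

With five condition states the coefficients reproduce the last line of Table 1, so the deck in the 30%/70% example of Section 4.2 receives a health index of 92.5%.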

[Figure 1: plot of Health Index (%) versus Time (Years) for years 1 to 15, annotated ‘Increased Value’ and ‘Increased Life Due to Repair Actions’.]
Fig. 1. The health index of an entire bridge over 15 years.

The variation of the health index of bridge decks (or any other CoRe bridge element) over time as computed from actual inspection data is shown in Fig. 2. These curves denote the health indices for five typical bridge decks referred to (in the PONTIS database used in this study) as No. 27025, No. 27056, No. 27036, No. 44010, and No. 46102. The health index of a bridge deck either stays the same or decreases unless a repair action is undertaken. If a bridge is relatively new, its health index may not vary for quite a period of time. For instance, for bridge No. 27025, the health index of the deck remains at 100% up to the year 1998, decreases to 75% in 2000, then stays the same till 2002. For bridge No. 44010, the health index of the deck does not change from 1994 to 1996, goes down between 1996 and 1998, and then remains the same from 1998 to 2002. On the other hand, the deck of bridge No. 46102 had deteriorated to condition state 2 (State 2: Exposed, H = 75%) in 1994 and kept degrading to condition state 3 (State 3: Attacked, H = 50%) in 1996. After repair actions were undertaken, the index went back to 100% in 1998 and 2000, and then degraded to 75% in 2002.

[Figure 2: plot of Health Index (%) versus Inspection Time (1994 to 2002) for bridge decks ID 27025, ID 27056, ID 27036, ID 44010, and ID 46102.]
Fig. 2. The health index of bridge decks.

The difference between Figs. 1 and 2 is due to two reasons. First of all, Pontis bridge inspection data (for Kansas) is relatively recent and goes back only to 1993. The database used gives information on a maximum of 5 inspections while some instances have only two or three inspection records. As years go by and new inspections are performed and electronically recorded, more data points will be obtained and a longer history such as the one shown in Fig. 1 may be depicted. Secondly, the condition states (and the corresponding health index) of the bridge deck may remain the same for a certain number of years. However, the health index of the entire bridge may change because a bridge includes many other elements, and a variation in the condition states of any of them will affect the entire bridge health index according to Eq. (9).

4.4. Extracted data

The database used in Pontis includes two types of data for each bridge: (1) inventory data, which consist of administrative data such as location and identification number, technical data such as traffic and posting, and descriptive data such as geometry and material; and (2) inspection data for the different CoRe elements, which currently consist of up to four visual inspections collected every 2 years (or in some instances more than 2 years). The CoRe elements of a structure include decks, abutments, piles, railing, and the like. Available CoRe inspection records of structures were from 1994 through 2002, or from 1993 through 2003, for every 2–4 years. As mentioned earlier, only the deterioration of the CoRe element ‘concrete bridge deck’ is considered in this research. Concrete bridge decks have the highest deterioration rate because of their direct and continuous exposure to traffic, weather, and deicing chemicals [10].

Visual inspection information on bridge decks includes total areas and percentages of the total area in each condition state (usually five condition states for bridge decks). The health index of bridge decks, which is their health indicator, is calculated using Eq. (7) according to the inspection data and the last line of Table 1.

The KDOT database used by Pontis includes around 5000 bridges. Since the objective of this research is not the management of bridges in the State of Kansas, but rather the proof of concept of the research approach, only a subset of those bridges was considered. It was decided to use records from only State Districts One (headquartered in Topeka) and Two (headquartered in Salina). These have 1172 and 803 bridges, respectively. Only bridges with changes in the health index (between any two consecutive inspections) were chosen as instances for the FCBR model. This resulted in a set of 241 bridges including all such bridges from District One, and most such bridges (63 out of 68 bridges) from District Two. It was also decided to limit the type of data to that describing ‘deterioration’ to avoid the issue of efficiency of repair actions. Therefore, increases in
the health index due to repairs or deck reconstruction were not considered.

From the inventory database, only the parameters (fields) that seem to mostly influence the deterioration of bridge decks were considered. Moreover, environmental factors were not considered as a separate attribute because the bridges selected (located in Districts One and Two) all had the same value in the database (3 = Moderate, i.e., ‘Typical level of environmental influence on deterioration’) for the parameter Environment. A total of 17 attributes were selected. Some of them were continuous (with real values), including (Re-)Construction Year, Design Load, Main Spans, Max Span, Length, Deck Area, Skew, Operating Rating Load, Inventory Rating Load, Average Daily Traffic (ADT), Average Daily Truck Traffic (ADTT) to ADT (ADTT/ADT), and Health Index; others were symbolic (with discrete values), including Deck Structure Type, Main Span Materials, Main Span Design, Deck Surface Type, and Kind of Highway.

Since the improvement due to repair action or deck reconstruction was not considered, the database fields ‘Construction Year’ and ‘Reconstruction Year’ were combined into one attribute ‘(Re-)Construction Year’ with the most recent date selected as the corresponding attribute value.

The attribute values for ‘Design Load’ were originally listed in the database as follows: M9 (H10), M13.5 (H15), MS13.5 (HS15), M18 (H20), MS18 (HS20), and MS18 (HS20)+Mod. An equivalency system was used to convert such data into numeric values with a range going from 135 to 360 kN (30 kips to 80 kips). The ‘+Mod’ extension refers to the addition of the Alternate Military Loading, which results in a 20–25% increase in load effect (bending moment) depending on the span length. This equivalency was used in consultation with a KDOT bridge design engineer [24].

The attribute ‘Operating Rating Load’ corresponds to the maximum level of stresses. Thus load ratings describe the maximum permissible live load to which the structure may be subjected. The attribute ‘Inventory Rating Load’ generally corresponds to the customary design level of stresses, but reflects the existing bridge and material conditions with regard to deterioration and loss of section. Inventory level load ratings allow comparisons with the capacity of new structures, and therefore, result in a live load that can safely utilize an existing structure for an indefinite period of time.

The two attributes ADT and ADTT/ADT differ from the others. Usually, an attribute takes one value in a case; however, both ADT and ADTT/ADT have multiple values for the same bridge. In this research, for each bridge case, ten records (corresponding to the years 1993 through 2002) were extracted from the electronic database of KDOT with their respective values of ADT and ADTT/ADT. Typically, ADT and ADTT traffic values are recorded every year for all bridges while the inspection frequency varies among 1, 2, and sometimes 3 years, depending on the bridge type and condition. Therefore, for each case used in CBR, only the traffic values corresponding to the applicable years of inspection are used.

The health index is a special attribute, since it has the same property as ADT and ADTT/ADT, i.e., taking multiple values in a case and participating in the matching of attributes. However, the health index differs from ADT and ADTT/ADT and the other attributes because it is also considered as a target/class. In some cases, there are five records for the health index for a bridge, whereas, in others, three or four records are available. Each of these records corresponds to a different inspection date, ranging from 1994 through 2002, or 1993 through 2003, for every 2–4 years.

The attribute Deck Structure Type consists of nine values: Concrete Deck Coated Bar, Bare Concrete Deck, Protected Concrete Deck AC, Unprotected Concrete Deck AC, Protected Concrete Deck Rigid, Concrete Slab Coated Bar, Bare Concrete Slab, Unprotected Concrete Slab AC, and Protected Concrete Slab Rigid. The attribute Main Span Materials has 4 values: Concrete, Concrete Continuous, Steel Continuous, and Prestressed Concrete Continuous. The attribute Main Span Design includes 5 values: Slab, Stringer/Girder, Girder Floor Beam, Tee Beam, and Multiple Box Beam. The attribute Deck Surface Type takes 5 values: Monolithic Concrete, Integral Concrete, Low Slump Concrete, Latex Concrete, and Bituminous. The attribute Kind of Highway also includes 5 values: Interstate Highway, State Highway, US Numbered Highway, County Highway, and City Street.

For each data instance, the health index of the bridge deck is computed using Eq. (7). The quantities in each of the five condition states are the areas (in square meters) extracted from the Kansas PONTIS inspection data. The weights are those given in the last line of Table 1. The health index is used as a class/target (decision). After the health index is computed, the quantities in the condition states are not used anymore and are ‘hidden’ from the machine learning process.

5. Evaluation of similarities

This section discusses how to evaluate the degree of similarity for attributes and cases. The ways of measuring similarity between two attribute values depend on the type of the attributes. In this study, fuzzy membership functions are used to measure the similarities between two numeric values for an attribute, while similarity matrices or trees are used for enumerated or hierarchical attributes. Two types of fuzzy membership functions for numeric attributes are investigated here: step-wise triangular and trapezoidal. Similarity matrices and trees for enumerated and hierarchical symbolic attributes are determined according to experts, and they may vary within certain ranges. Optimal values for
similarity matrices and trees can be achieved based on a trial-error method.

5.1. Fuzzy membership functions for numeric-valued attributes

Two types of fuzzy membership functions are presented to measure the degree of similarity between any two numeric attribute values: trapezoidal and step-wise triangular.

5.1.1. Trapezoidal fuzzy functions

The trapezoidal membership function is given by Eqs. (10)–(13), and is represented by Fig. 3.

$$\mu(x) = 1 - \left|\frac{x}{(1 - \alpha)\Delta}\right|, \quad \alpha\Delta \le |x| \le \Delta \qquad (10)$$

$$\mu(x) = 1, \quad |x| \le \alpha\Delta \qquad (11)$$

$$\Delta = a_{max} - a_{min} \qquad (12)$$

$$x = a_n - a_0 \qquad (13)$$

where $\mu(x)$ is the fuzzy membership function with a value between 0 and 1, used to measure the degree of similarity between any two values of a certain attribute, $a_n$ is an attribute value from a new case, $a_0$ is the value of the corresponding attribute from an old/past case, $x$ is the difference between the two attribute values from the new case and old case, respectively, and $\Delta$ is the difference between the maximum and the minimum values of the attribute in all cases. $\alpha$ is a positive parameter less than 1, and its value is chosen such that the FCBR system can achieve the best performance.

As shown in Fig. 3, the degree of similarity between two values of a certain attribute is assumed to be 1 if the difference $x$ between the two values is less than 5 percent (or any small percentage) of the difference $\Delta$; here $\alpha$ is assumed to equal 0.05, or 5%. This means that if $0 \le x \le 0.05\Delta$, any pair of values of the attribute has the same similarity, and this similarity takes the largest value of 1. If $0.05\Delta < x \le \Delta$, then the degree of similarity, $\mu(x)$, of the two values varies linearly with $x$.

[Figure 3: plot of $\mu(x)$ versus $x$, equal to 1 from 0 to $\alpha\Delta$, with $\alpha\Delta$ and $\Delta$ marked on the x-axis.]
Fig. 3. Trapezoidal membership function.

$\Delta$ is considered a reference value: if $a_n$ is exactly the maximum (or minimum) and $a_0$ is the minimum (or maximum), the difference $x$ between the two attribute values reaches its highest value, and their similarity is considered 0. As seen from the above equations, the smaller the difference between two attribute values, the larger the value of $\mu(x)$, and hence the greater the degree of similarity between the two attribute values, and vice versa.

5.2. Step-wise triangular fuzzy functions

A step-wise triangular membership function may be given by the following Eqs. (14)–(19), and is described by Fig. 4.

$$\mu(x) = 1 - \frac{|x|\,(1 - \mu(t_v))}{t_v}, \quad 0 \le |x| \le t_v \qquad (14)$$

$$\mu(x) = \frac{\mu(t_n)(|x| - t_v) + \mu(t_v)(t_n - |x|)}{t_n - t_v}, \quad t_v \le |x| \le t_n \qquad (15)$$

$$\mu(x) = \frac{\mu(t_s)(|x| - t_n) + \mu(t_n)(t_s - |x|)}{t_s - t_n}, \quad t_n \le |x| \le t_s \qquad (16)$$

$$\mu(x) = \frac{\mu(t_s)(\Delta - |x|)}{\Delta - t_s}, \quad t_s \le |x| \le \Delta \qquad (17)$$

$$\Delta = a_{max} - a_{min} \qquad (18)$$

$$x = a_n - a_0 \qquad (19)$$

[Figure 4: plot of $\mu(x)$ versus $x$ with four linear segments labeled Extremely near, Very near, Near, and Slightly near; $\mu(t_v)$, $\mu(t_n)$, and $\mu(t_s)$ marked on the vertical axis and $t_v$, $t_n$, $t_s$, and $\Delta$ on the x-axis.]
Fig. 4. Step-wise triangular membership function.

As seen in Fig. 4, the step-wise fuzzy function is symmetric about the vertical axis and consists of four linear functions (actually, any number of linear functions may be defined), which stand for Extremely Near, Very Near, Near, and Slightly Near, respectively, i.e., the degree of similarity between any two values of an attribute. Here $t_v$, $t_n$, and $t_s$ denote the thresholds of the very near, near, and slightly near points, respectively, and $\mu(t_v)$, $\mu(t_n)$, and $\mu(t_s)$ represent the specific membership values or similarities corresponding to $t_v$, $t_n$, and $t_s$. All of them are decided on the basis of the trial-error method such that
Table 2
The similarity matrix for the attribute Main Span Design

Slab Stringer Girder floor Tee beam Multiple box Frame Truss deck Truss thru
girder beam beam
Slab 1
Stringer girder 0.4 1 SYMMETRIC
Girder floor beam 0.6 0.5 1
Tee beam 0.4 0.8 0.4 1
Multiple box beam 0.3 0.4 0.3 0.8 1
Frame 0.2 0.2 0.2 0.2 0.2 1
Truss deck 0.5 0.2 0.2 0.2 0.2 0.3 1
Truss thru 0.2 0.2 0.2 0.2 0.2 0.3 0.6 1

the FCBR system can be optimized. The other symbols are the attribute and the nodes are all-possible values of that
same as the ones in the trapezoidal fuzzy function. attribute [11]. Taxonomy tree technique differs from
similarity matrix technique in terms of the knowledge it
5.3. Fuzzy numbers (similarity matrices/trees) expresses [16]. This approach expresses the relationship
for symbolic-valued attributes between attribute values through their position in the
taxonomy tree [3]. By going from the root to the leaves of
Since the values of symbolic attributes cannot be the tree, attribute values become more specialized. Fig. 5
represented on the basis of a unified scale, therefore, the shows an example of a taxonomy tree for the symbolic
difference between these values cannot be computed. In attribute Main Span Materials.
addition, the values of a symbolic attribute can be In Fig. 5, the degree of similarity between Concrete/
represented in many formats, which may cause problems continuous/ reinforced and Concrete/Simply Sup-
in calculations of similarity [11]. In order to avoid these ported/ Reinforced is evaluated as 1!0.6Z0.6, which
problems, similarity has to be addressed in a different way, stands for the difference in support conditions only. This
and for this purpose symbolic attributes are divided into two categories: enumerated attributes and hierarchical attributes. Their similarities are determined using similarity matrices and similarity trees, respectively.

5.3.1. Enumerated symbolic-valued attribute

An enumerated attribute is the first type of symbolic attribute. A similarity matrix is most commonly used to define the similarity between the values of an enumerated attribute. In a similarity matrix, the numbers (also called fuzzy numbers) in the rows and columns denote the degree of similarity between each pair of values of the attribute and are determined by domain experts. Since the similarity matrix is symmetrical, the degrees of similarity can be mirrored around the matrix diagonal. Consequently, only the upper or lower triangle of the similarity matrix needs to be determined.

As discussed earlier, there are five symbolic attributes in our problem domain. Four of them (Kind of Highway, Main Span Design, Deck Surface Type, and Structural Type) belong to the enumerated type. The fifth attribute, Main Span Materials, is of the hierarchical type and will be discussed later. Table 2 shows a possible similarity matrix of the attribute Main Span Design. It may be noted again that all similarity matrices are symmetrical.

5.3.2. Hierarchical symbolic-valued attribute

A hierarchical attribute is the other type of symbolic attribute, with values defined as nodes of a taxonomy tree. A taxonomy tree is a tree in which the root is a hierarchical

degree of similarity differs from the one between Concrete/Continuous/Reinforced and Concrete/Simply Supported/Pre-stressed, which is computed as 0.8 × 0.6 = 0.48. This value is more reasonable since it accounts for the difference in support conditions as well as reinforcement.

5.4. Overall similarities between two cases

The global similarity between any two cases can be calculated using Eqs. (20)–(22) [11]:

$$\mathrm{GSim}(x_n, x_0) = \frac{\mathrm{TSim}}{\mathrm{TWt}} \tag{20}$$

$$\mathrm{TSim} = \sum_{i=1}^{k} \mathrm{LSim}_i \times \mathrm{Wt}_i \tag{21}$$

$$\mathrm{TWt} = \sum_{i=1}^{k} \mathrm{Wt}_i \tag{22}$$

where GSim(x_n, x_0) stands for the global similarity between a new/query case and an old/past case, TSim denotes the total attribute similarity, LSim_i represents the local similarity between the new and old case values for attribute i, evaluated using the appropriate similarity measures mentioned in previous sections (Eqs. (10)–(19) and similarity matrices/trees), TWt is the total attribute weight, and Wt_i is the weight of attribute i.
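As a minimal sketch of Eqs. (20)–(22), the global similarity is a weighted average of the local similarities. The function name `global_similarity` and the numeric values below are illustrative assumptions, not taken from the paper's implementation or case base.

```python
from typing import Sequence


def global_similarity(local_sims: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of local attribute similarities, following Eqs. (20)-(22)."""
    if len(local_sims) != len(weights):
        raise ValueError("one weight is needed per attribute")
    total_weight = sum(weights)                                  # TWt, Eq. (22)
    if total_weight <= 0:
        raise ValueError("total attribute weight must be positive")
    total_sim = sum(s * w for s, w in zip(local_sims, weights))  # TSim, Eq. (21)
    return total_sim / total_weight                              # GSim, Eq. (20)


# Hypothetical local similarities and weights for three attributes.
example_gsim = global_similarity([1.0, 0.48, 0.8], [2.0, 1.0, 1.0])
```

With equal weights the result reduces to the plain mean of the local similarities, so the weights only matter when some attributes are considered more important than others.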

[Figure: taxonomy tree. Root node: Main Span Material (value shown: 0.0). Material type nodes: Steel and Concrete (value shown: 0.6 each). Support nodes (value shown: 0.8 each) with children Simply Supported and Continuous. Reinforcement nodes with children Reinforced and Pre-stressed.]

Fig. 5. Taxonomy tree of the attribute Main Span Materials.

6. Fuzzy predictors

Fuzzy predictors are used to monitor bridge health, i.e. to predict the deterioration of bridge decks. Different ways of predicting a numeric class value result in different fuzzy predictors. In this study, two fuzzy predictors are investigated: Mean-Valued KNN and Distance-Weighted KNN.

6.1. Mean-valued KNN fuzzy predictor

The algorithm of the mean-valued fuzzy predictor first searches for the k nearest neighbors of a case according to the overall similarity, then uses the average of the numeric class values of the k nearest neighbors as the final prediction for a new/test case. It is represented by Eq. (23).

$$\hat{f}(x_n) = \frac{\sum_{i=1}^{k} f(x_i)}{k} \tag{23}$$

where k is the number of nearest neighbors, f(x_i) denotes the class value of the ith (i = 1, 2, ..., k) neighbor, and \hat{f}(x_n) represents the prediction for the new/test case.

6.2. Distance-weighted KNN fuzzy predictor

The algorithm of the distance-weighted fuzzy predictor weights the contribution of each of the k neighbors according to its distance to a new/test case, giving greater weight to closer neighbors. It is expressed by Eqs. (24)–(26).

$$\hat{f}(x_n) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i} \tag{24}$$

$$w_i = \frac{1}{d(x_n, x_i)^2} \tag{25}$$

$$d(x_n, x_i)^2 = \sum_{q=1}^{N} (x_{n,q} - x_{i,q})^2 \tag{26}$$

where d(x_n, x_i)^2 is the squared distance between a new/test case and its ith neighbor, w_i is the weight of the ith neighbor, N is the number of attributes representing these cases, and x_{n,q} and x_{i,q} stand for the values of the qth attribute in the new/test case and in its ith neighbor, respectively. The other symbols are the same as in the mean-valued k-NN predictor. The algorithm of these fuzzy k-NN predictors (mean-valued and distance-weighted) is described in Fig. 6.

7. Experiments

Error measures and the correlation coefficient (cc) are used to evaluate the performance of the FCBR model. The following four error measures are explored here: root mean-squared error (rmse), mean absolute error (mae), root relative-squared error (rrse), and relative absolute error (rae). These are discussed in the next section.

Ten-fold cross-validation is normally considered the standard approach for prediction quality of a learning

Fig. 6. The algorithm of the distance-weighted k-NN predictor.
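The figure body is not reproduced here, so the following is only a hedged sketch of the two prediction rules in Eqs. (23)–(26), not the authors' algorithm. It assumes the k nearest neighbors have already been retrieved by overall similarity and that attributes are numeric; the function names and the handling of a zero distance are assumptions.

```python
from typing import List, Sequence, Tuple


def mean_valued_prediction(neighbor_values: Sequence[float]) -> float:
    """Eq. (23): plain average of the class values of the k nearest neighbors."""
    if not neighbor_values:
        raise ValueError("at least one neighbor is required")
    return sum(neighbor_values) / len(neighbor_values)


def squared_distance(x_new: Sequence[float], x_old: Sequence[float]) -> float:
    """Eq. (26): sum of squared attribute differences."""
    return sum((a - b) ** 2 for a, b in zip(x_new, x_old))


def distance_weighted_prediction(
    x_new: Sequence[float],
    neighbors: List[Tuple[Sequence[float], float]],
) -> float:
    """Eqs. (24)-(25): neighbors are (attribute_vector, class_value) pairs."""
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x_old, value in neighbors:
        d2 = squared_distance(x_new, x_old)
        if d2 == 0.0:
            # Assumption: Eq. (25) is undefined for an exact match,
            # so the matching neighbor's value is returned directly.
            return value
        w = 1.0 / d2                      # Eq. (25)
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total    # Eq. (24)


# Hypothetical query with two retrieved neighbors.
demo_neighbors = [([1.0, 0.0], 2.0), ([0.0, 2.0], 5.0)]
demo_mean = mean_valued_prediction([v for _, v in demo_neighbors])
demo_weighted = distance_weighted_prediction([0.0, 0.0], demo_neighbors)
```

In this toy case the closer neighbor (squared distance 1) pulls the weighted prediction toward its value, whereas the mean-valued rule treats both neighbors equally.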

Table 3
The effect of the parameter a of the trapezoidal membership function on mean absolute error

a      Fuzzy predictors  No. of run                                                                      Average
                         1       2       3       4       5       6       7       8       9       10
0.025  M.V. kNN          0.0544  0.0139  0.0475  0.0451  0.0260  0.0417  0.0295  0.0475  0.0521  0.0283  0.0386
       D.W. kNN          0.0592  0.0201  0.0536  0.0607  0.0327  0.0550  0.0487  0.0518  0.0587  0.0315  0.0472
0.05   M.V. kNN          0.0527  0.0139  0.0405  0.0382  0.0295  0.0417  0.0295  0.0475  0.0521  0.0317  0.0377
       D.W. kNN          0.0590  0.0201  0.0513  0.0470  0.0339  0.0592  0.0487  0.0518  0.0587  0.0324  0.0461
0.1    M.V. kNN          0.0492  0.0174  0.0301  0.0382  0.0295  0.0451  0.0330  0.0509  0.0451  0.0417  0.0380
       D.W. kNN          0.0543  0.0235  0.0386  0.0470  0.0339  0.0618  0.0515  0.0506  0.0533  0.0464  0.0461
0.15   M.V. kNN          0.0561  0.0174  0.0579  0.0417  0.0365  0.0590  0.0469  0.0475  0.0556  0.0317  0.0450
       D.W. kNN          0.0589  0.0232  0.0611  0.0479  0.0406  0.0733  0.0606  0.0423  0.0589  0.0308  0.0498
0.2    M.V. kNN          0.0561  0.0243  0.0579  0.0486  0.0365  0.0671  0.0503  0.0544  0.0556  0.0383  0.0450
       D.W. kNN          0.0589  0.0286  0.0585  0.0507  0.0406  0.0341  0.0636  0.0493  0.0589  0.0351  0.0478
0.25   M.V. kNN          0.0666  0.0451  0.0752  0.0660  0.0538  0.0706  0.0642  0.0544  0.0590  0.0383  0.0593
       D.W. kNN          0.0695  0.0426  0.0669  0.0682  0.0537  0.0691  0.0758  0.0561  0.0572  0.0328  0.0592

M.V., mean-valued; D.W., distance-weighted.
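Section 7.2.1 chooses the parameter a by the smallest average mean absolute error over the 10 runs. A minimal sketch of that selection step follows, using only the Average column of Table 3; the dictionary layout and the name `best_parameter` are illustrative assumptions.

```python
# Average column of Table 3, keyed by the trapezoidal parameter a.
avg_mae = {
    "M.V. kNN": {0.025: 0.0386, 0.05: 0.0377, 0.1: 0.0380,
                 0.15: 0.0450, 0.2: 0.0450, 0.25: 0.0593},
    "D.W. kNN": {0.025: 0.0472, 0.05: 0.0461, 0.1: 0.0461,
                 0.15: 0.0498, 0.2: 0.0478, 0.25: 0.0592},
}


def best_parameter(averages: dict) -> float:
    """Return the parameter value with the smallest average error.

    On ties, min() keeps the first key in insertion order.
    """
    return min(averages, key=averages.get)


best_a = {name: best_parameter(column) for name, column in avg_mae.items()}
```

For the distance-weighted predictor the averages at a = 0.05 and a = 0.1 are equal in Table 3 (0.0461), so the tie-breaking rule, here "first value tried", decides which one is reported.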


technique given a certain data set. The idea behind the 10 folds is that a data set D is randomly divided into 10 parts, each of which is approximately 10% of D. Nine parts are used as the training set for prediction, and the remaining part is used as a test set for evaluating the prediction based on the training set. Each of the 10 parts, in turn, is used as the test set, so the process of predicting and testing is repeated 10 times. Usually, the average accuracy over the 10 folds is reported. Ten folds are chosen here because a variety of tests on many different data sets with different learning algorithms have shown that 10 is about the right number of folds to get the best accuracy of prediction [26]. Ten-fold cross-validation has become the standard in practice, though this is in no way conclusive, and what is best for evaluation is still argued in machine learning and data mining circles.

7.1. Error measures

The evaluation measures are different for classification problems (symbolic class values) than for numeric prediction problems (continuous class values). For fuzzy predictors, a basic quality measure given by accuracy percentage or error rate is not appropriate since errors are not simply 'hit' or 'miss'. Let e_1, e_2, ..., e_n stand for the estimated (predicted) values of instances on the test data set, and a_1, a_2, ..., a_n denote the actual values of the instances. Several methods are used to evaluate the quality of numeric predictions. The root mean-squared error, rmse, is the principal and most commonly used method, and is given by

$$\mathrm{rmse} = \sqrt{\frac{(e_1 - a_1)^2 + \cdots + (e_n - a_n)^2}{n}} \tag{27}$$

The mean absolute error, mae, considers the average of the individual errors without regard to their sign, and is calculated as

$$\mathrm{mae} = \frac{|e_1 - a_1| + \cdots + |e_n - a_n|}{n} \tag{28}$$

Table 4
The effect of the tv, tn, ts, m(tv), m(tn), and m(ts) of the step-wise membership function on mean absolute error

Combination  Fuzzy predictors  No. of run                                                                      Average
                               1       2       3       4       5       6       7       8       9       10
1            M.V. kNN          0.0613  0.0139  0.0370  0.0521  0.0295  0.0382  0.0330  0.0405  0.0590  0.0317  0.0396
             D.W. kNN          0.0723  0.0216  0.0445  0.0627  0.0352  0.0512  0.0526  0.0466  0.0543  0.0329  0.0474
2            M.V. kNN          0.0544  0.0139  0.0336  0.0521  0.0260  0.0347  0.0174  0.0440  0.0556  0.0267  0.0358
             D.W. kNN          0.0592  0.0204  0.0457  0.0640  0.0327  0.0435  0.0337  0.0497  0.0545  0.0322  0.0436
3            M.V. kNN          0.0579  0.0208  0.0336  0.0521  0.0243  0.0382  0.0104  0.0509  0.0521  0.0267  0.0367
             D.W. kNN          0.0706  0.0323  0.0457  0.0630  0.0327  0.0425  0.0214  0.0573  0.0492  0.0311  0.0446
4            M.V. kNN          0.0613  0.0226  0.0405  0.0521  0.0330  0.0313  0.0330  0.0440  0.0660  0.0350  0.0419
             D.W. kNN          0.0723  0.0245  0.0427  0.0642  0.0367  0.0360  0.0525  0.0504  0.0558  0.0366  0.0472
5            M.V. kNN          0.0613  0.0208  0.0370  0.0521  0.0208  0.0347  0.0122  0.0370  0.0451  0.0233  0.0345
             D.W. kNN          0.0718  0.0324  0.0464  0.0642  0.0304  0.0382  0.0218  0.0370  0.0501  0.0282  0.0421
6            M.V. kNN          0.0579  0.0208  0.0370  0.0521  0.0278  0.0417  0.0104  0.0509  0.0521  0.0233  0.0358
             D.W. kNN          0.0706  0.0334  0.0420  0.0642  0.0341  0.0421  0.0214  0.0580  0.0493  0.0268  0.0442

M.V., mean-valued; D.W., distance-weighted.
tv, tn, ts combinations tested: 0.025, 0.25, 0.80; 0.15, 0.3, 0.75; 0.05, 0.15, 0.4; 0.15, 0.25, 0.5; 0.1, 0.2, 0.6; 0.2, 0.3, 0.7.
m(tv), m(tn), m(ts) combinations tested: 0.90, 0.80, 0.25; 0.95, 0.80, 0.5; 0.85, 0.75, 0.3; 0.85, 0.65, 0.3; 0.85, 0.6, 0.3; 0.9, 0.7, 0.4.
The pairing of these parameter values with the table rows is not recoverable from the source layout, except that combination 5 is tv, tn, ts = 0.15, 0.25, 0.5 with m(tv), m(tn), m(ts) = 0.85, 0.6, 0.3, as stated in Section 7.2.1.

Table 5
The optimal values of the parameters for the trapezoidal and the step-wise triangular fuzzy membership functions

Fuzzy functions       Parameters  Fuzzy predictors
                                  Mean-valued  Distance-weighted
Trapezoidal           a           0.05         0.05
Step-wise triangular  tv          0.15         0.15
                      tn          0.25         0.25
                      ts          0.50         0.50
                      m(tv)       0.85         0.85
                      m(tn)       0.60         0.60
                      m(ts)       0.30         0.30
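The 10-fold procedure described at the start of Section 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `k_fold_indices`, the fixed random seed, and the round-robin assignment of shuffled cases to folds are choices made here, not details from the paper.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_cases: int, n_folds: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs; each case is tested exactly once."""
    if not 2 <= n_folds <= n_cases:
        raise ValueError("need 2 <= n_folds <= n_cases")
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)                    # random division of D
    folds = [order[i::n_folds] for i in range(n_folds)]   # parts of nearly equal size
    splits = []
    for i, test in enumerate(folds):
        # The other n_folds - 1 parts form the training set for this run.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Hypothetical case base of 95 cases: ten runs, each testing 9 or 10 cases.
cv_splits = k_fold_indices(95)
```

Each run would then train on `train`, predict the cases in `test`, and record one error value; averaging the ten values gives figures of the kind shown in the Average columns of the tables.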

The mean-squared error tends to exaggerate the influence of the instances whose prediction error is larger than the others, but the absolute error does not have this effect, since all sizes of error are dealt with equally according to their magnitude. These two methods measure the absolute errors, but sometimes the averages of absolute error will be meaningless. Therefore, methods measuring relative errors become of importance. The measures
Table 6
The effect of the number k of nearest neighbors on the fuzzy predictors
k  No. of run  Mean-valued fuzzy predictor   Distance-weighted fuzzy predictor
               Trapezoidal  Step-wise        Trapezoidal  Step-wise
1 1 0.0590 0.0729 0.0590 0.0729
2 0.0208 0.0313 0.0208 0.0313
3 0.0486 0.0729 0.0486 0.0729
4 0.0625 0.1042 0.0625 0.1042
5 0.0417 0.0417 0.0417 0.0417
6 0.0313 0.0417 0.0313 0.0417
7 0.0417 0.0417 0.0417 0.0417
8 0.0521 0.0313 0.0521 0.0313
9 0.0521 0.0417 0.0521 0.0417
10 0.04 0.03 0.04 0.03
Average 0.0450 0.0509 0.0450 0.0509
2 1 0.0503 0.066 0.056 0.0701
2 0.0156 0.0156 0.0173 0.0228
3 0.0347 0.0451 0.0393 0.0525
4 0.0365 0.0521 0.0455 0.0668
5 0.0313 0.026 0.0366 0.0327
6 0.0521 0.0365 0.0562 0.0368
7 0.0417 0.0156 0.0441 0.0136
8 0.0625 0.0469 0.06 0.049
9 0.0399 0.0313 0.0409 0.0313
10 0.0275 0.03 0.0297 0.03
Average 0.0392 0.0365 0.0426 0.0405
3 1 0.0527 0.0613 0.0569 0.066
2 0.0139 0.0208 0.015 0.0262
3 0.0405 0.0370 0.0428 0.0384
4 0.0382 0.0521 0.0462 0.0633
5 0.0295 0.0208 0.0327 0.0272
6 0.0417 0.0347 0.048 0.0363
7 0.026 0.0139 0.0365 0.0168
8 0.0475 0.037 0.0472 0.0365
9 0.0521 0.0452 0.0591 0.0482
10 0.0317 0.0233 0.0321 0.0268
Average 0.0374 0.0346 0.0416 0.0386
4 1 0.0447 0.0564 0.0514 0.0617
2 0.013 0.0247 0.0143 0.0308
3 0.0512 0.033 0.0461 0.0344
4 0.0313 0.0469 0.0437 0.0564
5 0.0352 0.0313 0.0346 0.0343
6 0.0547 0.0339 0.0579 0.0368
7 0.0247 0.0135 0.0312 0.0161
8 0.0538 0.0391 0.0516 0.0382
9 0.0482 0.0365 0.0527 0.0382
10 0.0288 0.0275 0.0307 0.0335
Average 0.0386 0.0343 0.0414 0.0380
5 1 0.0483 0.0576 0.0554 0.0617
2 0.0229 0.0323 0.0261 0.0365
3 0.0514 0.0347 0.0462 0.0355
4 0.0458 0.0479 0.0563 0.0574
5 0.0344 0.0281 0.0352 0.032
6 0.0549 0.0403 0.0574 0.0426
7 0.0323 0.0198 0.0351 0.0232
8 0.0493 0.0375 0.0488 0.0396
9 0.051 0.0354 0.0546 0.0377
10 0.021 0.024 0.0245 0.0305
Average 0.0411 0.0358 0.044 0.0397

Table 7
The performance of the distance-weighted fuzzy predictor at k = 3

Metrics No. of run Fuzzy functions Metrics No. of run Fuzzy functions
Trapezoidal Step-wise Trapezoidal Step-wise
mae 1 0.0569 0.066 rae 1 0.2756 0.3114
2 0.015 0.0262 2 0.0692 0.1159
3 0.0428 0.0384 3 0.1977 0.1769
4 0.0462 0.0633 4 0.2531 0.372
5 0.0327 0.0272 5 0.1762 0.141
6 0.048 0.0362 6 0.2354 0.1785
7 0.0365 0.0168 7 0.1417 0.0623
8 0.0472 0.0365 8 0.2518 0.1846
9 0.0591 0.0482 9 0.237 0.2021
10 0.0321 0.0267 10 0.1544 0.1284
Average 0.0416 0.0386 Average 0.1992 0.1873
rmse 1 0.1546 0.1617 cc 1 0.8373 0.818
2 0.0369 0.0773 2 0.9901 0.958
3 0.0946 0.085 3 0.9329 0.9488
4 0.1146 0.1269 4 0.9348 0.9113
5 0.0767 0.0717 5 0.9638 0.971
6 0.0905 0.0823 6 0.9358 0.943
7 0.0818 0.0483 7 0.9652 0.9872
8 0.1029 0.1155 8 0.899 0.8809
9 0.1259 0.1039 9 0.8946 0.9266
10 0.0963 0.0975 10 0.9276 0.9257
Average 0.0975 0.097 Average 0.9281 0.9271
rrse 1 0.5726 0.599
2 0.1447 0.3031
3 0.4013 0.3606
4 0.4146 0.4591
5 0.2893 0.2706
6 0.3903 0.355
7 0.284 0.1679
8 0.4508 0.5063
9 0.5042 0.4159
10 0.385 0.3898
Average 0.3837 0.3827

commonly used include the root relative squared error, rrse, and the relative absolute error, rae, which are evaluated by Eqs. (29) and (30), respectively,

$$\mathrm{rrse} = \sqrt{\frac{(e_1 - a_1)^2 + \cdots + (e_n - a_n)^2}{(a_1 - \bar{a})^2 + \cdots + (a_n - \bar{a})^2}} \tag{29}$$

$$\mathrm{rae} = \frac{|e_1 - a_1| + \cdots + |e_n - a_n|}{|a_1 - \bar{a}| + \cdots + |a_n - \bar{a}|} \tag{30}$$

where

$$\bar{a} = \frac{\sum_i a_i}{n}.$$

Finally, the correlation coefficient measures the statistical correlation between the actual values and the predictions. The correlation coefficient, cc, is computed as

$$\mathrm{cc} = \frac{s_{PA}^2}{s_P \, s_A} \tag{31}$$

in which:

$$s_{PA}^2 = \frac{\sum_i (e_i - \bar{e})(a_i - \bar{a})}{n - 1} \tag{32}$$

$$s_P^2 = \frac{\sum_i (e_i - \bar{e})^2}{n - 1} \tag{33}$$

$$s_A^2 = \frac{\sum_i (a_i - \bar{a})^2}{n - 1} \tag{34}$$

where \bar{a} is the same as in Eqs. (29) and (30), and \bar{e} = \sum_i e_i / n.

The correlation coefficient ranges from 1 for perfect correlation, through 0 for no correlation at all, to −1 when the outcomes are completely negatively correlated. Negative values should not occur for reasonable prediction measures. Good performance results in a large value of the correlation coefficient and a small error rate. All the above methods were used to measure the prediction quality of the learning algorithms reported in this paper.
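The five measures of Eqs. (27)–(34) can be computed together, as in the hedged sketch below. The function name `error_metrics` is an assumption, rae is normalized by the deviations of the actual values from their mean (the usual definition of relative absolute error), and the code assumes at least two instances with non-constant actual and estimated values, since rrse, rae and cc are otherwise undefined.

```python
import math
from typing import Dict, Sequence


def error_metrics(estimated: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Return rmse, mae, rrse, rae and cc for paired estimated/actual values."""
    n = len(actual)
    if n < 2 or len(estimated) != n:
        raise ValueError("need two sequences of equal length >= 2")
    a_bar = sum(actual) / n
    e_bar = sum(estimated) / n
    sq_err = sum((e - a) ** 2 for e, a in zip(estimated, actual))
    abs_err = sum(abs(e - a) for e, a in zip(estimated, actual))
    sq_dev = sum((a - a_bar) ** 2 for a in actual)
    abs_dev = sum(abs(a - a_bar) for a in actual)
    s_pa = sum((e - e_bar) * (a - a_bar)
               for e, a in zip(estimated, actual)) / (n - 1)       # Eq. (32)
    s_p = sum((e - e_bar) ** 2 for e in estimated) / (n - 1)       # Eq. (33)
    s_a = sq_dev / (n - 1)                                         # Eq. (34)
    return {
        "rmse": math.sqrt(sq_err / n),                             # Eq. (27)
        "mae": abs_err / n,                                        # Eq. (28)
        "rrse": math.sqrt(sq_err / sq_dev),                        # Eq. (29)
        "rae": abs_err / abs_dev,                                  # Eq. (30)
        "cc": s_pa / (math.sqrt(s_p) * math.sqrt(s_a)),            # Eq. (31)
    }


# Hypothetical predictions for four test instances.
demo_metrics = error_metrics([1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0, 4.0])
```

Because rrse and rae divide by the spread of the actual values, they compare the predictor against simply predicting the mean: values well below 1 indicate that the predictor does better than that baseline.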

7.2. Optimization of parameters

7.2.1. Optimization of the trapezoidal and the step-wise triangular fuzzy functions

In order to obtain the best performance of the FCBR model, the parameters for the trapezoidal and the step-wise membership functions first had to be investigated, since these fuzzy membership functions are used to evaluate the degree of similarity. As shown in Fig. 3, the trapezoidal membership function has only one parameter, aD, to be determined (where D is known), while Fig. 4 shows that there are six parameters in the step-wise membership function: tv, tn, ts, m(tv), m(tn), and m(ts). These parameters have been explained in Section 5.1. The purpose of this section is to find the values of these parameters for which the FCBR model achieves its best performance, so the influence of these parameters on the model is explored here. Mean absolute error is used to find the local optimal values of these parameters, based on 10-fold cross-validation. Table 3 shows the effect of different values of the parameter of the trapezoidal membership function on the mean absolute errors obtained by setting k = 3 and the threshold of similarity degree to 0.5. The different combinations of the six parameters of the step-wise triangular membership function yield Table 4, which results from the same settings.

As shown in Table 3, when a equals 0.05, the Mean-Valued kNN results in an average mean absolute error of 0.0377, and the Distance-Weighted kNN yields an average mean absolute error of 0.0461. Both these values are the smallest among the mean absolute errors produced by Mean-Valued kNN and Distance-Weighted kNN, respectively. Therefore, 0.05 is the local optimal value of a. From Table 4, when tv, tn, and ts take the values of 0.15, 0.25, and 0.5, and m(tv), m(tn), and m(ts) are equal to 0.85, 0.6, and 0.3 correspondingly, the Mean-Valued k-NN and the Distance-Weighted k-NN achieve their smallest mean absolute errors of 0.0345 and 0.0421, respectively. So this combination of the six parameters is locally the best. These values are shown in bold in the two tables. Moreover, endless values for all parameters make it impossible to try each of them, so the best discussed here is called the local, rather than global,

Table 8
The performance of the distance-weighted fuzzy predictor at k = 4

Metrics No. of run Fuzzy functions Metrics No. of run Fuzzy functions
Trapezoidal Step-wise Trapezoidal Step-wise
mae 1 0.0514 0.0617 rae 1 0.2561 0.2938
2 0.0143 0.0308 2 0.0654 0.1346
3 0.0461 0.0344 3 0.2158 0.1652
4 0.0437 0.0564 4 0.2399 0.321
5 0.0346 0.0343 5 0.1827 0.1807
6 0.0579 0.0368 6 0.2883 0.1872
7 0.0312 0.0161 7 0.1205 0.0591
8 0.0516 0.0382 8 0.2669 0.1908
9 0.0527 0.0382 9 0.2235 0.1676
10 0.0307 0.0335 10 0.1515 0.1692
Average 0.0414 0.0381 Average 0.2011 0.1869
rmse 1 0.1371 0.1591 cc 1 0.8724 0.8245
2 0.036 0.0727 2 0.9909 0.9653
3 0.036 0.0727 3 0.9491 0.9605
4 0.1113 0.1212 4 0.9414 0.9178
5 0.0757 0.0771 5 0.9616 0.9623
6 0.0934 0.069 6 0.9298 0.9594
7 0.0714 0.0473 7 0.9723 0.9879
8 0.1133 0.1157 8 0.8795 0.8817
9 0.1074 0.0903 9 0.9167 0.9427
10 0.0795 0.0951 10 0.9523 0.9287
Average 0.0861 0.092 Average 0.9366 0.9331
rrse 1 0.5077 0.5891
2 0.141 0.2853
3 0.3607 0.3074
4 0.4028 0.4386
5 0.2858 0.2911
6 0.4032 0.2979
7 0.2478 0.1644
8 0.4964 0.5071
9 0.4298 0.3613
10 0.3179 0.3802
Average 0.3593 0.3622

optimal value. The optimal values of these parameters are summarized in Table 5.

7.2.2. The optimal values of k for the fuzzy predictors

This section explores the influence of the number of nearest neighbors, k, on the fuzzy predictors. Table 6 shows the results of 10-fold cross-validation with the threshold of similarity degree set to 0.5 and the parameters of the fuzzy membership functions set to the optimal values obtained previously. The best averages over 10 runs and the corresponding k values are shown in bold. As shown in Table 6, the mean-valued fuzzy predictor achieved the smallest average of mean absolute errors at k = 3 no matter which fuzzy function was used (the trapezoidal 0.0374 and the step-wise 0.0346). Therefore, for the mean-valued fuzzy predictor, the optimal value of k is 3. For the distance-weighted predictor, the trapezoidal and step-wise fuzzy functions led to the smallest values of 0.0414 and 0.0380, respectively, when k equals 4. Thus, the best value of k is 4 for the fuzzy distance-weighted predictor.

7.2.3. Evaluation of fuzzy predictors

The two fuzzy k-NN predictors discussed previously were investigated, and the five error measures were applied to compare the learning performance of the two predictors. All the following experiments are based on 10-fold cross-validation, a threshold value of 0.5, and the local optimal values of the parameters for the trapezoidal and the step-wise fuzzy membership functions identified previously. As found earlier, for the distance-weighted fuzzy predictor, the optimal values of k corresponding to the trapezoidal and the step-wise triangular functions are equal, i.e. 4, obtained using mean absolute error. The optimal value of k for mae is not necessarily the best for the other error metrics. Therefore, to verify the optimal value obtained by mae, all errors and correlation coefficients due to the above two fuzzy membership functions are shown for k = 3, 4, and 5. Tables 7–9 are the results obtained by setting k to 3, 4, and 5, respectively. rmse due to the trapezoidal fuzzy function and rae due to the step-wise fuzzy function are the best at k = 4, which completely matches the one

Table 9
The performance of the distance-weighted fuzzy predictor at k = 5

Metrics No. of run Fuzzy functions Metrics No. of run Fuzzy functions
Trapezoidal Step-wise Trapezoidal Step-wise
mae 1 0.0554 0.0617 rae 1 0.2713 0.2941
2 0.0261 0.0365 2 0.1179 0.162
3 0.0462 0.0355 3 0.2171 0.1702
4 0.0563 0.0574 4 0.3195 0.328
5 0.0352 0.032 5 0.1814 0.1671
6 0.0574 0.0426 6 0.287 0.2102
7 0.0337 0.0242 7 0.1279 0.0914
8 0.0488 0.0396 8 0.2552 0.211
9 0.0546 0.0377 9 0.2295 0.1717
10 0.0245 0.0305 10 0.1219 0.1549
Average 0.0438 0.0398 Average 0.2129 0.1961
rmse 1 0.1406 0.1601 cc 1 0.8628 0.8228
2 0.0513 0.0709 2 0.9827 0.9678
3 0.0789 0.0723 3 0.957 0.9599
4 0.1185 0.1201 4 0.9253 0.9228
5 0.0706 0.0735 5 0.9655 0.9645
6 0.0919 0.075 6 0.9312 0.9556
7 0.0688 0.0588 7 0.9741 0.9803
8 0.1108 0.0994 8 0.8843 0.9059
9 0.1066 0.0827 9 0.9185 0.95
10 0.0683 0.0874 10 0.9656 0.9393
Average 0.0906 0.09 Average 0.9367 0.9369
rrse 1 0.5206 0.5928
2 0.2014 0.2782
3 0.3346 0.3069
4 0.4287 0.4345
5 0.2664 0.2775
6 0.3967 0.3236
7 0.2391 0.2041
8 0.4855 0.4355
9 0.4267 0.3309
10 0.2729 0.3493
Average 0.3573 0.3533

Table 10
The performance of the mean-valued fuzzy predictor at k = 3

Metrics No. of run Fuzzy functions Metrics No. of run Fuzzy functions
Trapezoidal Step-wise Trapezoidal Step-wise
mae 1 0.0527 0.0613 rae 1 0.2521 0.2961
2 0.0139 0.0208 2 0.064 0.093
3 0.0405 0.037 3 0.1892 0.173
4 0.0382 0.0521 4 0.1964 0.283
5 0.0295 0.0208 5 0.1581 0.1062
6 0.0417 0.0347 6 0.2025 0.1724
7 0.026 0.0139 7 0.1014 0.0523
8 0.0475 0.037 8 0.2455 0.1882
9 0.0521 0.0451 9 0.2151 0.1919
10 0.0317 0.0233 10 0.1533 0.1139
Average 0.0374 0.0346 Average 0.1778 0.167
rmse 1 0.1519 0.1591 cc 1 0.8438 0.8244
2 0.034 0.0589 2 0.9917 0.9756
3 0.0858 0.0769 3 0.9456 0.9587
4 0.0884 0.1006 4 0.9632 0.9483
5 0.0706 0.0589 5 0.9671 0.9798
6 0.0798 0.0798 6 0.951 0.9455
7 0.0619 0.0417 7 0.9808 0.9903
8 0.0984 0.1159 8 0.9092 0.8794
9 0.1185 0.1 9 0.9042 0.9324
10 0.087 0.0833 10 0.9407 0.945
Average 0.0876 0.0875 Average 0.9397 0.9379
rrse 1 0.5624 0.5891
2 0.1334 0.2311
3 0.364 0.3263
4 0.3198 0.3641
5 0.2666 0.2224
6 0.3443 0.3443
7 0.2151 0.1447
8 0.431 0.508
9 0.4745 0.4003
10 0.3477 0.3331
Average 0.3459 0.3463

found before, but the optimal (smallest) values of all the other error metrics produced by both the fuzzy functions appear at k = 5, rather than 4. However, for a given fuzzy function, there is no significant difference between the values of a given metric at the optimal values of k obtained in this section and in the previous section. For example, for the step-wise fuzzy function, the smallest value of rmse is 0.090 at k = 5, and is 0.092 at k = 4. This means that the optimal values of k obtained before are accurate enough. Therefore, the performance of the mean-valued fuzzy k-NN predictors is measured on the basis of the optimal value of k = 3 achieved earlier, and the results are listed in Table 10.

For the distance-weighted fuzzy predictor, from Tables 7–9, it can be found that:

1. All the optimal values of rrse and cc corresponding to the two fuzzy functions appear at k = 5.
2. Generally, this predictor performs slightly worse at k = 3 than at k = 4 and 5. However, it is slightly better at k = 5 than at k = 4, since only one optimal value results from k = 3, the four optimal values correspond to k = 4, while the seven optimal values are obtained when k equals 5. All the optimal values are highlighted in bold in Tables 7–9, respectively.

Table 11
The optimal values of the five metrics produced by the fuzzy predictors

Fuzzy predictors             Metrics  Optimal values
                                      Trapezoidal  Step-wise
Distance-weighted predictor  mae      0.0414       0.0381
                             rmse     0.0861       0.09
                             rrse     0.3573       0.3533
                             rae      0.1992       0.1869
                             cc       0.9367       0.9369
Mean-valued predictor        mae      0.0374       0.0346
                             rmse     0.0876       0.0875
                             rrse     0.3459       0.3463
                             rae      0.1778       0.1670
                             cc       0.9397       0.9379

For easy comparison of the two fuzzy functions, the optimal values of the four error measures and the correlation coefficient produced by the fuzzy predictors in conjunction

with these fuzzy functions are summarized in Table 11 according to Tables 7–10. It should be noted that the optimal value is the one corresponding to the smallest error metrics, while yielding the largest correlation coefficient.

From Table 11, the following interesting points can be seen:

1. The optimal values of the five metrics show a good performance of the fuzzy predictors because both the distance-weighted and the mean-valued fuzzy predictors lead to very small errors and very high correlation coefficients. For the distance-weighted fuzzy predictor, even the largest errors produced by the two fuzzy functions are very small. For example, the largest values of mae, rmse, rrse, and rae are 0.0414, 0.0906, 0.3573, and 0.201, respectively. Also, rrse and rae can be expressed as percentages, 35.73% and 20.1%, since they are relative errors. The largest value of cc is 0.9381 (cc varies from 0 to 1 in this study; usually the larger the value of cc, the better the predictors are). For the mean-valued fuzzy predictor, the same is also true.
2. For both fuzzy predictors, mae and rae due to the step-wise fuzzy function are smaller than those due to the trapezoidal fuzzy function. Therefore, the step-wise fuzzy function is best for both the mean absolute error and the relative absolute error. The rmse yielded by the mean-valued fuzzy predictor in conjunction with the trapezoidal function (0.0876) is almost equal to that in conjunction with the step-wise function (0.0875). Which fuzzy function is best suited for the rrse depends on the fuzzy predictor.

8. Conclusions

The results presented in this paper are part of a more comprehensive study described in more detail in a PhD dissertation [7]. The paper mainly investigates the effectiveness of monitoring bridge health using the FCBR model. The two fuzzy predictors (mean-valued and distance-weighted) and the two fuzzy membership functions (trapezoidal and step-wise) have been explored here. Other types of membership functions (triangular and Gaussian) have been investigated but are not reported in this paper. Meanwhile, the threshold parameters for the trapezoidal and the step-wise fuzzy functions and the number k of nearest neighbors are locally optimized such that this model can achieve the best performance. All algorithms are implemented in the object-oriented programming language C++. The performance of the developed model is measured using five different error metrics. The model is validated using actual data sets extracted from the electronic database of bridge deck inspections of KDOT, based on the 10-fold cross-validation method. According to the experiments, the following conclusions can be made:

1. It is feasible to apply fuzzy case-based reasoning algorithms to monitor highway bridge health. The predictive errors by the fuzzy k-NN predictors are very small (no matter what metrics are used) and the correlation coefficients reach more than 0.9.
2. Based on the actual data used in this study, the mean-valued fuzzy k-NN predictor is more efficient than the distance-weighted fuzzy k-NN predictor. Regardless of which metrics and fuzzy membership functions are used, almost all the optimal results by the former are better than the corresponding ones by the latter. The only exception is the rmse due to the trapezoidal fuzzy membership function at k = 4, which is worse by the mean-valued predictor than by the distance-weighted predictor.
3. Generally speaking, the step-wise fuzzy function is slightly better than the trapezoidal fuzzy function.
4. Based on the same data sets (training/test), usually the smaller the error is, the larger the correlation coefficients are.
5. The threshold values for the trapezoidal and step-wise fuzzy functions can be adjusted so that the fuzzy system achieves the best performance.
6. The optimal value of k depends on factors such as the fuzzy membership functions and the algorithms of fuzzy predictors.
7. The data sets (including training cases and test cases) heavily influence the learning performance of the fuzzy system. For example, among any 10 runs, the data sets (training/test) for the first run always produce larger errors than the other runs.

References

[1] Aamodt A, Plaza E. Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Commun 1994;7:39–59.
[2] American Association of State Highway and Transportation Officials (AASHTO). Guidelines for bridge management systems. Washington, DC; 1993.
[3] Bergmann R. On the use of taxonomies for representing case features and local similarity measures. Proceedings of the 6th German Workshop on Case-Based Reasoning (GWCBR'98); Berlin, Germany. 1998; 23–32.
[4] Bezdek LA. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press; 1981.
[5] Boussabaine AH. The use of artificial neural networks in construction management: a review. Construction Management and Economics, E&FN Spon 1996;14:427–36.
[6] Burkhard HD, Richter MM. On the notion of similarity in case based reasoning and fuzzy theory. In: Pal Sankar K, Dilon Tharam S, Yeung Daniel S, editors. Soft computing in case-based reasoning, Chapter 2. London: Springer; 2001.
[7] Cheng Y. Development of Bridge Management Systems Using Fuzzy Case-Based Reasoning. PhD Dissertation. Kansas State University, Manhattan, KS, USA; 2005.

[8] Cheng Y, Melhem H. Numeric prediction algorithms for bridge corrosion. Proceedings of the 10th International Conference on Computing in Civil and Building Engineering (ICCCBE), Weimar, Germany. 2004.
[9] Cheng Y, Melhem H. Application of fuzzy case-based reasoning to bridge management system. Proceedings of International Conference on Computing in Civil Engineering, ASCE, Cancun, Mexico; 2005.
[10] Freyermuth CL, Klieger P, Stark DC, Wenke HN. Durability of concrete bridge decks-a review of cooperative studies. Transport Res Record, TRB 1970;328:50–60.
[11] Kolodner JL. Case-based reasoning. San Francisco, USA: Morgan Kaufmann Publisher; 1993.
[12] Maher ML, Balachandran B. Multimedia approach to case-based structural design. J Comput Civil Eng 1994;8(3):359–76.
[13] Melhem HG, Cheng Y. Prediction of remaining service life of bridge decks using machine learning. J Comput Civil Eng, ASCE 2003;17(1):1–9.
[14] Melhem HG, Cheng Y, Kossler D, Scherschligt D. Wrapper methods for inductive learning: example application to bridge decks. J Comput Civil Eng, ASCE 2003;17(1):46–57.
[15] Melhem H, Cheng Y, Kossler D. Fuzzy case-based reasoning for bridge management. Proceedings of the 11th International Workshop for Intelligent EG-ICE. Weimar, Germany 2004;2004:76–85.
[16] Morcous G. Case-based reasoning for modeling bridge deterioration. PhD Dissertation. Concordia University, Canada; 2000.
[17] Morcous G, Rivard H, Hanna AM. Case-based reasoning system for modeling infrastructure deterioration. J Comput Civil Eng, ASCE 2002a;16(2):104–14.
[18] Morcous G, Rivard H, Hanna AM. Modeling bridge deterioration using case-based reasoning. J Infrastruct Syst, ASCE 2002b;8(3):86–95.
[19] PONTIS. Pontis Bridge Management, Release 4, Technical Manual. American Association of State Highway and Transportation Officials, Washington, DC; 2001.
[20] Reich Y, Fenves SJ. A system that learns to design cable-stayed bridges. J Struct Eng 1995;121(7):1090–100.
[21] Roddis KWM, Bocox J. Case-based approach for steel bridge fabrication errors. J Comput Civil Eng 1997;11(2):84–91.
[22] Shepard RW, Johnson MB. A diagnostic tool to maximize bridge longevity, investment. Transport Res News 2001;215:6–11.
[23] Thompson PD, Shepard RW. AASHTO Commonly-recognized bridge elements - successful applications and lessons learned. National Workshop on Commonly Recognized Measures for Maintenance; 2000.
[24] Washburn L. Personal contact. Bridge evaluation engineer, KDOT, Topeka, Kansas; 2002.
[25] Watson I. Applying case-based reasoning: techniques for enterprise systems. San Mateo, CA, USA: Morgan Kaufmann Publisher; 1997.
[26] Witten IH, Frank E. Data mining. San Francisco, CA, USA: Morgan Kaufmann Publisher; 2000.
