
KNOWLEDGE DISCOVERY

Applying Knowledge Discovery to Predict Water-Supply Consumption
Aijun An, Christine Chan, Ning Shan, Nick Cercone, and Wojciech Ziarko,
University of Regina, Saskatchewan, Canada

A ROUGH-SET METHOD GENERATES PREDICTION RULES FROM OBSERVED DATA, USING STATISTICAL INFORMATION INHERENT IN THE DATA TO HANDLE INCOMPLETE AND AMBIGUOUS TRAINING SAMPLES. EXPERIMENTAL RESULTS INDICATE THAT THIS METHOD PROVIDES MORE PRECISE INFORMATION THAN IS AVAILABLE THROUGH KNOWLEDGE ACQUISITION FROM HUMAN EXPERTS.

IEEE EXPERT, 0885-9000/97/$10.00 © 1997 IEEE

OPTIMIZING CONTROL OF OPERATIONS in a municipal water-distribution system can reduce electricity costs and realize other economic benefits. However, optimal control requires an ability to precisely predict short-term water demand so that minimum-cost pumping schedules can be prepared. One of the objectives of our project to develop an intelligent system for monitoring and controlling municipal water-supply systems is to ensure optimal control and reduce energy costs. Hence, prediction of water demand is essential. In this article, we present an application of a rough-set approach for the automated discovery of rules from a set of data samples for daily water-demand prediction. The database contains 306 training samples, covering information on seven environmental and sociological factors and their corresponding daily volume of distribution flow.

The problem domain

The problem domain is typical of the water-distribution systems of moderate-sized cities in North America. The sources of water are a lake and several underground wells. Water is first pumped to reservoirs at various locations in the city, and then from the reservoirs to the distribution system, or to another reservoir when it is necessary to adjust water levels. Pumps and valves, housed in pumping stations, control the system's pressures and flow rates.

Human operators control the distribution system's operations at a central pumping station. The operators use heuristics, or rules of thumb, to minimize the cost of power used by pumps, to make demand forecasts, and to keep the water levels of reservoirs within reasonable ranges. These heuristics depend on several economic, environmental, and sociological factors. A sample heuristic is:

    If the weather in the last three days is hot and dry, and the weather in the next three days is expected to be hot and dry, and the time before high demand is expected to be less than or equal to eight hours, then use a large pump and run it for a short time.

Because several operators control the system, standardizing and optimizing the distribution system's operations is difficult. Documenting the heuristics of the most experienced operator in an intelligent system is one way to reduce operating costs in the supply and distribution of purified water.

To develop our expert system, we conducted knowledge acquisition through structured and unstructured interviews with human experts, and we obtained heuristics for cost-effective water-utility operations. Analysis of these heuristics indicates that accurately estimating short-term (daily) forecasts of water demand is important so that energy-saving pumping schedules can be derived. However, the prediction of water demand is poorly understood, even by the experts, who approximate daily water demand based on their experience. The often inaccurate estimations result in inefficient operation of the water-distribution system. The lack of knowledge about water-demand prediction translates to a gap in the expert system's knowledge base. In other words, manual knowledge acquisition by itself is inadequate for handling all the situations that can arise in a complex engineering application.

The relevant studies on demand prediction have focused on developing mathematical models based on statistical studies of historical data.1 In our project, we focused on using machine-learning techniques for generating rules on water-demand prediction, because mathematical modeling considers quantitative information alone, whereas automated learning from observed data can consider both quantitative and qualitative variables. Because the database's training samples are incomplete and possibly ambiguous, exact decision rules cannot be derived by standard methods.2-5 We propose a method for generating classification rules from incomplete information. This method extends the rough-set model,6 and uses statistical information to define the positive and negative regions of a concept. Each classification rule generated by our learning system is characterized by an uncertainty factor, which is the probability that an object matching the condition part of the rule belongs to the concept.

Data collection and representation

We obtained data on the water domain from the municipal water department in Regina, Saskatchewan, Canada, and from Environment Canada. The former provided us with historical data on water consumption, and the latter on weather conditions.

Characteristics of water demand. The instantaneous consumption of water in an urban distribution system depends on the many industrial, commercial, public, and domestic consumers distributed throughout the area supplied. Factors such as weather conditions, seasonal variation, day of the week, and whether a particular day is an observed holiday can all influence this consumption. We have found that expert operators base their water-demand predictions primarily on weather-related considerations and day of the week. Therefore, in this article, we use weather conditions and day of the week as the condition attributes, and consumer water demand as the decision attribute.

Condition attributes. Seven factors can affect the daily consumption of water in a city (see Table 1). The first factor is day of the week, which we chose because of the observation that on weekends the daily total-distribution flows are usually less than on weekdays. The city also has the power, through bylaws, to restrict watering of lawns on Wednesdays. Furthermore, Mondays are days of high water usage, because many people do their laundry on Mondays and because, in the summer, people water their lawns on the day after returning from a weekend at their cottages. The remaining factors are weather conditions: temperature, humidity, precipitation, and wind. We obtained the values of these factors from monthly meteorological summaries printed by Environment Canada.

Table 1. Condition factors for water-demand prediction.

LABEL   CONDITION ATTRIBUTE
a0      Day of the week
a1      Maximum temperature
a2      Minimum temperature
a3      Average humidity
a4      Rainfall
a5      Snowfall
a6      Average wind speed

Decision attribute. The total amount of water consumed by the city each day is calculated by summing the daily distribution flows metered at each pumping station of the city. The operator at the central control station records this number every day. The value varies from 50 megaliters on the coldest winter days to 180 megaliters in the summer.

Information systems. We assume that the given set of training samples represents the knowledge about the domain. In our approach, the training set is described by an information system.4 The objects in a universe U are described by a set of attribute values. Formally, an information system S is a quadruple <U, A, V, f>, where U = {x1, x2, ..., xN} is a finite set of objects, and A is a finite set of attributes. The attributes in A are further classified into two disjoint subsets: the condition attributes C and the decision attribute D, such that A = C ∪ D and C ∩ D = ∅. V = ∪a∈A Va is the set of attribute values, where Va is the domain of attribute a (the set of values of attribute a). f : U × A → V is an information function that assigns particular values from the domains of attributes to objects, such that f(xi, a) ∈ Va for all xi ∈ U and a ∈ A.

The 306 training samples in our information system for water-demand prediction are objects that include daily information on the condition attributes and the decision attribute for 10 months, from March to December 1994. Table 2 lists eight of these objects.

Table 2. Information system for water-demand prediction.

            condition attributes C
OBJECT   a0  a1  a2  a3  a4  a5  a6    D
obj1      6   6   7   8   1   0   3    0
obj2      1   5   7   9   0   0   1    0
obj3      6   6   7   8   1   0   3    0
obj4      3   6   7   4   0   0   1    1
obj5      3   5   7   3   0   0   1    1
obj6      3   6   7   4   0   0   1    1
obj7      6   6   8   3   0   0   6    2
obj8      3   6   7   4   0   0   1    2

For the purpose of rough-set-based data analysis, we have generalized the information system by replacing the original attribute values with discrete ranges; for example, we have discretized attribute a2 (minimum temperature) into 10 categories, denoted 0 through 9. For instance, categories 7 and 8 for a2, which appear in Table 2, stand for the ranges (6.46, 10.34] and (10.34, 14.22], respectively. (The mixed brackets denote a half-open interval: 6.46 does not belong to the range (6.46, 10.34], but 10.34 does.) The question of how to optimally discretize the attribute values is a subject of ongoing research; the method of discretization we have adopted is just one of many possibilities. Our experimental results are based on the whole set of training samples (the 306-object information system).

Data analysis

In what follows, we present the basics of an extended rough-set model, referred to as the probabilistic model of rough sets. The probabilistic model underlies the rule-extraction technique adopted in our research.
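As a concrete companion to the analysis that follows, the Python sketch below (our illustration, not the authors' KDD-R implementation) encodes the eight objects of Table 2, groups them into equivalence classes by their condition-attribute values, and computes, for the concept of water demand D = 1, the conditional probability of each class along with the β-positive and β-negative regions at β = 0.6. Representing each object as a plain tuple of integers is an assumption of this sketch.

```python
from collections import defaultdict

# The eight objects of Table 2: a tuple of condition-attribute values
# (a0..a6) plus the decision value D. Plain integer codes are our encoding.
table2 = {
    "obj1": ((6, 6, 7, 8, 1, 0, 3), 0),
    "obj2": ((1, 5, 7, 9, 0, 0, 1), 0),
    "obj3": ((6, 6, 7, 8, 1, 0, 3), 0),
    "obj4": ((3, 6, 7, 4, 0, 0, 1), 1),
    "obj5": ((3, 5, 7, 3, 0, 0, 1), 1),
    "obj6": ((3, 6, 7, 4, 0, 0, 1), 1),
    "obj7": ((6, 6, 8, 3, 0, 0, 6), 2),
    "obj8": ((3, 6, 7, 4, 0, 0, 1), 2),
}

# Equivalence classes of the indiscernibility relation R(C): objects with
# identical condition-attribute tuples are indistinguishable.
classes = defaultdict(set)
for name, (cond, _) in table2.items():
    classes[cond].add(name)

# The concept Y1: all objects whose decision value is 1.
y1 = {name for name, (_, d) in table2.items() if d == 1}

# Conditional probability P(Y1 | X) of each class X, and the resulting
# beta-positive and beta-negative regions of Y1 at beta = 0.6.
beta = 0.6
pos, neg = set(), set()
for members in classes.values():
    p = len(members & y1) / len(members)
    (pos if p >= beta else neg).update(members)

print(sorted(pos))  # ['obj4', 'obj5', 'obj6', 'obj8']
print(sorted(neg))  # ['obj1', 'obj2', 'obj3', 'obj7']
```

Setting beta = 1.0 instead reproduces the standard lower approximation, leaving only obj5 in the positive region.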

JULY/AUGUST 1997
Indiscernibility relation. An information system provides only partial information about the universe. That is, the objects described by the fixed set of selected attributes might be insufficient for characterizing the objects uniquely. Any two objects are indistinguishable from each other whenever they assume the same attribute values. This means we might not be able to distinguish all the objects solely by means of the admitted attributes and their values.

Given an information system <U, A, V, f>, let B be a subset of A, and let xi and xj be members of U. A binary relation R(B), called an indiscernibility relation, is defined as R(B) = {(xi, xj) ∈ U² | ∀a ∈ B, f(xi, a) = f(xj, a)}. We say that xi and xj are indiscernible with respect to the set of attributes B in S if f(xi, a) = f(xj, a) for every a ∈ B. For example, in the information system shown in Table 2, obj1 and obj3 are indiscernible with respect to the set of attributes C ∪ D, and obj4, obj6, and obj8 are indiscernible with respect to the set of condition attributes C.

R(B) is an equivalence relation on U for every B ⊆ A. Thus, we can define two natural equivalence relations, R(C) and R(D), on U for an information system S. A concept Y is an equivalence class of the relation R(D). Without loss of generality, we can consider D as a singleton set. Our objective is to construct decision rules for each concept. Given a concept Y, the partition of U with respect to this concept is defined as R*(D) = {Y, U − Y} = {Y, ¬Y}.

Based on the set of condition attributes C, an object xi specifies the equivalence class [xi]R of the relation R(C):

    [xi]R = {xj ∈ U | ∀a ∈ C, f(xj, a) = f(xi, a)}

We say that xi ∈ U definitely belongs to a concept Y if [xi]R ⊆ Y, and that xi ∈ U possibly belongs to the concept Y if [xi]R ∩ Y ≠ ∅. We define conditional probabilities as

    P(Y | [xi]R) = P(Y ∩ [xi]R) / P([xi]R)

where P(Y | [xi]R) is the probability of occurrence of event Y conditioned on event [xi]R. That is, P(Y | [xi]R) = 1 if and only if [xi]R ⊆ Y; P(Y | [xi]R) > 0 if and only if [xi]R ∩ Y ≠ ∅; and P(Y | [xi]R) = 0 if and only if [xi]R ∩ Y = ∅.

Example 1. Let us consider the information system given in Table 2. The concepts in this information system, the equivalence classes of the relation R(D), are

    Y0 = {obj1, obj2, obj3}
    Y1 = {obj4, obj5, obj6}
    Y2 = {obj7, obj8}

The equivalence classes of the relation R(C) are

    X1 = [obj1]R = [obj3]R = {obj1, obj3}
    X2 = [obj2]R = {obj2}
    X3 = [obj4]R = [obj6]R = [obj8]R = {obj4, obj6, obj8}
    X4 = [obj5]R = {obj5}
    X5 = [obj7]R = {obj7}

Because X4 ⊆ Y1, obj5 definitely belongs to concept Y1. The objects obj4, obj6, and obj8 possibly belong to concept Y1, because the intersection of their equivalence class, X3, with Y1 is not empty. The other objects do not belong to Y1. The conditional probability of each equivalence class is

    P(Y1 | X1) = 0
    P(Y1 | X2) = 0
    P(Y1 | X3) = 2/3 = 0.667
    P(Y1 | X4) = 1
    P(Y1 | X5) = 0

β-probabilistic approximation classification. Given an information system S = <U, A, V, f> and an equivalence relation R(C) (an indiscernibility relation) on U, an ordered pair AS = <U, R(C)> is called an approximation space4 based on the condition attributes C. The equivalence classes of the relation R(C) are called elementary sets in AS, because they represent the smallest groups of objects that are distinguishable in terms of the attributes and their values. Let Y ⊆ U be a subset of objects representing a concept, and let R*(C) = {X1, X2, ..., Xn} = {[x1]R, [x2]R, ..., [xn]R} be the collection of equivalence classes induced by the relation R(C). In the standard rough-set model, the lower and upper approximations of a set Y are defined by

    R(C)(Y) = ∪ {Xi ∈ R*(C) | P(Y | Xi) = 1}   (lower approximation)

and

    R(C)(Y) = ∪ {Xi ∈ R*(C) | P(Y | Xi) > 0}   (upper approximation)

The above definitions do not use the statistical information in the boundary region, the difference between the upper and lower approximations. For this reason, several extensions to the original rough-set model have been proposed.6-9,11 In our approach, we try to rectify this limitation by introducing a β-approximation space.

A β-approximation space ASP is a triple <U, R(C), P>, where P is a probability measure and β is a real number in the range (0.5, 1]. The β-approximation space ASP can be divided into the following regions:

β-positive region of the set Y:

    POSCβ(Y) = ∪ {Xi ∈ R*(C) | P(Y | Xi) ≥ β}

β-negative region of the set Y:

    NEGCβ(Y) = ∪ {Xi ∈ R*(C) | P(Y | Xi) < β}

The β-positive region of the set Y corresponds to all elementary sets of U that can be classified into the concept Y with conditional probability P(Y | Xi) greater than or equal to the parameter β. Similarly, the β-negative region of the set Y corresponds to all elementary sets of U that can be classified into the complement ¬Y.

Let xi ∈ U be an object; POSCβ(Y) and NEGCβ(Y) are the β-positive and β-negative regions of the concept Y. The object xi is classified as belonging to the concept Y if and only if xi ∈ POSCβ(Y), or to the complement ¬Y of the concept Y if and only if xi ∈ NEGCβ(Y). We want to decide whether xi is in the concept Y on the basis of the set of equivalence classes in ASP rather than on the basis of the set Y. This means we deal with POSCβ(Y) and NEGCβ(Y) instead of the set Y. If xi ∈ U is in POSCβ(Y), it can be classified into the concept Y with conditional probability P(Y | Xi) greater than or equal to the parameter β.

Example 2. Following from Example 1, recall that Y1 = {obj4, obj5, obj6}. If we let β = 1, then the β-positive region of the set Y1 is POSCβ(Y1) = {obj5}, and the β-negative region is NEGCβ(Y1) = {obj1, obj2, obj3, obj4, obj6, obj7, obj8}. If we let β = 0.6, then the β-positive region of the set Y1 is POSCβ(Y1) = {obj4, obj5, obj6, obj8}, and the β-negative region is NEGCβ(Y1) = {obj1, obj2, obj3, obj7}.

Reduction of condition attributes. An information system often includes some condition attributes that do not provide any additional information about the objects in U. Eliminating those attributes can reduce the complexity and cost of the decision process. We use the concept of a reduct in rough sets to describe the method of condition-attribute reduction.

Given an attribute-value system S = <U, C ∪ {d}, V, f>, an attribute a is dispensable in C with respect to {d} if POSC−{a}(Y) = POSC(Y); otherwise, a is an indispensable attribute in C with respect to {d}. A subset of condition attributes B ⊆ C is a dependent set in S with respect to {d} if a proper subset K ⊂ B exists such that POSB(Y) = POSK(Y); otherwise, B is an independent set with respect to {d}. A reduct C′ of attributes C is a maximal independent subset of condition attributes with respect to {d}.4

The procedure for finding a single reduct is straightforward. Consider a condition attribute a ∈ C. If the β-positive region POSC−{a}(Y) of the set Y is the same as POSC(Y), then the attribute a is marked as redundant and is removed from the set of condition attributes C. Other superfluous condition attributes can be removed in the same manner. The remaining set of condition attributes is a reduct. More than one reduct can exist for a given attribute-value system. Selection of the best reduct depends on the optimality criterion associated with the attributes. We can also assign significance values to attributes and base the selection on those values.

Example 3. The set of condition attributes C of the information system in Table 2 has a total of seven reducts. Table 3 shows a reduced information system based on the reduct {a1, a3} with respect to the concept Y1. The objects with value 1 for attribute Y1 belong to the β-positive region of the concept Y1; the objects with value 0 for attribute Y1 belong to the β-negative region of concept Y1. Here, β = 0.6.

Table 3. Reduct table with respect to Y1 after reduction from Table 2.

OBJECTS            a1   a3   Y1   NUMBER OF OBJECTS IN Y1   NUMBER OF OBJECTS IN ¬Y1
obj4, obj6, obj8    6    4    1    2                         1
obj5                5    3    1    1                         0
obj1, obj3          6    8    0    0                         2
obj2                5    9    0    0                         1
obj7                6    3    0    0                         1

Generating decision rules

Rule generation is a crucial task in any learning system. We now describe how decision rules are generated based on the reduct obtained in the previous section.

Probabilistic decision rules. Let R*(RED) = {X1, X2, ..., Xn} be the collection of equivalence classes of the relation R(RED), where RED is a reduct, that is, a reduced set of condition attributes C in S, and let R*(D) = {Y, ¬Y} be the partition induced by the decision attribute. Each equivalence class Xi of the equivalence relation R(RED) is associated with a unique combination of values of the attributes belonging to RED. This combination of values is called the description of the equivalence class Xi ∈ R*(RED). We can express the description of Xi as

    Des(Xi) = ∧a∈RED (a = f(xi, a))

where ∧ denotes the conjunction operator, and xi is an object in the equivalence class Xi. Similarly, the descriptions of Y and ¬Y are

    Des(Y) = (D = f(xi, D)), and
    Des(¬Y) = (D ≠ f(xi, D))

where D is the decision attribute and xi ∈ Y.

The following decision rules describe the relationship between the partition R*(RED) and the partition R*(D): for Xi ∈ R*(RED),

    Des(Xi) →ci Des(Y), if P(Y | Xi) ≥ β
    Des(Xi) →ci Des(¬Y), if P(Y | Xi) < β

where ci, written over the arrow, is the uncertainty factor, which is equal to P(Y | Xi) in the first case and 1 − P(Y | Xi) in the second. This means that if an object xi satisfies the description Des(Xi) and if P(Y | Xi) ≥ β, then the object xi definitely belongs to Y with uncertainty ci. Similarly, if P(Y | Xi) < β, then the object xi possibly belongs to the complementary concept ¬Y with uncertainty ci.

Example 4. Following from Example 3, let RED denote the reduct {a1, a3}. The collection R*(RED) of the equivalence classes of the relation R(RED) is

    R*(RED) = {X1, X2, X3, X4, X5} = {{obj4, obj6, obj8}, {obj5}, {obj1, obj3}, {obj2}, {obj7}}

The descriptions of these equivalence classes are

    Des(X1) = (a1 = 6) ∧ (a3 = 4)
    Des(X2) = (a1 = 5) ∧ (a3 = 3)
    Des(X3) = (a1 = 6) ∧ (a3 = 8)
    Des(X4) = (a1 = 5) ∧ (a3 = 9)
    Des(X5) = (a1 = 6) ∧ (a3 = 3)

The descriptions of Y1 and ¬Y1 are

    Des(Y1) = (D = 1)
    Des(¬Y1) = (D ≠ 1)

Because Y1 = {obj4, obj5, obj6}, we calculate the conditional probabilities as

    P(Y1 | X1) = 2/3 = 0.67
    P(Y1 | X2) = 1
    P(Y1 | X3) = 0
    P(Y1 | X4) = 0
    P(Y1 | X5) = 0

We then obtain the decision rules with respect to the concept Y1 as follows:

    r1+: (a1 = 6) ∧ (a3 = 4) →0.67 (D = 1)
    r2+: (a1 = 5) ∧ (a3 = 3) →1 (D = 1)
    r1−: (a1 = 6) ∧ (a3 = 8) →1 (D ≠ 1)
    r2−: (a1 = 5) ∧ (a3 = 9) →1 (D ≠ 1)
    r3−: (a1 = 6) ∧ (a3 = 3) →1 (D ≠ 1)

Rule generalization. As indicated in the previous subsection, we can obtain decision rules directly from the reduced information system: one rule for each equivalence class of the partition R*(RED). However, such rules might contain attributes whose values are irrelevant for determining the target concept. We can generalize the rules further by inspecting which conditions in a rule can be removed without causing any inconsistency. A decision rule obtained by dropping the maximum possible number of conditions is called a maximally general rule. By construction, maximally general rules contain a minimum number of conditions. We use the decision-matrix technique to find all the maximally general rules.10

Let Xi+ (i = 1, 2, ...) denote the equivalence classes in R*(RED) such that Xi+ ⊆ POSRED(Y), and let Xj− (j = 1, 2, ...) denote the equivalence classes in R*(RED) such that Xj− ⊆ NEGRED(Y). We define a decision matrix M = (Mij) as

    Mij = {(a, f(Xi+, a)) : a ∈ RED, f(Xi+, a) ≠ f(Xj−, a)}

where a is a condition attribute belonging to RED. That is, Mij contains all attribute-value pairs whose values differ between the equivalence class Xi+ and the equivalence class Xj−.

We obtain the set of decision rules computed for a given equivalence class Xi+ by treating each element of Mij as a Boolean expression and constructing the following Boolean function:

    Bi = ∧j (∨ Mij)

where ∧ and ∨ are the usual conjunction and disjunction operators. We can show that the prime implicants of the Boolean function Bi are the maximally general rules for the equivalence class Xi+ belonging to the positive learning region POSRED(Y). Thus, by finding the prime implicants of all the decision functions Bi, we can compute all the maximally general rules for the positive learning region POSRED(Y).

Example 5. Following from Example 4, the equivalence classes of the relation R(RED) are

    X1+ = X1 = {obj4, obj6, obj8}
    X2+ = X2 = {obj5}
    X1− = X3 = {obj1, obj3}
    X2− = X4 = {obj2}
    X3− = X5 = {obj7}

Table 4 shows the decision matrix M = (Mij), with i = 1, 2 and j = 1, 2, 3. We then obtain the Boolean functions for each Xi+ (i = 1, 2) as follows:

    B1 = (a3 = 4) ∧ ((a1 = 6) ∨ (a3 = 4)) ∧ (a3 = 4) = (a3 = 4)
    B2 = ((a1 = 5) ∨ (a3 = 3)) ∧ (a3 = 3) ∧ (a1 = 5) = (a1 = 5) ∧ (a3 = 3)

Therefore, the maximally general rule for the equivalence class X1+ is

    (a3 = 4) →0.67 (D = 1)

and the maximally general rule for the equivalence class X2+ is

    (a1 = 5) ∧ (a3 = 3) →1 (D = 1)

Table 4. The decision matrix for the rules with respect to concept Y1.

        X1−                 X2−                 X3−
X1+     (a3, 4)             (a1, 6), (a3, 4)    (a3, 4)
X2+     (a1, 5), (a3, 3)    (a3, 3)             (a1, 5)

Given the set of all maximally general rules for an information system S, our system provides options to find the set of minimal rules and the set of minimal covering rules. The support set of a rule ri, denoted supp(ri), is the collection of rows of the original table satisfying the condition part of ri. A collection of rules RUL is a set of minimal rules if, for every ri ∈ RUL,

    supp(ri) ⊄ supp(rj), for all rj ∈ RUL, rj ≠ ri

A collection of rules RUL is a set of minimal covering rules if, for every ri ∈ RUL,

    supp(ri) ⊄ ∪ {supp(rj) : rj ∈ RUL, rj ≠ ri}

Example 6. As shown in Example 5, we can use a decision matrix to calculate the maximally general rules for the concept Y1. Similarly, we can obtain the maximally general rules for concepts Y0 and Y2. The set of all maximally general rules for the information system in Table 2 is

    r1: (a3 = 8) →1 (D = 0)
    r2: (a3 = 9) →1 (D = 0)
    r3: (a3 = 4) →0.67 (D = 1)
    r4: (a1 = 5) ∧ (a3 = 3) →1 (D = 1)
    r5: (a1 = 6) ∧ (a3 = 3) →1 (D = 2)

where r1 and r2 are for the concept Y0, r3 and r4 are for the concept Y1, and r5 is for the concept Y2. The support sets of these rules are

    supp(r1) = {obj1, obj3}
    supp(r2) = {obj2}
    supp(r3) = {obj4, obj6, obj8}
    supp(r4) = {obj5}
    supp(r5) = {obj7}

Because these support sets do not overlap, the rules are both minimal rules and minimal covering rules.

Experimental results

We have implemented this method in the KDD-R (Knowledge Discovery in Databases: Rough Sets Approach) system developed at the University of Regina.9 The system performs data analysis, database mining, pattern recognition and validation, and expert-system building. We tested the proposed method by applying the KDD-R system to the data for water-demand prediction. Our objective is to analyze the set of training data and generate a set of decision rules to predict a city's daily water demand. As we mentioned, the set of training samples consists of 306 objects collected over a period of 10 months and includes information on day of the week, weather conditions, and daily water consumption.

In our experiment, we divided the values of the decision attribute into ranges so that each range forms a concept of the information system. The KDD-R system generated a total of 188 rules. Table 5 lists the number of rules generated for each concept; NC stands for the number of training samples covered by the rules.

The most general rule for the concept D = (60, 70] is

    (−13.80 < a2 ≤ −1.40) ∧ (81.60 < a3 ≤ 93.00) →1 (60 < D ≤ 70)

This rule covers 12.37% of the training objects belonging to the concept. The rule states that

    If the minimum temperature is within the range (−13.80°C, −1.40°C] and the average humidity is within (81.60%, 93%], then the water demand is between 60 megaliters and 70 megaliters, with an uncertainty factor of 1.
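Rules of this form are directly executable. The sketch below is our illustration, not KDD-R output: it hand-encodes two of the rules reported in this section as interval tests over a day's raw attribute values, and returns the matching demand ranges together with their uncertainty factors. The dictionary keys and the day-of-week encoding are assumptions of the sketch.

```python
# Each discovered rule: a predicate over a day's raw attribute values,
# the predicted demand range in megaliters, and the uncertainty factor.
RULES = [
    # (-13.80 < a2 <= -1.40) and (81.60 < a3 <= 93.00) => 60 < D <= 70
    (lambda d: -13.80 < d["min_temp"] <= -1.40 and 81.60 < d["humidity"] <= 93.00,
     (60, 70), 1.0),
    # (a0 = Monday) and (-31.50 <= a2 <= -13.80) => 70 < D <= 80
    (lambda d: d["day"] == "Monday" and -31.50 <= d["min_temp"] <= -13.80,
     (70, 80), 1.0),
]

def predict(day):
    """Return (low, high, uncertainty) for every rule whose condition fires."""
    return [(lo, hi, c) for cond, (lo, hi), c in RULES if cond(day)]

# A cold Monday matches only the second rule.
cold_monday = {"day": "Monday", "min_temp": -20.0, "humidity": 70.0}
print(predict(cold_monday))  # [(70, 80, 1.0)]
```

A real forecaster would evaluate all 188 rules this way and, where several fire, prefer the conclusion with the highest uncertainty factor; that tie-breaking policy is our assumption, not a detail the article specifies.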

Table 5. Number of rules for different concepts.

CONCEPT       NUMBER OF RULES   NC
[53, 60]             3            3
(60, 70]            43           97
(70, 80]            67          120
(80, 90]            32           38
(90, 100]           16           18
(100, 110]           8            9
(110, 120]           6            7
(120, 130]           4            5
(130, 140]           3            3
(140, 176]           5            6

The most general rule for the concept D = (70, 80] is

    (a0 = M) ∧ (−31.50 ≤ a2 ≤ −13.80) →1 (70 < D ≤ 80)

This rule covers 4.17% of the training objects belonging to the concept. It states that

    If the day of the week is Monday and the minimum temperature is within the range [−31.50°C, −13.80°C], then the water demand is between 70 megaliters and 80 megaliters, with an uncertainty factor of 1.

The most general rule for the concept D = (80, 90] is

    (a0 = F) ∧ (6.70 < a1 ≤ 34.10) ∧ (4.50 ≤ a6 ≤ 8.70) →1 (80 < D ≤ 90)

This rule covers 7.90% of the training objects belonging to the concept. It states that

    If the day of the week is Friday and the maximum temperature is within the range (6.70°C, 34.10°C] and the average wind speed is within [4.50 km/hr, 8.70 km/hr], then the water demand is between 80 megaliters and 90 megaliters, with an uncertainty factor of 1.

The most general rule for the concept D = (100, 110] is

    (a0 = SU) ∧ (47.40 < a3 ≤ 53.10) ∧ (17.10 < a6 ≤ 21.30) →1 (100 < D ≤ 110)

This rule covers 22.22% of the training objects belonging to the concept. It states that

    If the day of the week is Sunday and the average humidity is within (47.40%, 53.10%] and the average wind speed is within (17.10 km/hr, 21.30 km/hr], then the water demand is between 100 megaliters and 110 megaliters, with an uncertainty factor of 1.

To evaluate the rules derived by our method, we conducted a leave-ten-out experiment, using 90% of the data for training and the remaining 10% for testing. Because the error rate depends on the selection of training samples, we conducted the experiment 10 times. The best error rate of prediction was 6.67%, and the average error rate of prediction was 10.27%.

The intelligent system

The knowledge-discovery module for water-demand prediction is one component of the intelligent system for monitoring and controlling the water-distribution system. The intelligent system consists of five modules:

•  the expert-system module,
•  the energy-management module,
•  the water-demand prediction module,
•  the pipeline-network-simulation module, and
•  the scheduling and planning module.

The expert system consists of a knowledge base for detecting faults and recommending a series of adjustments on equipment at a given state of operations. The energy-management module informs users of the most cost-effective combination of pumps and valves among the possible configurations suggested by the expert system. The prediction module forecasts water demands; it has been implemented using various AI techniques, including neural networks, fuzzy sets, case-based reasoning, and knowledge discovery from databases. The work presented in the main text of the article describes the knowledge-discovery component of the prediction module. The forecast of water usage, along with the recommendations on pump and valve adjustments provided by the expert system, becomes input to the simulation module, for verification purposes.

The simulation program uses an extended-period simulation to represent the dynamic behavior of flows, pressures, and water levels. The operational procedures for pumps and valves suggested by the expert system are satisfactory if the simulation module shows that

•  the pressure at the outlet of each pumping station is within the set-point limit,
•  the water level remains within the range allowable for the reservoirs, and
•  the combination of pumps and valves does not cause any reservoir to overflow or deplete within a short period of time.

The project began in May 1994, and we have completed the prototype of the expert system, the simulation program, the prediction module, and the energy-management module. We are working on the scheduling and planning module, and on integrating the different components into an intelligent system. We expect to complete the project this fall.

WE HAVE SUGGESTED A METHOD for generating prediction rules from a given set of training examples. The method is an extension of the rough-set model. The salient feature of the proposed method is that it uses the statistical information inherent in the knowledge system. Thus, our method can derive decision rules from incomplete knowledge. This capability is important, because we seldom have complete and consistent information when designing intelligent systems.

Application of this knowledge-discovery method for water-demand prediction complements manual knowledge-acquisition techniques (see the sidebar, "The intelligent system"). It offers the advantage of describing important relationships between condition factors and water consumption in terms of simple if-then rules that users can easily understand. The experimental results indicate that the proposed algorithm can generate rules for water-demand prediction, providing more precise information than is available through knowledge acquisition from human experts.

Our proposed method for machine induction is general and can be applied to other domains, such as general consumer-demand prediction, fault diagnosis, and process control. For water-demand prediction, we have also tried different sets of causal variables with the rule-induction method to further improve predictive accuracy. The results show that adding time-series data, such as yesterday's and the day before yesterday's water consumption, as conditional variables gives better results than using weather factors alone. In terms of the
… machine-induction method, we have developed a new algorithm that integrates rule induction with case-based reasoning to improve case retrieval and problem solving. We have applied this integrated method to numeric-prediction domains. Experimental results show that case-based reasoning with rule induction performs better than case-based reasoning by itself, in terms of predictive accuracy.

Acknowledgments
This research is supported by the Telecommunications Research Laboratories and the Natural Science and Engineering Research Council of Canada. We deeply appreciate the cooperation of the personnel at the City of Regina's Municipal Water Engineering and Water Supply departments. We also would like to thank Paitoon Tontiwachwuthikul for his invaluable suggestions on this work.

References
1. J. Quevedo et al., "Time Series Modelling of Water Demand: A Study on Short-Term and Long-Term Predictions," in Computer Applications in Water Supply, Vol. 1, B. Coulbeck and C. Orr, eds., Research Studies Press Ltd., Letchworth, England, 1988, pp. 268-288.
2. R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Machine Learning: An Artificial Intelligence Approach, Vol. 1, Morgan Kaufmann, San Francisco, 1983.
3. R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Machine Learning: An Artificial Intelligence Approach, Vol. 2, Morgan Kaufmann, 1986.
4. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Norwell, Mass., 1991.
5. J.R. Quinlan, "Induction of Decision Trees," Machine Learning, Vol. 1, No. 1, 1986, pp. 81-106.
6. W. Ziarko, "Variable Precision Rough Set Model," J. Computer and System Sciences, Vol. 46, No. 1, 1993, pp. 39-59.
7. J. Katzberg and W. Ziarko, "Variable Precision Rough Sets with Asymmetric Bounds," in Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, ed., Springer-Verlag, New York, 1994, pp. 167-177.
8. Z. Pawlak, S.K.M. Wong, and W. Ziarko, "Rough Sets: Probabilistic Versus Deterministic Approach," Int'l J. Man-Machine Studies, Vol. 29, No. 1, 1988, pp. 81-95.
9. W. Ziarko and N. Shan, "KDD-R: A Comprehensive System for Knowledge Discovery in Databases Using Rough Sets," Proc. Third Int'l Workshop Rough Sets and Soft Computing, San Jose State Univ., San Jose, Calif., 1994, pp. 164-173.
10. W. Ziarko and N. Shan, "A Rough Set-Based Method for Computing All Minimal Deterministic Rules in Attribute-Value Systems," Computational Intelligence: An Int'l J., Vol. 12, No. 2, 1996, pp. 223-234.
11. S.K.M. Wong and W. Ziarko, "INFER: An Adaptive Decision Support System Based on the Probabilistic Approximate Classification," Proc. Sixth Int'l Workshop Expert Systems and Their Applications, Agence de Informatique, Avignon, France, 1986, pp. 713-726.

Aijun An is a PhD candidate in the Department of Computer Science at the University of Regina. Her research interests include machine learning, case-based reasoning, knowledge engineering for developing real-time expert systems, validation and verification of knowledge-based systems, and applications of artificial intelligence techniques to practical problems. She received a BSc and an MSc in computer science from Xidian University, Xi'an, China. She is a student member of the AAAI and the IEEE Computer Society. Readers can contact An at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; aijun@cs.uregina.ca.

Christine Chan is an associate professor in the Department of Computer Science at the University of Regina. Her research interests include industrial applications of artificial intelligence, object-oriented methodologies for knowledge-based systems development, knowledge acquisition and conceptual modeling, and educational instructional software. She received MSc degrees in computer science and management information systems from the University of British Columbia. She also received a PhD in the interdisciplinary studies of computer science, philosophy, and psychology from Simon Fraser University. She is a member of the AAAI, the IEEE Computer Society, the ACM, and the Canadian Information Processing Society. Readers can contact Chan at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; chan@cs.uregina.ca; http://www.cs.uregina.ca/~chan.

Ning Shan is a PhD candidate in the Department of Computer Science at the University of Regina. His research interests include machine learning, rough sets, rule-based expert systems, knowledge discovery in databases, and real-world applications of artificial intelligence. He received a B.Eng. in computer engineering from ChengDu Electronic Engineering Institute, China, and an MSc in computer science from the University of Regina. Readers can contact Shan at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; ning@cs.uregina.ca.

Nick Cercone is a professor and the chair of the Computer Science Department at the University of Waterloo. His research interests include natural-language processing, knowledge-based systems, knowledge discovery in databases, and design and human interfaces. He received a BS in engineering science from the University of Steubenville, an MS in computer and information science from Ohio State University, and a PhD in computing science from the University of Alberta. He is a member of the ACM, the IEEE, the AAAI, the Artificial Intelligence Simulation of Behavior, the AGS, and the Association of Computational Linguistics. He recently won the Canadian Artificial Intelligence Society's Distinguished Service Award. Readers can contact Cercone at the Computer Science Dept., Faculty of Mathematics, Univ. of Waterloo, Waterloo, ON N2L 3G1, Canada; ncercone@uwaterloo.ca.

Wojciech Ziarko is a professor in the Computer Science Department at the University of Regina. His research interests are knowledge discovery in databases, machine learning, pattern classification, and control-algorithm acquisition from sensor data. These research interests are, to a large degree, motivated by the recent introduction of the theory of rough sets, which serves as a basic mathematical framework in much of his work. He received an MSc in applied mathematics from Warsaw University of Technology, Poland, and a PhD in computer science, specializing in databases, from the Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland. Readers can contact Ziarko at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; ziarko@cs.uregina.ca.

78 IEEE EXPERT
