KNOWLEDGE DISCOVERY
Applying Knowledge
Discovery to Predict
Water-Supply Consumption
Aijun An, Christine Chan, Ning Shan, Nick Cercone, and Wojciech Ziarko,
University of Regina, Saskatchewan, Canada
JULY/AUGUST 1997 73
by the fixed set of selected attributes might be insufficient for characterizing the objects uniquely. Any two objects are indistinguishable from each other whenever they assume the same attribute values. This means we might not be able to distinguish all the objects solely by means of the admitted attributes and their values.

Given an information system $\langle U, A, V, f \rangle$, let $B$ be a subset of $A$, and let $x_i$ and $x_j$ be members of $U$. A binary relation $R(B)$, called an indiscernibility relation, is defined as

$$R(B) = \{(x_i, x_j) \in U^2 \mid \forall a \in B,\ f(x_i, a) = f(x_j, a)\}.$$

We say that $x_i$ and $x_j$ are indiscernible to the set of attributes $B$ in $S$ if $f(x_i, a) = f(x_j, a)$ for every $a \in B$. For example, in the information system shown in Table 2, obj1 and obj3 are indiscernible to the set of attributes $C \cup D$, and obj4, obj6, and obj8 are indiscernible to the set of condition attributes $C$.

$R(B)$ is an equivalence relation on $U$ for every $B \subseteq A$. Thus, we can define two natural equivalence relations, $R(C)$ and $R(D)$, on $U$ for an information system $S$. A concept $Y$ is an equivalence class of the relation $R(D)$. Without loss of generality, we can consider $D$ as a singleton set. Our objective is to construct decision rules for each concept. Given a concept $Y$, the partition of $U$ with respect to this concept is defined as $R^*(D) = \{Y, U - Y\} = \{Y, \neg Y\}$.

Based on the set of condition attributes $C$, an object $x_i$ specifies the equivalence class $[x_i]_R$ of the relation $R(C)$:

$$[x_i]_R = \{x_j \in U \mid \forall a \in C,\ f(x_j, a) = f(x_i, a)\}$$

We say that $x_i \in U$ definitely belongs to a concept $Y$ if $[x_i]_R \subseteq Y$, and that $x_i \in U$ possibly belongs to the concept $Y$ if $[x_i]_R \cap Y \neq \emptyset$. We define conditional probabilities as

$$P(Y \mid [x_i]_R) = \frac{P(Y \cap [x_i]_R)}{P([x_i]_R)} = \frac{|Y \cap [x_i]_R|}{|[x_i]_R|}$$

where $P(Y \mid [x_i]_R)$ is the probability of occurrence of event $Y$ conditioned on event $[x_i]_R$. That is, $P(Y \mid [x_i]_R) = 1$ if and only if $[x_i]_R \subseteq Y$; $P(Y \mid [x_i]_R) > 0$ if and only if $[x_i]_R \cap Y \neq \emptyset$; and $P(Y \mid [x_i]_R) = 0$ if and only if $[x_i]_R \cap Y = \emptyset$.

Example 1. Let us consider the given infor-

$Y_0$ = {obj1, obj2, obj3}
$Y_1$ = {obj4, obj5, obj6}
$Y_2$ = {obj7, obj8}

The equivalence classes on the relation $R(C)$ are

$X_1 = [obj1]_R = [obj3]_R$ = {obj1, obj3}
$X_2 = [obj2]_R$ = {obj2}
$X_3 = [obj4]_R = [obj6]_R = [obj8]_R$ = {obj4, obj6, obj8}
$X_4 = [obj5]_R$ = {obj5}
$X_5 = [obj7]_R$ = {obj7}

Because $X_4 \subseteq Y_1$, obj5 definitely belongs to concept $Y_1$. The objects obj4, obj6, and obj8 possibly belong to concept $Y_1$, because the intersection of their equivalence class, $X_3$, with $Y_1$ is not empty. Other objects do not belong to $Y_1$.

The conditional probability of each equivalence class is

$P(Y_1 \mid X_1) = 0$
$P(Y_1 \mid X_2) = 0$
$P(Y_1 \mid X_3) = 2/3 = 0.667$
$P(Y_1 \mid X_4) = 1$
$P(Y_1 \mid X_5) = 0$

β-probabilistic approximation classification. Given an information system $S = \langle U, A, V, f \rangle$ and an equivalence relation $R(C)$ (an indiscernibility relation) on $U$, an ordered pair $AS = \langle U, R(C) \rangle$ is called an approximation space [4] based on the condition attributes $C$. The equivalence classes of the relation $R(C)$ are called elementary sets in $AS$, because they represent the smallest groups of objects that are distinguishable in terms of the attributes and their values. Let $Y \subseteq U$ be a subset of objects representing a concept, and $R^*(C) = \{X_1, X_2, \ldots, X_n\} = \{[x_1]_R, [x_2]_R, \ldots, [x_n]_R\}$ be the collection of equivalence classes induced by the relation $R(C)$. In the standard rough-set model, the lower and upper approximations of a set $Y$ are defined by

$$\underline{R}(C)(Y) = \bigcup_{P(Y \mid X_i) = 1} \{X_i \in R^*(C)\}$$

and

$$\overline{R}(C)(Y) = \bigcup_{P(Y \mid X_i) > 0} \{X_i \in R^*(C)\}$$

R(C)(Y) R(C)(Y)

For this reason, several extensions to the original rough-set model have been proposed [6–9, 11]. In our approach, we try to rectify this limitation by introducing a β-approximation space.

A β-approximation space $AS_P$ is a triple $\langle U, R(C), P \rangle$, where $P$ is a probability measure and β is a real number in the range (0.5, 1]. The β-approximation space $AS_P$ can be divided into the following regions:

β-positive region of the set $Y$:

$$POS_C^{\beta}(Y) = \bigcup_{P(Y \mid X_i) \geq \beta} \{X_i \in R^*(C)\}$$

β-negative region of the set $Y$:

$$NEG_C^{\beta}(Y) = \bigcup_{P(Y \mid X_i) < \beta} \{X_i \in R^*(C)\}$$

The β-positive region of the set $Y$ corresponds to all elementary sets of $U$ that can be classified into the concept $Y$ with conditional probability $P(Y \mid X_i)$ greater than or equal to the parameter β. Similarly, the β-negative region of the set $Y$ corresponds to all elementary sets of $U$ that can be classified into the set $\neg Y$.

Let $x_i \in U$ be an object; $POS_C^{\beta}(Y)$ and $NEG_C^{\beta}(Y)$ are the positive and negative regions of the concept $Y$. The object $x_i$ is classified as belonging to the concept $Y$ if and only if $x_i \in POS_C^{\beta}(Y)$, or to the complement $\neg Y$ of the concept $Y$ if and only if $x_i \in NEG_C^{\beta}(Y)$. We want to decide whether $x_i$ is in the concept $Y$ on the basis of the set of equivalence classes in $AS_P$ rather than on the basis of the set $Y$. This means we deal with $POS_C^{\beta}(Y)$ and $NEG_C^{\beta}(Y)$ instead of the set $Y$. If $x_i \in U$ is in $POS_C^{\beta}(Y)$, it can be classified into the concept $Y$ with the conditional probability $P(Y \mid X_i)$ greater than or equal to the parameter β.

Example 2. Following from Example 1, recall that $Y_1$ = {obj4, obj5, obj6}. If we let β = 1, then the β-positive region of the set $Y_1$ is $POS_C^{\beta}(Y_1)$ = {obj5}, and the β-negative region of the set $Y_1$ is $NEG_C^{\beta}(Y_1)$ = {obj1, obj2, obj3, obj4, obj6, obj7, obj8}. If we let β = 0.6, then the β-positive region of the set $Y_1$ is $POS_C^{\beta}(Y_1)$ = {obj4, obj5, obj6, obj8}, and the β-negative region of the set $Y_1$ is $NEG_C^{\beta}(Y_1)$ = {obj1, obj2, obj3, obj7}.
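The conditional probabilities of Example 1 and the positive and negative regions of Example 2 can be reproduced with a short sketch. This is only an illustration under stated assumptions, not the KDD-R implementation: the function names `conditional_probability` and `beta_regions` are hypothetical, the threshold parameter is called `beta` in the code, and the probability measure is assumed to be a simple ratio of object counts, as in the definition of the conditional probabilities above.

```python
from fractions import Fraction

# Equivalence classes of R(C) and the concept Y1, copied from Example 1.
classes = {
    "X1": {"obj1", "obj3"},
    "X2": {"obj2"},
    "X3": {"obj4", "obj6", "obj8"},
    "X4": {"obj5"},
    "X5": {"obj7"},
}
Y1 = {"obj4", "obj5", "obj6"}


def conditional_probability(concept, eq_class):
    """P(Y | X) = |Y n X| / |X|, using object counts as the measure (assumption)."""
    return Fraction(len(concept & eq_class), len(eq_class))


def beta_regions(concept, classes, beta):
    """Split the universe into the positive region (classes with P(Y|X) >= beta)
    and the negative region (classes with P(Y|X) < beta)."""
    if not (Fraction(1, 2) < beta <= 1):
        raise ValueError("beta must lie in (0.5, 1]")
    positive, negative = set(), set()
    for eq_class in classes.values():
        if conditional_probability(concept, eq_class) >= beta:
            positive |= eq_class
        else:
            negative |= eq_class
    return positive, negative


probs = {name: conditional_probability(Y1, X) for name, X in classes.items()}
pos_1, neg_1 = beta_regions(Y1, classes, Fraction(1))
pos_06, neg_06 = beta_regions(Y1, classes, Fraction(6, 10))
```

Exact fractions are used so that the comparison with the threshold does not depend on floating-point rounding. With a threshold of 1 only the class {obj5} is accepted; lowering the threshold to 0.6 also admits the class {obj4, obj6, obj8}, whose conditional probability is 2/3.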
74 IEEE EXPERT
[Table residue: only the column headings survive (Objects; condition attributes C: a1, a3; number of objects in Y1; number of objects in ¬Y1).]
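As a rough illustration of the decision-matrix method applied in Example 5, the sketch below builds the matrix of Table 4 and derives the maximally general rule conditions for each positive class. Several things here are assumptions rather than part of the article: the attribute values of the five equivalence classes are inferred from Table 4 and from rules r1 to r5 in Example 6 (the original data table is not reproduced here), the function names are hypothetical, and the prime implicants are found by brute-force expansion followed by absorption, which is exponential in the number of negative classes and is not claimed to be how KDD-R computes them.

```python
from itertools import product

# Attribute values of each equivalence class of R(RED), with RED = {a1, a3}.
# Inferred from Table 4 and rules r1-r5 (assumption; the source table is missing).
positive = {"X1+": {"a1": 6, "a3": 4}, "X2+": {"a1": 5, "a3": 3}}
negative = {
    "X1-": {"a1": 6, "a3": 8},
    "X2-": {"a1": 5, "a3": 9},
    "X3-": {"a1": 6, "a3": 3},
}


def decision_matrix(positive, negative):
    """M[i][j] holds the attribute-value pairs of positive class i whose
    values differ from those of negative class j."""
    return {
        i: {
            j: frozenset((a, v) for a, v in pos.items() if neg[a] != v)
            for j, neg in negative.items()
        }
        for i, pos in positive.items()
    }


def prime_implicants(clauses):
    """Prime implicants of a conjunction of disjunctions of positive literals.

    Pick one literal from every clause, then discard any conjunction that
    strictly contains another candidate (absorption).
    """
    candidates = {frozenset(choice) for choice in product(*clauses)}
    return {c for c in candidates if not any(other < c for other in candidates)}


M = decision_matrix(positive, negative)
rules = {i: prime_implicants(list(row.values())) for i, row in M.items()}
```

For these inputs the matrix reproduces the entries of Table 4, and the surviving conjunctions are (a3 = 4) for X1+ and (a1 = 5) and (a3 = 3) for X2+, which agree with the conditions of rules r3 and r4.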
lence classes of the relation $R^*(RED)$ such that $X_i^+ \subseteq POS_{RED}^{\beta}(Y)$; and let $X_j^-$, $j = 1, 2, \ldots, \rho$, denote the equivalence classes of the relation $R^*(RED)$ such that $X_j^- \subseteq NEG_{RED}^{\beta}(Y)$. We define a decision matrix $M = (M_{ij})_{\gamma \times \rho}$ as

$$M_{ij} = \{(a, f(X_i^+, a)) : a \in RED,\ f(X_i^+, a) \neq f(X_j^-, a)\}$$

where $a$ is a condition attribute belonging to $RED$. That is, $M_{ij}$ contains all attribute-value pairs whose values are not the same between the equivalence class $X_i^+$ and the equivalence class $X_j^-$.

We obtain the set of decision rules computed for a given equivalence class $X_i^+$ by treating each element of $M_{ij}$ as a Boolean expression and constructing the following Boolean function:

$$B_i = \bigwedge_j \left( \bigvee M_{ij} \right)$$

where $\wedge$ and $\vee$ are the usual conjunction and disjunction operators.

We can show that the prime implicants of the Boolean function $B_i$ are the maximally general rules for the equivalence class $X_i^+$ belonging to the positive learning region $POS_{RED}^{\beta}(Y)$. Thus, by finding the prime implicants of all the decision functions $B_i$ ($i = 1, 2, \ldots, \gamma$), we can compute all the maximally general rules for the positive learning region $POS_{RED}^{\beta}(Y)$.

Example 5. Following from Example 4, the equivalence classes of the relation $R(RED)$ are

$X_1^+ = X_1$ = {obj4, obj6, obj8}
$X_2^+ = X_2$ = {obj5}
$X_1^- = X_3$ = {obj1, obj3}
$X_2^- = X_4$ = {obj2}
$X_3^- = X_5$ = {obj7}

Table 4 shows the decision matrix $M = (M_{ij})_{2 \times 3}$.

Table 4. The decision matrix for the rules with respect to concept Y1.

         X1-                  X2-                  X3-
X1+      (a3, 4)              (a1, 6), (a3, 4)     (a3, 4)
X2+      (a1, 5), (a3, 3)     (a3, 3)              (a1, 5)

We then obtain the Boolean functions for each $X_i^+$ ($i = 1, 2$) as follows:

$$B_1 = (a_3 = 4) \wedge ((a_1 = 6) \vee (a_3 = 4)) \wedge (a_3 = 4) = (a_3 = 4)$$

$$B_2 = ((a_1 = 5) \vee (a_3 = 3)) \wedge (a_3 = 3) \wedge (a_1 = 5) = (a_1 = 5) \wedge (a_3 = 3)$$

Therefore, the maximally general rule for the equivalence class $X_1^+$ is

$$(a_3 = 4) \xrightarrow{0.67} (D = 1)$$

and the maximally general rule for the equivalence class $X_2^+$ is

$$(a_1 = 5) \wedge (a_3 = 3) \xrightarrow{1} (D = 1).$$

Given the set of all maximally general rules for an information system $S$, our system provides the options to find the set of minimal rules and the set of minimal covering rules. The support set of a rule $r_i$, denoted as $supp(r_i)$, is the collection of rows of the original table satisfying the condition part of $r_i$. A collection of rules $RUL$ is a set of minimal rules if, for every $r_i \in RUL$,

$$supp(r_i) \not\subseteq supp(r_j), \quad \forall r_j \in RUL,\ r_i \neq r_j.$$

A collection of rules $RUL$ is a set of minimal covering rules if, for every $r_i \in RUL$,

$$supp(r_i) \not\subseteq \bigcup_{r_j \in RUL,\ r_i \neq r_j} supp(r_j).$$

Example 6. As shown in Example 5, we can use a decision matrix to calculate the maximally general rule for the concept $Y_1$. Similarly, we can obtain the maximally general rules for concepts $Y_0$ and $Y_2$. The set of all maximally general rules for the information system in Table 2 is

r1: $(a_3 = 8) \xrightarrow{1} (D = 0)$
r2: $(a_3 = 9) \xrightarrow{1} (D = 0)$
r3: $(a_3 = 4) \xrightarrow{0.67} (D = 1)$
r4: $(a_1 = 5) \wedge (a_3 = 3) \xrightarrow{1} (D = 1)$
r5: $(a_1 = 6) \wedge (a_3 = 3) \xrightarrow{1} (D = 2)$

where r1 and r2 are for the concept $Y_0$, r3 and r4 are for the concept $Y_1$, and r5 is for the concept $Y_2$.

The support sets of these rules are

supp(r1) = {obj1, obj3}
supp(r2) = {obj2}
supp(r3) = {obj4, obj6, obj8}
supp(r4) = {obj5}
supp(r5) = {obj7}

Because these support sets are not overlapping, the rules are both minimal rules and minimal covering rules.

Experimental results

We have implemented this method in the KDD-R (Knowledge Discovery in Databases, Rough Sets Approach) system developed at the University of Regina [9]. The system performs data analysis, database mining, pattern recognition and validation, and expert-system building. We tested the proposed method by applying the KDD-R system on the data for water-demand prediction. Our objective is to analyze the set of training data and generate a set of decision rules to predict a city's daily water demand. As we mentioned, the set of training samples consists of 306 objects collected over a period of 10 months and includes information on day of the week, weather conditions, and daily water consumption.

In our experiment, we divided the values of decision attributes into nine ranges so that the information system has nine concepts. The KDD-R system generated a total of 188 rules. Table 5 lists the number of rules generated for different concepts; NC stands for the number of training samples covered by the rules.

The most general rule for the concept D = (60, 70] is

$$(-13.80 < a_2 \leq 1.40) \wedge (81.60 < a_3 \leq 93.00) \xrightarrow{1} (60 < D \leq 70)$$

This rule covers 12.37% of the training objects concluding the concept. The rule states that

If the maximum temperature is within the range (-13.80°C, 1.40°C] and the average humidity is within (81.60%, 93%], then the water demand is between 60 megaliters and 70 megaliters, with an uncertainty factor of 1.

The most general rule for the concept D = (70, 80] is

$$(a_0 = M) \wedge (-31.50 \leq a_2 \leq -13.80) \xrightarrow{1} (70 < D \leq 80)$$
This rule covers 4.17% of the training objects concluding the concept. It states that

If the day of the week is Monday and the minimum temperature is within the range [-31.50°C, -13.80°C], then the water demand is between 70 megaliters and 80 megaliters, with an uncertainty factor of 1.

The most general rule for the concept D = (80, 90] is

$$(a_0 = F) \wedge (6.70 < a_1 \leq 34.10) \wedge (4.50 \leq a_6 \leq 8.70) \xrightarrow{1} (80 < D \leq 90)$$

This rule covers 7.90% of the training objects concluding the concept. It states that

If the day of the week is Friday and the maximum temperature is within the range (6.70°C, 34.10°C] and the average wind speed is within [4.50 km/hr., 8.70 km/hr.], then the water demand is between 80 megaliters and 90 megaliters, with an uncertainty factor of 1.

The most general rule for the concept D = (100, 110] is

Table 5. Number of rules generated for each concept; NC is the number of training samples covered by the rules.

Concept (range of D)    Number of rules    NC
[53, 60]                3                  3
(60, 70]                43                 97
(70, 80]                67                 120
(80, 90]                32                 38
(90, 100]               16                 18
(100, 110]              8                  9
(110, 120]              6                  7
(120, 130]              4                  5
(130, 140]              3                  3
(140, 176]              5                  6

the combination of pumps and valves does not cause any reservoir to overflow or deplete within a short period of time.

The project began in May 1994, and we have completed the prototype of the expert system, the simulation program, the prediction module, and the energy-management module. We are working on the scheduling and planning module, and on integrating the different components into an intelligent system. We expect to complete the project this fall.

We have suggested a method for generating prediction rules from a given set of training examples. The method is an extension of the rough-set model. The salient feature of the proposed method is that it uses the statistical information inherent in the knowledge system. Thus, our method can derive decision rules from incomplete knowledge. This capability is important, because we seldom have complete and consistent information when designing intelligent systems.

Application of this knowledge-discovery method for water-demand prediction complements manual knowledge-acquisition techniques (see the sidebar, "The intelligent system"). It offers the advantage of describing important relationships between condition factors and water consumption in terms of simple if-then rules that users can easily understand. The experimental results indicate that the proposed algorithm can generate rules for water-demand prediction, providing more precise information than is available through knowledge acquisition from human experts.

Our proposed method for machine induction is general and can be applied to other domains, such as general consumer-demand prediction, fault diagnosis, and process control. For water-demand prediction, we have also tried different sets of causal variables with the rule-induction method to further improve predictive accuracy. The results show that adding time-series data, such as yesterday's and the day before yesterday's water consumption, as conditional variables gives better results than using weather factors alone. In terms of the
machine-induction method, we have developed a new algorithm that integrates rule induction with case-based reasoning to improve case retrieval and problem solving. We have applied this integrated method to numeric-prediction domains. Experimental results show that case-based reasoning with rule induction performs better than case-based reasoning by itself, in terms of predictive accuracy.

Acknowledgments

This research is supported by the Telecommunications Research Laboratories and the Natural Science and Engineering Research Council of Canada. We deeply appreciate the cooperation of the personnel at the City of Regina's Municipal Water Engineering and Water Supply departments. We also would like to thank Paitoon Tontiwachwuthikul for his invaluable suggestions on this work.

References

1. J. Quevedo et al., "Time Series Modelling of Water Demand: A Study on Short-Term and Long-Term Predictions," in Computer Applications in Water Supply, Vol. 1, B. Coulbeck and C. Orr, eds., Research Studies Press Ltd., Letchworth, England, 1988, pp. 268–288.

2. R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Machine Learning: An Artificial Intelligence Approach, Vol. 1, Morgan Kaufmann, San Francisco, 1983.

3. R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Machine Learning: An Artificial Intelligence Approach, Vol. 2, Morgan Kaufmann, 1986.

4. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Norwell, Mass., 1991.

5. J.R. Quinlan, "Induction of Decision Trees," Machine Learning, Vol. 1, No. 1, 1986, pp. 81–106.

6. W. Ziarko, "Variable Precision Rough Set Model," J. Computer and System Sciences, Vol. 46, No. 1, 1993, pp. 39–59.

7. J. Katzberg and W. Ziarko, "Variable Precision Rough Sets with Asymmetric Bounds," in Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, ed., Springer-Verlag, New York, 1994, pp. 167–177.

8. Z. Pawlak, S.K.M. Wong, and W. Ziarko, "Rough Sets: Probabilistic Versus Deterministic Approach," Int'l J. Man-Machine Studies, Vol. 29, No. 1, 1988, pp. 81–95.

9. W. Ziarko and N. Shan, "KDD-R: A Comprehensive System for Knowledge Discovery in Databases Using Rough Sets," Proc. Third Int'l Workshop Rough Sets and Soft Computing, San Jose State Univ., San Jose, Calif., 1994, pp. 164–173.

10. W. Ziarko and N. Shan, "A Rough Set-Based Method for Computing All Minimal Deterministic Rules in Attribute-Value Systems," Computational Intelligence: An Int'l J., Vol. 12, No. 2, 1996, pp. 223–234.

11. S.K.M. Wong and W. Ziarko, "INFER - An Adoptive Decision Support System Based on the Probabilistic Approximate Classification," Proc. Sixth Int'l Workshop Expert Systems and Their Applications, Agence de Informatique, Avignon, France, 1986, pp. 713–726.

Aijun An is a PhD candidate in the Department of Computer Science at the University of Regina. Her research interests include machine learning, case-based reasoning, knowledge engineering for developing real-time expert systems, validation and verification of knowledge-based systems, and applications of artificial intelligence techniques to practical problems. She received a BSc and an MSc in computer science from Xidian University, Xi'an, China. She is a student member of the AAAI and the IEEE Computer Society. Readers can contact An at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; aijun@cs.uregina.ca.

Christine Chan is an associate professor in the Department of Computer Science at the University of Regina. Her research interests include industrial applications of artificial intelligence, object-oriented methodologies for knowledge-based systems development, knowledge acquisition and conceptual modeling, and educational instructional software. She received MSc degrees in computer science and management information systems from the University of British Columbia. She also received a PhD in the interdisciplinary studies of computer science, philosophy, and psychology from Simon Fraser University. She is a member of the AAAI, the IEEE Computer Society, the ACM, and the Canadian Information Processing Society. Readers can contact Chan at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; chan@cs.uregina.ca; http://www.cs.uregina.ca/~chan.

Ning Shan is a PhD candidate in the Department of Computer Science at the University of Regina. His research interests include machine learning, rough sets, rule-based expert systems, knowledge discovery in databases, and real-world applications of artificial intelligence. He received a B.Eng. in computer engineering from ChengDu Electronic Engineering Institute, China, and an MSc in computer science from the University of Regina. Readers can contact Shan at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; ning@cs.uregina.ca.

Nick Cercone is a professor and the chair of the Computer Science Department at the University of Waterloo. His research interests include natural-language processing, knowledge-based systems, knowledge discovery in databases, and design and human interfaces. He received a BS in engineering science from the University of Steubenville, an MS in computer and information science from Ohio State University, and a PhD in computing science from the University of Alberta. He is a member of the ACM, the IEEE, the AAAI, the Artificial Intelligence Simulation of Behavior, the AGS, and the Association of Computational Linguistics. He recently won the Canadian Artificial Intelligence Society's Distinguished Service Award. Readers can contact Cercone at the Computer Science Dept., Faculty of Mathematics, Univ. of Waterloo, Waterloo, ON N2L 3G1, Canada; ncercone@uwaterloo.ca.

Wojciech Ziarko is a professor in the Computer Science Department at the University of Regina. His research interests are knowledge discovery in databases, machine learning, pattern classification, and control-algorithm acquisition from sensor data. These research interests are, to a large degree, motivated by the recent introduction of the theory of rough sets, which serves as a basic mathematical framework in much of his work. He received an MSc in applied mathematics from Warsaw University of Technology, Poland, and a PhD in computer science, specializing in databases, from the Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland. Readers can contact Ziarko at the Dept. of Computer Science, Univ. of Regina, Regina, SK S4S 0A2, Canada; ziarko@cs.uregina.ca.