Soft Computing Series — Volume 5
A New Paradigm of Knowledge Engineering by Soft Computing
Editor: Liya Ding
Fuzzy Logic Systems Institute (FLSI)
A New Paradigm of Knowledge Engineering by Soft Computing
Fuzzy Logic Systems Institute (FLSI) Soft Computing Series
Series Editor: Takeshi Yamakawa (Fuzzy Logic Systems Institute, Japan)
Vol. 1: Advanced Signal Processing Technology by Soft Computing, edited by Charles Hsu (Trident Systems Inc., USA)
Vol. 2: Pattern Recognition in Soft Computing Paradigm, edited by Nikhil R. Pal (Indian Statistical Institute, Calcutta)
Vol. 3: What Should be Computed to Understand and Model Brain Function? — From Robotics, Soft Computing, Biology and Neuroscience to Cognitive Philosophy, edited by Tadashi Kitamura (Kyushu Institute of Technology, Japan)
Vol. 4: Practical Applications of Soft Computing in Engineering, edited by Sung-Bae Cho (Yonsei University, Korea)
Vol. 6: Brainware: Bio-Inspired Architecture and Its Hardware Implementation, edited by Tsutomu Miki (Kyushu Institute of Technology, Japan)
FLSI Soft Computing Series — Volume 5
A New Paradigm of Knowledge Engineering by Soft Computing
Editor
Liya Ding
National University of Singapore
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd., P O Box 128, Farrer Road, Singapore 912805. USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661. UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE.
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.
A NEW PARADIGM OF KNOWLEDGE ENGINEERING BY SOFT COMPUTING
FLSI Soft Computing Series — Volume 5
Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4517-3
Printed in Singapore by Fulsland Offset Printing
To
Prof. Lotfi A. Zadeh and other pioneers
who have changed our life in more ways than one and who have encouraged as well as guided us to continue our research and development in Soft Computing
Series Editor's Preface
The IIZUKA conference series originated from the Workshop on Fuzzy Systems Application held in 1988 in Iizuka, a small city located in the center of Fukuoka prefecture on Kyushu, the southernmost of Japan's main islands, which was famous for coal mining until about forty years ago. Iizuka has since been redeveloped as a science research park. The first IIZUKA conference was held in 1990, and the conference has been held every two years since then. This series of conferences has played an important role in modern artificial intelligence. The 1988 workshop proposed the fusion of fuzzy concepts and neuroscience, and this proposal encouraged research on neuro-fuzzy systems and fuzzy neural systems that has produced significant results. The 1990 conference was dedicated to the special topic of chaos, and nonlinear dynamical systems came into the interests of researchers in the field of fuzzy systems. By 1992, the fusion of fuzzy, neural and chaotic systems was familiar to the conference participants. This new paradigm of information processing, which also includes genetic algorithms and fractals, has spread around the world as "Soft Computing".

The Fuzzy Logic Systems Institute (FLSI) was established in 1989, under the supervision of the Ministry of Education, Science and Sports (MONBUSHO) and the Ministry of International Trade and Industry (MITI), for the purpose of proposing brand-new technologies, collaborating with companies and universities, and providing university students with an education in soft computing. FLSI is the major organization promoting the IIZUKA conferences, so this series of books edited from the conferences is named the FLSI Soft Computing Series. The Soft Computing Series covers a variety of topics in Soft Computing and will chart the emergence of post-digital intelligent systems.
Takeshi Yamakawa, Ph.D. Chairman, IIZUKA 2000 Chairman, Fuzzy Logic Systems Institute
Volume Editor's Preface
Soft computing (SC) consists of several computing paradigms, including neural networks, fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as genetic algorithms. The integration of these constituent methodologies forms the core of soft computing, and their synergy allows soft computing to incorporate human knowledge effectively, deal with imprecision and uncertainty, and learn to adapt to unknown or changing environments for better performance. Together with other modern technologies, soft computing and its applications exert an unprecedented influence on intelligent systems that mimic human intelligence in thinking, learning, reasoning, and many other aspects.

Knowledge engineering (KE), which deals with knowledge acquisition, representation, validation, inferencing, explanation, and maintenance, has in turn made significant progress recently thanks to the indefatigable effort of researchers. Undoubtedly, the hot topics of data mining and knowledge/data discovery have injected a new lease of life into the classical AI world. It is obvious that soft computing and knowledge engineering are expected to fulfill some common targets in materializing machine intelligence. In recent trends, many researchers of SC have applied their techniques to solving KE problems, and researchers of KE have adopted SC methodologies to enhance KE applications. The cooperation of the two disciplines is not only extending the applications of SC, but also bringing new innovation to KE.

There are fifteen chapters in this book. Except for the introductory chapter, which provides the reader with a guide to the contents, the remaining fourteen chapters are extended versions of original conference papers selected from IIZUKA'98. These papers mainly present works on:

• Acquisition and modelling of imprecise knowledge
• Reasoning and retrieval with imprecise knowledge
• Description and representation of fuzzy knowledge
• Knowledge representation and integration by SC
• Knowledge discovery and data mining by SC
The fourteen chapters are divided into two parts. The first part (Chapters 2 to 9) mainly focuses on fuzzy knowledge-based systems, including (i) fuzzy rule extraction, (ii) fuzzy system tuning, (iii) fuzzy reasoning, (iv) fuzzy retrieval, and (v) knowledge description languages for fuzzy systems. The second part (Chapters 10 to 15) mainly focuses on (vi) knowledge representation, (vii) knowledge integration, (viii) knowledge discovery, and (ix) data mining by soft computing.

The aim of this book is to help readers trace out how KE has been influenced and extended by SC, and how SC will help push the frontier of KE further. This book is intended for researchers, and also for graduate students, as a reference in the study of knowledge engineering and intelligent systems. The reader is expected to have a basic knowledge of fuzzy logic, neural networks, genetic algorithms, and knowledge-based systems.
Acknowledgments
1. All authors of the original papers for their valuable contributions.
2. Prof. Takeshi Yamakawa for his constant encouragement.
3. Prof. Masao Mukaidono of Meiji University for his guidance in establishing a foundation for my research on fuzzy logic and knowledge engineering.
4. Prof. Lotfi A. Zadeh and other pioneers (too numerous to name individually) for their support over the past 14 years.
5. The Institute of Systems Science, National University of Singapore, for providing me with the opportunity to do research and apply the results.
6. Mrs. Jenny Russon for editing and polishing my English with amazing speed and thoroughness.
Liya Ding
Singapore
Contents

Series Editor's Preface .... vii
Volume Editor's Preface .... ix

Chapter 1: Knowledge Engineering and Soft Computing — An Introduction (Liya Ding) .... 1

Part I: Fuzzy Knowledge-Based Systems

Chapter 2: Linguistic Integrity: A Framework for Fuzzy Modeling — AFRELI Algorithm (Jairo Espinosa, Joos Vandewalle) .... 15
Chapter 3: A New Approach to Acquisition of Comprehensible Fuzzy Rules (Hiroshi Ohno, Takeshi Furuhashi) .... 43
Chapter 4: Fuzzy Rule Generation with Fuzzy Singleton-Type Reasoning Method (Yan Shi, Masaharu Mizumoto) .... 59
Chapter 5: Antecedent Validity Adaptation Principle for Table Look-Up Scheme (Ping-Tong Chan, Ahmad B. Rad) .... 77
Chapter 6: Fuzzy Spline Interpolation in Sparse Fuzzy Rule Bases (Mayuka F. Kawaguchi, Masaaki Miyakoshi) .... 95
Chapter 7: Revision Principle Applied for Approximate Reasoning (Liya Ding, Peizhuang Wang, Masao Mukaidono) .... 121
Chapter 8: Handling Null Queries with Compound Fuzzy Attributes (Shyue-Liang Wang, Yu-Jane Tsai) .... 149
Chapter 9: Fuzzy System Description Language (Kazuhiko Otsuka, Yuichiro Mori, Masao Mukaidono) .... 163

Part II: Knowledge Representation, Integration, and Discovery by Soft Computing

Chapter 10: Knowledge Representation and Similarity Measure in Learning a Vague Legal Concept (Ming-Qiang Xu, Kaoru Hirota, Hajime Yoshino) .... 189
Chapter 11: Trend Fuzzy Sets and Recurrent Fuzzy Rules for Ordered Dataset Modelling (Jim F. Baldwin, Trevor P. Martin, Jonathan M. Rossiter) .... 213
Chapter 12: Approaches to the Design of Classification Systems from Numerical Data and Linguistic Knowledge (Hisao Ishibuchi, Manabu Nii, Tomoharu Nakashima) .... 241
Chapter 13: A Clustering Based on Self-Organizing Map and Knowledge Discovery by Neural Network (Kado Nakagawa, Naotake Kamiura, Yutaka Hata) .... 273
Chapter 14: Probabilistic Rough Induction (Juzhen Dong, Ning Zhong, Setsuo Ohsuga) .... 297
Chapter 15: Data Mining via Linguistic Summaries of Databases: An Interactive Approach (Janusz Kacprzyk, Sławomir Zadrożny) .... 325

About the Authors .... 347
Keyword Index .... 369
Chapter 1 Knowledge Engineering and Soft Computing — An Introduction
Liya Ding
National University of Singapore
1.1 Introduction
As its title, "A New Paradigm of Knowledge Engineering by Soft Computing", indicates, this book presents work at the intersection of two areas of computer science in the broad sense: knowledge engineering and soft computing.

Knowledge engineering (KE) [2], known as an important component of artificial intelligence (AI), is an area that concentrates mainly on activities involving knowledge, including knowledge acquisition, representation, validation, inference, and explanation. Soft computing (SC) [14], on the other hand, is an area that provides tools and methodologies for building intelligent systems capable of handling uncertainty and imprecision, learning new knowledge, and adapting to a changing environment. Though the concept of knowledge engineering was put forward in its early years without recognition of the usefulness of soft computing, soft computing methodologies, including fuzzy logic, neural networks, and evolutionary computation, have from the beginning been related, each with its particular strengths, to one or more aspects of KE and hence of AI. Many remarkable works have been done in parallel in the KE and SC areas, but relatively few at the intersection of the two. In recent trends, many researchers of SC have applied their techniques to solving KE problems, and researchers of KE have adopted SC methodologies to enhance KE applications. This book introduces to the reader a collection of works that bring new innovation to knowledge engineering through the application of soft computing.

1.1.1 Knowledge and Knowledge Engineering
Knowledge, or the problem of dealing with knowledge, has been of intense interest to sociologists and psychologists for a long time. With the development of artificial intelligence (AI), the emphasis has shifted from philosophical and social concepts to the problem of representation, or more precisely, the problem of representing knowledge in computers.

Knowledge is a highly abstract concept. Although most of us have a fairly good idea of what it means and how it relates to our life, we have probably not explored some of its wider meaning in a universal context. Knowledge can be defined as the body of facts, principles, acts, states of knowing, and experience accumulated by humankind. However, this definition is far from complete, and knowledge is actually much more than this. It is the possession of actual experience with languages, concepts, procedures, rules, ideas, abstractions, places, customs, facts, and associations, coupled with the ability to use these experiences effectively in modeling different aspects of the world.

Knowledge is closely related to intelligence. Knowledge-based systems are often described as 'intelligent' in the sense that they attempt to simulate many of the activities which, when undertaken by a human being, are regarded as instances of intelligence. Types of knowledge can be differentiated in several ways. From the point of view of the use of intelligent systems, knowledge can be divided into the following types:

(1) Declarative knowledge is passive knowledge expressed as statements of facts about the world.
(2) Procedural knowledge is compiled knowledge related to the performance of some task.
(3) Heuristic knowledge describes human experience in solving complex problems.

In building a knowledge-based system for a specific domain, so-called domain knowledge can be considered to have two main kinds: (a) surface knowledge and (b) deep knowledge. Surface knowledge is the heuristic, experiential knowledge learned after solving a large number of problems
in that domain. Deep knowledge refers to the basic laws of nature and the fundamental structure and behavioral principles of the domain, which cannot be altered.

With regard to levels of abstraction and completeness, knowledge can be summarized in different forms. Rules are often used to represent more deterministic and abstract knowledge, through a certain relationship between antecedent and consequent. Cases are useful for describing knowledge gained from previous experience; they record the appearance of related factors without our knowing clearly which is the cause and which is the effect. Patterns, compared with rules and cases, are usually used to store less abstract and sometimes less complete knowledge. The difference between types, or forms, of knowledge is not always absolute. Heuristics may have the nature of declarative or procedural knowledge. Cases may be represented in the form of rules through suitable transformation. Patterns may also be summarized as cases, or even rules, through appropriate granulation or quantization, if the corresponding knowledge-based system so requires.

Knowledge includes and requires the use of data and information, but should not be confused with them. Knowledge includes skill and training, perception, imagination, intuition, common sense, and experience, and combines relationships, correlations, and dependencies. It has been widely accepted that, given a sufficient amount of data, useful knowledge may be discovered through suitable discovery techniques. As a recent hot topic, data mining for knowledge discovery has attracted more and more attention.

Knowledge engineering is a discipline devoted to integrating human knowledge into computer systems, or in other words, to building knowledge-based systems. It can be viewed from either a narrow or a wider perspective. According to the narrow perspective, knowledge engineering deals with knowledge acquisition (also referred to as knowledge elicitation), representation, validation, inference, and explanation. According to the wider perspective, the term describes the entire process of development and maintenance of knowledge-based systems. In both cases knowledge plays the key role. Knowledge engineering, especially the practice of knowledge acquisition, involves the cooperation of human experts in the domain, who work with the knowledge engineer to codify and make explicit the rules (or other forms of knowledge) that a human expert uses to solve real problems. Since
the construction of a knowledge base needs human knowledge in a direct or an indirect way, an important issue in the design of knowledge-based systems is how to equip them with human knowledge, which often appears uncertain, imprecise, and incomplete to some degree.

1.1.2 Soft Computing
Soft computing is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision. (Lotfi A. Zadeh [14])

As pointed out by Prof. Lotfi A. Zadeh, soft computing is not a single methodology, but a partnership. The principal partners at this stage are fuzzy logic (FL), neuro-computing (NC), and probabilistic reasoning (PR), with the latter subsuming genetic algorithms (GA), chaotic systems, belief networks, and parts of learning theory. The pivotal contribution of FL is a methodology for computing with words; that of NC is system identification, learning, and adaptation; and that of GA is systematized random search and optimization [5].

Fuzzy Logic

Fuzzy logic has a narrow and a broad sense. In the narrow sense, fuzzy logic is viewed as a generalization of the various multivalued logics. It mainly refers to approximate reasoning, as well as to knowledge representation and inference with imprecise, incomplete, uncertain, or partially true information. In the broad sense, fuzzy logic includes all the research efforts related to fuzzy inference systems (or fuzzy systems).

It is generally agreed that human knowledge is imprecise, uncertain, and incomplete in nature, because the human brain interprets imprecise and incomplete sensory information provided by the perceptive organs. Instead of simply rejecting this ambiguity, fuzzy set theory, as an extension of set theory, offers a systematic calculus for dealing with such information. It performs numerical computation using linguistic labels stipulated by membership functions. With fuzzy sets, human knowledge described in words can be represented, and hence processed, in a computer. Fuzzy logic, in its narrow sense, offers the possibility of inference under uncertainty and imprecision. Together with fuzzy set theory, it provides the basis of fuzzy inference systems.
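To make the idea of linguistic labels stipulated by membership functions concrete, here is a minimal sketch in Python; the label "warm" and its breakpoints are invented for illustration and are not taken from any chapter of this book:

```python
def triangular(a, b, c):
    """Return a triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# A hypothetical linguistic label "warm" on a temperature axis (degrees C).
warm = triangular(15.0, 22.0, 30.0)

# Zadeh's standard fuzzy complement: mu_not_A(x) = 1 - mu_A(x).
def fuzzy_not(mu):
    return lambda x: 1.0 - mu(x)

print(warm(22.0))  # 1.0: full membership at the peak
print(warm(18.0))  # partial membership between 0 and 1
```

A crisp set would force `warm(18.0)` to be 0 or 1; the membership function instead returns a graded degree, which is exactly what lets knowledge "described in words" be processed numerically.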
A typical fuzzy inference system has a structured knowledge representation in the form of fuzzy "if-then" rules. A fuzzy "if-then" rule (or fuzzy rule) takes the same form as a symbolic if-then rule, but is interpreted and executed in a different way through the use of linguistic variables. Fuzzy knowledge representation and approximate reasoning have greatly extended the ability of traditional rule-based systems. However, a fuzzy inference system lacks the adaptability to deal with a changing environment and assumes the availability of well-structured knowledge for the problem domain. Learning concepts have therefore been incorporated into fuzzy inference systems, and one important way of materializing learning in them is to use neural networks.

Neural Networks

The original idea of artificial neural networks (also known simply as neural networks) was inspired by biological nervous systems. A neural network system is a continuous-time nonlinear dynamic system. It uses connectionist architectures to mimic the mechanisms of the human brain for intelligent behavior. Such connectionism replaces symbolically structured representation with distributed representation, in the form of weights between a massive set of interconnected processing units. The weights are modified through a learning procedure so that the neural network system can be expected to improve its performance progressively in a specific environment.

Neural networks are good at fault tolerance and can learn from training data provided in non-structured and non-labelled form. However, compared with fuzzy inference systems, the knowledge learned by a neural network system is usually non-transparent and hard to explain. Many researchers have therefore put effort into rule extraction from neural networks and rule generation using neural networks. The extracted or generated rules can then be used to develop fuzzy inference systems, with fine-tuning where necessary and possible.
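As a concrete illustration of how a small set of fuzzy "if-then" rules is executed numerically, the following sketch implements singleton-type inference, where the output is a weighted average of rule consequents, weighted by antecedent firing strength. The rule base (fan speed as a function of temperature) and all numbers are hypothetical:

```python
def tri(a, b, c):
    """Triangular membership function with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Hypothetical rules: "IF temperature is cool THEN fan = 200 rpm", etc.
# Each rule is (antecedent membership function, singleton consequent).
rules = [
    (tri(5.0, 15.0, 25.0), 200.0),    # cool -> low fan speed
    (tri(15.0, 25.0, 35.0), 800.0),   # warm -> medium fan speed
    (tri(25.0, 35.0, 45.0), 1500.0),  # hot  -> high fan speed
]

def infer(x):
    # Fire every rule, then take the firing-strength-weighted average.
    fired = [(mu(x), y) for mu, y in rules]
    den = sum(w for w, _ in fired)
    return sum(w * y for w, y in fired) / den if den else 0.0

print(infer(20.0))  # 500.0: an equal blend of "cool" and "warm" consequents
```

At 20 degrees both "cool" and "warm" fire with strength 0.5, so the output interpolates smoothly between the two consequents; a symbolic rule system would have to pick exactly one rule instead.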
Evolutionary Computation

Fuzzy logic offers knowledge representation and an inference mechanism for knowledge processing under imprecision and incompleteness. Neural networks materialize learning and adaptation capabilities for intelligent systems. Evolutionary computation, in turn, provides the capacity for population-based systematic random search and optimization.

Evolutionary computing techniques such as genetic algorithms (GA) are based on the evolutionary principle of natural selection. A GA evaluates the fitness of a population of possible solutions and leads the search toward better fitness. A 'best' solution is always desirable in many AI applications, and the use of heuristic search techniques therefore forms an important part of applying intelligent systems. In reality, however, it is not always possible to obtain such an optimal solution when the search space is too large for an exhaustive search and at the same time too difficult to reduce. Genetic algorithms are a practical technique for searching efficiently toward good, if less-than-optimal, solutions.
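The select-recombine-mutate loop just described can be sketched in a few lines. This is a deliberately minimal toy, maximizing a one-dimensional fitness function; the population size, operators, and rates are illustrative assumptions, not a prescription:

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

# Toy fitness landscape: maximum at x = 3.
def fitness(x):
    return -(x - 3.0) ** 2

def evolve(generations=60, pop_size=20):
    # Random initial population over the search interval [-10, 10].
    pop = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2.0                # arithmetic crossover
            child += random.gauss(0.0, 0.3)      # Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best)  # lands near the optimum at x = 3
```

No individual ever enumerates the search space; selection pressure plus random variation is what moves the population toward high fitness, which is the point of the paragraph above.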
1.1.3 Soft Computing Contributes to Knowledge Engineering
Some of the contributions of soft computing to knowledge engineering can be found in the following aspects:

• Knowledge Representation. Fuzzy logic can be used to represent imprecise and incomplete knowledge described in words. Knowledge-based neural networks, on the other hand, offer a connectionist way of representing knowledge combined with the learning ability of neural networks.

• Knowledge Acquisition. When the information and data obtained as domain knowledge are less structured and summarized, neural networks can be employed for learning. A trained neural network can be viewed as a form of knowledge representation, and rule extraction may then be applied to obtain fuzzy rules from the network. Clustering techniques can also be used with fuzzy logic to help fuzzy rule extraction, and genetic algorithms can help search for more accurate fuzzy rules or fine-tune them.

• Knowledge-based Inference. In a broad sense, both fuzzy inference systems and neural network systems offer knowledge-based inference. In fuzzy inference systems, inference is executed using fuzzy rules, fuzzy relations, and fuzzy sets within the framework of fuzzy logic. In neural network systems, inference results are determined by inference algorithms based on the knowledge learned by the networks. Genetic algorithms can be used to find a better neural network configuration.
• Modeling and Developing Knowledge-based Systems. Neuro-fuzzy modeling, which incorporates neural network learning concepts into fuzzy inference systems, is a pivotal technique in soft computing. Hybrid systems provide further capability for developing intelligent systems through the cooperation of SC techniques.

• Knowledge Integration. Knowledge integration becomes a critical issue for maximizing the functionality of an intelligent, knowledge-based system when the knowledge for the specific domain exists at different levels of abstraction and completeness, or comes from various sources and is described in different forms. The cooperation of soft computing techniques offers more flexibility in dealing with such situations.

• Knowledge Discovery. If knowledge representation is for representing available knowledge, and knowledge acquisition is for obtaining existing but not well-summarized knowledge, then knowledge discovery is for finding knowledge that exists in more unknown forms. Neural networks, with supervised or unsupervised learning approaches, can help discover knowledge from given data. Evolutionary computation and probabilistic approaches have also been applied for similar purposes.
1.2 Structure of This Book
This book is organized into two parts. Part I (Chapters 2 to 9) mainly focuses on fuzzy knowledge-based systems, including rule extraction, system tuning, reasoning, retrieval, and knowledge description languages. Part II (Chapters 10 to 15) mainly focuses on knowledge representation, integration, discovery, and data mining by soft computing. Figures 1.1 and 1.2 illustrate the contents of the chapters from the perspectives of the KE topics addressed and the SC techniques applied, respectively.
1.2.1 Part I: Fuzzy Knowledge-based Systems
In developing fuzzy inference systems, one of the important tasks is to construct a fuzzy rule base within the framework of fuzzy modelling. Chapter 2 introduces an algorithm for automatic fuzzy rule extraction from prior knowledge and numerical data. It consists of several main steps: two-stage clustering of numerical data using the mountain clustering and fuzzy c-means methods; generation and reduction of fuzzy membership functions for the antecedents; and consequent calculation and further adjustment.
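The fuzzy c-means step used in such a pipeline can be sketched as follows. This is a toy one-dimensional version with fuzzifier m = 2 and a fixed iteration count; the data, cluster count, and initialization are illustrative choices, and the mountain-clustering stage is omitted:

```python
def fcm(data, c=2, m=2.0, iters=50):
    """Toy 1-D fuzzy c-means: alternate membership and center updates."""
    lo, hi = min(data), max(data)
    # Spread initial centers evenly over the data range.
    centers = [lo + (hi - lo) * (k + 1) / (c + 1) for k in range(c)]
    for _ in range(iters):
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        u = []
        for x in data:
            d = [abs(x - v) or 1e-12 for v in centers]  # guard zero distance
            inv = [dd ** (-2.0 / (m - 1.0)) for dd in d]
            s = sum(inv)
            u.append([w / s for w in inv])
        # Center update: mean of the data weighted by u_ik^m.
        for k in range(c):
            num = sum((u[i][k] ** m) * x for i, x in enumerate(data))
            den = sum(u[i][k] ** m for i in range(len(data)))
            centers[k] = num / den
    return sorted(centers)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]  # two obvious groups
print(fcm(data))  # centers settle near 1.0 and 5.0
```

Unlike hard k-means, every point belongs to every cluster to some degree; those graded memberships are what later become the antecedent membership functions of the extracted rules.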
[Figure 1.1: A View of the Contents Based on Related KE Topics]

Chapter 3 presents an algorithm for the acquisition of fuzzy rules using evolutionary programming (EP). It first constructs a fuzzy neural network based on fuzzy modelling, and then applies EP to training data to identify the parameters of the fuzzy neural network, which indicate the central position and width of the fuzzy membership functions, as well as the singleton consequent of each fuzzy rule. A "re-evaluation" of the fuzzy model by evolutionary computation is then executed to simplify the membership functions obtained in the earlier phase, with a flexible "degree of explanation" indicated by the user.

Chapter 4 introduces a rule extraction method based on neuro-fuzzy learning and fuzzy clustering. The proposed method is used with fuzzy singleton-type reasoning, which has been successfully applied in fuzzy control systems. The process of rule extraction is divided into two stages: fuzzy c-means clustering is first executed to generate initial tuning parameters of the fuzzy rules from input-output data; a neuro-fuzzy learning algorithm based on the gradient descent method is then applied to tune the parameters.
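The gradient-descent tuning stage can be sketched in its simplest form, adjusting only the singleton consequents of fixed antecedent membership functions. The rule base, the training target y = 2x, and the learning rate are invented for illustration; the chapter's actual algorithm also tunes the antecedent parameters:

```python
def tri(a, b, c):
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Fixed triangular partition of the input range [0, 10].
antecedents = [tri(0, 0, 5), tri(0, 5, 10), tri(5, 10, 10)]
consequents = [0.0, 0.0, 0.0]  # singleton outputs y_i, to be tuned

def predict(x):
    w = [mu(x) for mu in antecedents]
    s = sum(w) or 1e-12
    return sum(wi * yi for wi, yi in zip(w, consequents)) / s, w, s

# Hypothetical training data sampled from the target y = 2x.
data = [(x, 2.0 * x) for x in [1.0, 3.0, 5.0, 7.0, 9.0]]

eta = 0.5  # learning rate (illustrative)
for _ in range(200):
    for x, target in data:
        y_hat, w, s = predict(x)
        err = y_hat - target
        # For E = err^2 / 2, dE/dy_i = err * w_i / s: a gradient step per rule.
        for i in range(len(consequents)):
            consequents[i] -= eta * err * w[i] / s

print(consequents)  # approaches [0.0, 10.0, 20.0], i.e. y = 2x at the rule centers
```

Because each consequent only moves in proportion to how strongly its rule fired, the tuned values stay interpretable as "the output where this rule's antecedent peaks", which is the appeal of neuro-fuzzy learning over opaque network weights.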
[Figure 1.2: A View of the Contents Based on SC Techniques Applied]

Chapter 5 explains another algorithm for fuzzy rule extraction based on numerical data and expert knowledge. This algorithm first fixes the fuzzy membership functions in the input and output spaces and then generates fuzzy rules from the given data and expert knowledge, using the validity of the antecedents to adjust the output consequents. Being more concerned with reducing modelling error, it tends to generate a larger number of rules than of data patterns.

Fuzzy reasoning is another important aspect of fuzzy knowledge-based systems. In the discussion of Chapters 2 to 5, the fuzzy rules and membership functions are assumed to cover the problem space well. However, it is also necessary to consider applications that lack sufficient data and expert knowledge. Chapter 6 presents a technique of approximate reasoning through linear and nonlinear interpolation of given fuzzy rules. This method makes it possible to apply ordinary approximate reasoning with sparse fuzzy rule bases.

Chapter 7 summarizes work on approximate reasoning using the revision principle. It differs from other methods in that it performs approximate reasoning in a more intuitive way, based on the key concept of "reasoning based on revision". Five methods based on linear revision and semantic revision are presented. By using an "approximation measure", the approach allows approximate reasoning with sparse fuzzy rules and arbitrary shapes of membership functions.

Fuzzy retrieval is a main topic for fuzzy databases and also a useful technique for a wide range of applications of fuzzy systems. Chapter 8 presents an approach to fuzzy query handling for fuzzy retrieval. It allows the use of compound fuzzy attributes, which can be derived from numbers, interval values, scalars, and sets of all these data types, with appropriate aggregation functions and similarity measures on fuzzy sets.

A general programming language for fuzzy system development is very useful for supporting the growth of fuzzy system applications. Chapter 9 summarizes work on a fuzzy system description language, which accepts the user's description of the target fuzzy system and then generates corresponding C code from the description. It offers flexible types of data and knowledge, including fuzzy sets with different kinds of membership functions, fuzzy numerical and logical operations, and fuzzy rules.

1.2.2 Part II: Knowledge Representation, Integration, and Discovery by Soft Computing
The comprehensive applications of knowledge-based systems call for more flexibility in the representation and integration of different types of knowledge. Chapter 10 presents a "fuzzy factor hierarchy" for representing uncertain and vague concepts in legal expert systems. It makes it possible to represent objects with not only numerical features but also context-based features. A structural similarity measure containing a surface-level component and a deep-level component is proposed for reasoning and retrieval with the fuzzy factor hierarchy. The surface-level component consists of distance-based and feature-based similarity, while the deep-level component is determined by context-based similarity.

Chapter 11 presents two models for handling ordered datasets and time-series problems in classification applications. The two models are based on the theory of mass assignment, which unifies probability, possibility, and fuzzy sets in a single theory. The memory-based model supports belief updating by using recurrent fuzzy rules and focuses on how the computing model captures human belief and memory. The perception-based model uses trend fuzzy sets to describe the natural trends of a time series, and is based on the high-level perception mechanism humans use to sense their environment.

Chapter 12 introduces two approaches to knowledge integration for the design of classification systems. One is a fuzzy rule-based approach, in which fuzzy if-then rules generated from numerical data are used together with the given linguistic knowledge to construct a fuzzy rule-based system; the rules can be generated by a heuristic procedure, genetic algorithms, or neural networks. The other is a neural-network-based approach, in which both the given linguistic knowledge (i.e., fuzzy if-then rules) and the numerical data (i.e., training patterns) are handled as fuzzy training patterns and then used in the learning of extended neural networks.

With the rapid growth of applications of knowledge-based systems, the question of how to maximize the use of available knowledge, information, and data to make knowledge-based systems more "intelligent" has become a pressing issue. The study of knowledge discovery and data mining offers a possible way towards a solution. Chapter 13 proposes a two-stage method of knowledge discovery by neural networks. In the first stage, a Self-Organizing Map (SOM) is applied for initial clustering of the given training data.
The result is then refined by combining neighboring neurons that satisfy certain conditions. In the second stage, a three-layered feedforward network learns the center vector of each modified cluster obtained in the first stage. By pruning hidden neurons according to a so-called "degree of contribution", it discovers knowledge explaining why a cluster has been formed in terms of its corresponding attribute values. Chapter 14 presents an approach to uncertain rule discovery from databases with noisy and incomplete data. The approach combines rough set theory with the "generalization distribution table", which represents the probabilistic relationships between concepts and instances over discrete domains. It first selects a set of rules with larger strengths from the possible rules, and then finds "minimal relative reducts" from this set. It offers the flexibility to incorporate biases and background knowledge into the discovery process. Chapter 15 proposes an interactive approach to linguistic summaries of databases for data mining applications. The derived linguistic summaries are based on fuzzy logic with linguistic quantifiers. Three main types of data summaries are offered: type 1 produces an estimate of the cardinality of some population as a linguistic quantifier; type 2 determines typical values of a field; type 3, the most general, produces fuzzy rules describing the dependencies between values of particular fields. Both soft computing and knowledge engineering are rapidly developing and constantly evolving areas, and new techniques and applications of SC and KE continue to be proposed. The results achieved so far have already established a solid foundation for building more "intelligent" machines that will contribute greatly to our daily lives.
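The type 1 summaries described for Chapter 15 can be made concrete with a small sketch, assuming Zadeh's sigma-count and piecewise-linear membership functions for "young" and "most"; these particular membership functions and the age data are illustrative choices, not taken from the chapter.

```python
# Sketch of a type-1 linguistic summary ("Q objects are S"), assuming
# Zadeh's sigma-count and an illustrative relative quantifier "most".

def mu_young(age):
    # Illustrative fuzzy set "young": 1 below 30, 0 above 45, linear between.
    if age <= 30:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 15.0

def mu_most(r):
    # Illustrative quantifier "most": 0 below 30% of the records, 1 above 80%.
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5

def truth_of_summary(ages):
    # Truth of "most employees are young" = mu_most(sigma-count / n).
    sigma_count = sum(mu_young(a) for a in ages)
    return mu_most(sigma_count / len(ages))

ages = [25, 28, 33, 41, 52, 27, 30]
print(round(truth_of_summary(ages), 3))
```

The sigma-count generalizes ordinary counting to partial membership, so the resulting truth degree grades how well the quantified statement fits the data.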
References
[1] P. Beynon-Davies, Knowledge Engineering for Information Systems, McGraw-Hill, 1993.
[2] E. Feigenbaum and P. McCorduck, The Fifth Generation, Addison-Wesley, 1983.
[3] D. B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, 1995.
[4] L. Fu, Neural Networks in Computer Intelligence, McGraw-Hill, Inc., 1994.
[5] J.-S. R. Jang, C.-T. Sun and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Inc., 1997.
[6] C.-T. Lin and C. S. George Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems, Prentice-Hall International, Inc., 1996.
[7] C. V. Negoita, Expert Systems and Fuzzy Systems, The Benjamin/Cummings Publishing Company, Inc., 1985.
[8] D. W. Patterson, Introduction to Artificial Intelligence and Expert Systems, Prentice-Hall, Inc., 1990.
[9] D. A. Waterman, A Guide to Expert Systems, Addison-Wesley Publishing, 1986.
[10] R. R. Yager, "Fuzzy logics and artificial intelligence", Fuzzy Sets and Systems, Vol. 90, pp. 193-198, 1997.
[11] T. Yamakawa and G. Matsumoto (Eds.), Methodologies for the Conception, Design and Application of Soft Computing, Proceedings of the 5th International Conference on Soft Computing and Information/Intelligent Systems (IIZUKA'98), World Scientific Publishing, 1998.
[12] L. A. Zadeh, "Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic", Fuzzy Sets and Systems, Vol. 90, pp. 111-127, 1997.
[13] L. A. Zadeh, "Fuzzy Logic = Computing with Words", IEEE Trans. Fuzzy Systems, Vol. 4, No. 2, pp. 103-111, 1996.
[14] L. A. Zadeh, "Fuzzy logic, neural networks and soft computing", Commun. ACM, Vol. 37, No. 3, pp. 77-84, 1994.
[15] H. J. Zimmermann, Fuzzy Sets, Decision Making, and Expert Systems, Kluwer Academic Publishers, 1987.
Part I: Fuzzy Knowledge-Based Systems
Chapter 2 Linguistic Integrity: A Framework for Fuzzy Modeling AFRELI Algorithm
Jairo Espinosa, Joos Vandewalle
Katholieke Universiteit Leuven
Abstract In this paper, a method for fuzzy modeling is presented. The framework of the method is the concept of linguistic integrity. This framework presents several advantages, the most important being transparency, which can be exploited in two directions. The first is data mining, where the method can provide a linguistic relation (IF-THEN rules) among the variables. The second is improving the completeness of a model by giving the user an easy interface through which expert knowledge can be included. The algorithm starts from numerical (input-output) data and generates a rule base with a limited number of membership functions on each input domain. The rules are created in the environment of fuzzy systems. The algorithm used for rule extraction is named AFRELI. Keywords: fuzzy modeling, data mining, function approximation, knowledge extraction
2.1 Introduction
The use of models is the cornerstone of human reasoning. Human beings make use of models to determine the consequences of their acts. The representations of such models vary and can be external (mathematical models, if-then rules, etc.) or internal (thoughts, reasoning, reflexes). Human beings also use models not only to predict the results of their actions but also to understand the "mechanism" that governs nature. Of course, a causal view of systems is embedded in this line of reasoning. The differences among models are motivated by the information used
to construct the model and the information demanded from the model (representation and accuracy). Modern science provides us with new sensors, extending our possibilities to explore nature beyond our five senses. Most of the time the amount of data provided by sensors is overwhelming and obstructs our capacity to understand the phenomena governing the process. Information extraction is thus needed before some understanding of the process can be achieved. The basic principle of information extraction is the construction of a model able to capture the behavior of the data generated by the process. Recent studies have been successful in constructing mathematical models out of numerical data provided by sensors (system identification). On the other hand, linguistic models constructed from human experience in the form of IF-THEN rules have attracted attention for multiple applications; the development of expert systems is a good example. Information about the system under study can be present in multiple forms: numerical data, expert knowledge, hypotheses which are valid on similar models (uncertain knowledge), etc. The global behavior of the system is described partially by each of these pieces of information; some of them are redundant and some are unique. The aim is to design a modeling technique that can incorporate as much information as possible from very different sources without major changes in the format of the data. In this paper we present a modeling technique using fuzzy logic. Fuzzy logic is known for its capacity to combine linguistic information (expert knowledge in IF-THEN rule format) and numerical information in one framework. So far, the so-called neuro-fuzzy models have been the main attempt to construct fuzzy models from numerical data [11] [4]. To apply these models the structure of the fuzzy model should be fixed in advance (number of membership functions, number of rules, etc.).
Many schemes have been proposed to overcome this inconvenience; some are based on the accuracy of the approximation or the local error [5] [10] and others on fuzzy clustering methods [9] [12] [6]. These approaches yield models with good capabilities in the framework of numerical approximation, but sometimes very poor ones in the context of linguistic information. This paper presents the AFRELI algorithm (Autonomous Fuzzy Rule Extractor with Linguistic Integrity), which is able to fit input-output data while maintaining the semantic integrity of the rule base, in such a way that linguistic information can also be included and the description given by the rule base can be used directly to interpret the behavior of the data. The applications of the technique are therefore not limited to modeling; it can also be used in data mining to obtain causal relations among variables. The paper is structured as follows. Section 2.2 presents the structure of the fuzzy model, section 2.3 introduces the AFRELI algorithm, section 2.4 presents the FuZion algorithm used to preserve the semantic integrity of the domains, section 2.5 shows some application examples and, finally, section 2.6 gives the conclusions.
2.2 Structure of the fuzzy model
One of the advantages of modeling techniques using Fuzzy Inference Systems (FIS) is the flexibility of their structures. Some of the degrees of freedom found in a FIS are the shape and number of membership functions, the T-norms, the aggregation methods, etc. But sometimes this flexibility makes the analysis and design of such structures very difficult, so some criteria should be applied to fix some of the parameters of the FIS. In this paper we select some parameters using criteria such as reconstruction capabilities (optimal interface design) and semantic integrity [7] [8].
• Optimal interface design — error-free reconstruction: In a fuzzy system a numerical value is converted into a linguistic value by means of fuzzification. A defuzzification method should guarantee that this linguistic value can be reconstructed into the same numerical value:

for all x in [a, b]:  C^{-1}[C(x)] = x    (1)

where [a, b] is the universe of discourse. This condition guarantees the perfect correspondence between a numerical value and a linguistic concept and vice versa. The use of centroid defuzzification with triangular membership functions with 1/2 overlap satisfies this requirement (see proof in [7]).
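The reconstruction condition (1) can be checked numerically. The sketch below assumes triangular membership functions with 1/2 overlap (a partition whose degrees sum to one, with shouldered outer sets) and defuzzification as the membership-weighted mean of the modal values; the particular modal values are illustrative, not from the chapter.

```python
# Numerical check of the error-free reconstruction condition (1) for a
# 1/2-overlap triangular partition with shouldered outer sets.

def tri(x, left, mode, right):
    # Standard triangular membership function.
    if x <= left or x >= right:
        return 0.0
    if x <= mode:
        return (x - left) / (mode - left)
    return (right - x) / (right - mode)

def fuzzify(x, modes):
    # C(x): membership degree of x in each set of the partition.
    degs = []
    for i, m in enumerate(modes):
        left = modes[i - 1] if i > 0 else m - 1.0
        right = modes[i + 1] if i < len(modes) - 1 else m + 1.0
        d = tri(x, left, m, right)
        if i == 0 and x <= m:                 # left shoulder
            d = 1.0
        if i == len(modes) - 1 and x >= m:    # right shoulder
            d = 1.0
        degs.append(d)
    return degs

def defuzzify(degs, modes):
    # C^-1: centroid of the modal values weighted by membership degree.
    return sum(d * m for d, m in zip(degs, modes)) / sum(degs)

modes = [0.0, 2.0, 3.0, 7.0, 10.0]   # non-uniform spacing on [0, 10]
for x in [0.5, 2.0, 4.9, 8.3]:
    assert abs(defuzzify(fuzzify(x, modes), modes) - x) < 1e-9
print("reconstruction holds")
```

Note that the reconstruction is exact even for non-uniformly spaced modal values: between two neighboring modes the degrees interpolate linearly and sum to one, so their weighted mean returns x itself.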
• Semantic integrity: This guarantees that the membership functions represent linguistic concepts. The conditions needed to guarantee such semantic integrity are:
— Distinguishability: Each linguistic label should have semantic meaning and the fuzzy set should clearly define a range in the universe of discourse, so the membership functions should be clearly different. Too much overlap between two membership functions means that they represent the same linguistic concept. The assumption of an overlap equal to 1/2 makes sure that the support of each fuzzy set is different. The distance between the modal values of the membership functions is also very important to make sure that the membership functions can be distinguished. The modal value of a membership function is defined via the alpha-cut with alpha = 1:

m_i = mu_{i(alpha=1)}(x),  i = 1, ..., N    (2)
— Justifiable number of elements: The number of sets should be compatible with the number of "quantifiers" that a human being can handle, and should not exceed the limit of 7 ± 2 distinct terms. This is a practical limitation of our brain, reflected in our language: it is almost impossible to find a language that can "formulate" more than 9 quantifiers. To handle more categories we use methods such as enumeration, which are not part of natural language [2]. The shape of the membership functions does not guarantee this property. In this paper we present the FuZion algorithm, a method to reduce the number of membership functions on a given universe of discourse.
— Coverage: Any element of the universe of discourse should belong to at least one of the fuzzy sets. This concept is also referred to in the literature as epsilon-completeness [4]. It guarantees that the input value is considered during the inference process.
— Normalization: Since each linguistic label has semantic meaning, at least one value in the universe of discourse should have a membership degree equal to one. In
other words, all the fuzzy sets should be normal. Further details about these concepts can be found in [7] [8]. Based on these concepts, the choice for the membership functions is triangular and normal membership functions (mu_1(x), mu_2(x), ..., mu_N(x)) with a specific overlap of 1/2. This means that the height of the intersection of two successive fuzzy sets is

hgt(mu_i ∩ mu_{i±1}) = 1/2    (3)
The choice of the AND and OR operations is motivated by the need to generate a continuous, differentiable nonlinear map from the FIS. Using the product as the AND operation and the probabilistic sum as the OR makes it easier to derive the gradients that can be used to refine the models. If no further refinement is applied, there is no major reason to prefer the product and probabilistic sum over the MIN/MAX operations. The aggregation method and the defuzzification method are discussed in the next sections.
2.3 The AFRELI algorithm
The AFRELI (Autonomous Fuzzy Rule Extractor with Linguistic Integrity) algorithm is designed to obtain a good trade-off between numerical approximation and linguistic integrity: the more accurately one wants to describe a function, the more difficult it is to make a consistent linguistic description. The main steps involved in the algorithm are:

• Clustering.
• Projection.
• Reduction of terms.
• Consequence calculation.
• (Optional) further antecedent adjustment.

The detailed AFRELI algorithm proceeds as follows:

(1) Collect N points from the inputs (U = {u_1, ..., u_N}) and the output (Y = {y_1, ..., y_N}),

y_k = f(u_k)    (4)

where u_k ∈ R^n and y_k ∈ R represent the inputs and the output of the function at instant k, and construct the feature vectors

x_k = [u_k; y_k]    (5)

with x_k ∈ R^{n+1}. These feature vectors are a spatial representation of the samples in an (n+1)-dimensional space.

(2) Using the N feature vectors, find C clusters with the mountain clustering method [12] [6] and refine them using fuzzy c-means [1]. Mountain clustering helps to find the number of clusters that should be extracted and to initialize the positions of their centers; these two parameters are very important to obtain good results when the fuzzy c-means algorithm is applied. The cluster prototypes are collected as

X_C = [x̄^1 x̄^2 ... x̄^C] ∈ R^{(n+1)×C}    (6)

(3) Project the C prototypes (centers) of the clusters onto the input spaces, taking the projected value of each prototype as the modal value of a triangular membership function:

m_i^j = x̄_i^j    (7)

where i = 1, ..., C and j = 1, ..., n.

(4) Sort the modal values on each domain such that:

m_i^j < m_{i+1}^j  for all j    (8)
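The cluster-prototype estimation of step (2) can be sketched with the subtractive variant of the mountain method, where the potential is evaluated at the data points themselves rather than on a grid; the result would then be handed to fuzzy c-means for refinement. The radii and stopping ratio below are illustrative choices, not the chapter's parameters.

```python
import numpy as np

# Subtractive-clustering sketch of the mountain method (step 2):
# each point accumulates a potential from its neighbors; the highest
# potential becomes a prototype and its "mountain" is subtracted.

def subtractive_clustering(X, r_a=0.5, stop_ratio=0.4):
    r_b = 1.5 * r_a                       # squash radius, a common choice
    alpha, beta = 4.0 / r_a**2, 4.0 / r_b**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dist.
    P = np.exp(-alpha * d2).sum(axis=1)                  # potentials
    first, centers = P.max(), []
    while True:
        k = int(P.argmax())
        if P[k] < stop_ratio * first:     # no significant mountain left
            break
        centers.append(X[k].copy())
        # Subtract the revealed mountain around the new prototype.
        P = P - P[k] * np.exp(-beta * ((X - X[k]) ** 2).sum(-1))
    return np.array(centers)

# Two well-separated blobs should yield one prototype near each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.05, (40, 2)),
               rng.normal(1.0, 0.05, (40, 2))])
C = subtractive_clustering(X)
print(len(C))
```

The number of prototypes returned plays the role of C in Eq. (6); fuzzy c-means would then refine the prototype positions starting from these initial centers.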
(5) Add two more modal values for each input to guarantee full coverage of the input space:

m_0^j = min_{k=1,...,N} u_k^j    (9)

m_{C+1}^j = max_{k=1,...,N} u_k^j    (10)
(6) Construct the triangular membership functions with overlap 1/2 as:

mu_i(x^j) = max(0, min((x^j - m_{i-1}^j)/(m_i^j - m_{i-1}^j), (x^j - m_{i+1}^j)/(m_i^j - m_{i+1}^j)))    (11)

where i = 1, ..., C, and the trapezoidal membership functions at the extremes of each universe of discourse:

mu_0(x^j) = max(0, min(1, (x^j - m_1^j)/(m_0^j - m_1^j)))    (12)

mu_{C+1}(x^j) = max(0, min(1, (x^j - m_C^j)/(m_{C+1}^j - m_C^j)))    (13)
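The construction of Eqs. (11)-(13) can be sketched directly from the sorted modal values. The modal values below are illustrative; the assertions check the coverage property (degrees summing to one) and the 1/2 intersection height of Eq. (3).

```python
# Build the 1/2-overlap triangular partition of Eq. (11) with the
# shouldered trapezoids of Eqs. (12)-(13) from sorted modal values
# m_0 < m_1 < ... < m_{C+1}.

def make_partition(modes):
    C = len(modes) - 2  # modes[0] and modes[-1] are the coverage bounds

    def mu(i, x):
        if i == 0:                      # left shoulder, Eq. (12)
            return max(0.0, min(1.0, (x - modes[1]) / (modes[0] - modes[1])))
        if i == C + 1:                  # right shoulder, Eq. (13)
            return max(0.0, min(1.0, (x - modes[C]) / (modes[C + 1] - modes[C])))
        return max(0.0, min((x - modes[i - 1]) / (modes[i] - modes[i - 1]),
                            (x - modes[i + 1]) / (modes[i] - modes[i + 1])))
    return mu

modes = [0.0, 1.0, 2.5, 4.0, 6.0]      # m_0, m_1, m_2, m_3, m_{C+1}; C = 3
mu = make_partition(modes)

# Coverage: membership degrees sum to one everywhere sampled.
for x in [0.0, 0.7, 1.0, 1.9, 3.2, 5.0, 6.0]:
    assert abs(sum(mu(i, x) for i in range(len(modes))) - 1.0) < 1e-9

# Successive sets intersect at height 1/2 (Eq. (3)), e.g. halfway m_1, m_2.
xm = (modes[1] + modes[2]) / 2
assert abs(mu(1, xm) - 0.5) < 1e-9 and abs(mu(2, xm) - 0.5) < 1e-9
print("partition ok")
```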
(7) Apply the FuZion algorithm to reduce the number of membership functions. FuZion guarantees a reduction of the membership functions until they fulfill the requirements of "distinguishability" and "justifiable number of elements".

(8) Associate linguistic labels (e.g. BIG, MEDIUM, SMALL, etc.) with the resulting membership functions.

(9) Construct the rule base with all possible antecedents (all possible permutations). This guarantees the completeness of the rules and full coverage of the working space. Use rules of the form:
IF u_k^1 is mu_i^1 AND u_k^2 is mu_i^2 AND ... AND u_k^n is mu_i^n THEN y_k = ȳ_i

The evaluation of the antecedents of each rule can be expressed equivalently in terms of the min operator or the product operator:

mu_i(u_k) = min{mu_i^1(u_k^1), mu_i^2(u_k^2), ..., mu_i^n(u_k^n)}    (14)
mu_i(u_k) = mu_i^1(u_k^1) · mu_i^2(u_k^2) · ... · mu_i^n(u_k^n)    (15)
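Enumerating all antecedent permutations (step 9) and evaluating Eq. (15) with the product t-norm can be sketched for a two-input case; the membership-degree tables below are illustrative, not from the chapter.

```python
from itertools import product

# Sketch of step (9): the rule base enumerates all antecedent
# permutations, and Eq. (15) evaluates each rule's firing strength as
# the product of its antecedent membership degrees.

# Membership degrees of one concrete input u_k = (u1, u2) in each set.
mu1 = {"SMALL": 0.3, "BIG": 0.7}    # degrees of u1 in its two sets
mu2 = {"LOW": 0.9, "HIGH": 0.1}     # degrees of u2 in its two sets

rules = list(product(mu1, mu2))     # all antecedent permutations
strength = {(a, b): mu1[a] * mu2[b] for a, b in rules}  # Eq. (15)

print(len(rules), round(strength[("BIG", "LOW")], 2))
```

With 1/2-overlap partitions the degrees of each input sum to one, so the product strengths over the complete rule grid also sum to one, which keeps the weighted sum of the consequences well normalized.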
(10) Propagate the N values of the inputs and calculate the consequences of the rules as singletons ȳ_i. These singletons can be calculated as the solution of a least squares (LS) problem. Observe that the output of the fuzzy system can be calculated as:

f(u_k) = Σ_{i=1}^{L} mu_i(u_k) ȳ_i / Σ_{i=1}^{L} mu_i(u_k)    (16)

where L is the number of rules and mu_i(u_k) is calculated as shown in equation (14) or (15), according to the selected AND operator. The system can then be represented as a weighted sum of the consequences:

f(u_k) = Σ_{i=1}^{L} w_i(u_k) ȳ_i    (17)

where

w_i(u_k) = mu_i(u_k) / Σ_{l=1}^{L} mu_l(u_k)    (18)

expresses the strength w_i of rule i when the input is u_k. Taking all N samples, the problem can be written as:

y = W Θ + E    (19)

with y = [y_1, ..., y_N]^T, Θ = [ȳ_1, ..., ȳ_L]^T, E = [e_1, ..., e_N]^T, and W the N × L matrix whose k-th row is [w_1(u_k), ..., w_L(u_k)].
The aim is to reduce the norm of the error vector E as much as possible. Using the quadratic norm:

min ||E||^2 = min ||y - W Θ||^2    (20)
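The least-squares problem of Eqs. (19)-(20) can be sketched numerically; the one-input model below, with two membership functions (hence two rules) and a linear target, is an illustrative assumption, not the chapter's example.

```python
import numpy as np

# Sketch of step (10): stack the normalized firing strengths w_i(u_k)
# into the N x L matrix W of Eq. (19) and solve min ||y - W*theta||^2.

def strengths(u):
    # Firing strengths of the two rules for a scalar input u in [0, 1];
    # a 1/2-overlap pair of sets already sums to one, Eq. (18).
    w = np.array([1.0 - u, u])
    return w / w.sum()

# Training data from the target y = 2u + 1; the exact consequences are
# then theta = [1, 3], the target evaluated at the two modal values.
U = np.linspace(0.0, 1.0, 11)
y = 2.0 * U + 1.0
W = np.vstack([strengths(u) for u in U])

theta, *_ = np.linalg.lstsq(W, y, rcond=None)
print(np.round(theta, 6))
```

Because the target is within the model class, the LS solution recovers the consequences exactly, which is the sense in which such a model is "at least as good as the linear model" discussed below.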
The solution to this problem can be found by LS if

rank(W) = dim(Θ)    (21)
This implies that all the rules have to receive enough excitation during training. In practice this is not always guaranteed, due to sparseness of the data; it is then better to apply Recursive Least Squares (RLS), which guarantees that the adaptation only affects the excited rules. Another advantage of RLS is the possibility of initializing the consequence values using prior knowledge, so that the algorithm only "corrects" the consequences of the excited rules. In this way the prior knowledge remains valid as long as the data do not contradict it. If no prior knowledge is available, it can be created from the data. The easiest way is to construct a linear model and initialize the consequences of the rules with the values given by this model: the modal values of the membership functions of a given rule are evaluated in the linear approximation, and the resulting value becomes the singleton consequence. This guarantees that the fuzzy model will be at least as good as the linear model. Another alternative, with even better approximation capabilities, is to use "the smallest fuzzy model that can be built": a fuzzy model with only two membership functions on each input. This structure generates a multilinear approximator (linear for one input, bilinear for two inputs, and so on), with the advantage that the consequence calculation via least squares is well conditioned, since each point excites all the rules of the model. Once the smallest fuzzy model is built, it is used to generate the initial values of the consequences with the same procedure proposed for the linear model. The RLS algorithm used to calculate the singleton consequences of the rule base is:

θ(k+1) = θ(k) + γ(k)[y(k+1) - W_{k+1} θ(k)]    (22)

with W_k = [w_1(u_k), w_2(u_k), ..., w_L(u_k)] and:
γ(k) = P(k+1) W_{k+1}^T    (23)
γ(k) = P(k) W_{k+1}^T / (1 + W_{k+1} P(k) W_{k+1}^T)    (24)

P(k+1) = [I - γ(k) W_{k+1}] P(k)    (25)
with the initial value P(0) = αI, where α is large. The initial values θ(0) are the initial consequences described in the previous paragraph. If the data are a priori considered to excite the whole rule base, a good initialization is:

θ̄(0) = (max_k y_k + min_k y_k) / 2    (26)
Other details about the initialization approaches are discussed in [3].

(11) (Optional step) If further refinement is desired to improve the approximation, the positioning of the modal values can be optimized using constrained gradient descent, with the "distinguishability" condition as the main constraint. Observe that gradient descent converges to a local minimum, so the optimal solution stays close to the initial one. This is why the step is optional: the expected improvement will not be very significant for many applications, especially when the main interest is the linguistic description of the rules. Special care should be taken in calculating the "true" gradient when the model is to be used in dynamic operation (with delayed feedback from its own output).

(12) Convert the singletons to triangular membership functions with overlap 1/2 and modal values equal to the positions of the singletons ȳ_i. Consider the vector Ȳ whose entries are the L consequences of the rules, sorted such that:

ȳ_1 ≤ ȳ_2 ≤ ... ≤ ȳ_L    (27)
The triangular membership function of the i-th consequence is:

mu_i^y(y) = max(0, min((y - ȳ_{i-1})/(ȳ_i - ȳ_{i-1}), (y - ȳ_{i+1})/(ȳ_i - ȳ_{i+1})))    (28)
and the two membership functions at the extremes:

mu_1^y(y) = max(0, min((y - 2ȳ_1 + ȳ_2)/(ȳ_2 - ȳ_1), (y - ȳ_2)/(ȳ_1 - ȳ_2)))    (29)

mu_L^y(y) = max(0, min((y - ȳ_{L-1})/(ȳ_L - ȳ_{L-1}), (y - 2ȳ_L + ȳ_{L-1})/(ȳ_{L-1} - ȳ_L)))    (30)
This description of the outer membership functions guarantees that their centers of gravity lie exactly on their modal values, so the error-free reconstruction condition for optimal interface design is fulfilled.

(13) Apply the FuZion algorithm to reduce the number of membership functions in the output universe.

(14) Associate linguistic labels with the resulting membership functions.

(15) With the partition of the output universe, fuzzify the values of the singletons. Observe that each singleton will have a membership degree in at least one set and at most two.

(16) Relate the fuzzified values to the corresponding rules. Each rule will then have one consequence, or two weighted consequences where the weights are the nonzero membership values of the fuzzified singleton. This conversion of the singleton consequences into weighted triangular consequences gives linguistic meaning to the consequences.
2.4 The FuZion algorithm
The FuZion algorithm is a routine that merges triangular membership functions whose modal values are too close to each other. This merging is needed to preserve the distinguishability and the justifiable number of elements on each input domain, and thus the semantic integrity. The FuZion algorithm goes as follows:

(1) Take the triangular membership functions mu_1(x), mu_2(x), ..., mu_N(x) with 1/2 overlap, and the modal values

m_i = mu_{i(alpha=1)}(x),  i = 1, ..., N    (31)
with:

m_1 < m_2 < ... < m_N    (32)
(2) Define the minimum distance M acceptable between modal values.

(3) Calculate the differences between successive modal values as:

d_j = m_{j+1} - m_j,  j = 1, ..., N - 1    (33)
(4) Find all the differences smaller than M.

(5) If there is no difference smaller than M, stop.

(6) Merge all the modal values corresponding to consecutive differences smaller than M using:

m̄ = (1/D) Σ_{i=a}^{b} m_i    (34)

D = b - a + 1    (35)

where a and b are the indices of the first and last modal values of the merged sequence and D is the number of merged membership functions.

(7) Update N and go to step 3.
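The merge loop above can be sketched compactly; the modal values and minimum distance M below are illustrative.

```python
# Sketch of FuZion: runs of modal values closer than M are replaced by
# their mean, Eqs. (34)-(35), and the scan repeats until all successive
# distances are at least M.

def fuzion(modes, M):
    modes = sorted(modes)
    while True:
        merged, i, changed = [], 0, False
        while i < len(modes):
            j = i
            # Extend over consecutive differences smaller than M (step 6).
            while j + 1 < len(modes) and modes[j + 1] - modes[j] < M:
                j += 1
            if j > i:
                changed = True
            D = j - i + 1                            # Eq. (35)
            merged.append(sum(modes[i:j + 1]) / D)   # Eq. (34)
            i = j + 1
        modes = merged
        if not changed:          # step 5: no difference below M remains
            return modes

out = fuzion([0.0, 0.4, 0.5, 2.0, 2.1, 5.0], M=1.0)
print(out)
assert all(b - a >= 1.0 for a, b in zip(out, out[1:]))
```

Merging can create a new pair of neighbors closer than M, which is why the outer loop re-scans (the "go to step 3" of the algorithm) until no difference is below the threshold.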
2.5 Examples
In this section three example applications of the AFRELI and FuZion algorithms are presented. The first two are approximations of nonlinear static maps; the last one is the prediction of a chaotic time series.

2.5.1 Example 1: Modeling a two-input nonlinear function
In this example we consider the function:

f(x, y) = sin(πx/10) · sin(πy/10)    (36)
Fig. 2.1 Effect of the FuZion algorithm: (a) original membership functions; (b) membership functions after FuZion
441 regularly distributed points were selected from the interval [-10,10] × [-10,10]. The graph of the function is shown in figure 2.2. Using the mountain clustering and fuzzy c-means algorithms, 26 clusters were found; they are shown in figure 2.3, represented with 'x'. The center value of each cluster is then projected onto the input domains, as shown in figure 2.4. Figure 2.5 shows the projected membership functions. The FuZion algorithm is applied, yielding the membership functions shown in figure 2.6. The output membership functions are shown in figure 2.7. Figure 2.8 shows the identified surface. The 25 extracted rules are:

(1) IF x is Negative Large AND y is Negative Large THEN z is Negative with strength 0.01 AND Zero with strength 0.99
(2) IF x is Negative Medium AND y is Negative Large THEN z is Zero with strength 0.92 AND Positive with strength 0.08
(3) IF x is Zero AND y is Negative Large THEN z is Negative with strength 0.01 AND Zero with strength 0.99
(4) IF x is Positive Medium AND y is Negative Large THEN z is Negative with strength 0.1 AND Zero with strength 0.9
Fig. 2.2 Example 1: Function f(x, y) = sin(πx/10) sin(πy/10)
Fig. 2.3 Example 1: Extracted clusters ('x') from the data ('o')
(5) IF x is Positive Large AND y is Negative Large THEN z is Negative with strength 0.03 AND Zero with strength 0.97
Fig. 2.4 Example 1: Projection of the centers of the clusters
Fig. 2.5 Example 1: Projected membership functions for inputs X and Y
(6) IF x is Negative Large AND y is Negative Medium THEN z is Zero with strength 0.96 AND Positive with strength 0.04
(7) IF x is Negative Medium AND y is Negative Medium THEN z is Zero with strength 0.01 AND Positive with strength 0.99
(8) IF x is Zero AND y is Negative Medium THEN z is Zero with strength 0.92 AND Positive with strength 0.08
(9) IF x is Positive Medium AND y is Negative Medium THEN z is Negative with strength 0.99 AND Zero with strength 0.01
(10) IF x is Positive Large AND y is Negative Medium THEN z is Negative with strength 0.1 AND Zero with strength 0.90
(11) IF x is Negative Large AND y is Zero THEN z is Negative with strength 0.02 AND Zero with strength 0.98
(12) IF x is Negative Medium AND y is Zero THEN z is Negative with strength 0.11 AND Zero with strength 0.89
(13) IF x is Zero AND y is Zero THEN z is Negative with strength 0.03 AND Zero with strength 0.97
(14) IF x is Positive Medium AND y is Zero THEN z is Zero with strength 0.92 AND Positive with strength 0.08
(15) IF x is Positive Large AND y is Zero THEN z is Negative with strength 0.01 AND Zero with strength 0.99
(16) IF x is Negative Large AND y is Positive Medium THEN z is Negative with strength 0.07 AND Zero with strength 0.93
(17) IF x is Negative Medium AND y is Positive Medium THEN z is Negative with strength 1
(18) IF x is Zero AND y is Positive Medium THEN z is Negative with strength 0.11 AND Zero with strength 0.89
(19) IF x is Positive Medium AND y is Positive Medium THEN z is Positive with strength 1
(20) IF x is Positive Large AND y is Positive Medium THEN z is Zero with strength 0.93 AND Positive with strength 0.07
(21) IF x is Negative Large AND y is Positive Large THEN z is Negative with strength 0.02 AND Zero with strength 0.98
(22) IF x is Negative Medium AND y is Positive Large THEN z is Negative with strength 0.07 AND Zero with strength 0.93
(23) IF x is Zero AND y is Positive Large THEN z is Negative with strength 0.02 AND Zero with strength 0.98
(24) IF x is Positive Medium AND y is Positive Large THEN z is Zero with strength 0.96 AND Positive with strength 0.04
(25) IF x is Positive Large AND y is Positive Large THEN z is Negative with strength 0.01 AND Zero with strength 0.99

It is important to remark that in this example one of the consequences clearly dominates in most of the rules; when this situation
Fig. 2.6 Example 1: Membership functions after FuZion
Fig. 2.7 Example 1: (a) Singleton consequences; (b) output membership functions with linguistic meaning
appears, it is possible to eliminate the consequence with the smallest strength with only a minor impact on the numerical approximation.

2.5.2 Example 2: Modeling a three-input nonlinear function
For this example the data were generated using the function:

f(x, y, z) = (1 + x^0.5 + y^-1 + z^-1.5)^2    (37)
In this case 216 random points from the input range [1,6] × [1,6] × [1,6] were used as the training set and 125 random points from the input range
Fig. 2.8 Example 1: Surface generated by the fuzzy system
[1.5,5.5] × [1.5,5.5] × [1.5,5.5] were used as the validation set. As a performance index we used the average percentage error (APE):
APE = (1/N) Σ_{i=1}^{N} |T(i) - O(i)| / |T(i)| × 100%    (38)
where T(i) is the desired output and O(i) is the predicted output. This performance index allows us to compare the present result with previous works. First the mountain clustering procedure was used and 11 clusters were found; further refinement was obtained with the fuzzy c-means clustering algorithm. Figure 2.9 shows the projected membership functions. After reduction using FuZion with a minimum distance factor of 15% of the size of the universe of discourse for each input, the membership functions shown in figure 2.10 were obtained. Figure 2.11 shows the singleton consequences and the consequences after FuZion. Table 2.1 shows the comparative results with previous work. The table shows that the model obtained with the AFRELI method has an average performance when the training points are evaluated, but when the models are compared on the validation set it is clear that the ANFIS model and the AFRELI model exhibit the
Fig. 2.9 Example 2: Projected membership functions for inputs X and Y
best performance. This result confirms that the AFRELI model has not only acceptable approximation capability and linguistic meaning, but also good generalization.
2.5.3 Example 3: Predicting a chaotic time series
This example shows the capability of the algorithm to capture the dynamics governing the Mackey-Glass chaotic time series, generated by the following delay differential equation:

dx(t)/dt = 0.2 x(t - τ) / (1 + x^10(t - τ)) - 0.1 x(t)    (39)

where τ = 17. The numerical solution of this differential equation was obtained using the fourth-order Runge-Kutta method, with a time step of 0.1 and initial condition x(0) = 1.2. The simulation was run for 2000 seconds
Fig. 2.10 Example 2: Membership functions after FuZion
and samples were taken every second. To train and test the fuzzy system, 1000 points were extracted from t = 118 to t = 1117. The first 500 points were used as the training set and the remaining ones as the validation set. First a six-step-ahead predictor is constructed using past outputs as inputs of the model:

[x(t - 18), x(t - 12), x(t - 6), x(t)]    (40)
with output x(t + 6). After applying the mountain clustering method, 57 clusters were found. Some refinement of the cluster positions was obtained using the fuzzy c-means clustering method. After projection and FuZion, the membership functions shown in figure 2.12 were obtained. To permit a comparison with previous works, the prediction error was evaluated using the so-called Non-Dimensional Error Index (NDEI).
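The series generation described above can be sketched as follows, holding the delayed term constant within each Runge-Kutta step (a common simplification; the chapter does not detail its delay handling) and taking x(t) = 0 for t < 0.

```python
# Sketch of the Mackey-Glass generation of Eq. (39): fourth-order
# Runge-Kutta with step 0.1, tau = 17, x(0) = 1.2, x(t) = 0 for t < 0.

def mackey_glass(n_steps, dt=0.1, tau=17.0, x0=1.2):
    lag = int(round(tau / dt))
    xs = [x0]
    for k in range(n_steps):
        x_tau = xs[k - lag] if k >= lag else 0.0
        drive = 0.2 * x_tau / (1.0 + x_tau ** 10)  # delayed term, frozen

        def f(x):
            return drive - 0.1 * x

        x = xs[-1]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs.append(x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return xs

series = mackey_glass(20000)    # 2000 s integrated at 0.1 s internally
samples = series[::10]          # keep one point per second
print(len(samples))
```

Training pairs for the predictor are then formed by pairing the input vector of Eq. (40), read off from `samples`, with the value six seconds ahead.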
Table 2.1 Example 2: Performance comparison with previous work. The results from previous works were taken from [4].
Model            APE trn    APE val    Param. number   Size train. set   Size valid. set
AFRELI           1.002 %    1.091 %    80              216               125
ANFIS            0.043 %    1.066 %    50              216               125
GMDH model       4.7 %      5.7 %      —               20                20
Fuzzy model 1    1.5 %      2.1 %      22              20                20
Fuzzy model 2    0.59 %     3.4 %      32              20                20
Fig. 2.11 Example 2: (a) Singleton consequences; (b) membership functions with linguistic meaning
    NDEI = sqrt( (1/P) Σ_{i=1}^{P} (T(i) - O(i))^2 ) / σ(T)        (41)
where T(i) is the desired output, O(i) is the predicted output, P is the number of evaluated points, and σ(T) is the standard deviation of the target series. Tables 2.2 and 2.3 show some comparative results. The conclusion from these results is that the performance of AFRELI is, from the numerical point of view, acceptable and similar to that of other numerically oriented techniques. The added value of the AFRELI algorithm is the linguistic meaning of the model.
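As an illustration, the series generation and the error index can be sketched as follows. This is a minimal sketch: the function and variable names are mine, and the lagged term is held fixed within each Runge-Kutta step, a common simplification for delay equations; the chapter does not state which variant was used.

```python
import numpy as np

def mackey_glass(n_steps, tau=17.0, dt=0.1, x0=1.2):
    """Integrate dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t)
    with 4th-order Runge-Kutta; x(t) = x0 is used for t <= 0."""
    delay = int(round(tau / dt))
    x = np.empty(n_steps + 1)
    x[0] = x0
    def f(xt, xlag):
        return 0.2 * xlag / (1.0 + xlag**10) - 0.1 * xt
    for k in range(n_steps):
        # x(t - tau): fall back to the initial condition while t - tau <= 0
        xlag = x[k - delay] if k >= delay else x0
        k1 = f(x[k], xlag)
        k2 = f(x[k] + 0.5 * dt * k1, xlag)   # lagged term held fixed over the step
        k3 = f(x[k] + 0.5 * dt * k2, xlag)
        k4 = f(x[k] + dt * k3, xlag)
        x[k + 1] = x[k] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

def ndei(target, predicted):
    """Non-dimensional error index: RMSE divided by the std of the target."""
    target, predicted = np.asarray(target), np.asarray(predicted)
    rmse = np.sqrt(np.mean((target - predicted)**2))
    return rmse / np.std(target)

series = mackey_glass(20000)   # 2000 s at dt = 0.1
samples = series[::10]         # one sample per second
```

Note that a predictor that always outputs the series mean scores NDEI = 1 by construction, which makes the index easy to read: values well below 1 indicate genuine predictive power.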
Fig. 2.12 Example 3: Membership functions after projection and FuZion (projected membership functions for inputs x(t - 18) and x(t - 12))
2.6 Conclusions

Fuzzy inference systems are "universal approximators". A comparative advantage of fuzzy systems is their linguistic interpretability. The AFRELI algorithm, in combination with the FuZion algorithm, guarantees an acceptable compromise between numerical accuracy and linguistic integrity. The numerical accuracy of the algorithm can be directly controlled via the parameters of the FuZion and clustering algorithms. Some improvement of the numerical performance of the model can be obtained by "fine" tuning the parameters of the antecedents by means of gradient descent techniques, but the procedure should respect the minimum distance M between the modal values. The selection of the mentioned parameters is the user's choice.
Fig. 2.13 Example 3: (a) Singletons (b) Membership functions with linguistic meaning
Table 2.2 Example 3: Performance for prediction six steps ahead. The results from previous works were taken from [4].

Method                        Training Cases   Non-dimensional error index
AFRELI                        500              0.0493
AFRELI (with optional step)   500              0.0324
ANFIS                         500              0.007
AR model                      500              0.19
Cascaded-correlation NN       500              0.06
Backpropagation MLP           500              0.02
6th-order polynomial          500              0.04
Linear predictive method      2000             0.55
The use of normalized triangular membership functions with 0.5 overlap also guarantees a limited complexity. Because at most two membership functions have a value different from zero on each input, the maximum number of evaluated rules is 2^N, where N is the number of inputs. Hence, the problem of combinatorial explosion in the evaluation of fuzzy systems becomes a problem of storage rather than a problem of computation. Future research will be oriented to the implementation of a similar algorithm using membership functions composed of third-order polynomials. Some of the features described here are also applicable to this kind of membership function, but a different defuzzification method should be implemented to guarantee zero reconstruction error (an optimal interface design). The use of this kind of membership function will also guarantee a continuous derivative, a very important element to "fine tune" the antecedents.

Fig. 2.14 Example 3: (a) Mackey-Glass time series (solid line) from t = 618 to 1117 and six-steps-ahead prediction (dashed line) (b) Prediction errors
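The complexity argument can be illustrated numerically. The sketch below (names are illustrative) shows that in a normalized triangular partition with 0.5 overlap, at most two memberships per input are nonzero and they sum to one, so only the cross-product of the active sets, at most 2^N rules, needs evaluating:

```python
import numpy as np

def tri_partition(x, centers):
    """Membership degrees of x in a normalized triangular partition:
    neighbouring triangles cross at 0.5 and the degrees sum to 1."""
    c = np.asarray(centers, dtype=float)
    mu = np.zeros(len(c))
    x = np.clip(x, c[0], c[-1])
    j = np.searchsorted(c, x)            # x lies in [c[j-1], c[j]]
    if j == 0:
        mu[0] = 1.0
    else:
        w = (x - c[j - 1]) / (c[j] - c[j - 1])
        mu[j - 1], mu[j] = 1.0 - w, w    # at most two nonzero memberships
    return mu

centers = [0.0, 1.0, 2.5, 4.0]
mu = tri_partition(1.6, centers)
active = np.flatnonzero(mu)
# For N inputs, only the cross-product of the <= 2 active sets per input
# needs evaluating: at most 2**N rules out of len(centers)**N stored ones.
```

This is why the chapter can trade computation for storage: the rule table may be huge, but each inference touches only a small, easily located corner of it.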
Fig. 2.15 Example 3: (a) Mackey-Glass time series (solid line) from t = 118 to 1117 and 84-steps-ahead prediction (dashed line) (b) Prediction errors
Acknowledgments

This work is supported by several institutions: the Flemish Government: Concerted Research Action GOA-MIPS (Model-based Information Processing Systems); the FWO (Fund for Scientific Research - Flanders) project G.0262.97: Learning and Optimization: an Interdisciplinary Approach; the
Table 2.3 Example 3: Performance for prediction 84 steps ahead (the first seven rows) and 85 steps ahead (the last four rows). Results for the first seven methods are obtained by simulation of the model obtained for prediction six steps ahead. Results for localized receptive fields (LRFs) and multiresolution hierarchies (MRHs) are for networks trained to predict 85 steps ahead. The results from previous works were taken from [4].

Method                        Training Cases   Non-dimensional error index
AFRELI                        500              0.1544
AFRELI (with optional step)   500              0.1040
ANFIS                         500              0.036
AR model                      500              0.39
Cascaded-correlation NN       500              0.32
Backpropagation MLP           500              0.05
6th-order polynomial          500              0.85
Linear predictive method      2000             0.60
LRF                           500              0.10-0.25
LRF                           10000            0.025-0.05
MRH                           500              0.05
MRH                           10000            0.02
FWO Research Communities: ICCoS (Identification and Control of Complex Systems) and Advanced Numerical Methods for Mathematical Modelling; the IWT Action Programme on Information Technology (ITA/GBO/T23); the Federal Office for Scientific, Technical and Cultural Affairs, Interuniversity Poles of Attraction Programme (IUAP P4-02 (1997-2001): Modeling, Identification, Simulation and Control of Complex Systems; and IUAP P4-24 (1997-2001): Intelligent Mechatronic Systems (IMechS)); and the European Commission: TMR project System Identification.
References

Bezdek J.C., "A Physical Interpretation of Fuzzy ISODATA," IEEE Trans. Syst., Man, Cybern., 6, p. 387, 1976.
Broadbent D., "The Magic Number Seven After Fifteen Years," Studies in Long Term Memory, Ed. A. Kennedy and A. Wilkes, p. 3, 1975.
Espinosa J., Vandewalle J., "Fuzzy Modeling and Identification, A Guide for the User," Proceedings of the IEEE Singapore International Symposium on Control Theory and Applications 1997, p. 437, 1997.
Jang J.-S. R., "Neuro-Fuzzy Modeling: Architectures, Analyses and Applications," Ph.D. dissertation, University of California, Berkeley, 1992.
Jang J.-S. R., "Structure Determination in Fuzzy Modeling: A Fuzzy CART Approach," Proc. of IEEE International Conference on Fuzzy Systems, 1994.
Lori N., Costa Branco P.J., "Autonomous Mountain-Clustering Method Applied to Fuzzy Systems Modeling," Intelligent Engineering Systems Through Artificial Neural Networks, Smart Engineering Systems: Fuzzy Logic and Evolutionary Programming, Ed. C.H. Dagli, M. Akay, C.L. Philip Chen, B. Fernandez, and J. Ghosh, 5, p. 311, ASME Press, New York, 1995.
Pedrycz W., "Why Triangular Membership Functions?," Fuzzy Sets and Systems, 64, p. 21, 1994.
Pedrycz W., Valente de Oliveira J., "Optimization of Fuzzy Models," IEEE Trans. Syst., Man, Cybern., Part B, 26, p. 627, 1996.
Sugeno M., Yasukawa T., "A Fuzzy-Logic-Based Approach to Qualitative Modeling," IEEE Trans. on Fuzzy Systems, 1, p. 7, 1993.
Tan S., Vandewalle J., "An On-line Structural and Parametric Scheme for Fuzzy Modelling," Proc. of the 6th International Fuzzy Systems Association World Congress IFSA'95, p. 189, 1995.
Wang L.-X., Adaptive Fuzzy Systems and Control, Prentice Hall, New Jersey, 1994.
Yager R.R., Filev D., Essentials of Fuzzy Modeling and Control, John Wiley & Sons, New York, 1994.
Chapter 3
A New Approach to Acquisition of Comprehensible Fuzzy Rules

Hiroshi Ohno(1) and Takeshi Furuhashi(2)
(1) Toyota Central Research & Development Labs., Inc.
(2) Nagoya University
Abstract

We present a new approach to the acquisition of comprehensible fuzzy rules for fuzzy modeling from data using Evolutionary Programming (EP). For the accuracy of the model, it is effective to allow membership functions to overlap each other in the fuzzy model. From the viewpoint of knowledge acquisition, however, it is desirable that the model have a smaller number of membership functions with less overlapping. Considering this trade-off between the precision and the clarity of the fuzzy model, this paper presents a method for acquiring comprehensible fuzzy rules from an identified model that satisfies the desired accuracy. The approach clearly distinguishes a modeling phase and a reevaluation phase. The accurate model of the unknown system in the modeling phase is obtained by, for example, a fuzzy neural network (FNN) such as a radial basis function network, using EP. The simplified model in the reevaluation phase can mainly be used for knowledge acquisition from the unknown system. A numerical experiment was done to show the feasibility of the proposed algorithm.

Keywords: fuzzy modeling, knowledge acquisition, evolutionary programming, fuzzy neural networks, linguistic meanings
3.1 Introduction
In recent years, fuzzy modeling from data has been widely developed by many researchers for numerous applications [1]. In fuzzy modeling, the parameters of the membership functions and the fuzzy rules are generated and tested
43
44
H. Ohno & T.
Furuhashi
by an evolutionary optimization algorithm. The resulting fuzzy model thus represents the nonlinear characteristics of the unknown system. However, the model cannot always provide comprehensible fuzzy rules, since many membership functions are generated and overlap each other when identifying a precise model of the nonlinear system. If the purpose of modeling were only to capture the nonlinear relation of the unknown system, we could use neural networks. Fuzzy modeling is superior to neural networks and other nonlinear modeling techniques in the linguistic explanation of the unknown system. Comprehensibility and precision of the model are a trade-off, and the comprehensibility of the fuzzy rules has thus been sacrificed for achieving precision. Since comprehensible fuzzy rules discovered from an unknown system are useful for understanding the system, a modeling method satisfying both the precision and the clarity of the resulting model simultaneously is desirable. The conventional studies on fuzzy modeling can be categorized into two main approaches. One is to find precise fuzzy rules subject to the constraints of comprehensible fuzzy rules [2], [3]-[6], [7], [8]. The other is to reduce complexity after obtaining a precise fuzzy model [9]-[13]. To find precise fuzzy rules, the fuzzy model becomes more complex and has many membership functions; this is not suitable for knowledge acquisition. In the latter approach, the modeling performance is often degraded because the number of membership functions, which carry information available for nonlinear modeling, is reduced. Thus, from a knowledge acquisition viewpoint, it is not necessary that the simplified fuzzy model have both a higher precision and, at the same time, a better comprehensibility. In this paper we address knowledge acquisition with clear linguistic meanings for fuzzy modeling, and clearly distinguish the modeling phase and the reevaluation phase in the process of knowledge acquisition.
We propose a new approach using EP [14], which consists of the modeling phase for identification of an accurate model and the reevaluation phase for simplification of the identified model. In the approach, we introduce the degree of explanation for the acquired fuzzy model in the reevaluation phase, which constitutes the constraints on the membership functions. The constraints guarantee the linguistic meanings of the membership functions during the reevaluation phase. The degree of explanation, which is set by the user, determines the structure of knowledge in the simplified model. To generate proper knowledge of an unknown system effectively, it is necessary that the user, who is a domain expert, interacts in the process of knowledge acquisition [15]. This
A New Approach to Acquisition of Comprehensible
Fuzzy Rules
45
is to direct the process of knowledge acquisition according to the domain knowledge. The acquired fuzzy rules are reevaluated using EP to clarify their linguistic meanings, subject to the constraints on the membership functions. It is not always easy to derive the derivatives of the objective needed by learning algorithms such as gradient descent optimization. Evolutionary optimization does not require derivatives for its learning, and EP can directly handle numerical data without the use of binary coding. The approaches described in the literature [11], [12], [13] are aimed at reducing the number of membership functions or fuzzy rules in the model. These approaches do not always make interpretation and analysis of the rules easy. Fig. 3.1 shows the flow chart of our approach. The distinguishing feature of the proposed approach is the incorporation of the degree of explanation set by the user. From a neural network viewpoint, studies have been done on extracting rules or fuzzy rules from trained neural networks [16], [17]. However, the rules acquired from a neural network are very difficult to comprehend because of the distributed representation of knowledge in the connection weights and biases of the network. The proposed approach using an FNN is superior to the neural network approach in the sense of easily extracting knowledge, because it retains the structure of knowledge. This paper is organized as follows. In Section 2, our approach for knowledge acquisition from a fuzzy model is presented, and fuzzy modeling using an FNN is described. Section 3 illustrates computer experiments using a simple example from the literature to show the feasibility of the proposed method. In Section 4, a summary and conclusions complete the paper.
3.2 Proposed Algorithm

This section describes the proposed algorithm for clarifying the fuzzy rules identified by the FNN [18]. Other fuzzy modeling techniques (e.g., [19]) can also be used. The algorithm consists of a modeling phase and a reevaluation phase.
Fig. 3.1 Flow chart of our approach (input from the user: degree of explanation; output: comprehensible fuzzy rules)
3.2.1 Fuzzy modeling
In the modeling phase, we use the FNN with Gaussian-type membership functions depicted in Fig. 3.2, and identify their parameters using EP from given input-output training data. We can also use another modeling technique instead of EP, such as a gradient-descent-type method, to decide the unknown parameters. The fuzzy rules are represented as

    R_i: If x_1 is A_i1 and x_2 is A_i2 and ... and x_N is A_iN then y_i = w_i        (1)

and the model output is the weighted average

    y = Σ_{i=1}^{M} μ_i w_i / Σ_{i=1}^{M} μ_i,    μ_i = μ_Ai1(x_1) μ_Ai2(x_2) ... μ_AiN(x_N)

where M is the number of rules, N is the number of inputs, R_i (i = 1, 2, ..., M) denotes the i-th fuzzy rule, x_j (j = 1, 2, ..., N) is the input, y_i is the output of the i-th fuzzy rule, w_i is the singleton, and A_ij is the fuzzy variable with a Gaussian-type membership function. This function is given by

    μ_Aij(x_j) = exp( -(x_j - c_ij)^2 / (2 σ_ij^2) )        (2)
where c_ij and σ_ij determine the central position and the width of the membership function, respectively. c_ij, σ_ij, and w_i are the parameters to be identified by EP. The objective of EP is to reduce the squared error between the output of the fuzzy model and the desired output of the training data.
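A forward pass of this rule base can be sketched as follows. This is an illustration with made-up parameter values; the array shapes and names are mine, and the rule agreements are taken as products of the per-input Gaussian memberships:

```python
import numpy as np

def fnn_output(x, c, sigma, w):
    """Simplified fuzzy inference with Gaussian antecedents.
    x: (N,) input; c, sigma: (M, N) centres and widths; w: (M,) singletons."""
    mu = np.exp(-(x - c)**2 / (2.0 * sigma**2))   # per-input membership degrees
    h = mu.prod(axis=1)                           # rule agreements (firing strengths)
    return float(h @ w / h.sum())                 # weighted average of singletons

# Two rules over two inputs, with illustrative parameters
c = np.array([[1.0, 1.0], [3.0, 3.0]])
sigma = np.ones((2, 2))
w = np.array([0.0, 10.0])
y = fnn_output(np.array([1.0, 1.0]), c, sigma, w)
```

An input midway between the two rule centres yields the average of the two singletons, while an input at a centre is dominated by that rule's singleton.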
Fig. 3.2 Architecture of the fuzzy neural network
3.2.2 Reevaluation of fuzzy model for knowledge acquisition
In the reevaluation phase we simplify the membership functions for the clarity of the acquired knowledge by the following procedure. First, we select the type of membership function in the antecedent, such as trapezoidal or triangular with linguistic meanings, and decide their number. This is the process that introduces the user's degree of explanation into the fuzzy model. Second, we replace the original membership functions of the identified fuzzy model using EP, subject to the constraints given by the degree of explanation. The objective of EP is to reduce the squared error between the values of each original membership function at the input training data and those of the new membership function. The shapes of the new membership functions are depicted in Fig. 3.3, and their parameters are summarized in Table 3.1. Other shapes of membership function can also be used. The constraints are imposed on the shape and the order of the membership functions, as summarized in Table 3.2. The degree of explanation is defined by these constraints and the number of new membership functions. The user
interacts with the computer system for knowledge acquisition through the degree of explanation. If the constraints are not satisfied during the operation of EP, the parameters of the membership functions, which are included in a solution vector of EP, are initialized and the EP operation is continued. The solutions of EP are real-valued vectors corresponding to the unknown parameters of the fuzzy system and the adaptable standard deviations that determine the step size for each unknown parameter. Finally, we evaluate the performance of the fuzzy model replaced by the new membership functions in terms of the minimum distance between the outputs of the original membership functions and the new ones. The algorithm is summarized as follows. Given a fuzzy model such as an FNN constructed from data:
Step 1. Set the degree of explanation by selecting the number and shapes of the membership functions and determining their order.
Step 2. Create the new training data, comprising the input data of the modeling phase and the corresponding outputs of the original membership functions.
Step 3. Adapt the new membership functions to the original membership functions of the fuzzy model using EP, subject to the constraints given by the degree of explanation.
Step 4. Replace the membership functions with the new membership functions with the minimum distance, using the new training data from Step 2.
Step 5. Evaluate the new fuzzy model using the training data of the modeling phase.
In Step 5, the fuzzy rules which have the same antecedent part are merged and the singletons, w_i, among them are averaged.
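The core of Steps 2-4, fitting a constrained, linguistically shaped membership function to samples of an original one, can be sketched as below. For brevity, a plain grid search stands in for EP (the chapter uses EP), and the trapezoid parametrization with half-widths l1 < l2 is my reading of the shapes in Fig. 3.3; all names are illustrative:

```python
import numpy as np

def trapezoid(x, centre, l1, l2):
    """Trapezoidal membership: flat top of half-width l1, linear shoulders
    reaching zero at half-width l2 (requires 0 < l1 < l2)."""
    d = np.abs(x - centre)
    return np.clip((l2 - d) / (l2 - l1), 0.0, 1.0)

def fit_trapezoid(x_data, mu_orig, centres, l1s, l2s):
    """Grid-search stand-in for the EP step: minimize the squared error
    between the original memberships and the trapezoid, subject to l1 < l2."""
    best, best_err = None, np.inf
    for c in centres:
        for l1 in l1s:
            for l2 in l2s:
                if not 0.0 < l1 < l2:        # the chapter's shape constraint
                    continue
                err = np.sum((trapezoid(x_data, c, l1, l2) - mu_orig)**2)
                if err < best_err:
                    best, best_err = (c, l1, l2), err
    return best, best_err

x = np.linspace(1.0, 5.0, 41)
mu = np.exp(-(x - 3.0)**2 / 0.8)             # an "original" Gaussian MF
params, err = fit_trapezoid(x, mu, np.linspace(2.5, 3.5, 11),
                            np.linspace(0.1, 1.0, 10), np.linspace(0.5, 2.0, 16))
```

EP would explore the same constrained parameter space stochastically, with self-adapted step sizes, rather than exhaustively.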
Table 3.1 Parameters of membership functions

Shape   Parameters
A       x, l
B       x, l1, l2
C       x, l1, l2
D       x, l1, l2
Fig. 3.3
Shape of membership functions
Table 3.2 Conditions of the constraints

        Shape                      Order
        0 < l for A                C-B-D
        0 < l1 < l2 for B, C, D    C-A-B-A-D
3.3 Computer Experiments
We used a simple example to demonstrate the proposed algorithm. A three-input, single-output function was chosen as the unknown system for the modeling phase. The function is

    y = (1 + x_1^0.5 + x_2^-1 + x_3^-1.5)^2        (3)
Forty input-output training data were randomly generated using this function, as shown in Table 3.3. Twenty data were used in the modeling phase. The remaining data were used for testing the generalization of the model. The accuracy of the fuzzy model was evaluated using the following performance index:

    E = (1/L) Σ_{i=1}^{L} |y_i - y_i*| / y_i × 100 (%)        (4)

where L denotes the number of data, y_i denotes the i-th datum from Eq. (3), and y_i* is the output of the fuzzy model. For EP, the population size, that is, the number of solutions, was set at 100.
3.3.1 Modeling results

Table 3.3 Forty input-output training data (Data 1-20 were used in the modeling phase; Data 21-40 for testing)

We set the initial random values of the centers c in [0, 5]. For EP, the population size, that is, the number of solutions, was selected to be 100. Figs. 3.4, 3.5, and 3.6 show the membership functions corresponding to each input variable after learning; ten membership functions were obtained. In Table 3.4, the performance indexes of the resulting model are shown. The experimental result had little comprehensibility because the membership functions were heavily overlapped.
Fig. 3.4 Membership functions of input variable (x1)
Fig. 3.5 Membership functions of input variable (x2)
3.3.2 Reevaluation results
We examined the case where the degree of explanation was defined as follows: the numbers of membership functions were A = 0, B = 1, C = 1, and D = 1 for each input variable; the order of these membership functions was set from left to right as "C-B-D". The linguistic meanings in this case
Fig. 3.6 Membership functions of input variable (x3)
Table 3.4 Performance index E (%)

Training data (Data 1-20)   Unknown data (Data 21-40)
11.46                       12.31
were interpreted as: C is "small," B is "medium," and D is "big." Table 3.5 shows the initial random values of the parameters of the membership functions assigned to B, C, and D. These parameter values were determined empirically, after trial-and-error experiments, to prevent the reevaluation phase from getting stuck at a local minimum. The minimum and maximum values of the input variables were 1 and 5, respectively. After 5000 generations, we obtained the membership functions shown in Figs. 3.7, 3.8, and 3.9. In these figures, the membership functions are labeled "C," "B," and "D." In Table 3.6, the performance indexes are shown. The performance indexes were degraded from those of the original fuzzy model shown in Table 3.4. This degradation can be regarded as the explanation cost paid for comprehensibility. In this case, the membership functions became more comprehensible than the original ones shown in Figs. 3.4, 3.5, and 3.6. If the objective of fuzzy modeling is to acquire a precise model, the original model can be used. From these figures, it is seen that the new membership functions are
Table 3.5 Initial values of the membership functions

B:  x ∈ [2.8, 3.2],  l1 ∈ [0.3, 0.7],  l2 ∈ [0.8, 1.2]
C:  x ∈ [0.8, 1.2],  l1 ∈ [0.3, 0.7],  l2 ∈ [0.8, 1.2]
D:  x ∈ [4.8, 5.2],  l1 ∈ [0.3, 0.7],  l2 ∈ [0.8, 1.2]
not weakly consistent [20]. From the linguistic-meaning point of view, it is desirable for membership functions to satisfy weak consistency or consistency. Therefore, considering this point, new constraints imposed on the membership functions are needed. Table 3.7 shows the fuzzy rules of the new, simplified fuzzy model. In the table, the last column is the value of the singleton, w_i. The number of fuzzy rules decreased from 10 to 9.
Fig. 3.7 Membership functions of input variable (x1)
3.4 Conclusions

In this paper, a new approach to the acquisition of comprehensible fuzzy rules from an FNN constructed from data was proposed, and its feasibility
Fig. 3.8 Membership functions of input variable (x2)
Fig. 3.9 Membership functions of input variable (x3)
was demonstrated through computer experiments. The proposed algorithm using EP consists of two phases: modeling and reevaluation. In the reevaluation phase, we can control the degree of explanation for the knowledge acquisition through the constraints. This is the feature that distinguishes the proposed approach from the conventional approaches.
Table 3.6 Performance index E (%)

Training data   Unknown data
52.79           54.96
Table 3.7 Fuzzy rules of the new fuzzy model

Number   x1       x2       x3       y
1        small    small    medium   22.690
2        small    medium   small    5.437
3        small    big      small    1.920
4        medium   small    big      3.532
5        medium   big      medium   0.116
6        big      small    small    16.271
7        big      small    medium   0.144
8        big      big      small    21.760
9        big      big      big      3.102
Future work is to improve the performance index after the reevaluation phase and to apply this method to practical applications. In the reevaluation phase, the experimental results reveal that new constraints on the membership functions are needed for linguistic meanings. Moreover, introducing another measure of distance between the original and new membership functions may improve the performance in the reevaluation phase.
References

[1] J. C. Bezdek, "Editorial: Fuzzy Models - What Are They, and Why?," IEEE Trans. Fuzzy Syst., Vol. 1, pp. 1-6, 1993.
[2] S. Horikawa, T. Furuhashi, S. Okuma, and Y. Uchikawa, "A Fuzzy Controller Using a Neural Network and its Capability to Learn Expert's Control Rules," in Proc. of Int'l Conf. on Fuzzy Logic & Neural Networks (IIZUKA'90), pp. 103-106, 1990.
[3] J.-S. R. Jang, "Fuzzy Modeling Using Generalized Neural Networks and Kalman Filter Algorithm," in Proc. of Ninth National Conf. on Artificial Intelligence (AAAI-91), pp. 762-767, 1991.
[4] J.-S. R. Jang, "Self-Learning Fuzzy Controllers Based on Temporal Back-Propagation," IEEE Trans. Neural Networks, vol. 3, no. 5, pp. 714-723, 1992.
[5] T. Hasegawa, S. Horikawa, T. Furuhashi, et al., "A Study on Fuzzy Modeling of BOF Using a Fuzzy Neural Network," in Proc. of the 2nd Int'l Conf. on Fuzzy Logic & Neural Networks (IIZUKA'92), pp. 1061-1064, 1992.
[6] S. Nakayama, T. Furuhashi, and Y. Uchikawa, "A Proposal of Hierarchical Fuzzy Modeling Method," Journal of Japan Society for Fuzzy Theory and Systems, vol. 5, no. 5, pp. 1155-1168, 1993.
[7] K. Shimojima, T. Fukuda, and Y. Hasegawa, "Self-tuning Fuzzy Modeling with Adaptive Membership Function, Rules, and Hierarchical Structure Based on Genetic Algorithm," Fuzzy Sets and Systems, vol. 71, no. 3, pp. 295-309, 1995.
[8] S. Matsushita, A. Kuromiya, M. Yamaoka, T. Furuhashi, and Y. Uchikawa, "Determination of Antecedent Structure of Fuzzy Modeling Using Genetic Algorithm," in Proc. of 1996 IEEE Int'l Conf. on Evolutionary Computation (ICEC'96), pp. 235-238, 1996.
[9] R. R. Yager and D. P. Filev, "Unified Structure and Parameter Identification of Fuzzy Models," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 4, pp. 1198-1205, 1993.
[10] B. G. Song, R. J. Marks II, S. Oh, P. Arabshahi, T. P. Caudell, and J. J. Choi, "Adaptive Membership Function Fusion and Annihilation in Fuzzy If-Then Rules," in Proc. Second IEEE Int. Conf. Fuzzy Syst., pp. 961-967, 1993.
[11] C. T. Chao, Y. J. Chen, and C. C. Teng, "Simplification of Fuzzy-Neural Systems Using Similarity Analysis," IEEE Trans. Syst., Man, Cybern., vol. 26, no. 2, pp. 344-354, 1996.
[12] R. Babuska, M. Setnes, U. Kaymak, and H. R. van Nauta Lemke, "Rule Base Simplification with Similarity Measures," in Proc. Fifth IEEE Int. Conf. Fuzzy Syst., pp. 1642-1647, 1996.
[13] J. Yen and L. Wang, "An SVD-Based Fuzzy Model Reduction Strategy," in Proc. Fifth IEEE Int. Conf. Fuzzy Syst., pp. 835-841, 1996.
[14] N. Saravanan and D. B. Fogel, "Evolving Neural Control Systems," IEEE EXPERT, 10(3), pp. 23-27, 1995.
[15] Y. Nakamori, "Development and Application of an Interactive Modeling Support System," Automatica, vol. 25, no. 2, pp. 185-206, 1989.
[16] J. Diederich, "Explanation and Artificial Neural Networks," Int. J. Man-Machine Studies, Vol. 37, pp. 335-355, 1992.
[17] S. H. Huang and M. R. Endsley, "Providing Understanding of the Behavior of Feedforward Neural Networks," IEEE Trans. Syst., Man, Cybern., Vol. 27, No. 3, pp. 465-474, 1997.
[18] S. Horikawa, T. Furuhashi, and Y. Uchikawa, "On Fuzzy Modeling Using Fuzzy Neural Networks with the Back-Propagation Algorithm," IEEE Trans. on Neural Networks, vol. 3, no. 5, pp. 801-806, 1992.
[19] M. Sugeno and T. Yasukawa, "A Fuzzy-Logic-Based Approach to Qualitative Modeling," IEEE Trans. Fuzzy Syst., Vol. 1, pp. 7-31, 1993.
[20] X.-J. Zeng and M. G. Singh, "Approximation Accuracy Analysis of Fuzzy Systems as Function Approximators," IEEE Trans. on Fuzzy Systems, vol. 4, no. 1, pp. 44-63, 1996.
Chapter 4
Fuzzy Rule Generation with Fuzzy Singleton-Type Reasoning Method

Yan Shi(1) and Masaharu Mizumoto(2)
(1) Kyushu Tokai University
(2) Osaka Electro-Communication University
Abstract

By means of the fuzzy singleton-type reasoning method, we propose a self-tuning method for fuzzy rule generation. In this tuning approach, we first use a learning algorithm for tuning fuzzy rules under the fuzzy singleton-type reasoning method, and then roughly design the initial tuning parameters of the fuzzy rules based on a fuzzy clustering algorithm. By this approach, the learning time can be reduced and the generated fuzzy rules are reasonable and suitable for the identified system model. Finally, we show the efficiency of the employed method by identifying nonlinear functions.

Keywords: fuzzy singleton-type reasoning method, fuzzy rule generation, neuro-fuzzy learning algorithm, fuzzy c-means clustering algorithm
4.1 Introduction

In the fuzzy singleton-type reasoning method by Mizumoto [10,11], which can adjust the weights of fuzzy rules, the fuzzy inference conclusion can be well improved because of the flexibility of the method. It also shows better fuzzy control results than simplified fuzzy reasoning [10,11]. As with other fuzzy reasoning methods, it is necessary and important to design the fuzzy rules of the fuzzy singleton-type reasoning method for a practical problem, in the case when the construction of a fuzzy system model is difficult for human beings [5-8, 13-14, 16-21]. For this purpose, a learning algorithm for tuning the real numbers and weights of the consequent parts has been proposed in [10], using the gradient descent method [15], where the membership functions of the antecedent parts are of triangular type. Furthermore, for the case of one input and one output, another, so-called self-generating, learning algorithm for fuzzy rules has
59
60
Y. Shi & M.
Mizumoto
been provided for the fuzzy singleton-type reasoning method [12], which tunes the centers of the triangular membership functions of the antecedent parts, and the real numbers and weights of the consequent parts, based on the gradient descent method. However, the above two tuning methods lack generality for a multiple-input fuzzy system model. Also, as is well known for all neuro-fuzzy learning algorithms [3, 5-8, 13-14, 16-21], it has not been fully investigated how to arrange suitable initial values of the tuning parameters (centers and widths of the membership functions of the antecedent parts, real numbers of the consequent parts and their weights) before learning them. In this article, we propose a new self-tuning method for fuzzy rule generation based on the fuzzy singleton-type reasoning method. In this approach, we first give a so-called neuro-fuzzy learning algorithm for tuning fuzzy rules under fuzzy singleton-type reasoning, and then roughly design the initial tuning parameters of the fuzzy rules by using a fuzzy clustering algorithm, before learning a fuzzy model. By this approach, the learning time can be reduced and the generated fuzzy rules are reasonable and suitable for the identified system model. Moreover, the potential of the proposed technique is illustrated by identifying nonlinear functions.

4.2 Fuzzy Singleton-Type Reasoning Method (FSTRM)

We first briefly review the fuzzy singleton-type reasoning method by Mizumoto [10,11], in which the fuzzy model has m input linguistic variables (x_1, x_2, ..., x_m) and one output variable y. For convenience of representation, in the sequel we denote the fuzzy singleton-type reasoning method as FSTRM. Usually, a fuzzy model with m linguistic variables (x_1, x_2, ..., x_m) and one output variable y can be expressed by FSTRM in the form of "If ... then ... with ..." fuzzy inference rules as follows [10,11]:

Rule 1: If x_1 is A_11 and x_2 is A_21 and ... and x_m is A_m1 then y_1 with w_1
...
Rule i: If x_1 is A_1i and x_2 is A_2i and ... and x_m is A_mi then y_i with w_i
...
Rule n: If x_1 is A_1n and x_2 is A_2n and ... and x_m is A_mn then y_n with w_n        (1)
Fuzzy Rule Generation with Fuzzy Singleton . . .
61
where A_ji (j = 1, 2, ..., m; i = 1, 2, ..., n) is a fuzzy subset for the input linguistic variable x_j, y_i is a real number for the output variable y, and w_i is the weight corresponding to the i-th fuzzy rule; n denotes the number of fuzzy rules. When an observation (x_1, x_2, ..., x_m) is given, a fuzzy inference consequence y can be obtained by FSTRM in the following way [10,11]:

    h_i = A_1i(x_1) A_2i(x_2) ... A_mi(x_m)        (2)

    y = Σ_i h_i w_i y_i / Σ_i h_i w_i        (3)

where h_i (i = 1, 2, ..., n) is the agreement of the antecedent of the i-th fuzzy rule at (x_1, x_2, ..., x_m). As a simple explanation of FSTRM, Fig. 4.1 shows the process of fuzzy inference using the fuzzy singleton-type reasoning method, where m = 2 and n = 2 in (1) [10]. It has been shown that better fuzzy control results can be obtained by FSTRM than by the simplified fuzzy reasoning method, which implies that FSTRM is a powerful tool for fuzzy logic applications [10-12].
Fig. 4.1  Explanation of the fuzzy singleton-type reasoning method.
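The inference of Eqs. (2)-(3) is straightforward to state in code. The following is a minimal sketch (not from the chapter; the function name and the Gaussian antecedent shape of Eq. (5) are our assumptions):

```python
import numpy as np

def fstrm_infer(x, a, b, y, w):
    """Fuzzy singleton-type reasoning with Gaussian antecedents.

    x: (m,) observation; a, b: (m, n) centers/widths of A_ji;
    y: (n,) consequent real numbers; w: (n,) rule weights.
    Implements h_i = prod_j A_ji(x_j) (Eq. 2) and the weighted
    average y = sum(h_i w_i y_i) / sum(h_i w_i) (Eq. 3).
    """
    A = np.exp(-((x[:, None] - a) ** 2) / b)   # A_ji(x_j), shape (m, n)
    h = A.prod(axis=0)                         # agreements h_i
    return float((h * w * y).sum() / (h * w).sum())

# Two rules over one input, with centers 0.2 and 0.8: near x = 0.2
# the first rule dominates, so the output is close to y_1 = 1.0.
out = fstrm_infer(np.array([0.2]),
                  a=np.array([[0.2, 0.8]]), b=np.array([[0.05, 0.05]]),
                  y=np.array([1.0, 2.0]), w=np.array([1.0, 1.0]))
```

Because the output is a validity-weighted average of the singletons y_i, it interpolates smoothly between rule consequents as x moves between centers.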
Y. Shi & M. Mizumoto

4.3 Learning Algorithm of Fuzzy Rule Generation with FSTRM
For given training input-output data (x_1, ..., x_m; y*) of a fuzzy system model, we have the following objective function E for evaluating the error between y* and y:

E = (y − y*)² / 2    (4)

where y* is the desired output value and y is the corresponding fuzzy inference result. To minimize the objective function E, a new neuro-fuzzy learning algorithm for tuning fuzzy rules under FSTRM is developed as follows.

Gaussian-type neuro-fuzzy learning algorithm by FSTRM

Firstly, we give a neuro-fuzzy learning algorithm in which the membership functions of the antecedent parts of the fuzzy rules are of Gaussian type, as shown in Fig. 4.2.
Fig. 4.2  Gaussian-type membership functions for input variable x_j.
Let fuzzy subsets A_ji (j=1,2,...,m; i=1,2,...,n) be of Gaussian type as follows [6]:

A_ji(x_j) = exp( −(x_j − a_ji)² / b_ji )    (5)

where a_ji is the center of A_ji and b_ji is the width of A_ji. By (1)-(3), a neuro-fuzzy learning algorithm under FSTRM for updating the parameters a_ji, b_ji, y_i and w_i (j=1,2,...,m; i=1,2,...,n) is formulated based on the gradient descent method [15] as follows:
a_ji(t+1) = a_ji(t) − α ∂E/∂a_ji(t)
          = a_ji(t) + α (y* − y)(y_i − y) w_i h_i · 2(x_j − a_ji) / ( b_ji Σ_{l=1}^{n} h_l w_l )    (6)

b_ji(t+1) = b_ji(t) − β ∂E/∂b_ji(t)
          = b_ji(t) + β (y* − y)(y_i − y) w_i h_i (x_j − a_ji)² / ( b_ji² Σ_{l=1}^{n} h_l w_l )    (7)

y_i(t+1) = y_i(t) − γ ∂E/∂y_i(t)
         = y_i(t) + γ (y* − y) h_i w_i / Σ_{l=1}^{n} h_l w_l    (8)

w_i(t+1) = w_i(t) − θ ∂E/∂w_i(t)
         = w_i(t) + θ (y* − y)(y_i − y) h_i / Σ_{l=1}^{n} h_l w_l    (9)

where α, β, γ and θ are the learning rates, and t is the learning iteration.
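One update step of (6)-(9) can be sketched as follows, vectorized over all rules (the interface and names are ours, not the chapter's):

```python
import numpy as np

def fstrm_update(x, y_star, a, b, y, w,
                 alpha=0.05, beta=0.05, gamma=0.05, theta=0.05):
    """One gradient-descent step of Eqs. (6)-(9) for Gaussian antecedents.
    a, b have shape (m, n); y, w have shape (n,). Returns the updated
    parameters and the inference output computed before the update."""
    A = np.exp(-((x[:, None] - a) ** 2) / b)   # A_ji(x_j), Eq. (5)
    h = A.prod(axis=0)                         # agreements h_i
    S = (h * w).sum()
    y_hat = (h * w * y).sum() / S              # Eq. (3)
    e = y_star - y_hat                         # common factor (y* - y)
    g = e * (y - y_hat) * w * h / S            # shared factor of (6)-(7)
    da = alpha * g * 2.0 * (x[:, None] - a) / b       # Eq. (6)
    db = beta * g * (x[:, None] - a) ** 2 / b ** 2    # Eq. (7)
    dy = gamma * e * h * w / S                        # Eq. (8)
    dw = theta * e * (y - y_hat) * h / S              # Eq. (9)
    return a + da, b + db, y + dy, w + dw, y_hat
```

Calling this repeatedly over the training pairs drives the objective E of Eq. (4) down, since each increment is a step of steepest descent.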
Triangular-type neuro-fuzzy learning algorithm by FSTRM
Fig. 4.3  Triangular-type membership functions for input variable x_j.
Next, we give another neuro-fuzzy learning algorithm, in which the membership functions of the antecedent parts of the fuzzy rules are of triangular type, as shown in Fig. 4.3. Let fuzzy subsets A_ji (j=1,2,...,m; i=1,2,...,n) be of triangular type as follows:

A_ji(x_j) = 1 − 2|x_j − a_ji| / b_ji,   a_ji − b_ji/2 ≤ x_j ≤ a_ji + b_ji/2;
A_ji(x_j) = 0,   otherwise    (10)
where a_ji is the center of A_ji and b_ji is the width of A_ji. By (1)-(3), a neuro-fuzzy learning algorithm under FSTRM for updating the parameters a_ji, b_ji, y_i and w_i (j=1,2,...,m; i=1,2,...,n) is given based on the gradient descent method [15] as follows:

a_ji(t+1) = a_ji(t) − α ∂E/∂a_ji(t)
          = a_ji(t) + (2α / b_ji) (y* − y)(y_i − y) w_i sgn(x_j − a_ji) Π_{k≠j} A_ki(x_k) / Σ_{l=1}^{n} h_l w_l    (11)

b_ji(t+1) = b_ji(t) − β ∂E/∂b_ji(t)
          = b_ji(t) + (2β / b_ji²) (y* − y)(y_i − y) w_i |x_j − a_ji| Π_{k≠j} A_ki(x_k) / Σ_{l=1}^{n} h_l w_l    (12)

y_i(t+1) = y_i(t) − γ ∂E/∂y_i(t)
         = y_i(t) + γ (y* − y) h_i w_i / Σ_{l=1}^{n} h_l w_l    (13)

w_i(t+1) = w_i(t) − θ ∂E/∂w_i(t)
         = w_i(t) + θ (y* − y)(y_i − y) h_i / Σ_{l=1}^{n} h_l w_l    (14)

where in (11) and (12) k ≠ j means k = 1, ..., j−1, j+1, ..., m, and sgn is a sign function as follows:

sgn(x) = −1, x < 0;  0, x = 0;  1, x > 0.
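The triangular membership of Eq. (10) is a one-liner; a hypothetical helper (NumPy's `np.sign` matches sgn exactly, including sgn(0) = 0):

```python
import numpy as np

def tri(x, a, b):
    """Triangular membership of Eq. (10): A(x) = 1 - 2|x - a|/b on
    [a - b/2, a + b/2] and 0 elsewhere (a = center, b = width)."""
    return np.maximum(0.0, 1.0 - 2.0 * np.abs(x - a) / b)

# Halfway between the peak and the support edge the membership is 0.5:
m = tri(0.6, 0.5, 0.4)   # ≈ 0.5
```

The clamp to zero outside the support is what makes the Π_{k≠j} A_ki factor in (11)-(12) vanish for rules whose antecedents are not touched by the observation, so only locally active rules are tuned.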
4.4 Designing Initial Tuning Parameters of Fuzzy Rules based on a Fuzzy Clustering Algorithm (FCM)

Now we are in a position to design the initial tuning parameters of the fuzzy rules in the neuro-fuzzy learning algorithm (6)-(9) or (11)-(14) by using FCM, which is described briefly as follows [1,2]. Assume (x_1, x_2, ..., x_m) to be variables on the input space X = X_1 × X_2 × ... × X_m, y to be a variable on the output space Y, and U ∈ R_{n×c} to be an n × c fuzzy partition matrix for given n training data x_k = (x_1k, x_2k, ..., x_mk, y_k*) (k=1,2,...,n), where c is the number of clusters. Let μ_ki ∈ U be the membership value of the k-th vector x_k in the i-th cluster center vector v_i = (v_1^i, v_2^i, ..., v_{m+1}^i) ∈ R^{m+1} (i=1,2,...,c; 2 ≤ c < n), satisfying the following restrictions:

0 ≤ μ_ki ≤ 1,   ∀k, i    (15)

0 < Σ_{k=1}^{n} μ_ki < n,   ∀i    (16)
Σ_{i=1}^{c} μ_ki = 1,   ∀k    (17)
An objective function J_s of FCM for solving v_i is defined as follows:

J_s(U, v_1, v_2, ..., v_c) = Σ_{k=1}^{n} Σ_{i=1}^{c} (μ_ki)^s ||x_k − v_i||²    (18)
where s (1 < s < ∞) is a smoothing weight and ||·|| is the inner product norm. To minimize J_s, the FCM algorithm is performed by the following procedure:

Step 1. Give parameters c and s such that 2 ≤ c < n, 1 < s < ∞, and the inner product norm ||·||. We can take the norm as

||x_k − v_i||²_A = (x_k − v_i)^T A (x_k − v_i)    (19)
where A is a positive definite matrix.

Step 2. Give an initial matrix U(0) of fuzzy partitions randomly, subject to the conditions (15)-(17).

Step 3. For t = 0, 1, 2, ..., calculate the cluster centers v_i (i=1,2,...,c) by using U(t) as follows:

v_i = Σ_{k=1}^{n} (μ_ki)^s x_k / Σ_{k=1}^{n} (μ_ki)^s    (20)
Step 4. Update U(t) in the following way:

(1) Calculate the sets I_k and Ī_k (k=1,2,...,n) as

I_k = { i | 1 ≤ i ≤ c, ||x_k − v_i|| = 0 }    (21)

Ī_k = {1, 2, ..., c} − I_k    (22)
(2)(a): When I_k = ∅,

μ_ki = 1 / Σ_{l=1}^{c} ( ||x_k − v_i|| / ||x_k − v_l|| )^{2/(s−1)}    (23)

(2)(b): When I_k ≠ ∅,

μ_ki = 0,   ∀i ∈ Ī_k    (24)

Σ_{i ∈ I_k} μ_ki = 1    (25)
Step 5. Define a suitable matrix norm, and stop the FCM process if the following condition holds:

||U(t) − U(t+1)|| < ε    (26)

otherwise, set t = t+1 and return to Step 3; here ε is a sufficiently small positive number serving as the termination criterion. In this study, we take the matrix norm as [2]

||U(t) − U(t+1)|| = max_{k,i} | μ_ki(t) − μ_ki(t+1) |    (27)
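Steps 1-5 can be condensed into a short routine. The sketch below assumes A is the identity matrix (so ||·||_A is the Euclidean norm) and clips zero distances instead of handling the I_k ≠ ∅ branch explicitly; the function name and interface are ours:

```python
import numpy as np

def fcm(X, c, s=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy c-means over Eqs. (17)-(27). X: (n, d) data; returns the
    fuzzy partition U (n, c) and the cluster centers V (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # rows obey Eq. (17)
    for _ in range(max_iter):
        Us = U ** s
        V = Us.T @ X / Us.sum(axis=0)[:, None]    # centers, Eq. (20)
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                  # sidestep the I_k branch
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :])
                       ** (2.0 / (s - 1.0))).sum(axis=2)   # Eq. (23)
        done = np.max(np.abs(U - U_new)) < eps    # Eqs. (26)-(27)
        U = U_new
        if done:
            break
    return U, V
```

With s = 2, Eq. (23) reduces to inverse-squared-distance weighting, which is the setting used in the numerical examples below.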
In the sequel we discuss how to arrange suitable initial values of the tuning parameters of the fuzzy rules (centers and widths of the membership functions of the antecedent parts, real numbers of the consequent parts and their weights) in the learning algorithm (6)-(9) or (11)-(14). For given n training data x_k = (x_1k, x_2k, ..., x_mk, y_k*) (k=1,2,...,n), we can create the cluster centers v_i = (v_1^i, v_2^i, ..., v_{m+1}^i) (i=1,2,...,c) and the matrix U = (μ_ki)_{n×c} by the FCM process. First, we define the importance ŵ_i of each cluster center v_i (i=1,2,...,c) as
ŵ_i = 1 / Σ_{k=1}^{n} d_ki    (28)

or, more generally,

ŵ_i = exp( −Σ_{k=1}^{n} d_ki )    (29)
where d_ki = ||x_k − v_i|| (k=1,2,...,n; i=1,2,...,c) is the distance from the point x_k to the cluster center v_i. Then, by means of the idea of the fuzzy c-shell clustering algorithm [4], we can define a vector r_i = (r_1^i, r_2^i, ..., r_m^i, r_{m+1}^i) (i=1,2,...,c), where r_j^i stands for the length of the cluster center v_i in the j-th axis:

r_j^i = Σ_{k=1}^{n} (μ_ki)^s | x_jk − v_j^i | / Σ_{k=1}^{n} (μ_ki)^s    (30)
Next, we identify the cluster center v_i with the initial tuning parameter vector (a_1i(0), a_2i(0), ..., a_mi(0), y_i(0)) in (6) and (8) or in (11) and (13) as

(a_1i(0), a_2i(0), ..., a_mi(0), y_i(0)) = v_i    (31)

Namely, we have

a_ji(0) = v_j^i,   j=1,2,...,m; i=1,2,...,c    (32)

y_i(0) = v_{m+1}^i,   i=1,2,...,c    (33)

Furthermore, we relate r_j^i (j=1,2,...,m; i=1,2,...,c) to the initial tuning parameters b_ji(0) in (7) or in (12) as

b_ji(0) = k r_j^i,   j=1,2,...,m; i=1,2,...,c    (34)
where k stands for an adjusting constant, depending on the type of the membership functions. Fig. 4.4 gives a simple explanation of (32) and (34) for m = 2. Finally, the weight w_i(0) of the i-th fuzzy rule in (9) or in (14) can be calculated by using (28) or (29) as

w_i(0) = ŵ_i / max_l { ŵ_l }    (35)
Note that the number of cluster centers coincides with the number of fuzzy rules in (6)-(9) or in (11)-(14). Hence, we can obtain the initial tuning parameters as (32)-(35) before the learning of the fuzzy rules.
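The mapping (32)-(35) from FCM output to initial rule parameters amounts to a reshape plus a normalization. A sketch (the function name and the (m, c) array layout are our choices):

```python
import numpy as np

def init_rules_from_clusters(V, R, importance, k=1.0):
    """Map FCM output to initial FSTRM parameters, Eqs. (32)-(35).

    V: (c, m+1) cluster centers, last column on the output axis;
    R: (c, m) per-axis lengths r_j^i from Eq. (30);
    importance: (c,) cluster importances from Eq. (28) or (29);
    k: adjusting constant of Eq. (34).
    """
    a0 = V[:, :-1].T                    # a_ji(0) = v_j^i, Eq. (32); (m, c)
    y0 = V[:, -1].copy()                # y_i(0) = v_{m+1}^i, Eq. (33)
    b0 = k * R.T                        # b_ji(0) = k r_j^i, Eq. (34)
    w0 = importance / importance.max()  # w_i(0), Eq. (35)
    return a0, b0, y0, w0
```

Each cluster thus becomes one fuzzy rule: its center fixes the antecedent centers and the consequent singleton, its axis lengths fix the widths, and its importance fixes the rule weight.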
Fig. 4.4  Initial membership functions of antecedent parts by cluster centers.

4.5 Numerical Examples
In the sequel, we adopt only the Gaussian-type neuro-fuzzy learning algorithm for the system identification. The problems may be solved similarly by the Triangular-type neuro-fuzzy learning algorithm.
First, the proposed method is applied to the following nonlinear function with one input variable and one output variable.

Example 1:

y = 0.3 + 0.9x / (1.2x³ + x + 0.3) + η    (36)
where x ∈ [0,1] is an input variable and η ∈ [0, 0.15] is randomly created noise. We assume that n = 30, c = 3, s = 2 and A = the unit matrix in Step 1, and ε = 0.0001 in Step 5, respectively, and 30 input-output training data, chosen randomly, are used. The FCM process obtains 3 cluster centers after 18 iterations, as shown in Fig. 4.5. Then, from (32)-(35), the initial parameters of the fuzzy rules can be set as shown in Table 4.1. Fig. 4.6 shows the desired model of (36) and the corresponding fuzzy model for 30 checking input data given randomly, based on the fuzzy rules of Table 4.1. In this case, the mean square error is 0.004071 and the maximum absolute error is 0.189695. Moreover, by using the learning algorithm (6)-(9), we tuned the initial parameters of the fuzzy rules of Table 4.1, where the learning rates are set as α = β = γ = θ = 0.01 and the threshold that stops the learning process is 0.002. Then, the fuzzy rules for identifying (36) are generated after 43 iterations, as shown in Table 4.2. Here we employed the 30 training data of the learning process given in Fig. 4.5.
Fig. 4.5  Input-output data and cluster centers for (36).
Fig. 4.6  Desired model for (36) and fuzzy model by Table 4.1.
Table 4.1  Initial parameters of fuzzy rules obtained by FCM for (36).

No. | a_1i(0) | b_1i(0) | y_i(0) | w_i(0)
1   | 0.2065  | 0.0798  | 0.7085 | 0.6956
2   | 0.5218  | 0.0748  | 0.8588 | 1.0000
3   | 0.8148  | 0.0930  | 0.7981 | 0.6505
Table 4.2  Fuzzy rules generated by the proposed method to identify (36).

No. | a_1i   | b_1i   | y_i    | w_i
1   | 0.1489 | 0.0719 | 0.5830 | 0.4560
2   | 0.4560 | 0.1357 | 0.8741 | 1.0073
3   | 0.8198 | 0.0363 | 0.7793 | 0.6497
When the fuzzy inference is performed again for the former 30 checking input data by using the fuzzy rules of Table 4.2, a better approximation result is obtained, as shown in Fig. 4.7. In this case, the mean square error is 0.0016 and the maximum absolute error is 0.083778.
Fig. 4.7  Desired model for (36) and fuzzy model by Table 4.2.
Next, we compare the proposed method with the direct method (6)-(9) (namely, tuning fuzzy rules without the FCM process) by means of the following nonlinear function.

Example 2:

y = { 4 sin(πx_1) + 2 cos(πx_2) } / 12 + 0.45 + η    (37)
where x_1, x_2 ∈ [−1,1] are input variables and η ∈ [0, 0.05] is randomly created noise. As in Example 1, we assume that n = 100, c = 16, s = 2 and A = the unit matrix in Step 1, and ε = 0.0001 in Step 5, and 100 input-output data, chosen randomly, are employed. Then, by using the proposed method, we can obtain 16 elliptical-type cluster centers, with lengths corresponding to the linguistic input variables x_1 and x_2, after 22 iterations, as shown in Fig. 4.8. By (32)-(35), we can transform them into the initial parameters of fuzzy rules shown in Table 4.3. On the other hand, we can set another kind of initial parameters of fuzzy rules directly, as in [10-12,19], shown in Table 4.4. Furthermore, we tuned the above two kinds of initial fuzzy rules of Tables 4.3 and 4.4 by using the learning algorithm (6)-(9), respectively, where 40 training data are taken randomly in each learning process. Table 4.5 shows the iterations of the learning, the error of evaluation and the maximum absolute error on the checking data to identify (37) by the
direct method (A) and the proposed method (B), respectively. Here, the learning rates are taken as α = β = γ = θ = 0.1, and the threshold is 0.0001; the error of evaluation is taken as the mean square error on the checking data, and 2601 checking data (x_1, x_2) were employed, equally spaced from (−1,−1) to (1,1).
Fig. 4.8  Input data and elliptical-type cluster centers for (37).

Table 4.3  Initial parameters of fuzzy rules for (37) by the proposed method.
No. | a_1i(0) | b_1i(0) | a_2i(0) | b_2i(0) | y_i(0) | w_i(0)
1   | 0.6181 | 0.1303 | 0.0320 | 0.1397 | 0.3497 | 0.9497
2   | 0.2903 | 0.1234 | 0.6840 | 0.1404 | 0.1299 | 0.9874
3   | 0.8647 | 0.1727 | 0.8611 | 0.1503 | 0.4511 | 0.5554
4   | 0.1797 | 0.1143 | 0.8318 | 0.1274 | 0.5001 | 0.6824
5   | 0.7102 | 0.1043 | 0.8132 | 0.1646 | 0.0968 | 0.8505
6   | 0.0781 | 0.0788 | 0.6529 | 0.1689 | 0.3252 | 1.0000
7   | 0.8912 | 0.1108 | 0.7603 | 0.1435 | 0.4693 | 0.7298
8   | 0.4792 | 0.1533 | 0.7728 | 0.1780 | 0.0423 | 0.7072
9   | 0.2408 | 0.1336 | 0.0087 | 0.1555 | 0.8563 | 0.7936
10  | 0.3148 | 0.1631 | 0.3675 | 0.1901 | 0.2809 | 0.8155
11  | 0.3867 | 0.1236 | 0.6080 | 0.1253 | 0.7194 | 0.8419
12  | 0.8256 | 0.1025 | 0.5660 | 0.1192 | 0.6016 | 0.8413
13  | 0.5043 | 0.1557 | 0.3015 | 0.1603 | 0.8926 | 0.7970
14  | 0.6259 | 0.1631 | 0.2328 | 0.1778 | 0.8603 | 0.8128
15  | 0.7490 | 0.1770 | 0.6890 | 0.1511 | 0.1748 | 0.6803
16  | 0.7701 | 0.1466 | 0.3639 | 0.1486 | 0.3364 | 0.7489
Table 4.4  Initial parameters of fuzzy rules for (37) by the direct method.

No. | a_1i(0) | b_1i(0) | a_2i(0) | b_2i(0) | y_i(0) | w_i(0)
1   | -1.000 | 0.1603 | -1.000 | 0.1603 | 0.5000 | 1.0000
2   | -1.000 | 0.1603 | -0.333 | 0.1603 | 0.5000 | 1.0000
3   | -1.000 | 0.1603 |  0.333 | 0.1603 | 0.5000 | 1.0000
4   | -1.000 | 0.1603 |  1.000 | 0.1603 | 0.5000 | 1.0000
5   | -0.333 | 0.1603 | -1.000 | 0.1603 | 0.5000 | 1.0000
6   | -0.333 | 0.1603 | -0.333 | 0.1603 | 0.5000 | 1.0000
7   | -0.333 | 0.1603 |  0.333 | 0.1603 | 0.5000 | 1.0000
8   | -0.333 | 0.1603 |  1.000 | 0.1603 | 0.5000 | 1.0000
9   |  0.333 | 0.1603 | -1.000 | 0.1603 | 0.5000 | 1.0000
10  |  0.333 | 0.1603 | -0.333 | 0.1603 | 0.5000 | 1.0000
11  |  0.333 | 0.1603 |  0.333 | 0.1603 | 0.5000 | 1.0000
12  |  0.333 | 0.1603 |  1.000 | 0.1603 | 0.5000 | 1.0000
13  |  1.000 | 0.1603 | -1.000 | 0.1603 | 0.5000 | 1.0000
14  |  1.000 | 0.1603 | -0.333 | 0.1603 | 0.5000 | 1.0000
15  |  1.000 | 0.1603 |  0.333 | 0.1603 | 0.5000 | 1.0000
16  |  1.000 | 0.1603 |  1.000 | 0.1603 | 0.5000 | 1.0000
Table 4.5  Comparison between the direct method (A) and the proposed method (B) for identifying (37).

No. | Iteration (A) | Iteration (B) | Error of evaluation (A) | Error of evaluation (B) | Max. absolute error (A) | Max. absolute error (B)
1   | 13  | 8   | 0.00662 | 0.00555 | 0.2800 | 0.3700
2   | 66  | 94  | 0.00524 | 0.00464 | 0.2091 | 0.4287
3   | 8   | 113 | 0.00494 | 0.00270 | 0.4164 | 0.2951
4   | 6   | 4   | 0.01061 | 0.00661 | 0.1952 | 0.2394
5   | 59  | 84  | 0.00602 | 0.00428 | 0.4022 | 0.2514
6   | 19  | 21  | 0.00653 | 0.00549 | 0.2834 | 0.3436
7   | 85  | 283 | 0.00250 | 0.00359 | 0.3013 | 0.3195
8   | 8   | 21  | 0.00501 | 0.00435 | 0.3048 | 0.3166
9   | 32  | 70  | 0.00842 | 0.00580 | 0.2891 | 0.3129
10  | 110 | 12  | 0.00304 | 0.00458 | 0.3821 | 0.2921
From Table 4.5 we can see a better approximation in the case of the proposed method than by the direct method, except for a few special cases, though the iteration counts of the two methods are almost the same.
4.6 Conclusions

In this chapter, an efficient learning approach for fuzzy rule generation with the fuzzy singleton-type reasoning method has been proposed. We have illustrated the efficiency of the proposed learning technique by identifying some numerical examples, which demonstrated better fuzzy inference results than those of the direct learning algorithm. Our results also indicate that the proposed learning method is more reasonable and suitable for constructing an optimum fuzzy system model when applying the fuzzy singleton-type reasoning method to system identification.

References
[1] J.C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, 1981.
[2] R.L. Cannon, J.V. Dave and J.C. Bezdek, "Efficient implementation of the fuzzy c-means clustering algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, pp. 248-255, 1986.
[3] K.B. Cho and B.H. Wang, "Radial basis function based adaptive fuzzy systems and their applications to system identification and prediction", Fuzzy Sets and Systems, Vol. 83, pp. 325-339, 1996.
[4] R.N. Dave and S.K. Bhaswan, "Adaptive fuzzy c-shell clustering", Proceedings of the 10th Annual North American Fuzzy Information Processing Society Meeting, pp. 195-199, 1991.
[5] S. Horikawa, T. Furuhashi and Y. Uchikawa, "On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm", IEEE Transactions on Neural Networks, Vol. 3, pp. 801-806, 1992.
[6] H. Ichihashi, "Iterative fuzzy modeling and a hierarchical network", Proceedings of the Fourth IFSA World Congress, pp. 49-52, 1991.
[7] H. Ichihashi and I.B. Turksen, "A neuro-fuzzy approach to data analysis of pairwise comparisons", International Journal of Approximate Reasoning, Vol. 9, pp. 227-248, 1993.
[8] S. Lee and R.M. Kil, "A Gaussian potential function network with hierarchically self-organizing learning", IEEE Transactions on Neural Networks, Vol. 4, pp. 207-224, 1991.
[9] M. Maeda and S. Murakami, "An automobile tracking control with a fuzzy logic", Proceedings of the Third Fuzzy System Symposium, pp. 61-66, 1987.
[10] M. Mizumoto, ... 532, 1992.
[11] M. Mizumoto, "Fuzzy controls by fuzzy singleton type reasoning method", Proceedings of the Fifth IFSA World Congress, pp. 945-948, 1993.
[12] M. Mizumoto and M. Iwakiri, "Self-generation of fuzzy rules by fuzzy singleton-type reasoning method", Proceedings of the Ninth Fuzzy System Symposium, pp. 585-588, 1993.
[13] H. Nomura, I. Hayashi and N. Wakami, "A self-tuning method of fuzzy control by descent method", Proceedings of the Fourth IFSA World Congress, pp. 155-158, 1991.
[14] H. Nomura, I. Hayashi and N. Wakami, "A self-tuning method of fuzzy control by descent method", Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 203-210, 1992.
[15] D.E. Rumelhart, J.L. McClelland and the PDP Research Group, "Parallel Distributed Processing", MA: MIT Press, 1986.
[16] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A method of fuzzy rules generation based on neuro-fuzzy learning algorithm", Journal of Japan Society for Fuzzy Theory and Systems, Vol. 8, pp. 695-705, 1996.
[17] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A self-tuning method of fuzzy rules based on the gradient descent method", Journal of Japan Society for Fuzzy Theory and Systems, Vol. 8, pp. 757-765, 1996.
[18] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A learning algorithm for tuning fuzzy rules based on the gradient descent method", Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, pp. 55-61, 1996.
[19] Y. Shi, M. Mizumoto, N. Yubazaki and M. Otani, "A tuning method of fuzzy rules by fuzzy singleton-type reasoning method", Proceedings of the Fourth International Conference on Soft Computing, pp. 553-556, 1996.
[20] L.X. Wang and J.M. Mendel, "Back-propagation fuzzy system as nonlinear dynamic system identifiers", Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1409-1416, 1992.
[21] L.X. Wang, "Adaptive Fuzzy Systems and Control", Prentice Hall, 1994.
Chapter 5 Antecedent Validity Adaptation Principle for Table Lookup Scheme
PingTong Chan and Ahmad B. Rad The Hong Kong Polytechnic University Kowloon, Hong Kong
Abstract

In this chapter, we propose an Antecedent Validity Adaptation (AVA) principle for fuzzy system tuning. It is suggested that fuzzy rules should be updated with respect to the validity of their antecedents. This adaptation principle agrees with human intuition and fuzzy logic reasoning. The principle is applied to the Table Lookup (TL) scheme to model recorded data. Based on this approach, an on-line fuzzy identification algorithm is also presented. These methods are successfully applied to model nonlinear systems.

Keywords: Data Modeling, Table Lookup Scheme, Antecedent Validity Adaptation
5.1 Introduction

Fuzzy logic has been used extensively for the task of controller design [4]; however, more and more researchers have shown interest in exploring the possibility of using fuzzy logic for the modeling of complex systems. Fuzzy modeling has been carried out both off-line [7] and on-line [5, 6]. Wang and Mendel [7] proposed a Table Lookup scheme to generate fuzzy rules, which uses both numerical data and expert knowledge. Nozaki et al. [3] proposed a heuristic method to generate fuzzy rules from numerical data. Moreover, Wang [5, 6] proposed gradient and least-squares methods to generate fuzzy rules on-line. The main advantage of fuzzy modeling (identification) is that scattered heterogeneous information, such as qualitative knowledge, empirical observations, measured data and available a priori information, can be interpreted and represented in a coherent format. Due to these properties, complex and ill-defined nonlinear systems can be modeled with simple structures instead of sophisticated mathematical models. This is especially effective for modeling systems that can be controlled by a skilled human operator who is equipped with a fuzzy model of the underlying dynamics acquired by a combination of intuition, intelligence and practice. In this chapter, we suggest an Antecedent Validity Adaptation (AVA) principle. The adaptation principle agrees with human intuition and fuzzy logic reasoning. The crux of the method is to divide the data among their original linguistic antecedents and update the corresponding consequences with respect to their validity measure according to their product inference. The algorithm highly utilizes the resources in the fuzzy system to assimilate knowledge embedded in the data and is also capable of summarizing the available expertise. The main aim of this chapter is to improve the Table Lookup scheme: instead of updating only the most significant rules, the proposed method uses more information from the available data. The rest of this chapter is organized as follows. Section 2 introduces the antecedent validity adaptation (AVA) principle. The principle is applied to refine Wang and Mendel's table lookup scheme [7] for recorded data, followed by a simulated example in Section 3. The algorithm is extended to form an adaptive identification algorithm in Section 4, and a simulated example is included to show the performance of the on-line modeling. Finally, the chapter is concluded in Section 5.
5.2 Antecedent Validity Adaptation Principle and Table Lookup Scheme

A good engineering practice should be capable of incorporating all available information effectively; in the same spirit, the design should not discard any available information. The Antecedent Validity Adaptation (AVA) principle uses the antecedent validity of each datum, with respect to the fuzzy sets and fuzzy rules, to adjust the output consequences.
Wang and Mendel [7] proposed a method to generate fuzzy rules which uses both numerical data and expert knowledge. In their method, they first divided the input and output domains into several regions. Then, they generated rules from the data, and a degree was assigned to each rule. Finally, a combined rule base for the controller was generated by resolving conflicts between different rules. However, one may notice that this method selects only the most influential rules and leaves the other data information out of consideration. AVA incorporates this information into the fuzzy system. A step-by-step procedure for implementing AVA for TL is as follows:

Step 1. Define fuzzy sets to cover the input spaces

Let us assume that the following m input-output pairs are given as training data for constructing a fuzzy rule-based system:

{ (x^(p), y^(p)) | p = 1, 2, ..., m }

where x^(p) = (x_1^(p), x_2^(p), ..., x_n^(p)) ∈ U ⊂ R^n is the input vector of the p-th input-output pair, y^(p) ∈ V ⊂ R is the corresponding output, and j_1 = 1,2,...,K_1; ...; j_n = 1,2,...,K_n. The n-dimensional input space A_1 × A_2 × ... × A_n is divided into K_1 K_2 ... K_n fuzzy subspaces. The fuzzy system performs a mapping from U ⊂ R^n to R, where U = U_1 × ... × U_n and U_i ⊂ R, i = 1,2,...,n. In the proposed method, we have implemented the fuzzy system with triangular membership functions, a fuzzy singleton fuzzifier, product inference and center-of-height defuzzification. The shape of each membership function is triangular: one vertex lies at the center of the region, where the membership value equals one; the other two vertices lie at the centers of the two adjacent regions, where the membership values equal zero. The total sum of the membership values anywhere over the universe of discourse equals one. The quality of reconstruction depends upon the number of fuzzy subsets in the input space; increasing the number of input fuzzy subsets improves accuracy.
Step 2. Generate fuzzy rules from given data pairs

(i) Construct the fuzzy system

The rule base consists of a set of fuzzy IF-THEN rules in the form "IF a set of conditions is satisfied, THEN a consequent can be inferred". We assume that the rule base is composed of fuzzy IF-THEN rules of the following form:

Rule R^{j_1...j_n}: IF x_1 is A_1^{j_1} and ... and x_n is A_n^{j_n} THEN y is b^{j_1...j_n}    (1)

j_1 = 1,2,...,K_1; ...; j_n = 1,2,...,K_n, where R^{j_1...j_n} is the label of each fuzzy IF-THEN rule and b^{j_1...j_n} is the consequent real number. From the Π_{i=1}^{n} K_i fuzzy IF-THEN rules:

f(x) = Σ_{j_1=1}^{K_1} ... Σ_{j_n=1}^{K_n} b^{j_1...j_n} Π_{i=1}^{n} A_i^{j_i}(x_i) / Σ_{j_1=1}^{K_1} ... Σ_{j_n=1}^{K_n} Π_{i=1}^{n} A_i^{j_i}(x_i)    (2)

where i = 1, 2, ..., n and j_1 = 1,2,...,K_1; ...; j_n = 1,2,...,K_n; the b^{j_1...j_n} are free parameters to be designed, and the A_i^{j_i} are designed in Step 1.

(ii) Collect the free parameters b^{j_1...j_n} into the vector

b = ( b^{1...1}, ..., b^{K_1 1...1}, b^{1 2...1}, ..., b^{K_1 2...1}, ..., b^{1 K_2...K_n}, ..., b^{K_1 K_2...K_n} )^T

and rewrite (2) as

f(x) = b^T a(x)    (3)

where a(x) is a Π_{i=1}^{n} K_i-dimensional vector whose j_1...j_n-th element is

a^{j_1...j_n}(x) = Π_{i=1}^{n} A_i^{j_i}(x_i) / Σ_{j_1=1}^{K_1} ... Σ_{j_n=1}^{K_n} Π_{i=1}^{n} A_i^{j_i}(x_i)    (4)
(iii) Select the initial parameter b(0)

If there are linguistic rules from human experts whose IF parts agree with the IF parts of (1), then choose b(0) to be the centers of the THEN-part fuzzy sets in those linguistic rules. In this way, we can construct the initial fuzzy system from conscious human knowledge. This offers the advantage that there is no need to set the centers of the consequences before modeling, as in the case of Mamdani-type fuzzy systems [1]. The antecedent validity of the rules R^{j_1...j_n} for the data is given by Eq. 4.
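For two inputs, the antecedent validities of Eq. 4 under product inference are just the outer product of the per-input membership vectors, normalized; with a triangular partition the normalizer is already 1. A sketch using the memberships of Data 1 from the example below (the function name is ours):

```python
import numpy as np

def fbf_validities(mu1, mu2):
    """Antecedent validities a^{j1 j2}(x) of Eq. 4 for two inputs under
    product inference: the outer product of the per-input memberships,
    normalized so the validities sum to 1."""
    A = np.outer(mu1, mu2)
    return A / A.sum()

# x1 memberships over (S2, S1, CE, B1, B2): B1 0.8, B2 0.2;
# x2 memberships over (S3, S2, S1, CE, B1, B2, B3): S2 0.3, S1 0.7.
V = fbf_validities(np.array([0, 0, 0, 0.8, 0.2]),
                   np.array([0, 0.3, 0.7, 0, 0, 0, 0]))
# V[3, 2] is the (B1, S1) cell: 0.8 * 0.7 = 0.56
```

These four nonzero cells (0.24, 0.56, 0.06, 0.14) are exactly the validities that appear in Table 5.2.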
Step 3. Create a combined fuzzy rule base

b^{j_1...j_n} = Σ_{p=1}^{m} a^{j_1...j_n}(x^(p)) y^(p) / Σ_{p=1}^{m} a^{j_1...j_n}(x^(p))    (5)
5.2.1 Illustrative example

Let us explain these procedures with an example to help clarify the discussion. In this example, the task is to generate a set of fuzzy rules to formulate a mapping f: (x_1, x_2) → y. Assume that the domain intervals of x_1, x_2 and y are [x_1^-, x_1^+], [x_2^-, x_2^+] and [y^-, y^+], respectively. A step-by-step procedure is outlined as:

Step 1. Define fuzzy sets to cover the input spaces

Consider the normalized domains [x_1^-, x_1^+] = [0, 1], [x_2^-, x_2^+] = [0, 1] and [y^-, y^+] = [0, 1]. x_1 is divided into 5 fuzzy sets and x_2 is divided into 7 fuzzy sets, as depicted in Fig. 5.1. Assume also that x_1 and x_2 are inputs and y is the output. This simple two-input one-output case is selected to demonstrate the basic concept of our new approach.
Figure 5.1  Input and output fuzzy sets and three data and their corresponding membership values.
Step 2. Generate fuzzy rules from given data pairs

Suppose that we are given three sets of data. Data 1: (0.8, 0.28; 0.525); Data 2: (0.65, 0.5; 0.677); and Data 3: (0.675, 0.36; 0.55). For data sets 1 and 2, we determine the degrees of membership of x_1 and x_2 in the different regions. For example, in Fig. 5.1, x_1 = 0.8 has a membership value of 0.8 in B1, a membership value of 0.2 in B2, and zero membership in all other regions. The membership values for these data sets are shown in Table 5.1.
Table 5.1  Fuzzy set and membership value of Data 1 and Data 2.

                | Fuzzy set 1 : membership value | Fuzzy set 2 : membership value
x_1^(1) = 0.8   | B1: 0.8  | B2: 0.2
x_2^(1) = 0.28  | S2: 0.3  | S1: 0.7
y^(1) = 0.525   | CE: 0.9  | B1: 0.1
x_1^(2) = 0.65  | CE: 0.4  | B1: 0.6
x_2^(2) = 0.5   | CE: 1    | -
y^(2) = 0.677   | CE: 0.3  | B1: 0.7
Next, we assign the consequences with respect to the input linguistic variables with degree equal to the antecedent validity. The results for data sets 1 and 2 are summarized in Tables 5.2 and 5.3, respectively.

Table 5.2  Fuzzy rules formed by TL and TLV (Data 1).

Data 1 (0.8, 0.28; 0.525)

TL rule (antecedent validity):
IF x_1 is B1 and x_2 is S1 THEN y = CE (0.5)   (0.8 × 0.7 = 0.56)

TLV rules (antecedent validity):
IF x_1 is B1 and x_2 is S2 THEN y = 0.525   (0.8 × 0.3 = 0.24)
IF x_1 is B1 and x_2 is S1 THEN y = 0.525   (0.8 × 0.7 = 0.56)
IF x_1 is B2 and x_2 is S2 THEN y = 0.525   (0.2 × 0.3 = 0.06)
IF x_1 is B2 and x_2 is S1 THEN y = 0.525   (0.2 × 0.7 = 0.14)
Total = 1
Table 5.3  Fuzzy rules formed by TL and TLV (Data 2).

Data 2 (0.65, 0.5; 0.677)

TL rule (antecedent validity):
IF x_1 is B1 and x_2 is CE THEN y = B1 (0.75)   (0.6 × 1 = 0.6)
Total = 0.6

TLV rules (antecedent validity):
IF x_1 is CE and x_2 is CE THEN y = 0.677   (0.4 × 1 = 0.4)
IF x_1 is B1 and x_2 is CE THEN y = 0.677   (0.6 × 1 = 0.6)
Total = 1
Step 3. Create a combined fuzzy rule base

The numerical data are coded into a common framework by the consequents of the FBF given by

b = Σ_{i=1}^{p} φ_i y_i / Σ_{i=1}^{p} φ_i    (6)

where p is the number of training data, φ_i is the antecedent validity of the i-th datum for that rule and y_i is its consequent data value. The resultant rule tables for the three sets of data are shown in Figure 5.2(a-b).
Figure 5.2  Rule base generated by Example 1.

(a) The fuzzy rule base of TLV for Data 1 and Data 2 (x_1 columns S2, S1, CE, B1, B2; x_2 rows S3, S2, S1, CE, B1, B2, B3; each cell shows its consequent with the accumulated validity in parentheses):
(CE, CE): 0.677 (0.4);  (B1, CE): 0.677 (0.6);
(B1, S1): 0.525 (0.56);  (B2, S1): 0.525 (0.14);
(B1, S2): 0.525 (0.24);  (B2, S2): 0.525 (0.06).
Data 2 ⇒ 0.677, (x_1 → 0.4:CE, 0.6:B1), (x_2 → 1:CE);
Data 1 ⇒ 0.525, (x_1 → 0.8:B1, 0.2:B2), (x_2 → 0.3:S2, 0.7:S1).

(b) The fuzzy rule base of TLV for Data 1-3, after adding Data 3 ⇒ 0.55, (x_1 → 0.3:CE, 0.7:B1), (x_2 → 0.9:S1, 0.1:CE):
0.668 (0.43), with 0.668 = (0.677·0.4 + 0.55·0.03) / (0.4 + 0.03);
0.638 (0.87), with 0.638 = (0.677·0.6 + 0.55·0.27) / (0.6 + 0.27);
0.538 (1.19), with 0.538 = (0.525·0.56 + 0.55·0.63) / (0.56 + 0.63);
0.55 (0.63);  0.525 (0.14);  0.525 (0.24);  0.525 (0.06).
Let us denote the Table Lookup scheme with AVA reasoning by TLV and the plain Table Lookup scheme by TL. Comparing the reasoning of the two algorithms (TL and TLV), we get:
Data 1 (0.8, 0.28; 0.525) (refer to Fig. 5.2a and Table 5.2)
TL: The output is 0.5 and the modeling error = 0.525 − 0.5 = 0.025.
TLV: The output is 0.525 and the modeling error = 0.525 − 0.525 = 0.

Data 2 (0.65, 0.5; 0.677) (refer to Fig. 5.2a and Table 5.3)
TL: The output is 0.75 and the modeling error = 0.75 − 0.677 = 0.073.
TLV: The output is 0.677 and the modeling error = 0.677 − 0.677 = 0.
Next, Data 3 (0.675, 0.36; 0.55) is added for modeling, as shown in Table 5.4.
Table 5.4  Fuzzy set and membership value of Data 3.

                | Fuzzy set 1 : membership value | Fuzzy set 2 : membership value
x_1^(3) = 0.675 | CE: 0.3 | B1: 0.7
x_2^(3) = 0.36  | S1: 0.9 | CE: 0.1
y^(3) = 0.55    | CE: 0.8 | B1: 0.2
Table 5.5  Fuzzy rules formed by TLV (Data 3).

Data 3 (0.675, 0.36; 0.55)

TLV rules (antecedent validity):
IF x_1 is CE and x_2 is S1 THEN y = 0.55   (0.3 × 0.9 = 0.27)
IF x_1 is CE and x_2 is CE THEN y = 0.55   (0.3 × 0.1 = 0.03)
IF x_1 is B1 and x_2 is S1 THEN y = 0.55   (0.7 × 0.9 = 0.63)
IF x_1 is B1 and x_2 is CE THEN y = 0.55   (0.7 × 0.1 = 0.07)
Total = 1

We assign the consequent with respect to the input linguistic variables with degree equal to the antecedent validity. Then, for reasoning with TLV, we have:

Data 1 (0.8, 0.28; 0.525): The output is 0.532 (= 0.538·0.56 + 0.525·0.14 + 0.525·0.24 + 0.525·0.06) and the modeling error = 0.532 − 0.525 = 0.007.

Data 2 (0.65, 0.5; 0.677): The output is 0.65 (= 0.668·0.4 + 0.638·0.6) and the modeling error = 0.65 − 0.677 = −0.027.

Data 3 (0.675, 0.36; 0.55): The output is 0.5697 (= 0.638·0.27 + 0.668·0.03 + 0.538·0.63 + 0.55·0.07) and the modeling error = 0.5697 − 0.55 = 0.0197.
5.2.2 Some remarks on properties of AVA

• The results agree with intuition and fuzzy reasoning

The adaptation agrees with human intuition. Note that the proposed method is consistent with fuzzy reasoning: the total antecedent validity equals 1; each input datum is absorbed according to its validity measure; and the product inference of the antecedents (the antecedent validity) of each rule accounts for the datum's degree of contribution. In other words, the modeling algorithm is able to obtain the same parameters as the original system given a sufficient number of noiseless input-output training data. AVA uses this value in the reasoning process and in calculating the adaptation portion for the rule consequences.
• Degrees of freedom in the consequences are increased

Mamdani-type fuzzy systems, with a fuzzy singleton fuzzifier, product inference and center-of-height defuzzification, are a constrained Fuzzy Basis Function (FBF) expansion. The consequences in Mamdani-type fuzzy systems are restricted to the fuzzy sets of the output linguistic variable. For example, consider a fuzzy system [7×5; 7] (i.e. the first input variable has 7 fuzzy sets, the second variable has 5 fuzzy sets and the output has 7 fuzzy sets). In this case, the Mamdani-type fuzzy system has 35 rules with 7 possible consequences; on the other hand, the FBF has 35 rules with 35 possible consequences. When the input fuzzy sets are divided into finer partitions, the FBF not only has an increased number of rules, but also an increased degree of freedom in the consequences.

• Information usage is increased

Essentially, the TL scheme characterizes only the key features; TLV discards no details and absorbs each piece of information in proportion to its antecedent validity. The idea of AVA is intuitively straightforward and computationally simple; it makes use of the intrinsic treasure of the available data. With the AVA principle, all the data are fitted into a common rule table cooperatively. TLV makes use of the remaining 44% of the information of Data 1 and the remaining 40% of the information of Data 2. TLV has the maximum increase in data usage when both x_1 and x_2 have membership value 0.5 in two adjacent fuzzy sets, respectively; then TLV has a 75% increase in information usage. Moreover, with TLV, if a new datum is not as dominant as the occupying rule, its information is still taken into consideration. For a 2-variable fuzzy system, TLV makes use of the remaining 3 rules. It is apparent that as the number of input variables increases, the effectiveness of information extraction by AVA becomes more significant.
The proposed algorithm can extract more information from the data while the simple one-pass procedure of TL is left unimpaired.
• Robustness with respect to changes in the definition of the fuzzy sets is improved
It can be observed that TL is more sensitive to the definition of the fuzzy sets because it selects only the most significant rules; when the centers of the membership functions change, the performance of the fuzzy system changes. TLV, on the other hand, selects the rules in proportion to their antecedent validity. When the definition of the fuzzy sets is changed, the antecedent validities and the output consequents are adapted accordingly. No bias is held against the input data when they are partitioned to construct fuzzy rules: every datum contributes when the rule base is initialized or updated. The method does not drop any piece of information from the data, and it absorbs more knowledge from the training material.
• Information usage
TL emphasizes the incorporation of human knowledge, whereas our method aims at extracting numerical information. When extensive, important expert knowledge is available, TL is preferable. When expert knowledge is limited or inaccessible, or when reliable recorded numerical data (successful control records) are available, TLV is the favored choice. Human knowledge can, however, still be incorporated as mentioned in Step 2.
5.3 Simulation results
Example 1
In this section, we apply the method to the modeling of a nonlinear system. The plant to be identified is governed by the difference equation

y(k+1) = 0.3 y(k) + 0.6 y(k−1) + g[u(k)]   (7)

where the unknown function has the form g(u) = 0.6 sin(πu) + 0.3 sin(3πu) + 0.1 sin(5πu). From Eq. (7), the identification model is governed by the difference equation

y(k+1) = 0.3 y(k) + 0.6 y(k−1) + f[u(k)]   (8)

where f[·] has the form of Eq. (2) with 7 fuzzy sets. This problem has also been studied in [4] and [1]. Fig. 5.3 shows the outputs of the plant and the identified model for the input u(k) = sin(2πk/200). It is observed from Fig. 5.3 that the output of the identification model follows the output of the plant. The fuzzy systems have 9 fuzzy subsets divided over the maximum range of the training data. The squared identification error is 51.35 for TL and 12.04 for TLV.
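The plant of Eqs. (7)–(8) is easy to reproduce numerically. The following sketch (the function names are ours, not the chapter's) iterates the difference equation to generate the training signal:

```python
import math

def g(u):
    # Unknown nonlinearity g(u) = 0.6 sin(pi u) + 0.3 sin(3 pi u) + 0.1 sin(5 pi u)
    return (0.6 * math.sin(math.pi * u)
            + 0.3 * math.sin(3 * math.pi * u)
            + 0.1 * math.sin(5 * math.pi * u))

def simulate_plant(n_steps):
    """Iterate y(k+1) = 0.3 y(k) + 0.6 y(k-1) + g(u(k)) with u(k) = sin(2 pi k / 200)."""
    y = [0.0, 0.0]                       # y(0), y(1): zero initial conditions assumed
    for k in range(1, n_steps):
        u = math.sin(2 * math.pi * k / 200)
        y.append(0.3 * y[k] + 0.6 * y[k - 1] + g(u))
    return y

y = simulate_plant(500)
```

The linear part has poles inside the unit circle, so the trajectory stays bounded while the input sweeps the nonlinearity.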
[Plot: Plant (solid), TL (dotted) and TLV (hidden), over 500 samples.]
Figure 5.3  Outputs of the plant and the identifications of TL and TLV for Example 1.

5.4 Adaptive Fuzzy Identifier
The algorithm can be extended to form an on-line identification algorithm by revising Step 3 in Section 2 so that the fuzzy basis function is updated with the input data according to AVA:
b^{i1···in}(p) = α^{a^{i1···in}} b^{i1···in}(p−1) + (1 − α^{a^{i1···in}}) y_p   (9)
where α is the forgetting factor and a^{i1···in} is the antecedent validity of the rule. The adaptive behavior will be demonstrated by two examples.
Example 2
The plant to be identified [1, 4] is described by the second-order difference equation

y(k+1) = g[y(k), y(k−1)] + u(k)   (10)

where

g[y(k), y(k−1)] = y(k) y(k−1) [y(k) + 2.5] / [1 + y²(k) + y²(k−1)]   (11)

and u(k) = sin(2πk/25). A series-parallel identifier described by the equation

ŷ(k+1) = f[y(k), y(k−1)] + u(k)   (12)
was used, where f[y(k), y(k−1)] is the fuzzy system modeling the nonlinearity and α = 0.9. Fig. 5.4 shows the outputs of the plant and the identification model.
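The on-line update of Eq. (9) is a one-line computation per rule. A minimal sketch, with our own function name and the exponent standing for the rule's antecedent validity:

```python
def update_consequent(b_prev, y_p, validity, alpha=0.9):
    """One step of the recursive consequent update of Eq. (9): the forgetting
    factor alpha is raised to the rule's antecedent validity, so a rule that
    the new datum hardly fires is hardly changed."""
    w = alpha ** validity
    return w * b_prev + (1.0 - w) * y_p

# A datum with zero antecedent validity leaves the consequent untouched:
b0 = update_consequent(2.0, 5.0, validity=0.0)
# Full validity applies the full (1 - alpha) update weight:
b1 = update_consequent(2.0, 5.0, validity=1.0)
```

This reproduces the non-deviation behavior claimed in Section 5.2.2: the consequent moves toward y_p only in proportion to how strongly the datum matches the rule.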
Figure 5.4  Output of the plant (solid) and the identification model (dotted) for Example 2.

Example 3
Time-series prediction is a practical problem in economic and business planning, weather forecasting, control, etc. In this section, we use the fuzzy system designed by the table look-up scheme to predict the Mackey–Glass chaotic time series, which is generated by the delay differential equation

ds(t)/dt = −b s(t) + a s(t−τ) / [1 + s¹⁰(t−τ)]   (13)

The prediction of future values of this time series is a benchmark problem that has been used and reported by Wang [5, 7] and others.
Figure 5.5  A section of the Mackey–Glass chaotic time series.
Fig. 5.5 shows 1000 points of x(k). The problem of time-series prediction can be formulated as follows: given x(k−n+1), x(k−n+2), ..., x(k), estimate x(k+1), where n is a positive integer. The goal of the task is to use past values of the time series up to time t to predict the value at some point t+P in the future. The standard method for this type of prediction is to create a mapping from D points of the time series, spaced Δ apart, i.e. [x(t−(D−1)Δ), ..., x(t−Δ), x(t)], to a predicted future value x(t+P); here D = 4 and Δ = P = 6 were used. We used fourth-order Runge–Kutta integration of the differential equation (13) with time step 0.1, initial condition x(0) = 1.2, τ = 17, and x(t) = 0 for t < 0. From the Mackey–Glass time series x(t), we extracted 1000 input-output data pairs of the format [x(t−18), x(t−12), x(t−6), x(t); x(t+6)], for t = 125 to 1124. The first 500 pairs were used as the training data set for TLV, while the remaining 500 pairs formed the checking data set for validating the identified TLV. The number of triangular membership functions assigned to each input of TLV was set to seven. Fig. 5.6 shows the result of modeling the time series.
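The data-preparation step can be sketched as follows. The delay-equation parameters a = 0.2 and b = 0.1 are the usual benchmark choices (the chapter leaves them implicit), and for brevity we use a plain Euler step of size 1.0 instead of the fourth-order Runge–Kutta integration with step 0.1 used in the text:

```python
def mackey_glass(n, tau=17, a=0.2, b=0.1, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = a x(t-tau)/(1 + x(t-tau)^10) - b x(t)."""
    x = [x0]
    for t in range(n - 1):
        # x(t) = 0 for t < 0, as stated in the text
        x_tau = x[t - int(tau / dt)] if t >= tau / dt else 0.0
        dx = a * x_tau / (1.0 + x_tau ** 10) - b * x[t]
        x.append(x[t] + dt * dx)
    return x

def make_pairs(x, first=124, count=1000):
    """Training pairs [x(t-18), x(t-12), x(t-6), x(t); x(t+6)]."""
    return [([x[t - 18], x[t - 12], x[t - 6], x[t]], x[t + 6])
            for t in range(first, first + count)]

series = mackey_glass(1200)
data = make_pairs(series)
```

With these parameters the series stays within its usual bounded chaotic band, so the pairs are well defined over the whole extraction range.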
Figure 5.6  Prediction and the true values of the time series.

5.5 Conclusions
We have proposed the antecedent validity adaptation principle for tuning a fuzzy rule base, and applied the principle to refine the table look-up scheme [7] for modeling recorded data, yielding TLV. The main features of TLV have been illustrated with examples, and TLV has been extended with on-line identification capability. The algorithms show improved accuracy without impairing the one-pass operation of TL, and they require neither the solution of complicated differential equations nor matrix manipulations. Essentially, the table look-up scheme attempts data mining by selecting the most significant rules; AVA achieves knowledge acquisition by incorporating all the available information, weighted by antecedent validity, into a rule table. With AVA nourishing the knowledge, TLV converges quickly and adapts effectively to accommodate changes. AVA enriches and redefines the knowledge in the fuzzy system; the principle helps to discover and develop dormant, unused assets to best advantage.
References
[1] E.H. Mamdani, Applications of fuzzy algorithms for control of simple dynamic plant, Proc. IEE, 121 (12) (1974) 1585–1588.
[2] K.S. Narendra and K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Transactions on Neural Networks, 1 (1) (1990) 4–27.
[3] K. Nozaki, H. Ishibuchi, H. Tanaka, A simple but powerful heuristic method for generating fuzzy rules from numerical data, Fuzzy Sets and Systems, 86 (1997) 251–270.
[4] M. Sugeno, editor, Industrial Applications of Fuzzy Control, North-Holland, Amsterdam; New York, 1985.
[5] L.X. Wang, A Course in Fuzzy Systems and Control, Prentice Hall, Upper Saddle River, NJ, 1997.
[6] L.X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis, PTR Prentice Hall, Englewood Cliffs, NJ, 1994.
[7] L.X. Wang, Generating fuzzy rules by learning from examples, IEEE Trans. on Systems, Man, and Cybern., 22 (6) (1992) 1414–1427.

Acknowledgment: The authors gratefully acknowledge the support of The Hong Kong Polytechnic University through Grant No. GV471.
Chapter 6 Fuzzy Spline Interpolation in Sparse Fuzzy Rule Bases
Mayuka F. Kawaguchi and Masaaki Miyakoshi
Hokkaido University
Abstract
This chapter addresses the problem of interpolative reasoning in sparse fuzzy rule bases. First, the authors propose a method of linear rule interpolation which utilizes the convex hull including two rules. Secondly, we extend this linear interpolation technique to a nonlinear one based on the idea of fuzzy splines. Both rule interpolation methods yield fuzzy interpolation functions that coincide with each rule of the given rule base. Next, we describe a method to generate a fuzzy partition through the fuzzy interpolation functions, which allows us to execute ordinary approximate reasoning in the given rule base. Finally, some numerical examples demonstrate the construction of fuzzy interpolation functions and fuzzy partitions by means of linear and spline interpolation.
Keywords: sparse rule base, interpolative reasoning, linear rule interpolation, fuzzy spline interpolation
6.1 Introduction
Recently, the methodology of interpolative reasoning has attracted attention as a practical approach to approximate reasoning for the case in which only imprecise and sparse pieces of knowledge are given. Koczy & Hirota [12], [13] have defined the concepts of sparse fuzzy rule bases and of the distance between fuzzy sets, and have established the method called linear interpolative reasoning, which deduces an adequate conclusion from a sparse rule base. In addition, they [15] have pointed out the effectiveness of interpolative reasoning in allowing size reduction of rule bases while retaining a certain accuracy. Relating to Koczy's method, Dubois et al. [9], Baranyi et al. [1], [2], the present authors [11] and Hsiao et al. [10] have previously proposed several types of linear interpolation in sparse fuzzy rule bases.
On the other hand, Saga et al. [16], Wang et al. [21] and Baranyi et al. [3] have proposed several techniques to combine fuzzy logic and spline functions. Saga et al. [16] introduced parametric spline interpolation of fuzzy points which have conical membership functions; in their method, the center and the radius of the base of the cone are interpolated separately. The aim of this chapter is to introduce a new technique of interpolative reasoning through linear or nonlinear interpolation functions of the given fuzzy rules. This chapter is organized as follows. Section 6.2 is assigned to the definitions of basic notions and notations. In Section 6.3, we propose a linear method to interpolate a new rule at an arbitrary point of the input space. In Section 6.4, we extend the linear rule interpolation technique to a nonlinear one based on fuzzy splines by applying the basic idea of Saga's method to sparse fuzzy rule bases. In Section 6.5, we introduce fuzzy interpolation functions and describe an algorithm to construct a fuzzy partition which covers the input space of the given rule base. Section 6.6 is assigned to the conclusions of this work.
6.2 Sparse Fuzzy Rule Bases
Throughout this chapter, we consider a sparse fuzzy rule base R = {R₁, R₂, ..., R_r}, where R_i = (A_i ⇒ B_i) (i = 1, ..., r), A_i and B_i are fuzzy concepts represented by fuzzy subsets of the universes X and Y, respectively, and R has some gaps, i.e.

X \ ⋃_{i=1}^{r} supp(A_i) ≠ ∅.   (1)
According to the pioneering work by Koczy et al. [12], [13], we also assume that the universes X and Y are totally ordered metric spaces. Moreover, for the sake of convenience, we treat the universes X and Y as intervals on the real line ℝ, and assume that A_i and B_i are fuzzy intervals, i.e. fuzzy subsets satisfying normality, convexity, upper semicontinuity and support-boundedness. The α-level sets of a fuzzy interval A_i are defined as
A_{iα} = {x | μ_{A_i}(x) ≥ α}   for α ∈ (0, 1],
A_{iα} = cl {x | μ_{A_i}(x) > α}   for α = 0,   (2)

A_i = ⋃_{α∈[0,1]} α A_{iα},   (3)

where μ_{A_i}(x) is the membership function of A_i.
Then, A_{iα} and B_{iα} are finite closed intervals for ∀α ∈ [0, 1], as illustrated in Fig. 6.1, and are denoted by
Fig. 6.1  An α-level set of a fuzzy interval A_i: A_{iα} = [a_i − a^L_{iα}, a_i + a^R_{iα}].
A_{iα} = [min A_{iα}, max A_{iα}] = [a_i − a^L_{iα}, a_i + a^R_{iα}],   (4)

B_{iα} = [min B_{iα}, max B_{iα}] = [b_i − b^L_{iα}, b_i + b^R_{iα}].   (5)
In this chapter, the symbol F(X) denotes the family of all fuzzy intervals of the universe X. As is well known from the LR representation [8], a fuzzy interval A_i is represented by its mean value a_i (i.e. μ_{A_i}(a_i) = 1), left and right spreads a_i^L = a^L_{i0} and a_i^R = a^R_{i0}, and left and right reference functions L_A and R_A as follows (see Fig. 6.2):

A_i = (a_i, a_i^L, a_i^R)_{L_A R_A},   (6)

μ_{A_i}(x) = L_A((a_i − x)/a_i^L)   for x ≤ a_i,
μ_{A_i}(x) = R_A((x − a_i)/a_i^R)   for a_i < x,   (7)
where L_A (and R_A) is a monotone decreasing, left-continuous function satisfying L_A(0) = 1 and L_A(1) = 0 (R_A(0) = 1 and R_A(1) = 0). We often refer to the following definition of a partial order in F(X):

A_i ≼ A_j  ⇔(def)  min A_{iα} ≤ min A_{jα} and max A_{iα} ≤ max A_{jα}   for ∀α ∈ [0, 1].
Fig. 6.2  LR representation of a fuzzy interval A_i, with left branch L_A((a_i − x)/a_i^L) and right branch R_A((x − a_i)/a_i^R).
6.3 Linear Rule Interpolation
6.3.1 KH-Method and Convex Hull Method
In this subsection, we discuss two methods for obtaining an adequate reasoning conclusion by means of linear interpolation in a sparse rule base. Let us consider the simplest form of a fuzzy reasoning problem in a sparse rule base, following the past works [18]–[20], where an observation A* ∈ F(X) occurs between two rules A₁ ⇒ B₁ and A₂ ⇒ B₂:

Rule 1:      A₁ ⇒ B₁
Rule 2:      A₂ ⇒ B₂
Observation: A*
Conclusion:  B*
As the first method, let us recall Koczy's linear rule interpolation [12], [13] (hereafter called the KH-method), which gives the conclusion B* ∈ F(Y) in the case that A₁ ≼ A* ≼ A₂ (A₁ ≠ A₂) as follows:

min B*_α = min B₁α + [(min A*_α − min A₁α)/(min A₂α − min A₁α)] (min B₂α − min B₁α)
max B*_α = max B₁α + [(max A*_α − max A₁α)/(max A₂α − max A₁α)] (max B₂α − max B₁α)
for ∀α ∈ [0, 1].   (8)

Fig. 6.3 illustrates the KH-method on the X–Y plane.
Fig. 6.3  Koczy's linear rule interpolation (KH-method).
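On a single α-cut, the KH-method reduces to linear interpolation of interval endpoints, as in Eq. (8). A minimal sketch (names are ours); intervals are (min, max) pairs, and a full implementation would sweep α over [0, 1]:

```python
def kh_interpolate(A1, B1, A2, B2, A_obs):
    """Eq. (8) on one alpha-cut: min endpoints interpolate min endpoints,
    max endpoints interpolate max endpoints."""
    lo = B1[0] + (A_obs[0] - A1[0]) / (A2[0] - A1[0]) * (B2[0] - B1[0])
    hi = B1[1] + (A_obs[1] - A1[1]) / (A2[1] - A1[1]) * (B2[1] - B1[1])
    return (lo, hi)

# An observation halfway between the rule antecedents yields the halfway conclusion:
B = kh_interpolate((0, 1), (10, 11), (4, 5), (20, 21), (2, 3))
```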
As the other method, we propose an interpolation using the convex hull of the Cartesian products of A_{iα} and B_{iα}. The usage of such a convex hull has been suggested by Dubois et al. [9] in their discussion of the approximation of control laws. Let S_α be the convex hull of A₁α × B₁α and A₂α × B₂α, as shown in Fig. 6.4. Here, S_α is represented as the set of points (x, y) ∈ X × Y such that

y ∈ [min B₁α, λ′ max B₂α + (1 − λ′) max B₁α]   for x ∈ A₁α = [min A₁α, max A₁α],
y ∈ [λ min B₂α + (1 − λ) min B₁α, λ′ max B₂α + (1 − λ′) max B₁α]   for x ∈ [max A₁α, min A₂α],
y ∈ [λ min B₂α + (1 − λ) min B₁α, max B₂α]   for x ∈ A₂α = [min A₂α, max A₂α],   (9)

where x = λ min A₂α + (1 − λ) min A₁α = λ′ max A₂α + (1 − λ′) max A₁α, λ, λ′ ∈ [0, 1]. Now, we determine the α-level set B*_α of the conclusion as the greatest closed interval on Y such that A*_α × B*_α ⊆ S_α for the given observation A*.
Then, we obtain the following formulae instead of (8):
Fig. 6.4  The convex hull S_α.
For A₁ ≼ A* ≼ A₂ and B₁ ≼ B₂:

min B*_α = min B₁α + [(max A*_α − max A₁α)/(max A₂α − max A₁α)] (min B₂α − min B₁α)
max B*_α = max B₁α + [(min A*_α − min A₁α)/(min A₂α − min A₁α)] (max B₂α − max B₁α)
for ∀α ∈ [0, 1];   (10)

for A₁ ≼ A* ≼ A₂ and B₂ ≼ B₁:

min B*_α = min B₁α + [(min A*_α − min A₁α)/(min A₂α − min A₁α)] (min B₂α − min B₁α)
max B*_α = max B₁α + [(max A*_α − max A₁α)/(max A₂α − max A₁α)] (max B₂α − max B₁α)
for ∀α ∈ [0, 1].   (11)
Fig. 6.5 shows our interpolation on the X–Y plane.

Fig. 6.5  The new method of linear rule interpolation (convex hull method).
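The convex hull method can be sketched on one α-cut in the same style. Following Eqs. (10)–(11), when B₁ lies below B₂ the lower bound of B* follows the hull line through the max-endpoints of the antecedents and the upper bound the line through the min-endpoints; in the opposite case the pairing reverses. The helper names are ours:

```python
def hull_interpolate(A1, B1, A2, B2, A_obs):
    """Eqs. (10)-(11) on one alpha-cut; intervals are (min, max) pairs."""
    def line(x, x1, y1, x2, y2):
        # linear interpolation through (x1, y1) and (x2, y2)
        return y1 + (x - x1) / (x2 - x1) * (y2 - y1)
    if B1[0] <= B2[0]:   # B1 below B2: Eq. (10)
        lo = line(A_obs[1], A1[1], B1[0], A2[1], B2[0])
        hi = line(A_obs[0], A1[0], B1[1], A2[0], B2[1])
    else:                # B2 below B1: Eq. (11)
        lo = line(A_obs[0], A1[0], B1[0], A2[0], B2[0])
        hi = line(A_obs[1], A1[1], B1[1], A2[1], B2[1])
    return (lo, hi)

B = hull_interpolate((0, 1), (10, 11), (4, 5), (20, 21), (2, 3))
```

In this symmetric example both pairings give the same midpoint conclusion, matching the KH result.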
If B*_α ⊆ B*_{α′} holds for ∀α, α′ such that α > α′, then B* = ⋃_{α∈[0,1]} α B*_α is a fuzzy interval. Moreover, in this case B* coincides with the greatest solution of the fuzzy relational equation [17]

A* × B* ⊆ S,   (12)

i.e.

min(μ_{A*}(x), μ_{B*}(y)) ≤ μ_S(x, y),   (13)

where S = ⋃_{α∈[0,1]} α S_α is a fuzzy relation on X × Y.
As pointed out by Koczy et al. [14] and Shi et al. [18]–[20], in the KH-method, even if all of the fuzzy rules A₁ ⇒ B₁ and A₂ ⇒ B₂ and the observation A* are defined as fuzzy intervals, the conclusion B* does not always form even a fuzzy set. In other words, B*_α ⊆ B*_{α′} (α > α′) does not hold in general.
The same situation can also occur in our newly proposed method. In order to avoid this difficulty, the authors take a different approach from the above-mentioned interpolations to cope with the gap in a rule base in the following subsection.

6.3.2 Linear Interpolation with LR Fuzzy Rules
We describe a method to insert a new fuzzy rule A ⇒ B into the gap between two rules A₁ ⇒ B₁ and A₂ ⇒ B₂ by means of linear interpolation. From the practical point of view, we assume that the antecedent parts A₁ and A₂ of the rules are given as L_A–R_A fuzzy intervals, and the consequent parts B₁ and B₂ as L_B–R_B fuzzy intervals, i.e.

a^L_{iα} = a_i^L L_A^{-1}(α),  a^R_{iα} = a_i^R R_A^{-1}(α);  b^L_{iα} = b_i^L L_B^{-1}(α),  b^R_{iα} = b_i^R R_B^{-1}(α)   (i = 1, 2).   (14)

Now, let us discuss the condition for our interpolation method proposed in the previous section under which the observation A is an L_A–R_A fuzzy interval and the conclusion B is an L_B–R_B fuzzy interval.
Theorem 1. Let an observation A = (a, a^L, a^R)_{L_A R_A} be an L_A–R_A fuzzy interval. Then the conclusion B for A by means of linear rule interpolation is an L_B–R_B fuzzy interval if and only if the following conditions hold:
a^L = a₁^L + c(a₂^L − a₁^L),  a^R = a₁^R + c(a₂^R − a₁^R);
b = b₁ + c(b₂ − b₁);
b^L = b₁^L + c(b₂^L − b₁^L),  b^R = b₁^R + c(b₂^R − b₁^R),   (15)
where c = (a − a₁)/(a₂ − a₁).

Proof. Let us prove the case that B₁ ≼ B₂. Rewriting the first expression of (10) for the inserted rule and substituting (14) and a^R_α = a^R R_A^{-1}(α), we have

b − b^L_α = (b₁ − b₁^L L_B^{-1}(α)) + [(a − a₁) + (a^R − a₁^R) R_A^{-1}(α)] / [(a₂ − a₁) + (a₂^R − a₁^R) R_A^{-1}(α)] × {(b₂ − b₂^L L_B^{-1}(α)) − (b₁ − b₁^L L_B^{-1}(α))}.   (16)

(⇒) Substituting the assumption that B is an L_B–R_B fuzzy interval, i.e. b^L_α = b^L L_B^{-1}(α), into (16) and cross-multiplying, we obtain

{(a₂ − a₁) + (a₂^R − a₁^R) R_A^{-1}(α)} {(b − b₁) − (b^L − b₁^L) L_B^{-1}(α)} = {(a − a₁) + (a^R − a₁^R) R_A^{-1}(α)} {(b₂ − b₁) − (b₂^L − b₁^L) L_B^{-1}(α)}.

Considering the condition under which this equality holds for arbitrary reference functions R_A and L_B, we get the following four equations:

(a₂^R − a₁^R)(b^L − b₁^L) = (a^R − a₁^R)(b₂^L − b₁^L),   (17)
(a₂ − a₁)(b^L − b₁^L) = (a − a₁)(b₂^L − b₁^L),   (18)
(a₂ − a₁)(b − b₁) = (a − a₁)(b₂ − b₁),   (19)
(a₂^R − a₁^R)(b − b₁) = (a^R − a₁^R)(b₂ − b₁).

From (18) and (19) we have

b^L = b₁^L + c(b₂^L − b₁^L)  and  b = b₁ + c(b₂ − b₁),   (20)

respectively. Substituting (20) into (17), we obtain

a^R = a₁^R + [(b − b₁)/(b₂ − b₁)](a₂^R − a₁^R) = a₁^R + c(a₂^R − a₁^R).

Moreover, the conditions on a^L and b^R in (15) follow from the second expression of (10) in the same way.

(⇐) Substituting (15) into (16), the left-hand side of (16) minus (b₁ − b₁^L L_B^{-1}(α)) becomes

c(b₂ − b₁) − b^L_α + b₁^L L_B^{-1}(α),

while the interpolation ratio in (16) becomes

[c(a₂ − a₁) + c(a₂^R − a₁^R) R_A^{-1}(α)] / [(a₂ − a₁) + (a₂^R − a₁^R) R_A^{-1}(α)] = c.

Thus, substituting these expressions into (16) and solving it for b^L_α, we obtain

b^L_α = {b₁^L + c(b₂^L − b₁^L)} L_B^{-1}(α) = b^L L_B^{-1}(α).

Moreover, b^R_α = b^R R_B^{-1}(α) is obtained from the second expression of (10). Therefore, the conclusion B is an L_B–R_B fuzzy interval.   Q.E.D.
We have proven the above theorem for the case that B₁ ≼ B₂. In the same manner, we can prove it for the case that B₂ ≼ B₁ by using (11) instead of (10). Theorem 1 makes it possible to insert an LR rule at any point a ∈ [a₁, a₂] ⊂ X between two rules A₁ ⇒ B₁ and A₂ ⇒ B₂. Fig. 6.6 demonstrates a fuzzy rule A ⇒ B as the Cartesian product of A and B in the case that A and B are triangular fuzzy intervals, i.e.
Fig. 6.6  A fuzzy rule R = A × B interpolated at a point x = a.
L_A(x) = R_A(x) = 1 − x,  L_B(y) = R_B(y) = 1 − y;
L_A^{-1}(α) = R_A^{-1}(α) = L_B^{-1}(α) = R_B^{-1}(α) = 1 − α.   (21)
Here, it should be noted that the above theorem holds for both linear rule interpolation methods, the KH-method and the convex hull method. Furthermore, the theorem coincides with the result of Shi et al. [18], [20] when the LR fuzzy intervals reduce to triangular fuzzy intervals.
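The conditions (15) of Theorem 1 say that every LR parameter of the inserted rule is the same convex combination, with coefficient c = (a − a₁)/(a₂ − a₁), of the corresponding parameters of the two given rules. A sketch with our own function names:

```python
def interpolate_lr_rule(rule1, rule2, a):
    """Insert a rule at antecedent mean `a` by Eq. (15).
    Each rule is ((mean, left_spread, right_spread) antecedent,
                  (mean, left_spread, right_spread) consequent)."""
    (a1, a1L, a1R), (b1, b1L, b1R) = rule1
    (a2, a2L, a2R), (b2, b2L, b2R) = rule2
    c = (a - a1) / (a2 - a1)
    mix = lambda p, q: p + c * (q - p)
    A = (a, mix(a1L, a2L), mix(a1R, a2R))
    B = (mix(b1, b2), mix(b1L, b2L), mix(b1R, b2R))
    return A, B

A, B = interpolate_lr_rule(((0, 1, 1), (10, 2, 2)),
                           ((4, 1, 3), (20, 2, 4)), a=1.0)
```

At a = 1 the coefficient is c = 0.25, so every parameter moves a quarter of the way from rule 1 to rule 2.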
6.4 Nonlinear Rule Interpolation by Fuzzy Spline
The purpose of this section is to extend the method of linear rule interpolation newly proposed in the previous section to a nonlinear one based on fuzzy splines. Saga et al. [16] fuzzified parametric spline interpolation in order to identify freehand curves on the X–Y plane. The authors, on the other hand, apply the basic idea of Saga's method to nonparametric spline interpolation in order to describe the characteristics of input–output systems.

6.4.1 Nonparametric Spline Interpolation
Given N pairs of data (x_i, y_i), i = 0, ..., N−1, such that x_i < x_j for i < j, a spline curve [6] of degree K−1 is represented as a linear combination of B-splines B_{iK}(x) of degree K−1:

s(x) = Σ_{i=0}^{N−1} a_i B_{iK}(x).   (22)
The B-splines B_{iK}(x) are obtained by the de Boor–Cox algorithm [5] from N + K knots ξ_k (k = 0, 1, ..., N+K−1) on the x-axis as
B_{i1}(x) = 1   (ξ_i ≤ x < ξ_{i+1}),
B_{i1}(x) = 0   (x < ξ_i, x ≥ ξ_{i+1}),   (23)

B_{iK}(x) = [(x − ξ_i)/(ξ_{i+K−1} − ξ_i)] B_{i,K−1}(x) + [(ξ_{i+K} − x)/(ξ_{i+K} − ξ_{i+1})] B_{i+1,K−1}(x).   (24)
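The de Boor–Cox recursion (23)–(24) can be transcribed directly; the function name is ours. On uniform knots the degree-3 basis functions sum to one over the interior span, which gives a quick sanity check:

```python
def bspline(i, K, x, knots):
    """B-spline basis B_{i,K}(x) of degree K-1 over the knot vector `knots`,
    computed by the de Boor-Cox recursion."""
    if K == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + K - 1] != knots[i]:
        left = (x - knots[i]) / (knots[i + K - 1] - knots[i]) * bspline(i, K - 1, x, knots)
    if knots[i + K] != knots[i + 1]:
        right = (knots[i + K] - x) / (knots[i + K] - knots[i + 1]) * bspline(i + 1, K - 1, x, knots)
    return left + right

knots = [0, 1, 2, 3, 4, 5, 6, 7]        # N + K knots with N = 4, K = 4
# Cubic (K = 4) B-splines over uniform knots sum to 1 on the interior span [3, 4):
total = sum(bspline(i, 4, 3.5, knots) for i in range(4))
```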
The coefficients a_i in (22) are obtained as the solution of the system of linear equations s(x_i) = y_i (i = 0, ..., N−1) when the knots ξ_k satisfy Schoenberg–Whitney's condition ξ_i < x_i < ξ_{i+K}. The spline curve s(x) is a piecewise polynomial which is smoothly connected at each knot; here, the term "smoothly connected" means that the derivatives of s(x) of order 1, 2, ..., K−2 (K ≥ 3) are continuous.

6.4.2 Spline Interpolation with LR Fuzzy Rules
Now, the authors propose a method to interpolate a rule in a sparse rule base R = {R₁, R₂, ..., R_r} and to construct a fuzzy spline curve for R. Let us consider conventional spline functions, in the sense of (22): f(x) for the data (a_i, b_i), w^L_α(x) and w^R_α(x) for (a_i, a^L_{iα}) and (a_i, a^R_{iα}), and v^L_α(x) and v^R_α(x) for (a_i, b^L_{iα}) and (a_i, b^R_{iα}), respectively. Then, we can interpolate a new rule R(x̄) = (A(x̄) ⇒ B(x̄)) at an arbitrary x̄ in the input space X as follows:
A(x̄) = ⋃_{α∈[0,1]} α [x̄ − w^L_α(x̄), x̄ + w^R_α(x̄)],   (25)

B(x̄) = ⋃_{α∈[0,1]} α [f(x̄) − v^L_α(x̄), f(x̄) + v^R_α(x̄)].   (26)
It should be noted that A(x̄) ∈ F(X) and B(x̄) ∈ F(Y). According to the previous section, we regard a fuzzy inference rule R_i as the Cartesian product of A_i and B_i, and define a fuzzy spline curve for a fuzzy rule base R = {R₁, R₂, ..., R_r} as a fuzzy graph:

G = ⋃_{x∈X} A(x) × B(x),   (27)

i.e.

G_α = ⋃_{x∈X} A_α(x) × B_α(x) = ⋃_{x∈X} [x − w^L_α(x), x + w^R_α(x)] × [f(x) − v^L_α(x), f(x) + v^R_α(x)].   (28)

Here, G is a fuzzy relation on X × Y and is represented by a membership function μ_G : X × Y → [0, 1]. Next, in order to execute the fuzzy spline, we simplify it by using the LR representation of a fuzzy interval as shown in (6) and (7). When A_i and B_i are LR fuzzy intervals A_i = (a_i, a_i^L, a_i^R)_{L_A R_A}
and B_i = (b_i, b_i^L, b_i^R)_{L_B R_B}, respectively, for each i ∈ {1, ..., r}, we need to evaluate the following five spline functions: f(x), w^L(x), w^R(x), v^L(x) and v^R(x), for the data (a_i, b_i), (a_i, a_i^L), (a_i, a_i^R), (a_i, b_i^L) and (a_i, b_i^R), respectively.
Therefore, eqs. (25), (26) and (28) are rewritten as follows:

A(x̄) = (x̄, w^L(x̄), w^R(x̄)) = ⋃_{α∈[0,1]} α [x̄ − L_A^{-1}(α) w^L(x̄), x̄ + R_A^{-1}(α) w^R(x̄)],   (25)′

B(x̄) = (f(x̄), v^L(x̄), v^R(x̄)) = ⋃_{α∈[0,1]} α [f(x̄) − L_B^{-1}(α) v^L(x̄), f(x̄) + R_B^{-1}(α) v^R(x̄)],   (26)′

G_α ≈ ⋃_{x∈X} {[x − L_A^{-1}(α) w^L(x), x + R_A^{-1}(α) w^R(x)] × [f(x) − L_B^{-1}(α) v^L(x), f(x) + R_B^{-1}(α) v^R(x)]}.   (28)′
Fig. 6.7 demonstrates a fuzzy rule R(x̄) as the Cartesian product of A(x̄) and B(x̄) in the case that A(x̄) and B(x̄) are triangular fuzzy intervals, i.e. eq. (21) holds.
Fig. 6.7  A fuzzy rule R(x̄) = A(x̄) × B(x̄) interpolated at a point x = x̄.
6.5 Interpolative Reasoning Using Fuzzy Interpolation Function
6.5.1 Fuzzy Partition
Next, we describe a method of interpolative reasoning for an arbitrary input A* ∈ F(X). Recall that G is a fuzzy relation on X × Y. Therefore, we can derive the conclusion B* ∈ F(Y) of the inference for a given observation A* ∈ F(X) from G as the relational composition of A* and G, i.e.

B* = A* ∘ G,   (29)

μ_{B*}(y) = sup_{x∈X} min{μ_{A*}(x), μ_G(x, y)}.   (30)
Since μ_G(x, y) cannot be expressed in the explicit form of a function of x and y, we cannot evaluate (30) directly. Nevertheless, an approximate conclusion B* can be obtained by generating a fuzzy partition [9] A(x̄₁), ..., A(x̄_p) covering supp A*, and then by applying a conventional fuzzy reasoning method, e.g.

B* = A* ∘ ⋃_{j=1}^{p} A(x̄_j) × B(x̄_j).   (31)
Here, A(x̄_j) and B(x̄_j) are LR fuzzy intervals interpolated at the point x = x̄_j in the input space X and the output space Y, respectively:

A(x̄_j) = (x̄_j, w^L(x̄_j), w^R(x̄_j)),   (32)

B(x̄_j) = (f(x̄_j), v^L(x̄_j), v^R(x̄_j)).   (33)

We can generate a new rule base R′ = {R(x̄_j) = A(x̄_j) ⇒ B(x̄_j) | j = 1, ..., p} through the fuzzy spline curve G on the condition that the mean value of A(x̄_j) coincides with the right-side value of supp A(x̄_{j−1}), i.e. x̄_j = x̄_{j−1} + w^R(x̄_{j−1}), as the following algorithm illustrates.
<Algorithm>
Step 1.  j := 1;  x̄₁ := a₁;
Step 2.  Repeat Step 3 until x̄_j > a_r − a_r^R + w^R(x̄_j);
Step 3.  j := j + 1;  x̄_j := x̄_{j−1} + w^R(x̄_{j−1});
Step 4.  p := j + 1;  x̄_p := a_r;
Step 5.  For j := 1 to p do Step 6;
Step 6.  Output j, x̄_j, w^L(x̄_j), w^R(x̄_j), f(x̄_j), v^L(x̄_j), v^R(x̄_j).
It is clear that the set of antecedents {A(x̄₁), A(x̄₂), ..., A(x̄_p)} of R′ forms a fuzzy partition of the input space X in the sense of Dubois et al. [9]:
(i) the supports of A(x̄₁), A(x̄₂), ..., A(x̄_p) form a coverage of X;
(ii) the cores of A(x̄₁), A(x̄₂), ..., A(x̄_p) are pairwise disjoint;
(iii) A(x̄₁), A(x̄₂), ..., A(x̄_p) are ordered, i.e. x̄_i − w^L_α(x̄_i) > x̄_j + w^R_α(x̄_j) for ∀i > j.
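The partition-generation algorithm amounts to a simple marching loop: each new interpolation point is placed at the right edge of the previous rule's support until the right end a_r of the input range is reached. A sketch with a constant spread standing in for the spread spline w^R(x) (names are ours):

```python
def generate_partition(a1, ar, wR=lambda x: 0.5):
    """March interpolation points from a1 to ar, stepping by the right
    spread of the previous point, then close the partition at ar."""
    xs = [a1]
    while xs[-1] + wR(xs[-1]) < ar:
        xs.append(xs[-1] + wR(xs[-1]))
    xs.append(ar)
    return xs

xs = generate_partition(0.0, 3.0)
```

With a constant spread of 0.5 on [0, 3] this produces seven interpolation points, with the last one pinned to the right end of the range.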
6.5.2 Numerical Examples
Fig.6.8 and Fig.6.9 demonstrate numerical examples of linear rule interpolation and nonlinear rule interpolation by fuzzy spline, respectively.
Fig. 6.8(a) and Fig. 6.9(a) show the supports of the given sparse rule base (r = 6). In both cases, A_i and B_i (i = 1, 2, ..., 6) are triangular fuzzy intervals, as shown in Fig. 6.6 and Fig. 6.7, and symmetric. Fig. 6.8(b) and Fig. 6.9(b) illustrate fuzzy interpolation functions represented by 50 rules, obtained by linear interpolation and spline interpolation, respectively. Fig. 6.8(c) and Fig. 6.9(c) show the fuzzy partitions, each consisting of 17 rules, generated by the algorithm described in the previous subsection. It should be noted that both partitions cover the region between A₁ and A₆ in the input space.
6.6 Concluding Remarks
This chapter has presented the fundamental idea of fuzzy interpolative reasoning in which a fuzzy partition is generated through an interpolation function of the given rules. In particular, the authors have introduced a nonlinear rule interpolation method by means of spline functions, in addition to the linear method based on the convex hull. Our method makes it possible to apply ordinary approximate reasoning methods to sparse fuzzy rule bases. As the next stage of this approach, our method for the one-input-one-output case should be extended to multiple inputs by using multivariate splines [4]. It is also an important problem, from the practical point of view, to apply the revision principle [7] to our rule interpolation technique instead of constructing a fuzzy partition, as Baranyi et al. [2] have suggested in their work.
Fig. 6.8  Linear rule interpolation: (a) given rules (r = 6); (b) a fuzzy linear interpolation function represented by 50 rules; (c) a fuzzy partition generated through the fuzzy linear function.
Fig. 6.9  Rule interpolation by fuzzy spline (K = 4): (a) given rules (r = 6); (b) a fuzzy interpolation curve represented by 50 rules; (c) a fuzzy partition generated through the fuzzy curve.
References
[1] Peter Baranyi, Tamas D. Gedeon, Laszlo T. Koczy, "A General Interpolation Technique in Fuzzy Rule Bases with Arbitrary Membership Functions," Proceedings of the International Conference on Systems, Man and Cybernetics, Beijing, pp. 510–515, 1996.
[2] Peter Baranyi, Sandor Mizik, Laszlo T. Koczy, Tamas D. Gedeon, Istvan Nagy, "Fuzzy Rule Base Interpolation Based on Semantic Revision," Proceedings of the International Conference on Systems, Man and Cybernetics, San Diego, 1998.
[3] Peter Baranyi, Yeung Yam, Chi-Tin Yang, "SVD Reduction in Numerical Algorithms: Specialized to B-Spline and to Fuzzy Logic Concepts," Proceedings of the 8th IFSA World Congress (IFSA'99), Taipei, pp. 782–786, 1999.
[4] Charles K. Chui, Multivariate Splines, SIAM, 1988.
[5] Carl de Boor, "On Calculating with B-Splines," Journal of Approximation Theory, 6, pp. 50–62, 1972.
[6] Carl de Boor, A Practical Guide to Splines, Springer-Verlag, 1978.
[7] Liya Ding, Peizhuang Wang, "Revision Principle Applied for Approximate Reasoning," in Methodologies for the Conception, Design and Application of Soft Computing (Proceedings of IIZUKA'98) (Eds. Takeshi Yamakawa and Gen Matsumoto), World Scientific, pp. 408–413, 1998.
[8] Didier Dubois, Henri Prade, "Operations on Fuzzy Numbers," International Journal of Systems Science, 9, pp. 613–626, 1978.
[9] Didier Dubois, Henri Prade, Michel Grabisch, "Gradual Rules and the Approximation of Control Laws," in Theoretical Aspects of Fuzzy Control (Eds. Hung T. Nguyen et al.), John Wiley & Sons, pp. 147–181, 1995.
[10] Wen-Hoar Hsiao, Shyi-Ming Chen, Chia-Hoan Lee, "A New Interpolative Reasoning Method in Sparse Rule-Based Systems," Fuzzy Sets and Systems, 93, pp. 17–22, 1998.
[11] Mayuka F. Kawaguchi, Masaaki Miyakoshi, Michiaki Kawaguchi, "Linear Interpolation with Triangular Rules in Sparse Fuzzy Rule Bases," Proceedings of the 7th IFSA World Congress (IFSA'97), Prague, II, pp. 138–143, 1997.
[12] Laszlo T. Koczy, Kaoru Hirota, "Interpolative Reasoning with Insufficient Evidence in Sparse Fuzzy Rule Bases," Information Sciences, 71, pp. 169–201, 1993.
[13] Laszlo T. Koczy, Kaoru Hirota, "Approximate Reasoning by Linear Rule Interpolation and General Approximation," International Journal of Approximate Reasoning, 9, pp. 197–225, 1993.
[14] Laszlo T. Koczy, Szilveszter Kovacs, "Linearity and the cnf Property in Linear Fuzzy Rule Interpolation," Proceedings of the 3rd IEEE International Conference on Fuzzy Systems, Orlando, USA, pp. 870–875, 1994.
[15] Laszlo T. Koczy, Kaoru Hirota, "Size Reduction by Interpolation in Fuzzy Rule Bases," IEEE Transactions on Systems, Man and Cybernetics, Part B, 27, pp. 14–25, 1997.
[16] Sato Saga, Hiromi Makino, "Fuzzy Spline Interpolation and its Application to On-line Freehand Curve Identification," Proceedings of the 2nd International Conference on Fuzzy Systems (FUZZ-IEEE'93), San Francisco, pp. 1183–1190, 1993.
[17] Elie Sanchez, "Resolution of Composite Fuzzy Relation Equations," Information and Control, 30, pp. 38–48, 1976.
[18] Yan Shi, Masaharu Mizumoto, Zhi Qiao Wu, "Reasoning Conditions on Koczy's Interpolative Reasoning Method in Sparse Fuzzy Rule Bases," Fuzzy Sets and Systems, 75, pp. 63–71, 1995.
[19] Yan Shi, Masaharu Mizumoto, "Reasoning Conditions on Koczy's Interpolative Reasoning Method in Sparse Fuzzy Rule Bases. Part II," Fuzzy Sets and Systems, 87, pp. 47–56, 1997.
[20] Yan Shi, Masaharu Mizumoto, "A Note on Reasoning Conditions of Koczy's Interpolative Reasoning Method," Fuzzy Sets and Systems, 96, pp. 373–379, 1998.
[21] Liang Wang, Reza Langari, John Yen, "Principal Components, B-Splines, and Fuzzy System Reduction," in Fuzzy Logic for the Application to Complex Systems (Eds. W. Chiang and J. Lee), World Scientific, pp. 253–259, 1996.
Chapter 7 Revision Principle Applied for Approximate Reasoning
Liya Ding¹, Peizhuang Wang², Masao Mukaidono³
¹ National University of Singapore, Singapore
² West Texas A&M University, USA
³ Meiji University, Japan
Abstract: The basic concept of the revision principle proposed for approximate reasoning is that the modification (revision) of the consequent is decided by the difference (deviation) between the input (given fact) and the antecedent, and that the revising process is based on some kind of relation between the antecedent and the consequent. Five revising methods based on linear and semantic relations have been introduced for approximate reasoning. As a continuation of that work, this article discusses the revision principle applied to approximate reasoning with multiple fuzzy rules that contain multiple sub-antecedents. An approximation measure is proposed for the integration of revision. With a generalized approximation measure, the revision principle can be applied to more general cases of fuzzy sets.

Keywords: approximate reasoning, revision principle, linear revising methods, semantic revising methods, semantic approximation, approximation measure
7.1 Introduction
When a rule P -> Q and a fact P' that is only an approximation of P are given, a conclusion will still be expected, even on an approximate basis. Here the propositions P and Q are regarded as fuzzy concepts, and the fuzzy concepts are described by fuzzy sets [28; 29]. Based on the concept of approximate reasoning, the inference can be done even when P and P' are not identical. Approximate reasoning was put forward by Zadeh
[30; 31], where linguistic truth values such as very true can be used. Unlike symbolic reasoning based on binary logic, approximate reasoning is related to the semantics of propositions to a certain degree. Compositional inference and compatibility modification inference are two main approaches to approximate reasoning [1; 2; 7; 13; 17; 18; 23; 27; 29; 30; 31]. The former realizes inference by obtaining an implication relation between the antecedent and consequent of a rule and then composing the input with the relation [29]. The latter realizes inference by determining the measure of satisfaction between the input and the antecedent of a rule and then using that measure to modify the rule's consequent [13]. The revision principle [8; 9; 10; 19; 21; 22] was proposed in a different way. It is based on the belief that the modification (revision) of the consequent should be caused only by the difference (deviation) between the input (given fact) and the antecedent. In other words, when a method of the revision principle is used for approximate reasoning, the consequent will always be derived as output if the input is the same as the antecedent: Q' = Q when P' = P. This important feature is called the non-deviation property, and it is satisfied by all methods of the revision principle [12]. The revising process is based on some kind of relation between the antecedent and the consequent. For a given fuzzy rule P -> Q, it is almost impossible to describe precisely the nonlinear relation R_{P->Q} between P ⊆ X and Q ⊆ Y. As an alternative, a relation matrix is often used as an approximate description. However, even with only a finite number of points of P and Q taken into consideration, the relation matrix may still be too large to use for inference. So the essential thought of the revision principle is to find a way which is simple to calculate but has acceptable accuracy.
Instead of the intact relationship R_{P->Q}, which is usually hard to get, a simplified relation between P and Q is used in the revision principle. We select some representative points ⟨x, μ_P(x)⟩, x ∈ X, by certain methods, and for each of them we determine only one corresponding point ⟨y, μ_Q(y)⟩, y ∈ Y, based on given relational factor(s), to make a relational pair (x, y). The collection of all the relational pairs then forms a simplified relation R_{P,Q} between P and Q.
A similar relation between P' and Q' can also be defined, where P' and Q' are the given input and the possible conclusion of approximate reasoning. When a rule P -> Q and a fact P', an approximation of P,
are given, the task then becomes to deduce the approximate conclusion Q' based on

Q' = f_R(Q, P, P')    (1)
where f_R is a revising function based on the relation R_{P,Q} between P and Q. When a different R_{P,Q} is selected, we may have a different revising method and a corresponding approach to keep the R_{P,Q} between P' and Q'. If a revising method can keep R_{P',Q'} = R_{P,Q} in any case, then it is said that the method has the relation keeping property [12; 22]. Following this idea, linear revising methods [8; 9; 10; 19] and semantic revising methods [21; 22] were proposed. Linear revising methods are the first set of methods developed for the revision principle. Using linear revising methods, the conclusion Q' will be calculated linearly by

Q' = Q + ΔQ    (2)

ΔQ = f_L(Q, P, ΔP)    (3)

where f_L is a linear function based on the fixed-point law, the fixed-value law or the value-point law.
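The skeleton of (2)-(3), a consequent plus a revision driven by the input's deviation, can be sketched on sampled membership values. This is a minimal illustration with assumed names, not the book's implementation; the clipping to [0, 1] anticipates the laws given below:

```python
def linear_revise(mu_Q, delta_mu_Q):
    # Q' = Q + dQ (eq. 2), with revised membership values clipped to [0, 1]
    return [min(1.0, max(0.0, q + d)) for q, d in zip(mu_Q, delta_mu_Q)]

mu_Q = [0.0, 0.5, 1.0, 0.5, 0.0]

# Non-deviation property: when P' = P the deviation is zero, so Q' = Q.
assert linear_revise(mu_Q, [0.0] * 5) == mu_Q
```

Note that any concrete f_L must reduce to a zero ΔQ when ΔP = 0, which is exactly the non-deviation property discussed above.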
The method based on the fixed semantics law was proposed as the first semantic revising method of the revision principle (it has been named SRMI to distinguish it from SRMII). Its basic idea rests on the so-called semantics of a rule, which comes from P.Z. Wang's falling shadow theory [25]. When P -> Q and a semantic approximation [21] P' of P are given, Q' is calculated by using the semantics of P -> Q with the fixed semantics law. The SRMII was proposed later [22]. Its basic idea is similar to SRMI, but the fixed interrelation law is used for inference instead of the fixed semantics law. In the authors' early work, the proposed revising methods were described only for a single rule with a single antecedent. In [11], the revision principle was applied with multiple rules through a neural network implementation. This article will extend the discussion to multiple rules which may contain multiple sub-antecedents by introducing an approximation measure. The approximation measure is based on a distance between fuzzy sets. It offers a useful feature: for fuzzy sets A, B ⊆ X, am(A, B), an approximation measure of A and B, is not necessarily 0 when A ∩ B = ∅. Furthermore, the value of am(A, B) is dependent on |X|, the size of X. This
gives a flexibility to determine am(A, B) based on the needs of the application. When a fuzzy rule is determined to fire, it is necessary to require a certain compatibility between the input and the antecedent. Much work has been done on compatibility and similarity measures, such as [6; 13; 24]. The proposed approximation measure can also serve this purpose. For simplicity, in [21; 22] we discussed only special cases where an input is a semantic approximation of the antecedent. In this article we will present how that condition can be relaxed by using a normalized approximation. Koczy and his colleague proposed a general revision principle method as a way between the revision principle and the rule interpolation techniques [3] and used a so-called normalization of the support of a fuzzy set (supp-norm). Adopting the idea of supp-norm, we introduce normalized approximation and extended semantic approximation, which offer a possibility to apply the semantic revising methods to fuzzy sets with arbitrary support and position. A generalized approximation measure is then proposed to deal with the normalized approximation of a fuzzy set. The rest of this article is arranged as follows: the basic concepts and revising methods of the revision principle are briefly explained in section 2; section 3 introduces an approximation measure and its extended definitions and discusses their properties; the application of the revision principle with multiple antecedents and multiple rules is presented in section 4; section 5 gives the summary.
7.2 Revision Principle
In this section, we briefly review the methods of the revision principle to provide the reader with a basis of understanding. It is assumed that P and P' are defined by fuzzy sets on the universe of discourse X as

P = ∫_{x∈X} μ_P(x)/x,    P' = ∫_{x∈X} μ_{P'}(x)/x

and Q is defined as a fuzzy set on the universe of discourse Y as

Q = ∫_{y∈Y} μ_Q(y)/y
where μ_P(x) is the membership function of P and ∫ denotes the union of all μ_P(x)/x for x over the universe of discourse X. Notations for P' and Q are similarly defined. For simplicity, the fuzzy sets under discussion in this section are assumed to be convex and normalized. The universes of discourse X and Y are real-number intervals. The application to more general cases will be presented in section 4 of this article.
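Such convex, normalized fuzzy sets can be represented concretely by their membership functions; a minimal sketch, where the triangular helper `tri` and its parameters are illustrative assumptions rather than the chapter's notation:

```python
def tri(l, m, r):
    """Membership function of a convex, normalized triangular fuzzy set (l, m, r)."""
    def mu(x):
        if l < x < m:
            return (x - l) / (m - l)   # non-decreasing part
        if x == m:
            return 1.0                 # full membership point
        if m < x < r:
            return (r - x) / (r - m)   # non-increasing part
        return 0.0
    return mu

mu_P = tri(0.0, 1.0, 2.0)   # e.g. the set denoted (0, 1, 2) in Sec. 7.4.3
```

A triangular set is thus fully determined by three representative points, which is what makes the point-based simplified relations below computationally cheap.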
7.2.1 Linear Revising Methods

7.2.1.1 Relational Factors in Linear Revising Methods
When the revision principle is applied for approximate reasoning, as mentioned earlier, a simplified relation between the antecedent and the consequent of a rule will be used. In order to get a reasonable conclusion for some applications, it is important to have an appropriate R_{P,Q}. Two relational factors have been suggested to determine R_{P,Q} in linear revising methods: the corresponding relation and the direction of change [22; 12].

Corresponding Relation. The corresponding relationship between the P and the Q of a rule P -> Q is found in different ways for different linear revising methods. In the fixed-value law and the value-point law, a relational pair (x, y) is decided based on their membership values: μ_P(x) = μ_Q(y). In the fixed-point law, a relational pair is decided based on a certain relation between their positions on the universes of discourse: y = f(x). When we fix a value v = μ_P(x), 0 < v < 1, for x ∈ X, there will be μ_Q(y_1) = μ_Q(y_2) = μ_P(x), where y_1 ≠ y_2. Figure 7.1 shows an example, where the corresponding point of x in the part AB of P will be either y_1 in the part ab or y_2 in the part ac of Q. In other words, there are two corresponding relationships (AB -> ab, x -> y_1) and (AB -> ac, x -> y_2). The former is called positive inference and the latter is called negative inference. That is, when

dμ_P(x)/dx × dμ_Q(y)/dy > 0

for μ_P(x) = μ_Q(y), μ'_P(x) ≠ 0 and μ'_Q(y) ≠ 0, it is positive inference; otherwise it is negative inference. This idea is directly used with the fixed-value law and the value-point law, where a relational pair ⟨x, y⟩ is found for μ_P(x) = μ_Q(y).

[Fig. 7.1: The relation of corresponding points]

A similar idea is used with the fixed-point law to decide y ∈ [y_l, y_r] = Y, the corresponding point of x ∈ [x_l, x_r] = X, by a unification function:

y = U(x) = α[(x - x_l) ÷ (x_r - x_l)] × (y_r - y_l) + y_l    (4)

where α is a correspondence operator defined as:

α(r) = { r        for positive inference
       { 1 - r    for negative inference    (5)

In the unification function, it is possible to use

x_{sl} = min[inf(supp(P)), inf(supp(P'))]
x_{sr} = max[sup(supp(P)), sup(supp(P'))]

as the left and right points instead of x_l and x_r, where supp(.) is the support of a fuzzy set [32; 15], and inf(.) and sup(.) denote the infimum and the supremum [15] of a set. We can also estimate supp(Q') and then similarly get y_{sl} and y_{sr} to be used in the unification function instead of y_l and y_r.

Direction of Change. The relational factor direction of change determines how a consequent can be revised once an amount of revision has been calculated. For instance, assume the rule 'if P is small then Q is large' and the fact 'P' is very small' is given; there can be different semantic viewpoints for deducing Q'. One is the understanding that 'the smaller P is, the larger Q is'. The other is that 'the smaller P is, the smaller Q is'. The former is called inverse inference (-), where the direction of change from Q to Q' is the inverse of the change from P to P'. The latter is called compliance inference (+), where the direction of change from Q to Q' is the same as that from P to P' (Figure 7.2).
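The unification function (4) with the correspondence operator (5) can be sketched as follows; parameter names are assumptions, and `positive=False` selects the negative-inference correspondence:

```python
def unify(x, xl, xr, yl, yr, positive=True):
    """Unification function (4): map x in [xl, xr] to its corresponding point
    y in [yl, yr]. The correspondence operator (5) keeps the relative
    position r for positive inference and reverses it (1 - r) for negative
    inference."""
    r = (x - xl) / (xr - xl)
    a = r if positive else 1.0 - r
    return a * (yr - yl) + yl
```

For example, a point one quarter of the way into X maps one quarter of the way into Y under positive inference, and three quarters of the way into Y under negative inference.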
[Fig. 7.2: Direction of change]

[Fig. 7.3: Fixed-point law]
7.2.1.2 Linear Revising with Fixed-Point Law (LFP)
The basic idea here is to fix a point x in X = [x_l, x_r], the universe of discourse of P, and to get a corresponding point y in Y = [y_l, y_r], the universe of discourse of Q, by the unification function U(x) as given in (4). The deviation between P' and P will be captured by the difference of the values of the membership functions μ_P(x) and μ_{P'}(x) at the fixed point x. Then an approximate μ_{Q'}(y) will be deduced from the deviation μ_{P'}(x) - μ_P(x) as well as μ_Q(y).

Formula 1 (Linear Revising Method with the Fixed-Point Law):

I. Deviation from antecedent

Δμ_P(x) = μ_{P'}(x) - μ_P(x)    (6)

II. Revision to consequent

Δμ_Q(y) = { 0                                          Δμ_P(x) = 0
          { [μ_Q(y) ÷ μ_P(x)] × Δμ_P(x)                Δμ_P(x) < 0
          { [1 - μ_Q(y)] ÷ [1 - μ_P(x)] × Δμ_P(x)      Δμ_P(x) > 0    (7)

III. Revised membership function of consequent
[Fig. 7.4: Fixed-value law]
μ_{Q'}(y) = { 0                      μ_Q(y) ± Δμ_Q(y) < 0
            { μ_Q(y) ± Δμ_Q(y)       0 ≤ μ_Q(y) ± Δμ_Q(y) ≤ 1
            { 1                      μ_Q(y) ± Δμ_Q(y) > 1    (8)

IV. Approximate consequent

Q' = ∫ μ_{Q'}(y)/y    (9)

where (±) means that for compliance inference '+' shall be used and for inverse inference '-' shall be used.

7.2.1.3 Linear Revising with Fixed-Value Law (LFV)
Different from the LFP, the basic idea here is to fix a value v ∈ [0, 1] such that the membership functions μ_P(x) = μ_{P'}(x') = μ_Q(y) = v (x, x' ∈ X, y ∈ Y), to find a shift Δx = x' - x on the universe of discourse X, and then by this shift to determine another shift Δy from the point y to y' for μ_{Q'}(y') = μ_Q(y). Here x' is called the deviative point of x for the given P and P', and it satisfies:

dμ_P(x)/dx × dμ_{P'}(x')/dx' > 0   or   dμ_P(x)/dx = 0 and dμ_{P'}(x')/dx' = 0

and y is the corresponding point of x, satisfying:

dμ_P(x)/dx × dμ_Q(y)/dy > 0 (< 0)   or   dμ_P(x)/dx = 0 and dμ_Q(y)/dy = 0

where the positive product is for positive inference and the negative product for negative inference. Letting μ_{Q'}(y') = μ_Q(y), the result can be deduced (Figure 7.4).
Formula 2 (Linear Revising Method with the Fixed-Value Law): The universes of discourse are X = [x_m, x_M] for P and Y = [y_m, y_M] for Q, respectively. The support of P is supp(P) = (x_1, x_2) ⊂ X, and the support of Q is supp(Q) = (y_1, y_2) ⊂ Y.
I. Revision to consequent

(a) Boundary dependent

(a-1) for positive inference (+)

Δy = F(x, y, Δx) = { (x' - x) × (y_M - y) ÷ (x_M - x)    x' ∈ [x, x_M]
                   { (x' - x) × (y - y_m) ÷ (x - x_m)    x' ∈ [x_m, x)    (10)

(a-2) for negative inference (-)

Δy = F(x, y, Δx) = { (x - x') × (y - y_m) ÷ (x_M - x)    x' ∈ [x, x_M]
                   { (x - x') × (y_M - y) ÷ (x - x_m)    x' ∈ [x_m, x)    (11)

(b) Boundary independent

Δy = F(x, y, Δx) = { max{y_m, y + (±)(x' - x) × (y_2 - y_1) ÷ (x_2 - x_1)} - y    (±)(x' - x) < 0
                   { min{y_M, y + (±)(x' - x) × (y_2 - y_1) ÷ (x_2 - x_1)} - y    (±)(x' - x) > 0    (12)

where y is the corresponding point of x.

II. Approximate consequent

Q' = ∫ μ_Q(y)/(y + Δy)    (13)
where (±) means that for compliance inference the sign '+' shall be used, and for inverse inference the sign '-' shall be used.

7.2.1.4 Linear Revising with Value-Point Law (LVP)
The value-point law is a combination of the fixed-point law and the fixed-value law. It fixes a value μ_P(x) = v ∈ [0, 1] for x ∈ X to get a corresponding point y ∈ Y which satisfies μ_Q(y) = μ_P(x) and

dμ_P(x)/dx × dμ_Q(y)/dy > 0   or   dμ_P(x)/dx = 0 and dμ_Q(y)/dy = 0

where the positive product corresponds to positive inference and the negative product to negative inference. An approximate μ_{Q'}(y) will be deduced linearly from Δμ_P(x) = μ_{P'}(x) - μ_P(x), the deviation between the membership functions μ_P(x) and μ_{P'}(x) at the point x, and μ_Q(y) (Figure 7.5).

Formula 3 (Linear Revising Method with the Value-Point Law):

I. Revision to consequent

Δμ_Q(y) = Δμ_P(x) = μ_{P'}(x) - μ_P(x)    (14)
[Fig. 7.5: Value-point law]
II. Revised membership function of consequent

μ_{Q'}(y) = { 0                      μ_Q(y) ± Δμ_Q(y) < 0
            { μ_Q(y) ± Δμ_Q(y)       0 ≤ μ_Q(y) ± Δμ_Q(y) ≤ 1
            { 1                      μ_Q(y) ± Δμ_Q(y) > 1    (15)

III. Approximate consequent

Q' = ∫ μ_{Q'}(y)/y    (16)
where (±) means that for compliance inference '+' shall be used and for inverse inference '-' shall be used.

7.2.2 Semantic Revising Methods
Definition 1 (valuable interval) An interval V_A = [x_{lv}, x_{rv}] is called the valuable interval [21] of a fuzzy set A ⊆ X = [x_l, x_r] if and only if supp(A) = (x_{lv}, x_{rv}) ⊂ X is the support of A [32].

Definition 2 (semantic approximation) A fuzzy set A' is said to be a semantic approximation of a fuzzy set A if and only if their valuable intervals coincide [22].

In this section, we discuss only the basic definitions of the semantic revising methods for those cases where the input fuzzy set and the antecedent fuzzy set are semantic approximations of each other. Application of the semantic revising methods to more general cases will be discussed in section 4 of this article.

7.2.2.1 Semantic Relation and Interrelation in Semantic Revising Methods
When the semantic revising methods are used with a rule P -> Q, where P ⊆ X = [x_l, x_r] and Q ⊆ Y = [y_l, y_r], the interrelation and the semantic
[Fig. 7.6: Semantic relation and interrelation of SRMI and SRMII]
relation of P and Q can be decided in different ways. In the SRMI we first define an interrelation on X × Y:

IR^(1)_{P,Q} = {(x, y) | y = U(x)}    (17)

where U(x) is the unification function as in (4). Then we have the semantic relation on [0, 1]^2, the space of the membership degrees of P and Q:

SR^(1)_{P,Q} = {(s, t) | s = μ_P(x), t = μ_Q(y), (x, y) ∈ IR^(1)_{P,Q}}    (18)
An approximate consequent Q' is deduced by fixing the semantic relation between P and Q, and keeping that semantic relation between the given P' and Q'. This is called the fixed semantics law.

In the SRMII, we first define a semantic relation on [0, 1]^2:

SR^(2)_{P,Q} = {(s, t) | t = s}    (19)

and then we have the interrelation on X × Y:

IR^(2)_{P,Q} = {(x, y) | μ_P(x) = s, μ_Q(y) = t, (s, t) ∈ SR^(2)_{P,Q}}    (20)

which satisfies:

ds/dx × dt/dy × Ψ > 0   or   ds/dx = 0 and dt/dy = 0

where Ψ is an interrelation constant for SRMII decided by:

Ψ = { +1    for positive interrelation
    { -1    for negative interrelation    (21)
An approximate consequent Q' is deduced by fixing the interrelation between P and Q, and keeping the interrelation between the given P' and Q'. This method is called the fixed interrelation law.

7.2.2.2 Semantic Revising Method I (SRMI)
The inference using SRMI is to take a point x in the universe of discourse X of P and P' and the corresponding point x* where μ_{P'}(x) = μ_P(x*). Then, based on the interrelation of SRMI between P and Q, the points y* and y can be found in the universe of discourse Y of Q and Q'. With the semantic relation of SRMI, μ_{Q'}(y) = μ_Q(y*) is obtained. Integrating μ_{Q'}(y) over all y ∈ Y, an approximate conclusion can be deduced (Figure 7.7).

Formula 4 (The Semantic Revising Method I, SRMI): The valuable interval for P and P' is [x_l, x_r] = X_P = X_{P'}, and for Q and Q' it is [y_l, y_r] = Y_Q = Y_{Q'}.

I. For an x ∈ X_{P'}, find the corresponding x* ∈ X_P which satisfies the following two conditions:

μ_P(x*) = μ_{P'}(x)

and

dμ_P(x*)/dx × dμ_{P'}(x)/dx > 0   or   dμ_P(x*)/dx = 0 and dμ_{P'}(x)/dx = 0

II. By the interrelation,

y = U(x) = α[(x - x_l) ÷ (x_r - x_l)] × (y_r - y_l) + y_l    (22)

y* = U(x*) = α[(x* - x_l) ÷ (x_r - x_l)] × (y_r - y_l) + y_l    (23)

where y ∈ Y_{Q'}, y* ∈ Y_Q, (x, y), (x*, y*) ∈ IR^(1)_{P,Q} by (17), and α is the correspondence operator by (5).

III. Based on the fixed semantics law, when μ_{P'}(x) = μ_P(x*), we have μ_{Q'}(y) = μ_Q(y*), where (μ_P(x*), μ_Q(y*)) ∈ SR^(1)_{P,Q}, (μ_{P'}(x), μ_{Q'}(y)) ∈ SR^(1)_{P',Q'}, and SR^(1)_{P',Q'} = SR^(1)_{P,Q}.

IV. The Q' is deduced by

Q' = ∫ μ_{Q'}(y)/y
[Fig. 7.7: Semantic Revising Method I]

[Fig. 7.8: Semantic Revising Method II]
Formula 4 shows that the semantic relation in P and Q will always be kept in P' and Q' when the SRMI is applied.
7.2.2.3 Semantic Revising Method II (SRMII)
The inference using SRMII is to fix a point x in the universe of discourse X of P' and P and then get the membership values μ_P(x) and μ_{P'}(x). Based on the semantic relation of SRMII, μ_Q(y) = μ_P(x) can be obtained. From μ_Q(y) we have the point y, and we then let μ_{Q'}(y) = μ_{P'}(x). Integrating μ_{Q'}(y) over all y ∈ Y, an approximate conclusion can be deduced (Figure 7.8).

Formula 5 (The Semantic Revising Method II, SRMII): The valuable interval for P and P' is [x_l, x_r] = X_P = X_{P'}, and for Q and Q' it is [y_l, y_r] = Y_Q = Y_{Q'}.

I. For a μ_{P'}(x) ∈ [0, 1] (x ∈ X_{P'}), find the corresponding μ_P(x*) ∈ [0, 1] (x* ∈ X_P) which satisfies the condition x = x*.

II. By the semantic relation,

μ_Q(y*) = μ_P(x*)    (24)

which satisfies the conditions

dμ_P(x*)/dx × dμ_Q(y*)/dy × Ψ > 0   or   dμ_P(x*)/dx = 0 and dμ_Q(y*)/dy = 0

where y* ∈ Y_Q, (μ_P(x*), μ_Q(y*)) ∈ SR^(2)_{P,Q} by (19), and Ψ is the interrelation constant by (21).

III. Using the fixed interrelation law IR^(2)_{P',Q'} = IR^(2)_{P,Q}, when x = x*, we have (x, y) = (x*, y*) and y = y*, where (x*, y*) ∈ IR^(2)_{P,Q} and (x, y) ∈ IR^(2)_{P',Q'}.

IV. By the semantic relation,

μ_{Q'}(y) = μ_{P'}(x)    (25)

which satisfies the condition:

dμ_{P'}(x)/dx × dμ_{Q'}(y)/dy × Ψ > 0   or   dμ_{P'}(x)/dx = 0 and dμ_{Q'}(y)/dy = 0

V. The Q' is deduced by

Q' = ∫ μ_{Q'}(y)/y
Formula 5 shows that the interrelation between P and Q will always be kept in P' and Q' when the SRMII is applied.
[Fig. 7.9: Non-decreasing, non-increasing and full membership parts of a membership function]
7.3 Approximation Measure
There have been many works on similarity measures between fuzzy sets and between elements, such as [4; 5; 14; 16; 20; 24; 26; 33]. The similarity measure based on the geometric distance model discussed in [4] has an important property: for any two fuzzy sets A, B ⊆ U, the situation A ∩ B = ∅ does not necessarily force the similarity measure of A and B to be 0. Inspired by this thought and the concept of Hamming distance [15], we introduce an approximation measure between two fuzzy sets.

7.3.1 Basic Definition
We first give the basic definition of the approximation measure for convex and normalized fuzzy sets. Suppose m fuzzy sets P_1, P_2, ..., P_m ⊆ X, m ≥ 1. For each of them, there is a membership function μ_{P_i}(x) for x ∈ X_i ⊆ X, as shown in Figure 7.9. Here X_i = [x_{li}, x_{ri}] is the valuable interval of P_i. We denote the non-decreasing part by μ_{P_i}^(+)(x), the non-increasing part by μ_{P_i}^(-)(x), and the full membership part by μ_{P_i}^(=)(x):

μ_{P_i}(x) = { μ_{P_i}^(+)(x)    x ∈ [x_{li}, x_{ai})
             { μ_{P_i}^(=)(x)    x ∈ [x_{ai}, x_{bi}]
             { μ_{P_i}^(-)(x)    x ∈ (x_{bi}, x_{ri}]    (26)

where μ_{P_i}^(=)(x) = 1. Given a membership value t ∈ [0, 1), for an i (1 ≤ i ≤ m) there are two points x_t^{i(+)} and x_t^{i(-)} satisfying μ_{P_i}^(+)(x_t^{i(+)}) = t and μ_{P_i}^(-)(x_t^{i(-)}) = t, where x_t^{i(+)} ∈ [x_{li}, x_{ai}) and x_t^{i(-)} ∈ (x_{bi}, x_{ri}]. When there is only one point x* ∈ X with μ_{P_i}(x*) = t* = 1, we have x_{t*}^{i(+)} = x_{t*}^{i(-)} = x_{ai} = x_{bi}.

Definition 3 (approximation measure) The approximation measure of
two convex and normalized fuzzy sets P_i and P_j (P_i, P_j ⊆ X, 1 ≤ i, j ≤ m) is defined as one minus the distance of P_i and P_j over the universe of discourse X:

am_X(P_i, P_j) = 1 - (1/|X|) ∫_{t∈[0,1)} (|x_t^{i(+)} - x_t^{j(+)}| + |x_t^{i(-)} - x_t^{j(-)}|)/2    (27)

When X is obvious from the discussion, it can simply be denoted am(P_i, P_j). In real applications, a few representative points may be sufficient for calculating an approximation measure. For instance, four points can be chosen for a trapezoidal fuzzy set and three points for a triangular fuzzy set. Based on the above definition, for any P_i, P_j ⊆ X, the following properties of the approximation measure hold.

Property 1: 0 ≤ am(P_i, P_j) ≤ 1.

Property 2: When P_i = P_j, am(P_i, P_j) = 1.

Property 3: am(P_i, P_j) = am(P_j, P_i).
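For triangular fuzzy sets, the representative-point shortcut just mentioned makes (27) straightforward to compute. A sketch under the assumption that averaging distances at the three representative points is adequate (the helper name `am_tri` is ours, not the chapter's):

```python
def am_tri(a, b, width):
    """Approximation measure (27) between triangular sets a = (l, m, r) and
    b = (l, m, r) on a universe of size `width`: average the distances
    between corresponding points of the non-decreasing and non-increasing
    parts, and subtract the normalized distance from 1."""
    left = (abs(a[0] - b[0]) + abs(a[1] - b[1])) / 2    # non-decreasing part
    right = (abs(a[1] - b[1]) + abs(a[2] - b[2])) / 2   # non-increasing part
    return 1.0 - (left + right) / 2 / width

A, B = (2, 3, 4), (4, 5, 6)                  # B is A shifted by 2 on X = [0, 10]
assert abs(am_tri(A, B, 10) - 0.8) < 1e-9    # cf. the 0.8 computed in Sec. 7.4.3
assert am_tri(A, A, 10) == 1.0               # Property 2
assert am_tri(A, B, 10) == am_tri(B, A, 10)  # Property 3
```

Note how the measure stays well above 0 even though the two sets do not overlap at all, which is the key property motivating this definition.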
Definition 4 (least approximation measure) Let P_i, P_j ⊆ X (1 ≤ i, j ≤ m) be two fuzzy sets, and V_{P_i}, V_{P_j} the valuable intervals of P_i and P_j, respectively. The least approximation measure of P_i and P_j is defined as:

am_L(P_i, P_j) = 1 - (1/|V_L|) ∫_{t∈[0,1)} (|x_t^{i(+)} - x_t^{j(+)}| + |x_t^{i(-)} - x_t^{j(-)}|)/2    (28)

where V_L = [l_V, u_V] ⊆ X is the smallest set that includes both V_{P_i} and V_{P_j} as subsets; its lower and upper boundaries are

l_V = min[inf(V_{P_i}), inf(V_{P_j})]
u_V = max[sup(V_{P_i}), sup(V_{P_j})]

Property 4: am_X(P_i, P_j) ≥ am_L(P_i, P_j).
When approximate reasoning is considered in a relatively narrow range, the result will be more sensitive to the difference between a given input and the antecedent of a fuzzy rule. The am_L provides a measure for the extreme case where the universe of discourse is the smallest interval containing P_i and P_j, so it can be used to calculate the minimal guaranteed value of the approximation measure during reasoning.
7.3.2 Extended Definitions
Let supp(A) be the support of a fuzzy set A ⊆ X, and let supp_L(A) = inf(supp(A)) and supp_U(A) = sup(supp(A)) be the infimum and supremum of supp(A), respectively. The central point of the fuzzy set A is:

cp(A) = (sup(A_α) + inf(A_α))/2

where α = height(A) and A_α is the α-cut of A.

Definition 5 (normalized approximation with given support and central point) Let A ⊆ X be a convex fuzzy set, supp(A) = S, height(A) = α > 0, supp_L(A) = l, supp_U(A) = u, and central point cp(A). For any given L, U ∈ X with L < U, we have nA_{LU}, a normalized approximation of A with given L and U, which satisfies:

supp_L(nA_{LU}) = L,    supp_U(nA_{LU}) = U    (29)

and whose membership function is determined by:

μ_{nA_{LU}}(x) = { μ_A(cp(A))/α    x = cp(nA_{LU})
                 { μ_A(f(x))/α     L < x < U
                 { 0               otherwise    (30)

where

f(x) = (x - cp(nA_{LU})) × (cp(A) - l) ÷ (cp(nA_{LU}) - L) + cp(A)    for x ≤ cp(nA_{LU})    (31)

f(x) = (x - cp(nA_{LU})) × (u - cp(A)) ÷ (U - cp(nA_{LU})) + cp(A)    for x > cp(nA_{LU})    (32)

and cp(nA_{LU}) is the central point of nA_{LU}, which satisfies L < cp(nA_{LU}) < U. When cp(nA_{LU}) is not previously given, we take cp(nA_{LU}) = f(L, U, l, u). The following function f is a choice that keeps the original shape of A as much as possible:

f(L, U, l, u) = [(U - L) ÷ (u - l)] × (cp(A) - l) + L    (34)

A symmetrical nA_{LU} can be obtained by defining
cp(nA_{LU}) = (L + U)/2
Definition 6 (extended semantic approximation) Let A, B ⊆ X be fuzzy sets, supp_L(A) = l_A, supp_U(A) = u_A, cp(A) = c_A, supp_L(B) = l_B, supp_U(B) = u_B, and cp(B) = c_B. The set B is said to be an extended semantic approximation of A if and only if there is an nB_{LU}, a normalized approximation of B with L = l_A, U = u_A, and

cp(nB_{LU}) = [(U - L) ÷ (u_B - l_B)] × (c_B - l_B) + L

When U = u_B = u_A and L = l_B = l_A, we have nB_{LU} = B, and B is a semantic approximation of A. So this is an extended definition of semantic approximation [21; 22]. It makes the semantic revising methods applicable to fuzzy sets with arbitrary support and position. For convex but non-normalized fuzzy sets, we have a generalized definition of the approximation measure.

Definition 7 (generalized approximation measure) Let A and B be convex fuzzy sets, height(A) = α_A > 0, height(B) = α_B > 0, supp_L(A) = l_A, supp_U(A) = u_A, supp_L(B) = l_B and supp_U(B) = u_B. The generalized approximation measure is defined by:

gam_X(A, B) = [min{α_A, α_B} ÷ max{α_A, α_B}] × am_X(nA_{l_A u_A}, nB_{l_B u_B})    (35)

where nA_{l_A u_A} is a normalized approximation of A with given l_A and u_A, and nB_{l_B u_B} is a normalized approximation of B with given l_B and u_B. A normalized fuzzy set is also a normalized approximation of itself. When α_A = α_B = 1, Definition 7 returns to the basic definition (27). The generalized least approximation measure can be similarly defined.

Definition 8 (simplified approximation measure) Let A, B ⊆ X be convex fuzzy sets, height(A) = α_A > 0, height(B) = α_B > 0, supp_L(A) = l_A, supp_U(A) = u_A, supp_L(B) = l_B, and supp_U(B) = u_B. A simplified approximation measure of A and B is defined by

sam_V(A, B) = [min{α_A, α_B} ÷ max{α_A, α_B}] × [1 - (|l_A - l_B| + |u_A - u_B|)/2 ÷ |V|]    (36)

where V = [min(l_A, l_B), max(u_A, u_B)] ⊆ X. When B is simply a shift of A on X, we have sam_V(A, B) = am_V(A, B).
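Definition 8 needs only heights and support endpoints, so it is simple to compute; a sketch with assumed parameter names:

```python
def sam(hA, lA, uA, hB, lB, uB):
    """Simplified approximation measure (36): heights hA, hB > 0 and support
    endpoints (lA, uA), (lB, uB); V is the interval spanning both supports."""
    V = max(uA, uB) - min(lA, lB)
    shape = min(hA, hB) / max(hA, hB)                # height-ratio factor
    dist = (abs(lA - lB) + abs(uA - uB)) / 2 / V     # normalized support distance
    return shape * (1.0 - dist)

# B is A shifted by 1 with equal height: sam reduces to 1 - d/|V| = 2/3
assert abs(sam(1.0, 0.0, 2.0, 1.0, 1.0, 3.0) - 2/3) < 1e-9
```

The height-ratio factor is what extends the measure to non-normalized sets; with equal heights it reduces to the shift case noted in the text.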
Definition 9 (extended approximation measure) Let A, B ⊆ X be normalized convex fuzzy sets. When there is a C ⊆ X satisfying

min{supp_L(A), supp_L(B)} ≤ supp_L(C) ≤ max{supp_L(A), supp_L(B)}
min{supp_U(A), supp_U(B)} ≤ supp_U(C) ≤ max{supp_U(A), supp_U(B)}

an extended approximation measure of A and B can be deduced by

eam_X(A, B) = am_X(A, C) × am_X(C, B)    (37)
where am_X(A, C) is the approximation measure or the simplified approximation measure of A and C, and am_X(B, C) that of B and C, respectively. Normally, eam_X(A, B) ≤ am_X(A, B).

7.4 Approximate Reasoning Using Revision Principle

7.4.1 Reasoning with Multiple Antecedents
Given a rule P_{11}, P_{12}, ..., P_{1m} -> Q_1, where P_{1i} ⊆ X_{1i} ⊆ X and Q_1 ⊆ Y. Each P_{1i} (i = 1, ..., m) can be defined by a membership function with non-decreasing part μ_{P_{1i}}^(+)(x) for x ∈ [x_{l1i}, x_{a1i}), non-increasing part μ_{P_{1i}}^(-)(x) for x ∈ (x_{b1i}, x_{r1i}], and full membership part μ_{P_{1i}}^(=)(x) for x ∈ [x_{a1i}, x_{b1i}], in the same manner as equation (26). The Q_1 is similarly described by

μ_{Q_1}(y) = { μ_{Q_1}^(+)(y)    y ∈ [y_l, y_a)
             { μ_{Q_1}^(=)(y)    y ∈ [y_a, y_b]
             { μ_{Q_1}^(-)(y)    y ∈ (y_b, y_r]    (38)
where μ_{Q_1}^(=)(y) = 1. The reasoning can be carried out by the following steps.

(1) For the given facts P'_{11}, P'_{12}, ..., P'_{1m}, we consider each of them in turn. When P'_{1i} is taken into consideration, the partially revised consequent Q'_{1i} is determined by applying one (linear or semantic) method of the revision principle to P_{1i}, P'_{1i}, and Q_1.

(2) The idea of the fixed-value law is then used to get the deviation of each Q'_{1i} from Q_1. For any t ∈ [0, 1) (Figure 7.10), we have the corresponding
y_t^(+) = (μ_{Q_1}^(+))^{-1}(t)
[Fig. 7.10: Deviation of Q'_{1i} from Q_1]
and

y'_{ti}^(+) = (μ_{Q'_{1i}}^(+))^{-1}(t)

So for i = 1, 2, ..., m, there are

Δy_{ti}^(+) = y'_{ti}^(+) - y_t^(+)    (39)

and

Δy_{ti}^(-) = y'_{ti}^(-) - y_t^(-)    (40)

(3) Calculate the corresponding y'_l by

y'_l = y_l + [Σ_{i=1}^m am(P_{1i}, P'_{1i}) × Δy_{li}] ÷ [Σ_{i=1}^m am(P_{1i}, P'_{1i})]    (41)

and y'_a, y'_b, y'_r in the same manner.

(4) An approximate conclusion Q'_1 will be integrated from all the individual revisions from Q_1 to Q'_{1i}:
Q'_1 = ∫_{y'∈[y'_l, y'_a)} μ_{Q_1}^(+)(y)/y' + ∫_{y'∈[y'_a, y'_b]} 1/y' + ∫_{y'∈(y'_b, y'_r]} μ_{Q_1}^(-)(y)/y'    (42)

where y' for the non-decreasing part and the non-increasing part is calculated by
(a) for y' ∈ [y'_l, y'_a),

y' = y + [Σ_{i=1}^m am(P_{1i}, P'_{1i}) × Δy_{ti}^(+)] ÷ [Σ_{i=1}^m am(P_{1i}, P'_{1i})]    (43)

where y ∈ [y_l, y_a) and Δy_{ti}^(+) is the deviation for t = μ_{Q_1}^(+)(y);

(b) for y' ∈ (y'_b, y'_r],

y' = y + [Σ_{i=1}^m am(P_{1i}, P'_{1i}) × Δy_{ti}^(-)] ÷ [Σ_{i=1}^m am(P_{1i}, P'_{1i})]    (44)

where y ∈ (y_b, y_r] and Δy_{ti}^(-) is the deviation for t = μ_{Q_1}^(-)(y).
(5) The confidence of the approximate conclusion Q'_1 is calculated by

Conf(Q'_1) = Min_i[am(P_{1i}, P'_{1i})]    (45)

7.4.2 Reasoning with Multiple Rules
When we are given a set of n rules:

P_{11}, P_{12}, ..., P_{1m} -> Q_1
P_{21}, P_{22}, ..., P_{2m} -> Q_2
...
P_{n1}, P_{n2}, ..., P_{nm} -> Q_n

and a set of facts, approximate reasoning can be performed as follows:

(1) Apply the method introduced earlier for a single rule with multiple antecedents to the set of facts with each given rule P_{j1}, P_{j2}, ..., P_{jm} -> Q_j (j = 1, 2, ..., n), to get

Q'_1, Q'_2, ..., Q'_n

(2) Based on the confidences

Conf(Q'_1), Conf(Q'_2), ..., Conf(Q'_n)

find a Q'_s ∈ {Q'_1, Q'_2, ..., Q'_n} that satisfies

Conf(Q'_s) = Max_j[Conf(Q'_j)]
(3) Calculate the corresponding y_l by

y_l = [Σ_{j=1}^n am(Q_j, Q'_j) × y'_{lj}] ÷ [Σ_{j=1}^n am(Q_j, Q'_j)]    (46)

and y_a, y_b, y_r in the same manner.

(4) An approximate conclusion Q' can be obtained by:

Q' = ∫_{y∈[y_l, y_a)} μ_{Q'_s}^(+)(y')/y + ∫_{y∈[y_a, y_b]} 1/y + ∫_{y∈(y_b, y_r]} μ_{Q'_s}^(-)(y')/y    (47)

where y for the non-decreasing part and the non-increasing part is determined by

(a) for y ∈ [y_l, y_a),

y = [Σ_{j=1}^n am(Q_j, Q'_j) × y'_j^(+)] ÷ [Σ_{j=1}^n am(Q_j, Q'_j)]    (48)

(b) for y ∈ (y_b, y_r],

y = [Σ_{j=1}^n am(Q_j, Q'_j) × y'_j^(-)] ÷ [Σ_{j=1}^n am(Q_j, Q'_j)]    (49)

where y'_j^(+) and y'_j^(-) are the corresponding points in the non-decreasing and non-increasing parts of the membership function of Q'_j (j = 1, 2, ..., n), respectively, for t = μ_{Q'_s}(y'), y' ∈ [y'_l, y'_r].

When the revision principle is applied to non-normalized fuzzy sets, normalized approximation and the generalized approximation measure can be used. When a semantic revising method is applied in applications where an input fuzzy set may not always be a semantic approximation of the corresponding antecedent, an extended semantic approximation can be used. In that case, a distance measure between a normalized approximation and its original fuzzy set needs to be calculated and used as a kind of confidence measure for the revision of the consequent. We leave the details to another article.
[Fig. 7.11: An example]
7.4.3 Example
Given rules:

    P_11, P_12 -> Q_1
    P_21, P_22 -> Q_2

where P_11, P_12, P_21, P_22 ⊆ X = [0, 10], and Q_1, Q_2 ⊆ Y = [0, 10] (Figure 7.11). For simplicity, all of them are triangular, so P_11 can simply be denoted by (0, 1, 2), and the others in the same manner.

(a) Based on the definition of approximation measure, we have

    am(P'_1, P_11) = 0.9,    am(P'_2, P_12) = 0.8
(b) Using the linear revising method with fixed value law, we get

    Q'_11 = (2, 3, 4),    Q'_12 = (3, 4, 5)

(c) Using the results of (a) and (b), we have

    y'_l1 = 1 + (0.9 × 1 + 0.8 × 2) / (0.9 + 0.8) = 2.47
    y'_a1 = y'_b1 = 2 + (0.9 × 1 + 0.8 × 2) / (0.9 + 0.8) = 3.47
    y'_r1 = 3 + (0.9 × 1 + 0.8 × 2) / (0.9 + 0.8) = 4.47

    Q'_1 = (2.47, 3.47, 4.47)
(d) With the second rule, we have

    am(P'_1, P_21) = 0.8,    am(P'_2, P_22) = 0.9
    Q'_21 = (6, 7, 8),    Q'_22 = (7, 8, 9)

and

    y'_l2 = 8 + (0.8 × (-2) + 0.9 × (-1)) / (0.8 + 0.9) = 6.53
    y'_a2 = y'_b2 = 9 + (0.8 × (-2) + 0.9 × (-1)) / (0.8 + 0.9) = 7.53
    y'_r2 = 10 + (0.8 × (-2) + 0.9 × (-1)) / (0.8 + 0.9) = 8.53

    Q'_2 = (6.53, 7.53, 8.53)
(e) Combining the results from (c) and (d), we have

    am(Q_s, Q'_1) = 0.853,    am(Q_s, Q'_2) = 0.853

    y'_l = (2.47 × 0.853 + 6.53 × 0.853) / (0.853 + 0.853) = 4.5
    y'_a = y'_b = (3.47 × 0.853 + 7.53 × 0.853) / (0.853 + 0.853) = 5.5
    y'_r = (4.47 × 0.853 + 8.53 × 0.853) / (0.853 + 0.853) = 6.5

and an approximate conclusion (Figure 7.11) Q' is obtained as:

    Q' = (4.5, 5.5, 6.5)
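The example in steps (b)-(e) can be retraced numerically. The sketch below (not part of the original chapter) represents triangular fuzzy sets as (l, m, r) triples; the bases Q_1 = (1, 2, 3) and Q_2 = (8, 9, 10) are read off from the worked numbers above, since the scan does not print them directly.

```python
# Re-tracing the multi-rule revision example of Sec. 7.4.3.
# Triangular fuzzy sets are (l, m, r) triples; weights are approximation measures.

def revise_rule(base, revised, ams):
    """Combine the revisions caused by each sub-antecedent of one rule.

    Each revised consequent here is a pure shift of `base`, so the result
    shifts every point of `base` by the am-weighted mean shift.
    """
    shifts = [r[0] - base[0] for r in revised]
    mean_shift = sum(w * s for w, s in zip(ams, shifts)) / sum(ams)
    return tuple(p + mean_shift for p in base)

def integrate_rules(results, ams):
    """Step (4): am-weighted average of the per-rule results, point by point."""
    total = sum(ams)
    return tuple(sum(w * r[i] for w, r in zip(ams, results)) / total
                 for i in range(3))

q1 = revise_rule((1, 2, 3), [(2, 3, 4), (3, 4, 5)], [0.9, 0.8])
q2 = revise_rule((8, 9, 10), [(6, 7, 8), (7, 8, 9)], [0.8, 0.9])
q = integrate_rules([q1, q2], [0.853, 0.853])
print(tuple(round(v, 2) for v in q1))  # (2.47, 3.47, 4.47)
print(tuple(round(v, 2) for v in q2))  # (6.53, 7.53, 8.53)
print(tuple(round(v, 2) for v in q))   # (4.5, 5.5, 6.5)
```

With equal weights 0.853 on both rules, the integration step reduces to the mid-point of the two per-rule results, which matches the printed conclusion.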
7.5 Summary

We have discussed the application of the revision principle to approximate reasoning with multiple rules. An approximation measure has been proposed for integrating the revisions. In the early stage of processing (i.e., multiple antecedents), it is used to combine the revisions caused by the individual sub-antecedents on the consequent. In the late stage of processing (i.e., multiple rules), it is applied as a weight for integrating the approximate results derived from each individual rule in the early stage. To relax the conditions imposed on fuzzy sets in the authors' earlier works, the concepts of normalized approximation and generalized approximation measure were introduced to handle fuzzy sets that may be non-normalized, or that have arbitrary support and position. Our future efforts will focus on the study of the approximation measure on a semantic basis. We will also extend our discussion to non-convex fuzzy sets and to fuzzy sets defined on multiple dimensions.
References
[1] Baldwin, J.F., "A New Approach to Approximate Reasoning Using a Fuzzy Logic", Fuzzy Sets and Systems, 2, pp. 309-325, 1979.
[2] Baldwin, J.F., "Fuzzy Logic and Its Application to Fuzzy Reasoning", in Advances in Fuzzy Set Theory and Applications, edited by M.M. Gupta, et al., North-Holland, pp. 93-115, 1979.
[3] Baranyi, P. and L.T. Koczy, "A General Revision Principle Method as a Way Between the Revision Principle and the Rule Interpolation Techniques", Proc. 6th IEEE International Conference on Fuzzy Systems, pp. 561-566, 1997.
[4] Chen, S.M., Yeh, M.S. and Hsiao, P.Y., "A comparison of similarity measures of fuzzy values", Fuzzy Sets and Systems, Vol. 72, pp. 79-89, 1995.
[5] Chen, S.M., "Similarity Measure Between Vague Sets and Between Elements", IEEE Trans. on Systems, Man and Cybernetics, Vol. 27, No. 1, pp. 153-158, 1997.
[6] Cross, V. and T. Sudkamp, "Fuzzy Implication and Compatibility Modification", Proc. 2nd IEEE International Conference on Fuzzy Systems, pp. 219-224, 1993.
[7] Cross, V. and T. Sudkamp, "Patterns of Fuzzy Rule-Based Inference", International Journal of Approximate Reasoning, 11, pp. 235-255, 1994.
[8] Ding, L., Z. Shen and M. Mukaidono, "A New Method for Approximate Reasoning", IEEE Proceedings of ISMVL - the 19th International Symposium on Multiple-valued Logic, pp. 179-185, 1989.
[9] Ding, L. and M. Mukaidono, "A Proposal on Approximate Reasoning Based on Revision Principle and Fixed Value Law" (in Japanese), The Transactions of the Institute of Electronics, Information and Communication Engineers, J72-D-II, 2, pp. 117-122, 1991.
[10] Ding, L., Z. Shen and M. Mukaidono, "Revision Principle for Approximate Reasoning, Based on Linear Revising Method", Proc. 2nd International Conference on Fuzzy Logic & Neural Networks (IIZUKA '92), pp. 305-308, 1992.
[11] Ding, L. and Z. Shen, "Neural Network Implementation of Fuzzy Inference for Approximate Case-based Reasoning", in Neural and Fuzzy Systems: The Emerging Science of Intelligence and Computing, edited by S. Mitra, M.M. Gupta and W. Kraske, SPIE Press, pp. 28-56, 1994.
[12] Ding, L., "Methods of Revision Principle for Approximate Reasoning", International Journal of General Systems, Vol. 28, No. 2-3, pp. 115-137, 1999.
[13] Dubois, D. and H. Prade, "The Generalized Modus Ponens Under Sup-min Composition: A Theoretical Study", in Approximate Reasoning in Expert Systems, edited by M.M. Gupta, et al., North-Holland, pp. 217-232, 1985.
[14] Hyung, L.W., Song, Y.S. and Lee, K.M., "Similarity measure between fuzzy sets and between elements", Fuzzy Sets and Systems, Vol. 62, pp. 291-293, 1994.
[15] Klir, G.J. and T.A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice-Hall International, 1988.
[16] Liu, X., "Entropy, distance measure and similarity measure of fuzzy sets and their relations", Fuzzy Sets and Systems, Vol. 52, pp. 305-318, 1992.
[17] Mizumoto, M., "Some Fuzzy Inference Methods: The IF-THEN case", The Transactions of the Institute of Electronics, Information and Communication Engineers, Japan, J64-D, 5, pp. 379-386, 1981.
[18] Mizumoto, M., "Comparison of Various Fuzzy Reasoning Methods", Proc. of 2nd IFSA Congress, pp. 27, 1987.
[19] Mukaidono, M., L. Ding and Z. Shen, "Approximate Reasoning Based on Revision Principle", Proc. NAFIPS'90, 1, pp. 94-97, 1990.
[20] Pappis, C.P. and Karacapilidis, N.I., "A comparative assessment of measures of similarity of fuzzy values", Fuzzy Sets and Systems, Vol. 56, pp. 171-174, 1993.
[21] Shen, Z., L. Ding, H.C. Lui, P.Z. Wang and M. Mukaidono, "Revision Principle for Approximate Reasoning, Based on Semantic Revising Method", IEEE Proceedings of ISMVL - the 22nd International Symposium on Multiple-valued Logic, pp. 467-473, 1992.
[22] Shen, Z., L. Ding and M. Mukaidono, "Methods of Revision Principle", Proc. of 5th IFSA Congress, pp. 246-249, 1993.
[23] Tsukamoto, Y., "An Approach to Fuzzy Reasoning Method", in Advances in Fuzzy Set Theory and Applications, edited by M.M. Gupta, et al., North-Holland, pp. 137-149, 1979.
[24] Turksen, I.B., "An Approximate Analogical Reasoning Approach Based on Similarity Measures", IEEE Trans. on Systems, Man and Cybernetics, Vol. 18, pp. 1049-1056, 1988.
[25] Wang, P.Z., "Fuzziness vs. Randomness, Falling Shadow Theory", BUSEFAL, No. 48, 1991.
[26] Wang, W.J., "New similarity measures on fuzzy sets and on elements", Fuzzy Sets and Systems, Vol. 85, pp. 305-309, 1997.
[27] Yager, R.R., "An Approach to Inference in Approximate Reasoning", Journal of Man-Machine Studies, 13, pp. 323-338, 1980.
[28] Zadeh, L.A., "Fuzzy Sets", Information and Control, 8(3), pp. 338-353, 1965.
[29] Zadeh, L.A., "The Concept of a Linguistic Variable and Its Application to Approximate Reasoning (I); (II); (III)", Information Sciences, 8, pp. 199-249; 8, pp. 301-357; 9, pp. 43-80, 1975.
[30] Zadeh, L.A., "Fuzzy Logic and Approximate Reasoning", Synthese, 30, pp. 407-428, 1975.
[31] Zadeh, L.A., "A Theory of Approximate Reasoning", Machine Intelligence, 9, edited by J. Hayes, D. Michie and L.I. Mikulich, Halstead Press, New York, pp. 149-194, 1979.
[32] Zimmermann, H.J., Fuzzy Sets, Decision Making and Expert Systems, Kluwer Academic Publishers, Boston, 1987.
[33] Zwick, R., Carlstein, E. and Budescu, D.V., "Measures of similarity among fuzzy concepts: a comparative analysis", International Journal of Approximate Reasoning, Vol. 1, pp. 221-242, 1987.
Chapter 8 Handling Null Queries with Compound Fuzzy Attributes
Shyue-Liang Wang¹, Yu-Jane Tsai²
¹I-Shou University, Taiwan  ²National University of Kaohsiung, Taiwan
Abstract
We present here a generalized approach for handling null queries that contain compound fuzzy attributes. Null queries are queries that elicit a null answer from the database. Compound fuzzy attributes are ambiguous attributes that are not defined in the original database schema but can be derived from multiple rigid attributes in the schema. Compound fuzzy attributes derived from simple numbers were studied by Nomura [11]. In this work, we extend compound fuzzy attributes so that they can be derived from numbers, interval values, scalars, and sets of all these data types. Database management systems that can handle this type of ambiguous attribute in null queries not only reduce the occurrences of null answers but also provide an improved, user-friendly query environment.
Keywords: null query, compound fuzzy attribute, fuzzy database, aggregation function, similarity measure
8.1 Introduction
Ambiguous information permeates our understanding of the real world. Extending the capabilities of database management systems to handle imperfect or fuzzy information has been studied extensively [13]. In the fuzzy database landscape, precise and imprecise data modeling of a real-world enterprise emphasizes fuzzy data representation [4][13]. Many relational and object-oriented fuzzy data models have since been proposed. Possibility distributions and similarity/proximity relations are the two main techniques for representing fuzzy information [2]. However, there are significant feasibility problems in meeting the performance requirements of fuzzy database management systems. On the other hand, fuzzy queries on either crisp or fuzzy databases allow users to retrieve information in a
more flexible manner [6]. It has been suggested that front-end fuzzy querying systems have greater near-term potential, based on performance criteria [4]. Fuzzy querying systems play an important role in the database landscape. Because of the lack of flexibility of conventional DBMS queries, fuzzy queries address the issue of providing a more flexible and user-friendly query environment on crisp or fuzzy databases. Typical fuzzy queries allow users to specify fuzzy query conditions and fuzzy attribute values so that the system returns data that match the query statement [1]. Querying a database may produce three categories of results: exactly matched, partially matched, and null answers. Exactly matched answers are required in typical crisp queries, whereas partially matched answers are permitted for most fuzzy queries. Fuzzy queries that elicit null answers are usually classified as null queries. All these fuzzy queries assume that query attributes appear in the original database schema. However, since users may not be familiar with the database schema when formulating queries, there might be ambiguous query attributes that are not defined in the original schema. In order to provide all possible answers to users, rather than just returning frustrating nulls, null queries may be permitted to contain compound fuzzy attributes in the query statement, thereby reducing the occurrences of null answers. A compound fuzzy attribute in a null query is a fuzzy attribute that can be derived from multiple rigid attributes in the original database schema [11]. For example, a compound fuzzy attribute "Build" might be derived from the rigid attributes "Height" and "Weight". From the user's point of view, compound fuzzy attributes in null queries are terms that represent intuitive or semantic-level meanings contained in the database.
In general, null answers may be caused by missing attribute values in the database, syntactic errors in the query statement, or misconceptions about the database schema. Much research has been aimed at resolving the issue of missing attribute values, but little work has been done on resolving semantic errors. Motro [10] proposed the concept of generalized queries to explain null answers. The generalized query finds maximal failures and uses them to determine the level of misconception. The level of misconception not only explains why the query statement obtains a null answer, but also retrieves possible answers for the
generalized query. To reduce users' misconceptions about the database schema, and therefore produce fewer null answers, compound fuzzy attributes were proposed by Nomura [11]. That approach utilizes an averaging operator to represent the relationship between numerical rigid attributes and numerical compound fuzzy attributes. However, the compound fuzzy attributes proposed by Nomura can only be derived from simple numbers. For fuzzy set attributes, Zemankova [18] proposed that fuzzy sets can be specified explicitly as combinations of other previously defined fuzzy sets in an ad hoc manner, using logical connectors and direct specifications to represent the relationships between attributes without a systematic approach. In this work, we extend compound fuzzy attributes to fuzzy databases such that they can be derived from numbers, interval values [14], scalars [15], and sets of all these data types in a systematic way. Fuzzy aggregation functions are used to represent the relationships between rigid attributes and compound fuzzy attributes. Null queries containing compound fuzzy attributes over fuzzy databases can thus be handled under a unified approach. The rest of the paper is organized as follows. In Section 2, we describe the fuzzy data types that appear in most fuzzy data models. Section 3 describes the generation of compound fuzzy attributes in null queries in detail. In Section 4, we give an example showing how these null queries are processed. Finally, some discussion and future work are described in the conclusion.

8.2 Data Representation
Fuzzy databases use many different data types to model imperfect information, such as precise, imprecise, or uncertain data types. In summary, we list some data types that appear in most fuzzy databases as follows [8]:

1. Number: this is the integer data type found in all databases, e.g., Age=28.
2. Interval value: this is a range of data lying between two numbers, e.g., Salary=[30k, 35k].
3. Scalar: this data type is defined for linguistic labels and is usually expressed as fuzzy sets or possibility distributions, e.g., Behavior=Good.
4. Set of numbers: this data type contains one or more numbers, e.g., Age={28, 29}.
5. Set of interval values: this data type contains one or more interval values, e.g., Salary={[30k, 35k], [27k, 33k]}.
6. Set of scalars: this data type contains one or more scalars, e.g., Behavior={Good, Normal}.
7. Possibility distribution of numbers: this data type involves uncertainty factors on numbers, e.g., Age={0.8/28, 0.5/30}.
8. Possibility distribution of interval values: this data type involves uncertainty factors on intervals, e.g., Salary={0.9/[30k, 35k], 0.7/[27k, 32k]}.
9. Possibility distribution of scalars: this data type involves uncertainty factors on scalars, e.g., Behavior={0.5/Good, 0.7/Normal}.
10. Similarity/Proximity relations: this data type defines the analogical relationship between discrete attributes of a domain, e.g., Hobby={Swimming, Sport, Stamp, Reading}.
In this work, we concentrate on compound fuzzy attributes that are generated from numbers, interval values, scalars, and sets of all these data types. Other fuzzy data types can be handled in a similar fashion with minor modifications [16].

8.3 Generating Compound Fuzzy Attributes

A compound fuzzy attribute in a null query is a fuzzy attribute that can be derived from multiple rigid attributes defined in the original database schema. For example, the compound fuzzy attribute "Build" of a person can be derived from "Height" and "Weight", the "Potential" of an employee can be derived from "Age" and "Experience", and the "Attractiveness" of a girl can be derived from "Eye" and "Hair" colors. Assume that a relation "Employee" is defined as in Table 8.1. For simplicity, we only show the number, interval value, and scalar data types. Sets of all these data types can be treated similarly. Suppose that we want to find
Handling Null Queries with Compound Fuzzy Attributes
153
employees who are "Skinny", in the sense of being short and light. A possible query might be as follows:

    Select "EmployeeID" From "Employee" Where "Build=Skinny" with Threshold = 0.75.

Table 8.1  A database relation "Employee".

    EmployeeID | Height | Weight
    001        | Short  | Average
    002        | 180    | Fat
    003        | Normal | [60, 70]
    004        | 170    | 45
    005        | Short  | [40, 50]
The attribute "Build" is not defined in the "Employee" relation, but it can be compounded semantically from the rigid attributes "Height" and "Weight". In fact, the relationship between the compound attribute "Build" and the rigid attributes "Height" and "Weight" can be represented by a suitable aggregation function. In addition to numeric domains for the rigid attributes, we assume that the scalar domain of the rigid attribute "Height" is {Short, Normal, Tall}, the scalar domain of the rigid attribute "Weight" is {Light, Average, Heavy, Fat}, and the domain of the compound fuzzy attribute is {Skinny, Average, Big}. For simplicity, we have assumed these scalar values to be symmetric trapezoidal fuzzy numbers with side parameter b. To determine such an aggregation function, non-fuzzy attribute values must first be converted and fuzzified into fuzzy sets so that all rigid attributes have the same data type. The selection of the most suitable aggregation function is decided by additional information provided by the system designer, as explained in a later section. To answer a null query with compound fuzzy attributes, a similarity measure is used to measure the "closeness" of the query attribute value with respect to the compounded attribute value of each tuple. Those tuples with similarities greater than the threshold value are the answer to the null query. Therefore, a general approach for handling null queries that contain compound fuzzy attributes consists of: (1) conversion and/or fuzzification of non-fuzzy-set rigid attributes, (2) selection of the most suitable
aggregation function according to the additional information given by the system designer, and (3) measuring the similarity between the compounded fuzzy attribute of each tuple and the query statement. In the following, we illustrate these steps in greater detail.

8.3.1 Unit-Interval Conversion and Fuzzification

A rigid attribute may be expressed as simple numbers, interval values, scalars, or sets of all these data types. When attributes of different data types are compounded, they have to be converted to the same range and the same data representation in order to apply aggregation functions. Data types such as numbers and sets of numbers have to be converted to unit-interval values and then fuzzified into fuzzy sets. For interval values, as the two end points are independent simple numbers, they can be converted separately and then fuzzified. Each interval value in a set of interval values is independent of the others and can be treated separately. Scalars that are not defined on the same range should be converted to the unit-interval domain.

8.3.1.1 Unit-Interval Conversion
To minimize the effect of extraordinary data, we choose an S-function to convert a rigid numerical attribute value x, which is not necessarily defined on the unit interval, into a new value y that lies in the unit interval. The conversion function is defined as

    y = f(x) = 1 - 2((x - max)/length)^2,   med < x
               2((x - min)/length)^2,       med >= x        (1)

for all x in [min, max], where "max" is the maximum of the domain of the attribute values, "min" is the minimum of the domain of the attribute values, "med" is the middle value of the domain of the attribute values, i.e., (max + min)/2, and "length" is the range of the domain of the attribute, i.e., (max - min).
Handling Null Queries with Compound Fuzzy Attributes
155
For example, in Table 8.1, assume that the domain of the attribute "Height" for the numerical type is defined on [100cm, 220cm] and the domain of the attribute "Weight" is defined on another interval, [10kg, 120kg]. For the fourth tuple in Table 8.1, the value of the attribute "Height" is 170cm; after applying the conversion function (1), the converted value is 0.652778, as shown in Fig. 8.1. For the third tuple, the value of the attribute "Weight" is [60, 70] kg; after applying the conversion function (1), the converted value is [0.4132, 0.5868], as shown in Fig. 8.2.
Fig. 8.1  The conversion of attribute value 170 cm.
Fig. 8.2  The conversion of attribute value [60kg, 70kg].
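The two conversions worked out above can be checked directly against equation (1). The following short sketch (an illustration, not code from the chapter) implements the S-function and reproduces both converted values.

```python
# Conversion function (1): an S-shaped map from a rigid numerical value
# x in [lo, hi] onto the unit interval.

def to_unit_interval(x, lo, hi):
    """Apply Eq. (1): quadratic below the middle value, mirrored above it."""
    length = hi - lo
    med = (lo + hi) / 2.0
    if x <= med:
        return 2.0 * ((x - lo) / length) ** 2
    return 1.0 - 2.0 * ((x - hi) / length) ** 2

print(round(to_unit_interval(170, 100, 220), 6))  # 0.652778
print(round(to_unit_interval(60, 10, 120), 4))    # 0.4132
print(round(to_unit_interval(70, 10, 120), 4))    # 0.5868
```

Note that both branches meet at f(med) = 0.5, so the map is continuous and monotone on [min, max].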
8.3.1.2 Fuzzification into the Fuzzy Set
In order to convert rigid number attributes into the fuzzy set representation, we define a possibility distribution,

    P(z) = exp( -((z - y_i)/b)^2 )        (2)

where y_i is the unit-interval converted value, and b is the side parameter of the symmetric trapezoidal fuzzy number representing the compound fuzzy attribute. For example, the attribute value 170cm is fuzzified as shown in Fig. 8.3. For the interval data type, we define a trapezoidal fuzzy set to fuzzify the
unit-interval converted values. The trapezoidal fuzzy set is defined as follows,
    P(z) = (b - y_1 + z)/b,   z in [y_1 - b, y_1)
           1,                 z in [y_1, y_2]        (3)
           (y_2 + b - z)/b,   z in (y_2, y_2 + b]

where y_i represents the converted left and right endpoints of the interval value, for i = 1, 2. For example, the attribute value [60kg, 70kg] is fuzzified as shown in Fig. 8.4.
Fig. 8.3  Conversion and fuzzification for the number 170cm.
Fig. 8.4  Conversion and fuzzification for the interval [60kg, 70kg].

8.3.2 Aggregation Functions

Aggregation functions on fuzzy sets are operations by which several fuzzy sets are combined in a desirable way to produce a single fuzzy set. Any aggregation function can be chosen to represent the semantic relationship between the rigid attributes and the expected compound fuzzy attribute [11]. Formally, an aggregation function on n fuzzy sets (n >= 2) is defined by a function h [5],

    h : [0,1]^n -> [0,1].

When applied to fuzzy sets A_1, A_2, ..., A_n defined on X, the function h produces an aggregate fuzzy set B. Thus, B(x) = h(A_1(x), A_2(x), ..., A_n(x)),
Handling Null Queries with Compound Fuzzy Attributes
157
for each x in X, where X is the universal set. For simplicity, we only consider averaging operators, which lie between the minimum and the maximum [3][9]. The averaging operators that we choose to illustrate the semantic relationship are as follows,

    B_1 = min(x_1, x_2, ..., x_n)                                     (4)
    B_2 = n / (1/x_1 + 1/x_2 + ... + 1/x_n)                           (5)
    B_3 = (x_1 × x_2 × ... × x_n)^(1/n)                               (6)
    B_4 = (x_1 + x_2 + ... + x_n) / n                                 (7)
    B_5 = 1 - ((1 - x_1) × (1 - x_2) × ... × (1 - x_n))^(1/n)         (8)
    B_6 = 1 - n / (1/(1 - x_1) + 1/(1 - x_2) + ... + 1/(1 - x_n))     (9)
    B_7 = max(x_1, x_2, ..., x_n)                                     (10)

where B_2 is the harmonic mean, B_3 is the geometric mean, B_4 is the arithmetic mean, B_5 is the or-geometric mean, and B_6 is the or-harmonic mean. The relationship among these averaging operators is as follows, and is shown in Fig. 5,

    min = B_1 <= B_2 <= B_3 <= B_4 <= B_5 <= B_6 <= B_7 = max.
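The seven operators and their ordering can be verified numerically. The sketch below (an illustration, not part of the chapter) assumes membership grades strictly inside (0, 1), since B_2 and B_6 are undefined at the endpoints.

```python
import math

# The seven averaging operators (4)-(10), checked against the ordering
# min = B1 <= B2 <= ... <= B7 = max on a sample membership vector.

def b1(xs): return min(xs)                                            # (4) minimum
def b2(xs): return len(xs) / sum(1.0 / x for x in xs)                 # (5) harmonic mean
def b3(xs): return math.prod(xs) ** (1.0 / len(xs))                   # (6) geometric mean
def b4(xs): return sum(xs) / len(xs)                                  # (7) arithmetic mean
def b5(xs): return 1 - math.prod(1 - x for x in xs) ** (1 / len(xs))  # (8) or-geometric
def b6(xs): return 1 - len(xs) / sum(1 / (1 - x) for x in xs)         # (9) or-harmonic
def b7(xs): return max(xs)                                            # (10) maximum

xs = [0.3, 0.6]
vals = [b(xs) for b in (b1, b2, b3, b4, b5, b6, b7)]
print(all(a <= b for a, b in zip(vals, vals[1:])))  # True
```

The chain follows from the classical harmonic-geometric-arithmetic mean inequalities, applied once to the grades themselves (B_2-B_4) and once to their complements (B_5, B_6).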
8.3.3 Similarity Measures

A similarity measure of two fuzzy sets is a measure that describes the similarity between fuzzy sets [5]. A similarity measure SM is a real function defined as SM : F × F -> [0, 1], with the following properties:

(1) SM(A, B) = SM(B, A), for all A, B in F;
(2) SM(D, D^c) = 0, for all D in P(X);
(3) SM(C, C) = Max SM(A, B), for all C in F;
(4) for all A, B, C in F, if A ⊆ B ⊆ C, then SM(A, B) >= SM(A, C) and SM(B, C) >= SM(A, C);

where X is the universal set, F is the class of all fuzzy sets on X, and P(X) is the class of all crisp sets on X. Many distance-based, set-theoretic-based, and matching-function-based similarity measures have been proposed [7][12][17][19]. In order to measure the similarity between the compound fuzzy attribute values of each tuple and the query statement, we propose that the following properties should be satisfied by a similarity measure:

(1) A ∩ B = ∅ implies SM(A, B) = 0;
(2) A = B implies SM(A, B) = 1;
(3) A ∩ B ⊇ C ∩ D implies SM(A, B) >= SM(C, D);
where A, B, C, D are fuzzy sets on the universal set X. In fact, we select a similarity measure based on a distance function, defined as follows:

    SM(A, B) = 1 - ( Σ_i |a_i - b_i| ) / n        (11)

where A and B are fuzzy sets, and a_i in A, b_i in B are their membership grades over the n points of the universe.
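The distance-based measure (11) is easily computed on discretized fuzzy sets. The normalisation in the scanned original is not fully legible, so the sketch below assumes the mean absolute difference of membership grades on a common grid; the two sample vectors are made-up illustrations.

```python
# Similarity measure in the spirit of Eq. (11): one minus the mean absolute
# difference of membership grades sampled on a common grid (an assumed
# normalisation; the scan's exact denominator is unreadable).

def similarity(a, b):
    """SM(A, B) for fuzzy sets given as equal-length membership vectors."""
    assert len(a) == len(b)
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

a = [0.0, 0.5, 1.0, 0.5, 0.0]   # a triangular shape on a 5-point grid
b = [0.0, 0.4, 0.9, 0.6, 0.1]   # a slightly shifted variant
print(similarity(a, a))            # 1.0
print(round(similarity(a, b), 2))  # 0.92
```

Identical sets score 1, and the score decreases as the membership profiles drift apart, which is exactly the behaviour the threshold comparison in the query step relies on.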
8.4 Example
Allowing users to specify compound fuzzy attributes in query statements requires a mechanism for the system designer to define the relationship
between the rigid attributes and compound fuzzy attributes. We assume that only a chosen set of rigid-attribute-value pairs will be given corresponding expected compound fuzzy attribute values, ex_i, by the system designer. This chosen set of attribute-value pairs need not cover the entire database. The relationship between the rigid attributes and the compound fuzzy attribute is then determined from this given information. For example, the first column of Table 8.2 shows a chosen set of attribute-value pairs from the values of the domains of the rigid attributes, and the second column shows the given expected values of the compound fuzzy attribute. To select the most suitable aggregation function to represent the relationship, all non-fuzzy attribute values must be unit-interval converted and fuzzified using equations (1)-(3). For each of these attribute-value pairs, the averaging operators, equations (4)-(10), are applied to calculate the aggregated values. Using equation (11), the degrees of similarity between each aggregated value and the expected value are shown in Table 8.2, in columns three to nine, for the seven operators respectively. It can be seen that the or-geometric operator achieves the largest similarity measure on average and should therefore be selected as the most suitable aggregation function representing the semantic relationship between "Height", "Weight", and "Build".
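The selection step just described can be sketched as a small search: apply every candidate operator point-wise to the fuzzified rigid attributes, compare the aggregate with the designer-supplied expected value, and keep the operator with the largest mean similarity. The membership vectors below are made-up illustrations, not rows of the chapter's table, and only three of the seven operators are shown.

```python
# Sketch of aggregation-function selection: pick the operator whose
# aggregates are, on average, most similar to the expected compound values.

def sim(a, b):
    """Mean-absolute-difference similarity (an assumed form of Eq. (11))."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_operator(pairs, operators):
    scores = {}
    for name, op in operators.items():
        sims = []
        for rigid_vectors, expected in pairs:
            # aggregate point-wise across the rigid attributes' memberships
            agg = [op(point) for point in zip(*rigid_vectors)]
            sims.append(sim(agg, expected))
        scores[name] = sum(sims) / len(sims)
    return max(scores, key=scores.get), scores

ops = {
    "min":   lambda p: min(p),
    "arith": lambda p: sum(p) / len(p),
    "max":   lambda p: max(p),
}
# one designer-supplied pair: (fuzzified Height, fuzzified Weight) -> expected Build
pairs = [(([0.2, 0.8, 0.4], [0.1, 0.6, 0.9]), [0.2, 0.7, 0.6])]
best, scores = select_operator(pairs, ops)
print(best)  # arith
```

With more designer-supplied pairs, the averaging over `sims` is what lets a single operator be chosen for the whole relation rather than per tuple.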
Table 8.2  Selecting the most suitable aggregation function.
Fig. 8.5  Compounded fuzzy attribute value for the third tuple and the query attribute value "Skinny".
Fig. 8.6  Compounded fuzzy attribute value for the fifth tuple and the query attribute value "Skinny".

To answer the query, the degree of similarity between the compound fuzzy attribute value of each tuple and the query statement is calculated using equation (11). Fig. 8.5 shows the compound fuzzy attribute for tuple 003 and the query attribute. The degree of similarity is SM((Normal, [60,70]), Skinny) = 0.021, which is less than the threshold value 0.75, so the tuple will not be accepted as an answer to the query. Fig. 8.6 shows the compound fuzzy attribute value for tuple 005 and the query attribute. The degree of similarity is SM((Short, [40,50]), Skinny) = 0.943, which is greater than the threshold value 0.75, so the tuple will be accepted as an answer to the query.
8.5 Conclusions
Null queries caused by missing attribute values or syntactic errors in the query statement have been studied widely. However, it is more difficult to handle null answers caused by semantic errors. To reduce the occurrences of null answers appearing in fuzzy queries, compound fuzzy attributes derived from simple numbers were proposed previously. In this work, we further extend the concept of compound fuzzy attributes so that they can be derived from more complex data types such as interval values, scalars, and sets of numbers, interval values, and scalars. In fact, we propose a general
approach for handling fuzzy queries that contain compound fuzzy attributes that can be derived from fuzzy databases. Database management systems that can handle this type of ambiguous attribute in null queries not only reduce the occurrences of null answers but also provide an improved, user-friendly query environment. For further investigation, we plan to extend the approach to fuzzy data types based on fuzzy sets of numbers, interval values, and scalars, as well as similarity/proximity-based fuzzy data types. In addition, different selection schemes for the aggregation function will be considered.

References

[1] P. Bosc and O. Pivert, "Fuzzy Querying in Conventional Databases," Fuzzy Logic for the Management of Uncertainty (Eds., Zadeh, L. and Kacprzyk, J.), John Wiley, N.Y. (1992)
[2] B. P. Buckles and F. E. Petry, "Fuzzy Databases and Their Applications," Fuzzy Information and Decision Processes (Eds., Gupta, M. and Sanchez, E.), North-Holland, New York, pp. 361-371 (1982)
[3] D. Dubois, "A Review of Fuzzy Sets Aggregation Connectives," Information Sciences, Vol. 36, pp. 85-121 (1985)
[4] R. George, F. E. Petry, B. P. Buckles, and R. Srikanth, "Fuzzy Database Systems - Challenges and Opportunities of a New Era," International Journal of Intelligent Systems, Vol. 11, pp. 649-659 (1996)
[5] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall PTR (1995)
[6] D. H. Kraft and F. E. Petry, "Fuzzy Information Systems: Managing Uncertainty in Databases and Information Retrieval Systems," Fuzzy Sets and Systems, Vol. 90, pp. 183-191 (1997)
[7] X. Liu, "Entropy, Distance Measure and Similarity Measure of Fuzzy Sets and Their Relations," Fuzzy Sets and Systems, Vol. 52, pp. 305-318 (1992)
[8] J. M. Medina, M. A. Vila, J. C. Cubero, and O. Pons, "Towards the Implementation of a Generalized Fuzzy Relational Database Model," Fuzzy Sets and Systems, Vol. 75, pp. 273-289 (1995)
[9] M. Mizumoto, "Pictorial Representations of Fuzzy Connectives, Part I: Cases of t-Norms, t-Conorms and Averaging Operators," Fuzzy Sets and Systems, Vol. 31, pp. 217-242 (1989)
[10] A. Motro, "Query Generalization: A Method for Interpreting Null Answers," Proceedings of 1st International Conference on Expert Database Systems,
pp. 597-616 (1986)
[11] T. Nomura, T. Odaka, N. Ohki, T. Yokoyama, and Y. Matsushita, "Generating Ambiguous Attributes for Fuzzy Queries," Proceedings of 1992 IEEE International Conference on Fuzzy Systems, pp. 753-760 (1992)
[12] C. P. Pappis and N. I. Karacapilidis, "A Comparative Assessment of Measures of Similarity of Fuzzy Values," Fuzzy Sets and Systems, Vol. 56, pp. 171-174 (1993)
[13] S. Parsons, "Current Approaches to Handling Imperfect Information in Data and Knowledge Bases," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 3, pp. 353-372 (1996)
[14] S. L. Wang and Y. J. Tsai, "Null Queries with Interval-Valued Ambiguous Attributes," Proceedings of 1998 IEEE International Conference on Systems, Man, and Cybernetics, San Diego, USA, October (1998)
[15] S. L. Wang and Y. J. Tsai, "Extending Compound Fuzzy Attributes for Fuzzy Queries," Proceedings of 5th International Conference on Soft Computing, Fukuoka, Japan, October (1998)
[16] S. L. Wang and Y. J. Tsai, "Compounding Fuzzy Attributes from Scalars with Uncertainty for Fuzzy Queries," Proceedings of the 1998 6th National Conference on Fuzzy Theory and its Applications, Taiwan, December (1998)
[17] X. Wang, B. D. Baets, and E. Kerre, "A Comparative Study of Similarity Measures," Fuzzy Sets and Systems, Vol. 73, pp. 259-268 (1995)
[18] M. Zemankova, "FILIP: A Fuzzy Intelligent Information System with Learning Capabilities," Information Sciences, Vol. 14, No. 6, pp. 473-486 (1989)
[19] R. Zwick, E. Carlstein, and D. Budescu, "Measures of Similarity among Fuzzy Concepts: A Comparative Analysis," International Journal of Approximate Reasoning, Vol. 1, pp. 221-242 (1987).
Chapter 9 Fuzzy System Description Language
Kazuhiko Otsuka (Meiji University), Yuichiro Mori (Kochi University), and Masao Mukaidono (Meiji University)
Abstract: As a first step toward the standardization of a practical programming language for fuzzy system applications, we proposed the Fuzzy system Description Language (FDL) in 1996. The specification of the first version of FDL was not a definitive edition: it was designed with the hardware coding of fuzzy control systems based on fuzzy inference as a prerequisite. So although it fulfills the intended functions, several problems arise in unexpected applications. In this article, we first describe the specification of the standardized FDL together with its background and properties. Then, we consider some problems (the assignment operation, the comparison operation, the internal expressions, etc.) arising from wider applications of FDL, and describe the corresponding improvements of FDL. Finally, we describe fuzzy inference systems based on the indirect inference method with FDL and discuss some of their properties.
Keywords: Fuzzy Systems, Fuzzy Control, Fuzzy Inference, Discrete Expression
9.1 Introduction

9.1.1 Background
At present, there are various systems that apply fuzzy theory. Most of them aim at fuzzy control based on fuzzy inference. In these systems, the description style of the knowledge bases for fuzzy inference (for example, the rule format and the operations) looks considerably similar from one system to another, although the implementation of operation methods, data expression, and so on is original to each system designer. Consequently, it is very hard to effectively
share knowledge bases that have already been accumulated. In addition to this implementation problem on the software side, in systems using hardware-optimized fuzzy inference, field engineers design the systems with a native development environment that depends on specific hardware, and with primitive languages like assembler, to optimize the stored data and program code in ROM. Under these conditions, programmers have to master highly specialized skills and knowledge, and the system contents are usually hard for others to understand. In order to efficiently share and store knowledge bases, we need a standard system description language for fuzzy systems that does not depend on the system architecture or the target field. In this article, we describe the fuzzy system description language (FDL) as such a standard system description language for fuzzy systems.*
9.1.2 Current status of fuzzy system construction
There are various methods for composing fuzzy systems: using generic built-in hardware as in home electronics, using a native fuzzy processing chip and a large knowledge base as in expert systems, or using a software-only system written in programming languages. These systems are offered individually by the industries, laboratories, and universities concerned with their development. In the case of hardware systems, the development environments are closely tied to the hardware used, so replacing a completed system is very hard. On the software side, system description languages for fuzzy systems have been extended from popular programming languages like C, Lisp, and Prolog to express fuzzy data with user functions, matrices, and processing for fuzzy theory [4],[10]. These extensions use different definitions based on the properties of each system, case by case, and this tendency is especially strong in practical systems. Without a standard description language for fuzzy systems, the cultivated know-how of system development cannot be applied elsewhere. Therefore, we need to standardize the description language for fuzzy systems so that it can help promote fuzzy system development further in the future.
*The fuzzy system description language (FDL) [1],[2],[3] is a part of the 1994 project activities of EIAJ (Electronic Industries Association of Japan), which is organized by hardware makers.
9.2 Fuzzy system description language (FDL)

9.2.1 Outline of FDL
Most fuzzy systems are aimed at fuzzy control based on fuzzy inference. The main data in these systems are fuzzy data, which appear as fuzzy sets and fuzzy truth values expressed with membership functions, but numeric data and character data, as used in ordinary programming, appear as well. Fuzzy inference is the kernel of a fuzzy system; it works together with other parts such as preprocessing and postprocessing. In other words, a fuzzy system is a special case of an ordinary system, involving fuzzy information processing in its main part. So the description language for fuzzy systems must be general enough for ordinary information processing as well as for fuzzy information processing. When designing such a programming language, we can select a language style in one of two ways. One is an original language that can describe fuzzy information processing suitably; the other is an extension of an existing language that is able to handle the fuzzy information processing part. In the former case, we must learn a new language format and programming technique, and such a language has poor affinity to existing systems. It is important which language we choose as the base language for FDL. There have been development environments based on Lisp [6] and Prolog [7],[8], but they are particular development systems dedicated to specific research subjects. In real application fields, hardware designers use native hardware description languages like VHDL (VHSIC Hardware Description Language) [11], Verilog [12], and so on, and they efficiently store programs to ROM with assembler language. A hardware description language is only effective for hardware, and a primitive language like assembler cannot solve the various problems described above. Therefore these languages are not good choices.
Recently, object-oriented languages like C++ and Java have become popular for large system development in project teams, but they have not become popular among programmers in other fields. Moreover, the language specification of Java is being updated frequently. With more practical consideration, FDL is based on ANSI C, which has high portability and is established as a system programming language. As a recent tendency, we have also noticed that development environments on many platforms like Microsoft Windows are shifting to RAD
166
K. Otsuka, Y. Mori & M.
Mukaidono
(Rapid Application Development) and visual development based on object-oriented languages. There are many instances of system development with Java, which adapts to multi-platform and network environments. The affinity between implementations of fuzzy theory and the language features of C++ and Java may be better than with ANSI C. Therefore, standardization based on an object-oriented language will be considered as the next stage of this research in the near future. In the rest of this section, we explain the basis of the FDL standard: Fuzzy System Description Language (EIAJ, AEX2002) [1]. In the next section, we will point out some problems in FDL found through experience with it, and propose improvements for these problems.

9.2.2 Specification of FDL

9.2.2.1 Basic specification of FDL
The data types and operators for fuzzy data are not defined in ANSI C, the base language of FDL. So FDL defines the following:

• fuzzy data type
• notations of membership functions
• operators for fuzzy data
• other functions for fuzzy theory
In addition, as most fuzzy systems are implemented for fuzzy control based on fuzzy inference, FDL also has to include the following:

• management of knowledge bases

The other definitions are the same as in ANSI C. The main purpose of FDL is to describe systems applying fuzzy theory to a wide range of application domains. But there are many matters whose formal treatment in fuzzy theory has not been clearly decided. So at the current stage, we focus on the description of fuzzy inference engines and the parts related to the fundamentals of fuzzy theory.

9.2.2.2 Fuzzy type
Because the base language of FDL is ANSI C, all data used in a program must be declared in advance. We defined the fuzzy type for fuzzy sets and the fuzzy
truth value for numerical truth values. When we declare a fuzzy type, we must specify the tag name, the type, and the range of the support set.

fuzzy name { type, left, right };
type = { int | float | double | enum }

In this example, "name" is the tag name of this fuzzy type. It is treated like a struct or union tag in C. "type" is the support set type of this fuzzy set, and the range of "name" is limited by the two values left and right. The support set types are the integer type (int), the enumeration type (enum) limited to finite elements, and the floating-point types (float, double), which have a continuous space as the support set. In the case of the enumeration type, the declaration form is the following:

enum color { red, green, blue, black, white };
fuzzy background { color };

When the support set is continuous, we have to write a division number following the range fields. If the division number field is omitted, it defaults to 10.

fuzzy speed { double, 0.0, 200.0, 100 };

In the above example, the last value (100) is the division number. This fuzzy type therefore stores its membership function as 101 points, which divide the interval from 0.0 to 200.0 into 100 periods. If a referenced point differs from the storing points, it is assumed to take the nearest storing point. The declaration form of a multi-dimensional fuzzy type is the following:

fuzzy grid { int, 0, 5 } { double, -10.0, 10.0, 50 };

The variable declaration of a fuzzy type is the same as for a numerical type like int in C:
fuzzy speed slow, medium, fast;
9.2.2.3 The notation of membership functions

There are four kinds of notations for a membership function in FDL:
• Enumeration type
• Function type
• Singleton type
• System type
We can use these notations only as initial values at variable declaration.

(a) Enumeration type
This notation is described by pairs of an element and its membership degree (also called a truth value). The membership degrees at points not described are assumed to be 0.

fuzzy TypeName VarName = elements { ( ElementName, TruthValue ), ... };

The fuzzy data VarName belongs to fuzzy type TypeName, and TruthValue is the membership degree of ElementName. For example,

fuzzy color sample_color = elements { (red, 0.5), (yellow, 0.1), (blue, 1.0) };

(b) Function type
This type uses a user-defined function, a facility usual in many general programming languages. The function receives elements as arguments and returns one fuzzy truth value as the membership degree. function is the keyword of this notation, followed by the name of the function used as the definition. The general form is the following:

fuzzy TypeName VarName = function DefFuncName();
fuzzy_truth DefFuncName( ArgList ) { ... }
For example, the fuzzy data slow, which belongs to fuzzy type speed, is defined by the function slow_value:

fuzzy speed slow = function slow_value();
fuzzy_truth slow_value(double pos) { ... }
(c) Singleton type
The singleton type must specify exactly one element. The membership degree is 1 for that element, and 0 for every other.

fuzzy TypeName VarName = singleton { ElementName };

For example, attention_color, in which only the degree of yellow is 1 and every other degree is 0, is described as follows:

fuzzy color attention_color = singleton{ (yellow) };

(d) System notation
At present, most fuzzy types used in fuzzy systems are one-dimensional numeric fuzzy data, which means the support set is either an integer type or a floating-point type. Moreover, their membership functions usually have a simple shape like a triangle or trapezoid. Therefore, we prepared four notations for these membership functions: enumeration with interpolation (points), vector type (vector), triangle type (triangle), and trapezoid type (trapezoid). In these notations, we describe a membership function by pairs of characteristic elements and their membership degrees. The membership degrees of other, unspecified points are calculated by linear interpolation between the two nearest specified elements.

fuzzy speed medium = triangle { 40, 60, 80 };

9.2.2.4 Operator
The operations for the fuzzy type are the following:

• Assignment operation
• Comparison operation
• Logical operation

In an operation between a fuzzy type and an ordinary numerical data type, the latter is cast into fuzzy data whose value is the pair of that element and the membership degree 1.

(a) Assignment operation
The assignment operator is "=", as in C. Numerical data and variables of fuzzy type can be assigned anywhere in the program, but the fuzzy data notations can be used only in variable declarations.
(b) Comparison operation
The comparison operations for fuzzy truth values are the same as in usual C. On the other hand, there are only four comparison operations among fuzzy types, ==, !=, <, >, and they are evaluated only when the support sets are numeric and one-dimensional. "==" can be used for all fuzzy types, and the result is true only when the two fuzzy data are completely identical. The "!=" operator is the negation of "==". In comparisons for larger or smaller, if the two fuzzy data overlap, the result is false, because we consider the comparison impossible in that case. Although there are many possible interpretations here, we adopt the one above.
Fig. 9.1 The comparison of two fuzzy data.
In Fig. 9.1, fuzzy data A is smaller than B and C, but the results of both "<" and ">" between B and C are false.

(c) Logical operation
There are three kinds of logical operations: and, or, not. As one of the features of fuzzy theory, there are various definitions of the logical operations. The popular definitions are of four kinds: logical, algebraic, bounded, and drastic. Moreover, there are many reports on useful combinations of them. Therefore, it is improper to fix their definitions. We adopt the approach that the operator definitions can be selected by programmers, like the operator-overload mechanism in C++.

fuzzy_and my_original_and;
In the above example, fuzzy_and is the keyword for the definition of the and-operation, and the function named my_original_and is the definition of the operator and. This function takes two arguments (one argument in the case of negation), which are fuzzy truth values, and returns one fuzzy truth value. Table 9.1 lists the popular definitions and their function names.
Table 9.1 Keywords of the popular logical operations.

            and                  or
Logical     fuzzy_logic_and      fuzzy_logic_or
Algebraic   fuzzy_algebric_and   fuzzy_algebric_or
Bounded     fuzzy_bounded_and    fuzzy_bounded_or
Drastic     fuzzy_drastic_and    fuzzy_drastic_or
9.2.2.5 Rule base

The rule base expressing the knowledge is the most important part of a system to which fuzzy inference is applied. In general, a rule base is described in if-then form. We can select the operation definitions in each rule base, because there are many inference methods. Table 9.2 shows the selectable operations and their default definitions.
Table 9.2 Operations in the rule base.

Operation     Keyword        Default
And           fuzzy_and      fuzzy_logic_and
Or            fuzzy_or       fuzzy_logic_or
Not           fuzzy_not      fuzzy_one_minus
Modification  modification   fuzzy_logic_and
Aggregation   aggregation    fuzzy_logic_or
rule_base control(fuzzy speed self, fore) (fuzzy action result)
{
    if self is slow and fore is slow then result is no_action;
    if self is slow and fore is fast then result is accelerate;
}
In this example, the two fuzzy data self and fore, which belong to speed, are the input arguments of this rule base, and result of type action is the output data. On the left of "is" in the conclusion part, we can write only an output variable. Moreover, in the conclusion part of a rule we can adopt a one-dimensional linear function (Takagi-Sugeno model [15]), and we can describe a negative conclusion part (else) and a weight part (with) for each rule. The rule base has a mechanism to include other rule bases (include) so as to reuse knowledge bases. C, a procedural programming language, executes statements in order, but in our rule base the rule order does not influence the evaluation; this also applies to included rule bases.

9.2.3 Internal expression
The quantity of information required by fuzzy data is normally much larger than for ordinary data, because fuzzy data are expressed by a set of elements and their membership degrees. This situation is unavoidable given the concept of fuzzy theory, in which the expression is vaguer than the conventional one. But this property makes the information quantity huge when we express fuzzy data completely, especially when the support set is of continuous type. Therefore, we usually use approximate expression methods. Here, we examine the internal expressions of fuzzy data.

9.2.3.1 User function method
Fuzzy data with a continuous support set are usually expressed by analytic functions. One way is to represent the whole membership function with a user-defined function in the program. This method has the advantages of a small data quantity and high precision. However, it is very hard to change the membership shape, because the fuzzy data is part of the program code. Moreover, in general we cannot store the result of a calculation by the same method. We conclude that this method is unfit as the internal expression of fuzzy data.

9.2.3.2 Representative points method
Most membership functions have a simple shape like a triangle or trapezoid. In such cases, we can express the whole membership function with
high precision and a small data quantity by designating typical points. In this method there is no problem with storing the result of a calculation or with changing the data, so it fits as an internal expression of fuzzy data. However, the amount of information needed to specify a shape increases in proportion to the complexity of the shape, and in general fuzzy data become more complicated as operations are repeated. When we implement the method in hardware to gain speed, or compose a built-in control system, we face strong restrictions on memory size and operations. Therefore this method is considered unfit, as the main target of FDL at this point is practical fuzzy control.
Fig. 9.2 Representative Points Method.
9.2.3.3 Interval division method
The final method stores fuzzy data as the elements of an array obtained by dividing the continuous support set at constant intervals. This method usually needs more memory space than the others and is poorer in data precision. However, it is not affected by the shape of the fuzzy data and uses a constant memory size. It is a suitable expression for implementing fuzzy data in hardware, because the amount of data, decided at the start by the requested precision, does not increase. In conclusion, we adopt this expression as the internal expression of fuzzy data in the FDL specification.
Fig. 9.3 Interval Division Method.
Table 9.3 Specification of the expression methods.

User function method
  Weakness: difficult to change degree values; problem with storing the result of a calculation
  Advantage: high precision; expression with low memory space

Representative points method
  Weakness: data amount depends on the shape; need to interpolate when referencing elements
  Advantage: expression with low memory space; high precision

Interval division method
  Weakness: many unused data are contained; slow calculation speed
  Advantage: data amount does not depend on the shape; operations are repetitions of simple work; easy parallel processing
9.3 Improvement of the standardized FDL

9.3.1 A few problems of the standardized FDL
The final aim of FDL is to describe all fuzzy data processing and procedures in fuzzy theory. The specification of FDL standardized by EIAJ [1] is not a definitive edition. It was designed with the hardware coding
of fuzzy control systems based on fuzzy inference as a prerequisite. So, although it fulfills the intended functions, several problems arise in unexpected applications. In this section, we pick up these problems and propose improvements.

9.3.2 Improvement of the assignment operation
In FDL, all fuzzy sets are expressed using a limited amount of data, each datum consisting of a pair of an element and a membership degree, so that we can define fuzzy data by enumerating the pairs. But this notation is very inconvenient for some practical uses. Therefore we equipped FDL with a triangle notation and a trapezoid notation. Fuzzy data defined by those notations are automatically converted into the enumeration form with a specified accuracy. For example, the following definition represents Fig. 9.4 and is actually stored as in Fig. 9.5.

fuzzy speed FAST = triangle { 60, 80, 100 };
Fig. 9.4 Fuzzy data defined by triangle notation.
According to the C language specification (FDL is based on C), these conversion procedures must be defined for each fuzzy type. When the definition is written as an initializer as above, the syntax parser can decide which procedure should be used, because
Fig. 9.5 The data as actually stored.
the definition of its fuzzy type is stated clearly on the left side of the "=" operator.

FAST = triangle { 60, 80, 100 };

When we use dynamic allocation of fuzzy data in an assignment statement, then according to the C syntax analysis rules the parser cannot detect the fuzzy type of the left part. Therefore, for upward compatibility with C, FDL did not define the assignment operation for fuzzy data notations except as initializers. As previously mentioned, the standardized FDL aims at hardware implementation of fuzzy systems. Limiting the assignment operation enables initial fuzzy data to be stored in ROM, which is cheaper than RAM; if fuzzy data were allocated dynamically, the dependence on RAM would become high. However, if we cannot use the assignment operation within a program, in other words, if we cannot dynamically allocate fuzzy data, this is a very strong constraint and a restriction of the expressive ability of the programming language. When we consider the various uses of FDL, flexibility of description is more important than hardware implementation. Additionally, the price of RAM becomes cheaper day by day, and the cost problem is not as important as before. In the current FDL, a function to change degrees is provided only for an individual element of fuzzy data. By applying this function to all elements of the support set, we can realize the same effect as the assignment operation.
But this method is not practical. So we propose to improve FDL such that fuzzy data described with triangle, trapezoid, and so on can be assigned to a variable like an ordinary numerical value.

9.3.3 Improvement of the comparison operation
The comparison operations in the current FDL return a {0,1} value, meaning completely satisfied or not, instead of a fuzzy value in [0,1]. For example, with the coincidence operator (==), only when the degrees of all elements of the two fuzzy data are completely equal is the return value 1 (true); in every other case it is 0 (false).
Fig. 9.6 Comparison of fuzzy data.
For example, in the relation shown in Fig. 9.6, A==B and A==C both return 0 (false). But this definition is clearly different from the general concept of fuzzy theory: in the case of A==B, it would be proper to return a similarity degree in [0,1]. When we considered a definition of the operation on this basis, there was no standard interpretation of the degree of coincidence of two fuzzy data that intersect each other. We cannot easily define this similarity operator now, because many discussions are necessary to fix the operation method. This definition problem of the intersection degree is related to the other comparison operations as well. We can already redefine the logical operators (not, and, or) in an FDL program. Similarly, we propose to change the FDL specification so that the comparison operators can be
redefined in the program.

9.3.4 Improvement of the internal expression
When the support set of fuzzy data is finite, we can express the data by an array or a list structure and use it easily in a program. But when the support set is of continuous type, the only way to express the fuzzy data exactly is to define the membership function as a formula, and this method has many problems. So we express fuzzy data over a continuous support set by finite (limited) data. There are two typical methods for this, as mentioned in 9.2.3.2 and 9.2.3.3:

(1) Interval division method
(2) Representative points method

At present, in the standardized FDL, fuzzy data are treated with method (1) for the hardware implementation of fuzzy systems. But as mentioned earlier, the situation has changed. In view of the characteristics of each method discussed in 9.2.3.2 and 9.2.3.3, we have to reconsider the functions and abilities necessary for representing fuzzy data in future usage.

9.3.4.1 New requirements and abilities for FDL
One of the targets of the standardized FDL is cost reduction, in other words, reducing the number of parts (chips). A small memory and a simple processor are good for hardware implementation. Additionally, a widely usable fuzzy system needs constant precision and stable throughput. So the interval division method was adopted to satisfy those requirements. At the current stage, however, FDL is used for experimental production in research and development and for algorithm verification; in other words, cases of software-level system construction will increase. In such cases, the requirements on hardware resources, precision, and throughput differ from the case of hardware implementation. There is no strong need for memory efficiency, because recent computers have much memory; instead, precision is more important than throughput.

• Requirements for hardware implementation
  - Small and constant memory required
  - Constant precision
  - Stable throughput with a simple processor

• New requirements
  - Dynamic definition of fuzzy data
  - High precision
  - Adaptation to the constraints of hardware implementation

9.3.4.2 Fusion of the two methods
The internal expression of fuzzy data must change to follow the shift in the purpose of FDL. But we cannot throw away the hardware implementation capability, because applications of FDL to hardware realization have not faded out. Therefore, we added the representative points method to the interval division method as an internal expression of fuzzy data. When we define a fuzzy type as usual, we can use the representative points method; in this case, the division number of the interval division method becomes the upper limit of the number of elements.
Fig. 9.7 Example of a definition with near points.
In this case, however, some problems remain. When points are extremely close together, as shown in Fig. 9.7, information is lost when the expression method is converted (Fig. 9.8). This problem can be avoided on the programmer side, because the division by a designated number depends on the precision demand that the programmer specifies. A hardware implementation
Fig. 9.8 Information loss by changeover of the expression method.
must limit the expression method to the interval division method. This should be decided by a compiler option or a limitation of the system specification.
9.4 Enhancement of the indirect inference
The popular inference method in fuzzy logic is the direct inference method proposed by Zadeh [16],[17], Mamdani [18], and others. There is also the indirect inference method proposed by Tsukamoto [20], Baldwin [19], and others. Direct inference is calculated directly by set operations on the fuzzy data given as input and knowledge. Indirect inference, on the other hand, executes inference by first mapping the whole fuzzy data into the space of fuzzy truth values, processing the essential part of the inference in that truth-value space, and then returning the result to the original space. In this section, we focus on the indirect inference method; in the next section, we examine the problems, and their solutions, that arise when describing a system using the indirect inference method with FDL.

9.4.1 The truth qualification
For example, suppose we are given the following fuzzy predicate:
((x) is A) is more-or-less-true

In this case, when we fix the variable x to an element e of the universal set U, we can turn the above predicate into the following proposition:

((e) is A) is more-or-less-true

If the linguistic truth value more-or-less-true is expressed by τ, then we can get the new truth value corresponding to e by the following function:

μA'(e) = μτ(μA(e))

We obtain the new fuzzy set A' when this operation is executed over the whole universal set U. The operation is easily shown in Fig. 9.9, in which the left part is the original figure rotated 90° to the left.
Fig. 9.9 The truth qualification.
9.4.2 The converse truth qualification
The converse truth qualification is the inverse problem of the truth qualification. For example, when the following equation and the fuzzy predicates A, B are given, we can calculate the linguistic truth value τ from the two fuzzy sets A and B. This process is called the converse truth qualification.
((x) is A) is τ  =  (x) is B

In this case, we can translate the above predicate into the following functions:

μτ(μA(x)) = μB(x)
μτ(s) = μB(μA⁻¹(s))

We can obtain τ by this process, shown in Fig. 9.10 in the same manner as Fig. 9.9.
Fig. 9.10 The converse truth qualification.
When the membership function of the fuzzy data A is a one-to-one correspondence, we can obtain τ easily, as shown in Fig. 9.10. In this article, we consider only the case where A is one-to-one.

9.4.3 The extension for FDL
In the truth qualification and the converse truth qualification, the main features not present in the direct inference process are the following:

(1) An inverse function of the membership function is required.
(2) The membership degrees of a fuzzy datum have to correspond to the elements of the truth values. These points apply to the internal expression of fuzzy data. We have to calculate the converse function to execute the converse truth qualification, but this is very hard with the user function method (9.2.3.1). The popular solution is to have programmers prepare it in advance, but this is obviously inconvenient and impractical. Therefore the user function method is unsuitable as a general internal expression of fuzzy data for this reason, too. In the interval division method (9.2.3.3), the support set is divided at a constant interval, but there is no restriction on the side of the membership degrees. In the truth qualification and the converse truth qualification, the membership degrees of one fuzzy datum become the elements of the other fuzzy datum (the fuzzy truth value) in the next step; therefore, in general, the intervals of the membership degrees and the elements of the fuzzy truth value usually mismatch. In the truth qualification shown in Fig. 9.11, the elements of τ do not correspond to the membership degrees of A, so we must calculate each membership degree by interpolating between the two nearest elements of τ. Moreover, in the converse truth qualification shown in Fig. 9.12, we have to interpolate more times than in the truth qualification. Theoretically, when we repeat these operations alternately, all fuzzy sets remain unchanged because the truth qualification and the converse truth qualification are dual. But if we use fuzzy sets expressed by the interval division method, we may get the result shown in Fig. 9.13. The example of Fig. 9.13 is the result of over 100 loops, where μA(x) = x², μB(x) = 1 - √x and the interval of division is 0.1, and the operations of the truth qualification and the converse truth qualification are applied alternately. The upper figure of Fig. 9.13 is of B and the lower figure is of τ, where the final result in τ becomes a straight line. In actual indirect inference we do not repeat the operations so many times, but this problem still affects the result of indirect inference. In the interval division method, the number of interpolations increases with the number of operations. Therefore, it is concluded that the representative points method is effective as the internal expression of fuzzy data for these operations.
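The error accumulation described above can be reproduced with a small numerical sketch (illustrative only, not FDL code): the fuzzy sets are stored only at 0.1 intervals as in Fig. 9.13, and the two qualification operations are alternated using piecewise-linear interpolation.

```python
def interp(t, xs, ys):
    """Piecewise-linear interpolation of the tabulated points (xs, ys)."""
    if t <= xs[0]:
        return ys[0]
    if t >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= t <= xs[i + 1]:
            w = (t - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] * (1 - w) + ys[i + 1] * w

# interval-division representation: degrees stored only at steps of 0.1
xs = [i / 10 for i in range(11)]
mu_A = [x * x for x in xs]            # reference set, mu_A(x) = x^2 (invertible)
mu_B = [1 - x ** 0.5 for x in xs]     # initial set, mu_B(x) = 1 - sqrt(x)

b = list(mu_B)
for _ in range(100):
    # truth qualification: tau(t) = b(mu_A^{-1}(t)), with mu_A^{-1}(t) = sqrt(t);
    # b is only known on the grid, so the value must be interpolated
    tau = [interp(t ** 0.5, xs, b) for t in xs]
    # converse truth qualification: b'(x) = tau(mu_A(x)), again interpolated
    b = [interp(a, xs, tau) for a in mu_A]

# the two operations are exact duals in continuous mathematics, so any
# deviation of b from mu_B is purely accumulated interpolation error
drift = max(abs(u - v) for u, v in zip(b, mu_B))
```

With the representative points method the interpolation points follow the membership degrees themselves, so this particular source of drift is avoided.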
K. Otsuka, Y. Mori & M. Mukaidono
Fig. 9.11 The interpolation in the truth qualification.
Fig. 9.12 The interpolation in the converse truth qualification.
Fig. 9.13 Example of the accumulation of the calculation error (upper: final result of B; lower: final result of τ).
9.5 Conclusion
In the expression of continuous fuzzy data there are two problems: one is theoretical, and the other concerns actual processing on the computer. For example, one of the theoretical problems is that many definitions of fuzzy operations are not yet settled, and one of the problems in actual processing on computers is how to represent continuous membership functions in memory using a finite number of data points. The realization and standardization of a fuzzy system description language, in which the above problems have to be considered, are expected. But until now, as fuzzy system description languages were designed by individual developers, the efficiency of developing fuzzy systems has remained unacceptable. To improve this situation, we proposed the Fuzzy system Description Language (FDL) in 1996. This FDL is the first step toward standardization of a practical programming language for fuzzy applications. In this article, as a continuation of that work, some considerations were given to the problems raised by the experience of describing fuzzy systems with FDL. We verified the internal expressions of fuzzy data, which become important when the area of application of FDL is expanded, and chose a representation method suitable for applications in a wide range of fields. We will continue to investigate its suitability for various uses. Furthermore, we concluded that the definitions of almost all fuzzy operators can be redefined, except for those with a unique definition in a theoretical sense. The plain C language does not have such a facility, but it is a very important facility for fuzzy data processing. In the current FDL, the processing and description styles are based on the C language. We believe that an object-oriented language is suitable for the proposed FDL. From the standpoint of the current movement of software systems, we need to consider the adoption of an object-oriented language for the next version of FDL. This is a subject for future study.
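The operator redefinition discussed above is easy to illustrate in an object-oriented host language. The following Python sketch (the class and the t-norm choices are illustrative, not part of FDL) defines intersection and union operators whose underlying t-norm and s-norm can be swapped by the user:

```python
class FuzzySet:
    """Discrete fuzzy set: membership degrees over a fixed universe.

    Illustrative only; FDL itself is C-based. This shows the kind of
    operator redefinition an object-oriented language makes possible."""

    def __init__(self, degrees, t_norm=min, s_norm=max):
        self.degrees = list(degrees)
        self.t_norm = t_norm    # redefinable intersection operator
        self.s_norm = s_norm    # redefinable union operator

    def __and__(self, other):   # A & B -> fuzzy intersection
        return FuzzySet([self.t_norm(a, b) for a, b in
                         zip(self.degrees, other.degrees)],
                        self.t_norm, self.s_norm)

    def __or__(self, other):    # A | B -> fuzzy union
        return FuzzySet([self.s_norm(a, b) for a, b in
                         zip(self.degrees, other.degrees)],
                        self.t_norm, self.s_norm)

A = FuzzySet([0.2, 0.8, 1.0])
B = FuzzySet([0.5, 0.4, 0.9])
standard = (A & B).degrees          # min t-norm: [0.2, 0.4, 0.9]
product = FuzzySet(A.degrees, t_norm=lambda a, b: a * b)
algebraic = (product & B).degrees   # product t-norm (approx. [0.1, 0.32, 0.9])
```

The same expression `A & B` yields different operators depending on how the set was constructed, which is exactly the redefinability the conclusion calls for.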
References
[1] EIAJ: Standard Specification of Fuzzy System Description Language, EIAJ AEX-2002, 1996.
[2] Kazuhiko Otsuka, Yuichiro Mori, Masao Mukaidono: Steps Toward Standardization of Fuzzy System Description Language, Proceedings IIZUKA '96, pp. 259-263, 1996.
[3] Kazuhiko Otsuka, Yuichiro Mori, Masao Mukaidono: Steps Toward Standardization of Fuzzy System Description Language II, Proceedings IIZUKA '98, pp. 953-956, 1998.
[4] Makoto Abe, Mikio Nakatsuyama, Hiroaki Kaminaga: Fuzzy and its application, Proceedings First Asia Fuzzy System Symposium, pp. 666-671, 1993.
[5] Motohide Umano, Itsuo Hatano, Hiroyuki Tamura: Data Structures and Manipulation for Fuzzy Sets on Digital Computers, 9th Fuzzy System Symposium in Sapporo, pp. 77-80, 1993.
[6] Motohide Umano, Kenji Kume, Itsuo Hatono, Hiroyuki Tamura: Common Lisp Implementation of Fuzzy-Set Manipulation System, Proceedings First Asia Fuzzy System Symposium, pp. 660-665, 1993.
[7] Liya Ding, Zuling Shen, Masao Mukaidono: The Properties of Fuzzy Logic for Fuzzy Prolog, Proceedings First Asia Fuzzy System Symposium, pp. 648-653, 1993.
[8] Hiroaki Kikuchi, Masao Mukaidono: Linear resolution for fuzzy logic program, Journal of Japan Society for Fuzzy Theory and Systems, Vol. 6, No. 2, pp. 294-303, 1994.
[9] Yoshifumi Inoue: Fuzzy Set Processing with C++, 8th Fuzzy System Symposium in Hiroshima, pp. 353-356, 1993.
[10] Masuo Furukawa, Takeshi Miura, Takashi Matsuda: The Programming Language for Fuzzy Control, 10th Fuzzy System Symposium in Osaka, pp. 551-554, 1994.
[11] IEEE Standard VHDL Language Manual, Std 1076-1987, IEEE, NY, 1988.
[12] IEEE Standard Description Language Based on the Verilog(TM) Hardware Description Language, Std 1364-1995, IEEE, NY.
[13] Toyuhiko Hirota, Torao Yanaru: A Digital Representation of Fuzzy Number and Its Calculation, Proceedings of International Conference of Fuzzy Logic & Neural Networks, pp. 527-530, 1990.
[14] Mayaka F. Kawaguchi, Tsutomu Date: On Calculations of Fundamental Operations of Weakly Non-Interactive Fuzzy Number, Journal of Japan Society for Fuzzy Theory and Systems, Vol. 4, No. 3, pp. 93-105, 1992.
[15] T. Takagi, M. Sugeno: Fuzzy Identification of Systems and Its Applications to Modeling and Control, IEEE Trans. Systems, Man, and Cybernetics, SMC-15, 1, pp. 116-132, 1985.
[16] L. A. Zadeh: The Concept of a Linguistic Variable and Its Application to Approximate Reasoning, Part I, Information Sciences, 8, pp. 199-249, 1975; Part II, Information Sciences, 8, pp. 301-357; Part III, Information Sciences, 9, pp. 43-80.
[17] L. A. Zadeh: Fuzzy Logic and Approximate Reasoning, Synthese, Vol. 30, pp. 407-428, 1975.
[18] E. H. Mamdani, S. Assilian: An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller, Int. J. Man-Machine Studies, Vol. 7, pp. 1-13, 1974.
[19] J. F. Baldwin: A new approach to approximate reasoning using a fuzzy logic, Fuzzy Sets and Systems, Vol. 2, 1979.
[20] Y. Tsukamoto: An approach to fuzzy reasoning method, in Advances in Fuzzy Set Theory and Applications, edited by M. M. Gupta et al., North-Holland, 1979.
[21] Masao Mukaidono, Kazuyuki Nojima: The Relation Between the Direct Approach and Truth Space Approach in Approximate Reasoning System, Journal of Japan Society for Fuzzy Theory and Systems, Vol. 4, No. 2, pp. 325-333, 1992.
Part II: Knowledge Representation, Integration, and Discovery by Soft Computing
Chapter 10 Knowledge Representation and Similarity Measure in Learning a Vague Legal Concept
Ming-Qiang Xu¹, Kaoru Hirota¹, Hajime Yoshino²
¹ Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8502, Japan
² Meiji Gakuin University, Legal Expert Laboratory
Abstract Knowledge representation and similarity measure play an important role in classifying vague legal concepts. In order to take fuzziness and context-sensitive effects into account in the representation of precedents, a fuzzy factor hierarchy is studied. Current distance-based and feature-based similarity measures are only surface-level ones that can do no more than compare objects, so a deep-level similarity measure that can evaluate the results of the surface-level one is needed. A structural similarity measure, factor-based similarity, which integrates the surface-level and deep-level ones, is proposed, together with an argument model based on the proposed knowledge representation and similarity measure. Considering the vague legal concepts in the United Nations Convention on Contracts for the International Sale of Goods (CISG), a fuzzy legal argument system is constructed. The main purpose of the proposed system is to support law education. Keywords: legal expert system, fuzzy logic, knowledge representation, legal reasoning, vague legal concept, CISG, similarity measure, legal argument, case-based reasoning, factor hierarchy, context, retrieval
M. Xu, K. Hirota & H. Yoshino
10.1 Introduction
It is known that vague concepts exist in knowledge-based systems [11]. Usually there is no single explicit representation of an entire concept or class, but only representations of examples of the vague concept. For a query case, by learning from the examples, it can be determined whether the query case belongs to the vague concept. Many methods have been developed to address this issue. Argument seems to attract the interest of researchers from various areas, especially legal reasoning, decision theory, philosophy, psychology and cognitive science [10]. Knowledge representation and similarity measure play an important role in classifying vague legal concepts. In conventional legal reasoning systems, fuzziness has not been deeply considered, especially in legal knowledge representation, and similarity measures have not been sufficiently investigated because they cannot give a context-sensitive similarity measure. Our goal is to develop a computational and representational model of argument considering fuzziness and context-sensitive effects. The factors of a case are usually employed to represent the case [11]. Because there is uncertainty and vagueness in the information in cases, a case does not always have explicit and specific factors. Therefore, a fuzzy factor hierarchy composed of an issue, atomic factors and abstract factors is studied for case representation. The current distance-based and feature-based similarity measures are only surface-level ones that can just make a comparison between objects. Therefore, a deep-level similarity measure that can evaluate the results of the surface-level one, namely, a context-based similarity measure, is proposed. On the basis of these viewpoints, a structural similarity, i.e., factor-based similarity, that integrates the surface-level and deep-level ones is proposed to model the legal argument.
Considering the vague concept in the CISG (United Nations Convention on Contracts for the International Sale of Goods), a fuzzy legal argument system is constructed. The fuzzy factor hierarchy is introduced in section 2. The similarity measure in legal reasoning is summarized in section 3. The structural similarity measure, which is a factor-based one, is proposed in section 4. The extension of distance-based and feature-based similarity measures is discussed in section 5. The context-sensitive similarity measure is proposed in section 6. The legal argument based on the structural similarity is described in section 7. An experiment on classifying the vague concept of the CISG is illustrated in section 8.
10.2 Fuzzy Factor Hierarchy
Concept representation using factors is an approach often used in legal expert systems. In order to represent the complex relations between a concept and its factors, a factor hierarchy is also used [11]. In the literature, a factor is either an element of a case or not, and whether a query case belongs to a concept is judged by the common and distinct factors between the query case and precedent cases. However, not all the factors of a case are crisp: a factor may be an element of a case only to some extent, and the value of a factor may also be a fuzzy linguistic expression. So the traditional factor hierarchy is not an appropriate approach for modelling the human internal representation of cases. A fuzzy factor hierarchy for the representation of legal cases is proposed.
Definition 1 A fuzzy factor is composed of Name, Degree, Values, Relations and Agent, denoted by the tuple <N, D, V, R, A>, where
N: Name, describing the facts of cases,
D: Degree, indicating to what extent a case has the factor,
V: Factor Values, representing how extreme the factor is in a case to which it applies,
R: Relations, representing the level of support or opposition for other factors,
A: Agent, representing the side whose viewpoint the factor expresses.
The names of factors are usually symbolic expressions describing the facts of cases. A case usually consists of a set of factors, where the dimension of the set is the number of factors belonging to it. In traditional legal expert systems, the degree to which a factor is an element of the factor set that describes a case is either 1 or 0, in other words, in a Yes/No form. But it is sometimes difficult to judge in the Yes/No manner because of vagueness and uncertainty. The degree to which a factor is an element of a factor set can be described in several ways.
We employ the membership concept and the vagueness concept to represent the fuzzy Yes/No [7], where specific knowledge described by limited words is represented by the concept of membership, and the uncertainty of knowledge is represented by the concept of vagueness. The values of a factor are its magnitudes, often represented by quantities. The quantitative property is related to numeric data, and it is an important notion in an argument system. It usually has three types of representation: crisp data, interval data and fuzzy data. The relation between factors can be strengthened or weakened with qualifiers, which may be crisp or fuzzy. In different domains, different relational expressions exist; for example, in the case of the CISG, they could be expressions like {Best Support, Support, Contrary, Best Contrary}. Each factor represents the viewpoint of one of the agents who takes part in the argument. In a legal argument system, the agents usually include only a defendant and a plaintiff.
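Definition 1's five-tuple can be sketched as a small record type. The field names, the value encodings, and the example factor below are illustrative choices, not part of FDL or of the actual CISG system:

```python
from dataclasses import dataclass
from typing import Tuple, Union

# V may be a crisp number, an (lo, hi) interval, or a triangular fuzzy value
Value = Union[float, Tuple[float, float], Tuple[float, float, float]]

@dataclass
class FuzzyFactor:
    """Fuzzy factor <N, D, V, R, A> of Definition 1 (field names and
    encodings are illustrative, not fixed by the text)."""
    name: str        # N: symbolic description of the fact
    degree: float    # D: extent to which the case has the factor, in [0, 1]
    value: Value     # V: crisp, interval, or triangular fuzzy magnitude
    relation: str    # R: support/opposition level, e.g. "++", "+", "-", "--"
    agent: str       # A: whose viewpoint, "plaintiff" or "defendant"

# hypothetical factor for the Issue "the proposal is sufficiently definite"
f1 = FuzzyFactor(name="quantity_is_fixed", degree=0.8,
                 value=(0.6, 0.8, 1.0), relation="+", agent="plaintiff")
```

The triangular tuple for `value` anticipates the triangular membership functions used for factor values in section 5.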
Legend: Y = factor favors the issue; N = factor disfavors the issue; ++ = Best Support; + = Support; -- = Best Contrary; - = Contrary
Fig. 10.1 An Example of Fuzzy Factors Hierarchy in the CISG
Fuzzy factors are divided into fuzzy atomic factors and fuzzy abstract factors.
Fuzzy atomic factors represent the surface facts of a case. The abstract factors are the connections between the claim of a vague concept and the fuzzy atomic factors. The claim of a vague concept is called the Issue. A fuzzy factor hierarchy is defined in Definition 2.
Definition 2 A factor hierarchy is composed of three layers.
Top layer: Issue
Middle layer: Abstract factors
Bottom layer: Atomic factors
The top layer of a fuzzy factor hierarchy contains only a single node, representing the claim of a vague concept. The bottom layer holds the fuzzy atomic factors. The factors of the middle layer are structured by fuzzy abstract factors. A node can be expressed by the conjunction (and) and disjunction (or) of its subnodes, and there is usually more than one node in the middle layer. The fuzzy factor hierarchy itself represents a causal connection between an Issue and the related fuzzy atomic factors as well as the fuzzy abstract factors. Figure 10.1 shows an example of a fuzzy factor hierarchy of the legal reasoning system for the CISG. The top-level Issue is a claim of the vague concept "The proposal is sufficiently definite". F1, F2, F3, F4, F5, F6 and F7 are abstract factors, while f1, f2, f3, f4, f5 and f6 are atomic factors. The meaning of these factors will be explained in section 8.
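The three-layer hierarchy of Definition 2 can be sketched as a small recursive evaluator. Reading "and" as min and "or" as max is a common fuzzy interpretation assumed here for illustration (the chapter does not fix the connectives), and the hierarchy shown is hypothetical, not the actual structure of Fig. 10.1:

```python
def evaluate(node, degrees):
    """Propagate atomic membership degrees up a factor hierarchy.

    A node is either an atomic factor name (bottom layer) or a pair
    ("and" | "or", children); 'and' is read as min, 'or' as max."""
    if isinstance(node, str):                # bottom layer: atomic factor
        return degrees[node]
    op, children = node
    values = [evaluate(c, degrees) for c in children]
    return min(values) if op == "and" else max(values)

# hypothetical Issue node with two abstract factors in the middle layer
issue = ("and", [
    ("or", ["f1", "f2"]),     # illustrative abstract factor
    ("and", ["f3", "f4"]),    # illustrative abstract factor
])
degrees = {"f1": 0.3, "f2": 0.9, "f3": 0.7, "f4": 0.6}
support = evaluate(issue, degrees)   # min(max(0.3, 0.9), min(0.7, 0.6)) = 0.6
```

The recursion mirrors the causal connection the hierarchy encodes: atomic degrees flow through the abstract factors to yield a single degree for the Issue.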
10.3 Similarity Measure in Legal Reasoning
In legal reasoning, the similarity measure is not as simple as in other domains such as pattern recognition or data mining, since not only numeric values but also symbolic values must be considered. The special characteristics of the similarity measure in legal reasoning are given below.
Considering the Fuzziness
The fuzziness in a legal case has been represented by fuzzy sets in section 2, and the same fuzziness should be considered in the similarity measure as well. In fact, fuzziness is also a cause of legal argument. Because of fuzziness,
similarity and distinction in the judgments of the plaintiff and the defendant exist.
Considering the Agent's Viewpoint
A characteristic of legal reasoning is that the agents typically include a plaintiff and a defendant. Because an argument exists between them, the viewpoints of the two agents should be considered in the similarity measure. Usually, one agent proposes a claim based on the similarities between cases, whereas the other agent uses the distinctions between the cases to downplay the importance of the similarities. Without distinction, there is no argument. So, in legal argument, both similarity and distinction, which express the viewpoints of the two agents, should be measured according to the movement of the legal argument.
Context-Based Similarity Measure
In some situations, the similarity measure changes with the meaning of the context; e.g., in a fuzzy factor hierarchy, the similarity between the atomic factors differs with the related abstract factors. So the context plays an important role in the similarity measure. In legal argument, the significances of similarity and distinction are emphasized by the plaintiff and the defendant, respectively. Thus, besides the distance-based similarity and the feature-based similarity, a similarity measure based on the context, which can measure the significances of similarity and distinction, should be considered in making an argument.
Integrated Similarity Measure
The similarity measure in legal argument changes with contexts and viewpoints. Consequently, it is not the same as the conventional distance-based and feature-based similarity measures, which only have final numeric results, but is composed of several stages of similarity measures, which change with the moves of the argument made by the plaintiff and the defendant. Therefore, the final similarity measure should be an integration of the distance-based, feature-based and context-based similarity measures.
Similarity Measure with Explanation
The similarity measure in legal reasoning aims to classify vague concepts. It is necessary for users to know not only the conclusion but also the reasoning process. The process explaining the classification of a vague concept is probably even more important in the case of a law education system for students.
The analysis above shows that the current similarity measures are not sufficient to meet the required characteristics. To measure the similarity in the legal argument, first, the traditional similarity measures are extended, and then a new similarity measure is proposed in the next section.
10.4 Structural Similarity Measure: Factor-based One
When measuring the similarity between a query case and a precedent case, intuitively, we first find the precedent that is the most on-point case in the case base. At this stage, the similarity denotes only the precedent that has the largest number of common features with the query case. Then, the distinctions between the factors of the two cases are found by the dissimilarity measure made by the opponent. Finally, the significance of the similarities and distinctions is further evaluated by both sides. In order to correspond with this cognitive process of human beings, the similarity measure is classified into surface-level similarity and deep-level similarity. The comparison of these two types of similarity measure is summarized in Table 10.1.
Table 10.1 Comparison of Surface-level Similarity and Deep-level Similarity

                           Surface Level                 Deep Level
Knowledge Representation   Case-Dependent                Domain-Dependent
Output                     Individual, Numeric Value     Aggregation, Symbolic Value
Function                   Retrieval, Comparison         Retrieval, Evaluation
For the factor hierarchy representation, where the knowledge of a case is represented by multiple levels of factors, the similarity measure between cases is made at the layers of atomic and abstract factors. The similarity measure thus varies with the knowledge of cases described at the different levels of factors. It includes not only the component similarity measure, such as the one between atomic factors, but also the
relation between atomic factors and abstract factors. The former is the surface similarity and the latter is the deep similarity. Their integration is the structural similarity measure shown in Figure 10.2. Namely, the factor-based similarity measure (F_sim) is composed of the distance-based one (D_sim), the Tversky model (T_sim) and the newly proposed context-based one (C_sim), as the following equation shows, where I denotes a domain-dependent integration:

F_sim = I(D_sim, T_sim, C_sim).    (1)
Fig. 10.2 Structural similarity (surface-level components: distance-based and feature-based similarity; deep-level component: context-based similarity)
A surface similarity measure has several properties (e.g. symmetry). These properties are not necessarily satisfied by the structural similarity because of its complexity and domain-knowledge dependence. The properties are investigated below.
1. Minimality
Not all cases in the precedent base have the same status. Certain objects are considered more prototypical than the others, or in some way more central to the domain, or particularly salient and distinctive [2]. This suits the legal domain: if the precedents were sentenced by different courts, a preference order is present among them.
2. Symmetry
Because structural similarity is an integration of surface-level similarity and deep-level similarity, there is no simple symmetry. Generally, we cannot say that a precedent (P) is similar to a query case (Q), just as it cannot be said that the parents are similar to the child [1]. The reason is that the precedent case precedes the query case, and it was sentenced in the court. However, we can say that a precedent and a query case are similar with respect to a context such as the concept "a proposal is sufficiently definite".
S_context(P, Q) = S_context(Q, P).
3. Triangular inequality
This axiom states that if a is similar to b, and b is similar to c, then a and c cannot be very dissimilar. However, a and c can be related to b in different aspects. In structural similarity, transitivity does not exist in a broad sense, i.e. if case P1 is similar to case Q1 and P1 is similar to case Q2, then Q1 and Q2 are not necessarily similar to each other, because the similarities exist in different contexts and viewpoints.
10.5 Extensions of Distance-based and Feature-based Similarity Measures
Similarity measures are defined at three layers: atomic factors, values of atomic factors, and the aggregation of similarities and dissimilarities. Appropriate similarity measures are chosen and applied based on the requirements. The notion of similarity is usually based on distance (metric) and on feature sets [1]. Comparisons of factor values are measured by the distance-based similarity. Comparisons of fuzzy atomic factors are measured by fuzzy T-norms. The aggregation of similarities and dissimilarities is based on an extension of the feature similarity measure proposed by Tversky [1].
10.5.1 Comparisons of Factor Values
A triangular membership function can be used to represent the fuzziness of factor values. For example, the magnitude of the importance of the important part in the vague concept "The proposal is sufficiently definite" in the CISG can be described by a fuzzy membership function. There are several methods for determining similarity measures of fuzzy sets [5]. Because the fuzzy set used here becomes a singleton when the factor value becomes crisp data, and two fuzzy sets sometimes do not overlap, the methods in [5] cannot handle these cases. The difference between the centers of gravity of the two membership functions A and B, i.e., |CG(A) - CG(B)|, is used to describe the similarity degree. To satisfy the axioms of similarity theory, the degree of similarity S(A, B) is calculated by

S(A, B) = 1 - |CG(A) - CG(B)|.    (2)
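Equation (2) admits a direct sketch. The triangular (left, peak, right) encoding, the use of the triangle's centroid as CG, and the normalisation of factor values to [0, 1] are assumptions made here for illustration:

```python
def center_of_gravity(tri):
    """CG of a triangular membership function given as (left, peak, right);
    the centroid of a triangle is the mean of its vertices' abscissae.
    A crisp value can be passed as the degenerate triangle (v, v, v)."""
    a, b, c = tri
    return (a + b + c) / 3.0

def similarity(tri_A, tri_B):
    """Eq. (2): S(A, B) = 1 - |CG(A) - CG(B)|, assuming factor values
    are normalised to [0, 1] so that S stays in [0, 1]."""
    return 1.0 - abs(center_of_gravity(tri_A) - center_of_gravity(tri_B))

A = (0.2, 0.4, 0.6)     # fuzzy factor value of the precedent
B = (0.5, 0.5, 0.5)     # crisp (singleton) value of the query case
s = similarity(A, B)    # CG(A) = 0.4, CG(B) = 0.5, so S is about 0.9
```

Because the comparison reduces each set to a single point, it also works when the two membership functions do not overlap, which is the case the methods in [5] cannot handle.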
10.5.2 Fuzzy Feature-based Similarity Measure
The modification of the Tversky model is based on considering fuzziness and the factor hierarchy, and includes several extensions to the model. The Tversky model is modified in the following way.
Fuzziness
To deal with fuzziness in the existence of factors, the Tversky model is extended into a model that considers fuzziness. Even though an extension considering fuzziness already exists in [3], it does not satisfy the requirements of the legal argument, so a further modification is introduced here. Let CB = {P1, P2, ..., Pi, ..., Pn} be the case base, and Q be a query case. Each case is represented by a set of factors: Pi = {m_Pi^1, ..., m_Pi^k, ..., m_Pi^t} and Q = {m_Q^1, ..., m_Q^k, ..., m_Q^t}, where m_Pi^k and m_Q^k denote the degrees to which the factor fk (k = 1, ..., t, with t the number of atomic factors) exists in the cases Pi and Q, respectively. They are judged in terms of the membership and vagueness concepts introduced in [7]. The similarity S and the distinctions DS1, DS2 between a query case Q and a precedent case Pi are:
Knowledge Representation
and Similarity
Measure ...
199
S(Q, Pi) = f(Q ∩ Pi),    (3)
DS1(Q, Pi) = f(Q - Pi),    (4)
DS2(Pi, Q) = f(Pi - Q).    (5)
Factor Set Similarity (FSS):
f(Q ∩ Pi) = Σ_{k=1}^{t} SH(m_Q^k, m_Pi^k),    (6)
where SH is a distance-based similarity measure between fuzzy sets, as discussed in the previous section.
Factor Set Dissimilarity (FSDS):
f(Q - Pi) = Max_{k=1,...,t} {Max(m_Q^k - m_Pi^k, 0)},    (7)
f(Pi - Q) = Max_{k=1,...,t} {Max(m_Pi^k - m_Q^k, 0)}.    (8)
The distinction between Pi and Q is not decided simply by the cardinality of the difference set, but by finding the factor that has the largest Max(m_Q^k - m_Pi^k, 0). The degree to which a factor is an element of a case is determined by the center of gravity of the membership function described by the membership concept and vagueness concept.
Factor Hierarchy
The fuzzy factor hierarchy, which is composed of the factor values, the atomic and abstract factors, as well as their relations, should be considered in the similarity measure. When only the atomic factors are applied, the similarity measure is a surface one. The relations between atomic factors and abstract factors have a great influence on the similarity measure between the atomic factors in terms of the Tversky model. By considering the abstract factors of the fuzzy
Fig. 10.3 Factor-based Similarity Measure (abstract factors linking the precedent and the query case)
factor hierarchy representation, this surface similarity can be developed into a deep similarity that can evaluate the similarity and the distinction. This development is, in fact, achieved by a new similarity measure that can evaluate the significance of similarity and distinction (Figure 10.3). Significance here means: finding similarity within the distinction according to the other viewpoint and context, in order to downplay the distinction; and finding distinction within the similarity according to the other viewpoint and context, in order to downplay the similarity. In other words, a part of the expected extension of the Tversky model is, in fact, provided by a deep similarity measure, referred to as a context-based similarity measure, which will be discussed in the next section. In conclusion, the factor-based similarity measure is a structural one, composed of the distance-based measure (D_sim) and the Tversky model (T_sim), both considering fuzziness, and the context-based one (C_sim). The way these similarity measures are integrated is domain dependent.
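The factor-set measures of Eqs. (3)-(8) admit a direct sketch. Treating the degrees as scalars (their centers of gravity) and defaulting SH to the Eq. (2)-style measure are assumptions made here, since the chapter leaves SH's exact form open:

```python
def factor_set_similarity(q, p, sh=lambda a, b: 1.0 - abs(a - b)):
    """Eq. (6): f(Q ∩ P) = sum over k of SH(m_Q^k, m_P^k).

    q and p are the membership degrees of the t atomic factors; SH
    defaults to the distance-based 1 - |a - b| (an assumption -- the
    text only says SH is a distance-based similarity measure)."""
    return sum(sh(mq, mp) for mq, mp in zip(q, p))

def factor_set_dissimilarity(q, p):
    """Eqs. (7)/(8): the largest one-sided excess max(m_Q^k - m_P^k, 0);
    call with the arguments swapped to obtain DS2(P, Q)."""
    return max(max(mq - mp, 0.0) for mq, mp in zip(q, p))

Q = [0.9, 0.4, 0.0]    # degrees of atomic factors f1..f3 in the query case
P = [0.7, 0.5, 0.6]    # degrees in the precedent
S = factor_set_similarity(Q, P)        # about (0.8 + 0.9 + 0.4) = 2.1
DS1 = factor_set_dissimilarity(Q, P)   # about max(0.2, 0, 0) = 0.2
DS2 = factor_set_dissimilarity(P, Q)   # about max(0, 0.1, 0.6) = 0.6
```

Note that DS1 and DS2 differ: each side of the argument gets its own distinction measure, which is what lets the plaintiff and defendant emphasize different factors.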
10.6 Context-based Similarity Measure
The relation between context and similarity measure was first discussed only a few years ago [4] and has not been studied sufficiently. In this section, the context-based similarity measure is first defined and classified in several ways. Then its most important classification, namely retrieval and evaluation, is further defined, and its properties are discussed and proved.
10.6.1 Definition and Classification
The similarity measure is affected by the context, namely, it changes in different situations. For example, the goods Jet Engine System and Cultivator Unit appear incomparable because they are not the same type of goods, have different functions, etc. However, they are not only comparable but also similar if judged by considering the composition of the goods, i.e., both are goods composed of several parts. The relation between context and similarity is defined below.
Definition 3 (Context-based Similarity Measure) Given two objects, a query object O_Q and a target object O_T, a context CX, and a viewpoint VP, the similarity measure is a relation between them, represented by the following formula:
C_sim(O_T, O_Q, CX, VP),    (9)
where O_T is the target object, O_Q is the query object, CX is a context, VP is a viewpoint, and C_sim denotes the context-based similarity measure relation. In the above formula, the similarity measure is related to four parameters. If a parameter is a constant, it is an input; if a parameter is a variable, it can be regarded as an output. According to whether each parameter is constant or variable, the similarity relation has different functions. If CX in C_sim is known, C_sim has a function of retrieval, namely, retrieving the relevant elements from the target object according to CX. If CX is unknown, but O_Q or O_T contains some known parts, then the similarity measure makes an evaluation on O_Q or O_T by finding an element of CX. Therefore, the similarity measure can be classified as:
Retrieval, Evaluation
C_sim is a deep similarity measure. Retrieval plays the role of a relation between the surface-level and the deep similarity measures, while evaluation is the essence of C_sim. The retrieval and evaluation functions of the context-based similarity C_sim will be discussed in detail in the following sections. Retrieval and evaluation in the similarity measure can be classified further based on different aspects, such as the knowledge resource and the change of context. According to the knowledge resources of the context, it can be classified by:
Factor Values, Factors, Similarity Measure
The context can include constraint conditions such as the range of a factor value, a specific factor, or matching criteria, used to retrieve the expected information from a case or a case base. It allows us to assess the similarity and retrieve the relevant information by means of the constrained context; the relevance between the current task and the current goal is returned in a semantic way. On the other hand, the similarity and distinction in the surface-level similarity measure can be evaluated by a context such as abstract factors. The similarity measure in the above classification is the result of the surface-level similarity measure. According to the viewpoint of the change of context, it can be classified by:
Static: the context is predefined by a user or expert.
Dynamic: the context is obtained through the reasoning process.
The context can be the constraint conditions predefined by users, for example, the value range of an atomic factor. If the predefined constraint is fuzzy, then the context-based similarity includes the similarity measure between fuzzy sets. It can also be an intermediate result inferred by the system, in case it is decided by a reasoning process. Which context is used is judged based on criteria that come from the knowledge representation. So, given a target, the context used for the similarity measure changes with the reasoning mechanism. In this sense, the similarity measure is a learning-based measuring process.
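A static context of predefined value ranges, as described above, can be sketched as follows. The dict encoding and the factor names are illustrative choices; the sketch also exhibits the monotonicity property: a more restrictive context retrieves fewer factors.

```python
def retrieve(case, context):
    """R_CX sketch: retrieve the factors of a case whose values satisfy
    the context, given here as predefined value ranges (a 'static'
    context in the section's terms; the encoding is illustrative)."""
    return sorted(name for name, value in case.items()
                  if name in context
                  and context[name][0] <= value <= context[name][1])

case = {"f1": 0.8, "f2": 0.3, "f3": 0.6}   # hypothetical atomic-factor degrees

loose = {"f1": (0.0, 1.0), "f2": (0.0, 1.0), "f3": (0.0, 1.0)}
strict = {"f1": (0.7, 1.0), "f2": (0.5, 1.0), "f3": (0.0, 1.0)}

r_loose = retrieve(case, loose)     # every factor satisfies its range
r_strict = retrieve(case, strict)   # f2 = 0.3 fails the range (0.5, 1.0)

# monotonicity: the more restrictive context retrieves a subset
assert set(r_strict) <= set(r_loose)
```

Relaxing or restricting the ranges adjusts the retrieval result, which is exactly the control the monotonicity discussion in the next subsection relies on.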
Fig. 10.4 Illustration of Retrieval in Similarity Measure
10.6.2 Retrieval in Similarity Measure
The retrieval in the proposed similarity measure has multiple purposes, including retrieving the most on-point case from the case base, the shared and unshared atomic factors between cases, the factors whose values are similar or different, and so on. A definition is as follows.
Definition 4 (Retrieval) Given a target object O_T, an element f_Q of a query object O_Q, a context F ∈ CX, and a viewpoint VP, R_CX(O_T, O_Q, CX, VP) means that some f_T ∈ O_T is an output if f_Q and f_T are viewed as the same in the context F and the viewpoint VP.
The parameters of R_CX change with the inference process. For example, the search space O_T may be a set of precedents, or a set of cases retrieved from the precedent base; CX may be the surface-level similarity, or the predefined range of factor values, etc.
Monotonicity in retrieval
Monotonicity is very useful in retrieval. Retrieval results can be adjusted by controlling the range of a context. Obviously, if there are no ideal objects in the results, the initial context may be too restrictive and should be relaxed, whereas the context should be restricted to diminish the number of retrieved objects. In order to obtain effective retrieval, the context must be ordered. Assume CX_i ⊂ CX_j, namely, CX_j is more restrictive than CX_i. By the
204 M. Xu, K. Hirota & H. Yoshino
Fig. 10.5 Illustration of Evaluation in Similarity
above definition, R_CX(O_T, O_Q, CX_i, VP) and R_CX(O_T, O_Q, CX_j, VP) imply f_Tj ⊆ f_Ti, where f_Ti and f_Tj are the elements of O_T retrieved in the sense of CX_i and CX_j, respectively. So fewer results are retrieved under CX_j. This is because a more restrictive context means that more factors or factor values are required to satisfy the context.
10.6.3 Evaluation in Similarity Measure
The evaluation function follows the retrieval in order to produce the similarity measure in a context and a viewpoint.
Definition 5 (Evaluation) Given a target object O_T, a query object O_Q, a context set CX, and a viewpoint VP, E_CX(O_T, O_Q, CX, VP) outputs F ∈ CX if there exist f_T ∈ O_T and f_Q ∈ O_Q such that f_T and f_Q are viewed as the same in the context F ∈ CX and the viewpoint VP. It can be further classified as follows.
E_CX(O_T, f_Q, CX, VP): O_T and CX are considered as a reference, and f_Q is compared to the elements of O_T by finding F ∈ CX and f_T ∈ O_T such that f_T and f_Q are the same in F and VP.
E_CX(f_T, O_Q, CX, VP): O_Q and CX are considered as a reference, and f_T is compared to the
elements of O_Q by finding F ∈ CX and f_Q ∈ O_Q such that f_T and f_Q are the same in F and VP.
E_CX(f_T, f_Q, CX, VP): f_T and f_Q have the same status and neither of them is a reference. Find F ∈ CX such that f_T and f_Q are the same in F and VP.
In the above formulas, if an F is found, the result returns True; otherwise, it returns False. The properties of the evaluation in the similarity measure are discussed below.
Symmetry in evaluation
With respect to the similarity in evaluation, if in O_T there exists an f_T that is the same as f_Q ∈ O_Q in the context F ∈ CX and viewpoint VP, the result is True. If the order is changed, i.e. the known input is f_Q, and there exists an f_T that is similar to f_Q in the sense of F ∈ CX and viewpoint VP, the result is also True. Consequently, if f_T and f_Q are viewed as the same in the sense of CX and VP, the evaluation satisfies symmetry.
Transitivity in evaluation
Let an element A of O_T or O_Q viewed in context CX and viewpoint VP be denoted A_CX,VP. If A ∈ O_T is the same as B ∈ O_Q in CX and VP, and B ∈ O_T is the same as C ∈ O_Q in CX and VP, it follows that
A_CX,VP = B_CX,VP and B_CX,VP = C_CX,VP; therefore, A_CX,VP = C_CX,VP.
This property is useful when citing a similar precedent to support one's own viewpoint and to oppose the viewpoint of an opponent.
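As a hypothetical illustration (the chapter gives no code, and all names here are invented), the retrieval and evaluation functions R_CX and E_CX can be sketched as follows, with a context modelled as a set of constraint predicates:

```python
# Hypothetical sketch of the retrieval R_CX and evaluation E_CX functions.
# A context F is a list of constraint predicates; two factor values are
# "viewed as the same" in F if they satisfy every constraint in F.

def same_in_context(f_t, f_q, F):
    """True if two factor values are viewed as the same under context F."""
    return all(constraint(f_t, f_q) for constraint in F)

def retrieve(target, f_q, contexts, viewpoint=None):
    """R_CX: return the elements of the target object that match f_q
    under at least one context F in CX."""
    return [f_t for f_t in target
            if any(same_in_context(f_t, f_q, F) for F in contexts)]

def evaluate(f_t, f_q, contexts, viewpoint=None):
    """E_CX: return the first context F under which f_t and f_q are the
    same, or None (i.e. False) if no such context exists."""
    for F in contexts:
        if same_in_context(f_t, f_q, F):
            return F
    return None

# Monotonicity: relaxing the context enlarges the retrieved set.
exact = [lambda a, b: a == b]
close = [lambda a, b: abs(a - b) <= 5000]   # assumed fuzzy-range constraint
target = [50000, 48000, 30000]
print(retrieve(target, 48000, [exact]))          # -> [48000]
print(retrieve(target, 48000, [exact, close]))   # -> [50000, 48000]
```

The relaxed run retrieves more elements, mirroring the monotonicity property discussed above.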
10.7 Structural Similarity and Making Argument
The interpretation of a vague legal concept in a case is related to the debate between plaintiff and defendant. Whether a precedent case is similar to a query case is usually debated by these two sides. In an argument, if one side analogizes the query case to a precedent case, the other side will distinguish them; and if one side emphasizes the similarity, the other side will downplay it, namely, emphasize their dissimilarity. So both similarity and dissimilarity should be measured. A factor has different values in different cases. It should also be considered that every factor has a pro or con direction in the reasoning process.
The proposed argument model in the legal reasoning system consists of three steps:
1. Side 1's Claim
The similar cases are retrieved from a set of examples. The case whose facts have the largest similarity degree to those of the query case is taken as the most on-point case P_mopc. If the conclusion of case P_mopc favors one side, that conclusion is regarded as this side's claim.
I_claim = { The conclusion of the case P_mopc } ∪ { The factors that support the conclusion of the case P_mopc }
P_mopc is decided by the factor-based similarity and the retrieval in the similarity measure:
R_case(CB, Q, T_S, VP) = O_claim,    (10)
where VP denotes the viewpoint of Side 1, and the similarity T_S defined in Section 10.3 is the context. When T_S satisfies
T_S = Max{ f(Q, P_i) }, i = 1, ..., r, P_i ∈ CB,    (11)
it is known that O_claim = P_mopc.
2. Side 2's Objection
The other side finds the distinctions between the query case and the case P_mopc, and emphasizes them. The distinctions include the differences between the shared factors, and the unshared factors. The former means finding the differences between the values of shared factors; the latter means finding, among the unshared factors, the factor that favors this side.
I_objection = { The factor O_df^Q that has the largest dissimilarity degree T_DS1 in the query case } ∪ { The factor O_df^P that has the largest dissimilarity degree T_DS2 in the precedent P_mopc } ∪ { The shared factors O_dv that have different factor values }
They are decided by the factor-based similarity and the retrieval in the similarity measure:
R_factor(P_mopc, Q, T_DS1, VP) = O_df^Q,    (12)
R_factor(Q, P_mopc, T_DS2, VP) = O_df^P,    (13)
R_factor(P_mopc, Q, D_sim, VP) = O_dv.    (14)
The similarity measures should satisfy the following equations:
T_DS1 = f(Q − P_mopc),    (15)
T_DS2 = f(P_mopc − Q).    (16)
D_sim denotes the dissimilarity between numeric values; it can be computed by the usual distance-based similarity measure [12].
3. Side 1's Rebuttal
The first side downplays the dissimilarity by finding, with the evaluation function introduced in the previous section, the factors that can disregard the differences emphasized by the other side. The distinctions found by Side 2 are evaluated by Side 1.
I_rebuttal = { The abstract factors O_sf that can downplay the differences }
E_factor(P_mopc, O_df^Q, CX, VP) = O_sf^Pmopc,    (17)
E_factor(Q, O_df^P, CX, VP) = O_sf^Q,    (18)
where CX is a set of abstract factors of the fuzzy factor hierarchy, and VP is the viewpoint of Side 1. In these three steps, if there are relevant cases that support a side, they should be cited. Through these three steps, the computational model is structured as shown in Figure 10.6. A context-sensitive interpretation I_inference, namely, the output of the proposed argument model, can be obtained as
I_inference = I_claim ∪ I_objection ∪ I_rebuttal.    (19)
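The three-step model can be sketched in code. This is a hypothetical illustration only: the function and variable names are invented, and the factor-based similarity is simplified to a shared-factor overlap score rather than the chapter's measure.

```python
# Hypothetical sketch of the three-step argument model (Eqs. 10-19).
# Cases are dicts with a "factors" map {factor_name: degree}; the
# similarity below is a toy stand-in for the factor-based measure.

def similarity(q_factors, p_factors):
    """Toy factor-based similarity: overlap of shared factor degrees."""
    shared = set(q_factors) & set(p_factors)
    return sum(min(q_factors[f], p_factors[f]) for f in shared)

def argue(query, case_base):
    # Step 1 -- Side 1's claim: retrieve the most on-point case P_mopc.
    p_mopc = max(case_base, key=lambda p: similarity(query["factors"], p["factors"]))
    claim = {p_mopc["conclusion"]} | set(p_mopc["factors"])

    # Step 2 -- Side 2's objection: emphasize the unshared factors.
    objection = set(query["factors"]) ^ set(p_mopc["factors"])

    # Step 3 -- Side 1's rebuttal: abstract factors (from an assumed
    # abstraction mapping) that cover the distinguished factors.
    abstract = query.get("abstract", {})
    rebuttal = {abstract[f] for f in objection if f in abstract}

    # Eq. (19): the interpretation is the union of the three parts.
    return claim | objection | rebuttal

# Toy data loosely echoing the experiment in Section 10.8.
query = {"factors": {"f1": 1.0, "f2": 0.8, "f4": 0.9},
         "abstract": {"f4": "F7", "f7": "F7"}}
jet = {"factors": {"f1": 1.0, "f2": 0.9, "f7": 0.7},
       "conclusion": "not sufficiently definite"}
other = {"factors": {"f5": 0.5}, "conclusion": "sufficiently definite"}
print(argue(query, [jet, other]))
```

Here the unshared factors f4 and f7 are unified under the abstract factor F7 in the rebuttal, as in the Cultivator Case argument of Section 10.8.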
Fig. 10.6 Computational Model of Fuzzy Legal Argument
10.8 Experiment
This experiment is based on a vague concept and the cases of the CISG. The vague concept "The proposal is sufficiently definite" in the CISG is employed to illustrate how to make a legal argument with the proposed approach. The fuzzy factor hierarchy of this vague concept, focused on the fixing of the price, is shown in Figure 10.1. The meanings of the factors are as follows.
f1: The important part has a price
f2: The attachment has no price
f3: The attachment is not sold in the market
f4: There is no product that can substitute the attachment
f5: There is an important part in the goods
f6: There is an attachment
f7: There is no similar product for the attachment
F1: Indicating the goods
F2: Fixing the quantity
F3: Making provision for determining the quantity
F4: Fixing the price
F5: Making provision for determining the price
F6: The goods are composed of several parts
F7: There is no market price
The following Cultivator Case is used as a query case. 1) On April 1, company C in New York dispatched a letter containing an offer to the business branch of a Japanese company D in Hamburg, the content of which is that C sells a set of cultivators to D (the price of the tractor itself is $50,000; the tractor should be equipped with a rake, which is a product of company E; the farming machinery is delivered by a U.S. cargo ship). 2) The letter reached D on April 8. 3) On April 9, D telephoned C to say: "I accept your offer, but you should transport the machinery in a Japanese container."
Students can first decide the degrees to which this query case has the factors by referring to the atomic factors of the fuzzy factor hierarchy, and then, in the light of the output of the legal argument, learn argument skills and further comprehend the meaning of the vague concept and the query case by comparing with the arguments they made themselves. In the case base there are 8 precedents represented by the fuzzy factor hierarchy; for example, the atomic and abstract factors of the Jet Engine Case related to the issue are as follows:
f1: The important part has a price
f2: The attachment has no price
f3: The attachment is not sold in the market
f5: There is an important part in the goods
f6: There is an attachment
f7: There is no similar product for the attachment
If the following atomic factors are considered to be the properties of the query case (the Cultivator Case), an example of the output of this system for the query case is shown in Figure 10.7.
f1: The important part has a price
f2: The attachment has no price
f3: The attachment is not sold in the market
f4: There is no product that can substitute the attachment
Fig. 10.7 An Example of the Legal Argument
f5: There is an important part in the goods
f6: There is an attachment
The explanation of the process of the argument is as follows.
Plaintiff's Claim: The proposal of the query case is not sufficiently definite, because the query case has the highest similarity degree with the Jet Engine Case, which has the conclusion that the proposal is not sufficiently definite, and because the query case has the factors f2, f3.
Defendant's Objection: The Jet Engine Case is not applicable to the query case, because there is f4 in the query case, and f7 in the precedent.
Plaintiff's Rebuttal: The Jet Engine Case is still applicable to the query case: even though there is f4 in the query case, it is the same as f7 in the Jet Engine Case under the meaning of the abstract factor F7. They both support the abstract factor F7: there is no market price. So, the proposal is not
sufficiently definite. The output of the system changes with the judgments selected by the user. It is helpful for users to see how the results change with different inputs, and it helps them learn the skill of making a legal argument. It is also helpful for students to understand the meaning of the statutory rules of the CISG and the meaning of the precedents and the query case from the viewpoints of plaintiff and defendant.
10.9 Conclusion
The proposed structural similarity measure, the factor-based similarity measure, establishes a framework for similarity measurement in legal argument. It is very important for knowledge-based systems in which objects must be represented by content that cannot be captured by a simple flat structure. This work develops the study of similarity measures in legal reasoning and provides a theoretical basis for using similarity measures to build more effective and efficient intelligent legal reasoning systems. The proposed approach can be applied to further domains beyond the legal one, such as diagnosis and decision support systems, especially where a strong domain theory is not available. By the fuzzy factor hierarchy, the uncertainty and vagueness of concepts are represented. Similarity and dissimilarity are used to represent the debates between agents, and the argument based on the similarity and dissimilarity measures is modeled. In terms of the factor hierarchy and the organization of the argument model, a vague concept can be learned from examples. An example legal reasoning system is used to verify the effectiveness of this model. Extending the case base and considering hypotheticals in the argument are items of future work.
References
[1] Amos Tversky, "Features of Similarity", Psychological Review, Vol. 84, No. 4, pp. 327-352, 1977.
[2] Athena Tocatlidou, "Learning-based Similarity Measurement for Fuzzy Sets", Int. J. of Intelligent Systems, Vol. 13, pp. 193-220, 1998.
[3] Bernadette Bouchon-Meunier et al., "Towards general measures of comparison of objects", Fuzzy Sets and Systems, Vol. 84, pp. 143-153, 1996.
[4] Y. Chang, "Context-Dependent Similarity", in Uncertainty in Artificial Intelligence 6, P. P. Bonissone et al. (eds.), pp. 41-47, North-Holland, NY, 1991.
[5] S. M. Chen, M. S. Yeh, P. Y. Hsiao, "A Comparison of Similarity Measures of Fuzzy Values", Fuzzy Sets and Systems, Vol. 72, No. 1, pp. 79-89, 1995.
[6] Edwina L. Rissland, Kevin D. Ashley, "A Case-Based System for Trade Secrets Law", Proc. of ICAIL'87, pp. 60-65, 1987.
[7] Kaoru Hirota, "Extended Fuzzy Expression of Probabilistic Sets", in Advances in Fuzzy Set Theory and Applications, M. M. Gupta et al. (eds.), North-Holland, pp. 201-214, 1979.
[8] Kaoru Hirota et al., "A Precedent-based Legal Judgement System Using Fuzzy Database", Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 4, No. 6, pp. 573-580, 1996.
[9] Nikola Schretter, "A Fuzzy Logic Expert System for Determining the Required Waiting Period After Traffic Accidents", EUFIT'96, 1996.
[10] Nikos Karacapilidis et al., "Using Case-Based Reasoning for Argumentation with Multiple Viewpoints", Proc. of ICCBR 1997, pp. 541-552.
[11] Vincent Aleven, Kevin D. Ashley, "How Different Is Difference? Arguing About the Significance of Similarities and Differences", Proc. of EWCBR'96, pp. 1-15, 1996.
[12] R. Zwick, E. Carlstein et al., "Measures of Similarity Among Fuzzy Concepts: A Comparative Analysis", Int. J. of Approximate Reasoning, Vol. 1, pp. 221-242, 1987.
Chapter 11 Trend Fuzzy Sets and Recurrent Fuzzy Rules for Ordered Dataset Modelling
J. F. Baldwin, T. P. Martin, J. M. Rossiter
University of Bristol, UK
Abstract: We present two methods of modelling ordered datasets using Baldwin's mass assignment. The first method generates a simplified memory-based fuzzy belief updating model; results are given in application to particle classification and facial feature detection. The second method uses a new, high-level fuzzy trend feature based on a set of fuzzy trend prototypes. These prototypes are closely related to human perceptions of shape in ordered series. The models generated using this method are concise and linguistically clear glass-box models; results are given in application to sunspot and simple sine-wave data series.
Keywords: mass assignment, fuzzy sets, perception-based modelling, memory-based modelling, ordered datasets, time series, trend modelling, belief updating, Fril, high-level features, recurrent fuzzy rules
11.1 Introduction and background
In this paper we describe how mass assignment can be the enabling factor in the modelling and prediction of ordered datasets. The models produced are clear, concise and descriptive glass-box models of ordered data. Results are given for gas particle classification, for facial feature extraction from digital images and for time series problems, including the prediction of sunspot activity. We take two approaches to the problem of modelling ordered datasets, both derived from simple analysis of human behaviour. Our simple perception-based and memory-based models of ordered datasets aim to use high-level features and mechanisms based on human behaviour.
214 J. F. Baldwin, T. P. Martin & J. M. Rossiter
Perception-based modelling using soft computing with mass assignment is based on the high-level perception mechanisms used by humans to sense their environment. This includes human hearing, touch, smell and, in our case, vision. The decomposition and interpolation properties of soft computing techniques using mass assignment are well suited to manipulating these complex features. Memory-based modelling using soft computing does not focus on the features being manipulated, such as the perception-based features described above, but on how the computing model captures human belief and memory. Implicit in this model is a representation of belief and a method of updating beliefs. Our tool for this soft computing research is the fuzzy set theory based on mass assignment [4]. We summarize the features of mass assignment used in this paper and point the reader towards further information on this exciting soft computing paradigm. We assume the reader has a basic knowledge of probability, possibility and classical fuzzy set theory and has, at the very least, read Zadeh's original fuzzy set publication [11]. This paper builds on the research first presented in [6] and [7].
11.1.1 A mass assignment interpretation of fuzzy sets
Baldwin's mass assignment unifies probability, possibility and fuzzy sets into a single theory. This section is intended as a brief summary of mass assignment and a pointer to more detailed literature.
11.1.1.1 Mass assignment definition
A mass assignment m is defined on the powerset P(X) of the universe X such that the following three conditions hold:
(1) m: P(X) → [0, 1]
(2) m(∅) = 0
(3) Σ_{A ∈ P(X)} m(A) = 1
Consequently we can say that ∀A ∈ P(X), m(A) ≤ 1. Note also that m(A) can be greater than m(B) even when A ⊂ B. Complete certainty can be expressed by the mass assignment m(A) = 1, where A is a singleton in X and ∀B ≠ A, m(B) = 0.
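The three conditions can be checked mechanically. The following is a minimal sketch (not from the paper; names are invented) that validates a candidate mass assignment over a finite universe, with focal sets represented as frozensets:

```python
# Minimal sketch checking the three defining conditions of a mass
# assignment over a finite universe X. Focal sets map to masses.

def is_mass_assignment(m, universe, tol=1e-9):
    """Check m: P(X) -> [0,1], m(empty set) = 0, masses sum to 1."""
    for focal, mass in m.items():
        if not focal <= universe:          # focal sets must lie in P(X)
            return False
        if not (0.0 <= mass <= 1.0):
            return False
    if m.get(frozenset(), 0.0) != 0.0:     # no mass on the empty set
        return False
    return abs(sum(m.values()) - 1.0) < tol

X = frozenset({"x1", "x2", "x3"})
m = {frozenset({"x1"}): 0.2,
     frozenset({"x1", "x2"}): 0.3,
     frozenset({"x2", "x3"}): 0.4,
     frozenset({"x1", "x2", "x3"}): 0.1}
print(is_mass_assignment(m, X))  # True
```

The example mass assignment is the one used in the next subsection to illustrate families of probability distributions.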
Trend Fuzzy Sets and Recurrent Fuzzy Rules ... 215
Complete uncertainty can be expressed by the mass assignment m(X) = 1.
11.1.1.2 Mass assignments and probability distributions
Given these restrictions it is clear that a mass assignment defines a family of probability distributions over the universe X. Take for example a mass assignment m = {x1}: 0.2, {x1, x2}: 0.3, {x2, x3}: 0.4, {x1, x2, x3}: 0.1 defined over the universe X = {x1, x2, x3}. This mass assignment enforces the following probability restrictions, where u, v, x and y are variables restricting the family of probability distributions allowed:
Pr(x1) = 0.6 − x − u − v
Pr(x2) = x + y + u
Pr(x3) = 0.4 − y + v
based on the restriction variables u, v, x and y with
0 ≤ x ≤ 0.3
0 ≤ y ≤ 0.4
u, v ≥ 0
0 ≤ u + v ≤ 0.1
11.1.1.3 A voting model of fuzzy sets
Mass assignments are related directly to fuzzy sets, and hence to possibility distributions, through Baldwin's voting interpretation of fuzzy sets [1]. Using the voting interpretation, any normalized fuzzy set can be expressed as a mass assignment. Figure 11.1(a) shows a simple discrete fuzzy set T = {a/1, b/0.7, c/0.5, d/0.1}. The mass assignment m is derived from T using the voting interpretation as illustrated in Figure 11.1(b). In this example we consider a group of 10 voters, each of whom must vote on the acceptance of a member of P(X) given T. From Figure 11.1(b) we see that voters 10, 9 and 8 accept only a; voters 7 and 6 accept only a or b; voters 5, 4, 3 and 2 accept only a, b or c; and voter 1 accepts any of a, b, c or d. Normalizing the number of voters accepting each proposition generates the mass assignment m for the fuzzy set T shown in Eq. (1):
m = {a}: 0.3, {a, b}: 0.2, {a, b, c}: 0.4, {a, b, c, d}: 0.1    (1)
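The voting conversion can be sketched directly in code (a sketch under the assumption of a normalized fuzzy set; the function name is invented): sort the memberships in descending order, and give each nested focal set a mass equal to the drop to the next membership level.

```python
# Sketch of the voting-model conversion from a normalized fuzzy set to
# its mass assignment. Each focal set collects the elements whose
# membership reaches a given level; its mass is the drop to the next
# level (or to zero for the last element).

def fuzzy_set_to_mass(fuzzy):
    """fuzzy: dict element -> membership, with max membership 1."""
    items = sorted(fuzzy.items(), key=lambda kv: -kv[1])
    masses = {}
    focal = []
    for i, (elem, mu) in enumerate(items):
        focal.append(elem)
        nxt = items[i + 1][1] if i + 1 < len(items) else 0.0
        if mu - nxt > 0:
            masses[frozenset(focal)] = round(mu - nxt, 10)
    return masses

T = {"a": 1.0, "b": 0.7, "c": 0.5, "d": 0.1}
# Reproduces Eq. (1): {a}: 0.3, {a,b}: 0.2, {a,b,c}: 0.4, {a,b,c,d}: 0.1
print(fuzzy_set_to_mass(T))
```

Applied to the fuzzy set T above, this yields exactly the mass assignment of Eq. (1).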
Fig. 11.1 A voting interpretation of fuzzy sets
Having converted a fuzzy set into a mass assignment we can now use the calculus of mass assignment to reason with fuzzy sets at the mass level. The advantage of this representation is the close relationship between mass assignments and their corresponding families of probability distributions. Mass assignment therefore provides the crucial link between probability and fuzzy sets. This is a great enabler in developing soft computing solutions based on a more unified theory than that commonly used by the fuzzy logic community. The programming language Fril implements mass assignment and the support logic calculus in a logic programming framework. For more information on Fril, mass assignment and support logic see [8] and [5].
11.2 Memory-based fuzzy belief updating
In this section we present a simple belief updating system using recurrent fuzzy rules. As we shall show, this system improves class prediction in ordered datasets. We approach the problem of belief updating from a human-centred memory and belief angle.
11.2.1 Why belief updating for ordered datasets?
The belief updating method proposed in this section was developed in response to problems encountered in classifying data in particle streams. The following example outlines the particle classification problem.
The particle stream classification problem
This problem involves the detection of hazardous particles in a stream of gases. It is important both in closed spaces, such as chemical plants and mine shafts, and in open spaces, such as during toxic chemical leaks and chemical and biological warfare. The gas to be analysed flows past a sensor. This sensor generates a feature tuple <E1, E2, E3, E4, T> where E1, ..., E4 are continuous domain features and T is some measure of time. Figure 11.2 shows this more clearly.
Fig. 11.2 A particle detector
A system is needed which takes each tuple in the ordered dataset in turn and generates a confidence that the gas currently being detected is of a particular class. This confidence can be used to calculate the concentration of each gas in a mixture.
Problems exist in such ordered datasets where class boundaries are indistinct. Indistinct classification is clearly an area where soft computing can have an impact. Given that the datasets we are dealing with contain some (however loose) ordering, it is natural to use information contained in the order itself when modelling the dataset. As with many other methods we could, of course, consider some window of size m on the ordered dataset and process all or part of the window. This is shown schematically in Figure 11.3, where e_{t=0}, ..., e_{t=n} is the input stream of evidence and c_{t=n} is the classification based on the evidence stream at time t. Clearly this can lead to computational explosion as the size of the window increases, although this is also determined by the methods employed by the artificial decision maker. It is more practical, efficient and elegant to model the ordering of data as a stream of evidence presented to a human being. This is shown in Figure 11.4. The human decision maker here is only given one piece of evidence e_{t=n} at a time and must judge the classification c_{t=n} at any time using only its memory of past evidence and the current evidence.
Fig. 11.3 Decision making on a windowed evidence stream
Fig. 11.4 Decision making on a point-by-point evidence stream
To keep complexity to a minimum our model uses only a single memory component, in much the same way as a human would. This memory component holds a measure of belief in a classification given all previous evidence. The problem is then to construct a belief updating model using a single memory term that both represents human belief updating and is robust. There are many other real-world situations where such a simple belief and memory model derived from human behaviour is applicable. The student grades example below is one such problem.
A student grades example
A student is working for his exams. To help him prepare he is given weekly tests, and the student's teacher must assess the competence of the student after each test. Competence is graded as low, medium or high. Table 11.1 shows that, given the test scores, the teacher would grade this student with low competence for each of the first three weeks. At the fourth week the student's grades improve markedly and the teacher faces a problem in assessing the competence A of the student. If the assessment of competence A is low then the teacher is clearly discounting the weight of the fourth week's test result. If the competence A is judged high then the teacher is placing too much emphasis on the latest scores and not enough on past performance.

Week  Score  Competence
1     32     low
2     35     low
3     34     low
4     78     A

Table 11.1 Student grades example

Since competence is generally agreed to be a measure of overall performance in a subject, some aggregation of previous scores is appropriate. In reality most teachers would grade the student with medium competence at the end of week 4. This example shows how the teacher's judgment is a function of the belief in the student's past competence and the latest test results.
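The teacher's behaviour can be sketched with a single-memory update. This is a hypothetical illustration only, not the authors' model: the blending weight alpha and the grade thresholds are invented for the example.

```python
# Hypothetical single-memory belief update for the student grades
# example: blend the previous competence belief with the latest score.
# alpha and the grade thresholds are assumed parameters.

def update_competence(belief, score, alpha=0.3):
    """Blend previous belief (0-100 scale) with the latest test score."""
    return (1 - alpha) * belief + alpha * score

def grade(belief):
    return "low" if belief < 45 else "medium" if belief < 70 else "high"

belief = 32.0                      # initialized from the week-1 score
for score in [35, 34, 78]:         # weeks 2-4 from Table 11.1
    belief = update_competence(belief, score)
print(grade(belief))  # "medium" at the end of week 4
```

With these assumed parameters the model neither ignores the week-4 jump nor overreacts to it, matching the "medium" judgment most teachers would make.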
In the next section we examine the implementation of our simple belief updating model based on human behaviour using recurrent fuzzy rules. The performance of such a system is then shown in application to particle classification in gaseous streams and the detection of lip areas in facial images.
11.2.2 Fuzzy belief updating methods
Single-point classification using mass-assignment-based Fril fuzzy rules generates a support y_t for a class c at time t given the current input attributes {a_1, ..., a_n}; y_t is therefore some function of the input attributes, i.e., y_t = f({a_1, ..., a_n}). Belief updating can be thought of as a process separate from classification, where the current class support y_t forms the input attribute to the belief model. The belief model generates a class support given all the previous class supports. The class support y'_t from the belief model is therefore some function g of the current class support from the classifier and some aggregation of previous class supports, i.e., y'_t = g(y_t, ..., y_{t-m}). In the following three subsections we outline three possible belief updating methods:
• external fuzzy belief updating
• internal fuzzy belief updating
• recurrent interacting fuzzy belief updating
11.2.2.1 External belief updating method
Figure 11.5 shows how a single fuzzy rule is combined with an external belief updating model to generate support for a given class. Here classification and belief updating are processed separately and sequentially. A belief updating model such as Einhorn and Hogarth's anchoring-and-adjustment model, as discussed in [9], can be used.
Fig. 11.5 A fuzzy rule with belief aggregation
The advantage of this system is that any standard belief updating method can be applied after the fuzzy rule system. The disadvantage is that problems can be encountered with such external systems due to complex interactions between the fuzzy rule system and the belief updating model.
11.2.2.2 Internal belief updating method
To simplify the external belief updating model described above we can incorporate past support as an extra term in the fuzzy rule and use this to replace the external belief model. This feedback rule has now internalized the belief updating component. The internalized belief rule, as shown in Figure 11.6, is simple and does not interact with any of the other class rules. If a series of inputs is presented to these rules and we then ask the question "what is the predicted class?", the answer would be the class given the highest rule support. This answer discards all other class supports and therefore ignores any belief the model may have in any other class. This is inappropriate when two different classes have high supports which are almost identical.
Fig. 11.6 A simple support feedback rule
We propose that a more appropriate representation for predicted class supports is to form a fuzzy set called "predicted class" across the whole class universe. We propose the introduction of such a fuzzy set into our rule system, and will refer to this fuzzy set from now on as the predicted class fuzzy set. We will later detail how the predicted class fuzzy set is generated from rule supports. The memberships of each class in the predicted class fuzzy set are calculated from the normalized rule supports using Baldwin's voting model of fuzzy sets. Eq. 2 is an example of a predicted class fuzzy set constructed on the class universe {a,b,c}.
predicted class = a/1 + b/0.2 + c/0.5    (2)
Eq. (2) indicates that the system believes most strongly that the evidence so far supports class a. Clearly, even after introducing the predicted class fuzzy set, the winning class is the one with the highest membership in predicted class. It is important to note that the shape of the predicted class fuzzy set records information about the current state of belief in each and every class, not just the winner.
11.2.2.3 Recurrent interacting fuzzy belief updating method
We can also use this predicted class fuzzy set as a belief element by feeding back the last predicted class fuzzy set into all prediction rules. Semantic unification [8] provides a measure of the match between the previous class and predicted class fuzzy sets within the feedback term. The resulting rule structure is shown in Figure 11.7. If we again ask the question "what is the predicted class?", the answer is now the complete predicted class fuzzy set. A fuzzy rule with a feedback term involving the predicted class fuzzy
Fig. 11.7 The recurrent interacting belief updating fuzzy rule
set mimics positive and negative belief updating. Positive updating occurs when class c membership is high. This results in a high match with the feedback term in rule c, which in turn results in belief reinforcement in class c. Negative updating occurs when class c membership is low and the membership of some other class or classes is high. The match with the feedback term in rule c is then low but the match with the feedback terms in other rules is high. This results in a support for class c which, once all class supports have been normalized, is reduced. We call this an interactive recurrent fuzzy belief updating rule; the recurrence is implemented by feeding back the predicted class fuzzy set. The inference rule used in this belief system is the Fril evidential logic rule. The evidential logic used for this interactive belief updating system is best explained with reference to Figure 11.8.
((Class is c if)
 (evlog ((feature 1 is fuzzy set 1) w1
         (feature 2 is fuzzy set 2) w2
         (previous class was previous class) w3))) : ((1 1)(0 0))

Fig. 11.8 A recurrent evidential logic rule
The feedback term in Figure 11.8 is represented by the third body term in the rule. The previous class fuzzy set is a reference fuzzy set generated during training. It is defined over the class universe and holds information about which classes are likely to precede the class associated with that particular rule.
If the predicted class fuzzy set matches the previous class fuzzy set to a high degree then the feedback term in this rule will have high support, and this will contribute to the whole rule having high support. The meaning of the support pairs ((1 1)(0 0)) is explained in detail in section 11.3.4. We can generate the previous class fuzzy set simply by analyzing the transitions between classes in an ordered training dataset. The process is as follows:
(1) First we construct a transition matrix, such as in Table 11.2, by counting the transitions from previous class to current class throughout the training dataset. Each column contains information that describes the likelihood of each class preceding the current class.
(2) Each column is then normalized to produce a probability distribution.
(3) The probability distributions are then converted into previous class fuzzy sets using mass assignment.
(4) The resulting previous class fuzzy sets are then included in the feedback term of the corresponding evidential logic rules.
                    Current
    Previous     a     b     c
        a       90     7     3
        b        5    82    13
        c        5    11    84

Table 11.2 Class transitions
As an example, let us take the column for current class c. Given that the current class is c, the classes {a, b, c} precede the current class with the probabilities Pr(a) = 0.03, Pr(b) = 0.13, and Pr(c) = 0.84. Using mass assignment, we obtain the previous class fuzzy set for rule c shown in Eq. (3).

previous class_c = a/0.09 + b/0.29 + c/1    (3)
Here a least prejudiced distribution is assumed in the distribution of masses defined by the previous class fuzzy set to the focal elements {a, b, c}.
J. F. Baldwin, T. P. Martin & J. M. Rossiter
The memberships of focal elements in the previous class fuzzy set are calculated for this example in Eq. (4).

χ(a) = Pr(a) × 3 = 0.09
χ(b) = χ(a) + (Pr(b) − Pr(a)) × 2 = 0.29
χ(c) = χ(b) + (Pr(c) − Pr(b)) = 1    (4)
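The conversion in Eq. (4) generalizes to any distribution: sort the elements by increasing probability and weight each probability increment by the number of elements at or above that rank. A sketch (our naming, not the chapter's):

```python
def lpd_fuzzy_set(dist):
    """Convert a probability distribution into a fuzzy set by the
    least prejudiced (mass assignment) inverse, as in Eq. (4)."""
    n = len(dist)
    items = sorted(dist.items(), key=lambda kv: kv[1])  # ascending probability
    mu, prev, chi = {}, 0.0, 0.0
    for rank, (label, p) in enumerate(items):
        chi += (p - prev) * (n - rank)  # rank 0 weighted n, last weighted 1
        mu[label] = chi
        prev = p
    return mu

prev_class_c = lpd_fuzzy_set({'a': 0.03, 'b': 0.13, 'c': 0.84})
# -> memberships 0.09, 0.29, 1.0 (up to float rounding), matching Eq. (4)
```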
This least prejudiced method of generating a fuzzy set from a probability distribution is described in detail, including a simple algorithm, in [5]. Having constructed previous class fuzzy sets for all rules, the feedback weight for each rule can be calculated. We use semantic discrimination analysis [2] to determine the discriminating power of each previous class fuzzy set. These discrimination values are then normalized and distributed across the rules as term weights. Weights for attribute terms are generated using semantic discrimination analysis between attribute fuzzy sets.

11.2.3 Fuzzy belief updating

11.2.3.1 Facial feature extraction results
These results illustrate the operation of the interactive recurrent fuzzy rule in updating belief across a stream of data derived from facial images [3]. This application gives us a clear visual indication of how the belief rules operate internally. In these results we use a 100x100 pixel colour bitmap image. The goal of this application is a binary classification of the image into lip and not-lip regions. Each point is represented by the two normalized colour features red/(red + green + blue) and green/(red + green + blue); no other features were used. The test image is traversed horizontally from top-left to bottom-right as a single stream of pixels. This traversal method is illustrated in figure 11.9 on a small section of a facial image showing just the lip region. The four stages in the image classification process are:
(1) Take the 100x100 image and process it into 100 horizontal slices, each 1 pixel high.
(2) Join the horizontal slices together to form a stream of 10000 single pixels. The top slice is placed first in the stream and the bottom slice is the last.
(3) Feed the pixel stream into the trained fuzzy rule system. Each pixel point is classified as lip or not-lip.
(4) Reconstruct the facial image using the reverse of the processes in stages 1 and 2. The final image in figure 11.9 shows the lip regions as a shaded block.
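Stages 1-2 and the colour-feature computation can be sketched with NumPy (the array shapes and names here are ours):

```python
import numpy as np

def to_stream(img):
    """Stages 1-2: cut the image into 1-pixel-high horizontal slices and
    join them, top slice first, into one stream of pixels."""
    h, w, c = img.shape
    return img.reshape(h * w, c)

img = np.ones((100, 100, 3))          # a 100x100 RGB image
stream = to_stream(img)               # 10000 pixels in raster order

# The two normalized colour features used in the text:
totals = stream.sum(axis=1, keepdims=True)
totals[totals == 0] = 1.0             # guard against all-black pixels
features = stream[:, :2] / totals     # red/(r+g+b), green/(r+g+b)

# Stage 4: after classification, reshape the stream back to an image.
classified = np.zeros(len(stream))    # e.g. 1 = lip, 0 = not-lip
image_back = classified.reshape(100, 100)
```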
Fig. 11.9 Images processed as linear streams (1. original image; 2. linearized image; 3. classified series; 4. classified image)
Fig. 11.10 Lip training images

Fig. 11.11 Lip test images

The 100x100 image in figure 11.10(a) was first processed by hand to define the rough lip mask in figure 11.10(b). These two images were used to train the fuzzy rules. Two test rule sets were generated, one using simple, non-recurrent fuzzy rules and one using recurrent interacting belief updating rules. The first test involved predicting the lip region in figure 11.10(a) using the two rule sets. Figures 11.11(a) and 11.11(b) show the predicted lip regions for simple fuzzy rules and recurrent interacting belief updating rules respectively. A second test further highlights the performance of the recurrent rule in this application. The rule sets generated from the original figure 11.10(a) were tested on a second facial image, this time from a female. Figures 11.12(a) and 11.12(b) show the classification results using simple fuzzy rules and recurrent interacting belief updating rules respectively. Figures 11.12(c) and 11.12(d) show the lip areas from figures 11.12(a) and 11.12(b) respectively in greater detail. Figures 11.11(b) and 11.12(b) show markedly less speckling from misclassification than figures 11.11(a) and 11.12(a). This is due to the smoothing effect of the belief memory term. To best apply this method to facial feature detection the images can be traversed in the same way as described in figure 11.9, but in all four directions: left to right, right to left, top to bottom, and bottom to top. This will clear up more of the speckling and unify the lip region shown in figure 11.12(d).

11.2.3.2 Particle classification
Two datasets were generated from two classes of gaseous particle, each point having five continuous features.

Fig. 11.12 Further lip test images

• Dataset 1 represents a series where 200 particles of class 1 are presented first in the series, followed by 200 particles of class 2. Clearly dataset 1 represents the extreme case where belief in class 1 increases consistently through the first half of the dataset and then is contradicted at the halfway point. From the halfway point belief in class 1 falls consistently as belief in class 2 rises.
• Dataset 2 represents a series where no two points from the same class are next to each other in the series. Clearly this also is an extreme case. Here the system need only have belief in the next point given the previous point.
Table 11.3 shows results of class prediction on both datasets using single point prediction and using the following four evidential logic rule structures:
(a) Simple evidential logic rules with no belief updating. Here the class is deduced only from the current input attribute tuple.
(b) Simple feedback rules. Support values are fed back into the same rules. Here the weight of the feedback term in each rule is set by hand at 0.3.
(c) Feedback rules with automatic weight calculation. Here support values are fed back into the same rules. The weights of the feedback terms are calculated automatically using semantic discrimination analysis.
(d) Feedback rules using predicted class and previous class fuzzy sets. The feedback weights are calculated automatically; predicted class and previous class fuzzy sets are generated using the methods described in sections 11.2.2.2 and 11.2.2.3.
                 rule structure
dataset       a       b       c      d
   1        56.25    59     55.75   67
   2        59.75    69.5   60      70

Table 11.3 Results using particle data: % correctly classified
Results on such particle classification datasets typically show a 10 percent improvement when using our recurrent interacting belief updating rules over non-recurrent fuzzy rules.

11.2.4 Some conclusions from recurrent belief updating
We have shown that by adding a feedback term to the evidential logic rule we have generated a simple belief updating rule. The key to rule interaction is the use of the predicted class fuzzy set as a feedback element. We have also shown how rule weights and reference previous class fuzzy sets are generated automatically from the training data using mass assignment and semantic discrimination analysis. Results in the applications of facial feature extraction and particle classification in streams show improvement using the belief updating evidential logic rule over simple nonrecurrent evidential logic rules.
11.3 Perception-based fuzzy trend modelling
As we have shown above, classification in ordered datasets can be improved by the addition of a simple belief updating component. While this method is indeed very useful, and the model is a glass box, we often require a model that is more naturally descriptive of the behaviour of the dataset. Crucial to a concise natural description are the features used and their linguistic labeling. The aim of this next section is to describe a time series with a set of rules which use natural linguistic terms. In our analysis we use natural linguistic
terms such as rising, falling, rising more steeply, crest etc. Using these natural terms we produce a glass box model of the series which is easily understood. This linguistic model avoids unintuitive complex mathematical descriptors and black box models, such as those commonly produced by neural networks. We can also use the linguistic fuzzy rule models to predict how a time series will behave in the future. Such time series prediction has many realworld applications such as sunspot prediction to minimize telecommunication disruption.
11.3.1 An introduction to shape descriptors for ordered datasets
Simple linguistic terms such as rising and more complex terms such as rising more steeply are applied to ordered series to generate our descriptive models. A simple linguistic term such as rising indicates that the series S at that point is changing; i.e., rising implies S_{x+1} > S_x. These terms are fuzzy measures of the first derivative trend of the series. A more complex term such as rising more steeply indicates that the trend of the series is changing; i.e., rising more steeply implies S_{x+2} − S_{x+1} > S_{x+1} − S_x. These terms are fuzzy measures of the second derivative trend of the series. By matching these concepts to a time series we can describe the series with simple rules. Figure 11.13(a) shows how shapes based on these natural linguistic terms can be matched with a discrete data series. Here rising more steeply matches the discrete series S to a higher degree than rising.
Fig. 11.13 Shape matching: (a) rising and rising more steeply matched against a discrete series S(x); (b) rising less steeply matched over a region of interest
The shape of any series will typically resemble more than one trend shape. For example, within a region of interest, a series can be rising steeply and leveling off at the same time. If we represent the trend shapes as fuzzy sets, a window on the series can have membership in more than one trend fuzzy set at the same time, as shown in figure 11.13(b). Using these trend concepts we can build fuzzy rules to describe the series shape. These rules are of the form "next point is X if previous trend was Y", where X is a fuzzy value for the next point in S, e.g. "about 0.5", and Y is a trend fuzzy set, such as "rising more steeply". In practice it is simpler to construct rules that predict dS, the first derivative of the series, rather than the series S itself. To obtain the next S value the predicted dS is added to the current S value. Such a rule would have the form "next point is current point + dS if previous trend was Y".

11.3.2 Prototype trend shapes
Before constructing our descriptive fuzzy rule model we first define a number of prototype trend fuzzy sets. Clearly these second-derivative-like trend fuzzy sets must not be inconsistent with what most people understand by the corresponding linguistic labels. Figure 11.14 shows six such prototype shapes defined by the functions in table 11.4.
trend                   function f(x)
falling less steeply    (1 − x)^a
rising less steeply     1 − (1 − x)^a
falling more steeply    1 − x^a
rising more steeply     x^a
crest                   1 − a(x − 0.5)^a
trough                  a^2 (x − 0.5)^a

Table 11.4 Example prototype functions
Each second derivative trend fuzzy set is constructed from a prototype function in table 11.4 by taking a window of m + 1 values of f(x), where x is distributed uniformly across the interval [0, 1]. A trend fuzzy set is generated from the difference of this window series as described in the next section. These prototype trend fuzzy sets are an important measure of the shape of a series. The prototypes are naturally linguistic and anyone encountering these linguistic names will have a good idea of the shapes they represent.
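The windowing and differencing step can be sketched as follows. The prototype function used here, f(x) = x^2, is our illustrative choice of a shape that rises more steeply, not necessarily the chapter's exact parameterization:

```python
import numpy as np

def prototype_difference_window(f, m):
    """Sample m + 1 values of a prototype function f on [0, 1] and take
    first differences, giving the m-point window from which that
    prototype's trend fuzzy set is generated."""
    x = np.linspace(0.0, 1.0, m + 1)
    return np.diff(f(x))

# f(x) = x**2 rises more steeply as x grows, so its differences increase.
w = prototype_difference_window(lambda x: x ** 2, m=4)
```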
Fig. 11.14 Example prototype shapes (falling less steeply, falling more steeply, rising more steeply, rising less steeply, crest, trough), a = 2
Having generated a set of prototype trend fuzzy sets we can now take a window in a series, generate the trend fuzzy set for that window, and calculate Pr(prototype trend | current trend) using Fril semantic unification [8]. This forms the basis of our fuzzy rule model.

11.3.3 Method for generating trend fuzzy sets
Given a time series S and current position n, our goal is to predict the next point s_{n+1} from the last m points [s_{n−m}, ..., s_n]. To generate a trend fuzzy set we process a window of size m from the time series. The window is taken from the difference series D, derived from the original series S such that d_n = s_{n+1} − s_n. Figure 11.15 shows how D is derived from S.
Fig. 11.15 Extracting difference window D (differences d_i = s_{i+1} − s_i over a window of size m)
The windowed difference values [d_1, ..., d_m] are tested for membership in p difference fuzzy sets, {f_1, ..., f_p}. These difference fuzzy sets are defined on the continuous difference universe of D. Examples of such difference fuzzy sets are shown in figure 11.16. We can label the fuzzy sets {f_1, ..., f_5} with appropriate linguistic labels such as falling fast, falling slowly, constant, rising slowly, and rising fast.
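As an illustration of testing window differences against labelled difference fuzzy sets (the triangular shapes and centres here are our assumption, not taken from the chapter):

```python
def tri(a, b, c):
    """Triangular membership function on the difference universe."""
    def mu(d):
        if d <= a or d >= c:
            return 0.0
        return (d - a) / (b - a) if d <= b else (c - d) / (c - b)
    return mu

# Hypothetical difference fuzzy sets f1..f5 with evenly spaced centres:
labels = ["falling_fast", "falling_slowly", "constant",
          "rising_slowly", "rising_fast"]
fsets = [tri(c - 0.5, c, c + 0.5) for c in (-1.0, -0.5, 0.0, 0.5, 1.0)]

window = [0.1, 0.3, -0.2]               # differences d_1..d_m
P = {(lab, j + 1): f(d)                  # the membership set of Eq. (5)
     for lab, f in zip(labels, fsets)
     for j, d in enumerate(window)}
```

Each key pairs a linguistic label with a window position, anticipating the compound labels of the next paragraphs.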
Fig. 11.16 Difference fuzzy sets f_1, ..., f_5
We must now calculate the membership of each d in each difference fuzzy set f. This generates a membership set P, shown in Eq. (5).

P = {χ_{f_i}(d_j) | i ∈ {1, ..., p}, j ∈ {1, ..., m}}    (5)
where χ_{f_i}(d_j) is the membership of point d_j in fuzzy set f_i. Figure 11.16 illustrates how memberships are calculated. Here χ_{f_2}(d) = 1 and χ_{f_3}(d) = 0.75. Having constructed a set of memberships for each window point in each fuzzy set, we now convert this into a single second derivative fuzzy set. Second derivative trend fuzzy sets are defined on the universe of compound labels L, shown in Eq. (6).
L = {label(f_i).j | i ∈ {1, ..., p}, j ∈ {1, ..., m}}    (6)
where label(f_i) is the linguistic label of fuzzy set f_i, such as rising, falling, etc., and j is the position of the current point in the series window. Label label(f_i).j is the concatenation of label(f_i) and number j into a compound label. Examples of valid labels label(f_i).j are therefore "rising1", "falling5", and "constant5". The membership of label(f_i).j in the second derivative fuzzy set has the same value as the membership of d_j in difference fuzzy set f_i (that is, χ_{f_i}(d_j)) and is taken directly from the difference membership set P. Now we have constructed a second derivative fuzzy set which describes the trend of the series in the last m points.

11.3.4 Method for generating linguistic fuzzy rules
The standard Fril inference rule [5] is based on Jeffrey's rule, Eq. (7), where h is the head of the rule and b is the body.

Pr(h) = Pr(h|b)Pr(b) + Pr(h|¬b)Pr(¬b)    (7)
Figure 11.17 shows a standard Fril rule; (n p) is the support pair representing Pr(h|b) and (l u) is the support pair representing Pr(h|¬b). Clearly if Pr(b) is known it is a trivial matter to calculate Pr(h).
((difference is falling if) (trend was falling_less_steeply)) : ((n p)(l u))

Fig. 11.17 Standard Fril rule
Since we are dealing with first derivative linguistic terms (such as low) in the head and second derivative terms (such as falling less steeply) in the body, we need a method of determining Pr(h|b). We can determine Pr(h|b) for each rule by constructing a table of trend fuzzy set labels against difference fuzzy set labels, as shown in table 11.5. Each (n p) is the support pair representing the conditional probability of the difference fuzzy set label given the corresponding prototype trend fuzzy set label, and is equal to Pr(h|b) for the standard Fril rule. An (n p) pair is calculated as follows, where all probabilities are expressed as support pairs:
(1) Let us consider the set of difference fuzzy set labels describing fuzzy sets rising, constant, falling, etc., {A_i} = {label(f_i)}, and the set of trend fuzzy set labels describing fuzzy sets rising more steeply, crest, falling less steeply, etc., {B_k} = {label(g_k)}.
(2) We can directly calculate Pr(A_i|d_j) for each A_i and each j in the dataset using semantic unification.
(3) We can calculate Pr(B_k|d_j) by first taking a window of the m previous values in the difference stream D. This window, [d_{j−m}, ..., d_j], is converted into a trend fuzzy set f_B as described in section 11.3.3. Semantic unification now gives us Pr(B_k|f_B), which we take as being equal to Pr(B_k|d_j). This is repeated for all A_i and for all B_k.
(4) So now we have values for Pr(A_i|d_j) and Pr(B_k|d_j), and thus Pr((A_i, B_k)|d_j) = Pr(A_i|d_j) Pr(B_k|d_j).
(5) We now need to calculate Pr(A_i, B_k) from Pr((A_i, B_k)|d_j), i.e., Pr(A_i, B_k) = Σ_j Pr((A_i, B_k)|d_j) Pr(d_j), which is easily calculated if we assume no prior for Pr(d_j). (We can assume no prior because the difference fuzzy sets are constructed on the distribution of the training set.)
(6) Given Pr(A_i, B_k) we now calculate Pr(A_i|B_k) and insert this into the appropriate cell in table 11.5.
                           Difference fuzzy set label
Trend fuzzy set label      rising       constant     falling
falling less steeply      (n13 p13)     (n7 p7)      (n1 p1)
falling more steeply      (n14 p14)     (n8 p8)      (n2 p2)
rising less steeply       (n15 p15)     (n9 p9)      (n3 p3)
rising more steeply       (n16 p16)     (n10 p10)    (n4 p4)
crest                     (n17 p17)     (n11 p11)    (n5 p5)
trough                    (n18 p18)     (n12 p12)    (n6 p6)

Table 11.5 Supports for (difference|trend)
Standard Fril rules such as that in figure 11.17 can be collected into a more concise Fril extended rule, as shown in figure 11.18. Now Pr(h) can be calculated directly, with (n p) pairs taken from table 11.5.
((difference is falling if) (
    (trend was falling_less_steeply)
    (trend was falling_more_steeply)
    (trend was rising_less_steeply)
    (trend was rising_more_steeply)
    (trend was crest)
    (trend was trough)
)) : ((n1 p1)(n2 p2)(n3 p3)(n4 p4)(n5 p5)(n6 p6))

Fig. 11.18 Extended Fril rule
11.4 Method for generating linguistic evidential logic rules

Figure 11.19 is an example of an evidential logic rule based on the six trend fuzzy sets from table 11.4. Given such a rule we need to calculate the term weights W_{1,1}, ..., W_{1,6}.

((difference is falling if) (evlog (
    (trend was falling_less_steeply) W_{1,1}
    (trend was falling_more_steeply) W_{1,2}
    (trend was rising_less_steeply) W_{1,3}
    (trend was rising_more_steeply) W_{1,4}
    (trend was crest) W_{1,5}
    (trend was trough) W_{1,6}
))) : ((1 1)(0 0))
Fig. 11.19 Trend evidential logic rule
Each W_{1,n} is a measure of the importance of that term in determining the support for the head of the rule.
                           Difference fuzzy set label
Trend fuzzy set label      falling     constant    rising
falling less steeply       W_{1,1}     W_{2,1}     W_{3,1}
rising less steeply        W_{1,2}     W_{2,2}     W_{3,2}
falling more steeply       W_{1,3}     W_{2,3}     W_{3,3}
rising more steeply        W_{1,4}     W_{2,4}     W_{3,4}
crest                      W_{1,5}     W_{2,5}     W_{3,5}
trough                     W_{1,6}     W_{2,6}     W_{3,6}

Table 11.6 Weights for difference evidential rules
We calculate each W_{i,j} in table 11.6 for difference fuzzy set f_i and prototype trend fuzzy set g_j, given next difference d_{n+1} and current trend fuzzy set g_n, from Eq. (8). Weights are then normalized to preserve the constraint ∀i: Σ_j W_{i,j} = 1. The conditional probability of one fuzzy set given another fuzzy set is calculated using Fril semantic unification.

W_{i,j} = Σ_n Pr(f_i|d_{n+1}) Pr(g_j|g_n)    (8)
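Eq. (8) and the normalization can be sketched as a pair of matrix operations. The unification supports below are random placeholders standing in for Pr(f_i|d_{n+1}) and Pr(g_j|g_n) over a training series:

```python
import numpy as np

# Hypothetical unification supports over a training series of 50 steps:
# pr_f[n, i] = Pr(f_i | d_{n+1}), pr_g[n, j] = Pr(g_j | g_n).
rng = np.random.default_rng(1)
pr_f = rng.dirichlet(np.ones(3), size=50)      # 3 difference fuzzy sets
pr_g = rng.dirichlet(np.ones(6), size=50)      # 6 trend prototypes

W = pr_f.T @ pr_g                               # Eq. (8): sum over n
W /= W.sum(axis=1, keepdims=True)               # enforce sum_j W_ij = 1
```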
11.4.1 Training and testing the linguistic evidential rules
Assuming that n prototype trend fuzzy sets, {g_1, ..., g_n}, have already been defined, training involves the following process:
(1) p difference fuzzy sets, {f_1, ..., f_p}, are defined on the universe of the difference series D such that each fuzzy set covers an equal number of values in D. Typically these fuzzy sets are triangular or trapezoidal.
(2) n evidential logic rule weights, {W_1, ..., W_n}, are then calculated as described in the previous section.
(3) Finally p evidential logic rules, one for each difference fuzzy set label, are constructed. Each rule has one body term for each prototype trend fuzzy set g.
Prediction involves the following process:
(1) A window of m previous difference values is taken from the difference series D.
(2) A trend fuzzy set g is then generated on the window as described previously. This trend fuzzy set describes the trend of the last m points in the series.
(3) Evaluating the evidential logic rules and defuzzifying the resulting supports gives a predicted difference value d.
(4) This predicted difference value is then added to the last point in the series S to give a predicted value for the next point in S.

11.4.2 Linguistic fuzzy trend results
Figures 11.20 and 11.21 show D and S series prediction results for the two data sets in table 11.7. The sunspot series is taken from a normalization of sunspot data described in [10]. For both cases two difference fuzzy sets were defined and hence two evidential rules were constructed. The window size for the sine series is larger than for the sunspot series to adjust for the lower frequency of the fundamental. The graphs show predictions on the test sets as solid lines and actual values as dotted lines. Points in the region up to z are predictions of the next point in the series given a window of m previous points. Predictions after z are predictions for the next point given a window of points up to z only.
Future predictions are then made on a window of previous predictions.
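The per-step prediction procedure of section 11.4.1 can be sketched as follows. Here `match` and `rule_values` are hypothetical stand-ins for the Fril machinery: `match(window)` represents trend-fuzzy-set generation plus rule evaluation, returning a support per rule, and `rule_values` holds one representative difference value per rule for defuzzification:

```python
def predict_next(series, m, match, rule_values):
    """One prediction step (sketch): difference window -> rule supports
    -> defuzzified predicted difference -> next series value."""
    window = [series[i + 1] - series[i]
              for i in range(len(series) - m - 1, len(series) - 1)]
    supports = match(window)
    total = sum(supports.values())
    d = sum(rule_values[r] * s for r, s in supports.items()) / total
    return series[-1] + d

# Toy usage with two rules and fixed supports:
nxt = predict_next([0.0, 0.1, 0.2, 0.3], m=2,
                   match=lambda w: {"falling": 0.2, "rising": 0.8},
                   rule_values={"falling": -0.1, "rising": 0.1})
```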
dataset    train              test               window size m
sine       sin(nx + β)        sin(nx + γ)        16
sunspot    [s_0, ..., s_29]   [s_30, ..., s_59]  6

Table 11.7 Test data sets
Fig. 11.20 Predicted sine wave

Fig. 11.21 Predicted sunspot series
Clearly prediction of points in the region before z is more accurate than prediction after z where errors are accumulated and no new information is presented. As can be seen in figures 11.20 and 11.21 the fuzzy trend model, with only two rules, catches the shape of both test cases. The model performs particularly well in figure 11.21 given the complexity of the sunspot data. 11.4.3 Some conclusions from fuzzy trend matching
We have shown that a new feature, the trend fuzzy set, can be used to represent the shape of a time series. A set of prototype trend fuzzy sets describing natural trends such as rising more steeply form the basis of terms in Fril evidential logic rules. These rules are used to predict future points in the series given a window of previous points. With only a small number of rules, difference fuzzy sets, and trend prototypes, a good approximation of the shape of the series is produced. Using trend fuzzy sets based on natural linguistic terms we have generated a glass box model of the series. The glass box nature of this method enables a clear understanding of the prediction model and in many cases the series as well. Complexity in the test examples is limited to the distribution of weights in the evidential logic rules. More complex rule structures can be used to model ordered datasets more accurately.

11.4.4 Overall conclusions and future perspectives
Our two simple models of perception and belief have improved the modelling of ordered datasets. We have described the two approaches separately in order to highlight their differences. Results across diverse applications have been presented which show the improvements in modelling and prediction possible by using these two models. In some applications it may be desirable to combine the two models to give a belief updating system which uses the highlevel fuzzy trend feature as its input. This will produce a linguistically clear and concise glass box model which captures some element of human perception and belief updating.
References

[1] J. F. Baldwin, "Management of Fuzzy and Probabilistic Uncertainty for Knowledge Based Systems", in Encyclopedia of AI (Ed. S. C. Shapiro), John Wiley, 2nd ed., pp.528-537, 1992.
[2] J. F. Baldwin, "Knowledge from Data Using Fril and Fuzzy Methods", in Fuzzy Logic (Ed. J. F. Baldwin), John Wiley and Sons, 1996.
[3] J. F. Baldwin, S. Case, T. P. Martin, "Machine Interpretation of Facial Expressions", BT Technology Journal, Vol. 16, No. 3, pp.156-164, 1998.
[4] J. F. Baldwin, J. Lawry, T. P. Martin, "A Mass Assignment Theory of the Probability of Fuzzy Events", Fuzzy Sets and Systems, Vol. 83, No. 3, pp.353-367, 1996.
[5] J. F. Baldwin, T. P. Martin, B. W. Pilsworth, "Fril - Fuzzy and Evidential Reasoning in Artificial Intelligence", Research Studies Press Ltd, 1995.
[6] J. F. Baldwin, T. P. Martin, J. M. Rossiter, "Recurrent Fuzzy Rules for Belief Updating", Proceedings of Iizuka 1998, Vol. 1, pp.511-514, 1998.
[7] J. F. Baldwin, T. P. Martin, J. M. Rossiter, "Time Series Modelling and Prediction Using Fuzzy Trend Information", Proceedings of Iizuka 1998, Vol. 1, pp.499-502, 1998.
[8] J. F. Baldwin, B. W. Pilsworth, "Semantic Unification with Fuzzy Concepts in FRIL", International Journal of Intelligent Systems, Vol. 7, pp.61-69, 1992.
[9] A. I. Goldman, "Epistemology and Cognition", Harvard University Press, pp.344-358, 1986.
[10] A. S. Weigend, B. A. Huberman, D. E. Rumelhart, "Predicting Sunspots and Exchange Rates with Connectionist Networks", in Nonlinear Modelling and Forecasting, Addison-Wesley, pp.395-432, 1992.
[11] L. A. Zadeh, "Fuzzy Sets", Information and Control, Vol. 8, pp.338-353, 1965.
Chapter 12 Approaches to the Design of Classification Systems from Numerical Data and Linguistic Knowledge
Hisao Ishibuchi, Manabu Nii, and Tomoharu Nakashima
Osaka Prefecture University
Abstract
This paper discusses the design of classification systems when we have two kinds of information: numerical data and linguistic knowledge. Numerical data are given as a set of labeled samples (i.e., training patterns), which are usually used for designing classification systems in various pattern classification techniques. Linguistic knowledge is a set of fuzzy if-then rules, which is not usually utilized in non-fuzzy pattern classification techniques. In this paper, it is implicitly assumed that either kind of information alone is not enough for designing classification systems with high classification performance. Thus our task is to design a classification system by simultaneously utilizing these two kinds of information. In this paper, we illustrate two approaches to the design of classification systems from numerical data and linguistic knowledge. One is a fuzzy-rule-based approach where numerical data are used for generating fuzzy if-then rules. The other is a neural-network-based approach where linguistic knowledge as well as numerical data are used for training neural networks. First we discuss the extraction of fuzzy if-then rules directly from numerical data. We also describe the fuzzy rule extraction from neural networks that have already been trained using numerical data. Next we discuss the learning of neural networks from numerical data and linguistic knowledge. In the learning, fuzzy if-then rules and training patterns are handled in a common framework. Finally we examine the performance of these approaches to the design of classification systems from numerical data and linguistic knowledge through computer simulations.
Keywords: fuzzy rule-based systems, neural networks, pattern classification, knowledge extraction, learning from linguistic knowledge
12.1 Introduction
When we design information processing systems such as controllers and classifiers, two kinds of information are usually available. One is numerical data, and the other is linguistic knowledge from domain experts. Various pattern classification methods have been proposed for designing classification systems from numerical data [1-4]. Those methods usually cannot utilize linguistic knowledge for designing classification systems. For example, only numerical data are used in the learning of neural networks, which are viewed as nonlinear classifiers in their application to pattern classification problems. On the other hand, fuzzy rule-based systems [5,6] are traditionally designed from linguistic knowledge of human experts. Recently various methods have been proposed for automatically designing fuzzy rule-based systems from numerical data without human experts [7-14]. The main aim of this paper is to illustrate how the two kinds of available information can be simultaneously utilized for designing pattern classification systems. Thus our task in this paper is to design a pattern classification system from numerical data and linguistic knowledge. We implicitly assume that either kind of information alone is not enough for designing classification systems with high classification performance. Numerical data are a set of labeled samples (i.e., training patterns with class labels). Let us assume that we have m training patterns x_p = (x_{p1}, ..., x_{pn}), p = 1, 2, ..., m from c classes, where n is the number of attributes involved in our pattern classification problem. That is, our pattern classification problem has m training patterns with n attributes from c classes. We also assume that linguistic knowledge is given in the form of the following fuzzy if-then rules:

Rule R_j: If x_1 is A_{j1} and ... and x_n is A_{jn} then Class C_j with CF = CF_j,  j = 1, 2, ..., M,    (1)
where R_j is the label of the j-th fuzzy if-then rule, x = (x_1, ..., x_n) is an n-dimensional pattern vector, the A_{ji}'s (i = 1, 2, ..., n) are linguistic values such as "small" and "large", C_j is a consequent class (i.e., one of the c classes), CF_j is a certainty grade, and M is the number of given fuzzy if-then rules. Examples of such fuzzy if-then rules are "If x_1 is small and x_2 is small then Class 1 with CF = 0.9" and "If x_1 is large then Class 2 with CF = 0.8". It should be noted that there is no antecedent condition on the first attribute in the second fuzzy if-then rule. That is, given fuzzy if-then rules may involve some "don't care" attributes. In this paper, we propose the following two approaches to the design of classification systems from numerical data and linguistic knowledge:
(1) Fuzzy-rule-based approach: Fuzzy if-then rules are generated from numerical data. The generated fuzzy if-then rules are used in fuzzy rule-based systems together with linguistic knowledge. For generating fuzzy if-then rules, we examine two methods. One is direct rule extraction where fuzzy if-then rules are directly extracted from numerical data [15-20]. The design of fuzzy rule-based systems using the direct rule extraction is illustrated in Fig. 12.1. The other is indirect rule extraction where fuzzy if-then rules are extracted from neural networks that are trained using numerical data [21]. The design of fuzzy rule-based systems using the indirect rule extraction is illustrated in Fig. 12.2.
(2) Neural-network-based approach: Linguistic knowledge as well as numerical data are utilized in the learning of neural networks [22, 23]. Fuzzy if-then rules and training patterns are handled in a common framework in the learning. That is, fuzzy if-then rules are also used as training data. This approach is illustrated in Fig. 12.3.
We first describe the extraction of fuzzy if-then rules from numerical data and trained neural networks. Next we describe the learning of neural networks from numerical data and linguistic knowledge. Finally we examine the performance of our approaches to the design of classification systems from numerical data and linguistic knowledge through computer simulations. Relations among numerical data, neural networks and linguistic knowledge (i.e., fuzzy if-then rules) are summarized in Fig. 12.4. The direction
Fig. 12.1 Fuzzy rule-based system where fuzzy if-then rules extracted from numerical data are used together with linguistic knowledge from human experts.
H. Ishibuchi, M. Nii & T. Nakashima
Fig. 12.2 Fuzzy rule-based system where fuzzy if-then rules extracted from trained neural networks are used together with linguistic knowledge from human experts.
Fig. 12.3 Neural-network-based classification system where numerical data and linguistic knowledge are simultaneously used in the learning.
from numerical data to neural networks is the learning of neural networks. This direction is one of the main streams of neural network research. The direction from numerical data to fuzzy if-then rules corresponds to the extraction and learning of fuzzy if-then rules from numerical data. This direction is one of the most active research areas in fuzzy systems. Various techniques from machine learning, neural networks and evolutionary computation have been employed in this research area. While these two directions are very active, only a few studies have been reported along the other two directions in Fig. 12.4: the learning of neural networks from linguistic knowledge, and fuzzy rule extraction from neural networks. As we have already mentioned, our approaches utilize the bidirectional relation between neural networks and linguistic knowledge [24].
12.2 Fuzzy Rule-Based Approach

12.2.1 Assumptions
Fig. 12.4 Relations among numerical data, neural networks, and linguistic knowledge.

As we have already described, we assume in this paper that the $m$ labeled training patterns $\mathbf{x}_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$ and the $M$ fuzzy if-then rules $R_j$, $j = 1, 2, \ldots, M$ in (1) are given. In this section, we describe the fuzzy-rule-based approach, where the given numerical data are used for generating fuzzy if-then rules. The generated rules are used together with the given linguistic knowledge for constructing fuzzy rule-based systems. We generate fuzzy if-then rules of the same form as the given fuzzy if-then rules in (1) from the numerical data. For simplicity of illustration, we assume that the pattern space of our pattern classification problem is the $n$-dimensional unit cube $[0, 1]^n$. In the computer simulations of this paper, attribute values are normalized into real numbers in the unit interval $[0, 1]$. We also assume that the five linguistic values (i.e., small, medium small, medium, medium large, and large) in Fig. 12.5 are given for all the $n$ attributes involved in our pattern classification problem. Those linguistic values are used as antecedent fuzzy sets of fuzzy if-then rules. Of course, the selection of linguistic values is problem-dependent. In each application domain, an appropriate collection of linguistic values should be chosen for each attribute. We use the five linguistic values in Fig. 12.5 for illustrating our approaches and demonstrating their performance. In addition to the five linguistic values, we also use "don't care" as a special linguistic value. The membership function of "don't care" is identically one over the entire domain $[0, 1]$ of each attribute:

$$\mu_{\text{don't care}}(x) = \begin{cases} 1 & \text{if } x \in [0, 1], \\ 0 & \text{otherwise.} \end{cases}$$
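As an illustration, the five linguistic values of Fig. 12.5 and the "don't care" fuzzy set might be coded as follows; the symmetric triangular shapes with evenly spaced peaks are an assumption read off the figure, not a specification from the text:

```python
# A sketch of the five triangular linguistic values in Fig. 12.5 plus
# "don't care", assuming symmetric triangles with peaks evenly spaced
# over [0, 1]; the exact shapes are an assumption here.

def triangular(peak, half_width):
    """Return a triangular membership function on [0, 1]."""
    def mu(x):
        return max(0.0, 1.0 - abs(x - peak) / half_width)
    return mu

LINGUISTIC_VALUES = {
    "small":        triangular(0.00, 0.25),
    "medium small": triangular(0.25, 0.25),
    "medium":       triangular(0.50, 0.25),
    "medium large": triangular(0.75, 0.25),
    "large":        triangular(1.00, 0.25),
    # "don't care" is compatible with every attribute value (membership 1).
    "don't care":   lambda x: 1.0,
}

print(LINGUISTIC_VALUES["medium"](0.5))       # -> 1.0 (peak of "medium")
print(LINGUISTIC_VALUES["small"](0.3))        # -> 0.0 (outside the support)
print(LINGUISTIC_VALUES["don't care"](0.7))   # -> 1.0
```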
Fig. 12.5 Membership functions of the five linguistic values (S: small, MS: medium small, M: medium, ML: medium large, L: large).
Since we have six antecedent fuzzy sets (the five linguistic values in Fig. 12.5 and "don't care"), the total number of combinations of antecedent fuzzy sets is $6^n$. Among them, the most general rule has "don't care" for all the $n$ attributes, and the most specific rules have no "don't care". For the case of two-dimensional pattern classification problems with the pattern space $[0, 1] \times [0, 1]$, we illustrate the fuzzy partitions corresponding to all the $6 \times 6 = 36$ fuzzy if-then rules in Fig. 12.6. In this figure, the most general rule has two "don't care" conditions (bottom-left figure), and the most specific 25 rules have two linguistic conditions (top-right figure). The other 10 rules have a single linguistic condition on either attribute. In many studies on fuzzy rule-based systems, "don't care" is not used as an antecedent fuzzy set. In the case of Fig. 12.6, only 25 fuzzy if-then rules are usually used in fuzzy rule-based systems. In this paper, we use "don't care" for avoiding the exponential increase in the number of fuzzy if-then rules as the dimensionality $n$ of the pattern classification problem increases. That is, we do not use all the possible rules (i.e., $6^n$ rules). As we can see from Fig. 12.6, many specific fuzzy if-then rules are included in some general fuzzy if-then rules. We tackle the curse of dimensionality (i.e., the exponential increase in the number of fuzzy if-then rules) by using a small number of general rules instead of a large number
Fig. 12.6 Fuzzy partitions corresponding to all the 36 fuzzy if-then rules.
of specific rules. In this sense, fuzzy rule generation from numerical data for high-dimensional problems can be viewed as choosing a small number of appropriate combinations of antecedent fuzzy sets from a huge number of possible combinations.

12.2.2 Heuristic Rule Extraction Directly from Numerical Data
In this section, we show how fuzzy if-then rules can be extracted directly from the given numerical data $\mathbf{x}_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$. Our task in this section is to extract fuzzy if-then rules of the form in (1). First we briefly illustrate the heuristic rule generation procedure of Ishibuchi et al. [15], which determines the consequent class $C_j$ and the certainty grade $CF_j$ of the fuzzy if-then rule $R_j$ as follows when its antecedent fuzzy sets $A_{ji}$ ($i = 1, 2, \ldots, n$) are specified:
Step 1: Calculate the compatibility of each training pattern $\mathbf{x}_p$ with the fuzzy if-then rule $R_j$ by the following product operation:

$$\mu_j(\mathbf{x}_p) = \mu_{j1}(x_{p1}) \times \cdots \times \mu_{jn}(x_{pn}), \qquad (3)$$

where $\mu_{ji}(x_{pi})$ is the membership function of $A_{ji}$.

Step 2: For each class, calculate the sum of the compatibility grades of the training patterns with the fuzzy if-then rule $R_j$:

$$\beta_{\text{Class } h}(R_j) = \sum_{\mathbf{x}_p \in \text{Class } h} \mu_j(\mathbf{x}_p), \qquad h = 1, 2, \ldots, c, \qquad (4)$$

where $\beta_{\text{Class } h}(R_j)$ is the sum of the compatibility grades of the training patterns in Class $h$ with the fuzzy if-then rule $R_j$.

Step 3: Find the class $C_j$ that has the maximum value of $\beta_{\text{Class } h}(R_j)$:

$$\beta_{\text{Class } C_j}(R_j) = \max\{\beta_{\text{Class } 1}(R_j), \ldots, \beta_{\text{Class } c}(R_j)\}. \qquad (5)$$
If two or more classes take the maximum value, the consequent class $C_j$ of the fuzzy if-then rule $R_j$ cannot be determined uniquely. In this case, let $C_j$ be $\phi$. If a single class takes the maximum value in (5), that class is the consequent class of the fuzzy if-then rule $R_j$. If there is no training pattern compatible with the antecedent part of the fuzzy if-then rule (i.e., if there is no training pattern in the fuzzy subspace $A_{j1} \times A_{j2} \times \cdots \times A_{jn}$), the consequent class $C_j$ is also specified as $\phi$.

Step 4: If the consequent class $C_j$ is $\phi$, let the certainty grade $CF_j$ of the fuzzy if-then rule $R_j$ be $CF_j = 1.0$. Otherwise the certainty grade $CF_j$ is determined as follows:
$$CF_j = \frac{\beta_{\text{Class } C_j}(R_j) - \bar{\beta}}{\sum_{h=1}^{c} \beta_{\text{Class } h}(R_j)}, \qquad (6)$$

where

$$\bar{\beta} = \sum_{h \neq C_j} \beta_{\text{Class } h}(R_j) / (c - 1). \qquad (7)$$
While the determination of the certainty grade $CF_j$ by (6)-(7) may seem a bit complicated at first glance, this procedure is easily understood and intuitively acceptable when we consider two-class classification problems
(i.e., $c = 2$). In this case, $C_j$ is Class 1 and $CF_j$ is specified as follows when $\beta_{\text{Class } 1}(R_j) > \beta_{\text{Class } 2}(R_j)$:

$$CF_j = \frac{\beta_{\text{Class } 1}(R_j) - \beta_{\text{Class } 2}(R_j)}{\beta_{\text{Class } 1}(R_j) + \beta_{\text{Class } 2}(R_j)}. \qquad (8)$$

Otherwise $C_j$ is Class 2 and $CF_j$ is specified as

$$CF_j = \frac{\beta_{\text{Class } 2}(R_j) - \beta_{\text{Class } 1}(R_j)}{\beta_{\text{Class } 1}(R_j) + \beta_{\text{Class } 2}(R_j)}. \qquad (9)$$
As we can see from (4) in Step 2, the determination of the consequent class $C_j$ and the certainty grade $CF_j$ depends only on the training patterns compatible with the antecedent part of the fuzzy if-then rule $R_j$. Such determination is illustrated in Fig. 12.7. As shown in Fig. 12.7, the certainty grade takes its maximum value (i.e., $CF_j = 1.0$) when all the compatible patterns belong to a single class.

Fig. 12.7 Illustration of the heuristic rule generation procedure.
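The four steps of the heuristic procedure can be sketched as follows; the triangular helper `tri` and the data layout (a list of (pattern, class-index) pairs) are illustrative assumptions, not taken from the text:

```python
# A sketch of the heuristic rule generation procedure (Steps 1-4).

def tri(peak, half_width):
    """Illustrative triangular membership function on [0, 1]."""
    return lambda x: max(0.0, 1.0 - abs(x - peak) / half_width)

def generate_consequent(antecedent, data, num_classes):
    # Step 1: compatibility via the product operation (Eq. (3)).
    def compatibility(x):
        mu = 1.0
        for mu_i, x_i in zip(antecedent, x):
            mu *= mu_i(x_i)
        return mu

    # Step 2: class-wise sums of compatibility grades (Eq. (4)).
    beta = [0.0] * num_classes
    for x, label in data:
        beta[label] += compatibility(x)

    # Step 3: the consequent class maximizes beta (Eq. (5)); a tie or an
    # empty fuzzy subspace leaves the consequent undetermined (phi -> None).
    best = max(beta)
    if best == 0.0 or beta.count(best) > 1:
        return None, 1.0   # dummy rule; never used for classification

    # Step 4: certainty grade (Eqs. (6)-(7)).
    c_j = beta.index(best)
    beta_bar = (sum(beta) - beta[c_j]) / (num_classes - 1)
    return c_j, (beta[c_j] - beta_bar) / sum(beta)

# Toy two-class data on one attribute: Class 0 near 0, Class 1 near 1.
data = [([0.0], 0), ([0.1], 0), ([0.9], 1)]
print(generate_consequent([tri(0.0, 0.5)], data, 2))   # -> (0, 1.0)
```

In the toy run, only the two Class 0 patterns are compatible with the antecedent, so the consequent is Class 0 with full certainty, matching the left panel of Fig. 12.7.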
12.2.3 Fuzzy Reasoning
A fuzzy rule-based classification system consists of the fuzzy if-then rules extracted from the training patterns and the given fuzzy if-then rules (i.e., linguistic knowledge). Let us denote the set of those fuzzy if-then rules in the fuzzy rule-based classification system by $S$. In the classification phase, a new pattern $\mathbf{x}_p$ is classified by the single winner rule $R_{j^*}$ in $S$, which is
defined as follows:

$$\mu_{j^*}(\mathbf{x}_p) \cdot CF_{j^*} = \max\{\mu_j(\mathbf{x}_p) \cdot CF_j \mid R_j \in S\}. \qquad (10)$$
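A minimal sketch of this single-winner reasoning follows; the tuple representation of a rule (antecedent membership functions, consequent class, certainty grade) is an illustrative assumption:

```python
# Sketch of single-winner-rule classification (Eq. (10)).  A rule is
# assumed to be an (antecedent_memberships, consequent_class, cf) tuple.

def classify(rules, x):
    best, winner_class, tie = 0.0, None, False
    for antecedent, c, cf in rules:
        mu = 1.0                            # product-operation compatibility
        for mu_i, x_i in zip(antecedent, x):
            mu *= mu_i(x_i)
        score = mu * cf
        if score > best:
            best, winner_class, tie = score, c, False
        elif score == best and best > 0.0 and c != winner_class:
            tie = True                      # different classes, same product
    # Rejected if no rule is compatible or if rules of different classes tie.
    return None if best == 0.0 or tie else winner_class

# Illustrative one-attribute rules "small -> Class 1", "large -> Class 2".
small = lambda x: max(0.0, 1.0 - x / 0.5)
large = lambda x: max(0.0, 1.0 - (1.0 - x) / 0.5)
rules = [([small], 1, 0.9), ([large], 2, 0.8)]
print(classify(rules, [0.1]))    # -> 1
print(classify(rules, [0.95]))   # -> 2
print(classify(rules, [0.5]))    # -> None (no compatible rule)
```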
That is, the winner rule has the maximum product of the compatibility grade $\mu_j(\mathbf{x}_p)$ and the certainty grade $CF_j$. If two or more fuzzy if-then rules have the same maximum product but different consequent classes for the new pattern $\mathbf{x}_p$, the classification of that pattern is rejected. The classification is also rejected if no fuzzy if-then rule is compatible with the new pattern $\mathbf{x}_p$ (i.e., if $\mu_j(\mathbf{x}_p) = 0$ for all rules in $S$). This fuzzy reasoning method based on the single winner rule is easily understood by human users. It is also suitable for the learning of fuzzy if-then rules by a reward-punishment scheme [25] and the evolution of fuzzy if-then rules in a genetics-based machine learning method [19, 20].

12.2.4 Fuzzy Rule Selection by Genetic Algorithms
The consequent part of each fuzzy if-then rule can be determined by the heuristic rule generation procedure when its antecedent part is specified. For small-size pattern classification problems with only a few attributes, we can generate $6^n$ fuzzy if-then rules by examining all the possible combinations of the six antecedent fuzzy sets (i.e., the five linguistic values in Fig. 12.5 and "don't care"). Such an exhaustive rule generation cannot be applied to high-dimensional problems due to the exponential increase in the number of fuzzy if-then rules. One method for constructing a compact fuzzy rule-based system is to generate a set of promising candidate rules and to select only a small number of significant rules from the candidate rule set [16-18]. In the application to high-dimensional problems, we generate only general fuzzy if-then rules with many "don't care" conditions as candidate rules. For example, fuzzy if-then rules with less than three linguistic conditions (i.e., with more than $(n-3)$ "don't care" conditions) are usually tractable in terms of their size. The number of those fuzzy if-then rules is $6^2 \times n(n-1)/2$. Let $S_{\text{cand}}$ and $N_{\text{cand}}$ be the candidate rule set and the number of candidate rules in $S_{\text{cand}}$, respectively. Genetic algorithms are utilized for selecting only a small number of significant rules from $S_{\text{cand}}$ [16-18]. We denote a subset of the candidate rule set $S_{\text{cand}}$ by $S$ (i.e., $S \subseteq S_{\text{cand}}$). In our genetic algorithm, the inclusion and the exclusion of the $j$-th candidate rule are denoted by $s_j = 1$ and $s_j = 0$, respectively ($j = 1, 2, \ldots, N_{\text{cand}}$).
That is, $s_j = 1$ ($s_j = 0$) means that the $j$-th candidate rule is included in the rule set $S$ (excluded from the rule set $S$). In this manner, every subset $S$ of the candidate rule set $S_{\text{cand}}$ is coded by a bit string of length $N_{\text{cand}}$. First our genetic algorithm randomly generates a prespecified number of bit strings of length $N_{\text{cand}}$ to form an initial population. A fitness value of each string $S$ (i.e., rule set $S$) is defined by its classification performance on the training patterns and the number of fuzzy if-then rules in $S$ as follows:

$$fitness(S) = NCP(S) - w_{|S|} \cdot |S|, \qquad (11)$$
where $NCP(S)$ is the number of training patterns correctly classified by the rule set $S$, $w_{|S|}$ is a positive constant, and $|S|$ is the number of fuzzy if-then rules in $S$. Since our aim is to select only a small number of significant rules, the second term in (11) is included in the fitness function as a kind of penalty with respect to the number of selected rules. From the current population, good strings with high fitness values are selected as parents for generating new strings by genetic operations. In the computer simulations of this paper, we specify the selection probability of each string $S$ in the current population $\Psi$ by the roulette wheel selection scheme with linear scaling as
$$P(S) = \frac{fitness(S) - f_{\min}(\Psi)}{\sum_{S \in \Psi} \left[ fitness(S) - f_{\min}(\Psi) \right]}, \qquad (12)$$
where $f_{\min}(\Psi)$ is the fitness value of the worst string in the current population $\Psi$ (i.e., the minimum fitness value in $\Psi$). Since every rule set $S$ is denoted by a bit string, we can use standard genetic operations such as the uniform crossover and the bit mutation in our genetic algorithm. One characteristic feature of our genetic algorithm is the use of biased mutation probabilities. For efficiently decreasing the number of fuzzy if-then rules included in each rule set (i.e., the number of 1's in each string) by the bit mutation, we assign a higher probability to the mutation from $s_j = 1$ to $s_j = 0$ than to the mutation from $s_j = 0$ to $s_j = 1$. This is because the former decreases the number of fuzzy if-then rules while the latter increases it. We also use an elitist strategy where the best string in the current population is always inherited to the next population with no change.
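One generation of this rule selection GA, combining the fitness (11), the roulette wheel selection with linear scaling (12), the uniform crossover, the biased mutation, and the elitist strategy, might be sketched as follows; the `evaluate` callback, the mutation probabilities, and the population layout are illustrative assumptions:

```python
import random

# Sketch of one generation of the rule selection GA.  `evaluate(s)` should
# return NCP(S) for the rule subset coded by bit string s; w is the
# penalty weight of Eq. (11).  All concrete values are illustrative.

def fitness(s, evaluate, w):
    return evaluate(s) - w * sum(s)                     # Eq. (11)

def select(population, fits):
    f_min = min(fits)                                   # linear scaling, Eq. (12)
    weights = [f - f_min for f in fits]
    if sum(weights) == 0.0:                             # degenerate population
        return random.choice(population)
    return random.choices(population, weights=weights)[0]

def mutate(s, p10=0.1, p01=0.01):
    # Biased mutation: 1 -> 0 is far more likely than 0 -> 1, which
    # steadily prunes rules from each rule set.
    return [0 if (b == 1 and random.random() < p10) else
            1 if (b == 0 and random.random() < p01) else b for b in s]

def next_generation(population, evaluate, w):
    fits = [fitness(s, evaluate, w) for s in population]
    elite = population[fits.index(max(fits))]           # elitist strategy
    children = []
    while len(children) < len(population) - 1:
        a, b = select(population, fits), select(population, fits)
        mask = [random.random() < 0.5 for _ in a]       # uniform crossover
        child = [x if m else y for x, y, m in zip(a, b, mask)]
        children.append(mutate(child))
    return [elite] + children
```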
Various versions of our rule selection method were studied in Ishibuchi et al. [18], especially from a viewpoint of multi-objective optimization. Moreover, a reward-punishment learning scheme of fuzzy if-then rules [25] can be applied to newly generated rule sets for improving their classification performance before their fitness values are calculated.

12.2.5 Genetics-Based Machine Learning for Fuzzy Rule Generation
The rule selection method in the previous section searches for a compact rule set from the given candidate rule set. Since the string length is the same as the number of candidate rules, the rule selection method cannot handle a large number of candidate rules. This means that we need a prescreening procedure to generate a tractable number of candidate rules when the rule selection method is applied to high-dimensional problems. A more straightforward fuzzy rule generation method was proposed for high-dimensional pattern classification problems by Ishibuchi et al. [19, 20], where genetic operations are used for generating combinations of antecedent fuzzy sets (i.e., for specifying the antecedent part of each fuzzy if-then rule). In our fuzzy genetics-based machine learning algorithm, called a fuzzy classifier system, every fuzzy if-then rule is denoted by a string of length $n$, which consists of its $n$ antecedent fuzzy sets. Let us denote the six antecedent fuzzy sets as follows: don't care $\to 0$, small $\to 1$, medium small $\to 2$, medium $\to 3$, medium large $\to 4$, large $\to 5$.
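This symbol coding can be sketched as follows; the `decode` helper and its output format are illustrative:

```python
# Sketch of the string coding used by the fuzzy classifier system: one
# symbol from {0, ..., 5} per attribute, 0 meaning "don't care".

LABELS = ["don't care", "small", "medium small", "medium",
          "medium large", "large"]

def decode(code):
    """Turn a string such as "1030" into a readable antecedent."""
    parts = [f"x{i + 1} is {LABELS[int(c)]}" for i, c in enumerate(code)]
    return "If " + " and ".join(parts)

print(decode("1030"))
# -> If x1 is small and x2 is don't care and x3 is medium and x4 is don't care
```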
Using this notation, every fuzzy if-then rule is denoted by a string of length $n$ over the alphabet $\{0, 1, 2, 3, 4, 5\}$. For example, "1030" denotes "If $x_1$ is small and $x_2$ is don't care and $x_3$ is medium and $x_4$ is don't care". The corresponding consequent part is determined by the heuristic rule generation procedure. So only the antecedent part is coded as a string. Such a string is handled as an individual in our fuzzy classifier system. First our fuzzy classifier system randomly generates a prespecified number of strings of length $n$ over the alphabet $\{0, 1, 2, 3, 4, 5\}$ to form an initial population (i.e., an initial set of fuzzy if-then rules). All the training patterns are classified by the current population for evaluating the fitness of each string (i.e., each fuzzy if-then rule). The fitness value of a fuzzy if-then rule $R_j$ in the current population is defined as follows based on the
classification results on the training patterns:

$$fitness(R_j) = NCP(R_j) - w_{\text{error}} \cdot NMP(R_j), \qquad (13)$$
where $NCP(R_j)$ is the number of training patterns correctly classified by $R_j$, $w_{\text{error}}$ is a positive weight, and $NMP(R_j)$ is the number of training patterns misclassified by $R_j$. It should be noted that a single winner rule is responsible for the correct classification (or misclassification) of each training pattern. The first term and the second term in (13) are viewed as the reward for correct classification and the penalty for misclassification, respectively. A pair of good strings is selected from the current population for generating new strings by genetic operations. The selection probability of each string is specified in the same manner as in the case of the rule selection. The uniform crossover is used for generating two strings from the selected pair of parent strings. A mutation operation is applied to each value of the generated strings, which randomly replaces a value in the strings with another value. By iterating these genetic operations, a prespecified number of new strings are generated. The same number of the worst strings in the current population are replaced with the newly generated strings. The number of strings in each population is kept constant during this population update. Various versions of our fuzzy classifier system were studied in Ishibuchi & Nakashima [20] for improving its search ability to efficiently find good fuzzy if-then rules. The learning of fuzzy if-then rules [25] is applicable to each population for improving its classification performance before the fitness value of each fuzzy if-then rule is calculated.

12.2.6 Fuzzy Rule Extraction from Numerical Data via Neural Networks
In the previous sections, we described several methods for generating fuzzy if-then rules directly from the given training patterns. In this section, we illustrate an extraction method of fuzzy if-then rules from trained neural networks [21]. Advantages of our method over other rule extraction methods [26-35] are as follows: (1) the extracted fuzzy if-then rules are always linguistically interpretable, (2) a certainty grade is assigned to each rule, and (3) our method is applicable to arbitrary trained neural networks. That is, our method is a general algorithm to extract fuzzy if-then rules of the
form in (1) by handling a trained neural network as a black-box model. For simplicity of illustration, we assume that a standard three-layer feedforward neural network has already been trained using the given training patterns, while our method is applicable to more general neural networks (e.g., four-layer neural networks). The number of input units is the same as the dimensionality of the pattern classification problem (i.e., $n$), and the number of output units is the same as the number of classes (i.e., $c$). The number of hidden units, which can be arbitrarily specified, is denoted by $n_H$. As in Rumelhart et al. [2], the input-output relation of each unit in the trained neural network can be written for an $n$-dimensional input vector $\mathbf{x}_p = (x_{p1}, \ldots, x_{pn})$ as follows:

Input units:
$$o_{pi} = x_{pi}, \qquad i = 1, 2, \ldots, n \qquad (14)$$

Hidden units:
$$o_{pj} = f(net_{pj}), \qquad j = 1, 2, \ldots, n_H \qquad (15)$$
$$net_{pj} = \sum_{i=1}^{n} o_{pi} \cdot w_{ji} + \theta_j, \qquad j = 1, 2, \ldots, n_H \qquad (16)$$

Output units:
$$o_{pk} = f(net_{pk}), \qquad k = 1, 2, \ldots, c \qquad (17)$$
$$net_{pk} = \sum_{j=1}^{n_H} o_{pj} \cdot w_{kj} + \theta_k, \qquad k = 1, 2, \ldots, c \qquad (18)$$

As the activation function $f(\cdot)$ in the hidden and output layers, we use the sigmoidal function $f(x) = 1/(1 + \exp(-x))$ as in Rumelhart et al. [2]. Our task in this section is to generate fuzzy if-then rules from the trained neural network in (14)-(18). Our rule extraction method [21] determines the consequent part of a fuzzy if-then rule using the trained neural network when its antecedent part is specified. Thus our rule extraction method can be viewed as a counterpart of the heuristic rule extraction procedure. Now let us illustrate how the consequent class $C_p$ and the certainty grade $CF_p$ of the fuzzy if-then rule $R_p$ can be determined by the trained neural network when its antecedent fuzzy sets $A_{p1}, \ldots, A_{pn}$ are given. First, the antecedent fuzzy sets are presented to the trained neural network as a fuzzy input vector. The input-output relation of each unit in the trained neural network is extended to the case of the fuzzy input vector
$\mathbf{A}_p = (A_{p1}, \ldots, A_{pn})$ as

Input units:
$$O_{pi} = A_{pi}, \qquad i = 1, 2, \ldots, n \qquad (19)$$

Hidden units:
$$O_{pj} = f(Net_{pj}), \qquad j = 1, 2, \ldots, n_H \qquad (20)$$
$$Net_{pj} = \sum_{i=1}^{n} O_{pi} \cdot w_{ji} + \theta_j, \qquad j = 1, 2, \ldots, n_H \qquad (21)$$

Output units:
$$O_{pk} = f(Net_{pk}), \qquad k = 1, 2, \ldots, c \qquad (22)$$
$$Net_{pk} = \sum_{j=1}^{n_H} O_{pj} \cdot w_{kj} + \theta_k, \qquad k = 1, 2, \ldots, c \qquad (23)$$
In the above formulations, uppercase letters (e.g., $A_{pi}$, $O_{pi}$, $Net_{pj}$) denote fuzzy numbers, and lowercase letters (e.g., $w_{ji}$, $w_{kj}$, $\theta_j$) denote real numbers. The input-output relation of each unit is defined by fuzzy arithmetic on fuzzy numbers [36]. Its numerical calculation is performed by interval arithmetic [37, 38] on level sets of the fuzzy input vector as in many studies on fuzzified neural networks [22, 23, 39, 40]. In Fig. 12.8 and Fig. 12.9, we illustrate the sum of fuzzy numbers and the nonlinear mapping of fuzzy numbers by the sigmoidal activation function, respectively.
Fig. 12.8 Sum of fuzzy numbers A and B.
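Because the sigmoid is monotonically increasing, the level-set computation behind (19)-(23) reduces to interval arithmetic on interval endpoints. The following is a minimal sketch; the toy network, its weights, and the layer sizes are illustrative assumptions, not taken from the text:

```python
import math

# Sketch of one forward pass with interval inputs (the level-set
# calculation behind Eqs. (19)-(23)).  Intervals are (lower, upper) pairs.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine_interval(inputs, weights, bias):
    """Interval of w . x + b when each x_i ranges over an interval."""
    lo = hi = bias
    for (l, u), w in zip(inputs, weights):
        lo += w * l if w >= 0 else w * u
        hi += w * u if w >= 0 else w * l
    return lo, hi

def layer_interval(inputs, weight_rows, biases):
    out = []
    for weights, bias in zip(weight_rows, biases):
        lo, hi = affine_interval(inputs, weights, bias)
        # The sigmoid is increasing, so it maps intervals endpoint-wise.
        out.append((sigmoid(lo), sigmoid(hi)))
    return out

# Toy 2-2-2 network with illustrative weights.
W_h, b_h = [[2.0, -1.0], [1.0, 1.0]], [0.0, -1.0]
W_o, b_o = [[1.5, -0.5], [-1.5, 0.5]], [0.0, 0.0]

hidden = layer_interval([(0.0, 0.2), (0.4, 0.6)], W_h, b_h)
outputs = layer_interval(hidden, W_o, b_o)
print(outputs)   # one (lower, upper) interval per output unit
```

Propagating a degenerate interval (lower = upper) reproduces the crisp forward pass (14)-(18), which is one way to sanity-check such a sketch.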
When a non-fuzzy input vector $\mathbf{x}_p$ is presented to the trained neural network, we have a crisp output vector $\mathbf{o}_p = (o_{p1}, \ldots, o_{pc})$ calculated by (14)-(18). In this case, the non-fuzzy input vector $\mathbf{x}_p$ is classified by the output unit with the maximum output value among the $c$ output units.
Fig. 12.9 Nonlinear mapping of fuzzy numbers by the sigmoidal activation function.
That is, we use the following classification rule:

If $o_{pk} < o_{ph}$ for all $k$'s ($k = 1, 2, \ldots, c$ and $k \neq h$), then $\mathbf{x}_p$ is classified as Class $h$, $\qquad (24)$

where $o_{pk}$ is the output value from the $k$-th output unit ($k = 1, 2, \ldots, c$). By simply extending the above classification rule to the case of the fuzzy input vector $\mathbf{A}_p = (A_{p1}, \ldots, A_{pn})$, we have the following rule:

If $O_{pk} < O_{ph}$ for all $k$'s ($k = 1, 2, \ldots, c$ and $k \neq h$), then $\mathbf{A}_p$ is classified as Class $h$, $\qquad (25)$

where $O_{pk}$ is the fuzzy output from the $k$-th output unit ($k = 1, 2, \ldots, c$). If we try to classify the fuzzy input vector $\mathbf{A}_p$ (i.e., to determine the consequent class $C_p$) by this classification rule, we have to define the inequality relation between the fuzzy outputs (i.e., $O_{pk} < O_{ph}$). Since it is very difficult to clearly decide whether this inequality relation holds or not, we use the following classification rule based on the level sets of the fuzzy output vector:

If $[O_{pk}]_\beta < [O_{ph}]_\beta$ for all $k$'s ($k = 1, 2, \ldots, c$ and $k \neq h$), then $[\mathbf{A}_p]_\beta$ is classified as Class $h$, $\qquad (26)$

where $[\cdot]_\beta$ denotes the level set of a fuzzy number at level $\beta$, which is defined as

$$[X]_\beta = \{x \mid \mu_X(x) \geq \beta, \; x \in \Re\}. \qquad (27)$$
In general, level sets of fuzzy numbers are intervals. We use the following definition for the inequality relation between the level sets (i.e., intervals) in (26):

$$[a^L, a^U] < [b^L, b^U] \iff a^U < b^L, \qquad (28)$$

where the superscripts "$L$" and "$U$" denote the lower and upper limits of an interval, respectively. That is, an interval $A$ is written as

$$A = [a^L, a^U] = \{x \mid a^L \leq x \leq a^U, \; x \in \Re\}. \qquad (29)$$
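The interval inequality (28) and the level-set classification rule (26) can be sketched directly on (lower, upper) pairs:

```python
# Sketch of the interval inequality (28) and the classification rule (26).

def interval_less(a, b):
    """[aL, aU] < [bL, bU]  iff  aU < bL  (Eq. (28))."""
    return a[1] < b[0]

def classify_intervals(outputs):
    """Return the winning class index under rule (26), or None."""
    for h, o_h in enumerate(outputs):
        if all(interval_less(o_k, o_h)
               for k, o_k in enumerate(outputs) if k != h):
            return h
    return None

print(classify_intervals([(0.1, 0.3), (0.6, 0.9), (0.2, 0.4)]))  # -> 1
print(classify_intervals([(0.1, 0.5), (0.4, 0.9)]))  # -> None (overlap)
```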
When the classification rule in (26) holds for a prespecified value of $\beta$, we classify the $\beta$-level set $[\mathbf{A}_p]_\beta$ of the fuzzy input vector $\mathbf{A}_p$ as Class $h$. In this case, we determine the consequent class $C_p$ of the fuzzy if-then rule $R_p$ as Class $h$. In Fig. 12.10, we show some examples of fuzzy output vectors. The consequent class is determined as Class 2 in the case of Fig. 12.10 (a), whereas it cannot be determined in Fig. 12.10 (b). In the latter case, we do not extract the corresponding fuzzy if-then rule. From Fig. 12.10, we can see that the fuzzy output vector is classifiable when the overlap between the largest fuzzy output and the other fuzzy outputs is not large. In other words, when the overlap is small, we are sure that the fuzzy input vector belongs to the Class $h$ determined by (26). Furthermore, we may think that the smaller the overlap is, the larger the certainty of the classification is. Based on this discussion, we define the certainty grade $CF_p$ by the overlap between the largest fuzzy output (i.e., $O_{ph}$) and the other fuzzy outputs as in Fig. 12.10 (a). Since the fuzzy output vector $\mathbf{O}_p = (O_{p1}, \ldots, O_{pc})$ is numerically calculated by interval arithmetic on level sets of the fuzzy input vector $\mathbf{A}_p = (A_{p1}, \ldots, A_{pn})$, we use the following procedure to determine the consequent class $C_p$ and the certainty grade $CF_p$:

Step 1: Specify the value of $\beta$ (e.g., $\beta = 0.9$).

Step 2: Examine the classifiability of $[\mathbf{A}_p]_\beta$ by the classification rule in (26). If $[\mathbf{A}_p]_\beta$ is not classifiable, terminate this procedure. In this case, we do not extract the fuzzy if-then rule $R_p$ with the antecedent fuzzy sets $A_{p1}, \ldots, A_{pn}$. Otherwise specify the consequent class $C_p$ as the Class $h$ that satisfies (26), and go to Step 3.

Step 3: Slightly decrease the value of $\beta$ as $\beta := \beta - \varepsilon$, where $\varepsilon$ is a very small positive constant (e.g., $\varepsilon = 0.01$). If $\beta < 0$, specify the
Fig. 12.10 Examples of fuzzy output vectors: (a) classifiable case; (b) unclassifiable case.
certainty grade $CF_p$ as $CF_p = 1.0$ and terminate this procedure. Otherwise go to Step 4.

Step 4: Examine the classifiability of $[\mathbf{A}_p]_\beta$ by the classification rule in (26). If $[\mathbf{A}_p]_\beta$ is classifiable, return to Step 3 for examining the classifiability of the fuzzy input vector $\mathbf{A}_p$ further. Otherwise, specify the certainty grade $CF_p$ as $CF_p = 1 - (\beta + \varepsilon)$, and terminate this procedure.

By this procedure, the certainty grade is determined by the overlap between the largest fuzzy output $O_{ph}$ and the other fuzzy outputs, as shown in Fig. 12.10 (a).

Let us illustrate our approach to the design of fuzzy rule-based classification systems from numerical data and linguistic knowledge through computer simulations on a simple example. We assume that the training patterns in Fig. 12.11 and the following fuzzy if-then rules are given for designing a fuzzy rule-based classification system:

If $x_1$ is small then Class 1 with $CF = 1.0$, $\qquad (30)$
If $x_1$ is large then Class 2 with $CF = 1.0$, $\qquad (31)$
If $x_1$ is medium or medium large or large and $x_2$ is small then Class 1 with $CF = 1.0$, $\qquad (32)$

where we use a trapezoidal membership function for the combined linguistic value "medium or medium large or large". The antecedent fuzzy set "don't care" on the second attribute $x_2$ is omitted in the first and second fuzzy if-then rules. Our task is to design a fuzzy rule-based classification system from the training patterns in Fig. 12.11 and the fuzzy if-then rules in (30)-(32).
Fig. 12.11 Numerical data and the classification boundary by the trained neural network.
We trained a three-layer feedforward neural network using the standard back-propagation algorithm. The classification boundary by the trained neural network is shown in Fig. 12.11. Using the five linguistic values in Fig. 12.5 and "don't care" as antecedent fuzzy sets (i.e., as fuzzy inputs to the trained neural network), we examined the 36 fuzzy input vectors. By our rule extraction method, we extracted 24 fuzzy if-then rules from the
trained neural network:

If $x_1$ is small then Class 1 with $CF = 0.73$,
If $x_1$ is small and $x_2$ is small then Class 1 with $CF = 0.94$,
If $x_1$ is large and $x_2$ is large then Class 2 with $CF = 1.0$.
Fig. 12.12 Classification boundary by the extracted 15 rules.
Fig. 12.13 Classification boundary by the extracted 15 rules and the given 3 rules.
In the extracted fuzzy if-then rules, some rules are included in other rules. For example, in the above 24 rules, the second fuzzy if-then rule is included in the first rule. For avoiding the use of too many fuzzy if-then rules in classification systems, we do not use fuzzy if-then rules that are included in other rules. In the above example, nine fuzzy if-then rules among the extracted 24 rules are included in other rules. Thus we use the other 15 fuzzy if-then rules in a fuzzy rule-based classification system. In Fig. 12.12, we show the classification boundary obtained by those 15 fuzzy if-then rules. We can see that the classification boundary by the extracted 15 fuzzy if-then rules in Fig. 12.12 is similar to that of the trained neural network in Fig. 12.11. The extracted 15 fuzzy if-then rules are used together with the given 3 fuzzy if-then rules. In Fig. 12.13, we show the classification boundary by the fuzzy rule-based classification system with the 18 fuzzy if-then rules.
12.3 Neural-Network-Based Approach
We have already explained how fuzzy rule-based classification systems can be designed from numerical data and linguistic knowledge. In this section, we describe the design of neural-network-based classification systems. As in the previous section, we assume that the $m$ training patterns $\mathbf{x}_p = (x_{p1}, \ldots, x_{pn})$ and the $M$ fuzzy if-then rules in (1) are given. A learning algorithm was proposed by Ishibuchi et al. [22] for training neural networks by fuzzy if-then rules without certainty grades. In their approach, fuzzy if-then rules of the following type are used in the learning of neural networks:

If $x_1$ is $A_{p1}$ and $\ldots$ and $x_n$ is $A_{pn}$ then Class $C_p$. $\qquad (33)$
The antecedent fuzzy sets $A_{p1}, \ldots, A_{pn}$ are used as fuzzy inputs to neural networks as in our fuzzy rule extraction method. When the fuzzy input vector $\mathbf{A}_p = (A_{p1}, \ldots, A_{pn})$ is presented to a three-layer feedforward neural network with $n$ input units, $n_H$ hidden units and $c$ output units, the input-output relation of each unit is defined by (19)-(23). The corresponding fuzzy output vector $\mathbf{O}_p = (O_{p1}, \ldots, O_{pc})$ is numerically calculated by interval arithmetic on level sets of the fuzzy input vector $\mathbf{A}_p$. A target vector $\mathbf{t}_p = (t_{p1}, \ldots, t_{pc})$ is defined for the fuzzy input vector $\mathbf{A}_p$ by the
consequent class (i.e., Class $C_p$) as

$$t_{pk} = \begin{cases} 1, & \text{if Class } C_p = \text{Class } k, \\ 0, & \text{otherwise,} \end{cases} \qquad k = 1, 2, \ldots, c. \qquad (34)$$
A cost function to be minimized in the learning of the neural network is defined by the difference between the fuzzy output vector $\mathbf{O}_p = (O_{p1}, \ldots, O_{pc})$ and the target vector $\mathbf{t}_p = (t_{p1}, \ldots, t_{pc})$ as

$$e_p = \sum_{k=1}^{c} \frac{\left(t_{pk} - [O_{pk}]_\alpha^L\right)^2 + \left(t_{pk} - [O_{pk}]_\alpha^U\right)^2}{2}, \qquad (35)$$
where the superscripts "$L$" and "$U$" denote the lower limit and the upper limit of the $\alpha$-level set $[O_{pk}]_\alpha$ of the fuzzy output $O_{pk}$, respectively:

$$[O_{pk}]_\alpha = \left[[O_{pk}]_\alpha^L, \; [O_{pk}]_\alpha^U\right]. \qquad (36)$$
The $\alpha$-level set $[O_{pk}]_\alpha$ is calculated from the $\alpha$-level set of the fuzzy input vector $\mathbf{A}_p$ by interval arithmetic. In the same manner as in the back-propagation algorithm [2], we can derive the learning algorithm for the connection weights and biases of the neural network from the cost function in (35). While the neural network can be trained by this cost function, the certainty grade $CF_p$ of each fuzzy if-then rule of the form in (1) is not taken into account in the learning. In this paper, we modify the membership function $\mu(\cdot)$ of each antecedent fuzzy set as follows for utilizing the certainty grade $CF_p$ of each fuzzy if-then rule in the learning of the neural network:

$$\mu_{A'_{pi}}(x_i) = \mu_{A_{pi}}(x_i) \cdot CF_p, \qquad i = 1, 2, \ldots, n; \; p = 1, 2, \ldots, M, \qquad (37)$$
where A*_{pi} is the modified antecedent fuzzy set. Since the membership value is discounted by CF_p in (37), the α-level set of the modified antecedent fuzzy set is empty when α > CF_p. To prevent such an empty level set from being used in the learning, the value of α in the cost function (35) is restricted by the inequality α ≤ CF_p. In our learning method, each training pattern x_p = (x_{p1}, ..., x_{pn}) is handled as a fuzzy input vector in order to use numerical data and linguistic knowledge in a common learning algorithm. The input value x_{pi} is handled as a
fuzzy number with the following membership function:
\mu_{x_{pi}}(x_i) = \begin{cases} 1, & \text{if } x_i = x_{pi}, \\ 0, & \text{if } x_i \neq x_{pi}. \end{cases}  (38)
The certainty grade CF_p of each training pattern is implicitly assumed to be CF_p = 1.0. If different certainty grades are assigned to some training patterns, they can be used as in (37) for discounting the membership function in (38). In this manner, the given M fuzzy if-then rules and the given m training patterns are commonly handled as the fuzzy training data (A_{p1}, ..., A_{pn}; C_p; CF_p), p = 1, 2, ..., m + M, where A_{pi} = x_{pi}, i = 1, 2, ..., n, in the case of training patterns. The learning algorithm is summarized as follows:

Step 1: As in the learning of standard feed-forward neural networks, specify the initial values of the connection weights and biases, the learning rate, the momentum constant, and the stopping condition. In addition to these parameter specifications, specify a set of values of α used in the cost function in (35). Let us denote those values as α_1, ..., α_K.
Step 2: Let k be the index of the value of α. Specify k as k = 1.
Step 3: Specify the pattern index p as p = 1.
Step 4: Let α := α_k. If α > CF_p then go to Step 5. Otherwise adjust the connection weights and biases using the α-level set [A_p]_α of the p-th fuzzy input pattern A_p.
Step 5: If the stopping condition is satisfied, terminate the learning algorithm. Otherwise go to Step 6.
Step 6: Update the pattern index p as p := p + 1. If p ≤ m + M then return to Step 4. Otherwise go to Step 7.
Step 7: Update the index k as k := k + 1. If k ≤ K then return to Step 3. Otherwise return to Step 2.

Let us illustrate our learning algorithm by the numerical data in Fig. 12.11, which has already been used for illustrating our fuzzy rule extraction method. As in the case of the fuzzy rule extraction, we assume that the 3 fuzzy if-then rules in (30)-(32) are given in addition to the 20 training patterns in Fig. 12.11. These two kinds of available information are handled as the 23 fuzzy training patterns in our learning algorithm.
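As a rough sketch of the interval computation behind the weight adjustment above (assumed sigmoid units; the network sizes, weights, and input intervals below are made-up illustrations, not the authors' code): because the sigmoid is monotone, each endpoint of an α-level interval maps directly through it, and the sign of each weight decides which input endpoint feeds the lower or upper net input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interval_layer(x_lo, x_hi, W, b):
    """Propagate an interval input through one sigmoid layer.

    The lower net input uses x_lo where the weight is positive and x_hi
    where it is negative (and vice versa for the upper net input); since
    the sigmoid is monotone, the endpoints map directly.
    """
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    net_lo = Wp @ x_lo + Wn @ x_hi + b
    net_hi = Wp @ x_hi + Wn @ x_lo + b
    return sigmoid(net_lo), sigmoid(net_hi)

def cost(alpha, t, o_lo, o_hi):
    """Cost of Eq. (35) for one pattern at one alpha level."""
    return alpha * np.sum((t - o_lo) ** 2 / 2 + (t - o_hi) ** 2 / 2)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 2)), np.zeros(5)   # 2 inputs, 5 hidden units
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)   # 2 output units

# alpha-level set of a fuzzy input vector (a 2-D interval vector)
x_lo, x_hi = np.array([0.2, 0.4]), np.array([0.3, 0.6])
h_lo, h_hi = interval_layer(x_lo, x_hi, W1, b1)
o_lo, o_hi = interval_layer(h_lo, h_hi, W2, b2)
t = np.array([1.0, 0.0])                        # target for Class 1
print(cost(0.5, t, o_lo, o_hi))
```

Differentiating this cost with respect to the weights, as in ordinary back-propagation, gives the update used in Step 4.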
We trained a three-layer feed-forward neural network with two input units, five hidden units, and two output units by the above learning algorithm. We show
the classification boundary by the trained neural network in Fig. 12.14. From this figure, we can see that all the training patterns are correctly classified. We can also see that the classification boundary coincides with the given fuzzy if-then rules in (30)-(32). This means that our learning algorithm can simultaneously utilize the training patterns and the fuzzy if-then rules. When there are conflicts between numerical data and linguistic knowledge, the learning algorithm tries to find a kind of compromise. In such a learning process, the certainty grade attached to every single piece of information plays an important role (i.e., has a large effect on the final learning result). For example, if a training pattern with a very small certainty grade is included in a fuzzy if-then rule with a different consequent class, that training pattern is almost ignored in the learning of neural networks. On the other hand, when two fuzzy if-then rules with different consequent classes overlap with each other, the classification boundary is likely to follow the fuzzy if-then rule with the larger certainty grade.
Fig. 12.14 Classification boundary by the trained neural network from numerical data and linguistic knowledge.
12.4 Performance Evaluation
In this section, we examine the performance of the two approaches (i.e., the fuzzy rule-based approach and the neural-network-based approach) to the design of classification systems from numerical data and linguistic knowledge through computer simulations on the well-known iris data. The iris data involve 150 samples with four attributes from three classes [41]. Since we had no linguistic knowledge on the iris data, we generated linguistic knowledge in the following manner to artificially create the situation where both numerical data and linguistic knowledge are given. First we randomly divided the iris data into three subsets (say, data sets A, B, and C) with 50 samples each. Data set A was used as numerical data. Data set B was used as test data for evaluating classification systems. Data set C was used for generating linguistic knowledge. We employed the GA-based rule selection method [16-18] for generating a small number of fuzzy if-then rules from the data set C. In our computer simulation, the five linguistic values in Fig. 12.5 and "don't care" were used as antecedent fuzzy sets. Since the iris data have four attributes, we examined 6^4 = 1296 combinations of the antecedent fuzzy sets to generate candidate fuzzy if-then rules of the following type:

If x_1 is A_{j1} and x_2 is A_{j2} and x_3 is A_{j3} and x_4 is A_{j4} then Class C_j with CF_j.  (39)
By the heuristic rule generation procedure, 491.8 fuzzy if-then rules were generated from the data set C on the average (over 50 trials for different specifications of the data set C). The other fuzzy if-then rules could not be generated because there were no training patterns from the data set C in the corresponding fuzzy subspaces. Then the rule selection method was applied to the generated candidate rules to select a small number of relevant fuzzy if-then rules using the data set C. The average number of selected fuzzy if-then rules was 3.8 over 50 independent trials. In this manner, a small number of fuzzy if-then rules were found from the data set C as linguistic knowledge. Our task is to design a classification system from the numerical data (i.e., the data set A) and the linguistic knowledge generated from the data set C. The designed classification system is evaluated by the data set B. In our computer simulations, we examined the following seven classification systems:

(1) Neural networks that were trained only from the numerical data (i.e., data set A).
(2) Neural networks that were trained from both the numerical data and the linguistic knowledge.
(3) Fuzzy rule-based systems where fuzzy if-then rules were generated only from the numerical data. We used the fuzzy classifier system [19, 20] to generate fuzzy if-then rules from the data set A. The number of fuzzy if-then rules was specified as 20 in the fuzzy classifier system.
(4) Fuzzy rule-based systems where fuzzy if-then rules were extracted from the trained neural networks in (1). The average number of extracted fuzzy if-then rules was 48.7.
(5) Fuzzy rule-based systems that were directly constructed from the linguistic knowledge.
(6) Fuzzy rule-based systems where the fuzzy if-then rules were a mixture of (3) and (5).
(7) Fuzzy rule-based systems where the fuzzy if-then rules were a mixture of (4) and (5).
Our computer simulation was iterated 50 times by differently partitioning the iris data set into the three subsets A, B, and C. Average classification rates on the test data (i.e., data set B) by the above seven methods are summarized in Table 12.1. For the neural-network-based classification systems, we show the best results over various specifications of the number of learning iterations. From this table, we can see that the highest average classification rate was obtained by the neural networks trained by the two kinds of available information.
Table 12.1 Classification rates on the test patterns.

Classification system   Available information                       Rate
Neural network          Numerical data                              97.7%
                        Numerical data and linguistic knowledge     97.9%
Fuzzy rule base         Numerical data                              94.6%
                        Trained neural network                      95.6%
                        Linguistic knowledge                        93.3%
                        Numerical data and trained NN               95.2%
                        Numerical data and linguistic knowledge     95.4%
12.5 Conclusion
In this paper, we illustrated how numerical data (i.e., training patterns) and linguistic knowledge (i.e., fuzzy if-then rules) can be simultaneously utilized in the design of pattern classification systems. We proposed two approaches. One is a fuzzy rule-based approach where fuzzy if-then rules generated from numerical data are used together with the given linguistic knowledge to construct a fuzzy rule-based classification system. We described several techniques for generating fuzzy if-then rules from numerical data. The other is a neural-network-based approach where the given linguistic knowledge is used as training data in the learning of neural networks. In this approach, linguistic knowledge (i.e., fuzzy if-then rules) and numerical data (i.e., training patterns) are handled in a common learning algorithm as fuzzy training patterns. Neural networks were extended to the case of fuzzy input vectors. Through computer simulations, we demonstrated that classification systems designed by utilizing the two kinds of information had high classification rates.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
[2] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing, MIT Press, Cambridge, 1986.
[3] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn, Morgan Kaufmann Publishers, San Mateo, 1991.
[4] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, California, 1993.
[5] M. Sugeno, "An introductory survey of fuzzy control," Information Sciences, vol. 36, no. 1/2, pp. 59-83, 1985.
[6] C. C. Lee, "Fuzzy logic in control systems: fuzzy logic controller, Part I and Part II," IEEE Trans. on Systems, Man, and Cybernetics, vol. 20, no. 2, pp. 404-435, 1990.
[7] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. on Systems, Man, and Cybernetics, vol. 15, no. 1, pp. 116-132, 1985.
[8] L. X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Trans. on Systems, Man, and Cybernetics, vol. 22, no. 6, pp. 1414-1427, 1992.
[9] M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Trans. on Fuzzy Systems, vol. 1, no. 1, pp. 7-31, 1993.
[10] S. Mitra, "Fuzzy MLP based expert system for medical diagnosis," Fuzzy Sets and Systems, vol. 65, no. 2/3, pp. 285-296, 1994.
[11] S. Abe and M.-S. Lan, "A method for fuzzy rules extraction directly from numerical data and its application to pattern classification," IEEE Trans. on Fuzzy Systems, vol. 3, no. 1, pp. 18-28, 1995.
[12] Y. Yuan and H. Zhuang, "A genetic algorithm for generating fuzzy classification rules," Fuzzy Sets and Systems, vol. 84, no. 1, pp. 1-19, 1996.
[13] O. Cordón and F. Herrera, "A three-stage evolutionary process for learning descriptive and approximate fuzzy-logic-controller knowledge bases from examples," International Journal of Approximate Reasoning, vol. 17, no. 4, pp. 369-407, 1997.
[14] D. Nauck and R. Kruse, "A neuro-fuzzy method to learn fuzzy classification rules from data," Fuzzy Sets and Systems, vol. 89, pp. 277-288, 1997.
[15] H. Ishibuchi, K. Nozaki, and H. Tanaka, "Distributed representation of fuzzy rules and its application to pattern classification," Fuzzy Sets and Systems, vol. 52, no. 1, pp. 21-32, 1992.
[16] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Construction of fuzzy classification systems with rectangular fuzzy rules using genetic algorithms," Fuzzy Sets and Systems, vol. 65, pp. 237-253, 1994.
[17] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Selecting fuzzy if-then rules for classification problems using genetic algorithms," IEEE Trans. on Fuzzy Systems, vol. 3, no. 2, pp. 260-270, 1995.
[18] H. Ishibuchi, T. Murata, and I. B. Turksen, "Single-objective and multi-objective genetic algorithms for selecting linguistic rules for pattern classification problems," Fuzzy Sets and Systems, vol. 89, pp. 135-150, 1997.
[19] H. Ishibuchi, T. Nakashima, and T. Murata, "Performance evaluation of fuzzy classifier systems for multi-dimensional pattern classification problems," IEEE Trans. on Systems, Man, and Cybernetics (in press).
[20] H. Ishibuchi and T. Nakashima, "Improving the performance of fuzzy classifier systems for pattern classification problems with continuous attributes," IEEE Trans. on Industrial Electronics (in press).
[21] H. Ishibuchi and M. Nii, "Generating fuzzy if-then rules from trained neural networks," Proc. of 1996 IEEE International Conference on Neural Networks, pp. 1133-1138, 1996.
[22] H. Ishibuchi, R. Fujioka, and H. Tanaka, "Neural networks that learn from fuzzy if-then rules," IEEE Trans. on Fuzzy Systems, vol. 1, no. 2, pp. 85-97, 1993.
[23] H. Ishibuchi, H. Tanaka, and H. Okada, "Interpolation of fuzzy if-then rules by neural networks," International J. of Approximate Reasoning, vol. 10, no. 1, pp. 3-27, 1994.
[24] H. Ishibuchi, M. Nii, and I. B. Turksen, "Bidirectional bridge between neural networks and linguistic knowledge: Linguistic rule extraction and learning from linguistic rules," Proc. of 1998 IEEE International Conference on Fuzzy Systems, pp. 1112-1117, 1998.
[25] K. Nozaki, H. Ishibuchi, and H. Tanaka, "Adaptive fuzzy rule-based classification systems," IEEE Trans. on Fuzzy Systems, vol. 4, no. 3, pp. 238-250, 1996.
[26] R. Andrews, J. Diederich, and A. B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373-389, 1995.
[27] L. Fu, "Rule generation from neural networks," IEEE Trans. on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114-1124, 1994.
[28] S. Sestito and T. Dillon, "Knowledge acquisition of conjunctive rules using multilayered neural networks," International Journal of Intelligent Systems, vol. 8, pp. 779-805, 1993.
[29] G. Towell and J. W. Shavlik, "Interpretation of artificial neural networks: mapping knowledge-based neural networks into rules," Advances in Neural Information Processing Systems 4 (edited by J. E. Moody, S. J. Hanson, and R. P. Lippmann), Morgan Kaufmann, San Mateo, pp. 977-984, 1992.
[30] G. Towell and J. W. Shavlik, "Extracting refined rules from knowledge-based neural networks," Machine Learning, vol. 13, pp. 71-101, 1993.
[31] Y. Hayashi, "A neural expert system with automated extraction of fuzzy if-then rules and its application to medical diagnosis," Advances in Neural Information Processing Systems 3 (edited by R. P. Lippmann, J. E. Moody, and D. S. Touretzky), Morgan Kaufmann, San Mateo, pp. 578-584, 1991.
[32] C. Matthews and I. Jagielska, "Fuzzy rule extraction from a trained multilayered neural network," Proc. of 1995 IEEE International Conference on Neural Networks, pp. 744-748, 1995.
[33] T. Furuhashi, S. Matsushita, H. Tsutsui, and Y. Uchikawa, "Knowledge extraction from hierarchical fuzzy model obtained by fuzzy neural networks and genetic algorithms," Proc. of 1997 IEEE International Conference on Neural Networks, pp. 2374-2379, 1997.
[34] N. K. Kasabov, "Fuzzy rule extraction, reasoning and rule adaptation in fuzzy neural networks," Proc. of 1997 IEEE International Conference on Neural Networks, pp. 2380-2383, 1997.
[35] M. Umano, S. Fukunaka, I. Hatono, and H. Tamura, "Acquisition of fuzzy rules using fuzzy neural networks with forgetting," Proc. of 1997 IEEE International Conference on Neural Networks, pp. 2369-2373, 1997.
[36] A. Kaufmann and M. M. Gupta, Introduction to Fuzzy Arithmetic, Van Nostrand Reinhold, New York, 1985.
[37] R. E. Moore, Methods and Applications of Interval Analysis, SIAM Studies in Applied Mathematics, Philadelphia, 1979.
[38] G. Alefeld and J. Herzberger, Introduction to Interval Computations, Academic Press, New York, 1983.
[39] J. J. Buckley and Y. Hayashi, "Fuzzy neural networks: A survey," Fuzzy Sets and Systems, vol. 66, pp. 1-13, 1994.
[40] H. Ishibuchi, K. Morioka, and I. B. Turksen, "Learning by fuzzified neural networks," International J. of Approximate Reasoning, vol. 13, no. 4, pp. 327-358, 1995.
[41] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179-188, 1936.
Chapter 13
A Clustering based on Self-Organizing Map and Knowledge Discovery by Neural Network
Kado Nakagawa, Naotake Kamiura, and Yutaka Hata Himeji Institute of Technology
Abstract
Clustering methods, such as k-means, Fuzzy C-Means (FCM), and others have been developed. However, they only partition a database, so it is difficult to discover the reason why each cluster is formed. This paper proposes a method to discover the knowledge of how the clusters are derived. To select the center vector of each cluster, we employ an unsupervised clustering method based on the Self-Organizing Map (SOM) without giving the number of clusters. We define the degree of contribution calculated from the weights of the neural network which learned the center vectors. We then describe the knowledge discovery method from the degrees. We applied our method to the artificial data and the clustering problem. The results show that the degree of contribution is an efficient indicator to represent the knowledge of how the clusters are formed.

Keywords: Data Mining, Knowledge Discovery, Clustering, K-means, Fuzzy C-Means (FCM), Center Vector, Number of Clusters, Self-Organizing Map (SOM), Unsupervised Clustering, Reference Vector, Winning Neuron, Competitive Learning, Similarity Matching, Updating Procedure, Dense Neuron, Degree of Similarity, Combination, Neural Network, Pruning, Degree of Contribution, Neyman-Scott Method
13.1 Introduction
The rapid development of computer memory capacity enables us to accumulate databases with large amounts of information. Data mining [1] is useful in analyzing the contents of a large-scale database. It finds the patterns that appear frequently and then discovers helpful knowledge and unknown rules for users. Namely, it processes a data set and
K. Nakagawa, N. Kamiura & Y. Hata
discovers the hidden knowledge. Before discovering the hidden knowledge, the information in a huge database could be classified into some clusters. Recently, clustering [2] has been receiving considerable attention as one of the most promising approaches for classifying data. It is applied to various fields: data analysis, pattern recognition, image processing, fuzzy modeling, and so on. K-means [3] and Fuzzy C-Means (FCM for short) [4] are well-known clustering algorithms. They merely partition a database, need some teaching data, and cannot discover the knowledge of how the clustering result is derived. This paper proposes a useful method for discovering the reason why each cluster is formed. Our method consists of three steps. As the first step, we employ the clustering based on the Self-Organizing Map (SOM for short) [5]-[7] to find the center vector of each cluster. This step finds the center vectors by doing the clustering. The SOM algorithm is robust to noise and easily visualizes high-dimensional vectors (over three dimensions) by mapping a high-dimensional input data space onto a low-dimensional discrete lattice of units. This algorithm is an unsupervised learning one [8]-[11], so we can find the center vectors without any teaching data. The number of clusters equals that of the center vectors. Our method thus does not require the number of clusters to execute the clustering, though K-means and FCM require it in advance. We employ a three-layered feed-forward neural network to discover the knowledge of how the clustering result is derived. The second step repeatedly prunes [12]-[13] a neuron with the smallest sum of errors in the hidden layer, while the network learns the center vectors as teaching data. It is known that many neurons in the hidden layer may lead to overfitting of the data and poor generalization [14]-[15], while few neurons in the hidden layer may not give a network that learns the data.
Our method is thus inclined to discover suitable knowledge from the network with the optimal number of neurons in the hidden layer. Two different approaches have been proposed to overcome the problem of determining the optimal number of neurons in the hidden layer required in the network to solve a given problem. The first approach begins with a minimal network and adds more neurons to the hidden layer only when they are needed to improve the learning capability of the network. The second approach begins with an oversized network and then prunes redundant neurons [12]-[13] in the hidden layer. Since the second one can discourage the use of unnecessary connections and prevent the weights of the connections from taking excessively large values, we employ it. The third step discovers the knowledge concerned with each cluster from
the degree of contribution defined by the weights of neurons in the network. The organization of this paper is as follows. In Section 13.2, we explain the SOM algorithm. The three steps of our method are described in Section 13.3. In Section 13.4, we apply our method to a database formed artificially by employing the Neyman-Scott method [16] and to two databases of cars. Finally, in Section 13.5, a brief conclusion and future perspective are discussed.

13.2 Preliminary
13.2.1 Notation of Data Set

Let X = {x_1, ..., x_n} be a set of the input data, where x_k ∈ R^N (k = 1, ..., n). A variable x_k is an element in the N-dimensional Euclidean space such that x_k = {first attribute, ..., N-th attribute}. Namely, each variable has N attributes, e.g., x_k = {Height, Weight, Age, ...}. In this paper, for all input data x_{jk} (j = 1, ..., N) concerned with the j-th attribute, the following fuzzy membership value η_{jk} is calculated:

\eta_{jk} = \frac{x_{jk} - \min_j}{\max_j - \min_j},  (1)

where max_j is the maximum value concerned with the j-th attribute and min_j is the minimum one. We use these fuzzy membership values as the input data.

[Example 1] When the maximum value concerned with one attribute is 100 and the minimum value is 0, an input value 50 has a fuzzy membership value 0.5. □
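Eq. (1) is ordinary per-attribute min-max scaling. As a small illustrative sketch (the data values below are made up), the whole data set can be normalized attribute by attribute:

```python
import numpy as np

# Rows are data points x_k, columns are the N attributes (made-up values).
X = np.array([[170.0, 60.0, 25.0],
              [180.0, 80.0, 45.0],
              [160.0, 50.0, 35.0]])

# Eq. (1): eta_jk = (x_jk - min_j) / (max_j - min_j), computed per attribute.
mins, maxs = X.min(axis=0), X.max(axis=0)
eta = (X - mins) / (maxs - mins)
print(eta[:, 0])  # first attribute scaled into [0, 1]: 0.5, 1.0, 0.0
```

As in Example 1, a value halfway between the attribute's minimum and maximum maps to a membership value of 0.5.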
13.2.2 The SOM Algorithm

The SOM algorithm is an important tool to map high-dimensional data sets with unknown density distributions onto a low-dimensional discrete lattice of neurons. Fig. 13.1 shows the structure of the SOM. In this figure, some neurons are arranged on a two-dimensional lattice, and a weight called a reference vector is assigned to each neuron. The lattice type of the array is rectangular.
Fig. 13.1 The structure of SOM.

The SOM consists of an input layer and a map layer with neurons. The input layer is connected with all neurons and supplies all input data to them. A set of data X = {x_1, ..., x_n}, x_k ∈ R^N, with N attributes and n data is fed to the SOM via the input layer, and the map layer has M neurons. The neuron i (i = 1, ..., M) in the map layer has a reference vector m_i, m_i ∈ R^N, with N attributes, the same as an input datum. The learning process of the SOM proceeds by modifying reference vectors. The learning process is called competitive learning. It consists of two operations. If the Euclidean distance between the reference vector assigned to some neuron and the input datum x_k is the smallest when x_k is fed to the SOM, then this neuron is called a winning neuron. The first operation is the similarity matching to find a winning neuron. The second operation is the renewal procedure bringing the reference vector of the winning neuron, and those of the neurons located around the winning neuron, close to x_k. We repeat these operations a predefined number of times. The steps are as follows.

STEP 1 Initialize all reference vectors m_i (i = 1, ..., M) at random.
STEP 2 Execute the following for t = 1, ..., T.
1. Select an input datum x_k ∈ X at random.
2. Find a winning neuron c satisfying equation (2), as in the example shown in Fig. 13.2:

\| x_k - m_c \| = \min_{1 \le i \le M} \| x_k - m_i \|,  (2)
where ||x_k − m_i|| is the Euclidean distance between x_k and m_i.

Fig. 13.2 A winning neuron.
3. Modify the reference vectors according to equation (3), as in the example shown in Fig. 13.3:

m_i(t+1) = \begin{cases} m_i(t) + \alpha(t)\,[x_k - m_i(t)], & \text{if } i \in N_c, \\ m_i(t), & \text{if } i \notin N_c, \end{cases}  (3)

where N_c is the neighborhood of neurons located around the winning neuron c, α(t) < 1 is a coefficient of learning, and t is the number of renewals.
Fig. 13.3 The modification of reference vectors.
N_c and α(t) begin at big values to roughly construct a map and then take small values as the learning proceeds. In this paper, N_c begins at a distance of three and α(t) is expressed by equation (4):

\alpha(t) = \begin{cases} 0.95, & 1 \le t \le 30999, \\ \dfrac{950.0}{1 + (t - 30000)}, & 30999 < t \le 77500, \\ 0.02, & \text{otherwise}. \end{cases}  (4)
Fig. 13.4 shows this coefficient of learning, which decays from 0.95 to 0.02 over roughly 100000 modifications.

Fig. 13.4 A coefficient of learning.
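The SOM loop of STEPs 1-2 can be sketched as follows (an illustrative toy, not the chapter's code). Two assumptions are made: the neighborhood is kept as a fixed square of radius 3, whereas the chapter shrinks N_c over time, and the middle branch of α(t) is read as 950.0/(1 + (t − 30000)), which joins the two constant pieces at their endpoints.

```python
import numpy as np

def alpha(t):
    """Learning coefficient of Eq. (4) (middle branch as assumed above)."""
    if t <= 30999:
        return 0.95
    if t <= 77500:
        return 950.0 / (1 + (t - 30000))
    return 0.02

def train_som(X, rows=10, cols=10, T=5000, radius=3, seed=0):
    rng = np.random.default_rng(seed)
    m = rng.random((rows, cols, X.shape[1]))          # reference vectors
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(1, T + 1):
        x = X[rng.integers(len(X))]                   # random input datum
        d = np.linalg.norm(m - x, axis=-1)
        c = np.unravel_index(d.argmin(), d.shape)     # winning neuron, Eq. (2)
        near = np.abs(grid - np.array(c)).max(axis=-1) <= radius
        m[near] += alpha(t) * (x - m[near])           # renewal, Eq. (3)
    return m

X = np.random.default_rng(1).random((100, 3))
m = train_som(X)
print(m.shape)
```

After training, nearby neurons hold similar reference vectors, which is the property exploited by the clustering in Section 13.3.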
13.3 A Clustering Employing the SOM and Knowledge Discovery by Neural Network
13.3.1 A Clustering Employing the SOM

In this section, we apply the SOM to estimating the number of clusters and determining the center vector of each cluster. N_c takes a big value at first and the SOM roughly constructs a map. Then several data with similar values of attributes in X, fed to the input layer, are learned repeatedly on the map layer, so the reference vector of some neuron becomes similar to those of its adjacent neurons. After the learning, we calculate the average of the distances between the reference vector of some neuron and those of its adjacent neurons. We denote the average by AD. The more a neuron is learned repeatedly by several data with similar values of attributes, the smaller its AD becomes. We call a neuron with AD less than or equal to a threshold δ a dense neuron. We first combine neighboring dense neurons. Then we set the center vector of each cluster to the coordinates obtained by averaging the values of attributes of the combined dense neurons. The number of clusters equals that of the center vectors. The number of clusters and their center vectors are determined as follows.

STEP 1 Apply the input data to the SOM algorithm at random.
STEP 2 Find dense neurons.
1. Calculate the AD for each neuron. We denote the AD of the neuron i by d_i (i = 1, 2, ..., M); d_i is as follows:

d_i = \frac{1}{|N_i|} \sum_{j \in N_i} \| m_i - m_j \|,  (5)

where N_i is the set of neurons around the neuron i (and is different from N_c). If the neuron i is located at the edge of the SOM, then |N_i| equals 2 or 3. Otherwise, |N_i| equals 4, as shown in Fig. 13.5.
Fig. 13.5 Neuron i and the neurons around it.

2. Calculate the threshold δ by equation (6), where d_max (or d_min) is the maximum (or minimum) value among the ADs. Then find the dense neurons g_1, g_2, ..., g_s according to equation (7):

\delta = d_{\min} + 0.1 \times (d_{\max} - d_{\min}),  (6)

s = 1; for i = 1, ..., M: if d_i ≤ δ, then set the dense neuron g_s to i and s ← s + 1.  (7)

STEP 3 Combine the dense neurons as follows.
1. Let Γ = {G_1, G_2, ..., G_I} be a set of initial clusters. First set I to s. Namely, Γ = {G_1, G_2, ..., G_I} = {g_1, g_2, ..., g_s}.
2. If I < 2, then go to STEP 4.
3. Select two clusters G_p and G_q (p, q ≤ I). For the dense neurons in G_p and G_q, we calculate the value F(G_p, G_q) by the following equation:

F(G_p, G_q) = \max_{x \in G_p, \; y \in G_q} f(x, y),  (8)

where x (or y) is an element of G_p (or G_q), and we define f(x, y) as follows:

f(x, y) = \exp\left( -a \, \| m_x - m_y \|^{b} \right),  (9)

where a and b are constants; in this paper, we set a to 8 and b to 2. We calculate F(G_p, G_q) for all the possible pairs of two clusters. We call the maximum value of F(G_p, G_q) the degree of similarity ξ. Namely,

\xi = \max_{p, q \le I, \; p \ne q} F(G_p, G_q).  (10)

If ξ ≥ θ, then go to 4 of STEP 3 (from the experimental results, we set θ to 0.75). Otherwise, go to STEP 4.
4. Combine G_p and G_q and form the new cluster G_r. That is,

G_r = G_p \cup G_q.  (11)

Then I ← I − 1 and Γ ← Γ − G_p − G_q + G_r hold. Simultaneously, renumber the elements of Γ from G_1 to G_I in order. Go to 2 of STEP 3.
In this STEP, we form I clusters G_1, G_2, ..., G_I consisting of dense neurons.
STEP 4 Determine the center vector of each cluster and the membership degree of each neuron as follows.
1. We denote the center vector of each cluster G_r by m_center_r (r = 1, ..., I). Calculate m_center_r as follows:
m\_center_r = \frac{\sum_{i \in G_r} m_i}{|G_r|} = \fr{the sum of reference vectors of dense neurons in G_r} / {the number of dense neurons in G_r}.  (12)

2. Calculate the degree of similarity μ_{ri} between m_center_r and each neuron i (i = 1, ..., M) by the following equation:

\mu_{ri} = \{ f(m\_center_r, i) \}^{c}.  (13)

In this paper, we set c to 1.2.
3. Calculate the membership degree μ'_{ri} of each neuron i with respect to cluster r:

\mu'_{ri} = \frac{\mu_{ri}}{\sum_{r=1}^{I} \mu_{ri}}.  (14)
STEP 5 Determine the membership degree of each input datum as follows. For the input datum β (β ∈ X) and any neuron i, calculate the Euclidean distance between β and i, and find the neuron with the smallest distance. Set the membership degree of β to that of this neuron; β is regarded as an element of the cluster with this neuron.
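The dense-neuron steps above can be sketched directly from Eqs. (5)-(12). This is an illustrative toy, not the authors' implementation: the lattice size and reference vectors are random placeholders, and pairs are merged greedily one at a time by their best similarity.

```python
import numpy as np

def average_distances(m):
    """AD of Eq. (5): mean distance from each reference vector to its
    lattice neighbours (4 in the interior, 2 or 3 at the edges)."""
    rows, cols, _ = m.shape
    d = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            nbrs = [(i + a, j + b) for a, b in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= i + a < rows and 0 <= j + b < cols]
            d[i, j] = np.mean([np.linalg.norm(m[i, j] - m[p, q]) for p, q in nbrs])
    return d

def similarity(mx, my, a=8, b=2):
    """f of Eq. (9) with the chapter's constants a = 8, b = 2."""
    return np.exp(-a * np.linalg.norm(mx - my) ** b)

def cluster_dense_neurons(m, theta=0.75):
    d = average_distances(m)
    delta = d.min() + 0.1 * (d.max() - d.min())       # Eq. (6)
    dense = [tuple(p) for p in np.argwhere(d <= delta)]
    clusters = [[g] for g in dense]                   # Eq. (7)
    while len(clusters) >= 2:
        # Eqs. (8)-(10): pick the pair with the largest similarity.
        best = max(((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
                   key=lambda pq: max(similarity(m[x], m[y])
                                      for x in clusters[pq[0]]
                                      for y in clusters[pq[1]]))
        xi = max(similarity(m[x], m[y])
                 for x in clusters[best[0]] for y in clusters[best[1]])
        if xi < theta:
            break
        clusters[best[0]] += clusters.pop(best[1])    # Eq. (11)
    centers = [np.mean([m[g] for g in c], axis=0) for c in clusters]  # Eq. (12)
    return clusters, centers

m = np.random.default_rng(2).random((6, 6, 3))        # toy trained map
clusters, centers = cluster_dense_neurons(m)
print(len(clusters))
```

With θ = 0.75 and a = 8, b = 2, two dense neurons merge exactly when their reference vectors lie within distance about 0.19, as in Example 3 below.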
[Example 2] Fig. 13.6 shows an example of the ADs. If we set the threshold δ to 0.030, then seven neurons, numbered 5, 13, 22, 25, 40, 43, and 48, are selected as dense neurons. Fig. 13.7 shows a map layer in the SOM. The color of a neuron becomes deeper as its AD becomes smaller. The dense neurons are denoted by black ones.
Fig. 13.6 An example of the ADs.

Fig. 13.7 The dense neurons. □
[Example 3] Fig. 13.8 shows f(x, y) expressed by equation (9), where a = 8 and b = 2. With θ = 0.75, the dense neurons with degrees of similarity greater than or equal to 0.75 are combined into an identical cluster. Namely, ones whose reference vectors are at a distance less than or equal to 0.19 form a cluster. Fig. 13.9 shows the map layer after combining dense neurons. This clustering based on the SOM forms cluster 1 and cluster 2. The distance between the reference vectors of any two dense neurons in cluster 1 (or cluster 2) is less than or equal to 0.19. However, the distance between the reference vector of some dense neuron in cluster 1 and that of one in cluster 2 is more than 0.19.
Fig. 13.8 The degree of similarity curve f(x, y) = exp(−8 ||m_x − m_y||²), which takes the value 0.75 at a reference-vector distance of 0.19.

Fig. 13.9 The set of clusters (cluster 1 and cluster 2). □
13.3.2 Knowledge Discovery by Neural Network

The clustering based on the SOM in section 13.3.1 cannot discover the knowledge of how the clusters are derived. In order to discover this knowledge, we introduce a three-layered feedforward neural network. Our network learns the center vector of each cluster as teaching data. In the learning phase, we prune the neuron in the hidden layer whose removal yields the smallest sum of errors. This operation is repeated until we obtain a non-converged network. We define the degree of contribution concerned with each attribute from the weights of neurons in the last
converged network, and then we discover the knowledge from the degrees. We show each step of the knowledge discovery below.

STEP 1 Let N be the number of attributes and I be the number of clusters. Fig. 13.10 shows a three-layered feedforward neural network with N input neurons, I output neurons, and J hidden neurons,
Fig. 13.10 The structure of the neural network: N input neurons, J hidden neurons, and I output neurons, with weights W^{h,in}_{jk} between the input and hidden layers and W^{out,h}_{ij} between the hidden and output layers.
where W^{h,in}_{jk} is the connection weight from the kth neuron in the input layer to the jth neuron in the hidden layer, and W^{out,h}_{ij} is the one from the jth neuron in the hidden layer to the ith neuron in the output layer.

STEP 2 Our network learns the center vector of each cluster as teaching data. The learning proceeds so that each output value takes 1 when the center vector of the corresponding cluster is applied to the input. We preserve the state of convergence.

STEP 3 In the hidden layer, when removing one neuron j (j = 1, ..., J) from the converged network, we calculate the sum of errors, namely Σ((teaching data) - (output of the network))^2, over all input data. We prune the neuron in the hidden layer whose removal gives the smallest sum of errors, and set J to J - 1.

STEP 4 If the network cannot converge, then stop the learning in the last
state of convergence. Otherwise, go to STEP 3. Namely, STEP 3 is repeated until a non-converged network is obtained. This is done so that "the degree of contribution" defined below can be obtained clearly.

STEP 5 For the last converged network, we calculate the values I_ik from the kth neuron in the input layer to the ith neuron in the output layer:

I_ik = Σ_{j=1}^{J} W^{out,h}_{ij} × W^{h,in}_{jk},

where J is the number of neurons remaining in the hidden layer.

STEP 6 We define I_ik as the degree of contribution that the kth attribute gives to the ith cluster. When the absolute value of I_ik is larger than the threshold thr, we can find a remarkable characteristic of the ith cluster concerned with the kth attribute. In this way, we employ the property of this attribute as the following knowledge:

If I_ik > thr, then the knowledge the kth attribute gives to the ith cluster is "big".
If I_ik < -thr, then the knowledge the kth attribute gives to the ith cluster is "small".
Otherwise, the knowledge the kth attribute gives to the ith cluster is "neither small nor big".

In other words, a large positive value of I_ik above the threshold thr is represented by "big" for the knowledge of the kth attribute of the ith cluster, and a large negative value of I_ik below -thr is represented by "small"; any other value is represented by "neither small nor big". The threshold thr is determined as half the standard deviation of the absolute values of I_ik.
[Example 4] When the number of clusters is three, the number of outputs equals three. As shown in Table 13.1, we make the teaching data so that each output value takes 1 when the center vector of the corresponding cluster is applied to the input.
Table 13.1 Teaching data.

output (cluster)             First  Second  Third
teaching data 1 (cluster 1)    1      0       0
teaching data 2 (cluster 2)    0      1       0
teaching data 3 (cluster 3)    0      0       1
Assume that the network has converged as shown in Fig. 13.11. The degree of contribution that the second attribute gives to the first cluster, I_12, is calculated from the weights as

I_12 = 6 × (-4) + (-2) × 5 = -34.        (16)
Fig. 13.11 The converged network (input neurons labeled by attribute, output neurons by cluster).
[Example 5] Assume that the degrees of contribution are calculated as shown in Table 13.2. Then the threshold thr equals 36.5. Table 13.3 shows the knowledge discovered from the degrees.
Table 13.2 The degrees of contribution.

cluster      First    Second   Third
cluster 1    11.87     6.31   -59.2
cluster 2    52.1      2.39     9.18
cluster 3   -44.56    37.7      1.01

Table 13.3 The knowledge discovered from Table 13.2 (thr = 36.5).

cluster      First   Second   Third
cluster 1     -       -       small
cluster 2    big      -        -
cluster 3    small   big       -
The knowledge concerned with the first cluster is "the third attribute is small", that concerned with the second cluster is "the first attribute is big", and that concerned with the third cluster is "the first attribute is small and the second attribute is big". The cells marked by "-" in Table 13.3 show that the discovered knowledge is "neither small nor big".
13.4 Experimental Results
13.4.1 The Estimation of the Number of Clusters

We applied the clustering based on the SOM to five artificially synthesized clusters with 234 data specified by two attributes (x, y), shown in Fig. 13.12 and generated with the Neyman-Scott method. The size of the SOM is 20 rows and 20 columns, and hence the number of neurons is 400. The number of learning iterations in the SOM was 400,000. Fig. 13.13 shows the result of clustering. The number of clusters was five, which shows that our method has a fine ability to determine the number of clusters without teaching data. Moreover, we can find the center vector of each cluster. Table 13.4 shows the experimental results of applying the clustering based on the SOM to the data of Fig. 13.12 ten times under the thresholds θ = 0.7, 0.75, and 0.8. In the case of θ = 0.75, the number of clusters was always five, and hence 0.75 is an appropriate value for θ.
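Data of this kind can be regenerated along the lines of the Neyman-Scott method (a rough sketch under our own assumptions: parent centers drawn uniformly in the unit square, Gaussian offspring around each center, and 47 points per cluster rather than the 234 total used in the text):

```python
import numpy as np

def neyman_scott(n_clusters=5, points_per_cluster=47, spread=0.05, seed=0):
    # Parent points (cluster centers) uniform in the unit square;
    # offspring scattered around each parent with a small Gaussian spread.
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.1, 0.9, size=(n_clusters, 2))
    pts = [rng.normal(c, spread, size=(points_per_cluster, 2)) for c in centers]
    return np.vstack(pts), centers

data, centers = neyman_scott()
print(data.shape)   # -> (235, 2)
```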
Fig. 13.12 Artificial data: the 234 sample data plotted in the (x, y) plane, with both coordinates ranging from about -0.2 to 1.4.
Fig. 13.13 The result of clustering employing the SOM: clusters 1 to 5 plotted with distinct symbols together with the center vectors.
Table 13.4 The number of clusters for varying threshold θ.

trial    θ = 0.8   θ = 0.75   θ = 0.7
1           5         5          5
2           5         5          4
3           6         5          4
4           5         5          4
5           5         5          5
6           5         5          3
7           5         5          5
8           5         5          5
9           5         5          5
10          5         5          5
13.4.2 The Result on Discovery of Knowledge

We applied our method to a car database with 200 data specified by three attributes (Price, Engine-size, and Width). After the clustering, the number of clusters became three, so we prepared a neural network with three input neurons, three output neurons, and an adequate number of hidden neurons. Tables 13.5, 13.6, 13.7, 13.8, and 13.9 show the center vector of each cluster, the teaching data, a part of the clustering result, the degrees of contribution, and the knowledge discovered from the degrees, respectively. The size of the SOM used before discovering the knowledge was 20 rows and 20 columns, and the number of learning iterations in the SOM was 400,000. We set the target sum of errors in the neural network to 0.0004.

Table 13.5 The center vector of each cluster.

cluster      Price($)        Engine-size(ci)   Width(inch)
cluster 1    16434.9(0.28)   186.7(0.68)       68.4(0.69)
cluster 2    27420.1(0.55)   189(0.71)         66.9(0.56)
cluster 3     9542.6(0.11)   172.1(0.46)       65.4(0.44)

( ): fuzzy membership values

Table 13.6 Teaching data.

center       First   Second   Third
cluster 1      1       0        0
cluster 2      0       1        0
cluster 3      0       0        1
Table 13.7 A part of the clustering result.

cluster      Price($)   Engine-size(ci)   Width(inch)
cluster 1    19045      188.8             68.8
             21485      188.8             68.9
             22470      188.8             68.9
             22625      188.8             68.9
             20000      188.8             68.9
cluster 2    24565      189               66.9
             30760      189               66.9
             41315      193.8             67.9
             36880      197               70.9
             32250      199.6             69.6
cluster 3    10295      175.4             62.5
             12945      175.4             65.2
             10345      169.1             66
              6785      170.7             61.8
             11048      172.6             65.2
Table 13.8 The degrees of contribution.

cluster      Price($)   Engine-size(ci)   Width(inch)
cluster 1     32.46      39.74            202.07
cluster 2    188.9       47.1              71.26
cluster 3    121.84      63.18             84.23

Table 13.9 The discovered knowledge (thr = 153.4).

cluster      Price($)   Engine-size(ci)   Width(inch)
cluster 1      -            -              big
cluster 2     big           -               -
cluster 3      -            -               -
In this result, the threshold thr was estimated at 153.4. As a result, the knowledge discovered for the first cluster was "the width is big", and that for the second cluster was "the price is big".

Finally, we applied our method to a car database with 200 data specified by six attributes (Price, Engine-size, Peak-rpm, Length, Width, and Height). After the clustering, the number of clusters became five, so we prepared a neural network with six input neurons, five output neurons, and an adequate number of hidden neurons. Tables 13.10, 13.11, 13.12, 13.13, and 13.14 show the center vector of each cluster, the teaching data, a part of the clustering result, the degrees of contribution, and the knowledge discovered from the degrees, respectively.

Table 13.10 The center vector of each cluster.
cluster      Price($)      Engine-size(cubic inch)   Peak-rpm(times)   Length(inch)   Width(inch)   Height(inch)
cluster 1    18420(0.33)   130(0.26)                 5100(0.39)        188.8(0.71)    67.2(0.59)    56.2(0.70)
cluster 2     7738(0.07)    98(0.14)                 4800(0.27)        166.3(0.38)    64.4(0.35)    53(0.43)
cluster 3     8013(0.07)   108(0.18)                 4800(0.27)        173.6(0.49)    65.4(0.44)    54.9(0.59)
cluster 4    12945(0.19)   110(0.18)                 5800(0.67)        175.4(0.51)    65.2(0.42)    54.1(0.53)
cluster 5     6377(0.03)    90(0.11)                 5500(0.55)        157.3(0.24)    63.8(0.30)    50.8(0.25)

( ): fuzzy membership values

Table 13.11 Teaching data.

center       First   Second   Third   Fourth   Fifth
cluster 1      1       0        0       0        0
cluster 2      0       1        0       0        0
cluster 3      0       0        1       0        0
cluster 4      0       0        0       1        0
cluster 5      0       0        0       0        1
Table 13.12 A part of the clustering result.

cluster      Price($)   Engine-size(cubic inch)   Peak-rpm(times)   Length(inch)   Width(inch)   Height(inch)
cluster 1    28248      183                       4350              190.9          70.3          58.7
             28176      183                       4350              187.5          70.3          54.9
             31600      183                       4350              202.6          71.7          56.3
             34184      234                       4750              202.6          71.7          56.5
             40960      308                       4500              208.1          71.7          56.7
cluster 2    13495      130                       5000              168.8          64.1          48.8
             16500      130                       5000              168.8          64.1          48.8
             12964      156                       5000              173.2          66.3          50.2
              6785      111                       4800              170.7          61.8          53.5
             11048      119                       5000              172.6          65.2          51.4
cluster 3     7995       97                       4800              171.7          65.5          55.7
              8195      109                       5250              171.7          65.5          55.7
              8495      109                       5250              171.7          65.5          55.7
              9495       97                       4500              171.7          65.5          55.7
             13845       97                       4500              180.2          66.9          55.1
cluster 4    16430      108                       5800              176.8          64.8          54.3
             16925      108                       5800              176.8          64.8          54.3
              7295       92                       6000              163.4          64            54.5
              7295       92                       6000              157.1          63.9          58.3
              7895      110                       5800              167.5          65.2          53.3
cluster 5    15645       80                       6000              169            65.7          49.6
              7609       90                       5500              157.3          63.8          50.6
              8558       98                       5500              157.3          63.8          50.6
              6479       92                       4800              144.6          63.9          50.8
              6855       92                       6000              144.6          63.9          50.8
Table 13.13 The degrees of contribution.

cluster      Price($)   Engine-size(cubic inch)   Peak-rpm(times)   Length(inch)   Width(inch)   Height(inch)
cluster 1    177.51      70.1                       81.59            263.66         131.46        244.81
cluster 2     30.93      10.65                    -118.51             65.54          56.51         98.34
cluster 3     87.09      57.18                    -216.47            142.04          76.86        290.16
cluster 4     79.26      13.21                     108.63             83.48          55.46         87.23
cluster 5     68.85      37.75                      89.36           -128.8           60.51       -133.76
Table 13.14 The discovered knowledge (thr = 108.2).

cluster      Price($)   Engine-size(cubic inch)   Peak-rpm(times)   Length(inch)   Width(inch)   Height(inch)
cluster 1     big           -                         -                big            big           big
cluster 2      -            -                       small               -              -             -
cluster 3      -            -                       small              big             -            big
cluster 4      -            -                        big                -              -             -
cluster 5      -            -                         -               small            -           small
The size of the SOM used before discovering the knowledge was 20 rows and 20 columns, and the number of learning iterations in the SOM was 400,000. We set the target sum of errors in the neural network to 0.0004. In this result, the threshold thr was estimated at 108.2. As a result, the knowledge discovered for the first cluster was "price, length, width, and height are big", that for the second cluster was "peak-rpm is small", that for the third cluster was "length and height are big and peak-rpm is small", that for the fourth cluster was "peak-rpm is big", and that for the fifth cluster was "length and height are small". The cells marked by "-" in Tables 13.9 and 13.14 show that the discovered knowledge is "neither small nor big". In this way, we could estimate the number of clusters for two car databases and extract the knowledge about each cluster.

13.5 Conclusions

In this paper, we proposed a method useful for discovering the reasons why clusters are formed. We introduced the SOM into an unsupervised clustering algorithm, which could determine the center vector of each cluster and estimate the number of clusters. We used a three-layered feedforward neural network for discovering the knowledge. From the neural network that learns the center vectors, we calculate the degree of contribution, which is equivalent to the sum of products of the weights of the network. If the absolute value of the degree of contribution is larger than a threshold, then we employ the attribute fed to the corresponding neuron in the input layer as the knowledge. The experimental results showed that our method has the ability to discover the knowledge of how a cluster is derived. The degree was an efficient indicator to present the
knowledge. Our method depends on the threshold used to derive the knowledge; finding this threshold automatically remains a direction for future work.
References
[1] P. Cheeseman and J. Stutz, "Bayesian Classification (AutoClass): Theory and Results," in Advances in Knowledge Discovery and Data Mining, AAAI Press / The MIT Press, Chapter 6, pp. 153-180, 1996.
[2] H. Asada, "Clustering," in The Dictionary of Artificial Intelligence (in Japanese), Maruzen, 1993.
[3] M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.
[4] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[5] T. Kohonen, Self-Organization and Associative Memory, Third Edition, Springer-Verlag, Berlin, 1989.
[6] T. Kohonen, "The Self-Organizing Map," Proc. IEEE, vol. 78, no. 9, pp. 1464-1480, 1990.
[7] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, 1995.
[8] T. Kikuchi, T. Matuoka, T. Takeda, and K. Kishi, "Automatic Classification by a Competitive Learning Neural Network," IEICE Trans., D-II, Vol. J78-D-II, No. 10, pp. 1543-1547, 1995.
[9] M. Tanaka, Y. Furukawa, and T. Tanino, "Clustering by Using Self-Organizing Map," IEICE Trans., D-II, Vol. J79-D-II, No. 2, pp. 301-304, 1996.
[10] M. Terashima, F. Shiratani, and K. Yamamoto, "Unsupervised Cluster Segmentation Method Using Data Density Histogram on Self-Organizing Feature Map," IEICE Trans., D-II, Vol. J79-D-II, No. 7, pp. 1280-1290, 1996.
[11] R. Ito, T. Shida, and T. Kindo, "Competitive Models for Unsupervised Clustering," IEICE Trans., D-II, Vol. J79-D-II, No. 8, pp. 1390-1400, 1996.
[12] T. Hozumi, N. Kamiura, Y. Hata, and K. Yamato, "Multiple-Valued Logic Design Based on Gate Model Networks," Multiple-Valued Logic: An International Journal, vol. 3, no. 1, pp. 1-20, 1998.
[13] K. Nakagawa, N. Kamiura, and Y. Hata, "Knowledge Discovery Using Fuzzy C-Means and Neural Network," Proc. of the 5th International Conference on Soft Computing and Information / Intelligent Systems, Vol. 2, pp. 915-918, Iizuka, Japan, Oct. 1998.
[14] R. Setiono, "A Penalty-Function Approach for Pruning Feedforward Neural Networks," Neural Computation, Vol. 9, No. 1, pp. 185-204, 1995.
[15] R. Setiono, "Extracting Rules from Neural Networks by Pruning and Hidden-Unit Splitting," Neural Computation, Vol. 9, No. 1, pp. 205-225, 1995.
[16] T. Kato and K. Ozawa, "Non-hierarchical Clustering by Genetic Algorithm" (in Japanese), Information Processing Society of Japan, Vol. 37, No. 11, pp. 1950-1959, 1996.
Chapter 14 Probabilistic Rough Induction
Juzhen Dong¹, Ning Zhong¹, Setsuo Ohsuga²

¹ Maebashi Institute of Technology
² Waseda University
Abstract

In this paper, we propose a soft approach called GDT-RS for rule discovery in databases with uncertainty and incompleteness. The approach is based on the combination of the Generalization Distribution Table (GDT) and the Rough Set methodology. A GDT is a table in which the probabilistic relationships between concepts and instances over discrete domains are represented. The GDT provides a probabilistic basis for evaluating the strength of a rule. Furthermore, the rough set methodology is used to find minimal relative reducts from the set of rules with larger strengths. The main features of our approach are: (1) biases can be flexibly selected for search control, and background knowledge can be used as a bias to control the creation of a GDT and the rule discovery process; (2) the uncertainty of a rule, including its ability to predict possible instances, can be explicitly represented in the strength of the rule.

Keywords: inductive learning, knowledge discovery, Generalization Distribution Table (GDT), rough sets, uncertainty and incompleteness, background knowledge, soft computing, hybrid system.
14.1 Introduction

Inductive learning is a major way to discover classification rules from databases. Various methods have been proposed [18; 16; 7; 22; 12; 13]. According to the value of information, these methods can be divided into two types. The first type is based on the formal value of information; that is, the real meaning of data is not considered in the discovery process. ID3 is a typical method of this type [16]. Although rules can be discovered
by using the method, it is difficult to use background knowledge in the discovery process. The other type of inductive method is based on the semantic value of information; that is, the real meaning of data must be considered by using some background knowledge in the discovery process. Dblearn is a typical method of this type [7]. It can discover rules by means of background knowledge represented by concept hierarchies, but if there is no background knowledge, it can do nothing. The question is "how can both the formal value and the semantic value be considered in a rule discovery system?"
Unfortunately, so far we have not seen any inductive method that can consider both the formal value and the semantic value of information. We argue that an ideal rule discovery system should have such a feature: on one hand, background knowledge can be used flexibly in the discovery process; on the other hand, if no background knowledge is available, the system can still work. Another issue that was not addressed in previous inductive methods is "how can unseen instances be predicted, and how can the uncertainty of a rule, including the prediction, be represented explicitly?" Since most previous inductive methods are based on the closed-world assumption, they only consider the instances that have been collected in a database, because no way had been found to know or guess the instances describing a concept that have never been observed before. In this paper, we propose a soft approach called GDT-RS, based on the combination of the Generalization Distribution Table (GDT) and the Rough Set methodology, for discovering classification rules hidden in data with uncertainty and incompleteness. The main features of GDT-RS are:

• Biases can be flexibly selected for search control, and background knowledge can be used as a bias to control the creation of a GDT and the rule discovery process;

• Unseen instances are considered in the rule discovery process, and the uncertainty of a rule, including its ability to predict unseen instances, can be explicitly represented in the strength of the rule.
We first give the definition of a Generalization Distribution Table (GDT) and describe some basic concepts of the GDT methodology. Then we outline the rough set methodology. Furthermore, we explain how to combine the GDT with the rough set methodology for discovering classification rules from databases with uncertainty and incompleteness. We focus on basic concepts, principles, and two algorithms implementing our methodology, and describe the experimental results.
14.2 The GDT Methodology

The central idea of our methodology is to use a variant of a transition matrix, called the Generalization Distribution Table (GDT), as a hypothesis search space for generalization. In a GDT, the probabilistic relationships between concepts and instances over discrete domains are represented [25; 26]. This section describes the basic concepts and principles of the GDT methodology.

14.2.1 GDT
We define a GDT as consisting of three components: possible instances, possible generalizations for instances, and probabilistic relationships between possible instances and possible generalizations. The possible instances, which are represented in the top row of a GDT, are all possible combinations of attribute values in a database. The number of possible instances is ∏_{i=1}^{m} n_i, where m is the number of attributes and n_i is the number of different attribute values of attribute i. The possible generalizations for instances, which are represented in the left column of a GDT, are all possible cases of generalization for all possible instances. "*", which specifies a wild card, denotes the generalization for instances (for simplicity, we omit the wild card in some places in this paper). For example, the generalization *b0c1 means that attribute a is unimportant for describing a concept. The number of possible generalizations is ∏_{i=1}^{m} (n_i + 1) - ∏_{i=1}^{m} n_i - 1.

The probabilistic relationships between the possible instances and the possible generalizations, which are represented in the elements G_ij of a GDT, form the probabilistic distribution describing the strength of the relationship between every possible instance and every possible generalization. The prior distribution is equiprobable if no prior background knowledge is used. Thus it is defined by Eq. (1), with Σ_j G_ij = 1:

G_ij = p(PI_j | PG_i) = 1/N_{PG_i}   if PI_j ∈ PG_i
                      = 0            otherwise,        (1)

where PI_j is the jth possible instance, PG_i is the ith possible generalization, and N_{PG_i} is the number of possible instances satisfying the ith possible generalization, that is,
N_{PG_i} = ∏_{k ∈ {l | PG_i[l] = *}} n_k,        (2)

where PG_i[l] is the value of the lth attribute in the possible generalization PG_i, and PG_i[l] = * means that PG_i does not contain attribute l. Furthermore, for convenience, letting E = ∏_{k=1}^{m} n_k, Eq. (1) can be changed into the following form,
G_ij = p(PI_j | PG_i) = (∏_{k ∈ {l | PG_i[l] ≠ *}} n_k) / E   if PI_j ∈ PG_i
                      = 0                                     otherwise,        (3)

because

N_{PG_i} = ∏_{k ∈ {l | PG_i[l] = *}} n_k = (∏_{k=1}^{m} n_k) / (∏_{k ∈ {l | PG_i[l] ≠ *}} n_k) = E / (∏_{k ∈ {l | PG_i[l] ≠ *}} n_k).

Since E is a constant for a given database, the prior distribution p(PI_j | PG_i) is directly proportional to the product of the numbers of values of the attributes contained in PG_i.
Table 14.1 A GDT generated from the sample database shown in Table 14.2. (Its left column lists the possible generalizations *b0c0, *b0c1, ..., a1**; its top row lists the twelve possible instances a0b0c0, ..., a1b2c1. Each entry G_ij is the prior 1/N_{PG_i}: 1/2 or 1/3 for generalizations fixing two attributes, 1/4 or 1/6 for generalizations fixing one attribute, and 0 for instances not covered.)
Table 14.2 A sample database.

U     a    b    c    d
u1    a0   b0   c1   y
u2    a0   b1   c1   y
u3    a0   b0   c1   y
u4    a1   b1   c0   n
u5    a0   b0   c1   n
u6    a0   b2   c1   n
u7    a1   b1   c1   y
Thus, in our approach, the basic process of hypothesis generation is to generalize the instances observed in a database by searching and revising the GDT. Here, two kinds of attributes need to be distinguished in a database: condition attributes and decision attributes (sometimes called class attributes). The condition attributes, as possible instances, are used to create the GDT, but the decision attributes are not. The decision attributes are normally used to decide which concept (class) should be described in a rule. Usually a single decision attribute is all that is required. Table 14.1 is an example of a GDT, generated by using the three condition attributes a, b, and c in the sample database shown in Table 14.2, with a = {a0, a1}, b = {b0, b1, b2}, and c = {c0, c1}. For example, the real meanings of these condition attributes can respectively be assigned as Weather,
Temperature, and Humidity in a weather forecast database, or Temperature, Cough, and Headache in a medical diagnosis database. Attribute d in Table 14.2 is used as a decision attribute. For example, the real meaning of the decision attribute can be assigned as Wind or Flu, corresponding to the assigned condition attributes.
14.2.2 Biases

Since our approach is based on the GDT, rule discovery can be constrained by three types of biases corresponding to the three components of the GDT.

The first type of bias is related to the possible generalizations in a GDT. It is used to decide which concept description should be considered first. To obtain the best concept descriptions, all possible generalizations should be considered, but not all of them need to be considered at the same time. Possible generalizations (concept descriptions) are divided into several levels of generalization according to the number of wild cards in a generalization: the greater the number of wild cards, the higher the level. For example, all possible generalizations shown in Table 14.1 are divided into two levels of generalization:

Level 1: {*b0c0, *b0c1, ..., a1b2*}
Level 2: {**c0, **c1, ..., a1**}.

Thus, it is clear that any generalization in a lower level is properly contained by one or more generalizations in an upper level. As the default, our approach prefers more general concept descriptions in an upper level to more specific ones in a lower level. However, if necessary, a meta control can be used to alter the bias so that more specific descriptions are preferred to more general ones.

The second type of bias is related to the probability values denoted by G_ij in a GDT. It is used to adjust the strength of the relationship between an instance and a generalization. If no prior background knowledge is available as a bias, then by default the occurrences of all possible instances are equiprobable, as shown in Table 14.1. However, a bias such as background knowledge can be used during the creation of a GDT. The distributions will then be dynamically updated according to the real data in a database and will be unequiprobable.

The third type of bias is related to the possible instances in a GDT.
In our approach, the strength of the relationship between every possible instance and every possible generalization depends to a certain extent on
how the possible instances are defined and selected. Furthermore, background knowledge can be used as a bias to constrain the possible instances and the prior distributions. For example, if the background knowledge "when the air temperature is very high, it is not possible that there exists some frost at ground level" is used to learn rules from an earthquake database containing attributes such as the air temperature, frost at ground level, two centimeters below ground level, the atmospheric pressure, etc., then we do not consider the possible instances that contradict this background knowledge. Thus, more refined results may be obtained by using background knowledge in the discovery process.

14.2.3 Adjusting the Prior Distribution by Background Knowledge
One of the main features of the GDT methodology is that biases can be selected flexibly for search control, and background knowledge can be used as a bias to control the creation of a GDT and the rule discovery process. This section explains how to use background knowledge as a bias to adjust the prior distribution.

As stated in Section 14.2, when no prior background knowledge is available as a bias, by default the occurrence of all possible instances is equiprobable, and the prior distribution is given by Eq. (1). However, the prior distribution can be adjusted by background knowledge, and it will be unequiprobable after the adjustment. Generally speaking, a piece of background knowledge can be given in the form

a_{i1 j1} ⇒ a_{i2 j2},  Q,

where a_{i1 j1} is the j1th value of attribute i1, and a_{i2 j2} is the j2th value of attribute i2. a_{i1 j1} is called the premise of the background knowledge, a_{i2 j2} is called its conclusion, and Q is called its strength: Q is the probability of occurrence of a_{i2 j2} when a_{i1 j1} occurs. Q = 0 means that "a_{i1 j1} and a_{i2 j2} never occur together"; Q = 1 means that "a_{i1 j1} and a_{i2 j2} always occur at the same time"; while Q = 1/n_{i2} means that the occurrence of a_{i2 j2} is the same
as in the case without background knowledge, where n_{i2} is the number of values of attribute i2. For each instance PI (or each generalization PG), let PI[i] (or PG[i]) denote the entry of PI (or PG) corresponding to attribute i. For each generalization PG such that PG[i1] = a_{i1 j1} and PG[i2] = *, the prior distribution between PG and the related instances will be adjusted. The probability of occurrence of the attribute value a_{i2 j2} is changed from 1/n_{i2} to Q by the background knowledge, so that, for each of the other values of attribute i2, the probability of its occurrence is changed from 1/n_{i2} to (1 - Q)/(n_{i2} - 1). Let the adjusted prior distribution be denoted by p_bk. The prior distribution adjusted by the background knowledge "a_{i1 j1} ⇒ a_{i2 j2}, Q" is
p_bk(PI|PG) = p(PI|PG) × Q × n_{i2}                       if PG[i1] = a_{i1 j1}, PG[i2] = *, PI[i2] = a_{i2 j2}
            = p(PI|PG) × (1 - Q)/(n_{i2} - 1) × n_{i2}    if PG[i1] = a_{i1 j1}, PG[i2] = *, ∃j (1 ≤ j ≤ n_{i2}, j ≠ j2) PI[i2] = a_{i2 j}
            = p(PI|PG)                                    otherwise,        (4)

where the coefficients of p(PI|PG), namely Q × n_{i2}, (1 - Q)/(n_{i2} - 1) × n_{i2}, and 1, are called the adjusting factor (AF for short) with respect to the background knowledge "a_{i1 j1} ⇒ a_{i2 j2}, Q". They explicitly represent the influence of a piece of background knowledge on the prior distribution. Hence, the adjusted prior distribution can be denoted by
p_bk(PI|PG) = p(PI|PG) × AF(PI|PG),        (5)

and the AF is

AF(PI|PG) = Q × n_{i2}                       if PG[i1] = a_{i1 j1}, PG[i2] = *, PI[i2] = a_{i2 j2}
          = (1 - Q)/(n_{i2} - 1) × n_{i2}    if PG[i1] = a_{i1 j1}, PG[i2] = *, ∃j (1 ≤ j ≤ n_{i2}, j ≠ j2) PI[i2] = a_{i2 j}
          = 1                                otherwise.        (6)
So far, we have explained how the prior distribution is influenced by only one piece of background knowledge. We now consider the case in which there are several pieces of background knowledge, such that for each i (1 ≤ i ≤ m) and each j (1 ≤ j ≤ n_i) there is at most one piece of background knowledge with a_{ij} as its conclusion. Let S be the set of all pieces of background knowledge to be considered. For each generalization PG, let

B[S, PG] = { i ∈ {1, ..., m} | ∃i1 (1 ≤ i1 ≤ m) ∃j1 (1 ≤ j1 ≤ n_{i1}) ∃j (1 ≤ j ≤ n_i) [ (there is a piece of background knowledge in S with a_{i1 j1} as its premise and with a_{ij} as its conclusion) & PG[i1] = a_{i1 j1} & PG[i] = * ] },

and, for each i ∈ B[S, PG], let

J[S, PG, i] = { j ∈ {1, ..., n_i} | ∃i1 (1 ≤ i1 ≤ m) ∃j1 (1 ≤ j1 ≤ n_{i1}) [ (there is a piece of background knowledge in S with a_{i1 j1} as its premise and with a_{ij} as its conclusion) & PG[i1] = a_{i1 j1} & PG[i] = * ] }.

Then we must use the following adjusting factor AF_S with respect to all pieces of background knowledge:
AF_S(PI|PG) = ∏_{i=1}^{m} AF_i(PI|PG),     (7)

where

AF_i(PI|PG) =
  { Q_{ij} × n_i,                                                       if i ∈ B[S, PG], PI[i] = a_{ij}, j ∈ J[S, PG, i]
  { ((1 − Σ_{j ∈ J[S,PG,i]} Q_{ij}) / (n_i − |J[S, PG, i]|)) × n_i,     if i ∈ B[S, PG], ∀j (j ∈ J[S, PG, i]) [PI[i] ≠ a_{ij}]
  { 1,                                                                  otherwise                                       (8)
where for each i (1 ≤ i ≤ m) and each j (1 ≤ j ≤ n_i), Q_{ij} denotes the strength of the background knowledge (in S) with a_{ij} as its conclusion. Although Q can in principle be any value from 0 to 1, giving an exact
J. Dong, N. Zhong & S. Ohsuga
value of Q is difficult, and the more background knowledge there is, the more difficult it becomes to calculate the prior distribution. Hence, in practice, if "a_{i1,j1} => a_{i2,j2}" holds with high possibility, we treat Q as 1; that is, a_{i2,j2} occurs but the other values of attribute i2 do not, when a_{i1,j1} occurs. In contrast, if "a_{i1,j1} => a_{i2,j2}" holds with low possibility, we treat Q as 0; that is, a_{i2,j2} does not occur but the other values of attribute i2 occur equiprobably, when a_{i1,j1} occurs. Furthermore, if there are several pieces of background knowledge with high possibility whose conclusions belong to the same attribute i2, all of these attribute values (conclusions) are treated as occurring equiprobably, and the other values of attribute i2 are treated as not occurring.
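The combination of Eqs. (7) and (8) can be sketched as follows. Again this is our own illustrative sketch, not the chapter's implementation: the tuple encoding of background knowledge and the attribute names are assumptions.

```python
def af_multi(PI, PG, S, n):
    """AF_S(PI|PG) of Eqs. (7)-(8): product over attributes of the
    per-attribute factors AF_i.  Each piece of background knowledge in
    S is a tuple (i1, a1, i, a, Q), with at most one piece per
    concluded value a of attribute i; n[i] is the number of values."""
    af = 1.0
    for i in n:                                    # all attributes of the table
        if PG.get(i) != '*':
            continue                               # i not generalized: AF_i = 1
        # Values of attribute i predicted by knowledge applicable to PG
        # (this plays the role of J[S, PG, i]).
        J = {a: Q for (i1, a1, i2, a, Q) in S
             if i2 == i and PG.get(i1) == a1}
        if not J:
            continue                               # i not in B[S, PG]
        if PI[i] in J:
            af *= J[PI[i]] * n[i]                  # a predicted value occurred
        else:                                      # none of the predicted values
            af *= (1 - sum(J.values())) / (n[i] - len(J)) * n[i]
    return af

# With a single piece of knowledge this reduces to Eq. (6):
n = {'i1': 2, 'i2': 3}
S = [('i1', 'x', 'i2', 'u', 0.8)]
PG = {'i1': 'x', 'i2': '*'}
print(round(af_multi({'i1': 'x', 'i2': 'u'}, PG, S, n), 6))  # 2.4
print(round(af_multi({'i1': 'x', 'i2': 'v'}, PG, S, n), 6))  # 0.3
```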
14.2.4 Rule Strength and Unseen Instances
In our approach, learned rules are typically expressed as X → Y with S, that is, "if X then Y with strength S", where X denotes the conjunction of the conditions that a concept must satisfy, Y denotes the concept that the rule describes, and S is a "measure of strength" with which the rule holds. Concretely, X is a conjunction of attribute values of some condition attributes, which corresponds to a generalization, and Y is a conjunction of attribute values of some decision attributes. Below, we often identify a generalization with the conjunction of the attribute values of the attributes in the generalization. X, Y, and S are called the condition, conclusion, and strength, respectively, of the rule X → Y with S. The strength S of a rule X → Y is defined as follows:
S(X → Y) = s(X) × (1 − r(X → Y)),     (9)
where s(X) is the strength of the generalization X (i.e., the condition of the rule), which is defined below, and r(X → Y) is the rate of noise (of rule X → Y), which is defined by Eq. (12) below. In other words, the strength of a rule represents the incompleteness and uncertainty of the rule, which is influenced by both unseen instances and noise. The strength of the generalization X, s(X), is defined as the sum of the prior distributions between X and the observed instances satisfying X. It represents how many of the possible instances satisfying the generalization X are observed in the database. The initial value of s(X) is 0. The value
will be dynamically updated according to the real data in a database. If all the possible instances satisfying the generalization X occur in the database, the strength takes its maximal value, 1. Letting X = PG, the strength of the generalization PG is given by Eq. (10) when the prior distribution is equiprobable, or by Eq. (11) when the prior distribution is unequiprobable (i.e., when background knowledge is used):
s(PG) = Σ_l p(PI_l|PG) = N_{ins-rel}(PG) × 1/N_PG,     (10)

s(PG) = Σ_l p_bk(PI_l|PG) = (Σ_l AF_S(PI_l|PG)) × 1/N_PG,     (11)
where the PI_l are the observed instances in a database, and N_{ins-rel}(PG) is the number of observed instances satisfying the generalization PG. The strength of the generalization X explicitly represents the prediction for unseen instances. It merits our attention that Eq. (10) and Eq. (11) are not suitable for duplicate instances; hence duplicate instances should be handled before the equations are used. We argue that the prediction for unseen instances is an important function for discovering rules in real-world databases. In most cases, the set of instances collected in a database represents only a part of all possible instances. This is reasonable, because we expect to learn rules without first collecting every possible instance (like physicians who learn how to diagnose diseases without first having seen every possible patient). Table 14.3 shows an example on Flu. We can see that only a part of the symptoms of a disease related to Headache, Temperature, and Muscle-pain can be found in the database, but several possible symptoms (unseen instances) such as

Headache(yes) ∧ Temperature(normal) ∧ Muscle-pain(no);
Headache(yes) ∧ Temperature(high) ∧ Muscle-pain(no);
Headache(yes) ∧ Temperature(very-high) ∧ Muscle-pain(no);
have not been collected yet. This means that the learning task is ill-posed if the possible symptoms are not considered in the learning process. For previous inductive approaches, without some other source of constraint,
Table 14.3 A sample database (decision table) on Flu

  U    Headache   Temperature   Muscle-pain   Flu
  u1   yes        normal        yes           no
  u2   yes        high          yes           yes
  u3   yes        very-high     yes           yes
  u4   no         normal        yes           no
  u5   no         high          no            no
  u6   no         very-high     yes           yes
there is no way to know instances describing a concept that has never before been observed. Our approach based on the Generalization Distribution Table provides a possibility of predicting unseen instances and of explicitly representing the strength of a rule, including this prediction. In other words, our approach tries to find the descriptions of concepts not only from the instances observed during learning but also from unseen instances. That is, our approach is based on the open-world assumption in this sense. On the other hand, the rate of noise, r, is given by
r(X → Y) = (N_{ins-rel}(X) − N_{ins-class}(X, Y)) / N_{ins-rel}(X),     (12)
where N_{ins-rel}(X) is the number of observed instances satisfying the generalization X, and N_{ins-class}(X, Y) is the number of instances belonging to the class Y and satisfying the generalization X. r(X → Y) shows the quality of classification, that is, how many instances satisfying generalization X cannot be classified into class Y. Furthermore, a user can specify an allowed noise rate as a threshold value; the rules with noise rates larger than the threshold value will then be deleted.
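Equations (9), (10), and (12) can be combined into a short, self-contained sketch over the Flu data of Table 14.3. This is our own illustrative Python (equiprobable prior, duplicates already removed); the encoding of the table is an assumption, not the chapter's implementation.

```python
from math import prod

# Table 14.3; 'Flu' is the decision attribute.
N = {'Headache': 2, 'Temperature': 3, 'Muscle-pain': 2}   # values per attribute
DATA = [
    {'Headache': 'yes', 'Temperature': 'normal',    'Muscle-pain': 'yes', 'Flu': 'no'},
    {'Headache': 'yes', 'Temperature': 'high',      'Muscle-pain': 'yes', 'Flu': 'yes'},
    {'Headache': 'yes', 'Temperature': 'very-high', 'Muscle-pain': 'yes', 'Flu': 'yes'},
    {'Headache': 'no',  'Temperature': 'normal',    'Muscle-pain': 'yes', 'Flu': 'no'},
    {'Headache': 'no',  'Temperature': 'high',      'Muscle-pain': 'no',  'Flu': 'no'},
    {'Headache': 'no',  'Temperature': 'very-high', 'Muscle-pain': 'yes', 'Flu': 'yes'},
]

def matches(u, PG):
    """An instance satisfies a generalization when it agrees on every
    attribute the generalization fixes ('*' matches anything)."""
    return all(v == '*' or u[a] == v for a, v in PG.items())

def strength(PG):
    """s(PG) of Eq. (10): observed instances over all N_PG possible ones."""
    n_pg = prod(N[a] for a, v in PG.items() if v == '*')
    return sum(matches(u, PG) for u in DATA) / n_pg

def noise_rate(PG, y):
    """r(X -> Y) of Eq. (12)."""
    rel = [u for u in DATA if matches(u, PG)]
    return (len(rel) - sum(u['Flu'] == y for u in rel)) / len(rel)

def rule_strength(PG, y):
    """S(X -> Y) = s(X) * (1 - r(X -> Y)), Eq. (9)."""
    return strength(PG) * (1 - noise_rate(PG, y))

X = {'Headache': '*', 'Temperature': 'very-high', 'Muscle-pain': '*'}
print(rule_strength(X, 'yes'))   # 2 of 4 possible instances observed, no noise: 0.5
```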
14.3 Combining the GDT with Rough Sets

This section describes an implementation of the GDT methodology that combines the GDT with the rough set methodology (GDT-RS for short). By using GDT-RS, we can first find the rules with larger strengths from the possible rules, and then find minimal relative reducts from the set of rules with larger strengths [3].
14.3.1 The Rough Set Methodology
In the rough set methodology for rule discovery, a database is regarded as a decision table, denoted T = (U, A, {V_a}_{a∈A}, f, C, D), where U is a finite set of instances (or objects), called the universe; A is a finite set of attributes; each V_a is the set of values of attribute a; f is a mapping from U × A to V (= ∪_{a∈A} V_a); and C and D are two subsets of A, called the sets of condition attributes and decision attributes, respectively, such that C ∪ D = A and C ∩ D = ∅. Equivalence classes in U/C and U/D are called condition classes and decision classes, respectively [15; 20; 10]. The process of rule discovery is that of simplifying a decision table and generating a minimal decision algorithm. In general, an approach to decision table simplification consists of the following steps: (1) elimination of duplicate condition attributes, which is equivalent to the elimination of some columns from the decision table; (2) elimination of duplicate rows; (3) elimination of superfluous values of attributes. A representative approach to the computation of reducts of condition attributes is to represent knowledge in the form of a discernibility matrix [20; 15]. The basic idea can be briefly presented as follows. Let T = (U, A, {V_a}_{a∈A}, f, C, D) be a decision table with U = {u_1, u_2, …, u_n}. By the discernibility matrix of T, denoted M(T), we mean the n × n matrix defined as
m_ij =
  { {c ∈ C : c(u_i) ≠ c(u_j)},   if ∃d ∈ D [d(u_i) ≠ d(u_j)]
  { λ,                           if ∀d ∈ D [d(u_i) = d(u_j)]
for i, j = 1, 2, …, n. Thus the entry m_ij is the set of all condition attributes that classify objects u_i and u_j into different decision classes in U/D. Since M(T) is symmetric and m_ii = λ, M(T) can be represented by the elements of its lower triangle only, that is, the m_ij with 1 ≤ j < i ≤ n. The discernibility function f_T for T is defined as follows: for any u_i ∈ U,
f_T(u_i) = ∧ { ∨ m_ij : j ≠ i, j ∈ {1, 2, …, n} },

where
(i) ∨ m_ij is the disjunction of all variables c such that c ∈ m_ij, if m_ij ≠ ∅;
(ii) ∨ m_ij = ⊥ (false), if m_ij = ∅;
(iii) ∨ m_ij = ⊤ (true), if m_ij = λ.
Each logical product in the minimal disjunctive normal form (DNF) of f_T(u_i) is called a reduct of instance u_i. Generating a minimal decision algorithm means eliminating the superfluous decision rules associated with the same decision class. It is obvious that some decision rules can be dropped without disturbing the decision-making process, since other rules can take over the job of the eliminated rules.
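To make the discernibility construction concrete, here is an illustrative Python sketch that builds M(T) for the Flu data of Table 14.3 and extracts the reducts of one instance. It is our own assumption-laden encoding, and it finds the minimal hitting sets of the discernibility clauses by brute force rather than by symbolic DNF minimization (which is fine for small attribute sets).

```python
from itertools import combinations

COND = ['Headache', 'Temperature', 'Muscle-pain']
ROWS = [  # Table 14.3; the last entry of each row is the decision, Flu
    ('yes', 'normal',    'yes', 'no'),
    ('yes', 'high',      'yes', 'yes'),
    ('yes', 'very-high', 'yes', 'yes'),
    ('no',  'normal',    'yes', 'no'),
    ('no',  'high',      'no',  'no'),
    ('no',  'very-high', 'yes', 'yes'),
]

def discernibility_matrix(rows):
    """m_ij = the condition attributes on which u_i and u_j differ,
    recorded only for pairs with different decisions (the lambda
    entries for equal decisions are simply left out)."""
    m = {}
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][-1] != rows[j][-1]:
            m[(i, j)] = {a for k, a in enumerate(COND)
                         if rows[i][k] != rows[j][k]}
    return m

def reducts(i, m):
    """Reducts of instance u_i: minimal attribute sets hitting every
    clause of the discernibility function f_T(u_i)."""
    clauses = [c for pair, c in m.items() if i in pair and c]
    best = []
    for r in range(1, len(COND) + 1):
        for subset in combinations(COND, r):
            s = set(subset)
            if all(s & c for c in clauses) and not any(set(b) < s for b in best):
                best.append(subset)
    return best

m = discernibility_matrix(ROWS)
print(reducts(1, m))  # reducts of u2
```

For u2 this yields {Headache, Temperature} and {Temperature, Muscle-pain}: Temperature alone appears in every clause except the one distinguishing u2 from u5, which needs Headache or Muscle-pain.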
14.3.2 Simplifying a Decision Table by GDT-RS
By using the GDT, it is obvious that one instance can be expressed by several possible generalizations, and several instances can be generalized into one possible generalization. Simplifying a decision table by GDT-RS means finding a minimal set of generalizations that contains all of the instances in the decision table. The method of computing the reducts of condition attributes in GDT-RS is, in principle, equivalent to the discernibility matrix method [20; 15]. However, we do not find dispensable attributes, for two reasons:

• Finding dispensable attributes does not help in acquiring the best solution; the larger the number of dispensable attributes, the more difficult it is to acquire the best solution.
• Some values of a dispensable attribute may be indispensable for some values of a decision attribute.

For a database with noise, a generalization that contains instances in different decision classes should be checked by Eq. (12). If a generalization X contains more instances belonging to the decision class corresponding to Y than instances belonging to other decision classes, and the noise rate (of X → Y) is smaller than a threshold value, then the generalization X is regarded as a consistent generalization of the decision class corresponding to Y, and "X → Y with S(X → Y)" becomes a candidate rule. Otherwise, the generalization X is contradictory to all the decision classes, and no rule with X as its premise is generated.
It is clear that if a generalization is contradictory, the related generalizations in levels above this generalization are also contradictory. For example, as shown in the sample database in Table 14.2, instance a0b1c1 can be generalized into {a0}, {b1}, {c1}, {a0b1}, {a0c1}, or {b1c1}. Generalizations {b1} and {a0c1} are contradictory because they contain instances belonging to different decision classes. Furthermore, generalizations {a0} and {c1} are also contradictory because {a0c1} is contradictory. For instance a0b1c1, only the two generalizations {a0b1} and {b1c1} can be used.
14.3.3 Rule Selection

There are several possible ways to select rules, for example:

• selecting the rules that contain as many instances as possible;
• selecting the rules in levels as high as possible, according to the first type of bias stated above;
• selecting the rules with larger strengths.
Here we describe the method of rule selection used for our purpose:

• Since our purpose is to simplify the decision table, a rule that contains fewer instances is deleted if a rule that contains more instances exists.
• Since we prefer simpler results of generalization (i.e., more general rules), we first consider the rules corresponding to an upper level of generalization.
• If two generalizations in the same level have different strengths, the one with the larger strength is selected first.
14.3.4 Algorithms

We here describe two algorithms, called "Optimal Set of Rules" and "Sub-Optimal Solution", for implementing the GDT-RS methodology.
14.3.5 Algorithm 1 (Optimal Set of Rules)

Let T_noise be the expected threshold value.
Step 1. Create one or more GDTs. In fact, this step can be omitted, because the prior distribution of a generalization can be calculated by Eq. (1) and Eq. (2) if no prior background knowledge is used for the calculation.

Step 2. Regard the instances with the same condition attribute values (such as u1, u3, and u5 in the sample database of Table 14.2) as one instance, called a compound instance (such as u1' in the following table), so that the probabilities of generalizations can be calculated correctly.
  U                a    b    c    d
  u1' (u1,u3,u5)   a0   b0   c1   y, y, n
  u2               a0   b1   c1   y
  u4               a1   b1   c0   n
  u6               a0   b2   c1   n
  u7               a1   b1   c1   y
Step 3. For each compound instance u' (such as the instance u1' in the above table), let DV(u') be the set of decision classes to which the instances in u' belong. Further, for each v ∈ DV(u'), let N(u', v) be the number of instances in u' belonging to decision class v, and calculate the rate r_v as follows:

r_v(u') = 1 − N(u', v) / Σ_{v' ∈ DV(u')} N(u', v').

If there exists a v ∈ DV(u') such that r_v(u') = min{ r_{v'}(u') | v' ∈ DV(u') } and r_v(u') < T_noise, then we let the compound instance u' belong to the decision class v. If there does not exist any v ∈ DV(u') such that r_v(u') < T_noise, we treat the compound instance u' as a contradictory instance and set its decision class to ⊥ (false). For example,
  U                a    b    c    d
  u1' (u1,u3,u5)   a0   b0   c1   ⊥
Let U be the set of all the instances except the contradictory ones.

Step 4. Select one instance u from U. Using the idea of the discernibility matrix, create a discernibility vector (that is, the row or the column
corresponding to u in the discernibility matrix) for u. For example, the discernibility vector for instance u2: a0b1c1 is as follows:

  u         u1' (⊥)   u2 (y)   u4 (n)   u6 (n)   u7 (y)
  u2 (y)    b         λ        a, c     b        λ
Step 5. Compute all reducts for the instance u by using the discernibility function. For example, for instance u2: a0b1c1, two reducts, a ∧ b and b ∧ c, are acquired:

f_T(u2) = (b) ∧ ⊤ ∧ (a ∨ c) ∧ (b) ∧ ⊤ = (a ∧ b) ∨ (b ∧ c).

Step 6. Acquire the rules from the reducts for the instance u, and revise the strength of each rule by Eq. (9). For example, for instance u2: a0b1c1, the following rules are acquired:

{a0b1} → y with S = 1 × (1/2) = 0.5, and
{b1c1} → y with S = 2 × (1/2) = 1.

Step 7. Select better rules from those (for u) acquired in Step 6, using the method stated in Section 14.3.3. For example, for instance u2: a0b1c1, the rule "{b1c1} → y" is selected, because it contains more instances than the rule "{a0b1} → y".

Step 8. Set U = U − {u}. If U ≠ ∅, go back to Step 4; otherwise go to Step 9.

Step 9. If the number of rules selected in Step 7 for each instance is 1, finish. Otherwise, using the method stated in Section 14.3.2, find a minimal set of rules that contains all of the instances in the decision table.

The following table gives the result learned from the sample database shown in Table 14.2.
  u         rules           strengths
  u2, u7    b1 ∧ c1 → y     1
  u4        c0 → n          0.167
  u6        b2 → n          0.25
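Step 3's treatment of compound instances can be sketched as follows. This is our own illustrative helper, and we use r_v ≤ T_noise rather than the strict inequality written above, so that a noise-free compound instance passes a threshold of 0.

```python
from collections import Counter

def compound_decision(decisions, t_noise):
    """Step 3 for one compound instance u': pick the majority decision
    class v, compute r_v(u') = 1 - N(u', v) / sum of N(u', v'), and
    return v if the rate is within the threshold, or None (decision
    class "false") if u' is contradictory."""
    counts = Counter(decisions)
    v, n_v = counts.most_common(1)[0]
    r_v = 1 - n_v / sum(counts.values())
    return v if r_v <= t_noise else None

# u1' = (u1, u3, u5) from the table above, with decisions y, y, n:
print(compound_decision(['y', 'y', 'n'], 0.0))   # None: contradictory
print(compound_decision(['y', 'y', 'n'], 0.4))   # 'y': r_y = 1/3 <= 0.4
```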
The time complexity of Algorithm 1 is O(m n² N_{r-max}), where n is the number of instances in a database, m is the number of attributes, and N_{r-max} is the maximal number of reducts over the instances. We can see that the algorithm is not suitable for a database with many attributes. A possible way to address this issue is to find a reduct (subset) of the condition attributes in a preprocessing step before using the algorithm
[4]. In the remainder of this section, we discuss another algorithm, called Sub-Optimal Solution, that is more suitable for a database with many attributes.

Algorithm 2 (Sub-Optimal Solution)

The Sub-Optimal Solution algorithm is a greedy one. It can be described briefly as follows. Let T_noise be the expected threshold value, U = {u_1, u_2, …, u_n} the set of instances, C = {a_1, a_2, …, a_l} the set of condition attributes, and D = {a_{l+1}, a_{l+2}, …, a_m} the set of decision attributes. Further, we use R to denote a set of condition attribute values and RS to denote a set of rules; initially RS = ∅.

Step 1 – Step 3. Same as in Algorithm 1. Let U' be the set of all the non-contradictory instances, as in Algorithm 1, and let F = U'/D.

Step 4. Select one decision class c from F. Let T+ be the set of all instances in c, T− = U' − T+, T_save+ = ∅, T_save− = ∅, and R = ∅.

Step 5. Let

S[T+, R] = { v | v ∈ { f(u_i, a_j) : u_i ∈ T+ and 1 ≤ j ≤ l } and v ∉ R }.

For each attribute value v ∈ S[T+, R], let R'(v) = R ∪ {v}, let N_{R'(v)}(+) be the number of instances in T+ with all the attribute values in R'(v), and let N_{R'(v)}(−) be the number of instances in T− with all the attribute values in R'(v). Further, let

Max[T+, R] = { v ∈ S[T+, R] | N_{R'(v)}(+) = max_{v'} { N_{R'(v')}(+) } }.

Choose an attribute value v ∈ Max[T+, R] such that

N_{R'(v)}(−) = min_{v' ∈ Max[T+, R]} { N_{R'(v')}(−) },
and compute r_v by the following equation:

r_v = N_{R'(v)}(−) / ( N_{R'(v)}(+) + N_{R'(v)}(−) ).

r_v denotes the noise rate of the rule "∧_{v_i ∈ R'(v)} v_i → Con(c)", where Con(c) denotes the conjunction of the decision attribute values corresponding to the decision class c.

Step 6. R = R ∪ {v}.

Step 7. Move the instances that do not contain v from T+ and T− to T_save+ and T_save−, respectively. That is, let U'(v) = { u ∈ U | v is not a condition attribute value of u }; then
T+ = T+ − U'(v),    T− = T− − U'(v),
T_save+ = T_save+ ∪ U'(v),    T_save− = T_save− ∪ U'(v).
Step 8. If r_v > T_noise, go back to Step 5.

Step 9. Insert the rule "∧_{v_i ∈ R} v_i → Con(c)" into RS, and set T+ = T_save+, T− = T_save−, T_save+ = ∅, T_save− = ∅, and R = ∅.

Step 10. If T+ is not empty, go back to Step 5.

Step 11. F = F − {c}. If F ≠ ∅, go to Step 4; otherwise, output RS.
The time complexity of Algorithm 2 is O(m² n²). We emphasize that not every greedy approach succeeds in producing the best overall result. Just as in life, a greedy strategy may produce a good result for a while, yet the overall result may be poor. However, it is a better way of solving very large, complex problems.
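The greedy core of Steps 4-11 can be sketched compactly. This is a simplified illustration, not the chapter's implementation: it handles one decision class per call, assumes contradictory and duplicate instances were already removed in Steps 1-3, and breaks ties in Max[T+, R] by taking the first candidate in sorted order with the fewest negatives.

```python
def greedy_rules(pos, neg, t_noise):
    """Grow a conjunction R of attribute values for the decision class
    of `pos`: repeatedly add the value keeping the most positive
    instances (Max[T+, R]), on ties the fewest negatives; when the
    noise rate N(-)/(N(+)+N(-)) drops to t_noise, emit the rule and
    restart on the positives it does not cover."""
    rules = []
    while pos:
        R, tp, tn = {}, list(pos), list(neg)
        while True:
            cands = sorted({(a, v) for u in tp for a, v in u.items()
                            if a not in R})
            if not cands:
                break
            a, v = max(cands, key=lambda av: (
                sum(u[av[0]] == av[1] for u in tp),     # N_{R'(v)}(+)
                -sum(u[av[0]] == av[1] for u in tn)))   # then min N_{R'(v)}(-)
            R[a] = v
            tp = [u for u in tp if u[a] == v]
            tn = [u for u in tn if u[a] == v]
            if len(tn) / (len(tp) + len(tn)) <= t_noise:
                break
        rules.append(dict(R))
        pos = [u for u in pos if any(u[a] != v for a, v in R.items())]
    return rules

# Flu table (Table 14.3), decision class Flu = yes; condition attributes only.
pos = [{'H': 'y', 'T': 'high', 'M': 'y'},
       {'H': 'y', 'T': 'very-high', 'M': 'y'},
       {'H': 'n', 'T': 'very-high', 'M': 'y'}]
neg = [{'H': 'y', 'T': 'normal', 'M': 'y'},
       {'H': 'n', 'T': 'normal', 'M': 'y'},
       {'H': 'n', 'T': 'high', 'M': 'n'}]
print(greedy_rules(pos, neg, 0.0))
```

On this data the sketch emits two rules, covering all positives with zero noise: Muscle-pain(y) ∧ Temperature(very-high) → yes and Headache(y) ∧ Temperature(high) → yes.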
14.4 Experiments

Several databases, such as the mushroom, meningitis, postoperative patient, earthquake, and cancer databases, have been used to test our approach. Here we use the mushroom database and the meningitis database as examples.
14.4.1 Experiment 1
The mushroom database includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this latter class was combined with the poisonous one. The guide clearly states that there is no simple rule for determining the edibility of a mushroom: no rule like "leaflets three, let it be" for Poisonous Oak and Ivy. The mushroom database contains 8124 instances, with no contradictory and no duplicate instances. To acquire rules that discern edible and poisonous mushrooms exactly, we set the threshold value to 0. Tables 14.4 and 14.5 give the results for poisonous mushrooms acquired by GDT-RS and C4.5, respectively, where the Used Instances column denotes the number of instances contained by a rule, and the Stre. column denotes the strengths of the rules. The rule selection is based on the method described in Section 14.3.3: we first select the rules by which the number of contained instances is maximal, then select the ones at higher levels, and finally select the rules with larger strengths. If several rules in the same level contain the same instances and have the same strength (such as the rules in the last two rows of Table 14.4), all of them are selected.
14.4.1.1 Comparing with C4.5
The result of our method is not the same as that of C4.5 [17]. By comparing the two results, we can see that some rules discovered by C4.5, such as "odor(m) → poisonous", are not in our result. The reason is that all of the instances containing odor(m) are contained by other rules with higher strengths in GDT-RS. Moreover, the rules discovered by GDT-RS usually contain more instances than those covered by the rules discovered by C4.5. For example, the rule

gill-spacing(c) ∧ stalk-surface-above-ring(k) → poisonous
which contains 2228 instances, is acquired by GDT-RS, whereas with C4.5 no rule contains more than 2160 instances.
Table 14.4 The result of GDT-RS for poisonous

  Conditions                                                               Used Instances   Stre.
  gill-spacing(c) ∧ stalk-surface-above-ring(k)                            2228             8/E
  odor(f)                                                                  2160             9/E
  stalk-surface-below-ring(k) ∧ ring-number(o)                             2160             12/E
  gill-spacing(c) ∧ ring-type(e) ∧ population(v)                           1760             60/E
  cap-surface(s) ∧ gill-spacing(c) ∧ veil-color(w)
    ∧ ring-number(o) ∧ population(v)                                       1096             576/E
  cap-surface(s) ∧ gill-spacing(c) ∧ gill-size(n)                          1040             16/E
  cap-surface(s) ∧ bruises(f) ∧ gill-size(n)                               960              16/E
  stalk-color-below-ring(w) ∧ ring-number(o) ∧ spore-print-color(w)        872              243/E
  cap-color(g) ∧ bruises(f) ∧ stalk-root(b)                                712              100/E
  cap-color(y) ∧ bruises(f)                                                672              20/E
  stalk-root(b) ∧ habitat(g)                                               612              35/E
  cap-surface(y) ∧ gill-spacing(c) ∧ gill-size(n)
    ∧ stalk-surface-below-ring(s)                                          560              64/E
  gill-color(g) ∧ stalk-root(b)                                            504              60/E
  cap-color(w) ∧ gill-spacing(c) ∧ stalk-root(b)                           152              100/E
  spore-print-color(r)                                                     72               9/E
  bruises(f) ∧ gill-spacing(c) ∧ stalk-root(b) ∧ ring-number(o)
    or gill-spacing(c) ∧ stalk-shape(e) ∧ stalk-root(b) ∧ ring-number(o)   1392             60/E
  bruises(f) ∧ stalk-root(b) ∧ ring-number(o) ∧ habitat(d)
    or stalk-shape(e) ∧ stalk-root(b) ∧ ring-number(o) ∧ habitat(d)        624              210/E

Here E = ∏ n_i, where n_i is the number of values of the condition attribute i.
Table 14.5 The result of C4.5 for poisonous

  Conditions                                                 Used Instances   Error
  odor(f)                                                    2160             0.1%
  gill-spacing(c) ∧ ring-number(o) ∧ spore-print-color(w)    1184             0.1%
  odor(p)                                                    256              0.5%
  odor(c)                                                    192              0.7%
  spore-print-color(r)                                       72               1.9%
  odor(m)                                                    36               3.8%
14.4.2 Experiment 2
This section describes the result of an experiment in which background knowledge was used in the learning process to discover rules from a meningitis database [2]. The database was collected at the Medical Research Institute, Tokyo Medical and Dental University. It has 140 instances, each of which is described by 38 attributes that can be categorized into present history,
physical examination, laboratory examination, diagnosis, therapy, clinical course, final status, risk factor, etc. The task is to find important factors for diagnosis (bacteria and virus, or their more detailed classifications) and for predicting prognosis. A more detailed explanation of this database can be found at http://www.kdel.info.eng.osakacu.ac.jp/SIGKBS. For each of the decision attributes DIAG2, DIAG, CULTURE, C_COURSE, and COURSE(Grouped), we ran GDT-RS twice, with and without background knowledge, to acquire the respective rules. For discretizing continuous attributes, an automatic discretization [19] was used.

14.4.2.1 Background Knowledge Given by a Medical Doctor
The experience of a medical doctor, such as "if the brain wave (EEG_WAVE) is normal, the focus of brain wave (EEG_FOCUS) is never abnormal" or "if the number of white blood cells (WBC) is high, the inflammation protein (CRP) is also high", can be used as background knowledge. The background knowledge given by a medical doctor is listed below.

• Never occurring together:
  EEG_WAVE(normal) ⇔ EEG_FOCUS(+)
  CSF_CELL(low) ⇔ Cell_Poly(high)
  CSF_CELL(low) ⇔ Cell_Mono(high)

• Occurring with lower possibility:
  WBC(low) => CRP(high)
  WBC(low) => ESR(high)
  WBC(low) => CSF_CELL(high)
  WBC(low) => Cell_Poly(high)
  WBC(low) => Cell_Mono(high)
  BT(low) => STIFF(high)
  BT(low) => LASEGUE(high)
  BT(low) => KERNIG(high)

• Occurring with higher possibility:
  WBC(high) => CRP(high)
  WBC(high) => ESR(high)
  WBC(high) => CSF_CELL(high)
  WBC(high) => Cell_Poly(high)
  WBC(high) => Cell_Mono(high)
  BT(high) => STIFF(high)
  BT(high) => LASEGUE(high)
  BT(high) => KERNIG(high)
  BT(high) => CRP(high)
  BT(high) => ESR(high)
  EEG_FOCUS(+) => FOCAL(+)
  EEG_WAVE(+) => EEG_FOCUS(+)
  CRP(high) => CSF_GLU(low)
  CRP(high) => CSF_PRO(low)
Here, "high" in brackets means that the value is greater than the maximal value of the normal range, and "low" means that the value is less than the minimal value of the normal range.

14.4.2.2 Comparing the Results
The effects of using background knowledge in GDT-RS are as follows. First, some candidate rules that are deleted because of their low strengths when no background knowledge is used are now selected. For example, rule1 is deleted when no background knowledge is used, but after using the background knowledge stated above it is retained, because its strength increases by a factor of 4:

rule1: ONSET(acute) ∧ ESR(< 5) ∧ CSF_CELL(> 10) ∧ CULTURE(−) → VIRUS(E).

Without background knowledge, the strength S of rule1 is 30 × (384/E). In the background knowledge given above, there are two clauses related to this rule:

• Never occurring together:
  CSF_CELL(low) ⇔ Cell_Poly(high)
  CSF_CELL(low) ⇔ Cell_Mono(high)
By applying automatic discretization to the continuous attributes Cell_Poly and Cell_Mono, the values of each attribute are divided into two groups, high and low. Since the high groups of Cell_Poly and Cell_Mono never occur when CSF_CELL(low) occurs, the product of the numbers of attribute values is decreased to E/4, and the strength S is increased to

S = 30 × (384/E) × 4.
Second, using background knowledge also causes some rules to be replaced by others. For example, the rule

rule2: DIAG(VIRUS(E)) ∧ LOC[4, 7) → EEG_abnormal, S = 30/E

can be discovered without background knowledge, but after using the background knowledge stated above it is replaced by

rule2': EEG_FOCUS(+) ∧ LOC[4, 7) → EEG_abnormal, S = (10/E) × 4.

The reason is that both of them contain the same instances, but the strength of rule2' becomes larger than that of rule2. The result has been evaluated by a medical doctor. In his opinion, both rule2 and rule2' are reasonable, and rule2' is much better than rule2. Although similar results can be obtained from the meningitis database by GDT-RS and C4.5 when such background knowledge is not used [2], it is difficult to use such background knowledge in C4.5 [17].
14.5 Conclusion

In this paper, we have presented a soft approach called GDT-RS for rule discovery in databases, based on a combination of the Generalization Distribution Table (GDT) and rough set theory, and we have discussed algorithms for its implementation. By using GDT-RS, we can first find the rules with larger strengths from the possible rules, and then find minimal relative reducts from the set of rules with larger strengths. Thus, a minimal set of rules with larger strengths can be acquired from databases with noisy, incomplete data. Our approach is very soft in the sense that (1) biases can be flexibly selected for search control, and background knowledge
can be used as a bias to control the creation of a GDT and the discovery process; and (2) unseen instances are considered in the discovery process, and the uncertainty of a rule, including its ability to predict possible instances, can be explicitly represented in the strength of the rule. Our approach has been tested on several databases, including mushroom, meningitis, postoperative patient, earthquake, weather, and cancer databases. The ultimate aim of the research project is to create an agent-oriented and knowledge-oriented hybrid intelligent model and system for knowledge discovery and data mining in an evolutionary, parallel-distributed cooperative mode. In this model and system, the typical methods of symbolic reasoning, such as deduction, induction, and abduction, as well as methods based on soft computing techniques, such as rough sets, fuzzy sets, and granular computing, can be used cooperatively, with the GDT and the transition matrix of a stochastic process as mediums. The work described here takes but one step toward this model and system.
References

[1] J. Cendrowska, "PRISM: An Algorithm for Inducing Modular Rules", International Journal of Man-Machine Studies, Vol. 27 (1987) 349-370.
[2] J.Z. Dong, N. Zhong, and S. Ohsuga, "Rule Discovery from the Meningitis Database by GDT-RS" (Special Panel Discussion Session on Knowledge Discovery from a Meningitis Database), Proc. 12th Annual Conference of JSAI, Tokyo, June 17 (1998) 83-84.
[3] J.Z. Dong, N. Zhong, and S. Ohsuga, "Probabilistic Rough Induction", T. Yamakawa and G. Matsumoto (eds.) Methodologies for the Conception, Design and Application of Soft Computing, Proc. 5th International Conference on Soft Computing and Information/Intelligent Systems (IIZUKA'98), World Scientific (1998) 943-946.
[4] J.Z. Dong, N. Zhong, and S. Ohsuga, "Using Rough Sets with Heuristics to Feature Selection", N. Zhong, A. Skowron, and S. Ohsuga (eds.) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Lecture Notes in AI 1711, Springer-Verlag (1999) 178-187.
[5] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.) Advances in Knowledge Discovery and Data Mining, MIT Press (1996).
[6] D.F. Gordon and M. DesJardins, "Evaluation and Selection of Biases in Machine Learning", Machine Learning, Vol. 20 (1995) 5-22.
[7] J. Han, Y. Cai, and N. Cercone, "Data-Driven Discovery of Quantitative Rules in Relational Databases", IEEE Trans. Knowl. Data Eng., Vol. 5 (No. 1) (1993) 29-40.
[8] H. Hirsh, "Generalizing Version Spaces", Machine Learning, Vol. 17 (1994) 5-46.
[9] P. Langley, Elements of Machine Learning, Morgan Kaufmann Publishers (1996).
[10] T.Y. Lin and N. Cercone (eds.) Rough Sets and Data Mining: Analysis of Imprecise Data, Kluwer Academic Publishers (1997).
[11] T. Mollestad and A. Skowron, "A Rough Set Framework for Data Mining of Propositional Default Rules", Z.W. Ras and M. Michalewicz (eds.) Proc. Ninth International Symposium on Methodologies for Intelligent Systems (ISMIS-96), LNAI 1079, Springer (1996) 448-457.
[12] T.M. Mitchell, "Version Spaces: A Candidate Elimination Approach to Rule Learning", Proc. 5th Int. Joint Conf. on Artificial Intelligence (1977) 305-310.
[13] T.M. Mitchell, "Generalization as Search", Artificial Intelligence, Vol. 18 (1982) 203-226.
[14] S. Ohsuga, "Symbol Processing by Non-Symbol Processor", Proc. 4th Pacific Rim International Conference on Artificial Intelligence (PRICAI'96) (1996) 193-205.
[15] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers (1991).
[16] J.R. Quinlan, "Induction of Decision Trees", Machine Learning, Vol. 1 (1986) 81-106.
[17] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann (1993).
[18] J.W. Shavlik and T.G. Dietterich (eds.) Readings in Machine Learning, Morgan Kaufmann Publishers (1990).
[19] N. Shan, H.J. Hamilton, W. Ziarko, and N. Cercone, "Discretization of Continuous Valued Attributes in Classification Systems", Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD'96) (1996) 74-81.
[20] A. Skowron and C. Rauszer, "The Discernibility Matrices and Functions in Information Systems", R. Slowinski (ed.) Intelligent Decision Support (1992) 331-362.
[21] A. Skowron and L. Polkowski, "Synthesis of Decision Systems from Data Tables", T.Y. Lin and N. Cercone (eds.) Rough Sets and Data Mining: Analysis of Imprecise Data, Kluwer (1997) 259-299.
[22] P. Smyth and R.M. Goodman, "An Information Theoretic Approach to Rule Induction from Databases", IEEE Trans. Knowl. Data Eng., Vol. 4 (No. 4) (1992) 301-316.
[23] J. Teghem and J. Charlet, "Use of Rough Sets Method to Draw Premonitory Factors for Earthquakes by Emphasizing Gas Geochemistry: The Case of a Low Seismic Activity Context in Belgium", R. Slowinski (ed.) Intelligent Decision Support: Handbook of Applications and Advances of Rough Set Theory, Kluwer (1992) 165-179.
[24] L.A. Zadeh, "Toward a Theory of Fuzzy Information Granulation and Its Centrality in Human Reasoning and Fuzzy Logic", Fuzzy Sets and Systems, Vol. 90 (1997) 111-127.
[25] N. Zhong and S. Ohsuga, "Using Generalization Distribution Tables as a Hypotheses Search Space for Generalization", Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD'96) (1996) 396-403.
[26] N. Zhong, J.Z. Dong, and S. Ohsuga, "Discovering Rules in the Environment with Noise and Incompleteness", Proc. 10th International Florida AI Research Symposium (FLAIRS-97), Special Track on Uncertainty in AI (1997) 186-191.
Chapter 15
Data Mining via Linguistic Summaries of Databases: An Interactive Approach
Janusz Kacprzyk and Slawomir Zadrozny
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland
University of Applied Information Technology and Management, ul. Newelska 6, 01-447 Warsaw, Poland
Abstract. We propose an interactive approach to data mining meant as the derivation of linguistic summaries of databases. For interactively formulating the linguistic summaries, and then for searching the database, we employ Kacprzyk and Zadrozny's [6-11] fuzzy querying add-on, FQUERY for Access. We present an implementation for the derivation of linguistic summaries of sales data at a computer retailer.
Keywords: data mining, knowledge discovery, linguistic summaries, fuzzy logic, database, querying, fuzzy querying, fuzzy linguistic quantifier, natural language, interface
15.1. Introduction

The recent growth of Information Technology (IT) has implied, among other things, the availability of a huge amount of data (from diverse, often remote databases). Unfortunately, the availability of (raw) data does not by itself make the use of those data more productive. More important are the relevant, non-trivial dependencies encoded in those data. Unfortunately, they are usually hidden, and their discovery is not trivial and requires some intelligence. Data mining is meant here as the (automatic) discovery of such relations, dependencies, etc. from data (stored in a database). In particular, we mean it here as the derivation of linguistic summaries. We propose an approach to the derivation of linguistic summaries of large
sets of data in the sense of Yager [18-20], i.e. as linguistically quantified propositions, e.g. "most of the employees are young and well paid", with a degree of validity (truth). We would also be interested in more conceptually sophisticated linguistic summaries, e.g. "most orders are difficult". We advocate the view that linguistic summaries of the type mentioned above may only be practically formulated interactively, i.e. via an interaction with the user. This interaction will proceed via Kacprzyk and Zadrozny's [6-11] FQUERY for Access, a fuzzy querying add-on to Access. New perspectives related to querying via the WWW will be mentioned too. As an example we will show an implementation of the proposed data summarization system for the derivation of linguistic data summaries in a sales database of a computer retailer.
15.2. Linguistic Summaries Using Fuzzy Logic with Linguistic Quantifiers
In this paper we mean the linguistic summaries in the sense of Yager [18-20]. Basically, we suppose that we have:
• V - a quality (attribute) of interest, with numeric and non-numeric (e.g. linguistic) values, e.g. salary in a database of workers,
• Y = {y_1, ..., y_n} - a set of objects (records) that manifest quality V, e.g. the set of workers; V(y_i) is the value of quality V for object y_i,
• D = {V(y_1), ..., V(y_n)} - a set of data (the database).
A summary of a data set consists of:
• a summarizer S (e.g. young),
• a quantity in agreement Q (e.g. most),
• a truth (validity) T, e.g. 0.7,
and may be exemplified by "T(most of employees are young) = 0.7". More specifically, if we have a summary, say "most (Q) of the employees (y_i's) are young (S)", where "most" is a fuzzy linguistic quantifier with membership function $\mu_Q(x)$, $x \in [0,1]$, and "young" is a fuzzy quality S with membership function $\mu_S(y_i)$, $y_i \in Y$, then, using the classic Zadeh's [23, 24] calculus of linguistically quantified propositions, we obtain
Data Mining via Linguistic
Summaries
of Databases ...
327
$$T = \mu_Q\Big[\frac{1}{n}\sum_{i=1}^{n}\mu_S(y_i)\Big] \qquad (1)$$
For more sophisticated summaries as, e.g., "most (Q) employees (y_i's) are young (S) and well paid (F)", the reasoning is similar, and we obtain

$$T = \mu_Q\left(\sum_{i=1}^{n}\big(\mu_F(y_i)\wedge\mu_S(y_i)\big)\Big/\sum_{i=1}^{n}\mu_F(y_i)\right) \qquad (2)$$

where $\sum_{i=1}^{n}\mu_F(y_i)\neq 0$, and $\wedge$ is a t-norm.
The above calculus may be replaced by, e.g., OWA operators (cf. Yager and Kacprzyk's [21] volume).
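The calculus of (1) and (2) can be sketched in a few lines of Python. The quantifier shape below is an assumed example (the chapter does not fix specific numbers), and min is used as the t-norm:

```python
def mu_most(x):
    # Membership of the linguistic quantifier "most" on [0, 1]:
    # 0 below 0.3, 1 above 0.8, linear in between (illustrative numbers)
    return min(1.0, max(0.0, (x - 0.3) / 0.5))

def truth_simple(mu_S):
    # Eq. (1): T = mu_Q((1/n) * sum_i mu_S(y_i))
    return mu_most(sum(mu_S) / len(mu_S))

def truth_filtered(mu_F, mu_S):
    # Eq. (2): T = mu_Q(sum_i (mu_F(y_i) ^ mu_S(y_i)) / sum_i mu_F(y_i)),
    # with min as the t-norm; requires sum_i mu_F(y_i) > 0
    num = sum(min(f, s) for f, s in zip(mu_F, mu_S))
    return mu_most(num / sum(mu_F))

# "Most of the employees are young": memberships of five employees in "young"
young = [1.0, 0.9, 0.8, 0.2, 0.1]
T = truth_simple(young)  # average 0.6, so T = mu_most(0.6) ~ 0.6
```

Replacing `mu_most` by another non-decreasing function on [0, 1] changes the quantifier without touching the rest of the calculus.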
15.3. A General Scheme for Fuzzy Logic Based Data Summarization

The simple approach given above has some serious limitations. Basically, in its source version, it is meant for one-attribute simplified summarizers (concepts), e.g. young. It can be extended to cover more sophisticated summaries involving some confluence of attribute values, e.g. "young and well paid", but this should be done "manually", and leads to combinatorial problems, as a huge number of summaries has to be generated and validated to find the most appropriate one. The validity criterion is not trivial either, and various measures of specificity, informativeness, etc. may be employed. This relevant issue will not be discussed here, and we refer the reader to, say, Yager [18-20] or Kacprzyk and Yager [5]. For instance, cf. George and Srikanth [2], let a database with n attributes (labels) and linguistic quantifiers, constituting a description of workers, be:
Q            age           salary      experience
a few        young         low         low
many         ca. 35        medium      medium
most         middle aged   high        high
almost all   old           very high   very high
328
J. Kacprzyk
& S.
Zadrozny
Then, we should generate the particular combinations:
• almost none of the workers are: young, low salary, low experience,
• a few workers are: young, low salary, low experience,
• ...
• almost all workers are: old, very high salary, very high experience,
whose number may be huge in practice; then, we have to calculate the validity of each summary. This is a considerable task, and George and Srikanth [2] use a genetic algorithm to find the most appropriate summary, with quite a sophisticated fitness function. Clearly, when we try to linguistically summarize data, the most interesting are non-trivial, human-consistent summarizers (concepts), e.g.:
• productive workers,
• difficult orders, etc.
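The combinatorial blow-up described above is easy to see in code; a hypothetical enumeration over the table's quantifiers and attribute values:

```python
from itertools import product

quantifiers = ["a few", "many", "most", "almost all"]
age = ["young", "ca. 35", "middle aged", "old"]
salary = ["low", "medium", "high", "very high"]
experience = ["low", "medium", "high", "very high"]

# Every candidate summary "Q workers are: <age>, <salary>, <experience>"
candidates = [
    f"{q} workers are: {a}, {s} salary, {e} experience"
    for q, a, s, e in product(quantifiers, age, salary, experience)
]
print(len(candidates))  # 4 * 4 * 4 * 4 = 256 summaries to validate
```

Even for three attributes with four labels each, 256 candidate summaries must be validated; the count grows exponentially with the number of attributes, which is why George and Srikanth resort to a genetic algorithm.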
and it may easily be noticed that they involve a very complicated combination of attributes, e.g.: a hierarchy (not all attributes are of the same importance for the concept in question), attribute values that are ANDed and/or ORed, "k out of n", "most", etc. of them that should be accounted for, etc. The basic idea of fuzzy logic based data summarization (data mining) adopted in this paper consists in using a linguistically quantified proposition, as originated by Yager [18, 19], and here we extend it to using a fuzzy querying package. We start with the reinterpretation of (1) and (2) for data summarization. Thus, (1) is meant as formally expressing the statement

"Most records match query S"  (3)
We assume a standard meaning of the query as a set of conditions on the values of fields from the database's tables, connected with AND and OR. We allow for fuzzy terms in a query (see the next section), which implies a degree of matching from [0,1] rather than a yes/no matching. Effectively, a query S defines a fuzzy subset (fuzzy property) on the set of records, where their membership is determined by their matching degree with the query. Similarly, (2) may be interpreted as expressing a statement of the following type:

"Most records meeting conditions F match query S"  (4)
Thus, (4) says something about a subset of records, i.e. only those which satisfy (3). That is, in database terminology, F corresponds to a filter, and (4) claims that most records passing through F match query S. Moreover, since F may be fuzzy, a record may pass through it to a degree from [0,1]. As this is more general than (3), we will assume (4) as the basis. That is, we seek, for a given database, propositions of the type (1), interpreted as (4), which are highly valid (true). Basically, a proposition sought consists of three elements:
• a fuzzy filter F (optional),
• a query S, and
• a linguistic quantifier Q.
There are two limit cases, where we:
• do not assume anything about the form of any of these elements,
• assume fixed forms of the fuzzy filter and query, and only seek a linguistic quantifier Q.

Obviously, in the first case data summarization will be extremely time-consuming, though it may produce interesting results, not predictable by the user beforehand. In the second case the user has to guess a good candidate formula for summarization, but the evaluation is fairly simple: it requires more or less the same resources as answering a (fuzzy) query. Thus, the second case refers to the summarization known as ad hoc queries, extended with an automatic determination of a linguistic quantifier. Between these two extremes there are different types of summaries, with various assumptions on what is given and what is sought. In the case of a linguistic quantifier, it may be given or sought. In the case of a fuzzy filter F and a fuzzy query S, more possibilities exist. Basically, both F and S consist of simple conditions, each stating what value a field should take on, connected using logical connectives. Here we assume that the table(s) of interest for summarization are fixed. We will use the notation shown in Table 15.1 to describe what is given or what is sought with respect to the fuzzy filter F and query S (below, A stands for F or S).
Obviously, in the first case data summarization will be extremely timeconsuming, though may produce interesting results, not predictable by the user beforehand. In the second case the user has to guess a good candidate formula for summarization, but the evaluation is fairly simple  requires more or less only the same resources as answering a (fuzzy) query. Thus, the second case refers to the summarization known as ad hoc queries, extended with an automatic determination of a linguistic quantifier. Between these two extremes there are different types of summaries, with various assumptions on what is given and what is sought. In case of a linguistic quantifier, it may be given or sought. In case of a fuzzy filter F and a fuzzy query S, more possibilities exist. Basically, both F and S consist of simple conditions, each stating what value afield should take on, and connected using logical connectives. Here we assume that the table(s) of interest for summarization are fixed. We will use the notation shown in Table\l5.1 to describe what is given or what is sought in respect to the fuzzy filter F and query S (below A stands for F or S)
330
J. Kacprzyk
& S.
Zadrozny
Table 15.1. Given and sought elements of summaries

A     All is given (or sought), i.e. fields, values, and how simple conditions are linked using logical connectives
A_c   Fields and linkage of simple conditions are given, but values are left out
A_v   Denotes the sought left-out values referred to in the above notation
A*    Only a set of fields is given; the other elements are sought
Using this notation we may propose a rough classification of summaries, shown in Table 15.2.

Table 15.2. A classification of summaries

Type  Given       Sought    Remarks
1     S           Q         simple summarizing through an ad-hoc query (value-focused)
1.1   S, F        Q         as above + the use of a fuzzy filter, i.e. the summary is related to a fuzzy subset of records
2     Q, S_c      S_v       in the simplest case corresponds to the search for typical or exceptional values (see the comments below this table)
2.1   Q, S_c, F   S_v
3     nothing     S, F, Q   fuzzy rules, extremely expensive computationally
3.1   S*          S, F, Q   a much more viable version of the above
3.2   S           F, Q      looking for causes of some preselected, interesting data features (machine-learning-like)
Thus, we distinguish 3 main types of data summarization. Type 1 is a simple extension of fuzzy querying as in FQUERY for Access. Basically, the user has to conceive a query which may be true for some population of records in the database. As a result of this type of summarization he or she receives some estimate of the cardinality of this population as a linguistic quantifier. The primary target of this type of summarization is certainly to propose such a query that a large proportion, e.g. most, of the records satisfy it. On the other hand, it may be interesting to learn that only a few records satisfy some meaningful query. Type 1.1 is a straightforward extension of Type 1 summaries by adding a fuzzy filter. Having a fuzzy querying engine dealing with fuzzy filters, the
computational complexity is here the same as for Type 1.

Type 2 summaries require much more effort. The primary goal of this type of summary is to determine typical (exceptional) values of a field. Then, query S consists of only one simple condition referring to the field under consideration. The summarizer tries to find a value, possibly fuzzy, such that the query (built of the field, the equality relational operator and that value) is true for Q records. Depending on the category of Q used, e.g. most versus few, typical or exceptional values are sought, respectively. This type of summaries may be used with more complicated, regular queries, but it quickly becomes computationally infeasible (combinatorial explosion) and the interpretation of the results becomes vague. Type 2.1 may produce typical (exceptional) values for some, possibly fuzzy, subpopulations of records. From the computational point of view, the same remarks apply as for Type 1 versus Type 1.1.

Type 3 is the most general. In its full version, this type of summaries is to produce fuzzy rules describing the dependencies between values of particular fields. Here the use of a filter is essential, in contrast to the previous types where it was optional. The very meaning of a fuzzy rule obtained is that if a record meets the filter's condition, then it also meets the query's conditions; this corresponds to a classical IF-THEN rule. For a general form of such a rule it is difficult to devise an effective and efficient algorithm looking for such dependencies. A full search may be acceptable only in the case of restrictively limited sets of rule (summarizer) building blocks, i.e. fields and their possible values. A Type 3.1 summary may produce interesting results in a more reasonable time. It relies on the user pointing out promising fields to be used during the construction of a summarizer.
For computational feasibility, some limits should also be put on the complexity of query S and filter F in terms of the number of logical connectives allowed. Finally, Type 3.2 is distinguished here as a special case due to its practical value. First of all, it makes the generation of a summarizer less time consuming and at the same time has a good interpretation. Here the query is known in an exact form and only the filter is sought, i.e. we look for the causes of given data features. For example, we may set in a query that the profitability of a venture is high and look for a characterization of the subpopulation of ventures (records) of such a high profitability. Effectively, what is sought is a (possibly fuzzy) filter F. The summaries of Type 1 and 2 have been implemented as an extension to our FQUERY for Access (cf. Kacprzyk and Zadrozny [6-11]).
15.4. FQUERY for Access: A Fuzzy Querying Add-on

FQUERY for Access is an add-on (add-in) to Microsoft Access that provides fuzzy querying capabilities (cf. Kacprzyk and Zadrozny [6-11]). FQUERY for Access makes it possible to use fuzzy terms in regular queries, which are then submitted to Microsoft Access's querying engine. The result is a set of records matching the query, obviously to a degree from [0,1]. Briefly speaking, the following types of fuzzy terms are available:
• fuzzy values, exemplified by low in "profitability is low",
• fuzzy relations, exemplified by much greater than in "income is much greater than spending", and
• fuzzy linguistic quantifiers, exemplified by most in "most conditions have to be met".
The elements of the first two types are elementary building blocks of fuzzy queries in FQUERY for Access. They are meaningful in the context of numerical fields only. There are also other fuzzy constructs allowed, which may be used with scalar fields. If a field is to be used in a query in connection with a fuzzy value, it has to be defined as an attribute. The definition of an attribute consists of two numbers: the attribute's lower (LL) and upper (UL) limit. They set the interval which the field's values are assumed to belong to, according to the user. This interval depends on the meaning of the given field. For example, for age (of a person), a reasonable interval would be, e.g., [18, 75], in a particular context, i.e. for a specific group. Such a concept of an attribute makes it possible to define fuzzy values universally. Fuzzy values are defined as fuzzy sets on [-10, +10]. Then, the matching degree md(.,.) of a simple condition referring to attribute AT and fuzzy value FV in a record R is calculated by

$$md(AT = FV, R) = \mu_{FV}\big(\tau(R(AT))\big) \qquad (5)$$

where: R(AT) is the value of attribute AT in record R, $\mu_{FV}$ is the membership function of fuzzy value FV, and $\tau: [LL_{AT}, UL_{AT}] \to [-10, 10]$ is the mapping from the interval defining AT onto [-10, 10], so that we may use the same fuzzy values for different fields. A meaningful interpretation is secured by $\tau$, which makes it possible to treat all field domains as ranging over the unified interval [-10, 10]. For simplicity, it is assumed that the membership functions of fuzzy values are trapezoidal, as in Figure 15.1, and $\tau$ is assumed linear.
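The scaling and matching of (5) can be sketched as follows; the trapezoid parameters and attribute limits are illustrative, not FQUERY's stored definitions:

```python
def tau(v, LL, UL):
    # Linear mapping from the attribute's interval [LL, UL] onto [-10, 10]
    return -10.0 + 20.0 * (v - LL) / (UL - LL)

def trapezoid(x, a, b, c, d):
    # Trapezoidal membership function with a <= b <= c <= d
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# A fuzzy value "low" defined once on [-10, 10] and reusable for any attribute
def mu_low(x):
    return trapezoid(x, -10.0, -10.0, -6.0, -2.0)

# Eq. (5): md(age = low, R) for attribute "age" with limits [18, 75]
md = mu_low(tau(25, 18, 75))  # tau(25) is about -7.5, inside the core, so md = 1.0
```

Because every fuzzy value lives on the unified [-10, 10] scale, the same `mu_low` works unchanged for salary, freight cost, or any other declared attribute; only the (LL, UL) pair differs.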
Figure 15.1. An example of the membership function of a fuzzy value

Linguistic quantifiers provide for a flexible aggregation of simple conditions. In FQUERY for Access the fuzzy linguistic quantifiers are defined in Zadeh's [23, 24] sense, i.e. as fuzzy sets on the [0, 10] interval instead of the original [0, 1]. They may be interpreted either using the original Zadeh's approach or via the OWA operators (cf. Yager [19], Yager and Kacprzyk [21]); Zadeh's interpretation will be used here. The membership functions of fuzzy linguistic quantifiers are assumed piecewise linear, hence two numbers from [0, 10] are needed. Again, a mapping from [0, N], where N is the number of conditions aggregated, to [0, 10] is employed to calculate the matching degree of a query. More precisely, the matching degree md(.,.) for the query "Q of N conditions are satisfied" for record R is equal to

$$md(Q \text{ of conditions}, R) = \mu_Q\Big(\tau\Big(\sum_{i=1}^{N} md(\text{condition}_i, R)\Big)\Big) \qquad (6)$$
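The aggregation in (6) can be sketched likewise, again with an assumed piecewise-linear quantifier:

```python
def mu_most10(x):
    # "Most" as a fuzzy set on [0, 10]: 0 up to 3, 1 from 8 on (assumed numbers)
    return min(1.0, max(0.0, (x - 3.0) / 5.0))

def md_query(condition_mds):
    # Eq. (6): mu_Q(tau(sum_i md(condition_i, R))),
    # where tau maps [0, N] linearly onto [0, 10]
    N = len(condition_mds)
    return mu_most10(10.0 * sum(condition_mds) / N)

# A record matching four of five conditions fully and one not at all
md = md_query([1.0, 1.0, 1.0, 1.0, 0.0])  # sum = 4, scaled to 8, so md = 1.0
```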
We can also assign different importance degrees to particular conditions; then, the aggregation formula is equivalent to (2), with the importance identified with a fuzzy set on [0,1] and treated as property F in (2). In FQUERY for Access, queries containing fuzzy terms are still syntactically correct Access queries through the use of parameters. Basically, Access represents queries using SQL. Parameters, expressed as strings limited with brackets, make it possible to embed references to fuzzy terms in a query. We have assumed a special naming convention for the parameters corresponding to the particular fuzzy terms. For example:
• [FfA_FV fuzzy value name] will be interpreted as a fuzzy value,
• [FfA_FQ fuzzy quantifier name] will be interpreted as a fuzzy quantifier.
First, a fuzzy term has to be defined using the FQUERY for Access toolbar; it is then stored internally. This maintenance of dictionaries of fuzzy terms defined
by users strongly supports our approach to data summarization. In fact, the package comes with a set of predefined fuzzy terms, but the user may enrich the dictionary too. When the user initiates the execution of a query, it is automatically transformed and then run as a native query of Access. The transformation consists primarily of the replacement of parameters referring to fuzzy terms by calls to functions that secure a proper interpretation of these fuzzy terms. Then, the query is run by Access as usual. In our approach, the interactivity, i.e. user assistance, is in the definition of summarizers (indication of attributes and their combinations). This proceeds via the user interface of the fuzzy querying add-on. Basically, the summarizers allowed are:
• simple, e.g. "salary is high",
• compound, e.g. "salary is low AND age is old",
• compound with quantifiers, e.g. "most of {salary is high, age is young, ..., training is well above average}".
We will also use "natural" linguistic terms, i.e. (7±2!) of them, exemplified by: very low, low, medium, high, very high, and also "comprehensible" fuzzy linguistic quantifiers such as: most, almost all, etc. In Kacprzyk and Zadrozny [6-11], a conventional DBMS is used, and a fuzzy querying tool is developed. Basically, the so-called quantified queries, introduced by Kacprzyk and Ziolkowski [16] and Kacprzyk, Zadrozny and Ziolkowski [15], are used. They make it possible to express complex concepts; e.g., a "serious water pollution" may well be equated with, say, "almost all of the relevant pollution indicators considerably exceed pollution limits (maybe imprecisely specified)". Zadeh's [23, 24] calculus of linguistically quantified propositions is used. Basically, if $comp_i$ is the degree of matching of a record with the i-th partial condition, and the query is "find X's such that Q out of $\{comp_1, \ldots, comp_p\}$ are satisfied", then the matching degree of a record with this query is $md = \mu_Q\big[\frac{1}{p}\sum_{i=1}^{p} comp_i\big]$ for the case without importance, and, with importance degrees $b_i \in [0,1]$, $md = \mu_Q\big[\sum_{i=1}^{p}(b_i \wedge comp_i)\big/\sum_{i=1}^{p} b_i\big]$. Notice that the quantified queries are exactly what we need for implementing the linguistic summaries. We now sketch FQUERY for Access, which supports various fuzzy elements in queries. The main issues are: (1) how to extend the syntax and semantics of the query, and (2) how to provide an easy way of eliciting and manipulating those terms by the user. The main entities may be summarized as:
• <attribute>: for each attribute we give the lower and the upper limit specifying the interval of possible values, used for scaling the values while calculating the degree of matching with a fuzzy value, or the degree of membership in a fuzzy relation.
• <fuzzy value>: these are equivalents of imprecise linguistic terms, defined by trapezoidal membership functions on [-10, +10].
• <fuzzy relation>: an imprecise (fuzzy) relation is represented by a trapezoidal membership function.
• <fuzzy quantifier>, <importance coefficient>, <OWA tag>: the fuzzy quantifiers were initially defined (cf. Kacprzyk and Zadrozny [6, 7]) as fuzzy sets on [0.0, 10.0] with piecewise linear membership functions. Then (cf. Kacprzyk and Zadrozny [8-11]), importance was added to handle queries such as "most of the important subconditions of the query are fulfilled". Moreover, the OWA operators (cf. Yager and Kacprzyk [21]) are supported.
FQUERY for Access is embedded in the native Access environment as an add-on. Definitions of attributes, fuzzy values, etc. are stored in proper locations, and a mechanism for putting them into the Query-By-Example (QBE) sheet (grid) is provided. All the code and data are put into a database file, a library, installed by the user. Parameters are used, and a query transformation is performed. FQUERY for Access provides its own toolbar. There is one button for each fuzzy element, and buttons for declaring attributes, starting the querying, closing the toolbar, and for help (cf. Figure 15.2). Generally, the user interacts with FQUERY for Access, by pressing a button, in order to:
• declare attributes,
• define fuzzy elements and put them automatically into the QBE sheet,
• start the querying process.
The user inserts fuzzy elements into the QBE sheet from special tables in the library. Then, the search is started by the GO button. FQUERY for Access employs the standard Access querying procedure, as queries with fuzzy elements are still legitimate. Thus, the original SQL-type query is replaced by a modified one with calls to functions filling in appropriate data structures with the information required for computing the matching degree for subsequent records. Then, the query is run by Access as usual and the results are displayed.
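The parameter-replacement step can be imitated by a simple regular-expression pass. This is a rough sketch in Python rather than FQUERY's actual code, and the generated function names (`FuzzyValueMD`, `QuantifierMD`) are our own hypothetical placeholders:

```python
import re

# Pattern for FQUERY's parameter naming convention, e.g. [FfA_FV High]
PARAM = re.compile(r"\[FfA_(FV|FQ)\s+(\w+)\]")

def transform(sql):
    # Replace each [FfA_FV name] / [FfA_FQ name] parameter by a call to a
    # hypothetical per-record evaluation function, mimicking the described
    # replacement of parameters by function calls
    def repl(m):
        kind, name = m.groups()
        fn = "FuzzyValueMD" if kind == "FV" else "QuantifierMD"
        return f"{fn}('{name}')"
    return PARAM.sub(repl, sql)

q = "SELECT * FROM Orders WHERE Freight = [FfA_FV High]"
print(transform(q))  # SELECT * FROM Orders WHERE Freight = FuzzyValueMD('High')
```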
Figure 15.2. Composition of a fuzzy query (a screenshot of the Access QBE grid for the query "Troublesome orders", with the fuzzy parameters [FfA_FV Soon], [FfA_FV Low], [FfA_FV High] and [FfA_FQ Most] inserted into the criteria rows)
The main elements used for the composition of a fuzzy query are shown in Figure 15.2, in which we use a fuzzified example included in Access: suppose that we wish to find "difficult orders", defined as:

Difficult order =
  from outside the USA,
  short delivery time,
  low order amount,
  high freight costs,
  high discount,
  most (of these conditions should be met).

This is therefore a compound query with a fuzzy linguistic quantifier. It is composed as in Figure 15.2. The above fuzzy querying add-on was then extended by Kacprzyk and Zadrozny to fuzzy querying over the Internet. The query is defined using a WWW browser (more specifically, Microsoft Explorer or Netscape Navigator), and the user interface is similar to that in Figure 15.2, with a WWW-browser-like toolbar. The definition of fuzzy values, fuzzy relations, and linguistic quantifiers
is via Java applets. Basically, a query is sent to the WWW server; the searching program decodes the query and, if fuzzy elements exist, an HTML page is sent back to specify their membership functions; the search is done sequentially and yields a matching degree. The results are sent back as an HTML document to be displayed. So, the interface consists of: HTML pages for query formation, membership function specification, the list of records and the content of a selected record, a searching program, and a reporting program.
15.5. Summaries via FQUERY for Access

FQUERY for Access, which extends the querying capabilities of Microsoft Access by making it possible to handle fuzzy terms, may be viewed as an interesting tool for data mining, including the generation of summaries. The simplest method of data mining, through ad-hoc queries, becomes much more powerful by using fuzzy terms. Nevertheless, the implementation of the various types of summaries mentioned before seems to be worthwhile, and is fortunately relatively straightforward. We rely on dictionaries of fuzzy terms maintained and extended by users during subsequent sessions. The main feature supporting an easy generation of summaries is the adopted concept of context-free definitions of particular fuzzy terms. Hence, looking for a summarizer, we may employ any term in the context of any attribute. Thus, we have the summarizer's building blocks at hand, and what is needed is an efficient procedure for their composition, compatible with the rest of the fuzzy querying system. In the case of Type 1 summaries, only the list of defined linguistic quantifiers is employed. The query S is provided by the user, and we are looking for a linguistic quantifier describing in the best way the proportion of records meeting this query. Hence, we are looking for a fuzzy set in the space of linguistic quantifiers such that

$$V_S(Q) = \text{truth}(Q\,S(X)) = \mu_Q\Big(\frac{1}{m}\sum_{i=1}^{m}\mu_S(x_i)\Big) \qquad (7)$$
FQUERY for Access processes the query, additionally summing up the matching degrees for all records. Thus, the sum in (7) is easily calculated. Then, the results are displayed as a list of records ordered by their matching degree. In another window, the fuzzy set of linguistic quantifiers sought is shown. For efficiency, we only take into account the quantifiers defined by the users for querying. At
the same time this seems quite reasonable, as the quantifiers defined by the user should have a clear interpretation. Currently, FQUERY for Access does not support fuzzy filters. As soon as this capability is added, summaries of Type 1.1 will also be available: simply, when evaluating the filter for particular records, the other sum used in (2) will be calculated, and the final results will be presented as in the case of Type 1 summaries.

Type 2 summaries require more effort and a redesign of the results' display. Now, we are given the quantifier and the whole query, but without some values. Thus, first of all, we have extended the syntax of the query language, introducing a placeholder for a fuzzy value. That is, the user may leave out some values in the query's conditions and request the system to find the best fit for them. To put such a placeholder into a query, the user employs a new type of parameter: we extend the list by adding the parameter [FfA_F?]. During query processing these parameters are treated similarly to fully specified fuzzy values. However, the matching degree is calculated not just for one fuzzy value but for all fuzzy values defined in the system. The matching degrees of the whole query against the subsequent records, calculated for different combinations of fuzzy values, are summed up. Finally, it is computed, for particular combinations of fuzzy values, how well the query is satisfied when a given combination is put into the query. Thus, we again obtain as a result a fuzzy set, but this time defined in the space of vectors of fuzzy values. Obviously, such computations are extremely time-consuming and are practically feasible only for one placeholder in a query. On the other hand, the case of one placeholder, corresponding to the search for typical or exceptional values, is the most useful form of a Type 2 summary. It is again fairly easy to embed Type 2 summaries in the existing fuzzy querying mechanism.
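The Type 1 quantifier search of (7) and the Type 2 placeholder scan described above can be sketched together; the quantifier shapes and fuzzy-value definitions below are our own illustrative assumptions, not FQUERY's stored dictionaries:

```python
# Candidate quantifiers as functions of the proportion p of matching records
quantifiers = {
    "few":        lambda p: max(0.0, 1.0 - 2.0 * p),
    "about half": lambda p: max(0.0, 1.0 - 4.0 * abs(p - 0.5)),
    "most":       lambda p: min(1.0, max(0.0, (p - 0.3) / 0.5)),
}

def quantifier_validities(matching_degrees):
    # Type 1, eq. (7): validity(Q) = mu_Q(sum_i md_i / m) for every quantifier Q
    p = sum(matching_degrees) / len(matching_degrees)
    return {name: mu(p) for name, mu in quantifiers.items()}

def best_fuzzy_value(records, fuzzy_values, mu_Q):
    # Type 2, one [FfA_F?] placeholder: try every defined fuzzy value and
    # score how well "Q records have field = FV" is satisfied
    def score(mu_fv):
        mds = [mu_fv(r) for r in records]
        return mu_Q(sum(mds) / len(mds))
    return max(fuzzy_values, key=lambda name: score(fuzzy_values[name]))
```

For Type 1 the user would then read off the quantifier with the highest validity; for Type 2 the returned fuzzy value is the typical (for most-like Q) or exceptional (for few-like Q) value of the field.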
15.6. An Example of Implementation

The proposed data summarization procedure was implemented for the sales database of a small-to-medium size computer retailer (ca. 15 employees) located in the southern part of Poland. The database is characterized by:

Number of records: 8743
Number of attributes: 14
Number of transaction documents: 4000
Number of suppliers and customers: 3000
Number of products carried: 3000
and these numbers vary over time: the number of records increases as more and more sales are recorded, the number of attributes is unchanged, the number of transactions increases, the numbers of suppliers and customers increase (mostly due to the increase of the number of customers as the number of suppliers is more or less the same), and the number of products is more or less the same. The basic structure of the database (in the "dbf" type format) is as shown in Table 15.3.
Table 15.3. The basic structure of the database (in the "dbf" type format)

Attribute name           Type     Description
Date                     Date     Date of sale
Time                     Time     Time of sale transaction
Name                     Text     Name of the product
Amount (number)          Numeric  Number of products sold in the transaction
Price                    Numeric  Unit price
Commission               Numeric  Commission (in %) on sale
Value                    Numeric  Value = amount (number) x price of the product
Discount                 Numeric  Discount (in %) for the transaction
Group                    Text     Product group to which the product belongs
Transaction value        Numeric  Value of the whole transaction
Total sale to customer   Numeric  Total value of sales to the customer in the fiscal year
Purchasing frequency     Numeric  Number of purchases by the customer in the fiscal year
Town                     Text     Town where the customer lives or is based
First, after some initialization, we need to provide parameters. These parameters belong to 3 groups:
• "Query" - definition of the attributes and the subject,
• "Type of report" - definition of how the results should be presented,
• "Method" - definition of the parameters of the method (i.e. a genetic algorithm),
and their meaning is self-evident. We will now give a couple of examples. First, suppose that we are interested in the relation between the commission and the type of goods sold. We obtain the linguistic summaries shown in Table 15.4.

Table 15.4. Linguistic summaries expressing relations between the group of products and commission

Summary                                                            Valid.   Approp.  Imprec.  Cover.   W.avg.
About 1/2 of sales of network elements is with a high commission   0.3630   0.2329   0.1872   0.4202   0.3165
About 1/2 of sales of computers is with a medium commission        0.4753   0.2045   0.3453   0.5498   0.3699
Much sales of accessories is with a high commission                0.5713   0.1684   0.4095   0.5779   0.3919
Much sales of components is with a low commission                  0.6707   0.1376   0.5837   0.7212   0.4449
About 1/2 of sales of software is with a low commission            0.4309   0.1028   0.5837   0.4808   0.3162
About 1/2 of sales of computers is with a low commission           0.4473   0.0225   0.5837   0.5594   0.3202
A few sales of components is without commission                    0.0355   0.0237   0.2745   0.0355   0.2346
A few sales of computers is with a high commission                 0.0314   0.1418   0.1872   0.0455   0.1881
Very few sales of printers is with a high commission               0.0509   0.1288   0.1872   0.0585   0.1820

(Columns: degree of validity, degree of appropriateness, degree of imprecision, degree of covering, weighted average.)
As we can see, the results can be very helpful, e.g., in negotiating commissions for various products sold. Next, suppose that we are interested in the relations between the groups of products and the times of sale. We obtain the results shown in Table 15.5.
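Conceptually, following Yager's approach with Zadeh's calculus of linguistically quantified propositions (cf. [18], [23]), the degree of validity of a simple summary "Q records are S" is mu_Q((1/n) * sum_i mu_S(d_i)). A minimal sketch follows; the membership functions for "high commission" and "about 1/2" below are purely illustrative assumptions, not those used in the actual system:

```python
# Sketch of evaluating the validity of a summary "Q records are S"
# via Zadeh's calculus: truth = mu_Q((1/n) * sum_i mu_S(record_i)).

def mu_high_commission(pct):
    """Illustrative fuzzy set 'high commission' over commission in %."""
    if pct <= 5.0:
        return 0.0
    if pct >= 10.0:
        return 1.0
    return (pct - 5.0) / 5.0  # linear ramp between 5% and 10%

def mu_about_half(r):
    """Illustrative fuzzy quantifier 'about 1/2' over a proportion r in [0, 1]."""
    return max(0.0, 1.0 - abs(r - 0.5) / 0.3)

def summary_truth(values, mu_S, mu_Q):
    """Degree of validity of 'Q values are S'."""
    r = sum(mu_S(v) for v in values) / len(values)
    return mu_Q(r)

# Hypothetical commission percentages of eight sales records:
commissions = [2.0, 4.0, 6.0, 7.5, 9.0, 11.0, 12.0, 3.0]
t = summary_truth(commissions, mu_high_commission, mu_about_half)
```

The degrees of imprecision, covering and appropriateness reported in the tables refine this basic validity degree, but the quantifier-driven aggregation above is the core of each entry.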
Data Mining via Linguistic Summaries of Databases ...
Table 15.5. Linguistic summaries expressing relations between the groups of products and times of sale

Summary                                                              Valid.   Approp.  Imprec.  Cover.   W.avg.
About 1/3 of sales of computers is by the end of year                0.2801   0.0999   0.2010   0.3009   0.1274
About 1/2 of sales in autumn is of accessories                       0.4790   0.0642   0.4095   0.4737   0.1143
About 1/3 of sales of network elements is in the beginning of year   0.1957   0.0733   0.2124   0.2857   0.0982
Very few sales of network elements is by the end of year             0.0929   0.0833   0.2010   0.1176   0.0980
Very few sales of software is in the beginning of year               0.0958   0.0768   0.2124   0.1355   0.0929
About 1/2 of sales in the beginning of year is of accessories        0.4343   0.0348   0.4095   0.4443   0.0860
About 1/3 of sales in the summer is of accessories                   0.3092   0.0464   0.2745   0.3209   0.0853
About 1/3 of sales of peripherals is in the spring period            0.2140   0.0507   0.2525   0.3032   0.0809
About 1/3 of sales of software is by the end of year                 0.2258   0.0446   0.2010   0.2455   0.0768
About 1/3 of sales of network elements is in the spring period       0.2081   0.0458   0.2525   0.2983   0.0763
About 1/3 of sales in the summer period is of components             0.3081   0.0336   0.2745   0.3081   0.0745
Very few sales of network elements is in the autumn period           0.0955   0.0485   0.1471   0.1956   0.0692
A few sales of software is in the summer period                      0.1765   0.0402   0.1765   0.1362   0.0691

(Columns: degree of validity, degree of appropriateness, degree of imprecision, degree of covering, weighted average.)
Notice that in this case the summaries are much less obvious than in the former case, which expressed relations between the group of products and commission. It should also be noted that the weighted average is very low here, but, for technical reasons, this should not be taken literally, as these values are mostly used to order the summaries. Finally, Table 15.6 shows some of the obtained linguistic summaries expressing relations between the attributes: size of customer, regularity of customer (purchasing frequency), date of sale, time of sale, commission, group of product and day of sale. This is an example of the most sophisticated form of linguistic summaries supported by the system described.
Table 15.6. Linguistic summaries expressing relations between the attributes: size of customer, regularity of customer (purchasing frequency), date of sale, time of sale, commission, group of product and day of sale

Summary                                                        Valid.   Approp.  Imprec.  Cover.   W.avg.
Much sales on Saturday about noon is with a low commission     0.3951   0.3843   0.2748   0.6591   0.3863
Much sales on Saturday about noon is for bigger customers      0.4430   0.3425   0.4075   0.7500   0.3648
Much sales is on Saturday about noon                           0.4654   0.3133   0.4708   0.7841   0.3564
Much sales on Saturday about noon is for regular customers     0.4153   0.3391   0.3540   0.6932   0.3558
A few sales for regular customers is with a low commission     0.1578   0.3882   0.5837   0.1954   0.3451
A few sales for small customers is with a low commission       0.1915   0.3574   0.5837   0.2263   0.3263
A few sales for one-time customers is with a low commission    0.1726   0.3497   0.5837   0.2339   0.3195
Much sales for small customers is for non-regular customers    0.5105   0.6250   0.1458   0.7709   0.5986

(Columns: degree of validity, degree of appropriateness, degree of imprecision, degree of covering, weighted average.)
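Since, as noted above, the weighted average serves mainly as a ranking key for the candidate summaries, its use can be sketched as follows. The weights and the choice of which degrees enter the average are illustrative assumptions only (in practice a degree such as imprecision may well enter with a penalizing sign), not the settings of the actual system:

```python
# Hypothetical sketch: ordering candidate summaries by a weighted
# average of their quality degrees, used only as a ranking key.
WEIGHTS = {"validity": 0.4, "imprecision": 0.2, "covering": 0.2, "appropriateness": 0.2}

def weighted_average(degrees):
    """Weighted average of a summary's quality degrees (illustrative weights)."""
    return sum(WEIGHTS[k] * degrees[k] for k in WEIGHTS)

candidates = [
    ("Much sales of components is with a low commission",
     {"validity": 0.6707, "imprecision": 0.5837, "covering": 0.7212, "appropriateness": 0.1376}),
    ("A few sales of computers is with a high commission",
     {"validity": 0.0314, "imprecision": 0.1872, "covering": 0.0455, "appropriateness": 0.1418}),
]

# Best-scoring summaries first, as in the tables above.
ranked = sorted(candidates, key=lambda c: weighted_average(c[1]), reverse=True)
```

The absolute values of the ranking key are thus of little significance by themselves; only the induced ordering of the summaries matters.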
15.7. Conclusions

We proposed a realistic, interactive approach to linguistic summaries of large sets of data. With human assistance we could derive complex, "intelligent" and human-consistent linguistic summaries. The results of an implementation at a computer retailer are very encouraging.
References

[1] P. Bosc and J. Kacprzyk, Eds., Fuzziness in Database Management Systems. Physica-Verlag, Heidelberg, 1995.
[2] R. George and R. Srikanth, "Data summarization using genetic algorithms and fuzzy logic", in: Genetic Algorithms and Soft Computing (Eds. F. Herrera and J.L. Verdegay), Physica-Verlag, Heidelberg and New York, pp. 599-611, 1996.
[3] J. Kacprzyk, "An interactive fuzzy logic approach to linguistic data summaries", in Proceedings of NAFIPS'99 - 18th International Conference of the North American Fuzzy Information Processing Society (Eds. R.N. Dave and T. Sudkamp), IEEE Press, Piscataway, NJ, pp. 595-599, 1999.
[4] J. Kacprzyk and P. Strykowski, "Linguistic data summaries for intelligent decision support", in Proceedings of EFDAN'99 - 4th European Workshop on Fuzzy Decision Analysis and Recognition Technology for Management, Planning and Optimization, pp. 3-12, 1999.
[5] J. Kacprzyk and R.R. Yager, "Intelligent summaries of data using fuzzy logic", International Journal of General Systems (in press), 2000.
[6] J. Kacprzyk and S. Zadrozny, "Fuzzy querying for Microsoft Access", in Proceedings of the Third IEEE Conference on Fuzzy Systems (Orlando, USA), Vol. 1, pp. 167-171, 1994.
[7] J. Kacprzyk and S. Zadrozny, "FQUERY for Access: fuzzy querying for a Windows-based DBMS", in Fuzziness in Database Management Systems (Eds. P. Bosc and J. Kacprzyk), Physica-Verlag, Heidelberg, pp. 415-433, 1995.
[8] J. Kacprzyk and S. Zadrozny, "Fuzzy queries in Microsoft Access v. 2", in Proceedings of the 6th IFSA World Congress (Sao Paulo, Brazil), Vol. II, pp. 341-344, 1995.
[9] J. Kacprzyk and S. Zadrozny, "Fuzzy queries in Microsoft Access v. 2", in Fuzzy Information Engineering - A Guided Tour of Applications (Eds. D. Dubois, H. Prade and R.R. Yager), Wiley, New York, pp. 223-232, 1997.
[10] J. Kacprzyk and S. Zadrozny, "Implementation of OWA operators in fuzzy querying for Microsoft Access", in The Ordered Weighted Averaging Operators: Theory and Applications (Eds. R.R. Yager and J. Kacprzyk), Kluwer, Boston, pp. 293-306, 1997.
[11] J. Kacprzyk and S. Zadrozny, "Flexible querying using fuzzy logic: An implementation for Microsoft Access", in Flexible Query Answering Systems (Eds. T. Andreasen, H. Christiansen and H.L. Larsen), Kluwer, Boston, pp. 247-275, 1997.
[12] J. Kacprzyk and S. Zadrozny, "Data mining via linguistic summaries of data: an interactive approach", in Methodologies for the Conception, Design and Application of Soft Computing - Proceedings of the 5th IIZUKA'98 (Eds. T. Yamakawa and G. Matsumoto), pp. 668-671, 1998.
[13] J. Kacprzyk and S. Zadrozny, "On summarization of large data sets via a fuzzy-logic-based querying add-on to Microsoft Access", in: Intelligent Information Systems VII, IPI PAN, Warsaw, pp. 249-258, 1998.
[14] J. Kacprzyk and S. Zadrozny, "On interactive linguistic summarization of databases via a fuzzy-logic-based querying add-on to Microsoft Access", in Computational Intelligence - Theory and Applications (Ed. B. Reusch), Springer, Berlin, pp. 462-472, 1999.
[15] J. Kacprzyk, S. Zadrozny and A. Ziolkowski, "FQUERY III+: a 'human-consistent' database querying system based on fuzzy logic with linguistic quantifiers", Information Systems, 6, pp. 443-453, 1989.
[16] J. Kacprzyk and A. Ziolkowski, "Database queries with fuzzy linguistic quantifiers", IEEE Transactions on Systems, Man and Cybernetics, SMC-16, pp. 474-479, 1986.
[17] D. Rasmussen and R.R. Yager, "Fuzzy query language for hypothesis evaluation", in Flexible Query Answering Systems (Eds. T. Andreasen,
H. Christiansen and H.L. Larsen), Kluwer, Boston/Dordrecht/London, pp. 23-43, 1997.
[18] R.R. Yager, "A new approach to the summarization of data", Information Sciences, 28, pp. 69-86, 1982.
[19] R.R. Yager, "On ordered weighted averaging aggregation operators in multicriteria decision making", IEEE Transactions on Systems, Man and Cybernetics, SMC-18, pp. 183-190, 1988.
[20] R.R. Yager, "On linguistic summaries of data", in Knowledge Discovery in Databases (Eds. W. Frawley and G. Piatetsky-Shapiro), AAAI/MIT Press, pp. 347-363, 1991.
[21] R.R. Yager and J. Kacprzyk, Eds., The Ordered Weighted Averaging Operators: Theory and Applications. Kluwer, Boston, 1997.
[22] J. Kacprzyk and R.R. Yager, "Linguistic summarization of databases: a perspective", in Proceedings of IFSA'99 - World Congress of the International Fuzzy Sets Association (Taipei, Taiwan), Vol. 1, pp. 44-48, 1999.
[23] L.A. Zadeh, "A computational approach to fuzzy quantifiers in natural languages", Computers and Mathematics with Applications, 9, pp. 149-184, 1983.
[24] L.A. Zadeh, "Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions", IEEE Transactions on Systems, Man and Cybernetics, SMC-15, pp. 754-763, 1985.
[25] L.A. Zadeh and J. Kacprzyk, Eds., Fuzzy Logic for the Management of Uncertainty, Wiley, New York, 1992.
[26] L.A. Zadeh and J. Kacprzyk, Eds., Computing with Words in Information/Intelligent Systems. Vol. 1: Foundations, Physica-Verlag, Heidelberg and New York, 1999.
[27] L.A. Zadeh and J. Kacprzyk, Eds., Computing with Words in Information/Intelligent Systems. Vol. 2: Applications, Physica-Verlag, Heidelberg and New York, 1999.
About the Authors
Jim F. BALDWIN Department of Engineering Maths, University of Bristol, Bristol, BS8 1TR, UK Phone: +441179287754 Fax: +441179251154 Email: Jim.Baldwin@bris.ac.uk Jim Baldwin is Professor of Artificial Intelligence and Director of the AI Research Group at the University of Bristol. He has been active in fuzzy sets since the early days of the subject, and his research in A.I. covers fundamental aspects of knowledge representation, inference under uncertainty and belief theories, fuzzy control and fuzzy set theory, and machine learning. He is the originator of support logic programming and the mass assignment theory, which was developed during his EPSRC Advanced Fellowship (1990-95). He has published over 250 papers, is a member of the editorial boards of a number of journals, and has served on many conference program committees.
Ping-Tong CHAN Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. Email: eeptchan@ee.polyu.edu.hk P.T. Chan received the BEng, MSc and Ph.D. degrees in electrical engineering from the Hong Kong Polytechnic University (HKPU) in 1994, 1996 and 1999, respectively. During the two years of his MSc studies, he also worked as a graduate trainee in an E&M engineering consultancy firm. He is currently a research associate at HKPU, working on intelligent control, fuzzy systems and genetic algorithms.
Liya DING Institute of Systems Science, National University of Singapore, Singapore Phone: +658742516 Fax: +657782571 Email: liya@iss.nus.edu.sg Liya Ding received the B.E. degree in Computer Engineering from Shanghai University of Technology, Shanghai, China, in 1982, and the Ph.D. degree in Computer Science from Meiji University, Tokyo, Japan, in 1991. Since 1991, she has been with the Institute of Systems Science, National University of Singapore. From 1994 to 1996, she was the project leader of the Neuro-ISS Laboratory under Real World Computing, an international research project supported by the Japanese government. She is a member of IFSA, IEEE, and the Singapore Computer Society. Her current research interests include fuzzy logic, approximate reasoning, and applications of knowledge engineering and soft computing.
Juzhen DONG Department of Information Engineering, Maebashi Institute of Technology, 460 Kamisadori-cho, Maebashi City, 371, Japan Phone & Fax: +81272657366 Juzhen Dong received her Ph.D. from Yamaguchi University, Japan. She is a cooperative researcher in the Department of Information Engineering, Maebashi Institute of Technology, Japan. Her research interests include knowledge discovery and data mining, machine learning, soft computing, and intelligent information systems.
Jairo ESPINOSA Department of Electrical Engineering ESAT-SISTA, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium Phone: +3216393082 Fax: +3216393080 Email: jairo.espinosa@esat.kuleuven.ac.be URL: http://www.esat.kuleuven.ac.be/espinosa Jairo Espinosa received the degree of Engineer in electronic engineering in 1993 from the Universidad Distrital Francisco José de Caldas, Colombia, and the M.Eng. degree in electrical engineering from the Katholieke Universiteit Leuven, Belgium, in 1995. From 1996 until 1999 he was a research assistant at the Katholieke Universiteit Leuven, where he wrote his Ph.D. thesis. He is currently working for ISMC (Intelligent Systems Modeling and Control) N.V. in Belgium, and he combines his work with teaching activities in the postgraduate program on automation of the Corporación Universitaria de Ibagué, Colombia.
Takeshi FURUHASHI Department of Information Electronics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan Phone: +810527892792 Fax: +810527893166 Email: furuhashi@nuee.nagoya-u.ac.jp
Takeshi Furuhashi received his Ph.D. degree from the Department of Electrical & Electronics Engineering, Nagoya University, Japan, in 1985. From 1985 to 1988, he was an Engineer at Toshiba Corporation, Japan. Since 1988, he has been with the School of Engineering, Nagoya University, Japan, where he is currently an associate professor. His main works include: (a) "A Theory on Stability of Fuzzy Control System Using Discrete Event Representation and Numerical Analysis on Transition Between Events", Journal of Japan Society for Fuzzy Theory and Systems, 10, 1, pp.126-134 (1998); (b) "A Creative Design of Fuzzy Logic Controller Using a Genetic Algorithm", Advances in Fuzzy Systems, vol.7,
pp.37-48 (1997); (c) "Selection of Input Variables of Fuzzy Model Using Genetic Algorithm with Quick Fuzzy Inference", Lecture Notes in Artificial Intelligence, vol.1285, pp.45-53 (1997); (d) "On Fuzzy Modeling Using Fuzzy Neural Networks with the Back-Propagation Algorithm", IEEE Trans. on Neural Networks, Vol.3, No.5, pp.801-806 (1992). He is a member of IEEE, NAFIPS, SOFT, and SICE.

Yutaka HATA Department of Computer Engineering, Himeji Institute of Technology, 2167 Shosha, Himeji 671-2201, Japan Phone: +81792674986 Fax: +81792668868 Email: hata@comp.eng.himeji-tech.ac.jp URL: http://wwwj1.comp.eng.himeji-tech.ac.jp Yutaka Hata was born in Hyogo on May 30, 1961. He received the B.E., M.E., and D.E. degrees in Electronics from the Himeji Institute of Technology in 1984, 1986, and 1989, respectively. He is currently an associate professor in the Department of Computer Engineering, Himeji Institute of Technology. He spent one year at the University of California at Berkeley from 1995, and is now a visiting professor at UC Berkeley. His research interests include multiple-valued logic, soft computing and image processing. He is a member of the IEEE, the Japan Society of Medical Electronics and Biological Engineering, the Japan Society for Fuzzy Theory and Systems, and the Biomedical Fuzzy Systems Association.
Kaoru HIROTA Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-9502, Japan Phone: +81459245685 Fax: +81459245676 Email: hirota@hrt.dis.titech.ac.jp URL: http://www.hrt.dis.titech.ac.jp Kaoru Hirota received the B.E., M.E. and Dr.E. degrees in electronics from the Tokyo Institute of Technology, Tokyo, Japan, in 1974, 1976 and 1979, respectively. From 1979 to 1982
he was with the Sagami Institute of Technology, Fujisawa, Japan. From 1982 to 1995 he was with the College of Engineering, Hosei University, Tokyo. Since 1995, he has been with the Interdisciplinary Graduate School of Science and Technology, Tokyo Institute of Technology, Yokohama, Japan. Dr. Hirota is a member of IFSA (Vice President 1991-1993, Treasurer 1997-2001) and SOFT (Vice President, 1995-1997), and he is editor-in-chief of the Int. J. of Advanced Computational Intelligence.

Hisao ISHIBUCHI Department of Industrial Engineering, Osaka Prefecture University, Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan Phone: +81722549350 Fax: +81722549915 Email: hisaoi@ie.osakafu-u.ac.jp URL: http://www.ie.osakafu-u.ac.jp/~hisaoi/ci_lab_e/index.html Hisao Ishibuchi received the B.S. and M.S. degrees in precision mechanics from Kyoto University, Kyoto, Japan, in 1985 and 1987, respectively, and the Ph.D. degree from Osaka Prefecture University, Osaka, Japan, in 1992. Since 1987, he has been with the Department of Industrial Engineering at Osaka Prefecture University, where he has been a Professor since 1999. He was a Visiting Research Associate at the University of Toronto from August 1994 to March 1995 and from July 1997 to March 1998. His research interests include fuzzy rule-based systems, fuzzified neural networks, genetic algorithms, fuzzy scheduling, and evolutionary games.

Janusz KACPRZYK Systems Research Institute, Polish Academy of Sciences, Ul. Newelska 6, 01-447 Warsaw, Poland; University of Applied Information Technology and Management, Ul. Newelska 6, 01-447 Warsaw, Poland Phone: +48 22 836 44 14 Fax: +48 22 837 27 72 Email: kacprzyk@ibspan.waw.pl URL: http://www.ibspan.waw.pl/
Janusz Kacprzyk received his M.S. degree in computer science and automatic control in 1970 from the Warsaw University of Technology in Poland. He received the Ph.D. degree in systems analysis in 1977 and the D.Sc. ("habilitation") degree in 1994, both from the Systems Research Institute, Polish Academy of Sciences in Warsaw, Poland. From 1970 to the present he has been employed at the Systems Research Institute, Polish Academy of Sciences, currently as full professor and Deputy Director for Research. In 1981-83, in the spring of 1986, and in 1988 he was a visiting professor at various American universities. His research interests include the use of soft computing, mainly fuzzy logic, in various areas related to intelligent systems, notably decision making and optimization, control, database querying and information retrieval. Recently, he has been working on the use of the computing-with-words paradigm in various areas related to the above. In 1991-1995 he was a Vice-President of IFSA, and in 1995-1999 he was a member of the IFSA Board. Since 1999 he has been a member of the EUSFLAT Board. Janusz Kacprzyk is the editor-in-chief of two book series published by the Springer-Verlag group (Physica-Verlag, Heidelberg and New York), "Studies in Fuzziness and Soft Computing" and "Advances in Soft Computing", and serves on the editorial boards of more than 10 respected journals.
Naotake KAMIURA Department of Computer Engineering, Himeji Institute of Technology, 2167 Shosha, Himeji 671-2201, Japan Phone: +81792674986 Fax: +81792668868 Email: kamiura@comp.eng.himeji-tech.ac.jp
Naotake Kamiura was born in Hyogo on February 3, 1967. He received the B.E., M.E., and D.E. degrees in Electronics from the Himeji Institute of Technology in 1990, 1992, and 1995, respectively. He is currently a research associate in the Department of Computer Engineering, Himeji Institute of Technology. His current research interests include multiple-valued logic and fault tolerance. He is a member of the IEEE.
Mayuka F. KAWAGUCHI Division of Systems and Information Engineering, Graduate School of Engineering, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Japan Phone: +81117066805 Fax: +81117067830 Email: mayuka@main.eng.hokudai.ac.jp
Mayuka F. Kawaguchi received the B.Eng. degree in electronics engineering in 1985 and the M.Eng. degree in information engineering in 1987 from Hokkaido University, Japan. She received the Ph.D. degree for her studies on fuzzy arithmetic operations relating to triangular norms in 1993 from Hokkaido University, Japan. From 1988 to 1995, she was an Instructor in the Department of Information Engineering at Hokkaido University. Currently, she is an Associate Professor of Systems and Information Engineering at Hokkaido University. Her main research interests involve multiple-valued logic including fuzzy logic, inference systems and numerical analysis. She is a member of the Japan Society for Fuzzy Theory and Systems (SOFT), The Institute of Electronics, Information and Communication Engineers (IEICE), the Information Processing Society of Japan (IPSJ), The Institute of Electrical and Electronics Engineers (IEEE), etc.
Trevor P. MARTIN Department of Engineering Maths, University of Bristol, Bristol, BS8 1TR, UK Phone: +441179287754 Fax: +441179251154 Email: Trevor.Martin@bris.ac.uk Trevor Martin is a Senior Lecturer in the AI Research Group at the University of Bristol. His research interests focus on uncertainty in AI, and he is co-developer of the Fril language and the fuzzy data browser. He has published over 90 papers in refereed journals and conferences, and is a joint organiser of several international workshops on fuzzy logic and fuzzy logic programming.
Masaaki MIYAKOSHI Division of Systems and Information Engineering, Graduate School of Engineering, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Japan Phone: +81117066810 Fax: +81117067830 Email: miyakosi@main.eng.hokudai.ac.jp Masaaki Miyakoshi received the Ph.D. degree in information engineering from Hokkaido University, Sapporo, Japan, in 1985. Currently, he is a Professor in the Graduate School of Engineering at Hokkaido University. His research interests include fuzzy set theory and its applications. He is a member of The Institute of Electronics, Information and Communication Engineers (IEICE) of Japan, the Japan Society for Fuzzy Theory and Systems (SOFT), etc.
Masaharu MIZUMOTO Department of Engineering Informatics, Faculty of Computer Science and Technology, Osaka Electro-Communication University, Neyagawa, Osaka 572-8530, Japan Phone: +81728204569 Fax: +81728240014 Email: mizumoto@mzlab.osakac.ac.jp URL: http://www.osakac.ac.jp/labs/mizumoto Masaharu Mizumoto received the B.Eng., M.Eng., and Dr.Eng. degrees in Electrical Engineering from Osaka University in 1966, 1968 and 1971, respectively. He is a professor in the Division of Information and Computer Sciences, Graduate School of Engineering, Osaka Electro-Communication University. He was Vice President of the International Fuzzy Systems Association (IFSA) in 1989-1991, President of the Biomedical Fuzzy Systems Association (BMFSA) in 1997-1999, and President of the Japan Society for Fuzzy Theory and Systems (SOFT) in 1997-1999. He was Editor-in-Chief of the Journal of SOFT, and is now an advisory editor of the International Journal for Fuzzy Sets and Systems, the International Journal of Fuzzy Mathematics, the Bulletin for Studies and Exchanges on Fuzziness and its Applications, the Journal of
the Biomedical Fuzzy Systems Association, and Biomedical Soft Computing and Human Sciences. His current research interests include fuzzy reasoning and its applications to fuzzy control methods, fuzzy neural networks, and biomedical systems.

Yuichiro MORI Dept. of Information Science, Kochi University, 2-5-1 Akebono-cho, Kochi-shi, Kochi 780-8520, Japan Phone: +81888448340 Fax: +81888448361 Email: ymori@is.kochi-u.ac.jp
Yuichiro Mori received the B.E., M.E. and D.E. degrees in Electrical Engineering from Meiji University in 1990, 1992 and 1995, respectively. He is currently an assistant professor in the Dept. of Mathematics and Information Science, Faculty of Science, Kochi University. His main research interests are in fuzzy logic circuits and systems. He is a member of the Information Processing Society of Japan and the Japan Society for Fuzzy Theory and Systems.
Masao MUKAIDONO Dept. of Computer Science, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki 214-8571, Japan Phone: +81449347450 Fax: +81449347912 Email: masao@cs.meiji.ac.jp Masao Mukaidono received the B.E., M.E., and Ph.D. degrees in Electrical Engineering from Meiji University, Kawasaki, Japan, in 1965, 1967 and 1970, respectively. He is currently a Professor in the Department of Computer Science, School of Science and Technology, Meiji University. His main research interests are in multiple-valued logic, fuzzy logic and its applications, fault-tolerant computing, fail-safe logic, and computer-aided logic design. Dr. Mukaidono was president of the Japan Society of Fuzzy Theory and Systems, and is a member of the IEEE Computer Society, the Information Processing Society of Japan, and the Japanese Society of Artificial Intelligence.
Kado NAKAGAWA Department of Computer Engineering, Himeji Institute of Technology, 2167 Shosha, Himeji 671-2201, Japan Phone: +81792674986 Fax: +81792668868 Email: nakagawa@comp.eng.himeji-tech.ac.jp Mitsubishi Electric Corporation, Automotive Electronics Development Center, 840 Chiyoda-machi, Himeji 670-8677, Japan Phone: +81792988894 Fax: +81792961992 Email: nakagaka@hime.melco.co.jp Kado Nakagawa was born in Osaka on October 21, 1974. He received the B.E. and M.E. degrees from the Faculty of Engineering, Himeji Institute of Technology, in 1997 and 1999, respectively. He is currently pursuing his studies at Mitsubishi Electric Corporation, Automotive Electronics Development Center.

Tomoharu NAKASHIMA Department of Industrial Engineering, Osaka Prefecture University, Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan Phone: +81722549350 Fax: +81722549915 Email: nakashi@ie.osakafu-u.ac.jp URL: http://www.ie.osakafu-u.ac.jp/~hisaoi/ci_lab_e/index.html Tomoharu Nakashima received the B.S. and M.S. degrees from Osaka Prefecture University, Osaka, Japan, in 1995 and 1997, respectively, and the Ph.D. degree from Osaka Prefecture University, Osaka, Japan, in 2000. From June 1998 to September 1998, he was a Postgraduate Scholar with the Knowledge-Based Intelligent Engineering Systems Centre, University of South Australia. His current research interests include fuzzy systems, machine learning, genetic algorithms, reinforcement learning, game theory, multi-agent systems, and image processing.
Manabu NII Department of Industrial Engineering, Osaka Prefecture University, Gakuen-cho 1-1, Sakai, Osaka 599-8531, Japan Phone: +81722549350 Fax: +81722549915 Email: manabu@ie.osakafu-u.ac.jp URL: http://www.ie.osakafu-u.ac.jp/~hisaoi/ci_lab_e/index.html
Manabu Nii received the B. S. and M. S. degrees from Osaka Prefecture University, Osaka, Japan, in 1996 and 1998, respectively. He is currently pursuing the Ph.D. degree at Osaka Prefecture University. His research interests include fuzzy rulebased systems, fuzzified neural networks, and genetic algorithms.
Hiroshi OHNO Department of Human Factors, Toyota Central R&D Labs., Inc., Nagakute, Aichi 480-11, Japan Phone: +810561636579 Fax: +810561635743 Email: oonoh@mosk.tytlabs.co.jp Hiroshi Ohno received his Ph.D. degree from the Department of Information Electronics, Nagoya University, Japan, in 1999. Since 1988, he has been with Toyota Central R&D Labs., Inc., Japan. His main works include "Neural networks control for automatic braking control system", Neural Networks, 7, pp.1303-1312 (1994). He is a member of JSAI, IPSJ, and JCSS.
Setsuo OHSUGA Department of Information Science and Computer Science, Waseda University, Japan Email: ohsuga@ohsuga.info.waseda.ac.jp Setsuo Ohsuga is currently a professor in the Department of Information Science and Computer Science, Waseda University, Japan. He received his Ph.D. from the University of Tokyo, and has been professor and director of the Research Center for Advanced Science and Technology (RCAST) at the University of Tokyo.
Kazuhiko OTSUKA Dept. of Computer Science, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki 214-8571, Japan Phone: +81449347442 Fax: +81449347912 Email: tsuka@cs.meiji.ac.jp He received the B.S. degree in Science and the M.E. degree in Engineering from Meiji University, Japan, in 1994 and 1996, respectively. He is currently a Ph.D. student in the Department of Computer Science, Meiji University. His main research interests are fuzzy logic and its applications, and approximate reasoning in Artificial Intelligence. He is a student member of the Japan Society for Fuzzy Theory and Systems.
Ahmad Besharati RAD Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Email: eeabrad@ee.polyu.edu.hk A.B. Rad received the B.Sc. degree in engineering from Abadan Institute of Technology, Abadan, Iran, the M.Sc. degree in control engineering from the University of Bradford, Bradford, U.K., and the Ph.D. degree in control engineering from the University of Sussex, Brighton, U.K., in 1977, 1986 and 1988, respectively. He is currently an Associate Professor in the Department of Electrical Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong. He has also worked as a control and instrumentation engineer in the oil industry for seven years. His current interests include system identification, adaptive control and intelligent process control.
Jonathan M. ROSSITER Department of Engineering Maths, University of Bristol, Bristol, BS8 1TR, UK Phone: +441179287754 Fax: +441179251154 Email: Jonathan.Rossiter@bris.ac.uk Jonathan Rossiter is a research assistant in the AI Research Group at the University of Bristol. Before this position he was a research student under an EPSRC/CASE award with the UK Defence Evaluation and Research Agency. His research interests include fuzzy knowledge engineering, time series analysis, and object-oriented fuzzy languages.
Yan SHI Department of Information Science, School of Information Science, Kyushu Tokai University, 9-1-1 Toroku, Kumamoto 862-8652, Japan Phone: +81963862666 Fax: +81963817954 Email: shi@ktmail.ktokai-u.ac.jp URL: http://necws1.ktokai-u.ac.jp/~shi/yshi.htm Yan Shi received his Ph.D. degree in Information and Computer Sciences from Osaka Electro-Communication University, Japan, in 1997. He is currently an Associate Professor in the Graduate School of Engineering, as well as the School of Information Science, at Kyushu Tokai University, Japan. He was a Research Assistant from 1982 to 1988, and a lecturer from 1988 to 1991, in the Department of Basic Science at the Northeast Heavy Machinery Institute (the present Yanshan University), China. From 1991 to 1992, he was a Visiting Researcher in the Department of Electronics and Informatics Engineering at Yamagata University, Japan. From 1992 to 1994, he was a Research Associate in the Division of Information and Computer Sciences at Osaka Electro-Communication University, Japan. From 1994 to 1997, he was a Researcher in the Research and Development Division at Mycom Inc., Japan. From 1997 to 2000, he was an Assistant Professor in the Department of Information and System Engineering at Kyushu Tokai University, Japan. His research interests include approximate reasoning, fuzzy system modeling, and neuro-fuzzy learning algorithms for system identification. He is a member of the Japan Society for Fuzzy Theory and Systems (SOFT) and the Biomedical Fuzzy Systems Association (BMFSA).

Yu-Jane TSAI National University of Kaohsiung, Taiwan, No. 416, Lan-Chang Rd., Nan-Tzu District, Kaohsiung, Taiwan, R.O.C. Phone: +88673661533 Fax: +88673661545 Email: yjtsai@plt02.nuk.edu.tw Yu-Jane Tsai received the B.S. degree in Computer Science from Yuan-Ze University, Taiwan, in 1995, and the M.S. degree in Information Engineering from I-Shou University, Taiwan, in 1998. Since 1996, she has been with the laboratory of multimedia, I-Shou
University, Taiwan, where she has been involved in applying multimedia techniques to database systems. She is currently a project assistant at the National University of Kaohsiung, Kaohsiung, Taiwan.
Joos VANDEWALLE Department of Electrical Engineering ESAT-SISTA, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium Phone: +3216321709 Fax: +3216321970 Email: joos.vandewalle@esat.kuleuven.ac.be URL: http://www.esat.kuleuven.ac.be/sista/sista.html Joos Vandewalle obtained the electrical engineering degree and a doctorate in applied science, both from the Katholieke Universiteit Leuven, Belgium, in 1971 and 1976 respectively. From 1976 to 1978 he was a Research Associate, and from July 1978 to July 1979 a Visiting Professor, at the University of California, Berkeley. Since 1979 he has been back at the ESAT Laboratory of the Katholieke Universiteit Leuven, Belgium, where he has been a Full Professor since 1986. He has been an Academic Consultant since 1984 at the VSDM group of IMEC (Interuniversity Microelectronics Center, Leuven). Since 1999 he has been Vice-Dean of the Faculty of Applied Science at the Katholieke Universiteit Leuven.
Pei-Zhuang WANG
West Texas A&M University
AEI, Box 60248, Canyon, TX 79016, U.S.A.
Tel: +1-806-651-2454 Fax: +1-806-651-2733
Email: pzwang@wtamu.edu

Pei-Zhuang Wang has been a Full Professor, Ph.D. supervisor, and the Head of the National Laboratory of Fuzzy Information Processing and Fuzzy Computing, Beijing Normal University, China, since 1983. He has been an Adjunct Professor at West
Texas A&M University since 1996. He was Vice President of the International Fuzzy Systems Association (IFSA) (1991-1993), Vice President of Guangzhou University, China (1985-89), and Senior Researcher at the Institute of Systems Science, National University of Singapore (1989-95). He has been Chairman of the Chinese Chapter of IFSA since 1993, and Honorary Chairman of Aptronix, the fuzzy logic technology company, San Jose and Beijing, since 1994. He has served as chairman or member of the organizing/program committees of many international conferences, and is a member of the editorial boards of 12 Chinese journals, 8 international refereed journals, and many book series.
Shyue-Liang WANG
Information Management, I-Shou University
1, Section 1, Hsueh-Cheng Road, Ta-Hsu Hsiang, Kaohsiung, Taiwan, R.O.C.
Phone: +886-7-656-3711 ext. 6551 Fax: +886-7-656-3734
Email: slwang@isu.edu.tw

Leon S.-L. Wang received the B.S. degree in Applied Mathematics in 1977 from National Chiao Tung University, Taiwan, and the Ph.D. degree in Applied Mathematics in 1984 from the State University of New York at Stony Brook, USA. He is now an Associate Professor in Information Management at I-Shou University, Kaohsiung, Taiwan. From 1984 to 1987, he was an Assistant Professor in mathematics at the University of New Haven, Connecticut. From 1987 to 1994, he was with the New York Institute of Technology as a Research Associate in the Electromagnetic Lab and an assistant/associate professor in the Department of Computer Science. In 1996, he became Director of the Computing Center at I-Shou University, and since 1997 he has been Chairman of Information Management there. Dr. Wang is a member of the IEEE, the Chinese Fuzzy System Association, the Chinese Computer Association, the Chinese Information Management Association, the Chinese Association of Information and Management, and the Kaohsiung Association for Information Development. His current research interests include fuzzy knowledge-based systems, intelligent information systems, and electronic commerce.
Ming-Qiang XU
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology
4259 Nagatsuta-cho, Midori-ku, Yokohama 226-9502, Japan
Phone: +81-45-924-5685 Fax: +81-45-924-5676
Email: xmq@hrt.dis.titech.ac.jp
URL: http://www.hrt.dis.titech.ac.jp

Ming-Qiang Xu received the B.Eng. and M.Eng. degrees in automatic control from Northwestern Polytechnical University, China, and the Dr.Eng. degree from the Tokyo Institute of Technology, Japan, in 1986, 1989, and 1999, respectively. His research focuses on intelligent systems such as intelligent telecommunications.
Hajime YOSHINO
Meiji Gakuin University, Faculty of Law
1-2-37 Shirokanedai, Minato-ku, Tokyo 108-8636, Japan
Email: yoshino@mh.meijigakuin.ac.jp
URL: http://www.meijigakuin.ac.jp/~yoshino/jp/with_tree.htm

Brief Biographical History:
1972-75 Assistant Professor, Faculty of Law, Meiji Gakuin University
1975-82 Associate Professor, Faculty of Law, Meiji Gakuin University
1982- Professor, Faculty of Law, Meiji Gakuin University

Main works:
"Logical Structure of Contract Law System - For Constructing a Knowledge Base of the United Nations Convention on Contracts for the International Sale of Goods -", Journal of Advanced Computational Intelligence, Vol. 2, No. 1, pp. 2-11 (1998).
"On the Logical Foundation of Compound Predicate Formulae for Legal Knowledge Representation", Artificial Intelligence and Law, Vol. 5, Nos. 1-2, pp. 77-96 (1997).
Slawomir ZADROZNY
Systems Research Institute, Polish Academy of Sciences
ul. Newelska 6, 01-447 Warsaw, Poland
University of Applied Information Technology and Management
ul. Newelska 6, 01-447 Warsaw, Poland
Email: kacprzyk@ibspan.waw.pl
Phone: +48-22-836-44-14 Fax: +48-22-837-27-72
URL: http://www.ibspan.waw.pl/

Slawomir Zadrozny received his M.S. degree in computer science in 1981 from the Department of Mathematics, Computer Science and Mechanics, Warsaw University, Poland. In 1994 he received his Ph.D. degree in computer science from the Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland. Since 1981 he has been with the Systems Research Institute, Polish Academy of Sciences, now as an Adjunct Professor. He is also Head of the Centre of Information Technology at the Institute. His current scientific interests include applications of fuzzy logic to decision support, database querying, and data analysis. He is the author or coauthor of more than 60 articles and conference papers, has been involved in the design and implementation of several prototype software packages, and has participated actively in several international scientific projects. He also teaches at the University of Applied Information Technology and Management in Warsaw, Poland, where his interests focus on database management systems theory and applications, notably in the Internet environment.

Ning ZHONG
Department of Information Engineering, Maebashi Institute of Technology
460 Kamisadori-cho, Maebashi City 371, Japan
Phone & Fax: +81-272-65-7366
Email: zhong@maebashi-it.ac.jp

Ning Zhong is currently Director of the Knowledge Information Systems Laboratory and an Associate Professor in the Department of Information Engineering, Maebashi Institute of Technology, Japan. He received his Ph.D. from the University
of Tokyo. His research interests include knowledge discovery and data mining, rough sets and granular-soft computing, intelligent agents and databases, and knowledge and hybrid systems.
Keyword Index
A

abduction 321
abstract factor 190
ad hoc query 329
adaptation 5
adjusting factor 304, 305
AFRELI 15, 16, 17, 19, 26, 32, 33, 35, 36
aggregation 171
aggregation function 149, 151, 153, 154, 156, 159, 161
antecedent validity adaptation 77, 78, 87, 93
approximate reasoning 4, 121
approximation
  extended semantic — 124, 138
  normalized — 124, 137
approximation measure 123, 135, 143
  extended — 139
  generalized — 124, 138
  least — 136
  simplified — 138
assignment operation 169, 175
atomic factor 190
attribute 335

B

background knowledge (see knowledge)
belief networks 4
belief updating 213, 216, 217, 218, 219, 220, 222, 224, 228, 238
bias 302, 321

C

case 3
center vector 274, 278, 279, 280, 283, 284, 285, 287, 291, 293
chaotic systems 4
chaotic time series 26, 33
classification boundary 261, 264
classification of summaries 330
classification system 241, 261, 264, 267
  fuzzy-rule-based — 258, 261, 267
closed world assumption 298
clustering 6, 16, 19, 20, 27, 32, 34, 36, 274, 278, 282, 283, 287, 289, 291
  fuzzy c-means (FCM) — 20, 27, 32, 34, 59, 274
  fuzzy — 59, 65, 66, 67
  K-means — 274
  mountain — 20, 27, 32, 34
  unsupervised — 293
coincidence operator 177
comparison operation 170, 177
comparison operator 177
compatibility modification inference 122
competitive learning 276
compositional inference 122
compound fuzzy attributes 149, 150, 151, 152, 153, 158, 159, 160
connectionist architecture 5
consistency 53
  weak — 53
convex hull method 99

D

data mining 3, 15, 17, 273, 321, 325, 327, 337
deduction 321
defuzzification 79
  center of height — 79
degree of contribution 275, 283, 285, 286, 293
degree of explanation 48
degree of similarity 280, 281
dense neuron 279, 280, 281, 282
direct inference method 180
discernibility function 309
discernibility matrix 309, 313
distinguishability 18, 21, 24, 25
dynamic collection 176

E

enumeration type 168
evaluation 202, 204, 205
evidential logic 222
evidential logic rule 222, 227, 228, 235, 236, 238
evolutionary computation 1, 5
evolutionary programming (EP) 43
evolutionary optimization 45
experience 2
  human — 2
expert knowledge 15, 16
explanation 1

F

feedback rule 220
feedforward neural networks (see neural network)
fitness 5
fixed interrelation law 123
fixed point law 123, 126, 127
fixed semantics law 123
fixed value law 123, 125, 128
FQUERY for Access 332, 335, 338
Fril 213, 216, 219, 222, 231, 233, 234, 235, 238
function approximation 15
function method 183
function type 168
FuZion 17, 18, 21, 25, 26, 32, 34, 36
fuzziness 190, 198
fuzzy abstract factor 192
fuzzy arithmetic 255
fuzzy atomic factor 192
fuzzy c-means (see clustering)
fuzzy control 163
  — system 163, 175
fuzzy database 149, 150, 151, 160
fuzzy factor 191
fuzzy factor hierarchy 190, 193, 199
fuzzy filter 329, 330, 338
fuzzy if-then rule 245, 250, 253, 259, 261, 264, 265, 266, 267
fuzzy inference 163, 175
fuzzy inference engine 166
fuzzy information processing 165
fuzzy interval 96
fuzzy interpolation function 113
fuzzy linguistic quantifier 336
fuzzy logic 1, 4, 6
fuzzy modeling 15, 43, 45, 46, 78
fuzzy neural network (FNN) 43
fuzzy number 255, 256
fuzzy partition 113
fuzzy quantifier 335
fuzzy querying 332, 336
  — engine 330
  — over Internet 336
fuzzy reasoning 87, 249
fuzzy relation 335
fuzzy relational equation 103
fuzzy rule 5, 44, 53, 70, 78, 82, 88, 110
  — extraction 253
  — generation 59, 252
  — selection 250
  — tuning 62
  recurrent — 213, 216, 219
fuzzy rule base
  sparse — 96
fuzzy-rule-based approach 241, 264, 267
fuzzy-rule-based system 243, 266
fuzzy set 165, 213, 259, 321, 337
  convex — 135
  normalized — 135
  support of — 126, 137
  trend — 231, 232, 233, 235, 237, 238
fuzzy set theory 4
fuzzy singleton-type reasoning 59, 60, 61
fuzzy singleton fuzzifier 79
fuzzy spline 109
  — curve 111, 113
fuzzy systems 4, 165, 186
fuzzy system description language (FDL) 164, 186
fuzzy term 334, 337
fuzzy trend feature 213
fuzzy truth value 165, 167, 170, 171
fuzzy type 166
fuzzy value 332, 335, 338
fuzzy variable 46
fuzzy_and 171
fuzzy_not 171
fuzzy_or 171

G

GA-based rule selection 265
GDT-RS 297, 298, 308, 310, 316, 318, 319, 320
generalization distribution table (GDT) 297, 298, 299, 301, 302, 308, 312, 320, 321
genetic algorithms (GA) 4, 6, 250
genetics-based machine learning 252
gradient descent method 62, 64
gradient descent optimization 45
granular computing 321

H

high level features 213, 238
human-consistent summarizer 328
hypothesis generation 301

I

imprecision 1
indirect inference method 180
induction 321
inductive learning 297
inductive method 298
inference 1
infimum 137
initial value 263
interpolation type 169
intelligence 2
intelligent systems 1
interrelation 130
interval arithmetic 255
interval division method 173, 178, 183
iris data 265

J

Jeffrey's rule 233

K

KH-method 99
knowledge 2
  — acquisition 1, 2, 6, 43, 44, 45, 47, 48
  — elicitation 3
  — discovery 3, 7, 278, 283, 321
  — engineering 1, 2, 3
  — integration 7
  — representation 1, 3, 6, 190
  — validation 1, 3
  background — 298, 300, 303, 304, 305, 306, 318, 319, 320
  declarative — 2
  deep — 2
  domain — 2
  expert — 15, 16
  heuristic — 2
  linguistic — 241
  procedural — 2
  surface — 2
knowledge-based inference 6
knowledge-based system 2, 3, 7, 190, 211

L

LR representation 98
learning 5
learning algorithm 263
learning rate 263
learning theory 4
least approximation measure (see approximation measure)
legal reasoning 193
linear revising method (see revising method)
linear rule interpolation 96, 99
linguistic explanation 44
linguistic knowledge 241, 259, 262, 264, 267
linguistic meaning 44, 45, 47, 48, 51, 53
linguistic models 16
linguistic quantifier 329, 330, 333, 337
  fuzzy — 336
linguistic summaries 325, 326
linguistic truth value 122
linguistic value 245, 259
linguistic variable 5
logic operator 177
logical operation 170

M

mass assignment 213, 214, 215, 216, 219, 228
membership degree 169, 175
membership function 4, 44, 47, 52, 53, 62, 64, 165, 167, 169, 172, 337
  Gaussian-type — 46
  trapezoidal — 47
  triangular — 47, 79
modelling 7
  fuzzy — 15, 43, 45, 46, 78
  perception-based — 213
  memory-based — 213
  neuro-fuzzy — 7
  nonlinear — 44
  trend — 213
momentum constant 263
mountain clustering (see clustering)
modification 171
multivalued logic 4

N

neural network 1, 5, 6, 278, 283, 284, 289, 293
  trained — 253, 259, 264
  feedforward — 259, 263, 274
neural-network-based approach 241, 243, 261, 264
neuro computing 4
neuro-fuzzy learning 59, 60, 62, 63, 64, 69
neuro-fuzzy modeling 7
Neyman-Scott's method 275, 287
non-deviation property 122
nonlinear dynamic system 5
nonlinear modeling 44
null query 149, 150, 151, 152, 153, 160
numerical data 253, 259, 262, 263, 264, 267

O

open world assumption 308
optimal interface design 17, 38
optimization 5
ordered datasets 213, 217, 229
OWA operator 327

P

pattern 3
pattern classification 241
perception-based modelling 213
prior distribution 300, 303
probabilistic reasoning 4
probabilistic relationship 299
prune 274, 283, 284

Q

quantity in agreement 326
query 328, 329, 334, 338

R

recurrent fuzzy rule (see fuzzy rule)
recursive least squares (RLS) 23
reduct 310
reference vector 275, 276, 278, 282
relation matrix 122
relation keeping property 123
renewal procedure 276
representative points method 172, 178
retrieval 202, 203, 206
revising function 123
revising method
  linear — 123, 125, 127, 128, 129
  semantic — 123, 130, 132, 134
revision principle 116, 121
rough set 297, 298, 308, 309, 321
rough set theory 320
rule 3
  fuzzy — (see fuzzy rule)
rule base 171
rule discovery 298, 302
rule extraction 5, 6, 15, 254
rule generation 5
rule selection 265, 311, 316
rule strength 306, 320

S

self-organizing map (SOM) 274, 275, 276, 278, 279, 281, 282, 287, 289, 293
semantic approximation 130
semantic discrimination analysis 224, 228
semantic integrity 17, 18, 25
semantic relation 130
semantic revising method (see revising method)
semantic unification 221, 231
sigmoidal activation function 255, 256
similarity matching 276
similarity measure 135, 149, 153, 157, 158, 159, 190, 193, 197, 200, 201, 202, 211
  context-based — 194, 200
  context-sensitive — 190
  distance-based — 190, 200
  factor-based — 190, 206
  feature-based — 190
  integrated — 194
  structural — 190, 205
singleton type 169
soft computing 1, 4, 321
sparse fuzzy rule base (see fuzzy rule base)
stopping condition 263
strength of rule (see rule strength)
summarizer 326
support logic 216
support of fuzzy set (see fuzzy set)
support pairs 223
supremum 137
system notation 169
systematic random search 5

T

table lookup scheme 77, 78, 93
time series 213, 228, 229, 237
trained neural networks 253
training pattern 267
trapezoid type 169
trend fuzzy set 231, 232, 233, 235, 237, 238
triangle type 169
truth qualification 180
  converse — 181
Tversky model 196, 198, 200

U

uncertainty 1
unsupervised clustering (see clustering)
user function method 172
user interface 334

V

vague legal concept 205
value-point law 123, 125, 129
valuable interval 130
vector type 169
voting interpretation 215
voting model 215, 221

W

winning neuron 276
FLSI Soft Computing Series — Volume 5
A New Paradigm of Knowledge Engineering by Soft Computing
Editor: Liya Ding (National University of Singapore)

Soft computing (SC) consists of several computing paradigms, including neural networks, fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as genetic algorithms. The integration of these constituent methodologies forms the core of SC. In addition, the synergy allows SC to incorporate human knowledge effectively, deal with imprecision and uncertainty, and learn to adapt to unknown or changing environments for better performance. Together with other modern technologies, SC and its applications exert unprecedented influence on intelligent systems that mimic human intelligence in thinking, learning, reasoning, and many other aspects.

Knowledge engineering (KE), which deals with knowledge acquisition, representation, validation, inferencing, explanation, and maintenance, has made significant progress recently, owing to the indefatigable efforts of researchers. Undoubtedly, the hot topics of data mining and knowledge/data discovery have injected new life into the classical AI world.

This book tells readers how KE has been influenced and extended by SC, and how SC will be helpful in pushing the frontier of KE further. It is intended for researchers and graduate students to use as a reference in the study of knowledge engineering and intelligent systems. The reader is expected to have a basic knowledge of fuzzy logic, neural networks, genetic algorithms, and knowledge-based systems.
ISBN 9810245173
www.worldscientific.com