Professional Documents
Culture Documents
net/publication/2522150
CITATIONS READS
61 72
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Jingtao Yao on 17 June 2014.
Abstract the problem, but also brings more insights into the solution
of the problem.
Within a granular computing model of data mining, we
reformulate the consistent classification problems. The
granulation structures are partitions of a universe. A so- 2 Formal Concept Analysis, Granular Com-
lution to a consistent classification problem is a definable puting, and Data Mining
partition. Such a solution can be obtained by searching a
particular partition lattice. The new formulation enables
A close connection between formal concept analysis,
us to precisely and concisely define many notions, and to
granular computing, and data mining can be established
present a more general framework for classification.
by focusing on their two fundamental and related tasks,
namely, concept formation and concept relationship iden-
tification [8].
1 Introduction In the study of formal concepts, every concept is under-
stood as a unit of thoughts that consists of two parts, the
As a recently renewed research topic, granular com- intension and extension of the concept [1, 6]. The inten-
puting (GrC) is an umbrella term to cover any theories, sion (comprehension) of a concept consists of all properties
methodologies, techniques, and tools that make use of gran- or attributes that are valid for all those objects to which the
ules (i.e., subsets of a universe) in problem solving [7, 9, concept applies. The extension of a concept is the set of
10]. Formal concept analysis may be considered as a con- objects or entities which are instances of the concept. All
crete model of granular computing. It deals with the char- objects in the extension have the same properties that char-
acterization of a concept by a unit of thoughts consisting of acterize the concept. In other words, the intension of a con-
two parts, the intension and extension of the concept [1, 6]. cept is an abstract description of common features or prop-
The intension of a concept consists of all properties or at- erties shared by elements in the extension, and the extension
tributes that are valid for all those objects to which the con- consists of concrete examples of the concept. A concept is
cept applies. The extension of a concept is the set of objects thus described jointly by its intension and extension.
or entities which are instances of the concept. Basic ingredients of granular computing are subsets,
Recently, a granular computing model for knowledge classes, and clusters of a universe [7, 10]. There are many
discovery and data mining is proposed by combining results fundamental issues in granular computing, such as granu-
from formal concept analysis and granular computing [8]. lation of the universe, description of granules, relationships
Each granule is viewed as the extension of a certain con- between granules, and computing with granules. Granula-
cept and a description of the granule is an intension of the tion of a universe involves the decomposition of the uni-
concept. Knowledge discovery and data mining, especially verse into parts, or the grouping of individual elements into
rule mining, can be viewed as a process of forming con- classes, based on available information and knowledge. The
cepts and finding relationships between concepts, in terms construction of granules may be viewed as a process of
of their intensions and extensions. identifying extensions of concepts. Similarly, finding a de-
The objective of this paper is to apply the granular com- scription of a granule may be viewed as searching for inten-
puting model for the study of the consistent classification sion of a concept. Demri and Orlowska referred to such pro-
problems with respect to partitions of a universe. We re- cesses as the learning of extensions of concepts and learning
express, precisely and concisely, many notions based on of intensions of concepts, respectively [1]. The relation-
partition lattice. Our reformulation of the consistent clas- ships between concepts, such as sub-concepts, disjoint and
sification problems not only provides a formal treatment of overlap concepts, and partial sub-concepts, can be inferred
from intensions and extensions, Definition 2 A partition π1 is refinement of another parti-
It may be argued that some of the main tasks of knowl- tion π2 , or equivalently, π2 is a coarsening of π1 , denoted
edge discovery and data mining are concept formation and by π1 ¹ π2 , if every block of π1 is contained in some block
concept relationship identification [2, 8, 9]. The results of π2 .
from formal concept analysis and granular computing can
be immediately applied. The refinement relation is a partial ordering of the set of
In a recently proposed granular computing model of data all partitions. Given two partitions π1 and π2 , their meet,
mining, it is assumed that information about a set of objects π1 ∧ π2 , is the largest partition that is a refinement of both
is given by an information table [8]. That is, an object is π1 and π2 , their join, π1 ∨ π2 , is the smallest partition that
represented by a set of attribute-value pairs. A logic lan- is a coarsening of both π1 and π2 . An equivalence classes
guage is defined for the information table. The semantics of the meet are all nonempty intersections of an equivalence
of the language is given in the Tarski’s style through the class from π1 and an equivalence class from π2 , equivalence
notions of a model and satisfiability. The model is an in- classes of the join are the smallest subsets which are exactly
formation table. An object satisfies a formula if the object a union of equivalence classes from π1 and π2 . Under these
has the properties as specified by the formula. Thus, the operations, the poset is a lattice called the partition lattice,
intension of a concept given by a formula of the language, denoted by Π(U ).
and extension is given the set of objects satisfying the for-
mula. This formulation enables us to study formal concepts 3.2 Information tables
in a logic setting in terms of intensions and also in a set-
theoretic setting in terms of extensions. An information table provides a convenient way to de-
In the following sections, we will demonstrate the appli- scribe a finite set of objects called the universe by a finite
cation of the granular computing model for the study of a set of attributes [4, 9].
specific data mining problem known as the consistent clas-
sification problems. Definition 3 An information table is the following tuple:
There is a one-to-one correspondence between partitions of The semantics of the language L can be defined in the
U and equivalence relations (i.e., reflexive, symmetric, and Tarski’s style through the notions of a model and satisfia-
transitive relations) on U . Each equivalence class of the bility. The model is an information table S, which provides
equivalence relation is a block of the corresponding parti- interpretation for symbols and formulas of L.
tion. In this paper, we will use partitions and equivalence
relations, and blocks and equivalence classes interchange- Definition 5 The satisfiability of a formula φ by an object
ably. x, written x |=S φ or in short x |= φ if S is understood, is
defined by the following conditions: Object height hair eyes class
o1 short blond blue +
(1) x |= a = v iff Ia (x) = v, o2 short blond brown -
(2) x |= ¬φ iff not x |= φ, o3 tall red blue +
(3) x |= φ ∧ ψ iff x |= φ and x |= ψ, o4 tall dark blue -
o5 tall dark blue -
(4) x |= φ ∨ ψ iff x |= φ or x |= ψ. o6 tall blond blue +
o7 tall dark brown -
If φ is a formula, the set mS (φ) defined by:
o8 short blond brown -
mS (φ) = {x ∈ U | x |= φ}, (1)
Table 1: An information table
is called the meaning of the formula φ in S. If S is under-
stood, we simply write m(φ).
it may be impossible to find a concept (φ, m(φ)) such that
The meaning of a formula φ is therefore the set of all m(φ) = X. For example, in Table 1, o4 and o5 have the
objects having the property expressed by the formula φ. In same description. The subset {o4 , o5 } must be considered
other words, φ can be viewed as the description of the set a unit. It is impossible to find a formula whose meaning
of objects m(φ). Thus, a connection between formulas of is {o4 , o6 , o7 }. In the case where we can precisely de-
L and subsets of U is established. scribe a subset of objects X, the description may not be
With the introduction of language L, we have a formal unique. That is, there may exist two formulas such that
description of concepts. A concept definable in an informa- m(φ) = m(ψ) = X. For example, the two formulas:
tion table is a pair (φ, m(φ)), where φ ∈ L. More specifi-
cally, φ is a description of m(φ) in S, the intension of con- class = +,
cept (φ, m(φ)), and m(φ) is the set of objects satisfying φ, hair = red ∨ (hair = blond ∧ eyes = blue),
the extension of concept (φ, m(φ)).
To illustrate the idea developed so far, consider an infor- have the same meaning set {o1 , o3 , o6 }. Those observations
mation table given by Table 1, which is adopted from Quin- lead us to consider only certain families of partitions from
lan [5]. The following expressions are some of the formulas Π(U ).
of the language L:
Definition 6 A subset X ⊆ U is called a definable granule
height = tall, in an information table S if there exists at least one formula
hair = dark, φ such that m(φ) = X.
height = tall ∧ hair = dark,
Definition 7 A partition π is called a definable partition
height = tall ∨ hair = dark. in an information table S if every equivalence class is a
The meanings of the formulas are given by: definable granule.
4.2 ID3 type search algorithms partition lattices are introduced. Depending on the proper-
ties of classification rules, a solution to a consistent classifi-
ID3 type learning algorithms can be formulated as a cation problem is a definable partition in one of the lattices.
heuristic search of the semi-lattice ΠCD(C) (U ). The heuris- Such a solution can be obtained by searching that lattice.
tic used is based on an information-theoretic measure of de- Our formulation is similar to the well established version
pendency between the partition defined by class and an- space search method for machine learning [3].
other conjunctively definable partition with respect to the The new formulation enables us to precisely and con-
set of attributes C. Roughly speaking, the measure quanti- cisely define many notions, and to present a more general
fies the degree to which a partition π ∈ ΠCD(C) (U ) satisfies framework for classification. To illustrate its the potential
the condition π ¹ πclass of a solution partition. usefulness and generality, we briefly describe the ID3 and
Specifically, the direction of ID3 search is from coarsest rough set learning algorithms using the proposed model.
partitions of ΠCD(C) (U ) to more refined partitions. Largest
partitions in ΠCD(C) (U ) are the partitions defined by single
attributes in C. Using the information-theoretic measure, References
ID3 first selects a partition defined by a single attribute.
If an equivalence class in the partition is not a conjunc- [1] Demri, S. and Orlowska, E. Logical analysis of indiscerni-
tively definable granule with respect to class, the equiva- bility, in: Incomplete Information: Rough Set Analysis, Or-
lowska, E. (Ed.), Physica-Verlag, Heidelberg, pp. 347-380,
lence class is further divided into smaller granules by us-
1998.
ing an additional attribute. The same information-theoretic
measure is used for the selection of the new attribute. The [2] Fayyad, U.M. and Piatetsky-Shapiro, G. (Eds.) Advances in
smaller granules are conjunctively definable granules with Knowledge Discovery and Data Mining, AAAI Press, 1996.
respect to C. The search process continues until a partition
[3] Mitchell, T.M. Generalization as search, Artificial Intelli-
π ∈ ΠCD(C) (U ) is obtained such that π ¹ πclass .
gence, 18, 203-226, 1982.
4.3 Rough set type search algorithms [4] Pawlak, Z. Rough Sets, Theoretical Aspects of Reasoning
about Data, Kluwer Academic Publishers, Dordrecht, 1991.
Algorithms for finding a reduct in the theory of rough [5] Quinlan, J.R. Learning efficient classification procedures
sets can be viewed as heuristic search of the partition lattice and their application to chess end-games, in: Machine
ΠAD(C) (U ). Two directions of search can be carried, either Learning: An Artificial Intelligence Approach, Vol. 1,
from coarsening partitions to refinement partitions or from Michalski, J.S., Carbonell, J.G., and Mirchell, T.M. (Eds.),
refinement partitions to coarsening partitions. Morgan Kaufmann, Palo Alto, CA, pp. 463-482, 1983.
The smallest partition in ΠAD(C) (U ) is πC . By dropping [6] Wille, R. Concept lattices and conceptual knowledge sys-
an attribute a from C, one obtains a coarsening partition tems, Computers Mathematics with Applications, 23, 493-
πC−{a} . Typically, a certain fitness measure is used for the 515, 1992.
selection of the attribute. The process continues until no
further attributes can be dropped. That is, we find a subset [7] Yao, Y.Y. Granular computing: basic issues and possible so-
lutions, Proceedings of the 5th Joint Conference on Infor-
A ⊆ C such that πA ¹ πclass and ¬(πB ¹ πclass ) for all
mation Sciences, pp.186-189, 2000.
proper subsets B ⊂ A. The resulting set of attributes A is a
reduct. [8] Yao, Y.Y. On modeling data mining with granular comput-
The largest partition in ΠAD(C) (U ) is π∅ . By adding an ing, Proceedings of COMPSAC 2001, pp.638-643, 2001.
attribute a, one obtains a refined partition πa . The process
[9] Yao, Y.Y. and Zhong, N. Potential applications of granular
continues until we have a partition satisfying the condition
computing in knowledge discovery and data mining, Pro-
πA ¹ πclass . The resulting set of attributes A is a reduct. ceedings of World Multiconference on Systemics, Cybernet-
ics and Informatics, pp.573-580, 1999.