Learning with Uncertainty

Xizhao Wang
Junhai Zhai

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
Contents

Preface
Symbols and Abbreviations
1 Uncertainty
  1.1 Randomness
    1.1.1 Entropy
    1.1.2 Joint Entropy and Conditional Entropy
    1.1.3 Mutual Information
  1.2 Fuzziness
    1.2.1 Definition and Representation of Fuzzy Sets
    1.2.2 Basic Operations and Properties of Fuzzy Sets
    1.2.3 Fuzzy Measures
  1.3 Roughness
  1.4 Nonspecificity
  1.5 Relationships among the Uncertainties
    1.5.1 Entropy and Fuzziness
    1.5.2 Fuzziness and Ambiguity
  References
Index
Preface
Learning is an essential way for humans to gain wisdom and for machines to acquire intelligence. It is well acknowledged that learning algorithms will be more flexible and more effective if uncertainty can be modeled and processed during the design and implementation of those algorithms.
Uncertainty is a common phenomenon in machine learning; it can be found at every stage of learning, such as data preprocessing, algorithm design, and model selection. Furthermore, one can see the impact of uncertainty processing on various machine learning techniques; for instance, uncertainty can be used as a heuristic to generate decision trees in inductive learning, it can be employed to measure the significance of samples in active learning, and it can be applied in ensemble learning as a heuristic for selecting base classifiers for integration. This book makes
an initial attempt to systematically discuss the modeling and significance of uncer-
tainty in some processes of learning and tries to bring some new advancements in
learning with uncertainty.
The book contains five chapters. Chapter 1 is an introduction to uncertainty. Four kinds of uncertainty, namely randomness, fuzziness, roughness, and nonspecificity, are briefly introduced, and the relationships among these uncertainties are discussed. Chapter 2 introduces the induction of decision trees with uncertainty. The contents include how to use uncertainty to induce crisp decision trees and fuzzy decision trees. Chapter 2 also discusses how to
use uncertainty to improve the generalization ability of fuzzy decision trees. Clustering in an uncertain environment is discussed in Chapter 3. Specifically, the basic
concepts of clustering are briefly reviewed in Section 3.1. Section 3.2 introduces two
types of clustering, that is, partition-based clustering and hierarchy-based clustering.
Validation functions of clustering are discussed in Section 3.3, and feature-weighted fuzzy clustering is addressed in the subsequent sections. In Chapter 4, we first present an intro-
duction to active learning in Section 4.1. Two kinds of active learning techniques,
that is, uncertainty sampling and query by committee, are discussed in Section 4.2.
Maximum ambiguity–based active learning is presented in Section 4.3. A learning
approach for support vector machines is presented in Section 4.4. Chapter 5 includes
five sections. An introduction to ensemble learning is given in Section 5.1. Bagging and boosting, multiple fuzzy decision trees, and fusion of classifiers based on the upper integral are discussed in Sections 5.2 through 5.4, respectively. The relationship between fuzziness and generalization in ensemble learning is addressed in
Section 5.5.
Taking this opportunity, we express our sincere thanks to those who offered us great help during the writing of this book. We appreciate the discussions and revision
suggestions given by those friends and colleagues, including Professor Shuxia Lu,
Professor Hongjie Xing, Associate Professor Huimin Feng, Dr. Chunru Dong, and
our graduate students, Shixi Zhao, Sheng Xing, Shaoxing Hou, Tianlun Zhang,
Peizhou Zhang, Bo Liu, Liguang Zang, and others. Our thanks also go to the editors who helped us plan and organize this book.
The book can serve as a reference for researchers or as a textbook for senior undergraduates and postgraduates majoring in computer science and technology, applied mathematics, automation, and related fields. It also provides useful research guidelines for scholars who are studying the impact of uncertainty on machine learning and data mining.
Xizhao Wang
Shenzhen University, China
Junhai Zhai
Hebei University, China
Symbols and
Abbreviations
Symbols
A The set of conditional attributes
Aij The jth fuzzy linguistic term of the ith attribute
Bel(·, ·) Belief function
bpa(·, ·) Basic probability assignment function
C The set of decision attributes
(C)∫f dμ Choquet fuzzy integral
D(·‖·) KL-divergence
DP(·) Decision profile
Fα The α cut-set of fuzzy set F
H† Moore–Penrose generalized inverse of matrix H
I Index set
k(·, ·) Kernel function
pij^(l) The relative frequency of Aij with respect to the lth class
POSP^Q The positive region of P with respect to Q
Pθ (y|x) Given x, the posterior probability of y under the model θ
Rd The d-dimensional Euclidean space
R̲(X) Lower approximation of X with respect to R
R̄(X) Upper approximation of X with respect to R
SB Scatter matrix between classes
SW Scatter matrix within class
(S)∫f dμ Sugeno fuzzy integral
U Universe
uij The membership of the ith instance with respect to the jth class
(U)∫f dμ Upper fuzzy integral
V (yi ) The number of votes of yi
xi The ith instance
[x]R Equivalence class of x with respect to R
yi The class label of xi
β Degree of confidence
γP^Q The significance of P with respect to Q
δij^(w) The weighted similarity degree between the ith and jth samples
μA (x) Membership function
Abbreviations
AGNES AGglomerative NESting
BIRCH Balanced iterative reducing and clustering using hierarchies
CF Certainty factor
CNN Condensed nearest neighbors
CURE Clustering Using REpresentatives
DDE Dynamic differential evolution
DE Differential evolution
DES Differential evolution strategy
DIANA DIvisive ANAlysis
DT Decision table
ELM Extreme learning machine
FCM Fuzzy C-means
F-DCT Fuzzy decision tree
FDT Fuzzy decision table
F-ELM Fuzzy extreme learning machine
F-KNN Fuzzy K-nearest neighbors
FLT Fuzzy linguistic term
FRFDT Fuzzy rough set–based decision tree
GD Gradient descent
GD-FWL Gradient descent–based feature weight learning
IBL Instance based learning
K -NN K -nearest neighbors
LS-SVM Least square support vector machine
MABSS Maximum ambiguity based sample selection
MEHDE Hybrid differential evolution with multistrategy cooperating evolution
MFDT Multiple fuzzy decision tree
QBC Query by committee
ROCK RObust Clustering using linKs
SMTC-C Similarity matrix’s transitive closure
SVM Support vector machine
WFPR Weighted fuzzy production rules
Chapter 1
Uncertainty
1.1 Randomness
Randomness is a kind of objective uncertainty regarding random variables, while
entropy is a measure of the uncertainty of random variables [5].
1.1.1 Entropy
Let X be a discrete random variable that takes values randomly from a set X; its probability mass function is p(x) = Pr(X = x), x ∈ X, denoted by X ∼ p(x). The entropy of X is defined as follows [5]:

H(X) = −Σ_{x∈X} p(x) log₂ p(x). (1.1)

We can see from (1.1) that the entropy of X is actually a function of p; the following example illustrates this point explicitly.
[Figure: The entropy H(p) of a two-valued random variable plotted against p ∈ [0, 1]; the curve is 0 at p = 0 and p = 1 and attains its maximum of 1 at p = 0.5.]
Example 1.2 Table 1.1 is a small discrete-valued data set with 14 instances. For attribute X = Outlook, X = {Sunny, Cloudy, Rain}, we can find from Table 1.1 that Pr(X = Sunny) = p₁ = 5/14, Pr(X = Cloudy) = p₂ = 4/14, and Pr(X = Rain) = p₃ = 5/14. According to (1.1), we have

H(Outlook) = −(5/14) log₂(5/14) − (4/14) log₂(4/14) − (5/14) log₂(5/14) = 1.58.

Similarly, we have

H(Temperature) = −(4/14) log₂(4/14) − (6/14) log₂(6/14) − (4/14) log₂(4/14) = 1.56,

H(Humidity) = −(7/14) log₂(7/14) − (7/14) log₂(7/14) = 1.00,

H(Wind) = −(8/14) log₂(8/14) − (6/14) log₂(6/14) = 0.99,

H(PlayTennis) = −(5/14) log₂(5/14) − (9/14) log₂(9/14) = 0.94.

It is worth noting that each probability mentioned earlier is approximated by its frequency, that is, the proportion of a value in all cases.
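As a quick check on the arithmetic, the entropies above can be recomputed directly from value counts. The following Python sketch is ours, not part of the book; the counts are the ones read from Table 1.1 in Example 1.2.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2), Eq. (1.1), with probabilities
    approximated by relative frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Value counts read from Table 1.1 (14 instances).
print(round(entropy([5, 4, 5]), 2))  # Outlook     -> 1.58
print(round(entropy([4, 6, 4]), 2))  # Temperature -> 1.56
print(round(entropy([7, 7]), 2))     # Humidity    -> 1.0
print(round(entropy([8, 6]), 2))     # Wind        -> 0.99
print(round(entropy([9, 5]), 2))     # PlayTennis  -> 0.94
```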
1.1.3 Mutual Information

The mutual information between X and Y can be expressed as

I(X; Y) = H(X) − H(X|Y). (1.5)
We can see from (1.5) that I(X; Y) is the reduction in the uncertainty of X due to the knowledge of Y. By symmetry, it also follows that
I (X ; Y ) = H (Y ) − H (Y |X ). (1.6)
Theorem 1.1 The mutual information and entropy have the following relation-
ships [5]:
I (X ; Y ) = H (X ) − H (X |Y ); (1.7)
I (X ; Y ) = H (Y ) − H (Y |X ); (1.8)
I (X ; Y ) = H (X ) + H (Y ) − H (X , Y ); (1.9)
I (X ; Y ) = I (Y , X ); (1.10)
I (X ; X ) = H (X ). (1.11)
Example 1.3 Given Table 1.1, let X = Outlook and Y = PlayTennis; then the mutual information of X and Y can be computed from (1.6), where the conditional entropy is

H(Y|X) = −Σ_{x∈X} p(x) Σ_{y∈Y} p(y|x) log₂ p(y|x).
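To continue the example in code: once the joint counts of (Outlook, PlayTennis) are known, H(Y|X) and I(X; Y) follow from (1.6). In this sketch (ours), only the marginal counts come from Example 1.2; the within-value yes/no splits are assumed for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed (yes, no) counts of PlayTennis for each Outlook value;
# the row sums 5, 4, 5 match Example 1.2, but the splits are illustrative.
joint = {"Sunny": (2, 3), "Cloudy": (4, 0), "Rain": (3, 2)}
n = sum(sum(row) for row in joint.values())  # 14

h_y = entropy([9 / n, 5 / n])  # H(PlayTennis) = 0.94
h_y_given_x = sum(
    (sum(row) / n) * entropy([c / sum(row) for c in row])
    for row in joint.values()
)
print(round(h_y - h_y_given_x, 3))  # I(X;Y) = H(Y) - H(Y|X) ≈ 0.247
```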
1.2 Fuzziness
Fuzziness is a kind of cognitive uncertainty due to the absence of strict or precise
boundaries of concepts [6–10]. In the real world, there are many vague concepts,
such as young, hot, and high. These concepts can be modeled by fuzzy sets, which generalize classical sets. Recall that a classical set A on a universe U is described by its characteristic function μA(x), which takes the value 1 if x ∈ A and 0 otherwise.
If we extend the values of μA(x) from {0, 1} to the interval [0, 1], then we can define a fuzzy set Ã. In order to distinguish between classical sets and fuzzy sets, we use Ã to represent a fuzzy set. In the following, we present the definition of a fuzzy set.

Definition 1.5 Let U be a universe and Ã be a mapping from U to [0, 1], denoted by μÃ(x). If ∀x ∈ U, μÃ(x) ∈ [0, 1], then Ã is said to be a fuzzy set defined on U; μÃ(x) is called the membership degree of element x belonging to Ã.

The set of all fuzzy sets defined on U is denoted by F(U); ∅ (the empty set) is a fuzzy set whose membership degree for any element is zero.

For a given fuzzy set Ã defined on U, Ã is completely characterized by the set of pairs [11,12]

Ã = {(x, μÃ(x)) | x ∈ U}. (1.13)
The inclusion relation on F(U) has the following properties:

(1) ∅ ⊆ Ã ⊆ U.
(2) Ã ⊆ Ã.
(3) Ã ⊆ B̃ and B̃ ⊆ Ã ⇒ Ã = B̃.
(4) Ã ⊆ B̃ and B̃ ⊆ C̃ ⇒ Ã ⊆ C̃.
The union, intersection, and complement of fuzzy sets are defined pointwise by

(Ã ∪ B̃)(x) = Ã(x) ∨ B̃(x) = max{Ã(x), B̃(x)};
(Ã ∩ B̃)(x) = Ã(x) ∧ B̃(x) = min{Ã(x), B̃(x)};
Ãᶜ(x) = 1 − Ã(x).
These operations have the following properties:

(1) Ã ∪ Ã = Ã, Ã ∩ Ã = Ã.
(2) Ã ∪ B̃ = B̃ ∪ Ã, Ã ∩ B̃ = B̃ ∩ Ã.
(3) Ã ∪ (B̃ ∪ C̃) = (Ã ∪ B̃) ∪ C̃, Ã ∩ (B̃ ∩ C̃) = (Ã ∩ B̃) ∩ C̃.
(4) Ã ∪ (B̃ ∩ C̃) = (Ã ∪ B̃) ∩ (Ã ∪ C̃), Ã ∩ (B̃ ∪ C̃) = (Ã ∩ B̃) ∪ (Ã ∩ C̃).
(5) Ã ∪ (Ã ∩ B̃) = Ã, Ã ∩ (Ã ∪ B̃) = Ã.
(6) (Ãᶜ)ᶜ = Ã.
(7) Ã ∪ ∅ = Ã, Ã ∩ ∅ = ∅, Ã ∪ U = U, Ã ∩ U = Ã.
(8) (Ã ∪ B̃)ᶜ = Ãᶜ ∩ B̃ᶜ, (Ã ∩ B̃)ᶜ = Ãᶜ ∪ B̃ᶜ.
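On a finite universe, the pointwise operations and the properties above are easy to machine-check. The Python sketch below is ours; the fuzzy sets and their membership values are invented for illustration.

```python
def f_union(a, b):
    """(A ∪ B)(x) = max{A(x), B(x)}."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}

def f_intersection(a, b):
    """(A ∩ B)(x) = min{A(x), B(x)}."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}

def f_complement(a):
    """A^c(x) = 1 - A(x)."""
    return {x: 1.0 - m for x, m in a.items()}

# Invented fuzzy sets over a universe of ages.
young = {20: 1.0, 30: 0.7, 40: 0.3, 50: 0.0}
old   = {20: 0.0, 30: 0.2, 40: 0.6, 50: 1.0}

# Property (8), De Morgan: (A ∪ B)^c = A^c ∩ B^c.
assert f_complement(f_union(young, old)) == \
       f_intersection(f_complement(young), f_complement(old))

# Unlike classical sets, A ∪ A^c need not equal the whole universe U:
print(f_union(young, f_complement(young)))  # e.g. membership 0.7 at age 30
```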
A fuzzy entropy is a mapping H : F(U) → [0, 1] that satisfies the following properties:

(1) ∀Ã ∈ F(U), H(Ã) = 0 if and only if Ã is a classical set.
(2) ∀Ã ∈ F(U), H(Ã) = 1 if and only if ∀x ∈ U, μÃ(x) = 0.5.
(3) ∀Ã, B̃ ∈ F(U), if μÃ(x) ≤ μB̃(x) when μÃ(x) ≥ 1/2, and μÃ(x) ≥ μB̃(x) when μÃ(x) ≤ 1/2, then H(Ã) ≥ H(B̃).
(4) ∀Ã ∈ F(U), H(Ã) = H(Ãᶜ).
Many fuzzy entropies have been defined by different researchers. A short survey of
fuzzy entropy can be found in [15] or in [16]. In the following, we introduce three
fuzzy entropies commonly used in practice.
Definition 1.9 (Zadeh fuzzy entropy [2]) Let Ã be a fuzzy set defined on U, U = {x₁, x₂, ..., xₙ}, P = {p₁, p₂, ..., pₙ} is a probability distribution on U, and μÃ(xᵢ) is the membership function of Ã. The fuzzy entropy of Ã is defined as follows:

H(Ã) = −Σ_{i=1}^{n} μÃ(xᵢ) pᵢ log₂ pᵢ. (1.16)
Definition 1.10 (Luca–Termini fuzzy entropy [14]) Let Ã be a fuzzy set defined on U, U = {x₁, x₂, ..., xₙ}, and μÃ(xᵢ) is the membership function of Ã. The fuzzy entropy of Ã is defined as follows:

H(Ã) = −(1/n) Σ_{i=1}^{n} [μÃ(xᵢ) log₂ μÃ(xᵢ) + (1 − μÃ(xᵢ)) log₂(1 − μÃ(xᵢ))]. (1.17)
Definition 1.11 (Kosko fuzzy entropy [17]) Let Ã be a fuzzy set defined on U, U = {x₁, x₂, ..., xₙ}, and μÃ(xᵢ) is the membership function of Ã. The fuzzy entropy of Ã is defined as follows:

H(Ã) = (Σ_{i=1}^{n} μÃ(xᵢ) ∧ μÃᶜ(xᵢ)) / (Σ_{i=1}^{n} μÃ(xᵢ) ∨ μÃᶜ(xᵢ)). (1.18)
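To make the three definitions concrete, here is a small Python sketch (ours); the membership values and the probability distribution are invented.

```python
import math

def zadeh_entropy(mu, p):
    """Eq. (1.16): H(A) = -sum_i mu_A(x_i) p_i log2 p_i."""
    return -sum(m * q * math.log2(q) for m, q in zip(mu, p) if q > 0)

def luca_termini_entropy(mu):
    """Eq. (1.17): averaged binary entropy of the memberships."""
    def s(m):
        return 0.0 if m in (0.0, 1.0) else \
            -(m * math.log2(m) + (1 - m) * math.log2(1 - m))
    return sum(s(m) for m in mu) / len(mu)

def kosko_entropy(mu):
    """Eq. (1.18): sum of min(mu, 1-mu) over sum of max(mu, 1-mu)."""
    return sum(min(m, 1 - m) for m in mu) / sum(max(m, 1 - m) for m in mu)

mu = [0.2, 0.5, 0.9, 1.0]                  # invented memberships
p = [0.25, 0.25, 0.25, 0.25]               # invented probabilities
print(round(zadeh_entropy(mu, p), 3))      # 1.3
print(round(luca_termini_entropy(mu), 3))  # 0.548
print(round(kosko_entropy(mu), 3))         # 0.25
print(kosko_entropy([0.5, 0.5]))           # 1.0: maximal when all mu = 0.5
print(kosko_entropy([0.0, 1.0]))           # 0.0: zero for a crisp set
```

The last two calls check axioms (1) and (2) above: a crisp set has zero fuzzy entropy, and memberships of 0.5 everywhere give the maximum.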
1.3 Roughness
Roughness was originally proposed by Pawlak [3] to represent the inadequacy of knowledge. To introduce roughness, we first review some fundamental concepts of rough sets.
Rough set theory [3] was developed on the foundation of the indiscernibility relation, which is actually an equivalence relation. In the framework of classification, the objects processed by rough set theory are represented as decision tables.
A binary relation R on U is an equivalence relation if it satisfies the following conditions:

(1) ∀x ∈ U, xRx (reflexivity).
(2) ∀x, y ∈ U, if xRy, then yRx (symmetry).
(3) ∀x, y, z ∈ U, if xRy and yRz, then xRz (transitivity).

The equivalence class of x with respect to R is denoted by [x]_R. For a subset X ⊆ U, the lower and upper approximations of X with respect to R are

R̲(X) = {x ∈ U | [x]_R ⊆ X} and R̄(X) = {x ∈ U | [x]_R ∩ X ≠ ∅},

and X is said to be rough with respect to R when R̲(X) ≠ R̄(X).
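A minimal Python sketch of these two approximations (ours; the toy decision table, its attribute, and the target set are hypothetical):

```python
from collections import defaultdict

def approximations(universe, key, target):
    """Pawlak approximations under the equivalence relation
    'same value of key(x)': lower = {x | [x]_R ⊆ X},
    upper = {x | [x]_R ∩ X ≠ ∅}."""
    classes = defaultdict(set)          # the equivalence classes [x]_R
    for x in universe:
        classes[key(x)].add(x)
    lower = {x for x in universe if classes[key(x)] <= target}
    upper = {x for x in universe if classes[key(x)] & target}
    return lower, upper

# Hypothetical toy table: six objects described by one condition attribute.
U = {1, 2, 3, 4, 5, 6}
color = {1: "r", 2: "r", 3: "g", 4: "g", 5: "b", 6: "b"}
X = {1, 2, 3}                           # a target concept
lower, upper = approximations(U, color.get, X)
print(lower, upper)                     # {1, 2} and {1, 2, 3, 4}
```

Here X is rough: the approximations differ, and objects 3 and 4 form the boundary region.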
1.4 Nonspecificity
Nonspecificity [4], also called ambiguity, is another type of cognitive uncertainty. It results from having to choose one from two or more unclear alternatives. For example, suppose an interesting film and a long-awaited concert are held at the same time. In this situation, it is hard for those who love both film and music to decide which one to attend. This uncertainty is associated with the possibility distribution proposed by Zadeh [18].
Let X be a classical set and let P(X) denote the power set of X. A possibility measure is a function π [19]:

π : P(X) → [0, 1]. (1.24)
The nonspecificity (ambiguity) of a possibility distribution π = (π₁, π₂, ..., πₙ) is measured by

g(π) = Σ_{i=1}^{n} (π*_i − π*_{i+1}) ln i, (1.25)

where π* = (π*_1, π*_2, ..., π*_n) is the permutation of π sorted so that π*_i ≥ π*_{i+1}, and π*_{n+1} = 0.
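A direct implementation of (1.25) in Python (our sketch; it assumes the descending-sort convention just stated):

```python
import math

def nonspecificity(pi):
    """Eq. (1.25): g(pi) = sum_i (pi*_i - pi*_{i+1}) ln i,
    with pi* sorted in descending order and pi*_{n+1} = 0."""
    s = sorted(pi, reverse=True) + [0.0]
    return sum((s[i - 1] - s[i]) * math.log(i) for i in range(1, len(pi) + 1))

print(nonspecificity([1.0, 0.0, 0.0]))            # 0.0: fully specific
print(round(nonspecificity([1.0, 1.0, 1.0]), 4))  # 1.0986 = ln 3: maximal
print(round(nonspecificity([1.0, 0.6, 0.2]), 3))  # 0.497: in between
```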
1.5 Relationships among the Uncertainties

1.5.1 Entropy and Fuzziness

Consider a probability distribution (p₁, p₂, ..., pₙ) in which the tail mass c = Σ_{i=3}^{n} pᵢ is held fixed, so that p₂ = 1 − c − p₁. Viewed as a function of p₁, the entropy of E₀ is

Entropy(E₀) = −p₁ ln p₁ − p₂ ln p₂ = −p₁ ln p₁ − (1 − c − p₁) ln(1 − c − p₁). (1.26)

Setting

(d/dp₁) Entropy(E₀) = 0 (1.30)

and

(d/dp₁) Fuzziness(E₀) = 0, (1.31)
we can obtain that both Entropy(E₀) and Fuzziness(E₀), as functions of the variable p₁, attain their maximum at p₁ = (1 − c)/2, monotonically increase when p₁ < (1 − c)/2, and monotonically decrease when p₁ > (1 − c)/2. This indicates that the entropy and the fuzziness of a probability distribution have the same monotonic characteristics and the same extreme-value points. Furthermore, noting that Σ_{i=3}^{n} pᵢ = c and the symmetry of the variables p₁, p₂, ..., pₙ, we conclude that the entropy and fuzziness of an n-dimensional probability distribution attain their maximum at

p₁ = p₂ = ··· = pₙ = 1/n. (1.32)
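For completeness, the stationary point can be verified directly; this one-line derivation is ours and assumes the form of (1.26):

```latex
\frac{d}{dp_1}\,\mathrm{Entropy}(E_0)
  = -\ln p_1 - 1 + \ln(1 - c - p_1) + 1
  = \ln\frac{1 - c - p_1}{p_1},
```

which vanishes exactly when p₁ = (1 − c)/2, is positive to the left of that point, and is negative to the right, giving the monotonic behavior stated above.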
[Figure: The entropy of a three-dimensional probability distribution plotted over (p₁, p₂), shown in panels (a) and (b); the marked maximum (0.33, 0.33, 1.0986) corresponds to the uniform distribution, with ln 3 ≈ 1.0986.]
1.5.2 Fuzziness and Ambiguity

[Figure: Panel (a) plots Fuzziness and panel (b) plots Ambiguity over the same domain.]
References
1. R. V. Hogg, A. T. Craig. Introduction to Mathematical Statistics. New York: Pearson Education, 2004.
2. L. A. Zadeh. Fuzzy sets. Information and Control, 1965, 8:338–353.
3. Z. Pawlak. Rough sets. International Journal of Computer and Information Sciences, 1982, 11:341–356.
4. Y. Yuan, M. J. Shaw. Induction of fuzzy decision trees. Fuzzy Sets and Systems, 1995, 69(2):125–139.
5. T. M. Cover, J. A. Thomas. Elements of Information Theory, 2nd edn. Hoboken, NJ: John Wiley & Sons, 2006.
Chapter 2

Decision Tree with Uncertainty

Supervised learning, which can be categorized into classification and regression problems, is the central part of the machine learning field. Over the past several decades, designing effective learning systems and improving their performance have been the most important tasks in supervised learning. How to model and process the uncertainty existing in the learning process has become a key challenge in learning from data, and uncertainty processing has a very significant impact on the entire knowledge-extraction process.
Many theories and methodologies have been developed to model and process different kinds of uncertainty. Focusing on decision tree induction, this chapter discusses the impact of uncertainty modeling and processing on the learning process. We introduce the target concept learned by decision tree algorithms; some popular decision tree learning algorithms, including the widely used ID3 and its fuzzy version; the induction of fuzzy decision trees based on rough-fuzzy techniques; and, more importantly, the relationship between the generalization ability of fuzzy decision trees and the uncertainty in the learning process.