Learning with Uncertainty

Xizhao Wang
Junhai Zhai
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

Printed on acid-free paper


Version Date: 20161021

International Standard Book Number-13: 978-1-4987-2412-8 (Hardback)

Library of Congress Cataloging-in-Publication Data

Names: Wang, Xizhao, author. | Zhai, Junhai, author.


Title: Learning with uncertainty / Xizhao Wang, Junhai Zhai.
Description: Boca Raton : CRC Press, [2016] | Includes bibliographical
references and index.
Identifiers: LCCN 2016030316| ISBN 9781498724128 (acid-free paper) | ISBN
9781498724135 (e-book)
Subjects: LCSH: Machine learning. | Fuzzy decision making. | Decision trees.
Classification: LCC Q325.5 .W36 2016 | DDC 006.3/1--dc23
LC record available at https://lccn.loc.gov/2016030316

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
Contents

Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Symbols and Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

1 Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Randomness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Entropy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Joint Entropy and Conditional Entropy. . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Mutual Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Fuzziness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Definition and Representation of Fuzzy Sets. . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Basic Operations and Properties of Fuzzy Sets. . . . . . . . . . . . . . . . . . . . 7
1.2.3 Fuzzy Measures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Roughness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Nonspecificity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Relationships among the Uncertainties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.1 Entropy and Fuzziness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.2 Fuzziness and Ambiguity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Decision Tree with Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


2.1 Crisp Decision Tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 ID3 Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Continuous-Valued Attributes Decision Trees. . . . . . . . . . . . . . . . . . 22
2.2 Fuzzy Decision Tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Fuzzy Decision Tree Based on Fuzzy Rough Set Techniques. . . . . . . . . . . . 37
2.3.1 Fuzzy Rough Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.2 Generating Fuzzy Decision Tree with Fuzzy Rough Set
Technique. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4 Improving Generalization of Fuzzy Decision Tree by Maximizing
Fuzzy Entropy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.1 Basic Idea of Refinement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.2 Globally Weighted Fuzzy If-Then Rule Reasoning. . . . . . . . . . . . . . 44
2.4.3 Refinement Approach to Updating the Parameters. . . . . . . . . . . . . 49
2.4.3.1 Maximum Fuzzy Entropy Principle. . . . . . . . . . . . . . . . . . . 50
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3 Clustering under Uncertainty Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


3.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Clustering Algorithms Based on Hierarchy or Partition. . . . . . . . . . . . . . . . 60
3.2.1 Clustering Algorithms Based on Hierarchy. . . . . . . . . . . . . . . . . . . . . . 60
3.2.2 Clustering Algorithms Based on Partition. . . . . . . . . . . . . . . . . . . . . . . 66
3.3 Validation Functions of Clustering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4 Feature Weighted Fuzzy Clustering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5 Weighted Fuzzy Clustering Based on Differential Evolution. . . . . . . . . . . 73
3.5.1 Differential Evolution and Dynamic Differential Evolution. . . . 73
3.5.1.1 Basic Differential Evolution Algorithm. . . . . . . . . . . . . . . . 73
3.5.1.2 Dynamic Differential Evolution Algorithm. . . . . . . . . . . 77
3.5.2 Hybrid Differential Evolution Algorithm Based on
Coevolution with Multi-Differential Evolution Strategy. . . . . . . . 78
3.6 Feature Weight Fuzzy Clustering Learning Model Based
on MEHDE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.6.1 MEHDE-Based Feature Weight Learning:
MEHDE-FWL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.6.2 Experimental Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.6.2.1 Comparison between MEHDE-FWL and
GD-FWL Based on FCM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6.2.2 Comparisons Based on SMTC Clustering. . . . . . . . . . . . 88
3.6.2.3 Efficiency Analysis of GD-, DE-, DDE-, and
MEHDE-Based Searching Techniques. . . . . . . . . . . . . . . . 90
3.7 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4 Active Learning with Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99


4.1 Introduction to Active Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2 Uncertainty Sampling and Query-by-Committee Sampling. . . . . . . . . . 102
4.2.1 Uncertainty Sampling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.2.1.1 Least Confident Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.2.1.2 Minimal Margin Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2.1.3 Maximal Entropy Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2.2 Query-by-Committee Sampling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.3 Maximum Ambiguity–Based Active Learning. . . . . . . . . . . . . . . . . . . . . . . . . 105
4.3.1 Some Concepts of Fuzzy Decision Tree. . . . . . . . . . . . . . . . . . . . . . . . 106
4.3.2 Analysis on Samples with Maximal Ambiguity. . . . . . . . . . . . . . . . 107
4.3.3 Maximum Ambiguity–Based Sample Selection. . . . . . . . . . . . . . . . 109
4.3.4 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.4 Active Learning Approach to Support Vector Machine. . . . . . . . . . . . . . . . 120
4.4.1 Support Vector Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.4.2 SVM Active Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.4.3 Semisupervised SVM Batch Mode Active Learning. . . . . . . . . . . . 124
4.4.4 IALPSVM: An Informative Active Learning Approach
to SVM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.4.5 Experimental Results and Discussions. . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.5.1 Experiments on an Artificial Data Set by
Selecting a Single Query Each Time. . . . . . . . . . . . . . . . . 128
4.4.5.2 Experiments on Three UCI Data Sets by Selecting
a Single Query Each Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.4.5.3 Experiments on Two Image Data Sets by Selecting
a Batch of Queries Each Time. . . . . . . . . . . . . . . . . . . . . . . . 136
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

5 Ensemble Learning with Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149


5.1 Introduction to Ensemble Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
5.1.1 Majority Voting and Weighted Majority Voting. . . . . . . . . . . . . . . 150
5.1.2 Approach Based on Dempster–Shafer Theory
of Evidence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.1.3 Fuzzy Integral Ensemble Approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.2 Bagging and Boosting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.2.1 Bagging Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.2.2 Boosting Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.3 Multiple Fuzzy Decision Tree Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.3.1 Induction of Multiple Fuzzy Decision Tree. . . . . . . . . . . . . . . . . . . . 155
5.3.2 Experiment on Real Data Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.4 Fusion of Classifiers Based on Upper Integral. . . . . . . . . . . . . . . . . . . . . . . . . 170
5.4.1 Extreme Learning Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.4.2 Multiple Classifier Fusion Based on Upper Integrals. . . . . . . . . . 172
5.4.2.1 Upper Integral and Its Properties. . . . . . . . . . . . . . . . . . . . . 173
5.4.2.2 A Model of Classifier Fusion Based on Upper
Integral. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.4.2.3 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.5 Relationship between Fuzziness and Generalization in Ensemble
Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.5.1 Classification Boundary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.5.1.1 Boundary and Its Estimation Given
by a Learned Classifier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.5.1.2 Two Types of Methods for Training a Classifier. . . . . . 188
5.5.1.3 Side Effect of Boundary and Experimental
Verification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
5.5.2 Fuzziness of Classifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
5.5.2.1 Fuzziness of Classifier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.5.2.2 Relationship between Fuzziness and
Misclassification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.5.2.3 Relationship between Fuzziness and
Classification Boundary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
5.5.2.4 Divide and Conquer Strategy. . . . . . . . . . . . . . . . . . . . . . . . 198
5.5.2.5 Impact of the Weighting Exponent m on the
Fuzziness of Fuzzy K -NN Classifier. . . . . . . . . . . . . . . . . . 198
5.5.3 Relationship between Generalization and Fuzziness. . . . . . . . . . . 199
5.5.3.1 Definition of Generalization and Its
Elaboration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
5.5.3.2 Classifier Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.5.3.3 Explanation Based on Extreme (max/min)
Fuzziness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.5.3.4 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .221
Preface

Learning is an essential way for humans to gain wisdom and furthermore for
machines to acquire intelligence. It is well acknowledged that learning algorithms
will be more flexible and more effective if uncertainty can be modeled and
processed while the learning algorithms are being designed and implemented.
Uncertainty is a common phenomenon in machine learning, which can be found
in every stage of learning, such as data preprocessing, algorithm design, and model
selection. Furthermore, one can find the impact of uncertainty processing on various
machine learning techniques; for instance, uncertainty can be used as a heuristic to
generate decision trees in inductive learning, it can be employed to measure the
significance of samples in active learning, and it can also be applied in ensemble
learning as a heuristic to select base classifiers for integration. This book makes
an initial attempt to systematically discuss the modeling and significance of uncer-
tainty in some processes of learning and tries to bring some new advancements in
learning with uncertainty.
The book contains five chapters. Chapter 1 is an introduction to uncertainty.
Four kinds of uncertainty, that is, randomness, fuzziness, roughness, and nonspeci-
ficity, are briefly introduced in this chapter. Furthermore, the relationships among
these uncertainties are also discussed in this chapter. Chapter 2 introduces the induc-
tion of decision tree with uncertainty. The contents include how to use uncertainty
to induce crisp decision trees and fuzzy decision trees. Chapter 2 also discusses how to
use uncertainty to improve the generalization ability of fuzzy decision trees. Cluster-
ing under uncertainty environment is discussed in Chapter 3. Specifically, the basic
concepts of clustering are briefly reviewed in Section 3.1. Section 3.2 introduces two
types of clustering, that is, partition-based clustering and hierarchy-based clustering.
Validation functions of clustering are discussed in Section 3.3 and feature-weighted
fuzzy clustering is addressed in the subsequent sections. In Chapter 4, we first present an intro-
duction to active learning in Section 4.1. Two kinds of active learning techniques,
that is, uncertainty sampling and query by committee, are discussed in Section 4.2.
Maximum ambiguity–based active learning is presented in Section 4.3. A learning
approach for support vector machine is presented in Section 4.4. Chapter 5 includes
five sections. An introduction to ensemble learning is reviewed in Section 5.1.
Bagging and boosting, multiple fuzzy decision trees, and fusion of classifiers based
on upper integral are discussed in the next sections, respectively. The relation-
ship between fuzziness and generalization in ensemble learning is addressed in
Section 5.5.
Taking this opportunity, we deliver our sincere thanks to those who offered us
great help during the writing of this book. We appreciate the discussions and revision
suggestions given by those friends and colleagues, including Professor Shuxia Lu,
Professor Hongjie Xing, Associate Professor Huimin Feng, Dr. Chunru Dong, and
our graduate students, Shixi Zhao, Sheng Xing, Shaoxing Hou, Tianlun Zhang,
Peizhou Zhang, Bo Liu, Liguang Zang, etc. Our thanks also go to the editors who
helped us plan and organize this book.
The book can serve as a reference book for researchers or a textbook for senior
undergraduates and postgraduates majoring in computer science and technology,
applied mathematics, automation, etc. This book also provides some useful guide-
lines of research for scholars who are studying the impact of uncertainty on machine
learning and data mining.

Xizhao Wang
Shenzhen University, China
Junhai Zhai
Hebei University, China
Symbols and
Abbreviations

Symbols
A    The set of conditional attributes
Aij    The jth fuzzy linguistic term of the ith attribute
Bel(·, ·)    Belief function
bpa(·, ·)    Basic probability assignment function
C    The set of decision attributes
(C)∫ f dμ    Choquet fuzzy integral
D(· ‖ ·)    KL-divergence
DP(·)    Decision profile
Fα    The α cut-set of fuzzy set F
H†    Moore–Penrose generalized inverse of matrix H
I    Index set
k(·, ·)    Kernel function
pij^(l)    The relative frequency of Aij with respect to the lth class
POSP^Q    The positive region of P with respect to Q
Pθ(y|x)    Given x, the posterior probability of y under the model θ
Rd    d-dimensional Euclidean space
R̲(X)    Lower approximation of X with respect to R
R̄(X)    Upper approximation of X with respect to R
SB    Scatter matrix between classes
SW    Scatter matrix within class
(S)∫ f dμ    Sugeno fuzzy integral
U    Universe
uij    The membership of the ith instance with respect to the jth class
(U)∫ f dμ    Upper fuzzy integral
V(yi)    The number of votes of yi
xi    The ith instance
[x]R    Equivalence class of x with respect to R
yi    The class label of xi
β    Degree of confidence
γP^Q    The significance of P with respect to Q
δij^(w)    The weighted similarity degree between the ith and jth samples
μA(x)    Membership function

Abbreviations
AGNES AGglomerative NESting
BIRCH Balanced iterative reducing and clustering using hierarchies
CF Certainty factor
CNN Condensed nearest neighbors
CURE Clustering Using REpresentatives
DDE Dynamic differential evolution
DE Differential evolution
DES Differential evolution strategy
DIANA DIvisive ANAlysis
DT Decision table
ELM Extreme learning machine
FCM Fuzzy C-means
F-DCT Fuzzy decision tree
FDT Fuzzy decision table
F-ELM Fuzzy extreme learning machine
F-KNN Fuzzy K-nearest neighbors
FLT Fuzzy linguistic term
FRFDT Fuzzy rough set–based decision tree
GD Gradient descent
GD-FWL Gradient descent–based feature weight learning
IBL Instance based learning
K -NN K -nearest neighbors
LS-SVM Least square support vector machine
MABSS Maximum ambiguity based sample selection
MEHDE Hybrid differential evolution with multistrategy cooperating evolution
MFDT Multiple fuzzy decision tree
QBC Query by committee
ROCK RObust Clustering using linKs
SMTC-C Similarity matrix’s transitive closure
SVM Support vector machine
WFPR Weighted fuzzy production rules
Chapter 1

Uncertainty

Uncertainty is a common phenomenon in machine learning, which can be found


in every phase of learning, such as data preprocessing, algorithm design, and model
selection. The representation, measurement, and handling of uncertainty have a
significant impact on the performance of a learning system. There are four common
uncertainties in machine learning, that is, randomness [1], fuzziness [2], roughness
[3], and nonspecificity [4]. In this chapter, we mainly introduce the first three kinds
of uncertainty, briefly list the fourth uncertainty, and give a short discussion about
the relationships among the four uncertainties.

1.1 Randomness
Randomness is a kind of objective uncertainty regarding random variables, while
entropy is a measure of the uncertainty of random variables [5].

1.1.1 Entropy
Let X be a discrete random variable that takes values in a set X; its probability
mass function is p(x) = Pr(X = x), x ∈ X, denoted by X ∼ p(x). The
definition of entropy of X is given as follows [5].

Definition 1.1 The entropy of X is defined by



H(X) = −∑_{x∈X} p(x) log2 p(x).    (1.1)

We can find from (1.1) that the entropy of X is actually a function of p; the following
example can explicitly illustrate this point.
Figure 1.1 The relationship between entropy H(p) and p.

Example 1.1 Let X = {0, 1}, and Pr(X = 1) = p, Pr(X = 0) = 1 − p.


According to Equation (1.1), the entropy of X is

H (X ) = −p × log2 p − (1 − p) × log2 (1 − p). (1.2)

Obviously, H (X ) is a function of p. For convenience, we denote H (X ) as H (p). The


graph of the function H (p) is shown in Figure 1.1. We can see from Figure 1.1 that
H(p) takes its maximum at p = 1/2.

Example 1.2 Table 1.1 is a small discrete-valued data set with 14 instances.
For attribute X = Outlook, X = {Sunny, Cloudy, Rain}, we can find from
Table 1.1 that Pr(X = Sunny) = p1 = 5/14, Pr(X = Cloudy) = p2 = 4/14, and
Pr(X = Rain) = p3 = 5/14. According to (1.1), we have

H(Outlook) = −(5/14) log2(5/14) − (4/14) log2(4/14) − (5/14) log2(5/14) = 1.58.

Similarly, we have

H(Temperature) = −(4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) = 1.56.
H(Humidity) = −(7/14) log2(7/14) − (7/14) log2(7/14) = 1.00.

Table 1.1 A Small Data Set with 14 Instances


x Outlook Temperature Humidity Wind y (PlayTennis)

x1 Sunny Hot High Weak No

x2 Sunny Hot High Strong No

x3 Cloudy Hot High Weak Yes

x4 Rain Mild High Weak Yes

x5 Rain Cool Normal Weak Yes

x6 Rain Cool Normal Strong No

x7 Cloudy Cool Normal Strong Yes

x8 Sunny Mild High Weak No

x9 Sunny Cool Normal Weak Yes

x10 Rain Mild Normal Weak Yes

x11 Sunny Mild Normal Strong Yes

x12 Cloudy Mild High Strong Yes

x13 Cloudy Hot Normal Weak Yes

x14 Rain Mild High Strong No

H(Wind) = −(8/14) log2(8/14) − (6/14) log2(6/14) = 0.99.
H(PlayTennis) = −(5/14) log2(5/14) − (9/14) log2(9/14) = 0.94.
It is worth noting that the probability mentioned earlier is approximated by its
frequency, that is, the proportion of a value in all cases.
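The calculation in Example 1.2 is easy to reproduce programmatically. The following Python sketch is our own illustration (the function and variable names are not from the book); it estimates the entropy of an attribute from the value frequencies in Table 1.1.

```python
from collections import Counter
from math import log2

# The Outlook and PlayTennis columns of Table 1.1.
outlook = ["Sunny", "Sunny", "Cloudy", "Rain", "Rain", "Rain", "Cloudy",
           "Sunny", "Sunny", "Rain", "Sunny", "Cloudy", "Cloudy", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def entropy(values):
    """H(X) = -sum_x p(x) log2 p(x), with p(x) estimated by relative frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

print(round(entropy(outlook), 2))  # 1.58, matching H(Outlook)
print(round(entropy(play), 2))     # 0.94, matching H(PlayTennis)
```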

1.1.2 Joint Entropy and Conditional Entropy


Given two random variables X and Y , suppose that (X , Y ) ∼ p(x, y). The joint
entropy of X and Y can be defined as follows.

Definition 1.2 The joint entropy of (X , Y ) is defined as



H(X, Y) = −∑_{x∈X} ∑_{y∈Y} p(x, y) log2 p(x, y).    (1.3)

Given a random variable X , we can define the conditional entropy H (Y |X ) of a


random variable Y .

Definition 1.3 Suppose (X , Y ) ∼ p(x, y), H (Y |X ) is defined as



H(Y|X) = ∑_{x∈X} p(x) H(Y|X = x)
       = −∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log2 p(y|x)
       = −∑_{x∈X} ∑_{y∈Y} p(x, y) log2 p(y|x).    (1.4)

It is necessary to note that generally H(Y|X) ≠ H(X|Y), but H(X) − H(X|Y) = H(Y) − H(Y|X).

1.1.3 Mutual Information


Given two random variables X and Y , the mutual information of X and Y , denoted
by I (X ; Y ), is a measure of relevance of X and Y . We now give the definition of
mutual information.

Definition 1.4 The mutual information of X and Y is defined as

I (X ; Y ) = H (X ) − H (X |Y ). (1.5)

We can find from (1.5) that I(X; Y) is the reduction in the uncertainty of X due to
the knowledge of Y. By symmetry, it also follows that

I (X ; Y ) = H (Y ) − H (Y |X ). (1.6)

Theorem 1.1 The mutual information and entropy have the following relation-
ships [5]:

I (X ; Y ) = H (X ) − H (X |Y ); (1.7)
I (X ; Y ) = H (Y ) − H (Y |X ); (1.8)
I (X ; Y ) = H (X ) + H (Y ) − H (X , Y ); (1.9)
I (X ; Y ) = I (Y , X ); (1.10)
I (X ; X ) = H (X ). (1.11)

Example 1.3 Given Table 1.1, let X = Outlook and Y = PlayTennis. To obtain the mutual
information of X and Y, we first compute the conditional entropy:

H(Y|X) = −∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log2 p(y|x)
       = −Pr(X = Sunny)[Pr(Y = Yes|X = Sunny) log2 Pr(Y = Yes|X = Sunny)
         + Pr(Y = No|X = Sunny) log2 Pr(Y = No|X = Sunny)]
         − Pr(X = Cloudy)[Pr(Y = Yes|X = Cloudy) log2 Pr(Y = Yes|X = Cloudy)
         + Pr(Y = No|X = Cloudy) log2 Pr(Y = No|X = Cloudy)]
         − Pr(X = Rain)[Pr(Y = Yes|X = Rain) log2 Pr(Y = Yes|X = Rain)
         + Pr(Y = No|X = Rain) log2 Pr(Y = No|X = Rain)]
       = −(5/14)[(2/5) log2(2/5) + (3/5) log2(3/5)]
         − (4/14)[(4/4) log2(4/4) + (0/4) log2(0/4)]
         − (5/14)[(3/5) log2(3/5) + (2/5) log2(2/5)]
       = 0.347 + 0 + 0.347 ≈ 0.69.

Hence, I(X; Y) = H(Y) − H(Y|X) = 0.94 − 0.69 = 0.25.

Similarly, we can calculate the mutual information I(Temperature; PlayTennis),
I(Humidity; PlayTennis), and I(Wind; PlayTennis), which are 0.03, 0.15, and 0.05,
respectively. Because I(Outlook; PlayTennis) is maximal, we can conclude that Outlook
is the most significant attribute with respect to PlayTennis.
For the case of continuous random variable, the corresponding concepts can be
found in Chapter 8 of Reference 5.
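As a companion to Example 1.3, the following sketch (again our own illustration, with hypothetical function names) estimates the conditional entropy H(Y|X) and the mutual information I(X; Y) from two columns of Table 1.1, with probabilities approximated by frequencies as noted above.

```python
from collections import Counter
from math import log2

outlook = ["Sunny", "Sunny", "Cloudy", "Rain", "Rain", "Rain", "Cloudy",
           "Sunny", "Sunny", "Rain", "Sunny", "Cloudy", "Cloudy", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(Y|X) = sum over values v of X of p(v) * H(Y | X = v)."""
    n = len(x)
    return sum((cx / n) * entropy([yv for xv, yv in zip(x, y) if xv == v])
               for v, cx in Counter(x).items())

def mutual_information(x, y):
    """I(X; Y) = H(Y) - H(Y|X)."""
    return entropy(y) - conditional_entropy(x, y)

print(round(conditional_entropy(outlook, play), 2))  # 0.69
print(round(mutual_information(outlook, play), 2))   # 0.25
```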

1.2 Fuzziness
Fuzziness is a kind of cognitive uncertainty due to the absence of strict or precise
boundaries of concepts [6–10]. In the real world, there are many vague concepts,
such as young, hot, and high. These concepts can be modeled by fuzzy sets that can

be represented by their membership functions [2,11]. In this section, regarding a


fuzzy set, we briefly review the definition, properties, operations, and measures of
fuzziness. The details can be found in [2,11].

1.2.1 Definition and Representation of Fuzzy Sets


In classical set theory, the characteristic function of a set assigns a value of either 1
or 0 to each element in the set. In other words, for every element, it either belongs
to the set or does not belong to the set. Let U be a set of objects, called the universe,
and A ⊆ U ; the characteristic function of A is defined as follows:

μA(x) = { 1, if x ∈ A;  0, if x ∉ A }.    (1.12)

If we extend the values of μA(x) from {0, 1} to interval [0, 1], then we can define a
fuzzy set A. In order to distinguish between classical sets and fuzzy sets, we use Ã to
represent a fuzzy set. In the following, we present the definition of fuzzy set.

Definition 1.5 Let U be a universe and Ã be a mapping from U to [0, 1], denoted
by μ_Ã(x). If ∀x ∈ U, μ_Ã(x) ∈ [0, 1], then Ã is said to be a fuzzy set defined on U;
μ_Ã(x) is called the membership degree of element x belonging to Ã.

The set of all fuzzy sets defined on U is denoted by F(U ); ∅ (the empty set) is
a fuzzy set whose membership degree for any element is zero.
For a given fuzzy set Ã defined on U, Ã is completely characterized by the set of
pairs [11,12]

Ã = {(x, μ_Ã(x)), x ∈ U}.    (1.13)

If U is a finite set {x1, x2, ..., xn}, then fuzzy set Ã can be conveniently represented by

Ã = μ_Ã(x1)/x1 + μ_Ã(x2)/x2 + ··· + μ_Ã(xn)/xn.    (1.14)

When U is not finite, we can write (1.14) as

Ã = ∫_U μ_Ã(x)/x.    (1.15)

1.2.2 Basic Operations and Properties of Fuzzy Sets


Definition 1.6 Let Ã, B̃ ∈ F(U). If ∀x ∈ U, we have Ã(x) ≤ B̃(x), then we say
that B̃ includes Ã; in other words, Ã is included in B̃, denoted by Ã ⊆ B̃ or B̃ ⊇ Ã.

Definition 1.7 Let Ã, B̃ ∈ F(U). If ∀x ∈ U, we have Ã(x) = B̃(x), then we say
that B̃ = Ã. If Ã is included in B̃ but Ã ≠ B̃, then we say that B̃ truly includes Ã;
in other words, Ã is truly included in B̃, denoted by Ã ⊂ B̃ or B̃ ⊃ Ã.

Theorem 1.2 Let Ã, B̃, C̃ ∈ F(U). Then the following properties hold:

(1) ∅ ⊆ Ã ⊆ U.
(2) Ã ⊆ Ã.
(3) Ã ⊆ B̃, B̃ ⊆ Ã ⇒ Ã = B̃.
(4) Ã ⊆ B̃, B̃ ⊆ C̃ ⇒ Ã ⊆ C̃.

Definition 1.8 Let Ã, B̃ ∈ F(U). We define the union Ã ∪ B̃, the intersection
Ã ∩ B̃, and the complement Ãᶜ as follows, respectively:

(Ã ∪ B̃)(x) = Ã(x) ∨ B̃(x) = max{Ã(x), B̃(x)};
(Ã ∩ B̃)(x) = Ã(x) ∧ B̃(x) = min{Ã(x), B̃(x)};
Ãᶜ(x) = 1 − Ã(x).

Theorem 1.3 Let Ã, B̃, C̃ ∈ F(U). Then the following laws hold:

(1) Ã ∪ Ã = Ã, Ã ∩ Ã = Ã.
(2) Ã ∪ B̃ = B̃ ∪ Ã, Ã ∩ B̃ = B̃ ∩ Ã.
(3) Ã ∪ (B̃ ∪ C̃) = (Ã ∪ B̃) ∪ C̃, Ã ∩ (B̃ ∩ C̃) = (Ã ∩ B̃) ∩ C̃.
(4) Ã ∪ (B̃ ∩ C̃) = (Ã ∪ B̃) ∩ (Ã ∪ C̃), Ã ∩ (B̃ ∪ C̃) = (Ã ∩ B̃) ∪ (Ã ∩ C̃).
(5) Ã ∪ (Ã ∩ B̃) = Ã, Ã ∩ (Ã ∪ B̃) = Ã.
(6) (Ãᶜ)ᶜ = Ã.
(7) Ã ∪ ∅ = Ã, Ã ∩ ∅ = ∅, Ã ∪ U = U, Ã ∩ U = Ã.
(8) (Ã ∪ B̃)ᶜ = Ãᶜ ∩ B̃ᶜ, (Ã ∩ B̃)ᶜ = Ãᶜ ∪ B̃ᶜ.

The proof can be found in [13].
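The operations of Definition 1.8 translate directly into code. The sketch below is an illustration of ours, with made-up membership values; it represents finite fuzzy sets as dictionaries from elements to membership degrees and checks one of the De Morgan laws of Theorem 1.3.

```python
# Finite fuzzy sets on the same universe, represented as dicts x -> membership degree.
A = {"x1": 0.25, "x2": 0.75, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.5, "x3": 0.875}

def f_union(a, b):
    """(A ∪ B)(x) = max{A(x), B(x)}."""
    return {x: max(a[x], b[x]) for x in a}

def f_intersection(a, b):
    """(A ∩ B)(x) = min{A(x), B(x)}."""
    return {x: min(a[x], b[x]) for x in a}

def f_complement(a):
    """A^c(x) = 1 - A(x)."""
    return {x: 1.0 - a[x] for x in a}

# One of the De Morgan laws in Theorem 1.3, property (8): (A ∪ B)^c = A^c ∩ B^c.
assert f_complement(f_union(A, B)) == f_intersection(f_complement(A), f_complement(B))
```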



1.2.3 Fuzzy Measures


The fuzziness of a fuzzy set can be measured with fuzzy entropy. Luca and Termini
pointed out that fuzzy entropy must satisfy the following properties [14].

Theorem 1.4 Let H(Ã) be a fuzzy entropy defined on fuzzy set Ã. H(Ã) must satisfy
the following properties:

(1) ∀Ã ∈ F(U), H(Ã) = 0 if and only if Ã is a classical set.
(2) ∀Ã ∈ F(U), H(Ã) = 1 if and only if ∀x ∈ U, μ_Ã(x) = 0.5.
(3) ∀Ã, B̃ ∈ F(U), if μ_Ã(x) ≤ μ_B̃(x) when μ_Ã(x) ≥ 1/2 and μ_Ã(x) ≥ μ_B̃(x)
    when μ_Ã(x) ≤ 1/2, then H(Ã) ≥ H(B̃).
(4) ∀Ã ∈ F(U), H(Ã) = H(Ãᶜ).

Many fuzzy entropies have been defined by different researchers. A short survey of
fuzzy entropy can be found in [15] or in [16]. In the following, we introduce three
fuzzy entropies commonly used in practice.

Definition 1.9 (Zadeh fuzzy entropy [2]) Let Ã be a fuzzy set defined on U.
P = {p1, p2, ..., pn} is a probability distribution of U, U = {x1, x2, ..., xn}, μ_Ã(xi)
is the membership function of Ã, and the fuzzy entropy of Ã is defined as follows:

H(Ã) = −∑_{i=1}^{n} μ_Ã(xi) pi log2 pi.    (1.16)

Definition 1.10 (Luca–Termini fuzzy entropy [14]) Let Ã be a fuzzy set defined
on U. U = {x1, x2, ..., xn}, μ_Ã(xi) is the membership function of Ã, and the fuzzy
entropy of Ã is defined as follows:

H(Ã) = −(1/n) ∑_{i=1}^{n} [μ_Ã(xi) log2 μ_Ã(xi) + (1 − μ_Ã(xi)) log2(1 − μ_Ã(xi))].    (1.17)

Definition 1.11 (Kosko fuzzy entropy [17]) Let Ã be a fuzzy set defined on U.
U = {x1, x2, ..., xn}, μ_Ã(xi) is the membership function of Ã, and the fuzzy entropy
of Ã is defined as follows:

H(Ã) = ∑_{i=1}^{n} (μ_Ã(xi) ∧ μ_Ãᶜ(xi)) / ∑_{i=1}^{n} (μ_Ã(xi) ∨ μ_Ãᶜ(xi)).    (1.18)
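The fuzzy entropies above differ only in how they aggregate membership degrees. The following minimal sketch of ours implements the Luca–Termini measure (1.17) and the Kosko measure (1.18); the membership vector is a made-up example.

```python
from math import log2

def luca_termini(mu):
    """Eq. (1.17): -(1/n) * sum of [m log2 m + (1 - m) log2 (1 - m)], with 0 log2 0 = 0."""
    def term(m):
        if m in (0.0, 1.0):
            return 0.0
        return m * log2(m) + (1.0 - m) * log2(1.0 - m)
    return -sum(term(m) for m in mu) / len(mu)

def kosko(mu):
    """Eq. (1.18): sum of min(m, 1 - m) divided by sum of max(m, 1 - m)."""
    return sum(min(m, 1.0 - m) for m in mu) / sum(max(m, 1.0 - m) for m in mu)

mu = [0.1, 0.5, 0.9, 1.0]   # hypothetical membership degrees
print(luca_termini(mu))     # the 0.5 entry contributes the most fuzziness
print(kosko(mu))
```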

1.3 Roughness
Roughness was originally proposed by Pawlak [3] to represent the inadequacy
of knowledge. To introduce roughness, we first review some fundamental concepts
of rough sets.
Rough set theory [3] was developed on the foundation of the indiscernibility relation,
which is actually an equivalence relation. In the framework of classification, the
objects processed by rough set theory are represented as decision tables.

Definition 1.12 A decision table DT is a four tuple DT = (U , A∪C, V , f ), where


U = {x1 , x2 , . . . , xn } called universe is a set of objects; A = {a1 , a2 , . . . , ad } is a set of
conditional attributes, each attribute taking discrete values; C is a class label variable,
taking values in {1, 2, . . . , k}; V is the Cartesian product of conditional attributes;
and f is an information function, f : U × A → V .

Given a discrete-valued decision table DT and a binary relation R on U , we have


the following basic definitions.

Definition 1.13 R is called an equivalence relation if it satisfies the following


conditions:

(1) ∀x ∈ U , xRx.
(2) ∀x, y ∈ U , if xRy, then yRx.
(3) ∀x, y, z ∈ U , if xRy and yRz, then xRz.

Definition 1.14 ∀x ∈ U . The equivalence class of x is defined as follows:

[x]R = {y | (y ∈ U) ∧ (yRx)}.    (1.19)

Definition 1.15 ∀X ⊆ U . The lower approximation and the upper approximation


of X with respect to R are defined as

R̲(X) = {x | (x ∈ U) ∧ ([x]R ⊆ X)}    (1.20)

and

R̄(X) = {x | (x ∈ U) ∧ ([x]R ∩ X ≠ ∅)}.    (1.21)


Definition 1.16 ∀X ⊆ U. The approximate accuracy and the roughness of R are
defined as

αR(X) = |R̲(X)| / |R̄(X)|    (1.22)

and

βR(X) = 1 − αR(X) = |R̄(X) − R̲(X)| / |R̄(X)|.    (1.23)
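To make Definitions 1.15 and 1.16 concrete, the sketch below computes the lower approximation, the upper approximation, the approximate accuracy, and the roughness on a toy universe; the partition induced by R and the target set X are invented for illustration.

```python
# Toy decision-table universe: the partition induced by an equivalence relation R
# and a target concept X are both invented for illustration.
partition = [{"x1", "x2"}, {"x3"}, {"x4", "x5", "x6"}]
X = {"x1", "x2", "x3", "x4"}

lower = set().union(*[blk for blk in partition if blk <= X])   # lower approximation, Eq. (1.20)
upper = set().union(*[blk for blk in partition if blk & X])    # upper approximation, Eq. (1.21)

accuracy = len(lower) / len(upper)   # Eq. (1.22)
roughness = 1.0 - accuracy           # Eq. (1.23)

print(sorted(lower))         # ['x1', 'x2', 'x3']
print(sorted(upper))         # ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
print(accuracy, roughness)   # 0.5 0.5
```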

1.4 Nonspecificity
Nonspecificity [4] is also called ambiguity, which is another type of cognitive uncer-
tainty. It results from choosing one from two or more unclear objects. For example,
an interesting film and an eagerly awaited concert are held at the same time. In this sit-
uation, it is hard for those who love both film and music to decide which one should
be attended. This uncertainty is associated with a possibility distribution that was
proposed by Zadeh [18].
Let X be a classical set and let P(X) denote the power set of X. A possibility
measure is a function π [19]:
π : P (X ) → [0, 1] (1.24)

that has the following properties [19]:


(1) π(∅) = 0.
(2) A ⊆ B =⇒ π(A) ≤ π(B).
 
(3) A1 ⊆ A2 ⊆ · · · =⇒ π(⋃_{i∈I} Ai) = sup_{i∈I} π(Ai),

where I is an index set.


Any possibility measure can be uniquely determined by a possibility distribution
function f : X → [0, 1] via the formula π(A) = sup_{x∈A} f(x), where A ⊂ X.

Definition 1.17 The possibility distribution π is called a normalized possibility


distribution if and only if max_{x∈X} π(x) = 1.

The nonspecificity or ambiguity of a possibility distribution π is defined as follows.

Definition 1.18 Let X = {x1 , x2 , . . . , xn } and let π = {π(x)|x ∈ X } be a nor-


malized possibility distribution on X . The measure of nonspecificity or ambiguity is
defined as follows [4]:
g(π) = ∑_{i=1}^{n} (π∗i − π∗i+1) ln i,    (1.25)

where π∗ = {π∗1 , π∗2 , . . . , π∗n } is the permutation of the possibility distribution


π = {π(x1 ), π(x2 ), . . . , π(xn )}, sorted so that π∗i ≥ π∗i+1 , for i = 1, 2, . . . , n,
and π∗n+1 = 0.
Ambiguity expresses the uncertainty of choosing one from many available objects.
Obviously, the larger the set of possible alternatives is, the bigger the ambiguity is.
The ambiguity attains its maximum when all the π∗i are equal, that is, when
π∗i = 1 (1 ≤ i ≤ n) and π∗n+1 = 0. There is full specificity, that is, no ambiguity,
when only one alternative is possible, that is, when only one π(xi) equals 1 and all
the others equal zero.
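Equation (1.25) can be evaluated directly by sorting the possibility values in decreasing order. The helper below is our own sketch, applied to two hypothetical possibility distributions.

```python
from math import log

def nonspecificity(pi):
    """g(pi) = sum_{i=1}^{n} (pi*_i - pi*_{i+1}) * ln(i), with pi*_{n+1} = 0 (Eq. 1.25)."""
    s = sorted(pi, reverse=True)   # pi*_1 >= pi*_2 >= ... >= pi*_n
    s.append(0.0)                  # pi*_{n+1} = 0
    return sum((s[i - 1] - s[i]) * log(i) for i in range(1, len(pi) + 1))

print(nonspecificity([1.0, 1.0, 1.0]))   # ln 3: all alternatives equally possible, maximal ambiguity
print(nonspecificity([1.0, 0.0, 0.0]))   # 0.0: a single possible alternative, full specificity
```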

1.5 Relationships among the Uncertainties


Entropy is to measure the uncertainty caused by randomness. Fuzziness is to mea-
sure the uncertainty of a linguistic term, which is usually represented by a fuzzy set.
Roughness is to measure the uncertainty of representing knowledge associated with a
target concept. Given equivalence relation R (i.e., the so-called knowledge) and a tar-
get concept X , the bigger the boundary region of X with respect to R is, the bigger the
roughness of R is. Ambiguity is to measure the nonspecificity when we choose one
from many available choices. In the following, we discuss the relationships among
randomness, fuzziness, and ambiguity [20].
In the framework of classification, for a given set of samples or instances, entropy
measures the impurity of a crisp set; fuzziness measures the distinction between the
set and its complement; ambiguity measures the amount of uncertainty associated
with the set of possible alternatives. Clearly, the significance of the three uncertain-
ties is different. The entropy of the sample set is associated with the proportions
of the number of samples from every class, which denotes the class impurity of
the set. Usually, entropy is restricted to be used for a crisp set. The fuzziness of a
sample set is associated with the membership degree of every sample to any class.
Fuzziness denotes the sharpness of the border of every fuzzy subset that is formed
according to the partition based on classes. Particularly, there is no fuzziness for a
crisp set. The ambiguity of a sample set, similar to fuzziness, is associated with the
membership degree of every sample to any class. But ambiguity describes the uncer-
tainty associated with the situation resulting from the lack of specificity in describing
the best or the most representative one. It emphasizes the uncertainty of possibil-
ity of choosing one from many available choices. Next, we discuss the relationship
between entropy and fuzziness and the relationship between fuzziness and ambiguity
in detail.

1.5.1 Entropy and Fuzziness


Entropy is defined on a probability distribution E0 = (p1, p2, ..., pn), where
∑_{i=1}^{n} pi = 1 and 0 ≤ pi ≤ 1 for each i (1 ≤ i ≤ n). Fuzziness is defined on a
possibility distribution B, that is, a fuzzy set B = (μ1 , μ2 , . . . , μn ), where 0 ≤
μi ≤ 1 for each i (1 ≤ i ≤ n). Since a probability distribution is a possibility
distribution, we can consider the fuzziness of a probability distribution E0 . We
now analyze the relationship between entropy and fuzziness based on a probability
distribution E0 = (p1 , p2 , . . . , pn ).
As a special case, when E0 is a two-dimensional probability distribution, that is,
E0 = (p1 , p2 ), according to (1.1) and (1.17), we have

Entropy(E0 ) = −p1 ln p1 − p2 ln p2
= −p1 ln p1 − (1 − p1 ) ln p(1 − p1 ) (1.26)

and

Fuzziness(E0 ) = 2(−p1 ln p1 − (1 − p1 ) ln p(1 − p1 )). (1.27)

Obviously, Fuzziness(E0 ) = 2Entropy(E0 ).


Suppose that E0 = (p1, p2, ..., pn) is an n-dimensional probability distribution.
We now discuss the monotonic properties and extreme-value points of its entropy and
fuzziness. Noting that ∑_{i=1}^{n} pi = 1, we assume, without loss of generality, that
p1 is the single variable and that p2, ..., p_{n−1} are n − 2 constants with
∑_{i=2}^{n−1} pi = c and pn = 1 − c − p1. Then, (1.1) and (1.17) degenerate,
respectively, to

Entropy(E0) = −p1 ln p1 − (1 − c − p1) ln(1 − c − p1) + A    (1.28)

and

Fuzziness(E0) = Entropy(E0) − (1 − p1) ln(1 − p1) − (p1 + c) ln(p1 + c) + B,    (1.29)

where A and B are two constants that are independent of p1. By solving

d Entropy(E0) / dp1 = 0    (1.30)

and

d Fuzziness(E0) / dp1 = 0,    (1.31)

we can obtain that both Entropy(E0) and Fuzziness(E0), regarded as functions of the
variable p1, attain their maximum at p1 = (1 − c)/2, monotonically increase when
p1 < (1 − c)/2, and monotonically decrease when p1 > (1 − c)/2. It indicates that the
entropy and fuzziness of a probability distribution have the same monotonic
characteristics and extreme-value points. Furthermore, noting that ∑_{i=2}^{n−1} pi = c
and the symmetry of the variables p1, p2, ..., pn, we conclude that the entropy and
fuzziness of an n-dimensional probability distribution attain their maximum at

p1 = p2 = ··· = pn = 1/n.    (1.32)

Roughly speaking, the fuzziness is an extension of the entropy.
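This relationship can be checked numerically. The sketch below is our own, using natural logarithms as in (1.26) and (1.27) and treating a probability vector as a fuzzy set when computing its fuzziness.

```python
from math import log

def entropy_nat(p):
    """-sum p_i ln p_i over a probability distribution (0 ln 0 treated as 0)."""
    return -sum(pi * log(pi) for pi in p if pi > 0.0)

def fuzziness_nat(p):
    """-sum [p_i ln p_i + (1 - p_i) ln(1 - p_i)], treating the vector as a fuzzy set."""
    def term(m):
        if m <= 0.0 or m >= 1.0:
            return 0.0
        return -(m * log(m) + (1.0 - m) * log(1.0 - m))
    return sum(term(pi) for pi in p)

p2 = (0.3, 0.7)
print(fuzziness_nat(p2), 2 * entropy_nat(p2))   # equal, as stated for the two-dimensional case

p3 = (1 / 3, 1 / 3, 1 / 3)
print(entropy_nat(p3), fuzziness_nat(p3))       # about 1.10 and 1.91; both maxima occur at the uniform distribution
```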


When n = 3, the entropy and the fuzziness of E0 = (p1, p2, p3) are depicted
in Figure 1.2a and b, which are 3D contour plots of entropy and fuzziness. From
Figure 1.2a and b, we can see that the entropy of E0 = (p1, p2, p3) behaves similarly
to the fuzziness of E0 = (p1, p2, p3). Both attain their maximum at p1 = p2 = 0.33
and their minimum at the three points p1 = p2 = 0, or p1 = 0, p2 = 1, or p1 = 1, p2 = 0.

Figure 1.2 (a) Entropy of E0 = (p1, p2, p3). (b) Fuzziness of E0 = (p1, p2, p3).

1.5.2 Fuzziness and Ambiguity


Both fuzziness and ambiguity are defined on a fuzzy set. For a general possibility
distribution B = (μ1, μ2, ..., μn), whose fuzziness is defined as in (1.17), we derive
by solving

∂Fuzziness(B)/∂μ1 = ∂Fuzziness(B)/∂μ2 = ··· = ∂Fuzziness(B)/∂μn = 0    (1.33)
that Fuzziness(B) attains its maximum at
μ1 = μ2 = · · · = μn = 0.5. (1.34)
It is easy to check that Fuzziness(B) attains its minimum (zero) only at μi = 0
or μi = 1 for each i (1 ≤ i ≤ n). The monotonic feature of the Fuzziness(B) is
simple, that is, Fuzziness(B) monotonically increases in (0.0, 0.5) and monotonically
decreases in (0.5, 1.0) for each μi (1 ≤ i ≤ n).
The ambiguity of B = (μ1 , μ2 , . . . , μn ) is defined as (1.25). According to [19],
g(B) = Ambiguity(B) is a function with monotonicity, continuity, and symmetry.
Its maximum is attained at μ1 = μ2 = · · · = μn = 1, and its minimum is attained
in cases where only one μj (1 ≤ j ≤ n) is not zero.
Combining the above analysis of Fuzziness(B) and Ambiguity(B), we can see
that an essential difference exists between the two uncertainties. When n = 2, the
fuzziness and ambiguity of B = (μ1, μ2) degenerate, respectively, to
Fuzziness(B) = − μ1 ln μ1 − (1 − μ1 ) ln(1 − μ1 )
− μ2 ln μ2 − (1 − μ2 ) ln(1 − μ2 ). (1.35)
and

Ambiguity(B) = (μ1/μ2) × ln 2, if 0 ≤ μ1 < μ2;  ln 2, if μ1 = μ2;  (μ2/μ1) × ln 2, otherwise.    (1.36)
The pictures of Fuzziness(B) in (1.35) and Ambiguity(B) in (1.36) are depicted
in Figure 1.3a and b. From Figure 1.3a, we can see that the plot of fuzziness of
B = (μ1 , μ2 ) is plane symmetric and the symmetrical planes are μ1 = 0.5 and
μ2 = 0.5, respectively. Fuzziness of B = (μ1 , μ2 ) attains its maximum at (0.5,
0.5) and attains its minimum at four endpoints, (0, 0), (0, 1), (1, 0), and (1, 1). In
Figure 1.3b, the surface above the plane z = 0 is the ambiguity of B = (μ1 , μ2 ). The
lines in the plane z = 0 are the contours of the ambiguity of B = (μ1 , μ2 ). From
the contours, we can see that points with the same value of μ1/μ2 have the same
ambiguity. The closer a contour is to the line μ1 = μ2, the larger the ambiguity.
Figure 1.3 (a) Fuzziness of B = (μ1, μ2). (b) Ambiguity of B = (μ1, μ2).
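The two-dimensional case in (1.35) and (1.36) can also be checked with a few lines of code; the following is a sketch of ours that evaluates both measures at a few hypothetical membership pairs.

```python
from math import log

def fuzziness2(mu1, mu2):
    """Eq. (1.35), with 0 ln 0 treated as 0."""
    def f(m):
        if m <= 0.0 or m >= 1.0:
            return 0.0
        return -(m * log(m) + (1.0 - m) * log(1.0 - m))
    return f(mu1) + f(mu2)

def ambiguity2(mu1, mu2):
    """Eq. (1.36): ratio of the smaller to the larger membership degree, times ln 2."""
    if mu1 == mu2:
        return log(2)
    lo, hi = sorted((mu1, mu2))
    return (lo / hi) * log(2)

print(fuzziness2(0.5, 0.5))   # 2 ln 2: maximal fuzziness
print(fuzziness2(1.0, 0.0))   # 0: a crisp set has no fuzziness
print(ambiguity2(0.8, 0.8))   # ln 2: two equally possible alternatives
print(ambiguity2(1.0, 0.0))   # 0: only one possible alternative
```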

References
1. V. R. Hogg, A. T. Craig. Introduction to Mathematical Statistics. New York: Pearson
Education, 2004.
2. L. A. Zadeh. Fuzzy sets. Information and Control, 1965, 8:338–353.
3. Z. Pawlak. Rough sets. International Journal of Information and Computer Sciences,
1982, 11:341–356.
4. Y. Yuan, M. J. Shaw. Induction of fuzzy decision trees. Fuzzy Sets and Systems, 1995,
69(2):125–139.
5. T. M. Cover, J. A. Thomas. The Elements of Information Theory, 2nd edn. John Wiley
& Sons, Inc., Hoboken, NJ, 2006.

6. A. Colubi, G. Gonzalez-Rodriguez. Fuzziness in data analysis: Towards accuracy and


robustness. Fuzzy Sets and Systems, 2015, 281:260–271.
7. R. Coppi, M. A. Gil, H. A. L. Kiers. The fuzzy approach to statistical analysis.
Computational Statistics & Data Analysis, 2006, 51(1):1–14.
8. R. Seising. On the absence of strict boundaries—Vagueness, haziness, and fuzziness in
philosophy, science, and medicine. Applied Soft Computing, 2008, 8(3):1232–1242.
9. Q. Zhang. Fuzziness–vagueness–generality–ambiguity. Journal of Pragmatics, 1998,
29(1):13–31.
10. G. J. Klir. Where do we stand on measures of uncertainty, ambiguity, fuzziness, and
the like? Fuzzy Sets and Systems, 1987, 24(2):141–160.
11. D. Dubois, H. Prade. Fuzzy Sets and Systems: Theory and Applications. Cambridge,
UK: Academic Press, Inc. A Division of Harcourt Brace & Company, 1980.
12. G. J. Klir, B. Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications. New Jersey:
Prentice Hall PTR, 1995.
13. B. Q. Hu. Foundations of Fuzzy Theory. Wuhan, China: Wuhan University Press,
2004.
14. A. D. Luca, S. Termini. A definition of a nonprobabilistic entropy in the setting of
fuzzy set theory. Information Control, 1972, 20(4):301–312.
15. J. D. Shie, S. M. Chen. Feature subset selection based on fuzzy entropy measures for
handling classification problems. Applied Intelligence, 2007, 28(1):69–82.
16. H. M. Lee, C. M. Chen, J. M. Chen, Y. L. Jou. An efficient fuzzy classifier with feature
selection based on fuzzy entropy. IEEE Transactions on Systems, Man, and Cybernetics-
Part B: Cybernetics, 2001, 31(3):426–432.
17. B. Kosko. Fuzzy entropy and conditioning. Information Sciences, 1986, 40(2):
165–174.
18. L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems,
1978, 1(1):9–34.
19. M. Higashi, G. J. Klir. Measures of uncertainty and information based on possibility
distributions. International Journal of General Systems, 1982, 9(1):43–58.
20. X. Z. Wang, L. C. Dong, J. H. Yan. Maximum ambiguity-based sample selection in
fuzzy decision tree induction. IEEE Transactions on Knowledge and Data Engineering,
2012, 24(8):1491–1505.
Chapter 2

Decision Tree with Uncertainty

Supervised learning, which can be categorized into classification and regression prob-
lems, is the central part of the machine learning field. Over the past several decades,
designing an effective learning system and improving its performance have
been the most important tasks in supervised learning. How to model and process
the uncertainty existing in the learning process has become a key challenge in learn-
ing from data. Uncertainty processing has a very significant impact on the entire
knowledge extraction process.
Many theories and methodologies have been developed to model and process
different kinds of uncertainties. Focusing on the decision tree induction, this chap-
ter will discuss the impact of uncertainty modeling and processing on the learning
process. We introduce the target concept learned with the decision tree algorithm,
some popular decision tree learning algorithms including the widely used ID3 and its
fuzzy version, the induction of fuzzy decision tree based on rough fuzzy techniques,
and, more importantly, the relationship between the generalization ability of fuzzy
decision tree and the uncertainty in the learning process.

2.1 Crisp Decision Tree


In this section, we first introduce the ID3 algorithm, which is the original version
for learning decision trees from discrete-valued data sets [1,2]. Then we extend it to
the case of learning decision trees from continuous-valued data sets.


2.1.1 ID3 Algorithm


The ID3 algorithm uses information gain as a heuristic to greedily select the extended
attributes. The input of the ID3 algorithm is a data set, that is, a decision table (DT),
in which the conditional attributes take discrete values. Table 2.1 is a small
decision table with 14 instances [2]. The output is a decision tree in which internal
nodes correspond to the selected attributes, the branches correspond to a set of the
values of the selected attributes, and the leaf nodes correspond to classes. Figure 2.1
is a decision tree learned from the data set given in Table 2.1. The learned decision
tree can be transformed into a set of if-then rules. The details of the ID3 algorithm
are presented in Algorithm 2.1.
Each path from the root to a leaf node in the learned decision tree can be transformed
into an if-then rule; the number of rules equals the number of leaf nodes in the tree.
In this way, the whole decision tree can be transformed into a set of if-then rules.
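Algorithm 2.1 itself is not reproduced in this excerpt, so the following is only a minimal sketch of the ID3 recursion described above: at each node the attribute with maximal information gain is selected, a branch is grown for each of its values, and the recursion stops when a node is pure or no attributes remain. The function and data-structure names are our own.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting (rows, labels) on a discrete-valued attribute."""
    n = len(rows)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[attr] == v]
        remainder += (len(idx) / n) * entropy([labels[i] for i in idx])
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Grow a tree of nested dicts {attribute: {value: subtree}}; leaves are class labels."""
    if len(set(labels)) == 1:      # pure node: stop with that class
        return labels[0]
    if not attrs:                  # no attribute left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node[best][v] = id3([rows[i] for i in idx],
                            [labels[i] for i in idx],
                            [a for a in attrs if a != best])
    return node

# With the rows of Table 2.1 (as dicts) and attrs = ["Outlook", "Temperature",
# "Humidity", "Wind"], the attribute selected at the root is Outlook.
```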
Suppose that the instances in data set S are classified into m categories
C1 , C2 , . . . , Cm . The numbers of instances in Ci (1 ≤ i ≤ m) and in S are denoted
by |Ci | and |S|, respectively. Obviously, the proportion of instances belonging
to Ci is pi = |Ci|/|S|.

Table 2.1 A Small Sample Data Set with 14 Instances


Day Outlook Temperature Humidity Wind PlayTennis

D1 Sunny Hot High Weak No

D2 Sunny Hot High Strong No

D3 Overcast Hot High Weak Yes

D4 Rain Mild High Weak Yes

D5 Rain Cool Normal Weak Yes

D6 Rain Cool Normal Strong No

D7 Overcast Cool Normal Strong Yes

D8 Sunny Mild High Weak No

D9 Sunny Cool Normal Weak Yes

D10 Rain Mild Normal Weak Yes

D11 Sunny Mild Normal Strong Yes

D12 Overcast Mild High Strong Yes

D13 Overcast Hot Normal Weak Yes

D14 Rain Mild High Strong No


Another random document with
no related content on Scribd:
III. MARXISM AND THE CLASS
STRUGGLE.
While it is true that for a certain theory to have a lasting influence
on the human mind it must have a highly scientific value, yet this in
itself is not enough. It quite often happened that a scientific theory
was of utmost importance to science, nevertheless, with the
probable exception of a few learned men, it evoked no interest
whatsoever. Such, for instance, was Newton’s theory of gravitation.
This theory is the foundation of astronomy, and it is owing to this
theory that we have our knowledge of heavenly bodies, and can
foretell the arrival of certain planets and eclipses. Yet, when
Newton’s theory of gravitation made its appearance, a few English
scientists were its only adherents. The broad mass paid no attention
to this theory. It first became known to the mass by a popular book of
Voltaire’s written a half century afterwards.
There is nothing surprising about this. Science has become a
specialty for a certain group of learned men, and its progress
concerns these men only, just as smelting is the smith’s specialty,
and an improvement in the smelting of iron concerns him only. Only
that which all people can make use of and which is found by
everyone to be a life necessity can gain adherents among the large
mass. When, therefore, we see that a certain scientific theory stirs
up zeal and passion in the large mass, this can be attributed to the
fact that this theory serves them as a weapon in the class struggle.
For it is the class struggle that engages almost all the people.
This can be seen most clearly in Marxism. Were the Marxian
economic teachings of no importance in the modern class struggle,
then none but a few professional economists would spend their time
on them. It is, however, owing to the fact that Marxism serves the
proletarians as a weapon in the struggle against capitalism that the
scientific struggles are centered on this theory. It is owing to this
service that Marx’s name is honored by millions who know even very
little of his teaching, and is despised by thousands that understand
nothing of his theory. It is owing to the great role the Marxian theory
plays in the class struggle that his theory is diligently studied by the
large mass and that it dominates the human mind.
The proletarian class struggle existed before Marx for it is the
offspring of capitalist exploitation. It was nothing more than natural
that the workers, being exploited, should think about and demand
another system of society where exploitation would be abolished.
But all they could do was to hope and dream about it. They were not
sure of its coming to pass. Marx gave to the labor movement and
Socialism a theoretical foundation. His social theory showed that
social systems were in a continuous flow wherein capitalism was
only a temporary form. His studies of capitalism showed that owing
to the continuous development of perfection of technique, capitalism
must necessarily develop to Socialism. This new system of
production can only be established by the proletarians struggling
against the capitalists, whose interest it is to maintain the old system
of production. Socialism is therefore the fruit and aim of the
proletarian class struggle.
Thanks to Marx, the proletarian class struggle took on an entirely
different form. Marxism became a weapon in the proletarian hands;
in place of vague hopes he gave a positive aim, and in teaching a
clear recognition of the social development he gave strength to the
proletarian and at the same time he created the foundation for the
correct tactics to be pursued. It is from Marxism that the workingmen
can prove the transitoriness of capitalism and the necessity and
certainty of their victory. At the same time Marxism has done away
with the old utopian views that Socialism would be brought about by
the intelligence and good will of some judicious men; as if Socialism
were a demand for justice and morality; as if the object were to
establish an infallible and perfect society. Justice and morality
change with the productive system, and every class has different
conceptions of them. Socialism can only be gained by the class
whose interest lies in Socialism, and it is not a question about a
perfect social system, but a change in the methods of production
leading to a higher step, i. e., to social production.
Because the Marxian theory of social development is
indispensable to the proletarians in their struggle, they, the
proletarians, try to make it a part of their inner self; it dominates their
thoughts, their feelings, their entire conception of the world. Because
Marxism is the theory of social development, in the midst of which
we stand, therefore Marxism itself stands as the central point of the
great mental struggles that accompany our economic revolution.
IV. DARWINISM AND THE CLASS
STRUGGLE.
That Marxism owes its importance and position only to the role it
takes in the proletarian class struggle, is known to all. With
Darwinism, however, things seem different to the superficial
observer, for Darwinism deals with a new scientific truth which has to
contend with religious prejudices and ignorance. Yet it is not hard to
see that in reality Darwinism had to undergo the same experiences
as Marxism. Darwinism is not a mere abstract theory which was
adopted by the scientific world after discussing and testing it in a
mere objective manner. No, immediately after Darwinism made its
appearance, it had its enthusiastic advocates and passionate
opponents; Darwin’s name, too, was either highly honored by people
who understood something of his theory, or despised by people who
knew nothing more of his theory than that “man descended from the
monkey,” and who were surely unqualified to judge from a scientific
standpoint the correctness or falsity of Darwin’s theory. Darwinism,
too, played a role in the class-struggle, and it is owing to this role
that it spread so rapidly and had enthusiastic advocates and
venomous opponents.
Darwinism served as a tool to the bourgeoisie in their struggle
against the feudal class, against the nobility, clergy-rights and feudal
lords. This was an entirely different struggle from the struggle now
waged by the proletarians. The bourgeoisie was not an exploited
class striving to abolish exploitation. Oh no. What the bourgeoisie
wanted was to get rid of the old ruling powers standing in their way.
The bourgeoisie themselves wanted to rule, basing their demands
upon the fact that they were the most important class, the leaders of
industry. What argument could the old class, the class that became
nothing but useless parasites, bring forth against them? They leaned
on tradition, on their ancient divine rights. These were their pillars.
With the aid of religion the priests held the great mass in subjection
and ready to oppose the demands of the bourgeoisie.
It was therefore for their own interests that the bourgeoisie were in
duty bound to undermine the “divine” right of rulers. Natural science
became a weapon in the opposition to belief and tradition; science
and the newly discovered natural laws were put forward; it was with
these weapons that the bourgeoisie fought. If the new discoveries
could prove that what the priests were teaching was false, the
“divine” authority of these priests would crumble and the “divine
rights” enjoyed by the feudal class would be destroyed. Of course
the feudal class was not conquered by this only, as material power
can only be overthrown by material power, but mental weapons
become material tools. It is for this reason that the bourgeoisie relied
so much upon material science.
Darwinism came at the desired time; Darwin’s theory that man is
the descendant of a lower animal destroyed the entire foundation of
Christian dogma. It is for this reason that as soon as Darwinism
made its appearance, the bourgeoisie grasped it with great zeal.
This was not the case in England. Here we again see how
important the class struggle was for the spreading of Darwin’s theory.
In England the bourgeoisie had already ruled a few centuries, and as
a mass they had no interest to attack or destroy religion. It is for this
reason that although this theory was widely read in England, it did
not stir anybody; it merely remained a scientific theory without great
practical importance. Darwin himself considered it as such, and for
fear that his theory might shock the religious prejudices prevailing,
he purposely avoided applying it immediately to men. It was only
after numerous postponements and after others had done it before
him, that he decided to make this step. In a letter to Haeckel he
deplored the fact that his theory must hit upon so many prejudices
and so much indifference that he did not expect to live long enough
to see it break through these obstacles.
But in Germany things were entirely different, and Haeckel
correctly answered Darwin that in Germany the Darwinian theory
met with an enthusiastic reception. It so happened that when
Darwin’s theory made its appearance, the bourgeoisie was preparing
to carry on a new attack on absolutism and junkerism. The liberal
bourgeoisie was headed by the intellectuals. Ernest Haeckel, a great
scientist, and of still greater daring, immediately drew in his book,
“Natural Creation,” most daring conclusions against religion. So,
while Darwinism met with the most enthusiastic reception by the
progressive bourgeoisie, it was also bitterly opposed by the
reactionists.
The same struggle also took place in other European countries.
Everywhere the progressive liberal bourgeoisie had to struggle
against reactionary powers. These reactionists possessed, or were
trying to obtain through religious followers, the power coveted. Under
these circumstances, even the scientific discussions were carried on
with the zeal and passion of a class struggle. The writings that
appeared pro and con on Darwin have therefore the character of
social polemics, despite the fact that they bear the names of
scientific authors. Many of Haeckel’s popular writings, when looked
at from a scientific standpoint, are very superficial, while the
arguments and remonstrances of his opponents show unbelievable
foolishness that can only be met in the arguments used against
Marx.
The struggle, carried on by the liberal bourgeoisie against
feudalism was not fought to its finish. This was partly owing to the
fact that everywhere Socialist proletarians made their appearance,
threatening all ruling powers, including the bourgeoisie. The liberal
bourgeoisie relented, while the reactionary tendencies gained the
upper hand. The former zeal in combatting religion disappeared
entirely, and while it is true that the liberals and reactionists were still
fighting among each other, in reality, however, they neared each
other. The interest formerly manifested in science as a weapon in the
class struggle has entirely disappeared, while the reactionary tendency that the masses must be brought to religion became ever
more pronounced.
The estimation of science has also undergone a change. Formerly
the educated bourgeoisie founded upon science a materialistic
conception of the universe, wherein they saw the solution of the
universal riddle. Now mysticism has gained the upper hand; all that
was solved appeared as very trivial, while all things that remained
unsolved, appeared as very great indeed, embracing the most
important life question. A sceptical, critical and doubting frame of
mind has taken the place of the former jubilant spirit in favor of
science.
This could also be seen in the stand taken against Darwin. “What
does his theory show? It leaves unsolved the universal riddle!
Whence comes this wonderful nature of transmission, whence the
ability of animate beings to change so fitly?” Here lies the mysterious
life riddle that could not be overcome with mechanical principles.
Then, what was left of Darwinism by the light of later criticism?
Of course, the advance of science began to make rapid progress.
The solution of one problem always brings a few more problems to the surface to be solved, which were hidden underneath this theory. The theory of transmission, which Darwin had to accept as a basis of his inquiry, was ever more investigated, and a hot discussion arose about the individual factors of development and the struggle for existence. While a few
scientists directed their attention to variation, which they considered
due to exercise and adaptation to life (following the principle laid
down by Lamarck), this idea was expressly denied by scientists like
Weissman and others. While Darwin only assumed gradual and slow
changes, De Vries found sudden and leaping cases of variation
resulting in the sudden appearance of new species. All this, while it
went to strengthen and develop the theory of descent, in some cases
made the impression that the new discoveries rent asunder the
Darwinian theory, and therefore every new discovery that made it
appear so was hailed by the reactionists as a bankruptcy of
Darwinism. This social conception had its influence on science.
Reactionary scientists claimed that a spiritual element is necessary.
The supernatural and the insolvable have taken the place of Darwinism, and the class which in the beginning was the banner bearer of Darwinism became ever more reactionary.
V. DARWINISM VERSUS SOCIALISM.
Darwinism has been of inestimable service to the bourgeoisie in its
struggle against the old powers. It was therefore only natural that
bourgeoisdom should apply it against its later enemy, the
proletarians; not because the proletarians were antagonistically
disposed to Darwinism, but just the reverse. As soon as Darwinism
made its appearance, the proletarian vanguard, the Socialists, hailed
the Darwinian theory, because in Darwinism they saw a
corroboration and completion of their own theory; not as some
superficial opponents believe, that they wanted to base Socialism
upon Darwinism but in the sense that the Darwinian discovery,—that
even in the apparently stagnant organic world there is a continuous
development—is a glorious corroboration and completion of the
Marxian theory of social development.
Yet it was natural for the bourgeoisie to make use of Darwinism
against the proletarians. The bourgeoisie had to contend with two
armies, and the reactionary classes know this full well. When the
bourgeoisie attacks their authority, they point at the proletarians and
caution the bourgeoisie to beware lest all authority crumble. In doing
this, the reactionists mean to frighten the bourgeoisie so that they
may desist from any revolutionary activity. Of course, the bourgeois
representatives answer that there is nothing to fear; that their
science but refutes the groundless authority of the nobility and
supports them in their struggle against enemies of order.
At a congress of naturalists, the reactionary politician and scientist
Virchow assailed the Darwinian theory on the ground that it
supported Socialism. “Be careful of this theory,” he said to the
Darwinists, “for this theory is very nearly related to the theory that
caused so much dread in our neighboring country.” This allusion to
the Paris Commune, made in the year famous for the hunting of
Socialists, must have had a great effect. What shall be said,
however, about the science of a professor who attacks Darwinism
with the argument that it is not correct because it is dangerous! This
reproach, of being in league with the red revolutionists, caused a lot
of annoyance to Haeckel, the defender of this theory. He could not
stand it. Immediately afterwards he tried to demonstrate that it is just
the Darwinian theory that shows the untenableness of the Socialist
demands, and that Darwinism and Socialism “endure each other as
fire and water.”
Let us follow Haeckel’s contentions, whose main thoughts recur
in most authors who base their arguments against Socialism on
Darwinism.
Socialism is a theory which presupposes natural equality for
people, and strives to bring about social equality; equal rights, equal
duties, equal possessions and equal enjoyments. Darwinism, on the
contrary, is the scientific proof of inequality. The theory of descent
establishes the fact that animal development goes in the direction of
ever greater differentiation or division of labor; the higher or more
perfect the animal, the greater the inequality existing. The same
also holds good in society. Here, too, we see the great division of
labor between vocations, class, etc., and the higher we stand in
social development the greater become the inequalities in strength,
ability and faculty. The theory of descent is therefore to be
recommended as “the best antidote to the Socialist demand of
making all equal.”
The same holds good, but to a greater extent, of the Darwinian
theory of survival. Socialism wants to abolish competition and the
struggle for existence. But Darwinism teaches us that this struggle is
unavoidable and is a natural law for the entire organic world. Not
only is this struggle natural, but it is also useful and beneficial. This
struggle brings an ever greater perfection, and this perfection
consists in an ever greater extermination of the unfit. Only the
chosen minority, those who are qualified to withstand competition,
can survive; the great majority must perish. Many are called, but few
are chosen. The struggle for existence results at the same time in a
victory for the best, while the bad and unfit must perish. This may be
lamentable, just as it is lamentable that all must die, but the fact can
neither be denied nor changed.
We wish to remark here how a small change of almost similar
words serves as a defence of capitalism. Darwin spoke about the
survival of the fittest, of those that are best fitted to the conditions.
Seeing that in this struggle those that are better organized conquer
the others, the conquerors were called the vigilant, and later the
“best.” This expression was coined by Herbert Spencer. In thus
winning on their field, the conquerors in the social struggle, the large
capitalists, were proclaimed the best people.
Haeckel retained and still upholds this conception. In 1892 he
said, “Darwinism, or the theory of selection, is thoroughly aristocratic;
it is based upon the survival of the best. The division of labor brought
about by development causes an ever greater variation in character,
an ever greater inequality among the individuals, in their activity,
education and condition. The higher the advance of human culture,
the greater the difference and gulf between the various classes
existing. Communism and the demands put up by the Socialists for an equality of conditions and activity are synonymous with
going back to the primitive stages of barbarism.”
The English philosopher Herbert Spencer already had a theory on
social growth before Darwin. This was the bourgeois theory of
individualism, based upon the struggle for existence. Later he
brought this theory into close relation with Darwinism. “In the animal
world,” he said, “the old, weak and sick are ever rooted out and only
the strong and healthy survive. The struggle for existence serves
therefore as a purification of the race, protecting it from deterioration.
This is the happy effect of this struggle, for if this struggle should
cease and each one were sure of procuring its existence without any
struggle whatsoever, the race would necessarily deteriorate. The
support given to the sick, weak and unfit causes a general race
degeneration. If sympathy, finding its expressions in charity, goes
beyond its reasonable bounds, it misses its object; instead of
diminishing, it increases the suffering for the new generations. The
good effect of the struggle for existence can best be seen in wild
animals. They are all strong and healthy because they had to
undergo thousands of dangers wherein all those that were not
qualified had to perish. Among men and domestic animals sickness
and weakness are so general because the sick and weak are
preserved. Socialism, having as its aim to abolish the struggle for
existence in the human world, will necessarily bring about an ever
growing mental and physical deterioration.”
These are the main contentions of those who use Darwinism as a
defence of the bourgeois system. Strong as these arguments might
appear at first sight, they were not hard for the Socialists to
overcome. To a large extent, they are the old arguments used
against Socialism, but wearing the new garb of Darwinistic
terminology, and they show an utter ignorance of Socialism as well
as of capitalism.
Those who compare the social organism with the animal body
leave unconsidered the fact that men do not differ like various cells
or organs, but only in degree of their capacity. In society the division
of labor cannot go so far that all capacities should perish at the
expense of one. What is more, everyone who understands
something of Socialism knows that the efficient division of labor does
not cease with Socialism; that first under Socialism real divisions will
be possible. The difference between the workers, their ability, and
employments will not cease; all that will cease is the difference
between workers and exploiters.
While it is positively true that in the struggle for existence those
animals that are strong, healthy and well survive, yet this does not
happen under capitalist competition. Here victory does not depend
upon perfection of those engaged in the struggle, but in something
that lies outside of their body. While this struggle may hold good with
the small bourgeois, where success depends upon personal abilities
and qualifications, yet with the further development of capital,
success does not depend upon personal abilities, but upon the
possession of capital. The one who has a larger capital at command
will soon conquer the one who has a smaller capital at his disposal,
although the latter may be more skillful. It is not the personal
qualities, but the possession of money that decides who the victor
shall be in the struggle. When the small capitalists perish, they do
not perish as men but as capitalists; they are not weeded out from
among the living, but from the bourgeoisie. They still exist, but no
longer as capitalists. The competition existing in the capitalist system
is therefore something different in requisites and results from the
animal struggle for existence.
Those people that perish as people are members of an entirely
different class, a class that does not take part in the competitive
struggle. The workers do not compete with the capitalists, they only
sell their labor power to them. Owing to their being propertyless, they
have not even the opportunity to measure their great qualities and
enter a race with the capitalists. Their poverty and misery cannot be
attributed to the fact that they fell in the competitive struggle on
account of weakness, but because they were paid very little for their
labor power, it is for this very reason that, although their children are
born strong and healthy, they perish in great mass, while the children
born to rich parents, although born sick, remain alive by means of
the nourishment and great care that is bestowed on them. These
children of the poor do not die because they are sick or weak, but
because of external cause. It is capitalism which creates all those
unfavorable conditions by means of exploitation, reduction of wages,
unemployment, crises, bad dwellings, and long hours of
employment. It is the capitalist system that causes so many strong
and healthy ones to succumb.
Thus the Socialists prove that, different from the animal world, the
competitive struggle existing between men does not bring forth the
best and most qualified, but destroys many strong and healthy ones
because of their poverty, while those that are rich, even if weak and
sick, survive. Socialists prove that personal strength is not the
determining factor, but it is something outside of man; it is the
possession of money that determines who shall survive and who
shall perish.
VI. NATURAL LAW AND SOCIAL THEORY.
The false conclusions reached by Haeckel and Spencer on
Socialism are no surprise. Darwinism and Marxism are two distinct
theories, one of which applies to the animal world, while the other
applies to society. They supplement each other in the sense that,
according to the Darwinian theory of evolution, the animal world
develops up to the stage of man, and from then on, that is, after the
animal has risen to man, the Marxian theory of evolution applies.
When, however, one wishes to carry the theory of one domain into
that of the other, where different laws are applicable, he must draw
wrong inferences.
Such is the case when we wish to ascertain from natural law what
social form is natural and applicable, and this is just what the
bourgeois Darwinists did. They drew the inference that the laws
which govern in the animal world, where the Darwinian theory
applies, apply with equal force in the capitalist system, and that
therefore capitalism is a natural order and must endure forever. On
the other hand, there were some Socialists who desired to prove
that, according to Darwin, the Socialist system is the natural one.
Said these Socialists, “Under capitalism men do not carry on the
struggle for existence with like tools, but with unlike ones artificially
made. The natural superiority of those that are healthier, stronger,
more intelligent or morally better, is of no avail so long as birth, class,
or the possession of money control this struggle. Socialism, in
abolishing all these artificial dissimilarities, will make equal
provisions for all, and then only will the struggle for existence prevail,
wherein the real personal superiorities will be the deciding factors.”
These critical arguments, while they are not bad when used as
refutations against bourgeois Darwinists, are still faulty. Both sets of
arguments, those used by the bourgeois Darwinists in favor of
capitalism, and those of the Socialists, who base their Socialism on
Darwin, are falsely rooted. Both arguments, although reaching
opposite conclusions, are equally false because they proceed from
the wrong premises that there is a natural and a permanent system
of society.
Marxism has taught us that there is no such thing as a natural and
a permanent social system, and that there can be none, or, to put it
another way, every social system is natural, for every social system
is necessary and natural under given conditions. There is not a
single definite social system that can be accepted as natural; the
various social systems take the place of one another as a result of
developments in the means of production. Each system is therefore
the natural one for its particular time. Capitalism is not the only
natural order, as the bourgeoisie believes, and no Socialist system is
the only natural system, as some Socialists try to prove. Capitalism
was natural under the conditions of the nineteenth century, just as
feudalism was in the Middle Ages, and as Socialism will be in the
coming age. The attempt to put forward a certain system as the only
natural and permanent one is as futile as if we were to take an
animal and say that this animal is the most perfect of all animals.
Darwinism teaches us that every animal is equally adapted and
equally perfect in form to suit its special environments, and Marxism
teaches us that every social system is particularly adapted to its
conditions, and that in this sense it may be called good and perfect.
Herein lies the main reason why the endeavor of the bourgeois
Darwinists to defend the foundering capitalist system is bound to fail.
Arguments based on natural science, when applied to social
questions, must almost always lead to reverse conclusions. This
happens because, while nature is very slow in its development and
changes within the ken of human history are imperceptible, so that it
may almost be regarded as stable, human society nevertheless
undergoes quick and continuous changes. In order to understand the
moving force and the cause of social development, we must study
society as such. It is only here that we can find the reason of social
development. Marxism and Darwinism should remain in their own
domains; they are independent of each other and there is no direct
connection between them.
Here arises a very important question. Can we stop at the
conclusion that Marxism applies only to society and that Darwinism
applies only to the organic world, and that neither of these theories is
applicable in the other domain? In practice it is very convenient to
have one principle for the human world and another one for the
animal world. In having this, however, we forget that man is also an
animal. Man has developed from an animal, and the laws that apply
to the animal world cannot suddenly lose their applicability to man. It
is true that man is a very peculiar animal, but if that is the case it is
necessary to find from these very peculiarities why those principles
applicable to all animals do not apply to men, and why they assume
a different form.
Here we come to another grave problem. The bourgeois
Darwinists do not encounter such a problem; they simply declare
that man is an animal, and without further ado they set about to
apply the Darwinian principles to men. We have seen to what
erroneous conclusions they come. To us this question is not so
simple; we must first be clear about the differences between men
and animals, and then we can see why, in the human world, the
Darwinian principles change into different ones, namely, into
Marxism.
VII. THE SOCIABILITY OF MAN.
The first peculiarity that we observe in man is that he is a social
being. In this he does not differ from all animals, for even among the
latter there are many species that live socially among themselves.
But man differs from all those that we have observed until now in
dealing with the Darwinian theory; he differs from those animals that
do not live socially, but that struggle with each other for subsistence.
It is not with the rapacious animals which live separately that man
must be compared, but with those that live socially. The sociability of
animals is a power that we have not yet spoken of; a power that calls
forth new qualities among animals.
It is an error to regard the struggle for existence as the only power
giving shape to the organic world. The struggle for existence is the
main power that causes the origin of new species, but Darwin
himself knew full well that other powers co-operate which give shape
to the forms, habits, and peculiarities of animate things. In his
“Descent of Man” Darwin elaborately treated sexual selection and
showed that the competition of males for females gave rise to the
gay colors of the birds and butterflies and also to the singing voices
of birds. There he also devoted a chapter to social living. Many
illustrations on this head are also to be found in Kropotkin’s book,
“Mutual Aid as a Factor in Evolution.” The best representation of the
effects of sociability is given in Kautsky’s “Ethics and the
Materialistic Conception of History.”
When a number of animals live in a group, herd or flock, they carry
on the struggle for existence in common against the outside world;
within such a group the struggle for existence ceases. The animals
which live socially no longer wage a struggle against each other,
wherein the weak succumb; just the reverse, the weak enjoy the
same advantages as the strong. When some animals have the
advantage by means of greater strength, sharper smell, or
experience in finding the best pasture or in warding off the enemy,
this advantage does not accrue only to these better fitted, but also to
the entire group. This combining of the animals’ separate powers
into one unit gives to the group a new and much stronger power than
any one individual possessed, even the strongest. It is owing to this
united strength that the defenseless plant-eaters can ward off
rapacious animals. It is only by means of this unity that some
animals are able to protect their young.
A second advantage of sociability arises from the fact that where
animals live socially, there is a possibility of the division of labor.
Such animals send out scouts or place sentinels whose object it is to
look after the safety of all, while others spend their time either in
eating or in plucking, relying upon their guards to warn them of
danger.
Such an animal society becomes, in some respects, a unit, a
single organism. Naturally, the relation remains much looser than the
cells of a single animal body; nevertheless, the group becomes a
coherent body, and there must be some power that holds together
the individual members.
This power is found in the social motives, the instinct that holds
them together and causes the continuance of the group. Every
animal must place the interest of the entire group above its own; it
must always act instinctively for the advantage and maintenance of
the group without consideration of itself. As long as the weak plant-
eaters think of themselves only and run away when attacked by a
rapacious animal, each one minding his life only, the entire herd
disappears. Only when the strong motive of self-preservation is
suppressed by a stronger motive of union, and each animal risks its
life for the protection of all, only then does the herd remain and enjoy
the advantages of sticking together. In such a case, self-sacrifice,
bravery, devotion, discipline and consciousness must arise, for
where these do not exist society dissolves; society can only exist
where these exist.
These instincts, while they have their origin in habit and necessity,
are strengthened by the struggle for existence. Every animal herd
still stands in a competitive struggle against the same animals of a
different herd; those that are best fitted to withstand the enemy will
survive, while those that are more poorly equipped will perish. That group
in which the social instinct is better developed will be able to hold its
ground, while the group in which social instinct is low will either fall
an easy prey to its enemies or will not be in a position to find
favorable feeding places. These social instincts become therefore
the most important and decisive factors that determine who shall
survive in the struggle for existence. It is owing to this that the social
instincts have been elevated to the position of predominant factors.
These relations throw an entirely new light upon the views of the
bourgeois Darwinists. Their claim is that the extermination of the
weak is natural and that it is necessary in order to prevent the
corruption of the race, and that the protection given to the weak
serves to deteriorate the race. But what do we see? In nature itself,
in the animal world, we find that the weak are protected; that it is not
by their own personal strength that they maintain themselves, and
that they are not brushed aside on account of their personal
weakness. This arrangement does not weaken the group, but gives
to it new strength. The animal group in which mutual aid is best
developed is best fit to maintain itself in the strife. That which,
according to the narrow conception, appeared as a cause of
weakness, becomes just the reverse, a cause of strength. The
sociable animals are in a position to beat those that carry on the
struggle individually. This so-called degenerating and deteriorating
race carries off the victory and practically proves itself to be the most
skilful and best.
Here we first see fully how near-sighted, narrow and unscientific
are the claims and arguments of the bourgeois Darwinists. Their
natural laws and their conceptions of what is natural are derived from
a part of the animal world, from those which man resembles least,
while those animals that practically live under the same
circumstances as man are left unobserved. The reason for this can
be found in the bourgeoisie’s own circumstances; they themselves
belong to a class where each competes individually against the
other; therefore, they see among animals only that form of the
struggle for existence. It is for this reason that they overlook those
forms of the struggle that are of greatest importance to men.
It is true that these bourgeois Darwinists are aware of the fact that
man is not ruled by mere egoism without regard for his neighbors.
The bourgeois scientists say very often that every man is possessed
of two feelings, the egotistical, or self-love, and the altruistic, the love
of others. But as they do not know the social origin of this altruism,
they cannot understand its limitations and conditions. Altruism in
their mouths becomes a very indistinct idea which they don’t know
how to handle.
Everything that applies to the social animals applies also to man.
Our ape-like ancestors and the primitive men developing from them
were all defenseless, weak animals who, as almost all apes do, lived
in tribes. Here the same social motives and instincts had to arise
which later developed to moral feelings. That our customs and
morals are nothing other than social feelings, feelings that we find
among animals, is known to all; even Darwin spoke about “the habits
of animals which would be called moral among men.” The difference
is only in the measure of consciousness; as soon as these social
feelings become clear to men, they assume the character of moral
feelings. Here we see that the moral conception—which bourgeois
authors considered as the main distinction between men and
animals—is not peculiar to men, but is a direct product of conditions
existing in the animal world.
It is in the nature of the origin of these moral feelings that they do
not spread further than the social group to which the animal or the
man belongs. These feelings serve the practical object of keeping
the group together; beyond this they are useless. In the animal
world, the range and nature of the social group is determined by the
circumstances of life, and therefore the group almost always
remains the same. Among men, however, the groups, these social
units, are ever changing in accordance with economic development,
and this also changes the social instincts.
The original groups, the tribes of the wild and barbarian peoples,
were more strongly united than the animal groups. Family
relationship and a common language strengthened this union further.
Every individual had the support of the entire tribe. Under such
conditions, the social motives, the moral feelings, the subordination
of the individual to the whole, must have developed to the utmost.
With the further development of society, the tribes are dissolved and
their places are taken by new unions, by towns and peoples. New
formations step into the place of the old ones, and the members of
these groups carry on the struggle for existence in common against
other peoples. In equal ratio with economic development, the size of
these unions increases, the struggle of each against the other
decreases, and social feelings spread. At the end of ancient times
we find that all the people known then formed a unit, the Roman
Empire, and at that time arose the theory—the moral feelings having
their influence on almost all the people—which led to the maxim that
all men are brothers.
When we regard our own times, we see that economically all the
people form one unit, although a very weak one; nevertheless the
abstract feeling of brotherhood becomes ever more popular. The
social feelings are strongest among members of the same class, for
classes are the essential units embodying particular interests and
including certain members. Thus we see that the social units and
social feelings change in human society. These changes are brought
about by economic changes, and the higher the stage of economic
development, the higher and nobler the social feelings.