Computational and Statistical Methods for Analysing Big Data with Applications

Shen Liu
The School of Mathematical Sciences and the ARC Centre
of Excellence for Mathematical & Statistical Frontiers,
Queensland University of Technology, Australia

James McGree
The School of Mathematical Sciences and the ARC Centre
of Excellence for Mathematical & Statistical Frontiers,
Queensland University of Technology, Australia

Zongyuan Ge
Cyphy Robotics Lab, Queensland University
of Technology, Australia

Yang Xie
The Graduate School of Biomedical Engineering,
the University of New South Wales, Australia

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO


Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Copyright © 2016 Elsevier Ltd. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, recording, or any information storage and retrieval system,
without permission in writing from the publisher. Details on how to seek permission, further information
about the Publisher’s permissions policies and our arrangements with organizations such as the
Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website:
www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical treatment
may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and
using any information, methods, compounds, or experiments described herein. In using such information
or methods they should be mindful of their own safety and the safety of others, including parties for
whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any
liability for any injury and/or damage to persons or property as a matter of products liability, negligence
or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in
the material herein.
ISBN: 978-0-12-803732-4
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

For information on all Academic Press publications visit our website at http://store.elsevier.com/
List of Figures

Figure 5.1 A 3D plot of the drill-hole data.
Figure 6.1 95% credible intervals for effect sizes for the Mortgage default example at each iteration of the sequential design process.
Figure 6.2 Optimal design versus selected design points for (a) credit score, (b) house age, (c) years employed and (d) credit card debt for the Mortgage default example.
Figure 6.3 Posterior model probabilities at each iteration of the sequential design process for the two models in the candidate set for the airline example.
Figure 6.4 Ds-optimal designs versus the designs extracted from the airline dataset for (a) departure hour and (b) distance from the origin to the destination.
Figure 7.1 Standard process for large-scale data analysis, proposed by Peng, Leek, and Caffo (2015).
Figure 7.2 Data completeness measured using the method described in Section 7.2.1.3. The numbers indicate percentage of completeness.
Figure 7.3 Customer demographics, admission and procedure claim data for year 2010, along with 2011 DIH, were used to train the model. Later, at the prediction stage, customer demographics, hospital admission and procedure claim data for year 2011 were used to predict the number of DIH in 2012.
Figure 7.4 Average days in hospital per person by age for each of the 3 years of HCF data.
Figure 7.5 Scatter-plots for bagged regression tree results for customers born before year 1948 (those aged 63 years or older when the model was trained in 2011).
Figure 7.6 Distribution of the top 200 features among the four feature subsets: (a) in the whole population; (b) in subjects born in or after 1948; (c) in subjects born before 1948; (d) in the 1+ days group; (e) shows the percentage of the four subsets with respect to the full feature set of 915 features.
Figure 8.1 Sandgate Road and its surroundings.
Figure 8.2 Individual travel times over Links A and B on 12 November 2013.
Figure 8.3 Clusters of road users and travel time estimates, Link A.
Figure 8.4 Clusters of road users and travel time estimates, Link B.
Figure 8.5 Travel time estimates of the clustering methods and spot speed data, Link A.
List of Tables

Table 6.1 Results from analysing the full dataset for year 2000 for the Mortgage default example
Table 6.2 The levels of each covariate available for selection in the sequential design process for the Mortgage default example from Drovandi et al. (2015)
Table 6.3 Results from analysing the extracted dataset from the initial learning phase and the sequential design process for the Mortgage default example
Table 6.4 The levels of each covariate available for selection in the sequential design process for the airline example from Drovandi et al. (2015)
Table 7.1 A summary of big data analysis in healthcare
Table 7.2 Summary of the composition of the feature set. The four feature subset classes are: (1) demographic features; (2) medical features extracted from clinical information; (3) prior cost/DIH features; and (4) other miscellaneous features
Table 7.3 Performance measures
Table 7.4 Performance metrics of the proposed method, evaluated on different populations
Table 7.5 Performance metrics of predictions using feature category subsets only
Table 7.6 An example of interesting ICD-10 primary diagnosis features
Table 8.1 Proportions of road users that have the same grouping over the two links
Acknowledgment

The authors would like to thank the School of Mathematical Sciences and the ARC
Centre of Excellence for Mathematical & Statistical Frontiers at Queensland
University of Technology, the Australian Centre for Robotic Vision, and the
Graduate School of Biomedical Engineering at the University of New South Wales
for their support in the development of this book.
The authors are grateful to Ms Yike Gao for designing the front cover of this
book.
1 Introduction
The history of humans storing and analysing data dates back to about 20,000 years
ago when tally sticks were used to record and document numbers. Palaeolithic
tribespeople used to notch sticks or bones to keep track of supplies or trading activities, while the notches could be compared to carry out calculations, enabling them
to make predictions, such as how long their food supplies would last (Marr, 2015).
As one can imagine, data storage or analysis in ancient times was very limited.
However, after a long journey of evolution, people are now able to collect and
process huge amounts of data, such as business transactions, sensor signals, search
engine queries, multimedia materials and social network activities. As a 2011
McKinsey report (Manyika et al., 2011) stated, the amount of information in our
society has been exploding, and consequently analysing large datasets, referred to as big data, will become a key basis of competition, underpinning new
waves of productivity growth, innovation and consumer surplus.

1.1 What is big data?


Big data is not a new phenomenon, but one that is part of a long evolution of data
collection and analysis. Among numerous definitions of big data that have been
introduced over the last decade, the one provided by Mayer-Schönberger and
Cukier (2013) appears to be most comprehensive:
Big data is “the ability of society to harness information in novel ways to produce useful
insights or goods and services of significant value” and “things one can do at a large scale
that cannot be done at a smaller one, to extract new insights or create new forms of value.”

In the community of analytics, it is widely accepted that big data can be conceptualized by the following three dimensions (Laney, 2001):
• Volume
• Velocity
• Variety

1.1.1 Volume
Volume refers to the vast amounts of data being generated and recorded. Despite
the fact that big data and large datasets are different concepts, to most people big
data implies an enormous volume of numbers, images, videos or text. Nowadays,

the amount of information being produced and processed is increasing tremendously, which can be depicted by the following facts:
• 3.4 million emails are sent every second;
• 570 new websites are created every minute;
• More than 3.5 billion search queries are processed by Google every day;
• On Facebook, 30 billion pieces of content are shared every day;
• Every two days we create as much information as we did from the beginning of time until 2003;
• In 2007, the number of bits of data stored in the digital universe is thought to have exceeded the number of stars in the physical universe;
• The total amount of data being captured and stored by industry doubles every 1.2 years;
• Over 90% of all the data in the world were created in the past 2 years.
As claimed by Laney (2001), increases in data volume are usually handled by
utilizing additional online storage. However, the relative value of each data point
decreases proportionately as the amount of data increases. As a result, attempts
have been made to profile data sources so that redundancies can be identified and
eliminated. Moreover, statistical sampling can be performed to reduce the size of
the dataset to be analysed.

1.1.2 Velocity
Velocity refers to the pace of data streaming, that is, the speed at which data are
generated, recorded and communicated. Laney (2001) stated that the bloom of e-commerce has increased point-of-interaction speed and, consequently, the pace at which data are used to support interactions. According to the International Data Corporation
(https://www.idc.com/), the global annual rate of data production is expected to
reach 5.6 zettabytes in the year 2015, which doubles the figure of 2012. It is
expected that by 2020 the amount of digital information in existence will have
grown to 40 zettabytes. To cope with this high velocity, people need to access, process, comprehend and act on data much faster and more effectively than ever before. The major issue related to velocity is that data are being generated continuously. Traditionally, the time gap between data collection and data analysis used to be large, whereas in the era of big data this would be problematic, as a large portion of data might be wasted during such a long period. In the presence of high velocity, data collection and analysis need to be carried out as an integrated process. Initially, research interests were directed towards the large-volume characteristic, whereas companies are now investing in big data technologies that allow data to be analysed while they are being generated.

1.1.3 Variety
Variety refers to the heterogeneity of data sources and formats. Since there are
numerous ways to collect information, it is now common to encounter various types
of data coming from different sources. Before the year 2000, the most common
format of data was spreadsheets, where data are structured and fit neatly into tables or relational databases. However, in the 2010s most data are unstructured, extracted from photos, video/audio documents, text documents, sensors, transaction records, etc. The heterogeneity of data sources and formats makes datasets too complex to store and analyse using traditional methods, and significant efforts have to be made to tackle the challenges involved in such large variety.
As stated by the Australian Government (http://www.finance.gov.au/sites/
default/files/APS-Better-Practice-Guide-for-Big-Data.pdf), traditional data analysis
takes a dataset from a data warehouse, which is clean and complete with gaps filled
and outliers removed. Analysis is carried out after the data are collected and stored
in a storage medium such as an enterprise data warehouse. In contrast, big data
analysis uses a wider variety of available data relevant to the analytics problem.
The data are usually messy, consisting of different types of structured and
unstructured content. There are complex coupling relationships in big data from
syntactic, semantic, social, cultural, economic, organizational and other aspects.
Rather than interrogating the data, analysts explore them to discover insights and understandings, such as relevant variables and relationships worth exploring further.

1.1.4 Another two V’s


It is worth noting that in addition to Laney’s three Vs, Veracity and Value have
been frequently mentioned in the literature on big data (Marr, 2015). Veracity refers to the trustworthiness of the data, that is, the extent to which data are free of bias, noise and abnormality. Efforts should be made to keep data tidy and clean, and methods should be developed to prevent dirty data from being recorded. On the other hand, value refers to the amount of useful knowledge that can be extracted from data. Big data can deliver value in a broad range of fields, such as computer vision (Chapter 4 of this book), geosciences (Chapter 5), finance (Chapter 6), civil aviation (Chapter 6), health care (Chapters 7 and 8) and transportation (Chapter 8). In fact, the applications of big data are endless, and everyone benefits from them as we have entered a data-rich era.

1.2 What is this book about?


Big data analysis involves a collection of techniques that can help in extracting useful information from data. Aiming at this objective, we develop and implement advanced statistical and computational methodologies for use in various high-impact areas where big data are being collected.
In Chapter 2, classification methods will be discussed, which have been extensively implemented for analysing big data in various fields such as customer segmentation, fraud detection, computer vision, speech recognition and medical diagnosis. In brief, classification can be viewed as a labelling process for new observations, aiming at determining to which of a set of categories an unlabelled object belongs. Fundamentals of classification will be introduced first, followed by a discussion of several classification methods that have been popular in big data applications, including the k-nearest neighbour algorithm, regression models, Bayesian networks, artificial neural networks and decision trees. Examples will be provided to demonstrate the implementation of these methods.
Whereas classification methods are suitable for assigning an unlabelled observation to one of several existing groups, in practice groups of data may not have been identified. In Chapter 3, three methods for finding groups in data will be introduced: principal component analysis, factor analysis and cluster analysis. Principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through linear combinations of these variables, with the general objectives of data reduction and interpretation. Factor analysis can be considered an extension of principal component analysis, aiming to describe the covariance structure of all observed variables in terms of a few underlying factors. The primary objective of cluster analysis is to categorize objects into homogeneous groups, where objects in one group are relatively similar to each other but different from those in other groups. Both hierarchical and non-hierarchical clustering procedures will be discussed and demonstrated through applications. In addition, we will study fuzzy clustering methods, which do not assign observations exclusively to only one group. Instead, an individual is allowed to belong to more than one group, with an estimated degree of membership associated with each group.
Chapter 4 will focus on computer vision techniques, which have countless applications in many areas such as medical diagnosis, face recognition and verification systems, video camera surveillance, transportation, etc. Over the past decade, computer vision has been proven successful in solving real-life problems. For instance, the registration plate of a vehicle using a tollway is identified from the picture taken by the monitoring camera, and then the corresponding driver is notified and billed automatically. In this chapter, we will discuss how big data facilitate the development of computer vision technologies, and how these technologies can be applied in big data applications. In particular, we will discuss deep learning algorithms and demonstrate how this state-of-the-art methodology can be applied to solve large-scale image recognition problems. A tutorial will be given at the end of this chapter for the purpose of illustration.
Chapter 5 will concentrate on spatial datasets. Spatial datasets are very common in statistical analysis, since in our lives there is a broad range of phenomena that can be described by spatially distributed random variables (e.g. greenhouse gas emissions, sea level, etc.). In this chapter, we will propose a computational method for analysing large spatial datasets. An introduction to spatial statistics will be provided first, followed by a detailed discussion of the proposed computational method. The MATLAB programs used to implement this method will then be listed and discussed, and a case study of an open-pit mining project will be carried out.
In Chapter 6, experimental design techniques will be considered for analysing big data as a way of extracting relevant information in order to answer specific questions. Such an approach can significantly reduce the size of the dataset to be analysed, and potentially overcome concerns about poor quality due to, for example, sample bias. We will focus on a sequential design approach for extracting informative data. When fitting relatively complex models (e.g. those that are non-linear), the performance of a design in answering specific questions will generally depend upon the assumed model and the corresponding values of the parameters. As such, it is useful to consider prior information for such sequential design problems. We argue that this can be obtained in big data settings through the use of an initial learning phase, where data are extracted from the big dataset such that appropriate models for analysis can be explored and prior distributions of parameters can be formed. Given such prior information, sequential design is undertaken as a way of identifying informative data to extract from the big dataset. This approach is demonstrated in an example where there is interest in determining how particular covariates affect the chance of an individual defaulting on their mortgage, and we also explore the appropriateness of a model developed in the literature for the chance of a late arrival in domestic air travel. We will also show that this approach can provide a methodology for identifying gaps in big data, which may reveal limitations in the types of inferences that may be drawn.
Chapter 7 will concentrate on big data analysis in the health care industry. Healthcare administrators worldwide are striving to lower the cost of care whilst improving the quality of care given. Hospitalization is the largest component of health expenditure. Therefore, earlier identification of those at higher risk of being hospitalized would help healthcare administrators and health insurers to develop better plans and strategies. In Chapter 7, we will develop a methodology for analysing large-scale health insurance claim data, aiming to predict the number of hospitalization days. Decision trees will be applied to hospital admissions and procedure claims data, which were observed from 242,075 individuals. The proposed methodology performs well in the general population as well as in subpopulations (e.g. elderly people), as the analysis results indicate that it is reasonably accurate in predicting days in hospital.
In Chapter 8 we will study how mobile devices facilitate big data analysis. Two
types of mobile devices will be reviewed: wearable sensors and mobile phones.
The former are designed with a primary focus on monitoring the health conditions of individuals, while the latter are becoming the central computer and communication
device in our lives. Data collected by these devices often exhibit high-volume,
high-variety and high-velocity characteristics, and hence suitable methods need to
be developed to extract useful information from the observed data. In this chapter,
we will concentrate on the applications of wearable devices in health monitoring,
and a case study in transportation where data collected from mobile devices
facilitate the management of road networks.

1.3 Who is the intended readership?


The uniqueness of this book lies in the fact that it is both methodology- and application-oriented. While the statistical content of this book facilitates analysing big data, the applications and case studies presented in Chapters 4 to 8 make advanced statistical and computational methods understandable and available for implementation by practitioners in various fields of application.
The primary intended readership of this book includes statisticians, mathematicians or computational scientists who are interested in big-data-related issues, such as statistical methodologies, computing algorithms, applications and case studies. In addition, people from a non-statistical/mathematical background but with moderate quantitative analysis knowledge will find our book informative and helpful if they seek solutions to practical problems that are associated with big data.
For example, those in the field of computer vision might be interested in image
processing and pattern recognition techniques presented in Chapter 4, while the
methodology and case study presented in Chapter 7 will be of general interest to
practitioners in the health care or medical industry as well as anyone involved in
analysing large medical data sets for the purpose of answering specific research
questions. Moreover, the case study in Chapter 8, entitled ‘mobile devices in
transportation’, would be beneficial to government officials or industry managers
who are responsible for transport projects.

References
Laney, D. (2001). 3D data management: Controlling data volume, velocity, and variety. US:
META Group.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., et al. (2011). Big
data: The next frontier for innovation, competition, and productivity. US: McKinsey
Global Institute.
Marr, B. (2015). Big data: Using SMART big data, analytics and metrics to make better
decisions and improve performance. UK: John Wiley & Sons, Inc.
Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how
we live, work, and think. UK: Hachette.
2 Classification methods
Classification is one of the most popular data mining techniques, which has been extensively used in the analysis of big data in various fields such as customer segmentation, fraud detection, computer vision, speech recognition, medical diagnosis, etc. We consider classification as a labelling process for new observations; that is, it predicts categorical labels for new observations, aiming at determining to which of a set of categories an unlabelled object should belong. For example, a bank loan officer may want to evaluate whether approving a mortgage application is risky or safe, and consequently he/she needs to determine a label for the corresponding applicant (‘risky’ or ‘safe’). This is a typical classification task involving only two categories, whereas in general one may need to classify a new observation into one of k categories, where k = 2, 3, .... As we will see in the following sections, however, the principles of classification do not alter regardless of the number of classes.
In practice, a classification task is implemented through the following three
stages:
Stage 1: Specify a suitable algorithm for classification, that is, a classifier.
Stage 2: Optimize the selected classification algorithm using a set of training data.
Stage 3: Make predictions using the optimized classification algorithm.
The selection of a classifier in Stage 1 is generally data dependent, that is, different studies may favour different classification algorithms. Two issues need to be addressed when selecting a classifier. Firstly, there is a trade-off between the accuracy and computing speed of a classifier: more sophisticated classification algorithms tend to exhibit greater reliability but are associated with larger computational cost. Holding other conditions (e.g. computing power) unchanged, the selection of a classifier heavily depends on what is more desirable to the user: to make predictions quickly or to make predictions accurately. Secondly, the underlying assumptions of a classifier should always be satisfied; otherwise it should not be applied. For instance, one should not employ a linear classifier to solve nonlinear classification problems.
Stage 2 is known as the training process of a classification algorithm, which
requires a set of training data which contains observations whose labels are known.
The aim of the training process is to derive a rule that can be used to assign each
new object to a labelled class optimally. This process is implemented by looking
for a set of parameter values that minimizes the likelihood of misclassification.
In Stage 3, the optimized classification algorithm is applied to new observations
so that their labels can be predicted. Note that the reliability of predictions builds
on the assumption that new observations would behave similarly to those in the
training dataset. If this assumption is violated, the performance of the classification
algorithm would be questionable.


In this chapter, the fundamentals of classification are introduced first, followed by a discussion of several classification methods that have been popular in big data applications. Examples will be provided to demonstrate the implementation of these methods.

2.1 Fundamentals of classification


Classification is often referred to as a supervised learning process, that is, all observations in the training dataset are correctly labelled. To describe individual observations, we usually consider a set of features, or explanatory variables, which may be of any data type, such as nominal (e.g. gender), ordinal (e.g. ranking), interval (e.g. Celsius temperature) and ratio (e.g. income). By examining the relationship between these features and the corresponding class labels, a classification rule can be derived to discriminate between individuals from different classes.

2.1.1 Features and training samples


We start with the case where k = 2, that is, there are two classes of objects. Denote the labels of these two classes π1 and π2, respectively. Let R1 and R2 be two regions of possible outcomes such that if an observation falls in R1 it is categorized into π1, and if it falls in R2 it is allocated to π2. When k = 2, it is assumed that R1 and R2 are collectively exhaustive, that is, R1 and R2 jointly cover all possible outcomes. However, one should note that there may not be a clear distinction between R1 and R2, implying that they may overlap. As a consequence, classification algorithms are not guaranteed to be error-free: it is possible to incorrectly classify a π2 object as belonging to π1, or a π1 object as belonging to π2.
Assume that all objects are measured by p random variables, denoted X = [X1, ..., Xp]′. It is expected that the observed values of X in π1 represent the first class, and that they differ from those in π2.
For example, consider the following two populations:
π1 : Purchasers of a new product,
π2 : Laggards (those slow to purchase),

whereby consumers are to be categorized into one of these two classes on the basis
of observed values of the following four variables, which are presumably relevant
in this case:
x1 : Education,
x2 : Income,
x3 : Family size,
x4 : History of brand switching.
That is, an individual object, say the ith object, is characterized by

$$\mathbf{x}_i = [x_{1,i}, x_{2,i}, x_{3,i}, x_{4,i}]'.$$

If there are n objects in total, then we have the following data matrix:

$$\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n] = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ x_{3,1} & x_{3,2} & \cdots & x_{3,n} \\ x_{4,1} & x_{4,2} & \cdots & x_{4,n} \end{bmatrix}.$$

If the n labels of X, corresponding to the n columns, are known, X is considered as a training sample. We then separate X into two groups (purchasers and laggards) and examine the difference, to investigate how the label of an object can be determined by its feature values, that is, why the ith consumer is a ‘purchaser’ or a ‘laggard’ given the corresponding education, income, family size and history of brand switching.

Example: Discriminating owners from non-owners of riding mowers
In order to identify the best sales prospects for an intensive sales campaign, a riding
mower manufacturer is interested in classifying families as prospective owners or
non-owners. That is, the following two classes of families are considered:
π1 : Owners of riding mowers,
π2 : Non-owners of riding mowers.
The riding mower manufacturer has decided to characterize each family by
two features: income ($1000) and lot size (1000 ft2). Twelve families are randomly
selected from each class and the observed feature values are reported in the
following table:
Owners of riding mowers                 Non-owners of riding mowers
Income ($1000)   Lot size (1000 ft2)    Income ($1000)   Lot size (1000 ft2)
  60.0           18.4                     75.0           19.6
  85.5           16.8                     52.8           20.8
  64.8           21.6                     64.8           17.2
  61.5           20.8                     43.2           20.4
  87.0           23.6                     84.0           17.6
 110.1           19.2                     49.2           17.6
 108.0           17.6                     59.4           16.0
  82.8           22.4                     66.0           18.4
  68.0           20.0                     47.4           16.4
  93.0           20.8                     33.0           18.8
  51.0           22.0                     51.0           14.0
  81.0           20.0                     63.0           14.8

This dataset is available at http://pages.stat.wisc.edu/~rich/JWMULT02dat/T11-1.DAT.

To learn how the two features can be used to discriminate between owners and
non-owners, we display the training sample in the scatterplot below. It can be seen
that owners of riding mowers tend to have greater income and bigger lots than non-
owners, which implies that if an unlabelled family has a reasonably large income and
lot size, it is likely to be predicted as a prospective owner. Nonetheless, one should
note from the figure below that the two classes in the training sample exhibit some
overlap, indicating that some owners might be incorrectly classified as non-owners
and vice versa. The objective of training a classifier is to create a rule that minimizes
the occurrence of misclassification. For example, if a linear classifier (represented by
the solid line in the figure below) is considered to determine R1 and R2 , then its
parameters (namely, intercept and slope) are optimized such that the number of blue
circles falling into R2 plus the number of green circles falling into R1 is minimized.

[Figure: scatterplot of lot size (1000 ft2) against income ($1000) for owners and non-owners of riding mowers, with the fitted linear boundary separating region R1 (classified as owners) from region R2 (classified as non-owners).]
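As an illustration, a linear classification rule of this kind could be fitted in MATLAB as follows. This is only a sketch: the vectors income, lotsize and owner below simply restate the training sample from the table above (the variable names are ours), and linear discriminant analysis via fitcdiscr is used as one convenient linear classifier; it is not necessarily the rule depicted in the figure.

income  = [60; 85.5; 64.8; 61.5; 87; 110.1; 108; 82.8; 68; 93; 51; 81; ...
           75; 52.8; 64.8; 43.2; 84; 49.2; 59.4; 66; 47.4; 33; 51; 63];
lotsize = [18.4; 16.8; 21.6; 20.8; 23.6; 19.2; 17.6; 22.4; 20; 20.8; 22; 20; ...
           19.6; 20.8; 17.2; 20.4; 17.6; 17.6; 16; 18.4; 16.4; 18.8; 14; 14.8];
owner   = [ones(12,1); zeros(12,1)];          % 1 = owner, 0 = non-owner

lda  = fitcdiscr([income, lotsize], owner);   % linear discriminant classifier
ynew = predict(lda, [90 22; 40 15]);          % classify two unlabelled families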

2.1.2 Probabilities of misclassification and the associated costs


In addition to the occurrence of misclassification, one should also consider the cost of misclassification. Generally, the consequence of classifying a π1 object as belonging to π2 is different from the consequence of classifying a π2 object as belonging to π1. For instance, as we will see in Chapter 8, failing to diagnose a potentially fatal illness is a much more serious mistake than concluding that the disease is present when in fact it is not; that is, the cost of the former misclassification is higher than that of the latter. Therefore, an optimal classification algorithm should take into account the costs associated with misclassification.
Denote by f1(x) and f2(x) the probability density functions associated with the p × 1 random vector X for the populations π1 and π2, respectively, and denote by Ω = R1 ∪ R2 the sample space, that is, the collection of all possible observations of X. Two types of misclassification are defined as follows:

$$P(2|1) = P(\mathbf{x} \in R_2 \,|\, \pi_1) = \int_{R_2} f_1(\mathbf{x})\,d\mathbf{x},$$
$$P(1|2) = P(\mathbf{x} \in R_1 \,|\, \pi_2) = \int_{R_1} f_2(\mathbf{x})\,d\mathbf{x},$$
where P(2|1) refers to the conditional probability that a π1 object is classified as belonging to π2, computed as the volume formed by the probability density function f1(x) over the region R2. Similarly, P(1|2) denotes the conditional probability of classifying a π2 object as belonging to π1, computed as the volume formed by the probability density function f2(x) over the region R1. Furthermore, denote by p1 and p2 the prior probabilities of π1 and π2, respectively, where p1 + p2 = 1. The overall probabilities of classification can be derived as the product of the prior and conditional probabilities:
• P(a π1 object is correctly classified) = P(x ∈ R1 | π1)P(π1) = P(1|1)p1,
• P(a π1 object is misclassified) = P(x ∈ R2 | π1)P(π1) = P(2|1)p1,
• P(a π2 object is correctly classified) = P(x ∈ R2 | π2)P(π2) = P(2|2)p2,
• P(a π2 object is misclassified) = P(x ∈ R1 | π2)P(π2) = P(1|2)p2,
and then the associated costs are defined as follows:
• Cost of classifying a π1 object correctly = c(1|1) = 0 (no cost),
• Cost of classifying a π1 object incorrectly = c(2|1) > 0,
• Cost of classifying a π2 object correctly = c(2|2) = 0 (no cost),
• Cost of classifying a π2 object incorrectly = c(1|2) > 0.
The expected cost of misclassification (ECM) is calculated as follows:

$$\mathrm{ECM} = c(2|1)P(2|1)p_1 + c(1|2)P(1|2)p_2,$$

which is considered the overall cost function to be minimized. Note that if c(2|1) = c(1|2), minimizing the ECM is equivalent to minimizing the total probability of misclassification (TPM), which takes the following form:

$$\mathrm{TPM} = p_1 \int_{R_2} f_1(\mathbf{x})\,d\mathbf{x} + p_2 \int_{R_1} f_2(\mathbf{x})\,d\mathbf{x}.$$

2.1.3 Classification by minimizing the ECM


A reasonable classifier should have an ECM as small as possible, that is, it determines the regions R1 and R2 that result in the minimum ECM. Mathematically, R1 and R2 are defined, respectively, as the regions over which x satisfies the following inequalities:

$$R_1: \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \geq \frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1},$$
$$R_2: \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1}.$$

That is, an object is classified as π1 if f1(x)c(2|1)p1 ≥ f2(x)c(1|2)p2; otherwise it is classified as π2.
To implement this classification rule, one needs to evaluate (i) the probability density functions of π1 and π2, (ii) the costs of misclassification, and (iii) the prior probabilities of π1 and π2. With respect to (i), the probability density functions can be approximated using a training dataset. With respect to (ii) and (iii), it is often much easier to specify the ratios than their component parts. For example, it may be difficult to specify the exact cost of failing to diagnose a potentially fatal illness, but it may be much easier to state that failing to diagnose a potentially fatal illness is, say, ten times as risky as concluding that the disease is present when in fact it is not, that is, the cost ratio is 10.

Example: Medical diagnosis


Suppose that a researcher wants to determine a rule for diagnosing a particular disease based on a collection of features that characterize the symptoms of a patient. Clearly, the following two classes are of interest:
π1: The disease is present,
π2: The disease is not present.
The researcher has collected sufficient training data from π1 and π2 to approximate f1(x) and f2(x), respectively. It is believed that failing to diagnose the disease is five times as risky as incorrectly claiming that the disease is present. In addition, previous studies have found that about 20% of patients would be diagnosed with the disease. Therefore, the following information is available:

$$c(2|1)/c(1|2) = 5/1, \qquad p_1/p_2 = 0.2/0.8,$$

and then the classification regions are determined as follows:

$$R_1: \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \geq \frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1} = \frac{1}{5} \times \frac{0.8}{0.2} = 0.8,$$
$$R_2: \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{c(1|2)}{c(2|1)}\,\frac{p_2}{p_1} = \frac{1}{5} \times \frac{0.8}{0.2} = 0.8.$$

That is, if f1(x)/f2(x) ≥ 0.8, the corresponding patient is diagnosed with the disease; otherwise the disease is not present.
Assume that a new patient’s symptoms are characterized by x0, with f1(x0) = 0.71 and f2(x0) = 0.77. Based on the classification rule above, this patient should be diagnosed with the disease because

$$f_1(\mathbf{x}_0)/f_2(\mathbf{x}_0) = 0.922 > 0.8.$$
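This decision rule can be written as a short MATLAB sketch using the quantities from the example (the variable names are ours):

cost_ratio  = 1/5;                      % c(1|2)/c(2|1)
prior_ratio = 0.8/0.2;                  % p2/p1
threshold   = cost_ratio*prior_ratio;   % = 0.8
f1 = 0.71;  f2 = 0.77;                  % density values at the new observation x0
if f1/f2 >= threshold
    disp('Classify x0 into pi_1 (disease present)')
else
    disp('Classify x0 into pi_2 (disease not present)')
end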
Note that under particular circumstances it is nearly impossible to obtain information about the costs of misclassification or the prior probabilities. In such cases, it is reasonable to set c(2|1) = c(1|2) and p1 = p2, implying that the classification rule reduces to

$$R_1: \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \geq 1,$$
$$R_2: \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < 1.$$

In the example above, if c(2|1)/c(1|2) and p1/p2 were unknown, the new patient would have been diagnosed as being healthy, since f1(x0)/f2(x0) = 0.922 < 1. As a consequence, when carrying out classification tasks one is encouraged to obtain as much information as possible to improve the reliability of classification.

2.1.4 More than two classes


So far we have considered classifying an object into one of two classes, that is, k = 2. In general one may need to classify an object into one of k categories π1, ..., πk, where k = 2, 3, .... Suppose that the ith class πi is associated with the classification region Ri and the probability density function fi(x), and that its prior probability is pi. For i, j = 1, ..., k, the probability of classifying a πi object into πj is defined as

$$P(j|i) = P(\mathbf{x} \in R_j \,|\, \pi_i) = \int_{R_j} f_i(\mathbf{x})\,d\mathbf{x},$$

and the associated cost is denoted c(j|i). The expected cost of misclassifying a πi object, denoted ECMi, is computed as

$$\mathrm{ECM}_i = \sum_{j=1}^{k} P(j|i)\,c(j|i),$$

and the overall ECM is then defined as

$$\mathrm{ECM} = \sum_{i=1}^{k} p_i\,\mathrm{ECM}_i = \sum_{i=1}^{k} \sum_{j=1}^{k} p_i\,P(j|i)\,c(j|i).$$

By minimizing the ECM, the following classification rule applies to a new observation x0:
• x0 is classified to πj if $\sum_{i=1,\,i\neq j}^{k} p_i f_i(\mathbf{x}_0) c(j|i) \leq \sum_{i=1,\,i\neq l}^{k} p_i f_i(\mathbf{x}_0) c(l|i)$ for any l ≠ j, where j, l = 1, ..., k.
If c(j|i) is a constant for any i ≠ j, the classification rule is then dominated by the posterior probabilities. For a new observation x0, the posterior probability associated with πi is of the following form:

$$P(\pi_i | \mathbf{x}_0) = P(\mathbf{x}_0 | \pi_i)P(\pi_i)/P(\mathbf{x}_0),$$

where $P(\mathbf{x}_0|\pi_i) = f_i(\mathbf{x}_0)$, $P(\pi_i) = p_i$, and $P(\mathbf{x}_0) = \sum_{i=1}^{k} P(\mathbf{x}_0|\pi_i)P(\pi_i) = \sum_{i=1}^{k} f_i(\mathbf{x}_0)p_i$. Since P(x0) remains the same regardless of the value of i, the following classification rule applies:
• x0 is allocated to πj if fj(x0)pj ≥ fi(x0)pi for any i ≠ j, where i, j ∈ {1, ..., k}.
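A minimal MATLAB sketch of this rule (for the case of equal misclassification costs) is given below; the density and prior values are purely illustrative and not taken from the text:

f = [0.10, 0.25, 0.05];       % f_i(x0) for i = 1,...,k (hypothetical values)
p = [0.50, 0.30, 0.20];       % prior probabilities p_i (hypothetical values)
[~, label] = max(f .* p);     % x0 is allocated to the class with the largest f_i(x0)*p_i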

Now we have learnt the fundamentals of classification. In the next section, we will discuss several classifiers that have been widely employed for analysing big data.

2.2 Popular classifiers for analysing big data


The following classifiers have been found popular for analysing big data:
• k-Nearest neighbour algorithm,
• Regression models,
• Bayesian networks,
• Artificial neural networks,
• Decision trees.
In this section, we briefly discuss these classifiers (a comprehensive introduction is beyond the scope of this book). For each classifier, suitable references are provided, to which the interested reader is referred for full details.

2.2.1 k-Nearest neighbour algorithm


Being a non-parametric, instance-based method, the k-nearest neighbour (kNN) algorithm is arguably the simplest and most intuitively appealing classification procedure (Hall, Park, & Samworth, 2008). Given a new observation x0, the kNN algorithm determines its class label as follows:
• Within the training sample, select the k data points (k = 1, 2, ...) that are closest to x0, which are referred to as the k-nearest neighbours. The closeness is evaluated by some distance measure, for example, Euclidean distance.
• The class label of x0 is determined by a majority vote of its k-nearest neighbours, that is, x0 is assigned to the class that is most frequent among its k-nearest neighbours.
To demonstrate, consider the ‘German credit’ dataset, which is available at the following address:
https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
This dataset consists of records of 1000 customers characterized by 20 features, with the response variable being whether a customer is a good or bad credit risk. For the sake of simplicity, we only consider two features: duration in months and credit amount. The 1000 records are split into a training set and a test set, where the former contains the first 800 customers and the latter contains the rest. The following MATLAB code is used to implement the kNN algorithm:
y = creditrisk;                  % Response variable: good or bad credit risk
x = [duration, credit_amount];   % Duration in months and credit amount

train = 800;                     % Size of training sample
xtrain = x(1:train,:);           % Training sample
ytrain = y(1:train,:);           % Labels of training sample

mdl = fitcknn(xtrain, ytrain, 'NumNeighbors', 5);   % Train the kNN with k = 5

xtest = x(train+1:end,:);        % Test set
ytest = y(train+1:end,:);        % Labels of test set

ypredict = predict(mdl, xtest);  % Prediction of the test set

rate = sum(ypredict == ytest)/numel(ytest);   % Compute the rate of correct classification

The following table summarizes the classification results:

                      Predicted
                      Good    Bad
Actual     Good        123     16
           Bad          47     14
Results show that 68.5% of the customers in the test set were correctly labelled.
This rate may be enhanced by considering a different set of explanatory variables
(features), or a different value of k. Hall et al. (2008) provided guidelines for the
selection of k. When applying kNN to large datasets, it is computationally intensive
to conduct exhaustive searching, that is, the distance from x0 to each training
sample is computed. To make kNN computationally tractable, various nearest-
neighbour search algorithms have been proposed to reduce the number of distance
evaluations actually performed (e.g. Nene & Nayar, 1997). In addition, feature
extraction and dimension reduction can be performed to avoid the effects of
the curse of dimensionality, as in high dimensions the Euclidean distance is not
helpful in the evaluation of closeness.
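For instance, MATLAB's tree-based neighbour search could be used in place of an exhaustive search (a minimal sketch, reusing xtrain from the code above and assuming x0 is a row vector of feature values):

ns  = createns(xtrain, 'NSMethod', 'kdtree');   % build a k-d tree over the training sample
idx = knnsearch(ns, x0, 'K', 5);                % indices of the 5 nearest neighbours of x0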
Although the kNN algorithm is conceptually very simple, it can perform very
well in numerous real-life applications. For instance, as we will see in Chapter 8,
the kNN algorithm is capable of discriminating between the biosignal of a fall
and those of daily-living activities (e.g. walking, jumping, etc.), which enables
automatic fall detection.

2.2.2 Regression models


A regression model is used to estimate the relationship between a response (dependent) variable and a set of explanatory (independent) variables, indicating the strength and significance of the impact of the explanatory variables on the response variable. Generally, a regression model can be expressed in the following form:

$$Y = f(X) + \varepsilon,$$

where Y denotes the response variable, X is the collection of explanatory variables, and ε is known as the error term.
Although there are innumerable forms of regression models, for the purpose of classification we are interested in those which are capable of predicting categorical response variables. If the response variable is binary, logistic regression can be considered (Long, 1997). Denote by Y the response variable, which has two possible outcomes (0 or 1), and let X = [X1, ..., Xp]′ denote the p explanatory variables that may influence Y. The aim of logistic regression is to estimate the following conditional probability:

$$P(Y = 1 \,|\, X),$$

that is, the probability that Y is predicted as ‘1’ given values of the p explanatory variables. To ensure that P(Y = 1|X) takes values between zero and one, the logistic function is considered:

$$\sigma(t) = 1/(1 + \exp(-t)),$$

where $t \in (-\infty, +\infty)$ and $\sigma(t) \in [0, 1]$. Based on the σ(t) function, the logistic regression model is formulated as follows:

$$P(Y = 1 \,|\, X) = \frac{\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)},$$

where β0 denotes the intercept and β1, ..., βp are the p coefficients associated with X1, ..., Xp, respectively. The p + 1 parameters are usually estimated by the maximum likelihood estimator, and then P(Y = 1|x0) can be computed, where x0 denotes a new observation. If P(Y = 1|x0) is greater than a threshold value τ, the new observation is classified into the first category (Y = 1); otherwise it is assigned to the second group (Y = 0). If no other information is available (i.e. costs of misclassification and prior probabilities), τ = 0.5 is a reasonable choice.
To illustrate, let’s revisit the example of discriminating owners from non-owners of riding mowers. The logistic regression takes the following form:

$$P(Y = 1 \,|\, X) = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2)},$$

where Y = 1 stands for the group of owners, and X1 and X2 denote income and lot size, respectively. Using the dataset discussed in Section 2.1.1, the following parameter estimates are obtained: $\hat{\beta}_0 = -25.86$, $\hat{\beta}_1 = 0.110$ and $\hat{\beta}_2 = 0.963$. Now suppose we have the following new observations:

            Income   Lot size
Person 1    90       22
Person 2    64       20
Person 3    70       18

Then, based on the estimated regression model and the 0.5 threshold, it is predicted that

            Probability of being an owner   Classification result
Person 1    0.9947                          Owner
Person 2    0.6089                          Owner
Person 3    0.3054                          Non-owner
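These estimates and predictions could be reproduced with MATLAB's glmfit and glmval (a minimal sketch, assuming income, lotsize and a 0/1 label vector owner hold the training data from Section 2.1.1, as in the earlier sketch):

b    = glmfit([income, lotsize], owner, 'binomial', 'link', 'logit');  % [beta0; beta1; beta2]
xnew = [90 22; 64 20; 70 18];          % the three new observations
phat = glmval(b, xnew, 'logit');       % estimated P(Y = 1 | x) for each person
label = phat > 0.5;                    % 1 = classified as owner, 0 = non-owner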

If Y has more than two classes, one may consider a multinomial logistic regression model. Suppose that Y ∈ {1, ..., k}, that is, it has k possible outcomes. Given a collection of explanatory variables X = [X1, ..., Xp]′, the multinomial logistic regression is of the following form (Long & Freese, 2001):

$$P(Y = m \,|\, X) = \frac{\exp(\beta_{0,m|b} + \beta_{1,m|b} X_1 + \cdots + \beta_{p,m|b} X_p)}{\sum_{i=1}^{k} \exp(\beta_{0,i|b} + \beta_{1,i|b} X_1 + \cdots + \beta_{p,i|b} X_p)},$$

where b denotes the base category, and β_{0,i|b}, ..., β_{p,i|b} are the coefficients that measure the ith category in relation to the base category. That is, the multinomial logistic regression consists of k − 1 equations with k − 1 sets of parameters, each of which evaluates the likelihood of a particular outcome in comparison to a pre-determined benchmark, namely the base category. Note that β_{0,b|b} = ... = β_{p,b|b} = 0.
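In MATLAB, a multinomial logistic regression of this kind could be fitted with mnrfit and mnrval (a brief sketch: mnrfit uses the last category of y as the base category, and the matrix X, label vector y and new observation x0 are assumed to be available in the workspace):

B = mnrfit(X, y);             % (p+1)-by-(k-1) coefficients, relative to the base category
prob = mnrval(B, x0);         % estimated P(Y = m | x0) for m = 1,...,k
[~, label] = max(prob);       % x0 is assigned to the most probable category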
Moreover, to determine an optimal set of X1, ..., Xp, a variable selection algorithm can be implemented. The following MATLAB function is designed to choose from all possible regression models (i.e. all possible combinations of explanatory variables):
function [M, Iset] = allregress(y, X, criteria)
% Inputs:
%   y        - dependent variable, column vector, n-by-1
%   X        - n-by-p explanatory variable matrix
%   criteria - model selection criterion
%
% Outputs:
%   M    - statistics of the selected model
%   Iset - indices of the selected variables, a logical row vector, 1-by-k
%
% Example:
%   y = randn(100,1);
%   X = randn(100,6);
%   [M_BIC, I_BIC] = allregress(y,X);
%   [M_AIC, I_AIC] = allregress(y,X,'AIC');

if size(y,1) ~= size(X,1)
    error('y and X must have the same length')
end

if nargin <= 2
    criteria = 'BIC';            % Set default criterion
end

n = size(y,1);                   % No. of observations
k = size(X,2);                   % No. of explanatory variables
nsets = 2^k - 1;                 % No. of all possible regression models

C = NaN(1);                      % used to store criterion values
Iset = NaN(1,k);                 % used to store indices of X variables

for i = 1 : nsets
    iset = logical(dec2bin(i, k) - '0');    % binary coding of the ith subset of variables
    x = X(:,iset);
    [~,~,M] = glmfit(x,y,'binomial','link','logit');
    MSE = mean(M.resid.^2);
    switch criteria
        case 'AIC'
            c = log(MSE) + numel(M.beta)*2/n;
        case 'BIC'
            c = log(MSE) + numel(M.beta)*log(n)/n;
        case 'HQ'
            c = log(MSE) + numel(M.beta)*2*log(log(n))/n;
        otherwise
            error('Selection criterion is not defined properly')
    end
    if isnan(C)
        C = c;
        Iset = iset;
    elseif C > c
        C = c;
        Iset = iset;
    end
end
M = regstats(y,X(:,Iset));
end

2.2.3 Bayesian networks


A Bayesian network is a probabilistic graphical model that measures the conditional dependence structure of a set of random variables based on the Bayes theorem:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$

Pearl (1988) stated that Bayesian networks are graphical models that contain
information about causal probability relationships between variables and are often
used to aid in decision making. The causal probability relationships in a Bayesian
network can be suggested by experts or updated using the Bayes theorem and new
data being collected. The inter-variable dependence structure is represented by
nodes (which depict the variables) and directed arcs (which depict the conditional
relationships) in the form of a directed acyclic graph (DAG). As an example, the
following DAG indicates that the incidence of two waterborne diseases (diarrhoea
and typhoid) depends on three indicators of water samples: total nitrogen, fat/oil
and bacteria count, each of which is influenced by another layer of nodes: elevation,
flooding, population density and land use. Furthermore, flooding may be influenced
by two factors: percolation and rainfall.

[Figure: a directed acyclic graph in which percolation and rainfall point to flooding; elevation, flooding, population density and land use point to the three water sample indicators (total nitrogen, fat/oil and bacteria count); and these indicators in turn point to the two waterborne diseases (diarrhoea and typhoid).]

There are two components involved in learning a Bayesian network: (i) structure learning, which involves discovering the DAG that best describes the causal relationships in the data, and (ii) parameter learning, which involves learning about the conditional probability distributions. The two most popular methods for determining the structure of the DAG are the DAG search algorithm (Chickering, 2002) and the K2 algorithm (Cooper & Herskovits, 1992). Both of these algorithms assign equal prior probabilities to all DAG structures and search for the structure that maximizes the probability of the data given the DAG, that is, P(data|DAG) is maximized. This probability is known as the Bayesian score. Once the DAG structure is determined, the maximum likelihood estimator is employed as the parameter learning method. Note that it is often critical to incorporate prior knowledge about causal structures in the parameter learning process. For instance, consider the causal relationship between two binary variables: rainfall (large or small) and flooding (whether or not there is a flood). Apparently, the former has an impact on the latter. Denote the four corresponding events:
R: Large rainfall,
R̄: Small rainfall,
F: There is a flood,
F̄: There is no flood.
In the absence of prior knowledge, the four joint probabilities P(F, R), P(F, R̄), P(F̄, R) and P(F̄, R̄) need to be inferred using the observed data; otherwise, these probabilities can be pre-determined before fitting the Bayesian network to data. Assume that the following statement is made by an expert: if the rainfall is large, the chance of flooding is 60%; if the rainfall is small, the chance of no flooding is four times as big as that of flooding. Then we have the following causal relationships as prior information:

P(F|R) = 0.6,
P(F̄|R) = 0.4,
P(F|R̄) = 0.2,
P(F̄|R̄) = 0.8.

Furthermore, assume that meteorological data show that the chance of large rainfall is 30%, namely, P(R) = 0.3. Then the following contingency table is determined:

        R                                R̄
F       P(F,R) = P(F|R)P(R) = 0.18       P(F,R̄) = P(F|R̄)P(R̄) = 0.14
F̄       P(F̄,R) = P(F̄|R)P(R) = 0.12       P(F̄,R̄) = P(F̄|R̄)P(R̄) = 0.56

which is an example of pre-specified probabilities based on prior knowledge.
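As a quick check, the entries of this table follow directly from the product rule P(F, R) = P(F|R)P(R); a minimal MATLAB sketch is given below (the variable names are ours):

P_F_given_R    = 0.6;    % P(F | R)
P_F_given_notR = 0.2;    % P(F | R-bar)
P_R            = 0.3;    % P(R)
joint = [P_F_given_R*P_R,      P_F_given_notR*(1-P_R);      % P(F,R)     P(F,R-bar)
         (1-P_F_given_R)*P_R,  (1-P_F_given_notR)*(1-P_R)]; % P(F-bar,R) P(F-bar,R-bar)
% joint = [0.18 0.14; 0.12 0.56], matching the contingency table above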


For the purpose of classification, the naïve Bayes classifier has been extensively applied in various fields, such as the classification of text documents (spam or legitimate email, sports or politics news, etc.) and automatic medical diagnosis (to be introduced in Chapter 8 of this book). Denote by y the response variable, which has k possible outcomes, that is, y ∈ {y1, ..., yk}, and let x1, ..., xp be the p features that characterize y. Using the Bayes theorem, the conditional probability of each outcome, given x1, ..., xp, is of the following form:

$$P(y_i \,|\, x_1, \ldots, x_p) = \frac{P(x_1, \ldots, x_p \,|\, y_i)\,P(y_i)}{P(x_1, \ldots, x_p)}.$$

Note that the naïve Bayes classifier assumes that x1, ..., xp are mutually independent. As a result, P(yi | x1, ..., xp) can be re-expressed as follows:

$$P(y_i \,|\, x_1, \ldots, x_p) = \frac{P(y_i) \prod_{j=1}^{p} P(x_j \,|\, y_i)}{P(x_1, \ldots, x_p)},$$

which is proportional to $P(y_i) \prod_{j=1}^{p} P(x_j | y_i)$. The maximum a posteriori decision rule is applied, and the most probable label of y is determined as follows:

$$\hat{y} = \arg\max_{i \in \{1, \ldots, k\}} P(y_i) \prod_{j=1}^{p} P(x_j \,|\, y_i).$$

An illustration of the naïve Bayes classifier is provided here. Let’s revisit the ‘German credit’ dataset. This time, we consider the following features: duration in months, credit amount, gender, number of people being liable, and whether a telephone is registered under the customer’s name. The setting of the training and test sets remains the same. The following MATLAB code is implemented:
y = creditrisk; % Response variable
x = [duration, credit_amount, male, nppl, tele]; % Features
train = 800; % Size of training sample
xtrain = x(1:train,:); % Training sample
ytrain = y(1:train,:); % Labels of training sample
xtest = x(train+1:end,:); % Test set
ytest = y(train+1:end,:); % Labels of test set
nbayes = fitNaiveBayes(xtrain,ytrain); % Train the naïve Bayes classifier
ypredict = nbayes.predict(xtest); % Prediction of the test set
rate = sum(ypredict == ytest)/numel(ytest); % Compute the rate of correct classification

Again, 68.5% of the customers in the test set were correctly labelled. The results
are summarized in the following table:

                      Predicted
                      Good    Bad
Actual    Good        121     18
          Bad         45      16
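
If the Statistics Toolbox is available, a table of this kind can also be produced directly from the predictions; the snippet below is one possible way of doing so rather than the code used in the book:

C = confusionmat(ytest, ypredict);   % rows: actual classes, columns: predicted classes
accuracy = sum(diag(C))/sum(C(:));   % overall rate of correct classification (0.685 here)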

For modelling time series data, dynamic Bayesian networks can be employed to
evaluate the relationships among variables at adjacent time steps (Ghahramani,
2001). A dynamic Bayesian network assumes that an event has an impact on another
in the future but not vice versa, implying that directed arcs should flow forward in
time. A simplified form of dynamic Bayesian networks is known as the hidden Markov
model. Denote the observation at time t by Yt, where t is the integer-valued time
index. As stated by Ghahramani (2001), the name ‘hidden Markov’ originates
from two assumptions: (i) a hidden Markov model assumes that Yt was generated
by some process whose state St is hidden from the observer, and (ii) the states of
this hidden process satisfy the first-order Markov property, where the rth-order
Markov property refers to the situation that, given St-1, ..., St-r, St is independent
of Sτ for τ < t - r. The first-order Markov property also applies to Yt with respect
to the states, that is, given St, Yt is independent of the states and observations at all
other time indices. The following figure visualizes the hidden Markov model:

Mathematically, the causal structure of a sequence of states and observations can be
expressed as follows:

P(Y1, S1, ..., YT, ST) = P(S1) P(Y1 | S1) ∏_{t=2}^{T} P(St | St-1) P(Yt | St).
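
For discrete states and observations this factorization can be evaluated directly, as in the MATLAB sketch below. The inputs are hypothetical: pS1 is the initial state distribution, A(i, j) = P(St = j | St-1 = i) is the transition matrix, B(i, v) = P(Yt = v | St = i) is the emission matrix, and S and Y are integer-coded state and observation sequences:

function p = hmm_joint(S, Y, pS1, A, B)
% Joint probability P(Y1, S1, ..., YT, ST) of a state/observation sequence
p = pS1(S(1)) * B(S(1), Y(1));
for t = 2:numel(S)
    p = p * A(S(t-1), S(t)) * B(S(t), Y(t));
end
end

In practice the states are hidden, so quantities such as P(Y1, ..., YT) are obtained by summing this joint probability over all possible state sequences, which the forward algorithm does efficiently.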

Hidden Markov models have shown potential in a wide range of data mining
applications, including digital forensics, speech recognition, robotics and bioinfor-
matics. The interested reader is referred to Bishop (2006) and Ghahramani (2001)
for a comprehensive discussion of hidden Markov models.
In summary, Bayesian networks appear to be a powerful method for combining
information from different sources with varying degrees of reliability. More details
of Bayesian networks and their applications can be found in Pearl (1988) and
Neapolitan (1990).

2.2.4 Artificial neural networks


The development of artificial neural networks (ANNs), or simply neural networks,
is inspired by biological neural networks (e.g. the neural network of a human brain);
ANNs are used to approximate functions that depend on a large number of inputs. The
basic unit of an ANN is known as the perceptron (Rosenblatt, 1958), which is a
single-layer neural network. The operation of perceptrons is based on error-correction
learning (Haykin, 2008), aiming to determine whether an input belongs to one class
or another (Freund & Schapire, 1999). Given an input vector x = [+1, x1, ..., xm]′,