
Original Article

Modeling of Inter-Sample Variation in Flow Cytometric Data with the Joint Clustering and Matching Procedure

Sharon X. Lee,1 Geoffrey J. McLachlan,1* Saumyadipta Pyne2,3

1 Department of Mathematics, University of Queensland, St. Lucia, Queensland 4072, Australia
2 Indian Institute of Public Health Hyderabad (IIPHH), Plot No. 1, A.N.V. Arcade, Amar Co-op Society, Kavuri Hills, Madhapur, Hyderabad, AP, 500033 India
3 CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, University of Hyderabad Campus, Hyderabad, AP, 500046 India

Grant sponsor: Australian Research Council (to S.X.L. and G.J.M.)
Grant sponsor: Ramalingaswami Fellowship of Department of Biotechnology, Ministry of Science and Technology (to S.P.)
Grant sponsors: Defence Research and Development Establishment (to S.P.); Ministry of Statistics and Program Implementation, Government of India (to S.P.)

Abstract
We present an algorithm for modeling flow cytometry data in the presence of large inter-sample variation. Large-scale cytometry datasets often exhibit some within-class variation due to technical effects such as instrumental differences and variations in data acquisition, as well as subtle biological heterogeneity within the class of samples. Failure to account for such variations in the model may lead to inaccurate matching of populations across a batch of samples and poor performance in classification of unlabeled samples. In this paper, we describe the Joint Clustering and Matching (JCM) procedure for simultaneous segmentation and alignment of cell populations across multiple samples. Under the JCM framework, a multivariate mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample such that the components in the mixture model may correspond to the various populations of cells, which have similar expressions of markers (that is, clusters), in the composition of the sample. For each class of samples, an overall class template is formed by the adoption of random-effects terms to model the inter-sample variation within a class. The construction of a parametric template for each class allows for direct quantification of the differences between the template and each sample, and also between each pair of samples, both within or between classes. The classification of a new unclassified sample is then undertaken by assigning the unclassified sample to the class that minimizes the distance between its fitted mixture density and each class density as provided by the class templates. For illustration, we use a symmetric form of the Kullback-Leibler divergence as a distance measure between two densities, but other distance measures can also be applied. We show and demonstrate on four real datasets how the JCM procedure can be used to carry out the tasks of automated clustering and alignment of cell populations, and supervised classification of samples. © 2015 International Society for Advancement of Cytometry

Additional Supporting Information may be found in the online version of this article.

*Correspondence to: Geoffrey J. McLachlan, Department of Mathematics, University of Queensland, St. Lucia, Queensland 4072, Australia. E-mail: g.mclachlan@uq.edu.au

Published online 22 October 2015 in Wiley Online Library (wileyonlinelibrary.com)
DOI: 10.1002/cyto.a.22789
© 2015 International Society for Advancement of Cytometry
Cytometry Part A 89A: 30–43, 2016

Key terms
flow cytometry; classification; class template; inter-sample variation; clustering; matching; skew mixture models; EM algorithm; JCM

Flow cytometry (FCM) is a powerful tool in clinical diagnosis of health disorders, in particular, immunological diseases. It offers rapid high-throughput measurements of multiple characteristics on every cell in a sample, enabling biologists to study a variety of biological processes at the cellular level. A critical component in the pipeline of FCM data analysis is the identification of cell populations from the multidimensional FCM dataset, currently performed manually by visually separating regions or gates of interest on a series of sequential bivariate projections of the data, a process known as gating. However, this approach has many problems and limitations, including variability between different analysts, non-standardization and non-reproducibility of results, an unrealistic assumption that biological relationships between the markers exist only in the projected low-dimensional space and, more importantly, non-scalability to high-dimensional analysis, especially when a large
number of samples is involved. With the advancement of technology, modern day flow cytometers allow simultaneous measurements of many markers on millions of cells, with some latest revolutionary mass cytometers capable of extending this up to 100 simultaneous parameters (1,2). As the number of markers increases, the number of bivariate projections increases rapidly. This renders conventional manual analysis practically infeasible for high-dimensional FCM analysis. Due to this and the subjective and time-consuming nature of this approach, recent efforts have turned to the development of computational methods for the analysis of high-dimensional flow cytometry (FCM) data to automate the gating process; see (3) for a recent account.

Among these methods, mixture models have been widely employed as the underlying mechanism for characterizing the heterogeneous cell populations (4–11), taking advantage of the convenient and formal framework offered by a model-based approach to modeling these complex and multimodal datasets. Using this approach, the FCM data can be conceptualized as a mixture of populations each of which consists of cells with similar expressions, the distribution of which can be characterized by a parametric density. The task of cell-population identification then translates directly to the classical problem of multivariate model-based clustering. It is well known that data generated from flow cytometric studies are often asymmetrically distributed, multimodal, as well as having longer and/or heavier tails than normal. To accommodate this, several methods use a mixture of mixtures approach (7,12) where a final cluster may consist of more than one mixture component, while some others adopt mixture models with skew distributions as components (4,5,7) to enable a single component distribution to correspond to a cluster.

In addition to cell-population identification, there is also the challenging task of aligning or matching these populations across multiple samples. This is often difficult for automated methods especially when there is large variation between the samples. Some recently proposed methods (13) attempt to mitigate this by normalizing or adjusting the data in a preprocessing step, but a more direct approach is to allow for inter-sample random effects when modeling a batch of samples.

The classification of individual samples (for example, predicting the disease state of an unlabeled sample) presents another challenging task in the pipeline of flow cytometric data analysis. To this end, several techniques have been proposed recently in the literature, most of them based on algorithms for cell-population identification mentioned above. In particular, many of these methods proceed by training a classifier such as a support vector machine (SVM) to discriminate the samples between different classes. Some examples include flowBin (3,14), SWIFT (3,15), ASPIRE (8,16), PBSC (3), Citrus (17), and flowPeaks (3,18) which train a classifier based on features derived from a fitted model or a clustering of the data. Typically, the cluster proportions (size) are used as the feature vector. An allocation of a sample to one of the predefined classes of samples is then performed on the basis of this feature vector using the classifier formed from the available training data consisting of observations on the feature vector of known classification with respect to the classes. Note that in all of these methods, only selected features are used in the post-hoc classification task. Hence their classifiers are based solely on the information in these features, ignoring other features of the underlying distributions such as their shapes or tails, which may represent biologically interesting phenomena.

Here we present the Joint Clustering and Matching (JCM) procedure for the supervised classification of FCM samples with respect to a number of predefined classes. Unlike the above mentioned methods, JCM adopts a measure to quantify the similarity or dissimilarity between the fitted parametric models in terms of differences between the class templates representing the class densities of the markers being used. Based on the calculation of distance between a pair of samples modeled by JCM as mixture distributions, their high-dimensional, and potentially multi-modal, forms can be compared with precision and rigour. Indeed, the matrix of such pairwise distances for a given class of samples can be used to quantify the overall inter-sample variation, and we describe tools for visualizing the same. The distance-based approach provides a more complete assessment of the differences between the samples and the templates for different classes and so should lead to more objective discrimination among the classes.

In constructing a template for the density of the markers for a given class of samples, JCM models the inter-sample variation in a class through the adoption of a random-effects model (REM). With the exception of ASPIRE, the methods described above do not allow for possible inter-sample variations. They either explicitly or implicitly assume the cell populations to have the same underlying distribution across all samples. In particular, the location (or mode) and the shape of the distribution of these populations are taken to be the same. These assumptions are not realistic given that the cell populations typically vary significantly across different samples; see, for example, the Lymphoma dataset described in “Lymphoma dataset” section. These assumptions are relaxed within the JCM framework by modeling each sample as an instance of a class template, possibly transformed with a flexible amount of variation. The latter is governed by a REM, thus allowing one to establish a direct parametric correspondence between the cell populations in a sample and their corresponding components in the mixture model for the class template. This formulation also means that the cell populations are automatically matched across different samples without needing any further post-processing.

Previously in Ref. 7, the effectiveness of JCM in clustering cells within a given sample and aligning the cell populations across a batch of samples was illustrated in three experiments, where JCM provided excellent results. The first two of these experiments involved multiple samples and (multiplexed) staining panels, as well as multiple time-points and/or classes. In one of these two experiments, the B-cell receptor (BCR) one, which is also examined here (but reformulated as a classification problem), JCM identified a spatio-temporal signature of BCR signaling that improved the distinction between the two classes of patients previously reported in Ref. 19. In the
third illustration, it was demonstrated how JCM was able to identify cell populations in a batch of diffuse large B-cell lymphoma (DLBCL) samples from the FlowCAP-I challenge (3) that closely matched the gated populations identified by expert analysts. In this article, we focus on the application of the JCM procedure in the subsequent stage of the FCM pipeline, sample classification, and on the modeling of the inter-sample variation within a batch of samples.

The JCM algorithm was implemented as an R package EMMIX-JCM, available from http://www.maths.uq.edu.au/gjm/mix_soft/EMMIX-JCM/.

MATERIALS AND METHODS

Overview of the Datasets

We shall demonstrate the effectiveness of JCM in automated gating and supervised classification of FCM samples using four real benchmark datasets described here.

West Nile virus (WNV) dataset. Thirteen blood samples were acquired from patients diagnosed with symptomatic WNV. These samples were stained to measure the expressions of CD4, IL17, and CFSE markers. Between 100,000 and 1,000,000 events were recorded for each sample. The samples were manually analyzed and gated. To demonstrate the ability of JCM in clustering samples to identify cell populations, we measure the level of agreement between the automated approach of JCM and manual gating in terms of the misclassification rate (MCR). Results are also compared with the competing algorithms mentioned above. For comparison purposes, we assume the partitions given by manual gating to be the ground truth. This dataset is available from https://flowrepository.org/id/FR-FCM-ZZY3.

Lymphoma dataset. Between 2003 and 2008, the British Columbia Cancer Agency collected samples from randomly selected lymph node biopsies from patients with DLBCL. Each sample was stained for measuring the expressions of three surface markers, CD3, CD5, and CD19. This dataset contains 30 samples, with each sample containing between 3,000 and 100,000 events. This dataset is known to exhibit considerable inter-sample variation due to a voltage change in the instrument settings in 2005. We use this dataset to illustrate the usefulness of REM in handling datasets with large inter-sample variation. This dataset is available from https://flowrepository.org/id/FR-FCM-ZZ4E.

Acute myeloid leukemia (AML) dataset. As part of the FlowCAP-II Sample Classification challenge (3), the AML dataset was split equally into a training set and a test set. Peripheral blood or bone marrow samples from a total of 359 patients were collected; 316 were healthy patients and 43 were diagnosed with AML (3). Due to the large number of markers used, the sample collected from each patient was assayed into eight tubes for staining by different marker combinations, with five markers from each tube. In addition, the forward scatter (FS) and side scatter (SS) were also available for each tube, yielding a seven-dimensional dataset. This gave a total of 2872 data files for analysis. The first and last tubes (i.e., tubes 1 and 8) were controls and hence not used in our analysis. The data have been transformed logarithmically for the SS measurements and all fluorescent markers, while the FS measurements remained linear. All channels have also been scaled to the unit interval during preprocessing. The dataset is available publicly from http://flowrepository.org/id/FR-FCM-ZZYA. For training, half of the dataset along with their known labels (156 healthy patients and 23 AML patients) was used. The challenge called for the construction of a classifier to label the samples in the test set (the other half consisting of data from 180 patients).

BCR dataset. The B-cell receptor (BCR) dataset contains flow cytometric measurements from 28 patients diagnosed with follicular lymphoma (FL). In brief, each sample was split into eight multiplexed panels for staining by F(ab')2 against the BCR heavy chain. For each panel, a pair of lineage markers (common to all panels) was used to label the B-cells, while another pair (different in all panels) of phospho-markers was used to measure BCR signaling characteristics, totalling 18 different markers across all panels. Measurements were recorded for all panels at basal (unstimulated) and again at 4 min after stimulation. Further technical details on this experiment and the processing of samples can be found in the Supporting Information of (19). It was observed in (20) that the FL patients can be stratified into two classes that have distinctly different survival outcomes, which is linked to the presence or absence of a subpopulation of B-cells known as the lymphoma negative prognostic (LNP) cells. For our illustration, the dataset is randomly partitioned into a training and test set with equal number of samples in each set. The task for the automated algorithms is to determine the class of FL for each patient, based on the training samples provided. This dataset is available upon request from the corresponding author.

The JCM Algorithm

The JCM methodology is based on a multi-level model-based clustering approach, where each sample is represented by a flexible finite mixture model. The latter is intrinsically linked to a class template through a REM. In brief, the JCM approach can be conceptualized as a two-level hierarchical model consisting of a lower sample-specific level and a higher batch-specific level. At the sample-specific level, each cell population in a sample is characterized by a parametric multivariate distribution. A sample with multiple populations can thus be viewed as having a mixture distribution. Note that each sample has its own mixture model, implying that all the component parameters are specific to that sample. It should be pointed out that this is significantly different to some of the other available approaches, such as SWIFT and HDPGMM, which require all samples to share the same component parameters (except the mixing proportions). Our approach gives JCM more flexibility in handling inter-sample variations. At the batch-specific level, an entire batch of samples can be modeled by a parametric multivariate template that describes the overall characteristics of the batch. To construct a representative template, JCM assumes that each
sample-specific mixture model can be effectively modeled as an instance of a batch-specific (class) template with subtle inter-sample variations. This template is taken to be representative of the class from which the batch of samples is assumed to have been drawn. The class template is constructed by adopting a random-effects (RE) approach where individual mixture models are linked to the batch mixture models through a flexible affine transformation. A further advantage of this approach is that each cluster in a sample is automatically registered with respect to the corresponding cluster of the template. A third level can be envisaged when between-class comparisons in the case of multiple classes are made in a large cohort analysis. This additional higher level can be used to study intra-class relationships in situations where, for example, multiple disease types are present, individuals are measured at different time points, or multiple experiment conditions are analyzed. With the availability of an overall template for each class, classification of unlabeled samples can be easily performed by comparing its similarity with each class template. We provide here a brief description of each of the two levels (the lower sample-specific level and upper batch-specific level) within the JCM approach. Further technical details can be found in Ref. 7.

Level one: modeling of individual samples. As mentioned above, each sample is modeled by a finite mixture model. The JCM procedure provides two options for parametric densities, the default option using the (restricted) multivariate skew t (MST) distribution and an option using its symmetric version, the multivariate t-distribution. The MST distribution has additional parameters for handling heavy tails and skewness, rendering it well suited for modeling non-elliptical and asymmetrical clusters that are typical in FCM data. Moreover, it formally encompasses the multivariate normal, t, and skew normal distributions as special or limiting cases.

To establish notation, let the p-dimensional vector $y_{jk} \in \mathbb{R}^p$ contain the measurements (on p markers) of cell j in sample k, where $j = 1, \ldots, n_k$ and $k = 1, \ldots, K$. Here $n_k$ denotes the total number of cells in sample k, and K denotes the total number of samples. We let g denote the number of components in the mixture model. It is assumed that each cell in the kth sample belongs to one of the g components. Then the density of $y_{jk}$ can be expressed as

$$f(y_{jk}; \Psi_k) = \sum_{h=1}^{g} \pi_{hk}\, f(y_{jk}; \theta_{hk}), \tag{1}$$

where $\pi_{1k}, \ldots, \pi_{gk}$ denote the mixing proportions which are non-negative and sum to one. Here, $f(y_{jk}; \theta_{hk})$ denotes a component density with parameters specified by $\theta_{hk}$ $(h = 1, \ldots, g)$. The vector $\Psi_k$ consists of all the unknown parameters of the mixture model, and is given by $\Psi_k = (\pi_{1k}, \ldots, \pi_{g-1,k}, \theta_{1k}^T, \ldots, \theta_{gk}^T)^T$, where the superscript T denotes vector or matrix transpose. In practice, the optimal value of g can either be specified a priori, or inferred from the data using some information criterion such as the Bayesian information criterion (BIC).

Regarding the choice of the component densities in (1), JCM has two options: (i) the multivariate t-distribution and (ii) a multivariate skew t-distribution. They are given by

$$t_p(y_{jk}; \mu_{jhk}, \Sigma_{hk}, \nu_{hk}) = \frac{\Gamma\left(\frac{\nu_{hk}+p}{2}\right)}{(\pi\nu_{hk})^{p/2}\, |\Sigma_{hk}|^{1/2}\, \Gamma\left(\frac{\nu_{hk}}{2}\right)} \left\{1 + \frac{d_h(y_{jk})}{\nu_{hk}}\right\}^{-\frac{\nu_{hk}+p}{2}}, \tag{2}$$

and

$$ST_p(y_{jk}; \mu_{jhk}, \Sigma_{hk}, \delta_{hk}, \nu_{hk}) = 2\, t_p(y_{jk}; \mu_{jhk}, \Sigma_{hk}, \nu_{hk})\; T_1\!\left(\delta_{hk}^T \Omega_{hk}^{-1} (y_{jk} - \mu_{hk}) \sqrt{\frac{\nu_{hk}+p}{\nu_{hk}+d_h(y_{jk})}};\; 0,\; 1 - \delta_{hk}^T \Omega_{hk}^{-1} \delta_{hk},\; \nu_{hk}+p\right), \tag{3}$$

respectively. In the above, $\mu_{jhk}$ is a (cell-specific) location parameter, $\Sigma_{hk}$ is a positive-definite scale matrix, and $\nu_{hk}$ is the degrees of freedom. The term $d_h(y_{jk}) = (y_{jk} - \mu_{jhk})^T \Sigma_{hk}^{-1} (y_{jk} - \mu_{jhk})$ denotes the squared Mahalanobis distance between $y_{jk}$ and $\mu_{jhk}$ (with $\Sigma_{hk}$ as the scale matrix). The variable $\Omega_{hk}$ is defined by $\Sigma_{hk} + \delta_{hk}\delta_{hk}^T$. We also let $T_1$ and $\Gamma(\cdot)$ denote the (scalar) t-distribution function and the Gamma function, respectively.

In the case of option (i), the t-distribution (2) allows for heavier tails than the normal distribution (as regulated by the degrees of freedom $\nu_{hk}$), thus providing a more robust approach to traditional Gaussian mixture modeling (GMM). The second option for $f(\cdot)$ provided by JCM is a skew version of (2), with an additional vector $\delta_{hk}$ consisting of p skewness parameters for handling non-symmetric distributional shapes. In recent years, many different versions of the skew t-distribution have been proposed; see Refs. 21, 22 for an overview on this topic. For our purposes, we have adopted the popular version as proposed by Ref. 4 and used in Ref. 23. The estimation of the parameters of a mixture model can be carried out using the expectation-maximization (EM) algorithm. Specific details for finite mixtures of (2) can be found in Ref. 24, and in Ref. 4 for finite mixtures of (3).

Level two: parametric batch-specific (class) template. To form a batch-specific (class) template, inter-sample variations are modeled through a set of RE terms that specify how sample-specific component distributions may vary from an overall representative mixture model (that is, the template model). More specifically, these RE terms govern how the component-location parameters $\mu_{hk}$ relate to the batch location parameter $\mu_h$. We proceed by assuming that each $\mu_{hk}$ is an affine transformation of $\mu_h$, that is,

$$\mu_{hk} = a_{hk} \odot \mu_h + b_{hk} 1_p, \tag{4}$$

where $\odot$ denotes the Hadamard (or elementwise) product, $1_p$ denotes a p-dimensional vector with all elements being one,
and the scaling and translation RE terms are independent and given by

$$a_{hk} \sim N_p(1_p, \xi_{1h}), \qquad b_{hk} \sim N_1(0, \xi_{2h}^2), \tag{5}$$

respectively. Estimation of the parameters of the mixture model with (4) and (2) or (3) can be implemented using the EM algorithm. The technical details of the fitting algorithm for our JCM model have been described in the Supporting Information of Ref. 7.

Classification of new samples. In the case of multiple classes, a template can be formed for each class with JCM, using the available training data (that is, the samples of known origin with respect to the classes). As can be observed from (1) and (4), a JCM template is characterized parametrically as a mixture model similar to the sample-specific model fitted to each sample. This facilitates the quantitative comparison between different templates. Once the class templates are constructed, they can be quantitatively compared using a range of information-based measures, such as the Kullback-Leibler (KL) divergence or the Bhattacharyya distance. For illustration, we adopt the commonly used KL distance as an example in this article, but remark that any other appropriate distance measure can be applied. In this approach, a new sample is fitted with a mixture model and its KL divergence from each of the templates is calculated. The new sample is then classified to the class associated with the smallest KL distance. We use a symmetric combination of the KL information.

More specifically, the KL distance between two continuous densities $f_1(y)$ and $f_2(y)$ is defined by

$$KL(f_1, f_2) = \int_{-\infty}^{\infty} f_1(y) \log \frac{f_1(y)}{f_2(y)}\, dy. \tag{6}$$

As the KL distance is not a symmetric measure, that is $KL(f_1, f_2) \neq KL(f_2, f_1)$, we use the mean of $KL(f_1, f_2)$ and $KL(f_2, f_1)$ as our distance measure in the experiments in “Results” section. For densities such as the multivariate normal density, the KL distance can be evaluated using a closed-form expression. For more complicated distributions where closed-form expressions may not exist, the empirical KL divergence (25) may be used as a computationally efficient approximation to the KL divergence. This is given by

$$KL_{emp}(f_1, f_2) = \frac{1}{n} \sum_{j=1}^{n} \log \frac{f_1(y_j)}{f_2(y_j)}, \tag{7}$$

where $y_1, \ldots, y_n$ is a set of n points in the domain of $f_1$ and $f_2$. Typically this is the set on which $f_1$ and $f_2$ were trained.

As remarked earlier, other distance measures can be used in place of the KL distance. For example, the Bhattacharyya distance and Hellinger distance can be used. Both of these measures are symmetric, and are given by

$$BD(f_1, f_2) = -\log \int_{-\infty}^{\infty} \sqrt{f_1(y) f_2(y)}\, dy, \tag{8}$$

and

$$HD(f_1, f_2) = \left\{1 - \int_{-\infty}^{\infty} \sqrt{f_1(y) f_2(y)}\, dy\right\}^{1/2}, \tag{9}$$

respectively.

When a new or unlabeled sample is presented to JCM, a mixture model $f_S$ is fitted to obtain a parametric representation of the sample. The model $f_S$ is compared to each class template $f_T$ using the chosen distance measure to obtain a vector of dissimilarity scores. The new sample is then assigned to the class associated with the lowest dissimilarity score.

Unlike ASPIRE and Citrus which use only certain features calculated from the predicted clusters for each sample, the classification approach of JCM is based on the fitted densities for the class templates rather than their selected features. More specifically, these other methods calculate features such as the proportions and the median of some or all of the markers for the identified clusters as a simplified representation of each sample. A classifier is then built on these features. As such, the accuracy of these classifiers can be quite sensitive to the choice of features, and is critically dependent on the distinctiveness of the selected features across the different classes. In contrast, JCM does not rely on any particular selected features from the fitted model, but rather uses the entire fit provided by the model by proceeding on the basis of the relative size of the class densities provided by the templates. When an unlabeled sample is presented to JCM, an independently fitted model of the sample is directly compared to the estimate of the parametric form of the template density for each class. The KL distance provides a measure of this difference between the estimated density of the unclassified sample and the fitted template density for each class.

RESULTS

The effectiveness of the JCM approach is illustrated on four separate datasets and compared to a number of other available methods. In the first two datasets, we are interested in the classification of FCM samples into two classes. The first dataset was part of the FlowCAP-II competition, while the second has been previously analyzed in Refs. 7,19. For the final two datasets, we examine the performance of JCM and other competing algorithms in identifying cell populations in the presence of within-class inter-sample variations. In the WNV dataset, only the abundance of a particular population was observed to be varying greatly across the samples; whereas for the Lymphoma dataset, the variations were much more profound.

Performance Evaluation Measures

To evaluate the classification performance of JCM, we compute a number of measures as adopted in Ref. 3, namely sensitivity (or recall), specificity, accuracy, precision, and F-measure. Let TP, TN, FP, and FN denote the number of true positives, true negatives, false positives, and false negatives, respectively. These measures are defined as

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \tag{10}$$
Table 1. Classification results by each algorithm on the AML dataset

Algorithm          Sensitivity  Specificity  Accuracy  Precision  F-measure  ARI    AUC
JCM                1.00         0.99         0.99      0.95       0.97       0.97   0.997
Citrus             0.95         1.00         0.99      1.00       0.97       0.97   0.975
flowMatch          0.85         1.00         0.98      1.00       0.92       0.89   0.925
SWIFT              0.65         1.00         0.96      1.00       0.79       0.73   0.825
flowPeaks          0.45         1.00         0.94      0.62       0.62       0.55   0.725
HDPGMM             0.45         1.00         0.94      0.62       0.62       0.55   0.725
flowPeakssvm       1.00         1.00         1.00      1.00       1.00       1.00   1.000
flowType-FeaLect   1.00         1.00         1.00      1.00       1.00       1.00   1.000
SPADE              1.00         1.00         1.00      1.00       1.00       1.00   1.000

$$\text{Specificity} = \frac{TN}{TN + FP}, \tag{11}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{12}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \tag{13}$$

$$\text{F-measure} = 2 \times \frac{\text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}. \tag{14}$$

In addition, we report in the second to last column of Table 1 the adjusted Rand index (ARI) (26), a popular measure of cluster agreement in the model-based clustering literature. The performance of the JCM model is also evaluated by the area under receiver operating characteristic (ROC) curve (AUC), reported in the last column of Tables 1 and 2.

For the WNV dataset, the clustering performance of JCM is assessed using the MCR, the standard measure used in statistics to evaluate the accuracy of clustering and/or classification algorithms against true labels. This MCR is based on the proportion of mislabeled observations, calculated for each permutation of the predicted cluster labels against the labels given by manual gating and the rate reported is the minimum value over all such permutations.

Other Methods

The performance of the JCM procedure is compared with some other available methods for automated clustering and sample classification, including HDPGMM (6), flowMatch (27), Citrus (17), and ASPIRE (8). Like FLAME, JCM is based on finite mixtures of skew distributions. However, FLAME performs cluster alignment across samples in a post-hoc fashion using graph-matching techniques. In contrast, the multi-level modeling strategy of JCM allows for simultaneous clustering and matching cell populations across samples in a much more natural manner. Moreover, the availability of a parametric characterization of the class templates allows for the classification of unlabeled samples to be undertaken on the basis of differences between their densities and the class densities provided by the templates. Following FLAME, Azad et al. (27) proposed flowMatch for classifying samples based on templates. The flowMatch procedure treats each sample as a leaf node of a template tree, then performs agglomerative hierarchical clustering of the nodes to obtain a template tree, using a similarity measure based on the KL distance. However, with flowMatch, each node in the template tree is characterized by a single normal mixture model. Also, cluster matching is performed between two nodes before calculating their similarity measure. In contrast, JCM automatically matches the clusters across the samples and/or templates in the fitting process. The KL distance can be calculated directly on the fitted models under the JCM framework.

Recently, Cron et al. (6) proposed a hierarchical version (HDPGMM) of the Dirichlet process Gaussian mixture model (DPGMM) for the automatic alignment of cell populations across a batch of samples. Their model has an additional layer above the DPGMM models fitted to each sample, placing a hierarchical prior over the base distribution to link these individual DPGMMs. The HDPGMM model assumes that all samples share the same component parameters with the exception of the mixing proportions. In a more recent contribution, Dundar et al. (8) extended the HDPGMM model to allow for inter-sample RE using a similar conceptual strategy to JCM. This model, known as ASPIRE, relaxes the requirement for component means to be the same across samples by

Table 2. Classification results by each algorithm on the BCR dataset

ALGORITHM   SENSITIVITY  SPECIFICITY  ACCURACY  PRECISION  F-MEASURE  AUC
JCM         0.86         0.43         0.64      0.60       0.71       0.69
Citrus      0.29         0.71         0.50      0.41       0.36       0.59
HDPGMM      0.14         0.71         0.43      0.24       0.20       0.59
flowMatch   0.00         0.91         0.71      0.00       0.00       0.52
ASPIRE      0.00         1.00         0.50      0.00       0.00       0.50
flowPeaks   0.00         1.00         0.50      0.00       0.00       0.50
SWIFT       0.00         1.00         0.50      0.00       0.00       0.50
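
As a hedged illustration of Eqs. (13) and (14), the sketch below computes precision and the F-measure in Python. The function names are hypothetical, and returning 0 when a denominator is zero is an assumption (it matches the zero entries in Table 2 but is not stated in the text). The check uses only the rounded precision and sensitivity reported for JCM in Table 2.

```python
def precision(tp, fp):
    """Eq. (13): TP / (TP + FP); returns 0.0 if nothing was predicted positive (assumed convention)."""
    total = tp + fp
    return tp / total if total else 0.0


def f_measure(prec, sens):
    """Eq. (14): 2 * Precision * Sensitivity / (Precision + Sensitivity)."""
    denom = prec + sens
    return 2.0 * prec * sens / denom if denom else 0.0


# Rounded values reported for JCM in Table 2 (precision 0.60, sensitivity 0.86).
jcm_f = f_measure(0.60, 0.86)
print(round(jcm_f, 2))
```

Because the inputs are already rounded to two decimals, recomputing the F-measure this way only approximately reproduces the tabulated values for the other algorithms.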

Cytometry Part A 89A: 30–43, 2016



Figure 1. Comparison of JCM and manual analysis of samples from the WNV dataset. The four major populations identified manually (A) have similar locations across the samples, but the abundance of the CD4+CFSE− population varies greatly between samples. The proportions of the other three populations remain relatively similar, as revealed by a heatmap of the proportions (B). The template fitted by JCM (C) captures all four populations with a parametric model. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com]

assuming that they deviate probabilistically from the corresponding template means under a REM. Using this approach, a global template is readily constructed without the need for post-hoc mode clustering as in HDPGMM.

Citrus (17) is another computational tool for sample classification and cell-population identification. Unlike the above-mentioned methods, Citrus proceeds by using a hierarchical clustering of the cells selected from each sample, and the classification of samples is based on regression methods on selected features. For scalability, Citrus applies hierarchical clustering to aggregated data consisting of randomly sampled cells from each sample rather than the data pooled from all samples. This approach removes the need for clustering individual samples, and automatically identifies local and global clusters, as well as matching them across different samples. From the hierarchical clustering of the aggregated data, the proportion of each sample in each cluster is calculated. For each sample, a feature vector is formed consisting of the cluster proportions along with other features of the sample such as the median value of each marker.

Experiment 1: Automated Gating of WNV Samples

We analyzed the 13 samples in the WNV dataset as a batch with JCM and assessed its ability to reidentify the manually gated populations in each sample. With this dataset, the experts identified four cell populations in manual analysis. Each sample was examined individually, and the populations were matched across the samples in a subsequent step. Four examples are shown in Figure 1, where the different populations as identified by manual gating are displayed in different colours. As can be observed in Figure 1A, while the relative locations of these populations were quite similar across the samples, the abundance of the CD4+CFSE− population varies considerably between the samples, ranging from 0% (sample012) to 76% (sample013); see Figure 1B. With the exception of sample013, the other three populations have very similar abundance across the samples. JCM and five competing algorithms (ASPIRE, HDPGMM, flowMatch, flowPeaks, and SWIFT) were applied to this dataset such that their templates have four global clusters. The template obtained by JCM, as shown in Figure 1C, suggests that the locations of the four populations are well recovered. Inspection of the models fitted to each sample by JCM (not shown) reveals that they are very similar to the template, but subtle variations can be observed in the mixing proportions. Indeed, the JCM model provides a quantitative measure of the inter-sample variations in the batch as part of its model fitting procedure. As described in the section “Level two: parametric batch-specific (class) template”, these are given in terms of the variance of the RE terms, namely $n_{1h}$ and $n_{2h}$ ($h = 1, \ldots, 4$). The smaller


Table 3. MCR of automated gating by various algorithms against manual analysis on the WNV dataset

SAMPLE  JCM     ASPIRE  HDPGMM  FLOWMATCH  FLOWPEAKS  SWIFT
001     0.0485  0.1880  0.1477  0.2767     0.1494     0.2459
002     0.1007  0.2026  0.1341  0.2651     0.1717     0.2300
003     0.0350  0.2568  0.2143  0.3391     0.1522     0.2172
004     0.2071  0.2945  0.2698  0.0095     0.2302     0.2121
005     0.0390  0.2806  0.1979  0.2349     0.1575     0.1913
006     0.2874  0.3116  0.2249  0.2122     0.1740     0.1242
007     0.2797  0.2794  0.2252  0.0071     0.2068     0.1656
008     0.0421  0.2535  0.2146  0.2750     0.1738     0.2121
009     0.1132  0.3196  0.2273  0.2792     0.1951     0.2258
010     0.0918  0.2373  0.2124  0.0456     0.1734     0.2338
011     0.0649  0.2344  0.1482  0.3083     0.1317     0.1612
012     0.0742  0.2076  0.1858  0.3141     0.1815     0.2556
013     0.2326  0.0437  0.0272  0.2316     0.1475     0.0935
AMCR    0.1243  0.2392  0.1869  0.2153     0.1727     0.1976
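
The MCR values in Table 3 are defined as the minimum proportion of mislabeled cells over all permutations of the predicted cluster labels. A minimal sketch of that definition follows. The function name and the toy labels are hypothetical, and the exhaustive search over permutations is an assumption made for clarity, not necessarily how the authors computed it; it is only practical for a small number of clusters such as the four populations considered here.

```python
from itertools import permutations


def misclassification_rate(true_labels, pred_labels):
    """Minimum proportion of mislabeled observations over all assignments
    of predicted cluster labels to distinct true labels."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("this sketch assumes no more predicted clusters than true classes")
    n = len(true_labels)
    best_errors = n
    # Try every way of mapping predicted cluster ids onto distinct true ids.
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        errors = sum(mapping[p] != t for t, p in zip(true_labels, pred_labels))
        best_errors = min(best_errors, errors)
    return best_errors / n


# Toy example (hypothetical labels): the clustering is correct up to relabeling
# except for the last cell.
manual = [0, 0, 1, 1, 2, 2]
predicted = [2, 2, 0, 0, 1, 0]
print(misclassification_rate(manual, predicted))
```

For larger numbers of clusters, the same minimum can be found without enumerating permutations by solving an assignment problem on the confusion matrix.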

the values of these parameters, the closer the clusters in each sample lie to those of the template. The values of $n_{1h}$ are estimated to be

$$
n_{11} = \begin{bmatrix} 0.095 & 0.000 & 0.000 \\ 0.000 & 0.079 & 0.000 \\ 0.000 & 0.000 & 0.084 \end{bmatrix}, \quad
n_{12} = \begin{bmatrix} 0.057 & 0.000 & 0.000 \\ 0.000 & 0.061 & 0.000 \\ 0.000 & 0.000 & 0.001 \end{bmatrix},
$$

$$
n_{13} = \begin{bmatrix} 0.004 & 0.000 & 0.000 \\ 0.000 & 0.034 & 0.000 \\ 0.000 & 0.000 & 0.015 \end{bmatrix}, \quad
n_{14} = \begin{bmatrix} 0.004 & 0.000 & 0.000 \\ 0.000 & 0.011 & 0.000 \\ 0.000 & 0.000 & 0.000 \end{bmatrix},
$$

respectively, for the CD4+CFSE−, CD4−CFSE−, CD4−CFSE+, and CD4+CFSE+ populations in this dataset. Similarly, the values of $n_{2h}$ are all small (around 0.1), giving an indication that the location of the clusters remains fairly constant across the samples.

On comparing the clustering results of these methods against manual analysis (Table 3), it can be observed that JCM had the lowest average MCR of 0.1243. In 8 of the 13 samples, JCM achieved the lowest MCR compared to the other five algorithms. Note that as the location of populations remains similar across the samples, algorithms that employ pooling (such as flowPeaks and SWIFT) should not be disadvantaged. Indeed, algorithms such as HDPGMM are designed to adapt to this situation. This is reflected in the average MCR of these algorithms, where HDPGMM, flowPeaks, and SWIFT were quite similar and were reasonably close to the performance of JCM. The small difference in the performance of these algorithms may possibly be attributed to the slight variations in the shape of the CD4+CFSE− population across the samples. Looking at the case of sample012, the absence of the CD4+CFSE− population did not seem to have a great impact on the clustering performance of JCM, where it achieved a low MCR of 0.0742, just under half of the next lowest MCR of 0.1815 (obtained by flowPeaks). With a relatively high average MCR above 0.2, the ASPIRE and flowMatch algorithms have poor performance compared to the other algorithms for this dataset.

Experiment 2: Identification and Matching of Cell Populations in Lymphoma Samples

To illustrate the ability of JCM to model and match clusters across samples in the presence of large inter-sample variations, we consider 30 random samples from the Lymphoma dataset. It is known that a subset of these samples was influenced by the voltage change in 2005, causing a shift in the sub-populations for the CD3 and CD5 channels. Examples from the dataset are shown in the first column of Figure 2, where noticeable differences can be observed across the samples for the location of the two clusters. In addition, the distribution of the clusters varies considerably across the samples, and some clusters exhibit non-normal features. Of interest here is how well the automated algorithms can identify and discriminate the CD3+CD5+ and CD3−CD5− populations in each sample and correctly match them across the samples.

We applied five algorithms (JCM, flowMatch, flowPeaks, SWIFT, and HDPGMM) to this dataset and compared their predicted clustering. The algorithms JCM, SWIFT, and HDPGMM identified two global clusters, whereas flowPeaks and flowMatch identified three clusters. A visual inspection of the results given by flowPeaks (column 4 of Fig. 2) indicates that the pink cluster consists mainly of cells that expressed no or very low levels of CD5. If they can be viewed as outlying observations of the CD3−CD5− cluster of cells, then flowPeaks can be considered to have only two global clusters. Regarding the matching of populations across samples, JCM and HDPGMM automatically register cell populations across samples. The flowMatch procedure performs population matching in a post-hoc step. The flowPeaks and SWIFT procedures employ data pooling and hence no matching is required.

Focusing first on samples with well-separated clusters, such as sample008 (row 2) and sample009 (row 3) of Figure 2, most of the algorithms can discriminate between the two clusters reasonably well. For sample008, it can be observed that


Figure 2. Clustering and matching of cell populations across Lymphoma samples. Density plots of the raw data of selected samples are shown in the first column. The automated gating results of JCM, flowMatch, flowPeaks, SWIFT, and HDPGMM for each of these samples are given in columns two through six, respectively. As can be observed, JCM, SWIFT, and HDPGMM identified two global clusters, whereas flowMatch and flowPeaks identified three. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com]

with the exception of SWIFT and flowMatch, the other three algorithms can identify the CD3+CD5+ population (displayed as green dots in Fig. 2). For sample009, flowPeaks and HDPGMM mislabeled a minor portion of CD3+CD5+ cells. In both these samples, flowMatch consistently mislabeled the tail portion of the CD3−CD5− population. Looking at a case where the two populations are close together (sample023, row 5 of Fig. 2), flowPeaks and HDPGMM failed to separate these populations and labeled them as one cluster. SWIFT provided a partition into two clusters, but has clearly mislabeled a large portion of the CD3−CD5− population that resides in the upper tail region. The flowMatch algorithm modeled the CD3−CD5− population with two components. On merging these components to give two meta-clusters, flowMatch provides a reasonable clustering of the data. There is, however, still a small portion of CD3−CD5− cells incorrectly labeled as CD3+CD5+. These are located near the tip of the tail of the CD3−CD5− population. As can be observed in Figure 2, JCM can capture both populations accurately and adapt more closely to the asymmetric shape of these clusters.


In the case where the CD3+CD5+ population is highly abundant compared to the CD3−CD5− population (sample013, row 4 of Fig. 2), flowMatch can identify the abundant population as one cluster, but splits the less abundant population into two clusters. Similar results can be observed in cases where the CD3−CD5− population is relatively more abundant (sample001 and sample027 of Fig. 2). In sample001, however, cells in the tail region of the CD3−CD5− population were incorrectly labeled by flowMatch. The flowPeaks and HDPGMM algorithms performed reasonably well in this sample, but are disappointing for sample013 and sample027. The poor performance in the latter two samples can be attributed to the large shift of the CD3+CD5+ population. In contrast, JCM was able to model the distribution of both populations with good precision in all three cases.

It is of interest to note that as flowPeaks and SWIFT perform clustering on the pooled data, all samples share the same classification boundaries. In effect, this is similar to drawing static gates on all samples, where the gates are based on the clustering of the pooled data. The HDPGMM model is designed to cater for some inter-sample variations. However, these are restricted to the mixing proportions only. Hence, the classification boundary remains the same for all samples. As can be observed in Figure 2, the static gates given by flowPeaks and HDPGMM failed to capture the CD3+CD5+ population in sample013, sample023, and sample027 due to the drastic shifts in the CD5 channel. For SWIFT, it resulted in the mislabeling of a large portion of the CD3−CD5− population in all samples.

Sample027 shows an example of a case where one of the populations exhibits evident skewness. In this case, flowPeaks, SWIFT, and HDPGMM have difficulty identifying cells belonging to the CD3−CD5− population, especially those in the tail of the distribution. As mentioned above, with flowMatch, this population is partitioned into two components. However, even after merging, it cannot accurately separate the two populations around the tail region. The pink component contains both the CD3−CD5− cells that lie in the tail and also some cells from the more dispersed CD3+CD5+ population. In contrast, JCM provided a much more appropriate solution.

The results suggest that all algorithms but JCM failed to recover accurately the distribution of the two populations in this dataset. Algorithms that rely on data pooling for handling a batch of samples (for example, flowPeaks and SWIFT) suffer significantly from the large variations between samples. These algorithms can be considered as applying a static gate to all samples. As a result, when the CD3+CD5+ population is located close to the CD3−CD5− population, SWIFT failed to identify a large portion of CD3−CD5− cells whereas flowPeaks can identify only a very small portion of CD3+CD5+ cells. While HDPGMM operates on individual samples and allows for some inter-sample variations, it failed to distinguish between the two populations in these cases due to its assumption that local clusters share common location and scale parameters across all samples. Hence, this algorithm is effectively assuming static gates. The flowMatch algorithm also operates on each sample individually, but matching is performed in a post-hoc manner similar to FLAME. This allows for greater flexibility in terms of the location and shape of the local clusters, but the matching of local clusters shows some limitations. To consider this further here, we examined the initial partitions given by the local clusters of flowMatch. This revealed that all samples have four local clusters. The extra cluster lies between the green and pink clusters and was merged with the green cluster on matching. Owing to the large inter-sample variation, this extra cluster consists of CD3−CD5− cells in some samples (for example, in the first three examples shown in Fig. 2), and CD3+CD5+ cells in the other samples (due to the shift of this population in these samples). However, as this cluster has a similar location across the samples and is closer to the green cluster, it was incorporated into the CD3+CD5+ meta-cluster during the matching step.

It is of interest to note that for these four algorithms, initial partitions of the data are based on normal mixture models and the final partitions are obtained by merging some of the components. Although this allows for some degree of flexibility in capturing non-normal clusters, it has significant limitations in the presence of random effects; see sample027 as an example. Unlike these algorithms, the skew t-mixture model adopted by JCM can naturally adapt to the asymmetric shape of the clusters and, by incorporating RE terms in the model specification, JCM can accommodate these samples in the presence of large inter-sample variations.

Experiment 4: Classification of AML Patients

We report in the first part of Table 1 the performance measures of JCM and five other algorithms (Citrus, flowMatch, SWIFT, flowPeaks, and HDPGMM) on the AML dataset. Since the latter three algorithms do not provide a strategy for sample classification, we follow a similar approach to (8). For HDPGMM and flowPeaks, we extract the proportions of global clusters from each sample to form a feature vector to be used for training an SVM based on the labels from the training set. For SWIFT, a template or consensus model was computed based on the pooled samples. Subsequently, the cluster size for each sample was utilized as features for training an SVM. With ASPIRE, we adopted the traditional fully supervised classification setting as above, training an SVM on both the AML and normal cases in the training set.

As each patient in the AML dataset was divided into eight tubes, we first analyzed each tube individually with JCM to obtain eight predicted class labels for each patient. Using these results, a combined probability of having AML is computed and the final predicted label is based on a threshold calculated from the training set.

Among these six algorithms, JCM achieved the highest AUC value. Also, JCM along with Citrus achieved the highest F-measure value and ARI for this dataset. It can be observed from Table 1 that JCM and Citrus produced very similar results for the remaining performance measures. Indeed, both Citrus and JCM correctly predicted 179 of the 180 test labels, with Citrus incorrectly predicting an AML patient to be


Figure 3. Inter-sample variation within (A) the normal and (B) the AML classes can be visualized in two dimensions using Multi-dimensional Scaling (MDS) based on the symmetric version of the KL distances between the JCM fitted distributions of each pair of samples.

normal, whereas JCM mislabeled a normal patient to be AML. Comparison with the results reported in FlowCAP-II (3) suggests that the performance of these two algorithms is very close to that of the top-performing algorithms [flowPeakssvm (3), flowType-FeaLect (3), and SPADE (28)] for this dataset; see the second part of Table 1. As ASPIRE did not produce any results (it terminated for unknown reasons before completing the fitting procedure), we do not report the results for ASPIRE in Table 1. However, it is noted in Table 2 in (8) that ASPIRE achieved an overall AUC of 98.9, although using a different partition into training and test sets. For flowMatch, three AML samples were mislabeled as normal. The remaining two methods, HDPGMM and SWIFT, achieved perfect specificity, but performed poorly in terms of sensitivity. In other words, they have correctly identified all normal patients, but mislabeled a number of AML patients as being normal. This is also reflected in their poor F-measure, ARI, and AUC values.

Inter-sample variation within a given class of samples can be studied in various ways. Such insights can lead to understanding of subtle subclass structures and allow for better classification. We provide a visual tool in the form of two-dimensional plots (see Figs. 3 and 4) using Multi-dimensional Scaling (MDS) based on the symmetrized KL distances between the JCM fitted multivariate mixture distributions for each pair of samples. For example, for the WNV dataset (Fig. 4A), all samples as modeled by JCM were very similar with the exception of the outlying sample (which has very different mixing proportions compared to the others), whereas much larger variation can be observed between samples in the Lymphoma dataset (Fig. 4B).

Experiment 3: Classification of FL Patients

In Ref. 19, the FL patients were manually analyzed and classified into LNP− and LNP+ classes. Here, we applied seven algorithms (JCM, ASPIRE, Citrus, flowMatch, flowPeaks, HDPGMM, and SWIFT) to this dataset. For training, half of the data was provided to each algorithm, together with their class labels. The BCR dataset presents a more challenging case for the algorithms, as the distinction between the classes is noticeably less pronounced than the AML

Figure 4. Visualization of inter-sample variation within the WNV and Lymphoma datasets through MDS reveals that (A) all but one sample (sample013) in the WNV dataset were quite similar and (B) the inter-sample variation is more pronounced in the Lymphoma dataset.
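
Figures 3 and 4 rest on pairwise symmetrized KL distances followed by MDS. The following is a minimal sketch of that pipeline, not the authors' implementation. It assumes, for illustration only, that each sample is summarized by a single multivariate Gaussian, for which the KL divergence has a closed form; the skew t-mixtures fitted by JCM would generally require a numerical approximation instead. Taking the symmetric version as the sum of the two directed divergences, using classical (eigendecomposition-based) MDS, and all function names and sample values are likewise assumptions.

```python
import numpy as np


def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N(mu0, cov0) || N(mu1, cov1))."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - d + logdet1 - logdet0)


def symmetric_kl(mu0, cov0, mu1, cov1):
    """Symmetrized KL, taken here (by assumption) as the sum of both directions."""
    return kl_gaussian(mu0, cov0, mu1, cov1) + kl_gaussian(mu1, cov1, mu0, cov0)


def classical_mds(dist, dim=2):
    """Classical MDS: embed points so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]             # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None)) # clip: KL is not a true metric
    return eigvecs[:, order] * scale


# Hypothetical per-sample summaries (mean, covariance); not data from the paper.
samples = [
    (np.array([0.0, 0.0]), np.eye(2)),
    (np.array([1.0, 0.0]), np.eye(2)),
    (np.array([4.0, 0.0]), np.eye(2)),
]
n = len(samples)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = symmetric_kl(*samples[i], *samples[j])

coords = classical_mds(dist)  # one 2D point per sample, ready for a scatter plot
```

In such a plot, a sample whose fitted distribution differs markedly from the rest, as sample013 does in the WNV dataset, would appear as a point far from the main group.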


Figure 5. Examples of LNP− and LNP+ samples in the BCR dataset. (A) 2D scatter plots (markers p.PLCg2 and p.STAT5; panel 4) of the CD20+BCL2+ population from six different samples of the LNP− class. (B) The same population from six different samples of the LNP+ class. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com]


Table 4. Comparative analysis of multiple methods on FlowCAP datasets

                    HEU VS. UE                        HVTN
METHOD              ACCURACY  PRECISION  F-MEASURE    ACCURACY  PRECISION  F-MEASURE
JCM                 0.493     0.492      0.443        1.000     1.000      1.000
flowCore-flowStats  0.545     0.556      0.500        1.000     1.000      1.000
flowType-FeaLect    0.364     0.333      0.300        1.000     1.000      1.000
SWIFT               0.636     0.667      0.600        1.000     1.000      1.000
PBSC                0.545     0.545      0.545        0.952     0.952      0.952
PramSpheres         0.364     0.364      0.364        0.904     0.904      0.904

dataset. On comparing the scatter plots of these samples (not shown), it can be observed that the distribution of the markers on the cells appears to be quite similar for all samples. Perhaps the greatest difference occurs in the distribution of the CD20+BCL2− population for the markers pSTAT5 and pPLCg2 (Fig. 5), where its shape appears to be more asymmetrical for the LNP− samples (Fig. 5A) compared to the LNP+ samples (Fig. 5B).

The procedures JCM, ASPIRE, flowMatch, flowPeaks, HDPGMM, and SWIFT identified 5, 3, 4, 22, 16, and 1 clusters, respectively, with their templates. Their performance measures are reported in Table 2. It can be observed that JCM achieved the highest F-measure value of 0.71. Citrus and HDPGMM produced results that are quite similar to each other, yielding the same specificity and AUC. However, HDPGMM had a lower sensitivity, accuracy, and precision. With zero sensitivity and precision, the performance of flowMatch, ASPIRE, flowPeaks, and SWIFT is relatively poor for this dataset. A closer look at the predicted classification labels given by ASPIRE, flowPeaks, and SWIFT reveals that they failed to discriminate between the two classes of patients, classifying all samples into one class.

DISCUSSION

We have considered the JCM procedure for the unsupervised clustering of cells within a batch of samples that exhibit large inter-sample variation and the unsupervised classification of multiple samples for multidimensional flow cytometric datasets. JCM addresses these two problems with a single multi-level framework. Firstly, JCM can perform cell-population identification and alignment across multiple samples in a fully automated manner. Secondly, the template approach of JCM provides a mathematically convenient way to classify new samples. For the former task, the effectiveness of JCM has been illustrated in Ref. 7 on three experiments, giving promising results. The focus of this article is on the modeling of inter-sample variations and the latter task of sample classification.

For the problem of supervised classification of a new, unlabeled sample to one of a number of predefined classes, a template for each class is constructed by JCM. Unlike FLAME, the class template built by JCM is characterized parametrically. This facilitates the use of divergence measures such as the KL or Bhattacharyya distance for quantitative comparison of two parametric distributions. The use of these distance measures in classifying samples thus uses a different approach compared to methods such as HDPGMM, Citrus, and ASPIRE, in which the classifier is built based on a limited number of features derived from the fitted models. In contrast, JCM is based on the fitted densities for the class templates and the sample to be classified and not just a few features of these estimated densities such as the estimated mixing proportions and component/cluster means or medians of some of the markers. We would like to remark that the use of the symmetric KL distance throughout this article is for illustration purposes. Our approach can be used with any other suitable information measures, such as the Bhattacharyya distance given in Ref. 8 and the Hellinger distance given in Ref. 9. JCM was applied, using both the KL distance and the Bhattacharyya distance, to an immune tolerance network (ITN) dataset in Ref. 29.

The JCM procedure was compared to existing benchmark methods on four real FCM datasets. On the WNV and Lymphoma datasets, where small and large inter-sample variations were observed, respectively, JCM was able to segment and model the populations identified by manual analysis with higher precision. In the AML classification challenge, JCM achieved an almost perfect AUC, the highest among the six methods considered (JCM, Citrus, flowMatch, SWIFT, flowPeaks, and HDPGMM). The JCM procedure was also able to correctly identify all AML samples, achieving a sensitivity of one, while the sensitivity of none of the other five methods exceeded 0.95. The performance of JCM and Citrus was very close to that of the top-performing algorithms in the FlowCAP-II competition for this dataset. On the BCR dataset, where the distinction between the classes is much less pronounced than in the AML dataset, JCM performed best on the basis of both the F-measure and AUC.

In addition, JCM also achieved perfect classification results for the FlowCAP-II HVTN dataset (Table 4). On the other hand, JCM’s focus on modeling the overarching class structure could make it less suitable for certain datasets for which the classification is effectively driven by a specific feature or dimension of the samples rather than their overall high-dimensional form in terms of the mixture distribution. Comparison of classification results on the FlowCAP-II HEU vs. UE dataset (Table 4) illustrates this point.

Concerning the runtime of JCM, an extensive set of simulations was performed and reported in Ref. 7. It was observed in these simulations that the computation time for JCM is linearly proportional to the number of samples, the number of events per sample, the number of markers, and


the number of clusters. To improve the runtime of JCM for the analysis of high-dimensional datasets, future work may consider the incorporation of dimensionality reduction techniques such as matrix factorization and factor analyzers.

LITERATURE CITED

1. Tanner SD, Bandura DR, Ornatsky O, Baranov VI, Nitz M, Winnik MA. Flow cytometer with mass spectrometer detection for massively multiplexed single-cell biomarker assay. Pure Appl Chem 2008;80:2627–2641.
2. Bendall SC, Simonds EF, Qiu P, Amir EaD, Krutzik PO, Finck R, Bruggner RV, Melamed R, Trejo A, Ornatsky OI, Balderas RS, Plevritis SK, Sachs K, Pe’er D, Tanner SD, Nolan GP. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 2011;332:687–696.
3. Aghaeepour N, Finak G, Hoos H, Mosmann T, Gottardo R, Brinkman RR, Scheuermann RH, The FLOWCAP Consortium, the DREAM Consortium. Critical assessment of automated flow cytometry analysis techniques. Nature Methods 2013;10:228–238.
4. Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, et al. Automated high-dimensional flow cytometric data analysis. Proc Natl Acad Sci USA 2009;106:8519–8524.
5. Frühwirth-Schnatter S, Pyne S. Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 2010;11:317–336.
6. Cron A, Gouttefangeas C, Frelinger J, Lin L, Singh SK, Britten CM, Welters MJ, van der Burg SH, West M, Chan C. Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples. PLoS Comput Biol 2013;9.
7. Pyne S, Lee SX, Wang K, Irish J, Tamayo P, Nazaire MD, Duong T, Ng SK, Hafler D, Levy R, et al. Joint modeling and registration of cell populations in cohorts of high-dimensional flow cytometric data. PLoS One 2014;9:e100334.
8. Dundar M, Akova F, Yerebakan HZ, Rajwa B. A non-parametric Bayesian model for joint cell clustering and cluster matching: Identification of anomalous sample phenotypes with random effects. BMC Bioinformatics 2014;15:314.
9. Rossin E, Lin TI, Ho HJ, Mentzer S, Pyne S. A framework for analytical characterization of monoclonal antibodies based on reactivity profiles in different tissues. Bioinformatics 2011;27:2746–2753.
10. Ho HJ, Lin TI, Chang HH, Haase HB, Huang S, Pyne S. Parametric modeling of cellular state transitions as measured with flow cytometry. BMC Bioinformatics 2012;13:S5.
11. Ho HJ, Pyne S, Lin TI. Maximum likelihood inference for mixtures of skew student-t-normal distributions through practical EM-type algorithms. Stat Comput 2012;22:287–299.
12. Naim I, Datta S, Sharma G, Cavenaugh JS, Mosmann TR. Swift: Scalable weighted iterative sampling for flow cytometry clustering. In: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP). 2010, pp 509–512.
13. Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seyfert-Margolis V, Bourcier K, Asare A, et al. Per-channel basis normalization methods for flow cytometry data. Cytometry A 2010;77:121–131.
14. O’Neill K, Aghaeepour N, Parker J, Hogge D, Karsan A, Dalal B, Brinkman RR. Deep profiling of multitube flow cytometry data. Bioinformatics 2015;31:1623–1631.
15. Naim I, Datta S, Rebhahn J, Cavenaugh JS, Mosmann TR, Sharma G. Swift: Scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 1: Algorithm design. Cytometry A 2014;85:402–421.
16. Dundar M, Yerebakan HZ, Rajwa B. Batch discovery of recurring rare classes toward identifying anomalous samples. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY: ACM, 2014, pp 223–232.
17. Bruggner RV, Bodenmiller B, Dill DL, Tibshirani RJ, Nolan GP. Automated identification of stratifying signatures in cellular subpopulations. Proc Natl Acad Sci USA 2014;111:E2770–E2777.
18. Ge Y, Sealfon SC. Flowpeaks: A fast unsupervised clustering for flow cytometry data via k-means and density peak finding. Bioinformatics 2012;28:2052–2058.
19. Irish JM, Myklebust JH, Alizadeh AA, Houot R, Sharman JP, Czerwinski DK, Nolan GP, Levy R. B-cell signaling networks reveal a negative prognostic human lymphoma cell subset that emerges during tumor progression. Proc Natl Acad Sci USA 2010;107:12747–12754.
20. Irish JM, Kotecha N, Nolan GP. Mapping normal and cancer cell signalling networks: Towards single-cell proteomics. Nat Rev Cancer 2006;6:146–155.
21. Lee SX, McLachlan GJ. On mixtures of skew-normal and skew t-distributions. Adv Data Anal Classification 2013;7:241–266.
22. Lee S, McLachlan GJ. Finite mixtures of multivariate skew t-distributions: Some recent and new results. Stat Comput 2014;24:181–202.
23. Azzalini A, Capitanio A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J R Stat Soc Ser B 2003;65:367–389.
24. McLachlan GJ, Peel D. Finite Mixture Models. New York: Wiley Series in Probability and Statistics; 2000.
25. Mesaros A, Heittola T, Palomaki K. Analysis of acoustic-semantic relationship for diversely annotated real-world audio data. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013, pp 813–817.
26. Hubert L, Arabie P. Comparing partitions. J Classification 1985;2:193–218.
27. Azad A, Rajwa B, Pothen A. Immunophenotypes of acute myeloid leukemia from flow cytometry data using templates. arXiv:1403.6358 [q-bio.QM] 2014.
28. Qiu P. Inferring phenotypic properties from single-cell characteristics. PLoS One 2012;7:e37038.
29. Pyne S, Lee S, McLachlan G. Nature and man: The goal of bio-security in the course of rapid and inevitable human development. J Ind Soc Agric Stat 2015;69:117–125.
