Abstract—In this paper, we adapt two existing methods to perform semi-supervised temporal clustering: Aligned Cluster Analysis (ACA), a temporal clustering algorithm, and Constrained Spectral Clustering, a semi-supervised clustering algorithm. In the first method, we add side information in the form of pairwise constraints to its objective function, and in the second, we add a temporal search to its framework. We also extend both methods by propagating the constraints throughout the whole similarity matrix. To validate the advantage of the proposed semi-supervised methods for temporal clustering, we compare them with their original versions, as well as with another semi-supervised temporal clustering method, on three temporal datasets. The results show that the proposed methods are competitive and provide good improvement over the unsupervised approaches.

Keywords—Semi-supervised clustering; Temporal segmentation; Kernel k-means.

I. INTRODUCTION

Temporal clustering (TC) can be defined as the factorization of multiple time series into a set of non-overlapping segments that belong to k temporal clusters [1]. Like conventional clustering, temporal clustering requires a similarity measure, a clustering algorithm, and an evaluation criterion; however, the temporal nature of the data requires special treatment in one or more of these components. The two major ways to handle time series are either to modify existing static data clustering algorithms to handle time, or to convert the time series into a form that static algorithms can work on. The former approach relies, in most cases, on replacing the similarity measure with one appropriate for time series, for example, dynamic time warping (DTW). The latter maps the time series into a different representation or domain that embeds the temporal information, such as wavelet, Fourier, and Haar transforms [2], or into a number of model parameters, and then applies a conventional static data clustering algorithm.

Clustering time series is a tool that can be applied to many problems, such as data mining, visualization, and segmentation. The problem of segmentation is especially important in applications like human motion analysis, audio-visual emotion analysis, and animal behaviour analysis. Solving the problem of segmentation using a clustering approach requires modelling the temporal variability of the segments in a time series. As a result, simply applying some of the traditional time series clustering techniques, such as the ones reviewed in [3], may not produce satisfactory results. Most of these techniques assume that the time series are already segmented, and this is the main difference between traditional time series clustering and what is being defined here as TC. One technique that uses a clustering approach to segment the data is Aligned Cluster Analysis (ACA) [4], which frames this problem as an energy-based temporal clustering and solves it via dynamic programming.

As with the general clustering problem, temporal segmentation using clustering may not produce satisfactory results because it is a totally unsupervised approach. In some situations, prior high-level knowledge about the clusters is known, or some labeled data is available. This information can be used to aid the clustering algorithm, and this class of learning is known as semi-supervised clustering.

The most common way to add supervisory information is the use of two kinds of pairwise constraints: must-link and cannot-link. In must-link constraints, two points must be in the same cluster, and in cannot-link constraints, two points should not be in the same cluster. This type of constraint is a more general form of supervisory information than labeled data, and it can be expanded by using transitive closure on the must-link relations. Some popular unsupervised algorithms that have been adapted into a semi-supervised framework are Constrained K-means Clustering with Background Knowledge [5], Semi-supervised Kernel Mean Shift Clustering [6], Semi-supervised Kernel K-means [7], and Constrained Spectral Clustering [8]. However, none of these methods is designed to handle temporal data.

In this work, we investigate the development of semi-supervised temporal clustering methods. First, we present Semi-Supervised Aligned Clustering Analysis (SSACA), an adaptation of the successful temporal clustering method Aligned Cluster Analysis (ACA) [4], to which we add side information in the form of pairwise constraints. Second, we change Constrained Spectral Clustering, which is a semi-supervised clustering method, into a temporal semi-supervised method using the same framework. We also add an exhaustive constraint propagation algorithm to both methods to improve their performance. We test our methods on both synthetic and real-world datasets.

II. RELATED WORK

Segmentation and clustering of time series are problems in various fields, including video segmentation [9], facial expression interpretation [4], animal behavior recognition [10], human-motion analysis [11], and stock market analysis. However, an in-depth understanding of this problem is necessary to differentiate the techniques used to solve it.

It is noteworthy that the TC problem addressed in this paper is different from clustering time series, which refers to the problem of clustering time series when they are already segmented. The proposed approach performs temporal segmentation and clustering of similar segments simultaneously. Another way of doing temporal segmentation is by change-point detection [12], [13]. Change-point detection consists of an analysis of changes in the distribution of the points within a window of temporal observations in order to locate their boundaries. However, this technique detects only local boundaries and does not provide a global model for temporal events. HACA [11] and ACA [4] solve this problem by minimizing the errors across various segments for each of the k clusters.

In terms of semi-supervised methods, we have identified two approaches that add supervision to temporal clustering. Rizoiu et al. [14] propose Temporal-Driven Constrained K-Means (TDCK-Means), an extension of k-means that incorporates a temporal-aware dissimilarity measure, which combines Euclidean distance in the multidimensional space with the distance between timestamps. TDCK-Means assumes that adjacent observations of the same entity should be assigned to the same cluster, and based on this assumption it creates a contiguity penalty function that penalizes contiguous observations that are not assigned to the same cluster. Therefore, external knowledge is embedded in the algorithm based on the notion that similar observations should be close in time. Essentially, the method imposes soft must-link constraints between all pairs of observations belonging to the same entity.

De la Torre and Agell [15] propose a system that summarizes a user's daily activities, such as sleeping, walking, and working, by combining data from body sensors, GPS, computer monitoring data, videos, and audio. The paper describes a semi-supervised temporal clustering algorithm to group large amounts of multimodal data into different activities. Similar to [14], the constraints are used to penalize non-smooth changes (over time) in the assigned clusters. In this method, there is no control over the granularity of the temporal term, and there is also no robust metric between time series.

where $x_i \in \mathbb{R}^d$ (see notation¹) is a vector representing the $i$th data point, $\phi(\cdot)$ is a nonlinear mapping function, and $z_c \in \mathbb{R}^d$ represents the centroid of the data points in cluster $c$. Matrix $G \in \{0, 1\}^{k \times n}$ indicates, in a binary format, the cluster to which $x_i$ belongs, and $Z \in \mathbb{R}^{d \times k}$ holds the actual values of the means.

ACA considers distances between segments, which can even be of different lengths, instead of distances between points. The chosen distance for ACA is the Dynamic Time Alignment Kernel (DTAK), which is an extension of DTW. DTAK, unlike DTW, satisfies the triangle inequality. In DTAK, the distance between two sequences, $X = [x_1, \dots, x_{n_x}] \in \mathbb{R}^{d \times n_x}$ and $Y = [y_1, \dots, y_{n_y}] \in \mathbb{R}^{d \times n_y}$, can be computed recursively (details in [11]).

The goal of ACA is to decompose a sequence $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ into $m$ disjoint segments, each of which belongs to one of $k$ clusters. Each segment is constrained by a maximum length $n_{max}$, which controls the temporal granularity of the segmentation. The $i$th segment, $Y_i = X_{[s_i, s_{i+1})}$, begins at position $s_i$ and ends at $s_{i+1} - 1$, such that $n_i = s_{i+1} - s_i \le n_{max}$. An indicator matrix $G \in \{0, 1\}^{k \times m}$ assigns each segment to a cluster; $g_{ci} = 1$ if $Y_i$ belongs to cluster $c$.

ACA achieves temporal clustering by minimizing:

$$
J_{aca}(G, s) = \sum_{c=1}^{k} \sum_{i=1}^{m} g_{ci} \underbrace{\left\| \psi\left(X_{[s_i, s_{i+1})}\right) - z_c \right\|^2}_{\mathrm{dist}_\psi^2(Y_i, z_c)} = \left\| [\psi(Y_1), \dots, \psi(Y_m)] - ZG \right\|_F^2, \qquad (2)
$$

$$
\text{s.t. } G^T 1_k = 1_m \text{ and } s_{i+1} - s_i \in [1, n_{max}],
$$

where $G \in \{0, 1\}^{k \times m}$ is a cluster indicator matrix, and $s \in \mathbb{R}^{m+1}$ is the vector of segment boundaries, with $Y_i = X_{[s_i, s_{i+1})}$. In the case of ACA, $\mathrm{dist}_\psi^2(Y_i, z_c)$ is the squared distance between the $i$th segment and the center of cluster $c$ in the nonlinearly mapped feature space represented by $\psi(\cdot)$.
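To make the objective in (2) more concrete, the following Python sketch evaluates it for a fixed segmentation and cluster assignment. It is only an illustration under stated assumptions, not the authors' implementation: the DTAK recursion is written in its commonly cited form (the paper defers the details to [11]), the frame kernel is assumed to be Gaussian with a hypothetical bandwidth `sigma`, and the function names `dtak` and `aca_objective` are invented for this example. The squared distance to a cluster center is expanded with the usual kernel k-means identity, so the mapping $\psi(\cdot)$ is never computed explicitly.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Frame-level kernel between two d-dimensional frames (assumed Gaussian)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def dtak(X, Y, sigma=1.0):
    """Dynamic Time Alignment Kernel between two segments.

    X and Y are non-empty lists of frames (one frame per time step) and may
    have different lengths. Assumed recursion: a diagonal step adds 2*kappa,
    a horizontal or vertical step adds kappa; the accumulated score is
    normalised by len(X) + len(Y).
    """
    nx, ny = len(X), len(Y)
    p = [[float("-inf")] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            kappa = gaussian_kernel(X[i], Y[j], sigma)
            if i == 0 and j == 0:
                p[i][j] = 2.0 * kappa
                continue
            best = float("-inf")
            if i > 0:
                best = max(best, p[i - 1][j] + kappa)
            if j > 0:
                best = max(best, p[i][j - 1] + kappa)
            if i > 0 and j > 0:
                best = max(best, p[i - 1][j - 1] + 2.0 * kappa)
            p[i][j] = best
    return p[nx - 1][ny - 1] / (nx + ny)


def aca_objective(X, s, labels, k, sigma=1.0, n_max=None):
    """Evaluate J_aca for boundaries s (length m+1) and segment labels (length m).

    Segment i is X[s[i]:s[i+1]]; labels[i] in {0, ..., k-1} plays the role
    of the indicator matrix G.
    """
    if s[0] != 0 or s[-1] != len(X):
        raise ValueError("s must start at 0 and end at len(X)")
    lengths = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    upper = n_max if n_max is not None else len(X)
    if any(n < 1 or n > upper for n in lengths):
        raise ValueError("segment lengths must lie in [1, n_max]")

    segments = [X[s[i]:s[i + 1]] for i in range(len(lengths))]
    m = len(segments)
    # Segment-level kernel matrix (m x m).
    T = [[dtak(segments[i], segments[j], sigma) for j in range(m)] for i in range(m)]

    total = 0.0
    for c in range(k):
        members = [j for j in range(m) if labels[j] == c]
        if not members:
            continue
        mc = len(members)
        within = sum(T[a][b] for a in members for b in members) / mc ** 2
        for i in members:
            cross = sum(T[i][j] for j in members) / mc
            # ||psi(Y_i) - z_c||^2 expanded with the kernel trick.
            total += T[i][i] - 2.0 * cross + within
    return total


# Toy sequence: two identical ramps followed by a constant segment.
X = [[0.0], [1.0], [2.0], [0.0], [1.0], [2.0], [5.0], [5.0], [5.0]]
s = [0, 3, 6, 9]
print(aca_objective(X, s, [0, 0, 1], k=2, n_max=4))
print(aca_objective(X, s, [0, 1, 0], k=2, n_max=4))
```

In this toy example, grouping the two identical ramps into one cluster gives a lower objective than grouping a ramp with the constant segment. A full ACA solver would additionally search over the boundaries `s` by dynamic programming, which this sketch does not attempt.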
B. Audio-visual Emotion Dataset

Audio-Visual Emotion Challenge (AVEC) [18] is an audio-visual emotion recognition dataset created for the emotion recognition challenge (AVEC 2012). The dataset consists of conversations among several participants and four stereotyped characters. Each character has a specific emotion stereotype:

[Figure 1: plot with "Number of Constraint Pairs" (0 to 50) on the horizontal axis and values from 0.25 to 0.55 on the vertical axis; legend: SSTSC+EP, SSACA, TDCK, ACA, SC.]
Fig. 1. Accuracy of synthetic data using different numbers of constraints.

TABLE IV. ACCURACY OF 14 SEQUENCES OF SUBJECT 86 FROM THE MOCAP DATASET.

Method        Accuracy (average over 14 sequences)
SSACA+EP      0.92 ± 0.08
SSTSC+EP      0.92 ± 0.07
SSACA         0.92 ± 0.07
TDCK [14]     0.49 ± 0.05
ACA [4]       0.87 ± 0.09
SC            0.74 ± 0.13

constraints.

V. RESULTS AND DISCUSSIONS

In the first experiment (Figure 1), we evaluate the methods on different numbers of constraints. Note that the accuracy of the proposed semi-supervised methods increases consistently as the number
of constraints grows. Furthermore, once the number of constraints exceeds 40 pairs, the exhaustive propagation method SSACA+EP starts to differentiate itself from its counterpart (SSACA). The process of keeping the constraints compatible with one another alone becomes very expensive for a high number of constraints; therefore, we stopped at 50 pairs.

Granted that the TDCK algorithm was originally created for multi-entity problems, and in this case we are using it in a single-entity scenario, it can be said that this algorithm may be underused; however, it is one of the few approaches that is both temporal and semi-supervised, and it is still fair to compare it with our approaches. Unlike TDCK, our methods consider distances between segments instead of distances between points, which better captures the temporal dynamics of the data. Also, the pairwise constraints are applied directly to the affected pairs, instead of being used as a single general assumption applied to all the data. These aspects favour the proposed methods over TDCK.

Tables II and III show the results for the complex problem of audio-visual emotion analysis. In both cases, SSTSC+EP and SSACA+EP, which use exhaustive propagation, performed similarly to or better than the regular propagation used in SSACA. The highly complex nature of visual features makes the spectral-based approach (SSTSC+EP) a more suitable method. For the audio features, the advantage of SSTSC+EP is more evident when we analyze the power and expectancy emotion dimensions. In this type of application, emotions are recurrent and may be far apart in time, which causes TDCK to perform poorly due to its assumption, which favours clusters that are close in time.

In the human motion segmentation dataset, we also observed improvements of the proposed methods compared to the baselines (Table IV). However, differences between the semi-supervised methods are not very visible due to the very low number of segments.

In general, the methods show substantial improvements over exclusively unsupervised methods with a minimal addition of side information. In addition, it can be seen that some specific problems, such as audio-visual emotion analysis, are better suited to SSTSC due to the spectral nature of the algorithm, especially in the case of the power and expectancy dimensions. Simpler cases, such as the synthetic dataset, are better suited to SSACA. Moreover, the use of exhaustive propagation can enhance the accuracy of the proposed methods in situations with a higher number of constraints and segments.

VI. CONCLUSION

In this paper, we propose two semi-supervised temporal clustering methods. First, we transform a kernel-based unsupervised temporal clustering method into a semi-supervised temporal clustering method by adding constraints in the form of must-links and cannot-links. Then, we transform a constrained spectral clustering method into a temporal semi-supervised clustering method. We evaluate our methods on a synthetic dataset and on two distinct complex problems. Results show substantial improvements, with minimal supervision, compared to the original unsupervised methods and another semi-supervised method.

REFERENCES

[1] M. Hoai and F. De la Torre, “Maximum margin temporal clustering,” in Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS-12), vol. 22, 2012, pp. 520–528.
[2] T. Fu, “A review on time series data mining,” Engineering Applications of Artificial Intelligence, vol. 24, no. 1, pp. 164–181, 2011.
[3] T. W. Liao, “Clustering of time series data – a survey,” Pattern Recognition, vol. 38, no. 11, pp. 1857–1874, 2005.
[4] F. Zhou, F. De la Torre, and J. F. Cohn, “Unsupervised discovery of facial events,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, June 2010, pp. 2574–2581.
[5] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl, “Constrained k-means clustering with background knowledge,” in Proceedings of the Eighteenth International Conference on Machine Learning, ser. ICML ’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001, pp. 577–584.
[6] S. Anand, S. Mittal, O. Tuzel, and P. Meer, “Semi-supervised kernel mean shift clustering,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PP, no. 99, pp. 1–1, 2013.
[7] B. Kulis, S. Basu, I. S. Dhillon, and R. J. Mooney, “Semi-supervised graph clustering: a kernel approach,” Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.
[8] Z. Lu and H. H. S. Ip, “Constrained spectral clustering via exhaustive and efficient constraint propagation,” in Proceedings of the 11th European Conference on Computer Vision: Part VI, ser. ECCV’10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 1–14.
[9] S. Arifin and P. Cheung, “Affective Level Video Segmentation by Utilizing the Pleasure-Arousal-Dominance Information,” Multimedia, IEEE Transactions on, vol. 10, no. 7, pp. 1325–1341, 2008.
[10] S. Oh, J. Rehg, T. Balch, and F. Dellaert, “Learning and inferring motion patterns using parametric segmental switching linear dynamic systems,” International Journal of Computer Vision, vol. 77, no. 1-3, pp. 103–124, 2008.
[11] F. Zhou, F. De la Torre, and J. K. Hodgins, “Hierarchical aligned cluster analysis for temporal clustering of human motion,” IEEE Transactions Pattern Analysis and Machine Intelligence (PAMI), vol. 35, no. 3, pp. 582–596, 2013.
[12] X. Xuan and K. Murphy, “Modeling changing dependency structure in multivariate time series,” in Proceedings of the 24th International Conference on Machine Learning, ser. ICML ’07. New York, NY, USA: ACM, 2007, pp. 1055–1062.
[13] Z. Harchaoui, E. Moulines, and F. R. Bach, “Kernel change-point analysis,” in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. Curran Associates, Inc., 2009, pp. 609–616.
[14] M. Rizoiu, J. Velcin, and S. Lallich, “Structuring typical evolutions using temporal-driven constrained clustering,” in Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on, vol. 1, 2012, pp. 610–617.
[15] F. De la Torre and C. Agell, “Multimodal diaries,” in Multimedia and Expo, 2007 IEEE International Conference on, July 2007, pp. 839–842.
[16] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems 16. MIT Press, 2004, pp. 321–328.
[17] R. Araujo and M. Kamel, “A semi-supervised temporal clustering method for facial emotion analysis,” in Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on, July 2014, to appear.
[18] B. Schuller, M. Valstar, R. Cowie, and M. Pantic, “Avec 2012: The continuous audio/visual emotion challenge - an introduction,” in Proceedings of the 14th ACM International Conference on Multimodal Interaction, ser. ICMI ’12. New York, NY, USA: ACM, 2012, pp. 361–362.
[19] A. Sayedelahl, R. Araujo, and M. Kamel, “Audio-visual feature-decision level fusion for spontaneous emotion estimation in speech conversations,” in Multimedia and Expo Workshops, 2013 IEEE International Conference on, July 2013, pp. 1–6.
[20] M. Shell. (2012) Carnegie mellon university motion capture database. [Online]. Available: http://mocap.cs.cmu.edu