Professional Documents
Culture Documents
fllobell@xlstat.com
1 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Introduction
Method
Illustration
Conclusion
2 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Data structure
3 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
4 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
5 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
6 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
8 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Our aim
9 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Existing method
10 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
11 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Introduction
Method
Illustration
Conclusion
12 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Preprocessing
13 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Minimization criterion
(k )
DK = ∑K m
k =1 ∑l=1 ∑i∈Gk ||xil − cl ||
2
14 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Hierarchical algorithm
15 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Partitioning algorithm
16 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Property
m K m
(k ) 2
∑ ||Xl − cl ||2 = DK + ∑ ∑ nk ||cl − cl ||
l=1 k =1 l=1
= DK + BK
17 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Index
18 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Within-clusters variationK
H(K ) = ( − 1)(n − K − 1)
Within-clusters variationK +1
DK
=( − 1)(n − K − 1)
DK +1
Hartigan, J. A. (1975)
19 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Between-clusters variationK × (n − K )
CH(K ) =
Within-clusters variationK × (K − 1)
BK × (n − K )
=
DK × (K − 1)
20 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Introduction
Method
Illustration
Conclusion
21 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Data description
22 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
23 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
25 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
26 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Introduction
Method
Illustration
Conclusion
27 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
28 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Data structure
Binary data:
=⇒ 1: Attribute checked
=⇒ 0: Attribute not checked
29 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Data description
• 9 beers
• 15 attributes
• 76 subjects
30 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Importance of standardization
31 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Indices
33 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Introduction
Method
Illustration
Conclusion
34 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
35 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Toy example
Subjects A, B and C
=⇒ 5 subjects A
=⇒ 4 subjects B
=⇒ 1 subject C
Contingency table
36 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Clustering results
CA on contingency table
Introduction
Method
Illustration
Conclusion
38 / 39
Introduction Method Illustration The case of Check-All-That-Apply data Conclusion
Conclusion
• We have introduced a clustering method of observations in
the case of data structured in several blocks of variables
• This method is based on an aggregation criterion similar to
Ward’s criterion.
• Two algorithms have been proposed
• An aid for choosing the number of clusters has been added
• A clustering quality index within each block has been
introduced
• We have investigated the benefits of the method in the
specific case of CATA data
• Perspectives: by taking account of the multiblock structure,
we could take account of:
• Specificities of some blocks (e.g. categorical variables)
• Apply specific clustering strategies to some blocks
39 / 39