MULTIDIMENSIONAL SCALING (MDS)
AN INTRODUCTION
TEXT FOR REFERENCE
AN INTRODUCTION TO APPLIED
MULTIVARIATE ANALYSIS WITH R
(2011)
Statistical Structures in Data, PGDBA Programme, ISI Kolkata, 2022 December 16 & 21, 2022 2
WHAT IS MDS?
APPROACHES TO MDS
SIMILARITIES VS DISTANCES/DISSIMILARITIES
DISTANCES VS DISSIMILARITIES
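• A DISSIMILARITY MEASURE FOR TWO OBJECTS ESSENTIALLY MEASURES THE SAME PROPERTY
AS A DISTANCE (HOW DIFFERENT THE TWO OBJECTS ARE) BUT IS MORE GENERAL.
• IT NEED NOT SATISFY ALL THE METRIC PROPERTIES OF A DISTANCE (NON-NEGATIVITY, ZERO
SELF-DISTANCE, SYMMETRY AND THE TRIANGLE INEQUALITY); FOR EXAMPLE, THE TRIANGLE
INEQUALITY MAY FAIL.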
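A quick way to tell whether a given dissimilarity matrix also qualifies as a distance is to test the metric properties directly. The sketch below assumes a square NumPy array as input. The function name `metric_violations` and the tolerance `tol` are illustrative choices and do not come from any library.

```python
import numpy as np

def metric_violations(D, tol=1e-9):
    """Return the list of metric properties that the matrix D violates."""
    D = np.asarray(D, dtype=float)
    problems = []
    if np.any(D < -tol):
        problems.append("non-negativity")
    if np.any(np.abs(np.diag(D)) > tol):
        problems.append("zero self-dissimilarity")
    if not np.allclose(D, D.T, atol=tol):
        problems.append("symmetry")
    # Triangle inequality: D[i, j] <= D[i, k] + D[k, j] for every k.
    # detour[i, k, j] = D[i, k] + D[k, j]; take the shortest detour over k.
    shortest_detour = (D[:, :, None] + D[None, :, :]).min(axis=1)
    if np.any(D > shortest_detour + tol):
        problems.append("triangle inequality")
    return problems

x = np.array([0.0, 1.0, 2.0])
D = np.abs(x[:, None] - x[None, :])   # Euclidean distances between points on a line
print(metric_violations(D))           # → []
print(metric_violations(D ** 2))      # squared distances → ['triangle inequality']
```

Squared Euclidean distances are a standard example of a dissimilarity that is not a distance: $4 > 1 + 1$ for the points 0, 1 and 2.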
METRIC MDS
• DISTANCES $d_{ij}$ BETWEEN ALL POSSIBLE PAIRS OF DATA POINTS $(i, j)$ ARE GIVEN.
• A CONFIGURATION OF POINTS WHICH GIVES RISE TO THOSE DISTANCES IN A
LOWER-DIMENSIONAL SPACE IS SOUGHT.
• THE OBJECTIVE FUNCTION TO BE MINIMIZED, WHERE $d'_{ij}$ REPRESENTS THE DISTANCE BETWEEN DATA
POINTS $(i, j)$ IN THE LOWER-DIMENSIONAL SPACE, IS
$$E_M = \sum_{i \neq j} \left( d_{ij} - d'_{ij} \right)^2$$
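As a minimal sketch, the objective $E_M$ can be evaluated for any candidate low-dimensional configuration. The helper names `pairwise_distances` and `metric_mds_loss` are hypothetical. Minimizing $E_M$ over the configuration, for example by gradient descent, is not shown.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances d'_ij between all rows of the configuration X (n x k)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def metric_mds_loss(D, X):
    """E_M = sum over i != j of (d_ij - d'_ij)^2 for given distances D and configuration X."""
    D = np.asarray(D, dtype=float)
    residual = D - pairwise_distances(X)
    off_diagonal = ~np.eye(D.shape[0], dtype=bool)   # restrict the sum to i != j
    return float((residual[off_diagonal] ** 2).sum())

# A 3-4-5 right triangle: its own coordinates reproduce D exactly,
# while doubling the configuration doubles every fitted distance.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = pairwise_distances(X)
print(metric_mds_loss(D, X))        # → 0.0
print(metric_mds_loss(D, 2 * X))    # → 100.0 (each pair counted twice: 2 * (9 + 16 + 25))
```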
NON-METRIC MDS
THE STRESS MEASURE
INTRODUCTION
METRIC MDS (CONTD.)
METRIC MDS (CONTD.)
• USING ALL $p$ DIMENSIONS WILL LEAD TO COMPLETE RECOVERY OF THE ORIGINAL EUCLIDEAN
DISTANCE MATRIX.
• THE BEST-FITTING $m$-DIMENSIONAL REPRESENTATION IS GIVEN BY THE $m$ EIGENVECTORS OF
$\mathbf{B}$ CORRESPONDING TO THE $m$ LARGEST EIGENVALUES, EACH EIGENVECTOR SCALED SO THAT
ITS SUM OF SQUARES EQUALS ITS EIGENVALUE.
• THE ADEQUACY OF THE $m$-DIMENSIONAL REPRESENTATION CAN BE JUDGED BY THE SIZE OF
THE CRITERION
$$P_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{p} \lambda_i}$$
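The steps above can be sketched in a few lines of NumPy. The sketch assumes the usual definition $\mathbf{B} = -\tfrac{1}{2}\mathbf{J}\mathbf{D}^{(2)}\mathbf{J}$, the doubly centred matrix of squared distances with $\mathbf{J}$ the centring matrix. It computes $P_m$ as the share of the positive-eigenvalue total captured by the $m$ largest eigenvalues. The function name `classical_mds` is illustrative.

```python
import numpy as np

def classical_mds(D, m):
    """Classical metric MDS from an n x n matrix of Euclidean distances D.

    Returns the n x m coordinates, all eigenvalues of B (largest first) and P_m.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                 # doubly centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale unit eigenvectors so each column's sum of squares equals its eigenvalue
    coords = eigvecs[:, :m] * np.sqrt(np.clip(eigvals[:m], 0.0, None))
    # P_m: share of the positive-eigenvalue total captured by the m largest eigenvalues
    P_m = eigvals[:m].sum() / eigvals[eigvals > 0].sum()
    return coords, eigvals, P_m

# Usage: points in the plane are recovered (up to rotation/reflection) with m = 2
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
coords, eigvals, P_2 = classical_mds(D, 2)
print(np.round(eigvals, 6), round(P_2, 6))
```

For planar input, only two eigenvalues are non-zero (up to rounding), so $P_2$ is essentially 1. The signs of the eigenvectors are arbitrary, which is why the recovered map can come out reflected.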
METRIC MDS (CONTD.)
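• WHEN THE OBSERVED PROXIMITY MATRIX IS NOT EUCLIDEAN, THE MATRIX $\mathbf{B}$ IS NOT
POSITIVE SEMI-DEFINITE.
• SOME OF THE EIGENVALUES OF $\mathbf{B}$ WILL BE NEGATIVE.
• SINCE THE COORDINATES ARE SCALED BY $\sqrt{\lambda_i}$, THE COORDINATE VALUES FOR THE
NEGATIVE EIGENVALUES WILL BE COMPLEX NUMBERS.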
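A small, hypothetical example shows negative eigenvalues arising. Three objects have dissimilarities that violate the triangle inequality ($3 > 1 + 1$), so no Euclidean configuration can reproduce them. The sketch assumes $\mathbf{B}$ is formed by the usual double centring used in classical MDS. The helper name `double_centre` is illustrative.

```python
import numpy as np

def double_centre(D):
    """B = -1/2 J D^(2) J for an n x n dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

# Dissimilarities that break the triangle inequality: d_13 = 3 > d_12 + d_23 = 2
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
eigvals = np.linalg.eigvalsh(double_centre(D))
print(np.round(eigvals, 4))
# The smallest eigenvalue is negative (-5/6), so sqrt(lambda) is imaginary and
# the corresponding coordinate would be a complex number.
```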
METRIC MDS (CONTD.)
METRIC MDS (CONTD.)
• SIBSON (1979) RECOMMENDED ONE OF THE FOLLOWING TWO CRITERIA FOR DECIDING ON
THE NUMBER OF DIMENSIONS TO ADEQUATELY REPRESENT THE OBSERVED PROXIMITIES:
• TRACE CRITERION: CHOOSE THE NUMBER OF COORDINATES SO THAT THE SUM OF THE POSITIVE
EIGENVALUES IS APPROXIMATELY EQUAL TO THE SUM OF ALL THE EIGENVALUES.
• MAGNITUDE CRITERION: ACCEPT AS GENUINELY POSITIVE ONLY THOSE EIGENVALUES WHOSE
MAGNITUDE SUBSTANTIALLY EXCEEDS THAT OF THE LARGEST NEGATIVE EIGENVALUE.
• HOWEVER, IF THE MATRIX $\mathbf{B}$ HAS A CONSIDERABLE NUMBER OF LARGE NEGATIVE
EIGENVALUES, METRIC MDS IS INADVISABLE.
• SOME OTHER METHOD, FOR EXAMPLE NON-METRIC MDS, MIGHT BE BETTER EMPLOYED.
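Below is a minimal sketch of both criteria, assuming the eigenvalues of $\mathbf{B}$ are already available. Sibson's wording ("approximately equal", "substantially exceeds") is not numerically precise. The relative tolerance `rel_tol` and the multiplier `factor` are therefore arbitrary, illustrative thresholds, and the function names are hypothetical.

```python
import numpy as np

def trace_criterion(eigvals, rel_tol=0.05):
    """Smallest m such that the m largest (positive) eigenvalues sum to roughly the total."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    total = lam.sum()                        # includes any negative eigenvalues
    cumulative = np.cumsum(lam[lam > 0])
    # "Approximately equal" is taken to mean within rel_tol of the total (an assumption)
    hits = np.nonzero(cumulative >= (1.0 - rel_tol) * total)[0]
    return int(hits[0]) + 1 if hits.size else int(cumulative.size)

def magnitude_criterion(eigvals, factor=2.0):
    """Count eigenvalues exceeding `factor` times the magnitude of the largest negative one."""
    lam = np.asarray(eigvals, dtype=float)
    negatives = lam[lam < 0]
    threshold = factor * np.abs(negatives).max() if negatives.size else 0.0
    return int(np.sum(lam > threshold))

eigvals = [10.0, 5.0, 0.5, -0.4, -1.0]   # hypothetical eigenvalues of B
print(trace_criterion(eigvals), magnitude_criterion(eigvals))   # → 2 2
```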
AN EXAMPLE
EXAMPLE (CONTD.)
THE EIGENVALUES
EXAMPLE (CONTD.)
• THE SIX BRITISH POPULATIONS APPEAR TO BE
• CLOSE TO POPULATIONS LIVING IN THE ALPS,
YUGOSLAVIA, GERMANY, NORWAY, AND
PYRENEES I
• RATHER DISTANT FROM THE POPULATIONS IN
PYRENEES II, NORTH SPAIN AND SOUTH SPAIN
• THIS WOULD SEEM TO IMPLY THAT ONE OF THE SPECIES MIGHT BE PRESENT IN
BRITAIN, BUT IT IS LESS LIKELY THAT THE OTHER SPECIES IS
ANOTHER EXAMPLE
• DATA: MEASUREMENTS OF 4 VARIABLES ON 150 MALE EGYPTIAN SKULLS FROM FIVE EPOCHS
• MB: MAXIMUM BREADTH OF THE SKULL
• BH: BASIBREGMATIC HEIGHT OF THE SKULL
• BL: BASIALVEOLAR LENGTH OF THE SKULL
• NH: NASAL HEIGHT OF THE SKULL
• 30 SKULLS FROM EACH OF THE EPOCHS (4000 BC, 3300 BC, 1850 BC, 200 BC, 150 AD)
• MAHALANOBIS DISTANCES BETWEEN THE MEAN VECTORS OF EACH PAIR OF EPOCHS ARE COMPUTED
USING THE ESTIMATED COMMON (POOLED WITHIN-EPOCH) COVARIANCE MATRIX
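Here is a sketch of the distance computation on synthetic data. The skull measurements are not reproduced here, so the printed numbers carry no meaning. The sketch assumes the common covariance matrix is the pooled within-epoch covariance with divisor $N - g$ ($N$ skulls, $g$ epochs). The function name `pooled_mahalanobis` and the epoch labels are hypothetical.

```python
import numpy as np

def pooled_mahalanobis(X, groups):
    """Mahalanobis distances between group mean vectors using the pooled covariance matrix.

    X: (N x p) data matrix; groups: length-N array of group labels.
    Returns the sorted group labels and the (g x g) distance matrix.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means = np.array([X[groups == g].mean(axis=0) for g in labels])

    # Pooled within-group covariance: within-group cross-products / (N - g)
    p = X.shape[1]
    W = np.zeros((p, p))
    for g, mean in zip(labels, means):
        R = X[groups == g] - mean
        W += R.T @ R
    S_inv = np.linalg.inv(W / (X.shape[0] - len(labels)))

    k = len(labels)
    D = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            diff = means[a] - means[b]
            D[a, b] = D[b, a] = np.sqrt(diff @ S_inv @ diff)
    return labels, D

# Synthetic stand-in: 5 "epochs" x 30 skulls x 4 variables, with a small drift in the means
rng = np.random.default_rng(0)
epochs = np.repeat(["e1_4000BC", "e2_3300BC", "e3_1850BC", "e4_200BC", "e5_150AD"], 30)
X = rng.normal(size=(150, 4)) + 0.3 * np.repeat(np.arange(5), 30)[:, None]
labels, D = pooled_mahalanobis(X, epochs)
print(labels)
print(np.round(D, 3))
```

The resulting $5 \times 5$ matrix is what classical MDS would then take as input. Because the pooled covariance is used, the distances do not change if any variable is rescaled, for example by changing units.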
EXAMPLE (CONTD.)
NON-METRIC MDS
INTRODUCTION
• SUCH CONSIDERATIONS LED TO THE SEARCH FOR AN MDS METHOD THAT USES
ONLY THE RANK ORDER OF THE PROXIMITIES TO PRODUCE A SPATIAL
REPRESENTATION OF THEM.
• A METHOD THAT WOULD BE INVARIANT UNDER MONOTONIC TRANSFORMATIONS OF THE
PROXIMITY MATRIX
• THE DERIVED COORDINATES WILL REMAIN THE SAME IF THE VALUES OF THE OBSERVED
PROXIMITIES ARE CHANGED BUT THEIR RANK ORDER IS NOT.
• SUCH A METHOD WAS PROPOSED IN LANDMARK PAPERS BY SHEPARD (1962)
AND BY KRUSKAL (1964).
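The invariance can be checked directly. A strictly increasing transformation of the dissimilarities leaves their rank order unchanged, and the rank order is all a non-metric method uses. The dissimilarity values below are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical dissimilarities between 4 objects
delta = np.array([[0.0, 2.0, 5.0, 9.0],
                  [2.0, 0.0, 4.0, 7.0],
                  [5.0, 4.0, 0.0, 3.0],
                  [9.0, 7.0, 3.0, 0.0]])
iu = np.triu_indices_from(delta, k=1)          # each pair (i < j) once

ranks_original = rankdata(delta[iu])
ranks_log = rankdata(np.log(delta[iu]))         # strictly increasing transformation
ranks_square = rankdata(delta[iu] ** 2)         # increasing for non-negative values

# A method that uses only these ranks sees identical input in all three cases
print(np.array_equal(ranks_original, ranks_log),
      np.array_equal(ranks_original, ranks_square))   # → True True
```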
NON-METRIC MDS
• THE COORDINATES IN THE SPATIAL REPRESENTATION OF THE OBSERVED
DISSIMILARITIES $\delta_{ij}$ GIVE RISE TO FITTED DISTANCES $d_{ij}$.
• THESE DISTANCES ARE RELATED TO A SET OF NUMBERS CALLED DISPARITIES $\hat{d}_{ij}$ BY
$$d_{ij} = \hat{d}_{ij} + e_{ij},$$
WHERE THE $e_{ij}$ ARE ERROR TERMS REPRESENTING ERRORS OF MEASUREMENT PLUS
DISTORTION ERRORS ARISING BECAUSE THE DISTANCES DO NOT CORRESPOND TO A
CONFIGURATION IN THE PARTICULAR NUMBER OF DIMENSIONS CHOSEN.
• THE $\hat{d}_{ij}$ ARE MONOTONICALLY RELATED TO THE OBSERVED DISSIMILARITIES AND, SUBJECT TO
THIS CONSTRAINT, RESEMBLE THE FITTED DISTANCES AS CLOSELY AS POSSIBLE.
NON-METRIC MDS (CONTD.)
THEN
WHERE $N = \dfrac{n(n-1)}{2}$ IS THE NUMBER OF DISTINCT PAIRS OF OBJECTS.
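As a sketch, Kruskal's stress can be computed from the fitted distances and the disparities over the $N$ pairs. Two common normalizations are shown, stress-1 (dividing by $\sum d_{ij}^2$) and stress-2 (dividing by $\sum (d_{ij} - \bar{d})^2$, with $\bar{d}$ the mean over the $N$ pairs). Which of these matches the definition intended here is an assumption. The function name `kruskal_stress` is illustrative.

```python
import numpy as np

def kruskal_stress(d, d_hat, formula=1):
    """Kruskal's stress over the N = n(n-1)/2 pairs.

    d: fitted distances d_ij; d_hat: disparities; both 1-D arrays of length N.
    formula=1 gives stress-1, formula=2 gives stress-2.
    """
    d = np.asarray(d, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    raw = np.sum((d - d_hat) ** 2)               # lack of fit
    if formula == 1:
        scale = np.sum(d ** 2)
    elif formula == 2:
        d_bar = d.sum() / d.size                 # mean distance over the N pairs
        scale = np.sum((d - d_bar) ** 2)
    else:
        raise ValueError("formula must be 1 or 2")
    return float(np.sqrt(raw / scale))

print(kruskal_stress([1.0, 2.0, 3.0], [1.0, 2.0, 2.0], formula=1))   # sqrt(1/14)
print(kruskal_stress([1.0, 2.0, 3.0], [1.0, 2.0, 2.0], formula=2))   # sqrt(1/2)
```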
NON-METRIC MDS (CONTD.)
• THUS, DISPARITIES $\hat{d}_{ij}$ ARE CONSTRUCTED FROM THE FITTED DISTANCES $d_{ij}$
SUCH THAT THE $\hat{d}_{ij}$ ARE IN THE SAME RANK ORDER AS THE $\delta_{ij}$ (FOR
DISSIMILARITIES) OR IN THE REVERSE RANK ORDER (FOR SIMILARITIES), WITH TIES ALLOWED.
• THE $\hat{d}_{ij}$ CAN BE REGARDED AS "SMOOTHED" VERSIONS OF THE $d_{ij}$.
• THIS SMOOTHING PROCESS IS CARRIED OUT USING A METHOD CALLED
LEAST-SQUARES MONOTONIC REGRESSION.
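Least-squares monotonic regression is commonly implemented with the pool-adjacent-violators (PAV) algorithm, sketched below for dissimilarities. For similarities, pass the negated values so that the fit becomes non-increasing on the original scale. The function name and the tie-handling rule (tied $\delta_{ij}$ ordered by $d_{ij}$, Kruskal's "primary" approach) are illustrative choices.

```python
import numpy as np

def monotone_regression(delta, d):
    """Disparities d_hat: non-decreasing in delta and closest to d in least squares (PAV)."""
    delta = np.asarray(delta, dtype=float)
    d = np.asarray(d, dtype=float)
    # Sort by delta; ties in delta are ordered by d (primary approach to ties)
    order = np.lexsort((d, delta))
    blocks = []  # each block holds [sum of pooled values, number of values]
    for value in d[order]:
        blocks.append([value, 1])
        # Pool adjacent blocks while their means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted_sorted = np.concatenate([np.full(count, total / count) for total, count in blocks])
    d_hat = np.empty_like(d)
    d_hat[order] = fitted_sorted      # put disparities back in the original pair order
    return d_hat

# The pair with delta = 3 has a smaller fitted distance than the pair with delta = 2,
# so those two distances are pooled to their mean, 2.5.
print(monotone_regression([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0]))   # → [1.  2.5 2.5 4. ]
```

If the fitted distances are already in the same rank order as the dissimilarities, no block is ever pooled and the disparities equal the distances, which is the "no smoothing needed" case.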
NON-METRIC MDS (CONTD.)
• IN A PLOT OF $d_{ij}$ VERSUS $\delta_{ij}$, A MONOTONIC CURVE (ONE WHERE THE LINES
JOINING ADJACENT POINTS ARE FLAT OR INCREASING IF THE $\delta_{ij}$ ARE
DISSIMILARITIES, OR FLAT OR DECREASING IF THEY ARE SIMILARITIES) IS
EXPECTED.
• IF THE $d_{ij}$ AND THE $\delta_{ij}$ HAVE THE SAME RANK ORDER, THEN THE PLOT WILL
SHOW SUCH A MONOTONIC CURVE AND THE $d_{ij}$ WILL NOT REQUIRE ANY
SMOOTHING.
MONOTONIC LS REGRESSION (CONTD.)
MONOTONIC REGRESSION EXAMPLE
SHEPARD-KRUSKAL ALGORITHM
ASSESSING FIT AND CHOOSING THE NUMBER OF
DIMENSIONS
ASSESSING FIT AND CHOOSING THE NUMBER OF
DIMENSIONS (CONTD.)
[Figure: stress plotted against the number of dimensions]
• FOR MOST PRACTICAL APPLICATIONS, THE NUMBER OF DIMENSIONS CHOSEN IS
TWO, SINCE TWO DIMENSIONS ALLOW A VISUAL DISPLAY OF THE CONFIGURATION.
ILLUSTRATION
ILLUSTRATION (CONTD.)
ILLUSTRATION (CONTD.)
SOME EXAMPLES
EXAMPLE 0
TO MOTIVATE MDS
THE DATA
LOCATION OF 11 US CITIES ON A MAP
DISTANCE MATRIX FOR THE 11 US CITIES (IN MILES)
      ATL   BOS   ORD   DCA   DEN   LAX   MIA   JFK   SEA   SFO   MSY
ATL     0   934   585   542  1209  1942   605   751  2181  2139   424
BOS   934     0   853   392  1769  2601  1252   183  2492  2700  1356
ORD   585   853     0   598   918  1748  1187   720  1736  1857   830
DCA   542   392   598     0  1493  2305   922   209  2328  2442   964
DEN  1209  1769   918  1493     0   836  1723  1636  1023   951  1079
LAX  1942  2601  1748  2305   836     0  2345  2461   957   341  1679
MIA   605  1252  1187   922  1723  2345     0  1092  2733  2594   669
JFK   751   183   720   209  1636  2461  1092     0  2412  2577  1173
SEA  2181  2492  1736  2328  1023   957  2733  2412     0   681  2101
SFO  2139  2700  1857  2442   951   341  2594  2577   681     0  1925
MSY   424  1356   830   964  1079  1679   669  1173  2101  1925     0
THE PROBLEM
• THE GEOGRAPHICAL COORDINATES OF THE CITIES ARE EASILY AVAILABLE AND CAN BE USED
TO GENERATE THE MAP
• ASSUME THAT THE COORDINATES ARE NOT KNOWN.
• THE ONLY DATA AVAILABLE IS THE DISTANCE TABLE.
• DRAW A MAP OF THE MAJOR CITIES OF THE USA USING THE BETWEEN-CITIES DISTANCES.
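Classical (metric) MDS can recover a map from the distance table alone. The sketch below applies the usual double-centring and eigendecomposition steps to the 11-city table above. The function name `classical_mds` is illustrative, and the axes it produces may be rotated or reflected relative to a real map.

```python
import numpy as np

cities = ["ATL", "BOS", "ORD", "DCA", "DEN", "LAX", "MIA", "JFK", "SEA", "SFO", "MSY"]
D = np.array([
    [0, 934, 585, 542, 1209, 1942, 605, 751, 2181, 2139, 424],
    [934, 0, 853, 392, 1769, 2601, 1252, 183, 2492, 2700, 1356],
    [585, 853, 0, 598, 918, 1748, 1187, 720, 1736, 1857, 830],
    [542, 392, 598, 0, 1493, 2305, 922, 209, 2328, 2442, 964],
    [1209, 1769, 918, 1493, 0, 836, 1723, 1636, 1023, 951, 1079],
    [1942, 2601, 1748, 2305, 836, 0, 2345, 2461, 957, 341, 1679],
    [605, 1252, 1187, 922, 1723, 2345, 0, 1092, 2733, 2594, 669],
    [751, 183, 720, 209, 1636, 2461, 1092, 0, 2412, 2577, 1173],
    [2181, 2492, 1736, 2328, 1023, 957, 2733, 2412, 0, 681, 2101],
    [2139, 2700, 1857, 2442, 951, 341, 2594, 2577, 681, 0, 1925],
    [424, 1356, 830, 964, 1079, 1679, 669, 1173, 2101, 1925, 0],
], dtype=float)

def classical_mds(D, m=2):
    """Coordinates from the m largest eigenvalues/eigenvectors of B = -1/2 J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :m] * np.sqrt(eigvals[:m])   # assumes the top m eigenvalues are positive
    return coords, eigvals

coords, eigvals = classical_mds(D, m=2)
for city, (x, y) in zip(cities, coords):
    print(f"{city}: {x:8.1f} {y:8.1f}")

# How well do the 2-D map distances reproduce the table? (each pair counted once)
fitted = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
iu = np.triu_indices(len(cities), k=1)
print("correlation with table:", np.corrcoef(D[iu], fitted[iu])[0, 1])
```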
2-D MDS SOLUTION
COMPARISON OF THE MDS SOLUTION WITH REALITY
OBSERVATIONS
• THE CITIES ON THE MDS MAP ARE NOT IN THEIR EXPECTED LOCATIONS.
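• THE MAP IS NOT JUST MIRRORED BUT FLIPPED IN BOTH AXES, THAT IS, ROTATED BY 180
DEGREES.
• THIS HIGHLIGHTS THE FACT THAT MDS ONLY TRIES TO PRESERVE THE INTER-OBJECT DISTANCES,
WHICH ARE UNCHANGED BY ROTATIONS, REFLECTIONS AND TRANSLATIONS OF THE CONFIGURATION.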
• THERE IS NOTHING WRONG WITH THE MAP!
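This can be checked numerically. Rotating, reflecting or translating a configuration leaves all inter-point distances unchanged, so MDS cannot tell these versions apart. The configuration below is random and purely illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                          # an arbitrary 2-D configuration

rotate_180 = np.array([[-1.0, 0.0], [0.0, -1.0]])    # same as reflecting in both axes
mirror = np.array([[-1.0, 0.0], [0.0, 1.0]])         # reflection in the vertical axis
shift = np.array([5.0, -3.0])

variants = {
    "rotated by 180 degrees": X @ rotate_180.T,
    "mirrored": X @ mirror.T,
    "translated": X + shift,
}
results = {name: np.allclose(pairwise_distances(X), pairwise_distances(Y))
           for name, Y in variants.items()}
print(results)   # every entry is True
```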
EXAMPLE I
COMPARISON OF TOOTHPASTE BRANDS
INPUT DATA
PLOT OF STRESS VERSUS DIMENSIONALITY
[Figure: stress (vertical axis, 0.0 to 0.3) plotted against the number of dimensions (horizontal axis, 0 to 5)]
A SPATIAL MAP OF TOOTHPASTE BRANDS
EXAMPLE II
SIMILARITIES BETWEEN COUNTRIES
SIMILARITY MATRIX FOR 12 COUNTRIES
TWO-DIMENSIONAL MDS PLOT OF COUNTRIES
EXAMPLE III
SIMILARITIES BETWEEN COLOURS BASED ON SUBJECTIVE JUDGEMENT
THE DATA
*Ekman, G. (1954). Dimensions of Color Vision, The Journal of Psychology, 38(2), pp. 467-474.
SIMILARITY MATRIX
SCREE PLOT
TWO-DIMENSIONAL MDS PLOT
EXAMPLE IV
ACOUSTIC CONFUSION OF THE LETTERS OF THE ENGLISH ALPHABET
SIMILARITY MATRIX
SIMILARITY MATRIX (CONTD.)
SCREE PLOT
TWO-DIMENSIONAL ORDINAL MDS PLOT