
MULTIDIMENSIONAL SCALING

(MDS)
AN INTRODUCTION
TEXT FOR REFERENCE

AN INTRODUCTION TO APPLIED MULTIVARIATE ANALYSIS WITH R (2011)
BY BRIAN EVERITT & TORSTEN HOTHORN

Statistical Structures in Data, PGDBA Programme, ISI Kolkata, 2022 (December 16 & 21, 2022)
WHAT IS MDS?

• PROVIDES A REPRESENTATION, IN A LOWER-DIMENSIONAL SPACE, OF A SET OF OBJECTS SUCH THAT
THE PATTERN OF PROXIMITIES (I.E., SIMILARITIES OR DISTANCES) BETWEEN THEM IS PRESERVED.
• FOR EXAMPLE, GIVEN A MATRIX OF PERCEIVED SIMILARITIES BETWEEN VARIOUS BRANDS
OF AIR FRESHENERS, MDS GENERATES A 2-D PLOT OF THE BRANDS SUCH THAT
• THOSE BRANDS THAT ARE PERCEIVED TO BE VERY SIMILAR TO EACH OTHER ARE PLACED NEAR
EACH OTHER,
• BRANDS THAT ARE PERCEIVED TO BE VERY DIFFERENT FROM EACH OTHER ARE PLACED FAR
AWAY FROM EACH OTHER.

APPROACHES TO MDS

MULTIDIMENSIONAL SCALING
• METRIC MDS: USED WHEN PAIRWISE DISTANCES ARE AVAILABLE
• NON-METRIC MDS: USED WHEN PAIRWISE DISSIMILARITIES ARE AVAILABLE

SIMILARITIES VS DISTANCES/ DISSIMILARITIES

• SIMILARITY BETWEEN A PAIR OF OBJECTS IS MEASURED BY A FUNCTION WHICH IS
MONOTONICALLY DECREASING IN (E.G., IS INVERSELY PROPORTIONAL TO) THE
DISTANCE/DISSIMILARITY BETWEEN THEM.

DISTANCES VS DISSIMILARITIES

• A DISTANCE MEASURE HAS THE FOLLOWING PROPERTIES:


• THE DISTANCE FROM A POINT TO ITSELF IS ZERO,
• THE DISTANCE BETWEEN TWO DISTINCT POINTS IS POSITIVE,
• THE DISTANCE FROM A TO B IS THE SAME AS THE DISTANCE FROM B TO A,
• THE DISTANCE FROM A TO B (DIRECTLY) IS LESS THAN OR EQUAL TO THE DISTANCE FROM A TO B VIA ANY
THIRD POINT C.

• A DISSIMILARITY MEASURE FOR TWO OBJECTS ESSENTIALLY MEASURES THE SAME PROPERTY
WITH RESPECT TO THEM BUT IS MORE GENERAL.
• IT MAY NOT SATISFY ALL THE ABOVE PROPERTIES.
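
The four distance properties listed above can be checked mechanically for any square proximity matrix. The following is a minimal sketch, not part of the original slides; the function name `metric_violations` and both example matrices are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def metric_violations(d, tol=1e-9):
    """Return the names of the distance properties that the square matrix d violates."""
    n = len(d)
    problems = []
    # Property 1: the distance from a point to itself is zero.
    if any(abs(d[i][i]) > tol for i in range(n)):
        problems.append("nonzero self-distance")
    # Property 2: the distance between two distinct points is positive.
    if any(d[i][j] <= tol for i in range(n) for j in range(n) if i != j):
        problems.append("non-positive distance between distinct points")
    # Property 3: symmetry.
    if any(abs(d[i][j] - d[j][i]) > tol for i in range(n) for j in range(n)):
        problems.append("asymmetry")
    # Property 4: triangle inequality, d(a, b) <= d(a, c) + d(c, b).
    if any(d[a][b] > d[a][c] + d[c][b] + tol
           for a, b, c in permutations(range(n), 3)):
        problems.append("triangle inequality")
    return problems

# Hypothetical examples: a 3-4-5 triangle (a true distance matrix) and a
# dissimilarity matrix in which going via the middle object is "shorter".
euclid = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
dissim = [[0, 1, 9], [1, 0, 1], [9, 1, 0]]

print(metric_violations(euclid))
print(metric_violations(dissim))
```

A matrix for which the list comes back empty behaves as a distance; anything else is only a dissimilarity in the sense of this slide.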

METRIC MDS

• DISTANCES $d_{ij}$ BETWEEN ALL POSSIBLE PAIRS OF DATA POINTS $(i, j)$ ARE GIVEN.
• A CONFIGURATION OF POINTS WHICH GIVES RISE TO THOSE DISTANCES IN A
LOWER-DIMENSIONAL SPACE IS SOUGHT.

• OBJECTIVE FUNCTION TO BE MINIMIZED, WHERE $d'_{ij}$ REPRESENTS THE DISTANCE BETWEEN
DATA POINTS $(i, j)$ IN THE LOWER-DIMENSIONAL SPACE, IS

$$E_M = \sum_{i \neq j} \left( d_{ij} - d'_{ij} \right)^2$$
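
As a rough illustration of the metric MDS objective, the sketch below sums the squared differences between given distances and the Euclidean distances of a candidate low-dimensional configuration, over all ordered pairs i ≠ j. The function name `metric_mds_error` and the toy data are assumptions made for this example only.

```python
import math

def metric_mds_error(d, coords):
    """Sum over ordered pairs i != j of (d_ij - d'_ij)^2, where d'_ij is the
    Euclidean distance between points i and j of the candidate configuration."""
    n = len(d)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d_low = math.dist(coords[i], coords[j])
                total += (d[i][j] - d_low) ** 2
    return total

# Hypothetical data: distances of a 3-4-5 right triangle.
d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
exact_2d = [(0, 0), (3, 0), (0, 4)]   # reproduces every distance exactly
line_1d = [(0,), (3,), (0,)]          # a 1-D configuration that cannot

print(metric_mds_error(d, exact_2d))
print(metric_mds_error(d, line_1d))
```

The 2-D configuration gives an error of zero, while the 1-D one leaves a positive residual, which is the quantity metric MDS tries to make as small as possible.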

NONMETRIC MDS

• ONLY THE RANK ORDER OF THE DISSIMILARITIES $\delta_{ij}$ IS REQUIRED TO BE PRESERVED.
• A MONOTONICALLY INCREASING FUNCTION $f$ THAT ACTS ON THE ORIGINAL
DISSIMILARITIES IS INTRODUCED.
• THE RANK ORDER CAN BE PRESERVED IF THE DISSIMILARITIES ARE TRANSFORMED BY $f$.
• NORMALIZED OBJECTIVE FUNCTION:

$$E_N = \frac{1}{\sum_{i \neq j} d_{ij}^{2}} \sum_{i \neq j} \left( f(\delta_{ij}) - d'_{ij} \right)^2$$

• FOR A GIVEN PROJECTION, $f$ IS ALWAYS CHOSEN SO AS TO MINIMIZE $E_N$.

THE STRESS MEASURE

• IS AN INDICATOR OF THE EXTENT OF MISMATCH BETWEEN THE ORIGINAL
DISTANCES AND THE TRANSFORMED DISTANCES
• IS DEFINED AS

$$\text{STRESS} = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^2}{\sum_{i<j} d_{ij}^{2}}}$$

• ALSO KNOWN AS KRUSKAL’S TYPE I STRESS.
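
A small numerical sketch may help fix the definition. It assumes the usual square-root form of Kruskal's stress-1 and takes the two sets of values as flat lists over the pairs i < j; the function name and the numbers are hypothetical.

```python
import math

def kruskal_stress1(d, d_hat):
    """Stress-1 for two equal-length lists indexed by the pairs i < j:
    sqrt( sum (d - d_hat)^2 / sum d^2 )."""
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return math.sqrt(numerator / denominator)

# Hypothetical values for two pairs of objects.
d = [3.0, 4.0]
d_hat = [3.0, 3.0]

print(kruskal_stress1(d, d))       # identical values: no mismatch
print(kruskal_stress1(d, d_hat))   # one pair is off by 1
```

Because the measure is a ratio, it does not change if all values are multiplied by the same positive constant.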


METRIC MDS

INTRODUCTION

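• ASSUME THAT THE PROXIMITY MATRIX $\mathbf{D}$ PROVIDED IS A MATRIX OF EUCLIDEAN DISTANCES.
• CONSIDER THE $n \times n$ INNER PRODUCTS MATRIX $\mathbf{B} = \mathbf{X}\mathbf{X}'$, DERIVED FROM A RAW
$n \times p$ DATA MATRIX $\mathbf{X}$ WHICH IS UNKNOWN.
• ELEMENTS OF $\mathbf{B}$ AND $\mathbf{D}$ ARE RELATED (TAKING THE COLUMNS OF $\mathbf{X}$ TO BE CENTRED) BY THE EQUATION

$$b_{ij} = -\tfrac{1}{2} \left( d_{ij}^{2} - d_{i\cdot}^{2} - d_{\cdot j}^{2} + d_{\cdot\cdot}^{2} \right)$$

WHERE

$$d_{i\cdot}^{2} = \frac{1}{n} \sum_{j=1}^{n} d_{ij}^{2}, \qquad d_{\cdot j}^{2} = \frac{1}{n} \sum_{i=1}^{n} d_{ij}^{2}, \qquad d_{\cdot\cdot}^{2} = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}^{2}$$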
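
The relation between the distance matrix and the inner products matrix is the standard double-centring identity: subtract row and column means of the squared distances, add back the grand mean, and multiply by -1/2. The sketch below assumes NumPy is available and that the unknown data matrix is taken to be column-centred; the function name and the three example points are hypothetical.

```python
import numpy as np

def inner_products_from_distances(D):
    """Double-centre the squared distances: B = -1/2 * J D^2 J, J = I - (1/n) 11'."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    return -0.5 * J @ (D ** 2) @ J

# Hypothetical check: start from known points, form their distances, and
# compare B with the inner products of the column-centred points.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

B = inner_products_from_distances(D)
Xc = X - X.mean(axis=0)
print(np.allclose(B, Xc @ Xc.T))
```

Only the distances enter the computation of B; the coordinates are used here solely to verify the result.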

METRIC MDS (CONTD.)

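• IN TERMS OF ITS SPECTRAL DECOMPOSITION, $\mathbf{B}$ CAN BE WRITTEN AS

$$\mathbf{B} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}',$$

WHERE $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ IS THE DIAGONAL MATRIX OF
EIGENVALUES OF $\mathbf{B}$ AND $\mathbf{V} = [\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_n]$ IS THE CORRESPONDING MATRIX OF
EIGENVECTORS, NORMALISED SO THAT $\mathbf{V}_i' \mathbf{V}_i = 1$ (SO THAT $\mathbf{V}'\mathbf{V} = \mathbf{I}_n$, THE IDENTITY
MATRIX OF ORDER $n$).
• THE EIGENVALUES ARE ASSUMED TO BE LABELLED SUCH THAT
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$.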
METRIC MDS (CONTD.)

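• WHEN $\mathbf{D}$ ARISES FROM AN $n \times p$ MATRIX OF FULL RANK, THE RANK OF $\mathbf{B}$ IS $p$,
SO THAT THE LAST $n - p$ OF ITS EIGENVALUES ARE ZERO.
• SO $\mathbf{B}$ CAN BE WRITTEN AS $\mathbf{B} = \mathbf{V}_1 \boldsymbol{\Lambda}_1 \mathbf{V}_1'$, WHERE $\mathbf{V}_1$ CONTAINS THE FIRST $p$
EIGENVECTORS AND $\boldsymbol{\Lambda}_1$ THE $p$ NON-ZERO EIGENVALUES.
• THE REQUIRED COORDINATE VALUES ARE THUS

$$\mathbf{X} = \mathbf{V}_1 \boldsymbol{\Lambda}_1^{1/2}, \qquad \boldsymbol{\Lambda}_1^{1/2} = \mathrm{diag}\left( \sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_p} \right)$$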
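
Putting the steps together (double-centre the squared distances, take the eigendecomposition, scale the leading eigenvectors by the square roots of their eigenvalues) gives a short classical MDS routine. This is a sketch under the assumption that NumPy is available; `classical_mds` and `pairwise_distances` are hypothetical helper names, and the rectangle is made-up test data.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, m):
    """Return (coordinates in m dimensions, all eigenvalues in decreasing order)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                 # inner products matrix
    eigvals, eigvecs = np.linalg.eigh(B)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # relabel so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.clip(eigvals[:m], 0.0, None)       # guard against tiny negative round-off
    return eigvecs[:, :m] * np.sqrt(lam), eigvals

# Hypothetical data: the corners of a 3-by-4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = pairwise_distances(points)
X, eigvals = classical_mds(D, 2)

print(np.round(eigvals, 6))
print(np.allclose(pairwise_distances(X), D))
```

The recovered coordinates generally differ from the original points by a rotation, reflection, or shift, but they reproduce the distance matrix.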

METRIC MDS (CONTD.)

• USING ALL $p$ DIMENSIONS WILL LEAD TO COMPLETE RECOVERY OF THE ORIGINAL EUCLIDEAN
DISTANCE MATRIX.
• THE BEST-FITTING $m$-DIMENSIONAL REPRESENTATION IS GIVEN BY THE $m$ EIGENVECTORS OF
$\mathbf{B}$ CORRESPONDING TO THE $m$ LARGEST EIGENVALUES.
• THE ADEQUACY OF THE $m$-DIMENSIONAL REPRESENTATION CAN BE JUDGED BY THE SIZE OF
THE CRITERION

$$P_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$$

• VALUES OF $P_m$ OF THE ORDER OF 0.8 SUGGEST A REASONABLE FIT.
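
The adequacy criterion is simply the share of the eigenvalue total carried by the m largest eigenvalues. A minimal sketch, assuming all eigenvalues are non-negative (the Euclidean case); the function name and the eigenvalues are hypothetical.

```python
def proportion_explained(eigvals, m):
    """P_m: sum of the m largest eigenvalues divided by the sum of all eigenvalues."""
    lam = sorted(eigvals, reverse=True)
    return sum(lam[:m]) / sum(lam)

# Hypothetical eigenvalues of an inner products matrix B.
eigvals = [6.0, 2.0, 1.0, 1.0, 0.0]

for m in (1, 2, 3):
    print(m, proportion_explained(eigvals, m))
```

With these made-up values, two dimensions reach the 0.8 level mentioned on the slide.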

METRIC MDS (CONTD.)

• WHEN THE OBSERVED PROXIMITY MATRIX IS NOT EUCLIDEAN, THE MATRIX $\mathbf{B}$ IS NOT
POSITIVE SEMI-DEFINITE.
• SOME OF THE EIGENVALUES OF $\mathbf{B}$ WILL BE NEGATIVE.
• SOME COORDINATE VALUES WILL BE COMPLEX NUMBERS.

• IF $\mathbf{B}$ HAS ONLY A SMALL NUMBER OF SMALL NEGATIVE EIGENVALUES, A USEFUL
REPRESENTATION OF THE PROXIMITY MATRIX MAY STILL BE POSSIBLE USING THE
EIGENVECTORS ASSOCIATED WITH THE $m$ LARGEST POSITIVE EIGENVALUES.

METRIC MDS (CONTD.)

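• THE ADEQUACY OF THE RESULTING SOLUTION MIGHT BE ASSESSED USING
ONE OF THE FOLLOWING TWO CRITERIA SUGGESTED BY MARDIA ET AL.
(1979):

$$P_m^{(1)} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}, \qquad P_m^{(2)} = \frac{\sum_{i=1}^{m} \lambda_i^{2}}{\sum_{i=1}^{n} \lambda_i^{2}}$$

• HERE TOO, VALUES ABOVE 0.8 CORRESPOND TO A GOOD FIT.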
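
When some eigenvalues are negative, the two criteria attributed to Mardia et al. replace the eigenvalues by their absolute values or by their squares. The sketch below assumes those two forms; the function name and the eigenvalues are hypothetical.

```python
def mardia_criteria(eigvals, m):
    """Return (P1, P2) for the m largest eigenvalues:
    P1 uses absolute values, P2 uses squares."""
    lam = sorted(eigvals, reverse=True)
    p1 = sum(abs(x) for x in lam[:m]) / sum(abs(x) for x in lam)
    p2 = sum(x * x for x in lam[:m]) / sum(x * x for x in lam)
    return p1, p2

# Hypothetical eigenvalues, one of them negative (non-Euclidean proximities).
eigvals = [5.0, 3.0, 1.0, 0.0, -1.0]

p1, p2 = mardia_criteria(eigvals, 2)
print(round(p1, 4), round(p2, 4))
```

Both versions keep the ratio between 0 and 1 even though the plain sum of eigenvalues could be distorted by the negative terms.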

METRIC MDS (CONTD.)

• SIBSON (1979) RECOMMENDED ONE OF THE FOLLOWING TWO CRITERIA FOR DECIDING ON
THE NUMBER OF DIMENSIONS TO ADEQUATELY REPRESENT THE OBSERVED PROXIMITIES:
• TRACE CRITERION: CHOOSE THE NUMBER OF COORDINATES SO THAT THE SUM OF THE POSITIVE
EIGENVALUES IS APPROXIMATELY EQUAL TO THE SUM OF ALL THE EIGENVALUES.
• MAGNITUDE CRITERION: ACCEPT AS GENUINELY POSITIVE ONLY THOSE EIGENVALUES WHOSE
MAGNITUDE SUBSTANTIALLY EXCEEDS THAT OF THE LARGEST NEGATIVE EIGENVALUE.
• HOWEVER, IF THE MATRIX $\mathbf{B}$ HAS A CONSIDERABLE NUMBER OF LARGE NEGATIVE
EIGENVALUES, METRIC MDS IS INADVISABLE.
• SOME OTHER METHOD, FOR EXAMPLE NON-METRIC MDS, MIGHT BE BETTER EMPLOYED.

AN EXAMPLE

• DATA FROM AN INVESTIGATION BY CORBET, CUMMINS, HEDGES, AND KRZANOWSKI (1970),
WHO REPORT A STUDY OF WATER VOLES (GENUS ARVICOLA).
• THE AIM WAS TO COMPARE BRITISH POPULATIONS OF THESE ANIMALS WITH THOSE IN
EUROPE TO INVESTIGATE WHETHER MORE THAN ONE SPECIES MIGHT BE PRESENT IN BRITAIN.
• THE ORIGINAL DATA CONSISTED OF OBSERVATIONS OF THE PRESENCE OR ABSENCE OF 13
CHARACTERISTICS IN ABOUT 300 WATER VOLE SKULLS ARISING FROM
• SIX BRITISH POPULATIONS
• EIGHT POPULATIONS FROM THE REST OF EUROPE.

EXAMPLE (CONTD.)

THE DISSIMILARITY MATRIX

THE EIGENVALUES

EXAMPLE (CONTD.)
• THE SIX BRITISH POPULATIONS APPEAR TO BE
• CLOSE TO THE POPULATIONS LIVING IN THE ALPS, YUGOSLAVIA, GERMANY, NORWAY, AND PYRENEES I
• RATHER DISTANT FROM THE POPULATIONS IN PYRENEES II, NORTH SPAIN, AND SOUTH SPAIN
• THIS WOULD SEEM TO IMPLY THAT ONE SPECIES MIGHT BE PRESENT IN BRITAIN, BUT THAT THE
OTHER SPECIES IS LESS LIKELY TO BE PRESENT THERE.

ANOTHER EXAMPLE

• DATA: 150 MEASUREMENTS ON MALE EGYPTIAN SKULLS FROM FIVE EPOCHS ON 4 VARIABLES
• MB: MAXIMUM BREADTH OF THE SKULL
• BH: BASIBREGMATIC HEIGHT OF THE SKULL
• BL: BASIALVEOLAR LENGTH OF THE SKULL
• NH: NASAL HEIGHT OF THE SKULL

• 30 MEASUREMENTS FROM EACH OF THE EPOCHS (4000 BC, 3300 BC, 1850 BC, 200 BC, 150 AD)
• MAHALANOBIS DISTANCES BETWEEN EACH PAIR OF EPOCHS ARE COMPUTED USING THE
ESTIMATED COMMON COVARIANCE MATRIX
EXAMPLE (CONTD.)

• THE DISTANCE MATRIX IS

• THE POSITIVE EIGENVALUES ARE 5.113, 0.098
• THE NEGATIVE EIGENVALUES ARE

NON-METRIC MDS

INTRODUCTION

• IN SOME PSYCHOLOGICAL WORK AND IN MARKET RESEARCH, PROXIMITY
MATRICES ARISE FROM ASKING HUMAN SUBJECTS TO MAKE JUDGEMENTS
ABOUT THE SIMILARITY/DISSIMILARITY OF OBJECTS OR STIMULI OF INTEREST.
• WHEN COLLECTING SUCH DATA, THE INVESTIGATOR MAY FEEL THAT
REALISTICALLY SUBJECTS ARE ONLY ABLE TO GIVE “ORDINAL” JUDGEMENTS.
• FOR EXAMPLE, WHEN COMPARING A RANGE OF COLOURS THEY MIGHT BE ABLE TO
SPECIFY THAT ONE COLOUR IS BRIGHTER THAN ANOTHER BUT NOT BE ABLE TO PUT A
VALUE TO HOW MUCH BRIGHTER.
INTRODUCTION (CONTD.)

• SUCH CONSIDERATIONS LED TO THE SEARCH FOR AN MDS METHOD THAT USES
ONLY THE RANK ORDER OF THE PROXIMITIES TO PRODUCE A SPATIAL
REPRESENTATION OF THEM.
• A METHOD THAT WOULD BE INVARIANT UNDER MONOTONIC TRANSFORMATIONS OF THE
PROXIMITY MATRIX
• THE DERIVED COORDINATES WILL REMAIN THE SAME IF THE VALUES OF THE OBSERVED
PROXIMITIES ARE CHANGED BUT THEIR RANK ORDER IS NOT.
• SUCH A METHOD WAS PROPOSED IN LANDMARK PAPERS BY SHEPARD (1962)
AND BY KRUSKAL (1964).
NON-METRIC MDS
• THE COORDINATES IN THE SPATIAL REPRESENTATION OF THE OBSERVED
DISSIMILARITIES $\delta_{ij}$ GIVE RISE TO FITTED DISTANCES $d_{ij}$.
• THESE DISTANCES ARE RELATED TO A SET OF NUMBERS CALLED DISPARITIES, $\hat{d}_{ij}$, BY

$$d_{ij} = \hat{d}_{ij} + e_{ij},$$

WHERE THE $e_{ij}$ ARE ERROR TERMS REPRESENTING ERRORS OF MEASUREMENT PLUS
DISTORTION ERRORS ARISING BECAUSE THE DISTANCES DO NOT CORRESPOND TO A
CONFIGURATION IN THE PARTICULAR NUMBER OF DIMENSIONS CHOSEN.
• THE $\hat{d}_{ij}$ ARE MONOTONIC WITH THE OBSERVED DISSIMILARITIES AND, SUBJECT TO
THIS CONSTRAINT, RESEMBLE THE FITTED DISTANCES AS CLOSELY AS POSSIBLE.
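NON-METRIC MDS (CONTD.)

• IN GENERAL, ONLY A WEAK MONOTONICITY CONSTRAINT IS APPLIED, SO
THAT IF, SAY, THE OBSERVED DISSIMILARITIES $\delta_{ij}$ ARE RANKED FROM LOWEST
TO HIGHEST TO GIVE

$$\delta_{i_1 j_1} < \delta_{i_2 j_2} < \cdots < \delta_{i_N j_N},$$

THEN

$$\hat{d}_{i_1 j_1} \leq \hat{d}_{i_2 j_2} \leq \cdots \leq \hat{d}_{i_N j_N},$$

WHERE $N = \frac{n(n-1)}{2}$.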

NON-METRIC MDS (CONTD.)

• THUS, THE DISPARITIES $\hat{d}_{ij}$ ARE CONSTRUCTED FROM THE FITTED DISTANCES $d_{ij}$
SUCH THAT THE $\hat{d}_{ij}$ ARE IN THE SAME RANK ORDER AS THE $\delta_{ij}$ (FOR
DISSIMILARITIES) OR IN REVERSE RANK ORDER (FOR SIMILARITIES).
• THE $\hat{d}_{ij}$ CAN BE LOOKED UPON AS "SMOOTHED" VERSIONS OF THE $d_{ij}$.
• THIS SMOOTHING PROCESS IS CARRIED OUT USING A METHOD CALLED
LEAST-SQUARES MONOTONIC REGRESSION.

NON-METRIC MDS (CONTD.)

• ONLY THE RANK ORDER OF THE DISTANCES IS IMPORTANT.
• A MONOTONICALLY INCREASING FUNCTION $f$ THAT ACTS ON THE ORIGINAL
DISTANCES IS INTRODUCED.
• THE RANK ORDER CAN BE PRESERVED IF THE DISTANCES ARE TRANSFORMED BY $f$.
• NORMALIZED OBJECTIVE FUNCTION:

$$E_N = \frac{1}{\sum_{i \neq j} d_{ij}^{2}} \sum_{i \neq j} \left( f(d_{ij}) - d'_{ij} \right)^2$$

• FOR A GIVEN PROJECTION, $f$ IS ALWAYS CHOSEN SO AS TO MINIMIZE $E_N$.
MONOTONIC LEAST SQUARES REGRESSION

• IN A PLOT OF $d_{ij}$ VERSUS $\delta_{ij}$, A MONOTONIC CURVE (ONE WHERE THE LINES
JOINING ADJACENT POINTS ARE FLAT/INCREASING IF THE $\delta_{ij}$ ARE
DISSIMILARITIES OR FLAT/DECREASING IF THEY ARE SIMILARITIES) IS
EXPECTED.
• IF THE $d_{ij}$ AND THE $\delta_{ij}$ HAVE THE SAME RANK ORDER, THEN THE PLOT WILL
SHOW SUCH A MONOTONIC CURVE AND THE $d_{ij}$ WILL NOT REQUIRE ANY
SMOOTHING.
MONOTONIC LS REGRESSION (CONTD.)

• IF THERE ARE DEPARTURES FROM MONOTONICITY, SMOOTHING WILL BE NECESSARY.
• ACHIEVED THROUGH MONOTONIC REGRESSION
• FITS A MONOTONIC CURVE TO THE POINTS $(\delta_{ij}, d_{ij})$, WHILE MAKING THE SUM OF
SQUARED VERTICAL DEVIATIONS AS SMALL AS POSSIBLE (AS IN LEAST-SQUARES LINEAR
REGRESSION).
• THE POINT ON THE MONOTONIC CURVE, $\hat{d}_{ij}$, IS THE FITTED OR PREDICTED VALUE OF $d_{ij}$.

MONOTONIC REGRESSION EXAMPLE

• THE POINTS $(\delta_{ij}, d_{ij})$ ARE SHOWN BY CROSSES.
• THE FIRST AND SECOND POINTS FOLLOW A MONOTONIC PATTERN, THE THIRD DOES NOT.
• TO ACHIEVE MONOTONICITY, THE VALUES OF $\hat{d}_{ij}$ FOR THE SECOND AND THIRD
POINTS ARE TAKEN TO BE THEIR MEAN.
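
The averaging step described above is what the pool-adjacent-violators procedure does in general: whenever a value falls below its predecessor, the offending neighbours are pooled and replaced by their mean. The sketch below is one common way to compute a least-squares non-decreasing fit; the function name and the three distances are hypothetical (the figure's actual values are not reproduced here).

```python
def monotone_regression(y):
    """Least-squares non-decreasing fit to the sequence y (pool adjacent violators).
    y must already be ordered by increasing dissimilarity."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Pool while the previous block's mean exceeds the current block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Hypothetical fitted distances, listed in order of increasing dissimilarity:
# the third value breaks the monotone pattern.
distances = [1.0, 3.0, 2.0]
disparities = monotone_regression(distances)
print(disparities)
```

The first value is left alone and the second and third are both replaced by their mean, mirroring the example on the slide.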

SHEPARD-KRUSKAL ALGORITHM

• IMPLEMENTS NON-METRIC MDS THROUGH THE PRINCIPLE OUTLINED ABOVE IN
AN ITERATIVE MANNER.
• STARTS WITH AN INITIAL CONFIGURATION OF POINTS IN THE $d$-DIMENSIONAL
SPACE, FROM WHICH INITIAL ESTIMATES OF $d_{ij}$ ARE COMPUTED.
• IMPROVES UPON IT ITERATIVELY BY MINIMIZING STRESS TO ARRIVE AT THE
OPTIMAL CONFIGURATION OF POINTS.
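
The iteration can be sketched as alternating two steps: monotone regression to get disparities for the current configuration, then an update of the configuration. The sketch below is not Kruskal's original steepest-descent procedure; as a simplifying assumption it uses the Guttman-transform update familiar from SMACOF and tracks the raw (unnormalised) stress, which cannot increase under this scheme. It assumes NumPy; all names and the dissimilarity matrix are hypothetical.

```python
import numpy as np

def pava(y):
    """Least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []
    for v in y:
        blocks.append([float(v), 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.array([s / c for s, c in blocks for _ in range(c)])

def pairwise(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nonmetric_mds(delta, X0, n_iter=50):
    """Return (final configuration, raw stress recorded at each iteration)."""
    delta = np.asarray(delta, dtype=float)
    X = np.asarray(X0, dtype=float).copy()
    n = delta.shape[0]
    iu = np.triu_indices(n, k=1)
    order = np.argsort(delta[iu], kind="stable")    # rank order of dissimilarities
    history = []
    for _ in range(n_iter):
        d = pairwise(X)
        d_hat_vec = np.empty(len(order))
        d_hat_vec[order] = pava(d[iu][order])        # disparities for current distances
        history.append(float(((d[iu] - d_hat_vec) ** 2).sum()))
        d_hat = np.zeros((n, n))
        d_hat[iu] = d_hat_vec
        d_hat = d_hat + d_hat.T
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, d_hat / d, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)    # diagonal of ratio is zero
        X = B @ X / n                                # Guttman transform
    return X, history

# Hypothetical dissimilarities between five objects.
delta = [[0, 1, 4, 6, 9],
         [1, 0, 2, 5, 8],
         [4, 2, 0, 3, 7],
         [6, 5, 3, 0, 10],
         [9, 8, 7, 10, 0]]
X0 = np.random.default_rng(0).normal(size=(5, 2))   # arbitrary starting configuration
X, history = nonmetric_mds(delta, X0)
print(history[0], history[-1])
```

Because this version does not normalise the stress or the disparities, it is only an illustration of the alternating structure; practical implementations add a normalisation step so that the configuration cannot reduce stress simply by shrinking.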

ASSESSING FIT AND CHOOSING THE NUMBER OF
DIMENSIONS

• ONE METHOD INVOLVES COMPARING THE STRESS OBTAINED FOR THE
SOLUTION WITH THE GUIDELINES DEVELOPED BY KRUSKAL (1964), GIVEN
BELOW.

STRESS (%)    GOODNESS OF FIT
20            POOR
10            FAIR
5             GOOD
2.5           EXCELLENT
0             PERFECT

ASSESSING FIT AND CHOOSING THE NUMBER OF
DIMENSIONS

• SCREE PLOTS OF STRESS AGAINST DIMENSION
• LOOK FOR THE ELBOW TO IDENTIFY THE NUMBER OF DIMENSIONS.
• IN MOST PRACTICAL APPLICATIONS, THE NUMBER OF DIMENSIONS IS TAKEN TO BE
TWO, FOR EASE OF VISUAL DISPLAY.

ILLUSTRATION

JUDGEMENT OF WORLD WAR II LEADERS


• OBJECTIVE: TO OBTAIN A SPATIAL REPRESENTATION OF JUDGEMENTS OF THE
DISSIMILARITIES IN IDEOLOGY OF A NUMBER OF WORLD LEADERS AND
POLITICIANS PROMINENT AT THE TIME OF THE SECOND WORLD WAR
• THE SUBJECTS MADE JUDGEMENTS ON A NINE-POINT SCALE
• THE ANCHOR POINT 1 INDICATING VERY SIMILAR
• THE ANCHOR POINT 9 INDICATING VERY DISSIMILAR

ILLUSTRATION (CONTD.)

• THE DISSIMILARITY MATRIX

ILLUSTRATION (CONTD.)

SOME EXAMPLES

EXAMPLE 0
TO MOTIVATE MDS

THE DATA

• AIRLINE DISTANCES BETWEEN 11 US CITIES LISTED BELOW

ATL  Atlanta, Georgia
BOS  Boston, Massachusetts
ORD  Chicago, Illinois
DEN  Denver, Colorado
LAX  Los Angeles, California
DCA  Washington, District of Columbia
MIA  Miami, Florida
JFK  New York, New York
SEA  Seattle, Washington
SFO  San Francisco, California
MSY  New Orleans, Louisiana

LOCATION OF 11 US CITIES ON A MAP

DISTANCE MATRIX FOR THE 11 US CITIES

ATL BOS ORD DCA DEN LAX MIA JFK SEA SFO MSY
ATL 0 934 585 542 1209 1942 605 751 2181 2139 424
BOS 934 0 853 392 1769 2601 1252 183 2492 2700 1356
ORD 585 853 0 598 918 1748 1187 720 1736 1857 830
DCA 542 392 598 0 1493 2305 922 209 2328 2442 964
DEN 1209 1769 918 1493 0 836 1723 1636 1023 951 1079
LAX 1942 2601 1748 2305 836 0 2345 2461 957 341 1679
MIA 605 1252 1187 922 1723 2345 0 1092 2733 2594 669
JFK 751 183 720 209 1636 2461 1092 0 2412 2577 1173
SEA 2181 2492 1736 2328 1023 957 2733 2412 0 681 2101
SFO 2139 2700 1857 2442 951 341 2594 2577 681 0 1925
MSY 424 1356 830 964 1079 1679 669 1173 2101 1925 0

THE PROBLEM

• THE GEOGRAPHICAL COORDINATES OF THE CITIES ARE EASILY AVAILABLE AND CAN BE USED
TO GENERATE THE MAP
• ASSUME THAT THE COORDINATES ARE NOT KNOWN.
• THE ONLY DATA AVAILABLE IS THE DISTANCE TABLE.
• DRAW A MAP OF THE MAJOR CITIES OF THE USA USING THE BETWEEN-CITIES DISTANCES.
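
One way to attempt this is classical (metric) MDS applied directly to the distance table. The sketch below assumes NumPy is available and uses the 11-city matrix from the slide deck; the variable names are hypothetical, and the airline distances are treated as if they were planar Euclidean distances, which is only approximately true.

```python
import numpy as np

cities = ["ATL", "BOS", "ORD", "DCA", "DEN", "LAX", "MIA", "JFK", "SEA", "SFO", "MSY"]
D = np.array([
    [0, 934, 585, 542, 1209, 1942, 605, 751, 2181, 2139, 424],
    [934, 0, 853, 392, 1769, 2601, 1252, 183, 2492, 2700, 1356],
    [585, 853, 0, 598, 918, 1748, 1187, 720, 1736, 1857, 830],
    [542, 392, 598, 0, 1493, 2305, 922, 209, 2328, 2442, 964],
    [1209, 1769, 918, 1493, 0, 836, 1723, 1636, 1023, 951, 1079],
    [1942, 2601, 1748, 2305, 836, 0, 2345, 2461, 957, 341, 1679],
    [605, 1252, 1187, 922, 1723, 2345, 0, 1092, 2733, 2594, 669],
    [751, 183, 720, 209, 1636, 2461, 1092, 0, 2412, 2577, 1173],
    [2181, 2492, 1736, 2328, 1023, 957, 2733, 2412, 0, 681, 2101],
    [2139, 2700, 1857, 2442, 951, 341, 2594, 2577, 681, 0, 1925],
    [424, 1356, 830, 964, 1079, 1679, 669, 1173, 2101, 1925, 0],
], dtype=float)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J                      # double-centred inner products
eigvals, eigvecs = np.linalg.eigh(B)             # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
coords = eigvecs[:, :2] * np.sqrt(np.clip(eigvals[:2], 0.0, None))

# Compare the distances implied by the 2-D map with the table.
diff = coords[:, None, :] - coords[None, :, :]
fitted = np.sqrt((diff ** 2).sum(axis=-1))
iu = np.triu_indices(n, k=1)
correlation = np.corrcoef(D[iu], fitted[iu])[0, 1]

for name, (x, y) in zip(cities, coords):
    print(f"{name}: {x:9.1f} {y:9.1f}")
print("correlation between table and map distances:", round(correlation, 4))
```

The signs of the two coordinate axes are arbitrary, so the printed map may appear mirrored or rotated relative to a conventional one.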

2-D MDS SOLUTION

[FIGURE, TWO PANELS: ORIGINAL; AFTER ROTATION]

COMPARISON OF THE MDS SOLUTION WITH REALITY

[FIGURE, TWO PANELS: ROTATED MDS SOLUTION; ACTUAL MAP]

OBSERVATIONS

• THE CITIES ON THE MDS MAP ARE NOT IN THEIR EXPECTED LOCATIONS.
• THE MAP IS NOT JUST MIRRORED BUT FLIPPED, THAT IS, ROTATED BY 180
DEGREES.
• THIS HIGHLIGHTS THE FACT THAT MDS ONLY TRIES TO PRESERVE THE INTER-OBJECT
DISTANCES, WHICH ARE UNCHANGED BY ROTATION AND REFLECTION.
• THERE IS NOTHING WRONG WITH THE MAP!
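
The reason nothing is wrong can be checked numerically: rotating or mirroring a configuration leaves every pairwise distance unchanged, so MDS has no way to prefer one orientation over another. A minimal sketch with hypothetical points and helper names:

```python
import math

def pairwise(points):
    """Distances between all unordered pairs of 2-D points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def rotate(points, degrees):
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def mirror(points):
    """Reflect in the vertical axis."""
    return [(-x, y) for x, y in points]

# Hypothetical configuration of four points.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (2.0, 1.0)]
original = pairwise(points)
rotated = pairwise(rotate(points, 180))
mirrored = pairwise(mirror(points))

print(all(abs(a - b) < 1e-9 for a, b in zip(original, rotated)))
print(all(abs(a - b) < 1e-9 for a, b in zip(original, mirrored)))
```

Any such transformation can therefore be applied after the fact to line the MDS map up with a familiar orientation.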

EXAMPLE I
COMPARISON OF TOOTHPASTE BRANDS

INPUT DATA

• PERCEPTION DATA, DIRECT APPROACH: RESPONDENTS ARE ASKED TO
JUDGE HOW SIMILAR OR DISSIMILAR THE VARIOUS BRANDS ARE.
• CONSIDER SIMILARITY RATINGS OF VARIOUS TOOTHPASTE BRANDS

                          VERY                     VERY
                          DISSIMILAR               SIMILAR
CREST VS. COLGATE         1   2   3   4   5   6   7
AQUA-FRESH VS. CREST      1   2   3   4   5   6   7
CREST VS. AIM             1   2   3   4   5   6   7
...
COLGATE VS. AQUA-FRESH    1   2   3   4   5   6   7

• THE NUMBER OF PAIRS TO BE EVALUATED IS $n(n - 1)/2$, WHERE $n$ IS THE
NUMBER OF STIMULI.
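
For the ten toothpaste brands used in this example, the pair count can be confirmed by enumerating the pairs directly. A small sketch using only the standard library; the variable names are hypothetical.

```python
from itertools import combinations

brands = ["Aqua-Fresh", "Crest", "Colgate", "Aim", "Gleem",
          "Plus White", "Ultra Brite", "Close-Up", "Pepsodent", "Sensodyne"]

pairs = list(combinations(brands, 2))   # every unordered pair of brands
n = len(brands)

print(len(pairs), n * (n - 1) // 2)
print(pairs[:3])
```

The count grows quadratically with the number of stimuli, which is why direct pairwise rating becomes burdensome for respondents as more brands are added.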
SIMILARITY RATING OF TOOTHPASTE BRANDS
             Aqua-Fresh  Crest  Colgate  Aim  Gleem  Plus White  Ultra Brite  Close-Up  Pepsodent  Sensodyne
Aqua-Fresh
Crest        5
Colgate      6           7
Aim          4           6      6
Gleem        2           3      4        5
Plus White   3           3      4        4    5
Ultra Brite  2           2      2        3    5      5
Close-Up     2           2      2        2    6      5           6
Pepsodent    2           2      2        2    6      6           7            6
Sensodyne    1           2      4        2    4      3           3            4         3

PLOT OF STRESS VERSUS DIMENSIONALITY

[FIGURE: STRESS (VERTICAL AXIS, 0.0 TO 0.3) AGAINST NUMBER OF DIMENSIONS (HORIZONTAL AXIS, 0 TO 5)]
A SPATIAL MAP OF TOOTHPASTE BRANDS

EXAMPLE II
SIMILARITIES BETWEEN COUNTRIES

SIMILARITY MATRIX FOR 12 COUNTRIES

TWO-DIMENSIONAL MDS PLOT OF COUNTRIES

EXAMPLE III
SIMILARITIES BETWEEN COLOURS BASED ON SUBJECTIVE JUDGEMENT

THE DATA

• TO STUDY THE PERCEPTIONS OF COLOUR IN HUMAN VISION (EKMAN, 1954*)
• 14 COLOURS DIFFERING ONLY IN THEIR HUE (i.e., WAVELENGTHS FROM 434 nm TO 674 nm) WERE
CONSIDERED.
• 31 PEOPLE WERE ASKED TO RATE EACH OF THE $\binom{14}{2} = 91$ PAIRS OF COLOURS ON A
FIVE-POINT SCALE FROM 0 (NO SIMILARITY AT ALL) TO 4 (IDENTICAL).
• THE AVERAGE OF THE 31 RATINGS FOR EACH PAIR (REPRESENTING SIMILARITY) WAS THEN SCALED.

*Ekman, G. (1954). Dimensions of Color Vision, The Journal of Psychology, 38(2), pp. 467-474.

SIMILARITY MATRIX

SCREE PLOT

TWO-DIMENSIONAL MDS PLOT

[FIGURE: TWO-DIMENSIONAL MDS PLOT OF THE COLOURS, WITH ANNOTATIONS "RED" AND "BLUE"]

EXAMPLE IV
ACOUSTIC CONFUSION OF THE LETTERS OF THE ENGLISH ALPHABET

SIMILARITY MATRIX

SIMILARITY MATRIX (CONTD.)

SCREE PLOT

TWO-DIMENSIONAL ORDINAL MDS PLOT
