
Multiple Correlation & Regression

• In statistics, the coefficient of Multiple
Correlation is a measure of how well a given
variable can be predicted using a linear function of
a set of other variables.
• Regression Analysis predicts the value of the
Dependent Variable (D.V.) from the known
values of the Independent Variables (I.V.), assuming
an average mathematical relationship between
two or more variables.
Multiple Correlation & Regression
• Correlation & Regression are generally performed
together.
• Correlation – Degree of association between two
sets of quantitative data.
• Regression analysis – Explains the variation in one
variable (called the Dependent Variable), based on
the variation in one or more other variables (called
the Independent Variables).
• One Dependent Variable (D.V.) & One
Independent Variable (I.V.) – Simple Regression
• Multiple Independent Variables & One Dependent
Variable – Multiple Regression
Multiple Correlation & Regression
• Worked Out Example –
• A Manufacturer & marketer of electric motors
would like to build a regression model consisting
of 5 or 6 independent variables to predict sales.
Past data has been collected for 15 sales territories
on sales & 6 different Independent Variables. Build
a Regression Model & recommend whether or not
it should be used by company.
• Data –
• Dependent Variable –
• Y = Sales in Rs Lakh in the territory
Multiple Correlation & Regression
• Independent Variables –
• X1 = Market Potential in the territory ( in Rs Lakh)
• X2 = No. of Dealers of the company in the
territory
• X3 = No. of Sales Persons in territory
• X4 = Index of competitor activity in territory on 5
Point scale (1 – Low, 5 – High)
• X5 = No. of service People in the territory
• X6 = No. of existing customers in the territory

Multiple Correlation & Regression
• SPSS Procedure –
• Correlation:
– Click on ANALYZE ( or STATISTICS depending
upon SPSS version)
– Click on CORRELATE, followed by BIVARIATE
– Select all the variables from list with a ‘Right Arrow’
– Select PEARSON under heading of Correlation
Coefficient
– Select ‘2 – tailed’ under heading Test of Significance
– Click OK to get a matrix of pair-wise ‘Pearson
Correlations’ among all the variables selected, along
with the two-tailed significance of each pair-wise
correlation.
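The pair-wise matrix that SPSS produces can be sketched outside SPSS as follows. This is a minimal illustration using only the Python standard library; the function names and the small data set are assumptions made for the example (they are not the slide data), and the significance values that SPSS reports are not computed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data):
    """Pair-wise Pearson correlations for a dict of {variable name: values}."""
    names = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in names for b in names}

# Hypothetical values for illustration only (not the data from the slides).
data = {
    "Sales":   [5.0, 60.0, 20.0, 11.0, 45.0, 6.0],
    "Dealers": [1.0, 10.0, 4.0, 2.0, 9.0, 1.0],
    "CompAct": [4.0, 1.0, 3.0, 4.0, 2.0, 5.0],
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    print(f"{a:8s} {b:8s} {r:6.3f}")
```

The matrix is symmetric with 1 on the diagonal, which is the same layout as the SPSS correlation table.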
Multiple Correlation & Regression
• SPSS Procedure –
• Regression:
• Click on ANALYZE (or STATISTICS)
• Click on REGRESSION followed by LINEAR
• Select the Dependent Variable & transfer it to the
Dependent Variable box using the arrow button
• Select the Independent Variables & transfer them to the
Independent Variable box using the arrow button
• Select ENTER as the Method

8
Multiple Correlation & Regression – SPSS Output

9
Multiple Correlation & Regression – SPSS Output –
‘Enter’

10
Multiple Correlation & Regression – SPSS Output

The Standard Error of the Estimate for Regression measures the amount of
variability in the points around the Regression line.
It is the Standard Deviation of the data points as they are distributed around
the Regression line.
R Square is a basic metric which tells you how much of the variance in the
Dependent Variable is explained by the model. In a multiple linear regression,
if you keep on adding new variables, the R Square value will never decrease,
irrespective of the significance of the added variable.
Adjusted R Square corrects for this by penalizing R Square for the number of
Independent Variables in the model, so it increases only when an added
variable improves the fit more than would be expected by chance. So while
doing a multiple linear regression we should always look at Adjusted R Square
rather than R Square alone.
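The usual adjustment is Adjusted R Square = 1 - (1 - R Square)(n - 1) / (n - p - 1), where n is the sample size and p the number of Independent Variables. The sketch below applies it; the R Square value of 0.90 is a hypothetical figure chosen for illustration, while n = 15 and p = 6 match the worked example.

```python
def adjusted_r_square(r_square, n, p):
    """Adjusted R Square for n observations and p independent variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - p - 1)

# Hypothetical R Square; n and p follow the electric motor example.
r_square = 0.90
with_six_ivs = adjusted_r_square(r_square, n=15, p=6)
with_one_iv = adjusted_r_square(r_square, n=15, p=1)
print(round(with_six_ivs, 3), round(with_one_iv, 3))
```

For the same R Square, the model with more Independent Variables gets the lower Adjusted R Square, which is the penalty described above.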
Correlation & Regression
[Figure: regression line Y Est. = a + bX with bands at ±1 Se, ±2 Se and ±3 Se around it; Dependent Variable (Y) on the vertical axis, Independent Variable (X) on the horizontal axis]


Multiple Correlation & Regression – SPSS Output

Residuals. The difference between the observed value of the
dependent variable (y) and the predicted value (ŷ) is called the
residual (e). Each data point has one residual. Both the sum and
the mean of the residuals are equal to zero.
df – Degrees of Freedom –
For Regression: p (no. of IVs)
For Residual: n - p - 1 (where n is the sample size)
MS = SS / df
F = MS (Regression) / MS (Residual)
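The ANOVA arithmetic above can be sketched as a small function. The Sum of Squares values used here are hypothetical (the SPSS output table is not reproduced in the text); n = 15 and p = 6 follow the worked example.

```python
def regression_anova(ss_regression, ss_residual, n, p):
    """Degrees of freedom, mean squares and F for a regression ANOVA table."""
    df_regression = p
    df_residual = n - p - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "df_regression": df_regression,
        "df_residual": df_residual,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "F": ms_regression / ms_residual,
    }

# Hypothetical sums of squares, for illustration only.
table = regression_anova(ss_regression=600.0, ss_residual=80.0, n=15, p=6)
print(table)
```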
Multiple Correlation & Regression – SPSS Output

The t statistic is the coefficient divided by its standard error.
The standard error is an estimate of the standard deviation of the
coefficient, i.e. the amount it would vary from sample to sample. It can be
thought of as a measure of the precision with which the regression
coefficient is measured.
Y (Sales) = -3.173 + 0.227 (Market Potential) + 0.819 (Dealers) + 1.091 (Sales
Persons) - 1.893 (Comp Act Index) - 0.549 (Service Persons) + 0.066 (Exist.
Customers) --- Referring to Unstandardized Coefficients
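Once the equation is known, predicting sales for a territory is a direct substitution. The sketch below uses the unstandardized coefficients from the equation above; the dictionary keys and the territory values are hypothetical names and numbers introduced only for this example.

```python
# Unstandardized coefficients taken from the regression equation above.
INTERCEPT = -3.173
COEFFICIENTS = {
    "market_potential": 0.227,
    "dealers": 0.819,
    "sales_persons": 1.091,
    "competitor_index": -1.893,
    "service_persons": -0.549,
    "existing_customers": 0.066,
}

def predict_sales(territory):
    """Estimated sales (Rs Lakh) for one territory from the fitted equation."""
    return INTERCEPT + sum(COEFFICIENTS[name] * territory[name] for name in COEFFICIENTS)

# Hypothetical territory, for illustration only.
territory = {
    "market_potential": 100,
    "dealers": 10,
    "sales_persons": 5,
    "competitor_index": 3,
    "service_persons": 4,
    "existing_customers": 50,
}
estimate = predict_sales(territory)
print(round(estimate, 3))
```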
Multiple Correlation & Regression – SPSS Output
Standardization of the coefficient is usually done to answer the question of which
of the independent variables have a greater effect on the dependent variable in
a multiple regression analysis, when the variables are measured in different units
of measurement (for example, income measured in dollars and family
size measured in number of individuals).

A regression carried out on original (unstandardized) variables produces
unstandardized coefficients. A regression carried out on standardized variables
produces standardized coefficients. Values for standardized and unstandardized
coefficients can also be derived subsequent to either type of analysis.

Before solving a multiple regression problem, all variables (independent and
dependent) can be standardized. Each variable can be standardized by subtracting
its mean from each of its values and then dividing these new values by
the standard deviation of the variable. Standardizing all variables in a multiple
regression yields standardized regression coefficients that show the change in the
dependent variable measured in standard deviations.
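The standardization step, and the conversion of an unstandardized coefficient b into a standardized one (beta = b × SD of X / SD of Y), can be sketched as below. The function names and the income figures are assumptions made for illustration.

```python
import math

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def standardize(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = sum(values) / len(values)
    sd = sample_sd(values)
    return [(v - mean) / sd for v in values]

def standardized_coefficient(b, x_values, y_values):
    """Convert an unstandardized coefficient b into a standardized one."""
    return b * sample_sd(x_values) / sample_sd(y_values)

# Hypothetical income values, for illustration only.
income = [30000.0, 45000.0, 52000.0, 61000.0, 80000.0]
z_income = standardize(income)
print([round(z, 3) for z in z_income])
```

After standardization every variable has mean 0 and standard deviation 1, so coefficients measured in dollars and in number of individuals become comparable.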
Multiple Correlation & Regression
• Ex 2 – An organization would like to build a
Regression Model consisting of four independent
variables to predict the Compensation (Dependent
Variable) of its employees. Past data has been
collected for 15 different employees & four
independent variables. Build a Regression Model
& recommend its proper usage.
• The data is as follows –
• Dependent Variable – (DV)
• Y = Compensation in Rs.
Multiple Correlation & Regression
• Independent Variables ( I.V.) –
• 1. Experience in Years
• 2. Education in Years (After 10th)
• 3. Number of Employees Supervised
• 4. Number of Projects Handled
The dataset consisting of observations is given on the
next slide -

Attribute Type Perceptual Mapping using
Discriminant Analysis
• Positioning is essentially concerned with mapping
a consumer’s mind & placing all the competing
brands of a product category in appropriate ‘slots’
or ‘positions’ on it.
1. Customer Survey – What customer thinks about
available or particular brands in market based on
some important attributes.
2. This can be plotted on graph (2 attributes at a
time & relative positioning of Brands).
This is Perceptual Mapping of consumer perception
about competing Brands in a Product Category.
Attribute Type Perceptual Mapping using
Discriminant Analysis
• Methods –
1. Attribute based approach –
– Using Discriminant Analysis
2. Similarity / Dissimilarity based approach
– Easy to understand intuitively
– Useful in gaining a good understanding of the consumer
psyche
– Based on some kind of distance measure between the
Brands being Rated
– Simple Way – Provide the customer with cards, each
containing a pair of brands written on it
Attribute Type Perceptual Mapping using
Discriminant Analysis
• The application areas for Attribute Type
Perceptual Mapping are the same as those of MDS.
• We are now further interested in finding the
perceptual differences at the attribute level
• This leads to actionable points to improve a
particular attribute (based on priority) and thereby
further improve the Brand Image / Brand Score
Multidimensional Scaling (MDS)
for Brand Positioning
[Figure: MDS 2D output. Dimension 1 (Value for Money) on the horizontal axis, Dimension 2 (After Sales Service) on the vertical axis; plotted points: 1 AIWA, 2 Videocon, 3 LG, 4 Samsung, 5 Sony, 6 Onida, 7 Thompson, 8 BPL]
Attribute Type Perceptual Mapping using
Discriminant Analysis
• Example – A Chocolate company wants to draw a
perceptual map using attribute based procedure.
Assume Nestle vs Cadbury Vs Amul
• Data was collected from 15 respondents ( 5
consumers of each brand) on five attributes viz.
Price, Quality, Availability, Packaging & Taste.
• The variables are measured using different scales,
but a higher value indicates a more favorable rating.
Attribute Type Perceptual Mapping using
Discriminant Analysis
Brand  Price  Quality  Availability  Packaging  Taste
1      12     34       500           5          18
1      11     35       234           4          15
1      10     36       250           4          14
1      13     22       345           5          12
1      12     23       432           3          13
2      10     14       234           2          15
2      11     17       231           3          11
2      15     23       45            4          10
2      13     14       35            3          12
2      12     15       25            2          10
3      10     22       75            4          8
3      12     24       80            4          7
3      13     28       90            5          10
3      11     17       96            2          12
3      11     18       59            2          6
Attribute Type Perceptual Mapping using
Discriminant Analysis – SPSS Input
• Variable View: Measure for Brand – Nominal; Type – Numeric (all variables)
• SPSS Procedure –
1. Click on ANALYZE, then CLASSIFY, then DISCRIMINANT
2. Move the DV (Brand) into the Grouping Variable box and move the IVs
into the Independents box; select the respective variables and use the
arrow button to move them
3. Select the DV and click ‘Define Range’
4. Enter the values – Min as 1 and Max as 3 (we have 3 groups) – and click Continue
5. Click on ‘Statistics’; tick ANOVA, Fisher’s & Unstandardized; click Continue
6. Click on ‘Classify’; tick ‘Combined-groups’ (under the heading ‘Plots’),
‘Summary Table’ & ‘Leave-one-out Classification’; click Continue
7. Click on ‘Save’; tick all the options and click Continue
8. Click on ‘OK’
Attribute Type Perceptual Mapping using
Discriminant Analysis
Wilks’ Lambda
• = Within S.S. (Sum of Squares) / Total S.S. (Sum of Squares)
• Wilks’ Lambda lies between 0 and 1.
Any value closer to 0 indicates better discriminating power
• If the model is good, the Within S.S. should be as small as possible
Eigenvalue
• = Between S.S. (Sum of Squares) / Within S.S. (Sum of Squares)
• If the model is good, the Eigenvalue should be greater than 1,
i.e. Between S.S. > Within S.S. The higher the value, the better it is.
Canonical Correlation
• = Sq Root of (Between S.S. / Total S.S.)
• Any value > 0.5 – accept the model
Significance value / Confidence Level
• Confidence Level = (1 - Sig. Value); a Confidence Level of 99% is very good.
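To make the three ratios concrete, the sketch below computes them for a single attribute (Taste) from the chocolate data table. This is a univariate illustration under the definitions above; SPSS reports these statistics per discriminant function using all five attributes together, so its output will differ. Function and variable names are assumptions made for the example.

```python
import math

# Taste ratings by brand code, taken from the chocolate data table.
taste = {
    1: [18, 15, 14, 12, 13],
    2: [15, 11, 10, 12, 10],
    3: [8, 7, 10, 12, 6],
}

def sums_of_squares(groups):
    """Return (Between S.S., Within S.S.) for a dict of {group: values}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        within += sum((v - mean) ** 2 for v in values)
        between += len(values) * (mean - grand_mean) ** 2
    return between, within

between_ss, within_ss = sums_of_squares(taste)
total_ss = between_ss + within_ss

wilks_lambda = within_ss / total_ss
eigenvalue = between_ss / within_ss
canonical_correlation = math.sqrt(between_ss / total_ss)

print(round(wilks_lambda, 4), round(eigenvalue, 4), round(canonical_correlation, 4))
```

Note that Wilks’ Lambda and the squared Canonical Correlation add up to 1, since Within S.S. + Between S.S. = Total S.S.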
Output Interpretations
• As we had 3 groups in the example, we have two
Functions / equations (K - 1, where K is the number of groups)
• In some examples we may have 4 or 5 functions. We need to
judge the importance of each from its associated Eigenvalue
and the amount of variance it explains in the original data
• The significance test also tells us whether a given function
significantly discriminates between the groups (Brands)
• If we use two functions, there will be one perceptual map
with the two functions forming the two axes.
• If we use three functions, we will get three perceptual
maps: Function 1 vs Function 2, F2 vs F3 & F3 vs F1
Output Interpretations
• To draw the Graph –
• Function 1 – X Axis, Function 2 – Y Axis, with suitable scaling
• Use the following table to read the coefficients of each attribute on Functions 1
& 2, plot the attribute at that point and join it to the origin (0, 0).
• Similarly, use the above chart to determine the coefficients of the Brands on the Graph.
Output Interpretations
• Variables with longer vectors in a given dimension, and those closest
to its axis, contribute more to the interpretation of that dimension.
Looking at the graph, we can give a label to each dimension.
• E.g. Dimension 1 – Availability & Quality
• Dimension 2 – Price & Taste
• Packaging stands alone and does not contribute to either dimension
• Nestle seems to be stronger on Dimension 1, i.e. Availability &
Quality
• Cadbury is stronger on Dimension 2, i.e. Taste & Price
Attribute Type Perceptual Mapping using
Discriminant Analysis

[Figure: attribute-based perceptual map with axes Dim 1 and Dim 2; labels shown: Lakme, MB, Rev, Finish, Packaging, Price, Colour, Long Lasting]
Multidimensional Scaling (MDS)
for Brand Positioning
• Positioning is essentially concerned with mapping
a consumer’s mind & placing all the competing
brands of a product category in appropriate ‘slots’
or ‘positions’ on it.
1. Customer Survey – What customer thinks about
available or particular brands in market based on
some important attributes.
2. This can be plotted on graph (2 attributes at a
time & relative positioning of Brands).
This is Perceptual Mapping of consumer perception
about competing Brands in a Product Category.
Multidimensional Scaling (MDS)
for Brand Positioning
• Methods –
1. Attribute based approach –
– Using Discriminant Analysis
2. Similarity / Dissimilarity based approach
– Easy to understand intuitively
– Useful in gaining a good understanding of the consumer
psyche
– Based on some kind of distance measure between the
Brands being Rated
– Simple Way – Provide the customer with cards, each
containing a pair of brands written on it
Multidimensional Scaling (MDS)
for Brand Positioning
– Ask the customer to write down a number indicating the
difference between the two brands on a numerical scale,
which can represent the distance.
– This is repeated for all pairs of brands under
consideration
– No Attributes are specified for the exercise
– The customer may have the top-line parameters in
mind while rating but does not specify them
– The customer only indicates the distance (Dissimilarity) as
a numerical value
Output – Number of Dimensions (Based on ‘Stress’)
& Interpretations
Multidimensional Scaling (MDS)
for Brand Positioning
• Example – Use MDS to determine the perception of 8
TV Brands & try to plot a positioning map of the
eight brands. Also find out the no. of dimensions
the consumers seem to be using.
                  AIWA  Videocon  LG  Samsung  Sony  Onida  Thompson  BPL
Var1 (AIWA)       0     3         6   8        1     2      7         8
Var2 (Videocon)   3     0         4   6        4     5      2         5
Var3 (LG)         6     4         0   3        2     4      6         1
Var4 (Samsung)    8     6         3   0        3     5      4         7
Var5 (Sony)       1     4         2   3        0     2      8         5
Var6 (Onida)      2     5         4   5        2     0      3         6
Var7 (Thompson)   7     2         6   4        8     3      0         5
Var8 (BPL)        8     5         1   7        5     6      5         0
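Before running MDS it is worth checking that the dissimilarity matrix is square, symmetric and has zeros on the diagonal, and looking at which pairs respondents see as closest. The sketch below does this for the matrix above; the function names are assumptions made for the example.

```python
BRANDS = ["AIWA", "Videocon", "LG", "Samsung", "Sony", "Onida", "Thompson", "BPL"]

# Dissimilarity ratings from the table above (row and column order as in BRANDS).
DISSIMILARITY = [
    [0, 3, 6, 8, 1, 2, 7, 8],
    [3, 0, 4, 6, 4, 5, 2, 5],
    [6, 4, 0, 3, 2, 4, 6, 1],
    [8, 6, 3, 0, 3, 5, 4, 7],
    [1, 4, 2, 3, 0, 2, 8, 5],
    [2, 5, 4, 5, 2, 0, 3, 6],
    [7, 2, 6, 4, 8, 3, 0, 5],
    [8, 5, 1, 7, 5, 6, 5, 0],
]

def is_valid_distance_matrix(m):
    """True if m is square, symmetric and has a zero diagonal."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    if any(m[i][i] != 0 for i in range(n)):
        return False
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))

def ranked_pairs(names, m):
    """All brand pairs sorted from most similar to most dissimilar."""
    n = len(names)
    return sorted((m[i][j], names[i], names[j]) for i in range(n) for j in range(i + 1, n))

pairs = ranked_pairs(BRANDS, DISSIMILARITY)
print(is_valid_distance_matrix(DISSIMILARITY))
print(pairs[:2])   # the two most similar pairs
```

With 8 brands there are 28 pairs; AIWA–Sony and LG–BPL carry the smallest rating (1), so they should end up close together on the map.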
Multidimensional Scaling (MDS)
for Brand Positioning

• SPSS Procedure –
1. Click on ANALYZE, then SCALE, then MULTIDIMENSIONAL SCALING
2. Select all the variables and click the arrow button to move them
into the right-side block
3. After shifting all the variables to the right-side block, click on ‘Model’
4. In ‘Model’ select Ordinal, Matrix, Dimensions Minimum 1 & Maximum 3
and Euclidean Distance; click Continue
5. Click on ‘Options’, tick ‘Individual Subject Plots’ and click Continue
6. Select ‘Data are Distances’ and click OK
Multidimensional Scaling (MDS)
for Brand Positioning
• Important Output –
• Iteration History – Given in tables showing the
‘Stress’ value & its improvement at every iteration. It
is available for every dimensionality (1 Dimension,
2 Dimension & 3 Dimension solutions)
• Stimulus Coordinates – This is the important table. You
can draw the multidimensional scaling map from these
coordinates. An example is shown in the next slides
• ‘Stress’ value & ‘RSQ’ (Squared Correlation)
value for each number of Dimensions (in our example –
1, 2 & 3 Dimensions)
Multidimensional Scaling (MDS) for Brand Positioning
No. of Dimensions Stress (S – Stress or Kruskal Stress) RSQ
3 Dimensional 0.05230 0.96043
2 Dimensional 0.24015 0.58135
1 Dimensional 0.43158 0.35255

Stress Value indicates lack of fit, so it should be as close to zero as
possible.
RSQ should be close to 1 (at least more than 0.5). RSQ (R Square)
is interpreted as the proportion of Variance of the transformed data
accounted for by the distances in the model.
Clearly the one dimensional solution is not good. The two dimensional
solution is better, but the three dimensional solution is the best: its Stress
is as low as 0.05.
Since we are comparing only 8 brands, we will not get beyond a 3
dimensional solution. If we compare, say, 12 to 15 brands, we may get more.
However, a trade-off is always required between the no. of dimensions
and the ease of interpretation (economy of the solution).
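The idea of Stress as "lack of fit" can be sketched with Kruskal's Stress formula 1: the square root of the sum of squared differences between map distances and dissimilarities, divided by the sum of squared map distances. The sketch below is a simplified, metric illustration that compares map distances with the raw dissimilarities; SPSS first transforms ordinal data, so its reported Stress will not match this calculation. The three-point configuration is hypothetical.

```python
import math

def kruskal_stress(coordinates, dissimilarities):
    """Stress-1 between map distances and the given dissimilarities."""
    n = len(coordinates)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coordinates[i], coordinates[j])
            numerator += (d - dissimilarities[i][j]) ** 2
            denominator += d ** 2
    return math.sqrt(numerator / denominator)

# Hypothetical example: three stimuli whose dissimilarities are 3, 4 and 5.
dissimilarities = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]
perfect_map = [(0, 0), (3, 0), (3, 4)]     # reproduces the dissimilarities exactly
distorted_map = [(0, 0), (3, 0), (3, 2)]   # third point misplaced

perfect_stress = kruskal_stress(perfect_map, dissimilarities)
distorted_stress = kruskal_stress(distorted_map, dissimilarities)
print(perfect_stress, round(distorted_stress, 4))
```

A map that reproduces the dissimilarities exactly has Stress 0; any distortion pushes the value above zero.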
Multidimensional Scaling (MDS)
for Brand Positioning
• Assuming that we have decided to use the 3 Dimensional solution, the
next task is to name the dimensions. This needs subject matter
expertise: we must look at the qualities of the various
attributes offered by these 8 brands through our knowledge of the market.
This tends to be subjective. E.g. in our example the 3 dimensions could be
• Dimension 1 – Value for Money
• Dimension 2 – After Sales Service
• Dimension 3 – Current Brand Image
• From one of the outputs (Configuration derived in 3 dimensions), we
may conclude that some brands enjoy a good brand image wrt dimension
1, while some are perceived as best on dimension 2. This is explained wrt
the 2 Dimensional graph in the next slides.
Multidimensional Scaling (MDS) for Brand Positioning
Configuration derived in 3 Dimensions – Stimulus Coordinates
Stress – 0.05230, RSQ – 0.96043
Stimulus  Stimulus Name         Dimension 1        Dimension 2            Dimension 3
Number                          (Value for Money)  (After Sales Service)  (Current Brand Image)
1         VAR00001 (AIWA)        1.9512             0.2028                 0.0664
2         VAR00002 (Videocon)   -0.1995             1.3140                 0.7743
3         VAR00003 (LG)         -0.6043            -1.3429                 0.4680
4         VAR00004 (Samsung)    -0.9038            -0.2969                -1.8497
5         VAR00005 (Sony)       -0.8931            -1.0092                -0.0350
6         VAR00006 (Onida)       1.1045             0.1529                -0.7070
7         VAR00007 (Thompson)   -1.1031             1.6088                -0.1289
8         VAR00008 (BPL)        -1.1381            -0.6295                 1.4121
Multidimensional Scaling (MDS) for Brand Positioning
Configuration derived in 2 Dimensions – Stimulus Coordinates
Stress – 0.24015, RSQ – 0.58148
Stimulus  Stimulus Name         Dimension 1        Dimension 2
Number                          (Value for Money)  (After Sales Service)
1         VAR00001 (AIWA)        1.6156             0.4756
2         VAR00002 (Videocon)   -0.276              1.3796
3         VAR00003 (LG)         -0.254             -1.0558
4         VAR00004 (Samsung)    -1.2851            -0.7799
5         VAR00005 (Sony)        0.9602            -0.9335
6         VAR00006 (Onida)       1.1044             0.0665
7         VAR00007 (Thompson)   -0.5683             1.5124
8         VAR00008 (BPL)        -1.2968            -0.662
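Brands that sit close together on the 2 Dimensional map are perceived as close competitors. As a rough sketch, the distances between all pairs of brands can be computed from the 2 Dimensional Stimulus Coordinates; the variable and function names are assumptions made for the example.

```python
import math
from itertools import combinations

# 2 Dimensional stimulus coordinates (Value for Money, After Sales Service).
CONFIG_2D = {
    "AIWA": (1.6156, 0.4756),
    "Videocon": (-0.276, 1.3796),
    "LG": (-0.254, -1.0558),
    "Samsung": (-1.2851, -0.7799),
    "Sony": (0.9602, -0.9335),
    "Onida": (1.1044, 0.0665),
    "Thompson": (-0.5683, 1.5124),
    "BPL": (-1.2968, -0.662),
}

def map_distances(config):
    """Euclidean distance between every pair of brands, nearest pair first."""
    return sorted(
        (math.dist(config[a], config[b]), a, b) for a, b in combinations(config, 2)
    )

distances = map_distances(CONFIG_2D)
closest = distances[0]
print(closest[1], closest[2], round(closest[0], 4))
```

On these coordinates Samsung and BPL are the nearest pair, followed by Videocon and Thompson.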
Multidimensional Scaling (MDS)
for Brand Positioning
[Figure: MDS 2D output. Dimension 1 (Value for Money) on the horizontal axis, Dimension 2 (After Sales Service) on the vertical axis; plotted points: 1 AIWA, 2 Videocon, 3 LG, 4 Samsung, 5 Sony, 6 Onida, 7 Thompson, 8 BPL]
Multidimensional Scaling (MDS)
for Brand Positioning
• Example – A set of eight brands of detergent
available in the Market is taken for Multi
Dimensional Scaling (MDS) to determine how
Indian consumers perceive them. It is also used to
find out how many dimensions the consumers seem
to be considering when they think of these brands.
The eight brands are – 1. Rin, 2. Nirma, 3. Ariel,
4. Ok, 5. Eta, 6. Wheel, 7. Trilo, 8. Super Bar 501.
• Input data is given on the next slide.
• (For Ref. – Dimension 1 – Value for Money, Dimension 2 –
Dirt Removal, Dimension 3 – Fabric Care)
Multidimensional Scaling (MDS)
for Brand Positioning

20
Multidimensional Scaling (MDS)
for Brand Positioning
Important Outputs -
Stress RSQ
3 Dimensional 0.05975 0.98047
2 Dimensional 0.08609 0.96647
1 Dimensional 0.36852 0.64662

21
Multidimensional Scaling (MDS)
for Brand Positioning
Configuration derived in 3 Dimensions – Stimulus Coordinates
Stress – 0.05975, RSQ – 0.94583
Stimulus  Stimulus Name              Dimension 1        Dimension 2     Dimension 3
Number                               (Value for Money)  (Dirt Removal)  (Fabric Care)
1         VAR00001 (Rin)             -0.1860            -0.8867         -1.1559
2         VAR00002 (Nirma)            1.0175            -1.1665          0.3782
3         VAR00003 (Ariel)           -1.6674            -0.1959          0.4655
4         VAR00004 (OK)              -1.4639             1.3148         -0.0510
5         VAR00005 (Eta)              0.4844             1.5134         -0.7565
6         VAR00006 (Wheel)           -1.0923            -0.9168          0.8062
7         VAR00007 (Trilo)            1.3535            -0.4514         -0.7064
8         VAR00008 (Super Bar 501)    1.5543             0.7892          1.0199
Multidimensional Scaling (MDS)
for Brand Positioning
Configuration derived in 2 Dimensions – Stimulus Coordinates
Stress – 0.08616, RSQ – 0.93172
Stimulus  Stimulus Name              Dimension 1        Dimension 2
Number                               (Value for Money)  (Dirt Removal)
1         VAR00001 (Rin)             -0.11              -1.03
2         VAR00002 (Nirma)            0.75              -1.14
3         VAR00003 (Ariel)           -1.43               0.12
4         VAR00004 (OK)              -1.26               1.05
5         VAR00005 (Eta)              0.53               1.4
6         VAR00006 (Wheel)           -1.09              -0.85
7         VAR00007 (Trilo)            1.22              -0.35
8         VAR00008 (Super Bar 501)    1.4                0.8
Cluster Analysis
• Many times, the whole population is diverse, but might
consist of a number of similar Groups (Clusters)
• The objective of Cluster Analysis is to assign
observations to groups or Clusters so that
observations within each group are similar to one
another wrt the variables or attributes of
interest, and the groups themselves stand apart from one
another
• In other words, the objective is to divide the
observations into homogeneous and distinct groups
Cluster Analysis
• Cluster Analysis seeks to discover the number and
composition of groups
• Sometimes, common marketing strategy may not
work out. However, separate marketing strategy for
each cluster, based on Age, Gender, Income, Marital
Status, Years of loyalty (how long person has been a
customer) might work out.

2
Cluster Analysis
• Illustrative Example
Person  Expenditure on Food  Expenditure on Clothing
A       2                    4
B       8                    2
C       9                    3
D       1                    5
E       8.5                  1

• Distance Measures
1. Euclidean – Between A & B – Sq Root of [(2 - 8)sq + (4 - 2)sq]
= Sq Root of 40 = 6.325
2. Squared Euclidean – Between A & B – (2 - 8)sq + (4 - 2)sq = 40
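The same two distance measures can be computed for every pair of persons. The sketch below is a minimal illustration on the five-person table above; the function names are assumptions made for the example.

```python
import math

# (Expenditure on Food, Expenditure on Clothing) from the illustrative example.
POINTS = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}

def squared_euclidean(p, q):
    """Sum of squared differences between two observations."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def euclidean(p, q):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(p, q))

names = list(POINTS)
for a in names:
    row = [f"{euclidean(POINTS[a], POINTS[b]):6.3f}" for b in names]
    print(a, " ".join(row))
```

The printed rows form the full distance matrix that hierarchical clustering starts from.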
Cluster Analysis
• Grouping of observation

Clusters A B C D E
A
B
C
D
E
4
Cluster Analysis - Hierarchical
Clusters  A  B      C      D      E
A         0  6.325  7.071  1.414  7.159
B            0      1.414  7.616  1.118
C                   0      8.246  2.062
D                          0      8.500
E                                 0

Clusters  (BE)  A      C      D
(BE)      0     6.325  1.414  7.616
A               0      7.071  1.414
C                      0      8.246
D                             0

Clusters  (BE)  (AD)   C
(BE)      0     6.325  1.414
(AD)            0      7.071
C                      0
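The three tables above follow the nearest-neighbour rule: after each merge, the distance from the new cluster to any other cluster is the smallest distance between their members. A minimal sketch of that agglomeration, assuming single linkage (the function names are introduced only for this example):

```python
import math
from itertools import combinations

POINTS = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}

def cluster_distance(points, c1, c2):
    """Single linkage: smallest distance between a member of c1 and a member of c2."""
    return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

def single_linkage(points, n_clusters):
    """Merge the two nearest clusters repeatedly until n_clusters remain."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(points, pair[0], pair[1]),
        )
        merges.append((sorted(c1 | c2), round(cluster_distance(points, c1, c2), 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters, merges

clusters, merges = single_linkage(POINTS, n_clusters=2)
for members, distance in merges:
    print(members, distance)
```

B and E merge first (distance 1.118). A–D and (BE)–C are tied at 1.414, so the order of the next two merges depends on tie-breaking, but the two-cluster solution is (A, D) and (B, C, E) either way.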
Dendrogram Output
[Figure: dendrogram; vertical axis Similarity with ticks at 27.54, 51.69, 75.85 and 100; order of observations along the horizontal axis: A D B E C]
Cluster Analysis K Means
• Iteration 1
Cluster 1                Cluster 2
Obs      X1    X2        Obs      X1    X2
A        2     4         C        9     3
B        8     2         E        8.5   1
D        1     5
Average  3.67  3.67      Average  8.75  2

• Iteration 2
Cluster 1                Cluster 2
Obs      X1    X2        Obs      X1    X2
A        2     4         C        9     3
D        1     5         E        8.5   1
                         B        8     2
Average  1.5   4.5       Average  8.5   2
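The two iterations above can be reproduced with a short loop: compute the cluster averages, reassign every observation to the nearest average, and stop when nothing moves. This is a minimal sketch starting from the Iteration 1 grouping; the function names are assumptions, and the sketch does not handle a cluster becoming empty.

```python
import math

POINTS = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}

def centroid(points, members):
    """Average X1 and X2 of the observations in a cluster."""
    xs = [points[m][0] for m in members]
    ys = [points[m][1] for m in members]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def k_means(points, clusters, max_iterations=10):
    """Reassign observations to the nearest cluster average until stable."""
    for _ in range(max_iterations):
        centres = [centroid(points, members) for members in clusters]
        new_clusters = [[] for _ in centres]
        for name, p in points.items():
            nearest = min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            new_clusters[nearest].append(name)
        if new_clusters == clusters:
            break
        clusters = new_clusters
    return clusters, [centroid(points, members) for members in clusters]

initial = [["A", "B", "D"], ["C", "E"]]   # grouping used in Iteration 1
final_clusters, final_centres = k_means(POINTS, initial)
print(final_clusters, final_centres)
```

B is much nearer to the Cluster 2 average (8.75, 2) than to the Cluster 1 average (3.67, 3.67), so it moves in Iteration 2, after which the grouping is stable.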
Distance Definitions
• Single – Minimum distance between an item in one
Cluster & an item in the other Cluster (Nearest Neighbor
Method)
• Complete – Maximum distance between an item in one
Cluster & an item in the other Cluster (Furthest Neighbor
Method)
• Centroid – The centroid is the average value within a cluster (average
rating); the distance between two clusters is the distance between their centroids
• Ward’s Method – Calculate the distance between each
respondent rating and the cluster centroid by the squared
Euclidean method & add them up
Distance Definitions
Person  Expenditure on Food  Expenditure on Clothing
A       2                    4
B       8                    2
C       9                    3
D       1                    5
E       8.5                  1

Clusters  A  B      C      D      E
A         0  6.325  7.071  1.414  7.159
B            0      1.414  7.616  1.118
C                   0      8.246  2.062
D                          0      8.500
E                                 0

Dist. – AB – , AE – , BD – , ED –
Centroid – AD – ((2+1)/2, (4+5)/2) = (1.5, 4.5)
Similarly calculate Centroid – BE –
Ward’s – AD – (1-1.5)sq + (5-4.5)sq + (2-1.5)sq + (4-4.5)sq = 1
Similarly calculate Ward’s – BE –
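The Centroid and Ward's calculations for a cluster can be sketched as follows, using the same five persons. The function names are assumptions made for the example; the AD results reproduce the figures above, and the same functions can be used to check the BE exercise.

```python
POINTS = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}

def centroid(points, members):
    """Average of each variable over the members of a cluster."""
    xs = [points[m][0] for m in members]
    ys = [points[m][1] for m in members]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def within_cluster_ss(points, members):
    """Ward's quantity: squared Euclidean distances to the centroid, added up."""
    cx, cy = centroid(points, members)
    return sum((points[m][0] - cx) ** 2 + (points[m][1] - cy) ** 2 for m in members)

centroid_ad = centroid(POINTS, ["A", "D"])
ward_ad = within_cluster_ss(POINTS, ["A", "D"])
print(centroid_ad, ward_ad)
```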
Distance Definitions
• Between the Group Linkage – It is the average
distance (Sq Eucl) between all pairs of customers
taken one from each of the two clusters
• Within the Group Linkage – First combine the two
clusters; it is the average distance (Sq Eucl) between all
pairs of customers in the combined cluster
Distance Definitions
Between the group Linkage (Sq Eucl): Cluster AD & BE –
• A–B – , A–E – , D–B – , D–E –
• Add all the squares and divide by 4
Within the group linkage (Sq Eucl): Cluster AD & BE –
A–D – , B–E – – Add this total to the total of the four Between group
squares & divide by 6
Distance Definitions
Between the group Linkage (Sq Eucl): Cluster AD & BE –
• A–B – (6.325)sq, A–E – (7.159)sq, D–B – (7.616)sq, D–E – (8.500)sq
• Add all the squares and divide by 4
Within the group linkage (Sq Eucl): Cluster AD & BE –
A–D – (1.414)sq, B–E – (1.118)sq – Add this total to the total of the four
Between group squares & divide by 6
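Both linkage measures for clusters AD and BE can be worked out directly from the raw data. In this sketch the within-group figure averages over all 6 pairs of the combined four-person cluster (the 4 between-group pairs plus A–D and B–E); the function names are assumptions made for the example.

```python
from itertools import combinations, product

POINTS = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}

def sq_dist(p, q):
    """Squared Euclidean distance between two observations."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def between_groups_linkage(points, c1, c2):
    """Average squared distance over pairs with one member from each cluster."""
    pairs = list(product(c1, c2))
    return sum(sq_dist(points[a], points[b]) for a, b in pairs) / len(pairs)

def within_groups_linkage(points, c1, c2):
    """Average squared distance over all pairs in the combined cluster."""
    pairs = list(combinations(list(c1) + list(c2), 2))
    return sum(sq_dist(points[a], points[b]) for a, b in pairs) / len(pairs)

between = between_groups_linkage(POINTS, ["A", "D"], ["B", "E"])
within = within_groups_linkage(POINTS, ["A", "D"], ["B", "E"])
print(between, round(within, 3))
```

The squared distances are 40, 51.25, 58 and 72.25 between the groups and 2 and 1.25 within them, giving 221.5 / 4 and 224.75 / 6 respectively.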
SPSS
Variable View
Name     – VAR 00..1
Type     – Numeric
Decimal  – 2
Label    – Type the question or a concise form of the question
Values   – None
Missing  – Check for missing values in the data, if any
Columns  – Actuals (just information)
Align    – Right align (your choice)
Measure  – Scale

SPSS – Hierarchical Cluster Analysis
1. Click on ANALYZE, then CLASSIFY, then HIERARCHICAL CLUSTER
2. Enter the variables by selecting them and clicking the arrow button
3. Click on ‘Statistics’, tick ‘Agglomeration Schedule’ and click Continue
4. Click on ‘Plots’; select Dendrogram, All Clusters and Vertical; click Continue
5. Click on ‘Method’; select Centroid Clustering, Sq. Eucl. Dist. and
None (Standardize); click Continue
6. Click on ‘OK’
SPSS – Hierarchical Clustering Output

24
SPSS – K Means Cluster Analysis
1. Click on ANALYZE, then CLASSIFY, then K MEANS CLUSTER
2. Select the variables and set the number of Clusters
3. Click on ‘Save’, tick ‘Cluster Membership’ and click Continue
4. Click on ‘Iterate’; the default value is 10; click Continue
5. Click on ‘Options’; tick ‘Initial Cluster Centers’, ‘ANOVA Table’ and
‘Cluster Information for each case’; click Continue
6. Click on ‘OK’
SPSS – K Means Output

31
SPSS – K Means Output

F = MSS Between / MSS Within
Higher the ratio, better it is.
In the case above –
F (Var 1) = 58.800 / 0.333 = 176.40
F (Var 2) = 7.500 / 0.833 = 9.000
Similarly, Sig. Value for Var 1 = 0.001, i.e.
Confidence Level = (1 - 0.001) * 100 = 99.9%
Sig. Value for Var 2 = 0.058, i.e.
Confidence Level = (1 - 0.058) * 100 = 94.2%
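The F values above can be reproduced from the five-person example, taking the final K Means clusters (A, D) and (B, C, E), with Var 1 as Expenditure on Food and Var 2 as Expenditure on Clothing. This is a sketch with assumed function names; the Sig. values are taken from the SPSS output and are not recomputed here.

```python
def f_ratio(groups):
    """F = MSS Between / MSS Within for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    between_ss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within_ss = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    mss_between = between_ss / (len(groups) - 1)
    mss_within = within_ss / (len(all_values) - len(groups))
    return mss_between / mss_within

def confidence_level(sig_value):
    """Confidence Level in percent, as (1 - Sig. Value) * 100."""
    return (1 - sig_value) * 100

food = [[2, 1], [8, 9, 8.5]]        # Var 1 for clusters (A, D) and (B, C, E)
clothing = [[4, 5], [2, 3, 1]]      # Var 2 for the same clusters

f_food = f_ratio(food)
f_clothing = f_ratio(clothing)
print(round(f_food, 2), round(f_clothing, 2))
print(round(confidence_level(0.001), 1), round(confidence_level(0.058), 1))
```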
SPSS

Cluster Membership appears in data view

33
References
• The Sum of Squares is the sum of the square of
variation, where variation is defined as the spread
between each individual value and the mean. To
determine the Sum of Squares, the distance between
each data point and the line of best fit is squared and
then summed up. The line of best fit will minimize this
value
• Or
• In statistical data analysis the Total Sum of Squares (TSS
or SST) is defined as being the sum, over all observations,
of the squared differences of each observation from the
overall mean.


References
• Mean Squares - Each Mean Square value is computed by
dividing a Sum-of-Squares value by the corresponding
degrees of freedom. In other words, for each row in the
ANOVA table divide the SS value by the df value to
compute the MS value
• In statistics, the degrees of freedom (DF / df) indicate the
number of independent values that can vary in an analysis
without breaking any constraints
• In statistics, the Mean Squared Error (MSE) or Mean
Squared Deviation (MSD) of an estimator measures the
average of the squares of the errors, i.e. the
average squared difference between the estimated values
and the actual values
Example
• Cluster Analysis is required to be done based on attitudes
towards shopping. Based on past research, six attitudinal
variables were identified. 20 Consumers were asked to
express their degree of agreement with the following
statements on a 7 point scale (1= Disagree, 7= Agree)
• V1 = Shopping is fun
• V2= Shopping is bad for your budget
• V3= I combine shopping with eating out
• V4= I try to get the best buys when shopping
• V5= I do not care about shopping
• V6 = You can save a lot of money by comparing prices
36
SPSS Output

42
Cluster 1 – cases 4, 10, 14, 16, 18, 19 (Total 6)
Cluster 2 – cases 2, 5, 9, 11, 13, 20 (Total 6)
Cluster 3 – cases 1, 3, 6, 7, 8, 12, 15, 17 (Total 8)

No  Statement                                              Cluster 1  Cluster 2  Cluster 3
1   V1 = Shopping is fun                                   3.50       1.67       5.75
2   V2 = Shopping is bad for your budget                   5.83       3.00       3.62
3   V3 = I combine shopping with eating out                3.33       1.83       6.00
4   V4 = I try to get the best buys when shopping          6.00       3.50       3.12
5   V5 = I do not care about shopping                      3.50       5.50       3.12
6   V6 = You can save a lot of money by comparing prices   6.00       3.33       3.88

Cluster No  Likes / Agreement /  Dislikes / Disapproval /  Interpretation /
            Affinity             Disagreement              Marketing intervention
1           V2, V4 & V6          NONE                      Economical
2           V5                   V1 & V3                   Apathetic / Hate
3           V1 & V3              NONE                      Shopping Spree / Fun Loving
Interpretation
Cluster 3 – Could be labeled as ‘Shopping Spree, Fun Loving’. It has high values on V1 & V3 and a
low value on V5.
Cluster 2 – Could be labeled as ‘Apathetic Buyers’; it is the opposite of Cluster 3: low on
V1 & V3 and high on V5.
Cluster 1 – Could be labeled as ‘Economical Shoppers’. High on V2, V4 & V6.
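The likes / dislikes columns can be derived mechanically from the cluster means. The sketch below uses cut-offs of 5.0 (agreement) and 2.0 (disagreement) on the 7 point scale; these cut-offs are an assumption chosen for illustration, not a rule stated in the slides.

```python
# Final cluster means for V1..V6, from the table of cluster averages.
CLUSTER_MEANS = {
    1: {"V1": 3.50, "V2": 5.83, "V3": 3.33, "V4": 6.00, "V5": 3.50, "V6": 6.00},
    2: {"V1": 1.67, "V2": 3.00, "V3": 1.83, "V4": 3.50, "V5": 5.50, "V6": 3.33},
    3: {"V1": 5.75, "V2": 3.62, "V3": 6.00, "V4": 3.12, "V5": 3.12, "V6": 3.88},
}

HIGH = 5.0   # hypothetical cut-off for agreement
LOW = 2.0    # hypothetical cut-off for disagreement

def profile(means):
    """Statements a cluster agrees with (likes) and disagrees with (dislikes)."""
    likes = [v for v, m in means.items() if m >= HIGH]
    dislikes = [v for v, m in means.items() if m <= LOW]
    return likes, dislikes

for cluster, means in CLUSTER_MEANS.items():
    print(cluster, profile(means))
```

With these cut-offs the output matches the likes / dislikes table: Cluster 1 agrees with V2, V4 & V6, Cluster 2 agrees with V5 and disagrees with V1 & V3, and Cluster 3 agrees with V1 & V3.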

44
Conjoint Analysis
• Conjoint Analysis is a multivariate technique
which captures exact levels of ‘Utility’ that an
individual customer puts on various attributes of
product offerings.
• E.g. how much ‘Utility’ customer sees in ‘price
level’ or ‘after sales service’, or ‘product features’
• This is extremely useful in Product Development
phase.
• This can also be used to reposition an existing
product
Conjoint Analysis
• Conjoint Analysis starts with the notion that a product is a
bundle of various attributes. Customers evaluate each
attribute separately (depending upon the satisfaction level of
each attribute and its perceived weightage) and assign a value
to the product.
• It enables a direct comparison between the ‘Utilities’ of
various attributes (at different levels).
• It makes it possible to offer the best combination of attributes at
different levels.
• If this is done across a sample of customers, segment wise,
it is quite possible to predict market share and the response of
customers to changes in competitive strategy through
changes in the marketing elements.
Conjoint Analysis
• The usage can be at three levels –
– Individual Customer
– Segment Level – advisable to do segment wise without
losing the benefit of getting individual opinions
– Across Segments
• To avoid creating unmanageable data, the researcher
has to select only those attributes and levels which
are feasible offerings
• Similarly, the number of combinations offered for
ranking by respondents should be manageable
and not too high / impractical.
Conjoint Analysis
• Example - CNC Machine (B2B Case) is used to
perform a variety of manufacturing operations.
There are 3 attributes & the levels of attributes,
company is willing to design, are as given below –
• 1. Set up time (in Minutes) – 3, 6, 9, 12 ( 4 Levels)
• 2. Delivery Period (In days) – 18, 22, 28 (3 Levels)
• 3. Number of Tools – 4, 8, 10 ( 3 Levels)
• Use Conjoint Analysis to determine potential
customer’s view points

4
Conjoint Analysis
• Procedure – (Running a Conjoint as a Regression Model)
1. Coding of the attribute levels (‘Effects Coding’)
2. Prepare the different combinations & get the rankings
done by customers –
– No. of combinations in the said problem
= 4 * 3 * 3 = 36
3. Substituting the combinations with Codes
(worked out in Step 1)
4. Run the coded data in SPSS & derive the output
5. Get the ‘Part Utility’ & then work out the ‘Ranges of
Utilities’
6. Work out the ‘Utility’ of any / all combinations
Conjoint Analysis
Step 1 - Coding of the attribute level (‘Effects Coding’)
Set Up Time in Minutes   Var 1  Var 2  Var 3
S3                       1      0      0
S6                       0      1      0
S9                       0      0      1
S12                      -1     -1     -1

Delivery Period in Days  Var 4  Var 5
D18                      1      0
D22                      0      1
D28                      -1     -1

Number of Tools          Var 6  Var 7
T4                       1      0
T8                       0      1
T10                      -1     -1
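Steps 1 to 3 (coding the levels, listing all 36 combinations and substituting the codes) can be sketched as below. The attribute keys and function names are assumptions made for the example; the coding itself follows the tables above, with the last level of each attribute coded -1 in every column.

```python
from itertools import product

LEVELS = {
    "setup": ["S3", "S6", "S9", "S12"],
    "delivery": ["D18", "D22", "D28"],
    "tools": ["T4", "T8", "T10"],
}

def effects_code(levels, level):
    """Effects coding: k levels give k - 1 columns; the last level is all -1."""
    k = len(levels)
    index = levels.index(level)
    if index == k - 1:
        return [-1] * (k - 1)
    return [1 if i == index else 0 for i in range(k - 1)]

def coded_profile(combo):
    """Var 1 .. Var 7 for one (setup, delivery, tools) combination."""
    row = []
    for attribute, level in zip(LEVELS, combo):
        row.extend(effects_code(LEVELS[attribute], level))
    return row

profiles = list(product(*LEVELS.values()))      # 4 * 3 * 3 = 36 combinations
design = [coded_profile(combo) for combo in profiles]

print(len(profiles))
print(profiles[0], design[0])
```

Each coded row becomes the set of Independent Variables for one combination; the customer's rank or rating of that combination is the Dependent Variable.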
Step 2 – Prepare the different combinations (4 * 3 * 3 = 36)
Step 3 – Substituting the combinations with Codes
Conjoint Analysis - SPSS
Step 4 – Run the coded data in SPSS
1. Click on ANALYZE, then REGRESSION, then LINEAR
2. Enter the Dependent Variable and the Independent Variables
3. Click on ‘Statistics’, tick ‘Model Fit’ & ‘Estimates’ and click Continue
4. Click on ‘Options’, select ‘Use Prob. of F’, tick ‘Include Constant in
Equation’ and click Continue
5. Click on ‘OK’
Conjoint Analysis
Step 5 - Get the ‘Part Utility’ & then work out ‘Ranges of Utilities’
1. Set Up Time in Minutes
Level  Part Utility
S3     5.472
S6     4.250
S9     -1.083
S12    = -(5.472 + 4.250 - 1.083) = -8.639
Range of Utilities = Max - Min = 5.472 - (-8.639) = 14.111

2. Delivery Period in Days
Level  Part Utility
D18    3.389
D22    1.222
D28    = -(3.389 + 1.222) = -4.611
Range of Utilities = Max - Min = 3.389 - (-4.611) = 8

3. Number of Tools
Level  Part Utility
T4     -10.361
T8     1.556
T10    = -(-10.361 + 1.556) = 8.805
Range of Utilities = Max - Min = 8.805 - (-10.361) = 19.166
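Under effects coding the part utilities of an attribute add up to zero, so the utility of the level coded -1 is minus the sum of the estimated ones. A small sketch of Step 5 using the estimated part utilities above (the dictionary and function names are assumptions made for the example):

```python
# Part utilities estimated by the regression (coefficients of Var 1 .. Var 7).
ESTIMATED = {
    "Set Up Time": {"S3": 5.472, "S6": 4.250, "S9": -1.083},
    "Delivery Period": {"D18": 3.389, "D22": 1.222},
    "Number of Tools": {"T4": -10.361, "T8": 1.556},
}
# Level coded -1 in every column for each attribute.
OMITTED = {"Set Up Time": "S12", "Delivery Period": "D28", "Number of Tools": "T10"}

def complete_part_utilities(estimated, omitted):
    """Add the omitted level as minus the sum of the estimated part utilities."""
    full = {}
    for attribute, utilities in estimated.items():
        full[attribute] = dict(utilities)
        full[attribute][omitted[attribute]] = -sum(utilities.values())
    return full

def utility_ranges(part_utilities):
    """Range of Utilities (Max - Min) for each attribute."""
    return {a: max(u.values()) - min(u.values()) for a, u in part_utilities.items()}

part_utilities = complete_part_utilities(ESTIMATED, OMITTED)
ranges = utility_ranges(part_utilities)
most_important = max(ranges, key=ranges.get)
print({a: round(r, 3) for a, r in ranges.items()}, most_important)
```

The attribute with the largest range is the one whose levels move the customer's utility the most.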
Conjoint Analysis
1. ‘Tools’ is the most important attribute for this customer (it has the
largest Range of Utilities)
2. The level with the highest individual Utility is T10
3. Set Up Time is the second most important attribute

Step 6 - Work out ‘Utility’ of any / all combinations
We can pick one level from each attribute and combine
their part utilities to calculate the total utility of the combination.
e.g. S3, D22, T4 = 5.472 + 1.222 + (-10.361) = -3.667
If we want the best combination, we pick the highest utility
from each attribute and add them.
e.g. in the example worked out, the highest utility is for the
combination –
S3 + D18 + T10 = 5.472 + 3.389 + 8.805 = 17.666
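Step 6 is a simple addition of part utilities, sketched below with the values from Step 5. The dictionary and function names are assumptions made for the example.

```python
# Part utilities from Step 5, including the levels derived by subtraction.
PART_UTILITIES = {
    "Set Up Time": {"S3": 5.472, "S6": 4.250, "S9": -1.083, "S12": -8.639},
    "Delivery Period": {"D18": 3.389, "D22": 1.222, "D28": -4.611},
    "Number of Tools": {"T4": -10.361, "T8": 1.556, "T10": 8.805},
}

def total_utility(combination):
    """Sum of the part utilities of the chosen level of each attribute."""
    return sum(PART_UTILITIES[attribute][level] for attribute, level in combination.items())

def best_combination():
    """Level with the highest part utility for each attribute."""
    return {a: max(levels, key=levels.get) for a, levels in PART_UTILITIES.items()}

example = {"Set Up Time": "S3", "Delivery Period": "D22", "Number of Tools": "T4"}
best = best_combination()
print(round(total_utility(example), 3))
print(best, round(total_utility(best), 3))
```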
Conjoint Analysis
Perform the Conjoint Analysis using the 'Regression Method' for
the data of a Paint Company.
1. Indicate the Part & Attribute Utility Levels.
2. Also indicate the total Utility for the best combination of attributes.
The three important attributes identified for Paint are:
1. Life - No. of Years the Paint Coat Lasts
2. Price - The Price of One Litre of Paint
3. Colour - Colour Shade of Paint
The levels of the above mentioned attributes are as follows:
1. Life - 3 Years, 4 Years, 5 Years
2. Price - Rs 50 / Lt., Rs 60 / Lt., Rs 70 / Lt.
3. Colour - Green, Blue, Cream
The rankings of all 27 combinations are given on the next slide
Conjoint Analysis

17
Conjoint Analysis
Step 1 - Coding of the attribute level (‘Effects Coding’)
Life in Years       Var 1  Var 2
3                   1      0
4                   0      1
5                   -1     -1

Price (Rs Per Lt.)  Var 4  Var 5
50                  1      0
60                  0      1
70                  -1     -1

Colour              Var 6  Var 7
Green               1      0
Blue                0      1
Cream               -1     -1
Conjoint Analysis

19
Conjoint Analysis

Highest Part Level Utility = 7, i.e. 5 Years Life, &
Highest Range of Utility is ‘Life’ = 14.11