Combinepdf PDF

Multiple Correlation & Regression
• In statistics, the coefficient of Multiple Correlation is a measure of how well a given
variable can be predicted using a linear function of a set of other variables.
• Regression Analysis predicts the value of the Dependent Variable (D.V.) based on the
known values of the Independent Variables (I.V.), assuming an average mathematical
relationship between two or more variables.
• Correlation & Regression are generally performed
together.
• Correlation – Degree of association between two
sets of quantitative data.
• Regression analysis – Explains the variation in one
variable (called the Dependent Variable), based on
the variation in one or more other variables (called
the Independent Variables).
• One Dependent Variable (D.V.) & One
Independent Variable (I.V.) – Simple Regression
• Multiple Independent Variables & one Dependent
Variable – Multiple Regression
• Worked Out Example –
• A manufacturer & marketer of electric motors
would like to build a regression model consisting
of 5 or 6 independent variables to predict sales.
Past data has been collected for 15 sales territories
on sales & 6 different Independent Variables. Build
a Regression Model & recommend whether or not
it should be used by the company.
• Data –
• Dependent Variable –
• Y = Sales in Rs Lakh in the territory
• Independent Variables –
• X1 = Market Potential in the territory (in Rs Lakh)
• X2 = No. of Dealers of the company in the
territory
• X3 = No. of Sales Persons in territory
• X4 = Index of competitor activity in territory on 5
Point scale (1 – Low, 5 – High)
• X5 = No. of service People in the territory
• X6 = No. of existing customers in the territory
• SPSS Procedure –
• Correlation:
– Click on ANALYZE ( or STATISTICS depending
upon SPSS version)
– Click on CORRELATE, followed by BIVARIATE
– Select all the variables from list with a ‘Right Arrow’
– Select PEARSON under heading of Correlation
Coefficient
– Select ‘2 – tailed’ under heading Test of Significance
– Click OK to get the matrix of pair-wise ‘Pearson
Correlations’ among all the variables selected, along
with the two-tailed significance of each pair-wise
correlation.
• SPSS Procedure –
• Regression:
• Click on ANALYZE (or STATISTICS)
• Click on REGRESSION followed by LINEAR
• Select the Dependent Variable & transfer it to the
Dependent Variable box using the arrow button
• Select the Independent Variables & transfer them to the
Independent Variable box using the arrow button
• Select ENTER as the Method
Multiple Correlation & Regression – SPSS Output (‘Enter’ method)
The Standard Error of the Estimate for Regression measures the amount of
variability in the points around the Regression line.
It is the Standard Deviation of the data points as they are distributed around
the Regression line.
R Square is a basic metric which tells you how much of the variance in the
Dependent Variable is explained by the model. In a multiple linear regression,
if you keep on adding new variables, the R Square value never decreases,
irrespective of whether the added variable is significant.
Adjusted R Square corrects R Square for the number of Independent Variables in
the model. It increases only when a newly added variable improves the model by
more than would be expected by chance. So in a multiple linear regression we
should always look at Adjusted R Square rather than R Square.
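The usual adjustment can be sketched as a small function. This is only an illustration: the function name is made up here, and the R Square of 0.90 is a hypothetical value, not the one from the SPSS output; n = 15 territories and p = 6 IVs follow the worked example.

```python
def adjusted_r_square(r_square: float, n: int, p: int) -> float:
    """Adjust R Square for p independent variables fitted on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    # Penalise the unexplained share (1 - R Square) by the ratio of degrees of freedom.
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - p - 1)


# Hypothetical R Square of 0.90 with n = 15 and p = 6.
print(round(adjusted_r_square(0.90, n=15, p=6), 3))
```

With few observations and many IVs the penalty is large, which is why Adjusted R Square can fall when a weak variable is added.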
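Correlation & Regression
[Figure: Regression line Y Est. = a + b X, with the Dependent Variable (Y) on the vertical axis and the Independent Variable (X) on the horizontal axis; bands at ± 1 Se, ± 2 Se and ± 3 Se are drawn around the line.]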

Residuals. The difference between the observed value of the
dependent variable (y) and the predicted value (ŷ) is called the
residual (e). Each data point has one residual. Both the sum and
the mean of the residuals are equal to zero.
df – Degrees of Freedom –
For Regression: p (no. of IVs)
For Residual: n - p - 1 (where n is the sample size)
MS = SS / df
F = MS (Regression) / MS (Residual)
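The ANOVA arithmetic above can be sketched as follows. The Sums of Squares used here (600 and 80) are hypothetical numbers chosen for easy checking, not values from the SPSS output; n = 15 and p = 6 follow the worked example, and the function name is an assumption.

```python
def regression_anova(ss_regression: float, ss_residual: float, n: int, p: int) -> dict:
    """Return df, MS and F for a regression with p IVs and n observations."""
    df_regression = p
    df_residual = n - p - 1
    ms_regression = ss_regression / df_regression  # MS = SS / df
    ms_residual = ss_residual / df_residual
    return {
        "df_regression": df_regression,
        "df_residual": df_residual,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f": ms_regression / ms_residual,
    }


# Hypothetical Sums of Squares for a model with 6 IVs and 15 territories.
table = regression_anova(ss_regression=600.0, ss_residual=80.0, n=15, p=6)
print(table)
```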
The t statistic is the coefficient divided by its standard error.
The standard error is an estimate of the standard deviation of the
coefficient, i.e. the amount it would vary from sample to sample. It can be
thought of as a measure of the precision with which the regression
coefficient is measured.
Referring to the Unstandardized Coefficients, the regression equation is:
Y (Sales) = -3.173 + 0.227 (Market Potential) + 0.819 (Dealers) + 1.091 (Sales Persons)
- 1.893 (Comp Act Index) - 0.549 (Service Persons) + 0.066 (Exist. Customers)
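The fitted equation can be used for prediction as in the sketch below. The coefficients are the unstandardized coefficients quoted above; the territory values passed in are hypothetical and only show how the equation is applied.

```python
# Unstandardized coefficients from the regression equation above.
INTERCEPT = -3.173
COEFFICIENTS = {
    "market_potential": 0.227,
    "dealers": 0.819,
    "sales_persons": 1.091,
    "competitor_index": -1.893,
    "service_persons": -0.549,
    "existing_customers": 0.066,
}


def predict_sales(territory: dict) -> float:
    """Predicted sales (Rs Lakh) for one territory."""
    return INTERCEPT + sum(COEFFICIENTS[name] * territory[name] for name in COEFFICIENTS)


# Hypothetical territory, for illustration only.
example_territory = {
    "market_potential": 100,
    "dealers": 10,
    "sales_persons": 5,
    "competitor_index": 3,
    "service_persons": 4,
    "existing_customers": 50,
}
print(round(predict_sales(example_territory), 3))
```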
Standardization of the coefficients is usually done to answer the question of which
of the independent variables has a greater effect on the dependent variable in
a multiple regression analysis, when the variables are measured in different units
of measurement (for example, income measured in dollars and family
size measured in number of individuals).
A regression carried out on original (unstandardized) variables produces
unstandardized coefficients. A regression carried out on standardized variables
produces standardized coefficients. Values for standardized and unstandardized
coefficients can also be derived subsequent to either type of analysis.
Before solving a multiple regression problem, all variables (independent and
dependent) can be standardized. Each variable can be standardized by subtracting
its mean from each of its values and then dividing these new values by
the standard deviation of the variable. Standardizing all variables in a multiple
regression yields standardized regression coefficients that show the change in the
dependent variable measured in standard deviations.
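A minimal sketch of the standardization step, assuming the sample standard deviation (n - 1 in the denominator) is used. The input values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean from each value, then divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


# Hypothetical values of one variable.
z_scores = standardize([12, 11, 10, 13, 12])
print([round(z, 3) for z in z_scores])
```

After standardization every variable has mean 0 and standard deviation 1, so the coefficients of different variables become comparable.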
• Ex 2 – An organization would like to build a
Regression Model consisting of four independent
variables to predict the Compensation (Dependent
Variable) of its employees. Past data has been
collected for 15 different employees & four
independent variables. Build a Regression Model
& recommend its proper usage.
• The data is as follows –
• Dependent Variable (D.V.) –
• Y = Compensation in Rs.
• Independent Variables (I.V.) –
• 1. Experience in Years
• 2. Education in Years (After 10th)
• 3. Number of Employees Supervised
• 4. Number of Projects Handled
The dataset consisting of the observations is given on the
next slide.
Attribute Type Perceptual Mapping using
Discriminant Analysis
• Positioning is essentially concerned with mapping
a consumer’s mind & placing all the competing
brands of a product category in appropriate ‘slots’
or ‘positions’ on it.
1. Customer Survey – What customer thinks about
available or particular brands in market based on
some important attributes.
2. This can be plotted on graph (2 attributes at a
time & relative positioning of Brands).
This is Perceptual Mapping of consumer perception
about competing Brands in a Product Category.
• Methods –
1. Attribute based approach –
– Using Discriminant Analysis
2. Similarity / Dissimilarity based approach
– Easy to understand intuitively
– Useful in gaining a good understanding of the consumer
psyche
– Based on some kind of distance measure between the
Brands being rated
– Simple Way – Provide the customer with cards, each
containing a pair of brands written on it
• The application areas for Attribute Type
Perceptual Mapping are the same as those of MDS.
• We are now further interested in finding out the
perceptual differences at the attribute level.
• This leads to actionable points to improve a
particular attribute (based on priority) and thereby
further improve the Brand Image / Brand Score.
Multidimensional Scaling (MDS) for Brand Positioning
[Figure: MDS 2D output. Dimension 1 (Value for Money) on the horizontal axis, Dimension 2 (After Sales Service) on the vertical axis; plotted brands: 1 AIWA, 2 Videocon, 3 LG, 4 Samsung, 5 Sony, 6 Onida, 7 Thompson, 8 BPL.]
• Example – A chocolate company wants to draw a
perceptual map using the attribute based procedure.
Assume Nestle vs Cadbury vs Amul.
• Data was collected from 15 respondents (5
consumers of each brand) on five attributes, viz.
Price, Quality, Availability, Packaging & Taste.
• The variables are measured using different scales,
but a higher value indicates a more favorable rating.
Brand  Price  Quality  Availability  Packaging  Taste
1      12     34       500           5          18
1      11     35       234           4          15
1      10     36       250           4          14
1      13     22       345           5          12
1      12     23       432           3          13
2      10     14       234           2          15
2      11     17       231           3          11
2      15     23       45            4          10
2      13     14       35            3          12
2      12     15       25            2          10
3      10     22       75            4          8
3      12     24       80            4          7
3      13     28       90            5          10
3      11     17       96            2          12
3      11     18       59            2          6
Discriminant Analysis – SPSS Input
• In Variable View: Measure for Brand – Nominal; Type – Numeric (All)
• SPSS Procedure –
1. Click on ANALYZE, then CLASSIFY, then DISCRIMINANT
2. Move the DV into Grouping Variable; select the respective IVs
and use the arrow button to move them
3. Select the DV & click ‘Define Range’
4. Enter the values – Min as 1 and Max as 3 (we have 3 groups), then ‘Continue’
5. Click on ‘Statistics’
6. Click on ANOVA, Fisher’s & Unstandardized, then ‘Continue’
7. Click on ‘Classify’
8. Click on ‘Combined – Groups’ (under the heading ‘Plot’), ‘Summary Table’
& ‘Leave-one-out classification’, then ‘Continue’
9. Click on ‘Save’
10. Click on all the options & ‘Continue’
11. Click on ‘Ok’
Discriminant Analysis – SPSS Output
Wilks’ Lambda
• Wilks’ Lambda = Within S.S. / Total S.S. (S.S. = Sum of Squares)
• Wilks’ Lambda lies between 0 and 1. Any value closer to 0 indicates better
discriminating power.
• If the model is good, the Within S.S. should be as small as possible.
Eigenvalue
• Eigenvalue = Between S.S. / Within S.S.
• If the model is good, the Eigenvalue should be greater than 1,
i.e. Between S.S. > Within S.S. The higher the value, the better.
Canonical Correlation
• Canonical Correlation = Sq Root of (Between S.S. / Total S.S.)
• Any value > 0.5 – Accept the model
Significance value / Confidence Level
• Confidence Level = (1 - Sig. Value). A Confidence Level of 99% is very good.
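The three ratios can be illustrated on a single variable. The sketch below applies them to the Availability ratings from the data table above, grouped by brand code. This is a simplification: SPSS computes these statistics on the discriminant function scores, not on one raw attribute, so the numbers here will not match the SPSS output. The function and variable names are assumptions.

```python
from math import sqrt

# Availability ratings by brand code, copied from the data table above.
availability = {
    1: [500, 234, 250, 345, 432],
    2: [234, 231, 45, 35, 25],
    3: [75, 80, 90, 96, 59],
}


def sums_of_squares(groups):
    """Return (between, within, total) Sums of Squares for grouped values."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for g in groups.values():
        group_mean = sum(g) / len(g)
        within_ss += sum((v - group_mean) ** 2 for v in g)
    return total_ss - within_ss, within_ss, total_ss


between_ss, within_ss, total_ss = sums_of_squares(availability)
wilks_lambda = within_ss / total_ss
eigenvalue = between_ss / within_ss
canonical_correlation = sqrt(between_ss / total_ss)
print(round(wilks_lambda, 3), round(eigenvalue, 3), round(canonical_correlation, 3))
```

Because Total S.S. = Between S.S. + Within S.S., the three measures carry the same information: Wilks’ Lambda = 1 / (1 + Eigenvalue) and Wilks’ Lambda = 1 - (Canonical Correlation)sq.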
Discriminant Analysis – Graph
Output Interpretations
• As we had 3 groups in the example, we have two
Functions / equations (K - 1).
• In some examples we may have 4 or 5 functions. We need to
judge the importance of each from its associated Eigenvalue
and the amount of variance it explains in the original data.
• The significance test also tells us whether a given function
discriminates significantly between the groups (Brands).
• If we use two functions, there will be one perceptual map,
with the two functions forming the two axes.
• If we use three functions, we will get three perceptual
maps: Function 1 vs Function 2, F2 vs F3 & F3 vs F1.
• To draw the Graph –
• Function 1 – X Axis, Function 2 – Y Axis, with suitable scaling
• Use the following table to read the coefficients of each attribute on
Functions 1 & 2, plot the point and join it to the origin (0, 0).
• Similarly, use the chart above to determine the coordinates of the Brands on the Graph.
• Variables with longer vectors in a given dimension, and those closest
to an axis, contribute more to the interpretation of that dimension.
Looking at the graph, we can give a label to each dimension.
• E.g. Dimension 1 – Availability & Quality
• Dimension 2 – Price & Taste
• Packaging stands alone and does not affect either dimension.
• Nestle seems to be stronger on Dimension 1, i.e. Availability &
Quality
• Cadbury is stronger on Dimension 2, i.e. Taste & Price
[Figure: Perceptual map with axes Dim 1 and Dim 2; attribute vectors Finish, Packaging, Price, Colour and Long Lasting; brands Lakme, MB and Rev.]
2. Similarity / Dissimilarity based approach
– Simple Way – Provide the customer with cards, each
containing a pair of brands written on it
– Ask the customer to write down a number indicating the
difference between the two brands on a numerical scale,
which can represent the distance.
– This can be repeated for all pairs of brands under
consideration
– No Attributes are specified for the exercise
– The customer may have the top line parameters in
mind while rating but would not specify them
– The customer would only indicate the distance (Dissimilarity) as
a numerical value
Output – Number of Dimensions (Based on ‘Stress’)
& Interpretations
• Example – Use MDS to determine the perception of 8
TV Brands & try to plot a positioning map of the
eight brands. Also find out the no. of dimensions
the consumers seem to be using.
                  AIWA  Videocon  LG  Samsung  Sony  Onida  Thompson  BPL
Var1 (AIWA)       0     3         6   8        1     2      7         8
Var2 (Videocon)   3     0         4   6        4     5      2         5
Var3 (LG)         6     4         0   3        2     4      6         1
Var4 (Samsung)    8     6         3   0        3     5      4         7
Var5 (Sony)       1     4         2   3        0     2      8         5
Var6 (Onida)      2     5         4   5        2     0      3         6
Var7 (Thompson)   7     2         6   4        8     3      0         5
Var8 (BPL)        8     5         1   7        5     6      5         0
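Before running MDS it can help to check that the input really is a distance matrix (symmetric, zero diagonal) and to see which pairs the respondent considers closest and farthest. The sketch below does only that; it does not perform the scaling itself. The BPL diagonal entry is taken as 0.

```python
brands = ["AIWA", "Videocon", "LG", "Samsung", "Sony", "Onida", "Thompson", "BPL"]

# Dissimilarity ratings from the table above (BPL diagonal taken as 0).
d = [
    [0, 3, 6, 8, 1, 2, 7, 8],
    [3, 0, 4, 6, 4, 5, 2, 5],
    [6, 4, 0, 3, 2, 4, 6, 1],
    [8, 6, 3, 0, 3, 5, 4, 7],
    [1, 4, 2, 3, 0, 2, 8, 5],
    [2, 5, 4, 5, 2, 0, 3, 6],
    [7, 2, 6, 4, 8, 3, 0, 5],
    [8, 5, 1, 7, 5, 6, 5, 0],
]


def is_valid_distance_matrix(m):
    """True if the matrix is square, symmetric and has a zero diagonal."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    return all(m[i][i] == 0 and m[i][j] == m[j][i] for i in range(n) for j in range(n))


# All unordered brand pairs with their dissimilarity, smallest first.
pairs = sorted(
    (d[i][j], brands[i], brands[j])
    for i in range(len(brands))
    for j in range(i + 1, len(brands))
)
closest = [(a, b) for value, a, b in pairs if value == pairs[0][0]]
farthest = [(a, b) for value, a, b in pairs if value == pairs[-1][0]]
print(is_valid_distance_matrix(d), closest, farthest)
```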
• SPSS Procedure –
1. Click on ANALYZE, then SCALE, then MULTIDIMENSIONAL SCALING
2. Select all the variables & click the arrow to move them into the
right-side block
3. Click on ‘Model’
4. Select Ordinal, Matrix, Dimensions: Minimum 1 & Maximum 3, and
Euclidean Distance, then ‘Continue’
5. Click on ‘Options’, select ‘Individual Subject Plots’, then ‘Continue’
6. Select ‘Data are Distances’ & click ‘Ok’
• Important Output –
• Iteration History – This is available in tables with the
‘Stress’ value & the improvement at every iteration, for
all dimension values (1 Dimension, 2 Dimension &
3 Dimension solutions).
• Stimulus Coordinates – This is the important table. You
can draw the multidimensional scaling map from these
coordinates. An example is shown on the next slide.
• ‘Stress’ value & ‘RSQ’ (Squared Correlation) value for
every number of Dimensions (in our example –
1, 2 & 3 Dimensions)
Multidimensional Scaling (MDS) for Brand Positioning
No. of Dimensions  Stress (S – Stress or Kruskal Stress)  RSQ
3 Dimensional      0.05230                                0.96043
2 Dimensional      0.24015                                0.58135
1 Dimensional      0.43158                                0.35255
The Stress value indicates lack of fit, so it should be as close to zero as
possible.
RSQ should be close to 1 (at least more than 0.5). RSQ (R Square)
is interpreted as the proportion of variance of the transformed data
accounted for by the distances in the model.
Clearly the one dimensional solution is not good. The two dimensional
solution is better, but the three dimensional solution is the best: its Stress
is as low as 0.05.
Since we are comparing only 8 brands, we will not get beyond a 3
dimensional solution. If we compare, say, 12 to 15 brands, we may get more.
However, a trade-off is always required between the no. of dimensions
and ease of interpretation, from the economy angle.
• Assuming that we have decided to use the 3 Dimensional solution, the
next task is to name the dimensions. For doing so, subject matter
expertise is needed. We must look at the various attributes
offered by these 8 brands through our knowledge of the market.
This tends to be subjective. E.g. in our example the 3 dimensions could be
• Dimension 1 – Value for Money
• Dimension 2 – After Sales Service
• Dimension 3 – Current Brand Image
• From one of the outputs (Configuration derived in 3 dimensions), we
may conclude that some brands enjoy a good brand image wrt Dimension
1, while some are perceived as best on Dimension 2. This is explained wrt
the 2 Dimensional graph in the next slides.
Configuration derived in 3 Dimensions – Stimulus Coordinates
Stress – 0.05230, RSQ – 0.96043
Stimulus  Stimulus Name        Dim 1              Dim 2                  Dim 3
Number                         (Value for Money)  (After Sales Service)  (Current Brand Image)
1         VAR00001 (AIWA)       1.9512             0.2028                 0.0664
2         VAR00002 (Videocon)  -0.1995             1.3140                 0.7743
3         VAR00003 (LG)        -0.6043            -1.3429                 0.4680
4         VAR00004 (Samsung)   -0.9038            -0.2969                -1.8497
5         VAR00005 (Sony)      -0.8931            -1.0092                -0.0350
6         VAR00006 (Onida)      1.1045             0.1529                -0.7070
7         VAR00007 (Thompson)  -1.1031             1.6088                -0.1289
8         VAR00008 (BPL)       -1.1381            -0.6295                 1.4121
Configuration derived in 2 Dimensions – Stimulus Coordinates
Stress – 0.24015, RSQ – 0.58148
Stimulus  Stimulus Name        Dim 1              Dim 2
Number                         (Value for Money)  (After Sales Service)
1         VAR00001 (AIWA)       1.6156             0.4756
2         VAR00002 (Videocon)  -0.276              1.3796
3         VAR00003 (LG)        -0.254             -1.0558
4         VAR00004 (Samsung)   -1.2851            -0.7799
5         VAR00005 (Sony)       0.9602            -0.9335
6         VAR00006 (Onida)      1.1044             0.0665
7         VAR00007 (Thompson)  -0.5683             1.5124
8         VAR00008 (BPL)       -1.2968            -0.662
[Figure: MDS 2D output. Dimension 1 (Value for Money) on the horizontal axis, Dimension 2 (After Sales Service) on the vertical axis; plotted brands: 1 AIWA, 2 Videocon, 3 LG, 4 Samsung, 5 Sony, 6 Onida, 7 Thompson, 8 BPL.]
• Example - A set of eight brands of detergent
available in the market is taken for Multi
Dimensional Scaling (MDS) to determine how
Indian consumers perceive them. MDS is also used to
find out how many dimensions the consumers seem
to be considering when they think of these brands.
The eight brands are - 1. Rin, 2. Nirma, 3. Ariel,
4. Ok, 5. Eta, 6. Wheel, 7. Trilo, 8. Super Bar 501.
• Input data is given on the next slide.
• (For Ref. – Dimension 1 – Value for Money, Dimension 2 –
Dirt Removal, Dimension 3 – Fabric Care)
Important Outputs -
No. of Dimensions  Stress   RSQ
3 Dimensional      0.05975  0.98047
2 Dimensional      0.08609  0.96647
1 Dimensional      0.36852  0.64662
Stress – 0.05975, RSQ – 0.94583
Stimulus  Stimulus Name             Dim 1              Dim 2           Dim 3
Number                              (Value for Money)  (Dirt Removal)  (Fabric Care)
1         VAR00001 (Rin)            -0.1860            -0.8867         -1.1559
2         VAR00002 (Nirma)           1.0175            -1.1665          0.3782
3         VAR00003 (Ariel)          -1.6674            -0.1959          0.4655
4         VAR00004 (OK)             -1.4639             1.3148         -0.0510
5         VAR00005 (Eta)             0.4844             1.5134         -0.7565
6         VAR00006 (Wheel)          -1.0923            -0.9168          0.8062
7         VAR00007 (Trilo)           1.3535            -0.4514         -0.7064
8         VAR00008 (Super Bar 501)   1.5543             0.7892          1.0199
Stress – 0.08616, RSQ – 0.93172
Stimulus  Stimulus Name             Dim 1              Dim 2
Number                              (Value for Money)  (Dirt Removal)
1         VAR00001 (Rin)            -0.11              -1.03
2         VAR00002 (Nirma)           0.75              -1.14
3         VAR00003 (Ariel)          -1.43               0.12
4         VAR00004 (OK)             -1.26               1.05
5         VAR00005 (Eta)             0.53               1.4
6         VAR00006 (Wheel)          -1.09              -0.85
7         VAR00007 (Trilo)           1.22              -0.35
8         VAR00008 (Super Bar 501)   1.4                0.8
Cluster Analysis
• Many times the whole population is diverse, but it might
consist of a number of similar Groups (Clusters).
• The objective of Cluster Analysis is to assign
observations to groups or Clusters so that
observations within each group are similar to one
another wrt the variables or attributes of
interest, and the groups themselves stand apart from one
another.
• In other words, the objective is to divide the
observations into homogeneous and distinct groups.
• Cluster Analysis seeks to discover the number and
composition of the groups.
• Sometimes a common marketing strategy may not
work out. However, a separate marketing strategy for
each cluster, based on Age, Gender, Income, Marital
Status, Years of loyalty (how long the person has been a
customer), might work out.
Cluster Analysis
• Illustrative Example
Person  Expenditure on Food  Expenditure on Clothing
A       2                    4
B       8                    2
C       9                    3
D       1                    5
E       8.5                  1
• Distance Measures
1. Euclidean – Between A & B = Sq Root of ((2 - 8)sq + (4 - 2)sq)
= Sq Root of 40 = 6.325
2. Squared Euclidean – Between A & B = (2 - 8)sq + (4 - 2)sq = 40
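The two distance measures can be computed for every pair of persons as in this sketch, which uses the five observations above. The helper names are assumptions; `math.dist` needs Python 3.8 or later.

```python
from itertools import combinations
from math import dist

# (Expenditure on Food, Expenditure on Clothing) for each person.
points = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}


def euclidean(p: str, q: str) -> float:
    """Straight-line distance between two persons."""
    return dist(points[p], points[q])


def squared_euclidean(p: str, q: str) -> float:
    """Sum of squared differences; avoids the square root."""
    return sum((a - b) ** 2 for a, b in zip(points[p], points[q]))


for p, q in combinations(points, 2):
    print(p, q, round(euclidean(p, q), 3), squared_euclidean(p, q))
```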
Cluster Analysis
• Grouping of observations (distance matrix to be filled in)
Clusters  A  B  C  D  E
A
B
C
D
E
Cluster Analysis - Hierarchical
Clusters  A  B      C      D      E
A         0  6.325  7.071  1.414  7.159
B            0      1.414  7.616  1.118
C                   0      8.246  2.062
D                          0      8.500
E                                 0

Clusters  (BE)  A      C      D
(BE)      0     6.325  1.414  7.616
A               0      7.071  1.414
C                      0      8.246
D                             0

Clusters  (BE)  (AD)   C
(BE)      0     6.325  1.414
(AD)            0      7.071
C                      0
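The merged-cluster distances in the tables above match the nearest-neighbour (single linkage) rule, e.g. (BE) to A is the smaller of B–A and E–A. Under that assumption, the agglomeration can be sketched as below; the function names are made up for illustration and ties are broken by the order in which clusters are listed.

```python
from itertools import combinations
from math import dist

points = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}


def gap(a, b):
    """Single linkage: smallest distance between a member of a and a member of b."""
    return min(dist(points[p], points[q]) for p in a for q in b)


def single_linkage(names):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [frozenset([name]) for name in names]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        history.append((sorted(a), sorted(b), round(gap(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


history = single_linkage(points)
for left, right, height in history:
    print(left, "+", right, "at distance", height)
```

B and E merge first, then A and D, then C joins (BE), which is the leaf order A D B E C shown in the dendrogram.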
Dendrogram Output
[Figure: Dendrogram. Vertical axis: Similarity, with ticks at 27.54, 51.69, 75.85 and 100; leaves in the order A, D, B, E, C.]
Cluster Analysis – K Means
• Iteration 1
Cluster 1                Cluster 2
Obs      X1    X2        Obs      X1    X2
A        2     4         C        9     3
B        8     2         E        8.5   1
D        1     5
Average  3.67  3.67      Average  8.75  2
• Iteration 2
Cluster 1                Cluster 2
Obs      X1    X2        Obs      X1    X2
A        2     4         C        9     3
D        1     5         E        8.5   1
                         B        8     2
Average  1.5   4.5       Average  8.5   2
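The two iterations above can be reproduced with a small K Means loop. This is a sketch that starts from the Iteration 1 grouping shown in the table; it does not handle the case of a cluster becoming empty, and the function names are assumptions.

```python
from math import dist

points = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}


def centroid(names):
    """Average X1 and X2 of the observations in a cluster."""
    coords = [points[n] for n in names]
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def k_means(initial_clusters, max_iter=10):
    clusters = [sorted(c) for c in initial_clusters]
    for _ in range(max_iter):
        centers = [centroid(c) for c in clusters]
        reassigned = [[] for _ in centers]
        # Move every observation to the cluster with the nearest centroid.
        for name, p in points.items():
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            reassigned[nearest].append(name)
        if reassigned == clusters:  # no observation moved: converged
            break
        clusters = reassigned
    return clusters, [centroid(c) for c in clusters]


final_clusters, final_centers = k_means([["A", "B", "D"], ["C", "E"]])
print(final_clusters, final_centers)
```

B is nearer to the second centroid (8.75, 2) than to the first (3.67, 3.67), so it moves to Cluster 2 in Iteration 2, after which no observation changes cluster.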
Distance Definitions
• Single – Minimum distance between an item in one
Cluster & an item in the other Cluster (Nearest Neighbor
Method)
• Complete – Maximum distance between an item in one
Cluster & an item in the other Cluster (Furthest Neighbor
Method)
• Centroid – Distance between the Cluster centroids, where the
centroid is the average value within the Cluster (average rating)
• Ward’s Method – Calculate the distance between each
respondent’s rating and the Cluster centroid by the squared
Euclidean method & add them
Worked calculations (persons A to E and the distance matrix as in the Hierarchical example)
Dist. – AB – , AE – , BD – , ED –
Centroid – AD – ((2 + 1)/2, (4 + 5)/2) = (1.5, 4.5)
Similarly calculate Centroid – BE –
Ward’s – AD – (1 - 1.5)sq + (5 - 4.5)sq + (2 - 1.5)sq + (4 - 4.5)sq = 1
Similarly calculate Ward’s – BE –
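The centroid and Ward’s calculations can be checked with a short sketch, using the same five observations. The function names are assumptions.

```python
points = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}


def centroid(cluster):
    """Average value of each variable within the cluster."""
    coords = [points[name] for name in cluster]
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def ward_sum_of_squares(cluster):
    """Sum of squared Euclidean distances from each member to the centroid."""
    center = centroid(cluster)
    return sum(
        (value - mean) ** 2
        for name in cluster
        for value, mean in zip(points[name], center)
    )


print(centroid("AD"), ward_sum_of_squares("AD"))
print(centroid("BE"), ward_sum_of_squares("BE"))
```

For cluster BE this gives a centroid of (8.25, 1.5) and a Ward’s value of 0.625.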
• Between the Group Linkage – It is the average
distance (Sq Eucl) between all pairs of customers
taken one from each of the two clusters
• Within the Group Linkage – First combine the two
clusters; it is the average distance (Sq Eucl) between all
pairs of customers in the combined cluster
Between the Group Linkage (Sq Eucl): Cluster AD & BE –
• A–B – (6.325)sq, A–E – (7.159)sq, D–B – (7.616)sq, D–E – (8.500)sq
• Add all the squares and divide by 4
Within the Group Linkage (Sq Eucl): Cluster AD & BE –
• A–D – (1.414)sq, B–E – (1.118)sq
• Add this total to the total of the Between the Group squares above
& divide by 6
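Both averages can be computed directly from the coordinates, as in this sketch for clusters AD and BE. The function names are assumptions.

```python
from itertools import combinations, product

points = {"A": (2, 4), "B": (8, 2), "C": (9, 3), "D": (1, 5), "E": (8.5, 1)}


def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(points[p], points[q]))


def between_group_linkage(cluster_1, cluster_2):
    """Average squared distance over pairs taken one from each cluster."""
    pairs = list(product(cluster_1, cluster_2))
    return sum(squared_euclidean(p, q) for p, q in pairs) / len(pairs)


def within_group_linkage(cluster_1, cluster_2):
    """Average squared distance over all pairs in the combined cluster."""
    pairs = list(combinations(list(cluster_1) + list(cluster_2), 2))
    return sum(squared_euclidean(p, q) for p, q in pairs) / len(pairs)


print(between_group_linkage("AD", "BE"))           # (40 + 51.25 + 58 + 72.25) / 4
print(round(within_group_linkage("AD", "BE"), 3))  # (221.5 + 2 + 1.25) / 6
```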
SPSS – Hierarchical Clustering
Variable View
Name     VAR 00..1
Type     Numeric
Decimal  2
Label    Type the question or a concise form of the question
Values   None
Missing  Check for missing values in the data, if any
Columns  Actual width (for information only)
Align    Right align (your choice)
Measure  Scale
• SPSS Procedure –
1. Click on ANALYZE, then CLASSIFY, then HIERARCHICAL CLUSTER
2. Enter the variables by selecting them and clicking the arrow
3. Click on ‘Statistics’, tick ‘Agglomeration Schedule’ & click ‘Continue’
4. Click on ‘Plots’, select Dendrogram, All Clusters & Vertical, then ‘Continue’
5. Click on ‘Method’, select Centroid Clustering, Sq. Eucl. Dist. &
None (Standardize), then ‘Continue’
6. Click on ‘Ok’
SPSS – Hierarchical Clustering Output
SPSS – K Means Cluster Analysis
1. Click on ANALYZE, then CLASSIFY, then K MEANS CLUSTER
2. Select the variables & enter the number of clusters
3. Click on ‘Save’, tick ‘Cluster Membership’ & click ‘Continue’
4. Click on ‘Iterate’, keep the default value of 10 & click ‘Continue’
5. Click on ‘Options’, select Initial Cluster Centers, ANOVA Table &
Cluster Information for each case, then ‘Continue’
6. Click on ‘Ok’
SPSS – K Means Output
F = MSS Between / MSS Within
The higher the ratio, the better.
In the case above –
F (Var 1) = 58.800 / 0.333 = 176.40
F (Var 2) = 7.500 / 0.833 = 9.000
(0.333 and 0.833 are the rounded values of 1/3 and 5/6; the F values use the unrounded figures.)
Sig. value for Var 1 = 0.001, i.e.
Confidence Level = (1 - 0.001) * 100 = 99.9%
Sig. value for Var 2 = 0.058, i.e.
Confidence Level = (1 - 0.058) * 100 = 94.2%
Cluster Membership appears in Data View.
References
• The Sum of Squares is the sum of the squared
variation, where variation is defined as the spread
between each individual value and the mean. In regression,
the Sum of Squares of the residuals is found by squaring the
distance between each data point and the line of best fit and
summing these squares. The line of best fit minimizes this
value.
• In statistical data analysis, the Total Sum of Squares (TSS
or SST) is defined as the sum, over all observations,
of the squared differences of each observation from the
overall mean.
• Mean Squares – Each Mean Square value is computed by
dividing a Sum of Squares value by the corresponding
degrees of freedom. In other words, for each row in the
ANOVA table, divide the SS value by the df value to
compute the MS value.
• In statistics, the degrees of freedom (DF / df) indicate the
number of independent values that can vary in an analysis
without breaking any constraints.
• In statistics, the Mean Squared Error (MSE) or Mean
Squared Deviation (MSD) of an estimator measures the
average of the squares of the errors, i.e. the
average squared difference between the estimated values
and the actual values.
Example
• A Cluster Analysis is to be done based on attitudes
towards shopping. Based on past research, six attitudinal
variables were identified. 20 consumers were asked to
express their degree of agreement with the following
statements on a 7 point scale (1 = Disagree, 7 = Agree)
• V1 = Shopping is fun
• V2 = Shopping is bad for your budget
• V3 = I combine shopping with eating out
• V4 = I try to get the best buys when shopping
• V5 = I do not care about shopping
• V6 = You can save a lot of money by comparing prices
SPSS Output
Cluster membership (case numbers):
Cluster 1 – 4, 10, 14, 16, 18, 19 (Total 6)
Cluster 2 – 2, 5, 9, 11, 13, 20 (Total 6)
Cluster 3 – 1, 3, 6, 7, 8, 12, 15, 17 (Total 8)

No  Statement                                              Cluster 1  Cluster 2  Cluster 3
1   V1 = Shopping is fun                                   3.50       1.67       5.75
2   V2 = Shopping is bad for your budget                   5.83       3.00       3.62
3   V3 = I combine shopping with eating out                3.33       1.83       6.00
4   V4 = I try to get the best buys when shopping          6.00       3.50       3.12
5   V5 = I do not care about shopping                      3.50       5.50       3.12
6   V6 = You can save a lot of money by comparing prices   6.00       3.33       3.88

Cluster No  Likes / Agreement /  Dislikes / Disapproval /  Interpretation               Marketing
            Affinity             Disagreement                                           intervention
1           V2, V4 & V6          NONE                      Economical
2           V5                   V1 & V3                   Apathetic / Hate
3           V1 & V3              NONE                      Shopping Spree / Fun Loving
Interpretation
Cluster 3 – Could be labeled as ‘Shopping Spree, Fun Loving’. It has high values on V1 & V3 and a
low value on V5.
Cluster 2 – Could be labeled as ‘Apathetic Buyers’; they are the opposite of Cluster 3. It is low on
V1 & V3 & high on V5.
Cluster 1 – Could be labeled as ‘Economical Shoppers’. High on V2, V4 & V6.
Conjoint Analysis
• Conjoint Analysis is a multivariate technique
which captures the exact levels of ‘Utility’ that an
individual customer puts on the various attributes of
product offerings.
• E.g. how much ‘Utility’ the customer sees in ‘price
level’, ‘after sales service’ or ‘product features’
• This is extremely useful in the Product Development
phase.
• It can also be used to reposition an existing
product.
• Conjoint Analysis starts with the notion that a product is a
bundle of various attributes. Customers evaluate each
attribute separately (depending upon the satisfaction level of
each attribute and its perceived weightage) and assign a value
to the product.
• It enables a direct comparison between the ‘Utilities’ of
various attributes (at different levels).
• It becomes possible to offer the best combination of
attributes at different levels.
• If this is done across a sample of customers, segment wise,
it is quite possible to predict market share and the response of
customers to changes in competitive strategy through
changes in the marketing elements.
• The usage can be at three levels –
– Individual Customer
– Segment Level – advisable to do segment wise without
losing the benefit of getting individual opinions
– Across Segments
• To avoid creating unmanageable data, the researcher
has to select only those attributes and levels which
are feasible offerings.
• Likewise, the number of combinations offered to
respondents for ranking should be manageable
and should not be too high / impractical.
Conjoint Analysis
• Example - A CNC Machine (B2B Case) is used to
perform a variety of manufacturing operations.
There are 3 attributes; the levels of the attributes that the
company is willing to design are as given below –
• 1. Set up time (in Minutes) – 3, 6, 9, 12 (4 Levels)
• 2. Delivery Period (in Days) – 18, 22, 28 (3 Levels)
• 3. Number of Tools – 4, 8, 10 (3 Levels)
• Use Conjoint Analysis to determine the potential
customer’s view points
• Procedure – (Running a Conjoint as a Regression Model)
1. Code the attribute levels (‘Effects Coding’)
2. Prepare the different combinations & get the rankings
done by customers –
– No. of combinations in the said problem
= 4 * 3 * 3 = 36
3. Substitute the combinations with the Codes
(worked out in Step 1)
4. Run the coded data in SPSS & derive the output
5. Get the ‘Part Utility’ & then work out the ‘Ranges of
Utilities’
6. Work out the ‘Utility’ of any / all combinations
Conjoint Analysis
Step 1 - Coding of the attribute level (‘Effects Coding’)
Set Up Time in Minutes   Var 1  Var 2  Var 3
S3                        1      0      0
S6                        0      1      0
S9                        0      0      1
S12                      -1     -1     -1
Delivery Period in Days  Var 4  Var 5
D18                       1      0
D22                       0      1
D28                      -1     -1
Number of Tools          Var 6  Var 7
T4                        1      0
T8                        0      1
T10                      -1     -1
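Steps 1 to 3 can be sketched as below: each level gets its effects code (the last level of every attribute is coded -1 on all of that attribute's variables), and the 36 combinations are generated and coded into Var 1 to Var 7. The dictionary and function names are assumptions made for the illustration.

```python
from itertools import product

# Attribute levels in the order used by the coding table; the last level
# of each attribute is the one coded -1 on every variable.
ATTRIBUTES = {
    "Set Up Time": ["S3", "S6", "S9", "S12"],
    "Delivery Period": ["D18", "D22", "D28"],
    "Number of Tools": ["T4", "T8", "T10"],
}


def effects_code(levels, level):
    """Effects-code one level: k levels give k - 1 variables."""
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == other else 0 for other in levels[:-1]]


def code_profile(profile):
    """Code one combination (one level per attribute) into Var 1 .. Var 7."""
    row = []
    for levels, level in zip(ATTRIBUTES.values(), profile):
        row.extend(effects_code(levels, level))
    return row


profiles = list(product(*ATTRIBUTES.values()))  # 4 * 3 * 3 = 36 combinations
coded = [code_profile(p) for p in profiles]
print(len(profiles), profiles[0], coded[0])
```

The coded rows become the Independent Variables of the regression, with the customer's ranking or rating of each combination as the Dependent Variable.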
Step 2 – Prepare the different combinations (4 * 3 * 3 = 36)
Step 3 - Substituting the combinations with Codes
Conjoint Analysis - SPSS
1. Click on ANALYZE, then REGRESSION, then LINEAR
2. Enter the Dependent Variable & the Independent Variables
3. Click on ‘Statistics’, select Model Fit & Estimates, then ‘Continue’
4. Click on ‘Options’, select Use Prob. of F & Include Constant in
Equation, then ‘Continue’
5. Click on ‘Ok’
Conjoint Analysis
Step 5 - Get the ‘Part Utility’ & then work out ‘Ranges of Utilities’
(Range of Utilities = Max - Min within each attribute)
Attribute / Level          Part Utility
1 Set Up Time in Minutes
  S3                       5.472
  S6                       4.250
  S9                       -1.083
  S12                      = -(5.472 + 4.250 - 1.083) = -8.639
  Range of Utilities       = 5.472 - (-8.639) = 14.111
2 Delivery Period in Days
  D18                      3.389
  D22                      1.222
  D28                      = -(3.389 + 1.222) = -4.611
  Range of Utilities       = 3.389 - (-4.611) = 8.000
3 Number of Tools
  T4                       -10.361
  T8                       1.556
  T10                      = -(-10.361 + 1.556) = 8.805
  Range of Utilities       = 8.805 - (-10.361) = 19.166
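Step 5 can be sketched as follows. With effects coding, the part utilities of an attribute sum to zero, so the utility of the last level is minus the sum of the estimated ones. The estimated values are the regression coefficients quoted in the table; the names used in the code are assumptions.

```python
# Part utilities estimated by the regression (all levels except the last one).
estimated = {
    "Set Up Time": {"S3": 5.472, "S6": 4.250, "S9": -1.083},
    "Delivery Period": {"D18": 3.389, "D22": 1.222},
    "Number of Tools": {"T4": -10.361, "T8": 1.556},
}
last_level = {"Set Up Time": "S12", "Delivery Period": "D28", "Number of Tools": "T10"}


def complete_part_utilities(attribute):
    """Add the last level, whose utility is minus the sum of the others."""
    utilities = dict(estimated[attribute])
    utilities[last_level[attribute]] = -sum(estimated[attribute].values())
    return utilities


part_utilities = {name: complete_part_utilities(name) for name in estimated}
ranges = {
    name: max(levels.values()) - min(levels.values())
    for name, levels in part_utilities.items()
}
most_important = max(ranges, key=ranges.get)

for name in ranges:
    print(name, {k: round(v, 3) for k, v in part_utilities[name].items()}, round(ranges[name], 3))
print("Most important attribute:", most_important)
```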
Conjoint Analysis
1. ‘Tools’ is the most important attribute for this customer (largest Range of Utilities, 19.166)
2. The highest individual Part Utility is that of T10 (8.805)
3. Set Up Time is the second most important attribute (Range of Utilities 14.111)
Step 6 - Work out ‘Utility’ of any / all combinations
We can pick one attribute level from each attribute and combine
their part utilities to calculate the total utility of the combination.
e.g. S3, D22, T4 = 5.472 + 1.222 + (-10.361) = -3.667
If we want the best combination, we pick the highest part utility
from each attribute and add them.
e.g. in the example that is worked out, the highest utility is for the
combination
S3 + D18 + T10 = 5.472 + 3.389 + 8.805 = 17.666
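Step 6 can be sketched by adding the part utilities for every one of the 36 combinations and picking the largest total. The part utilities are those from Step 5; the function and variable names are assumptions.

```python
from itertools import product

# Part utilities from Step 5 (the last level of each attribute is the derived one).
PART_UTILITIES = {
    "Set Up Time": {"S3": 5.472, "S6": 4.250, "S9": -1.083, "S12": -8.639},
    "Delivery Period": {"D18": 3.389, "D22": 1.222, "D28": -4.611},
    "Number of Tools": {"T4": -10.361, "T8": 1.556, "T10": 8.805},
}


def total_utility(combination):
    """Sum the part utilities of one level per attribute."""
    return sum(
        levels[level] for levels, level in zip(PART_UTILITIES.values(), combination)
    )


combinations = list(product(*PART_UTILITIES.values()))
best = max(combinations, key=total_utility)
worst = min(combinations, key=total_utility)

print(round(total_utility(("S3", "D22", "T4")), 3))
print(best, round(total_utility(best), 3))
print(worst, round(total_utility(worst), 3))
```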
Conjoint Analysis
Perform the Conjoint Analysis using the 'Regression Method' for
the data of a Paint Company.
1. Indicate the Part & Attribute Utility Levels.
2. Also indicate the total Utility for the best combination of attributes.
The three important attributes identified for Paint are:
1. Life - No. of Years the Paint Coat Lasts
2. Price - The Price of One Litre of Paint
3. Colour - Colour Shade of Paint
The levels of the above mentioned attributes are as follows:
1. Life - 3 Years, 4 Years, 5 Years
2. Price - Rs 50 / Lt., Rs 60 / Lt., Rs 70 / Lt.
3. Colour - Green, Blue, Cream
The rankings of all 27 combinations are given on the next slide.
Conjoint Analysis
Step 1 - Coding of the attribute level (‘Effects Coding’)
Life in Years        Var 1  Var 2
3                     1      0
4                     0      1
5                    -1     -1
Price (Rs Per Lt.)   Var 4  Var 5
50                    1      0
60                    0      1
70                   -1     -1
Colour               Var 6  Var 7
Green                 1      0
Blue                  0      1
Cream                -1     -1
Conjoint Analysis
• Highest Part Level Utility = 7, i.e. 5 Years Life
• Highest Range of Utility is for ‘Life’ = 14.11
