
X: UNIVARIATE DATA ANALYSES (e.g. Binomial, Poisson and Normal distributions)
X, Y: BIVARIATE DATA ANALYSES
X1, X2, X3, ..., Xn: MULTIVARIATE DATA ANALYSES

Bivariate and multivariate analyses study ASSOCIATION, INTER-RELATIONSHIP and INTRA-RELATIONSHIP among variables.

BIVARIATE DATA ANALYSES
Association - CORRELATION COEFFICIENT

• Is there an association between market share and size of sales force where hard selling is concerned?

• Are consumers' perceptions of quality related to their perceptions of prices?

• Can the mileage covered by a used car be a good judge of its resale value?

• How are the prices and demands of different commodities related?

• Is there any significant association between performance and confidence?
Correlation Coefficient (r) is a statistic summarising the strength and direction of association between two metric (interval or ratio scaled) variables.

• -1 ≤ r ≤ 1
• r is a pure number: it is free of the units in which X and Y are measured
• r is a symmetric measure of association: r(X, Y) = r(Y, X)
• r measures the strength of LINEAR relationship
• r measures the direction of LINEAR relationship
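The symmetry and unit-free properties can be checked numerically. The sketch below uses a hand-written pearson_r helper (a hypothetical name, not a library function) and made-up height/weight pairs, purely for illustration. Swapping X and Y, or converting centimetres to inches and kilograms to pounds, should leave r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data (illustration only).
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [52, 58, 61, 67, 72]

r = pearson_r(heights_cm, weights_kg)

# Symmetric: swapping the roles of X and Y leaves r unchanged.
r_swapped = pearson_r(weights_kg, heights_cm)

# Pure number: a change of units (positive change of scale) leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_converted = pearson_r(heights_in, weights_lb)

print(f"r = {r:.6f}, swapped = {r_swapped:.6f}, new units = {r_converted:.6f}")
```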

• Calculation of r assumes that the distributions of the two variables have the same shape.
• If this assumption is violated, r is inflated or deflated, i.e. it over- or underestimates the population correlation coefficient.
• r = 0 does NOT IMPLY that the two variables have no relationship.

• r = 0 IMPLIES only that there is NO LINEAR relationship; a non-linear relationship may still exist, e.g. points spread symmetrically around the circle x² + y² = 27.
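Here is a quick numerical check of the circle example. It assumes the points are spread evenly around the whole circle; with points taken from only one arc, r would generally not be 0. pearson_r is a hypothetical helper written out in plain Python.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 12 equally spaced points on the circle x^2 + y^2 = 27 (radius sqrt(27)).
radius = math.sqrt(27)
angles = [2 * math.pi * k / 12 for k in range(12)]
xs = [radius * math.cos(t) for t in angles]
ys = [radius * math.sin(t) for t in angles]

r = pearson_r(xs, ys)
# r is essentially 0, although every point satisfies x^2 + y^2 = 27 exactly.
print(f"r = {r:.6f}")
```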
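The formula for the correlation coefficient between X and Y, based on n pairs of observations (Xi, Yi):

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^{2} \sum_{i=1}^{n} (Y_i - \bar{Y})^{2}}}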

Example:

X Y
4 2
5 5
3 6
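Working the formula through the three-pair example is a useful check. The plain-Python sketch below assumes no statistics library. It follows the formula term by term and gives r ≈ -0.24, a weak negative correlation.

```python
import math

# Data from the example table.
X = [4, 5, 3]
Y = [2, 5, 6]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Numerator: sum of cross-products of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
# Denominator: square root of the product of the two sums of squared deviations.
s_xx = sum((x - x_bar) ** 2 for x in X)
s_yy = sum((y - y_bar) ** 2 for y in Y)

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")  # r = -0.2402
```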
A plot of Y against X is called a Scatter Plot, Scatter diagram or Scattergram.
[Figure: scatter plot of Y against X]

Correlation is 0.033 (based on 20 observations): very low or negligible correlation.
[Figure: scatter plot of Y against X]

Correlation is 0.603 (based on 20 observations)


[Figure: scatter plot of Y against X]

Correlation is 0.893 (based on 20 observations): very high correlation, quite close to 1.0.
[Figure: scatter plot of Y against X]
Correlation is 0.926 (based on 20 observations): approximately 1.0, a near-perfect LINEAR relation between the two variables.
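[Figure: scatter plot of X2 against X1]
Negative correlation: r(X1, X2) = -0.827
Both fitted lines slope downward (b, d > 0):
X1 = a - bX2
X2 = c - dX1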
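When r is negative, both fitted lines slope downward, which is why they are written with minus signs. The minimal sketch below assumes least-squares fits and reuses the three-pair example table as hypothetical X1, X2 data. It checks that both slopes share the sign of r and that their product equals r², a standard property of the two regression lines. fit_line and pearson_r are hypothetical helper names.

```python
import math

def fit_line(u, v):
    """Least-squares line v = intercept + slope * u; returns (intercept, slope)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    s_uu = sum((p - mu) ** 2 for p in u)
    slope = s_uv / s_uu
    return mv - slope * mu, slope

def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    s_uu = sum((p - mu) ** 2 for p in u)
    s_vv = sum((q - mv) ** 2 for q in v)
    return s_uv / math.sqrt(s_uu * s_vv)

# Hypothetical X1, X2 data (the three-pair example table reused).
X1 = [4, 5, 3]
X2 = [2, 5, 6]

r = pearson_r(X1, X2)
a, slope_12 = fit_line(X2, X1)  # X1 = a + slope_12 * X2
c, slope_21 = fit_line(X1, X2)  # X2 = c + slope_21 * X1
b, d = -slope_12, -slope_21     # rewrite as X1 = a - b*X2 and X2 = c - d*X1

print(f"r = {r:.4f}")
print(f"X1 = {a:.3f} - {b:.4f} X2")
print(f"X2 = {c:.3f} - {d:.4f} X1")
print(f"b * d = {b * d:.4f}, r^2 = {r * r:.4f}")
```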
Correlation matrix of X1, ..., X9 (symmetric, so only the upper triangle is shown):

      X1    X2    X3    X4    X5    X6    X7    X8    X9
X1  1.00  0.80  0.70  0.95  0.01  0.20  0.18  0.16  0.03
X2        1.00  0.63  0.75  0.08  0.11  0.13  0.04  0.09
X3              1.00  0.84  0.02  0.12  0.07  0.15  0.05
X4                    1.00  0.01  0.11  0.06  0.02  0.13
X5                          1.00  0.93  0.02  0.05  0.03
X6                                1.00  0.11  0.09  0.02
X7                                      1.00  0.95  0.90
X8                                            1.00  0.93
X9                                                  1.00

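Groups of variables exhibiting high correlation among themselves: {X1, X2, X3, X4}, {X5, X6} and {X7, X8, X9}. Correlations between variables in different groups are all 0.20 or less.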
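Spotting such groups by eye gets hard as the number of variables grows. One simple heuristic links any two variables whose |r| reaches a chosen cut-off and reports the connected groups. This is an assumption, not the method behind the table, and the cut-off of 0.5 used here is arbitrary. Applied to the table, it recovers {X1, X2, X3, X4}, {X5, X6} and {X7, X8, X9}.

```python
# Upper triangle of the correlation matrix, row by row (X1..X9).
UPPER = [
    [1.00, 0.80, 0.70, 0.95, 0.01, 0.20, 0.18, 0.16, 0.03],
    [1.00, 0.63, 0.75, 0.08, 0.11, 0.13, 0.04, 0.09],
    [1.00, 0.84, 0.02, 0.12, 0.07, 0.15, 0.05],
    [1.00, 0.01, 0.11, 0.06, 0.02, 0.13],
    [1.00, 0.93, 0.02, 0.05, 0.03],
    [1.00, 0.11, 0.09, 0.02],
    [1.00, 0.95, 0.90],
    [1.00, 0.93],
    [1.00],
]

def corr(i, j):
    """Look up r(X_{i+1}, X_{j+1}) from the upper triangle (0-based indices)."""
    if i > j:
        i, j = j, i
    return UPPER[i][j - i]

def correlated_groups(p, threshold=0.5):
    """Connected components of the graph linking variables with |r| >= threshold."""
    seen = set()
    groups = []
    for start in range(p):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first search from `start`
            i = stack.pop()
            group.append(i + 1)  # report 1-based variable numbers
            for j in range(p):
                if j not in seen and abs(corr(i, j)) >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

groups = correlated_groups(9)
print(groups)  # [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
```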
The King Kong Effect is the influence that extreme observations can exert on the linear correlation between two variables.

[Figure: scatter plot containing an extreme observation, annotated with r = 0.827 and r = 0.027]
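The effect is easy to reproduce with a few hypothetical points. Four points at the corners of a unit square have r = 0, and adding a single far-away point at (10, 10) drives r close to 1. The data and the pearson_r helper are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: four points at the corners of a unit square.
xs = [0, 1, 0, 1]
ys = [0, 1, 1, 0]
r_without = pearson_r(xs, ys)

# Add one extreme observation far from the rest.
r_with = pearson_r(xs + [10], ys + [10])

print(f"without the extreme point: r = {r_without:.3f}")
print(f"with the extreme point:    r = {r_with:.3f}")
```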
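MULTIVARIATE DATA ANALYSES
PARTIAL CORRELATION COEFFICIENT

[Figure: four inter-related variables A, B, C and D]

rAB.C : correlation between A and B after controlling for C
rAB.CD : correlation between A and B after controlling for C and D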

♦ How strongly are sales related to advertising expenditures when the effect of price is controlled?

♦ Is there an association between market share and size of the sales force after adjusting for the effect of sales promotions?

♦ Are consumers' perceptions of quality related to their perceptions of prices when the effect of brand image is controlled?

♦ Temperature affects both rainfall and crop yield; how does one find the true relation between rainfall and crop yield?

The Partial Correlation Coefficient, rxy.z, is a measure of association between two variables after controlling or adjusting for the effects of one or more additional variables. (Its square is known as the coefficient of partial determination.)
To find the true LINEAR relation between X1 and X2 we need to adjust for the effect of X3. First estimate each variable linearly from X3 (by least squares):

Est X1 = a + bX3        Est X2 = p + qX3

Then r12.3 is the correlation coefficient between (X1 - Est X1) and (X2 - Est X2). Each (Xi - Est Xi), i = 1, 2, is what remains of Xi after deleting the part of Xi which is explained by X3.

r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}} \qquad (1)
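The residual definition and equation (1) are two routes to the same number. The sketch below computes r12.3 both ways using small hypothetical data and hand-written helpers (residuals, partial_r_formula and pearson_r are illustrative names). Assuming Est X1 and Est X2 are least-squares fits, the two results agree up to rounding error. With this particular data, r12 is positive while r12.3 is strongly negative.

```python
import math

def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

def residuals(y, x):
    """y minus its least-squares linear estimate from x (Est y = a + b*x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def partial_r_formula(r12, r13, r23):
    """Equation (1): first-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical data, for illustration only.
X1 = [2, 1, 4, 3, 6, 5]
X2 = [1, 3, 2, 5, 4, 6]
X3 = [1, 2, 3, 4, 5, 6]

# Route 1: correlate what is left of X1 and X2 after removing the linear effect of X3.
r_resid = pearson_r(residuals(X1, X3), residuals(X2, X3))

# Route 2: plug the three simple correlations into equation (1).
r_formula = partial_r_formula(pearson_r(X1, X2), pearson_r(X1, X3), pearson_r(X2, X3))

print(f"r12 = {pearson_r(X1, X2):.4f}")
print(f"residual route: {r_resid:.6f}, formula (1): {r_formula:.6f}")
```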

• Partial Correlation Coefficient is helpful in detecting spurious correlation.

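• The order of a Partial Correlation Coefficient is the number of variables whose effects are controlled: r12.3 is of first order and r12.34 of second order, while the simple correlation r12 may be regarded as of zero order.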
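• The (n+1)th order Partial Correlation Coefficient is obtained by replacing the simple Correlation Coefficients in equation (1) by the nth order Partial Correlation Coefficients. For example:

r_{12.34} = \frac{r_{12.3} - r_{14.3}\, r_{24.3}}{\sqrt{(1 - r_{14.3}^{2})(1 - r_{24.3}^{2})}}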
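The recursion can be written directly. In this sketch, partial_corr is a hypothetical helper and R is a made-up 4 x 4 correlation matrix. Each call peels off the last controlled variable and applies equation (1) to the lower-order coefficients. Controlling for X3 then X4, or X4 then X3, gives the same r12.34, as it should.

```python
import math

def partial_corr(R, i, j, controls):
    """Partial correlation of variables i and j given `controls`, from correlation matrix R.

    Applies equation (1) recursively: the (n+1)th order coefficient is built
    from nth order coefficients that control for all but the last variable.
    Indices are 0-based.
    """
    if not controls:
        return R[i][j]  # zero order: the simple correlation
    *rest, k = controls
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical correlation matrix for X1..X4 (illustration only).
R = [
    [1.00, 0.30, 0.20, 0.10],
    [0.30, 1.00, 0.25, 0.20],
    [0.20, 0.25, 1.00, 0.30],
    [0.10, 0.20, 0.30, 1.00],
]

r12_34 = partial_corr(R, 0, 1, [2, 3])  # control X3, then X4
r12_43 = partial_corr(R, 0, 1, [3, 2])  # control X4, then X3
print(f"r12.34 = {r12_34:.4f} (other elimination order: {r12_43:.4f})")
```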

• r12.3 and r12 need not have the same sign (see Ex 1 below).

• If r13 = 0 and r23 = 0, does it follow that r12 = 0 also?
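The answer is no. When r13 = r23 = 0, equation (1) reduces to r12.3 = r12, and r12 can take any value. A tiny hypothetical data set illustrates this, with r13 = r23 = 0 but r12 = 0.6.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r12, r13, r23):
    """Equation (1): first-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical data chosen so that X3 is uncorrelated with both X1 and X2.
X1 = [1, 2, 3, 4]
X2 = [2, 1, 4, 3]
X3 = [1, -1, -1, 1]

r12, r13, r23 = pearson_r(X1, X2), pearson_r(X1, X3), pearson_r(X2, X3)
r12_3 = partial_r(r12, r13, r23)
print(f"r13 = {r13:.3f}, r23 = {r23:.3f}, r12 = {r12:.3f}, r12.3 = {r12_3:.3f}")
```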

Ex 1) In the software profession, does performance improve with age?

X1 = performance, X2 = age, X3 = professional experience
r12 = 0.61, r13 = 0.82, r23 = 0.76

=> r12.3 = -0.0355 => r12 = 0.61 is not the true picture.

If the effect of professional experience is controlled, performance no longer improves with age; the partial correlation is close to zero and even slightly negative.
This is because professional experience is highly correlated with both performance (r13 = 0.82) and age (r23 = 0.76).

Ex 2) X1 = sales, X2 = advertising expenditure, X3 = size of sales force

r12 = 0.9361, r23 = 0.5495, r13 = 0.7334

=> r12.3 = 0.9386, almost the same as r12 = 0.9361
=> Sales and advertising expenditure are genuinely related; this relation is NOT due to the effect of the size of the sales force on each of them.

Ex 3) X1 ≡ Consumption of basic amenities of life, X2 ≡ Income, X3 ≡ Household size

r12 = 0.48, r23 = 0.54, r13 = 0.76

=> r12.3 ≈ 0.13
=> The correlation between income and consumption is largely spurious: most of it comes from the association of both with household size.
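Plugging the reported simple correlations into equation (1) reproduces the partial correlations quoted in the three examples, to within rounding. partial_r is a hypothetical helper name; only the r values come from the examples.

```python
import math

def partial_r(r12, r13, r23):
    """Equation (1): first-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# (r12, r13, r23) as reported in Ex 1 to Ex 3.
examples = {
    "Ex 1 (performance, age | experience)": (0.61, 0.82, 0.76),
    "Ex 2 (sales, advertising | sales force)": (0.9361, 0.7334, 0.5495),
    "Ex 3 (consumption, income | household size)": (0.48, 0.76, 0.54),
}

results = {}
for name, (r12, r13, r23) in examples.items():
    results[name] = partial_r(r12, r13, r23)
    print(f"{name}: r12 = {r12:.4f}, r12.3 = {results[name]:.4f}")
```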