UNIVARIATE DATA ANALYSES
One variable: X
Binomial Distribution, Poisson Distribution, Normal Distribution

BIVARIATE DATA ANALYSES
Two variables: X, Y

ASSOCIATION, INTERRELATIONSHIP, INTRARELATIONSHIP

MULTIVARIATE DATA ANALYSES
Several variables: X1, X2, X3, ..., Xn

BIVARIATE DATA ANALYSES
Association - CORRELATION COEFFICIENT
• Is there an association between market share and size of the sales force where hard selling is concerned?
• Are consumers' perceptions of quality related to their perceptions of prices?
• Can the mileage covered be a good predictor of the resale value of a used car?
• How are the prices and demands of different commodities related?
• Is there any significant association between performance and confidence?

Correlation Coefficient (r) is a statistic summarising the strength and direction of the association between two metric (interval- or ratio-scaled) variables.
• -1 ≤ r ≤ 1
• r is a pure number, independent of the units of measurement
• r is a symmetric measure of association: r(X, Y) = r(Y, X)
• r measures the strength of the LINEAR relationship
• r measures the direction of the LINEAR relationship
• Calculation of r assumes that the distributions of the two variables have the same shape.
• If this assumption is violated, r is inflated or deflated and over- or underestimates the population correlation coefficient.

• r = 0 does NOT IMPLY that the two variables have no relationship.
• r = 0 IMPLIES only that there is NO LINEAR relationship; a non-linear relationship may still exist, e.g. $x^2 + y^2 = 27$.
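The circle example can be checked numerically. The sketch below is only an illustration: it assumes 12 points spaced evenly around the circle x² + y² = 27 (the spacing and the helper name `pearson_r` are choices made here, not part of the slides) and computes r with the usual product-moment formula.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumption: 12 points spaced evenly around the circle x^2 + y^2 = 27.
radius = math.sqrt(27)
angles = [2 * math.pi * k / 12 for k in range(12)]
xs = [radius * math.cos(t) for t in angles]
ys = [radius * math.sin(t) for t in angles]

# Every point satisfies the (non-linear) relation exactly ...
on_circle = all(math.isclose(x * x + y * y, 27) for x, y in zip(xs, ys))
# ... yet the linear correlation is zero up to floating-point error.
r_circle = pearson_r(xs, ys)
print(f"all points on the circle: {on_circle}, |r| = {abs(r_circle):.6f}")
```

X and Y are perfectly related here, yet r is zero, because r only picks up the linear part of a relationship.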

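The formula for the correlation coefficient:

$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$

Example:
X   4   5   3
Y   2   5   6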
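As a worked check of the example data, the following sketch applies the standard product-moment formula to X = (4, 5, 3) and Y = (2, 5, 6). The function name `pearson_r` is a label chosen here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

X = [4, 5, 3]
Y = [2, 5, 6]
r_example = pearson_r(X, Y)
print(f"r = {r_example:.3f}")  # r = -0.240
```

By hand: the deviations of X are (0, 1, -1), the sum of cross-products is -1, the sums of squares are 2 and 26/3, so r = -1 / sqrt(52/3) ≈ -0.24, a weak negative linear association.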

[Figures: four scatter plots (also called scatter diagrams or scattergrams) of Y against X, each based on 20 observations]
• Correlation is 0.033: very low or negligible correlation
• Correlation is 0.603
• Correlation is 0.893: very high correlation, quite close to 1.0
• Correlation is 0.926: approximately 1.0, a near-perfect LINEAR relation between the two variables

[Figure: scatter plot of X2 against X1]
r(X1, X2) = -0.827
X1 = a - bX2
X2 = c - dX1

Correlation matrix (lower triangle):

      X1    X2    X3    X4    X5    X6    X7    X8    X9
X1  1.00
X2  0.80  1.00
X3  0.70  0.63  1.00
X4  0.95  0.75  0.84  1.00
X5  0.01  0.08  0.02  0.01  1.00
X6  0.20  0.11  0.12  0.11  0.93  1.00
X7  0.18  0.13  0.07  0.06  0.02  0.11  1.00
X8  0.16  0.04  0.15  0.02  0.05  0.09  0.95  1.00
X9  0.03  0.09  0.05  0.13  0.03  0.02  0.90  0.93  1.00

Groups of variables exhibiting high correlation among themselves
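One way to pick out such groups mechanically is sketched below. It rests on two assumptions that are not in the slides: the X1 to X4 entries of the matrix are read column-wise from the extracted text, and 0.6 is used as an illustrative cut-off for "high" correlation. The function name `correlated_groups` is hypothetical.

```python
# Lower triangle of the 9 x 9 correlation matrix (rows X1..X9).
# Assumption: the X1..X4 entries were read column-wise from the slide text.
LOWER = [
    [1.00],
    [0.80, 1.00],
    [0.70, 0.63, 1.00],
    [0.95, 0.75, 0.84, 1.00],
    [0.01, 0.08, 0.02, 0.01, 1.00],
    [0.20, 0.11, 0.12, 0.11, 0.93, 1.00],
    [0.18, 0.13, 0.07, 0.06, 0.02, 0.11, 1.00],
    [0.16, 0.04, 0.15, 0.02, 0.05, 0.09, 0.95, 1.00],
    [0.03, 0.09, 0.05, 0.13, 0.03, 0.02, 0.90, 0.93, 1.00],
]
NAMES = [f"X{i + 1}" for i in range(len(LOWER))]

def correlated_groups(lower, names, threshold):
    """Group variables linked, directly or through a chain, by |r| >= threshold."""
    n = len(names)
    group_of = list(range(n))  # each variable starts in its own group
    for i in range(n):
        for j in range(i):
            if abs(lower[i][j]) >= threshold and group_of[i] != group_of[j]:
                # Merge the group containing i into the group containing j.
                old, new = group_of[i], group_of[j]
                group_of = [new if g == old else g for g in group_of]
    groups = {}
    for idx, g in enumerate(group_of):
        groups.setdefault(g, []).append(names[idx])
    return list(groups.values())

# Assumption: 0.6 is an illustrative cut-off, not a value from the slides.
groups = correlated_groups(LOWER, NAMES, threshold=0.6)
print(groups)
# [['X1', 'X2', 'X3', 'X4'], ['X5', 'X6'], ['X7', 'X8', 'X9']]
```

With this matrix the result is not sensitive to the exact cut-off: the within-group correlations are all at least 0.63, while the largest between-group correlation is 0.20.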

The King Kong Effect is the influence that extreme observations can exert on the linear correlation between two variables.
[Figure: scatter plot of Y against X illustrating the effect, annotated with r = 0.827 and r = 0.027]
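The effect is easy to reproduce with a small made-up data set. The numbers below are hypothetical and are not the data behind the slide's figure: five points with no linear association, plus one extreme observation.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five observations with no linear association.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]
r_without = pearson_r(x, y)

# The same data plus a single extreme observation at (20, 20).
r_with = pearson_r(x + [20], y + [20])

print(f"without the extreme point: |r| = {abs(r_without):.3f}")
print(f"with the extreme point:     r  = {r_with:.3f}")
```

A single point far from the rest of the cloud moves r from essentially zero to above 0.95, which is why a scatter plot should be inspected before a correlation is interpreted.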

MULTIVARIATE DATA ANALYSES

PARTIAL CORRELATION COEFFICIENT

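[Figure: diagram of four variables A, B, C and D]
rAB.C: correlation between A and B, controlling for C
rAB.CD: correlation between A and B, controlling for C and D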

♦ How strongly are sales related to advertising expenditures when the effect of price is controlled?
♦ Is there an association between market share and size of the sales force after adjusting for the effect of sales promotions?
♦ Are consumers' perceptions of quality related to their perceptions of prices when the effect of brand image is controlled?
♦ Temperature affects both rainfall and yield of crop; how does one find the true relation between rainfall and yield of crop?

Partial Correlation Coefficient, rxy.z, is a measure of association between two variables after controlling or adjusting for the effects of one or more additional variables. (Its square is known as the coefficient of partial determination.)

To find the true LINEAR relation between X1 and X2, we need to adjust for the effect of X3:
Est X1 = a + bX3
Est X2 = p + qX3

r12.3 is the correlation coefficient between (X1 - Est X1) and (X2 - Est X2), where (Xi - Est Xi) is Xi with the part explained by X3 removed, i = 1, 2.

$$ r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (1) $$

• Partial Correlation Coefficient is helpful in detecting spurious correlation.
• Order of Partial Correlation Coefficient is the number of …………..

• The (n+1)th-order Partial Correlation Coefficient is obtained by replacing the simple Correlation Coefficients in equation (1) by the nth-order Partial Correlation Coefficients.
• r12.3 and r12 need not have the same sign.
• If r13 = 0 and r23 = 0, does it mean that r12 = 0 as well? No: equation (1) then reduces to r12.3 = r12, which can take any value.
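The points above can be checked with a short sketch. It assumes hypothetical observations for four variables (the numbers are invented for illustration) and uses helper names (`pearson_r`, `residuals`, `partial_corr`) that are not from the slides. It compares the residual route with equation (1), builds a second-order coefficient by the replacement rule, and looks at the case r13 = r23 = 0.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals ys - Est ys from the least-squares line Est ys = a + b*xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def partial_corr(R, i, j, controls):
    """Partial correlation of variables i and j given `controls`, built from the
    correlation matrix R by applying equation (1) one control variable at a time."""
    if not controls:
        return R[i][j]
    k, rest = controls[-1], controls[:-1]
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical observations of X1, X2, X3, X4 (indices 0..3).
data = [
    [3, 1, 4, 6, 5, 9, 7, 10],
    [2, 4, 3, 7, 6, 5, 9, 8],
    [1, 2, 3, 4, 5, 6, 7, 8],
    [5, 3, 6, 2, 8, 4, 9, 7],
]
R = [[pearson_r(a, b) for b in data] for a in data]

# First order: correlation of residuals versus equation (1).
r12_3_resid = pearson_r(residuals(data[0], data[2]), residuals(data[1], data[2]))
r12_3_formula = partial_corr(R, 0, 1, [2])

# Second order: the order in which the controls are applied should not matter.
r12_34 = partial_corr(R, 0, 1, [2, 3])
r12_43 = partial_corr(R, 0, 1, [3, 2])

# If r13 = r23 = 0, controlling for X3 leaves r12 unchanged.
U = [[1.0, 0.7, 0.0], [0.7, 1.0, 0.0], [0.0, 0.0, 1.0]]
r12_uncorrelated_control = partial_corr(U, 0, 1, [2])

print(f"r12.3 via residuals: {r12_3_resid:.6f}, via equation (1): {r12_3_formula:.6f}")
print(f"r12.34 = {r12_34:.6f}, r12.43 = {r12_43:.6f}")
print(f"r12.3 when r13 = r23 = 0 and r12 = 0.7: {r12_uncorrelated_control}")
```

The two first-order values agree up to rounding error, which is the sense in which equation (1) "removes the part explained by X3" from both variables.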

Ex. 1) In the software profession, does performance improve with age?
X1 = performance, X2 = age, X3 = professional experience
r12 = 0.61, r13 = 0.82, r23 = 0.76 => r12.3 ≈ -0.035
=> r12 = 0.61 is not the true picture.

If the effect of professional experience is controlled, performance no longer improves with age; if anything, it diminishes slightly. This is because 'professional experience' is highly correlated with both performance (r13 = 0.82) and age (r23 = 0.76).

Ex. 2) X1 = sales, X2 = advertising expenditure, X3 = size of sales force
r12 = 0.9361, r23 = 0.5495, r13 = 0.7334 => r12.3 = 0.9386, compared with r12 = 0.9361
=> Sales and advertising expenditure are related, and this relation is NOT due to the effect of the size of the sales force on each of them.

Ex. 3) X1 ≡ consumption of basic amenities of life, X2 ≡ income, X3 ≡ household size
r12 = 0.48, r23 = 0.54, r13 = 0.76 => r12.3 ≈ 0.13
=> The correlation between income and consumption is spurious.
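The three examples can be recomputed from the first-order formula for r12.3. The sketch below uses only the rounded correlations quoted in the examples, so it should be read as a check up to rounding; the function name `partial_r` is chosen here for illustration.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 from the three simple correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# (r12, r13, r23) as quoted in the three examples.
examples = {
    "Ex. 1": (0.61, 0.82, 0.76),
    "Ex. 2": (0.9361, 0.7334, 0.5495),
    "Ex. 3": (0.48, 0.76, 0.54),
}

results = {name: partial_r(*rs) for name, rs in examples.items()}
for name, value in results.items():
    r12 = examples[name][0]
    print(f"{name}: r12 = {r12:.4f}, r12.3 = {value:.4f}")
```

With these inputs the formula gives roughly -0.035, 0.939 and 0.127. In Ex. 1 and Ex. 3 the association largely disappears once X3 is controlled, whereas in Ex. 2 it is essentially unchanged.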
