Prof. G.R.C.Nair

Correlation Analysis

• Correlation analysis is a statistical technique used to measure the strength of the association between two variables.
• It is useful for predicting future business scenarios.

Scatter Diagram

• The dependent variable is the variable being predicted or estimated.
• The independent variable provides the basis for estimation; it is the estimator.
• A scatter diagram is a chart that portrays the relationship between the two variables.

This scatter plot locates pairs of observations, with advertising expenditures on the x-axis and sales on the y-axis. We notice that larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.

[Figure: Scatter plot of advertising expenditures (X) and sales (Y)]

Direct Linear

• The scatter of points tends to be distributed around a positively sloped straight line.
• The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
• The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
• The line represents the nature of the relationship on average.

Inverse Linear

[Figure: scatter plot of Y against X]

Direct Nonlinear

[Figure: scatter plot of Y against X]

• No association / no correlation
• Correlated?

[Figure: scatter plot of Y against X]

Perfect Negative Correlation

[Figure: scatter plot of Y against X, both axes from 0 to 10]

Perfect Positive Correlation

[Figure: scatter plot of Y against X, both axes from 0 to 10]

Zero Correlation

[Figure: scatter plot of Y against X, both axes from 0 to 10]

Strong Positive Correlation

[Figure: scatter plot of Y against X, both axes from 0 to 10]

Nature of Correlation

Correlation can be:
• Positive or negative
• Linear or nonlinear
• Perfect, strong, or weak

Coefficient of Correlation, r

Karl Pearson’s Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables.

• It requires interval or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.

Formula for r

We calculate the coefficient of correlation from the following formulae:

$$r = \frac{\mathrm{Cov}(X,Y)}{s_x s_y}, \qquad \mathrm{Cov}(X,Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1}$$

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \, \sum (Y-\bar{Y})^2}}$$
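The two forms of the formula can be checked against each other with a short sketch. The function names and the five data points below are illustrative assumptions, not values from the slides.

```python
import math

def pearson_r(xs, ys):
    """r from the deviation form: sum of cross-products over the root of the sums of squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_via_cov(xs, ys):
    """r as Cov(X, Y) / (s_x * s_y), using n - 1 in every denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Made-up illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4), round(pearson_r_via_cov(xs, ys), 4))
```

Both functions return the same value because the (n - 1) factors cancel between the covariance and the two standard deviations.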

Coefficient of Determination

The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for (not necessarily caused) by the variation in the independent variable (X).

• It is the square of the coefficient of correlation.
• It ranges from 0 to 1.
• It does not give any information on the direction of the relationship between the variables.
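As a quick illustration with made-up values (not taken from the slides), squaring r shows why the direction is lost:

$$r = 0.8 \;\Rightarrow\; r^2 = 0.64, \qquad r = -0.8 \;\Rightarrow\; r^2 = 0.64$$

In both cases about 64% of the variation in Y is accounted for by X, although one relationship is direct and the other inverse.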

Rank Correlation

• Charles Spearman's rank correlation coefficient (R) is used to measure the degree of correlation between two qualitative variables, such as honesty, beauty, talent for singing, or gift for dancing, which cannot be measured directly. In this case the items are ranked serially, and the correlation between the ranks is calculated as

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

where D is the difference between the two ranks of the same sample item and N is the number of items ranked.

Regression

• In regression analysis we use the independent variable (x) to estimate the dependent variable (y).
• When the relationship between the variables is linear, it is called linear regression.
• Both variables must be at least interval scale.
• The least squares criterion is used to determine the equation.

Least Squares Regression

The linear regression equation is y' = a + bx, where:
• y' is the average predicted value of the dependent variable for any value of x.
• a is the Y-intercept, the estimated value of y when x = 0.
• b, the regression coefficient, is the slope of the line, or the average change in y for each one-unit change in x.

Regression Equation

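• The least squares principle is used to obtain a and b from the normal equations:

$$\sum Y = na + b\sum X$$

$$\sum XY = a\sum X + b\sum X^2$$

or, equivalently,

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$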
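A minimal sketch of these formulas follows. The function name and the four data points are illustrative assumptions; the points lie exactly on y = 1 + 2x so the result is easy to verify by eye.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y' = a + b*x using the closed-form least squares solution."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * sum_x / n
    return a, b

# Made-up points on the line y = 1 + 2x (an assumption, not from the slides).
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = least_squares_line(xs, ys)
print(a, b)

# The fitted a and b should satisfy both normal equations.
n = len(xs)
residual_1 = sum(ys) - (n * a + b * sum(xs))
residual_2 = sum(x * y for x, y in zip(xs, ys)) - (a * sum(xs) + b * sum(x * x for x in xs))
```

Substituting a and b back into the normal equations, as the last two lines do, is a cheap way to catch arithmetic slips in hand calculations.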

Example 1

• Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem he selects a sample of eight textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.

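| Book | Pages (X) | Price ($) (Y) | X − X̄ | Y − Ȳ |
|---|---|---|---|---|
| Intro to History | 500 | 84 | −112.5 | 4.5 |
| Basic Algebra | 700 | 75 | 87.5 | −4.5 |
| Intro to Psychology | 800 | 99 | 187.5 | 19.5 |
| Intro to Sociology | 600 | 72 | −12.5 | −7.5 |
| Bus. Management | 400 | 69 | −212.5 | −10.5 |
| Intro to Biology | 500 | 81 | −112.5 | 1.5 |
| Fund. of Jazz | 600 | 63 | −12.5 | −16.5 |
| Principles of Nursing | 800 | 93 | 187.5 | 13.5 |
| Σ | 4,900 | 636 | 0 | 0 |

Answer: r = 0.614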
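The answer can be reproduced with a short sketch. It uses the sums-based computational form of r, which is algebraically equivalent to the deviation formula but is not written out in the slides; the function name is an assumption.

```python
import math

# Pages (X) and prices (Y) of the eight textbooks in Example 1.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def pearson_r_from_sums(xs, ys):
    """r computed from raw sums rather than deviations about the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_from_sums(pages, prices)
print(round(r, 3))  # 0.614
```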

[Figure: Scatter diagram of number of pages (x-axis, 400 to 800) and selling price in $ (y-axis, 60 to 100)]

Example 1 (continued)

Develop a regression equation for the information given in Example 1 that can be used to estimate the selling price based on the number of pages.

$$b = \frac{8(397{,}200) - (4{,}900)(636)}{8(3{,}150{,}000) - (4{,}900)^2} = 0.05143$$

$$a = \frac{636}{8} - 0.05143\left(\frac{4{,}900}{8}\right) = 48.0$$

Example 1 (continued)

The regression equation is: Y' = 48.0 + 0.05143X
• The line crosses the Y-axis at $48. A book with no pages would cost $48.
• The slope of the line is 0.05143. Each additional page costs about 5 cents.

Example 1 (continued)

We can use the regression equation to estimate values of Y.
• Estimate the selling price of an 800-page book.

$$Y' = 48.0 + 0.05143X = 48.0 + 0.05143(800) = 89.14$$
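The fitted line and the 800-page estimate can be reproduced as follows. This sketch uses the deviation form b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², which gives the same slope as the sums formula; the function name is an assumption.

```python
# Pages (X) and prices (Y) of the eight textbooks in Example 1.
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

def fit_line(xs, ys):
    """Return (a, b) for y' = a + b*x using deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(pages, prices)
predicted_price = a + b * 800
print(round(a, 1), round(b, 5), round(predicted_price, 2))  # 48.0 0.05143 89.14
```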

Example 2/HW

• The marks given by two judges to the contestants of a beauty contest are given below. Find the correlation between the tastes of the two judges.

| Contestant | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Judge X | 52 | 53 | 42 | 60 | 45 | 41 | 37 | 38 | 25 | 27 |
| Judge Y | 65 | 68 | 43 | 38 | 77 | 48 | 35 | 30 | 25 | 50 |

• Answer (rank correlation): R = 0.5394
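The stated answer matches the rank correlation formula R = 1 − 6ΣD² / N(N² − 1) applied to the judges' marks. A minimal sketch, with assumed function names, and assuming no tied marks (true for this data; ties would need average ranks, which are not handled here):

```python
def ranks_descending(scores):
    """Rank 1 goes to the highest score. Assumes no tied scores."""
    if len(set(scores)) != len(scores):
        raise ValueError("tied scores need average ranks, which this sketch omits")
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]

def spearman_rank_correlation(xs, ys):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D the rank difference per item."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Marks for contestants A to J from Example 2.
judge_x = [52, 53, 42, 60, 45, 41, 37, 38, 25, 27]
judge_y = [65, 68, 43, 38, 77, 48, 35, 30, 25, 50]

R = spearman_rank_correlation(judge_x, judge_y)
print(round(R, 4))  # 0.5394
```

Here ΣD² = 76, so R = 1 − 456/990.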

Assumptions

• For each value of x, there is a group of y values.
• These y values are normally distributed.
• The means of these normal distributions of y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.

Standard Error

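• The standard error of estimate (S.E) is the standard deviation of the observed Y values about the regression line:

$$S.E = \sqrt{\frac{\sum (Y - y')^2}{n-2}}$$

where y' is the value estimated by the regression equation and Y is the corresponding actual value. Equivalently,

$$S.E = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}}$$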
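The two expressions for S.E give the same value when a and b are the least squares estimates. The sketch below checks this on made-up data (an assumption, not from the slides); the variable names are illustrative.

```python
import math

# Made-up illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Least squares estimates of the intercept a and slope b.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Definition: root of the summed squared residuals over n - 2.
se_from_residuals = math.sqrt(
    sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
)

# Shortcut: uses only the sums of Y^2, Y, and XY.
sum_y2 = sum(y * y for y in ys)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
se_from_sums = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(se_from_residuals, 4), round(se_from_sums, 4))
```

The shortcut is convenient by hand, but it subtracts large, nearly equal numbers, so rounding a and b too early can distort the result.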

Confidence Interval

• The higher the standard error, the lower the reliability of the predicted value of y.
• A confidence interval for y' at a given value of x can be constructed as y' ± z·S.E, or as y' ± t·S.E with n − 2 d.f.

Significance testing

• If the sample regression coefficient b is to be used for the whole population, its significance may be tested.
• The standard error of b is $S_b = \dfrac{S.E}{\sqrt{\sum x^2 - n\bar{x}^2}}$
• For the t test, $t = (b - B)/S_b$ with n − 2 d.f., where B is the population regression coefficient.
• H0: B = 0, i.e., no linear relationship in the population. H1: B ≠ 0, or B > 0, or B < 0.
• A confidence interval for B can also be constructed as b ± t·S_b.

Example 3

• Estimate the relationship between sales in Rs lakh and ad expense in Rs lakh. Find the 95% confidence interval for the sales when the ad expense is 7 lakhs. Test if the ad has a positive impact on sales at 5% significance.

| Sales (Rs lakh) | Advt (Rs lakh) |
|---|---|
| 3 | 1 |
| 15 | 2 |
| 6 | 3 |
| 20 | 4 |
| 9 | 5 |
| 25 | 6 |

Answer:
• X̄ = 3.5, Ȳ = 13, a = 2.4, b = 3.03, so y' = 2.4 + 3.03X. When X = 7, y' = 23.6. (The intercept 2.4 is the estimated sales without any advertising. For every additional rupee spent on advertising, expect sales to rise by about Rs 3.)
• S.E = √{(ΣY² − aΣY − bΣXY)/(n − 2)} = 7.1. The two-tailed t for 5% at 4 d.f. is 2.776, so the 95% confidence interval is 23.6 ± 2.776 × 7.1, i.e., 3.9 to 43.3.
• H0: B = 0, H1: B > 0
• S_b = S.E / √(Σx² − nx̄²) = 1.7
• t = (b − 0)/S_b = 1.785
• Since this is less than the critical t at 4 d.f. (one tail), 2.132, we cannot conclude that advertising has a positive impact on sales at the 5% significance level (95% confidence level).
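The figures in this answer can be reproduced with the sketch below. The critical t values are hard-coded from the slide rather than computed, and the variable names are assumptions.

```python
import math

# Example 3 data: advertising expense (X) and sales (Y), both in Rs lakh.
advt = [1, 2, 3, 4, 5, 6]
sales = [3, 15, 6, 20, 9, 25]

# Critical values quoted in the slide for 4 degrees of freedom at 5%.
T_TWO_TAIL_4DF = 2.776
T_ONE_TAIL_4DF = 2.132

n = len(advt)
mean_x = sum(advt) / n
mean_y = sum(sales) / n
sxx = sum(x * x for x in advt) - n * mean_x ** 2
sxy = sum(x * y for x, y in zip(advt, sales)) - n * mean_x * mean_y

b = sxy / sxx               # slope
a = mean_y - b * mean_x     # intercept

# Standard error of estimate via the shortcut formula.
sum_y = sum(sales)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(advt, sales))
se = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Interval for sales at X = 7, in the simple form y' +/- t * S.E used in the slides.
predicted = a + b * 7
lower = predicted - T_TWO_TAIL_4DF * se
upper = predicted + T_TWO_TAIL_4DF * se

# One-tailed test of H0: B = 0 against H1: B > 0.
sb = se / math.sqrt(sxx)
t_stat = (b - 0) / sb
positive_impact_shown = t_stat > T_ONE_TAIL_4DF

print(round(a, 1), round(b, 2), round(predicted, 1))   # 2.4 3.03 23.6
print(round(se, 1), round(lower, 1), round(upper, 1))  # 7.1 3.9 43.3
print(round(sb, 1), round(t_stat, 3), positive_impact_shown)  # 1.7 1.785 False
```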

Multiple Regression

• A variable may depend on more than one independent variable.
• E.g., the yield of grain depends on rain, fertilizer used, etc.
• Y' = a + b1X1 + b2X2 (a three-dimensional graph)
• Or, even,
• Y' = a + b1X1 + b2X2 + b3X3 + b4X4 + …
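The slides do not say how the coefficients are obtained. One common approach, sketched here as an assumption, extends the least squares normal equations to several predictors and solves them by Gaussian elimination. The function names and the data are hypothetical; Y is generated exactly from Y = 1 + 2X1 + 3X2 so the fit can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution

def fit_multiple_regression(predictor_rows, ys):
    """predictor_rows holds one [x1, x2, ...] list per observation; returns [a, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in predictor_rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 (not from the slides).
predictor_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictor_rows]

coefficients = fit_multiple_regression(predictor_rows, ys)
print([round(c, 6) for c in coefficients])
```

With one predictor this reduces to the two normal equations for a and b used in simple regression.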

Example 4/ HW

• A professor felt that the hours spent by students on homework and the marks they get are correlated. Test it with the given data.

| Student | Hrs | Mark |
|---|---|---|
| A | 45 | 40 |
| B | 30 | 35 |
| C | 90 | 75 |
| D | 60 | 65 |
| E | 105 | 90 |
| F | 65 | 50 |
| G | 90 | 90 |
| H | 80 | 80 |
| I | 55 | 45 |
| J | 75 | 65 |

• Predict the mark of a student who spends 95 hrs.
• Obtain a 95% confidence interval for the mark.

HW / Assignment

• IIMM Page 521,23, 42,79
• 2009 Terminal, Part B, Q. 1a & 1b
• 2007 Terminal (make-up), Part C, Q. 5a & 5b
• 2007 Terminal, Part C, Q. 6b
