
knowledge skills technology innoventures

www.globsyn.com

Session 6
Quantitative Methods 1

Topics to be covered in this session:

• Linear Regression and Correlation

www.globsyn.edu.in

Objectives of the Session


• To be able to understand the concepts of:
  - Scatter Plots
  - Regression Lines
  - Linear Regression
• To be able to find the regression lines.
• To be able to understand the concepts of:


Objectives of the Session

  - Linear Regression
  - Fitting the Regression Line using the Least Squares Method
  - Correlation
  - Correlation Coefficient
  - Coefficient of Determination
• To be able to compute:
  - Correlation Coefficient
  - Coefficient of Determination


Regression Analysis
• Regression and correlation analysis show us how to determine both the nature and the strength of the relationship between two variables.
• Regression was introduced in 1877 by Francis Galton, whose study showed that the heights of children born to tall parents tend to regress toward the mean height of the population.
• Dependent and independent variables: the known variable is called the independent variable; the variable we are trying to predict is the dependent variable.
• Examples:
  - Sales depending on advertising expenditure
  - GNP depending on final consumption spending


About Regression
• Regression refers to the statistical technique of modeling the relationship between variables.
• In simple linear regression, we model the relationship between two variables.
• One of the variables, denoted by Y, is called the dependent variable and the other, denoted by X, is called the independent variable.
• The model we will use to depict the relationship between X and Y is a straight-line relationship.
• A graphical sketch of the pairs (X, Y) is called a scatter plot.

A Scatter Plot
This scatter plot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:
• Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
• The scatter of points tends to be distributed around a positively sloped straight line.

A Scatter Plot
• The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
• The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
• The scatter of points tends to be distributed around a positively sloped straight line.
• The line represents the nature of the relationship on average.

Example

Student               A     B     C     D     E     F     G     H
Entrance Exam Score   74    69    85    63    82    60    79    91
Cumulative GPA        2.6   2.2   3.4   2.3   3.1   2.1   3.2   3.8


Direct Relationship


Inverse Relationship: Learning Curve


Estimating Equation (Positive Slope)

$$Y = a + bX$$

where a is the Y-intercept and b is the slope:

$$b = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{7 - 5}{2 - 1} = 2$$

The line crosses the Y-axis at a = 3, so the equation is

$$Y = 3 + 2X$$


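Estimating Equation (Negative Slope)

$$Y = a + bX$$

where a is the Y-intercept and b is the slope:

$$b = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{3 - 6}{1 - 0} = -3$$

The line crosses the Y-axis at a = 6, so the equation is

$$Y = 6 - 3X$$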
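The two estimating-equation slides find b from two points on the line and read a off the Y-axis. Below is a minimal Python sketch of the same arithmetic. The helper name `line_through` is hypothetical, and it computes a algebraically (a = Y1 − bX1) instead of reading it from a graph.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = a + bX that passes through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    b = (y2 - y1) / (x2 - x1)   # slope: (Y2 - Y1) / (X2 - X1)
    a = y1 - b * x1             # intercept: the value of Y when X = 0
    return a, b

# Positive slope: points (1, 5) and (2, 7) give Y = 3 + 2X
print(line_through((1, 5), (2, 7)))   # (3.0, 2.0)
# Negative slope: points (0, 6) and (1, 3) give Y = 6 - 3X
print(line_through((0, 6), (1, 3)))   # (6.0, -3.0)
```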


Method of Least Squares


• How can we determine the equation of a straight line drawn through the middle of a set of points?
• A line fits the points well if it minimizes the error between the estimated points on the line and the actual points. The method of least squares chooses the line that minimizes the sum of the squared errors, Σ(Y − Ŷ)².
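To make "minimizes the error" concrete, the sketch below scores two candidate lines by their sum of squared errors on the truck data from this session's example (ages 5, 3, 3, 1; repair expenses 7, 7, 6, 4). The function name `sse` is our own. The second line, Y = 3 + X, is an arbitrary alternative chosen only for comparison.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors of the line Y = a + bX over the data points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

ages = [5, 3, 3, 1]        # X: truck age in years
repairs = [7, 7, 6, 4]     # Y: repair expense in thousands of rupees

print(sse(3.75, 0.75, ages, repairs))  # least-squares line: 1.5
print(sse(3.0, 1.0, ages, repairs))    # arbitrary alternative line: 2.0
```

No other straight line can give a smaller total than the least-squares line does on these data.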



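Least Squares Method: Formulae

$$Y = a + bX$$

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$

where n is the number of data points, and X̄ and Ȳ are the means of X and Y.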


Example

Truck No.   Age X (years)   Repair Expense Y (thousands of rupees)   XY   X²
101         5               7                                        35   25
102         3               7                                        21    9
103         3               6                                        18    9
104         1               4                                         4    1
Total       ΣX = 12         ΣY = 24                                  ΣXY = 78   ΣX² = 44
(n = 4)

Mean value of X: X̄ = 12/4 = 3
Mean value of Y: Ȳ = 24/4 = 6

$$b = \frac{78 - 4 \times 3 \times 6}{44 - 4 \times 3^2} = \frac{6}{8} = 0.75$$

$$a = 6 - 0.75 \times 3 = 3.75$$

$$Y = 3.75 + 0.75X$$

Predicted repair expense for a 7-year-old truck: 3.75 + 0.75 × 7 = 9.0 (thousands of rupees).
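A minimal Python sketch of the least-squares formulas applied to the truck data. The function name `least_squares` is our own, not from any library. It codes b = (ΣXY − nX̄Ȳ)/(ΣX² − nX̄²) and a = Ȳ − bX̄ directly.

```python
def least_squares(xs, ys):
    """Fit Y = a + bX with b = (sum XY - n*xbar*ybar) / (sum X^2 - n*xbar^2), a = ybar - b*xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

ages = [5, 3, 3, 1]       # X: truck age in years
repairs = [7, 7, 6, 4]    # Y: repair expense in thousands of rupees

a, b = least_squares(ages, repairs)
print(a, b)          # 3.75 0.75
print(a + b * 7)     # predicted repair expense for a 7-year-old truck: 9.0
```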


Year    R&D Expenses (X)   Annual Profit (Y)   XY     X²
1983    5                  31                  155    25
1982    11                 40                  440    121
1981    4                  30                  120    16
1980    5                  34                  170    25
1979    3                  25                  75     9
1978    2                  20                  40     4
Total   ΣX = 30            ΣY = 180            ΣXY = 1000   ΣX² = 200

Mean value of X: X̄ = 30/6 = 5
Mean value of Y: Ȳ = 180/6 = 30

$$b = \frac{1000 - 6 \times 5 \times 30}{200 - 6 \times 5^2} = \frac{100}{50} = 2$$

$$a = 30 - 2 \times 5 = 20$$

$$Y = 20 + 2X$$


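Errors in Regression

[Figure: observed data points (Xi, Yi) scattered around the fitted regression line Ŷ = b0 + b1X, with Y on the vertical axis and X on the horizontal axis; the vertical gap between an observed point and the line is marked as the error.]

For each observation, the error is the difference between the observed value Yi and the predicted value Ŷi of Y for Xi:

$$e_i = Y_i - \hat{Y}_i$$

Here b0 and b1 play the roles of a and b in Y = a + bX.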


Standard Error of the Estimate

$$s_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

where
Y = values of the dependent variable
Ŷ = estimated values from the estimating equation that correspond to each Y value
n = number of data points

The standard error of the estimate (se) measures the variability, or scatter, of the observed values around the regression line. The larger the standard error of the estimate, the greater the scattering of points around the regression line.


Standard Error of the Estimate (Cont'd)

X   Y   Ŷ = 3.75 + 0.75X          Individual Error (Y − Ŷ)   (Y − Ŷ)²
5   7   3.75 + (0.75)(5) = 7.5    7 − 7.5 = −0.5             0.25
3   7   3.75 + (0.75)(3) = 6.0    7 − 6.0 = 1.0              1.00
3   6   3.75 + (0.75)(3) = 6.0    6 − 6.0 = 0                0
1   4   3.75 + (0.75)(1) = 4.5    4 − 4.5 = −0.5             0.25
                                  Σ(Y − Ŷ)²                  1.50

$$s_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{1.50}{2}} = 0.866$$
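The same computation as a short Python sketch. The function name `standard_error` is our own. The divisor is n − 2 because two parameters (a and b) are estimated from the data.

```python
import math

def standard_error(a, b, xs, ys):
    """Standard error of the estimate: sqrt(sum((Y - Y_hat)^2) / (n - 2))."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared errors
    return math.sqrt(sse / (n - 2))

ages = [5, 3, 3, 1]        # X: truck age in years
repairs = [7, 7, 6, 4]     # Y: repair expense in thousands of rupees
print(round(standard_error(3.75, 0.75, ages, repairs), 3))  # 0.866
```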


Correlation Analysis

Correlation analysis is the statistical tool that describes the degree to which one variable is linearly related to another. It measures the strength of the association between two variables. Two measures are used:

a) Coefficient of Determination
b) Coefficient of Correlation

The sample coefficient of determination is developed from the relationship between two kinds of variation: the variation of the Y values in a data set around
1. the fitted regression line, and
2. their own mean.


Correlation Analysis

$$r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$


Interpretation of r²

$$r^2 = 1 - \frac{0}{672} = 1$$
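A minimal sketch of r² = 1 − Σ(Y − Ŷ)²/Σ(Y − Ȳ)². The perfect-correlation data used here are an assumption: X = 1 to 8 with Y = 4X, which reproduces the total variation of 672 on this slide. Every point lies on the fitted line, so the unexplained variation is 0. The no-correlation case uses Y values of 6 and 12 with Ŷ = Ȳ = 9, as read from the squared deviations (6 − 9)² and (12 − 9)² on the no-correlation slide. The function name `r_squared` is our own.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - (unexplained variation) / (total variation)."""
    y_bar = sum(y) / len(y)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of (Y - Y_hat)^2
    total = sum((yi - y_bar) ** 2 for yi in y)                       # sum of (Y - Y_bar)^2
    return 1 - unexplained / total

xs = list(range(1, 9))           # assumed X values 1..8
ys = [4 * x for x in xs]         # assumed Y = 4X (perfectly linear)
fitted = [4 * x for x in xs]     # the fitted line passes through every point

total_variation = sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
print(total_variation)           # 672.0
print(r_squared(ys, fitted))     # 1.0

no_corr = [6, 12, 6, 12, 6, 12, 6, 12]   # Y values implied by the no-correlation table
print(r_squared(no_corr, [9] * 8))       # 0.0
```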


PERFECT CORRELATION BETWEEN X AND Y


Interpretation of r²

Point   Y    (Y − Ŷ)²           (Y − Ȳ)²
1st     6    (6 − 9)² = 9       (6 − 9)² = 9
2nd     12   (12 − 9)² = 9      (12 − 9)² = 9
3rd     6    (6 − 9)² = 9       (6 − 9)² = 9
4th     12   (12 − 9)² = 9      (12 − 9)² = 9
5th     6    (6 − 9)² = 9       (6 − 9)² = 9
6th     12   (12 − 9)² = 9      (12 − 9)² = 9
7th     6    (6 − 9)² = 9       (6 − 9)² = 9
8th     12   (12 − 9)² = 9      (12 − 9)² = 9
Total        Σ(Y − Ŷ)² = 72     Σ(Y − Ȳ)² = 72

$$r^2 = 1 - \frac{72}{72} = 0$$


NO CORRELATION


Coefficient of Correlation

The coefficient of correlation is the square root of the sample coefficient of determination. It measures the degree of association between two variables:

$$r = \pm\sqrt{r^2}$$

When the slope of the estimating equation is positive, r is the positive square root; when the slope is negative, r is the negative square root. The values of r lie between −1 and +1.
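The sign rule can be sketched in Python with `math.copysign`, which gives √r² the sign of the slope b. The function name `correlation_from_fit` is our own. It assumes that a and b are the least-squares estimates, because 1 − Σ(Y − Ŷ)²/Σ(Y − Ȳ)² equals r² only for the fitted line.

```python
import math

def correlation_from_fit(xs, ys, a, b):
    """r = +/- sqrt(r^2), taking the sign of the slope b of the estimating equation."""
    y_bar = sum(ys) / len(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - unexplained / total
    return math.copysign(math.sqrt(r2), b)

# Truck data with the least-squares line Y = 3.75 + 0.75X
ages = [5, 3, 3, 1]
repairs = [7, 7, 6, 4]
r = correlation_from_fit(ages, repairs, 3.75, 0.75)
print(round(r ** 2, 2), round(r, 3))   # 0.75 0.866
```

With a negative slope the same function returns a negative r. For example, the points (1, 3), (2, 2), (3, 1) with Y = 4 − X give r = −1.0.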


Illustrations of Correlation

[Figure: six scatter plots of Y against X, illustrating ρ = −1, ρ = 0 and ρ = 1 (top row) and ρ = −0.8, ρ = 0 and ρ = 0.8 (bottom row).]


Caselet 1
American Express Company has long believed that
its cardholders tend to travel more extensively than
others—both on business and for pleasure. As part
of a comprehensive research effort undertaken by a
New York market research firm on behalf of
American Express, a study was conducted to
determine the relationship between travel and
charges on the American Express card. The research
firm selected a random sample of 25 cardholders
from the American Express computer file and
recorded their total charges over a specified period.


Caselet 1
For the selected cardholders, information was
also obtained, through a mailed questionnaire, on
the total number of miles traveled by each
cardholder during the same period. The data for
this study are given in the following table.


Continued
Miles (X) Dollars (Y) Miles (X) Dollars (Y)
1849 2332 3466 4244
2026 2305 3643 5298
2133 3016 3852 4801
2253 3385 4033 5147
2400 3090 4267 5738
2468 3694 4498 6420
2699 3371 4533 6059
2806 3998 4804 6426
3082 3555 5090 6321
3209 4692 5233 7026
1211 1802 5439 6964
1345 2405 1422 2005
1687 2511

Caselet 1: Consider the Related Scatter Plot

Observe that a straight line could be sketched through the scatter.

Continued
n Miles (X) Dollars (Y) X2 Y2 XY
1 1849 2332
2 2026 2305
3 2133 3016
4 2253 3385
5 2400 3090
6 2468 3694
7 2699 3371
8 2806 3998
9 3082 3555
10 3209 4692
11 3466 4244
12 3643 5298


Continued
13 3852 4801
14 4033 5147
15 4267 5738
16 4498 6420
17 4533 6059
18 4804 6426
19 5090 6321
20 5233 7026
21 5439 6964
22 1211 1802
23 1345 2405
24 1422 2005
25 1687 2511
Sum
Mean


Continued


Continued
n Miles (X) Dollars (Y) X2 Y2 XY
1 1849 2332 3418801 5438224 4311868
2 2026 2305 4104676 5313025 4669930
3 2133 3016 4549689 9096256 6433128
4 2253 3385 5076009 11458225 7626405
5 2400 3090 5760000 9548100 7416000
6 2468 3694 6091024 13645636 9116792
7 2699 3371 7284601 11363641 9098329
8 2806 3998 7873636 15984004 11218388
9 3082 3555 9498724 12638025 10956510
10 3209 4692 10297681 22014864 15056628
11 3466 4244 12013156 18011536 14709704
12 3643 5298 13271449 28068804 19300614
13 3852 4801 14837904 23049601 18493452
14 4033 5147 16265089 26491609 20757851
15 4267 5738 18207289 32924644 24484046


Continued
16 4498 6420 20232004 41216400 28877160
17 4533 6059 20548089 36711481 27465447
18 4804 6426 23078416 41293476 30870504
19 5090 6321 25908100 39955041 32173890
20 5233 7026 27384289 49364676 36767058
21 5439 6964 29582721 48497296 37877196
22 1211 1802 1466521 3247204 2182222
23 1345 2405 1809025 5784025 3234725
24 1422 2005 2022084 4020025 2851110
25 1687 2511 2845969 6305121 4236057
Sum 79448 106605 293426946 521440939 390185014
Mean 3177.9 4264.2


Caselet 1: Computing the Sum of Squares (SS)


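Caselet 1: Computing the Sum of Squares (SS)

$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 293{,}426{,}946 - \frac{79{,}448^2}{25} = 40{,}947{,}557.84$$

$$SS_{XY} = \sum XY - \frac{\sum X \sum Y}{n} = 390{,}185{,}014 - \frac{(79{,}448)(106{,}605)}{25} = 51{,}402{,}852.4$$

$$b = \frac{SS_{XY}}{SS_X} = 1.255334, \qquad a = \bar{Y} - b\bar{X} = 4264.2 - (1.255334)(3177.92) = 274.85$$

Thus, the equation is:

Ŷ = 274.85 + 1.2553X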
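The caselet regression can be reproduced in plain Python from the 25 (miles, dollars) pairs. The code uses the computational forms SSX = ΣX² − (ΣX)²/n, SSXY = ΣXY − ΣXΣY/n, b = SSXY/SSX and a = Ȳ − bX̄. The variable names are our own. For reference, the code also computes r = SSXY/√(SSX · SSY).

```python
miles = [1849, 2026, 2133, 2253, 2400, 2468, 2699, 2806, 3082, 3209, 3466, 3643, 3852,
         4033, 4267, 4498, 4533, 4804, 5090, 5233, 5439, 1211, 1345, 1422, 1687]
dollars = [2332, 2305, 3016, 3385, 3090, 3694, 3371, 3998, 3555, 4692, 4244, 5298, 4801,
           5147, 5738, 6420, 6059, 6426, 6321, 7026, 6964, 1802, 2405, 2005, 2511]

n = len(miles)
sum_x, sum_y = sum(miles), sum(dollars)
ss_x = sum(x * x for x in miles) - sum_x ** 2 / n                         # SS_X
ss_y = sum(y * y for y in dollars) - sum_y ** 2 / n                       # SS_Y
ss_xy = sum(x * y for x, y in zip(miles, dollars)) - sum_x * sum_y / n    # SS_XY

b = ss_xy / ss_x                        # slope
a = sum_y / n - b * sum_x / n           # intercept: Y_bar - b * X_bar
r = ss_xy / (ss_x * ss_y) ** 0.5        # coefficient of correlation

print(f"Y_hat = {a:.2f} + {b:.4f} X")   # Y_hat = 274.85 + 1.2553 X
print(f"r = {r:.4f}")                   # r = 0.9824
```

Keep b unrounded when computing a. With b rounded to 1.2553 first, the intercept drifts to about 274.96.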

Continued


Rank Correlation

Spearman's rank correlation measures the correlation that exists between two sets of ranks, that is, the degree of association between two variables:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where
ρ = rank correlation coefficient
d = difference between the ranks of each pair
n = number of observations


RANK CORRELATION - EXAMPLE


Rank Correlation Example (Cont'd)

$$\rho = 1 - \frac{6 \times 58}{11(121 - 1)} = 1 - \frac{348}{1320} = 1 - 0.264 = 0.736$$

This suggests a substantial positive association between average air quality and the presence of pulmonary disease.
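A minimal Python sketch of Spearman's formula, usable either from Σd² directly or from two rank lists. The function names are our own. The formula assumes no tied ranks; with ties, ranks are usually averaged and the formula becomes an approximation.

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank correlation: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def spearman_from_ranks(rank_x, rank_y):
    """Compute rho from two rank lists of equal length (assumes no tied ranks)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return spearman_from_d2(sum_d2, n)

print(round(spearman_from_d2(58, 11), 3))                 # 0.736, as in the example
print(spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1]))    # -1.0 (perfectly reversed ranks)
```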


THANK YOU…

All information, including graphical representations, etc provided in this presentation is for exclusive use of current GBS
students and faculty. No part of the document may be reproduced in any form or by any means, electronic or otherwise, without
written permission of the owner.
