NITK Unit 3 Lecture 21 Regression

AM761-STATISTICAL METHODS FOR CIVIL
ENGINEERING APPLICATIONS
LECTURE 21: REGRESSION
Dr. DEBABRATA KARMAKAR

ASSISTANT PROFESSOR
DEPARTMENT OF WATER RESOURCES AND OCEAN ENGINEERING
Department of Water Resources and Ocean Engineering, NITK Surathkal 1

REGRESSION
Regression: Regression analysis seeks to establish the nature of the relationship between variables, that is, to study the functional relationship between them, and thereby provides a mechanism for prediction or forecasting.
• Regression analysis is applied in many fields, such as economics, business management, the social sciences and engineering.
• Regression analysis estimates the values of the dependent variable from the independent variable through regression lines.
• The correlation coefficient can be obtained from the regression coefficients.
• In a bivariate distribution, if the variables are related such that the points in the scatter diagram cluster around some line, that line is called the line of regression.

REGRESSION CONTD….
Lines of regression
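Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$ represent the bivariate data, where Y is the dependent variable and X is the independent variable. Let the line of regression of Y on X be
$$Y = a + bX$$
According to the principle of least squares, the normal equations for estimating the parameters a and b are
$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i = a + \frac{b}{n}\sum_{i=1}^{n} x_i \;\Rightarrow\; \bar{y} = a + b\bar{x}$$
$$\sum_{i=1}^{n} y_i x_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i x_i = \frac{a}{n}\sum_{i=1}^{n} x_i + \frac{b}{n}\sum_{i=1}^{n} x_i^2 = a\bar{x} + \frac{b}{n}\sum_{i=1}^{n} x_i^2$$
We know that
$$\mathrm{Cov}(X, Y) = \mu_{11} = \frac{1}{n}\sum_{i=1}^{n} y_i x_i - \bar{x}\bar{y} \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i x_i = \mu_{11} + \bar{x}\bar{y}$$
$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \sigma_X^2 + \bar{x}^2$$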

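Substituting the values of Cov(X,Y) and $\sigma_X^2$ in the relation obtained from the second normal equation,
$$\frac{1}{n}\sum_{i=1}^{n} y_i x_i = a\bar{x} + \frac{b}{n}\sum_{i=1}^{n} x_i^2 \;\Rightarrow\; \mu_{11} + \bar{x}\bar{y} = a\bar{x} + b(\sigma_X^2 + \bar{x}^2)$$
Again, $\bar{y} = a + b\bar{x} \Rightarrow \bar{x}\bar{y} = a\bar{x} + b\bar{x}^2$, so
$$\mu_{11} + a\bar{x} + b\bar{x}^2 = a\bar{x} + b(\sigma_X^2 + \bar{x}^2) \;\Rightarrow\; \mu_{11} = b\sigma_X^2 \;\Rightarrow\; b = \frac{\mu_{11}}{\sigma_X^2}$$
Since b is the slope of the regression line of Y on X, and since the regression line passes through the point $(\bar{x}, \bar{y})$, its equation is
$$(Y - \bar{y}) = b(X - \bar{x}) \;\Rightarrow\; (Y - \bar{y}) = \frac{\mu_{11}}{\sigma_X^2}(X - \bar{x})$$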
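However, the correlation coefficient between X and Y is
$$r_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mu_{11}}{\sigma_X \sigma_Y} \;\Rightarrow\; \mu_{11} = r_{XY}\,\sigma_X \sigma_Y \;\Rightarrow\; \frac{\mu_{11}}{\sigma_X^2} = r_{XY}\frac{\sigma_Y}{\sigma_X}$$
$$(Y - \bar{y}) = \frac{\mu_{11}}{\sigma_X^2}(X - \bar{x}) \;\Rightarrow\; (Y - \bar{y}) = r_{XY}\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$
This is the equation of the line of regression of Y on X. Similarly, the equation of the line of regression of X on Y is given by
$$(X - \bar{x}) = r_{XY}\frac{\sigma_X}{\sigma_Y}(Y - \bar{y})$$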
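As a quick numerical check of the least-squares derivation, the following Python sketch solves the two normal equations directly and compares the slope with the closed form b = μ11/σX². The function names and the five data points are illustrative assumptions and do not come from the lecture.

```python
def solve_normal_equations(xs, ys):
    """Solve sum(y) = n*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2) for (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 linear system
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for the intercept
    b = (n * sxy - sx * sy) / det    # Cramer's rule for the slope
    return a, b


def slope_from_moments(xs, ys):
    """Closed form b = mu_11 / sigma_X^2, using divisor-n moments."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    mu11 = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    return mu11 / var_x


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = solve_normal_equations(xs, ys)
print("intercept a =", a, "slope b =", b)
print("mu_11 / sigma_X^2 =", slope_from_moments(xs, ys))
```

Both routes should give the same slope up to floating-point rounding, since the closed form is just the solution of the normal equations rewritten in terms of moments.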
Property 1: In the case of perfect correlation, the two lines of regression coincide.
Proof: In the case of perfect correlation, the coefficient of correlation is $r_{XY} = \pm 1$, so the equation of the line of regression of Y on X becomes
$$(Y - \bar{y}) = \pm\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; \frac{Y - \bar{y}}{\sigma_Y} = \pm\frac{X - \bar{x}}{\sigma_X}$$
The equation of the line of regression of X on Y becomes
$$(X - \bar{x}) = \pm\frac{\sigma_X}{\sigma_Y}(Y - \bar{y}) \;\Rightarrow\; \frac{X - \bar{x}}{\sigma_X} = \pm\frac{Y - \bar{y}}{\sigma_Y}$$
The two equations are identical. Hence, in the case of perfect correlation, the two lines of regression coincide.

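Example: The following table gives the aptitude test scores and productivity indices of 10 workers selected at random. Obtain the regression equation of Y on X and the regression equation of X on Y.

Aptitude Scores (X)      60  62  65  70  72  48  53  73  65  82
Productivity Index (Y)   68  60  62  80  85  40  52  62  60  81

Solution: Taking U = X − 65 and V = Y − 65 (deviations from the means),
$$\bar{x} = \frac{\sum x}{n} = 65, \quad \bar{y} = \frac{\sum y}{n} = 65, \quad \bar{u} = \frac{\sum U}{n} = 0, \quad \bar{v} = \frac{\sum V}{n} = 0$$
$$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = 89.4, \quad \sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = 175.2$$
$$\mathrm{cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = 104.4, \quad r_{UV} = \frac{\mathrm{cov}(U, V)}{\sigma_U \sigma_V} = 0.8342$$
Since U and V differ from X and Y only by a change of origin, $\sigma_X = \sigma_U$, $\sigma_Y = \sigma_V$ and $r_{XY} = r_{UV}$.
(i) The regression equation of Y on X is
$$(Y - \bar{y}) = r_{XY}\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; Y = 1.1678X - 10.906$$
(ii) The regression equation of X on Y is
$$(X - \bar{x}) = r_{XY}\frac{\sigma_X}{\sigma_Y}(Y - \bar{y}) \;\Rightarrow\; X = 0.5959Y + 26.267$$
(iii) When the test score is 92, the estimated productivity index is 96.53.
(iv) When the productivity index is 75, the estimated aptitude test score is 70.96.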
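The aptitude and productivity example can be recomputed directly from the tabulated data. The sketch below is one way to do so in plain Python; the variable and function names are illustrative choices, and the moments use the divisor n, as in the derivation of the lines of regression.

```python
import math

x = [60, 62, 65, 70, 72, 48, 53, 73, 65, 82]  # aptitude scores (X)
y = [68, 60, 62, 80, 85, 40, 52, 62, 60, 81]  # productivity indices (Y)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Divisor-n covariance and variances.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / n
var_y = sum((yi - y_bar) ** 2 for yi in y) / n

r = cov_xy / math.sqrt(var_x * var_y)
b_yx = cov_xy / var_x  # slope of the line of Y on X
b_xy = cov_xy / var_y  # slope of the line of X on Y


def predict_y(x_new):
    """Estimate Y from X using the line of regression of Y on X."""
    return y_bar + b_yx * (x_new - x_bar)


def predict_x(y_new):
    """Estimate X from Y using the line of regression of X on Y."""
    return x_bar + b_xy * (y_new - y_bar)


print(f"means: {x_bar}, {y_bar}; r = {r:.4f}")
print(f"Y on X: Y = {b_yx:.4f} X + {y_bar - b_yx * x_bar:.3f}")
print(f"X on Y: X = {b_xy:.4f} Y + {x_bar - b_xy * y_bar:.3f}")
print(f"Y at X = 92: {predict_y(92):.2f}")
print(f"X at Y = 75: {predict_x(75):.2f}")
```

Note that each prediction uses its own line: Y is estimated from the line of Y on X, and X from the line of X on Y.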
Regression Coefficients
Consider the regression line of Y on X
$$(Y - \bar{y}) = b(X - \bar{x}) \;\Rightarrow\; (Y - \bar{y}) = \frac{\mu_{11}}{\sigma_X^2}(X - \bar{x}) \;\Rightarrow\; (Y - \bar{y}) = r_{XY}\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$
The slope b of the line of regression of Y on X is called the regression coefficient. It represents the increment in the dependent variable Y corresponding to a unit change in the independent variable X.
Thus, the regression coefficient of Y on X is
$$b_{YX} = \frac{\mu_{11}}{\sigma_X^2} = r_{XY}\frac{\sigma_Y}{\sigma_X}$$
Similarly, the regression coefficient of X on Y is
$$b_{XY} = \frac{\mu_{11}}{\sigma_Y^2} = r_{XY}\frac{\sigma_X}{\sigma_Y}$$
Properties of Regression Coefficients
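Property 1: Regression coefficients are independent of change of origin but not of scale.
Proof: Let U = (X − a)/h and V = (Y − b)/k, where a, b, h and k are constants. Then
$$U = \frac{X - a}{h} \Rightarrow X = a + hU \quad \text{and} \quad V = \frac{Y - b}{k} \Rightarrow Y = b + kV$$
$$E(X) = a + hE(U) \quad \text{and} \quad E(Y) = b + kE(V)$$
$$\mathrm{cov}(X, Y) = hk\,\mathrm{cov}(U, V)$$
$$\sigma_X^2 = h^2\sigma_U^2 \quad \text{and} \quad \sigma_Y^2 = k^2\sigma_V^2$$
The regression coefficient of Y on X is
$$b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} = \frac{\mu_{11}}{\sigma_X^2} = \frac{\mathrm{cov}(X, Y)}{\sigma_X^2} = \frac{hk\,\mathrm{cov}(U, V)}{h^2\sigma_U^2} = \frac{k}{h}\,\frac{\mathrm{cov}(U, V)}{\sigma_U^2} = \frac{k}{h}\,b_{VU}$$
The constants a and b do not appear in the result, but the scale factors h and k do. Hence, the regression coefficients are independent of change of origin but not of scale.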
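The relation b_YX = (k/h) b_VU can be checked numerically. In the sketch below, the data and the constants a, b, h, k are hypothetical values chosen for illustration, and `reg_coeff` is an illustrative helper name rather than anything defined in the lecture.

```python
def reg_coeff(dep, indep):
    """Regression coefficient of `dep` on `indep`: cov(dep, indep) / var(indep)."""
    n = len(dep)
    d_bar = sum(dep) / n
    i_bar = sum(indep) / n
    cov = sum((d - d_bar) * (i - i_bar) for d, i in zip(dep, indep)) / n
    var = sum((i - i_bar) ** 2 for i in indep) / n
    return cov / var


# Hypothetical data and transformation constants.
x = [10, 20, 30, 40, 50]
y = [12, 25, 31, 48, 55]
a, h = 30, 10   # U = (X - a) / h
b, k = 31, 5    # V = (Y - b) / k

u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

b_yx = reg_coeff(y, x)
b_vu = reg_coeff(v, u)

# Shifting the origin alone (h = k = 1) leaves the coefficient unchanged.
b_shift = reg_coeff([yi - b for yi in y], [xi - a for xi in x])

print("b_YX         =", b_yx)
print("(k/h) * b_VU =", (k / h) * b_vu)
print("origin shift =", b_shift)
```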

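Property 2: The correlation coefficient is the geometric mean of the regression coefficients.
Proof: The regression coefficient of Y on X is $b_{YX} = r_{XY}\dfrac{\sigma_Y}{\sigma_X}$.
The regression coefficient of X on Y is $b_{XY} = r_{XY}\dfrac{\sigma_X}{\sigma_Y}$.
Multiplying the two regression coefficients, we get
$$b_{YX}\,b_{XY} = r_{XY}^2 \;\Rightarrow\; r_{XY} = \pm\sqrt{b_{YX}\,b_{XY}}$$
The sign of $r_{XY}$ is the same as the common sign of the two regression coefficients.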
Property 3: If one of the regression coefficients is greater than unity, the other must be less than unity.
Proof: Let one of the regression coefficients be greater than unity, say the regression coefficient of Y on X:
$$b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} > 1 \;\Rightarrow\; \frac{1}{b_{YX}} < 1$$
Since the correlation coefficient always lies between −1 and +1, $r_{XY}^2 \le 1$, so
$$b_{YX}\,b_{XY} \le 1 \;\Rightarrow\; b_{XY} \le \frac{1}{b_{YX}} < 1$$
Thus the regression coefficient of X on Y is less than 1.
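Properties 2 and 3 can be illustrated on a small data set. The data below are hypothetical and were picked so that one regression coefficient exceeds unity; the variable names are illustrative.

```python
import math

# Hypothetical data with a steep positive trend.
xs = [1, 2, 3, 4, 5]
ys = [3, 7, 8, 12, 15]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n

b_yx = cov / var_x
b_xy = cov / var_y
r = cov / math.sqrt(var_x * var_y)

# Property 2: r is the geometric mean of the two coefficients,
# carrying their common sign.
r_from_coeffs = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print("b_YX =", b_yx, " b_XY =", b_xy)
print("r =", r, " sqrt(b_YX * b_XY) =", r_from_coeffs)
```

Here b_YX is greater than 1 while b_XY is less than 1, and their product equals r², which cannot exceed 1.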

Difference between Regression and Correlation Analysis
• The correlation coefficient is a measure of the degree of covariability between X and Y, whereas regression analysis studies the nature of the relationship between the variables so that the value of one variable can be predicted from the other.
• In correlation analysis, we cannot say that one variable is the cause and the other is the effect. In regression analysis, one variable is taken as dependent and the other as independent, thus making it possible to study the cause-and-effect relationship.
• In correlation analysis, rXY is a measure of the direction and degree of the linear relationship between X and Y. Since rXY = rYX, it is immaterial whether X or Y is taken as the dependent variable. In regression analysis, however, bYX ≠ bXY in general, so it makes a difference which variable is dependent and which is independent.
• The correlation coefficient is independent of change of both scale and origin, whereas regression coefficients are independent of change of origin but not of scale.

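Example: Obtain the coefficient of correlation if the two lines of regression are given by
$$2X = 8 - 3Y \quad \text{and} \quad 2Y = 5 - X$$
Solution: Taking the first equation as the line of X on Y and the second as the line of Y on X gives $b_{XY} = -3/2$ and $b_{YX} = -1/2$, so $r_{XY}^2 = b_{XY}\,b_{YX} = 3/4$ and $r_{XY} = -0.866$ (negative, since both regression coefficients are negative). The opposite assignment would give $r_{XY}^2 = 4/3 > 1$, which is not possible.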
Example: Given below is the information regarding the advertisement expenditure and sales
in crores of rupees:
(i) Calculate two regression lines
(ii) Find the likely sales when advertisement expenditure is 25 crores.
(iii) If the company wants to attain sales target of 150 crores, what would be the
advertisement budget?
                      Advertisement Expenditure (X)   Sales (Y)
Mean                  20                              120
Standard Deviation    5                               25
Correlation coefficient is 0.8
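Because only summary statistics are given, the two regression lines follow directly from bYX = r σY/σX and bXY = r σX/σY. The sketch below works through the three parts of the advertisement example in Python; the variable and function names are illustrative.

```python
# Summary statistics from the table (crores of rupees).
x_bar, y_bar = 20.0, 120.0   # mean advertisement expenditure, mean sales
sd_x, sd_y = 5.0, 25.0       # standard deviations
r = 0.8                      # correlation coefficient

b_yx = r * sd_y / sd_x       # regression coefficient of Y on X
b_xy = r * sd_x / sd_y       # regression coefficient of X on Y


def sales_for(ad_spend):
    """Line of Y on X: estimated sales for a given advertisement expenditure."""
    return y_bar + b_yx * (ad_spend - x_bar)


def ad_spend_for(sales):
    """Line of X on Y: estimated advertisement expenditure for a sales target."""
    return x_bar + b_xy * (sales - y_bar)


# (i) The two regression lines in slope-intercept form.
print(f"Y on X: Y = {b_yx:.2f} X + {y_bar - b_yx * x_bar:.2f}")
print(f"X on Y: X = {b_xy:.2f} Y + {x_bar - b_xy * y_bar:.2f}")

# (ii) Likely sales when the advertisement expenditure is 25 crores.
print("sales at X = 25:", round(sales_for(25), 2))

# (iii) Advertisement budget for a sales target of 150 crores.
print("budget at Y = 150:", round(ad_spend_for(150), 2))
```

Under these formulas the lines are Y = 4X + 40 and X = 0.16Y + 0.8, giving likely sales of 140 crores for an expenditure of 25 crores and a budget of 24.8 crores for a sales target of 150 crores.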

Angle between two lines of Regression

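The equation of the line of regression of Y on X is
$$(Y - \bar{y}) = r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$
The equation of the line of regression of X on Y is
$$(X - \bar{x}) = r\frac{\sigma_X}{\sigma_Y}(Y - \bar{y}) \;\Rightarrow\; (Y - \bar{y}) = \frac{\sigma_Y}{r\sigma_X}(X - \bar{x})$$
The slopes of these two lines are
$$m_1 = r\frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad m_2 = \frac{\sigma_Y}{r\sigma_X}$$
If θ is the angle between these two lines of regression, then
$$\tan\theta = \pm\frac{m_1 - m_2}{1 + m_1 m_2} = \pm\frac{r^2 - 1}{r}\left(\frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right)$$
Since $r^2 \le 1$, the acute angle between the lines is given by
$$\tan\theta = \frac{1 - r^2}{|r|}\left(\frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right) \;\Rightarrow\; \theta = \tan^{-1}\left[\frac{1 - r^2}{|r|}\left(\frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right)\right]$$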

Case 1: If r = 0, then tan θ = ∞. Hence θ = π/2. Thus, if the two variables are uncorrelated, the lines of regression are perpendicular to each other.
Case 2: If r = ±1, then tan θ = 0. Hence θ = 0 or π. Thus, in the case of perfect correlation, the two lines of regression either coincide or are parallel to each other. However, since both lines of regression pass through the point $(\bar{x}, \bar{y})$, they cannot be parallel. Hence, in the case of perfect correlation, the two lines of regression coincide.
Case 3: When the two regression lines intersect, the angle between them is either acute $\left(0 < \theta < \frac{\pi}{2}\right)$ or obtuse $\left(\frac{\pi}{2} < \theta < \pi\right)$.

Standard Error of Estimate
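The standard error of estimate measures the variability, or scatter, of the observed values around the regression line.
Consider the line of regression of Y on X:
$$(Y - \bar{y}) = r\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; Y = \bar{y} + r\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; \frac{Y - \bar{y}}{\sigma_Y} = r\,\frac{X - \bar{x}}{\sigma_X}$$
The residual variance $S_Y^2$, whose square root $S_Y$ is the standard error of estimate, is the expected value of the squared deviations of the observed values of Y from the estimated values:
$$S_Y^2 = E\left[(Y - \hat{y})^2\right], \quad \hat{y} \text{ is the estimated value}$$
$$S_Y^2 = E\left[\left\{Y - \left(\bar{y} + r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})\right)\right\}^2\right]$$
$$S_Y^2 = E\left[\left\{(Y - \bar{y}) - r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})\right\}^2\right] = \sigma_Y^2\,E\left[\left\{\frac{Y - \bar{y}}{\sigma_Y} - r\,\frac{X - \bar{x}}{\sigma_X}\right\}^2\right]$$
$$S_Y^2 = \sigma_Y^2\,E\left[(Y^* - rX^*)^2\right]$$
where $X^* = \dfrac{X - \bar{x}}{\sigma_X}$ and $Y^* = \dfrac{Y - \bar{y}}{\sigma_Y}$ are standardized variates such that
$$E\left[X^{*2}\right] = E\left[\left(\frac{X - \bar{x}}{\sigma_X}\right)^2\right] = \frac{1}{\sigma_X^2}E\left[(X - \bar{x})^2\right] = 1$$
Similarly,
$$E\left[Y^{*2}\right] = E\left[\left(\frac{Y - \bar{y}}{\sigma_Y}\right)^2\right] = \frac{1}{\sigma_Y^2}E\left[(Y - \bar{y})^2\right] = 1$$
$$E\left[X^* Y^*\right] = E\left[\left(\frac{X - \bar{x}}{\sigma_X}\right)\left(\frac{Y - \bar{y}}{\sigma_Y}\right)\right] = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = r_{XY}$$
Therefore, writing $r = r_{XY}$,
$$S_Y^2 = \sigma_Y^2\,E\left[(Y^* - rX^*)^2\right] = \sigma_Y^2\left[E\left(Y^{*2}\right) + r^2 E\left(X^{*2}\right) - 2rE\left(Y^* X^*\right)\right]$$
$$S_Y^2 = \sigma_Y^2\left(1 + r^2 - 2r^2\right) = \sigma_Y^2\left(1 - r^2\right) \;\Rightarrow\; S_Y = \sigma_Y\left(1 - r^2\right)^{1/2}$$
Similarly, the standard error of estimate of X is
$$S_X = \sigma_X\left(1 - r^2\right)^{1/2}$$
Note: If r = ±1, then SX = SY = 0, so that each deviation is zero and the two lines of regression coincide.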
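The identity SY² = σY²(1 − r²) can be verified by comparing it with the mean squared residual about the fitted line of Y on X. The sketch below does this for a small hypothetical data set; all names are illustrative, and the moments use the divisor n, as in the derivation.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n
r = cov / math.sqrt(var_x * var_y)

# Fitted values from the line of regression of Y on X.
b_yx = cov / var_x
fitted = [y_bar + b_yx * (x - x_bar) for x in xs]

# Mean squared deviation of the observed values from the fitted values.
residual_variance = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / n

# Closed form from the derivation.
closed_form = var_y * (1 - r * r)

print("mean squared residual:", residual_variance)
print("sigma_Y^2 (1 - r^2)  :", closed_form)
print("standard error S_Y   :", math.sqrt(closed_form))
```

Many textbooks divide the residual sum of squares by n − 2 rather than n when estimating from a sample; the check above follows the divisor-n convention used in this derivation.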
Limitations of Regression Analysis:

• If one or two extreme values (outliers) are included, the fitted relationship between the variables may change completely.
• To judge whether the relationship is linear or non-linear, it is advisable to draw a scatter diagram first.
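The sensitivity to extreme values is easy to demonstrate. In the sketch below, ten hypothetical points lie exactly on the line y = 2x, and a single made-up outlier at (30, 0) is then appended; the helper name `slope` is illustrative.

```python
def slope(xs, ys):
    """Least-squares slope of Y on X: cov(X, Y) / var(X)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    return cov / var_x


# Ten hypothetical points lying exactly on y = 2x.
xs = list(range(1, 11))
ys = [2 * x for x in xs]
slope_clean = slope(xs, ys)

# The same data with one extreme value added.
slope_with_outlier = slope(xs + [30], ys + [0])

print("slope without outlier:", slope_clean)
print("slope with outlier   :", slope_with_outlier)
```

One extreme point is enough to turn a slope of 2 into a slightly negative one, which is why a scatter diagram should be inspected before the fitted line is trusted.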

Example: A certain city has gathered data on the number of minor traffic accidents and the number of youth soccer games that occur in the city over a weekend.

Soccer Games (X)      20  30  10  12  15  25  34
Minor Accidents (Y)    6   9   4   5   7   8   9

(a) Predict the number of minor traffic accidents that will occur on a weekend during which 33 soccer games take place in the city.
(b) Calculate the standard error of estimate.
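One way to work the soccer example is sketched below in Python. It fits the line of regression of Y on X and evaluates the standard error with SY = σY(1 − r²)^(1/2), using divisor-n moments. The variable names are illustrative, and a text that divides the residual sum of squares by n − 2 would report a somewhat larger standard error.

```python
import math

games = [20, 30, 10, 12, 15, 25, 34]   # soccer games (X)
accidents = [6, 9, 4, 5, 7, 8, 9]      # minor accidents (Y)

n = len(games)
x_bar = sum(games) / n
y_bar = sum(accidents) / n

cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(games, accidents)) / n
var_x = sum((x - x_bar) ** 2 for x in games) / n
var_y = sum((y - y_bar) ** 2 for y in accidents) / n

r = cov / math.sqrt(var_x * var_y)
b_yx = cov / var_x            # slope of the line of Y on X
a = y_bar - b_yx * x_bar      # intercept

# (a) Predicted accidents for a weekend with 33 soccer games.
predicted = a + b_yx * 33

# (b) Standard error of estimate of Y, divisor-n convention.
s_y = math.sqrt(var_y * (1 - r * r))

print(f"Y = {a:.4f} + {b_yx:.4f} X, r = {r:.4f}")
print(f"predicted accidents at X = 33: {predicted:.2f}")
print(f"standard error of estimate S_Y: {s_y:.4f}")
```

With these data the prediction is roughly 9.3 accidents, and SY is about 0.67 under the divisor-n convention.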

Thank you
