
AM761 - STATISTICAL METHODS FOR CIVIL ENGINEERING APPLICATIONS

LECTURE 21: REGRESSION

Dr. DEBABRATA KARMAKAR


ASSISTANT PROFESSOR
DEPARTMENT OF WATER RESOURCES AND OCEAN ENGINEERING

Department of Water Resources and Ocean Engineering, NITK Surathkal 1


REGRESSION
Regression: Regression analysis attempts to establish the nature of the relationship between variables, that is, to study the functional relationship between them, and thereby provides a mechanism for prediction or forecasting.
• Regression analysis is applied in many fields, such as economics, business management, social sciences and engineering.
• Regression analysis provides estimates of the values of the dependent variable from the independent variable through regression lines.
• The correlation coefficient can be found from the regression coefficients.
• In a bivariate distribution, if the variables are related such that the points in the scatter diagram cluster around some line, that line is called the line of regression.



REGRESSION CONTD….

Lines of regression
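Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$ represent the bivariate data, where Y is the dependent variable depending on the independent variable X. Let the line of regression of Y on X be

$$Y = a + bX$$

According to the principle of least squares, the normal equations for estimating the parameters a and b are

$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i = a + \frac{b}{n}\sum_{i=1}^{n} x_i \;\Rightarrow\; \bar{y} = a + b\bar{x}$$

$$\sum_{i=1}^{n} y_i x_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i x_i = \frac{a}{n}\sum_{i=1}^{n} x_i + \frac{b}{n}\sum_{i=1}^{n} x_i^2 = a\bar{x} + \frac{b}{n}\sum_{i=1}^{n} x_i^2$$

We know that

$$\mathrm{Cov}(X, Y) = \mu_{11} = \frac{1}{n}\sum_{i=1}^{n} y_i x_i - \bar{x}\bar{y} \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i x_i = \mu_{11} + \bar{x}\bar{y}$$

$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \sigma_X^2 + \bar{x}^2$$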


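Substituting the values of Cov(X, Y) and $\sigma_X^2$ in the relation obtained from the second normal equation,

$$\frac{1}{n}\sum_{i=1}^{n} y_i x_i = a\bar{x} + \frac{b}{n}\sum_{i=1}^{n} x_i^2$$

$$\Rightarrow\; \mu_{11} + \bar{x}\bar{y} = a\bar{x} + b(\sigma_X^2 + \bar{x}^2)$$

Again, $\bar{y} = a + b\bar{x} \;\Rightarrow\; \bar{x}\bar{y} = a\bar{x} + b\bar{x}^2$, so

$$\mu_{11} + a\bar{x} + b\bar{x}^2 = a\bar{x} + b(\sigma_X^2 + \bar{x}^2)$$

$$\Rightarrow\; \mu_{11} = b\sigma_X^2 \;\Rightarrow\; b = \frac{\mu_{11}}{\sigma_X^2}$$

Since b is the slope of the regression line of Y on X, and since the regression line passes through the point $(\bar{x}, \bar{y})$, its equation is

$$(Y - \bar{y}) = b(X - \bar{x})$$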
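The result of this derivation can be checked numerically. The following is a minimal Python sketch, assuming a small made-up data set: the lists `xs` and `ys` and both function names are illustrative and do not come from the lecture. It solves the two normal equations for a and b directly, then computes the slope again as $\mu_{11}/\sigma_X^2$ from 1/n moments so the two can be compared.

```python
# Minimal sketch: least-squares line Y = a + bX from the normal equations.
# The data below are hypothetical and used only for illustration.

def fit_line_normal_equations(xs, ys):
    """Return (a, b) by solving the two normal equations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n * a     + b * sum_x
    #   sum_xy = a * sum_x + b * sum_x2
    det = n * sum_x2 - sum_x ** 2
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b


def slope_from_moments(xs, ys):
    """Return b = mu_11 / sigma_X^2 using 1/n (population) moments."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    mu_11 = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    return mu_11 / var_x


xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical X values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical Y values
a, b = fit_line_normal_equations(xs, ys)
print("a =", a, "b =", b)
print("slope from moments =", slope_from_moments(xs, ys))
```

Both routes should give the same slope, and the fitted line should pass through the point of means.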

11
(Y  y )  (X  x)
 X2

However, the correlation coefficient between x and y is

cov( X , Y ) 11  
rXY    11 2 X
 XY  X  Y  X   Y

11 Y
 r
 X  X
2 XY

11 Y
(Y  y )  ( X  x )  (Y  y )  r (X  x)
X
2 XY
X

This is equation of line of regression of Y on X. Similarly, the equation of line of


regression of X on Y is given by
X
( X  x )  rXY (Y  y )
Y

Property 1: In the case of perfect correlation, the two lines of regression coincide.

Proof: In the case of perfect correlation, the coefficient of correlation $r_{XY} = \pm 1$. So the equation of the line of regression of Y on X becomes

$$(Y - \bar{y}) = \pm\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; \frac{Y - \bar{y}}{\sigma_Y} = \pm\frac{X - \bar{x}}{\sigma_X}$$

The equation of the line of regression of X on Y becomes

$$(X - \bar{x}) = \pm\frac{\sigma_X}{\sigma_Y}(Y - \bar{y}) \;\Rightarrow\; \frac{X - \bar{x}}{\sigma_X} = \pm\frac{Y - \bar{y}}{\sigma_Y}$$

The two equations are identical. Hence, in the case of perfect correlation, the two lines of regression coincide.


Example: The following table gives the aptitude test scores and productivity indices of 10 workers selected at random. Obtain the regression equation of Y on X and the regression equation of X on Y. Also estimate the productivity index of a worker whose test score is 92 and the test score of a worker whose productivity index is 75.

Aptitude Scores (X)       60  62  65  70  72  48  53  73  65  82
Productivity Index (Y)    68  60  62  80  85  40  52  62  60  81

Solution:

$$\bar{x} = \frac{\sum x}{n} = 65, \qquad \bar{y} = \frac{\sum y}{n} = 65$$

$$\sigma_X^2 = \frac{1}{n}\sum (x - \bar{x})^2 = 89.4, \qquad \sigma_Y^2 = \frac{1}{n}\sum (y - \bar{y})^2 = 175.2$$

$$\mathrm{cov}(X, Y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = 104.4, \qquad r_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = 0.8342$$

(i) The regression equation of Y on X is

$$(Y - \bar{y}) = r_{XY}\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; Y = 1.1678X - 10.906$$

(ii) The regression equation of X on Y is

$$(X - \bar{x}) = r_{XY}\frac{\sigma_X}{\sigma_Y}(Y - \bar{y}) \;\Rightarrow\; X = 0.5959Y + 26.267$$

(iii) When the test score is 92, the productivity index is 96.53.

(iv) When the productivity index is 75, the aptitude test score is 70.96.
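As a cross-check on this worked example, the following minimal Python sketch recomputes the means, standard deviations, correlation coefficient and both regression lines directly from the tabulated data, using 1/n (population) moments as in the lecture. The function and variable names are illustrative choices, not part of the lecture.

```python
import math

# Data from the example: aptitude scores (X) and productivity indices (Y).
x = [60, 62, 65, 70, 72, 48, 53, 73, 65, 82]
y = [68, 60, 62, 80, 85, 40, 52, 62, 60, 81]


def summary(xs, ys):
    """Return (x_bar, y_bar, sd_x, sd_y, r) using 1/n moments."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((u - x_bar) ** 2 for u in xs) / n
    var_y = sum((v - y_bar) ** 2 for v in ys) / n
    cov = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return x_bar, y_bar, sd_x, sd_y, cov / (sd_x * sd_y)


x_bar, y_bar, sd_x, sd_y, r = summary(x, y)
b_yx = r * sd_y / sd_x   # slope of the line of regression of Y on X
b_xy = r * sd_x / sd_y   # slope of the line of regression of X on Y


def predict_y(x_value):
    """Estimate Y from X with the line of regression of Y on X."""
    return y_bar + b_yx * (x_value - x_bar)


def predict_x(y_value):
    """Estimate X from Y with the line of regression of X on Y."""
    return x_bar + b_xy * (y_value - y_bar)


print("r =", round(r, 4))
print("Y on X: slope", round(b_yx, 4), "intercept", round(y_bar - b_yx * x_bar, 3))
print("X on Y: slope", round(b_xy, 4), "intercept", round(x_bar - b_xy * y_bar, 3))
print("Y at X = 92:", round(predict_y(92), 2))
print("X at Y = 75:", round(predict_x(75), 2))
```

Note that each prediction uses its own regression line: Y is estimated from the line of Y on X, and X from the line of X on Y.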

Regression Coefficients

Consider the regression line of Y on X

$$(Y - \bar{y}) = b(X - \bar{x}) \;\Rightarrow\; (Y - \bar{y}) = \frac{\mu_{11}}{\sigma_X^2}(X - \bar{x}) \;\Rightarrow\; (Y - \bar{y}) = r_{XY}\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$

The slope b of the line of regression of Y on X is called the regression coefficient. It represents the increment in the dependent variable Y corresponding to a unit change in the independent variable X.

Thus, the regression coefficient of Y on X is

$$b_{YX} = \frac{\mu_{11}}{\sigma_X^2} = r_{XY}\frac{\sigma_Y}{\sigma_X}$$

Similarly, the regression coefficient of X on Y is

$$b_{XY} = \frac{\mu_{11}}{\sigma_Y^2} = r_{XY}\frac{\sigma_X}{\sigma_Y}$$

Properties of Regression Coefficients

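Property 1: Regression coefficients are independent of change of origin but not of scale.

Proof: Let U = (X - a)/h and V = (Y - b)/k, where a, b, h and k are constants. Then

$$U = \frac{X - a}{h} \;\Rightarrow\; X = a + hU \quad\text{and}\quad V = \frac{Y - b}{k} \;\Rightarrow\; Y = b + kV$$

$$E(X) = a + hE(U) \quad\text{and}\quad E(Y) = b + kE(V)$$

$$\mathrm{cov}(X, Y) = hk\,\mathrm{cov}(U, V)$$

$$\sigma_X^2 = h^2\sigma_U^2 \quad\text{and}\quad \sigma_Y^2 = k^2\sigma_V^2$$

The regression coefficient of Y on X is

$$b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} = \frac{\mu_{11}}{\sigma_X^2} = \frac{\mathrm{cov}(X, Y)}{\sigma_X^2} = \frac{hk\,\mathrm{cov}(U, V)}{h^2\sigma_U^2} = \frac{k}{h}\,\frac{\mathrm{cov}(U, V)}{\sigma_U^2} = \frac{k}{h}\,b_{VU}$$

The constants a and b do not appear in the result, while h and k do. Hence, the regression coefficients are independent of change of origin but not of scale.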
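The relation $b_{YX} = (k/h)\,b_{VU}$ can be verified numerically. The sketch below is a minimal Python check using the aptitude and productivity data from the earlier example. The origin and scale constants `a`, `h`, `b`, `k` are arbitrary values chosen for illustration, and the helper name `coefficient_y_on_x` is hypothetical.

```python
def coefficient_y_on_x(xs, ys):
    """Regression coefficient of ys on xs: cov / var(xs), with 1/n moments."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys)) / n
    var_x = sum((u - x_bar) ** 2 for u in xs) / n
    return cov / var_x


xs = [60, 62, 65, 70, 72, 48, 53, 73, 65, 82]
ys = [68, 60, 62, 80, 85, 40, 52, 62, 60, 81]

a, h = 65.0, 5.0   # arbitrary origin and scale for X (illustrative)
b, k = 65.0, 2.0   # arbitrary origin and scale for Y (illustrative)

us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

b_yx = coefficient_y_on_x(xs, ys)
b_vu = coefficient_y_on_x(us, vs)

# Change of origin only (h = k = 1): the coefficient should not change.
b_shifted = coefficient_y_on_x([x - a for x in xs], [y - b for y in ys])

print("b_YX         =", b_yx)
print("(k/h) * b_VU =", (k / h) * b_vu)
print("origin shift =", b_shifted)
```

Shifting the origin leaves the coefficient unchanged, while rescaling changes it by the factor k/h.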


Property 2: The correlation coefficient is the geometric mean of the regression coefficients.

Proof: The regression coefficient of Y on X is $b_{YX} = r_{XY}\dfrac{\sigma_Y}{\sigma_X}$.

The regression coefficient of X on Y is $b_{XY} = r_{XY}\dfrac{\sigma_X}{\sigma_Y}$.

Multiplying the two regression coefficients, we get

$$b_{YX}\,b_{XY} = r_{XY}^2 \;\Rightarrow\; r_{XY} = \pm\sqrt{b_{YX}\,b_{XY}}$$

where $r_{XY}$ takes the common sign of the two regression coefficients.

Property 3: If one of the regression coefficients is greater than unity, the other must be less than unity.

Proof: Let the regression coefficient of Y on X be greater than unity:

$$b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} > 1 \;\Rightarrow\; \frac{1}{b_{YX}} < 1$$

Since the correlation coefficient always lies between -1 and +1, $r_{XY}^2 \le 1$, so

$$b_{YX}\,b_{XY} \le 1 \;\Rightarrow\; b_{XY} \le \frac{1}{b_{YX}} < 1$$

Thus the regression coefficient of X on Y is less than 1.
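Properties 2 and 3 together give a practical recipe for recovering r when only the two regression coefficients are known. The following minimal Python sketch illustrates it. The function name and the sample coefficients (0.5 and 1.5) are hypothetical values chosen for illustration, and the small tolerance `tol` is an assumption to absorb floating-point rounding.

```python
import math


def correlation_from_coefficients(b_yx, b_xy, tol=1e-12):
    """Return r as the signed geometric mean of the regression coefficients."""
    product = b_yx * b_xy
    # Both coefficients carry the sign of r, so their product is r^2,
    # which must lie in [0, 1]. Anything else is not a valid pair.
    if product < 0 or product > 1 + tol:
        raise ValueError("not a valid pair of regression coefficients")
    return math.copysign(math.sqrt(min(product, 1.0)), b_yx)


r_pos = correlation_from_coefficients(0.5, 1.5)     # illustrative values
r_neg = correlation_from_coefficients(-0.5, -1.5)   # same magnitudes, negative
print(r_pos, r_neg)
```

The validity check is also how one decides which of two given lines is the regression of Y on X: if reading the slopes one way gives a product above 1, the roles of the two lines must be swapped.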


Difference between Regression and Correlation Analysis

• The correlation coefficient is a measure of the degree of covariability between X and Y, whereas regression analysis studies the nature of the relationship between the variables so that the value of one variable can be predicted from the other.

• In correlation analysis, we cannot say that one variable is the cause and the other is the effect. In regression analysis, one variable is taken as dependent and the other as independent, thus making it possible to study the cause-and-effect relationship.

• In correlation analysis, $r_{XY}$ is a measure of the direction and degree of the linear relationship between X and Y. Since $r_{XY} = r_{YX}$, it is immaterial whether X or Y is the dependent variable. In regression analysis, in general $b_{YX} \ne b_{XY}$, and hence it makes a difference which variable is dependent and which is independent.

• The correlation coefficient is independent of change of scale and origin, whereas regression coefficients are independent of change of origin but not of scale.


Example: Obtain the coefficient of correlation if the two lines of regression are given by
2 X  8  3Y and 2Y  5  X

Solution: rXY  0.866

Example: Given below is the information regarding the advertisement expenditure and sales
in crores of rupees:
(i) Calculate two regression lines
(ii) Find the likely sales when advertisement expenditure is 25 crores.
(iii) If the company wants to attain sales target of 150 crores, what would be the
advertisement budget?

                      Advertisement Expenditure (X)    Sales (Y)
Mean                  20                                120
Standard Deviation    5                                 25

The correlation coefficient is 0.8.
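Since only the summary statistics are given, both regression lines follow directly from $b_{YX} = r\sigma_Y/\sigma_X$ and $b_{XY} = r\sigma_X/\sigma_Y$. The following is a minimal Python sketch of the three parts of this example. The variable and function names are illustrative, not from the lecture.

```python
# Summary statistics from the table (amounts in crores of rupees).
mean_x, mean_y = 20.0, 120.0   # advertisement expenditure, sales
sd_x, sd_y = 5.0, 25.0
r = 0.8

b_yx = r * sd_y / sd_x   # regression coefficient of Y on X
b_xy = r * sd_x / sd_y   # regression coefficient of X on Y

# (i) The two regression lines in slope-intercept form.
intercept_y_on_x = mean_y - b_yx * mean_x   # Y = b_yx * X + intercept
intercept_x_on_y = mean_x - b_xy * mean_y   # X = b_xy * Y + intercept


def sales_for(expenditure):
    """(ii) Likely sales for a given advertisement expenditure (Y on X)."""
    return mean_y + b_yx * (expenditure - mean_x)


def expenditure_for(sales):
    """(iii) Advertisement budget for a given sales target (X on Y)."""
    return mean_x + b_xy * (sales - mean_y)


print("Y on X: Y =", b_yx, "* X +", intercept_y_on_x)
print("X on Y: X =", b_xy, "* Y +", intercept_x_on_y)
print("Sales at X = 25:", sales_for(25.0))
print("Budget for Y = 150:", expenditure_for(150.0))
```

Under these formulas the lines are Y = 4X + 40 and X = 0.16Y + 0.8, giving likely sales of about 140 crores for an expenditure of 25 crores and a budget of about 24.8 crores for a sales target of 150 crores.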



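Angle between two lines of Regression

The equation of the line of regression of Y on X is

$$(Y - \bar{y}) = r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$

The equation of the line of regression of X on Y is

$$(X - \bar{x}) = r\frac{\sigma_X}{\sigma_Y}(Y - \bar{y}) \;\Rightarrow\; (Y - \bar{y}) = \frac{\sigma_Y}{r\sigma_X}(X - \bar{x})$$

The slopes of these two lines are

$$m_1 = r\frac{\sigma_Y}{\sigma_X} \quad\text{and}\quad m_2 = \frac{\sigma_Y}{r\sigma_X}$$

If θ is the angle between these two lines of regression, then

$$\tan\theta = \frac{m_1 - m_2}{1 + m_1 m_2} = \left(\frac{r^2 - 1}{r}\right)\left(\frac{\sigma_X\sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right)$$

Since $r^2 \le 1$, the acute angle between the lines is given by

$$\tan\theta = \frac{1 - r^2}{|r|}\left(\frac{\sigma_X\sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right) \;\Rightarrow\; \theta = \tan^{-1}\left\{\frac{1 - r^2}{|r|}\left(\frac{\sigma_X\sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right)\right\}$$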


Case 1: If r = 0, then tan θ = ∞, hence θ = π/2. Thus, if the two variables are uncorrelated, the lines of regression are perpendicular to each other.

Case 2: If r = ±1, then tan θ = 0, hence θ = 0 or π. Thus, in the case of perfect correlation, the two lines of regression are either coincident or parallel. However, since both lines of regression pass through the point $(\bar{x}, \bar{y})$, they cannot be parallel. Hence, in the case of perfect correlation, the two lines of regression coincide.

Case 3: When the two regression lines intersect, the angle between them is either acute $\left(0 < \theta < \frac{\pi}{2}\right)$ or obtuse $\left(\frac{\pi}{2} < \theta < \pi\right)$.


Standard Error of Estimate

The standard error of estimate measures the variability, or scatter, of the observed values around the regression line.

Consider the line of regression of Y on X:

$$(Y - \bar{y}) = r\frac{\sigma_Y}{\sigma_X}(X - \bar{x}) \;\Rightarrow\; Y = \bar{y} + r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})$$

$$\Rightarrow\; \frac{Y - \bar{y}}{\sigma_Y} = r\,\frac{X - \bar{x}}{\sigma_X}$$

The residual variance $S_Y^2$, whose square root is the standard error of estimate, is the expected value of the squared deviations of the observed values of Y from the estimated values:

$$S_Y^2 = E\left[(Y - \hat{y})^2\right], \quad \hat{y} \text{ is the estimated value}$$

$$S_Y^2 = E\left[\left\{Y - \left(\bar{y} + r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})\right)\right\}^2\right]$$

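$$S_Y^2 = E\left[\left\{(Y - \bar{y}) - r\frac{\sigma_Y}{\sigma_X}(X - \bar{x})\right\}^2\right] = \sigma_Y^2\,E\left[\left\{\frac{Y - \bar{y}}{\sigma_Y} - r\,\frac{X - \bar{x}}{\sigma_X}\right\}^2\right]$$

$$S_Y^2 = \sigma_Y^2\,E\left[(Y^* - rX^*)^2\right]$$

where X* and Y* are standardized variates given by $X^* = \dfrac{X - \bar{x}}{\sigma_X}$ and $Y^* = \dfrac{Y - \bar{y}}{\sigma_Y}$, such that

$$E(X^{*2}) = E\left[\left(\frac{X - \bar{x}}{\sigma_X}\right)^2\right] = \frac{1}{\sigma_X^2}E\left[(X - \bar{x})^2\right] = 1$$

Similarly,

$$E(Y^{*2}) = E\left[\left(\frac{Y - \bar{y}}{\sigma_Y}\right)^2\right] = \frac{1}{\sigma_Y^2}E\left[(Y - \bar{y})^2\right] = 1$$

$$E(X^*Y^*) = E\left[\left(\frac{X - \bar{x}}{\sigma_X}\right)\left(\frac{Y - \bar{y}}{\sigma_Y}\right)\right] = \frac{\mathrm{cov}(X, Y)}{\sigma_X\sigma_Y} = r_{XY}$$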


$$S_Y^2 = \sigma_Y^2\,E\left[(Y^* - rX^*)^2\right] = \sigma_Y^2\left[E(Y^{*2}) + r^2E(X^{*2}) - 2rE(Y^*X^*)\right]$$

$$S_Y^2 = \sigma_Y^2\left[1 + r^2 - 2r^2\right] = \sigma_Y^2(1 - r^2) \;\Rightarrow\; S_Y = \sigma_Y(1 - r^2)^{1/2}$$

Similarly, the standard error of estimate of X is

$$S_X = \sigma_X(1 - r^2)^{1/2}$$

Note: If r = ±1, then $S_X = S_Y = 0$, so that each deviation is zero and the two lines of regression are coincident.
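The closed form $S_Y = \sigma_Y(1 - r^2)^{1/2}$ can be compared with the root mean squared residual computed directly from the fitted line. The following minimal Python sketch does this for the aptitude and productivity data from the earlier example, using 1/n moments throughout as in the derivation. The function name is illustrative.

```python
import math

x = [60, 62, 65, 70, 72, 48, 53, 73, 65, 82]
y = [68, 60, 62, 80, 85, 40, 52, 62, 60, 81]


def standard_error_y(xs, ys):
    """Return (direct, formula): S_Y from residuals and from sigma_Y*sqrt(1-r^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((u - x_bar) ** 2 for u in xs) / n
    var_y = sum((v - y_bar) ** 2 for v in ys) / n
    cov = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys)) / n
    r = cov / math.sqrt(var_x * var_y)
    b_yx = cov / var_x
    # Deviations of the observed Y from the values estimated by the line.
    residuals = [v - (y_bar + b_yx * (u - x_bar)) for u, v in zip(xs, ys)]
    direct = math.sqrt(sum(e * e for e in residuals) / n)
    formula = math.sqrt(var_y) * math.sqrt(1 - r * r)
    return direct, formula


direct, formula = standard_error_y(x, y)
print("from residuals:", direct)
print("from formula:  ", formula)
```

The two values agree up to floating-point rounding. Many statistics packages divide the residual sum of squares by n - 2 rather than n, so their reported standard error is slightly larger than the value from this formula.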

Limitations of Regression Analysis:

• If one or two extreme values are included, the relationship between the variables may change completely.
• To know whether a linear or non-linear relationship exists, it is advisable to draw a scatter diagram.


Example: A certain city has gathered data on the number of minor traffic accidents and the number of youth soccer games that occur in the city over a weekend.

Soccer Games (X)       20  30  10  12  15  25  34
Minor Accidents (Y)     6   9   4   5   7   8   9

(a) Predict the number of minor traffic accidents that will occur on a weekend during which 33 soccer games take place in the city.
(b) Calculate the standard error of estimate.
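One possible way to work this example is sketched below in Python. It fits the line of regression of Y on X and then applies the lecture's formula $S_Y = \sigma_Y(1 - r^2)^{1/2}$, with all moments computed using a 1/n divisor. The variable names are illustrative, and the choice of the 1/n divisor is an assumption that follows the derivation in these slides.

```python
import math

games = [20, 30, 10, 12, 15, 25, 34]    # soccer games (X)
accidents = [6, 9, 4, 5, 7, 8, 9]       # minor accidents (Y)

n = len(games)
x_bar = sum(games) / n
y_bar = sum(accidents) / n
var_x = sum((u - x_bar) ** 2 for u in games) / n
var_y = sum((v - y_bar) ** 2 for v in accidents) / n
cov = sum((u - x_bar) * (v - y_bar) for u, v in zip(games, accidents)) / n

r = cov / math.sqrt(var_x * var_y)
b_yx = cov / var_x                       # regression coefficient of Y on X

# (a) Predicted accidents for a weekend with 33 soccer games.
predicted = y_bar + b_yx * (33 - x_bar)

# (b) Standard error of estimate of Y, using the 1/n formula.
s_y = math.sqrt(var_y) * math.sqrt(1 - r * r)

print("r =", round(r, 4))
print("predicted accidents at X = 33:", round(predicted, 2))
print("standard error of estimate:", round(s_y, 3))
```

With these data the sketch predicts roughly 9.3 accidents for 33 games, with a standard error of estimate of about 0.67.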



Thank you