ENGINEERING APPLICATIONS
Lines of regression
Let (xi, yi), i = 1,2,…,n represent the bivariate data where
Y is the dependent variable depending on the independent
variable X. Let the line of regression of Y on X be
\[
Y = a + bX
\]
According to the principle of least squares, the normal equations for estimating the
parameters a and b are
\[
\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \quad\Rightarrow\quad \bar{y} = a + b\bar{x}
\]
\[
\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \quad\Rightarrow\quad \frac{1}{n}\sum_{i=1}^{n} x_i y_i = a\bar{x} + b\,\frac{1}{n}\sum_{i=1}^{n} x_i^2
\]
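The normal equations above can be solved directly for a and b. A minimal sketch in Python, on made-up data (the points below are illustrative, not from the notes):

```python
# Solve the least-squares normal equations
#   sum(y)  = n*a + b*sum(x)
#   sum(xy) = a*sum(x) + b*sum(x^2)
# for the intercept a and slope b. Data are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 6.2, 8.1, 9.9]
n = len(xs)

Sx = sum(xs)
Sy = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Eliminating a between the two normal equations gives b, then a follows
# from ybar = a + b * xbar (the first normal equation divided by n).
b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
a = (Sy - b * Sx) / n

print(a, b)
```

Dividing the first normal equation by n shows that the fitted line always passes through the point (x̄, ȳ), which is used repeatedly below.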
We know that
\[
\operatorname{cov}(X,Y) = \mu_{11} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y} \quad\Rightarrow\quad \frac{1}{n}\sum_{i=1}^{n} x_i y_i = \mu_{11} + \bar{x}\bar{y}
\]
and
\[
\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \quad\Rightarrow\quad \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \sigma_X^2 + \bar{x}^2
\]
Substituting these in the second normal equation,
\[
\mu_{11} + \bar{x}\bar{y} = a\bar{x} + b\left(\sigma_X^2 + \bar{x}^2\right)
\]
Again, since $\bar{y} = a + b\bar{x}$, multiplying by $\bar{x}$ gives $\bar{x}\bar{y} = a\bar{x} + b\bar{x}^2$. Hence
\[
\mu_{11} + a\bar{x} + b\bar{x}^2 = a\bar{x} + b\left(\sigma_X^2 + \bar{x}^2\right) \quad\Rightarrow\quad \mu_{11} = b\,\sigma_X^2 \quad\Rightarrow\quad b = \frac{\mu_{11}}{\sigma_X^2}
\]
Since b is the slope of the regression line of y on x, and since the regression line passes
through the point $(\bar{x}, \bar{y})$, its equation is
\[
Y - \bar{y} = b\left(X - \bar{x}\right)
\]
Since
\[
r_{XY} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mu_{11}}{\sigma_X \sigma_Y},
\]
we have
\[
b = \frac{\mu_{11}}{\sigma_X^2} = r_{XY}\,\frac{\sigma_Y}{\sigma_X},
\]
so the line of regression of Y on X may be written
\[
Y - \bar{y} = \frac{\mu_{11}}{\sigma_X^2}\left(X - \bar{x}\right) = r_{XY}\,\frac{\sigma_Y}{\sigma_X}\left(X - \bar{x}\right)
\]
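The two expressions for the slope, $\mu_{11}/\sigma_X^2$ and $r_{XY}\,\sigma_Y/\sigma_X$, can be checked numerically. A small sketch on made-up data:

```python
import math

# Verify that mu_11 / sigma_X^2 equals r * sigma_Y / sigma_X.
# The bivariate data are made up for illustration.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [3.0, 7.0, 5.0, 9.0]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n

mu11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n  # cov(X, Y)
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)             # sigma_X
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / n)             # sigma_Y
r = mu11 / (sx * sy)

b1 = mu11 / sx ** 2   # slope as mu_11 / sigma_X^2
b2 = r * sy / sx      # slope as r * sigma_Y / sigma_X

print(b1, b2)         # the two expressions agree
```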
Property 1: In the case of perfect correlation, the two lines of regression coincide.
Proof: In the case of perfect correlation, the coefficient of correlation $r_{XY} = \pm 1$. So the equation of the line of regression of Y on X becomes
\[
Y - \bar{y} = \pm\frac{\sigma_Y}{\sigma_X}\left(X - \bar{x}\right) \quad\Rightarrow\quad \frac{Y - \bar{y}}{\sigma_Y} = \pm\frac{X - \bar{x}}{\sigma_X}
\]
The equation of the line of regression of X on Y becomes
\[
X - \bar{x} = \pm\frac{\sigma_X}{\sigma_Y}\left(Y - \bar{y}\right) \quad\Rightarrow\quad \frac{X - \bar{x}}{\sigma_X} = \pm\frac{Y - \bar{y}}{\sigma_Y}
\]
Hence, in the case of perfect correlation, the two lines of regression coincide.
Example: The following table gives the aptitude test scores and productivity indices of 10
workers selected at random. Obtain the regression equation of Y on X and the regression
equation of X on Y.
Solution:
With the change of origin $U = X - 65$, $V = Y - 65$, the summary statistics are
\[
\bar{U} = \frac{1}{n}\sum U = 0.2, \qquad \bar{V} = \frac{1}{n}\sum V = 0.56,
\]
\[
\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = 1.104, \qquad \sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = 1.5284,
\]
\[
\operatorname{cov}(U,V) = \frac{1}{n}\sum UV - \bar{U}\bar{V} = 0.434, \qquad r_{UV} = \frac{\operatorname{cov}(U,V)}{\sigma_U \sigma_V} = 0.3341.
\]
When the productivity index is 75, the aptitude test score predicted from the regression of X on Y is 67.8359.
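The correlation coefficient quoted in the solution can be reproduced from the summary values alone (cov = 0.434, variances 1.104 and 1.5284), without the raw table:

```python
import math

# Check r = cov(U, V) / (sigma_U * sigma_V) from the summary statistics
# given in the example.
cov_uv = 0.434
var_u = 1.104
var_v = 1.5284

r = cov_uv / math.sqrt(var_u * var_v)
print(round(r, 4))
```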
Department of Water Resources and Ocean Engineering, NITK Surathkal 8
REGRESSION CONTD….
Regression Coefficients
The line of regression of Y on X,
\[
Y - \bar{y} = b_{YX}\left(X - \bar{x}\right) \quad\Rightarrow\quad Y - \bar{y} = \frac{\mu_{11}}{\sigma_X^2}\left(X - \bar{x}\right) = r_{XY}\,\frac{\sigma_Y}{\sigma_X}\left(X - \bar{x}\right),
\]
defines the regression coefficient of Y on X:
\[
b_{YX} = \frac{\mu_{11}}{\sigma_X^2} = r_{XY}\,\frac{\sigma_Y}{\sigma_X}
\]
Property 1: Regression coefficients are independent of the change of origin but not of the change of scale.
Proof: Let $U = \dfrac{X-a}{h}$ and $V = \dfrac{Y-b}{k}$, so that $X = a + hU$ and $Y = b + kV$. Then
\[
E(X) = a + hE(U), \qquad E(Y) = b + kE(V),
\]
\[
\operatorname{cov}(X,Y) = hk\operatorname{cov}(U,V), \qquad \sigma_X^2 = h^2\sigma_U^2, \qquad \sigma_Y^2 = k^2\sigma_V^2
\]
Therefore
\[
b_{YX} = \frac{\operatorname{cov}(X,Y)}{\sigma_X^2} = \frac{hk\operatorname{cov}(U,V)}{h^2\sigma_U^2} = \frac{k}{h}\,b_{VU}
\]
Hence the regression coefficients are independent of the change of origin (the constants a and b cancel) but not of the change of scale (the factor $k/h$ remains).
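This invariance is easy to check numerically. A sketch on made-up data, shifting the origin and then rescaling:

```python
# Check that a change of origin leaves b_YX unchanged, while a change of
# scale U = X/h, V = Y/k multiplies it by h/k (i.e. b_VU = (h/k) * b_YX).
# Data are made up for illustration.
xs = [1.0, 3.0, 4.0, 6.0, 8.0]
ys = [2.0, 5.0, 7.0, 8.0, 12.0]

def b_yx(xs, ys):
    """Regression coefficient of the second variable on the first."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    varx = sum((x - xbar) ** 2 for x in xs) / n
    return cov / varx

b = b_yx(xs, ys)

# Change of origin only: X -> X - 10, Y -> Y + 5 (coefficient unchanged).
b_shift = b_yx([x - 10 for x in xs], [y + 5 for y in ys])

# Change of scale: U = X/h, V = Y/k with h = 2, k = 3.
h, k = 2.0, 3.0
b_scaled = b_yx([x / h for x in xs], [y / k for y in ys])

print(b, b_shift, b_scaled)
```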
Property 2: Correlation coefficient is the geometric mean between the regression coefficients
Proof: The regression coefficient of Y on X is $b_{YX} = r_{XY}\dfrac{\sigma_Y}{\sigma_X}$ and the regression coefficient of X on Y is $b_{XY} = r_{XY}\dfrac{\sigma_X}{\sigma_Y}$.
Multiplying the two regression coefficients, we get
\[
b_{YX}\,b_{XY} = r_{XY}^2 \quad\Rightarrow\quad r_{XY} = \pm\sqrt{b_{YX}\,b_{XY}}
\]
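The geometric-mean relation $r^2 = b_{YX}\,b_{XY}$ can be verified on any data set. A short sketch with made-up values:

```python
import math

# Verify r^2 = b_YX * b_XY on made-up bivariate data.
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [3.0, 4.0, 6.0, 9.0, 10.0]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
varx = sum((x - xbar) ** 2 for x in xs) / n
vary = sum((y - ybar) ** 2 for y in ys) / n

b_yx = cov / varx
b_xy = cov / vary
r = cov / math.sqrt(varx * vary)

print(r ** 2, b_yx * b_xy)   # the two values agree
```

Note that the sign of r must match the common sign of the two regression coefficients, which is why the square root carries a $\pm$.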
Property 3: If one of the regression coefficients is greater than unity, the other must be less
than unity
Proof: Let one of the regression coefficients be greater than unity, say the regression coefficient of Y on X:
\[
b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} > 1 \quad\Rightarrow\quad \frac{1}{b_{YX}} < 1
\]
Since the correlation coefficient always lies between $-1$ and $+1$, we have $r_{XY}^2 \le 1$. Hence
\[
b_{YX}\,b_{XY} = r_{XY}^2 \le 1 \quad\Rightarrow\quad b_{XY} \le \frac{1}{b_{YX}} < 1
\]
Thus the regression coefficient of X on Y is less than 1.
In correlation analysis, we cannot say that one variable is the cause and the other the
effect. In regression analysis, one variable is taken as dependent and the other as
independent, making it possible to study a cause-and-effect relationship.
Example: Obtain the coefficient of correlation if the two lines of regression are given by
$2X = 8 - 3Y$ and $2Y = 5 - X$.
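Taking the two lines as $2X = 8 - 3Y$ and $2Y = 5 - X$, the standard approach is to try each assignment of "X on Y" and "Y on X"; only the assignment with $b_{YX}\,b_{XY} = r^2 \le 1$ is admissible. A sketch of that check:

```python
import math

# Assignment: 2X = 8 - 3Y is the X-on-Y line, 2Y = 5 - X is the Y-on-X line.
b_xy = -3.0 / 2.0    # dX/dY from 2X = 8 - 3Y
b_yx = -1.0 / 2.0    # dY/dX from 2Y = 5 - X
prod = b_yx * b_xy   # 0.75 <= 1, so this assignment is admissible
# (The reverse assignment gives b_yx = -2/3, b_xy = -2, product 4/3 > 1.)

# r takes the common sign of the two regression coefficients (negative here).
r = -math.sqrt(prod)
print(round(r, 4))
```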
Example: Given below is the information regarding the advertisement expenditure and sales
in crores of rupees:
(i) Calculate the two regression lines.
(ii) Find the likely sales when advertisement expenditure is 25 crores.
(iii) If the company wants to attain sales target of 150 crores, what would be the
advertisement budget?
If $m_1 = r\dfrac{\sigma_Y}{\sigma_X}$ and $m_2 = \dfrac{1}{r}\dfrac{\sigma_Y}{\sigma_X}$ are the slopes of the two regression lines, the angle $\theta$ between them satisfies
\[
\tan\theta = \left|\frac{m_1 - m_2}{1 + m_1 m_2}\right| = \left|\frac{r^2 - 1}{r}\right|\frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2} = \frac{1 - r^2}{|r|}\cdot\frac{\sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2}
\]
Since $r^2 \le 1$ and $\sigma_X \sigma_Y \le \dfrac{\sigma_X^2 + \sigma_Y^2}{2}$,
\[
\tan\theta \le \frac{1 - r^2}{2\left|r\right|}
\]
Case 1: If r = 0, then tan θ = ∞, so θ = π/2. Thus if the two variables are uncorrelated,
the lines of regression become perpendicular to each other.
Case 2: If r = ±1, then tan θ = 0, so θ = 0 or π. Thus in the case of perfect correlation,
the two lines of regression either coincide or are parallel to each other. However, since both
lines of regression pass through the point $(\bar{x}, \bar{y})$, they cannot be parallel. Hence, in the case of
perfect correlation, the two lines of regression coincide.
Case 3: When the two regression lines intersect, the angle between them is either acute
$\left(0 < \theta < \dfrac{\pi}{2}\right)$ or obtuse $\left(\dfrac{\pi}{2} < \theta < \pi\right)$.
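The angle formula can be cross-checked by computing the slopes directly. A sketch with illustrative values of r, σ_X, σ_Y:

```python
import math

# Compare tan(theta) from the two slopes against the closed-form
# expression (1 - r^2)/|r| * sigma_X*sigma_Y / (sigma_X^2 + sigma_Y^2).
# The values of r, sx, sy are made up for illustration.
r, sx, sy = 0.6, 2.0, 3.0

m1 = r * sy / sx           # slope of the Y-on-X line
m2 = sy / (r * sx)         # slope of the X-on-Y line, solved for Y
tan_theta = abs((m1 - m2) / (1 + m1 * m2))

formula = (1 - r ** 2) / abs(r) * (sx * sy) / (sx ** 2 + sy ** 2)
print(tan_theta, formula)  # the two values agree
```

The bound $\tan\theta \le (1 - r^2)/(2|r|)$ holds here as well, with equality only when $\sigma_X = \sigma_Y$.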
The standard error of estimate measures the variability or scatter of the observed
values around the regression line.
Consider the line of regression of Y on X:
\[
Y - \bar{y} = r\frac{\sigma_Y}{\sigma_X}\left(X - \bar{x}\right) \quad\Rightarrow\quad \frac{Y - \bar{y}}{\sigma_Y} = r\,\frac{X - \bar{x}}{\sigma_X}
\]
The square of the standard error of estimate, also called the residual variance, is the expected value of the squares of the deviations of the observed values of Y from the estimated values:
\[
S_Y^2 = E\left[\left(Y - \bar{y} - r\frac{\sigma_Y}{\sigma_X}\left(X - \bar{x}\right)\right)^2\right] = \sigma_Y^2\,E\left[\left(Y^* - r\,X^*\right)^2\right],
\]
where $X^* = \dfrac{X - \bar{x}}{\sigma_X}$ and $Y^* = \dfrac{Y - \bar{y}}{\sigma_Y}$. Since
\[
E\left(X^{*2}\right) = \frac{E\left[(X - \bar{x})^2\right]}{\sigma_X^2} = 1, \qquad E\left(Y^{*2}\right) = 1, \qquad E\left(X^*Y^*\right) = \frac{E\left[(X - \bar{x})(Y - \bar{y})\right]}{\sigma_X \sigma_Y} = r_{XY},
\]
we get
\[
S_Y^2 = \sigma_Y^2\left(1 - 2r^2 + r^2\right) = \sigma_Y^2\left(1 - r^2\right) \quad\Rightarrow\quad S_Y = \sigma_Y\left(1 - r^2\right)^{1/2}
\]
Similarly, $S_X = \sigma_X\left(1 - r^2\right)^{1/2}$.
Note: If r = ±1, then $S_X = S_Y = 0$, so that each deviation is zero and the two lines of
regression are coincident.
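The identity $S_Y = \sigma_Y\sqrt{1 - r^2}$ can be confirmed by computing the residuals about the regression line directly. A sketch on made-up data:

```python
import math

# Check that the RMS residual about the Y-on-X regression line equals
# sigma_Y * sqrt(1 - r^2). Data are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0, 8.0]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / n)
r = cov / (sx * sy)

# Residuals about the regression line Y - ybar = r*(sy/sx)*(X - xbar)
b = r * sy / sx
resid = [y - (ybar + b * (x - xbar)) for x, y in zip(xs, ys)]
s_y = math.sqrt(sum(e ** 2 for e in resid) / n)

print(s_y, sy * math.sqrt(1 - r ** 2))   # the two values agree
```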
Example: A certain city has gathered data on the number of minor traffic accidents and the
number of youth soccer games that occur over a weekend.
(a) Predict the number of minor traffic accidents that will occur on a weekend during which
33 soccer games take place in the city.
(b) Calculate the standard error of estimate.