CHAPTER 9

Linear Regression Analysis

9-1. INTRODUCTION

The literal or dictionary meaning of the word 'regression' is 'stepping back or returning to the average value'. The term was first used by the British biometrician Sir Francis Galton in the later part of the 19th century, in connection with some studies he made on estimating the extent to which the stature of the sons of tall parents reverts or regresses back to the mean stature of the population. He studied the relationship between the heights of about one thousand fathers and sons and published the results in a paper, 'Regression towards Mediocrity in Hereditary Stature'. The interesting features of his study were:

(i) The tall fathers have tall sons and the short fathers have short sons.

(ii) The average height of the sons of a group of tall fathers is less than that of the fathers, and the average height of the sons of a group of short fathers is more than that of the fathers.

In other words, Galton's studies revealed that the offspring of abnormally tall or short parents tend to revert or step back to the average height of the population, a phenomenon which Galton described as Regression to Mediocrity.

He concluded that if the average height of a certain group of fathers is 'a' cms above (below) the general average height, then the average height of their sons will be (a × r) cms above (below) the general average height, where r is the correlation coefficient between the heights of the given group of fathers and their sons. In this case the correlation is positive, and since |r| ≤ 1, we have a × r ≤ a. This supports the result in (ii) above.

But today the word regression as used in Statistics has a much wider perspective, without any reference to biometry. Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known value of the other variable. It is one of the very important statistical tools, extensively used in almost all sciences: natural, social and physical. It is specially used in business and economics to study the relationship between two or more variables that are related causally, and for estimation of demand and supply curves, cost functions, production and consumption functions, etc.

Prediction or estimation is one of the major problems in almost all spheres of human activity. The estimation or prediction of future production, consumption, prices, investments, sales, profits, income, etc., is of paramount importance to a businessman or economist. Population estimates and population projections are indispensable for efficient planning of an economy. Pharmaceutical concerns are interested in studying or estimating the effect of new drugs on patients. Regression analysis is one of the very scientific techniques for making such predictions. In the words of M. M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data".

We come across a number of inter-related events in our day-to-day life. For instance, the yield of a crop depends on the rainfall, the cost or price of a product depends on the production and advertising expenditure, the demand for a particular product depends on its price, the expenditure of a person depends on his income, and so on. The regression analysis confined to the study of only two variables at a time is termed as simple regression. But quite often the values of a particular phenomenon may be affected by a multiplicity of factors. The regression analysis for studying more than two variables at a time is known as multiple regression. However, in this chapter we shall confine ourselves to simple regression only.

In regression analysis there are two types of variables. The variable whose value is influenced or is to be predicted is called the dependent variable, and the variable which influences the values or is used for prediction is called the independent variable. In regression analysis the independent variable is also known as the regressor or predictor or explanator, while the dependent variable is also known as the regressed or explained variable.

9-2. LINEAR AND NON-LINEAR REGRESSION

If the given bivariate data are plotted on a graph, the points so obtained on the scatter diagram will more or less concentrate round a curve, called the 'curve of regression'. Often such a curve is not distinct and is quite confusing and sometimes complicated too. The mathematical equation of the regression curve, usually called the regression equation, enables us to study the average change in the value of the dependent variable for any given value of the independent variable.

If the regression curve is a straight line, we say that there is linear regression between the variables under study. The equation of such a curve is the equation of a straight line, i.e., a first degree equation in the variables x and y. In case of linear regression the values of the dependent variable increase by a constant absolute amount for a unit change in the value of the independent variable. However, if the curve of regression is not a straight line, the regression is termed as curved or non-linear regression. The regression equation will then be a functional relation between x and y involving terms in x and y of degree higher than one, i.e., involving terms of the type x^2, y^2, xy, etc.

However, in this chapter we shall confine our discussion to linear regression between two variables only.

9-3. LINES OF REGRESSION

Line of regression is the line which gives the best estimate of one variable for any given value of the other variable. In case of two variables x and y, we shall have two lines of regression: one of y on x, and the other of x on y.

Definition. Line of regression of y on x is the line which gives the best estimate for the value of y for any specified value of x.

Similarly, line of regression of x on y is the line which gives the best estimate for the value of x for any specified value of y.

The term best fit is interpreted in accordance with the Principle of Least Squares, which consists in minimising the sum of the squares of the residuals or the errors of estimates, i.e., the deviations between the given observed values of the variable and their corresponding estimated values as given by the line of best fit. We may minimise the sum of the squares of the errors parallel to the y-axis or parallel to the x-axis. The former (i.e., minimising the sum of squares of errors parallel to the y-axis) gives the equation of the line of regression of y on x, and the latter (viz., minimising the sum of squares of the errors parallel to the x-axis) gives the equation of the line of regression of x on y.

We shall explain below the technique of deriving the equation of the line of regression of y on x.

9-3-1. Derivation of Line of Regression of y on x. Let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be n pairs of observations on the two variables x and y under study. Let

y = a + bx    ...(9-1)

be the line of regression (best fit) of y on x.

For any given point P_i(x_i, y_i) in the scatter diagram, the error of estimate or residual as given by the line of best fit (9-1) is P_i H_i. Now, the x-coordinate of H_i is the same as that of P_i, viz., x_i, and since H_i(x_i, a + bx_i) lies on the line (9-1), the y-coordinate of H_i, i.e., H_i M, is given by (a + bx_i). Hence, the error of estimate for P_i is given by

P_i H_i = P_i M - H_i M = y_i - (a + bx_i)    ...(9-2)

[Fig. 9-1: scatter diagram showing the point P_i(x_i, y_i), its foot H_i(x_i, a + bx_i) on the line y = a + bx, and the point M vertically below on the x-axis.]
This is the error (parallel to the y-axis) for the i-th point in the scatter diagram. We will have such errors for all the n points of the scatter diagram. For the points which lie above the line the error would be positive, and for the points which lie below the line the error would be negative.

According to the principle of least squares, we have to determine the constants a and b in (9-1) such that the sum of the squares of the errors of estimates is minimum. In other words, we have to minimise

E = \sum_{i=1}^{n} (P_i H_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2    ...(9-3)


subject to variations in a and b.

We may also write E as:

E = \sum (y - \hat{y})^2 = \sum (y - a - bx)^2,    ...(9-3a)

where \hat{y} is the estimated value of y as given by (9-1) for a given value of x, and the summation (\sum) is taken over the n pairs of observations.

Using the principle of maxima and minima in differential calculus, E will have an extremum (maximum or minimum) for variations in a and b if its partial derivatives w.r.t. a and b vanish separately. Hence from (9-3a), we get

\frac{\partial E}{\partial a} = 0  and  \frac{\partial E}{\partial b} = 0,    ...(9-4)

i.e., \sum 2(y - a - bx)(-1) = 0  and  \sum 2(y - a - bx)(-x) = 0

On cancelling the factor (-2), we get

\sum (y - a - bx) = 0  and  \sum x(y - a - bx) = 0    ...(9-4a)

i.e., \sum y = na + b\sum x    ...(9-5)

and \sum xy = a\sum x + b\sum x^2    ...(9-6)
These equations are known as the normal equations for estimating a and b. The quantities \sum x, \sum y, \sum x^2, \sum xy can be obtained from the given set of n points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), and we can solve the equations (9-5) and (9-6) simultaneously for a and b, i.e.,

a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}    ...(9-7)

and b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}    ...(9-8)

Substituting these values of a and b from (9-7) and (9-8) in (9-1), we get the required equation of the line of regression of y on x.
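As an illustration (the code below is our own sketch, not part of the original text), the normal equations (9-5) and (9-6) can be solved directly in a few lines; the function and variable names are our own choices:

    def fit_y_on_x(x, y):
        """Least squares line y = a + b*x from the normal equations (9-5) and (9-6)."""
        n = len(x)
        sum_x, sum_y = sum(x), sum(y)
        sum_xx = sum(xi * xi for xi in x)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        denom = n * sum_xx - sum_x ** 2
        a = (sum_y * sum_xx - sum_x * sum_xy) / denom   # intercept, eq. (9-7)
        b = (n * sum_xy - sum_x * sum_y) / denom        # slope, eq. (9-8)
        return a, b

    # e.g. fit_y_on_x([1, 2, 3, 4], [2, 4, 5, 8]) returns (0.0, 1.9)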
The equation of the line of regression of y on x can be obtained in a much more systematic and simplified form in terms of \bar{x}, \bar{y}, \sigma_x, \sigma_y and r = r_{xy}, as explained below.

Dividing both sides of (9-5) by n, the total number of pairs, we get

\frac{\sum y}{n} = a + b \cdot \frac{\sum x}{n},  i.e.,  \bar{y} = a + b\bar{x}    ...(9-9)

This implies that the line of best fit, i.e., the line of regression of y on x, passes through the point (\bar{x}, \bar{y}). In other words, the point (\bar{x}, \bar{y}) lies on the line of regression of y on x.

We have:

\sum xy = n[\mathrm{Cov}(x, y) + \bar{x}\bar{y}]  and  \sum x^2 = n(\sigma_x^2 + \bar{x}^2)

Substituting these values in (9-6) and cancelling out n throughout, we get

\mathrm{Cov}(x, y) + \bar{x}\bar{y} = a\bar{x} + b(\sigma_x^2 + \bar{x}^2)    ...(9-10)

Now we have to solve (9-9) and (9-10) for a and b. Multiplying (9-9) by \bar{x}, we get

\bar{x}\bar{y} = a\bar{x} + b\bar{x}^2    ...(9-11)

Subtracting (9-11) from (9-10), we get

\mathrm{Cov}(x, y) = b\sigma_x^2,  i.e.,  b = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}    ...(9-12)

We find that the equation (9-1) is in the slope-intercept form, viz., y = mx + c. Hence b represents the slope of the line of regression of y on x. Further, we have proved in (9-9) that this line (i.e., the line of regression of y on x) passes through the point (\bar{x}, \bar{y}). Hence, using the slope-point form of the equation of a line, the required equation of the line of regression of y on x becomes

y - \bar{y} = b(x - \bar{x})    ...(9-13)

or y - \bar{y} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}(x - \bar{x})    ...(9-14)

But \mathrm{Cov}(x, y) = r\sigma_x\sigma_y. Substituting in (9-14), we may also write the equation of the line of regression of y on x as:

y - \bar{y} = \frac{r\sigma_x\sigma_y}{\sigma_x^2}(x - \bar{x}),  i.e.,  y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})    ...(9-15)
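To see numerically that the slope in (9-15) agrees with the least squares slope (9-8), one may compute it from r and the standard deviations; a minimal sketch of ours (statistics.correlation needs Python 3.10 or later):

    import statistics as st

    def slope_y_on_x(x, y):
        """Slope b of the line of regression of y on x, via (9-15): b = r * sigma_y / sigma_x."""
        r = st.correlation(x, y)
        return r * st.pstdev(y) / st.pstdev(x)   # population standard deviations

    # For any data set, slope_y_on_x(x, y) agrees with the b returned by the
    # normal-equation solution (9-8), and the line passes through (mean(x), mean(y)).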

Remarks 1. Partial Differentiation. If f = f(x, y, z), say, i.e., f is a function of three variables x, y and z, then the partial derivative \frac{\partial f}{\partial x} is obtained on differentiating f w.r.t. x, treating the other two variables, viz., y and z, as constants. Similarly, \frac{\partial f}{\partial y} is obtained on differentiating f w.r.t. y, regarding x and z as constants.

2. The condition \frac{\partial E}{\partial a} = 0 and \frac{\partial E}{\partial b} = 0 is only a requirement for an extremum (maximum or minimum) of E.

The necessary and sufficient conditions for a minimum of E for variations in a and b are:

(i) \frac{\partial E}{\partial a} = 0,  \frac{\partial E}{\partial b} = 0    ...(*)

and (ii) \frac{\partial^2 E}{\partial a^2} > 0  and  \frac{\partial^2 E}{\partial a^2} \cdot \frac{\partial^2 E}{\partial b^2} - \left(\frac{\partial^2 E}{\partial a\,\partial b}\right)^2 > 0    ...(**)

Theorem. The solution of the least square equations (9-5) and (9-6) provides a minimum of E defined in (9-3).

Proof. The normal equations (9-5) and (9-6) already satisfy the equations in (*).

We have:

\frac{\partial^2 E}{\partial a^2} = \sum 2(-1)(-1) = 2n > 0

\frac{\partial^2 E}{\partial b^2} = \sum 2(-x)(-x) = 2\sum x^2

\frac{\partial^2 E}{\partial a\,\partial b} = \frac{\partial^2 E}{\partial b\,\partial a} = \sum 2(-1)(-x) = 2\sum x

and hence

\frac{\partial^2 E}{\partial a^2} \cdot \frac{\partial^2 E}{\partial b^2} - \left(\frac{\partial^2 E}{\partial a\,\partial b}\right)^2 = 4n\sum x^2 - 4(\sum x)^2 = 4n^2 \mathrm{Var}(x) > 0

Hence, the solution of the least square equations (normal equations) satisfies (*) and (**), and therefore provides a minimum of E.
3. From (9-5), we have:

\sum (y - \hat{y}) = 0,    ...(9-16)

where \hat{y} = a + bx is the estimated value of y for a given value of x, as given by the line of regression of y on x (9-1).

The line of regression of y on x passes through the point (\bar{x}, \bar{y}).

4. Fitting of linear and non-linear regression (trends) for determining the trend values is discussed in detail in Chapter 11 on 'Time Series Analysis'.
9-3-2. Line of Regression of x on y. The line of regression of x on y is the line which gives the best estimate of x for any given value of y. It is also obtained by the principle of least squares, on minimising the sum of squares of the errors parallel to the x-axis (see Fig. 9-2). Starting with the equation of the form:

x = A + By,    ...(9-17)

and minimising the sum of the squares of errors of estimates of x, i.e., the deviations between the given values of x and their estimates given by the line of regression of x on y, viz., (9-17), i.e., minimising

E = \sum (x - A - By)^2,    ...(9-18)

we shall get the normal equations for estimating A and B as:

\sum x = nA + B\sum y  and  \sum xy = A\sum y + B\sum y^2    ...(9-19)

Solving (9-19) simultaneously for A and B, we shall get:

A = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n\sum y^2 - (\sum y)^2}    ...(9-20)

and B = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}    ...(9-21)

Substituting these values of A and B in (9-17), we get the required equation of the line of regression of x on y.

[Fig. 9-2: scatter diagram with the errors of estimate measured parallel to the x-axis from the line x = A + By.]
Remark. The values of A and B obtained in (9-20) and (9-21) are the same as in equations (9-7) and (9-8), with x changed to y and y to x.

Proceeding exactly as in the case of the line of regression of y on x, we shall get from (9-19) the following results:

(i) \bar{x} = A + B\bar{y}    ...(9-22)

This implies that the line of regression of x on y passes through the point (\bar{x}, \bar{y}).

(ii) B = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y}    ...(9-23)

(iii) The equation of the line of regression of x on y is

x - \bar{x} = B(y - \bar{y})    ...(9-24)

i.e., x - \bar{x} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}(y - \bar{y})    ...(9-25)

or x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})    ...(9-26)

The derivation of these results is left as an exercise to the reader.
Remarks 1. The regression equation (9-15) implies that the line of regression of y on x passes through the point (\bar{x}, \bar{y}). Similarly, (9-26) implies that the line of regression of x on y also passes through the point (\bar{x}, \bar{y}). In other words, both the lines of regression pass through the point of means (\bar{x}, \bar{y}). Hence (\bar{x}, \bar{y}) can be obtained as the point of intersection of the two regression lines.

2. Why two lines of regression? There are always two lines of regression: one of y on x, and the other of x on y. The line of regression of y on x, (9-14) or (9-15), is used to estimate or predict the value of y for any given value of x, i.e., when y is a dependent variable and x is an independent variable. The estimate so obtained will be best in the sense that it will have the minimum possible error as defined by the principle of least squares. We can also obtain an estimate of x for any given value of y by using equation (9-15), but the estimate so obtained will not be best, since (9-15) is obtained on minimising the sum of the squares of errors of estimates in y and not in x. Hence, to estimate or predict x for any given value of y, we use the regression equation of x on y (9-26), which is derived on minimising the sum of the squares of errors of estimates in x. Here x is a dependent variable and y is an independent variable. The two regression equations are not reversible or interchangeable, for the simple reason that the basis and assumptions for deriving these equations are quite different. The regression equation of y on x is obtained on minimising the sum of the squares of the errors parallel to the y-axis, while the regression equation of x on y is obtained on minimising the sum of squares of the errors parallel to the x-axis.
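A small numerical companion to these remarks (our own sketch, assuming Python 3.10+ for statistics.covariance): both fitted lines pass through the point of means, which is therefore their point of intersection.

    import statistics as st

    def regression_lines(x, y):
        """Return ((xbar, ybar), b_yx, b_xy) for the two lines of regression."""
        xbar, ybar = st.fmean(x), st.fmean(y)
        cov = st.covariance(x, y)            # sample covariance; the (n-1) factors cancel below
        b_yx = cov / st.variance(x)          # y on x: y - ybar = b_yx * (x - xbar), cf. (9-12)
        b_xy = cov / st.variance(y)          # x on y: x - xbar = b_xy * (y - ybar), cf. (9-23)
        return (xbar, ybar), b_yx, b_xy

    # At x = xbar the first line gives y = ybar, and at y = ybar the second gives
    # x = xbar, so both lines pass through (and hence intersect at) (xbar, ybar).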


In the particular case of perfect correlation, positive or negative, i.e., r = ±1, the equation of the line of regression of y on x becomes:

y - \bar{y} = \pm\frac{\sigma_y}{\sigma_x}(x - \bar{x}),  i.e.,  \frac{y - \bar{y}}{\sigma_y} = \pm\left(\frac{x - \bar{x}}{\sigma_x}\right)    ...(*)

Similarly, the equation of the line of regression of x on y becomes:

x - \bar{x} = \pm\frac{\sigma_x}{\sigma_y}(y - \bar{y}),  i.e.,  \frac{y - \bar{y}}{\sigma_y} = \pm\left(\frac{x - \bar{x}}{\sigma_x}\right)    ...(**)

which is the same as (*).

Hence in case of perfect correlation (r = ±1), both the lines of regression coincide. Therefore, in general, we always have two lines of regression, except in the particular case of perfect correlation (r = ±1), when both the lines coincide and we get only one line.
9-3-3. Angle Between the Regression Lines. The equations of the lines of regression of y on x and of x on y are respectively:

y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})    ...(i)  and  y - \bar{y} = \frac{\sigma_y}{r\sigma_x}(x - \bar{x})    ...(ii)

Thus the slopes of lines (i) and (ii), which are in the slope-point form, are respectively:

m_1 = \frac{r\sigma_y}{\sigma_x}  and  m_2 = \frac{\sigma_y}{r\sigma_x}

If \theta is the angle between the two lines of regression, then

\tan\theta = \frac{m_2 - m_1}{1 + m_1 m_2} = \left(\frac{1 - r^2}{r}\right) \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}    ...(9-27)

i.e., \theta = \tan^{-1}\left[\left(\frac{1 - r^2}{r}\right) \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right]    ...(9-28)
In particular, if r = ±1, then \theta = \tan^{-1}(0) = 0 or \pi, i.e., the two lines are either coincident (\theta = 0) or they are parallel (\theta = \pi). But since both the lines of regression intersect at the point (\bar{x}, \bar{y}), they cannot be parallel. Hence in case of perfect correlation, positive or negative, the two lines of regression coincide.

If r = 0, then from (9-28), \theta = \tan^{-1}(\infty) = \pi/2, i.e., if the variables are uncorrelated, the two lines of regression become perpendicular to each other.
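The angle formula is easy to evaluate; the sketch below (our own code, with our own function name) returns the acute angle of (9-29):

    import math

    def acute_angle_between_lines(r, sigma_x, sigma_y):
        """Acute angle (radians) between the two regression lines, cf. (9-29)."""
        if r == 0:
            return math.pi / 2   # uncorrelated variables: perpendicular lines
        t = ((1 - r * r) / abs(r)) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)
        return math.atan(t)

    # r = 1 or r = -1 gives atan(0) = 0: the two lines coincide.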

Remarks 1. Whenever two lines intersect, there are two angles between them: one acute angle and the other an obtuse angle. Further, \tan\theta > 0 if 0 < \theta < \pi/2, i.e., \theta is an acute angle, and \tan\theta < 0 if \pi/2 < \theta < \pi, i.e., \theta is an obtuse angle. Since 0 < r^2 < 1, the acute angle (\theta_1) and the obtuse angle (\theta_2) between the two lines of regression are given by

\theta_1 = Acute angle = \tan^{-1}\left[\frac{1 - r^2}{|r|} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right]    ...(9-29)

\theta_2 = Obtuse angle = \pi - \theta_1 = \tan^{-1}\left[-\frac{1 - r^2}{|r|} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right]    ...(9-30)

2. When r = 0, i.e., when x and y are uncorrelated, the lines of regression of y on x and of x on y are given respectively by [from (9-15) and (9-26)]:

y - \bar{y} = 0, i.e., y = \bar{y},  and  x - \bar{x} = 0, i.e., x = \bar{x}

Here y = \bar{y} represents a line parallel to the x-axis at a distance of \bar{y} units from the origin, and x = \bar{x} represents a line parallel to the y-axis at a distance of \bar{x} units from the origin.

Hence, if r = 0, the two lines of regression are perpendicular to each other and are parallel to the x-axis and y-axis respectively, as shown in Fig. 9-2(a).

3. We have seen above that if r = 0 (variables uncorrelated), the two lines of regression are perpendicular to each other, and if r = ±1, \theta = 0, i.e., the two lines coincide. This leads us to the conclusion that for a higher degree of correlation between the variables, the angle between the lines is smaller, i.e., the two lines of regression are nearer to each other. On the other hand, the angle between the lines increases, i.e., the lines of regression move apart, as the value of the correlation coefficient decreases. In other words, if the lines of regression make a larger angle, they indicate a poor degree of correlation between the variables, and ultimately, for \theta = \pi/2, i.e., the lines becoming perpendicular, no correlation exists between the variables. Thus, by plotting the lines of regression on a graph paper, we can have an approximate idea about the degree of correlation between the two variables under study. Some illustrations are given below in Fig. 9-3(a) to Fig. 9-3(e).

[Fig. 9-3(a)-(e): two lines coincide (r = -1); two lines coincide (r = +1); two lines perpendicular (r = 0); two lines apart (low degree of correlation); two lines closer (high degree of correlation).]
9-3-4. Using Regression Lines for Prediction

The equation of the regression line is commonly used to predict the value of the dependent variable Y for a given value of the independent variable X. For example, the predicted value of Y, written as \hat{Y}_i, when X = x_i, is given by:

\hat{Y}_i = a + b x_i,

where a and b are the least squares estimates given by the normal equations (9-7) and (9-8), and are obtained from the sample data.

However, the regression equations should be used for prediction and estimation with utmost care and caution. It makes sense to use the regression equation for estimation only if it fits the data well; hence, technically, we should test for 'goodness of fit'. The following points should be kept in mind while using the lines of regression for prediction.

1. Test the significance of the observed sample correlation coefficient r = r(X, Y), as discussed in § 8-5 and § 19-8. If the value of r is significant, we can use the lines of regression for estimation and prediction.

2. If r is not significant, then the linear model is not a good fit, and hence the line of regression should not be used for prediction. In this case (r not significant), the best predicted value of Y for any given X is the mean value of Y.

The above results can be summarised as given below:

(a) If r is significant, then the best predicted value of Y, denoted by \hat{Y}, for any given X is obtained on putting the value of X in the regression equation of Y on X. Thus:

\hat{Y} = b_0 + b_1 X

(b) If r is not significant, then the best predicted value of Y for any given X is \hat{Y} = \bar{Y}.

3. If the linear regression is a good fit to the given data (i.e., if r is significant), it is worthwhile to use the line of regression to estimate Y for any given X. However, in doing so, we should not go beyond the range of the available sample data on the independent variable X.

For example, suppose the line of regression of Y (blood pressure) on X (age) of women, say \hat{Y} = b_0 + b_1 X, is a good fit to the given data. It will be futile (rather absurd) to use it to estimate the blood pressure of a woman whose age is 150 years.

As another example, suppose the correlation coefficient between X (the dose of a drug in milligrams, ranging from 3 mgm to 10 mgm) and Y (the number of days of relief from allergy) on a group of n patients is significant. It will be nonsensical to use the line of regression of Y on X to estimate the value of Y for a given dose of, say, 50 milligrams. In fact, such a high dose may even result in the death of the patient.

4. We should not use the linear regression model to make predictions for Y corresponding to far distant values of X. At a distant value of X there may be a drastic change in the pattern of the relationship between the variables, which is not exhibited in the currently available data. Hence, the predicted values of Y for far distant values of X may be extremely unreliable.

5. It is worthwhile to make predictions from the linear regression model only for the population from which the sample data are drawn. It should not be used for a different population. For example, the line of regression fitted between the blood pressure (Y) and age (X) of a group of women cannot be used to predict the blood pressure of a man of given age (X).
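Points 3 and 4 amount to a simple rule: never evaluate the fitted line outside the observed range of X. A sketch of such a guard (our own code; a and b would come from a least squares fit such as the earlier fit_y_on_x sketch):

    def predict_in_range(x_new, a, b, x_sample):
        """Predict Y = a + b*X, refusing to extrapolate beyond the sampled X range."""
        lo, hi = min(x_sample), max(x_sample)
        if not lo <= x_new <= hi:
            raise ValueError(
                f"X = {x_new} is outside the observed range [{lo}, {hi}]; "
                "the regression line should not be trusted there."
            )
        return a + b * x_new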

9-4. COEFFICIENTS OF REGRESSION

Let us consider the line of regression of y on x, viz.,

y = a + bx

The coefficient b, which is the slope of the line of regression of y on x, is called the coefficient of regression of y on x. It represents the increment in the value of the dependent variable y for a unit change in the value of the independent variable x. In other words, it represents the rate of change of y w.r.t. x. For notational convenience, the slope b, i.e., the coefficient of regression of y on x, is written as b_{yx}.

Similarly, in the regression equation of x on y, viz.,

x = A + By,

B represents the change in the value of the dependent variable x for a unit change in the value of the independent variable y, and is called the coefficient of regression of x on y. For notational convenience, it is denoted by b_{xy}.

Notations:

b_{yx} = Coefficient of regression of y on x
b_{xy} = Coefficient of regression of x on y

From (9-12), the coefficient of regression of y on x is given by

b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x}    [since \mathrm{Cov}(x, y) = r\sigma_x\sigma_y]    ...(9-31)

Similarly, from (9-23), the coefficient of regression of x on y is given by

b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y}    ...(9-32)

Accordingly, the equation of the line of regression of y on x becomes

y - \bar{y} = b_{yx}(x - \bar{x}),    ...(9-33)

and the equation of the line of regression of x on y becomes

x - \bar{x} = b_{xy}(y - \bar{y})    ...(9-34)

Remarks 1. For numerical computation of the equations of the lines of regression of y on x and of x on y, the following formulae for the regression coefficients b_{yx} and b_{xy} are very convenient to use.

We have, by definition [see (8-4) and (8-4a)]:

\mathrm{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n}\left[\sum xy - \frac{(\sum x)(\sum y)}{n}\right]    ...(9-35)

\sigma_x^2 = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right];  \sigma_y^2 = \frac{1}{n}\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]    ...(9-35a)

Hence,

b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}    ...(9-36)

This formula for b_{yx} = b was also obtained in equation (9-8), on solving the normal equations (9-5) and (9-6) simultaneously.

Similarly, b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}    ...(9-37)

The formulae (9-36) and (9-37) are very useful for computing the values of the regression coefficients from a given set of n points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).

Other convenient formulae to be used for finding the regression coefficients in numerical problems are:

b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}  and  b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2},  where d_x = x - \bar{x}, d_y = y - \bar{y}    ...(9-38)
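The computational formulae (9-36) and (9-37) translate directly into code; a minimal sketch (our own, not the book's):

    def regression_coefficients(x, y):
        """b_yx and b_xy from the raw-sum formulae (9-36) and (9-37)."""
        n = len(x)
        sx, sy = sum(x), sum(y)
        sxx = sum(xi * xi for xi in x)
        syy = sum(yi * yi for yi in y)
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        num = n * sxy - sx * sy
        b_yx = num / (n * sxx - sx ** 2)   # eq. (9-36)
        b_xy = num / (n * syy - sy ** 2)   # eq. (9-37)
        return b_yx, b_xy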
2. The correlation coefficient between the two variables x and y is a symmetrical function of x and y, i.e., r_{xy} = r_{yx}. However, the regression coefficients are not symmetric functions of x and y, i.e., b_{yx} ≠ b_{xy}.

3. We have:

r = \frac{\mathrm{Cov}(x, y)}{\sigma_x\sigma_y}    ...(*),  b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}    ...(**),  b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}    ...(***)

From (**) and (***), we observe that the sign of each regression coefficient, b_{yx} and b_{xy}, depends on the covariance term, since \sigma_x > 0 and \sigma_y > 0. If \mathrm{Cov}(x, y) is positive, both the regression coefficients are positive, and if \mathrm{Cov}(x, y) is negative, both the regression coefficients are negative.

The result that both the regression coefficients have the same sign can also be obtained indirectly. If they had opposite signs, then r^2 = b_{yx} \cdot b_{xy} (c.f. Theorem 9-1) would become negative. This implies that r would be imaginary, which is a contradiction to the fact that r is a real quantity lying between ±1. Hence b_{yx} and b_{xy} must have the same sign.

4. Further, since \sigma_x > 0 and \sigma_y > 0, the sign of each of r, b_{yx} and b_{xy} depends on the covariance term. If \mathrm{Cov}(x, y) is positive, all the three are positive, and if \mathrm{Cov}(x, y) is negative, all the three are negative. This result can be stated slightly differently as follows:

The sign of the correlation coefficient is the same as that of the regression coefficients. If the regression coefficients are positive, r is positive, and if the regression coefficients are negative, r is negative.

9-4-1. Theorems on Regression Coefficients

Theorem 9-1. The correlation coefficient is the geometric mean between the regression coefficients, i.e.,

r = \pm\sqrt{b_{yx} \cdot b_{xy}}    ...(9-39)

Proof. We have

b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x}    ...(9-40)  and  b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y}    ...(9-41)

Multiplying (9-40) and (9-41), we get

b_{yx} \cdot b_{xy} = \frac{r\sigma_y}{\sigma_x} \cdot \frac{r\sigma_x}{\sigma_y} = r^2,  i.e.,  r = \pm\sqrt{b_{yx} \cdot b_{xy}},    ...(9-42)

which establishes the result.

Remark. The sign to be taken before the square root is the same as that of the regression coefficients. If the regression coefficients are positive, we take the positive sign in (9-42), and if the regression coefficients are negative, we take the negative sign in (9-42).
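Theorem 9-1, together with the sign rule in the remark, recovers r from the two regression coefficients; a small sketch (our own code):

    import math

    def r_from_coefficients(b_yx, b_xy):
        """r = +/- sqrt(b_yx * b_xy), taking the common sign of the coefficients (Theorem 9-1)."""
        if b_yx * b_xy < 0:
            raise ValueError("Regression coefficients must have the same sign.")
        r = math.sqrt(b_yx * b_xy)
        return r if b_yx >= 0 else -r

    # e.g. r_from_coefficients(0.6132, 1.361) gives about 0.9135 (cf. Example 9-1 below).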

Theorem 9-2. If one of the regression coefficients is greater than unity (one), the other must be less than unity.

Proof. If one of the regression coefficients is greater than 1, then the other must be less than one, because otherwise, on using (9-39), we shall get

r^2 = b_{yx} \cdot b_{xy} > 1,

which is impossible, since 0 ≤ r^2 ≤ 1.

Aliter. Let b_{yx} > 1    ...(*). Then \frac{1}{b_{yx}} < 1.

We have r^2 = b_{yx} \cdot b_{xy} ≤ 1, so that b_{xy} ≤ \frac{1}{b_{yx}} < 1.    [From (*)]

Hence, if one of the regression coefficients is greater than one, the other must be less than one.
Theorem 9-3. The arithmetic mean of the modulus values of the regression coefficients is greater than the modulus value of the correlation coefficient, i.e.,

\frac{|b_{yx}| + |b_{xy}|}{2} > |r|

Proof. We know that for any two real, distinct, positive numbers a and b:

Arithmetic Mean > Geometric Mean,  i.e.,  \frac{a + b}{2} > \sqrt{ab}    ...(**)

Taking a = |b_{yx}| and b = |b_{xy}| in (**), we get

\frac{|b_{yx}| + |b_{xy}|}{2} > \sqrt{|b_{yx}| \cdot |b_{xy}|} = \sqrt{r^2} = |r|

Aliter. We have to prove that |r|\frac{\sigma_y}{\sigma_x} + |r|\frac{\sigma_x}{\sigma_y} ≥ 2|r|, i.e., that \sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y ≥ 0, i.e., that (\sigma_x - \sigma_y)^2 ≥ 0, which is always true, since the square of a real quantity is always non-negative.

Theorem 9-4. Regression coefficients are independent of change of origin but not of scale.

Symbolically, if we transform from x and y to the new variables u and v by a change of origin and scale:

u = \frac{x - a}{h}  and  v = \frac{y - b}{k},    ...(9-43)

where a, b, h (> 0) and k (> 0) are constants, then

b_{yx} = \frac{k}{h} b_{vu}  and  b_{xy} = \frac{h}{k} b_{uv}    ...(9-44)

Proof. Since the correlation coefficient is independent of change of origin and scale, we have:

r_{xy} = r_{uv}    ...(9-45)

Also, since standard deviation is independent of change of origin but not of scale, the transformation (9-43) gives:

\sigma_x = h\sigma_u  and  \sigma_y = k\sigma_v    ...(9-45a)

Hence

b_{yx} = r_{xy}\frac{\sigma_y}{\sigma_x} = r_{uv}\frac{k\sigma_v}{h\sigma_u} = \frac{k}{h} b_{vu}    ...(9-46)

and b_{xy} = r_{xy}\frac{\sigma_x}{\sigma_y} = r_{uv}\frac{h\sigma_u}{k\sigma_v} = \frac{h}{k} b_{uv}    ...(9-46a)

From (9-46) and (9-46a), it is obvious that the regression coefficients are independent of change of origin but not of scale.

In particular, if we take h = k = 1, i.e., if we transform the variables x and y to u and v by the relation

u = x - a  and  v = y - b,    ...(9-47)

i.e., by change of origin only, then from (9-46) and (9-46a) we get

b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2}    ...(9-47a)

and b_{xy} = b_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2}    ...(9-47b)

These formulae are very useful for obtaining the equations of the lines of regression if the mean values of x and/or y come out to be fractions, or if the values of x and y are large.

Example 9-1. From the following data, obtain the two regression equations:

Sales:      91   97  108  121   67  124   51   73  111   57
Purchases:  71   75   69   97   70   91   39   61   80   47

Solution. Let us denote the sales by the variable x and the purchases by the variable y.

CALCULATIONS FOR REGRESSION EQUATIONS

   x     y    d_x = x - 90   d_y = y - 70    d_x^2    d_y^2    d_x d_y
  91    71          1             1              1        1          1
  97    75          7             5             49       25         35
 108    69         18            -1            324        1        -18
 121    97         31            27            961      729        837
  67    70        -23             0            529        0          0
 124    91         34            21           1156      441        714
  51    39        -39           -31           1521      961       1209
  73    61        -17            -9            289       81        153
 111    80         21            10            441      100        210
  57    47        -33           -23           1089      529        759

\sum x = 900, \sum y = 700, \sum d_x = 0, \sum d_y = 0, \sum d_x^2 = 6360, \sum d_y^2 = 2868, \sum d_x d_y = 3900

We have \bar{x} = \frac{\sum x}{n} = \frac{900}{10} = 90 and \bar{y} = \frac{\sum y}{n} = \frac{700}{10} = 70, and

b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{3900}{6360} = 0.6132

b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{3900}{2868} = 1.361

Regression Equations

Equation of the line of regression of y on x:
y - \bar{y} = b_{yx}(x - \bar{x}),  i.e.,  y - 70 = 0.6132(x - 90),  i.e.,  y = 0.6132x - 55.188 + 70,  i.e.,  y = 0.6132x + 14.812

Equation of the line of regression of x on y:
x - \bar{x} = b_{xy}(y - \bar{y}),  i.e.,  x - 90 = 1.361(y - 70),  i.e.,  x = 1.361y - 95.27 + 90,  i.e.,  x = 1.361y - 5.27

Remark. We have r^2 = b_{yx} \cdot b_{xy} = 0.6132 × 1.361 = 0.8346, so r = ±\sqrt{0.8346} = ±0.9135. But since both the regression coefficients are positive, r must be positive. Hence, r = 0.9135.
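As a computational check (our own code, not the book's), the figures of Example 9-1 can be reproduced from the deviation formulae (9-38); since the deviations are just a change of origin, this also illustrates Theorem 9-4:

    import statistics as st

    x = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]   # sales
    y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]       # purchases

    xbar, ybar = st.fmean(x), st.fmean(y)               # 90.0 and 70.0
    dx = [xi - xbar for xi in x]
    dy = [yi - ybar for yi in y]

    sdxdy = sum(a * b for a, b in zip(dx, dy))          # 3900.0
    b_yx = sdxdy / sum(a * a for a in dx)               # 3900/6360, about 0.6132
    b_xy = sdxdy / sum(b * b for b in dy)               # 3900/2868, about 1.361
    r = (b_yx * b_xy) ** 0.5                            # about 0.9135 (both coefficients positive)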
Example 9-2. From the data given below, find:

(a) The two regression coefficients.
(b) The two regression equations.
(c) The coefficient of correlation between the marks in Economics and Statistics.
(d) The most likely marks in Statistics when the marks in Economics are 30.

Marks in Economics:   25  28  35  32  31  36  29  38  34  32
Marks in Statistics:  43  46  49  41  36  32  31  30  33  39

[Himachal Pradesh Univ. M.A. (Econ.), 2003]

Solution. Let us denote the marks in Economics by the variable x and the marks in Statistics by the variable y.

CALCULATIONS FOR REGRESSION EQUATIONS

   x     y    d_x = x - 32   d_y = y - 38    d_x^2    d_y^2    d_x d_y
  25    43         -7             5             49       25        -35
  28    46         -4             8             16       64        -32
  35    49          3            11              9      121         33
  32    41          0             3              0        9          0
  31    36         -1            -2              1        4          2
  36    32          4            -6             16       36        -24
  29    31         -3            -7              9       49         21
  38    30          6            -8             36       64        -48
  34    33          2            -5              4       25        -10
  32    39          0             1              0        1          0

\sum x = 320, \sum y = 380, \sum d_x = 0, \sum d_y = 0, \sum d_x^2 = 140, \sum d_y^2 = 398, \sum d_x d_y = -93

Here, \bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 and \bar{y} = \frac{\sum y}{n} = \frac{380}{10} = 38.
Regression Coefficients

Coefficient of regression of y on x: b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{-93}{140} = -0.6643

Coefficient of regression of x on y: b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{-93}{398} = -0.2337

Regression Equations

Equation of the line of regression of x on y:
x - \bar{x} = b_{xy}(y - \bar{y}),  i.e.,  x - 32 = -0.2337(y - 38),  i.e.,  x = -0.2337y + 8.8806 + 32,  i.e.,  x = -0.2337y + 40.8806

Equation of the line of regression of y on x:
y - \bar{y} = b_{yx}(x - \bar{x}),  i.e.,  y - 38 = -0.6643(x - 32),  i.e.,  y = -0.6643x + 21.2576 + 38,  i.e.,  y = -0.6643x + 59.2576    ...(*)

Correlation Coefficient. We have

r^2 = b_{yx} \cdot b_{xy} = (-0.6643) × (-0.2337) = 0.1552,  so  r = ±\sqrt{0.1552} = ±0.394

Since both the regression coefficients are negative, r must be negative. Hence, we get r = -0.394.

(d) In order to estimate the most likely marks in Statistics (y) when the marks in Economics (x) are 30, we shall use the line of regression of y on x, viz., the equation (*). Taking x = 30 in (*), the required estimate is given by

y = -0.6643 × 30 + 59.2576 = -19.929 + 59.2576 = 39.3286

Hence, the most likely marks in Statistics, when the marks in Economics are 30, are 39.3286 ≈ 39.
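As a quick check of part (d) (our own sketch, not the book's text):

    x = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # marks in Economics
    y = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # marks in Statistics

    xbar, ybar = sum(x) / len(x), sum(y) / len(y)                # 32.0 and 38.0
    sdxdy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))   # -93.0
    b_yx = sdxdy / sum((a - xbar) ** 2 for a in x)               # -93/140, about -0.6643

    y_at_30 = ybar + b_yx * (30 - xbar)   # about 39.33, i.e. about 39 marks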

Example 9-3. A panel of two judges, A and B, graded seven debators and independently awarded the following marks:

Debator:      1   2   3   4   5   6   7
Marks by A:  40  34  28  30  44  38  31
Marks by B:  32  39  26  30  38  34  28

An eighth debator was awarded 36 marks by Judge A, while Judge B was not present. If Judge B had also been present, how many marks would you expect him to award to the eighth debator, assuming the same degree of relationship exists in their judgement?

[Delhi Univ. B.Com (Hons.), 1993; Himachal Pradesh Univ. M.A. (Econ.), June 1999; Allahabad Univ. M.Com. 2002]

Solution. Let the marks awarded by Judge A be denoted by the variable x and the marks awarded by Judge B by the variable y.

CALCULATIONS FOR REGRESSION EQUATIONS

Debator     x     y    u = x - 35   v = y - 30    u^2    v^2    uv
   1       40    32         5            2         25      4     10
   2       34    39        -1            9          1     81     -9
   3       28    26        -7           -4         49     16     28
   4       30    30        -5            0         25      0      0
   5       44    38         9            8         81     64     72
   6       38    34         3            4          9     16     12
   7       31    28        -4           -2         16      4      8

Total:  \sum u = 0, \sum v = 17, \sum u^2 = 206, \sum v^2 = 185, \sum uv = 121

The marks awarded by Judge A to the eighth debator are given to be 36, i.e., we are given x = 36. We have to find the marks which would have been awarded to the eighth debator by Judge B, if he had been present.
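The scanned solution breaks off at this point. Carrying the arithmetic forward from the totals in the table (a sketch of ours, not the book's own text), the expected mark from Judge B works out to about 33:

    n = 7
    su, sv = 0, 17              # totals of u = x - 35 and v = y - 30 from the table
    suu, suv = 206, 121         # totals of u^2 and u*v

    xbar = 35 + su / n                                  # 35.0
    ybar = 30 + sv / n                                  # about 32.43
    b_yx = (n * suv - su * sv) / (n * suu - su ** 2)    # 121/206, about 0.5874, by (9-47a)

    y_hat = ybar + b_yx * (36 - xbar)   # about 33.0: the expected mark by Judge B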
