Regression Analysis
Introduction
So far we have seen that when two variables are correlated, a change in one variable is
accompanied by a change in the other. It follows that if the precise degree of change in
one variable due to a change in the other can be established, the value of one variable
can be predicted from a given value of the other. Regression Analysis is the statistical
technique with the help of which we can estimate the unknown values of one variable from
the known values of the other. The variable which is used to predict the value of the
other variable is called the 'independent' variable. It is also called the 'Regressor',
'Predictor' or 'Explaining' variable. The variable whose value is to be predicted is
called the 'dependent' variable; it is also called the 'Regressed', 'Predicted' or
'Explained' variable.
MEANING AND DEFINITION
Meaning
The literal meaning of regression is 'moving backward', 'going back' or 'the return
to the mean value'. The credit of using the regression technique in Statistics for the
first time goes to the British Biometrician Sir Francis Galton, who used it in 1877 in
the study of the relationship between the heights of fathers and sons. According to him,
with a Correlation Coefficient of 0.8 between the heights of fathers and children, if the
average height of a certain set of fathers is x cm. above the general average, the
average height of their children shall be 0.8x cm. above the general average. Thus, he
concluded that tall fathers will have tall sons and short fathers will have short sons,
but the average height of the sons of taller fathers is less than the average height of
their fathers and, on the other hand, the average height of the sons of shorter fathers
is more than the average height of their fathers. There is, thus, a move towards
mediocrity (normal height). Galton studied the average relation between the above two
variables by graph and termed the line describing the average relationship between two
variables the line of regression (or regression line).
Definitions
According to Wallis and Roberts, "It is often more important to find out what the
relation actually is, in order to estimate or predict one variable (the dependent variable)
on the basis of the other variable (the independent variable), and the statistical
technique appropriate to such a case is called Regression Analysis."
Characteristics of Regression Analysis
(1) Measure the average relationship--It consists of mathematical devices that are
used to measure the average relationship between two or more closely related variables.
(2) Estimating unknown value of dependent variable--It is used for estimating
the unknown values of some dependent variable with reference to the known values of its
related independent variables.
(3) Forecasting--It provides a mechanism for prediction or forecasting of the
values of one variable in terms of the values of the other variable.
(4) Two lines of equation--It consists of two lines of equation i.e. (i) equation of X
on Y and (ii) equation of Y on X.
Utilities of Regression Analysis
(1) Prediction of unknown values--It provides a functional relationship between
two or more related variables with the help of which we can easily estimate or predict the
unknown values of the dependent variable from the known values of the independent variable.
(2) Measurement of errors of estimates--It provides a measure of the errors of
estimates made through the Regression lines. A small scatter of the actual values around
the relevant regression line indicates good estimates of the values of a variable with a
low degree of error, while a large scatter indicates inaccurate estimates with a high
degree of error.
(3) Determination of rate of change in variables--The change in the value of one
variable for a change of one unit in the value of the other variable can be determined
from the Regression Coefficients. For example, if the Regression Coefficient of X on Y,
i.e. bxy, is 0.7, it means that there will be a change of 0.7 in the value of X if there
is a change of one unit in the value of Y.
(4) Measurement of Coefficient of Correlation--The Coefficient of correlation
between the two variables can be calculated by taking the square root of the product of
the two Regression Coefficients i.e. r = ±√(bxy × byx), where r takes the sign of the
regression coefficients.
(5) Tool of Statistical analysis in the field of Business and Commerce--People in
Business and Commerce are interested in predicting future events such as consumption,
production, investment, demand, sales, prices, profits etc. and their success depends very
much on the degree of accuracy of their estimates. Regression Analysis facilitates such
estimation.
(6) Tool for measuring and estimating the cause and effect relationship--It is
widely used in the estimation of Demand Curves, Supply Curves, Production functions,
Cost functions, etc. In fact, economists have developed many types of production functions
by fitting Regression lines to the input and output data.
(7) Degree and direction of correlation--The degree and direction of Correlation
between two variables can also be measured with the help of Regression Analysis.
(8) Utility in our day-to-day life--It is widely used in our day-to-day life and in
sociological studies to estimate various factors such as birth-rate, death-rate,
tax-rate, yield-rate etc.
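The relation stated in utility (4) can be checked numerically. The following is only a minimal sketch: the function name correlation_from_coefficients and the sample values 0.59 and 1.22 are assumptions chosen for illustration. It relies on the standard facts that both regression coefficients carry the same sign as r and that their product cannot exceed 1.

```python
import math

def correlation_from_coefficients(b_xy, b_yx):
    """Return r as the signed square root of b_xy * b_yx (hypothetical helper)."""
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    # r takes the common sign of the two regression coefficients
    sign = -1.0 if b_xy < 0 else 1.0
    return sign * math.sqrt(product)

# Illustrative values only
r = correlation_from_coefficients(0.59, 1.22)
print(round(r, 3))
```

When both coefficients are negative, the same call returns a negative r.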
Limitations of Regression Analysis
(1) Cause and effect relationship--It is presumed that a change in one variable is
caused by a change in the other variable, i.e. that the cause and effect relationship
between the variables remains unchanged. This assumption may not always hold good, and
there may be no cause and effect relation at all. For example, if we say that there is a
correlation between students' grades in college and their annual earnings 2 years after
graduation, we are not saying that one causes the other. Rather, both may be caused by
other factors. Hence, this assumption may lead to erroneous and misleading results in
such cases.
(2) Specific limited range--A regression relationship obtained from limited samples
can be applied only to that specific range of data. For example, the relationship of the
level of consumption with the salary of clerks cannot be used to estimate the level of
consumption in relation to the salary of managers.
(3) Using past trends to estimate future trends--We use historical data to
estimate the Regression equations. But conditions can change and violate one or more of
the assumptions on which our Regression Analysis depends. For example, a factory uses
Regression Analysis to determine the relationship between the number of employees and
production volume. If the data used in the analysis extend back for several years, the
resulting regression line may be too steep because it may fail to recognize the effect of
changing technology.
(4) Complicated calculation--It involves a very lengthy and complicated procedure
of calculations and analysis.
(5) It cannot be used in case of qualitative phenomena such as honesty, beauty,
crime etc.
Types of Regression Analysis
Various types of Regression are :
(1) Simple and Multiple Regression;
(2) Linear and Non-Linear (Curvi-linear) Regression; and
(3) Total and Partial Regression.
(1) Simple regression and multiple regression--Regression is said to be simple
when only two variables are studied, for example, the effect of advertising expenses on
sales turnover. In correlation, it does not matter which one is the dependent variable and
which one is the independent variable. But in regression analysis, it does matter. In this
example, the sales turnover is clearly the dependent variable, which is influenced by
advertising expenses. The converse is normally not possible. Hence, advertising expenses
is the independent variable. Now, with advertising expenses (X) and sales turnover (Y), the
functional relation can be expressed as Y = f(X).
If more than two variables are studied simultaneously, the Regression is said to be
multiple. In such regression, one variable is the dependent variable and the remaining
ones are independent variables. For example, the sales turnover (Y) may depend on
advertising expenses (X) and the income of customers (I). Now, the functional relation
can be expressed as Y = f(X, I).
(2) Linear regression and non-linear regression--Regression is said to be linear
if the amount of change in one variable (i.e. a unit change in the value of the independent
variable) tends to bear a constant ratio to the amount of change in the dependent variable.
If the given data are plotted on a graph, a straight line is obtained. This relation can be
expressed as,
Y = a + bX
Comparison of Correlation and Regression, by basis :

1. Meaning
Correlation: Correlation means the relationship between two or more variables such that
the movements in one tend to be accompanied by the corresponding movements in the other(s).
Regression: Regression means stepping back or returning to the average value. It
expresses the average relationship between two or more variables.

2. Degree and nature of relationship
Correlation: Correlation explains the degree and direction of relationship between two
variables. The meaning of direction is +ve or -ve.
Regression: Regression explains the nature of relationship between two variables. This
nature of relationship may be linear or non-linear.

3. Cause and effect relationship
Correlation: Correlation need not imply a cause and effect relationship between two
variables.
Regression: Regression clearly indicates the cause and effect relationship. One variable
is clearly assumed as a cause (independent variable) and the other as its effect
(dependent variable).

4. Non-sense relationship
Correlation: Correlation between two variables may exist due to an outside variable also,
but such relationship will be nonsense, fake or spurious correlation.
Regression: There is no such type of fake relationship in regression i.e. there is
nothing like nonsense regression.

5. Dependency
Correlation: In correlation, there is a sort of mutual dependence. It is immaterial
whether X is dependent on Y or Y is dependent on X, i.e. Y = f(X) or X = f(Y).
Regression: In Regression, it is clearly indicated which of the variables is dependent
and which is independent. In other words, Y = f(X) or X = f(Y), both need not be true.

6. Limits of coefficient
Correlation: The coefficient of correlation lies between +1 and -1.
Regression: It is not necessary that a regression coefficient lies between +1 and -1.
However, the product of both regression coefficients cannot be more than 1.

7. Change of origin and scale
Correlation: The correlation coefficient is independent of change in origin and scale.
Regression: Regression coefficients are independent of change in origin but not of scale.

8. Nature of measure
Correlation: The correlation coefficient rxy is a relative measure of the linear
relationship between X and Y and is independent of the units of measurement.
Regression: The regression coefficients bxy and byx are absolute measures linked with
the units of measurement.

9. Predictability
Correlation: Correlation analysis is concerned with determining the relationship of two
variables. It is not capable of solving prediction problems.
Regression: Regression analysis is used for prediction purposes also.

10. Application
Correlation: Correlation analysis has limited applications as it is confined only to the
study of linear relationship between the variables.
Regression: Regression analysis studies linear as well as non-linear relationship between
the variables and hence has much wider applications.
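The statement that regression coefficients are independent of change in origin but not of scale can be verified with a short computation. The following is a sketch under stated assumptions: the helper slope_y_on_x and the four data points are hypothetical and chosen only for illustration.

```python
def slope_y_on_x(xs, ys):
    """Least-squares regression coefficient of Y on X (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Illustrative data, not taken from the text
xs = [2, 4, 6, 8]
ys = [3, 7, 8, 12]

b = slope_y_on_x(xs, ys)
# Change of origin: subtract a constant from every X and every Y
b_shifted = slope_y_on_x([x - 5 for x in xs], [y - 7 for y in ys])
# Change of scale: multiply every X by 10
b_scaled = slope_y_on_x([x * 10 for x in xs], ys)

print(b, b_shifted, b_scaled)
```

The shifted series gives the same coefficient as the original, while multiplying X by 10 divides the coefficient of Y on X by 10.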
[Figure: line of X on Y and line of Y on X]
(2) If both the lines of regression are perpendicular to each other, then the
coefficient of correlation between the two variables X and Y is zero i.e. r = 0.
[Figure: No Correlation (r = 0). The line of X on Y and the line of Y on X cut each other at 90°.]
(3) In case of perfect correlation, positive or negative, i.e. r = ± 1, the two lines
coincide with each other.
[Figure: two panels, Perfect Positive Correlation (r = +1) and Perfect Negative Correlation (r = -1). In each panel the line of X on Y and the line of Y on X coincide, drawn at 45°.]
(4) The smaller the angle between the two regression lines, the greater is the degree
of correlation between the variables.
[Figure: panels showing the line of X on Y and the line of Y on X at a small angle (more degree of r) and at a wide angle (less degree of r).]
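Properties (2) to (4) can be explored numerically. The following is a sketch under an assumption that is not spelled out in the text: with X on the horizontal axis, the line of Y on X runs along the direction (1, byx) and the line of X on Y runs along the direction (bxy, 1). The function name and the sample coefficients are hypothetical.

```python
import math

def angle_between_regression_lines(b_yx, b_xy):
    """Acute angle, in degrees, between the two regression lines."""
    cross = 1.0 - b_yx * b_xy   # zero when the two lines coincide
    dot = b_xy + b_yx           # zero when the two lines are perpendicular
    return math.degrees(math.atan2(abs(cross), abs(dot)))

no_correlation = angle_between_regression_lines(0.0, 0.0)   # r = 0
perfect = angle_between_regression_lines(2.0, 0.5)          # byx * bxy = 1, r = +1
high_r = angle_between_regression_lines(1.22, 0.59)         # higher correlation
low_r = angle_between_regression_lines(0.4, 0.3)            # lower correlation

print(no_correlation, perfect, round(high_r, 1), round(low_r, 1))
```

The angle is 90° when both coefficients are zero, 0° when their product is 1, and it narrows as the product of the coefficients (and hence the correlation) grows.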
[Figure: graph of the line Y = 3 + 2X, showing the intercept a = 3 and the slope b = 2 (a rise of 2 for each increase of 1 unit in X).]
The values of these constants (i.e. 'a' and 'b') are calculated by the method of least
squares, which provides two normal equations to find their values :
(i) ΣY = Na + bΣX
(ii) ΣXY = aΣX + bΣX²
where ΣX, ΣY, ΣXY and ΣX² indicate the totals of the actual values of the respective
variables. By solving the above normal equations, the values of 'a' and 'b' are determined
and the equation of the Regression line is obtained.
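Solving the two normal equations is a matter of solving two simultaneous linear equations in 'a' and 'b'. The following is a minimal sketch using Cramer's rule; the function name solve_normal_equations and the totals passed to it are assumptions chosen for illustration.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  ΣY = Na + bΣX  and  ΣXY = aΣX + bΣX²  for (a, b) by Cramer's rule."""
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all X values are equal; the line is not determined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Illustrative totals: N = 5, ΣX = 35, ΣY = 40, ΣXY = 253, ΣX² = 265
a, b = solve_normal_equations(n=5, sum_x=35, sum_y=40, sum_xy=253, sum_x2=265)
print(a, b)
```

With these totals the fitted line works out to approximately Y = 17.45 - 1.35X.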
Another form of Regression Equation of Y on X--It is an extended form of the
equation Y = a + bX, which is suitable only when the S.D. (σx and σy), the Correlation
Coefficient (r) and the arithmetic means of both series (X̄ and Ȳ) are either given in
the question or can be calculated easily. This equation is expressed as follows :
Y - Ȳ = r (σy/σx) (X - X̄)
Proof :
Y = a + bX ...(1)
Dividing the normal equation ΣY = Na + bΣX by N, we get
Ȳ = a + bX̄ ...(2)
From equation (2), we have
a = Ȳ - bX̄ ...(3)
Putting the value of 'a' in equation (1), we get
Y = (Ȳ - bX̄) + bX
Y = Ȳ - bX̄ + bX
Y - Ȳ = bX - bX̄
(Y - Ȳ) = b(X - X̄)
or (Y - Ȳ) = r (σy/σx) (X - X̄), where b = r (σy/σx)
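The form Y - Ȳ = r (σy/σx)(X - X̄) can be checked against a small data set. The following is only a sketch: the function name fit_via_r_and_sd is hypothetical, the data are illustrative, and the standard deviations are taken with divisor N.

```python
import math

def fit_via_r_and_sd(xs, ys):
    """Return (a, b, r) for Y = a + bX using b = r * sd_y / sd_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    b = r * sd_y / sd_x        # regression coefficient of Y on X
    a = mean_y - b * mean_x    # from the relation between the two means
    return a, b, r

# Illustrative data
a, b, r = fit_via_r_and_sd([10, 12, 13, 17, 18], [5, 6, 7, 9, 13])
print(round(a, 4), round(b, 4), round(r, 4))
```

The slope obtained this way agrees with the one obtained from the normal equations, since r (σy/σx) reduces to the same ratio of sums.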
(2) Regression equation of X on Y--This equation is used for estimating the
value of X for a given value of Y. This equation is expressed as follows :
X = c + dY
where X is the dependent variable and Y is the independent variable. In this equation, 'c'
and 'd' are the two unknowns or constants of the equation, where 'c' refers to the
intercept of the line and 'd' refers to the slope of the line.
The values of these constants (i.e. 'c' and 'd') are calculated by the method of least
squares, which provides two normal equations to find their values :
(i) ΣX = Nc + dΣY
(ii) ΣXY = cΣY + dΣY²
where ΣX, ΣY, ΣXY and ΣY² indicate the totals of the actual values of the respective
variables. By solving these normal equations, the values of 'c' and 'd' are determined and
the equation of the regression line is obtained.
Another form of Regression Equation of X on Y--It is an extended form of the
equation X = c + dY, which is suitable only when the S.D. (σx and σy), the Correlation
Coefficient (r) and the Arithmetic Means of both series (X̄ and Ȳ) are either given in
the question or can be calculated easily. This equation is expressed as follows :
X - X̄ = r (σx/σy) (Y - Ȳ)
where the values of the 'a' and 'b' constants are determined by solving the two normal
equations :
(i) ΣY = Na + bΣX
(ii) ΣXY = aΣX + bΣX²
Line of best fit for X on Y
The regression line of X on Y is obtained by finding the value of X for any two
extreme values of Y through the linear equation
X = c + dY
where the values of the 'c' and 'd' constants are determined by solving the two normal
equations :
(i) ΣX = Nc + dΣY
(ii) ΣXY = cΣY + dΣY²
Illustration 2.
Using the method of least squares, draw the two regression lines associated with the
following data both separately and jointly :
X     7     4    10     6     8
Y     9    12     4     9     6
Solution:
DETERMINATION OF REGRESSION LINES BY LEAST SQUARE METHOD
X       Y       X²      Y²      XY
7       9       49      81      63
4       12      16      144     48
10      4       100     16      40
6       9       36      81      54
8       6       64      36      48
ΣX = 35   ΣY = 40   ΣX² = 265   ΣY² = 358   ΣXY = 253
(i) Regression line of Y on X--This line is given by
Y = a + bX
To find the values of 'a' and 'b', we use
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²
By substituting the values, we have
40 = 5a + 35b ...(1)
253 = 35a + 265b ...(2)
Multiplying equation (1) by 7 and subtracting equation (2) from it, we get
280 = 35a + 245b
253 = 35a + 265b
 27 = -20b
Thus, b = -1.35
Merits
(1) It gives the best fit to the data because the sum of the squared deviations from
this line is less than it would be from any other straight line.
(2) The sum of positive and negative deviations from this line is zero.
(3) It gives the best estimate of the dependent variable.
(4) It also gives an idea about the direction and degree (low, moderate, high) of
correlation between the variables.
Demerits
(1) Computation of two extreme points for each regression line is difficult.
(2) Time consuming.
(3) Not commonly understood.
(I) ALGEBRAIC METHODS
(1) Regression equations through normal equation method
Illustration 3.
From the following data, form the regression equations
Y = a + bX
and X = a + bY
Use the normal equation method.
X    10    12    13    17    18
Y     5     6     7     9    13
Also, estimate the value of Y when X = 11 and the value of X when Y = 15.
Solution:
DETERMINATION OF REGRESSION EQUATIONS
X       Y       X²      Y²      XY
10      5       100     25      50
12      6       144     36      72
13      7       169     49      91
17      9       289     81      153
18      13      324     169     234
N = 5   ΣX = 70   ΣY = 40   ΣX² = 1,026   ΣY² = 360   ΣXY = 600
(i) Regression equation of Y on X--This equation is given by
Y = a + bX
To find the value of constants 'a' and 'b' in this equation, we solve the following two
normal equations :
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²
Substituting the respective values from the table in the above equations, we have
40 = 5a + 70b ...(1)
600 = 70a + 1026b ...(2)
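The normal equation method for this data can be carried through by machine as a check on hand working. The following is a sketch, not the book's own solution: the helper normal_equation_fit is hypothetical, and exact fractions are used so that no rounding enters the constants.

```python
from fractions import Fraction

def normal_equation_fit(independent, dependent):
    """Return (intercept, slope) of: dependent = intercept + slope * independent."""
    n = len(independent)
    s_i = sum(independent)
    s_d = sum(dependent)
    s_id = sum(i * d for i, d in zip(independent, dependent))
    s_i2 = sum(i * i for i in independent)
    # Eliminating the intercept from the two normal equations gives the slope
    slope = Fraction(n * s_id - s_i * s_d, n * s_i2 - s_i ** 2)
    intercept = (s_d - slope * s_i) / n
    return intercept, slope

X = [10, 12, 13, 17, 18]
Y = [5, 6, 7, 9, 13]

a, b = normal_equation_fit(X, Y)   # regression equation of Y on X
c, d = normal_equation_fit(Y, X)   # regression equation of X on Y

y_at_11 = a + b * 11               # estimate of Y when X = 11
x_at_15 = c + d * 15               # estimate of X when Y = 15
print(float(a), float(b), float(y_at_11))
print(float(c), float(d), float(x_at_15))
```

Swapping the roles of the two series in the same function gives the equation of X on Y, which mirrors the way the two sets of normal equations differ only in which variable is treated as independent.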
Now, the estimated price (P) when the quantity demanded is 10, is given by
P = -1.34(10) + 18.04
= -13.4 + 18.04 = 4.64
Illustration 5.
Obtain both regression lines from the following data :
X    50   52   54   55   56   58   60   61   63   65
Y    51   49   48   60   62   63   58   63   66   67
Taking the assumed means A = 58 for X and B = 60 for Y :
X      Y      dx = X - A    dx²     dy = Y - B    dy²     dxdy
50     51     -8            64      -9            81      72
52     49     -6            36      -11           121     66
54     48     -4            16      -12           144     48
55     60     -3            9       0             0       0
56     62     -2            4       2             4       -4
58     63     0             0       3             9       0
60     58     2             4       -2            4       -4
61     63     3             9       3             9       9
63     66     5             25      6             36      30
65     67     7             49      7             49      49
N = 10   Σdx = -6   Σdx² = 216   Σdy = -13   Σdy² = 457   Σdxdy = 266
X̄ = A + Σdx/N = 58 + (-6)/10 = 57.4
Ȳ = B + Σdy/N = 60 + (-13)/10 = 58.7
bxy = [Σdxdy - (Σdx)(Σdy)/N] / [Σdy² - (Σdy)²/N]
    = [266 - (-6)(-13)/10] / [457 - (-13)²/10]
    = (266 - 7.8) / (457 - 16.9)
    = 258.2 / 440.1
    = 0.59
byx = [Σdxdy - (Σdx)(Σdy)/N] / [Σdx² - (Σdx)²/N]
    = [266 - (-6)(-13)/10] / [216 - (-6)²/10]
    = (266 - 7.8) / (216 - 3.6)
    = 258.2 / 212.4
    = 1.22
Regression line of X on Y
(X - X̄) = bxy (Y - Ȳ)
(X - 57.4) = 0.59 (Y - 58.7)
(X - 57.4) = 0.59Y - 34.633
X = 0.59Y - 34.633 + 57.4
X = 0.59Y + 22.77
Regression line of Y on X
(Y - Ȳ) = byx (X - X̄)
(Y - 58.7) = 1.22 (X - 57.4)
(Y - 58.7) = 1.22X - 70.028
Y = 1.22X - 70.028 + 58.7
Y = 1.22X - 11.33
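The short-cut working of Illustration 5, with deviations taken from assumed means, can be sketched in code as follows. The function name shortcut_coefficients is hypothetical; the data and the assumed means A = 58 and B = 60 are those of the illustration.

```python
def shortcut_coefficients(xs, ys, assumed_x, assumed_y):
    """Means and regression coefficients from deviations about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    numerator = s_dxdy - s_dx * s_dy / n
    b_xy = numerator / (s_dy2 - s_dy ** 2 / n)   # coefficient of X on Y
    b_yx = numerator / (s_dx2 - s_dx ** 2 / n)   # coefficient of Y on X
    mean_x = assumed_x + s_dx / n
    mean_y = assumed_y + s_dy / n
    return mean_x, mean_y, b_xy, b_yx

X = [50, 52, 54, 55, 56, 58, 60, 61, 63, 65]
Y = [51, 49, 48, 60, 62, 63, 58, 63, 66, 67]

mean_x, mean_y, b_xy, b_yx = shortcut_coefficients(X, Y, 58, 60)
print(round(mean_x, 1), round(mean_y, 1), round(b_xy, 2), round(b_yx, 2))
```

Any other choice of assumed means leads to the same coefficients, which is why the method is free to pick convenient round values for A and B.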
Illustration 6.
The following table gives age (X) in years of cars and annual maintenance cost (Y)
(in hundred rupees) :
X     1     3     5     7     9
Y    15    18    21    23    22
Estimate the maintenance cost for a 4-year old car after finding the regression
equation.
Solution :
The maintenance cost for a 4-year old car is given by the Regression Equation of Y on X.
X       Y       X²      XY
1       15      1       15
3       18      9       54
5       21      25      105
7       23      49      161
9       22      81      198
N = 5   ΣX = 25   ΣY = 99   ΣX² = 165   ΣXY = 533
X̄ = ΣX/N = 25/5 = 5
Ȳ = ΣY/N = 99/5 = 19.8
By Product-moment method--
byx = [ΣXY - (ΣX)(ΣY)/N] / [ΣX² - (ΣX)²/N]
    = [533 - (25)(99)/5] / [165 - (25)²/5]
    = (533 - 495) / (165 - 125)
    = 38/40
    = 0.95
Regression equation of Y on X is
Y - Ȳ = byx (X - X̄)
Y - 19.8 = 0.95 (X - 5)
Y - 19.8 = 0.95X - 4.75
or Y = 0.95X + 15.05
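The required estimate for a 4-year old car follows by putting X = 4 in the regression equation of Y on X. The following is a sketch of that step: the function name product_moment_line is hypothetical, and the ages 1, 3, 5, 7 and 9 are inferred from the table totals (ΣX = 25, ΣX² = 165, ΣXY = 533), which is an assumption.

```python
def product_moment_line(xs, ys):
    """Return (intercept, slope) of the regression equation of Y on X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b_yx = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    intercept = sum_y / n - b_yx * sum_x / n   # mean of Y minus slope times mean of X
    return intercept, b_yx

ages = [1, 3, 5, 7, 9]          # age of car in years (inferred values)
costs = [15, 18, 21, 23, 22]    # annual maintenance cost in hundred rupees

intercept, slope = product_moment_line(ages, costs)
estimate = intercept + slope * 4   # cost for a 4-year old car
print(round(slope, 2), round(intercept, 2), round(estimate, 2))
```

On these figures the estimate comes to about 18.85 hundred rupees.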