By
Vijeta Gupta
Amity University
Introduction
The term regression was coined by Sir Francis Galton in 1877 while he was studying the
relationship between the heights of fathers and sons graphically; he named the line
describing this relationship “The Line of Regression”.
Statisticians now use the term in a wider sense and define it as follows:
The term regression analysis refers to the methods by which estimates are made of the
values of the variable from a knowledge of the values of one or more other variables
and to the measurement of the errors involved in the estimation process.
-Morris Hamburg
This technique is widely used for studying the relationship between two or more
related variables, for example:
•Estimation of price and demand, demand and supply curves, cost functions, production
and consumption functions.
•Estimation of population projections for efficient planning of an economy.
•Estimation of the effects of new drugs on patients.
•Estimation of relationships such as crop yield and rainfall, advertising expenditure and
sales volume, demand for a product and its price, or expenditure and income.
Types of Regression
•When only two variables are involved in regression, the functional relationship is
known as simple regression.
•If the relationship between the two variables is linear, it is known as simple linear
regression; otherwise it is known as nonlinear regression.
Regression is used for estimating the unknown values of one dependent variable (also called
the response variable or measurement) from the known values of one or more related
independent variables (also known as explanatory variables or predictors).
It provides a mechanism for predicting or forecasting the values of one variable in terms
of the values of the other variable.
Line of Regression of Y on X is the line which gives the best estimate for the value of Y for
any specified value of X and similarly for X on Y.
This line is found by a procedure called least squares, which provides the best fit: the
line should be drawn through the plotted points in such a manner that the sum of the
squares of the deviations of the actual Y values from the computed Y values ($Y_e$) is the
least, i.e. $\sum (Y - Y_e)^2$ should be a minimum. Such a line is known as the line of best fit.
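The least squares criterion can be checked numerically. The sketch below uses a small hypothetical data set (the numbers are made up for illustration and are not from the source) and the closed-form least squares solution; the function names are our own. It shows that moving the intercept or the slope slightly away from the fitted values increases the sum of squared deviations.

```python
def sum_squared_deviations(a, b, xs, ys):
    """Sum of squared vertical deviations of the points from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) of the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, chosen only to illustrate the idea.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(a, b, xs, ys)

# Sums of squares for four slightly shifted lines; each should exceed `best`.
perturbed = [
    sum_squared_deviations(a + da, b + db, xs, ys)
    for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
]
print(best, perturbed)
```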
When we plot two variables (say X and Y) on a scatter diagram and draw the two lines of best
fit through the plotted points, these lines are called regression lines. They are based on
two equations, called regression equations, which give the best estimate of one variable
when the other is exactly known or given.
Interpreting The Regression Equation
• Regression equation of y on x
• The equation of the line is called the regression equation, i.e.
y = a + bx
where
y - represents the dependent variable
x - represents the independent variable
a - the intercept, the value of y when x is zero
b - the slope, the change in y resulting from a change in x of one unit
The constants (a) and (b) are found by the least squares procedure.
• The slope is usually more informative. It shows how much and in which direction y will
vary with changes in x.
Slope: The slope quantifies the steepness of the line. It equals the change in Y for each
unit change in X. It is expressed in the units of the Y axis divided by the units of the X
axis. If the slope is positive, Y increases as X increases. If the slope is negative, Y
decreases as X increases.
Intercept: The Y intercept is the Y value of the line when X equals zero. It defines the
elevation of the line.
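As a small illustration of these interpretations, the sketch below evaluates a line with hypothetical constants a = 2.2 and b = 0.6 (assumed values, not taken from the source). The value at X = 0 is the intercept, and a one-unit increase in X changes the estimate by the slope.

```python
def predict(a, b, x):
    """Estimated y for a given x on the line y = a + b*x."""
    return a + b * x


# Hypothetical constants, assumed for illustration only.
a, b = 2.2, 0.6

at_zero = predict(a, b, 0)                          # equals the intercept a
unit_change = predict(a, b, 4) - predict(a, b, 3)   # equals the slope b
print(at_zero, unit_change)
```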
Calculating the Parameter Estimators
To determine the values of a and b for the regression line of Y on X, the following two
normal equations are to be solved simultaneously:
$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
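The two normal equations form a 2 × 2 linear system in a and b. A minimal sketch of solving it by Cramer's rule is given below; the function name and the data are hypothetical and serve only as an illustration.

```python
def solve_normal_equations(xs, ys):
    """Solve  sum(Y)  = N*a + b*sum(X)
              sum(XY) = a*sum(X) + b*sum(X^2)
    for (a, b) using Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all X values are equal; the line is not determined")

    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = solve_normal_equations(xs, ys)
print(a, b)
```

For these made-up values the sums are N = 5, ΣX = 15, ΣY = 20, ΣXY = 66 and ΣX² = 55, so the fitted line is approximately Y = 2.2 + 0.6X.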
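Deviation Method:
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
$r\,\sigma_y/\sigma_x$ is known as the regression coefficient of Y on X ($b_{yx}$):
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviations from the means.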
If we are dealing with the actual values of the X and Y variables and not the deviations, then
$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$
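The expressions for the regression coefficient of Y on X (r·σy/σx, Σxy/Σx² on deviations, and the formula on actual values) should agree. The sketch below computes all three on a hypothetical data set; the function name and the numbers are assumptions for illustration, and σ is taken as the population standard deviation (divisor N).

```python
import math


def b_yx_three_ways(xs, ys):
    """Return b_yx computed as (r*sigma_y/sigma_x, sum(xy)/sum(x^2), direct formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Deviations from the means.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)

    # Population standard deviations and the correlation coefficient.
    sigma_x = math.sqrt(sum_dx2 / n)
    sigma_y = math.sqrt(sum_dy2 / n)
    r = sum_dxdy / (n * sigma_x * sigma_y)

    from_r = r * sigma_y / sigma_x
    from_deviations = sum_dxdy / sum_dx2
    direct = ((n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys))
              / (n * sum(x * x for x in xs) - sum(xs) ** 2))
    return from_r, from_deviations, direct


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
from_r, from_deviations, direct = b_yx_three_ways(xs, ys)
print(from_r, from_deviations, direct)
```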
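Regression equation of X on Y
X = a + bY
The normal equations are:
$$\sum X = Na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
Deviation Method:
$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$
$r\,\sigma_x/\sigma_y$ is known as the regression coefficient of X on Y ($b_{xy}$):
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$$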
Direct Method:
If we are dealing with the actual values of the X and Y variables and not the deviations, then
$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}$$
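A minimal sketch of the direct method for the line of X on Y follows. The slope uses the formula on actual values, and the intercept is then obtained from the first normal equation, ΣX = Na + bΣY. The function name and the data are hypothetical.

```python
def regression_x_on_y(xs, ys):
    """Return (a, b_xy) for the regression line X = a + b_xy * Y."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_yy = sum(y * y for y in ys)

    denom = n * sum_yy - sum_y ** 2
    if denom == 0:
        raise ValueError("all Y values are equal; the line is not determined")

    b_xy = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_x - b_xy * sum_y) / n   # from  sum(X) = N*a + b*sum(Y)
    return a, b_xy


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_xy, b_xy = regression_x_on_y(xs, ys)
print(a_xy, b_xy)
```

For these made-up values the line of X on Y is approximately X = -1 + 1.0Y. This is not the algebraic inverse of the Y on X line for the same data, which is why two separate regression lines are needed.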
Characteristics Of Regression Coefficients
• Both regression coefficients have the same sign, i.e. both are positive or both are
negative.
• Both regression coefficients cannot be greater than one in magnitude at the same time,
because $r = \pm\sqrt{b_{xy}\, b_{yx}}$ and r cannot exceed unity.
• The coefficient of correlation has the same sign as the regression
coefficients.
• Since $b_{xy} = r\,\sigma_x/\sigma_y$, we can find any one of the four values given the other three.
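These properties can be verified numerically. The sketch below, using hypothetical data and function names of our own, computes both regression coefficients and the correlation coefficient separately, then rebuilds r as the square root of their product with the common sign attached.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) using the formulas on actual values."""
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    b_yx = num / (n * sum(x * x for x in xs) - sum(xs) ** 2)
    b_xy = num / (n * sum(y * y for y in ys) - sum(ys) ** 2)
    return b_yx, b_xy


def correlation(xs, ys):
    """Pearson correlation coefficient r computed from deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_from_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_xy * b_yx), taking the sign shared by both coefficients."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


# Hypothetical data: one increasing and one decreasing relationship.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]
ys_down = [5, 4, 5, 4, 2]

for ys in (ys_up, ys_down):
    b_yx, b_xy = regression_coefficients(xs, ys)
    print(b_yx, b_xy, r_from_coefficients(b_yx, b_xy), correlation(xs, ys))
```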
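Standard error of estimate (SE), from the deviations of the actual values from the estimated values:
$$SE_{Y \text{ on } X} = \sqrt{\frac{\sum (Y - Y_e)^2}{n}}$$
$$SE_{X \text{ on } Y} = \sqrt{\frac{\sum (X - X_e)^2}{n}}$$
Value based formula
$$SE_{Y \text{ on } X} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n}}$$
$$SE_{X \text{ on } Y} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{n}}$$
where a and b are the constants of the corresponding regression equation.
Correlation coefficient based formulae
$$SE_{Y \text{ on } X} = \sigma_y\sqrt{1 - r^2}$$
$$SE_{X \text{ on } Y} = \sigma_x\sqrt{1 - r^2}$$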
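The three forms of the standard error of Y on X should give the same number. The sketch below checks this on a hypothetical data set; the function name and data are assumptions for illustration, the divisor is n as in the formulas above, and σ is the population standard deviation.

```python
import math


def standard_errors_y_on_x(xs, ys):
    """Return SE of Y on X from (residuals, value based formula, r based formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    # Constants of the regression line of Y on X.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n

    # 1) From the deviations of actual Y from estimated Y.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    from_residuals = math.sqrt(sse / n)

    # 2) Value based formula (max() guards against tiny negative rounding error).
    from_values = math.sqrt(max(0.0, sum_yy - a * sum_y - b * sum_xy) / n)

    # 3) Correlation coefficient based formula.
    sigma_x = math.sqrt(sum_xx / n - (sum_x / n) ** 2)
    sigma_y = math.sqrt(sum_yy / n - (sum_y / n) ** 2)
    r = (sum_xy / n - (sum_x / n) * (sum_y / n)) / (sigma_x * sigma_y)
    from_r = sigma_y * math.sqrt(max(0.0, 1 - r * r))

    return from_residuals, from_values, from_r


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
se_residuals, se_values, se_r = standard_errors_y_on_x(xs, ys)
print(se_residuals, se_values, se_r)
```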
Difference Between Correlation And Regression Analysis
Correlation analysis measures the degree of covariability between two or more variables,
whereas regression analysis studies the average relationship between two or more closely
related variables.
Correlation analysis is not necessarily based on a cause and effect relationship, whereas
regression analysis is necessarily based upon such a relationship between the two related
variables. The variable corresponding to the cause is taken as the independent variable and
the variable corresponding to the effect is taken as the dependent variable.
Correlation analysis yields only one coefficient, i.e. r, whereas regression analysis yields
two coefficients, i.e. bxy and byx.
The coefficient of correlation can never exceed unity in magnitude, i.e. it lies between -1
and +1, whereas either one of the regression coefficients can exceed unity. However, both
regression coefficients cannot exceed unity at the same time.
Correlation analysis might show a nonsense correlation between two variables, but
regression analysis never does.
Correlation analysis is confined to the study of the linear relationship between two variables;
regression analysis can be used to study both linear and nonlinear relationships.
Correlation analysis gives a pure number, while regression analysis gives a curve or an
equation for determining the values of the dependent variable.
Utilities And Limitations Of Regression Analysis
It provides a functional relationship between variables. We can easily estimate or predict
the unknown values of one variable from the known values of another variable.
The greater the value of r², the better the fit and the more useful the regression
equations are as a predictive device.
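One way to see why a larger r² means a more useful equation is to compute the share of the variation in Y that the fitted line accounts for. The sketch below does this on a hypothetical data set (function name and numbers are assumptions for illustration) and compares it with the product of the two regression coefficients, which also equals r².

```python
def r_squared(xs, ys):
    """Share of the variation in Y accounted for by the line of Y on X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar

    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - unexplained / total


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
b_yx = num / (n * sum(x * x for x in xs) - sum(xs) ** 2)
b_xy = num / (n * sum(y * y for y in ys) - sum(ys) ** 2)

explained_share = r_squared(xs, ys)
print(explained_share, b_yx * b_xy)
```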