
REGRESSION

By
Vijeta Gupta
Amity University
Introduction
The term "regression" was coined by Sir Francis Galton in 1877 while he was studying the
relationship between the heights of fathers and sons graphically. He named the line
describing the relationship "The Line of Regression".

Statisticians, however, use the term in a wider sense and define it as follows:

Regression analysis, in the general sense, means the estimation or prediction of the unknown
value of one variable (dependent) from the known value of another variable
(independent).

Regression Analysis is a statistical tool for measuring the average relationship
between any two or more closely related variables in terms (positively or
negatively) of the original units of the data.
- M. M. Blair

The term regression analysis refers to the methods by which estimates are made of the
values of the variable from a knowledge of the values of one or more other variables
and to the measurement of the errors involved in the estimation process.
- Morris Hamburg
This technique is widely used for studying the relationship between two or more
related variables, for example:

•Estimation of price and demand, demand and supply curves, cost functions, and production
and consumption functions.
•Estimation of population projections for efficient planning of an economy.
•Estimation of the effects of new drugs on patients.
•Estimation of crop yield from rainfall, volume of sales from expenditure on advertisement,
demand for a product from its price, expenditure from income, etc.
Types of Regression
•When only two variables are involved in regression, the functional relationship is
known as simple regression.

• If the relationship between the two variables is linear, it is known as simple linear
regression, otherwise it is known as nonlinear regression.

•When one variable is dependent on two or more independent variables, the
functional relationship between the dependent and the set of independent variables
is known as multiple regression.

• Simple linear regression is considered for discussion here.


Characteristics
It is a mathematical device for measuring the average relationship between variables.

It is used for estimating unknown values of one dependent variable (also called the response
variable or measurement) with reference to the known values of one or more related
independent variables (also known as explanatory variables or predictors).

It provides a mechanism for predicting or forecasting the values of one variable in terms
of the values of the other variable.

For two variables it consists of two regression equations:

a. Equation of X on Y
b. Equation of Y on X
Regression Line
The line of regression is the line which gives the best estimate of one variable for any given
value of the other variable.

In the case of two variables X and Y, we have two lines of regression:
one of Y on X and the other of X on Y.

The line of regression of Y on X is the line which gives the best estimate of the value of Y for
any specified value of X, and similarly for X on Y.

This line is found by a procedure called least squares, which provides the best fit: the
line should be drawn through the plotted points in such a manner that the sum of the
squares of the deviations of the actual Y values from the computed values Ye is the least,
that is, $\sum (Y - Y_e)^2$ should be a minimum. Such a line is known as the line of best fit.

When we plot two variables (say X and Y) on a scatter diagram and draw the two lines of best
fit through the plotted points, these lines are called regression lines. They
are based on two equations, called regression equations, which give the best estimate of
one variable when the other is known or given.
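The least squares criterion can be illustrated with a small sketch. The data below are made up for illustration and are not from the source, and the brute-force grid search is only a way of showing the criterion at work, not how the line is found in practice. Each candidate line is written as computed y = intercept + slope * x.

```python
# Illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_sq_dev(intercept, slope):
    """Sum of squared deviations of the actual y values from the computed ones."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Candidate lines on a coarse grid: intercepts 0.0..5.0 and slopes 0.0..2.0, step 0.1.
candidates = [(i / 10, j / 10) for i in range(51) for j in range(21)]

# The line of best fit (on this grid) is the one with the smallest sum.
best_intercept, best_slope = min(candidates, key=lambda line: sum_sq_dev(*line))

print(best_intercept, best_slope)
print(sum_sq_dev(best_intercept, best_slope))  # smallest sum on the grid
print(sum_sq_dev(2.0, 0.7))                    # a nearby line gives a larger sum
```

Any other line on the grid gives a larger sum of squared deviations than the selected one, which is what "best fit" means here.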
Interpreting The Regression Equation
• Regression equation of y on x
• The equation of the line is called the regression equation, i.e.
y = a + bx
where
y - represents the dependent variable
x - represents the independent variable
a - the intercept, the value of y when x is zero,
b - the slope, the change in y resulting from a change in x of one unit.
The constants (a) and (b) are found by the least squares procedure.
• The slope is usually more informative. It shows how much and in which direction y will
vary with changes in x.
Slope: The slope quantifies the steepness of the line. It equals the change in Y for each
unit change in X. It is expressed in the units of the Y axis divided by the units of the X
axis. If the slope is positive, Y increases as X increases. If the slope is negative, Y
decreases as X increases.
Intercept: The Y intercept is the Y value of the line when X equals zero. It defines the
elevation of the line.
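A minimal sketch of how the two constants are read, assuming hypothetical values a = 2.2 and b = 0.6 chosen only for illustration (the helper name `predict` is likewise an assumption):

```python
def predict(a, b, x):
    """Estimate y for a given x from the regression equation y = a + b*x."""
    return a + b * x

# Hypothetical constants, for illustration only.
a, b = 2.2, 0.6

# Intercept: the estimated y when x is zero.
intercept_value = predict(a, b, 0)

# Slope: the change in the estimated y when x increases by one unit.
unit_change = predict(a, b, 4) - predict(a, b, 3)

print(intercept_value, unit_change)
```

With a positive b, as here, the estimate of y rises as x rises; a negative b would make it fall.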
Calculating the Parameter Estimators

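To determine the values of a and b for the regression line of y on x, the following two
normal equations are to be solved simultaneously:
$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$

Deviation Method:
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
where $r\,\frac{\sigma_y}{\sigma_x}$ is known as the regression coefficient of Y on X ($b_{yx}$):
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$$
Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$ denote deviations from the means.

Direct Method:
If we are dealing with the actual values of the X and Y variables and not the deviations, then
$$b_{yx} = \frac{N\sum XY - \sum X\sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$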
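As a rough sketch of the direct method for Y on X, the function below computes b from the actual values and then obtains a from the first normal equation. The function name `fit_y_on_x` and the data are assumptions made for illustration, not part of the source.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the regression line y = a + b*x (direct method)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # b_yx = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - (sum(X))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

    # First normal equation: sum(Y) = N*a + b*sum(X), solved for a.
    a = (sum_y - b * sum_x) / n
    return a, b

# Illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_y_on_x(xs, ys)
print(a, b)
```

The denominator is zero when all X values are equal, in which case no line of Y on X can be fitted; the sketch does not guard against that.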
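Regression equation of X on Y

X = a + bY

$$\sum X = Na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$

Deviation Method:
$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$
where $r\,\frac{\sigma_x}{\sigma_y}$ is known as the regression coefficient of X on Y ($b_{xy}$):
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$$
Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$ denote deviations from the means.

Direct Method:
If we are dealing with the actual values of the X and Y variables and not the deviations, then
$$b_{xy} = \frac{N\sum XY - \sum X\sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}$$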
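The deviation method for X on Y can be sketched in the same spirit. The function name `fit_x_on_y` and the data are assumptions for illustration only. The intercept follows from rearranging the deviation form of the line into X = (mean of X - b * mean of Y) + b * Y.

```python
def fit_x_on_y(xs, ys):
    """Return (a, b_xy) for the regression line x = a + b_xy*y (deviation method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each observation from its mean.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]

    # b_xy = sum(xy) / sum(y^2), with x and y taken as deviations.
    b_xy = sum(p * q for p, q in zip(dev_x, dev_y)) / sum(q * q for q in dev_y)

    # Intercept obtained by expanding X - mean_x = b_xy * (Y - mean_y).
    a = mean_x - b_xy * mean_y
    return a, b_xy

# Illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b_xy = fit_x_on_y(xs, ys)
print(a, b_xy)
```

Note that this line is used to estimate X from a known Y; it is not obtained by simply inverting the line of Y on X.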
Characteristics Of Regression Coefficients
• Both the regression coefficients will have the same sign, i.e. either both positive or
both negative.
• Both the regression coefficients cannot be greater than one at the same time, because
$r = \pm\sqrt{b_{xy}\,b_{yx}}$ and r cannot exceed one in magnitude.

• The coefficient of correlation will have the same sign as that of the regression
coefficients.

• Since $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$, we can find any one of the four values given the other three.

• Regression coefficients are independent of change of origin but not of scale.
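These properties can be checked numerically. The sketch below uses made-up data and a hypothetical helper `regression_coefficients`; it recovers r from the two coefficients, then shifts the origin and rescales X to see what changes.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / s_yy

# Illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(xs, ys)

# r is the square root of the product, taking the common sign of the coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Change of origin: add constants to X and Y. The coefficients do not change.
shifted = regression_coefficients([x + 100 for x in xs], [y - 50 for y in ys])

# Change of scale: multiply X by 10. b_yx is divided by 10 and b_xy is multiplied by 10.
scaled = regression_coefficients([10 * x for x in xs], ys)

print(b_yx, b_xy, r)
print(shifted)
print(scaled)
```

The product of the two coefficients is unaffected by the rescaling, which is consistent with r being independent of both origin and scale.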


Standard error of estimate
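Fundamental formula
$$SE_{Y \text{ on } X} = \sqrt{\frac{\sum (Y - Y_e)^2}{n}}$$
$$SE_{X \text{ on } Y} = \sqrt{\frac{\sum (X - X_e)^2}{n}}$$
where $Y_e$ and $X_e$ are the values computed from the respective regression equations and
n is the number of pairs of observations.

Value based formula
$$SE_{Y \text{ on } X} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n}}$$
$$SE_{X \text{ on } Y} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{n}}$$
where a and b are the constants of the corresponding regression equation (Y on X in the
first formula, X on Y in the second).

Correlation coefficient based formulae
$$SE_{Y \text{ on } X} = \sigma_y\sqrt{1 - r^2}$$
$$SE_{X \text{ on } Y} = \sigma_x\sqrt{1 - r^2}$$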
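The three formulae for the standard error of Y on X should agree on the same data. The sketch below checks this on made-up data (an assumption, not from the source). It takes the standard deviations as population values, dividing by n, to match the division by n in the fundamental formula.

```python
import math

# Illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Constants of the regression line of Y on X (direct method).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# 1. Fundamental formula: deviations of actual Y from computed Y.
se_fundamental = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)

# 2. Value based formula: uses only the sums and the constants a and b.
se_value = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / n)

# 3. Correlation coefficient based formula: sigma_y * sqrt(1 - r^2).
var_x = sum_x2 / n - (sum_x / n) ** 2
var_y = sum_y2 / n - (sum_y / n) ** 2
cov_xy = sum_xy / n - (sum_x / n) * (sum_y / n)
r = cov_xy / math.sqrt(var_x * var_y)
se_correlation = math.sqrt(var_y) * math.sqrt(1 - r ** 2)

print(se_fundamental, se_value, se_correlation)
```

The three printed values match up to floating-point rounding. The same check for X on Y follows by swapping the roles of the two variables.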
Difference Between Correlation And
Regression Analysis
Correlation Analysis measures the degree of covariability between two or more variables,
whereas Regression Analysis measures the average relationship between two or more closely
related variables.

Correlation Analysis measures both the degree and the nature of the relationship, whereas
Regression Analysis aims at establishing the functional relationship between the variables and
then using it to predict or estimate the unknown value of one variable from the known values
of another related variable.

While Correlation Analysis is not necessarily based on a cause-and-effect relationship,
Regression Analysis presumes such a relationship between the two related
variables. The variable corresponding to the cause is taken as the independent variable and the
variable corresponding to the effect is taken as the dependent variable.

Correlation Analysis yields only one coefficient, r, whereas Regression Analysis yields
two coefficients, bxy and byx.

The coefficient of correlation can never exceed unity in magnitude (it lies between -1 and +1),
whereas either of the regression coefficients can exceed unity. However, both regression
coefficients cannot exceed unity at the same time, because their product equals r².

The coefficient of correlation is independent of change of both origin and scale, whereas the
regression coefficients are independent of change of origin but not of scale.

In Correlation Analysis rxy = ryx, whereas in general the regression coefficients bxy ≠ byx.

Correlation Analysis may show a nonsense (spurious) correlation between two variables, whereas
Regression Analysis presupposes a meaningful relationship between them.

Correlation Analysis is confined to the study of the linear relationship between any two variables;
Regression Analysis can be used to study both linear and nonlinear relationships.

Correlation Analysis is independent of the units of measurement; Regression Analysis is
not independent of the units of measurement.

Correlation Analysis gives a pure number, while Regression Analysis gives a curve or an
equation for determining the values of the dependent variable.
Utilities And Limitations Of
Regression Analysis
It provides a functional relationship between variables. We can easily estimate or predict
the unknown values of one variable from the known values of another variable.

It provides a measure of the error of the estimates made through the regression lines.

We can calculate the correlation coefficient by r = ±√(bxy × byx), taking the sign of the
regression coefficients.

We can calculate the coefficient of determination by r² = bxy × byx.

The greater the value of r², the better the fit and the more useful the regression
equations are as a predictive device.

It involves a very lengthy and complicated procedure of calculation and analysis.

It cannot be used in the case of qualitative phenomena such as honesty, crime, etc.
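To illustrate the coefficient of determination, the sketch below computes r² as the product of the two regression coefficients and cross-checks it against the share of the variation in Y accounted for by the line of Y on X. The data and the helper name are assumptions for illustration; the cross-check is a standard identity for simple linear regression rather than something stated in the source.

```python
def determination_two_ways(xs, ys):
    """Return (b_xy * b_yx, 1 - unexplained/total variation in y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)

    b_yx = s_xy / s_xx
    b_xy = s_xy / s_yy
    r_squared = b_xy * b_yx

    # Variation in y left unexplained by the regression line of y on x.
    a = mean_y - b_yx * mean_x
    unexplained = sum((y - (a + b_yx * x)) ** 2 for x, y in zip(xs, ys))
    explained_share = 1 - unexplained / s_yy
    return r_squared, explained_share

# Illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_squared, explained_share = determination_two_ways(xs, ys)
print(r_squared, explained_share)
```

The two values agree up to rounding, which is why a larger r² indicates a regression equation that is more useful for prediction.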
