
Regression Analysis

Regression
“Regression is the measure of the average
relationship between two or more variables in
terms of the original units of the data”.
“Regression analysis is an attempt to establish
the nature of the relationship between variables,
that is, to study the functional relationship
between the variables and thereby provide a
mechanism for prediction or forecasting”.
Regression analysis is a statistical device with the help of
which we are able to estimate the unknown values of one
variable from known values of another variable. The
variable which is used to predict the other variable is
called the independent variable (explanatory variable), and the
variable we are trying to predict is called the dependent variable
(explained variable).
The dependent variable is denoted by Y and the independent
variable is denoted by X.
When a single independent variable is used, the analysis is
called simple linear regression analysis. It is called simple
because there is only one predictor (independent variable). It is
called linear because it is assumed that there is a linear
relationship between the independent variable and the dependent
variable.
Types of Regression
There are two types of regression: simple linear
regression and multiple regression.
Simple Linear Regression:
It is a type of regression which uses one independent
variable to explain and/or predict the dependent
variable.
Multiple Regression:
It is a type of regression which uses two or more
independent variables to explain and/or predict the
dependent variable.
Regression Lines:
A regression line is a graphic technique to show the functional
relationship between the two variables X and Y. It is a line which
shows the average relationship between the two variables X and Y.
If there is perfect correlation between the two variables (r = +1 or
r = -1), the two regression lines coincide and give one line.
There are two distinct regression lines when the correlation between
the two variables is not perfect. The nearer the two regression
lines are to each other, the higher is the degree of correlation, and
the farther the regression lines are from each other, the lower is the
degree of correlation.
Properties of Regression lines:-
1. The two regression lines cut each other at the point of the average
of X and the average of Y, i.e. at $(\bar{X}, \bar{Y})$.
2. When r = ±1, the two regression lines coincide with each other and
give one line.
3. When r = 0, the two regression lines are mutually perpendicular.
Regression Equations (Estimating Equations)

Regression equations are algebraic expressions of the
regression lines. Since there are two regression lines,
there are two regression equations. They are :-
1. Regression Equation of X on Y:- This is used to
describe the variations in the values of X for given
changes in Y.
2. Regression Equation of Y on X :- This is used to
describe the variations in the value of Y for given
changes in X.
Population Linear Regression
The population regression model:
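$$ y = \beta_0 + \beta_1 x + \varepsilon $$

where y is the dependent variable, $\beta_0$ is the population y intercept,
$\beta_1$ is the population slope coefficient, x is the independent variable,
and $\varepsilon$ is the random error term, or residual.
$\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random
error component.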
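Population Linear Regression (continued)

[Figure: the model $y = \beta_0 + \beta_1 x + \varepsilon$ plotted with y against x. The line has intercept $\beta_0$ and slope $\beta_1$. For a given value $x_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of y and the predicted value of y on the line.]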
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line:

$$ \hat{y}_i = b_0 + b_1 x $$

where $\hat{y}_i$ is the estimated (or predicted) y value, $b_0$ is the
estimate of the regression intercept, $b_1$ is the estimate of the
regression slope, and x is the independent variable.

The individual random error terms $e_i$ have a mean of zero.


The Least Squares Equation
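• The formulas for b1 and b0 are:

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x} $$

Algebraic equivalent:

$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} $$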
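The two forms of the slope formula can be compared numerically. The sketch below is only illustrative: the function names and the four data points are made up for this check and do not come from the slides.

```python
def slope_deviation_form(xs, ys):
    """b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def slope_computational_form(xs, ys):
    """b1 = (sum(xy) - sum(x)sum(y)/n) / (sum(x^2) - (sum(x))^2/n)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)


def intercept(xs, ys, b1):
    """b0 = y_bar - b1 * x_bar."""
    return sum(ys) / len(ys) - b1 * sum(xs) / len(xs)


# Hypothetical data, chosen only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

b1_dev = slope_deviation_form(xs, ys)
b1_comp = slope_computational_form(xs, ys)
b0 = intercept(xs, ys, b1_dev)
print(b1_dev, b1_comp, b0)
```

Both functions should return the same slope up to floating-point rounding. The computational form avoids computing the means first, which is convenient for hand calculation.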
Regression Equation of Y on X:-

Solving the normal equations for b0 and b1 gives: -

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} $$
Regression Equation of X on Y:-

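Solving the normal equations for b0 and b1 gives (here b0 and b1 are
the intercept and slope of the line x = b0 + b1 y): -

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}, \qquad b_0 = \bar{x} - b_1 \bar{y} $$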
Interpretation of the Slope and the Intercept

• b0 is the estimated average value of y when
the value of x is zero
• b1 is the estimated change in the average
value of y as a result of a one-unit change
in x
Regression Equation of Y on X:-

The normal equations

Hence Regression equation Y on X is:

If x=20 then y=2.276


Similarly, we can find regression equation X on Y

If y=10 then x=6.6388


Simple Linear Regression Example

• A real estate agent wishes to examine the relationship
between the selling price of a home and its size
(measured in square feet)
• A random sample of 10 houses is selected
– Dependent variable (y) = house price in $1000s
– Independent variable (x) = square feet
Sample Data for House Price Model
House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
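As a rough check, the least squares formulas can be applied directly to the ten sample houses. The sketch below uses only the data in the table. The variable names are illustrative, and the result should agree, up to rounding, with the intercept and slope reported for the house price model.

```python
# Sample data from the table: price in $1000s (y) and size in square feet (x).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
s_xx = sum((x - x_bar) ** 2 for x in sizes)

b1 = s_xy / s_xx           # estimated slope
b0 = y_bar - b1 * x_bar    # estimated intercept

print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

The fitted line passes through the point of means, (1715 square feet, 286.5 thousand dollars), as every least squares line does.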
Graphical Presentation
• House price model: scatter plot and
regression line

[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]

house price = 98.24833 + 0.10977 (square feet)


Interpretation of the Intercept, b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated average value of Y when
the value of X is zero (if x = 0 is in the range of
observed x values)
– Here, no houses had 0 square feet, so b0 =
98.24833 just indicates that, for houses within
the range of sizes observed, $98,248.33 is the
portion of the house price not explained by
square feet
Interpretation of the Slope Coefficient, b1

house price = 98.24833 + 0.10977 (square feet)

• b1 measures the estimated change in the average
value of Y as a result of a one-unit change in X
– Here, b1 = 0.10977 tells us that the average value of a
house increases by 0.10977($1000) = $109.77, on
average, for each additional one square foot of size
Q. For the following data, find both lines of regression.
X: 0 2 5 3 0
Y: 4 7 3 5 1
Ans:
Regression line of Y on X:
Y = 3.56 + 0.22X
Regression line of X on Y:
X = 1.2 + 0.2Y
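The answer to this exercise can be reproduced with a short sketch. The helper `fit_line` is a hypothetical name introduced here; it applies the least squares formulas once with X as the predictor and once with Y as the predictor.

```python
def fit_line(us, vs):
    """Least squares line v = a + b*u; returns (a, b)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    b = s_uv / s_uu
    a = v_bar - b * u_bar
    return a, b


X = [0, 2, 5, 3, 0]
Y = [4, 7, 3, 5, 1]

a_yx, b_yx = fit_line(X, Y)   # Y on X: Y = a_yx + b_yx * X
a_xy, b_xy = fit_line(Y, X)   # X on Y: X = a_xy + b_xy * Y

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f}Y")
```

Both lines pass through the point of means (2, 4), which is where the two regression lines intersect.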
Important assumptions in regression analysis:
1. There should be a linear and additive relationship between the dependent
(response) variable and the independent (predictor) variable(s). A linear
relationship suggests that the change in response Y due to a one-unit change
in X1 is constant, regardless of the value of X1. An additive relationship
suggests that the effect of X1 on Y is independent of the other variables.
2. There should be no correlation between the residual (error) terms.
The presence of such correlation is known as autocorrelation.
3. The independent variables should not be correlated with one another.
The presence of such correlation is known as multicollinearity.
4. The error terms must have constant variance. This property is
known as homoskedasticity. The presence of non-constant variance is
referred to as heteroskedasticity.
5. The error terms must be normally distributed.
Regression Coefficients
The quantity b in the regression equations is
called the regression coefficient or slope
coefficient.
Since there are two regression equations,
there are two regression coefficients: the
regression coefficient of X on Y and the regression
coefficient of Y on X.
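Regression coefficient of X on Y:

$$ b_{xy} = r \frac{\sigma_x}{\sigma_y} $$

Regression coefficient of Y on X:

$$ b_{yx} = r \frac{\sigma_y}{\sigma_x} $$

where r is the coefficient of correlation and $\sigma_x$, $\sigma_y$ are the
standard deviations of X and Y.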
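Regression equation using mean and
regression coefficient
Regression equation of Y on X:

$$ Y - \bar{Y} = b_{yx} (X - \bar{X}) $$

Regression equation of X on Y:

$$ X - \bar{X} = b_{xy} (Y - \bar{Y}) $$

Here $\bar{X}$ and $\bar{Y}$ are the means of X and Y, $b_{yx}$ is the
regression coefficient of Y on X, and $b_{xy}$ is the regression
coefficient of X on Y.
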
Q. You are given the following results for the
height (X) and weight (Y) of 1000 managers.
Mean(X) = 68.00”
Mean(Y) = 150 lbs
Standard deviation(X) = 2.50”
Standard deviation(Y) = 20 lbs
Coefficient of correlation between X and Y = 0.6.
Estimate from the above data the height of a
manager whose weight is 200 lbs.
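One way to approach this question is through the regression equation of X on Y, with the regression coefficient taken as r times the ratio of the standard deviations. The sketch below assumes that standard formula. The function name and its parameters are illustrative only.

```python
def estimate_x_given_y(mean_x, mean_y, sd_x, sd_y, r, y):
    """Estimate X from Y using X - mean_x = b_xy * (Y - mean_y),
    where b_xy = r * sd_x / sd_y (regression coefficient of X on Y)."""
    b_xy = r * sd_x / sd_y
    return mean_x + b_xy * (y - mean_y)


# Figures from the question: height X in inches, weight Y in lbs.
height = estimate_x_given_y(
    mean_x=68.0, mean_y=150.0, sd_x=2.5, sd_y=20.0, r=0.6, y=200.0
)
print(f"Estimated height: {height:.2f} inches")
```

With these figures the regression coefficient of X on Y is 0.075, so a weight 50 lbs above the mean corresponds to an estimated height 3.75 inches above the mean height.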
Q. The average daily wage for the working class in
Nagpur is $12 and that in Delhi is $18. Their
respective standard deviations are $2 and $3,
and the coefficient of correlation is 0.67. Find the
most likely wage in Delhi corresponding to a
wage of $20 in Nagpur.

Ans:
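A possible working, assuming X denotes the Nagpur wage and Y the Delhi wage (labels chosen here for illustration), and using the regression equation of Y on X with the coefficient taken as r times the ratio of the standard deviations:

```latex
\begin{align*}
b_{yx} &= r \, \frac{\sigma_y}{\sigma_x} = 0.67 \times \frac{3}{2} = 1.005 \\
Y - \bar{Y} &= b_{yx} \, (X - \bar{X}) \\
Y - 18 &= 1.005 \, (20 - 12) = 8.04 \\
Y &= 26.04
\end{align*}
```

Under these assumptions the most likely wage in Delhi would be about $26.04.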
Q. For the data given below:

                            Average    S.D.
Production (in units)       35         10
Capacity utilisation (%)    85         8

Coefficient of correlation = 0.6
Obtain the two regression equations.
Estimate the production when the capacity utilisation is
70 per cent.
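A minimal sketch of this exercise follows, assuming each regression coefficient is r times the ratio of the relevant standard deviations. The variable names are illustrative, with P standing for production and C for capacity utilisation.

```python
# Summary statistics from the question.
mean_p, sd_p = 35.0, 10.0   # production (units)
mean_c, sd_c = 85.0, 8.0    # capacity utilisation (%)
r = 0.6                     # coefficient of correlation

# Regression coefficients: production on capacity, and capacity on production.
b_pc = r * sd_p / sd_c
b_cp = r * sd_c / sd_p

# Rewrite each equation in intercept-slope form.
a_pc = mean_p - b_pc * mean_c   # P = a_pc + b_pc * C
a_cp = mean_c - b_cp * mean_p   # C = a_cp + b_cp * P

production_at_70 = a_pc + b_pc * 70.0

print(f"P = {a_pc:.2f} + {b_pc:.2f} C")
print(f"C = {a_cp:.2f} + {b_cp:.2f} P")
print(f"Estimated production at 70% utilisation: {production_at_70:.2f}")
```

Worked by hand, the same steps give P = 0.75C - 28.75 and C = 0.48P + 68.2, so the estimated production at 70 per cent capacity utilisation is 23.75 units.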
Properties of the regression coefficients
(1) The coefficient of correlation is the geometric mean
of the two regression coefficients:

$$ r = \pm \sqrt{b_{xy} \, b_{yx}} $$

(2) If one of the regression coefficients is greater
than unity, the other must be less than unity, since
the value of the coefficient of correlation cannot exceed
unity.
(3) Both regression coefficients will have the
same sign.
(4) The correlation coefficient will have the same sign
as that of the regression coefficients.
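These properties can be checked numerically. The sketch below reuses the X and Y values from the exercise earlier in these notes and computes the correlation coefficient and both regression coefficients from the same sums of deviations. The variable names are illustrative.

```python
import math

X = [0, 2, 5, 3, 0]
Y = [4, 7, 3, 5, 1]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
s_yy = sum((y - y_bar) ** 2 for y in Y)

b_yx = s_xy / s_xx                    # regression coefficient of Y on X
b_xy = s_xy / s_yy                    # regression coefficient of X on Y
r = s_xy / math.sqrt(s_xx * s_yy)     # coefficient of correlation

# Property (1): r is the geometric mean of the two coefficients,
# taking the sign shared by both coefficients.
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(r, geometric_mean)
print("product of coefficients:", b_yx * b_xy)
```

Because the product of the two coefficients equals r squared, it can never exceed one, which is why both coefficients cannot be greater than unity at the same time.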
