
CHAPTER TWO

Simple Linear Regression Analysis

 Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables),
 with a view to estimating the mean or average value of the dependent variable in terms of the known or fixed (in repeated sampling) values of the independent variable.


 Economic theories are mainly concerned with the relationships among various economic variables.
 When these relationships are phrased in mathematical terms, we can predict the effect of one variable on another.
 The specific functional form may be linear, quadratic, logarithmic, exponential, hyperbolic, or some other form.
 Statistical versus Deterministic Relationships
 A relationship between X and Y, characterized as Y = f(X), is said to be deterministic or non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y).
 A relationship between X and Y is said to be stochastic if for a particular value of X there is a whole probability distribution of values of Y, as sketched below.
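A minimal Python sketch of this idea (the coefficients and disturbance spread are illustrative assumptions, not values from the text): for one fixed X, a stochastic relationship produces a whole distribution of Y values.

```python
# For a fixed X, Y = b1 + b2*X + u yields a distribution of Y, not a single value.
import numpy as np

rng = np.random.default_rng(0)

b1, b2 = 10.0, 0.6                    # assumed coefficients for the demonstration
X = 100.0                             # one fixed value of the independent variable
u = rng.normal(0.0, 5.0, size=1000)   # random disturbances with mean zero
Y = b1 + b2 * X + u                   # 1000 different Y values for the same X

print(Y.mean())   # close to the deterministic part, 10 + 0.6*100 = 70
print(Y.std())    # close to the disturbance spread, 5
```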
REGRESSION VERSUS CORRELATION

 Regression is concerned with estimating the average value of the dependent variable on the basis of the known or fixed values of the explanatory variable.
 The primary objective of correlation analysis is to measure the degree of linear association between two variables.
 For example, we may be interested in finding the
correlation (coefficient) between smoking and
lung cancer
 In regression we try to estimate or predict the
average value of one variable on the basis of
the fixed values of other variables
 In regression analysis there is an asymmetry in the
way the dependent and explanatory variables are
treated.
 The dependent variable is assumed to be statistical (random).
 The explanatory variables are assumed to have fixed values.
 Correlation analysis treats any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables.
TERMINOLOGY AND NOTATION
SIMPLE LINEAR REGRESSION MODEL

 The relationship between the dependent and independent variables suggested by economic theory is usually specified as an exact or deterministic relationship.
 But in reality the relationships between economic variables are inexact, stochastic, or non-deterministic in nature.
 Simple linear regression is bivariate: it involves two variables.
 It indicates a relationship between two variables related in a linear form.
 Regression analysis is largely concerned with
estimating and/or predicting the mean value of
the dependent variable on the basis of the known
or fixed values of the explanatory variable
CONCEPT OF POPULATION REGRESSION

The conditional mean E(Y | Xi) is a function of Xi, where Xi is a given value of X. Symbolically:
E(Y | Xi) = f(Xi) ………………(2.1)
 E(Y | Xi) is assumed to be a linear function of Xi and is known as the population regression function (PRF) or conditional expectation function (CEF). It is generally denoted as:
E(Y | Xi) = β1 + β2Xi ………………...(2.2)
 Where β1 and β2 are known as the regression coefficients:
 β1 and β2 are the intercept and slope coefficients, respectively.
1. Stochastic Specification

 Assume that the supply of a certain commodity depends on its price,
 and that this relationship is deterministic (non-stochastic).
 If this were true, all the price–quantity pairs, if plotted on a two-dimensional plane, would fall on a straight line.
However, if we gather observations on the quantity actually supplied in the market at various prices and plot them on a diagram, we see that they do not fall on a straight line.
The deviation of the observations from the line may be attributed to several factors:
a. Omission of variables from the function
b. Random behavior of human beings
c. Imperfect specification of the mathematical form
of the model
d. Error of aggregation

e. Error of measurement
 2.1 A HYPOTHETICAL EXAMPLE
 Regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s). To understand this, consider Table 2.1, which refers to a total population of 60 families in a hypothetical community, together with their weekly income (X) and weekly consumption expenditure (Y), both in dollars.
TABLE 2.1 WEEKLY FAMILY INCOME X ($) AND WEEKLY FAMILY CONSUMPTION EXPENDITURE Y ($)

X →                  80   100   120   140   160   180   200   220   240   260

Weekly family        55    65    79    80   102   110   120   135   137   150
consumption          60    70    84    93   107   115   136   137   145   152
expenditure Y, $     65    74    90    95   110   120   140   140   155   175
                     70    80    94   103   116   130   144   152   165   178
                     75    85    98   108   118   135   145   157   175   180
                      –    88     –   113   125   140     –   160   189   185
                      –     –     –   115     –     –     –   162     –   191

Total               325   462   445   707   678   750   685  1043   966  1211

Conditional means
of Y, E(Y | X)       65    77    89   101   113   125   137   149   161   173
The 60 families are divided into 10 income
groups (from $80 to $260)
Therefore, we have 10 fixed values of X and
the corresponding Y values against each of
the X values
 Corresponding to the weekly income level of $80, the mean consumption expenditure is $65, while corresponding to the income level of $200, it is $137.
 In all, we have 10 mean values for the 10 subpopulations of Y.
 We call these mean values conditional expected values, denoted by E(Y | X).
 E(Y) is the unconditional expected value: add the weekly consumption expenditures of all 60 families in the population and divide the sum by 60.
 The dark circled points in Figure 2.1 show the
conditional mean values of Y against the various
X values.
 If we join these conditional mean values, we obtain what is known as the population regression line (PRL); it is the regression of Y on X. (The conditional means themselves can be recomputed as in the sketch below.)
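The conditional means in Table 2.1 can be verified directly. The following minimal Python sketch (not part of the original slides) recomputes E(Y | X) for each income group from the table's entries:

```python
# Recompute the conditional means E(Y | X) from Table 2.1.
# Keys are weekly income X; values are the observed weekly expenditures Y.
import numpy as np

table = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

for x, ys in table.items():
    # Prints 65, 77, 89, 101, 113, 125, 137, 149, 161, 173 — the E(Y | X) row.
    print(x, np.mean(ys))
```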
2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION
(PRF)

 The conditional mean E(Y | Xi) is a function of Xi:
 E(Y | Xi) = f(Xi) ….………………(2.2.1)
 Equation (2.2.1) is known as the conditional expectation function (CEF) or population regression function (PRF).
 Assume the PRF E(Y | Xi) is a linear function of Xi, say of the type:
 E(Y | Xi) = β1 + β2Xi ………………(2.2.2)
 β1 and β2 are unknown but fixed parameters known as the regression coefficients.
 In regression analysis our interest is in estimating PRFs like (2.2.2),
 i.e. estimating the values of the unknowns β1 and β2 on the basis of observations on Y and X.


 Analogously to the PRF that underlies the population regression line, we can develop the concept of the sample regression function (SRF).
 The primary objective in regression analysis is to estimate the PRF (Yi = β1 + β2Xi + ui)
 on the basis of the SRF (Yi = β̂1 + β̂2Xi + ûi, or Yi = β̂1 + β̂2Xi + ei).
In terms of the SRF, the observed Yi can be expressed as:
 Yi = Ŷi + ûi, or Yi = Ŷi + ei
 And in terms of the PRF, it can be expressed as:
 Yi = E(Y | Xi) + ui
 In the figure comparing the SRF and the PRF (not reproduced here), Ŷi overestimates the true E(Y | Xi) for the Xi shown,
 while for any Xi to the left of point A, the SRF underestimates the true PRF.

 The SRF in its stochastic form is written as follows:
 Yi = β̂1 + β̂2Xi + ûi
 where, in addition to the symbols already defined, ûi denotes the (sample) residual term.

Assumptions of the Classical Linear Stochastic Regression Model
 In regression analysis our objective is not only to obtain β̂1 and β̂2,
 but also to draw inferences about the true β1 and β2:
 we would like to know how close β̂1 and β̂2 are to their counterparts in the population,
 and how close Ŷi is to the true E(Y | Xi).
The most important assumptions are:

 Assumption 1: The model is linear in parameters
 The model must be linear in the parameters, regardless of whether the explanatory and dependent variables themselves enter linearly or not.
Assumption 2: Ui is a random real variable

The value which u may assume in any one period


depends on chance; it may be positive, negative or zero
Assumption 3: The mean value of the random variable (u) in any particular period is zero
Mathematically: E(ui | Xi) = 0
Assumption 4: The variance of the random

variable (U) is constant (homoscedasticity)


 The u's will show the same dispersion around their mean.
 The values that u can assume lie within the same limits, irrespective of the value of X.
Assumption 5: The random variable (u) has a normal distribution
The values of u (for each X) have a bell-shaped symmetrical distribution about their zero mean and constant variance σ², i.e. ui ∼ N(0, σ²)

Assumption 6: No autocorrelation between the disturbances
Given any two X values, Xi and Xj (i ≠ j),
 the correlation between any two disturbances ui and uj is zero:
 cov(ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
 = E(ui | Xi)E(uj | Xj) = 0
 Assumption7: X Values are fixed in repeated
sampling. Values taken by the regressor X are
considered fixed in repeated samples. More
technically, X is assumed to be non-stochastic.
 Assumption 8: The random variable (u) is independent of the explanatory variables (zero covariance between ui and Xi, i.e. E(uiXi) = 0):
cov(Xi, ui) = E{[Xi − E(Xi)][ui − E(ui)]}
 = E{[Xi − E(Xi)]ui}, given E(ui) = 0
 = E(Xiui) − E(Xi)E(ui)
 = E(Xiui)
 = Xi E(ui), given that the Xi are fixed
 = 0
Assumption 9: The explanatory variables are measured without error
 The regressors are error-free, while the Y values may or may not include errors of measurement.
 Assumption 10: The number of observations n must be greater than the number of parameters to be estimated
 Assumption 11: Variability in X values
 The X values in a given sample must not all be
the same
 Assumption 12: The regression model is correctly specified
1. Methods of estimation

The parameters of the simple linear regression model


can be estimated by various methods.
1. Ordinary least squares method (OLS)

 The classical least squares (CLS) method involves finding values for the estimates which minimize the sum of the squared residuals.
 From the estimated relationship Yi = β̂1 + β̂2Xi + ei, we obtain:
 ei = Yi − β̂1 − β̂2Xi ……………………………(2.9a)
 Σei² = Σ(Yi − β̂1 − β̂2Xi)² ………………(2.10a)
 To find the values of β̂1 and β̂2 that minimize this sum, we have to partially differentiate Σei² with respect to β̂1 and β̂2 and set the partial derivatives equal to zero.
 1. ∂Σei²/∂β̂1 = −2Σ(Yi − β̂1 − β̂2Xi) = 0 …………………..(2.12a)
 Dividing equation (2.12a) by n gives β̂1 = Ȳ − β̂2X̄ ……(2.13)
 2. ∂Σei²/∂β̂2 = −2ΣXi(Yi − β̂1 − β̂2Xi) = 0 ……(2.14a)
 Since Yi − β̂1 − β̂2Xi = ei is the residual, equations (2.12a) and (2.14a) can be written as Σei = 0 and ΣXiei = 0 ….(2.15a)
If we rearrange equation (2.14a) we obtain:
 ΣXiYi = β̂1ΣXi + β̂2ΣXi² ………………….(2.16a)
Substituting the value of β̂1 from (2.13) into (2.16a), we get:
 β̂2 = (nΣXiYi − ΣXiΣYi) / (nΣXi² − (ΣXi)²) …………………………….(2.17a)
 Now, denoting (Xi − X̄) as xi and (Yi − Ȳ) as yi, equation (2.17a) can be written as:
 β̂2 = Σxiyi / Σxi² ……………………………….(2.20a)
 The expression in (2.20a) used to estimate the slope coefficient is termed the formula in deviation form; a numeric sketch follows below.
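The following minimal Python sketch (not part of the slides) applies the deviation-form formulas. Using the conditional means of Table 2.1 as the Y values, it recovers an intercept of 17 and a slope of 0.6:

```python
# OLS in deviation form: beta2_hat = sum(x*y)/sum(x^2)      (eq. 2.20a)
#                        beta1_hat = Ybar - beta2_hat*Xbar  (eq. 2.13)
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([65, 77, 89, 101, 113, 125, 137, 149, 161, 173], dtype=float)

x = X - X.mean()            # deviations of X from its mean
y = Y - Y.mean()            # deviations of Y from its mean

beta2_hat = (x * y).sum() / (x ** 2).sum()     # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()    # intercept

print(beta1_hat, beta2_hat)   # 17.0 and 0.6 for these data
```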
Precision or Standard Errors of Least-Squares
Estimates
In statistics the precision of an estimate is measured by its
standard error (se).

 Under the Gaussian assumptions, the standard errors of the OLS estimates can be obtained:
 var(β̂) = σ̂²/Σx², SE(β̂) = √var(β̂)
 var(α̂) = σ̂²ΣX²/(nΣx²), SE(α̂) = √var(α̂)
 where Σûi² = Σy² − (Σxy)²/Σx²
 Where: var = variance and se = standard error, and σ² is the constant (homoscedastic) variance of the disturbances.
 σ̂² = Σûi²/(n − 2) is the OLS estimator of the true but unknown σ².
 The expression n − 2 is known as the number of degrees of freedom (df).
 Σûi² is the sum of the squared residuals, or the residual sum of squares (RSS).
 Once Σûi² is known, σ̂² can be easily computed.
 σ̂, the positive square root of σ̂², is known as the standard error of estimate or the standard error of the regression (se).
 NB:
 1. The variance of β̂ is directly proportional to σ² but inversely proportional to Σx².
 Given σ², if there is substantial variation in the X values,
 β̂ can be measured more accurately than when the Xi do not vary substantially.
 Given Σx², the larger σ², the larger the variance of β̂.
 2. The variance of α̂ is directly proportional to σ² and ΣX² but inversely proportional to Σx² and the sample size n. A numeric sketch of these formulas follows below.
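A minimal sketch of these variance formulas in Python. The sample here (one Y value drawn from each income group of Table 2.1) is an illustrative assumption, not a dataset from the slides:

```python
# Standard errors of the OLS estimates:
#   var(slope)     = sigma2_hat / sum(x^2)
#   var(intercept) = sigma2_hat * sum(X^2) / (n * sum(x^2))
#   sigma2_hat     = RSS / (n - 2)
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([60, 74, 90, 95, 110, 115, 136, 140, 155, 152], dtype=float)
n = len(Y)

x = X - X.mean()
beta2_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1_hat = Y.mean() - beta2_hat * X.mean()

resid = Y - (beta1_hat + beta2_hat * X)       # the residuals u_hat
sigma2_hat = (resid ** 2).sum() / (n - 2)     # unbiased estimator of sigma^2

se_beta2 = np.sqrt(sigma2_hat / (x ** 2).sum())
se_beta1 = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum()))

print(se_beta1, se_beta2)                     # the standard errors
```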
Properties of Least Squares (OLS) Estimators:
the Gauss-Markov Theorem
The closeness of an estimate to the population parameter is measured by
 the mean and variance, or
 standard deviation, of the sampling distribution of the estimates.
 The well-known Gauss-Markov theorem says that the OLS estimators are BLUE
 (Best Linear Unbiased Estimators).
 An estimator is called BLUE if it is:
 Linear: a linear function of the random variable Y;
 Unbiased: its average or expected value is equal to the true population parameter;
 Minimum variance: it has the minimum variance in the class of linear unbiased estimators.
An unbiased estimator with the least variance is
known as an efficient estimator.
 Proofs of these properties:
A. Linearity (for β̂2)
Proposition: β̂1 and β̂2 are linear in Y.
Proof: the OLS estimator of β2 is given by:
 β̂2 = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi²
 But ΣxiȲ = ȲΣxi = 0, so β̂2 = ΣxiYi/Σxi²
 Now let ki = xi/Σxi² (fixed, since the Xi are fixed); then
 β̂2 = ΣkiYi … hence β̂2 is linear in Y.
B. Linearity of β̂1
Proof: Substitute β̂2 = ΣkiYi into equation (2.13), β̂1 = Ȳ − β̂2X̄:
 β̂1 = Ȳ − X̄ΣkiYi
 = (1/n)ΣYi − X̄ΣkiYi
 = Σ[(1/n) − X̄ki]Yi
 We know that ki, n, and X̄ are constants, so
 β̂1 is a linear function of the dependent variable Yi.

 C. Unbiasedness
Proposition: β̂1 and β̂2 are unbiased estimators of the true parameters β1 and β2.
 If θ̂ is an estimator of θ,
 then bias(θ̂) = E(θ̂) − θ.
If θ̂ is an unbiased estimator of θ, then bias = 0, i.e. E(θ̂) = θ.
 So β̂1 and β̂2 are unbiased estimators of the true parameters if E(β̂1) = β1 and E(β̂2) = β2.
 Proof (1): Prove that β̂2 is unbiased, i.e. E(β̂2) = β2.
 We know that β̂2 = ΣkiYi = Σki(β1 + β2Xi + ui)
 = β1Σki + β2ΣkiXi + Σkiui = β2 + Σkiui …………………………………..(2.23)
 (using Σki = 0 and ΣkiXi = 1)
 E(β̂2) = β2 + ΣkiE(ui) = β2 ………….(2.24)
 since E(ui) = 0.
Therefore, β̂2 is an unbiased estimator of β2 (see the simulation sketch after Proof 2).
 Proof (2): Prove that α̂ is unbiased, i.e. E(α̂) = α:
 left as a reading assignment.
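The unbiasedness result can be illustrated by simulation. In this minimal sketch (the "true" parameter values and disturbance spread are assumptions for the demo, not values from the slides), the average OLS slope over many samples approaches the true β2:

```python
# Monte Carlo check of E(beta2_hat) = beta2: average the OLS slope over
# many samples generated from a known PRF with fixed X values.
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 17.0, 0.6               # assumed "true" parameters
X = np.linspace(80, 260, 10)           # fixed regressors (Assumption 7)
x = X - X.mean()

slopes = []
for _ in range(5000):
    u = rng.normal(0.0, 10.0, size=X.size)   # disturbances with E(u) = 0
    Y = beta1 + beta2 * X + u
    slopes.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())

print(np.mean(slopes))                 # very close to 0.6, as unbiasedness predicts
```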


 Statistical Tests of Significance of the OLS Estimators
 After the estimation of the parameters,
 we need to know how 'good' the fit of this line is
 to the sample observations of Y and X.
 The two most commonly used first-order tests in econometric analysis are:
 i. The coefficient of determination R2
 This test is used for judging the explanatory power
of the independent variable(s).
 ii. The standard error tests of the estimators.
 This test is used for judging the statistical
reliability of the estimates of the regression
coefficients
1. TESTS OF THE ‘GOODNESS OF FIT’ WITH R²
 R² shows the percentage of the total variation of the dependent variable that can be explained by the changes in the explanatory variable(s) included in the model.
 Draw a horizontal line corresponding to the mean value of the dependent variable, Ȳ.
 By fitting the regression line,
 we try to explain the variation of the dependent variable Y produced by the changes of the explanatory variable X.
[Figure: scatter diagram with the fitted line Ŷ = β̂1 + β̂2X and the horizontal line Y = Ȳ, decomposing the deviation of an observation Y from Ȳ into the explained part Ŷ − Ȳ and the residual e = Y − Ŷ]
 e = Y − Ŷ = deviation of the observation Yi from the regression line
 y = Y − Ȳ = deviation of Y from its mean
 ŷ = Ŷ − Ȳ = deviation of the regressed (predicted) value Ŷ from the mean
 In deviation form: y = ŷ + e
 Squaring and summing both sides: Σy² = Σŷ² + Σe² + 2Σŷe
 But Σŷe = 0, so
 Σy² = Σŷ² + Σe²
 Thus: total variation = explained variation + residual variation (TSS = ESS + RSS),
 or R² = Σŷ²/Σy² = 1 − Σe²/Σy²
 We know ŷ = β̂2x in deviation form. Squaring and summing both sides gives Σŷ² = β̂2²Σx²,
 and since β̂2 = Σxy/Σx², this can also be written as R² = (Σxy)²/(Σx²Σy²)
 The limit of R²: the value of R² falls between zero and one, i.e. 0 ≤ R² ≤ 1.
INTERPRETATION OF R²
 Suppose R² = 0.9. This means that the regression line gives a good fit to the observed data, since the line explains 90% of the total variation of the Y values around their mean.
 The remaining 10% of the total variation in Y is unaccounted for by the regression line. A numeric sketch follows below.
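A minimal Python sketch computing R² = 1 − Σe²/Σy², reusing the illustrative sample from the standard-error sketch above (an assumption, not data from the slides):

```python
# R^2 = ESS/TSS = 1 - RSS/TSS for a fitted simple regression.
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([60, 74, 90, 95, 110, 115, 136, 140, 155, 152], dtype=float)

x = X - X.mean()
y = Y - Y.mean()
beta2_hat = (x * y).sum() / (x ** 2).sum()
beta1_hat = Y.mean() - beta2_hat * X.mean()

e = Y - (beta1_hat + beta2_hat * X)    # residuals
RSS = (e ** 2).sum()                   # residual (unexplained) variation
TSS = (y ** 2).sum()                   # total variation of Y about its mean
R2 = 1 - RSS / TSS

print(R2)   # the share of the variation in Y explained by X
```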
TESTING THE SIGNIFICANCE OF OLS PARAMETERS

 To test the significance of the OLS parameter estimators we need the following:
 the variance of the parameter estimators, and
 an unbiased estimator of σ², i.e. σ̂² = Σûi²/(n − 2).
