Chapter Two: Regression Analysis
The term regression was introduced by Francis Galton.
Regression is probably the single most important tool at the
econometrician's disposal.
It is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory
variable(s), with a view to estimating and/or predicting the
population mean (average value) of the dependent variable.
Regression analysis takes two basic forms:
Simple linear regression
Multiple linear regression
Terminology and notation
y                     x
Dependent variable    Independent variable
Explained variable    Explanatory variable
Response variable     Control variable
Predicted variable    Predictor variable
Regressand            Regressor
Single-equation Regression Models
In single-equation regression models, one variable, called the
dependent variable, is expressed as a linear function of one or more
other variables, called the explanatory variables.
In such models it is assumed implicitly that causal relationships, if
any, between the dependent and explanatory variables flow in one
direction only, namely, from the explanatory variables to the
dependent variable.
Simple Regression Model
A model in which the dependent variable is expressed as a linear
function of only a single explanatory variable.
E.g., Ct = β1 + β2Yt + ut
Multiple Regression Model
A model in which there is more than one explanatory variable.
The dependent variable is explained by more than one explanatory
(independent) variable.
E.g., Ct = β1 + β2Yt + β3Wt + ut
➢ The above equations are regression equations of Y on X.
➢ Since U is a random variable, Y is also a random variable.
We can disaggregate Y into its two components as follows:
Y = β1 + β2X + U, that is,
[variation in Y] = [systematic variation] + [random variation]
In this regard it is essential to know what the term linear really
means, for it can be interpreted in two different ways: linearity in
the variables and linearity in the parameters. For example,
Y = β1 + β2X² is linear in the parameters but not in the variables;
in regression analysis, "linear" always means linear in the parameters.
Reasons for the existence of the error term in the model
Omission of other variables
Measurement error
Randomness in human behavior
Imperfect specification of the model
Poor proxy variables
Population and sample regression functions
Imagine a hypothetical society with a total population of 20
families.
Suppose we divide these 20 families into 5 groups of approximately
the same income and examine the consumption expenditure of the
families in each of these income groups.
Suppose we are interested in studying the relationship between
consumption expenditure and family income.
X (income):       100   140   180   220   260
Y (consumption):   65    75    95   105   115
                   70    80   100   110   120
                   75    85   105   115   125
                   80    90   110   120   130
From the table we can construct the conditional means and the
unconditional mean.
Unconditional mean: E(Y) = 98.5
Conditional mean when the income level is 100: E(Y | X = 100) = 72.5
Exercise: compute the conditional means when income is 140,
180, 220 and 260.
➢ What is the expected value of the consumption expenditure of a
family?
➢ What is the expected value of the consumption expenditure of a
family whose monthly income is 220?
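These means can be checked with a short script. A minimal Python sketch (the dictionary simply mirrors the income/consumption table above; names are illustrative):

```python
# Unconditional and conditional means for the hypothetical 20-family data.
data = {
    100: [65, 70, 75, 80],
    140: [75, 80, 85, 90],
    180: [95, 100, 105, 110],
    220: [105, 110, 115, 120],
    260: [115, 120, 125, 130],
}

# Unconditional mean E(Y): average over all 20 families.
all_y = [y for ys in data.values() for y in ys]
unconditional_mean = sum(all_y) / len(all_y)          # 98.5

# Conditional means E(Y | X = x): average within each income group.
conditional_means = {x: sum(ys) / len(ys) for x, ys in data.items()}
# {100: 72.5, 140: 82.5, 180: 102.5, 220: 112.5, 260: 122.5}
```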
The conditional mean increases as X increases.
The figure (not reproduced here) shows the conditional distribution
of expenditure for various levels of income.
The line in the figure is known as the population regression line or,
more generally, the population regression curve.
Geometrically, a population regression curve is simply the locus of
the conditional means or expectations of the dependent variable for
the fixed values of the explanatory variables.
Symbolically, E(Y | Xi) = f(Xi) = β1 + β2Xi
Since an individual family's consumption expenditure need not equal
the conditional mean at its income level, we incorporate the error
term. That is,
Yi = E(Y | Xi) + Ui = β1 + β2Xi + Ui
If we take the expected value on both sides,
E(Yi | Xi) = E[E(Y | Xi)] + E(Ui | Xi)
E(Yi | Xi) = E(Y | Xi), since the mean of Ui is zero.
In most practical situations, however, what we have is a sample of
Y values corresponding to some fixed X's.
Therefore, our main task must be to estimate the population
regression function on the basis of the sample information.
The regression function based on a sample collected from the
population is called sample regression function (SRF).
Sample regression function
The sample regression function (the counterpart of the PRF
stated earlier) may be written as:
Ŷi = β̂1 + β̂2Xi
where Ŷi (read "Y-hat") is the estimator of E(Y | Xi), β̂1 is the
estimator of β1, and β̂2 is the estimator of β2.
➢ Population regression function: Yi = β1 + β2Xi + Ui
➢ On the basis of the population function, the sample counterpart is
Yi = Ŷi + Ûi, and since Ŷi = β̂1 + β̂2Xi,
Yi = β̂1 + β̂2Xi + Ûi
The Method of Ordinary Least Squares
The critical questions now are:
1. Is the SRF a good approximation of the PRF?
2. Can we devise a rule or a method that will make this
approximation as "close" as possible?
In other words, how should the SRF be constructed so that β̂1 is as
"close" as possible to the true β1 and β̂2 is as "close" as possible to
the true β2, even though we never know the true β1 and β2?
The method of ordinary least squares has some very attractive
statistical properties that have made it one of the most powerful and
popular methods of regression analysis.
Recall the two-variable (Y and X) PRF:
Yi = β1 + β2Xi + Ui
However, the PRF is not directly observable. Hence, we estimate it
from the SRF. That is,
Yi = β̂1 + β̂2Xi + Ûi
Yi = Ŷi + Ûi
where Ŷi is the estimated (conditional mean) value of Yi, so that
Ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi
This shows that the Ûi (the residuals) are simply the differences
between the actual and the estimated Y values.
Choose the SRF in such a way that the sum of squared residuals
ΣÛi² = Σ(Yi − Ŷi)²
is as small as possible. (The plain sum of residuals is not a good
criterion, since large positive and negative residuals can cancel
each other out; squaring removes this problem.)
Using the concept of partial derivatives, one can minimize ΣÛi² by
setting its derivatives with respect to β̂1 and β̂2 equal to zero
and solving:
∂(ΣÛi²)/∂β̂1 = 0, ∂(ΣÛi²)/∂β̂2 = 0
∂(ΣÛi²)/∂β̂1 = 0 ⟹ ΣYi = nβ̂1 + β̂2ΣXi …………eq. (1)
∂(ΣÛi²)/∂β̂2 = 0 ⟹ ΣYiXi = β̂1ΣXi + β̂2ΣXi² …….eq. (2)
❑ Substituting the first equation in place of β̂1 in the second
equation and solving, we get (in deviation form, with xi = Xi − X̄
and yi = Yi − Ȳ):
β̂2 = Σxiyi / Σxi² and β̂1 = Ȳ − β̂2X̄
Example
1. Based on the following consumption expenditure and income
data of the families
a. Compute the slope and intercept estimates (estimate the
consumption model).
b. How would you interpret the slope parameter?
Obs. 1 2 3 4 5
Yi 65 80 100 115 130
Xi 100 140 180 220 260
Solution
Observation 𝑌𝑖 𝑋𝑖 𝑌𝑖 2 𝑋𝑖 2 𝑌𝑖 𝑋𝑖
1 65 100 4225 10000 6500
2 80 140 6400 19600 11200
3 100 180 10000 32400 18000
4 115 220 13225 48400 25300
5 130 260 16900 67600 33800
Sum 490 900 50750 178000 94800
n = 5, X̄ = 180, Ȳ = 98, Σxiyi = 6,600, Σxi² = 16,000, Σyi² = 2,730
Then β̂2 = Σxiyi/Σxi² = 6,600/16,000 = 0.4125 and
β̂1 = Ȳ − β̂2X̄ = 98 − 0.4125(180) = 23.75
b. When income increases by one unit (dollar or birr), consumption
expenditure increases by 0.4125 units (dollars or birr), on average.
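The computation in part (a) can be reproduced with a short Python sketch of the OLS formulas β̂2 = Σxiyi/Σxi² and β̂1 = Ȳ − β̂2X̄ (variable names are illustrative):

```python
# OLS slope and intercept for the consumption-income example.
Y = [65, 80, 100, 115, 130]    # consumption expenditure
X = [100, 140, 180, 220, 260]  # income
n = len(Y)

x_bar = sum(X) / n             # 180
y_bar = sum(Y) / n             # 98

# Sums of cross-products and squares in deviation form.
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))  # 6600
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)                        # 16000

b2 = sum_xy / sum_x2           # slope     = 0.4125
b1 = y_bar - b2 * x_bar        # intercept = 23.75
```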
The classical linear regression model (CLRM) Assumptions
Linear regression model is based on certain assumptions; some of
the assumptions are related to Ui and the relationship between Ui
and the explanatory variables, other assumptions are related to the
relationship between the explanatory variables themselves.
There are 10 assumptions in the CLRM.
Assumption 1: Linear regression model: - the regression model is
linear in the parameters.
Assumption 2: X (explanatory) values are fixed in repeated
sampling. Values taken by the regressor X are considered fixed in
repeated samples
Assumption 3: Ui is a random real variable. Each value has a certain
probability of being assumed by U in any particular instance.
Assumption 4: Zero mean of the disturbance term. The mean or
expected value of the disturbance term Ui is zero.
Symbolically, E(Ui | Xi) = 0.
Assumption 5: Homoscedasticity, which implies that the variance of
Ui is constant.
Symbolically, Var(Ui | Xi) = E[Ui − E(Ui | Xi)]² = σ²
If the variance of the error term is not constant, a
heteroscedasticity problem arises in the regression analysis.
Assumption 6: No autocorrelation between the disturbances. Given
any two X values Xi and Xj (i ≠ j), the correlation between the
corresponding Ui and Uj is zero.
This implies that the error term committed for the ith observation is
independent of the error term committed for the jth observation.
Symbolically, Cov(Ui, Uj) = E[(Ui − E(Ui))(Uj − E(Uj))] = 0
What if i = j? Then the expression reduces to the variance of Ui.
Assumption 7: Zero covariance between Ui and Xi or E(UiXi) = 0.
That is, the error term is independent of the explanatory variable(s).
Assumption 8: The regression model is correctly specified. This means
that we have included all the important regressors explicitly in the
model and that its mathematical form is correct.
Assumption 9: There is no perfect multicollinearity. That is, there is
no perfect linear relationship among the explanatory variables. This
assumption applies in the case of multiple linear regression.
Assumption 10: The number of observations, n must be greater than
the number of parameters to be estimated.
Properties of Least Squares Estimators: The Gauss-Markov
Theorem
To understand this theorem, we need to consider the best linear
unbiasedness property of an estimator.
The OLS estimator is said to be the best linear unbiased estimator
(BLUE) of βi if the following hold:
1. Linear Estimator. It is linear, that is, a linear function of a random
variable, such as the dependent variable Y in the regression model.
2. Unbiased estimator: an estimator is said to be unbiased if its
average or expected value, E(β̂i), is equal to the true value, βi.
3. Minimum-variance (or best) estimator: an estimator is best when
it has the smallest variance compared with any other linear unbiased
estimator obtained from other econometric methods.
Precision or Standard Errors of Least Squares
Estimates
Since the data are likely to change from sample to sample, the
estimates will change. Therefore, what is needed is some measure of
“reliability” or precision of the estimators 𝛽መ1 and 𝛽መ2 .
(Here xi = Xi − X̄ denotes the deviation from the mean.)
Var(β̂2) = E[β̂2 − E(β̂2)]² = σ²/Σxi²
Var(β̂1) = E[β̂1 − E(β̂1)]² = σ²·ΣXi²/(nΣxi²)
Se(β̂2) = √Var(β̂2) = σ/√(Σxi²)
Se(β̂1) = √Var(β̂1) = σ·√(ΣXi²/(nΣxi²))
Cov(β̂1, β̂2) = −X̄·σ²/Σxi²
Example
1. Compute the variance and standard error of the
estimators for the previous example.
Solution
Var(β̂2) = σ²/16,000 = 0.0000625σ²
Var(β̂1) = 2.225σ²
Se(β̂2) = 0.0079σ
Se(β̂1) = 1.49σ
Here σ² is estimated by σ̂² = Σûi²/(n − k), which is an unbiased
estimator of σ².
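These quantities can be verified numerically. A Python sketch that recomputes σ̂² and the variances and standard errors from the raw example data (variable names are illustrative):

```python
# Variances and standard errors of the OLS estimators for the example data.
Y = [65, 80, 100, 115, 130]
X = [100, 140, 180, 220, 260]
n, k = len(Y), 2               # k = number of estimated parameters

x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)                        # 16000
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
b1 = y_bar - b2 * x_bar

# Unbiased estimate of the error variance: sigma^2_hat = RSS / (n - k).
rss = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(X, Y))      # 7.5
sigma2_hat = rss / (n - k)                                         # 2.5

var_b2 = sigma2_hat / sum_x2                                       # 0.00015625
var_b1 = sigma2_hat * sum(xi ** 2 for xi in X) / (n * sum_x2)      # 5.5625
se_b2, se_b1 = var_b2 ** 0.5, var_b1 ** 0.5                        # 0.0125, ~2.3585
```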
Correlation coefficient and coefficient of determination (r2)
Coefficient of determination (r2) is a measure of “goodness of fit”.
The coefficient of determination (r2) is a summary measure that tells
how well the sample regression line fits the data.
To compute r², start from
Yi = Ŷi + Ûi, or in deviation form yi = ŷi + ûi.
Squaring both sides and summing over the sample (the cross-product
term Σŷiûi vanishes):
Σyi² = Σŷi² + Σûi²
In other words, the total sum of squares (TSS) is equal to the
explained sum of squares (ESS) plus the residual sum of squares (RSS).
Symbolically, TSS = ESS + RSS
Total variation = explained variation + unexplained variation
Note that because the OLS estimator minimizes the sum of squared
residuals (i.e., the unexplained variation), it automatically
maximizes r².
TSS = ESS + RSS; dividing both sides by TSS gives
1 = ESS/TSS + RSS/TSS, or 1 = Σŷi²/Σyi² + Σûi²/Σyi²
r² measures the proportion or percentage of the total variation in Y
explained by the regression model:
r² = ESS/TSS = Σŷi²/Σyi², or equivalently
r² = 1 − RSS/TSS = 1 − Σûi²/Σyi²
Since ŷi = β̂2xi, we have Σŷi² = β̂2²Σxi², so
r² = β̂2²Σxi²/Σyi² = (Σxiyi)²/(Σxi²·Σyi²)
Two properties of r² may be noted:
1. It is a non-negative quantity, because we are dealing with sums
of squares.
2. Its limits are 0 ≤ r² ≤ 1.
If r² = 1 the fit is perfect, that is, Ŷi = Yi for each i (or,
equivalently, ΣÛi² = 0).
If r² = 0 there is no relationship whatsoever between the
regressand and the regressor.
Coefficient of correlation
A quantity closely related to, but conceptually very different
from, r².
It measures the degree of association between two variables.
It is a measure of linear association or linear dependence only.
It can be computed either from
r = ±√r²
or from its definition
r = Σxiyi / √(Σxi²·Σyi²)
Some of the properties of the coefficient of correlation are as
follows:
1. It can be positive or negative.
2. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ 1.
3. It is symmetrical in nature: the correlation of X with Y equals
the correlation of Y with X.
Exercise
1. Compute TSS, ESS, RSS, coefficient of determination
and coefficient of correlation from the previous example.
2. Compute the variance of the error term
3. Obtain the variance and standard error of the estimators
Solution
1. TSS = 2,730, ESS = 2,722.5, RSS = 7.5, r² = 0.9973 and r = 0.9986.
2. σ̂² = 2.5
3. Var(β̂2) = 0.00015625
Var(β̂1) = 5.5625
Se(β̂2) = 0.0125
Se(β̂1) = 2.3585
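A Python sketch that reproduces these goodness-of-fit figures from the raw example data (variable names are illustrative):

```python
# Goodness of fit for the consumption-income example: TSS, ESS, RSS, r^2, r.
Y = [65, 80, 100, 115, 130]
X = [100, 140, 180, 220, 260]
n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n

sum_x2 = sum((xi - x_bar) ** 2 for xi in X)                        # 16000
sum_y2 = sum((yi - y_bar) ** 2 for yi in Y)                        # 2730
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))  # 6600

b2 = sum_xy / sum_x2
tss = sum_y2                   # total sum of squares      = 2730
ess = b2 ** 2 * sum_x2         # explained sum of squares  = 2722.5
rss = tss - ess                # residual sum of squares   = 7.5
r2 = ess / tss                 # coefficient of determination ~ 0.9973
r = r2 ** 0.5                  # coefficient of correlation   ~ 0.9986
```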
Hypotheses testing
Is a given observation or finding compatible with some stated
hypothesis or not?
The word “compatible,” as used here, means “sufficiently”
close to the hypothesized value so that we do not reject the
stated hypothesis.
In the language of statistics, the stated hypothesis is known as
the null hypothesis and is denoted by the symbol H0.
The null hypothesis is usually tested against an alternative
hypothesis denoted by H1.
The theory of hypothesis testing is concerned with developing
rules or procedures for deciding whether to reject or not reject
the null hypothesis.
There are two approaches for devising such rules,
namely, confidence interval and test of
significance.
Hypothesis Testing: The Test-of-significance
Approach
A test of significance is a procedure by which sample
results are used to verify the truth or falsity of a null
hypothesis.
The decision to accept or reject H0 is made on the
basis of the value of the test statistic obtained from
the data at hand.
Steps in the Test-of-significance Approach
1. Compute the t-statistic:
t = (β̂i − βi) / se(β̂i)
2. Obtain the t-critical value from the t table for the α/2 level of
significance and n − 2 df.
3. Decision: if |t-statistic| > t-critical, reject H0 and accept H1;
otherwise do not reject H0.
➢ An estimator is said to be statistically significant if the absolute
value of its test statistic is greater than the critical value.
➢ It is said to be statistically insignificant if the absolute value
of its test statistic is less than the critical value.
Exercise
Given the critical value t(α/2, n−2) = 3.182 from the table, test the
significance of the estimators from the previous example using the
test-of-significance approach.
Solution
For β̂2 the t-statistic is 0.4125/0.0125 = 33. This is greater than
the t-critical value, so we reject H0 and accept H1: β̂2 is
statistically significant (income has a significant effect on
consumption expenditure).
For β̂1 the t-statistic is 23.75/2.3585 ≈ 10.07, which is also greater
than the t-critical value, so we reject H0 and accept H1: β̂1 is
statistically significant.
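The two t-tests can be sketched in Python, plugging in the estimates and standard errors computed earlier (the zero null values reflect the implicit H0: βi = 0):

```python
# Test-of-significance approach: t = (b_hat - beta_H0) / se(b_hat), H0: beta = 0.
b2, se_b2 = 0.4125, 0.0125
b1, se_b1 = 23.75, 5.5625 ** 0.5   # se_b1 ~ 2.3585
t_crit = 3.182                     # t(alpha/2 = 0.025, n - 2 = 3 df)

t_b2 = (b2 - 0) / se_b2            # 33.0
t_b1 = (b1 - 0) / se_b1            # ~10.07

significant_b2 = abs(t_b2) > t_crit   # True: reject H0 for beta2
significant_b1 = abs(t_b1) > t_crit   # True: reject H0 for beta1
```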
Hypothesis Testing:
The Confidence-interval Approach
Steps
Construct a 100(1 − α)% confidence interval for βi:
β̂i ± tα/2 · se(β̂i)
Example: in the consumption–income model, postulate that
H0: β2 = 0.3
H1: β2 ≠ 0.3
Construct a 100(1 − α)% confidence interval for β2.
If the β2 under H0 falls within this confidence interval, do not
reject H0, but if it falls outside this interval, reject H0.
Exercise
1. For the previous example, construct a 95% confidence interval
for the estimators and test their significance.
Solution
The 95% CI for β2 runs from 0.3727 to 0.4523; the β2 under H0 (0.3)
does not fall within this confidence interval. Therefore, reject H0
and accept H1.
The 95% CI for β1 runs from 16.25 to 31.25; the β1 under H0 does not
fall within this confidence interval. Therefore, reject H0 and
accept H1.
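A Python sketch of the confidence-interval calculations, using the estimates, standard errors and critical value from above (variable names are illustrative):

```python
# Confidence-interval approach: b_hat +/- t_{alpha/2} * se(b_hat);
# reject H0 when the hypothesized value falls outside the interval.
b2, se_b2 = 0.4125, 0.0125
b1, se_b1 = 23.75, 5.5625 ** 0.5
t_crit = 3.182                     # t(0.025, 3 df) for a 95% interval

ci_b2 = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)   # ~(0.3727, 0.4523)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # ~(16.25, 31.25)

# H0: beta2 = 0.3 lies outside ci_b2, so H0 is rejected.
reject_h0_beta2 = not (ci_b2[0] <= 0.3 <= ci_b2[1])  # True
```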