analysis: Estimation
Chapter 2
Two-variable regression model
• A two-variable regression model expresses the dependent variable
as a (linear) function of only one explanatory variable.
• In this framework, regression analysis is largely concerned with
estimating and/or predicting the (population) mean or average
value of the dependent variable on the basis of known or fixed
values of the explanatory variable.
• However, to understand how regression analysis works, we first need to understand the concepts of the population regression function (PRF) and the sample regression function (SRF).
The population regression function
• Consider a hypothetical country with 23 families (keep in mind that, strictly speaking, a population is infinite and unknown to us). Suppose our objective is to examine the relationship between weekly family consumption expenditure Y and weekly disposable income X for these 23 families.
If we plot a scattergram of Y against X, we can see the relationship between X and Y clearly.
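Figure 1 shows that the consumption expenditure of the 23 families under consideration increases as their income increases; in other words, the average (conditional mean) consumption expenditure $E(Y \mid X_i)$ rises as X rises. The line that passes through these population conditional means is known as the population regression line (or curve), or simply the regression of Y on X. In the linear case it is the PRF
$E(Y \mid X_i) = B_1 + B_2 X_i$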
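To make "conditional mean" concrete, here is a minimal Python sketch. The income levels and consumption figures are hypothetical illustrative values, not the 23-family data behind Figure 1. For each fixed X, the population conditional mean E(Y | X) is simply the average Y of all families at that income.

```python
from statistics import mean

# Hypothetical population (illustrative numbers only): weekly income X -> weekly
# consumption expenditure Y of every family at that income level.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

def conditional_means(pop):
    """Return the population conditional mean E(Y | X) for each value of X."""
    return {x: mean(ys) for x, ys in pop.items()}

cond_means = conditional_means(population)
for x, m in cond_means.items():
    print(f"E(Y | X={x}) = {m}")
```

For these made-up numbers the conditional means are 65, 77 and 89. They rise by 12 for every 20-unit rise in income, so they lie on the straight line 17 + 0.6X, and that line plays the role of the population regression line.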
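• Writing each family's consumption as its conditional mean plus a random deviation $u_i$ gives
$Y_i = B_1 + B_2 X_i + u_i$
which is the stochastic form of the PRF. Now, taking the conditional expectation of both sides, given $X_i$:
$E(Y_i \mid X_i) = B_1 + B_2 X_i + E(u_i \mid X_i)$
NOTE: The expected value of a constant is that constant itself. The term $B_1 + B_2 X_i$ is a constant once the value of $X_i$ is fixed.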
• Therefore, if the regression line passes through the conditional means of Y, that is, if $E(Y_i \mid X_i) = B_1 + B_2 X_i$, then the conditional mean values of $u_i$ (conditional on the given values of X) must be zero: $E(u_i \mid X_i) = 0$.
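The condition E(u_i | X_i) = 0 can be checked numerically. The sketch below uses hypothetical illustrative numbers, not the data of Figure 1, chosen so that the conditional means lie exactly on the line B1 + B2 X with B1 = 17 and B2 = 0.6. It computes each disturbance u_i = Y_i - B1 - B2 X_i and averages them within each X group. Exact fractions are used so that floating-point rounding does not blur the zero.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical population: income X -> consumption Y of every family at that income.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

# PRF parameters for these illustrative numbers: E(Y | X) = 17 + 0.6 X (assumed, not from the source).
B1 = Fraction(17)
B2 = Fraction(3, 5)

def disturbances(pop, b1, b2):
    """Return u_i = Y_i - (b1 + b2 * X_i) for every family, grouped by X."""
    return {x: [y - (b1 + b2 * x) for y in ys] for x, ys in pop.items()}

u = disturbances(population, B1, B2)
for x, us in u.items():
    print(f"X={x}: mean of u_i = {mean(us)}")  # 0 for every X
```

Individual disturbances are nonzero; some families spend more and some spend less than the conditional mean. Within each income group, however, they cancel out exactly.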
The sample regression function
• In practice, we usually do not observe the entire population of the variables included in the regression model. What we usually have is a randomly selected sample of Y and X values.
• Therefore, our task is to estimate the PRF on the basis of sample
information.
• If we drew n different samples, we would get n different SRFs, and they are unlikely to be the same. Although they all estimate the same PRF, they differ because of sampling fluctuations.
Which of the two regression lines shown in the figure represents the “true” population regression line? There is no way we can be absolutely sure that either of them represents the true population regression line (or curve).
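A small simulation shows why different samples give different SRFs. This is a minimal sketch under stated assumptions: the "true" PRF parameters (B1 = 17, B2 = 0.6), the fixed X values, and normally distributed disturbances with standard deviation 5 are all hypothetical choices. Each sample is fitted with the standard OLS slope and intercept formulas.

```python
import random

B1, B2 = 17.0, 0.6  # hypothetical "true" PRF parameters
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]  # fixed (non-random) X values

def draw_sample(rng):
    """Draw Y_i = B1 + B2*X_i + u_i for each fixed X_i, with u_i ~ N(0, 5^2)."""
    return [B1 + B2 * x + rng.gauss(0, 5) for x in X]

def ols_fit(x, y):
    """Return (b1, b2) from the standard OLS formulas for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

rng = random.Random(42)  # seeded so the run is reproducible
srf_1 = ols_fit(X, draw_sample(rng))
srf_2 = ols_fit(X, draw_sample(rng))
print("SRF 1: b1=%.2f, b2=%.3f" % srf_1)
print("SRF 2: b1=%.2f, b2=%.3f" % srf_2)
```

Both fitted lines come from the same PRF, yet their b1 and b2 differ. With real data we see only one such line and cannot tell how far it lies from the true one.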
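• Now, similar to the PRF, we can write the sample regression function (line) as follows:
$\hat{Y}_i = b_1 + b_2 X_i$
• where $\hat{Y}_i$ is the estimator of $E(Y \mid X_i)$, $b_1$ is the estimator of $B_1$, and $b_2$ is the estimator of $B_2$.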
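• Since the PRF is not directly observable, we estimate it from the SRF, whose stochastic form is
$Y_i = b_1 + b_2 X_i + e_i$
where the residual $e_i$ is the sample counterpart of $u_i$.
• Because $\hat{Y}_i = b_1 + b_2 X_i$, we can write this as
$Y_i = \hat{Y}_i + e_i$, that is, $e_i = Y_i - \hat{Y}_i$
which shows that the residuals are simply the differences between the actual and estimated Y values.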
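The relation e_i = Y_i - Ŷ_i translates directly into code. The sketch below uses a hypothetical four-point sample and hypothetical trial values b1 = 2 and b2 = 1, which are not estimates from any data in this chapter. It computes the fitted values and residuals and confirms that Y_i = Ŷ_i + e_i.

```python
def fitted_and_residuals(b1, b2, x, y):
    """Return (fitted values Y_hat_i = b1 + b2*X_i, residuals e_i = Y_i - Y_hat_i)."""
    y_hat = [b1 + b2 * xi for xi in x]
    e = [yi - yh for yi, yh in zip(y, y_hat)]
    return y_hat, e

# Hypothetical sample and trial coefficients
x = [1, 2, 3, 4]
y = [3, 5, 4, 7]
y_hat, e = fitted_and_residuals(2, 1, x, y)
print(y_hat)  # [3, 4, 5, 6]
print(e)      # [0, 1, -1, 1]

# Actual Y = estimated Y + residual, observation by observation
assert all(yi == yh + ei for yi, yh, ei in zip(y, y_hat, e))
```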
Method of ordinary least squares (OLS)
• Now, the idea is to estimate the PRF by choosing b1 and b2, the estimators of B1 and B2, so that the residuals ei are, taken together, as small as possible. The method of ordinary least squares (OLS) states that b1 and b2 should be chosen so that the residual sum of squares (RSS),
$\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$,
is as small as possible.
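The OLS criterion can be made concrete by computing the RSS for candidate lines. This is a minimal sketch on a hypothetical four-point sample. The brute-force grid search only illustrates the idea of "choose b1 and b2 to make RSS as small as possible"; in practice the OLS estimates are obtained analytically from the first-order conditions, not by searching a grid. The grid (steps of 0.1, built with exact fractions) is an arbitrary illustrative choice.

```python
from fractions import Fraction

# Hypothetical sample
x = [1, 2, 3, 4]
y = [3, 5, 4, 7]

def rss(b1, b2, x, y):
    """Residual sum of squares: sum of (Y_i - b1 - b2*X_i)^2."""
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

print(rss(2, 1, x, y))  # 3
print(rss(1, 1, x, y))  # 9

# Brute-force search over b1 in [0, 4] and b2 in [0, 2] in steps of 0.1
grid = [(Fraction(i, 10), Fraction(j, 10)) for i in range(41) for j in range(21)]
b1_best, b2_best = min(grid, key=lambda b: rss(b[0], b[1], x, y))
print(b1_best, b2_best, rss(b1_best, b2_best, x, y))  # 2 11/10 27/10
```

Among the 861 grid points, the smallest RSS (27/10) occurs at b1 = 2, b2 = 11/10. That point happens to be the exact least-squares solution for this sample, which is why the grid finds it precisely.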
• How do we get the estimates of B1 and B2? We take the partial derivatives of the RSS with respect to b1 and b2, and choose the values of b1 and b2 that make both derivatives equal to zero.
• That is,
$\frac{\partial \sum e_i^2}{\partial b_1} = 0 \quad \text{and} \quad \frac{\partial \sum e_i^2}{\partial b_2} = 0$
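Writing the two derivatives out explicitly, using $e_i = Y_i - b_1 - b_2 X_i$ and the chain rule, shows what the conditions require:

$$
\frac{\partial \sum e_i^2}{\partial b_1} = -2 \sum (Y_i - b_1 - b_2 X_i) = 0,
\qquad
\frac{\partial \sum e_i^2}{\partial b_2} = -2 \sum X_i (Y_i - b_1 - b_2 X_i) = 0
$$

Dividing each condition by $-2$ shows that, at the OLS solution, the residuals sum to zero ($\sum e_i = 0$) and are orthogonal to X ($\sum X_i e_i = 0$).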
• Solving them, we get the following two simultaneous equations (known as the normal equations):