You are on page 1of 20

Two variable regression

analysis: Estimation

Chapter 2
Two variable regression model
• A two-variable regression model expresses the dependent variable
as a (linear) function of only one explanatory variable.
• In this framework, regression analysis is largely concerned with
estimating and/or predicting the (population) mean or average
value of the dependent variable on the basis of known or fixed
values of the explanatory variable.
• However, in order to understand how the regression analysis
works, we need to understand the concepts of PRF & SRF first.
The population regression function
• Consider a hypothetical country with 23 families (keep in mind that in true sense
population is infinite and unknown to us). Now, suppose that our objective is to examine
the relationship between the weekly family consumption expenditure and Y and weekly
disposable income X of those 23 families.

Notice that, in each


column for a certain value
of X , we get a distribution
of Y , or in other words,
we get a distribution of Y
conditional on given
values of X
Conditional mean
• Noting that the data in Table above represent the population, we
can get the conditional mean (or conditional expectation) of Y .
Symbolically:

If we plot a scattergram, we
can visualize the relationship
between X and Y clearly
The population regression function
Figure 1 shows that consumption
expenditure of 23 families under
consideration increases as their income
increases, or in other words,
The line that passes through the
population conditional means is known
as the population regression line/curve or
simply the regression of Y on X .

In simple terms, PRF tells how the mean or average response of Y


varies with X. If Y and X are linearly related-
Stochastic representation of the population function
• Note that on two occasions there are two families in the $80 and
$100 income group with same consumption expenditure.
• For many families consumption expenditure is either more or less
that the mean levels of the respective group but clustered around
its conditional mean.
• What can we say about the relationship between income and
consumption at the individual family level?
• Individual family consumption expenditure is equal to the
average/mean of that income group plus/minus some quantity
Stochastic representation of the population function

Individual family consumption expenditure +/- Mean or


average consumption expenditure

• The deviations in consumption expenditures at individual family


levels can be expressed as:
Stochastic representation of the population function
• Now, if we assume that is linear in X, we can write:

• Note that above EQ is the stochastic form of the PRF. Now, taking
expectation on both sides of
NOTE: We know that the expected
value of a constant is that constant
itself. The term is a constant,
once the value of Xi is fixed.
The population regression function
• Therefore, if the regression line passes through the conditional
means of Y it means that the conditional mean values of ui
(conditional on given values of X ’s) are zero.
The sample regression function
• In practice, we usually do not know the population of variables
included in the regression model. What we usually have is randomly
selected samples of Y and X .
• Therefore, our task is to estimate the PRF on the basis of sample
information.
• We would get n different SRFs for n different sample, and they are
not likely to be the same. Even though they represent the same PRF
but due to sampling fluctuations they are not similar.
Which of the two regression lines
represents the “true” population
regression line? There is no way we
can be absolutely sure that either of the
regression lines shown in the Figure
represents the true population regression
line (or curve).
The sample regression function
• Now, similar to the PRF, we can write the sample regression
function/line as follows:

• Where

• In stochastic form, we can rewrite the above SRF as follows:


The sample regression function

• To sum up, our primary objective in regression analysis is to


estimate the PRF: on the basis of the SRF:
.
• It is worth mentioning that, as because our analysis is based on a
single sample from the population and because of sampling
fluctuations, our estimate of the PRF is at best an approximate one.
Method of ordinary least squares (OLS)
• Although there are several methods of obtaining the SRF as an
estimator of the true PRF, in regression analysis the method that is
used most frequently is that of least squares (LS), more popularly
known as the method of ordinary least squares (OLS).
• We will use the terms LS and OLS methods interchangeably. To
explain this method, we first explain the least squares principle.
Method of ordinary least squares (OLS)
• Recall our two-variable stochastic PRF

• Since the PRF is not directly observable, we estimate it from the SRF
^
• Which we can write as
Which shows that the
residuals are simply
the differences
between the actual
and estimated Y
values.
Method of ordinary least squares (OLS)
• Now the best way to estimate the PRF is to choose b1 and b2, the estimators of
B1 and B2, in such a way that the residuals ei are as small as possible. The
method of ordinary least squares (OLS) states that b1 and b2 should be chosen
in such a way that the residual sum of squares (RSS), is as small as
possible.

• How do we get the estimates of B1 and B2? We take partial derivatives of the
following terms and choose the value of b1 and b2 that makes equal to
zero.
Method of ordinary least squares (OLS)
• That is, 𝜕 ∑ 𝑒𝑖2 and 𝜕 ∑ 𝑒𝑖2 .
=0 =0
𝜕 𝑏1 𝜕 𝑏2
• Solving them we get following two simultaneous equations,

Solving above two equations, we obtain and


OLS
The term LINEAR in the context of regression analysis
• In general, a function y= f (x) can be termed as linear in x if:
(a) x appears with power 1 only; and
(b) x is not multiplied or divided by any other variable.
• LINEAR IN VARIABLE: When the conditional mean of the
dependent variable is a linear function of the independent variable.

Linear in variable and in parameter

Linear in variable but not in parameter


The term LINEAR in the context of regression analysis
• LINEAR IN PARAMETER: When the conditional mean of the
dependent variable is a linear function of the parameter

Linear in parameter but not in variable

Non linear both in parameter and in variable


• Note that, in the context of regression analysis, the term linear
regression model means that the model is linear in parameter, it
may or may not be linear in explanatory variables.

You might also like