
CHAPTER ONE
INTRODUCTION
1.1. Definition and Scope of Econometrics
Ragnar Frisch is credited with coining the term ‘econometrics.’ Literally interpreted,
econometrics means “economic measurement”, but the scope of econometrics is much broader as
described by leading econometricians.
An econometrician has to be a competent mathematician and statistician who is an economist by
training. Fundamental knowledge of mathematics, statistics and economic theory are a necessary
prerequisite for this field. As Ragnar Frisch (1933) explains in the first issue of Econometrica, it
is the unification of statistics, economic theory and mathematics that constitutes econometrics.
Each viewpoint, by itself, is necessary but not sufficient for a real understanding of quantitative
relations in modern economic life.
Econometrics aims at giving empirical content to economic relationships. The three key
ingredients are economic theory, economic data, and statistical methods. Neither 'theory without
measurement' nor 'measurement without theory' is sufficient for explaining economic
phenomena. It is, as Frisch emphasized, their union that is the key to success in the future
development of econometrics.
In general, Econometrics is the science which integrates economic theory, economic statistics,
and mathematical economics to investigate the empirical support of the general schematic law
established by economic theory. It is a special type of economic analysis and research in which
the general economic theories, formulated in mathematical terms, are combined with empirical
measurements of economic phenomena. Starting from the relationships of economic theory, we
express them in mathematical terms so that they can be measured. We then use specific
methods, called econometric methods in order to obtain numerical estimates of the coefficients
of the economic relationships.
In short, econometrics may be considered as the integration of economics, mathematics, and
statistics for the purpose of providing numerical values for the parameters of economic
relationships and verifying economic theories.

1.2. Models: Economic Models and Econometric Models


What is a model? A model is a simplified representation of a real-world process. For instance,
‘the demand for oranges depends on the price of oranges’ is a simplified representation since
there are a host of other variables that one can think of that determine the demand for oranges.
These include:

 Income of consumers


 An increase in diet consciousness (e.g. drinking coffee causes cancer; so better switch to
orange juice)
 Increase or decrease in the price of substitutes (e.g., the price of apples)

However, there is no end to this stream of other variables! Many have argued in favour of
simplicity since simple models are easier:

 to understand
 to communicate
 to test empirically with data

The choice of a simple model to explain complex real-world phenomena leads to two criticisms:

 The model is oversimplified


 The assumptions are unrealistic

For instance, to say that the demand for oranges depends on only the price of oranges
is both an oversimplification and an unrealistic assumption.

 To the criticism of oversimplification, many have argued that it is better to start


with a simplified model and progressively construct more complicated models.
 As to the criticism of unrealistic assumptions, the relevant question is whether
they are sufficiently good approximations for the purpose at hand or not.

In practice we include in our model:

 Variables that we think are relevant for our purpose.


 A ‘disturbance’ or ‘error’ term which accounts for variables that are omitted as
well as all unforeseen forces.

This brings us to the distinction between an economic model and an econometric model.

i) Economic Models

An economic model is an organized set of relationships that describes the functioning of an
economic entity under a set of simplifying assumptions. All economic reasoning is ultimately
based on models. Economic models consist of the following three basic structural elements.

1. A set of variables
2. A list of fundamental relationships and
3. A number of strategic coefficients
ii) Econometric Models
The most important characteristic of economic relationships is that they contain a random
element, which is ignored by mathematical economic models that postulate exact relationships
between economic variables.


Example: Economic theory postulates that the demand for a commodity (Q) depends on its price
(P), on the prices of other related commodities ($P_0$), on consumers' income (Y) and on tastes (T).
This is an exact relationship which can be written mathematically as:

$$Q = b_0 + b_1 P + b_2 P_0 + b_3 Y + b_4 T$$

The above demand equation is exact. However, many more factors may affect demand. In
econometrics the influence of these 'other' factors is taken into account by introducing a random
variable into the economic relationship. In our example, the demand function studied
with the tools of econometrics would be of the stochastic form:

$$Q = b_0 + b_1 P + b_2 P_0 + b_3 Y + b_4 T + u$$

where u stands for the random factors which affect the quantity demanded.

Generally, econometrics differs from mathematical economics in that, although econometrics
presupposes that economic relationships be expressed in mathematical form, it does not
assume exact or deterministic relationships. Econometrics assumes random relationships among
economic variables. Econometric methods are designed to take into account random disturbances
which create deviations from the exact behavioral patterns suggested by economic theory and
mathematical economics. Furthermore, econometric methods provide numerical values of the
coefficients of economic relationships.

1.3. Methodology of Econometrics

The aims of econometrics are:


 Analysis i.e. testing economic theory
 Policy making i.e. Obtaining numerical estimates of the coefficients of economic
relationships for policy simulations.
 Forecasting i.e. using the numerical estimates of the coefficients in order to forecast the
future values of economic magnitudes.
Econometric research is concerned with the measurement of the parameters of economic
relationships and with the prediction of the values of economic variables. The relationships of
economic theory which can be measured with econometric techniques are relationships in which
some variables are postulated as causes of the variation of other variables. Starting with the
postulated theoretical relationships among economic variables, econometric research or inquiry
generally proceeds along the following lines/stages.

1. Specification of the model
2. Estimation of the model
3. Evaluation of the estimates
4. Evaluation of the forecasting power of the estimated model
1. Specification of the model
In this step the econometrician has to express the relationships between economic variables in
mathematical form. This step involves the determination of three important tasks:

i) The dependent and independent (explanatory) variables which will be included in the model.
ii) The a priori theoretical expectations about the size and sign of the parameters of the function.
iii) The mathematical form of the model (number of equations, specific form of the equations, etc.)
Note: The specification of the econometric model will be based on economic theory and on any
available information related to the phenomena under investigation. Thus, specification of the
econometric model presupposes knowledge of economic theory and familiarity with the
particular phenomenon being studied.

Specification of the model is the most important and the most difficult stage of any econometric
research. It is often the weakest point of most econometric applications. In this stage there is a
high likelihood of committing errors, i.e. of incorrectly specifying the model. Some of
the common reasons for incorrect specification of econometric models are:

1. The imperfections and looseness of statements in economic theories.
2. The limitation of our knowledge of the factors which are operative in any particular case.
3. The formidable obstacles presented by data requirements in the estimation of large models.
The most common errors of specification are:

a. Omission of some important variables from the function.
b. Omission of some equations (for example, in simultaneous equations models).
c. A mistaken mathematical form of the functions.
2. Estimation of the model
This is purely a technical stage which requires knowledge of the various econometric methods,
their assumptions and the economic implications for the estimates of the parameters. This stage
includes the following activities.

a. Gathering of the data on the variables included in the model.
b. Examination of the identification conditions of the function (especially for simultaneous equations models).
c. Examination of the aggregation problems involved in the variables of the function.


d. Examination of the degree of correlation between the explanatory variables (i.e. examination of the problem of multicollinearity).
e. Choice of the appropriate econometric technique for estimation, i.e. deciding on a specific econometric method to be applied in estimation, such as OLS, MLM, MM, logit, or probit.
3. Evaluation of the estimates
This stage consists of deciding whether the estimates of the parameters are theoretically
meaningful and statistically satisfactory. This stage enables the econometrician to evaluate the
results of calculations and determine the reliability of the results. For this purpose we use
various criteria which may be classified into three groups:

i. Economic a priori criteria: These criteria are determined by economic theory and
refer to the size and sign of the parameters of economic relationships.
ii. Statistical criteria (first-order tests): These are determined by statistical theory and
aim at the evaluation of the statistical reliability of the estimates of the parameters of
the model. The correlation coefficient test, standard error test, t-test, F-test, and $R^2$ test
are some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests): These are set by the theory of
econometrics and aim at the investigation of whether the assumptions of the
econometric method employed are satisfied or not in any particular case. The
econometric criteria serve as a second order test (as test of the statistical tests) i.e.
they determine the reliability of the statistical criteria; they help us establish whether
the estimates have the desirable properties of unbiasedness, consistency etc.
Econometric criteria aim at the detection of the violation or validity of the
assumptions of the various econometric techniques.

4. Evaluation of the forecasting power of the model

Forecasting is one of the aims of econometric research. However, before using an estimated
model for forecasting by some way or another, the predictive power of the model should be
tested. It is possible that the model may be economically meaningful and statistically and
econometrically correct for the sample period for which the model has been estimated; yet it may
not be suitable for forecasting due to various factors (reasons). Therefore, this stage involves the
investigation of the stability of the estimates and their sensitivity to changes in the size of the
sample. Consequently, we must establish whether the estimated function performs adequately
outside the sample of data, i.e. we must test the extra-sample performance of the model. The steps
discussed above are summarized diagrammatically below.


Schematic description of the steps involved in econometric analysis (research)

1.4. The Sources, Types and Nature of Data


The Sources of Data

The data used in empirical analysis may be collected by a governmental agency (e.g., the
Department of Commerce), an international agency (e.g., the International Monetary Fund (IMF)
or the World Bank), a private organization, or an individual. Literally, there are thousands of
such agencies collecting data for one purpose or another.

The Internet: The Internet has literally revolutionized data gathering. If you just “surf the net”
with a keyword (e.g., exchange rates), you will be swamped with all kinds of data sources.

The data collected by various agencies may be experimental or non-experimental. In
experimental data, often collected in the natural sciences, the investigator may want to collect
data while holding certain factors constant in order to assess the impact of some factors on a
given phenomenon.

For instance, in assessing the impact of obesity on blood pressure, the researcher would want to
collect data while holding constant the eating, smoking, and drinking habits of the people in
order to minimize the influence of these variables on blood pressure.

In the social sciences, the data that one generally encounters are non-experimental in nature, that
is, not subject to the control of the researcher. For example, the data on GNP, unemployment,
stock prices, etc., are not directly under the control of the investigator. As we shall see, this lack
of control often creates special problems for the researcher in pinning down the exact cause or
causes affecting a particular situation. For example, is it the money supply that determines the
(nominal) GDP or is it the other way round?

Types of Data

Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., combination of time series and cross section) data.

Time Series Data: A time series is a set of observations on the values that a variable takes at
different times; it is data on a single entity (person, firm, country) observed at multiple time
periods. Such data may be collected at regular time intervals, such as daily (e.g., stock prices,
weather reports), weekly (e.g., money supply figures), monthly [e.g., the unemployment rate, the
Consumer Price Index (CPI)], quarterly (e.g., GDP), annually (e.g., government budgets),
quinquennially, that is, every 5 years (e.g., the census of manufactures), or decennially (e.g., the
census of population).

Although time series data are used heavily in econometric studies, they present special problems
for econometricians because of stationarity issues.

Cross-Section Data: Data on different (multiple) entities - workers, consumers, firms,
governmental units - observed for a single time period. Cross-section data are data on one or
more variables collected at the same point in time, such as the census of population conducted by
the Census Bureau every 10 years.

Just as time series data create their own special problems (because of the stationarity issue),
cross-sectional data too have their own problems, specifically the problem of heterogeneity.

Pooled Data: Pooled, or combined, data contain elements of both time series and cross-sectional
data.

Panel data (also known as longitudinal data or micropanel data) consist of multiple entities where
each entity is observed at two or more time periods. This is a special type of pooled data in which
the same cross-sectional unit (say, a family or a firm) is surveyed over time. The key feature of
panel data that distinguishes it from a pooled cross section is the fact that the same cross-sectional
units (individuals, firms, or counties) are followed over a given time period.
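As an illustration added to these notes (not in the original), the sketch below uses Python's pandas library with made-up entities and numbers to show how the three data structures differ:

```python
import pandas as pd

# Time series: ONE entity observed over several periods (hypothetical GDP figures)
time_series = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018],
    "gdp":  [61.5, 65.2, 70.1, 76.8],   # made-up values
})

# Cross-section: SEVERAL entities observed in a single period
cross_section = pd.DataFrame({
    "firm":   ["A", "B", "C"],
    "year":   [2018, 2018, 2018],
    "output": [120, 95, 143],           # made-up values
})

# Panel: the SAME entities followed over several periods
panel = pd.DataFrame({
    "firm":   ["A", "A", "B", "B", "C", "C"],
    "year":   [2017, 2018, 2017, 2018, 2017, 2018],
    "output": [110, 120, 90, 95, 130, 143],
}).set_index(["firm", "year"])          # two-dimensional (entity, time) index

print(panel)
```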

CHAPTER TWO
SIMPLE LINEAR REGRESSION
Economic theories are mainly concerned with the relationships among various economic
variables. These relationships, when phrased in mathematical terms, can predict the effect of one
variable on another. The functional relationships of these variables define the dependence of one
variable upon the other variable (s) in the specific form. The specific functional forms may be
linear, quadratic, logarithmic, exponential, hyperbolic, or any other form.

In this chapter we shall consider a simple linear regression model, i.e. a relationship between two
variables related in a linear form. We shall first discuss concept of regression function followed
by estimation methods and their properties, hypothesis testing and prediction using simple linear
regression model.


2.1. Concept of Regression Function


Regression analysis is one of the most commonly used tools in econometric work.

Definition: Regression analysis is concerned with describing and evaluating the relationship
between a given variable (often called the dependent variable) and one or more variables which
are assumed to influence the given variable (often called independent or explanatory variables).

Much of applied econometric analysis begins with the following premise: y and x are two
variables, representing some population and we are interested in “explaining y in terms of x ,”
or in “studying how y varies with changes in x .”

In writing down a model that will “explain y in terms of x ,” we must confront three issues.
First, since there is never an exact relationship between two variables, how do we allow for other
factors to affect y ? Second, what is the functional relationship between y and x ? And third,
how can we be sure we are capturing a ceteris paribus relationship between y and x (if that is
a desired goal)?

We can resolve these ambiguities by writing down an equation relating y to x. A simple equation
is

$$y = \beta_0 + \beta_1 x + u \quad \text{................................................} (2.1)$$

Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear
regression model. It is also called the two-variable linear regression model or bivariate linear
regression model because it relates the two variables y and x . We now discuss the meaning of
each of the quantities in (2.1).

When related by (2.1), the variables y and x have several different names used interchangeably,
as follows. y is called the dependent variable, the explained variable, the response variable,
the predicted variable, or the regressand. x is called the independent variable, the
explanatory variable, the control variable, the predictor variable, or the regressor. (The term
covariate is also used for x .) The terms “dependent variable” and “independent variable” are
frequently used in econometrics.

The variable u , called the error term or disturbance term or stochastic term in the relationship,
represents factors other than x that affect y . A simple regression analysis effectively treats all
factors affecting y other than x as being unobserved. You can usefully think of u as standing
for “unobserved.”


0 and 1 are known as regression coefficients or regression parameters. 1 is the slope


parameter in the relationship between y and x holding the other factors in u fixed; it is of
primary interest in applied economics. The intercept parameter 0 also has its uses, although it
is rarely central to an analysis.

EXAMPLE 2.1
(Soybean Yield and Fertilizer)
Suppose that soybean yield is determined by the model

$$yield = \beta_0 + \beta_1 \, fertilizer + u,$$

so that $y = yield$ and $x = fertilizer$. The agricultural researcher is interested in the effect of
fertilizer on yield, holding other factors fixed. This effect is given by $\beta_1$. The error term u
contains factors such as land quality, rainfall, and so on. The coefficient $\beta_1$ measures the effect
of fertilizer on yield, holding other factors fixed: $\Delta y = \beta_1 \Delta fertilizer$.

If the average value of u does not depend on the value of x, it is useful to break y into two
components as in (2.2) below. The component $\beta_0 + \beta_1 x$ is sometimes called the systematic part
of y (also called the regression line), that is, the part of y explained by x, and u is called the
unsystematic part, or the part of y not explained by x.

$$\underbrace{y_i}_{\text{dependent variable}} = \underbrace{\beta_0 + \beta_1 x_i}_{\text{regression line}} + \underbrace{u_i}_{\text{random variable}} \quad \text{..................................} (2.2)$$
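To make this decomposition concrete, here is a minimal simulation sketch (an addition to these notes): we pick arbitrary true parameters, generate x and a random disturbance u, and build y from the systematic part plus u. All numbers are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1 = 2.0, 0.5        # hypothetical true parameters
n = 100                        # sample size

x = rng.uniform(0, 10, size=n)          # explanatory variable
u = rng.normal(0, 1, size=n)            # disturbance: mean 0, constant variance
y = beta0 + beta1 * x + u               # dependent variable

# The systematic part (regression line) and the random part add up to y exactly
systematic = beta0 + beta1 * x
assert np.allclose(y, systematic + u)
```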

Assumptions of the Classical Linear Stochastic Regression Model

The classical econometricians made important assumptions in their analysis of regression. The
most important of these assumptions are discussed below.

1. The model is linear in parameters.


The classical econometricians assumed that the model should be linear in the parameters,
regardless of whether the explanatory and dependent variables enter linearly or not. The reason
is that if the model is non-linear in the parameters, it is difficult to estimate them, since what we
observe are only the data on the dependent and independent variables, not the parameter values.

A function is said to be linear in a parameter, say $\beta_1$, if $\beta_1$ appears with a power of 1 only and
is not multiplied or divided by any other parameter (for example, $\beta_1 \beta_2$, $\beta_2 / \beta_1$, and so on).

EXAMPLE 2.2
All of the following models are linear in parameters except d, e, f and g.

a. $Y = \alpha + \beta X + U$
b. $\ln Y = \beta_1 + \beta_2 \ln X + U$
c. $Y = \beta_0 + \beta_1 X^2 + U$
d. $Y = \beta_0 + \beta_1^2 X + U$
e. $Y = \alpha + \beta^3 X + U$
f. $Y = \alpha + \beta^2 X + U$
g. $\ln Y = \alpha + \sqrt{\beta}\, X + U$
h. $Y^2 = \beta_1 + \beta_2 X^{1/3} + U$

2. $U_i$ is a random real variable

This means that the value which U may assume in any one period depends on chance; it may be
positive, negative or zero. Every value has a certain probability of being assumed by U in any
particular instance.

3. The mean value of the random variable (U) in any particular period is zero

This means that for each value of X, the random variable (U) may assume various values, some
greater than zero and some smaller than zero, but if we considered all the possible positive and
negative values of U for any given value of X, they would have an average value equal to zero. In
other words, the positive and negative values of U cancel each other.

Mathematically, $E(U_i) = 0$ ........................................ (2.3)

4. The assumption of homoscedasticity

The variance of the random variable (U) is constant in each period. This means that for all
values of X, the $U_i$'s will show the same dispersion around their mean. In Fig. 2.a this
assumption is denoted by the fact that the values that U can assume lie within the same
limits, irrespective of the value of X. For $X_1$, U can assume any value within the range AB;
for $X_2$, U can assume any value within the range CD, which is equal to AB, and so on.

Graphically:
Figure 2.a: Homoscedastic Variance

Mathematically:

$$Var(U_i) = E[U_i - E(U_i)]^2 = E(U_i^2) = \sigma_u^2 \quad (\text{since } E(U_i) = 0) \quad \text{.............} (2.4)$$

This constant variance assumption is called the homoscedasticity assumption, and the constant
variance itself is called homoscedastic variance.

5. The random variable (U) has a normal distribution

This means the values of U (for each X) have a bell-shaped symmetrical distribution about their
zero mean and constant variance $\sigma^2$, i.e.

$$U_i \sim N(0, \sigma^2) \quad \text{.........................................} (2.5)$$

6. No autocorrelation between the disturbances

Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between any two $U_i$ and $U_j$ ($i \neq j$) is
zero. Symbolically,

$$Cov(U_i, U_j) = E\{[U_i - E(U_i)][U_j - E(U_j)]\} = E(U_i U_j) = 0 \quad \text{.............} (2.6)$$

(using $E(U_i) = E(U_j) = 0$)

7. $X_i$ are non-stochastic

The $X_i$'s are a set of fixed values in the hypothetical process of repeated sampling which
underlies the linear regression model. This means that, in taking a large number of samples on Y
and X, the $X_i$ values are the same in all samples, but the $U_i$ values do differ from sample to
sample, and so of course do the values of $Y_i$.

8. The random variable (U) is independent of the explanatory variables

This means there is no correlation between the random variable and the explanatory variable. If
two variables are unrelated, their covariance is zero.

$$Cov(X_i, U_i) = E\{[X_i - E(X_i)][U_i - E(U_i)]\}$$
$$= E\{[X_i - E(X_i)]U_i\} \quad \text{since } E(U_i) = 0$$
$$= E(X_i U_i) - E(X_i)E(U_i)$$
$$= X_i E(U_i) \quad \text{since the } X_i\text{'s are fixed}$$
$$= 0 \quad \text{since } E(U_i) = 0 \quad \text{.............} (2.7)$$
Dear students! We can now use the above assumptions to derive the following basic concepts.

A. The dependent variable $Y_i$ is normally distributed, i.e.

$$Y_i \sim N(\alpha + \beta X_i,\; \sigma^2) \quad \text{..............................................} (2.8)$$

Proof:
Mean: $E(Y_i) = E(\alpha + \beta X_i + u_i) = \alpha + \beta X_i$, since $E(u_i) = 0$

Variance: $Var(Y_i) = E[Y_i - E(Y_i)]^2 = E[\alpha + \beta X_i + u_i - (\alpha + \beta X_i)]^2 = E(u_i^2) = \sigma^2$ (since $E(u_i^2) = \sigma^2$)

$$\therefore Var(Y_i) = \sigma^2 \quad \text{.........................................} (2.9)$$


The shape of the distribution of $Y_i$ is determined by the shape of the distribution of $U_i$, which is
normal by assumption 5. Since $\alpha$ and $\beta$ are constants, they don't affect the distribution of $Y_i$.
Furthermore, the values of the explanatory variable, $X_i$, are a set of fixed values by assumption
7 and therefore don't affect the shape of the distribution of $Y_i$.

$$Y_i \sim N(\alpha + \beta X_i,\; \sigma^2)$$

B. Successive values of the dependent variable are independent, i.e.

$$Cov(Y_i, Y_j) = 0$$

Proof:
$$Cov(Y_i, Y_j) = E\{[Y_i - E(Y_i)][Y_j - E(Y_j)]\}$$
$$= E\{[\alpha + \beta X_i + U_i - E(\alpha + \beta X_i + U_i)][\alpha + \beta X_j + U_j - E(\alpha + \beta X_j + U_j)]\}$$
(since $Y_i = \alpha + \beta X_i + U_i$ and $Y_j = \alpha + \beta X_j + U_j$)
$$= E[(\alpha + \beta X_i + U_i - \alpha - \beta X_i)(\alpha + \beta X_j + U_j - \alpha - \beta X_j)], \quad \text{since } E(u_i) = 0$$
$$= E(U_i U_j) = 0 \quad \text{(from equation (2.6))}$$
Therefore, $Cov(Y_i, Y_j) = 0$.

2.2. Methods of Estimation


Specifying the model and stating its underlying assumptions are the first stage of any
econometric application. The next step is the estimation of the numerical values of the
parameters of economic relationships. The parameters of the simple linear regression model can
be estimated by various methods. Three of the most commonly used methods are:

1. Ordinary least squares method (OLS)
2. Maximum likelihood method (MLM)
3. Method of moments (MM)
2.2.1. The Method of Moments: Simply stated, this method of estimation uses the following
rule: Keep equating population moments to their sample counterpart until you have estimated all
the population parameters. In general, the method of moments, sometimes called MM for short,
estimates population moments by the corresponding sample moments. In order to apply this
method to regression models, we must use the facts that population moments are expectations,
and that regression models are specified in terms of the conditional expectations of the error
terms. Consider the following simple linear regression model:


$$Y_i = \alpha + \beta X_i + U_i$$

The assumptions we have made about the error term U imply that

$$E(U) = 0 \quad \text{and} \quad Cov(X, U) = 0$$

In the method of moments, we replace these conditions by their sample counterparts.

Let $\hat{\alpha}$ and $\hat{\beta}$ be the estimators of $\alpha$ and $\beta$, respectively. The sample counterpart of $U_i$ is the
estimated error $\hat{U}_i$ (which is also called the residual), defined as

$$\hat{U}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i \quad \text{.............................................} (2.10)$$

The two equations needed to determine $\hat{\alpha}$ and $\hat{\beta}$ are obtained by replacing the population
assumptions by their sample counterparts:

Population assumption: $E(U) = 0$; sample counterpart: $\frac{1}{n}\sum \hat{U}_i = 0$, or $\sum \hat{U}_i = 0$.
Population assumption: $Cov(X, U) = 0$; sample counterpart: $\frac{1}{n}\sum X_i \hat{U}_i = 0$, or $\sum X_i \hat{U}_i = 0$.

In these and the following equations, $\sum$ denotes $\sum_{i=1}^{n}$. Thus we get the two equations:

$$\sum \hat{U}_i = 0 \quad \text{or} \quad \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \quad \text{..................} (2.11)$$

$$\Rightarrow \sum Y_i - n\hat{\alpha} - \hat{\beta} \sum X_i = 0$$

$$\Rightarrow \sum Y_i = n\hat{\alpha} + \hat{\beta} \sum X_i \quad \text{..................} (2.12)$$

$$\Rightarrow \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \quad \text{..................} (2.13)$$

$$\sum X_i \hat{U}_i = 0 \quad \text{or} \quad \sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \;\Rightarrow\; \sum X_i Y_i - \hat{\alpha} \sum X_i - \hat{\beta} \sum X_i^2 = 0 \quad \text{........} (2.14)$$

$$\Rightarrow \sum X_i Y_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2 \quad \text{..................} (2.15)$$

Substituting the value of $\hat{\alpha}$ from (2.13) into (2.15), we get:

$$\sum Y_i X_i = \sum X_i (\bar{Y} - \hat{\beta}\bar{X}) + \hat{\beta} \sum X_i^2$$
$$= \bar{Y} \sum X_i - \hat{\beta}\bar{X} \sum X_i + \hat{\beta} \sum X_i^2$$
$$\Rightarrow \sum Y_i X_i - \bar{Y} \sum X_i = \hat{\beta} \left(\sum X_i^2 - \bar{X} \sum X_i\right)$$
$$\Rightarrow \sum X_i Y_i - n\bar{X}\bar{Y} = \hat{\beta} \left(\sum X_i^2 - n\bar{X}^2\right)$$

$$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} \quad \text{..................} (2.16)$$

Note that equations (2.12) and (2.15) are called the "normal equations".
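As an illustrative sketch (an addition to these notes), the method-of-moments estimates in (2.13) and (2.16) can be computed directly in Python; the data below are made-up numbers:

```python
import numpy as np

def mm_estimates(x, y):
    """Method-of-moments estimates for Y = alpha + beta*X + U, eqs. (2.13) and (2.16)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    beta_hat = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)  # eq. (2.16)
    alpha_hat = ybar - beta_hat * xbar                                           # eq. (2.13)
    return alpha_hat, beta_hat

# Hypothetical usage with made-up data:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
a, b = mm_estimates(x, y)
print(f"alpha_hat = {a:.3f}, beta_hat = {b:.3f}")

# The sample moment conditions hold: residuals sum to ~0 and are uncorrelated with x
u_hat = y - a - b * x
print(np.isclose(u_hat.sum(), 0), np.isclose((x * u_hat).sum(), 0))
```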
2.2.2. Method of Least Squares

Consider the following simple linear regression model:

$$Y_i = \alpha + \beta X_i + U_i$$

Estimation of $\alpha$ and $\beta$ by the least squares method (OLS), or classical least squares (CLS), involves
finding values for the estimates $\hat{\alpha}$ and $\hat{\beta}$ which will minimize the sum of the squared residuals
($\sum \hat{U}_i^2$).

From the estimated relationship $Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{U}_i$, we obtain:

$$\hat{U}_i = Y_i - (\hat{\alpha} + \hat{\beta} X_i) \quad \text{..................} (2.17)$$

$$\sum \hat{U}_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2 \quad \text{..................} (2.18)$$

To find the values of $\hat{\alpha}$ and $\hat{\beta}$ that minimize this sum, we partially differentiate $\sum \hat{U}_i^2$ with
respect to $\hat{\alpha}$ and $\hat{\beta}$ and set the partial derivatives equal to zero.

$$1. \quad \frac{\partial \sum \hat{U}_i^2}{\partial \hat{\alpha}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \quad \text{..................} (2.19)$$

Rearranging this expression we get: $\sum Y_i = n\hat{\alpha} + \hat{\beta} \sum X_i$ .................. (2.20)

Dividing (2.20) by n and rearranging, we get

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \quad \text{..................} (2.21)$$

$$2. \quad \frac{\partial \sum \hat{U}_i^2}{\partial \hat{\beta}} = -2 \sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \quad \text{..................} (2.22)$$

Note at this point that the term in parentheses in equations (2.19) and (2.22) is the residual,
$\hat{U}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$. Hence it is possible to rewrite (2.19) and (2.22) as $-2\sum \hat{U}_i = 0$ and
$-2\sum X_i \hat{U}_i = 0$. It follows that:

$$\sum \hat{U}_i = 0 \quad \text{and} \quad \sum X_i \hat{U}_i = 0 \quad \text{..................} (2.23)$$

If we rearrange equation (2.22) we obtain:

$$\sum Y_i X_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2 \quad \text{..................} (2.24)$$
Equations (2.20) and (2.24) are called the normal equations. Substituting the value of $\hat{\alpha}$ from
(2.21) into (2.24), we get:

$$\sum Y_i X_i = \sum X_i (\bar{Y} - \hat{\beta}\bar{X}) + \hat{\beta} \sum X_i^2$$
$$= \bar{Y} \sum X_i - \hat{\beta}\bar{X} \sum X_i + \hat{\beta} \sum X_i^2$$
$$\Rightarrow \sum Y_i X_i - \bar{Y} \sum X_i = \hat{\beta} \left(\sum X_i^2 - \bar{X} \sum X_i\right)$$
$$\Rightarrow \sum X_i Y_i - n\bar{X}\bar{Y} = \hat{\beta} \left(\sum X_i^2 - n\bar{X}^2\right)$$

$$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} \quad \text{..................} (2.25)$$

Equation (2.25) can be rewritten in a somewhat different way as follows:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i Y_i - \bar{X}Y_i - \bar{Y}X_i + \bar{X}\bar{Y})$$
$$= \sum X_i Y_i - \bar{X}\sum Y_i - \bar{Y}\sum X_i + n\bar{X}\bar{Y}$$
$$= \sum X_i Y_i - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y}$$

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n\bar{X}\bar{Y} \quad \text{..................} (2.26)$$

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2 \quad \text{..................} (2.27)$$

Substituting (2.26) and (2.27) in (2.25), we get

$$\hat{\beta} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

Now, denoting $(X_i - \bar{X})$ as $x_i$ and $(Y_i - \bar{Y})$ as $y_i$, we get:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} \quad \text{..................} (2.28)$$

The expression in (2.28) for estimating the parameter coefficient is termed the formula in deviation
form.
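A minimal sketch (added here, not from the original notes) of the OLS formulas (2.21) and (2.28) in deviation form; the function name is hypothetical:

```python
import numpy as np

def ols_deviation_form(X, Y):
    """OLS estimates via the deviation form: beta_hat = sum(x*y)/sum(x^2), eq. (2.28)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    x = X - X.mean()                                 # deviations of X from its mean
    y = Y - Y.mean()                                 # deviations of Y from its mean
    beta_hat = np.sum(x * y) / np.sum(x**2)          # eq. (2.28)
    alpha_hat = Y.mean() - beta_hat * X.mean()       # eq. (2.21)
    return alpha_hat, beta_hat
```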
2.3. Adequacy of the Regression Model
Fitting a regression model requires several assumptions.
 Errors are uncorrelated random variables with mean zero;
 Errors have constant variance; and,
 Errors are normally distributed.
The analyst should always consider the validity of these assumptions to be doubtful and conduct
analyses to examine the adequacy of the model.
2.3.1. Residuals Analysis
The residuals from a regression model are $\hat{U}_i = Y_i - \hat{Y}_i$, where $Y_i$ is an actual observation and $\hat{Y}_i$ is

the corresponding fitted value from the regression model. Analysis of the residuals is frequently
helpful in checking the assumption that the errors are approximately normally distributed with
constant variance, and in determining whether additional terms in the model would be useful.
Residual analysis is used to check goodness of fit for models.

2.3.2. Goodness-of-fit ($R^2$)

The aim of regression analysis is to explain the behavior of the dependent variable Y. In any given
sample, Y is relatively low in some observations and relatively high in others. We want to know
why. The variations in Y in any sample can be summarized by the sample variance, Var(Y). We
would like to be able to account for the size of this variance through the measure called $R^2$.

$R^2$ shows the percentage of the total variation of the dependent variable that can be explained by
the changes in the explanatory variable(s) included in the model. To elaborate this, let's draw a


horizontal line corresponding to the mean value of the dependent variable, $\bar{Y}$ (see Figure 2.1
below). By fitting the line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ we try to obtain the explanation of the variation of the
dependent variable Y produced by the changes of the explanatory variable X.

Figure 2.1: Actual and estimated values of the dependent variable Y (the figure plots the fitted line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ against X and marks the vertical distances $Y - \hat{Y} = \hat{U}$, $\hat{Y} - \bar{Y}$, and $Y - \bar{Y}$).

As can be seen from Figure 2.1 above, $Y - \bar{Y}$ measures the variation of the sample observation
value of the dependent variable around the mean. However, the variation in Y that can be
attributed to the influence of X (i.e. the regression line) is given by the vertical distance $\hat{Y} - \bar{Y}$.
The part of the total variation in Y about $\bar{Y}$ that can't be attributed to X is equal to $\hat{U} = Y - \hat{Y}$,
which is referred to as the residual variation.
In summary:
$\hat{U}_i = Y_i - \hat{Y}_i$ = deviation of the observation $Y_i$ from the regression line.
$y_i = Y_i - \bar{Y}$ = deviation of Y from its mean.
$\hat{y}_i = \hat{Y}_i - \bar{Y}$ = deviation of the regressed (predicted) value $\hat{Y}_i$ from the mean.

Now, we may write the observed Y as the sum of the predicted value ($\hat{Y}_i$) and the residual term ($\hat{U}_i$):

$$\underbrace{Y_i}_{\text{observed}} = \underbrace{\hat{Y}_i}_{\text{predicted}} + \underbrace{\hat{U}_i}_{\text{residual}} \quad \text{..................} (2.29)$$

From equation (2.29) we can write the same equation in deviation form: $y_i = \hat{y}_i + \hat{U}_i$.
By squaring and summing both sides, we obtain the following expression:

$$\sum y_i^2 = \sum (\hat{y}_i + \hat{U}_i)^2 = \sum (\hat{y}_i^2 + \hat{U}_i^2 + 2\hat{y}_i \hat{U}_i) = \sum \hat{y}_i^2 + \sum \hat{U}_i^2 + 2\sum \hat{y}_i \hat{U}_i$$

But $\sum \hat{y}_i \hat{U}_i = \sum \hat{U}_i (\hat{Y}_i - \bar{Y}) = \sum \hat{U}_i (\hat{\alpha} + \hat{\beta} X_i - \bar{Y}) = \hat{\alpha} \sum \hat{U}_i + \hat{\beta} \sum \hat{U}_i X_i - \bar{Y} \sum \hat{U}_i$

(but $\sum \hat{U}_i = 0$ and $\sum \hat{U}_i X_i = 0$; please try to prove it).

$$\Rightarrow \sum \hat{y}_i \hat{U}_i = 0 \quad \text{..................} (2.30)$$
Therefore:

$$\underbrace{\sum y_i^2}_{\text{total variation}} = \underbrace{\sum \hat{y}_i^2}_{\text{explained variation}} + \underbrace{\sum \hat{U}_i^2}_{\text{unexplained variation}} \quad \text{..................} (2.31)$$

i.e. Total sum of squares = Explained sum of squares + Residual sum of squares:

$$TSS = ESS + RSS \quad \text{..................} (2.32)$$

The breakdown of the total sum of squares TSS into the explained sum of squares ESS and the
residual sum of squares RSS is known as the analysis of variance (ANOVA). The purpose of
presenting the ANOVA table is to test the significance of the explained sum of squares.

Mathematically, the explained variation as a percentage of the total variation is expressed as:

$$\frac{ESS}{TSS} = \frac{\sum \hat{y}^2}{\sum y^2} \quad \text{..................} (2.33)$$

The estimated regression line in deviation form is given by $\hat{y} = \hat{\beta}x$ (dear students! you can
work out the proof yourself). Squaring and summing both sides gives us

$$\sum \hat{y}^2 = \hat{\beta}^2 \sum x^2 \quad \text{..................} (2.34)$$

We can substitute (2.34) in (2.33) and obtain:

$$ESS/TSS = \frac{\hat{\beta}^2 \sum x^2}{\sum y^2} \quad \text{..................} (2.35)$$

$$= \left(\frac{\sum xy}{\sum x^2}\right)^2 \frac{\sum x^2}{\sum y^2}, \quad \text{since } \hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}$$

$$= \frac{\sum xy}{\sum x^2} \cdot \frac{\sum xy}{\sum y^2} \quad \text{..................} (2.36)$$

Comparing (2.36) with the formula of the correlation coefficient:

$$r = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{\sum xy / n}{\sqrt{(\sum x^2 / n)(\sum y^2 / n)}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \quad \text{..................} (2.37)$$

Squaring (2.37) will result in:

$$r^2 = \frac{(\sum xy)^2}{\sum x^2 \sum y^2} \quad \text{..................} (2.38)$$

Comparing (2.36) and (2.38), we see exactly the same expressions. Therefore:

$$ESS/TSS = \frac{\sum xy}{\sum x^2} \cdot \frac{\sum xy}{\sum y^2} = r^2$$

From (2.32), RSS = TSS - ESS. Hence $R^2$ becomes:


$$R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum y^2} \quad \text{..................} (2.39)$$

Q1: Prove that the value of $R^2$ falls between zero and one, i.e. $0 \leq R^2 \leq 1$.
Q2: Show that $R^2 = r^2_{y\hat{y}}$.

Other things being equal, one would like $R^2$ to be as high as possible. In particular, we would like
the coefficients $\hat{\alpha}$ and $\hat{\beta}$ to be chosen in such a way as to maximize $R^2$, which is an equivalent
criterion to choosing $\hat{\alpha}$ and $\hat{\beta}$ so as to minimize the sum of the squares of the residuals.

Interpretation of $R^2$

Suppose $R^2 = 0.95$; this means that the regression line gives a good fit to the observed data
since this line explains 95% of the total variation of the Y values around their mean. The
remaining 5% of the total variation in Y is unaccounted for by the regression line and is
attributed to the factors included in the disturbance variable $u_i$.
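Here is a short illustrative sketch (an addition to these notes) computing TSS, ESS, RSS and $R^2$ as in (2.31)-(2.39); the estimates can come from the hypothetical ols_deviation_form helper sketched earlier:

```python
import numpy as np

def r_squared(X, Y, alpha_hat, beta_hat):
    """Goodness of fit: R^2 = 1 - RSS/TSS, eq. (2.39)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    Y_hat = alpha_hat + beta_hat * X
    u_hat = Y - Y_hat                      # residuals
    tss = np.sum((Y - Y.mean())**2)        # total variation
    ess = np.sum((Y_hat - Y.mean())**2)    # explained variation
    rss = np.sum(u_hat**2)                 # unexplained variation
    assert np.isclose(tss, ess + rss)      # decomposition (2.32)
    return 1 - rss / tss
```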

2.4. Properties of OLS Estimates and Gauss-Markov Theorem


Given the assumptions of the classical linear regression model, the least-squares estimates possess
some ideal or optimum properties. These properties are contained in the well-known Gauss–
Markov theorem. To understand this theorem, we need to consider the best linear unbiasedness
property of an estimator. An estimator, say the OLS estimator $\hat{\beta}$, is said to be a best linear
unbiased estimator (BLUE) of $\beta$ if the following hold:
1. It is linear, that is, a linear function of a random variable, such as the dependent variable
Y in the regression model.
2. It is unbiased, that is, its average or expected value, $E(\hat{\beta})$, is equal to the true value, $\beta$.
3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased
estimator with the least variance is known as an efficient estimator.
In the regression context it can be proved that the OLS estimators are BLUE. This is the gist of the
famous Gauss–Markov theorem, which can be stated as follows:
Gauss–Markov Theorem: Given the assumptions of the classical linear regression model, the
least-squares estimators, in the class of unbiased linear estimators, have minimum variance,
that is, they are BLUE.
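The theorem can be illustrated (though not proved) by simulation. The sketch below is an addition to these notes with assumed parameter values: it repeatedly draws samples from a model with known $\alpha$ and $\beta$ and checks that the OLS slope estimates average out to the true value, with variance matching (2.47) below.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
alpha, beta, sigma, n = 2.0, 0.5, 1.0, 50   # hypothetical true values

X = rng.uniform(0, 10, size=n)              # fixed in repeated sampling (assumption 7)
estimates = []
for _ in range(5000):                       # repeated sampling
    U = rng.normal(0, sigma, size=n)        # fresh disturbances each sample
    Y = alpha + beta * X + U
    x, y = X - X.mean(), Y - Y.mean()
    estimates.append(np.sum(x * y) / np.sum(x**2))   # OLS slope, eq. (2.28)

print(np.mean(estimates))    # close to 0.5: unbiasedness, E(beta_hat) = beta
print(np.var(estimates), sigma**2 / np.sum((X - X.mean())**2))  # close to Var(beta_hat)
```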

According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties.
The detailed proof of these properties is presented below (Dear students! this is your reading
assignment and if you face any difficulty while reading, you are welcome).
a. Linearity (for $\hat{\beta}$)
Proposition: $\hat{\alpha}$ and $\hat{\beta}$ are linear in Y.


Proof: From (2.28), the OLS estimator $\hat{\beta}$ is given by:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^2} = \frac{\sum x_i Y_i - \bar{Y} \sum x_i}{\sum x_i^2},$$

(but $\sum x_i = \sum (X_i - \bar{X}) = \sum X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$)

$$\Rightarrow \hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^2}; \quad \text{now, let } k_i = \frac{x_i}{\sum x_i^2} \; (i = 1, 2, \ldots, n)$$

$$\Rightarrow \hat{\beta} = \sum k_i Y_i \quad \text{..................} (2.40)$$

$$\Rightarrow \hat{\beta} = k_1 Y_1 + k_2 Y_2 + k_3 Y_3 + \cdots + k_n Y_n$$

$\therefore \hat{\beta}$ is linear in Y.

Check yourself question: Show that $\hat{\alpha}$ is linear in Y. Hint: $\hat{\alpha} = \sum \left(\frac{1}{n} - \bar{X} k_i\right) Y_i$. Derive the
relationship between $\hat{\alpha}$ and Y.

b. Unbiasedness
Proposition: $\hat{\alpha}$ and $\hat{\beta}$ are the unbiased estimators of the true parameters $\alpha$ and $\beta$.

From your statistics course, you may recall that if $\hat{\theta}$ is an estimator of $\theta$, then $E(\hat{\theta}) - \theta$ is the
amount of bias, and if $\hat{\theta}$ is an unbiased estimator of $\theta$, then bias = 0, i.e. $E(\hat{\theta}) - \theta = 0 \Rightarrow E(\hat{\theta}) = \theta$.

In our case, $\hat{\alpha}$ and $\hat{\beta}$ are estimators of the true parameters $\alpha$ and $\beta$. To show that they are the
unbiased estimators of their respective parameters means to prove that:

$$E(\hat{\beta}) = \beta \quad \text{and} \quad E(\hat{\alpha}) = \alpha$$

Proof (1): Prove that $\hat{\beta}$ is unbiased, i.e. $E(\hat{\beta}) = \beta$.

We know that $\hat{\beta} = \sum k_i Y_i = \sum k_i (\alpha + \beta X_i + U_i)$
$$= \alpha \sum k_i + \beta \sum k_i X_i + \sum k_i u_i,$$
but $\sum k_i = 0$ and $\sum k_i X_i = 1$:

$$\sum k_i = \sum \frac{x_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})}{\sum x_i^2} = \frac{\sum X_i - n\bar{X}}{\sum x_i^2} = \frac{n\bar{X} - n\bar{X}}{\sum x_i^2} = 0$$

$$\Rightarrow \sum k_i = 0 \quad \text{..................} (2.41)$$

$$\sum k_i X_i = \frac{\sum x_i X_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X}) X_i}{\sum x_i^2} = \frac{\sum X_i^2 - \bar{X}\sum X_i}{\sum X_i^2 - n\bar{X}^2} = \frac{\sum X_i^2 - n\bar{X}^2}{\sum X_i^2 - n\bar{X}^2} = 1$$

$$\Rightarrow \sum k_i X_i = 1 \quad \text{..................} (2.42)$$


$$\hat{\beta} = \beta + \sum k_i u_i \;\Rightarrow\; \hat{\beta} - \beta = \sum k_i u_i \quad \text{..................} (2.43)$$

$$E(\hat{\beta}) = \beta + \sum k_i E(u_i), \quad \text{since the } k_i \text{ are fixed}$$

$$E(\hat{\beta}) = \beta, \quad \text{since } E(u_i) = 0$$

Therefore, $\hat{\beta}$ is an unbiased estimator of $\beta$.

Proof (2): Prove that $\hat{\alpha}$ is unbiased, i.e. $E(\hat{\alpha}) = \alpha$.

From the proof of the linearity property of $\hat{\alpha}$, we know that:

$$\hat{\alpha} = \sum \left(\frac{1}{n} - \bar{X} k_i\right) Y_i = \sum \left(\frac{1}{n} - \bar{X} k_i\right)(\alpha + \beta X_i + u_i), \quad \text{since } Y_i = \alpha + \beta X_i + U_i$$

$$= \alpha + \beta \frac{1}{n}\sum X_i + \frac{1}{n}\sum u_i - \alpha \bar{X} \sum k_i - \beta \bar{X} \sum k_i X_i - \bar{X} \sum k_i u_i$$

$$= \alpha + \frac{1}{n}\sum u_i - \bar{X} \sum k_i u_i \;\Rightarrow\; \hat{\alpha} - \alpha = \sum \left(\frac{1}{n} - \bar{X} k_i\right) u_i \quad \text{..................} (2.44)$$

$$E(\hat{\alpha}) = \alpha + \frac{1}{n}\sum E(u_i) - \bar{X} \sum k_i E(u_i)$$

$$E(\hat{\alpha}) = \alpha \quad \text{..................} (2.45)$$

$\therefore \hat{\alpha}$ is an unbiased estimator of $\alpha$.
c. Minimum variance of $\hat{\alpha}$ and $\hat{\beta}$

Now we have to establish that, out of the class of linear and unbiased estimators of $\alpha$ and $\beta$,
$\hat{\alpha}$ and $\hat{\beta}$ possess the smallest sampling variances. For this, we shall first obtain the variances of
$\hat{\alpha}$ and $\hat{\beta}$ and then establish that each has the minimum variance in comparison with the variances
of other linear and unbiased estimators obtained by any econometric method other than OLS.

a. Variance of $\hat{\beta}$

$$Var(\hat{\beta}) = E\left(\hat{\beta} - E(\hat{\beta})\right)^2 = E(\hat{\beta} - \beta)^2 \quad \text{..................} (2.46)$$
Substituting (2.43) in (2.46), we get

$$Var(\hat{\beta}) = E\left(\sum k_i u_i\right)^2$$
$$= E[k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2 + 2k_1 k_2 u_1 u_2 + \cdots + 2k_{n-1} k_n u_{n-1} u_n]$$
$$= E\left(\sum k_i^2 u_i^2\right) + E\left(\sum_{i \neq j} k_i k_j u_i u_j\right)$$
$$= \sum k_i^2 E(u_i^2) + 2\sum_{i \neq j} k_i k_j E(u_i u_j) = \sigma^2 \sum k_i^2 \quad (\text{since } E(u_i u_j) = 0)$$

$$k_i = \frac{x_i}{\sum x_i^2}, \quad \text{and therefore} \quad \sum k_i^2 = \frac{\sum x_i^2}{(\sum x_i^2)^2} = \frac{1}{\sum x_i^2}$$

$$\Rightarrow Var(\hat{\beta}) = \sigma^2 \sum k_i^2 = \frac{\sigma^2}{\sum x_i^2} \quad \text{..................} (2.47)$$
xi
Variance of $\hat{\alpha}$

$$Var(\hat{\alpha}) = E\left(\hat{\alpha} - E(\hat{\alpha})\right)^2 = E(\hat{\alpha} - \alpha)^2 \quad \text{..................} (2.48)$$

Substituting equation (2.44) in (2.48), we get

$$Var(\hat{\alpha}) = E\left[\sum \left(\frac{1}{n} - \bar{X} k_i\right) u_i\right]^2 = \sum \left(\frac{1}{n} - \bar{X} k_i\right)^2 E(u_i^2)$$
$$= \sigma^2 \sum \left(\frac{1}{n} - \bar{X} k_i\right)^2$$
$$= \sigma^2 \sum \left(\frac{1}{n^2} - \frac{2}{n}\bar{X} k_i + \bar{X}^2 k_i^2\right)$$
$$= \sigma^2 \left(\frac{1}{n} - \frac{2\bar{X}}{n}\sum k_i + \bar{X}^2 \sum k_i^2\right), \quad \text{since } \sum k_i = 0$$
$$= \sigma^2 \left(\frac{1}{n} + \bar{X}^2 \sum k_i^2\right)$$
$$= \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right), \quad \text{since } \sum k_i^2 = \frac{1}{\sum x_i^2}$$

Again:

$$\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2} = \frac{\sum x_i^2 + n\bar{X}^2}{n \sum x_i^2} = \frac{\sum X_i^2}{n \sum x_i^2}$$

$$\Rightarrow Var(\hat{\alpha}) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right) = \sigma^2 \left(\frac{\sum X_i^2}{n \sum x_i^2}\right) \quad \text{..................} (2.49)$$

Dear students! We have computed the variances of the OLS estimators. Now it is time to check
whether these variances of the OLS estimators do possess the minimum variance property compared
to the variances of other estimators of the true $\alpha$ and $\beta$, other than $\hat{\alpha}$ and $\hat{\beta}$.

To establish that $\hat{\alpha}$ and $\hat{\beta}$ possess the minimum variance property, we compare their variances
with those of some other alternative linear and unbiased estimators of $\alpha$ and $\beta$, say $\alpha^*$
and $\beta^*$. Now we want to prove that any other linear and unbiased estimator of the true
population parameter, obtained from any other econometric method, has a larger variance than the
OLS estimators.

Let's first show the minimum variance of $\hat{\beta}$ and then that of $\hat{\alpha}$.

1. Minimum variance of $\hat{\beta}$

Suppose $\beta^*$ is an alternative linear and unbiased estimator of $\beta$, and let

$$\beta^* = \sum w_i Y_i \quad \text{..................} (2.50)$$


where $w_i \neq k_i$; but $w_i = k_i + c_i$.

$$\beta^* = \sum w_i (\alpha + \beta X_i + u_i), \quad \text{since } Y_i = \alpha + \beta X_i + U_i$$
$$= \alpha \sum w_i + \beta \sum w_i X_i + \sum w_i u_i$$
$$\Rightarrow E(\beta^*) = \alpha \sum w_i + \beta \sum w_i X_i, \quad \text{since } E(u_i) = 0$$

Since $\beta^*$ is assumed to be an unbiased estimator of $\beta$, it must be true that $\sum w_i = 0$ and
$\sum w_i X_i = 1$ in the above equation.

But $w_i = k_i + c_i$, so
$$\sum w_i = \sum (k_i + c_i) = \sum k_i + \sum c_i.$$
Therefore, $\sum c_i = 0$, since $\sum k_i = \sum w_i = 0$.

Again, $\sum w_i X_i = \sum (k_i + c_i) X_i = \sum k_i X_i + \sum c_i X_i$.
Since $\sum w_i X_i = 1$ and $\sum k_i X_i = 1$, it follows that $\sum c_i X_i = 0$.

From these values we can derive $\sum c_i x_i = 0$, where $x_i = X_i - \bar{X}$:
$$\sum c_i x_i = \sum c_i (X_i - \bar{X}) = \sum c_i X_i - \bar{X} \sum c_i = 0.$$

Thus, from the above calculations we can summarize the following results:
$$\sum w_i = 0, \quad \sum w_i X_i = 1, \quad \sum c_i = 0, \quad \sum c_i X_i = 0, \quad \sum c_i x_i = 0$$

To prove whether $\hat{\beta}$ has minimum variance or not, let's compute $Var(\beta^*)$ to compare with $Var(\hat{\beta})$:

$$Var(\beta^*) = Var\left(\sum w_i Y_i\right) = \sum w_i^2 Var(Y_i) = \sigma^2 \sum w_i^2, \quad \text{since } Var(Y_i) = \sigma^2$$

But
$$\sum w_i^2 = \sum (k_i + c_i)^2 = \sum k_i^2 + 2\sum k_i c_i + \sum c_i^2 = \sum k_i^2 + \sum c_i^2,$$
since $\sum k_i c_i = \frac{\sum c_i x_i}{\sum x_i^2} = 0$.

Therefore,
$$Var(\beta^*) = \sigma^2 \sum (k_i + c_i)^2 = \sigma^2 \sum k_i^2 + \sigma^2 \sum c_i^2$$
$$\Rightarrow Var(\beta^*) = Var(\hat{\beta}) + \sigma^2 \sum c_i^2$$

Given that the $c_i$ are arbitrary constants, $\sigma^2 \sum c_i^2$ is positive, i.e. it is greater than zero (unless
all $c_i = 0$). Thus $Var(\beta^*) \geq Var(\hat{\beta})$. This proves that $\hat{\beta}$ possesses the minimum variance
property. In a similar way we can prove that the least squares estimate of the constant intercept
($\hat{\alpha}$) possesses minimum variance.


Minimum variance of $\hat{\alpha}$

We take a new estimator $\alpha^*$, which we assume to be a linear and unbiased estimator of $\alpha$. The
least squares estimator $\hat{\alpha}$ is given by:

$$\hat{\alpha} = \sum \left(\frac{1}{n} - \bar{X} k_i\right) Y_i$$

By analogy with the proof of the minimum variance property of $\hat{\beta}$, let's use the weights
$w_i = k_i + c_i$. Consequently,

$$\alpha^* = \sum \left(\frac{1}{n} - \bar{X} w_i\right) Y_i$$

Since we want $\alpha^*$ to be an unbiased estimator of the true $\alpha$, that is, $E(\alpha^*) = \alpha$, we substitute
$Y_i = \alpha + \beta X_i + u_i$ in $\alpha^*$ and find the expected value of $\alpha^*$:

$$\alpha^* = \sum \left(\frac{1}{n} - \bar{X} w_i\right)(\alpha + \beta X_i + u_i)$$
$$= \sum \left(\frac{\alpha}{n} + \frac{\beta X_i}{n} + \frac{u_i}{n} - \alpha \bar{X} w_i - \beta \bar{X} w_i X_i - \bar{X} w_i u_i\right)$$
$$\Rightarrow \alpha^* = \alpha + \beta \bar{X} + \frac{\sum u_i}{n} - \alpha \bar{X} \sum w_i - \beta \bar{X} \sum w_i X_i - \bar{X} \sum w_i u_i$$

For $\alpha^*$ to be an unbiased estimator of the true $\alpha$, the following must hold:

$$\sum w_i = 0, \quad \sum w_i X_i = 1, \quad \text{and} \quad E\left(\sum w_i u_i\right) = 0,$$

i.e., if $\sum w_i = 0$ and $\sum w_i X_i = 1$. These conditions imply that $\sum c_i = 0$ and $\sum c_i X_i = 0$.

As in the case of $\hat{\beta}$, we need to compute $Var(\alpha^*)$ to compare with $Var(\hat{\alpha})$:

$$Var(\alpha^*) = Var\left[\sum \left(\frac{1}{n} - \bar{X} w_i\right) Y_i\right] = \sum \left(\frac{1}{n} - \bar{X} w_i\right)^2 Var(Y_i)$$
$$= \sigma^2 \sum \left(\frac{1}{n} - \bar{X} w_i\right)^2$$
$$= \sigma^2 \sum \left(\frac{1}{n^2} + \bar{X}^2 w_i^2 - \frac{2}{n} \bar{X} w_i\right)$$
$$= \sigma^2 \left(\frac{n}{n^2} + \bar{X}^2 \sum w_i^2 - \frac{2\bar{X}}{n} \sum w_i\right)$$
$$\Rightarrow Var(\alpha^*) = \sigma^2 \left(\frac{1}{n} + \bar{X}^2 \sum w_i^2\right), \quad \text{since } \sum w_i = 0$$

But $\sum w_i^2 = \sum k_i^2 + \sum c_i^2$, so

$$Var(\alpha^*) = \sigma^2 \left(\frac{1}{n} + \bar{X}^2 \left(\sum k_i^2 + \sum c_i^2\right)\right)$$
$$= \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right) + \sigma^2 \bar{X}^2 \sum c_i^2$$
$$= \sigma^2 \left(\frac{\sum X_i^2}{n \sum x_i^2}\right) + \sigma^2 \bar{X}^2 \sum c_i^2$$

The first term in the bracket is $Var(\hat{\alpha})$, hence


var( *)  var(ˆ )   2 X 2 ci2

 var( *)  var(ˆ ) , Since  2 X 2 ci2  0


The variance of the random variable ($U_i$)

Dear student! You may observe that the variances of the OLS estimates involve $\sigma^2$, which is the
population variance of the random disturbance term. But it is difficult to obtain population data
on the disturbance term for technical and economic reasons, and hence it is difficult to compute
$\sigma^2$; this implies that the variances of the OLS estimates are also difficult to compute. But we can
compute these variances if we take the unbiased estimate of $\sigma^2$, which is $\hat{\sigma}^2$, computed from the
sample values of the residuals $\hat{U}_i$ from the expression:

$$\hat{\sigma}_u^2 = \frac{\sum \hat{U}_i^2}{n - 2} \quad \text{..................} (2.51)$$

To use $\hat{\sigma}^2$ in the expressions for the variances of $\hat{\alpha}$ and $\hat{\beta}$, we have to prove whether $\hat{\sigma}^2$ is an
unbiased estimator of $\sigma^2$, i.e. that

$$E(\hat{\sigma}^2) = E\left(\frac{\sum \hat{U}_i^2}{n - 2}\right) = \sigma^2$$

2.5. Maximum Likelihood Method of Estimation



The maximum likelihood method of estimation is based on the idea that different populations
generate different samples, and that any given sample is more likely to have come from some
populations than from others.

The ML estimator of a parameter $\theta$ is the value $\hat{\theta}$ which would most likely generate the
observed sample observations $Y_1, Y_2, \ldots, Y_n$. The ML estimator maximizes the likelihood function L,
which is the product of the individual probabilities taken over all n observations, given by:

$$L(Y_1, Y_2, \ldots, Y_n; \alpha, \beta, \sigma^2) = P(Y_1) P(Y_2) \cdots P(Y_n)$$

$$= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2} \sum \left[Y_i - E(Y_i)\right]^2\right\}$$

$$= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2} \sum (Y_i - \alpha - \beta X_i)^2\right\}$$

Our aim is to maximize this likelihood function L with respect to the parameters $\alpha$, $\beta$ and $\sigma^2$. To
do this, it is more convenient to work with the natural logarithm of L (called the log-likelihood
function), given by:

$$\ln L = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum (Y_i - \alpha - \beta X_i)^2$$

Taking partial derivatives of ln L with respect to $\alpha$, $\beta$ and $\sigma^2$ and equating them to zero, we get:

$$\frac{\partial \ln L}{\partial \alpha} = \frac{1}{\sigma^2} \sum (Y_i - \alpha - \beta X_i) = 0$$

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} \sum (Y_i - \alpha - \beta X_i) X_i = 0$$

Rearranging the above equations, and replacing $\alpha$ by $\tilde{\alpha}$ and $\beta$ by $\tilde{\beta}$ (the ML estimators), we get:

$$\sum Y_i = n\tilde{\alpha} + \tilde{\beta} \sum X_i$$
$$\sum X_i Y_i = \tilde{\alpha} \sum X_i + \tilde{\beta} \sum X_i^2$$

Note that these equations are similar to the normal equations that we obtained so far (i.e. under
OLS and MM). Solving for $\tilde{\alpha}$ and $\tilde{\beta}$, we get:

$$\tilde{\alpha} = \bar{Y} - \tilde{\beta}\bar{X}$$

$$\tilde{\beta} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$

By partial differentiation of ln L with respect to $\sigma^2$ and equating it to zero, we get:

$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (Y_i - \alpha - \beta X_i)^2 = 0$$

Replacing $\sigma^2$ by $\tilde{\sigma}^2$ and simplifying, we get:

$$\tilde{\sigma}^2 = \frac{1}{n} \sum (Y_i - \tilde{\alpha} - \tilde{\beta} X_i)^2 = \frac{1}{n} \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2 = \frac{1}{n} \sum \hat{U}_i^2$$

Note:
i. The ML estimators $\tilde{\alpha}$ and $\tilde{\beta}$ are identical to the OLS estimators, and are thus best
linear unbiased estimators (BLUE).
ii. The ML estimator $\tilde{\sigma}^2$ of $\sigma^2$ is biased (it divides by n rather than n - 2).
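As an illustrative sketch (an addition to these notes), maximum likelihood estimation can also be done numerically by minimizing the negative log-likelihood with scipy; for this model the slope and intercept should coincide with OLS. All data values are made up.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, Y):
    """Negative of the log-likelihood ln L derived above."""
    alpha, beta, log_sigma2 = params          # log-parameterize sigma^2 to keep it positive
    sigma2 = np.exp(log_sigma2)
    resid = Y - alpha - beta * X
    n = len(Y)
    lnL = -n/2 * np.log(2*np.pi) - n/2 * np.log(sigma2) - np.sum(resid**2) / (2*sigma2)
    return -lnL

# Hypothetical data
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 200)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0], args=(X, Y), method="BFGS")
alpha_ml, beta_ml, sigma2_ml = result.x[0], result.x[1], np.exp(result.x[2])
print(alpha_ml, beta_ml, sigma2_ml)   # sigma2_ml divides by n, hence it is slightly biased
```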

2.6. Confidence Intervals and Hypothesis Testing (Statistical Inference)

Estimation of Standard Errors

We have already seen that:

$$Var(\hat{\beta}) = \frac{\sigma^2}{\sum x_i^2} \quad \text{and} \quad Var(\hat{\alpha}) = \sigma^2 \left(\frac{\sum X_i^2}{n \sum x_i^2}\right), \quad \text{where } x_i = X_i - \bar{X}.$$

Since these variances depend on the unknown parameter $\sigma^2$, we have to estimate $\sigma^2$. As shown
above, an unbiased estimator of $\sigma^2$ is given by:

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2}. \quad \text{Thus, an unbiased estimator of } Var(\hat{\beta}) \text{ is given by:}$$

$$\hat{V}(\hat{\beta}) = \frac{\sum \hat{u}_i^2}{(n - 2) \sum x_i^2}$$

The square root of $\hat{V}(\hat{\beta})$ is called the standard error of $\hat{\beta}$, that is,

$$S.E.(\hat{\beta}) = \sqrt{\hat{V}(\hat{\beta})} = \sqrt{\frac{\hat{\sigma}^2}{\sum x_i^2}} = \sqrt{\frac{\sum \hat{u}_i^2}{(n - 2) \sum x_i^2}}$$

Similarly,

$$\hat{V}(\hat{\alpha}) = \hat{\sigma}^2 \left(\frac{\sum X_i^2}{n \sum x_i^2}\right) = \frac{\sum \hat{u}_i^2}{n - 2} \left(\frac{\sum X_i^2}{n \sum x_i^2}\right) \quad \text{and} \quad S.E.(\hat{\alpha}) = \sqrt{\hat{V}(\hat{\alpha})}$$

2.6.1. Hypothesis Tests in Simple Linear Regression


Goal: Make statement(s) regarding unknown population parameter values based on sample data
Elements of a hypothesis test:
 Null hypothesis (H0 ): Statement regarding the value(s) of unknown parameter(s). It
typically will imply no association between explanatory and response variables in our
applications (will always contain equality).
 Alternative hypothesis (Ha /H1 ): Statement contradictory to the null hypothesis (will
always contain an inequality).
 Test statistic: A quantity computed from the sample, scaled in standard error units, that
measures how far the estimate lies from the value stated in the null hypothesis.
 Rejection region (significance level): Values of the test statistic for which we reject the
null in favor of the alternative hypothesis.


The OLS estimates $\hat{\alpha}$ and $\hat{\beta}$ are obtained from a sample of observations on Y and X. Since
sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order
to measure the size of the error and determine the degree of confidence in the validity of these
estimates. This can be done by using various tests. The most common ones are:
I. Standard error test
II. Student’s t test
III. P value
IV. Confidence interval
All of these testing procedures lead to the same conclusion. Let us now see these testing methods
one by one.
I. Standard error test¹
This test helps us decide whether the estimates $\hat{\alpha}$ and $\hat{\beta}$ are significantly different from zero, i.e.
whether the sample from which they have been estimated might have come from a population
whose true parameters are zero ($\alpha = 0$ and/or $\beta = 0$).
Formally we test the null hypothesis

$H_0: \beta = 0$ against the alternative hypothesis $H_1: \beta \neq 0$

The standard error test may be outlined as follows.

First: Compute the standard errors of the parameters:
$$SE(\hat{\beta}) = \sqrt{Var(\hat{\beta})}, \quad SE(\hat{\alpha}) = \sqrt{Var(\hat{\alpha})}$$

Second: Compare the standard errors with the numerical values of $\hat{\alpha}$ and $\hat{\beta}$.

Decision rule:
 If $SE(\hat{\beta}_i) > \frac{1}{2}\hat{\beta}_i$, do not reject the null hypothesis and reject the alternative hypothesis.
We conclude that $\hat{\beta}_i$ is statistically insignificant.
 If $SE(\hat{\beta}_i) < \frac{1}{2}\hat{\beta}_i$, reject the null hypothesis and accept the alternative hypothesis. We
conclude that $\hat{\beta}_i$ is statistically significant.


The acceptance or rejection of the null hypothesis has a definite economic meaning. Namely, the
acceptance of the null hypothesis $\beta = 0$ (the slope parameter is zero) implies that the explanatory
variable to which this estimate relates does not in fact influence the dependent variable Y and
¹ Note: The standard error test is an approximate test (approximated from the z-test
and t-test) and implies a two-tail test conducted at the 5% level of significance.


should not be included in the function, since the conducted test provided evidence that changes in
X leave Y unaffected. In other words, acceptance of $H_0$ implies that the relationship between Y and
X is in fact $Y = \alpha + (0)X + u$, i.e. there is no relationship between X and Y.
Numerical example: Suppose that from a sample of size n = 30, we estimate the following supply
function:

$$\hat{Q} = 120 + 0.5P + \hat{u}_i$$
$$SE: \quad (1.7) \quad (0.045)$$

Test the significance of the slope parameter at the 5% level of significance using the standard error
test.

$$SE(\hat{\beta}) = 0.045, \quad \hat{\beta} = 0.5, \quad \tfrac{1}{2}\hat{\beta} = 0.25$$

This implies that $SE(\hat{\beta}) < \frac{1}{2}\hat{\beta}$. The implication is that $\hat{\beta}$ is statistically significant at the 5% level of
significance.
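A tiny sketch of this decision rule in code (an illustrative addition, using the numbers from the example):

```python
def standard_error_test(beta_hat, se_beta):
    """Approximate rule: significant if SE is less than half the estimate (two-tail, 5% level)."""
    return "significant" if se_beta < abs(beta_hat) / 2 else "insignificant"

print(standard_error_test(0.5, 0.045))   # -> "significant"
```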
II. Student's t test

From your statistics course, any variable X can be transformed into t using the general formula:

$$t = \frac{\bar{X} - \mu}{s_x}, \quad \text{with } n - 1 \text{ degrees of freedom,}$$

where $\mu$ is the value of the population mean, $s_x$ is the sample estimate of the population standard
deviation, $s_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$, and n is the sample size.

We can derive the t-values of the OLS estimates:

$$t_{\hat{\beta}} = \frac{\hat{\beta} - \beta}{SE(\hat{\beta})}, \quad t_{\hat{\alpha}} = \frac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})}, \quad \text{with } n - k \text{ degrees of freedom,}$$

where SE is the standard error and k is the number of parameters in the model.

Since we have two parameters in simple linear regression with intercept different from zero, our
degree of freedom is n-2. Like the standard error test we formally test the hypothesis:


$H_0: \beta = 0$ against the alternative $H_1: \beta \neq 0$ for the slope parameter, and $H_0: \alpha = 0$
against the alternative $H_1: \alpha \neq 0$ for the intercept.
To undertake the above test we follow the following steps.
To undertake the above test we follow the following steps.
Step 1: Compute t*, the computed value of t, by taking the value of $\beta$ in the null
hypothesis. In our case $\beta = 0$, so t* becomes:

$$t^* = \frac{\hat{\beta} - 0}{SE(\hat{\beta})} = \frac{\hat{\beta}}{SE(\hat{\beta})}$$
Step 2: Choose level of significance. Level of significance is the probability of making ‘wrong’
decision, i.e. the probability of rejecting the hypothesis when it is actually true or the probability of
committing a type I error. It is customary in econometric research to choose the 5% or the 1%
level of significance. This means that in making our decision we allow (tolerate) five times out of
a hundred to be ‘wrong’ i.e. reject the hypothesis when it is actually true.
Step 3: Check whether it is a one-tail or a two-tail test. If the inequality sign in the
alternative hypothesis is $\neq$, it implies a two-tail test: divide the chosen level of
significance by two and find the critical region or critical value of t, called $t_c$. But if the inequality
sign is either > or <, it indicates a one-tail test and there is no need to divide the chosen level of
significance by two to obtain the critical value of $t_c$ from the t-table.
Example:
If we have $H_0: \beta = 0$ against $H_1: \beta \neq 0$, then this is a two-tail test. If the level of significance
is 5%, divide it by two to obtain the critical value of $t_c$ from the t-table.

Step 4: Obtain the critical value of t, called $t_c$, at $\frac{\alpha}{2}$ and n-2 degrees of freedom for a two-tail test.
Step 5: Compare t* (the computed value of t) and $t_c$ (the critical value of t):
 If $|t^*| > t_c$, reject $H_0$ and accept $H_1$. The conclusion is that $\hat{\beta}$ is statistically significant.
 If $|t^*| < t_c$, accept $H_0$ and reject $H_1$. The conclusion is that $\hat{\beta}$ is statistically insignificant.


Suppose that from a sample of size n = 20 we estimate the following consumption function:

Ĉ = 100 + 0.70Y
      (75.5)  (0.21)

The values in brackets are standard errors. We want to test the null hypothesis H0: β = 0 against the alternative H1: β ≠ 0 using the t-test at the 5% level of significance.
a. The t-value for the test statistic is:
t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂) = 0.70/0.21 ≈ 3.3


b. Since the alternative hypothesis (H1) is stated with an inequality sign (≠), it is a two-tail test; hence we compute α/2 = 0.05/2 = 0.025 and obtain the critical value of 't' at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table, tc at the 0.025 level of significance and 18 df is 2.10.
c. Since t* = 3.3 and tc = 2.10, t* > tc. It implies that β̂ is statistically significant.
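
As a quick numerical check, the consumption-function example can be reproduced with a few lines of Python. This is a sketch assuming numpy/scipy are installed; the numbers are the ones from the example above:

    from scipy import stats

    beta_hat, se_beta, n, k = 0.70, 0.21, 20, 2   # estimates from the example
    t_star = beta_hat / se_beta                   # computed t under H0: beta = 0
    t_crit = stats.t.ppf(1 - 0.05 / 2, n - k)     # two-tail critical value at 5%

    print(round(t_star, 2), round(t_crit, 2))     # ~3.33 and ~2.10
    print("significant" if abs(t_star) > t_crit else "insignificant")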

The t-test of significance: Decision rules

Type of hypothesis | H0: the null hypothesis | H1: the alternative hypothesis | Decision rule: reject H0 if
Two-tail           | β = β*                  | β ≠ β*                         | |t| > t(α/2, df)
Right-tail         | β = β*                  | β > β*                         | t > t(α, df)
Left-tail          | β = β*                  | β < β*                         | t < −t(α, df)

III. The p-value: The Exact Level of Significance

The p-value (i.e., probability value), also known as the observed or exact level of significance or the exact probability of committing a Type I error, is another method of hypothesis testing. The p-value is defined as the lowest significance level at which a null hypothesis can be rejected. The p-value is the probability of obtaining a test statistic at least as extreme (≤ or ≥) as the observed sample value, given that H0 is true.

The p-value: Rules of thumb

1. When the p-value is smaller than 0.01, the result is called statistically very significant.
2. When the p-value is smaller than 0.05, the result is called statistically significant.
3. When the p-value is greater than 0.05, the result is considered statistically not significant.
Note: The conventionally used levels of significance are 1% and 5%.
There is a great difference between statistical significance and real/practical/economic significance.
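
For a two-tail t-test, the exact p-value can be computed directly from the t-distribution. A minimal sketch (again assuming scipy, and reusing the consumption-function numbers) is:

    from scipy import stats

    t_star, df = 3.33, 18
    p_value = 2 * stats.t.sf(abs(t_star), df)  # two-tail p-value
    print(round(p_value, 4))                   # ~0.0037 < 0.05 -> significant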

2.6.1. Interval Estimation (Confidence Intervals)

The estimation methods considered so far give us a point estimate of a parameter, say α, β or σ², and that is the best bet, given the data and the estimation method, of what the parameter might be. But it is always good policy to give the client an interval, rather than a point estimate, where with some degree of confidence, usually 95% confidence, we expect α, β or σ² to lie.
In a two-tail test at the α level of significance, the probability of obtaining the specific t-value, either −tc or tc, is α/2 at n − 2 degrees of freedom. The probability of obtaining any value of t equal to (β̂ − β)/SE(β̂) at n − 2 degrees of freedom is 1 − (α/2 + α/2), i.e. 1 − α. Symbolically,

P(−t(α/2, n−2) ≤ t ≤ t(α/2, n−2)) = 1 − α
P(−t(α/2, n−2) ≤ (β̂ − β)/se(β̂) ≤ t(α/2, n−2)) = 1 − α

Rearranging the above expressions gives

P(β̂ − t(α/2, n−2)·se(β̂) ≤ β ≤ β̂ + t(α/2, n−2)·se(β̂)) = 1 − α

Thus, a (1 − α)100% confidence interval for β is given by β̂ ± t(α/2, n−2)·se(β̂). Now we can use the constructed confidence interval for hypothesis testing.
The test procedure is outlined as follows.
H0: β = 0
H1: β ≠ 0
Decision rule: If the hypothesized value of β in the null hypothesis is within the confidence interval, accept H0 and reject H1; the implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis is outside the limits, reject H0 and accept H1; this indicates that β̂ is statistically significant.
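
The interval and the associated decision rule translate directly into code. A small sketch (scipy assumed; the numbers are again from the consumption-function example) is:

    from scipy import stats

    beta_hat, se_beta, n = 0.70, 0.21, 20
    t_crit = stats.t.ppf(0.975, n - 2)          # 95% two-tail critical value

    lower = beta_hat - t_crit * se_beta         # confidence limits
    upper = beta_hat + t_crit * se_beta
    print(round(lower, 3), round(upper, 3))     # ~0.259 to ~1.141

    # H0: beta = 0 -- zero lies outside the interval, so reject H0
    print("significant" if not (lower <= 0 <= upper) else "insignificant")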

2.7. Predictions using the Simple Linear Regression Model

If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the future value(s) of the dependent (or forecast) variable Y on the basis of known or expected future value(s) of the explanatory (or predictor) variable X.
Suppose we want to predict the mean consumption expenditure (C) for 1997 given that the estimated model (using data from 1963-1996) is Ĉ = 153.48 + 0.876Y. The GDP (Y) value for 1997 was 68275.40 million Birr. Putting this GDP figure into the right-hand side of the estimated model, we obtain:
Ĉ(1997) = 153.48 + 0.876(68275.40) = 59962.73
Note that there is a discrepancy between the predicted value and the actual value, which results in a forecast error. What is important here is to note that such forecast errors are unavoidable given the statistical nature of our analysis.
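
A sketch of the same prediction in Python (plain arithmetic, no libraries needed):

    alpha_hat, beta_hat = 153.48, 0.876   # estimated consumption function
    gdp_1997 = 68275.40                   # GDP in million Birr

    c_hat = alpha_hat + beta_hat * gdp_1997
    print(round(c_hat, 2))                # 59962.73, the predicted consumption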


CHAPTER THREE
MULTIPLE LINEAR REGRESSION
3.1. Introduction

In Chapter 2, we learned how to use simple regression analysis to explain a dependent variable, Y,
as a function of a single independent variable, X. But in practice, economic models generally
contain one dependent variable and two or more independent variables. Such models are called
multiple regression models.

Examples:
1. Suppose that soybean yield (Y) is determined by the amount of fertilizer applied (X1), land quality (X2) and rainfall (X3), and is given by the following model:

Yi = α + β1X1i + β2X2i + β3X3i + ui ,

where the error term u contains unforeseen factors.
2. In a study of the amount of output (product), we are interested in establishing a relationship between output (Q) and labor input (L) and capital input (K). The equation is often estimated in log-log form as:

ln Qi = α + β1 ln Li + β2 ln Ki + εi

3. Economic theory postulates that the quantity demanded of a given commodity (Qd) depends on its own price (X1), the price of other products (X2) and consumers' income (X3):

Qdi = α + β1X1i + β2X2i + β3X3i + εi , where the disturbance term ε (read as 'epsilon') contains unobserved factors such as tastes and so on.

Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to
explicitly control for many other factors that simultaneously affect the dependent variable.
Naturally, if we add more factors to our model that are useful for explaining Y, then more of the
variation in Y can be explained. Thus, multiple regression analysis can be used to build better
models for predicting the dependent variable.

Assumptions of Multiple Regression Model


We continue to operate within the framework of the classical linear regression model
(CLRM) first introduced in Chapter 2. Specifically, we assume the following:

1. Linearity of the model in parameters: The classicals assumed that the model should be linear in the parameters, regardless of whether the explanatory and dependent variables are themselves linear or not.
2. Randomness of the error term: The variable ui is a real random variable.
3. Zero mean of the error term: E(ui) = 0.
4. Homoscedasticity: The variance of each ui is the same for all values of the X's, i.e. E(ui²) = σu² (a constant).
5. Normality of u: The values of each ui are normally distributed, i.e. ui ~ N(0, σ²).
6. No autocorrelation or serial correlation: The values of ui (corresponding to Xi) are independent of the values of any other uj (corresponding to Xj) for i ≠ j, i.e. E(uiuj) = 0 for i ≠ j.
7. Independence of ui and Xi: Every disturbance term ui is independent of the explanatory variables, i.e. E(uiX1i) = E(uiX2i) = 0.
This condition is automatically fulfilled if we assume that the values of the X's are a set of fixed numbers in all (hypothetical) samples.
8. X1i, X2i, ..., Xki are non-stochastic variables.
9. No perfect multicollinearity: The explanatory variables are not perfectly linearly correlated.
Informally, no collinearity means that none of the regressors can be written as an exact linear combination of the remaining regressors in the model. Formally, no collinearity means that there exists no set of numbers, λ1 and λ2, not both zero, such that

λ1X1i + λ2X2i = 0

If such an exact linear relationship exists, then X1 and X2 are said to be collinear or linearly dependent.

3.2. Method of Ordinary Least Squares Revisited


3.2.1. The Three Variable Model: Estimation


In order to understand the nature of the multiple regression model easily, we start our analysis with the case of two explanatory variables, then extend this to the case of k explanatory variables. Consider the model:

Yi = α + β1X1i + β2X2i + ui ......................................(3.1)

The expected value of the above model is called the population regression equation, i.e.

E(Y) = α + β1X1i + β2X2i , since E(ui) = 0 .......................(3.2)

where α, β1 and β2 are the population parameters. α is referred to as the intercept and β1 and β2 are sometimes known as the regression slopes. Note that β2, for example, measures the effect on E(Y) of a unit change in X2 when X1 is held constant.

Since the population regression equation is unknown to any investigator, it has to be estimated from sample data. Let us suppose that the sample data has been used to estimate the population regression equation. We leave the method of estimation unspecified for the present and merely assume that equation (3.2) has been estimated by the sample regression equation, which we write as:

Ŷi = α̂ + β̂1X1i + β̂2X2i ..........................................(3.3)

where β̂1 and β̂2 are estimates of β1 and β2 respectively, and Ŷ is known as the predicted value of Y.

Now it is time to state how (3.1) is estimated. Given sample observations on Y, X1 and X2, we estimate (3.1) using the method of least squares (OLS):

Yi = α̂ + β̂1X1i + β̂2X2i + ûi .....................................(3.4)

Equation (3.4) is the estimated relation between Y, X1 and X2. The residual sum of squares is:

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − α̂ − β̂1X1i − β̂2X2i)² .................(3.5)

To obtain expressions for the least squares estimators, we partially differentiate Σûi² with respect to α̂, β̂1 and β̂2 and set the partial derivatives equal to zero:

∂(Σûi²)/∂α̂ = −2Σ(Yi − α̂ − β̂1X1i − β̂2X2i) = 0 ...................(3.6)

Summing from 1 to n, this produces the normal equation:

ΣYi = nα̂ + β̂1ΣX1i + β̂2ΣX2i ......................................(3.7)

α̂ = Ȳ − β̂1X̄1 − β̂2X̄2 ...........................................(3.8)


To solve for β̂1 and β̂2, it is better to put our model in deviation form, as below:

yi = β̂1x1i + β̂2x2i + ûi  ⟹  ûi = yi − β̂1x1i − β̂2x2i

Σûi² = Σ(yi − β̂1x1i − β̂2x2i)² ...................................(3.9)

Partially differentiating (3.9) with respect to β̂1 and β̂2, equating to zero and simplifying, we get:

∂(Σûi²)/∂β̂1 = 0  ⟹  Σx1iyi = β̂1Σx1i² + β̂2Σx1ix2i ..............(3.10)

∂(Σûi²)/∂β̂2 = 0  ⟹  Σx2iyi = β̂1Σx1ix2i + β̂2Σx2i² ..............(3.11)
β̂1 and β̂2 can easily be solved using matrices. We can rewrite the above two equations in matrix form as follows:

| Σx1i²     Σx1ix2i | | β̂1 |   | Σx1iyi |
| Σx1ix2i   Σx2i²   | | β̂2 | = | Σx2iyi |  .....................(3.12)

If we use Cramer's rule to solve the above system we obtain:

β̂1 = [Σx1iyi·Σx2i² − Σx2iyi·Σx1ix2i] / [Σx1i²·Σx2i² − (Σx1ix2i)²] ........(3.13)

β̂2 = [Σx2iyi·Σx1i² − Σx1iyi·Σx1ix2i] / [Σx1i²·Σx2i² − (Σx1ix2i)²] ........(3.14)

An unbiased estimator of the variance of the errors σ² is given by:

σ̂² = Σ(Yi − Ŷi)²/(n − 3) = Σûi²/(n − 3) .........................(3.15)

where Ŷi = α̂ + β̂1X1i + β̂2X2i.
The variances of the estimated regression coefficients β̂1 and β̂2 are estimated, respectively, as:

V̂(β̂1) = σ̂² / [(1 − r12²)Σx1i²]  and  V̂(β̂2) = σ̂² / [(1 − r12²)Σx2i²],

where r12 is the coefficient of correlation between X1i and X2i.
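
Formulas (3.13)-(3.15) are easy to verify numerically. The following sketch (numpy assumed; the small data set is made up purely for illustration) computes β̂1, β̂2, α̂ and σ̂² exactly as derived above:

    import numpy as np

    # Illustrative data (not from the lecture notes)
    y_raw = np.array([10., 12., 15., 14., 18., 20.])
    x1_raw = np.array([2., 3., 4., 4., 5., 6.])
    x2_raw = np.array([1., 1., 2., 3., 3., 4.])

    # Deviation form
    y, x1, x2 = (v - v.mean() for v in (y_raw, x1_raw, x2_raw))

    d = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2                 # common denominator
    b1 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / d     # eq. (3.13)
    b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / d     # eq. (3.14)
    a = y_raw.mean() - b1 * x1_raw.mean() - b2 * x2_raw.mean() # eq. (3.8)

    resid = y_raw - (a + b1 * x1_raw + b2 * x2_raw)
    sigma2 = (resid @ resid) / (len(y_raw) - 3)                # eq. (3.15)
    print(a, b1, b2, sigma2)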
3.3. Partial Correlation Coefficients & Their Interpretation

A partial correlation coefficient measures the relationship between any two variables, when all
other variables connected with those two are kept constant. For the three-variable regression model
we can compute three correlation coefficients: r12 (correlation coefficient between Y and X2), r13 (correlation coefficient between Y and X3), and r23 (correlation coefficient between X2 and X3); notice that we
are letting the subscript 1 represent Y for notational convenience. These correlation coefficients
are called gross or simple correlation coefficients, or correlation coefficients of zero order.

But now consider this question: Does, say, r12 in fact measure the “true” degree of (linear)
association between Y and X2 when a third variable X3 may be associated with both of them? In
general, r12 is not likely to reflect the true degree of association between Y and X2 in the presence
of X3 . As a matter of fact, it is likely to give a false impression of the nature of association
between Y and X2 , as will be shown shortly. Therefore, what we need is a correlation coefficient
that is independent of the influence, if any, of X3 on X2 and Y. Such a correlation coefficient can
be obtained and is known appropriately as the partial correlation coefficient. Conceptually, it is
similar to the partial regression coefficient. We define

• r12.3 = partial correlation coefficient between Y and X2, holding X3 constant
• r13.2 = partial correlation coefficient between Y and X3, holding X2 constant
• r23.1 = partial correlation coefficient between X2 and X3, holding Y constant

These partial correlations can be easily obtained from the simple (or zero-order) correlation coefficients as follows:

r12.3 = (r12 − r13r23) / √[(1 − r13²)(1 − r23²)] ..................(3.16a)

r13.2 = (r13 − r12r23) / √[(1 − r12²)(1 − r23²)] ..................(3.16b)

r23.1 = (r23 − r12r13) / √[(1 − r12²)(1 − r13²)] ..................(3.16c)

The partial correlations given in the above equations are called first-order correlation coefficients. By order we mean the number of secondary subscripts. Thus r12.34 would be a correlation coefficient of order two, r12.345 a correlation coefficient of order three, and so on. As noted previously, r12, r13 and so on are called simple or zero-order correlations. The interpretation of, say, r12.34 is that it gives the coefficient of correlation between Y and X2, holding X3 and X4 constant.

Interpretation of Partial Correlation Coefficients

In the two-variable case, the simple r had a straightforward meaning: It measured the degree of
(linear) association (and not causation) between the dependent variable Y and the single
explanatory variable X. But once we go beyond the two-variable case, we need to pay careful
attention to the interpretation of the simple correlation coefficient. From (3.16a), for example, we observe the following:
1. Even if r12 = 0, r12.3 will not be zero unless r13 or r23 or both are zero.
2. If r12 = 0, and r13 and r23 are nonzero and of the same sign, r12.3 will be negative, whereas if they are of opposite signs, it will be positive.
3. The terms r12.3 and r12 (and similar comparisons) need not have the same sign.
4. In the two-variable case we have seen that r² lies between 0 and 1. The same property holds true of the squared partial correlation coefficients. Using this fact, one can obtain the following expression:
0 ≤ r12² + r13² + r23² − 2r12r13r23 ≤ 1
which gives the interrelationships among the three zero-order correlation coefficients.
5. Suppose that r13 = r23 = 0. This does not mean that Y and X2 are uncorrelated (i.e. it does not imply r12 = 0).

In passing, note that the expression r12.3² may be called the coefficient of partial determination and may be interpreted as the proportion of the variation in Y not explained by the variable X3 that has been explained by the inclusion of X2 into the model. Conceptually it is similar to R².

Before moving on, note the following relationships between R², simple correlation coefficients, and partial correlation coefficients:

R² = r12² + (1 − r12²)r13.2²
R² = r13² + (1 − r13²)r12.3²

The above expressions state that R² will not decrease if an additional explanatory variable is introduced into the model, which can be seen clearly from the equations. The first equation states that the proportion of the variation in Y explained by X2 and X3 jointly is the sum of two parts: the part explained by X2 alone (r12²) and the part not explained by X2, (1 − r12²), times the proportion that is explained by X3 after holding the influence of X2 constant. Now R² ≥ r12² as long as r13.2² ≥ 0.


The sign of a partial correlation coefficient is the same as that of the corresponding estimated parameter. For example, for the estimated regression equation Ŷ = α̂ + β̂1X1 + β̂2X2, r12.3 has the same sign as β̂1 and r13.2 has the same sign as β̂2.

Partial correlation coefficients are used in multiple regression analysis to determine the relative
importance of each explanatory variable in the model. The independent variable with the highest
partial correlation coefficient with respect to the dependent variable contributes most to the
explanatory power of the model.
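
The first-order partial correlations in (3.16a)-(3.16c) can be computed directly from the zero-order correlations. A small sketch (numpy assumed; the numbers are illustrative):

    import numpy as np

    def partial_corr(r_ab, r_ac, r_bc):
        """First-order partial correlation r_ab.c from zero-order correlations."""
        return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

    # Illustrative zero-order correlations (subscript 1 = Y, 2 = X2, 3 = X3)
    r12, r13, r23 = 0.6, 0.5, 0.7

    print(partial_corr(r12, r13, r23))  # r12.3: Y and X2, holding X3 constant
    print(partial_corr(r13, r12, r23))  # r13.2: Y and X3, holding X2 constant
    print(partial_corr(r23, r12, r13))  # r23.1: X2 and X3, holding Y constant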
3.4. Coefficient of Multiple Determination
In the simple regression model, we introduced R² as a measure of the proportion of variation in the dependent variable that is explained by variation in the explanatory variable. In the multiple regression model the same measure is relevant, and the same formulas are valid, but now we talk of the proportion of variation in the dependent variable explained by all explanatory variables included in the model. The coefficient of determination is:

R² = ESS/TSS = 1 − RSS/TSS = 1 − Σûi²/Σyi² .......................(3.17)

In the present model of two explanatory variables given in deviation form:

Σûi² = Σûi(yi − β̂1x1i − β̂2x2i)
     = Σûiyi − β̂1Σx1iûi − β̂2Σx2iûi
     = Σûiyi ,                since Σûix1i = Σûix2i = 0
     = Σyi(yi − β̂1x1i − β̂2x2i)

i.e. Σûi² = Σyi² − β̂1Σx1iyi − β̂2Σx2iyi

⟹ Σyi² = β̂1Σx1iyi + β̂2Σx2iyi + Σûi² .............................(3.18)
   (Total variation = Explained variation + Residual/unexplained variation)

⟹ R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi)/Σyi² = 1 − Σûi²/Σyi² ....(3.19)

As in simple regression, R² is also viewed as a measure of the prediction ability of the model over the sample period, or as a measure of how well the estimated regression fits the data. If R² is high, the model is said to "fit" the data well. If R² is low, the model does not fit the data well.
Adjusted Coefficient of Determination (R̄²)
One difficulty with R² is that it can be made large by adding more and more variables, even if the variables added have no economic justification. Algebraically, as variables are added the sum of squared errors (RSS) goes down (it can remain unchanged, but this is rare) and thus R² goes up. If the model contains n − 1 variables then R² = 1. Manipulating a model just to obtain a high R² is not wise. An alternative measure of goodness of fit, called the adjusted R² and often symbolized as R̄², is usually reported by regression programs. It is computed as:

R̄² = 1 − [Σûi²/(n − k)] / [Σy²/(n − 1)] = 1 − (1 − R²)·[(n − 1)/(n − k)] ........(3.20)

This measure does not always go up when a variable is added, because of the degrees-of-freedom correction n − k: as the number of variables k increases, RSS goes down, but so does n − k. The effect on R̄² depends on the amount by which RSS falls. While solving one problem, this corrected measure of goodness of fit unfortunately introduces another one: it loses its interpretation, since R̄² is no longer the percent of variation explained. This modified R² is sometimes used and misused as a device for selecting the appropriate set of explanatory variables.
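
R² and R̄² in (3.17) and (3.20) can be computed from the residuals alone. A sketch (numpy assumed; the same made-up data as in the earlier example, here arranged as a regressor matrix with a constant column):

    import numpy as np

    Y = np.array([10., 12., 15., 14., 18., 20.])
    X = np.array([[1, 2., 1.], [1, 3., 1.], [1, 4., 2.],
                  [1, 4., 3.], [1, 5., 3.], [1, 6., 4.]])
    n, k = X.shape

    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    u_hat = Y - X @ beta_hat

    rss = u_hat @ u_hat
    tss = ((Y - Y.mean())**2).sum()
    R2 = 1 - rss / tss                          # eq. (3.17)
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)   # eq. (3.20)
    print(R2, R2_adj)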

The R² and R̄² tell you whether:
• The regressors are good at predicting or "explaining" the values of the dependent variable in the sample of data on hand. If the R² (or R̄²) is nearly 1, then the regressors produce good predictions of the dependent variable in that sample, in the sense that the variance of the OLS residuals is small compared to the variance of the dependent variable.
The R² (or R̄²) do NOT tell you:
• Whether an included variable is statistically significant;
• Whether the regressors are a true cause of the movements in the dependent variable;
• Whether there is omitted variable bias; or
• Whether you have chosen the most appropriate set of regressors.

3.5. General Linear Regression Model and Matrix Approach


Let us now generalize the model by assuming that it contains k explanatory variables. It will be of the form:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui

There are k + 1 parameters to be estimated. The system of normal equations consists of k + 1 equations, in which the unknowns are the parameters β0, β1, β2, ..., βk and the known terms are the sums of squares and the sums of products of all the variables in the structural equation.
The least squares estimators of the unknown parameters are obtained by minimizing the sum of squared residuals:

Σûi² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − ... − β̂kXki)²


with respect to β̂j (j = 0, 1, 2, ..., k).

The partial derivatives are equated to zero to obtain the normal equations:

∂(Σûi²)/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − ... − β̂kXki) = 0
∂(Σûi²)/∂β̂1 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − ... − β̂kXki)(X1i) = 0
...
∂(Σûi²)/∂β̂k = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − ... − β̂kXki)(Xki) = 0

The general form of the above equations (except the first) may be written as:

∂(Σûi²)/∂β̂j = −2Σ(Yi − β̂0 − β̂1X1i − ... − β̂kXki)(Xji) = 0 , where j = 1, 2, ..., k.

The normal equations of the general linear regression model are:

ΣYi     = nβ̂0     + β̂1ΣX1i    + β̂2ΣX2i    + ... + β̂kΣXki
ΣYiX1i  = β̂0ΣX1i  + β̂1ΣX1i²   + β̂2ΣX1iX2i + ... + β̂kΣX1iXki
ΣYiX2i  = β̂0ΣX2i  + β̂1ΣX1iX2i + β̂2ΣX2i²   + ... + β̂kΣX2iXki
   :         :          :           :                 :
ΣYiXki  = β̂0ΣXki  + β̂1ΣX1iXki + β̂2ΣX2iXki + ... + β̂kΣXki²

Solving the above normal equations directly would be algebraically complex, but we can solve them easily using matrices. Hence, in the next section we discuss the matrix approach to the linear regression model.

The general linear regression model with k explanatory variables is written in the form:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui

Since i represents the i-th observation, we shall have 'n' equations with 'n' observations on each variable:

Y1 = β0 + β1X11 + β2X21 + β3X31 + ... + βkXk1 + u1
Y2 = β0 + β1X12 + β2X22 + β3X32 + ... + βkXk2 + u2
Y3 = β0 + β1X13 + β2X23 + β3X33 + ... + βkXk3 + u3
..................................................
Yn = β0 + β1X1n + β2X2n + β3X3n + ... + βkXkn + un

These equations are put in matrix form as:

| Y1 |   | 1  X11  X21 ... Xk1 | | β0 |   | u1 |
| Y2 |   | 1  X12  X22 ... Xk2 | | β1 |   | u2 |
| Y3 | = | 1  X13  X23 ... Xk3 | | β2 | + | u3 |
| .  |   | .   .    .  ...  .  | | .  |   | .  |
| Yn |   | 1  X1n  X2n ... Xkn | | βk |   | un |

In short, Y = Xβ + u ..............................................(3.21)

The orders of the matrix and vectors involved are:
Y → (n × 1), X → (n × (k+1)), β → ((k+1) × 1) and u → (n × 1).
To derive the OLS estimators of β under the usual (classical) assumptions mentioned earlier, we define the two vectors β̂ and û as:

       | β̂0 |            | û1 |
       | β̂1 |            | û2 |
β̂ =   |  .  |  and  û =  |  .  |
       |  .  |            |  .  |
       | β̂k |            | ûn |

Thus we can write Y = Xβ̂ + û and û = Y − Xβ̂.
We have to minimize:

Σûi² = û1² + û2² + û3² + ... + ûn² = û'û

⟹ Σûi² = û'û = (Y − Xβ̂)'(Y − Xβ̂)
        = Y'Y − β̂'X'Y − Y'Xβ̂ + β̂'X'Xβ̂ ..........................(3.22)

Since β̂'X'Y is a scalar (1×1), it is equal to its transpose: β̂'X'Y = Y'Xβ̂. Hence,

û'û = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂ .....................................(3.23)

Minimizing û'û with respect to the elements in β̂:

∂(û'û)/∂β̂ = −2X'Y + 2X'Xβ̂

(using the matrix differentiation rule ∂(X'AX)/∂X = 2AX).
Equating the expression to the null vector 0, we obtain:

−2X'Y + 2X'Xβ̂ = 0  ⟹  X'Xβ̂ = X'Y

β̂ = (X'X)⁻¹X'Y ....................................................(3.24)

Hence β̂ is the vector of required least squares estimators β̂0, β̂1, β̂2, ..., β̂k.
The variance-covariance matrix of the estimated coefficients is given by:

V̂(β̂) = σ̂u²(X'X)⁻¹

From the above expression, the variances of the estimates are obtained by multiplying the main diagonal elements of (X'X)⁻¹ by σ̂u².
When the model is in deviation form, we can write the multiple regression in matrix form as:

β̂ = (x'x)⁻¹x'y

           | β̂1 |               | Σx1²    Σx1x2  ...  Σx1xk |
           | β̂2 |               | Σx2x1   Σx2²   ...  Σx2xk |
where β̂ = |  :  |  and (x'x) = |   :        :           :   |
           | β̂k |               | Σxkx1   Σxkx2  ...  Σxk²  |

The above column vector β̂ does not include the constant term β̂0. Under such conditions the variances of the slope parameters in deviation form can be written as:

V̂(β̂) = σ̂u²(x'x)⁻¹ ................................................(3.25)

Dear Students! I hope that from the discussion made so far on the multiple regression model, you may, in general, make the following summary of results.
(i) Model: Y = Xβ + u
(ii) Estimators: β̂ = (X'X)⁻¹X'Y
(iii) Statistical properties: BLUE
(iv) Variance-covariance: V̂(β̂) = σ̂u²(X'X)⁻¹
(v) Estimation of (u'u): û'û = Y'Y − β̂'X'Y
(vi) Coefficient of determination:

R² = [β̂'X'Y − (1/n)(ΣYi)²] / [Y'Y − (1/n)(ΣYi)²] = (β̂'X'Y − nȲ²)/(Y'Y − nȲ²) = β̂'x'y / y'y
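
The matrix formulas in this summary map one-to-one onto numpy operations. A minimal sketch (illustrative data; in practice one would rely on a library such as statsmodels rather than inverting X'X by hand):

    import numpy as np

    # Illustrative data: n = 6 observations, two regressors plus a constant
    X = np.array([[1, 2., 1.], [1, 3., 1.], [1, 4., 2.],
                  [1, 4., 3.], [1, 5., 3.], [1, 6., 4.]])
    Y = np.array([10., 12., 15., 14., 18., 20.])
    n, K = X.shape

    beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y        # (ii) OLS estimator
    u_hat = Y - X @ beta_hat
    sigma2_hat = (u_hat @ u_hat) / (n - K)             # residual variance
    V_beta = sigma2_hat * np.linalg.inv(X.T @ X)       # (iv) variance-covariance

    ybar = Y.mean()
    R2 = (beta_hat @ X.T @ Y - n * ybar**2) / (Y @ Y - n * ybar**2)  # (vi)
    print(beta_hat, np.diag(V_beta), R2)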

3.6. Hypothesis Testing in Multiple Linear Regression

In multiple regression models we will undertake the following tests of significance.


1. Test of a single parameter
2. Test of a single linear combination of the parameters
3. Test of multiple linear restrictions using the F-test
4. Test of overall significance of the model

3.6.1. Test of a Single Parameter

If we invoke the assumption that ui ~ N(0, σ²), then we can use one of the following tests: the t-test, the standard error test, the confidence interval test or the p-value test, to test a hypothesis about any individual partial regression coefficient. The test of a single parameter in multiple linear regression is the same as the significance tests for simple linear regression discussed in Chapter 2.

The t-test
To illustrate, consider the following example.
Let Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + ûi
Consider the null hypothesis:

H0: βj = 0 against H1: βj ≠ 0, for j = 1, 2, ..., k.

Since βj measures the partial effect of Xj on Y after controlling for the other independent variables, H0: βj = 0 means that, once the other regressors have been accounted for, Xj has no effect on Y. We compute the t-ratio (t-statistic) for each β̂j as follows:

t*(β̂j) = β̂j / se(β̂j)

Next find the tabulated value of t (tc).
• If |t*j| < tc (tabulated), we do not reject the null hypothesis, i.e. we conclude that β̂j is not significant and hence the regressor does not appear to contribute to the explanation of the variations in Y.
• If |t*j| > tc (tabulated), we reject the null hypothesis and accept the alternative one; β̂j is statistically significant. Thus, the greater the value of |t*j|, the stronger the evidence that βj is statistically significant.


Dear students! You can conduct the significance test of a single parameter using the rest of the testing methods in the same fashion as in Chapter 2.
3.6.2. Testing hypotheses about a single linear combination of the parameters

In many applications we are interested in testing a hypothesis involving more than one of the
population parameters. We can also use the t-statistic to test a single linear combination of the
parameters, where two or more parameters are involved.

There are two different procedures to perform the test with a single linear combination of
parameters. In the first, the standard error of the linear combination of parameters corresponding to
the null hypothesis is calculated using information on the covariance matrix of the estimators. In
the second, the model is reparameterized by introducing a new parameter derived from the null
hypothesis and the reparameterized model is then estimated; testing for the new parameter
indicates whether the null hypothesis is rejected or not. The following example illustrates both
procedures.

Example: Are there constant returns to scale in agricultural production?

To examine whether there are constant returns to scale in the agricultural sector, we are going to use the Cobb-Douglas production function, given by

ln(output) = α + β1 ln(labour) + β2 ln(capital) + u

In the above model, the parameters β1 and β2 are elasticities (of output with respect to labor and capital, respectively).
Before making inferences, remember that returns to scale refers to a technical property of the production function examining changes in output subsequent to a change of the same proportion in all inputs, which are labor and capital in this case. If output increases by that same proportional change, there are constant returns to scale. Constant returns to scale imply that if the factors labor and capital increase at a certain rate (say 10%), output will increase at the same rate (e.g., 10%). If output increases by more than that proportion, there are increasing returns to scale. If output increases by less than that proportional change, there are decreasing returns to scale. In the above model, the following occurs:
• If β1 + β2 = 1, there are constant returns to scale.
• If β1 + β2 > 1, there are increasing returns to scale.
• If β1 + β2 < 1, there are decreasing returns to scale.

To answer the question posed in this example, we must test
H0: β1 + β2 = 1 against H1: β1 + β2 ≠ 1

According to H0, it is stated that β1 + β2 − 1 = 0. Therefore, the t-statistic must now be based on whether the estimated sum β̂1 + β̂2 − 1 is sufficiently different from 0 to reject H0 in favor of H1.

Two procedures will be used to test this hypothesis. In the first, the covariance matrix of the estimators is used. In the second, the model is reparameterized by introducing a new parameter.
Procedure 1: using the covariance matrix of the estimators

t*(β̂1+β̂2) = (β̂1 + β̂2 − 1) / se(β̂1 + β̂2),

where se(β̂1 + β̂2) = √[V̂(β̂1 + β̂2)] = √[V̂(β̂1) + V̂(β̂2) + 2cov(β̂1, β̂2)]

For the two-regressor model in deviation form this is:

se(β̂1 + β̂2) = √[ V̂(β̂1) + V̂(β̂2) + 2·(−σ̂u²Σx1x2)/(Σx1²Σx2² − (Σx1x2)²) ]

If |t*(β̂1+β̂2)| > tc we will conclude, in a two-sided alternative test, that there are not constant returns to scale. On the other hand, if t*(β̂1+β̂2) is positive and large enough, we will reject, in a one-sided alternative (right-tail) test, H0 in favor of H1: β1 + β2 > 1; therefore, there are increasing returns to scale.

Procedure 2: Reparameterizing the model by introducing a new parameter

It is easier to perform the test if we apply the second procedure. A different model is estimated in this procedure, which directly provides the standard error of interest. Thus, let us define:

θ = β1 + β2 − 1

Thus, the null hypothesis that there are constant returns to scale is equivalent to saying that H0: θ = 0.
From the definition of θ, we have β1 = θ − β2 + 1. Substituting β1 in the original equation:

ln(output) = α + (θ − β2 + 1) ln(labour) + β2 ln(capital) + u

Hence,

ln(output/labour) = α + θ ln(labour) + β2 ln(capital/labour) + u

Therefore, testing whether there are constant returns to scale is equivalent to carrying out a significance test on the coefficient of ln(labour) in the transformed model. The strategy of rewriting the model so that it contains the parameter of interest works in all cases and is usually easy to implement. We test H0: θ = 0 against H1: θ ≠ 0.

The t-statistic is: t*(θ̂) = θ̂ / se(θ̂)

If |t*(θ̂)| > tc we reject the null hypothesis of constant returns to scale.
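
Either procedure can be carried out in a few lines. The sketch below (statsmodels assumed; the production data is simulated only to make the code self-contained) follows Procedure 1, using the estimated covariance matrix of the coefficients:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 100
    ln_l = rng.uniform(1, 3, n)                       # log labour
    ln_k = rng.uniform(1, 3, n)                       # log capital
    ln_q = 1.0 + 0.6 * ln_l + 0.4 * ln_k + rng.normal(0, 0.1, n)

    X = sm.add_constant(np.column_stack([ln_l, ln_k]))
    res = sm.OLS(ln_q, X).fit()

    # Procedure 1: use the covariance matrix of the estimators
    b1, b2 = res.params[1], res.params[2]
    V = res.cov_params()
    se_sum = np.sqrt(V[1, 1] + V[2, 2] + 2 * V[1, 2])
    t_star = (b1 + b2 - 1) / se_sum                   # H0: b1 + b2 = 1
    print(t_star)

    # The same test in one call (R * beta = q form):
    print(res.t_test((np.array([[0.0, 1.0, 1.0]]), np.array([1.0]))))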

3.6.3. Testing Multiple Linear Restrictions Using the F-test

So far, we have only considered hypotheses involving a single restriction. But frequently we wish to test multiple hypotheses about the underlying parameters β1, β2, β3, ..., βk. Among multiple linear restrictions, we will distinguish three types: exclusion restrictions, model significance and other linear restrictions.

Exclusion Restrictions

We begin with the leading case of testing whether a set of independent variables has no partial effect on the dependent variable Y. These are called exclusion restrictions. Consider the following model:

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + ui .....................(3.26)

The null hypothesis in a typical example of exclusion restrictions could be the following:

H0: β4 = β5 = 0 against H1: H0 is not true

This is an example of a set of multiple restrictions, because we are putting more than one
restriction on the parameters in the above equation. A test of multiple restrictions is called a joint
hypothesis test.

It is important to remark that we test the above H0 jointly, not individually. Now we are going to distinguish between unrestricted (UR) and restricted (R) models. The unrestricted model is the reference or initial model. In this example the unrestricted model is the model given in (3.26). The restricted model is obtained by imposing H0 on the original model. In the above example, the restricted model is:

Y = α + β1X1 + β2X2 + β3X3 + u ....................................(3.27)

By definition, the restricted model always has fewer parameters than the unrestricted one. Moreover, it is always true that RSS_R ≥ RSS_UR, where RSS_R is the RSS of the restricted model and RSS_UR is the RSS of the unrestricted model. Remember that, because OLS estimates are chosen to minimize the sum of squared residuals, the RSS never decreases (and generally increases) when certain restrictions (such as dropping variables) are introduced into the model.
Test statistic: the F ratio

F* = [(RSS_R − RSS_UR)/r] / [RSS_UR/(n − k)] = [(R²_UR − R²_R)/r] / [(1 − R²_UR)/(n − k)],

where r is the number of restrictions, n the number of observations and k the number of parameters in the unrestricted model.
Decision Rule
The F(r, n−k) distribution is tabulated and available in statistical tables, where we look for the critical value Fα(r, n−k), which depends on α (the significance level), r (the df of the numerator) and n − k (the df of the denominator). Taking this into account, the decision rule is quite simple:
• If F* > Fα(r, n−k), reject H0.
• If F* < Fα(r, n−k), do not reject H0.
Model Significance
In this section we extend this idea to a joint test of the relevance of all the included explanatory variables. Now consider the model:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ui

H0: β1 = β2 = β3 = ... = βk = 0
H1: at least one of the βk is non-zero

This null hypothesis is a joint hypothesis that β1, β2, ..., βk are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, of whether Y is linearly related to X1, X2, ..., Xk.

The test procedure for any set of hypotheses can be based on a comparison of the sum of squared errors from the original, unrestricted multiple regression model with the sum of squared errors from a regression model in which the null hypothesis is assumed to be true. When a null hypothesis is assumed to be true, we in effect place conditions, or constraints, on the values that the parameters can take, and the sum of squared errors increases. The idea of the test is that if these sums of squared errors are substantially different, then the assumption that the joint null hypothesis is true has significantly reduced the ability of the model to fit the data, and the data do not support the null hypothesis.

If the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters. Thus, there would be little change in the sum of squared errors when the null hypothesis is assumed to be true.

Let the restricted residual sum of squares (RSS_R) be the sum of squared errors in the model obtained by assuming that the null hypothesis is true, and RSS_UR be the sum of squared errors of the original unrestricted model, i.e. the unrestricted residual sum of squares. It is always true that RSS_R − RSS_UR ≥ 0.
Consider Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + ûi.
This model is called unrestricted. The test of the joint hypothesis is:
H0: β1 = β2 = β3 = ... = βk = 0
H1: at least one of the βk is different from zero.


We know that: Ŷ = β̂0 + β̂1X1i + β̂2X2i + ... + β̂kXki

ûi = Yi − Ŷi
Σûi² = Σ(Yi − Ŷi)²

This sum of squared errors is called the unrestricted residual sum of squares (RSS_UR). This is the case when the null hypothesis is not true. If the null hypothesis is assumed to be true, i.e. when all the slope coefficients are zero, the model becomes:

Y = β̂0 + ûi

β̂0 = ΣYi/n = Ȳ (applying OLS) ....................................(3.28)

ûi = Yi − β̂0 = Yi − Ȳ
Σûi² = Σ(Yi − Ȳ)² = Σy² = TSS

The sum of squared errors when the null hypothesis is assumed to be true is called the restricted residual sum of squares (RSS_R), and it is equal to the total sum of squares (TSS).

The F-ratio:  F = [(RSS_R − RSS_UR)/(k − 1)] / [RSS_UR/(n − k)] ~ F(k−1, n−k)

(it has an F-distribution with k−1 and n−k degrees of freedom for the numerator and denominator respectively).

RSS_R = TSS
RSS_UR = Σûi² = Σy² − β̂1Σyx1 − β̂2Σyx2 − ... − β̂kΣyxk = RSS

F* = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)]

If we divide the numerator and denominator above by Σy² = TSS, then:

F* = [(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)]

This implies that the computed value of F can be calculated either from the ratio of ESS to RSS or from R² and 1 − R². If the null hypothesis is not true, then the difference between RSS_R and RSS_UR (TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large. This value is compared with the critical value of F, which leaves a probability of α in the upper tail of the F-distribution with k−1 and n−k degrees of freedom.

If the computed value of F is greater than the critical value F(k−1, n−k), then the parameters of the model are jointly significant, or the dependent variable Y is linearly related to the independent variables included in the model.
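
The F-statistic for the overall significance test can be computed either from R² or with a library call. A short sketch (statsmodels and scipy assumed; the data is simulated for illustration):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 50
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y = 1 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

    X = sm.add_constant(np.column_stack([x1, x2]))
    res = sm.OLS(y, X).fit()

    k = X.shape[1]                      # number of parameters incl. intercept
    F_star = (res.rsquared / (k - 1)) / ((1 - res.rsquared) / (n - k))
    F_crit = stats.f.ppf(0.95, k - 1, n - k)
    print(F_star, res.fvalue)           # same number, computed two ways
    print("jointly significant" if F_star > F_crit else "not significant")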

3.7. Prediction in the Multiple Regression Model

In Chapter 2 we showed how the estimated two-variable regression model can be used for (1) mean prediction, that is, predicting the point on the population regression function (PRF), as well as for (2) individual prediction, that is, predicting an individual value of Y given the value of the regressor X = X0, where X0 is a specified numerical value of X.

The estimated multiple regression can be used for similar purposes, and the procedure for doing so is a straightforward extension of the two-variable case.
Let the estimated regression equation be:

Ŷ = α̂ + β̂1X1 + β̂2X2

Now consider the prediction of the value Y0 of Y given values X10 of X1 and X20 of X2. These could be values at some future date.
Then we have:

Y0 = α + β1X10 + β2X20 + u0
Ŷ0 = α̂ + β̂1X10 + β̂2X20

The prediction error is: Ŷ0 − Y0 = (α̂ − α) + (β̂1 − β1)X10 + (β̂2 − β2)X20 − u0

Example
Suppose that the estimated model is: Ŷ = 4.0 + 0.7X1 + 0.3X2

If X10 = 20 and X20 = 10, then Ŷ0 would be:

Ŷ0 = 4.0 + 0.7(20) + 0.3(10) = 21

If the actual value of Y is 20, then the prediction error becomes 21 − 20 = 1.
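
A quick check of the arithmetic (plain Python, no libraries needed):

    alpha_hat, b1, b2 = 4.0, 0.7, 0.3
    x10, x20 = 20, 10

    y0_hat = alpha_hat + b1 * x10 + b2 * x20
    print(y0_hat)              # 21.0
    print(y0_hat - 20)         # prediction error if the actual value is 20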


CHAPTER FOUR

Violations of the Assumptions of Classical Linear Regression Models

4.0. Introduction

In both the simple and multiple regression models, we made important assumptions about the distribution of Yi and the random error term ui. We assumed that ui is a random variable with mean zero and var(ui) = σ², that the errors corresponding to different observations are uncorrelated, cov(ui, uj) = 0 for i ≠ j, and, in multiple regression, that there is no perfect correlation between the independent variables.

Now, we address the following ‘what if’ questions in this chapter. What if the error variance is not
constant over all observations? What if the different errors are correlated? What if the
explanatory variables are correlated? We need to ask whether and when such violations of the
basic classical assumptions are likely to occur. What types of data are likely to lead to
heteroscedasticity (different error variance)? What type of data is likely to lead to autocorrelation
(correlated errors)? What types of data are likely to lead to multicollinearity? What are the
consequences of such violations on least square estimators? How do we detect the presence of
autocorrelation, heteroscedasticity, or multicollinearity? What are the remedial measures? How
do we build an alternative model and an alternative set of assumptions when these violations exist?
Do we need to develop new estimation procedures to tackle the problems? In the subsequent
sections, we attempt to answer such questions.

4.1. Multicollinearity (MC)


4.1.1. The Nature of Multicollinearity

One of the assumptions of the CLRM is that no exact linear relationship exists between any of the explanatory variables. When this assumption is violated, we speak of perfect multicollinearity. If all explanatory variables are uncorrelated with each other, we speak of the absence of MC. These are two extreme cases and both rarely exist in practice. Of particular interest are the cases in between: moderate to high degrees of MC.
For a k-variable regression involving explanatory variables x1, x2, ..., xk, an exact linear relationship is said to exist if the following condition is satisfied:

λ1x1 + λ2x2 + ... + λkxk = 0 ......................................(4.1)

where λ1, λ2, ..., λk are constants such that not all of them are simultaneously zero.
However, the term multicollinearity is used in a broader sense to include the case of perfect multicollinearity, as shown by (4.1), as well as the case where the x-variables are intercorrelated but not perfectly so, as follows:

λ1x1 + λ2x2 + ... + λkxk + vi = 0 .................................(4.2)

where vi is a stochastic error term.

Note that multicollinearity refers only to linear relationships among the explanatory variables. It does not rule out non-linear relationships among the explanatory variables.
For example:

Y = α + β1Xi + β2Xi² + β3Xi³ + vi .................................(4.3)

where Y = total cost and X = output.

The variables Xi² and Xi³ are obviously functionally related to Xi, but the relationship is non-linear. Strictly, therefore, models such as (4.3) do not violate the assumption of no multicollinearity. However, in concrete applications, the conventionally measured correlation coefficient will show Xi, Xi² and Xi³ to be highly correlated, which, as we shall show, will make it difficult to estimate the parameters with great precision (i.e. with small standard errors).

4.1.2. Reasons for Multicollinearity

MC may arise for various reasons. Firstly, there is a tendency of economic variables to move together over time. Economic magnitudes are influenced by the same factors, and in consequence, once these determining factors become operative, the economic variables show the same broad pattern of behavior over time. For example, in periods of booms or rapid economic growth the basic economic magnitudes grow, although some tend to lag behind others. Thus income, consumption, savings, investment, prices and employment tend to rise in periods of economic expansion and decrease in periods of recession. Growth and trend factors in time series are the most serious cause of MC. Secondly, regressing on small sample values of the population may result in MC. Thirdly, an overdetermined model is another cause of MC. This happens when the model has more explanatory variables than the number of observations. This could happen in medical research, where there may be a small number of patients about whom information is collected on a large number of variables.

4.1.3. Consequences of Multicollinearity

Why does the classical linear regression model put the assumption of no multicollinearity among
the X’s? It is because of the following consequences of multicollinearity on OLS estimators.

1. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite. Consider a multiple regression model with two explanatory variables (in deviation form):

yi = β̂1x1i + β̂2x2i + ûi

Dear student, do you recall the formulas for β̂1 and β̂2 from our discussion of multiple regression?

β̂1 = [Σx1y·Σx2² − Σx2y·Σx1x2] / [Σx1²·Σx2² − (Σx1x2)²]
β̂2 = [Σx2y·Σx1² − Σx1y·Σx1x2] / [Σx1²·Σx2² − (Σx1x2)²]

Assume x2 = λx1, where λ is a non-zero constant. Substituting this into the formula for β̂1:

β̂1 = [Σx1y·(λx1)² − λΣx1y·λΣx1²] / [Σx1²·Σ(λx1)² − (λΣx1²)²]
    = [λ²Σx1yΣx1² − λ²Σx1yΣx1²] / [λ²(Σx1²)² − λ²(Σx1²)²] = 0/0 ,

which is indeterminate. Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂2. Likewise, from our discussion of the multiple regression model, the variance of β̂1 is given by:

var(β̂1) = σ²Σx2² / [Σx1²Σx2² − (Σx1x2)²]

Substituting x2 = λx1 into this variance formula, we get:

var(β̂1) = σ²λ²Σx1² / [λ²(Σx1²)² − λ²(Σx1²)²] = σ²λ²Σx1² / 0 = ∞ ,

which is infinite.
These are the consequences of perfect multicollinearity; a short numerical illustration is given after this list. One may raise the question of the consequences of less-than-perfect correlation. In cases of near or high multicollinearity, one is likely to encounter the following consequences.

2. If multicollinearity is less than perfect (i.e. near or high multicollinearity):

i. The regression coefficients are determinate and still unbiased. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.

ii. Because of consequence (i), the confidence intervals tend to be much wider, leading to the acceptance of the "zero null hypothesis" (i.e., that the true population coefficient is zero) more readily.
iii. Also because of consequence (i), there is a high probability of not rejecting the null hypothesis of a zero coefficient (using the t-test) when in fact the coefficient is significantly different from zero.
iv. Although the t-ratios of few (or none) of the coefficients are statistically significant, R², the overall measure of goodness of fit, can be very high; that is, the regression model as a whole may appear to do well.
v. The OLS estimates and their standard errors may be quite sensitive to small changes in the data.
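
The numerical illustration promised above: with x2 = λx1, the cross-product matrix X'X is singular, so the OLS formula breaks down. A short numpy sketch (the data is made up purely for illustration):

    import numpy as np

    x1 = np.array([1., 2., 3., 4., 5.])
    x2 = 2.0 * x1                        # perfect collinearity: x2 = 2*x1
    X = np.column_stack([np.ones(5), x1, x2])

    XtX = X.T @ X
    print(np.linalg.matrix_rank(XtX))    # 2, not 3: X'X is singular
    print(np.linalg.cond(XtX))           # astronomically large condition number
    # (X'X)^(-1) does not exist, so the OLS formula cannot produce
    # unique estimates -- the coefficients are indeterminate.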
4.1.4. Detection of Multicollinearity

MC almost always exist in most applications. So the question is not whether it is present or not,
it is a question of degree. Also MC is not a statistical problem, it is a data (sample) problem.
Since multicollinearity refers to the condition of the explanatory variables that are assumed to be
non-stochastic, it is a feature of the sample and not of the population. Therefore, we do not “test
for MC”; but measure its degree in any particular sample using some rules of thumb.

Some of the methods of detecting MC are:

1. High R2 but few (or no) significant t-ratios. If R2 is high, say, in excess of 0.8, the F test
in most cases will reject the hypothesis that the partial slope coefficients are
simultaneously equal to zero, but the individual t tests will show that none or very few of
the partial slope coefficients are statistically different from zero.
2. High pair-wise correlation among regressors. Note that high zero-order correlations are a
sufficient but not a necessary condition for the existence of multicollinearity because it
can exist even though the zero-order or simple correlations are comparatively low.
3. Variance Inflation Factor (VIF)

Consider the following regression model:

Yi = β0 + β1X1 + β2X2 + ... + βkXk + ui

The VIF of βj is defined as:

VIF(βj) = 1 / (1 − Rj²) ,  j = 1, 2, 3, ..., k

where Rj² is the coefficient of determination obtained when the variable Xj is regressed on the remaining explanatory variables (called the auxiliary regression). For example, VIF(β2) is calculated as:

VIF(β2) = 1 / (1 − R2²), where R2² is the coefficient of determination of the auxiliary regression:

X2 = α0 + α1X1 + α3X3 + ... + αkXk + u

Rule of thumb:

If VIF(βj) exceeds 10, then β̂j is poorly estimated because of MC (or the j-th regressor is responsible for MC).
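
The VIF rule of thumb is straightforward to compute. A sketch with statsmodels (the data is simulated and deliberately collinear; names are illustrative):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(1)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 2 * x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
    x3 = rng.normal(size=n)

    X = sm.add_constant(np.column_stack([x1, x2, x3]))
    # VIF for each slope regressor (skip the constant in column 0)
    for j in range(1, X.shape[1]):
        print(j, variance_inflation_factor(X, j))
    # x1 and x2 show VIFs far above 10; x3 stays near 1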

4.1.5. Remedial Measures of MC

What can be done if multicollinearity is serious? We have two choices: (1) do nothing or (2)
follow some rules of thumb.

Do Nothing

The "do nothing" school of thought is expressed by Blanchard as follows²:

"Multicollinearity is God's will, not a problem with OLS or statistical technique in general."

What Blanchard is saying is that multicollinearity is essentially a data deficiency problem (micronumerosity), and sometimes we have no choice over the data we have available for empirical analysis.

² Blanchard, O. J., Comment, Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449-451.

Rule-of-Thumb Procedures

One can try the following rules of thumb to address the problem of multicollinearity, the success
depending on the severity of the collinearity problem.

1. A priori information. Suppose we consider the model

Yi = β0 + β1X1i + β2X2i + ui

where Y = consumption, X1 = income, and X2 = wealth. Income and wealth variables tend to be highly collinear. But suppose a priori we believe that β2 = 0.1β1; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income. We can then run the following regression:

Yi = β0 + β1X1i + 0.1β1X2i + ui  ⟹  Yi = β0 + β1Xi + ui , where Xi = X1i + 0.1X2i

Once we obtain β̂1, we can estimate β̂2 from the postulated relationship between β1 and β2. However, such a priori information is rarely available.

2. Dropping a variable(s) and specification bias.

When faced with severe multicollinearity, one of the “simplest” things to do is to drop one of the
collinear variables. But in dropping a variable from the model we may be committing a
specification bias or specification error. Specification bias arises from incorrect specification of
the model used in the analysis. Thus, if economic theory says that income and wealth should
both be included in the model explaining the consumption expenditure, dropping the wealth
variable would constitute specification bias.

3. Additional or new data.

Since multicollinearity is a sample feature, it is possible that in another sample involving the
same variables collinearity may not be as serious as in the first sample. Sometimes simply
increasing the size of the sample (if possible) may reduce the collinearity problem.

4.2. Heteroscedasticity
4.2.1. The Nature of Heteroscedasticity

In the classical linear regression model, one of the basic assumptions is that the probability distribution of the disturbance term remains the same over all observations of X; i.e. the variance of each ui is the same for all values of the explanatory variable. Symbolically,

var(ui) = E[ui − E(ui)]² = E(ui²) = σu² (a constant value)

This feature of homogeneity of variance (constant variance) is known as homoscedasticity. It may be the case, however, that not all of the disturbance terms have the same variance. This condition of non-constant variance, or non-homogeneity of variance, is known as heteroscedasticity. Thus, we say that the u's are heteroscedastic when:

var(ui) = σui² (a value that varies across observations)

If σui² is not constant but its value depends on the value of X, then σui² = f(Xi). Such dependency is depicted diagrammatically in the following figures: three cases of heteroscedasticity, all shown by increasing or decreasing dispersion of the observations around the regression line.

In panel (a) σui² seems to increase with X. In panel (b) the error variance appears greater in X's middle range, tapering off toward the extremes. Finally, in panel (c), the variance of the error term is greater for low values of X, declining and leveling off rapidly as X increases.

The pattern of heteroscedasticity would depend on the signs and values of the coefficients of the relationship σui² = f(Xi), but the ui's are not observable. As such, in applied research we make convenient assumptions that the heteroscedasticity is of one of the forms:

i. σui² = K²Xi²
ii. σui² = K²Xi
iii. σui² = K/Xi , etc.
4.2.2. Reasons for Heteroscedasticity
There are several reasons why the variances of ui may be variable. Some of these are:

1. Error learning models: as people learn, their errors of behavior become smaller over time. In this case σi² is expected to decrease. Example: as the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.

2. As data collection techniques improve, σui² is likely to decrease. Thus, banks that have sophisticated data processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such facilities.

3. Heteroscedasticity can also arise as a result of the presence of outliers. An outlier is an observation that is much different (either very small or very large) in relation to the other observations in the sample.

4. Specification error is another source of heteroscedasticity.

5. Heteroscedasticity is more likely to occur in cross-sectional data.

4.2.3. Consequences of Heteroscedasticity

What happens when we use the ordinary least squares procedure on a model with heteroscedastic disturbance terms?

1. OLS estimators are still linear and unbiased. The least squares estimators are unbiased even under the condition of heteroscedasticity.
2. The variances of the OLS coefficients will not be minimal. Thus, under heteroscedasticity, the OLS estimators of the regression coefficients are not BLUE and are inefficient.
3. Because of consequence (2), confidence intervals and the t- and F-tests of significance are invalid.
4. σ̂² = Σûi²/(n − k) is biased: E(σ̂²) ≠ σ².
4.2.4. Detection of Heteroscedasticity

I. Graphical Method:

Plot the estimated residuals (û_i) or their squares (û_i²) against the predicted dependent variable (Ŷ_i) or any independent variable (X_i), and observe whether there is a systematic pattern.

In the figure below, û_i² are plotted against Ŷ (or X_i). In panel (a) we see no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Panels (b) to (e), however, exhibit definite patterns. For instance, (c) suggests a linear relationship, whereas (d) and (e) indicate a quadratic relationship between û_i² and Ŷ_i.
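For illustration, this graphical check can be carried out in Python; the following is a minimal sketch on simulated data (the variables and the heteroscedastic pattern are hypothetical, not taken from the notes):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# hypothetical data whose error standard deviation grows with X
rng = np.random.default_rng(42)
x = rng.uniform(1, 10, 150)
y = 3 + 2 * x + rng.normal(0, x, 150)

res = sm.OLS(y, sm.add_constant(x)).fit()

# plot squared residuals against fitted values; a fanning-out
# pattern suggests heteroscedasticity
plt.scatter(res.fittedvalues, res.resid**2)
plt.xlabel("fitted Y")
plt.ylabel("squared residuals")
plt.show()
```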


II. Statistical Test


1. White's Test

Consider: Y = β₀ + β₁X₁ + β₂X₂ + u

The test involves applying OLS to the auxiliary regression

û² = α₀ + α₁X₁ + α₂X₂ + α₃X₁² + α₄X₂² + α₅X₁X₂ + ν

and calculating its coefficient of determination R²_w, where the û are the OLS residuals from the original model.

H₀: α₁ = α₂ = ... = α_k = 0 (homoscedasticity) against Hₐ: heteroscedasticity

The test statistic is: χ²_cal = nR²_w

Decision Rule: Reject H₀ if nR²_w > χ²_p (the value of the Chi-square distribution with p degrees of freedom at the α level of significance). Here, p is the number of regressors (excluding the constant) in the auxiliary regression.
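A minimal Python sketch of White's test on hypothetical simulated data, computing nR²_w from the auxiliary regression exactly as described above (statsmodels also ships a ready-made het_white function for the same purpose):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(1, 10, n), rng.uniform(1, 10, n)
y = 2 + 0.5 * x1 + 0.8 * x2 + rng.normal(0, x1, n)  # error sd grows with x1

# original regression and its residuals
X = sm.add_constant(np.column_stack([x1, x2]))
u_hat = sm.OLS(y, X).fit().resid

# auxiliary regression: u_hat^2 on levels, squares, and cross product
Z = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1 * x2]))
aux = sm.OLS(u_hat**2, Z).fit()

lm_stat = n * aux.rsquared          # n * R_w^2
p = Z.shape[1] - 1                  # regressors in auxiliary model (here 5)
crit = stats.chi2.ppf(0.95, p)
print(f"nR^2 = {lm_stat:.2f}, chi-square critical (5%, df={p}) = {crit:.2f}")
# reject homoscedasticity if lm_stat > crit
```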

2. Breusch-Pagan / Cook-Weisberg Test for Heteroscedasticity

This involves applying OLS to:


û²/σ̂² = α₀ + α₁X₁ + α₂X₂ + ... + α_kX_k + ν

and calculate the explained sum of squares (ESS). The test statistic is: χ²_cal = ESS/2

Decision Rule: reject the null hypothesis of homoscedasticity, H₀: α₁ = α₂ = ... = α_k = 0, if χ²_cal > χ²(k), where χ²(k) is the critical value from the Chi-square distribution with k degrees of freedom for a given value of α.
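Assuming statsmodels is available, its het_breuschpagan function implements a closely related Lagrange-multiplier version of this test; a minimal sketch on hypothetical simulated data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(1, 10, n), rng.uniform(1, 10, n)
y = 2 + 0.5 * x1 + 0.8 * x2 + rng.normal(0, x1, n)   # variance rises with x1

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# returns the LM statistic, its p-value, and an F-variant with its p-value
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(lm_stat, lm_pval)   # a small p-value => reject homoscedasticity
```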

4.2.5. Remedial Measures for the Problems of Heteroscedasticity

As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but they are no longer efficient. This lack of efficiency makes the usual hypothesis testing procedures of dubious value. Remedial measures therefore concentrate on the variance of the error term. There are two approaches to remediation: when σ_i² is known and when σ_i² is not known.

When σ_i² is known: The Method of Weighted Least Squares

If σ_i² is known, the most straightforward method of correcting heteroscedasticity is weighted least squares, for the estimators thus obtained are BLUE. Assume that our original model is Y = α + βX_i + u_i, where u_i satisfies all the assumptions except that it is heteroscedastic:

E(u_i²) = σ_i² = f(k_i)

If we apply OLS to the above model, the estimators are no longer BLUE. To make them BLUE we have to transform the model. Applying OLS to the transformed variables is known as the method of Generalized (Weighted) Least Squares (GLS/WLS). In short, GLS/WLS is OLS on transformed variables that satisfy the standard least squares assumptions. The estimators thus obtained are known as GLS/WLS estimators, and it is these estimators that are BLUE.

Given the model Y = α + βX_i + u_i, the transforming variable is √(σ_i²) = σ_i, chosen so that the variance of the transformed error term is constant. Now divide the above model by σ_i.


Y_i/σ_i = α(1/σ_i) + β(X_i/σ_i) + u_i/σ_i,  i.e.  Y_i* = α* + β*X_i* + u_i*

The variance of the transformed error term is constant, i.e.

var(u_i/σ_i) = var(u_i*) = E(u_i/σ_i)² = (1/σ_i²)E(u_i²) = σ_i²/σ_i² = 1 (constant)
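The transformation above is equivalent to weighted least squares with weights 1/σ_i². A minimal statsmodels sketch with hypothetical data in which σ_i is assumed known (here σ_i = X_i, purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)
sigma_i = x                              # assume sigma_i known: sd of u_i = x_i
y = 2 + 0.5 * x + rng.normal(0, sigma_i)

X = sm.add_constant(x)
# WLS with weights 1/sigma_i^2 is OLS on the divided-by-sigma_i model
wls = sm.WLS(y, X, weights=1.0 / sigma_i**2).fit()
print(wls.params, wls.bse)
```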

When σ_i² is not known: White's Heteroscedasticity-Consistent Variances and Standard Errors

As noted earlier, if the true σ_i² are known, we can use the WLS method to obtain BLUE estimators. Since the true σ_i² are rarely known, there is a way of obtaining consistent (in the statistical sense) estimates of the variances and covariances of the OLS estimators even if there is heteroscedasticity. White's heteroscedasticity-consistent variances and standard errors can be estimated so that asymptotically valid (i.e., large-sample) statistical inferences can be made about the true parameter values. White's heteroscedasticity-corrected standard errors are also known as robust standard errors.
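In statsmodels, robust standard errors can be requested through the cov_type argument when fitting by OLS; a minimal sketch with hypothetical heteroscedastic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 200)
y = 1 + 0.5 * x + rng.normal(0, x, 200)      # heteroscedastic errors
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                     # conventional SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")    # White robust SEs

print(ols.bse)      # usual standard errors (invalid under heteroscedasticity)
print(robust.bse)   # heteroscedasticity-consistent standard errors
```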

4.3. Autocorrelation
In our discussion of simple and multiple regression models, one of the assumptions of the classicalists is that cov(u_i, u_j) = E(u_i u_j) = 0, which implies that successive values of the disturbance term u are temporally independent, i.e. a disturbance occurring at one point of observation is not related to any other disturbance. This means that when observations are made over time, the effect of a disturbance occurring at one period does not carry over into another period.

If the above assumption is not satisfied, that is, if the value of u in any particular period is correlated with its own preceding value(s), we say there is autocorrelation of the random variables. Autocorrelation is a special case of correlation, referring to the relationship between successive values of the same variable.

The coefficient of autocorrelation: Autocorrelation, as stated earlier, is a kind of lag correlation between successive values of the same variable. Thus, we treat autocorrelation in the same way as correlation in general. The simplest case, linear dependence on one lag, is termed autocorrelation of the first order. In other words, if the value of u in any particular period depends on its own value in the preceding period alone, we say that the u's follow a first-order autoregressive scheme AR(1) (or first-order Markov scheme), i.e.


u_t = f(u_{t−1}) -------------------------------------------------------------- 4.1

If u_t depends on the values of the two previous periods, then:

u_t = f(u_{t−1}, u_{t−2}) ----------------------------------------------------------- 4.2

This form of autocorrelation is called a second-order autoregressive scheme, and so on. Generally, when autocorrelation is present, we assume the simplest form of autocorrelation, u_t = f(u_{t−1}), and also the linear form:

u_t = ρu_{t−1} + v_t --------------------------------------------4.3

where ρ is the coefficient of autocorrelation and v is a random variable satisfying all the basic assumptions of ordinary least squares:

E(v) = 0, E(v²) = σ_v², and E(v_i v_j) = 0 for i ≠ j

The above relationship (4.3) states the simplest possible form of autocorrelation; if we apply OLS to the model given in (4.3) we obtain:

ρ̂ = Σ_{t=2}^{n} u_t u_{t−1} / Σ_{t=2}^{n} u²_{t−1} --------------------------------4.4

Given that for large samples Σu_t² ≈ Σu²_{t−1}, we observe that the coefficient of autocorrelation ρ̂ represents a simple correlation coefficient r:


ρ̂ = Σ_{t=2}^{n} u_t u_{t−1} / Σ_{t=2}^{n} u²_{t−1} ≈ Σ_{t=2}^{n} u_t u_{t−1} / √(Σ_{t=2}^{n} u_t² · Σ_{t=2}^{n} u²_{t−1}) = r_{u_t u_{t−1}} (Why?)---------------------4.5

−1 ≤ ρ̂ ≤ 1 since −1 ≤ r ≤ 1 ---------------------------------------------4.6

This proves the statement “we can treat autocorrelation in the same way as correlation in
general”. From our statistics background we know that:

 if the value of r is 1, we call it perfect positive correlation,

 if r is −1, perfect negative correlation, and


 If the value of r is 0, there is no correlation.

By the same analogy, if the value of ρ̂ is 1 it is called perfect positive autocorrelation, if ρ̂ is −1 it is called perfect negative autocorrelation, and if ρ = 0 there is no autocorrelation.

If ρ̂ = 0 in u_t = ρu_{t−1} + v_t, then u_t is not autocorrelated.

Mean, Variance and Covariance of Disturbance Terms in an Autocorrelated Model

If the values of u are found to follow the simple Markov process u_t = ρu_{t−1} + v_t with |ρ| < 1, with v_t fulfilling all the usual assumptions of a disturbance term, our objective here is to obtain the value of u_t in terms of the autocorrelation coefficient ρ and the random variable v_t. The complete form of the first-order autoregressive scheme may be derived as follows.

From (4.3) we have (beginning from u₀):

u₁ = ρu₀ + v₁
u₂ = ρu₁ + v₂ = ρ(ρu₀ + v₁) + v₂ = ρ²u₀ + ρv₁ + v₂
u₃ = ρu₂ + v₃ = ρ(ρ²u₀ + ρv₁ + v₂) + v₃ = ρ³u₀ + ρ²v₁ + ρv₂ + v₃
u₄ = ρu₃ + v₄ = ρ(ρ³u₀ + ρ²v₁ + ρv₂ + v₃) + v₄ = ρ⁴u₀ + ρ³v₁ + ρ²v₂ + ρv₃ + v₄
...
u_t = ρᵗu₀ + ρᵗ⁻¹v₁ + ρᵗ⁻²v₂ + ... + ρv_{t−1} + v_t

u_t = ρᵗu₀ + Σ_{j=0}^{t−1} ρʲ v_{t−j} .....................................................(4.7)

From (4.6) we have seen that the value of ρ is less than one in absolute value, i.e. −1 < ρ < 1. In such cases, lim_{t→∞} ρᵗu₀ = 0. Thus, u_t can be expressed as:

u_t = Σ_{j=0}^{∞} ρʲ v_{t−j} = v_t + ρv_{t−1} + ρ²v_{t−2} + ρ³v_{t−3} + ... (4.8)

Now, using this value of u_t, let's compute its mean, variance and covariance.


Mean of u_t:

E(u_t) = Σ_{j=0}^{∞} ρʲ E(v_{t−j}) = 0 …………………………………………………… (4.9)

Variance of u_t:

Var(u_t) = E(u_t²) = E[(Σ_{j=0}^{∞} ρʲ v_{t−j})²]
         = E[Σ_{j=0}^{∞} ρ²ʲ v²_{t−j} + Σ_{j=0}^{∞} Σ_{i≠j} ρʲρⁱ v_{t−j} v_{t−i}]
         = Σ_{j=0}^{∞} ρ²ʲ E(v²_{t−j}) + Σ_{j=0}^{∞} Σ_{i≠j} ρʲ⁺ⁱ E(v_{t−j} v_{t−i});  since E(v²_{t−j}) = σ_v² and E(v_{t−j} v_{t−i}) = 0 for i ≠ j,
         = σ_v² Σ_{j=0}^{∞} ρ²ʲ

Note (sum of an infinite geometric series): if |a| < 1, then Σ_{j=0}^{∞} aʲ = 1/(1 − a).

Thus, var(u_t) = σ_v² Σ_{j=0}^{∞} ρ²ʲ = σ_v²/(1 − ρ²) ……………………………. (4.10)

Covariance of u_t:

cov(u_t, u_{t−s}) = ρˢ σ_v²/(1 − ρ²) = ρˢ σ_u²;  s = 1, 2, 3, .... ………………………… (4.11)
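Results (4.10) and (4.11) can be checked numerically by simulating a long AR(1) series; a minimal sketch (the values of ρ and σ_v below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma_v, T = 0.6, 1.0, 100_000

v = rng.normal(0, sigma_v, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + v[t]       # AR(1) scheme u_t = rho*u_{t-1} + v_t

print(u[1_000:].var())                 # sample variance (burn-in dropped)
print(sigma_v**2 / (1 - rho**2))       # theoretical sigma_v^2/(1-rho^2) = 1.5625

s = 3                                  # check cov(u_t, u_{t-s}) = rho^s * var(u_t)
print(np.cov(u[1_000:-s], u[1_000 + s:])[0, 1])   # ≈ 0.6^3 * 1.5625
```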


4.3.1. Effect of Autocorrelation on OLS Estimators

The following are the effects on the estimators if the OLS method is applied in the presence of autocorrelation in the given data.

1. OLS estimates are unbiased

Consider the model in deviation form: y_t = βx_t + u_t

u_t = ρu_{t−1} + v_t, |ρ| < 1, where v_t = u_t − ρu_{t−1} satisfies all the assumptions of the CLRM.

The OLS estimator of β is:

β̂ = Σx_t y_t / Σx_t²

β̂ = Σx_t(βx_t + u_t)/Σx_t² = β + Σx_t u_t/Σx_t²

Then we have E(β̂) = β + Σx_t E(u_t)/Σx_t² = β, since E(u_t) = 0. (The estimated coefficients are still unbiased.)
2. The variances of the OLS estimators are no longer the smallest (not efficient)

Consider the two-variable regression model: Y_t = β₀ + β₁X_t + u_t

If E(u_t u_{t−1}) = 0, then var(β̂₁) = σ²/Σx_t². If E(u_t u_{t−1}) ≠ 0 and u_t = ρu_{t−1} + v_t, then

var(β̂₁)_AR(1) = σ²/Σx_t² + (2σ²/Σx_t²)[ρ Σx_t x_{t−1}/Σx_t² + ρ² Σx_t x_{t−2}/Σx_t² + ρ³ Σx_t x_{t−3}/Σx_t² + ...]

If ρ > 0 (positive autocorrelation), then var(β̂₁)_AR(1) > var(β̂₁). The implication is that if we wrongly use var(β̂₁) = σ²/Σx_t² while the data are autocorrelated, var(β̂₁) is underestimated.

3. Wrong Testing Procedure


If var(β̂) is underestimated, SE(β̂) is also underestimated; this makes the t-ratio large. This large t-ratio may make β̂ appear statistically significant when it is not. A wrong testing procedure leads to wrong predictions and inferences about the characteristics of the population.

4.3.2. Detection (Testing) of Autocorrelation

1. The Durbin-Watson (DW) d test:

The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, which is defined as:

d = Σ_{t=2}^{n} (û_t − û_{t−1})² / Σ_{t=1}^{n} û_t² ≈ 2(1 − ρ̂) ……………………… (4.12)

Assumptions underlying the d-statistic:

a. The regression model includes an intercept term.

b. The disturbances u_t are generated by the first-order autoregressive scheme: u_t = ρu_{t−1} + v_t, the AR(1) scheme.

c. The regression model does not include a lagged value of the dependent variable Y as one of the explanatory variables. Thus, the test is inapplicable to models of the following type: y_t = β₀ + β₁X_{1t} + β₂X_{2t} + γy_{t−1} + u_t

d. There are no missing observations in the data.

From (4.12) above, we have:

if ρ̂ = 0, d = 2
if ρ̂ = 1, d = 0
if ρ̂ = −1, d = 4

Thus we obtain two important conclusions:

i. Values of d lie between 0 and 4.

ii. If there is no autocorrelation (ρ̂ = 0), then d ≈ 2.


Whenever the calculated value of d turns out to be sufficiently close to 2, we do not reject the null hypothesis, and if it is close to zero or four, we reject the null hypothesis.

For the two-tailed Durbin-Watson test, we set five regions for the values of d, as depicted in the figure below.

We do not have a unique critical value of the d-statistic. Instead we have a lower bound d_L and an upper bound d_U for the critical values of d used to reject or not reject the null hypothesis.

The mechanics of the DW test are as follows, assuming that the assumptions underlying the test are fulfilled.

 Run the OLS regression and obtain the residuals.
 Obtain the computed value of d using the formula given in equation (4.12).
 For the given sample size and given number of explanatory variables, find the critical d_L and d_U values.
 Now follow the decision rules given below.

1. If d is less than d_L or greater than (4 − d_L), we reject the null hypothesis of no autocorrelation in favor of the alternative, which implies the existence of autocorrelation.

2. If d lies between d_U and (4 − d_U), accept the null hypothesis of no autocorrelation.


3. If, however, the value of d lies between d_L and d_U or between (4 − d_U) and (4 − d_L), the DW test is inconclusive.

Example 1. Suppose for a hypothetical model Y = α + βX + U we found:

d = 0.1380; d_L = 1.37; d_U = 1.50

Based on the above values, test for autocorrelation.

Solution: First compute (4 − d_L) and (4 − d_U), then compare the computed value of d with d_L, d_U, (4 − d_L) and (4 − d_U):

(4 − d_L) = 4 − 1.37 = 2.63
(4 − d_U) = 4 − 1.50 = 2.50

Since d = 0.1380 is less than d_L = 1.37, we reject the null hypothesis of no autocorrelation; the data exhibit (positive) autocorrelation.
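For illustration, statsmodels provides durbin_watson to compute d from the OLS residuals; a minimal sketch on hypothetical data with positively autocorrelated errors (the bounds d_L and d_U must still be read from a DW table):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
T = 100
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):                     # AR(1) errors with rho = 0.8
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(res.resid)
print(d)   # well below 2 here; compare with d_L and d_U from a DW table
```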

Limitations of the DW test:

 The test is not applicable if the regression model does not include an intercept term.
 The test is valid for the AR(1) error scheme only.
 The test is invalid when lagged values of the dependent variable appear as regressors.
 The test is invalid if there are missing values.
 There are certain regions where the test is inconclusive.
2. The Breusch-Godfrey (BG) Test

Consider the two-variable regression model: Y_t = α + βX_t + u_t; t = 1, 2, 3, ..., T

Assume that the error term follows an autoregressive scheme of order p, AR(p), given by:

u_t = ρ₁u_{t−1} + ρ₂u_{t−2} + ρ₃u_{t−3} + ... + ρ_p u_{t−p} + v_t, where v_t satisfies all the assumptions of the CLRM.

The null hypothesis to be tested is:

H₀: ρ₁ = ρ₂ = ρ₃ = ... = ρ_p = 0 (no autocorrelation)

Hₐ: H₀ is not true (there is autocorrelation)


Steps:

1. Estimate the model Y_t = α + βX_t + u_t using OLS and obtain the residuals û_t.

2. Regress û_t on X_t and û_{t−1}, û_{t−2}, û_{t−3}, ..., û_{t−p}, that is, run the following auxiliary regression:
û_t = α + βX_t + ρ₁û_{t−1} + ρ₂û_{t−2} + ρ₃û_{t−3} + ... + ρ_p û_{t−p} + ε_t

3. Obtain the coefficient of determination R² from the auxiliary regression.

4. If the sample size T is large, Breusch and Godfrey have shown that (T − p)R² follows the Chi-square (χ²) distribution with p degrees of freedom.

Decision rule:

Reject the null hypothesis of no autocorrelation if (T − p)R² exceeds the tabulated value from the χ² distribution with p degrees of freedom for a given level of significance α.
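statsmodels implements this procedure as acorr_breusch_godfrey; a minimal sketch on hypothetical simulated data with AR(1) errors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(5)
T = 100
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):                     # AR(1) errors with rho = 0.8
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
# test against an AR(p) alternative, here p = 2 lags
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pval)   # a small p-value => reject H0 of no autocorrelation
```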

4.3.3. Remedial Measures for the Problems of Autocorrelation

 The Cochrane-Orcutt Two-Step Procedure (CORC):

Consider the model: Y_t = β₀ + β₁X_t + u_t

Steps:

a. Run OLS on the above equation and obtain the residuals û_t.

b. Run OLS (with no intercept) on û_t = ρû_{t−1} + v_t and obtain ρ̂.

c. Use ρ̂ to transform the variables:

Y_t − ρ̂Y_{t−1} = β₀(1 − ρ̂) + β₁(X_t − ρ̂X_{t−1}) + (u_t − ρ̂u_{t−1}),  i.e.  Y_t* = β₀* + β₁*X_t* + u_t*

d. Run OLS on the transformed model Y_t* = β₀* + β₁*X_t* + u_t*.

 Cochrane-Orcutt Iterative Procedure

e. If the DW test shows that autocorrelation still exists, iterate the procedure: obtain the residuals û_t* from the regression in step (d).


f. Run OLS on û_t* = ρû*_{t−1} + v_t and obtain ρ̂⁽²⁾ (the second-round estimate of ρ).

g. Use ρ̂⁽²⁾ to transform the variables:

Y_t − ρ̂⁽²⁾Y_{t−1} = β₀(1 − ρ̂⁽²⁾) + β₁(X_t − ρ̂⁽²⁾X_{t−1}) + (u_t − ρ̂⁽²⁾u_{t−1}),  i.e.  Y_t** = β₀** + β₁**X_t** + u_t**

h. Run OLS on Y_t** = β₀** + β₁**X_t** + u_t**.

i. Check the DW statistic; if autocorrelation still exists, go into a third round of the procedure, and so on.
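A minimal Python sketch of the two-step procedure (steps a-d above) on hypothetical simulated data; statsmodels' GLSAR class offers an iterative version of the same idea:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 100
x = rng.uniform(0, 10, T)
u = np.zeros(T)
v = rng.normal(0, 1, T)
for t in range(1, T):                  # AR(1) errors with true rho = 0.7
    u[t] = 0.7 * u[t - 1] + v[t]
y = 1.0 + 2.0 * x + u

# step a: OLS on the original model, keep the residuals
X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid

# step b: regress u_hat on its own lag (no intercept) to estimate rho
rho_hat = sm.OLS(u_hat[1:], u_hat[:-1]).fit().params[0]

# steps c-d: quasi-difference the data and re-run OLS
y_star = y[1:] - rho_hat * y[:-1]
x_star = x[1:] - rho_hat * x[:-1]
res = sm.OLS(y_star, sm.add_constant(x_star)).fit()
beta1 = res.params[1]
beta0 = res.params[0] / (1 - rho_hat)   # recover the original intercept
print(f"rho_hat={rho_hat:.3f}, beta0={beta0:.3f}, beta1={beta1:.3f}")
```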

 Prais-Winsten Transformation

Consider the model: Y_t = β₀ + β₁X_t + u_t

Assume AR(1): u_t = ρu_{t−1} + v_t; −1 < ρ < 1. Then transform the variables as below:

Y_t − ρY_{t−1} = β₀(1 − ρ) + β₁(X_t − ρX_{t−1}) + (u_t − ρu_{t−1}),  i.e.  Y_t* = β₀* + β₁*X_t* + u_t*  (GLS)

To avoid the loss of the first observation of each variable, the first observations of Y* and X* should be transformed as:

Y₁* = √(1 − ρ̂²) · Y₁
X₁* = √(1 − ρ̂²) · X₁

4.4. Specification Errors: Omission of Variables

Until now we have assumed that the multiple linear regression we are estimating includes all the relevant explanatory variables. In practice, this is rarely the case. Sometimes relevant variables are not included due to oversight or lack of measurements. This is often called the problem of excluding a relevant variable, or underspecifying the model. How do our inferences change when this problem is present?

Suppose the true equation is: y = β₁x₁ + β₂x₂ + u

Instead, we omit x₂ and estimate the equation y = β₁x₁ + u. This will be referred to as the "misspecified model." The estimate of β₁ we get is


β̂₁ = Σx₁y / Σx₁²

Substituting the expression for y from the true equation into this, we get

β̂₁ = Σx₁(β₁x₁ + β₂x₂ + u)/Σx₁² = β₁ + β₂ Σx₁x₂/Σx₁² + Σx₁u/Σx₁²

Since E(Σx₁u) = 0, we get E(β̂₁) = β₁ + β₂ Σx₁x₂/Σx₁², where Σx₁x₂/Σx₁² is the regression coefficient from a regression of x₂ on x₁.

Thus, β̂₁ is a biased estimator of β₁, and the bias is given by:

bias = β₂ Σx₁x₂/Σx₁²

The variance of β̂₁ under the misspecified model is given by:

var(β̂₁) = σ²/Σx₁²

whereas under the correctly specified model it is σ²/[Σx₁²(1 − r₁₂²)], where r₁₂ is the correlation between x₁ and x₂. Thus, β̂₁ from the misspecified model is biased but has a smaller variance than the estimator from the correctly specified model. The understated variance produces a large t-ratio (a narrow confidence interval), which in turn can lead to rejection of the null hypothesis when it is true.
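The bias formula E(β̂₁) = β₁ + β₂·(Σx₁x₂/Σx₁²) can be verified by simulation; a minimal sketch (all parameter values below are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 100_000
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)             # x2 correlated with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(0, 1, n)   # true beta1 = 1, beta2 = 2

b21 = sm.OLS(x2, x1).fit().params[0]        # coefficient of x2 on x1 (≈ 0.5)
beta1_hat = sm.OLS(y, x1).fit().params[0]   # misspecified model: x2 omitted

print(beta1_hat)    # ≈ beta1 + beta2*b21 = 1 + 2*0.5 = 2.0
```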

4.5. Tests of Parameter Stability: The Chow Test

When we estimate a multiple regression equation and use it for predictions at future points of time, we assume that the parameters are constant over the entire time period of estimation and prediction. To test this hypothesis of parameter constancy (or stability), some tests have been proposed. One of these is the Chow (analysis-of-variance) test.

The Chow (ANOVA) Test

Suppose that we have two independent sets of data with sample sizes n₁ and n₂, respectively. The regression equations are:

Y = α₁ + β₁₁X₁ + β₁₂X₂ + ... + β₁ₖXₖ + u ; for the first set

Y = α₂ + β₂₁X₁ + β₂₂X₂ + ... + β₂ₖXₖ + u ; for the second set


A test for stability of the parameters between the populations that generated the two data sets is a test of the hypothesis:

H₀: β₁₁ = β₂₁, β₁₂ = β₂₂, β₁₃ = β₂₃, ..., β₁ₖ = β₂ₖ, α₁ = α₂

If this hypothesis is true, we can estimate a single equation for the data set obtained by pooling the two data sets. The F-test we use is the F-test described in Chapter 3, based on RSS_UR and RSS_R. To get RSS_UR we estimate the regression model for each of the data sets separately.

Define:
RSS₁ = residual sum of squares for the first data set
RSS₂ = residual sum of squares for the second data set
RSS_UR = RSS₁ + RSS₂
RSS_R = residual sum of squares for the pooled (first + second) data set

F_cal = [(RSS_R − RSS_UR)/(k + 1)] / [RSS_UR/(n₁ + n₂ − 2k − 2)]

Decision Rule: Reject the null hypothesis of identical parameters (parameter stability) if

F_cal > F_α(k + 1, n₁ + n₂ − 2k − 2)
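A minimal Python sketch of the Chow test as described above (the helper function chow_test and the two samples below are hypothetical, written for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test(y1, X1, y2, X2, alpha=0.05):
    """Chow test for parameter stability across two samples.
    X1 and X2 must already include a constant column."""
    rss = lambda y, X: sm.OLS(y, X).fit().ssr
    rss_ur = rss(y1, X1) + rss(y2, X2)           # RSS_UR = RSS1 + RSS2
    rss_r = rss(np.concatenate([y1, y2]),        # pooled regression
                np.vstack([X1, X2]))
    k_plus_1 = X1.shape[1]                       # intercept + k slopes
    df2 = len(y1) + len(y2) - 2 * k_plus_1       # n1 + n2 - 2k - 2
    f_cal = ((rss_r - rss_ur) / k_plus_1) / (rss_ur / df2)
    f_crit = stats.f.ppf(1 - alpha, k_plus_1, df2)
    return f_cal, f_crit, f_cal > f_crit         # True => reject stability

# hypothetical usage: two samples generated with different parameters
rng = np.random.default_rng(11)
x_a, x_b = rng.uniform(0, 10, 40), rng.uniform(0, 10, 50)
y_a = 1 + 2.0 * x_a + rng.normal(0, 1, 40)
y_b = 3 + 0.5 * x_b + rng.normal(0, 1, 50)
print(chow_test(y_a, sm.add_constant(x_a), y_b, sm.add_constant(x_b)))
```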
