
___________________________________________________________________________________________________

Subject Business Economics

Paper No and Title 8, FUNDAMENTALS OF ECONOMETRICS

Module No and Title 1, INTRODUCTION TO TWO VARIABLE REGRESSION ANALYSIS

Module Tag BSE_P8_M1

BUSINESS ECONOMICS PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS


MODULE No. : 1, INTRODUCTION TO TWO VARIABLE REGRESSION
ANALYSIS
___________________________________________________________________________________________________

CONTENTS

1. INTRODUCTION

2. WHAT IS REGRESSION ANALYSIS

3. ECONOMIC DATA

4. TYPES OF REGRESSION MODEL

5. POPULATION REGRESSION MODELS

6. THE STOCHASTIC DISTURBANCE TERM

7. SAMPLE REGRESSION FUNCTION

8. SUMMARY

1. INTRODUCTION

The term regression has its historical origin in the work of Francis Galton. While studying the heights of parents
and children, Galton found the following result: tall parents usually have tall children and short parents usually
have short children, but the average height of children born to parents of a given height tends to move towards the
average height of the population as a whole. In other words, the heights of children of both unusually tall and
unusually short parents tend to "regress" towards the mean height of the population. Karl Pearson confirmed
Galton's findings by analysing more than a thousand records of heights of children and parents.

2. WHAT IS REGRESSION ANALYSIS

Regression analysis is a statistical method for studying the relationship between two (or more) variables. For
example, in economics we have Keynes's psychological law of consumption. It states that when income increases
there is an associated increase in consumption, but by less than the increase in income: the marginal propensity to
consume lies between 0 and 1. There are many other examples of relationships between variables in the social
sciences. In such cases, regression analysis can be used to build a model that predicts the value of one variable
(the dependent variable) on the basis of other given variables (the independent variables). We briefly explain the
notation and terminology commonly used in regression analysis.

Notation:

Dependent Variable: $Y$

Independent Variables: $X_1, X_2, X_3, \ldots, X_k$

Terminology:

The dependent variable is also known by various other names: explained variable, predictand, regressand,
response, endogenous variable, outcome, controlled variable.

The independent variables are also known by various other names: predictor, regressor, stimulus, exogenous
variable, covariate, control variable.

Though the choice of terminology is a matter of personal preference, we will simply use dependent and
independent/explanatory variables in this chapter.

2.1 Deterministic versus Statistical Relationship

In regression analysis we look for a statistical relationship (and not a deterministic relationship) between
variables. In a statistical relationship we essentially deal with stochastic or random variables, that is, variables
that possess probability distributions. For instance, the yield of a crop depends on temperature, rainfall and
fertilizer, but agronomists cannot predict the yield exactly because there are errors involved in measuring these
explanatory variables and there are other factors affecting crop yield. Therefore, in a statistical relationship the
dependent variable and the explanatory variables do not have an exact relationship.

In a deterministic relationship, the relationship between variables can be determined exactly. One such example is
Newton's Law of Gravity. According to this law, every particle in the universe attracts every other particle with a
force that is directly proportional to the product of their masses and inversely proportional to the square of the
distance between them. Other examples are Ohm's Law and Boyle's Gas Law.

2.2 Regression versus Causation

A statistical relationship between two (or more) variables does not by itself imply causation. For instance, we
know that crop yield depends on rainfall. Statistically speaking, there is no reason why we could not equally
assume that rainfall depends on crop yield. However, common sense suggests that this is not the case, as we cannot
change rainfall by increasing or decreasing crop yield. Therefore crop yield is treated as the dependent variable
and rainfall as the explanatory variable.

To determine causality between variables we must take into account a priori or theoretical considerations. A
statistical relationship in itself is not sufficient to imply causation between variables.

2.3 Regression versus Correlation

Correlation and regression are closely related but conceptually different. In correlation analysis the main
objective is to measure the strength or degree of linear association between variables. For example, we may be
interested in the correlation coefficient between smoking and lung cancer.

In regression analysis we are interested in predicting the average or mean value of one variable (the dependent
variable) on the basis of given values of other variables (the independent variables). For instance, we may want
to predict the incidence of AIDS among drug users. Secondly, in correlation analysis both variables are treated as
random. In regression analysis, however, the dependent variable is treated as random while the explanatory
variables are assumed to be fixed and given.
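
The distinction can be illustrated numerically. The sketch below uses a small set of hypothetical numbers (invented for illustration, not taken from this module) and assumes a least-squares line fitted with `numpy.polyfit`, a method the module has not yet introduced. It shows that the correlation coefficient is the same whichever variable comes first, while the regression slope changes when the roles of dependent and explanatory variable are swapped.

```python
import numpy as np

# Hypothetical data, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Correlation treats the two variables symmetrically: r(x, y) equals r(y, x).
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression does not: the slope depends on which variable is the dependent one.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope_y_on_x = np.polyfit(x, y, 1)[0]
slope_x_on_y = np.polyfit(y, x, 1)[0]

print(f"correlation     : {r_xy:.4f} (same either way: {np.isclose(r_xy, r_yx)})")
print(f"slope of y on x : {slope_y_on_x:.4f}")
print(f"slope of x on y : {slope_x_on_y:.4f}")
```

For least-squares lines, the product of the two slopes equals the squared correlation coefficient, which is one way to see how the two concepts are related without being the same thing.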

3. ECONOMIC DATA

The success of any regression analysis depends on the availability of the high-quality data needed for economic
research. We therefore briefly look at the types of data and the issue of data accuracy.

Types of Data

There are three types of data available for empirical analysis: time series, cross-section and pooled data (i.e.,
combination of time series and cross section).

Time Series:
In time series data, we observe a set of variables at different points in time. We can have daily data (e.g., stock
prices, weather reports), weekly data (e.g., money supply, weekly sales), monthly data (e.g., consumer price index,
unemployment rate), quarterly data (e.g., GDP, industrial production), annual data (e.g., government budgets),
quinquennial data, collected every five years (e.g., census of manufacturing), and decennial data, collected every
ten years (e.g., census of population). The difficulty with time series econometrics is that we usually need to
assume that the underlying time series is stationary. Loosely speaking, a time series is said to be stationary if
its mean and variance remain constant over time.
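
The loose definition of stationarity can be made concrete with a crude check: split a series into two halves and compare the mean and variance of each half. The sketch below does this for two simulated series (hypothetical data, not from this module). It is only an informal illustration of the idea, not a formal test of stationarity; the function name `half_sample_summary` and the simulated series are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 400

# Two simulated series (hypothetical data, not from the module).
stable = rng.normal(0.0, 1.0, size=n)                          # constant mean and variance
trending = 0.05 * np.arange(n) + rng.normal(0.0, 1.0, size=n)  # mean drifts upward over time


def half_sample_summary(series):
    """Return (mean, variance) of the first half and of the second half of a series."""
    first, second = np.array_split(series, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


for name, series in [("stable", stable), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_sample_summary(series)
    print(f"{name:9s} mean: {m1:6.2f} -> {m2:6.2f}   variance: {v1:6.2f} -> {v2:6.2f}")
```

The mean of the trending series shifts markedly between the two halves, which is the kind of behaviour the stationarity assumption rules out.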

Cross-Section Data:
In cross-section data we observe one or more variables at the same point in time. Examples include the census of
population, conducted every 10 years, and consumer expenditure surveys. The problem with cross-sectional data
relates to heterogeneity. When we include heterogeneous units in a statistical analysis, the size or scale effect
must be taken into account.

Pooled Data:
In pooled data we combine time series and cross-sectional data. For instance, we can have data on the price and
output of different varieties of cloth over a period of, say, 10 years.

The Issue of Data Accuracy


A great deal of data is available for empirical research in economics. However, the quality of the data is often not
very good. There are several reasons for this:
(1) Most data in the social sciences are non-experimental in nature. Therefore observational errors, due to
omission or commission, may creep in during data collection.
(2) Measurement errors may arise from approximations and rounding, even in experimental data.
(3) The problem of non-response is widespread in questionnaire-type surveys.
(4) It is often difficult to compare data from different samples, as the methods used to obtain the data differ
widely.
(5) Economic data are sometimes available only in highly aggregated form and may not contain much information
about micro units.

Therefore, if the quality of the data is poor, the researcher may obtain unsatisfactory results however carefully
the analysis is carried out.

4. TYPES OF REGRESSION MODEL

Regression models are usually of two types:

(1) Simple regression model. This is further divided into the simple linear regression model and the simple
non-linear regression model.
(2) Multiple regression model. This is further divided into the multiple linear regression model and the
multiple non-linear regression model.

Regression Model
    Simple Regression Model (only one explanatory variable)
        Linear
        Non-Linear
    Multiple Regression Model (two or more explanatory variables)
        Linear
        Non-Linear

In a simple regression model the number of independent variables is one, while in a multiple regression model it is
more than one. The term linearity can refer to linearity in the parameters or to linearity in the variables. For
regression analysis, a linear regression model means a model that is linear in the parameters, and a non-linear
regression model means a model that is not linear in the parameters. By linearity in the parameters we mean that the
parameters, say $\beta_1$ and $\beta_2$, appear with power 1 only and are not multiplied or divided by any other
parameter, as in $\beta_1 \beta_2$ or $\beta_2 / \beta_1$.

                           Linear in Variable             Non Linear in Variable
Linear in Parameter        Linear Regression Model        Linear Regression Model
Non Linear in Parameter    Non-Linear Regression Model    Non-Linear Regression Model

Examples:

Linear Regression Model

(a) $Y = \beta_1 + \beta_2 X + u$ (Simple Linear Regression Model)
(b) $Y = \beta_1 + \beta_2 X^2 + u$ (Simple Linear Regression Model)
(c) $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$ (Multiple Linear Regression Model)
(d) $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + u$ (Multiple Linear Regression Model)

Non Linear Regression Model

(e) $Y = \beta_1 + \beta_2^2 X + u$ (Simple Non Linear Regression Model)
(f) $Y = \beta_1 + \beta_2^2 X_2 + \beta_3 X_3 + u$ (Multiple Non Linear Regression Model)
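
One way to make "linear in the parameters" operational is to check superposition in the parameter vector: if the systematic part of a model is linear in its parameters, then evaluating it at a weighted sum of two parameter vectors gives the same weighted sum of the two separate evaluations. The sketch below tests this numerically for three illustrative functional forms. The helper `is_linear_in_parameters` and the three model functions are hypothetical names chosen for this example, and the disturbance term is left out.

```python
import numpy as np


def is_linear_in_parameters(model, n_params, x, trials=20, seed=0):
    """Numerically check superposition in the parameter vector:
    model(x, a*p + b*q) == a*model(x, p) + b*model(x, q) for random p, q, a, b."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        p, q = rng.normal(size=n_params), rng.normal(size=n_params)
        a, b = rng.normal(size=2)
        if not np.allclose(model(x, a * p + b * q),
                           a * model(x, p) + b * model(x, q)):
            return False
    return True


# Illustrative systematic parts only (no disturbance term).
def straight_line(x, beta):       # linear in the parameters and in X
    return beta[0] + beta[1] * x


def quadratic_in_x(x, beta):      # linear in the parameters, non-linear in X
    return beta[0] + beta[1] * x ** 2


def squared_parameter(x, beta):   # non-linear in the parameters, linear in X
    return beta[0] + beta[1] ** 2 * x


x = np.linspace(1.0, 5.0, 9)
for model in (straight_line, quadratic_in_x, squared_parameter):
    print(model.__name__, is_linear_in_parameters(model, 2, x))
```

The first two forms pass the check and the third fails, which matches the classification table above: what matters is how the parameters enter, not how the variable enters.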

5. POPULATION REGRESSION MODELS

In regression analysis the main aim is to estimate and/or predict the (population) average or mean value of the
dependent variable on the basis of the known values of the explanatory variable(s). Table 1 gives an illustrative
example of the weekly income and weekly expenditure on consumption of a hypothetical community. The total
population comprises 36 families. These families are divided into 6 income groups, from Rs 1000 to Rs 3500. We
therefore have 6 fixed values of $X$, and the corresponding $Y$ values are shown in the table, so that we have 6
subpopulations. The bottom row shows the conditional expected values of $Y$ for the 6 subpopulations.

Table 1: An illustrative example of weekly income and weekly expenditure on consumption.

Weekly Income X (in Rupees)      1000    1500    2000    2500    3000    3500
Weekly Consumption Y              700     850    1150    1500    1800    2000
                                  750     950    1200    1600    1850    2200
                                  800    1000    1350    1700    1900    2250
                                  850    1200    1450    1800    2050    2300
                                  900    1250    1500    1900    2150    2350
                                    -    1350    1550       -    2250    2400
                                    -       -    1600       -       -    2600
Total                            4000    6600    9800    8500   12000   16100
Conditional mean E(Y | X_i)       800    1100    1400    1700    2000    2300

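Corresponding to Table 1, we can also find the conditional probabilities for the population of 36 observations.
Within each income group every family is equally likely, so each conditional probability is one divided by the
number of families in that group. These probabilities are shown in Table 2.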

Table 2: Conditional probabilities for the population of 36 observations

X                                1000    1500    2000    2500    3000    3500
Conditional Probabilities         1/5     1/6     1/7     1/5     1/6     1/7
p(Y | X_i)                        1/5     1/6     1/7     1/5     1/6     1/7
                                  1/5     1/6     1/7     1/5     1/6     1/7
                                  1/5     1/6     1/7     1/5     1/6     1/7
                                  1/5     1/6     1/7     1/5     1/6     1/7
                                    -     1/6     1/7       -     1/6     1/7
                                    -       -     1/7       -       -     1/7
Conditional Means of Y            800    1100    1400    1700    2000    2300

The relationship between expenditure on consumption and income is plotted in Figure 1. We can clearly see that
there is considerable variation in expenditure on consumption within each income group. However, the average
expenditure on consumption rises as income rises. This can be seen in Table 1 itself. For instance, when the income
level is Rs 1000, the corresponding average expenditure on consumption is Rs 800. When income increases to
Rs 1500, the average expenditure on consumption rises to Rs 1100. Thus we clearly see a positive relationship
between income and average expenditure on consumption. These mean values of the dependent variable (here
consumption expenditure) are known as conditional expected values, as they are conditioned on, or depend on, the
given value of the explanatory variable (here the income group).

Apart from the conditional expected values, we can also find the unconditional expected value of expenditure on
consumption, $E(Y)$. We add up the expenditure on consumption of all income groups (Rs 57000) and divide by the
total number of families (36), which gives Rs 1583.33. It is unconditional in the sense that it includes the
expenditure on consumption of all income groups. The conditional expected values are different from the
unconditional expected value. The unconditional expected value gives the average expenditure on consumption across
all income groups, while a conditional expected value gives the average expenditure on consumption of a particular
income group. For instance, the conditional expected value of expenditure on consumption for families in the
Rs 1000 income group is Rs 800, while for the Rs 1500 income group it is Rs 1100. The concept of the conditional
expected value helps us to predict the mean value of the dependent variable at different values of the independent
variable, which is in fact the essence of regression analysis.
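
The two kinds of expected value can be reproduced directly from Table 1. The following is a minimal sketch that stores the population as a plain dictionary (the variable names are chosen for this example) and computes the mean of $Y$ within each income group and over all 36 families.

```python
# Population from Table 1: weekly income X -> weekly consumption Y of each family.
population = {
    1000: [700, 750, 800, 850, 900],
    1500: [850, 950, 1000, 1200, 1250, 1350],
    2000: [1150, 1200, 1350, 1450, 1500, 1550, 1600],
    2500: [1500, 1600, 1700, 1800, 1900],
    3000: [1800, 1850, 1900, 2050, 2150, 2250],
    3500: [2000, 2200, 2250, 2300, 2350, 2400, 2600],
}

# Conditional expected value E(Y | X): the mean of Y within each income group.
conditional_means = {x: sum(ys) / len(ys) for x, ys in population.items()}

# Unconditional expected value E(Y): the mean of Y over all 36 families.
all_consumption = [y for ys in population.values() for y in ys]
unconditional_mean = sum(all_consumption) / len(all_consumption)

for x, mean_y in conditional_means.items():
    print(f"E(Y | X = {x}) = {mean_y:.0f}")
print(f"E(Y) = {unconditional_mean:.2f}")
```

The conditional means match the bottom row of Table 1, and the unconditional mean matches the figure of Rs 1583.33 quoted in the text.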

The next objective is to obtain the population regression line. The population regression line is obtained by
joining the conditional mean values of $Y$ for the various levels of $X$. Geometrically, the population regression
line is the locus of the conditional means of the dependent variable for given values of the explanatory variable.
The regression curve thus passes through these conditional mean values. We assume, for simplicity's sake, that the
$Y$ values are symmetrically distributed around their respective conditional mean values. This is shown in
Figure 1.

5.1 Population Regression Function

Continuing with the illustrative example, we have already seen that each conditional mean of expenditure on
consumption depends on the particular income level. Therefore $E(Y \mid X_i)$ is a function of $X_i$, where $X_i$
is a given value of $X$. Mathematically,

$E(Y \mid X_i) = f(X_i)$    (1)

where $f(X_i)$ denotes some function of the independent variable $X$.

Equation (1) is known as the Conditional Expectation Function (CEF) or Population Regression Function (PRF). It
tells us how the expected value of the distribution of $Y$ is functionally related to the value of $X_i$.

The next question is what functional form $f(X_i)$ should take. The functional form of the Population Regression
Function is both an empirical and a theoretical question. For instance, an economist might assume that expenditure
on consumption and income are linearly related. For simplicity, and as an initial working hypothesis, we assume
that the Population Regression Function $E(Y \mid X_i)$ is a linear function of $X_i$:

$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$    (2)

where $\beta_1$ and $\beta_2$ are known as the regression coefficients. The regression coefficients are fixed but
unknown. $\beta_1$ is known as the intercept or constant term, and $\beta_2$ as the slope coefficient.
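
In the hypothetical community of Table 1 the whole population is observed, so the linearity assumption can be checked directly. The sketch below fits a straight line through the six conditional means using `numpy.polyfit` (a convenience chosen for this example, not a method prescribed by the module) and reports how far the line is from those means.

```python
import numpy as np

# Income levels and conditional means E(Y | X) from the bottom row of Table 1.
income = np.array([1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
conditional_mean = np.array([800, 1100, 1400, 1700, 2000, 2300], dtype=float)

# Fit a straight line through the conditional means.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
beta_2, beta_1 = np.polyfit(income, conditional_mean, 1)

# Largest gap between the fitted line and the conditional means.
fitted = beta_1 + beta_2 * income
max_gap = np.max(np.abs(fitted - conditional_mean))

print(f"beta_1 (intercept) = {beta_1:.2f}")
print(f"beta_2 (slope)     = {beta_2:.2f}")
print(f"largest gap        = {max_gap:.6f}")
```

For this particular table the conditional means lie exactly on a straight line with intercept 200 and slope 0.6, since each Rs 500 step in income raises the conditional mean by Rs 300. In practice the population is not observed and the coefficients have to be estimated from a sample.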

Equation (2) is known as the linear population regression model. Our main objective is to estimate (i) the values
of $\beta_1$ and $\beta_2$ and (ii) the standard errors of these estimates.

5.2 Stochastic Specification of the Population Regression Function

Coming back to our illustrative example, we find that the average expenditure on consumption increases as income
increases. However, this need not be true of a particular family. For instance, there is a family with an income of
Rs 1500 whose expenditure on consumption is Rs 850, which is below the expenditure (Rs 900) of one family with an
income of Rs 1000. Thus there are families whose consumption expenditure deviates from the average expenditure of
their income group.

The deviation of an individual $Y_i$ from its expected value $E(Y \mid X_i)$ can be expressed as follows:

$u_i = Y_i - E(Y \mid X_i)$
$Y_i = E(Y \mid X_i) + u_i$    (3)
$Y_i = \beta_1 + \beta_2 X_i + u_i$    (4)

The deviation of $Y_i$ from its expected value is denoted by $u_i$. $u_i$ is an unobservable random variable that
can take either positive or negative values. It is known as the stochastic disturbance term.

Interpretation of equation (4)

The expenditure on consumption of an individual family at a given income level can be expressed as the sum of
two components:

(1) The systematic or deterministic component. This is represented by $E(Y \mid X_i)$, which is the average
expenditure on consumption of all the families with the same income level.

(2) The random or non-systematic component. This is represented by the stochastic disturbance term $u_i$.

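To understand this more clearly, we write the hypothetical example of Table 1 in the form of equation (4). The
individual consumption expenditures for the income level $X = 1000$ can be written as follows:

$Y_1 = 700 = \beta_1 + \beta_2 (1000) + u_1$

$Y_2 = 750 = \beta_1 + \beta_2 (1000) + u_2$

$Y_3 = 800 = \beta_1 + \beta_2 (1000) + u_3$

$Y_4 = 850 = \beta_1 + \beta_2 (1000) + u_4$

$Y_5 = 900 = \beta_1 + \beta_2 (1000) + u_5$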

Conditional mean of disturbance term

Consider equation (3) and take the expectation, conditional on $X_i$, of both sides:

$E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i)$

$= E(Y \mid X_i) + E(u_i \mid X_i)$    (since the expected value of a constant is that constant itself).

Since $E(Y_i \mid X_i)$ is equivalent to $E(Y \mid X_i)$, we get

$E(u_i \mid X_i) = 0$

Thus the conditional mean value of $u_i$ is zero. This is because we assume that the regression line passes through
the conditional means of $Y$.

The stochastic specification in equation (3) clearly shows that variables other than income also affect
expenditure on consumption, so income alone cannot explain the consumption expenditure of an individual family.
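
The result that the disturbances average to zero within each income group can be verified on the population of Table 1. The following is a minimal sketch that computes $u_i = Y_i - E(Y \mid X_i)$ for every family, taking each conditional mean directly from the data rather than from a fitted line; the variable names are chosen for this example.

```python
# Population from Table 1: weekly income X -> weekly consumption Y of each family.
population = {
    1000: [700, 750, 800, 850, 900],
    1500: [850, 950, 1000, 1200, 1250, 1350],
    2000: [1150, 1200, 1350, 1450, 1500, 1550, 1600],
    2500: [1500, 1600, 1700, 1800, 1900],
    3000: [1800, 1850, 1900, 2050, 2150, 2250],
    3500: [2000, 2200, 2250, 2300, 2350, 2400, 2600],
}

# Disturbance of each family: u_i = Y_i - E(Y | X_i).
disturbances = {}
for x, ys in population.items():
    conditional_mean = sum(ys) / len(ys)
    disturbances[x] = [y - conditional_mean for y in ys]

# Within every income group the disturbances sum (and hence average) to zero.
for x, us in disturbances.items():
    print(f"X = {x}: u = {us}, mean of u = {sum(us) / len(us):.1f}")
```

The family with income Rs 1500 and consumption Rs 850 mentioned in the text has a disturbance of -250: it spends Rs 250 less than the average for its income group.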

6. THE STOCHASTIC DISTURBANCE TERM

The disturbance term $u_i$ captures all the omitted variables that collectively affect $Y$ but are not included in
the model. The question is why it is not possible to introduce all the variables that affect the dependent variable
explicitly into the model. There are several reasons for this:

1. There is always some element of intrinsic randomness in human responses. This arises from the
unpredictability of human choices and errors in decision making, among other things.
2. The effect of a large number of omitted variables is contained in $u_i$. Because of incomplete theory or
unavailable data, a large number of explanatory variables are excluded from the model.
3. There could be errors in measuring $Y$.
4. The functional form of $f(X_i)$ is not known. In reality it is very difficult to know the exact functional form
of the relationship between the dependent variable and the independent variables.

7. SAMPLE REGRESSION FUNCTION

Our objective is to estimate the Population Regression Function (PRF). In reality we cannot observe the
population relationship between the dependent variable $Y$ and the explanatory variable $X$, so we use sample
information to estimate the population values. Consider the two random samples drawn from Table 1:

Table 3: Two Random Samples

     Sample 1              Sample 2
    Y       X             Y       X
   700    1000           800    1000
   950    1500          1200    1500
  1350    2000          1450    2000
  1800    2500          1500    2500
  2150    3000          1850    3000
  2400    3500          2250    3500

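The Sample Regression Function (SRF) is the sample counterpart of the PRF and can be expressed as follows:

$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$

where $\hat{Y}_i$ is an estimator of $E(Y \mid X_i)$,
$\hat{\beta}_1$ is an estimator of $\beta_1$,
$\hat{\beta}_2$ is an estimator of $\beta_2$.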

We draw the sample regression lines based on Table 3. These are shown in Figure 2.

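Recall that the stochastic Population Regression Function can be written as follows:

$Y_i = \beta_1 + \beta_2 X_i + u_i$

The stochastic Sample Regression Function is

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$

where $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ is the estimated conditional expected value (the predicted
value) of $Y_i$, and $\hat{u}_i$ is the residual, that is, the deviation between the actual $Y_i$ and its predicted
value $\hat{Y}_i$. Note that $u_i \neq \hat{u}_i$.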

The main objective of regression analysis is to estimate the Population Regression Function with the help of the
Sample Regression Function. The sample estimates are used as approximations to the population parameters. These
sample estimates will vary from one sample to another, so our task is to construct the SRF in such a way that
this approximation is as close as possible.
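
The sampling variation described above can be seen by fitting a line to each sample in Table 3. The module has not yet said how the SRF is to be estimated, so the sketch below assumes ordinary least squares via `numpy.polyfit`; the function name `fit_srf` is hypothetical.

```python
import numpy as np

# The two random samples of Table 3 (both drawn from the population in Table 1).
income = np.array([1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
sample_1 = np.array([700, 950, 1350, 1800, 2150, 2400], dtype=float)
sample_2 = np.array([800, 1200, 1450, 1500, 1850, 2250], dtype=float)


def fit_srf(x, y):
    """Least-squares line y_hat = b1 + b2 * x (an assumed estimation method).

    Returns the intercept b1, the slope b2 and the residuals y - y_hat.
    """
    b2, b1 = np.polyfit(x, y, 1)  # coefficients come back as [slope, intercept]
    residuals = y - (b1 + b2 * x)
    return b1, b2, residuals


b1_s1, b2_s1, resid_s1 = fit_srf(income, sample_1)
b1_s2, b2_s2, resid_s2 = fit_srf(income, sample_2)

print(f"Sample 1: Y_hat = {b1_s1:8.2f} + {b2_s1:.4f} X")
print(f"Sample 2: Y_hat = {b1_s2:8.2f} + {b2_s2:.4f} X")
```

Under this assumption the two samples give noticeably different lines, with slopes of about 0.72 and 0.53, and neither coincides with the line through the conditional means of Table 1 (intercept 200, slope 0.6). Each SRF is only an approximation to the PRF, and the residuals differ from the unobserved disturbances.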

8. SUMMARY

• The main idea behind any regression analysis is to study the statistical dependence of a dependent
variable on one or more explanatory variables.

• The objective of any regression analysis is to estimate and/or predict the mean value of the dependent
variable on the basis of the known values of the explanatory variables.

• The success of any regression analysis depends on the availability of high-quality data.

• Regression models are of two types: the simple regression model and the multiple regression model. In
both cases, we can have linear and non-linear regression models.

• We study linear population regression functions, that is, regressions that are linear in the parameters.
They may be non-linear in the explanatory variables.

• The population regression function (PRF), or conditional expectation function (CEF), is the key
concept behind regression analysis. We study how the average value of the dependent variable changes
with the given values of the explanatory variables.

• We study the stochastic PRF because it is useful for empirical analysis. The stochastic disturbance term $u_i$
plays an important role in estimating the PRF.

• The stochastic disturbance term $u_i$ captures all the factors that influence the dependent variable but are
not explicitly incorporated in the model.

• In reality one rarely has access to the entire population of interest, so we use the stochastic sample
regression function to estimate the PRF.
