
FUNDAMENTALS OF

ECONOMETRICS
DR ABDUL WAHEED
PhD Econometrics
Week 2
Lecture 4
The Nature of Data
➢ Econometric analysis requires data
➢ Different kinds of economic data sets are:
▪ Cross-sectional data
▪ Time series data
▪ Pooled cross sections
▪ Panel/Longitudinal data
➢ Econometric methods depend on the nature of the data used
▪ Use of inappropriate methods may lead to misleading results
Cross-Sectional Data Sets
➢ Sample of individuals, households, firms, cities, states, countries,
or other units of interest at a given point of time/in a given period.
➢ Cross-sectional observations are more or less independent, for
example when they come from pure random sampling from a population.
➢ Sometimes pure random sampling is violated, e.g. when units refuse
to respond in surveys or when sampling is characterized by clustering.
➢ Cross-sectional data are typically encountered in applied
microeconomics and other social sciences.
Example: Cross-sectional data set on wages and other characteristics
[Table not reproduced; its annotations mark the observation number, the hourly wage, and the indicator variables (1 = yes, 0 = no).]
Time Series Data
➢ Observations of a variable or several variables over time
➢ For example, stock prices, money supply, consumer price index,
gross domestic product, automobile sales, …
➢ Time series observations are typically serially correlated
➢ Ordering of observations conveys important information
➢ Data frequency: daily, weekly, monthly, quarterly, annually, …
➢ Typical features of time series: trends and seasonality
➢ Typical applications: applied macroeconomics and finance
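The point that time series observations are typically serially correlated, and that their ordering matters, can be checked with a lag-1 sample autocorrelation. The following is a minimal sketch on made-up numbers; the function name `lag1_autocorr` and both series are assumptions for illustration, not data from the lecture.

```python
def lag1_autocorr(series):
    """Sample lag-1 autocorrelation: cross-products of deviations one period
    apart, divided by the total sum of squared deviations."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((y - mean) ** 2 for y in series)
    return num / den

# Hypothetical upward-trending series (illustrative values only).
trend = [100, 102, 105, 107, 110, 114, 117, 121]
# Hypothetical series that alternates around a constant level.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

r_trend = lag1_autocorr(trend)      # positive: neighbours move together
r_alt = lag1_autocorr(alternating)  # negative: neighbours move in opposite directions
```

Shuffling either series would change these values, which is one way to see that the ordering of time series observations carries information.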
Example: Time series data on minimum wages and related variables
[Table not reproduced; its annotated columns are the average minimum wage for the given year, the average coverage rate, the unemployment rate, and the gross national product.]
Pooled Cross Sections Data
➢ Two or more cross sections are combined in one data set
➢ Cross sections are drawn independently of each other
➢ Pooled cross sections often used to evaluate policy changes
➢ Example:
▪ Evaluate effect of change in property taxes on house prices
▪ Random sample of house prices for the year 1993
▪ A new random sample of house prices for the year 1995
▪ Compare before/after (1993: before reform, 1995: after reform)
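The before/after comparison in this example can be sketched as follows. This is a minimal illustration with made-up house prices (in thousands); the variable names and the `after` indicator are assumptions for the example, not part of the lecture's data set.

```python
# Two independently drawn samples (hypothetical prices, in thousands).
sample_1993 = [85.0, 92.5, 78.0, 101.0]
sample_1995 = [88.0, 97.5, 83.0, 104.5, 91.0]

# Pool the cross sections, tagging each observation with its year and an
# "after reform" indicator (1 = 1995, 0 = 1993).
pooled = [{"year": 1993, "after": 0, "price": p} for p in sample_1993]
pooled += [{"year": 1995, "after": 1, "price": p} for p in sample_1995]

def mean_price(rows, after):
    """Average price among pooled observations with the given indicator value."""
    prices = [r["price"] for r in rows if r["after"] == after]
    return sum(prices) / len(prices)

before = mean_price(pooled, after=0)
after = mean_price(pooled, after=1)
difference = after - before  # raw before/after gap in mean prices
```

The two samples need not have the same size or contain the same houses, which is what distinguishes pooled cross sections from a panel. The raw difference is only descriptive; attributing it to the reform requires further assumptions.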
Example: Pooled cross sections on housing prices
[Table not reproduced; its annotations mark the property tax, the price of the house, the area of the house, the number of bedrooms, and the number of bathrooms, with observations grouped into "before reforms" and "after reforms".]
Panel or longitudinal data
➢ The same cross-sectional units are followed over time
➢ Panel data have a cross-sectional and a time series dimension
➢ Panel data can be used to account for time-invariant unobservables
➢ Panel data can be used to model lagged responses
The key feature of panel data that distinguishes them from a pooled cross
section is that the same cross-sectional units (individuals, firms, or counties)
are followed over a given time period.
➢ Example:
▪ City crime statistics; each city is observed in two years
▪ Time-invariant unobserved city characteristics may be modeled
▪ Effect of police on crime rates may exhibit time lag
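One way to see how panel data account for time-invariant unobservables is first differencing. As a sketch, assume a two-period model in which $a_i$ stands for the unobserved, time-invariant characteristics of city $i$ (the symbols $y_{it}$, $x_{it}$ and $a_i$ are notation assumed here, not taken from the slides):

```latex
\[ y_{it} = \beta_1 + \beta_2 x_{it} + a_i + u_{it}, \qquad t = 1, 2 \]
\[ y_{i2} - y_{i1} = \beta_2 (x_{i2} - x_{i1}) + (u_{i2} - u_{i1}) \]
```

Subtracting the first-period equation from the second removes both the intercept and $a_i$, so $\beta_2$ can be estimated without observing $a_i$. This is possible only because the same units are observed in both periods.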
Example: Two-year panel data on city crime statistics
[Table not reproduced; its annotations note that each city has two time series observations and mark the number of police in 1986 and the number of police in 1990.]
The Sources of Data
Sources of data (International)
The success of a regression analysis depends on the availability of good-quality data.
International Sources?
➢ World Bank, World Development Indicators (WDI)
➢ IMF: International Monetary Fund website
➢ Asian Development Bank, International Energy Agency
➢ Arguably the best starting point is Google itself: you can search
for the relevant international data
Sources of data (Local)
Local Sources of Data?
➢ Economic Survey of Pakistan
➢ Handbook of State Bank of Pakistan
➢ Surveys of Pakistan Bureau of Statistics
➢ Websites of Planning Commission, SBP, PBS, Ministry of Finance,
SECP etc.
The Linear Regression Model (LRM)
The general form of the LRM, also known as the linear population model
(LPM) or the true model, is:

𝒀𝒊 = 𝜷𝟏 + 𝜷𝟐 𝑿𝟐𝒊 + 𝜷𝟑 𝑿𝟑𝒊 + ⋯ + 𝜷𝒌 𝑿𝒌𝒊 + 𝒖𝒊

The Simple Linear regression model

𝒀𝒊 = 𝜷𝟏 + 𝜷𝟐 𝑿𝟐𝒊 + 𝒖𝒊
It is called simple because it has only two variables: the dependent variable 𝒀 and a single explanatory variable 𝑿.
The term “linear” regression will always mean a regression that is linear
in the parameters; the β’s (that is, the parameters) are raised to the first
power only. It may or may not be linear in the explanatory variables, the
X’s.

LRM = linear regression model
NLRM = non-linear regression model
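To illustrate the distinction between linearity in the parameters and linearity in the variables, consider three functional forms (these specific forms are examples chosen here, not taken from the slides). The first two are linear in the parameters even though they are non-linear in $X$; the third is not linear in the parameters as written:

```latex
\[ Y_i = \beta_1 + \beta_2 X_i^2 + u_i \qquad \text{(LRM: each } \beta \text{ enters to the first power)} \]
\[ \ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i \qquad \text{(LRM: each } \beta \text{ enters to the first power)} \]
\[ Y_i = \beta_1 + \beta_2^2 X_i + u_i \qquad \text{(NLRM: } \beta_2 \text{ enters squared)} \]
```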
Estimation of the Linear Regression Model
A commonly used method to estimate the regression coefficients is the method of Ordinary Least Squares (OLS).
Ordinary Least Squares (OLS)
It is explained as follows:
OLS calculates the unknown regression parameters by minimizing the
sum of the squared errors of the regression model.
One way to obtain estimates of the 𝜷 coefficients would be to make the
sum of the error terms as small as possible, ideally zero. For theoretical
and practical reasons, the method of OLS does not minimize the sum of
the error terms, but the sum of the squared error terms.
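One such reason can be illustrated numerically: positive and negative errors cancel, so the plain sum of errors can be zero even for a line that fits poorly. The sketch below uses made-up data and hypothetical helper names (`errors`, `line_through_means`). Any line passing through the point of sample means has errors that sum to zero, while the sum of squared errors still separates a good fit from a bad one.

```python
# Hypothetical data lying exactly on the line Y = 1 + 2X.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 5.0, 7.0, 9.0, 11.0]

def errors(b1, b2):
    """Errors u_i = Y_i - b1 - b2 * X_i for a candidate line."""
    return [y - b1 - b2 * x for x, y in zip(X, Y)]

def line_through_means(b2):
    """Intercept that makes a line with slope b2 pass through (mean X, mean Y)."""
    x_bar = sum(X) / len(X)
    y_bar = sum(Y) / len(Y)
    return y_bar - b2 * x_bar

good = errors(1.0, 2.0)                       # the line that generated the data
bad = errors(line_through_means(-3.0), -3.0)  # wrong slope, same means

sum_good, sum_bad = sum(good), sum(bad)  # both sums of errors are zero
ss_good = sum(u ** 2 for u in good)      # zero: perfect fit
ss_bad = sum(u ** 2 for u in bad)        # 250: clearly a poor fit
```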
Consider the general linear regression model
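$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$$

$$u_i = Y_i - \beta_1 - \beta_2 X_{2i} - \beta_3 X_{3i} - \cdots - \beta_k X_{ki}$$

$$\sum u_i^2 = \sum \left( Y_i - \beta_1 - \beta_2 X_{2i} - \beta_3 X_{3i} - \cdots - \beta_k X_{ki} \right)^2$$

where $\sum u_i^2$ is the error sum of squares (ESS).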


The actual minimization of ESS involves calculus techniques. We take the (partial)
derivative of ESS with respect to each 𝜷 coefficient, set the resulting expressions
equal to zero, and solve these equations simultaneously to obtain the estimates of the k
regression coefficients. Since we have k regression coefficients, we have to
solve k equations simultaneously.
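In matrix notation the k equations can be written as (X'X)b = X'Y, a standard result that the slides do not state explicitly. The following is a minimal sketch that builds and solves this system in pure Python; the function names `solve_linear_system` and `ols` and the data are hypothetical, and the code assumes X'X is non-singular (no perfect collinearity).

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and non-singular."""
    k = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(k)]
    for col in range(k):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, k):
            factor = M[row][col] / M[col][col]
            for j in range(col, k + 1):
                M[row][j] -= factor * M[col][j]
    # Back substitution.
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (M[i][k] - tail) / M[i][i]
    return x

def ols(rows, y):
    """OLS estimates from the k normal equations (X'X) b = X'y.
    Each row of `rows` is [1, X2, X3, ...]; the leading 1 is the intercept."""
    k = len(rows[0])
    XtX = [[sum(r[a] * r[c] for r in rows) for c in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical data generated exactly by Y = 1 + 2*X2 - 1*X3 (no error term).
rows = [[1, 1, 2], [1, 2, 1], [1, 3, 5], [1, 4, 3], [1, 5, 4]]
y = [1 + 2 * r[1] - 1 * r[2] for r in rows]
b1, b2, b3 = ols(rows, y)  # recovers the generating coefficients up to rounding
```

Because the made-up data contain no error term, the solution reproduces the generating coefficients; with real data the same system yields the least-squares estimates.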
For simplicity consider the simple linear regression model

𝒀𝒊 = 𝜷𝟏 + 𝜷𝟐 𝑿𝒊 + 𝒖𝒊

To estimate this model, write the error term as

$$u_i = Y_i - \beta_1 - \beta_2 X_i$$

so that the error sum of squares is

$$\sum u_i^2 = \sum \left( Y_i - \beta_1 - \beta_2 X_i \right)^2$$

To minimize the error sum of squares, differentiate it partially with respect to 𝜷𝟏 and 𝜷𝟐.
This gives the following results:
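$$\frac{\partial \sum u_i^2}{\partial \beta_1} = -2 \sum \left( Y_i - \beta_1 - \beta_2 X_i \right)$$

$$\frac{\partial \sum u_i^2}{\partial \beta_2} = -2 \sum \left( Y_i - \beta_1 - \beta_2 X_i \right) X_i$$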

Setting these equations to zero, after algebraic simplification and manipulation, gives the
following two equations.
$$\sum Y_i = n \beta_1 + \beta_2 \sum X_i$$

$$\sum Y_i X_i = \beta_1 \sum X_i + \beta_2 \sum X_i^2$$

where 𝑛 is the sample size.


These simultaneous equations are known as the normal equations.
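Solving the normal equations simultaneously gives the following results:

$$\beta_2 = \frac{n \sum Y_i X_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left( \sum X_i \right)^2}$$

$$\beta_1 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum Y_i X_i}{n \sum X_i^2 - \left( \sum X_i \right)^2}$$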
The estimators obtained are known as the least-squares estimators.
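The least-squares formulas for 𝜷𝟏 and 𝜷𝟐 can be computed directly from the sums they contain. The sketch below is a minimal illustration; the function name `least_squares` and the five data points are made up for the example, and the check for zero variation in X is an added safeguard not discussed in the slides.

```python
def least_squares(X, Y):
    """Return (b1, b2) for Y = b1 + b2*X from the closed-form OLS solution."""
    n = len(X)
    sum_x = sum(X)
    sum_y = sum(Y)
    sum_xy = sum(x * y for x, y in zip(X, Y))
    sum_x2 = sum(x * x for x in X)
    denom = n * sum_x2 - sum_x ** 2  # common denominator of both formulas
    if denom == 0:
        raise ValueError("X has no variation; the slope cannot be computed")
    b2 = (n * sum_xy - sum_x * sum_y) / denom
    b1 = (sum_x2 * sum_y - sum_x * sum_xy) / denom
    return b1, b2

# Hypothetical sample (illustrative values only).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, b2 = least_squares(X, Y)  # b1 = 2.2, b2 = 0.6

# The estimates satisfy both normal equations (up to rounding).
lhs1, rhs1 = sum(Y), len(X) * b1 + b2 * sum(X)
lhs2 = sum(x * y for x, y in zip(X, Y))
rhs2 = b1 * sum(X) + b2 * sum(x * x for x in X)
```

For this sample the sums are ΣX = 15, ΣY = 20, ΣXY = 66 and ΣX² = 55, so the common denominator is 5(55) − 15² = 50, giving a slope of 30/50 = 0.6 and an intercept of 110/50 = 2.2.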
