Textbook
Wooldridge, Jeffrey (2008). Introductory Econometrics: A Modern Approach. 4th edition, paperback. South-Western, Division of Thomson Learning.
The Nature of Business Statistics and Data
• Reference: Wooldridge, Chapter 1.
• Business statistics is used for:
– Estimating Business Models
– Evaluating & implementing policy
– Forecasting
• What is the effect of education on wages?
• How do training programs impact productivity?
• How will share prices develop in the future?
• Key ingredient: Data – typically in the form of
large samples.
• Data = information.
• Business statistics = a method for processing
data and learning about general patterns in
the population of interest.
• For example, what is the effect of education
on labor market outcomes in the US?
Common structures of Business data
1. Cross-sectional data: Sample of individuals, households, firms,
taken at a given point in time; often obtained from random sampling
from the underlying population.
2. Time series data: Observations on one or several variables over
time (e.g. GDP for Sweden 1971-2011). Time series observations are
unlikely to be independent over time which implies certain
methodological problems that we will study later.
3. Pooled cross sections: Combines cross-section datasets for
different time periods.
4. Panel (or longitudinal) data: Combines cross-section datasets
for different time periods for the same individuals.
Example 1.1:
Becker’s model of crime
• Certain crimes have clear economic rewards,
but they also have costs.
• From Becker’s (1968) perspective, the decision
to participate in illegal activity is influenced by
the rewards and costs.
• Now write down an equation describing the
time spent in criminal activity as a function of
various factors:
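A model of crime:
$y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$
where
y = hours spent in criminal activities
x1 = "wage" for an hour spent in criminal activity
x2 = hourly wage in legal employment
x3 = other income
x4 = probability of getting caught
x5 = probability of being convicted if caught
x6 = expected sentence if convicted
x7 = age
Think about whether each x-variable is likely to affect y positively or negatively.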
Model Specification
• Before we can undertake statistical analysis
linked to crime or worker productivity, the
models above must be made specific.
• This means we must decide exactly what the
function f(·) looks like.
• A second issue is how to deal with variables
that cannot be observed (e.g. the wage that
someone can earn in criminal activity).
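A model of crime
$\text{crime} = \beta_0 + \beta_1\,\text{wage} + \beta_2\,\text{othinc} + \beta_3\,\text{freqarr} + \beta_4\,\text{freqconv} + \beta_5\,\text{avgsen} + \beta_6\,\text{age} + u$
where:
crime = measure of the frequency of criminal activity;
wage = wage that can be earned legally;
othinc = income from other sources;
freqarr = frequency of arrests for prior crimes;
freqconv = frequency of conviction;
avgsen = average sentence length after conviction;
age = age;
u = unobserved factors (e.g. the wage for criminal activity, moral character, family background).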
Causality
• A common goal for an applied statistician is to estimate
the causal effect of one variable on some outcome of
interest.
• Important: Distinguish correlation (association) from
causation.
• Ceteris paribus: other relevant factors being equal,
what is the effect of…
– a price increase on consumer demand
– training on worker productivity
Causality (cont’d)
If…
a) …we succeed in holding all other relevant
determinants of (say) productivity constant;
and
b) …find a link between training and
productivity,
…then we can conclude that training has a
causal effect on productivity.
Causality (cont’d)
• The ideal setting is experimental, as in a laboratory:
randomly assign the treatment to half the sample and
use the other half as a control group.
• Much of the research in business and
economics uses non-experimental data.
• A key challenge in Business Statistics is to
condition on enough other factors, so that a
case for causality can be made.
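A small simulation can make the experimental ideal concrete. The sketch below assumes a hypothetical training program that raises productivity by 2 units and an unobserved "ability" term that also affects productivity; all numbers are made up for illustration. Because treatment is assigned by a coin flip, ability is balanced across the two groups on average, so the simple difference in group means recovers the causal effect without having to measure ability.

```python
import random
import statistics

def simulate_experiment(n=10_000, true_effect=2.0, seed=0):
    """Hypothetical training experiment; returns the treated-minus-control mean difference."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)     # unobserved determinant of productivity
        trained = rng.random() < 0.5      # coin-flip assignment, independent of ability
        noise = rng.gauss(0.0, 1.0)
        productivity = 10.0 + true_effect * trained + ability + noise
        (treated if trained else control).append(productivity)
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_experiment()
print(f"difference in means: {estimate:.2f} (true effect 2.0)")
```

With non-experimental data the assignment line would instead depend on ability, and the difference in means would mix the training effect with the ability gap between groups.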
Causality: Example
• Goal: Estimate the causal effect of education on wages
• Data: WAGE1.DTA. (Source: 1976 Current Population Survey in the US).
• Scatter plot:
[Figure: average hourly earnings (vertical axis) against years of education (horizontal axis)]
Causality: Example (cont’d)
• The positive association in the scatter plot does not, by itself, imply that education causes higher wages.
• Wages are determined by many factors besides education,
for example innate ability:
– High ability => high wages
– High ability => high education (e.g. intelligent individuals choose more education)
• Perhaps the correlation between education and wages visible
in the graph is driven by ability rather than education?
• To credibly estimate the causal effect of education, we must find
a way of determining the link between education and wages
holding innate ability constant!
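The ability story can also be simulated. In this hypothetical data-generating process (every coefficient below is an assumption chosen for illustration, not an estimate from WAGE1), ability raises both education and wages, and the true causal return to one year of education is 0.5. Regressing wage on education alone, which is what the scatter plot implicitly does, attributes part of the ability effect to education.

```python
import random

def ols_slope(xs, ys):
    """Simple-regression slope: sample covariance of x and y over sample variance of x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
true_return = 0.5  # hypothetical causal effect of one more year of education on the hourly wage
educ, wage = [], []
for _ in range(20_000):
    ability = rng.gauss(0.0, 1.0)                        # unobserved innate ability
    years = 12.0 + 2.0 * ability + rng.gauss(0.0, 1.0)   # high ability => more education
    hourly = 1.0 + true_return * years + 1.5 * ability + rng.gauss(0.0, 1.0)  # high ability => higher wage
    educ.append(years)
    wage.append(hourly)

# In this setup the population slope is (0.5 * 5 + 1.5 * 2) / 5 = 1.1, not 0.5.
print(f"slope of wage on educ: {ols_slope(educ, wage):.2f} (true causal effect {true_return})")
```

Here the bias equals 1.5 × Cov(educ, ability) / Var(educ) = 1.5 × 2 / 5 = 0.6. Holding ability fixed would remove it, which the simulation could do only because ability is generated rather than unobserved.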
Chapter 2:
The Simple Regression Model
The simple regression model
Suppose we want to "explain y in terms of x".
Three issues:
1. Since there is never an exact relationship between two
variables, how do we allow for other factors affecting y?
2. What is the functional form?
3. Are we capturing a ceteris paribus (causal)
relationship between y and x?
The simple linear regression model
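Assume that, in the population, the outcome variable y can be
modeled as a function of x as follows:
$y = \beta_0 + \beta_1 x + u$ (2.1)
where the error term u represents factors other than x that affect y.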
Simple regression: the functional
relationship between y and x is linear.
• If the other factors in u are held fixed, so that the
change in u is zero (Δu = 0), then in a linear model x
has a constant effect on y:
$\Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0$ (2.2)
• To get reliable estimators of β0 and β1 from a random
sample of data, we have to make an assumption
restricting how the unobservable u is related to the
explanatory variable x.
Detour: Expected Values
The expected value
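The formulas on this slide did not survive extraction, so here is a sketch of the standard definition for a discrete random variable, E(X) = Σ x_j f(x_j), a probability-weighted average of the possible values. The function name `expected_value` and the fair-die example are illustrative choices; exact fractions avoid rounding. The second check illustrates linearity, E(aX + b) = aE(X) + b.

```python
from fractions import Fraction

def expected_value(dist):
    """E(X) = sum over j of x_j * f(x_j), for a discrete distribution {x_j: f(x_j)}."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to one")
    return sum(x * p for x, p in dist.items())

# Fair six-sided die: each outcome 1..6 has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(expected_value(die))  # 7/2

# Linearity: E(aX + b) = a E(X) + b for constants a and b.
a, b = 2, 3
transformed = {a * x + b: p for x, p in die.items()}
print(expected_value(transformed))  # 10
```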
Illustration:
Now back to Chapter 2
• We encountered the following 'crucial assumption' (mean independence):
$E(u \mid x) = E(u)$
Example
Model:
A more innocent assumption
• As long as the intercept β0 is included in the
equation, we can always assume that the average
value of u in the population is zero:
$E(u) = 0$
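The normalization E(u) = 0 is harmless because any nonzero mean of the error can be absorbed into the intercept. A sketch of the re-labelling, where α0, the tilde intercept and e are symbols introduced only for this derivation:

```latex
% Suppose instead that E(u) = \alpha_0 \neq 0. Add and subtract \alpha_0:
y = \beta_0 + \beta_1 x + u
  = (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0)
% Define a new intercept and a new error term:
\tilde{\beta}_0 = \beta_0 + \alpha_0, \qquad e = u - \alpha_0
% The new error has mean zero and the slope is unchanged:
y = \tilde{\beta}_0 + \beta_1 x + e, \qquad E(e) = E(u) - \alpha_0 = 0
```

Only the intercept changes; β1, usually the parameter of interest, is unaffected.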
Model: $y = \beta_0 + \beta_1 x + u$
Assumption: $E(u \mid x) = E(u)$
Assumption: $E(u) = 0$
Interpretation:
Breaking y into two parts
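A sketch of the decomposition this heading refers to (the slide's own equations were lost), assuming the model $y = \beta_0 + \beta_1 x + u$ together with $E(u \mid x) = E(u)$ and $E(u) = 0$:

```latex
% Take expectations of the model conditional on x:
E(y \mid x) = \beta_0 + \beta_1 x + E(u \mid x)
% The two assumptions give E(u | x) = E(u) = 0, so
E(y \mid x) = \beta_0 + \beta_1 x
% Hence y splits into a systematic part explained by x and an unsystematic part:
y = \underbrace{\beta_0 + \beta_1 x}_{E(y \mid x)} + \underbrace{u}_{\text{unsystematic}}
```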
Deriving the Ordinary Least Squares Estimates
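Estimation procedure
Assumption: $E(u \mid x) = E(u)$
Assumption: $E(u) = 0$
The first of these (mean independence) implies zero covariance
between x and u. We can now rewrite the above assumptions as
$E(u) = 0$ (2.10)
$\mathrm{Cov}(x, u) = E(xu) = 0$ (2.11)
or, substituting $u = y - \beta_0 - \beta_1 x$, in terms of observable variables:
$E(y - \beta_0 - \beta_1 x) = 0$ (2.12)
$E[x(y - \beta_0 - \beta_1 x)] = 0$ (2.13)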
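The step "mean independence implies zero covariance between x and u" uses the law of iterated expectations. A sketch:

```latex
% Law of iterated expectations, conditioning on x:
E(xu) = E\big[E(xu \mid x)\big] = E\big[x\,E(u \mid x)\big]
% Mean independence, E(u | x) = E(u), lets the constant E(u) come outside:
E\big[x\,E(u \mid x)\big] = E\big[x\,E(u)\big] = E(x)\,E(u)
% Therefore
\mathrm{Cov}(x, u) = E(xu) - E(x)\,E(u) = 0
% and, combined with E(u) = 0, this also gives E(xu) = 0.
```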
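• Show how we can solve for $\hat{\beta}_0$ and $\hat{\beta}_1$ from the sample counterparts of these
equations (you need to know how to do this):
$n^{-1}\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
$n^{-1}\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
The first equation can be written as $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$,
where $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$ and $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ are the sample averages.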
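$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (2.19)
provided that $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$, and
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ (2.17)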
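The slope formula (2.19) and the intercept formula (2.17) translate directly into code. This is a minimal pure-Python sketch; the function name `ols_fit` and the five-point toy data set are hypothetical, chosen only so the arithmetic can be checked by hand.

```python
def ols_fit(xs, ys):
    """Return (beta0_hat, beta1_hat) for the simple regression of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no sample variation, so the slope is not defined")
    beta1_hat = sxy / sxx                  # (2.19)
    beta0_hat = y_bar - beta1_hat * x_bar  # (2.17)
    return beta0_hat, beta1_hat

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_fit(xs, ys)
print(f"beta0_hat = {b0:.2f}, beta1_hat = {b1:.2f}")  # beta0_hat = 2.20, beta1_hat = 0.60
```

By hand: x̄ = 3 and ȳ = 4, so Σ(x_i − x̄)(y_i − ȳ) = 6 and Σ(x_i − x̄)² = 10, giving β̂1 = 0.6 and β̂0 = 4 − 0.6 × 3 = 2.2. The guard on `sxx` mirrors the requirement that x must vary in the sample.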
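Why is this estimator called the 'ordinary
least squares' (OLS) estimator?
• To see why, first define a fitted value for y when
$x = x_i$ as
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
and the residual as $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$.
• Least squares: $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to make the sum of squared residuals, $\sum_{i=1}^{n} \hat{u}_i^2$, as small as possible.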
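A quick numerical check of the least-squares property, as a sketch on the same kind of hypothetical toy data: the sum of squared residuals (SSR) at the closed-form estimates is compared with the SSR of nearby lines on a small grid of perturbations. The helper names `ols_fit` and `ssr` are illustrative. The residuals also satisfy the two sample moment conditions: they sum to zero and are uncorrelated with x in the sample.

```python
import itertools

def ols_fit(xs, ys):
    """Closed-form simple-regression estimates (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_fit(xs, ys)
best = ssr(b0, b1, xs, ys)

# Every perturbed line on the grid has an SSR at least as large (tiny tolerance for rounding).
steps = [-0.1, -0.01, 0.0, 0.01, 0.1]
worse_or_equal = all(
    ssr(b0 + d0, b1 + d1, xs, ys) >= best - 1e-12
    for d0, d1 in itertools.product(steps, steps)
)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"SSR at OLS estimates: {best:.2f}")       # SSR at OLS estimates: 2.40
print("no nearby line does better:", worse_or_equal)  # no nearby line does better: True
```

A grid check is only an illustration; the general result follows from the first-order conditions of the minimization, which are exactly the sample moment equations.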
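Some related concepts…
• The OLS regression line (or, the sample
regression function; SRF):
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
• Interpretation: $\hat{\beta}_1$ is the change in the fitted value when x increases by one unit, $\Delta\hat{y} = \hat{\beta}_1 \Delta x$; $\hat{\beta}_0$ is the fitted value when x = 0.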
Example:
CEO Salary and Return on Equity
• Scatter plot: [Figure: 1990 salary, thousands $ (vertical axis) against return on equity, 88-90 avg (horizontal axis)]
• Cov(salary, roe) = 1342.5
• Var(roe) = 72.6
• Corr(salary, roe) = 0.11
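Because (2.19) is the sample covariance of x and y divided by the sample variance of x (the 1/(n − 1) factors cancel), the slope of the regression line can be recovered from the summary statistics on this slide alone. A minimal sketch, assuming the reported Cov and Var use the same divisor:

```python
cov_salary_roe = 1342.5  # Cov(salary, roe) as reported on the slide
var_roe = 72.6           # Var(roe) as reported on the slide

# (2.19) rewritten as sample covariance over sample variance.
beta1_hat = cov_salary_roe / var_roe
print(f"beta1_hat = {beta1_hat:.2f}")  # beta1_hat = 18.49
```

If roe is measured in percentage points and salary in thousands of dollars, a one-point increase in roe is associated with a predicted salary about 18.49 thousand dollars higher. The intercept cannot be recovered here because the sample means are not reported.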
The regression line
[Figure: the fitted regression line over the same scatter plot, against return on equity, 88-90 avg]