Cee561 L3

CEE 561
Transportation Modelling
Lecture 2
Continuous vs. Discrete Goods
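[Figure: Continuous Goods panel shows indifference curves u1, u2, u3 over quantities x1 and x2; Discrete Goods panel shows the choice between bus (x1) and auto (x2), each taking the value 0 or 1.]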
Outline
Data
Modeling Principles
– Assumptions
– Estimates
– Statistical Tests
– Potential data problems
Data
Cross-section
– Activities of individual persons, firms or other units at a single point in time
  • One observation per individual
Time series
– Movement of a variable over time
  • Annual, quarterly, monthly, weekly observations, etc.
  • Mostly used for national- or regional-level aggregations of the observations
Pooled/panel
– Combination of time series and cross-section
– Behavior of individual persons, firms or other units over time
Examples
Cross-sectional data
Vehicle Miles Travelled in 2008
ID  VMT   No. of cars in HH  HH income  No. of HH members  No. of children in HH  TAZ
1   1000  2                  >80k       4                  2                      101
2   1200  2                  30-50k     3                  1                      101
3   600   1                  30-50k     2                  0                      104
:
:
Examples
Time series data
Vehicle Miles Travelled between 2000 and 2008
Year  Avg fuel price/L  Average VMT/person  Average car ownership
2000  20                500                 0.01
2001  22                575                 0.02
2002  25                                    0.04
:
:
2006  45                800                 0.06
2007  50                900                 0.06
2008  77                1000                0.06
Examples
Pooled/Panel data
ID  Year  Avg fuel price/L  VMT   No. of cars in HH  HH income  No. of HH members  No. of children in HH  TAZ
1   2006  30                800   2                  >80k       4                  2                      101
1   2007  50                900   2                  >80k       4                  2                      101
1   2008  77                1000  2                  >80k       4                  2                      101
2   2006  30                1000  2                  30-50k     3                  1                      101
2   2007  50                1500  2                  30-50k     3                  1                      101
2   2008  77                1200  2                  30-50k     3                  1                      101
:
:
Examples
Pseudo panel data
ID  Year  Avg fuel price/L  VMT   No. of cars in HH  HH income  No. of HH members  No. of children in HH  TAZ
1   2006  30                800   1                  >80k       4                  1                      101
2   2007  50                900   2                  50-80k     3                  2                      104
3   2008  77                1000  2                  >80k       4                  2                      101
4   2006  30                1000  3                  30-50k     3                  2                      108
5   2007  50                1500  1                  30-50k     2                  1                      101
6   2008  77                1200  1                  50-80k     3                  1                      101
:
:
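The four data layouts differ only in which dimensions (unit, time) vary. The minimal Python sketch below uses the ID, year, fuel price and VMT columns copied from the pooled/panel and pseudo panel example tables (other columns omitted). It shows how a panel can be sliced into a cross-section or a time series, and how a pseudo panel differs: as in its example table, each ID appears in only one year. The helper names are hypothetical.

```python
from collections import defaultdict

# (household ID, year, avg fuel price/L, VMT) from the pooled/panel example table.
panel = [
    (1, 2006, 30, 800), (1, 2007, 50, 900), (1, 2008, 77, 1000),
    (2, 2006, 30, 1000), (2, 2007, 50, 1500), (2, 2008, 77, 1200),
]
# Same columns from the pseudo panel example table: a different household each time.
pseudo_panel = [
    (1, 2006, 30, 800), (2, 2007, 50, 900), (3, 2008, 77, 1000),
    (4, 2006, 30, 1000), (5, 2007, 50, 1500), (6, 2008, 77, 1200),
]


def cross_section(records, year):
    """Fix the time period: one VMT observation per unit in that year."""
    return {hh: vmt for hh, yr, _, vmt in records if yr == year}


def time_series(records, hh_id):
    """Fix the unit: one VMT observation per year for that unit."""
    return {yr: vmt for hh, yr, _, vmt in records if hh == hh_id}


def follows_units_over_time(records):
    """True if any unit is observed in more than one period (a true panel)."""
    years_seen = defaultdict(set)
    for hh, yr, _, _ in records:
        years_seen[hh].add(yr)
    return any(len(years) > 1 for years in years_seen.values())


print(cross_section(panel, 2008))             # {1: 1000, 2: 1200}
print(time_series(panel, 2))                  # {2006: 1000, 2007: 1500, 2008: 1200}
print(follows_units_over_time(panel))         # True
print(follows_units_over_time(pseudo_panel))  # False
```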
Modeling Principles
Hypothesis:
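– Example: VMT = f(fuel cost, no. of cars, HH income, HH size)
Linear relationship:
– Example: $\text{VMT} = \alpha + \beta_{cost}\,\text{cost} + \beta_{car}\,\text{carNo} + \beta_{inc}\,\text{hhInc} + \beta_{size}\,\text{hhSize}$
Non-linear relationship:
– Example: $\text{VMT} = \alpha + \beta_{car}\,\text{carNo} + \dfrac{\beta_{cost}\,\text{cost}}{1 + \beta_{inc}\,\text{hhInc}}$
– In this course we will deal with linear relationships only (linear in the parameters α and β)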
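Whether a specification counts as linear for regression purposes depends on how the parameters enter it. The sketch below checks this numerically for the two VMT examples, using hypothetical values for cost, car count, income and household size. A function that is affine in its parameters satisfies f(t1 + t2) − f(0) = (f(t1) − f(0)) + (f(t2) − f(0)); the non-linear form, where β_inc sits in a denominator, does not. The helper `is_linear_in_params` is a hypothetical illustration and a finite-sample check, not a proof.

```python
import numpy as np

# Hypothetical explanatory values for one household (illustration only).
cost, car_no, hh_inc, hh_size = 1.2, 2.0, 60.0, 3.0


def vmt_linear(theta):
    """alpha + b_cost*cost + b_car*carNo + b_inc*hhInc + b_size*hhSize"""
    alpha, b_cost, b_car, b_inc, b_size = theta
    return alpha + b_cost * cost + b_car * car_no + b_inc * hh_inc + b_size * hh_size


def vmt_nonlinear(theta):
    """alpha + b_car*carNo + b_cost*cost / (1 + b_inc*hhInc)"""
    alpha, b_cost, b_car, b_inc = theta
    return alpha + b_car * car_no + b_cost * cost / (1 + b_inc * hh_inc)


def is_linear_in_params(f, n_params, trials=100, seed=0):
    """Test additivity of f(theta) - f(0) on random parameter pairs."""
    rng = np.random.default_rng(seed)
    zero = np.zeros(n_params)
    for _ in range(trials):
        t1, t2 = rng.uniform(0.0, 1.0, size=(2, n_params))
        lhs = f(t1 + t2) - f(zero)
        rhs = (f(t1) - f(zero)) + (f(t2) - f(zero))
        if not np.isclose(lhs, rhs):
            return False
    return True


print(is_linear_in_params(vmt_linear, 5))     # True
print(is_linear_in_params(vmt_nonlinear, 4))  # False
```

Transforming a variable (for example using log(hhInc) in place of hhInc) keeps a model linear in the parameters, so it can still be estimated with the linear methods in this course.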

In regression analysis, we use estimators to find the values of α and the βs that ‘best fit’ the observed data
Estimators
Our interest: Population
Available: Sample/samples from population
– Use the sample information to obtain the best possible estimates
Estimator: Rule that gives a reasonable estimate for each
and every possible sample
– Estimators are rules
– Estimates are numbers produced by the estimator
Desirable properties
– Unbiasedness
– Efficiency
– Consistency (a large-sample property)
Desirable Properties
We want our estimators to be:
– Unbiased: the expected value of the estimator equals the true population parameter
  • $\text{Bias} = E(\beta^*) - \beta_{population}$
– Efficient: for a given sample size, its variance is smaller than that of any other unbiased estimator
  • Higher efficiency means we can place more reliance on the results
– Consistent: as N increases, $\beta^* \to \beta_{population}$
This assumption is required when we do statistical tests (e.g. t-test)
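To make unbiasedness and consistency concrete, the following Monte Carlo sketch draws many samples from an assumed population model Y = α + βX + ε (the values of α, β and σ are hypothetical), estimates the slope by two-variable OLS in each sample, and summarizes the estimates. X is held fixed across samples, matching the non-stochastic-X assumption.

```python
import numpy as np

TRUE_ALPHA, TRUE_BETA, SIGMA = 2.0, 0.5, 2.0   # hypothetical population values


def ols_slope(x, y):
    """Two-variable OLS slope computed from deviations about the means."""
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sum(xd ** 2)


def simulate_slopes(n, reps=2000, seed=0):
    """OLS slope from `reps` independent samples of size n.

    X is held fixed across samples (non-stochastic); only the errors are redrawn."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    slopes = np.empty(reps)
    for r in range(reps):
        y = TRUE_ALPHA + TRUE_BETA * x + rng.normal(0.0, SIGMA, n)
        slopes[r] = ols_slope(x, y)
    return slopes


for n in (10, 100, 1000):
    b = simulate_slopes(n)
    # Unbiased: the mean of the estimates stays close to TRUE_BETA for every n.
    # Consistent: the spread of the estimates shrinks as n grows.
    print(f"N={n:5d}  mean={b.mean():.4f}  variance={b.var():.6f}")
```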
Examples of Estimators
Least/Minimum error
$\min \sum_{i=1}^{N} (Y_i - Y_i^*)$
Least/Minimum absolute error
$\min \sum_{i=1}^{N} \lvert Y_i - Y_i^* \rvert$
Ordinary least squares (OLS)
$\min \sum_{i=1}^{N} (Y_i - Y_i^*)^2$
Weighted least squares (WLS)
$\min \sum_{i=1}^{N} w_i (Y_i - Y_i^*)^2$
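The four criteria can be compared on a toy example. In the sketch below (data and the `criteria` helper are hypothetical), a flat line at the mean of Y is clearly a poor fit, yet its errors sum to exactly zero, the same as a perfect fit. Positive and negative errors cancel in the plain sum, which is why the absolute and squared versions are used in practice.

```python
import numpy as np


def criteria(y, y_pred, w=None):
    """Evaluate the four fitting criteria for predictions Y* (w = optional weights)."""
    e = y - y_pred
    w = np.ones_like(e) if w is None else w
    return {
        "sum_error": e.sum(),               # least/minimum error
        "sum_abs_error": np.abs(e).sum(),   # least/minimum absolute error
        "sse": (e ** 2).sum(),              # OLS criterion
        "weighted_sse": (w * e ** 2).sum(), # WLS criterion
    }


y = np.array([1.0, 2.0, 3.0, 4.0])
flat = np.full_like(y, y.mean())   # poor fit: predicts 2.5 for every observation

print(criteria(y, flat))   # sum_error 0.0, sum_abs_error 4.0, sse 5.0, weighted_sse 5.0
print(criteria(y, y))      # a perfect fit also has sum_error 0.0
```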
Two Variable Linear Regression Model
Model:
$Y_i = \alpha + \beta X_i + \varepsilon_i$
$X_i$ = non-stochastic
$\varepsilon_i$ = stochastic random term (often follows certain distributions)
Error ( ε )
The explanatory variables cannot explain Y perfectly
The error captures everything other than $X_i$ that influences $Y_i$
Reasons
– Simplification of reality
  • e.g. VMT = f(no. of cars, HH income, HH size, HH children, location)
  • Omitted variables: individual tastes, education, lifestyle patterns and many more…
– Measurement errors
  • Privacy issues
  • Poor record keeping, etc.
Error ( ε )
Prediction error: $\varepsilon_i = Y_i - Y_i^*$
$Y_i^*$ = predicted dependent variable $= \alpha + \beta X_i$
Sum of squared errors (SSE): $\sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (Y_i - Y_i^*)^2$
In OLS, we minimize SSE
Two Variable Linear Regression Model
Model:
$Y_i = \alpha + \beta X_i + \varepsilon_i$
$X_i$ = non-stochastic
$\varepsilon_i$ = stochastic random term (often follows certain distributions)
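Solution:
$\beta = \dfrac{\sum_i x_i y_i}{\sum_i x_i^2}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$
$\alpha = \dfrac{\sum_i Y_i}{N} - \beta \dfrac{\sum_i X_i}{N}$
If Y varies a lot when X varies little, β will be large.
In other words, β measures the magnitude of the influence of X on Y: the expected change in Y for a one-unit change in X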
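The closed-form solution can be checked on a small hypothetical dataset. This sketch computes β from deviations about the means, then α from the sample means, then the SSE that OLS minimizes, and cross-checks the coefficients against NumPy's `polyfit` (which returns the slope first for a degree-1 fit).

```python
import numpy as np


def two_variable_ols(X, Y):
    """Closed-form OLS for Y_i = alpha + beta*X_i + e_i."""
    x = X - X.mean()                     # x_i = X_i - mean(X)
    y = Y - Y.mean()                     # y_i = Y_i - mean(Y)
    beta = np.sum(x * y) / np.sum(x ** 2)
    alpha = Y.mean() - beta * X.mean()
    residuals = Y - (alpha + beta * X)   # e_i = Y_i - Y_i*
    sse = np.sum(residuals ** 2)
    return alpha, beta, sse


# Hypothetical data (illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

alpha, beta, sse = two_variable_ols(X, Y)
print(f"alpha={alpha:.2f}, beta={beta:.2f}, SSE={sse:.2f}")  # alpha=2.20, beta=0.60, SSE=2.40

slope, intercept = np.polyfit(X, Y, 1)   # library cross-check
```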
Statistical Significance
How dependable are the estimates?
How significant is X in explaining Y ?
– If we can reject the hypothesis β = 0 at a chosen confidence level, then β* is statistically significant
– The smaller the standard errors (variances) are relative to the coefficients, the more confidence we have in the estimates
How to test?
– Use t-statistics / the t-test
– $t = \dfrac{\beta^* - \beta}{\text{std. error of } \beta^*}$, where β is the hypothesized value (usually 0)
– Compare |t| with $t_{critical}$ (at the 95% or 90% confidence level) with (N − k) degrees of freedom (N = number of observations, k = number of estimated parameters)
  • |t| > $t_{critical}$: statistically significant
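A minimal sketch of the t-test for the slope of a two-variable model, assuming homoscedastic errors so that the error variance is estimated as SSE/(N − k) with k = 2. The data are the same kind of hypothetical toy values as before; `scipy.stats.t.ppf` supplies the two-sided critical value.

```python
import numpy as np
from scipy import stats


def slope_t_test(X, Y, beta_0=0.0, level=0.95):
    """t-test of H0: beta = beta_0 in Y_i = alpha + beta*X_i + e_i."""
    n, k = len(X), 2                        # k = estimated parameters (alpha, beta)
    x = X - X.mean()
    beta = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
    alpha = Y.mean() - beta * X.mean()
    resid = Y - (alpha + beta * X)
    sigma2 = np.sum(resid ** 2) / (n - k)   # estimated error variance
    se_beta = np.sqrt(sigma2 / np.sum(x ** 2))
    t_stat = (beta - beta_0) / se_beta
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - k)  # two-sided critical value
    return beta, se_beta, t_stat, t_crit, abs(t_stat) > t_crit


# Hypothetical data (illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
beta, se, t_stat, t_crit, significant = slope_t_test(X, Y)
print(f"beta={beta:.2f}  se={se:.3f}  t={t_stat:.2f}  t_crit={t_crit:.3f}  significant={significant}")
```

With only N = 5 observations (3 degrees of freedom), t ≈ 2.12 falls short of t_critical ≈ 3.18, so this slope is not significant at 95% despite being clearly positive.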
Goodness-of-Fit
How well the model fits the data
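Measure: $R^2 = 1 - \dfrac{\sum_i \varepsilon_i^2}{\sum_i y_i^2}$, where $y_i = Y_i - \bar{Y}$ (for OLS with an intercept, $0 \le R^2 \le 1$; higher is better)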
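As a quick check of the R² formula (sketch with hypothetical data): in the two-variable model with an intercept, R² equals the squared sample correlation between X and Y.

```python
import numpy as np


def r_squared(Y, Y_pred):
    """R^2 = 1 - sum(e_i^2) / sum(y_i^2), with y_i = Y_i - mean(Y)."""
    e = Y - Y_pred
    y = Y - Y.mean()
    return 1.0 - np.sum(e ** 2) / np.sum(y ** 2)


X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
beta, alpha = np.polyfit(X, Y, 1)       # slope first, then intercept
r2 = r_squared(Y, alpha + beta * X)
r = np.corrcoef(X, Y)[0, 1]
print(round(r2, 4), round(r ** 2, 4))   # 0.6 0.6
```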
Multivariate Linear Regression Model
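$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \varepsilon_i$
In matrix notation:
$Y = X\beta + \varepsilon$
$X = [\,1 \;\; X_{1i} \;\; X_{2i} \;\; X_{3i} \;\; \dots\,]$ (one row per observation; the column of ones carries α)
OLS Solution
$\beta = (X'X)^{-1}(X'Y)$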
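A minimal NumPy sketch of the matrix OLS solution on simulated data. The regressors (fuel cost, number of cars, household size) and the true coefficients are hypothetical. The code solves the normal equations (X'X)β = X'Y, which gives the same β as (X'X)⁻¹X'Y without forming the inverse explicitly, and cross-checks the result with `np.linalg.lstsq`.

```python
import numpy as np


def ols(X_vars, Y):
    """OLS via the normal equations (X'X) b = X'Y.

    A column of ones is prepended, so the first coefficient is the intercept alpha."""
    X = np.column_stack([np.ones(len(Y)), X_vars])
    return np.linalg.solve(X.T @ X, X.T @ Y)


rng = np.random.default_rng(1)
N = 500
# Hypothetical regressors: fuel cost, number of cars, household size.
X_vars = np.column_stack([
    rng.uniform(0.5, 2.0, N),
    rng.integers(0, 4, N),
    rng.integers(1, 6, N),
])
true_beta = np.array([100.0, -40.0, 250.0, 30.0])   # alpha, b_cost, b_car, b_size
Y = true_beta[0] + X_vars @ true_beta[1:] + rng.normal(0, 20, N)

beta_hat = ols(X_vars, Y)
beta_lstsq = np.linalg.lstsq(np.column_stack([np.ones(N), X_vars]), Y, rcond=None)[0]
print(np.round(beta_hat, 1))
```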
Goodness-of-Fit
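R² never decreases (and usually increases) as we add new variables
Measure $\bar{R}^2$ (adjusted R²), which accounts for k (the number of estimated parameters): $\bar{R}^2 = 1 - (1 - R^2)\dfrac{N - 1}{N - k}$
The model with the higher $\bar{R}^2$ has the better goodness-of-fit after accounting for the number of parameters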
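The sketch below illustrates why R̄² is preferred when comparing models with different numbers of variables. It adds a purely random, irrelevant regressor (hypothetical simulated data) and reports both measures: R² cannot fall, while R̄² applies the (N − 1)/(N − k) penalty, where k counts the intercept.

```python
import numpy as np


def fit_r2(X_vars, Y):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(Y)), X_vars])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ beta
    y = Y - Y.mean()
    n, k = X.shape                       # k includes the intercept
    r2 = 1 - np.sum(e ** 2) / np.sum(y ** 2)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, r2_adj


rng = np.random.default_rng(2)
n = 40
x1 = rng.uniform(0, 10, n)
Y = 3 + 2 * x1 + rng.normal(0, 3, n)
noise = rng.normal(size=n)               # irrelevant regressor, unrelated to Y

base = fit_r2(x1, Y)
padded = fit_r2(np.column_stack([x1, noise]), Y)
print(f"x1 only:     R2={base[0]:.4f}  adj R2={base[1]:.4f}")
print(f"x1 + noise:  R2={padded[0]:.4f}  adj R2={padded[1]:.4f}")
```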
Example: Chicago Trip Generation
Dependent variable:
– average trips per occupied dwelling unit
Independent variables
– average car ownership
– average household size
– three zonal social indices
Assignment
Variations you can try:
– Add other variables
– Use interaction terms
– Use logs of variables
– Piecewise linear formulation
Evaluation criteria:
– Coefficients with the expected (correct) signs
– Improvement in goodness-of-fit
– Statistical significance (t-test)
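The variations listed for the assignment all produce extra columns that keep the model linear in the parameters, so OLS still applies. A hypothetical helper for building them might look like this; the variable names, the use of income in $1000s, and the income breakpoint `knot` are assumptions for illustration. In the piecewise term, the slope on income is β_inc below the knot and β_inc + β_extra above it.

```python
import numpy as np


def build_features(cars, income, hh_size, knot=50.0):
    """Assemble hypothetical regressors for a VMT model.

    cars, income (in $1000s, must be > 0) and hh_size are 1-D arrays;
    `knot` is an assumed income breakpoint for the piecewise-linear term."""
    return np.column_stack([
        cars,
        income,
        hh_size,
        cars * hh_size,                  # interaction term
        np.log(income),                  # log transform
        np.maximum(income - knot, 0.0),  # extra income slope above the knot
    ])


F = build_features(np.array([2, 1]), np.array([90.0, 40.0]), np.array([4, 3]))
print(F.round(3))
```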
Assumptions of Classical LR Model
1. The relationship between X and Y is linear
2. X is non-stochastic, and no exact linear relationship exists between two or more independent variables
3. The error has zero expected value (errors cancel out on average): $E(\varepsilon) = 0$
4. The error has constant variance for all observations: $E(\varepsilon^2) = \sigma^2$
5. There is no correlation among errors: $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \neq j$
Gauss Markov Theorem

If assumptions 1 to 5 are fulfilled, OLS is BLUE
– Best (smallest variance among all linear unbiased estimators)
– Linear
– Unbiased
– Estimator
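One way to see the "Best" in BLUE is to compare OLS with another linear unbiased estimator. This Monte Carlo sketch uses, purely as an illustration, the slope through the first and last observations, (Y_N − Y_1)/(X_N − X_1). It is linear in Y and unbiased under assumptions 1 to 5, but Gauss-Markov says its variance cannot be smaller than that of OLS. The true α, β and σ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, sigma = 1.0, 0.5, 1.0    # hypothetical true values
X = np.linspace(0, 10, 20)            # fixed (non-stochastic) regressor
x = X - X.mean()

ols_est, endpoint_est = [], []
for _ in range(5000):
    Y = alpha + beta * X + rng.normal(0, sigma, X.size)   # assumptions 1-5 hold
    # OLS slope; sum(x) = 0, so sum(x*Y) equals sum(x*(Y - mean(Y))).
    ols_est.append(np.sum(x * Y) / np.sum(x ** 2))
    # Alternative linear unbiased estimator: slope through the endpoints.
    endpoint_est.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

ols_est, endpoint_est = np.array(ols_est), np.array(endpoint_est)
print(f"OLS:      mean={ols_est.mean():.3f}  var={ols_est.var():.5f}")
print(f"Endpoint: mean={endpoint_est.mean():.3f}  var={endpoint_est.var():.5f}")
```

Both estimators center on the true β, but the OLS estimates have roughly a quarter of the variance in this setup.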
Violation 1: Collinearity
Types
– Perfect correlation
– Other high interdependence: multicollinearity
Examples
– e.g. GPA = f(X1, X2, X3, X4, X5)
  • X1 = parents' education level
  • X2 = average hours of study per day
  • X3 = average hours of study per week
  • X4 = parents' income
  • X5 = school
– X2 and X3 are perfectly collinear (X3 = 7 × X2)
– X1 and X4 can be multicollinear
Violation 1: Collinearity (cont.)
Effect:
– Perfect:
  • Cannot be estimated
– Multicollinear:
  • Difficult to interpret
  • Affects statistical significance (inflated standard errors)
Solution:
– Drop one variable
– Caution: dropping a relevant variable may result in omitted-variable bias
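The GPA example's perfect collinearity can be reproduced with simulated (hypothetical) data. When hours per week is exactly 7 × hours per day, the design matrix X loses a rank, so X'X (which has the same rank as X) is singular and (X'X)⁻¹ does not exist. Dropping one of the two variables restores full rank. `np.linalg.matrix_rank` is applied to X itself, which is numerically better conditioned than X'X.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
hours_day = rng.uniform(1, 5, n)       # X2: average hours of study per day
hours_week = 7 * hours_day             # X3: exactly 7 * X2, perfect collinearity
parents_income = rng.uniform(20, 150, n)

X_perfect = np.column_stack([np.ones(n), hours_day, hours_week, parents_income])
X_dropped = np.column_stack([np.ones(n), hours_day, parents_income])

for name, X in [("with X3", X_perfect), ("X3 dropped", X_dropped)]:
    rank = np.linalg.matrix_rank(X)
    status = "full rank" if rank == X.shape[1] else "rank deficient: X'X is singular"
    print(f"{name}: {X.shape[1]} columns, rank {rank} ({status})")
```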
Violation 2: Heteroscedasticity
Homoscedastic = constant variance
Heteroscedastic = variance not constant
Example:
– Large firm: bigger errors
– Larger TAZ: bigger errors
Effect: Estimators unbiased but inefficient

Solution: Weighted least square (WLS)
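A minimal WLS sketch for the larger-TAZ example, assuming (hypothetically) that the error standard deviation grows in proportion to zone size and that these variances are known. In practice they would have to be estimated. Weighting each squared error by 1/σᵢ² is equivalent to multiplying each row of X and Y by 1/σᵢ and running ordinary least squares. With equal weights the function reduces to OLS.

```python
import numpy as np


def wls(X_vars, Y, w):
    """Weighted least squares: minimise sum_i w_i (Y_i - Y_i*)^2.

    Implemented by scaling each row by sqrt(w_i) and solving ordinary least squares."""
    X = np.column_stack([np.ones(len(Y)), X_vars])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)
    return beta


rng = np.random.default_rng(5)
n = 2000
zone_size = rng.uniform(1, 20, n)             # hypothetical TAZ size
sd = 0.5 * zone_size                          # assumed: error s.d. grows with size
Y = 10 + 3 * zone_size + rng.normal(0, sd)    # heteroscedastic errors

beta_wls = wls(zone_size, Y, w=1 / sd ** 2)   # weights = 1 / error variance
beta_ols = wls(zone_size, Y, w=np.ones(n))    # equal weights reduce to OLS
print("WLS:", beta_wls.round(3), " OLS:", beta_ols.round(3))
```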
Violation 3: Serial correlation
Can occur in both cross-section and time-series data
Can be positive or negative
– e.g. Positive correlation: incorrect mileage reading
– Negative correlation: mileage data taken in Jan 2009 instead of Dec 2008, causing overestimation of 2008 VMT and underestimation of 2009 VMT
Effect: Estimators unbiased but inefficient
Solution:
– Prais-Winsten, Cochrane-Orcutt, Durbin's method
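Of the listed remedies, Cochrane-Orcutt is the simplest to sketch. Assuming first-order autoregressive errors εₜ = ρεₜ₋₁ + uₜ, it alternates between estimating ρ from the residuals and re-running OLS on quasi-differenced data, Yₜ − ρYₜ₋₁ = α(1 − ρ) + β(Xₜ − ρXₜ₋₁) + uₜ, which drops the first observation (Prais-Winsten instead keeps it, rescaled by √(1 − ρ²)). The simulated data and true values below are hypothetical.

```python
import numpy as np


def ols(X, Y):
    return np.linalg.lstsq(X, Y, rcond=None)[0]


def cochrane_orcutt(x, Y, n_iter=20, tol=1e-8):
    """Iterative Cochrane-Orcutt for Y_t = a + b*x_t + e_t, e_t = rho*e_{t-1} + u_t."""
    a, b = ols(np.column_stack([np.ones_like(x), x]), Y)   # step 0: plain OLS
    rho = 0.0
    for _ in range(n_iter):
        e = Y - (a + b * x)                                  # residuals, original scale
        rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
        # Quasi-difference the data (first observation is lost).
        Y_star = Y[1:] - rho_new * Y[:-1]
        x_star = x[1:] - rho_new * x[:-1]
        a_star, b = ols(np.column_stack([np.ones_like(x_star), x_star]), Y_star)
        a = a_star / (1 - rho_new)                           # intercept was a*(1 - rho)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return a, b, rho


# Simulated series with AR(1) errors (hypothetical values).
rng = np.random.default_rng(6)
T, rho_true = 2000, 0.7
x = rng.normal(0, 1, T)
u = rng.normal(0, 1, T)
e = np.empty(T)
e[0] = u[0] / np.sqrt(1 - rho_true ** 2)
for t in range(1, T):
    e[t] = rho_true * e[t - 1] + u[t]
Y = 5 + 2 * x + e

a_hat, b_hat, rho_hat = cochrane_orcutt(x, Y)
print(f"a={a_hat:.2f}, b={b_hat:.2f}, rho={rho_hat:.2f}")
```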
