Transportation Modelling
Lecture 2
[Figure: indifference curves u1, u2, u3; axes: auto and bus]
Outline
Data
Modeling Principles
– Assumptions
– Estimates
– Statistical Tests
– Potential data problems
Data
Cross-section
– Activities of individual persons, firms or other units at a single point in time
  • One observation per individual
Time series
– Movement of a variable over time
  • Annual, quarterly, monthly, weekly observations, etc.
  • Mostly used for observations aggregated at the national or regional level
Pooled/panel
– Combination of time series and cross-section
– Behavior of individual persons, firms or other units over time
Examples
Cross-sectional data
Vehicle Miles Travelled in 2008
ID  VMT   No. of cars in HH  HH income  No. of HH members  No. of children in HH  TAZ
1   1000  2                  >80k       4                  2                      101
2   1200  2                  30-50k     3                  1                      101
3   600   1                  30-50k     2                  0                      104
:
:
Examples
Time series data
Vehicle Miles Travelled between 2000 and 2008
Year  Avg fuel price/L  Average VMT/person  Average car ownership
2000  20                500                 0.01
2001  22                575                 0.02
2002  25                -                   0.04
:     :                 :                   :
:     :                 :                   :
2006  45                800                 0.06
2007  50                900                 0.06
2008  77                1000                0.06
Examples
Pooled/Panel data
Vehicle Miles Travelled between 2006 and 2008
ID  Year  Avg fuel price/L  VMT   No. of cars in HH  HH income  No. of HH members  No. of children in HH  TAZ
1   2006  30                800   2                  >80k       4                  2                      101
1   2007  50                900   2                  >80k       4                  2                      101
1   2008  77                1000  2                  >80k       4                  2                      101
2   2006  30                1000  2                  30-50k     3                  1                      101
2   2007  50                1500  2                  30-50k     3                  1                      101
2   2008  77                1200  2                  30-50k     3                  1                      101
:
:
Examples
Pseudo panel data
Vehicle Miles Travelled between 2006 and 2008
ID  Year  Avg fuel price/L  VMT   No. of cars in HH  HH income  No. of HH members  No. of children in HH  TAZ
1   2006  30                800   1                  >80k       4                  1                      101
2   2007  50                900   2                  50-80k     3                  2                      104
3   2008  77                1000  2                  >80k       4                  2                      101
4   2006  30                1000  3                  30-50k     3                  2                      108
5   2007  50                1500  1                  30-50k     2                  1                      101
6   2008  77                1200  1                  50-80k     3                  1                      101
:
:
Modeling Principles
Hypothesis:
– Example: VMT= f (fuel cost, no of cars, hh income, hh size)
Linear relationship:
– Example: $VMT = \alpha + \beta_{cost}\,cost + \beta_{car}\,carNo + \beta_{inc}\,hhInc + \beta_{size}\,hhSize$
Non-linear relationship:
– Example: $VMT = \alpha + \beta_{car}\,carNo + \dfrac{\beta_{cost}\,cost}{1 + \beta_{inc}\,hhInc}$
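The two hypothesised forms can be written as plain functions to see how they behave differently. A minimal sketch, assuming made-up coefficient values (ALPHA, B_COST, B_CAR, B_INC, B_SIZE and B_INC_NL are hypothetical, not estimates) and a numeric household income rather than the income bands used in the data tables:

```python
# Hypothetical coefficient values, chosen only to illustrate the two functional forms.
ALPHA = 100.0
B_COST = -5.0        # VMT falls as fuel cost rises
B_CAR = 300.0        # VMT rises with the number of cars
B_INC = 0.002        # linear model: VMT per unit of household income
B_SIZE = 50.0        # VMT per household member
B_INC_NL = 0.00001   # non-linear model: income dampens the cost effect


def vmt_linear(cost, car_no, hh_inc, hh_size):
    """VMT = alpha + b_cost*cost + b_car*carNo + b_inc*hhInc + b_size*hhSize"""
    return ALPHA + B_COST * cost + B_CAR * car_no + B_INC * hh_inc + B_SIZE * hh_size


def vmt_nonlinear(cost, car_no, hh_inc):
    """VMT = alpha + b_car*carNo + b_cost*cost / (1 + b_inc*hhInc)"""
    return ALPHA + B_CAR * car_no + (B_COST * cost) / (1.0 + B_INC_NL * hh_inc)


# Effect of raising cost from 0 to 30 in the non-linear form, at two income levels.
cost_effect_low_income = vmt_nonlinear(30, 2, 0) - vmt_nonlinear(0, 2, 0)
cost_effect_high_income = vmt_nonlinear(30, 2, 200000) - vmt_nonlinear(0, 2, 200000)
print(vmt_linear(30, 2, 60000, 4), cost_effect_low_income, cost_effect_high_income)
```

In the linear form a one-unit rise in cost always changes VMT by B_COST; in the non-linear form the same cost change matters less for high-income households.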
Estimators
Our interest: the population
Available: one or more samples drawn from the population
– Use the sample information to obtain the best possible estimates
Estimator: a rule that gives a reasonable estimate for each and every possible sample
– Estimators are rules
– Estimates are the numbers produced by an estimator
Desirable properties
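– Unbiasedness: on average the estimate equals the population value, E(β*) = β
– Efficiency: the smallest variance among unbiased estimators
– Consistency (a large-sample property)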
Desirable Properties
– Consistent: as N increases, $\beta^* \rightarrow \beta_{population}$
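Consistency can be illustrated with a small simulation. This sketch assumes hypothetical population values (α = 1, β = 2, normal errors with σ = 5, X drawn uniformly on [0, 10]) and estimates β by least squares from samples of increasing size N; the printed estimates typically move closer to β = 2 as N grows, although a single run need not improve at every step.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_estimate(n, alpha=1.0, beta=2.0, sigma=5.0, seed=0):
    """Draw one sample of size n from Y = alpha + beta*X + e and estimate beta."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
    return ols_slope(xs, ys)


for n in (10, 100, 1_000, 100_000):
    print(n, round(slope_estimate(n), 3))
```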
Examples of Estimators
Least squares: minimise the sum of squared errors
$\min \sum_{i=1}^{N} (Y_i - Y_i^*)^2$
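The least-squares principle can be checked by brute force: compute the sum of squared errors for many candidate lines and keep the one with the smallest value. A minimal sketch with made-up data and a hypothetical grid of α and β values; in practice the minimum is found analytically rather than by search.

```python
def sse(alpha, beta, xs, ys):
    """Sum of squared prediction errors for the line Y* = alpha + beta * X."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Brute-force search over a grid of candidate lines.
alphas = [-1.0 + 0.05 * i for i in range(41)]   # -1.00 ... 1.00
betas = [1.5 + 0.01 * j for j in range(101)]    # 1.50 ... 2.50
best_alpha, best_beta = min(
    ((a, b) for a in alphas for b in betas),
    key=lambda ab: sse(ab[0], ab[1], xs, ys),
)
print(round(best_alpha, 2), round(best_beta, 2))  # 0.05 1.99
```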
Two Variable Linear Regression Model
Model:
$Y_i = \alpha + \beta X_i + \varepsilon_i$
$X_i$ = non-stochastic
$\varepsilon_i$ = stochastic random term
(often assumed to follow a particular distribution)
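The roles of the two terms can be made concrete by simulating the model. This sketch assumes hypothetical values α = 1, β = 2 and normally distributed errors with σ = 0.5: the X values stay fixed across samples and only ε changes, so Y differs from sample to sample while its average at each X settles near α + βX.

```python
import random

# Hypothetical population values for Y_i = alpha + beta * X_i + e_i.
ALPHA, BETA, SIGMA = 1.0, 2.0, 0.5
X = [0.0, 1.0, 2.0, 3.0, 4.0]   # non-stochastic: the same X values in every sample


def draw_sample(rng):
    """One sample of Y; only the error term is random (normal errors assumed here)."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in X]


rng = random.Random(42)
samples = [draw_sample(rng) for _ in range(20_000)]

# Averaged over many samples, Y at each fixed X settles near alpha + beta * X.
mean_y = [sum(s[i] for s in samples) / len(samples) for i in range(len(X))]
print([round(m, 2) for m in mean_y])
```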
Error ( ε )
Explanatory variables cannot explain Y perfectly
The error captures everything other than Xi that influences Yi
Reasons
– Simplification of reality
  • e.g. VMT = f(no. of cars, hh income, hh size, hh children, location)
  • Omitted variables: individual tastes, education, lifestyle patterns and many more
– Measurement errors
  • Privacy issues
  • Poor record keeping, etc.
Error ( ε )
Prediction error: $\varepsilon_i = Y_i - Y_i^*$, where $Y_i^*$ is the model's predicted value of $Y_i$
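Solution (ordinary least squares):
$\beta^* = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$
$\alpha^* = \dfrac{\sum_i Y_i}{N} - \beta^* \dfrac{\sum_i X_i}{N}$
β measures the magnitude of the influence of X on Y: if Y varies a lot when X varies little, β will be large in magnitude.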
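The closed-form solution translates directly into code. A minimal sketch in plain Python with made-up data; `fit_two_variable` is a hypothetical helper name.

```python
def fit_two_variable(xs, ys):
    """OLS estimates (alpha*, beta*) for Y_i = alpha + beta * X_i + e_i."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # beta* = sum((X - X_bar)(Y - Y_bar)) / sum((X - X_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    # alpha* = sum(Y)/N - beta* * sum(X)/N
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Made-up illustrative data (not from the lecture).
alpha_hat, beta_hat = fit_two_variable([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(alpha_hat, 2), round(beta_hat, 2))  # 0.05 1.99
```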
Statistical Significance
How dependable are the estimates?
How significant is X in explaining Y ?
– If we can reject the hypothesis that β = 0 at a chosen level of confidence, β* is statistically significant
– The smaller the standard errors (variances) are relative to the
coefficients, the more confidence we have in the estimates
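How to test?
– Use the t-statistic (t-test)
– $t = \dfrac{\beta^* - \beta_0}{\text{std. error of } \beta^*}$, where $\beta_0$ is the hypothesised value (usually 0)
– Compare |t| with $t_{critical}$ (at the 95% or 90% level of confidence) at (N − k) degrees of freedom (N = number of observations, k = number of estimated parameters)
  • |t| > $t_{critical}$: statistically significant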
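For the two-variable model, the standard error of β* is $\sqrt{s^2 / \sum_i (X_i - \bar{X})^2}$ with $s^2 = \sum_i \varepsilon_i^2 / (N - k)$ and k = 2 (this assumes constant error variance). The sketch below computes the t-statistic for H0: β = 0 on made-up data; the critical value is passed in rather than computed (3.182 is the two-sided 95% value for 3 degrees of freedom), and `slope_t_test` is a hypothetical helper.

```python
import math


def slope_t_test(xs, ys, t_critical, beta_0=0.0):
    """t-statistic for the OLS slope of a two-variable model (k = 2 parameters)."""
    n, k = len(xs), 2
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    alpha = y_bar - beta * x_bar
    # Residual variance s^2 = sum(e_i^2) / (N - k)
    s2 = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys)) / (n - k)
    std_err = math.sqrt(s2 / sxx)          # standard error of beta*
    t_stat = (beta - beta_0) / std_err
    return beta, std_err, t_stat, abs(t_stat) > t_critical


# Made-up data; 3.182 is the two-sided 95% critical value for N - k = 3 dof.
beta, se, t, significant = slope_t_test(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], t_critical=3.182
)
print(round(beta, 2), round(se, 4), round(t, 1), significant)  # 1.99 0.0597 33.3 True
```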
Goodness-of-Fit
How well the model fits the data
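Measure: $R^2 = 1 - \dfrac{\sum_i \varepsilon_i^2}{\sum_i (Y_i - \bar{Y})^2}$ (the share of the variation of Y around its mean that the model explains)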
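R² compares the unexplained variation (sum of squared errors) with the total variation of Y around its mean. A minimal sketch with made-up data: the least-squares line for these data (α* = 0.05, β* = 1.99) gives an R² close to 1, while a "model" that always predicts the mean gives 0.

```python
def r_squared(ys, fitted):
    """R^2 = 1 - sum(e_i^2) / sum((Y_i - Y_bar)^2)."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation around the mean
    return 1.0 - sse / sst


ys = [2.1, 3.9, 6.2, 7.8, 10.1]                    # made-up data
line = [0.05 + 1.99 * x for x in [1, 2, 3, 4, 5]]  # least-squares line for these data
flat = [sum(ys) / len(ys)] * len(ys)               # always predicts the mean
print(round(r_squared(ys, line), 4), round(r_squared(ys, flat), 4))  # 0.9973 0.0
```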
Multivariate Linear Regression Model
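$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \varepsilon_i$
In matrix notation:
$Y = X\beta + \varepsilon$
where row i of X is $[1 \;\; X_{1i} \;\; X_{2i} \;\; X_{3i} \;\; \dots]$ and $\beta = [\alpha \;\; \beta_1 \;\; \beta_2 \;\; \beta_3 \;\; \dots]'$
OLS solution:
$\beta^* = (X'X)^{-1} X'Y$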
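The matrix solution can be sketched with NumPy (an assumption: the lecture does not prescribe a tool). The function prepends a column of ones so that the first element of β* is α, and solves the normal equations (X'X)β* = X'Y instead of inverting X'X explicitly; `numpy.linalg.lstsq` is a more numerically robust alternative. The data are made up and generated without an error term, so the true coefficients should be recovered.

```python
import numpy as np


def ols(x_vars, y):
    """beta* = (X'X)^(-1) X'Y, with a column of ones prepended so beta*[0] is alpha."""
    x_vars = np.asarray(x_vars, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x_vars])
    # Solving (X'X) b = X'Y avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Made-up data generated exactly from Y = 1 + 2*X1 - 3*X2 (no error term).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
print(np.round(ols(np.column_stack([x1, x2]), y), 6))
```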
Goodness-of-Fit
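R² never decreases (and usually increases) as we add new variables
Measure $\bar{R}^2$ (adjusted R²), which accounts for k (number of estimated parameters):
$\bar{R}^2 = 1 - (1 - R^2)\dfrac{N - 1}{N - k}$
Between competing models, the one with the higher $\bar{R}^2$ has the better goodness-of-fit, even if the models have different numbers of parameters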
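Adjusted R² penalises extra parameters: $\bar{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-k}$. In this hypothetical comparison (N = 30, made-up R² values), adding a fourth parameter raises R² slightly but lowers R̄², so the smaller model would be preferred.

```python
def adjusted_r2(r2, n, k):
    """R-bar^2 = 1 - (1 - R^2) * (N - 1) / (N - k); k counts all estimated parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Hypothetical comparison with N = 30 observations:
base = adjusted_r2(0.800, n=30, k=3)      # alpha + 2 slopes
extended = adjusted_r2(0.801, n=30, k=4)  # one more variable, R^2 barely rises
print(round(base, 4), round(extended, 4))  # 0.7852 0.778
```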
Example: Chicago Trip Generation
Dependent variable:
– average trips per occupied dwelling unit
Independent variables
– average car ownership
– average household size
– three zonal social indices
Assignment
Variations you can try:
– Add other variables
– Use interaction terms
– Use log transformations of variables
– Piecewise linear formulation
Evaluation criteria:
– Correct (expected) signs of the coefficients
– Improvement in goodness-of-fit
– Significance of the coefficients (t-test)
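The suggested variations all amount to building new columns from existing variables before fitting. A minimal sketch; the variable names, the use of a numeric income (instead of the income bands in the data tables) and the 50,000 knot for the piecewise linear term are all hypothetical choices.

```python
import math


def make_features(cars, income, hh_size, knot=50_000.0):
    """Candidate explanatory variables for one household (names and knot are hypothetical)."""
    return {
        "cars": cars,
        "log_income": math.log(income),        # log transform; income must be > 0
        "cars_x_size": cars * hh_size,         # interaction term
        # Piecewise linear income: separate slopes below and above the knot.
        "income_below_knot": min(income, knot),
        "income_above_knot": max(income - knot, 0.0),
    }


print(make_features(cars=2, income=80_000, hh_size=4))
```

Each key becomes one column of X; the piecewise pair lets income have a different slope above the knot than below it.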
Assumptions of Classical LR Model
Violation 1: Collinearity
Types
– Perfect correlation
– Other high interdependence: multicollinearity
Examples
– e.g. GPA = f(X1, X2, X3, X4, X5)
  • X1 = parents' education level
  • X2 = average hours of study per day
  • X3 = average hours of study per week
  • X4 = parents' income
  • X5 = school
– X2 and X3 are perfectly collinear (X3 = 7 × X2)
– X1 and X4 may be highly correlated (multicollinearity)
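Perfect collinearity means one column of X is an exact linear combination of others, so X'X is singular and (X'X)⁻¹ in the OLS solution does not exist. A minimal NumPy sketch with hypothetical study-hours data, using X3 = 7 × X2:

```python
import numpy as np

# Hypothetical study-hours data for five students.
hours_per_day = np.array([2.0, 3.5, 1.0, 4.0, 2.5])   # X2
hours_per_week = 7.0 * hours_per_day                  # X3: exactly 7 * X2

# Design matrix with a constant, X2 and X3.
X = np.column_stack([np.ones(5), hours_per_day, hours_per_week])

# Rank below the number of columns means X'X cannot be inverted.
rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1])  # 2 3
```

Dropping either X2 or X3 restores full rank; with near (rather than perfect) collinearity, such as X1 and X4, X'X can be inverted but the estimates become imprecise.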
Violation 2: Heteroscedasticity
Homoscedastic: constant error variance across observations
Heteroscedastic: error variance is not constant
Example:
– Large firms: bigger errors than small firms
– Larger TAZs: bigger errors than smaller TAZs
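Heteroscedastic errors can be simulated by letting the error standard deviation depend on the size of the unit. This sketch assumes, purely for illustration, normal errors whose standard deviation is proportional to a hypothetical TAZ size measure (0.1 × size); the sample variance for the large zones should come out roughly 100 times that of the small zones, whereas homoscedastic errors would give similar variances.

```python
import random
import statistics


def simulate_errors(size, n, rng, scale=0.1):
    """Errors whose standard deviation grows with unit size (assumed form: scale * size)."""
    return [rng.gauss(0.0, scale * size) for _ in range(n)]


rng = random.Random(7)
small_taz = simulate_errors(size=10, n=5_000, rng=rng)    # sd = 1
large_taz = simulate_errors(size=100, n=5_000, rng=rng)   # sd = 10

# Under homoscedasticity these two variances would be roughly equal.
print(round(statistics.variance(small_taz), 1), round(statistics.variance(large_taz), 1))
```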