Regression by MS Excel 1 02
[Figure: two scatter plots, each with x-axis "AW Distance from MTR Station (m)" from 0 to 1000 and y-axis from 0 to 12000]
Categorical Approach? 15
In some facility cases (e.g., a highway interchange), the spatial relationship
cannot be simply linear (e.g., accessibility benefits are cancelled out by
air pollution and noise near the facility).
In such a complex spatial case, we may try a categorical
approach (distance-band dummies, 1/0).
[Figure: Y (Price) against X (Distance), split into bands at 100, 200, 300, 400, and 500; a single line is not a good fit]
Generate Band Dummy (1/0) 16
Select all parts and sort the dataset by Actual Walking Distance, from small to large.
Generate Band Dummy (1/0) 17
Add a new column for the new variable “MTR
Distance within 100 m dummy”
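The same band-dummy step can be sketched outside Excel. The following is a minimal Python illustration, not the procedure from the slides: the distances are invented, and it assumes mutually exclusive 100 m bands up to 500 m, with anything farther away acting as the reference category. The names `BAND_EDGES` and `band_dummies` are hypothetical.

```python
# Hypothetical sample: actual walking distances (m) to the nearest MTR station.
distances_m = [430, 85, 950, 120, 270, 60, 510, 340]

# Assumed upper edges of the distance bands, in meters.
BAND_EDGES = [100, 200, 300, 400, 500]


def band_dummies(distance_m):
    """Return one 1/0 dummy per band; a property falls in at most one band.

    Bands are [0, 100], (100, 200], ..., (400, 500]. Anything beyond 500 m
    gets all zeros and acts as the reference category (an assumption).
    """
    dummies = []
    lower = float("-inf")
    for upper in BAND_EDGES:
        dummies.append(1 if lower < distance_m <= upper else 0)
        lower = upper
    return dummies


# Sort from small to large, as in the Excel steps, then attach the dummies.
rows = [(d, band_dummies(d)) for d in sorted(distances_m)]
for d, dummies in rows:
    print(d, dummies)
```

The first element of each list corresponds to the "MTR Distance within 100 m dummy" column; the remaining elements would be the additional band columns.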
Insufficient
Regression with Band Dummy 21
Finalized Model
<0.01
Make a graph
Regression with Band Dummy 22
Access benefits are highly localized within 100 m?
Band dummies can be incorporated into semi‐log and full‐log functions too. You may also consider incremental bands (e.g., by 50 meters), but it should not be too …
[Figure: Housing Price Premium (HKD/sq ft) by distance band: Within 100m, Within 200m, Within 300m, Within 400m, Within 500m, Between 500‐1000m; y-axis from ‐1000 to 1200]
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Simple Example 25
$P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
X1 (sq ft)
P‐X2 Relationship 27
[Figure: P ($/sq ft) against X2 (1/0), where 0 = no view and 1 = ocean view]
P‐X1‐X2 Relationship 28
[Figure: P ($/sq ft) against X1, with one line for X2 = 1 and one for X2 = 0]
[Diagram: X1, X2, P, and the error term e]
Interaction means… 31
HK$20,000 HK$5,000
HK$30,000
How to write? 1 32
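[Figure: P ($/sq ft) against X1, with one line for X2 = 1 and one for X2 = 0]
$P = \beta_0 + \beta'_1 X_1 + \beta_2 X_2 + \varepsilon$
$\beta'_1 = \beta_1 + \beta_3 X_2$
$P = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2 + \varepsilon$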
How to write? 2 34
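$P = \beta_0 + \beta_1 X_1 + \beta_3 X_2 X_1 + \beta_2 X_2 + \varepsilon$
$P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$
Interaction effect: $\beta_3 X_1 X_2$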
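A small numeric check may help show what the interaction term does to the slope of X1. This is only a sketch: the coefficient values are hypothetical, chosen for easy arithmetic, and are not estimates from any dataset; `predict` and `slope_of_x1` are illustrative names.

```python
# Hypothetical coefficients, chosen only for illustration (not estimates).
B0, B1, B2, B3 = 1000.0, 2.0, 500.0, 1.5


def predict(x1, x2):
    """P = b0 + b1*X1 + b2*X2 + b3*X1*X2, with the error term set to zero."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_of_x1(x2):
    """Slope of P with respect to X1, which depends on X2: b1 + b3*X2."""
    return B1 + B3 * x2


for x2 in (0, 1):
    # Raising X1 by one unit changes P by exactly the slope for that X2.
    change = predict(501, x2) - predict(500, x2)
    print(f"X2={x2}: slope={slope_of_x1(x2)}, change={change}")
```

With the dummy switched on, the slope of X1 moves from b1 to b1 + b3, which is the interaction effect written in either form above.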
Extensions 35
Technically speaking, interactions
can be
• Among more than two variables (e.g., $\beta_3 X_1 X_2 X_3$)
• Between numeric and numeric variables
• Between dummy and dummy variables
• Used in linear, semi‐log, and full‐log forms
• Non‐linear
But do not include too many interactions, or ones that are unreasonable or too
complex
Possible Combinations 36
Another Example 37
$P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$
P: Property Price ($/sq ft)
X1: Distance from MTR (m)
X2: Public-Private Coordination (1/0)
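Substituting the two possible values of the coordination dummy into this model may make the role of each coefficient easier to see. The following is a worked sketch using only the symbols defined above:

```latex
\begin{align*}
X_2 = 0:&\quad P = \beta_0 + \beta_1 X_1 + \varepsilon \\
X_2 = 1:&\quad P = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1 + \varepsilon
\end{align*}
```

Under this reading, $\beta_2$ shifts the intercept for sites with coordination, and $\beta_3$ is the difference in the distance slope between sites with and without coordination.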
Another Example 38
Descriptive
Relational
Causal
Y= aX + b
Dependent Independent
Causality is difficult 41
Direct causal relationship: X → Y
Indirect causal relationship: X → Z → Y
Spurious relationship: Z → X and Z → Y
Bidirectional causal relationship: X ↔ Y
Moderated causal relationship: Z affects the X → Y link
Unobserved relationship: X, Y
Dynamic & Complex 42
Panel vs. Cross‐Sectional 43
Panel > Cross‐sectional
Advantages
• You can follow individual changes
• You can infer causal relations more accurately
• You can test difference‐in‐differences more widely
• You can take into account unobserved individual differences
(as fixed effects)
• You can increase the number of cases in your modeling
Disadvantages
• Data collection may be more difficult and time‐consuming
• Analysis requires more careful attention and more advanced
techniques
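As a rough illustration of the difference‐in‐differences idea mentioned in the advantages, the sketch below computes the basic two-group, two-period comparison. The prices are invented for the example, and the function name `diff_in_diff` is hypothetical; a real analysis would estimate this within a regression model.

```python
# Hypothetical average prices ($/sq ft), invented purely for illustration.
prices = {
    ("treated", "before"): 9000.0,
    ("treated", "after"): 10500.0,
    ("control", "before"): 8000.0,
    ("control", "after"): 8700.0,
}


def diff_in_diff(p):
    """(treated after - treated before) - (control after - control before)."""
    treated_change = p[("treated", "after")] - p[("treated", "before")]
    control_change = p[("control", "after")] - p[("control", "before")]
    return treated_change - control_change


print(diff_in_diff(prices))
```

The comparison needs observations of the same groups before and after the intervention, which is why panel data make it easier to apply.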
Panel Data 44
Unit of Analysis = District (N = 3)
[Figure: Housing Price for District A, District B, and District C]

District    Year  P_it
District A  1996  P_a1996
District B  1996  P_b1996
District C  1996  P_c1996
District A  2001  P_a2001
District B  2001  P_b2001
District C  2001  P_c2001
District A  2006  P_a2006
District B  2006  P_b2006
District C  2006  P_c2006
District A  2011  P_a2011
District B  2011  P_b2011
District C  2011  P_c2011
“Long Format”
Panel Data Organization 3 47
District    Year  P_it     X_it1  X_it2
District A  1996  P_a1996
District A  2001  P_a2001
District A  2006  P_a2006
District A  2011  P_a2011
District B  1996  P_b1996
District B  2001  P_b2001
District B  2006  P_b2006
District B  2011  P_b2011
District C  1996  P_c1996
District C  2001  P_c2001
District C  2006  P_c2006
District C  2011  P_c2011
“Long Format”
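If the data start out with one row per district and one column per year, they need to be reshaped into the long format shown above. The following is a minimal sketch of that reshaping in plain Python; the price values are placeholders rather than real data, and `to_long` is a hypothetical helper name.

```python
# Hypothetical wide-format data: one entry per district, one price per year.
# The numbers are placeholders, not real housing prices.
wide = {
    "District A": {1996: 100, 2001: 110, 2006: 125, 2011: 150},
    "District B": {1996: 90, 2001: 95, 2006: 120, 2011: 140},
    "District C": {1996: 80, 2001: 85, 2006: 90, 2011: 115},
}


def to_long(wide_table):
    """Return one (district, year, price) row per district-year pair."""
    rows = []
    for district in sorted(wide_table):
        for year in sorted(wide_table[district]):
            rows.append((district, year, wide_table[district][year]))
    return rows


long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

The rows come out sorted by district and then by year, matching the ordering of the table above.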
Balanced Panel Data 48
Sampling matters
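A panel is usually called balanced when every unit is observed in every period. As a minimal sketch of how that could be checked on long-format rows (the function name `is_balanced` and the sample rows are assumptions made for this example):

```python
def is_balanced(rows):
    """True if every unit appears exactly once in every observed period.

    `rows` is a list of (unit, period) pairs from a long-format table.
    """
    units = {unit for unit, _ in rows}
    periods = {period for _, period in rows}
    return len(rows) == len(set(rows)) == len(units) * len(periods)


years = (1996, 2001, 2006, 2011)
balanced = [(d, y) for d in ("A", "B", "C") for y in years]
# Drop one district-year observation to make the panel unbalanced.
unbalanced = [row for row in balanced if row != ("B", 2001)]

print(is_balanced(balanced), is_balanced(unbalanced))
```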
Interventions 49
District    Year  P_it     X_it1  X_it2  Intervention (1/0)
District A  1996  P_a1996                0
District B  1996  P_b1996                0
District C  1996  P_c1996                0
District A  2001  P_a2001                0
District B  2001  P_b2001                0
District C  2001  P_c2001                0
District A  2006  P_a2006                0
District B  2006  P_b2006                1
District C  2006  P_c2006                0
District A  2011  P_a2011                0
District B  2011  P_b2011                1
District C  2011  P_c2011                1
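The 1/0 pattern in this table can be generated from the first year in which each district is treated. The sketch below assumes those start years by reading them off the table (District B from 2006, District C from 2011, District A never); the names `FIRST_TREATED_YEAR` and `intervention_dummy` are hypothetical.

```python
# Assumed first treated year per district, read off the 1/0 pattern in the
# table; a district missing from the dict is never treated.
FIRST_TREATED_YEAR = {"District B": 2006, "District C": 2011}
YEARS = [1996, 2001, 2006, 2011]
DISTRICTS = ["District A", "District B", "District C"]


def intervention_dummy(district, year):
    """Return 1 from the first treated year onward, otherwise 0."""
    start = FIRST_TREATED_YEAR.get(district)
    return 1 if start is not None and year >= start else 0


# Long-format rows ordered by year and then district, as in the table.
panel = [(d, y, intervention_dummy(d, y)) for y in YEARS for d in DISTRICTS]
for row in panel:
    print(row)
```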
Time Lag 50
Interventions often affect the dependent variable with a “time lag”.
District C  2001  P_c2001  0  0  ‐
District A  2006  P_a2006  0  0  0
District B  2006  P_b2006  1  1  0
District C  2006  P_c2006  0  0  0
District A  2011  P_a2011  0  0  0
District B  2011  P_b2011  1  1  1
District C  2011  P_c2011  1  0  0
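One common way to represent a time lag is to shift the intervention dummy back by one period within each district. The sketch below shows that idea only; it is not claimed to reproduce the exact columns of the table above, whose headers are not shown. The dummy values follow the Interventions slide, and `lag_one_period` is a hypothetical helper.

```python
YEARS = [1996, 2001, 2006, 2011]

# Intervention dummy per district, in the order of YEARS.
dummy = {
    "District A": [0, 0, 0, 0],
    "District B": [0, 0, 1, 1],
    "District C": [0, 0, 0, 1],
}


def lag_one_period(values):
    """Shift a series by one period; the first period has no lagged value."""
    return [None] + values[:-1]


# Lag within each district so values never leak from one district to another.
lagged = {district: lag_one_period(v) for district, v in dummy.items()}
for district, values in lagged.items():
    print(district, dict(zip(YEARS, values)))
```

The lag is computed per district rather than on the stacked long table, because shifting the stacked column would carry one district's last value into the next district's first row.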
Equation 52