
CA4229 Semester B 2018

Land Use Planning &


Applied Valuation
1 February 2018
Week 3
Hedonic Price Analysis
(Cont. & Advanced)
Murakami, Jin
Assistant Professor
Department of Architecture and Civil Engineering
City University of Hong Kong
Today’s Outline 00

• Regression (by MS Excel)


• Non‐Linear Spatial Relations
• Interaction Terms
• Panel Data Analysis
Regression
By MS Excel
Software for OLS Regression 01

Optional
Regression by MS Excel 1 02

\(P = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3 + \gamma t + \varepsilon\)

P: Market Transaction Price [$ per sq ft]
Z1: Internal Attributes
Z2: External Attributes
Z3: Other Attributes
t: data year (panel data only)
\(\beta_0\): constant
\(\beta_1, \beta_2, \beta_3, \gamma\): parameters
\(\varepsilon\): error term
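For readers who prefer a script to a spreadsheet, the same kind of model can be estimated by ordinary least squares in a few lines. The sketch below is only an illustration: the data are synthetic, and the variable names (`size`, `dist`, `age`) and all coefficient values are assumptions, not figures from the course dataset.

```python
import numpy as np

# Hypothetical synthetic data: none of these numbers come from the course dataset.
rng = np.random.default_rng(0)
n = 200
size = rng.uniform(300, 1200, n)   # Z1: an internal attribute (floor area, sq ft)
dist = rng.uniform(50, 1000, n)    # Z2: an external attribute (walking distance, m)
age = rng.uniform(0, 40, n)        # Z3: another attribute (building age, years)
noise = rng.normal(0, 100, n)      # error term
price = 9000 + 1.5 * size - 2.0 * dist - 30 * age + noise  # P in $ per sq ft

# Design matrix with a leading column of ones for the constant.
X = np.column_stack([np.ones(n), size, dist, age])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# R squared: share of the price variation explained by the fitted model.
fitted = X @ beta
ss_res = np.sum((price - fitted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("coefficients:", np.round(beta, 2))
print("R squared:", round(r_squared, 3))
```

The estimated coefficients should land close to the values used to generate the data, which is a quick way to check that the design matrix is set up correctly.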
Regression by MS Excel 2 03
Open the data table from MS Excel
Regression by MS Excel 3 04
File → Options
Regression by MS Excel 4 05
Place “Analysis ToolPak” under “Active Application Add‐Ins”
Regression by MS Excel 5 06
Data → Data Analysis
Regression by MS Excel 6 07
Descriptive Statistics
Regression by MS Excel 6 08
Descriptive Statistics
Regression by MS Excel 7 09
Descriptive Statistics
Regression by MS Excel 8 10
Organize the format of Descriptive Statistics
Regression by MS Excel 9 11
Data → Data Analysis
Regression by MS Excel 10 12
Regression by MS Excel 11 13
Non‐Linear
Spatial Relations
Coefficients of Distance 14
Sensitivity Test: If the other conditions are held the same (at the mean of each variable in
the model), how sensitive are prices to one of the variables (e.g., Actual Walking
Distance from MTR Station)?
[Figure: two panels plotting Housing Price (HK$ per sq ft, 0 to 12,000) against AW Distance from MTR Station (m, 0 to 1,000). Left panel: Estimate from the Linear Model, Slope = -422.62. Right panel: Estimate from the Full-log Model, Elasticity = -0.053%.]
Categorical Approach? 15
• In some facility cases (e.g., a highway interchange), the spatial relation
cannot be simply linear (e.g., accessibility benefits are cancelled out by
air pollution & noise near the facility).
• In such a complex spatial case, we may try a categorical
approach (distance band dummies, 1/0).

[Figure: Y (Price) against X (Distance), with the distance axis divided into bands at 100, 200, 300, 400 and 500. A single straight line is not a good fit.]
Generate Band Dummy (1/0) 16
Select all cells and sort the dataset by Actual Walking Distance, from small to large.
Generate Band Dummy (1/0) 17
Add a new column for the new variable “MTR Distance within 100 m dummy”.

Then enter “1” if AWDtoMTRstation is less than 100 meters, or “0” otherwise.
Generate Band Dummy (1/0) 18
Add a new column for the new variable “MTR Distance within 200 m dummy”.

Then enter “1” if AWDtoMTRstation is between 100 and 200 meters, or “0” otherwise.
Generate Band Dummy (1/0) 19
Add a new column for the new variable “MTR Distance within 300 m dummy”.

Then enter “1” if AWDtoMTRstation is between 200 and 300 meters, or “0” otherwise.

Add a new column for the new variable “MTR Distance within 400 m dummy”.

Then enter “1” if AWDtoMTRstation is between 300 and 400 meters, or “0” otherwise.

Add a new column for the new variable “MTR Distance within 500 m dummy”.

Then enter “1” if AWDtoMTRstation is between 400 and 500 meters, or “0” otherwise.
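The five band-dummy columns can also be generated programmatically rather than typed by hand. The following is a minimal sketch in plain Python. The distance values are hypothetical, and treating each band as lower-inclusive and upper-exclusive (so exactly 100 m falls in the 100-200 m band) is an assumption, since the slides do not say how boundary values are assigned.

```python
# Hypothetical walking distances (m); AWDtoMTRstation is the column name used on the slides.
distances = [35, 100, 180, 250, 399, 460, 720]

# Band edges in meters: [0, 100), [100, 200), ..., [400, 500).
# Lower-inclusive, upper-exclusive boundaries are an assumption.
band_edges = [0, 100, 200, 300, 400, 500]


def band_dummies(distance):
    """Return the five 1/0 band dummies for one observation."""
    return [
        1 if lower <= distance < upper else 0
        for lower, upper in zip(band_edges[:-1], band_edges[1:])
    ]


rows = [band_dummies(d) for d in distances]
for d, row in zip(distances, rows):
    print(d, row)
```

An observation farther than 500 m gets “0” in all five columns, so those observations act as the reference group against which each band is compared.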
Regression with Band Dummy 20
Run the regression using the 5 distance band dummy variables (100m, 200m, 300m, 400m & 500m)
instead of AWD to MTR Station (m), which has continuous distance values.

Insufficient
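As a rough illustration of what the band-dummy regression estimates, the sketch below fits the five dummies to synthetic data with NumPy. All numbers (the premiums, the base price, the noise level) are made up for the example and are not the course results. Observations beyond 500 m are the omitted reference group, so each coefficient reads as a premium relative to that group.

```python
import numpy as np

# Hypothetical synthetic data; the premiums below are made up for illustration.
rng = np.random.default_rng(1)
n = 600
dist = rng.uniform(0, 1000, n)                 # walking distance to the station (m)
band_edges = [0, 100, 200, 300, 400, 500]
true_premium = [900, 300, 100, -200, -500]     # premium per band vs. the 500-1000 m group

# One 1/0 column per band; observations beyond 500 m are the omitted reference group.
dummies = np.column_stack([
    ((dist >= lo) & (dist < hi)).astype(float)
    for lo, hi in zip(band_edges[:-1], band_edges[1:])
])
price = 8000 + dummies @ np.array(true_premium) + rng.normal(0, 150, n)

# OLS with a constant plus the five band dummies (no continuous distance term).
X = np.column_stack([np.ones(n), dummies])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

for lo, hi, b in zip(band_edges[:-1], band_edges[1:], coef[1:]):
    print(f"{lo}-{hi} m premium: {b:.0f}")
```

Because the dummies impose no functional form, the estimated premiums can rise and fall across bands in a way a single slope cannot capture.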
Regression with Band Dummy 21

Finalized Model

<0.01
Make a graph
Regression with Band Dummy 22
[Figure: Housing Price Premium (HKD per sq ft, -1,000 to 1,200) by MTR Station Distance (Band Dummy): Within 100m, Within 200m, Within 300m, Within 400m, Within 500m, Between 500-1000m. Annotations: “Access benefits are highly localized within 100m?” and “Access values are deeply discounted around 400-500m?”]

Band dummies can be incorporated into semi-log and full-log functions too. You may also consider
finer bands (e.g., every 50 meters), but the bands should not be too fine (*very narrow distance
bands behave just like continuous distances).



Interaction
Terms
Independent Variables 23

\(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon\)

X1, X2 can be:


• Numeric
• Integer
• Dummy (1/0)
The Idea of “Interaction” 24
In the regression model, you may be able to
explain one thing (e.g., property price) by using
two or more factors (e.g., size, age, distance
from a station, etc.), the so-called main effects. Each
factor has an impact individually, but you may also
find additional combined effects among multiple factors.

[Diagram: X1 and X2 jointly affecting Y, with error term e]
Simple Example 25

\(P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon\)

P: Property Price ($/sq ft)
X1: Size of Room (sq ft)
X2: Ocean View (1/0)
P‐X1 Relationship 26
[Figure: P ($/sq ft) plotted against X1 (sq ft)]
P‐X2 Relationship 27
[Figure: P ($/sq ft) plotted against X2 (1/0), where 0 = no view and 1 = ocean view]
P‐X1‐X2 Relationship 28
[Figure: P ($/sq ft) plotted against X1 (sq ft), with separate lines for X2 = 1 and X2 = 0. No interaction effect.]



P‐X1‐X2 Relationship 2 29
[Figure: P ($/sq ft) plotted against X1 (sq ft), with separate lines for X2 = 1 and X2 = 0. Interaction effect.]



Interaction means… 30
• Room Size (X1) increases Property Price (P)
• Ocean View (X2) increases Property Price (P)
• The combination of X1 and X2 increases
Property Price (P) more.

[Diagram: X1 and X2 jointly affecting P, with error term e]
Interaction means… 31

HK$20,000 and HK$5,000 individually

Not simply the sum

HK$30,000 in combination
How to write? 1 32
[Figure: P ($/sq ft) plotted against X1 (sq ft), with separate lines for X2 = 1 and X2 = 0. Interaction effect.]



How to write? 2 33

\(P = \beta_0 + \beta'_1 X_1 + \beta_2 X_2 + \varepsilon\)

\(\beta'_1 = \beta_1 + \beta_3 X_2\)

\(P = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2 + \varepsilon\)
How to write? 2 34

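\(P = \beta_0 + \beta_1 X_1 + \beta_3 X_2 X_1 + \beta_2 X_2 + \varepsilon\)

\(P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon\)

Interaction Effect: \(\beta_3 X_1 X_2\)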
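In practice, the interaction term is just one more column in the data table: the product of X1 and X2. The sketch below estimates the final equation on synthetic data with NumPy. The data and the “true” coefficients are hypothetical and chosen only to make the two slopes visibly different.

```python
import numpy as np

# Hypothetical synthetic data; the "true" coefficients are made up for illustration.
rng = np.random.default_rng(2)
n = 400
x1 = rng.uniform(300, 1200, n)              # X1: size of room (sq ft)
x2 = rng.integers(0, 2, n).astype(float)    # X2: ocean view dummy (1/0)
b0, b1, b2, b3 = 5000.0, 2.0, 800.0, 1.5
p = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rng.normal(0, 100, n)

# The interaction term is the product of the two variables, added as its own column.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, p, rcond=None)

slope_no_view = beta[1]            # slope of P on X1 when X2 = 0
slope_view = beta[1] + beta[3]     # slope of P on X1 when X2 = 1

print("estimated coefficients:", np.round(beta, 2))
print("slope without view:", round(slope_no_view, 2))
print("slope with view:", round(slope_view, 2))
```

The two printed slopes correspond to the two lines in the interaction figure: when the interaction coefficient is non-zero, the lines are no longer parallel.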
Extensions 35
Technically speaking, interactions
can be
• Among more than two variables (e.g., \(\beta_3 X_1 X_2 X_3\))
• Numeric & Numeric Variables
• Dummy & Dummy Variables
• Linear, semi-log, and full-log forms
• Non-linear
But do not include too many interactions, or ones that are unreasonable or too
complex.
Possible Combinations 36
Another Example 37

\(P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon\)

P: Property Price ($/sq ft)
X1: Distance from MTR (m)
X2: Public-Private Coordination (1/0)
Another Example 38

Lilian Law with Jin Murakami (2014)


Another Example 39

Lilian Law with Jin Murakami (2014)


Panel Data
Analysis
Your question? 40
There are three basic types of questions that research projects can address:

Descriptive
Relational
Causal
Y = aX + b (Y: dependent variable, X: independent variable)
Causality is difficult 41
[Diagram: six types of relationship between X and Y]
• Direct Causal Relationship (X, Y)
• Indirect Causal Relationship (X, Z, Y)
• Spurious Relationship (X, Y, with Z)
• Bidirectional Causal Relationship (X, Y)
• Moderated Causal Relationship (X, Y, with Z)
• Unobserved Relationship (X, Y)
Dynamic & Complex 42
Panel vs. Cross‐Sectional 43
Panel > Cross‐sectional
Advantages
• You can follow individual changes
• You can infer causal relations more reliably
• You can test difference-in-differences more widely
• You can take into account unobserved individual
differences (as fixed effects)
• You can increase the number of cases in your modeling
Disadvantages
• Data collection may be more difficult and time-consuming
• Analysis requires more careful attention and more advanced
techniques
Panel Data 44
Unit of Analysis = District (N=3)

[Figure: Housing Price over Year t (1996, 2001, 2006, 2011), one line each for District A, District B and District C]



Panel Data Organization 1 45
             1996     2001     2006     2011
District A   Pa1996   Pa2001   Pa2006   Pa2011
District B   Pb1996   Pb2001   Pb2006   Pb2011
District C   Pc1996   Pc2001   Pc2006   Pc2011
“Wide Format”
Panel Data Organization 2 46
District     Year   P        X it1   X it2
District A   1996   Pa1996
District B   1996   Pb1996
District C   1996   Pc1996
District A   2001   Pa2001
District B   2001   Pb2001
District C   2001   Pc2001
District A   2006   Pa2006
District B   2006   Pb2006
District C   2006   Pc2006
District A   2011   Pa2011
District B   2011   Pb2011
District C   2011   Pc2011
“Long Format” (sorted by year)
Panel Data Organization 3 47
District     Year   P        X it1   X it2
District A   1996   Pa1996
District A   2001   Pa2001
District A   2006   Pa2006
District A   2011   Pa2011
District B   1996   Pb1996
District B   2001   Pb2001
District B   2006   Pb2006
District B   2011   Pb2011
District C   1996   Pc1996
District C   2001   Pc2001
District C   2006   Pc2006
District C   2011   Pc2011
“Long Format” (sorted by district)
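Converting between the wide and long layouts is a mechanical step that is easy to script. The sketch below does it in plain Python. The price values are hypothetical placeholders, and the dictionary layout is just one convenient way to hold a wide table.

```python
# Wide format: one row per district, one price column per year.
# The price values are hypothetical placeholders, not data from the course.
wide = {
    "District A": {1996: 50, 2001: 53, 2006: 56, 2011: 63},
    "District B": {1996: 40, 2001: 44, 2006: 47, 2011: 52},
    "District C": {1996: 30, 2001: 31, 2006: 35, 2011: 41},
}

# Long format: one row per district-year pair.
long_rows = [
    {"district": district, "year": year, "price": price}
    for district, prices in wide.items()
    for year, price in prices.items()
]

# The same long table in the two sort orders shown on the slides.
by_year = sorted(long_rows, key=lambda r: (r["year"], r["district"]))
by_district = sorted(long_rows, key=lambda r: (r["district"], r["year"]))

for row in by_district:
    print(row["district"], row["year"], row["price"])
```

Both sort orders contain the same 12 cases; only the row order differs.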
Balanced Panel Data 48

Sampling matters
Interventions 49
District     Year   P        X it1   X it2
District A   1996   Pa1996   0
District B   1996   Pb1996   0
District C   1996   Pc1996   0
District A   2001   Pa2001   0
District B   2001   Pb2001   0
District C   2001   Pc2001   0
District A   2006   Pa2006   0
District B   2006   Pb2006   1
District C   2006   Pc2006   0
District A   2011   Pa2011   0
District B   2011   Pb2011   1
District C   2011   Pc2011   1
Time Lag 50
Interventions often affect the dependent variable
with a “time lag”.

Think about the impact of transportation
investment on property prices. The effect takes some years
to appear after completion.

You may test several different time lags (e.g., 1~5
years) and pick one. But you may not be
able to test very long time lags because you would
lose a lot of cases for the model.
Time Lag Table 51
District     Year   P        X it1   X it lag 1   X it lag 2
District A   1996   Pa1996   0       -            -
District B   1996   Pb1996   0       -            -
District C   1996   Pc1996   0       -            -
District A   2001   Pa2001   0       0            -
District B   2001   Pb2001   1       0            -
District C   2001   Pc2001   0       0            -
District A   2006   Pa2006   0       0            0
District B   2006   Pb2006   1       1            0
District C   2006   Pc2006   0       0            0
District A   2011   Pa2011   0       0            0
District B   2011   Pb2011   1       1            1
District C   2011   Pc2011   1       0            0
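The lag columns in the Time Lag Table can be derived from the intervention column by shifting each district's series back by one or two survey waves. The sketch below is a minimal plain-Python version. The helper name `lagged` is hypothetical, and one “lag” here means one wave of the panel (five years in this example), following the table.

```python
# Intervention dummy X_it1 by district and year, taken from the Time Lag Table.
years = [1996, 2001, 2006, 2011]
x = {
    "A": {1996: 0, 2001: 0, 2006: 0, 2011: 0},
    "B": {1996: 0, 2001: 1, 2006: 1, 2011: 1},
    "C": {1996: 0, 2001: 0, 2006: 0, 2011: 1},
}


def lagged(series, years, k):
    """Shift one district's series back by k waves; None marks a lost case."""
    return {
        year: (series[years[i - k]] if i - k >= 0 else None)
        for i, year in enumerate(years)
    }


lag1 = {d: lagged(s, years, 1) for d, s in x.items()}
lag2 = {d: lagged(s, years, 2) for d, s in x.items()}

for year in years:
    for d in x:
        print(d, year, x[d][year], lag1[d][year], lag2[d][year])

# Rows with a missing lag cannot enter the model, so longer lags leave fewer cases.
usable_lag2 = sum(v is not None for s in lag2.values() for v in s.values())
print("cases usable with lag 2:", usable_lag2)
```

With a two-wave lag, only 6 of the 12 cases remain usable, which is the loss of cases the Time Lag slide warns about.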
Equation 52

\(P_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2it} + u_{it} + \varepsilon\)

i = district (i = A, B, C)
t = year (t = 1996, 2001, 2006, 2011)
N = 3 districts × 4 years = 12 cases
\(u_{it} = D_i + T_t\)
\(D_i\): District Specific Effects
\(T_t\): Time Specific Effects
Year Dummy 53
You may have to consider the “year specific
effect \(T_t\)” in panel data analysis by using “year
dummy (1/0)” variables.

Year effects are usually unobserved trends
or phenomena in specific years.

One of the years should be dropped as “a
base year”.
Year Dummy Table 54
District     Year   P        T 2001   T 2006   T 2011
District A   1996   Pa1996   0        0        0
District B   1996   Pb1996   0        0        0
District C   1996   Pc1996   0        0        0
District A   2001   Pa2001   1        0        0
District B   2001   Pb2001   1        0        0
District C   2001   Pc2001   1        0        0
District A   2006   Pa2006   0        1        0
District B   2006   Pb2006   0        1        0
District C   2006   Pc2006   0        1        0
District A   2011   Pa2011   0        0        1
District B   2011   Pb2011   0        0        1
District C   2011   Pc2011   0        0        1
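The year dummy columns follow directly from the year of each observation. As a minimal sketch in plain Python, the function below reproduces the T 2001, T 2006 and T 2011 columns with 1996 dropped as the base year. The function name `year_dummies` is hypothetical.

```python
years = [1996, 2001, 2006, 2011]
base_year = 1996  # dropped so the dummies are not collinear with the constant

dummy_years = [y for y in years if y != base_year]


def year_dummies(year):
    """Return the 1/0 values of T2001, T2006 and T2011 for one observation."""
    return [1 if year == y else 0 for y in dummy_years]


# Long-format panel: every district observed in every year.
panel = [(district, year) for year in years for district in "ABC"]
for district, year in panel:
    print(f"District {district}", year, year_dummies(year))
```

Observations from the base year get “0” in every column, so each year coefficient is read as a difference from 1996.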
Between vs. Within 1 55
[Figure: “Between” is the difference between districts; “Within” is the change within a district]
Between vs. Within 2 56
Both “Between” and “Within” are statistically significant
Between vs. Within 3 57
“Between” is statistically significant, while “Within” is not
Between vs. Within 4 58
“Within” is statistically significant, while “Between” is not
Between vs. Within 5 59
“Between” can be estimated by an OLS
Regression Model.
“Within” requires a Fixed
Effects (FE) or Random Effects (RE)
Model (*I recommend using STATA
rather than SPSS).
Fixed Effects (FE) Model 60
\(P_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2it} + u_i + \varepsilon\)

*You cannot include an independent
variable \(X_i\) that does not change over
the time period.
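One common way to estimate a fixed effects model is the “within” transformation: subtract each district's own mean from every variable, then run OLS on the demeaned data. The sketch below illustrates this with NumPy on a synthetic 3-district, 4-year panel. All values are hypothetical, and the helper `demean_by_group` is an illustrative name, not a library function. It also shows why a variable that never changes within a district drops out.

```python
import numpy as np

# Hypothetical synthetic panel: 3 districts observed in 4 years each.
rng = np.random.default_rng(3)
districts = np.repeat(np.arange(3), 4)             # 0,0,0,0,1,1,1,1,2,2,2,2
district_effect = np.array([500.0, 0.0, -300.0])   # unobserved u_i, made up
x1 = rng.uniform(0, 10, 12) + 2.0 * districts      # time-varying regressor
z = np.array([1.0, 0.0, 1.0])[districts]           # time-invariant regressor
p = 1000 + 20.0 * x1 + district_effect[districts] + rng.normal(0, 1, 12)


def demean_by_group(values, groups):
    """Subtract each group's own mean (the 'within' transformation)."""
    means = np.array([values[groups == g].mean() for g in np.unique(groups)])
    return values - means[groups]


p_w = demean_by_group(p, districts)
x1_w = demean_by_group(x1, districts)
z_w = demean_by_group(z, districts)

# Within estimator: OLS on demeaned data (no constant is needed after demeaning).
beta1 = float(x1_w @ p_w / (x1_w @ x1_w))

print("within estimate of beta1:", round(beta1, 2))
print("time-invariant variable after demeaning:", z_w)
```

After demeaning, the time-invariant variable is zero in every row, so its coefficient cannot be estimated. Dedicated panel routines in statistical packages handle standard errors and degrees of freedom properly, which this sketch does not attempt.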
Random Effects (RE) Model 61
\(P_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2it} + (u_i) + \varepsilon\)

*You can include an independent variable
\(X_i\) that does not change over the time
period.
FE or RE: Hausman Test 62
To decide between fixed and random effects, you can
run a Hausman test, where the null hypothesis is
that the preferred model is random effects and the
alternative is fixed effects. It basically tests
whether the unique errors are correlated with the
regressors.
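For reference, the test statistic in its usual textbook form compares the two sets of coefficient estimates. The notation below (hats for estimates, k for the number of coefficients compared) is introduced here for illustration and does not appear on the slides.

```latex
\[
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^{\top}
    \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE}),
\qquad H \sim \chi^2_k \ \text{under the null hypothesis}
\]
```

A large H (small p-value) rejects the null hypothesis and points toward fixed effects; otherwise random effects is retained.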
Prices in the time series 63
Monetary values (e.g., property prices) change over
time. To compare values from different years, we
need to adjust them to a certain base year (before
starting the analysis). The most typical way is to use the
“Consumer Price Index (CPI)”. Each country (government)
usually publishes its own CPI covering the past decades
(annually or sometimes monthly, by goods and services).

Year   Property Price ($M)   CPI-96   CPI Adjusted Price ($M)*
1996   50                    100      → 50
2001   53                    113      → 46
2006   56                    115      → 50
2011   63                    107      → 55
*Year 1996 Value
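A common way to express a nominal value in base-year terms is to multiply it by the ratio of the base-year CPI to the CPI of the observation year. The sketch below assumes that formula. The function name `cpi_adjust` and all input numbers are hypothetical, chosen for easy arithmetic.

```python
def cpi_adjust(nominal, cpi_year, cpi_base=100.0):
    """Convert a nominal value into base-year terms: nominal * cpi_base / cpi_year."""
    return nominal * cpi_base / cpi_year


# Hypothetical prices ($M) and CPI values (base year = 100), for illustration only.
observations = [(2000, 80.0, 100.0), (2005, 90.0, 120.0), (2010, 99.0, 110.0)]

for year, price, cpi in observations:
    print(year, round(cpi_adjust(price, cpi), 1))
```

The adjustment should be applied to every monetary variable before the regression is run, so that all years are expressed in the same base-year value.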
