Director
Centre for Real Estate Studies
Faculty of Engineering and Geoinformation Science
Universiti Teknologi Malaysia
Skudai, Johor
Objectives
Specific:
* Concepts of data analysis
* Some data analysis techniques
* Some tips for data analysis
[Figure: No. of houses by district, 1991 and 2000]
[Figure: Trends in property loan, shop house demand & supply]
Year no. (1990 - 1997)                 1        2        3        4        5        6        7        8
Loan to property sector (RM million)   32635.8  38100.6  42468.1  47684.7  48408.2  61433.6  77255.7  97810.1
Demand for shop houses (units)         71719    73892    85843    95916    101107   117857   134864   86323
Supply of shop houses (units)          85534    85821    90366    101508   111952   125334   143530   154179
[Figure: Proportion (%) by age category (years old)]
[Figure: Price (RM/sq. ft of built area) against demand (% sales success)]
Examples of “abstraction” of phenomena
[Figures: two “prediction” plots; axis labels: Distance from Ashurton (km), Demand (% sales success)]
Inferential statistics
Using sample statistics to infer some
“phenomena” of population parameters
Common “phenomena”: cause-and-effect
* One-way relationship: Y = f(X)
Dep = 7t – 192.6
“Effects” of KLIA on the development of Sepang:
* Likert scaling based on interviews
* Descriptive analysis based on ex-ante/ex-post experimental investigation
Note: Likert scaling cannot show “cause-and-effect” phenomena!
When analysing:
* Be objective
* Be accurate
* Be truthful
* Separate facts from opinion
* Avoid “wrong” reasoning/argument, e.g. mistakes in interpretation
Introductory Statistics for Social Sciences
Basic concepts
Central tendency
Variability
Probability
Statistical Modelling
Basic Concepts
Population: the whole set of a “universe”
Sample: a sub-set of a population
Parameter: an unknown “fixed” value of a population characteristic
Statistic: a known/calculable value of a sample characteristic
representing that of the population. E.g.
μ = mean of population, x̄ = mean of sample
[Figure: J.B. houses as the population (μ = ?), with samples labelled SD, SST and DST; a sample value of 210,000 is shown]
Basic Concepts (contd.)
Randomness: Many things occur by pure
chance: rainfall, disease, birth, death, ...
Variability: Stochastic processes give rise to
many different dimensions,
characteristics, properties, features, etc.,
in the population
Statistical analysis methods have been
developed to deal with this very nature
of the real world.
“Central Tendency”
Measure               Advantages                        Disadvantages
Mean                  Best known average                Affected by extreme values
(sum of all values    Exactly calculable                Can be absurd for discrete data
÷ no. of values)      Makes use of all data             (e.g. family size = 4.5 persons)
                      Useful for statistical analysis   Cannot be obtained graphically
x     3   5   7   8   9  10  12
f     1   1   2   3   2   2   1
fx    3   5  14  24  18  20  12
Σfx = 96; Σf = 12
Thus, x̄ = Σfx / Σf = 96/12 = 8
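The same calculation can be sketched in a few lines of Python. The variable names (`sum_fx`, `sum_f`) are only illustrative; the x and f values are those in the table above.

```python
# Frequency table from the slide: values x and their frequencies f.
x = [3, 5, 7, 8, 9, 10, 12]
f = [1, 1, 2, 3, 2, 2, 1]

# Weighted mean: sum of (value * frequency) divided by total frequency.
sum_fx = sum(xi * fi for xi, fi in zip(x, f))
sum_f = sum(f)
mean = sum_fx / sum_f

print(sum_fx, sum_f, mean)  # 96 12 8.0
```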
Central Tendency–“Mean of Grouped Data”
House rental or prices in the PMR are frequently
tabulated as a range of values. E.g.
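The mean is then estimated from the class midpoints: x̄ = Σfm / Σf, where m = midpoint of each range and f = its frequency.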
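A minimal sketch of a grouped-data mean, assuming each range is represented by its midpoint. The rental bands and counts below are hypothetical, made up purely for illustration; they are not taken from the PMR.

```python
# Hypothetical rental bands (RM per month) and house counts, for illustration only.
bands = [(100, 200), (200, 300), (300, 400), (400, 500)]
counts = [4, 10, 5, 1]

# Represent each band by its midpoint, then take the frequency-weighted mean.
midpoints = [(low + high) / 2 for low, high in bands]
grouped_mean = sum(m * f for m, f in zip(midpoints, counts)) / sum(counts)

print(grouped_mean)  # 265.0
```

The result is only an estimate, since the actual values inside each band are unknown.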
“Probability Distribution”
Defined by its probability density function (pdf).
Many types: Z, t, F, gamma, etc.
“God-given” nature of real-world events.
General form:
∫ f(x) dx = 1 (continuous)
Σ p(x) = 1 (discrete)
E.g.
“Probability Distribution” (contd.)
Sum of two dice:
            Dice1
Dice2    1   2   3   4   5   6
  1      2   3   4   5   6   7
  2      3   4   5   6   7   8
  3      4   5   6   7   8   9
  4      5   6   7   8   9  10
  5      6   7   8   9  10  11
  6      7   8   9  10  11  12
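The table can be turned into a discrete probability distribution by counting how often each total occurs among the 36 outcomes. The sketch below assumes two fair dice; the name `pmf` is illustrative.

```python
from collections import Counter
from fractions import Fraction

# Count how often each total appears among the 36 equally likely outcomes.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# Probability of each total = count / 36, kept as exact fractions.
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in pmf.items():
    print(total, p)
```

The probabilities sum to 1, and the most likely total is 7 (probability 1/6).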
“Probability Distribution” (contd.)
[Figure: histogram of Rental (RM/sq.ft.); Mean = 4.0628, Std. Dev. = 1.70319, N = 32]
“Probability Distribution” (contd.)
* Bell-shaped, symmetrical
* Has a function of
  f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))
  where
  μ = mean of variable x
  σ = std. dev. of x
  π = ratio of circumference of a
  circle to its diameter = 3.14
  e = base of natural log = 2.71828
“Probability distribution”
Note: p(AGE=age) ≠ 1
How to turn this graph into
a probability density
function (p.d.f.)?
Standardise X using Z = (X − μ) / σ:
When X = μ, Z = 0
When X = μ + σ, Z = 1
When X = μ + 2σ, Z = 2
When X = μ + 3σ, Z = 3 and so on.
It can be proven that P(X1 < X < Xk) = P(Z1 < Z < Zk)
The SND (standard normal distribution) table shows the probability
to the right of any particular value of Z.
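As a rough sketch, the conversion to Z and the right-tail probability that a Z-table gives can be computed as follows. The function names and the values of `mu` and `sigma` are assumptions for illustration; the tail area uses the standard identity P(Z > z) = ½·erfc(z/√2).

```python
import math

def z_score(x, mu, sigma):
    """Convert x to a standard normal value: subtract the mean, divide by the std. dev."""
    return (x - mu) / sigma

def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 100.0, 15.0  # hypothetical mean and std. dev.
for k in range(4):
    z = z_score(mu + k * sigma, mu, sigma)
    print(z, round(upper_tail(z), 4))
```

Each step of one σ away from μ moves Z by exactly 1, whatever the units of X.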
Example
Normal distribution…Questions
Your sample found that the mean price of “affordable” homes in Johor
Bahru, Y, is RM 155,000 with a variance of 3.8 × 10^7. On the basis of a
normality assumption, how sure are you that:
(a) the price is at most RM 160,000?
(b) the price lies between RM 145,000 and RM 160,000?
Answer (a):
P(Y ≤ 160,000) = P(Z ≤ (160,000 − 155,000) / √(3.8 × 10^7))
= P(Z ≤ 0.811)
Using the Z-table, P(Z > 0.811) = 0.2087, so the required probability is:
1 − 0.2087 = 0.7913
Always remember: to convert to SND, subtract the mean and divide by the std. dev.
Normal distribution…Questions
Answer (b):
Z1 = (X1 − μ) / σ = (145,000 − 155,000) / √(3.8 × 10^7) = −1.622
Z2 = (X2 − μ) / σ = (160,000 − 155,000) / √(3.8 × 10^7) = 0.811
P(Z < −1.622) = 0.0524; P(Z > 0.811) = 0.2087
P(145,000 < Y < 160,000)
= 1 − (0.0524 + 0.2087)
= 0.7389
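The two answers can be cross-checked numerically without a Z-table. This is a sketch only: `norm_cdf` is a helper defined here from the error function, and the mean and variance are those stated in the question.

```python
import math

def norm_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

mean_price = 155_000
std_dev = math.sqrt(3.8e7)  # std. dev. is the square root of the variance

z_low = (145_000 - mean_price) / std_dev
z_high = (160_000 - mean_price) / std_dev

p_a = norm_cdf(z_high)                    # (a) P(Y <= 160,000)
p_b = norm_cdf(z_high) - norm_cdf(z_low)  # (b) P(145,000 < Y < 160,000)

print(round(z_low, 3), round(z_high, 3))
print(round(p_a, 4), round(p_b, 4))
```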
Normal distribution…Questions
You are told by a property consultant that the
average rental for a shop house in Johor Bahru is
RM 3.20 per sq. ft. After searching, you discovered
the following rental data:
“Student’s t-Distribution”
Similar to Z-distribution:
* t(0,σ), but σ → 1 as n → ∞
* -∞ < t < +∞
* Flatter with thicker tails
* As n → ∞, t(0,σ) → N(0,1)
* Has a function of
  f(t) = Γ((v+1)/2) / [√(vπ) · Γ(v/2)] · (1 + t²/v)^(−(v+1)/2)
  where Γ = gamma function; v = n-1 = d.o.f.; π = 3.1416
* Probability calculation requires information on
d.o.f.
* Cumulative distribution, defining
  F_r(t) = 1/2 + (1/2) · [I(1; r/2, 1/2) − I(r/(r+t²); r/2, 1/2)] · sgn(t)
  where r ≡ n-1 is the number of degrees of freedom, -∞ < t < ∞, Γ(z) is the gamma function,
  B(a,b) is the beta function, and I(z;a,b) is the regularized beta function defined by
  I(z;a,b) = B(z;a,b) / B(a,b)
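To illustrate the "thicker tails" and the convergence to N(0,1), the density can be evaluated for a few degrees of freedom. This is a sketch using the standard Student's t density; `t_pdf` and `normal_pdf` are illustrative helper names, and log-gamma is used only to avoid overflow at large d.o.f.

```python
import math

def t_pdf(t, v):
    """Student's t density with v degrees of freedom."""
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(v * math.pi))
    return math.exp(log_norm - (v + 1) / 2 * math.log1p(t * t / v))

def normal_pdf(z):
    """Standard normal density, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Density at the centre (t = 0) and in the tail (t = 3) as d.o.f. grow.
for v in (1, 5, 30, 1000):
    print(v, round(t_pdf(0.0, v), 4), round(t_pdf(3.0, v), 4))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

With few degrees of freedom the tail density at t = 3 is well above the normal value; as v grows the two columns approach the N(0,1) figures.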
Forms of “statistical” relationship
Correlation
Contingency
Cause-and-effect
* Causal
* Feedback
* Multi-directional
* Recursive
The last two categories are normally dealt with
through regression
Correlation
“Co-exist”. E.g.
* left shoe & right shoe, sleep & lying down, food & drink
Indicate “some” co-existence relationship. E.g.
* Linearly associated (-ve or +ve)
* Co-dependent, independent
Formula:
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]
But this has nothing to do with a cause-and-effect (C-A-E) relationship!
Example: After a field survey, you have the following
data on the distance to work and distance to the city
of residents in the J.B. area. Interpret the results.
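A minimal sketch of a linear (Pearson) correlation calculation is given below. The survey figures are not reproduced here, so the distances in the code are hypothetical and serve only to show the mechanics; `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical survey data (km), for illustration only.
dist_work = [2.0, 5.0, 8.0, 12.0, 15.0, 20.0]
dist_city = [18.0, 16.0, 11.0, 9.0, 6.0, 3.0]

r = pearson_r(dist_work, dist_city)
print(round(r, 3))
```

A value near -1 would mean that residents living farther from work tend to live closer to the city; it would not, by itself, say that one distance causes the other.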
Contingency
A form of “conditional” co-existence:
* If X, then, NOT Y; if Y, then, NOT X
* If X, then, ALSO Y
* E.g.
+ if they choose to live close to workplace,
then, they will stay away from city
+ if they choose to live close to city, then, they
will stay away from workplace
+ they will stay close to both workplace and city
Correlation and regression – matrix approach
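The slides' matrix working is not reproduced here. As a hedged sketch, the usual least-squares estimate in matrix form is b = (X'X)⁻¹X'y, shown below for one explanatory variable. The helper names are illustrative, and regressing PRICE on floor area (the Q1 figures from the exercises) is only an assumed example, not necessarily the one used in the lecture.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols(x, y):
    """Return [intercept, slope] from b = (X'X)^-1 X'y."""
    X = [[1.0, xi] for xi in x]   # column of ones for the intercept
    Y = [[yi] for yi in y]
    Xt = transpose(X)
    b = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, Y))
    return [b[0][0], b[1][0]]

# Q1 data: floor area (sq. m) and price (RM '000).
floor = [135, 140, 100, 360, 175, 270, 200, 170]
price = [130, 137, 128, 390, 140, 241, 342, 143]

intercept, slope = ols(floor, price)
print(round(intercept, 2), round(slope, 3))
```

For more than one explanatory variable the same formula applies, but a general matrix inverse (for example from a numerical library) would replace `inverse_2x2`.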
Test yourselves!
Q1: Calculate the mean and variance of the following data:
PRICE - RM ‘000    130  137  128  390  140  241  342  143
SQ. M OF FLOOR     135  140  100  360  175  270  200  170
Q2: Calculate the mean price of the following low-cost houses, in various
localities across the country:
Q5: Find:
p(AGE > “30-34”)
p(AGE ≤ “20-24”)
p(“35-39” ≤ AGE < “50-54”)
Test yourselves!
Q6: You are asked by a property marketing manager to ascertain whether
or not distance to work and distance to the city are “equally” important
factors influencing people’s choice of house location.
You are given the following data for the purpose of testing:
(a) Set your research design and data analysis procedure to address
the research issue
(b) Test your hypothesis that low-income tenants do not perceive
“quality life” to be important in paying their house rentals.
Thank you