B.Arch 2021
20BAR Research Methodology
November 2021 Paul Varghese
Version 21.00.1 +91.9567263770
pvarghese.ijk@gmail.com
Inferential Statistics:
It aims to draw conclusions about a population from sample data, so as to inform future decision-making.
Weighted mean: multiply each quantitative outcome by its weightage, sum the products, and divide by the sum of the weights.
Geometric mean: for n numbers {x1, x2, x3, …, xn}, the nth root of their product.
Harmonic mean: the number of terms divided by the sum of their reciprocals.
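The three means above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names and the sample figures (marks, credits, speeds) are hypothetical, and the weighted mean divides by the sum of the weights on the assumption that the weights do not already add up to 1.

```python
import math

def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """Number of terms divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical data, chosen only for illustration
marks = [60, 80, 90]       # marks in three subjects
credits = [2, 3, 5]        # weightage of each subject
wm = weighted_mean(marks, credits)   # (120 + 240 + 450) / 10
gm = geometric_mean([2, 8])          # square root of 16
hm = harmonic_mean([40, 60])         # 2 / (1/40 + 1/60)

print(wm, gm, hm)
```

With these figures the weighted mean is 81, the geometric mean is 4 and the harmonic mean is 48 (up to floating-point rounding).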
Variance (s²):
Also
Coefficient of standard deviation = σ/x̄ (the standard deviation divided by the mean)
For a normal distribution, one standard deviation from the mean in either direction on the horizontal axis (the
two shaded areas closest to the centre axis on the above graph, in red) accounts for about 68% of the
population/sample. Two standard deviations from the mean (the four areas closest to the centre, i.e. all red
and green areas) account for roughly 95% of the population/sample. Three standard deviations
(all shaded areas) account for about 99.7% of the population/sample.
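The percentages quoted for one, two and three standard deviations can be checked numerically. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal population lying within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

within = {k: share_within(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.1%}")
```

The proportions come out at roughly 68.3%, 95.4% and 99.7%, which is where the rounded 68 / 95 / 99.7 figures come from.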
Degrees of freedom: the number of values in the final calculation of a statistic that are free to vary.
Skewness: degree of asymmetry of the distribution.
Kurtosis: measures how ‘peaked’ or ‘flat’ the data distribution is, i.e. the degree of peakedness or flatness of
the distribution.
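Confidence interval: a range of values, calculated from the sample, that is expected to contain the population parameter with a stated level of confidence (e.g. 95%); it quantifies the uncertainty in the estimate.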
Correlation: an analytical technique used to show the relationship between pairs -- to know how strongly (or
weakly) the pairs are related to one another.
Correlation coefficient (r): a decimal number between −1.00 and +1.00 that indicates the degree to which two
quantitative variables are related; 0.00 means no linear relationship, and ±1.00 a perfect one.
Regression: it fits a line to a plot in such a way as to minimize the sum of the squares of the residuals.
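A small worked example may help connect the correlation coefficient and the regression line. The sketch below computes Pearson's r and the least-squares line from first principles; the function names and the five data pairs are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def _sums(xs, ys):
    """Return the means of x and y and the sums of squares/products about them."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson_r(xs, ys):
    """Correlation coefficient r, always between -1 and +1."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the line minimising the sum of squared residuals."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
print(f"r = {r:.3f}, fitted line: y = {intercept:.1f} + {slope:.1f}x")
```

For these pairs r is about 0.775 (a fairly strong positive relationship) and the fitted line is y = 2.2 + 0.6x.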
Type I Error: incorrectly rejecting a null hypothesis that is in fact true; its probability is denoted α.
Type II Error: not rejecting a null hypothesis that is in fact false; its probability is denoted β.
Alternative Hypothesis (H1): the hypothesis that contradicts the null hypothesis (H0); e.g. if H0 is μ1 = μ2, then H1 is μ1 ≠ μ2.
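The meaning of α can be illustrated by simulation: if the null hypothesis is true and a test is run many times at α = 0.05, about 5% of the runs should reject it by chance alone. The sketch below is only an illustration under stated assumptions: it uses a two-sided z-test with a known population standard deviation, and the population (mean 100, standard deviation 15), the sample size and the function names are all hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample1, sample2, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu1 = mu2, assuming sigma is known for both populations."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sigma * (1 / n1 + 1 / n2) ** 0.5
    z = (fmean(sample1) - fmean(sample2)) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of trials in which a TRUE null hypothesis is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same population, so H0 is true by construction
        a = [rng.gauss(100, 15) for _ in range(n)]
        b = [rng.gauss(100, 15) for _ in range(n)]
        if z_test_rejects(a, b, sigma=15, alpha=alpha):
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The observed rate should lie close to the chosen α of 0.05; lowering α reduces Type I errors but, for a fixed sample size, makes Type II errors more likely.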
Advantages of sampling:
• can save time & money: less expensive than a census, with comparable results obtained faster,
• good accuracy when conducted by experts,
• the only practical option when the population contains infinitely many members,
• allows sampling errors to be estimated, which helps in obtaining information about characteristics of the population.
Sampling steps and the potential errors at each step:
1 Define the population of interest
   - inappropriate population for the problem
   - undefined or vaguely defined population, causing inconsistencies in selecting the sample and imprecise conclusions
2 If possible, obtain a list of all members of the population (called a sampling frame)
   - the sampling frame may not match the population, either by listing members who do not belong (over-representation) or by missing some members (under-representation)
3 Choose a sampling method
   - getting a biased, unrepresentative sample
   - using a non-random sampling method but analysing the data as if the sample was random
4 Determine sample size
   - too small a sample for the required accuracy
5 Obtain the sample and collect the data
   - non-response error: when the population members who cannot be reached or do not respond are not representative of the population as a whole
   - asking ambiguous, biased or otherwise poor questions
   - mistakes: mishearing an answer, miskeying data into the computer, dishonest researchers, etc.
SAMPLING METHODS (REFER K&G: CHAPTER/SECTION 04.5, 09.2)
Random sampling methods (each member of the target population has an equal chance of being selected):
• simple random sampling - choose at random from the frame.
o e.g. from a complete list of all 2000 employees, numbered 1-2000, use a random number generator to
generate 50 random numbers in the 1-2000 range and question those 50 employees.
• stratified sampling - split the population into homogeneous segments, or strata, and choose at random
proportionate numbers from each stratum.
o e.g. if the company is known to have 60% female and 40% male employees, a sample of 50 employees would
select 30 women at random and 20 men at random.
• cluster, or area, sampling - split the population into heterogeneous groups, or clusters, choose a cluster at
random and then sample within it.
o e.g. select one of the company’s sales regions (East/West/North/South) at random and sample employees within it.
• systematic sampling - select every nth member from the frame.
o e.g. select employees numbered 40, 80, 120, etc.
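The four random sampling methods above can be sketched with Python's standard `random` module, using the 2000-employee example. This is a minimal illustration: which employee numbers are women or men, and how employees are split across the four sales regions, are hypothetical assumptions made only so the code can run. The seed is fixed so the draw is repeatable.

```python
import random

rng = random.Random(42)           # fixed seed so the example is repeatable
frame = list(range(1, 2001))      # sampling frame: employees numbered 1-2000

# Simple random sampling: 50 employees chosen at random from the whole frame
simple = rng.sample(frame, 50)

# Stratified sampling: ASSUMPTION for illustration only, employees 1-1200 are
# women (60%) and 1201-2000 are men (40%); sample each stratum proportionately
women, men = frame[:1200], frame[1200:]
stratified = rng.sample(women, 30) + rng.sample(men, 20)

# Cluster sampling: ASSUMPTION, four sales regions of 500 employees each;
# pick one region at random, then sample within it
regions = {
    "East": frame[0:500],
    "West": frame[500:1000],
    "North": frame[1000:1500],
    "South": frame[1500:2000],
}
chosen_region = rng.choice(sorted(regions))
cluster = rng.sample(regions[chosen_region], 50)

# Systematic sampling: every 40th employee (2000 / 50 = 40), i.e. 40, 80, 120, ...
step = len(frame) // 50
systematic = frame[step - 1::step]

print(sorted(simple)[:5], chosen_region, systematic[:3])
```

The systematic sample here starts at employee 40 to match the example in the notes; in practice the starting point is often itself chosen at random between 1 and n.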
Non-random sampling can be used by experienced researchers to obtain good results more cheaply than random
sampling. However, inferential statistical theory is based on probability theory and is only valid when the data has
been collected with every member of the population having an equal chance of being selected i.e. by random
sampling. If you do not use random sampling any inferences you make from your data do not have any
mathematical basis.
When the sample size n is small, the shape of the sampling distribution of the mean will depend largely on the shape
of the parent population, but as n gets large (n > 30), the shape of the sampling distribution will become more and
more like a normal distribution, irrespective of the shape of the parent population. The theorem that explains this
relationship between the shape of the population distribution and the sampling distribution of the mean is called the
Central Limit Theorem.
Or,
Given a population with mean μ and variance σ², the sampling distribution of the mean will have a mean equal to
μ (μx̄ = μ), a variance (σx̄²) equal to σ²/N, and a standard deviation (σx̄) equal to σ/√N.
The distribution will approach the normal distribution as N, the sample size, increases. Refer fig. below.
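The Central Limit Theorem can be checked with a short simulation. The sketch below is an illustration under stated assumptions: the parent population is taken to be exponential with mean μ = 1 and standard deviation σ = 1 (a strongly skewed, clearly non-normal shape), and the function name, sample sizes and number of trials are hypothetical choices.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, trials, rng):
    """Means of `trials` random samples, each of size n, from an exponential population."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

rng = random.Random(0)   # fixed seed so the run is repeatable
sigma = 1.0              # standard deviation of the assumed parent population
results = {}
for n in (2, 30, 100):
    means = sample_means(n, 3000, rng)
    results[n] = (fmean(means), pstdev(means))
    print(f"N={n:3d}  mean of sample means={results[n][0]:.3f}  "
          f"sd of sample means={results[n][1]:.3f}  "
          f"sigma/sqrt(N)={sigma / n ** 0.5:.3f}")
```

For each N the mean of the sample means should stay close to μ = 1, while their standard deviation should shrink in line with σ/√N. Plotting a histogram of `means` for each N would also show the shape moving from skewed towards the normal curve.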
Inferential Statistics
Going beyond descriptive statistics, one can begin to say more about the data:
• one samples from the total population, about which one needs to know more,
• one then infers something about the characteristics of the population from the known
characteristics of the sample.
Parametric Methods
In parametric methods/tests, one assumes that the population distribution is normal.
Typical assumptions are that the population is normal, the samples are independent, and (for some tests) the population standard deviation is known.
Non-parametric methods