
Research Methodology Notes M.Arch & M.Plan 2020-21
20MAR111 Research Methodology
December 2020 Paul Varghese
Version 20.00.4 +91.9567263770
pvarghese.ijk@gmail.com

Statistics for Architecture & Planning


Some Terms:
Random sample: a sample drawn in a manner that ensures that each and every element of the population has
an equal chance of being selected.
Population: the entire collection of events or subjects which is being studied.
Sample: observations from a population, of which the characteristics are to be studied.
Variable: property of an object or event that can take on different values.
Dependent variable: the variable that is measured as the outcome; it is not under the experimenter’s control
(and is part of the data).
Validity (internal/external): the soundness of the choice of subjects/groups, which is fundamental to both the
internal and the external integrity of the experiment.
Hypothesis: an assumption (theory, supposition) which has to be proved or disproved.

Statistics for Research


Statistical Concepts
Probability

Descriptive Statistics (Refer K&G: Chapter 08)


When the need is to describe a set of data, or to understand what the data is about, one employs descriptive
statistics. The basic measures and analyses used, which together are called descriptive statistics, give an idea of
the distribution of the observations in the data-set. This might, however, only be a superficial reading of the
data available.

Measures of Central Tendency & Dispersion


A measure of central tendency is a value (real or calculated) around which all the observations tend to cluster.

Normal distribution (assumptions of):


• many of the dependent variables are assumed to be normally distributed in the population, i.e., that the
sample distribution would closely resemble the population,
• if normally distributed, the techniques used allow one to make inferences about the population,
• the hypothetical set of samples would be approximately normal under a variety of circumstances,
• the tests all assume that the population sampled is normally distributed.

Mode: the most frequently occurring value; it is not affected by extreme values, but is not always
amenable to algebraic treatment. (A dataset might not have a mode, or might even have
multiple modes.)
Median: the positional average; the middle value when the observations are arranged in order. It is used in the
context of qualitative phenomena, but not often in sampling statistics.
Mean: the most commonly used ‘average’ and measure of central tendency; the sum of the values divided by the
number of values.

Weighted mean: the mean in which each value x is multiplied by a weight w reflecting its importance: Σ(w·x) / Σw.
Faculty of Architecture & Planning, K A H E Page 1
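Geometric mean: the nth root of the product of the n values.
Harmonic mean: the reciprocal of the arithmetic mean of the reciprocals of the values.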
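
The measures of central tendency listed above can be tried out with a short sketch in Python, using only the
standard `statistics` module. The data values and the weights are hypothetical and chosen only for illustration;
they do not come from K&G.

```python
import statistics

# Hypothetical data set: floor areas (sq. m) of ten surveyed dwellings.
areas = [45, 52, 52, 60, 61, 64, 70, 75, 90, 140]

mode = statistics.mode(areas)        # most frequently occurring value
median = statistics.median(areas)    # middle value of the ordered data
mean = statistics.mean(areas)        # arithmetic mean

# Weighted mean: sum(w * x) / sum(w); values and weights are hypothetical.
values = [60, 70, 80]
weights = [1, 2, 3]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Geometric mean: n-th root of the product of the n values.
geometric_mean = statistics.geometric_mean([2, 8])

# Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic_mean = statistics.harmonic_mean([40, 60])

print(mode, median, mean, weighted_mean, geometric_mean, harmonic_mean)
```

In this data-set the single extreme value (140) pulls the mean above the median, while the mode is unaffected by it.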

Measures of Dispersion / Variability: (Refer K&G: Chapter 08)


Range: the difference between the extreme values of the data-set.
The Normal Distribution: graphically, the normal distribution is the "bell-shaped" curve; the curve is
symmetrical about the middle peak; the ‘tails’ of the distribution approach, but never touch, the horizontal axis;
the mean is the position of the peak of the curve.
Outliers: extreme, atypical values. They may be genuine values or the result of some mistake. If an outlier is a
mistake, it must be corrected or, if this is impossible, the value must be ignored.
Deviation (s):
Mean Deviation: the average of the absolute differences of the values of items from some average of the series.
Standard deviation (σ): a measure based on how far values are from the mean; defined as the square-root of the
average of the squares of deviations in a series obtained from the arithmetic average.

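Variance (s²): the average of the squares of the deviations from the mean, i.e., the square of the standard deviation.
Also,
Coefficient of standard deviation = σ / x̄, where x̄ is the mean.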
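
As a minimal sketch of the measures of dispersion above, the following Python fragment computes them for a small
hypothetical data-set. It assumes the "average of the squares of deviations" form of the definition (division by
n, i.e. `statistics.pstdev`); when the data is a sample used to estimate the population value, division by n - 1
(`statistics.stdev`) is the usual alternative.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)                 # arithmetic mean
data_range = max(data) - min(data)           # range: difference between the extremes

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

variance = statistics.pvariance(data)        # average of the squared deviations
std_dev = statistics.pstdev(data)            # square root of the variance
coeff_std_dev = std_dev / mean               # coefficient of standard deviation

print(data_range, mean_deviation, variance, std_dev, coeff_std_dev)
```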

One standard deviation from the mean in either direction on the horizontal axis (the two shaded areas
closest to the centre axis on the above graph - in red) accounts for around 68% of the
population/sample. Two standard deviations from the mean (the four areas closest to the centre axis -
all red & green areas) account for roughly 95% of the population/sample. And three standard deviations
(all shaded areas) account for about 99.7% of the population/sample.
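
The proportions quoted above can be checked numerically. The sketch below uses `NormalDist` from Python's standard
`statistics` module; the helper name `share_within` is introduced here only for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The three proportions come out at roughly 0.68, 0.95 and 0.997 respectively.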

Degree of Freedom:
Skewness: degree of asymmetry of the distribution.
Kurtosis: a mathematical measure of how ‘peaked’ or ‘flat’ the data distribution is.
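
Skewness and kurtosis can be illustrated with a short sketch. The notes do not give formulas, so the fragment
below assumes one common convention: the moment-based coefficients m3 / m2^1.5 for skewness and m4 / m2^2 - 3 for
(excess) kurtosis, where m2, m3 and m4 are the central moments of the data. The function name and the data are
hypothetical.

```python
import statistics

def moment_skewness_kurtosis(data):
    """Return (skewness, excess kurtosis) based on central moments (division by n)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5            # 0 for a symmetrical distribution
    excess_kurtosis = m4 / m2 ** 2 - 3   # 0 for a normal distribution
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]       # long tail towards the larger values

print(moment_skewness_kurtosis(symmetric))
print(moment_skewness_kurtosis(right_skewed))
```

A positive skewness indicates a longer tail to the right; a negative excess kurtosis indicates a distribution
flatter than the normal curve.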

The Hypothesis & Testing it (Refer K&G: Chapter 10)


In statistics,
Research Hypothesis: the basic formal question / hypothesis that has to be answered.
Null Hypothesis (H0): the hypothesis that the difference between the two population means is zero, or ‘null’.
[μ1 – μ2 = 0]
The null hypothesis was formulated by R. A. Fisher as a method of contradiction: one can never
prove a hypothesis to be always true, but one can sometimes prove it false.
In Neyman and Pearson’s view, one either rejects or accepts the null hypothesis.

Type I Error: incorrectly rejecting a true null hypothesis; its probability is denoted by α.
Type II Error: not rejecting a null hypothesis that is in fact false; its probability is denoted by β.

Alternate Hypothesis (H1): the hypothesis that is contradictory to the null hypothesis (H0) [μ1 ≠ μ2]

H1: μ1 ≠ μ2 i.e., H1: μ1 > μ2 or H1: μ1 < μ2
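
To make the testing of H0: μ1 – μ2 = 0 concrete, here is a minimal sketch of a two-sided test on two sample means.
The notes do not name a particular test at this point; the sketch assumes a two-sample z-test, which requires the
population standard deviations to be known. The function name, the data and the σ values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def two_sample_z_test(sample1, sample2, sigma1, sigma2):
    """Two-sided z-test of H0: mu1 - mu2 = 0, with known population standard deviations."""
    standard_error = sqrt(sigma1 ** 2 / len(sample1) + sigma2 ** 2 / len(sample2))
    z = (fmean(sample1) - fmean(sample2)) / standard_error
    # Two-sided p-value: probability of a |z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05  # accepted probability of a Type I error
group_a = [72, 75, 78, 80, 74, 77, 79, 76]   # hypothetical scores, group 1
group_b = [70, 71, 73, 69, 72, 74, 68, 71]   # hypothetical scores, group 2

z, p_value = two_sample_z_test(group_a, group_b, sigma1=3.0, sigma2=3.0)
reject_h0 = p_value < alpha
print(round(z, 3), p_value, reject_h0)
```

Rejecting H0 when the p-value is below α keeps the probability of a Type I error at α; it says nothing directly
about β, the probability of a Type II error.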

Sampling (Refer K&G: Chapter/sections 04.5, 09.2)

Advantages of sampling:
• can save time & money: less expensive than a census, with results at a comparatively faster speed,
• good accuracy when conducted by experts,
• useful when the population contains infinitely many members,
• allows sampling errors to be estimated, which gives information concerning the characteristics of the population.

Sampling error = frame error + chance error + response error


Total error = measurement error + sampling error
Non-sampling errors are difficult to estimate.


The typical Sampling Procedure, with the potential errors at each step, is:

1 Define the population of interest
Potential errors:
- inappropriate population for problem
- undefined or vaguely defined population, causing inconsistencies in selecting the sample and imprecise
conclusions
2 If possible, obtain a list of all members of the population (called a sampling frame)
Potential errors:
- sampling frame may not match the population by listing members who do not belong (over-representation) or
missing some members (under-representation)
3 Choose a sampling method
Potential errors:
- getting a biased, unrepresentative sample
- using a non-random sampling method but analysing the data as if the sample was random
4 Determine sample size
Potential errors:
- too small a sample for the required accuracy
5 Obtain the sample and collect the data
Potential errors:
- non-response error - when population members not obtainable or not responding are not representative of
the population as a whole
- asking ambiguous, biased or other poor questions
- mistakes - mishearing an answer, miskeying data into the computer, dishonest researchers, etc.

Sampling Methods (Refer K&G: Chapter/section 04.5, 09.2)
Random sampling methods (each member of the target population has an equal chance of being selected):
• simple random sampling - choose at random from the frame.
   o e.g. from a complete list of all 2000 employees, numbered 1-2000, use a random number generator to
   generate 50 random numbers in the 1-2000 range and question those 50 employees.
• stratified sampling - split the population into homogeneous segments, or strata, and choose at random
proportionate numbers from each stratum.
   o e.g. if the company is known to have 60% female and 40% male employees, a sample of 50 employees would
   select 30 women at random and 20 men at random.
• cluster, or area, sampling - split the population into heterogeneous groups, or clusters, choose a cluster at
random and then sample within it.
   o e.g. select one of the company’s sales regions East/West/North/South.
• systematic sampling - select every nth member from the frame.
   o e.g. select employees numbered 40, 80, 120, etc.
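
The four random sampling methods can be sketched in Python with the standard `random` module, following the
2000-employee example above. How the employee numbers map onto women/men and onto the four sales regions is not
given in the notes; the split used below is an assumption made only so that the example runs.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
frame = list(range(1, 2001))       # sampling frame: employees numbered 1-2000

# Simple random sampling: 50 employees chosen at random from the frame.
simple_sample = rng.sample(frame, 50)

# Stratified sampling: proportionate random draws from each stratum.
# Assumption: employees 1-1200 are women (60%) and 1201-2000 are men (40%).
strata = {"female": frame[:1200], "male": frame[1200:]}
stratified_sample = {
    name: rng.sample(members, round(50 * len(members) / len(frame)))
    for name, members in strata.items()
}

# Cluster sampling: choose one cluster at random, then sample within it.
# Assumption: each sales region holds 500 consecutively numbered employees.
regions = ["East", "West", "North", "South"]
clusters = {region: frame[i * 500:(i + 1) * 500] for i, region in enumerate(regions)}
chosen_region = rng.choice(regions)
cluster_sample = rng.sample(clusters[chosen_region], 50)

# Systematic sampling: every 40th member of the frame (40, 80, 120, ...).
step = len(frame) // 50
systematic_sample = frame[step - 1::step]

print(len(simple_sample), {k: len(v) for k, v in stratified_sample.items()},
      chosen_region, systematic_sample[:3])
```

The systematic sample here starts at employee 40 to match the example in the notes; in practice the starting
point is often itself chosen at random from the first n members.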

Non-random sampling methods:

• convenience sampling - select on basis of convenience/cost
• judgement sampling - researcher chooses what he/she judges to be a representative sample
• quota sampling - as judgement, but researcher must fill certain quotas e.g. 25 men aged 30 to 40
• purposive - sample chosen for a specific purpose e.g. “key informants”, thought to be most interested
and/or knowledgeable.

Non-random sampling can be used by experienced researchers to obtain good results more cheaply than random
sampling. However, inferential statistical theory is based on probability theory and is only valid when the data has
been collected with every member of the population having an equal chance of being selected i.e. by random
sampling. If you do not use random sampling any inferences you make from your data do not have any
mathematical basis.

Sampling Distribution of Mean


Student’s t-Distribution
Chi-squared (χ2) Distribution
Snedecor’s F-Distribution


Central Limit Theorem (Refer K&G: Chapter/section 09.6)

When n is small, the shape of the sampling distribution of the mean will depend largely on the shape of the
parent population, but as n gets large (n > 30), the shape of the sampling distribution will become more and more
like a normal distribution, irrespective of the shape of the parent population. The theorem that explains this
kind of relationship between the shape of the population distribution and the sampling distribution of the mean
is called the Central Limit Theorem.
Or,
Given a population with mean μ and variance σ², the sampling distribution of the mean will have a mean equal to
μ (μx̄ = μ), a variance (σx̄²) equal to σ²/N, and a standard deviation (σx̄) equal to σ/√N.

The distribution will approach the normal distribution as N, the sample size, increases. Refer fig. below.

[Figure: Typical sample distribution, tending towards normality]
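
The Central Limit Theorem can be illustrated by simulation. The sketch below assumes a strongly skewed parent
population (exponential, with mean 1 and standard deviation 1), draws many samples of size N, and compares the
mean and standard deviation of the sample means with μ and σ/√N. The choice of population, the number of samples
and the function name are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Parent population: exponential with mean 1 and standard deviation 1 (skewed).
population_mean = 1.0
population_sd = 1.0

def sample_means(sample_size, number_of_samples=5000):
    """Means of repeated random samples of the given size from the population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(number_of_samples)
    ]

for n in (2, 30):
    means = sample_means(n)
    print(
        f"N={n}: mean of sample means={statistics.fmean(means):.3f}, "
        f"sd of sample means={statistics.pstdev(means):.3f}, "
        f"sigma/sqrt(N)={population_sd / n ** 0.5:.3f}"
    )
```

Plotting a histogram of `means` for N = 2 and for N = 30 would show the skewed shape giving way to the
bell-shaped curve as N grows.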

Inferential Statistics
Going beyond descriptive statistics, one can begin to say more about the data:
• one samples from the total population, about which one needs to know more,
• one infers something about the characteristics of the population from what one knows
of the characteristics of the sample.

Parametric Methods
In parametric methods/tests, one assumes that the distribution is normal.
The assumptions include that the population is normal, the samples are independent, and the standard deviation
is known.


Estimation of Confidence Interval (Refer K&G: Chapter/section 09.8.2)

Process of estimation of confidence interval
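
As a minimal sketch of the estimation process, the fragment below computes a confidence interval for a population
mean. It assumes the parametric conditions (normal population, independent observations, known standard
deviation), so that the interval is x̄ ± z·σ/√n. The function name, the data and the value of σ are hypothetical;
K&G section 09.8.2 should be consulted for the procedure the course follows.

```python
from math import sqrt
from statistics import NormalDist, fmean

def mean_confidence_interval(sample, population_sd, confidence=0.95):
    """Confidence interval for the population mean when the standard deviation is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95% confidence
    margin = z * population_sd / sqrt(len(sample))   # z times the standard error
    centre = fmean(sample)
    return centre - margin, centre + margin

sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]    # hypothetical measurements
low, high = mean_confidence_interval(sample, population_sd=2.0)
print(round(low, 2), round(high, 2))
```

When σ is not known and the sample is small, the t-distribution is normally used in place of the normal
distribution for the multiplier.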

Non-parametric Methods (Refer K&G: Chapter 13.7)

When one can apply a test without a model, it is called a distribution-free or a non-parametric test. Hence, in
non-parametric methods/tests, there are no assumptions of normality in the distribution. One does not
assume that a particular distribution is applicable, or that a particular value is attached to a parameter of the
population.
The central point of a data-set is usually the arithmetic mean, which gives weightage to the magnitude
of the observations, while the location-wise central point is given by the median, which is what is used in
sign-tests and other non-parametric tests.
Typical non-parametric tests are:
One sample Sign test, Two sample Sign test (Paired sign test), Wilcoxon signed Rank Sum test (single),
Mann-Whitney U-test, Kruskal-Wallis Test, Spearman’s Rank Correlation, etc.
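
Of the tests listed, the one-sample sign test is simple enough to sketch in a few lines. The version below is an
illustration under stated assumptions: observations equal to the hypothesised median are dropped, and the
two-sided p-value is taken from the Binomial(n, 0.5) distribution of the signs. The function name and the ratings
are hypothetical.

```python
from math import comb

def one_sample_sign_test(data, hypothesised_median):
    """Two-sided one-sample sign test; returns (plus signs, minus signs, p-value)."""
    plus = sum(1 for x in data if x > hypothesised_median)
    minus = sum(1 for x in data if x < hypothesised_median)
    n = plus + minus                  # observations equal to the median are dropped
    k = min(plus, minus)
    # Under H0 a plus and a minus are equally likely, so the count is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

ratings = [7, 8, 6, 9, 8, 7, 9, 8, 5, 9]   # hypothetical ordinal ratings on a 1-10 scale
plus, minus, p_value = one_sample_sign_test(ratings, hypothesised_median=6)
print(plus, minus, p_value)
```

Only the signs (+/-) of the differences are used, not their magnitudes, which is why the test suits ordinal data
but is less efficient than a parametric test on the same observations.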

Non-parametric methods

• Do not suppose any particular distribution,
• Quick and easy to use, with no laborious computations, since observations are placed in rank order, or
sometimes just given signs (+/-),
• Not so efficient: they do not use all the information available, but use groupings or rankings, with a
resulting loss in efficiency,
• Can be satisfactorily used with not-so-accurate data,
• Can be used for ordinal or nominal scale data, which parametric tests cannot.
