STATISTICS
DEPARTMENT OF CONSTRUCTION ECONOMICS AND
MANAGEMENT
CEDAT
SEMESTER II, 2014
BSC(CM) II, BS (QS) III, BS (LE) III
3/5/2014 1
TUTORS: Godfrey Mwesige
BSC (Civil Eng.), Mak; MSC (Civil Eng.), Illinois; Dip. Road
Traffic Safety, Lund
Mr. Julius Ssemanda
BS(QS); Mak
COURSE OUTLINE
PART I: RESEARCH METHODS
Categories of research
Surveys and Experiments
Data Collection Methods
Sampling Errors and Mitigation Measures
Design of Survey Data Collection Instrument
Questionnaire
Planning and Designing a Research Study
Ethical Considerations in Research
Disseminating Research Results
COURSE OUTLINE
PART II: STATISTICS
Nature of Data
Data Types
Descriptive Statistics; Collect, Present and Characterize
Data; Measures of Location and Dispersion/Variability.
Statistical Inference; Estimation and Hypothesis Testing.
Linear Regression and Correlations.
REFERENCES
Marczyk, G., Dematteo, D. & Festinger, D. (2005). Essentials
of Research Design and Methodology. John Wiley & Sons Inc.
Johnson, R., Freund, J. & Miller, I. (2011). Miller and
Freund's Probability and Statistics for Engineers. 8th Edition,
Prentice Hall.
Blaxter, L., Hughes, C. & Tight, M. (2006). How to
Research. 3rd Ed., Open University Press.
Any other Reference Texts on Research Methods and
Statistics
Course Management
Mode of Delivery: Lectures, Labs and Class
Exercises. All compulsory.
Continuous Assessment (40%):
Coursework (20%): 30th April, 2014.
Two Tests (20%): Dates to be communicated
Final Exam (60%): Covering entire course
content
Introduction
Research methods are a scientific approach to solving problems, real or
abstract, following scientific steps: problem definition, questions,
hypotheses and objectives, data collection, analysis and conclusion.
Research mainly follows an empirical approach, which relies on direct
observation and experimentation in the acquisition of new
knowledge (Kazdin, 2003).
Direct observation of a population (surveys) or manipulation of
variables to influence an outcome (experiments) are the major sources of
data.
Both surveys and experiments involve some degree of measurement
to obtain data. These measurements must be carefully carried out to
obtain accurate data.
In this course, we shall concentrate on methods and measurement
techniques of data collection, analysis and reporting in both parts.
Categories of Research
There are two broad categories of research a researcher should be familiar with:
qualitative and quantitative.
Quantitative research involves studies that make use of statistical analyses to
obtain their findings. Key features include formal and systematic measurement
and the use of statistics.
Qualitative research involves studies that do not attempt to quantify their
results through statistical summary or analysis. Qualitative studies typically
involve interviews and observations without formal measurement. A case study,
which is an in-depth examination of one person, is a form of qualitative
research. Qualitative research is often used as a source of hypotheses for later
testing in quantitative research.
In either category of research, either primary or secondary data is
required:
Primary data are obtained by the researcher through surveys and experimentation.
Secondary data are data that were collected for some purpose other than your study.
Examples: government records, internal documents, previous surveys.
Sources of Data by Categories
Qualitative
Focus Group
In-Depth Interview
Case Study
Participant observation
Secondary data analysis
Quantitative
Questionnaires
Experiments
Structured observation
Secondary data analysis
Surveys and Experiments
Survey: to collect data for the analysis of some aspect
of a group or area.
Measures things as they are; that is, a snapshot of the population at one
point in time.
You cannot change the variables being measured, only record them
as they are.
Often used as a synonym for questionnaire.
Methods of data collection include telephone, self-administered,
personal interview, etc.
Experiment: manipulate at least one variable
(treatment) to evaluate the response, to study cause-effect
relationships.
Can be either field or laboratory experiments.
Example: measuring the effect of cement content on the crushing
strength (response) of blocks is an experiment. Cement content is
the variable that is being manipulated.
Data Collection Methods: Surveys
Most projects in construction management,
valuation and quantity surveying will mainly
involve surveys and not many experiments.
How can we collect data in surveys?
Face to face (talk)
Telephone: call the respondent and ask a series of
questions
Computer/Internet: design and send the survey
instrument online, e.g. using SurveyMonkey
Mail (now outdated, as postal mail does not work efficiently)
Observe (on-site)
Gather secondary sources
General Guidelines for Suitable Method of
Data Collection
1. Describing a population: surveys
2. Describing users/visitors: on-site survey
3. Describing non-users, potential users or general population: household survey
4. Measuring impacts, cause-effect relationships: experiments
5. Short, simple household studies: phone
6. Captive audience or very interested population: survey (self-administered
questionnaire)
7. Testing new ideas: experiments or focus groups
8. In-depth study: surveys (in-depth personal interviews, focus groups, case
studies)
9. Any time suitable secondary data exists: secondary data
Survey Implementation Process
Survey planning: define the problem & hypotheses;
Survey design: background info, design procedures,
organization, sampling, drafting questions, constructing
survey;
Field implementation: pretesting, training, briefing,
interviewing & data collection;
Data preparation: coding & data entry, cleaning,
programming, compiling; and
Data analysis: analysis, testing, reporting, using.
Survey Data Collection Methods: In-depth
Interviews
One person answers detailed questions;
Can answer many open-ended questions
Can be recorded (audio or video)
Usefulness;
Explorative research,
Cannot be generalized, and
Mainly useful for qualitative research
Meeting with people face-to-face can be the most effective
method of asking questions.
It is also perhaps the most expensive and time consuming.
Survey Data Collection Methods: Focus
Groups
Qualitative method similar to in-depth interview.
A group of selected persons answers detailed
questions in a group discussion led by a
moderator.
Person selection.
Homogeneity (similar attributes) vs. heterogeneity
(different attributes)
Ideally 5-7 persons.
Many open questions.
Documented with video taping.
Generalization of results.
Merits and Demerits of Data Collection
by Focus Groups
Merits
Relaxed atmosphere.
Less pressure on
individuals.
Productive and creative
character of discussion.
Insights into attitudes and
ideas of people.
Demerits
Hard to recruit people
Importance of moderator
Expensive or costly
Time consuming data analysis
Generalization of results
Survey Data Collection Methods:
Telephone Interviews
Calling subjects on the telephone can
be a very effective way of
interviewing:
It is faster, less expensive in general.
But people are more likely to refuse to
participate.
Survey Data Collection Methods: Mail
Surveys
Survey instruments (questionnaires)
can be mailed to subjects for them to
fill out on their own and mail back.
Can be done in large numbers.
Often results in a high rate of errors and non-response.
Merits and Demerits of Mail Surveys
Survey Data Collection Methods:
Internet/Emails
Merits
Speed
Practically no cost once the set
up has been completed.
Can include pictures and sound.
Some webpage surveys can use
complex skip logic,
randomizations, etc, not generally
possible with paper or email
surveys
People may respond more
honestly to sensitive questions
when giving their answers to a
computer.
Web page surveys eliminate the
need for data entry.
Demerits
Only reaches people with email/internet
access, so it does not reflect the population
as a whole.
People dislike unsolicited email.
Email addresses needed.
People can easily quit in the middle of
the survey.
No control over who replies to (pop-up)
webpage surveys.
Multiple responses/ignored instructions
(advanced systems can check such
things).
Most email surveys cannot use skip
logic, randomization, etc.
Enhancing Quality of Survey Data
Categories of Survey Errors
Sampling errors
Sampling errors are the random errors that are introduced into the
survey simply because not every member of the survey population is
included in the drawn sample.
Sampling errors reflect the potential variability between the estimate of a
parameter in the sample and its true value in the population.
These errors affect the precision of the survey results.
Nonsampling errors and biases
Caused by problems that can occur during the survey design and data
collection stages which may cause survey measures and parameter
estimates to be systematically incorrect.
Nonsampling errors reflect how well the information is collected, and
include nonresponse biases often caused by refusals and response
biases which reflect a systematic distortion of survey responses.
These errors affect the accuracy of the survey results.
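The difference between the two error categories can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is made up (it is not the class data), and the fixed shift `bias` is a hypothetical stand-in for a faulty instrument. A larger sample tightens the spread of the sample means (better precision), but it does not remove the systematic shift (no gain in accuracy).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 GPA-like values (not the class data).
population = [random.gauss(3.9, 0.5) for _ in range(10_000)]
true_mean = statistics.mean(population)

def spread_of_sample_means(n, bias=0.0, repeats=200):
    """Draw `repeats` random samples of size n and return
    (average of the sample means, standard deviation of the sample means).
    `bias` mimics a faulty instrument that shifts every reading."""
    means = [
        statistics.mean([x + bias for x in random.sample(population, n)])
        for _ in range(repeats)
    ]
    return statistics.mean(means), statistics.stdev(means)

small_avg, small_sd = spread_of_sample_means(25)
large_avg, large_sd = spread_of_sample_means(400)
biased_avg, biased_sd = spread_of_sample_means(400, bias=0.3)

print(f"true mean            : {true_mean:.3f}")
print(f"n=25  (no bias)      : avg={small_avg:.3f} spread={small_sd:.3f}")
print(f"n=400 (no bias)      : avg={large_avg:.3f} spread={large_sd:.3f}")
print(f"n=400 (bias of +0.3) : avg={biased_avg:.3f} spread={biased_sd:.3f}")
```

The spread column shrinks as n grows, while the biased run stays about 0.3 away from the true mean however large the sample is.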
Precision vs Accuracy
(Reliability vs Validity: Bull's-eye example)
Sampling Errors
Non-Sampling Errors
Nonsampling errors occur due to mistakes made
along the process of data acquisition, such as a faulty
questionnaire or measuring instrument.
Increasing sample size will not reduce this type of
error.
There are three types of nonsampling errors;
Errors in data acquisition,
Nonresponse errors,
Selection bias.
Data Acquisition Error
Nonresponse Error
Selection Bias
Strategies to Reduce Errors
Sampling Design.
Design of Survey Instrument
(Questionnaire).
Sampling Design: Key Questions
What are the primary objectives and constraints of the survey?
What are the variables of greatest interest and the desired level of
precision?
How should the study population be defined?
What information is readily available on the variables of interest?
Which sampling method should be used to meet the precision
requirements?
What sample size is required to satisfy these precision requirements?
Are there enough resources to collect such a sample?
What is the precision and confidence level corresponding to a
smaller sample size?
Basic Sampling Concepts
Target Population: specifies all elements of interest to the researcher
within geographical, business or professional boundaries. It could be
construction firms at a macro level or internal elements within individual
construction firms at micro level. This is defined by the purpose of the
research.
Sampling Unit: in the example above, the sampling unit could be
individual construction firms or specific elements within individual firms
that will be compared across a defined spectrum of construction firms.
Sampling Frame:
A list of all sampling units.
Depends on such factors as the target population, the mode of data
collection, and the sampling unit.
Sampling strategy:
Probability sample
Non-probability sample
Class Exercise: Performance Evaluation
Problem
Background:
It is University Policy that individual academic departments invite professors from other
universities as External Examiners every academic year to evaluate the quality of course
assessments in a particular program. In Academic Year 2010/11, the External Examiner
evaluated the quality of examination questions, marking guides, and the performance of five (05)
randomly selected students from the BSC (CM) program for both Semesters I and II. In his
assessment, he observed that in general, Semester II performance was better than Semester I and
that female students performed even better. However, it is not clear to the Quality Control
Directorate whether to uphold the observation of the external examiner or not.
Questions that have to be answered to solve the problem:
1) Is the professor's assessment valid? (This is the problem.)
Not sure; we need data.
2) How can we prove the validity of the professor's observation?
We need to collect data on the performance of students in the two semesters in question.
Class Exercise: Performance Evaluation
Problem
3) What type of data is required to prove the professor's observation?
Quantitative: GPA of CM II students, academic year 2010/11, in Semesters I & II, male
and female.
4) How can we collect this data?
Seek data from the CEDAT registrar (secondary data source) or collect it from the students
themselves (primary data source).
5) What is the target population?
CM II students.
6) What is the sampling frame?
The list of all CM II students. The problem has two strata, Semester (I & II) and Gender
(male & female), so random sampling by strata is advised.
7) What method of data collection should we use?
The data required is resident in the target population; that is, it is already there. So a survey
using a questionnaire is appropriate. Alternatively, obtain secondary data from the CEDAT registrar.
Class Exercise: Performance Assessment
Problem
9) How should the sample be drawn from the Population?
Random sampling of the population in respective strata (Male and Female).
10) How can we control sampling errors?
Random sampling with an adequate sample size keeps sampling error small. It also avoids
selection bias, such as picking only smart students who sit together, which would influence the outcome.
11) How should the data collection instrument be designed?
What data needs to be collected in the instrument?
The data collection instrument should be a questionnaire. It should contain only
the data required to solve the problem, and none that can be used to positively
identify the respondent. This gives the respondent the confidence to give accurate
data. Remember this in all your future research endeavors!
Questionnaire for the Class Exercise:
Performance Evaluation Problem
Makerere University
College of Engineering, Design, Art and Technology
School of Built Environment
Department of Construction Economics and Management
RESEARCH QUESTIONNAIRE:
Evaluation of Performance of BSC(CM) Program for Academic Year 2010/11
Background:
Dear Respondent, the Department of Construction Economics and Management is
carrying out research as per the above title. You have been identified as one of the
key respondents in the study population. Kindly fill in this questionnaire and return it
to the researcher as soon as you can. All your responses will be treated with utmost
confidentiality, and we also promise to communicate to you the results of the
research as soon as it is concluded.
Thanks
Researcher.
Questionnaire for the Class Exercise:
Performance Evaluation Problem
Academic Year: 2010/11
Sex of the Respondent (tick appropriately)
[ ] Male   [ ] Female
Grade Point Average (GPA) for Semester I (Write to 2 decimal places)
: ________
Grade Point Average (GPA) for Semester II (Write to 2 decimal places)
: ________
End of Questionnaire
Thank You
Questionnaire Data Collection Results
Grade Point Averages (GPA) for Semester I & II, Academic Year 2010/11 for BSC (CM), Year II
SEM I    SEM II   SEM I    SEM II   SEM I    SEM II   SAMPLE A  SAMPLE B  SAMPLE C
(M+F)    (M+F)    (M)      (M)      (F)      (F)      (SEM I)   (SEM I)   (SEM I)
4.93     4.75     4.93     4.75     3.53     3.50     3.70      4.02      3.96
4.55     4.29     4.55     4.29     2.20     3.00     3.95      3.91      4.56
4.52     4.46     4.52     4.46     4.52     4.30     3.93      4.23      3.56
4.50     4.28     4.50     4.28     3.70     3.53     4.28      4.27      3.30
3.95     3.97     3.95     3.97     4.59     4.49     3.87      3.00      3.98
3.41     3.12     3.41     3.12     4.55     4.35     -         4.27      4.27
3.96     3.80     3.96     3.80     3.82     3.81     -         4.36      4.23
3.93     3.93     3.93     3.93     -        -        -         4.30      3.73
4.66     4.52     4.66     4.52     -        -        -         4.36      4.10
4.20     4.32     4.20     4.32     -        -        -         3.91      -
3.75     3.70     3.75     3.70     -        -        -         2.93      -
3.67     3.05     3.67     3.05     -        -        -         3.09      -
4.56     4.30     4.56     4.30     -        -        -         3.88      -
3.56     3.70     3.56     3.70     -        -        -         -         -
3.50     3.01     3.50     3.01     -        -        -         -         -
3.80     3.93     3.80     3.93     -        -        -         -         -
4.02     3.88     4.02     3.88     -        -        -         -         -
4.28     4.19     4.28     4.19     -        -        -         -         -
3.92     3.74     3.92     3.74     -        -        -         -         -
4.02     3.71     4.02     3.71     -        -        -         -         -
3.87     3.66     3.87     3.66     -        -        -         -         -
3.91     3.98     3.91     3.98     -        -        -         -         -
4.23     4.20     4.23     4.20     -        -        -         -         -
4.27     4.14     4.27     4.14     -        -        -         -         -
3.00     2.97     3.00     2.97     -        -        -         -         -
4.27     4.09     4.27     4.09     -        -        -         -         -
4.36     3.35     4.36     3.35     -        -        -         -         -
2.48     3.12     2.48     3.12     -        -        -         -         -
3.30     3.20     3.30     3.20     -        -        -         -         -
3.98     4.55     3.98     4.55     -        -        -         -         -
4.27     3.54     4.27     3.54     -        -        -         -         -
4.30     4.32     4.30     4.32     -        -        -         -         -
4.36     4.07     4.36     4.07     -        -        -         -         -
3.91     4.34     3.91     4.34     -        -        -         -         -
2.93     3.14     2.93     3.14     -        -        -         -         -
3.09     3.89     3.09     3.89     -        -        -         -         -
3.88     3.56     3.88     3.56     -        -        -         -         -
4.23     4.48     4.23     4.48     -        -        -         -         -
4.48     4.48     4.48     4.48     -        -        -         -         -
3.73     3.02     3.73     3.02     -        -        -         -         -
4.44     2.96     4.44     2.96     -        -        -         -         -
4.48     4.56     4.48     4.56     -        -        -         -         -
4.10     3.12     4.10     3.12     -        -        -         -         -
3.53     3.50     -        -        -        -        -         -         -
2.20     3.00     -        -        -        -        -         -         -
4.52     4.30     -        -        -        -        -         -         -
3.70     3.53     -        -        -        -        -         -         -
4.59     4.49     -        -        -        -        -         -         -
4.55     4.35     -        -        -        -        -         -         -
3.82     3.81     -        -        -        -        -         -         -
Summary Statistics: Measures of
Location and Variance
Statistical Descriptors    SEM I    SEM II   SEM I    SEM II   SEM I    SEM II   SAMPLE A  SAMPLE B  SAMPLE C
                           (M+F)    (M+F)    (M)      (M)      (F)      (F)      (SEM I)   (SEM I)   (SEM I)
Mean                       3.97     3.85     3.99     3.85     3.84     3.85     3.95      3.89      3.97
Standard Error             0.08     0.07     0.08     0.08     0.32     0.21     0.09      0.15      0.13
Median                     4.00     3.91     4.02     3.93     3.82     3.81     3.93      4.02      3.98
Mode                       4.27     3.12     4.27     3.12     #N/A     #N/A     #N/A      3.91      #N/A
Standard Deviation         0.56     0.53     0.51     0.53     0.85     0.55     0.21      0.53      0.39
Sample Variance            0.31     0.28     0.26     0.28     0.72     0.30     0.04      0.28      0.15
Kurtosis                   1.55     1.14     0.96     1.13     1.83     1.19     2.17      0.38      0.21
Skewness                   1.12     0.29     0.91     0.29     1.29     0.33     0.97      1.08      0.32
Range                      2.73     1.79     2.45     1.79     2.39     1.49     0.58      1.43      1.26
Minimum                    2.20     2.96     2.48     2.96     2.20     3.00     3.70      2.93      3.30
Maximum                    4.93     4.75     4.93     4.75     4.59     4.49     4.28      4.36      4.56
Sum                        198.47   192.37   171.56   165.39   26.91    26.98    19.73     50.53     35.69
Count                      50.00    50.00    43.00    43.00    7.00     7.00     5.00      13.00     9.00
Largest(1)                 4.93     4.75     4.93     4.75     4.59     4.49     4.28      4.36      4.56
Smallest(1)                2.20     2.96     2.48     2.96     2.20     3.00     3.70      2.93      3.30
Confidence Level(95.0%)    0.16     0.15     0.16     0.16     0.79     0.51     0.26      0.32      0.30
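As a cross-check, the Sample A (SEM I) column can be recomputed outside Excel. The sketch below uses the five Sample A values from the data collection results and Python's standard `statistics` module. The critical value 2.776 (two-sided 95%, 4 degrees of freedom) is taken from standard Student's t tables and is an assumption, not something stated in the slides.

```python
import math
import statistics

# Sample A (SEM I) GPAs, copied from the data collection results table.
sample_a = [3.70, 3.95, 3.93, 4.28, 3.87]

n = len(sample_a)
mean = statistics.mean(sample_a)
median = statistics.median(sample_a)
variance = statistics.variance(sample_a)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample_a)       # square root of the sample variance
std_error = std_dev / math.sqrt(n)         # standard error of the mean
data_range = max(sample_a) - min(sample_a)

# Two-sided 95% critical value of Student's t for n - 1 = 4 degrees of
# freedom, from standard t tables (an assumption; not given in the slides).
T_CRIT_DF4 = 2.776
half_width = T_CRIT_DF4 * std_error        # half-width of the 95% interval

print(f"mean={mean:.2f} median={median:.2f} sd={std_dev:.2f} "
      f"var={variance:.2f} se={std_error:.2f} range={data_range:.2f} "
      f"ci95=+/-{half_width:.2f}")
```

Rounded to two decimal places, these values should agree with the Sample A column of the summary table, including the "Confidence Level(95.0%)" row.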
Data Inspection: Overview of
Descriptive Statistics
The mean GPA for Semester I (3.97) is higher than that for Semester II (3.85), in
which case you would conclude that Semester II performance was in fact lower
than Semester I, and hence that the professor's observation was not correct.
Not necessarily true!
This conclusion would be based only on a measure of location (the mean), neglecting
measures of dispersion (standard deviation) that would tell us whether the
difference is significant. We will deal with this later.
A similar observation holds true for female student performance. Notice how
the sample mean varies with sample size, and recall how the samples were selected. The
sampling method and sample size both affect sampling error.
You ought to appreciate how the data was collected, the reason, and therefore the
kind of analysis we will do from now on in the course.
In this lab, you are required to reproduce these results yourself. Download
the Excel file from your Group Email.
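A rough first look at why the mean alone is not enough is to compare the difference in means with its standard error, using the values from the summary statistics. This is only a sketch: it assumes the two semesters can be treated as independent samples, which is a simplification because the same students sat both semesters, and it is not the formal test covered later in the course.

```python
import math

# Summary values for SEM I (M+F) and SEM II (M+F) from the summary table.
mean_sem1, se_sem1 = 3.97, 0.08
mean_sem2, se_sem2 = 3.85, 0.07

difference = mean_sem1 - mean_sem2
# Standard error of the difference, ASSUMING the two semesters are
# independent samples (a simplification: the same students sat both).
se_difference = math.sqrt(se_sem1**2 + se_sem2**2)
ratio = difference / se_difference

print(f"difference={difference:.2f}, se={se_difference:.2f}, ratio={ratio:.2f}")
```

The difference is only a little larger than one standard error, so under this assumption it is well within what dispersion in the data could produce by chance.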
Steps in Sampling
Define study population.
Specify sampling frame and unit.
Specify sampling method.
Determine sample size.
Choose sample.
Sampling Methods
Simple random sample.
Systematic sample.
Stratified sample.
Cluster sample.
Choice-based sample.
Combinations
Simple Random Sampling
All members of a given population have the same chance of
being selected.
Selection of each member must be independent from the
selection of any other member of the sample.
Put every member of the population in a pool, then
randomly select from that pool, e.g. a lottery.
Assign a single number to each element in the list, then
randomly draw the sample by choosing numbers.
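The numbering approach described above can be sketched in a few lines. The frame of 50 ID numbers and the sample size of 5 are hypothetical choices for illustration, and the seed is there only so the sketch is repeatable.

```python
import random

# Hypothetical sampling frame: ID numbers 1..50, one per population member.
sampling_frame = list(range(1, 51))

rng = random.Random(2014)               # seeded only to make the sketch repeatable
sample = rng.sample(sampling_frame, 5)  # 5 distinct IDs, each equally likely
print(sample)
```

`random.sample` draws without replacement, so no member can be selected twice.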
Systematic Sampling
A starting point in a list is randomly determined and
every kth element of the sampling frame is selected
thereafter.
From a sampling frame of 1,000 a sample of size
100 would be drawn taking every 10th element.
(sampling interval = 1000/100 = 10)
Systematic Sampling
With this approach, rather than randomly selecting from the
sampling frame, the researcher selects sampling units in
sequences separated by a preset interval.
Provided that the sampling frame order is relatively
unbiased, this approach is essentially equivalent to the simple
random sample.
Often, in personal surveys, fieldworkers are instructed to
approach every nth person passing a certain point.
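The procedure (random start, then every kth element) can be sketched as follows, using the 1,000-unit frame and sample of 100 from the example. The function name `systematic_sample` is a hypothetical helper, and the sketch assumes the frame size is a whole multiple of the sample size.

```python
import random

def systematic_sample(frame, sample_size, rng=random):
    """Pick a random start within the first interval, then every k-th unit."""
    k = len(frame) // sample_size          # sampling interval, e.g. 1000/100 = 10
    start = rng.randrange(k)               # random start position in 0..k-1
    return frame[start::k][:sample_size]

frame = list(range(1, 1001))               # hypothetical frame of 1,000 units
sample = systematic_sample(frame, 100, random.Random(7))
print(sample[:5])
```

Only the starting point is random; every later selection is fixed by the interval, which is why the ordering of the sampling frame matters.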
Stratified Sampling
This sampling procedure separates the population into
mutually exclusive sets (strata), and then draws simple
random samples from each stratum.
Stratified Sampling
Stratified sampling is based on the fact that a
homogeneous population produces samples with
smaller sampling errors.
A population is divided into subgroups (strata),
according to certain stratification variables.
Appropriate numbers of elements are drawn
from each stratum proportionately.
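Proportionate allocation can be sketched as below. The strata sizes (43 male, 7 female) follow the counts in the class exercise, but the ID labels, the total sample size of 10 and the helper name `stratified_sample` are hypothetical choices for illustration.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # max(1, ...) keeps small strata represented; because of rounding,
        # the overall total can differ slightly from total_size in general.
        share = max(1, round(total_size * len(units) / population_size))
        sample[name] = rng.sample(units, share)   # simple random sample within stratum
    return sample

# Hypothetical IDs: 43 male and 7 female students, as in the class exercise counts.
strata = {
    "male": [f"M{i}" for i in range(1, 44)],
    "female": [f"F{i}" for i in range(1, 8)],
}
picked = stratified_sample(strata, 10, random.Random(3))
print({name: len(units) for name, units in picked.items()})
```

With these counts the allocation is 9 males and 1 female, which shows how a small stratum can end up thinly represented under strictly proportional allocation.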
Cluster Sampling
Cluster sampling is a simple random sample of groups or clusters of sampling
units. All of the units within a selected cluster may be included in the sample.
Workplace/establishment surveys are examples of cluster surveys. Some small
number of establishments are first selected from the population of all
establishments within a study area. Employees and visitors are then sampled
within the selected establishments.
Households and construction firms grouped by classification are examples of clusters.
This procedure is useful when;
it is difficult and costly to develop a complete list of the population
members.
the population members are widely dispersed geographically.
Cluster sampling may increase sampling error, because of probable similarities
among cluster members.
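A one-stage version of cluster sampling, where whole clusters are drawn at random and every unit inside a chosen cluster is taken, can be sketched as follows. The firm names and employee labels are hypothetical.

```python
import random

# Hypothetical clusters: construction firms and their employees.
firms = {
    "Firm A": ["A1", "A2", "A3"],
    "Firm B": ["B1", "B2"],
    "Firm C": ["C1", "C2", "C3", "C4"],
    "Firm D": ["D1", "D2", "D3"],
}

rng = random.Random(11)                       # seeded only for repeatability
chosen_firms = rng.sample(sorted(firms), 2)   # random sample of clusters, not of people
# One-stage variant: take every unit inside the chosen clusters.
sample = [person for firm in chosen_firms for person in firms[firm]]
print(chosen_firms, sample)
```

Only a list of firms is needed to draw the sample, not a list of every employee, which is the practical advantage noted above.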
Questionnaire Construction
Good questionnaires are more than sets of well written questions.
How items are put together into a complete survey instrument is also very
important.
The questionnaire should be organized in a concise manner that keeps the
interest of the respondents, encouraging them to complete the entire
questionnaire.
The format is as important as the words;
Spread the questions out, use white space.
A good layout should minimize errors by making directions, questions and
response items clear and easy to follow.
Questionnaire Construction Guidelines
A good questionnaire must be well designed:
Keep the questionnaire as short as possible.
Ask short, simple, and clearly worded questions.
Start with demographic questions to help respondents get started comfortably.
Use dichotomous and multiple choice questions.
Use open-ended questions cautiously.
Avoid using leading questions.
Pretest a questionnaire on a small number of people.
Think about the way you intend to use the collected data when preparing the
questionnaire.
Questionnaire Construction: What to
Avoid
Confusing Questions
Outside respondent's experience
Technical terms
Assuming knowledge
Overfamiliarity
Uncommon idioms/slang
Ambiguous questions
Confusing two-part questions
Indefinite comparisons
Indefinite persons/places, e.g. neighborhoods.
Incomplete questions
Imprecise
Indefinite in time
Loaded Questions
Provide unfair alternatives
Link personalities to questions
Link institutions to questions
Planning and Designing a Research
Study
Steps in Planning and Designing a
Research Study
Choose a research topic
Carry out literature review
Formulate a research problem
Define objectives
Articulate research hypothesis
Choose variables to study
Define the methods: Tools and sampling plan
Collect the necessary data
Analyze the data: Descriptive and inferential statistics
Report Findings based on the hypothesis and collected data precisely and clearly
State any major constraints and/or challenges to the study.
Choosing a Research Topic
Out of interest: something at heart!
Problem solving: real or imaginary, e.g. improve productivity,
efficiency, scheduling or financial control of large construction projects.
Previous research: continuation of unfinished research.
Testing a theory about something.
Advice from a research supervisor or funder.
Example: "Measuring productivity of masons on building projects below and above
window levels".
Examples of Past Student Research Titles
Name              Year  Program  Research Title
John Musiime      2012  CM       Development of Visual Basic Application in Excel for
                                 Management of Road Construction Projects in Uganda
Angella Uwase     2011  QS       Assessment of the Effect of Equipment Availability and
                                 Costs on Road Maintenance Projects
Edgar Mukata      2013  CM       Material Management on Large Building Construction Sites: A
                                 Case Study of Kampala City
Timothy Ndifuna   2013  CM       Assessing the Quality of Burnt Clay and Earth Bricks used in
                                 Building Projects
Angella Asiimwe   2011  QS       Empirical Research on Unit Rate of Paving a Kilometer of
                                 Road in Uganda: A Case of Kubiri-Northern By-Pass Link
Jonathan Gombya   2013  CM       The Extent of Delays and Cost Overruns on Private Building
                                 Projects in Uganda
Bongole Lutaaya   2012  CM       Evaluation of the Extent of Use of Partition Walling in
                                 Office Buildings in Kampala
Darlius Tumwiine  2011  CM       Assessing the Quality of Precast Concrete Blocks on Market
                                 in Kampala
Literature Review
Literature review simply means getting familiar with published work about
the topic. Example: Productivity of Masons.
The objective is to get an understanding of how the topic has been
addressed by other researchers: definitions, measurement criteria, variables,
methods and tools of data collection, data analysis and reporting.
The information is summarized in a good format that guides the reader
into deeper understanding of the research topic.
Literature review follows approved referencing and citation formats; for
example APA (American Psychological Association) that Makerere
University uses.
Literature Review: Citation and
Referencing in APA
The objective of citation and referencing is to avoid plagiarism, which is
using someone's work without attribution. It is a major academic offense.
Citation formats in the APA system are given as a handout as part of the Class
Notes. An example in Mendeley will be demonstrated in class.
Remember to discuss the review coherently and clearly so as not to
confuse the reader.
If you find that at the end of the review, you are unable to proceed with
any of the problem definition, methods, variables, objectives, etc, it
simply means you have not exhausted the literature on the subject matter.
Formulating a Research Problem
The research problem should clearly and
precisely define the problem of the study topic.
Good research problems must meet three criteria (see Kerlinger, 1973). First, the
research problem should describe the relationship between two or more variables. Second,
the research problem should take the form of a question. Third, the research problem
must be capable of being tested empirically (i.e., with data derived from direct
observation and experimentation) (Marczyk, Dematteo & Festinger, 2005).
The research question must indicate what is being studied clearly and
precisely. In the example topic, the research question(s) as part of the
problem definition would be: "Is the productivity of masons on building projects
different working below and above window levels? If so, what factors greatly influence
this productivity?" These are example research questions for the title.
Define the Objectives
The objective is the main intent of the study, similar to the
study title rephrased.
If the title is "Measuring productivity of masons on building projects
below and above window levels", then the objective would be:
"To measure productivity of masons on building projects below and
above window levels".
There are also specific objectives, which are subtasks that
have to be fulfilled to realize the general objective.
Define the Objectives: Specific Objectives
The specific objectives represent subtasks that have to be
performed in the study to realize the main objectives and/or
solve the problem.
Simply put, they are a breakdown of the activities that have to be
performed in the study to realize the main objective.
There may be several specific objectives, depending on the extent of the
study.
Research Hypothesis
A hypothesis is an educated and testable guess to an answer to the
research question (Marczyk, Dematteo & Festinger, 2005).
It is an attempt to explain, predict, and explore the phenomenon of
interest.
There are two types: null and alternative hypotheses.
The null hypothesis states that there is no difference or effect,
while the alternative states the contrary and usually carries the
researcher's educated guess.
The hypothesis could be directional or non-directional.
Research Hypothesis
The null hypothesis predicts no difference between variables measured
across time or treatment levels.
For example, in the earlier class exercise, the null hypothesis would be:
Ho: There is no difference in performance between Sem I and II of CM II
students in 2010/11.
The alternative (Ha) predicts there will be a difference between performance
in the two semesters.
However, the professor's observation is directional, so it is stated as the alternative hypothesis:
Ha: Performance of CM II students in 2010/11 was greater for Sem II than
Sem I.
Ho: Performance of CM II students in 2010/11 was not greater (less or
equal) for Sem II than Sem I.
Choosing Variables to Study
A variable is anything that can take on different values. For example, height,
weight, age, race, attitude, and IQ are variables because there are different
heights, weights, ages, races, attitudes, and IQs (Marczyk, Dematteo &
Festinger, 2005).
In construction management, valuation and quantity surveying, variables
may include; productivity, compressive strength of concrete elements, cost per unit,
consumption per unit time, etc.
In the class exercise, the main variable was GPA because it varies
amongst students, and is a measure of academic performance.
In planning your study, identify the variables that will be measured. It is
for these that you design a data collection method.
Many categories of variables exist for a research study.
Choosing Variables to Study: Types
Two main types of variables exist:
Dependent: a measure of the effect (if any) of the independent
variable, e.g. GPA for performance as in the class exercise.
Independent: the factor that is manipulated or controlled by the
researcher and is independent of the outcome being
measured. These were not considered in the exercise but
could include sex, aggregate weighted score at Advanced level,
attendance per semester per student, etc.
With the independent variables, the researcher seeks to explain
or predict the dependent variable; they are therefore
useful in regression and correlation analysis.
Categorical vs Continuous Variables
Categorical variables are variables that can take on
specific values only within a defined range of values.
Examples: gender, marital status, class of construction
firms, class of academic degrees, etc.
Continuous variables are variables that can theoretically
take on any value along a continuum.
Examples: height, weight, income, volume,
time, area, etc.
Research Methodology : Methods
There is a distinction between methods and methodology.
According to Blaxter, Hughes & Tight (2006), the
distinction is as follows;
Method means the tools of data collection or analysis: techniques
such as questionnaires and interviews.
Methodology means the approach or paradigm that underpins
the research. That is, whether it is a quantitative or
qualitative research; experiment or survey. A clear
statement must be defined in the methodology citing
reasons based on literature review and/or experience.
Choosing a Research Method: Factors
(Blaxter, Hughes and Tight, 2006)
Research Questions: What exactly are you trying to find out? This can lead you either into
the quantitative or qualitative direction.
Are you interested in making standardized and systematic comparisons or do you really want to
study this phenomenon or situation in detail? Experiment or survey.
The Literature: How have other researchers dealt with this topic? To what extent do you
wish to align your own research with standard approaches to the topic?
Practical Considerations: Issues of time, money, availability of samples and data, familiarity
with the subject under study, access to situations, gaining cooperation.
Knowledge payoff: Will you learn more about this topic using quantitative or qualitative
forms of research? Which approach will produce more useful knowledge? Which
will do more good?
Style: Some people prefer one approach to another.
3/5/2014 66
Choosing a Research Method: Details
Required
The following need to be stated with respect to the research
question, hypothesis or specific objectives;
Nature of data collection (survey or experiment),
Data collection instrument: questionnaire for survey
and experiment design for experiments stating the
equipment and setup,
Target population,
Sampling frame,
Sampling method, and sample size,
Data collection, preparation and analysis; statistical or
other tests to be used to analyze the data, and
How the expected outcome will be reported.
3/5/2014 67
Data Collection: Measurements
The main concern in data collection is method of
measurement.
There are two reasons why measurement technique is
important in data collection;
Allows the researcher to quantify the variables
The level of statistical sophistication required to analyze the data
depends on the scale of measurement used to quantify the variables.
There are four main measurement scales; nominal,
ordinal, interval and ratio.
The measurement scale and equipment affect the
reliability and validity of measurements.
3/5/2014 68
Data Collection: Nominal Scale
Characteristics
Used only to qualitatively classify or categorize not to
quantify.
No absolute zero point.
Cannot be ordered in a quantitative sequence.
Impossible to use to conduct standard mathematical
operations.
Examples include; gender, religious and political affiliation,
and marital status.
Purely descriptive and cannot be manipulated
mathematically.
3/5/2014 69
Data Collection: Ordinal Scale
Characteristics
Build on nominal measurement.
Categorize a variable and its relative magnitude in relation to
other variables.
Represent an ordering of variables with some number
representing more than another.
Information about relative position but not the interval between
the ranks or categories.
Qualitative in nature.
Example would be finishing position of runners in a race, position in
class etc.
Lack the mathematical properties necessary for sophisticated
statistical analyses.
3/5/2014 70
Data Collection: Interval Scale
Characteristics
Quantitative in nature.
Build on ordinal measurement.
Provide information about both order and distance
between values of variables.
Numbers scaled at equal distances.
No absolute zero point; zero point is arbitrary.
Addition and subtraction are possible.
Examples include temperature measured in Fahrenheit and
Celsius.
Lack of an absolute zero point makes division and
multiplication impossible.
3/5/2014 71
Data Collection: Ratio Scale
Characteristics
Identical to the interval scale, except that they
have an absolute zero point.
Unlike with interval scale data, all
mathematical operations are possible.
Examples include; height, weight, and time.
Highest level of measurement.
Allow for the use of sophisticated statistical
techniques.
3/5/2014 72
Data Collection: Main Approach
Methods
Formal testing; using scientific equipment/tools and
procedures.
Interviewing; in a questionnaire by answering a series of
guided questions.
Global ratings (also known as a Likert scale); such as personal
attitude about a statement: strongly disagree, disagree, agree or
strongly agree. E.g. on a scale of 1 to 5, indicate the extent to which you
enjoy the course so far.
1 (Hate it)   2   3 (Neutral)   4   5 (Love it)
Observation; has to do with time and a defined group.
3/5/2014 73
Data Preparation, Analyses and
Interpretation: Data Preparation
The process of data analysis involves three main
steps;
(1) preparing the data for analysis,
(2) analyzing the data, and
(3) interpreting the data (i.e., testing the research hypotheses and
drawing valid inferences).
Data preparation includes;
1. Logging and tracking data
2. Data screening
3. Constructing a database
4. Data entry and coding missing variables
5. Data transformation if necessary: square root, log or inverse
transformation of variables.
3/5/2014 74
Data Preparation, Analyses and
Interpretation: Data Analysis
1. Descriptive Statistics:
General description to summarize data: frequency tables,
histograms, bar charts, boxplots, stem-and-leaf plots.
Measures of central tendency or location: mean, median, mode.
Measures of dispersion or variation: range, variance, standard
deviation, interquartile range.
Measures of association: correlation coefficient, coefficient of
determination, and Pearson coefficient for ratio or interval scale data,
Spearman Rank order for ordinal data.
3/5/2014 75
Data Preparation, Analyses and
Interpretation: Data Analysis
2. Inferential Statistics:
T-Test: to test mean difference between two groups on
interval or ratio scales.
Analysis of Variance (ANOVA): comparing means across more
than two groups, an extension of the t-test.
Chi-Square ($\chi^2$): Used to test hypotheses when the data is
nominal or ordinal. Summarizes the discrepancies between
observed and expected values.
Regression: Simple linear, multiple and logistic regression.
3/5/2014 76
Interpreting Data and Drawing Inferences
Data collection and statistical testing is about solving a
problem, by answering a question through hypothesis testing
using collected data so as to arrive at a conclusion.
Therefore, based on data collected, you will test the
hypothesis and make a statement about the findings
statistically and the meaning of the outcome in real sense.
Ordinary readers understand real sense not your statistics, but
statistics is a tool to arrive at a good conclusion.
In Part II of this course, we will explore statistical approaches to
descriptive and inferential statistics necessary for data analysis.
3/5/2014 77
Ethical Considerations in Research
Fundamental Ethical principles;
Respect for persons, especially those over whom you hold authority.
Justice in the way you choose research participants, so as to avoid bias.
Confidentiality; to treat results of your respondents with utmost
confidentiality, not to be used in a form that can harm or injure your
respondents. E.g. studying about robbers at the university and later
disclosing to police who they are is unethical.
Not to use minors and impaired persons in research because they do not
make rational decisions.
Protection of research assistants from danger of the research;
appropriate protection, and safety measures.
Avoid plagiarism; which is falsifying work, or copying someone else's work
without due attribution. It is an offense. There are principles on how to
avoid this. See the handout on referencing and citation in APA.
END OF PART I
3/5/2014 78
Course Work: Write a Research Proposal (20%)
(Date of Issue:)
Instructions:
1) Following the concepts learnt so far in class, you are required to write a
research proposal, bound and submitted in a hard copy to the Tutor
not later than .
2) You are expected to follow and apply most of the principles learnt in
this course.
3) Take this exercise seriously. Devote time and effort to it.
4) All work or ideas produced MUST be original and reflect your own
effort. Reproducing past work will be detected and will score no mark.
5) The length of proposal should NOT BE MORE THAN 10 PAGES.
3/5/2014 79
Format of the Research Proposal
1) Title Page: This should include the University Name, College, School and
Department, Title of research Topic, Name of Researcher and Registration
Number (You), and Name of Supervisor (Tutor), and this text: "A Research
Proposal submitted to the Department of Construction Economics and Management for the
Award of a Degree of Bachelor of Science in Construction Management of Makerere
University", and finally, Month and Year. This should fit on one full page.
2) Table of contents, List of figures /Tables, and List of acronyms.
3) Chapter 1: Introduction including;
a. a background to your research topic, identifying the need, social, economic, professional
relevancy and urgency.
b. Problem Statement
c. Main Objective and Specific objectives,
d. Hypothesis,
e. Justification, and
f. scope of the study.
3/5/2014 80
Format of the Research Proposal
3) Chapter 2: Literature Review using APA (American Psychological
Association citation and referencing format) on past work about the research topic;
a. Key definitions of terms in the research topic,
b. Type of research and applicable procedures in the literature,
c. Methods of data collection and measurements in general, and
d. Data analysis approaches.
4) Chapter 3: Methodology;
a. Define the type of research,
b. Method of data collection; Experimentation or survey.
c. Type of Data to be collected including major variables (dependent and independent if
applicable),
d. Target population/experimental setup, sampling frame, sampling methods, sample size,
e. Data collection instrument (s) includes type of data to be collected, and
f. Statement on how data will be analyzed (descriptive or inferential statistics in detail)
3/5/2014 81
Format of the Research Proposal
5) References ( Follow APA format).
6) Time Schedule for the study; break down study per activity and how long it will take you to accomplish
each activity.
7) Estimated Budget for the study; stationery, transport, material purchases, remuneration of research
assistants, hire of equipment if applicable, material testing if applicable.
8) Appendices; copies of standards to use, questionnaire, or photos of equipment.
NB:
You are collecting data from different sources, kindly write it logically and coherently so as to interest the
reader. Do not look at this as a mere class exercise. Just assume the TUTOR is a businessman with
capacity to finance your research and you want him to get interested and fund it. In this tone, you will
present the best of everything stated above, coherently and clearly so as to attract the necessary interest!
You may have to consult other members of staff, practicing construction managers among other people
to identify a suitable topic and/or problem. You are also free to consult me during working time.
GOOD LUCK
3/5/2014 82
PART II: STATISTICS
Descriptive Statistics
Inferential Statistics
3/5/2014 83
Terminology
A population is a collection of all units of interest.
A parameter is a numerical characteristic of a
population, e.g. mean, standard deviation, variance.
A sample is a subset of a population that is actually
observed.
A random sample gives an equal preassigned chance
to every unit of the population to enter the sample.
The population and its parameters are typically
unknown and the sample is used to infer the values
of the parameters.
A statistic is a numerical characteristic of a
sample, used to infer the values of parameters.
3/5/2014 84
Statistics
Science of collecting and analyzing data for drawing
conclusions and making decisions.
Descriptive Statistics and exploratory data
analysis: summary and description of collected data.
Inferential Statistics: generalizing from a sample to
a population.
3/5/2014 85
Descriptive Statistics
Collect data
e.g. Survey
Present data
e.g. Tables and graphs
Characterize data
e.g. Sample mean
3/5/2014 86
Graphical Methods to Summarize
Data
Graphical methods to summarize data include;
Pie charts, and
Bar graphs.
In each case, data is organized so that data falls into
only one category of variable.
Pie Charts: Used for discrete variables
with few outcomes. Choose a small number of
categories for the variable because too many categories
make the pie chart difficult to interpret.
3/5/2014 87
Graphical Methods: Bar Charts
Used to display frequency data from qualitative variables.
Label frequencies on one axis and categories on the other axis. This can be
done well and quickly in MS Excel.
For histograms, data has to be summarized in a frequency table from which
frequency and relative frequency diagrams are drawn. However, histograms
are only applicable to grouped data.
The decision on how many classes to use depends on the range, your
personal input, and number of class intervals that represent your data the
best.
3/5/2014 88
Histograms
3/5/2014 89
Histogram Characteristics
The location of the modal interval or class keeps shifting with
the number of classes. It is therefore very sensitive to the number
of classes.
If we are to compare two different samples or populations, it is
important that the decision is based on a relative frequency
histogram rather than a frequency histogram as it eliminates
issues of variability in sample sizes.
3/5/2014 90
Histogram Jargon
A histogram with one major peak is called unimodal, that with
two major peaks is bimodal and uniform if every interval has
essentially the same number of observations.
A histogram can also be described based on symmetry of the
distribution created as;
Symmetrical if the right and left sides have essentially the
same shapes, and usually the mean, mode and median coincide.
Skewed to the right if much of the data by frequency is to the
left and very few to the right, and the median and mode are to the
left of the mean. Long tail to the right.
Skewed to the left if much of the data by frequencies is to the
right and few to the left. Long tail to the left. The median and
mode are to the right of the mean.
See Class work Example 1 for the distinction
3/5/2014 91
Histograms for GPA in SEM I & II Exercise
Recall the class exercise carried out in the first part of the
course. Take a close look at the summary of descriptive statistics
(Slide 16).
For SEM I (M+F), note that the median and the mode are to
the right of the mean, suggesting performance is skewed to the
left. That is, you have more students scoring above 3.97 (the
Mean) than below it. Does the histogram show the same?
On the other hand, for SEM II performance, the mode is to the left
of the mean, and the median to the right. Performance is
skewed neither to the left nor to the right. That is, it is nearly symmetrical.
Does it seem so?
3/5/2014 92
Histograms for GPA in SEM I & II
Exercise  Contd
3/5/2014 93
Measures in Descriptive Statistics
Graphical methods are mainly used to summarize
the data and to give a visual trend about the data.
The Histogram for instance will show where the data
is located and variability.
However, magnitude of measures of location and
variability as seen from the histogram has to be
determined. These are referred to as measures of
location and variability or dispersion, summarized in
the next slide.
3/5/2014 94
Summary of Measures
3/5/2014 95
Measures of Location
3/5/2014 96
Mean
3/5/2014 97
Characteristics of the Mean
3/5/2014 98
There is only one mean for a dataset.
It is influenced by extreme measurements.
Means of subsets can be combined to determine
the mean of the complete data set.
Can be used only with quantitative data.
For grouped data the mean is rather stable even
when data is organized into different classes.
Median
3/5/2014 99
The sample median is the middle value in a set
of data that is arranged in ascending order.
For an even number of data points the
median is the average of the middle two.
Characteristics of the Median
3/5/2014 100
There is only one median for a dataset.
Not influenced by extreme measurements.
Medians of subsets cannot be combined to
determine the median of the complete data
set.
Can be used only with quantitative data.
Example
3/5/2014 101
Median for Grouped Data
3/5/2014 102
The median for grouped data is given by;
$$\text{Median} = L + \frac{w}{f}\,(0.5n - cf_b)$$
Where;
L is the lower class boundary of the median class,
w is the class width,
f is the frequency of the median class,
n is the total number of observations, and
$cf_b$ is the cumulative frequency of the class before the
median class.
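The grouped-data median can be sketched in Python as follows. This is only an illustration: the function name `grouped_median` and the frequency table (class boundaries 0, 10, 20, 30 with width 10) are hypothetical and are not taken from the course data. Equal class widths are assumed.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median for grouped data: L + (w / f) * (0.5 * n - cf_b).

    lower_bounds: lower class boundary of each class, in ascending order
    width:        common class width (equal-width classes are assumed)
    freqs:        frequency of each class
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = 0.5 * n
    cum_before = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cum_before + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (width / f) * (half - cum_before)
        cum_before += f
    raise ValueError("median class not found")


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40
example_median = grouped_median([0, 10, 20, 30], 10, [5, 8, 12, 5])
print(round(example_median, 2))
```

Here n = 30, so 0.5n = 15 falls in the third class (L = 20, f = 12, cf_b = 13), giving 20 + (10/12)(15 - 13).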
Mode
3/5/2014 103
The measurement that occurs most often (with the highest
frequency).
The mode exhibits the following characteristics;
There can be more than one mode for a dataset.
Not influenced by extreme measurements.
Modes of subsets cannot be combined to determine the
mode of the complete data set.
Can be used for qualitative and quantitative data.
In grouped data the mode can change depending on the
categories (classes) used.
Mode for Grouped Data
3/5/2014 104
We can define the MODAL INTERVAL to be the
class interval with the highest frequency.
Since we would not know the actual measurements
but only how many measurements fall into each
interval, the mode is taken as the midpoint of the
modal interval. It is an approximation of the mode
of the actual sample measurements.
Percentiles
3/5/2014 105
Median divides data set into two parts of equal
size.
Quartiles divide the data set into 4 equal parts.
The 25th and 75th percentiles are also
referred to as the lower and upper quartiles
respectively.
Percentiles divide the data set into even finer
parts, e.g. the 99th percentile.
Calculating Percentiles
3/5/2014 106
Order the n observations from smallest to largest.
Determine the product np where p is the
proportion required. For example, p=0.25 for the 25th
percentile, p=0.75 for the 75th percentile.
If np is not an integer, round it up to the next integer
and find the corresponding ordered value.
If np is an integer, say k, calculate the mean of the kth
and (k+1)th ordered observations.
Consider the example in the next slide.
Worked Example on Percentiles and
Quartiles
3/5/2014 107
Given the data below, obtain the quartiles
(25th, 50th and 75th percentiles) and the 93rd percentile.
Note the data is already sorted.
221 234 245 253 265 266 271 272 274 276
276 276 278 284 289 290 290 292 292 296
297 298 300 303 304 305 305 308 308 309
310 311 312 314 315 315 323 330 333 336
337 338 343 346 355 364 366 373 390 391
Worked Example on Percentiles and
Quartiles
3/5/2014 108
n=50; for the 25th, 50th, 75th and 93rd percentiles,
p=0.25, 0.50, 0.75, and 0.93 respectively.
25th percentile (lower quartile, Q1);
np=50*0.25 = 12.5, rounded up to 13. That
is the 13th observation = 278.
The second quartile or median; np=50*0.50
= 25, an integer, so we obtain the mean of
the 25th and 26th observations as follows;
(304+305)/2 = 304.5
Worked Example on Percentiles and
Quartiles
3/5/2014 109
75th percentile (upper quartile, Q3); np =
50*0.75 = 37.5, rounded up to the 38th
observation, which is 330.
93rd percentile; np=50*0.93=46.5, rounded
up to the 47th observation, which is 366.
You can compute the interquartile range = Q3 - Q1
= 330 - 278 = 52. It indicates that the middle 50% of
the observations span 52 units, bounded
by 278 and 330.
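The worked example can be checked with a short Python sketch. It applies the rule used in this course (round np up when it is not an integer; average the kth and (k+1)th ordered values when np is an integer k). The function name `percentile` is an illustrative choice, and other software may use a different interpolation rule and so give slightly different values.

```python
import math

# The 50 sorted observations from the worked example
data = [
    221, 234, 245, 253, 265, 266, 271, 272, 274, 276,
    276, 276, 278, 284, 289, 290, 290, 292, 292, 296,
    297, 298, 300, 303, 304, 305, 305, 308, 308, 309,
    310, 311, 312, 314, 315, 315, 323, 330, 333, 336,
    337, 338, 343, 346, 355, 364, 366, 373, 390, 391,
]


def percentile(sorted_values, p):
    """Percentile by the np rule, for a proportion 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    n = len(sorted_values)
    position = n * p
    k = round(position)
    if k >= 1 and abs(position - k) < 1e-9:
        # np is an integer k: mean of the kth and (k+1)th ordered values
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # np is not an integer: round up and take that ordered value
    return sorted_values[math.ceil(position) - 1]


q1 = percentile(data, 0.25)
median = percentile(data, 0.50)
q3 = percentile(data, 0.75)
p93 = percentile(data, 0.93)
iqr = q3 - q1
print(q1, median, q3, p93, iqr)
```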
Boxplots
3/5/2014 110
The information contained in the quartiles
can be summarized in a diagram called a boxplot.
The diagram gives a visual representation of how the
data is distributed from the smallest to the largest.
The boxplot can be used to locate extreme
observations in the data that are classified as outliers.
Could be used to detect errors in data collected but
not always.
Constructing a Boxplot
3/5/2014 111
The centre half of the data extending from the lower to
upper quartile is represented by a rectangle.
The median or second quartile is represented by a line
through the rectangle.
A line is drawn extending from the upper quartile to the
largest value, and another from the lower quartile to the
smallest value. These lines are called whiskers.
If the data set is symmetrical, the box plot will also indicate
so.
Constructing a Boxplot
3/5/2014 112
To identify outliers in the data set, the whiskers should extend
no further than 1.5*IQR below Q1 and above Q3 respectively.
Observations that lie outside these limits are referred
to as outliers.
Therefore the boxplot can be used to show the outliers in the
data. These could be real errors in data collection, or simply a sign that the
population is not as homogeneous as assumed in data collection.
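As a small illustration of the 1.5*IQR rule, the sketch below computes the outlier limits for the percentile example (Q1 = 278, Q3 = 330, smallest value 221, largest value 391). The function name `outlier_limits` is hypothetical.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return the lower and upper limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_limit, upper_limit = outlier_limits(278, 330)

# Smallest and largest observations in the worked example
smallest, largest = 221, 391
has_outliers = smallest < lower_limit or largest > upper_limit
print(lower_limit, upper_limit, has_outliers)
```

Since both extremes fall inside the limits, no observation in that data set would be flagged as an outlier under this rule.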
Boxplot for the Example
3/5/2014 113
[Figure: boxplot of the example data, with smallest value = 221, Q1 = 278, median = 304.5, Q3 = 330 and largest value = 391.]
Labwork: Construct a box plot for SEM I GPA (M+F) on
Slide 41
Measures of Variability
3/5/2014 114
There are four major measures of variability;
Range,
Variance,
Standard deviation, and
Coefficient of variation.
Measures of Variability: Range
3/5/2014 115
The range is the difference between the
largest and the smallest value of a data set.
The range is influenced by the extreme values,
and indicates how dispersed the data set is.
The range divided by 4 is an approximation
of the standard deviation. That is;
Standard deviation $\approx$ range/4
Measures of Variability: Variance
3/5/2014 116
The variance is the mean of the squared deviations.
A deviation is the difference between an individual
observation and the mean. The sum of the deviations
is always zero.
Variance for a sample is denoted as $s^2$ and for a
population as $\sigma^2$. It has the square of the units of the data.
Formula for ungrouped data;
Measures of Variability: Standard
Deviation
3/5/2014 117
The standard deviation is the square root of
variance.
It has the same units as the data.
Standard deviation for a sample is denoted as
(s) and for a population as ($\sigma$). Formula for
ungrouped data;
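For reference, the usual textbook definitions for ungrouped data can be written as below. This assumes the common convention of dividing by n - 1 for a sample and by N for a population; the symbols $x_i$, $\bar{x}$, $\mu$, n and N are the standard ones and the course may present the formulas in a different but equivalent form.

```latex
% Sample variance and standard deviation (n observations, sample mean \bar{x})
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{s^2}

% Population variance and standard deviation (N units, population mean \mu)
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N},
\qquad
\sigma = \sqrt{\sigma^2}
```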
Measures of Variability: Coefficient
of Variation
3/5/2014 118
This is a measure of the extent of variability,
expressed as the ratio of the standard
deviation to the mean multiplied by 100%.
It is denoted as COV. A COV of 0 means there is no
variability (all observations are equal); the larger the
COV, the more dispersed the data are relative to the mean.
Note that for a Poisson distribution it is the variance,
not the standard deviation, that equals the mean. Ideally, data
should have as small as possible a value of
COV for homogeneity.
Measures of Variability Coefficient
of Variation
3/5/2014 119
Example Question on Measures of
Variability
3/5/2014 120
The following are 14 measurements on the
strength of paper to be used in Cardboards;
121, 128, 129, 132, 135, 133, 127, 115, 131,
125, 118, 114, 120, 116. Compute;
a) The range and an approximation of standard
deviation.
b) The mean
c) The Variance and standard deviation
d) The coefficient of variation.
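One way to check your hand calculations for this question is the Python sketch below, which uses the standard library `statistics` module. It assumes the sample (n - 1) form of the variance, since the 14 measurements are a sample; the variable names are illustrative.

```python
import statistics

# The 14 paper-strength measurements from the question
strengths = [121, 128, 129, 132, 135, 133, 127, 115, 131,
             125, 118, 114, 120, 116]

# a) Range and the range/4 approximation of the standard deviation
data_range = max(strengths) - min(strengths)
approx_sd = data_range / 4

# b) Mean
mean = statistics.mean(strengths)

# c) Sample variance and standard deviation (divisor n - 1, an assumption)
variance = statistics.variance(strengths)
sd = statistics.stdev(strengths)

# d) Coefficient of variation, as a percentage
cov_percent = sd / mean * 100

print(data_range, approx_sd)
print(round(mean, 2), round(variance, 2), round(sd, 2), round(cov_percent, 2))
```

Comparing `approx_sd` with `sd` shows how rough the range/4 approximation can be for a small sample.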
Probability Distributions
3/5/2014 121
In the first part of the course, we saw that
variables of interest are the ones for which
data is collected.
Such variables are also referred to as random
variables, because the value they take varies by chance from one observation to the next.
These values, when plotted on a histogram
with relative frequency on the vertical axis,
form a probability distribution.
Probability Distributions
3/5/2014 122
Several distributions result depending on the
nature of data; discrete or continuous.
Examples of probability distributions that
may result from the data are;
a) Binomial (discrete)
b) Poisson (discrete), and
c) Normal (continuous).
Binomial Distribution
3/5/2014 123
Binomial distributions result from Bernoulli trials in
which there are only two outcomes; success or
failure.
The experimenter fixes the number of trials (n) and
counts the number of successes (s) in n trials.
The probability of success is the same for each trial.
The outcomes (successes) from different trials are
independent.
Binomial Distribution
3/5/2014 124
Question: Can the following be treated as a Bernoulli
trial? Drivers stopped at a roadblock to be checked
for possession of a driver's license.
Solution
There are only two outcomes; you either have a driver's
license or not.
If we treat all drivers the same, they will have equal
probability of not having a driver's license.
One driver's possession of a license is independent of
that of the next driver to arrive.
It is therefore a Bernoulli Trial, and the distribution
will be Binomial.
Binomial Distribution Practical Problem
3/5/2014 125
As an example, suppose you wish to determine whether
paving blocks manufactured in Kampala meet
specifications, and you decide to sample at least 10
blocks from each manufacturer. If success is defined
as a block meeting specifications, then the number of
successes in n trials from m manufacturers follows a
binomial distribution.
Binomial Distribution
3/5/2014 126
$$P(X = x) = \binom{n}{x}\,p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$
Where p is the probability of success in an
experiment, n is the number of trials, x is the value
for which probability is being computed.
The mean of a binomial distribution = np and the
variance = np(1-p).
If n is sufficiently large, the binomial distribution
approximates to the normal distribution.
Binomial Distribution Question
3/5/2014 127
It has been claimed that in 60% of all solar-heat
installations the utility bill is reduced by at least
one-third. Accordingly, what are the probabilities that the
utility bill will be reduced by at least one-third in;
a) Four of five installations,
b) At least four of five installations?
c) Compute the mean and variance of the
distribution.
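The question can be worked through with a short Python sketch, treating each installation as an independent Bernoulli trial with p = 0.60 and n = 5. The helper `binomial_pmf` is an illustrative name; it evaluates the standard binomial probability C(n, x) p^x (1 - p)^(n - x).

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.60

# a) exactly four of five installations
p_exactly_4 = binomial_pmf(4, n, p)

# b) at least four of five installations (four or five)
p_at_least_4 = binomial_pmf(4, n, p) + binomial_pmf(5, n, p)

# c) mean and variance of the distribution
mean = n * p
variance = n * p * (1 - p)

print(round(p_exactly_4, 4), round(p_at_least_4, 4), mean, round(variance, 2))
```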
Poisson Distribution
3/5/2014 128
A Poisson distribution arises from a Poisson process in
which events are counted over time.
That is, it describes the number of events occurring in a fixed
time interval. Examples; number of calls received
every minute at a customer service centre, the
number of students arriving at CEDAT every five
minutes between 7 and 9 AM.
A Poisson process is random, with the mean
equal to the variance.
Poisson Distribution Model
3/5/2014 129
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
Where $\lambda$ (lambda) is the mean number of arrivals in the time interval
under consideration.
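A minimal Python sketch of the Poisson model is given below. The arrival rate of 3 students per five-minute interval is a made-up value for illustration only, and `poisson_pmf` is a hypothetical helper that evaluates the standard Poisson probability lambda^x e^(-lambda) / x!.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean count per interval is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 3.0  # hypothetical mean number of arrivals per five-minute interval

p_none = poisson_pmf(0, lam)                                # no arrivals
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))    # 0, 1 or 2 arrivals
p_more_than_2 = 1 - p_at_most_2

print(round(p_none, 4), round(p_at_most_2, 4), round(p_more_than_2, 4))
```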
Normal Distribution
3/5/2014 130
The normal distribution is the most important distribution
describing quantitative continuous data.
The normal distribution describes most natural phenomena
even for discrete variables of large samples.
The distribution is symmetrical about the mean, and as
discussed earlier, the mode, median and mean are located
together.
The symmetry allows the distribution to be standardized.
Normal Distribution Curve
3/5/2014 131
Standardized Normal Curve
$$z = \frac{x - \mu}{\sigma}$$
z is the score on a standardized normal
curve; x is the random measurement, with mean $\mu$ and
standard deviation $\sigma$.
3/5/2014 132
3/5/2014 133
3/5/2014 134
Properties of a normal
distribution
The standardized normal distribution has zero mean
and standard deviation one.
About 68% of the data lie within one standard
deviation of the mean.
95% of the data lies within approximately two
standard deviations of the mean.
The 95% is often taken in statistics as a confidence
level to carry out further statistical tests.
3/5/2014 135
Example Question
The time for a super glue to set can be
treated as a random variable having a
normal distribution with mean 30 seconds.
Find its standard deviation if the probability
is 0.20 that it will take on a value greater
than 39.20 seconds.
3/5/2014 136
Solution: Presentation of Data
3/5/2014 137
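[Figure: normal curve with $\mu$ = 30 and $\sigma$ = ?, with x = 39.20 marked and the upper-tail area P(x > 39.20) = 0.20 shaded.]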
Solution: Computation
Using the z-score equation;
$z_1 = (x - \mu)/\sigma$, with x=39.20 and $\mu$=30; $\sigma$ is
unknown.
However, the probability $P(z > z_1) = 0.20$. We
can read the value of $z_1$ for this probability
from the z-table: $z_1 = 0.84$.
$\sigma$ = (39.20 - 30)/0.84 = 9.20/0.84 =
10.95 seconds
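The same computation can be sketched in Python with `statistics.NormalDist` from the standard library, which replaces the z-table lookup. Because the software does not round z to two decimal places, the result differs slightly from the table-based answer.

```python
from statistics import NormalDist

mean = 30.0          # mean setting time (seconds)
x = 39.20            # value of interest (seconds)
upper_tail = 0.20    # P(X > x)

# z-score that leaves 0.20 in the upper tail of the standard normal curve
z1 = NormalDist().inv_cdf(1 - upper_tail)

# Rearranged z-score equation: sigma = (x - mean) / z1
sigma = (x - mean) / z1

print(round(z1, 4), round(sigma, 2))
```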
3/5/2014 138
Exercise Question: Normal
distribution
The initial setting time of a new type of
cement is a random variable having the
normal distribution with mean = 4.76
seconds, and standard deviation 0.04
seconds. What is the probability that this
kind of cement will set in;
a) Less than 4.66 seconds
b) More than 4.80 seconds
c) Anywhere from 4.70 to 4.82 seconds.
3/5/2014 139
Exercise Question: Solution
3/5/2014 140
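A sketch for checking the cement setting-time exercise numerically is shown below, again using `statistics.NormalDist`; the variable names are illustrative. The three probabilities correspond to z-scores of -2.5, 1.0 and the range -1.5 to 1.5.

```python
from statistics import NormalDist

# Setting time modelled as normal with mean 4.76 and standard deviation 0.04
setting_time = NormalDist(mu=4.76, sigma=0.04)

# a) less than 4.66
p_less = setting_time.cdf(4.66)

# b) more than 4.80
p_more = 1 - setting_time.cdf(4.80)

# c) anywhere from 4.70 to 4.82
p_between = setting_time.cdf(4.82) - setting_time.cdf(4.70)

print(round(p_less, 4), round(p_more, 4), round(p_between, 4))
```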
INFERENTIAL STATISTICS
3/5/2014 141
Inferences Concerning the Mean
Inferential statistics helps the researcher to generalize from
information contained in the sample to the general
population.
The approaches available are hypothesis testing and confidence
intervals, usually concerning the mean.
We use the properties of the sampling distribution of the mean.
If several samples are drawn from a population that is normal,
the distribution of the sample means is normal, with mean (mu)
and standard deviation equal to the standard error = (sigma/root n).
The difference between the sample mean and the true
mean is referred to as the Error or tolerance
(E).
3/5/2014 142
Sampling Distribution of the Means
3/5/2014 143
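[Figure: sampling distribution of the mean with standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, showing the sample mean (xbar) and the Error (E).]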
Estimation of the Error
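Recall that to compute z for a random variable x with mean $\mu$
and standard deviation $\sigma$;
$$z = \frac{x - \mu}{\sigma}$$
For the distribution of the means, with xbar as the sample mean, the z-score is
computed as;
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$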
3/5/2014 144
Estimation of the Error (E)
Rearranging the formula for the distribution of means;
$$E = |\bar{x} - \mu| = z\,\frac{\sigma}{\sqrt{n}}$$
The difference between the sample mean and the true mean is
the error (E). However, it is never known exactly since the true mean is
unknown.
We use the symmetry of the normal distribution and
probability to state the accuracy of the mean estimate.
The 95% confidence level is often used. The 99%
confidence level is also common where accuracy of the estimate is
paramount.
3/5/2014 145
Confidence Level Demonstration
3/5/2014 146
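[Figure: sampling distribution of the mean, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The lower and upper estimates enclose the probability associated with estimation of the mean, $(1 - \alpha)$, leaving $\alpha/2$ in each tail.]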
95% Confidence Level
3/5/2014 147
[Figure: 95% confidence level. The central area is (1 - 0.05) = 0.95 with 0.025 in each tail; the lower and upper estimates lie at $z_{0.025}$ = 1.96 standard errors below and above the mean.]
99% Confidence Level
3/5/2014 148
[Figure: 99% confidence level. The central area is (1 - 0.01) = 0.99 with 0.005 in each tail; the lower and upper estimates lie at $z_{0.005}$ = 2.575 standard errors below and above the mean.]
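Large Sample Confidence Interval
for $\mu$ with $\sigma$ known
Recall that;
$$E = |\bar{x} - \mu| = z\,\frac{\sigma}{\sqrt{n}}$$
We can rearrange the above formula to create the
confidence bounds about the true mean as
follows;
$$|\bar{x} - \mu| \le z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
3/5/2014 149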
Small Sample Confidence Interval
for $\mu$ with $\sigma$ unknown
For small samples, with unknown population standard
deviation, the z-score is replaced with a t from a t
distribution (with n - 1 degrees of freedom) and $\sigma$ with the standard deviation of the sample, s;
$$E = |\bar{x} - \mu| \le t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
We can rearrange the above formula to create the
confidence bounds about the true mean as follows;
$$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
3/5/2014 150
Example on Confidence Interval
A random sample of size n=100 is taken from a population with $\sigma$=5.1.
Given that the sample mean (xbar)=21.6, construct a 95% confidence
interval for the population mean.
Solution
This is a large sample with known population standard deviation. So
substitute the values in the confidence interval expression with z=1.96.
$$21.6 - 1.96 \times \frac{5.1}{\sqrt{100}} < \mu < 21.6 + 1.96 \times \frac{5.1}{\sqrt{100}}$$
$$20.6 < \mu < 22.6$$
The interpretation is that either the interval 20.6 to 22.6 contains the population
mean or not, but we are 95% confident that it does.
3/5/2014 151
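The interval in this example can be reproduced with a few lines of Python. The function name `z_confidence_interval` is illustrative; it simply evaluates xbar plus or minus z times sigma over root n for a known population standard deviation.

```python
from math import sqrt


def z_confidence_interval(xbar, sigma, n, z):
    """Large-sample interval for the mean with known sigma: xbar +/- z*sigma/sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the example: n = 100, sigma = 5.1, xbar = 21.6, z = 1.96 (95%)
lower, upper = z_confidence_interval(xbar=21.6, sigma=5.1, n=100, z=1.96)
print(round(lower, 1), round(upper, 1))
```

Replacing z with 2.575 gives the corresponding 99% interval, which is wider for the same data.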
Determination of Sample Size
Recall that the Error (E) is given by;
$$E = |\bar{x} - \mu| = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
If we make n the subject in the above equation, we obtain;
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
3/5/2014 152
Example on Sample Size
A research worker wants to determine the average time it takes a
mechanic to rotate the tires of a car, and she wants to be able
to assert with 95% confidence that the mean of her sample is
off by at most 0.50 minutes. If she can presume from past
experience that sigma=1.6 minutes, how large a sample will
she have to take?
Solution
Substitute the values, E=0.50, $z_{0.025}$=1.96, and sigma=1.60
$$n = \left(\frac{1.96 \times 1.60}{0.50}\right)^2 = 39.3$$
Approximately a sample of 40 will be required.
3/5/2014 153
Trial Question on Sample Size
If we want to determine the average mechanical aptitude of a
large group of workers, how large a random sample will we
need to be able to assert with probability 0.95 that the sample
mean will not differ from the true mean by more than 3.0
points? Assume that it is known from past experience that
sigma=20.0.
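The sample-size formula n = (z * sigma / E)^2 can be sketched in Python as below, rounding up because a sample size must be a whole number. The function name `required_sample_size` is illustrative. The first call reproduces the tire-rotation example; the second applies the same formula to the values in the trial question, so attempt the question by hand before comparing.

```python
from math import ceil


def required_sample_size(z, sigma, error):
    """Smallest whole n with z * sigma / sqrt(n) <= error."""
    return ceil((z * sigma / error) ** 2)


# Tire-rotation example: E = 0.50, z = 1.96, sigma = 1.6
n_tires = required_sample_size(z=1.96, sigma=1.6, error=0.50)

# Trial question: E = 3.0, z = 1.96 (probability 0.95), sigma = 20.0
n_aptitude = required_sample_size(z=1.96, sigma=20.0, error=3.0)

print(n_tires, n_aptitude)
```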
3/5/2014 154
Hypothesis Testing
There are many problems in which, rather than estimate the
value of a parameter, we must decide whether a statement
concerning a parameter is true or false.
That is we must test a hypothesis or an assertion about a
parameter.
Recall from the class exercise about performance, we could
test whether GPA in SEM II was more than GPA in SEM I.
To validate this observation, we use hypothesis testing.
Many similar observations, beliefs and assertions are validated
this way scientifically.
3/5/2014 155
Hypothesis Testing: Steps
Formulate the null and alternative hypotheses
Specify the probability of a Type I error (rejecting a true null hypothesis), also referred
to as the level of significance (alpha=0.05 or 0.01).
Construct a criterion to reject the null hypothesis against the
alternative (critical z-score or t-score).
Calculate from the data the value of the statistic on which
the decision is based (z-score or t-score).
Decide whether to reject the null hypothesis or fail to reject it.
3/5/2014 156
Hypotheses Concerning one
Mean: Example
A trucking firm is suspicious of the claim that the average
lifetime of certain tires is at least 28,000 km. To check the
claim, the firm puts 40 of these tires on its trucks and gets a
mean lifetime of 27,463 km with a standard deviation of
1,348 km. What can it conclude if the probability of a Type I
error is alpha=0.01?
Solution
State the hypotheses: Ho & Ha
Ho: Average lifetime of a certain tire is 28,000 km or greater
($\mu \geq 28{,}000$).
Ha: Average lifetime of a certain tire is less than 28,000 km
($\mu < 28{,}000$).
Level of significance: $\alpha = 0.01$
Hypotheses Concerning one
Mean: Example
Criterion: The critical z-score corresponding to an alpha value
of 0.01 for a one-tailed test is z = -2.33. Since Ha is
left-tailed, reject Ho if the calculated z is less than -2.33,
that is, if z is negative and its modulus is greater than the
critical value (2.33).
Calculations:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
$$z = \frac{27{,}463 - 28{,}000}{1{,}348/\sqrt{40}} = -2.52$$
Hypotheses Concerning one
Mean: Example
Decision: Since the modulus of the calculated z (2.52) is greater
than the critical value (2.33), we reject the null hypothesis at
alpha=0.01.
Conclusion: The trucking firm's suspicion that the average
lifetime of these tires is less than 28,000 km is supported by
the data.
Note that for small sample sizes and unknown sigma, you will
have to use a t-test instead of a z-test.
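The five steps of the tire example can be sketched in Python as a check on the hand calculation. This is an illustrative sketch only: the function name `one_sample_z_test_lower` is an assumption, the sample standard deviation is substituted for sigma because n = 40 is large, and the p-value it reports is an addition that the slides do not compute.

```python
import math
from statistics import NormalDist


def one_sample_z_test_lower(x_bar, mu_0, s, n, alpha):
    """Left-tailed z-test of Ho: mu >= mu_0 against Ha: mu < mu_0."""
    z = (x_bar - mu_0) / (s / math.sqrt(n))   # test statistic
    z_critical = NormalDist().inv_cdf(alpha)  # negative for a left-tailed test
    p_value = NormalDist().cdf(z)             # area to the left of z
    reject = z < z_critical
    return z, z_critical, p_value, reject


# Tire lifetime example: n = 40, mean 27,463 km, s = 1,348 km, alpha = 0.01
z, z_critical, p_value, reject = one_sample_z_test_lower(
    x_bar=27463, mu_0=28000, s=1348, n=40, alpha=0.01
)
print(f"z = {z:.2f}, critical z = {z_critical:.2f}, reject Ho: {reject}")
```

Rejecting Ho when z is below the negative critical value is the same decision as comparing the modulus of z with 2.33 in this left-tailed case.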
Inferences Concerning Proportions
Many engineering and construction management problems
deal with proportions, percentages or probabilities.
The information that is usually available for the estimation of
a proportion is the number of times, x, that an appropriate event
occurs in n trials, occasions or observations.
Sample proportion = x/n; the population proportion is denoted
as p.
We shall not go through the derivation of the formula. Those
interested can read Chapter 10 of Johnson, Freund & Miller
(2011).
Large Sample Confidence Interval
for p
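$$\frac{x}{n} - z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}} < p < \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}}$$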
Example on Confidence interval of
proportions
If x=36 of n=100 persons interviewed are familiar with the
tax incentive for installing energy saving bulbs, construct a
95% confidence interval for the true population proportion.
Solution
x/n = 36/100 = 0.36; $z_{\alpha/2} = 1.96$.
$$0.36 - 1.96\sqrt{\frac{(0.36)(0.64)}{100}} < p < 0.36 + 1.96\sqrt{\frac{(0.36)(0.64)}{100}}$$
$$0.266 < p < 0.454$$
We are 95% confident that the population proportion of persons
familiar with the tax incentive is between 0.266 and 0.454.
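The interval above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `proportion_confidence_interval` is illustrative, and the large-sample normal approximation is assumed, as in the formula on the previous slide.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion p."""
    p_hat = x / n                                        # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Energy saving bulbs example: x = 36 of n = 100 persons
lower, upper = proportion_confidence_interval(x=36, n=100, confidence=0.95)
print(f"{lower:.3f} < p < {upper:.3f}")
```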
Sample Size for Proportions
If your research involves interviewing people or firms about a
certain issue with two possible outcomes, one of which is counted
as a success, then the sample size should be computed as follows
if p is known, either from past studies or from a pilot study:
$$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$$
If p is unknown, then the sample size should be computed as
follows (this uses p = 0.5, which gives the largest possible
value of p(1-p)):
$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2$$
Example on Sample Size for
Proportions
Suppose we want to estimate the true proportion of
contractors using prefabricated formwork to cast slabs, and
that we want to be at least 95% confident that the error is at
most 0.2. How large a sample will we need if
a) We have no idea what the true proportion might be;
b) We know the true proportion does not exceed 0.12?
Solution
a) $$n = \frac{1}{4}\left(\frac{1.96}{0.2}\right)^2 = 24.01$$
which is rounded up to n = 25.
b) $$n = (0.12)(0.88)\left(\frac{1.96}{0.2}\right)^2 = 10.14$$
which is rounded up to n = 11.
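Both cases of the example can be checked with one small function. This is a sketch under stated assumptions: the name `sample_size_for_proportion` and the convention that `p=None` means "p unknown" are illustrative choices, not part of the course material.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(max_error, confidence=0.95, p=None):
    """Sample size needed to estimate a proportion within max_error.

    If p is None (unknown), the conservative value p(1 - p) = 1/4 is used.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    pq = 0.25 if p is None else p * (1 - p)
    return math.ceil(pq * (z / max_error) ** 2)          # round up


# Prefabricated formwork example with E = 0.2 and 95% confidence
n_unknown = sample_size_for_proportion(max_error=0.2)           # case a)
n_known = sample_size_for_proportion(max_error=0.2, p=0.12)     # case b)
print(n_unknown, n_known)
```

Knowing that p does not exceed 0.12 more than halves the required sample, because p(1-p) is much smaller than 1/4 for proportions far from 0.5.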
REGRESSION ANALYSIS
The main objective of many statistical investigations is to be
able to predict outcomes on the basis of equations.
Prediction is made based on dependent (response) and
independent (explanatory) variables.
For instance, the crushing strength of concrete is dependent
on the water-cement ratio, cement content, proportion of
aggregates and age of the concrete after casting.
The independent variables are used to predict the dependent
variable.
REGRESSION ANALYSIS: Model
form
The model form for the simple regression model is as follows:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where $y_i$ = observation i of the dependent variable,
$\beta_0$ = population parameter for the intercept,
$\beta_1$ = population parameter for the slope,
$x_i$ = the value of the independent variable corresponding to
observation i of the dependent variable, and
$\varepsilon_i$ = error term that is independent and normally
distributed with mean zero and variance $\sigma^2$.
REGRESSION ANALYSIS: Sample
Statistics for the Model
The model form for a sample is as follows:
$$\hat{y}_i = b_0 + b_1 x_i$$
Estimation of the statistics of the model is based on the
method of least squares. The derivation of the formula is not
covered in this course.
REGRESSION ANALYSIS:
Method of Least Squares
The method of least squares commences with computation of
the sums of squares:
$$S_{xx} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
$$S_{yy} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$
$$S_{xy} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$
REGRESSION ANALYSIS:
Method of Least Squares
The estimated statistics are:
$$b_1 = \frac{S_{xy}}{S_{xx}}$$
$$b_0 = \bar{y} - b_1\bar{x}$$
REGRESSION ANALYSIS:
Coefficient of Determination
The coefficient of determination ($R^2$) expresses how well the
model fits the data:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
If R-square is close to 1, the model explains nearly 100% of the
variation in the data. This is the ideal and is not often
obtained from experimental data.
$\hat{y}_i$ (y-hat) is the predicted y-value using the least
squares estimated statistics.
The square root of the coefficient of determination yields the
magnitude of the correlation coefficient; its sign is the sign
of the slope $b_1$.
REGRESSION ANALYSIS:
Worked Example
Given the following data, fit a simple linear regression model
by the method of least squares.
y x
0.18 20
0.37 60
0.35 100
0.78 140
0.56 180
0.75 220
1.18 260
1.36 300
1.17 340
1.65 380
REGRESSION ANALYSIS:
Worked Example - Sum of Squares
Sum of Squares
$$S_{xx} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = 132{,}000$$
$$S_{yy} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 = 2.13745$$
$$S_{xy} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) = 505.40$$
REGRESSION ANALYSIS:
Worked Example - Statistics
Statistics
$$\bar{y} = 0.835; \quad \bar{x} = 200$$
$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{505.40}{132000} = 0.00383$$
$$b_0 = \bar{y} - b_1\bar{x} = 0.835 - 0.00383 \times 200 = 0.069$$
The full model:
$$\hat{y}_i = 0.069 + 0.00383\,x_i$$
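The sums of squares and the estimates b0 and b1 for the worked example can be verified with the following Python sketch. It uses only the standard library; the function name `least_squares_fit` is illustrative, and the data are the ten (x, y) pairs from the worked example table.

```python
def least_squares_fit(x, y):
    """Fit y = b0 + b1*x by least squares; return b0, b1, S_xx and S_xy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1, s_xx, s_xy


# Data from the worked example
x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]

b0, b1, s_xx, s_xy = least_squares_fit(x, y)
print(f"S_xx = {s_xx:.0f}, S_xy = {s_xy:.2f}")
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}")
```

A prediction for a new x is then simply `b0 + b1 * x_new`, which should only be trusted within the range of x values used to fit the model.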
REGRESSION ANALYSIS: Worked
Example - R-square
Coefficient of Determination:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} = 1 - \frac{0.202}{2.13745} = 0.905$$
Correlation coefficient (r):
$$r = \sqrt{R^2} = \sqrt{0.905} = 0.95$$
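The R-square calculation can be checked directly from the residuals. The sketch below is illustrative only: the function name `coefficient_of_determination` is an assumption, and the unrounded estimates b1 = 505.40/132000 and b0 = 0.835 - 200*b1 from the worked example are used so that rounding does not affect the residual sum of squares.

```python
import math


def coefficient_of_determination(x, y, b0, b1):
    """Return (R^2, residual sum of squares, S_yy) for the line b0 + b1*x."""
    y_bar = sum(y) / len(y)
    # Sum of squared differences between observed and predicted y
    ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / s_yy, ss_residual, s_yy


# Data and unrounded estimates from the worked example
x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]
b1 = 505.40 / 132000
b0 = 0.835 - b1 * 200

r_squared, ss_residual, s_yy = coefficient_of_determination(x, y, b0, b1)
r = math.copysign(math.sqrt(r_squared), b1)  # r takes the sign of the slope
print(f"R^2 = {r_squared:.3f}, r = {r:.2f}")
```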
REGRESSION ANALYSIS: Worked
Example - MS Excel
It is possible to carry out the above analysis using MS Excel
2007. Go to Data, Data Analysis, choose Regression and follow
the instructions. The output is as follows:
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.951
R Square            0.905
Adjusted R Square   0.893
Standard Error      0.159
Observations        10

ANOVA
             df   SS        MS       F       Significance F
Regression   1    1.93507   1.9351   76.49   0.00002
Residual     8    0.20238   0.0253
Total        9    2.13745

                 Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept (b0)   0.0692         0.1010           0.6857   0.5123    -0.1636     0.3021
Slope (b1)       0.0038         0.0004           8.7460   0.0000    0.0028      0.0048
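Most of the Excel output can be cross-checked in Python without Excel. The sketch below is an illustration, not the method Excel documents: the function name `regression_summary` is an assumption, and the standard error, t and F formulas are the usual textbook ones for simple linear regression, which the slides do not derive. P-values and the 95% limits are left out because they need the t-distribution, which is not in the Python standard library.

```python
import math


def regression_summary(x, y):
    """Return the main ANOVA and coefficient statistics for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_regression = b1 * s_xy                 # explained variation
    ss_residual = s_yy - ss_regression        # unexplained variation
    ms_residual = ss_residual / (n - 2)       # residual mean square
    s = math.sqrt(ms_residual)                # "Standard Error" in Excel
    se_b1 = s / math.sqrt(s_xx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    r_squared = ss_regression / s_yy
    return {
        "r_squared": r_squared,
        "adjusted_r_squared": 1 - (1 - r_squared) * (n - 1) / (n - 2),
        "standard_error": s,
        "F": ss_regression / ms_residual,
        "t_b0": b0 / se_b0,
        "t_b1": b1 / se_b1,
    }


x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]
for name, value in regression_summary(x, y).items():
    print(f"{name}: {value:.4f}")
```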
END OF COURSE:
WISH YOU GOOD LUCK PROFESSIONALLY!!