6/30/2022
Data
Data means observation or evidence.
• Nature of data:
-Qualitative data or attributes
-Quantitative data or variables
A score is the numerical description of an individual with regard to
some characteristic or variable.
Difference between data and facts:
• Facts are kept in their original form, but data are
organized in systematic order.
• Data can be interpreted easily, but facts cannot.
• Facts are descriptive in nature, but data are exploratory.
• Facts may be based directly on findings.
Experiment
An experiment is a structured study in which the
researchers attempt to understand:
• the causes,
• effects,
• and processes involved in a particular phenomenon.
This method is usually controlled by the
researcher, who determines which subjects are
used, how they are grouped, and the treatment
they receive.
Design of experiment
• What is design of experiments? Design of
experiments (DOE) is a systematic, efficient
method that enables scientists and engineers to
study the relationship between multiple input
variables (aka factors) and key output variables
(aka responses). It is a structured approach for
collecting data and making discoveries.
Interview method
Advantages:
• It is possible to get a complete response.
• It is more personal than a questionnaire.
• The interviewer has much control over the flow and sequence of
questions.
• It is possible to make the survey more responsive to earlier
results.
Disadvantages:
• The information obtained is difficult to analyze.
• It cannot be quantified.
• It needs a trained interviewer.
Telephone Interviews
• In telephone interviews, respondents are contacted by
telephone in order to collect data for surveys
• Telephone interviewing has been used for decades
and, in some ways, has advantages over other
methods of undertaking surveys
• With improvements in the IT field, computers can be
used to assist in telephone interviewing, and answers
given by respondents can be entered by interviewers
directly into the computer, saving effort, time and
cost
Questionnaire design
Advantages of questionnaire
• Can target a large number of people
• Can reach respondents in widely dispersed locations
• Can be relatively low cost in time and money
• Relatively easy to get information from people quickly
• Standardised questions
• Analysis can be straightforward and responses pre-coded
• Low pressure for respondents
• Lack of interviewer bias
(possibility of ‘ghost interviewer’ effect)
• It is low cost when the universe is large and widely spread
geographically
• Respondents have adequate time to give well thought out answers
• Respondents who are not easily approachable can also be reached
conveniently
Limitations of questionnaire
• Low response rate, with consequent bias and
reduced confidence in results
• Unsuitable for some people
– e.g. poor literacy, visually impaired, young children
• Question wording can have a major effect on
answers
• Misunderstandings cannot be corrected
• Questionnaires can also be lost
• It is the slowest of all methods
Questionnaire scale
• A carefully designed measurement scale to measure
one or more aspects of an individual's or group's attitudes
• More commonly used are:
1. Thurstone's equal-appearing interval scale
2. Likert's summated rating scale
3. Guttman's cumulative scale
4. Osgood's semantic differential scale
A Likert scale
• A Likert scale assumes that the
strength/intensity of an attitude is linear, i.e. on
a continuum from strongly agree to strongly
disagree, and makes the assumption that
attitudes can be measured.
• For example, each of the five (or seven)
responses would have a numerical value which
would be used to measure the attitude under
investigation.
• Likert scales typically have from 2 to 10 response points,
with 3, 5, or 7 being the most common.
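The idea of giving each response a numerical value can be sketched in Python as follows. This is only an illustration: the five labels, their values from 1 to 5, and the function names are assumptions, not part of the slides.

```python
# Hypothetical 5-point coding; labels and values are illustrative assumptions.
LIKERT_5 = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def score_responses(responses):
    """Convert a list of Likert labels into their numerical values."""
    return [LIKERT_5[label] for label in responses]


def summated_score(responses):
    """Sum the item scores of one respondent (a summated rating)."""
    return sum(score_responses(responses))


# One respondent's answers to four made-up items.
answers = ["Agree", "Strongly agree", "Neutral", "Agree"]
print(score_responses(answers))
print(summated_score(answers))
```

A higher summated score would then indicate a more favourable attitude, provided all items are worded in the same direction.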
Secondary Data
• Documentary
– Written documents
• Survey-based secondary data
– Census
– Continuous surveys
– Ad hoc surveys
Secondary Data
• Advantages
– Resource requirements
– Unobtrusive
– Quality
– Longitudinal
– Comparative
• Disadvantages
– Might not match your need
– May not tie in with your research questions (RQs)
– Aggregations and definitions
– Access?
Design of Experiments
DoE
DoE
• An experiment can be defined as a systematic
procedure carried out under controlled
conditions in order to discover an unknown
effect, to test or establish a hypothesis, or to
illustrate a known effect.
• Experimental design is an efficient method of
optimizing the experimental conditions for
SPE to maximize the amount of useful
information obtained with the minimum
number of experiments.
Parameters to DoE
Important topics germane to experimental
design include
• hypothesis statements,
• experimental control,
• specifying independent and dependent
variables,
• selection and assignment of samples or
participants to conditions,
• collecting data,
• and selecting valid statistical tests.
Through accurate and precise empirical
measurement and control an experimental
design increases a researcher’s ability to
determine causal relationships and state
causal conclusions.
Example
• Read the article titled:
Assessment of Punching Shear Strength of
Fiber-reinforced Concrete Flat Slabs Using
Factorial Design of Experiments
• https://www.researchgate.net/publication/357428695_Assessment_of_Punching_Shear_Strength_of_Fiber-reinforced_Concrete_Flat_Slabs_Using_Factorial_Design_of_Experiments
Data analysis
• Data analysis depends on the type of data
collected.
1. Qualitative data analysis.
2. Quantitative data analysis.
Basic statistics
Measures of Central Tendency of data,
1. the mean,
2. median and
3. mode.
Measures of data dispersion
1. Range,
2. Standard deviation,
3. Variance.
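These measures can be computed with Python's standard statistics module, as in the short sketch below. The data values are made up for illustration, and the variance and standard deviation shown are the sample versions (dividing by n - 1).

```python
import statistics

# Illustrative sample (made-up values, e.g. time overruns in percent).
data = [12, 15, 15, 18, 20, 22, 24]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion
data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(data)      # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "variance:", round(variance, 2),
      "standard deviation:", round(std_dev, 2))
```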
Crosstabulation
• Crosstabulation tables (contingency tables)
display the relationship between two or more
categorical (nominal or ordinal) variables.
• The size of the table is determined by the
number of distinct values for each variable,
with each cell in the table representing a unique
combination of values.
• Numerous statistical tests are available to
determine whether there is a relationship
between the variables in a table.
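As a rough sketch of how such a table is built, the following Python code counts each combination of two categorical variables. The records and the crosstab helper are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Illustrative records (made up): (firm level, response category).
records = [
    ("level 1", "Always"), ("level 1", "Usually"), ("level 2", "Always"),
    ("level 2", "Rarely"), ("level 1", "Always"), ("level 2", "Rarely"),
]


def crosstab(pairs):
    """Count how often each (row value, column value) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # One cell per unique combination; combinations never seen get 0.
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table


rows, cols, table = crosstab(records)
print("columns:", cols)
for r in rows:
    print(r, [table[r][c] for c in cols])
```

The table size follows from the number of distinct values: two firm levels and three response categories give a 2 x 3 table here.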
Example 1
• A researcher wants to know if geotechnical
investigation practice is related to the level of
consulting firms. He collects data from a simple
random sample of 164 consulting firms as given
in the table below.
• Research question: How often do you practice
appropriate and adequate geotechnical
investigation for the design of building projects?
• 1: Never, 2: Rarely, 3: Sometimes, 4: Usually, 5: Always
Contingency table
Hypothesis Testing
• A hypothesis is a prior statement (a claim) about
a population based on previous data or
experience.
• A hypothesis test is a rule that specifies whether
to accept or reject a claim about a population
depending on the evidence provided by a
sample of data.
Two-sided Hypothesis
• Use a two-sided alternative hypothesis (also
known as a nondirectional hypothesis) to
determine whether the population parameter is
either greater than or less than the
hypothesized value.
• A two-sided test can detect when the
population parameter differs in either
direction, but has less power than a one-sided
test.
One-sided Hypothesis
• Use a one-sided alternative hypothesis (also
known as a directional hypothesis) to
determine whether the population parameter
differs from the hypothesized value in a
specific direction.
• The direction can be specified to be either
greater than or less than the hypothesized
value. A one-sided test has greater power than
a two-sided test, but it cannot detect whether
the population parameter differs in the
opposite direction.
Tests of Hypothesis
• Parametric tests are those that make
assumptions about the parameters of the
population distribution from which the sample
is drawn. This is often the assumption that the
population data are normally distributed.
• Non-parametric tests are “distribution-free”
and, as such, can be used for non-Normal
variables.
Example 2
• A researcher wants to know if the level of consulting
firm and geotechnical investigation practice are
related for the building design consulting firms in
Addis Ababa. He collects data on a simple random
sample of n = 164 firms, as given in the table below.
          Never  Rarely  Sometimes  Usually  Always  Total
level 1       1       5         13       14      17     50
level 2       1       6          7       12      10     36
level 3       1      12          9       10       8     40
level 4       1      10         11       10       6     38
Total         4      33         40       46      41    164
[Figure: 100% stacked bar chart showing, for each response (Never, Rarely, Sometimes, Usually, Always), the share of level 1 to level 4 firms]
Expected Frequencies
• The hypothesis of independence tells us which
frequencies we should have found in our sample: the
expected frequencies.
• Expected frequencies are the frequencies we expect in
our sample if the null hypothesis holds.
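Under independence, the expected frequency of a cell is its row total times its column total divided by the grand total. A minimal Python sketch using the observed counts from the consulting-firm table follows; the function name is an illustrative assumption.

```python
# Observed counts from the consulting-firm example
# (rows: level 1 to level 4; columns: Never, Rarely, Sometimes, Usually, Always).
observed = [
    [1, 5, 13, 14, 17],
    [1, 6, 7, 12, 10],
    [1, 12, 9, 10, 8],
    [1, 10, 11, 10, 6],
]


def expected_frequencies(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(x, 2) for x in row])
```

Note that the expected counts in the "Never" column are small, which is usually a reason for caution when applying tests based on these frequencies.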
Residuals
• The more the observed and expected
frequencies differ, the more our data deviate from
independence. So how much do they differ?
First, we subtract each expected frequency
from each observed frequency, resulting in
a residual. That is,
residual = observed frequency - expected frequency
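The subtraction can be sketched in Python for the consulting-firm table as shown below. The last function adds one common way of combining the residuals, Pearson's chi-square statistic (the sum of squared residuals divided by the expected frequencies). That choice is an assumption made here for illustration; the slides that presumably introduced the test statistic did not survive extraction.

```python
# Observed counts (rows: level 1 to level 4;
# columns: Never, Rarely, Sometimes, Usually, Always).
observed = [
    [1, 5, 13, 14, 17],
    [1, 6, 7, 12, 10],
    [1, 12, 9, 10, 8],
    [1, 10, 11, 10, 6],
]


def expected_frequencies(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def residuals(obs, exp):
    """Residual per cell = observed frequency - expected frequency."""
    return [[o - e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(obs, exp)]


def pearson_chi_square(obs, exp):
    """Assumed summary statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(obs, exp)
               for o, e in zip(o_row, e_row))


expected = expected_frequencies(observed)
resid = residuals(observed, expected)
chi_square = pearson_chi_square(observed, expected)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

for row in resid:
    print([round(x, 2) for x in row])
print("chi-square:", round(chi_square, 2), "df:", degrees_of_freedom)
```

The statistic would then be compared with the critical chi-square value for the given degrees of freedom and significance level.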
Reporting
Parametric tests
Z-Tests and T-Tests
• Z-tests and t-tests are statistical methods
involving hypothesis testing that have
applications in science, engineering, and many
other disciplines on scale data.
Z-Tests
• The z-test is a statistical test used to determine
whether two population means are different
when the variances are known and the
sample size is large.
• The test statistic is assumed to follow a normal
distribution, and the population standard deviation must be
known to perform an accurate z-test.
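One-sample Z-test
z = (x̄ - μ0)/(σ/n^0.5), where x̄ is the sample mean, μ0 the hypothesized
population mean, σ the population standard deviation, and n the sample size.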
Steps to follow
1. Define the null and alternative hypotheses
2. State the significance level (alpha)
3. State the decision rule
4. Calculate the test statistic
5. State the result
6. State the conclusion
Example 2
• The average time overrun of construction projects
due to unexpected soil conditions in Addis Ababa city
was estimated to be 15 % of the originally agreed
project time, with a standard deviation of 8. A
researcher considered 40 contractors and found
that the average time overrun was 17 %.
1. State the null and alternative hypotheses.
2. Can it be claimed that the average time overrun is 15
% at a confidence level of 95 % with the sample
data?
1. H0: μ = 15
H1: μ ≠ 15
2. Z statistic:
z = (17-15)/(8/40^0.5) = 1.58
For a 95 % confidence level (two-sided), the critical value is z = 1.96.
Since 1.58 < 1.96, fail to reject the null hypothesis.
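The calculation for the Addis Ababa example can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the critical value 1.96 is the usual two-sided value for a 5 % significance level.

```python
import math


def one_sample_z(sample_mean, hypothesized_mean, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sigma / math.sqrt(n))


def two_sided_decision(z, z_critical=1.96):
    """Reject H0 only if |z| exceeds the critical value."""
    return "reject H0" if abs(z) > z_critical else "fail to reject H0"


# Addis Ababa example: sample mean 17, hypothesized mean 15, sigma 8, n = 40.
z = one_sample_z(17, 15, 8, 40)
decision = two_sided_decision(z)
print(round(z, 2), decision)
```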
Example 3
• The average time overrun of construction projects
due to unexpected soil conditions in Gondar city was
estimated to be 16 % of the originally agreed project
time, with a standard deviation of 7. A researcher
considered 40 contractors and found that the
average time overrun was 22 %.
1. State the null and alternative hypotheses.
2. Can it be claimed that the average time overrun is 16
% at a confidence level of 95 % with the sample
data?
1. H0: μ = 16
H1: μ ≠ 16
2. Z statistic:
z = (22-16)/(7/40^0.5) = 5.42
For a 95 % confidence level (two-sided), the critical value is z = 1.96.
Since 5.42 > 1.96, reject the null hypothesis.
Two-sample Z-test
• The above formula is used for a one-sample z-test.
To run a two-sample z-test, the z-statistic is
z = ((x̄1 - x̄2) - (μ1 - μ2))/((σ1^2/n1 + σ2^2/n2)^0.5)
Example 4
• Compare the means of the Addis Ababa and Gondar
samples from the two preceding examples.
• Is there enough evidence to claim that the time
overrun of construction projects due to unexpected soil
conditions in Gondar is greater than that of Addis
Ababa?
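H0: μ1 = μ2
H1: μ1 ≠ μ2
• With μ1 - μ2 = 0 under H0 and variances 8^2 and 7^2:
z = ((17-22)-0)/((8^2/40+7^2/40)^0.5) = -2.97
• Since |z| = 2.97 > 1.96, reject the null hypothesis (the Gondar mean is higher)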
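A short Python sketch of the two-sample z-statistic is given below. It assumes that the standard deviations are squared to obtain the variances in the denominator and that the hypothesized difference is 0, as implied by H0: μ1 = μ2. The function name and the critical values 1.96 (two-sided) and 1.645 (one-sided) at the 5 % level are stated here as assumptions of the sketch.

```python
import math


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z = ((mean1 - mean2) - hypothesized difference) / standard error."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Sample 1: Addis Ababa (mean 17, sigma 8, n 40).
# Sample 2: Gondar (mean 22, sigma 7, n 40).
z = two_sample_z(17, 22, 8, 7, 40, 40)

two_sided_reject = abs(z) > 1.96   # H1: the means differ
one_sided_reject = z < -1.645      # H1: Gondar mean is greater (mean1 < mean2)
print(round(z, 2), two_sided_reject, one_sided_reject)
```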
T-test
• A t-test is used to determine how significant the
difference between two groups is. It indicates
whether a difference (measured in
means) between two separate groups could
have occurred by chance.
• The test assumes normally distributed data
and is based on the t-distribution. It is used when population
parameters such as the mean or standard deviation
are unknown.
Example 5
• The average time overrun of construction projects
due to unexpected soil conditions in Gondar city was
estimated to be 16 % of the originally agreed project
time. A researcher considered 25 contractors and
found that the average time overrun was 22 %
with a sample standard deviation of 7.
1. State the null and alternative hypotheses.
2. Can it be claimed that the average time overrun is 16
% at a confidence level of 95 % with the sample
data?
H0: μ = 16
H1: μ ≠ 16
t = (22-16)/(7/25^0.5) = 4.29
Critical t for alpha = 0.05 (two-sided, df = 25-1 = 24) = 2.064
4.29 > 2.064
Reject the null hypothesis.
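The one-sample t-statistic for this example can be checked with the sketch below. The function name is illustrative, and the critical value 2.064 is taken from a standard t-table for a two-sided test at alpha = 0.05 with 24 degrees of freedom.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return the t-statistic and its degrees of freedom (n - 1)."""
    t = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))
    return t, n - 1


# Gondar example: sample mean 22, hypothesized mean 16, sample sd 7, n = 25.
t, df = one_sample_t(22, 16, 7, 25)

# Table value for a two-sided test, alpha = 0.05, df = 24.
T_CRITICAL = 2.064
reject = abs(t) > T_CRITICAL
print(round(t, 2), df, reject)
```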
Two-sample T-test
P-value
• A p-value is the probability, assuming the null hypothesis is true,
of obtaining results at least as extreme as those in the sample data.
It varies from 0% to 100% and is usually written
in decimal format, so a p-value of 5% is written as
0.05.
• Lower p-values are considered favorable, as
they indicate that the observed result is unlikely to be due to chance alone.
• For example, a p-value of 0.01 means that, if the null hypothesis
were true, there would be a 1% probability of results at least this
extreme appearing by chance. By convention, a p-value of 0.05 or less is
commonly treated as statistically significant.
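For a z-statistic, a two-sided p-value can be obtained from the standard normal distribution. The sketch below uses the complementary error function from Python's math module; the function name and the example value z = 1.58 (from the Addis Ababa z-test) are illustrative.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


p = two_sided_p_value(1.58)
alpha = 0.05
print(round(p, 3), "reject H0" if p <= alpha else "fail to reject H0")
```

Comparing the p-value with alpha leads to the same decision as comparing |z| with the critical value 1.96.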
Exercise
Design and Analysis of Experiments, by Douglas
C. Montgomery
Page 54
Problems 2.1-2.5
Correlation
• Correlation quantifies the strength of the linear
relationship between a pair of variables.
• On a scatter diagram, the closer the points lie
to a straight line, the stronger the linear
relationship between two variables.
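The strength of a linear relationship is commonly measured by Pearson's correlation coefficient r, which lies between -1 and +1. The slides do not name a specific coefficient, so the choice of Pearson's r, the function name, and the data below are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: points close to a straight line give r close to 1.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(x, y), 3))
```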
Regression
• Regression expresses the relationship in the
form of an equation.
• The regression line is obtained using the
method of least squares. Any line y = a + bx
that we draw through the points gives a
predicted or fitted value of y for each value of
x in the data set.
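The least-squares estimates of a and b in y = a + bx can be computed directly, as in this minimal sketch. The function name and the data are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Made-up data scattered around a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)

# Fitted (predicted) value of y for each x in the data set.
fitted = [a + b * xi for xi in x]
print(round(a, 3), round(b, 3))
print([round(v, 2) for v in fitted])
```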
Validation
• Validation is a very important procedure,
particularly in research that involves
modeling and simulation.
• Analysis results gained through modeling and
simulation should be validated to check whether the
results are consistent with actual values.
Data interpretation
• Data interpretation is the process of assigning
meaning to the collected information and
determining the conclusions, significance, and
implications of the findings. The goal of the
interpretation of data is to highlight useful
information and suggest conclusions.
Thank you!