
Research Methodologies

Virtual University of Pakistan


 Descriptive statistics
◦ In descriptive statistics, we describe the given data
in numeric, tabular, or graphical form
 Inferential statistics
◦ In inferential statistics, we make predictions
(inferences) from the given data

 Quantitative data
◦ It can be either discrete or continuous
 Continuous data: Data that can take on any value in an
interval
 Discrete data: Data that can take on only integer
values, such as counts
 Qualitative data
◦ Data that can take on only a specific set of values
representing a set of possible categories
 Ordinal data: Categorical data that has an explicit
ordering
 Binary data: A special case of categorical data with just
two categories of values (0/1, true/false)

 Data can be measured on one of four scales,
depending on the type of data
1. Nominal scale
 Categorical variables can be placed into categories. They
don’t have a numeric value and so cannot be added,
subtracted, divided or multiplied. They have no order
2. Ordinal scale
 The ordinal scale is used for data where order matters, i.e. for
ordinal data. Distances along the scale are not meaningful
3. Interval scale
 Ordered labels with meaningful distances. There is no
absolute zero value, so ratios are not meaningful
4. Ratio scale
 The ratio scale is the same as the interval scale except for
two differences: i) it has a true zero point, ii) ratios are meaningful

 Practical Statistics for Data Scientists by Peter
Bruce and Andrew Bruce
 https://www.statisticshowto.com



 Mean
◦ The sum of all values divided by the number of values
e.g. data set 8 5 11 4 10 15 1
mean = 7.71
 Trimmed mean
◦ The average of all values after dropping a fixed number of
extreme values from each end of the sorted data
e.g. data set 9 7 14 6 19 3 2 10 17
sorted 2 3 6 7 9 10 14 17 19
dropping the two smallest and two largest values leaves 6 7 9 10 14
trimmed mean = 46 / 5 = 9.2
 Weighted mean
◦ The sum of each value multiplied by its weight, divided by the sum of
the weights
e.g. data set 12 10 13 8 3 7 1 7 4
weighted mean = 100 / 12 = 8.33

 Median
◦ The value such that one-half of the data lies above it and one-half below
e.g. data set 9 7 14 6 19 3 2 10 17
2 3 6 7 9 10 14 17 19
median = 9
e.g. data set 9 12 8 15 8 6
6 8 8 9 12 15
median = 8.5
 Weighted median
◦ The value such that one-half of the sum of the weights lies above and
below the sorted data
e.g. data set 12 10 13 8 3 7 1 7 4
1 3 4 7 7 8 10 12 13
weighted median = 10
 Outlier
◦ A data value that is very different from most of the data
 Robust
◦ An estimate is said to be robust if it is not sensitive to extreme values
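The location estimates above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `trimmed_mean` and its parameter `k` (number of values dropped from each end) are illustrative choices, not part of the lecture. The weighted mean and weighted median are left out because the slide does not list the weights.

```python
import statistics

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    return statistics.mean(ordered[k:len(ordered) - k])

mean_data = [8, 5, 11, 4, 10, 15, 1]
trim_data = [9, 7, 14, 6, 19, 3, 2, 10, 17]
even_data = [9, 12, 8, 15, 8, 6]

mean_value = statistics.mean(mean_data)        # about 7.71
trimmed_value = trimmed_mean(trim_data, k=2)   # mean of 6 7 9 10 14
median_odd = statistics.median(trim_data)      # middle value of 9 sorted values
median_even = statistics.median(even_data)     # average of the two middle values

print(round(mean_value, 2), trimmed_value, median_odd, median_even)
```

The trimmed mean and the median change little if an extreme value is added to the data, which is what makes them robust.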

 Deviations
◦ The difference between the observed values and the estimate of
location
e.g. data set 6 13 5 2 9 mean = 7
deviations -1 6 -2 -5 2
 Mean absolute deviation
◦ The mean of the absolute value of the deviations from the mean
e.g. data set 6 13 5 2 9 mean = 7
absolute deviations 1 6 2 5 2
mean absolute deviation = 3.2
 Variance
◦ The sum of squared deviations from the mean divided by n – 1
where n is the number of data values
e.g. data set 6 13 5 2 9 mean = 7
deviations -1 6 -2 -5 2
squared deviations 1 36 4 25 4
variance = 70 / 4 = 17.5

 Standard deviation
◦ The square root of the variance
 Median absolute deviation from the median (MAD)
◦ The median of the absolute value of the deviations from the
median
e.g. data set 6 13 5 2 9 median = 6
absolute deviations 0 7 1 4 3
MAD = 3
 Range
◦ The difference between the largest and the smallest value
in a data set
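The variability estimates above can be reproduced on the same data set. This is a minimal sketch with the standard `statistics` module; the variable names are illustrative.

```python
import statistics

data = [6, 13, 5, 2, 9]

mean = statistics.mean(data)                                   # 7
deviations = [x - mean for x in data]                          # -1 6 -2 -5 2
mean_abs_dev = statistics.mean([abs(d) for d in deviations])   # mean absolute deviation
variance = statistics.variance(data)                           # divides by n - 1
std_dev = statistics.stdev(data)                               # square root of the variance

median = statistics.median(data)                               # 6
mad = statistics.median([abs(x - median) for x in data])       # median absolute deviation
value_range = max(data) - min(data)

print(mean_abs_dev, variance, round(std_dev, 3), mad, value_range)
```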

 Percentile
◦ The value such that P percent of the values take on this
value or less and (100–P) percent take on this value or
more
e.g. data set 5 2 4 8 1 2 5 6
we want to compute the 80th percentile
sorted data 1 2 2 4 5 5 6 8
let i be the position of the 80th percentile
i = (80 / 100) × 8 = 6.4, rounded up to 7
so the 80th percentile is the 7th value in the sorted data,
which is 6
 Interquartile range (IQR)
◦ The difference between the 75th percentile and the 25th
percentile
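The position rule used in the percentile example can be sketched as follows. The function name `percentile_by_position` is hypothetical. Other percentile conventions exist (for instance interpolating between neighbouring values, or averaging two values when the position is a whole number), so library functions may return slightly different answers.

```python
import math

def percentile_by_position(values, p):
    """Return the value at position ceil(p/100 * n) in the sorted data (1-based)."""
    ordered = sorted(values)
    position = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[position - 1]

data = [5, 2, 4, 8, 1, 2, 5, 6]
p80 = percentile_by_position(data, 80)   # position 6.4 rounds up to 7
print(p80)
```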

 Mode
◦ The most commonly occurring category or value in a
data set
 Expected value
◦ When the categories can be associated with a numeric
value, this gives an average value based on a category’s
probability of occurrence
e.g. numeric value of Cat. 1 = 300, probability of Cat. 1 = 0.05
numeric value of Cat. 2 = 50, probability of Cat. 2 = 0.15
numeric value of Cat. 3 = 0, probability of Cat. 3 = 0.80
expected value = (300)(0.05) + (50)(0.15) + (0)(0.80)
expected value = 22.5
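The expected value calculation is a probability-weighted sum. A minimal sketch with the three categories from the example (the list name `categories` is illustrative):

```python
# (numeric value, probability) for each category from the example
categories = [(300, 0.05), (50, 0.15), (0, 0.80)]

total_probability = sum(p for _, p in categories)       # should be 1
expected_value = sum(v * p for v, p in categories)      # weighted by probability

print(round(expected_value, 2))
```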



 A frequency distribution is a function or a listing which shows all the
possible values (or intervals) of the data and how often each occurs
 Histogram
◦ A plot of the frequency table with the bins on the
x-axis and the count (frequency or probability) on
the y-axis
[Figure: histogram with Age on the x-axis and Frequency on the y-axis]
 Bar graph
◦ Similar to histogram, but depicts
qualitative/categorical data
[Figure: bar graph of Frequency by Movie Genre (Sci-Fi, Thriller, Comedy, Horror, Other)]
 Probability Density Function (PDF)
◦ Another plot to draw distribution of quantitative
data
[Figure: probability density plot with Weight on the x-axis and Probability on the y-axis]
 Symmetric vs skewed
 Light-tailed vs heavy-tailed
 Unimodal vs multimodal

[Figure: symmetric distribution]
[Figure: left-skewed distribution and right-skewed distribution]
[Figure: bimodal distribution]


 Population vs sample
◦ Population represents all possible measurements or
outcomes that are of interest to us in a particular study
◦ Sample is a small subset of the population we are
studying
 Parameter vs statistic
◦ Parameters are numbers that summarize data for an
entire population. Represented by Greek letters e.g. μ, σ
◦ Statistics are numbers that summarize data from a
sample. Represented by small Latin letters e.g. x̄, s
 Random variables
◦ Random variables map outcomes of random processes
to numbers
◦ Since drawing samples is a random process, a
sample statistic is a random variable

 Point vs interval estimate
◦ Point estimation gives us a particular value as an
estimate of the population parameter
◦ Interval estimation gives us a range of values which is
likely to contain the population parameter. This interval
is called a confidence interval
 Systematic error vs random error
◦ Systematic error is associated with faulty measurement
equipment or sampling process that gives
unrepresentative samples
◦ Random error occurs as a result of sampling variability
 The importance of unbiased sampling
◦ Classic example of Literary Digest poll of 1936 US
elections

 Standard Normal distribution (also called z-
distribution) is a normal distribution with
mean = 0 and standard deviation = 1
 To convert data to standard normal form
(standardization):
◦ Calculate mean and standard deviation of the data
◦ For each data value, subtract the mean and divide
by standard deviation
 Each transformed value is a z-score
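The standardization steps above can be sketched as follows. The function name `z_scores` is illustrative, and the sketch assumes the sample standard deviation (dividing by n – 1); the slide does not say whether the sample or population form is intended.

```python
import statistics

def z_scores(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)   # assumption: sample standard deviation
    return [(x - mean) / std_dev for x in values]

data = [6, 13, 5, 2, 9]
standardized = z_scores(data)

# After standardization the data have mean 0 and standard deviation 1
print([round(z, 3) for z in standardized])
```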

 Sampling distribution of a statistic refers to
the distribution of some sample statistic (e.g.
mean, median etc.), over many samples
drawn from the same population
 Standard error (SE) of a statistic is the
standard deviation of its sampling
distribution

 Sampling distribution of the sample means is
approximately normally distributed (bell-shaped),
if the sample size is large enough
 Standard error of the sample means = $\sigma / \sqrt{n}$,
where σ is the standard deviation of the population
and n is the sample size
[Figure: sampling distribution of sample means ($\bar{x}$), centered at µ, with the axis marked at $\mu \pm \sigma/\sqrt{n}$, $\mu \pm 2\sigma/\sqrt{n}$ and $\mu \pm 3\sigma/\sqrt{n}$]


 The Central Limit Theorem states that the
sampling distribution of the sample means
approaches a normal distribution as the
sample size gets larger — no matter what the
shape of the population distribution
 The Law of Large Numbers states that as the
sample size increases, the sample mean (x̄)
converges to the true population mean (μ)

[Figure: Income data [1]]
 Regardless of the distribution of the population, the
sampling distribution of the sample means is
approximately normally distributed, provided the
sample size is large enough (n ≥ 30) and a large
number of samples is drawn
 The sampling distribution of the sample means
centers at the true population mean, provided the
sample size is large enough (n ≥ 30) and a large
number of samples is drawn
 Standard error of the sample means depends on
the standard deviation of the population
 Standard error decreases as the sample size
increases
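These points can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly right-skewed distribution chosen only for illustration), and the seed, sample sizes, and number of samples are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_means(sample_size, num_samples):
    """Means of repeated samples drawn from an exponential population (mean 1, sd 1)."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means_n30 = sample_means(30, 2000)
means_n120 = sample_means(120, 2000)

center = statistics.mean(means_n30)    # close to the population mean, 1
se_30 = statistics.stdev(means_n30)    # close to 1 / sqrt(30)
se_120 = statistics.stdev(means_n120)  # close to 1 / sqrt(120), i.e. about half of se_30

print(round(center, 3), round(se_30, 3), round(1 / math.sqrt(30), 3))
print(round(se_120, 3), round(1 / math.sqrt(120), 3))
```

Quadrupling the sample size roughly halves the standard error, consistent with the $\sigma / \sqrt{n}$ formula.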

1. Practical Statistics for Data Scientists by
Peter Bruce and Andrew Bruce
2. https://www.statisticshowto.com



 If n ≥ 30
◦ Standard error = $s / \sqrt{n}$, where s is the standard
deviation of the sample
◦ Sampling distribution of the means will be
approximated as Normal distribution
 If n < 30, provided population is Normally
distributed
◦ Standard error = $s / \sqrt{n}$
◦ Sampling distribution of the means will be
approximated as t-distribution

 99% of sample means lie within µ ± 2.576 × SE
 95% of sample means lie within µ ± 1.96 × SE
 90% of sample means lie within µ ± 1.645 × SE
[Figure: sampling distribution of sample means centered at µ, with the axis marked at $\mu \pm s/\sqrt{n}$, $\mu \pm 2s/\sqrt{n}$ and $\mu \pm 3s/\sqrt{n}$]
 95% of sample means lie within $\mu \pm 1.96 \times s/\sqrt{n}$, which gives the interval $\bar{x} \pm 1.96 \times s/\sqrt{n}$
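The multipliers 1.645, 1.96 and 2.576 are quantiles of the standard normal distribution. A minimal sketch of how they can be recovered with `statistics.NormalDist` from the Python standard library (the helper name `critical_z` is illustrative):

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Two-sided critical value z* with the given central area under the standard normal curve."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)

z90 = critical_z(0.90)
z95 = critical_z(0.95)
z99 = critical_z(0.99)

print(round(z90, 3), round(z95, 3), round(z99, 3))
```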



 Degrees of freedom (df): the number of independent
observations in a data set
 Degrees of freedom (df) for one sample t-statistic = sample size
(n) - 1
[Figure: distribution curve plotted over the range -5 to 5]
 95% confidence interval based on the sample mean (x̄),
when n ≥ 30
◦ x̄ ± 1.96 × SE
◦ Here x̄ is the “point estimate”, 1.96 × SE is the
“margin of error” and 1.96 is the “critical value”
 General form of confidence interval
◦ point estimate ± margin of error
◦ where margin of error = critical value × SE

[Figure: standard normal curve with the central 95% of the area between -z* and z*, 2.5% in each tail, and 97.5% of the area below z*]
 Critical t-value depends on two things:
1. Confidence level
2. Degrees of freedom (df)

1. Random sampling is used to draw sample
2. Sampling distribution of sample statistic is
normally distributed, or population is
normally distributed in case of n < 30
3. Individual data values / observations in the
sample are independent of each other

 Practical Statistics for Data Scientists by Peter
Bruce and Andrew Bruce
 https://www.statisticshowto.com
 Khan Academy (khanacademy.org) video
lectures on confidence intervals



 Let x be the round-trip-time (RTT), in milliseconds,
observed in a computer network. Calculate 99%
confidence interval for the mean RTT assuming the
network RTT is normally distributed.
x = {0.21, 0.17, 0.2, 0.28, 0.19, 0.16,
0.22, 0.25, 0.21, 0.22, 0.29, 0.16}
◦ Calculate sample mean (x̄) and sample standard deviation
(s)
x̄ = 0.213, s = 0.043
◦ Since the sample size is less than 30, we use the t-distribution to calculate
the critical value
Critical t-value for 99% confidence level and df = 11 is
3.106
◦ 99% confidence interval = $\bar{x} \pm \text{critical t-value} \times s/\sqrt{n}$
= $0.213 \pm 3.106 \times 0.043/\sqrt{12}$
= 0.213 ± 0.039
= (0.174, 0.252)
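The same calculation can be sketched in Python. The critical t-value 3.106 is taken from the slide rather than computed, because the standard library has no t-distribution quantile function. The variable names are illustrative, and the bounds may differ from the slide in the third decimal place because the slide rounds x̄ and s before computing the interval.

```python
import math
import statistics

rtt = [0.21, 0.17, 0.2, 0.28, 0.19, 0.16,
       0.22, 0.25, 0.21, 0.22, 0.29, 0.16]
T_CRITICAL = 3.106   # from the slide: 99% confidence level, df = 11

n = len(rtt)
x_bar = statistics.mean(rtt)
s = statistics.stdev(rtt)                 # sample standard deviation (n - 1)
margin = T_CRITICAL * s / math.sqrt(n)    # critical value x standard error
ci_low, ci_high = x_bar - margin, x_bar + margin

print(round(x_bar, 3), round(s, 3), round(margin, 3))
print(round(ci_low, 3), round(ci_high, 3))
```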
 There are elections in a provincial assembly constituency.
The constituency has only two candidates, A and B, and all
eligible voters in the constituency will cast their vote.
We randomly picked 90 people in the constituency and
asked whom they would vote for. 52 of the 90 people
supported candidate A. Can we state with 95% confidence
that candidate A would win?
◦ Sample proportion = p̂ = fraction of the people that support
candidate A = 52 / 90 = 0.578
◦ Sampling distribution of the sample proportion is Normal,
standard error of the sample proportion = SE = $\sqrt{\hat{p}(1-\hat{p})/n}$
= $\sqrt{0.578(0.422)/90}$ = 0.052
◦ 95% confidence interval = p̂ ± critical z-value × SE
= 0.578 ± 1.96 × 0.052
= 0.578 ± 0.102
= (0.476, 0.680)
◦ The interval includes values below 0.5, so we cannot state with 95% confidence that candidate A would win.
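The election example can be sketched in a few lines. Variable names are illustrative; the final check simply asks whether the whole interval lies above 0.5, which is what a confident claim of a win would require.

```python
import math
from statistics import NormalDist

supporters, n = 52, 90
p_hat = supporters / n                        # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error of the proportion
z_crit = NormalDist().inv_cdf(0.975)          # about 1.96 for 95% confidence

ci_low = p_hat - z_crit * se
ci_high = p_hat + z_crit * se
majority_supported = ci_low > 0.5             # True only if the whole interval exceeds 0.5

print(round(ci_low, 3), round(ci_high, 3), majority_supported)
```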



 Research Proposal
◦ Introduction
◦ Objectives
◦ Problem Statement
◦ Hypothesis
◦ Research Methods
 Quantitative Methods
 Qualitative Methods
 Mixed Methods
 Quantitative research methods are
characterized by the collection of information
which can be analyzed numerically, the
results of which are typically presented using
statistics, tables and graphs.

 Used to test or confirm theories and
hypotheses.
 Samples are collected:
◦ Surveys
◦ Experiments
◦ Online Polls
◦ …
 Formal analysis techniques are applied and
results are then represented using figures
and numbers.
 “A method for collection of information from
a sample of individuals through their
responses to questions”*.
 A survey provides a quantitative or numeric
description of trends, attitudes, or opinions
of a population by studying a sample of that
population.

* Check, J., & Schutt, R. K. (2012). Survey research. In J. Check & R. K. Schutt (Eds.), Research methods in education (pp. 159–185). Thousand Oaks, CA: Sage Publications.
 Survey Design
 Population and Sample
 Instrumentation
 Variables in study
 Data analysis and interpretation
Survey Design:
 Identify the purpose of survey research.
◦ To generalize from a sample to a population so that inferences can be
made about some characteristic, attitude, or behavior of the population.
 Indicate why a survey is the preferred type of data collection
procedure for the study.
◦ In this rationale, consider the advantages of survey designs, such as the
economy of the design and the rapid turnaround in data collection.
 Indicate whether the survey will be cross-sectional or
longitudinal.
◦ In a cross-sectional survey, data are collected at one point in time; in a
longitudinal survey, data are collected over time.
Population and Sample:
 Identify the population in the study.
◦ Also state the size and means of identifying the individuals.
 Identify whether the sampling design for this population is single-stage
or multistage (called clustering).
◦ A single-stage sampling procedure is one in which the researcher has
access to names in the population and can sample the people (or other
elements) directly.
◦ In a multistage or clustering procedure, the researcher first identifies
clusters (groups or organizations), obtains names of individuals within
those clusters, and then samples within them.
 Identify the selection process for individuals.
◦ Random Sampling
◦ Systematic Sampling
◦ Convenience Sampling
Population and Sample:
 Identify whether the study will involve stratification of
the population before selecting the sample.
◦ Stratification means that specific characteristics of individuals
(e.g., gender: females and males) are represented in the
sample and the sample reflects the true proportion in the
population of individuals with certain characteristics.
 Indicate the number of people in the sample and the
procedures used to compute this number.
◦ E.g. X% of the population. Select the sample size and
procedure according to the requirements of the research or
the desired margin of error.
Instrumentation:
 Name the survey instrument used to collect data.
◦ Discuss whether it is an instrument designed for this research,
a modified instrument, or an intact instrument developed by
someone else.
 To use an existing instrument, describe the
established validity of scores obtained from past use
of the instrument.
◦ Validity refers to whether one can draw meaningful and useful
inferences from scores on the instruments.
◦ When one modifies an instrument or combines instruments in
a study, the original validity and reliability may not hold for the
new instrument, and it becomes important to reestablish
validity and reliability during data analysis.
Instrumentation:
 Include sample items from the instrument so that
readers can see the actual items used.
◦ In an appendix to the proposal, attach sample items or the
entire instrument.
 Indicate the major content sections in the instrument.
◦ Cover letter
◦ Contents
◦ Closing instructions
◦ Likert Scale
 Discuss plans for pilot testing or field-testing the
survey and provide a rationale for these plans.
◦ Necessary to establish the validity of the survey, questions,
format and scales.
Instrumentation:
 For a mailed survey, identify steps for
administering the survey and for following up
to ensure a high response rate.
◦ The first mail-out may be an advance-notice letter to all
participants
◦ The second mail-out may be the actual survey
◦ The third mail-out may be the participation certificate etc.
Variables in the Study:
 Although the purpose statement or hypotheses
reflect the variables, it is useful to relate the
variables to the specific questions or
hypotheses on the instrument in the method
section of the proposal.
◦ One technique is to relate the variables, the research questions or
hypotheses, and sample items on the survey instrument so that a reader
can easily determine how the data collection connects to the variables and
questions/hypotheses.
Variables in the Study:
Data Analysis and Interpretation:
 In the proposal, present information about
the steps involved in analyzing the data.
◦ E.g. How data will be aggregated
◦ Which statistical tools/tests will be used
◦ What the rationale is behind using these tools
◦ How the results will be presented.


 An experiment is a practical test of something, carried out
with a scientific procedure/approach, after which the
outcome is observed.
 Participants
 Variables
 Materials
 Procedures
Participants:
 Describe the selection process for participants as
either random or nonrandom.
 Tell the reader about the number of participants
in each group and the systematic procedures for
determining the size of each group.
 Identify other features in the experimental design
that will systematically control the variables that
might influence the outcome.
◦ Equating the groups, for example.
Variables:
 Independent variables
 Dependent variables
◦ Example: “Students with a high IQ level achieve higher grades.” Here IQ level is the
independent variable and grades are the dependent variable.
Materials:
 Report on the materials used for the
experimental treatment (e.g., lessons and
handouts in case of a special study program).
 Describe the instrument or instruments
participants complete in the experiment.
 Report on the validity of instruments.
Experimental Procedures:
 Identify the type of experimental design to be
used in the proposed study.
◦ Pre-experimental designs
◦ Quasi-experiments
◦ True experiments
◦ Single-subject
 Qualitative Research methods involve
collecting and analyzing non-numerical data
(e.g., text, video, or audio) to understand
concepts, opinions, or experiences.

 Some common methods include focus groups
(group discussions), individual interviews,
and participation/observations.

 The sample size is typically small


 Natural setting: Qualitative researchers tend
to collect data in the field at the site where
participants experience the issue or problem
under study.
◦ This up-close information gathered by actually
talking directly to people and seeing them behave
and act within their context is a major characteristic
of qualitative research.
 Researcher as key instrument: Qualitative
researchers collect data themselves through
examining documents, observing behavior, or
interviewing participants.
◦ They may use a protocol—an instrument for
collecting data—but the researchers are the ones
who actually gather the information.
 Multiple sources of data: Qualitative
researchers typically gather multiple forms of
data, such as interviews, observations,
documents, and audiovisual information
rather than rely on a single data source.
◦ The researchers review all of the data, make sense
of it and present the conclusions.
 Inductive and deductive data analysis:
Qualitative researchers build their
conclusions/assumptions from the bottom up
by organizing the data into increasingly more
abstract units of information.
 Then deductively, the researchers look back
at their data to determine if more evidence
can support each conclusion or whether they
need to gather additional information.
◦ Thus, while the process begins inductively,
deductive thinking also plays an important role as
the analysis moves forward.
 Participant’s meanings: In the entire
qualitative research process, the researcher
keeps a focus on learning the meaning that
the participants hold about the problem or
issue, not the meaning that the researchers
bring to the research or that writers express
in the literature.
 Emergent design: The research process for
qualitative researchers is emergent. This
means that the initial plan for research
cannot be tightly prescribed, and some or all
phases of the process may change or shift
after the researcher enters the field and
begins to collect data.
◦ For example, the questions may change, the forms
of data collection may shift, and the individuals
studied and the sites visited may be modified.
 Reflexivity: Reflexivity is about
acknowledging your role in the research. As a
qualitative researcher, you are part of the
research process, and your prior experiences,
assumptions and beliefs will influence the
research process.
 Holistic account: The whole phenomenon
under study is understood as a complex
system that is more than the sum of its parts.


 Ethnography
 Narrative Method
 Phenomenological
 Grounded Theory
 Case Study
 The term ethnography literally means “writing
about groups of people”.
 You can identify a group of people; study
them in their homes or workplaces; note how
they behave, think, and talk; and develop a
general portrait of the group.
 The researcher gets involved in the environment, lives with
the target audience, and collects data through
observing and interacting with subjects
 The term narrative comes from the verb “to
narrate” or “to tell (as a story) in detail”.
 People tell stories about their life.
 Researcher then focuses on studying a single
person, gathering data through the collection
of stories and reporting individual
experiences.
 The researcher conducts in-depth interviews
and reads various documents. The researcher
also reviews the events that largely impact
the personality of an individual.
 The term phenomenological means the study
of a phenomenon e.g. a situation, an
experience, an event etc.
 The fundamental goal of the approach is to
arrive at a description of the nature of the
particular phenomenon.
 Data is collected through interviews, visiting
places, observation, surveys, and reading
documents.
 Grounded theory provides the reason behind an event, a
situation or an experience.
 Data is collected through observation,
interview, literature review, and relevant
document analysis.
 Data is then analyzed to work out the reason,
explanation or theory behind the event.
 A case study describes an experience, event or situation
etc.
 Data is collected through interviews, direct
observation, and historical documentation.
 Data is then analyzed and the conclusions are
presented.
 Researcher is typically involved in a sustained
and intensive experience with participants.
 This may result in multiple challenges
 How to avoid personal bias, personal
liking/disliking etc.
 Collecting data from participants is a difficult
task, especially when the data is personal.
 Procedures adopted to ensure the safety of
personal data must be articulated to the
participants.
 If you are new to qualitative research, it is better to
take advice from an experienced researcher.
 A mixed methods research design is a
procedure for collecting, analyzing, and
“mixing” both quantitative and qualitative
research and methods in a single study to
understand a research problem*
 This “mixing” or blending of data, it can be
argued, provides a stronger understanding of
the problem.
 Both Qualitative and Quantitative data
 Strengths of both methods

* Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting mixed methods research (2nd ed.). Thousand Oaks, CA: Sage.
 In this approach, a researcher collects both
quantitative and qualitative data, analyzes
them separately, and then compares the
results to see if the findings confirm or
disconfirm each other.
 An issue with this approach is the sample size
for both the qualitative and quantitative data
collection process.
 One option is to use the same number of individuals in both the
qualitative and quantitative databases.
◦ The qualitative sample will then be larger, which limits
the amount of data collected from any one
individual
 Another issue is whether the individuals in the
qualitative sample should also be
individuals in the quantitative sample.
 In this design we collect and analyze
quantitative data which then is followed by
the collection and analysis of qualitative data.
 Two phases
 The quantitative results typically inform the
types of participants to be purposefully
selected for the qualitative phase and the
types of questions that will be asked of the
participants.
 The overall intent of this design is to have the
qualitative data help explain in more detail
the initial quantitative results.
 In case of outliers, the participants selected
may affect the final conclusions.
 Easy but time-consuming.
 In this design we collect and analyze
qualitative data which then is followed by the
collection and analysis of quantitative data.
 Two phases
 A challenge is how to develop an instrument out of the
findings from the qualitative data
 Sample selection may be a problem
 Again, a time-consuming activity


 Large sample size
 Quick Collection of information
 Minimum bias (due to randomization)
 Remote collection of data
 Does not require direct observation
 Only quantities and no reason behind
quantities
 No interaction with participants means no
review of results with participants
 Expensive
 Difficult to modify instrument once study
begins
 Relatively easy
 You get reasons behind motivations
 Discussion with participants
 Helps explore new areas of research/ideas
 Easy to modify/flexible
 Less expensive
 No numbers and figures
 Small sample size
 Difficult to interact with people
 Researcher bias can influence results
 Combines strengths of both qualitative and
quantitative methods
 Collects rich and comprehensive data
 Flexibility
 Strong inference and evidence
 Accurate results
 Sample size
 Increased complexity
 Time-consuming
 More resources required


Qualitative methods | Quantitative methods | Mixed methods
Qualitative data, i.e. non-numeric | Quantitative data comprising measurable quantities | Both qualitative and quantitative data
Text, images, videos etc. | Numbers, values, tables | Text, images, videos, numbers, values, tables etc.
Open-ended instruments | Closed-ended instruments | Both closed-ended and open-ended instruments
Interviews, group discussion, audio visuals etc. | Survey, experiments etc. | Interviews, group discussion, audio visuals, survey, experiments etc.
Summarizing, categorizing, interpretation etc. | Formal mathematical and statistical analysis | Both
Small sample size | Large sample size | Depends on researcher


 Hypothesis testing is used to test an
assumption/hypothesis regarding a
population parameter
 The purpose of hypothesis testing is to
determine whether:
◦ there is enough statistical evidence in favor of a
certain belief/hypothesis, about a population
parameter
◦ Or, a result is due to chance occurrence

 Accurately predicting coin tosses:
◦ Probability of accurately predicting two consecutive
coin tosses = $(1/2)^2$ = 0.25
◦ Probability of accurately predicting seven
consecutive coin tosses = $(1/2)^7 \approx 0.008$

 Null hypothesis (denoted by H0): Statement of
zero or no change
 Alternative hypothesis (denoted by Ha or H1):
statement that there is a change (what we
hope to prove)
 We test only the null hypothesis i.e. we either
“reject the null hypothesis” (which suggests
the alternative hypothesis) or “fail to reject
the null hypothesis”

 A website’s interface is changed with the intent to increase the
mean amount of time people spend on it. Before changing the
interface, the average visit duration was 15 minutes
◦ H0: μ = 15 min.
◦ Ha: μ > 15 min.
 As per the provincial government, the current literacy rate of the
province is 65%. Ali suspects that it is less than 65%. He
randomly sampled 120 people and found that 59.2% of them
were literate
◦ H0: p = 0.65
◦ Ha: p < 0.65
 A restaurant owner installed a new automated drink machine.
The machine is designed to dispense 530mL of liquid on the
medium size setting. The owner suspects that the machine may
not be dispensing the said quantity of liquid in medium drinks [3]
◦ H0: μ = 530mL
◦ Ha: μ ≠ 530mL

 Test statistic: A test statistic is computed on the sample to
measure the difference between the observed data and what
would be expected under the null hypothesis
◦ e.g. test can be z-test or t-test, and test statistic can be z-statistic or t-
statistic
 p-value: It is the probability of finding results at least as extreme
as the observed ones when the null hypothesis is true
◦ Its value is between 0 and 1
◦ A low value indicates strong evidence against the null hypothesis; a high value
means the data are consistent with the null hypothesis, not that it is proven true
 Significance level (α): It is a pre-defined value for a hypothesis
test. If the calculated p-value is ≤ α, we reject the null
hypothesis (i.e. the results are statistically significant), otherwise
we fail to reject the null hypothesis
◦ Typical values for significance level are 0.05 and 0.01
 Statistical significance: A result is statistically significant when it
is very unlikely to have occurred given the null hypothesis is true

 Type of test: It depends on the alternative
hypothesis. It can be either one-tailed (right or
left-tailed) or two-tailed
 Critical region: The critical region (also called
rejection region) is the region of values that
corresponds to the rejection of the null
hypothesis
◦ It depends on the value of significance level and the type
of test
 Acceptance region: The region of values where
we fail to reject the null hypothesis
 Critical value(s): The value(s) which separate the
critical region from the acceptance region

 H0: μ = 0, Ha: μ > 0, α = 0.05
Suppose n ≥ 30 so the test is z-test
[Figure: standard normal curve for a right-tailed test, with the acceptance region (95%) to the left of the critical value and the critical region (α = 5%) in the right tail; the value of the test statistic is marked on the axis]
 H0: μ = 0, Ha: μ ≠ 0, α = 0.05
Suppose n ≥ 30 so the test is z-test
[Figure: standard normal curve for a two-tailed test, with the acceptance region (95%) between the two critical values and a critical region (α/2 = 2.5%) in each tail; the value of the test statistic is marked on the axis]
1. Define null and alternative hypothesis
2. Define the significance level (α) for the test
3. Define the test statistic and critical region
4. Calculate test statistic and p-value, and
compare p-value with significance level
5. Conclude regarding the hypothesis
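
The five steps can be sketched in Python for a z-test about a mean. This is a minimal illustration, not a definitive implementation: the function name `z_test_for_mean`, the tail labels, and the sample figures (40 bottles with mean 528.5 mL and standard deviation 4.0 mL) are hypothetical; only the hypotheses H0: μ = 530 mL and Ha: μ ≠ 530 mL come from the earlier slide.

```python
from math import sqrt
from statistics import NormalDist


def z_test_for_mean(x_bar, s, n, mu0, alpha, tail="two"):
    """Steps 3 to 5 of a z-test about a mean (assumes n >= 30).

    tail is "right", "left" or "two" (labels assumed for this sketch).
    Returns (z statistic, p-value, whether H0 is rejected).
    """
    # Steps 3 and 4: compute the test statistic and its p-value
    z = (x_bar - mu0) / (s / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "right":
        p_value = 1 - cdf
    elif tail == "left":
        p_value = cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    # Step 5: reject H0 when the p-value is at most alpha
    reject_h0 = p_value <= alpha
    return z, p_value, reject_h0


# Step 1: H0: mu = 530, Ha: mu != 530.  Step 2: alpha = 0.05.
# The sample figures below are hypothetical.
z, p_value, reject_h0 = z_test_for_mean(
    x_bar=528.5, s=4.0, n=40, mu0=530, alpha=0.05, tail="two"
)
print(round(z, 2), round(p_value, 3), reject_h0)
```

With these made-up numbers the z-statistic is about -2.37, the p-value is below 0.05, and the null hypothesis is rejected.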

1. Practical Statistics for Data Scientists by
Peter Bruce and Andrew Bruce
2. https://www.statisticshowto.com
3. Khan Academy (khanacademy.org) video
tutorials on hypothesis testing

                              Null hypothesis (H0) actually is
Decision made about H0        True                False
Fail to reject                Correct decision    Type II error
Reject                        Type I error        Correct decision

 Type I error: Rejecting the null hypothesis (H0) when it is
actually true
◦ Probability of Type I error is ‘α’, the significance level
 Type II error: Failing to reject the null hypothesis (H0) when
it is actually false
◦ Probability of Type II error is ‘β’
 Power of a test: It is the probability of rejecting the null
hypothesis when it is actually false
◦ It is the probability of avoiding a type II error, so it is ‘1 - β’
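
To make β and power concrete, the sketch below computes them for a right-tailed z-test. It rests on assumptions that are not in the lecture: the population standard deviation σ is treated as known, the true mean is called μ2, and all numbers (μ1 = 25, μ2 = 27, σ = 5.1, n = 50) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_power(mu1, mu2, sigma, n, alpha):
    """Return (beta, power) for H0: mu = mu1 against Ha: mu > mu1.

    mu2 is the assumed true mean; sigma is assumed known.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    # Smallest sample mean that leads to rejecting H0
    critical_mean = mu1 + NormalDist().inv_cdf(1 - alpha) * se
    # Type II error: the sample mean stays below the cutoff
    # even though the true mean is mu2
    beta = NormalDist(mu2, se).cdf(critical_mean)
    return beta, 1 - beta


beta, power = right_tailed_power(mu1=25, mu2=27, sigma=5.1, n=50, alpha=0.05)
print(round(beta, 2), round(power, 2))  # about 0.13 and 0.87
```

Raising n or α, or moving μ2 further from μ1, lowers β and therefore raises the power.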

 H0: μ = μ1, Ha: μ > μ1

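[Figure: two overlapping sampling distributions centered at μ1 (under H0) and μ2 (under Ha), with areas marked α (probability of Type I error), β (probability of Type II error) and 1 - β (power).]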

 Practical Statistics for Data Scientists by Peter
Bruce and Andrew Bruce
 https://www.statisticshowto.com

 Testing about population mean (μ)
◦ If sample size (n) ≥ 30
 We use z-test and the calculated statistic is z-statistic
or z-score
◦ If sample size (n) < 30, and population is normally distributed
 We use t-test and the calculated statistic is t-statistic
◦ In either case, test statistic is calculated as $\frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
where $\bar{x}$ is the sample mean
$\mu_0$ is the population mean from the H0
$s$ is the sample standard deviation
$n$ is the sample size

 Testing about population proportion (p)
◦ We use z-test and the calculated statistic is z-statistic
◦ Test statistic is calculated as $\frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$
where $\hat{p}$ is the sample proportion
$p_0$ is the population proportion from the H0
$n$ is the sample size
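
The test statistics for a mean and for a proportion translate directly into code. The sketch below is illustrative only; the function names and the input values are hypothetical.

```python
from math import sqrt


def mean_test_statistic(x_bar, mu0, s, n):
    """(x_bar - mu0) / (s / sqrt(n)): a z- or t-statistic depending on n."""
    return (x_bar - mu0) / (s / sqrt(n))


def proportion_test_statistic(p_hat, p0, n):
    """(p_hat - p0) / sqrt(p0 * (1 - p0) / n): a z-statistic."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical inputs chosen so both statistics come out close to 2
stat_mean = mean_test_statistic(x_bar=52, mu0=50, s=8, n=64)
stat_prop = proportion_test_statistic(p_hat=0.6, p0=0.5, n=100)
print(round(stat_mean, 2), round(stat_prop, 2))  # 2.0 2.0
```

Note that the proportion formula uses p0 from the null hypothesis in the denominator, not the sample proportion.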

 p-value for right-tailed z-test
◦ e.g. H0: μ = μ1, Ha: μ > μ1
Assume z-statistic = 1.15
p-value = 1 – 0.8749 = 0.1251

 p-value for two-tailed z-test
◦ e.g. H0: μ = μ1, Ha: μ ≠ μ1
Assume z-statistic = -2.05
p-value = 2 × 0.0202= 0.0404
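
The right-tailed and two-tailed p-values above are read from a z-table; they can also be reproduced with the standard normal CDF. The helper names in this sketch are assumptions made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def p_value_right(z):
    """Area to the right of z (Ha uses >)."""
    return 1 - std_normal.cdf(z)


def p_value_left(z):
    """Area to the left of z (Ha uses <)."""
    return std_normal.cdf(z)


def p_value_two(z):
    """Twice the area beyond |z| (Ha uses ≠)."""
    return 2 * (1 - std_normal.cdf(abs(z)))


print(round(p_value_right(1.15), 4))  # 0.1251
print(round(p_value_two(-2.05), 4))   # 0.0404
```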

 p-value for left-tailed t-test
◦ e.g. H0: μ = μ1, Ha: μ < μ1, n = 10 → df = 9
Assume t-statistic = -1.39
p-value ≈ 0.10

 Practical Statistics for Data Scientists by Peter
Bruce and Andrew Bruce
 https://www.statisticshowto.com
 t-distribution calculator
◦ https://homepage.divms.uiowa.edu/~mbognar/applets/t.html

1. The sample must be random
2. The sampling distribution of the sample statistic
(i.e. mean or proportion) must be normally
distributed
3. The individual observations / data values in the
sample must be independent

 For mean
i. if n ≥ 30
 We use the z-distribution to build a confidence interval or do a hypothesis test
ii. if n < 30 and the population is normally distributed
 We use the t-distribution to build a confidence interval or do a
hypothesis test
iii. if n < 30 AND the distribution of the population is unknown
 We plot the sample data to check whether it is roughly symmetric
with no outliers; if so, we use the t-distribution to build a
confidence interval or do a hypothesis test
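
The three cases for the mean can be written as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and returning `None` when no case applies is a choice made here, since the slide does not say what to do in that situation.

```python
def distribution_for_mean(n, population_normal=False, sample_looks_normal=False):
    """Pick the reference distribution for inference about a mean.

    sample_looks_normal means the plotted sample is roughly symmetric
    with no outliers. Returns "z", "t", or None if no case applies.
    """
    if n >= 30:
        return "z"  # case i: large sample
    if population_normal:
        return "t"  # case ii: small sample, normal population
    if sample_looks_normal:
        return "t"  # case iii: small sample, unknown population
    return None     # none of the three listed cases applies


print(distribution_for_mean(50))                          # z
print(distribution_for_mean(8, population_normal=True))   # t
print(distribution_for_mean(12))                          # None
```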

 For proportion
◦ Let $\hat{p}$ be the sample proportion and n be the sample size:
the sampling distribution of the sample proportion is normally
distributed if $n\hat{p} \geq 10$ AND $n(1 - \hat{p}) \geq 10$. We then use the
z-distribution to build a confidence interval or do a
hypothesis test
 Example 1: $\hat{p}$ = 0.28, n = 50
 $n\hat{p}$ = 14, $n(1 - \hat{p})$ = 36
 We can approximate the sampling distribution of the sample proportion
with the normal distribution
 Example 2: $\hat{p}$ = 0.16, n = 50
 $n\hat{p}$ = 8, $n(1 - \hat{p})$ = 42
 We cannot approximate the sampling distribution of the sample
proportion with the normal distribution
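
The condition for a proportion is easy to check in code. In this sketch the function name `normal_approximation_ok` and the `threshold` parameter are assumptions; the two calls use the numbers from Example 1 and Example 2.

```python
def normal_approximation_ok(p_hat, n, threshold=10):
    """Check that both n*p_hat and n*(1 - p_hat) are at least the threshold."""
    successes = n * p_hat        # expected number of "successes"
    failures = n * (1 - p_hat)   # expected number of "failures"
    return successes >= threshold and failures >= threshold


print(normal_approximation_ok(0.28, 50))  # True  (14 and 36)
print(normal_approximation_ok(0.16, 50))  # False (8 and 42)
```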

 If we are sampling with replacement then
individual observations / data values are
independent
 If we are sampling without replacement then
individual observations / data values aren't
technically independent since removing each
observation changes the population
◦ However, we can treat the individual observations as
independent if the sample size is no more than 10% of
the population. This is known as 10% condition.
 e.g. if sample size = 50 then population size must be ≥ 500
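
The 10% condition can be expressed as a one-line check. The function name is hypothetical, and integer arithmetic is used here only to avoid floating-point rounding at the boundary.

```python
def ten_percent_condition(sample_size, population_size):
    """True if the sample is no more than 10% of the population."""
    return 10 * sample_size <= population_size


print(ten_percent_condition(50, 500))  # True: 50 is exactly 10% of 500
print(ten_percent_condition(50, 400))  # False: 50 is 12.5% of 400
```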

 Practical Statistics for Data Scientists by Peter
Bruce and Andrew Bruce
 Khan Academy (khanacademy.org) video
tutorials on hypothesis testing
 https://www.statisticshowto.com

 A website’s interface is changed with the intent to increase the
mean amount of time people spend on it. Before changing the
interface, the average visit-duration was 25 minutes. After
changing the interface, a random sample of 50 visitors is drawn.
The mean and standard deviation of visit-duration for the
sample are 26 min. and 5.1 min., respectively. Is there sufficient
evidence to conclude that the new interface increased the mean
amount of time people spend on the website? Assume a
significance level of 0.05.
◦ H0: μ = 25, Ha: μ > 25, n = 50, $\bar{x}$ = 26, s = 5.1, α = 0.05
◦ z-statistic = $\frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{26 - 25}{5.1 / \sqrt{50}}$ = 1.386 ≈ 1.39
◦ p-value = 1 – 0.9177
= 0.0823
◦ Since p-value > α, we fail to reject the null hypothesis,
and conclude that there is insufficient evidence that the
new interface increased the mean amount of time people
spend on the website
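
The website example can be checked numerically. The sketch below uses the figures from the problem statement; the variable names are assumptions. Because it does not round z to 1.39 before the lookup, the p-value differs slightly from the table-based 0.0823.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
mu0, n, x_bar, s, alpha = 25, 50, 26, 5.1, 0.05

z = (x_bar - mu0) / (s / sqrt(n))   # test statistic
p_value = 1 - NormalDist().cdf(z)   # right-tailed test (Ha: mu > 25)
reject_h0 = p_value <= alpha

print(round(z, 2))        # 1.39
print(round(p_value, 3))  # about 0.083
print(reject_h0)          # False: fail to reject H0
```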
 The government has set the minimum weight of a roti sold at
tandoors at 100g. A person wanted to check whether the tandoor
in his street is following the standard. He took a random
sample of 8 rotis and measured their weights. The mean and the
standard deviation of the sample are 95.5g and 3.64g,
respectively. Is there sufficient evidence to conclude at a
significance level of 0.01 that the tandoor is not following the
standard? Assume that the weight of a roti is normally distributed.
Provide statistical justification for your answer.
◦ H0: μ ≥ 100g, Ha: μ < 100g, n = 8, $\bar{x}$ = 95.5, s = 3.64, α = 0.01
◦ t-statistic = $\frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{95.5 - 100}{3.64 / \sqrt{8}}$ ≈ -3.5
◦ p-value ≈ 0.005 (left-tailed, df = n - 1 = 7)
◦ Since p-value < α, we reject the null hypothesis, and
conclude that there is sufficient evidence that the tandoor
is not following the standard
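
The roti example can be checked with the critical-value approach, which avoids needing a t-distribution routine. In this sketch the variable names are assumptions, and the critical value -2.998 is the standard t-table entry for a one-tailed test with α = 0.01 and 7 degrees of freedom; it is typed in by hand rather than computed.

```python
from math import sqrt

# Values from the problem statement
mu0, n, x_bar, s = 100, 8, 95.5, 3.64

t = (x_bar - mu0) / (s / sqrt(n))  # test statistic
df = n - 1                         # degrees of freedom

# t-table value for a left-tailed test, alpha = 0.01, df = 7
t_critical = -2.998

# Left-tailed test: reject H0 when t falls at or below the critical value
reject_h0 = t <= t_critical

print(round(t, 2), df, reject_h0)  # -3.5 7 True
```

Rejecting because the statistic lies in the critical region agrees with the p-value comparison on the slide.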
 According to a very large poll, 36% of people in the country support
political party A. Faraz wants to know whether the figure is true
for the constituency where he lives. He took a random sample of
100 people in the constituency and asked them which political
party they support. According to the sample data, 32% of people
expressed their support for party A. Is there sufficient evidence
at a significance level of 0.05 that the proportion of people who
support party A is not 36% in the constituency? Provide statistical
justification for your answer.
◦ H0: p = 0.36, Ha: p ≠ 0.36, n = 100, $\hat{p}$ = 0.32, α = 0.05
◦ z-statistic = $\frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} = \frac{0.32 - 0.36}{\sqrt{0.36 (0.64) / 100}}$ ≈ -0.83
◦ p-value = 2 × 0.2033
= 0.4066
◦ Since p-value > α, we fail to reject the null hypothesis, and
conclude that there is insufficient evidence that the proportion of
people who support party A is not 36% in the constituency
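
The poll example can also be checked numerically. The variable names in this sketch are assumptions. The computed p-value is slightly below the table-based 0.4066 because z is not rounded to -0.83 first.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
p0, n, p_hat, alpha = 0.36, 100, 0.32, 0.05

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed test (Ha: p != 0.36)
reject_h0 = p_value <= alpha

print(round(z, 2))        # -0.83
print(round(p_value, 2))  # about 0.4
print(reject_h0)          # False: fail to reject H0
```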

 https://www.statisticshowto.com
 t-distribution calculator
◦ https://homepage.divms.uiowa.edu/~mbognar/applets/t.html

 Once a research work is completed, you write
a research report
 A research report is a document that provides
complete details of the research and its
findings
 Essential step
 Writing a research report is important as it is
the only tool to communicate your research
results.
 1. Logical analysis of the subject matter:
◦ It is the first step which is primarily concerned with
the development of a subject. There are two ways in
which to develop a subject.
 (a) Logically
 (b) Chronologically

 2. Preparation of the final outline:
◦ “Outlines are the framework upon which long
written works are constructed”.
 3. Preparation of the rough draft:
◦ This step is of utmost importance for the
researcher, who now writes down what has been done
in the course of the research study.

 4. Rewriting and polishing of the rough draft:
◦ The careful revision makes the difference between a
mediocre and a good piece of writing.
◦ While rewriting and polishing, one should check the
report for weaknesses in logical development or
presentation.
 5. Preparation of the final bibliography:
◦ All the work that has been consulted: Books,
research articles, magazine, newspaper articles etc.

 6. Writing the final draft:
◦ The final draft should be carefully written. There should
be no ambiguity or vagueness. Everything should be
clearly stated.
 The layout of the report specifies what the
research report should contain.
 A comprehensive layout of the research
report should comprise
◦ 1. Preliminary pages
◦ 2. The main text
◦ 3. The end matter
1. Preliminary Pages
◦ In its preliminary pages the report should carry a
title and date, followed by acknowledgements in the
form of ‘Preface’ or ‘Foreword’.
◦ Then there should be a table of contents followed
by list of tables and illustrations
2. Main Text
 The main text provides the complete outline of
the research report along with all details.
 The main text of the report should have the
following sections:
◦ Introduction
◦ Statement of findings and recommendations
◦ The results
◦ The implications drawn from the results
◦ The summary.
 Each main section of the report should begin on
a new page.
2.1 Introduction:
 The purpose of introduction is to introduce the
research project to the readers.
 Enough background should be given to make
clear to the reader why the problem was
considered worth investigating.
 The hypotheses of study, if any, and the
definitions of the major concepts employed in
the study should be explicitly stated in the
introduction of the report.
 The methodology adopted in conducting the
study must be fully explained.
2.1 Introduction:
 How was the study carried out?
 What was its basic design?
 If the study was an experimental one, then what were the experimental
manipulations?
 If the data were collected by means of questionnaires or interviews, then exactly
what questions were asked (The questionnaire or interview schedule is usually
given in an appendix)?
 If measurements were based on observation, then what instructions were given to
the observers?
 Regarding the sample used in the study the reader should be told: Who were the
subjects? How many were there?
 How were they selected?
2.2 Statement of findings and
recommendations:
◦ After introduction, the research report must contain
a statement of findings and recommendations in
non-technical language so that it can be easily
understood by all concerned.
2.3 Results:
◦ Detailed presentation of the findings of the study,
with supporting data in the form of tables and
charts together with a validation of results, is the
next step in writing the main text of the report.
◦ Provide result summaries rather than raw data
2.4 Implications of the results:
◦ State the conclusions drawn from the results
◦ At the same time, a forecast of the probable future
of the subject and an indication of the kind of
research which needs to be done in that particular
field is useful and desirable.
2.5 Summary
◦ It has become customary to conclude the research
report with a very brief summary, restating in brief
the research problem, the methodology, the major
findings and the major conclusions drawn from the
research results.
3 End Matter
◦ At the end of the report, appendices should be
included for all technical data such as
questionnaires, sample information, mathematical
derivations and the like.
◦ Bibliography of sources consulted should also be
given.
◦ Index should be given at the end of the report
1. Preliminary Pages
2. Main Text
2.1 Introduction
2.2 Statement of findings and recommendations
2.3 The results
2.4 The implications drawn from the results
2.5 The summary
3. End Matter
 The results of a research investigation can be
presented in a number of ways viz., a
technical report, a popular report, an article,
a monograph or at times even in the form of
oral presentation.
 Here we will discuss Technical Report,
Popular Report and Oral Presentation.
 Technical report is used whenever a full
written report of the study is required
whether for recordkeeping or for public
dissemination.
 In the technical report the main emphasis is
on:
◦ The methods employed
◦ Assumptions made in the course of the study
◦ The detailed presentation of the findings including
their limitations and supporting data
 1. Summary of results: A brief review of the main findings just in two or three
pages.
 2. Nature of the study: Description of the general objectives of study, formulation
of the problem in operational terms, the working hypothesis, the type of analysis
and data required, etc.
 3. Methods employed: Specific methods used in the study and their limitations. For
instance, in sampling studies we should give details of sample design viz., sample
size, sample selection, etc.
 4. Data: Discussion of data collected, their sources, characteristics and limitations.
If secondary data are used, their suitability to the problem at hand be fully assessed.
In case of a survey, the manner in which data were collected should be fully
described.
 5. Analysis of data and presentation of findings: The analysis of data and
presentation of the findings of the study with supporting data in the form of tables
and charts be fully narrated. This, in fact, happens to be the main body of the report
usually extending over several chapters.
 6. Conclusions: A detailed summary of the findings and the policy implications
drawn from the results be explained.
 7. Bibliography: Bibliography of various sources consulted be prepared and
attached.
 8. Technical appendices: Appendices be given for all technical matters relating to
questionnaire, mathematical derivations, elaboration on particular technique of
analysis and the like ones.
 9. Index: Index must be prepared and be given invariably in the report at the end.
 A popular report is used if the research
results have policy implications.
 The popular report is one which gives
emphasis on simplicity and attractiveness.
The simplification should be sought through
clear writing, minimization of technical,
particularly mathematical, details and liberal
use of charts and diagrams.
 1. The findings and their implications: Emphasis in the report is given on the
findings of most practical interest and on the implications of these findings.
 2. Recommendations for action: Recommendations for action on the basis of the
findings of the study are made in this section of the report.
 3. Objective of the study: A general review of how the problem arose is presented
along with the specific objectives of the project under study.
 4. Methods employed: A brief and non-technical description of the methods and
techniques used, including a short review of the data on which the study is based, is
given in this part of the report.
 5. Results: This section constitutes the main body of the report wherein the results
of the study are presented in clear and non-technical terms with liberal use of all
sorts of illustrations such as charts, diagrams and the like ones.
 6. Technical appendices: More detailed information on methods used, forms, etc.
is presented in the form of appendices. But the appendices are often not detailed if
the report is entirely meant for general public.
 At times oral presentation of the results of
the study is considered effective.
 It leads to discussion, and so to a better understanding
of the results, conclusions and decisions
 But the main demerit of this sort of
presentation is the lack of any permanent
record concerning the research details and it
may be just possible that the findings may
fade away from people’s memory even before
an action is taken.
 In order to overcome this difficulty, a written
report may be circulated before the oral
presentation and referred to frequently
during the discussion.
 Use of slides, wall charts and blackboards is
quite helpful in contributing to clarity and in
reducing the boredom.
 Report should be long enough to cover the
subject but short enough to maintain
interest.
 A research report should not be dull; it
should be such as to sustain reader’s interest.
 Use charts, graphs for quick extraction of
knowledge.
 Avoid grammatical mistakes and provide precise
and clear details. It is better to consult a
technical report writer before writing.
 The reports should be free from grammatical
mistakes and must be prepared strictly in
accordance with the techniques of composition
of report-writing such as the use of quotations,
footnotes, documentation, proper punctuation
and use of abbreviations in footnotes and the
like.
 The report must present the logical analysis of
the subject matter. It must reflect a structure
wherein the different pieces of analysis relating
to the research problem fit well.
 Appendices, Bibliography and index should
be provided.
 Report must be attractive in appearance, neat
and clean, whether typed or printed.
 Limitations and future directions should be stated.
 Analysis functions help you perform tasks
related to data analysis
 We will use Microsoft Excel as a tool
◦ Simple and easily available
 Sum
◦ Sum(Number1,[Number2],…)
◦ Adds values
 Average
◦ Average(Number1,[Number2],…)
◦ Returns average value
 Count
◦ Count(value1,[value2],[value3],…)
◦ Returns total number of cells containing numbers
 CountA
◦ CountA(value1,[value2],[value3],…)
◦ Returns total number of cells that are not empty
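
For readers who want to see what these four functions compute, the sketch below mimics them in Python on a small list standing in for a column of cells. It is only a rough analogue, not Excel itself: the data, the helper `is_number`, and the use of `None` for an empty cell are assumptions made for illustration.

```python
# A hypothetical column of cells; None stands for an empty cell
cells = [12, 7.5, "pending", None, 20]


def is_number(value):
    """True for int or float values (booleans are excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


numbers = [v for v in cells if is_number(v)]

total = sum(numbers)                               # like Sum: adds the numeric cells
average = total / len(numbers)                     # like Average: mean of the numeric cells
count = len(numbers)                               # like Count: cells containing numbers
count_a = len([v for v in cells if v is not None]) # like CountA: cells that are not empty

print(total, round(average, 2), count, count_a)  # 39.5 13.17 3 4
```

The difference between the last two results shows why Count and CountA are separate functions: the text cell is counted by CountA but not by Count.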
 IF
◦ Allows you to make logical comparisons between a
value and what you expect
◦ IF (logical_test, [value_if_true], [value_if_false])
 SUMIF
◦ sum the values in a range that meet criteria
◦ SUMIF(range, criteria, [sum_range])
 CONCATENATE
◦ Concatenates (joins) text in cells
◦ CONCATENATE(Text1, [Text2],…)
 LEFT
◦ Extracts specified characters from text from left
◦ LEFT(Text,[num_chars])
 RIGHT
◦ Extracts specified characters from text from right
◦ RIGHT(Text,[num_chars])
 MID
◦ Extracts specified characters from the middle of text, starting at a given position
◦ MID(Text, start_num, num_chars)
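
The behaviour of these functions can be illustrated with rough Python analogues. This is a sketch, not Excel's implementation: the function names are hypothetical, and the SUMIF criterion is passed as a Python function here, whereas Excel takes it as a value or a string such as ">10".

```python
def excel_if(logical_test, value_if_true, value_if_false):
    """Like IF: pick one of two values based on a condition."""
    return value_if_true if logical_test else value_if_false


def sumif(cells, criterion, sum_cells=None):
    """Like SUMIF: add the sum_cells whose matching cell meets the criterion."""
    if sum_cells is None:
        sum_cells = cells  # with no sum range, the tested cells are summed
    return sum(s for c, s in zip(cells, sum_cells) if criterion(c))


def concatenate(*texts):
    """Like CONCATENATE: join pieces of text."""
    return "".join(texts)


def left(text, num_chars=1):
    """Like LEFT: the first num_chars characters."""
    return text[:num_chars]


def right(text, num_chars=1):
    """Like RIGHT: the last num_chars characters."""
    return text[-num_chars:] if num_chars > 0 else ""


def mid(text, start_num, num_chars):
    """Like MID: num_chars characters starting at 1-based position start_num."""
    return text[start_num - 1:start_num - 1 + num_chars]


print(excel_if(72 >= 50, "Pass", "Fail"))                 # Pass
print(sumif([5, 12, 8, 20], lambda v: v > 10))            # 32
print(sumif(["north", "south", "north"],
            lambda region: region == "north",
            [100, 50, 25]))                               # 125
print(concatenate("Research", " ", "Report"))             # Research Report
print(left("Research", 3), right("Research", 4), mid("Research", 3, 4))  # Res arch sear
```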