
Chapter Five

Research Design

Research Design
• Research design is the framework of research methods (types) and
techniques chosen by a researcher.
• The research design is the conceptual structure within which research is
conducted; it constitutes the blueprint for the collection, measurement
and analysis of data.
• Creating a research design means making decisions about:
– The type of data you need (qualitative, quantitative, observational, experimental)
– The location and timescale of the research
– The participants and sources
– The variables and hypotheses (if relevant)
– The methods for collecting and analysing data
• The research design determines exactly what will and will not be
included. It also defines the criteria by which you will evaluate your
results and draw your conclusions.
• The reliability and validity of your study depend on how you collect,
measure, analyse, and interpret your data.
We may split the overall research design into three:
• The sampling design - which deals with the method of selecting
items to be observed for the given study.
• The statistical design - which concerns the question of how
many items are to be observed and how the information and data
gathered are to be analysed.
• The operational design - which deals with the techniques by
which the procedures specified in the sampling, statistical and
observational designs can be carried out.
• Research design can be:
– Quantitative: quantitative research aims at determining the relationship
between quantitative variables or comparing groups.
– Qualitative: qualitative research is concerned with how people think and act
in their everyday lives.
• In qualitative interviewing, researchers model their interviews
after a normal conversation rather than a formal question and
answer exchange.
What is Sampling?
• Research studies involve a particular group of participants, but
they are intended to answer a general question about a larger population of
individuals.
• Sampling is the act, process, or technique of selecting a
representative part of a population for the purpose of determining
parameters or characteristics of the whole population.
• A sample is “a smaller but hopefully representative collection of units
from a population used to determine truths about that population”
• A sample can be of any size, from a single person to 50 people or more.
• The larger the sample, the more likely the sample will share the same
characteristics as the population.
• EXAMPLE: Flipping a coin
• The more times we flip a coin, the closer the observed share of heads is
likely to be to an equal split (i.e., 50/50) between heads and tails.
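The coin example can be tried out with a short simulation. This is only a sketch: the function name `share_of_heads`, the seed and the flip counts are illustrative assumptions, not part of the original slides.

```python
import random

def share_of_heads(flips, seed=0):
    """Simulate `flips` fair coin tosses and return the fraction that were heads."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

# Larger samples tend to land closer to the true share of 0.5.
for flips in (10, 100, 10_000, 100_000):
    print(flips, share_of_heads(flips))
```

With only 10 flips the share can be far from 0.5; with 100,000 flips it is normally very close.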

Why sampling?
• The main reasons for sampling instead of doing a census are:
– Economy
– Timeliness
– The large size of many populations
– Inaccessibility of some of the population
– Destructiveness of the observation
• Disadvantages of sample surveys compared with censuses are:
– Data on sub-populations (such as a particular ethnic group) and
data for small geographical areas may be too unreliable to be
useful.
– Estimates are subject to sampling error, which arises because the
estimates are calculated from a part (sample) of the population.
– It may be difficult to communicate the precision (accuracy) of
the estimates to users.
• While developing a sampling design, the researcher must pay attention
to the following points:
– Type of universe
• In a finite universe the number of items is certain, but in an infinite universe the
number of items is infinite. The number of computers in a given lab is a finite universe and
the number of stars in space is an infinite universe.
– Sampling unit
• Sampling unit may be a geographical one such as state, district, village, etc., or a
construction unit such as house, flat, etc., or it may be a social unit such as family, club,
school, etc., or it may be an individual.
– Source list
• It contains the names of all items of a universe (in the case of a finite universe only). If a source
list is not available, the researcher has to prepare it. Such a list should be comprehensive,
correct, reliable and appropriate. It is extremely important for the source list to be as
representative of the population as possible.
– Size of sample
• This refers to the number of items to be selected from the universe to constitute a
sample.
– Budgetary constraint
Sampling Techniques
• Sampling methods fall into two categories:
– Probability sampling
• Every unit of the population has a known, non-zero chance of
being represented in the sample.
• That chance is equal for all members in the case of simple
random selection.
– Non-probability sampling
• The researcher has no way of forecasting that each
member of the population will be represented in the
sample.
• Some members of the population have little or no chance
of being selected.
– Probability sampling includes simple random sampling,
systematic random sampling, stratified random sampling,
proportional stratified sampling and cluster sampling.
• Simple random sampling
– The least sophisticated of all sampling designs.
– Every member of the population is given an equal chance of being
selected.
– A table of random numbers or a lottery system is used to
determine which units are to be selected.
– Good for a homogeneous population.
– Easy when the population is small and its elements are known.
– Impractical for very large populations.
• EXAMPLE 1: A list of all currently enrolled students at WCU is obtained
and a table of random numbers is used to select a sample of students.
• EXAMPLE 2: A researcher obtains a list of all residential addresses in the
town and uses a computer-generated random list of homes to be
included in a survey.
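A computer-generated selection like the one in Example 2 might look as follows. This is a minimal sketch: the student list, the sample size of 25 and the helper name `simple_random_sample` are made up for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical enrolment list standing in for a real sampling frame.
students = [f"student_{i:04d}" for i in range(1, 501)]
chosen = simple_random_sample(students, 25, seed=42)
print(chosen[:5])
```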

• Systematic random sampling
– Selecting elements of the population in a predetermined sequence.
– Select every kth item on a list (k = N/n, where N is the population size and n is the sample size).
– The random element lies in picking the starting point; a regular
pattern in the list may lead to bias.
– For example, if you wanted a sample size of 100 from a population
of 1000, select every 1000/100 = 10th member of the sampling
frame.
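The 1000/100 example can be sketched in code. The function name `systematic_sample` and the numbered frame are assumptions for illustration, and the sketch assumes N is a multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every k-th unit (k = N // n) after a random start in the first interval."""
    k = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # the only random step
    return frame[start::k][:n]

frame = list(range(1, 1001))                   # hypothetical frame of 1000 numbered units
sample = systematic_sample(frame, 100, seed=1)
print(sample[:5])
```

Once the starting point is fixed, every later pick is determined, which is why a periodic pattern in the list can bias the result.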
• Stratified random sampling
– Applied when the population has different layers (strata).
– The researcher samples equally from each one of the layers (strata).
– Examples
• Sampling of school children from grades 4, 5 and 6
• Sampling of customers equally from corporate customers and residential customers
– Reduces sampling bias.


• Proportional stratified sampling
– When the numbers of elements in the strata differ,
samples are taken from each subgroup based on the ratio of
the subgroup's size to the total population.
• Assume a total population of 1000 divided into four groups of
450, 250, 200 and 100 respectively. If we use a sample size of 200,
then 45% of the sample comes from the 1st group, 25% from the 2nd, 20% from
the 3rd and 10% from the 4th group.
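The allocation in this example can be worked out with a few lines of code. The function name `proportional_allocation` and the group labels are illustrative assumptions; with other numbers, rounding may make the parts add up to slightly more or less than the intended sample size.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split the sample across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

strata = {"group 1": 450, "group 2": 250, "group 3": 200, "group 4": 100}
allocation = proportional_allocation(strata, 200)
print(allocation)
# {'group 1': 90, 'group 2': 50, 'group 3': 40, 'group 4': 20}
```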
• Cluster sampling
– Grouping the population into clusters and then selecting members
of randomly chosen clusters.
[Figure: a population of twelve clusters, C1 to C12, from which clusters C5, C8 and C11 are randomly selected.]
– Disadvantages include an increased risk of bias if the chosen clusters
are not representative of the population, resulting in an increased
sampling error.
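A rough sketch of cluster sampling follows, assuming that whole clusters are drawn at random and every member of a drawn cluster is included. The cluster names, the member labels and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and return them with all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random selection of clusters
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 12 clusters with 5 members each.
clusters = {f"C{i}": [f"C{i}-member{j}" for j in range(1, 6)] for i in range(1, 13)}
chosen, members = cluster_sample(clusters, 3, seed=7)
print(chosen, len(members))
```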
Population characteristics                                      Appropriate sampling technique
Homogeneous members                                             Simple random sampling; systematic random sampling
Stratified population with strata approximately equal in size   Stratified random sampling
Stratified population, strata different in size                 Proportional stratified sampling
Population with discrete clusters with similar characteristics  Cluster sampling
• Non-probability sampling includes convenience sampling, quota
sampling and purposive sampling.
• Convenience sampling
– Sometimes known as grab, opportunity, accidental or
haphazard sampling.
– A type of non-probability sampling in which the sample
is drawn from that part of the population which is close to
hand (willing and available).
– That is, readily available and convenient.
– May be appropriate for some less demanding research.
– Prone to volunteer bias (the sample may not represent the population).
• Purposive sampling (judgmental sampling)
– The researcher chooses the sample based on who they think
would be appropriate for the study.
– This is used primarily when there is a limited number of people
who have expertise in the area being researched.
• Quota sampling
– A variation of convenience sampling.
– Elements are selected in the same proportion as in the
population but not in a random fashion.
• Ex: if there are equal numbers of Information Science (IS)
and Computer Science (CS) students,
– quota sampling would choose 20 IS and 20 CS
students without any attempt at random selection.
• Ex: an interviewer may be told to sample 200 females and
300 males between the ages of 45 and 60.
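The IS/CS example could be sketched like this: students are taken in whatever order they happen to be met until each quota is full, with no random step. The arrival list and the function name `quota_sample` are made up for illustration.

```python
def quota_sample(arrivals, quotas):
    """Take people in the order met until each group's quota is filled (non-random)."""
    remaining = dict(quotas)
    picked = []
    for name, group in arrivals:
        if remaining.get(group, 0) > 0:
            picked.append((name, group))
            remaining[group] -= 1
    return picked

# Hypothetical stream of students in the order the researcher meets them.
arrivals = [(f"student_{i}", "IS" if i % 3 else "CS") for i in range(1, 101)]
picked = quota_sample(arrivals, {"IS": 20, "CS": 20})
print(len(picked))
```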

Exercise - Identify the type of sampling
• Each student’s name is written on a piece of paper and placed
into a cup. Names are picked without looking.
• Every 5th phone is picked up and tested for defects.
• To conduct a survey on all GC of the Software Engineering
department, you choose Software Engineering students from
your dormitory.
• To represent all students in a school, you survey all the students on
one school bus.

Sampling bias
• Sampling bias is any influence that may have disturbed
the randomness with which a sample has
been selected.
• In probability sampling, members of the population have a known
chance of being selected, but bias can still arise.
– To study the effect of typing speed on programming, you may randomly
choose software engineering students from the students’ database. Unfortunately,
your data may still be dominated by students with high or low
interest in programming.
– The study of the use of email as a resource sharing platform in Software
Engineering may be affected by those who have no email account.
– Researchers want to understand the effect of a new traffic law in a city and
so conduct a survey via convenience sampling inside a mall. The study is
highly likely to suffer from undercoverage of the following groups:
• People who do not like visiting malls
• Those who do not have transportation to the mall
• Those who prefer visiting another mall
Determining sample size
• If the sample size is too small, it will not yield valid results or
adequately represent the realities of the population being
studied.
• On the other hand, while larger sample sizes yield smaller
margins of error and are more representative, a sample size that
is too large may significantly increase the cost and time taken to
conduct the research.
• Confidence intervals measure the degree of uncertainty or
certainty in a sampling method and how much uncertainty there
is with any particular statistic.
• The confidence interval is usually a plus or minus (±) figure.
• For example, if your confidence interval is ±6 percentage points and 60% of your
sample picks an answer, you can be confident that if you had asked the
entire population, between 54% (60-6) and 66% (60+6) would have
picked that answer.

• The confidence level refers to the probability, or
certainty, that the confidence interval would contain the true
population parameter when you draw a random sample many times.
• It is expressed as a percentage and represents how often the
percentage of the population who would pick an answer lies within
the confidence interval.
– For example, a 99% confidence level means that should you repeat an
experiment or survey over and over again, 99 percent of the time the resulting
confidence interval will contain the true population value.
• The larger your sample size, the more confident you can be that the
answers truly reflect the population.
• In other words, the larger your sample size for a given confidence
level, the smaller your confidence interval.
• Another critical measure when determining the sample size is the
standard deviation, which measures how far the values in a data set are spread around its
mean.
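To see how the confidence interval narrows as the sample grows, the margin of error for a proportion can be computed for several sample sizes. This sketch assumes the usual normal-approximation formula, margin = z × sqrt(p(1 − p)/n), a 95% confidence level (z = 1.96) and p = 0.5; the function name `margin_of_error` is illustrative.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Half-width of the confidence interval for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 4))
```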

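• Another important consideration when determining your sample
size is the size of the entire population you want to study.
• The following z-scores are constants for commonly used confidence levels:

Confidence level    z-score
80%                 1.28
85%                 1.44
90%                 1.65
95%                 1.96
99%                 2.58

• Once we have all the above information, the formula for calculating the sample size is:
$$\text{Necessary sample size} = \frac{(\text{z-score})^2 \times \text{StdDev} \times (1 - \text{StdDev})}{(\text{margin of error})^2}$$
– Here StdDev is entered as a proportion between 0 and 1; 0.5 is the most conservative choice because it maximises StdDev × (1 − StdDev).
– Example: Say you choose to work with a 95% confidence level, a standard deviation of
0.5, and a confidence interval (margin of error) of ±5%.
$$\text{Sample size} = \frac{(1.96)^2 \times 0.5 \times (1 - 0.5)}{(0.05)^2} = 384.16$$
– Since a sample must contain a whole number of respondents, this is rounded up to 385.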
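The worked example can be checked with a short calculation. The function and parameter names (`necessary_sample_size`, `p`, `margin`) are illustrative assumptions; `p` stands for the 0.5 value used in the example.

```python
import math

def necessary_sample_size(z, p, margin):
    """Sample size = z^2 * p * (1 - p) / margin^2, before rounding."""
    return (z ** 2) * p * (1 - p) / margin ** 2

raw = necessary_sample_size(z=1.96, p=0.5, margin=0.05)
print(round(raw, 2))     # 384.16
print(math.ceil(raw))    # 385, rounded up to whole respondents
```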

Data Collection
• Data collection is a term used to describe the process of preparing and collecting
research data.
• It is important to choose the right data collection method(s), as this will
allow data to be collected that meet the objectives of the
research.
• Types of Data
– Primary data:
• Are those which are collected for the first time and thus happen to be original in
character.
• Include observation (systematic viewing), personal interviews (structured or
unstructured), telephone interviews, questionnaires (self-administered; popular in
the case of big inquiries) and schedules (filled in by enumerators).
– Secondary data:
• Are those which have been collected by someone else and which have already
passed through the statistical process.
• The choice of data collection method largely depends upon the objective of the research,
minimization of bias, reduction of non-response, reduction of data error and
minimization of expenses.
Development of Questionnaire
• Types of question include:
– Open-ended question
– Multiple choice
– Yes / No
– Likert scale (rating scale)
• 5 4 3 2 1 or -2 -1 0 1 2
• Ranging from strongly agree, through agree, to strongly disagree
– Rankings
• During design of the questionnaire:
– Pay attention to the pattern of questions
– Pay attention to the number of questions
– Avoid difficult questions
– Avoid leading questions
– Avoid ambiguous questions
– Pilot testing (feasibility study) is very important; it helps to validate your
questionnaire.
When to use different types of questions
• Open questions should be used when rich qualitative data are
needed that describe the respondent's perception of their
own experience.
• Multiple choice questions are useful when there is more
complexity in the range of possible responses in discrete
categories, but the range of expected responses is still fairly
limited.
• Dichotomous questions, which have only two possible answers, are
useful in situations where you want to force respondents to express
a clear opinion or as a filter for determining which subsequent
questions are appropriate.
• Rating scales are useful for seeking a measure of perceptions
and attitudes of respondents.

Collection of secondary data
• Secondary data might be either published or unpublished.
• One should however be careful in using secondary data since
the data available may be misleading.
• One has to check for
– Suitability
• (Is it relevant for your research problem)
– Adequacy
• (will you be able to answer your questions adequately)
– Reliability
• (when was the data collected, who collected data, how
was the data collected?)

Actual Data Collection
• One should carefully plan the data collection, as this is the
point of departure for execution of the research.
• Pre-data collection
– Training of data collectors might be crucial
– Supporting letters might be necessary
• Post-data collection
– Editing of returned questionnaires
• The data you have collected may be presented using
– Textual methods
– Tabular methods
– Graphical methods
Data presentation
• Text presentation
– Text is the main method of conveying information as it is used to explain
results and trends, and provide contextual information.
– Data are fundamentally presented in paragraphs or sentences.
– Text can be used to provide interpretation or emphasize certain data.
– If quantitative information to be conveyed consists of one or two numbers, it is
more appropriate to use written language than tables or graphs.
– For instance, “The failure rate of software projects following wrong
requirements was 50% in 2016 and 60% in 2017.”
– If this information were to be presented in a graph or a table, it would occupy
an unnecessarily large space on the page, without enhancing the readers'
understanding of the data.
– If more data are to be presented, or other information such as that regarding
data trends are to be conveyed, a table or a graph would be more appropriate.
– By nature, data take longer to read when presented as texts and when the
main text includes a long list of information, readers and reviewers may have
difficulties in understanding the information.
• Table presentation
– Tables, which convey information that has been converted into words or
numbers in rows and columns, have been used for nearly 2,000 years.
– Anyone with a sufficient level of literacy can easily understand the
information presented in a table.
– Tables are the most appropriate for presenting individual information, and
can present both quantitative and qualitative information.
– The strength of tables is that they can accurately present information that
cannot be presented with a graph.
– A number such as “132.145852” can be accurately expressed in a table.
– Tables are useful for summarizing and comparing quantitative information
of different variables.
– However, the interpretation of information takes longer in tables than in
graphs, and tables are not appropriate for studying data trends.
– Furthermore, since all data are of equal importance in a table, it is not easy
to identify and selectively choose the information required.

• Graph presentation
– Whereas tables can be used for presenting all the information, graphs
simplify complex information by using images and emphasizing data
patterns or trends, and are useful for summarizing, explaining, or exploring
quantitative data.
– While graphs are effective for presenting large amounts of data, they can be
used in place of tables to present small sets of data.
– A graph format that best presents information must be chosen so that
readers and reviewers can easily understand the information.
– Scatter plot
– Scatter plots present data on the x- and y-axes and are used to investigate
an association between two variables.
– A point represents each individual or object, and an association between
two variables can be studied by analyzing patterns across multiple points.
– A regression line is added to a graph to determine whether the association
between two variables can be explained or not.

Example:
The local ice cream shop keeps track of how much ice cream they sell versus
the noon temperature on that day.
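For an example like this one, a regression line can be fitted to the points of the scatter plot by ordinary least squares. The sketch below uses made-up temperature and sales figures, since the shop's actual data are not given here; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: noon temperature (°C) and ice cream sales for six days.
temperature = [14, 16, 19, 22, 25, 28]
sales = [215, 325, 410, 520, 610, 700]
slope, intercept = fit_line(temperature, sales)
print(round(slope, 2), round(intercept, 2))
```

A positive slope here would indicate that sales tend to rise with temperature.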

– Bar graph and histogram
– A bar graph is used to indicate and compare values in a discrete category or
group, and the frequency or other measurement parameters (i.e. mean).
– Depending on the number of categories, and the size or complexity of each
category, bars may be created vertically or horizontally.
– The height (or length) of a bar represents the amount of information in a
category.
– Bar graphs are flexible, and can be used in a grouped or subdivided bar
format in cases of two or more data sets in each category.
– Unlike a bar chart, a histogram groups numbers into ranges.
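Grouping numbers into ranges, as a histogram does, can be sketched as follows. The marks, the bin width of 10 and the function name `histogram_counts` are illustrative assumptions, and the range labels assume whole-number values.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each range of width `bin_width`."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return {f"{start}-{start + bin_width - 1}": counts[start]
            for start in sorted(counts)}

marks = [35, 42, 47, 51, 58, 63, 64, 66, 71, 79, 85, 92]  # hypothetical marks
print(histogram_counts(marks, 10))
# {'30-39': 1, '40-49': 2, '50-59': 2, '60-69': 3, '70-79': 2, '80-89': 1, '90-99': 1}
```

Each count would become the height of one histogram bar.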

– Pie chart
– A pie chart, which is used to represent nominal data (in other words, data
classified in different categories), visually represents a distribution of
categories.

Descriptive and Inferential Statistics
• Descriptive statistics is the term given to the analysis of data that
helps describe, show or summarize data in a meaningful way.
• Typically, there are two general types of statistic that are used to
describe data:
– Measures of central tendency: these are ways of describing the central
position of a frequency distribution for a group of data.
– Example: the frequency distribution and pattern of marks scored by the 100
students from the lowest to the highest. We can describe this central
position using a number of statistics, including the mode, median, and mean.
– Measures of spread: these are ways of summarizing a group of data by
describing how spread out the scores are.
– To describe this spread, a number of statistics are available to us, including
the range, quartiles, absolute deviation, variance and standard deviation.
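The measures of central tendency and spread named above can be computed with Python's standard `statistics` module. This is a sketch on made-up marks, not a replacement for a statistics package; the function name `describe` is illustrative.

```python
import statistics

def describe(values):
    """Return common measures of central tendency and spread for a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

marks = [40, 55, 55, 60, 70, 80]  # hypothetical marks
summary = describe(marks)
print(summary)
```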
• Inferential statistics are techniques that allow us to use samples to
make generalizations about the populations from which the
samples were drawn.
• The methods of inferential statistics are:
– The estimation of parameters
– This means taking a statistic from your sample data (for example the sample
mean) and using it to say something about a population parameter (e.g. the
population mean).
– Testing of statistical hypotheses
– This is where you can use sample data to answer research questions.
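Parameter estimation can be illustrated by building a confidence interval for the population mean from a sample. The sketch assumes a normal approximation with z = 1.96 (95% confidence); for small samples a t-value would usually be more appropriate. The data and the function name `mean_confidence_interval` are made up.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the population mean."""
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * std_error, mean + z * std_error

marks = [40, 55, 55, 60, 70, 80]  # hypothetical sample
low, high = mean_confidence_interval(marks)
print(round(low, 2), round(high, 2))
```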

Assignment (15%)
• Use any sample data and compute components
of descriptive statistics using SPSS.
