Research design
www.knust.edu.gh
READING LIST
Aaker, D.A., Kumar, V. and Day, G.S. (1998). Marketing
research. 6th edition. John Wiley, New York.
• Example: Y = aX + C
• Y is the dependent variable.
• X is the independent (explanatory) variable.
• a is the coefficient (a constant attached to an independent variable).
• C is the constant.
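As a minimal sketch of how a and C in Y = aX + C might be estimated from data, the following uses ordinary least squares. The function name fit_line and the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates of a (coefficient) and C (constant) in Y = aX + C."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx
    c = y_bar - a * x_bar
    return a, c

# Hypothetical data that lie exactly on the line Y = 2X + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, c = fit_line(x, y)
print("coefficient a =", a, "constant C =", c)
```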
Data preparation includes editing, coding, and data entry.
These activities ensure the accuracy of the data and their
conversion from raw form to reduced and classified forms
that are more appropriate for analysis.
[Figure: data preparation flow (Raw Data, Editing, Analysis Approach?)]
o Consistent with the intent of the question and other information in the
survey.
o Uniformly entered.
o Complete.
A daily field edit enables enumerators to identify
respondents who should be recontacted to fill in omissions in
a timely fashion.
The supervisor may also use field edits to spot the need for
further interviewer training or to correct faulty procedures.
Central (In-house) Editing
Rigorous editing performed at a centralized office, by a single
editor in a small study or by a team of editors in the case of a
large inquiry.
ii. The coding categories should be mutually exclusive and
independent. This implies that there should be no
overlap among the categories to ensure that a subject or
response can be placed in only one category.
Coding Closed Questions
Multiple dummy variables are needed to represent a single
qualitative response that can take on more than two
categories.
As a rule, if k is the number of categories for a qualitative
variable, k–1 dummy variables are needed to represent the
variable.
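The k–1 rule can be sketched as follows. The helper dummy_code, the category labels, and the choice of the first category as the reference group are assumptions made for illustration.

```python
def dummy_code(values, categories):
    """Represent a k-category variable with k-1 dummy (0/1) variables.

    The first entry of `categories` is treated as the reference category,
    so it is identified by all dummies being 0.
    """
    others = categories[1:]
    return [{f"is_{c}": int(v == c) for c in others} for v in values]

# Hypothetical qualitative variable with k = 3 categories
categories = ["business", "vacation", "personal"]
responses = ["vacation", "business", "personal"]

rows = dummy_code(responses, categories)
for response, row in zip(responses, rows):
    print(response, row)
```

Each respondent gets two dummy variables here (k–1 = 2); a respondent in the reference category scores 0 on both.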
The data collection instrument can also be precoded during
the design stage.
NB: The major objective in the code-building process is to
accurately transfer the meanings from written responses to
numeric codes.
Coding Open-Ended Questions
One of the primary reasons for using open-ended questions
is that insufficient information or lack of a hypothesis may
prohibit preparing response categories in advance.
These questions may be exploratory or they may be potential
follow-ups to structured questions.
The purpose of coding such questions is to reduce the large
number of individual responses to a few general categories
of answers that can be assigned numerical codes.
Similar answers should be placed in a general category and
assigned the same code much as the codes are assigned in the
qualitative sample.
For example, a consumer survey about frozen food also
asked why a new microwaveable product would not be
purchased:
• We don’t buy frozen food very often.
• I like to prepare fresh food.
• Frozen foods are not as tasty as fresh foods.
• I don’t like that freezer taste.
How can these responses be coded?
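One possible answer is sketched below. The three categories, their numeric codes, and the keyword rules are assumptions made for illustration; in practice the categories are built by reading the responses, and a human coder would normally make the assignment.

```python
# Hypothetical coding scheme: general categories with numeric codes
CODES = {
    1: "Rarely buys frozen food",
    2: "Prefers fresh food",
    3: "Dislikes taste of frozen food",
    99: "Other / uncodable",
}

# Keyword rules, checked in order; taste is checked first so that a
# response mentioning both taste and fresh food is coded as a taste issue.
RULES = [("tast", 3), ("fresh", 2), ("often", 1)]

def code_response(text):
    """Assign one numeric code to a written response."""
    lowered = text.lower()
    for keyword, code in RULES:
        if keyword in lowered:
            return code
    return 99

responses = [
    "We don't buy frozen food very often.",
    "I like to prepare fresh food.",
    "Frozen foods are not as tasty as fresh foods.",
    "I don't like that freezer taste.",
]
coded = [code_response(r) for r in responses]
for r, c in zip(responses, coded):
    print(c, CODES[c], "<-", r)
```

Under this scheme the last two responses fall into the same general category and receive the same code.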
2.2 Code Book
A book that identifies each variable in a study and gives the
variable’s description, code name, and position in the data
matrix.
In essence, the code book provides a quick summary that is
particularly useful when a data file becomes very large.
It is used by the research staff to promote more accurate
and more efficient data entry and data analysis.
It is also the definitive source for locating the positions of
variables in the data file during analysis.
Most code books contain the question number, variable
name, location of the variable's code on the input medium,
and descriptors for the response options, as shown in
Figure 2.
Question   Variable   Code Description               Variable Name
           Number
1          1          Gender                         Gender
                        1 = male
                        0 = female
2          2          Marital status                 Marital
                        1 = married
                        2 = widow(er)
                        3 = divorced
                        4 = separated
                        5 = never married
                        99 = missing
3          3          Traveled in past 3 months      Travel
                        1 = yes
                        2 = no
4          4          Purpose of last trip           PurposeT
                        1 = business
                        2 = vacation
                        3 = personal
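As a rough sketch, the code book above could be held as a lookup structure and used to translate numeric codes back into their descriptors. The dictionary layout, the decode helper, and the sample record are assumptions for illustration; only the variable names and codes come from the figure.

```python
# Code book from the figure, stored as variable name -> description and codes
CODEBOOK = {
    "Gender": {"question": 1, "description": "Gender",
               "codes": {1: "male", 0: "female"}},
    "Marital": {"question": 2, "description": "Marital status",
                "codes": {1: "married", 2: "widow(er)", 3: "divorced",
                          4: "separated", 5: "never married", 99: "missing"}},
    "Travel": {"question": 3, "description": "Traveled in past 3 months",
               "codes": {1: "yes", 2: "no"}},
    "PurposeT": {"question": 4, "description": "Purpose of last trip",
                 "codes": {1: "business", 2: "vacation", 3: "personal"}},
}

def decode(record):
    """Translate one coded record into readable labels using the code book."""
    return {
        var: CODEBOOK[var]["codes"].get(value, "undefined code")
        for var, value in record.items()
    }

# Hypothetical coded record for one respondent
record = {"Gender": 0, "Marital": 5, "Travel": 1, "PurposeT": 2}
decoded = decode(record)
print(decoded)
```

A value that is not listed in the code book is flagged as "undefined code", which is one simple way to catch data entry errors.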
Shape/Distribution
Symmetric Distribution
The right hand side of the distribution is a mirror image of
the left hand side.
Positively skewed
Scores pile up on the left and the distribution tapers off on the right.
Negatively skewed
Scores pile up on the right and the distribution tapers off on the left.
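The three shapes can be told apart numerically with a skewness coefficient. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second); this particular formula and the data sets are assumptions for illustration, and statistical packages may report a slightly different, bias-adjusted version.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness: > 0 positive skew, < 0 negative skew."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical score sets
symmetric = [1, 2, 3, 4, 5]
positive = [1, 1, 2, 2, 3, 10]        # piles up on the left, long right tail
negative = [-x for x in positive]      # mirror image: long left tail

for name, scores in [("symmetric", symmetric),
                     ("positive", positive),
                     ("negative", negative)]:
    print(name, round(skewness(scores), 3))
```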
Nominal and ordinal data are often described using
frequency tables, percentages, and graphs or charts
(e.g. bar charts and pie charts).
Graphs and Charts for Displaying Descriptive Statistics
Box Plot
• The box plot is useful for detecting skewness of distributions by noticing
where the median is located and disparities between the lengths of the two
whiskers.
• In a symmetrical distribution, the median is centred and the whiskers
are of equal length.
1. Click on Graphs >> Legacy Dialogs …
2. On the drop-down menu, select Boxplot.
3. The Boxplot dialogue box offers a choice between simple and
clustered boxplots, as well as between summaries for groups of cases and
summaries of separate variables.
4. For a simple boxplot, select Simple and Summaries of separate variables.
5. Transfer the required variables to the "Boxes Represent" box.
6. Click OK; the output presents the boxplot.
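Outside SPSS, the quantities behind a box plot can be sketched with a five-number summary. The helper name and the data are hypothetical, and the quartile rule used by Python's statistics.quantiles may differ slightly from the hinges SPSS draws, so the values are illustrative only.

```python
from statistics import quantiles

def five_number_summary(data):
    """Minimum, first quartile, median, third quartile, and maximum."""
    q1, median, q3 = quantiles(sorted(data), n=4)
    return {"min": min(data), "q1": q1, "median": median,
            "q3": q3, "max": max(data)}

def median_position(summary):
    """Compare the two halves of the box to hint at skewness."""
    lower = summary["median"] - summary["q1"]
    upper = summary["q3"] - summary["median"]
    if upper > lower:
        return "median nearer Q1: suggests positive skew"
    if upper < lower:
        return "median nearer Q3: suggests negative skew"
    return "median centred: consistent with symmetry"

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 2, 3, 3, 4, 10, 15, 30]

for data in (symmetric, skewed):
    s = five_number_summary(data)
    print(s, "->", median_position(s))
```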
Parametric and Non-parametric Test
Parametric tests are based on assumptions about population
distributions and parameters.
Their test statistics depend on the calculation of measures of central tendency.
2. They tend to use less information than the parametric tests. For example,
the sign test requires the researcher to determine only whether the data
values are above or below the median, not how much above or below the
median each value is.
3. They are less efficient than their parametric counterparts when the
assumptions of the parametric methods are met. That is, larger sample
sizes are needed to overcome the loss of information.
For example, the nonparametric sign test is about 60% as efficient as its
parametric counterpart, the z test. Thus, a sample size of 100 is needed for use
of the sign test, compared with a sample size of 60 for use of the z test to
obtain the same results.
Data Requirements
To use the Paired Samples t Test, your data must meet the following
requirements:
1. The dependent variable should be measured on a continuous scale
(i.e., it is measured at the interval or ratio level)
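For a continuous dependent variable measured twice on the same subjects, the paired samples t statistic can be sketched as the mean difference divided by its standard error. The function name and the before/after scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses n - 1
    return mean(diffs) / standard_error, n - 1

# Hypothetical scores for five subjects measured at two time points
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t, df = paired_t(before, after)
print("t =", round(t, 3), "df =", df)
```

The resulting t is compared with the critical t value for n – 1 degrees of freedom.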
Data Requirements
The data for Independent Samples t Test must meet the following
requirements:
The dependent variable should be measured at the interval or ratio level (i.e.,
continuous dependent variable)
However, the former is commonly used to compare the means across three or more
groups, whereas the latter (Independent Samples t Test) is typically used to
compare two unrelated groups.
The variances must be homogeneous across groups. This condition can be verified
using Levene's test for homogeneity of variances. If this condition is not
satisfied, Welch's ANOVA, which does not assume equal variances, can be
used. When variances are unequal, post hoc tests that do not assume equal
variances should be used (for example, Dunnett's C).
The ANOVA doesn’t test that one mean is less than another, only whether they’re
all equal or at least one is different.
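A minimal sketch of the one-way ANOVA F statistic, which is the ratio of the between-group mean square to the within-group mean square. The function name and the three groups of scores are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic with its between-group and within-group degrees of freedom."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three unrelated groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_b, df_w = one_way_anova_f(groups)
print("F =", round(f, 3), "df =", (df_b, df_w))
```

A large F only indicates that at least one mean differs; it does not say which one or in which direction.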
This test utilizes a contingency table to analyze the data. The categories for one
variable appear in the rows, and the categories for the other variable appear in
columns.
Each variable must have two or more categories. Each cell reflects the total count
of cases for a specific pair of categories.
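The contingency-table calculation can be sketched as follows: the expected count for each cell is (row total × column total) / grand total, and the chi-square statistic sums (observed – expected)² / expected over all cells. The function name and the 2 × 2 counts are hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are categories of one variable,
# columns are categories of the other
table = [[10, 20],
         [20, 10]]
chi2, df = chi_square_independence(table)
print("chi-square =", round(chi2, 3), "df =", df)
```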
Data Requirements
The data for Chi-Square Test of Independence must meet the following requirements:
The two variables should consist of two or more categorical independent groups.
For example,
Relatively large sample size. Expected frequencies should be at least 5 for the
majority (80%) of the cells. There should not be a situation where there is no
observation in a particular cell.
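The sample size requirement can be checked before running the test. The sketch below computes the expected frequencies and applies the rule as stated above; the function names and the two tables are hypothetical.

```python
def expected_counts(table):
    """Expected frequency for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def meets_sample_size_rule(table, minimum=5, share=0.80):
    """True if at least 80% of expected counts are >= 5 and no cell is empty."""
    expected = [e for row in expected_counts(table) for e in row]
    observed = [o for row in table for o in row]
    large_enough = sum(e >= minimum for e in expected) / len(expected)
    return large_enough >= share and all(o > 0 for o in observed)

# Hypothetical tables of observed counts
adequate = [[10, 20], [20, 10]]
sparse = [[1, 2], [3, 40]]

print(meets_sample_size_rule(adequate))
print(meets_sample_size_rule(sparse))
```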
OR
If the computed value > critical value, then REJECT the null hypothesis.
If the computed value < critical value, then DO NOT REJECT the null hypothesis.
Thus, to investigate any change in scores from one time point to another, or when
individuals are subjected to more than one condition.
Since this test does not assume normality in the data, it can be used when this
assumption has been violated and the use of the paired samples t-test is rendered
inappropriate.
Common Uses
The Wilcoxon signed rank test is commonly used to test the following:
o Statistical difference between two time points
o Statistical difference between two conditions
o Statistical difference between two measurements
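A minimal sketch of the Wilcoxon signed rank calculation for two related measurements: the differences are ranked by absolute size, and the ranks of the positive and negative differences are summed separately. The function names and the scores are hypothetical; zero differences are dropped and tied values share the average rank.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(first, second):
    """Return (sum of positive ranks, sum of negative ranks)."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical scores for six subjects at two time points
time1 = [10, 12, 9, 11, 13, 8]
time2 = [12, 15, 10, 10, 18, 12]

w_plus, w_minus = wilcoxon_signed_rank(time1, time2)
print("W+ =", w_plus, "W- =", w_minus, "test statistic =", min(w_plus, w_minus))
```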
For instance, the Mann-Whitney U test can be used to determine whether salaries,
measured on a continuous scale, differed based on educational level (i.e., your
dependent variable would be "salary" and your independent variable would be
"educational level", which has two groups: "high school" and "university").
Common Uses
The Mann-Whitney U test is commonly used to test the following:
o The observations for the two groups must be independent. This implies that there
should be no relationship between the observations in each group or between the
groups themselves
o NB: The same shape does not necessarily imply a normal distribution
Fig. 1. Profit per 100 kg bag of maize
Fig. 2. Profit per 100 kg bag of groundnut
To determine the appropriate critical U value, we need the sample sizes for the two
groups (n1 and n2) and the two-sided level of significance (e.g., α = 0.05).
The null hypothesis is rejected when the computed U is equal to or smaller than the
critical U. It is important to note that this is different from many statistical tests,
where the obtained value has to be equal to or larger than the critical value.
The Mann-Whitney critical U table is used only when the sample size is small (i.e.,
20 or less). For large sample sizes, U is approximately normally distributed
and therefore the Z table is used instead.
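The computation of U, and of the z value used with large samples, can be sketched as follows. The function names and the two groups are hypothetical; the z formula is the standard normal approximation without a tie correction, and it is applied here to a small sample only to show the arithmetic.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """U statistic: the smaller of U1 and U2."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])               # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

def z_approximation(u, n1, n2):
    """Normal approximation to U for large samples (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical observations for two independent groups
group1 = [3, 5, 8, 10]
group2 = [4, 6, 7, 9, 12]

u = mann_whitney_u(group1, group2)
z = z_approximation(u, len(group1), len(group2))
print("U =", u, "z =", round(z, 3))
```

With samples this small, U would be compared with the critical U table rather than the Z table.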
This test does not assume normality in the data and is much less sensitive to outliers.
The basic intuition behind the test is analogous to that for the parametric one-way
ANOVA:
o A real difference among treatments should cause the variability of scores between
groups to be greater than the variability of scores within groups
o If all the scores are ranked, the variability of rank-sums between groups should be
greater than the variability of rank-sums within groups
To be able to interpret the results from a Kruskal-Wallis H test, there is the need to
determine whether the distributions in each group have the same shape or
different shapes (refer to the reasons assigned in the Mann-Whitney U test
section).
To perform the Kruskal-Wallis test, each of the sample sizes must be at least 5.
H0: There is no difference between groups. There is no tendency for ranks in any
sample to be systematically higher or lower than in any other condition.
H1: There are differences between groups. The ranks in at least one group or sample
are systematically higher or lower than in another group.
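A minimal sketch of the H statistic: all scores are ranked together, and H = 12 / (N(N + 1)) × Σ(Rj² / nj) – 3(N + 1), where Rj is the rank sum and nj the size of group j. The function names and data are hypothetical, and no correction for ties is applied.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical scores for three groups, each with the minimum of 5 observations
groups = [[1, 4, 7, 10, 13],
          [2, 5, 8, 11, 14],
          [3, 6, 9, 12, 15]]

h = kruskal_wallis_h(groups)
print("H =", round(h, 3), "df =", len(groups) - 1)
```

H is compared with the chi-square critical value for (number of groups – 1) degrees of freedom.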
One way of finding which groups differ significantly is to make all
possible pairwise comparisons (i.e., test whether each pair of mean ranks is equal).
This can be done through pairwise Mann-Whitney tests with Bonferroni correction or
by Dunn's test.
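The Bonferroni correction can be sketched as dividing the significance level by the number of pairwise comparisons, which is k(k – 1)/2 for k groups. The group labels and the 0.05 level are assumptions for illustration.

```python
from itertools import combinations

def bonferroni_plan(group_names, alpha=0.05):
    """List every pair of groups and the adjusted per-comparison alpha."""
    pairs = list(combinations(group_names, 2))
    adjusted_alpha = alpha / len(pairs)
    return pairs, adjusted_alpha

# Hypothetical group labels
pairs, adjusted_alpha = bonferroni_plan(["A", "B", "C"])
for first, second in pairs:
    print(f"compare {first} vs {second} at alpha = {adjusted_alpha:.4f}")
```

Each pairwise Mann-Whitney test would then be judged against the adjusted level rather than the original one.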