
Kwame Nkrumah University of Science & Technology, Kumasi, Ghana

ABM 355: RESEARCH METHODS IN AGRIBUSINESS

DR. SETH ETUAH
setuah@knust.edu.gh // 0544197198; 0501580600


COURSE CONTENT
 General introduction to research
 Thinking like a researcher
 The research process: an overview
 Research proposal development
 Research design
 Sampling methods
 Data collection methods
 Measurement and scaling
 Data analysis
 Interpretation of results and report writing
www.knust.edu.gh
READING LIST
 Aaker, D. A., Kumar, V. and Day, G. S. (1998). Marketing Research. 6th edition. John Wiley, New York.

 Zikmund, W. G., Babin, B. J., Carr, J. C. and Griffin, M. (2013). Business Research Methods. International edition. South-Western, Cengage Learning, Australia/New Zealand.

 Cooper, D. R. and Schindler, P. S. (2008). Business Research Methods. Boston: Irwin McGraw-Hill.

 Bryman, A. and Bell, E. (2007). Business Research Methods. USA: Oxford University Press.
TERMINOLOGIES
 Population: The set of all the individuals of interest in a particular study; the entire group one wishes to study. Populations vary in size and may be quite large.

 Sample: A set of individuals selected from a population, usually intended to represent the population.

Why the need to sample?
 It is normally impossible to examine every single individual in a population. Hence the need to select a smaller, manageable group to represent the entire population.
 Example: A sample of producers/consumers/retailers/wholesalers etc.

 Parameter: A numerical value that describes a population. It may be obtained from a single measurement or a set of measurements from the population. E.g. mean age in a census.
 Statistic: A numerical value that describes a sample. It may be obtained from a single measurement or derived from a set of measurements from a sample. E.g. sample mean, sample standard deviation.


Sampling Error: The discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter. It is the naturally occurring difference between a statistic and a parameter.
 It is often referred to as the “margin of error”: when a sample statistic is used to represent a population parameter, there will always be a margin of error.
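These three terms can be illustrated with a short simulation, a minimal sketch in Python using only the standard library (the ages and sample size are hypothetical):

```python
import random

random.seed(1)  # reproducible illustration

# Population: ages of 10,000 hypothetical individuals
population = [random.gauss(35, 10) for _ in range(10_000)]

# Parameter: a numerical value describing the population
mu = sum(population) / len(population)

# Sample: 100 individuals selected to represent the population
sample = random.sample(population, 100)

# Statistic: a numerical value describing the sample
x_bar = sum(sample) / len(sample)

# Sampling error: the discrepancy between the statistic and the parameter
sampling_error = x_bar - mu
print(f"parameter = {mu:.2f}, statistic = {x_bar:.2f}, "
      f"sampling error = {sampling_error:.2f}")
```

Repeating the sampling step would give a slightly different statistic each time; that variation is exactly the margin of error described above.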

 Data: Measurements or observations; the raw, unorganized facts that need to be processed.
 Data set: A collection of measurements or observations.
 Datum: A single measurement or observation, normally called a score or raw score.

TYPES OF DATA
1. Primary Data
 Raw data collected for the first time.
 Normally collected through survey interviews, focus group discussions, etc.
 Often collected from the field.
 Original in nature, focusing on the specific needs of the researcher.
 Often detailed and used only by the investigator.

2. Secondary Data
 Data that have already been processed.
 Often collected from previous records or official records, such as accredited institutional sources.

 Secondary data may not serve all purposes as desired.
 They are not elaborate but concise.
 Unlike primary data, which are used only by the researcher, secondary data may be used by anyone.
 All secondary data were initially primary, but not all primary data become secondary data.


Before data collection, one should be mindful of the following:

 The aim of the investigation
 Knowledge about the sources of collection
 Method of collection
 Choice of units
 Proper sampling
 Degree of accuracy


STATISTICAL METHODS
 Descriptive statistics: The statistical procedures used to summarize, organize and simplify data. Examples: averages; graphs (line graph, histogram, etc.).

 Inferential statistics: Techniques that allow us to study samples and make generalizations about the populations from which they were selected.

Variable: A characteristic or condition that can change or take different values for different individuals. Examples: temperature, height, weight, yield, etc.

Constant: A characteristic or condition that does not vary or change but is the same for every individual.

Discrete variable: A variable that is not divisible into an infinite number of fractional parts.
It consists of separate, indivisible categories.
It consists of whole numbers that vary in countable steps.
Examples: number of children in a family, household size, etc.

Continuous variable: A variable that is divisible into an infinite number of fractional parts. Examples: weight of individuals, heights of individuals.


Independent variable: A variable that is manipulated by the researcher.
 The independent variable is often seen influencing, directly or indirectly, the dependent variable.

Dependent variable: A variable observed for changes that may occur as a result of a manipulation.
 The dependent variable is the property you are trying to explain; it is always the object of the research.

• Example: Y = aX + C
• Y is the dependent variable
• X is the independent (explanatory) variable
• a is the coefficient (a constant attached to an independent variable)
• C is the constant

SCALES OF MEASUREMENT
Nominal, Ordinal, Interval, and Ratio scales

1. Nominal scale: Consists of a set of categories that have different names but are not differentiated in terms of magnitude or direction.
Examples: gender on a questionnaire (a nominal scale of two categories); occupation (sales professional, skilled trade, others (specify)).

2. Ordinal scale: Consists of a set of categories that are organized in an ordered sequence. Measurement on an ordinal scale ranks observations in terms of size or magnitude.
Examples:
 Series of ranks: 1st, 2nd, 3rd and so on.
 In perception studies, ordinal scales such as the Likert scale.
 University degree class.

3. Interval scale: Consists of ordered categories that are all intervals of exactly the same size.
 Equal differences between numbers on the scale reflect equal differences in magnitude.
 The zero point on an interval scale is arbitrary.
 The ratio between two numbers is not meaningful.
 Examples: temperature, dates, deviation from the mean, etc.

4. Ratio scale: An interval scale with the additional feature of an absolute zero point.
 With a ratio scale, ratios of numbers do reflect ratios of magnitude.
 E.g. weight, height.

NB: Quantitative/qualitative data


DATA ANALYSIS
 After the fieldwork has been completed, the data must be
converted into a format that will answer the research
questions.
 The application of reasoning to understand the data that have been collected is known as data analysis.
 The raw data are often not in a form that can be directly
used for analysis.
 There is, therefore, the need to process or prepare the data
prior to the analysis.

 Data preparation includes editing, coding, and data entry.
 These activities ensure the accuracy of the data and their
conversion from raw form to reduced and classified forms
that are more appropriate for analysis.

Figure 1.1: Overview of the Stages of Data Analysis
(Raw Data → Editing → Coding → Data File → choice of analysis approach: Descriptive, Univariate, Bivariate or Multivariate Analysis. Error checking takes place at each of these stages.)
STAGES OF DATA ANALYSIS
1. EDITING
 Editing is the process of checking the completeness,
consistency, and legibility of data and making the data
ready for coding and transfer to storage.
 Editing detects errors and omissions on questionnaires or
other data collection forms, corrects them when possible,
and certifies that maximum data quality standards are
achieved.
 When a problem or an error is detected in the process, the
editor has to adjust the data to make them more complete,
consistent, or readable.
 In some instances, the editor may need to reconstruct data
especially when the probable true response is very obvious.

 Specifically, editing guarantees that data are:


o Accurate.

o Consistent with the intent of the question and other information in the
survey.

o Uniformly entered.

o Complete.

o Arranged to simplify coding and tabulation

Field Editing
 A daily field edit enables enumerators to identify respondents who should be recontacted to fill in omissions in a timely fashion.
 The supervisor may also use field edits to spot the need for further interviewer training or to correct faulty procedures.

Central (In-house) Editing
 Central editing takes place when all forms or schedules have been completed and returned to the office.
 It involves rigorous editing by a single editor in a small study, or by a team of editors in the case of a large inquiry, performed at a centralized office.
 The editor must analyze the instruments used by each
interviewer to detect falsification of data and obvious errors
such as an entry of data in the wrong place, specifying time in
days when it was requested in weeks etc.

 When answers provided are out of range of expected values or


not related to the question asked or missing, the editor can
sometimes determine the proper answer by reviewing the other
information in the data set.

 If the correct answer cannot be easily determined based on the


other information available on the respondent, the editor can
strike out the answer (and replace it with “no answer” or
“unknown”) if it is deemed inappropriate.

 In some instances, the respondents can be contacted for clarification.
Useful Rules to Guide Editors
 Be familiar with instructions given to interviewers and
coders.
 Do not erase or make illegible the original entry by the interviewer or respondent; original entries should remain legible.
 Make all editing entries on an instrument or in a data set in
some distinctive colour and in a standardized form.
 Initial all answers changed or supplied.
 Place initials and date of editing on each instrument
completed or in a separate field within a data set.
2. Coding
 The process of assigning a numerical score or other
character symbol to previously edited data (or answers) so
that the responses can be grouped into a limited number of
categories.

 Assigning numerical symbols permits the transfer of data


from questionnaires or interview forms to a computer.

 Codes can be broadly defined as rules for interpreting,


classifying, and recording data in the coding process; also,
the actual numerical or other character symbols assigned to
raw data.
 In coding, categories are the partitions of a data set for a given variable.
 For instance, if the variable is gender, the partitions are male and female.
 Both closed- and open-response questions must be coded prior to data analysis.

Basic Rules for Code Construction


 There are four basic rules for code construction:
i. The coding categories should be exhaustive, meaning that a coding category should exist for all possible responses.

ii. The coding categories should be mutually exclusive and independent. This implies that there should be no overlap among the categories, to ensure that a subject or response can be placed in only one category.

iii. The categories should be appropriate to the research problem and purpose.

iv. The categories within a single variable should be derived from one classification dimension. This means every option in the category set is defined in terms of one concept or construct.

Coding Closed Questions

 Closed questions are favoured by researchers over open-


ended questions for their efficiency and specificity.
 Such questions are easier to code, record, and analyze
 Qualitative responses to structured questions such as “yes”
or “no” can be represented with numbers, one each to
represent the respective category
 So, the number 1 can be used to represent “yes” and 0 can
be used to represent “no.”

 Multiple dummy variables are needed to represent a single
qualitative response that can take on more than two
categories.
 As a rule, if k is the number of categories for a qualitative
variable, k–1 dummy variables are needed to represent the
variable.
 The data collection instrument can also be precoded during
the design stage.
NB: The major objective in the code-building process is to
accurately transfer the meanings from written responses to
numeric codes.
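The k − 1 rule can be sketched with pandas (a hypothetical occupation variable with k = 3 categories; the column prefix occ is illustrative):

```python
import pandas as pd

# Hypothetical qualitative variable with k = 3 categories
occupation = pd.Series(["sales", "trade", "other", "sales", "trade", "other"])

# drop_first=True drops one reference category, leaving k - 1 = 2 dummies
dummies = pd.get_dummies(occupation, prefix="occ", drop_first=True)
print(dummies.columns.tolist())  # ['occ_sales', 'occ_trade']
```

The dropped category ("other" here) becomes the reference level: a row with zeros in both dummy columns is understood to belong to it.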
Coding Open-Ended Questions
 One of the primary reasons for using open-ended questions
is that insufficient information or lack of a hypothesis may
prohibit preparing response categories in advance.
 These questions may be exploratory or they may be potential
follow-ups to structured questions.
 The purpose of coding such questions is to reduce the large
number of individual responses to a few general categories
of answers that can be assigned numerical codes.
 Similar answers should be placed in the same general category and assigned the same code.
 For example, a consumer survey about frozen food also
asked why a new microwaveable product would not be
purchased:
• We don’t buy frozen food very often.
• I like to prepare fresh food.
• Frozen foods are not as tasty as fresh foods.
• I don’t like that freezer taste.
How can these responses be coded?

2.2 Code Book
 A book that identifies each variable in a study and gives the
variable’s description, code name, and position in the data
matrix.
 In essence, the code book provides a quick summary that is
particularly useful when a data file becomes very large
 It is used by the research staff to promote more accurate and more efficient data entry and data analysis.
 It is also the definitive source for locating the positions of variables in the data file during analysis.
 Most code books contain the question number, variable name, location of the variable's code on the input medium, and descriptors for the response options, as shown in Figure 2.
Question   Variable   Code Description          Variable Name
Number
1          1          Gender                    Gender
                      1 = male
                      0 = female
2          2          Marital status            Marital
                      1 = married
                      2 = widow(er)
                      3 = divorced
                      4 = separated
                      5 = never married
                      99 = missing
3          3          Traveled in past 3        Travel
                      months
                      1 = yes
                      2 = no
4          4          Purpose of last trip      PurposeT
                      1 = business
                      2 = vacation
                      3 = personal

Figure 2: Sample Codebook of Questionnaire Items


Data Entry
 The process of transferring (coded) data from a research project,
such as answers to a survey questionnaire, to computers for
viewing and manipulation.
 Keyboarding remains a mainstay for researchers who need to
create a data file immediately and store it in a minimal space on
a variety of media.
 However, an optical scanning system may be used to read
material directly into the computer’s memory.
 Voice recognition systems are alternatives for the telephone
interviewer.
 Data verification or cleaning is necessary to ensure that all codes are legitimate.
Descriptive Analysis
 The elementary transformation of raw data in a way that describes basic characteristics such as central tendency, variability and the shape of the distribution.
 Central Tendency
A statistical measure to determine a single score that defines
the centre of the distribution. The goal is to find the single
score that is most representative of the entire group.
Mean
Median
Mode
Spread (Variability)
A measure of how the individual observations deviate from an average such as the mean, median or mode.
Variance
Standard Deviation
Range: R = Xmax − Xmin
Interquartile Range: IQR = Q3 − Q1, where Q1 is the value at position (1/4)N and Q3 the value at position (3/4)N (N = total frequency)
• For example. Find the interquartile range given the data 3, 4,
5, 7, 9, 10, 11, 13
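As a minimal sketch, the measures above can be computed with NumPy for the example data. Note that NumPy's default quartile method uses linear interpolation, so its IQR can differ slightly from the simple position rule (Q1 at ¼N, Q3 at ¾N) given above:

```python
import numpy as np

data = [3, 4, 5, 7, 9, 10, 11, 13]

# Central tendency
mean = np.mean(data)      # 7.75
median = np.median(data)  # 8.0

# Spread
data_range = max(data) - min(data)  # 13 - 3 = 10

# Quartiles via NumPy's default linear interpolation;
# the simple position rule on the slide gives slightly different values
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(mean, median, data_range, iqr)
```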

Shape/Distribution
Symmetric Distribution
 The right hand side of the distribution is a mirror image of
the left hand side.

Positively skewed
 scores pile up on left
 tapers off on the right

Negatively skewed
 scores pile up on the right
 tapers off on the left

 Nominal and ordinal data are often described using frequency tables, percentages, graphs (e.g. bar charts) and charts (e.g. pie charts).

To obtain descriptive statistics (using SPSS)


1. Click on Analyze >> Descriptive Statistics >> Explore to
obtain the Explore dialogue box.
2. Transfer to the Dependent List box by clicking and
highlighting those variables for which you wish to obtain
descriptive statistics.
3. In the Display box click on Statistics which will bring
up the Explore: Statistics dialogue box.
4. Ensure Descriptives is chosen. Select Continue >> OK to produce the output.
Tabulation
 The orderly arrangement of data in a table or other summary
format showing the number of responses to each response
category; tallying.
 Frequency table: A table showing the different ways
respondents answered a question.

 Cross-tabulation/ Contingency table : A data matrix that


displays the frequency of some combination of possible
responses to multiple variables; cross-tabulation results

• Cross-tabs allow the inspection and comparison of differences among groups based on nominal or ordinal categories.
To obtain a cross-tabulation
• Click on Analyze >> Descriptive Statistics >>Crosstabs …
• This brings up the cross-tabulation dialogue box.
• One variable is transferred to the Row(s) box and the other to the Column(s) box.
• Click on OK to obtain the output.
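An equivalent cross-tabulation can be produced in pandas with crosstab (the gender and travel responses below are hypothetical):

```python
import pandas as pd

# Hypothetical survey responses for two categorical variables
df = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "male", "female"],
    "travel": ["yes", "yes", "no", "yes", "no", "no"],
})

# Contingency table: frequency of each gender/travel combination
ct = pd.crosstab(df["gender"], df["travel"])
print(ct)
```

Each cell holds the count of respondents falling in that row/column combination, which is exactly the layout the chi-square test of independence later operates on.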

Graphs and Charts for Displaying Descriptive Statistics

 Frequency polygon: The most common ways of representing


frequency distributions graphically are by numerous variations
of the frequency polygon, the simplest of which is the line
graph

1. Click Graphs >> Line.
2. In the Line Charts dialogue box select Simple.
3. Choose Summaries for groups of cases.
4. Click Define to produce the Define Simple Line: Summaries for Groups of Cases dialogue box.
5. Select the variable you wish to plot and then click the arrow button to place it into the Category Axis box.
6. Select OK.
Bar charts
A common method of presenting categorical data is the bar chart where
the height or length of each bar is proportional to the size of the
corresponding number.
1. Click on Graphs >>Bar … on the drop down menu.
2. The Bar Chart dialogue box provides for choice among a
number of different bar chart forms.
3. The Define Simple Bar dialogue box emerges with a variety of options for the display. We have chosen N of cases, but there are other options for you to explore.
4. Transfer the required variable
5. Click OK and the output presents the Bar Chart

Box Plot
• The box plot is useful for detecting skewness of distributions by noticing
where the median is located and disparities between the lengths of the two
whiskers.
• In a symmetrical distribution, the median is centred and the whiskers
are of equal length.
1. Click on Graphs >> Legacy Dialogs.
2. On the drop-down menu select Boxplot.
3. The Boxplot dialogue box provides a choice between simple and clustered boxplots, as well as between summaries for groups of cases and summaries of separate variables.
4. For a simple boxplot, select Simple and Summaries of separate variables.
5. Transfer the required variables to the Boxes Represent box.
6. Click OK and the output presents the boxplot.

Parametric and Non-parametric Test
 Parametric tests are based on assumptions about population
distributions and parameters.
 Test statistics depend on calculation of measurements of central tendencies.

The assumptions for parametric test


1. Equal interval or ratio level data
2. Normal distribution or closely so.
3. Homogeneity of variance.
4. Samples randomly drawn from the population
 If any of assumptions 1, 2 and 4 are not met, then non-parametric tests should be used.
Examples of Parametric tests

• Paired sample t-test


• Independent sample t-test
• Analysis of variance etc.

Examples of Non-parametric Tests
• Non-parametric tests make no assumptions about population parameters or distributions.
• For example: chi-square test, Mann-Whitney U test, Kruskal-Wallis H test, Jonckheere-Terpstra test, etc.
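As an illustration, a Mann-Whitney U test on hypothetical scores from two independent groups, assuming SciPy is available:

```python
from scipy.stats import mannwhitneyu

# Hypothetical ordinal-level scores from two independent groups
group_a = [1, 2, 3, 4, 5]
group_b = [10, 11, 12, 13, 14]

# Two-sided test: are the two distributions different?
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because the test uses only the ranks of the observations, it makes no assumption that the scores are normally distributed.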

Advantages and Disadvantages of Non-
parametric Methods
Advantages
There are four advantages that nonparametric methods have over parametric methods:
1. They can be used to test population parameters when the variable is not normally
distributed.
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population parameters.
4. In some cases, the computations are easier than those for the parametric
counterparts.


Disadvantages
There are three disadvantages of nonparametric methods:
1. They are less sensitive than their parametric counterparts when the
assumptions of the parametric methods are met. Therefore, larger
differences are needed before the null hypothesis can be rejected.

2. They tend to use less information than the parametric tests. For example,
the sign test requires the researcher to determine only whether the data
values are above or below the median, not how much above or below the
median each value is.

3. They are less efficient than their parametric counterparts when the
assumptions of the parametric methods are met. That is, larger sample
sizes are needed to overcome the loss of information.
For example, the nonparametric sign test is about 60% as efficient as its parametric counterpart, the z test. Thus, a sample size of 100 is needed for the sign test, compared with a sample size of 60 for the z test, to obtain the same results.


Paired Sample t-test
The Paired Samples t Test (also called the dependent t test or repeated measures t test) compares two means from the same individuals or objects, or from two related groups, on the same continuous, dependent variable.

The two means typically represent two different times (for


instance, pre-test and post-test with an intervention between the
two time points) or two different but related conditions or units.

The purpose of the test is to determine whether there is statistical


evidence that the mean difference between paired observations on
a particular outcome or continuous dependent variable is
significantly different from zero.

Common Uses
The Paired Samples t Test is commonly used to test the following:
o Statistical difference between two time points
o Statistical difference between two conditions
o Statistical difference between two measurements
o Statistical difference between a matched pair

Data Requirements
To use the Paired Samples t Test, your data must meet the following
requirements:
1. The dependent variable should be measured on a continuous scale
(i.e., it is measured at the interval or ratio level)


2) The independent variable should consist of two
categorical, "related groups" or "matched pairs".
"Related groups" indicates that the same subjects are
present in both groups.
3) The distribution of the differences in the dependent
variable between the two related groups should be
approximately normally distributed.
4) There should be no significant outliers in the
differences between the two related groups. Outliers are
simply single data points within your data that do not
follow the usual pattern.

Hypotheses Associated with the Paired Samples t Test
 The hypotheses can be expressed in two different ways that express
the same idea and are mathematically equivalent:
 H0: µ1 - µ2 = 0 ("the difference between the means of the paired or
related groups is equal to 0")
 H1: µ1 - µ2 ≠ 0 ("the difference between the means of the paired or
related groups is not 0")
OR
 H0: µ1 = µ2 ("the means of the paired or related groups are equal")
 H1: µ1 ≠ µ2 ("the means of the paired or related groups are not
equal")
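A minimal sketch of the test with SciPy's ttest_rel (the before/after scores are hypothetical):

```python
from scipy.stats import ttest_rel

# Hypothetical scores for the same 8 subjects before and after an intervention
before = [72, 68, 75, 71, 69, 74, 70, 73]
after  = [70, 65, 72, 70, 66, 71, 69, 70]

t_stat, p_value = ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# If p < 0.05, reject H0: the mean difference between the paired
# observations is significantly different from zero
```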


Independent Sample t-test
 The Independent Samples t Test (also called the two-sample t test, uncorrelated scores t test, or unpaired t test) compares the means of two unrelated groups on the same continuous, dependent variable to determine whether there is statistical evidence that the associated population means are significantly different.

 Unrelated groups (unpaired groups or independent groups) are groups


in which the cases (e.g., participants) in each group are different.
 An individual in one group cannot also be a member of the other
group and vice versa.
 For instance, an individual would have to be classified as either male
or female – not both.


Common Uses
The Independent Samples t Test is commonly used to test the following:

o Statistical differences between the means of two groups

o Statistical differences between the means of two interventions

o Statistical differences between the means of two change scores

Data Requirements
The data for Independent Samples t Test must meet the following
requirements:

1. The dependent variable should be measured on a continuous scale


(i.e., it is measured at the interval or ratio level).

2. The independent variable should consist of two categorical,
independent groups. Examples of independent variables that meet
this criterion include gender (2 groups: male or female),
employment status (2 groups: employed or unemployed),
interventions (2 groups: beneficiaries or non-beneficiaries), and so
forth.
3. There should be independence of observations, which means that
there is no relationship between the observations in each group or
between the groups themselves. That is, there must be different
participants in each group with no participant being in more than
one group.
4. There should be no significant outliers.
5. The dependent variable should be approximately normally distributed for each group of the independent variable.

6. There should be homogeneity of variances. When this assumption
is violated and the sample sizes for each group differ, the p value
becomes unreliable. Specifically, unequal variances could affect
the Type I error rate.

7. Random sample of data from the population

Hypotheses for the Independent t-test


 The null hypothesis for the independent t-test is that the population
means from the two unrelated groups are equal:
H0: µ1 = µ2
The alternative hypothesis is that the population means are not equal:
H1: µ1 ≠ µ2
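A minimal sketch with SciPy's ttest_ind (the two groups of hypothetical scores are unrelated, e.g. beneficiaries vs non-beneficiaries):

```python
from scipy.stats import ttest_ind

# Hypothetical scores from two independent groups
group_1 = [12, 15, 14, 10, 13, 15, 11, 14]
group_2 = [9, 8, 11, 7, 10, 8, 9, 10]

# Assumes homogeneity of variances by default;
# pass equal_var=False for Welch's version when that assumption fails
t_stat, p_value = ttest_ind(group_1, group_2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```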

Analysis of Variance (ANOVA)
 The ANOVA (or the one-way analysis of variance or one-factor ANOVA) is mainly
used to compare the means of more than two independent (unrelated) groups in
order to determine whether there is statistical evidence that the associated
population means are significantly different
 For instance, it can be used to determine whether average profit per 100kg bag of
maize differed among producers, wholesalers and retailers. Since the one-way
ANOVA is an omnibus test statistic, it cannot indicate the specific groups that are
statistically significantly different from each other.
 The source of the differences can be determined through a post hoc test (Latin for
“after this”).
 The one-way analysis of variance is basically an extension of the independent
sample t test.
 When comparing at least three groups based on one factor variable, it is said to be a one-way analysis of variance (ANOVA).


 For example, we may compare whether or not the mean output of three workers is the same, based on their working hours.
 When the factor variables (the categorical or independent variables) are two or more, it is said to be a two-way analysis of variance (ANOVA).
 For example, based on working condition and working hours, we can compare whether or not the mean output of three workers is the same.
 This discussion is, however, restricted to the one-way analysis of variance (ANOVA).
Common Uses
The One-Way ANOVA is commonly used to test the following:
Statistical differences among the means of three or more unrelated groups
Statistical differences among the means of three or more interventions
Statistical differences among the means of three or more change scores


Data Requirements
 The data for One-Way ANOVA must meet the following requirements:

 The dependent variable should be measured at the interval or ratio level (i.e.,
continuous dependent variable)

 The independent variable should consist of three or more categorical, independent


groups. Both the One-Way ANOVA and the Independent Samples t Test can
compare the means for two unrelated groups.

 However, the former is commonly used to compare the means across three or more
groups since the latter (independent Samples t Test) is popular for two unrelated
groups comparison.

 The observations should be independent, which means that there is no relationship


between the observations in each group or between the groups themselves.

 There should be no significant outliers.


 The dependent variable should be approximately normally distributed for each
category of the independent variable. The Kolmogorov-Smirnov or the Shapiro-
Wilk test may be used to confirm normality of the group.

 There is a need for homogeneity of variances. This condition can be verified using Levene's test for homogeneity of variances. If it is not satisfied, Welch's ANOVA, which does not assume that the variances are equal, can be used. When variances are unequal, post hoc tests that do not assume equal variances should be used (for example, Dunnett's C).

 Random sample of data from the population

Hypotheses for the One-Way ANOVA


The null hypothesis is that the means are all equal:
H0: µ1 = µ2 = µ3 = ... = µk

 The alternative hypothesis is that at least one of the means is different

 The ANOVA doesn’t test that one mean is less than another, only whether they’re
all equal or at least one is different.
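A minimal sketch of a one-way ANOVA with SciPy's f_oneway, using hypothetical profit-per-bag figures for the producer/wholesaler/retailer example above:

```python
from scipy.stats import f_oneway

# Hypothetical profit per 100 kg bag of maize for three independent groups
producers   = [20, 22, 19, 21, 20]
wholesalers = [30, 31, 29, 32, 30]
retailers   = [25, 24, 26, 25, 27]

f_stat, p_value = f_oneway(producers, wholesalers, retailers)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant p only says at least one mean differs; a post hoc
# test (e.g. Tukey's HSD) is needed to locate which groups differ
```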


Chi-Square Test of Independence
 The chi-square test of independence (also called Pearson's chi-square test) is used
to determine whether there is an association or relationship between (two)
categorical variables.

 This test utilizes a contingency table to analyze the data. The categories for one
variable appear in the rows, and the categories for the other variable appear in
columns.

 Each variable must have two or more categories. Each cell reflects the total count
of cases for a specific pair of categories.

 The Chi-Square Test of Independence can only compare categorical variables. It
cannot make comparisons between continuous variables or between categorical and
continuous variables.

 The Chi-Square Test of Independence only assesses whether an association exists
between categorical variables; it cannot indicate the direction of influence or
establish causation.

Common Uses
The Chi-Square Test of Independence is commonly used to test the following:

o Statistical independence or association between two categorical variables.

Data Requirements
The data for Chi-Square Test of Independence must meet the following requirements:

 The two variables should be measured at an ordinal or nominal level (i.e.,
categorical data).

 The two variables should consist of two or more categorical independent groups.
For example,

 Gender (2 groups: Males and Females), profession (5 groups: surgeon, doctor,
nurse, dentist, therapist), etc.

 The categorical variables must not be "paired" in any way (e.g. pre-test/post-test
observations).

 Relatively large sample size. Expected frequencies should be at least 5 for the
majority (80%) of the cells. There should not be a situation where there is no
observation in a particular cell.

Hypotheses for Chi-Square Test of Independence


 The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of
Independence can be expressed in two different but equivalent ways:

H0: "[Variable 1] is independent of [Variable 2]"

H1: "[Variable 1] is not independent of [Variable 2]"

OR

H0: "[Variable 1] is not associated with [Variable 2]"

H1: "[Variable 1] is associated with [Variable 2]"


 The computed value is then compared to the critical value from the chi-square
distribution table with degrees of freedom df = (R - 1)(C - 1) at the chosen
significance level.

 If the computed value > critical value, then REJECT the null hypothesis.

 If the computed value < critical value, then DO NOT REJECT the null hypothesis.
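
These steps (computing the statistic, the degrees of freedom df = (R - 1)(C - 1), and the p-value) can be sketched with SciPy's chi2_contingency. The 2 x 3 contingency table below is hypothetical:

```python
# Chi-Square Test of Independence (sketch; hypothetical counts).
from scipy.stats import chi2_contingency

# Rows: two categories of variable 1; columns: three categories of variable 2
observed = [[30, 10, 20],
            [15, 25, 20]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2

if p < 0.05:
    print("Reject H0: the two variables are associated")
else:
    print("Do not reject H0: no evidence of association")
```

The returned expected array can also be inspected to confirm the requirement that most expected frequencies are at least 5.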

The Wilcoxon Signed Ranks Test
 The Wilcoxon signed rank test is the nonparametric test equivalent to the paired
sample t-test. It is used to compare two sets of scores that come from the same
participants.

 Thus, it is used to investigate any change in scores from one time point to another,
or when individuals are subjected to more than one condition.

 Since this test does not assume normality in the data, it can be used when this
assumption has been violated and the use of the paired sample t-test is rendered
inappropriate.

Common Uses
The Wilcoxon signed rank test is commonly used to test the following:
o Statistical difference between two time points
o Statistical difference between two conditions
o Statistical difference between two measurements
o Statistical difference between a matched pair
Data Requirements for Wilcoxon Test
The data for Wilcoxon Test must meet the following requirements:
 The dependent variable should be measured at the ordinal or continuous
level.
 The independent variable should consist of two categorical, "related groups"
or "matched pairs".
 The dependent (ordinal/continuous) variable for the two related groups or matched
pairs does not necessarily have to be normally distributed or have homogenous
variances.

 However, the distribution of the differences between the two related groups needs
to be symmetrical in shape.

 If this assumption is violated, the data can be transformed (possibly differenced)
to achieve a symmetrically-shaped distribution of differences, or, better still, the
sign test can be used instead of the Wilcoxon signed-rank test.

Hypotheses for Wilcoxon Test
H0: µ1 = µ2 ("the paired or related groups are equal")
H1: µ1 ≠ µ2 ("the paired or related groups are not equal")
For the computation of the test value, see appendix 1 Wilcoxon
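
As a sketch, the test can also be run with scipy.stats.wilcoxon. The paired before/after scores below are hypothetical:

```python
# Wilcoxon signed-rank test (sketch; hypothetical paired data).
from scipy.stats import wilcoxon

before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
after  = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]

# Pairs with a zero difference are dropped by default
# (zero_method="wilcox"), so the (140, 140) pair is excluded
w_stat, p = wilcoxon(before, after)
print(f"W = {w_stat}, p = {p:.3f}")
```

A p-value below the chosen significance level leads to rejecting H0, i.e., concluding that the paired scores differ.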

Mann-Whitney U Test
 The Mann-Whitney U test (also called the Mann Whitney Wilcoxon Test) is used to
compare differences between two independent groups when the dependent variable
is either ordinal or continuous, but not normally distributed.

 For instance, the Mann-Whitney U test can be used to determine whether salaries,
measured on a continuous scale, differed based on educational level (i.e., your
dependent variable would be "salary" and your independent variable would be
"educational level", which has two groups: "high school" and "university").

 It is considered the non-parametric alternative test to the independent sample t-test.

Common Uses
The Mann-Whitney U test is commonly used to test the following:

o Statistical differences between two groups

o Statistical differences between two interventions


o Statistical differences between two change scores
o Statistical differences between preferences of people from two different locations
(e.g urban vs. rural)
o The differences tested concern the mean ranks or the medians of the two groups.

Data Requirements (Mann-Whitney U Test)


The data for Mann-Whitney U Test must meet the following requirements:

o The dependent variable should be measured at the ordinal or continuous level.

o The independent variable should consist of two categorical independent groups.

o The observations for the two groups must be independent. This implies that there
should be no relationship between the observations in each group or between the
groups themselves

o The dependent (ordinal/continuous) variable for the two unrelated or independent
groups does not necessarily have to be normally distributed.

o Nonetheless, to be able to interpret the results from a Mann-Whitney U test, there is
the need to determine whether the distributions of scores (i.e., the continuous or
ordinal dependent variable) for two groups have the same shape.

o NB: The same shape does not necessarily imply a normal distribution

o The diagrams below throw more light on this requirement.

Fig.1. Profit per 100kg bag of maize Fig.2. Profit per 100kg bag of groundnut

Source: Modified from Laerd Statistics (2013)

o Fig. 1 indicates the distribution of profits per bag (100kg) of maize for male and
female retailers. Since the two distributions are identical, the blue-coloured
male distribution is underneath the red-coloured female distribution.
o However, in Fig. 2, even though both distributions have the same shape, they
have a different location. Identical distributions (as shown in Fig. 1) are rare in
empirical data analysis.
o As a result, if the distributions for the two groups have the same or similar
shape the Mann-Whitney U test can be conducted to compare the medians (of
the profits) for the two groups.
o If the two distributions have different shapes, then the Mann-Whitney U test can
be conducted to compare the mean ranks.

Hypotheses for the Mann-Whitney U Test


 The null and alternative hypotheses for a Mann-Whitney U test can be stated in
different ways depending on the shape of the distribution of scores (i.e., the
dependent variable) and, for that matter, the comparison being done (i.e.,
whether comparing the mean ranks or the medians).

The general hypothesis can be stated as:
o H0: the distributions of scores for the two groups are equal
o H1: the distributions of scores for the two groups are not equal
However, the alternative hypothesis (H1) can also be expressed as follows:
H1: the mean ranks of the two groups are not equal
 The alternative hypothesis in italics is appropriate only when mean ranks are
being compared. This indicates that it is possible for the groups to have different
distributions but still not reject the null hypothesis of equal distributions.
 When the comparison is based on the medians, the hypothesis for the Mann-
Whitney U test can be expressed as
H0: the distributions of the two groups are equal
H1: the medians of the two groups are not equal
 It is important to note that the null hypothesis remains unchanged irrespective of
the type of comparison under consideration

 For a one-tailed test, you will use the U for the group you predict will have the larger
sum (or mean) of ranks.

 To determine the appropriate critical U value we need the sample sizes for the two
groups (n1 and n2) and the two-sided level of significance (e.g., α = 0.05).

 To be statistically significant, the computed U has to be equal to or less than the
critical U value.

 It is important to note that this is different from many statistical tests, where the
obtained value has to be equal to or larger than the critical value.

 The Mann-Whitney critical U table is used only when the sample size is small (i.e.,
20 or less). For large sample sizes, the sampling distribution of U is approximately
normal, and therefore the Z table is used instead.
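
Continuing the salary example above, the test can be sketched with scipy.stats.mannwhitneyu, which chooses between the exact and normal-approximation p-value automatically. The salary figures are hypothetical:

```python
# Mann-Whitney U test (sketch; hypothetical salary data).
from scipy.stats import mannwhitneyu

high_school = [1200, 1350, 1100, 1500, 1250, 1300, 1150]
university  = [1800, 2100, 1700, 1950, 2200, 1600, 2000]

# Two-sided test; H0: the distributions of the two groups are equal
u_stat, p = mannwhitneyu(high_school, university, alternative="two-sided")
print(f"U = {u_stat}, p = {p:.4f}")
```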

Kruskal-Wallis H Test
 The Kruskal-Wallis H test (also called the one-way ANOVA on ranks) is a rank-based
nonparametric test that can be used to determine if there are statistically significant
differences between at least three groups of an independent variable on a continuous
or ordinal dependent variable.

 It is considered the nonparametric alternative to the one-way ANOVA, and an
extension of the Mann-Whitney U test to allow the comparison of more than two
independent groups.

 This test does not assume normality in the data and is much less sensitive to outliers.

 The basic intuition behind the test is analogous to that for the parametric one-way
ANOVA:

o A real difference among treatments should cause the variability of scores between
groups to be greater than the variability of scores within groups

o If all the scores are ranked, the variability of rank-sums between groups should be
greater than the variability of rank-sums within groups
Common Uses
The Kruskal-Wallis H test is commonly used to test the following:

o Statistical differences among three or more unrelated groups

o Statistical differences among three or more interventions

o Statistical differences among three or more change scores

Data Requirements (Kruskal-Wallis H Test)


The data for Kruskal-Wallis H Test must meet the following requirements:

 The dependent variable should be measured at the ordinal or continuous level

 The independent variable should consist of three or more categorical, independent
groups.

 The observations in various groups should be independent. This indicates that
there is no relationship between the observations in each group or between the
groups themselves

 To be able to interpret the results from a Kruskal-Wallis H test, there is the need to
determine whether the distributions in each group have the same shape or
different shapes (refer to the reasons assigned in the Mann-Whitney U test
section).

 To perform the Kruskal-Wallis test, each of the sample sizes must be at least 5.

Hypotheses (Kruskal-Wallis H Test )


The null and alternative hypotheses are very similar to those in the parametric one-
way ANOVA.

H0: There is no difference between groups. There is no tendency for ranks in any
sample to be systematically higher or lower than in any other condition.

H1: There are differences between groups. The ranks in at least one group or sample
are systematically higher or lower than in another group.
 As a result, the significance of H is usually evaluated using a chi-squared
distribution with k - 1 degrees of freedom,
 where k represents the number of groups.
 It is important to note that the Kruskal-Wallis H test is an omnibus test statistic
and cannot indicate the specific groups of the independent variable that are
statistically significantly different from each other.

 One way of finding significant differences between the groups is to make all
possible pairwise comparisons (i.e., test if each pair of mean ranks is equal).

 This can be done through pairwise Mann-Whitney tests with a Bonferroni correction
or by Dunn’s test.
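
The omnibus test and the Bonferroni-corrected pairwise follow-up just described can be sketched as follows. The group data are hypothetical, and pairwise Mann-Whitney tests are used for the follow-up (Dunn's test itself is not in SciPy; it is available in packages such as scikit-posthocs):

```python
# Kruskal-Wallis H test with Bonferroni-corrected pairwise follow-up
# (sketch; hypothetical data).
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

groups = {
    "A": [12, 15, 11, 14, 13],
    "B": [22, 25, 21, 24, 23],
    "C": [12, 14, 13, 11, 15],
}

h_stat, p = kruskal(*groups.values())
print(f"H = {h_stat:.2f}, p = {p:.4f}")

# Pairwise Mann-Whitney tests with a Bonferroni-adjusted alpha
pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)  # 0.05 / 3 comparisons
for a, b in pairs:
    _, p_pair = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    verdict = "significant" if p_pair < alpha_adj else "not significant"
    print(f"{a} vs {b}: p = {p_pair:.4f} ({verdict})")
```

Only when the omnibus H test is significant is it meaningful to examine the pairwise results.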
