TOPIC 0:
INTRODUCTION
Data analysis is perhaps the most important component of research. Weak analysis produces
inaccurate results that not only hamper the authenticity of the research but also make the
findings unusable. It's imperative to choose your data analysis methods carefully to ensure
that your findings are reliable and valid.
Data analysis is the process of evaluating data using analytical and statistical tools to discover
useful information and to aid decision making by organisations or policy makers. Data
analysis is part of a larger process of deriving business intelligence. This process includes
one or more of the following steps:
Posing questions: a question must be asked in the problem domain. Each study seeks to
answer a particular question. For example: do rewards determine employees'
productivity?
Defining objectives: any study must have a set of clearly defined objectives. Many of the
decisions made in the rest of the process depend on how clearly the objectives of the
study have been stated. For example: to determine the relationship between rewards and
employees' productivity.
Hypothesis formulation: before any analysis is conducted, a postulate or conjecture is
made about what the relationship among the variables may be. Hypotheses are testable
statements about the relationship among the variables of interest. For example: there is no
significant relationship between rewards and employees' productivity.
Data collection: data take two forms depending on their source, namely
primary and secondary data. Primary data are also called field data or first-hand data
because they are collected and compiled by the researcher using tools like questionnaires
or focus groups, whereas secondary data are collected from other sources. This step is
very important since the quality and reliability of the findings from data analysis depend
first on the quality and reliability of the data collection procedure. When data are
collected using surveys, a questionnaire to be presented to the subjects is needed. The
questions should be properly designed for the statistical method being used.
Quantitative Analysis
Data wrangling or coding: raw data may be collected in different formats. The
collected data must be cleaned and converted so that analysis tools can import them.
Data analysis: it is the process by which sense is made of the data gathered in research through
the proper application of statistical methods, so that an answer to the question posed earlier
can be provided.
Drawing conclusions and making predictions: this is the step where, after sufficient
analysis, conclusions can be drawn from the data and appropriate predictions can be made.
These conclusions and predictions may then be summarized in a report delivered to end
users.
The main purpose of data analysis is to look at what the data are telling us, that is, to
draw conclusions about phenomena and make predictions.
Data analysis makes extensive use of economic data to describe phenomena, that is, to test a
theory or estimate a relationship. This section looks at the different types of data
and their uses in econometrics.
There are three major types of economic data sets in econometrics: cross-sectional, time-
series, and panel. They are distinguished by the dependence structure across observations.
Cross-Section data show spatial variation: Variation across units. Cross-sectional data can be
used in cross-sectional regression, which is regression analysis of cross-sectional data. For
example, if we want to measure current obesity levels in a population, we could draw a
sample of 1,000 people randomly from that population, measure their weight and height, and
calculate what percentage of that sample is categorized as obese. This cross-sectional sample
provides us with a snapshot of that population, at that one point in time. Note that we do not
know based on one cross-sectional sample if obesity is increasing or decreasing; we can only
describe the current proportion.
In time series, data are observations on a variable over time. The same small-scale or
aggregate entity is observed at various points in time. A time-series variable is often
subscripted with the letter t, that is, it is indexed by time. This type of data is characterized
by serial dependence so the random sampling assumption is inappropriate. Most aggregate
economic data is only available at a low frequency (annual, quarterly or perhaps monthly) so
the sample size is typically much smaller than in cross-section studies. The exception is
financial data where data are available at a high frequency (weekly, daily, hourly, or by
transaction) so sample sizes can be quite large.
Most often time-series data are macro data or macro-type data, for example time-series for
macro-economic variables from the National Accounts. But micro-data may also occur as
time-series, for example time-series for a particular household or time-series for a particular
firm. In time-series data the data variation goes over time periods; we have variation over
time (time serial variation). Time-Series data show temporal variation: Variation over periods
(years, months, weeks, seconds ...).
Panel data combines elements of cross-section and time-series. These data sets consist of a set
of individuals (typically persons, households, or corporations) surveyed repeatedly over time.
The common modelling assumption is that the individuals are mutually independent of one
another, but a given individual's observations are mutually dependent. This is a modified
random sampling environment. Thus, the ordering in the cross section of a panel data set does
not matter, but the ordering in the time dimension matters a great deal. If we do not take into
account the time in panel data, we say that we are using pooled cross sectional data.
Panel data (or time-series cross-sectional (TSCS) data) combine both and look at multiple
subjects and how they change over the course of time. Panel analysis uses panel data to
examine changes in variables over time and differences in variables between subjects. The
variables in a panel data set can vary both across the spatial dimension and over the time
dimension, but some of them may vary along one dimension only. Panel data show both
spatial and temporal variation.
Panel data have, over the years, become a gradually more important and more frequently used
data type for analysing economic relationships. This has several explanations: (i) panel data are
a 'richer' data type than (pure) cross-section data and (pure) time-series data; (ii) the
development of data collection and data processing methods; (iii) the development of
computer technology. With panel data, all the coefficients can be estimated simultaneously,
whereas this is impossible with pure cross-sectional or pure time-series data.
Quantitative research typically explores specific and clearly defined questions that examine
the relationship between two events, or occurrences, where the second event is a consequence
of the first event. Such a question might be: 'what impact did the programme have on
children's school performance?' To test the causality or link between the programme and
children's school performance, quantitative researchers will seek to maintain a level of control
of the different variables that may influence the relationship between events and recruit
respondents randomly. Quantitative data is often gathered through surveys and questionnaires
that are carefully developed and structured to provide you with numerical data that can be
explored statistically and yield a result that can be generalised to some larger population.
Research following a qualitative approach is exploratory and seeks to explain 'how' and
'why' a particular phenomenon, or programme, operates as it does in a particular context. As
such, qualitative research often investigates i) local knowledge and understanding of a given
issue or programme; ii) people's experiences, meanings and relationships and iii) social
processes and contextual factors (e.g., social norms and cultural practices) that marginalise a
group of people or impact a programme. Qualitative data is non-numerical, covering images,
videos, text and people's written or spoken words. Qualitative data is often gathered through
individual interviews and focus group discussions using semi-structured or unstructured topic
guides.
The table below summarises key differences between qualitative and quantitative research
(analysis).
                  | Qualitative research                 | Quantitative research
Type of knowledge | Subjective                           | Objective
Aim               | Exploratory and observational        | Generalizable and testing
Characteristics   | Flexible                             | Fixed and controlled
                  | Contextual portrayal                 | Independent and dependent variables
                  | Dynamic, continuous view of change   | Pre- and post-measurement of change
Sampling          | Purposeful                           | Random
Data collection   | Semi-structured or unstructured      | Structured
Nature of data    | Narrative, quotations, descriptions  | Numbers, statistics
Value             | Uniqueness, particularity            | Replication
Analysis          | Thematic                             | Statistical
There are many different methods of collecting data. Depending on the type of research
(qualitative or quantitative), researchers may use one or more of the following forms:
Quantitative data is numerical and can be collected in a number of forms. The most common
forms of quantitative data are shown below.
• Units: number of staff that have been trained; number of children enrolled in school for the
first time
• Prices: amount of money spent on a building, or the additional revenue of farmers following
a seed distribution programme
• Rates of change: percentage change in average household income over a reporting period
• Scoring and ranking: scores given out of ten by project participants to rate the quality of
service they have received.
Statistical analysis is used to summarise and describe quantitative data, and graphs or tables
can be used to visualise or present raw data. This section will review the commonly used
methods/sources of quantitative data and the techniques used for recruiting participants.
Quantitative data can be collected using a number of different methods and from a variety of
sources.
1. Surveys and questionnaires use carefully constructed questions, often ranking or scoring
options or using closed-ended questions. A closed-ended question limits respondents to a
specified number of answers. For example, this is the case in multiple-choice questions. Good
quality design is particularly important for quantitative surveys and questionnaires.
2. Biophysical measurements can include variables such as height and weight of a child
3. Project records are a useful source of data. For example, the number of training events
held and the number of participants attending
4. Service provider or facility data includes school attendance or health care provider
vaccination records
5. Service provider or facility assessments are often carried out during the monitoring and
evaluation of our projects.
Individual interview
An individual interview is a conversation between two people that has a structure and a
purpose. It is designed to elicit the interviewee's knowledge or perspective on a topic.
Individual interviews, which can include key informant interviews, are useful for exploring an
individual's beliefs, values, understandings, feelings, experiences and perspectives of an
issue. Individual interviews also allow the researcher to probe into a complex issue, learning
more about the contextual factors that govern individual experiences.
Photovoice
Photovoice is a participatory method that enables people to identify, represent and enhance
their community, life circumstances or engagement with a programme through photography
and accompanying written captions. Photovoice involves giving a group of participants
cameras, enabling them to capture, discuss and share stories they find significant.
Picture story
The picture story method enables children, in a fun and participatory way, to communicate
their perspectives on particular issues through a series of drawings (story telling) they have
made. The story telling can either be done in writing, depending on the child's level of
literacy, or verbally with a researcher. The picture story method is relatively quick and
inexpensive, particularly if the draw-and-write technique is adopted. The picture story method
provides a non-threatening way to explore children's views on a particular issue (e.g. barriers
to girls' education) and to begin to identify what can be done to address any struggles faced
by children.
Qualitative research often focuses on a limited number of respondents who have been
purposefully selected to participate because you believe they have in-depth knowledge of an
issue you know little about, such as:
They have experienced first-hand your topic of study, e.g. workers of an organisation
They show variation in how they respond to hardship
They have particular knowledge or expertise regarding the group under study, e.g.
social workers supporting working street children.
You can select a sample of individuals with a particular 'purpose' in mind in different ways,
including random sampling, purposive sampling, cluster sampling, etc.
Data analysis methods depend on the type of data analysis conducted (quantitative or
qualitative).
After preparing your data, that is, transforming the raw data collected into a usable and
readable format by validating, editing and coding them, the data are ready for analysis. The
two most commonly used quantitative data analysis methods are descriptive statistics and
inferential statistics.
5.1.1 Descriptive statistics
Typically, descriptive analysis (or descriptive statistics) is the first level of analysis. It helps
the researcher summarise the data and find patterns. A few commonly used descriptive
statistics are:
Percentage: it is used to express how a value or group of respondents within the data relates
to a larger group of respondents.
Frequency: the number of times a value is found.
Range: the spread between the highest and lowest values in a set of values.
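As a minimal sketch of these three statistics, the Python snippet below computes a frequency count, percentages and a range. The survey responses are hypothetical values invented for illustration, and the variable names are assumptions rather than part of the course material.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data only).
genders = ["F", "M", "F", "F", "M", "F", "M", "F"]
ages = [23, 35, 29, 41, 35, 27, 35, 30]

# Frequency: the number of times each value is found.
gender_freq = Counter(genders)

# Percentage: how each group relates to the whole group of respondents.
gender_pct = {g: 100 * count / len(genders) for g, count in gender_freq.items()}

# Range: spread between the highest and lowest values.
age_range = max(ages) - min(ages)

print(gender_freq)
print(gender_pct)
print(age_range)
```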
Descriptive statistics provide absolute numbers. However, they do not explain the rationale or
reasoning behind those numbers. Before applying descriptive statistics, it's important to think
about which one is best suited for your research question and what you want to show. For
example, a percentage is a good way to show the gender distribution of respondents.
Descriptive statistics are most helpful when the research is limited to the sample and does not
need to be generalized to a larger population. For example, if you are comparing the
percentage of female and male employees in two different organisations, then descriptive
statistics are enough. Since descriptive analysis is mostly used for analysing a single variable, it
is often called univariate analysis.
Often, researchers collect data on a sample of their population, then they generalize the results
to the entire population or target group. Inferential statistics are used to generalize results and
make predictions about a larger population.
These are complex analyses that show the relationship between several different variables,
rather than describing a single variable. They are used when the researcher needs to go
beyond absolute values and understand the relations between variables.
A few types of inferential analysis are:
Correlation: This describes the strength and direction of the relationship between two variables. If a correlation is found,
it means that there is a relationship between the variables. For example, taller people tend to
have a higher weight. Hence, height and weight are correlated with each other. However, this
doesn't necessarily mean that one variable causes the other (e.g. gaining weight doesn't cause
people to grow taller).
Regression: This models how one variable changes with another, so that one can be predicted from the other. For example, regression can
help us guess someone's weight based on their height.
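The height and weight example can be sketched in Python as follows. The data are hypothetical, and the snippet uses the standard Pearson correlation and simple least-squares regression formulas, which the text does not spell out, so treat it as one possible illustration rather than the course's prescribed method.

```python
import math

# Hypothetical heights (cm) and weights (kg); illustrative only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Sums of cross-products and squared deviations from the means.
s_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
s_hh = sum((h - mean_h) ** 2 for h in heights)
s_ww = sum((w - mean_w) ** 2 for w in weights)

# Correlation: strength of the linear relationship (between -1 and 1).
r = s_hw / math.sqrt(s_hh * s_ww)

# Regression: weight = intercept + slope * height.
slope = s_hw / s_hh
intercept = mean_w - slope * mean_h

# Guess the weight of someone who is 175 cm tall.
predicted = intercept + slope * 175
print(round(r, 3), slope, intercept, predicted)
```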
Analysis of variance (ANOVA): This is a statistical procedure used to test the degree to
which two or more groups vary or differ in an experiment. In most experiments, a large
variance between groups relative to the variance within groups indicates a significant finding. For example, to
understand the relationship between number of children in the family and socio-economic
status, a researcher may recruit a sample of families from each economic status and ask them
about their ideal number of children. The ANOVA will be used to check if the difference
between the groups' answers is statistically significant or due to random chance.
The choice of inferential statistic depends entirely upon the research objective. As in the
case of descriptive analysis, it is best to identify the appropriate inferential statistic for your
research question.
Since inferential statistics are used to determine the relationship between two or more
variables, they are called bivariate analysis (when limited to two variables) or multivariate
analysis (when there are more than two variables).
The above-stated methods are the most commonly used methods for data analysis. However,
other data analysis methods and metrics, such as standard deviation and variance, are also
available.
Qualitative data analysis works a little differently from quantitative data, primarily because
qualitative data is made up of words, observations, images, and even symbols. Deriving
absolute meaning from such data is nearly impossible; hence, it is mostly used for exploratory
research. While in quantitative research there is a clear distinction between the data
preparation and data analysis stage, analysis for qualitative research often begins as soon as
the data is available.
Analysis and preparation of data happen in parallel and include the following steps:
1. Getting familiar with the data: since most qualitative data are just words, the researcher
should start by reading it several times to get familiar and start looking for basic
observations or patterns. This includes transcribing the data.
2. Revisiting the research objectives to identify the questions that can be answered
through the collected data.
TOPIC ONE:
HYPOTHESIS TESTING
The null hypothesis, denoted H0, states that the independent variable has no effect on (or no
relationship with) the dependent variable. This is the hypothesis the researcher wants
to test on the basis of sampled information.
The alternative (directional) hypothesis, denoted H1, is the counter-proposition to the null
hypothesis. It states that the exogenous (independent) variable has an effect
on the endogenous (dependent) variable.
After formulating the research hypotheses, the researcher typically hopes to reject the null hypothesis
(H0) and accept the alternative (H1).
In hypothesis testing, that is, when deciding whether to reject or accept the null
hypothesis (H0), the researcher is liable to commit an error. One of two types of error is
usually committed:
Type I error: it is committed when the null hypothesis (H0) is true but is rejected on the basis of the sampled
data. This implies that the
alternative hypothesis (H1) is accepted when in fact it is wrong. It should further be noted that
the probability of committing a type I error is called the level of significance and is denoted α. As
such, the level of significance simply refers to the probability of rejecting H0 when in fact it is
true.
Type II error: it is committed when the null hypothesis (H0), which is false in the population,
is accepted erroneously on the basis of sampled information. It means the alternative
hypothesis is rejected whereas it should not be rejected. The probability of committing a type II
error is denoted β.
It is equally important to divide the whole set of possible values of the test statistic into two zones or regions,
namely the acceptance region and the rejection (or critical) region. The values are assumed
to follow a normal distribution. The critical zone can be chosen either at the left end of the
distribution or at the right end (tail), or half at each tail (end) of the distribution.
[ Insert Graph]
Step 1: formulate your null and alternative hypotheses (H0 and H1)
Step 4: determine the tabular or critical value of the test statistic at the specified level of
significance
Step 5: compare the computed (or calculated) test statistic with the critical value. If the
computed value lies in the critical zone (for a right-tailed test, it is greater than the critical value), reject the null
hypothesis (H0) and conclude that the result (test statistic) is statistically significant.
Otherwise, that is, if the computed value lies outside the critical zone, in
the acceptance region (is less than the critical value), accept the null hypothesis (H0) and
conclude that the result (test statistic) is statistically insignificant.
TOPIC TWO
The chi square test is the most commonly used test of association in research by
students. There are two variants of the chi square test.
The chi square goodness of fit test compares the observed frequencies of a
distribution with the expected frequencies. When the chi square goodness of fit test is used, the
data are grouped into K categories and the observed frequencies for each category are
determined. For each category, the expected frequency is the frequency that would be obtained if the data were
distributed in a specific hypothetical manner.
The computed (or calculated) chi square value can then be compared with the tabular (or
critical) chi square to decide whether to reject or accept the null hypothesis (H0).
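The goodness of fit computation can be sketched as follows, assuming the standard statistic: the sum over the K categories of (observed - expected)^2 / expected, with K - 1 degrees of freedom. The observed counts are hypothetical, the function name is an assumption, and the critical value 7.815 is the usual chi square table entry for 3 degrees of freedom at the 5% level.

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed counts for four categories (28 respondents in total).
observed = [10, 4, 3, 11]

# Under H0 (no category is preferred) every category is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1

# Tabular chi square value for df = 3 at the 5% level (from a standard table).
CRITICAL_5PCT_DF3 = 7.815
reject_h0 = chi2 > CRITICAL_5PCT_DF3

print(round(chi2, 3), df, reject_h0)
```

With these made-up counts the computed value falls below the critical value, so H0 would not be rejected; different counts would of course lead to a different decision.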
Application:
A year 3 student wants to investigate the most important reason for the choice of a degree
program in FEMS. The question asked to the students is 'which of the following reasons best
explains (is the most important to you) the choice of your study program?'. Students are
obliged to choose only one reason from the following list:
a) Prestige
b) Availability of pedagogic material
c) Ease of passing examination
d) Job opportunities
Responses were collected from 28 students and summarised in the table below:
Task: Investigate whether the reason for the choice of a study program plays a role in the
decision to choose a training program at 5% level of significance.
Solution:
When we have a cross tabulation of two categorical (qualitative) variables (a contingency table), the
question we usually ask is: are the two variables related to each other or are they independent?
In other words, does one variable affect the other? It is very important to note that these two
variables must be categorical in nature.
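The chi square test of independence is conducted with the following formula:
\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \times C_j}{T} \]
Where:
O_ij = observed frequency in row i and column j
E_ij = expected frequency in row i and column j
R_i = total of row i
C_j = total of column j
T = grand total
r = number of rows and c = number of columns; the test has (r - 1)(c - 1) degrees of freedom.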
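A minimal Python sketch of this computation is given below. The 2 by 2 table is hypothetical, the function name is an assumption, and expected frequencies are taken as (row total x column total) / grand total.

```python
def chi_square_independence(table):
    """Return the chi square statistic and degrees of freedom of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 contingency table (rows and columns are categories).
table = [[10, 20],
         [20, 10]]

chi2, df = chi_square_independence(table)
print(round(chi2, 4), df)
```

The computed value would then be compared with the tabular chi square at the chosen level of significance, for example 3.841 for 1 degree of freedom at the 5% level.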
Application:
The table below shows information of political affiliation of personnel of a public company in
Cameroon.
Solution:
The contingency coefficient is a coefficient of association that tells whether two variables or
data sets are independent of or dependent on each other. It is also known as Pearson's
coefficient (not to be confused with Pearson's coefficient of skewness).
\[ C = \sqrt{\frac{\chi^2}{\chi^2 + N}} \]
Where:
\( \chi^2 \) is the computed chi square statistic and N is the total number of observations (the grand total).
If C is near zero (or equal to zero) you can conclude that your variables are
independent of each other; there is no association between them.
If C is away from zero there is some relationship; C can only take on positive values.
The larger the table your chi-square coefficient is calculated from, the closer to 1 the
coefficient for a perfect association will be. That's why some statisticians suggest using the contingency
coefficient only if you're working with a 5 by 5 table or larger.
A contingency coefficient is particularly informative if you are working with a large sample,
and you do not need to find out if an association is complete or not (just whether or not the
association exists).
Other alternative measures of association include the phi coefficient (which has the same
weak point as our C; never reaching one), and Cramer's V. Cramer's V is often preferred
because with perfect association, it becomes exactly 1 no matter how large the table.
N.B: The p-value for the significance of the contingency coefficient and of Cramer's V
coefficient is the same as that of the chi square.
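Both measures can be computed directly from a chi square value, as in the sketch below. It assumes the usual definitions C = sqrt(chi2 / (chi2 + N)) and V = sqrt(chi2 / (N x min(r - 1, c - 1))); the chi square value, sample size and function names are hypothetical and chosen only for illustration.

```python
import math

def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V, which reaches 1 under perfect association."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical inputs: chi square of 20/3 from a 2 x 2 table with 60 observations.
chi2, n = 20 / 3, 60

c = contingency_coefficient(chi2, n)
v = cramers_v(chi2, n, n_rows=2, n_cols=2)
print(round(c, 4), round(v, 4))
```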
Application:
Solution:
Assignment:
The results of a random sample of children with pain from muscular injuries treated with
Amoxicillin, Ibuprofen and codeine are shown in the table.
                        | Amoxicillin | Ibuprofen | Codeine | Total
Significant improvement | 58          | 81        | 61      |
Slight improvement      | 42          | 19        | 39      |
Total                   |             |           |         |
At the specified level of significance α, is there enough evidence to conclude that the treatment and result are
independent?
TOPIC THREE
The Student's t-test tells you how significant the differences between groups are. In other
words, it informs you if those differences (measured in means) could have happened by
chance. There are principally three types of t-tests.
The one-sample t-test is also known as the t-test of one sample / population mean. It compares the mean of your
sample data to a known value (hypothesised value). For example, you may be interested to
know how your sample mean compares to the population mean. The one-sample t-test is
appropriate when you do not know the population standard deviation or you have a small
sample size. The assumptions of the test are:
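Data is independent
The test statistic is:
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
where \( \bar{X} \) is the sample mean, \( \mu \) is the hypothesised (population) mean, S is the sample standard deviation and n is the sample size. The statistic has n - 1 degrees of freedom.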
Application:
Your company wants to improve sales. Past sales data indicate that the average sale was 100
MU per transaction. After training your sales force, recent sales data taken from a sample of
25 salesmen indicates an average sale of 130 MU, with a standard deviation of 15 MU. Did
the training work? Test the hypothesis at 5% level of significance.
Solution:
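One way to work through this application is sketched below, assuming a right-tailed test (H0: mean sale = 100 MU against H1: mean sale > 100 MU) and the statistic t = (sample mean - hypothesised mean) / (standard deviation / sqrt(n)). The critical value 1.711 is the standard t-table entry for 24 degrees of freedom at the 5% level, one tail; the function name is an assumption.

```python
import math

def one_sample_t(sample_mean, hypothesised_mean, sample_sd, n):
    """t statistic for comparing a sample mean with a hypothesised value."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# Figures taken from the application above.
t_stat = one_sample_t(sample_mean=130, hypothesised_mean=100, sample_sd=15, n=25)
df = 25 - 1

# One-tailed critical value for df = 24 at the 5% level (standard t-table).
T_CRITICAL = 1.711
training_worked = t_stat > T_CRITICAL

print(t_stat, df, training_worked)
```

Under these assumptions the computed t of 10 lies far inside the critical zone, so H0 is rejected and the data suggest that average sales rose after the training.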
Assignment:
A lecturer claims that the average performance of students in quantitative analysis is 13. After
administering a test to a sample of 16 students, the mean score is 15 with a standard deviation
of 2.5. Investigate the claim of the lecturer at 5% level.
The two-sample t-test is also known as the t-test about the difference between two population means. This is the
most common form of the t-test. It helps you to compare the means of two sets of data. For
example, you could run a test to see if the mean test scores of males and females are different.
This second variant of the t-test answers the question: could these differences have occurred by
random chance? Assumptions of the test include:
Independence of the two samples: you need two independent categorical groups that
represent your independent variables (for example male and female).
The dependent variable should be approximately normally distributed and measured
on a continuous scale.
The variances of the dependent variable should be equal in the two groups.
Consider \( \bar{X}_1 \) the sample mean of population 1 and \( \bar{X}_2 \) that of population 2; then the difference
between the two sample means is \( (\bar{X}_1 - \bar{X}_2) \). The formula for computing the value of the t-test
statistic for such a difference is given by:
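\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \frac{Var_1}{n_1} + \frac{Var_2}{n_2} \right]}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right]}} \]
Where:
\( \bar{X}_1 \) and \( \bar{X}_2 \) are respectively the first and second sample means
\( S_1^2 \) and \( S_2^2 \) are the variances of the first sample and second sample respectively
\( n_1 \) and \( n_2 \) are the number of observations in the first and second samples respectively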
The test that assumes equal population variances is referred to as the pooled t-test. Pooling
refers to finding a weighted average of the two independent sample variances:
\[ Var_p = S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]
If \( n_1 \neq n_2 \), then the variance based on the larger sample size will receive more weight than
the other variance.
The advantage of this test statistic is that it exactly follows the Student's t-distribution with
n1 + n2 – 2 degrees of freedom. The t-statistic formula with pooled variance becomes:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \frac{Var_p}{n_1} + \frac{Var_p}{n_2} \right]}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{Var_p \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}} = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}} \]
Where Varp and Sp are respectively the pooled variance and pooled standard deviation.
The above formula is based on a restrictive assumption of equality of variances which rarely
holds true. However, when the assumption of equality of variances is violated, how do you
proceed? You can still use the two-sample t-test. You use a different estimate of the standard
deviation.
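The two variants can be compared in a short Python sketch. The summary statistics below are hypothetical, the function names are assumptions, and the unpooled version simply replaces the pooled variance with each sample's own variance; its degrees of freedom differ from n1 + n2 - 2 and are not computed here.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal population variances."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, pooled_var

def unpooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics: (mean, variance, sample size) per group.
t_pooled, var_p = pooled_t(14, 4, 10, 12, 6, 12)
t_unpooled = unpooled_t(14, 4, 10, 12, 6, 12)

print(round(var_p, 2), round(t_pooled, 4), round(t_unpooled, 4))
```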
Application:
A researcher wishes to verify that students from private universities perform better than those
from state universities. He selects a sample from each type of university, administers
the same test to both samples and obtains the following results:
Solution:
Assignment:
This section describes the hypothesis testing procedure for the difference between two
population means when the two samples are dependent. When we have two samples collected
from the same source, the samples are called matched or paired samples and the previous test
procedure is no longer applicable. For example, we may administer a test in quantitative
analysis and another in research methods to the same group of year 3 students. In paired samples, the
difference between the two data values for each element of the two samples is denoted by d and is
called the paired difference. Usually we need to compute the mean and standard deviation of
the paired differences for the samples:
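\[ \bar{d} = \frac{\sum d}{n}, \qquad S_d = \sqrt{\left[ \frac{\sum (d - \bar{d})^2}{n - 1} \right]} \]
The standard error of the mean difference and the test statistic are:
\[ S_{\bar{d}} = \frac{S_d}{\sqrt{n}}, \qquad t = \frac{\bar{d}}{S_{\bar{d}}} = \frac{\bar{d}}{S_d / \sqrt{n}} \]
where n is the number of pairs; the statistic has n - 1 degrees of freedom.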
Application:
Two tests, one in quantitative analysis and the other in managerial economics, were
administered to a sample of 10 level 400 management students in FEMS. The scores out of 100
are presented in the table below:
Quantitative Analysis 60 51 42 53 40 74 65 41 55 70
Managerial Economics 75 58 79 81 55 70 80 74 60 83
Verify the hypothesis that performances in the two subjects were not the same.
Solution:
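A possible way to carry out the computation is sketched below. It takes d as the quantitative analysis score minus the managerial economics score and uses a two-tailed test. The application does not state a level of significance, so 5% is assumed here, with the standard t-table value 2.262 for 9 degrees of freedom.

```python
import math

# Scores from the table above.
qa = [60, 51, 42, 53, 40, 74, 65, 41, 55, 70]
me = [75, 58, 79, 81, 55, 70, 80, 74, 60, 83]

# Paired differences d for each student.
d = [a - b for a, b in zip(qa, me)]
n = len(d)

# Mean and standard deviation of the paired differences.
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))

# t statistic with n - 1 degrees of freedom.
t_stat = d_bar / (s_d / math.sqrt(n))

# Assumed: two-tailed test at the 5% level, df = 9 (standard t-table value).
T_CRITICAL = 2.262
reject_h0 = abs(t_stat) > T_CRITICAL

print(d_bar, round(s_d, 3), round(t_stat, 3), reject_h0)
```

Under these assumptions the absolute value of t (about 4.04) exceeds the critical value, which would support the hypothesis that performances in the two subjects differ.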
Assignment:
TOPIC FOUR
ANOVA is used when the population means to be compared are more than two (2). There are
basically two types of ANOVA:
1.1 Definition
This is a design where one independent variable with many levels is associated with a
dependent variable. A typical design of the one way ANOVA is the comparison of the means
from three different populations \( (\bar{X}_1, \bar{X}_2, \bar{X}_3) \). Since it is usually difficult to get the true
population means, we infer the information from sample means computed as estimates of the
population means.
ANOVA is a technique that breaks down total variation into 2 components. These
components are between group variation and within group variation.
The between group variation is due to the general differences among the means of the
samples. The possible explanation for between group variability is what we call the treatment
or group effect. This means that the differences arise due to the treatments that are
implemented. Such differences may also arise due to individual differences. In this case,
subjects may have entered the treatment conditions with different ability or attitude.
The within group variability is due to variation among subjects within the respective samples.
Example: Is there a significant difference of reading skills between children from various
socio economic backgrounds (High, Medium, Low).
To explain this concept, we consider a one factor experiment with 3 treatments or groups as
displayed in the table below:
Group        | G1               | G2               | G3
Observations | X11              | X21              | X31
             | X12              | X22              | X32
             | X13              | X23              | X33
             | ...              | ...              | ...
             | X1n1             | X2n2             | X3n3
Total        | T1               | T2               | T3
Mean         | \( \bar{X}_1 \)  | \( \bar{X}_2 \)  | \( \bar{X}_3 \)
The table shows that there are n1 subjects in group G1, n2 subjects in group G2 and n3 subjects
in group G3. The group means are \( \bar{X}_1, \bar{X}_2, \bar{X}_3 \).
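Step 1: We compute the sum of squares for total variation (SStotal), the sum of squares for
between group variation (SSBG) and the sum of squares for within group variation (SSWG):
\[ SS_{total} = \sum X^2 - \frac{T^2}{N} \]
\[ SS_{BG} = \sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N} \]
\[ SS_{WG} = SS_{total} - SS_{BG} \]
Where \( \sum X^2 \) is the sum of squared observations in the different groups, \( T^2 \) is the squared
sum of the subtotals \( (T = \sum T_j) \), N is the total number of observations and k is the number of groups.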
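In ANOVA, we have to compute 3 degrees of freedom, one for each sum of squares:
$$df_{total} = N - 1, \quad df_{BG} = k - 1, \quad df_{WG} = N - k$$
where k is the number of groups and N is the total number of observations. Dividing each sum
of squares by its degrees of freedom gives the mean squares, and the ratio of the mean squares
is the F-ratio:
$$MS_{BG} = \frac{SS_{BG}}{k - 1}, \quad MS_{WG} = \frac{SS_{WG}}{N - k}, \quad F = \frac{MS_{BG}}{MS_{WG}}$$
Once the F-ratio has been calculated, it can be compared with the critical F for a given level
of significance at the specified degrees of freedom for the numerator ($k - 1$) and degrees of
freedom for the denominator ($N - k$).
The calculations required for the above test are summarised in a table known as the ANOVA
table:
Source of variation   Sum of squares   Degrees of freedom   Mean square             F-ratio
Between groups        SSBG             k - 1                MSBG = SSBG / (k - 1)   F = MSBG / MSWG
Within groups         SSWG             N - k                MSWG = SSWG / (N - k)
Total                 SStotal          N - 1
The null hypothesis of equal means is rejected if the calculated F exceeds the critical value
read at F (k-1; N-k).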
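The steps above can be sketched in a few lines of Python. This is only a minimal illustration using the usual computational formulas for the sums of squares; the function name `one_way_anova` and the sample data are made up for the example and do not come from these notes.

```python
def one_way_anova(groups):
    """Return the entries of a one way ANOVA table for a list of samples."""
    k = len(groups)                            # number of groups
    N = sum(len(g) for g in groups)            # total number of observations
    T = sum(sum(g) for g in groups)            # grand total of all observations
    correction = T ** 2 / N                    # the term T^2 / N

    # Sums of squares (computational formulas)
    sum_sq_obs = sum(x ** 2 for g in groups for x in g)
    ss_total = sum_sq_obs - correction
    ss_bg = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_wg = ss_total - ss_bg

    # Degrees of freedom, mean squares and F-ratio
    df_bg, df_wg = k - 1, N - k
    ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
    return {
        "SS_total": ss_total, "SS_BG": ss_bg, "SS_WG": ss_wg,
        "df_BG": df_bg, "df_WG": df_wg,
        "MS_BG": ms_bg, "MS_WG": ms_wg,
        "F": ms_bg / ms_wg,
    }


# Made-up illustrative data: three groups of three observations each
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(f"{name}: {value}")
```

The calculated F still has to be compared with the critical value from F tables at (k-1; N-k) degrees of freedom; the sketch deliberately stops before that step.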
Application:
The yields of certain varieties of corn grown on a particular type of soil, treated with
chemicals C1, C2 and C3, are given in the table below:
C1 C2 C3
3 2 4
4 4 6
5 3 5
4 3 5
Solution:
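One possible working is sketched below. It assumes that the question is whether the mean yield is the same under the three chemicals (H0: the three population means are equal) and that the test is carried out at the 5% level of significance; neither is stated in the exercise, so both are assumptions. Here k = 3 groups, n = 4 observations per group and N = 12.

```latex
\begin{align*}
T_1 &= 16, \quad T_2 = 12, \quad T_3 = 20, \quad T = 48 \\
\sum X^2 &= 66 + 38 + 102 = 206 \\
SS_{total} &= 206 - \frac{48^2}{12} = 206 - 192 = 14 \\
SS_{BG} &= \frac{16^2 + 12^2 + 20^2}{4} - 192 = 200 - 192 = 8 \\
SS_{WG} &= 14 - 8 = 6 \\
MS_{BG} &= \frac{8}{3 - 1} = 4, \qquad MS_{WG} = \frac{6}{12 - 3} = \frac{6}{9} \\
F &= \frac{4}{6/9} = 6
\end{align*}
```

From F tables, the critical value at F (2; 9) is 4.26 at the 5% level. Since 6 > 4.26, the hypothesis of equal mean yields would be rejected at that level. At the 1% level the critical value is 8.02, so the same data would not lead to rejection.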
Assignment:
Three machines are tested in a company to see if the machines produce the same quantity of a
given product. Productions for a certain number of days of the week are presented in the table
below:
M1 M2 M3
10 9 17
8 8 15
12 14 12
5 13 14
11 16
Task: Test at the 1% level of significance whether the mean productions of the three
machines above are the same.
A one way ANOVA will tell you that at least two groups are different from each other, but
it will not tell you which groups are different. If your test returns a significant F-statistic,
you may need to run a post hoc test (like the Least Significant Difference test) to tell you
exactly which groups differ in their means.
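As an illustration, the Least Significant Difference test can be sketched on the corn yields from the Application above. Two means are declared different when their absolute difference exceeds LSD = t x sqrt(MSWG x (1/n_i + 1/n_j)). The 5% level is an assumption made for this example, and the value t = 2.262 (two-tailed, 9 degrees of freedom) is read from t tables rather than computed.

```python
from itertools import combinations
from math import sqrt

# Corn yields from the Application (chemicals C1, C2 and C3)
groups = {"C1": [3, 4, 5, 4], "C2": [2, 4, 3, 3], "C3": [4, 6, 5, 5]}

N = sum(len(v) for v in groups.values())   # total number of observations
k = len(groups)                            # number of groups
means = {name: sum(v) / len(v) for name, v in groups.items()}

# Within group mean square, MSWG = SSWG / (N - k)
ss_wg = sum((x - means[name]) ** 2 for name, v in groups.items() for x in v)
ms_wg = ss_wg / (N - k)

# Assumption: 5% level, two-tailed t value for N - k = 9 df, taken from t tables
T_CRIT = 2.262

results = {}
for a, b in combinations(groups, 2):
    lsd = T_CRIT * sqrt(ms_wg * (1 / len(groups[a]) + 1 / len(groups[b])))
    diff = abs(means[a] - means[b])
    results[(a, b)] = diff > lsd           # True when the pair differs significantly
    print(f"{a} vs {b}: difference = {diff:.2f}, LSD = {lsd:.2f}, "
          f"significant = {results[(a, b)]}")
```

Under these assumptions only the pair C2 and C3 differs by more than the LSD.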
2.1 Definition
A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way ANOVA, you have
one independent variable affecting a dependent variable. With a Two Way ANOVA, there
are two independent variables. Use a two way ANOVA when you have one measurement variable
(i.e. a quantitative variable) and two nominal variables. In other words, if your experiment
has a quantitative outcome and two categorical explanatory variables, a two way
ANOVA is appropriate.
For example, you might want to find out if there is an interaction between income and gender
for anxiety level at job interviews. The anxiety level is the outcome, or the variable that can
be measured. Gender and Income are the two categorical variables. These categorical
variables are also the independent variables, which are called factors in a Two Way ANOVA.
The factors can be split into levels. In the above example, income level could be split into
three levels: low, middle and high income. Gender could be split into three levels: male,
female, and transgender. Treatment groups are all possible combinations of the factors. In this
example there would be 3 x 3 = 9 treatment groups.
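The treatment groups of this example can be listed by crossing the levels of the two factors. The short sketch below uses Python's standard `itertools.product`; the variable names are chosen only for illustration.

```python
from itertools import product

# Levels of the two factors from the example above
income = ["low", "middle", "high"]
gender = ["male", "female", "transgender"]

# Every combination of one income level with one gender level is a treatment group
treatment_groups = list(product(income, gender))

for group in treatment_groups:
    print(group)
print("Number of treatment groups:", len(treatment_groups))
```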
A Two Way ANOVA calculates main effects and an interaction effect.
A main effect is similar to a One Way ANOVA: each factor’s effect is considered
separately. With the interaction effect, all factors are considered at the same time. Interaction
effects between factors are easier to test if there is more than one observation in each cell. For
the above example, multiple anxiety scores could be entered into cells. If you do enter multiple
observations into cells, the number in each cell must be equal.
Two null hypotheses are tested if you are placing one observation in each cell. For this
example, those hypotheses would be:
H01: All the income groups have equal mean anxiety levels.
H02: All the gender groups have equal mean anxiety levels.
For multiple observations in cells, you would also be testing a third hypothesis:
H03: The factors are independent, that is, the interaction effect does not exist.
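The F-ratios used to test the two main effects and the interaction can be sketched as follows for a balanced design (the same number of observations, at least two, in every cell). This is only an illustration based on the usual sums of squares for a two way ANOVA with replication; the function name `two_way_anova` and the data are made up, not taken from the anxiety example.

```python
def two_way_anova(cells):
    """cells[i][j] holds the observations for level i of factor A and level j of factor B."""
    a, b = len(cells), len(cells[0])       # number of levels of factors A and B
    n = len(cells[0][0])                   # observations per cell
    if n < 2 or any(len(cell) != n for row in cells for cell in row):
        raise ValueError("every cell needs the same number (at least 2) of observations")

    cell_totals = [[sum(cell) for cell in row] for row in cells]
    grand_total = sum(sum(row) for row in cell_totals)
    correction = grand_total ** 2 / (a * b * n)

    # Sums of squares
    ss_total = sum(x ** 2 for row in cells for cell in row for x in cell) - correction
    ss_a = sum(sum(row) ** 2 for row in cell_totals) / (b * n) - correction
    ss_b = sum(sum(col) ** 2 for col in zip(*cell_totals)) / (a * n) - correction
    ss_cells = sum(t ** 2 for row in cell_totals for t in row) / n - correction
    ss_ab = ss_cells - ss_a - ss_b         # interaction
    ss_error = ss_total - ss_cells         # variation within cells

    ms_error = ss_error / (a * b * (n - 1))
    return {
        "F_A": (ss_a / (a - 1)) / ms_error,
        "F_B": (ss_b / (b - 1)) / ms_error,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_error,
    }


# Made-up data: 2 levels of factor A, 2 levels of factor B, 2 observations per cell
data = [
    [[1, 3], [5, 7]],
    [[4, 6], [4, 6]],
]
f_ratios = two_way_anova(data)
print(f_ratios)
```

Each F-ratio is then compared with the critical F at its own degrees of freedom: (a-1) for factor A, (b-1) for factor B and (a-1)(b-1) for the interaction, each against ab(n-1) for the error.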
Multivariate analysis of variance (MANOVA) is just an ANOVA with several dependent variables.
It is similar to many other tests and experiments in that its purpose is to find out if the
response variables (i.e. your dependent variables) are changed by manipulating the independent
variable. The test helps to answer research questions such as whether changes in the
independent variable have a statistically significant effect on the dependent variables.
Suppose you wanted to find out if a difference in textbooks affected students’ scores in math
and science. Improvements in math and science mean that there are two dependent
variables, so a MANOVA is appropriate.
An ANOVA will give you a single (univariate) F-value while a MANOVA will give you a
multivariate F value. MANOVA tests the multiple dependent variables by creating new,
artificial, dependent variables that maximize group differences. These new dependent
variables are linear combinations of the measured dependent variables.
If the multivariate F value indicates that the test is statistically significant, this means that
the groups differ on at least one combination of the dependent variables, but not which one.
In the above example, you would not know whether math scores, science scores, or both have
improved. Once you have a significant result, you
would then have to look at each individual component (the univariate F tests) to see which
dependent variable(s) contributed to the statistically significant result.
Advantages
Disadvantages
4.1 Definition
A factorial ANOVA is an Analysis of Variance test with more than one independent variable,
or “factor”. Each factor has two or more levels. For example,
an experiment with a treatment group and a control group has one factor (the treatment) but
two levels (the treatment and the control). The terms “two-way” and “three-way” refer to the
number of factors in your test, not to the number of levels. Four-way ANOVA and above are
rarely used because the results of the test are complex and difficult to interpret.
A two-way ANOVA has two factors (independent variables) and one dependent
variable. For example, time spent studying and prior knowledge are factors that affect
how well you do on a test.
A three-way ANOVA has three factors (independent variables) and one dependent
variable. For example, time spent studying, prior knowledge, and hours of sleep are
factors that affect how well you do on a test.
4.2 Variability
In a one-way ANOVA, variability is due to the differences between groups and the
differences within groups. In factorial ANOVA, each level of every factor is paired up with
each level of the other factors (“crossed”). This helps you to see what interactions are going
on between the factors. If there is an interaction, then the effect of one factor depends on
the level of another.
Let’s say you were running a two-way ANOVA to test male/female performance on a final
exam. The subjects had had either 4, 6, or 8 hours of sleep.
A two-way factorial ANOVA would help you answer the following questions:
1. Is sex a main effect? In other words, do men and women differ significantly on their
exam performance?
2. Is sleep a main effect? In other words, do people who have had 4, 6, or 8 hours of sleep
differ significantly in their performance?
3. Is there a significant interaction between factors? In other words, how do hours of
sleep and sex interact with regards to exam performance?
4. Can any differences in sex and exam performance be found in the different levels of
sleep?
A Student’s t-test will tell you if there is a significant difference between the means of two
groups. ANOVA also compares means, but it does so for more than two groups at once by
comparing the variation between groups with the variation within groups.
You could technically perform a series of t-tests on your data. However, as the groups grow in
number, you may end up with a lot of pair comparisons that you need to run, and the chance of
at least one false positive grows with the number of tests. ANOVA will
give you a single number (the F-statistic) and one p-value to help you support or reject the null
hypothesis.
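To see how quickly the pair comparisons pile up, the sketch below counts the t-tests needed for k groups, which is k(k-1)/2, and gives a rough figure for the chance of at least one false positive. The 5% level per test and the treatment of the tests as independent are simplifying assumptions made for the illustration (pairwise tests on the same data are not truly independent), and the function names are made up.

```python
from math import comb

ALPHA = 0.05  # assumed significance level for each individual t-test


def pairwise_tests(k):
    """Number of pairwise t-tests needed to compare k groups: k(k-1)/2."""
    return comb(k, 2)


def familywise_error(k, alpha=ALPHA):
    """Approximate chance of at least one false positive across all pairwise
    tests, treating the tests as independent (a simplification)."""
    return 1 - (1 - alpha) ** pairwise_tests(k)


for k in (3, 4, 5, 10):
    print(f"{k} groups: {pairwise_tests(k)} t-tests, "
          f"chance of at least one false positive = {familywise_error(k):.3f}")
```

With only three groups, three t-tests are needed and the approximate chance of at least one false positive is already about 14% rather than 5%.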
TOPIC FIVE