
STEPS IN QUANTITATIVE DATA ANALYSIS

This is a broad outline of key steps in quantitative data analysis likely to
be performed in M&E activities. It aims to describe the line of reasoning
that is pursued rather than specific analysis tools. For each step in
quantitative data analysis, the following is presented:
Step
Description: What the step is and why you are doing it.
Actions and tools: What you will do and what you will use to do it.
Reporting: How you will say what you found. Clarity is very important:
results should be written clearly, in plain language, not in statistical
jargon! In general, reporting should record and illustrate:
• How did we implement the step?
• What considerations guided us (e.g. assumptions)?
• What results did we obtain?
Formulate a hypothesis and select variables
Description: A hypothesis is a statement about an expected relationship
between two or more variables that permits empirical testing. Formulating
the hypothesis and then choosing the variables is the key conceptual stage
of the research, since it defines the direction of the study. If you play
enough with a set of data you will find some sort of relationship, but the
relationships that are meaningful to you should be those defined in the
hypothesis.
Actions and tools: You must clarify:
• Why are we doing a study?
• What are the key variables?
• What relationship can we expect among them?
Reporting: The first part of a study report should declare and clarify the
hypothesis. It is important to show how the hypothesis and the variables
relate to the data gathering and analysis. (What data were collected? What
analysis tools will be employed?)
Determine sample (randomly selected!)
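As a minimal sketch of what "randomly selected" can mean in practice, the example below draws a simple random sample from a list of units. The sampling frame, the sample size and the seed are all illustrative assumptions, not values from this outline; a real study may need a different design (e.g. stratified or cluster sampling).

```python
import random

# Hypothetical sampling frame: ID numbers for 200 households (illustrative only).
sampling_frame = list(range(1, 201))

# A fixed seed (an arbitrary choice here) makes the draw reproducible,
# so the selection can be documented in the report.
rng = random.Random(2024)

# Simple random sample without replacement: every household in the frame
# has the same chance of being selected.
sample_ids = sorted(rng.sample(sampling_frame, k=30))

print(sample_ids)
```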
Collect the data
Prepare the data
Description: Data must be cleaned and organised for analysis. (Note that
the coding and nature of the data should be thought through before the data
gathering process starts, and should be pre-tested.)
Actions:
• Code/input the data in the analysis software
• Check the data for errors and accuracy (Are all the responses
reasonable? Are all relevant questions answered? Are the responses
complete?)
• Transform the data (e.g. collapse data into categories, handle missing
values)
Tools:
Levels of measurement (i.e. nominal, ordinal, interval, ratio) determine
what analysis tools can be used for a variable.
Reporting: Briefly describe the dataset on which you are operating, focusing
only on unique aspects of the study (e.g. what categories did you employ?).
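The checking and transforming actions above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the records, the field names, the plausible height window (50 to 200 cm) and the age cut-off are hypothetical, and the right rules depend on the study.

```python
# Hypothetical raw records as they might be keyed in from questionnaires.
raw_records = [
    {"id": 1, "age": "7", "height_cm": "121.5"},
    {"id": 2, "age": "", "height_cm": "118.0"},    # age not answered
    {"id": 3, "age": "9", "height_cm": "1310"},    # implausible height
    {"id": 4, "age": "12", "height_cm": "149.2"},
]

def to_number(text):
    """Convert a keyed-in value to float; return None when it is missing."""
    text = text.strip()
    return float(text) if text else None

def age_group(age):
    """Collapse age in years into categories (the cut-off is an assumption)."""
    if age is None:
        return "missing"
    return "under 10" if age < 10 else "10 and over"

cleaned, problems = [], []
for rec in raw_records:
    age = to_number(rec["age"])
    height = to_number(rec["height_cm"])
    # Range check: flag heights outside an assumed plausible window.
    if height is not None and not (50 <= height <= 200):
        problems.append((rec["id"], "height out of range"))
        height = None  # treat as missing rather than keep a wrong value
    if age is None:
        problems.append((rec["id"], "age missing"))
    cleaned.append({
        "id": rec["id"],
        "age": age,
        "height_cm": height,
        "age_group": age_group(age),
    })

print(problems)
```

Keeping a list of problems alongside the cleaned data makes it easier to report how many responses were incomplete or corrected.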
Describing the sample
Description: To put the data in context, describe it in terms of averages
(e.g. average height) and variation (e.g. the range of heights).
Actions and tools: Employ descriptive statistics:
• Measures of central tendency, which indicate a typical or central
figure for a group of members (e.g. mean, mode, median)
• Measures of variation, which indicate the dispersion of the data, i.e.
how scattered the data are (e.g. standard deviation for continuous data,
range for ordered categorical data)
Reporting: This stage is likely to produce extensive information. A report
at this stage could read:
“There were x children in the study. The average [variable 1] was x,
ranging from x1 to x2. The average [variable 2] was z, with a standard
deviation of z…”
Use tables and graphs to summarise and clarify the most important
information.
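The descriptive measures named above can be computed with Python's standard `statistics` module. This is only a sketch: the heights are made-up illustrative values, and the printed sentence simply follows the report wording suggested above.

```python
import statistics

# Illustrative heights in cm for six children (not real study data).
heights_cm = [118.0, 121.5, 125.0, 125.0, 131.0, 149.5]

# Measures of central tendency.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
mode_height = statistics.mode(heights_cm)

# Measures of variation: sample standard deviation and range.
sd_height = statistics.stdev(heights_cm)
height_range = (min(heights_cm), max(heights_cm))

print(
    f"There were {len(heights_cm)} children in the study. "
    f"The average height was {mean_height:.1f} cm, "
    f"ranging from {height_range[0]} to {height_range[1]} cm, "
    f"with a standard deviation of {sd_height:.1f} cm."
)
```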
Comparing groups within the sample:

When proceeding from simply describing the data to making inferences from
them, we enter the realm of probability. We will have to accept that we are
working on a sample and that we will never be certain that it perfectly
represents reality. Could the results have arisen by chance? To what extent
are they really typical? Can we really generalise our conclusions?
• Our ability to pick up a real difference between groups (if such a
difference exists) is determined by the number of observations within
each group (the sample size). The larger the sample size, the more
likely we are to pick up differences between groups if they actually
exist.
• The amount of variation (e.g. the range of heights of boys and girls) is
a factor. The less variation within each group, the more likely we are to
pick up differences between groups if they actually exist.
• As an example, if we work at the 95% confidence level, we accept a 1 in
20 chance of finding a difference when none really exists. Another way of
looking at it: if you run 20 tests on data in which there are no real
differences, about one of them can be expected to look significant purely
by chance.
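The 1-in-20 reasoning can be checked with a little arithmetic. The sketch below assumes 20 tests that are independent of each other and that no real difference exists in any of them; both are simplifying assumptions made for illustration.

```python
# Significance level that corresponds to a 95% confidence level.
alpha = 0.05

# Illustrative number of independent tests, each run on data
# where no real difference exists.
n_tests = 20

# On average, this many tests will look significant purely by chance.
expected_false_positives = n_tests * alpha

# Probability that at least one of the tests looks significant by chance.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected chance findings: {expected_false_positives:.1f}")
print(f"Probability of at least one chance finding: {p_at_least_one:.2f}")
```

This is one reason to test the relationships defined in the hypothesis rather than every relationship the data allow.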
Explore the differences between data
Description: This means assessing whether differences in the same variable
between two groups are statistically significant. For example, you found
that the average of a given variable is different in two groups (e.g. study
group and control group). Can you say that there is really a difference
between the two groups, or could this difference have arisen by chance?
Actions and tools:
• Measures of significance, e.g. the t-test, are used to find out if a
difference is significant.
Reporting: N/A
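As a hedged illustration of what a t-test computes, the sketch below calculates a t statistic for two groups using Welch's version of the test, which does not assume equal variances. Welch's variant, the function name `welch_t` and the scores are choices made for this example; this outline does not prescribe a particular variant.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean.
    se_sq_a, se_sq_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df

# Illustrative scores for a study group and a control group (made up).
study_group = [12, 14, 15, 15, 16, 18]
control_group = [10, 11, 12, 12, 13, 14]

t_stat, deg_freedom = welch_t(study_group, control_group)
print(f"t = {t_stat:.2f}, degrees of freedom = {deg_freedom:.1f}")
```

The t statistic is then compared with a critical value from a t table (roughly 2.26 for 9 degrees of freedom at the two-sided 5% level), or converted to a p-value with statistical software. A larger sample or less variation within each group gives a larger t for the same difference in means.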
Explore the relationships within data (among pairs of variables)
Description: You must understand what relationships exist among different
variables in your dataset and establish whether they are statistically
significant, particularly between measures of programme operations and
measures of expected effects.
Actions and tools:
• Formulate hypotheses on what relationships are likely to exist between
your MAIN variable and the others by using the “null hypothesis”: assume
that there is NO relationship between variable x and z, then run tests
to see whether the data allow you to reject this assumption.
• Measure the strength of the relationship.
• Understand the likelihood that this relationship appeared by chance.

[Figure: “Example of strong relationship.” “Example of weak relationship.”
“But what is the likelihood that it appeared by chance?”]
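One common way to measure the strength of a relationship between two interval variables is the Pearson correlation coefficient r, which runs from -1 to +1. The sketch below is illustrative only: the function name `pearson_r` and both datasets are made up to show one strong and one weak relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5, 6]
y_strong = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # rises steadily with x
y_weak = [5, 2, 6, 3, 4, 5]                  # little pattern with x

r_strong = pearson_r(x, y_strong)
r_weak = pearson_r(x, y_weak)
print(f"strong: r = {r_strong:.2f}, weak: r = {r_weak:.2f}")
```

The coefficient describes strength only. Whether a relationship of that strength could have appeared by chance is a separate question that depends on the sample size and needs a significance test.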

Whether variables are nominal, ordinal or interval determines which
analytical techniques are appropriate for studying relationships:
• For nominal variables, cross tabulations of the data are used (e.g.
chi-square test).
• For interval variables, correlation tests are used.
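For nominal variables, the chi-square statistic compares the counts observed in a cross tabulation with the counts expected if the two variables were unrelated. The sketch below is a minimal illustration: the function name `chi_square` and the 2 x 2 table of counts are hypothetical.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical cross tabulation: rows are two groups,
# columns are "yes" / "no" answers to one question.
crosstab = [
    [30, 20],
    [15, 35],
]

chi2, dof = chi_square(crosstab)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}")
```

The statistic is compared with a critical value from a chi-square table (3.84 for 1 degree of freedom at the 5% level); a larger value suggests the pattern is unlikely to have appeared by chance.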
Reporting: You will have to document the relationships relevant for your
study, i.e.:
• Those that involve the main variable being analysed
• Those that are likely not to have appeared by chance and are strong
enough to be significant.
When writing reports on correlation, don’t limit yourself to the statistical
jargon, e.g. “we did a chi-square test and it revealed a p of x.” You are
not simply looking for the results of a test! Instead, clarify the
hypothesis, the effect found in the data, the strength of the relationship,
and the likelihood that the relationship did not appear by chance. For
example, while exploring the relationship between BMI and number of
siblings you could say: “Children with more than one sibling had a lower
BMI. Children with no siblings had a BMI of x. The difference was
statistically significant at the 95% confidence level.”

Finding a statistical relationship among variables does not always imply
that one caused the other. The relationship could be real, but it could
also have appeared by accident. Causality is not something that can be
revealed by statistical tests alone; it is a logical process that builds on
the statistical findings (i.e. the existence of a relationship).

Explore models built on relevant variables
Description: Devise and test explanatory models.
Actions:
Considering all the relationships you discovered, choose the set of key
variables that you think are most likely to influence the main variable.
Test them to understand if and to what degree they can explain the
variance of the main variable of your study. The models could be
applied to different subgroups (e.g. male and female population).
In building up your case you could, for example:
• Choose variables because you discovered that they are strongly
correlated with the main variable
• Choose variables that are not so strongly correlated, but that you still
deem important for your model
• Discard variables that are strongly correlated with your main variable
but are on the same causal pathway, and therefore would not add relevant
information
• Discard variables that are apparently related but could lead you to
wrong conclusions.
These models are based on your judgement and should be justified.
Note that multivariate techniques can be very powerful analytical tools,
but they must be used with great care. They are all based on numerous
assumptions, some of which will not be met. As a result, apparent
findings often are not valid. A plan for data analysis should not include
any multivariate techniques unless the evaluation team (manager and
consultants) is already well acquainted with them or can call on the
assistance of someone who knows how to use them.
Tools:
Regression analysis/Multivariate analysis
Reporting: Explain:
• why you did/did not choose variables
• the rationale for the model
• the findings from statistical tests
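To illustrate what "explaining the variance of the main variable" means, the sketch below fits a straight line with a single predictor and reports R squared, the share of variance in the main variable that the line accounts for. It is deliberately limited to one predictor and uses made-up data and hypothetical function names; models with several variables need a statistics package and the care described above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total

# Illustrative data: one predictor and the main variable (made up).
predictor = [0, 1, 2, 3, 4]
main_variable = [1, 3, 4, 8, 9]

slope, intercept = fit_line(predictor, main_variable)
r2 = r_squared(predictor, main_variable, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R squared = {r2:.2f}")
```

A high R squared shows that the model fits the sample well. It does not show that the predictor causes the main variable.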
Organise and present the data (see Overview – managing data analysis)
Validate/discuss with key stakeholders (see Overview – managing data
analysis)
