This is a broad outline of the key steps in quantitative data analysis likely to be performed in M&E activities. It
aims to describe the line of reasoning to follow rather than specific analysis tools. For each step in
quantitative data analysis, the following is presented:
Step
Description
What the step is and why you are doing it.
Reporting
How you will say what you found. Clarity is very important: results should be written clearly, in plain
language, not in statistical jargon! In general, reporting should record and illustrate:
How did we implement the step?
What considerations guided us (e.g. assumptions)?
What results did we obtain?
The first part of a study report should state and clarify the hypotheses.
It is important to show how they relate to data gathering and analysis (What data were
collected? What analysis tools will be employed?).
Actions:
Code and input the data into the analysis software
Check the data for errors and accuracy (Are all the responses reasonable? Are all relevant
questions answered? Are the responses complete?)
Transform the data (e.g. collapse data into categories, handle missing values)
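As a minimal sketch of the checking and transformation actions, assuming the survey responses are held as a list of Python dictionaries, the code below flags missing or implausible values and collapses a continuous variable into categories while keeping missing values marked as missing. The field names (`age`, `height_cm`), the plausible ranges and the age bands are hypothetical and would come from your own questionnaire.

```python
# Hypothetical plausible ranges for each field; adapt to your questionnaire.
VALID_RANGES = {"age": (0, 18), "height_cm": (40, 200)}


def check_record(record):
    """Return a list of problems found in one survey record."""
    problems = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


def age_band(age):
    """Collapse age in years into hypothetical categories; keep missing as None."""
    if age is None:
        return None
    if age < 5:
        return "under 5"
    if age < 12:
        return "5-11"
    return "12+"


records = [
    {"id": 1, "age": 4, "height_cm": 102},
    {"id": 2, "age": 9, "height_cm": None},
    {"id": 3, "age": 45, "height_cm": 150},
]

for r in records:
    r["age_band"] = age_band(r["age"])
    issues = check_record(r)
    if issues:
        print(r["id"], issues)
```

Flagged records are listed for follow-up rather than silently dropped, so the decision on how to handle each one can be documented in the report.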
Tools:
Levels of measurement (i.e. nominal, ordinal, interval, ratio) determine what analysis tools can be
used for a variable.
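One way to make this rule concrete is a lookup from level of measurement to the summary statistics conventionally considered meaningful at that level: frequencies and the mode for nominal data, adding the median and percentiles for ordinal data, the mean and standard deviation for interval data, and ratios between values for ratio data. The sketch below assumes that convention; the names `Level` and `allowed_statistics` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that each level supports everything the lower ones do.
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Statistics that first become meaningful at each level.
NEW_AT_LEVEL = {
    Level.NOMINAL: ["frequency", "mode"],
    Level.ORDINAL: ["median", "percentiles"],
    Level.INTERVAL: ["mean", "standard deviation"],
    Level.RATIO: ["ratios between values"],
}


def allowed_statistics(level):
    """Return all summary statistics meaningful for a variable at this level."""
    stats = []
    for lvl in Level:  # iterates in definition order, lowest level first
        if lvl <= level:
            stats.extend(NEW_AT_LEVEL[lvl])
    return stats


print(allowed_statistics(Level.ORDINAL))
# ['frequency', 'mode', 'median', 'percentiles']
```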
Briefly describe the dataset on which you are operating, focusing only on unique aspects of the
study (e.g. what categories did you employ?).
This stage is likely to produce extensive information. A report at this stage could read:
“There were N children in the study. The average [variable 1] was x, ranging from x1 to x2. The
average [variable 2] was z, with a standard deviation of s…”
Use tables and graphs to summarise and clarify the most important information.
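The summary figures quoted in such a report (count, mean, range, standard deviation) can be computed with Python's standard `statistics` module. The heights below are made-up illustrative values, not study data, and the function name `describe` is hypothetical.

```python
import statistics

# Illustrative values only (heights in cm); replace with your own variable.
heights = [102.0, 110.5, 98.0, 121.0, 115.5, 104.0]


def describe(values):
    """Return the descriptive statistics typically quoted in a report."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        # Sample standard deviation (divides by n - 1).
        "sd": statistics.stdev(values),
    }


summary = describe(heights)
print(
    f"There were {summary['n']} children in the study. The average height was "
    f"{summary['mean']:.1f} cm, ranging from {summary['min']} to {summary['max']} cm, "
    f"with a standard deviation of {summary['sd']:.1f} cm."
)
```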
When we move from simply describing the data to making inferences from them, we enter the
realm of probability. We have to accept that we are working on a sample and can
never be certain that it perfectly represents reality. Could the results have arisen by chance?
How typical are they? Can we really generalise our conclusions?
Our ability to pick up a real difference between groups (if such a difference exists) is determined
by the number of observations within each group (the sample size). The larger the sample size,
the more likely we are to pick up differences between groups if they actually exist.
The amount of variation within each group (e.g. the range of heights among boys and among girls) also
matters. The less variation within each group, the more likely we are to pick up differences between
groups if they actually exist.
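Both factors show up in the t statistic for comparing two group means. Assuming, for simplicity, two groups of equal size n that share the same standard deviation s, the standard error of the difference is sqrt(2s²/n), so t = difference / sqrt(2s²/n): t grows as n grows and shrinks as s grows. The numbers below are illustrative, and `t_for_difference` is a hypothetical helper.

```python
import math


def t_for_difference(diff, sd, n):
    """t statistic for a mean difference between two equal-sized groups
    of size n that share the same standard deviation sd."""
    standard_error = math.sqrt(2 * sd**2 / n)
    return diff / standard_error


# Same 2 cm difference in average height, varying sample size and spread.
for n, sd in [(25, 4.0), (100, 4.0), (25, 2.0)]:
    print(f"n={n:3d}, sd={sd}: t = {t_for_difference(2.0, sd, n):.2f}")
```

Quadrupling the sample size or halving the standard deviation both double t (from about 1.77 to about 3.54), making the same underlying difference easier to distinguish from chance.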
For example, if we test at the 5% significance level (95% confidence), then even when no real
relationship exists there is a 1 in 20 chance that the test will nevertheless indicate one. Another way
of looking at it: if you test 20 independent relationships that do not actually exist, on average one of
them will appear significant by chance alone.
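The arithmetic behind the 1-in-20 rule can be checked directly. Assuming 20 independent tests of relationships that do not actually exist, each at the 5% significance level, the expected number of false positives is 20 × 0.05 = 1, and the probability of getting at least one is 1 - 0.95^20, roughly 0.64.

```python
ALPHA = 0.05   # significance level (95% confidence)
N_TESTS = 20   # independent tests, all of relationships that do not exist

expected_false_positives = N_TESTS * ALPHA
prob_at_least_one = 1 - (1 - ALPHA) ** N_TESTS

print(f"Expected false positives: {expected_false_positives:.2f}")  # 1.00
print(f"P(at least one false positive): {prob_at_least_one:.2f}")   # 0.64
```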
Significance tests, such as the t-test, are used to find out whether a difference between groups is statistically significant.
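A minimal sketch of the t-test computation, using Welch's version (which does not assume equal variances in the two groups) and only the standard library. It returns the t statistic and its approximate degrees of freedom; turning these into a p-value requires the t distribution, for which a library such as SciPy (`scipy.stats.ttest_ind(a, b, equal_var=False)`) is normally used. The group values and the name `welch_t` are illustrative.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = statistics.variance(a) / len(a)  # squared standard error of mean a
    var_b = statistics.variance(b) / len(b)  # squared standard error of mean b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a**2 / (len(a) - 1) + var_b**2 / (len(b) - 1)
    )
    return t, df


group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -1.00, df = 8.0
```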
Formulate hypotheses about which relationships are likely to exist between your MAIN variable and
the others using a “null hypothesis”: assume that there is NO relationship between variables x
and z, then run tests to see whether the data allow you to reject this assumption.
Measure the strength of the relationship.
Assess how likely it is that this relationship appeared by chance.
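The null-hypothesis logic can be illustrated with a permutation test, swapped in here because it makes the reasoning explicit, not because the source prescribes it. If there is truly no relationship between group membership and the outcome, shuffling the group labels should often produce differences as large as the observed one; the p-value is the share of shuffles that do. The data, the number of shuffles, the seed and the name `permutation_p_value` are illustrative assumptions.

```python
import random
import statistics


def permutation_p_value(a, b, n_shuffles=5000, seed=1):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: group labels are unrelated to the values, so any
    relabelling of the pooled data is equally likely.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (extreme + 1) / (n_shuffles + 1)


separated = permutation_p_value([10, 11, 12, 13, 14, 15, 16, 17], [1, 2, 3, 4, 5, 6, 7, 8])
identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(separated, identical)
```

For the clearly separated groups, almost no shuffle reproduces the observed gap, so the null hypothesis is rejected; for identical groups every shuffle does, and the p-value is 1.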
Whether variables are nominal, ordinal or interval determines which analytical techniques are
appropriate for studying relationships.
For nominal variables, cross-tabulations of the data are used (e.g. with a chi-square test).
For interval variables, correlation is used (e.g. Pearson's correlation coefficient).
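Both techniques can be sketched with the standard library: a chi-square statistic (without continuity correction) for a cross-tabulation of two nominal variables, and Pearson's correlation coefficient for two interval variables. The counts and values are illustrative, and the function names are hypothetical. The chi-square statistic is compared with a critical value for (rows - 1) × (columns - 1) degrees of freedom; 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were unrelated (null hypothesis).
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sx * sy)


# Rows: group A / group B; columns: outcome yes / no (illustrative counts).
stat, dof = chi_square([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f} with {dof} df; significant at 5%: {stat > 3.841}")
print(f"r = {pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):.2f}")
```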
You will have to document the relationships relevant to your study, i.e.:
Those that involve the main variable being analysed
Those that are unlikely to have appeared by chance and are strong enough to be meaningful.
When reporting on relationships, don’t limit yourself to statistical jargon such as “we did a chi-
square test and it revealed a p of x.” You are not simply looking for the result of a test! Instead,
state the hypothesis and describe the direction of the relationship in the data, its strength, and the
likelihood that it did not appear by chance. For example, when exploring the relationship between
BMI and number of siblings you could say: “Children with more than one sibling had a lower BMI
than children with no siblings, whose average BMI was x. The difference was statistically significant
at the 95% confidence level.”
Finding a statistical relationship between variables does not necessarily imply that one caused the
other. The relationship may be real without being causal (for example, both variables may be driven
by a third factor), or it may simply have appeared by chance. Causality cannot be revealed by
statistical tests alone; it is established through a logical reasoning process that builds on the
statistical findings (i.e. the existence of a relationship).
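A small simulation illustrates one way a real relationship can arise without causation: a common factor drives both variables. Here a hypothetical common cause `z` influences both `x` and `y`, while the code gives `x` no effect on `y` at all; the two are still strongly correlated. The model, noise level, sample size and seed are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(2000)]   # hypothetical common cause
x = [zi + rng.gauss(0, 0.5) for zi in z]     # depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]     # depends on z only, not on x

r = pearson_r(x, y)
print(f"correlation between x and y: {r:.2f}")
```

Under this model the theoretical correlation is 1 / 1.25 = 0.8, even though changing `x` would do nothing to `y`; only reasoning about how the data were generated, not the test itself, can reveal that.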
Explain:
why you did or did not choose particular variables
the rationale for the model
the findings from statistical tests
Organise and present the data (see Overview – managing data analysis)