DATA ANALYSIS
A) Introduction to statistics
Descriptive statistics
Probability
Inference
B) Sampling design
C) Data collection techniques
D) Probability distributions
E) Statistical inference
INTRODUCTION TO STATISTICS
Statistics is the study of how to organize, analyze and interpret numerical information from data so as to draw valid and meaningful conclusions.
Data is the raw facts, numbers and ideas pertaining to an activity of interest.
Information is processed data.
Statistics is divided into two branches:
i) Descriptive statistics
This is used when the purpose of an investigation is to describe the data that have been (or will be) collected.
Suppose a researcher is interested in determining the proportion of voters in his or her village (LC 1) who prefer a certain candidate. The focus of the researcher is the village, so he or she collects data on all the voters in the village, notes whether each voter supports the given candidate and then calculates the proportion. Because the researcher is using statistical methods merely to describe the data collected, this is an example of descriptive statistics.
They are:
i) Size of the sample
ii) The method of sample selection
When the sample is the whole population, we are in the area of descriptive statistics and the conclusion will be 100% certain. When conclusions about a population are drawn from a sample, they are no longer certain; thus one of the major goals of inferential statistics is to assess the degree of certainty of inferences drawn from sample data.
VARIABLES AND CONSTANTS
Variables are characteristics of persons or objects that vary from person to person or from object to object. In the example of the researcher above, preference for a certain candidate varies from voter to voter; hence preference here is a variable.
Characteristics that remain constant from person to person or object to object are called constants.
Whether a characteristic is designated as a variable or a constant depends on the study in question.
In the above study of voter preference, the number of voters in the village (LC 1) is a constant (i.e. it does not change for that village in this particular study).
MEASUREMENT OF VARIABLES
Measurement involves the observation of characteristics on persons or objects and the assignment
of numbers to such persons or objects so that the numbers assigned represent the amounts of the
characteristics possessed. The rules for making number assignments determine the type of
arithmetic operations and comparisons that can be meaningfully made.
Example 1. What is the probability of obtaining 2 heads when a coin is tossed twice?
Solution: there are 4 equally likely outcomes: {HH, HT, TH, TT}.
Only one of these results in 2 heads, i.e. HH.
Let E be the event "obtaining two heads".
Therefore P(E) = ¼.
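The counting argument in Example 1 can be checked by listing the sample space explicitly. This is a minimal Python sketch using `itertools.product` to build the outcomes and `fractions.Fraction` to keep the probability exact; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT
sample_space = list(product("HT", repeat=2))

# Event E: "obtaining two heads"
event_e = [outcome for outcome in sample_space if outcome == ("H", "H")]

# Equally likely outcomes: P(E) = favourable outcomes / total outcomes
p_e = Fraction(len(event_e), len(sample_space))
print(p_e)  # 1/4
```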
Example 2. A data set contains 227 males and 273 females. The experiment consists of selecting one individual from this data set at random.
What is the probability that the individual is
i) a female
ii) a male
iii) a toddler
iv) either female or male?
i) P(F) = 273/500
ii) P(M) = 227/500
iii) P(T) = 0/500 = 0
iv) P(M or F) = 500/500 = 1
PROBABILITY RULES
COMPLEMENTARY RULE
In Example 2 above, a female individual cannot also be a male and vice versa. We call these 2 events mutually exclusive or disjoint events.
The event "individual selected is male" is the same as the event "individual selected is not female". The events "male" and "female" are therefore said to be complementary events: the event "individual selected is male" is the complement of the event "individual selected is female".
If 2 events E1 and E2 are complements of each other, then P(E1) = 1 − P(E2).
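The answers to Example 2 and the complementary rule can be verified with exact fractions. A minimal sketch using only the counts given in Example 2; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from Example 2
males, females = 227, 273
total = males + females  # 500 individuals

p_female = Fraction(females, total)
p_male = Fraction(males, total)
p_toddler = Fraction(0, total)  # no toddlers in the data set
p_male_or_female = Fraction(males + females, total)  # male and female are disjoint

# Complementary rule: "male" is the complement of "female", so P(M) = 1 - P(F)
print(p_male == 1 - p_female)  # True
print(p_female, p_male, p_toddler, p_male_or_female)  # 273/500 227/500 0 1
```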
ADDITIVE RULE
For any 2 events E1 and E2, the probability of the combined event "E1 or E2" is written P(E1 or E2).
If E1 and E2 are mutually exclusive, P(E1 or E2) = P(E1) + P(E2). This is called the first additive rule of probability.
The second additive rule, which holds for any 2 events, states that P(E1 or E2) = P(E1) + P(E2) − P(E1 and E2), where P(E1 and E2) represents the probability of the outcomes that E1 and E2 have in common.
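To see why the second additive rule subtracts P(E1 and E2), compare it with a direct count. The sketch below uses a hypothetical example that is not in the notes: one roll of a fair die, with E1 = "even number" and E2 = "greater than 3". The outcomes 4 and 6 belong to both events and would otherwise be counted twice.

```python
from fractions import Fraction

# Hypothetical illustration: one roll of a fair six-sided die
outcomes = {1, 2, 3, 4, 5, 6}
e1 = {2, 4, 6}  # E1: "even number"
e2 = {4, 5, 6}  # E2: "greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Second additive rule: subtract the overlap {4, 6} so it is not counted twice
p_union_rule = prob(e1) + prob(e2) - prob(e1 & e2)
# Direct count of the outcomes in E1 or E2: {2, 4, 5, 6}
p_union_direct = prob(e1 | e2)
print(p_union_rule, p_union_direct)  # 2/3 2/3
```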
Example 3. A cross tabulation of the number of students in a data set who sat the post-secondary exam, by region, is given below. Determine the probability of selecting a student who is from the south or who took the post-secondary exam.
[Table: students by region (North East, Central, South, West, Total) and post-secondary exam status]
MULTIPLICATIVE RULE
This rule applies when the two events are independent of one another (neither has any effect on the other):
P(E1 and E2) = P(E1) × P(E2)
e.g. when tossing a coin twice, P(HH) = P(H) × P(H) = ½ × ½ = ¼.
The rule can be extended to more than 2 events.
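The extension to more than 2 independent events can be checked for three tosses of a fair coin: the rule gives P(HHH) = (½)³, and listing all 8 equally likely outcomes gives the same answer. A minimal sketch; the three-toss case is an illustration added here.

```python
from fractions import Fraction
from itertools import product

p_head = Fraction(1, 2)

# Multiplicative rule for independent tosses: P(HHH) = P(H) * P(H) * P(H)
p_hhh_rule = p_head ** 3

# Check by listing all 8 equally likely outcomes of three tosses
outcomes = list(product("HT", repeat=3))
p_hhh_count = Fraction(outcomes.count(("H", "H", "H")), len(outcomes))

print(p_hhh_rule, p_hhh_count)  # 1/8 1/8
```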
CONDITIONAL PROBABILITY
When two events are not independent, the probability of one event depends on whether or not the other has occurred. We write this as P(E1/E2), i.e. the probability of E1 given that E2 has occurred.
Example 4: A cross tabulation of gender and marijuana usage is given below.
Sex | Marijuana usage: Never | Marijuana usage: Yes | Total
Male | 185 | 42 | 227
Female | 223 | 50 | 273
Total | 408 | 92 | 500
a) The probability that a randomly selected student has never smoked marijuana, given that the student is male, is 185/227.
b) The probability that a randomly selected student has never smoked marijuana, given that the student is female, is 223/273.
c) The probability that a randomly selected student is male, given that the student has never smoked marijuana, is 185/408.
In general, P(E1/E2) = P(E1 and E2)/P(E2). As an illustration, consider Example 4(a) above:
E1 = student randomly selected never smoked marijuana, so P(E1) = 408/500
E2 = student is male, so P(E2) = 227/500
P(E1 and E2) = 185/500
By substitution, P(E1/E2) = P(E1 and E2)/P(E2) = (185/500)/(227/500) = 185/227, as before.
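All three conditional probabilities in Example 4 follow from P(E1/E2) = P(E1 and E2)/P(E2) applied to the cross tabulation. A minimal sketch that stores the table as a nested dictionary (an illustrative choice) and uses exact fractions.

```python
from fractions import Fraction

# Cross tabulation from Example 4: counts[sex][usage]
counts = {
    "male": {"never": 185, "yes": 42},
    "female": {"never": 223, "yes": 50},
}
total = sum(sum(row.values()) for row in counts.values())  # 500

def p_never_given_sex(sex):
    """P(never / sex) = P(never and sex) / P(sex)."""
    p_joint = Fraction(counts[sex]["never"], total)
    p_sex = Fraction(sum(counts[sex].values()), total)
    return p_joint / p_sex

def p_male_given_never():
    """P(male / never) = P(male and never) / P(never)."""
    p_never = Fraction(sum(row["never"] for row in counts.values()), total)
    return Fraction(counts["male"]["never"], total) / p_never

print(p_never_given_sex("male"))    # 185/227
print(p_never_given_sex("female"))  # 223/273
print(p_male_given_never())         # 185/408
```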
DESCRIPTIVE STATISTICS:
COUNTING RESPONSES
Whenever you ask a number of people to answer the same questions, or measure the same characteristics for several people or objects, you want to know how frequently the possible responses or values occur.
This can be as simple as counting the number of yes or no responses to a question, or it can be considerably more complicated, for example if you have asked people to report their annual income or their age.
For variables such as annual income and age, simply counting the number of times each unique value occurs may not be a useful summary of the data. Instead, you need other means to summarize and display the values of one variable at a time.
FREQUENCY TABLE
From this table, you can tell how frequently people gave each response.
Each row represents one of the responses given and shows how many people gave that response.
The table also has a part labelled "missing", which tells you how many respondents did not select any of the responses.
The last row gives the total number of respondents who participated in the survey/study.
The frequency table also includes columns showing the proportion of respondents who gave each response as a percentage. These percentages help in comparing various survey results.
The last column gives the cumulative percentage: the percentage of people who gave a response or any response that precedes it in the frequency table. It is the sum of the valid percentages for that row and all rows before it.
When reporting results based only on cases with no missing values, you should also report the percentage of cases that refused to give an answer. Missing values can be a big problem, especially where many respondents refuse to answer, because they make interpretation of the results difficult.
[Consider this scenario: in a study of 100 employees, 55 reported that they were satisfied, 4 rated themselves unsatisfied and 41 refused to answer. That means 55% are satisfied. Now remove the non-responses and the percentage becomes 93% (55 out of 59). So which is the right conclusion?]
It may mean the company is full of satisfied employees, many of whom don't like to answer questions.
It may mean half of the employees are unhappy but fear to voice their dissatisfaction.
Always run frequency tables on your variables, because this helps to detect mistakes in the data files: every code captured during data entry will be reported in the frequency tables.
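The columns of a frequency table can be reproduced from raw responses. This minimal sketch uses the satisfaction scenario above (55 satisfied, 4 unsatisfied, 41 refusals, with `None` standing for a refusal, an assumption of this sketch) and shows both the 55% and the roughly 93% figures side by side as "percent" and "valid percent".

```python
# Responses from the satisfaction scenario; None marks a refusal (missing value)
responses = ["satisfied"] * 55 + ["unsatisfied"] * 4 + [None] * 41

def frequency_table(values, categories):
    """Rows of (category, frequency, percent, valid percent, cumulative percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]
    rows, cumulative = [], 0.0
    for cat in categories:
        freq = valid.count(cat)
        valid_pct = 100 * freq / len(valid)  # percentage of non-missing answers
        cumulative += valid_pct              # running sum of valid percentages
        rows.append((cat, freq, 100 * freq / total, valid_pct, cumulative))
    missing = total - len(valid)
    rows.append(("missing", missing, 100 * missing / total, None, None))
    return rows

for row in frequency_table(responses, ["satisfied", "unsatisfied"]):
    print(row)
```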
PIE CHARTS
This is a visual display of a frequency table. It consists of a slice for each row in the frequency table.
The size of each slice depends on the number of cases in the category.
BAR CHARTS
This is a visual display of a frequency table. It consists of a bar for each row in the frequency table.
The length of each bar depends on the number of cases in the category. An example of a bar chart is
shown below
HISTOGRAM
Some variables produce so many slices or bars that the chart becomes crowded and is not useful in any way (for example the income and age variables mentioned above).
In a pie chart or bar chart, no provision is made for a value of a variable that does not occur. For example, if you have values 2 and 4 but no 3s, the bar or slice for the value 2 will be next to the one for the value 4, so the chart does not show you that there are no values of 3.
A better display in this case is the histogram, which groups adjacent values together. It is similar to a bar chart except that each bar represents a range of values. An example of a histogram with a normal curve superimposed is shown below. The normal curve will be discussed later.
MEASURES OF CENTRAL TENDENCY
MODE: This refers to the observation with the highest frequency in the data set. It is usually used for variables measured on a nominal scale and is a useful statistic to report with a frequency table or a bar chart.
MEDIAN: This is a statistical measure that divides the data into two equal parts. To get the median of ungrouped data, arrange the data in either ascending or descending order and then select the middle value; if the number of observations is even, the median is the average of the two middle values.
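Both measures can be computed directly. This minimal sketch uses the standard-library `statistics.mode` for the mode, and a hand-written `median` that follows the sort-and-take-the-middle procedure described above (including the even-sized case); the data are hypothetical.

```python
import statistics

# Hypothetical nominal data: the mode is the most frequent category
colours = ["red", "blue", "red", "green", "red", "blue"]
print(statistics.mode(colours))  # red

def median(values):
    """Sort the data and take the middle value (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 1, 5]))      # 5
print(median([7, 3, 9, 1, 5, 11]))  # 6.0
```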
MEASURES OF DISPERSION
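RANGE (for ungrouped data): This is the distance between the largest and smallest observations, i.e. largest value − smallest value.
INTER-QUARTILE RANGE: This is the difference between the third and first quartiles, Q3 − Q1, and covers the middle 50% of the observations.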
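A minimal sketch of the range and inter-quartile range for hypothetical ungrouped data. It uses `statistics.quantiles` (Python 3.8+) with the "inclusive" method; this is only one of several quartile conventions, so textbooks and statistical packages may report slightly different quartiles for the same data.

```python
import statistics

data = [7, 2, 12, 4, 9, 5, 4, 8]  # hypothetical ungrouped observations

data_range = max(data) - min(data)  # largest minus smallest observation

# Quartiles Q1, Q2, Q3 using the "inclusive" interpolation convention
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of observations

print(data_range, q1, q3, iqr)
```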
KURTOSIS
Kurtosis: This is the degree of peakedness or flatness of a probability distribution relative to the normal distribution with the same variance.
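One common way to quantify kurtosis is the moment-based excess kurtosis m4/m2² − 3, where m2 and m4 are the second and fourth moments about the mean; it is about 0 for a normal distribution. The sketch below uses this population (moment) version, which is an assumption of the sketch, since software packages often apply small-sample corrections; the two data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (about 0 for a normal distribution).

    Positive values indicate a more peaked, heavier-tailed distribution;
    negative values indicate a flatter one.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 1, 9, 5, 5]  # most values at the centre, a few extremes
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```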
It is difficult, if not impossible, to obtain a single value that truly represents the population parameter, since different samples yield different point estimates of that parameter. Instead of trying to get a single point estimate, we therefore determine an interval within which we expect the true value to lie with some level of probability; such an interval is called a CONFIDENCE INTERVAL.
INTERVAL ESTIMATE: This defines an interval within which the true value is expected to lie with some level of probability.
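For instance, when the population standard deviation σ is known, the (1 − α)100% confidence interval for the mean is given by:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $n$ is the sample size and $z_{\alpha/2}$ is the standard normal value with area α/2 to its right.
A z-score, $z = (x - \mu)/\sigma$, tells us how many standard deviations a value x lies above or below the mean.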
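The z-based interval can be computed with the standard library's `statistics.NormalDist` (Python 3.8+), whose `inv_cdf` gives $z_{\alpha/2}$. A minimal sketch; the sample figures (mean 50, σ = 10, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50, known sigma 10, n = 25
low, high = z_confidence_interval(50, 10, 25)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```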
CONFIDENCE INTERVAL FOR THE MEAN WHEN THE STANDARD DEVIATION IS UNKNOWN.
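When the population standard deviation is unknown, the (1 − α)100% confidence interval for the mean is based on the t-distribution with n − 1 degrees of freedom, using the sample standard deviation s in place of σ:
$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$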
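The Python standard library has no t-distribution, so this minimal sketch takes the critical value $t_{\alpha/2}$ as a parameter read from a t table (2.262 for 95% confidence with 9 degrees of freedom). The sample of n = 10 values is hypothetical; `statistics.stdev` uses the n − 1 divisor, matching the sample standard deviation s.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """(1 - alpha)100% CI for the mean when sigma is unknown.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from a t table.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)     # sample standard deviation (divisor n - 1)
    ebm = t_crit * s / math.sqrt(n)  # error bound for the mean
    return x_bar - ebm, x_bar + ebm

# Hypothetical sample of n = 10; t_{0.025} with 9 degrees of freedom is 2.262
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
print(t_confidence_interval(sample, t_crit=2.262))  # approximately (12.369, 14.631)
```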
Hypothesis testing.
Selection of the appropriate test statistic and specification of the rejection criteria.
Pay attention to whether a test is one-tailed or two-tailed to get the right critical value and rejection region, i.e. use α/2 in each tail for a two-tailed test and α in a single tail for a one-tailed test.
Computation of the test statistic from the observed data, and of the p-value based on the value of the test statistic.
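The steps above can be sketched for a one-sample z test, $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, using `statistics.NormalDist` for critical values and p-values. This is a minimal illustration under hypothetical numbers, chosen so that the same data are not significant in a two-tailed test at α = 0.05 but are significant in an upper one-tailed test, showing why the choice of tails matters.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z test of H0: mu = mu0 when sigma is known (or n >= 30).

    tail = "two" (H1: mu != mu0), "upper" (H1: mu > mu0) or "lower" (H1: mu < mu0).
    Returns (z, p_value, reject_h0).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test statistic
    std = NormalDist()  # standard normal distribution
    if tail == "two":
        critical = std.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        p_value = 2 * (1 - std.cdf(abs(z)))
        reject = abs(z) > critical
    elif tail == "upper":
        critical = std.inv_cdf(1 - alpha)  # all of alpha in the upper tail
        p_value = 1 - std.cdf(z)
        reject = z > critical
    elif tail == "lower":
        critical = std.inv_cdf(1 - alpha)  # all of alpha in the lower tail
        p_value = std.cdf(z)
        reject = z < -critical
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, p_value, reject

# Hypothetical data: x_bar = 51.8, mu0 = 50, sigma = 10, n = 100, so z = 1.8
print(z_test(51.8, 50, 10, 100))                # two-tailed: p about 0.072, H0 not rejected
print(z_test(51.8, 50, 10, 100, tail="upper"))  # one-tailed: p about 0.036, H0 rejected
```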
A STATISTICAL TEST FOR THE MEAN (μ) WHEN σ IS KNOWN OR WHEN THE SAMPLE SIZE n ≥ 30
A STATISTICAL TEST FOR THE MEAN (μ) WHEN σ IS UNKNOWN AND THE SAMPLE SIZE n < 30
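If the population standard deviation is not known, the error bound for a population mean is:
$$EBM = t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is the t-score with area to the right equal to α/2 (with n − 1 degrees of freedom), s is the sample standard deviation and n is the sample size.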
SAMPLING TECHNIQUES
Recall that the bound on the sampling error is 2 times the estimated standard error of the point estimate; thus
B = 2 × (estimated standard error of the point estimate)
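Stratum h | Sample mean x̄_h | Sample standard deviation s_h | Stratum size N_h | Sample size n_h
Accounting | 30,000 | 2,000 | 500 | 45
Finance | 28,500 | 1,700 | 350 | 40
Information technology | 31,500 | 2,300 | 200 | 30
Marketing | 27,000 | 1,600 | 300 | 35
Operating management | 31,000 | 2,250 | 150 | 30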
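We have
$$\bar{x}_{st} = \sum_{h} \frac{N_h}{N}\,\bar{x}_h$$
as the point estimator of the population mean, where N = ΣN_h. For the data above, N = 1,500 and $\bar{x}_{st}$ = 44,025,000/1,500 = 29,350.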
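Calculation of the standard error of the mean:
$$s_{\bar{x}_{st}} = \sqrt{\frac{1}{N^2}\sum_{h} N_h(N_h - n_h)\frac{s_h^2}{n_h}}$$
Stratum terms $N_h(N_h - n_h)s_h^2/n_h$:
Accounting (h = 1): 20,222,222,222
Finance (h = 2): 7,839,125,000
Marketing (h = 4): 5,814,857,143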
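The whole stratified calculation can be reproduced in a few lines. This minimal sketch assumes the table columns are the stratum sample mean, sample standard deviation, stratum size and sample size; under that reading it reproduces the per-stratum terms above and also computes the two strata not listed, the standard error and the bound B.

```python
import math

# Stratum summaries from the table: (mean x_h, sd s_h, stratum size N_h, sample size n_h)
strata = {
    "Accounting": (30_000, 2000, 500, 45),
    "Finance": (28_500, 1700, 350, 40),
    "Information technology": (31_500, 2300, 200, 30),
    "Marketing": (27_000, 1600, 300, 35),
    "Operating management": (31_000, 2250, 150, 30),
}

N = sum(N_h for _, _, N_h, _ in strata.values())  # population size, 1500

# Point estimate: stratum means weighted by stratum size
x_st = sum(N_h * x_h for x_h, _, N_h, _ in strata.values()) / N

# Per-stratum terms N_h (N_h - n_h) s_h^2 / n_h, then the estimated standard error
terms = {name: N_h * (N_h - n_h) * s_h**2 / n_h
         for name, (_, s_h, N_h, n_h) in strata.items()}
se = math.sqrt(sum(terms.values()) / N**2)

bound = 2 * se  # bound on the sampling error
print(x_st, round(se, 1), round(bound, 1))  # 29350.0 138.1 276.2
```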
There are two approaches to allocating the sample:
1. A total sample size n is chosen first; we must then decide how to assign the sample units to the various strata.
2. We first decide how large a sample to take in each stratum and then sum the stratum sample sizes to obtain the total sample size.
Since it is often of interest to develop estimates of the mean, total and proportion for the individual strata, a combination of these two approaches is often employed. The factors considered most important in making the allocation are:
Population total
Systematic Sampling
It is often used as an alternative to simple random sampling, because selecting a simple random sample by first finding a random number and then searching through the frame to locate the corresponding element can be time consuming.
It requires that the defined target population be ordered in some way, e.g. a list, roll or roster.
You need a skip interval, which is determined as k = N/n, where N is the population size and n is the required sample size.
After identifying the skip interval, a starting point is randomly selected.
Suppose a sample of size 50 is required from a population of 5000 elements. We might sample one element for every 5000/50 = 100 elements in the population.
A systematic sample for this case would involve randomly selecting one of the first 100 elements from the frame. The remaining sample elements are then identified by starting with the first sample element and selecting every 100th element that follows in the frame.
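The procedure just described (skip interval k = N/n, random start among the first k elements, then every k-th element) can be sketched as follows. The numbered frame of 5000 elements is hypothetical, and when N/n is not a whole number this sketch simply rounds k down.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select every k-th element after a random start among the first k elements."""
    k = len(frame) // n       # skip interval k = N / n (rounded down)
    start = rng.randrange(k)  # random start: index 0 .. k-1
    return frame[start::k][:n]

frame = list(range(1, 5001))  # hypothetical frame of 5000 numbered elements
sample = systematic_sample(frame, 50)
print(len(sample), sample[:3])  # 50 elements, each 100 apart
```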
Cluster Sampling
This requires that the population be divided into N groups of elements called clusters, such that each element in the population belongs to one and only one cluster. For example, suppose we want to survey registered voters in this country. One approach would be to develop a frame consisting of all registered voters and then select a simple random sample from this frame. Alternatively, in cluster sampling we might define the frame as the list of all districts in the country. In this approach each district is a cluster consisting of a group of registered voters.
Suppose we select a simple random sample of, say, 10 of these districts. At this point we can collect data on all registered voters in each of these 10 districts; this approach is called single-stage cluster sampling. Alternatively, we could select a simple random sample of registered voters from each of the 10 sampled clusters; this approach is called two-stage cluster sampling.
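The difference between single-stage and two-stage cluster sampling can be sketched as a function with an optional second stage. The frame of 30 hypothetical districts with 200 voters each, the function name and its parameters are illustrative assumptions, not part of the notes.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Single-stage (per_cluster=None) or two-stage cluster sampling.

    clusters: dict mapping cluster name -> list of elements in that cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: SRS of clusters
    if per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}  # take every element
    # Stage 2: SRS of elements within each sampled cluster
    return {c: rng.sample(clusters[c], min(per_cluster, len(clusters[c])))
            for c in chosen}

# Hypothetical frame: 30 districts, each with 200 registered voters
districts = {f"district_{d}": [f"voter_{d}_{v}" for v in range(200)] for d in range(30)}
single = cluster_sample(districts, 10, seed=1)
two_stage = cluster_sample(districts, 10, per_cluster=20, seed=1)
print(len(single), sum(len(v) for v in single.values()))        # 10 2000
print(len(two_stage), sum(len(v) for v in two_stage.values()))  # 10 200
```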
Cluster sampling is similar to stratified sampling in that both methods divide the population into groups. However, cluster sampling tends to provide better results when the elements within each cluster are not alike (heterogeneous).
In the ideal case, each cluster is a small-scale version of the entire population, and hence sampling a small number of clusters provides good information about the characteristics of the entire population.
A common application of cluster sampling is area sampling, where the clusters are counties, townships, cities or other well-defined geographical sections.
Snowball Sampling
This involves identifying and qualifying initial prospective respondents who can in turn help identify additional people to include in the study. It is also called referral sampling. It is applicable where:
- The defined target population is small and unique
- Compiling a complete list of sampling units is very difficult
MODULE 3 Methods of data collection
There are various methods of data collection, the most important being observation, personal interviews and questionnaires.
DIRECT OBSERVATION
This involves enumerators taking observations directly from the sampling units of interest, e.g. in agricultural surveys, enumerators observe and accurately measure the area under cultivation.
NOTE THAT:
Observation techniques require prior knowledge of the field of research. For example, one cannot observe cattle without knowing what cattle look like, or identify leaf types without knowing their appearance.
ADVANTAGES
It is free from errors due to memory lapses, as enumerators record everything as they see it.
Non-response errors are never encountered.
DISADVANTAGES
It is expensive and time consuming, as it involves moving from one place to another.
Transport and communication are a problem, especially in less developed countries where road networks and other means of communication are poor.
The technique is not always feasible, especially when observation of human behavior is involved, because people tend to change their behavior while they are being observed.
You cannot get people's attitudes towards something by mere observation.
Ethical issues concerning the privacy of individuals are not considered.
PERSONAL INTERVIEW
Interviewing is a technique that is used to gain an understanding of the underlying reasons
and motivations for people’s attitudes, preferences or behavior.
In this method of data collection, the enumerator is brought into contact with the respondent
(one to one or one to many) and asks him or her (them) questions about the subject under
study.
TYPES OF INTERVIEWS
STRUCTURED:
Based on a carefully worded interview schedule
Frequently require short answers that are ticked off
Useful when there are a lot of questions which are not particularly contentious or thought-provoking
Respondents may become irritated by having to give over-simplified answers
SEMI-STRUCTURED
The interview is focused by asking certain questions, but with scope for the respondent to express himself or herself at length.
UNSTRUCTURED
This is also called an in-depth interview. The interviewer begins by asking a general question and then encourages the respondent to talk freely. The interviewer uses an unstructured format, the subsequent direction of the interview being determined by the respondent's initial reply. The interviewer then probes for elaboration; "Why do you say that?", "That is interesting, tell me more" and "Would you like to add anything else?" are typical probes.
ADVANTAGES
Interaction creates an opportunity for on-the-spot clarification of concepts and of the form of information sought in the survey. It is very useful where the respondent is not sure of the kind of response to give.
Suitable for both literate and illiterate populations
It has a higher response rate than questionnaires
Respondents take a serious approach, resulting in accurate information
Complete and immediate
In-depth questions (probing) are possible
The interviewer is in control and can give help if there is a problem
Recording equipment can be used
Used to pilot other methods
Can investigate the motives and feelings of the respondent
DISADVANTAGES
It involves high expenses on transport and other field-related activities
It is prone to interviewer bias. Interviewers often ask leading and suggestive questions, which may bias the respondent
There is the problem of memory lapse
It is subject to problems like language barriers, non-response and hostility
It is also time consuming
Limited geographical coverage
Respondent bias: a tendency to please or impress, create a false personal image, or end the interview quickly
QUESTIONNAIRE METHOD
This method of data collection can take four forms:
Self-administered questionnaire
Mail questionnaire
Telephone interviews
By computer (emails, directly)
Mail questionnaire
This involves mailing questionnaires to prospective respondents, with a list of instructions and a letter explaining the objective and importance of the survey.
Respondents are expected to fill in the questionnaires and return them by mail.
Advantages
Speed and cost reduction, as it does not involve movement of people (enumerators)
It is very effective where sampling units are scattered
It is possible to get correct information about sensitive issues, since people fill in the questionnaires privately
It reduces errors due to interviewer bias
Correct information can be obtained, since respondents can make consultations before answering
Disadvantages
It presupposes a high level of literacy among the respondents. This is not usually the case in most African countries
It assumes the existence of a good and efficient postal system, which may not be the case
There is a high rate of non-response
Follow-ups are very difficult to conduct (some kind of reminder needs to be sent to remind respondents to send back the questionnaire)
Responses are usually slow, because people fill in the questionnaire at their own pace
The wrong person may fill in the questionnaire, thereby biasing the results
Remember that sources of interviewer bias include:
The interviewer asking leading questions
Writing wrong information for personal reasons or due to stress
The interviewer's appearance or manner may also bias the respondent or disrupt his or her recall; for example, a respondent who is upset by the way an interviewer is dressed might give wrong information, leading to errors
Types of correlation include:
1) Simple correlation.
2) Multiple correlation.
Simple correlation: This refers to the statistical measure of the linear relationship between any two (2) continuous random variables, e.g. the physical stature of parents and their offspring, or the demand for a product and its price.
Forms of Simple correlation
Linear correlation:
N.B: If the relationship between the two variables is a perfect linear relationship, then the points of the scatter plot will fall on a straight line, as shown in the examples below.
[Examples 1 and 2: scatter plots of points lying on a straight line]
For positive correlation: small values of X go with small values of Y, while large values of X go with large values of Y, e.g. the relationship between the age of an individual and his/her weight.
For negative correlation: small values of X go with large values of Y, while large values of X go with small values of Y, e.g. the relationship between the quantity demanded of a product and its price.
Examples of no correlation
[Examples 1 and 2: scatter plots showing no correlation]
When there is zero (no) correlation, there is no linear relationship between the variables: a change in one (independent) variable is not linearly associated with a change in the other (dependent) variable.
The correlation coefficient is a value that lies between −1 and +1, i.e. $-1 \le r_{xy} \le +1$.
The larger the absolute value of the correlation coefficient, the stronger the linear relationship between the variables, and vice versa.
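The correlation coefficient can be calculated using either parametric or non-parametric techniques.
The parametric (Pearson) correlation coefficient can be calculated from the covariance of the variables X and Y and their variances:
$$r_{xy} = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$$
where cov(X, Y) is the covariance of X and Y, and var(X) and var(Y) are their variances.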
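The covariance formula for $r_{xy}$ translates directly into code. A minimal sketch with hypothetical price and quantity-demanded data, for which a strong negative correlation is expected; the n − 1 divisors cancel in the ratio, so using them throughout does not change r.

```python
import math

def pearson_r(x, y):
    """r = cov(X, Y) / sqrt(var(X) * var(Y)), using n - 1 divisors throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical price and quantity demanded: expect a negative correlation
price = [10, 12, 14, 16, 18]
quantity = [100, 92, 85, 75, 70]
print(round(pearson_r(price, quantity), 3))
```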
One of the nonparametric techniques is the Spearman’s Rank correlation
coefficient.
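Spearman's coefficient can be computed as Pearson's r applied to the ranks of the data, with tied values given the average of their ranks; when there are no ties this equals the familiar 1 − 6Σd²/(n(n² − 1)). A minimal sketch with hypothetical data, where y = x³ rises with x but not linearly, so the rank correlation is 1.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: y rises with x but not linearly (y = x cubed)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y))
```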
N.B: The correlation coefficient by itself does not indicate statistical significance, so the probability value (p-value) is used to assess the significance of the relationship between the random variables: if the p-value is less than the level of significance α, there is a significant relationship between the two random variables, and vice versa.
REGRESSION ANALYSIS
Regression: This refers to the statistical measure of the relationship between two or more random variables. It is a statistical technique for estimating the relationship among variables, used to find out whether there is any evidence of relationships among variables of interest for the purpose of predicting future values. Regression is concerned with the study of the dependence of one dependent variable on one or more independent variables.
N.B: The independent variable can also be called the Explanatory/ Exogenous / Regressor
while the dependent variable can also be called the Explained / Endogenous / Regressand
Simple linear regression: This refers to the statistical measure of the relationship between two random variables, whereby one is the dependent variable and the other is the independent variable.
In simple regression there is only one independent variable, which is assumed to affect the dependent variable, e.g. consider the function y = f(x), where x is the independent variable and y is the dependent variable.
A linear regression is a statistical method that helps one understand the relationship
between two (or more) variables. It does this in three ways:
$$Y_i = b_0 + b_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$$
where:
b0 is the y-intercept
b1 is the slope
εi is the error term for observation i
In the simple linear regression model used for modelling data points there is only one independent variable, x, and two parameters: the intercept α (also written β0) and the slope β1. Consider the straight line:
$$y = \alpha + \beta_1 x$$
From the above equation β1 is the slope of the line, which shows the average change (increase or
decrease) in the dependent variable given a unit change (increase or decrease) in the independent
variable.
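So, given a random sample from a population, we estimate the population parameters and obtain the sample linear regression model:
$$\hat{y} = \hat{\alpha} + \hat{\beta}_1 x$$
N.B: For the coefficient of determination, R², a value of zero means a poor fit while a value of one (1) means perfect linearity.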
The Adjusted R²: The adjusted R² is similar to R² but takes into account the number of predictors (independent variables) used in the model relative to the sample size, so it is always smaller than or equal to R².
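The least-squares estimates, R² and adjusted R² can be sketched from first principles: b1 = Sxy/Sxx, b0 = ȳ − b1x̄, R² = 1 − SSE/SST and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) for k independent variables. A minimal sketch with hypothetical data.

```python
def simple_linear_regression(x, y):
    """Least-squares estimates b0 (intercept) and b1 (slope), plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - my) ** 2 for b in y)                         # total variation
    return b0, b1, 1 - sse / sst

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
b0, b1, r2 = simple_linear_regression(x, y)
print(round(b0, 3), round(b1, 3), round(r2, 4), round(adjusted_r2(r2, len(x), 1), 4))
```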
MULTIPLE REGRESSION
Simple linear regression looks at the dependence of a quantitative variable on one other variable. However, when the dependent variable depends not on one independent variable but on several, the analysis is called multiple regression. E.g. the number of children a woman has may depend on the age of the mother, the number of years spent at school and even the number of co-wives.
Multiple regression analysis: This refers to the statistical measure of the relationship between three or more random variables. In other words, it is a statistical technique that predicts the value of one dependent variable based on two or more independent variables.
In multiple regression there are two or more independent random variables that are assumed to affect the dependent variable, e.g. consider the function y = f(x1, x2, x3, …, xn), where the x's are the independent variables and y is the dependent variable.
In multiple linear regression analysis, a multiple linear regression model is fitted, which takes the form:
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n + \varepsilon$$
Where:
Y = the dependent variable
X1, …, Xn = the independent variables
α = the intercept (constant term)
β1, …, βn = the coefficients of the independent variables
ε = the error / disturbance term
From the above equation, each βi is the slope with respect to Xi: it shows the average change (increase or decrease) in the dependent variable given a unit change (increase or decrease) in the independent variable Xi, holding the other independent variables constant.
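Fitting the multiple regression model by least squares amounts to solving the normal equations (A′A)b = A′y, where A has a column of 1s for α followed by the independent variables. This is a minimal pure-Python sketch using Gaussian elimination; in practice a statistics package would be used. The data are hypothetical and generated exactly from Y = 1 + 2X1 + 3X2, so the fit should recover α = 1, β1 = 2, β2 = 3.

```python
def fit_multiple_regression(X, y):
    """Least-squares fit of Y = alpha + b1*X1 + ... + bk*Xk by solving (A'A) b = A'y.

    X: list of rows, each row holding the k independent-variable values.
    Returns [alpha, b1, ..., bk].
    """
    A = [[1.0] + list(row) for row in X]  # column of 1s for alpha
    p = len(A[0])
    # Normal equations: M = A'A, v = A'y
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= factor * M[col][c]
            v[r] -= factor * v[col]
    # Back substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (v[i] - sum(M[i][j] * coef[j] for j in range(i + 1, p))) / M[i][i]
    return coef

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
X = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 3]]
y = [4, 3, 11, 10, 18]
print([round(c, 6) for c in fit_multiple_regression(X, y)])
```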
N.B: Before fitting the multiple regression model, check whether the independent variables (the xij's) are correlated by carrying out correlation analysis (compute a correlation matrix among the independent variables). If any two independent variables are highly correlated, use only one of them in the regression analysis, not both.
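The pre-check described in the N.B. can be sketched by computing the correlation for every pair of independent variables and flagging pairs whose |r| exceeds a cut-off. The threshold of 0.9 and the predictor data are hypothetical choices; the notes do not fix what counts as "highly correlated".

```python
import math
from itertools import combinations

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def highly_correlated_pairs(predictors, threshold=0.9):
    """Return (name, name, r) for predictor pairs whose |r| exceeds the threshold."""
    pairs = []
    for u, w in combinations(predictors, 2):
        r = correlation(predictors[u], predictors[w])
        if abs(r) > threshold:
            pairs.append((u, w, r))
    return pairs

# Hypothetical predictors: x2 is an exact multiple of x1
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 3, 4, 1, 2],
}
for u, w, r in highly_correlated_pairs(predictors):
    print(f"{u} and {w}: r = {r:.3f}; keep only one of them")
```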
MODULE 11
WRITING A RESEARCH REPORT
1 OVERVIEW
After completing data analysis, as a researcher you are obliged to conclude your project by writing a research report. This report basically contains the following items:
research question
relevant literature on the subject
Key findings
Limitations of the study
Writing the research report can be very challenging because it requires demonstrating a clear understanding of:
the research question
the analytical methods
the findings
The intended audience should be able to understand the data analysis aspects without getting lost in the details of the data analysis process and tools.
They need to understand the limitations as well as appreciate the insights of the study. Furthermore, the majority of the audience may not be familiar with all the statistical methods used, so it is imperative that every analysis is accompanied by a written explanation of the findings. At this report-writing stage, the data analyst becomes a 'writer/narrator' instead of an 'explorer'.
2 WRITING STYLE
The first question to consider is the audience. Not all readers have the same skills or abilities to
understand statistics and therefore the style should be appropriate to the audience.
The following are guidelines:
Scientists writing for other scientists can take for granted that their readers understand the fundamental statistical methods, such as those you have met in this course. However, this audience will still need the findings and methods summarized so that they can critically evaluate your research project.
A general audience needs a different approach. Statistical results like 'coefficients', 'standard errors', 'significance tests', etc. will be meaningless to those who do not understand statistics or have little understanding of it.
Audiences with more advanced skills than yours, e.g. professors, will require in-depth discussion of the process used in addition to the most important findings.
In general, scientists and other general audiences are more interested in findings than in the process
of the data analysis.
In all cases, the report will contain statistical output, which should be explained. Try to keep only the analyses that are central to the research question. Also try to use reader-friendly formats such as tables, graphs and written text.
These titles are captivating but do not offer enough information, and they portray the researcher as an advocate of a cause rather than an objective analyst of data.
For example: Who is down but not defeated? What is the problem with the community? What is the struggle?
Consider these titles:
The impact of taxation on disposable income
The democratic class struggle in Uganda 1990-2000
These titles are also captivating, but they offer substantial information on the research project.
Abstract
Although this section appears as a major section, it should be written last, because it is a concise summary of the report.
It should contain information about the research question, data, findings and conclusions of the study.
You should use words as effectively as possible. It is recommended to be no more than 1 page.
Introduction
This section introduces the rest of the report. It informs the reader of the research question and why this question is important.
After reading through the introduction, the reader should have a clear idea of the question the writer is addressing.
A good introduction begins with phrases like "In this paper I intend to demonstrate that…". This brief phrase forces the writer to come to terms with the specific purpose of the paper and to shape the rest of the report accordingly.
In this section avoid any unsupported statements. If they are to be used, begin by quoting the source, e.g. "According to the Uganda Government National Statistics Abstract of 2000/2001…".
The introduction should not serve as a conclusion. Avoid statements like "In this paper I will prove that poverty accounts highly for rising domestic violence…", because scientific methodology cannot prove anything; it provides support for some hypotheses and refutes other hypotheses. You can restate this as "In this paper I will examine whether poverty accounts highly for rising domestic violence…".
The introduction only opens up the research question.
It is recommended to cover 1-3 pages.
Literature review
This section places the study in the context of the body of research. It is aimed at acknowledging other researchers' work and contributions to the collective knowledge about the subject under study, while informing the reader of how these contributions relate to the current research project. This section highlights what is known and what is not known about the research question.
At the bare minimum, a literature review is an overview of the essential findings from articles related to the research question.
A literature review is significantly improved by creating a structure that links these various articles. This structure provides the reader with a solid understanding of specific relationships.
As an example, when studying factors associated with family violence, the literature review can first be organized by looking at the factors gender, age, race/ethnicity and socio-economic factors, in that order. This will provide a structure for understanding the specific relationships.
Another approach could be to compare studies by date, for example studies from a base period and from a current period.
Avoid literature reviews that discuss studies in a random order or in the order in which the articles were read, because that sequence does not matter to the audience.
The interest of the audience is in the research question, the knowledge that has already been developed and how the current research project fits into this body of literature.
Methodology
This section describes the methods used. It explains the following:
data collection techniques
data sources
sampling strategies, i.e. sample size and sample characteristics
indicators
modifications made to the data
statistical procedures applied
Like the rest of the sections, it should be concise. Its length will depend on the complexity of the data collection and analysis, but it is generally recommended to be between 2 and 5 pages.
Findings
This section details the results of the study. You should stick to the findings that relate to the research question, because other interesting results may appear in the course of analysis.
Include tables, graphs and narrative text, because some audiences read tables and graphs and ignore the text, and vice versa (read the text and ignore the tables).
The findings should be related to the hypotheses. Include any conclusions on the hypotheses.
Depending on the study, this section usually ranges between 6 and 12 pages.
Conclusion
This section relates the findings to the research question. It also discusses the limitations of the study.
A good conclusion generalizes the findings and gives consideration to external validity, i.e. the degree to which the findings accurately describe behavior outside the confines of the study.
A biased sample, for example, threatens external validity. If the sample consists only of students, avoid generalizing the findings to the whole population.
A good conclusion ends by opening up the research question to further investigation, by saying something like "In this study I have shown that… Further research needs to be done to find out if…".
References
This section provides a list of all the references cited in the report.
You should provide enough information to enable the reader to locate the source.
Further reading: Sweet, Stephen A. and Karen Grace-Martin (2003). Data Analysis with SPSS: A First Course in Applied Statistics.