TOOLS
The tools that will eventually be used for analyzing data are selected during the research planning stage
and not after data has been collected. The preliminary plan for data analysis helps the researcher in
deciding on the following:
i. What to do when
ii. How questionnaires will be checked for completeness
iii. Action to take after the checking – e.g. reject some questionnaires or questions (e.g. due to
omission of some issues, etc.)
iv. Editing the questionnaires – identifying illegible questions, incomplete responses,
unsatisfactory responses, ambiguous responses etc.
v. Options to pursue after rejection: - e.g. returning to the field, assigning missing values,
discarding the unsatisfactory questionnaire, excluding the questions with unsatisfactory
responses from analysis.
vi. Statistical methods to apply in the analysis.
vii. Procedure to follow in analysis – coding scheme, code sheet, tabulation.
1. Descriptive Statistics
Descriptive statistics deal with the compilation and presentation of data in various forms, e.g. tables, charts, and diagrams. This is
done to display and pass on information from which conclusions and recommendations can be made.
Nominal measures
A nominal measure is a name or label, e.g. gender, type of client, etc. From these nominal variables we could have the category
labels “male & female” and “start-up & expansion” respectively. It then becomes possible to group
the population into homogeneous categories and provide a count for each category. Furthermore, it
becomes possible to compare two or more homogeneous categories of nominal values.
Ordinal measures
Here, each individual or unit has a position in a numbered order. e.g.
1. Never
2. Sometimes
3. Always
Each category has a number as a label
Interval measures
Here it is possible to rank the individuals or units and also measure the distance between them.
Therefore, we need a physical unit of measure.
e.g. the no. of loans accessed
Ratio measures
A variable is taken to be a ratio level if the scale of values assumed by the variable includes an
absolute zero value, i.e. 0 – 100%
b. Quantifying
In describing a certain characteristic from a set of data, it is necessary to represent its value in terms of
quantities or numerical values. e.g. we can assign male (1) and female (2). This is called coding.
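Coding can be sketched in a few lines of Python. The codes follow the example above (male = 1, female = 2); the function name and the sample responses are illustrative assumptions, not part of the original notes.

```python
# Hypothetical coding scheme, following the example in the text.
GENDER_CODES = {"male": 1, "female": 2}

def code_responses(responses, scheme):
    """Replace each category label with its numeric code."""
    return [scheme[label.strip().lower()] for label in responses]

# Illustrative responses, not data from the text.
responses = ["Male", "female", "female", "male"]
coded = code_responses(responses, GENDER_CODES)
print(coded)  # [1, 2, 2, 1]
```

The codes are labels only: an average of such codes has no meaning, because gender is a nominal measure.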
c. Organizing data
Data should be organized in a way that makes it easy for the mind to absorb and understand. To organize
data is to arrange them into a pre-set format. This ultimately helps in describing the situation through
numbers and representing it through diagrams.
x (no. of visits)    f (frequency)
4                    2
5                    3
6                    1
7                    2
8                    1
9                    1
The researcher is thus able to see how the numbers of visits are distributed
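A frequency table of this kind can be produced by counting how often each value occurs. The sketch below assumes a raw list of visits rebuilt from the frequencies in the table (the notes give only the table, not the original observations).

```python
from collections import Counter

# Assumed raw observations, reconstructed from the frequencies in the table.
visits = [4, 4, 5, 5, 5, 6, 7, 7, 8, 9]

# Count how many times each value of x occurs.
frequency = Counter(visits)

print("x  f")
for x in sorted(frequency):
    print(f"{x}  {frequency[x]}")
```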
Example: (These could be no. of products [variable] stocked by 100 small scale retail stores)
81 85 62 71 70 81 86 67 96 51
63 71 75 69 48 34 87 86 73 75
42 91 58 93 52 82 90 95 82 72
53 38 77 93 85 47 70 68 57 71
96 40 70 92 68 88 58 51 90 74
52 63 96 77 83 76 48 92 81 83
92 73 84 78 78 72 60 84 78 60
43 70 83 64 96 93 55 73 58 40
88 96 72 53 87 92 73 77 63 58
71 80 38 63 56 76 82 61 76 63
i. You may first arrange the values in ascending [or descending] order in order to identify the
smallest and highest values.
ii. We condense the values by allocating them into CLASSES. (We can use groups of 10,
e.g. 1 – 10, 11 – 20, etc. Each group is called a class.)
iii. Use a tally mark to place each value in its appropriate or corresponding class.
iv. Count the tally marks in each class to construct the frequency distribution.
NOTE:
In constructing a grouped frequency distribution, it is important to calculate the class interval. Class interval =
upper class boundary minus lower class boundary.
To get the lower boundary: lower class limit less 0.5
e.g. for the class 31 – 40: 31 – 0.5 = 30.5
To get the upper boundary: upper class limit plus 0.5
e.g. 40 + 0.5 = 40.5
Class interval = 40.5 – 30.5 = 10
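The steps above can be sketched in Python for the 100 stores in the example. The class width of 10 and the limits 1 – 10, 11 – 20, etc. follow the text; the function names are illustrative assumptions.

```python
# Number of products stocked by the 100 retail stores (from the example).
products = [
    81, 85, 62, 71, 70, 81, 86, 67, 96, 51,
    63, 71, 75, 69, 48, 34, 87, 86, 73, 75,
    42, 91, 58, 93, 52, 82, 90, 95, 82, 72,
    53, 38, 77, 93, 85, 47, 70, 68, 57, 71,
    96, 40, 70, 92, 68, 88, 58, 51, 90, 74,
    52, 63, 96, 77, 83, 76, 48, 92, 81, 83,
    92, 73, 84, 78, 78, 72, 60, 84, 78, 60,
    43, 70, 83, 64, 96, 93, 55, 73, 58, 40,
    88, 96, 72, 53, 87, 92, 73, 77, 63, 58,
    71, 80, 38, 63, 56, 76, 82, 61, 76, 63,
]

def class_limits(value, width=10):
    """Return the (lower, upper) limits of the class 1-10, 11-20, ... holding value."""
    lower = ((value - 1) // width) * width + 1
    return lower, lower + width - 1

def grouped_frequencies(values, width=10):
    """Count how many values fall into each class (the tally step)."""
    counts = {}
    for v in values:
        limits = class_limits(v, width)
        counts[limits] = counts.get(limits, 0) + 1
    return dict(sorted(counts.items()))

table = grouped_frequencies(products)
for (lower, upper), f in table.items():
    lb, ub = lower - 0.5, upper + 0.5  # class boundaries
    print(f"{lower}-{upper}: boundaries {lb} to {ub}, interval {ub - lb}, f = {f}")
```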
d) Displaying Data
Descriptive statistics also help us in displaying data. We can use tools like histogram, bar charts, cartoons,
pie charts, frequency curves, etc to display data.
i) Cartoons
One can choose cartoons (pictograms) to display data. For example:
[Figure: pictogram comparing men and women; one symbol represents 1000]
[Figure: bar chart; Y-axis: frequency; X-axis: score or observation (0, 1, 2, 3, 4)]
Note
i. All bars are of equal width
ii. The height of each bar represents the frequency (sometimes percentage)
iii. The bars are separated to show the fact that the data is discrete
iv. Axes are scaled or labeled and the bar chart titled
v. The first bar can touch the y-axis.
[Figure: chart with frequency on the vertical axis and observation/score on the horizontal axis]
Example:
The following diagram shows how different regions share the total loan amount disbursed by an MFI.
[Figure: pie chart of the loan amount by region (A, B, C, D); values shown: B 800, C 700, D 1200]
2. MATHEMATICAL/INDUCTIVE/INFERENTIAL STATISTICS
Inferential statistics are concerned with extending beyond the particular information available and attempting to make general
predictions. They rely on measures that enhance the understanding of data. The important statistical measures
used are:
• Measures of central tendency (averages)
• Measures of dispersion
• Measures of correlation
• Measures of association
i. MEASURES OF CENTRAL TENDENCY
Collected data is useless and meaningless until it is organized in a certain manner. A measure of central
tendency is an aggregate or summary measure that represents a value that is at the center of the
distribution values.
Example:
Mean of raw data: Suppose you collected the following data on the number of members of 5 credit groups:
30, 26, 20, 29, 20
The average number of members per group (the arithmetic mean, AM or $\bar{x}$) is:
$$\bar{x} = \frac{\sum x}{n} = \frac{30 + 26 + 20 + 29 + 20}{5} = \frac{125}{5} = 25$$
3, 6, 7, 2, 2, 8, 2, 7, 2, 5, 5, 9, 5
Substituting:
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{68}{15} \approx 4.5$$
NB: For grouped data we cannot work out fx directly, because x is a range of values and not a single figure. We therefore need to work out the class
midpoints.
E.g. for class 1 – 3, the class midpoint will be (1 + 3)/2 = 2
No. of visits    Class midpoint (x)    Frequency (groups visited, f)    fx
1–3              2                     6                                12
4–6              5                     5                                25
7–9              8                     4                                32
                                       ∑f = 15                          ∑fx = 69
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{69}{15} = 4.6$$
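Both calculations of the mean can be checked with a short Python sketch. The data are the credit-group sizes and the grouped visits table from the examples; the function names are illustrative assumptions.

```python
def mean(values):
    """Arithmetic mean of raw data: sum of the values divided by their count."""
    return sum(values) / len(values)

def grouped_mean(classes):
    """Mean of grouped data; classes is a list of (lower_limit, upper_limit, frequency)."""
    total_f = sum(f for _, _, f in classes)
    # Each class is represented by its midpoint, (lower + upper) / 2.
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return total_fx / total_f

members = [30, 26, 20, 29, 20]               # members per credit group
visits = [(1, 3, 6), (4, 6, 5), (7, 9, 4)]   # grouped visits table

print(mean(members))         # 25.0
print(grouped_mean(visits))  # 4.6
```

The grouped mean (4.6) differs slightly from the mean of the individual values (about 4.5) because every value in a class is treated as if it were equal to the class midpoint.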
b) Mode
The mode is the value that occurs most often, i.e. the value with the highest frequency.
NOTE:
For grouped frequency distribution, the mode is the mid-point of the class with the highest
frequency.
If a distribution has the same frequency for all values, such a distribution has no mode.
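A minimal sketch of these rules in Python, using the credit-group sizes and the grouped visits table from the mean examples. The function names are illustrative assumptions; where two classes tie for the highest frequency, this sketch simply takes the first.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # same frequency for all values: no mode
    return sorted(v for v, c in counts.items() if c == highest)

def grouped_mode(classes):
    """Midpoint of the class with the highest frequency; classes are (lower, upper, f)."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return (lower + upper) / 2

print(modes([30, 26, 20, 29, 20]))                      # [20]
print(modes([1, 2, 3]))                                 # []
print(grouped_mode([(1, 3, 6), (4, 6, 5), (7, 9, 4)]))  # 2.0
```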
c) Median:
The middle value when all values are arranged in order of size. It divides the distribution into two
halves with an equal number of values on either side.
1, 3, 6, 7, 2, 2, 8, 2, 7, 2, 5, 5, 9, 5, 4
Re-arrange: 1, 2, 2, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9
If total number of values is odd, take the middle value as median
If it is even, calculate the average of the two middle numbers.
Locate the middle data item by n/2, where n is the total number of observations (∑f): n/2 =
15/2 = 7.5
Calculate the cumulative frequencies (F) until they reach or exceed 7.5 for the first time.
Choose the median as the value of x corresponding to the first value of F that reaches or exceeds 7.5. This is
F = 10, and the x value that corresponds to it is 5. Therefore the median = 5
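The two procedures can be sketched as follows. The frequency table in the code is an assumption, rebuilt to agree with ∑f = 15 and the cumulative frequency of 10 at x = 5 quoted above; the function names are illustrative.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def median_from_frequencies(table):
    """table is a list of (x, f); return the first x whose cumulative f reaches n/2."""
    half = sum(f for _, f in table) / 2
    cumulative = 0
    for x, f in sorted(table):
        cumulative += f
        if cumulative >= half:
            return x

# Assumed frequency table (x, f), consistent with the totals in the text.
table = [(1, 1), (2, 4), (3, 1), (4, 1), (5, 3), (6, 1), (7, 2), (8, 1), (9, 1)]
raw = [x for x, f in table for _ in range(f)]  # expand back to 15 raw values

print(median(raw))                     # 5
print(median_from_frequencies(table))  # 5
```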
Median of grouped distribution
Example
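No. of visits (x)    Frequency (f)    Cumulative frequency (F)
1–3                  6                6
4–6                  5                11
7–9                  4                15
                     ∑f = 15
n/2 = 15/2 = 7.5, so the median class is 4–6, the first class whose cumulative frequency (11) exceeds 7.5.
$$\text{Median} = L + \frac{n/2 - F}{f} \times c$$
where L is the lower boundary of the median class (3.5), F is the cumulative frequency before the median class (6), f is the frequency of the median class (5) and c is the class interval (3).
$$\text{Median} = 3.5 + \frac{(7.5 - 6) \times 3}{5} = 3.5 + \frac{4.5}{5} = 3.5 + 0.9 = 4.4$$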
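A minimal sketch of the median of a grouped distribution, assuming the standard interpolation formula: lower boundary of the median class, plus the share of that class needed to reach n/2, multiplied by the class interval. The function name is an illustrative assumption; the data are the grouped visits table.

```python
def grouped_median(classes):
    """classes: list of (lower_limit, upper_limit, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            lower_boundary = lower - 0.5
            interval = (upper + 0.5) - lower_boundary
            # Interpolate within the median class.
            return lower_boundary + (half - cumulative) / f * interval
        cumulative += f

visits = [(1, 3, 6), (4, 6, 5), (7, 9, 4)]
print(round(grouped_median(visits), 2))
```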
Measures of Dispersion
A measure of dispersion is an aggregate measure of deviation from a central value. The most commonly
used measures are the:
i. Range
ii. Mean deviation
iii. Standard deviation
The Range
The difference between the lowest value and the highest value in the distribution
Example: Given the following sets of data, calculate the range.
Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9
Set B: 3, 4, 4, 4, 5, 5, 6, 6, 7
Answer:
Set A: 9 – 1 = 8
Set B: 7 – 3 = 4
This shows that the values in set A are spread out more than those in set B.
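The same comparison in Python, as a small sketch (the function name is an illustrative assumption):

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
set_b = [3, 4, 4, 4, 5, 5, 6, 6, 7]

print(value_range(set_a))  # 8
print(value_range(set_b))  # 4
```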
Deviation is the difference between the value of an observation and the mean, i.e. (x – x̄).
The deviation can be from the mean, mode or median. In most cases it is from the mean. (Where you are
not told, assume it is in reference to the mean.)
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values.
A low standard deviation indicates that the values tend to be close to the mean of the set, while a high
standard deviation indicates that the values are spread out over a wider range. It is a useful measure of
spread for normal distributions.
Illustration:
Given the data 1, 2, 2, 6, 9, determine the mean and the deviations.
a) Mean = 20/5 = 4
b) Deviation (how far each value is from the mean):
x    x̄    x – x̄
1    4    -3
2    4    -2
2    4    -2
6    4    2
9    4    5
          ∑(x – x̄) = 0
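Because the deviations from the mean always add up to zero, each deviation is squared before summing. Squaring removes the negative
signs, but the magnitudes of the deviations will change, e.g. -3 becomes bigger (9). [If any of the values (x – x̄)
were less than 1 in absolute value, the magnitude of the deviation would become smaller, e.g. 0.4, after
squaring, would become 0.16.]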
Therefore, we divide the sum of the squared deviations by the number of observations, n. The result is the variance.
Formula:
$$\text{Variance} = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$
x    x̄    x – x̄    (x – x̄)²
1    4    -3       9
2    4    -2       4
2    4    -2       4
6    4    2        4
9    4    5        25
                   ∑(x – x̄)² = 46
Variance = 46/5 = 9.2
If we take the square root of the variance, the result is the standard deviation. Therefore:
$$\text{Standard deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{9.2} = 3.03$$
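Expanding the equation:
$$\sigma^2 = \frac{\sum (x^2 - 2x\bar{x} + \bar{x}^2)}{n} = \frac{\sum x^2}{n} - \bar{x}^2$$
For the data above, ∑x² = 126, so the variance is 126/5 – 4² = 25.2 – 16 = 9.2, as before.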
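The illustration can be reproduced with a short Python sketch. It divides by n, as in the notes (the population form of the variance); the function names are illustrative assumptions.

```python
import math

def variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def variance_shortcut(values):
    """Expanded form: mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2

data = [1, 2, 2, 6, 9]
mean = sum(data) / len(data)

print(sum(x - mean for x in data))          # 0.0 (deviations cancel out)
print(variance(data))                       # 9.2
print(round(math.sqrt(variance(data)), 2))  # 3.03
print(round(variance_shortcut(data), 2))    # 9.2
```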
Measures of Association
a) Rank Correlation Coefficient (Rs)
The rank correlation investigates the presence or absence of association between variables. Moreover, it
measures the strength or degree of relationship between variables.
Assumptions
i. The data consists of a random sample of n pairs. Each pair of observation represents two
measurements taken on the same object or individual called the unit of association.
ii. Each x (observation) is ranked relative to all other observed values of X (variable e.g. age) from
smallest to largest or the largest to smallest in order of magnitude.
E.g. (ranking from the largest to the smallest)
X    Rank
3    4
6    1
4    3
5    2
iii. Each y (observation) is ranked relative to all other observed values of Y (variable e.g. height) from
the smallest to the largest or the largest to the smallest in order of magnitude.
iv. If ties occur among the x’s or among the y’s, each tied value is assigned the mean rank
position for which it is tied.
Example 1
X Rank Positions
3 1 1
6 5 5
4 2 2.5
4 3 2.5
5 4 4
For ties: (2 + 3)/2 = 2.5
Example 2
X    Rank    Positions
3    1       1
6    5       5
4    2       3
4    3       3
4    4       3
For ties: (2 + 3 + 4)/3 = 3
v. If the data consists of non-numeric observations, they must be capable of being ranked as
described above.
E.g. above average, average, poor, etc
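The ranking rule with ties (assumption iv) can be sketched in Python as below. The function name and its `descending` parameter are illustrative assumptions; the two calls use the X values from Example 1 and Example 2, ranked from smallest to largest.

```python
def average_ranks(values, descending=False):
    """Rank the values; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of values tied with the one at this position.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2  # positions are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

print(average_ranks([3, 6, 4, 4, 5]))  # [1.0, 5.0, 2.5, 2.5, 4.0]
print(average_ranks([3, 6, 4, 4, 4]))  # [1.0, 5.0, 3.0, 3.0, 3.0]
```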
Formula:
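$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
where d_i is the difference between the two ranks of the i-th pair and n is the number of pairs.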
Example:
The following are the number of hours that 10 clients with loans spent in the business per day and the
number of different customers that they served. Calculate rs.
No. of hours (X)    No. of different customers (Y)
8 56
5 44
11 79
13 72
10 70
5 54
18 94
15 85
2 33
8 65
Solution:
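Ranks are assigned from the largest value to the smallest, and tied values share the mean rank.
X     Y     Rank of X    Rank of Y    di (rank xi – rank yi)    di²
8     56    6.5          7            -0.5                      0.25
5     44    8.5          9            -0.5                      0.25
11    79    4            3            1                         1
13    72    3            4            -1                        1
10    70    5            5            0                         0
5     54    8.5          8            0.5                       0.25
18    94    1            1            0                         0
15    85    2            2            0                         0
2     33    10           10           0                         0
8     65    6.5          6            0.5                       0.25
n = 10                                                          ∑di² = 3
Substituting:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(3)}{10(100 - 1)} = 1 - \frac{18}{990} = 0.98$$
(This means that the two variables are highly correlated.)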
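The whole calculation can be checked with a short Python sketch that uses the hours and customers from the example. The function names are illustrative assumptions; ranks run from the largest value to the smallest, with tied values given the mean rank.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman_rs(x, y):
    """Return (rs, sum of squared rank differences) using rs = 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
customers = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]

rs, sum_d2 = spearman_rs(hours, customers)
print(sum_d2)        # 3.0
print(round(rs, 2))  # 0.98
```

Ranking both variables from smallest to largest instead would give the same result, since only the differences between the paired ranks matter.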
Note:
rs ranges from -1 to +1, i.e. –1 ≤ rs ≤ 1
If the calculated value of rs tends towards 0, it shows that there’s no correlation; if towards +1, there is a
strong positive correlation; if towards –1, a strong negative correlation. We can go ahead and test whether the above
finding is actually true statistically. For a layman,
0.98 means that the two variables are highly correlated. A statistician/researcher would like to go further
and test whether or not there is a correlation between them (hypothesis testing).