
CHAPTER 2: METHODS OF DATA COLLECTION AND PRESENTATION

2.1 Methods of data collection


There are two sources of data:

1.Primary Data

• Data measured or collected by the investigator or the user directly from the source.
• Two activities involved: planning and measuring.

a) Planning:

   - Identify the source and elements of the data.
   - Decide whether to consider a sample or a census.
   - If sampling is preferred, decide on the sample size, selection method, etc.
   - Decide on the measurement procedure.
   - Set up the necessary organizational structure.

b) Measuring: there are different options. Some of the methods for collecting primary data are:

   - Focus group
   - Telephone interview
   - Mail questionnaires
   - Door-to-door survey
   - Mall intercept
   - New product registration
   - Personal interview
   - Experiments

2. Secondary Data

• Data gathered or compiled from published and unpublished sources or files.
• When our source is secondary data, check that:
   - The type and objective of the situation are understood.
   - The purpose for which the data were collected is compatible with the present problem.
   - The nature and classification of the data are appropriate to our problem.

MESERET TADDESSE EJETA
   - There are no biases or misreporting in the published data.

Note: Data which are primary for one may be secondary for the other.

2.2 METHODS OF DATA PRESENTATION

2.2.1 FREQUENCY DISTRIBUTION


Definition: The number of times a value of a variable occurs in a data set is called the frequency of that value.

Definition: A tabular representation of the values of a variable together with the corresponding frequencies is called a frequency distribution (FD).

Types of frequency distributions

1. Ungrouped frequency distribution (UFD)

An ungrouped frequency distribution shows each value of a variable together with its frequency. A discrete frequency distribution is one that involves a discrete variable.

Example: The following data represent the number of books read in the past six months by each student in a class of 25.

6 24 14 11 33

15 15 8 14 10

8 27 15 6 20

20 9 33 15 10

6 11 20 8 6

Construct a frequency distribution for this data.

Solution: These individual observations can be arranged in ascending or descending order of magnitude, in which case the series is called an "array".

Array of number of books read in the past six months by each student in a class of 25.

6 6 6 6 8 8 8 9 10 10 11 11

14 14 15 15 15 15 20 20 20 24 27 33

33

Since the variable "Number of books read" can assume only the values 0, 1, 2, 3, 4, 5, 6, … (which are whole numbers), it is a discrete variable. Therefore, its frequency distribution is a discrete frequency distribution.

Number of books read (X)   Number of students (f)
6                          4
8                          3
9                          1
10                         2
11                         2
14                         2
15                         4
20                         3
24                         1
27                         1
33                         2
Total                      25
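
The tally above can be cross-checked with a short Python sketch. It assumes only the 25 observations listed in the example; the variable names are illustrative and the standard-library `Counter` is just one convenient way to count.

```python
from collections import Counter

# Number of books read by each of the 25 students (data from the example).
books = [6, 24, 14, 11, 33,
         15, 15, 8, 14, 10,
         8, 27, 15, 6, 20,
         20, 9, 33, 15, 10,
         6, 11, 20, 8, 6]

array = sorted(books)        # the "array": observations in ascending order
frequency = Counter(array)   # maps each distinct value to its frequency

print("Books read (X)  Students (f)")
for value in sorted(frequency):
    print(f"{value:>14}  {frequency[value]:>12}")
print(f"{'Total':>14}  {sum(frequency.values()):>12}")
```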

Steps in constructing a discrete frequency distribution:

1. Make sure that the variable you have is discrete.
2. Determine the possible values of the variable.
3. Prepare an array for the distribution of the variable.
4. Prepare three columns: the first for the different values of the variable, the second for tally marks to facilitate the counting, and the third for the frequency corresponding to each value of the variable.

   Values   Tally Marks   Frequency

5. Write the possible values of the variable in ascending order in the first column.

2. Categorical frequency distribution

- Used for data that can be placed into specific categories, e.g. marital status.

Example: A social worker collected the following data on marital status for 25 persons.

(M = married, S = single, W = widowed, D = divorced)

M S D W D S S M M M W D
S M M W D D S S S W W D
D

Construct a frequency distribution for the above data.

Solution: Make a table as shown below

Class   Tally      Frequency   Percent
M       ||||||     6           24
S       |||||||    7           28
D       |||||||    7           28
W       |||||      5           20
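
As a cross-check, the frequencies and percentages can be counted directly from the 25 listed codes. This is a minimal sketch; the variable names are illustrative, and the percentage is simply the class frequency divided by the number of observations, times 100.

```python
from collections import Counter

# Marital status codes for the 25 persons, copied from the example.
statuses = ("M S D W D S S M M M W D "
            "S M M W D D S S S W W D "
            "D").split()

counts = Counter(statuses)   # frequency of each category
n = len(statuses)            # total number of observations

print("Class  Frequency  Percent")
for category in ("M", "S", "D", "W"):
    f = counts[category]
    print(f"{category:<5}  {f:>9}  {100 * f / n:>7.0f}")
```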

3. Continuous frequency distribution (Grouped frequency distribution)

A continuous frequency distribution arises from continuous variables, i.e., from measurements made on continuous scales such as height, weight or amount of power supply. Unlike a discrete frequency distribution, where one class is used for each value of the variable, the observations of a continuous variable are grouped into class intervals; otherwise the purpose of classification, i.e., condensation of the data, would be lost. The classes must be mutually exclusive, so that each item falls in one and only one group (class), and exhaustive, so that every item is covered.

Example: The following table shows the frequency distribution of the test results of 50 students in a Statistics course.

Test results   Number of students
10 - 13        8
14 - 17        15
18 - 21        20
22 - 25        7
Total          50

The categories into which the observations are distributed are called classes or class intervals. The classes should be set so that they contain all items and no two classes share the same item. This is the basic principle in the construction of such frequency distributions. We define some concepts associated with continuous frequency distributions below.

Class limits: In the above table the students are distributed into different classes. There are 8 students with scores between 10 and 13. The numbers 10 and 13 are called the lower and upper class limits, respectively. There are 15 students with scores between 14 and 17. The numbers 14 and 17 are called the lower and upper class limits, respectively.

Class limits are therefore the lowest and highest values that can be included in a class. In the above example, the numbers 10, 14, 18 and 22 are the lower class limits (LCL) and the numbers 13, 17, 21 and 25 are the upper class limits (UCL).

Class boundaries (real class limits): A class boundary is a number that does not appear in the stated class limits but is rather a value that falls midway between the upper limit of one class and the lower limit of the next higher class.

In practice, the class boundaries are obtained by adding the upper class limit of one class interval to the lower class limit of the next higher class interval and dividing the sum by 2.

Let d = LCL of the second class - UCL of the first class.

Then adding ½ d to the upper limits gives the upper class boundaries (UCB), and subtracting ½ d from the lower limits gives the lower class boundaries (LCB).

For the data in the above table,

d = 14 - 13 = 1, ½ d = 0.5

Subtracting 0.5 from the lower limits gives 9.5, 13.5, 17.5 and 21.5, which are the lower class boundaries.

Adding 0.5 to the upper limits gives 13.5, 17.5, 21.5 and 25.5, which are the upper class boundaries.

Therefore, the above table becomes

Classes   Frequency   Class boundaries
10-13     8           9.5-13.5
14-17     15          13.5-17.5
18-21     20          17.5-21.5
22-25     7           21.5-25.5
Total     50

Alternatively, class boundaries are obtained by subtracting half of the unit of measurement (u) from the lower class limits and adding half of u to the upper class limits of a class, where u is the distance between two possible consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001, …

u = 1 if all the observations are whole numbers

u = 0.1 if all the observations are given to one decimal place

u = 0.01 if all the observations are given to two decimal places

Let LCB = lower class boundary

UCB = upper class boundary

Then $LCB_i = LCL_i - \frac{u}{2}$ and $UCB_i = UCL_i + \frac{u}{2}$

For the data in the above example, consider the 2nd class 14-17. Since u = 1,

$LCL_2 = 14$, $UCL_2 = 17$, $\frac{u}{2} = \frac{1}{2} = 0.5$

$LCB_2 = 14 - 0.5 = 13.5$, $UCB_2 = 17 + 0.5 = 17.5$
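
The rule of subtracting and adding half the unit of measurement can be sketched in Python as follows. The function name `class_boundaries` is illustrative; the class limits are the ones used in this section (10-13, 14-17, 18-21, 22-25) with u = 1.

```python
def class_boundaries(lcl, ucl, u=1):
    """Return (LCB, UCB) for a class with limits lcl..ucl and unit of measurement u."""
    return lcl - u / 2, ucl + u / 2

# Class limits used in this section; observations are whole numbers, so u = 1.
limits = [(10, 13), (14, 17), (18, 21), (22, 25)]
boundaries = [class_boundaries(lcl, ucl, u=1) for lcl, ucl in limits]

for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lcl}-{ucl}: {lcb} - {ucb}")
```

Note that the upper boundary of each class coincides with the lower boundary of the next, which is why boundaries leave no gaps between classes.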

Class width (class size): The size or width of a class interval is the difference between the upper and lower class boundaries and is referred to as the class width, class size or class length,

i.e. $CW_i = UCB_i - LCB_i$

In the above table, the first class has class width 13.5 - 9.5 = 4 and the second class 14-17 has class width 17.5 - 13.5 = 4. In this table all classes have equal size, which is 4.

When all the classes are of the same size, the class width can also be obtained as the difference between any two consecutive lower limits or upper limits. For example, in the above table 14 - 10 = 4 and 17 - 13 = 4.

Class mark (class mid-point): the value that lies midway between the lower and upper limits of a class interval. It is obtained by adding the lower and upper class limits and dividing the sum by two,

i.e. $CM = \frac{LCL + UCL}{2} = \frac{LCB + UCB}{2}$

Note that when the class size is uniform in a distribution, after finding the class mark of the first class the remaining class marks are obtained by successively adding the class size. So, for classes of the same size, the class width can also be obtained as the difference between any two consecutive class marks.

For the distribution in the above table, the class mark of the first class is 11.5, so the class mark of the second class is 11.5 + 4 = 15.5, that of the third class is 15.5 + 4 = 19.5 and that of the fourth class is 19.5 + 4 = 23.5. We then have the following table.

Class limits   Class boundaries   Class mark   Frequency
10-13          9.5-13.5           11.5         8
14-17          13.5-17.5          15.5         15
18-21          17.5-21.5          19.5         20
22-25          21.5-25.5          23.5         7
Total                                          50
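
A short Python sketch of the class-mark calculation, using the class limits from the table. It also checks the remark that, for classes of uniform size, consecutive class marks differ by the class width. Variable names are illustrative.

```python
limits = [(10, 13), (14, 17), (18, 21), (22, 25)]   # class limits from the table

# Class mark = (LCL + UCL) / 2
class_marks = [(lcl + ucl) / 2 for lcl, ucl in limits]

# With a uniform class size, consecutive class marks differ by the class width.
differences = [b - a for a, b in zip(class_marks, class_marks[1:])]

print(class_marks)    # [11.5, 15.5, 19.5, 23.5]
print(differences)    # [4.0, 4.0, 4.0]
```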

Basic principles for constructing a continuous frequency distribution

The basic principles or steps in constructing a continuous frequency distribution are:

1. Determine the number of classes that will be used to group the data.

The number of classes should be neither so large as to destroy the advantage of classification,
nor be so small that the chief characteristic of the data is missed. The exact number of classes to
use depends upon the number of figures to be classified, the size of figures, the purpose that data
has to serve and the arbitrary preference of the analyst.

A small number of items to be classified justifies a small number of classes. For example, if we classify 30 items into 20 classes we would lose more than we gain from the classification. If, on the other hand, we classify 15,000 items into 5 classes we would probably give away too much information.

So, in general the approximate number of classes depends upon the number of measurements
and the following rough information gives us a good hint.

Sturges’ Rule

To fix the number of classes (k) one can use the above guidance, personal judgment depending upon the nature of the investigation, or Sturges' Rule, which states that

Number of classes = k = 1 + 3.322 × log N

where N = total number of observations and log is the common logarithm.

Generally, the number of classes should be between 5 and 20. That is, not less than 5 and not
greater than 20 classes should be used for any kind of distribution.
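
Sturges' Rule is easy to evaluate in code. The sketch below assumes that a fractional result is rounded up to the next whole number (the rule itself does not fix a rounding convention), and the function name `sturges_classes` is illustrative.

```python
import math

def sturges_classes(n):
    """Suggested number of classes k = 1 + 3.322 * log10(n), rounded up (assumed convention)."""
    return math.ceil(1 + 3.322 * math.log10(n))

for n in (30, 50, 100, 1000):
    raw = 1 + 3.322 * math.log10(n)
    print(f"N = {n:>4}: 1 + 3.322 log N = {raw:.3f}, k = {sturges_classes(n)}")
```

For any N between 30 and 1000 this stays well inside the recommended range of 5 to 20 classes.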

2. The size of the class has to be determined.

Whenever possible, all classes should be of the same size. This facilitates the analysis of the data
and simplifies comparison between different classes.

A frequency distribution with equal class size can be presented pictorially with greater ease.

However, in some cases equal size is either impossible or undesirable.

If the number of classes is known and if it is decided to use classes of equal size, the
determination of the size is relatively simple. Since the class size depends upon the number of
classes and the extent to which the values of the variable are spread or dispersed, the following
simple formula can be used.

$\text{Class width} = \frac{\text{Range}}{\text{Number of classes}}$ or $cw = \frac{R}{k}$

where R = Range = Highest value - Smallest value

3. Determine the lower class limit of the first class so that the smallest item falls in this class. The remaining lower class limits are obtained using the following relations.

LCL2 = LCL1 + cw, LCL3 = LCL2 + cw, LCL4 = LCL3 + cw, … , LCLi+1 = LCLi + cw

4. Determine the upper class limit of the first class using the formula UCL1 = LCL1 + cw - u. The remaining upper class limits are obtained using the following relations.

UCL2 = UCL1 + cw, UCL3 = UCL2 + cw, …, UCLi+1 = UCLi + cw

5. Complete the continuous frequency distribution with the respective class frequencies.
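
Steps 3 and 4 can be sketched as a small Python helper. The function name `class_limits` and its parameters are illustrative; the sample call reuses the test-results classes from earlier in this section (first lower limit 10, class width 4, four classes, u = 1).

```python
def class_limits(lcl1, cw, k, u=1):
    """Return k (LCL, UCL) pairs.

    LCL_{i+1} = LCL_i + cw, UCL_1 = LCL_1 + cw - u and UCL_{i+1} = UCL_i + cw.
    """
    ucl1 = lcl1 + cw - u
    return [(lcl1 + i * cw, ucl1 + i * cw) for i in range(k)]

limits = class_limits(lcl1=10, cw=4, k=4, u=1)
print(limits)   # [(10, 13), (14, 17), (18, 21), (22, 25)]
```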

Example: The following are the marks (out of 100) obtained by 50 students in Statistics.

41 50 69 77 88 92 40 51 67 75 87 94 93 86 72 62 53 49 57 67

70 85 97 95 83 79 68 52 44 44 55 64 75 83 74 60 56 42 56 69

70 42 64 52 63 60 59 61 65 78

a) Construct a continuous frequency distribution with a suitable number of classes.

b) Complete the distribution obtained in (a) with the class boundaries and class marks.

Solution: Step 1. Here N = 50, so k = 1 + 3.322 × log 50 = 1 + 3.322 × 1.698970004

Thus k = 1 + 5.643978354 = 6.643978354 ≈ 7

Step 2. $cw = \frac{R}{k} = \frac{97 - 40}{7} = 8.142857143$

Rounding 8.142857143 up to the next whole number, so that the seven classes cover the whole range and to facilitate the construction of the distribution and the further analysis of the data, the class size will be 9.

Step 3. Let LCL1 = 40. Then LCL2 = 40 + 9 = 49, LCL3 = 49 + 9 = 58, LCL4 = 58 + 9 = 67, LCL5 = 67 + 9 = 76, LCL6 = 76 + 9 = 85, LCL7 = 85 + 9 = 94.

Step 4. UCL1 = 40 + 9 - 1 = 48, where u = 1.

Then UCL2 = 48 + 9 = 57, UCL3 = 57 + 9 = 66, UCL4 = 66 + 9 = 75, UCL5 = 75 + 9 = 84, UCL6 = 84 + 9 = 93, UCL7 = 93 + 9 = 102.

Step 5. Completing the distribution gives the following table.

Class limits (Marks)   Number of students (frequency)   Class boundaries   Class marks
40 - 48                6                                39.5 - 48.5        44
49 - 57                10                               48.5 - 57.5        53
58 - 66                9                                57.5 - 66.5        62
67 - 75                11                               66.5 - 75.5        71
76 - 84                5                                75.5 - 84.5        80
85 - 93                6                                84.5 - 93.5        89
94 - 102               3                                93.5 - 102.5       98
Total                  50
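
The whole worked example can be reproduced with a short Python sketch. It assumes that both Sturges' k and the class width are rounded up to the next whole number, and that the first lower class limit is the smallest observation; the variable names are illustrative.

```python
import math

# Marks (out of 100) of the 50 students, copied from the example.
marks = [41, 50, 69, 77, 88, 92, 40, 51, 67, 75, 87, 94, 93, 86, 72, 62, 53, 49, 57, 67,
         70, 85, 97, 95, 83, 79, 68, 52, 44, 44, 55, 64, 75, 83, 74, 60, 56, 42, 56, 69,
         70, 42, 64, 52, 63, 60, 59, 61, 65, 78]

n = len(marks)
k = math.ceil(1 + 3.322 * math.log10(n))   # Step 1: Sturges' rule, rounded up
data_range = max(marks) - min(marks)       # R = highest value - smallest value
cw = math.ceil(data_range / k)             # Step 2: class width, rounded up
u = 1                                      # marks are whole numbers

table = []
for i in range(k):
    lcl = min(marks) + i * cw              # Step 3: lower class limits
    ucl = lcl + cw - u                     # Step 4: upper class limits
    f = sum(lcl <= m <= ucl for m in marks)            # Step 5: class frequency
    table.append((lcl, ucl, f, lcl - u / 2, ucl + u / 2, (lcl + ucl) / 2))

print("Limits     f   Boundaries      Mark")
for lcl, ucl, f, lcb, ucb, mark in table:
    print(f"{lcl:>3}-{ucl:<3}  {f:>3}   {lcb:>5}-{ucb:<6}  {mark:>5}")
print(f"Total    {sum(row[2] for row in table):>3}")
```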

4. Relative Frequency Distribution

The relative frequency of a class shows the concentration of items in that class relative to the other classes of a frequency distribution.

$\text{Relative frequency of a class} = \frac{\text{class frequency}}{\text{total frequency}}$

Relative frequencies are usually given as decimals or percentages.

Example: The following table shows an example of a relative frequency distribution.

Wages (X)   Number of workers (f)   Relative frequency (in decimals)   Relative frequency (in %)
75-80       9                       0.09                               9%
80-85       12                      0.12                               12%
85-90       15                      0.15                               15%
90-95       11                      0.11                               11%
95-100      20                      0.20                               20%
100-105     20                      0.20                               20%
105-110     11                      0.11                               11%
110-115     2                       0.02                               2%
Total       100                     1.00                               100%
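
The relative frequencies in the table follow directly from the definition, as this minimal Python sketch shows. The class labels and frequencies are taken from the table; the variable names are illustrative.

```python
wages = ["75-80", "80-85", "85-90", "90-95", "95-100", "100-105", "105-110", "110-115"]
freq = [9, 12, 15, 11, 20, 20, 11, 2]

total = sum(freq)                      # total frequency
relative = [f / total for f in freq]   # relative frequency = class frequency / total

print("Wages     f   Decimal  Percent")
for w, f, r in zip(wages, freq, relative):
    print(f"{w:<8} {f:>3}   {r:>7.2f}  {100 * r:>6.0f}%")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check on the arithmetic.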

5. Cumulative Frequency Distribution

The cumulative frequency of a value of a variable (or of a class) is the sum of all the frequencies preceding or succeeding that value (class), including the frequency of that value (class). There are two types of cumulative frequency distributions, namely the "less than" cumulative frequency distribution and the "more than" cumulative frequency distribution.

a) "Less than" cumulative frequency distribution

The less than cumulative frequency for any value of the variable (or class) is obtained by adding the frequencies of all the preceding values (or classes) to the frequency of the value (class) against which the total is written, provided the values (classes) are arranged in ascending order of magnitude. For a grouped frequency distribution, it is the sum of all frequencies lying below the upper class boundary of each class.

Example: The table below shows the 'less than' cumulative frequency distribution of the marks of 70 students in a class.

Marks    Frequency   'Less than' cumulative frequency
30-35    5           5
35-40    10          5+10=15
40-45    15          15+15=30
45-50    30          30+30=60
50-55    5           60+5=65
55-60    5           65+5=70

The above ‘less than’ cumulative frequency distribution can also be written as follows
Marks Frequency
Less than 30 0
Less than 35 5
Less than 40 15
Less than 45 30
Less than 50 60
Less than 55 65
Less than 60 70
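
The running totals above are exactly what `itertools.accumulate` from the Python standard library computes. A minimal sketch using the frequencies of the example (variable names are illustrative):

```python
from itertools import accumulate

upper_bounds = [35, 40, 45, 50, 55, 60]   # upper boundary of each class
freq = [5, 10, 15, 30, 5, 5]

# Running total of the frequencies, in ascending order of the classes.
less_than = list(accumulate(freq))

for bound, cf in zip(upper_bounds, less_than):
    print(f"Less than {bound}: {cf}")
```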

b) ‘more than’ cumulative frequency distribution


The ‘more than’ cumulative frequency is obtained similarly by finding the cumulative totals
of frequencies starting from the highest value of the variable (class) to the lowest value
(class). Thus in the above illustration the number of students with marks ‘more than 50’ is
5+5= 10, and ‘more than 40’ is 15+30+5+5=55 and so on. The complete ‘more than’ type
cumulative frequency distribution for this data is given below:
‘More than’ cumulative frequency distribution of marks of 70 students

‘more than’
Marks Frequency cumulative
frequency
30-35 5 65+5=70
35-40 10 55+10=65
40-45 15 40+15=55
45-50 30 10+30=40
50-55 5 5+5=10
55-60 5 5

The above 'more than' cumulative frequency distribution can also be expressed in the following form:

Marks   Number of students
More than 30 70
More than 35 65
More than 40 55
More than 45 40
More than 50 10
More than 55 5
More than 60 0
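
The 'more than' totals can be obtained in the same way by accumulating from the highest class downwards. A minimal Python sketch with the frequencies of the example (variable names are illustrative):

```python
from itertools import accumulate

lower_bounds = [30, 35, 40, 45, 50, 55]   # lower boundary of each class
freq = [5, 10, 15, 30, 5, 5]

# Accumulate from the last class to the first, then restore ascending order.
more_than = list(accumulate(reversed(freq)))[::-1]

for bound, cf in zip(lower_bounds, more_than):
    print(f"More than {bound}: {cf}")
```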

Remark: In a 'less than' cumulative frequency distribution, the c.f. refers to the upper class boundary of the corresponding class, and in a 'more than' cumulative frequency distribution, the c.f. refers to the lower class boundary of the corresponding class.

Exercise: Convert the following distribution into a 'more than' cumulative frequency distribution.

Weekly wages less than (Birr)   Number of workers
20                              41
40                              92
60                              156
80                              194
100                             201

2.2.2 GRAPHICAL PRESENTATION OF DATA


After organizing data into frequency distributions it is often helpful to present the data graphically. Graphs communicate the essential characteristics of a frequency distribution in pictorial form so that one can readily identify these characteristics and can compare one frequency distribution with another.

Histogram, frequency polygon and cumulative frequency curves are common ways of
representing frequency distribution graphically.

Histogram
A histogram is a graphical display of the distribution of a data set. A histogram looks like a
vertical bar graph, except that the columns touch each other.

The given grouped data is plotted in the form of a series of rectangles. Class boundaries are
marked along the x-axis and the frequencies along the y- axis according to a suitable scale. If all
the classes are of the same size, the height of the rectangles can be taken to be numerically equal
to the class frequencies.

If on the other hand the size of the class intervals is not uniform, the height of the rectangles can
be adjusted by taking the “frequency density” of the corresponding classes as scale for the
vertical axis.

$\text{Frequency density} = \frac{\text{class frequency}}{\text{class width}}$
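
To illustrate the adjustment, the sketch below computes frequency densities for a small distribution with unequal class widths. The data are hypothetical (they do not come from this chapter) and serve only to show the calculation.

```python
# Hypothetical grouped data with unequal class widths (not from the text).
boundaries = [(0.0, 10.0), (10.0, 20.0), (20.0, 40.0)]   # (LCB, UCB) of each class
frequencies = [5, 8, 12]

# Frequency density = class frequency / class width
densities = [f / (ucb - lcb) for (lcb, ucb), f in zip(boundaries, frequencies)]

for (lcb, ucb), f, d in zip(boundaries, frequencies, densities):
    print(f"{lcb}-{ucb}: frequency = {f}, width = {ucb - lcb}, density = {d}")
```

Although the last class has the largest frequency, it is twice as wide as the others, so its rectangle is lower than that of the middle class.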

A histogram gives us an idea about the shape of the data distribution. It can indicate to us,
graphically, where the center of the data distribution lies. It will also reveal whether the
distribution is symmetric or skewed.

Example: Construct a histogram for the following frequency distribution.

Class limits   Frequency (fi)   Class boundary
10-19          4                9.5-19.5
20-29          5                19.5-29.5
30-39          8                29.5-39.5
40-49          6                39.5-49.5
50-59          2                49.5-59.5

Solution:

[Figure: histogram of the above frequency distribution. Horizontal axis: class boundary (from 9.5 to 59.5); vertical axis: frequency (0 to 10).]

Frequency polygon

A frequency polygon is a line chart of a frequency distribution in which either the values of a discrete variable or the class marks of the classes are plotted against the frequencies, and the plotted points are joined by straight lines.

It is thus a graphic presentation tool that may be used as an alternative to the histogram. For a
large number of classes a frequency polygon is preferable.

For a frequency distribution where class intervals are equal, the area of frequency polygon is
equal to the area of the histogram.

Example: Construct a frequency polygon for the above frequency distribution.

Solution:

Class limits   Frequency   Class marks
10 - 19        4           14.5
20 - 29        5           24.5
30 - 39        8           34.5
40 - 49        6           44.5
50 - 59        2           54.5

[Figure: frequency polygon of the above distribution. Horizontal axis: class mark (4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5); vertical axis: frequency (0 to 10).]

Remark: We close the polygon by joining it to imaginary class marks, with zero frequency, to the left and to the right of the extreme class marks.

Cumulative frequency curve (ogive)

It is the graphic representation of cumulative frequency distribution.

The ogive curve can be traced either on less than basis or more than basis.

a) 'Less than' ogive: Upper class boundaries are plotted against the 'less than' cumulative frequencies.
b) 'More than' ogive: Lower class boundaries are plotted against the 'more than' cumulative frequencies.

Example: Construct (a) the 'less than' ogive and (b) the 'more than' ogive for the above frequency distribution.

Solution:

Class limits   Frequency   Less than cumulative frequency   More than cumulative frequency
10 - 19        4           4                                25
20 - 29        5           9                                21
30 - 39        8           17                               16
40 - 49        6           23                               8
50 - 59        2           25                               2

[Figure: 'less than' ogive. Horizontal axis: upper class boundary (9.5 to 59.5); vertical axis: 'less than' cumulative frequency (0 to 25).]

[Figure: 'more than' ogive. Horizontal axis: lower class boundary (9.5 to 59.5); vertical axis: 'more than' cumulative frequency (0 to 25).]
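
The points to be plotted for the two ogives can be listed with a short Python sketch. It uses the class boundaries and frequencies of the example and makes one assumption that is common practice: each curve is extended to the boundary where its cumulative frequency is zero (9.5 for the 'less than' ogive, 59.5 for the 'more than' ogive). Variable names are illustrative.

```python
from itertools import accumulate

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]   # all class boundaries in order
freq = [4, 5, 8, 6, 2]
total = sum(freq)

less_cf = [0] + list(accumulate(freq))    # number of observations below each boundary
more_cf = [total - c for c in less_cf]    # number of observations above each boundary

less_than_points = list(zip(boundaries, less_cf))
more_than_points = list(zip(boundaries, more_cf))

print("Less than ogive:", less_than_points)
print("More than ogive:", more_than_points)
```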

2.2.3 DIAGRAMMATIC PRESENTATION OF DATA


When the basis of classification is not quantitative, i.e., when the data are of an attribute nature, statistical data can be presented diagrammatically using charts. The chart could be a bar chart, a pie-chart or a pictogram, each of which has specific uses depending upon the nature of the information to be depicted.

BAR CHARTS
Bar charts are drawn almost in the same way as graphs. Data are presented by a series of bars, the height of each bar showing the size of the observation represented.

While drawing bar charts:

i. The width of the bars should be kept uniform, and
ii. The gaps between successive bars should remain the same.

There are four main types of bar charts serving different purposes. These are simple bar charts, component bar charts, percentage component bar charts and multiple bar charts.

Simple bar chart

In a simple bar chart, each bar represents one and only one figure. A simple bar chart is usually constructed to represent totals only.

Example: The following table shows the number of students attending four departments.

Department           Mathematics   Statistics   Physics   Chemistry
Number of students   56            45           40        50

Construct a simple bar chart for the above table.

Solution:

[Figure: simple bar chart. Horizontal axis: department (Math., Stat., Physics, Chemistry); vertical axis: number of students (0 to 60); bar heights 56, 45, 40 and 50.]

Component (sub-divided) bar chart

The component bar chart shows the breakdown of an aggregate into its constituent parts for a given year, place or sector. In this type of chart it is possible to compare changes in the parts as well as in the aggregate.

Example: The table and chart below show the revenue expenditure of a country on education.

Education Expenditure (in million)

                   1978-80   1980-81   1981-82
Primary            60        80        40
Secondary          40        60        60
Higher Education   20        40        20
Total              120       180       120

[Figure: component bar chart of education expenditure by year (1978-80, 1980-81, 1981-82), each bar subdivided into Primary, Secondary and Higher Education; vertical axis: 0 to 200.]

Multiple bar charts

Here the interrelated component parts are shown in adjoining bars, colored or marked differently, thus allowing comparison between the different parts.

Example: The chart below shows the revenue expenditure of a country on education.

[Figure: multiple bar chart of education expenditure by year (1978-80, 1980-81, 1981-82), with adjoining bars for Primary, Secondary and Higher Education; vertical axis: 0 to 90.]

Pie-chart (angular chart)

A pie-chart is a circle divided by radial lines into sections or slices so that the area of each section is proportional to the size of the amount represented. It is a simple descriptive display of data that sum to a given total. A pie-chart is probably the most illustrative way of displaying quantities as percentages of a given total. The total area of the pie represents 100 percent of the quantity of interest (the sum of the variable values in all the categories that the slices denote).

Thus, a pie-chart indicates relative frequencies by slicing up a circle into distinct sectors.

The sum of the angles at a point being 360°, the component parts of the data are expressed as proportions of 360° and the sectors of the circle represent these parts. The angle corresponding to each component is obtained by dividing the amount for that item by the total and multiplying by 360°; the sectors are then drawn by means of a protractor.

In order to draw a pie-chart, it is convenient to form beforehand a table of percentages and the corresponding angles to be drawn at the center of the circle.

Example: The following table shows the monthly expenses of a family with an income of 1000 Birr.

Item               Food   Clothing   Rent   Others   Total
Amount (in Birr)   400    200        250    150      1000

Solution:

Item       Amount (in Birr)   Degrees (size of central angle)   Amount (in percentages)
Food       400                144                               40
Clothing   200                72                                20
Rent       250                90                                25
Others     150                54                                15
Total      1000               360                               100
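
The percentages and central angles in the table can be reproduced with a few lines of Python. This is a minimal sketch using the amounts from the example; the variable names are illustrative.

```python
expenses = {"Food": 400, "Clothing": 200, "Rent": 250, "Others": 150}   # in Birr
total = sum(expenses.values())

# Central angle = amount / total * 360 degrees; percentage = amount / total * 100
angles = {item: 360 * amount / total for item, amount in expenses.items()}
percents = {item: 100 * amount / total for item, amount in expenses.items()}

print("Item      Amount  Degrees  Percent")
for item, amount in expenses.items():
    print(f"{item:<9} {amount:>6}  {angles[item]:>7.0f}  {percents[item]:>7.0f}")
```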

[Figure: pie-chart for the above table. Food 40%, Clothing 20%, Rent 25%, Others 15%.]