
University of Gondar

College of medicine and health science


Department of Epidemiology and Biostatistics

Chapter two: Method of data collection and presentation

Wullo S. (MPH)
Reading assignment
Methods of data collection

The most common modes of collecting data can be summarized as:
Observation
Personal (Face-to-Face) interview
Telephone
Group administered surveys
Mail
Web based survey
Combination of Methods
Methods of Data Presentation

Objectives of the chapter

After completing this chapter, the student will be able to:
– Identify the different methods of medical and biological
data organization and presentation
– Identify the criteria for selecting a method to organize and present data
Organization and Presentation of data

• Having collected and edited the data, the next important step is to organize it.
• The process of arranging data into classes or categories according to their similarities is called classification.
• Classification is a preliminary step that prepares the ground for proper presentation of data.
• The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and graphic presentation

9/30/21 Wullo S. 4
Tabular presentation of data
• Frequency distribution: the organization of raw data in table form using classes and frequencies.
• Frequency: the number of values in a specific class of the distribution.
• Raw data: recorded information in its original collected form, whether counts or measurements.

Frequency distribution (F.D.)…

Frequency distribution can be grouped or ungrouped.

The ungrouped frequency distribution is further classified as:
– Ungrouped FD for Categorical Variables
– Ungrouped FD for Discrete Variables

i. Categorical Frequency Distribution
– Data are classified according to non-numerical categories.
– Categories must be mutually exclusive and exhaustive.
– Used to present nominal and ordinal data.
Ungrouped FD for Categorical Variables

a) Nominal data: Here the construction is straightforward: count the occurrences in each category and find the totals.
Example: The marital status of 60 adults, classified as single, married, divorced and widowed, is presented in a FD as below:

Marital status   Single   Married   Divorced   Widowed   Total
Frequency        25       20        8          7         60
b) Ordinal data: The construction is identical to the nominal case. However, the categories should be put in order.
Example: Satisfaction with the teaching method in a class of size 80 is presented in a FD as shown below.

Satisfaction   Very satisfied   Satisfied   Dissatisfied   Very dissatisfied   Total
Frequency      15               36          3              7                   60
Ungrouped FD for Discrete Variables

a) Ungrouped Discrete Frequency Distribution
– Count the number of times each possible value is repeated.
Example: In a survey of 30 families, the number of children per family was recorded, giving the following data:
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4 5 4 5 4 3 5 2 7 3 3 6 7 3 8 4.
These individual observations can be arranged in ascending order of magnitude to form an array: 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 6 7 7 8 8 8.
The distribution of children in the 30 families would be:

No. of children        2   3   4   5   6   7   8   Total
No. of families (f)    5   7   8   4   1   2   3   30
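Counting how often each value occurs is exactly what a tally does, so the ungrouped discrete FD can be checked with a short Python sketch. It uses the standard-library `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# Number of children in each of the 30 surveyed families (data above)
children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2,
            2, 8, 5, 3, 4, 5, 4, 5, 4, 3,
            5, 2, 7, 3, 3, 6, 7, 3, 8, 4]

# Count how many times each distinct value occurs
freq = Counter(children)

# Print the distribution in ascending order of value (the "array" order)
for value in sorted(freq):
    print(value, freq[value])
print("Total", sum(freq.values()))
```

The printed counts reproduce the table: 5, 7, 8, 4, 1, 2 and 3 families for 2 to 8 children.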
Continuous/grouped F.D

b) Continuous Frequency Distribution
– Arises from continuous variables/data.
– Unlike for a discrete FD, a class cannot be allocated to each value of a continuous variable.
– The categories into which the observations are distributed are called classes or class intervals.
– Classes should be exhaustive and mutually exclusive.
Example: Consider the following FD on wages of 100 workers in a factory (CI = class interval, CB = class boundary).

Wage (CI)   40-44       45-49       50-54       55-59       60-64       65-69       70-74       75-79
Freq.       6           9           15          17          20          13          12          8
CBs         39.5-44.5   44.5-49.5   49.5-54.5   54.5-59.5   59.5-64.5   64.5-69.5   69.5-74.5   74.5-79.5

– Workers earning between 40 and 44 birr (inclusive) are grouped into the first class, workers earning between 45 and 49 birr (inclusive) into the second class, and so on.
Steps in constructing a continuous frequency distribution

1. Determine the number of classes (k), i.e. the number of class intervals into which the observations will be grouped.
– Decide k with the help of Sturges' rule:
k = 1 + 3.322 log n, rounded to the nearest integer,
where n is the number of observations and log is the common logarithm (base 10).
Example: If n = 10, k = 4.32 ≈ 4; if n = 100, k = 7.64 ≈ 8; if n = 1000, k = 10.97 ≈ 11.
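Sturges' rule is a one-line formula, so a small Python sketch can reproduce the worked values. The function name `sturges_k` is ours, and it rounds to the nearest integer because that is what the examples above do (n = 10 gives 4.32, reported as 4).

```python
import math


def sturges_k(n: int) -> int:
    """Sturges' rule: k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


for n in (10, 50, 100, 1000):
    raw = 1 + 3.322 * math.log10(n)
    print(f"n = {n}: k = {raw:.2f} -> {sturges_k(n)}")
```

For n = 10, 50, 100 and 1000 this gives k = 4, 7, 8 and 11; n = 50 is the case used in Examples 1 and 2 below.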
2. Determine the class width (w): the difference between successive lower (or upper) class limits or class boundaries.
We use
$w = \dfrac{\text{Range}}{k}$,
where Range = highest value - lowest value, and w is rounded up to the nearest whole number.
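Rounding w up (rather than to the nearest value) guarantees that k classes of width w cover the whole range. A minimal sketch, assuming data recorded in whole units; the function name `class_width` is hypothetical.

```python
import math


def class_width(highest: float, lowest: float, k: int) -> int:
    """Class width w = Range / k, rounded up to the next whole number."""
    return math.ceil((highest - lowest) / k)


print(class_width(88, 42, 7))  # Example 1: 46 / 7 = 6.57 -> 7
print(class_width(83, 16, 7))  # Example 2: 67 / 7 = 9.57 -> 10
```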
3. Determine the class limits
– Class limits are the lowest and highest values that can be included in a class; successive classes are separated by a gap of one unit of measurement.
– The lower class limit of the first class is usually the smallest observation.
– Add the class width to each lower class limit to obtain the lower class limit of the next class.
– To find the upper limit of the first class, subtract U (the unit of measurement, see step 4) from the lower limit of the second class. Then keep adding the class width to this upper limit to find the remaining upper limits; or
– Obtain the upper class limits by adding the class width minus one unit of measurement to the corresponding lower class limits, i.e. UCL = LCL + (w - U), which is UCL = LCL + (w - 1) when U = 1.
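Both rules for the limits can be expressed in a few lines. This is a sketch, assuming integer-valued data with U = 1 by default; `class_limits` is a hypothetical helper, not part of any library.

```python
def class_limits(first_lcl, w, k, unit=1):
    """Return (LCL, UCL) pairs: each LCL is the previous LCL + w, and UCL = LCL + (w - unit)."""
    limits = []
    for i in range(k):
        lcl = first_lcl + i * w       # lower limit of class i + 1
        ucl = lcl + (w - unit)        # upper limit one unit below the next LCL
        limits.append((lcl, ucl))
    return limits


print(class_limits(42, 7, 7))   # the classes of Example 1 below
print(class_limits(16, 10, 7))  # the classes of Example 2 below
```

The first call yields 42-48, 49-55, ..., 84-90 and the second 16-25, ..., 76-85, matching the worked examples.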
4. Determine the class boundaries
– Let U = LCL of a class - UCL of the preceding class. Add half of this difference (U/2) to all upper class limits to get the upper class boundaries (UCBs), and subtract U/2 from all lower class limits to get the lower class boundaries (LCBs):
– UCBi = UCLi + U/2
– LCBi = LCLi - U/2

Here U is the unit of measurement: the distance between two possible consecutive measurements. It is usually 1, 0.1, 0.01, 0.001, and so on.
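Applying UCBi = UCLi + U/2 and LCBi = LCLi - U/2 to the wage classes above gives their class boundaries. A minimal sketch with U = 1; the helper name `class_boundaries` is illustrative.

```python
def class_boundaries(limits, unit=1):
    """Convert (LCL, UCL) pairs to (LCB, UCB): LCB = LCL - U/2, UCB = UCL + U/2."""
    half = unit / 2
    return [(lcl - half, ucl + half) for lcl, ucl in limits]


# Class limits of the factory wage example
wage_limits = [(40, 44), (45, 49), (50, 54), (55, 59),
               (60, 64), (65, 69), (70, 74), (75, 79)]
print(class_boundaries(wage_limits))
```

The output runs from (39.5, 44.5) to (74.5, 79.5), the CB row of the wage table; note that each UCB equals the next class's LCB, so boundaries leave no gaps.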
5. Class mark (C.M.) or midpoint: the average of the lower and upper class limits, or equivalently of the lower and upper class boundaries.
6. Determine the frequency of each class by counting the number of observations belonging to it.
7. Cumulative frequency: the number of observations less than (or more than) or equal to a specific value.
8. Cumulative frequency above: the total frequency of all values greater than or equal to the lower class boundary of a given class.
9. Cumulative frequency below: the total frequency of all values less than or equal to the upper class boundary of a given class.
10. Relative frequency (rf): the frequency divided by the total frequency.
11. Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.
Example 1
The blood glucose level for 50 patients is shown below.

Construct a frequency distribution for the following data.

44 50 79 63 66 54 56 70 56 63

60 87 60 70 59 60 62 88 71 53

56 65 74 80 51 83 69 77 69 50

58 42 43 85 43 75 55 60 58 49

72 67 55 77 48 45 61 47 44 61
Solution:

Step 1: Find the highest and the lowest values: H = 88, L = 42.

Step 2: Find the range: R = H - L = 88 - 42 = 46.

Step 3: Select the number of classes using Sturges' formula: k = 1 + 3.322 log(50) = 6.64 ≈ 7.

Step 4: Find the class width: w = R/k = 46/7 = 6.57 ≈ 7 (rounding up).
Step 5: Select the starting observation as the lowest class limit (usually the lowest observation). Add the width to that observation to get the lower limit of the next class, and keep adding until there are 7 classes.
42, 49, 56, 63, 70, 77, 84 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = LCL of the second class - U = 49 - 1 = 48.
48, 55, 62, 69, 76, 83, 90 are the upper class limits.
Combining steps 5 and 6, one can construct the following classes.

Class limits
42-48
49-55
56-62
63-69
70-76
77-83
84-90
Step 7: Find the class boundaries by subtracting U/2 = 0.5 from each lower class limit and adding 0.5 to each upper class limit:
$LCB_i = LCL_i - U/2$ and $UCB_i = UCL_i + U/2$.
Example: For class 1, $LCB_1 = 42 - 0.5 = 41.5$ and $UCB_1 = 48 + 0.5 = 48.5$.
Then keep adding w to both boundaries to obtain the remaining boundaries. Doing so gives the following classes.
Class boundary
41.5 – 48.5
48.5 – 55.5
55.5 – 62.5
62.5 – 69.5
69.5 – 76.5
76.5 – 83.5
83.5 – 90.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the
frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency and /or relative
cumulative frequency.
The complete frequency distribution follows:

Class    Class          Class  Freq.  <CF   >CF   RF     <RCF   >RCF
limits   boundaries     mark
42-48    41.5 – 48.5    45     8      8     50    0.16   0.16   1.00
49-55    48.5 – 55.5    52     8      16    42    0.16   0.32   0.84
56-62    55.5 – 62.5    59     13     29    34    0.26   0.58   0.68
63-69    62.5 – 69.5    66     7      36    21    0.14   0.72   0.42
70-76    69.5 – 76.5    73     6      42    14    0.12   0.84   0.28
77-83    76.5 – 83.5    80     5      47    8     0.10   0.94   0.16
84-90    83.5 – 90.5    87     3      50    3     0.06   1.00   0.06
Total                          50                 1.00
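Every column of the table follows mechanically from the raw data, k, w and U, so the whole of Example 1 can be recomputed as a check. This is a sketch under the assumption that observations are recorded in whole units (U = 1), so each value falls between some LCL and UCL; `frequency_table` is a hypothetical helper.

```python
# Blood glucose levels of the 50 patients in Example 1
glucose = [44, 50, 79, 63, 66, 54, 56, 70, 56, 63,
           60, 87, 60, 70, 59, 60, 62, 88, 71, 53,
           56, 65, 74, 80, 51, 83, 69, 77, 69, 50,
           58, 42, 43, 85, 43, 75, 55, 60, 58, 49,
           72, 67, 55, 77, 48, 45, 61, 47, 44, 61]


def frequency_table(data, first_lcl, w, k, unit=1):
    """Build one row per class with limits, boundaries, mark, f, CFs, RF and RCFs."""
    n = len(data)
    rows = []
    less_cf = 0
    for i in range(k):
        lcl = first_lcl + i * w
        ucl = lcl + (w - unit)
        f = sum(lcl <= x <= ucl for x in data)  # count observations in this class
        more_cf = n - less_cf                    # this class and all later classes
        less_cf += f                             # this class and all earlier classes
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),
            "mark": (lcl + ucl) / 2,
            "f": f,
            "less_cf": less_cf,
            "more_cf": more_cf,
            "rf": f / n,
            "less_rcf": less_cf / n,
            "more_rcf": more_cf / n,
        })
    return rows


for row in frequency_table(glucose, first_lcl=42, w=7, k=7):
    print(row["limits"], row["boundaries"], row["mark"], row["f"],
          row["less_cf"], row["more_cf"], f'{row["rf"]:.2f}',
          f'{row["less_rcf"]:.2f}', f'{row["more_rcf"]:.2f}')
```

The ">CF" column is computed before the running total is updated, so it counts the current class plus every later class, as in definition 8.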
Example 2: Construct a continuous FD for the following raw data on the ages of patients admitted to Felege Hiwot Hospital in a given week.

57, 53, 65, 55, 50, 45, 64, 52, 16, 46,
42, 63, 33, 64, 53, 25, 54, 35, 48, 55,
70, 47, 39, 58, 52, 36, 65, 75, 26, 20,
55, 60, 83, 61, 45, 63, 49, 42, 35, 18,
51, 45, 42, 65, 39, 59, 45, 41, 30, 40.
Solution:
i. Using Sturges' rule, the number of classes is k = 1 + 3.322 log 50 = 6.64 ≈ 7.
ii. Range = highest value - lowest value = 83 - 16 = 67.
iii. Class width: $w = \dfrac{\text{Range}}{k} = \dfrac{67}{7} = 9.57 \approx 10$.
iv. Since the smallest value is 16, LCL1 is 16 and UCL1 = 16 + (10 - 1) = 25; the frequency distribution would look like:
Here is the FD:
Ages Freq.
16-25 4
26-35 5
36-45 12
46-55 14
56-65 12
66-75 2
76-85 1
Total 50
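The grouping in Example 2 can be reproduced by stepping through the classes from the smallest observation. A minimal sketch, assuming ages are recorded in whole years (U = 1); the variable names are illustrative.

```python
# Ages of the 50 patients in Example 2
ages = [57, 53, 65, 55, 50, 45, 64, 52, 16, 46,
        42, 63, 33, 64, 53, 25, 54, 35, 48, 55,
        70, 47, 39, 58, 52, 36, 65, 75, 26, 20,
        55, 60, 83, 61, 45, 63, 49, 42, 35, 18,
        51, 45, 42, 65, 39, 59, 45, 41, 30, 40]

w, k = 10, 7
lcl = min(ages)                 # LCL1 = smallest observation = 16
table = []
for _ in range(k):
    ucl = lcl + w - 1           # unit of measurement is 1 year
    f = sum(lcl <= a <= ucl for a in ages)
    table.append(((lcl, ucl), f))
    lcl += w                    # next class starts one width higher

for (low, high), f in table:
    print(f"{low}-{high}: {f}")
print("Total:", sum(f for _, f in table))
```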
Example: The class marks (CM) and class boundaries (CB) for the class limits (CL) of Example 2 are:
CL Freq. CM CB
16-25 4 20.5 15.5-25.5
26-35 5 30.5 25.5-35.5
36-45 12 40.5 35.5-45.5
46-55 14 50.5 45.5-55.5
56-65 12 60.5 55.5-65.5
66-75 2 70.5 65.5-75.5
76-85 1 80.5 75.5-85.5
Total 50
Cumulative frequency distributions
– Tell us how many observations fall below or above the boundaries of each class. There are two types of CFD:

The “less than” cumulative F.D.
– Obtained by adding the frequencies of all the preceding classes, including the frequency of that class.

The “more than” cumulative F.D.
– Obtained by adding the frequencies of all the succeeding classes, including the frequency of that class.
Example: For the age data in Example 2, both cumulative frequency distributions are given below:

Less than cum. freq.       More than cum. freq.
Age       Cum. freq.       Age       Cum. freq.
< 25.5    4                > 15.5    50
< 35.5    9                > 25.5    46
< 45.5    21               > 35.5    41
< 55.5    35               > 45.5    29
< 65.5    47               > 55.5    15
< 75.5    49               > 65.5    3
< 85.5    50               > 75.5    1
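Both cumulative distributions are running totals of the class frequencies: from the first class for "less than" and from the last class for "more than". A short sketch using the standard-library `itertools.accumulate`:

```python
from itertools import accumulate

# Frequencies and lower class boundaries of the age distribution (Example 2)
freqs = [4, 5, 12, 14, 12, 2, 1]
lcbs = [15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5]
ucbs = [b + 10 for b in lcbs]                        # class width w = 10

less_than = list(accumulate(freqs))                  # running totals from the first class
more_than = list(accumulate(reversed(freqs)))[::-1]  # running totals from the last class

for ub, cf in zip(ucbs, less_than):
    print(f"< {ub}: {cf}")
for lb, cf in zip(lcbs, more_than):
    print(f"> {lb}: {cf}")
```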
Diagrammatic and Graphical Methods of Data Presentation
A F.D can be presented graphically or diagrammatically.
Advantages
• To make the information easy to understand.
• To make the data attractive.
• To make comparisons of items easy.
• To draw the observer's attention.
• The purpose of graphs and diagrams is not to provide exact and detailed information but to allow simple comparisons. Any further information should be obtained from the original data.
2.2.2 Diagrammatic Presentation of Data

Diagrams are appropriate for presenting discrete as well as qualitative data.
The three most commonly used diagrammatic presentations of data are:
• Pie charts
• Bar charts
• Pictograms
Pie Chart

• A pie chart can be used to compare the relation between the whole and its components.
• A pie chart is useful for depicting discrete variables with relatively few categories.
• A pie chart is a circular diagram in which the whole circle represents the total and the area of each sector is proportional to the size of the component it represents.
Steps in constructing a pie chart
1. Construct a frequency table.
2. Change the frequencies into percentages (P) or fractions (F).
3. Change the percentages into degrees, where:
$\text{Angle of sector} = \dfrac{\text{Component part}}{\text{Total}} \times 360^\circ$
4. Draw a circle and divide it accordingly.
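Steps 2 and 3 can be illustrated on the marital status table from the start of the chapter (the original budget table for Example 2.4 is not available here). A minimal sketch; computing 360 × f / total directly is equivalent to (percentage / 100) × 360°.

```python
# Marital status of 60 adults (frequency table earlier in this chapter)
counts = {"Single": 25, "Married": 20, "Divorced": 8, "Widowed": 7}
total = sum(counts.values())

angles = {}
for category, f in counts.items():
    percent = 100 * f / total            # step 2: frequency -> percentage
    angles[category] = 360 * f / total   # step 3: angle of sector in degrees
    print(f"{category}: {percent:.1f}% -> {angles[category]:.0f} degrees")
```

The four sectors come to 150°, 120°, 48° and 42°, which add up to the full 360° circle.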
Example 2.4: The following table gives the details of the monthly budget of a family. Represent these figures by a suitable diagram.

[Figure: pie chart of the family's monthly budget; components: food, clothing, house rent, fuel and light, miscellaneous]

Bar Chart

Bar charts (simple, multiple and stacked/component) are used to represent and compare the frequency distributions of discrete variables and of attributes or categorical series.
• Vertical or horizontal bars are used to represent the frequencies of a distribution. When drawing a bar chart, the following points should be considered:
Tips for constructing a bar chart
1. Whenever possible, it is better to construct a bar diagram on graph paper
2. All bars drawn in any single study should be of the same
width

3. The different bars should be separated by equal distances


4. All the bars should rest on the same line called the base
5. Whenever possible, it is advisable to draw bars in order of
magnitude
Simple Bar Chart

• is used to represent data involving only one variable classified on a spatial, quantitative or temporal basis.
Example: Draw a simple bar diagram to represent the profits of a bank for 5 years.
Multiple Bar Chart

• is used when two or more sets of inter-related data are represented (a multiple bar diagram facilitates comparison between more than one phenomenon).

Example: Draw a multiple bar chart to represent the imports and exports of Canada (values in $) for the years 1991 to 1995.
Component Bar Chart
• is used to represent data in which the total magnitude is divided into different parts or components.
Example 2.7: The table below shows the quantity, in hundreds of kg, of wheat, barley and oats produced on a certain farm during the years 1991 to 1994. Draw a component (stratified) bar chart.
2.2.3 Graphical Presentation of Data
The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly applied graphical representations of continuous data.
Procedures for constructing statistical graphs
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
• Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.
• Plot the points.
Box plots

• A box (box-and-whisker) plot is a visual picture that conveys a fair amount of information about the distribution of a set of data.
• It is used as an exploratory data analysis tool.
• The box spans the distance between the first and the third quartiles.
• The median is marked as a line within the box.
• The end lines (whiskers) extend to the minimum and maximum values, respectively.
A box plot displays the five-number summary:
– The minimum entry
– Q1
– Q2 (median)
– Q3
– The maximum entry
The quartiles are the values that divide the ordered distribution into four parts with an equal number of observations in each part. Their positions in the ordered data are:
– Q1 = [(n+1)/4]th observation
– Q2 = [2(n+1)/4]th observation
– Q3 = [3(n+1)/4]th observation
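Example: Use the following age data of 15 patients to draw a box-and-whisker plot.

35 35 36 37 37 38 42 43 43 44 45 48 48 51 55

With n = 15, the quartile positions are (15+1)/4 = 4, 2(15+1)/4 = 8 and 3(15+1)/4 = 12, so Min = 35, Q1 = 37, Q2 = 43, Q3 = 48 and Max = 55.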
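The five-number summary can be computed from the (n+1)/4 position rule. The positions here are whole numbers; for other n a position such as 4.25 can fall between two observations, and this sketch assumes linear interpolation between the neighbouring values in that case (the slides do not say how to handle it). The function name `quartile` is hypothetical.

```python
def quartile(sorted_data, q):
    """Value at 1-based position q(n+1)/4, interpolating linearly between neighbours."""
    pos = q * (len(sorted_data) + 1) / 4
    lower = int(pos)                   # whole-number part of the position
    frac = pos - lower                 # fractional part, 0 when the position is exact
    if lower < 1:
        return sorted_data[0]
    if lower >= len(sorted_data):
        return sorted_data[-1]
    value = sorted_data[lower - 1]
    if frac:
        value += frac * (sorted_data[lower] - sorted_data[lower - 1])
    return value


ages = sorted([35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55])
summary = (ages[0], quartile(ages, 1), quartile(ages, 2), quartile(ages, 3), ages[-1])
print(summary)  # (35, 37, 43, 48, 55)
```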
[Figure: Illustration of a box plot using the ages of the 15 patients. Notice the spread of the data in each quarter (the distance between quartiles).]

[Figure: Box plots showing the distribution of blood lead level of individuals by sex]
Histogram

• A graph that places the class boundaries on the horizontal axis and the frequencies on the vertical axis.
• Class marks and class limits are sometimes used as the quantity on the X axis.
• Example: Construct a histogram using the following data.
Example*: Use the frequency distribution of blood glucose levels for the 50 patients constructed in Example 1 (class boundaries 41.5 – 48.5 through 83.5 – 90.5).
[Figure: Histogram; X axis: blood glucose level (class boundaries), Y axis: number of patients]
Frequency Polygon
A line graph of class frequencies plotted against class marks. To draw a frequency polygon, connect the points at the class marks (the midpoints of the class boundaries) on top of the histogram bars with straight lines.
[Figure: Frequency polygon; X axis: class marks of blood glucose level (38 to 94), Y axis: frequency (number of patients)]
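The points of the frequency polygon for the blood glucose data are the (class mark, frequency) pairs. The polygon's X axis runs from 38 to 94, which suggests that an empty class was added at each end so the polygon closes on the axis; this sketch assumes that convention.

```python
# Class marks and frequencies of the blood glucose distribution (Example 1)
marks = [45, 52, 59, 66, 73, 80, 87]
freqs = [8, 8, 13, 7, 6, 5, 3]
w = 7  # class width

# Close the polygon on the X axis by adding a zero-frequency class at each end
xs = [marks[0] - w] + marks + [marks[-1] + w]
ys = [0] + freqs + [0]
points = list(zip(xs, ys))
print(points)
```

Joining these points in order with straight lines gives the frequency polygon.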
Ogive (cumulative frequency polygon)
• A graph showing the cumulative frequency (less-than or more-than type) plotted against the upper or lower class boundaries, respectively.
• That is, class boundaries are plotted along the horizontal axis and the corresponding cumulative frequencies along the vertical axis.
• The points are joined by a freehand curve.
• Example: Draw a less-than ogive for the blood glucose data (Example*).
[Figure: Ogive graph (cumulative less-than type); X axis: class boundaries 41.5 to 90.5, Y axis: cumulative frequency]
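For a less-than ogive, each upper class boundary is paired with the "<CF" value of its class, and the curve starts at zero on the first lower boundary (41.5, the left end of the ogive's axis). A short sketch:

```python
from itertools import accumulate

# Class boundaries and frequencies of the blood glucose distribution (Example 1)
boundaries = [41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5, 90.5]
freqs = [8, 8, 13, 7, 6, 5, 3]

# Cumulative "less than" frequency at each boundary, starting from 0 at 41.5
cum = [0] + list(accumulate(freqs))
points = list(zip(boundaries, cum))
print(points)
```

The last point, (90.5, 50), reaches the total number of patients, as every less-than ogive must.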
