Contents
2.1 Sources of data
2.1.1 Methods of primary data collection
2.1.2 Methods of secondary data collection
2.2 Methods of data presentation
2.3 Frequency distribution
2.3.1 Categorical frequency distribution
2.3.2 Ungrouped frequency distribution
2.3.3 Grouped frequency distribution
2.4 Diagrammatic and Graphical Presentation of Data
“I've come loaded with statistics, for I've noticed that a man can't prove anything without
statistics”
M. TWAIN
abebuabebaw@yahoo.com
Chapter two: Methods of data presentation
Questionnaires are a popular means of collecting data, but they are difficult to design
and often require many rewrites before an acceptable questionnaire is produced.
Advantages:
Disadvantages:
Design problems
Questions have to be relatively simple.
Historically low response rate (although inducements may help).
Time delay whilst waiting for responses to be returned.
Require a return deadline.
Several reminders may be required.
Assumes no literacy problems.
No control over who completes it.
Not possible to give assistance if required.
Problems with incomplete questionnaires.
Replies not spontaneous and independent of each other.
The respondent can read all the questions beforehand and then decide
whether or not to complete the questionnaire, for example because it is too
long, too complex, uninteresting, or too personal.
Advantages:
Disadvantages:
Advantages:
Disadvantages:
Subjects need to be clear about what they are being asked to do,
why and what you plan to do with the data.
Diarists need to be of a certain educational level.
Some structure is necessary to give the diarist focus, for example, a
list of headings.
Encouragement and reassurance are needed as completing a diary
is time-consuming and can be irritating after a while.
Progress needs checking from time to time.
Confidentiality is required as content may be critical.
Analysis problems: you need to consider how responses will be
coded before the subjects start filling in diaries.
1. Identify whether the data are on a nominal or ordinal scale of measurement.
2. Make a table as shown below.
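The two steps above can be sketched in Python as follows. This is a minimal illustration, not the chapter's own method: the function name `categorical_frequency_table` and the blood-type data are hypothetical, and the percent column is an assumption about what such a table usually contains.

```python
from collections import Counter


def categorical_frequency_table(observations):
    """Return (category, frequency, percent) rows for nominal or ordinal data."""
    counts = Counter(observations)
    n = len(observations)
    # Rows follow the order in which categories first appear; an ordinal
    # variable would normally be re-ordered by its natural ranking.
    return [(category, f, 100 * f / n) for category, f in counts.items()]


# Hypothetical blood types of 10 people (illustration only)
blood_types = ["A", "O", "B", "O", "AB", "O", "A", "B", "O", "A"]
for category, f, pct in categorical_frequency_table(blood_types):
    print(f"{category:<3}{f:>4}{pct:>7.1f}%")
```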
B. Since 12 of the 50 workers had no days of sick leave, the answer is 50 − 12 = 38.
C. The answer is the sum of the frequencies for the values 3, 4 and 5, that is, 4 + 5 + 8 = 17.
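Answers B and C only need the ungrouped frequency table, not the raw data. A minimal sketch, assuming the data are the numbers of sick-leave days per worker; the function name and the 12 observations below are hypothetical, not the 50 workers of the original example.

```python
from collections import Counter


def ungrouped_frequency_table(values):
    """Map each distinct value to its frequency, in increasing order of value."""
    return dict(sorted(Counter(values).items()))


# Hypothetical sick-leave days for 12 workers (illustration only)
days = [0, 2, 3, 0, 5, 1, 4, 0, 3, 5, 2, 4]
table = ungrouped_frequency_table(days)

# Question B style: workers with at least one day = total minus frequency of 0
at_least_one_day = len(days) - table.get(0, 0)
# Question C style: add the frequencies of the values 3, 4 and 5
three_to_five_days = sum(table.get(v, 0) for v in (3, 4, 5))

print(table)               # {0: 3, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
print(at_least_one_day)    # 9
print(three_to_five_days)  # 6
```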
III. Grouped Frequency Distribution
When the range of the data is large, the data must be grouped into classes that are more than
one unit wide.
Definition of some basic terms
• Grouped frequency distribution: a frequency distribution (FD) in which several values are
grouped into one class.
• Class limits (CL): the smallest and largest values that can belong to a class; they separate one
class from another. The limits can actually appear in the data, and there is a gap between the
upper limit of one class and the lower limit of the next class.
• Unit of measure (U): the smallest possible difference between successive values, e.g. 1,
0.1, 0.01, 0.001, …
• Class boundaries: separate one class in a grouped frequency distribution from the next.
A boundary has one more decimal place than the raw data, and there is no gap between the
upper boundary of one class and the lower boundary of the succeeding class. The lower
class boundary is found by subtracting half of the unit of measure from the lower class
limit, and the upper class boundary is found by adding half of the unit of measure to the
upper class limit.
• Class width (W): the difference between the upper and lower boundaries of any class.
The class width is also the difference between the lower limits (or the upper limits) of two
consecutive classes.
• Class mark (midpoint): found by adding the lower and upper class limits (or boundaries)
and dividing the sum by two.
• Cumulative frequency (CF): the number of observations less than the upper class
boundary, or greater than the lower class boundary, of a class.
• CF (less than type): the number of values less than the upper class boundary of a
given class.
• CF (greater than type): the number of values greater than the lower class boundary
of a given class.
• Relative frequency (Rf): the frequency divided by the total frequency. This gives the
proportion of values falling in that class; multiplied by 100, it gives the percent.
$$Rf_i = \frac{f_i}{n} = \frac{f_i}{\sum f_i}$$
• Relative cumulative frequency (RCf): the running total of the relative frequencies, or
equivalently the cumulative frequency divided by the total frequency. It gives the percent of
the values that are less than the upper class boundary (or, for the greater than type, greater
than the lower class boundary).
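The class mark, relative frequency and relative cumulative frequency can all be computed from the class limits and frequencies alone. A sketch under stated assumptions: the helper `summarize_classes` and the three classes are hypothetical, and the relative cumulative frequency is taken as the less than cumulative frequency divided by n.

```python
from itertools import accumulate


def summarize_classes(limits, freqs):
    """Add class mark, relative frequency and relative cumulative frequency."""
    n = sum(freqs)
    rows = []
    for (lcl, ucl), f, cf in zip(limits, freqs, accumulate(freqs)):
        rows.append({
            "limits": (lcl, ucl),
            "f": f,
            "mark": (lcl + ucl) / 2,  # class mark: (lower + upper limit) / 2
            "rf": f / n,              # relative frequency: f_i / n
            "rcf": cf / n,            # less than cumulative frequency / n
        })
    return rows


# Hypothetical classes and frequencies (illustration only)
rows = summarize_classes([(10, 19), (20, 29), (30, 39)], [3, 5, 2])
for row in rows:
    print(row)
```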
The classes must be continuous. Even if there are no values in a class, the class
must be included in the frequency distribution; there should be no gaps in a
frequency distribution. The only exception occurs when the class with a zero
frequency is the first or the last: a class with a zero frequency at either end
can be omitted without affecting the distribution.
The classes must be equal in width, so that the data are not given a distorted
view. One exception occurs when a distribution is open-ended, i.e., it has no
specific beginning or end value.
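The two rules above can be checked mechanically. A minimal sketch, assuming integer class limits (so the arithmetic is exact) and a known unit of measure U; the function `check_classes` is hypothetical. Continuity means each lower limit is exactly U above the previous upper limit, and the width of a class is UCB − LCB = UCL − LCL + U.

```python
def check_classes(limits, unit=1):
    """Return (continuous, equal_width) for a list of (LCL, UCL) pairs."""
    # Continuous: each lower limit is exactly one unit above the previous upper limit.
    continuous = all(
        next_lcl - ucl == unit
        for (_, ucl), (next_lcl, _) in zip(limits, limits[1:])
    )
    # Width of a class = UCB - LCB = UCL - LCL + U.
    widths = {ucl - lcl + unit for lcl, ucl in limits}
    return continuous, len(widths) == 1


print(check_classes([(6, 11), (12, 17), (18, 23)]))  # (True, True)
print(check_classes([(6, 11), (18, 23)]))            # (False, True): empty class dropped
print(check_classes([(6, 11), (12, 20)]))            # (True, False): unequal widths
```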
4. Find the class width by dividing the range by the number of classes:
$$W = \frac{R}{K} \quad \text{or} \quad \text{Width} = \frac{\text{Range}}{\text{Number of classes}}$$
Note: if there is a remainder, round the answer up to the next whole number. For
instance, 4.7 ≈ 5 and 4.12 ≈ 5.
5. Select the starting point as the lowest class limit. This is usually the lowest score
(observation). Add the width to that score to get the lower class limit of the next class,
and keep adding until you have the desired number of classes (𝐾) calculated in step 3.
6. Find the upper class limits: subtract the unit of measurement (𝑈) from the lower class
limit of the second class to get the upper limit of the first class. Then add the width to
each upper class limit to get all the upper class limits.
Unit of measurement: the difference between a datum and the next possible value. For
instance, if the data are 28, 23 and 52, the unit of measurement is one: take one datum
arbitrarily, say 23; the next possible value is 24, so 𝑈 = 24 − 23 = 1. If the data are 24.12,
30 and 21.2, give priority to the datum with the most decimal places. Take 24.12 and find
the next possible value, 24.13. Therefore, 𝑈 = 24.13 − 24.12 = 0.01.
Note: 𝑈 = 1 is the maximum value of the unit of measurement, and it is the value used when
we have no information about the precision of the data.
7. Find the class boundaries:
$$\text{Lower Class Boundary} = \text{Lower Class Limit} - \frac{U}{2}, \qquad \text{Upper Class Boundary} = \text{Upper Class Limit} + \frac{U}{2}.$$
In short, $LCB_i = LCL_i - \frac{U}{2}$ and $UCB_i = UCL_i + \frac{U}{2}$.
8. Tally the data and write the numerical value of each tally in the frequency column.
9. Find the cumulative frequencies. There are two types of cumulative frequency, namely less
than cumulative frequency and more than cumulative frequency. The less than cumulative
frequency of a class is obtained by successively adding the frequencies of all the previous
classes, including the class against which it is written; the accumulation runs from the lowest
class to the highest. The more than cumulative frequency is obtained by accumulating the
frequencies from the highest class to the lowest.
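The unit of measurement (note to step 6) and the class boundaries (step 7) can be sketched with Python's standard `decimal` module, which keeps decimal places exact where binary floats would not. The helper names are hypothetical, data values are passed as strings (an assumption, so that recorded decimal places are not lost), and the class limits in the last example are illustrative only.

```python
from decimal import Decimal


def unit_of_measure(raw_values):
    """U implied by the datum with the most decimal places (never above 1)."""
    # Values are strings so that decimal places survive; exponent -2 means 2 places.
    exponents = [Decimal(v).as_tuple().exponent for v in raw_values]
    return Decimal(1).scaleb(min(0, min(exponents)))


def class_boundaries(limits, unit):
    """Step 7: LCB = LCL - U/2 and UCB = UCL + U/2 for each (LCL, UCL) pair."""
    half = Decimal(unit) / 2
    return [(Decimal(lcl) - half, Decimal(ucl) + half) for lcl, ucl in limits]


print(unit_of_measure(["28", "23", "52"]))       # 1
print(unit_of_measure(["24.12", "30", "21.2"]))  # 0.01

# Hypothetical one-decimal class limits, so U = 0.1 (illustration only)
for lcb, ucb in class_boundaries([("2.1", "2.5"), ("2.6", "3.0")], Decimal("0.1")):
    print(lcb, ucb)  # 2.05 2.55, then 2.55 3.05
```

The boundaries carry one more decimal place than the limits, and consecutive classes share a boundary (2.55), so there is no gap.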
For example, the following frequency distribution table gives the marks obtained by 40
students:
The table above shows how to find the less than cumulative frequency, and the table
below shows how to find the more than cumulative frequency.
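Both kinds of cumulative frequency are running totals taken in opposite directions. A minimal sketch with hypothetical frequencies (the 40-student table is not reproduced here); the function names are illustrative.

```python
from itertools import accumulate


def less_than_cf(freqs):
    """Running total from the lowest class to the highest."""
    return list(accumulate(freqs))


def more_than_cf(freqs):
    """Running total from the highest class down to the lowest."""
    return list(accumulate(reversed(freqs)))[::-1]


# Hypothetical class frequencies, lowest class first (illustration only)
freqs = [3, 5, 9, 7, 4]
print(less_than_cf(freqs))  # [3, 8, 17, 24, 28]
print(more_than_cf(freqs))  # [28, 25, 20, 11, 4]
```

The last less than value and the first more than value both equal the total frequency.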
Example 2.3: Consider the following set of data and construct the frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
Steps
1. Highest value = 39, Lowest value = 6
2. 𝑅 = 39 − 6 = 33
3. $K = 1 + 3.32 \log_{10} 20 = 5.32 \approx 6$
4. $W = \frac{R}{K} = \frac{33}{6} = 5.5 \approx 6$
5. Select the starting point. Take the minimum, 6, and add the width 6 to it to get the LCL of the
next class, 12; continuing gives the lower class limits 6, 12, 18, 24, 30 and 36.
6. Upper class limits. Since the unit of measurement is one, 12 − 1 = 11, so 11 is the UCL of
the first class. Therefore, 6–11 is the first class.
7. Find the class boundaries using the formulas in step 7 with 𝑈 = 1: $LCB_1 = LCL_1 - 0.5 = 5.5$ and
$UCB_1 = UCL_1 + 0.5 = 11.5$.
8, 9 and 10.
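The whole procedure for Example 2.3 can be reproduced in a few lines of Python. This sketch follows the steps above, rounding K and W up with `math.ceil` as the notes prescribe; the variable names are hypothetical. Each row holds the class limits, class boundaries, frequency and less than cumulative frequency.

```python
import math

# Example 2.3 data
data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
unit = 1  # whole-number data, so U = 1

r = max(data) - min(data)                        # step 2: R = 39 - 6 = 33
k = math.ceil(1 + 3.32 * math.log10(len(data)))  # step 3: 5.32 rounded up to 6
w = math.ceil(r / k)                             # step 4: 33 / 6 = 5.5 rounded up to 6

rows = []
less_than_cf = 0
for i in range(k):
    lcl = min(data) + i * w                      # step 5: 6, 12, 18, ...
    ucl = lcl + w - unit                         # step 6: next LCL minus U
    lcb, ucb = lcl - unit / 2, ucl + unit / 2    # step 7: class boundaries
    f = sum(1 for x in data if lcl <= x <= ucl)  # step 8: tally
    less_than_cf += f                            # step 9: running total
    rows.append((lcl, ucl, lcb, ucb, f, less_than_cf))

for row in rows:
    print(row)
```

Running it gives the frequencies 2, 2, 7, 4, 3 and 2 for the classes 6–11 through 36–41, which sum to 20 as expected.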
A pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies
in each category of the distribution. The angle of a sector is obtained using:
$$\text{Angle of a sector} = \frac{\text{Value of the part}}{\text{The whole quantity}} \times 360^\circ$$
Example 2.4: Draw a suitable diagram to represent the following population in a town.
[Figure: pie chart of the town's population: Boys 15%, Men 25%, Girls 40%, Women 20%]
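Applying the sector-angle formula to Example 2.4 gives the angle of each wedge. A minimal sketch; the function name is hypothetical, and the percentages are those shown in the pie chart. Multiplying by 360 before dividing avoids rounding error when the angles are whole numbers.

```python
def sector_angles(parts):
    """Angle of a sector = value of the part / whole quantity * 360 degrees."""
    whole = sum(parts.values())
    # Multiply first, then divide, so whole-number angles come out exactly.
    return {name: value * 360 / whole for name, value in parts.items()}


# Example 2.4 percentages, as shown in the pie chart
population = {"Boys": 15, "Men": 25, "Girls": 40, "Women": 20}
print(sector_angles(population))
# {'Boys': 54.0, 'Men': 90.0, 'Girls': 144.0, 'Women': 72.0}
```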
A) Bar Charts
✓ Used to represent and compare the frequency distribution of discrete variables and attributes or
categorical series.
✓ Bars can be drawn either vertically or horizontally.
When presenting data using a bar diagram:
✓ All bars must have equal width, and the distance between bars must be equal.
✓ The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts; the most common are:
❖ Simple bar chart
❖ Component or subdivided bar chart
❖ Multiple bar chart
I. Simple bar chart
✓ Used to display data on one variable.
✓ They are thick lines (narrow rectangles) having the same breadth. The magnitude of a quantity is
represented by the height or length of the bar.
Example 2.5: The numbers of students in the four departments of the Science College are given as follows:
Department Physics Maths Chemistry Biology
Solution:
[Figure: simple bar chart of frequency (number of students) by department: Phys, Maths, Chem, Bio]
Solution:
[Figure: bar chart of frequency by department (Phys, Maths, Chem, Bio), showing Male and Female students]
Example 2.7: The following data represent the sales of a company's three products, A, B and C, from
1957 to 1959.
Product 1957 1958 1959
A 12 14 18
B 24 21 18
C 24 35 54
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:
B) Pictograph
In this diagram, we represent data by means of picture symbols. We choose a suitable picture
to represent a definite number of units of the variable being measured.
2.2.4 Graphical Presentation of data
The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.
Procedures for constructing statistical graphs:
➢ Draw and label the X and Y axes.
➢ Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
➢ Represent the class boundaries for the histogram or ogive or the mid points for the frequency
polygon on the X axis.
➢ Plot the points.
➢ Draw the bars or lines to connect the points.
Histogram
A histogram is a graph that displays the data using adjacent vertical bars of various heights to represent
the frequencies. The class boundaries are placed along the horizontal axis; class marks and class limits
are sometimes used as the quantity on the X axis instead.
Solution:
[Figure: histogram of frequency against class boundaries]
Frequency polygon
If we join the midpoints of the tops of adjacent rectangles of the histogram with line segments, a
frequency polygon is obtained. When the polygon is extended to the x-axis just outside the range of the
data, the total area under the polygon equals the total area under the histogram.
Example 2.9: Construct a frequency polygon to represent the data of Example 2.8.
Solution:
Class limits | Frequency | Class marks | Class boundaries | R.F. | % R.F. | Less than C.F. | More than C.F.
Adding two class marks with $f_i = 0$, 9.5 at the beginning and 89.5 at the end, we obtain the following frequency polygon.
[Figure: frequency polygon of frequency against class mark, from 9.5 to 89.5]
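The construction described above (join the midpoints of the bar tops, then add one zero-frequency class mark at each end) can be sketched as below. The class marks 19.5 to 79.5 are implied by the added end points 9.5 and 89.5 with a width of 10; the frequencies are hypothetical because the data of Example 2.8 are not reproduced here. The trapezoid sum checks the claim that the area under the closed polygon equals the area under the histogram.

```python
def polygon_points(marks, freqs):
    """Frequency-polygon vertices, closed by a zero-frequency class at each end."""
    width = marks[1] - marks[0]  # assumes equal class widths
    xs = [marks[0] - width, *marks, marks[-1] + width]
    ys = [0, *freqs, 0]
    return list(zip(xs, ys))


def trapezoid_area(points):
    """Area under a polyline given as (x, y) vertices."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


marks = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [2, 5, 9, 12, 8, 3, 1]  # hypothetical frequencies
points = polygon_points(marks, freqs)
print(points[0], points[-1])  # (9.5, 0) (89.5, 0)

histogram_area = (marks[1] - marks[0]) * sum(freqs)
print(trapezoid_area(points), histogram_area)  # 400.0 400.0
```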
[Figure: two ogives, cumulative frequency (0 to 40) against class boundaries (14.5 to 84.5)]
Note: For both ogives, one class with frequency zero is added, for the same reason as for the frequency
polygon.
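The ogive vertices can be sketched as follows: the less than ogive plots each class boundary against the number of values below it, starting from 0 at the first lower boundary, and the more than ogive plots each boundary against the number of values above it, ending at 0 at the last upper boundary. These zero end points play the role of the added zero-frequency class. The boundaries 14.5 to 84.5 match the axes of the figure; the function name and the frequencies are hypothetical.

```python
from itertools import accumulate


def ogive_points(boundaries, freqs):
    """Return (less than, more than) ogive vertices for k classes and k + 1 boundaries."""
    # Less than: 0 below the first boundary, then the running totals.
    less = [0, *accumulate(freqs)]
    # More than: running totals from the top class down, ending at 0.
    more = [*list(accumulate(reversed(freqs)))[::-1], 0]
    return list(zip(boundaries, less)), list(zip(boundaries, more))


boundaries = [14.5 + 10 * i for i in range(8)]  # 14.5, 24.5, ..., 84.5
freqs = [2, 5, 9, 12, 8, 3, 1]                  # hypothetical frequencies
less, more = ogive_points(boundaries, freqs)
print(less)
print(more)
```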