
CHAPTER 2: DATA COLLECTION AND PRESENTATION
• Data Collection
• Classification of Data
• Methods of Data Collection
– Tabular Methods of Data Presentation
• Frequency Distributions (Absolute, Relative and
Cumulative Distributions)
– Graphic Methods of Data Presentation
(Histograms, Polygons, Ogive, Pie-Charts, Bar and
Line Graphs)

July 14, 2022 1


Classification of Data
• There are two types of data based on their
source. These are primary and secondary
data.
• Primary data – These are the measurements and
records of an original study. They are collected
afresh, for the first time, and are therefore
original in character.
– They are measured and recorded directly from the
source and have not been collected by anyone
else before.
Secondary Data
• In some situations it is not practical for the
principal investigator to start the study from the
very beginning. In such cases the investigator may
use data that have already been collected by
others.
• Secondary data are data that have already been
collected by someone else and have already
passed through some statistical process.
Secondary data can be taken from journals,
reports, periodicals, publications, etc.
• Secondary data should be used with great care.
• Before using these data, the investigator must
check that they possess the following
characteristics.
– Reliability of Data: Data collected from another
source should be reliable enough to be used by the
investigator. Determining and testing the reliability of
secondary data is the most important, and also the
most difficult, task. Reliability can be tested by
answering questions like:
• Who collected them?
• What were the sources of data?
• What methods were used to collect them?
• At what time were they collected?
• Suitability of Data:
– Before secondary data are used, they must be
evaluated to see whether they can serve a
purpose other than the one for which they were
collected.
– The suitability of data can be evaluated from the
point of view of the nature and scope of the
investigation.



• Adequacy of Data: Reliability and suitability
alone may not be sufficient for the investigator
to use secondary data for analysis. The data
should also be tested for adequacy. Adequacy
can be tested by evaluating the data in terms of
area coverage, level of accuracy, number of
respondents who participated, and so on.



Data Collection Strategies
• No one best way: decision depends on:
– What you need to know: numbers or stories
– Where the data reside: environment, files, people
– Resources and time available
– Complexity of the data to be collected
– Frequency of data collection
– Intended forms of data analysis



Rules for Collecting Data

• Use multiple data collection methods
• Use available data, but you need to know:
– how the measures were defined
– how the data were collected and cleaned
– the extent of missing data
– how accuracy of the data was ensured



Rules for Collecting Data
• If you must collect original data:
– be sensitive to burden on others
– pre-test, pre-test, pre-test
– establish procedures and follow them (protocol)
– maintain accurate records of definitions and
coding
– verify accuracy of coding, data input



Structured Approach
• All data collected in the same way
• Especially important for multi-site and cluster
evaluations so you can compare
• Important when you need to make
comparisons with alternate interventions



Use Structured Approach When:

• need to address extent questions
• have a large sample or population
• know what needs to be measured
• need to show results numerically
• need to make comparisons across different
sites or interventions



Semi-structured Approach
• Systematic and follows general procedures, but
data are not collected in exactly the same way
every time
• More open and fluid
• Does not follow a rigid script
– may ask for more detail
– people can tell what they want in their own way



Use Semi-structured Approach When:
• conducting exploratory work
• seeking understanding, themes, and/or
issues
• need narratives or stories
• want in-depth, rich, “backstage”
information
• seek to understand results of data that are
unexpected



Quantitative Approach
• Data in numerical form
• Data that can be precisely measured
– age, cost, length, height, area, volume, weight,
speed, time, and temperature
• Harder to develop
• Easier to analyze



Qualitative Approach
• Data that deal with description
• Data that can be observed or self-reported, but
not always precisely measured
• Less structured, easier to develop
• Can provide “rich data” — detailed and widely
applicable
• Is challenging to analyze
• Is labor intensive to collect
• Usually generates longer reports



Which Data?
If you:                                        Then use:
- want to conduct statistical analysis         Quantitative
- want to be precise                           Quantitative
- know what you want to measure                Quantitative
- want to cover a large group                  Quantitative
- want narrative or in-depth information       Qualitative
- are not sure what you are able to measure    Qualitative
- do not need to quantify the results          Qualitative



How to Decide on Data Collection Approach

• Choice depends on the situation
• Each technique is more appropriate in
some situations than others
• Caution: All techniques are subject to bias



Triangulation to Increase Accuracy of Data
• Triangulation of methods
– collection of same information using different
methods
• Triangulation of sources
– collection of same information from a variety of
sources
• Triangulation of evaluators
– collection of same information from more than
one evaluator
Data Collection Tools

• Participatory Methods
• Records and Secondary Data
• Observation
• Surveys and Interviews
• Focus Groups
• Diaries, Journals, Self-reported Checklists
• Expert Judgment
• Delphi Technique
• Other Tools

Data Presentation
Frequency Distribution
• Frequency distribution: A grouping of data
into categories showing the number of
observations in each mutually exclusive
category.
• There are three types of frequency
distributions; these are categorical,
ungrouped and grouped.
Categorical data
• The categorical frequency distribution is used
for data which are qualitatively described.
• The important thing is that the data can be
classified into complete and non-overlapping
categories.



No    Name      LOE
1     Abebe     Diploma
2     Wordofa   B.Sc
3     Toga      M.Sc
4     Kahay     Ph.D
5     Ahmed     Diploma
6     Hirut     B.Sc
...   ...       ...
50    Kassech   Ph.D



There are 15 workers with a diploma, 20 workers
with a B.Sc, 10 workers with an M.Sc and 5 workers
with a Ph.D. Present the data using an appropriate
method of presentation.

LOE        No   Percentage
Diploma    15   30%
Bachelor   20   40%
Master     10   20%
Ph.D        5   10%
Total      50   100%
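
The percentage column of a categorical frequency table is each category count divided by the total count. The following is a minimal sketch of that calculation for the worker example; the variable names are illustrative and not part of the original slides.

```python
# Counts taken from the worker education example (50 workers in total).
counts = {"Diploma": 15, "Bachelor": 20, "Master": 10, "Ph.D": 5}

total = sum(counts.values())

# Percentage share of each category: 100 * count / total.
percentages = {level: 100 * n / total for level, n in counts.items()}

# Print the table in the same layout as the slide.
print(f"{'LOE':<10}{'No':>4}{'Percentage':>12}")
for level, n in counts.items():
    print(f"{level:<10}{n:>4}{percentages[level]:>11.0f}%")
print(f"{'Total':<10}{total:>4}{sum(percentages.values()):>11.0f}%")
```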
Ungrouped frequency distributions
• Ungrouped frequency distributions can be
useful when you want to see how often each
individual value occurs in a dataset.
• Note that ungrouped frequency distributions
work best with small datasets in which there
are only a few unique values.

Grouped Frequency Distribution
• This is a method of presenting quantitatively
measured data, used when a variable contains
a large volume of raw data.
• It contains several important concepts such as
class limits, class width, class interval and
frequencies.
• Class limits are classified as lower class limit
and upper class limit.



• Class limits: the lowest and highest values of
the class, used for an unsmoothed frequency
distribution.
• Class boundary: the numbers used to describe a
class in a smoothed frequency distribution.
• Class width: the difference between the upper and
lower boundaries of the class.
• Class mark (midpoint): the middle value between
the lower and upper class boundaries (or class
limits) of a class interval.
• Frequency: the number of times a certain
value is repeated in the given data.
\[ \text{Class mark} = \frac{\text{upper class limit} + \text{lower class limit}}{2} \]

\[ \text{Class mark} = \frac{\text{upper class boundary} + \text{lower class boundary}}{2} \]


EXAMPLE 1
• Dr. Tillman is the dean of the school of business and
wishes to determine the amount of studying
business school students do. He selects a random
sample of 30 students and determines the number
of hours each student studies per week: 15.0, 23.7,
19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4,
18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1,
16.6.
• Organize the data into a frequency distribution.

EXAMPLE 1 continued
Consider the classes 8-12 and 13-17. The class marks are 10 and
15. The class interval is 5 (13-8).

Hours studying Frequency, f


8-12 1
13-17 12
18-22 10
23-27 5
28-32 1
33-37 1
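
The table above can be reproduced by counting how many observations fall in each class. The sketch below assumes that a class written as 8-12 covers the boundaries 7.5 up to 12.5 (limit minus or plus 0.5), which is the reading under which the counts match the slide; the function name `count_in_class` is illustrative.

```python
# Hours studied per week by the 30 students in Example 1.
hours = [15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
         17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
         10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6]

# Class limits as written in the table.
classes = [(8, 12), (13, 17), (18, 22), (23, 27), (28, 32), (33, 37)]


def count_in_class(values, lower_limit, upper_limit):
    """Count values between the class boundaries (assumed: limits -/+ 0.5)."""
    lower_boundary = lower_limit - 0.5
    upper_boundary = upper_limit + 0.5
    return sum(lower_boundary <= v < upper_boundary for v in values)


frequencies = [count_in_class(hours, lo, hi) for lo, hi in classes]

# Class mark = (upper class limit + lower class limit) / 2.
class_marks = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), mark, f in zip(classes, class_marks, frequencies):
    print(f"{lo}-{hi}  class mark {mark}  frequency {f}")
```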

Suggestions on Constructing a Frequency Distribution

• The class intervals used in the frequency
distribution should be equal.
• Determine a suggested class interval by
using the formula: i = (highest value - lowest
value) / number of classes.

Suggestions on Constructing a Frequency Distribution (continued)

• Use the computed suggested class
interval to construct the frequency
distribution.
Note: this is a suggested class interval;
if the computed class interval is 97, it
may be better to use 100.
• Count the number of values in each
class.
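
As a small worked illustration, the suggested class interval can be computed for the Example 1 data, where the highest value is 33.8, the lowest is 10.3, and six classes are used. This is a sketch only; rounding the result up to 5 is a judgment call in the spirit of the note above, not a fixed rule.

```python
highest = 33.8          # largest value in Example 1
lowest = 10.3           # smallest value in Example 1
number_of_classes = 6   # number of classes used in the Example 1 table

# i = (highest value - lowest value) / number of classes
suggested_interval = (highest - lowest) / number_of_classes
print(round(suggested_interval, 2))

# The slide's table uses a more convenient interval of 5 (assumed rounding choice).
chosen_interval = 5
print(chosen_interval >= suggested_interval)
```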

Relative Frequency Distribution


• The relative frequency of a class is obtained by dividing the class
frequency by the total frequency.

Hours     Frequency, f   Relative frequency
8-12       1             1/30 = 0.0333
13-17     12             12/30 = 0.4000
18-22     10             10/30 = 0.3333
23-27      5             5/30 = 0.1667
28-32      1             1/30 = 0.0333
33-37      1             1/30 = 0.0333
TOTAL     30             30/30 = 1
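
The relative frequencies in this table can be checked with a few lines of code. This is a minimal sketch using the class frequencies from Example 1; the dictionary layout is just one convenient choice.

```python
# Class frequencies from Example 1 (hours studied per week).
frequencies = {"8-12": 1, "13-17": 12, "18-22": 10,
               "23-27": 5, "28-32": 1, "33-37": 1}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency.
relative = {cls: f / total for cls, f in frequencies.items()}

for cls, f in frequencies.items():
    print(f"{cls:<8}{f:>4}{relative[cls]:>10.4f}")
print(f"{'TOTAL':<8}{total:>4}{sum(relative.values()):>10.4f}")
```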

Graphic Presentation
• Diagrammatic presentation of data has the following
advantages:
– Diagrams help the reader extract the required information
in a short time and without complexity.
– They are more attractive than tables of figures.
– They facilitate comparison.
• Diagrammatic presentations are especially important
for presenting categorical data.
• Different types of diagrammatic presentation are
in use.
Bar charts (Bar graphs)
• Bar charts are one-dimensional rectangular
diagrams, usually used to display qualitative
distributions. Bar charts have the following common
characteristics:
– The length or height of the bar for a category
or class interval represents the corresponding frequency.
– The bars are equally spaced: equal space should be left
between consecutive bars.
– Each bar has equal width.
– The bars can lie horizontally or vertically.
– The bars should be labeled appropriately.

Pie Chart
• A pie chart is a circle divided into
sectors according to the proportion of each
component in the total.
• It is constructed by dividing the 360° of a circle
into angles, each of which is proportional to the
size of the respective component.
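
To illustrate the construction, the sector angles for the worker education example (15, 20, 10 and 5 workers out of 50) can be computed as follows. This is a sketch of the arithmetic only; it does not draw the chart, and the variable names are illustrative.

```python
# Category counts from the worker education example.
counts = {"Diploma": 15, "Bachelor": 20, "Master": 10, "Ph.D": 5}

total = sum(counts.values())

# Each sector angle is proportional to its share of the total: 360 * count / total.
angles = {level: 360 * n / total for level, n in counts.items()}

for level, angle in angles.items():
    print(f"{level:<10}{angle:>6.0f} degrees")
```

The angles add up to 360 degrees, which is a quick check that no category has been left out.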



[Figure: Population of English native speakers]



• Histogram: A graph in which the class limits or
boundaries are marked on the horizontal axis
and the class frequencies on the vertical axis.
The class frequencies are represented by the
heights of the bars, and the bars are drawn
adjacent to each other.
• It is commonly used for frequency distributions
with continuous classes.
• It cannot be used with a frequency distribution
that has an open-ended class.
• No space is left between bars.
36 25 38 46 55 68 72 55 36 38
67 45 22 48 91 46 52 61 58 55
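
Assuming the 20 values above are raw data intended for a histogram, the sketch below groups them into continuous classes and prints a simple text histogram. The class width of 10 and the starting point of 20 are assumptions chosen for illustration; the slides do not state which classes were used.

```python
values = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
          67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

width = 10   # assumed class width
start = 20   # assumed lower boundary of the first class
end = 100    # assumed upper boundary of the last class

# Count values per class; each class covers lower <= value < lower + width.
bins = {}
for v in values:
    lower = start + ((v - start) // width) * width
    bins[lower] = bins.get(lower, 0) + 1

lower_bounds = list(range(start, end, width))
frequencies = [bins.get(lower, 0) for lower in lower_bounds]

# One row of '#' per class; adjacent classes share a boundary, as in a histogram.
for lower, f in zip(lower_bounds, frequencies):
    label = f"{lower} up to {lower + width}"
    print(f"{label:<14}{'#' * f} ({f})")
```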


• Frequency polygon
• It is a line graph drawn with the class marks
on the horizontal axis and the frequencies
on the vertical axis.
• It can be drawn with or without a
histogram.
– If we draw the polygon on a histogram,
we plot the midpoint of the top of each
histogram bar and join the points with
straight lines.
Construct a frequency polygon for the following data:
Test Scores Frequency
49.5-59.5 5
59.5-69.5 10
69.5-79.5 30
79.5-89.5 40
89.5-99.5 15
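
The points of the frequency polygon for this table are the class marks paired with the class frequencies. The sketch below computes them; it also adds a zero-frequency point one class width before the first class and after the last class so that the polygon closes on the horizontal axis, which is a common convention assumed here rather than something the slides state.

```python
# (lower boundary, upper boundary, frequency) for the test score classes.
classes = [(49.5, 59.5, 5), (59.5, 69.5, 10), (69.5, 79.5, 30),
           (79.5, 89.5, 40), (89.5, 99.5, 15)]

# Class mark = (upper class boundary + lower class boundary) / 2.
points = [((lower + upper) / 2, f) for lower, upper, f in classes]

# Assumed convention: close the polygon with zero-frequency end points.
width = classes[0][1] - classes[0][0]
closed_points = ([(points[0][0] - width, 0)]
                 + points
                 + [(points[-1][0] + width, 0)])

for mark, f in closed_points:
    print(f"class mark {mark:>6}  frequency {f}")
```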
Cumulative Frequency Distribution (Ogive)

• A cumulative frequency distribution (ogive) is
used to determine how many or what
proportion of the data values are below or
above a certain value.



• Less than ogive: Plot the points with the
upper limits of the classes as abscissae and
the corresponding less than cumulative
frequencies as ordinates.
• The points are joined by a freehand
smooth curve to give the less than
cumulative frequency curve, or less
than ogive.
• It is a rising curve.
• More than ogive (greater than ogive): Plot the
points with the lower limits of the classes as
abscissae and the corresponding more than
cumulative frequencies as ordinates.
• Join the points by a freehand smooth
curve to get the more than ogive.
• It is a falling curve.
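
The points for both ogives can be computed from a frequency table by accumulating the class frequencies. The sketch below uses the test score table from the frequency polygon example as illustrative data; it produces the points to plot and does not draw the curves.

```python
from itertools import accumulate

# Class boundaries and frequencies from the test score table.
boundaries = [49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
frequencies = [5, 10, 30, 40, 15]

total = sum(frequencies)

# Less than ogive: upper boundary of each class vs. running total.
less_than = list(accumulate(frequencies))
less_than_points = list(zip(boundaries[1:], less_than))

# More than ogive: lower boundary of each class vs. number of values at or above it.
more_than = [total - below for below in [0] + less_than[:-1]]
more_than_points = list(zip(boundaries[:-1], more_than))

print("Less than:", less_than_points)
print("More than:", more_than_points)
```

The less than values rise from 5 to the total of 100, and the more than values fall from 100 to 15, matching the rising and falling shapes described above.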

Exercise
I.Q.         Frequency
60 – 70       2
70 – 80       5
80 – 90      12
90 – 100     31
100 – 110    39
110 – 120    10
120 – 130     4
Required:
• Develop relative frequency, less than
cumulative frequency, and more than
cumulative frequency tables.
• Compute the class mark and class width for each
class.
• Draw a histogram, line graph, less than ogive,
and more than ogive.
