
Chapter two

Methods of data collection and presentation

2.1 Methods of data collection

• To draw valid conclusions from data, the information has to be collected in a systematic manner.
• Whatever the quality of the sampling and analysis methods, a haphazardly collected dataset is unlikely to produce valuable and generalizable information.
• Hence, data should be collected systematically.

Set by G.F 1
2.1.1 Sources of data
• Depending on the source, data can be classified as primary or secondary data.
1. Primary Data
• Data measured or collected by the investigator or the user directly from the source.
• Primary data refers to first-hand data gathered by the researcher.
• The data is gathered for the first time by the researcher for a given purpose.
Example: An enquiry is made from each taxpayer in a city to obtain
• Two activities are involved in primary data collection: planning and measuring.
a) Planning
• Identify the source and elements of the data.
• Decide whether to conduct a sample survey or a census.
• If sampling is preferred, decide on the sample size, the sample selection method, etc.
• Decide on the measurement (data collection) procedure.
• Set up the necessary organizational structure.
b) Measuring: there are different options for collecting primary data, including
• Focus group discussion
• Telephone interview
• Mail questionnaires
• Door-to-door survey
• New product registration
• Personal interview
• Experiments
2. Secondary Data
• Secondary data means data collected earlier by someone else.
• It is important for analyzing past situations.
• It is obtained from published and unpublished sources, such as books, survey reports (journals or periodicals), official records, newspapers, etc.
Journals and periodicals are becoming very important as far as secondary data collection is concerned, because they provide up-to-date information that books may not hold. They have an advantage over books: while books give general information about many topics, journals and periodicals discuss specific topics in detail, which makes them more helpful to the researcher.
Secondary data should be evaluated on the following points before it is used:
(1) Data reliability
• Reliability should be checked by asking questions such as:
o who collected the data,
o what the sources of the collected data were,
o when the data was collected,
o what methods were used to collect it,
o whether the desired level of accuracy was achieved, and
o whether there is any bias on the part of the compiler.
(2) Suitability of the data
• The researcher should carefully examine the terms (variable types), the units of collection, and the time at which the data was collected.
• The data should be compatible with the present study problem.
• The nature and classification of the data should also be checked.
(3) Data sufficiency
• Check that the scope of the secondary data is neither narrower nor wider than that of the current study.
• Check that there are no biases or misreporting in the published data.

Sources of secondary data
(1) Printed sources
i. Books
ii. Journals/periodicals: print journals are the physical paper version of a journal
iii. Newspapers
(2) Published and electronic sources
i. E-journals: journals with articles available in full text online
ii. General websites
iii. Weblogs
iv. Blogs
v. Social media
(3) Unpublished personal records
i. Diary
ii. Letters
iii. Government records
iv. Census data
(1) Printed sources
a) Books
• The oldest data source
• Often regarded as the most authentic form of secondary data
• Require extensive time to extract usable secondary data
b) Magazines/journals or periodicals
• Provide more up-to-date data than books
• Provide specific information about a topic
c) Newspapers: regarded as a reliable source of information because they contain the latest information; they usually do not require a background check or authentication.
(2) Published and electronic sources
a) E-journals
• More common and more freely available over the internet than printed journals.
• Most academic research datasets are published in these journals.
b) General websites
• They may not contain reliable information.
• Certain websites may carry misleading information.
• Some websites do not cite reliable sources.
• Some authentic websites provide citations and a bibliography for every quote made on the site, for example Wikipedia.
c) Weblogs
• They are records kept in video, audio, or written format.
d) Blogs
• Blogs are internet diaries written by different people to summarize their experiences.
• Some blogs are considered authentic, while others are much less so.
• There are different forms of blogs, such as personal blogs and news blogs.
e) Social media
• Social media has gained the upper hand in recent days for record-keeping.
• Almost everybody uses social media and publishes different data on it.
• Every second, tons of data are uploaded and published on social media, visible to the close circle of friends of the person who published it.
• Although not every single photograph can be used as secondary data,
(3) Unpublished personal records
a) Diary
• These are personal records kept by individuals, and they are rare.
• For example, Anne Frank’s diary is a record kept by an individual that documents life under Nazi occupation during the Second World War.
b) Letters
c) Government records
d) Census data

Difference between primary and secondary data
Primary data                              Secondary data
Collected directly by the investigator    Previously collected by another person
Less prone to error                       More prone to error
Correction is possible                    Cannot be corrected
Takes time, cost and labor                Does not take much time, cost or labor

2.1.2 Data collection methods
• The questionnaire is the main data collection instrument in a formal survey.
• Depending on the amount of freedom given to the respondent in offering responses, there are two basic types of questions:
1. Open-ended questions
2. Closed-ended questions
• The type of question to use is determined by
o the form of response wanted,
o the nature of the respondents, and
o their ability to answer the questions.
1) Open-ended questions allow the respondent to answer freely in his or her own words.
Example: What do you think are the reasons for the high drop-out rate of village health committee members?
2) Closed-ended questions
• A pre-determined list of alternative responses is presented to the respondent, who checks the appropriate one(s).
• The respondent’s answers are restricted in some way to a limited range of alternatives.
• Closed-ended questions fall into one of two categories:
o Dichotomous questions: yes/no, true/false, agree/disagree, like/dislike, etc.

Steps in designing a questionnaire
• Designing a good questionnaire always takes several drafts.
1) Preparing the content
2) Formulating questions
3) Sequencing the questions
4) Formatting the questionnaire
5) Translation

Step 1: Content
• Decide what questions will be needed to measure or define your variables and reach your objectives.
• When developing the questionnaire, reconsider the variables you have chosen and, if necessary, add, drop or change some.
Step 2: Formulating questions
• Formulate one or more questions that will provide the information needed for each variable.
• Take care that questions are specific and precise enough that different respondents do not interpret them differently.
• Questions should meet the following requirements:
o They must be clear and unambiguous.
o Avoid leading questions, e.g. "You don’t smoke, do you?"
o Use simple language.
o Avoid sensitive topics, e.g. "How many times do you have sex per week?"
o Avoid multiple questions in one, e.g. "Do you like listening to the radio and watching television?"
Step 3: Sequencing of questions
• Design the interview schedule or questionnaire to be consumer friendly.
• The sequence of questions must be logical and allow natural discussion, even in more structured interviews.
• Pose the more sensitive questions as late as possible in the interview.
Step 4: Formatting the questionnaire
• When you finalize your questionnaire, be sure that
o each questionnaire has a heading and space to insert the number, date and location of the interview and, if required, the names of the informant and the data collector;
o the layout is such that questions belonging together appear together visually;
o sufficient space is provided for answers to open-ended questions.
Step 5: Translation
• If the interview will be conducted in one or more local languages, the questionnaire should be translated into those languages.
• After translating it, have it retranslated into the original language.

Data collection methods
1) Interview (face-to-face, mailed or telephone): a meeting between an interviewer and an interviewee.
2) Online questionnaires: the use of online surveys.
3) Focus group discussion: collecting data from a group of deliberately selected participants (6–12 persons).
• The focus group is guided by a facilitator or moderator.
4) Observation: collecting data by critically observing and recording practices (behavior, culture, …).
5) Records and documents: extracting data from existing documents.
• The documents can be
o internal to an organization (such as emails, sales reports, records of customer feedback, activity logs, purchase orders, etc.), or
o external (such as government reports).

Pros and cons of different data collection methods
1) Interview (face-to-face or via video conferencing tools)
Advantages
i. Accuracy: it is harder for the interviewee to provide false information.
ii. The interviewer can capture raw emotions, tone, voice, and word choices to gain a deeper understanding.
iii. The interviewer can ask follow-up questions and request additional information to understand attitudes, motivations, etc.
iv. Participants do not need to be able to read and write to respond.
Disadvantages
i. High cost, as this method requires a staff of people to perform the interviews.
ii. The quality of the collected data depends on the ability of the interviewer to gather data well.
iii. A time-consuming process that involves transcription, organization, reporting, etc.
iv. Does not give an opportunity to probe and explore
• Relatively inflexible
• Less reliable for assessing the behavior and attitudes of respondents
2) Surveys and questionnaires
Advantages
i. Ease of data collection: many respondents can be surveyed quickly.
ii. Easily accessible: they can be deployed via many online channels such as web, mobile, email, etc.
iii. Low cost compared to other methods.
iv. Easy to analyze and present with different types of data visualization.
v. A wide range of data types can be collected, such as attitudes, opinions, values, etc.
Disadvantages
i. Survey fraud: answers may not be honest, as some people answer online surveys just to receive a promised reward.
ii. Many questions might be left unanswered, and participants may not stay fully engaged to the end.
iii. Participants may interpret the questions differently.
iv. Cannot fully capture emotions and feelings.
3) Focus group discussion
Advantages
i. Makes it easy to measure the reaction of customers to your brand, products, or marketing campaigns.
ii. The moderator can ask questions to gain a deeper understanding of the respondents’ emotions.
iii. The moderator can observe non-verbal responses, such as body language or facial expressions.
iv. Provides brainstorming opportunities, and participants can create new ideas.
Disadvantages
i. Participants may not give honest answers on sensitive topics.
ii. Requires a strong facilitator.
iii. Does not give quantitative information.
iv. The discussion is difficult to organize.
4) Observation
Advantages
i. Simple data collection: observation does not require technical skills of the researcher.
ii. Allows for a detailed description of behaviors, intentions, and events.
iii. Provides accurate information: the observer can view participants in their natural environment and directly check their behavior.
iv. Does not depend on people’s willingness to report: some respondents do not want to speak about themselves or do not have time for that.
Disadvantages
i. Can take a lot of time if the observer has to wait for a particular event to happen.
ii. Cannot study attitudes and opinions.
iii. Liable to subjective observational bias: the personal view of the observer can be an obstacle to drawing valid conclusions.
iv. Expensive: it requires high cost, effort, and plenty of time.
v. Situations of the past cannot be studied.
5) Secondary data
Advantages
i. Ease of data collection, as it needs fewer resources (labor, cost and time).
ii. No need to search for and motivate respondents to participate in the study.
iii. Allows tracking the history of events or progress. For example, you may want to find out why there are lots of negative reviews from your customers about your products. In this case, you can look at recorded customer feedback.
Disadvantages
i. Information may be outdated or inapplicable.
ii. Time-consuming.
iii. The accuracy of the original data collection may be unknown.
iv. Less likely to give qualitative information.

2.2 Methods of data presentation
• Classification is the process of arranging data into classes or categories according to similarities.
• Classification is a preliminary step that prepares the ground for proper presentation of data.
• The main purpose of classification is to divide the data into homogeneous groups or classes.
• There are four important bases of classification:
1) Geographical classification: data are arranged according to places, areas or regions, such as states, provinces, cities, countries, etc.
2) Chronological classification: data are arranged according to their time of occurrence, i.e. weekly, monthly, quarterly, annually, etc.
3) Qualitative classification: data are arranged according to attributes such as sex, marital status, educational standard, stage or intensity of disease, etc.
4) Quantitative classification: data are arranged according to a characteristic that has been measured, such as height, weight, or income of persons.

Data classification and tabulation
• Classification: arranging data into classes according to some characteristic.
• Tabulation: arranging classified data in a table according to the number of characteristics considered:
1. One-way table: classifies data based on a single characteristic
2. Two-way table: classifies data based on two characteristics
3. Multi-way (higher-order) table: classifies data based on more than two characteristics
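The difference between a one-way and a two-way table can be sketched in a few lines of Python. This is only an illustration: the `records` list (sex and marital status of eight respondents) is hypothetical data, not taken from these notes.

```python
from collections import Counter

# Hypothetical records: (sex, marital status) for eight respondents.
records = [
    ("F", "Married"), ("M", "Single"), ("F", "Single"), ("M", "Married"),
    ("F", "Married"), ("M", "Single"), ("F", "Widowed"), ("M", "Married"),
]

# One-way table: classify by a single characteristic (sex).
one_way = Counter(sex for sex, _ in records)

# Two-way table: classify by two characteristics at once (sex and status).
two_way = Counter(records)

# Print the two-way table with the one-way totals as the last column.
statuses = sorted({status for _, status in records})
print("Sex".ljust(6) + "".join(s.ljust(10) for s in statuses) + "Total")
for sex in sorted(one_way):
    row = "".join(str(two_way[(sex, s)]).ljust(10) for s in statuses)
    print(sex.ljust(6) + row + str(one_way[sex]))
```

Each cell of the two-way table counts the records that share both characteristics; the row totals reproduce the one-way table.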
• The presentation of data is broadly classified into two categories:
1. Tabular presentation (frequency distribution): presenting data in tables using their
o frequencies and
o percentages
2. Diagrammatic and graphic presentation: presenting data in graphs using their
o frequencies,
o percentages, and
o pictures

1. Tabular presentation (frequency distribution)
• It is a tabular arrangement of raw data into classes according to size or magnitude, along with the corresponding class frequencies (the number of values that fall in each class).
• It is the organization of raw data in table form with classes and frequencies, where:
o raw data is data collected in its original form;
o frequency is the number of times a certain value occurs in a class.
• The reasons for constructing a frequency distribution are:
1. To organize the data in a meaningful and intelligible way
2. To determine the nature or shape of the distribution
3. To facilitate computational procedures for measures of average and spread
4. To draw charts and graphs for data presentation
5. To make comparisons between different data sets
• There are three basic types of frequency distribution:
1. Categorical frequency distribution
2. Ungrouped frequency distribution
3. Grouped frequency distribution

1. Categorical frequency distribution
• Used for data that can be placed in specific categories, such as nominal or ordinal (qualitative) data.
Example: A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).
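A categorical frequency distribution of this kind can be built by counting how often each category occurs. The sketch below is a minimal illustration; the 25 codes in `raw` are made up for the purpose and are not the social worker's data, which is not reproduced in these notes.

```python
from collections import Counter

# Hypothetical marital-status codes for 25 persons (NOT the data from the example).
# M = married, S = single, W = widowed, D = divorced
raw = list("MSDWM" "SSMMW" "DMSMM" "WDSMS" "MMSDM")

freq = Counter(raw)   # class -> frequency
n = len(raw)          # total number of observations

# Print class, frequency and percentage, most frequent class first.
print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for category, count in freq.most_common():
    print(f"{category:<6}{count:>10}{100 * count / n:>8.0f}%")
```

The percentage column is each class frequency divided by the total number of observations, times 100.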
Exercise 2.1
1. Twelve people were asked which sandwiches they had bought from a sandwich shop. Their answers were:

• Using these answers,
a) Prepare a categorical frequency distribution.
b) Which type of sandwich was preferred by most people?

2. Ungrouped frequency distribution
• It is used for a small number of observations.
• It is constructed by listing all individual values in the dataset in ascending order, along with the number of times each value occurs.
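The construction just described (values in ascending order, each with its frequency) can be sketched as follows. The list `values` is hypothetical and is not one of the exercise datasets.

```python
from collections import Counter

# Hypothetical small dataset of 12 observations.
values = [3, 5, 4, 3, 2, 5, 4, 4, 3, 5, 4, 1]

freq = Counter(values)

# List each distinct value in ascending order with its frequency.
table = [(v, freq[v]) for v in sorted(freq)]

print("Value  Tally  Frequency")
for v, f in table:
    print(f"{v:>5}  {'|' * f:<6} {f}")
```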
Exercise 2.2
• A class of 30 students was asked how many brothers and sisters they have. Here are the results.

a) Draw a tally and frequency table of the results.
b) Interpret the results and comment on the numbers of brothers and sisters that the students have.

Exercise 2.3
1. A survey is carried out to test the manufacturer’s claim that there are ‘about 36 chocolate buttons in each packet’. The number of buttons in each of 25 packets is counted as follows:

• For this data, prepare an ungrouped frequency distribution.

3. Grouped frequency Distribution
Class width
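As a rough illustration of how class width enters a grouped frequency distribution, the sketch below bins integer data into equal-width classes. Several things here are assumptions rather than content from these notes: the `data` values are hypothetical, the number of classes `k` is chosen with Sturges' rule, and the width is taken as the range divided by `k`, rounded up.

```python
import math

# Hypothetical integer data (20 observations).
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

n = len(data)

# Assumption: number of classes from Sturges' rule, k = 1 + 3.322 * log10(n).
k = round(1 + 3.322 * math.log10(n))

# Class width = range / number of classes, rounded up to a whole number.
# (If range / k is already a whole number, one extra class may be needed.)
width = math.ceil((max(data) - min(data)) / k)

# Build class limits starting at the smallest value and count the members.
classes = []
for i in range(k):
    lower = min(data) + i * width
    upper = lower + width - 1          # upper class limit for integer data
    count = sum(lower <= x <= upper for x in data)
    classes.append((lower, upper, count))

for lower, upper, count in classes:
    print(f"{lower:>2} - {upper:<2}  {count}")
```

The class frequencies should add up to the number of observations; if they do not, the classes do not cover the whole range of the data.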

Exercise 2.4
• A fitness club carries out a survey to find out the ages of its members. Here are the results.

i. Prepare a grouped frequency distribution for the dataset.
ii. Which age group are more p

2. Diagrammatic and graphic presentation of data
• These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.

A) Diagrammatic presentation of data
• Diagrams are appropriate for presenting discrete as well as qualitative data.
• The three most commonly used diagrammatic presentations for discrete as well as qualitative data are pie charts, pictograms and bar charts.
1. Pie chart
• A pie chart is a circular chart divided into sectors, illustrating the relative magnitudes or frequencies of the classes of a given variable.
• A pie chart usually represents categorical data, but it can also be used for discrete quantitative data.
• The angle of each sector has to be proportional to the relative frequency of the class it represents:
$$\text{Angle} = \frac{\text{value of part}}{\text{value of whole quantity}} \times 360^\circ$$
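The sector angles of a pie chart follow directly from the proportion of each part in the whole. A minimal sketch, assuming a hypothetical set of class frequencies in `freq`:

```python
# Hypothetical class frequencies (25 observations in total).
freq = {"Married": 11, "Single": 7, "Divorced": 4, "Widowed": 3}

total = sum(freq.values())  # value of the whole quantity

# Angle of each sector = (value of part / value of whole) * 360 degrees.
angles = {name: value / total * 360 for name, value in freq.items()}

for name, angle in angles.items():
    print(f"{name:<9} {angle:6.1f} degrees")
```

Because every part is divided by the same total, the sector angles always add up to 360 degrees, which is a quick check on the arithmetic.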

2. Pictogram
• A pictogram presents data with the help of pictures.
• The magnitudes of the variable are shown with pictures that depict the variable approximately.
Example: The following table shows the orange production of a plantation for the production years 1990-1993. Represent the data by a pictogram.

Solution:
• In a pictogram, each symbol in the picture represents a fixed quantity of the variable.
3. Bar charts
• A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
• There are three types of bar chart:
1. Simple bar chart
2. Component bar chart
3. Multiple bar chart

1. Simple bar chart
• Used to display data on one categorical variable.
• The bars are thick lines (narrow rectangles) of the same breadth.
• The magnitude of a quantity is represented by the height or length of the bar.
Example: The following data represent the sales by product, 1957-1959, of a given company for three products A, B and C.

2. Component bar chart
• The bars represent the total value of a variable, with each total broken into its component parts; different colors or patterns are used to identify the components.
Example: Draw a component bar chart to represent the sales by product from 1957 to 1959.

3. Multiple bar chart
• These are used to display data on more than one variable.
• They are used for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.

B) Graphical presentation of data
• The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly applied graphical representations for continuous data.
• Procedure for constructing statistical graphs:
1. Draw and label the X and Y axes.
2. Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
3. Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.
4. Plot the points.

a) Histogram
• A graph that displays the data using vertical bars of various heights to represent frequencies.
• Class boundaries are placed along the horizontal axis.
• Class marks and class limits are sometimes placed on the X axis instead.
• In a histogram, the bars must be adjacent.
Example: The following table summarizes the management mid-exam scores (out of 35 marks) of 38 students.

• The histogram for this data: [figure not reproduced]

b) Frequency polygon
• A frequency polygon depicts a frequency distribution for discrete or continuous numeric data.
• It is used for understanding the shape of a distribution.
• It is drawn by placing the class midpoints on the x-axis and the frequencies on the y-axis.
• A histogram can easily be changed into a frequency polygon by joining the midpoints of the tops of adjacent rectangles with straight lines.
• It is also possible to draw a frequency polygon without drawing a histogram.
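The points of a frequency polygon can be computed without drawing a histogram first: each point is a class midpoint paired with the class frequency. In the sketch below the `classes` list is hypothetical, and closing the polygon with zero-frequency points one class width before the first class and after the last one is a common drawing convention assumed here, not something stated in these notes.

```python
# Hypothetical grouped distribution: (lower limit, upper limit, frequency).
classes = [(10, 19, 4), (20, 29, 9), (30, 39, 12), (40, 49, 5)]

# Class midpoint (class mark) = (lower limit + upper limit) / 2.
points = [((lower + upper) / 2, f) for lower, upper, f in classes]

# Assumed convention: close the polygon on the x-axis with zero-frequency
# points one class width before the first and after the last midpoint.
width = classes[1][0] - classes[0][0]
points = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for midpoint, f in points:
    print(f"x = {midpoint:5.1f}   frequency = {f}")
```

Joining these points in order with straight lines gives the polygon.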
Example: The following frequency distribution represents the ages (in years) of 60 merchants at a psychiatric counseling center.

c) Ogive (cumulative frequency polygon)
• A graph showing the cumulative frequency (less-than or more-than type) plotted against the upper or lower class boundaries, respectively.
• That is, class boundaries are plotted along the horizontal axis and the corresponding cumulative frequencies along the vertical axis.
• The points are joined by a freehand curve.
Example: These data represent the record high temperatures for each of the 50 states. Construct a less-than type ogive for the data given below.
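The points of a less-than and a more-than ogive can be sketched as running totals of the class frequencies. The `boundaries` and `freqs` below are hypothetical and are not the temperature data of the example; starting the less-than curve at zero on the first boundary, and ending the more-than curve at zero on the last, is an assumed convention.

```python
from itertools import accumulate

# Hypothetical grouped distribution: 5 class boundaries delimit 4 classes.
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
freqs = [4, 9, 12, 5]

# Less-than type: cumulative frequency at each upper class boundary,
# starting from 0 at the first boundary.
less_than = list(zip(boundaries, [0] + list(accumulate(freqs))))

# More-than type: cumulative frequency at each lower class boundary,
# falling to 0 at the last boundary.
more_than = list(zip(boundaries, list(accumulate(freqs[::-1]))[::-1] + [0]))

print("Boundary  Less than  More than")
for (b, lt), (_, mt) in zip(less_than, more_than):
    print(f"{b:>8}  {lt:>9}  {mt:>9}")
```

At every boundary the two cumulative frequencies add up to the total number of observations, which is a useful check before plotting.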
End of chapter two. Thank you!
