
Module 2 Data Collection, Presentation, Organization and Sampling Methods

At the end of the module the students should be able to:


1. differentiate primary data from secondary data;
2. explain the different methods of collecting data;
3. explain how data can be appropriately organized and displayed;
4. compute for the appropriate sample size;
5. identify sampling techniques; and
6. determine some of the sources of errors in sampling.

DATA COLLECTION
Data collection is the process of gathering and measuring information on variables of
interest, in an established systematic fashion that enables one to answer stated research
questions, test hypotheses, and evaluate outcomes.

SOURCES OF DATA
Whether conducting research in the social sciences, humanities, arts, or natural
sciences, the ability to distinguish between primary and secondary sources is
essential.
Primary Sources
Primary sources provide a first-hand account of an event or time period and are considered to be
authoritative. They represent original thinking, reports on discoveries or events, or
they can share new information. Often these sources are created at the time the
events occurred but they can also include sources that are created later. They are
usually the first formal appearance of original research.
Primary Data are data documented by the primary source. The data
collectors documented the data themselves. The first-hand information
obtained by the investigator is more reliable and accurate since the
investigator can extract the correct information by removing doubts, if any,
in the minds of the respondents regarding certain questions. High response
rates might be obtained since the answers to various questions are obtained
on the spot. It permits explanation of questions concerning difficult subject
matter.
Secondary Sources
Secondary sources offer an analysis, interpretation or a restatement of primary sources and are
considered to be persuasive. They often involve generalization, synthesis,
interpretation, commentary or evaluation in an attempt to convince the reader of
the creator's argument. They often attempt to describe or explain primary sources.
Secondary Data are data documented by a secondary source. The data
collectors had the data documented by other sources.
Data are primary for the agency that collected
them, and become secondary for someone else who uses these data for
his or her own purposes.
WAYS TO COLLECT PRIMARY DATA
1. Direct personal interviews - The researcher has direct contact with the interviewee.
The researcher gathers information by asking questions to the interviewee.
2. Indirect/Questionnaire Method - The researcher gives or sends out a written set of
questions (a questionnaire) for the respondents to answer. The questions may be
open-ended or closed-ended.
An open-ended question is a type of question that does not include response
categories. The respondent is not given any possible answers to choose from. This
type of question is usually appropriate for collecting subjective data. It permits free
responses that should be recorded in the respondent’s own words.
Example:
- Can you describe exactly what the traditional birth attendant did when
your labor started?
- What do you think are the reasons for a high drop-out rate of
village health committee members?
A closed-ended question is a type of question that includes a list of response
categories from which the respondent will select his answer. It is useful if the range
of possible responses is known. This type of question is usually appropriate for
collecting objective data.
Did you eat any of the following foods yesterday?
Fish or meat Yes No
Eggs Yes No
Milk or cheese Yes No
3. Focus group is a group interview of approximately six to twelve people who share
similar characteristics or common interests. A facilitator guides the group based on a
predetermined set of topics.
4. Experiment is a method of collecting data where there is direct human intervention
on the conditions that may affect the values of the variable of interest.
Bear in mind that the experimental method has several limitations that you should be
aware of.
- Ethical, moral, and legal Concerns
- Unrealistic Controlled Environments
- Inability to Control for All Variables
5. Observation is a technique that involves systematically selecting, watching and
recording behaviors of people or other phenomena and aspects of the setting in which
they occur, for the purpose of gaining specified information. It includes all
methods from simple visual observations to the use of high-level machines and
measurements, sophisticated equipment or facilities such as:
- Radiographic
- biochemical
- X-ray machines
- Microscope
- Clinical examinations

WAYS TO COLLECT SECONDARY DATA


1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.
Module 2
Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 1
Data Collection
I. Primary or Secondary Data. Determine the sources of data for the following.
____primary____ 1. The focus groups, individual respondents and panels of
respondents.
____secondary____ 2. The reports on quality control, production and financial accounts
issued by the companies.
____secondary____ 3. The government and non-government publications.
____secondary____ 4. The data which is generated within the company such as routine
business activities.
____primary____ 5. An artifact, document, diary, manuscript, or other source of
information created at the time of study.
____primary____ 6. A training record
____secondary____ 7. A map produced in 2016 showing what land European countries
controlled in the world in the 18th century.
____primary____ 8. A public opinion poll on what should be the next Mountain Dew
flavor.
____primary____ 9. Someone's Facebook page
____secondary____ 10. A report about the life of Jose P. Rizal written by a grade 6 student.

II. Choose the letter of the BEST answer. Write the letter of your choice on the
space provided.
1. A type of question that allows the audience to respond to the question based on their
understanding and experience.
A. multiple-choice questions C. open-ended questions
B. itemized questions D. closed-ended questions

2. Which of the following BEST describes primary data?
A. Data that you have collected yourself.
B. Data collected by a professional researcher.
C. Data collected for a purpose.
D. Data sourced on the internet.

3. Which of the following steps should you do first before collecting data?
A. Decide what the purpose of the data collection is.
B. Write your questions.
C. Collect personal information about respondents.
D. Contact potential participants.

4. Which of the following is an open-ended question?
A. Have you ever tried skating?
B. Would you like to try skating?
C. What would encourage and motivate you to try skating?
D. Would you consider skating if there were free lessons?

5. When creating research questions, which of the following should you avoid?
A. Biased questions C. Double-barreled questions
B. Questions that assume what they ask D. All of the above

6. Which of these is NOT an example of a secondary source of data?
A. Newspaper articles C. Encyclopedia
B. Diary entry D. Textbook

7. If you want to determine if people who take vitamin C every day are less likely to get
colds, which method of gathering data is the most appropriate?
A. Census C. Questionnaire
B. Sample Survey D. Experiment

8. Which of the following is a possible source of secondary data?
A. Interview session with the sample
B. Data collected from an experiment
C. Any data gained from a survey or questionnaires
D. Publications, government documents, brochures, newsletters, annual reports

9. What type of data gathering procedure are you using if you are choosing people at
random at the supermarket and asking them to taste two brands of orange juice to
determine which brand they prefer?
A. Census C. Experiment
B. Sample Survey D. Observational Study

10. If you want to conduct research about students’ vices, which of the following is the
most appropriate method?
A. Census C. Experiment
B. Sample Survey D. Observational Study
DATA PRESENTATION
Presentation of data refers to an exhibition or putting up data in an attractive and useful
manner such that it can be easily interpreted. The three main forms of presentation of
data are:
1. Textual Presentation
All the data is presented in the form of text, phrases, or paragraphs. It involves
enumerating important characteristics, emphasizing significant figures and identifying
important features of data. Text is the principal method for explaining findings, outlining
trends, and providing contextual information.
Example 1.
A researcher is asked to present the performance of a section in the
statistics test. The following are the test scores:

The data presented in textual form would be like this:


In the statistics class of 40 students, 3 obtained the perfect score of 50.
Sixteen students got a score of 40 and above, while only 3 got 19 and below.
Generally, the students performed well in the test with 23 or 70% getting a
passing score of 38 and above.
Example 2.
The full year 2015 poverty incidence among population, or the proportion of
poor Filipinos, was estimated at 23.3 percent (revised from the 21.6 percent
released last 27 October, 2016). This translates to 23.5 million Filipinos
(from 21.9 million) who lived below the poverty threshold estimated at PhP
9,452 (from PhP 9,064), on average, for a family of five per month in 2015.

2. Tabular Presentation
It is a systematic and logical arrangement of data in the form of Rows and Columns with
respect to the characteristics of data. A table is best suited for representing individual
information and represents both quantitative and qualitative information.
2.1. Simple or One-Way Table
Example 1.
Table 2
First Grading Grades Scale in Mathematics 8

Grade       Frequency   Percentage   Verbal Interpretation
90-100      0           0.00         Outstanding
85-89       1           3.33         Very Satisfactory
80-84       8           26.67        Satisfactory
75-79       8           26.67        Fairly Satisfactory
Below 75    13          43.33        Did Not Meet the Expectations
Total       30          100.00

It can be seen in the table that 13 or 43.33% of the students have grades of
below 75 or students who did not meet the expectations, 8 or 26.67% have grades
of 75-79 or students who are fairly satisfactory, 8 or 26.67% have grades of 80-84
or students who are satisfactory, only 1 or 3.33% has a grade of 85-89 or
considered to be very satisfactory, and none of the students got grades of 90-100
or considered as outstanding students.

Example 2.

2.2. Compound Table


A compound table is just an extension of a simple table in which there is more than
one variable distributed among its attributes. An attribute is just a quality, property
or component of a variable according to which it can be differentiated with respect
to other variables. We may refer to a compound table as a cross tabulation or as
a contingency table, depending on the context in which it is used.
Example 1.

The country’s total external trade in goods in June 2021, which amounted
to USD 15.84 billion, grew at an annual rate of 26.8 percent. In the previous month,
the annual increase was recorded at 44.8 percent, while in June 2020, the decline
was -16.4 percent. (Table A)
Of the total external trade in June 2021, 58.9 percent were imported goods,
while the rest were exported goods.

Example 2.

The country’s unemployment rate in June 2021 remained the same as the
7.7 percent reported a month ago. This is lower than the unemployment rates in
April (8.7%), February (8.8%), and January (8.7%) of the same year, but higher
than the 7.1 percent reported in March 2021.
3. Graphical Presentation
A graph is a very effective visual tool as it displays data at a glance, facilitates comparison,
and can reveal trends and relationships within the data such as changes over time, and
correlation or relative share of a whole. It is considered an important medium of
communication because we are able to create a pictorial representation of the numerical
figures.

Bar Graph
It is constructed by labeling each category of data on either the horizontal or vertical axis
and the frequency or relative frequency of the category on the other axis. Rectangles of
equal width are drawn for each category. The height of each rectangle represents the
category’s frequency or relative frequency. It is used to organize discrete data.
Simple Bar Graph
A simple bar chart is used to represent data involving only one variable classified
on a spatial, quantitative or temporal basis. In a simple bar chart, we make bars of
equal width but variable length, i.e., the magnitude of a quantity is represented by
the height or length of the bars.
https://www.emathzone.com/tutorials/basic-statistics/simple-bar-chart.html#ixzz73qSFdN3y

Example 1.

Multiple Bar Graph


In a multiple bar diagram, two or more sets of inter-related data are represented
(a multiple bar diagram facilitates comparison between more than one phenomenon).
https://www.emathzone.com/tutorials/basic-statistics/multiple-bar-chart.html#ixzz73qSpMg5r
Example 2.

Pie Chart
A Pie Chart is a type of graph that displays data in a circular graph. The pieces of the
graph are proportional to the fraction of the whole in each category. In other words, each
slice of the pie is relative to the size of that category in the group as a whole. The entire
“pie” represents 100 percent of a whole, while the pie “slices” represent portions of the
whole.
Example 1.
Example 2.

Line Graph
A line graph is a type of chart used to show information that changes over time. We plot
a line using several points connected by straight lines. We also call it a line chart. The line
graph comprises two axes known as the ‘x’ axis and the ‘y’ axis.
Simple Line Graph
A simple line graph is a kind of graph that is plotted with only a single line
showing the relationship between two variables.
Example 1.
Multiple Line Graph
A multiple line graph is a line graph that is plotted with two or more lines. It is used
to depict two or more variables that change over the same period of time. The
independent variable is usually on the horizontal axis, while the 2 or more
dependent variables are on the vertical axis.
Example 2.

Activity 2
Data Presentation
Answer the following:
1. The following table shows the total numbers (in millions) of tourists visiting each country
and the numbers of English tourists visiting each country:

a. Draw a bar chart showing the total numbers visiting each country.

b. Draw a pie chart showing the distribution of English tourists between the four
destination countries.

2. Suppose the numbers of books read by each student were randomly listed as follows.
Use the data below to make a frequency distribution table with the inclusion of the
percentage.
TABULAR PRESENTATION OF QUANTITATIVE DATA
Data for quantitative variables may likewise be organized by determining the frequency
counts belonging to each group called classes or class intervals. Consequently, we need
to prepare a stem-and-leaf display or construct a frequency distribution table, to effectively
present the data.
Suppose a region-wide survey was conducted to determine the region's functional literacy
rate. Functional literacy, according to the National Statistics Office (NSO), is a higher level of
literacy which includes not only reading and writing skills but also numerical and
comprehension skills. The survey covers household members 10 to 64 years old in the
provinces and key cities of the region. The literacy rate of the sample was determined,
and the results are as follows:

84 78 90 84 95 82 84 75 83 89
88 90 88 91 89 85 98 86 92 93
66 98 81 87 74 89 98 79 84 87
80 89 73 86 82 94 97 94 86 93
93 95 96 97 88 77 96 76 88 92

Literacy rate, a quantitative variable, may be organized using a stem and leaf
display or frequency distribution table.

STEM-and-LEAF DISPLAY
A stem-and-leaf display presents quantitative data in condensed form while retaining
the individual observations, so no information is lost. Each value in this type of
presentation is divided into two parts: a stem and a leaf. The leaves for each stem are
shown separately in the presentation.
How to Prepare a Stem-and-Leaf Display
1. Split each value into two parts. The first part is the first digit, which is called the
stem. The second part will be the second digit, which is called the leaf.
2. Draw a vertical line and write the stems on the left side of it arranged in
ascending order.
3. After listing the stems, read the leaves for all values and record them next to the
corresponding stems on the right side of the vertical line.
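
The three steps above can be sketched in Python. This is only a minimal illustration, assuming two-digit whole-number values such as the literacy rates listed earlier; the function name stem_and_leaf is a hypothetical helper introduced here, not part of the module.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split each two-digit value into a stem (tens digit) and a leaf (ones digit)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 84 -> stem 8, leaf 4
        groups[stem].append(leaf)
    return dict(groups)


# Literacy rates from the survey data given above.
literacy_rates = [
    84, 78, 90, 84, 95, 82, 84, 75, 83, 89,
    88, 90, 88, 91, 89, 85, 98, 86, 92, 93,
    66, 98, 81, 87, 74, 89, 98, 79, 84, 87,
    80, 89, 73, 86, 82, 94, 97, 94, 86, 93,
    93, 95, 96, 97, 88, 77, 96, 76, 88, 92,
]

display = stem_and_leaf(literacy_rates)

# Stems on the left of the vertical line (ascending), leaves on the right.
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Sorting the values first means the leaves of each stem also come out in ascending order, which makes the shape of the distribution easier to read.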
Tabular Presentation for Grouped Data
Grouped data are data formed by arranging individual observations of a variable into
groups, so that a frequency distribution table of these groups provides a convenient way
of summarizing or analyzing the data. Grouping of data plays a significant role
when we have to deal with large data sets. This information can also be displayed using a
pictograph or a bar graph.
Example 1. Construct a frequency distribution table for the given data below.
Math Quiz Scores of Grade 8

6 23 20 24 4 14 11 11 8 21
22 22 11 14 7 4 24 25 22 5
24 23 16 15 25 7 11 25 20 19
11 22 2 12 14 13 17 23 16 22
22 9 7 22 4 13 20 24 1 7

To construct a frequency distribution table for quantitative data, we have the
following steps:
1. Find the range of the data set. The range (R) is given by the difference between
the highest (H) and lowest (L) data entries. So, for our given data set we have:
R = H – L = 25 – 1 = 24
2. Determine the number of classes, also known as the number of class intervals (c).
Note that these classes represent a variable. One rule to help us decide on the
number of classes is to use Sturges' formula, given by:
c = 1 + 3.322 log n
where: c – number of classes
n – sample size/total frequency
log – logarithm to base 10
Therefore: c = 1 + 3.322 log 50 ≈ 6.64 → c ≈ 7
3. Find the class size (i), also known as the class width of the data set. Divide the
range by the number of classes (c) and round to the nearest whole number to find the
class size of the data set. Thus, we have
i = R / c
where: i = class size
R = range
c = number of classes
i = 24 / 7
i = 3.43
i ≈ 3
4. List the class intervals of the data set. For the given data, we use a class size (i)
of 3. Because the classes must cover every value from the lowest (1) to the highest (25),
a class size of 3 starting from 0 requires nine classes (0-2 up to 24-26), two more than
the seven suggested by Sturges' formula. Determine also the lower limits and the
upper limits of the classes.
a. The lower limit of the first class interval is the number nearest to the
lowest value of the data entries that is divisible by the class size. This
value should be less than or equal to the lowest value.
For the given data, the lowest value is 1. The nearest number not exceeding 1 that is
divisible by 3 is 0, which is the lower limit of the first class interval. To find
the lower limits of the remaining classes, add the class size to the lower
limit of each previous class.
b. The upper limit of the first class interval is a number that is one less
than the lower limit of the second class. The upper limits of the remaining
classes are determined by adding the class size to the upper limit of
each previous class.
5. Tally the entries from each class interval.
Table 1
Distribution of the Grade 8 Students in Math Quiz

Scores    Frequency   Percentage
0-2       2           4
3-5       4           8
6-8       6           12
9-11      6           12
12-14     6           12
15-17     4           8
18-20     4           8
21-23     11          22
24-26     7           14
Total     50          100
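
The construction of Table 1 can be checked with a short Python sketch. It is an illustration under stated assumptions, not a prescribed procedure: the number of classes and the class size are rounded to the nearest whole number (as in the worked steps), and classes are added until the highest score is covered, which is why the result has nine classes, as in Table 1. The variable names are chosen here for illustration.

```python
import math

# Math quiz scores of Grade 8 from Example 1.
scores = [
    6, 23, 20, 24, 4, 14, 11, 11, 8, 21,
    22, 22, 11, 14, 7, 4, 24, 25, 22, 5,
    24, 23, 16, 15, 25, 7, 11, 25, 20, 19,
    11, 22, 2, 12, 14, 13, 17, 23, 16, 22,
    22, 9, 7, 22, 4, 13, 20, 24, 1, 7,
]

n = len(scores)
data_range = max(scores) - min(scores)               # R = H - L
num_classes = round(1 + 3.322 * math.log10(n))       # Sturges' formula
class_size = round(data_range / num_classes)         # i = R / c

# Lower limit of the first class: largest multiple of the class size
# that does not exceed the lowest score.
lower = (min(scores) // class_size) * class_size

table = []  # rows of (lower limit, upper limit, frequency, percentage)
while lower <= max(scores):
    upper = lower + class_size - 1
    frequency = sum(lower <= s <= upper for s in scores)
    table.append((lower, upper, frequency, 100 * frequency / n))
    lower += class_size

for low, high, freq, pct in table:
    print(f"{low}-{high}\t{freq}\t{pct:.0f}")
```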
Histogram
A histogram is a display of statistical information that uses rectangles to show the
frequency of data items in successive numerical intervals of equal size. In the most
common form of histogram, the independent variable is plotted along the horizontal
axis and the dependent variable is plotted along the vertical axis. The data appears
as colored or shaded rectangles of variable area.

Activity 3
Tabular Presentation for Quantitative Data

Answer the following:

1. Below is the list of the weekly expenses (in ₱) of 50 students in BSCE.
188 90 95 140 241 183 405 203 369 191
302 359 219 120 261 238 112 361 164 102
131 253 147 302 192 144 115 105 129 91
374 149 167 267 314 177 151 161 305 247
211 312 311 281 260 250 340 138 233 341

a. Make a stem-and-leaf display for the given data.

b. Use Sturges' formula to make a tabular presentation for the above data.
Include the percentage in your tabular presentation.

c. Create a histogram using the grouped data in b.


Sampling Techniques

A sampling technique is the name or other identification of the specific process by
which the entities of the sample have been selected. Sampling is a process used in
statistical analysis in which a predetermined number of observations are taken from a
larger population. The methodology used to sample from a larger population depends on
the type of analysis being performed.

Probability Sampling

Probability sampling is defined as a sampling technique in which the researcher
chooses samples from a larger population using a method based on the theory of
probability. For a participant to be considered part of a probability sample, he or she
must be selected through random selection.

Simple Random Sampling


As the name suggests, simple random sampling is an entirely random method of
selecting the sample. There are two ways in which researchers choose the samples in this
method of sampling: the lottery system, and number-generating software or a
random number table. This sampling technique usually works around a large
population and has its fair share of advantages and disadvantages.
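
As a rough illustration of the "number-generating software" approach, the sketch below draws a simple random sample with Python's standard random module. The population of 100 numbered members, the sample size of 10, and the helper name simple_random_sample are assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Sampling is done without replacement, so no member can appear twice, just as a lottery ticket is not returned to the box once drawn.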

Stratified Random Sampling


It involves a method where the researcher divides a larger
population into smaller groups (strata) that usually don’t overlap but together represent
the entire population. The researcher organizes these groups and then draws a sample from
each group separately.
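
A minimal Python sketch of this idea follows. The two strata, their sizes, and the function name stratified_sample are hypothetical and chosen only to show a separate random draw from each group.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw a separate simple random sample of sizes[name] members from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical population of teachers, divided into two non-overlapping strata.
strata = {
    "male": [f"M{i}" for i in range(1, 121)],
    "female": [f"F{i}" for i in range(1, 181)],
}
sample = stratified_sample(strata, {"male": 20, "female": 30}, seed=1)
print({name: len(members) for name, members in sample.items()})
```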
Random Cluster Sampling

It is a way to randomly select participants who are spread out
geographically. For example, if you wanted to choose 100 participants from the
entire population of the Philippines, it is likely impossible to get a complete list of
everyone. Instead, the researcher randomly selects areas (e.g., cities or counties)
and randomly selects participants from within those boundaries.
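
The two-stage selection described above can be sketched in Python as follows. The eight areas, the 50 residents per area, and the function name cluster_sample are made up for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick members at random within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical areas, each with 50 residents labelled "<area>-<number>".
clusters = {
    f"City {letter}": [f"{letter}-{i}" for i in range(1, 51)]
    for letter in "ABCDEFGH"
}
sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
print(sample)
```

Only the residents of the selected areas need to be listed, which is what makes the method practical when no complete list of the population exists.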

Systematic Random Sampling


It is when you choose every “nth” individual to be a part of the sample. For
example, you can select every 5th person to be in the sample. Systematic
sampling is an extended implementation of probability sampling in
which members of the group are selected at regular intervals to form a sample.
When the starting point is chosen at random, there is an equal opportunity for every
member of a population to be selected using this sampling technique.
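
The "every nth individual" rule can be sketched in Python as below. The list of 100 people, the interval of 5, and the name systematic_sample are assumptions for the example; the random starting point among the first k members is what gives each member a chance of selection.

```python
import random


def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0 .. k-1
    return population[start::k]


# Hypothetical list of 100 people, numbered in the order they appear.
people = list(range(1, 101))
sample = systematic_sample(people, k=5, seed=3)
print(sample)
```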
Non-Probability Sampling

Non-probability sampling is a sampling technique where the odds of any member
being selected for a sample cannot be calculated. It’s the opposite of probability
sampling, where you can calculate the odds. In addition, probability sampling involves
random selection, while non-probability sampling does not—it relies on the subjective
judgement of the researcher.

Convenience Sampling
As the name suggests, this involves collecting a sample from somewhere
convenient to you: the mall, your local school, your church. It is sometimes called
accidental sampling, opportunity sampling or grab sampling.
Haphazard Sampling
In this method, a researcher chooses items haphazardly, trying to simulate
randomness. However, the result may not be random at all and is often tainted
by selection bias.
Purposive Sampling
In this method, the researcher chooses a sample based on their knowledge about the
population and the study itself. The study participants are chosen based on the
study’s purpose. There are several types of purposive sampling.
Quota Sampling
It is the non-probability counterpart of stratified random sampling: the groups
(e.g., men and women) in the sample are proportional to
the groups in the population.
Snowball Sampling
In this method, research participants recruit other members for the study. This method
is particularly useful when participants might be hard to find, for example, in a study
on working prostitutes or current heroin users.

Activity 4
Sampling Techniques

I. Identify the sampling method used in the following:

____systematic____ 1. Every fifth person boarding a plane is searched thoroughly.
____cluster____ 2. At a local community college, five math classes are randomly
selected out of 20 and all of the students from each class are interviewed.
____stratified____ 3. A researcher randomly selects and interviews fifty male and
fifty female teachers.
____cluster____ 4. A researcher for an airline interviews all of the passengers on
five randomly selected flights.
____voluntary response____ 5. Based on 13,500 responses from 42,000 surveys sent to its
alumni, a major university estimated that the annual salary of its alumni was 92,500.
____convenience____ 6. A community college student interviews the first 100 students
to enter the building to determine the percentage of students that own a car.

7. A psychologist is studying the sleep patterns of the 4060 students at her university.
She decides to start by asking a random sample of 30 students how many hours of sleep
they get weekday nights. Identify the type of sample in each of the following survey
methods.
____systematic____ a. The psychologist assigns each student a number from
1 to 4060. She selects the sample by randomly choosing one of the first 132
numbers and every 132nd number thereafter.
____simple____ b. The psychologist assigns each student a number from
0001 to 4060 and uses a computer to randomly generate a list of 30 numbers to
select the students for the sample.
____cluster____ c. Students are listed by the neighborhood they live in.
The psychologist randomly selects six neighborhoods and then randomly selects
five students from each one.
____stratified____ d. An equal proportion of students are randomly selected
from each discipline.
SAMPLE SIZE

In doing research, if the population is too big to handle, working with a sample of
adequate size is acceptable. Determining the sample size is a very important consideration
because a sample that is too large wastes time, resources and money, while a sample that is
too small may lead to inaccurate results.

Using Slovin’s formula:

Determining n

\[ n = \frac{N}{1 + Ne^{2}} \]

where: N = population size and e = margin of error

Determining e

\[ e = \sqrt{\frac{N - n}{Nn}} \]

Examples:

I. Find the sample size:

1. Given: N = 1,000; e = 5%

\[ n = \frac{N}{1 + Ne^{2}} = \frac{1{,}000}{1 + (1{,}000)(0.05)^{2}} = \frac{1{,}000}{3.5} = 285.71 \approx 286 \]

2. Given: N = 40,000; e = 10%

\[ n = \frac{N}{1 + Ne^{2}} = \frac{40{,}000}{1 + (40{,}000)(0.1)^{2}} = \frac{40{,}000}{401} = 99.75 \approx 100 \]

3. A researcher is conducting an investigation regarding the factors affecting the
efficiency of 185 faculty members of a certain college with a margin of error of 5%.

\[ n = \frac{N}{1 + Ne^{2}} = \frac{185}{1 + (185)(0.05)^{2}} = \frac{185}{1.4625} = 126.496 \approx 126 \]

4. Given a population size of 250 and 95% accuracy (e = 5%).

\[ n = \frac{N}{1 + Ne^{2}} = \frac{250}{1 + (250)(0.05)^{2}} = \frac{250}{1.625} = 153.85 \approx 154 \]
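
The four sample-size computations above can be reproduced with a small Python function. This is a sketch that assumes the result is rounded to the nearest whole number, as in the worked examples; the function name slovin_sample_size is introduced here for illustration.

```python
def slovin_sample_size(N, e):
    """Slovin's formula n = N / (1 + N * e^2), rounded to the nearest whole number."""
    return round(N / (1 + N * e**2))


# (population size, margin of error) pairs from the examples above.
examples = [(1_000, 0.05), (40_000, 0.10), (185, 0.05), (250, 0.05)]
for N, e in examples:
    print(f"N = {N}, e = {e}: n = {slovin_sample_size(N, e)}")
```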
II. Find the margin of error (e), given:

1. N = 10 000 and n = 2 000

\[ e = \sqrt{\frac{N - n}{Nn}} = \sqrt{\frac{10\,000 - 2\,000}{10\,000\,(2\,000)}} \]

e = 0.02 or e = 2%

2. N = 7 250 and n = 379

\[ e = \sqrt{\frac{N - n}{Nn}} = \sqrt{\frac{7\,250 - 379}{7\,250\,(379)}} \]

e = 0.05 or e = 5%
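
The margin-of-error formula follows from solving Slovin's formula for e, and the two results above can be checked with a short Python sketch. The function name margin_of_error is an assumption made for the example.

```python
import math


def margin_of_error(N, n):
    """Margin of error implied by Slovin's formula: e = sqrt((N - n) / (N * n))."""
    return math.sqrt((N - n) / (N * n))


print(round(margin_of_error(10_000, 2_000), 2))
print(round(margin_of_error(7_250, 379), 2))
```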

For Systematic Random Sampling

Find the number of employee-respondents to sample from a population of 1,000
employees in Archgames.corp using systematic random sampling such that the
margin of error is 10%.

\[ n = \frac{N}{1 + Ne^{2}} = \frac{1{,}000}{1 + (1{,}000)(0.1)^{2}} = \frac{1{,}000}{11} = 90.91 \approx 91 \]

\[ k = \frac{N}{n} = \frac{1{,}000}{91} = 10.99 \approx 11 \]

This means that every 11th element will be selected for the sample.
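
The computation of n and k above, followed by the actual selection, might look like this in Python. It is a sketch under the assumptions that employees are numbered 1 to 1,000 and that the starting point is drawn at random from the first k numbers; because k is rounded, the number of employees actually selected can be 90 or 91 depending on the start.

```python
import random

N, e = 1_000, 0.10

n = round(N / (1 + N * e**2))  # Slovin's formula
k = round(N / n)               # sampling interval

rng = random.Random(0)         # fixed seed so the draw is repeatable
start = rng.randint(1, k)      # random start among employees 1 .. k

# Employee numbers chosen: start, start + k, start + 2k, ...
selected = list(range(start, N + 1, k))
print(n, k, len(selected))
```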


For Proportional Stratified Random Sampling

Given the distribution of respondents below, how many samples of each
category will be included in the sample if the margin of error is 5%?

\[ n = \frac{N}{1 + Ne^{2}} = \frac{1{,}500}{1 + (1{,}500)(0.05)^{2}} = \frac{1{,}500}{4.75} = 315.79 \approx 316 \]

Category        Population Size (N)   Number of Samples (n)
Supervisors     900                   n = (900 / 1,500) × 316 ≈ 190
Team Leaders    500                   n = (500 / 1,500) × 316 ≈ 105
Agents          100                   n = (100 / 1,500) × 316 ≈ 21
TOTAL           1,500                 316
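
The proportional allocation in the table can be reproduced with the Python sketch below. The function name proportional_allocation is hypothetical, and each stratum is rounded to the nearest whole number as in the table. In general, rounding each stratum separately may give a total that differs from n by one or two, so the total should be checked.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n to strata in proportion to their population sizes."""
    N = sum(strata_sizes.values())
    return {name: round(size / N * n) for name, size in strata_sizes.items()}


strata_sizes = {"Supervisors": 900, "Team Leaders": 500, "Agents": 100}

N = sum(strata_sizes.values())
n = round(N / (1 + N * 0.05**2))  # Slovin's formula with e = 5%

allocation = proportional_allocation(strata_sizes, n)
print(allocation, sum(allocation.values()))
```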

Activity 5
Sample Size
Answer the following:

1. Given the population size with a corresponding margin of error, determine the sample
size of each of the following, showing your solution.
a. N = 1 500; e = 5%
b. N = 3 050; e = 1%
c. N = 6 075; e = 10%
d. N = 2 500; e = 5%
e. N = 1 200 500; e = 5%
2. A survey to find out if families living in a certain municipality are in favor of a
"No Mask, No Entry" policy will be conducted. To ensure that all groups according to
their income are represented, respondents will be divided into high income (Class A),
middle income (Class B) and low income (Class C).
a. Using a 5% margin of error, find the sample size for the given data below.
b. Use the sample size and complete the table by using the proportional stratified
sampling method.

STRATA     No. of Families (N)   Solution   Number of Samples (n)
Class A    1 000
Class B    1 500
Class C    2 500
Total
