
12

Inquiries, Investigations & Immersion
Quarter 4 – Module 5: Finding Answers to Research Questions (Quantitative)
Inquiries, Investigations and Immersion – Grade 12
Quarter 4 - Module 5: Finding Answers to Research Questions (Quantitative)
First Edition, 2021

Republic Act 8293, section 176 states that: No copyright shall subsist in any
work of the Government of the Philippines. However, prior approval of the government
agency or office wherein the work is created shall be necessary for exploitation of such
work for profit. Such agency or office may, among other things, impose as a condition
the payment of royalties.

Borrowed materials (i.e., songs, stories, poems, pictures, photos, brand names,
trademarks, etc.) included in this module are owned by their respective copyright
holders. Every effort has been exerted to locate and seek permission to use these
materials from their respective copyright owners. The publisher and authors do not
represent nor claim ownership over them.

Published by the Department of Education

Development Team
Writer: Jeane Eloise B. Palen
Editor: Evelyn C. Tripoli
Reviewer: Rina Joyce Ajos
Illustrator:
Layout Artist:
Management Team:
Josephine L. Fadul – Schools Division Superintendent
Melanie P. Estacio - Assistant Schools Division Superintendent
Christine C. Bagacay – Chief – Curriculum Implementation Division
Darwin F. Suyat – Education Program Supervisor – English
Lorna C. Ragos - Education Program Supervisor - Learning Resources
Management

Printed in the Philippines by the
Department of Education – Division of Tagum City
Office Address: Energy Park, Apokon, Tagum City, 8100
Telefax: (084) 216-3504
E-mail Address: tagum.city@deped.gov.ph
Introductory Message
This Self-Learning Module (SLM) is prepared so that you, our dear learners, can continue
your studies and learn while at home. Activities, questions, directions, exercises, and
discussions are carefully stated for you to understand each lesson.

Each SLM is composed of different parts. Each part shall guide you step-by-step as you
discover and understand the lesson prepared for you.

Pre-tests are provided to measure your prior knowledge of the lessons in each SLM. This will tell
you whether you can proceed with completing this module or whether you need to ask your facilitator
or your teacher for assistance for a better understanding of the lesson. At the end of each
module, you need to answer the post-test to self-check your learning. Answer keys are
provided for each activity and test. We trust that you will be honest in using these.

In addition to the material in the main text, Notes to the Teacher are also provided to our
facilitators and parents for strategies and reminders on how they can best help you on your
home-based learning.

Please use this module with care. Do not put unnecessary marks on any part of this SLM. Use
a separate sheet of paper in answering the exercises and tests. And read the instructions
carefully before performing each task.

If you have any questions in using this SLM or any difficulty in answering the tasks in this
module, do not hesitate to consult your teacher or facilitator.

Thank you.

Let Us Learn!

After going through this module, you are expected to:

1. Gather and analyze data with intellectual honesty using suitable techniques.

By the end of the module, the learners are expected to:

• Apply relevant descriptive statistics in analyzing and interpreting a problem;
• Construct a codebook which can aid in analyzing data collected from survey research;
• Write the results and discussion of their study; and
• Discuss the importance of proper data management practices.

Let Us Try!

Choose the best answer. Write your answer on a separate sheet of paper.

1. What does quantitative data refer to?


a. Graphs and tables
b. Numerical data that could usefully be quantified to help you answer your
research question/s and to meet your objectives.
c. Any data you present in your report.
d. Statistical analysis
2. Which measure of central tendency is obtained using the middle score when all
scores are organized in numerical order?
a. Mean c. Mode
b. Median d. None of these
3. Which measure of central tendency is obtained by calculating the sum of values and
dividing this figure by the number of values there are in the data set?
a. Mean c. Mode
b. Median d. None of these
4. Which measure of central tendency is derived from the most common value?
a. Mean c. Mode
b. Median d. None of these
5. What method is used to compute average or central value of collected data?
a. Measure of positive variation
b. Measures of central tendency
c. Measures of negative skewness
d. Measures of negative variation
6. What does standard deviation refer to?
a. A way of measuring extent of spread of quantifiable data.
b. Inappropriate in management and business research.
c. A way of describing those phenomena that are not the norm.
d. A way of illustrating crime statistics.

For questions 7 to 9, refer to the following problem:

A survey was conducted to know the audience feedback on a dance presentation. It asked
this question:

“In your opinion the dance presentation was entertaining, boring, or neither?”

Respondents | Entertaining | Boring | Neither
A           | 1            |        |
B           | 1            |        |
C           | 1            |        |
D           |              | 1      |
E           |              |        | 1
Total       | 3            | 1      | 1

7. What percentage of the respondents said that the dance presentation is entertaining?
a. 50% c. 70%
b. 60% d. 20%
8. What percentage of the respondents said that the dance presentation is boring?
a. 50% c. 70%
b. 60% d. 20%
9. What percentage of the respondents said that the dance presentation is neither
entertaining nor boring?
a. 50% c. 70%
b. 60% d. 20%
10. The total marks obtained by a few students in a mathematics exam are 100, 160, 154, 95,
and 82. What is the mean?
a. 117.2 c. 119.2
b. 118.2 d. 120.2

Let Us Study

A. THE DATA MANAGEMENT PROCESS


A researcher must be knowledgeable in managing the data obtained in the process of
completing their research study. Without this background knowledge, investigators are
left to a trial-and-error approach or dependence on other team members to determine
appropriate data management strategies. The data management process starts with data
preparation and data collection and ends with data maintenance (as shown in Figure 1). In
this section, we discuss the different steps in the data management process.

Figure 1. Data Management Process

Source: Josefina Almeda [UP Statistical Society]. (2021, April 19). CLEARING
PATHWAYS: Significance of Proper Data Handling in Empowering
Scientific Thinking [Video]. Facebook.
https://www.facebook.com/upstatsoc/videos/vb.203566473003057/3
85295256195607/

Data Collection
Data collection or data gathering is defined as the process of gathering and
measuring information on variables of interest, in an established systematic method
that enables one to answer stated research questions, test hypotheses, and evaluate
outcomes. There are several techniques or strategies for data collection with
corresponding statistical instruments. These data collection strategies were discussed
in module 4 (interview, observations, survey questionnaires, and experiments). The
kind of analysis that can be performed on a set of data will be influenced by the goals
identified at the outset, and the data gathered.

Quantitative research is concerned with testing hypotheses derived from theory and/or
with estimating the size of a phenomenon of interest. Depending on the research
question, participants may be randomly assigned to different treatments. If this is not
feasible, the researcher may collect data on participant and situational characteristics
to statistically control their influence on the dependent or outcome variable. If the
intent is to generalize from the research participants to a larger population, the
researcher will employ probability sampling to select participants.

The quantitative data collection method relies on random sampling and structured
data collection instruments that fit diverse experiences into predetermined response
categories. It produces results that are easy to summarize, compare, and generalize.
To obtain reliable information that will help you answer the research questions,
follow these steps:

1. Determine the objectives of the study you are undertaking.
2. Define the population of interest.
3. Choose the variables that you will measure in the study.
4. Decide on an appropriate design for producing data.
5. Collect the data.
6. Determine the appropriate descriptive and/or data analysis techniques.
Data Editing
In this stage, the raw data collected will be transferred to the data editors to check
for the completeness, accuracy, and preciseness of the data. The adage “garbage in,
garbage out” captures the central issue in data management: the quality of your
analysis depends on the quality of the raw data you used. Hence, the quality of the data
collected is foundational to the validity of study findings. Quality data collection
requires a systematic approach and includes 1) training data collectors and 2)
monitoring the completeness and accuracy of raw data. The latter is the focus of this
stage of the process.
In a well-executed study, the data collection plan, including procedures, instruments,
and forms, is designed and pretested to maximize accuracy. All data collection
activities are monitored to ensure adherence to the data collection protocol and to
prompt actions to minimize and resolve missing and questionable data. Monitoring
procedures are instituted at the outset and maintained throughout the study, since
the faster irregularities can be detected, the greater the likelihood that they can be
resolved in a satisfactory manner and the sooner preventive measures can be
instituted.
Nevertheless, there is often the need to “edit” data, both before and after they are
computerized. The first step is manual or visual editing. Before forms are encoded in
the computer, the forms are reviewed to spot irregularities and problems that
escaped notice or correction during monitoring.
Open-ended questions, if there are any, usually need to be coded. This will be
discussed in the next module (qualitative analysis). Codes for encoding may also be
needed for closed-ended questions. Even forms with only closed-ended questions
having pre-coded responses (i.e., with numbers or letters corresponding to each
response choice) may require coding for situations such as unclear or ambiguous
responses, multiple responses to a single item, written comments from the
participant or data collector, and other situations that arise.
Code names for variables should be meaningful and easy to remember. Coding and
naming conventions should be standardized for files, variables, programs, and other
entities in a data management system. For example, in a longitudinal study (RWHP)
in which the researcher collected data at three different time points, the individual
data files were named RWHP1, RWHP2, and RWHP3. For brevity, all variable names
were limited to eight characters or fewer. A coding manual was written for the study
that matched all variable names with variable labels and codes.
When variables are measured across multiple data collection points using the same
measures, the variable names must reflect the different time points of data collection
as well as the different versions of instruments that might have been used. In Table 1,
the variable names are described for variables measured at three different data
collection points. As noted in the table, each variable name is slightly modified to
reflect the time when the variable was measured.
Table 1. Examples of Variable Names.

Variable Description              | Variable Name, Time 1 | Variable Name, Time 2 | Variable Name, Time 3
What region do you live in?       | aRegion               | bRegion               | cRegion
Do you have a paying job?         | aPayjob               | bPayjob               | cPayjob
Have you been told you have AIDS? | aAids                 | bAids                 | cAids
Coping Scale (54 items)           | aCope1-aCope54        | bCope1-bCope54        | cCope1-cCope54
Social Support Scale (19 items)   | aSS1-aSS19            | bSS1-bSS19            | cSS1-cSS19
Depression Scale (20 items)       | aDS1-aDS20            | bDS1-bDS20            | cDS1-cDS20
Another example is the codebook table shown below, which includes codes for
responses to closed-ended questions. Remember, consistent rules must be used for
coding variables.
Table 2. Sample Codebook for a Survey Questionnaire

Variable                                  | Code Name | Code
1. Gender                                 | Gender    | M – Male; F – Female
2. Track                                  | TrackSHS  | 1 – GAS; 2 – STEM
3. Overall assessment of health education | OAHE      | 1 – Worse; 2 – Very Poor; 3 – Normal; 4 – Good; 5 – Excellent
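If you manage your data in Python, the codebook itself can live in the analysis script. Below is a minimal sketch (pandas assumed; the sample records are hypothetical) that stores the Table 2 codes as a dictionary and decodes them into labels for reporting.

```python
# A minimal sketch of applying the Table 2 codebook in Python.
# The sample records below are hypothetical.
import pandas as pd

codebook = {
    "Gender": {"M": "Male", "F": "Female"},
    "TrackSHS": {1: "GAS", 2: "STEM"},
    "OAHE": {1: "Worse", 2: "Very Poor", 3: "Normal", 4: "Good", 5: "Excellent"},
}

df = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "TrackSHS": [2, 1, 2],
    "OAHE": [4, 3, 5],
})

# Keep the coded columns for analysis; add labeled columns for tables and figures.
for var, codes in codebook.items():
    df[var + "_label"] = df[var].map(codes)

print(df)
```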

Data Entry
Only after the data have been edited should you enter them into the computer
system. For small questionnaires and data forms, data can be encoded directly into a
spreadsheet or even a plain text file. A customized data entry program often checks
each value as it is entered, to prevent illegal values from entering the data set. This
facility serves to reduce keying errors but will also detect illegal responses on the
form that slipped through the visual edits. You can do this using the Data Validation
tool in an Excel spreadsheet; there are multiple tutorials online that you can follow
in applying data validation to your database.
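The same kind of validation can be scripted. Below is a minimal Python sketch of entry checks; the variable names and legal values are illustrative only and mirror the sample codebook above.

```python
# A sketch of programmatic entry checks, mirroring what Excel's Data Validation
# tool does: reject values outside the legal set before they enter the data set.
# Variable names and legal values are illustrative only.
LEGAL_VALUES = {
    "Gender": {"M", "F"},
    "TrackSHS": {1, 2},
    "OAHE": {1, 2, 3, 4, 5},
}

def check_record(record: dict) -> list:
    """Return a list of error messages for illegal values in one record."""
    errors = []
    for var, legal in LEGAL_VALUES.items():
        if record.get(var) not in legal:
            errors.append(f"{var}: illegal value {record.get(var)!r}")
    return errors

print(check_record({"Gender": "M", "TrackSHS": 2, "OAHE": 7}))
# ['OAHE: illegal value 7']
```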
Data entry must be performed by well-trained and responsible individuals. Data
must be entered with attention to detail and some individuals are better at this than
others. Consistency in data entry is best achieved by one rather than multiple
individuals, and as the number of persons involved in data entry increases, the chance
of error also increases. However, systematic bias may be an issue with only one data
entry individual.
Data Cleaning and Validation
Once the data are computerized, they are subjected to a series of computer checks
to clean them. Data cleaning is the process of preparing data for analysis by
removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or
improperly formatted. This data is usually not necessary or helpful when it comes to
analyzing data because it may hinder the process or provide inaccurate results. There
are several methods of cleaning data depending on how it is stored along with the
answers being sought.

Data cleaning is not simply about erasing information to make space for new data,
but rather about finding a way to maximize a data set’s accuracy without necessarily
deleting information. Data cleaning involves more than removing data: it also includes
fixing spelling and syntax errors, standardizing data sets, correcting mistakes such as
empty fields and missing codes, and identifying duplicate data points.
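As an illustration, here is a small pandas sketch of such a cleaning pass; the column names and values are hypothetical. It standardizes text, removes exact duplicates, and flags rather than silently drops missing values.

```python
# A minimal pandas cleaning pass illustrating the actions described above.
# Column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "Track": ["STEM", "stem ", "GAS", "STEM", None],
    "Score": [85, 85, 90, 85, 78],
})

df["Track"] = df["Track"].str.strip().str.upper()  # standardize spelling and case
df = df.drop_duplicates()                          # remove duplicate records
missing = df[df["Track"].isna()]                   # flag, don't silently drop
print(df, missing, sep="\n\n")
```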
Data Segmentation
After validating and cleaning the data, you can now start summarizing them through
descriptive statistics or test your hypothesis using inferential statistics. We will only
focus on descriptive statistics in this module.
Data Storage
You can either store your data in your computer’s hard drive, external hard drive, or
in the cloud. Storing data in the cloud is preferable since your teammates can view
and edit the file wherever they are as long as they are online. Examples of cloud
storage are Google Drive and OneDrive.

Hygiene and Maintenance
Data maintenance involves creating a back-up copy of the files on a regular basis. The
printed and digital copies of the data must also be stored in a secure location. This
stage also includes proper documentation of the information, procedures, and data
analysis conducted in the overall data management process.
B. INTERPRETATION AND ANALYSIS
Once you have a set of data, you will need to organize it so that you can analyze how
frequently each datum occurs in the set.

Levels of Measurement
The way a set of data is measured is called its level of measurement. Correct
statistical procedures depend on a researcher being familiar with levels of
measurement. Not every statistical operation can be used with every set of data. Data
can be classified into four levels of measurement.

a. Nominal Scale Level


Data that is measured using a nominal scale is qualitative
(categorical). Categories, colors, names, labels, and favorite foods,
along with yes or no responses, are examples of nominal level data.
Nominal scale data are not ordered and cannot be used in calculations.

b. Ordinal Scale Level


Data that is measured using an ordinal scale is like nominal scale data,
but there is a big difference: ordinal scale data can be ordered. An
example of ordinal scale data is the Likert scale, where the responses to
questions of a cruise survey are “excellent”, “good”, “satisfactory”,
and “unsatisfactory”. These responses are ordered from the most
desired response to the least desired. But the differences between
two pieces of data cannot be measured. Like nominal scale data,
ordinal scale data cannot be used in calculations.
c. Interval Scale Level
Data that is measured using the interval scale is like ordinal level data
because it has a definite ordering, but the differences between data
values can be measured. Interval scale data, however, has no natural
zero starting point. Temperature scales like Celsius and Fahrenheit are
measured using the interval scale.

d. Ratio Scale Level


Data that is measured using the ratio scale takes care of the ratio
problem and gives you the most information. Ratio scale data is like
interval scale data, but it has a 0 point and ratios can be calculated.
For example, four multiple-choice statistics final exam scores are 90,
68, 20, and 92 (out of a possible 100 points). The data can be ordered
from lowest to highest. The differences between the data have meaning:
the score 92 is more than the score 68 by 24 points. Ratios can be
calculated. The smallest possible score is 0.

Frequency and Frequency Tables

A frequency is the number of times a value of the data occurs. According to Table 3
below, there are three students who work two hours, five students who work three
hours, and so on. The sum of the values in the frequency column, 20, represents the
total number of students included in the sample.

A relative frequency is the ratio (fraction or proportion) of the number of times a
value of the data occurs in the set of all outcomes to the total number of outcomes.
To find the relative frequencies, divide each frequency by the total number of
students in the sample – in this case, 20. Relative frequencies can be written as
fractions, percents, or decimals.

Cumulative relative frequency is the accumulation of the previous relative


frequencies. To find the cumulative relative frequencies, add all the previous relative
frequencies to the relative frequency for the current row.

Table 3. Frequency Table of Student Work Hours with Relative and Cumulative
Relative Frequencies.

Data Value | Frequency | Relative Frequency  | Cumulative Relative Frequency
2          | 3         | 3/20 or 0.15 or 15% | 0.15
3          | 5         | 5/20 or 0.25 or 25% | 0.15 + 0.25 = 0.40
4          | 3         | 3/20 or 0.15 or 15% | 0.40 + 0.15 = 0.55
5          | 6         | 6/20 or 0.30 or 30% | 0.55 + 0.30 = 0.85
6          | 2         | 2/20 or 0.10 or 10% | 0.85 + 0.10 = 0.95
7          | 1         | 1/20 or 0.05 or 5%  | 0.95 + 0.05 = 1.00
Total      | 20        |                     |
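If you compute these columns in Python, a few lines reproduce Table 3. The sketch below reconstructs the raw work-hours list from the table's frequencies.

```python
# A sketch of computing Table 3's columns from the raw list of work hours.
# The raw values are reconstructed from the frequencies in Table 3.
from collections import Counter

hours = [2]*3 + [3]*5 + [4]*3 + [5]*6 + [6]*2 + [7]*1

n = len(hours)  # 20 students
cumulative = 0.0
for value, freq in sorted(Counter(hours).items()):
    rel = freq / n
    cumulative += rel
    print(f"{value}: f={freq}, rel={rel:.2f}, cum={cumulative:.2f}")
```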

Graphical Interpretation of Data
It is a good idea to look at a variety of graphs to see which is the most helpful in
displaying the data. We might make different choices of what we think is the “best”
graph depending on the data and the context. Our choice also depends on what we
are using the data for.
Pie Charts. Use a pie chart to compare the proportion of data in each category or
group. A pie chart is a circle that is divided into segments or slices to represent the
proportion of observations that are in each category. Use a maximum of six slices in
a pie chart. The first slice must start at 12 o’clock, and an “Other” category, if any,
must be placed last. Lastly, only use an exploded slice if you want to focus attention
on it.

EXAMPLE 1: A quality engineer for an automotive supply company wants to decrease


the number of car door panels that are rejected because of paint flaws. As part of
the initial investigation, the engineer creates a pie chart to compare the counts of
flaws in each category.

INTERPRETATION OF RESULTS:
The pie chart shows that Peel is the most common paint flaw and that Smudge and
Other are the least common paint flaws. (Note: Smudge (green) – 15.0%; Scratch
(yellow) – 32.5%; Peel (red) – 37.5%; Other (sky blue) – 15.0%.)
Figure 2. Pie Chart of Flaws.
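For reference, a pie chart like Figure 2 can be drawn with matplotlib. The sketch below uses the flaw percentages from the interpretation above; startangle=90 with clockwise drawing makes the first slice start at 12 o'clock.

```python
# A matplotlib sketch of Figure 2's pie chart, using the flaw percentages above.
import matplotlib.pyplot as plt

labels = ["Peel", "Scratch", "Smudge", "Other"]
counts = [37.5, 32.5, 15.0, 15.0]

plt.pie(counts, labels=labels, autopct="%.1f%%",
        startangle=90, counterclock=False)  # first slice starts at 12 o'clock
plt.title("Pie Chart of Flaws")
plt.show()
```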

Bar Charts. Use a bar chart to compare the counts, the means, or other summary
statistics using bars to represent groups or categories. The height of the bar shows
either the count, the variable function (mean, sum, standard deviation, and others),
or the summary value for the group. Bars may be vertical or horizontal. Use a bar
chart instead of a pie chart if you must compare more than six categories.
Using example 1, we will create a bar chart instead of a pie chart.

INTERPRETATION OF RESULTS:
The bar chart shows that Peel is the most common paint flaw and that Smudge and Other are
the least common paint flaws.
Figure 3. Bar Chart of Flaws.
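The same data as a bar chart takes only a small change, as in this matplotlib sketch.

```python
# The same flaw data as a bar chart (a sketch; the values are the percentages above).
import matplotlib.pyplot as plt

labels = ["Peel", "Scratch", "Smudge", "Other"]
counts = [37.5, 32.5, 15.0, 15.0]

plt.bar(labels, counts)
plt.ylabel("Percent of flaws")
plt.title("Bar Chart of Flaws")
plt.show()
```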

EXAMPLE 2: An electronics design engineer studies the effect of operating


temperature and three types of face-plate glass on the light output of an
oscilloscope tube. As part of the initial investigation, the engineer creates a bar chart
to compare the light output of various combinations of temperature and glass type.
INTERPRETATION OF RESULTS
The temperature that produces the highest light output most often is 150 degrees.
Although the difference in light output between glass types is small, the glass type that
produces the highest light output most often is Glass type 1. Overall, the highest light
output occurs with glass type 1 at 150 degrees.
Figure 4. Chart of Mean Light Output

Pareto Chart. Use a Pareto chart when you want to organize the bar chart in
decreasing order, with the longest bars on the left and the shortest bars on the right.
As you can see in Figure 5, the school is populated mostly by Asian students, while
the Native American students are the minority of the school population.
Figure 5. Pareto Chart of Ethnicity of Students.

Stem-and-Leaf Plot. Use a stem-and-leaf plot to examine the shape and spread of
sample data. The stem-and-leaf plot, or stemplot, comes from the field of exploratory
data analysis. It is a good choice when the data sets are small.
To create the plot, divide each observation of data into a stem and a leaf. The leaf
consists of the final significant digit. For example, 23 has stem two and leaf three. The
number 432 has stem 43 and leaf two. The decimal 9.3 has stem nine and leaf three.
Write the stems in a vertical line from smallest to largest. Then write the leaves in
increasing order next to their corresponding stem.
EXAMPLE 3: A scientist for a company that manufactures processed food wants
to assess the percentage of fat in the company’s bottled sauce. The advertised
percentage is 15%. The scientist measures the percentage of fat in 20 random
samples. Previous measurements found that the population standard deviation is
2.6%.
INTERPRETATION OF RESULTS
Figure 6. Stem-and-Leaf of Percent Fat

For each row, the number in the “stem” (the middle column) represents the first digit
(or digits) of the sample values. The “leaf unit” at the top of the plot indicates which
decimal place the leaf values represent.
The first row of the stem-and-leaf plot of Percent Fat has a stem of 12 and contains the
leaf values 3, 4, and 8. The leaf unit is 0.1. Thus, the first row of the plot represents
sample values of approximately 12.3, 12.4, and 12.8.
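Python has no built-in stemplot, but one is easy to write. The sketch below prints a stem-and-leaf plot with a leaf unit of 0.1; the data values are illustrative, not the study's actual measurements.

```python
# A small sketch that prints a stem-and-leaf plot for one-decimal data, such as
# percent-fat measurements (the values here are illustrative).
from collections import defaultdict

data = [12.3, 12.4, 12.8, 13.1, 13.2, 14.0, 14.5, 14.5, 15.0, 15.3]

stems = defaultdict(list)
for x in sorted(data):
    stem, leaf = divmod(round(x * 10), 10)  # leaf unit = 0.1
    stems[stem].append(leaf)

print("Leaf unit = 0.1")
for stem in sorted(stems):
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in stems[stem])}")
```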
Histogram. Use Histogram to examine the shape and spread of your data. A
histogram works best when the sample size is at least 20. A histogram divides
sample values into many intervals and represents the frequency of data values in
each interval with a bar. The horizontal axis is labeled with what the data represents
(for instance, distance from your home to school). The vertical axis is labeled either
frequency or relative frequency (or percent frequency or probability).
EXAMPLE 4: A quality control engineer needs to ensure that the caps on shampoo
bottles are fastened correctly. If the caps are fastened too loosely, they may fall off
during shipping. If they are fastened too tightly, they may be too difficult to remove.
The target torque value for fastening the caps is 18. The engineer collects a random
sample of 68 bottles and tests the amount of torque that is needed to remove the
caps.
INTERPRETATION OF RESULTS
Most caps were fastened with a torque of 14 to 24. Only one cap was very loose,
with a torque of less than 11. However, the distribution is positively skewed: many
caps required a torque of greater than 24 to remove, and five caps required a torque
of greater than 33, nearly two times the target value.
Figure 7. Histogram of Torque.
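A histogram like Figure 7 can be drawn with matplotlib. In the sketch below, synthetic positively skewed data stand in for the engineer's 68 torque measurements; only the target value of 18 comes from the example.

```python
# A matplotlib histogram sketch (random stand-in data for the 68 torque values).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
torque = rng.gamma(shape=9, scale=2.2, size=68)  # positively skewed stand-in

plt.hist(torque, bins=12, edgecolor="black")
plt.axvline(18, linestyle="--", label="Target torque = 18")
plt.xlabel("Torque")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```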

Time Series Plot. Use Time Series Plot to look for patterns in your data over time, such
as trends or seasonal patterns. To construct a time series graph, we must look at
both pieces of our paired data set. We start with a standard Cartesian coordinate
system. The horizontal axis is used to plot the date or time increments, and the
vertical axis is used to plot the
values of the variable that we are measuring. By doing this, we make each point on
the graph correspond to a date and a measured quantity. The points on the graph are
typically connected by straight lines in the order in which they occur.
EXAMPLE 5: A marketing analyst wants to assess trends in tennis racquet sales. The
analyst collects sales data from the previous five years to predict the sales of the
product for the next 3 months. As part of the initial investigation, the analyst creates
a time series plot to see how sales have changed over time.
INTERPRETATION OF RESULTS
The time series plot shows a clear upward trend. There may also be a slight curve in
the data; the increase in the data values seems to accelerate over time.
Figure 8. Time Series Plot of Racquets
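A time series plot is a line plot against dates. The matplotlib sketch below uses synthetic monthly data with an accelerating upward trend, mimicking the racquet example.

```python
# A time series plot sketch: monthly sales over five years (synthetic data
# with an accelerating upward trend, mimicking the racquet example).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

months = pd.date_range("2016-01", periods=60, freq="MS")
sales = 100 + 1.5 * np.arange(60) + 0.05 * np.arange(60) ** 2

plt.plot(months, sales, marker="o", markersize=3)
plt.xlabel("Month")
plt.ylabel("Racquets sold")
plt.title("Time Series Plot of Racquet Sales")
plt.show()
```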

Box Plots. Box plots give a good graphical image of the concentration of the data.
They also show how far the extreme values are from most of the data. A box plot is
constructed from five values: the minimum value, the first quartile, the median, the
third quartile, and the maximum value. We discuss the calculation and
interpretation of these values later. We use these values to compare how
close other values are to them.
To construct a box plot, use a horizontal or vertical number line and a rectangular
box. The smallest and largest data values label the endpoints of the axis. The first
quartile marks one end of the box and the third quartile marks the other end of the
box. Approximately the middle 50 percent of the data fall inside the box. The
“whiskers” extend from the ends of the box to the smallest and largest data values.
The median or second quartile can be between the first and third quartiles, or it can
be one, or the other, or both. The box plot gives a good, quick picture of the data.
EXAMPLE 6: A plant fertilizer manufacturer wants to develop a formula of fertilizer
that yields the most increase in the height of plants.

To test fertilizer formulas, a scientist prepares three groups of 50 identical seedlings:
a control group with no fertilizer, a group with the manufacturer’s fertilizer, named
GrowFast, and a group with a fertilizer named SuperPlant from a competing
manufacturer. After the plants are in a controlled greenhouse environment for three
months, the scientist measures the plants’ heights.
As part of the initial investigation, the scientist creates a boxplot of the plant heights
from the three groups to evaluate the differences in plant growth between plants
with no fertilizer, plants with the manufacturer’s fertilizer, and plants with their
competitor’s fertilizer.

INTERPRETATION OF RESULTS
Figure 9. Boxplot of Height

GrowFast produces the tallest plants overall. SuperPlant also increases plant height,
but its variability is greater, and SuperPlant does not have a positive effect on a large
proportion of the seedlings. The graph shows that GrowFast causes a greater and
more consistent increase in plant height.
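A side-by-side boxplot like Figure 9 can be produced as follows; the heights are synthetic and only the group names come from the example.

```python
# A boxplot sketch comparing three treatment groups (synthetic plant heights).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
none = rng.normal(30, 3, 50)
growfast = rng.normal(40, 3, 50)
superplant = rng.normal(36, 6, 50)

plt.boxplot([none, growfast, superplant])
plt.xticks([1, 2, 3], ["None", "GrowFast", "SuperPlant"])
plt.ylabel("Plant height (cm)")
plt.show()
```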
Scatter Plots. Use a scatterplot to investigate the relationship between a pair of
continuous variables. A scatterplot displays ordered pairs of X and Y variables in a
coordinate plane.
EXAMPLE 7: A medical researcher studies obesity in adolescent girls. Because body
fat percentage is difficult and expensive to measure directly, the researcher wants to
determine whether the body mass index (BMI) – a measurement that is easy to take
– is a good predictor of body fat percentage. The researcher collects BMI, body fat
percentage, and other personal variables of 92 adolescent girls.
INTERPRETATION OF RESULTS
The scatterplot for the BMI and body fat data shows a strong positive and linear
relationship between the two variables. Body mass index (BMI) may be a good
predictor of body fat percentage.

Figure 10. Scatterplot of Body Fat Percentage and BMI.
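A scatterplot takes one call in matplotlib. The data below are synthetic, generated to show the kind of strong positive linear relationship described above.

```python
# A scatterplot sketch of BMI against body fat percentage (synthetic data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
bmi = rng.uniform(17, 35, 92)                       # 92 girls, as in the example
body_fat = 1.4 * bmi - 5 + rng.normal(0, 2.5, 92)   # strong positive linear trend

plt.scatter(bmi, body_fat)
plt.xlabel("BMI")
plt.ylabel("Body fat (%)")
plt.show()
```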

Numerical Interpretation of Data

Measures of the location of data.
The common measures of location are quartiles and percentiles.
Quartiles are special percentiles. The first quartile, 𝑄1, is the same as the 25th
percentile, and the third quartile, 𝑄3, is the same as the 75th percentile. The
median, M, is called both the second quartile and the 50th percentile.
To calculate quartiles and percentiles, the data must be ordered from smallest to
largest. Quartiles divide ordered data into quarters.
Percentiles divide ordered data into hundredths. To score in the 90th percentile of an
exam does not mean, necessarily, that you received 90% on a test. It means that 90%
of test scores are the same or less than your score and 10% of the test scores are the
same or greater than your test score.
Percentiles are useful for comparing values. For this reason, universities and colleges
use percentiles extensively. One instance in which colleges and universities use
percentiles is when SAT results are used to determine a minimum testing score that
will be used as an acceptance factor.
Percentiles are mostly used with very large populations. Therefore, if you were to say
that 90% of the test scores are less (and not the same or less) than your score, it
would be acceptable because removing one data value is not significant.
The median is a number that measures the center of the data. You can think of the
median as the middle value, but it does not actually have to be one of the observed
values. It is a number that separates ordered data into halves. Half the values are
the same number or smaller than the median, and half the values are the same
number or larger.

Quartiles are numbers that separate the data into quarters. Quartiles may or may
not be part of the data. To find the quartiles, first find the median, or second quartile.
The first quartile, 𝑄1, is the middle value of the lower half of the data, and the third
quartile, 𝑄3, is the middle value, or median, of the upper half of the data.
The interquartile range (IQR) is a number that indicates the spread of the middle half
or the middle 50% of the data. It is the difference between the third quartile (𝑄3)
and the first quartile (𝑄1).
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
The IQR can help to determine potential outliers. A value is suspected to be a
potential outlier if it is less than (1.5)(𝐼𝑄𝑅) below the first quartile or more than
(1.5)(𝐼𝑄𝑅) above the third quartile. Potential outliers always require further
investigation.
Note: A potential outlier is a data point that is significantly different from the other
data points. These special points may be errors or abnormalities, or they may be a key
to understanding the data.
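The 1.5 × IQR rule is straightforward to apply in Python. Note that numpy's default percentile method differs slightly from the median-of-halves method described above, so quartiles can differ for small samples; the data set below is illustrative.

```python
# A sketch of the 1.5*IQR outlier rule using the definitions above.
import numpy as np

data = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 27])  # illustrative data set
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print("Potential outliers:", data[(data < low) | (data > high)])  # [27]
```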
INTERPRETATION: A percentile indicates the relative standing of a data
value when data are sorted into numerical order from smallest to largest.
At the kth percentile, k percent of data values are less than or equal to that value.
For example, 15% of data values are less than or equal to the 15th percentile.
• Low percentiles always correspond to lower data values.
• High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgement about
whether it is good or bad. The interpretation of whether a certain percentile is good
or bad depends on the context of the situation to which the data applies. In some
situations, a low percentile would be considered good; in other contexts, a high
percentile might be considered good. In many situations, there is no value
judgment that applies.

Formula for Finding the kth Percentile.

Let
k = the kth percentile
i = the index (ranking or position of a data value)
n = the total number of data values

Process:
1. Order the data from smallest to largest.
2. Calculate i = (k/100)(n + 1).
3. If i is an integer, then the kth percentile is the data value in the ith
position in the ordered set of data.
4. If i is not an integer, then round i up and round i down to the nearest
integers. Average the two data values in these two positions in the ordered
data set.

EXAMPLE 8: Listed are 29 ages for the members of the faculty of XYZ
Senior High School in order from smallest to largest: 21; 22; 25; 25; 26; 26; 27; 29;
30; 31; 33; 34; 34; 36; 36; 37; 40; 41; 41; 42; 45; 47; 52; 53; 54; 54; 55; 57; 58
a. Find the 70th percentile.
b. Find the 83rd percentile.
Solution.
1. k = 70; i = index; n = 29
   i = (k/100)(n + 1) = (70/100)(29 + 1) = 21.0
   Twenty-one is an integer, and the data value in the 21st position in the
   ordered data set is 45. The 70th percentile is 45 years.
2. k = 83; i = index; n = 29
   i = (k/100)(n + 1) = (83/100)(29 + 1) = 24.9
   24.9 is not an integer. Round it down to 24 and up to 25. The age in the
   24th position is 53 and the age in the 25th position is 54. Average 53 and
   54. The 83rd percentile is 53.5 years.
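The procedure translates directly into a short Python function, which reproduces the worked example.

```python
# A direct implementation of the percentile procedure above; it reproduces
# the worked example (45 for the 70th percentile, 53.5 for the 83rd).
def kth_percentile(data, k):
    """Percentile by the i = (k/100)(n + 1) rule used in this module."""
    values = sorted(data)
    n = len(values)
    i = (k / 100) * (n + 1)
    if i.is_integer():
        return values[int(i) - 1]      # positions are 1-based
    lo, hi = int(i), int(i) + 1        # round down and round up
    return (values[lo - 1] + values[hi - 1]) / 2

ages = [21, 22, 25, 25, 26, 26, 27, 29, 30, 31, 33, 34, 34, 36, 36, 37,
        40, 41, 41, 42, 45, 47, 52, 53, 54, 54, 55, 57, 58]
print(kth_percentile(ages, 70))  # 45
print(kth_percentile(ages, 83))  # 53.5
```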
INTERPRETATION: When writing the interpretation of a
percentile in the context of the given data, the sentence should contain the
following information.
• Information about the context of the situation being considered.
• The data value (value of the variable) that represents the percentile.
• The percent of individuals or items with data values below the percentile
• The percent of individuals or items with data values above the percentile
Measures of the center of data.
The center of a data set is also a way of describing location. The two most widely
used measures of the center of the data are the mean (average) and the median. To
calculate the mean weight of 50 people, add the 50 weights together and divide by
50. To find the median weight of the 50 people, order the data and find the number
that splits the data into two equal parts. The median is generally a better measure of
the center when there are extreme values or outliers because it is not affected by
the precise numerical values of the outliers. The mean is the most common measure
of the center.
Another measure of the center is the mode. The mode is the most frequent value.
There can be more than one mode in a data set as long as those values have the
same frequency and that frequency is the highest. A data set with two modes is
called bimodal.
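Python's statistics module computes all three measures of center; the first list below is the mean example from the pre-test, and the second is a small bimodal data set.

```python
# A sketch using Python's statistics module for the three measures of center.
import statistics

scores = [100, 160, 154, 95, 82]              # the mean example from the pre-test
print(statistics.mean(scores))                # 118.2
print(statistics.median(scores))              # 100
print(statistics.multimode([2, 3, 3, 5, 5]))  # [3, 5]: a bimodal data set
```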
Measures of the spread of data.
An important characteristic of any set of data is the variation in the data. In some
data sets, the data values are concentrated closely near the mean; in other data
sets, the data values are more widely spread out from the mean. The most common
measure of variation, or spread, is the standard deviation. The standard deviation is
a number that measures how far data values are from their mean.
The standard deviation is always positive or zero. The standard deviation is small
when the data are all concentrated close to the mean, exhibiting little variation or
spread. The standard deviation is larger when the data values are more spread out
from the mean, exhibiting more variation.
Suppose that we are studying the amount of time customers wait in line at the
checkout at supermarket A and at supermarket B, and the average wait time at both
supermarkets is five minutes. At supermarket A, the standard deviation for the wait
time is two minutes; at supermarket B, the standard deviation for the wait time is
four minutes. Because supermarket B has the larger standard deviation, wait times
at supermarket B are more spread out from the average, while wait times at
supermarket A are more concentrated near the average.
Suppose, again, that Rosa and Binh both shop at supermarket A. Rosa waits at the
checkout counter for seven minutes and Binh waits for one minute. At supermarket
A, the mean waiting time is five minutes and the standard deviation is two minutes.
The standard deviation can be used to determine whether a data value is close to or
far from the mean.
Rosa waits for seven minutes:
• Seven is two minutes longer than the average of five; two minutes is
equal to one standard deviation.
• Rosa’s waiting time of seven minutes is two minutes longer
than the average of five minutes.
• Rosa’s waiting time of seven minutes is one standard
deviation above the average of five minutes.
Binh waits for one minute:
• One is four minutes less than the average of five; four minutes is
equal to two standard deviations.
• Binh’s wait time of one minute is four minutes less than
the average of five minutes.
• Binh’s wait time of one minute is two standard deviations below the
average of five minutes.
Calculating the Standard Deviation
To calculate the standard deviation, we need to calculate the variance first.
The variance is the average of the squares of the deviations. The symbol σ²
represents the population variance; the population standard deviation σ is the
square root of the population variance. The symbol s² represents the sample
variance; the sample standard deviation s is the square root of the sample variance.
If the numbers come from a census of the entire population and not a
sample, when we calculate the average of the squared deviations to find the
variance, we divide by N, the number of items in the population. If the data
are from a sample rather than a population, when we calculate the average
of the squared deviations, we divide by n – 1, one less than the number of
items in the sample.

Formula for the Sample Standard Deviation

s = √( Σ(x − x̄)² / (n − 1) )

where x is an individual score in the sample, x̄ is the sample mean, and n is the
sample size.

The standard deviation, s or σ, is either zero or larger than zero.


Describing the data with reference to the spread is called
“variability”. The variability in data depends upon the method by which the
outcomes are obtained; for example, by measuring or by random sampling.
When the standard deviation is zero, there is no spread; that is all the data
values are equal to each other. The standard deviation is small when the
data are all concentrated close to the mean and is larger when the data
values show more variation from the mean. When the standard deviation is
a lot larger than zero, the data values are very spread out about the mean;
outliers can make s or σ very large.
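The formula can be checked by hand against Python's statistics.stdev, which also divides by n − 1. The wait times below are illustrative.

```python
# A sketch of the sample standard deviation formula, computed by hand and
# checked against statistics.stdev (which also divides by n - 1).
import math
import statistics

waits = [7, 1, 5, 5, 4, 8]                # illustrative wait times, in minutes
n = len(waits)
mean = sum(waits) / n
variance = sum((x - mean) ** 2 for x in waits) / (n - 1)  # sample variance
print(math.sqrt(variance), statistics.stdev(waits))       # identical values
```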

Reminder: Do not use inferential statistics on Likert-scale responses. Likert scales
are an ordinal type of data and can only be analyzed using cross-tabulation (Table 4).
Only use significance tests or inferential statistics when the variables are continuous
and the data satisfy the testing assumptions.

Table 4. Overall assessment cross-tabulation, lifted from the study of Zhang et al.
(2017) on “How the Public Uses Social Media WeChat to Obtain Health Information in
China: A Survey Study”.

Assessment | Overall assessment of searching for medical knowledge via internet | Overall assessment of current health education system
Very Poor  | 71 (4.34%)    | 124 (7.58%)
Worse      | 293 (17.91%)  | 489 (29.89%)
Normal     | 1069 (65.34%) | 915 (55.93%)
Good       | 186 (11.37%)  | 103 (6.30%)
Excellent  | 17 (1.04%)    | 5 (0.31%)

Additional note: You can access this article here: https://bit.ly/3uI4f6m. Observe
how the authors analyzed and discussed the results of their survey research. A
sample of their research instrument can also be found within the article.
Descriptive statistics are most helpful when the research is limited to the sample and
does not need to be generalized to a larger population. For example, if you are
comparing the percentage of adults vaccinated in four different barangays, then
descriptive statistics are enough.
Summary
The significance of data interpretation is indisputable. Data analysis and
interpretation are crucial to developing sound conclusions and making better-informed
decisions. As such, below are some tips for interpreting data.
1. Collect your data and make it as readable as possible.
2. Choose the type of data analysis to perform.
3. Think. Ponder your data from various points of view (connect it
to your related literature) and consider what it means for various respondents.
4. Reflect. Be aware of the many dangers of data analysis and
interpretation. Make sure your statements are valid and backed
with sound evidence.

Let Us Practice

Direction: Read the problem and answer the questions that follow.

Problem: Twenty-five selected students were asked the number of movies they
watched the previous week. The results are as follows.

Number of Movies | Frequency | Relative Frequency | Cumulative Relative Frequency
0                | 5         |                    |
1                | 9         |                    |
2                | 6         |                    |
3                | 4         |                    |
4                | 1         |                    |

1. Complete the remaining columns of the chart.


2. Construct a histogram of the data.
3. Find the mean, median, and mode.
4. Calculate the first, second, and third quartile.
5. Write your interpretation of the data based on your analysis.

Let Us Practice More


Direction: Read the problem and answer the questions that follow.

Problem: Sixty-five randomly selected car salespersons were asked the number of cars
they generally sell in one week. Fourteen people answered that they
generally sell three cars, nineteen generally sell four cars, twelve generally
sell five cars, nine generally sell six cars, and eleven generally sell seven cars.

1. Complete the table found below.


Data Value (# of cars) | Frequency | Relative Frequency | Cumulative Relative Frequency

2. Find the sample mean, x̄.


3. Find the sample standard deviation, s.
4. Construct a histogram of the data.
5. Find the first quartile.
6. Find the median.
7. Find the 3rd quartile.
8. Construct a box plot of the data.
9. Find the 40th percentile.
10. Find the 90th percentile.
11. Write your interpretation of the data based on your analysis.

Let Us Remember

Direction: Fill in the blanks with the correct word or phrase to complete each sentence.
1. In ____________, the raw data collected will be transferred to the data editors to
check for the completeness, accuracy, and preciseness of data.
2. ____________ is not simply about erasing information to make space for
new data, but rather finding a way to maximize a data set’s accuracy without
necessarily deleting information.
3. A ____________ is the number of times a value of the data occurs.
4. Use ____________ to compare the proportion of data in each category or group.
5. ____________ give a good graphical image of the concentration of the data.

Let Us Assess

Directions: In this activity, you must create a survey questionnaire that you will be using in
collecting data that will answer your research problem.
Perform any sampling technique to obtain a sample size of at least 30. Using the survey
questionnaire that you created, answer the following questions.
1. How many variables did you use in your survey? Enumerate those variables and
write their corresponding level of measurement.
2. Create a codebook for the variables used in the survey. Use the table shown below
in completing this task.

Table 5. Sample Codebook.

Variable Name | Variable Description                                    | Code                           | Levels of Measurement
Track         | A specific specialization a SHS student is enrolled in. | 1 – GAS; 2 – STEM; 3 – HUMSS   | Nominal

Let Us Enhance

Direction: Write the results and discussion of your study. Refer to rubrics found
in Table 6 in writing this section of your paper.

Table 6. Rubric for the Results and Discussion Section.

Results
- Exemplary (4): Subheadings are included and are clear and informative; Results are reported and interpreted with respect to the literature; All figures and tables included are referred to in the body of the results section.
- Competent (3): Subheadings may or may not be included; Results are reported but not interpreted with respect to the literature; Most figures and tables included are referred to in the body of the results section.
- Developing (2): Subheadings are not included; Results include some interpretation with respect to the literature; Some figures and tables are referred to in the body of the results section.
- Needs Improvement (1): Results are heavily interpreted, mixing the results and the discussion section; Results section is missing, only figures and tables are present.

Discussion
- Exemplary (4): Discussion addresses the major findings of the study; Results are interpreted with respect to outside sources; Carefully details any problems occurring with results, techniques, and inconsistencies.
- Competent (3): Discussion addresses most major findings of the study; Most results are interpreted with respect to outside sources; Considers some problems occurring with results, techniques, and inconsistencies.
- Developing (2): Discussion addresses the major findings of the study; Results have little interpretation with respect to outside sources and citations; Discussion of the problems occurring with results is not thorough.
- Needs Improvement (1): Discussion is brief and is a repetition of the results section, if included at all.

Layout and Organization
- Exemplary (4): Ideas are ordered clearly and effectively to fully comprehend the paper; Clear transitions are used throughout the paper; Paragraphs are the appropriate length, not too short or too long.
- Competent (3): Most main ideas are ordered clearly and effectively to fully comprehend the paper; Many transitions are used throughout the paper; Most paragraphs are of the appropriate length.
- Developing (2): Ideas covered in the paper are occasionally out of order; Some transitions are used throughout the paper; Paragraphs are often too long or too short.
- Needs Improvement (1): Ideas in the paper are difficult to follow; Lack of transitions from topic to topic; Paragraphs are routinely too long, encompassing too many ideas, or too short, consisting of only 1 or 2 sentences.

Grammar and Style
- Exemplary (4): Writing formality is appropriate for the audience; Word choice is precise and varied; Sentence structures are clear and not convoluted; No grammar and spelling errors are present; Tense is correct throughout the paper.
- Competent (3): Much of the writing level is appropriate for the audience; Most sentences have precise word choice; Most sentences have a clear construction; Few grammar and spelling errors are present; Few errors in tense in the paper.
- Developing (2): Writing level is too simplistic or too complicated for the audience; Some sentences have precise word choice; Many sentences have a clear construction; Many grammar and spelling errors; Many errors in tense in the paper.
- Needs Improvement (1): Paper is difficult to read due to scattered construction and grammatical errors.

Let Us Reflect

In your own words, discuss the importance of proper data management procedures in analyzing
and interpreting the results of a research study.

References

Abbas S. Tavakoli et al., “Data Management Plans: Stages, Components, and Activities”,
Applications and Applied Mathematics (AAM): An International Journal 1 (2006): 141-151.
Accessed April 20, 2021.

Barbara Illowsky and Susan Dean, Introductory Statistics, Houston, Texas: OpenStax, 2018.

Victor J. Schoenbach, Data Analysis and Interpretation, last updated: July 17, 2014,
www.epidemiolog.net.

“What is Data Cleaning”, Sisense, accessed May 5, 2021,
https://www.sisense.com/glossary/data-cleaning/

Xingting Zhang et al., “How the Public Uses Social Media WeChat to Obtain Health
Information in China: A Survey Study”, BMC Medical Informatics and Decision
Making 17 (2017): 66.

“Help and How-To”, Minitab Express Support, https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/
For inquiries or feedback, please write or call:

Department of Education – Division of Tagum City

Energy Park, Apokon, Tagum City, 8100

Telefax: (084) 216-3504

Email Address: tagum.city@deped.gov.ph
