Inquiries,
Investigations &
Immersion
Quarter 4 – Module 5
Finding Answers to Research
Questions (Quantitative)
Inquiries, Investigations and Immersion – Grade 12
Quarter 4 - Module 5: Finding Answers to Research Questions (Quantitative)
First Edition, 2021

Republic Act 8293, section 176 states that: No copyright shall subsist in any
work of the Government of the Philippines. However, prior approval of the government
agency or office wherein the work is created shall be necessary for exploitation of such
work for profit. Such agency or office may, among other things, impose as a condition
the payment of royalties.

Borrowed materials (i.e., songs, stories, poems, pictures, photos, brand names,
trademarks, etc.) included in this module are owned by their respective copyright
holders. Every effort has been exerted to locate and seek permission to use these
materials from their respective copyright owners. The publisher and authors do not
represent nor claim ownership over them.

Published by the Department of Education

Development Team
Writers: Jeane Eloise B. Palen
Editor: Evelyn C. Tripoli
Reviewer: Rina Joyce Ajos
Illustrator:
Layout Artist:
Management Team:
Josephine L. Fadul – Schools Division Superintendent
Melanie P. Estacio - Assistant Schools Division Superintendent
Christine C. Bagacay – Chief – Curriculum Implementation Division
Darwin F. Suyat – Education Program Supervisor – English
Lorna C. Ragos - Education Program Supervisor - Learning Resources
Management

Printed in the Philippines by
Department of Education – Division of Tagum City
Office Address: Energy Park, Apokon, Tagum City, 8100
Telefax: (084) 216-3504
E-mail Address: tagum.city@deped.gov.ph
Introductory Message
This Self-Learning Module (SLM) is prepared so that you, our dear learners,
can continue your studies and learn while at home. Activities, questions,
directions, exercises, and discussions are carefully stated for you to
understand each lesson.

Each SLM is composed of different parts. Each part shall guide you step-by-
step as you discover and understand the lesson prepared for you.

Pre-tests are provided to measure your prior knowledge on lessons in each
SLM. This will tell you if you need to proceed with completing this module or if
you need to ask your facilitator or teacher for assistance to better
understand the lesson. At the end of each module, you need to answer
the post-test to self-check your learning. Answer keys are provided for each
activity and test. We trust that you will be honest in using these.

In addition to the material in the main text, Notes to the Teacher are also
provided to our facilitators and parents for strategies and reminders on how
they can best help you on your home-based learning.

Please use this module with care. Do not put unnecessary marks on any part
of this SLM. Use a separate sheet of paper in answering the exercises and
tests. And read the instructions carefully before performing each task.

If you have any questions in using this SLM or any difficulty in answering the
tasks in this module, do not hesitate to consult your teacher or facilitator.

Thank you.

Let Us Learn!

After going through this module, you are expected to:

1. Gather and analyze data with intellectual honesty using suitable
techniques.

By the end of the module, the learners are expected to:

• Apply relevant descriptive statistics in analyzing and interpreting
a problem;
• Construct a codebook which can aid in analyzing data collected
from survey research;
• Write the results and discussion of their study; and
• Discuss the importance of proper data management practices.

Let Us Try!
Choose the best answer. Write your answer on a separate sheet of
paper.

1. What does quantitative data refer to?


a. Graphs and tables
b. Numerical data that could usefully be quantified to help you
answer your research question/s and to meet your objectives.
c. Any data you present in your report.
d. Statistical analysis
2. Which measure of central tendency is obtained using the middle score
when all scores are organized in numerical order?
a. Mean c. Mode
b. Median d. None of these
3. Which measure of central tendency is obtained by calculating the sum
of values and dividing this figure by the number of values there are in
the data set?
a. Mean c. Mode
b. Median d. None of these
4. Which measure of central tendency is derived from the most common
value?
a. Mean c. Mode
b. Median d. None of these
5. What method is used to compute average or central value of collected
data?
a. Measure of positive variation
b. Measures of central tendency
c. Measures of negative skewness
d. Measures of negative variation
6. What does standard deviation refer to?
a. A way of measuring extent of spread of quantifiable data.
b. Inappropriate in management and business research.
c. A way of describing those phenomena that are not the norm.
d. A way of illustrating crime statistics.

For questions 7 to 9, refer to the following problem

A survey was conducted to know the audience feedback on a dance
presentation. It asked this question:

“In your opinion, was the dance presentation entertaining, boring, or neither?”

Respondents Entertaining Boring Neither
A 1
B 1
C 1
D 1
E 1
Total 3 1 1

7. What percentage of the respondents said that the dance presentation is
entertaining?
a. 50% c. 70%
b. 60% d. 20%
8. What percentage of the respondents said that the dance presentation is
boring?
a. 50% c. 70%
b. 60% d. 20%
9. What percentage of the respondents said that the dance presentation is
neither entertaining nor boring?
a. 50% c. 70%
b. 60% d. 20%
10. The total marks obtained by a few students in a mathematics exam are 100,
160, 154, 95, and 82. What is the mean?
a. 117.2 c. 119.2
b. 118.2 d. 120.2

Let Us Study

A. THE DATA MANAGEMENT PROCESS


A researcher must be knowledgeable in managing the data obtained in the
process of completing their research study. Without this background
knowledge, investigators are left to a trial-and-error approach or
dependence on other team members to determine appropriate data
management strategies. The data management process starts with data
preparation and data collection and ends with data maintenance (as shown
in Figure 1). In this section, we discuss the different
steps in the data management process.

Figure 1. Data Management Process

Source: Josefina Almeda [UP Statistical Society]. (2021, April 19).
CLEARING PATHWAYS: Significance of Proper Data Handling in
Empowering Scientific Thinking [Video]. Facebook.
https://www.facebook.com/upstatsoc/videos/vb.203566473003057/385295256195607/
Data Collection
Data collection or data gathering is defined as the process of gathering
and measuring information on variables of interest, in an established
systematic method that enables one to answer stated research
questions, test hypotheses, and evaluate outcomes. There are several
techniques or strategies for data collection with corresponding
statistical instruments. These data collection strategies were discussed
in module 4 (interview, observations, survey questionnaires, and
experiments). The kind of analysis that can be performed on a set of
data will be influenced by the goals identified at the outset, and the data
gathered.

Quantitative research is concerned with testing hypotheses derived
from theory and/or being able to estimate the size of a phenomenon of
interest. Depending on the research question, participants may be
randomly assigned to different treatments.

The quantitative data collection method relies on random sampling and
structured data collection instruments that fit diverse experiences into
predetermined response categories. It produces results that are easy to
summarize, compare, and generalize.

If random assignment is not feasible, the researcher may collect data on
participant and situational characteristics to statistically control their
influence on the dependent or outcome variable. If the intent is to
generalize from the research participants to a larger population, the
researcher will employ probability sampling to select participants.
To obtain reliable information that will help you answer the research
questions, follow these steps:

1. Determine the objectives of the study you are undertaking.
2. Define the population of interest.
3. Choose the variables that you will measure in the study.
4. Decide on an appropriate design for producing data.
5. Collect the data.
6. Determine the appropriate descriptive and/or data analysis
techniques.
Data Editing
In this stage, the raw data collected will be transferred to the data
editors to check for the completeness, accuracy, and preciseness of
data. The adage “garbage in, garbage out” illustrates the issue on
management of data. The quality of your analysis depends on the
quality of the raw data you used. Hence, the quality of data collected is
foundational to the validity of study findings. Quality data collection
requires a systematic approach and includes 1) training data collectors
and 2) monitoring completeness and accuracy of raw data. The latter is
the focus of this stage of the process.
In a well-executed study, the data collection plan, including procedures,
instruments, and forms, is designed, and pretested to maximize
accuracy. All data collection activities are monitored to ensure
adherence to the data collection protocol and to prompt actions to
minimize and resolve missing and questionable data. Monitoring
procedures are instituted at the outset and maintained throughout the
study, since the faster irregularities can be detected, the greater the
likelihood that they can be resolved in a satisfactory manner and the
sooner preventive measures can be instituted.
Nevertheless, there is often the need to “edit” data, both before and after
they are computerized. The first step is manual or visual editing. Before
forms are encoded in the computer, the forms are reviewed to spot
irregularities and problems that escaped notice or correction during
monitoring.
Open-ended questions, if there are any, usually need to be coded.
This will be discussed in the next module (qualitative analysis). Codes
for encoding may also be needed for close-ended questions. Even forms
with only close-ended questions having pre-coded responses (i.e., have
numbers or letters corresponding to each response choice) may require
coding for situations such as unclear or ambiguous responses, multiple
responses to a single item, written comments from the participant or
data collector, and other situations that arise.
Code names for variables should be meaningful and easy to remember.
Coding and naming conventions should be standardized for files,
variables, programs, and other entities in a data management system.
For example, in a longitudinal study (RWHP) where the researcher
collected data at three different time points, the individual
data files were named RWHP1, RWHP2, and RWHP3. To
assure brevity, all variable names were limited to eight characters or
less. A coding manual written for the study matched all
variable names with variable labels and codes.
When variables are measured across multiple data points using the
same measures, the variable names must reflect the different time
points of data collection as well as different versions of instruments that
might have been used. In Table 1, the variable names are described for
variables measured at three different data collection points. As noted in
the table, the variable name was slightly modified to reflect the time
when the variable was measured.
Table 1. Examples of Variable Names.
Variable Description | Variable Name Time 1 | Variable Name Time 2 | Variable Name Time 3
What region do you live in? | aRegion | bRegion | cRegion
Do you have a paying job? | aPayjob | bPayjob | cPayjob
Have you been told you have AIDS? | aAids | bAids | cAids
Coping Scale (54 items) | aCope1-aCope54 | bCope1-bCope54 | cCope1-cCope54
Social Support Scale (19 items) | aSS1-aSS19 | bSS1-bSS19 | cSS1-cSS19
Depression Scales (20 items) | aDS1-aDS20 | bDS1-bDS20 | cDS1-cDS20
Another example is the codebook table shown below, which includes
codes for responses of close-ended questions. Remember, consistent
rules must be used for coding variables.
Table 2. Sample Codebook for a Survey Questionnaire
Variables | Code Name | Code
1. Gender | Gender | M – Male; F – Female
2. Track | TrackSHS | 1 – GAS; 2 – STEM
3. Overall assessment of health education | OAHE | 1 – Worse; 2 – Very Poor; 3 – Normal; 4 – Good; 5 – Excellent
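The codebook in Table 2 can also be kept in a form that a computer can read. The sketch below is only an illustration in Python: the names CODEBOOK, decode, and respondent are hypothetical, and the sample respondent is made up. It stores each code name with its codes and turns one coded answer sheet back into readable labels.

```python
# Codebook from Table 2: code name -> {code: label}.
# The dictionary layout is an assumption made for illustration.
CODEBOOK = {
    "Gender": {"M": "Male", "F": "Female"},
    "TrackSHS": {1: "GAS", 2: "STEM"},
    "OAHE": {1: "Worse", 2: "Very Poor", 3: "Normal", 4: "Good", 5: "Excellent"},
}


def decode(record):
    """Translate one coded record into its labels using the codebook."""
    return {name: CODEBOOK[name][code] for name, code in record.items()}


# A made-up respondent: female, STEM track, rated health education "Good".
respondent = {"Gender": "F", "TrackSHS": 2, "OAHE": 4}
decoded = decode(respondent)
print(decoded)
```

Keeping the codes in one place means every encoder applies the same rules, which is the consistency the codebook is meant to provide.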

Data Entry
Enter the data into the computer system only after they have been
edited. For small questionnaires and data forms,
data can be encoded directly into a spreadsheet or even a plain text file.
A customized data entry program often checks each value as it is
entered, to prevent illegal values from entering the data set. This facility
serves to reduce keying errors but will also detect illegal responses on
the form that slipped through the visual edits. You can do this using
the Data Validation tool in Excel. There are multiple
tutorials online that you can follow in applying data validation to your
database.
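The idea of checking each value as it is entered can be sketched in a few lines of Python. This is not the Excel tool mentioned above; it is a hypothetical example in which ALLOWED lists the legal codes from Table 2 and find_illegal_values is an illustrative helper name.

```python
# Legal codes for each variable, taken from Table 2.
# The names ALLOWED and find_illegal_values are hypothetical.
ALLOWED = {
    "Gender": {"M", "F"},
    "TrackSHS": {1, 2},
    "OAHE": {1, 2, 3, 4, 5},
}


def find_illegal_values(record):
    """Return (variable, value) pairs that are not legal codes."""
    problems = []
    for name, value in record.items():
        if name not in ALLOWED or value not in ALLOWED[name]:
            problems.append((name, value))
    return problems


# A made-up entry with two keying errors: "X" for Gender and 9 for OAHE.
print(find_illegal_values({"Gender": "X", "TrackSHS": 2, "OAHE": 9}))
```

An empty list means the record passed the check; anything else should be traced back to the paper form before the record is saved.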
Data entry must be performed by well-trained and responsible
individuals. Data must be entered with attention to detail and some
individuals are better at this than others. Consistency in data entry is
best achieved by one rather than multiple individuals, and as the
number of persons involved in data entry increases, the chance of error
also increases. However, systematic bias may be an issue with only one
data entry individual.
Data Cleaning and Validation
Once the data are computerized, they are subjected to a series of
computer checks to clean them. Data cleaning is the process of
preparing data for analysis by removing or modifying data that is
incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
This data is usually not necessary or helpful when it comes to analyzing
data because it may hinder the process or provide inaccurate results.
There are several methods of cleaning data depending on how it is
stored along with the answers being sought.

Data cleaning is not simply about erasing information to make space
for new data, but rather finding a way to maximize a data set’s accuracy
without necessarily deleting information. Data cleaning
includes more actions than removing data, such as fixing spelling and
syntax errors, standardizing data sets, correcting mistakes such as
empty fields and missing codes, and identifying duplicate data points.
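A minimal sketch of these cleaning actions in Python is shown below. The function name clean, the field names, and the sample rows are all hypothetical. It standardizes text codes, removes exact duplicates, and sets incomplete records aside for follow-up instead of deleting them.

```python
def clean(rows):
    """Standardize text, drop exact duplicates, set aside incomplete rows."""
    cleaned, incomplete, seen = [], [], set()
    for row in rows:
        # Standardize: trim spaces and use upper case for text codes.
        tidy = {k: v.strip().upper() if isinstance(v, str) else v
                for k, v in row.items()}
        # Incomplete: any empty or missing field is kept for follow-up.
        if any(v is None or v == "" for v in tidy.values()):
            incomplete.append(tidy)
            continue
        # Duplicate: the same record was already accepted once.
        key = tuple(sorted(tidy.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(tidy)
    return cleaned, incomplete


# Made-up raw data: one untidy code, one missing value, one duplicate.
rows = [
    {"ID": 1, "Gender": " f ", "OAHE": 4},
    {"ID": 2, "Gender": "M", "OAHE": None},
    {"ID": 1, "Gender": "F", "OAHE": 4},
    {"ID": 3, "Gender": "m", "OAHE": 5},
]
cleaned, incomplete = clean(rows)
print(cleaned)
print(incomplete)
```

Returning the incomplete rows separately reflects the point above: cleaning aims at accuracy, not at throwing information away.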
Data Segmentation
After validating and cleaning the data, you can now start summarizing
them through descriptive statistics or test your hypothesis using
inferential statistics. We will only focus on descriptive statistics in this
module.
Data Storage
You can either store your data in your computer’s hard drive, external
hard drive, or in the cloud. Storing data in the cloud is preferable since
your teammates can view and edit the file wherever they are as long as
they are online. Examples of cloud storage are Google Drive and
OneDrive.

Hygiene and Maintenance
Data maintenance involves creating a back-up copy of the files on a
regular basis. The printed and digital copies of the data must also be
stored in a secured location. This stage also includes proper
documentation of the information, procedures, and data analysis
conducted in the overall data management process.
B. INTERPRETATION AND ANALYSIS
Once you have a set of data, you will need to organize it so that you can
analyze how frequently each datum occurs in the set.

Levels of Measurement
The way a set of data is measured is called its level of measurement.
Correct statistical procedures depend on a researcher being familiar with
levels of measurement. Not every statistical operation can be used with
every set of data. Data can be classified into four levels of measurement.

a. Nominal Scale Level
Data that is being measured in a nominal scale is
qualitative (categorical). Categories, colors, names, labels,
and favorite foods along with yes or no responses are
examples of nominal level data. Nominal scale data are not
ordered. Nominal scale data cannot be used in
calculations.

b. Ordinal Scale Level
Data that is measured using an ordinal scale is like
nominal scale data but there is a big difference. The ordinal
scale data can be ordered. An example of ordinal scale data
is the Likert scale where the responses to questions of a
cruise survey are “excellent”, “good”, “satisfactory”, and
“unsatisfactory”. These responses are ordered from the
most desired response to the least desired. But the
differences between two pieces of data cannot be measured.
Like the nominal scale data, ordinal scale data cannot be
used in calculations.
c. Interval Scale Level
Data that is measured using the interval scale is like
ordinal level data because it has a definite ordering, but
the differences between interval scale data can be measured.
Interval scale data have no true zero point, however, so
ratios cannot be calculated. Temperature scales like
Celsius and Fahrenheit are measured using the interval
scale.

d. Ratio Scale Level
Data that is measured using the ratio scale takes care of
the ratio problem and gives you the most information. Ratio
scale data is like interval scale data, but it has a 0 point
and ratios can be calculated. For example, four multiple
choice statistics final exam scores are 90, 68, 20, and 92
(out of a possible 100 points). The data can be ordered from
lowest to highest. The differences between the data have
meaning. The score 92 is more than the score 68 by 24
points. Ratios can be calculated. The smallest possible score is 0.

Frequency and Frequency Tables

A frequency is the number of times a value of the data occurs.
According to Table 3 below, there are three students who work two
hours, five students who work three hours, and so on. The sum of the
values in the frequency column, 20, represents the total number of
students included in the sample.

A relative frequency is the ratio (fraction or proportion) of the number
of times a value of the data occurs in the set of all outcomes to the total
number of outcomes. To find the relative frequencies, divide each
frequency by the total number of students in the sample – in this case,
20. Relative frequencies can be written as fractions, percents, or
decimals.

Cumulative relative frequency is the accumulation of the previous
relative frequencies. To find the cumulative relative frequencies, add all
the previous relative frequencies to the relative frequency for the
current row.

Table 3. Frequency Table of Student Work Hours and Relative and
Cumulative Relative Frequencies.
Data Value | Frequency | Relative Frequency | Cumulative Relative Frequency
2 | 3 | 3/20 or 0.15 or 15% | 0.15
3 | 5 | 5/20 or 0.25 or 25% | 0.15 + 0.25 = 0.40
4 | 3 | 3/20 or 0.15 or 15% | 0.40 + 0.15 = 0.55
5 | 6 | 6/20 or 0.30 or 30% | 0.55 + 0.30 = 0.85
6 | 2 | 2/20 or 0.10 or 10% | 0.85 + 0.10 = 0.95
7 | 1 | 1/20 or 0.05 or 5% | 0.95 + 0.05 = 1.00
Total | 20 | |
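The three columns of Table 3 can be reproduced with a short Python sketch. The function name frequency_table and the list hours are hypothetical; the list simply rebuilds the 20 student responses from the frequencies in the table. The cumulative column is computed from the running count so that rounding errors do not build up.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, relative, cumulative relative)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0  # running total of frequencies so far
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], counts[value] / n, running / n))
    return rows


# Work hours of the 20 students, rebuilt from the frequencies in Table 3.
hours = [2] * 3 + [3] * 5 + [4] * 3 + [5] * 6 + [6] * 2 + [7] * 1
table = frequency_table(hours)
for value, freq, rel, cum in table:
    print(value, freq, f"{rel:.2f}", f"{cum:.2f}")
```

The last cumulative relative frequency is always 1.00, which is a quick check that no response was left out.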

Graphical Interpretation of Data
It is a good idea to look at a variety of graphs to see which is the most
helpful in displaying the data. We might make different choices of what
we think is the “best” graph depending on the data and the context. Our
choice also depends on what we are using the data for.
Pie Charts. Use a pie chart to compare the proportion of data in each
category or group. A pie chart is a circle that is divided into segments
or slices to represent the proportion of observations that are in each
category. A pie chart should have at most six slices.
The first slice must start at 12 o’clock, and a slice for “Others”, if
present, must be placed last. Lastly, explode (pull out) a slice only if
you want to draw attention to it.

EXAMPLE 1: A quality engineer for an automotive supply company
wants to decrease the number of car door panels that are rejected
because of paint flaws. As part of the initial investigation, the engineer
creates a pie chart to compare the counts of flaws in each category.

INTERPRETATION OF RESULTS:
The pie chart shows that Peel is the most common paint flaw and that
Smudge and Other are the least common paint flaws. (Note: Smudge
(green) – 15.0%; Scratch (yellow) – 32.5%; Peel (red) – 37.5%; Other (sky
blue) – 15.0%)
Figure 2. Pie Chart of Flaws.

Bar Charts. Use Bar Chart to compare the counts, the means, or other
summary statistics using bars to represent groups or categories. The
height of the bar shows either the count, the variable function (mean,
sum, standard deviation, and others), or the summary value for the
group. Bars may be vertical or horizontal. Use the bar graph instead of
pie chart if you must compare more than six categories.
Using example 1, we will create a bar chart instead of a pie chart.

INTERPRETATION OF RESULTS:
The bar chart shows that Peel is the most common paint flaw and that
Smudge and Other are the least common paint flaws.
Figure 3. Bar Chart of Flaws.

EXAMPLE 2: An electronics design engineer studies the effect of
operating temperature and three types of face-plate glass on the light
output of an oscilloscope tube. As part of the initial investigation, the
engineer creates a bar chart to compare the light output of various
combinations of temperature and glass type.
INTERPRETATION OF RESULTS
The temperature that produces the highest light output most often is
150 degrees. Although the difference in light output between glass types
is small, the glass type that produces the highest light output most
often is Glass type 1. Overall, the highest light output occurs with glass
type 1 at 150 degrees.
Figure 4. Chart of Mean Light Output

Pareto Chart. Use a Pareto chart when you want to organize the bar
chart in decreasing order, with the longest bars on the left and the shortest
bars on the right. Figure 5 shows that Asian students make up the
largest group in the school population, while Native American students
make up the smallest.
Figure 5. Pareto Chart of Ethnicity of Students.

Stem-and-Leaf Plot. Use a stem-and-leaf plot to examine the shape and
spread of sample data. The stem-and-leaf plot, or stemplot, comes from
the field of exploratory data analysis. It is a good choice when the data
sets are small.
To create the plot, divide each observation of data into a stem and a
leaf. The leaf consists of a final significant digit. For example, 23 has
stem two and leaf three. The number 432 has stem 43 and leaf two. The
decimal 9.3 has stem nine and leaf three. Write the stems in a vertical
line from smallest to largest. Then write the leaves in increasing order
next to their corresponding stem.
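The steps above can be sketched in Python for whole numbers. The function name stem_and_leaf and the sample scores are hypothetical, and the sketch assumes non-negative integers so that the leaf is simply the last digit.

```python
def stem_and_leaf(values):
    """Group non-negative whole numbers into stems and sorted leaves."""
    data = sorted(values)
    # List every stem from smallest to largest, even if it has no leaves.
    plot = {stem: [] for stem in range(data[0] // 10, data[-1] // 10 + 1)}
    for value in data:
        plot[value // 10].append(value % 10)  # the leaf is the final digit
    return plot


# Made-up exam scores used only to illustrate the layout.
scores = [42, 23, 38, 47, 31, 50, 25, 38, 49, 47]
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because the data are sorted first, the leaves on each row come out in increasing order, as the procedure requires.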
EXAMPLE 3: A scientist for a company that manufactures
processed food wants to assess the percentage of fat in the company’s
bottled sauce. The advertised percentage is 15%. The scientist
measures the percentage of fat in 20 random samples. Previous
measurements found that the population standard deviation is 2.6%.
INTERPRETATION OF RESULTS
Figure 6. Stem-and-Leaf of Percent Fat

For each row, the number in the “stem” (the middle column) represents
the first digit (or digits) of the sample values. The “leaf unit” at the top
of the plot indicates which decimal place the leaf values represent.
The first row of the stem-and-leaf plot of Percent Fat has a stem of 12 and
contains the leaf values 3, 4, and 8. The leaf unit is 0.1. Thus, the first
row of the plot represents sample values of approximately 12.3, 12.4,
and 12.8.
Histogram. Use Histogram to examine the shape and spread of your
data. A histogram works best when the sample size is at least 20. A
histogram divides sample values into many intervals and represents
the frequency of data values in each interval with a bar. The
horizontal axis is labeled with what the data represents (for instance,
distance from your home to school). The vertical axis is labeled either
frequency or relative frequency (or percent frequency or probability).
EXAMPLE 4: A quality control engineer needs to ensure that the
caps on shampoo bottles are fastened correctly. If the caps are fastened
too loosely, they may fall off during shipping. If they are fastened too
tightly, they may be too difficult to remove. The target torque value for
fastening the caps is 18. The engineer collects a random sample of 68
bottles and tests the amount of torque that is needed to remove the
caps.
INTERPRETATION OF RESULTS
Most caps were fastened with a torque of 14 to 24. Only one cap was
very loose, with a torque of less than 11. However, the distribution is
positively skewed. Many caps required a torque of greater than 24 to
remove, and five caps required torque of greater than 33, nearly two
times the target value.
Figure 7. Histogram of Torque.

Time Series Plot. Use Time Series Plot to look for patterns in your data
over time, such as trends or seasonal patterns. To construct a time
series graph, we must look at both pieces of our paired data set. We
start with a standard Cartesian coordinate system. The horizontal axis
is used to plot the date or time increments, and the vertical axis is used
to plot the values of the variable that we are measuring. By doing this,
we make each point on the graph correspond to a date and a measured
quantity. The points on the graph are typically connected by straight
lines in the order in which they occur.
EXAMPLE 5: A marketing analyst wants to assess trends in tennis
racquet sales. The analyst collects sales data from the previous five
years to predict the sales of the product for the next 3 months. As part
of the initial investigation, the analyst creates a time series plot to see
how sales have changed over time.
INTERPRETATION OF RESULTS
The time series plot shows a clear upward trend. There may also be a
slight curve in the data; the increase in the data values seems to
accelerate over time.
Figure 8. Time Series Plot of Racquets

Box Plots. Box plots give a good graphical image of the concentration
of the data. They also show how far the extreme values are from most
of the data. A box plot is constructed from five values: the minimum
value, the first quartile, the median, the third quartile, and the
maximum value. We will discuss the calculation and interpretation of
these values later. We use these values to compare how close
other values are to them.
To construct a box plot, use a horizontal or vertical number line and a
rectangular box. The smallest and largest data values label the
endpoints of the axis. The first quartile marks one end of the box and
the third quartile marks the other end of the box. Approximately the
middle 50 percent of the data fall inside the box. The “whiskers” extend
from the ends of the box to the smallest and largest data values. The
median or second quartile can be between the first and third quartiles,
or it can be one, or the other, or both. The box plot gives a good, quick
picture of the data.
EXAMPLE 6: A plant fertilizer manufacturer wants to develop a
formula of fertilizer that yields the most increase in the height of plants.
To test fertilizer formulas, a scientist prepares three groups of 50
identical seedlings: a control group with no fertilizer, a group with
manufacturer’s fertilizer, named GrowFast, and a group with fertilizer
named SuperPlant from a competing manufacturer. After the plants are
in a controlled greenhouse environment for three months, the scientist
measures the plants’ heights.
As part of the initial investigation, the scientist creates a boxplot of the
plant heights from the three groups to evaluate the differences in plant
growth between plants with no fertilizer, plants with the manufacturer’s
fertilizer, and plants with their competitor’s fertilizer.

INTERPRETATION OF RESULTS
Figure 9. Boxplot of Height

GrowFast produces the tallest plants overall. SuperPlant also increases
plant height, but its variability is greater, and SuperPlant does not have
a positive effect on a large proportion of the seedlings. The graph shows
that GrowFast causes a greater and more consistent increase in plant
height.
Scatter Plots. Use Scatterplot to investigate the relationship between
a pair of continuous variables. A scatterplot displays ordered pairs of X
and Y variables in a coordinate plane.
EXAMPLE 6: A medical researcher studies obesity in adolescent
girls. Because body fat percentage is difficult and expensive to measure
directly, the researcher wants to determine whether the body mass
index (BMI) – a measurement that is easy to take – is a good predictor
of body fat percentage. The researcher collects BMI, body fat
percentage, and other personal variables of 92 adolescent girls.
INTERPRETATION OF RESULTS
The scatterplot for the BMI and body fat data shows a strong positive
and linear relationship between the two variables. Body mass index
(BMI) may be a good predictor of body fat percentage.

Figure 10. Scatterplot of Body Fat Percentage and BMI.

Numerical Interpretation of Data


Measures of the location of data.
The common measures of location are quartiles and percentiles.
Quartiles are special percentiles. The first quartile, $Q_1$, is the same as
the 25th percentile, and the third quartile, $Q_3$, is the same as the 75th
percentile. The median, M, is called both the second quartile and the
50th percentile.
To calculate quartiles and percentiles, the data must be ordered from
smallest to largest. Quartiles divide ordered data into quarters.
Percentiles divide ordered data into hundredths. To score in the 90th
percentile of an exam does not mean, necessarily, that you received
90% on a test. It means that 90% of test scores are the same or less
than your score and 10% of the test scores are the same or greater
than your test score.
Percentiles are useful for comparing values. For this reason,
universities and colleges use percentiles extensively. One instance in
which colleges and universities use percentiles is when SAT results
are used to determine a minimum testing score that will be used as an
acceptance factor.
Percentiles are mostly used with very large populations. Therefore, if
you were to say that 90% of the test scores are less (and not the same
or less) than your score, it would be acceptable because removing one
data value is not significant.
The median is a number that measures the center of the data. You
can think of the median as the middle value, but it does not actually
have to be one of the observed values. It is a number that separates
ordered data into halves. Half the values are the same number or
smaller than the median, and half the values are the same number or
larger.

Quartiles are numbers that separate the data into quarters. Quartiles
may or may not be part of the data. To find the quartiles, first find the
median or second quartile. The first quartile, $Q_1$, is the middle value of
the lower half of the data, and the third quartile, $Q_3$, is the middle
value, or median, of the upper half of the data.
The interquartile range (IQR) is a number that indicates the spread of
the middle half or the middle 50% of the data. It is the difference
between the third quartile ($Q_3$) and the first quartile ($Q_1$).
$IQR = Q_3 - Q_1$
The IQR can help to determine potential outliers. A value is suspected
to be a potential outlier if it is more than $(1.5)(IQR)$ below the first
quartile or more than $(1.5)(IQR)$ above the third quartile. Potential
outliers always require further investigation.
Note: A potential outlier is a data point that is significantly different
from the other data points. These special points may be errors or
abnormalities, or they may be a key to understanding the data.
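The quartile and outlier rules above can be tried out with a short Python sketch. The function names quartiles and potential_outliers and the sample data are hypothetical. The sketch assumes the median-of-halves method described above and, when the number of values is odd, leaves the median itself out of both halves; other textbooks and software may use slightly different conventions.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


def potential_outliers(values):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Made-up data set with one unusually large value.
sample = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40]
print(quartiles(sample))           # (5, 9, 13)
print(potential_outliers(sample))  # [40]
```

Here the IQR is 13 - 5 = 8, so the fences are 5 - 12 = -7 and 13 + 12 = 25. Only 40 falls outside them, and it should be investigated rather than deleted automatically.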
INTERPRETATION: A percentile indicates the relative standing of
a data value when data are sorted into numerical order from smallest
to largest. For the kth percentile, k percent of data values are less than
or equal to it. For example, 15% of data values are less than or equal to
the 15th percentile.
• Low percentiles always correspond to lower data values.
• High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgment
about whether it is good or bad. The interpretation of whether a
certain percentile is good or bad depends on the context of the
situation to which the data applies. In some situations, a low
percentile would be considered good; in other contexts, a high
percentile might be considered good. In many situations, there is no
value judgment that applies.

Formula for Finding the kth Percentile

Let k – the kth percentile
    i – the index (ranking or position of a data value)
    n – the total number of data values
Process
1. Order the data from smallest to largest.
2. Calculate $i = \frac{k}{100}(n + 1)$.
3. If i is an integer, then the kth percentile is the data value in the
ith position in the ordered set of data.
4. If i is not an integer, then round i up and round i down to the
nearest integers. Average the two data values in these two
positions in the ordered data set.

EXAMPLE 7: Listed are 29 ages for the members of the faculty of
XYZ Senior High School in order from smallest to largest. 21; 22; 25;
25; 26; 26; 27; 29; 30; 31; 33; 34; 34; 36; 36; 37; 40; 41; 41; 42; 45;
47; 52; 53; 54; 54; 55; 57; 58
a. Find the 70th percentile.
b. Find the 83rd percentile.
Solution.
a. $k = 70$; $i$ = the index; $n = 29$
$i = \frac{k}{100}(n + 1) = \frac{70}{100}(29 + 1) = 21$
Twenty-one is an integer, and the data value in the 21st
position in the ordered data set is 45. The 70th percentile is
45 years.
b. $k = 83$; $i$ = the index; $n = 29$
$i = \frac{k}{100}(n + 1) = \frac{83}{100}(29 + 1) = 24.9$
24.9 is not an integer. Round it down to 24 and up to 25.
The age in the 24th position is 53 and the age in the 25th
position is 54. Average 53 and 54. The 83rd percentile is 53.5
years.
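The percentile process can also be checked with a short Python sketch. The function name `kth_percentile` is made up for this illustration, and clamping the index to the range 1 to n (for very low or very high percentiles) is an assumption added here; the module's process does not cover that case.

```python
import math

def kth_percentile(values, k):
    """Find the kth percentile using i = (k / 100)(n + 1)."""
    data = sorted(values)                 # Step 1: order the data
    n = len(data)
    i = k * (n + 1) / 100                 # Step 2: multiply first to limit rounding error
    i = min(max(i, 1), n)                 # Assumption: keep the index inside 1..n
    lower_pos = math.floor(i)             # Step 4: round i down ...
    upper_pos = math.ceil(i)              # ... and round i up
    # When i is an integer both positions are the same, so the average
    # is simply the value in the ith position (Step 3).
    return (data[lower_pos - 1] + data[upper_pos - 1]) / 2

ages = [21, 22, 25, 25, 26, 26, 27, 29, 30, 31, 33, 34, 34, 36, 36,
        37, 40, 41, 41, 42, 45, 47, 52, 53, 54, 54, 55, 57, 58]

print(kth_percentile(ages, 70))   # 45.0
print(kth_percentile(ages, 83))   # 53.5
```

Both results agree with the hand computation in Example 7.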
INTERPRETATION: When writing the interpretation of a
percentile in the context of the given data, the sentence should
contain the following information.
• Information about the context of the situation being considered.
• The data value (value of the variable) that represents the
percentile.
• The percent of individuals or items with data values below the
percentile
• The percent of individuals or items with data values above the
percentile
Measures of the center of data.
The center of a data set is also a way of describing location. The two
most widely used measures of the center of the data are the mean
(average) and the median. To calculate the mean weight of 50 people,
add the 50 weights together and divide by 50. To find the median
weight of the 50 people, order the data and find the number that splits
the data into two equal parts. The median is generally a better
measure of the center when there are extreme values or outliers
because it is not affected by the precise numerical values of the
outliers. The mean is the most common measure of the center.
Another measure of the center is the mode. The mode is the most
frequent value. There can be more than one mode in a data set as long
as those values have the same frequency, and that frequency is the
highest. A data set with two modes is called bimodal.
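As a small illustration of the three measures of center, the sketch below uses Python's built-in `statistics` module. The `weights` list is hypothetical data invented for this example. It also shows why the median is preferred when there is an extreme value.

```python
from statistics import mean, median, multimode

weights = [50, 52, 52, 55, 58, 58, 60]   # hypothetical weights in kilograms

print("Mean:", mean(weights))            # sum of the values divided by how many there are
print("Median:", median(weights))        # middle value of the ordered data
print("Mode(s):", multimode(weights))    # every value that shares the highest frequency

# Add one extreme value and compare how much each measure moves.
with_outlier = weights + [150]
print("Mean with outlier:", mean(with_outlier))
print("Median with outlier:", median(with_outlier))
```

Here the data set is bimodal (52 and 58). Adding the extreme value 150 pulls the mean from 55 up to 66.875, while the median only moves from 55 to 56.5.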
Measures of the spread of data.
An important characteristic of any set of data is the variation in the
data. In some data sets, the data values are concentrated closely near
the mean; in other data sets, the data values are more widely spread
out from the mean. The most common measure of variation, or
spread, is the standard deviation. The standard deviation is a number
that measures how far data values are from their mean.
The standard deviation is always positive or zero. The standard
deviation is small when the data are all concentrated close to the
mean, exhibiting little variation or spread. The standard deviation is
larger when the data values are more spread out from the mean,
exhibiting more variation.
Suppose that we are studying the amount of time customers wait in
line at the checkout at supermarket A and supermarket B, and that the
average wait time at both supermarkets is five minutes. At
supermarket A, the standard deviation for the wait time is two
minutes; at supermarket B, the standard deviation is larger. Wait times
at supermarket B are therefore more spread out
from the average; wait times at supermarket A are more concentrated
near the average.
Suppose, again, that Rosa and Binh both shop at supermarket A.
Rosa waits at the checkout counter for seven minutes and Binh waits
for one minute. At supermarket A, the mean waiting time is five
minutes and the standard deviation is two minutes. The standard
deviation can be used to determine whether a data value is close to or
far from the mean.
Rosa waits for seven minutes:
• Seven is two minutes longer than the average of five; two
minutes is equal to one standard deviation.
• Rosa’s waiting time of seven minutes is two minutes
longer than the average of five minutes.
• Rosa’s waiting time of seven minutes is one standard
deviation above the average of five minutes.
Binh waits for one minute:
• One is four minutes less than the average of five; four
minutes is equal to two standard deviations.
• Binh’s wait time of one minute is four minutes less than
the average of five minutes.
• Binh’s wait time of one minute is two standard deviations
below the average of five minutes.
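The comparison for Rosa and Binh amounts to dividing each person's distance from the mean by the standard deviation. A minimal sketch follows; the function name is made up for this illustration.

```python
def standard_deviations_from_mean(value, mean, sd):
    """Return how many standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Supermarket A: mean wait = 5 minutes, standard deviation = 2 minutes
rosa = standard_deviations_from_mean(7, mean=5, sd=2)
binh = standard_deviations_from_mean(1, mean=5, sd=2)

print("Rosa:", rosa)   # 1.0  -> one standard deviation above the mean
print("Binh:", binh)   # -2.0 -> two standard deviations below the mean
```

A positive result means the value is above the mean, and a negative result means it is below the mean.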
Calculating the Standard Deviation
To calculate the standard deviation, we need to calculate the
variance first. The variance is the average of the squares of the
deviations. The symbol 𝜎² represents the population variance;
the population standard deviation 𝜎 is the square root of the
population variance. The symbol 𝑠² represents the sample
variance; the sample standard deviation 𝑠 is the square root
of the sample variance.
If the numbers come from a census of the entire population and
not a sample, when we calculate the average of the squared
deviations to find the variance, we divide by N, the number of
items in the population. If the data are from a sample rather
than a population, when we calculate the average of the squared
deviations, we divide by n – 1, one less than the number of
items in the sample.

Formula for the Sample Standard Deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where $x$ is an individual score in the sample, $\bar{x}$ is the sample mean,
and $n$ is the sample size.

The standard deviation, 𝑠 or 𝜎, is either zero or larger than zero.
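The sketch below follows the formula step by step and compares the result with Python's built-in `statistics.stdev` (sample, divides by n − 1) and `statistics.pstdev` (population, divides by N). The `waits` list and the function names are hypothetical, used only for illustration.

```python
import math
from statistics import pstdev, stdev

def sample_sd(values):
    """Sample standard deviation: divide the sum of squared deviations by n - 1."""
    n = len(values)
    x_bar = sum(values) / n                                  # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # (x - mean)^2 for each value
    return math.sqrt(sum(squared_deviations) / (n - 1))

def population_sd(values):
    """Population standard deviation: divide the sum of squared deviations by N."""
    big_n = len(values)
    mu = sum(values) / big_n                                 # population mean
    return math.sqrt(sum((x - mu) ** 2 for x in values) / big_n)

waits = [3, 5, 5, 7]   # hypothetical wait times in minutes

print("Sample s:", round(sample_sd(waits), 4), "| statistics.stdev:", round(stdev(waits), 4))
print("Population sigma:", round(population_sd(waits), 4), "| statistics.pstdev:", round(pstdev(waits), 4))
```

For the same data, dividing by n − 1 always gives a slightly larger value than dividing by N.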


Describing the data with reference to the spread is called
“variability”. The variability in data depends upon the method
by which the outcomes are obtained; for example, by measuring
or by random sampling. When the standard deviation is zero,
there is no spread; that is, all the data values are equal to each
other. The standard deviation is small when the data are all
concentrated close to the mean and is larger when the data
values show more variation from the mean. When the standard
deviation is a lot larger than zero, the data values are very
spread out about the mean; outliers can make 𝑠 or 𝜎 very large.
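These three situations can be seen with a few lines of Python. The lists below are hypothetical values chosen only to illustrate the point.

```python
from statistics import stdev

identical = [5, 5, 5, 5]            # no spread at all
close = [4, 5, 5, 6]                # values concentrated near the mean
with_outlier = [4, 5, 5, 6, 40]     # the same values plus one outlier

for label, data in [("identical", identical),
                    ("close", close),
                    ("with outlier", with_outlier)]:
    print(label, "->", round(stdev(data), 2))
```

The standard deviation is zero for the identical values, small for the closely grouped values, and much larger once the single outlier is added.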

Reminder: Do not use inferential statistics when using Likert scales.
Likert scales yield ordinal data and can only be analyzed using
cross-tabulation (Table 4). Use significance tests or inferential
statistics only when the variables are continuous and satisfy the test
assumptions.

Table 4. Overall assessment cross-tabulation lifted from the study of
Zhang et al. (2017), “How the Public Uses Social Media WeChat to Obtain
Health Information in China: A Survey Study”.

Rating       Overall assessment of searching       Overall assessment of current
             for medical knowledge via internet    health education system
Very Poor    71 (4.34%)                            124 (7.58%)
Worse        293 (17.91%)                          489 (29.89%)
Normal       1069 (65.34%)                         915 (55.93%)
Good         186 (11.37%)                          103 (6.30%)
Excellent    17 (1.04%)                            5 (0.31%)
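The percentages in Table 4 can be reproduced from the counts. The sketch below assumes that each percentage is the count divided by its column total (1,636 in both columns); the variable and function names are made up for this illustration.

```python
ratings = ["Very Poor", "Worse", "Normal", "Good", "Excellent"]
internet_search = [71, 293, 1069, 186, 17]     # counts from the first column of Table 4
education_system = [124, 489, 915, 103, 5]     # counts from the second column of Table 4

def column_percentages(column):
    """Convert counts to percentages of the column total, rounded to 2 decimal places."""
    total = sum(column)
    return [round(100 * count / total, 2) for count in column]

search_pct = column_percentages(internet_search)
system_pct = column_percentages(education_system)

print(f"{'Rating':<10}{'Search %':>10}{'System %':>10}")
for rating, a, b in zip(ratings, search_pct, system_pct):
    print(f"{rating:<10}{a:>10.2f}{b:>10.2f}")
```

Reading across a row then compares the proportion of respondents in each category, which is how cross-tabulated ordinal data are usually discussed.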

Additional note: You can access this article here:
https://bit.ly/3uI4f6m. Observe how the authors analyzed and
discussed the results of their survey research. A sample of their
research instrument can also be found within the article.
Descriptive statistics are most helpful when the research is limited to
the sample and does not need to be generalized to a larger population.
For example, if you are comparing the percentage of adults vaccinated
in four different barangays, then descriptive statistics are enough.
Summary
The significance of data interpretation is indisputable. Data analysis
and interpretation are crucial to developing sound conclusions and making
better-informed decisions. As such, below are some tips for interpreting
data.
1. Collect your data and make it as readable as possible.
2. Choose the type of data analysis to perform.
3. Think. Consider your data from various points of view
(connect it to your related literature), and what it means for
various respondents.
4. Reflect. Be aware of the many dangers of data analysis and
interpretation. Make sure your statements are valid and
backed with sound evidence.

Let Us Practice

Direction: Read the problem and answer the questions that follow.

Problem: Twenty-five selected students were asked the number of movies
they watched the previous week. The results are as follows.

Number of Movies   Frequency   Relative Frequency   Cumulative Relative Frequency
0                  5
1                  9
2                  6
3                  4
4                  1
1. Complete the remaining columns of the chart.
2. Construct a histogram of the data.
3. Find the mean, median, and mode.
4. Calculate the first, second, and third quartile.
5. Write your interpretation of the data based on your analysis.

Let Us Practice More

Direction: Read the problem and answer the questions that follow.

Problem: Sixty-five randomly selected car salespersons were asked the
number of cars they generally sell in one week. Fourteen people
answered that they generally sell three cars, nineteen generally
sell four cars, twelve generally sell five cars, nine generally sell
six cars, and eleven generally sell seven cars.

1. Complete the table found below.


Data Value (# of cars)   Frequency   Relative Frequency   Cumulative Relative Frequency

2. Find the sample mean, $\bar{x}$.
3. Find the sample standard deviation, s.
4. Construct a histogram of the data.
5. Find the first quartile.
6. Find the median.
7. Find the 3rd quartile.
8. Construct a box plot of the data.
9. Find the 40th percentile.
10. Find the 90th percentile.

11. Write your interpretation of the data based on your analysis.

Let Us Remember

Direction: Fill in the blanks with the correct word or phrase to complete
each sentence.
1. In _________________, the raw data collected will be transferred to the data
editors to check for the completeness, accuracy, and preciseness of data.
2. _______________ is not simply about erasing information to make space for
new data, but rather finding a way to maximize a data set’s accuracy without
necessarily deleting information.
3. A ___________ is the number of times a value of the data occurs.
4. Use ___________ to compare the proportion of data in each category or group.
5. _____________ give a good graphical image of the concentration of the data.

Let Us Assess

Directions: In this activity, you must create a survey questionnaire that you
will be using in collecting data that will answer your research problem.
Perform any sampling technique to obtain a sample size of at least 30. Using
the survey questionnaire that you created, answer the following questions.
1. How many variables did you use in your survey? Enumerate those
variables and write their corresponding level of measurement.
2. Create a codebook for the variables used in the survey. Use the table
shown below in completing this task.

Table 5. Sample Codebook.

Variable Name   Variable Description           Code        Levels of Measurement
Track           A specific specialization an   1 – GAD     Nominal
                SHS student is enrolled in.    2 – STEM
                                               3 – HUMS
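If the survey data are encoded on a computer, a codebook like the one in Table 5 can be kept as a simple lookup table. The sketch below is only an illustration: the structure of `codebook` is one possible layout, and the `responses` list is hypothetical encoded data.

```python
from collections import Counter

# Codebook entry copied from Table 5; the dictionary layout is an assumption.
codebook = {
    "Track": {
        "description": "A specific specialization an SHS student is enrolled in.",
        "codes": {1: "GAD", 2: "STEM", 3: "HUMS"},
        "level": "Nominal",
    }
}

responses = [2, 1, 3, 2, 2]   # hypothetical encoded answers from five respondents

# Translate each numeric code back to its label, then tally the labels.
decoded = [codebook["Track"]["codes"][code] for code in responses]
frequency = Counter(decoded)

print(decoded)
print(dict(frequency))
```

Keeping the codes in one place makes it easier to check that every encoded answer is valid before the data are analyzed.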

Let Us Enhance

Direction: Write the results and discussion of your study. Refer to the
rubric in Table 6 when writing this section of your paper.

Table 6. Rubric for the Results and Discussion.

Criteria: Results
Exemplary (4): Subheadings are included and are clear and informative; Results are reported and interpreted with respect to the literature; All figures and tables included are referred to in the body of the results section.
Competent (3): Subheadings may or may not be included; Results are reported but not interpreted with respect to the literature; Most figures and tables included are referred to in the body of the results section.
Developing (2): Subheadings are not included; Results include some interpretation with respect to the literature; Some figures and tables are referred to in the body of the results section.
Needs Improvement (1): Results are heavily interpreted, mixing the results and the discussion section; Results section is missing, only figures and tables are present.

Criteria: Discussion
Exemplary (4): Discussion addresses the major findings of the study; Results are interpreted with respect to outside sources; Carefully details any problems occurring with results, techniques, and inconsistencies.
Competent (3): Discussion addresses most major findings of the study; Most results are interpreted with respect to outside sources; Considers some problems occurring with results, techniques, and inconsistencies.
Developing (2): Discussion addresses the major findings of the study; Results have little interpretation with respect to outside sources and citations; No thorough discussion of the problems occurring with results.
Needs Improvement (1): Discussion is brief and is a repetition of the results section, if included at all.

Criteria: Layout and Organization
Exemplary (4): Ideas are ordered clearly and effectively to fully comprehend the paper; Clear transitions are used throughout the paper; Paragraphs are the appropriate length, not too short or too long.
Competent (3): Most main ideas are ordered clearly and effectively to fully comprehend the paper; Many transitions are used throughout the paper; Most paragraphs are of the appropriate length.
Developing (2): Ideas covered in the paper are occasionally out of order; Some transitions are used throughout the paper; Paragraphs are often too long or too short.
Needs Improvement (1): Ideas in the paper are difficult to follow; Lack of transitions from topic to topic; Paragraphs are routinely too long, encompassing too many ideas, or too short, consisting of only 1 or 2 sentences.

Criteria: Grammar and Style
Exemplary (4): Writing formality is appropriate for the audience; Word choice is precise and varied; Sentence structures are clear and not convoluted; No grammar and spelling errors are present; Tense is correct throughout the paper.
Competent (3): Much of the writing level is appropriate for the audience; Most sentences have precise word choice; Most sentences have a clear construction; Few grammar and spelling errors are present; Few errors in tense in the paper.
Developing (2): Writing level is too simplistic or too complicated for the audience; Some sentences have precise word choice; Many sentences have a clear construction; Many grammar and spelling errors; Many errors in tense in the paper.
Needs Improvement (1): Paper is difficult to read due to scattered construction and grammatical errors.

Let Us Reflect

In your own words, discuss the importance of proper data management
procedures in analyzing and interpreting the results of a research study.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

Answer Key to Activities

Let Us Try
1. B     6. A
2. B     7. D
3. A     8. D
4. C     9. D
5. B     10. B

Let Us Practice
1.
2.
3. Mean = 1.48; Median = 1; Mode = 1
4. 𝑄1 = 1; 𝑄2 = 1; 𝑄3 = 2
5. Answers may vary

Let Us Remember
1. DATA EDITING
2. DATA CLEANING
3. FREQUENCY
4. PIE CHART
5. BOX PLOTS
References

Abbas S. Tavakoli et al., “Data Management Plans: Stages, Components,
and Activities”, Applications and Applied Mathematics (AAM): An
International Journal 1 2006: 141-151, accessed April 20, 2021.

Barbara Illowsky and Susan Dean, Introductory Statistics, Houston, Texas:
OpenStax, 2018.

Victor J. Schoenbach, Data Analysis and Interpretation, last updated: July
17, 2014, www.epidemiolog.net.

“What is Data Cleaning”, Sisense, accessed May 5, 2021,
https://www.sisense.com/glossary/data-cleaning/

Xingting Zhang et al., “How the Public Uses Social Media WeChat to Obtain
Health Information in China: A Survey Study”, BMC Medical
Informatics and Decision Making 2017: 17-66.

https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/
For inquiries or feedback, please write or call:

Department of Education – Division of Tagum City

Energy Park, Apokon, Tagum City, 8100

Telefax: (084) 216-3504

Email Address: tagum.city@deped.gov.ph
