
DATA: GATHERING AND ORGANIZING DATA

Data are facts, or sets of information or observations, under study.

Data are classified as qualitative (quality) or quantitative (quantity).

Qualitative data express attributes and are sometimes called categorical data. They
cannot be subjected to arithmetic operations (addition, subtraction, multiplication, or
division). Examples are gender, nationality, civil status, etc.

Quantitative data are numerical in nature and are obtained from counting or
measuring. Examples are test scores, height, weight, etc.

Under this classification of data, there is what we call the level or scale of
measurement, which assigns symbols or numerals to objects or events according to some
rules. There are four (4) measurement scales: nominal, ordinal, interval, and ratio.

Nominal is a scale of measurement used for labeling variables into distinct
classifications; it does not involve quantitative value or order. Examples are
gender, nationality, civil status, eye color, political preference, etc.

Ordinal is a scale of measurement used simply to depict the order of variables
and not the difference between each of the variables. Examples are the birth order of
siblings in a family, honor students in a class, ranking of contestants in a beauty
contest, etc.

Nominal and ordinal data fall under qualitative data.

Interval is a numerical scale where the order of the variables is known as well as
the difference between the variables. The value of 0 is arbitrary. Examples are test
scores, temperature, etc.

Ratio is a scale similar to the interval scale; the only difference is that it has
an absolute or true zero point. Examples are height, weight, time, etc.

Interval and ratio data fall under quantitative data.

Let's try this!

Classify the following as quantitative data or qualitative data. Further, identify the scale
of measurement under which each falls (e.g. Quantitative Data – Interval).
1. Color of the eye
2. Number of typewriters in the room
3. Civil Status
4. Address
5. Telephone Numbers
6. Age of Teachers
7. Rank of Students
8. Speed of a Car
9. Birth Rates
10. Score in Math Exam

One body of knowledge that deals with data analysis is Statistics. Acelajado et al. (1999)
define it as a systematic process of data collection, organization or presentation,
analysis, and interpretation.

Considering these four (4) areas, Statistics is classified into Descriptive Statistics and
Inferential Statistics.
Descriptive Statistics is concerned with the methods of collecting, organizing and
presenting data appropriately and creatively to describe or assess group
characteristics.

Inferential Statistics is concerned with inferring or drawing conclusions about the
population based on preselected elements of the population (a sample).

Let's try this!


Distinguish whether to make use of Descriptive Statistics or Inferential Statistics.
1. A teacher computes the average grade of students and determines the top ten
from them.
2. A market vendor investigates the most popular brand of energy drink.
3. A sports journalist determines the most popular basketball player for the year.
4. An engineer calculates the average height of the buildings along Dimasalang
Boulevard.
5. A school administrator forecasts the future expansion of a school.

Collection of Data

We must be familiar with the basic terms Population and Sample, and their underlying
measurements.

Population consists of all objects under study. It is sometimes referred to as the
overall, total, or whole set of observations being considered.

Sample is a set of data collected from a statistical population by a defined
procedure. It is sometimes referred to as a subgroup, subset, or representative of
the population.

Parameter is any numerical or nominal characteristic of a population; it is a
value or measurement obtained from a population.

Statistic is an estimate of a parameter, or simply any value or measurement
obtained from a sample.

Here are the methods of collecting data: Direct or Interview Method, Indirect or
Questionnaire Method, Registration Method, Observation Method, and Experimentation.

• Direct or Interview Method is a person-to-person interaction.

• Indirect or Questionnaire Method obtains data by distributing questionnaires.

• Registration Method is a method of collecting data that is governed by laws.
Examples are birth and death rates registered in PSA, registration of land
vehicles in LTO, list of registered voters in COMELEC, etc.

• Observation Method is a scientific method of investigation in which data are
gathered by watching and recording the subjects.

• Experimentation determines the cause-and-effect of a certain phenomenon
under some controlled conditions.

In conducting research, we rarely consider the whole population of subjects being
studied and instead consider only a portion of it. We include only a small representative
part of the population, called a sample.
In determining the right sample size, we usually use Slovin's formula:

$n = \dfrac{N}{1 + N e^{2}}$

Where:
n = sample size
N = population size
e = margin of error

Example:
A group of researchers will conduct a survey to find out the opinion of residents of a
particular community regarding the oil price hike. If there are 10,000 residents in the
community and the researchers plan to use a sample using a 10% margin of error, what
should the sample size be?
Given:
n = ?
N = 10,000
e = 10% or 0.10

$n = \dfrac{N}{1 + N e^{2}} = \dfrac{10{,}000}{1 + (10{,}000)(0.10)^{2}} = \dfrac{10{,}000}{1 + (10{,}000)(0.01)} = \dfrac{10{,}000}{101}$

n ≈ 99.01 or 99
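
The computation above can be checked with a short script. This is only a minimal sketch: the function name `slovin_sample_size` is made up for illustration, and the result is left unrounded so the reader can decide how to round it.

```python
def slovin_sample_size(population_size: int, margin_of_error: float) -> float:
    """Slovin's formula, n = N / (1 + N * e**2), returned without rounding."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("need N > 0 and 0 < e < 1")
    return population_size / (1 + population_size * margin_of_error ** 2)


# Worked example from the text: N = 10,000 and e = 10%
n = slovin_sample_size(10_000, 0.10)
print(round(n, 2))  # 99.01
```

The same function can be applied to the exercise items by changing the two arguments.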

Let's try this!


Given the following population size and margin of error, determine the desired sample
size.
1. N = 40,000 e = 5%
2. N = 15,000 e = 1%

After choosing the right method of collection and computing the sample size, the next step
is to choose a sampling technique.

Sampling Technique is a procedure used to determine the individuals or
members of a sample.

It is classified as Probability Sampling and Non-Probability Sampling.

Probability Sampling is a type of sampling technique wherein each and every
unit of the population has an equal chance of being selected as a sampling unit. It is
also called Formal Sampling or Random Sampling.

Non-Probability Sampling is a type of sampling technique wherein the probability
of each case being selected from the total population is not known. Units of the
sample are chosen on the basis of personal judgment or convenience.

Under Probability Sampling are Simple Random Sampling, Stratified Sampling, Cluster
Sampling, Systematic Sampling, and Multistage Sampling.

Simple Random Sampling (SRS) is the purest form of probability sampling. It
assures that each element in the population has an equal chance of being included in
the sample.

Types:
• With Replacement
- A unit, once selected, has a chance of being selected again
• Without Replacement
- A unit, once selected, cannot be selected again

Methods:
• Fishbowl/ Lottery Method
• Table of Random Numbers

Stratified Random Sampling is probability sampling across two or more strata
or groups. Elements within each stratum are homogeneous, but are heterogeneous
across strata.

Types:
• Proportionate Stratified Random Sampling
- Each stratum has the same sampling fraction

Suppose a sampling frame has 3 strata with population sizes of 100, 200, and 300
respectively, and the researcher chooses a sampling fraction of ½. Then the researcher
must randomly sample 50, 100, and 150 subjects from the respective strata.

Groups   Popn Size   Sampling Fraction   Sample Size
A        100         1/2                 50
B        200         1/2                 100
C        300         1/2                 150
Total    600                             300

• Disproportionate Stratified Random Sampling
- The different strata have different sampling fractions

Example:
Suppose a community consists of 5,000 families belonging to different income brackets.
A sample of 200 families will be selected using Stratified Random Sampling, with each
stratum sampled in proportion to its share of the population.

Strata                     Nr of Families
High-Income Families       1,000
Average-Income Families    2,500
Low-Income Families        1,500
                           N = 5,000

Solution:

Strata    Nr of Families   Percentage                  Nr of Sample Families per Stratum
High      1,000            1,000/5,000 = 0.2 or 20%    0.2 x 200 = 40
Average   2,500            2,500/5,000 = 0.5 or 50%    0.5 x 200 = 100
Low       1,500            1,500/5,000 = 0.3 or 30%    0.3 x 200 = 60
          N = 5,000                                    n = 200

Interpretation: From the result, it was revealed that from 1,000 high-income families, 40
families will be selected as part of the sample, 100 out of 2,500 for average-income, and
60 out of 1,500 for low-income families.
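
The allocation in the solution can be reproduced with a small helper. This is a sketch under the assumption that each stratum's share of the sample equals its share of the population; the function name `proportional_allocation` is illustrative only, and with other numbers the rounded counts may not add up exactly to the desired sample size.

```python
def proportional_allocation(strata_sizes: dict, sample_size: int) -> dict:
    """Allocate the sample to each stratum in proportion to the stratum size."""
    total = sum(strata_sizes.values())
    return {
        name: round(size * sample_size / total)
        for name, size in strata_sizes.items()
    }


families = {"High": 1_000, "Average": 2_500, "Low": 1_500}
allocation = proportional_allocation(families, 200)
print(allocation)  # {'High': 40, 'Average': 100, 'Low': 60}
```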

Cluster Sampling is an SRS-like selection of the sample, but in groups or clusters
rather than individual elements.

Systematic Sampling is a technique of selecting elements from the population by
following a random starting point and a fixed interval (e.g. every kth element).

Multi-Stage Sampling is a combination of several sampling techniques.

Example:
- Select all schools; then sample within schools
- Sample schools; then measure all students
- Sample schools; then sample students

To differentiate the different probability sampling techniques, consider the following
illustration.

For non-probability sampling, there are Purposive Sampling, Quota Sampling,
Snowball Sampling, Self-Selection Sampling, and Convenience Sampling.

Purposive Sampling is a sampling procedure in which an experienced researcher
selects the sample based on some appropriate characteristic of the sample members.
It is also called Judgment Sampling.

Quota Sampling is a sampling procedure similar to stratified random sampling but
done in a non-random way. The proportions from each subgroup of the population
are drawn through convenience sampling until the quota is met. It is normally used
for interview surveys.

Snowball Sampling is a referral type of sampling wherein it starts with a key
person who introduces the next one, forming a chain. It stops when either no new
cases are given or the sample is as large as is manageable.

Self-Selection Sampling occurs when you allow each case, usually individuals,
to identify their desire to take part in the research being conducted. The need for
cases is announced publicly, either by advertising through appropriate media or by
asking them to take part, and data are collected from those who respond.

Convenience Sampling is non-probability sampling wherein cases are selected
haphazardly from those easiest to obtain or reach. In short, the most available
cases are chosen. As the name of the sampling implies, it is done at the
“convenience” of the researcher.

Let's try this!

Identify the sampling method used in the following problems.
1. Every fifth person boarding a plane is searched thoroughly.
2. At a local community college, five math classes are randomly selected out of 20
and all of the students from each class are interviewed.
3. A community college student interviews the first 100 students to enter the building
to determine the percentage of students that own a car.
4. The names of 70 contestants are written on 70 cards. The cards are placed in a
bag, and three names are picked from the bag.
5. A researcher is interested in maximum-security inmates. She groups maximum-
security prisons by state, randomly selects 10 states, and, from those 10, selects
three prisons. She includes all the inmates in those three prisons in her sample.

Organization/ Presentation of Data

Data presented in an organized and systematic way may serve as a one-picture summary
of the data in which significant characteristics can easily be seen. But before going any
further, let us consider ways of classifying data. Data can be classified as grouped or
ungrouped.

Grouped data are data that are organized and arranged into different classes or
categories.

Ungrouped data are data that are not organized, or if arranged, could only be
from highest to lowest or vice versa.

Data can be presented in three (3) forms: Textual, Tabular, and Graphical.
Textual Method is a way of presenting data in paragraph form. This involves
enumerating the important characteristics, giving emphasis on significant figures
and identifying important features of the data.

Example:
Below are test scores in a 50-item test of ten (10) students.

25 33 28 40 26 30 37 9 48 15

Solution:
Arrange first the scores from lowest to highest.

9 15 25 26 28 30 33 37 40 48

Then, make your textual presentation through a short narrative describing the data set.

The highest score is 48 and the lowest is 9. Two students got a score of 40 and above,
while only one (1) got a score of 10 and below. Generally, the students performed well in
the test with eight (8) students or 80% getting a score of 25 and above.

Arranging a mass of data manually is quite tedious, but putting the data in a
stem-and-leaf plot makes it easy.

Stem-and-Leaf plot is a method of presenting data by separating each number into two
parts: stem and leaf. For a two-digit number, the stem consists of the first digit and the
leaf consists of the second digit. For a three-digit number, the stem consists of the first
two digits and the leaf consists of the last digit.

Example:
Let's use the set of scores of the 10 students in the prior example. Since the scores have
at most two digits, write the first (tens) digit in the stem and the second (units) digit in
the leaf.

Stem   Leaves
0      9
1      5
2      5, 8, 6
3      3, 0, 7
4      0, 8

Arrange the numbers in the leaf portion from lowest to highest:

Stem   Leaves
0      9
1      5
2      5, 6, 8
3      0, 3, 7
4      0, 8

By looking at the stem-and-leaf plot, we can easily rank the data or put them in order.
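
The construction above can also be sketched in code. The example below assumes non-negative whole-number scores and splits each score into a tens part (stem) and a units digit (leaf); the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves ordered
        stem, leaf = divmod(value, 10)    # e.g. 48 -> stem 4, leaf 8
        plot[stem].append(leaf)
    return dict(plot)


scores = [25, 33, 28, 40, 26, 30, 37, 9, 48, 15]
for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

Because `divmod(value, 10)` keeps all leading digits in the stem, the same sketch handles three-digit numbers in the way the definition describes.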

Let's try this!


The following are ages of 15 patients confined in a government hospital. Construct a
stem-and-leaf and make a short narrative summary describing the data set.

31 48 25 16 32
28 48 49 51 42
37 55 37 37 27

Tabular Method is a way of presenting data by using tables.

One kind of table that is used in organizing ungrouped data into grouped data is
the frequency distribution table. There are two kinds of this table: the categorical
frequency table and the grouped frequency table.

Example:
Categorical Frequency Table
Following are blood types of 25 respondents being studied.

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Solution:

Step 1: Make a table.

Class Tally f %
A
B
O
AB

Step 2: Tally the data


Step 3: Count the tallies and place results under frequencies.
Step 4: Find the percentage of each value by using the following formula:
% = f/n x 100%
Step 5: Find the totals for the frequency and percent column
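
Steps 2 to 5 can be sketched for the blood-type data as follows. This is only an illustration; `collections.Counter` does the tallying, and the variable names are not from the source.

```python
from collections import Counter

# The 25 blood types listed in the example, read row by row
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)   # Steps 2 and 3: tally and count
n = len(blood_types)

# Step 4: percentage = f / n x 100 for each class
table = {cls: (counts[cls], counts[cls] * 100 / n) for cls in ["A", "B", "O", "AB"]}

for cls, (f, pct) in table.items():
    print(f"{cls:<3} {f:>3} {pct:>6.1f}%")

# Step 5: totals of the frequency and percent columns
print("Total", sum(f for f, _ in table.values()), sum(p for _, p in table.values()))
```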

Grouped Frequency Table

Following are test scores of 50 students in a Math exam.

49 57 38 73 81 74 59 76 65 69
54 56 69 68 78 65 85 49 69 61
48 81 68 37 43 78 82 43 64 67
52 56 81 77 79 85 40 85 59 80
60 71 57 61 69 61 83 90 87 74
Solution:

Step 1: Determine the classes.

• Find the highest and lowest values.
• Find the range.
• Select the number of classes desired (5 to 20) or use Sturges' formula:

Min Nr of Classes = 1 + 3.3 log n

• Find the width by dividing the range by the number of classes and rounding up.
• Select a starting point (usually the lowest value or any convenient number less
than the lowest value); add the width to get the lower limits.
• Find the upper class limits.
• Find the boundaries.

Step 2: Tally the data.


Step 3: Find the numerical frequencies from the tallies.
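
The three steps can be sketched for the 50 test scores as follows. Several choices here are assumptions rather than part of the source: the result of Sturges' formula is rounded up, the starting point is the lowest score, and the data are whole numbers so that each class covers `width` consecutive integers.

```python
import math


def grouped_frequency(values):
    """Return (lower limit, upper limit, frequency) rows for whole-number data."""
    n = len(values)
    k = math.ceil(1 + 3.3 * math.log10(n))       # Sturges' formula, rounded up
    lowest, highest = min(values), max(values)
    width = math.ceil((highest - lowest) / k)    # class width, rounded up
    rows = []
    lower = lowest                               # starting point: lowest value
    while lower <= highest:
        upper = lower + width - 1
        frequency = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, frequency))
        lower += width
    return rows


scores = [
    49, 57, 38, 73, 81, 74, 59, 76, 65, 69,
    54, 56, 69, 68, 78, 65, 85, 49, 69, 61,
    48, 81, 68, 37, 43, 78, 82, 43, 64, 67,
    52, 56, 81, 77, 79, 85, 40, 85, 59, 80,
    60, 71, 57, 61, 69, 61, 83, 90, 87, 74,
]
for lower, upper, f in grouped_frequency(scores):
    print(f"{lower}-{upper}: {f}")
```

With these choices the sketch gives 7 classes of width 8, starting at 37-44. A different starting point or number of classes is equally valid and would give a different table.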

Let's try this!


The following are the IQ scores of 30 student applicants in Probinsyano National High
School:

96 96 91 88 78 55
103 106 94 91 81 75
72 88 100 94 88 78
109 96 96 94 88 70
113 106 100 96 88 78

Construct a frequency distribution table.

Graphical Method is a way of presenting data through graphs. The most common
graphical presentations are the bar graph, line graph, and pie chart. Other presentation
techniques used are the histogram, ogive, and frequency polygon.

Histogram is a graph represented by vertical or horizontal rectangles whose bases
are centered on the class marks and whose heights are the frequencies.

Ogive is a line graph where the bases are the class boundaries and the heights
are the less-than cumulative frequencies for the less-than ogive and the greater-than
cumulative frequencies for the greater-than ogive.

[Figure legend: greater-than ogive; less-than ogive]

Frequency Polygon is a line graph whose bases are the class marks and whose
heights are the frequencies.

MEASURES OF CENTRAL TENDENCY

Before going to our subject matter, let's first discuss a prerequisite topic:
summation notation.

Summation notation is a compact way of writing the addition of n values or
measurements. It is denoted by the Greek symbol Σ (sigma) followed by an expression
that describes the pattern of the terms being added.

For the summation notation

$\sum_{i=1}^{9} X_i$

it is read as “the summation of X sub i, where i goes from 1 to 9”.

It can be expanded to $X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9$.

Considering the following values for each element:
$X_1 = 76$, $X_2 = 45$, $X_3 = 51$, $X_4 = 27$, $X_5 = 6$, $X_6 = 76$, $X_7 = 62$, $X_8 = 12$, $X_9 = 2$

We can, therefore, substitute them into the expanded expression:

$X_1 + X_2 + \cdots + X_9 = 76 + 45 + 51 + 27 + 6 + 76 + 62 + 12 + 2 = 357$

Thus, the resulting sum is equal to 357.

Example:
1. From the summation notation $\sum_{i=4}^{6} X_i^{2}$, solve using the values given above.

Solution:
Since $X_4 = 27$, $X_5 = 6$, and $X_6 = 76$:

$\sum_{i=4}^{6} X_i^{2} = X_4^{2} + X_5^{2} + X_6^{2} = 27^{2} + 6^{2} + 76^{2} = 6{,}541$

It can also be done in reverse: an expanded expression can be written in summation notation.

For example, the expression $X_7 Y_7 + X_8 Y_8 + X_9 Y_9$ can be rewritten in summation
notation as $\sum_{i=7}^{9} X_i Y_i$.

2. Express the expanded expression $(X_5 + 5) + (X_6 + 5) + (X_7 + 5) + (X_8 + 5)$ in
summation notation.

Answer:
It can be rewritten in summation notation as $\sum_{i=5}^{8} (X_i + 5)$.
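
The notation maps directly onto a loop over the index i. The sketch below uses a hypothetical helper named `summation` that adds `term(i)` for every i from the lower to the upper limit, applied to the nine values given earlier.

```python
# X_1 ... X_9 from the example, keyed by their subscripts
x = {1: 76, 2: 45, 3: 51, 4: 27, 5: 6, 6: 76, 7: 62, 8: 12, 9: 2}


def summation(term, start, stop):
    """Add term(i) for i = start, start + 1, ..., stop (both limits included)."""
    return sum(term(i) for i in range(start, stop + 1))


total = summation(lambda i: x[i], 1, 9)              # sum of X_i, i = 1..9
sum_of_squares = summation(lambda i: x[i] ** 2, 4, 6)  # sum of X_i squared, i = 4..6
shifted = summation(lambda i: x[i] + 5, 5, 8)        # sum of (X_i + 5), i = 5..8

print(total, sum_of_squares, shifted)
```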

Let's try this!

Let $X_1 = 13$, $X_2 = 8$, $X_3 = 12$, $Y_1 = 3$, $Y_2 = 6$, $Y_3 = 15$. Compute the following:
1. $\sum_{i=1}^{3} X_i$
2. $\sum_{i=1}^{3} Y_i$
3. $\sum_{i=1}^{3} X_i Y_i$

Descriptive Statistics is all about describing the characteristics of the data. One way is
through measures of central tendency.

Measures of central tendency are numerical descriptive measures which
indicate or locate the center of a distribution or data set.
The most common statistical tools under this measure are Mean, Median, and Mode.

Mean is the sum of the values of a group of items divided by the number of such
items. It is also called the “average” or “arithmetic mean” and is represented by μ (read
as “mu”) for a population and $\bar{x}$ (read as “x bar”) for a sample.

Average is a number expressing the central or typical value in a set of data.

For ungrouped data, the formula below is used:

$\bar{x} = \dfrac{\Sigma X}{n}$

Where:
$\bar{x}$ = mean
ΣX = sum of the measurements or values
n = number of cases in the sample

For grouped data:

$\bar{x} = \dfrac{\Sigma f X_m}{n}$

Where:
f = frequency
$X_m$ = class mark/ midpoint
n = number of cases in the sample

Example:

1. Ten students were polled as to the number of siblings in their individual families.
The raw data is the following set: 3, 2, 2, 1, 3, 6, 3, 3, 4, 2. Compute for the
arithmetic mean.

Solution:
$\bar{x} = \dfrac{\Sigma x}{n} = \dfrac{x_1 + x_2 + \cdots + x_{10}}{n} = \dfrac{3 + 2 + 2 + 1 + 3 + 6 + 3 + 3 + 4 + 2}{10} = \dfrac{29}{10} = 2.9$

2. Compute for the arithmetic mean using the frequency distribution table below.

Class Interval f
16 - 23 1
24 - 31 3
32 - 39 6
40 - 47 12
48 - 55 10
56 - 63 8

Solution:

Class Interval   f     Xm      fXm
16 - 23          1     19.5    19.5
24 - 31          3     27.5    82.5
32 - 39          6     35.5    213
40 - 47          12    43.5    522
48 - 55          10    51.5    515
56 - 63          8     59.5    476
                 40            1,828

$\bar{x} = \dfrac{\Sigma f X_m}{n} = \dfrac{1{,}828}{40} = 45.7$
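
The grouped-data mean can be sketched in a few lines. The representation of each class as a `(lower limit, upper limit, frequency)` tuple and the name `grouped_mean` are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f * class mark, divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    weighted_total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return weighted_total / n


# (lower limit, upper limit, frequency) for each class in the example
test_scores = [(16, 23, 1), (24, 31, 3), (32, 39, 6),
               (40, 47, 12), (48, 55, 10), (56, 63, 8)]
print(grouped_mean(test_scores))  # 45.7
```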

Under mean, there is what we call the weighted mean.

Weighted Mean is a type of mean wherein each value or measurement has a
different weight or degree of importance.

Example:
Listed below are the grades of a student’s semester courses. Calculate the Grade Point
Average (GPA).

Course Grade Points (X) Credits (W)


Math A 4 5
History B 3 3
Health A 4 2
Art C 2 2

Solution:
Course    Grade   Points (X)   Credits (W)   X*W
Math      A       4            5             20
History   B       3            3             9
Health    A       4            2             8
Art       C       2            2             4
                               ΣW = 12       ΣXW = 41

$\bar{x} = \dfrac{\Sigma XW}{\Sigma W} = \dfrac{41}{12} = 3.42$

The student's Grade Point Average (GPA) is 3.42.
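
The GPA computation can be sketched as a weighted mean, with the grade points as values and the credits as weights. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


points = [4, 3, 4, 2]    # Math, History, Health, Art
credits = [5, 3, 2, 2]
gpa = weighted_mean(points, credits)
print(round(gpa, 2))  # 3.42
```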

Characteristics of the Mean:

• It is the most appropriate measure when the data are in interval or ratio scale.
• The mean lies between the largest and smallest values of measurements.
• There is only one value for the mean for a given set of values of measurements.
• It is easily influenced by extreme values because all values contribute to the
average.

Let's try this!


1. Tapia obtained the following marks in her five (5) subjects in finals. Compute her
Grade Point Average (GPA).

Subject Units Grade


Math 3 80
English 3 85
Science 5 83
Filipino 3 84
Social Studies 3 88
2. The following are test scores obtained by 3rd year section Maligaya in their Math
test. Compute the mean.

Class Interval f
20 - 24 4
25 - 29 6
30 - 34 7
35 - 39 10
40 - 44 5
45 - 49 8

Median is the middlemost item/ score/ value in an ordered distribution and is
denoted by the symbol $\tilde{x}$.

Example:
The following are the ages of the Math teachers in Hitaasan Elementary School: 21, 23,
32, 28, 25, 50, 48. Compute for the median.

Solution:
Arrange the given ages from lowest to highest.

21, 23, 32, 28, 25, 50, 48 → 21, 23, 25, 28, 32, 48, 50

Since n = 7 is odd, n/2 = 7/2 = 3.5, which is rounded up to 4. Thus, the median is the
4th value in the set of ordered ages.

1st   2nd   3rd   4th   5th   6th   7th
21    23    25    28    32    48    50

$\tilde{x} = 28$

How about for even n?

3, 2, 2, 1, 1, 6, 3, 3, 4, 2 → 1, 1, 2, 2, 2, 3, 3, 3, 4, 6

Since n = 10, n/2 = 10/2 = 5. Thus, the median lies between the 5th and 6th values in the
set of ordered scores.

1st   2nd   3rd   4th   5th   6th   7th   8th   9th   10th
1     1     2     2     2     3     3     3     4     6

There is only ONE value for the median. Hence, we take the average of the two values.

$\tilde{x} = \dfrac{2 + 3}{2} = \dfrac{5}{2} = 2.5$
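
Both cases, odd and even n, can be sketched in one function. The name `median_of` is illustrative; Python's standard library also provides `statistics.median`, which gives the same results for these data.

```python
def median_of(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                        # odd n: the single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average of the two


ages = [21, 23, 32, 28, 25, 50, 48]
siblings = [3, 2, 2, 1, 1, 6, 3, 3, 4, 2]
print(median_of(ages))      # 28
print(median_of(siblings))  # 2.5
```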
2 2

For grouped data, the following formula is used:

$\tilde{x} = LB + \left(\dfrac{\frac{n}{2} - F}{f}\right) i$

Where:
LB = the lower boundary of the median class
F = less-than cumulative frequency of the class just before the median class
f = frequency of the median class
i = interval (class width)
n = number of cases in the sample

Example:
Here are the test scores of 40 students in Math:

CI       f     <cf
16-23    1     1
24-31    3     4
32-39    6     10
40-47    12    22
48-55    10    32
56-63    8     40
n = 40

Solution:
$\dfrac{n}{2} = \dfrac{40}{2} = 20$

Locate this value in the <cf column of the table. Since the value is 20, it falls in the
class whose <cf is 22 (which covers the 11th to the 22nd scores), that is, the class 40-47.
Hence LB = 39.5, F = 10, f = 12, and i = 8.

$\tilde{x} = LB + \left(\dfrac{\frac{n}{2} - F}{f}\right) i = 39.5 + \left(\dfrac{20 - 10}{12}\right) 8 = 46.17$

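
The search through the <cf column and the substitution can be sketched as follows. The sketch assumes whole-number class limits (so the lower boundary is the lower limit minus 0.5) and equal class widths; `grouped_median` is an illustrative name.

```python
def grouped_median(classes):
    """Median of grouped data given (lower limit, upper limit, frequency) rows."""
    n = sum(f for _, _, f in classes)
    width = classes[0][1] - classes[0][0] + 1     # class interval i
    target = n / 2
    cumulative = 0                                # F: <cf before the current class
    for lower, _, f in classes:
        if cumulative + f >= target:              # median class found
            lower_boundary = lower - 0.5
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f


test_scores = [(16, 23, 1), (24, 31, 3), (32, 39, 6),
               (40, 47, 12), (48, 55, 10), (56, 63, 8)]
print(round(grouped_median(test_scores), 2))  # 46.17
```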

Characteristics of the Median:
• Appropriate measure for interval data
• Lies between the highest and lowest measurement
• There is only one value for the median in a given set of measurements
• It is not influenced by extreme values
• The value where half of the distribution lies above it and the other half lies below it

Let's try this!


The table below shows the age distribution of the contestants in a raffle draw
sponsored by a popular noon time game show.

Age f
25 – 29 12
30 – 34 7
35 – 39 3
40 – 44 6
45 – 49 10
50 – 54 8
55 – 59 4

Compute the median and interpret the result.

Mode is the value which occurs most frequently in a set of measurements or values
and is sometimes referred to as an inspection average. It is usually represented by
the symbol “Mo”. Unlike the mean and median, a data set can have more than one
mode (bimodal, multimodal).

Example:
Ten students in a math class were polled as to the number of siblings in their individual
families and the results were: 3, 2, 2, 1, 3, 6, 3, 3, 4, 2. Find the mode for the number of
siblings.
Solution:
3, 2, 2, 1, 3, 6, 3, 3, 4, 2

Which score/s occur most frequently?

From the set of scores, there are only three (3) 2's and four (4) 3's. Therefore, the score
that occurs most frequently is 3.

Mo = 3

For grouped data, the following formula is used:

$Mo = LB + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$

Where:
LB = the lower boundary of the modal class (the class with the highest frequency)
$\Delta_1$ = difference between the highest frequency and the frequency of the class just before it
$\Delta_2$ = difference between the highest frequency and the frequency of the class just after it
i = interval (class width)

Example:

Let's use the test scores of 40 students presented in the example for the median and
compute the mode.

Solution:
Choose the class with the highest frequency.

CI       f
16-23    1
24-31    3
32-39    6
40-47    12
48-55    10
56-63    8
n = 40

$Mo = LB + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) i = 39.5 + \left(\dfrac{12 - 6}{(12 - 6) + (12 - 10)}\right) 8 = 45.5$
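
The grouped-data mode can be sketched the same way. The assumptions here are whole-number class limits, equal class widths, a single modal class (the first one is used if several tie), and a frequency of 0 standing in for a missing neighbor when the modal class is the first or last class. The name `grouped_mode` is illustrative.

```python
def grouped_mode(classes):
    """Mode of grouped data given (lower limit, upper limit, frequency) rows."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                        # position of the modal class
    width = classes[0][1] - classes[0][0] + 1          # class interval i
    before = freqs[m - 1] if m > 0 else 0              # frequency of the class just before
    after = freqs[m + 1] if m < len(freqs) - 1 else 0  # frequency of the class just after
    delta1 = freqs[m] - before
    delta2 = freqs[m] - after
    lower_boundary = classes[m][0] - 0.5
    return lower_boundary + delta1 / (delta1 + delta2) * width


test_scores = [(16, 23, 1), (24, 31, 3), (32, 39, 6),
               (40, 47, 12), (48, 55, 10), (56, 63, 8)]
print(grouped_mode(test_scores))  # 45.5
```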

Under mode, there is what we call the Pearsonian Mode. This is used for multimodal
distributions. Its formula is:

Pearsonian Mode = $3\tilde{x} - 2\bar{x}$

Characteristics of the Mode:
• Appropriate measure for nominal-scale data
• The least reliable measure because its value is undefined in some distributions
• The value which occurs most often
• A quick approximation of the average

Let's try this!

The following are test scores obtained by IV-Masikap students in a Math exam. Compute
the mode and interpret the result.

Class Interval   f
20-24            4
25-29            6
30-34            7
35-39            10
40-44            5
45-49            8

Other descriptive measures which are used to locate the position of values or scores in
the distribution are the quantiles or fractiles. There are three types: quartiles, deciles,
and percentiles.

Quartiles are score points which divide a distribution into “four” equal parts, so
that each part represents 1⁄4 or 25% of the data set.

Deciles are values that divide a distribution into “ten” equal parts, so that
each part represents 1⁄10 or 10% of the data set.

Percentiles are values that divide a distribution into “one hundred” equal parts, so
that each part represents 1⁄100 or 1% of the data set.

To better picture the concept, consider the illustration posted on the side.

For grouped data, the following formulas are used, where k is the index of the
desired quartile, decile, or percentile and N is the total number of cases:

$Q_k = LB + \left[\dfrac{\frac{kN}{4} - F}{f}\right] i$     (Quartile)

$D_k = LB + \left[\dfrac{\frac{kN}{10} - F}{f}\right] i$     (Decile)

$P_k = LB + \left[\dfrac{\frac{kN}{100} - F}{f}\right] i$     (Percentile)

It can be observed that these have the same form as the formula for the median.

Example:
The following are Math test scores of 50 first year students of Silago College Inc.
Compute Q3, D3, and P90.

CI         f     <cf
20 – 24    4     4
25 – 29    7     11
30 – 34    12    23
35 – 39    10    33
40 – 44    9     42
45 – 49    6     48
50 – 54    2     50
N = 50

Solution:

$\dfrac{3N}{4} = \dfrac{3(50)}{4} = 37.5$

Locating the value 37.5 in the <cf column of the table, it falls in the class whose <cf
is 42, since that class covers the 34th to the 42nd scores. Hence LB = 39.5, F = 33,
f = 9, and i = 5.

$Q_3 = LB + \left[\dfrac{\frac{3N}{4} - F}{f}\right] i = 39.5 + \left[\dfrac{37.5 - 33}{9}\right] 5 = 42$

$Q_3$ represents 75% of the distribution.

Interpretation:
Out of 50 students who took the Math test, approximately 75% or 38 students got
a score lower than 42 while the remaining 25% scored higher than 42.

$\dfrac{3N}{10} = \dfrac{3(50)}{10} = 15$

Locating the value 15 in the <cf column of the table, it falls in the class whose <cf
is 23, since that class covers the 12th to the 23rd scores. Hence LB = 29.5, F = 11,
f = 12, and i = 5.

$D_3 = LB + \left[\dfrac{\frac{3N}{10} - F}{f}\right] i = 29.5 + \left[\dfrac{15 - 11}{12}\right] 5 = 31.17$

$D_3$ represents 30% of the distribution.

Interpretation:
Thirty percent of the students, or 15 of those who took the Math test, obtained a score
lower than 31.17 whereas the other 35 obtained a score higher than 31.17.

$\dfrac{90N}{100} = \dfrac{90(50)}{100} = 45$

The value 45 falls in the class whose <cf is 48 (the class 45 – 49). Hence LB = 44.5,
F = 42, f = 6, and i = 5.

$P_{90} = LB + \left[\dfrac{\frac{90N}{100} - F}{f}\right] i = 44.5 + \left[\dfrac{45 - 42}{6}\right] 5 = 47$

Interpretation:
Forty-five out of 50 or 90% scored lower than 47 in the administered Math test,
while the remaining students scored higher than 47.
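
Since the three formulas differ only in the divisor (4, 10, or 100), one function can cover all of them. The sketch below assumes whole-number class limits and equal class widths; the name `grouped_quantile` and its `parts` parameter are illustrative.

```python
def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data; parts is 4, 10, or 100."""
    total = sum(f for _, _, f in classes)        # N
    width = classes[0][1] - classes[0][0] + 1    # class interval i
    target = k * total / parts                   # kN/4, kN/10, or kN/100
    cumulative = 0                               # F: <cf before the current class
    for lower, _, f in classes:
        if cumulative + f >= target:
            return (lower - 0.5) + (target - cumulative) / f * width
        cumulative += f


math_scores = [(20, 24, 4), (25, 29, 7), (30, 34, 12), (35, 39, 10),
               (40, 44, 9), (45, 49, 6), (50, 54, 2)]
q3 = grouped_quantile(math_scores, 3, 4)
d3 = grouped_quantile(math_scores, 3, 10)
p90 = grouped_quantile(math_scores, 90, 100)
print(q3, round(d3, 2), p90)
```

Calling the function with `k = 2` and `parts = 4` gives the median, which is why the formulas look alike.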

Let's try this!

For 50 days, Vino Gah recorded the number of cars passing by their street from 10:00 AM
to 12:00 noon. The following table shows the distribution. Compute D3 and P95 and
interpret the results.

Nr of Cars   f
40 – 44      3
45 – 49      10
50 – 54      13
55 – 59      9
60 – 64      8
65 – 69      7

MEASURES OF VARIABILITY OR DISPERSION

In the description of data, measures of central tendency are not fully sufficient. Another
kind of measure is needed, one that describes the variability or dispersion of the data.

Measures of variability or dispersion are measures of the average distance of
each observation from the center of the distribution. They measure the
homogeneity or heterogeneity of a particular group.

A small measure of variability would mean that the data are:

• clustered closely around the mean
• more homogeneous
• less variable
• more consistent and
• more uniformly distributed

To better understand the concept, observe the following example:

Grades of five (5) Boys and five (5) girls in their Math subject

Boys Girls
Kaka 70 Bunang 82
Bangkay 95 Bakikang 80
Budoy 60 Petra 83
Dodong 80 Bering 81
Tuko 100 Lodi 79
Mean: 81 Mean: 81

[Figure: the grades shown in A. Graphical Form and B. Tabular Form]

The mean grade of both groups is 81. Considering only this result, we might conclude that
the two groups performed equally well and that all of their members obtained a good grade
and passed the subject. However, looking at the individual grades of the boys and the
girls, we can see that some of the boys got a failing grade while all of the girls
passed the subject.

Moreover, it can be noticed that the grades of the boys are far apart from each other while
the grades of the girls are more compressed or clustered together. This is where
measures of dispersion come in. By getting the distance of each item from the center of the
distribution, the group can be described more effectively. The most common
statistical tools under this measure are the range, mean absolute deviation, quartile
deviation, variance, and standard deviation.

Range is the difference between the highest and the lowest values. This is the
simplest but the most unreliable measure of variability since it only uses two values
in the distribution.

There are four types of range: Absolute Range (AR), Total Range (TR),
Interquartile Range (IQR), and Kelly Range (KR).

AR = HS – LS
TR = (HS – LS) + 1
IQR = Q3 – Q1
KR = P90 – P10

Where:
HS = Highest Score
LS = Lowest Score
Q3 = Third Quartile
Q1 = First Quartile
P90 = 90th Percentile
P10 = 10th Percentile

Mean Absolute Deviation (MAD) is the average of the absolute deviations of the
observations from the mean.

Ungrouped Data:

$MAD = \dfrac{\Sigma |X - \bar{X}|}{n}$

Where:
X is the individual value
$\bar{X}$ is the arithmetic mean
n is the number of cases

Grouped Data:

$MAD = \dfrac{\Sigma f |X_m - \bar{X}|}{n}$

Where:
$X_m$ is the midpoint of the class
$\bar{X}$ is the arithmetic mean
n is the total number of cases
f is the class frequency

Example:
Let's use the grades of the five (5) boys and five (5) girls presented in the introductory
example. Compute the variability using MAD and interpret the result.

Boys:

Score (X)   Mean ($\bar{X}$)   $|X - \bar{X}|$
70          81                 11
95          81                 14
60          81                 21
80          81                 1
100         81                 19
Total                          66

$MAD = \dfrac{\Sigma |X - \bar{X}|}{n} = \dfrac{66}{5} = 13.2$

Girls:

Score (X)   Mean ($\bar{X}$)   $|X - \bar{X}|$
82          81                 1
80          81                 1
83          81                 2
81          81                 0
79          81                 2
Total                          6

$MAD = \dfrac{\Sigma |X - \bar{X}|}{n} = \dfrac{6}{5} = 1.2$

Interpretation:
The male group has a MAD of 13.2 while the female group has 1.2. This shows that the
female group is more homogeneous than the male group.
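
The two MAD values can be verified with a short sketch of the ungrouped formula. The function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


boys = [70, 95, 60, 80, 100]
girls = [82, 80, 83, 81, 79]
print(round(mean_absolute_deviation(boys), 2))   # 13.2
print(round(mean_absolute_deviation(girls), 2))  # 1.2
```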

Quartile Deviation is half of the difference between the 3rd and 1st quartiles.
It is sometimes called the “semi-interquartile range”. The formula is shown below:

$QD = \dfrac{Q_3 - Q_1}{2}$

Where:
Q3 = 3rd Quartile
Q1 = 1st Quartile

Variance is the average of the squared deviations from the mean.

Ungrouped Data:

$\sigma^{2} = \dfrac{\Sigma (X - \mu)^{2}}{N}$     (Population)

$s^{2} = \dfrac{\Sigma (X - \bar{X})^{2}}{n - 1}$     (Sample)

Where:
X = Individual Values
μ = Population Mean
$\bar{X}$ = Sample Mean
N = Population Size
n = Sample Size

Grouped Data:

$\sigma^{2} = \dfrac{\Sigma f (X_m - \mu)^{2}}{N}$     (Population)

$s^{2} = \dfrac{\Sigma f (X_m - \bar{X})^{2}}{n - 1}$     (Sample)

Where:
f = frequency
$X_m$ = Class mark/ Midpoint
μ = Population Mean
$\bar{X}$ = Sample Mean
N = Population Size
n = Sample Size

Standard Deviation is the square root of the average squared deviation from the mean, or
simply the square root of the variance.

Example:

1. Using the same example of five (5) boys and five (5) girls, compute the variability
of the distribution using standard deviation.

Boys:

Score (X)   Mean ($\bar{X}$)   $X - \bar{X}$   $(X - \bar{X})^{2}$
70          81                 -11             121
95          81                 14              196
60          81                 -21             441
80          81                 -1              1
100         81                 19              361
Total                                          1,120

$s = \sqrt{\dfrac{\Sigma (X - \bar{X})^{2}}{n - 1}} = \sqrt{\dfrac{1{,}120}{5 - 1}} = 16.73$

Girls:

Score (X)   Mean ($\bar{X}$)   $X - \bar{X}$   $(X - \bar{X})^{2}$
82          81                 1               1
80          81                 -1              1
83          81                 2               4
81          81                 0               0
79          81                 -2              4
Total                                          10

$s = \sqrt{\dfrac{\Sigma (X - \bar{X})^{2}}{n - 1}} = \sqrt{\dfrac{10}{5 - 1}} = 1.58$

Interpretation:
This confirms the result that the scores of the girls are less variable compared to the
scores of the boys.
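
The sample standard deviation used above can be sketched as follows. The function name `sample_std` is illustrative; the standard library's `statistics.stdev` computes the same quantity.

```python
import math


def sample_std(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


boys = [70, 95, 60, 80, 100]
girls = [82, 80, 83, 81, 79]
print(round(sample_std(boys), 2))   # 16.73
print(round(sample_std(girls), 2))  # 1.58
```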

2. The table below shows the Math test scores of 50 first year students of Silago
College Inc. Compute the variability of the distribution using standard deviation and
interpret the result.

CI         f
20 – 24    4
25 – 29    7
30 – 34    12
35 – 39    10
40 – 44    9
45 – 49    6
50 – 54    2
N = 50
Solution:

The mean is computed first from the class marks: μ = Σf·Xm / N = 1,795 / 50 = 35.90.

CI        f    Xm   μ       Xm − μ   (Xm − μ)²   f(Xm − μ)²
20 – 24   4    22   35.90   -13.90   193.21      772.84
25 – 29   7    27   35.90    -8.90    79.21      554.47
30 – 34   12   32   35.90    -3.90    15.21      182.52
35 – 39   10   37   35.90     1.10     1.21       12.10
40 – 44   9    42   35.90     6.10    37.21      334.89
45 – 49   6    47   35.90    11.10   123.21      739.26
50 – 54   2    52   35.90    16.10   259.21      518.42
N = 50                               Σf(Xm − μ)² = 3,114.50

$$\sigma = \sqrt{\frac{\Sigma f(X_m - \mu)^2}{N}} = \sqrt{\frac{3{,}114.50}{50}} = 7.89$$

Thus, the standard deviation of the distribution is 7.89. The scores typically differ from the
mean of 35.90 by about 7.89 points.
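
The grouped-data computation above can be sketched in Python as follows. The variable names (`classes`, `midpoints`, `sum_sq`) are illustrative, and the population formula (division by N) is used, as in the solution.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class interval
classes = [(20, 24, 4), (25, 29, 7), (30, 34, 12), (35, 39, 10),
           (40, 44, 9), (45, 49, 6), (50, 54, 2)]

freqs = [f for _, _, f in classes]
midpoints = [(low + high) / 2 for low, high, _ in classes]  # class marks Xm
n = sum(freqs)

# Grouped mean: sum of f * Xm divided by N
mean = sum(f * xm for f, xm in zip(freqs, midpoints)) / n

# Sum of f * (Xm - mean)^2, then the population standard deviation
sum_sq = sum(f * (xm - mean) ** 2 for f, xm in zip(freqs, midpoints))
sigma = sqrt(sum_sq / n)

print(n, round(mean, 2), round(sum_sq, 2), round(sigma, 2))
```
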

Let's try this!

For 50 days, Vino Gah recorded the number of cars passing by their street from 10:00 AM to
12:00 noon. The following table shows the distribution.

Nr of Cars   f
40 – 44      3
45 – 49      10
50 – 54      13
55 – 59      9
60 – 64      8
65 – 69      7

Compute the variability using standard deviation.

Standard deviation and variance are both reliable measures of the variability or dispersion of
a distribution. However, they cannot be used to compare two sets of data expressed in different
units. This can only be done through the coefficient of variation.

Coefficient of variation is the ratio of the standard deviation to the mean. The
formula is shown below:

$$cv = \frac{s}{\bar{X}} \times 100$$

Where:
s = standard deviation
X̄ = mean

NOTE: The bigger the value, the more dispersed is the distribution and the smaller the
value, the less dispersed is the distribution.
Example:

The following table illustrates the number of assists and the number of points made by Pepot
in his ten (10) basketball games played at the regional level:

Game   Nr of Assists   Nr of Points
1      8               18
2      10              20
3      9               22
4      12              16
5      5               35
6      1               12
7      4               23
8      7               25
9      9               30
10     3               15

Determine in which area he performed more consistently.
Solution:

              Nr of Assists   Nr of Points
Mean          6.80            21.60
SD (Sample)   3.46            7.04

Nr of Assists:

$$cv = \frac{s}{\bar{X}} \times 100 = \frac{3.46}{6.80} \times 100 = 50.88\%$$

Nr of Points:

$$cv = \frac{s}{\bar{X}} \times 100 = \frac{7.04}{21.60} \times 100 = 32.59\%$$

Interpretation:
The coefficients of variation for the number of assists and the number of points are 50.88% and
32.59%, respectively. Since a smaller value means a less dispersed distribution, the number of
points is the less dispersed of the two. It can then be concluded that Pepot is more consistent
in making points than in giving assists.
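
The comparison above can be reproduced with a short Python sketch. The helper `coefficient_of_variation` is an illustrative name, and the sample standard deviation is assumed, as in the solution. Because the code does not round the standard deviation before dividing, its results (about 50.85% and 32.61%) differ slightly from the hand-computed 50.88% and 32.59%; the conclusion is the same.

```python
from statistics import mean, stdev

assists = [8, 10, 9, 12, 5, 1, 4, 7, 9, 3]
points = [18, 20, 22, 16, 35, 12, 23, 25, 30, 15]

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

cv_assists = coefficient_of_variation(assists)
cv_points = coefficient_of_variation(points)

# The smaller coefficient marks the more consistent area.
more_consistent = "points" if cv_points < cv_assists else "assists"
print(round(cv_assists, 2), round(cv_points, 2), more_consistent)
```
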

Let's try this!

Hamabaw National College conducted a comprehensive applied Math examination which
consists of two parts: an oral exam and a written exam. On the first, 20 students took the said
exam, with the results shown below:

Oral Exam 4 1 4 5 3 2 3 4 3 5 2 2 4 3 5 5 1 1 1 2
Written Exam 2 3 1 4 2 5 3 1 2 1 2 2 1 1 2 3 1 2 3 4

1. Compute the mean and standard deviation of each exam.

2. Using the coefficient of variation, determine in which area the 20 students performed
more consistently.

MEASURES OF RELATIVE POSITION

These are conversions of values, usually standardized test scores, that show where a given
value stands in relation to other values in the same group. The measures covered here are the
z-score, percentile rank, and stanine.

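Z-score is a statistical tool that measures how many standard deviations a
particular value is above or below the mean.
It uses either of the two formulas, depending on the given values. The first applies to a
single raw score and the second to the mean of a sample of size n:

$$Z = \frac{X - \mu}{\sigma}$$

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Where:
X = raw score
x̄ = sample mean
μ = mean
σ = standard deviation
n = sample size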

Z-scores are plotted on a normal curve.

Normal curve is a bell-shaped curve that is symmetric about a vertical line through the
mean of the data.

It has the following properties:

• The graph is symmetric about a vertical line through the mean of the distribution
• The mean, median, and mode are equal
• The total area under the normal curve is equal to 1 or 100%
• The normal curve area may be subdivided into standard deviations, at least 3 units
to the left and 3 units to the right of the vertical line

Going back to the z-score, it standardizes a given value of "X" using the mean and
standard deviation to determine its distance above or below the mean.

Example:
Compute the z-score for X of 43 with mean of 37 and standard deviation of 5. Interpret
the result.

Solution:

$$Z = \frac{X - \mu}{\sigma} = \frac{43 - 37}{5} = 1.2$$

Interpretation:
The raw score of 43 corresponds to a z-score of 1.2. This means that it is 1.2 standard
deviations above the mean.
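
The same computation as a minimal Python sketch (the function name `z_score` is illustrative):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(43, 37, 5)
print(z)
```

A positive result places the raw score above the mean and a negative result places it below.
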

Moreover, the z-score can be used to find the percentage of the data that falls within a
certain range.

Example:

1. What percentage of the overall distribution falls between the mean of 37 and the raw
score of 43, given a standard deviation of 5?

[Figure: normal curve with a shaded portion. What percentage falls under the shaded portion
of the normal curve?]

$$Z = \frac{X - \mu}{\sigma} = \frac{43 - 37}{5} = 1.2$$

Answer:

In the earlier example, it was found that X = 43 is 1.2 standard deviations above the
mean. To find the percentage of the data between the mean and this score, refer to the Z table.

From the table, z = 1.20 corresponds to a value of 0.8849 or 88.49%. However, this is the area
for z-scores of 1.2 and below. The problem asks only for the area from the mean to z = 1.2,
and the area below the mean is 0.50, so:
0.8849 – 0.50 = 0.3849 or 38.49%

It can then be concluded that 38.49% of the data falls between the mean and 1.2
standard deviations above the mean.
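
Instead of reading the Z table, the cumulative area can be obtained from `statistics.NormalDist` in the Python standard library (Python 3.8 or later). This is only a sketch of the table lookup described above.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Cumulative area to the left of z = 1.2 (the value read from the Z table)
area_below = standard_normal.cdf(1.2)

# Area from the mean (z = 0) to z = 1.2: remove the half that lies below the mean
area_from_mean = area_below - 0.5

print(round(area_below, 4), round(area_from_mean, 4))
```
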

2. The IQ of 300 students in a certain high school is approximately normally
distributed with μ = 100 and σ = 15. How many students have an IQ between 85
and 120?

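Solution:

[Figure: normal curve with a shaded portion. What percentage falls under the shaded portion
of the normal curve?]

For X = 85:

$$z = \frac{X - \mu}{\sigma} = \frac{85 - 100}{15} = -1.0$$

For X = 120:

$$z = \frac{X - \mu}{\sigma} = \frac{120 - 100}{15} = 1.33$$

From the Z table, a z-score of 1.33 corresponds to a cumulative area of 0.9082 or 90.82%.

A z-score of -1.00 corresponds to a cumulative area of 0.1587 or 15.87%.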

Answer and Interpretation:

Scores of 85 and 120 were found to be equal to z-scores of -1.0 and 1.33, respectively.
This indicates that an IQ score of 85 is one (1) standard deviation below the mean and an IQ
score of 120 is 1.33 standard deviations above the mean.

Area from Z = 0 to Z = 1.33 is 0.9082 – 0.50 = 0.4082
Area from Z = -1.0 to Z = 0 is 0.50 – 0.1587 = 0.3413
(The cumulative value 0.1587 is subtracted from 0.50 since we're only finding the
percentage from -1 to 0.)

Adding the areas gives 0.4082 + 0.3413 = 0.7495 or 74.95%.

300 x 74.95% = 224.85, or about 225

Thus, of the 300 students who were involved in the test, about 225 students have an IQ
between 85 and 120.
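
The steps of this solution can be sketched in Python with `statistics.NormalDist`. Rounding the z-scores to two decimal places is done here only to mirror the Z table lookup; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n_students = 100, 15, 300
standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-scores rounded to two decimals, as when reading the Z table
z_low = round((85 - mu) / sigma, 2)
z_high = round((120 - mu) / sigma, 2)

# Area between the two z-scores = cumulative area at z_high minus cumulative area at z_low
proportion = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)
students = round(n_students * proportion)

print(z_low, z_high, round(proportion, 4), students)
```

Subtracting the two cumulative areas directly gives the same result as adding the two areas measured from the mean.
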

Let's try this!


In an entrance exam given to 700 students, the mean and standard deviation
are 80 and 10 respectively. How many of them got a score higher than or
equal to 85?
Standard Normal Cumulative Probability Table

Cumulative probabilities for NEGATIVE z-values are shown in the following table:

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
Standard Normal Cumulative Probability Table

Cumulative probabilities for POSITIVE z-values are shown in the following table:

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
Another thing to be familiar with when dealing with the normal curve is the empirical rule.

According to the empirical rule, in a normal distribution, approximately

• 68% of the data lie within 1 standard deviation above and below the mean
• 95% of the data lie within 2 standard deviations above and below the mean
• 99.7% of the data lie within 3 standard deviations above and below the mean

Example:
A survey of 1,000 Philippine gas stations found that the price charged for a gallon of
regular gas could be closely approximated by a normal distribution with a mean of ₱3.10
and a standard deviation of ₱0.18. What proportion of the surveyed gas stations charge
between ₱2.74 and ₱3.46 for a gallon of regular gas?

Solution:
First, compute the z-scores for X = 2.74 and X = 3.46 with a mean of 3.10 and a standard
deviation of 0.18:

$$z = \frac{x - \mu}{\sigma} = \frac{2.74 - 3.10}{0.18} = -2$$

$$z = \frac{x - \mu}{\sigma} = \frac{3.46 - 3.10}{0.18} = 2$$

So, the z-scores of the two prices are -2 and 2, which means that they are two (2)
standard deviations below and two (2) standard deviations above the mean, respectively.
By the empirical rule, the proportion of the surveyed gas stations that charge
between ₱2.74 and ₱3.46 for a gallon of regular gas is approximately 95%.
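
The percentages in the empirical rule are rounded values of areas under the normal curve. The sketch below, which assumes Python's `statistics.NormalDist`, computes the area within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.4f}")
```

The area within 2 standard deviations is about 0.9545, which the empirical rule rounds to 95%.
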

Percentile rank is the percentage of scores in a distribution that fall below a
specific score. The formula is shown below:

$$PR = \frac{b}{n} \cdot 100$$

Where:
PR = Percentile ranking
b = Nr of pieces of data below the one being tested
n = Total data being tested
Example:
1. Below are the scores of ten (10) students in a 50-item test in Statistics. Compute
the percentile ranking of Tekla, who scored 35 in the test.

33 37 41 21 20 31 35 46 10 36

Solution:

Arrange the scores in ascending order first:

10 20 21 31 33 35 36 37 41 46

Tekla's score of 35 has five (5) scores below it, so b = 5 and n = 10:

$$PR = \frac{b}{n} \cdot 100 = \frac{5}{10} \cdot 100 = 50\%$$

It can then be interpreted that 50% of the students scored below Tekla's score of 35. She
performed better than half of the students who took the test.

2. On a reading examination given to 900 students, Elaine's score of 602 was higher
than the scores of 576 of the students who took the examination. What is the
percentile ranking for Elaine's score?

Solution:

$$PR = \frac{b}{n} \cdot 100 = \frac{576}{900} \cdot 100 = 64$$

Thus, Elaine's score of 602 places her at the 64th percentile. This implies that
she scored higher than 64% of the 900 students who took the reading examination.
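
Both percentile rank examples can be sketched in Python. The function name `percentile_rank` is illustrative; it counts only the scores strictly below the tested score, following the definition of b in the formula.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the list that are strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return below / len(scores) * 100

# Example 1: Tekla's score of 35 among ten students (sorting is not required for counting)
statistics_scores = [33, 37, 41, 21, 20, 31, 35, 46, 10, 36]
tekla_pr = percentile_rank(statistics_scores, 35)

# Example 2: only the counts are given, so the formula is applied directly
elaine_pr = 576 / 900 * 100

print(round(tekla_pr), round(elaine_pr))
```
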

Let's try this!

On an examination given to 8,600 students, Jack's score of 405 was higher than the
scores of 3,952 of the students who took the examination. What is the percentile rank for
Jack's score?

Stanine ("Standard Nine") is a way of scaling scores on a nine-point scale, with
the 5th stanine being at the mean, 1st stanine the lowest, and 9th stanine the
highest. Hence, it is particularly useful for grouping students or individuals and in
converting any test score to a single-digit score.
[Figure: stanine scale with regions labeled Below average, Average, and Above average]

NOTE: Unlike z-scores, which are expressed as decimal numbers (e.g. 1.2, 2.3), stanines
are always positive whole numbers from 1 to 9. Each stanine from 2 to 8 is 0.5 standard
deviation wide, with the 5th stanine centered on the mean.

Example:
The following are the scores of five (5) students, from a class of 35 students, in a 50-item
test. The class mean for the test is 19, with a standard deviation of 5.61.
Determine the stanine of each observed score.

Inday   Kiko   Bebang   Berting   Doming
 22      17      27       24        20
Solution:
To determine the stanine, first compute the z-score of each raw score.

Given: μ = 19, σ = 5.61

For Inday:

$$z = \frac{X - \mu}{\sigma} = \frac{22 - 19}{5.61} = 0.5348$$

For Kiko:

$$z = \frac{X - \mu}{\sigma} = \frac{17 - 19}{5.61} = -0.3565$$

For Bebang:

$$z = \frac{X - \mu}{\sigma} = \frac{27 - 19}{5.61} = 1.4260$$

For Berting:

$$z = \frac{X - \mu}{\sigma} = \frac{24 - 19}{5.61} = 0.8913$$

For Doming:

$$z = \frac{X - \mu}{\sigma} = \frac{20 - 19}{5.61} = 0.1783$$

Then, refer to the table below for the scaling of scores:

Stanine   Min z-score   Max z-score   % of population   Cumulative % of population
                                      in each stanine   (this stanine and above)
1         −∞            -1.751        4                 100
2         -1.750        -1.251        7                 96
3         -1.250        -0.751        12                89
4         -0.750        -0.251        17                77
5         -0.250        0.249         20                60
6         0.250         0.749         17                40
7         0.750         1.249         12                23
8         1.250         1.749         7                 11
9         1.750         ∞             4                 4

Hence, the following stanines were found for each score:

Student   Score   z-score   Stanine
Inday     22      0.5348    6
Kiko      17      -0.3565   4
Bebang    27      1.4260    8
Berting   24      0.8913    7
Doming    20      0.1783    5

Interpretation:
Among the five (5) sampled students, two stand out: Bebang and Berting, who both
scored above average with stanine scores of 8 and 7, respectively. The other three (3)
students fall in the average range (stanines 4 to 6).
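
The lookup in the stanine table can be sketched in Python as below. The cut points are the minimum z-scores of stanines 2 to 9 from the table; assigning a z-score that falls exactly on a cut point to the higher stanine is an assumption of this sketch, and the names `LOWER_BOUNDS` and `stanine` are illustrative.

```python
from bisect import bisect_right

# Minimum z-scores of stanines 2 through 9, taken from the table
LOWER_BOUNDS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

def stanine(z):
    """Map a z-score to a stanine from 1 to 9 by counting the cut points at or below it."""
    return bisect_right(LOWER_BOUNDS, z) + 1

mean, sd = 19, 5.61
scores = {"Inday": 22, "Kiko": 17, "Bebang": 27, "Berting": 24, "Doming": 20}

# Convert each raw score to a z-score, then to a stanine
stanines = {name: stanine((x - mean) / sd) for name, x in scores.items()}
print(stanines)
```
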
