
MODULE 4: ​Data Management

MODULE 4
Data Management
Catalina B. Gayas & Emmeline R. Garcia

Table of Contents
Lesson 1 Data Collection, Organization, and Interpretation
    Basic Terminology in Statistics
    Data Collection and Sampling Techniques
    Frequency Distribution and Graphs for Numerical Data
Lesson 2 Measures of Central Tendency
    Mean (Raw and Grouped Data)
    The Weighted Mean
    Median (Raw and Grouped Data)
    Mode (Raw and Grouped Data)
    Types of Distribution
Lesson 3 Measures of Variation
    Range (Raw and Grouped Data)
    Mean Absolute Deviation (Raw and Grouped Data)
    Variance and Standard Deviation (Raw and Grouped Data)
Lesson 4 Measures of Relative Position
    Standard Score
    Percentiles, Deciles, and Quartiles
Lesson 5 Normal Distribution
    The Standard Normal Distribution
    Applications of Normal Distribution
Lesson 6 Correlation Coefficients and Linear Regression
    Correlation Analysis
    Linear Regression
Leyte Normal University | Mathematics Unit 1

Overview
Statistics is used in all aspects of human endeavor. It is used to describe data; to determine
significant relationships between and among variables; to determine significant differences in a
variable of interest between or among groups; and to make forecasts and predictions. The concepts in
Statistics were already discussed in your K to 12 Curriculum. Hence, this module focuses on
applying these concepts in real settings to which you can relate. The aim of this module is to
help you appreciate the importance of Statistics while having fun doing the exercises and
activities.
This module includes the following topics: Data Collection, Organization, and Interpretation;
Measures of Central Tendency; Measures of Dispersion; Measures of Relative Position; Normal
Distribution; and Correlation Coefficients and Linear Regression. Computer applications will be
used in this module, especially Microsoft Excel and statistical analysis software such as SPSS,
for data analysis.

Objectives
At the end of this module, you should be able to:
1. demonstrate knowledge of basic statistical terms;
2. use statistical methods to summarize and organize data;
3. solve problems applying normal distribution;
4. apply linear regression and correlation in analyzing data; and
5. interpret computer outputs in data analysis.

LESSON 1: Data Collection, Organization, and Interpretation

Introduction

Statistics is defined as the science of collecting, organizing, summarizing, presenting, and
interpreting data. There are three main reasons why students study statistics: (1) to read and
understand the various statistical studies published in print or broadcast media; (2) to conduct
research in their own field, since statistical procedures are basic to research; and (3) to
become better consumers and citizens by using the knowledge gained from studying statistics.

Basic Terminology in Statistics

In studying statistics, it is important to understand the basic terms used in the subject. The
following terms are defined for this purpose.
A variable is a characteristic or attribute that can assume different values. Examples of
variables are sex, nationality, score, and height. Data are the measurements or observations
(values) that the variables can assume. A data set is a collection of data values, and each value
in the set is called a datum.

There are two branches of statistics. The branch that involves the collection, organization,
summarization, and presentation of data is called descriptive statistics. The branch that
generalizes from a sample (a representative part of a population) to a population (the totality of
all observations or entities of any sort), performs estimation and hypothesis testing, determines
relationships among variables, and makes predictions is called inferential statistics.

Variables can be classified as quantitative or qualitative. A qualitative variable takes
categories or labels as values, such as sex or nationality. A quantitative variable takes
numerical values that can be ordered or ranked. IQ, scores, weight, and temperature are examples
of quantitative variables. Quantitative variables are further classified as discrete or
continuous. A discrete variable assumes values that can be counted. A continuous variable, on the
other hand, can assume an unlimited number of values between any two specific values; continuous
variables are measured. The number of deaths in a certain locality due to the CoViD-19 pandemic is
an example of a discrete variable, while the height of a person is an example of a continuous
variable. Why is height considered a continuous variable? What are other examples of continuous
variables? How about discrete variables?

Variables are also classified according to four levels of measurement: nominal, ordinal, interval,
and ratio. The nominal scale is the simplest scale of measurement; it classifies data into
mutually exclusive categories and uses numbers as labels only. Sex, occupation, religious
affiliation, and marital status are examples of nominal data. The ordinal scale uses numbers as
labels, and the numbers can be ranked. However, the differences between ranks are not equal.
Socio-economic status, Latin honors, and academic rank are examples of ordinal data. The interval
scale possesses the characteristics of the ordinal scale (label and rank), and the differences
between ranks are equal. However, interval data have no true zero value. Examination scores,
temperature, and Intelligence Quotient (IQ) are examples of interval data. The ratio scale is the
highest level of measurement. It possesses all the characteristics of the interval scale (label,
rank, equal differences between ranks) and also has a true zero value. Distance travelled, height,
weight, and age are examples of ratio data.

Variables are also classified according to their function, especially in experimental studies:
the independent or explanatory variable, the dependent or outcome variable, and the confounding
variable. The independent variable is the variable manipulated by the researcher, while the
dependent variable is the variable affected or influenced by the manipulated variable. The
confounding variable, on the other hand, is a variable other than the independent variable that
also influences the dependent variable. For example, a researcher is interested in finding out the
effect of learning delivery modes (pure online, pure printed module, or a mixture of online and
printed module) on the performance (test score) of the students in GE104. The delivery mode is the
independent variable; the performance is the dependent variable. Performance can also be affected
by the learning ability of the students. Thus, learning ability is a confounding variable.

Data Collection and Sampling Techniques

Data can be collected in different ways. The method used depends on the source of the data as well
as the type of data to be collected. Data can be collected through a survey (telephone,
questionnaire, or interview), test, observation, or experimentation. Details on how each method is
carried out and the advantages of one over another are not part of this lesson, as these are
discussed exhaustively in your research course.

Data are collected from a representative part of a population called a sample. The process of
selecting a sample is called sampling. There are two types of sampling: non-probability and
probability sampling. In non-probability sampling, not every member of the population is given an
equal chance to be chosen; hence, the sample is not a true representative of the population. If
the objective of the study is to make a generalization, using non-probability sampling is
discouraged. Convenience or accidental sampling, purposive or judgmental sampling, and quota
sampling are the most common techniques in non-probability sampling.

Probability sampling on the other hand gives equal chance to each member of the population to be
selected as a representative. There are four techniques under this type of sampling. They are as
follows: simple random sampling, systematic random sampling, stratified random sampling
and cluster random sampling.

Simple Random Sampling is a technique used when the population is homogeneous with respect to the
characteristic of interest to the researcher and the population size is known (Petilos, 2012).
The sample can be selected either by the lottery method or by using random numbers.
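
The random-number method can be sketched in Python. This is only an illustration and not part of the original module: the population of 1,000 numbered members, the sample size of 100, and the function name simple_random_sample are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: members numbered 1 to 1,000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 100, seed=42)
print(len(sample), "members selected")
```

Because the draw is without replacement, no member can be selected twice, which mirrors drawing names in the lottery method.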

Systematic Random Sampling is a technique that obtains the desired sample size by selecting every
kth subject. To select the sample, the researcher assigns a number to each member of the
population (by numbering them consecutively) and then determines the value of k by dividing the
total number of cases (population) by the desired number of samples. For example, suppose the
total population (N) is 1,000 and the sample size (n) is 100. Then the value of k is
1,000 / 100 = 10, so the researcher selects every 10th subject in the population. The starting
number, between 1 and 10, is chosen by simple random sampling. Suppose the starting number is 6;
the researcher then takes the subjects numbered 6, 16, 26, and so on until the desired number of
samples is completed.
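
The selection rule in this example can be sketched as follows. The function name systematic_sample and its parameters are hypothetical; the numbers N = 1,000, n = 100, and the starting number 6 come from the example above.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the numbers of the n subjects chosen by taking every k-th one."""
    k = N // n                          # sampling interval
    if start is None:
        # choose the starting number between 1 and k by simple random sampling
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

chosen = systematic_sample(1000, 100, start=6)
print(chosen[:3], "...", chosen[-1])    # [6, 16, 26] ... 996
```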

Stratified Random Sampling is a technique that groups the population into subgroups called strata
according to the common characteristic/s determined by the researcher. Subjects are then selected
from each stratum in proportion to the size of that stratum. For example, suppose the population
consists of all freshman students across the three colleges (A, B, and C) of University X. The
total freshman population of 1,400 is divided as follows: $N_A = 350$, $N_B = 500$, and
$N_C = 550$, and the researcher wishes to take a total of 350 respondents. He then selects the
required number from each stratum, using either simple random sampling or systematic random
sampling, based on the following computation:

College    N        n = (N / 1,400) x 350
A          350      87.5, rounded to 88
B          500      125
C          550      137.5, rounded to 138
Total      1,400    351

[Note: Because of rounding in A and C (87.5 becomes 88 and 137.5 becomes 138), the total comes to
351. The researcher has to decide in which subgroup to reduce the sample by 1.]
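
The proportional allocation can be checked with a short Python sketch. The helper names proportional_allocation and round_half_up are made up for this illustration; the stratum sizes and the target of 350 respondents are taken from the example.

```python
import math

def proportional_allocation(strata, total_sample):
    """Return the exact (unrounded) share of the sample for each stratum."""
    N = sum(strata.values())
    return {name: size * total_sample / N for name, size in strata.items()}

def round_half_up(x):
    """Round .5 upward, as in the note (87.5 becomes 88)."""
    return math.floor(x + 0.5)

strata = {"A": 350, "B": 500, "C": 550}
exact = proportional_allocation(strata, 350)
rounded = {name: round_half_up(value) for name, value in exact.items()}
print(exact)                            # {'A': 87.5, 'B': 125.0, 'C': 137.5}
print(rounded, sum(rounded.values()))   # {'A': 88, 'B': 125, 'C': 138} 351
```

The rounded total exceeds the target by 1, which is why one subgroup has to give up a respondent.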

Cluster Random Sampling is a technique used when the population is very large or the respondents
reside in a large geographic area, so that it is impossible for the researcher to obtain a list of
all members of the population. The members of each cluster are heterogeneous. Unlike stratified
random sampling, where subjects are selected individually, in this technique one or more clusters
are selected randomly and all members of the selected cluster/s represent the population. For
example, a researcher wishes to determine the type of fertilizer (pure synthetic, pure organic, or
a combination of synthetic and organic) used by rice farmers in the municipality of Town Q.
Assuming that there is no available list of rice farmers (categorized as small-scale,
medium-scale, and large-scale rice producers), the researcher can get a copy of the map of Town Q
and determine the number of barangays located outside downtown and along the seashore. Each of
these barangays is considered a cluster. Suppose 43 barangays belong to this group; then there are
43 clusters to choose from. The researcher decides how many of these barangays will be included
and then randomly selects the cluster/s. The rice farmers in the selected cluster/s represent the
group from Town Q.
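
A minimal Python sketch of the cluster selection follows. Everything about the data is assumed for illustration only: the barangay names, the 10 farmers per barangay, and the decision to pick 5 clusters are hypothetical; only the count of 43 clusters comes from the example.

```python
import random

def cluster_sample(clusters, how_many, seed=None):
    """Randomly pick whole clusters and return them with all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), how_many)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical data: 43 barangays, each with 10 numbered rice farmers.
barangays = {
    f"Barangay {b}": [f"farmer {b}-{i}" for i in range(1, 11)]
    for b in range(1, 44)
}
chosen, farmers = cluster_sample(barangays, 5, seed=1)
print(chosen)
print(len(farmers), "farmers included")
```

Note that the random step applies to the clusters, not to the individual farmers: every farmer in a chosen barangay is included.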

Frequency Distribution and Graphs for Numerical Data

Once the data have been collected, the next step is to organize them. There are three ways of
presenting data: tabular, graphical, and textual. The following discussion focuses on how to
organize raw data and subsequently represent them using graphs.

Example 1.1
Below are scores of 50 students in Statistics examination.

63 88 79 92 86 87 83 78 47 67
68 76 ​46 ​81 92 77 76 84 70 66
77 75 ​98 ​81 82 81 87 78 70 60
94 79 52 82 77 81 77 70 74 61
56 69 83 83 71 48 90 52 75 84

Looking at the array of scores, it would be difficult for the reader to tell the characteristics
of the group. Thus, a frequency distribution needs to be prepared. A frequency distribution is a
tabular organization of raw data into classes (groups) and their frequencies. The following are
the steps in preparing a frequency distribution.

Step 1. Determine the number of classes.
• Find the highest value (HV) and the lowest value (LV).
• Find the range (R) by subtracting the lowest value from the highest value: $R = HV - LV$.
• Estimate the number of classes by taking the square root of n, rounded to the nearest whole
number; call this k: $k = \sqrt{n}$. The actual number of classes could be greater than this
estimate.

Step 2. Determine the class size (c) of the interval:
$$c = \frac{R}{k}$$
(rounded up to the next whole number)

Step 3. Determine the lower and upper limits of the lowest class interval. The lower limit should
be divisible by the class size.

Step 4. Determine the highest class interval.
Step 5. Tally the scores in their respective classes.
Step 6. Summarize the tallies.

Illustration: Using the array of raw scores given above, we have:

1. Determine the number of classes.
   R = HV – LV = 98 – 46 = 52
   (R tells us the gap between the highest and lowest scores in the given data set.)
   $k = \sqrt{50} = 7.07$, so k = 7

2. Determine the class size.
   $c = \frac{R}{k} = \frac{52}{7} = 7.43$, so c = 8 (rounded up)

3. Determine the lower and upper limits of the lowest class interval.
Since the lowest value in the data set is 46, which is not divisible by the class size (c) of 8,
we look for the largest number not exceeding 46 that is divisible by 8. That number is 40. The
lower limit of the lowest class interval is therefore 40 and its upper limit is 47, because the
lower limit of the next class interval is 48 (the lower limit of the preceding class plus the
class size c). It follows that the upper limit of the next class interval is 55, so the second
class interval is 48 – 55. Following the same procedure, you can find the remaining class
intervals.
4. Determine the highest class interval. The highest class interval should contain the highest
value of the data set. Since the highest value, 98, is not divisible by the class size of 8, the
lower limit of the highest class interval should be the largest number not exceeding 98 that is
divisible by 8. That number is 96. Thus, the highest class interval is 96 – 103.
5. List down the class intervals and tally the scores in their respective classes.
Class Limits    Class Boundaries    Tallies              Frequency
96 - 103        95.5 - 103.5        /                    1
88 - 95         87.5 - 95.5         /////                5
80 - 87         79.5 - 87.5         /////-/////-////     14
72 - 79         71.5 - 79.5         /////-/////-///      13
64 - 71         63.5 - 71.5         /////-///            8
56 - 63         55.5 - 63.5         ////                 4
48 - 55         47.5 - 55.5         ///                  3
40 - 47         39.5 - 47.5         //                   2

REMARKS:
• ​In this illustration the actual number of classes which is 8 is greater than the estimated value of ​k
which is 7.
• ​The second column shows the boundary of each class interval in which the actual lower and upper limits
are indicated. These are called ​true limits or class boundaries.
• ​The true upper limit of the preceding class is also the true lower limit of the succeeding class. This shows
the continuity of the data.
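
The six steps can be traced in Python with the 50 scores of Example 1.1. This is a sketch under two assumptions that follow the illustration: k is rounded to the nearest whole number, and the class size c is rounded up (7.43 becomes 8). The function name frequency_distribution is hypothetical.

```python
import math

scores = [
    63, 88, 79, 92, 86, 87, 83, 78, 47, 67,
    68, 76, 46, 81, 92, 77, 76, 84, 70, 66,
    77, 75, 98, 81, 82, 81, 87, 78, 70, 60,
    94, 79, 52, 82, 77, 81, 77, 70, 74, 61,
    56, 69, 83, 83, 71, 48, 90, 52, 75, 84,
]

def frequency_distribution(data):
    """Return (lower limit, upper limit, frequency) rows, lowest class first."""
    hv, lv = max(data), min(data)
    r = hv - lv                          # range
    k = round(math.sqrt(len(data)))      # estimated number of classes
    c = math.ceil(r / k)                 # class size, rounded up
    lower = (lv // c) * c                # largest multiple of c not above LV
    rows = []
    while lower <= hv:                   # stop once HV is covered
        upper = lower + c - 1
        f = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, f))
        lower += c
    return rows

for lower, upper, f in frequency_distribution(scores):
    print(f"{lower:3d} - {upper:3d}  {lower - 0.5:5.1f} - {upper + 0.5:5.1f}  {f:2d}")
```

The loop produces 8 classes even though k = 7, which matches the first remark above.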

Using the same data set as presented in the frequency distribution above, we can prepare graphs.
In this module, we will discuss only the histogram, frequency polygon, and ogive. These are the
most commonly used graphs in research.

A histogram displays the data using continuous bars (vertical or horizontal). It is a bar graph in
which the bars are drawn without spaces in between, which implies that the data presented are
continuous. The heights (or lengths) of the bars show the frequencies of the respective classes.
The frequency polygon, on the other hand, displays the data using line segments that connect the
points plotted for the frequency of each class. This graph is also used when the data are
continuous. In the procedures below, both graphs use the class midpoints on the variable axis.

The ogive is a graph that shows the cumulative frequencies for the classes in the given
distribution. It can be constructed for either the less-than cumulative frequency or the
greater-than cumulative frequency. The following are the steps in constructing these graphs
manually. The same graphs can be constructed using either Excel or Minitab; the specific steps are
illustrated in the book of Bluman.

Example 1.2
Before constructing the different graphs, we need to add more information in our
frequency distribution as shown below.

Class Interval    f     X       <cf    >cf    rf (%)
95.5 - 103.5      1     99.5    50     1      2.0
87.5 - 95.5       5     91.5    49     6      10.0
79.5 - 87.5       14    83.5    44     20     28.0
71.5 - 79.5       13    75.5    30     33     26.0
63.5 - 71.5       8     67.5    17     41     16.0
55.5 - 63.5       4     59.5    9      45     8.0
47.5 - 55.5       3     51.5    5      48     6.0
39.5 - 47.5       2     43.5    2      50     4.0
                  N = 50                      100.0

REMARKS:
• The midpoint (X) of each class is obtained using the formula $X = \frac{LL + UL}{2}$, where LL
and UL are the lower and upper limits of the class.
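
The added columns can be computed from the class limits and frequencies. The sketch below assumes the usual meanings of the headers: <cf is the less-than cumulative frequency, >cf is the greater-than cumulative frequency, and rf is the relative frequency in percent. The function name extend_table is hypothetical, and the rows are listed from the lowest class upward, the reverse of the table.

```python
# (lower limit, upper limit, frequency) for the scores of Example 1.1
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]

def extend_table(classes):
    """Add midpoint, cumulative frequencies and relative frequency (%)."""
    n = sum(f for _, _, f in classes)
    rows, running = [], 0
    for lower, upper, f in classes:
        running += f
        rows.append({
            "class": (lower, upper),
            "f": f,
            "X": (lower + upper) / 2,      # midpoint
            "<cf": running,                # cases at or below this class
            ">cf": n - running + f,        # cases at or above this class
            "rf": 100 * f / n,             # percent of all cases
        })
    return rows

for row in extend_table(classes):
    print(row)
```

The rf column should always add up to 100, which is a quick check on the table.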

Steps in Constructing a Histogram


Step  What to do?

1     Construct two perpendicular axes (vertical and horizontal).

2     Label the vertical axis as the frequency axis and the horizontal axis as the variable
      axis. (In our illustration below, the variable is the score.)

3     Lay off segments along the vertical axis (y-axis) to correspond to the frequencies. (The
      segments must be equal in length.)

4     Lay off segments along the horizontal axis (x-axis) to correspond to the different class
      intervals of the variable. The first line segment should be moved a little to the right if
      the lowest value of the variable is not zero.

5     Mark all midpoints of the intervals and label these using the class midpoints.

6     Draw rectangles or bars whose heights correspond to the frequency counts and whose widths
      correspond to the class size. (Shade or color your bars.)

Adapted from: Resource Materials in Basic Statistics (Petilos,p.9)

Figure 1.1. Histogram (horizontal axis: Score)
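
As a rough stand-in for the drawn figure, the bars can be printed as text, one row per class. This is only an assumed illustration, not how the module's figure was produced: each row is labeled with the class midpoint, and the bar length equals the class frequency.

```python
# (lower limit, upper limit, frequency) for the scores of Example 1.1
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]

def text_histogram(classes):
    """Return one text line per class: midpoint label, bar, frequency."""
    lines = []
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2
        lines.append(f"{midpoint:5.1f} | {'#' * f} ({f})")
    return lines

print("\n".join(text_histogram(classes)))
```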

Steps in Constructing a Frequency Polygon


Step  What to do?

1     Construct two perpendicular axes (vertical and horizontal).

2     Label the vertical axis as the frequency axis and the horizontal axis as the variable
      axis. (In our illustration above, the variable is the score.)

3     Lay off segments along the vertical axis (y-axis) to correspond to the frequencies. (The
      segments must be equal in length.)

4     Lay off segments along the horizontal axis (x-axis) to correspond to the different class
      intervals of the variable. The first line segment should be moved a little to the right if
      the lowest value of the variable is not zero.

5     For each class interval, the class midpoint and the corresponding frequency are treated as
      an ordered pair and plotted in the plane determined by the coordinate axes.

6     The plotted points are then joined using line segments from left to right. To close the
      polygon, extend one class interval on both sides by connecting the endpoints of the graph
      to the midpoints of the extended segments along the x-axis.

Adapted from: Resource Materials in Basic Statistics (Petilos, p.10)

Figure 1.2. Frequency Polygon (horizontal axis: Score; vertical axis: Frequency)

Steps in Constructing an Ogive


Step  What to do?

1     Construct two perpendicular axes (vertical and horizontal).

2     Label the vertical axis as the cumulative frequency axis and the horizontal axis as the
      variable axis. (In our example, the variable is the score.)

3     Lay off equal segments along the vertical axis (y-axis) to correspond to the cumulative
      frequencies. Use an appropriate scale to represent the cumulative frequencies. (Depending
      on the numbers in the cumulative frequencies, the scale can be by 2's, 4's, 5's, etc.)

4     Lay off equal segments along the horizontal axis (x-axis) to correspond to the true upper
      limits for the less-than ogive, or to the true lower limits for the greater-than ogive.

5     Plot the cumulative frequencies against the corresponding class boundaries.

6     The plotted points are then joined using line segments from left to right.

Reference: Bluman, pp54-55


REMARKS:
• The ogive is used to determine the number or percentage of cases found below or above a
particular class boundary.
• If the two ogives (for <cf and >cf) are graphed on the same coordinate plane, a line drawn from
their point of intersection to the variable axis marks the median of the data set.

Figure 1.3. Less than cumulative frequency (horizontal axis: Class Boundaries; vertical axis: Frequency)
Figure 1.4. Greater than cumulative frequency (horizontal axis: Class Boundaries; vertical axis: Frequency)

Stem and Leaf Plot


Another method of organizing data, which combines sorting and graphing, is the stem and leaf
plot. It is a data plot that uses the leading digit of each value as the stem and the trailing
digit as the leaf.

Steps in Constructing Stem and Leaf Plot.


Step  What to do?

1     List down the leading digits of the data set, called the stems. Arrange them in a column
      either from lowest to highest or vice versa.

2     Starting from the first to the last entry of the data set, carefully record the trailing
      digits (leaves) in their corresponding stems.

3     Arrange the trailing digits in each row in order. If there are no data values in a class,
      the stem number is written and the leaf row is left blank.

Reference: Bluman, pp81-82

Example 1.2
Let us illustrate the above procedure using the data on the scores of 50 students in
Statistics examination. The data are reproduced as follows:

63 88 79 92 86 87 83 78 47 67
68 76 46 81 92 77 76 84 70 66
77 75 98 81 82 81 87 78 70 60
94 79 52 82 77 81 77 70 74 61
56 69 83 83 71 48 90 52 75 84

Steps:
1. Listing the stems and recording the leaves, we have:

   Stem (Leading Digit)    Leaf (Trailing Digit)
   9                       48220
   8                       831123627113744
   7                       76599717678800045
   6                       3897601
   5                       622
   4                       687

2. Rearranging the trailing digits (leaves), we have:

   Stem    Leaf
   9       02248
   8       111122333446778
   7       00014556677778899
   6       0136789
   5       226
   4       678

REMARKS:
• The figure shows that the distribution peaks in the center and there are no gaps in the data.
• The highest score is 98 and the lowest is 46.
• Most scores are 70 and above.

What other information can you draw from the figure above?
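
The ordered plot can be reproduced with a short Python sketch. It assumes two-digit scores, so the tens digit is the stem and the units digit is the leaf; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

scores = [
    63, 88, 79, 92, 86, 87, 83, 78, 47, 67,
    68, 76, 46, 81, 92, 77, 76, 84, 70, 66,
    77, 75, 98, 81, 82, 81, 87, 78, 70, 60,
    94, 79, 52, 82, 77, 81, 77, 70, 74, 61,
    56, 69, 83, 83, 71, 48, 90, 52, 75, 84,
]

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for x in sorted(data):               # sorting first keeps leaves in order
        plot[x // 10].append(x % 10)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem in sorted(plot, reverse=True):  # highest stem first, as in the figure
    print(stem, "".join(str(leaf) for leaf in plot[stem]))
```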

Exercises 1.1

A. Determine the area of statistics (descriptive or inferential) illustrated by the following
statements.
1. A recent study showed that eating garlic could lower blood pressure.
2. The teacher-pupil ratio in public schools has increased from 1:40 in 2015 to 1:50 in 2019.
3. It is predicted that the average number of automobiles each household owns will increase next
year.
4. A study revealed that Lagundi is more effective in curing cough than a similar product.
5. Consumers generally prefer Colgate to any other toothpaste.

B. In each statement below, identify the variable/s and classify it/them according to the level of
measurement (nominal, ordinal, interval, ratio).
1. Marital status of faculty members in a university.
2. Time it takes a student to travel from home to school.
3. Scores in the College Admission Test of freshman students in University Q.
4. Socio-economic status of the residents in a barangay (poor, average, above-average).
5. Ages of freshman college students of Leyte Normal University.

C. Classify each variable as discrete or continuous


1. Number of CoViD-19 cases in Eastern Visayas.
2. Weights of backpacks of college students inside a Science laboratory room.
3. Number of new mono bloc chairs inside the university social hall.
4. Blood pressures of patients seeking admission in a hospital.
5. Number of boxes of disposable surgical masks sold in one pharmacy in three days.

D. A study is to be conducted to determine the level of language proficiency and numeracy skills
among the 700 Education and 300 Management graduating students at University Q. The researcher
wants a sample of 300, selected as representatives from the two programs.
1. What is the population of the study?
2. What is the sample in the study?
3. What are the variables of the study? What is the level of measurement of each variable?
4. What sampling technique is used in this study?

E. ​An insurance company researcher conducted a survey on the number of car thefts in a
large city for a period of 30 days last summer. The raw data are shown below. Construct a
grouped frequency distribution, frequency polygon, histogram and ogives ​(Show all
necessary solutions).
52 58 75 79 57 65 62 77 56 51
59 53 51 66 55 68 63 78 50 53
67 65 69 66 69 57 73 72 75 55

LESSON 2: Measures of Central Tendency

Introduction

Statistics is the science of collecting, organizing, summarizing, presenting, and interpreting
data. There are two branches of statistics. The branch that involves the collection, organization,
summarization, and presentation of data is called descriptive statistics, while the branch that
involves interpretation and drawing conclusions is called inferential statistics. Descriptive
statistics include the measures of central tendency, measures of position, and measures of
variability.

There are three measures of central tendency, namely the mean, median, and mode. A measure of
central tendency is a single value that describes a whole set of data by identifying the central
position within the data set. It is sometimes called a measure of central location or a summary
statistic.

Mean (Raw and Grouped Data)

Data in their original form are called raw or ungrouped data, while data that have been organized
into a frequency distribution are called grouped data.

For raw data, the mean is defined as the arithmetic average of a data set. It is equal to the sum
of the measurements divided by the number of cases (n). It is the measure to use when the data set
has no extreme values and the data are at the interval or ratio level. Among the three measures of
central tendency, the mean is the most reliable and is amenable to further mathematical
manipulation, which makes it useful for inferential statistics.

Formula: $\bar{x} = \frac{\sum x}{n}$

The Greek capital letter sigma ($\Sigma$) is used to denote a sum. Thus, the formula above means
the summation of the values of x divided by the total number of cases. For data collected from a
population, the symbol used for the mean is the Greek letter $\mu$ (read as mu), and the mean is
called a parameter. For data collected from a sample, the symbol used for the mean is $\bar{x}$
(read as x bar), and the mean is called a statistic. The total number of cases is denoted by N for
a parameter and by n for a statistic. Thus, the working formula for the mean of a population is:

$$\mu = \frac{\sum x}{N}$$

Example 2.1. Compute the average of the scores of 15 students in a Math quiz.

23 25 34 32 22 24 26 24 34 30 26 26 37 25 24

Solution:
Using a calculator, we have:

$$\bar{x} = \frac{23 + 25 + 34 + \dots + 24}{15} = \frac{412}{15} \approx 27.5$$

This implies that the average score of the 15 students in the Math quiz is 27.5.

Note: Rounding Rule for the Mean. ​The mean should be rounded to one more decimal place
than occurs in the raw data.
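
The computation in Example 2.1, including the rounding rule, can be checked with a few lines of Python. The variable names are chosen for this illustration only.

```python
quiz = [23, 25, 34, 32, 22, 24, 26, 24, 34, 30, 26, 26, 37, 25, 24]

total = sum(quiz)            # sum of the scores
n = len(quiz)                # number of cases
mean = total / n             # unrounded mean, about 27.4667
rounded = round(mean, 1)     # raw data are whole numbers, so keep 1 decimal

print(total, n, rounded)     # 412 15 27.5
```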

For grouped data, the mean is obtained by using the formula below:

$$\bar{x} = \frac{\sum fX}{N}$$

where: $\bar{x}$ = average or mean
       f = class frequency
       X = midpoint of each class
       N = total number of cases

Example 2.2. Using the data in Example 1.1 (scores of 50 students in a Statistics examination), we
find the mean of the grouped data.

Class Interval    f     X       fX
96 - 103          1     99.5    99.5
88 - 95           5     91.5    457.5
80 - 87           14    83.5    1,169.0
72 - 79           13    75.5    981.5
64 - 71           8     67.5    540.0
56 - 63           4     59.5    238.0
48 - 55           3     51.5    154.5
40 - 47           2     43.5    87.0
                  N = 50        ΣfX = 3,727.0

By substitution, we have:

$$\bar{x} = \frac{\sum fX}{N} = \frac{3727}{50} = 74.54$$

Therefore, the average score of the 50 students in the Statistics examination is 74.54.
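
The grouped-mean computation can be verified in Python. The sketch assumes each class is given by its apparent lower and upper limits and its frequency; the function name grouped_mean is hypothetical.

```python
# (lower limit, upper limit, frequency) for the scores of Example 1.1
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]

def grouped_mean(classes):
    """Mean of grouped data: sum of f times midpoint, divided by N."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n

print(round(grouped_mean(classes), 2))   # 74.54
```

The result is an approximation of the raw-data mean, because every score in a class is treated as if it were equal to the class midpoint.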



Note: We rounded off the computed mean to the nearest hundredth because each class interval
actually extends 0.5 below and above the given limits. That is, the true lower limit of each class
interval is 0.5 below the apparent lower limit, and the true upper limit is 0.5 above the apparent
upper limit. For example, the class interval 40 - 47, with a lower limit of 40 and an upper limit
of 47, has a true lower limit of 39.5 (0.5 lower than the apparent lower limit) and a true upper
limit of 47.5 (0.5 higher than the apparent upper limit).

There is another method of finding the mean of grouped data, which uses the assumed deviation.
However, this method is not discussed in this module.

The Weighted Mean

When the values or observations do not carry equal weights, the weighted mean is obtained. The
weighted mean is computed using the formula below:

$$\bar{X} = \frac{\sum wX}{\sum w} = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$

where: $w_1, w_2, \dots, w_n$ are the weights and $X_1, X_2, \dots, X_n$ are the values or observations
       $\sum wX$ = sum of the products of each value and its respective weight
       $\sum w$ = sum of the weights


Example 2.3
Find the grade weighted average (GWA) of a student in his five subjects, as shown in the table
below:

Subject           Grade (X)    No. of Units (w)    wX
Mathematics       1.5          3                   4.5
English           1.7          3                   5.1
PE                1.3          2                   2.6
Physics           1.6          5                   8.0
Social Science    1.5          3                   4.5
                               Σw = 16             ΣwX = 24.7

By substituting into the formula, we find the grade weighted average (GWA) of the student:

$$\bar{X} = \frac{\sum wX}{\sum w} = \frac{24.7}{16} = 1.54$$

Thus, the grade weighted average of the student is 1.54.
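
The same computation in Python, as a small sketch. The list names are chosen for illustration; the grades and units are those in the table of Example 2.3.

```python
grades = [1.5, 1.7, 1.3, 1.6, 1.5]   # X: Mathematics, English, PE, Physics, Social Science
units = [3, 3, 2, 5, 3]              # w: number of units of each subject

sum_wx = sum(w * x for w, x in zip(units, grades))   # about 24.7
sum_w = sum(units)                                   # 16
gwa = sum_wx / sum_w

print(round(sum_wx, 1), sum_w, round(gwa, 2))        # 24.7 16 1.54
```

Physics carries 5 units, so its grade of 1.6 pulls the average more than the 2-unit PE grade does.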

Median (Raw and Grouped Data)

The median is the middlemost value of the measurements when they are arranged from lowest to
highest. It is used when the data are at least ordinal. The median is not affected by extreme
values or outliers. The median is reliable but less stable than the mean.

For raw data or ungrouped data, the median is obtained by getting the middlemost value after the
data set is arranged from lowest to highest. It is the value that divides the data set into two equal
parts.

Example 2.4
Using the data set in ​Example 2.1 ​we have:

23 25 34 32 22 24 26 24 34 30 26 26 37 25 24

Solution: a) Arrange the scores from lowest to highest.


Using stem and leaf plot we have:

Stem Leaf
3 42407
2 3524646654

Rearranging the leaf in our plot above we have

Stem Leaf
3 02447
2 2344455666

22 23 24 24 24 25 25 ​26 ​ ​26 26 30 32 34 34 37

Thus, the median of the given data set is 26, the 8th of the 15 ordered scores. There are seven
cases below it and seven cases above it. Example 2.4 is a data set with an odd number of cases
(n = 15). How do we find the median when the number of cases is even? By definition, the median is
still the middlemost value.

Example 2.5

​ 6 30 32 34 34 35 37
22 23 24 24 24 25 25 ​26 26 2
Thus we
⎛ ⎞
⎛​ ⎞​
​ ​ ⎝⎜​ ⎠​⎟​th c​ ase.
2​+1
n 2
case and
To get the median of even ⎠​⎟​th ​
have: n
cases, we take the

average of the ​⎝​⎜


⎞​ n​
⎝⎜​ ⎠​⎟​th ​case ​+​ 2​+​1

⎛​ ⎞​
⎝⎜​ ⎠​⎟​th c​ ase ​n


Md = € 2 €
2
​ ​+ ​26
​= 26
2
Md =​ ​26

This implies that the value of 26 divides the cases into two equal parts. This 26 is not the 8​th​ ​nor the
9​th​ ​case but there is a value of 26 between 8​th​ ​and 9​th​ ​cases.
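
The odd and even cases of Examples 2.4 and 2.5 can be sketched together in Python. The function name `median_raw` is a hypothetical helper introduced only for illustration.

```python
def median_raw(values):
    """Median of ungrouped data: middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of cases: the single middle case
        return ordered[mid]
    # even number of cases: average of the (n/2)th and (n/2 + 1)th cases
    return (ordered[mid - 1] + ordered[mid]) / 2


scores_odd = [23, 25, 34, 32, 22, 24, 26, 24, 34, 30, 26, 26, 37, 25, 24]  # Example 2.4, n = 15
scores_even = scores_odd + [35]                                            # Example 2.5, n = 16

print(median_raw(scores_odd))   # 26
print(median_raw(scores_even))  # 26.0
```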


For grouped data, the median is obtained using the formula below:

\[ Md = LL + \left( \frac{\frac{N}{2} - cf}{f} \right)(c) \]

where: LL = true lower limit of the median class
       N = total number of cases
       cf = cumulative frequency below the median class
       f = frequency of the median class
       c = class size (width of the class interval)

Example 2.6
Using the data in Example 1.1 we find the median of grouped data. (Scores of 50 students
in Statistics examination)

Class Interval             f          <cf
96 – 103                   1          50
88 – 95                    5          49
80 – 87                    14         44
72 – 79 (median class)     13 (f)     30
64 – 71                    8          17 (cf)
56 – 63                    4          9
48 – 55                    3          5
40 – 47                    2          2
                           N = 50
Note that 50% of 50 cases is 25. This means that we find a value such that 50% of the total number
of cases is below it and 50% is above it. The median class is 72 – 79 because it is the first class
whose cumulative frequency (30) reaches 25. Using the formula above with LL = 71.5, N = 50,
cf = 17, f = 13 and c = 8, we have:

\[ Md = LL + \left( \frac{\frac{N}{2} - cf}{f} \right)(c) = 71.5 + \left( \frac{\frac{50}{2} - 17}{13} \right)(8) = 71.5 + \left( \frac{25 - 17}{13} \right)(8) \]

\[ Md = 71.5 + (0.6154)(8) = 71.5 + 4.92 = 76.42 \]

This implies that 76.42 is the middlemost value of the given data set. This means that there are 25
cases found below and above this value.
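
The grouped-median formula can be sketched in Python as follows. The sketch assumes integer apparent class limits, so the true lower limit is the apparent lower limit minus 0.5; the function name `grouped_median` and the tuple layout are illustrative.

```python
# (apparent lower limit, apparent upper limit, frequency), lowest class first
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]


def grouped_median(classes):
    """Md = LL + ((N/2 - cf) / f) * c, assuming integer apparent limits."""
    n = sum(f for _, _, f in classes)
    cf = 0                                   # cumulative frequency below the current class
    for lower, upper, f in classes:
        if cf + f >= n / 2:                  # first class whose cumulative frequency reaches N/2
            true_lower = lower - 0.5         # LL
            size = upper - lower + 1         # c
            return true_lower + ((n / 2 - cf) / f) * size
        cf += f
    raise ValueError("classes must contain at least one non-zero frequency")


print(round(grouped_median(classes), 2))  # 76.42
```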


Mode (Raw and Grouped Data)

The ​mode ​is the most frequent value in a given data set. The mode is used when you want
to determine a quick estimate of the typical value in a given data set. The mode is the most
unstable measure of central tendency, especially if there are only a few cases. A given data set can
have more than one mode. A data set with two modes is called bimodal.

Example 2.7
Using the data set in Example 2.1, we notice that there are two values (24 and 26) that share the
highest frequency of 3.

22 23 24 24 24 25 25 26 26 26 30 32 34 34 37

Therefore, the modes of the given distribution are ​24 ​and ​26​. This is an example of a
bimodal distribution.

Example 2.8
Find the mode of the following data: ​12, 34, 12, 71, 48, 93, 71 ​.

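By inspection, 12 and 71 each occur twice, while every other number occurs only once. As listed,
the data set therefore has two modes, 12 and 71. A distribution in which a single value occurs most
often is called unimodal; this data set would be unimodal, with mode 12, if 71 appeared only once.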
Example 2.9
Find the mode of the following data set:
12, 5, 8, 9, 11, 11, 4, 7, 23, 7, 8, 12, 23, 9, 4, 5

By inspection, each number in the list occurs twice. There is no number that occurs more
often than the others. Therefore, there is ​no mode.
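
Finding the mode by inspection amounts to counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the function name `modes` is hypothetical, and returning an empty list when every value occurs equally often is this module's "no mode" convention rather than a universal rule.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    # If several distinct values all share the same frequency, no value stands out.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


example_2_7 = [22, 23, 24, 24, 24, 25, 25, 26, 26, 26, 30, 32, 34, 34, 37]
example_2_9 = [12, 5, 8, 9, 11, 11, 4, 7, 23, 7, 8, 12, 23, 9, 4, 5]

print(modes(example_2_7))  # [24, 26]  (bimodal)
print(modes(example_2_9))  # []        (no mode)
```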

For grouped data, the mode is obtained by using the formula below:

\[ Mo = LL + \left( \frac{d_1}{d_1 + d_2} \right)(c) \]

where: LL = true lower limit or lower boundary of the modal class;
       $d_1$ = absolute difference between the frequencies of the modal class
               and the lower class interval (interval just below it);
       $d_2$ = absolute difference between the frequencies of the modal class
               and the higher class interval (interval just above it);
       c = the class size

Example 2.10
Using the data in Example 1.1 we find the mode of grouped data. (Scores of 50 students
in Statistics examination)

Class Interval    f
96 – 103          1
88 – 95           5     (interval just above the modal class)
80 – 87           14    (modal class)
72 – 79           13    (interval just below the modal class)
64 – 71           8
56 – 63           4
48 – 55           3
40 – 47           2

Using the formula above, we obtain the mode of the given data set:

\[ Mo = LL + \left( \frac{d_1}{d_1 + d_2} \right)(c) = 79.5 + \left( \frac{14 - 13}{(14 - 13) + (14 - 5)} \right)(8) \]

\[ Mo = 79.5 + \left( \frac{1}{1 + 9} \right)(8) = 79.5 + (0.10)(8) = 79.5 + 0.80 = 80.30 \]

Therefore, the mode of the given data set is 80.30.
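
The grouped-mode formula can be sketched the same way. The function name `grouped_mode` is illustrative; the sketch assumes integer apparent limits, a single modal class, and (as an added assumption not stated in the module) treats a missing neighboring class as having frequency 0.

```python
# (apparent lower limit, apparent upper limit, frequency), lowest class first
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]


def grouped_mode(classes):
    """Mo = LL + (d1 / (d1 + d2)) * c for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                       # position of the modal class
    lower, upper, f_modal = classes[i]
    f_below = freqs[i - 1] if i > 0 else 0            # interval just below the modal class
    f_above = freqs[i + 1] if i < len(freqs) - 1 else 0  # interval just above it
    d1 = abs(f_modal - f_below)
    d2 = abs(f_modal - f_above)
    true_lower = lower - 0.5                          # LL
    size = upper - lower + 1                          # c
    return true_lower + (d1 / (d1 + d2)) * size


print(round(grouped_mode(classes), 2))  # 80.3
```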



In summary, the given data set has the following values of the measures of central
tendency: Mean = 74.54 Median = 76.42 Mode = 80.30

What is the characteristic of our illustrative distribution? Why?

Types of Distribution

The characteristic of a distribution can be determined by the shape of its graph (histogram
or frequency polygon). According to Bluman, the symmetric, positively skewed and negatively
skewed shapes are the most important shapes of graphs that describe a distribution. Skewness
refers to the degree of departure of the distribution from symmetry. When the data values
are evenly distributed on both sides of the mean and the distribution is unimodal, it is
called a symmetric distribution. Further, the mean, median and mode have equal values and are at
the center of the distribution. In symbols, $\bar{x} = Md = Mo$.



A positively skewed or right-skewed distribution is unimodal, and the majority of the data values
cluster at the lower end of the distribution, to the left of the mean. Moreover, in a
positively skewed distribution, the mode is less than the median and the median is less than
the mean. In symbols, $Mo < Md < \bar{x}$.

A negatively skewed or left-skewed distribution is observed when the majority of the data
values cluster at the upper end of the distribution, to the right of the mean. Furthermore,
in a negatively skewed distribution, the mode is greater than the median and the median is
greater than the mean. In symbols, $\bar{x} < Md < Mo$.
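
The three orderings of the mean, median and mode can be turned into a small rule of thumb. The sketch below is only an illustration of those orderings; the function name `describe_shape` and the tolerance parameter `tol` are hypothetical, and real data may not fit any of the three patterns exactly.

```python
def describe_shape(mean, median, mode, tol=0.0):
    """Classify a unimodal distribution by comparing its mean, median and mode."""
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetric"                 # mean = Md = Mo
    if mode < median < mean:
        return "positively skewed"         # Mo < Md < mean
    if mean < median < mode:
        return "negatively skewed"         # mean < Md < Mo
    return "no clear pattern"


# Measures of central tendency of the 50 examination scores, as summarized earlier
print(describe_shape(74.54, 76.42, 80.30))  # negatively skewed
```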

The following graphs illustrate the three types of distribution according to skewness
(MathBits.com).

[Figures: Symmetric Distribution; Positively Skewed Distribution; Negatively Skewed Distribution]


Summary of Measures of Central Tendency


Measure: Mean (Common Name: Arithmetic Average)
   When to Use:
      • There are no extreme values
      • When the data is at least interval
   Advantage:
      • Most stable, i.e., less variable from sample to sample
      • Amenable to further mathematical manipulation, which makes it useful in
        inferential statistics
   Disadvantage:
      • Affected by extreme scores or values

Measure: Median (Common Name: Middle Score/Value)
   When to Use:
      • The distribution is skewed
      • When the data is at least ordinal or rank
   Advantage:
      • Easy to compute
      • Not affected by extreme scores or values
   Disadvantage:
      • Less stable from sample to sample

Measure: Mode (Common Name: Typical Score/Value)
   When to Use:
      • When a quick estimate of the typical score or value is to be determined
   Advantage:
      • Easy to compute
   Disadvantage:
      • The most unstable measure, especially when the number of cases is small

Adapted from: Resource Materials in Basic Statistics (Petilos, p.14)

Exercises 2.1

A. Using Exercise 13.1 on page 811 of the book, ​Mathematical Excursion ​by Aufman,
answer numbers 4 to 9 and 11.

B. Using the same exercise, find the mean, median and mode of the data set of number
14 on page 812.

C. Problem Solving.
1. The mean age of eight college freshman students is 19.25, and six of the ages
are 19, 18, 20, 19, 20 and 18. What are the ages of the two remaining students, who
are twin siblings? What is the mode (age) of the eight students?
2. Find the mean of 20, 30, 40, 50 and 60.
a. Add 5 to each value and find the mean.
b. Subtract 5 from each value and find the mean.
c. Multiply each value by 5 and find the mean.
d. Divide each value by 5 and find the mean.
e. Make a general statement about each situation.

LESSON 3: Measures of Variation

Introduction

In the preceding lesson you learned the three measures of central tendency, namely the mean,
median and mode. To describe a data set, however, one needs more than these measures: it is
tempting to claim that two or more data sets do not differ when their averages are equal, even
though their values may be spread out very differently. In this lesson, we will discuss the
measures of variation/spread or measures of dispersion. Four measures of variability, for both
ungrouped and grouped data, are covered in this module. They are the range, mean absolute
deviation, variance and standard deviation.

Range (Raw and Grouped Data)

The ​range ​is simply the gap or difference between the highest and lowest value/observation of the
data set. In formula: ​R ​= ​HV –​ ​LV.​

If R = 0, it implies that all values in a data set are equal. Thus, there is no variability of the data.

Example 3.1
Ages of female faculty members from three departments.
Statistical Measure    Implication/Impression                  Data Set
                                                               A      B      C
                                                               37     40     39
                                                               38     41     40
                                                               42     42     42
                                                               45     43     43
                                                               48     44     46
Mean                   Equal distribution                      42     42     42
Range                  Distribution A is more spread. Why?     11     4      7

According to Petilos in his Resource Material in Basic Statistics, the range of grouped data is equal
to the difference between the true upper limit of the highest class interval and the true lower limit
of the lowest class interval. If the apparent limits are used, the range is equal to the upper limit of
the highest class interval minus the lower limit of the lowest class interval, plus 1. In formula
(using true limits):

\[ R = (UL)_H - (LL)_L \]



Example 3.2
Scores of 50 students in Statistics examination
Class Interval f

96 - 103 1

88 – 95 5

80– 87 14

72– 79 13

64 – 71 8

56 – 63 4

48– 55 3

40 – 47 2

​N = 50

Using the data set as presented in the distribution above, the range is:

R ​= 103.5 – 39.5 = ​64 ​(using true limits)

R ​= 103 – 40 + 1 = 63 + 1 = ​64 ​(using apparent limit)
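
Both versions of the range can be sketched in a few lines of Python. The function names `range_raw` and `range_grouped` are hypothetical, and the grouped version assumes integer apparent limits, so each true limit lies 0.5 beyond the apparent one.

```python
def range_raw(values):
    """R = HV - LV for ungrouped data."""
    return max(values) - min(values)


def range_grouped(lowest_lower_limit, highest_upper_limit):
    """R = (UL)_H - (LL)_L using true limits derived from integer apparent limits."""
    true_upper = highest_upper_limit + 0.5
    true_lower = lowest_lower_limit - 0.5
    return true_upper - true_lower


data_set_a = [37, 38, 42, 45, 48]   # Data Set A of Example 3.1
print(range_raw(data_set_a))        # 11
print(range_grouped(40, 103))       # 64.0  (same as 103 - 40 + 1 with apparent limits)
```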

Mean Absolute Deviation (Raw and Grouped Data)

The ​mean absolute deviation (MAD) ​of a data set is defined as the average distance between each
data value and the mean. It helps to describe how “spread out” the values in a data set
are (​https://www.khanacademy.org/math​). The MAD for raw data is computed using the
following formula:

\[ MAD = \frac{\sum |X - \bar{x}|}{N} \]

where: X = score or value
       $\bar{x}$ = mean score or mean value
       N = total number of cases

Using the data set of Example 3.1 and computing for the MAD of each distribution, we
have:

Example 3.3
Ages of female faculty members from three departments
Statistical Measure    Implication/Impression                  Data Set
                                                               A      B      C
                                                               37     40     39
                                                               38     41     40
                                                               42     42     42
                                                               45     43     43
                                                               48     44     46
Mean                   Equal distribution                      42     42     42
Range                  Distribution A is more spread. Why?     11     4      7
MAD                    Distribution B is least variable        3.6    1.2    2.0
                       compared to the other two data
                       sets. Why?

By substituting into the formula, we find the MAD of Data Set A as follows:

\[ MAD = \frac{\sum |X - \bar{x}|}{N} = \frac{|37 - 42| + |38 - 42| + |42 - 42| + |45 - 42| + |48 - 42|}{5} \]

\[ MAD = \frac{5 + 4 + 0 + 3 + 6}{5} = \frac{18}{5} = 3.6 \]

Following the same procedure we find the MAD of the remaining two distributions as reflected on
the table above.

It can be deduced from the table of ​Example 3.3 ​that the scores of Data Set A deviate from
the mean by an average of 3.6, compared to Data Set B where the scores deviate from the mean by
an average of 1.2. This implies that Data Set B is less spread out than Data Set A. The smaller
the value of the MAD, the less spread out the distribution is.
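
The MAD values in the table can be reproduced with a short Python sketch. The function name `mean_absolute_deviation` is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """MAD = sum(|X - mean|) / N for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data_sets = {
    "A": [37, 38, 42, 45, 48],
    "B": [40, 41, 42, 43, 44],
    "C": [39, 40, 42, 43, 46],
}

for name, ages in data_sets.items():
    print(name, round(mean_absolute_deviation(ages), 1))
# A 3.6
# B 1.2
# C 2.0
```

All three data sets have the same mean of 42, so the differences in MAD come only from how far the ages lie from that common mean.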

For grouped data the MAD is obtained using the formula below:

\[ MAD = \frac{\sum f\,|X - \bar{x}|}{N} \]

where: X = class mark or midpoint of each class
       f = frequency of each class
       $\bar{x}$ = mean score or mean value
       N = total number of cases

Example 3.4
Using the data in ​Example 1.1 ​we find the mean absolute deviation of grouped data.
Scores of 50 students in Statistics examination
Class Interval    f     X       |X − x̄|    f|X − x̄|
96 – 103          1     99.5    24.96      24.96
88 – 95           5     91.5    16.96      84.80
80 – 87           14    83.5    8.96       125.44
72 – 79           13    75.5    0.96       12.48
64 – 71           8     67.5    7.04       56.32
56 – 63           4     59.5    15.04      60.16
48 – 55           3     51.5    23.04      69.12
40 – 47           2     43.5    31.04      62.08
                  N = 50                   Σf|X − x̄| = 495.36

Recall: x̄ = 74.54 (from Example 2.2)

Thus,

\[ MAD = \frac{\sum f\,|X - \bar{x}|}{N} = \frac{495.36}{50} = 9.9072 \approx 9.91 \]

This implies that the 50 scores deviate from the mean of 74.54 by an average of 9.91 units.
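
The grouped MAD can be sketched in Python by computing the class marks and the mean from the frequency table. The function name `grouped_mad` and the tuple layout are illustrative assumptions.

```python
# (apparent lower limit, apparent upper limit, frequency), lowest class first
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]


def grouped_mad(classes):
    """MAD = sum(f * |X - mean|) / N, where X is the class mark of each class."""
    rows = [((lower + upper) / 2, f) for lower, upper, f in classes]  # (class mark, f)
    n = sum(f for _, f in rows)
    mean = sum(f * x for x, f in rows) / n
    return sum(f * abs(x - mean) for x, f in rows) / n


print(round(grouped_mad(classes), 2))  # 9.91
```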

Variance and Standard Deviation (Raw and Grouped Data)

The last two measures of dispersion or measures of variation to be included in this module are the
variance and standard deviation. Bluman, in his book Elementary Statistics, defines variance as the
average of the squares of the distance of each score or value from the mean, while the
standard deviation is the square root of the variance. It looks at how spread out a group of
numbers is from the mean (https://www.investopedia.com).

The population variance and standard deviation are calculated using the following respective
formulas:

Variance ($\sigma^2$, read as “sigma squared”):

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Standard Deviation ($\sigma$ = square root of the variance):

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

where: $\sigma^2$ = population variance
       $\sigma$ = population standard deviation
       X = the item or observation
       $\mu$ = population mean
       N = total number of cases

Example 3.5
The following data are ages of 10 teachers in one Elementary School:

27, 34, 30, 29, 28, 30, 34, 35, 38, 29.

Find the variance and standard deviation of this population data.

Solution: ​To compute for the variance​, we present the data as shown in the table below:
Age (X)      X − μ      (X − μ)²
27           -4.4       19.36
34           2.6        6.76
30           -1.4       1.96
29           -2.4       5.76
28           -3.4       11.56
30           -1.4       1.96
34           2.6        6.76
35           3.6        12.96
38           6.6        43.56
29           -2.4       5.76
μ = 31.4                Σ(X − μ)² = 116.4

By substituting into the formula, the variance is

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{116.4}{10} = 11.64 \]

When the variance is zero (0) it indicates that all of the data values are the same. Thus, there is no
variation. Since a variance is an average of squares, it follows that all non-zero variances
are positive. A small variance indicates that the data points tend to be very close to the mean, and
to each other. A high variance indicates that the data points are very spread out from the mean, and
from one another (MathBits.com).

What does a population variance of 11.64 mean? Since the value of 11.64 is far from zero,
this implies that the observations are more spread from one another and from the mean.

From the above value of the population variance, it follows that the population standard deviation,
which is the square root of the variance, is:

\[ \sigma = \sqrt{11.64} \approx 3.41 \]

We recall that the standard deviation measures how concentrated the data are around the mean;
the more concentrated, the smaller the standard deviation
(https://www.dummies.com/education/math/statistics). What is the implication of the above value
in relation to the mean of the given data set?
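
The population variance and standard deviation of Example 3.5 can be checked with the sketch below, which uses the ten ages listed in the solution table. The function name `population_variance` is illustrative; Python's standard `statistics.pvariance` is used only as a cross-check.

```python
import math
import statistics

ages = [27, 34, 30, 29, 28, 30, 34, 35, 38, 29]  # ages as listed in the solution table


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N, treating the data as a whole population."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


variance = population_variance(ages)
std_dev = math.sqrt(variance)

print(round(variance, 2))                     # 11.64
print(round(std_dev, 2))                      # 3.41
print(round(statistics.pvariance(ages), 2))   # 11.64 (library cross-check)
```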

Example 3.6
Using the data set of Example 3.3, determine the variance and standard deviation of each subset
of data. Compare your results. The table is reproduced below.

Ages of female faculty members from three departments


Statistical Measure    Implication/Impression                  Data Set
                                                               A      B      C
                                                               37     40     39
                                                               38     41     40
                                                               42     42     42
                                                               45     43     43
                                                               48     44     46
Mean                   Equal distribution                      42     42     42
Range                  Distribution A is more spread. Why?     11     4      7
MAD                    Distribution B is least variable        3.6    1.2    2.0
                       compared to the other two data
                       sets. Why?
Variance
Standard Deviation

Computing Sample Variance and Standard Deviation

The table below shows the different notations used for the variance and standard deviation.

Notation    Statistical Measure
σ²          Variance of a population
σ           Standard deviation of a population
s²          Variance of a sample
s           Standard deviation of a sample

If the data set is taken from a sample, the variance and standard deviation are obtained using the
following computational formulas (Bluman, p.137):

Sample Variance:

\[ s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)} \]

Sample Standard Deviation (square root of the variance):

\[ s = \sqrt{\frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}} \]

where: $s^2$ = sample variance
       X = individual observation
       n = sample size

Example 3.7
Find the sample variance and standard deviation for the daily production rate of fiberglass boats of
a certain manufacturer. If the company production manager feels that a standard deviation of more
than three boats a day is unacceptable, should the manager be concerned about the plant
production rate? Why?

17 21 18 27 17 21 20 22 18 23

Solution:

X     17    21    18    27    17    21    20    22    18    23     ΣX = 204
X²    289   441   324   729   289   441   400   484   324   529    ΣX² = 4,250

\[ s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)} = \frac{(10)(4250) - (204)^2}{10(10 - 1)} \]

\[ s^2 = \frac{42500 - 41616}{90} = \frac{884}{90} \approx 9.82 \]

From the above value of the sample variance, it follows that the sample standard deviation, which is
the square root of the variance, is:

\[ s = \sqrt{9.82} \approx 3.13 \]

REMARK:

The obtained sample standard deviation of 3.13 is slightly greater than three boats a day, the
limit that the production manager considers acceptable. Thus, the manager has reason to be
concerned about the variability of the plant's daily production rate.
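
The computational (shortcut) formula for the sample variance can be sketched in Python and compared with the standard library's `statistics.variance`, which also divides by n − 1. The function name `sample_variance` is an illustrative choice.

```python
import math
import statistics

boats = [17, 21, 18, 27, 17, 21, 20, 22, 18, 23]  # daily production, Example 3.7


def sample_variance(values):
    """s^2 = (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)                    # sum of X   = 204
    sum_x2 = sum(x * x for x in values)    # sum of X^2 = 4250
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2 = sample_variance(boats)
s = math.sqrt(s2)

print(round(s2, 2))                          # 9.82
print(round(s, 2))                           # 3.13
print(round(statistics.variance(boats), 2))  # 9.82 (library cross-check)
```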

Computing Sample Variance and Standard Deviation from Grouped Data

For grouped data we find the variance and standard deviation using the following computational
formulas (Bluman, p.139):

Variance:

\[ s^2 = \frac{n\left(\sum fX^2\right) - \left(\sum fX\right)^2}{n(n - 1)} \]

Standard Deviation:

\[ s = \sqrt{\frac{n\left(\sum fX^2\right) - \left(\sum fX\right)^2}{n(n - 1)}} \]

where: f = class frequency
       X = class mark
       n = total number of observations

Example 3.8
Using the data in Example 1.1 we find the variance and standard deviation of grouped data. The
table is reproduced below:

Scores of 50 students in Statistics examination


Class Interval    f     X       fX        fX²
96 – 103          1     99.5    99.5      9900.25
88 – 95           5     91.5    457.5     41861.25
80 – 87           14    83.5    1169.0    97611.50
72 – 79           13    75.5    981.5     74103.25
64 – 71           8     67.5    540.0     36450.00
56 – 63           4     59.5    238.0     14161.00
48 – 55           3     51.5    154.5     7956.75
40 – 47           2     43.5    87.0      3784.50
                  N = 50        ΣfX = 3727    ΣfX² = 285,828.50

Substituting into the above computational or shortcut formula, we obtain the sample variance as
follows:

\[ s^2 = \frac{n\left(\sum fX^2\right) - \left(\sum fX\right)^2}{n(n - 1)} = \frac{(50)(285828.50) - (3727)^2}{(50)(50 - 1)} \]

\[ s^2 = \frac{14291425 - 13890529}{(50)(49)} = \frac{400896}{2450} \approx 163.63 \]

With the above sample variance of 163.63, it follows that the sample standard deviation (s),
which is the square root of the variance, is 12.79. This implies that the scores of the 50 students
deviate from the mean, on the average, by a distance of 12.79 units.
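
The grouped shortcut formula can be sketched in Python as well. The function name `grouped_sample_variance` and the tuple layout are illustrative assumptions; the class mark is taken as the midpoint of the apparent limits.

```python
import math

# (apparent lower limit, apparent upper limit, frequency), lowest class first
classes = [
    (40, 47, 2), (48, 55, 3), (56, 63, 4), (64, 71, 8),
    (72, 79, 13), (80, 87, 14), (88, 95, 5), (96, 103, 1),
]


def grouped_sample_variance(classes):
    """s^2 = (n * sum(f X^2) - (sum f X)^2) / (n * (n - 1)), with X the class mark."""
    rows = [((lower + upper) / 2, f) for lower, upper, f in classes]  # (class mark, f)
    n = sum(f for _, f in rows)
    sum_fx = sum(f * x for x, f in rows)         # 3727
    sum_fx2 = sum(f * x * x for x, f in rows)    # 285828.5
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


s2 = grouped_sample_variance(classes)
print(round(s2, 2))             # 163.63
print(round(math.sqrt(s2), 2))  # 12.79
```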

There is another method of computing the sample variance and sample standard deviation by using
the Coded Deviation. However, its discussion is not included in this module.

Exercises 3.1

A. Using Exercise 13.2 on page 823 of the book, Mathematical Excursion by Aufman,
answer numbers 4 to 8 and 12.

B. Using the same exercise, answer number 20 on page 824 on the ages of the female
and male actors Academy awardees. Answer questions a, b, and c found at the end
of the exercise.

C. Critical Thinking
Using the exercise no. 26 on page 825 perform the suggested activity and answer
the question found at the end.