
ENGINEERING DATA ANALYSIS (PROBABILITY AND STATISTICS)

Math 212

Instructor: Engr. Gaudencio T. Tiwing, PSE, MBA

Chapter 1: INTRODUCTION
In any investigation, whether quantitative or qualitative, a decision has to be made. Before a
decision can be made, pertinent information has to be gathered and a plan should be conceived on
how to deal with the information gathered. Thus, to give meaning to this information, statistical
methods have to be employed.
In any school of thought, a layman would be thinking of the chance that certain events may
happen or may not happen. In this regard, the layman unknowingly has in mind the functions of
statistics.
MEANING OF STATISTICS
- The term statistics refers to a set of pertinent activities such as collection, presentation,
analysis, and interpretation of quantitative data. It is a field of study which deals with
mathematical characterization of a group or groups of items.
Collection of data refers to the process of gathering numerical information. Methods of gathering
pertinent information include interview, questionnaire, experiments, observation, and documentary
analysis.
Once the data are gathered, the next step in statistical inquiry is the presentation of data in
appropriate tables and graphs. Such tables are frequency distributions, which may be
one-dimensional. Graphical presentation includes bar graphs, frequency polygons, pie graphs, and
many others.
Analysis of data refers to the activity of describing the properties or behavior of the data or the
possible correlation of different quantities or variables. Such description can be obtained after
summarizing the data into measurements like the average.
Finally, interpretation has to be made based on the preliminary activities and other statistical
methods. Such methods involve testing the significance of the results.
NATURE OF STATISTICS
Descriptive Statistics
- Refers to that field of statistics that includes the methods of collecting, classifying, graphing,
and averaging data with the objective of simply describing the properties or characteristics of
the data on hand. Thus, the task of the statistician in this area is simply to select a few
procedures, do some averaging, and eventually be able to identify significant features of the
given data.
Inferential Statistics
- Demands a somewhat higher degree of critical judgment and advanced mathematical models.
This field is concerned with drawing conclusions or generalizations from organized data. Thus,
the task of the statistician here is not just to devise ways to give a summary description of the
data but ways to test the significance of the results.
Universe
- Group under consideration (where raw data will come from).
Variable
- Characteristic under consideration.
a. Qualitative variable (non-numerical)
b. Quantitative (numerical)
b.1. Discrete quantitative variable (countable)
b.2. Continuous quantitative variable (measurable)
Population
- Refers to the totality of all objects under study. Such objects have common attributes which
are grouped with the objective of determining certain trends that might be useful to the
researcher.
- A set containing all the possible outcome/answer of the variable.
Ex. Tossing a die
P = {1, 2, 3, 4, 5, 6}
Sample
- Small part that serves as the representative of the population.
- Subset of the population.
A = sample representing the result as odd number
A = {1, 3, 5}
B = sample representing the result as a perfect square
B = {1, 4}
To compute the sample size n relative to the population size N, use the formula
$$ n = \frac{N}{1 + N e^{2}} $$
where N = population size
n = sample size
e = margin of error
Examples
1. Find the sample size if the population size is 250 at 95% accuracy (that is, a margin of error e = 0.05).
$$ n = \frac{250}{1 + 250(0.05)^{2}} = 153.85 \approx 154 $$
2. A researcher is conducting an investigation regarding the factors affecting the efficiency of the
185 faculty members of a certain university. If he wanted to have a margin of error of 5%, then
how many of the faculty members should be taken as respondents?
$$ n = \frac{185}{1 + 185(0.05)^{2}} = 126.5 \approx 127 $$
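The two examples above can be checked with a short Python sketch. The function name `sample_size` is ours, and rounding up to the next whole respondent is an assumption suggested by the worked answers (153.85 becomes 154 and 126.5 becomes 127).

```python
import math


def sample_size(population: int, margin_of_error: float) -> int:
    """Sample size n = N / (1 + N e^2), rounded up (assumed rounding rule)."""
    raw = population / (1 + population * margin_of_error ** 2)
    return math.ceil(raw)


print(sample_size(250, 0.05))  # Example 1
print(sample_size(185, 0.05))  # Example 2
```

The sketch should reproduce the answers 154 and 127 obtained by hand.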
SUMMATION NOTATION, Σ
The sum of n observations represented by x1, x2, x3, . . . , xn is usually written as
$$ x_1 + x_2 + x_3 + \cdots + x_n $$
or, in sigma notation,
$$ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n $$

The left-hand side of the equation is read as "the sum of all x sub i for i = 1 to n."

Rule 1. If c is a constant, then the sum of n copies of c is equal to c times the number of
constants, n.
$$ \sum_{i=1}^{n} c = c + c + c + \cdots + c = nc $$

Examples
$$ \sum_{i=1}^{3} 6 = 6 + 6 + 6 = 18 $$
$$ \sum_{i=1}^{3} 6 = 3(6) = 18 $$

Expand the following:


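$$ \sum_{i=3}^{7} \frac{(i+1)\,x_i}{(i-2)\,y_{i+2}} = \frac{4x_3}{y_5} + \frac{5x_4}{2y_6} + \frac{6x_5}{3y_7} + \frac{7x_6}{4y_8} + \frac{8x_7}{5y_9} $$
$$ \sum_{i=4}^{6} \frac{i\,x_{i-1}}{(i+3)\,y_{i+1}} = \frac{4x_3}{7y_5} + \frac{5x_4}{8y_6} + \frac{6x_5}{9y_7} $$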

Rule 2. If c is a constant multiplied by each of the n observations represented by xi, then the sum
of the products is equal to c times the sum of n observations.
$$ \sum_{i=1}^{n} c x_i = c x_1 + c x_2 + c x_3 + \cdots + c x_n = c\,(x_1 + x_2 + x_3 + \cdots + x_n) $$
$$ \sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i $$

Examples
Suppose $\sum_{i=1}^{4} x_i = 16$. Evaluate $\sum_{i=1}^{4} 5x_i$.
$$ \sum_{i=1}^{4} 5x_i = 5 \sum_{i=1}^{4} x_i = 5(16) = 80 $$
Consider the following observations.
x1 = 12; x2 = 7; x3 = 10; x4 = 13; x5 = 8
Determine the value of the following expressions.
a. $\sum_{i=1}^{5} x_i$        b. $\sum_{i=1}^{5} 3x_i$

Solution
a. $$ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 12 + 7 + 10 + 13 + 8 = 50 $$
b. $$ \sum_{i=1}^{5} 3x_i = 3 \sum_{i=1}^{5} x_i = 3(50) = 150 $$

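Write the following in summation notation.

$$ \frac{x_2}{3y_5} + \frac{4x_3}{4y_6} + \frac{9x_4}{5y_7} + \frac{16x_5}{6y_8} = \sum_{i=2}^{5} \frac{(i-1)^{2}\,x_i}{(i+1)\,y_{i+3}} = \sum_{i=1}^{4} \frac{i^{2}\,x_{i+1}}{(i+2)\,y_{i+4}} = \sum_{i=5}^{8} \frac{(i-4)^{2}\,x_{i-3}}{(i-2)\,y_i} $$
$$ \frac{5x_2}{6y_3} + \frac{5x_3}{7y_4} + \frac{5x_4}{8y_5} = 5 \sum_{i=2}^{4} \frac{x_i}{(i+4)\,y_{i+1}} = 5 \sum_{i=6}^{8} \frac{x_{i-4}}{i\,y_{i-3}} $$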
Rule 3. If xi and yi are two random quantities, then
$$ \sum_{i=1}^{n} (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + \cdots + (x_n + y_n) = (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n) $$
$$ \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i $$

Examples
Consider the following quantities.
x1 = 3; x2 = 5; x3 = 7; x4 = 6
y1 = 5; y2 = 8; y3 = 8; y4 = 9
Evaluate the expression $\sum_{i=1}^{4} (x_i + y_i)$.

Solution
$$ \sum_{i=1}^{4} (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + (x_3 + y_3) + (x_4 + y_4) = (3 + 5) + (5 + 8) + (7 + 8) + (6 + 9) = 8 + 13 + 15 + 15 = 51 $$
Alternatively, by Rule 3,
$$ \sum_{i=1}^{4} (x_i + y_i) = \sum_{i=1}^{4} x_i + \sum_{i=1}^{4} y_i = (3 + 5 + 7 + 6) + (5 + 8 + 8 + 9) = 21 + 30 = 51 $$
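The three summation rules can also be checked numerically. The sketch below is only an illustration: it reuses the x and y values of the last example, and the constant c = 5 is an arbitrary choice of ours.

```python
x = [3, 5, 7, 6]
y = [5, 8, 8, 9]
c = 5          # arbitrary constant chosen for illustration
n = len(x)

# Rule 1: the sum of n copies of c equals n times c.
rule1 = sum(c for _ in range(n)) == n * c

# Rule 2: a constant factor can be taken outside the summation.
rule2 = sum(c * xi for xi in x) == c * sum(x)

# Rule 3: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i.
total = sum(xi + yi for xi, yi in zip(x, y))
rule3 = total == sum(x) + sum(y)

print(rule1, rule2, rule3, total)
```

Here `total` is the same quantity evaluated by hand in the example, namely 51.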
QUIZ
1. Determine the sample size of each of the following given the population size with the
corresponding margin of error.
a. N = 350 ; e = 10%
b. N = 5,600 ; e = 5%
2. A researcher is conducting an investigation regarding the factors affecting the level of
efficiency of the municipal mayors of the Philippines. If there are 700 municipal mayors
throughout the country, then determine the sample size he should use if he wants to have a
margin of error of 5%.
3. Consider the following measurements.
x1 = 12; x2 = 16; x3 = 9; x4 = 10; x5 = 15
y1 = 13; y2 = 15; y3 = 12; y4 = 8; y5 = 17
Determine the value of the following expressions.
a. $$ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 12 + 16 + 9 + 10 + 15 = 62 $$
b. $$ \sum_{i=1}^{4} (x_i + y_i) = \sum_{i=1}^{4} x_i + \sum_{i=1}^{4} y_i = (12 + 13) + (16 + 15) + (9 + 12) + (10 + 8) = 95 $$
4. Expand the given expression.
$$ \sum_{i=4}^{8} \frac{i\,x_{i-1}}{(i-2)\,y_{i+2}} = \frac{4x_3}{2y_6} + \frac{5x_4}{3y_7} + \frac{6x_5}{4y_8} + \frac{7x_6}{5y_9} + \frac{8x_7}{6y_{10}} $$
Chapter 2: DATA PRESENTATION
Generally, data collected from different sources are usually unorganized and in a form
unsuitable for immediate interpretation. In any statistical investigation, once pertinent data are already
gathered, the next step is to present such data in organized form using appropriate tables and graphs.
In this chapter, we will consider tabular presentation through frequency distribution and different
methods of graphical presentation.
FREQUENCY DISTRIBUTION
Suppose a statistics class with 60 students was given an examination and the results are
shown in Table 2.1 below.
Table 2.1
Test Scores Obtained by the 60 Students in a Statistics Class

48 73 57 57 69 88 11 80 82 47
46 70 49 45 75 81 33 65 38 59
94 59 62 36 58 69 45 55 58 65
30 49 73 29 41 53 37 35 61 48
22 51 56 55 60 37 56 59 57 36
12 36 50 63 68 30 56 70 53 28

Notice that in Table 2.1, no trend or pattern in the scores of the students is evident. Thus it is
desirable that the data be grouped into categories or intervals. The raw data in Table 2.1 can be
presented in a frequency distribution as shown in Table 2.2.
Table 2.2
The Frequency Distribution of the Examination Results of Sixty Students in a Statistics Class
Exam Scores Number of Students
11 – 22 3
23 – 34 5
35 – 46 11
47 – 58 19
59 – 70 14
71 – 82 6
83 – 94 2
n = 60
From Table 2.2, we can say that 3 out of 60 students got scores ranging from 11 to 22.
Nineteen students got scores from 47 to 58 and only two were able to get scores ranging from 83 to
94. Now, what if the students are classified according to their respective courses? See Table 2.3.
Table 2.3
Frequency Distribution of the Sixty Students in Statistics
Grouped According to their Respective Courses
Courses Number of Students
BS Geodetic Engineering 12
BS Civil Engineering 25
BS Electrical Engineering 15
BS Sanitary Engineering 6
BS Mechanical Engineering 2
n = 60
Table 2.3 can be interpreted in the same way as Table 2.2. For example, we can say that 25
out of 60 students are taking BSCE and only 2 students are taking BSME.
Tables 2.2 and 2.3 are examples of frequency distributions. In a frequency distribution, the data
are summarized into classes or categories to show the frequency of occurrence of the values or
objects in each class or category. If the data are grouped according to numerical intervals or classes
as in Table 2.2, then we have a quantitative frequency distribution. When the data are tabulated in
terms of categories as in Table 2.3, then we shall call this table a qualitative frequency distribution.
Consider the frequency distribution in Table 2.2. Notice that the first column represents the
groupings in terms of numerical intervals. These numerical intervals are called classes or class
intervals. Hence the classes in this table are 11 – 22, 23 – 34, . . . , and 83 – 94. The class interval 11
– 22 is the first class or the lowest interval. The interval 23 – 34 is the second, and 83 – 94 shall be
treated as the highest or the seventh class interval.
Any interval is defined by class limits. Class limits refer to the lowest and the highest value that
can be entered in each class. The lowest value that can go in each class is known as the lower class
limit and the highest value that can go in each class is called the upper class limit. To illustrate,
consider the interval 35 – 46. The value 35 is the lower limit and 46 is the upper limit of the interval.
The number of values that fall in a given interval is known as the frequency. The frequencies of
the distribution are usually listed in one column and are represented by ‘f’. Thus in Table 2.2, the value
3 shall mean the frequency of the first interval, 19 shall be the frequency of the interval 47 – 58, and
so on.
In a frequency distribution, it is assumed that the values are evenly distributed within the
interval. There are some instances, however, where an interval has to be summarized and be
represented by a single value. This value called the midpoint or class mark serves as the
representative of the given interval. Generally, the midpoint is obtained by adding the class limits then
dividing the sum by two. Thus, if we let x be the class midpoint and L1 and U1 be the lower and upper
limits of a particular interval, then
$$ x = \frac{L_1 + U_1}{2} $$
To illustrate, consider again the interval 11 – 22. To get the class mark or midpoint of this
interval, we shall have
$$ x = \frac{11 + 22}{2} = 16.5 $$
Likewise, the midpoint of the interval 23 – 34 is
$$ x = \frac{23 + 34}{2} = 28.5 $$
The class boundary, commonly known as the true limit, refers to the value midway between the
upper limit of a certain interval and the lower limit of the next. Thus, if the class limits are whole
numbers, then the boundaries of each interval can be obtained by simply adding 0.5 to the upper limit
and subtracting 0.5 from the lower limit. To illustrate, in the interval 11 – 22, the class boundaries shall
be 11 – 0.5 = 10.5 and 22 + 0.5 = 22.5. The values 10.5 and 22.5 shall be the lower class boundary
and the upper class boundary respectively.
The size of the class interval represented by c, also called class width, can be obtained using
several methods. First, this value can be computed by getting the difference between the boundaries
of a particular class. So, in the interval 23 – 34, the class boundaries are 22.5 and 34.5. Thus, the
class width is 34.5 – 22.5 = 12. Second, the size of the class interval can also be obtained by getting
the difference between two successive upper limits or two successive lower limits. In our frequency
distribution, we may take any two successive lower limits. For example, if we take 23 and 35, the
difference between these two numbers, 35 – 23 = 12, is the size of the class interval. If we consider
two successive upper class limits 58 and 70, the difference, 70 – 58 = 12, shall also be the size of the
class interval.
We can now reproduce the distribution in Table 2.2 to include the notations we introduced and
the class boundaries and class marks or midpoints of the corresponding class intervals.
Table 2.4
The Frequency Distribution of the Examination Results of Sixty Students in a Statistics Class
Classes f x Class Boundaries
11 – 22 3 16.5 10.5 – 22.5
23 – 34 5 28.5 22.5 – 34.5
35 – 46 11 40.5 34.5 – 46.5
47 – 58 19 52.5 46.5 – 58.5
59 – 70 14 64.5 58.5 – 70.5
71 – 82 6 76.5 70.5 – 82.5
83 – 94 2 88.5 82.5 – 94.5
n = 60
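The midpoint, boundaries, and width of a class can be computed directly from its limits. The following is a minimal sketch that assumes whole-number class limits, as in Table 2.4; the helper name `describe_class` is hypothetical.

```python
def describe_class(lower: int, upper: int) -> dict:
    """Class mark, class boundaries, and class width for whole-number limits."""
    lower_boundary = lower - 0.5      # subtract 0.5 from the lower limit
    upper_boundary = upper + 0.5      # add 0.5 to the upper limit
    return {
        "midpoint": (lower + upper) / 2,
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
    }


info = describe_class(23, 34)
print(info)
```

For the interval 23 – 34 this gives the midpoint 28.5, the boundaries 22.5 and 34.5, and the class width 12, in agreement with the table.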
CONSTRUCTION OF A FREQUENCY DISTRIBUTION
Steps:
1. Get the lowest and the highest value in the distribution. We shall let H and L be the highest
and the lowest value in the distribution.
2. Get the value of the range. The range denoted by R, refers to the difference between the
highest and the lowest value in the distribution. Thus,
R = H–L
3. Determine the number of classes. In the determination of the number of classes, it should be
noted that there is no standard method to follow. Generally, the number of classes must not be
less than 5 and should not be more than 15. In some instances, however, the number of
classes can be approximated by using the relation
k = 1 + 3.3 log n
where: k = number of classes
n = sample size
4. Determine the size of the class interval. The value of c can be obtained by dividing the range
by the desired number of classes. Hence,
$$ c = \frac{R}{k} $$
5. Construct the classes. In constructing the classes, we first determine the lowest lower limit of
the distribution. The value of this lower limit can be chosen arbitrarily as long as the lowest
value shall fall on the first interval and the highest value to the last interval.
6. Determine the frequency of each class. The determination of the number of frequencies is
done by counting the number of items that fall in each interval.
Example 1:
Construct the frequency distribution of the data in Table 2.1.
Step 1. Get the lowest and highest value.
H = 94 ; L = 11
Step 2. Get the range.
R = H – L = 94 – 11 = 83
Step 3. Determine the number of class intervals.
k = 1 + 3.3 log n
= 1 + 3.3 log 60
= 6.87 ≈ 7
Step 4. Determine the size of the class interval.
$$ c = \frac{R}{k} = \frac{83}{7} = 11.86 \approx 12 $$
Step 5. Construct the classes. The lowest value is 11. Thus, we can start with 11 or below as the
lowest class limit. If we use 11 as the lowest lower class limit, then the upper class limit
can be obtained by subtracting 1 from the size of the class interval and adding the result to
the lower limit. This procedure shall be repeated until the 7th class interval is constructed.
Step 6. Determine the frequencies. The frequencies are simply counted from the set of
data.

Classes f
11 – 22 3
23 – 34 5
35 – 46 11
47 – 58 19
59 – 70 14
71 – 82 6
83 – 94 2
n = 60
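The six steps can be sketched in Python using the scores of Table 2.1. Several choices below are assumptions made to match the worked example rather than fixed rules: k is rounded to the nearest whole number, c is rounded up, and the lowest score is used as the first lower limit. The function name is ours.

```python
import math

scores = [
    48, 73, 57, 57, 69, 88, 11, 80, 82, 47,
    46, 70, 49, 45, 75, 81, 33, 65, 38, 59,
    94, 59, 62, 36, 58, 69, 45, 55, 58, 65,
    30, 49, 73, 29, 41, 53, 37, 35, 61, 48,
    22, 51, 56, 55, 60, 37, 56, 59, 57, 36,
    12, 36, 50, 63, 68, 30, 56, 70, 53, 28,
]


def frequency_distribution(data):
    """Return a list of (lower limit, upper limit, frequency) tuples."""
    low, high = min(data), max(data)                 # Step 1: L and H
    value_range = high - low                         # Step 2: R = H - L
    k = round(1 + 3.3 * math.log10(len(data)))       # Step 3: number of classes
    c = math.ceil(value_range / k)                   # Step 4: class size
    table = []
    for j in range(k):                               # Step 5: build the classes
        lower = low + j * c
        upper = lower + c - 1
        f = sum(lower <= v <= upper for v in data)   # Step 6: count the values
        table.append((lower, upper, f))
    return table


for lower, upper, f in frequency_distribution(scores):
    print(f"{lower} - {upper}: {f}")
```

With these assumptions the sketch yields seven classes of size 12, starting at 11 – 22, with the same frequencies as the table above.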
DERIVED FREQUENCY DISTRIBUTION
Given a frequency distribution, we can construct other frequency distributions like the relative
frequency distribution and the cumulative frequency distribution.
Relative Frequency Distribution
The relative frequency distribution of a given set of data shows, in percent, the proportion of
the frequency of each class to the total frequency. The relative frequency, denoted by %f, can be obtained
by dividing the class frequency by the sample size and multiplying the result by 100. The formula for
converting the class frequency to percent is
$$ \%f = \frac{f}{n} \times 100 $$
where: %f = the relative frequency for each class interval
f = the frequency of each class
n = sample size
Classes %f
11 – 22 5.00
23 – 34 8.33
35 – 46 18.33
47 – 58 31.67
59 – 70 23.33
71 – 82 10.00
83 – 94 3.33
Total = 99.99% (slightly less than 100% because each entry is rounded)
To interpret the result, we can say that 5% of the class got scores ranging from 11 – 22, 8.33%
of the class got scores ranging from 23 – 34, and so on.
Cumulative Frequency Distribution
The cumulative frequency distribution can also be derived from the frequency distribution. This
distribution can be obtained by simply adding the class frequencies. Unlike the relative frequency
distribution where the frequencies are converted as percent of the sample size, this type of distribution
tries to determine partial sums from the data classified in terms of classes. This distribution answers
problems like “the number of students who got a passing mark”, “the number of employees who got
an efficiency rating from 75% to 95%”, and so on.
There are two kinds of cumulative frequency distribution. These are as follows:
1. Less than cumulative frequency distribution – refers to the distribution whose frequencies are
less than or below the upper class boundary they correspond to. We shall let <cumf be the
less than cumulative frequency.
2. Greater than cumulative frequency – refers to the distribution whose frequencies are greater
than or above the lower class boundary they correspond to. We shall let >cumf be the greater
than cumulative frequency.
The less than cumulative frequency (a) and greater than cumulative frequency (b) are
shown below.
(a) Classes <cumf
11 – 22 3
23 – 34 8
35 – 46 19
47 – 58 38
59 – 70 52
71 – 82 58
83 – 94 60
(b) Classes >cumf
11 – 22 60
23 – 34 57
35 – 46 52
47 – 58 41
59 – 70 22
71 – 82 8
83 – 94 2
Notice that in the less than cumulative frequency, the cumulative frequency 3 corresponds to
the upper class boundary 22.5. Hence, we can say that 3 students were able to get scores less than
22.5. Also, the interval 23 – 34 contains the frequency 5. Hence, we have 3 + 5 = 8 values less than
the upper class boundary 34.5. The interval 35 – 46 contains the frequency 11. Thus, we can say that
there are 3 + 5 + 11 = 19 values or examination scores less than the upper class boundary 46.5.
In the case of the greater than cumulative frequency, the frequencies are added in reverse
starting from the frequency of the highest interval. To illustrate, the number of values corresponding to
the interval 83 – 94 is 2. Hence, we may say that there are 2 values greater than the lower class
boundary 82.5. Similarly, the interval 71 – 82 contains 6 values. Thus, we can say that there are 2 + 6
= 8 values greater than the lower boundary 70.5. The interval 59 – 70 contains 14 values. Therefore,
we can say that there are 2 + 6 + 14 = 22 values greater than the lower class boundary 58.5.
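The derived distributions can be reproduced from the class frequencies alone. The sketch below is one possible way to do it in Python; the variable names are ours, and the relative frequencies are rounded to two decimal places as in the table.

```python
from itertools import accumulate

freqs = [3, 5, 11, 19, 14, 6, 2]   # class frequencies, lowest class first
n = sum(freqs)

# Relative frequency in percent: (f / n) x 100.
relative = [round(f / n * 100, 2) for f in freqs]

# "Less than" cumulative frequency: running total from the lowest class.
less_than = list(accumulate(freqs))

# "Greater than" cumulative frequency: running total from the highest class.
greater_than = list(accumulate(reversed(freqs)))[::-1]

print(relative)
print(less_than)
print(greater_than)
```

The three printed lists correspond to the %f column and to the two cumulative frequency columns (a) and (b).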
GRAPHICAL PRESENTATION
Graphical presentation refers to the pictorial presentation of data. In any statistical investigation,
the data presented in a graph enables the researcher to see at a glance the general characteristics
and special features of such data. Reduction into visual form, therefore, often leads to the greater
understanding that could facilitate the solution of the problem. In this section, we shall consider
different methods of data presentation: the histogram, frequency polygon, cumulative frequency
polygon or ogive, and the pie graph.
Histogram
Histogram refers to a data presentation that uses bars in presenting the frequencies of each
class. The graph is usually presented in quadrant I of a two-dimensional coordinate system.
Generally, the horizontal axis represents the classes and the vertical axis represents the frequency.
The horizontal axis is subdivided into equal intervals where one subinterval represents a class. It
should be noted that classes are one unit apart and are defined by class limits. In the case of the
frequencies, the vertical axis is also subdivided into equal intervals.
After subdividing the axes, the bar is then drawn for each class. The width of the bar is equal to
the size of the class interval and the height corresponds to the frequency. This implies that the higher
the frequency, the taller the bar. The histogram of the frequency distribution in Table 2.2 is shown in
Figure 2.1.

[Figure: histogram. Horizontal axis: CLASS (11 – 22 through 83 – 94). Vertical axis: FREQUENCY.]
Figure 2.1. Histogram of the Examination Scores of Sixty Students in Statistics

Frequency Polygon
In plotting the histogram, we assume that frequencies are evenly distributed within the interval.
In a frequency polygon, we assume that the frequencies of each interval are concentrated at the
midpoint of the interval. Instead of drawing bars to represent the interval, we simply make a dot above
the bar to represent the position of the midpoint within the interval. Thus, in a frequency polygon, the
horizontal axis is subdivided into subintervals and the points that divide these subintervals represent
the midpoints.
The frequency polygon based on the same data as in the histogram in Figure 2.1 is shown in
Figure 2.2. Observe that the frequency polygon in Figure 2.2 is not a smooth continuous curve since
the various points are joined by straight line segments.
[Figure: frequency polygon. Horizontal axis: CLASS MARK (16.5 through 88.5). Vertical axis: FREQUENCY.]
Figure 2.2. Frequency Polygon of the Examination Scores of Sixty Students in Statistics
<Ogive and >Ogive
The construction of <ogive and >ogive is different from that of the frequency polygon. Instead of
plotting points corresponding to class marks and frequencies, we plot points corresponding to class
boundaries and cumulative frequencies. This is done because we want our graph to visually represent
the number of cases falling above or below particular values. In plotting the cumulative frequency of
the exam scores of the students in Statistics, we shall plot the less than cumulative frequency
equivalent to 3 against the upper class boundary 22.5, the cumulative frequency 8 versus 34.5, and
so on. Figure 2.3 shows the <ogive and >ogive of the said examination scores.

[Figure: less than ogive and greater than ogive. Horizontal axis: Class Boundary (10.5 through 94.5). Vertical axis: cumf.]
Figure 2.3. Graph of <Ogive and >Ogive of the Examination Scores of Sixty Students in Statistics
Pie Graph

[Figure: pie graph with one sector per class. Legend: 1 = 11 – 22; 2 = 23 – 34; 3 = 35 – 46; 4 = 47 – 58; 5 = 59 – 70; 6 = 71 – 82; 7 = 83 – 94.]

Figure 2.4. Pie Graph of the Examination Scores of Sixty Students in Statistics

Chapter 3: MEASURES OF CENTRAL TENDENCY


Presenting a set of data in a frequency distribution table usually serves as a preliminary step in
summarizing the data in one or two values. Such data can be summarized by considering one of the
most important concepts in statistical investigation – the concept of central tendency or average.
In this chapter, we shall consider the three most commonly used averages: the mean, the
median, and the mode. Extensions of the concept of the median, such as the quartile, the decile, and the
percentile, shall also be considered.
MEAN
One of the simplest and most efficient measures of central tendency is the mean. It is the value
obtained by adding the values in the distribution and dividing the sum by the total number of values.
Notice that all the values in the distribution are taken into consideration when computing the value of
the mean.
Mean for Ungrouped Data
To compute the mean for ungrouped data, we shall let $\bar{x}$ (read as "x bar") be the value of the mean.
$$ \bar{x} = \frac{\text{sum of all the values in the distribution}}{\text{number of values in the distribution}} = \frac{\sum x}{n} $$
Examples
1. Consider the following values
21, 10, 36, 42, 39, 52, 30, 25, 26
Compute the value of the mean.
$$ \bar{x} = \frac{\sum x}{n} = \frac{21 + 10 + 36 + 42 + 39 + 52 + 30 + 25 + 26}{9} = \frac{281}{9} = 31.22 $$
2. The ages of 15 students in a certain class were taken and shown below.
15, 18, 17, 16, 19, 21, 18, 23, 24, 18, 16, 17, 20, 21, 19
Determine the mean age of the students.
$$ \bar{x} = \frac{15 + 18 + 17 + 16 + 19 + 21 + 18 + 23 + 24 + 18 + 16 + 17 + 20 + 21 + 19}{15} = \frac{282}{15} = 18.80 $$
3. The daily sales of ABC Enterprises for the first seven days of a certain month are shown below.
₱ 5,286 ₱ 10,826 ₱ 2,580 ₱ 6,386 ₱ 4,650 ₱ 3,635 ₱ 8,625
Determine the daily mean sales of the store for the first seven days.

$$ \bar{x} = \frac{\sum x}{n} = \frac{5{,}286 + 10{,}826 + 2{,}580 + 6{,}386 + 4{,}650 + 3{,}635 + 8{,}625}{7} = \frac{41{,}988}{7} $$
$\bar{x}$ = ₱ 5,998.29
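The three examples above can be verified with a few lines of Python. This is only a sketch; the function name `mean` and the variable names are ours, and results are rounded to two decimal places as in the text.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


values = [21, 10, 36, 42, 39, 52, 30, 25, 26]
ages = [15, 18, 17, 16, 19, 21, 18, 23, 24, 18, 16, 17, 20, 21, 19]
sales = [5286, 10826, 2580, 6386, 4650, 3635, 8625]   # in pesos

print(round(mean(values), 2))
print(round(mean(ages), 2))
print(round(mean(sales), 2))
```

The printed values should agree with the hand computations: 31.22, 18.80, and 5,998.29.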
Weighted Mean
There are some instances where, in the computation of the mean of a set of data, each value in
the distribution is associated with a certain weight or degree of importance. For example, a student in
a certain college is enrolled in 6 subjects where not all of the subjects carry a three-unit load.
Assuming further, that the said student was able to obtain the following grades as shown below.
Subject No. of Units Grade
1 3 2.0
2 3 3.0
3 5 1.25
4 1 3.0
5 2 2.5
6 3 2.5
If each subject carried a 3-unit load, then the mean grade could be obtained by simply adding the
grades in the third column and then dividing the sum by 6.
Since the number of units per subject differs, each weight should be considered by
first multiplying the number of units per subject by its corresponding grade. The products are added
and the sum is divided by the total number of units. We shall call this the weighted mean.
The method discussed above can be represented by the formula,
$$ \bar{x} = \frac{\sum wx}{\sum w} $$
where: x = the item values
w = the weight associated with x
Example
Suppose we are interested in computing the weighted mean grade of the student in our
previous example as shown below.
Subject No. of Units (w) Grade (x)
1 3 2.0
2 3 3.0
3 5 1.25
4 1 3.0
5 2 2.5
6 3 2.5
To compute the value of the weighted mean, we have
$$ \bar{x} = \frac{\sum wx}{\sum w} = \frac{3(2.0) + 3(3.0) + 5(1.25) + 1(3.0) + 2(2.5) + 3(2.5)}{3 + 3 + 5 + 1 + 2 + 3} = \frac{36.75}{17} = 2.16 $$
The weighted mean can also be computed by constructing another column
representing the products of the item values and their corresponding weights. The sum of
these products shall be the numerator of the equation.
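The weighted mean of the six grades can be checked as follows. The sketch takes the units as weights and the grades as item values; the function name `weighted_mean` is ours.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


units = [3, 3, 5, 1, 2, 3]                  # weights w
grades = [2.0, 3.0, 1.25, 3.0, 2.5, 2.5]    # item values x

products = [w * x for w, x in zip(units, grades)]   # the extra "wx" column
print(sum(products), sum(units))
print(round(weighted_mean(grades, units), 2))
```

The list `products` plays the role of the additional column described above; its sum is the numerator 36.75, and the weighted mean is 2.16.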
Mean for Grouped Data
To compute the value of the mean of a data presented in a frequency distribution, we shall
consider two methods:
1. Midpoint Method
2. Unit Deviation Method
In using the Midpoint Method, the midpoint of each class interval is taken as the representative
of each class. These midpoints are multiplied by their corresponding frequencies. The products are
added and the sum is divided by the total of the frequencies. The value obtained is considered
the mean of the grouped data. The formula is
$$ \bar{x} = \frac{\sum fx}{n} $$
where: f = the frequency of each class
x = the midpoint of each class
n = total number of frequencies or sample size
To be able to apply the above equation, we shall follow the steps below.
1. Get the midpoint of each class.
2. Multiply each midpoint by its corresponding frequency.
3. Get the sum of the product in step 2.
4. Divide the sum obtained in step 3 by the total number of frequencies. The result shall be
rounded off to two decimal places.
Example
Consider the frequency distribution of the examination scores of the sixty students in a
statistics class. Compute the value of the mean.
Step 1. Get the midpoint of each class. The midpoints are shown in the third column.
Classes f x
11 – 22 3 16.5
23 – 34 5 28.5
35 – 46 11 40.5
47 – 58 19 52.5
59 – 70 14 64.5
71 – 82 6 76.5
83 – 94 2 88.5
Step 2. Multiply each midpoint by its corresponding frequency. The products are shown in the fourth
column.
Classes f x fx
11 – 22 3 16.5 49.5
23 – 34 5 28.5 142.5
35 – 46 11 40.5 445.5
47 – 58 19 52.5 997.5
59 – 70 14 64.5 903.0
71 – 82 6 76.5 459.0
83 – 94 2 88.5 177.0
Step 3. Get the sum of the products in step 2.
Classes f x fx
11 – 22 3 16.5 49.5
23 – 34 5 28.5 142.5
35 – 46 11 40.5 445.5
47 – 58 19 52.5 997.5
59 – 70 14 64.5 903.0
71 – 82 6 76.5 459.0
83 – 94 2 88.5 177.0
n = 60 Σ fx = 3,174.0
Step 4. Divide the result in step 3 by the sample size. The result is the mean of the distribution.
$$ \bar{x} = \frac{\sum fx}{n} = \frac{3174}{60} = 52.90 $$
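The Midpoint Method can be sketched in Python as shown here. Each class is written as a (lower limit, upper limit, frequency) tuple, which is a representation chosen by us for illustration.

```python
classes = [
    (11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
    (59, 70, 14), (71, 82, 6), (83, 94, 2),
]


def grouped_mean(table):
    """Mean of grouped data by the Midpoint Method: sum of f*x divided by n."""
    n = sum(f for _, _, f in table)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return sum_fx / n


print(round(grouped_mean(classes), 2))
```

The result should match the mean of 52.90 computed in Step 4.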
The alternative method of computing the value of the mean for grouped data is the Unit
Deviation Method. Instead of using midpoints, this method uses unit deviations. This method is
usually implemented by considering an arbitrary point as the initial step in approximating the value of
the mean. This point is the midpoint of any class interval. For conventional purposes, however, the
midpoint of the class interval with the highest frequency will be the arbitrary value and shall be called
the assumed mean. The interval containing the assumed mean shall be referred to as the mean class.
The next step is done by constructing the unit deviation column. This step involves assigning a
deviation value of 0 to the assumed class mean and the other class marks with successive integers.
For example, if the distribution has 9 classes and the fifth class interval is the assumed class mean,
then the entries in the unit deviation column shall be -4, -3, -2, -1, 0, 1, 2, 3, and 4. However, if the
assumed class mean is the fourth class interval, then the entries in the unit deviation column will be
-3, -2, -1, 0, 1, 2, 3, 4, and 5 respectively. The unit deviations are usually represented by d.
The third step is implemented by multiplying the frequency by their corresponding unit
deviations. The products are added and the sum is divided by the sample size. The result is then
multiplied by the size of the class interval.
Finally, the value of the mean is determined by adding the product to the assumed mean.
The formula will be as follows:
$$ \bar{x} = x_o + \left( \frac{\sum fd}{n} \right) c $$
where: xo = the assumed mean
f = frequency of each class
d = unit deviation
c = size of the class interval
n = sample size
To be able to apply the above equation, we shall follow the steps below.
1. Choose an assumed mean by getting the midpoint of any interval.
2. Construct the unit deviation column.
3. Multiply the frequencies by their corresponding unit deviations. Add the products.
4. Divide the sum in step 3 by the sample size.
5. Multiply the result in step 4 by the size of the class interval.
6. Add the value obtained in step 5 to the assumed mean. The obtained result which is the mean
should be rounded off to two decimal places.
Example
Compute the value of the mean of the data in the frequency distribution of the examination
scores of the sixty students in a statistics class.
Step 1. Choose an assumed mean.
Classes f
11 – 22 3
23 – 34 5
35 – 46 11
47 – 58 19
59 – 70 14
71 – 82 6
83 – 94 2
An assumed mean may be the midpoint of the class interval 47 – 58.
Step 2. Construct the unit deviation column

Classes f d
11 – 22 3 -3
23 – 34 5 -2
35 – 46 11 -1
47 – 58 19 0
59 – 70 14 1
71 – 82 6 2
83 – 94 2 3
n = 60
Step 3. Multiply the frequencies by their corresponding unit deviations. Add the products.
Classes f d fd
11 – 22 3 -3 -9
23 – 34 5 -2 - 10
35 – 46 11 -1 - 11
47 – 58 19 0 0
59 – 70 14 1 14
71 – 82 6 2 12
83 – 94 2 3 6
n = 60 Σ fd = 2
Steps 4, 5, and 6.
$$ \bar{x} = x_o + \left( \frac{\sum fd}{n} \right) c = 52.5 + \left( \frac{2}{60} \right)(12) = 52.90 $$
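The Unit Deviation Method can be sketched the same way. The code below follows the convention stated earlier of taking the class with the highest frequency as the mean class, and it obtains c as the difference between two successive lower limits. The function name and the tuple layout (lower limit, upper limit, frequency) are ours.

```python
classes = [
    (11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
    (59, 70, 14), (71, 82, 6), (83, 94, 2),
]


def unit_deviation_mean(table):
    """Mean of grouped data: assumed mean + (sum of f*d / n) * c."""
    freqs = [f for _, _, f in table]
    n = sum(freqs)
    mean_index = freqs.index(max(freqs))          # class with highest frequency
    lower, upper, _ = table[mean_index]
    assumed_mean = (lower + upper) / 2            # midpoint of the mean class
    c = table[1][0] - table[0][0]                 # successive lower limits
    sum_fd = sum(f * (i - mean_index) for i, f in enumerate(freqs))
    return assumed_mean + sum_fd / n * c


print(round(unit_deviation_mean(classes), 2))
```

For the examination scores this gives the assumed mean 52.5, the sum of fd equal to 2, and a mean of 52.90, the same value obtained with the Midpoint Method.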
Sample Work

1. The final grades of a student in six subjects where he was enrolled were taken and are shown
below.
Subject No. of Units Final grade fx
Math 1 5 78 390
Chemistry 4 87 348
English 3 90 270
Drawing 2 83 166
Filipino 3 84 252
Com Lab 1 82 82
n = 18 Σfx = 1,508
Determine the weighted mean.

$$ \bar{x} = \frac{\sum fx}{n} = \frac{1508}{18} = 83.78 $$
2. Consider the frequency distribution of the ages of 75 mayors shown below.
Classes f
25 – 30 3
31 – 36 6
37 – 42 11
43 – 48 27
49 – 54 16
55 – 60 7
61 – 66 4
67 – 72 1
Compute the mean age of the mayors using (a) the midpoint method and (b) the unit
deviation method.
a. Midpoint Method
Classes f x fx
25 – 30 3 27.5 82.5
31 – 36 6 33.5 201.0
37 – 42 11 39.5 434.5
43 – 48 27 45.5 1,228.5
49 – 54 16 51.5 824.0
55 – 60 7 57.5 402.5
61 – 66 4 63.5 254.0
67 – 72 1 69.5 69.5
n = 75 Σ fx = 3,496.5
$$ \bar{x} = \frac{\sum fx}{n} = \frac{3496.5}{75} = 46.62 $$
b. Unit Deviation Method
Classes f d fd
25 – 30 3 -3 -9
31 – 36 6 -2 - 12
37 – 42 11 -1 - 11
43 – 48 27 0 0
49 – 54 16 1 16
55 – 60 7 2 14
61 – 66 4 3 12
67 – 72 1 4 4
n = 75 Σ fd = 14
$$ \bar{x} = x_o + \left( \frac{\sum fd}{n} \right) c = 45.5 + \left( \frac{14}{75} \right)(6) = 46.62 $$
MEDIAN
In the process of computing the mean, we observed that all the values are taken into
consideration. Thus, if a distribution contains extreme values, then the value of the mean is usually
pulled either to the right or to the left depending on the position of these extreme values.
We shall now consider a measure of central tendency that does not take into consideration all the
values in the distribution. This measure, called the median, is a positional measure defined as the
middlemost value in the distribution. Hence, this value divides a given set of data into two equal parts.
Median for Ungrouped Data
In the determination of the median of ungrouped data, it is always a must that the values be
arranged in terms of magnitude either from the lowest to highest or vice versa. Suppose a
distribution contains 9 values. Then, the middlemost value in the set of data shall be the fifth value
since there will be four values below it and four values above it. If there are 10 values in the
distribution, then the value of the median shall be the average of the fifth and the sixth values.
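Let $\tilde{x}$ be the median, read as "x curl". With the values arranged in order of magnitude and
$x_{(i)}$ denoting the i-th value,
$$ \tilde{x} = x_{((n+1)/2)} \quad \text{if } n \text{ is odd} $$
$$ \tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} \quad \text{if } n \text{ is even} $$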
Example
1. Find the median of the following values.
21, 10, 36, 42, 39, 52, 30, 25, 26
Before identifying the value of the median, it is necessary that the values be arranged in
terms of magnitude. Thus, we have
10, 21, 25, 26, 30, 36, 39, 42, 52
Since n = 9 is odd, we shall use the equation
$$ \tilde{x} = x_{((n+1)/2)} = x_{((9+1)/2)} = x_{(5)} = 30 $$
where $x_{(5)}$ refers to the fifth value.
2. The following values are the number of students of the first 8 classes in a certain college taken
for inspection:
21, 25, 26, 30, 36, 39, 42, 55
Determine the median.
The values are already arranged in terms of magnitude. Thus we skip the initial step. Since n =
8 and is even, then we shall use the equation
$$ \tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} = \frac{x_{(4)} + x_{(5)}}{2} = \frac{30 + 36}{2} = 33 $$
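The two cases (odd and even n) can be sketched in Python as below. The function name is an assumption made for illustration; the function sorts the values first, as the procedure requires.

```python
def median_ungrouped(values):
    """Middlemost value of a list, averaging the two middle values when n is even."""
    ordered = sorted(values)   # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1)/2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the (n/2)-th and (n/2 + 1)-th values


print(median_ungrouped([21, 10, 36, 42, 39, 52, 30, 25, 26]))  # 30
print(median_ungrouped([21, 25, 26, 30, 36, 39, 42, 55]))      # 33.0
```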
Median for Grouped Data
Just like the mean, the computation of the value of the median is done through interpolation.
The procedure requires the construction of the less than cumulative frequency column (<cumf).
The first step in finding the value of the median is to divide the total number of frequencies by
2. This is consistent with the definition of the median. The value n/2 is used to determine the
cumulative frequency before the median class, denoted by cumfb. cumfb refers to the highest value
under the <cumf column that is less than n/2. The median class refers to the interval that contains the
median, that is, where the (n/2)th value is located. Hence, among the entries under the <cumf column
that are greater than n/2, the smallest is the cumulative frequency of the median class. If a
distribution contains an interval whose cumulative frequency is exactly n/2, then the upper boundary
of that class is the median and no interpolation is needed.
After identifying the median class, we shall approximate the position of the median within the
median class. This approximation is done by subtracting the value of cumfb from n/2. The difference
is then divided by the frequency of the median class, and the quotient is multiplied by the size of
the class interval. The result is added to the lower boundary of the median class to get the median
of the distribution.
The computing formula for grouped data is given below.
$$ \tilde{x} = x_b + \left( \frac{n/2 - cumf_b}{f_m} \right) c $$
where: xb = lower boundary of the median class
       cumfb = cumulative frequency before the median class
       fm = frequency of the median class
       c = size of the class interval
       n = total number of values
To be able to apply the above equation, we shall follow the steps below.
1. Get ½ of the total number of values.
2. Determine the value of cumfb.
3. Determine the median class.
4. Determine the lower boundary and the frequency of the median class and the size of the class
interval.
5. Substitute the values obtained in steps 1 – 4 in the equation. Round off the result to two
decimal places.
Example
Compute the value of the median of the examination scores of the sixty students in a statistics
class.
Construct the less than cumulative frequency column.
Classes f <cumf
11 – 22 3 3
23 – 34 5 8
35 – 46 11 19
47 – 58 19 38
59 – 70 14 52
71 – 82 6 58
83 – 94 2 60
Steps
1. n/2 = 60/2 = 30
2. cumfb = 19
3. Median class: 47 – 58
4. xb = 46.5; fm = 19; c = 12
5. Substitute:
$$ \tilde{x} = x_b + \left( \frac{n/2 - cumf_b}{f_m} \right) c = 46.5 + \left( \frac{30 - 19}{19} \right)(12) = 53.45 $$
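The interpolation can be sketched in Python as follows. The function name and argument layout are assumptions for illustration, and the sketch assumes integer class limits, so that each lower boundary lies 0.5 below the lower limit.

```python
def grouped_median(lower_limits, freqs, class_size):
    """Median of grouped data by interpolation within the median class."""
    n = sum(freqs)
    target = n / 2
    cum_before = 0  # cumulative frequency before the current class (cumf_b)
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= target:
            lower_boundary = lower - 0.5  # assumes integer class limits
            return lower_boundary + (target - cum_before) / f * class_size
        cum_before += f
    raise ValueError("empty frequency distribution")


lower_limits = [11, 23, 35, 47, 59, 71, 83]
freqs = [3, 5, 11, 19, 14, 6, 2]
print(round(grouped_median(lower_limits, freqs, 12), 2))  # 53.45
```

When the cumulative frequency of a class is exactly n/2, the expression reduces to the upper boundary of that class, which agrees with the rule stated earlier.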
Sample Work
A researcher is conducting an investigation regarding the income of the alumni of a certain university
5 years after graduation. The monthly incomes of the 200 respondents were taken and are presented
below.

Classes f <cumf
3,500 – 4,999 6 6
5,000 – 6,499 23 29
6,500 – 7,999 36 65
8,000 – 9,499 40 105
9,500 – 10,999 59 164
11,000 – 12,499 20 184
12,500 – 13,999 8 192
14,000 – 15,499 6 198
15,500 – 16,999 2 200

Determine the median of the monthly income of the 200 respondents.


Steps
1. n/2 = 200/2 = 100
2. cumfb = 65
3. Median class: 8,000 – 9,499
4. xb = 7,999.5; fm = 40; c = 1,500
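5. Substitute:
$$ \tilde{x} = x_b + \left( \frac{n/2 - cumf_b}{f_m} \right) c = 7,999.5 + \left( \frac{100 - 65}{40} \right)(1,500) = 9,312 $$
The median monthly income is ₱9,312.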
MODE
We will now consider the third measure of central tendency known as the mode. This type of
average is the simplest both in concept and in application. By definition, the mode is referred to as
the most frequent value in the distribution. We shall use the symbol $\hat{x}$, read as 'x hat', to
represent the mode.
Mode for Ungrouped Data
In the case of ungrouped data, the value of the mode can be obtained through inspection; thus,
no computation is needed. The mode may or may not exist. If it exists, there can be more than one
mode. Let us consider the following examples.
Example
Consider the following sets of measurements.
A: 21, 23, 16, 15, 26, 27, 19, 24
B: 31, 21, 16, 15, 21, 27, 19, 18
C: 17, 25, 34, 25, 27, 19, 19, 24
In set A, notice that no value occurs more than once. Hence, we can say that the mode of this
set of data does not exist. In set B, the value 21 appears twice. Since this value occurs most
often, we may say that
$$ \hat{x} = 21 $$
In set C, two values are tied as the most frequent in the distribution. Hence, the distribution
contains two values representing the mode. These values are
$$ \hat{x} = 25, 19 $$
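The inspection of sets A, B, and C can be sketched in Python with a frequency count. The function name is an assumption for illustration; it returns an empty list when no value occurs more than once, following the convention used above that such a set has no mode.

```python
from collections import Counter


def modes(values):
    """All most-frequent values, or an empty list when no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so the mode does not exist
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([21, 23, 16, 15, 26, 27, 19, 24]))  # []
print(modes([31, 21, 16, 15, 21, 27, 19, 18]))  # [21]
print(modes([17, 25, 34, 25, 27, 19, 19, 24]))  # [19, 25]
```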
Mode for Grouped Data
In the computation of the value of the mode for grouped data, it is necessary to identify the
class interval that contains the mode. The interval, called the modal class, contains the highest
frequency in the distribution.
The next step after identifying the modal class is to locate the mode within the class. This value
may be approximated from the differences between the frequency of the modal class and the
frequencies of the adjacent classes. Let d1 be the difference between the frequency of the modal
class and the frequency of the interval preceding it, and let d2 be the difference between the
frequency of the modal class and the frequency of the interval following it. The position of the
mode within the class is then approximated by the expression
$$ \left( \frac{d_1}{d_1 + d_2} \right) c $$
If this expression is added to the lower boundary xb of the modal class, we obtain the computing
formula for the value of the mode for grouped data:
$$ \hat{x} = x_b + \left( \frac{d_1}{d_1 + d_2} \right) c $$
To be able to apply the above equation, we shall consider the following steps.
1. Determine the modal class.
2. Get the value of d1.
3. Get the value of d2.
4. Get the lower boundary of the modal class.
5. Apply the formula by substituting the values obtained in the preceding steps.
Example
Consider the frequency distribution of the examination scores of sixty students. Compute the mode
of the distribution.
Classes f
11 – 22 3
23 – 34 5
35 – 46 11
47 – 58 19 --- modal class
59 – 70 14
71 – 82 6
83 – 94 2

d1 = 19 – 11 = 8
d2 = 19 – 14 = 5
$$ \hat{x} = x_b + \left( \frac{d_1}{d_1 + d_2} \right) c = 46.5 + \left( \frac{8}{8 + 5} \right)(12) = 53.88 $$
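The five steps can be sketched in Python as below. The function name is an assumption for illustration. The sketch also assumes integer class limits, a single modal class, and a frequency of 0 for a missing neighbor when the modal class is the first or last interval; the notes do not cover that last case.

```python
def grouped_mode(lower_limits, freqs, class_size):
    """Mode of grouped data from the modal class and its two neighbors."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    before = freqs[i - 1] if i > 0 else 0               # assumption: 0 if no class precedes
    after = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if no class follows
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    lower_boundary = lower_limits[i] - 0.5              # assumes integer class limits
    return lower_boundary + d1 / (d1 + d2) * class_size


lower_limits = [11, 23, 35, 47, 59, 71, 83]
freqs = [3, 5, 11, 19, 14, 6, 2]
print(round(grouped_mode(lower_limits, freqs, 12), 2))  # 53.88
```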
COMPARISON OF THE AVERAGES
In the case of the mean, the following are some observations that can be made.
1. The mean always exists in any distribution. This implies that for any set of data, the mean can
always be computed.
2. The value of the mean in any distribution is unique. This implies that for any distribution,
there is only one possible value of the mean.
3. In the computation of this measure, all the values in the distribution are taken into
consideration.
In the case of the median, we have the following observations.
1. Like the mean, the median also exists in any distribution.
2. The value of the median is also unique.
3. This is a positional measure.
The mode has the following characteristics.
1. It does not always exist.
2. If the mode exists, it is not always unique.
3. In determining the value of the mode, it does not take into account all the values in the
distribution.

[Figure: The Position of the Mean, Median, and the Mode in a Normal Distribution (mean = median = mode).]
[Figure: The Position of the Mean, Median, and the Mode in a Positively Skewed Distribution.]
[Figure: The Position of the Mean, Median, and the Mode in a Negatively Skewed Distribution.]

Of the three measures of central tendency, the mean is considered the most important. Since
all values are considered in the computation, it can be used in higher statistical treatment.
There are some instances, however, when the mean is not a good representative of a set of
data. This happens when a set of data contains extreme values either to the left or to the right of the
average. In this situation, the value of the mean is pulled to the direction of these extreme values.
Thus, the median is used instead.
When a set of data is symmetric, as in a normal distribution, the three measures are identical or
approximately equal. When the distribution is skewed, either negatively or positively, the three
averages diverge. For a unimodal, moderately skewed distribution, the value of the median typically
lies between the mode and the mean.
A set of data is said to be positively skewed when the graph of the distribution has a longer tail
to the right. The data is said to be negatively skewed when the longer tail is at the left.

QUANTILES

Quartiles
Quartiles refer to the values that divide the distribution into four equal parts. There are three
quartiles, represented by Q1, Q2, and Q3. The first quartile Q1 is the value at or below which one
fourth of the values fall when the data are arranged in order of magnitude. The second quartile Q2
corresponds to the median. The third quartile Q3 is the value at or below which three fourths of
the values fall.

[Figure: number line from the lowest value L to the highest value H, marking Q1, Q2 (the median), and Q3.]

For grouped data, the procedure of computing the value of the first and the third quartiles is
similar to that of computing the value of the median. The computing formula for the k th quartile
where k = 1, 2, 3 is given by
$$ Q_k = x_b + \left( \frac{kn/4 - cumf_b}{f_{Q_k}} \right) c $$
where: xb = lower boundary of the kth quartile class
       cumfb = cumulative frequency before the kth quartile class
       fQk = frequency of the kth quartile class
Example
Considering the frequency distribution of the examination scores of sixty students in a statistics class,
compute the value of the first quartile and the third quartile.
Classes f <cumf
11 – 22 3 3
23 – 34 5 8
35 – 46 11 19
47 – 58 19 38
59 – 70 14 52
71 – 82 6 58
83 – 94 2 60
Solution
To compute the value of Q1,
1. Get ¼ of the total number of frequencies.
n/4 = 60/4 = 15
2. Get the value of the cumulative frequency before the first quartile class
cumfb = 8
3. Determine the first quartile class.
1st quartile class: 35 – 46
4. Determine the lower boundary of the first quartile class.
xb = 35 – 0.5 = 34.5
5. Get the frequency of the first quartile class.
fQ1 = 11
6. Substitute
For the first quartile, k = 1:
$$ Q_1 = x_b + \left( \frac{n/4 - cumf_b}{f_{Q_1}} \right) c = 34.5 + \left( \frac{15 - 8}{11} \right)(12) = 42.14 $$
To compute the value of Q3,
1. 3n/4 = 3 (60)/4 = 45
2. cumfb = 38
3. 3rd quartile class: 59 – 70
4. xb = 59 – 0.5 = 58.5
5. fQ3 = 14

For the third quartile, k = 3:
$$ Q_3 = x_b + \left( \frac{3n/4 - cumf_b}{f_{Q_3}} \right) c = 58.5 + \left( \frac{45 - 38}{14} \right)(12) = 64.5 $$
Deciles
If a given set of data is divided into ten equal parts, then there are nine points of division known
as deciles. The method of computing the values of these measurements is the same as in the median
or quartiles. The nine points of division are denoted by the symbols D1, D2, D3, ..., D9. The first
decile, D1, is the value at or below which one tenth of all the items in the distribution fall. The
fifth decile, D5, is the value at or below which five tenths, or one half, of the items fall. Thus,
the value of the fifth decile is equal to the value of the median or second quartile.

For grouped data, the computing formula is patterned after the formula for the value of the
median or quartiles as shown below.
$$ D_k = x_b + \left( \frac{kn/10 - cumf_b}{f_{D_k}} \right) c $$
where: xb = lower boundary of the kth decile class
       cumfb = cumulative frequency before the kth decile class
       fDk = frequency of the kth decile class
       k = 1, 2, 3, ..., 9

Example
Considering the frequency distribution of the examination scores of sixty students in a statistics class,
determine the value of the first decile and the fifth decile.
Classes f <cumf
11 – 22 3 3
23 – 34 5 8
35 – 46 11 19
47 – 58 19 38
59 – 70 14 52
71 – 82 6 58
83 – 94 2 60
Solution
To compute the value of D1,
1. kn/10 = 1 (60)/10 = 6
2. cumfb = 3
3. 1st decile class: 23 – 34
4. xb = 23 – 0.5 = 22.5
5. fD1 = 5
6. Substitute
$$ D_1 = x_b + \left( \frac{n/10 - cumf_b}{f_{D_1}} \right) c = 22.5 + \left( \frac{6 - 3}{5} \right)(12) = 29.7 $$
To compute the value of D5,
1. 5n/10 = 5 (60)/10 = 30
2. cumfb = 19
3. 5th decile class: 47 – 58
4. xb = 47 – 0.5 = 46.5
5. fD5 = 19
6. Substitute
$$ D_5 = x_b + \left( \frac{5n/10 - cumf_b}{f_{D_5}} \right) c = 46.5 + \left( \frac{30 - 19}{19} \right)(12) = 53.45 $$

Percentiles
Percentiles refer to those values that divide a distribution into one hundred equal parts. There
are 99 percentiles, represented by P1, P2, P3, ..., P99. When we say the 45th percentile, we are
referring to the value at or below which 45/100 of the data fall.
For grouped data, the computing formula is similar to that of the median, quartile or decile.
$$ P_k = x_b + \left( \frac{kn/100 - cumf_b}{f_{P_k}} \right) c $$
where: xb = lower boundary of the kth percentile class
       cumfb = cumulative frequency before the kth percentile class
       fPk = frequency of the kth percentile class
       k = 1, 2, 3, ..., 99

Example
Considering the frequency distribution of the examination scores of sixty students in a statistics class,
determine the value of the 43rd percentile.
Classes f <cumf
11 – 22 3 3
23 – 34 5 8
35 – 46 11 19
47 – 58 19 38
59 – 70 14 52
71 – 82 6 58
83 – 94 2 60
Solution
To compute the value of P43,
1. 43n/100 = 43 (60)/100 = 25.8
2. cumfb = 19
3. 43rd percentile class: 47 – 58
4. xb = 47 – 0.5 = 46.5
5. fP43 = 19
6. Substitute
$$ P_{43} = x_b + \left( \frac{43n/100 - cumf_b}{f_{P_{43}}} \right) c = 46.5 + \left( \frac{25.8 - 19}{19} \right)(12) = 50.79 $$
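Since the quartile, decile, and percentile formulas differ only in the divisor (4, 10, or 100), a single function can cover all three. The sketch below is illustrative: the function name and the `k`, `parts` parameters are assumptions, and integer class limits are assumed so that each lower boundary lies 0.5 below the lower limit.

```python
def grouped_quantile(lower_limits, freqs, class_size, k, parts):
    """k-th point of division when grouped data are split into `parts` equal parts."""
    n = sum(freqs)
    target = k * n / parts  # kn/4, kn/10, or kn/100
    cum_before = 0          # cumulative frequency before the current class (cumf_b)
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= target:
            lower_boundary = lower - 0.5  # assumes integer class limits
            return lower_boundary + (target - cum_before) / f * class_size
        cum_before += f
    raise ValueError("empty frequency distribution")


lower_limits = [11, 23, 35, 47, 59, 71, 83]
freqs = [3, 5, 11, 19, 14, 6, 2]
q1 = grouped_quantile(lower_limits, freqs, 12, 1, 4)
q3 = grouped_quantile(lower_limits, freqs, 12, 3, 4)
d1 = grouped_quantile(lower_limits, freqs, 12, 1, 10)
p43 = grouped_quantile(lower_limits, freqs, 12, 43, 100)
print(round(q1, 2), round(q3, 2), round(d1, 2), round(p43, 2))
```

Calling the function with k = 2 and parts = 4, or k = 5 and parts = 10, reproduces the grouped median, which is consistent with Q2 = D5 = median.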

Sample Work:

The performance ratings of 100 faculty members of a certain college were taken and are presented in
a frequency distribution as follows:
Classes f <cumf
71 – 74 3 3
75 – 78 10 13
79 – 82 13 26 - 1st quartile, 2nd decile
83 – 86 18 44 - 40th percentile
87 – 90 25 69
91 – 94 19 88 - 3rd quartile, 7th decile, 75th percentile
95 – 98 12 100
n = 100
Compute the value of the following:
a. Values of the first and third quartile.
b. Values of the second and seventh decile.
c. Values of the 40th and 75th percentile.

Solution
a. Values of the first and third quartile
To compute the value of Q1,
1. n/4 = 100/4 = 25
2. cumfb = 13
3. 1st quartile class: 79 – 82
4. xb = 79 – 0.5 = 78.5
5. fQ1 = 13
6. Substitute
For the first quartile, k = 1:
$$ Q_1 = x_b + \left( \frac{n/4 - cumf_b}{f_{Q_1}} \right) c = 78.5 + \left( \frac{25 - 13}{13} \right)(4) = 82.19 $$
To compute the value of Q3,
1. 3n/4 = 3 (100)/4 = 75
2. cumfb = 69
3. 3rd quartile class: 91 – 94
4. xb = 91 – 0.5 = 90.5
5. fQ3 = 19

For the third quartile, k = 3:
$$ Q_3 = x_b + \left( \frac{3n/4 - cumf_b}{f_{Q_3}} \right) c = 90.5 + \left( \frac{75 - 69}{19} \right)(4) = 91.76 $$
b. Values of the second and seventh decile
To compute the value of D2,
1. kn/10 = 2 (100)/10 = 20
2. cumfb = 13
3. 2nd decile class: 79 – 82
4. xb = 79 – 0.5 = 78.5
5. fD2 = 13
6. Substitute
$$ D_2 = x_b + \left( \frac{2n/10 - cumf_b}{f_{D_2}} \right) c = 78.5 + \left( \frac{20 - 13}{13} \right)(4) = 80.65 $$
To compute the value of D7,
1. 7n/10 = 7 (100)/10 = 70
2. cumfb = 69
3. 7th decile class: 91 – 94
4. xb = 91 – 0.5 = 90.5
5. fD7 = 19
6. Substitute
$$ D_7 = x_b + \left( \frac{7n/10 - cumf_b}{f_{D_7}} \right) c = 90.5 + \left( \frac{70 - 69}{19} \right)(4) = 90.71 $$
c. Values of the 40th and 75th percentile
To compute the value of P40,
1. 40n/100 = 40 (100)/100 = 40
2. cumfb = 26
3. 40th percentile class: 83 – 86
4. xb = 83 – 0.5 = 82.5
5. fP40 = 18
6. Substitute

$$ P_{40} = x_b + \left( \frac{40n/100 - cumf_b}{f_{P_{40}}} \right) c = 82.5 + \left( \frac{40 - 26}{18} \right)(4) = 85.61 $$
To compute the value of P75,
1. 75n/100 = 75 (100)/100 = 75
2. cumfb = 69
3. 75th percentile class: 91 – 94
4. xb = 91 – 0.5 = 90.5
5. fP75 = 19
6. Substitute
$$ P_{75} = x_b + \left( \frac{75n/100 - cumf_b}{f_{P_{75}}} \right) c = 90.5 + \left( \frac{75 - 69}{19} \right)(4) = 91.76 $$

Chapter 4: MEASURES OF VARIATION


In the previous chapter, we discussed the measures of central tendency. These measures simply
approximate the central value of the distribution. However, such descriptions are not enough to be
able to adequately describe the characteristics of a set of data. Hence, there is a need to consider
how the values are scattered on either side of the center. Values used to determine the scatter of
values in a distribution are called measures of variation. In this chapter, we will consider six
measures of variation: the range, the inter-quartile range, the quartile deviation, the average
deviation, the variance, and the standard deviation.
RANGE
Among the measures of variation, the range is considered the simplest. We define the range as
the difference between the highest and the lowest value in the distribution. For example, if the
lowest value in the distribution is 12 and the highest value is 125, then the range is the difference
between 125 and 12, which is 113. In symbols, if we let R be the range, then
R = H – L
where: H = the highest value
       L = the lowest value

In the case of grouped data, the difference between the highest upper class boundary and the
lowest lower class boundary is considered the range. The rationale is that the class boundaries are
considered the true limits.
Example
Determine the value of the range of the given data.
Classes f
11 – 22 3
23 – 34 5
35 – 46 11
47 – 58 19
59 – 70 14
71 – 82 6
83 – 94 2
n = 60
Solution
R = H–L
= 94.5 – 10.5
R = 84
Notice the simplicity of the way the value of the range is computed. This value is generally
determined if the objective is to emphasize extreme variations. For example, in examinations,
performance of the students is generally evaluated in terms of ranges. It is done by quoting the
lowest and highest scores obtained.
The range, of course, has some disadvantages. First, this value is always affected by extreme
values. Second, in the process of computing the value of the range, not all values are considered.
Thus, it does not consider the variation of the items relative to the central value of the distribution.

SEMI-INTER QUARTILE RANGE OR QUARTILE DEVIATION


Another measure of variation is the semi-inter quartile range or quartile deviation. This value is
obtained by getting one half of the difference between the third and the first quartiles. The interval
from Q1 to Q3 covers the middle 50% of the distribution, and the quartile deviation is half its width.

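[Figure: number line from the lowest value L to the highest value H, marking Q1, the median, and Q3, with the span Q3 – Q1 indicated.]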

In symbols, if we let Q be the semi-inter quartile range, then


$$ Q = \frac{Q_3 - Q_1}{2} $$
Example
The examination scores of 60 students in a statistics class resulted in the following values: Q1 = 42.14
and Q3 = 64.5. Determine the value of the semi-inter quartile range.
Solution
$$ Q = \frac{Q_3 - Q_1}{2} = \frac{64.5 - 42.14}{2} = 11.18 $$
In a given frequency distribution, if the values of Q1 and Q3 are not given, then these values
should be first computed before the value of Q can be obtained.
Example
The performance ratings of 100 faculty members of a certain college were taken and are presented in
a frequency distribution as follows:
Classes f < cumf
71 – 74 3 3
75 – 78 10 13
79 – 82 13 26 - - - 1st quartile class
83 – 86 18 44
87 – 90 25 69
91 – 94 19 88 - - - 3rd quartile class
95 – 98 12 100
n = 100
Compute the value of the semi-inter quartile range.

Solution
To compute the value of Q1,
1. n/4 = 100/4 = 25
2. cumfb = 13
3. 1st quartile class: 79 – 82
4. xb = 79 – 0.5 = 78.5
5. fQ1 = 13
6. Substitute
For the first quartile, k = 1:
$$ Q_1 = x_b + \left( \frac{n/4 - cumf_b}{f_{Q_1}} \right) c = 78.5 + \left( \frac{25 - 13}{13} \right)(4) = 82.19 $$
To compute the value of Q3,
1. 3n/4 = 3 (100)/4 = 75
2. cumfb = 69
3. 3rd quartile class: 91 – 94
4. xb = 91 – 0.5 = 90.5
5. fQ3 = 19

For the third quartile, k = 3:
$$ Q_3 = x_b + \left( \frac{3n/4 - cumf_b}{f_{Q_3}} \right) c = 90.5 + \left( \frac{75 - 69}{19} \right)(4) = 91.76 $$
The semi-inter quartile range is then
$$ Q = \frac{Q_3 - Q_1}{2} = \frac{91.76 - 82.19}{2} = 4.79 $$
AVERAGE DEVIATION
In the preceding sections, the variation of the distribution was computed without taking into
consideration all the values in the distribution. Another measure known as the average deviation takes into
account each and every item in the distribution.
The average deviation refers to the arithmetic mean of the absolute deviations of the values from the
mean of the distribution. This measure is sometimes referred to as the mean absolute deviation.
Average Deviation for Ungrouped Data
Let AD be the average deviation. Then for ungrouped data, the computing formula is given by

$$ AD = \frac{\sum |x - \bar{x}|}{n} $$
where: x = the individual values
       x̄ = the mean of the distribution
       n = the number of values

To be able to apply the above equation, we shall follow the steps below.
1. Arrange the values in column according to magnitude.
2. Compute the value of the mean (x̄).
3. Determine the deviations (x – x̄).
4. Convert the deviations in step 3 into positive deviations. Use the absolute value sign |x – x̄|.
5. Get the sum of the absolute deviations in step 4.
6. Divide the sum in step 5 by n.
Example
Consider the following values.
x: 13, 16, 9, 6, 15, 7, 11
Determine the value of the average deviation.
Solution
x      x – x̄             |x – x̄|
6      6 – 11 = -5       5
7      7 – 11 = -4       4
9      9 – 11 = -2       2
11     11 – 11 = 0       0
13     13 – 11 = 2       2
15     15 – 11 = 4       4
16     16 – 11 = 5       5
Σx = 77                  Σ|x – x̄| = 22

$$ \bar{x} = \frac{\sum x}{n} = \frac{77}{7} = 11 $$
$$ AD = \frac{\sum |x - \bar{x}|}{n} = \frac{22}{7} = 3.14 $$
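The same computation can be sketched in Python. The function name is an assumption for illustration; sorting is not needed for the arithmetic, so the values are used in the order given.

```python
def average_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [13, 16, 9, 6, 15, 7, 11]
print(round(average_deviation(data), 2))  # 3.14
```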
Average Deviation for Grouped Data
Computing formula:
$$ AD = \frac{\sum f|x - \bar{x}|}{n} $$
where: x = midpoint of each class
       x̄ = mean of the distribution
       f = frequency of each class
       n = total number of frequencies
Steps in determining the average deviation:
1. Compute the value of the mean (x̄).
2. Determine the deviation (x – x̄) of each class midpoint from the mean and take its absolute value.
3. Multiply each absolute deviation by its corresponding frequency.
4. Add the results in step 3.
5. Divide the sum in step 4 by n.
Example
Compute the average deviation of the frequency distribution for the examination results of sixty students in a
statistics class.
1. Compute the value of the mean.
Classes f x fx
11 – 22 3 16.5 49.5
23 – 34 5 28.5 142.5
35 – 46 11 40.5 445.5
47 – 58 19 52.5 997.5
59 – 70 14 64.5 903.0
71 – 82 6 76.5 459.0
83 – 94 2 88.5 177.0
n = 60 Σ fx = 3,174.0
$$ \bar{x} = \frac{\sum fx}{n} = \frac{3,174}{60} = 52.90 $$
2. Construct the deviation column. Convert the deviations to positive deviations.

Classes f x fx x – x̄ |x – x̄|
11 – 22 3 16.5 49.5 - 36.4 36.4
23 – 34 5 28.5 142.5 - 24.4 24.4
35 – 46 11 40.5 445.5 - 12.4 12.4
47 – 58 19 52.5 997.5 - 0.4 0.4
59 – 70 14 64.5 903.0 11.6 11.6
71 – 82 6 76.5 459.0 23.6 23.6
83 – 94 2 88.5 177.0 35.6 35.6
3. Multiply the positive deviations by their corresponding frequencies.
Classes f x fx x – x̄ |x – x̄| f|x – x̄|
11 – 22 3 16.5 49.5 - 36.4 36.4 109.2
23 – 34 5 28.5 142.5 - 24.4 24.4 122.0
35 – 46 11 40.5 445.5 - 12.4 12.4 136.4
47 – 58 19 52.5 997.5 - 0.4 0.4 7.6
59 – 70 14 64.5 903.0 11.6 11.6 162.4
71 – 82 6 76.5 459.0 23.6 23.6 141.6
83 – 94 2 88.5 177.0 35.6 35.6 71.2
n = 60 Σ f|x – x̄| = 750.4
$$ AD = \frac{\sum f|x - \bar{x}|}{n} = \frac{750.4}{60} = 12.51 $$
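For grouped data the same idea applies with class midpoints weighted by their frequencies. The following Python sketch uses illustrative names that are not part of the notes.

```python
def grouped_average_deviation(midpoints, freqs):
    """Average deviation of grouped data, using class midpoints as the values."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n


midpoints = [16.5, 28.5, 40.5, 52.5, 64.5, 76.5, 88.5]
freqs = [3, 5, 11, 19, 14, 6, 2]
print(round(grouped_average_deviation(midpoints, freqs), 2))  # 12.51
```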
VARIANCE
This measure uses another procedure for handling signed deviations. Instead of taking absolute
values, the deviations x – x̄ are squared, so that every term is non-negative. The sum of the squared
deviations is therefore also non-negative. If this sum is divided by the sample size, the result is the
mean of the squared deviations, which is the variance.
Variance for Ungrouped Data
Computing formula:
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n} $$
where: x = the individual values in the distribution
       x̄ = mean of the distribution
       n = sample size
Steps in determining the variance:
1. Compute the value of the mean (x).
2. Determine the deviation of each value from the mean. (x – x).
3. Square the deviations.
4. Calculate the sum of the squared deviations.
5. Divide the sum by the total number of values.
Example
Compute the value of the variance of the following measurements.
13, 5, 7, 9, 10, 17, 15, 12
Solution
x x – x̄ (x – x̄)²
5 -6 36
7 -4 16
9 -2 4
10 -1 1
12 1 1
13 2 4
15 4 16
17 6 36
Σx = 88 Σ(x – x̄)² = 114

$$ \bar{x} = \frac{\sum x}{n} = \frac{88}{8} = 11 $$
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{114}{8} = 14.25 $$

Alternative computing formula:

$$ s^2 = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2 $$

Steps in determining the variance using the above equation:


1. Arrange the values in magnitude and in vertical column.
2. Get the sum of the values.
3. Square the values in step 1.
4. Add the squared values.
5. Substitute in the formula.
Example
Compute the value of the variance of the following measurements.
13, 5, 7, 9, 10, 17, 15, 12
Solution
x x2
5 25
7 49
9 81
10 100
12 144
13 169
15 225
17 289
Σx = 88 Σx² = 1,082

$$ s^2 = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2 = \frac{1082}{8} - \left( \frac{88}{8} \right)^2 = 135.25 - 121 = 14.25 $$
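The two computing formulas are algebraically equivalent, so they must give the same variance for the same data. A short Python sketch (function names are illustrative assumptions) confirms this for the eight measurements.

```python
def variance_definition(values):
    """Variance as the mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Variance as the mean of the squares minus the square of the mean."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2


data = [13, 5, 7, 9, 10, 17, 15, 12]
print(variance_definition(data), variance_shortcut(data))  # 14.25 14.25
```

Both divide by n, following the formulas in these notes, rather than by n - 1.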
Variance for Grouped Data
Computing formula:
$$ s^2 = \frac{\sum f(x - \bar{x})^2}{n} $$
where: x = midpoint of each class interval
       x̄ = mean of the distribution
       f = frequency of each class
       n = sample size
Steps in determining the variance of grouped data:
1. Compute the value of the mean (x).
2. Determine the deviation (x – x) by subtracting the mean from the midpoint of each class interval.
3. Square the deviation obtained in step 2.
4. Multiply the frequencies by their corresponding squared deviations then add the results.
5. Divide the sum in step 4 by the sample size.
Example
Calculate the value of the variance of the frequency distribution for the examination results of sixty students in
a statistics class.
Classes f x fx x – x̄ (x – x̄)² f(x – x̄)²
11 – 22 3 16.5 49.5 - 36.4 1,324.96 3,974.88
23 – 34 5 28.5 142.5 - 24.4 595.36 2,976.80
35 – 46 11 40.5 445.5 - 12.4 153.76 1,691.36
47 – 58 19 52.5 997.5 - 0.4 0.16 3.04
59 – 70 14 64.5 903.0 11.6 134.56 1,883.84
71 – 82 6 76.5 459.0 23.6 556.96 3,341.76
83 – 94 2 88.5 177.0 35.6 1,267.36 2,534.72
n = 60 Σ fx = 3,174 Σ f(x – x̄)² = 16,406.40
$$ \bar{x} = \frac{\sum fx}{n} = \frac{3,174}{60} = 52.90 $$
$$ s^2 = \frac{\sum f(x - \bar{x})^2}{n} = \frac{16,406.40}{60} = 273.44 $$
Alternative computing formula:
$$ s^2 = \left[ \frac{\sum fd^2}{n} - \left( \frac{\sum fd}{n} \right)^2 \right] c^2 $$
Steps in determining the variance using the unit deviation method:
1. Determine the unit deviation column.
2. Multiply the frequency by its corresponding unit deviation.
3. Square the unit deviation.
4. Multiply the squared unit deviation by its corresponding frequency.
5. Add the results in step 2.
6. Add the results in step 4.
7. Substitute in the formula.

Example
Calculate the value of the variance of the frequency distribution for the examination results of sixty students in
a statistics class.
Classes f d fd d² fd²
11 – 22 3 -3 -9 9 27
23 – 34 5 -2 -10 4 20
35 – 46 11 -1 -11 1 11
47 – 58 19 0 0 0 0
59 – 70 14 1 14 1 14
71 – 82 6 2 12 4 24
83 – 94 2 3 6 9 18
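n = 60 Σ fd = 2 Σ fd² = 114
$$ s^2 = \left[ \frac{\sum fd^2}{n} - \left( \frac{\sum fd}{n} \right)^2 \right] c^2 = \left[ \frac{114}{60} - \left( \frac{2}{60} \right)^2 \right] (12)^2 = 273.44 $$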
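The unit deviation steps can be sketched in Python as below. The function name and the `assumed_index` parameter (the position of the class assigned d = 0) are assumptions for illustration. The result does not depend on which class is chosen, because shifting every d by a constant leaves the bracketed quantity unchanged.

```python
def grouped_variance_unit_deviation(freqs, class_size, assumed_index):
    """Variance of grouped data by the unit deviation method."""
    n = sum(freqs)
    d = [i - assumed_index for i in range(len(freqs))]         # unit deviations
    mean_fd = sum(f * di for f, di in zip(freqs, d)) / n        # (sum of fd) / n
    mean_fd2 = sum(f * di * di for f, di in zip(freqs, d)) / n  # (sum of fd^2) / n
    return (mean_fd2 - mean_fd ** 2) * class_size ** 2


freqs = [3, 5, 11, 19, 14, 6, 2]
variance = grouped_variance_unit_deviation(freqs, class_size=12, assumed_index=3)
print(round(variance, 2))  # 273.44
```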
STANDARD DEVIATION
This is one of the most important measures of variation. In the computation of the variance, the
deviation x – x was squared. This implies that the variance is expressed in square units. Extracting the square
root of the value of the variance will give the value of the standard deviation.

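$$ s = \sqrt{s^2} $$
For ungrouped data:
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} $$
$$ s = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2} $$
For grouped data:
$$ s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}} $$
$$ s = c \sqrt{\frac{\sum fd^2}{n} - \left( \frac{\sum fd}{n} \right)^2} $$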
Example
The value of the variance of a set of measurements was computed to be equal to 128.93. Determine the value
of the standard deviation.

$$ s = \sqrt{s^2} = \sqrt{128.93} = 11.35 $$

Math 212 – ENGINEERING PROBABILITY AND STATISTICS


Pre-Final Exam

1. The results of an IQ test of a group of students in a certain college were taken and are presented in a
frequency distribution below.
Classes f <cumf
70 – 75 2 2
76 – 81 8 10
82 – 87 19 29 ---- 2nd decile class
88 – 93 21 50 ---- 1st quartile class
94 – 99 28 78 ---- 40th percentile class
100 – 105 38 116 ---- 3rd quartile class; 7th decile class, 75th percentile class
106 – 111 15 131
112 – 117 9 140
n = 140
Compute the following:
a. Values of the first and third quartile.
b. Values of the second and seventh decile.
c. Values of the 40th and 75th percentile.
Solution
a. Values of the first and third quartile
To compute the value of Q1,
1. n/4 = 140/4 = 35
2. cumfb = 29
3. 1st quartile class: 88 – 93
4. xb = 88 – 0.5 = 87.5
5. fQ1 = 21
6. Substitute (for the first quartile, k = 1):
$$ Q_1 = x_b + \left( \frac{n/4 - cumf_b}{f_{Q_1}} \right) c = 87.5 + \left( \frac{35 - 29}{21} \right)(6) = 89.21 $$
To compute the value of Q3,
1. 3n/4 = 3(140)/4 = 105
2. cumfb = 78
3. 3rd quartile class: 100 – 105
4. xb = 100 – 0.5 = 99.5
5. fQ3 = 38
6. Substitute (for the third quartile, k = 3):
$$ Q_3 = x_b + \left( \frac{3n/4 - cumf_b}{f_{Q_3}} \right) c = 99.5 + \left( \frac{105 - 78}{38} \right)(6) = 103.76 $$
The semi-inter quartile range is then
$$ Q = \frac{Q_3 - Q_1}{2} = \frac{103.76 - 89.21}{2} = 7.28 $$

b. Values of the second and seventh decile.


To compute the value of D2,
1. kn/10 = 2(140)/10 = 28
2. cumfb = 10
3. 2nd decile class: 82 – 87
4. xb = 82 – 0.5 = 81.5
5. fD2 = 19
6. Substitute:
$$ D_2 = x_b + \left( \frac{2n/10 - cumf_b}{f_{D_2}} \right) c = 81.5 + \left( \frac{28 - 10}{19} \right)(6) = 87.18 $$
To compute the value of D7,
1. 7n/10 = 7(140)/10 = 98
2. cumfb = 78
3. 7th decile class: 100 – 105
4. xb = 100 – 0.5 = 99.5
5. fD7 = 38
6. Substitute:
$$ D_7 = x_b + \left( \frac{7n/10 - cumf_b}{f_{D_7}} \right) c = 99.5 + \left( \frac{98 - 78}{38} \right)(6) = 102.66 $$
c. Values of the 40th and 75th percentile
To compute the value of P40,
1. 40n/100 = 40(140)/100 = 56
2. cumfb = 50
3. 40th percentile class: 94 – 99
4. xb = 94 – 0.5 = 93.5
5. fP40 = 28
6. Substitute:
$$ P_{40} = x_b + \left( \frac{40n/100 - cumf_b}{f_{P_{40}}} \right) c = 93.5 + \left( \frac{56 - 50}{28} \right)(6) = 94.79 $$
To compute the value of P75,
1. 75n/100 = 75(140)/100 = 105
2. cumfb = 78
3. 75th percentile class: 100 – 105
4. xb = 100 – 0.5 = 99.5
5. fP75 = 38
6. Substitute:
$$ P_{75} = x_b + \left( \frac{75n/100 - cumf_b}{f_{P_{75}}} \right) c = 99.5 + \left( \frac{105 - 78}{38} \right)(6) = 103.76 $$
2. The NSAT scores of a group of freshmen in a certain college were taken and are presented in a
frequency distribution as shown below.
a. Average deviation
Classes f x fx x–x |x – x| f |x – x|
43 – 49 9 46 414 - 21.56 21.56 194.04
50 – 56 13 53 689 - 14.56 14.56 189.28
57 – 63 15 60 900 - 7.56 7.56 113.40
64 – 70 25 67 1675 - 0.56 0.56 14.00
71 – 77 16 74 1184 6.44 6.44 103.04
78 – 84 10 81 810 13.44 13.44 134.40
85 – 91 8 88 704 20.44 20.44 163.52
92 – 98 4 95 380 27.44 27.44 109.76
n = 100 Σfx = 6756 Σ f|x – x| = 1021.44
$$ \bar{x} = \frac{\sum fx}{n} = \frac{6756}{100} = 67.56 $$
$$ AD = \frac{\sum f|x - \bar{x}|}{n} = \frac{1021.44}{100} = 10.21 $$
b. Value of the variance considering the mean of the distribution.
Classes f x fx x – x̄ (x – x̄)² f(x – x̄)²
43 – 49 9 46 414 - 21.56 464.83 4183.50
50 – 56 13 53 689 - 14.56 211.99 2755.92
57 – 63 15 60 900 - 7.56 57.15 857.30
64 – 70 25 67 1675 - 0.56 0.31 7.84
71 – 77 16 74 1184 6.44 41.47 663.58
78 – 84 10 81 810 13.44 180.63 1806.34
85 – 91 8 88 704 20.44 417.79 3342.35
92 – 98 4 95 380 27.44 752.95 3011.81
n = 100 Σfx = 6756 Σ f(x – x̄)² = 16628.64
(The products in the last column are computed from the unrounded squared deviations.)
$$ s^2 = \frac{\sum f(x - \bar{x})^2}{n} = \frac{16628.64}{100} = 166.29 $$
c. Value of the variance using the unit deviation method
Classes      f     d     fd    d²    fd²
43 – 49      9    -3    -27     9     81
50 – 56     13    -2    -26     4     52
57 – 63     15    -1    -15     1     15
64 – 70     25     0      0     0      0
71 – 77     16     1     16     1     16
78 – 84     10     2     20     4     40
85 – 91      8     3     24     9     72
92 – 98      4     4     16    16     64
n = 100    Σfd = 8    Σfd² = 340
s² = [Σfd²/n – (Σfd/n)²] c²
s² = [340/100 – (8/100)²] (7)² = 166.29
d. Standard deviation

s = √s² = √166.29 = 12.90
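The four results for the NSAT distribution can be reproduced directly from the class marks and frequencies. The sketch below is an illustration only; the variable names are our own, and the class mark 67 is taken as the origin (d = 0) because that is the row with d = 0 in the unit deviation table.

```python
import math

# Class marks (x) and frequencies (f) from the NSAT table.
marks = [46, 53, 60, 67, 74, 81, 88, 95]
freqs = [9, 13, 15, 25, 16, 10, 8, 4]
c = 7          # class size
assumed = 67   # class mark used as origin (d = 0) in the unit deviation method

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# a. Average deviation
avg_dev = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n

# b. Variance from deviations about the mean
var_mean = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n

# c. Variance by the unit deviation method
d = [(x - assumed) // c for x in marks]
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
var_unit = (sum_fd2 / n - (sum_fd / n) ** 2) * c ** 2

# d. Standard deviation
std_dev = math.sqrt(var_unit)

print(round(mean, 2), round(avg_dev, 2), round(var_mean, 2),
      round(var_unit, 2), round(std_dev, 2))
```

Without intermediate rounding, both variance methods give 166.2864. The value 166.28 in part b arises because each squared deviation in the table was rounded to two decimals before summing.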

Chapter 5. COUNTING TECHNIQUES


If one has to make a prediction regarding the outcome of a certain activity, then the possible
outcomes should first be identified. Identification of these outcomes requires the knowledge of counting. In
counting, the simplest method is done by enumerating all the possible outcomes. This method, however, is
too laborious and hence, not an efficient method. In this chapter, we shall consider the most commonly
used methods of counting – the fundamental principle of counting, permutation, and combination.
FUNDAMENTAL PRINCIPLE OF COUNTING
Consider the numbers 1, 2, 3, and 4. Suppose we want to determine the total number of two-digit numbers that
can be formed if these numbers are combined. First, let us assume that no digit is repeated. Thus, the
possible two-digit numbers that can be formed can be enumerated as follows:
12 21 31 41
13 23 32 42
14 24 34 43
Notice that we are able to exhaust all the possibilities through enumeration. In this example, we have
12 possible two-digit numbers. Now, what if the digits can be repeated? If repetition is allowed, the possible two-digit numbers are:
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
Hence, we have 16 possible outcomes. This method of enumeration can always be applied if the
number of possibilities is at a manageable level.
Let us now consider the three-digit numbers that can be formed from the numbers 1, 2, 3, 4, and 5.
Clearly, the number of possibilities is now too large to be enumerated conveniently. Thus, we have to use another
counting technique – the fundamental principle of counting, which is stated below.
If the first activity can be done in n1 ways and after it has been done, the second activity can be done
in n2 ways, then the total number of ways in which the two activities can be done is equal to n1 * n2.
This principle can be extended to cases with more than two activities. This method is sometimes
referred to as multiplication of choices.
Examples
1. How many two digit numbers can be formed from the numbers 1, 2, 3, and 4 if
a. repetition is not allowed?
b. Repetition is allowed?
a. We shall let n1 be the number of ways of filling the tens place and n2 be the number of ways
of filling the units place. Then, the tens place can be filled in 4 ways since we can use any of
the four numbers. Since repetition is not allowed, then the units place can be filled in 3
ways. The logic here is that the number used in the tens place can no longer be used in the
units place. Thus we have n1 = 4 and n2 = 3. Then by the fundamental principle of counting,
we have
n1 * n2 = 4 * 3 = 12 ways
b. The number of ways of filling the tens place is equal to 4. Since repetition is allowed, then
the numbers used in the tens place can still be used in the units place. Thus, n1 = 4 and n2 =
4. Then, by fundamental principle of counting, we have
n1 * n2 = 4 * 4 = 16 ways
2. How many three-digit numbers can be formed from the digits 1, 2, 3, 4, and 5 if any of the digits
can be repeated?
We let n1 be the number of ways of filling the hundreds place. Let n2 be the number of
ways of filling the tens place and n3 be the number of ways of filling the units place.
The hundreds place can be filled in 5 ways. Since repetition is allowed, the tens place and
the units place can each be filled in 5 ways as well. Thus, n1 = 5, n2 = 5, and n3 = 5. Then by the
fundamental principle of counting, we have
n1 * n2 * n3 = 5 * 5 * 5 = 125 ways
3. The club members are going to elect their officers. If there are 4 candidates for president, 3 for vice
president and two for secretary, then in how many ways can the officers be elected?
Let n1 be the number of ways of electing the president. Let n2 be the number of ways of
electing the vice president and n3 be the number of ways of electing the secretary. By
inspection, we can say n1 = 4, n2 = 3 and n3 = 2. Then by the fundamental principle of
counting, we have
n1 * n2 * n3 = 4 * 3 * 2 = 24 ways
4. In how many ways can Miss Universe, the first runner-up, and second runner-up be chosen from 10
finalists?
Let n1 be the number of ways of choosing the Miss Universe. Let n2 be the number of ways
of choosing the first runner-up and n3 be the number of ways of choosing the second
runner-up.
The Miss Universe can be chosen in 10 possible ways. The first and second runner-ups can
be chosen in 9 and 8 ways respectively. Hence by fundamental principle of counting, we
have
n1 * n2 * n3 = 10 * 9 * 8 = 720 ways
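For small cases, the products given by the fundamental principle of counting can be confirmed by listing every outcome. The sketch below uses Python's standard `itertools` module for the enumeration; the variable names are our own.

```python
from itertools import permutations, product

digits = [1, 2, 3, 4]

# Example 1a: two-digit numbers, repetition not allowed (4 * 3)
no_repeat = list(permutations(digits, 2))

# Example 1b: two-digit numbers, repetition allowed (4 * 4)
with_repeat = list(product(digits, repeat=2))

# Example 2: three-digit numbers from 1..5, repetition allowed (5 * 5 * 5)
three_digit = list(product([1, 2, 3, 4, 5], repeat=3))

# Example 3: 4 candidates for president, 3 for vice president, 2 for secretary
officers = list(product(range(4), range(3), range(2)))

# Example 4: winner, first runner-up, second runner-up from 10 finalists
winners = list(permutations(range(10), 3))

print(len(no_repeat), len(with_repeat), len(three_digit),
      len(officers), len(winners))
```

The lengths of the five lists match the hand computations: 12, 16, 125, 24, and 720.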

PERMUTATION
The term permutation refers to the arrangement of objects with reference to order. Given a set with
n objects, then we can take r objects from the set. The total number of permutations of n objects taken r at
a time is represented by the notation nPr and can be evaluated using the formula
nPr = n! / (n – r)!
where n! is read as n factorial.
The factorial of any integer n denoted by n! is defined as follows:
n! = n * (n – 1) * (n – 2) * (n – 3) * . . . . . * 3 * 2 * 1
Examples
1. Evaluate the following:
a. 5! = 5 * 4 * 3 * 2 * 1 = 120
b. 7! = 7 * 6 * 5 * 4 * 3 * 2 * 1 = 5,040
c. 10! = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 3,628,800
We shall define the factorial of 0 to be equal to 1; in symbols, 0! = 1. The factorial of 1 is
also equal to 1, or 1! = 1.
It should be noted that 5! ≠ 5 since 5! = 120. In the same manner, 3! + 4! ≠ 7!. The left-hand side is
equal to
3! + 4! = 6 + 24 = 30
while the right-hand side is 7! = 5,040.
Going back to the formula for nPr, it can be evaluated once values are assigned to n and r. To
illustrate, let n = 7 and r = 3. Then the number of permutations of 7 objects taken 3 at a time is equal to
7P3 = 7! / (7 – 3)! = 7! / 4! = 5,040 / 24 = 210
2. Evaluate the value of the following:
a. 10P5 = 10! / (10 – 5)! = 10! / 5! = 3,628,800 / 120 = 30,240
b. 5P3 + 5P4 + 5P5 = 5! / (5 – 3)! + 5! / (5 – 4)! + 5! / (5 – 5)! = 5! / 2! + 5! / 1! + 5! / 0! = 120/2 + 120/1 + 120/1
= 60 + 120 + 120 = 300
c. 4 (8P4) = 4 * 8! / (8 – 4)! = 4 * 8! / 4! = 4 * 40,320 / 24 = 4 * 1,680 = 6,720
3. In how many ways can a president, a vice president, a secretary, and a treasurer be elected from a
class with 39 students?
Given 39 students, we are going to fill 4 distinct positions. Hence we can say that n = 39 and r = 4.
39P4 = 39! / (39 – 4)! = 39! / 35! = 1,974,024 ways
4. In how many ways can 8 individuals be seated in a row of 8 chairs?
Generally, seating arrangements are treated as a permutation problem. Also, this problem is a case
where n objects are taken altogether since n = r = 8.
8P8 = 8! / (8 – 8)! = 8! / 0! = 8! / 1 = 8! = 40,320 ways
5. In how many ways can 9 distinct books be arranged in a shelf?
Given 9 books, all of these books shall be taken altogether if arranged in a shelf. Thus, n = 9, r = 9.

9P9 = 9! / (9 – 9)! = 9! / 0! = 9! / 1 = 9! = 362,880 ways
6. In how many ways can 8 individuals be seated in a row of 8 seats if two individuals wanted to be
seated side by side?
In solving this problem, we first treat the two individuals who want to be seated side
by side as one individual. Hence we have 7 individuals taken 7 at a time, or 7P7. The next step is
to consider the permutation of the two individuals within their pair, which is 2P2.
Applying the fundamental principle of counting, we have
7P7 * 2P2 = [7! / (7 – 7)!] * [2! / (2 – 2)!] = 7! * 2! = 5,040 * 2 = 10,080 ways
7. Suppose 4 different mathematics books and 5 different physics books shall be arranged in a shelf.
In how many ways can such books be arranged if the books of the same subject shall be placed side
by side?
The initial step is to arrange the set of math books as one object and the set of physics books as
another object. This can be done in 2P2 ways. The second step is to determine the number of
permutation of 4 distinct math books. This can be done in 4P4 ways. The next step is to determine
the total number of permutations of 5 distinct physics books. This can be done in 5P5 ways. Then
finally, we apply the fundamental principle of counting. Thus,
2P2 * 4P4 * 5P5 = [2! / (2 – 2)!] * [4! / (4 – 4)!] * [5! / (5 – 5)!] = 2! * 4! * 5!
= 2 * 24 * 120 = 5,760 ways
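The permutation examples above can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `math.perm(n, r)` for nPr; the brute-force count for Example 6 and the labels A to H for the eight individuals are our own illustration.

```python
from itertools import permutations
from math import factorial, perm

# nPr = n! / (n - r)!
p_7_3 = perm(7, 3)
p_10_5 = perm(10, 5)
p_39_4 = perm(39, 4)
p_8_8 = perm(8, 8)

# Example 6: treat the pair as one unit (7P7), then order the pair (2P2)
side_by_side = perm(7, 7) * perm(2, 2)

# Brute-force check of Example 6: count seatings of A..H where A and B are adjacent
people = "ABCDEFGH"
adjacent = sum(
    1 for seat in permutations(people)
    if abs(seat.index("A") - seat.index("B")) == 1
)

# Example 7: two subject blocks, 4 math books, 5 physics books
books = factorial(2) * factorial(4) * factorial(5)

print(p_7_3, p_10_5, p_39_4, p_8_8, side_by_side, adjacent, books)
```

The brute-force count of adjacent seatings agrees with the product 7P7 * 2P2, which supports the "treat the pair as one individual" argument.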

PERMUTATIONS WITH THINGS THAT ARE ALIKE

In the preceding section, the equation used was established with the assumption that n objects are
distinct. There are some instances when some objects cannot be distinguished from one another since such
objects are alike.
The number of permutations of n objects taken altogether, where r1 are of one kind, r2 are of
another kind, and so on up to rk, is given by
nPn = n! / (r1! * r2! * . . . * rk!)
Examples
1. Determine the number of permutations that can be formed using the letters of the word DADDY.
Let r1 be the number of D’s, r2 be the number of A’s and r3 be the number of Y’s. Hence, we
can say that r1 = 3, r2 = 1 and r3 = 1.
5P5 = 5! / (3! * 1! * 1!) = 120 / 6 = 20
2. Find the possible permutations of the word MISSISSIPPI.
Here n = 11. Let r1 be the number of M’s, r2 be the number of I’s, r3 be the
number of S’s and r4 be the number of P’s. By inspection, r1 = 1, r2 = 4, r3 = 4, r4 = 2.
11P11 = 11! / (1! * 4! * 4! * 2!) = 34,650 ways

3. Find the total number of 7-digit numbers that can be formed using all the digits in the numeral
5771535.
From the given conditions, we can say that n = 7 and
r1 = 3 (number of 5’s)
r2 = 2 (number of 7’s)
r3 = 1 (number of 1’s)
r4 = 1 (number of 3’s)
7P7 = 7! / (3! * 2! * 1! * 1!) = 5,040 / 12 = 420 ways
4. In how many ways can 3 copies of Hart’s Algebra, 5 copies of Rainville’s Calculus, and 7 copies of
Rider’s Trigonometry be arranged in a shelf?
15P15 = 15! / (3! * 5! * 7!) = 360,360 ways
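The formula for permutations with alike objects can be applied mechanically by counting how many times each item occurs. The following is a minimal sketch; the function name `distinct_arrangements` is our own, and the letters A, C, and T in the last call are only placeholders standing for the three book titles.

```python
from collections import Counter
from math import factorial


def distinct_arrangements(items):
    """Return n! / (r1! * r2! * ... * rk!) for a sequence with repeated items."""
    counts = Counter(items)          # r1, r2, ..., rk
    total = factorial(len(items))    # n!
    for r in counts.values():
        total //= factorial(r)       # each partial quotient is still an integer
    return total


daddy = distinct_arrangements("DADDY")
mississippi = distinct_arrangements("MISSISSIPPI")
numeral = distinct_arrangements("5771535")
shelf = distinct_arrangements("A" * 3 + "C" * 5 + "T" * 7)

print(daddy, mississippi, numeral, shelf)
```

`Counter("5771535")` finds three 5's, two 7's, one 1, and one 3, which is the grouping needed for Example 3.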

CIRCULAR PERMUTATION
If n distinct objects are arranged in a circle, then the arrangement is known as circular permutation.
The number of circular permutations of n objects taken altogether is
(n – 1)P(n – 1) = (n – 1)!
Examples
1. In how many ways can 6 individuals be seated in a round table with 6 chairs?
(6 – 1)P(6 – 1) = (6 – 1)! = 5! = 120 ways
2. In how many ways can 6 persons be seated around a table with 6 chairs if two individuals wanted
to be seated side by side?
Consider the two individuals who wanted to be seated side by side as one person, thus n = 5. The
number of circular permutations of these 5 individuals is
(5 – 1)P(5 – 1) = (5 – 1)! = 4! = 24.
Then consider the permutation of the two persons treated as one
2P2 = 2! = 2
By the fundamental principle of counting
24 * 2 = 48 ways
3. In how many ways can seven different colored beads be made into a bracelet?
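(7 – 1)P(7 – 1) = (7 – 1)! = 6! = 720 ways, counting arrangements that differ only by rotation as the same. (If turning the bracelet over is also regarded as giving the same bracelet, this count is halved to 360.)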
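The first two circular permutation examples can be verified by brute force. The idea in the sketch below is to fix one person's seat, which removes arrangements that differ only by rotation, and then permute everyone else. The function names and the labels A to F for the six persons are our own.

```python
from itertools import permutations


def circular_count(people):
    """Count circular seatings by fixing the first person's seat."""
    rest = people[1:]
    return sum(1 for _ in permutations(rest))


def circular_adjacent(people, a, b):
    """Count circular seatings in which a and b sit next to each other."""
    first, rest = people[0], people[1:]
    n = len(people)
    count = 0
    for p in permutations(rest):
        seating = (first,) + p
        i = seating.index(a)
        # neighbors wrap around because the table is round
        if seating[(i + 1) % n] == b or seating[(i - 1) % n] == b:
            count += 1
    return count


six = "ABCDEF"
total_six = circular_count(six)
together = circular_adjacent(six, "A", "B")
print(total_six, together)
```

The two counts agree with (6 – 1)! = 120 and 4! * 2! = 48.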

COMBINATIONS
Consider a group with n objects. From n objects, we can take r objects without considering the order
in which the objects are taken. In this regard, we are talking about combination of n objects taken r at a
time represented by the notation nCr. This concept of combination is usually applied to problems about
groups, communities or collections where order of the elements is not important.
Consider the numbers 1, 2, 3, 4, and 5. Suppose we take two elements from this group of five
numbers. If order is important and no repetition is allowed, then the total number of arrangements that can be
formed is 20, computed as follows:
5P2 = 5! / (5 – 2)! = 20
The 20 two-digit numbers that can be formed from the numbers 1, 2, 3, 4, and 5 are shown below:
12 13 14 15
21 23 24 25
31 32 34 35
41 42 43 45
51 52 53 54
Notice that since order is important, the numbers 12 and 21 are different numbers. Similarly, the
numbers 34 and 43 are treated as different numbers. In the case of combinations, the numbers 12 and 21
represent the same grouping since in combination, the order in which the objects are taken is not
important. Thus, if in permutations, we have 20 possibilities, in combination we only have 10.
The computing formula for the combination of n objects taken r at a time is given by
nCr = n! / [(n – r)! r!]
Examples
1. In how many ways can a committee of 3 members be chosen from a group of 6 members?
Grouping in terms of committees is treated as a combination problem since only one
position is being filled, that is, committee membership. Thus, we have a situation of having
6 objects taken 3 at a time.
6C3 = 6! / [(6 – 3)! 3!] = 6! / (3! 3!) = 20 ways
2. A class consists of 5 boys and 7 girls.
a. In how many ways can the class president, the vice president, and the secretary be elected?
b. In how many ways can the class elect three members of a certain committee?
a. Election of the president, vice president, and the secretary means filling three distinct
positions. Thus, this is a permutation problem. Hence,
12P3 = 12! / (12 – 3)! = 12! / 9! = 1,320 ways
b. Choosing members of a committee implies filling only one position and therefore a
combination problem. Thus
12C3 = 12! / [(12 – 3)! 3!] = 12! / (9! 3!) = 220 ways
3. In how many ways can a student answer 5 out of 8 questions?
8C5 = 8! / [(8 – 5)! 5!] = 8! / (3! 5!) = 56 ways
4. In how many ways can a student answer 5 out of 8 questions if he is required to answer 3 of the
first 4 questions?
Since 3 of the first 4 questions must be answered, then this may be done in 4C3 ways.
Having now answered 3 questions, then he must answer 2 of the remaining 4 questions.
This can be done in 4C2 ways. Using the fundamental principle of counting, the total number
of ways in which he can answer the examination is
4C3 * 4C2 = [4! / ((4 – 3)! 3!)] * [4! / ((4 – 2)! 2!)] = 4 * 6 = 24 ways
5. In how many ways can 2 balls be drawn from a box containing 7 red and 6 green balls?
There is no condition as to which balls must be drawn from the box. Hence it is required to
determine the number of ways of drawing 2 balls from a box of 13 balls. This can be done in
13C2 ways. Thus,
13C2 = 13! / [(13 – 2)! 2!] = 13! / (11! 2!) = 78 ways

6. A box contains 7 red and 6 green balls. In how many ways can 2 balls be drawn such that
a. they are both green?
b. 1 is red and 1 is green?
a. The condition requires that both balls be green. This can be done in 6C2 ways.
Since only two balls shall be drawn, it follows that no red ball shall be taken, which can be
done in 7C0 = 1 way. Hence, by the fundamental principle of counting,
6C2 * 7C0 = [6! / ((6 – 2)! 2!)] * [7! / ((7 – 0)! 0!)] = 15 * 1 = 15 ways
b. One red ball can be taken from 7 red balls in 7C1 ways. One green ball can also be taken
from 6 green balls in 6C1 ways. Therefore, by the fundamental principle of counting,
7C1 * 6C1 = [7! / ((7 – 1)! 1!)] * [6! / ((6 – 1)! 1!)] = 7 * 6 = 42 ways
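The combination examples can be checked in the same way as the permutation examples. The sketch below assumes Python 3.8 or later, where `math.comb(n, r)` evaluates nCr; the variable names are our own. It also lists the two-element groupings of the numbers 1 to 5 to show that ignoring order cuts the 20 permutations down to 10 combinations.

```python
from itertools import combinations, permutations
from math import comb

numbers = [1, 2, 3, 4, 5]
ordered = list(permutations(numbers, 2))    # 12 and 21 counted separately
unordered = list(combinations(numbers, 2))  # 12 and 21 counted once

committee = comb(6, 3)                 # Example 1
class_committee = comb(12, 3)          # Example 2b
answers = comb(8, 5)                   # Example 3
restricted = comb(4, 3) * comb(4, 2)   # Example 4
two_balls = comb(13, 2)                # Example 5
both_green = comb(6, 2) * comb(7, 0)   # Example 6a
red_green = comb(7, 1) * comb(6, 1)    # Example 6b

print(len(ordered), len(unordered), committee, class_committee,
      answers, restricted, two_balls, both_green, red_green)
```

Each permutation of two numbers has 2! orderings, which is why the count of combinations is 20 / 2! = 10.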
Chapter 6 : PROBABILITY
CONCEPT OF PROBABILITY
In the study of probability, we shall consider activities for which the outcome cannot be predicted
with certainty. These activities, called experiments, could always result in a single outcome. Although the
single outcome cannot be predicted before the performance of the experiment, the set of all possible
outcomes can be determined. This set of all possible outcomes is referred to as a sample space. Each
individual element or outcome in a sample space is known as a sample point.
Suppose we let S be the sample space. A subset A of S, called an event, is a collection of possible
outcomes of the experiment. We say that A has occurred if the outcome of the experiment is an element of
A. The sample space S or an event A may be described by listing all the elements or by defining the
properties that its elements must satisfy.
Examples
1. Consider the activity of rolling a die. This activity has six possible outcomes: 1, 2, 3, 4, 5,
and 6. Thus
S = {1, 2, 3, 4, 5, 6}
Any of the numbers 1 to 6 is a sample point of S. We can say that there are six sample
points. If we let A be the event of getting an odd number and B the event of getting a
perfect square, then
A = {1, 3, 5} and B = {1, 4}
Note that the elements of A are elements of the sample space S. The number of sample
points in a sample space S, events A and B are usually written as n(S), n(A), and n(B)
respectively. Thus
n(S) = 6, n(A) = 3, n(B) = 2
2. If a pair of dice is rolled, then determine the number of sample points of the following:
a. Sample space
b. Event of getting a sum of 5
c. Event of getting a sum of at most 4
The determination of the number of sample points requires the knowledge of the
fundamental principles of counting.
a. Let n1 be the number of possible outcomes for the first die.
Let n2 be the number of possible outcomes for the second die.
Since n1 = 6 and n2 = 6, then by the fundamental principle of counting, we have
n(S) = n1 * n2 = 6 * 6 = 36
b. Let A be the event of getting the sum of 5. The number of sample points in A can be listed.
Thus
A = {(1, 4), (4, 1), (2, 3), (3, 2)}
n(A) = 4
c. Let B be the event of getting a sum of at most 4. Then
B = {(1, 1), (1, 2), (1, 3), (3, 1), (2, 2), (2, 1)}
n(B) = 6

3. A box contains 6 red and 4 green balls. If three balls are drawn from the box, then determine the
number of sample points of the following:
a. The sample space
b. The event of getting all red balls
c. The event of getting 1 red and 2 green balls
Before the sample points are determined, define first the events.
Let S be the event of drawing 3 balls from the box.
Let A be the event of getting all red balls.
Let B be the event of getting 1 red and 2 green balls.
The determination of the number of sample points for these types of problems uses the
concept of combinations. Thus,
a. n(S) = 10C3 = 10! / [(10 – 3)! 3!] = 120
b. n(A) = 6C3 * 4C0 = [6! / ((6 – 3)! 3!)] * [4! / ((4 – 0)! 0!)] = 20 * 1 = 20
c. n(B) = 6C1 * 4C2 = [6! / ((6 – 1)! 1!)] * [4! / ((4 – 2)! 2!)] = 6 * 6 = 36
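The sample-point counts in Examples 2 and 3 can be reproduced by enumerating the sample space for the pair of dice and by evaluating the combinations for the box of balls. This is only a sketch using the standard library; the variable names are our own.

```python
from itertools import product
from math import comb

# Example 2: a pair of dice
S = list(product(range(1, 7), repeat=2))    # all (first die, second die) pairs
A = [pair for pair in S if sum(pair) == 5]  # sum of 5
B = [pair for pair in S if sum(pair) <= 4]  # sum of at most 4

# Example 3: draw 3 balls from 6 red and 4 green
n_S = comb(10, 3)
n_all_red = comb(6, 3) * comb(4, 0)
n_one_red_two_green = comb(6, 1) * comb(4, 2)

print(len(S), len(A), len(B), n_S, n_all_red, n_one_red_two_green)
```

Printing `A` and `B` lists the same ordered pairs that were written out by hand.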
Probability is the chance that an event will happen. The probability of an event A denoted by P(A)
refers to a number between 0 and 1 including the values 0 and 1. This number can be expressed as a
fraction, as a decimal or as a percent. When a probability of zero is assigned to event A, it means that it is
impossible for event A to occur. When event A is assigned a probability of 1, it means that event A is
certain to occur.
The probability of occurrence plus the probability of non-occurrence of an event is always equal to 1.
This is because, in a given observation or experiment, an event must either occur or not. If we let A’ be the event that
A will not occur, then we say that
P(A) + P(A’) = 1
From the above formula,
P(A) = 1 – P(A’)
or
P(A’) = 1 – P(A)
4. A student in a statistics class was able to compute the probability of passing the subject to be equal
to 0.46. What is the probability that he is not going to pass the subject?
Let A be the event of passing the subject.
Let A’ be the event of not passing the subject.
P(A’) = 1 – P(A) = 1 – 0.46 = 0.54

THREE APPROACHES TO PROBABILITY


The probability of an event can be determined in different ways. The three
conceptual approaches are stated below.
1. Subjective Probability
2. Probability of the Relative Frequency
3. Classical probability
In subjective probability, the probability of an event is determined based on an individual’s experience or
perception. This approach does not require extensive data to support one’s judgment but simply expresses
the strength of one’s belief with regard to uncertainties involved. Generally, the value associated with
subjective probability is biased since this approach is nothing but an educated guess.
The second approach to interpreting the probability of an event is through the determination of the relative
frequency of occurrence. The relative frequency of a certain class interval is obtained by dividing its
frequency (f) by the total frequency (n); multiplying the result by 100 expresses it as a percent. If the value of n is
large, then the relative frequency can be used to estimate the probability of an event in the future.
5. Records show that 100 out of 1,500 students who entered a certain college leave the school due to
financial problem. What can we say about the probability that a freshman entering this college will
leave the school due to financial reason?
Solution. Let A be the event that a freshman will leave the college due to financial reason.
Then based on the past records, the relative frequency shall be
P(A) = 100 / 1,500 = 0.0667 or 6.67%
Hence, we may say that 6.67% of the freshmen students are expected to leave the school
due to financial reasons. In terms of probability, we can say that the chance that a student
will leave the school is 0.0667.
6. Records show that in a certain university, 350 out of 1,750 graduates who took the CPA
examination were able to pass. What can we say about the possible performance of the future
graduates of this university who will take the CPA examination?
Solution. Let A be the event that a graduate of this university who will take the examination will pass.
Using the relative frequency of occurrence,
P(A) = 350 / 1,750 = 0.2 or 20%
Hence, we may say that 0.2 or 20% of the graduates of this university will be able to pass
the CPA examination.
7. Last year, the efficiency ratings of the 406 employees of a certain company were taken and
presented in a frequency distribution below.
Efficiency Rating No. of Employees
70 – 75 75
76 – 81 86
82 – 87 105
88 – 93 80
94 – 99 60
n = 406
Based on the data, what can we say about the proportion of employees for this year who shall have
an efficiency rating from
a. 70 – 75?
b. 82 – 87?
a. The relative frequency of the interval 70 – 75 is
75 / 406 = 0.1847 = 18.47%
Hence, for this year, we can say that 18.47% of the employees are expected to have an
efficiency rating from 70 – 75.
b. The relative frequency of the interval 82 – 87 is
105 / 406 = 0.2586 = 25.86%
Hence, we may say that 25.86% of the employees are expected to have an efficiency rating from 82 – 87.
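The relative frequency approach amounts to dividing each class frequency by the total frequency. The sketch below applies this to the efficiency rating table; the dictionary `ratings` and the other names are our own.

```python
# Efficiency rating classes and number of employees, from the table above.
ratings = {
    "70-75": 75,
    "76-81": 86,
    "82-87": 105,
    "88-93": 80,
    "94-99": 60,
}

n = sum(ratings.values())
relative = {cls: f / n for cls, f in ratings.items()}

for cls, rf in relative.items():
    print(f"{cls}: {rf:.4f} or {rf * 100:.2f}%")
```

The relative frequencies of all classes add up to 1, in line with the rule that the probabilities of all possible outcomes must sum to 1.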
The third approach in dealing with the concept of probability is the classical probability. In this
approach, an experiment shall be performed. Possible outcomes of an experiment can be predicted even
before the performance. One of the assumptions in classical probability is that the probability of each
sample point must be equal.
The computing formula for the classical probability of an event A is given by
P(A) = n(A) / n(S)
where n(A) represents the number of sample points in event A, and
n(S) represents the number of sample points in sample space S.
8. If a coin is tossed, what is the probability of getting a head?
Solution. Determine the number of sample points
S = {head, tail}
n(S) = 2
Let A be the event of getting a head.
A = {head}
n(A) = 1
P(A) = n(A) / n(S) = 1/2 = 0.5
9. If two coins are tossed, what is the probability of getting two heads?
Solution. Determine the number of sample points
S = {HH, HT, TH,TT}
n(S) = 4
Let A be the event of getting two heads.
A = {HH}
n(A) = 1
P(A) = n(A) / n(S) = 1/4 = 0.25
10. If a die is rolled, what is the probability of getting
a. An odd number?
b. An even number?
c. A perfect square?
Solution. Determine the number of sample points
S = {1, 2, 3, 4, 5, 6}
n(S) = 6
a. Let A be the event of getting an odd number.
A = {1, 3, 5} and n(A) = 3
P(A) = n(A) / n(S) = 3/6 = 1/2
b. Let B be the event of getting an even number.
B = {2, 4, 6} and n(B) = 3
P(B) = n(B) / n(S) = 3/6 = 1/2
c. Let C be the event of getting a perfect square.
C = {1, 4} and n(C) = 2
P(C) = n(C) / n(S) = 2/6 = 1/3
11. If a pair of dice is rolled, what is the probability of getting
a. a sum of 6?
b. a sum of less than 13?
c. a sum of 13?
d. a sum of at least 10?
Solution. Determine the number of sample points.
n(S) = 36
a. Let A be the event of getting a sum of 6.
A = {(1, 5), (5, 1), (2, 4), (4, 2), (3, 3) }
n(A) = 5
P(A) = n(A) / n(S) = 5/36
b. Let B be the event of getting a sum of less than 13.
n(B) = 36
P(B) = n(B) / n(S) = 36/36 = 1
c. Let C be the event of getting a sum of 13.
n(C) = 0
P(C) = n(C) / n(S) = 0/36 = 0
d. Let D be the event of getting a sum of at least 10.
D = {(4, 6), (6, 4), (5, 5), (5, 6), (6, 5), (6, 6)}
n(D) = 6
P(D) = n(D) / n(S) = 6/36 = 1/6
12. A box has 3 red, 4 green, and 6 yellow balls. If a ball is drawn from the box, what is the probability
that
a. it is green?
b. it is not red?
Solution. Determine the number of sample points.
n(S) = 13
a. Let A be the event of getting a green ball.
n(A) = 4
P(A) = n(A) / n(S) = 4/13
b. Let B be the event of getting a red ball, so that B’ is the event of getting a ball that is not red.
n(B) = 3
P(B’) = 1 – P(B)
P(B’) = 1 – n(B) / n(S) = 1 – 3/13 = 10/13
13. A box contains 7 red and 6 green balls. If two balls are drawn from the box, what is the probability
of getting
a. both green?
b. 1 red and 1 green?
Solution. Determine the number of sample points.
n(S) = 13C2 = 13! / [(13 – 2)! 2!] = 78
a. Let A be the event of getting both green balls.
n(A) = 6C2 * 7C0 = [6! / ((6 – 2)! 2!)] * [7! / ((7 – 0)! 0!)] = 15 * 1 = 15
P(A) = n(A) / n(S) = 15/78 = 5/26
b. Let B be the event of getting 1 red and 1 green ball.
n(B) = 7C1 * 6C1 = [7! / ((7 – 1)! 1!)] * [6! / ((6 – 1)! 1!)] = 7 * 6 = 42
P(B) = n(B) / n(S) = 42/78 = 7/13
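The classical formula P(A) = n(A) / n(S) can be evaluated exactly with fractions, which avoids rounding and reduces each answer to lowest terms automatically. The sketch below redoes Examples 11 to 13; the helper `prob` and the variable names are our own.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob(event, space):
    """Classical probability: n(A) / n(S), assuming equally likely sample points."""
    return Fraction(len(event), len(space))


# Example 11: a pair of dice
S = list(product(range(1, 7), repeat=2))
p_sum_6 = prob([s for s in S if sum(s) == 6], S)
p_less_13 = prob([s for s in S if sum(s) < 13], S)
p_sum_13 = prob([s for s in S if sum(s) == 13], S)
p_at_least_10 = prob([s for s in S if sum(s) >= 10], S)

# Example 12b: 3 red, 4 green, 6 yellow; probability that the ball is not red
p_not_red = 1 - Fraction(3, 13)

# Example 13: draw 2 balls from 7 red and 6 green
n_S = comb(13, 2)
p_both_green = Fraction(comb(6, 2) * comb(7, 0), n_S)
p_red_green = Fraction(comb(7, 1) * comb(6, 1), n_S)

print(p_sum_6, p_less_13, p_sum_13, p_at_least_10,
      p_not_red, p_both_green, p_red_green)
```

A certain event (sum less than 13) has probability 1 and an impossible event (sum of 13) has probability 0, matching the limits stated earlier for P(A).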
ADDITION RULE
In practice, the probabilities of two or more events are usually considered together. If we let A and B be events,
then these two events can be combined to form another event. The event that at least one of the events A
and B will happen is denoted by A ∪ B. The event that both A and B will occur is denoted by A ∩ B. The
probability

Math 212 – ENGINEERING PROBABILITY AND STATISTICS


Final Exam

1. The results of an IQ test of a group of students in a certain college were taken and are presented in a
frequency distribution below.
Classes f
76 – 81 8
82 – 87 19
88 – 93 21
94 – 99 28
100 – 105 38
106 – 111 15
112 – 117 9
Compute the value of the average deviation using the equation
AD = Σ f|x – x̄| / n
Classes        f      x        fx      x – x̄    |x – x̄|    f|x – x̄|
76 – 81        8    78.5     628.0    -18.52     18.52     148.16
82 – 87       19    84.5    1605.5    -12.52     12.52     237.88
88 – 93       21    90.5    1900.5     -6.52      6.52     136.92
94 – 99       28    96.5    2702.0     -0.52      0.52      14.56
100 – 105     38   102.5    3895.0      5.48      5.48     208.24
106 – 111     15   108.5    1627.5     11.48     11.48     172.20
112 – 117      9   114.5    1030.5     17.48     17.48     157.32
n = 138    Σfx = 13389.0    Σ f|x – x̄| = 1075.28
x̄ = Σfx / n = 13389 / 138 = 97.02
AD = Σ f|x – x̄| / n = 1075.28 / 138 = 7.79
2. Consider the numbers 2, 3, 5, 6, and 7. How many two-digit numbers can be formed from these
numbers if
a. repetition is not allowed?
n1 * n2 = 5 * 4 = 20
b. repetition is allowed?
n1 * n2 = 5 * 5 = 25
3. A college has three entrance gates and two exit gates. In how many ways can a student enter then
leave the building?
n1 * n2 = 3 * 2 = 6 ways
4. In how many ways can 8 different books be arranged in a shelf?
8P8 = 8! / (8 – 8)! = 8! / 0! = 8! / 1 = 8! = 40,320 ways
5. In how many ways can the president, vice president, and the secretary be elected from a group with 20
students?
Given 20 students, we are going to fill 3 distinct positions. Hence we can say that n = 20 and r = 3.
20P3 = 20! / (20 – 3)! = 20! / 17! = 6,840 ways
6. How many committees of 4 members can be formed from a group with 7 seniors and 6 juniors?
13C4 = 13! / [(13 – 4)! 4!] = 13! / (9! 4!) = 715 committees
7. A box contains 2 red, 5 blue, and 5 yellow balls. In how many ways can three balls be drawn from the
box such that
a. they are all blue?
5C3 * 2C0 * 5C0 = [5! / ((5 – 3)! 3!)] * [2! / ((2 – 0)! 0!)] * [5! / ((5 – 0)! 0!)] = 10 * 1 * 1 = 10 ways
b. 2 are red and 1 is yellow?
2C2 * 5C1 * 5C0 = [2! / ((2 – 2)! 2!)] * [5! / ((5 – 1)! 1!)] * [5! / ((5 – 0)! 0!)] = 1 * 5 * 1 = 5 ways
c. they are of different colors?
2C1 * 5C1 * 5C1 = [2! / ((2 – 1)! 1!)] * [5! / ((5 – 1)! 1!)] * [5! / ((5 – 1)! 1!)] = 2 * 5 * 5 = 50 ways
8. If three coins are tossed, what is the probability of getting
a. two heads?
Determine the number of sample points
S = {HHH, HHT, HTH, THH, TTH,THT,HTT,TTT}
n(S) = 8
Let A be the event of getting two heads.
A = {HHT, HTH, THH}
n(A) = 3
P(A) = n(A) / n(S) = 3/8 = 0.375
b. three heads?
Let B be the event of getting three heads.
B = {HHH}
n(B) = 1
P(B) = n(B) / n(S) = 1/8 = 0.125
9. A box contains 7 red, 3 green, and 3 yellow balls. If two balls are drawn from the box, then what is the
probability of getting
a. two red balls?
Determine the number of sample points.
n(S) = 13C2 = 13! / [(13 – 2)! 2!] = 78
Let A be the event of getting both red balls.
n(A) = 7C2 * 3C0 * 3C0 = [7! / ((7 – 2)! 2!)] * [3! / ((3 – 0)! 0!)] * [3! / ((3 – 0)! 0!)] = 21 * 1 * 1 = 21
P(A) = n(A) / n(S) = 21/78 = 7/26 = 0.269
b. 1 red and 1 yellow balls?
Let B be the event of getting 1 red and 1 yellow ball.
n(B) = 7C1 * 3C0 * 3C1 = [7! / ((7 – 1)! 1!)] * [3! / ((3 – 0)! 0!)] * [3! / ((3 – 1)! 1!)] = 7 * 1 * 3 = 21
P(B) = n(B) / n(S) = 21/78 = 7/26 = 0.269
c. non-red balls?
Let C be the event of getting non-red balls.
n(C) = 7C0 * 6C2 = [7! / ((7 – 0)! 0!)] * [6! / ((6 – 2)! 2!)] = 1 * 15 = 15
P(C) = n(C) / n(S) = 15/78 = 5/26 = 0.192
10. A committee of five is to be formed from a group of 6 men and 5 women. If these individuals have an
equal chance of being selected, what is the probability that the members are
a. all men?
Determine the number of sample points.
n(S) = 11C5 = 11! / [(11 – 5)! 5!] = 462
Let A be the event of selecting all men.
n(A) = 6C5 * 5C0 = [6! / ((6 – 5)! 5!)] * [5! / ((5 – 0)! 0!)] = 6 * 1 = 6
P(A) = n(A) / n(S) = 6/462 = 0.013
b. two are men and 3 are women?
Let B be the event of selecting 2 men and three women.
n(B) = 6C2 * 5C3 = [6! / ((6 – 2)! 2!)] * [5! / ((5 – 3)! 3!)] = 15 * 10 = 150
P(B) = n(B) / n(S) = 150/462 = 0.325
