
Republic of the Philippines

Bulacan State University


City of Malolos, Bulacan
Tel/Fax (044) 791-0153

2nd Semester, A.Y. 2019 – 2020

COE 202 – Engineering Data Analysis

Lecture 02 – Descriptive Statistics (Describing Data Set)

INTRODUCTION

This lecture covers descriptive statistics, that is, ways to describe and summarize a
set of data. The coverage is as follows. First, it shows how data that take on only a
relatively few distinct values can be described by using frequency tables or graphs, and
then deals with data whose set of values is grouped into different intervals. Next, it
discusses ways of summarizing data sets by use of statistics, which are numerical
quantities whose values are determined by the data. It considers three statistics that
are used to indicate the “center” of the data set: the sample mean, the sample median,
and the sample mode. It then introduces the sample variance and its square root, called
the sample standard deviation; these statistics are used to indicate the spread of the
values in the data set. The quartiles of a data set are also described and presented
using a box and whisker plot. Lastly, for data that come in pairs, a graphical technique
called the scatter diagram is introduced, as is the sample correlation coefficient, a
statistic that indicates the degree to which a large value of the first member of the
pair tends to go along with a large value of the second.

I. Cumulative and Relative Frequency Calculation

Frequency – the number of times a given datum occurs in a data set.

Cumulative Frequency – the sum of the frequency of a class and the frequencies of all
classes below it in a frequency distribution. The cumulative frequency at a certain point
is found by adding the frequency at the present point to the cumulative frequency of the
previous point. It is the 'running total' of frequencies.

Relative Frequency – the fraction of times a value (or event) occurs: the number of times
that the event occurs during experimental trials, divided by the total number of trials
conducted. It can be written as a fraction, a percent, or a decimal.

Steps in Calculating Cumulative and Relative Frequency

The following data, recorded in the table, are the heights (in inches) of a sample of
varsity players of Bulacan State University competing in the SCUAA. Make a cumulative
and relative frequency distribution table based on the given data.

For Cumulative Frequency:

a) Make a third column on the table and label it ‘Cumulative frequency’. In its first
cell (C2), input the first value of the frequency, which is 5. In the second cell of
the column (C3), type the formula “=C2+B3” (where C2 is the location of your first
cumulative frequency count and B3 is the location of your second frequency count),
and press the ‘Enter’ button.

b) The value will then appear in the second row of your new column. After that, click
the cell in which you entered the formula, then click and drag the little square in
the bottom right-hand corner of the cell to the bottom of the column.


c) Excel will then populate the cells with all the remaining values needed for the
Cumulative frequency column.

For Relative Frequency:

a) First, sum all the frequencies in the 2nd column: in the cell below the last
frequency (B10), type the formula “=SUM(B2:B9)”.

b) After that, make a fourth column on the table and label it ‘Relative frequency’.
In the first cell of the new column (D2), type the formula “=B2/B$10” (where B2 is
the location of your first frequency count and B$10 is the location of the total
frequency, with the row locked so that it does not shift when the formula is copied),
and press the ‘Enter’ button.

c) The value will then appear. After that, click the cell, then click and drag the
little square in the bottom right-hand corner of the cell to the bottom of the column.


d) Finally, Excel will populate the cells with all the remaining values needed for the
rest of the Relative frequency column.
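
The running-total and division logic of the steps above can also be sketched outside Excel. The snippet below is a minimal Python sketch, not the lecture's own method; the height classes and every frequency except the first count of 5 are hypothetical placeholders, since the original table is not reproduced here.

```python
from itertools import accumulate

# Hypothetical frequency table (height class -> count). Only the first
# count, 5, comes from the steps above; replace the rest with the real data.
heights = ["60-61", "62-63", "64-65", "66-67"]
freq = [5, 3, 15, 7]

# Cumulative frequency: a running total, like "=C2+B3" filled down.
cum_freq = list(accumulate(freq))

# Relative frequency: each count divided by the total frequency.
total = sum(freq)
rel_freq = [f / total for f in freq]

for h, f, cf, rf in zip(heights, freq, cum_freq, rel_freq):
    print(f"{h:>6} {f:>4} {cf:>4} {rf:>6.3f}")
```

The last cumulative frequency always equals the total frequency, and the relative frequencies always sum to 1, which makes a quick check on a finished table.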

II. Constructing a Histogram

A histogram is a graphical display of data using bars of different heights. In a
histogram, each bar groups numbers into ranges. Taller bars show that more data fall in
that range. Histograms can provide a visual display of large amounts of data that are
difficult to understand in a tabular or spreadsheet form.

Steps in Constructing a Histogram

For example, 50 Mechanical Engineering students in a classroom took an exam in their
subject Advanced Mathematics. Listed below are the scores they obtained on the exam
(out of 100). Draw a histogram for the following numerical data.

a) Record all the data in the spreadsheet. Set the bins (ranges or intervals) into which
the exam scores will be counted.

b) Select the entire data set.

c) Go to the Insert tab > Charts group > Insert Statistic Chart > click the Histogram
chart icon.

d) To use the Data Analysis tool, first enable it: click the File tab and then select
Options > Add-ins > Manage drop-down > Excel Add-ins > Analysis ToolPak > click OK,
and then go back to the Data tab and click Data Analysis.

e) In the ‘Data Analysis’ dialog box, select Histogram from the list, then click OK.

f) In the Histogram dialog box: select the Input Range (all the exam scores) > select
the Bin Range (all the bins set for the data) > specify the Output Range if you want
to get the histogram in the same worksheet > check Chart Output > then click OK.

g) Finally, a frequency distribution table and a Histogram Chart will appear in the
specified location.
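
To see what a frequency-per-bin table involves, the counting can be sketched in Python. This is a sketch under stated assumptions: the scores and bin limits below are hypothetical (the lecture's 50 exam scores are not reproduced here), and each bin is treated as an upper limit that counts the scores above the previous limit up to and including its own, with a final "More" bucket for anything above the last limit.

```python
from bisect import bisect_left

def bin_counts(values, upper_limits):
    """Count values per bin; upper_limits must be sorted in ascending order."""
    counts = [0] * (len(upper_limits) + 1)  # the last slot is the "More" bucket
    for v in values:
        # bisect_left returns the index of the first limit that is >= v
        counts[bisect_left(upper_limits, v)] += 1
    return counts

scores = [45, 52, 60, 61, 68, 70, 75, 79, 80, 88, 93, 100]  # hypothetical scores
limits = [50, 60, 70, 80, 90, 100]                           # hypothetical bins

for limit, count in zip(limits + ["More"], bin_counts(scores, limits)):
    print(limit, count)
```

A score equal to a bin limit (for example 60) is counted in that bin rather than the next one; a different boundary convention would shift such scores by one bin.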

III. Pareto Diagram

A Pareto diagram, also known as Pareto Analysis or a sorted histogram chart, contains
both columns sorted in descending order and a line representing the cumulative total
percentage. Pareto diagrams highlight the biggest factors in a data set and are
considered one of the seven basic tools of quality control, as they make it easy to see
the most common problems or issues. To do this effectively, the diagram utilizes the
Pareto Principle, which is commonly known as the 80/20 rule.

Components of a Pareto Diagram

[Figure: Pareto diagram with its components labeled: Categories of Data, Number of
Occurrences (Frequency), Cumulative Count Percentage, and Cumulative Percentage Curve]

Steps in Constructing Pareto Diagram

For example, suppose a business is investigating the delay associated with processing
credit card applications. The data could be grouped into the following categories: no
signature, residential address not valid, non-legible handwriting, already a customer,
and other.

a) Record the data.



b) Re-order the data from the largest to smallest and sum all the occurrences.

c) Determine the cumulative percentage that each category represents.

Manual percentage calculation:

$$\text{Percentage} = \left( \frac{\text{Individual frequency}}{\text{Total frequency}} \right) \times 100$$
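
The sorting and percentage steps can be sketched in Python as follows. The category names come from the credit card example, but the occurrence counts are hypothetical placeholders, because the recorded data are not reproduced here.

```python
# Hypothetical occurrence counts for the credit card application example.
delays = {
    "no signature": 40,
    "residential address not valid": 25,
    "non-legible handwriting": 20,
    "already a customer": 10,
    "other": 5,
}

# Step b: re-order from largest to smallest and sum all the occurrences.
ordered = sorted(delays.items(), key=lambda item: item[1], reverse=True)
total = sum(delays.values())

# Step c: individual and cumulative percentage of each category.
pareto = []
running = 0
for category, count in ordered:
    running += count
    pareto.append((category, count, 100 * count / total, 100 * running / total))

for category, count, pct, cum_pct in pareto:
    print(f"{category:<32} {count:>3} {pct:>6.1f}% {cum_pct:>6.1f}%")
```

The cumulative percentage of the last category is always 100, which is the point where the cumulative percentage curve ends.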

d) Prepare and analyze the diagram.

And the result:

[Figure: the resulting Pareto diagram, showing the Categories of Data, the Number of
Occurrences (Frequency), the Cumulative Count Percentage, and the Cumulative Percentage
Curve]

IV. Mean, Median, Mode, SD, Variance Calculation for Grouped and Ungrouped
Data

(Sample Problem – UNGROUPED DATA)

1. A study was conducted to see whether the foot widths of girls differ from those of
boys. Random samples of 19 girls and 20 boys from the fourth grade were taken. The
foot width of each student was measured, and the values are given below:
Girls 8.8 9.3 9.3 7.9 8.7 8.8 9 9.5 8.3 9 8.1 9.5 9.3 8.6 8.6 8.5 9 7.9 8.8
Boys 8.4 8.8 9.7 9.8 8.9 9.7 9.6 8.8 9.8 8.9 9.1 9.8 9.2 8.6 9.4 9.5 8.9 9.3 9 8.6

Find the mean, median, and mode of the foot width of the 20 boys. Compute also the
standard deviation and variance.

(Manual Computation)

➢ For Mean

The formula for calculating the mean for ungrouped data is x̄ = ∑X / n, where

∑X = sum of all the values, X1 + X2 + X3 + … + X20

n = total number of boy students = 20

Calculation:

The mean will be,

∑X = 8.4 + 8.8 + 9.7 + 9.8 + 8.9 + 9.7 + 9.6 + 8.8 + 9.8 + 8.9 + 9.1 + 9.8 + 9.2 + 8.6 +
9.4 + 9.5 + 8.9 + 9.3 + 9 + 8.6 = 183.8

x̄ = ∑X / n = 183.8 / 20 = 9.19

x̄ = 9.19

Steps involved in computing the mean for ungrouped data are given below:

a) Add up the foot widths of all the boys.

b) Divide this sum by the number of boys whose foot widths have been added.

➢ For Median

First, arrange the foot widths of the 20 boys in ascending order:

8.4, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 8.9, 9, 9.1,

9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8

Determine whether n is even or odd. If n is odd, the sample median is the value in
position (n + 1)/2; if n is even, it is the average of the values in positions n/2 and
n/2 + 1.

n = 20, which is even

n/2 = 20/2 = 10; the 10th foot width is 9.1

n/2 + 1 = 20/2 + 1 = 11; the 11th foot width is 9.2



(9.1 + 9.2)/2 = 9.15

Median = 9.15

➢ For Mode

In ungrouped data, the mode is the value that occurs most frequently (there can be more
than one mode):

8.4, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 8.9, 9, 9.1,

9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8

Here the foot widths 8.9 and 9.8 each occur three times, more often than any other
value; therefore 8.9 and 9.8 are the modes.

➢ For Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where: n = total number of boy students = 20

x̄ = mean = 9.19

xi = individual value

$$
\begin{aligned}
s^2 = \frac{1}{19}\Big[ & (8.4-9.19)^2 + (8.6-9.19)^2 + (8.6-9.19)^2 + (8.8-9.19)^2 + (8.8-9.19)^2 \\
& + (8.9-9.19)^2 + (8.9-9.19)^2 + (8.9-9.19)^2 + (9-9.19)^2 + (9.1-9.19)^2 \\
& + (9.2-9.19)^2 + (9.3-9.19)^2 + (9.4-9.19)^2 + (9.5-9.19)^2 + (9.6-9.19)^2 \\
& + (9.7-9.19)^2 + (9.7-9.19)^2 + (9.8-9.19)^2 + (9.8-9.19)^2 + (9.8-9.19)^2 \Big] = \frac{3.878}{19}
\end{aligned}
$$

s² = 0.2041

➢ For Standard deviation

$$s = \sqrt{s^2}$$

s = √0.2041

s = 0.4518
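
As a cross-check on the manual computation, the same statistics can be obtained with Python's standard `statistics` module. This is a minimal sketch, not part of the lecture's Excel workflow; it assumes Python 3.8 or later, where `statistics.multimode` is available.

```python
import statistics

# Foot widths of the 20 boys, in the order given in the problem.
boys = [8.4, 8.8, 9.7, 9.8, 8.9, 9.7, 9.6, 8.8, 9.8, 8.9,
        9.1, 9.8, 9.2, 8.6, 9.4, 9.5, 8.9, 9.3, 9.0, 8.6]

mean = statistics.mean(boys)
median = statistics.median(boys)        # averages the two middle values when n is even
modes = statistics.multimode(boys)      # every value tied for most frequent
variance = statistics.variance(boys)    # sample variance, divides by n - 1
std_dev = statistics.stdev(boys)        # square root of the sample variance

print(round(mean, 2), round(median, 2), sorted(modes))
print(round(variance, 4), round(std_dev, 4))
```

Note that `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching the sample formulas used here; `pvariance` and `pstdev` would divide by n instead.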

(Using Microsoft Excel)

a) Select your data.

b) To generate descriptive statistics for these foot widths, execute the following
steps. Click Data > Data Analysis > Descriptive Statistics > OK

c) Select the range A2:A21 as the Input Range.



d) Select cell C1 as the Output Range.

e) Make sure Summary statistics is checked. Then, click OK.

Result:

2. According to the journal Chemical Engineering, an important property of a fiber is
its water absorbency. A random sample of 20 pieces of cotton fiber is taken and the
absorbency of each piece is measured. The following are the absorbency values:

18.71 21.41 20.72 21.81 19.29 22.43 20.17 23.71 19.44 20.50
18.92 20.33 23.00 22.85 19.25 21.77 22.11 19.77 18.04 21.12

Calculate the sample mean and median for the above sample values.

(Manual Computation)

➢ For Mean

The formula for calculating the mean for ungrouped data is x̄ = ∑X / n, where

∑X = sum of all the values, X1 + X2 + X3 + … + X20

n = total number of absorbency values = 20

Calculation:

The mean will be,

∑X = 18.71 + 21.41 + 20.72 + 21.81 + 19.29 + 22.43 + 20.17 + 23.71 + 19.44 + 20.50 +
18.92 + 20.33 + 23.00 + 22.85 + 19.25 + 21.77 + 22.11 + 19.77 + 18.04 + 21.12 = 415.35

x̄ = ∑X / n = 415.35 / 20 = 20.77

x̄ = 20.77

Steps involved in computing the mean for ungrouped data are given below:

a) Add up all the absorbency values.

b) Divide this sum by the number of absorbency values that have been added.

➢ For Median

First, arrange the absorbency values in ascending order

18.04, 18.71, 18.92, 19.25, 19.29, 19.44, 19.77, 20.17, 20.33, 20.5

20.72, 21.12, 21.41, 21.77, 21.81, 22.11, 22.43, 22.85, 23, 23.71

Determine if the n is even or odd. If n is odd, the sample median is the value in position
(n + 1)/2; if n is even, it is the average of the values in positions n/2 and n/2 + 1.

n = 20, which is even

n/2 = 20/2 = 10; the 10th absorbency value is 20.5

n/2 + 1 = 20/2 + 1 = 11; the 11th absorbency value is 20.72

(20.5 + 20.72)/2 = 20.61

Median = 20.61

Find the sample variance and standard deviation.

➢ For Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where: n = total number of absorbency values = 20

x̄ = mean = 20.77

xi = individual value

$$
\begin{aligned}
s^2 = \frac{1}{19}\Big[ & (18.04-20.77)^2 + (18.71-20.77)^2 + (18.92-20.77)^2 + (19.25-20.77)^2 \\
& + (19.29-20.77)^2 + (19.44-20.77)^2 + (19.77-20.77)^2 + (20.17-20.77)^2 \\
& + (20.33-20.77)^2 + (20.5-20.77)^2 + (20.72-20.77)^2 + (21.12-20.77)^2 \\
& + (21.41-20.77)^2 + (21.77-20.77)^2 + (21.81-20.77)^2 + (22.11-20.77)^2 \\
& + (22.43-20.77)^2 + (22.85-20.77)^2 + (23-20.77)^2 + (23.71-20.77)^2 \Big]
\end{aligned}
$$

s² = 2.5329

➢ For Standard deviation

$$s = \sqrt{s^2}$$

s = √2.5329

s = 1.5915
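
The definition of the sample variance can be followed term by term in a few lines of Python. This is a minimal sketch that mirrors the manual computation (squared deviations from the mean, divided by n − 1) rather than calling a library routine; the variable names are illustrative only.

```python
from math import sqrt

absorbency = [18.71, 21.41, 20.72, 21.81, 19.29, 22.43, 20.17, 23.71, 19.44, 20.50,
              18.92, 20.33, 23.00, 22.85, 19.25, 21.77, 22.11, 19.77, 18.04, 21.12]

n = len(absorbency)
mean = sum(absorbency) / n

# One squared deviation (x_i - mean)^2 per observation.
squared_deviations = [(x - mean) ** 2 for x in absorbency]

variance = sum(squared_deviations) / (n - 1)   # sample variance s^2
std_dev = sqrt(variance)                       # sample standard deviation s

print(round(mean, 2), round(variance, 4), round(std_dev, 4))
```

The sketch uses the unrounded mean (20.7675) inside the deviations; at four decimal places this gives the same s² as using the rounded mean 20.77.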

(Using Microsoft Excel)

a) Select your data.

b) To generate descriptive statistics for these absorbency values, execute the
following steps. Click Data > Data Analysis > Descriptive Statistics > OK

c) Select the range A1:A20 as the Input Range.



d) Select cell C1 as the Output Range.

e) Make sure Summary statistics is checked. Then, click OK.

Result:

(Sample Problem – GROUPED DATA)

1. The following measurements were recorded for the drying time, in hours, of a certain
brand of latex paint. Calculate the mean, median, and mode, and find s² and s for
the grouped data.

8 2 5 4 5 7 6 9 13 3

Steps to execute prior to the calculation of the mean and variance for grouped data:

a) Determine the total number of observations, n.

n = 10

b) Compute for the range, 𝑹 = 𝒙𝒎𝒂𝒙 − 𝒙𝒎𝒊𝒏.

xmax = highest observation from the data set = 13

xmin = lowest observation from the data set = 2

R = 13 – 2

R = 11

c) Select the recommended number of cells k (k = √n) and compute the cell width c:

$$c = \frac{x_{max} - x_{min}}{k}$$

$$k = \sqrt{10} = 3.16 \approx 3 \text{ or } 4$$

$$c = \frac{13 - 2}{\sqrt{10}} = 3.48 \approx 3$$

Hence, the intervals are:


Class Interval

2-4

5-7

8-10

11-13

d) Tally the numbers in each class interval (from the data set above). Next, count
the tally marks and write the frequency in the third column. The frequency is just
the total.
Time (hr) Tally Frequency (f)

2-4 III 3

5-7 IIII 4

8-10 II 2

11-13 I 1
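
Steps a) to d) can be sketched in Python as shown below. This is a sketch under stated assumptions, not the lecture's procedure verbatim: the cell width is rounded to the nearest whole number, the first class starts at the smallest observation, and the classes use inclusive whole-number limits.

```python
from math import sqrt

times = [8, 2, 5, 4, 5, 7, 6, 9, 13, 3]   # drying times in hours

n = len(times)                             # a) number of observations
data_range = max(times) - min(times)       # b) R = x_max - x_min
k = sqrt(n)                                # c) recommended number of cells
width = round(data_range / k)              #    cell width c, rounded to a whole number

# Build inclusive whole-number classes starting at the smallest observation.
classes = []
lower = min(times)
while lower <= max(times):
    classes.append((lower, lower + width - 1))
    lower += width

# d) Tally: count the observations that fall inside each class.
frequency = [sum(lo <= t <= hi for t in times) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequency):
    print(f"{lo}-{hi}: {f}")
```

Rounding the width differently, or starting the first class elsewhere, would produce a different but equally valid set of classes, so the grouped statistics that follow depend on these choices.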

e) Using Microsoft Excel, encode the obtained class intervals and frequencies.

f) In row 1 of columns E, F, G, and H, encode the following headers:

X = class mark = the midpoint of each class interval

fX = the product of the frequency (f) and the class mark (X)

fX² = the product of the frequency (f) and the square of the class mark (X²)

cf (cumulative frequency) = obtained by adding the frequencies successively from the
lowest to the highest interval

g) For the class mark, at row 3 column E, type “=(A3+B3)/2” then press enter.

h) To get the remaining class marks, click and drag the little square in the bottom
right-hand corner of the cell to the bottom of the column.

i) For fX, at row 3 column F, type “=PRODUCT(D3, E3)” then press Enter.

j) To get the remaining fX values, click and drag the little square in the bottom
right-hand corner of the cell to the bottom of the column.

k) For fX², at row 3 column G, type “=PRODUCT(D3, E3, E3)” then press Enter.

l) To get the remaining fX² values, click and drag the little square in the bottom
right-hand corner of the cell to the bottom of the column.

m) For the cumulative frequency (cf), at row 3 column H, input the first value of the
frequency, which is 3. In the cell below it, type the formula “=SUM(H3, D4)” (where
H3 is the location of your first cumulative frequency count and D4 is the location
of your second frequency count) then press Enter.

n) To get the remaining cf values, click and drag the little square in the bottom
right-hand corner of the cell to the bottom of the column.

o) To get the summation of the frequency (f), fX, and fX², use the formulas
“=SUM(D3:D6)”, “=SUM(F3:F6)”, and “=SUM(G3:G6)”, respectively.

➢ Calculation of Mean
Mean = ∑fX / n

X = midpoint of each class interval

n = the total number of observations = 10

∑fX = sum of the midpoints weighted by their frequencies = 63

Mean = ∑fX / n = 63 / 10 = 6.3

Thus, the mean drying time is 6.3 hours.



➢ Calculation of Median

Locate the first cumulative frequency that is greater than or equal to n/2 and note its
corresponding class; this is the median class.

n/2 = 10/2 = 5
Time (hr)   Tally   f    X    fX   fX²   cf
2-4         III     3    3    9    27    3
5-7         IIII    4    6    24   144   7
8-10        II      2    9    18   162   9
11-13       I       1    12   12   144   10

n = 10      ∑fX = 63      ∑fX² = 477

Now, the formula for calculating the median when the data are grouped in class intervals
is

$$\text{Median} = L + \left( \frac{\frac{n}{2} - F}{f} \right) \times c$$

Where L = lower boundary point of the median class = (5+4)/2 = 4.5

n = total frequency = 10

F = cumulative frequency of the class before the median class = 3

f = frequency of the median class = 4

c = class interval size

= upper boundary – lower boundary = 7.5 – 4.5 = 3

$$\text{Median} = 4.5 + \left( \frac{\frac{10}{2} - 3}{4} \right) \times 3$$

Median = 6
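
The grouped mean and median can be sketched in Python from the class limits and frequencies alone. This is a sketch under one assumption: the class limits are whole numbers one unit apart, so each class boundary lies 0.5 below the lower limit and 0.5 above the upper limit.

```python
# Class limits and frequencies from the drying-time table.
classes = [(2, 4), (5, 7), (8, 10), (11, 13)]
freq = [3, 4, 2, 1]

n = sum(freq)
marks = [(lo + hi) / 2 for lo, hi in classes]           # class marks X
mean = sum(f * x for f, x in zip(freq, marks)) / n      # sum of fX divided by n

# The median class is the first class whose cumulative frequency reaches n/2.
cumulative = 0          # cumulative frequency of the classes before the current one
for (lo, hi), f in zip(classes, freq):
    if cumulative + f >= n / 2:
        L = lo - 0.5                    # lower boundary of the median class
        c = (hi + 0.5) - L              # class interval size
        median = L + (n / 2 - cumulative) / f * c
        break
    cumulative += f

print(mean, median)
```

Inside the loop, `cumulative` plays the role of F in the median formula at the moment the median class is found.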

➢ Calculation of Mode

Locate the class mode by finding the class interval that contains the largest frequency.



Now, the formula for calculating the mode when the data are grouped in class intervals
is

$$\text{Mode} = L + \left( \frac{D_1}{D_1 + D_2} \right) \times c$$

Where L = lower boundary point of the class mode = (5+4)/2 = 4.5

c = class interval size

= upper boundary – lower boundary = 7.5 – 4.5 = 3

D1 = the difference between the frequency of the class mode and the frequency of the
class before the class mode = 4 – 3 = 1

D2 = the difference between the frequency of the class mode and the frequency of the
class after the class mode = 4 – 2 = 2

$$\text{Mode} = 4.5 + \left( \frac{1}{1 + 2} \right) \times 3$$

Mode = 5.5
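
The grouped mode formula can be sketched in Python as follows. Two choices in this sketch are assumptions that the lecture does not state: if the class mode is the first or last class, the missing neighbor is given a frequency of 0, and if two classes tie for the largest frequency, the first one is used.

```python
# Class limits and frequencies from the drying-time table.
classes = [(2, 4), (5, 7), (8, 10), (11, 13)]
freq = [3, 4, 2, 1]

i = freq.index(max(freq))                          # position of the class mode
before = freq[i - 1] if i > 0 else 0               # frequency of the class before it
after = freq[i + 1] if i < len(freq) - 1 else 0    # frequency of the class after it

L = classes[i][0] - 0.5          # lower boundary of the class mode
c = (classes[i][1] + 0.5) - L    # class interval size
d1 = freq[i] - before            # D1
d2 = freq[i] - after             # D2

mode = L + d1 / (d1 + d2) * c
print(round(mode, 2))
```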

➢ Calculation of Variance and Standard Deviation


The formula for calculating the variance is,

$$s^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}$$

Substituting the values,

$$s^2 = \frac{477 - \frac{(63)^2}{10}}{10 - 1} = \frac{80.1}{9}$$

So,

s² = 8.9

For the standard deviation,

$$s = \sqrt{s^2} = \sqrt{8.9}$$

s = 2.98
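
The grouped variance formula uses only the class marks and the frequencies, so it translates directly into Python. This is a minimal sketch with illustrative variable names.

```python
from math import sqrt

marks = [3, 6, 9, 12]      # class marks X of the drying-time classes
freq = [3, 4, 2, 1]        # frequencies f

n = sum(freq)
sum_fx = sum(f * x for f, x in zip(freq, marks))         # sum of fX
sum_fx2 = sum(f * x * x for f, x in zip(freq, marks))    # sum of fX^2

variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)         # grouped sample variance
std_dev = sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
```

Because every observation is replaced by its class mark, this value is an approximation of the variance of the raw data, not an exact match.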

2. The following set of data is the number of orders received each day during the past
50 days at the office of a mail-order company. (a) Calculate the mean, median, and
mode for the grouped data. (b) Find the standard deviation and variance.
21 18 19 20 18

10 15 16 17 13

17 20 14 19 16

20 17 17 20 15

15 12 15 16 21

16 17 18 10 13

14 20 17 17 20

17 18 14 21 18

20 11 18 19 15

15 21 17 15 16

(Manual Computation)

Steps to execute prior to the calculation of the mean and variance for grouped data:

a) Determine the total number of observations, n.

n = 50

b) Compute for the range, 𝑹 = 𝒙𝒎𝒂𝒙 − 𝒙𝒎𝒊𝒏.

xmax = highest observation from the data set = 21

xmin = lowest observation from the data set = 10

R = 21 – 10

R = 11

c) Select the recommended number of cells k (k = √n) and compute the cell width c:

$$c = \frac{x_{max} - x_{min}}{k}$$

$$k = \sqrt{50} = 7.07 \approx 7$$

$$c = \frac{21 - 10}{\sqrt{50}} = 1.56 \approx 2$$

Hence, the intervals are:


Class Interval

10-11

12-13

14-15

16-17

18-19

20-21

22-23

d) Tally the numbers in each class interval (from the data set above). Next, count
the tally marks and write the frequency in the third column. The frequency is just
the total.
Class Interval Tally Frequency (f)

10-11 III 3

12-13 III 3

14-15 IIIII IIIII 10

16-17 IIIII IIIII IIII 14

18-19 IIIII IIII 9

20-21 IIIII IIIII I 11

22-23 0

e) Using Microsoft Excel and following the instructions from the previous problem,
we obtain the following,

➢ Calculation of Mean

Mean = ∑fX / n

X = midpoint of each class interval

n = the total number of observations = 50

∑fX = sum of the midpoints weighted by their frequencies = 837

Mean = ∑fX / n = 837 / 50 = 16.74

Thus, the mean number of orders received per day is 16.74.

➢ Calculation of Median

Locate the first cumulative frequency that is greater than or equal to n/2 and note its
corresponding class; this is the median class.

n/2 = 50/2 = 25
No. of Orders   Tally              f    X      fX      fX²       cf
10-11           III                3    10.5   31.5    330.75    3
12-13           III                3    12.5   37.5    468.75    6
14-15           IIIII IIIII        10   14.5   145     2102.5    16
16-17           IIIII IIIII IIII   14   16.5   231     3811.5    30
18-19           IIIII IIII         9    18.5   166.5   3080.25   39
20-21           IIIII IIIII I      11   20.5   225.5   4622.75   50
22-23                              0    22.5   0       0         50

n = 50      ∑fX = 837      ∑fX² = 14416.5

Now, the formula for calculating the median when the data are grouped in class intervals
is

$$\text{Median} = L + \left( \frac{\frac{n}{2} - F}{f} \right) \times c$$

Where L = lower boundary point of the median class = (16+15)/2 = 15.5

n = total frequency = 50

F = cumulative frequency of the class before the median class = 16

f = frequency of the median class = 14

c = class interval size

= upper boundary – lower boundary = 17.5 – 15.5 = 2



$$\text{Median} = 15.5 + \left( \frac{\frac{50}{2} - 16}{14} \right) \times 2$$

Median = 16.79

➢ Calculation of Mode

Locate the class mode by finding the interval that contains the largest frequency.

Now, the formula for calculating the mode when the data are grouped in class intervals
is

$$\text{Mode} = L + \left( \frac{D_1}{D_1 + D_2} \right) \times c$$

Where L = lower boundary point of the class mode = (16+15)/2 = 15.5

c = class interval size

= upper boundary – lower boundary = 17.5 – 15.5 = 2

D1 = the difference between the frequency of the class mode and the frequency of the
class before the class mode = 14 – 10 = 4

D2 = the difference between the frequency of the class mode and the frequency of the
class after the class mode = 14 – 9 = 5

$$\text{Mode} = 15.5 + \left( \frac{4}{4 + 5} \right) \times 2$$

Mode = 16.39

➢ Calculation of Variance and Standard Deviation


The formula for calculating the variance is,

$$s^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}$$

Substituting the values,

$$s^2 = \frac{14416.5 - \frac{(837)^2}{50}}{50 - 1} = \frac{405.12}{49}$$

So,

s² = 8.27

For the standard deviation,

$$s = \sqrt{s^2} = \sqrt{8.27}$$

s = 2.88
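
The whole grouped-data procedure for the 50 daily order counts can be checked end to end in Python. This is a sketch under the assumptions used in this problem: classes of width 2 starting at 10 (10-11 through 22-23), class boundaries 0.5 beyond the whole-number limits, and a class mode that has a neighbor on both sides.

```python
from math import sqrt

orders = [21, 18, 19, 20, 18, 10, 15, 16, 17, 13,
          17, 20, 14, 19, 16, 20, 17, 17, 20, 15,
          15, 12, 15, 16, 21, 16, 17, 18, 10, 13,
          14, 20, 17, 17, 20, 17, 18, 14, 21, 18,
          20, 11, 18, 19, 15, 15, 21, 17, 15, 16]

width = 2
classes = [(lo, lo + width - 1) for lo in range(10, 24, width)]   # 10-11 ... 22-23
freq = [sum(lo <= x <= hi for x in orders) for lo, hi in classes]
marks = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freq)

sum_fx = sum(f * x for f, x in zip(freq, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freq, marks))
mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = sqrt(variance)

# Median class: the first class whose cumulative frequency reaches n/2.
cumulative = 0
for (lo, hi), f in zip(classes, freq):
    if cumulative + f >= n / 2:
        median = (lo - 0.5) + (n / 2 - cumulative) / f * width
        break
    cumulative += f

# Class mode: the class with the largest frequency (an interior class here).
m = freq.index(max(freq))
d1, d2 = freq[m] - freq[m - 1], freq[m] - freq[m + 1]
mode = (classes[m][0] - 0.5) + d1 / (d1 + d2) * width

print(freq)
print(round(mean, 2), round(median, 2), round(mode, 2))
print(round(variance, 2), round(std_dev, 2))
```

Tallying from the raw list also confirms the frequency column of the table, which is the step most prone to counting mistakes when done by hand.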

V. Constructing Simple Box and Whisker Plot

Simple Box and Whisker Plot

Also called a boxplot, it is a graphical presentation of data by a five-number summary
(Minimum, Lower or First Quartile (Q1), Median, Upper or Third Quartile (Q3), Maximum).
It does not show a distribution in as much detail as a stem-and-leaf plot or histogram,
but it is especially useful for indicating whether a distribution is skewed and whether
there are potential unusual observations in the data set. Box and whisker plots also
allow multiple sets of data to be displayed in the same graph, so they are very useful
when large numbers of observations are involved and when two or more data sets are
being compared.

Components of a Simple Box and Whisker Plot

❖ the ends of the box are the upper and lower quartiles, so the box spans
the interquartile range (Interquartile Range = Q3 – Q1)
❖ the median is marked by a vertical line inside the box
❖ the whiskers are the two lines outside the box that extend to the highest and
lowest observations.

Solving for a Simple Box and Whisker Plot

The data values in the table below show the number of televisions sold at a department
store each month for 12 months. Create a box-and-whisker plot to display the data and
find the five-number summary: Maximum, First Quartile, Median, Third Quartile, and
Minimum.
January 143

February 80

March 85

August 110

June 98

September 91

May 102

July 89

October 95

April 108

November 118

December 152

a) Record the data in the spreadsheet, simply copy-paste.



b) Find the values of the five-number summary, which are the Maximum, First
Quartile, Median, Third Quartile, and Minimum.

Minimum: 80

c) To find the First Quartile:

d) To find the Median:



e) To find the Third Quartile:

Maximum: 152

f) Select all the data points on the table and then go to the Insert tab > Charts group
> Insert Statistic Chart > click Box and Whisker.

And the result:
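
The five-number summary for the television data can also be cross-checked in Python with the standard `statistics` module (Python 3.8 or later). Which quartile convention the lecture's spreadsheet used is not stated, so this sketch computes two common ones; they are expected to correspond to Excel's inclusive and exclusive quartile functions, but that correspondence is an assumption here.

```python
import statistics

# Monthly television sales, in the order listed in the table.
sales = [143, 80, 85, 110, 98, 91, 102, 89, 95, 108, 118, 152]

minimum, maximum = min(sales), max(sales)
median = statistics.median(sales)

# Two common quartile conventions; each returns [Q1, median, Q3].
q_inclusive = statistics.quantiles(sales, n=4, method="inclusive")
q_exclusive = statistics.quantiles(sales, n=4, method="exclusive")

print(minimum, median, maximum)
print("inclusive:", q_inclusive)
print("exclusive:", q_exclusive)
```

The minimum, median, and maximum are the same under both conventions; only Q1 and Q3 differ, so a box plot drawn by different software can have slightly different box edges for the same data.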



VI. Scatter Diagram

Also called a scatter plot (XY graph), it is a type of graph where corresponding values
from a set of data are placed as points on a coordinate plane. The relationship between
the points may be positive or negative, strong or weak. The main purpose of a scatter
diagram is to show how strong the relationship, or correlation, between the two
variables is. The more tightly the data points fall along a straight line, the higher
the correlation.

Scatter Diagram and Correlation

There are three types of correlation that are needed to correctly interpret a scatter
diagram of two numeric variables.

i. Positive Correlation - as the x variable increases, so does the y variable. An
example of a strong positive correlation is the amount of time students spend
studying and their grades.
ii. Negative Correlation - as the x variable increases, the y variable decreases.
Skipping classes and grades are negatively correlated: as the number of absences
increases, the exam scores decrease.
iii. No Correlation - there is no evident relationship between the two variables; the
dots are scattered around the entire chart area. For example, students' heights
and grades appear to have no correlation, as the former does not affect the latter
in any way.

Properties of the Linear Correlation Coefficient (r)

1. The linear correlation coefficient r is always between -1 and 1, inclusive.

2. If r = +1, there is a perfect positive linear relation between the two variables.

3. If r = -1, there is a perfect negative linear relation between the two variables.

4. The closer r is to +1, the stronger is the evidence of positive association
between the two variables.

5. The closer r is to -1, the stronger is the evidence of negative association
between the two variables.

6. If r is close to 0, there is little or no evidence of a linear relation between the
two variables - this does not mean there is no relation, only that there is
no linear relation.

Constructing a Scatter Diagram

An engineering student had a hypothesis for his research in Engineering Data Analysis.
He believed that the more students studied Admath, the better their scores would be. He
took a poll in which he asked students the average number of hours that they studied
per week during a given semester. He then found out the overall percent grade that they
received in their math classes. His data are shown in the table below. Draw a scatter
diagram based on the table and find its correlation.
Study Time (hrs) Math Grade (percent)

4 82

3.5 81

5 90

2 74

3 77

6.5 97

0.5 51

3.5 58

4.5 86

5 88

1 62

1.5 75

3 70

5.5 90

a) Copy the data in the Table into the spreadsheet, simply copy-paste.

b) Select all the data and go to Insert > Charts group > Scatter Chart > click on
the first chart.

The diagram will automatically appear. Put a proper title and labels on the diagram.

[Figure: scatter diagram titled “Study Time vs Math Grade”, with Study Time (in Hours)
on the horizontal axis and Math Grade (in percent) on the vertical axis]

c) To find the correlation, type the formula “=CORREL(Y values, X values)” in a new
cell and the answer will automatically appear.
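
For readers who want to see what a correlation function computes, the sample linear correlation coefficient can be sketched in Python directly from its definition. The function name `pearson_r` is illustrative and not from the lecture; the data are the study-time table above.

```python
from math import sqrt

hours = [4, 3.5, 5, 2, 3, 6.5, 0.5, 3.5, 4.5, 5, 1, 1.5, 3, 5.5]
grade = [82, 81, 90, 74, 77, 97, 51, 58, 86, 88, 62, 75, 70, 90]

def pearson_r(xs, ys):
    """Sample linear correlation coefficient of paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, grade)
print(round(r, 2))
```

The coefficient is symmetric in its two arguments, so swapping the X and Y values gives the same r. For this data set r is positive and fairly close to +1, consistent with the upward trend visible in the scatter diagram.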
