Lecture 02 - Descriptive Statistics (Describing Data Set)
INTRODUCTION
This lecture discusses the subject matter of descriptive statistics and, in doing so,
presents ways to describe and summarize a set of data. The coverage is as follows. It
first indicates how data that take on only a relatively few distinct values can be
described by using frequency tables or graphs, and then deals with data whose set of
values is grouped into different intervals. It discusses ways of summarizing data sets
by use of statistics, which are numerical quantities whose values are determined by the
data. It considers three statistics that are used to indicate the “center” of the data
set: the sample mean, the sample median, and the sample mode. It introduces the sample
variance and its square root, called the sample standard deviation; these statistics
are used to indicate the spread of the values in the data set. It also describes the
quartiles of a given data set and presents them using a box and whisker plot. Lastly,
for paired data, a graphical technique called the scatter diagram is introduced, as is
the sample correlation coefficient, a statistic that indicates the degree to which a
large value of the first member of the pair tends to go along with a large value of the
second.
Frequency – the number of times a given datum occurs in a data set. A relative
frequency is the fraction of times an answer occurs.
Cumulative Frequency – the sum of the frequency of a class and the frequencies of all
classes below it in a frequency distribution. The cumulative frequency at a certain
point is found by adding the frequency at the present point to the cumulative frequency
of the previous point. It is the 'running total' of frequencies.
Relative frequency – the number of times that an event occurs during experimental
trials, divided by the total number of trials conducted. It can be written as a
fraction, a percent, or a decimal.
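The three definitions above can be sketched in a few lines of Python. This is only an illustration: the data set below is hypothetical (it is not the table used in the lecture), and the variable names are chosen for this example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data set (not from the lecture): ten recorded answers.
data = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3]

counts = Counter(data)                     # frequency of each distinct value
values = sorted(counts)                    # distinct values in ascending order
freq = [counts[v] for v in values]
cum_freq = list(accumulate(freq))          # running total of the frequencies
rel_freq = [f / len(data) for f in freq]   # frequency divided by the total count

for v, f, cf, rf in zip(values, freq, cum_freq, rel_freq):
    print(f"{v:>5} {f:>4} {cf:>4} {rf:>6.2f}")
```

The last cumulative frequency always equals the number of observations, and the relative frequencies always add up to 1, which is a quick way to check a table built by hand.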
a) Make a third column on the table and label it ‘Cumulative frequency’. In its first
cell, input the first value of the frequency, which is 5. In the second cell of
the column, type the formula “=C2+B3” (where C2 is the location of your first
cumulative frequency count and B3 is the location of your second frequency
count), and press the ‘Enter’ key.
b) The value will then appear in that cell. After that, select the cell where you
entered the formula. Click and drag the little square in the bottom right-hand
corner of the cell down to the bottom of the column.
Republic of the Philippines
Bulacan State University
City of Malolos, Bulacan
Tel/Fax (044) 791-0153
c) Excel will then populate the cells with all the remaining values needed for the
Cumulative frequency column.
a) First, sum all the frequencies at the bottom of the 2nd column by typing the
formula “=SUM(B2:B9)”.
b) After that, make a fourth column on the table and label it ‘Relative frequency’.
In the first cell of the new column (D2), type the formula “=B2/B$10” (where B2
is the location of your first frequency count and B10 is the location of the
total frequency), and press the ‘Enter’ key.
c) The value will then appear. After that, select the cell, then click and drag the
little square in the bottom right-hand corner of the cell down to the bottom of
the column.
d) Finally, Excel will populate the cells with all the remaining values needed for
the rest of the Relative Frequency column.
a) Record all the data in the spreadsheet. Set a Bin (range) or intervals where the
frequency of the exam scores will occur.
c) On the Insert tab, in the Charts group, click Insert Statistic Chart, then click
the Histogram chart icon.
d) Click the File tab and then select Options > Add-ins > Manage drop-down > Excel
Add-ins > Analysis ToolPak > click OK, and then go back to the Data tab and
click Data Analysis.
e) In the ‘Data Analysis’ dialog box, select Histogram from the list, then click OK.
f) In the Histogram dialog box, select the Input Range (all the exam scores) and the
Bin Range (all the set ranges for the data). Specify the Output Range if you want
to get the Histogram in the same worksheet, select Chart Output, then click OK.
g) Finally, a frequency distribution table and a Histogram Chart will appear in the
specified location.
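The counting that produces the frequency distribution table can be sketched as follows. The exam scores and bin limits here are hypothetical, and the sketch assumes each bin limit is inclusive (a score equal to the limit is counted in that bin), with one extra slot for scores above the last limit.

```python
from bisect import bisect_left

# Hypothetical exam scores and bin upper limits (not from the lecture).
scores = [55, 62, 68, 71, 74, 77, 80, 83, 88, 95]
bins = [60, 70, 80, 90]           # each bin counts scores <= its limit

counts = [0] * (len(bins) + 1)    # last slot counts scores above the final limit
for s in scores:
    # bisect_left returns the index of the first limit that is >= s
    counts[bisect_left(bins, s)] += 1

for limit, count in zip(bins + ["More"], counts):
    print(limit, count)
```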
A Pareto diagram, also known as Pareto Analysis or a sorted histogram chart, contains
both columns sorted in descending order and a line representing the cumulative total
percentage. Pareto diagrams highlight the biggest factors in a data set and are
considered one of the seven basic tools of quality control, as they make it easy to see
the most common problems or issues. To do this effectively, the diagram utilizes the
Pareto Principle, which is best known as the 80/20 rule.
Categories of Data
For example, suppose a business is investigating the delay associated with processing
credit card applications. The data could be grouped into the following categories: no
signature, residential address not valid, non-legible handwriting, already a customer,
and other.
b) Re-order the data from the largest to smallest and sum all the occurrences.
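The re-ordering and the cumulative percentage line can be sketched as below. The category names come from the example above, but the counts are hypothetical, since the original table is not available here.

```python
# Hypothetical counts for the credit card application categories named above.
delays = {
    "no signature": 40,
    "residential address not valid": 25,
    "non-legible handwriting": 20,
    "already a customer": 10,
    "other": 5,
}

# Re-order from largest to smallest and sum all the occurrences.
ordered = sorted(delays.items(), key=lambda item: item[1], reverse=True)
total = sum(delays.values())

# Cumulative percentage for the line drawn over the sorted columns.
running = 0
pareto = []
for category, count in ordered:
    running += count
    pareto.append((category, count, 100 * running / total))

for row in pareto:
    print(row)
```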
IV. Mean, Median, Mode, SD, Variance Calculation for Grouped and Ungrouped
Data
1. A study was conducted to see whether the foot widths of girls differ from those of
boys. Random samples of 19 girls and 20 boys from the fourth grade were taken. The
foot width of each student was measured, and the values are given below:
Girls 8.8 9.3 9.3 7.9 8.7 8.8 9 9.5 8.3 9 8.1 9.5 9.3 8.6 8.6 8.5 9 7.9 8.8
Boys 8.4 8.8 9.7 9.8 8.9 9.7 9.6 8.8 9.8 8.9 9.1 9.8 9.2 8.6 9.4 9.5 8.9 9.3 9 8.6
Find the mean, median, and mode of the foot width of the 20 boys. Compute also the
standard deviation and variance.
(Manual Computation)
➢ For Mean
𝑥̅ = ∑X / n
Calculation:
∑X = 8.4 + 8.8 + 9.7 + 9.8 + 8.9 + 9.7 + 9.6 + 8.8 + 9.8 + 8.9 + 9.1 + 9.8 + 9.2 + 8.6 +
9.4 + 9.5 + 8.9 + 9.3 + 9 + 8.6 = 183.8
x̅ = ∑X / n = 183.8 / 20 = 9.19
Steps involved in computing means for ungrouped data are given below:
➢ For Median
9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8, …
Determine if the n is even or odd. If n is odd, the sample median is the value in position
(n + 1)/2; if n is even, it is the average of the values in positions n/2 and n/2 + 1.
n = 20 and even
Median = 9.15
➢ For Mode
In ungrouped data, mode is that single score which occurs most frequently (there can
be more than one mode):
9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8
Here the foot widths 8.9 and 9.8 are each repeated the maximum number of times (three
times each); therefore 8.9 and 9.8 are the modes.
➢ For Variance
s² = ∑(xi − x̅)² / (n − 1)
where x̅ = mean, xi = individual value, and n − 1 = 20 − 1 = 19
s² = 0.2041
s = √s² = √0.2041 = 0.4518
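The manual results for the 20 boys can be cross-checked with Python's standard `statistics` module. This is a sketch for verification only; the lecture itself uses manual computation and Excel.

```python
import statistics

# Foot widths of the 20 boys, as given in the problem.
boys = [8.4, 8.8, 9.7, 9.8, 8.9, 9.7, 9.6, 8.8, 9.8, 8.9,
        9.1, 9.8, 9.2, 8.6, 9.4, 9.5, 8.9, 9.3, 9.0, 8.6]

mean = statistics.mean(boys)
median = statistics.median(boys)            # averages the two middle values
modes = sorted(statistics.multimode(boys))  # every value tied for most frequent
variance = statistics.variance(boys)        # sample variance, divides by n - 1
std_dev = statistics.stdev(boys)            # square root of the sample variance

print(round(mean, 2), round(median, 2), modes,
      round(variance, 4), round(std_dev, 4))
```

Note that `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching the sample formulas above; `pvariance` and `pstdev` would divide by n instead.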
b) To generate descriptive statistics for these foot widths, execute the following
steps. Click Data > Data Analysis > Descriptive Statistics > OK
Result:
18.71 21.41 20.72 21.81 19.29 22.43 20.17 23.71 19.44 20.50
18.92 20.33 23.00 22.85 19.25 21.77 22.11 19.77 18.04 21.12.
Calculate the sample mean and median for the above sample values.
(Manual Computation)
➢ For Mean
𝑥̅ = ∑X / n
Calculation:
∑X = 18.71 + 21.41 + 20.72 + 21.81 + 19.29 + 22.43 + 20.17 + 23.71 + 19.44 + 20.50 +
18.92 + 20.33 + 23.00 + 22.85 + 19.25 + 21.77 + 22.11 + 19.77 + 18.04 + 21.12 = 415.35
x̅ = ∑X / n = 415.35 / 20 = 20.77
Steps involved in computing means for ungrouped data are given below:
➢ For Median
18.04, 18.71, 18.92, 19.25, 19.29, 19.44, 19.77, 20.17, 20.33, 20.5
20.72, 21.12, 21.41, 21.77, 21.81, 22.11, 22.43, 22.85, 23, 23.71
Determine if the n is even or odd. If n is odd, the sample median is the value in position
(n + 1)/2; if n is even, it is the average of the values in positions n/2 and n/2 + 1.
n = 20 and even
Median = 20.61
➢ For Variance
s² = ∑(xi − x̅)² / (n − 1)
where x̅ = mean, xi = individual value, and n − 1 = 20 − 1 = 19
s² = 2.5329
s = √s² = √2.5329 = 1.5915
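To make the formula itself visible, the sketch below computes the sample variance of the 20 values directly from its definition (sum of squared deviations divided by n − 1) rather than calling a library function. Variable names are chosen for this example.

```python
import math

# The 20 sample values given in the problem.
sample = [18.71, 21.41, 20.72, 21.81, 19.29, 22.43, 20.17, 23.71, 19.44, 20.50,
          18.92, 20.33, 23.00, 22.85, 19.25, 21.77, 22.11, 19.77, 18.04, 21.12]

n = len(sample)
mean = sum(sample) / n

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in sample]

variance = sum(squared_deviations) / (n - 1)   # divide by n - 1 = 19
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 4), round(std_dev, 4))
```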
Result:
1. The following measurements were recorded for drying time, in hours, of a certain
brand of latex paint. Calculate the mean, median, and mode, and find s² and
s for the grouped data.
8 2 5 4 5 7 6 9 13 3
Steps to execute prior to the calculation of mean and variance for grouped data:
n = 10
R = 13 – 2
R = 11
c) Select the recommended number of cells k (k = √n) and compute the cell
width c, where c = (x_max − x_min) / k.
k = √10 = 3.16 ≈ 3 or 4
c = (13 − 2) / √10 = 3.48 ≈ 3
2-4
5-7
8-10
11-13
d) Tally the numbers in each class interval (from the data set above). Next, count
the tally marks and write the frequency in the third column. The frequency is just
the total.
Time (hr) Tally Frequency (f)
2-4 III 3
5-7 IIII 4
8-10 II 2
11-13 I 1
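The grouping steps above (range, number of cells, cell width, tally) can be sketched in Python. The sketch assumes integer data, so that consecutive classes are lo to lo + c − 1, and it rounds c to the nearest whole number as the worked solution does.

```python
import math

# Drying times in hours, as given in the problem.
times = [8, 2, 5, 4, 5, 7, 6, 9, 13, 3]

n = len(times)
data_range = max(times) - min(times)   # R = 13 - 2 = 11
k = math.sqrt(n)                       # recommended number of cells
c = round(data_range / k)              # cell width, 3.48 rounded to 3

# Build consecutive classes of width c, starting at the minimum value.
classes = []
lower = min(times)
while lower <= max(times):
    classes.append((lower, lower + c - 1))
    lower += c

# Tally: count how many observations fall inside each class.
freq = [sum(lo <= t <= hi for t in times) for lo, hi in classes]

for (lo, hi), f in zip(classes, freq):
    print(f"{lo}-{hi}", f)
```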
e) Using Microsoft Excel, encode the obtained class intervals and frequencies.
fX² = the product of the frequency (f) and the square of the class mark (X)
g) For the class mark, at row 3 column E, type “=(A3+B3)/2” then press enter.
h) In order to get the remaining class mark, click and drag the little square in the
bottom right hand corner of the cell into the bottom column.
i) For fX, at row 3 column F, type “=PRODUCT(D3, E3)” then press enter.
j) In order to get the remaining fX, click and drag the little square in the bottom
right hand corner of the cell into the bottom column.
k) For fX², at row 3 column G, type “=PRODUCT(D3, E3, E3)” then press enter.
l) In order to get the remaining fX², click and drag the little square in the bottom
right hand corner of the cell into the bottom column.
m) For cumulative frequency (cf), at row 3 column H, input the first value of the
frequency, which is 3. In the cell below it, type the formula “=SUM(H3, D4)”
(where H3 is the location of your first cumulative frequency count and D4 is the
location of your second frequency count), then press enter.
n) In order to get the remaining cf, click and drag the little square in the bottom
right hand corner of the cell into the bottom column.
o) To get the summation of frequency (f), fX, and fX², use the formulas
“=SUM(D3:D6)”, “=SUM(F3:F6)” and “=SUM(G3:G6)”, respectively.
➢ Calculation of Mean
Mean = ∑fX / n
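The spreadsheet columns built in the steps above (class mark X, fX, fX², cf) and the grouped mean can be reproduced with a short sketch. The class intervals and frequencies are those of the tally table; the list names simply mirror the column labels.

```python
from itertools import accumulate

# Class intervals and frequencies from the tally table.
classes = [(2, 4), (5, 7), (8, 10), (11, 13)]
f = [3, 4, 2, 1]

X = [(lo + hi) / 2 for lo, hi in classes]       # class marks, like =(A3+B3)/2
fX = [fi * xi for fi, xi in zip(f, X)]          # frequency times class mark
fX2 = [fi * xi ** 2 for fi, xi in zip(f, X)]    # frequency times class mark squared
cf = list(accumulate(f))                        # cumulative frequency

n = sum(f)
mean = sum(fX) / n                              # Mean = sum of fX divided by n

print(X, fX, fX2, cf, mean)
```

With these columns, ∑fX = 63 and n = 10, so the grouped mean works out to 6.3.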
➢ Calculation of Median
Locate the first class whose cumulative frequency is greater than or equal to n/2;
this class is the median class.
n/2 = 10/2 = 5
Time (hr) Tally f X fX fX² cf
2-4 III 3 3 9 27 3
5-7 IIII 4 6 24 144 7
8-10 II 2 9 18 162 9
11-13 I 1 12 12 144 10
Now, the formula for calculating the median when the data are grouped in class
intervals is
Median = L + ((n/2 − F) / f) × c
n = total frequency = 10
Median = 4.5 + ((10/2 − 3) / 4) × 3
Median = 6
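The median formula can be written as a small function. The meaning of the symbols is inferred from the worked numbers rather than stated in the notes: L is taken as the lower boundary of the median class (lower limit minus 0.5, which assumes integer class limits), F as the cumulative frequency before the median class, f as the frequency of the median class, and c as the class width. The function name is hypothetical.

```python
def grouped_median(classes, freq, c):
    """Median of grouped data: L + ((n/2 - F) / f) * c."""
    n = sum(freq)
    F = 0  # cumulative frequency of the classes before the current one
    for (lo, _hi), f in zip(classes, freq):
        if F + f >= n / 2:      # first class whose cf reaches n/2
            L = lo - 0.5        # lower class boundary (integer limits assumed)
            return L + ((n / 2 - F) / f) * c
        F += f
    raise ValueError("frequencies must not be empty")


classes = [(2, 4), (5, 7), (8, 10), (11, 13)]
freq = [3, 4, 2, 1]
median = grouped_median(classes, freq, c=3)
print(median)
```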
➢ Calculation of Mode
Locate the class mode by finding the class interval that contains the largest frequency.
Time (hr) Tally f X fX fX² cf
2-4 III 3 3 9 27 3
5-7 IIII 4 6 24 144 7
8-10 II 2 9 18 162 9
11-13 I 1 12 12 144 10
Now, the formula for calculating the mode when the data are grouped in class
intervals is
Mode = L + (D1 / (D1 + D2)) × c
Mode = 4.5 + (1 / (1 + 2)) × 3
Mode = 5.5
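The mode formula can be sketched in the same way. Here D1 is read as the modal class frequency minus the frequency of the class before it, and D2 as the modal class frequency minus the frequency of the class after it; this reading is inferred from the worked numbers (4 − 3 = 1 and 4 − 2 = 2), since the notes do not define the symbols. The function name is hypothetical.

```python
def grouped_mode(classes, freq, c):
    """Mode of grouped data: L + (D1 / (D1 + D2)) * c."""
    i = freq.index(max(freq))                       # modal class (first if tied)
    before = freq[i - 1] if i > 0 else 0            # frequency of the class before
    after = freq[i + 1] if i + 1 < len(freq) else 0 # frequency of the class after
    D1 = freq[i] - before
    D2 = freq[i] - after
    L = classes[i][0] - 0.5                         # lower boundary (integer limits assumed)
    return L + (D1 / (D1 + D2)) * c


classes = [(2, 4), (5, 7), (8, 10), (11, 13)]
freq = [3, 4, 2, 1]
mode = grouped_mode(classes, freq, c=3)
print(mode)
```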
Time (hr) Tally f X fX fX² cf
2-4 III 3 3 9 27 3
5-7 IIII 4 6 24 144 7
8-10 II 2 9 18 162 9
11-13 I 1 12 12 144 10
So,
s² = 8.9
s = √s² = √8.9 = 2.98
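The variance formula used in the notes did not survive in this copy, so the sketch below assumes the usual computational form for grouped data, s² = (∑fX² − (∑fX)² / n) / (n − 1). It reproduces the stated values for the drying-time table.

```python
import math

# Frequencies and class marks from the drying-time table.
f = [3, 4, 2, 1]
X = [3, 6, 9, 12]

n = sum(f)
sum_fX = sum(fi * xi for fi, xi in zip(f, X))         # 63
sum_fX2 = sum(fi * xi ** 2 for fi, xi in zip(f, X))   # 477

# Assumed computational formula for the grouped sample variance.
variance = (sum_fX2 - sum_fX ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
```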
2. The following set of data is the number of orders received each day during
the past 50 days at the office of a mail-order company. (a) Calculate the mean,
median, and mode for the grouped data. (b) Find the standard deviation and
variance.
21 18 19 20 18
10 15 16 17 13
17 20 14 19 16
20 17 17 20 15
15 12 15 16 21
16 17 18 10 13
14 20 17 17 20
17 18 14 21 18
20 11 18 19 15
15 21 17 15 16
(Manual Computation)
Steps to execute prior to the calculation of mean and variance for grouped data:
n = 50
R = 21 – 10
R = 11
c) Select the recommended number of cells k (k = √n) and compute the cell
width c, where c = (x_max − x_min) / k.
k = √50 = 7.07 ≈ 7
c = (21 − 10) / √50 = 1.56 ≈ 2
10-11
12-13
14-15
16-17
18-19
20-21
22-23
d) Tally the numbers in each class interval (from the data set above). Next, count
the tally marks and write the frequency in the third column. The frequency is just
the total.
Class Interval Tally Frequency (f)
10-11 III 3
12-13 III 3
22-23 0
e) Using Microsoft Excel and following the instructions from the previous problem,
we obtain the following,
➢ Calculation of Mean
Mean = ∑fX / n
➢ Calculation of Median
Locate the first class whose cumulative frequency is greater than or equal to n/2;
this class is the median class.
n/2 = 50/2 = 25
No. of Order Tally f X fX fX2 cf
22-23 0 22.5 0 0 50
Now, the formula for calculating the median when the data are grouped in class
intervals is
Median = L + ((n/2 − F) / f) × c
n = total frequency = 50
Median = 15.5 + ((50/2 − 16) / 14) × 2
Median = 16.79
➢ Calculation of Mode
Locate the class mode by finding the interval that contains the largest frequency.
No. of Order Tally f X fX fX2 cf
22-23 0 22.5 0 0 50
Now, the formula for calculating the mode when the data are grouped in class
intervals is
Mode = L + (D1 / (D1 + D2)) × c
Mode = 15.5 + (4 / (4 + 5)) × 2
Mode = 16.39
22-23 0 22.5 0 0 50
So,
s² = 8.27
s = √s² = √8.27 = 2.88
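As a cross-check of the whole second problem, the sketch below rebuilds the frequency table from the 50 raw values and recomputes every grouped statistic. It assumes the seven classes listed above (10-11 through 22-23), lower boundaries of lower limit minus 0.5, and the computational variance formula s² = (∑fX² − (∑fX)² / n) / (n − 1); the mode step also assumes the modal class is not the first or last class.

```python
import math
from collections import Counter

# Orders received on each of the 50 days, as given in the problem.
orders = [21, 18, 19, 20, 18, 10, 15, 16, 17, 13, 17, 20, 14, 19, 16,
          20, 17, 17, 20, 15, 15, 12, 15, 16, 21, 16, 17, 18, 10, 13,
          14, 20, 17, 17, 20, 17, 18, 14, 21, 18, 20, 11, 18, 19, 15,
          15, 21, 17, 15, 16]

c = 2
classes = [(lo, lo + c - 1) for lo in range(10, 24, c)]   # 10-11 ... 22-23
counts = Counter(orders)
f = [counts[lo] + counts[hi] for lo, hi in classes]       # class frequencies
X = [(lo + hi) / 2 for lo, hi in classes]                 # class marks

n = sum(f)
sum_fX = sum(fi * xi for fi, xi in zip(f, X))
sum_fX2 = sum(fi * xi ** 2 for fi, xi in zip(f, X))
mean = sum_fX / n
variance = (sum_fX2 - sum_fX ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

# Median class: first class whose cumulative frequency reaches n/2.
F = 0
for i, fi in enumerate(f):
    if F + fi >= n / 2:
        break
    F += fi
median = (classes[i][0] - 0.5) + ((n / 2 - F) / f[i]) * c

# Modal class: largest frequency (assumed to be an interior class).
m = f.index(max(f))
D1, D2 = f[m] - f[m - 1], f[m] - f[m + 1]
mode = (classes[m][0] - 0.5) + (D1 / (D1 + D2)) * c

print(f, round(mean, 2), round(median, 2), round(mode, 2),
      round(variance, 2), round(std_dev, 2))
```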
❖ the ends of the box are the upper and lower quartiles, so the box spans
the interquartile range (Interquartile Range = Q3 − Q1)
❖ the median is marked by a vertical line inside the box
❖ the whiskers are the two lines outside the box that extend to the highest and
lowest observations.
The data values on the table below depict the number of televisions sold at a department
store each month for 12 months. Create a box-and-whisker plot to display the data and
find the five-number summary, Maximum, First Quartile, Median, Third Quartile, and
the Minimum.
January 143
February 80
March 85
August 110
June 98
September 91
May 102
July 89
October 95
April 108
November 118
December 152
b) Find the value of the five-number summary which is the Maximum, First
Quartile, Median, Third Quartile, Minimum.
Minimum: 80
Maximum: 152
f) Select all the data points in the table and then go to the Insert tab > Charts group
> Statistic Chart symbol > click Box and Whisker.
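The five-number summary for the television sales can be sketched with Python's `statistics.quantiles`. Quartile conventions differ between tools, so the Q1 and Q3 values depend on the method chosen; the "exclusive" method is used here as an assumption and may not match the values obtained in class or in a given spreadsheet setting.

```python
import statistics

# Monthly television sales, as listed in the table.
sales = [143, 80, 85, 110, 98, 91, 102, 89, 95, 108, 118, 152]

# Quartiles under the "exclusive" convention (an assumption for this sketch).
q1, median, q3 = statistics.quantiles(sales, n=4, method="exclusive")

five_number = (min(sales), q1, median, q3, max(sales))
iqr = q3 - q1   # interquartile range, Q3 minus Q1

print(five_number, iqr)
```

The minimum (80), median (100), and maximum (152) do not depend on the quartile convention; only Q1 and Q3 do.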
A scatter diagram, also called a scatter plot (XY graph), is a type of graph in which
corresponding values from a set of data are placed as points on a coordinate plane. The
relationship between the points can be positive or negative, and strong or weak. The
main purpose of a scatter diagram is to show how strong the relationship, or
correlation, between the two variables is. The more tightly the data points fall along
a straight line, the higher the correlation.
Three types of correlation are needed to correctly interpret a scatter diagram of two
numeric variables.
2. If r = +1, there is a perfect positive linear relation between the two variables.
3. If r = -1, there is a perfect negative linear relation between the two variables.
4 82
3.5 81
5 90
2 74
3 77
6.5 97
0.5 51
3.5 58
4.5 86
5 88
1 62
1.5 75
3 70
5.5 90
a) Copy the data from the table into the spreadsheet; a simple copy-paste is enough.
b) Select all the data and go to Insert > Charts Group > Scatter Chart > click on
the first chart.
The diagram will then appear automatically. Add a proper title and axis labels to it.
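The strength of the linear relationship shown by the scatter diagram can be quantified with the sample correlation coefficient r. The sketch below assumes the standard Pearson formula, r = Sxy / √(Sxx · Syy), and applies it to the 14 pairs listed above. The first column is taken to be study time in hours (from the chart's axis label); the label of the second column is not recoverable here, so it is simply called y.

```python
import math

# The 14 data pairs from the table: x = study time (hours), y = second variable.
x = [4, 3.5, 5, 2, 3, 6.5, 0.5, 3.5, 4.5, 5, 1, 1.5, 3, 5.5]
y = [82, 81, 90, 74, 77, 97, 51, 58, 86, 88, 62, 75, 70, 90]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of products and squares of deviations from the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # always between -1 and +1
print(round(r, 4))
```

A value of r close to +1 indicates that large x values tend to go along with large y values, consistent with points that rise from left to right in the diagram.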
[Figure: scatter diagram. Horizontal axis: Study Time (in Hours).]