Chapter 5. Descriptive Statistics

Chapter 5: Descriptive Statistics
Course Name: PROBABILITY & STATISTICS

Lecturer: Duong Thi Hong
Hanoi, 2022
1 / 71 Chapter 5: Descriptive Statistics

Content
1 Numerical Summaries of Data

2 Stem-and-Leaf Diagrams
3 Frequency Distributions and Histograms
4 Box Plots
5 Times Sequence Plots

Introduction

Content

4 Box Plots

1. Numerical Summaries of Data
Sample Mean
If the n observations in a sample are denoted by x1 , x2 , ..., xn the sample
mean is Pn
x1 + x2 + ... + xn i=1 xi
x̄ = =
n n
Example : Let's consider the weight of the eight observations collected from
the prototype engine connectors: 12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6 and
13.1. Find the sample mean.
x1 + x2 + ... + xn 12.6 + 12.9 + ... + 13.1

x̄ = = = 13
n 8

Question 1

Question 2

Question 3

Question 4

Although the sample mean is useful, it does not convey all of the information
about a sample of data. The variability or scatter in the data may be
described by the sample variance or the sample standard deviation.

Sample Variance
If x1 , x2 , ..., xn is a sample of n observations, the sample variance is
Pn
Pn 2
Pn 2 ( xi )2
i=1 (xi − x̄) i=1 xi −
i=1
2 n
s = =
n−1 n−1
The sample standard deviation,s, is the positive square root of the
sample variance.

Example 1
Table 6-1 displays the quantities needed for calculating the sample variance
and sample standard deviation for the pull-off force data. The sample vari-
ance is
P8
2 i=1 (xi− x̄)2
s = = 1.60/7 = 0.2286
8−1
√ √
The sample standard deviation is s = s2 = 0.2286 = 0.48 pounds

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Question 7

Example 1
These data are plotted in Fig. 6-2.

Question 1

Question 2

Question 3

Question 4

Question 5

In addition to the sample variance and sample standard deviation, the

sample range, or the difference between the largest and smallest
observations, is a useful measure of variability.
Sample Range
If the n observations in a sample are denoted by x1, x2, ..., xn, the
sample range is
r = max(xi ) − min(xi )

Median
The sample median is a measure of central tendency that divides
the data into two equal parts, half below the median and half above.
If the number of observations is even, the median is halfway
between the two central values.
If the number of observations is odd, the median is the central
value.
Example : The prices (in dollars) for a sample of round-trip flights
from Chicago, Illinois to Cancun, Mexico are listed. Find the median
of the flight prices:
872 432 397 427 388 782 397

Mode
The sample mode is the most frequently occurring data value.
Quartiles
When an ordered set of data is divided into four equal parts, the
division points are called quartiles. The first or lower quartile, q1 or
Q1 , is a value that has approximately 25% of the observations below
it and approximately 75%.

Interquartile Range (IQR)

An ordered set of data is divided into four equal parts, the division
points are called quartiles:
The first quartile, q1 or Q1: is a value that has approximately
25% of the observations below.
The sample median or second quartile, q2 or Q2, has
approximately 50% of the observations below its value.
The third quartile, q3 or Q3, has approximately 75% of the
observations below its value.
The interquartile range, IQR = Q3 − Q1

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Question 7

Question 8

Question 9

Question 10

Question 11

Question 12

Question 13

Question 14

Exercise 1
1) Will the sample mean always correspond to one of the observations
in the sample?
2) Will exactly half of the observations in a sample fall below the
mean?
3) Will the sample mean always be the most frequently occurring
data value in the sample?
4) Can the sample standard deviation be equal to zero? Give an
example.

Exercise 1
1) Will the sample mean always correspond to one of the observations
in the sample?
2) Will exactly half of the observations in a sample fall below the
mean?
3) Will the sample mean always be the most frequently occurring
data value in the sample?
4) Can the sample standard deviation be equal to zero? Give an
example.
Exercise 2
Suppose that you add 10 to all of the observations in a sample. How
does this change the sample mean? How does it change the sample
standard deviation?
Content

4 Box Plots

2. Stem-and-Leaf Diagrams
A stem-and-leaf diagramis a good way to obtain an informative visual

display of a data set x1, x2, ..., xn, where each number xi consists of at
least two digits. To construct a stemand-leaf diagram, use the following
steps.
Steps to Construct a Stem-and-Leaf Diagram
1) Divide each number xi into two parts: a stem, consisting of one
or more of the leading digits, and a leaf, consisting of the remaining
digit.
2) List the stem values in a vertical column.
3) Record the leaf for each observation beside its stem.
4) Write the units for stems and leaves on the display.

Example 2
To illustrate the construction of a stem-and-leaf diagram, consider
the alloy compressive strength data in Table 6-2.

Example 2
We will select as stem values the numbers 7, 8, ..., 24. The resulting
stem-and-leaf diagram is presented in Fig. 6-4.

Example 2
The last column in the diagram is a frequency count of the number
of leaves associated with each stem. Inspection of this display imme-
diately reveals that:
Most of the compressive strengths lie between 110 and 200 psi.
A central value is somewhere between 150 and 160 psi.
The strengths are distributed approximately symmetrically
about the central value.

Example 3
Figure 6-5 illustrates the stem-and-leaf diagram for 25 observations
on batch yields from a chemical process.

Example 3
In Fig. 6-5(a) we have used 6, 7, 8, and 9 as the stems: This

results in too few stems, and the stem-and-leaf diagram does
not provide much information about the data.
In Fig. 6-5(b) we have divided each stem into two parts,
resulting in a display that more adequately displays the data.
Figure 6-5(c) illustrates a stem and-leaf display with each stem
divided into five parts: There are too many stems in this plot,
resulting in a display that does not tell us much about the
shape of the data.

Exercise 1
1) When will the median of a sample be equal to the sample mean?
2) When will the median of a sample be equal to the mode?

Exercise 1
1) When will the median of a sample be equal to the sample mean?
2) When will the median of a sample be equal to the mode?
Exercise 2
Suppose that you add 10 to all of the observations in a sample. How
does this change the sample mean? How does it change the sample
standard deviation?

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Question 7

Question 8

Content

4 Box Plots

3. Frequency Distributions and Histograms
A frequency distribution is a more compact summary of data than

a stem-and-leaf diagram. To construct a frequency distribution, we
must divide the range of the data into intervals, which are usually
called class intervals, cells,or bins.
A frequency distribution for the comprehensive strength data in Ta-
ble 6-2 is shown in Table 6-4.

The histogram is a visual display of the frequency distribution. The

steps for constructing a histogram follow
Constructing a Histogram (Equal Bin Widths)
1) Label the bin (class interval) boundaries on a horizontal scale.
2) Mark and label the vertical scale with the frequencies or the rel-
ative frequencies.
3) Above each bin, draw a rectangle where height is equal to the
frequency (or relative frequency) corresponding to that bin.

Example 4
Figure below presents the production of transport aircraft by the
Boeing Company in 1985. Notice that the 737 was the most popular
model, followed by the 757, 747, 767, and 707.

Question 1

Question 2

Content

4 Box Plots

4. Box Plots
An ordered set of data is divided into four equal parts, the division
points are called quartiles:
The first quartile, q1 or Q1: is a value that has approximately
25% of the observations below.
The sample median or second quartile, q2 or Q2, has
approximately 50% of the observations below its value.
The third quartile, q3 or Q3, has approximately 75% of the
observations below its value.
The interquartile range, IQR = Q3 − Q1

4. Box Plots
A box-plot is a visual display that describes important features of

data: three quartiles, the minimum/maximum values, and unusual
observations (outliers).

4. Box Plots
Example
Given a data of ages of 14 random adults from a village: 15, 20, 31,
31, 32, 40, 41, 41, 42, 43, 45, 45, 50, 70. Draw a box plot for this
data.

Content

4 Box Plots

5. Times Sequence Plots

Chapter 5. Descriptive Statistics

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 5. Descriptive Statistics

Uploaded by

Copyright:

Available Formats

Chapter 5: Descriptive Statistics

Course Name: PROBABILITY & STATISTICS

1 / 71 Chapter 5: Descriptive Statistics

1 Numerical Summaries of Data

2 / 71 Chapter 5: Descriptive Statistics

3 / 71 Chapter 5: Descriptive Statistics

1 Numerical Summaries of Data

4 / 71 Chapter 5: Descriptive Statistics

13.1. Find the sample mean.

x1 + x2 + ... + xn 12.6 + 12.9 + ... + 13.1

5 / 71 Chapter 5: Descriptive Statistics

6 / 71 Chapter 5: Descriptive Statistics

7 / 71 Chapter 5: Descriptive Statistics

8 / 71 Chapter 5: Descriptive Statistics

9 / 71 Chapter 5: Descriptive Statistics

about a sample of data. The variability or scatter in the data may be

described by the sample variance or the sample standard deviation.

10 / 71 Chapter 5: Descriptive Statistics

11 / 71 Chapter 5: Descriptive Statistics

12 / 71 Chapter 5: Descriptive Statistics

13 / 71 Chapter 5: Descriptive Statistics

14 / 71 Chapter 5: Descriptive Statistics

15 / 71 Chapter 5: Descriptive Statistics

16 / 71 Chapter 5: Descriptive Statistics

17 / 71 Chapter 5: Descriptive Statistics

18 / 71 Chapter 5: Descriptive Statistics

19 / 71 Chapter 5: Descriptive Statistics

20 / 71 Chapter 5: Descriptive Statistics

21 / 71 Chapter 5: Descriptive Statistics

22 / 71 Chapter 5: Descriptive Statistics

23 / 71 Chapter 5: Descriptive Statistics

24 / 71 Chapter 5: Descriptive Statistics

In addition to the sample variance and sample standard deviation, the

25 / 71 Chapter 5: Descriptive Statistics

26 / 71 Chapter 5: Descriptive Statistics

27 / 71 Chapter 5: Descriptive Statistics

Interquartile Range (IQR)

28 / 71 Chapter 5: Descriptive Statistics

29 / 71 Chapter 5: Descriptive Statistics

30 / 71 Chapter 5: Descriptive Statistics

31 / 71 Chapter 5: Descriptive Statistics

32 / 71 Chapter 5: Descriptive Statistics

33 / 71 Chapter 5: Descriptive Statistics

34 / 71 Chapter 5: Descriptive Statistics

35 / 71 Chapter 5: Descriptive Statistics

36 / 71 Chapter 5: Descriptive Statistics

37 / 71 Chapter 5: Descriptive Statistics

38 / 71 Chapter 5: Descriptive Statistics

39 / 71 Chapter 5: Descriptive Statistics

40 / 71 Chapter 5: Descriptive Statistics

41 / 71 Chapter 5: Descriptive Statistics

42 / 71 Chapter 5: Descriptive Statistics

43 / 71 Chapter 5: Descriptive Statistics

1 Numerical Summaries of Data

44 / 71 Chapter 5: Descriptive Statistics

A stem-and-leaf diagramis a good way to obtain an informative visual

45 / 71 Chapter 5: Descriptive Statistics

46 / 71 Chapter 5: Descriptive Statistics

47 / 71 Chapter 5: Descriptive Statistics

48 / 71 Chapter 5: Descriptive Statistics

49 / 71 Chapter 5: Descriptive Statistics

In Fig. 6-5(a) we have used 6, 7, 8, and 9 as the stems: This

50 / 71 Chapter 5: Descriptive Statistics

51 / 71 Chapter 5: Descriptive Statistics