Chapter 2: Numerical Summary Measures

Applied Probability and Statistics for Engineering and Science
Outline of Chapter 2
2.1 Measures of Center
2.2 Measures of Variability
2.3 More Detailed Summary Quantities
2.4 Quantile Plots
2.1 Measures of Center
The sample mean
The most frequently used measure of center is simply the
arithmetic average of the available observations.
The sample mean of observations 𝑥₁, ..., 𝑥ₙ, denoted by x̄, is
given by
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{1} \]
The mean suffers from one deficiency that makes it an
inappropriate measure of center under some circumstances: its
value can be greatly affected by the presence of even a single
outlier (i.e., an unusually large or small observation).
The sample median
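An alternative measure of center that resists the effect of
outliers is the median.
The sample median, denoted by x̃, is obtained by first ordering
the sample observations from smallest to largest. Then
\[ \tilde{x} = \begin{cases} \text{single middle value} = \left(\frac{n+1}{2}\right)\text{th value on ordered list} & \text{if } n \text{ odd} \\ \text{average of two middle values} = \text{average of } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th values} & \text{if } n \text{ even} \end{cases} \tag{2} \]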
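As a minimal sketch of equations (1) and (2), the Python below computes the mean and median of a small made-up sample, then repeats the calculation after appending one unusually large observation. The data values and the helper names `sample_mean` and `sample_median` are assumptions chosen for illustration; they do not come from the chapter.

```python
def sample_mean(values):
    """Arithmetic average of the observations, equation (1)."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the ordered observations, equation (2)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th value on the ordered list
        return ordered[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, not taken from the chapter.
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
with_outlier = sample + [130.0]  # one unusually large observation

print(sample_mean(sample), sample_median(sample))              # 13.0 13.0
print(sample_mean(with_outlier), sample_median(with_outlier))  # 32.5 13.5
```

In this example the single outlier moves the mean from 13.0 to 32.5, while the median only shifts from 13.0 to 13.5.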
Discrete distributions
The mean value (expected value) of a discrete variable 𝑥,
denoted by 𝜇 [or 𝐸(𝑥)], is given by
\[ \mu = \sum x\,p(x) \tag{3} \]
where 𝑝(𝑥) is the mass function of 𝑥 and the summation is over
all possible 𝑥 values.
Examples:
if 𝑥 is a binomial variable with parameters 𝑛 (group size) and 𝜌
(success proportion), then 𝜇 = 𝑛𝜌
if 𝑥 is a Poisson variable with parameter 𝜆, then the mean
value of 𝑥 is 𝜆 itself.
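The binomial example can be checked numerically. The sketch below evaluates the sum in equation (3) directly, assuming the standard binomial mass function; the parameter values 𝑛 = 10 and 𝜌 = 0.3 and the helper name `binomial_pmf` are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, rho):
    """Binomial mass function p(x) for group size n and success proportion rho."""
    return comb(n, x) * rho**x * (1 - rho) ** (n - x)


# Hypothetical parameter values.
n, rho = 10, 0.3

# Equation (3): sum x * p(x) over all possible x values 0, 1, ..., n.
mu = sum(x * binomial_pmf(x, n, rho) for x in range(n + 1))

print(mu, n * rho)  # the two agree up to floating-point rounding
```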
Continuous distributions
The mean value (expected value) of a continuous variable 𝑥
with density function 𝑓(𝑥) is given by
\[ \mu_x = \int_{-\infty}^{\infty} x f(x)\,dx \tag{4} \]
If 𝑥₁, 𝑥₂, ..., 𝑥ₙ have been randomly selected from some
population or process distribution with mean value 𝜇, then the
sample mean x̄ gives a point estimate for 𝜇.
[Figure: density curve 𝑓(𝑥) plotted against 𝑥 from −6 to 6, with the mean 𝜇 = 0 marked.]
The median of a continuous distribution
Just as the sample median x̃ separates the sample into two equal
halves, the median μ̃ of a continuous distribution divides the
area under the density curve into two equal halves, i.e.,
\[ \int_{-\infty}^{\tilde{\mu}} f(x)\,dx = .5 \tag{5} \]
[Figure: density curve 𝑓(𝑥) plotted against 𝑥 from 0 to 500, with the median marked.]
2.2 Measures of Variability
Measures of variability for sample data
The simplest measure of variability in a sample is the range,
the difference between the largest and smallest sample values.
The sample variance, denoted by 𝑠², is defined by
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1} \tag{6} \]
The sample standard deviation, denoted by 𝑠, is the (positive)
square root of the variance:
\[ s = \sqrt{s^2} \]
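A short sketch of these sample measures follows. It computes the range, the sample variance from equation (6), and the sample standard deviation for a made-up data set; the data and the function name `sample_variance` are assumptions for illustration.

```python
from math import sqrt


def sample_variance(values):
    """Sample variance s^2 = Sxx / (n - 1), equation (6)."""
    n = len(values)
    x_bar = sum(values) / n
    s_xx = sum((x - x_bar) ** 2 for x in values)  # sum of squared deviations
    return s_xx / (n - 1)


# Hypothetical data, not taken from the chapter.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_range = max(data) - min(data)  # largest minus smallest value
s2 = sample_variance(data)
s = sqrt(s2)                          # sample standard deviation

print(sample_range, s2, s)
```

Here the sample mean is 5 and the sum of squared deviations is 32, so 𝑠² = 32/7. Note the divisor 𝑛 − 1 rather than 𝑛.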
Variance of a discrete distribution
Let 𝑥 be a discrete variable with mass function 𝑝(𝑥) and mean
value 𝜇.
The variance of a discrete distribution for a variable 𝑥 is
defined by
\[ \sigma^2 = \sum (x - \mu)^2 p(x) \tag{7} \]
where the sum is over all possible 𝑥 values.
The standard deviation:
\[ \sigma = \sqrt{\sigma^2} \]
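To make equation (7) concrete, the sketch below applies it to a hypothetical mass function, a fair six-sided die with 𝑝(𝑥) = 1/6 for 𝑥 = 1, ..., 6. The die is an assumption for illustration and does not appear in the chapter; exact fractions are used so the result is free of rounding.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical mass function: a fair six-sided die, p(x) = 1/6 for x = 1..6.
p = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * px for x, px in p.items())                  # mean value
sigma2 = sum((x - mu) ** 2 * px for x, px in p.items())  # equation (7)
sigma = sqrt(sigma2)                                     # standard deviation

print(mu, sigma2, sigma)  # mean 7/2, variance 35/12
```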
Variance of a continuous distribution
The variance of a continuous distribution with density
function 𝑓(𝑥) is obtained by replacing summation in the
discrete case by integration and substituting 𝑓(𝑥) for 𝑝(𝑥).
The variance of a continuous distribution is defined by
\[ \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \tag{8} \]
The standard deviation:
\[ \sigma = \sqrt{\sigma^2} \]
In the case of a normal distribution, the related variance can
be determined by
\[ \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \sigma^2 \tag{9} \]
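Equation (9) can be checked numerically. The sketch below approximates the integral in equation (8) for a normal density with a simple midpoint rule; the parameter values (𝜇 = 100, 𝜎 = 15), the truncation of the integration range to 𝜇 ± 10𝜎, and the function names are all assumptions made for illustration.

```python
from math import exp, pi, sqrt


def normal_density(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


def variance_by_midpoint_rule(mu, sigma, width=10.0, steps=20_000):
    """Approximate equation (8) on [mu - width*sigma, mu + width*sigma]."""
    a = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += (x - mu) ** 2 * normal_density(x, mu, sigma) * h
    return total


# Hypothetical parameter values.
approx = variance_by_midpoint_rule(mu=100.0, sigma=15.0)
print(approx)  # close to sigma**2 = 225
```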
2.3 More Detailed Summary Quantities
Quartiles and the interquartile range
The median separates a data set or distribution into two
equal parts (i.e., 50% of the values exceed the median and
50% are smaller than the median).
Quartiles and percentiles give more detailed information
about the location of a data set or distribution by considering
percentages other than 50%.
The lower and upper quartiles along with the median separate
a data set or distribution into four equal parts:
25% of all values are smaller than the lower quartile
25% exceed the upper quartile
25% lie between each quartile and the median
Quartiles and the interquartile range: definitions
Separate the 𝑛 ordered sample observations into a lower half
and an upper half.
If 𝑛 is an odd number, include the median x̃ in each half.
Then:
lower quartile = median of the lower half of the data
upper quartile = median of the upper half of the data
The interquartile range (IQR), a measure of variability that
is resistant to the effect of outliers, is the difference between
the two quartiles: IQR = upper quartile - lower quartile
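The definition above can be sketched in a few lines of Python. The function name `quartiles` and the data are hypothetical; the rule for odd 𝑛 (include the median in each half) follows the definition given here.

```python
from statistics import median


def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # n odd: the median ordered[half] is included in both halves
        lower, upper = ordered[: half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(upper)


# Hypothetical data, not taken from the chapter.
data = [7, 15, 1, 9, 20, 3, 12, 8, 4]

lower_q, upper_q = quartiles(data)
iqr = upper_q - lower_q
print(lower_q, upper_q, iqr)  # 4 12 8
```

Library routines often use other quartile conventions (for example, interpolation between order statistics), so their results may differ slightly from this definition.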
Boxplots
A boxplot is a visual display of data based on the following
five-number summary:
smallest 𝑥ᵢ, lower quartile, median, upper quartile, largest 𝑥ᵢ
To create a boxplot, do the following:
draw a horizontal measurement scale.
place a rectangle (above the axis) whose left and right edges
are at the lower and upper quartiles, respectively.
place a vertical line segment or a symbol inside the rectangle
at the location of the median.
draw "whiskers" out from either end of the rectangle to the
smallest and largest values in the sample.
Any observation farther than 1.5 IQR from the closest quartile
is called an outlier.
An outlier is extreme if it is more than 3 IQR from the
nearest quartile, and it is mild otherwise.
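The outlier rule can be written as a small classifier. In the sketch below the function name `classify`, the quartile values, and the observations are all hypothetical; only the 1.5 IQR and 3 IQR thresholds come from the text.

```python
def classify(x, lower_q, upper_q):
    """Label x as 'extreme', 'mild', or 'not an outlier' using the IQR rule."""
    iqr = upper_q - lower_q
    if x < lower_q:
        distance = lower_q - x   # distance below the lower quartile
    elif x > upper_q:
        distance = x - upper_q   # distance above the upper quartile
    else:
        distance = 0.0           # between the quartiles
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "not an outlier"


# Hypothetical quartiles and observations.
lower_q, upper_q = 4.0, 12.0     # IQR = 8
for x in (-10.0, 9.0, 20.0, 30.0, 50.0):
    print(x, classify(x, lower_q, upper_q))
```

With these quartiles, observations below −8 or above 24 are outliers, and those below −20 or above 36 are extreme.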
Percentiles
Let 𝑝 denote a number between 0 and 1. Then the
(100𝑝)th percentile, 𝜂ₚ, also called the 𝑝th quantile, separates
the smallest 100𝑝% of the data or distribution from the
remaining values.
For instance, 90% of all values lie below the 90th percentile,
𝜂.9, and only 10% of all values exceed the 90th percentile.
The median is the 50th percentile.
For a continuous distribution, 𝜂ₚ is the solution to the
equation
\[ \int_{-\infty}^{\eta_p} f(x)\,dx = p \tag{10} \]
where 𝑝 is the area under the density curve to the left of 𝜂ₚ.
Example/figure 2.7: ...
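As an illustration of equation (10), the sketch below finds the 90th percentile of a normal distribution using `NormalDist` from Python's standard `statistics` module, then confirms that the area to its left is 𝑝. The choice of a normal distribution with mean 100 and standard deviation 15 is an assumption for this example only.

```python
from statistics import NormalDist

# Hypothetical distribution: normal with mean 100 and standard deviation 15.
dist = NormalDist(mu=100.0, sigma=15.0)

p = 0.9
eta_p = dist.inv_cdf(p)    # solves equation (10) for eta_p

print(eta_p)               # the 90th percentile
print(dist.cdf(eta_p))     # area to the left of eta_p, approximately p
print(dist.inv_cdf(0.5))   # the median (50th percentile)
```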
2.4 Quantile Plots
Introduction
An investigator usually wishes to know whether it is plausible
that a numerical sample 𝑥₁, 𝑥₂, ..., 𝑥ₙ was selected from a
particular type of population distribution.
Many inferential procedures are based on the assumption that
the underlying distribution is of a specified type.
However, the use of such a procedure is inappropriate if the
actual distribution differs greatly from the assumed type.
Understanding the underlying distribution can sometimes give
insight into the physical mechanisms involved in generating
the data.
Introduction (cont.)
An effective way to check the distribution assumption is to
construct a quantile plot.
The essence of such a plot is that if the plot is based on the
correct distribution, the points in the plot will fall close to a
straight line.
If not, the points should depart substantially from a linear
pattern.
Definition
Let 𝑥(1) denote the smallest sample observation, 𝑥(2) the
second smallest sample observation, ..., and 𝑥(𝑛) the largest
sample observation.
Take 𝑥(1) to be the (.5/𝑛)th sample quantile, 𝑥(2) to be the
(1.5/𝑛)th sample quantile, ..., and finally 𝑥(𝑛) to be the
((𝑛 − .5)/𝑛)th sample quantile.
Generally, for 𝑖 = 1, 2, ..., 𝑛, take 𝑥(𝑖) to be the
((𝑖 − .5)/𝑛)th sample quantile.
For 𝑖 = 1, 2, ..., 𝑛, the ((𝑖 − .5)/𝑛)th quantiles are determined
for a specified population or process distribution whose
plausibility is being investigated.
If the sample were actually selected from the specified
distribution, the related sample quantiles should be reasonably
close to the corresponding distributional quantiles, i.e., for
𝑖 = 1, 2, ..., 𝑛, there should be reasonable agreement between
𝑥(𝑖) and the ((𝑖 − .5)/𝑛)th quantile of the specified distribution.
After determining the appropriate quantiles for the
distribution (under investigation), form the 𝑛 pairs as follows:
\[ \left( \left(\tfrac{.5}{n}\right)\text{th quantile},\ x_{(1)} \right),\ \ldots,\ \left( \left(\tfrac{n-.5}{n}\right)\text{th quantile},\ x_{(n)} \right) \tag{11} \]
Each such pair can be plotted as a point on a two-dimensional
coordinate system.
A normal quantile plot: comments and an example
In each pair (i.e., each plotted point), if the first number is
close to the second number, the point in the plot will fall close
to a 45° line with slope 1 passing through the point (0,0).
Example: this procedure can be carried out to decide whether a
normal distribution with 𝜇 = 100 and 𝜎 = 15 is plausible. One
needs to:
determine the appropriate 𝑧 quantiles (𝑧 refers to the standard
normal distribution),
express the quantiles of the normal distribution under
consideration in the form 𝜇 + (corresponding 𝑧 quantile) × 𝜎
In general, quantile for normal (𝜇, 𝜎) distribution =
𝜇 + (corresponding 𝑧 quantile) × 𝜎
A normal quantile plot: definition
A normal quantile plot is a plot of the (𝑧 quantile,
observation) pairs.
The linear relation between normal (𝜇, 𝜎) quantiles and 𝑧
quantiles implies that if the sample has come from a normal
distribution with parameters 𝜇 and 𝜎, the points in the plot
should fall close to a straight line with slope 𝜎 and vertical
intercept 𝜇.
A plot for which the points fall close to some straight line
suggests that the assumption of a normal population or
process distribution is plausible.
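The construction of a normal quantile plot can be sketched without any plotting library by building the (𝑧 quantile, observation) pairs directly. In the Python below, the sample values and the function names are hypothetical. Fitting a least-squares line is an added convenience for summarizing the slope and intercept, not a step prescribed by the chapter, which relies on a visual check of linearity.

```python
from statistics import NormalDist


def normal_quantile_pairs(sample):
    """Return the (z quantile, observation) pairs for a normal quantile plot."""
    ordered = sorted(sample)
    n = len(ordered)
    z = NormalDist()  # standard normal distribution
    # x_(i) is paired with the ((i - .5)/n)th z quantile
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


def least_squares_line(pairs):
    """Slope and intercept of the least-squares line through the pairs."""
    n = len(pairs)
    z_bar = sum(z for z, _ in pairs) / n
    x_bar = sum(x for _, x in pairs) / n
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in pairs)
    s_zz = sum((z - z_bar) ** 2 for z, _ in pairs)
    slope = s_zx / s_zz
    return slope, x_bar - slope * z_bar


# Hypothetical sample, not taken from the chapter.
sample = [98.2, 121.5, 87.0, 104.3, 110.9, 93.6, 101.1, 79.4, 115.8, 106.7]

pairs = normal_quantile_pairs(sample)
slope, intercept = least_squares_line(pairs)
print(slope, intercept)  # rough estimates of sigma and mu if the plot is linear
```

Plotting `pairs` with any charting tool gives the normal quantile plot itself; points that track the fitted line closely support the plausibility of a normal distribution.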
