
Measures of Center

Measures of Variability
More Detailed Summary Quantities
Quantile Plots

Chapter 2: Numerical Summary Measures

Applied Probability and Statistics for Engineering and Science Chapter 2: Numerical Summary Measures 1

Outline of Chapter 2

2.1 Measures of Center


2.2 Measures of Variability
2.3 More Detailed Summary Quantities
2.4 Quantile Plots


The sample mean

The most frequently used measure of center is simply the arithmetic average of the available observations.
The sample mean of observations $x_1, \dots, x_n$, denoted by $\bar{x}$, is given by
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (1) \]
The mean suffers from one deficiency that makes it an
inappropriate measure of center under some circumstances: its
value can be greatly affected by the presence of even a single
outlier (i.e., an unusually large or small observation).
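
The sensitivity of the mean to a single outlier can be illustrated with a small sketch. The data values below are hypothetical, chosen only for illustration, and `sample_mean` is a helper name introduced here, not something from the text.

```python
def sample_mean(xs):
    """Arithmetic average of the observations, as in equation (1)."""
    return sum(xs) / len(xs)

# Hypothetical data, made up for illustration only.
data = [10.0, 11.0, 12.0, 13.0, 14.0]

# Same sample, but the largest value is replaced by one unusually large value.
with_outlier = data[:-1] + [140.0]

mean_clean = sample_mean(data)
mean_outlier = sample_mean(with_outlier)

print(mean_clean)    # mean of the original sample
print(mean_outlier)  # mean after a single outlier is introduced
```

Changing one observation out of five moves the mean well away from the bulk of the data, which is the deficiency described above.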


The sample median

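An alternative measure of center that resists the effect of outliers is the median.
The sample median, denoted by $\tilde{x}$, is obtained by first ordering
the sample observations from smallest to largest. Then
\[ \tilde{x} = \begin{cases}
\text{the single middle value} = \left(\frac{n+1}{2}\right)\text{th value on the ordered list} & \text{if } n \text{ is odd} \\
\text{the average of the two middle values} = \text{average of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th values} & \text{if } n \text{ is even}
\end{cases} \qquad (2) \]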
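
Equation (2) translates directly into code. The following is a minimal sketch; `sample_median` is a helper name introduced here and the data are hypothetical. Python's standard library offers `statistics.median`, which follows the same rule.

```python
import statistics

def sample_median(xs):
    """Sample median following equation (2)."""
    ordered = sorted(xs)          # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th value (1-based) is index n // 2 (0-based)
        return ordered[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th values (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data: the second sample contains one large outlier.
median_clean = sample_median([10, 11, 12, 13, 14])
median_outlier = sample_median([10, 11, 12, 13, 140])
median_even = sample_median([4, 1, 3, 2])
```

Unlike the mean, the median of the hypothetical sample is unchanged when its largest value is replaced by an outlier.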


Discrete distributions

The mean value (expected value) of a discrete variable $x$, denoted by $\mu$ [or $E(x)$], is given by
\[ \mu = \sum_{x} x\,p(x) \qquad (3) \]
where the summation is over all possible $x$ values.


Examples:
  - If $x$ is a binomial variable with parameters $n$ (group size) and $\rho$
    (success proportion), then $\mu = n\rho$.
  - If $x$ is a Poisson variable with parameter $\lambda$, then the mean
    value of $x$ is $\lambda$ itself.
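
The two examples can be checked numerically by evaluating the sum in equation (3). This is a sketch under stated assumptions: the parameter values are hypothetical, the helper names are introduced here, and the Poisson sum (which has infinitely many terms) is truncated at a point where the remaining terms are negligible.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, rho):
    """Binomial mass function p(x) with group size n and success proportion rho."""
    return comb(n, x) * rho**x * (1 - rho) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson mass function p(x) with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

# Hypothetical parameter values, chosen only for illustration.
n, rho = 10, 0.3
mu_binomial = sum(x * binomial_pmf(x, n, rho) for x in range(n + 1))

lam = 2.5
# Truncate the infinite sum at x = 99; later terms are negligibly small here.
mu_poisson = sum(x * poisson_pmf(x, lam) for x in range(100))

print(mu_binomial, n * rho)   # both should be close to 3
print(mu_poisson, lam)        # both should be close to 2.5
```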


Continuous distributions

The mean value (expected value) of a continuous variable $x$ is given by
\[ \mu_x = \int_{-\infty}^{\infty} x f(x)\,dx \qquad (4) \]
If $x_1, x_2, \dots, x_n$ have been randomly selected from some
population or process distribution with mean value $\mu$, then the
sample mean $\bar{x}$ gives a point estimate for $\mu$.
[Figure: density curve f(x) plotted against x for −6 ≤ x ≤ 6, with the mean µ_x = 0 marked.]
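
The integral in equation (4) can be approximated numerically. The sketch below uses a composite trapezoidal rule on a normal density; the choice of density, the integration limits, and the helper names are assumptions made for illustration, not part of the text.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def integrate(g, a, b, steps=20_000):
    """Composite trapezoidal approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h

# The infinite limits are replaced by finite ones far out in the tails,
# where the integrand is negligible.
mean_std_normal = integrate(lambda x: x * normal_pdf(x), -10.0, 10.0)
mean_shifted = integrate(lambda x: x * normal_pdf(x, mu=1.5), -10.0, 12.0)

print(mean_std_normal)  # close to 0
print(mean_shifted)     # close to 1.5
```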


The median of a continuous distribution

Just as the sample median $\tilde{x}$ separates the sample into two equal
halves, the median $\tilde{\mu}$ of a continuous distribution divides the
area under the density curve into two equal halves, i.e.,
\[ \int_{-\infty}^{\tilde{\mu}} f(x)\,dx = .5 \qquad (5) \]

[Figure: density curve f(x) plotted against x for 0 ≤ x ≤ 500, with the median marked.]
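
Equation (5) can be solved numerically when the cumulative area is available. The sketch below uses bisection on an exponential distribution with a hypothetical rate; this distribution is an assumption chosen because its median has a known closed form, $\ln 2 / \lambda$, and it is not claimed to be the one shown in the figure.

```python
from math import exp, log

def exponential_cdf(x, lam):
    """Area under the exponential density to the left of x."""
    return 1.0 - exp(-lam * x) if x > 0 else 0.0

def median_by_bisection(cdf, lo, hi, tol=1e-9):
    """Find m with cdf(m) = 0.5, assuming cdf(lo) < 0.5 < cdf(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid      # less than half the area lies to the left: move right
        else:
            hi = mid      # at least half the area lies to the left: move left
    return (lo + hi) / 2

lam = 0.01  # hypothetical rate parameter
median = median_by_bisection(lambda x: exponential_cdf(x, lam), 0.0, 500.0)

print(median, log(2) / lam)  # the two values should agree closely
```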


Measures of variability for sample data

The simplest measure of variability in a sample is the range, i.e., the
difference between the largest and smallest sample values.
The sample variance, denoted by $s^2$, is defined by
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1} \qquad (6) \]
The sample standard deviation, denoted by $s$, is the (positive)
square root of the variance: $s = \sqrt{s^2}$.
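
As a worked instance of equation (6), the sketch below computes the range, $S_{xx}$, $s^2$ and $s$ for a small hypothetical sample. The helper name `sample_variance` is introduced here; `statistics.variance` from the Python standard library uses the same $n - 1$ divisor and serves as a cross-check.

```python
import statistics
from math import sqrt

def sample_variance(xs):
    """Sample variance s^2 = S_xx / (n - 1), as in equation (6)."""
    n = len(xs)
    xbar = sum(xs) / n
    s_xx = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    return s_xx / (n - 1)

# Hypothetical data, made up for illustration only.
data = [10.0, 11.0, 12.0, 13.0, 14.0]

data_range = max(data) - min(data)
s2 = sample_variance(data)
s = sqrt(s2)

print(data_range, s2, s)
print(statistics.variance(data))  # should match s2
```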


Variance of a discrete distribution

Let $x$ be a discrete variable with mass function $p(x)$ and mean value $\mu$.
The variance of a discrete distribution for a variable $x$ is defined by
\[ \sigma^2 = \sum_{x} (x - \mu)^2\,p(x) \qquad (7) \]
where the sum is over all possible $x$ values.
The standard deviation: $\sigma = \sqrt{\sigma^2}$
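
Equation (7) can be evaluated term by term for any mass function with finitely many values. The sketch below does so for a binomial variable with hypothetical parameters. The comparison value $n\rho(1-\rho)$ is the standard closed form for the binomial variance, which is not stated in the text above.

```python
from math import comb, sqrt

def binomial_pmf(x, n, rho):
    """Binomial mass function p(x) with group size n and success proportion rho."""
    return comb(n, x) * rho**x * (1 - rho) ** (n - x)

# Hypothetical parameter values, chosen only for illustration.
n, rho = 10, 0.3
values = range(n + 1)

mu = sum(x * binomial_pmf(x, n, rho) for x in values)                  # equation (3)
sigma2 = sum((x - mu) ** 2 * binomial_pmf(x, n, rho) for x in values)  # equation (7)
sigma = sqrt(sigma2)

print(sigma2, n * rho * (1 - rho))  # both should be close to 2.1
```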


Variance of a continuous distribution

The variance of a continuous distribution with density function $f(x)$ is
obtained by replacing summation in the discrete case by integration and
substituting $f(x)$ for $p(x)$.
The variance of a continuous distribution is defined by
\[ \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \qquad (8) \]
The standard deviation: $\sigma = \sqrt{\sigma^2}$
In the case of a normal distribution, the related variance can
be determined by
\[ \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \sigma^2 \qquad (9) \]
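
Equation (9) can be checked numerically: integrating $(x - \mu)^2 f(x)$ for a normal density should return the square of the parameter $\sigma$. The sketch below uses a trapezoidal rule with hypothetical parameter values and finite limits placed ten standard deviations from the mean. These choices and the helper names are assumptions for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def integrate(g, a, b, steps=20_000):
    """Composite trapezoidal approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h

mu, sigma = 100.0, 15.0  # hypothetical parameter values
variance = integrate(
    lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma),
    mu - 10 * sigma,
    mu + 10 * sigma,
)

print(variance, sigma**2)  # both should be close to 225
```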


Quartiles and the interquartile range

The median separates a data set or distribution into two
equal parts (i.e., 50% of the values exceed the median and
50% are smaller than the median).
Quartiles and percentiles give more detailed information
about the location of a data set or distribution by considering
percentages other than 50%.
The lower and upper quartiles along with the median separate
a data set or distribution into four equal parts:
  - 25% of all values are smaller than the lower quartile
  - 25% exceed the upper quartile
  - 25% lie between each quartile and the median


Quartiles and the interquartile range: definitions

Separate the $n$ ordered sample observations into a lower half
and an upper half.
If $n$ is an odd number, include the median $\tilde{x}$ in each half.
Then:
  - lower quartile = median of the lower half of the data
  - upper quartile = median of the upper half of the data
The interquartile range (IQR), a measure of variability that
is resistant to the effect of outliers, is the difference between
the two quartiles: IQR = upper quartile − lower quartile
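
The definition above can be sketched as follows. The helper names and the data are hypothetical. Note that statistical software often uses other quartile conventions, so library routines may return slightly different values than this median-of-halves rule.

```python
def median_of_sorted(values):
    """Median of a list that is already sorted."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def quartiles(xs):
    """Lower and upper quartiles as medians of the lower and upper halves.

    When n is odd, the sample median is included in both halves.
    """
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2            # size of each half (median shared if n is odd)
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median_of_sorted(lower_half), median_of_sorted(upper_half)

# Hypothetical samples with odd and even n.
q1_odd, q3_odd = quartiles([1, 2, 3, 4, 5, 6, 7])
q1_even, q3_even = quartiles([1, 2, 3, 4, 5, 6, 7, 8])

iqr_odd = q3_odd - q1_odd
iqr_even = q3_even - q1_even
```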


Boxplots

A boxplot is a visual display of data based on the following
five-number summary:
smallest $x_i$, lower quartile, median, upper quartile, largest $x_i$
To create a boxplot, do the following:
  1. Draw a horizontal measurement scale.
  2. Place a rectangle (above the axis) whose left and right edges are
     at the lower and upper quartiles, respectively.
  3. Place a vertical line segment or a symbol inside the rectangle
     at the location of the median.
  4. Draw "whiskers" out from either end of the rectangle to the
     smallest and largest values in the sample.


Any observation farther than 1.5 × IQR from the closest quartile
is called an outlier.
An outlier is extreme if it is more than 3 × IQR from the
nearest quartile, and it is mild otherwise.
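
The five-number summary and the outlier rule can be combined in a short sketch. The data and helper names are hypothetical, and the quartiles are computed with the median-of-halves definition given earlier in this chapter.

```python
def median_of_sorted(values):
    """Median of a list that is already sorted."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 == 1 else (values[mid - 1] + values[mid]) / 2

def five_number_summary(xs):
    """(smallest, lower quartile, median, upper quartile, largest)."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2  # the median is included in both halves when n is odd
    q1 = median_of_sorted(ordered[:half])
    q3 = median_of_sorted(ordered[n - half:])
    return ordered[0], q1, median_of_sorted(ordered), q3, ordered[-1]

def classify(x, q1, q3):
    """Label x as 'extreme', 'mild' or 'not an outlier' using the IQR rule."""
    iqr = q3 - q1
    # Distance from the nearest quartile (zero for values inside the box).
    distance = q1 - x if x < q1 else (x - q3 if x > q3 else 0)
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "not an outlier"

# Hypothetical data, made up for illustration only.
data = [2, 4, 5, 6, 7, 8, 9, 20, 40]
summary = five_number_summary(data)
_, q1, _, q3, _ = summary
outliers = {x: classify(x, q1, q3) for x in data
            if classify(x, q1, q3) != "not an outlier"}

print(summary)
print(outliers)
```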


Percentiles

Let $p$ denote a number between 0 and 1. Then the $(100p)$th percentile,
$\eta_p$, also called the $p$th quantile, separates
the smallest $100p\%$ of the data or distribution from the
remaining values.
For instance, 90% of all values lie below the 90th percentile,
$\eta_{.9}$, and only 10% of all values exceed the 90th percentile.
The median is the 50th percentile.
For a continuous distribution, $\eta_p$ is the solution to the equation
\[ \int_{-\infty}^{\eta_p} f(x)\,dx = p \qquad (10) \]
where $p$ is the area under the density curve to the left of $\eta_p$.
Example/figure 2.7: ...
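
For a normal distribution, equation (10) can be solved with `statistics.NormalDist` from the Python standard library (Python 3.8 or later): `inv_cdf(p)` returns $\eta_p$ and `cdf` returns the area to the left of a value. The parameter values below are hypothetical, and this sketch is not a reproduction of the example referenced above.

```python
from statistics import NormalDist

# Hypothetical normal distribution, chosen only for illustration.
dist = NormalDist(mu=100.0, sigma=15.0)

eta_90 = dist.inv_cdf(0.90)   # 90th percentile
eta_50 = dist.inv_cdf(0.50)   # 50th percentile, i.e. the median

# Check equation (10): the area to the left of eta_p should equal p.
area_left_of_eta_90 = dist.cdf(eta_90)

print(eta_90, area_left_of_eta_90)
print(eta_50)
```

For a symmetric distribution such as the normal, the 50th percentile coincides with the mean.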

Introduction

An investigator usually wishes to know whether it is plausible
that a numerical sample $x_1, x_2, \dots, x_n$ was selected from a
particular type of population distribution.
Many inferential procedures are based on the assumption that
the underlying distribution is of a specified type.
However, the use of such a procedure is inappropriate if the
actual distribution differs greatly from the assumed type.
Understanding the underlying distribution can sometimes give
insight into the physical mechanisms involved in generating
the data.


Introduction (cont.)

An effective way to check the distributional assumption is to
construct a quantile plot.
The essence of such a plot is that if the plot is based on the
correct distribution, the points in the plot will fall close to a
straight line.
If not, the points should depart substantially from a linear
pattern.


Definition

Let $x_{(1)}$ denote the smallest sample observation, $x_{(2)}$ the
second smallest sample observation, ..., and $x_{(n)}$ the largest
sample observation.
Take $x_{(1)}$ to be the $(.5/n)$th sample quantile, $x_{(2)}$ to be the
$(1.5/n)$th sample quantile, ..., and finally $x_{(n)}$ to be the
$[(n - .5)/n]$th sample quantile.
In general, for $i = 1, 2, \dots, n$, take $x_{(i)}$ to be the
$[(i - .5)/n]$th sample quantile.
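
The definition pairs each ordered observation with a quantile level $(i - .5)/n$. A minimal sketch, with hypothetical data and a helper name introduced here:

```python
def sample_quantile_levels(n):
    """Levels (i - 0.5) / n for i = 1, ..., n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

# Hypothetical sample, made up for illustration only.
data = [103.2, 87.5, 121.9, 95.0, 110.4]

ordered = sorted(data)                   # x_(1) <= x_(2) <= ... <= x_(n)
levels = sample_quantile_levels(len(ordered))
pairs = list(zip(levels, ordered))       # (level, x_(i)) for each i

for level, value in pairs:
    print(f"{value} is the {level:.1f}th sample quantile")
```

With $n = 5$ the levels are .1, .3, .5, .7 and .9, so the middle observation is the .5th sample quantile, in agreement with the sample median.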


A normal quantile plot

For $i = 1, 2, \dots, n$, the $[(i - .5)/n]$th quantiles are determined
for a specified population or process distribution whose
plausibility is being investigated.
If the sample were actually selected from the specified
distribution, the sample quantiles should be reasonably
close to the corresponding distributional quantiles, i.e., for
$i = 1, 2, \dots, n$, there should be reasonable agreement between
$x_{(i)}$ and the $[(i - .5)/n]$th quantile of the specified distribution.
After determining the appropriate quantiles for the
distribution under investigation, form the $n$ pairs as follows:
\[ \left( \left[\tfrac{.5}{n}\right]\text{th quantile},\; x_{(1)} \right), \;\dots,\; \left( \left[\tfrac{n - .5}{n}\right]\text{th quantile},\; x_{(n)} \right) \qquad (11) \]
Each such pair can be plotted as a point on a two-dimensional
coordinate system.

A normal quantile plot: comments and an example

In each pair (i.e., each plotted point), if the first number is
close to the second number, the point in the plot will fall close
to the 45° line, i.e., the line with slope 1 passing through the point (0, 0).
Example: this procedure can be carried out to decide whether a
normal distribution with $\mu = 100$ and $\sigma = 15$ is plausible. To do so:
  - determine the appropriate $z$ quantiles ($z$ refers to the standard
    normal distribution);
  - express the quantiles of the normal distribution under consideration in
    the form $\mu$ + (corresponding $z$ quantile) × $\sigma$.
Note that, in general, quantile for a normal ($\mu$, $\sigma$) distribution =
$\mu$ + (corresponding $z$ quantile) × $\sigma$
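
The relation between $z$ quantiles and normal ($\mu$, $\sigma$) quantiles can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sample size $n = 10$ is a hypothetical choice. The values $\mu = 100$ and $\sigma = 15$ are those of the example.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0
n = 10  # hypothetical sample size

# Quantile levels (i - 0.5) / n for i = 1, ..., n.
levels = [(i - 0.5) / n for i in range(1, n + 1)]

# Step 1: z quantiles, i.e. quantiles of the standard normal distribution.
z_quantiles = [NormalDist().inv_cdf(p) for p in levels]

# Step 2: quantiles of the normal(mu, sigma) distribution via mu + z * sigma.
normal_quantiles = [mu + z * sigma for z in z_quantiles]

# Cross-check against quantiles computed directly from NormalDist(mu, sigma).
direct = [NormalDist(mu, sigma).inv_cdf(p) for p in levels]

for p, z, q in zip(levels, z_quantiles, normal_quantiles):
    print(f"level {p:.2f}: z quantile {z:+.3f}, normal quantile {q:.2f}")
```

Pairing `normal_quantiles` with the ordered sample values gives the points to compare against the 45° line.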


A normal quantile plot: definition

A normal quantile plot is a plot of the ($z$ quantile, observation) pairs.
The linear relation between normal ($\mu$, $\sigma$) quantiles and $z$
quantiles implies that if the sample has come from a normal
distribution with parameters $\mu$ and $\sigma$, the points in the plot
should fall close to a straight line with slope $\sigma$ and vertical
intercept $\mu$.
A plot for which the points fall close to some straight line
suggests that the assumption of a normal population or
process distribution is plausible.
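
The slope and intercept interpretation can be illustrated with simulated data. The sketch below draws a pseudo-random sample from a normal distribution with hypothetical parameters, forms the ($z$ quantile, observation) pairs, and fits a least-squares line to them. The simulation, the seed, the sample size, and the use of least squares to summarize the line are all assumptions made for illustration. In practice the pairs would be plotted and judged visually.

```python
import random
from statistics import NormalDist

def normal_quantile_pairs(sample):
    """(z quantile, observation) pairs for a normal quantile plot."""
    ordered = sorted(sample)
    n = len(ordered)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))

def least_squares_line(points):
    """Slope and intercept of the least-squares line through the points."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    s_xx = sum((x - xbar) ** 2 for x, _ in points)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in points)
    slope = s_xy / s_xx
    return slope, ybar - slope * xbar

# Simulated sample from a normal distribution with hypothetical parameters.
rng = random.Random(0)
sample = [rng.gauss(100.0, 15.0) for _ in range(200)]

pairs = normal_quantile_pairs(sample)
slope, intercept = least_squares_line(pairs)

print(slope)      # should be in the vicinity of sigma = 15
print(intercept)  # should be in the vicinity of mu = 100
```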
