Professional Documents
Culture Documents
Statistics
Descriptive Statistics
Shaheena Bashir
FALL, 2019
2/31
Outline
Introduction
Measures of Location
Mean
Median
Measures of Dispersion
Standard Deviation
Quantiles
Five Number Summary
Chebyshev’s Rule
o
3/31
Introduction
Background
o
4/31
Measures of Location
o
5/31
Measures of Location
Mean
Mean
This is the case, for example, when our interpretation of the visual
center corresponds to a value for which the numerical contribution
from data points that are greater than the ‘center’ is equally
balanced by the numerical contribution from those that are less
than it. In such settings, the appropriate statistic to measure this
‘visual center’ is naturally the average, or mean, of the collected
observations.
o
7/31
Measures of Location
Mean
Mean
n
1X
x̄ = xi
n
i=1
1
= (x1 + x2 + · · · + xn )
n
The population mean is also computed the same way but is
denoted as µ.
o
8/31
Measures of Location
Mean
Mean: Example
Consider the data collection [1.9, 2.5, 3.6, 3.8, 3.2]. The mean for
this data collection is
(1.9 + 2.5 + 3.6 + 3.8 + 3.2)
x̄ =
5
= 3
o
9/31
Measures of Location
Mean
Mean: Limitations
Median
A measure of the center that is less sensitive to unusually large or
small observations than the mean is provided by the median.
Median divides the set of ordered data values in equal sized halves.
To find the median, x̃, of a data collection x1, . . . , xn:
1. Sort the n data values in order from smallest to largest, i.e.,
x(1) , x(2) , . . . , x(n) .
2. If n is odd, the median, x̃, is the single value in the middle of
this ordered list, i.e., x̃ = x( n+1 ) .
2
3. If n is even, there are two ‘middle values’, and the median, x̃,
x( n ) +x( n +1)
is their average, i.e., x̃ = 2 2 2 .
Since the median is the midpoint of the data, 50% of the values
are below it. Hence, the median is also the 50th percentile.
o
11/31
Measures of Location
Median
Median: Example
o
12/31
Measures of Dispersion
x̄A = 8
Collection B: 7.5, 7.6, 7.7, 7.8, 7.9, 8.1, 8.2, 8.3, 8.4, 8.5;
x̄B = 8
o
13/31
Measures of Dispersion
Standard Deviation
Collection A
4 6 8 10 12
Collection B
4 6 8 10 12
o
15/31
Measures of Dispersion
Standard Deviation
o
16/31
Measures of Dispersion
Standard Deviation
( xi )2
X P
1
Var = xi2 −
(n − 1) n
o
18/31
Measures of Dispersion
Quantiles
Quantiles
o
19/31
Measures of Dispersion
Quantiles
Quantiles: Example
o
20/31
Measures of Dispersion
Five Number Summary
o
21/31
Measures of Dispersion
Five Number Summary
c39
c52
6
Frequency
4
2
0
0 2 4 6 8 10
o
22/31
Measures of Dispersion
Five Number Summary
Box-and-Whisker Plot
I A box plot is a way to plot data that simplifies comparison
between groups
I There is a vertical box whose height corresponds to the
interquartile range of the data (the width is just to make the
figure easy to interpret).
I Then there is a horizontal line for the median; and
I The behavior of the rest of the data is indicated with whiskers
are extended from the sides of the box to the maximum and
minimum data values
I The outlier identified if any by reducing the whisker length to
the most extreme observation that is not a potential outlier.
I Each data-set is represented by a vertical structure, making it
easy to show multiple data-sets on one plot and interpret the
plot. o
23/31
Measures of Dispersion
Five Number Summary
Box Plot
Cabbage Data
4.0
3.5
Head Weight (KG)
3.0
2.5
2.0
1.5
1.0
c39 c52
Cultivar Type
o
24/31
Measures of Dispersion
Five Number Summary
o
25/31
Measures of Dispersion
Chebyshev’s Rule
Background
o
26/31
Measures of Dispersion
Chebyshev’s Rule
o
27/31
Measures of Dispersion
Chebyshev’s Rule
Example
o
28/31
Measures of Dispersion
Chebyshev’s Rule
Chebyshev Rule
o
29/31
Measures of Dispersion
Chebyshev’s Rule
1
1−
k2
o
µ− kσ µ+ kσ
30/31
Measures of Dispersion
Chebyshev’s Rule
1
At least 1 − k2
of observations fall within µ ± kσ
I Approximately 1 − 212 = 75% of observations fall within k = 2
standard deviation of the mean
I Approximately 1 − 312 = 89% of observations fall within k = 3
standard deviation of the mean
o
31/31
Measures of Dispersion
Chebyshev’s Rule
Example