Descriptive Statistics: 4 Edition David P. Doane and Lori E. Seward

9/28/12
Chapter 4
Descriptive Statistics
A PowerPoint Presentation Package to Accompany
Applied Statistics in Business & Economics, 4th edition
David P. Doane and Lori E. Seward
Prepared by Lloyd R. Jaisingh
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter Contents
4.1 Numerical Description
4.2 Measures of Center
4.3 Measures of Variability
4.4 Standardized Data
4.5 Percentiles, Quartiles, and Box Plots
4.6 Correlation and Covariance
4.7 Grouped Data
4.8 Skewness and Kurtosis

Chapter Learning Objectives
LO4-1: Explain the concepts of center, variability, and shape.
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
LO4-3: Calculate and interpret common measures of center.
LO4-4: Calculate and interpret common measures of variability.
LO4-5: Transform a data set into standardized values.
LO4-6: Apply the Empirical Rule and recognize outliers.
LO4-7: Calculate quartiles and other percentiles.
LO4-8: Make and interpret box plots.
LO4-9: Calculate and interpret a correlation coefficient and covariance.
LO4-10: Calculate the mean and standard deviation from grouped data.
LO4-11: Assess skewness and kurtosis in a sample.

4.1 Numerical Description
LO4-1: Explain the concepts of center, variability, and shape.
Three key characteristics of numerical data: center, variability, and shape.

4.1 Numerical Description
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
[Figure: Excel histogram display for Table 4.3]

4.2 Measures of Center
LO4-3: Calculate and interpret common measures of center.
Mean
•  A familiar measure of center.
Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
•  In Excel, use function =AVERAGE(Data) where Data is an array of data values.

4.2 Measures of Center
Median
•  The median (M) is the 50th percentile or midpoint of the sorted sample data.
•  M separates the upper and lower halves of the sorted observations.
•  If n is odd, the median is the middle observation in the data array.
•  If n is even, the median is the average of the middle two observations in the data array.

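The mean and median rules above can be sketched in a few lines of Python. This is an illustration only: the data values and the helper name `median_by_hand` are made up for the example and do not come from the slides.

```python
from statistics import median

# Hypothetical sample; not data from the slides.
data = [3, 9, 4, 7, 5, 8]

def median_by_hand(values):
    """Median by the odd/even rule: middle value, or average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n odd: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average of the middle two

sample_mean = sum(data) / len(data)   # what Excel's =AVERAGE(Data) computes
sample_median = median_by_hand(data)

print(sample_mean, sample_median)
```

The hand-written helper should agree with `statistics.median` from the standard library, which applies the same rule.
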
Mode
•  The most frequently occurring data value.
•  May have multiple modes or no mode.
•  The mode is most useful for discrete or categorical data with only a few distinct data values. For continuous data or data with a wide range, the mode is rarely useful.

LO4-1: Explain the concepts of center, variability, and shape.
Shape
•  Compare mean and median or look at the histogram to determine degree of skewness.
•  Figure 4.10 shows prototype population shapes with varying degrees of skewness.


Geometric Mean
•  The geometric mean (G) is a multiplicative average.
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$

Growth Rates
A variation on the geometric mean used to find the average growth rate for a time series.
$GR = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$

Growth Rates
•  For example, from 2006 to 2010, JetBlue Airlines revenues are:
Year | Revenue (mil)
2006 | 2,361
2007 | 2,843
2008 | 3,392
2009 | 3,292
2010 | 3,779
The average growth rate:
$GR = \sqrt[4]{\frac{3779}{2361}} - 1 = 0.1248$
or 12.5% per year.

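As a check on the JetBlue arithmetic, here is a small Python sketch. The function names are illustrative; only the revenue figures come from the slide. The geometric mean is computed through logarithms, which is a common way to avoid overflow when multiplying many values.

```python
import math

# JetBlue revenues (millions) for 2006-2010, as listed on the slide.
revenue = [2361, 2843, 3392, 3292, 3779]

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def average_growth_rate(series):
    """(n-1)-th root of last/first, minus 1, for a series of n observations."""
    periods = len(series) - 1
    return (series[-1] / series[0]) ** (1 / periods) - 1

growth = average_growth_rate(revenue)
print(f"{growth:.1%}")   # prints 12.5%
```

Only the first and last observations enter the growth-rate formula, so the dip in 2009 has no effect on the result.
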
Midrange
•  The midrange is the point halfway between the lowest and highest values of X.
•  Easy to use but sensitive to extreme data values.
•  For the J.D. Power quality data:
$\text{Midrange} = \frac{x_{\min} + x_{\max}}{2} = 126.5$
•  Here, the midrange (126.5) is higher than the mean (114.70) or median (113).

Trimmed Mean
•  To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
•  For example, for the n = 33 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).
•  To determine how many observations to trim from each end, multiply k by n, which is 0.05 x 33 = 1.65, rounded to 2 observations.
•  So, we would remove the two smallest and two largest observations before averaging the remaining values.

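The midrange and trimmed-mean steps can be sketched as follows. The data are hypothetical, and the rounding of k x n follows the slide's worked example (1.65 becomes 2) rather than any particular software package. Note that Python's `round` sends exact halves to the nearest even integer.

```python
def midrange(values):
    """Point halfway between the smallest and largest observation."""
    return (min(values) + max(values)) / 2

def trimmed_mean(values, k):
    """Drop round(k * n) observations from EACH end, then average the rest."""
    ordered = sorted(values)
    n = len(ordered)
    cut = round(k * n)          # number trimmed per tail, as in the slide's example
    if 2 * cut >= n:
        raise ValueError("trim fraction removes every observation")
    kept = ordered[cut:n - cut]
    return sum(kept) / len(kept)

# Hypothetical data with one very low and one very high value.
data = [2, 10, 11, 12, 13, 14, 15, 16, 17, 90]

print(midrange(data))            # 46.0, pulled upward by the extreme value 90
print(trimmed_mean(data, 0.10))  # 13.5, after dropping 2 and 90
```

Excel's TRIMMEAN takes the total fraction to exclude across both tails, which is why a 5 percent trim per tail is entered as 0.1.
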
4.2 Measures of Center
Trimmed Mean
•  Here is a summary of all the measures of central tendency for the J.D. Power data.
Mean: 114.70 =AVERAGE(Data)
Median: 113 =MEDIAN(Data)
Mode: 111 =MODE.SNGL(Data)
Geometric Mean: 113.35 =GEOMEAN(Data)
Midrange: 126.5 =(MIN(Data)+MAX(Data))/2
5% Trim Mean: 113.94 =TRIMMEAN(Data, 0.1)
•  The trimmed mean mitigates the effects of very high values, but still exceeds the median.

4.3 Measures of Variability
LO4-4: Calculate and interpret common measures of variability.
•  Variation is the spread of data points about the center of the distribution in a sample. Consider the following measures of variability:
Measures of Variability
Statistic | Formula | Excel | Pro | Con
Range | $x_{\max} - x_{\min}$ | =MAX(Data) - MIN(Data) | Easy to calculate. | Sensitive to extreme data values.
Sample variance ($s^2$) | $\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ | =VAR.S(Data) | Plays a key role in mathematical statistics. | Nonintuitive meaning.

4.3 Measures of Variability
Measures of Variation
Statistic | Formula | Excel | Pro | Con
Sample standard deviation (s) | $\sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ | =STDEV.S(Data) | Most common measure. Uses same units as the raw data ($, £, ¥, grams etc.). | Nonintuitive meaning.
Sample coefficient of variation (CV) | $100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent so can compare data sets. | Requires nonnegative data.

4.3 Measures of Variability
Measures of Variability
Statistic | Formula | Excel | Pro | Con
Mean absolute deviation (MAD) | $\frac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}$ | =AVEDEV(Data) | Easy to understand. | Lacks "nice" theoretical properties.
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$

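The statistics in the tables above can be computed directly from their definitions. The sketch below uses hypothetical data and illustrative function names. It mirrors the textbook formulas rather than reproducing the internals of Excel's VAR.S, STDEV.S, or AVEDEV.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    return math.sqrt(sample_variance(values))

def coefficient_of_variation(values):
    """Standard deviation as a percent of the mean; needs a positive mean."""
    xbar = sum(values) / len(values)
    if xbar <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * sample_std(values) / xbar

def mean_absolute_deviation(values):
    """Average absolute distance from the mean."""
    xbar = sum(values) / len(values)
    return sum(abs(x - xbar) for x in values) / len(values)

# Hypothetical data; not from the slides.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
variance = sample_variance(data)
mad = mean_absolute_deviation(data)
cv = coefficient_of_variation(data)
print(data_range, round(variance, 3), mad, round(cv, 1))
```
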
4.3 Measures of Variability
Coefficient of Variation
•  Useful for comparing variables measured in different units or with different means.
•  A unit-free measure of dispersion.
•  Expressed as a percent of the mean: $CV = 100 \times \frac{s}{\bar{x}}$
•  Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

4.3 Measures of Variability
Mean Absolute Deviation
•  This statistic reveals the average distance from the center: $MAD = \frac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}$
•  Absolute values must be used since otherwise the deviations around the mean would sum to zero. It is stated in the unit of measurement.
•  The MAD is appealing because of its simple interpretation.

4.3 Measures of Variability
Central Tendency vs. Dispersion: Manufacturing
•  Take frequent samples to monitor quality.

4.4 Standardized Data
Chebyshev's Theorem
•  For any population with mean $\mu$ and standard deviation $\sigma$, the percentage of observations that lie within k standard deviations of the mean must be at least $100[1 - 1/k^2]$.
•  For k = 2 standard deviations, $100[1 - 1/2^2] = 75\%$
•  So, at least 75.0% will lie within $\mu \pm 2\sigma$
•  For k = 3 standard deviations, $100[1 - 1/3^2] = 88.9\%$
•  So, at least 88.9% will lie within $\mu \pm 3\sigma$
•  Although applicable to any data set, these limits tend to be rather wide.

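The Chebyshev bound is a one-line formula, sketched here in Python. The function name is illustrative. The guard for k ≤ 1 is an added assumption, since the bound is zero or negative there and says nothing useful.

```python
def chebyshev_lower_bound(k):
    """Minimum percent of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k ** 2)

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))   # 2 -> 75.0, 3 -> 88.9
```
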
4.4 Standardized Data
The Empirical Rule
•  The normal distribution is symmetric and is also known as the bell-shaped curve.
•  The Empirical Rule states that for data from a normal distribution, we expect the interval $\mu \pm k\sigma$ to contain a known percentage of data. For
k = 1, 68.26% will lie within $\mu \pm 1\sigma$
k = 2, 95.44% will lie within $\mu \pm 2\sigma$
k = 3, 99.73% will lie within $\mu \pm 3\sigma$

4.4 Standardized Data
The Empirical Rule
Note: No upper bound is given. Data values outside $\mu \pm 3\sigma$ are rare.

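The Empirical Rule percentages can be reproduced from the normal distribution itself. The sketch below uses the standard identity that the share within k standard deviations equals erf(k/√2). The function name is illustrative. Figures quoted from four-decimal z-tables, such as 68.26%, can differ from these values in the last digit.

```python
import math

def normal_share_within(k):
    """Percent of a normal population lying within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 2))
```
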
4.4 Standardized Data
LO4-5: Transform a data set into standardized values.
•  A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.
Standardization formula for a population: $z_i = \frac{x_i - \mu}{\sigma}$
Standardization formula for a sample (for n > 30): $z_i = \frac{x_i - \bar{x}}{s}$

4.4 Standardized Data
LO4-6: Apply the Empirical Rule and recognize outliers.
•  A negative z value means the observation is to the left of the mean.
•  A positive z value means the observation is to the right of the mean.

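A minimal sketch of sample standardization follows, using hypothetical data and an illustrative function name. It applies the sample formula (mean and standard deviation estimated from the data, with an n - 1 divisor) to show how the sign of z relates to the side of the mean.

```python
import math

def standardize(values):
    """Return sample z-scores: (x - mean) / s, with s using the n - 1 divisor."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))
    return [(x - xbar) / s for x in values]

# Hypothetical data; not from the slides.
data = [10, 12, 14, 16, 18]
z = standardize(data)

for x, zi in zip(data, z):
    side = "left of" if zi < 0 else "right of" if zi > 0 else "at"
    print(x, round(zi, 3), side, "the mean")
```

Standardized values computed this way always sum to zero, which is a quick check on the arithmetic.
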
4.4 Standardized Data
Estimating Sigma
•  For a normal distribution, the range of values is almost $6\sigma$ (from $\mu - 3\sigma$ to $\mu + 3\sigma$).
•  If you know the range R (high – low), you can estimate the standard deviation as $\sigma = R/6$.
•  Useful for approximating the standard deviation when only R is known.
•  This estimate depends on the assumption of normality.

4.5 Percentiles, Quartiles, and Box Plots
LO4-7: Calculate quartiles and other percentiles.
Percentiles
•  Percentiles are data that have been divided into 100 groups.
•  For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.
•  Deciles are data that have been divided into 10 groups.
•  Quintiles are data that have been divided into 5 groups.
•  Quartiles are data that have been divided into 4 groups.

4.5 Percentiles, Quartiles, and Box Plots
Percentiles
•  Percentiles may be used to establish benchmarks for comparison purposes (e.g. health care, manufacturing, and banking industries use 5th, 25th, 50th, 75th and 90th percentiles).
•  Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios.
•  Percentiles can be used in employee merit evaluation and salary benchmarking.

4.5 Percentiles, Quartiles, and Box Plots
Quartiles
•  Quartiles are scale points that divide the sorted data into four groups of approximately equal size.
Q1 Q2 Q3
← Lower 25% → | ← Second 25% → | ← Third 25% → | ← Upper 25% →
•  The three values that separate the four groups are called Q1, Q2, and Q3, respectively.

Quartiles
•  The second quartile Q2 is the median, a measure of central tendency.
Q2
← Lower 50% → | ← Upper 50% →
•  Q1 and Q3 measure dispersion since the interquartile range Q3 – Q1 measures the degree of spread in the middle 50 percent of data values.
Q1 Q3
← Lower 25% → | ← Middle 50% → | ← Upper 25% →

Quartiles – The method of medians
•  The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2.
Q1 Q2 Q3
← Lower 25% → | ← Second 25% → | ← Third 25% → | ← Upper 25% →
For first half of data, 50% above, 50% below Q1.
For second half of data, 50% above, 50% below Q3.

Method of Medians
•  For small data sets, find quartiles using method of medians:
Step 1: Sort the observations.
Step 2: Find the median Q2.
Step 3: Find the median of the data values that lie below Q2.
Step 4: Find the median of the data values that lie above Q2.

Method of Medians
Example:

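The four steps of the method of medians can be sketched in Python as below. The data are hypothetical and the function names are illustrative. The sketch assumes that "below Q2" and "above Q2" mean the lower and upper halves by position in the sorted list, with the middle observation left out of both halves when n is odd.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    ordered = sorted(values)              # Step 1: sort the observations
    n = len(ordered)
    q2 = median_of_sorted(ordered)        # Step 2: median of all the data
    lower = ordered[: n // 2]             # values below Q2 (by position)
    upper = ordered[(n + 1) // 2 :]       # values above Q2 (by position)
    q1 = median_of_sorted(lower)          # Step 3: median of the lower half
    q3 = median_of_sorted(upper)          # Step 4: median of the upper half
    return q1, q2, q3

# Hypothetical data; not the P/E ratios from the slides.
print(quartiles_by_medians([21, 7, 30, 12, 9, 24, 15, 18]))   # (10.5, 16.5, 22.5)
```

Spreadsheet and statistics packages often use interpolation rules instead, so their quartiles can differ slightly from the method of medians on small samples.
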
Example: P/E Ratios and Quartiles
•  So, to summarize:
Q1 = 27, Q2 = 35.5, Q3 = 40.5
← Lower 25% of P/E Ratios → 27 ← Second 25% of P/E Ratios → 35.5 ← Third 25% of P/E Ratios → 40.5 ← Upper 25% of P/E Ratios →
•  These quartiles express central tendency and dispersion. What is the interquartile range?

LO4-8: Make and interpret box plots.
•  A useful tool of exploratory data analysis (EDA).
•  Also called a box-and-whisker plot.
•  Based on a five-number summary:
Xmin, Q1, Q2, Q3, Xmax
•  Consider the five-number summary for the previous P/E ratios example:
Xmin = 7, Q1 = 27, Q2 = 35.5, Q3 = 40.5, Xmax = 49

Box Plots
•  The box plot is displayed visually, like this.
[Figure: box plot]

Box Plots
•  A box plot shows variability and shape.

Box Plots: Fences and Unusual Data Values
•  Use quartiles to detect unusual data points.
•  The cutoff points are called fences and can be found using the following formulas:
 | Inner fences | Outer fences
Lower fence | Q1 – 1.5 (Q3 – Q1) | Q1 – 3.0 (Q3 – Q1)
Upper fence | Q3 + 1.5 (Q3 – Q1) | Q3 + 3.0 (Q3 – Q1)
•  Values outside the inner fences are unusual while those outside the outer fences are outliers.

Box Plots: Fences and Unusual Data Values
•  For example, consider the P/E ratio data:
 | Inner fences | Outer fences
Lower fence | 107 – 1.5 (126 – 107) = 78.5 | 107 – 3.0 (126 – 107) = 50
Upper fence | 126 + 1.5 (126 – 107) = 154.5 | 126 + 3.0 (126 – 107) = 183
There is one outlier (170) that lies above the inner fence. There are no extreme outliers that exceed the outer fence.

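The fence formulas can be checked against the worked example (Q1 = 107, Q3 = 126) with a short Python sketch. The function names and the wording of the returned labels are illustrative, not taken from the slides.

```python
def fences(q1, q3):
    """Inner and outer fences as (lower, upper) pairs, from the quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(x, q1, q3):
    """Say which fence, if any, the value x lies beyond."""
    f = fences(q1, q3)
    if x < f["outer"][0] or x > f["outer"][1]:
        return "beyond outer fence"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "beyond inner fence"
    return "within fences"

print(fences(107, 126))          # inner (78.5, 154.5), outer (50.0, 183.0)
print(classify(170, 107, 126))   # beyond inner fence
```
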
Box Plots: Fences and Unusual Data Values
•  Truncate the whisker at the fences and display unusual values and outliers as dots.
[Figure: box plot with the outlier marked]
•  Based on these fences, there is only one outlier.

Box Plots: Midhinge
•  The average of the first and third quartiles: $\text{Midhinge} = \frac{Q_1 + Q_3}{2}$
•  The name midhinge derives from the idea that, if the box were folded in half, it would resemble a hinge.

4.6 Correlation and Covariance
LO4-9: Calculate and interpret a correlation coefficient and covariance.
Correlation Coefficient
•  The sample correlation coefficient is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y.
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Note: $-1 \le r \le +1$.

4.6 Correlation and Covariance
Correlation Coefficient
•  Illustration of Correlation Coefficients
[Figure: scatter plots illustrating correlation coefficients]

4.6 Correlation and Covariance
Covariance
The covariance of two random variables X and Y (denoted $\sigma_{XY}$) measures the degree to which the values of X and Y change together.
Population covariance: $\sigma_{XY} = \frac{\sum_{i=1}^{N}(x_i - \mu_X)(y_i - \mu_Y)}{N}$
Sample covariance: $s_{XY} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$

4.6 Correlation and Covariance
Covariance
A correlation coefficient is the covariance divided by the product of the standard deviations of X and Y.
$r = \frac{s_{XY}}{s_X s_Y}$

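The relationship between covariance and correlation can be sketched in Python. The paired data are hypothetical and the function names are illustrative. The sketch uses the sample versions (n - 1 divisor) and obtains each standard deviation as the square root of a variable's covariance with itself.

```python
import math

def sample_covariance(xs, ys):
    """Average cross-product of deviations from the means, with an n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(sample_covariance(xs, xs))   # cov(X, X) is the variance of X
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)

# Hypothetical paired observations; not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(xs, ys)
r = sample_correlation(xs, ys)
print(cov_xy, round(r, 4))
```

Covariance carries the units of X times the units of Y, while r is unit-free and bounded between -1 and +1, which is why r is easier to interpret.
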
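4.7 Grouped Data
LO4-10: Calculate the mean and standard deviation from grouped data.
Weighted Mean
$\bar{x} = \sum_{j=1}^{k} w_j x_j$, where the weights $w_j$ sum to 1.

4.7 Grouped Data
Group Mean and Standard Deviation
With class midpoints $m_j$, class frequencies $f_j$, and $n = \sum_{j=1}^{k} f_j$:
$\bar{x} = \frac{\sum_{j=1}^{k} f_j m_j}{n}$
$s = \sqrt{\frac{\sum_{j=1}^{k} f_j (m_j - \bar{x})^2}{n-1}}$
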
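A grouped-data calculation can be sketched as follows, assuming the usual approach of treating every observation in a class as if it sat at the class midpoint. The midpoints, frequencies, and function name are hypothetical.

```python
import math

def grouped_mean_and_std(midpoints, frequencies):
    """Approximate mean and sample standard deviation from a frequency table."""
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)
    return mean, math.sqrt(variance)

# Hypothetical frequency table: three classes with midpoints 5, 15, 25.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

g_mean, g_std = grouped_mean_and_std(midpoints, frequencies)
print(g_mean, round(g_std, 3))
```

Because the individual values inside each class are unknown, the results are approximations of the ungrouped mean and standard deviation.
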
4.8 Skewness and Kurtosis
LO4-11: Assess skewness and kurtosis in a sample.
Skewness

4.8 Skewness and Kurtosis
LO4-11: Assess skewness and kurtosis in a sample.
Kurtosis

