
9/28/12

Chapter 4
Descriptive Statistics

A PowerPoint Presentation Package to Accompany
Applied Statistics in Business & Economics, 4th edition
David P. Doane and Lori E. Seward
Prepared by Lloyd R. Jaisingh
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter Contents
4.1 Numerical Description
4.2 Measures of Center
4.3 Measures of Variability
4.4 Standardized Data
4.5 Percentiles, Quartiles, and Box Plots
4.6 Correlation and Covariance
4.7 Grouped Data
4.8 Skewness and Kurtosis

Chapter Learning Objectives

LO4-1: Explain the concepts of center, variability, and shape.
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
LO4-3: Calculate and interpret common measures of center.
LO4-4: Calculate and interpret common measures of variability.
LO4-5: Transform a data set into standardized values.
LO4-6: Apply the Empirical Rule and recognize outliers.
LO4-7: Calculate quartiles and other percentiles.
LO4-8: Make and interpret box plots.
LO4-9: Calculate and interpret a correlation coefficient and covariance.
LO4-10: Calculate the mean and standard deviation from grouped data.
LO4-11: Assess skewness and kurtosis in a sample.
LO4-1 4.1 Numerical Description
LO4-1: Explain the concepts of center, variability, and shape.

Three key characteristics of numerical data: center, variability, and shape.

LO4-2 4.1 Numerical Description
LO4-2: Use Excel to obtain descriptive statistics and visual displays.

[Figure: Excel histogram display for Table 4.3]

LO4-3 4.2 Measures of Center
LO4-3: Calculate and interpret common measures of center.

Mean
• A familiar measure of center.
• Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
• Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• In Excel, use the function =AVERAGE(Data), where Data is an array of data values.

LO4-3 4.2 Measures of Center
Median
• The median (M) is the 50th percentile, or midpoint, of the sorted sample data.
• M separates the upper and lower halves of the sorted observations.
• If n is odd, the median is the middle observation in the data array.
• If n is even, the median is the average of the middle two observations in the data array.
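The mean formula and the odd/even median rule can be checked with a short Python sketch. The data values are hypothetical, and `mean` and `median` are illustrative helper names (Python's standard `statistics.median` applies the same rule).

```python
def mean(data):
    """Sample mean: sum of the values divided by n."""
    return sum(data) / len(data)

def median(data):
    """Median by the sorted-array rule: the middle value if n is odd,
    the average of the two middle values if n is even."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

odd = [12, 7, 3, 9, 15]         # hypothetical data, n = 5
even = [12, 7, 3, 9, 15, 20]    # hypothetical data, n = 6
print(mean(odd), median(odd))   # 9.2 9
print(mean(even), median(even)) # 11.0 10.5
```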

LO4-3 4.2 Measures of Center
Mode
• The most frequently occurring data value.
• There may be multiple modes or no mode.
• The mode is most useful for discrete or categorical data with only a few distinct data values. For continuous data or data with a wide range, the mode is rarely useful.

LO4-1 4.2 Measures of Center
LO4-1: Explain the concepts of center, variability, and shape.
Shape
• Compare the mean and median, or look at the histogram, to determine the degree of skewness.
• Figure 4.10 shows prototype population shapes with varying degrees of skewness.
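A minimal Python sketch of the mode (allowing several modes or none) and of the mean-versus-median comparison for shape. The helper names and data are hypothetical, the comparison is only a rough skewness indicator, and the tolerance `tol` is an assumption so that tiny floating-point differences are not read as skew.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """Return every most-frequent value; an empty list means there is no mode."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so no value is "most frequent"
    return sorted(v for v, c in counts.items() if c == top)

def skew_direction(data, tol=1e-9):
    """Rough shape check: mean above median suggests right skew, below suggests left skew."""
    diff = mean(data) - median(data)
    if diff > tol:
        return "right-skewed"
    if diff < -tol:
        return "left-skewed"
    return "approximately symmetric"

print(modes([2, 3, 3, 4, 4, 5]))         # [3, 4]  (two modes)
print(modes([1, 2, 3]))                  # []      (no mode)
print(skew_direction([1, 2, 2, 3, 10]))  # right-skewed
```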
LO4-3 4.2 Measures of Center
Geometric Mean
• The geometric mean (G) is a multiplicative average: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$

Growth Rates
• A variation on the geometric mean used to find the average growth rate for a time series: $G_R = \sqrt[n-1]{x_n / x_1} - 1$

LO4-3 4.2 Measures of Center
Growth Rates
• For example, from 2006 to 2010, JetBlue Airlines revenues were:

Year   Revenue ($ mil)
2006   2,361
2007   2,843
2008   3,392
2009   3,292
2010   3,779

• The average growth rate is $G_R = \sqrt[4]{3{,}779 / 2{,}361} - 1 = 1.1248 - 1 = 0.1248$, or 12.5% per year.
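The JetBlue growth-rate calculation can be reproduced in Python. `statistics.geometric_mean` (Python 3.8+) is a standard-library function; `growth_rate` is a hypothetical helper implementing $G_R$. The final check shows that $G_R$ equals the geometric mean of the four year-over-year growth factors minus 1, which is why it is a variation on the geometric mean.

```python
from statistics import geometric_mean

# JetBlue Airlines revenues ($ mil), 2006 through 2010
revenue = [2361, 2843, 3392, 3292, 3779]

def growth_rate(series):
    """Average growth rate G_R = (x_n / x_1) ** (1 / (n - 1)) - 1."""
    n = len(series)
    return (series[-1] / series[0]) ** (1 / (n - 1)) - 1

rate = growth_rate(revenue)
print(f"average growth rate = {rate:.1%}")  # average growth rate = 12.5%

# Same result as the geometric mean of the year-over-year growth factors, minus 1
factors = [later / earlier for earlier, later in zip(revenue, revenue[1:])]
print(abs((geometric_mean(factors) - 1) - rate) < 1e-9)  # True
```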

LO4-3 4.2 Measures of Center
Midrange
• The midrange is the point halfway between the lowest and highest values of X: $\text{Midrange} = \frac{x_{\min} + x_{\max}}{2}$
• Easy to use but sensitive to extreme data values.
• For the J.D. Power quality data, the midrange (126.5) is higher than the mean (114.70) or the median (113).

LO4-3 4.2 Measures of Center
Trimmed Mean
• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations, then average the remaining values.
• For example, for the n = 33 P/E ratios, we want a 5 percent trimmed mean (i.e., k = 0.05).
• To determine how many observations to trim from each end, multiply k by n: 0.05 × 33 = 1.65, or 2 observations.
• So, we would remove the two smallest and two largest observations before averaging the remaining values.
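The midrange and the trimming rule can be sketched in Python. The data are hypothetical, and rounding $k \times n$ to the nearest integer is an assumption based on the "1.65, or 2" step above. Excel's =TRIMMEAN(Data, percent) differs slightly: `percent` is the total fraction removed from both ends combined (hence 0.1 for a 5 percent trim at each end), and Excel rounds the number of excluded points down to a multiple of 2, so for n = 33 it drops only one value from each end.

```python
from statistics import mean

def midrange(data):
    """Point halfway between the smallest and largest values."""
    return (min(data) + max(data)) / 2

def trimmed_mean(data, k):
    """Remove round(k * n) observations from EACH end of the sorted data, then average.
    Rounding to the nearest integer is an assumption (0.05 * 33 = 1.65 -> 2)."""
    xs = sorted(data)
    t = round(k * len(xs))
    return mean(xs[t:len(xs) - t])

data = [5, 6, 7, 7, 8, 8, 9, 10, 11, 40]  # hypothetical; 40 is an extreme value
print(mean(data), midrange(data), trimmed_mean(data, 0.10))  # 11.1 22.5 8.25
```

The extreme value pulls the midrange far above the mean, while trimming one value from each end brings the average back toward the bulk of the data.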

LO4-3 4.2 Measures of Center
Trimmed Mean
• Here is a summary of all the measures of central tendency for the J.D. Power data:

Measure          Value    Excel
Mean             114.70   =AVERAGE(Data)
Median           113      =MEDIAN(Data)
Mode             111      =MODE.SNGL(Data)
Geometric Mean   113.35   =GEOMEAN(Data)
Midrange         126.5    =(MIN(Data)+MAX(Data))/2
5% Trim Mean     113.94   =TRIMMEAN(Data, 0.1)

• The trimmed mean mitigates the effects of very high values but still exceeds the median.

LO4-4 4.3 Measures of Variability
LO4-4: Calculate and interpret common measures of variability.
• Variation is the spread of data points about the center of the distribution in a sample. Consider the following measures of variability:

Measures of Variability
• Range: $x_{\max} - x_{\min}$; Excel: =MAX(Data)-MIN(Data). Pro: easy to calculate. Con: sensitive to extreme data values.
• Sample variance ($s^2$): $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$; Excel: =VAR.S(Data). Pro: plays a key role in mathematical statistics. Con: nonintuitive meaning.
LO4-4 4.3 Measures of Variability
Measures of Variability
• Sample standard deviation ($s$): $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$; Excel: =STDEV.S(Data). Pro: most common measure; uses the same units as the raw data ($, £, ¥, grams, etc.). Con: nonintuitive meaning.
• Coefficient of variation (CV): $CV = 100 \times \frac{s}{\bar{x}}$; Excel: none. Pro: measures relative variation in percent, so data sets can be compared. Con: requires nonnegative data.

LO4-4 4.3 Measures of Variability
Measures of Variability
• Mean absolute deviation (MAD): $\text{MAD} = \frac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}$; Excel: =AVEDEV(Data). Pro: easy to understand. Con: lacks "nice" theoretical properties.
• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
• Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
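The range, variance, and standard deviation can be checked with Python's standard `statistics` module, where `variance` and `stdev` use the sample divisor n - 1 (like =VAR.S and =STDEV.S) and `pvariance` uses the population divisor N. The data are hypothetical, and `sample_variance` is an illustrative hand-coded version of the $s^2$ formula.

```python
import statistics

data = [4, 8, 6, 5, 3, 10]  # hypothetical sample, mean = 6

def sample_variance(xs):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

rng = max(data) - min(data)          # like =MAX(Data)-MIN(Data)
s2 = statistics.variance(data)       # sample variance, divisor n - 1
s = statistics.stdev(data)           # sample standard deviation
sigma2 = statistics.pvariance(data)  # population variance, divisor N

print(rng, s2, round(s, 4), round(sigma2, 4))   # 7 6.8 2.6077 5.6667
print(abs(sample_variance(data) - s2) < 1e-12)  # True
```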

LO4-4 4.3 Measures of Variability
Coefficient of Variation
• Useful for comparing variables measured in different units or with different means.
• A unit-free measure of dispersion.
• Expressed as a percent of the mean.
• Only appropriate for nonnegative data. It is undefined if the mean is zero and not meaningful if the mean is negative.

LO4-4 4.3 Measures of Variability
Mean Absolute Deviation
• This statistic reveals the average distance from the center.
• Absolute values must be used, since otherwise the deviations around the mean would sum to zero.
• The MAD is stated in the unit of measurement of the data.
• The MAD is appealing because of its simple interpretation.
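A short Python sketch of the CV and the MAD. `cv` and `mad` are hypothetical helper names; `mad` computes the same quantity as Excel's =AVEDEV. Rescaling the hypothetical data from kilograms to grams shows that the CV is unit-free while the MAD stays in the data's units. Raising an error for a nonpositive mean is a design choice reflecting the restriction above.

```python
import statistics

def cv(data):
    """Coefficient of variation in percent: 100 * s / xbar, with the sample s."""
    xbar = statistics.mean(data)
    if xbar <= 0:
        raise ValueError("CV is only meaningful for data with a positive mean")
    return 100 * statistics.stdev(data) / xbar

def mad(data):
    """Mean absolute deviation: average of |x_i - xbar|."""
    xbar = statistics.mean(data)
    return sum(abs(x - xbar) for x in data) / len(data)

kg = [4, 8, 6, 5, 3, 10]        # hypothetical weights in kilograms
grams = [1000 * x for x in kg]  # the same weights in grams

print(round(cv(kg), 2), round(cv(grams), 2))  # 43.46 43.46
print(mad(kg), mad(grams))                    # 2.0 2000.0
```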

LO4-1 4.3 Measures of Variability
Central Tendency vs. Dispersion: Manufacturing
[Figure: central tendency vs. dispersion in manufacturing]
• Take frequent samples to monitor quality.

4.4 Standardized Data
Chebyshev's Theorem
• For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 – 1/k²] (for k > 1).
• For k = 2 standard deviations, 100[1 – 1/2²] = 75%.
• So, at least 75.0% will lie within μ ± 2σ.
• For k = 3 standard deviations, 100[1 – 1/3²] = 88.9%.
• So, at least 88.9% will lie within μ ± 3σ.
• Although applicable to any data set, these limits tend to be rather wide.
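Chebyshev's bound can be tabulated in Python, and a quick check on a hypothetical, strongly right-skewed data set shows the bound holding far from normality. The helper name is illustrative, and the check uses the population standard deviation (`statistics.pstdev`) to match the theorem's σ.

```python
import statistics

def chebyshev_min_pct(k):
    """Minimum percent of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 100 * (1 - 1 / k ** 2)

for k in (2, 3):
    print(k, round(chebyshev_min_pct(k), 1))  # prints "2 75.0", then "3 88.9"

data = [1, 2, 2, 3, 3, 3, 4, 20]              # hypothetical, right-skewed
mu, sigma = statistics.mean(data), statistics.pstdev(data)
share = 100 * sum(abs(x - mu) <= 2 * sigma for x in data) / len(data)
print(share, share >= chebyshev_min_pct(2))   # 87.5 True
```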
4.4 Standardized Data
The Empirical Rule
• The normal distribution is symmetric and is also known as the bell-shaped curve.
• The Empirical Rule states that, for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of the data. For
  k = 1, 68.26% will lie within μ ± 1σ
  k = 2, 95.44% will lie within μ ± 2σ
  k = 3, 99.73% will lie within μ ± 3σ

4.4 Standardized Data
The Empirical Rule
[Figure: the Empirical Rule on the normal curve]
• Note: No upper bound is given.
• Data values outside μ ± 3σ are rare.
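For a normal distribution, the share within μ ± kσ equals erf(k/√2), so the Empirical Rule percentages can be computed with Python's `math.erf`. The results agree with the figures above to within about 0.01 percentage point; the small differences in the second decimal presumably come from rounding in printed normal tables.

```python
import math

def normal_pct_within(k):
    """Percent of a normal distribution lying within mu ± k*sigma."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_pct_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```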

LO4-5 4.4 Standardized Data
LO4-5: Transform a data set into standardized values.
• A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.
• Standardization formula for a population: $z_i = \frac{x_i - \mu}{\sigma}$
• Standardization formula for a sample (for n > 30): $z_i = \frac{x_i - \bar{x}}{s}$
• A negative z value means the observation is to the left of (below) the mean.
• A positive z value means the observation is to the right of (above) the mean.

LO4-6 4.4 Standardized Data
LO4-6: Apply the Empirical Rule and recognize outliers.
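Standardizing a sample in Python takes one pass with the sample mean and sample standard deviation. The data are hypothetical and `z_scores` is an illustrative helper. Because each value is re-expressed in standard deviations from the mean, the z-values have mean 0 and standard deviation 1, and their signs show which side of the mean each observation falls on.

```python
import statistics

def z_scores(data):
    """Sample standardization: z_i = (x_i - xbar) / s."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - xbar) / s for x in data]

data = [4, 8, 6, 5, 3, 10]       # hypothetical sample, xbar = 6
z = z_scores(data)
print([round(v, 2) for v in z])  # [-0.77, 0.77, 0.0, -0.38, -1.15, 1.53]
```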

4.4 Standardized Data
Estimating Sigma
• For a normal distribution, the range of values is almost 6σ (from μ – 3σ to μ + 3σ).
• If you know the range R (high – low), you can estimate the standard deviation as σ ≈ R/6.
• Useful for approximating the standard deviation when only R is known.
• This estimate depends on the assumption of normality.

LO4-7 4.5 Percentiles, Quartiles, and Box Plots
LO4-7: Calculate quartiles and other percentiles.
Percentiles
• Percentiles are the points that divide the sorted data into 100 groups of approximately equal size.
• For example, if you score in the 83rd percentile on a standardized test, about 83% of the test-takers scored below you.
• Deciles divide the sorted data into 10 groups.
• Quintiles divide the sorted data into 5 groups.
• Quartiles divide the sorted data into 4 groups.
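A sketch of the R/6 shortcut and of a percentile rank in Python. Both helper names are hypothetical, the scores are made up, and percentile rank is taken here as the percent of scores strictly below yours, matching the test-score example. `statistics.quantiles(data, n=10)` is the standard-library way to get the nine decile cut points.

```python
import statistics

def sigma_from_range(low, high):
    """Rough estimate sigma ~ R / 6; assumes approximately normal data."""
    return (high - low) / 6

def percentile_rank(scores, your_score):
    """Percent of scores strictly below your_score."""
    below = sum(1 for x in scores if x < your_score)
    return 100 * below / len(scores)

scores = [55, 60, 62, 68, 70, 74, 78, 81, 88, 95]  # hypothetical test scores
print(sigma_from_range(40, 100))                   # 10.0
print(percentile_rank(scores, 81))                 # 70.0
print(len(statistics.quantiles(scores, n=10)))     # 9 decile cut points
```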
LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Percentiles
• Percentiles may be used to establish benchmarks for comparison purposes (e.g., the health care, manufacturing, and banking industries use the 5th, 25th, 50th, 75th, and 90th percentiles).
• Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios.
• Percentiles can be used in employee merit evaluation and salary benchmarking.

LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Quartiles
• Quartiles are scale points that divide the sorted data into four groups of approximately equal size.

            Q1             Q2            Q3
←Lower 25%→ | ←Second 25%→ | ←Third 25%→ | ←Upper 25%→

• The three values that separate the four groups are called Q1, Q2, and Q3, respectively.

LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Quartiles
• The second quartile Q2 is the median, a measure of central tendency.

              Q2
← Lower 50% → | ← Upper 50% →

• Q1 and Q3 measure dispersion, since the interquartile range Q3 – Q1 measures the degree of spread in the middle 50 percent of data values.

            Q1               Q3
←Lower 25%→ | ← Middle 50% → | ←Upper 25%→

LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Quartiles – The Method of Medians
• The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2.

            Q1             Q2            Q3
←Lower 25%→ | ←Second 25%→ | ←Third 25%→ | ←Upper 25%→

• Within the lower half of the data, 50% of the values lie below Q1 and 50% lie above it.
• Within the upper half of the data, 50% of the values lie below Q3 and 50% lie above it.

LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Method of Medians
• For small data sets, find quartiles using the method of medians:
Step 1: Sort the observations.
Step 2: Find the median Q2.
Step 3: Find the median of the data values that lie below Q2.
Step 4: Find the median of the data values that lie above Q2.

LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Method of Medians
[Figure: method of medians example]
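The four steps of the method of medians translate directly into Python. This is a sketch assuming n ≥ 2; when n is odd, the median observation itself is excluded from both halves, which is one common reading of "values below/above Q2". Other quartile conventions (for example, Excel's QUARTILE.INC and QUARTILE.EXC, which interpolate) can give slightly different answers. The helper names and data are hypothetical.

```python
def median_sorted(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def quartiles_method_of_medians(data):
    xs = sorted(data)                       # Step 1: sort
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    q2 = median_sorted(xs)                  # Step 2: median
    q1 = median_sorted(xs[: n // 2])        # Step 3: median of the lower half
    q3 = median_sorted(xs[(n + 1) // 2 :])  # Step 4: median of the upper half
    return q1, q2, q3

print(quartiles_method_of_medians([41, 7, 39, 15, 36, 40]))  # (15, 37.5, 40)
print(quartiles_method_of_medians([1, 2, 3, 4, 5, 6, 7]))    # (2, 4, 6)
```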
LO4-7 4.5 Percentiles, Quartiles, and Box Plots
Example: P/E Ratios and Quartiles
• So, to summarize: Q1 = 27, Q2 = 35.5, and Q3 = 40.5 divide the sorted P/E ratios into the lower 25%, second 25%, third 25%, and upper 25% of P/E ratios.
• These quartiles express central tendency and dispersion. What is the interquartile range?

LO4-8 4.5 Percentiles, Quartiles, and Box Plots
LO4-8: Make and interpret box plots.
• A useful tool of exploratory data analysis (EDA).
• Also called a box-and-whisker plot.
• Based on a five-number summary: Xmin, Q1, Q2, Q3, Xmax
• Consider the five-number summary for the previous P/E ratios example:

Xmin   Q1   Q2     Q3     Xmax
7      27   35.5   40.5   49
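Using the P/E five-number summary, the interquartile range asked about in the example is a one-line calculation; the variable names are illustrative.

```python
# P/E ratio five-number summary from the slide
xmin, q1, q2, q3, xmax = 7, 27, 35.5, 40.5, 49

iqr = q3 - q1             # spread of the middle 50 percent of the data
data_range = xmax - xmin  # spread of all the data
print(iqr, data_range)    # 13.5 42
```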

LO4-8 4.5 Percentiles, Quartiles, and Box Plots
Box Plots
• The box plot displays the five-number summary visually.
[Figure: box plot]

LO4-8 4.5 Percentiles, Quartiles, and Box Plots
Box Plots
• A box plot shows variability and shape.
[Figure: box plots illustrating variability and shape]

LO4-8 4.5 Percentiles, Quartiles, and Box Plots
Box Plots: Fences and Unusual Data Values
• Use quartiles to detect unusual data points.
• The boundaries used for this purpose are called fences and can be found using the following formulas:

               Inner fences          Outer fences
Lower fence    Q1 – 1.5 (Q3 – Q1)    Q1 – 3.0 (Q3 – Q1)
Upper fence    Q3 + 1.5 (Q3 – Q1)    Q3 + 3.0 (Q3 – Q1)

• Values outside the inner fences are unusual, while those outside the outer fences are outliers.

LO4-8 4.5 Percentiles, Quartiles, and Box Plots
Box Plots: Fences and Unusual Data Values
• For example, consider a data set with Q1 = 107 and Q3 = 126:

               Inner fences                     Outer fences
Lower fence    107 – 1.5 (126 – 107) = 78.5     107 – 3.0 (126 – 107) = 50
Upper fence    126 + 1.5 (126 – 107) = 154.5    126 + 3.0 (126 – 107) = 183

• There is one unusual value (170), which lies above the upper inner fence. There are no outliers beyond the outer fences.
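The fence formulas and the resulting classification can be written as a small Python sketch; `fences` and `classify` are hypothetical helper names. With Q1 = 107 and Q3 = 126 it reproduces the fences in the example and labels 170 as unusual (beyond an inner fence but inside the outer fences).

```python
def fences(q1, q3):
    """Inner fences at 1.5 IQR and outer fences at 3.0 IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(x, q1, q3):
    """'outlier' beyond the outer fences, 'unusual' beyond the inner fences, else 'typical'."""
    f = fences(q1, q3)
    outer_low, outer_high = f["outer"]
    inner_low, inner_high = f["inner"]
    if x < outer_low or x > outer_high:
        return "outlier"
    if x < inner_low or x > inner_high:
        return "unusual"
    return "typical"

print(fences(107, 126))          # {'inner': (78.5, 154.5), 'outer': (50.0, 183.0)}
print(classify(170, 107, 126))   # unusual
```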
LO4-8 4.5 Percentiles, Quartiles, and Box Plots
Box Plots: Fences and Unusual Data Values
• Truncate the whiskers at the fences and display unusual values and outliers as dots.
[Figure: box plot with the unusual value plotted as a dot]
• Based on these fences, there is only one unusual data value.

LO4-8 4.5 Percentiles, Quartiles, and Box Plots
Box Plots: Midhinge
• The midhinge is the average of the first and third quartiles: $\text{Midhinge} = \frac{Q_1 + Q_3}{2}$
• The name midhinge derives from the idea that, if the box were folded in half, it would resemble a hinge.
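A Python sketch of the midhinge, plus one common way to truncate whiskers: each whisker ends at the most extreme data value still inside the inner fences. That whisker convention, the helper names, and the nine data values are assumptions for illustration; the data were chosen so that the method of medians gives Q1 = 107 and Q3 = 126.

```python
def midhinge(q1, q3):
    """Average of the first and third quartiles."""
    return (q1 + q3) / 2

def whisker_ends(data, q1, q3):
    """Most extreme data values lying within the inner fences."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return min(inside), max(inside)

print(midhinge(27, 40.5))  # 33.75 (P/E ratio quartiles)

data = [85, 105, 109, 110, 113, 120, 124, 128, 170]  # hypothetical
print(whisker_ends(data, 107, 126))  # (85, 128); 170 is drawn as a dot
```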

LO4-9 4.6 Correlation and Covariance
LO4-9: Calculate and interpret a correlation coefficient and covariance.
Correlation Coefficient
• The sample correlation coefficient is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y:
  $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
• Note: −1 ≤ r ≤ +1.

LO4-9 4.6 Correlation and Covariance
Correlation Coefficient
[Figure: illustration of correlation coefficients]
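The correlation coefficient can be coded directly from its definition. The helper name and the paired data are hypothetical; in Python 3.10+ `statistics.correlation` returns the same Pearson coefficient.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length, n >= 2")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))              # 0.7746
print(correlation(x, [10 - 2 * v for v in x]))  # -1.0 (perfect negative line)
```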

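LO4-9 4.6 Correlation and Covariance
Covariance
• The covariance of two random variables X and Y (denoted σXY) measures the degree to which the values of X and Y change together.
• Population covariance: $\sigma_{XY} = \frac{\sum_{i=1}^{N}(x_i - \mu_X)(y_i - \mu_Y)}{N}$
• Sample covariance: $s_{XY} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

LO4-9 4.6 Correlation and Covariance
Covariance
• A correlation coefficient is the covariance divided by the product of the standard deviations of X and Y: $r = \frac{s_{XY}}{s_X s_Y}$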
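The relationship between covariance and correlation can be checked numerically. The helper names and data are hypothetical; the sample covariance uses the n - 1 divisor, matching the sample standard deviations, so the divisors cancel when r is formed.

```python
import math

def sample_covariance(x, y):
    """s_XY = sum of cross-products of deviations, divided by n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def sample_sd(v):
    """Sample standard deviation; cov(V, V) is the sample variance."""
    return math.sqrt(sample_covariance(v, v))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_xy = sample_covariance(x, y)
r = s_xy / (sample_sd(x) * sample_sd(y))
print(s_xy, round(r, 4))  # 1.5 0.7746
```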
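LO4-10 4.7 Grouped Data
LO4-10: Calculate the mean and standard deviation from grouped data.
Weighted Mean
• $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

LO4-10 4.7 Grouped Data
Group Mean and Standard Deviation
• For k classes with midpoints $m_j$ and frequencies $f_j$, where $n = \sum_{j=1}^{k} f_j$:
  $\bar{x} = \frac{\sum_{j=1}^{k} f_j m_j}{n}$ and $s = \sqrt{\frac{\sum_{j=1}^{k} f_j (m_j - \bar{x})^2}{n - 1}}$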
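A Python sketch of the weighted mean and of the grouped-data mean and standard deviation, assuming each class is represented by its midpoint with its frequency. Because every value in a class is treated as if it sat at the midpoint, the grouped results only approximate the statistics of the raw data. The helper names, class midpoints, frequencies, and weights are hypothetical.

```python
import math

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def grouped_mean_sd(midpoints, freqs):
    """Mean and sample standard deviation from class midpoints and frequencies."""
    n = sum(freqs)
    xbar = sum(f * m for m, f in zip(midpoints, freqs)) / n
    ss = sum(f * (m - xbar) ** 2 for m, f in zip(midpoints, freqs))
    return xbar, math.sqrt(ss / (n - 1))

print(weighted_mean([90, 80, 70], [5, 3, 2]))      # 83.0
xbar, s = grouped_mean_sd([5, 15, 25], [2, 5, 3])  # classes 0-10, 10-20, 20-30
print(xbar, round(s, 4))                           # 16.0 7.3786
```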

LO4-11 4.8 Skewness and Kurtosis
LO4-11: Assess skewness and kurtosis in a sample.
Skewness

LO4-11 4.8 Skewness and Kurtosis
LO4-11: Assess skewness and kurtosis in a sample.
Kurtosis
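The sketch below uses the sample skewness and excess-kurtosis formulas that Excel's =SKEW() and =KURT() compute; treating these as the intended definitions is an assumption. Positive skewness indicates a longer right tail, and positive excess kurtosis indicates heavier tails than a normal distribution. Helper names and data are hypothetical.

```python
import math

def _z(data):
    """Standardize with the sample mean and sample standard deviation."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return [(x - xbar) / s for x in data]

def skewness(data):
    """Sample skewness, as computed by Excel's SKEW (requires n >= 3)."""
    n = len(data)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in _z(data))

def kurtosis(data):
    """Sample excess kurtosis, as computed by Excel's KURT (requires n >= 4)."""
    n = len(data)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * sum(z ** 4 for z in _z(data)) - b

print(round(kurtosis([1, 2, 3, 4, 5]), 6))     # -1.2 (flatter than normal)
print(abs(skewness([1, 2, 3, 4, 5])) < 1e-12)  # True: symmetric data
print(skewness([1, 1, 1, 2, 10]) > 0)          # True: long right tail
```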
