Chapter 4
Descriptive Statistics
A PowerPoint Presentation Package to Accompany
Applied Statistics in Business & Economics, 4th edition
David P. Doane and Lori E. Seward
Prepared by Lloyd R. Jaisingh
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter Contents
4.1 Numerical Description
4.2 Measures of Center
4.3 Measures of Variability
4.4 Standardized Data
4.5 Percentiles, Quartiles, and Box Plots
4.6 Correlation and Covariance
4.7 Grouped Data
4.8 Skewness and Kurtosis
LO4-1: Explain the concepts of center, variability, and shape.
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
LO4-3: Calculate and interpret common measures of center.
LO4-4: Calculate and interpret common measures of variability.
LO4-7: Calculate quartiles and other percentiles.
LO4-8: Make and interpret box plots.
LO4-9: Calculate and interpret a correlation coefficient and covariance.
9/28/12
4.2 Measures of Center
LO4-3: Calculate and interpret common measures of center.
Mean
• A familiar measure of center.
Population Mean: μ = (x1 + x2 + … + xN) / N
Sample Mean: x̄ = (x1 + x2 + … + xn) / n
Median
• The median (M) is the 50th percentile or midpoint of the sorted sample data.
• M separates the upper and lower halves of the sorted observations.
• If n is odd, the median is the middle observation in the data array.
• If n is even, the median is the average of the middle two observations in the data array.
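The odd and even cases of the median rule can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's procedure: the function names and the two small samples are hypothetical.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # n odd: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the middle two


odd_sample = [12, 7, 3, 9, 15]        # hypothetical data, n = 5
even_sample = [12, 7, 3, 9, 15, 40]   # hypothetical data, n = 6

print(mean(odd_sample), median(odd_sample))
print(mean(even_sample), median(even_sample))
```

Adding the single large value 40 pulls the mean up much more than the median, which is why the median is often preferred when extreme values are present.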
Mode
LO4-1: Explain the concepts of center, variability, and shape.
Midrange
• The midrange is the point halfway between the lowest and highest values of X.
• Easy to use but sensitive to extreme data values.
• For the J.D. Power quality data, the midrange (126.5) is higher than the mean (114.70) or median (113).
Trimmed Mean
• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
• For example, for the n = 33 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).
• To determine how many observations to trim, multiply k by n, which is 0.05 x 33 = 1.65 or 2 observations.
• So, we would remove the two smallest and two largest observations before averaging the remaining values.
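A rough Python sketch of the midrange and the trimming rule follows. It assumes that k times n is rounded to the nearest whole number of observations, as in the 1.65 to 2 example; the function names and the 33-value sample are hypothetical and are not the J.D. Power or P/E data.

```python
def midrange(values):
    """Point halfway between the lowest and highest values."""
    return (min(values) + max(values)) / 2


def trim_count(n, k):
    """Observations to drop from EACH end: k * n rounded to the nearest integer."""
    return int(k * n + 0.5)


def trimmed_mean(values, k):
    """Mean of what remains after dropping the lowest and highest k fraction."""
    ordered = sorted(values)
    n = len(ordered)
    t = trim_count(n, k)
    if 2 * t >= n:
        raise ValueError("k is too large: no observations would remain")
    kept = ordered[t:n - t]
    return sum(kept) / len(kept)


# Hypothetical sample of n = 33 values with one extreme observation.
sample = list(range(1, 33)) + [500]

print(trim_count(len(sample), 0.05))   # 0.05 * 33 = 1.65, rounded to 2
print(midrange(sample))                # dominated by the extreme value
print(trimmed_mean(sample, 0.05))      # extreme value has been trimmed away
```

Note that Excel's TRIMMEAN takes the total fraction removed from both ends combined, which is why a 5 percent trim per tail is entered as 0.1.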
• Here is a summary of all the measures of central tendency for the J.D. Power data.
Mean: 114.70 =AVERAGE(Data)
Median: 113 =MEDIAN(Data)
Mode: 111 =MODE.SNGL(Data)
Geometric Mean: 113.35 =GEOMEAN(Data)
Midrange: 126.5 =(MIN(Data)+MAX(Data))/2
5% Trim Mean: 113.94 =TRIMMEAN(Data, 0.1)
• The trimmed mean mitigates the effects of very high values, but still exceeds the median.
4.3 Measures of Variability
LO4-4: Calculate and interpret common measures of variability.
• Variation is the spread of data points about the center of the distribution in a sample. Consider the following measures of variability:
Measures of Variability
Statistic | Formula | Excel | Pro | Con
Range | xmax – xmin | =MAX(Data) - MIN(Data) | Easy to calculate | Sensitive to extreme data values.
Sample Variance (s²) | Σ(xi – x̄)² / (n – 1) | =VAR.S(Data) | Plays a key role in mathematical statistics. | Nonintuitive meaning.
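As a rough illustration of the range and the sample variance, the sketch below computes both in plain Python. It assumes the usual n – 1 divisor for the sample variance, which is what Excel's VAR.S uses; the function names and the data are hypothetical.

```python
import math


def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def sample_variance(values):
    """Sample variance s^2: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Sample standard deviation s: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

print(data_range(sample))
print(sample_variance(sample))
print(sample_std(sample))
```

The standard deviation is in the same units as the data, which makes it easier to interpret than the variance.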
Coefficient of Variation
• Useful for comparing variables measured in different units or with different means.
• A unit-free measure of dispersion.
• Expressed as a percent of the mean.
• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
Mean Absolute Deviation
• This statistic reveals the average distance from the center.
• Absolute values must be used since otherwise the deviations around the mean would sum to zero. It is stated in the unit of measurement.
• The MAD is appealing because of its simple interpretation.
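Both statistics can be sketched briefly in Python. The sketch assumes the common definitions CV = 100 × s / x̄ (with s the sample standard deviation) and MAD = Σ|xi – x̄| / n; the function names and data are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percent of the mean: 100 * s / xbar (requires a positive mean)."""
    xbar = statistics.mean(values)
    if xbar <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * statistics.stdev(values) / xbar


def mean_absolute_deviation(values):
    """MAD: average absolute distance of the observations from the mean."""
    xbar = statistics.mean(values)
    return sum(abs(x - xbar) for x in values) / len(values)


sample = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data, mean = 5
rescaled = [100 * x for x in sample]     # same data in a unit 100 times smaller

print(sum(x - statistics.mean(sample) for x in sample))   # raw deviations cancel out
print(mean_absolute_deviation(sample))
print(coefficient_of_variation(sample))
print(coefficient_of_variation(rescaled))                 # unchanged by the unit change
```

The rescaled data give the same CV, which shows why it is called unit-free, while the MAD would scale with the unit of measurement.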
Central Tendency vs. Dispersion: Manufacturing
4.4 Standardized Data
Chebyshev's Theorem
• For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 – 1/k²].
Note: No upper bound is given.
• The Empirical Rule states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of data. For
k = 1, 68.26% will lie within μ ± 1σ
k = 2, 95.44% will lie within μ ± 2σ
k = 3, 99.73% will lie within μ ± 3σ
Data values outside μ ± 3σ are rare.
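To compare the two results side by side, the sketch below evaluates Chebyshev's lower bound, at least 100(1 – 1/k²) percent for any distribution, against the normal-distribution percentages behind the Empirical Rule. The function names are illustrative; the normal percentages are computed from the error function in Python's standard library rather than read from a table.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum percent of data within k standard deviations, for any distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 100 * max(0.0, 1 - 1 / k ** 2)


def normal_percent_within(k):
    """Percent of a normal distribution lying within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 2), round(normal_percent_within(k), 2))
```

The computed normal percentages agree with the values quoted from tables to within rounding in the last digit. Chebyshev's bound is much lower because it has to hold for every distribution, not only the normal.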
LO4-5: Transform a data set into standardized values.
• A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.
Standardization formula for a population: z = (x – μ) / σ
• A negative z value means the observation is to the left of the mean.
LO4-6: Apply the Empirical Rule and recognize outliers.
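A minimal sketch of standardization in Python, treating the data as a complete population so that μ and σ are the population mean and standard deviation. The function names, the data, and the cutoff of 3 standard deviations used for flagging are assumptions made for illustration.

```python
import statistics


def standardize(values):
    """Return z = (x - mu) / sigma for each x, treating values as a population."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical, so z-scores are undefined")
    return [(x - mu) / sigma for x in values]


def flag_beyond(values, cutoff=3.0):
    """Observations whose |z| exceeds the cutoff (3 by default, an assumed threshold)."""
    return [x for x, z in zip(values, standardize(values)) if abs(z) > cutoff]


population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mu = 5, sigma = 2

print(standardize(population))
print(flag_beyond(population))
```

Values below the mean get negative z-scores and values above it get positive ones, so the sign alone tells you which side of the mean an observation falls on.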
• If you know the range R (high – low), you can estimate the standard deviation as s = R/6.
4.5 Percentiles, Quartiles, and Box Plots
• For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.
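The percentile interpretation can be checked with a short Python sketch that counts the share of scores strictly below a given score. The function name and the list of 100 scores are hypothetical.

```python
def percent_below(values, score):
    """Percent of observations that are strictly less than the given score."""
    if not values:
        raise ValueError("at least one observation is required")
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)


scores = list(range(1, 101))   # hypothetical test scores 1, 2, ..., 100

print(percent_below(scores, 84))   # 83 of the 100 scores lie below 84
```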
Percentiles
• Percentiles may be used to establish benchmarks for comparison purposes (e.g. health care, manufacturing, and banking industries use 5th, 25th, 50th, 75th and 90th percentiles).
• Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios.
• Percentiles can be used in employee merit evaluation and salary benchmarking.
Quartiles
• Quartiles are scale points that divide the sorted data into four groups of approximately equal size.
Q1 Q2 Q3
← Lower 25% → | ← Second 25% → | ← Third 25% → | ← Upper 25% →
• The three values that separate the four groups are called Q1, Q2, and Q3, respectively.
Quartiles
• The second quartile Q2 is the median, a measure of central tendency.
Q2
← Lower 50% → | ← Upper 50% →
Q1 Q3
← Lower 25% → | ← Middle 50% → | ← Upper 25% →
Quartiles – The method of medians
• The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2.
Q1 Q2 Q3
For first half of data, 50% above, 50% below Q1. For second half of data, 50% above, 50% below Q3.
• For small data sets, find quartiles using method of medians:
Step 1: Sort the observations.
Step 2: Find the median Q2.
Step 3: Find the median of the data values that lie below Q2.
Step 4: Find the median of the data values that lie above Q2.
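The method of medians can be sketched in Python as follows. One assumption is made explicit: the data are split by position, so when n is odd the middle observation itself is left out of both halves. The function name and the two samples are hypothetical.

```python
import statistics


def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    ordered = sorted(values)              # sort the observations
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two observations are required")
    q2 = statistics.median(ordered)       # the median of all the data
    half = n // 2
    lower = ordered[:half]                # values below Q2 (by position)
    # Assumption: when n is odd, the middle observation is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return statistics.median(lower), q2, statistics.median(upper)


odd_sample = [9, 1, 5, 3, 7, 2, 8, 4, 6]   # hypothetical data, n = 9
even_sample = [8, 1, 5, 3, 7, 2, 4, 6]     # hypothetical data, n = 8

print(quartiles_by_medians(odd_sample))
print(quartiles_by_medians(even_sample))
```

Other software, including Excel's quartile functions, interpolates between observations instead, so results for small samples can differ slightly from this method.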
Example: P/E Ratios and Quartiles
• So, to summarize:
xmin = 7, Q1 = 27, Q2 = 35.5, Q3 = 40.5, xmax = 49
LO4-8: Make and interpret box plots.
• A useful tool of exploratory data analysis (EDA).
Box Plots: Fences and Unusual Data Values
• Use quartiles to detect unusual data points.
• These points are called fences and can be found using the following formulas:
Fence | Inner fences | Outer fences
Lower fence | Q1 – 1.5 (Q3 – Q1) | Q1 – 3.0 (Q3 – Q1)
Upper fence | Q3 + 1.5 (Q3 – Q1) | Q3 + 3.0 (Q3 – Q1)
• Values outside the inner fences are unusual while those outside the outer fences are outliers.
• For example, consider the P/E ratio data:
Fence | Inner fences | Outer fences
Lower fence | 107 – 1.5 (126 – 107) = 78.5 | 107 – 3.0 (126 – 107) = 50
Upper fence | 126 + 1.5 (126 – 107) = 154.5 | 126 + 3.0 (126 – 107) = 183
There is one outlier (170) that lies above the inner fence. There are no extreme outliers that exceed the outer fence.
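The fence formulas translate directly into code. The sketch below reuses the quartiles from the worked example (Q1 = 107, Q3 = 126) and classifies the value 170; the function names and the label "typical" are illustrative choices, not terms from the slides.

```python
def fences(q1, q3):
    """Inner and outer fences as (lower, upper) pairs, built from the quartiles."""
    iqr = q3 - q1                          # interquartile range Q3 - Q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(x, q1, q3):
    """Label a value by where it falls relative to the fences."""
    f = fences(q1, q3)
    inner_low, inner_high = f["inner"]
    outer_low, outer_high = f["outer"]
    if x < outer_low or x > outer_high:
        return "outlier"                   # beyond an outer fence
    if x < inner_low or x > inner_high:
        return "unusual"                   # between an inner and an outer fence
    return "typical"                       # hypothetical label for everything else


print(fences(107, 126))
print(classify(170, 107, 126))
```

The value 170 lies above the upper inner fence of 154.5 but below the upper outer fence of 183, so it is flagged at the first level only.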
• Based on these fences, there is only one outlier.
• The midhinge is the average of the first and third quartiles, (Q1 + Q3)/2.
• The name midhinge derives from the idea that, if the box were folded in half, it would resemble a "hinge".
4.6 Correlation and Covariance
LO4-9: Calculate and interpret a correlation coefficient and covariance.
Correlation Coefficient
• The sample correlation coefficient is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y.
Note: -1 ≤ r ≤ +1.
• Illustration of Correlation Coefficients
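To illustrate what "degree of linearity" means, the sketch below computes r for three small hypothetical data sets: a perfect increasing line, a perfect decreasing line, and a symmetric curve. It uses the standard formula based on sums of deviations; the function name and data are assumptions made for the example.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # exact increasing line
falling = [-3 * x for x in xs]      # exact decreasing line
curved = [x ** 2 for x in xs]       # strong relationship, but not linear

print(correlation(xs, rising))
print(correlation(xs, falling))
print(correlation(xs, curved))
```

The curved example has r equal to zero even though Y is completely determined by X, which is a reminder that r measures linear association only.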
Covariance
The covariance of two random variables X and Y (denoted σXY) measures the degree to which the values of X and Y change together.
A correlation coefficient is the covariance divided by the product of the standard deviations of X and Y.
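The relationship between the two statistics can be sketched as follows. The code uses the sample versions, with an n – 1 divisor for both the covariance and the standard deviations, rather than the population quantity σXY; the function names and data are hypothetical.

```python
import statistics


def sample_covariance(xs, ys):
    """Sample covariance: cross-products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def correlation_from_covariance(xs, ys):
    """r = covariance / (s_x * s_y), using sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


xs = [1, 2, 3, 4, 5]                 # hypothetical paired data
ys = [2, 4, 5, 4, 5]
xs_rescaled = [100 * x for x in xs]  # same X values in a unit 100 times smaller

print(sample_covariance(xs, ys))
print(sample_covariance(xs_rescaled, ys))           # covariance changes with the unit
print(correlation_from_covariance(xs, ys))
print(correlation_from_covariance(xs_rescaled, ys)) # correlation does not
```

Because the covariance depends on the units of X and Y, its magnitude is hard to interpret on its own. Dividing by the standard deviations removes the units and yields a value between -1 and +1.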
4.8 Skewness and Kurtosis
LO4-11: Assess skewness and kurtosis in a sample.
Skewness
Kurtosis