Chapters 3&4 (16th Edition)
Describing Data:
Numerical Measures
Chapter 3
Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Learning Objectives
LO3-1 Compute and interpret the mean, the median,
and the mode.
LO3-2 Compute a weighted mean.
LO3-3 Compute and interpret the geometric mean.
LO3-4 Compute and interpret the range, variance, and
standard deviation.
LO3-5 Explain and apply Chebyshev’s theorem and the
Empirical Rule.
LO3-6 Compute the mean and standard deviation of
grouped data.
07/11/2014
Measures of Location
The purpose of a measure of location is to pinpoint the
center of a distribution of data.
There are many measures of location. We will consider
three:
1. The arithmetic mean
2. The median
3. The mode
LO3-1
Population Mean
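For ungrouped data, the population mean is the sum of
all the population values divided by the total number of
population values:
\[ \mu = \frac{\sum x}{N} \]
where \mu is the population mean, N is the number of values in the
population, and x is any particular value.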
LO3-1
Sample Mean
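For ungrouped data, the sample mean is the sum of all
the sample values divided by the number of sample
values:
\[ \bar{x} = \frac{\sum x}{n} \]
where \bar{x} is the sample mean, n is the number of values in the
sample, and x is any particular value.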
LO3-1
The Median
MEDIAN The midpoint of the values after they have been
ordered from the minimum to the maximum values.
LO3-1
Examples - Median
The ages for a sample of five college students are:
21, 25, 19, 20, 22
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.
The heights of four basketball players, in inches, are:
76, 73, 80, 75
Arranging the data in ascending order gives:
73, 75, 76, 80.
Thus the median is 75.5.
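The two median examples can be checked with a short Python sketch. It relies on the standard-library function statistics.median, which sorts the data and, for an even number of values, averages the two middle values. The variable names are illustrative only.

```python
from statistics import median

ages = [21, 25, 19, 20, 22]   # sample of five college students
heights = [76, 73, 80, 75]    # four basketball players, in inches

# median() sorts internally; with an even count it averages
# the two middle values (here 75 and 76).
median_age = median(ages)
median_height = median(heights)

print(sorted(ages), median_age)
print(sorted(heights), median_height)
```

With an odd number of values the median is one of the observations. With an even number it may fall between two of them, as it does for the heights.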
LO3-1
The Mode
MODE The value of the observation that appears
most frequently.
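As a minimal sketch of the definition, the mode can be found by counting how often each value appears. The observations below are hypothetical, invented for illustration. The standard-library function statistics.multimode is used because it returns every value tied for the highest frequency.

```python
from statistics import multimode

# Hypothetical observations, invented for illustration only.
observations = [2, 5, 2, 7, 5, 2, 9]

# multimode() returns a list of all values that share the highest
# frequency, so it also handles data sets with more than one mode.
modes = multimode(observations)
print(modes)
```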
LO3-1
Example - Mode
Using the data measuring the distance in miles between exits on I-75
through Kentucky, what is the modal distance?
Weighted Mean
The weighted mean of a set of numbers X1, X2, ..., Xn,
with corresponding weights w1, w2, ..., wn, is computed
with the following formula:
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} \]
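A minimal Python sketch of the weighted mean follows: each value is multiplied by its weight, the products are summed, and the total is divided by the sum of the weights. The function name weighted_mean and the price and unit figures are hypothetical, chosen only to illustrate the calculation.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical data: three prices and the number of units sold at each.
prices = [10.0, 20.0, 30.0]
units = [1, 2, 1]

result = weighted_mean(prices, units)   # (10 + 40 + 30) / 4
print(result)
```

When all weights are equal, the weighted mean reduces to the ordinary arithmetic mean.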
LO3-3
EXAMPLE:
The return on investment earned by Atkins Construction
Company for four successive years was: 30 percent, 20
percent, -40 percent, and 200 percent. What is the
geometric mean rate of return on investment?
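One way to work this example is sketched below, assuming the usual definition of the geometric mean rate of return: convert each percentage return to a growth factor (1 + rate), take the nth root of the product of the factors, and subtract 1. The variable names are illustrative.

```python
import math

returns_pct = [30, 20, -40, 200]   # four successive annual returns

# Convert each percent return to a growth factor: 1.3, 1.2, 0.6, 3.0.
factors = [1 + r / 100 for r in returns_pct]

# Geometric mean: nth root of the product of the growth factors.
gm = math.prod(factors) ** (1 / len(factors))
gm_rate_pct = (gm - 1) * 100

print(round(gm_rate_pct, 1))
```

Under these assumptions the geometric mean rate of return is about 29.4 percent. For comparison, the arithmetic mean of the four rates is 52.5 percent, which overstates the growth actually achieved.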
LO3-3
EXAMPLE:
During the decade of the 1990s, and into the 2000s, Las Vegas, Nevada,
was the fastest-growing city in the United States. The population
increased from 258,295 in 1990 to 584,539 in 2011. This is an increase of
326,244 people, or a 126.3 percent increase over the period. What is the
average annual increase?
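A sketch of the calculation, assuming the usual formula for an average rate of increase over time: the nth root of (value at end of period / value at start of period), minus 1, where n is the number of annual periods (2011 - 1990 = 21). The variable names are illustrative.

```python
start_population = 258_295   # 1990
end_population = 584_539     # 2011
periods = 2011 - 1990        # 21 annual periods

# Total growth over the whole period, as stated in the example.
total_increase = end_population - start_population
total_increase_pct = total_increase / start_population * 100

# Average annual rate: nth root of the end/start ratio, minus 1.
annual_rate = (end_population / start_population) ** (1 / periods) - 1

print(total_increase, round(total_increase_pct, 1))
print(round(annual_rate * 100, 2))
```

Under these assumptions the average annual increase is roughly 3.97 percent per year, far smaller than 126.3 / 21, because the growth compounds.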
Dispersion
A measure of location, such as the mean or the median, describes
only the center of the data. It does not tell us anything
about the spread of the data.
For example, if your nature guide told you that the river ahead
averaged 3 feet in depth, would you want to wade across on foot
without additional information? Probably not. You would want to
know something about the variation in the depth.
A second reason for studying the dispersion in a set of data is to
compare the spread in two or more distributions.
LO3-4
Measures of Dispersion
Range
Variance
Standard Deviation
LO3-4
Example – Range
The numbers of cappuccinos sold at the Starbucks
location in the Orange County Airport between 4 and 7
p.m. for a sample of 5 days last year were 20, 40, 50,
60, and 80. Determine the range for the number of
cappuccinos sold.
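The range is the difference between the maximum and minimum values. A minimal Python sketch for this example (variable names are illustrative):

```python
cappuccinos = [20, 40, 50, 60, 80]   # sales on five sampled days

# Range = maximum value - minimum value.
data_range = max(cappuccinos) - min(cappuccinos)
print(data_range)
```

The range uses only the two extreme values, so it says nothing about how the remaining observations are spread between them.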
LO3-4
\[ \mu = \frac{\sum x}{N} = \frac{19 + 17 + \cdots + 34 + 10}{12} = \frac{348}{12} = 29 \]
LO3-4
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{1{,}488}{12} = 124 \]
LO3-4
Sample Variance
LO3-4
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
where:
s^2 is the sample variance
x is the value of each observation in the sample
\bar{x} is the mean of the sample
n is the number of observations in the sample
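The quantities defined above combine into the sample variance: the sum of squared deviations from the sample mean, divided by n - 1. The sketch below applies that definition to the five cappuccino counts from the range example earlier in this chapter. The function name sample_variance is illustrative.

```python
import math

def sample_variance(sample):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

cappuccinos = [20, 40, 50, 60, 80]   # mean is 50

s2 = sample_variance(cappuccinos)    # (900 + 100 + 0 + 100 + 900) / 4
s = math.sqrt(s2)                    # sample standard deviation

print(s2, round(s, 2))
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance.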
Chebyshev’s Theorem
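Chebyshev's theorem is commonly stated as follows: for any set of observations, the proportion of values lying within k standard deviations of the mean is at least 1 - 1/k^2, where k is any value greater than 1. The sketch below evaluates that bound for a few values of k. The function name chebyshev_lower_bound is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 4))
```

The bound holds for any shape of distribution, which is why it is weaker than the percentages the Empirical Rule gives for symmetric, bell-shaped data.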
Describing Data:
Displaying and
Exploring Data
Chapter 4
Learning Objectives
LO4-1 Construct and interpret a dot plot.
LO4-2 Construct and describe a stem-and-leaf display.
LO4-3 Identify and compute measures of position.
LO4-4 Construct and analyze a box plot.
LO4-5 Compute and interpret the coefficient of skewness.
LO4-6 Create and interpret a scatter diagram.
LO4-7 Develop and explain a contingency table.
Dot Plots
A dot plot groups the data as little as possible and the
identity of an individual observation is not lost.
To develop a dot plot, each observation is simply
displayed as a dot along a horizontal number line
indicating the possible values of the data.
If there are identical observations or the observations are
too close to be shown individually, the dots are “piled” on
top of each other.
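The construction described above can be sketched as a text-only dot plot in Python. The sketch assumes integer-valued data and uses hypothetical observations. The function name dot_plot is illustrative.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one column per integer, dots piled upward."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    width = max(len(str(v)) for v in axis) + 1
    rows = []
    # Draw from the tallest pile down to the number line.
    for level in range(max(counts.values()), 0, -1):
        cells = ("*" if counts[v] >= level else "" for v in axis)
        rows.append("".join(c.rjust(width) for c in cells).rstrip())
    rows.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(rows)

# Hypothetical observations, invented for illustration only.
data = [3, 5, 5, 6, 6, 6, 8]
plot = dot_plot(data)
print(plot)
```

Each observation appears as exactly one dot, so no individual value is lost, and identical values pile up above the same point on the number line.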
Stem-and-Leaf
In Chapter 2, frequency distributions were used to organize
data into a meaningful form.
A major advantage to organizing the data with a frequency
distribution is that we get a visual picture of the shape of a
distribution.
There are two disadvantages, however, of organizing the data
into a frequency distribution:
1. The exact identity of each value is lost.
2. It is difficult to tell how the values within each class are distributed.
LO4-2
Stem-and-Leaf
Stem-and-leaf display: a statistical technique to organize and
present a set of data. Each numerical value is divided into two
parts. The leading digit(s) becomes the stem and the trailing
digit the leaf. The stems are located along the vertical axis and
the leaf values are stacked against each other along the
horizontal axis.
Advantage of the stem-and-leaf display over a frequency
distribution: the identity of each observation is not lost.
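A minimal sketch of a stem-and-leaf display in Python, assuming non-negative integer data where the last digit is the leaf and the leading digit(s) form the stem. The data are hypothetical and the function name stem_and_leaf is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 58 -> stem 5, leaf 8
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical observations, invented for illustration only.
data = [52, 47, 61, 58, 45, 63, 52, 70, 49, 66]
display = stem_and_leaf(data)

# Stems run down the vertical axis; leaves are stacked horizontally.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

Every original value can be read back from the display (stem 5 with leaf 8 is 58), which is the advantage over a frequency distribution noted above.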
Measures of Position
The standard deviation is the most widely used
measure of dispersion.
LO4-3
Percentile Computation
To compute a percentile, let Lp refer to the location of the desired
percentile. To find the 33rd percentile we would use L33; to find
the median, the 50th percentile, we would use L50. For n ordered
observations, the location of the Pth percentile is:
\[ L_p = (n + 1)\frac{P}{100} \]
LO4-3
Percentiles - Example
Listed below are the commissions earned last month by
a sample of 15 brokers at Salomon Smith Barney’s
Oakland, California, office.
LO4-3
Percentiles – Example
Step 1: Organize the data from the minimum to the
maximum value.
LO4-3
Percentiles – Example
Step 2: Compute the first and third
quartiles. Locate L25 and L75
using:
\[ L_{25} = (15 + 1)\frac{25}{100} = 4 \qquad L_{75} = (15 + 1)\frac{75}{100} = 12 \]
Therefore, the first and third quartiles are located at the 4th and 12th
positions, respectively: the first quartile is $1,721 and the third quartile is $2,205.
LO4-3
Percentiles – Example
In the previous example the location formula yielded a whole number. What
if there were 6 observations in the sample, with the following ordered
values: 43, 61, 75, 91, 101, and 104 (that is, n = 6), and we wanted to
locate the first quartile?
\[ L_{25} = (6 + 1)\frac{25}{100} = 1.75 \]
Locate the first value in the ordered array and then move .75 of the distance
between the first and second values and report that as the first quartile. Like
the median, the quartile does not need to be one of the actual values in the
data set.
The 1st and 2nd values are 43 and 61. Moving 0.75 of the distance
between these numbers, the 25th percentile is 56.5, obtained as 43 +
0.75*(61- 43).
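The location rule and the interpolation step can be sketched in Python as follows. The function names percentile_location and percentile are illustrative. The sketch assumes the data are already sorted and that the computed location falls between 1 and n.

```python
def percentile_location(n, p):
    """Location of the p-th percentile among n ordered values: (n + 1) * p / 100."""
    return (n + 1) * p / 100

def percentile(sorted_values, p):
    """Value at the p-th percentile, interpolating between neighbours."""
    n = len(sorted_values)
    location = percentile_location(n, p)
    if location < 1 or location > n:
        raise ValueError("percentile location falls outside the data")
    whole = int(location)             # position of the lower neighbour
    fraction = location - whole       # share of the distance to move
    lower = sorted_values[whole - 1]  # positions are 1-based
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

observations = [43, 61, 75, 91, 101, 104]
first_quartile = percentile(observations, 25)   # location 1.75
print(percentile_location(15, 25), percentile_location(15, 75))
print(first_quartile)
```

When the location is a whole number, as with the 15 broker commissions, the percentile is simply the observation at that position. Otherwise it is interpolated between the two neighbouring observations.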
Box Plot
A box plot is a graphical display, based on
quartiles, that helps us picture a set of data.
To construct a box plot, we need only five
statistics:
1. The minimum value,
2. Q1 (the first quartile),
3. The median,
4. Q3 (the third quartile), and
5. The maximum value.
LO4-4
Boxplot - Example
Alexander’s Pizza offers free delivery of its pizza within 15 miles.
Alex, the owner, wants some information on the time it takes for
delivery. How long does a typical delivery take? Within what range
of times will most deliveries be completed? For a sample of 20
deliveries, he determined the following information:
Minimum value = 13 minutes
Q1 = 15 minutes
Median = 18 minutes
Q3 = 22 minutes
Maximum value = 30 minutes
Develop a box plot for the delivery times. What conclusions can you
make about the delivery times?
LO4-4
Boxplot Example
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Draw a box that starts at Q1 (15 minutes) and ends at Q3 (22
minutes). Inside the box we place a vertical line to represent the
median (18 minutes).
Step 3: Extend horizontal lines from the box out to the minimum value (13
minutes) and the maximum value (30 minutes).
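The three steps can be sketched as a text-only box plot for the delivery times, one character per minute. The sketch assumes whole-minute values. The function name text_box_plot and the plotting symbols are illustrative choices.

```python
def text_box_plot(minimum, q1, med, q3, maximum):
    """Draw one character per unit: '-' whisker, '+' box edge, '=' box, '|' median."""
    chars = []
    for t in range(minimum, maximum + 1):
        if t == med:
            chars.append("|")
        elif t in (q1, q3):
            chars.append("+")
        elif q1 < t < q3:
            chars.append("=")
        else:
            chars.append("-")
    return "".join(chars)

minimum, q1, med, q3, maximum = 13, 15, 18, 22, 30   # delivery times, minutes

plot = text_box_plot(minimum, q1, med, q3, maximum)
iqr = q3 - q1                     # interquartile range
left_whisker = q1 - minimum       # spread of the fastest 25% of deliveries
right_whisker = maximum - q3      # spread of the slowest 25% of deliveries

print(plot)
print(iqr, left_whisker, right_whisker)
```

The middle 50 percent of deliveries take between 15 and 22 minutes, an interquartile range of 7 minutes. The right whisker (8 minutes) is much longer than the left (2 minutes), which suggests the delivery times are positively skewed.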
Skewness
Chapter 3 introduced measures of central location (the
mean, median, and mode) and measures of dispersion
(the range and standard deviation) for a distribution of
data.
Shape is another characteristic of a distribution.
There are four shapes commonly observed:
1. symmetric,
2. positively skewed,
3. negatively skewed, and
4. bimodal.
LO4-5
Skewness – An Example
Following are the earnings per share for a sample of 15
software companies for the year 2010. The earnings per
share are arranged from minimum to maximum.
LO4-5
Skewness – An Example
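Step 1: Compute the mean
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95 \]
The sample standard deviation is:
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \cdots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22 \]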
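The mean and standard deviation feed into a coefficient of skewness. As a sketch, assuming Pearson's coefficient, sk = 3(mean - median) / s, is the measure intended, the calculation looks like this in Python. The median of the earnings-per-share data is not given in the text above, so the sample below is hypothetical. The function name pearson_skewness is illustrative.

```python
from statistics import mean, median, stdev

def pearson_skewness(sample):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(sample) - median(sample)) / stdev(sample)

# Hypothetical sample, invented for illustration only.
# One large value pulls the mean (4) above the median (3).
sample = [1, 2, 3, 4, 10]

sk = pearson_skewness(sample)
print(round(sk, 4))
```

A positive coefficient indicates positive skew (mean above the median), a negative one indicates negative skew, and a value near zero indicates a roughly symmetric distribution.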
Contingency Tables
What if we wish to study the relationship between two
variables when one or both are nominal or ordinal
scale? In this case we tally the results with a
contingency table.
LO4-7
Contingency Tables
A contingency table is a cross-tabulation that
simultaneously summarizes two variables of interest.
Examples:
1. Students at a university are classified by gender and class rank.
2. A product is classified as acceptable or unacceptable and by the
shift (day, afternoon, or night) when it is manufactured.
3. A voter in a school bond referendum is classified by party affiliation
(Democrat, Republican, and other) and the number of children that
voter has attending school in the district (0, 1, 2, etc.).
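A contingency table like the one in the second example can be tallied with a short Python sketch. The inspection records below are hypothetical, invented for illustration. collections.Counter counts how often each (shift, quality) pair occurs.

```python
from collections import Counter

# Hypothetical inspection records: (shift, quality) for each product.
records = [
    ("day", "acceptable"), ("day", "acceptable"), ("day", "unacceptable"),
    ("afternoon", "acceptable"), ("afternoon", "unacceptable"),
    ("night", "acceptable"), ("night", "unacceptable"), ("night", "unacceptable"),
]

shifts = ["day", "afternoon", "night"]
qualities = ["acceptable", "unacceptable"]

# Tally each combination of the two variables.
table = Counter(records)

# Print the cross-tabulation with row and column totals.
print(f"{'':<14}" + "".join(f"{s:>11}" for s in shifts) + f"{'total':>8}")
for q in qualities:
    row = [table[(s, q)] for s in shifts]
    print(f"{q:<14}" + "".join(f"{c:>11}" for c in row) + f"{sum(row):>8}")

col_totals = [sum(table[(s, q)] for q in qualities) for s in shifts]
print(f"{'total':<14}" + "".join(f"{c:>11}" for c in col_totals)
      + f"{sum(col_totals):>8}")
```

Each cell counts the observations that fall into one category of both variables at once, so the row and column totals both add up to the number of records.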