07/11/2014

Describing Data:
Numerical Measures

Chapter 3

Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.

Learning Objectives
LO3-1 Compute and interpret the mean, the median,
and the mode.
LO3-2 Compute a weighted mean.
LO3-3 Compute and interpret the geometric mean.
LO3-4 Compute and interpret the range, variance, and
standard deviation.
LO3-5 Explain and apply Chebyshev’s theorem and the
Empirical Rule.
LO3-6 Compute the mean and standard deviation of
grouped data.


LO3-1 Compute and interpret the mean, the median, and the mode.

Measures of Location
• The purpose of a measure of location is to pinpoint the center of a distribution of data.
• There are many measures of location. We will consider three:
1. The arithmetic mean
2. The median
3. The mode


LO3-1

Characteristics of the Mean
• The arithmetic mean is the most widely used measure of location.
• It requires the interval or ratio scale.
• Major characteristics:
  - All values are used.
  - It is unique.
  - The sum of the deviations from the mean is 0.
  - It is calculated by summing the values and dividing by the number of values.


LO3-1

Population Mean
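For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum x}{N}$$

where μ is the population mean, x represents each value in the population, and N is the number of values in the population.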


LO3-1

Example – Population Mean


There are 42 exits on I-75 through the state of Kentucky.
Listed below are the distances between exits (in miles).

1. Why is this information a population?

2. What is the mean number of miles between exits?


LO3-1

Example – Population Mean


There are 42 exits on I-75 through the state of Kentucky.
Listed below are the distances between exits (in miles).

Why is this information a population?

This is a population because we are considering all of the exits in Kentucky.

What is the mean number of miles between exits?


LO3-1

Parameter versus Statistic


PARAMETER A measurable characteristic of a population.

STATISTIC A measurable characteristic of a sample.


LO3-1

Properties of the Arithmetic Mean


1. Every set of interval-level and ratio-level data has a
mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean is
zero.


LO3-1

Sample Mean
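For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{x} = \frac{\sum x}{n}$$

where x̄ is the sample mean, x represents each value in the sample, and n is the number of values in the sample.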
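The population mean and the sample mean use the same arithmetic (add the values, divide by the count); only the notation differs. The minimal Python sketch below borrows the Home Depot hourly wages from the sample variance example later in this chapter as illustrative input. It computes the mean and checks property 4 of the arithmetic mean, that the deviations from the mean sum to zero. The function name `mean` is our own.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hourly wages ($) from the Home Depot sample used in the sample variance example.
wages = [12, 20, 16, 18, 19]

x_bar = mean(wages)                        # the sample mean
deviations = [x - x_bar for x in wages]    # each value minus the mean

print(x_bar)            # 17.0
print(sum(deviations))  # 0.0
```

With non-integer data the sum of the deviations can differ from 0 by a tiny floating-point rounding error; mathematically it is always exactly zero.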


LO3-1

Example – Sample Mean


LO3-1

The Median
MEDIAN The midpoint of the values after they have been
ordered from the minimum to the maximum values.

Properties of the median:


1. There is a unique median for each data set.
2. It is not affected by extremely large or small values
and is therefore a valuable measure of central
tendency when such values occur.
3. It can be computed for ratio-level, interval-level, and
ordinal-level data.
4. It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

LO3-1

Examples - Median
The ages for a sample of five college students are: 21, 25, 19, 20, 22.
Arranging the data in ascending order gives: 19, 20, 21, 22, 25.
Thus the median is 21.

The heights of four basketball players, in inches, are: 76, 73, 80, 75.
Arranging the data in ascending order gives: 73, 75, 76, 80.
Thus the median is 75.5, the mean of the two middle values: (75 + 76)/2.
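The two examples above follow one rule: sort, then take the middle value when n is odd, or the mean of the two middle values when n is even. A minimal Python sketch of that rule (the function name `median` is our own; the standard library's `statistics.median` behaves the same way):

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                       # odd n: a single middle value
    return (data[mid - 1] + data[mid]) / 2     # even n: average the two middle values

ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]
print(median(ages))     # 21
print(median(heights))  # 75.5
```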


LO3-1

The Mode
MODE The value of the observation that appears
most frequently.


LO3-1

Example - Mode
Using the data measuring the distance in miles between exits on I-75 through Kentucky, what is the modal distance?

Organize the distances into a frequency table and select the distance with the highest frequency.
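The procedure above (tally a frequency table, pick the most frequent value) can be sketched in Python with `collections.Counter`. The I-75 distances are not reproduced here, so the list below is hypothetical. Because a data set can have more than one mode, the hypothetical helper `modes` returns every value tied for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency (data can be multimodal)."""
    counts = Counter(values)              # frequency table: value -> count
    top = max(counts.values())            # the highest frequency
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical distances in miles (NOT the I-75 data).
distances = [1, 2, 2, 3, 2, 4, 1, 5]
print(modes(distances))  # [2]
```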

LO3-1

The Relative Positions of the Mean, Median and the Mode


LO3-2 Compute a weighted mean.

Weighted Mean
The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed with the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$$


LO3-2

Example – Weighted Mean


The Carter Construction Company pays its hourly
employees $16.50, $19.00, or $25.00 per hour. There are
26 hourly employees: 14 are paid at the $16.50 rate, 10
at the $19.00 rate, and 2 at the $25.00 rate.

What is the mean hourly rate paid for the 26 employees?
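Applying the weighted mean to the Carter Construction figures, each hourly rate is weighted by the number of employees paid that rate. A minimal Python sketch; `weighted_mean` is our own helper name:

```python
def weighted_mean(values, weights):
    """Sum of each value times its weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

rates = [16.50, 19.00, 25.00]   # hourly rates ($)
employees = [14, 10, 2]         # number of employees paid at each rate

avg = weighted_mean(rates, employees)
print(f"${avg:.2f}")  # $18.12
```

The numerator is 14(16.50) + 10(19.00) + 2(25.00) = 471, and 471 / 26 is about $18.12. Taking the simple mean of the three rates ($20.17) would overstate the typical wage, because only 2 of the 26 employees earn $25.00.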


LO3-3 Compute and interpret the geometric mean.

The Geometric Mean
• Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
• It has wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP.
• For positive values, the geometric mean will always be less than or equal to the arithmetic mean.


LO3-3

The Geometric Mean: Finding the average rate of return over time

EXAMPLE:
The return on investment earned by Atkins Construction
Company for four successive years was: 30 percent, 20
percent, -40 percent, and 200 percent. What is the
geometric mean rate of return on investment?
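The geometric mean is the nth root of the product of n positive values. A -40 percent return cannot enter that product directly, so each return is first converted to a growth factor (1 + r). The geometric mean of the factors is then taken, and 1 is subtracted at the end. A minimal Python sketch (requires Python 3.8+ for `math.prod`; `geometric_mean` is our own helper, although `statistics.geometric_mean` also exists):

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))

returns = [0.30, 0.20, -0.40, 2.00]          # Atkins returns as decimals
growth_factors = [1 + r for r in returns]    # 1.3, 1.2, 0.6, 3.0
gm_return = geometric_mean(growth_factors) - 1
print(f"{gm_return:.1%}")  # 29.4%
```

For comparison, the arithmetic mean of the four returns is 47.5 percent, which overstates the typical year-over-year growth of the investment.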


LO3-3

The Geometric Mean: Finding an Average Percent Change Over Time

EXAMPLE:
During the decade of the 1990s, and into the 2000s, Las Vegas, Nevada,
was the fastest-growing city in the United States. The population
increased from 258,295 in 1990 to 584,539 in 2011. This is an increase of
326,244 people, or a 126.3 percent increase over the period. What is the
average annual increase?
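For a change over time, the geometric mean rate is the nth root of (value at end of period / value at start of period), minus 1, where n is the number of periods. From 1990 to 2011 there are 21 annual periods. A minimal Python sketch; `average_rate_of_change` is our own helper name:

```python
def average_rate_of_change(start, end, periods):
    """Geometric-mean rate of increase per period: (end / start) ** (1 / periods) - 1."""
    return (end / start) ** (1 / periods) - 1

rate = average_rate_of_change(258_295, 584_539, 2011 - 1990)
print(f"{rate:.2%}")  # 3.97%
```

Compounding 3.97 percent for 21 years reproduces the overall 126.3 percent increase, whereas dividing 126.3 percent by 21 (about 6.0 percent per year) would ignore compounding.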


LO3-4 Compute and interpret the range, variance, and standard deviation.

Dispersion
• A measure of location, such as the mean or the median, describes only the center of the data; it does not tell us anything about the spread of the data.
• For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth.
• A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.


LO3-4

Measures of Dispersion
• Range
• Variance
• Standard Deviation


LO3-4

Example – Range
The numbers of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the range for the number of cappuccinos sold.

Range = Maximum value – Minimum value
      = 80 – 20
      = 60


LO3-4

Variance and Standard Deviation
VARIANCE The arithmetic mean of the squared deviations from the mean.

STANDARD DEVIATION The square root of the variance.

• The variance and standard deviation are nonnegative and are zero only if all observations are the same.
• For populations whose values are near the mean, the variance and standard deviation will be small.
• For populations whose values are dispersed from the mean, the population variance and standard deviation will be large.
• The range is based on only the largest and smallest values; the variance overcomes this weakness by using all the values in the population.


LO3-4

Computing the Variance

Steps in computing the variance:

Step 1: Find the mean.
Step 2: Find the difference between each observation and the mean, and square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences by the number of items in the population.
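The four steps above translate line for line into code. The twelve monthly citation counts in the next example are only partly reproduced, so this minimal Python sketch exercises the function on the five cappuccino counts from the range example, treated as if they were a whole population purely for illustration. `population_variance` is our own helper name.

```python
import math

def population_variance(values):
    mu = sum(values) / len(values)                # Step 1: find the mean
    squared = [(x - mu) ** 2 for x in values]     # Step 2: square each deviation
    total = sum(squared)                          # Step 3: sum the squared deviations
    return total / len(values)                    # Step 4: divide by N

# Illustration only: cappuccino counts treated as a population.
counts = [20, 40, 50, 60, 80]
var = population_variance(counts)
print(var, math.sqrt(var))  # 400.0 20.0
```

Here the mean is 50, the squared deviations are 900, 100, 0, 100, and 900, their sum is 2,000, and 2,000 / 5 = 400; the standard deviation is √400 = 20.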

LO3-4

Example – Variance and Standard Deviation
The number of traffic citations issued during the last twelve months in Beaufort County, South Carolina, is reported below:

What is the population variance?

Step 1: Find the mean.

$$\mu = \frac{\sum x}{N} = \frac{19 + 17 + \cdots + 34 + 10}{12} = \frac{348}{12} = 29$$


LO3-4

Example – Variance and Standard Deviation Continued
What is the population variance?

Step 2: Find the difference between each observation and the mean of 29, and square that difference.

Step 3: Sum all the squared differences found in Step 2.

Step 4: Divide the sum of the squared differences by the number of items in the population.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{1{,}488}{12} = 124$$

The population standard deviation is σ = √124 ≈ 11.14 citations.


LO3-4

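Sample Variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where s² is the sample variance, x̄ is the sample mean, and n is the number of observations in the sample.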


LO3-4

Example – Sample Variance


The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19.

The sample mean is $17.

What is the sample variance?


LO3-4

Sample Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where:
s² is the sample variance
x is the value of each observation in the sample
x̄ is the mean of the sample
n is the number of observations in the sample
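Applying the sample formulas to the Home Depot wages: the deviations from $17 are -5, 3, -1, 1, and 2, the squared deviations sum to 40, and dividing by n - 1 = 4 gives s² = 10. A minimal Python sketch that computes both s² and s, then cross-checks them against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1:

```python
import math
import statistics

wages = [12, 20, 16, 18, 19]   # Home Depot hourly wages ($)
n = len(wages)
x_bar = sum(wages) / n         # sample mean: 17.0

# Sample variance: divide the sum of squared deviations by n - 1, not n.
s2 = sum((x - x_bar) ** 2 for x in wages) / (n - 1)
s = math.sqrt(s2)              # sample standard deviation

print(s2)            # 10.0
print(round(s, 2))   # 3.16

# Cross-check against the standard library.
assert math.isclose(s2, statistics.variance(wages))
assert math.isclose(s, statistics.stdev(wages))
```

The variance is in squared units (dollars squared), which is why the standard deviation, about $3.16, is usually easier to interpret.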


LO3-5 Explain and apply Chebyshev’s theorem and the Empirical Rule.

Chebyshev’s Theorem

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company’s profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?
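Chebyshev's theorem states that for any set of observations, sample or population, the proportion of values lying within k standard deviations of the mean is at least 1 - 1/k², for any k greater than 1. With k = 3.5 the bound is 1 - 1/12.25, or about 91.8 percent. A minimal Python sketch; the helper name is our own:

```python
def chebyshev_min_proportion(k):
    """Lower bound on the share of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

mean, sd, k = 51.54, 7.51, 3.5          # Dupree Paint contributions ($)
print(f"{chebyshev_min_proportion(k):.1%}")  # 91.8%

low, high = mean - k * sd, mean + k * sd     # the interval mean +/- 3.5 standard deviations
print(f"interval: ${low:.2f} to ${high:.2f}")
```

The bound holds regardless of the shape of the distribution, which is why it is conservative: for a bell-shaped distribution the actual share within 3.5 standard deviations is far higher.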


LO3-5

The Empirical Rule
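The Empirical Rule applies to a symmetric, bell-shaped distribution: about 68 percent of the observations lie within one standard deviation of the mean, about 95 percent within two, and practically all (about 99.7 percent) within three. A minimal Python sketch that checks those percentages against the normal distribution using `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # share of a normal distribution within k standard deviations of the mean
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

Unlike Chebyshev's bound, these figures assume the data are approximately bell-shaped.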


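LO3-6 Compute the mean and standard deviation of grouped data.

The Arithmetic Mean of Grouped Data

$$\bar{x} = \frac{\sum fM}{n}$$

where M is the midpoint of each class, f is the frequency in each class, and n is the total number of frequencies.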


LO3-6

Example - The Arithmetic Mean of Grouped Data
Recall in Chapter 2, we constructed a frequency distribution for Applewood Auto Group profit data for 180 vehicles sold. The information is repeated in the table. Determine the arithmetic mean profit per vehicle.
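For grouped data the mean is approximated by treating every observation in a class as if it sat at the class midpoint: x̄ = ΣfM / n. The Applewood frequency table is not reproduced here, so this minimal Python sketch uses a small hypothetical table (classes 0 up to 10, 10 up to 20, and 20 up to 30) purely to exercise the formula. `grouped_mean` is our own helper name.

```python
def grouped_mean(midpoints, freqs):
    """Approximate mean of grouped data: sum(f * M) / n, where n = sum(f)."""
    n = sum(freqs)
    return sum(f * m for f, m in zip(freqs, midpoints)) / n

# Hypothetical frequency table (NOT the Applewood profit data).
midpoints = [5, 15, 25]   # class midpoints M
freqs = [2, 5, 3]         # class frequencies f
print(grouped_mean(midpoints, freqs))  # 16.0
```

Here ΣfM = 2(5) + 5(15) + 3(25) = 160 and n = 10. The result is an estimate: the raw values inside each class are unknown, so it can differ from the mean of the original data.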


LO3-6

Example - The Arithmetic Mean of Grouped Data


LO3-6

Example - Standard Deviation of Grouped Data
Refer to the frequency distribution for the Applewood Auto Group data used earlier. Compute the standard deviation of the vehicle profits.
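With class midpoints standing in for the raw values, the sample standard deviation of grouped data is approximated by s = √(Σf(M - x̄)² / (n - 1)). As the Applewood table is not reproduced here, this minimal Python sketch uses a small hypothetical frequency table; `grouped_std` is our own helper name.

```python
import math

def grouped_std(midpoints, freqs):
    """Approximate sample standard deviation of grouped data."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n           # grouped mean
    ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))   # sum of f(M - mean)^2
    return math.sqrt(ss / (n - 1))

# Hypothetical frequency table (NOT the Applewood profit data).
midpoints = [5, 15, 25]   # class midpoints M
freqs = [2, 5, 3]         # class frequencies f
print(round(grouped_std(midpoints, freqs), 2))  # 7.38
```

The grouped mean here is 16, the weighted squared deviations sum to 490, and √(490 / 9) ≈ 7.38.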


Describing Data:
Displaying and
Exploring Data
Chapter 4


Learning Objectives
LO4-1 Construct and interpret a dot plot.
LO4-2 Construct and describe a stem-and-leaf display.
LO4-3 Identify and compute measures of position.
LO4-4 Construct and analyze a box plot.
LO4-5 Compute and interpret the coefficient of skewness.
LO4-6 Create and interpret a scatter diagram.
LO4-7 Develop and explain a contingency table.


LO4-1 Construct and interpret a dot plot.

Dot Plots
• A dot plot groups the data as little as possible, and the identity of an individual observation is not lost.
• To develop a dot plot, each observation is simply displayed as a dot along a horizontal number line indicating the possible values of the data.
• If there are identical observations or the observations are too close to be shown individually, the dots are “piled” on top of each other.


LO4-1

Dot Plots - Example


The Service Departments at Tionesta Ford Lincoln Mercury and Sheffield
Motors, Inc., two of the four Applewood Auto Group Dealerships, were both
open 24 days last month. Listed below is the number of vehicles serviced last
month at the two dealerships. Construct dot plots and report summary statistics
to compare the two dealerships.
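The Tionesta and Sheffield service counts are not reproduced here, so this minimal Python sketch uses hypothetical daily counts. It prints a plain-text dot plot rotated 90 degrees: each row is one possible value on the number line, and identical observations pile up as repeated `*` marks. Integer data is assumed, and `dot_plot_rows` is our own helper name.

```python
from collections import Counter

def dot_plot_rows(values):
    """One text row per integer value from min to max; each observation adds a '*'."""
    counts = Counter(values)
    return [f"{v:>3} | " + "*" * counts[v] for v in range(min(values), max(values) + 1)]

# Hypothetical vehicles serviced per day (NOT the dealership data).
serviced = [48, 49, 49, 51, 51, 51]
for row in dot_plot_rows(serviced):
    print(row)
```

Values with no observations (50 in this example) still get a row, so gaps in the data stay visible, as they would on a number line.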


LO4-1

Dot Plot – Example in Minitab


LO4-2 Construct and describe a stem-and-leaf display.

Stem-and-Leaf
• In Chapter 2, frequency distributions were used to organize data into a meaningful form.
• A major advantage to organizing the data with a frequency distribution is that we get a visual picture of the shape of a distribution.
• There are two disadvantages, however, of organizing the data into a frequency distribution:
1. The exact identity of each value is lost.
2. It is difficult to tell how the values within each class are distributed.

• One technique that is used to display quantitative information in a condensed form is the stem-and-leaf display.


LO4-2

Stem-and-Leaf
• Stem-and-leaf display: a statistical technique to organize and present a set of data. Each numerical value is divided into two parts. The leading digit(s) become the stem, and the trailing digit becomes the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.
• Advantage of the stem-and-leaf display over a frequency distribution: the identity of each observation is not lost.


LO4-2

Stem-and-leaf Plot Example


Listed in Table 4–1 is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Buffalo Automobile Dealers Association last year.
Organize the data into a stem-and-leaf display. Around what values does the number of advertising spots tend to cluster? What is the smallest number of spots purchased by a dealer? The largest number purchased?


LO4-2

Stem-and-leaf Plot Example

The usual procedure is to sort the leaf values from smallest to largest.
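The construction described above (leading digits as the stem, last digit as the leaf, leaves sorted from smallest to largest) can be sketched in Python. Table 4–1 is not reproduced here, so the values below are hypothetical, and non-negative integers are assumed. `stem_and_leaf` is our own helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Rows of 'stem | leaves'; the stem is all digits but the last, the leaf is the last digit."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)          # e.g. 127 -> stem 12, leaf 7
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):   # keep empty stems so gaps stay visible
        digits = "".join(str(d) for d in sorted(leaves[stem]))  # sort leaves smallest to largest
        rows.append(f"{stem:>3} | {digits}")
    return rows

# Hypothetical advertising-spot counts (NOT Table 4-1).
spots = [112, 88, 95, 103, 127, 96, 93, 112, 108]
for row in stem_and_leaf(spots):
    print(row)
```

Every original value can be read back from the display (stem 9 with leaves 3, 5, 6 means 93, 95, and 96), which is the advantage over a frequency distribution noted earlier.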


LO4-3 Identify and compute measures of position.

Measures of Position
• The standard deviation is the most widely used measure of dispersion.

• Another way to describe the spread of data is to use the position of values that divide a set of observations into equal parts.

• These measures of position include quartiles, deciles, and percentiles.


LO4-3

Percentile Computation
• To compute a percentile, let Lp refer to the location of a desired percentile. So if we wanted to find the 33rd percentile we would use L33, and if we wanted the median, the 50th percentile, then L50.

• The number of observations is n, so if we want to locate the median, its position is at (n + 1)/2, or we could write this as (n + 1)(P/100), where P is the desired percentile:

$$L_p = (n + 1)\frac{P}{100}$$


LO4-3

Percentiles - Example
Listed below are the commissions earned last month by a sample of 15 brokers at Salomon Smith Barney’s Oakland, California, office.

$2,038 $1,758 $1,721 $1,637
$2,097 $2,047 $2,205 $1,787
$2,287 $1,940 $2,311 $2,054
$2,406 $1,471 $1,460

Locate the median, the first quartile, and the third quartile for the commissions earned.


LO4-3

Percentiles – Example
Step 1: Organize the data from smallest to largest value.


LO4-3

Percentiles – Example
Step 2: Compute the first and third quartiles. Locate L25 and L75 using:

$$L_{25} = (15 + 1)\frac{25}{100} = 4 \qquad L_{75} = (15 + 1)\frac{75}{100} = 12$$

Therefore, the first and third quartiles are located at the 4th and 12th positions, respectively: Q1 = $1,721; Q3 = $2,205. Likewise, the median is located at L50 = (15 + 1)(50/100) = 8, so the median is the 8th value, $2,038.

$1,460 $1,471 $1,637 $1,721
$1,758 $1,787 $1,940 $2,038
$2,047 $2,054 $2,097 $2,205
$2,287 $2,311 $2,406


LO4-3

Percentiles – Example
In the previous example the location formula yielded a whole number. What if there were 6 observations in the sample with the following ordered observations: 43, 61, 75, 91, 101, and 104, that is, n = 6, and we wanted to locate the first quartile?

$$L_{25} = (6 + 1)\frac{25}{100} = 1.75$$

Locate the first value in the ordered array, then move 0.75 of the distance between the first and second values, and report that as the first quartile. Like the median, the quartile does not need to be one of the actual values in the data set.
The 1st and 2nd values are 43 and 61. Moving 0.75 of the distance between these numbers gives the 25th percentile: 43 + 0.75(61 - 43) = 56.5.
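Both percentile examples follow the same recipe: compute the location Lp = (n + 1)P/100, then interpolate between neighbouring ordered values when Lp is not a whole number. A minimal Python sketch of that recipe; `percentile` is our own helper, and clamping locations that fall outside 1..n is our assumption, since the slides do not cover that case.

```python
import math

def percentile(values, p):
    """Value at the p-th percentile using the location L_p = (n + 1) * p / 100.

    Interpolates linearly between neighbouring ordered values when L_p is
    fractional. Locations outside 1..n are clamped (an assumption).
    """
    data = sorted(values)
    n = len(data)
    loc = (n + 1) * p / 100            # 1-based position in the ordered array
    loc = min(max(loc, 1), n)          # clamp to the first/last observation
    lower = math.floor(loc)            # whole-number part of the position
    frac = loc - lower                 # how far to move toward the next value
    if frac == 0:
        return data[lower - 1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
               2287, 1940, 2311, 2054, 2406, 1471, 1460]
print(percentile(commissions, 25), percentile(commissions, 50), percentile(commissions, 75))
# 1721 2038 2205
print(percentile([43, 61, 75, 91, 101, 104], 25))  # 56.5
```

Software packages differ in how they locate percentiles, so results may not match exactly. Python's `statistics.quantiles(data, n=4)`, with its default `method='exclusive'`, uses the same (n + 1) positioning and gives the same quartiles for both data sets above.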


LO4-4 Construct and analyze a box plot.

Box Plot
• A box plot is a graphical display, based on quartiles, that helps us picture a set of data.
• To construct a box plot, we need only five statistics:
1. The minimum value,
2. Q1 (the first quartile),
3. The median,
4. Q3 (the third quartile), and
5. The maximum value.


LO4-4

Boxplot - Example
• Alexander’s Pizza offers free delivery of its pizza within 15 miles. Alex, the owner, wants some information on the time it takes for delivery. How long does a typical delivery take? Within what range of times will most deliveries be completed? For a sample of 20 deliveries, he determined the following information:
  - Minimum value = 13 minutes
  - Q1 = 15 minutes
  - Median = 18 minutes
  - Q3 = 22 minutes
  - Maximum value = 30 minutes

• Develop a box plot for the delivery times. What conclusions can you draw about the delivery times?


LO4-4

Boxplot Example
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Draw a box that starts at Q1 (15 minutes) and ends at Q3 (22 minutes). Inside the box, place a vertical line to represent the median (18 minutes).
Step 3: Extend horizontal lines from the box out to the minimum value (13 minutes) and the maximum value (30 minutes).
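The three steps above can be sketched as a one-line plain-text box plot. This assumes one character per minute and a scale that starts at 10 minutes, both arbitrary choices. `ascii_boxplot` is a hypothetical helper, not a library function. Whisker ends and the median are drawn as `|`, and the box edges as `[` and `]`.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, start=0, step=1):
    """One-line text box plot; each character covers `step` units, beginning at `start`."""
    def col(v):
        return round((v - start) / step)       # Step 1: map a value onto the scale

    line = [" "] * (col(maximum) + 1)
    for c in range(col(minimum), col(maximum) + 1):
        line[c] = "-"                          # Step 3: whiskers from minimum to maximum
    for c in range(col(q1), col(q3) + 1):
        line[c] = " "                          # Step 2: clear the inside of the box
    line[col(q1)] = "["                        # box starts at Q1
    line[col(q3)] = "]"                        # box ends at Q3
    line[col(median)] = "|"                    # median line inside the box
    line[col(minimum)] = line[col(maximum)] = "|"   # whisker ends
    return "".join(line)

plot = ascii_boxplot(13, 15, 18, 22, 30, start=10)
print(plot)
```

In the output the box spans 15 to 22 minutes, so about half of the deliveries take between 15 and 22 minutes (an interquartile range of 7 minutes). The right whisker (22 to 30) is much longer than the left one (13 to 15), and the median sits closer to Q1 than to Q3. Both features suggest the delivery times are positively skewed.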


LO4-4

Boxplot – Using Minitab
Refer to the Applewood Auto Group data. Develop a box plot for the variable age of the buyer. What can we conclude about the distribution of the age of the buyer?

The MINITAB statistical software system was used to develop the following chart and summary statistics. What can we conclude about the distribution of the age of the buyers?

• The median age of buyers was 46 years.
• 25 percent of the buyers were more than 52.75 years of age.
• 50 percent of the purchasers were between the ages of 40 and 52.75 years.
• The distribution of age is symmetric.

LO4-5 Compute and interpret the coefficient of skewness.

Skewness
• Chapter 3 introduced measures of central location (the mean, median, and mode) and measures of dispersion (the range and standard deviation) for a distribution of data.
• Shape is another characteristic of a distribution.
• There are four shapes commonly observed:
1. symmetric,
2. positively skewed,
3. negatively skewed, and
4. bimodal.


LO4-5

Commonly Observed Shapes


LO4-5

Computing the Coefficient of Skewness
The coefficient of skewness can range from -3 up to 3.
• A value near -3 indicates considerable negative skewness.
• A value near +3 indicates considerable positive skewness.
• A value of 0 occurs when the mean and median are equal and indicates that the distribution is symmetrical and is not skewed.
Skewness can be calculated using Pearson’s Coefficient of Skewness formula:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$


LO4-5

Skewness – An Example
Following are the earnings per share for a sample of 15 software companies for the year 2010. The earnings per share are arranged from minimum to maximum.

• Compute the mean, median, and standard deviation.
• Find the coefficient of skewness using Pearson’s estimate.
• What is your conclusion regarding the shape of the distribution?


LO4-5

Skewness – An Example
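Step 1: Compute the Mean

$$\bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95$$

Step 2: Compute the Standard Deviation

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \cdots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22$$

Step 3: Find the Median

The middle value in the set of data, arranged from smallest to largest, is $3.18.

Step 4: Compute the Skewness

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017$$

A coefficient of about 1.0 indicates a moderate amount of positive skewness: the mean exceeds the median because a few large earnings per share pull it upward.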
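Pearson's coefficient needs only three summary statistics. The minimal Python sketch below first plugs in the values reported above, because the 15 earnings-per-share figures are not reproduced here. It also provides a helper, our own `pearson_skewness`, that computes the coefficient from raw data using the standard library's `mean`, `median`, and sample standard deviation `stdev`.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s, with s the sample SD."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

# Summary values reported above for the 15 software companies.
sk = 3 * (4.95 - 3.18) / 5.22
print(round(sk, 3))  # 1.017
```

Because the helper uses the sample standard deviation (n - 1 in the denominator), it matches Step 2 above. For symmetric data such as [1, 2, 3] it returns 0.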


LO4-6 Create and interpret a scatter diagram.

Describing the Relationship between Two Variables
• When we study the relationship between two variables, we refer to the data as bivariate.

• One graphical technique we use to show the relationship between two variables is called a scatter diagram.

• To draw a scatter diagram, we scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis).


LO4-6

Scatter Diagram Examples


LO4-7 Develop and explain a contingency table.

Contingency Tables
• What if we wish to study the relationship between two variables when one or both are measured on a nominal or ordinal scale? In this case we tally the results with a contingency table.


LO4-7

Contingency Tables
A contingency table is a cross-tabulation that
simultaneously summarizes two variables of interest.

Examples:
1. Students at a university are classified by gender and class rank.
2. A product is classified as acceptable or unacceptable and by the
shift (day, afternoon, or night) when it is manufactured.
3. A voter in a school bond referendum is classified by party affiliation
(Democrat, Republican, and other) and the number of children that
voter has attending school in the district (0, 1, 2, etc.).


LO4-7

Contingency Tables – An Example


There are four dealerships in the Applewood Auto group. Suppose we want to
compare the profit earned on each vehicle sold by a particular dealership. To put
it another way, is there a relationship between the amount of profit earned and
the dealership? The table below is the cross-tabulation of the raw data. Note the
profit is converted to an ordinal variable.


LO4-7

Contingency Tables – An Example


From the contingency table, we observe the following:
1. From the total column on the right, 90 of the 180 cars sold had a profit above the median and the other 90 below it. From the definition of the median, this is expected.
2. For the Kane dealership, 25 of the 52 cars sold, or 48 percent, were sold for a profit above the median.
3. The percentages of cars sold for a profit above the median at the other dealerships are 50 percent for Olean, 42 percent for Sheffield, and 60 percent for Tionesta.
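A contingency table is a tally of (row, column) pairs, and the percentages quoted above are row percentages: a cell count divided by its row total, as in 25 / 52 ≈ 48 percent for Kane. The Applewood raw data are not reproduced here, so this minimal Python sketch cross-tabulates a handful of hypothetical records. `crosstab` and `row_percent` are our own helper names.

```python
from collections import Counter

def crosstab(records):
    """Tally (row, column) pairs into a nested dict: table[row][col] = count."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def row_percent(table, row, col):
    """Share of a row's total that falls in one column."""
    return table[row][col] / sum(table[row].values())

# Hypothetical records (NOT the Applewood data): (dealership, profit vs. median)
records = [
    ("Kane", "above"), ("Kane", "below"), ("Kane", "below"),
    ("Olean", "above"), ("Olean", "below"),
]
table = crosstab(records)
print(table)  # {'Kane': {'above': 1, 'below': 2}, 'Olean': {'above': 1, 'below': 1}}
print(f"{row_percent(table, 'Olean', 'above'):.0%}")  # 50%
```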
