
Chapter 3

Copyright © 2020, 2015, 2013 Pearson Education, Inc.


Chapter 3
Calculating Descriptive Statistics
CHAPTER 3 MAP
3.1 Measures of Central Tendency

3.2 Measures of Variability

3.3 Using the Mean and Standard Deviation Together

3.4 Working with Grouped Data

3.5 Measures of Relative Position

3.6 Measures of Association Between Two Variables

3.1 Measures of Central Tendency

A measure of central tendency is a single value used to describe the center point of a data set.

Measures of central tendency: mean, median, and mode.

The Mean
The mean, or average, is the most common
measure of central tendency.
• Calculate the mean by adding all the values in
a data set and then dividing the result by the
number of observations.

The Mean
Formula for the sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $\bar{x}$ = the sample mean (pronounced "x-bar")
$x_i$ = the values in the sample (the observed values)
$\sum_{i=1}^{n} x_i$ = the sum of all the data values
n = the number of data values in the sample (the sample size)

The Mean
Formula for the population mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $\mu$ = the population mean (the Greek letter "mu")
N = the number of data values in the population

Calculating The Mean
Example: suppose a sample of size n = 5 gives
the following values:
6.2 7.1 4.8 9.0 3.3

The sample mean:
$$\bar{x} = \frac{6.2 + 7.1 + 4.8 + 9.0 + 3.3}{5} = \frac{30.4}{5} = 6.08$$
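As a quick check of this example, here is a minimal Python sketch of the sample mean calculation. The variable names are illustrative and not part of the original slides.

```python
# Sample values from the example (n = 5).
values = [6.2, 7.1, 4.8, 9.0, 3.3]

# Sample mean: sum of the values divided by the number of observations.
n = len(values)
sample_mean = sum(values) / n

print(f"n = {n}, mean = {sample_mean:.2f}")  # mean rounds to 6.08
```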

The Median

The median is the value in the data set for which half the observations are higher and half the observations are lower.
• First arrange the data in ascending order.
• Use an index point to determine the position of the median in the data set.
Formula for the index point for the median:
i = 0.5(n), where n = the number of data points.
Whenever the index point is not a whole number, round the value up to the next whole number.
The Median
Example with sample of size n = 7:
21 27 27 28 34 45 50
• The index point is
i = 0.5(n) = 0.5(7) = 3.5
• The index point is not a whole number, so round up to i = 4.
• The median is therefore the value in the fourth position of the sorted data: 28.
The Median

The median is not sensitive to outliers.
21 27 27 28 34 45 5000
• The median is still 28.

When there is an odd number of data values, the median is always the middle value in the data set.

The Median

When the index point is a whole number, the position of the median is halfway between the index point (i) and the next data point (the i + 1 position).

When there is an even number of data values, the median is halfway between the two middle values.

The Median
Example with sample of size n = 6:
145 157 170 182 204 209
• The index point is
i = 0.5(n) = 0.5(6) = 3
• The index point is a whole number, so the median value is halfway between the third and fourth values in the sorted data.
median = (170 + 182)/2 = 176
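The index-point procedure for the median can be sketched in Python as follows. The function name `median_by_index_point` is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
import math

def median_by_index_point(data):
    """Median using the index-point rule i = 0.5 * n (1-based positions)."""
    ordered = sorted(data)          # arrange the data in ascending order
    n = len(ordered)
    i = 0.5 * n
    if i != int(i):
        # Not a whole number: round up and take the value in that position.
        position = math.ceil(i)
        return ordered[position - 1]
    # Whole number: average the values in positions i and i + 1.
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2

print(median_by_index_point([21, 27, 27, 28, 34, 45, 50]))    # 28
print(median_by_index_point([145, 157, 170, 182, 204, 209]))  # 176.0
print(median_by_index_point([21, 27, 27, 28, 34, 45, 5000]))  # 28
```

The third call repeats the outlier example: replacing 50 with 5000 leaves the median unchanged.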


The Mode

The mode is the value that appears most often in a data set.
• If no data value or category appears more than once, then we say that the mode does not exist.
• More than one mode can exist if two or more values tie for most frequent.

The Mode
Example with numerical data:
• Number of children per family in a sample of 24 families:
0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,5

Number of children    Frequency
0                     4
1                     5
2                     8
3                     4
4                     2
5                     1

The value that appears most often is 2 (occurs 8 times), so the mode = 2 children.

The Mode

Example with categorical data:

Car Model    Number in Parking Lot
Acura        3
Ford         5
BMW          1
Toyota       7

• The car that appears most often is Toyota (occurs 7 times), so the mode is the Toyota model.
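Both mode examples can be reproduced with a short Python sketch. The helper name `modes` is hypothetical; it returns every value tied for most frequent, and an empty list when no value appears more than once.

```python
from collections import Counter

def modes(data):
    """Return all values tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value appears more than once, so the mode does not exist
    return [value for value, count in counts.items() if count == top]

# Numerical example: number of children in 24 families.
children = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]
print(modes(children))  # [2]

# Categorical example: car models in the parking lot.
cars = ["Acura"] * 3 + ["Ford"] * 5 + ["BMW"] * 1 + ["Toyota"] * 7
print(modes(cars))      # ['Toyota']
```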

Review Example

Prices for 5 homes have been collected.

House prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum: $3,000,000

▪ Mean: $3,000,000/5 = $600,000
▪ Median: middle value of ranked data = $300,000
▪ Mode: value that appears most often = $100,000

3.2 Measures of Variability

Measures of variability show how much spread is present in the data.

Measures of variability:
• Range
• Variance (for a sample, for a population)
• Standard deviation (for a sample, for a population)

The Range
Simplest measure of variation
Difference between the highest value and the
lowest value in a data set

Range = Highest value – Lowest Value

Example: 1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 13


Range = 13 - 1 = 12
The Variance and Standard Deviation

• The sample variance is denoted by $s^2$.

Sample variance formula:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ = sample mean
n = sample size
$(x_i - \bar{x})$ = the difference between each data value and the sample mean

Calculating the Sample Variance

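Sample data ($x_i$): 4 6 8 9 11 12 12 18
n = 8, mean = $\bar{x}$ = 10

$$s^2 = \frac{(4-10)^2 + (6-10)^2 + \cdots + (18-10)^2}{8 - 1} = \frac{130}{7} \approx 18.57$$

The variance measures the variability, or spread, of the data points around the mean.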

The Standard Deviation

The standard deviation is the square root of the variance.
• Has the same units as the original data

Sample standard deviation formula:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$

Calculating the
Sample Standard Deviation
Sample data ($x_i$): 4 6 8 9 11 12 12 18
n = 8, mean = $\bar{x}$ = 10

$$s = \sqrt{\frac{130}{7}} \approx 4.31$$

The standard deviation is a measure of how far, on average, each data value is from the mean of the sample.
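The sample variance and standard deviation for this data set can be checked with a minimal Python sketch. Variable names are illustrative; the cross-check uses `statistics.variance` and `statistics.stdev` from the standard library, which divide by n - 1.

```python
import math
import statistics

data = [4, 6, 8, 9, 11, 12, 12, 18]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in data)
sample_variance = ss / (n - 1)          # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(mean, ss)                    # 10.0 130.0
print(round(sample_variance, 2))   # 18.57
print(round(sample_sd, 2))         # 4.31

# Cross-check against the standard library.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```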

The Variance and Standard Deviation for
a Population
Used when the data set represents an entire
population rather than a sample from a
population

Population variance formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$
where $\mu$ = population mean
N = population size
$(x_i - \mu)$ = the difference between each data value and the population mean

The Variance and Standard Deviation for
a Population
Population standard deviation formula:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$$
where $\mu$ = population mean
N = population size
$(x_i - \mu)$ = the difference between each data value and the population mean
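To show the effect of dividing by N instead of n - 1, the sketch below applies the population formulas to the eight values used in the sample examples. Treating those values as a complete population is an assumption made here purely for illustration.

```python
import math
import statistics

# Assumption: these eight values are treated as an entire population.
population = [4, 6, 8, 9, 11, 12, 12, 18]
N = len(population)
mu = sum(population) / N

pop_variance = sum((x - mu) ** 2 for x in population) / N  # divide by N
pop_sd = math.sqrt(pop_variance)

print(pop_variance)       # 16.25
print(round(pop_sd, 2))   # 4.03

# Cross-check against the standard library's population functions.
assert math.isclose(pop_variance, statistics.pvariance(population))
assert math.isclose(pop_sd, statistics.pstdev(population))
```

The population variance (16.25) is smaller than the sample variance of the same values (about 18.57) because the divisor is N rather than n - 1.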

Using Excel to Calculate the Variance and
Standard Deviation
The Excel functions for the sample variance and standard deviation are:
=VAR.S(data values)
=STDEV.S(data values)
The letter S in VAR.S and STDEV.S indicates “sample”.

The functions for the population variance and standard deviation are:
=VAR.P(data values)
=STDEV.P(data values)
The letter P in VAR.P and STDEV.P indicates “population”.

3.3 Using the Mean and Standard
Deviation Together
The standard deviation is a common measure
of consistency in business applications, such as
quality control.
• The standard deviation measures the amount of
variability around the mean.
The standard deviation is affected by the scale
of the data.
• When sample means are very different, comparing
standard deviations can be misleading.

The Coefficient of Variation

The coefficient of variation, CV, expresses the standard deviation as a percentage of the mean.
• A high CV indicates high variability relative to the
size of the mean.
• A low CV indicates low variability relative to the
size of the mean.

The Coefficient of Variation
Formula for the sample coefficient of variation:
$$CV = \frac{s}{\bar{x}}(100)$$
where s = the sample standard deviation
$\bar{x}$ = the sample mean

Formula for the population coefficient of variation:
$$CV = \frac{\sigma}{\mu}(100)$$
where $\sigma$ = the population standard deviation
$\mu$ = the population mean

Coefficient of Variation Example
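Stock price               Nike       Google
Average price last year   $59.67     $1,045.85
Standard deviation        $6.64      $68.70

Coefficient of variation:

Nike:
$$CV = \frac{6.64}{59.67}(100) \approx 11.1\%$$

Google:
$$CV = \frac{68.70}{1045.85}(100) \approx 6.6\%$$

Relative to its mean, Nike's stock price was more variable than Google's, even though Google's standard deviation is larger in dollar terms.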

Table 3.14, based on https://www.nasdaq.com/
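A minimal Python sketch of the coefficient of variation for the two stocks, using the means and standard deviations given in the example. The function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

nike_cv = coefficient_of_variation(6.64, 59.67)
google_cv = coefficient_of_variation(68.70, 1045.85)

print(f"Nike:   {nike_cv:.1f}%")    # 11.1%
print(f"Google: {google_cv:.1f}%")  # 6.6%
```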


The z-Score

The z-score identifies the number of standard deviations a particular value is from the mean of its distribution.
• A z-score has no units.

The z-score is
- zero for values equal to the mean
- positive for values above the mean
- negative for values below the mean

The z-Score
• Formula for the population z-score:
$$z = \frac{x - \mu}{\sigma}$$
where x = the data value of interest
$\mu$ = the population mean
$\sigma$ = the population standard deviation

• Formula for the sample z-score:
$$z = \frac{x - \bar{x}}{s}$$
where x = the data value of interest
$\bar{x}$ = the sample mean
s = the sample standard deviation

z-Score Example
For example, suppose you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, find the z-score:

z = (x – μ) / σ
= (190 – 150) / 25 = 1.6
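The same calculation as a small Python sketch; the function name `z_score` is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

print(z_score(190, 150, 25))  # 1.6
print(z_score(150, 150, 25))  # 0.0 (a value equal to the mean)
print(z_score(125, 150, 25))  # -1.0 (a value below the mean)
```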

The Empirical Rule
According to the empirical rule, if a distribution follows a bell-shaped, symmetrical curve centered around the mean, we would expect:
• Approximately 68% of the values to fall within ± 1 standard deviation of the mean
• Approximately 95% of the values to fall within ± 2 standard deviations of the mean
• Approximately 99.7% of the values to fall within ± 3 standard deviations of the mean
Expressing z-Scores in Terms of x
Rearranging the z-score formulas to solve for x:

For a population:
$$x = \mu + z\sigma$$
For a sample:
$$x = \bar{x} + zs$$

Question: For a symmetric, bell-shaped population with a mean of 20 and a standard deviation of 3, what interval will contain about 95% of all the values?

Answer: About 95% of the values are within ± 2 standard deviations:
$$x = \mu + z\sigma = 20 + (-2)(3) = 14$$
$$x = \mu + z\sigma = 20 + (2)(3) = 26$$

About 95% of the values will fall between 14 and 26.
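The interval in this answer can be reproduced with a short Python sketch that converts z-scores of -2 and +2 back to x values. The function name `value_from_z` is hypothetical.

```python
def value_from_z(mean, std_dev, z):
    """Convert a z-score back to a data value: x = mean + z * std_dev."""
    return mean + z * std_dev

# About 95% of values lie within 2 standard deviations of the mean
# for a bell-shaped, symmetric distribution (empirical rule).
lower = value_from_z(20, 3, -2)
upper = value_from_z(20, 3, 2)
print(lower, upper)  # 14 26
```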

3.4 Working with Grouped Data

Suppose data has already been summarized by a frequency distribution.
• The individual data values are no longer shown.
• Only grouped data is available.

To estimate the average for the frequency distribution:
• Find the midpoint for each group. (The midpoint is the halfway point in each group.)
• Use the midpoint as a representative value for that group.
The Mean of Grouped Data
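Formula for the sample mean from grouped data:
$$\bar{x} \approx \frac{\sum_{i=1}^{k} f_i m_i}{n}$$
where $f_i$ = the frequency for class i
$m_i$ = the midpoint for class i
$n = \sum_{i=1}^{k} f_i$ = the total number of observations
k = the number of classes

• The mean is only an approximate value since the midpoint is just an estimate of the value in each class.

Formula for the population mean from grouped data:
$$\mu \approx \frac{\sum_{i=1}^{k} f_i m_i}{N}$$
where N = the total number of observations in the population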
Example: The Mean of Grouped Data

Example: An online merchant has collected the following grouped data for the number of web pages viewed by a sample of its customers:

Number of pages    Frequency
1 to under 5       6
5 to under 9       12
9 to under 13      10
13 to under 17     4
The merchant would like to

Example: The Mean of Grouped Data
1. Find the midpoint for each class. (The midpoint is the halfway point in each group.)

Number of pages    Midpoint (mi)    Frequency (fi)
1 to under 5       3                6
5 to under 9       7                12
9 to under 13      11               10
13 to under 17     15               4

2. Calculate the mean:
$$\bar{x} \approx \frac{(6)(3) + (12)(7) + (10)(11) + (4)(15)}{32} = \frac{272}{32} = 8.5$$

The average number of viewed pages is about 8.5.
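The two steps can be sketched in Python as follows. The tuple layout (lower bound, upper bound, frequency) is an illustrative choice, not something the slides prescribe.

```python
# (lower bound, upper bound, frequency) for each class in the example table.
classes = [(1, 5, 6), (5, 9, 12), (9, 13, 10), (13, 17, 4)]

# Total number of observations is the sum of the class frequencies.
n = sum(f for _, _, f in classes)

# Step 1: the midpoint of each class stands in for every observation in it.
# Step 2: weight each midpoint by its frequency and divide by n.
weighted_sum = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
grouped_mean = weighted_sum / n

print(n, grouped_mean)  # 32 8.5
```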


The Variance and Standard Deviation of
Grouped Data
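Formula for the sample variance from grouped data:
$$s^2 \approx \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{n - 1}$$
where $\bar{x}$ = the approximate sample mean
$f_i$ = the frequency for class i
$m_i$ = the midpoint for class i
$n = \sum_{i=1}^{k} f_i$ = the total number of observations
k = the number of classes

Formula for the population variance from grouped data:
$$\sigma^2 \approx \frac{\sum_{i=1}^{k} (m_i - \mu)^2 f_i}{N}$$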

Example: The Variance and Standard
Deviation of Grouped Data
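Number of pages    Midpoint (mi)    Frequency (fi)
1 to under 5       3                6
5 to under 9       7                12
9 to under 13      11               10
13 to under 17     15               4

Calculate the variance and standard deviation. Recall that $\bar{x}$ = 8.5.
$$s^2 \approx \frac{(3-8.5)^2(6) + (7-8.5)^2(12) + (11-8.5)^2(10) + (15-8.5)^2(4)}{32 - 1} = \frac{440}{31} \approx 14.19$$

So the standard deviation is
$$s \approx \sqrt{14.19} \approx 3.77$$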
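A minimal Python sketch of the grouped-data variance and standard deviation for this table, assuming the sample formula with the n - 1 divisor. Variable names are illustrative.

```python
import math

midpoints = [3, 7, 11, 15]
frequencies = [6, 12, 10, 4]

n = sum(frequencies)
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n  # 8.5

# Squared deviation of each midpoint from the mean, weighted by frequency.
ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
variance = ss / (n - 1)
sd = math.sqrt(variance)

print(ss)                   # 440.0
print(round(variance, 2))   # 14.19
print(round(sd, 2))         # 3.77
```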


3.5 Measures of Relative Position

Measures of relative position compare the position of one value in relation to other values in the data set.

Measures of relative position: percentiles and quartiles.

Percentiles

Percentiles measure the approximate percentage of values in the data set that are below the value of interest.
The pth percentile of a data set (where p is any
number between 1 and 100) is the value that at
least p percent of the observations will fall below.
Examples:
• 20% of the data values are below the 20th percentile.
• 73% of the data values are below the 73rd percentile.

Percentiles
To find percentiles manually:
• Sort the data from lowest to highest.
• Calculate the index point, i:
$$i = \frac{p}{100}(n)$$
where: p = the percentile of interest
n = the number of data values
• If i is not a whole number, round i up to the next whole number. The ith position represents our value of interest.
• If i is a whole number, the midpoint between the ith and (i + 1)th positions is our value of interest.
Note: The index point, i, is not the value of the percentile; it is the position of the percentile value in the ranked data.
Percentiles
Example: Miles per gallon (MPG) were recorded for a sample of 12 cars. The ranked values are shown below.

Position    MPG      Position    MPG
1           16.2     7           28.3
2           18.0     8           31.1
3           20.5     9           32.0
4           20.5     10          35.8
5           22.8     11          38.0
6           26.4     12          42.5

Question: What is the value of the 60th percentile?

Answer:
$$i = \frac{p}{100}(n) = \frac{60}{100}(12) = 7.2$$
Round up to get i = 8, so the 60th percentile value is in the 8th position of the ranked data.
The 60th percentile is 31.1 MPG.
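The manual percentile procedure can be sketched in Python as below. The function name `percentile_by_index_point` is hypothetical, and the sketch assumes p is strictly between 0 and 100 so that the index point stays inside the data.

```python
import math

def percentile_by_index_point(data, p):
    """pth percentile using the index-point rule i = (p / 100) * n."""
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100  # multiply before dividing to limit rounding error
    if i != int(i):
        # Not a whole number: round up and use the value in that position.
        return ordered[math.ceil(i) - 1]
    # Whole number: midpoint between positions i and i + 1.
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2

mpg = [16.2, 18.0, 20.5, 20.5, 22.8, 26.4, 28.3, 31.1, 32.0, 35.8, 38.0, 42.5]
print(percentile_by_index_point(mpg, 60))  # 31.1
```

This follows the manual rule only; spreadsheet and library percentile functions generally interpolate and may return slightly different values.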

Percentiles in Excel

Excel calculates percentiles using the PERCENTILE.EXC function:
=PERCENTILE.EXC(array, k)
where:
array = the data range of interest
k = the percentile of interest, expressed as a value between 0 and 1, exclusive

• Excel uses a slightly different technique to calculate percentiles, so Excel output may differ from a manual calculation.

Quartiles

Quartiles split the ranked data into 4 equal groups:
• The first quartile (Q1) is the value that constitutes
the 25th percentile.
• The second quartile (Q2) is the value that
constitutes the 50th percentile.
• Note that the second quartile (the 50th percentile) is
the median.
• The third quartile (Q3) is the value that
constitutes the 75th percentile.

Quartiles

Example: Find the first quartile.

Sample data (n = 11): 11 12 13 16 16 17 18 21 22 22 25

Q1 = 25th percentile, so find the index point i where n = 11 and p = 25:
$$i = \frac{p}{100}(n) = \frac{25}{100}(11) = 2.75$$
Since i is not a whole number, round up to i = 3 and use the value in the 3rd position: Q1 = 13

Quartiles

Quartiles can be found in Excel with the QUARTILE.EXC function:

=QUARTILE.EXC(array, quart)
where: array = the data range of interest
quart = 1, 2, or 3 (for the first, second, or third quartile)

Interquartile Range

The interquartile range, IQR, describes the spread of the middle 50% of the data.
Find the IQR by subtracting the first quartile from the third quartile.
IQR = Q3 – Q1
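As an illustration, the sketch below computes Q1, Q3, and the IQR for the 11-value sample used in the first-quartile example, applying the manual index-point rule. The helper name `percentile_by_index_point` is hypothetical, and Excel's QUARTILE.EXC may give different values because it uses a different technique.

```python
import math

def percentile_by_index_point(data, p):
    """pth percentile using the index-point rule i = (p / 100) * n."""
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]      # round up to the next position
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2  # midpoint of positions i, i + 1

data = [11, 12, 13, 16, 16, 17, 18, 21, 22, 22, 25]
q1 = percentile_by_index_point(data, 25)  # i = 2.75, so use position 3
q3 = percentile_by_index_point(data, 75)  # i = 8.25, so use position 9
iqr = q3 - q1

print(q1, q3, iqr)  # 13 22 9
```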

The Five-Number Summary

A five-number summary consists of these five values (shown with the values from the prior example):
• The minimum value: Min = 0.59
• The first quartile: Q1 = 2.37
• The second quartile: Q2 = 3.27
• The third quartile: Q3 = 4.26
• The maximum value: Max = 11.31

• Note that outliers are included in the five-number summary.

3.6 Measures of Association Between
Two Variables
The goal of this section is to examine two descriptive statistics that measure the linear relationship between two variables.

Measures of association between two variables: the sample covariance and the sample correlation coefficient.

Sample Covariance
The sample covariance, $s_{xy}$, measures the direction of the linear relationship between two variables.
• A relationship is linear if the scatter plot of the independent and dependent variables has a straight-line pattern.

The formula for the sample covariance is
$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
where $\bar{x}$ = the sample mean of the x variable
$\bar{y}$ = the sample mean of the y variable
$(x_i - \bar{x})$ = the difference between each data value and the sample mean for the x variable
$(y_i - \bar{y})$ = the difference between each data value and the sample mean for the y variable
n = the sample size

The covariance is useful in describing the direction of the relationship but not the strength of the relationship.

Correlation Coefficient

The sample correlation coefficient, $r_{xy}$, measures both the strength and direction of the linear relationship between two variables.

The formula for the sample correlation coefficient is
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
where $s_{xy}$ = the sample covariance between variables x and y
$s_x$ = the sample standard deviation for the x variable
$s_y$ = the sample standard deviation for the y variable

Example: Covariance and Correlation
Calculations
Example: A car dealer wants to examine the relationship between the number of sales representatives and the number of cars sold per week.
• Suppose a sample of 6 weeks is selected.
• Two values are recorded for each week:
a) Number of sales representatives
b) Number of cars sold

Week    Number of sales representatives    Number of cars sold
1       2                                  4
2       5                                  10
3       3                                  7
4       4                                  7
5       3                                  6
6       4                                  8
Example: Covariance and Correlation
Calculations

[Figure: scatterplot of the number of cars sold (y) against the number of sales representatives (x) for the 6 weeks in the sample]

Covariance Calculations

x = number of sales reps, y = number of cars sold

xi    x̄      (xi − x̄)    yi    ȳ    (yi − ȳ)    (xi − x̄)(yi − ȳ)
2     3.5    -1.5        4     7    -3          4.5
5     3.5    1.5         10    7    3           4.5
3     3.5    -0.5        7     7    0           0
4     3.5    0.5         7     7    0           0
3     3.5    -0.5        6     7    -1          0.5
4     3.5    0.5         8     7    1           0.5

Sum of (xi − x̄)(yi − ȳ) = 10
Covariance Calculations
Completing the calculation:
$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{10}{6 - 1} = 2.0$$

• A positive covariance implies a positive linear relationship (as one variable increases, the second variable also tends to increase).
• A negative covariance indicates a negative linear relationship (as one variable increases, the second variable tends to decrease).
• A covariance close to zero indicates no linear relationship between the two variables.
Correlation Coefficient Calculations

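Completing the calculation, using a sample covariance of 2.0 for the car dealer data:
$$s_x = \sqrt{\frac{5.5}{6 - 1}} \approx 1.049, \qquad s_y = \sqrt{\frac{20}{6 - 1}} = 2.0$$
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{2.0}{(1.049)(2.0)} \approx 0.953$$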
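The covariance and correlation calculations for the car dealer example can be checked with a minimal Python sketch that applies the sample formulas directly. Variable names are illustrative.

```python
import math

x = [2, 5, 3, 4, 3, 4]    # number of sales representatives each week
y = [4, 10, 7, 7, 6, 8]   # number of cars sold each week
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of cross-products of deviations, divided by n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(x_bar, y_bar)     # 3.5 7.0
print(s_xy)             # 2.0
print(round(r_xy, 3))   # 0.953
```

A value of about 0.953 indicates a strong positive linear relationship between the number of sales representatives and the number of cars sold in this sample.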

Sample Correlation Coefficient

The sample correlation coefficient, $r_{xy}$, indicates both the strength and direction of the linear relationship between the independent and dependent variables.
• The values of r range from -1.0, a perfect negative relationship, to +1.0, a perfect positive relationship.
• When r = 0, there is no linear relationship between variables x and y.

The Correlation Coefficient
Examples of approximate r values

Graph A (r = 1.0): perfect positive correlation between x and y

Graph B (r = -1.0): perfect negative correlation between x and y

Graph C (r = 0.6): a moderately positive relationship: y tends to increase as x increases, but not necessarily at the steady rate we observed in Graph A

Graph D (r = -0.4): a relatively weak negative relationship: the correlation coefficient is closer to zero, and the negative r value means y tends to decrease as x increases

Graph E (r = 0): no linear relationship between x and y
