
Chapter 3

Numerical Descriptive
Measures

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 1


Objectives

In this chapter, you learn to:


■ Describe the properties of central tendency,
variation, and shape in numerical data
■ Construct and interpret a boxplot
■ Compute descriptive summary measures for a
population
■ Calculate the covariance and the coefficient of
correlation

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 2


Summary Definitions

■ The central tendency is the extent to which the
values of a numerical variable group around a
typical or central value.
■ If the measures are computed for data from a sample,
they are called sample statistics.
■ If the measures are computed for data from a population,
they are called population parameters.
■ A sample statistic is referred to as the point estimator of
the corresponding population parameter.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 3


Summary Definitions

■ The variation is the amount of dispersion or
scattering away from a central value that the values
of a numerical variable show.
■ The shape is the pattern of the distribution of
values from the lowest value to the highest value.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 4


Shape of a Distribution

■ A distribution is a group of scores (measures).
■ If a distribution is graphed, the resulting bar graph
or histogram can have any “shape”.
■ The most common shape you will see is a bell
curve.
■ Bell-shaped distributions are also called normal
distributions or Gaussian distributions.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 5


Shape of a Distribution

[Figure: Normal Distribution]

■ One important characteristic of normal distributions is
that most of the scores pile up in the middle.
■ Normal distributions are symmetrical in that the right
and left sides of the graph are identical.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 6


Shape of a Distribution
[Figure: Positively Skewed Distribution]

■ Graphs can deviate from the bell shape because of skew.
■ A skewed distribution is asymmetrical:
■ the right and left sides are not identical.
■ scores pile up at the lower or upper end.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 7


Shape of a Distribution

■ Distributions also vary in kurtosis, which is the extent to
which they have an exaggerated peak versus a flatter appearance.
■ Distributions that have a higher, more exaggerated peak than a
normal curve are called leptokurtic.
■ Distributions that have a flatter peak are called platykurtic.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 8


Shape of a Distribution

■ Describes how data are distributed
■ Two useful shape-related statistics are:
■ Skewness
■ Measures the extent to which data values are not
symmetrical
■ Kurtosis
■ Measures the peakedness of the curve of
the distribution, that is, how sharply the curve
rises approaching the center of the distribution

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 9


Shape of a Distribution (Skewness)

■ Measures the extent to which data are not symmetrical

Shape:                Left-Skewed      Symmetric        Right-Skewed
Mean vs. median:      Mean < Median    Mean = Median    Median < Mean
Skewness statistic:   < 0              0                > 0

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 10


Shape of a Distribution (Kurtosis)

■ Kurtosis measures how sharply the curve rises approaching the
center of the distribution

Sharper peak than bell-shaped:   Kurtosis > 0
Bell-shaped:                     Kurtosis = 0
Flatter than bell-shaped:        Kurtosis < 0

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 11


Measures of Central Tendency:

■ Measures of central tendency are also called measures of location.

■ Mean
■ Median
■ Mode
■ Weighted Mean
■ Geometric Mean
■ Percentiles
■ Quartiles

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 12


Measures of Central Tendency:

■ You are probably already familiar with the notion of
central tendency
■ If your five history exam scores for a semester were
33%, 81%, 86%, 96%, and 96%, the “center” of these
scores summarizes your academic performance in
history.
■ What is the central value of this distribution?

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 13


Measures of Central Tendency:
■ Your history instructor could use the mean, the median,
or the mode to find the center of your scores.
■ If your instructor used the arithmetic average (i.e., the
mean, 78.4%), you would get a C.
■ Although the mean is the most common measure of central
tendency, there are other options.
■ She could use the middle test score (the median, 86%), and
you would get a B.
■ She could also use the most frequently occurring test score
(the mode, 96%), and you would get an A.
■ You will prefer the last one.
■ There are rules of thumb that help you decide when to use each of
these measures of center.
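
The three candidate "centers" of the exam scores can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, mode

# The five history exam scores (in percent) from the example above.
scores = [33, 81, 86, 96, 96]

center_mean = mean(scores)      # arithmetic average of the scores
center_median = median(scores)  # middle value once the scores are sorted
center_mode = mode(scores)      # most frequently occurring score

print(f"mean = {center_mean}, median = {center_median}, mode = {center_mode}")
```

The single low score of 33% pulls the mean well below the median and the mode, which is why the three measures lead to different letter grades.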

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 14


Measures of Central Tendency:
■ When the data are nominal (i.e., when the data are
categories rather than values), you must use the mode
to summarize the center.
■ The median is the best option when data are ordinal.
■ When working with interval or ratio data, you need to
choose between the mean and the median to represent
the center.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 15


Measures of Central Tendency:
■ In general, you should use the mean to summarize
interval or ratio data
■ If the data set contains one or more “extreme” scores
that are very different from the other scores, you
should use the median
■ Statisticians would consider the 38 value an outlier
because it is a very extreme score compared with the
rest of the scores in the distribution

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 16


Measures of Central Tendency:
When to Use Measures of Central Tendency

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 17


Measures of Central Tendency:

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 18


Measures of Central Tendency:

■ When a distribution is symmetrical (or close to


being symmetrical), the mean, the median, and the
mode are all very similar in value.
■ When a distribution is very asymmetrical, the mean,
the median, and the mode are different.
■ In asymmetrical distributions, the mean is “pulled”
toward the distribution’s longer tail.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 19


Measures of Central Tendency:
The Mean

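■ The arithmetic mean (often just called the “mean”) is
the most common measure of central tendency

■ For a sample of size n:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

where $\bar{X}$ (pronounced x-bar) is the sample mean, $X_i$ is the ith
observed value, and n is the sample size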

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 20


Measures of Central Tendency:
The Mean (con’t)

■ The most common measure of central tendency


■ Mean = sum of values divided by the number of values
■ Affected by extreme values (outliers)

[Figure: two dot plots on an axis from 11 to 20. Left: Mean = 13. Right: Mean = 14.]

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 21


Measures of Central Tendency:
The Mean (con’t)

■ The Mean as the Center of Balance for the Dot


Plot of the Classroom Size Data

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 22


Measures of Central Tendency:
The Median

■ The median is the midpoint of a distribution of


scores
■ In an ordered array, the median is the “middle”
number (50% above, 50% below)
■ When working with a list of scores, you begin by
putting the scores in order from lowest to highest (or
highest to lowest).
■ Less sensitive than the mean to extreme values

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 23


Measures of Central Tendency:
The Median

■ In an ordered array, the median is the “middle”


number (50% above, 50% below)

[Figure: two dot plots on an axis from 11 to 20. Left: Median = 13. Right: Median = 13.]

■ Less sensitive than the mean to extreme values

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 24


Measures of Central Tendency:
Locating the Median

■ The location of the median when the values are in numerical order
(smallest to largest):

Median position = (n + 1)/2 position in the ordered data

■ If the number of values is odd, the median is the middle number

■ If the number of values is even, the median is the average of the
two middle numbers

Note that (n + 1)/2 is not the value of the median, only the position of
the median in the ranked data

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 25


Measures of Central Tendency:
The Mode
■ Value that occurs most often
■ Not affected by extreme values
■ Used for either numerical or categorical data
■ There may be no mode
■ There may be several modes

[Figure: two dot plots. Left, on an axis from 0 to 14: Mode = 9. Right, on an axis from 0 to 6: No Mode.]
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 26
Measures of Central Tendency:
Review Example

House Prices:
$2,000,000
$  500,000
$  300,000
$  100,000
$  100,000
Sum $3,000,000

▪ Mean: $3,000,000/5 = $600,000
▪ Median: middle value of ranked data = $300,000
▪ Mode: most frequent value = $100,000

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 27


Weighted Mean

■ In some instances, the mean is computed by
giving each observation a weight that reflects
its relative importance or frequency
■ The choice of weights depends on the
application

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 28


Weighted Mean
Ron Butler, a home builder, is looking over the expenses he incurred for a
house he just built. For the purpose of pricing future projects, he would like to
know the average wage ($/hour) he paid the workers he employed. Listed
below are the categories of workers he employed, along with their respective
wage and total hours worked.

Worker Wage ($/hr) Total Hours


Carpenter 21.60 520

Electrician 28.72 230

Laborer 11.80 410

Painter 19.75 270

Plumber 24.16 160

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 29


Weighted Mean

■ Example: Construction Wages

FYI, the equally-weighted (simple) mean = $21.21
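
As a sketch of the construction wage example, the weighted mean can be computed by weighting each hourly wage by the total hours worked at that wage. The dictionary layout and variable names below are illustrative choices, not part of the original slides.

```python
# Wage ($/hr) and total hours per worker category, from the table above.
wages = {
    "Carpenter": (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer": (11.80, 410),
    "Painter": (19.75, 270),
    "Plumber": (24.16, 160),
}

total_pay = sum(wage * hours for wage, hours in wages.values())
total_hours = sum(hours for _, hours in wages.values())

# Weighted mean: each wage counts in proportion to the hours paid at it.
weighted_mean = total_pay / total_hours

# Simple mean: every category counts equally, regardless of hours.
simple_mean = sum(wage for wage, _ in wages.values()) / len(wages)

print(f"weighted mean = ${weighted_mean:.2f}/hr")
print(f"simple mean   = ${simple_mean:.2f}/hr")
```

The weighted mean comes out lower than the simple mean because the lowest-paid category (laborers) accounts for a large share of the hours.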

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 30


Geometric Mean

■ The geometric mean is calculated by finding


the nth root of the product of n values.
■ It is often used in analyzing growth rates in
financial data (where using the arithmetic
mean will provide misleading results).
■ It should be applied anytime you want to
determine the mean rate of change over
several successive periods (be it years,
quarters, weeks, . . .).
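
A minimal sketch of the idea, using hypothetical returns of +10% and -10% over two periods (these figures are invented for illustration and are not the data behind the slides' own example). Each return is converted to a growth factor, the geometric mean of the factors is taken, and 1 is subtracted to get the mean rate of change.

```python
from statistics import geometric_mean

# Hypothetical period returns (assumption for illustration only).
returns = [0.10, -0.10]

# Convert each return to a growth factor, e.g. +10% -> 1.10.
factors = [1 + r for r in returns]

# Geometric mean = nth root of the product of the n growth factors.
mean_factor = geometric_mean(factors)
mean_rate_pct = (mean_factor - 1) * 100

# The arithmetic mean of the returns, for comparison.
arithmetic_pct = sum(returns) / len(returns) * 100

print(f"geometric mean rate  = {mean_rate_pct:.3f}% per period")
print(f"arithmetic mean rate = {arithmetic_pct:.3f}% per period")
```

The arithmetic mean suggests no change, yet an investment that gains 10% and then loses 10% ends slightly below where it started; the geometric mean reflects that loss.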

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 31


Geometric Mean

■ Other common applications include changes in


populations of species, crop yields, pollution
levels, and birth and death rates.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 32


Geometric Mean

■ Example: Rate of Return

The average growth rate per period is (0.97752 – 1)(100) = –2.248%.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 33


Measures of Central Tendency:
Which Measure to Choose?

▪ The mean is generally used, unless extreme values
(outliers) exist.
▪ The median is often used when outliers are present, since it is
not sensitive to extreme values. For example, median
home prices may be reported for a region.
▪ In some situations, it makes sense to report both the
mean and the median.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 34


Measures of Central Tendency:
Summary

Central Tendency

▪ Arithmetic Mean
▪ Median: middle value in the ordered array
▪ Mode: most frequently observed value

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 35


Measures of Variation
■ The mean is commonly used to
summarize the center of a
distribution of scores measured
on an interval or ratio scale.
■ Although the mean does a good job
describing the center of scores, it
is also important to describe how
“spread out from the center” the scores
are.

[Figure: two distributions with the same center but different variation]

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 36


Measures of Variation
■ There are several ways to
describe the variability of
interval/ratio data
■ The easiest measure of variability
is the range
■ The most common measure of
variability is the standard
deviation

[Figure: two distributions with the same center but different variation]

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 37


Measures of Variation
Variation

▪ Range
▪ Variance
▪ Standard Deviation
▪ Coefficient of Variation

■ Measures of variation give
information on the spread or
variability or dispersion of
the data values.

[Figure: two distributions with the same center but different variation]
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 38
Measures of Variation:
The Range

▪ Simplest measure of variation


▪ Difference between the largest and the smallest
values:

Range = Xlargest – Xsmallest

Example:

[Figure: dot plot on an axis from 0 to 14; smallest value 1, largest value 13]

Range = 13 - 1 = 12

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 39


Measures of Variation:
Why The Range Can Be Misleading

▪ Does not account for how the data are distributed

[Figure: two dot plots on an axis from 7 to 12 whose values are distributed differently; both have Range = 12 - 7 = 5]

▪ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 40


Measures of Variation:
The Sample Variance

■ Average (approximately) of squared deviations
of values from the mean

■ Sample variance:

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Where $\bar{X}$ = arithmetic mean
n = sample size
Xi = ith value of the variable X

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 41


Measures of Variation:
The Sample Standard Deviation

■ Most used measure of variation
■ Shows variation about the mean
■ Is the square root of the variance
■ Has the same units as the original data

■ Sample standard deviation:

$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 42


Measures of Variation:
The Standard Deviation

Steps for Computing Standard Deviation

1. Compute the difference between each value and


the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get
the sample standard deviation.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 43


Measures of Variation:
The Standard Deviation
Summary of Five Steps to Computing a Sample’s Standard
Deviation

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 44
Measures of Variation:
Sample Standard Deviation:
Calculation Example

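Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16

$S = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} = 4.3095$

A measure of the “average” scatter around the mean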
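
The five steps for computing a sample standard deviation can be traced on this sample with a short sketch. The variable names are illustrative; the data are the eight sample values above.

```python
from math import sqrt

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                      # sample mean

# Step 1: difference between each value and the mean.
differences = [x - x_bar for x in data]
# Step 2: square each difference.
squared = [d ** 2 for d in differences]
# Step 3: add the squared differences.
sum_of_squares = sum(squared)
# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)
# Step 5: take the square root to get the sample standard deviation.
sample_std_dev = sqrt(sample_variance)

print(f"mean = {x_bar}, sum of squares = {sum_of_squares}")
print(f"variance = {sample_variance:.4f}, standard deviation = {sample_std_dev:.4f}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and should agree with these hand-computed values.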
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 45
Measures of Variation:
Comparing Standard Deviations

Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567

[Figure: dot plots of the three data sets, each on an axis from 11 to 21]

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 46


Measures of Variation:
Comparing Standard Deviations

Smaller standard deviation

Larger standard deviation

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 47


Measures of Variation:
Summary Characteristics

▪ Standard deviation can be difficult to interpret


as a single number on its own
▪ The more the data are spread out, the greater
the range, variance, and standard deviation.
▪ The more the data are concentrated, the
smaller the range, variance, and standard
deviation.
▪ If the values are all the same (no variation), all
these measures will be zero.
▪ None of these measures are ever negative.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 48


Measures of Variation:
Summary Characteristics

▪ The standard deviation is affected by outliers


(extremely low or extremely high numbers in the
data set).
▪ That’s because the standard deviation is based on
the distance from the mean. 
▪ And remember, the mean is also affected by outliers.
▪ The standard deviation has the same units of
measure as the original data.
▪ If you’re talking about inches, the standard deviation
will be in inches.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 49


Measures of Variation:
Summary Characteristics

▪ More precisely, the standard deviation is a measure of the
typical (“average”) distance between the values in the data
set and the mean.
▪ Many of the data values lie within ± 1 standard deviation of
the mean

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 50


Measures of Variation:
The Coefficient of Variation

■ Measures relative variation
■ Always expressed as a percentage (%)
■ Shows variation relative to the mean: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
■ Can be used to compare the variability of two or
more sets of data measured in different units

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 51


Measures of Variation:
Comparing Coefficients of Variation
■ Stock A:
■ Average price last year = $50
■ Standard deviation = $5
■ CV = ($5/$50) · 100% = 10%

■ Stock B:
■ Average price last year = $100
■ Standard deviation = $5
■ CV = ($5/$100) · 100% = 5%

Both stocks have the same standard deviation, but stock B is less
variable relative to its price

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 52


Measures of Variation:
Comparing Coefficients of Variation (con’t)
■ Stock A:
■ Average price last year = $50
■ Standard deviation = $5
■ CV = ($5/$50) · 100% = 10%

■ Stock C:
■ Average price last year = $8
■ Standard deviation = $2
■ CV = ($2/$8) · 100% = 25%

Stock C has a much smaller standard deviation but a much higher
coefficient of variation
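
The comparison of stocks A, B, and C can be reproduced with a small sketch. The helper name `coefficient_of_variation` is an illustrative choice; the formula is the standard deviation divided by the mean, expressed as a percentage.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

# (average price, standard deviation) for each stock, from the slides.
stocks = {"A": (50, 5), "B": (100, 5), "C": (8, 2)}

cv = {
    name: coefficient_of_variation(std_dev, mean)
    for name, (mean, std_dev) in stocks.items()
}

for name, value in cv.items():
    print(f"Stock {name}: CV = {value:.0f}%")
```

Stock C has the smallest standard deviation in dollars but the largest variation relative to its price.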

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 53


Locating Extreme Outliers:
Z-Score

▪ The mean perfectly balances the positive and


negative deviation scores of a distribution.
▪ The sum of the positive deviation scores will
always equal the sum of the negative deviation
scores.
▪ Standard deviation describes how much
variability there is in a set of numbers.
▪ The mean and the standard deviation help you
interpret a distribution of scores by telling you
the “center” of the scores and how much
scores vary around that center.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 54


Locating Extreme Outliers:
Z-Score

▪ For example, suppose your score on the GMAT
was 25.
▪ This score alone doesn’t tell you much about
your performance.
▪ But if you knew that the mean GMAT score was 21 with
a standard deviation of 4.70,
you could interpret your score.
▪ Your score of 25 was 4 points better than the
population mean.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 55


Locating Extreme Outliers:
Z-Score

▪ The population standard deviation was 4.70;


this means that your score of 25 (i.e., +4 from
the mean) deviated less from the mean than
was typical (4.70).
▪ So, you did better than average but only a little
better because your score was less than 1
standard deviation above the mean.
▪ A Z-score tells you exactly how far your GMAT score is
from the mean, in standard deviation units.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 56


Locating Extreme Outliers:
Z-Score

▪ To compute the Z-score of a data value,


subtract the mean and divide by the standard
deviation.
▪ The Z-score is the number of standard
deviations a data value is from the mean.
▪ A data value is considered an extreme outlier if
its Z-score is less than -3.0 or greater than
+3.0.
▪ The larger the absolute value of the Z-score,
the farther the data value is from the mean.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 57


Locating Extreme Outliers:
Z-Score

▪ A z score will indicate if a given score is very


good (far above the mean), very bad (far below
the mean), or average (close to the mean).
▪ When looking at GMAT scores, larger positive z
scores represent better performance and larger
negative z scores represent worse
performance.
▪ A z score for a single score can help you compare
two scores from different distributions.
▪ For example, comparing a TOEFL score and a GMAT score.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 58


Locating Extreme Outliers:
Z-Score

$Z = \frac{X - \bar{X}}{S}$

where X represents the data value
$\bar{X}$ is the sample mean
S is the sample standard deviation

Your GMAT single score:
Z = (25 – 21)/4.7 = 0.851, slightly less than 1 standard deviation above the mean

What score would you need to be 3 standard deviations above the mean?

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 59


Locating Extreme Outliers:
Z-Score

▪ Suppose the mean math SAT score is 490, with a
standard deviation of 100.
▪ Compute the Z-score for a test score of 620.

$Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = 1.3$

A score of 620 is 1.3 standard deviations above the
mean and would not be considered an outlier.
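
Both Z-score examples (GMAT and SAT) can be checked with a short sketch. The function names are illustrative; the outlier rule is the one stated earlier, a Z-score below -3.0 or above +3.0.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def is_extreme_outlier(z):
    """Apply the rule of thumb: |Z| greater than 3.0."""
    return abs(z) > 3.0

gmat_z = z_score(25, mean=21, std_dev=4.70)
sat_z = z_score(620, mean=490, std_dev=100)

# GMAT score that sits exactly 3 standard deviations above the mean.
gmat_score_at_z3 = 21 + 3 * 4.70

print(f"GMAT z = {gmat_z:.3f}, extreme outlier: {is_extreme_outlier(gmat_z)}")
print(f"SAT  z = {sat_z:.1f}, extreme outlier: {is_extreme_outlier(sat_z)}")
print(f"GMAT score at z = 3: {gmat_score_at_z3:.1f}")
```

Because Z-scores are unit-free, the GMAT and SAT results can be compared directly even though the two tests use different scales.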

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 60


Z-Score and Standard Normal
Curve
▪ Z scores enable us to locate any score in a
distribution of scores
▪ They provide a very systematic way to
compare any score to any other score
▪ Positive z scores are above average, and
positive z scores greater than +1 are further above
the average

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 61


Z-Score and Standard Normal
Curve
▪ Suppose the distribution of raw scores from which the
z scores are derived is normally shaped
■ A normally shaped distribution of z scores
enables researchers to
make very precise probability statements about
any score in a distribution.
We will discuss this when we deal with the normal distribution

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 62


General Descriptive Stats Using
Microsoft Excel Functions

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 63


General Descriptive Stats Using
Microsoft Excel Data Analysis Tool
1. Select Data.
2. Select Data Analysis.
3. Select Descriptive
Statistics and click OK.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 64


General Descriptive Stats Using
Microsoft Excel

4. Enter the cell


range.
5. Check the
Summary
Statistics box.
6. Click OK

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 65


Excel output
Microsoft Excel
descriptive statistics
output, using the house
price data:

House Prices:

$2,000,000
500,000
300,000
100,000
100,000

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 66


Minitab Output
Minitab descriptive statistics output using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Descriptive Statistics: House Price

Variable      Total Count   Mean     SE Mean   StDev    Variance      Sum       Minimum
House Price   5             600000   357771    800000   6.40000E+11   3000000   100000

Variable      Median   Maximum   Range     Mode     N for Mode   Skewness   Kurtosis
House Price   300000   2000000   1900000   100000   2            2.01       4.13

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 67


Quartile Measures
■ Quartiles split the ranked data into 4 segments with
an equal number of values per segment

25% 25% 25% 25%

Q1 Q2 Q3

■ The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
■ Q2 is the same as the median (50% of the observations
are smaller and 50% are larger)
■ Only 25% of the observations are greater than the third
quartile

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 68


Quartile Measures:
Locating Quartiles

Find a quartile by determining the value in the


appropriate position in the ranked data, where

First quartile position: Q1 = (n+1)/4 ranked value

Second quartile position: Q2 = (n+1)/2 ranked value

Third quartile position: Q3 = 3(n+1)/4 ranked value

where n is the number of observed values

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 69


Quartile Measures:
Calculation Rules

■ When calculating the ranked position use the


following rules
■ If the result is a whole number then it is the ranked
position to use

■ If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.)


then average the two corresponding data values.

■ If the result is not a whole number or a fractional half


then round the result to the nearest integer to find the
ranked position.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 70


Quartile Measures:
Locating Quartiles

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data
so use the value half way between the 2nd and 3rd values,

so Q1 = 12.5
Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 71
Quartile Measures
Calculating The Quartiles: Example

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5

Q2 is in the (9+1)/2 = 5th position of the ranked data,


so Q2 = median = 16

Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,


so Q3 = (18+21)/2 = 19.5
Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency
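
The position rules used in this example can be sketched as a small function. This follows the slides' convention (whole position, fractional half, otherwise round to the nearest integer); library routines such as `statistics.quantiles` or NumPy's percentile functions may use different interpolation conventions and can return slightly different values. The function names are illustrative, and the sketch assumes at least two data values.

```python
def ranked_value(sorted_data, position):
    """Value at a 1-based ranked position, using the slides' rules."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        # Whole number: use that ranked position directly.
        return sorted_data[whole - 1]
    if fraction == 0.5:
        # Fractional half: average the two neighbouring ranked values.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Otherwise: round the position to the nearest integer.
    return sorted_data[round(position) - 1]

def quartiles(data):
    """Return (Q1, Q2, Q3) at positions (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    ordered = sorted(data)
    n = len(ordered)
    return tuple(ranked_value(ordered, k * (n + 1) / 4) for k in (1, 2, 3))

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = quartiles(sample)
iqr = q3 - q1  # interquartile range

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

For the nine sample values this reproduces the positions 2.5, 5, and 7.5 worked out above.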
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 72
Quartile Measures:
The Interquartile Range (IQR)

■ The IQR is Q3 – Q1 and measures the spread in the


middle 50% of the data

■ The IQR is also called the midspread because it covers


the middle 50% of the data

■ The IQR is a measure of variability that is not


influenced by outliers or extreme values

■ Measures like Q1, Q3, and IQR that are not influenced
by outliers are called resistant measures

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 73


Calculating The Interquartile Range

Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(each of the four segments between these values contains 25% of the data)

Interquartile range
= 57 – 30 = 27

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 74


The Five Number Summary

The five numbers that help describe the center, spread


and shape of data are:
▪ Xsmallest
▪ First Quartile (Q1)
▪ Median (Q2)
▪ Third Quartile (Q3)
▪ Xlargest

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 75


Relationships among the five-number
summary and distribution shape

Comparison                                     Left-Skewed   Symmetric   Right-Skewed
(Median – Xsmallest) vs. (Xlargest – Median)   >             ≈           <
(Q1 – Xsmallest) vs. (Xlargest – Q3)           >             ≈           <
(Median – Q1) vs. (Q3 – Median)                >             ≈           <

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 76


Five Number Summary and
The Boxplot

■ The Boxplot: A Graphical display of the data


based on the five-number summary:
Xsmallest -- Q1 -- Median -- Q3 -- Xlargest
Example:

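[Figure: boxplot with whiskers running from Xsmallest to Q1 and from Q3 to Xlargest, and a box from Q1 to Q3 divided at the Median; each of the four segments contains 25% of the data]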

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 77


Five Number Summary:
Shape of Boxplots
■ If data are symmetric around the median then the box
and central line are centered between the endpoints

Xsmallest Q1 Median Q3 Xlargest

■ A Boxplot can be shown in either a vertical or horizontal


orientation

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 78


Distribution Shape and
The Boxplot

[Figure: three boxplots labeled Left-Skewed, Symmetric, and Right-Skewed, each marked with Q1, Q2, and Q3]

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 79


Boxplot Example

■ Below is a Boxplot for the following data:

0 2 2 2 3 3 4 5 5 9 27

Five-number summary: Xsmallest = 0, Q1 = 2, Q2 / Median = 3, Q3 = 5, Xlargest = 27

■ The data are right skewed, as the plot depicts

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 80


Measuring the Skew
■ A useful measure of the direction and the extent of
the skew is provided by Pearson’s coefficient of
skewness (SK).

 
$SK = \frac{3\,(\text{mean} - \text{median})}{\text{standard deviation}}$
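
As a quick illustration, Pearson's coefficient can be applied to the house price data used earlier in the chapter (mean $600,000, median $300,000, standard deviation $800,000, as reported in the software output). The function name is an illustrative choice.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / std dev."""
    return 3 * (mean - median) / std_dev

# Summary values for the five house prices shown earlier in the chapter.
sk_house = pearson_skewness(mean=600_000, median=300_000, std_dev=800_000)

print(f"SK = {sk_house}")
```

The positive value indicates right skew: the single $2,000,000 house pulls the mean above the median. Note that this is a different statistic from the moment-based skewness that statistical software typically reports, so the two numbers are not expected to match.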

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 81


Measuring the Skew

■ The table below shows the weekly output of
cell phone devices produced by the 200
production workers in a cell phone company.

(a) Find the arithmetic mean of the weekly output.
(b) Find the median weekly output.
(c) Why do (a) and (b) differ?
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 82
Measuring the Skew

■ Before calculating the solution to the example, look
briefly at the table of data and decide on
the direction of skew and the likely impact this will
have on the mean and median values.
■ Solution (a): Since we are dealing with grouped data,
we must take class mid-points for the variable Xi. We
then use the formula:

$\bar{X} = \frac{\sum F_i X_i}{\sum F_i}$

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 83


Measuring the Skew

■ The arithmetic mean
of the weekly output is
234.9 units.

■ Solution (b): The median lies in the class that contains
the (n/2)th observation and is estimated by

$\text{Median} = l + h \times \left(\frac{\frac{n}{2} - cf}{f}\right)$

where
l = lower limit of the median class
n = number of observations
f = frequency of the median class
h = class size
cf = cumulative frequency of the class preceding the median class
 
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 84
Measuring the Skew

■ The median class
interval for output is
220–240 units, as
shown in the table
below

Output      Mid Point   No. of Employees   No. of Employees (cumulative)
100 - 160   130         1                  1
160 - 180   170         5                  6
180 - 200   190         10                 16
200 - 220   210         35                 51
220 - 240   230         55                 106
240 - 260   250         74                 180
260 - 300   280         20                 200

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 85


Measuring the Skew

■ The median class interval for output is 220–240


units


Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 86


Measuring the Skew

■ The arithmetic mean output (234.9 units) is
lower than the median output (238 units).
■ We would expect this to be the case, since the data are
clearly skewed to the left: the arithmetic mean
(simple average) is pulled down by the few
extremely low values.

Output      Mid Point   No. of Employees   No. of Employees (cumulative)
100 - 160   130         1                  1
160 - 180   170         5                  6
180 - 200   190         10                 16
200 - 220   210         35                 51
220 - 240   230         55                 106
240 - 260   250         74                 180
260 - 300   280         20                 200
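
The grouped mean and grouped median for this table can be verified with a short sketch. Class midpoints stand in for the individual values, and the median is interpolated within the median class; the function and variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each output class in the table.
classes = [
    (100, 160, 1),
    (160, 180, 5),
    (180, 200, 10),
    (200, 220, 35),
    (220, 240, 55),
    (240, 260, 74),
    (260, 300, 20),
]

n = sum(freq for _, _, freq in classes)

# Grouped mean: sum of (frequency x class midpoint) divided by total frequency.
grouped_mean = sum(freq * (low + high) / 2 for low, high, freq in classes) / n

def grouped_median(classes):
    """Interpolate the median inside the class holding the (n/2)th value."""
    total = sum(freq for _, _, freq in classes)
    cumulative = 0  # cumulative frequency of the classes before this one
    for low, high, freq in classes:
        if cumulative + freq >= total / 2:
            class_size = high - low
            return low + class_size * (total / 2 - cumulative) / freq
        cumulative += freq

median_output = grouped_median(classes)

print(f"n = {n}, mean = {grouped_mean:.1f}, median = {median_output:.1f}")
```

The median class is 220 to 240 because the cumulative count first reaches 100 (half of 200) there; the interpolated median of about 237.8 rounds to the 238 units quoted above.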

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 87


Numerical Descriptive Measures
for a Population

▪ Descriptive statistics discussed previously


described a sample, not the population.
▪ Summary measures describing a population,
called parameters, are denoted with Greek
letters.
▪ Important population parameters are the
population mean, variance, and standard
deviation.

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 88


Numerical Descriptive Measures
for a Population: The mean µ

■ The population mean is the sum of the values in
the population divided by the population size, N

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Where μ = population mean
N = population size
Xi = ith value of the variable X

Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 89


Numerical Descriptive Measures
for a Population: The mean µ
■ Why do the population formulas divide by N, while the
sample variance divides by n – 1?
■ The sample variance uses n – 1 because that is
the number of degrees of freedom in the sample.
■ The deviations of the sample values from the sample mean
must sum to 0, so if you know the mean and all of the values
except one, you can calculate the value of the
final one.
■ Degrees of freedom refers to the maximum number
of logically independent values, that is, values
that are free to vary, in the data sample.
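
A small Python sketch may make the degrees-of-freedom argument concrete. The five sample values are hypothetical and chosen only for illustration. The code checks that the deviations from the sample mean sum to zero and that the last value can be recovered from the mean and the other n - 1 values.

```python
from statistics import mean

sample = [4, 8, 6, 5, 7]   # hypothetical sample, not from the slides
n = len(sample)
x_bar = mean(sample)

# Deviations from the sample mean always sum to zero.
deviations = [x - x_bar for x in sample]
total_deviation = sum(deviations)

# Given the mean and any n - 1 values, the remaining value is fixed:
# x_n = n * x_bar - (sum of the other values)
known_values = sample[:-1]
recovered = n * x_bar - sum(known_values)

print(total_deviation, recovered)
```

Only n - 1 of the deviations are free to vary, which is why the sample variance divides by n - 1 rather than n.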



Numerical Descriptive Measures For A
Population: The Variance σ²

■ Average of squared deviations of values from
the mean

■ Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Where μ = population mean
N = population size
Xi = ith value of the variable X



Numerical Descriptive Measures For A
Population: The Standard Deviation σ

■ Most commonly used measure of variation
■ Shows variation about the mean
■ Is the square root of the population variance
■ Has the same units as the original data

■ Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$



Sample statistics versus
population parameters

Measure               Population Parameter    Sample Statistic
Mean                  $\mu$                   $\bar{X}$
Variance              $\sigma^2$              $S^2$
Standard Deviation    $\sigma$                $S$
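
To see how the population and sample versions differ in practice, the sketch below uses Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N and `variance` and `stdev` divide by n - 1. The eight data values are hypothetical.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, not from the slides
N = len(values)

# Treating the values as a complete population (divisor N)
mu = statistics.mean(values)
pop_var = statistics.pvariance(values)
pop_sd = statistics.pstdev(values)

# The same numbers written out from the definition
pop_var_manual = sum((x - mu) ** 2 for x in values) / N

# Treating the same values as a sample (divisor n - 1)
samp_var = statistics.variance(values)
samp_sd = statistics.stdev(values)

print(mu, pop_var, pop_sd)
print(samp_var, samp_sd)
```

For the same data the sample variance is always larger than the population variance, because the sum of squared deviations is divided by a smaller number.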



The Empirical Rule

■ The empirical rule approximates the variation of
data in a bell-shaped distribution
■ Approximately 68% of the data in a bell-shaped
distribution is within 1 standard deviation of the
mean, or µ ± 1σ

[Figure: bell-shaped curve with 68% of the data within µ ± 1σ]



The Empirical Rule
■ Approximately 95% of the data in a bell-shaped
distribution lies within two standard deviations of the
mean, or µ ± 2σ

■ Approximately 99.7% of the data in a bell-shaped
distribution lies within three standard deviations of the
mean, or µ ± 3σ

[Figure: bell-shaped curves with 95% of the data within µ ± 2σ and 99.7% within µ ± 3σ]



Using the Empirical Rule

▪ Suppose that Math SAT scores follow a bell-shaped
distribution with a mean of 500 and a standard deviation
of 90. Then:
▪ Approximately 68% of all test takers scored between 410
and 590 (500 ± 90).
▪ Approximately 95% of all test takers scored between 320
and 680 (500 ± 180).
▪ Approximately 99.7% of all test takers scored between
230 and 770 (500 ± 270).
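
The intervals above can be reproduced with a short Python sketch. As an assumption beyond the slide, it treats the scores as exactly normally distributed and uses `statistics.NormalDist` (Python 3.8+) to compute the share of scores inside each interval.

```python
from statistics import NormalDist

mean, sd = 500, 90                      # values given on the slide
scores = NormalDist(mu=mean, sigma=sd)  # assumption: exactly normal

intervals = {}
shares = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    intervals[k] = (low, high)
    # Share of the distribution lying between low and high
    shares[k] = scores.cdf(high) - scores.cdf(low)
    print(f"within {k} sd: {low} to {high}, {shares[k]:.1%}")
```

The computed shares are about 68.3%, 95.4%, and 99.7%, which the empirical rule rounds to 68%, 95%, and 99.7%.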



Chebyshev Rule

■ Regardless of how the data are distributed, at
least (1 – 1/k²) × 100% of the values will fall
within k standard deviations of the mean (for k > 1)
■ Examples:

At least                       Within
(1 – 1/2²) × 100% = 75%        k = 2 (μ ± 2σ)
(1 – 1/3²) × 100% = 88.89%     k = 3 (μ ± 3σ)
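
The two examples follow directly from the formula, as this minimal Python sketch shows. The function name `chebyshev_lower_bound` is a hypothetical helper introduced here for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the Chebyshev rule applies only for k > 1")
    return (1 - 1 / k ** 2) * 100


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2f}%")
```

These are lower bounds that hold for any distribution, so they are weaker than the 95% and 99.7% figures that the empirical rule gives for bell-shaped data.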



Percentiles
■ A percentile provides information about how
the data are spread over the interval from the
smallest value to the largest value.
■ Admission test scores for colleges and
universities are frequently reported in terms of
percentiles.
■ The pth percentile of a data set is a value such
that at least p percent of the items take on this
value or less and at least (100 – p) percent of
the items take on this value or more.



Percentiles
■ Arrange the data in ascending order.
■ Compute $L_p$, the location of the pth percentile.



Percentiles
■ 80th Percentile

Example: Apartment Rents

The 80th percentile is the 56th value plus 0.8
times the difference between the 57th and 56th
values. So, the 80th percentile = 635 + 0.8(649 – 635) = 646.2.
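
The interpolation step can be sketched in Python. The location rule used below, L_p = (p/100)(n + 1), is an assumption: the slides do not show the formula in the extracted text, although a location of 56.8 is what this rule would give for 70 observations. The function names and the small test data set are hypothetical.

```python
def interpolate(lower_value, upper_value, fraction):
    """Move `fraction` of the way from lower_value to upper_value."""
    return lower_value + fraction * (upper_value - lower_value)


def percentile(values, p):
    """pth percentile using the assumed location rule L_p = (p/100)(n + 1)."""
    data = sorted(values)            # step 1: arrange in ascending order
    n = len(data)
    location = p * (n + 1) / 100     # step 2: location of the pth percentile
    position = int(location)         # whole part: 1-based rank of lower value
    fraction = location - position   # decimal part: how far to interpolate
    if position < 1:
        return data[0]
    if position >= n:
        return data[-1]
    return interpolate(data[position - 1], data[position], fraction)


# Interpolation from the slide: 56th value 635, 57th value 649, fraction 0.8
print(round(interpolate(635, 649, 0.8), 1))

# Hypothetical data: the values 1..9
print(percentile(range(1, 10), 25))
```

Other software may use different location rules, so percentiles computed elsewhere can differ slightly from this result.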

Percentiles
■ 80th Percentile

Example: Apartment Rents


80th percentile = 635 + 0.8(649 – 635) = 646.2.
“At least 80% of the items take on a value of 646.2 or less.”
“At least 20% of the items take on a value of 646.2 or more.”

Percentiles
■ 75th Percentile is 3rd Quartile
Example:

X minimum    Q1    Median (Q2)    Q3    X maximum
12           30    45             57    70

Each of the four segments between these values contains 25% of the data.

Interquartile range = Q3 – Q1 = 57 – 30 = 27
75th percentile = Q3 = 57

We Discuss Two Measures Of The Relationship
Between Two Numerical Variables

■ Scatter plots allow you to visually examine the
relationship between two numerical variables,
and now we will discuss two quantitative
measures of such relationships.

▪ The Covariance
▪ The Coefficient of Correlation

The Covariance

■ The covariance measures the strength of the linear
relationship between two numerical variables (X & Y)

■ The sample covariance: $\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$

■ Only concerned with the strength of the relationship
■ No causal effect is implied

Interpreting Covariance

■ Covariance between two variables:

cov(X,Y) > 0 X and Y tend to move in the same direction
cov(X,Y) < 0 X and Y tend to move in opposite directions
cov(X,Y) = 0 X and Y have no linear relationship (this alone does not
make them independent)

■ The covariance has a major flaw:
■ It is not possible to determine the relative strength of the
relationship from the size of the covariance
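
The flaw can be illustrated with a short Python sketch. The study-time and test-score data are hypothetical, and `sample_covariance` is a helper written here from the definition with divisor n - 1. Changing the units of X from hours to minutes changes the size of the covariance even though the relationship itself is unchanged.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: study time and test score for five students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

cov_hours = sample_covariance(hours, score)

# Same relationship, but study time expressed in minutes
minutes = [h * 60 for h in hours]
cov_minutes = sample_covariance(minutes, score)

print(cov_hours, cov_minutes)
```

The sign is positive in both cases, but the second covariance is 60 times the first, so the size alone says nothing about how strong the relationship is.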

Coefficient of Correlation

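■ Measures the relative strength of the linear
relationship between two numerical variables
■ Sample coefficient of correlation: $r = \dfrac{\operatorname{cov}(X, Y)}{S_X S_Y}$

where
$\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$, $S_X = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$, $S_Y = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$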

Features of the
Coefficient of Correlation
■ The population coefficient of correlation is referred to as ρ.
■ The sample coefficient of correlation is referred to as r.
■ Both ρ and r have the following features:
■ Unit free
■ Range between –1 and 1
■ The closer to –1, the stronger the negative linear relationship
■ The closer to 1, the stronger the positive linear relationship
■ The closer to 0, the weaker the linear relationship
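
These features can be checked with a brief Python sketch. The data are hypothetical, and `sample_correlation` is a helper written from the definition of r as the covariance divided by the product of the standard deviations. The n - 1 divisors cancel, so the code works directly with sums of deviations.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Sample coefficient of correlation r; the n - 1 divisors cancel out."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: study time and test score for five students
hours = [1, 2, 3, 4, 5]
minutes = [h * 60 for h in hours]
score = [52, 55, 61, 64, 68]

r_hours = sample_correlation(hours, score)
r_minutes = sample_correlation(minutes, score)   # unit free: same r

# A perfectly negative linear relationship gives r = -1
r_negative = sample_correlation([1, 2, 3], [3, 2, 1])

print(round(r_hours, 3), round(r_minutes, 3), r_negative)
```

Unlike the covariance, r does not change when hours are converted to minutes, and it always lies between –1 and 1.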

Scatter Plots of Sample Data with
Various Coefficients of Correlation
[Figure: five scatter plots of Y against X, with r = –1, r = –0.6, r = +1, r = +0.3, and r = 0]

The Coefficient of Correlation Using
Microsoft Excel Function

The Coefficient of Correlation Using
Microsoft Excel Data Analysis Tool
1. Select Data
2. Choose Data Analysis
3. Choose Correlation and click OK

The Coefficient of Correlation
Using Microsoft Excel

4. Input data range and select appropriate options
5. Click OK to get output

Interpreting the Coefficient of Correlation
Using Microsoft Excel

▪ r = 0.733

▪ There is a relatively strong positive linear
relationship between test score #1 and test score #2.

▪ Students who scored high on the first test tended to
score high on the second test.

Pitfalls in Numerical
Descriptive Measures

■ Data analysis is objective
■ You should report the summary measures that best
describe and communicate the important aspects of
the data set

■ Data interpretation is subjective
■ It should be done in a fair, neutral, and clear manner

Ethical Considerations

Numerical descriptive measures:

■ Should document both good and bad results
■ Should be presented in a fair, objective, and
neutral manner
■ Should not use inappropriate summary
measures to distort facts

Chapter Summary

In this chapter we have discussed:


■ Describing the properties of central tendency,
variation, and shape in numerical data
■ Constructing and interpreting a boxplot
■ Computing descriptive summary measures for a
population
■ Calculating the covariance and the coefficient of
correlation
