3.0 Numerical Descriptive Measures

Chapter 3: Numerical Descriptive Measures
Copyright © 2016, 2013, 2010 Pearson Education, Inc. Chapter 3, Slide 1

Objectives
In this chapter, you learn to:

■ Describe the properties of central tendency,
variation, and shape in numerical data
■ Construct and interpret a boxplot
■ Compute descriptive summary measures for a
population
■ Calculate the covariance and the coefficient of
correlation

Summary Definitions
■ The central tendency is the extent to which the values of a numerical variable group around a typical or central value.
■ If the measures are computed for data from a sample, they are called sample statistics.
■ If the measures are computed for data from a population, they are called population parameters.
■ A sample statistic is referred to as the point estimator of the corresponding population parameter.

Summary Definitions
■ The variation is the amount of dispersion or scattering away from a central value that the values of a numerical variable show.
■ The shape is the pattern of the distribution of values from the lowest value to the highest value.

Shape of a Distribution
■ A distribution is a group of scores (measures).
■ If a distribution is graphed, the resulting bar graph or histogram can have any “shape”.
■ The most common shape you will see is a bell curve.
■ Bell-shaped distributions are also called normal distributions or Gaussian distributions.

Normal Distribution
■ One important characteristic of normal distributions is that most of the scores pile up in the middle.
■ Normal distributions are symmetrical in that the right and left sides of the graph are identical.

Positively Skewed Distribution
■ Graphs can deviate from the bell shape because of skew.
■ A skewed distribution is asymmetrical:
■ the right and left sides are not identical;
■ scores pile up at the lower or upper end.

■ Distributions also vary in kurtosis, which is the extent to which they have an exaggerated peak versus a flatter appearance.
■ Distributions that have a higher, more exaggerated peak than a normal curve are called leptokurtic.
■ Distributions that have a flatter peak are called platykurtic.

■ Shape describes how data are distributed
■ Two useful shape-related statistics are:
■ Skewness: measures the extent to which data values are not symmetrical
■ Kurtosis: measures the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution

Shape of a Distribution (Skewness)
■ Measures the extent to which data are not symmetrical

                      Left-Skewed      Symmetric        Right-Skewed
Mean vs. Median       Mean < Median    Mean = Median    Median < Mean
Skewness Statistic    < 0              = 0              > 0

Shape of a Distribution (Kurtosis)
■ Kurtosis measures how sharply the curve rises approaching the center of the distribution

Shape                            Kurtosis Statistic
Sharper Peak Than Bell-Shaped    Kurtosis > 0
Bell-Shaped                      Kurtosis = 0
Flatter Than Bell-Shaped         Kurtosis < 0

Measures of Central Tendency
■ Measures of central tendency are also called measures of location.
■ Mean
■ Median
■ Mode
■ Weighted Mean
■ Geometric Mean
■ Percentiles
■ Quartiles

■ You are probably already familiar with the notion of central tendency.
■ If your five history exam scores for a semester were 33%, 81%, 86%, 96%, and 96%, the “center” of these scores summarizes your academic performance in history.
■ What is the central value of this distribution?

■ Your history instructor could use the mean, the median, or the mode to find the center of your scores.
■ If your instructor used the arithmetic average (i.e., the mean, 78.4%), you would get a C.
■ Although the mean is the most common measure of central tendency, there are other options.
■ She could use the middle test score (the median, 86%), and you would get a B.
■ She could also use the most frequently occurring test score (the mode, 96%), and you would get an A.
■ You would probably prefer the last one.
■ There are rules of thumb that help you decide when to use each of these measures of center.
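
The three centers quoted for the history scores can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# The five history exam scores (in percent) from the example above.
scores = [33, 81, 86, 96, 96]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)
```

The single low score of 33% pulls the mean well below the median and the mode, which is why the three measures lead to different grades.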

■ When the data are nominal (i.e., when the data are
categories rather than values), you must use the mode
to summarize the center.
■ The median is the best option when data are ordinal.
■ When working with interval or ratio data, you need to
choose between the mean and the median to represent
the center.

■ In general, you should use the mean to summarize
interval or ratio data
■ If the data set contains one or more “extreme” scores
that are very different from the other scores, you
should use the median
■ Statisticians would consider the 38 value an outlier
because it is a very extreme score compared with the
rest of the scores in the distribution

When to Use Measures of Central Tendency


■ When a distribution is symmetrical (or close to being symmetrical), the mean, the median, and the mode are all very similar in value.
■ When a distribution is very asymmetrical, the mean,
the median, and the mode are different.
■ In asymmetrical distributions, the mean is “pulled”
toward the distribution’s longer tail.

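The Mean
■ The arithmetic mean (often just called the “mean”) is the most common measure of central tendency
■ For a sample of size n:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $\bar{X}$ (pronounced x-bar) is the sample mean, $X_i$ is the ith observed value, and n is the sample size.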

The Mean (con’t)
■ The most common measure of central tendency
■ Mean = sum of values divided by the number of values
■ Affected by extreme values (outliers)
[Figure: two data sets plotted on a scale from 11 to 20; the first has Mean = 13 and the second has Mean = 14]

The Mean (con’t)
■ The Mean as the Center of Balance for the Dot Plot of the Classroom Size Data

The Median
■ The median is the midpoint of a distribution of scores
■ In an ordered array, the median is the “middle” number (50% above, 50% below)
■ When working with a list of scores, you begin by putting the scores in order from lowest to highest (or highest to lowest).
■ Less sensitive than the mean to extreme values
[Figure: the same two data sets plotted on a scale from 11 to 20; Median = 13 for both]

Locating the Median
■ The location of the median when the values are in numerical order (smallest to largest):

$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$

■ If the number of values is odd, the median is the middle number
■ If the number of values is even, the median is the average of the two middle numbers
Note that (n+1)/2 is not the value of the median, only the position of the median in the ranked data

The Mode
■ Value that occurs most often
■ Not affected by extreme values
■ Used for either numerical or categorical data
■ There may be no mode
■ There may be several modes
[Figure: two dot plots; the first, on a scale from 0 to 14, has Mode = 9; the second, on a scale from 0 to 6, has no mode]

Review Example
House Prices:
$2,000,000
$  500,000
$  300,000
$  100,000
$  100,000
Sum $3,000,000
▪ Mean: ($3,000,000/5) = $600,000
▪ Median: middle value of ranked data = $300,000
▪ Mode: most frequent value = $100,000

Weighted Mean
■ In some instances, the mean is computed by giving each observation a weight that reflects its relative importance or frequency.
■ The choice of weights depends on the application.

Weighted Mean
Ron Butler, a home builder, is looking over the expenses he incurred for a
house he just built. For the purpose of pricing future projects, he would like to
know the average wage ($/hour) he paid the workers he employed. Listed
below are the categories of workers he employed, along with their respective
wage and total hours worked.
Worker        Wage ($/hr)   Total Hours
Carpenter     21.60         520
Electrician   28.72         230
Laborer       11.80         410
Painter       19.75         270
Plumber       24.16         160

Weighted Mean
■ Example: Construction Wages
FYI, the equally-weighted (simple) mean = $21.21
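
The weighted result for this example does not appear in the text, so the sketch below computes it under the assumption that total hours worked are the weights, that is, weighted mean = Σ(wage × hours) / Σ(hours). The dictionary layout and variable names are illustrative.

```python
# Wage ($/hr) and total hours for each category of worker, from the table above.
workers = {
    "Carpenter": (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer": (11.80, 410),
    "Painter": (19.75, 270),
    "Plumber": (24.16, 160),
}

total_hours = sum(hours for _, hours in workers.values())
total_pay = sum(wage * hours for wage, hours in workers.values())

weighted_mean = total_pay / total_hours  # each wage weighted by hours worked
simple_mean = sum(wage for wage, _ in workers.values()) / len(workers)

print(f"weighted mean = ${weighted_mean:.2f}/hr")
print(f"simple mean   = ${simple_mean:.2f}/hr")
```

Under this assumption the weighted mean comes out near $20.05 per hour, below the simple mean of $21.21, because the lowest-paid category (laborers) accounts for a large share of the hours.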

Geometric Mean
■ The geometric mean is calculated by finding the nth root of the product of n values.
■ It is often used in analyzing growth rates in
financial data (where using the arithmetic
mean will provide misleading results).
■ It should be applied anytime you want to
determine the mean rate of change over
several successive periods (be it years,
quarters, weeks, . . .).

Geometric Mean
■ Other common applications include changes in populations of species, crop yields, pollution levels, and birth and death rates.

Geometric Mean
■ Example: Rate of Return
The average growth rate per period is (0.97752 – 1)(100) = –2.248%.
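
The return data behind the 0.97752 figure are not reproduced in the text, so the following sketch uses two hypothetical growth factors (a 50% gain followed by a 50% loss) purely to illustrate the calculation and why the arithmetic mean can mislead.

```python
import math

def geometric_mean(factors):
    """Return the nth root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))

# Hypothetical growth factors, NOT the data from the example above:
# 1.50 means +50% in period 1, 0.50 means -50% in period 2.
factors = [1.50, 0.50]

gm = geometric_mean(factors)
mean_growth_rate = (gm - 1) * 100                          # percent per period
arithmetic_rate = (sum(factors) / len(factors) - 1) * 100  # percent per period

print(f"geometric mean factor = {gm:.5f}")
print(f"mean growth rate      = {mean_growth_rate:.3f}%")
print(f"arithmetic mean rate  = {arithmetic_rate:.3f}%")
```

The arithmetic mean of the two rates is 0%, yet the investment has actually lost a quarter of its value; the geometric mean reflects that loss as a negative rate per period.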

Which Measure to Choose?
▪ The mean is generally used, unless extreme values (outliers) exist.
▪ The median is often used when outliers are present, since it is not sensitive to extreme values. For example, median home prices may be reported for a region.
▪ In some situations, it makes sense to report both the
mean and the median.

Summary
Central Tendency

Measure            Description
Arithmetic Mean    Sum of values divided by the number of values
Median             Middle value in the ordered array
Mode               Most frequently observed value

Measures of Variation
■ The mean is commonly used to summarize the center of a distribution of scores measured on an interval or ratio scale.
■ Although the mean does a good job describing the center of scores, it is also important to describe how “spread out from center” the scores are.
[Figure: two distributions with the same center but different variation]

■ There are several ways to describe the variability of interval/ratio data
■ The easiest measure of variability is the range
■ The most common measure of variability is the standard deviation

Variation
■ Measures of variation: Range, Variance, Standard Deviation, Coefficient of Variation
■ Measures of variation give information on the spread or variability or dispersion of the data values.
Measures of Variation:
The Range
▪ Simplest measure of variation
▪ Difference between the largest and the smallest values:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example:
[Figure: dot plot on a scale from 0 to 14]
Range = 13 - 1 = 12

Why The Range Can Be Misleading
▪ Does not account for how the data are distributed
[Figure: two differently distributed data sets on a scale from 7 to 12; Range = 12 - 7 = 5 for both]
▪ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

The Sample Variance
■ Average (approximately) of squared deviations of values from the mean
■ Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X

The Sample Standard Deviation
■ Most used measure of variation
■ Shows variation about the mean
■ Is the square root of the variance
■ Has the same units as the original data
■ Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

The Standard Deviation
Steps for Computing Standard Deviation
1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get
the sample standard deviation.

The Standard Deviation
Summary of Five Steps to Computing a Sample’s Standard Deviation

Sample Standard Deviation: Calculation Example
Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
■ The sample standard deviation is a measure of the “average” scatter around the mean
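
As a rough sketch, the five steps can be applied to this sample data directly. The variable names are illustrative; the code follows the n - 1 divisor given in the steps.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]  # sample data from the example
n = len(data)
mean = sum(data) / n

# Step 1: difference between each value and the mean.
differences = [x - mean for x in data]
# Step 2: square each difference.
squared = [d ** 2 for d in differences]
# Step 3: add the squared differences.
sum_of_squares = sum(squared)
# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)
# Step 5: take the square root to get the sample standard deviation.
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, sum of squares = {sum_of_squares}")
print(f"S^2 = {sample_variance:.4f}, S = {sample_sd:.4f}")
```

For this sample the squared deviations sum to 130, giving a sample variance of about 18.57 and a sample standard deviation of about 4.31.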
Comparing Standard Deviations
[Figure: three dot plots, Data A, Data B, and Data C, each on a scale from 11 to 21]

Data set   Mean    S
Data A     15.5    3.338
Data B     15.5    0.926
Data C     15.5    4.567

Comparing Standard Deviations
[Figure: two distributions, one with a smaller standard deviation and one with a larger standard deviation]

Summary Characteristics
▪ Standard deviation can be difficult to interpret as a single number on its own
▪ The more the data are spread out, the greater
the range, variance, and standard deviation.
▪ The more the data are concentrated, the
smaller the range, variance, and standard
deviation.
▪ If the values are all the same (no variation), all
these measures will be zero.
▪ None of these measures are ever negative.

▪ The standard deviation is affected by outliers (extremely low or extremely high numbers in the data set).
▪ That’s because the standard deviation is based on
the distance from the mean.
▪ And remember, the mean is also affected by outliers.
▪ The standard deviation has the same units of
measure as the original data.
▪ If you’re talking about inches, the standard deviation
will be in inches.

▪ More precisely, the standard deviation is a measure of the typical distance between the values of the data in the set and the mean.
▪ Much of the data typically varies within ± 1 standard deviation of the mean.

The Coefficient of Variation
■ Measures relative variation

■ Always in percentage (%)
■ Shows variation relative to mean
■ Can be used to compare the variability of two or
more sets of data measured in different units

Comparing Coefficients of Variation
■ Stock A:
■ Average price last year = $50
■ Standard deviation = $5
■ Stock B:
■ Average price last year = $100
■ Standard deviation = $5
Both stocks have the same standard deviation, but stock B is less variable relative to its price

Comparing Coefficients of Variation (con’t)
■ Stock A:
■ Average price last year = $50
■ Standard deviation = $5
■ Stock C:
■ Average price last year = $8
■ Standard deviation = $2
Stock C has a much smaller standard deviation but a much higher coefficient of variation
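
The comparison can be made concrete with the usual definition of the coefficient of variation, CV = (S / mean) × 100%. The sketch below applies it to the three stocks; the function and variable names are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# (average price last year, standard deviation), both in dollars
stocks = {"A": (50, 5), "B": (100, 5), "C": (8, 2)}

cv = {
    name: coefficient_of_variation(sd, avg)
    for name, (avg, sd) in stocks.items()
}

for name, value in cv.items():
    print(f"Stock {name}: CV = {value:.1f}%")
```

Stock B has the lowest relative variability and Stock C the highest, even though Stock C has the smallest standard deviation in dollars.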

Locating Extreme Outliers:
Z-Score
▪ The mean perfectly balances the positive and negative deviation scores of a distribution.
▪ The sum of the positive deviation scores will
always equal the sum of the negative deviation
scores.
▪ Standard deviation describes how much
variability there is in a set of numbers.
▪ The mean and the standard deviation help you
interpret a distribution of scores by telling you
the “center” of the scores and how much
scores vary around that center.

Z-Score
▪ For example, suppose your score on the GMAT was 25.
▪ This score alone doesn’t tell you much about your performance.
▪ If you knew that the mean GMAT score was 21 with a standard deviation of 4.70, you could interpret your score.
▪ Your score of 25 was 4 points better than the population mean.

Z-Score
▪ The population standard deviation was 4.70; this means that your score of 25 (i.e., +4 from the mean) deviated less from the mean than was typical (4.70).
▪ So, you did better than average but only a little better because your score was less than 1 standard deviation above the mean.
▪ The Z-score tells you exactly how you did on the GMAT relative to the other scores.

Z-Score
▪ To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
▪ The Z-score is the number of standard
deviations a data value is from the mean.
▪ A data value is considered an extreme outlier if
its Z-score is less than -3.0 or greater than
+3.0.
▪ The larger the absolute value of the Z-score,
the farther the data value is from the mean.

Z-Score
▪ A z score will indicate if a given score is very good (far above the mean), very bad (far below the mean), or average (close to the mean).
▪ When looking at GMAT scores, larger positive z scores represent better performance and larger negative z scores represent worse performance.
▪ A z for a single score can help you compare two scores from different distributions.
▪ For example, you can compare a TOEFL score with a GMAT score.

Z-Score

$$Z = \frac{X - \bar{X}}{S}$$

where X represents the data value
$\bar{X}$ is the sample mean
S is the sample standard deviation
Your GMAT Single Score:
Z = (25 – 21)/4.7 = 0.851, i.e. slightly less than 1 standard deviation above the mean
What score would you need to be 3 standard deviations above the mean?

Z-Score
▪ Suppose the mean math SAT score is 490, with a standard deviation of 100.
▪ Compute the Z-score for a test score of 620.

$$Z = \frac{620 - 490}{100} = 1.3$$

A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
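
A minimal sketch of the Z-score calculation, applied to the GMAT and SAT figures quoted above. The helper names `z_score` and `is_extreme_outlier` are illustrative; the ±3.0 cutoff is the one stated in the text.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

def is_extreme_outlier(z):
    """Apply the rule from the text: Z below -3.0 or above +3.0."""
    return abs(z) > 3.0

gmat_z = z_score(25, 21, 4.70)   # GMAT example
sat_z = z_score(620, 490, 100)   # math SAT example

# Score needed to sit exactly 3 standard deviations above the GMAT mean.
gmat_score_at_3_sd = 21 + 3 * 4.70

print(f"GMAT Z = {gmat_z:.3f}, extreme outlier: {is_extreme_outlier(gmat_z)}")
print(f"SAT  Z = {sat_z:.1f}, extreme outlier: {is_extreme_outlier(sat_z)}")
print(f"GMAT score at Z = 3: {gmat_score_at_3_sd:.1f}")
```

With these figures a GMAT score of about 35.1 would be needed to reach 3 standard deviations above the mean.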

Z-Score and Standard Normal Curve
▪ Z scores enable us to locate any score in a distribution of scores
▪ They provide a very systematic way to compare any score to any other score
▪ Positive z scores are above average, and z scores greater than +1 are further above the average

Z-Score and Standard Normal Curve
▪ The distribution of raw scores from which the z scores are derived is normally shaped
■ A normally shaped distribution of z scores enables researchers to make very precise probability statements about any score in a distribution.
We will discuss this when we deal with the normal distribution.

General Descriptive Stats Using
Microsoft Excel Functions

Microsoft Excel Data Analysis Tool
1. Select Data.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Enter the cell range.
5. Check the Summary Statistics box.
6. Click OK.

Excel output
Microsoft Excel
descriptive statistics
output, using the house
price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000

Minitab Output
Minitab descriptive statistics output using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Descriptive Statistics: House Price

Variable      Total Count   Mean     SE Mean   StDev    Variance      Sum       Minimum
House Price   5             600000   357771    800000   6.40000E+11   3000000   100000

Variable      Median   Maximum   Range     Mode     N for Mode   Skewness   Kurtosis
House Price   300000   2000000   1900000   100000   2            2.01       4.13
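
Most of the figures in this output can be reproduced without Excel or Minitab. The sketch below uses Python's standard `statistics` module as a stand-in; it does not attempt the skewness and kurtosis columns, whose exact formulas are not given in the text.

```python
import math
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house price data

n = len(prices)
summary = {
    "count": n,
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
    "mode": statistics.mode(prices),
    "stdev": statistics.stdev(prices),        # sample standard deviation (n - 1)
    "variance": statistics.variance(prices),  # sample variance (n - 1)
    "sum": sum(prices),
    "minimum": min(prices),
    "maximum": max(prices),
    "range": max(prices) - min(prices),
}
# Standard error of the mean = S / sqrt(n).
summary["se_mean"] = summary["stdev"] / math.sqrt(n)

for name, value in summary.items():
    print(f"{name:>9}: {value:,.0f}")
```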

Quartile Measures
■ Quartiles split the ranked data into 4 segments with
an equal number of values per segment
25% 25% 25% 25%
Q1 Q2 Q3
■ The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
■ Q2 is the same as the median (50% of the observations
are smaller and 50% are larger)
■ Only 25% of the observations are greater than the third
quartile

Quartile Measures:
Locating Quartiles
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4 ranked value
Second quartile position: Q2 = (n+1)/2 ranked value
Third quartile position: Q3 = 3(n+1)/4 ranked value
where n is the number of observed values

Quartile Measures:
Calculation Rules
■ When calculating the ranked position use the following rules
■ If the result is a whole number then it is the ranked position to use
■ If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.) then average the two corresponding data values.
■ If the result is not a whole number or a fractional half then round the result to the nearest integer to find the ranked position.

Quartile Measures:
Locating Quartiles
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data
so use the value half way between the 2nd and 3rd values,
so Q1 = 12.5
Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency

Quartile Measures
Calculating The Quartiles: Example
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data,
so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
so Q3 = (18+21)/2 = 19.5
Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency
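
The position rules above can be sketched as a small function. This is one possible reading of the rules, using integer arithmetic on q(n+1)/4 so that whole, half, and other positions are distinguished exactly; the function name `quartile` is illustrative, and the sketch assumes at least two values.

```python
def quartile(sorted_data, q):
    """Return quartile q (1, 2, or 3) using the ranked-position rules.

    The position is q * (n + 1) / 4, written as whole + remainder / 4.
    """
    n = len(sorted_data)
    whole, remainder = divmod(q * (n + 1), 4)
    if remainder == 0:
        # Whole number: use the value at that ranked position.
        return sorted_data[whole - 1]
    if remainder == 2:
        # Fractional half: average the two neighbouring values.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Otherwise round to the nearest integer position (.25 down, .75 up).
    position = whole + 1 if remainder == 3 else whole
    return sorted_data[position - 1]

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the example
q1, q2, q3 = (quartile(data, q) for q in (1, 2, 3))
print(q1, q2, q3)
```

Statistical packages often interpolate between positions instead of rounding, so their quartiles can differ slightly from the values these rules give.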
Quartile Measures:
The Interquartile Range (IQR)
■ The IQR is Q3 – Q1 and measures the spread in the middle 50% of the data
■ The IQR is also called the midspread because it covers the middle 50% of the data
■ The IQR is a measure of variability that is not influenced by outliers or extreme values
■ Measures like Q1, Q3, and IQR that are not influenced by outliers are called resistant measures

Calculating The Interquartile Range
Example:

X minimum   Q1   Median (Q2)   Q3   X maximum
12          30   45            57   70

(25% of the data lie in each of the four segments)
Interquartile range = 57 – 30 = 27

The Five Number Summary
The five numbers that help describe the center, spread and shape of data are:
▪ Xsmallest
▪ First Quartile (Q1)
▪ Median (Q2)
▪ Third Quartile (Q3)
▪ Xlargest

Relationships among the five-number summary and distribution shape

Comparison                                  Left-Skewed   Symmetric   Right-Skewed
Median – Xsmallest vs. Xlargest – Median    >             ≈           <
Q1 – Xsmallest vs. Xlargest – Q3            >             ≈           <
Median – Q1 vs. Q3 – Median                 >             ≈           <

Five Number Summary and The Boxplot
■ The Boxplot: A graphical display of the data based on the five-number summary:
Xsmallest -- Q1 -- Median -- Q3 -- Xlargest
Example:
[Figure: boxplot marking Xsmallest, Q1, Median, Q3, and Xlargest, with 25% of the data in each of the four segments]

Five Number Summary:
Shape of Boxplots
■ If data are symmetric around the median then the box and central line are centered between the endpoints
[Figure: symmetric boxplot marking Xsmallest, Q1, Median, Q3, and Xlargest]
■ A Boxplot can be shown in either a vertical or horizontal orientation

Distribution Shape and The Boxplot
[Figure: three boxplots showing left-skewed, symmetric, and right-skewed distributions, each marked with Q1, Q2, and Q3]

Boxplot Example
■ Below is a Boxplot for the following data:
0 2 2 2 3 3 4 5 5 9 27

Xsmallest   Q1   Q2 / Median   Q3   Xlargest
0           2    3             5    27

■ The data are right skewed, as the plot depicts
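
The five-number summary for this data set, and the skew it implies, can be checked with a short sketch. It uses `statistics.quantiles` with `method="exclusive"`, which places quartiles at multiples of (n+1)/4 and interpolates linearly; for this data the positions are whole numbers, so it matches the values shown, though for other data it can differ from the rounding rules given earlier.

```python
import statistics

data = sorted([0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27])

# Quartile cut points at positions (n+1)/4, (n+1)/2, and 3(n+1)/4.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_number_summary = (data[0], q1, median, q3, data[-1])

# Compare the two tails around the median to judge the direction of skew.
left_tail = median - data[0]
right_tail = data[-1] - median
if right_tail > left_tail:
    shape = "right-skewed"
elif right_tail < left_tail:
    shape = "left-skewed"
else:
    shape = "roughly symmetric"

print(five_number_summary, shape)
```

Here the distance from the median to the largest value (24) far exceeds the distance from the smallest value to the median (3), which is the right-skew pattern.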

Measuring the Skew
■ A useful measure of the direction and the extent of the skew is provided by Pearson’s coefficient of skewness (SK).

$$SK = \frac{3\,(\text{mean} - \text{median})}{\text{standard deviation}}$$

Measuring the Skew
■ The table below shows the weekly output of devices for cell phones produced by the 200 production workers in a cell phone company.
■ (a) Find the arithmetic mean of the weekly output.
■ (b) Find the median weekly output.
■ (c) Why do (a) and (b) differ?

Measuring the Skew
■ Before calculating the solution to the example, look briefly at the table of data and decide on the direction of skew and the likely impact this will have on the mean and median values.
■ Solution (a): Since we are dealing with grouped data, we must take class mid-points for the variable $X_i$. We then use the formula:

$$\bar{X} = \frac{\sum F_i X_i}{\sum F_i}$$

Measuring the Skew
■ The arithmetic mean of the weekly output is 234.9 units.
■ Solution (b): The median position is n/2 = 200/2 = 100, and the median is estimated from the grouped data as

$$\text{Median} = l + h \times \left( \frac{\frac{n}{2} - cf}{f} \right)$$

where
l = lower limit of the median class
n = number of observations
f = frequency of the median class
h = class size
cf = cumulative frequency of the class preceding the median class

Measuring the Skew
■ The median class interval for output is 220–240 units, as shown in the table below

Output      Mid Point   No. of Employees   No. of Employees (cumulative)
100 - 160   130         1                  1
160 - 180   170         5                  6
180 - 200   190         10                 16
200 - 220   210         35                 51
220 - 240   230         55                 106
240 - 260   250         74                 180
260 - 300   280         20                 200

■ Median = 220 + 20 × (100 – 51)/55 ≈ 238 units

Measuring the Skew
■ The arithmetic mean output (234.9 units) is lower than the median output (238 units).
■ We would expect this to be the case since the data are clearly skewed to the left: the arithmetic mean (simple average) is pulled down by the few extremely low values.
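
The grouped-data mean and median can be reproduced from the frequency table with a short sketch. The tuple layout and variable names are illustrative; the median uses the interpolation formula l + h × (n/2 − cf)/f.

```python
# (lower limit, upper limit, number of employees) for each output class.
classes = [
    (100, 160, 1),
    (160, 180, 5),
    (180, 200, 10),
    (200, 220, 35),
    (220, 240, 55),
    (240, 260, 74),
    (260, 300, 20),
]

n = sum(freq for _, _, freq in classes)

# Grouped mean: class mid-points weighted by class frequencies.
grouped_mean = sum((lo + hi) / 2 * freq for lo, hi, freq in classes) / n

# Grouped median: find the class containing position n/2, then interpolate.
half = n / 2
cumulative = 0  # cumulative frequency of the classes before the current one
for lo, hi, freq in classes:
    if cumulative + freq >= half:
        grouped_median = lo + (hi - lo) * (half - cumulative) / freq
        median_class = (lo, hi)
        break
    cumulative += freq

print(f"mean = {grouped_mean:.1f}, median class = {median_class}, "
      f"median = {grouped_median:.1f}")
```

The interpolated median is about 237.8, which the text rounds to 238 units.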

Numerical Descriptive Measures for a Population
▪ The descriptive statistics discussed previously described a sample, not the population.
▪ Summary measures describing a population,
called parameters, are denoted with Greek
letters.
▪ Important population parameters are the
population mean, variance, and standard
deviation.

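Numerical Descriptive Measures for a Population: The Mean µ
■ The population mean is the sum of the values in the population divided by the population size, N

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Where μ = population mean
N = population size
$X_i$ = ith value of the variable X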

Numerical Descriptive Measures for a Population: N versus n – 1
■ Why do the population formulas divide by N, while the sample variance divides by n – 1?
■ n – 1 is used for a sample because that is the number of degrees of freedom in the sample.
■ The deviations of the sample values from the sample mean must sum to 0, so if you know all of the deviations except one, you can calculate the final one.
■ Degrees of freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample.

Numerical Descriptive Measures For A Population: The Variance σ²
■ Average of squared deviations of values from the mean
■ Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Where μ = population mean
N = population size
$X_i$ = ith value of the variable X

Numerical Descriptive Measures For A
Population: The Standard Deviation σ
■ Most commonly used measure of variation
■ Shows variation about the mean
■ Is the square root of the population variance
■ Has the same units as the original data
■ Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample statistics versus population parameters

Measure              Population Parameter   Sample Statistic
Mean                 μ                      $\bar{X}$
Variance             σ²                     S²
Standard Deviation   σ                      S
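
To see the effect of dividing by N versus n – 1, the sketch below treats the same eight values first as a whole population and then as a sample. The data are reused from the earlier standard deviation example purely for illustration; `pvariance`/`pstdev` and `variance`/`stdev` are the population and sample functions in Python's standard `statistics` module.

```python
import statistics

values = [10, 12, 14, 15, 17, 18, 18, 24]

# Treating the values as an entire population: divide by N.
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

# Treating the values as a sample: divide by n - 1.
sample_variance = statistics.variance(values)
sample_sd = statistics.stdev(values)

print(f"population: variance = {population_variance:.4f}, sd = {population_sd:.4f}")
print(f"sample:     variance = {sample_variance:.4f}, sd = {sample_sd:.4f}")
```

The sample figures are always slightly larger than the population figures for the same values, and the gap shrinks as the number of values grows.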

The Empirical Rule
■ The empirical rule approximates the variation of data in a bell-shaped distribution
■ Approximately 68% of the data in a bell-shaped distribution is within 1 standard deviation of the mean, or µ ± 1σ

The Empirical Rule
■ Approximately 95% of the data in a bell-shaped distribution lies within two standard deviations of the mean, or µ ± 2σ
■ Approximately 99.7% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or µ ± 3σ

Using the Empirical Rule
▪ Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a standard deviation of 90. Then,
▪ Approximately 68% of all test takers scored between 410
and 590, (500 ± 90).
▪ Approximately 95% of all test takers scored between 320
and 680, (500 ± 180).
▪ Approximately 99.7% of all test takers scored between
230 and 770, (500 ± 270).

Chebyshev Rule
■ Regardless of how the data are distributed, at least (1 - 1/k²) x 100% of the values will fall within k standard deviations of the mean (for k > 1)
■ Examples:

At least                      Within
(1 - 1/2²) x 100% = 75%       k = 2 (μ ± 2σ)
(1 - 1/3²) x 100% = 88.89%    k = 3 (μ ± 3σ)
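
The two rules can be compared side by side with a small sketch. It computes the Chebyshev lower bound for k = 2 and k = 3 and the corresponding intervals for the Math SAT example (mean 500, standard deviation 90); the function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the Chebyshev rule applies only for k > 1")
    return (1 - 1 / k ** 2) * 100

# Approximate percentages from the empirical rule (bell-shaped data only).
empirical_rule = {1: 68.0, 2: 95.0, 3: 99.7}

mean, sd = 500, 90  # Math SAT example
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k in (2, 3):
    low, high = intervals[k]
    print(f"k={k}: {low} to {high}, "
          f"Chebyshev at least {chebyshev_lower_bound(k):.2f}%, "
          f"empirical rule about {empirical_rule[k]}%")
```

The Chebyshev bound is weaker because it holds for any distribution, whereas the empirical rule relies on the data being bell-shaped.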

Percentiles
■ A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
■ Admission test scores for colleges and universities are frequently reported in terms of percentiles.
■ The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 – p) percent of the items take on this value or more.

Percentiles
■ Arrange the data in ascending order.
■ Compute $L_p$, the location of the pth percentile:

$$L_p = \frac{p}{100}(n + 1)$$

Percentiles
■ 80th Percentile
Example: Apartment Rents
The 80th percentile is the 56th value plus 0.8 times the difference between the 57th and 56th values. So, the 80th percentile = 635 + 0.8(649 – 635) = 646.2.
“At least 80% of the items take on a value of 646.2 or less.”
“At least 20% of the items take on a value of 646.2 or more.”
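
The apartment rent data are not listed in the text, so the sketch below illustrates the same interpolation on the nine-value ordered array from the quartile example. It assumes the location formula L_p = (p/100)(n + 1), which is consistent with the "56th value plus 0.8 times the difference" wording above; the function name `percentile` is illustrative.

```python
def percentile(sorted_data, p):
    """Return the pth percentile by interpolating at location p(n+1)/100."""
    n = len(sorted_data)
    location = p * (n + 1) / 100  # 1-based position in the ordered data
    if location <= 1:
        return sorted_data[0]
    if location >= n:
        return sorted_data[-1]
    lower = int(location)          # whole part of the location
    fraction = location - lower    # fractional part of the location
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

# Ordered array reused from the quartile example, not the rent data.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

print(percentile(data, 25), percentile(data, 50), percentile(data, 80))
```

With this location formula the 25th and 50th percentiles of the array coincide with Q1 and the median found earlier.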
Percentiles
■ 75th Percentile is 3rd Quartile
Example:

X minimum   Q1   Median (Q2)   Q3   X maximum
12          30   45            57   70

75th Percentile = Q3 = 57
Interquartile range = 57 – 30 = 27

We Discuss Two Measures Of The Relationship Between Two Numerical Variables
■ Scatter plots allow you to visually examine the relationship between two numerical variables. We now discuss two quantitative measures of such relationships:
▪ The Covariance
▪ The Coefficient of Correlation

The Covariance
■ The covariance measures the strength of the linear relationship between two numerical variables (X & Y)
■ The sample covariance:

$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

■ Only concerned with the strength of the relationship
■ No causal effect is implied

Interpreting Covariance
■ Covariance between two variables:
cov(X,Y) > 0   X and Y tend to move in the same direction
cov(X,Y) < 0   X and Y tend to move in opposite directions
cov(X,Y) = 0   X and Y have no linear relationship (zero covariance alone does not imply independence)
■ The covariance has a major flaw:
■ It is not possible to determine the relative strength of the relationship from the size of the covariance

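Coefficient of Correlation
■ Measures the relative strength of the linear relationship between two numerical variables
■ Sample coefficient of correlation:

$$r = \frac{\text{cov}(X, Y)}{S_X S_Y}$$

where

$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \quad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \quad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
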
Features of the Coefficient of Correlation
■ The population coefficient of correlation is referred to as ρ.
■ The sample coefficient of correlation is referred to as r.
■ Both ρ and r have the following features:
■ Unit free
■ Range between –1 and 1
■ The closer to –1, the stronger the negative linear relationship
■ The closer to 1, the stronger the positive linear relationship
■ The closer to 0, the weaker the linear relationship
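
A minimal sketch of the sample covariance and the coefficient of correlation, computed from their definitions (covariance with an n - 1 divisor, and r as the covariance divided by the product of the two sample standard deviations). The two score lists are hypothetical values made up for illustration, not data from the text.

```python
import math

def sample_covariance(x, y):
    """Sum of paired deviation products divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # cov(X, X) is the variance of X
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r = sample_correlation(x, y)
print(f"cov = {cov_xy:.2f}, r = {r:.4f}")

# Rescaling y changes the covariance but leaves r unchanged (r is unit free).
y_scaled = [100 * value for value in y]
print(f"cov = {sample_covariance(x, y_scaled):.2f}, "
      f"r = {sample_correlation(x, y_scaled):.4f}")
```

The rescaling step shows why the size of the covariance cannot be used to judge relative strength, while r can.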

Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = +1, r = +.3, and r = 0]

The Coefficient of Correlation Using
Microsoft Excel Function

The Coefficient of Correlation Using Microsoft Excel Data Analysis Tool
1. Select Data
2. Choose Data Analysis
3. Choose Correlation & Click OK
4. Input data range and select appropriate options
5. Click OK to get output

Interpreting the Coefficient of Correlation Using Microsoft Excel
▪ r = .733
▪ There is a relatively strong positive linear relationship between test score #1 and test score #2.
▪ Students who scored high on the first test tended to score high on the second test.

Pitfalls in Numerical
Descriptive Measures
■ Data analysis is objective
■ Should report the summary measures that best describe and communicate the important aspects of the data set
■ Data interpretation is subjective
■ Should be done in a fair, neutral and clear manner

Ethical Considerations
Numerical descriptive measures:
■ Should document both good and bad results

■ Should be presented in a fair, objective and
neutral manner
■ Should not use inappropriate summary
measures to distort facts
Chapter Summary
In this chapter we have discussed:

■ Describing the properties of central tendency,
variation, and shape in numerical data
■ Constructing and interpreting a boxplot
■ Computing descriptive summary measures for a
population
■ Calculating the covariance and the coefficient of
correlation
