Economics
Session 3
DESCRIPTIVE STATISTICS (CONT’)
Data Presentation
• Qualitative Data
• Quantitative Data
Major (row is category)    Count (from tally)
Accounting                 130
Economics                  20
Management                 50
Total                      200
[Figure: bar graph of frequency by Major (Acct., Econ., Mgmt.), with vertical-axis ticks at 0, 50, 100. Vertical bars are used for qualitative variables, percent may be used in place of frequency, and the vertical axis starts at a zero point.]
[Figure: the same bar graph with the categories reordered as Acct., Mgmt., Econ. (descending frequency), with vertical-axis ticks at 0, 50, 100.]
Thinking Challenge
You’re an analyst for IRI. You want to show the
market shares held by Web browsers in 2006.
Construct a bar graph, pie chart, & Pareto diagram
to describe the data.
Browser Mkt. Share (%)
Firefox 14
Internet Explorer 81
Safari 4
Others 1
Bar Graph Solution*
[Figure: bar graph of Market Share (%) by Browser. Vertical axis: 0% to 100%; bars for Firefox, Internet Explorer, Safari, Others.]
Pie Chart Solution*
[Figure: pie chart of Market Share. Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%.]
Pareto Diagram Solution*
[Figure: Pareto diagram of Market Share (%) by Browser. Vertical axis: 0% to 100%; bars in descending order: Internet Explorer, Firefox, Safari, Others.]
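The ordering step behind the Pareto diagram can be sketched in a few lines of Python. This is only an illustration using the market-share table from the Thinking Challenge; the function name `pareto_order` and the cumulative column are additions for this sketch and do not appear on the slides.

```python
# Market shares (%) from the Thinking Challenge table.
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}


def pareto_order(data):
    """Return (category, share, cumulative share) rows, largest share first.

    Hypothetical helper for this sketch; the cumulative column is an
    assumption added for illustration.
    """
    rows = []
    cumulative = 0
    for name, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += value
        rows.append((name, value, cumulative))
    return rows


pareto_rows = pareto_order(shares)
for name, value, cumulative in pareto_rows:
    # Crude text bar: one '#' per 2 percentage points (integer division).
    bar = "#" * (value // 2)
    print(f"{name:<18}{value:>4}% {bar}  (cumulative {cumulative}%)")
```

Sorting before plotting is the only difference between the bar graph and the Pareto diagram here; the bar heights are unchanged.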
Presenting
Quantitative Data
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2
[Figure: histogram of the classes above. Vertical axis: count (frequency, relative frequency, or percent), 0 to 5; horizontal axis: lower class boundaries 0, 15.5, 25.5, 35.5, 45.5, 55.5. Bars touch.]
Frequency Distribution Table
Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
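As a rough sketch, the frequency distribution can be tallied in Python from the raw data and the class boundaries 15.5, 25.5, 35.5, 45.5 used for the histogram. The helper name `frequency_table` is hypothetical, and treating each class as lower-inclusive is an assumption (it does not matter here, since no observation falls on a boundary).

```python
raw_data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
boundaries = [15.5, 25.5, 35.5, 45.5]


def frequency_table(data, bounds):
    """Return (lower, upper, frequency, relative frequency) for each class.

    Hypothetical helper; classes are taken as lower <= x < upper.
    """
    rows = []
    for lower, upper in zip(bounds, bounds[1:]):
        count = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, count, count / len(data)))
    return rows


table = frequency_table(raw_data, boundaries)
for lower, upper, count, relative in table:
    print(f"{lower} – {upper}: freq = {count}, relative freq = {relative:.0%}")
```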
Central Tendency (Location)
Variation (Dispersion)
Shape
Numerical Data Properties & Measures
Central Tendency    Variation              Relative Standing
Mean                Range                  Percentiles
Median              Interquartile Range    Z–scores
Mode                Variance
                    Standard Deviation
Central Tendency
Mean
1. Measure of central tendency
2. Most common measure
3. Acts as ‘balance point’
4. Affected by extreme values (‘outliers’)
5. Formula (sample mean)
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6}$
$= \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6}$
$= 8.30$
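The calculation above can be checked with a short Python sketch. It also illustrates the 'balance point' idea: deviations from the mean sum to zero, up to floating-point rounding. The function name `sample_mean` is chosen for this illustration only.

```python
raw_data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]


def sample_mean(values):
    """Sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


x_bar = sample_mean(raw_data)
print(round(x_bar, 2))

# Balance point: positive and negative deviations from the mean cancel out.
deviations = [x - x_bar for x in raw_data]
print(abs(sum(deviations)) < 1e-9)
```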
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence
Positioning Point = $\frac{n + 1}{2}$
4. Not affected by extreme values
Median Example
Odd-Sized Sample
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
Positioning Point = $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$
Median = 22.6
Median Example
Even-Sized Sample
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
Positioning Point = $\frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$
Median = $\frac{7.7 + 8.9}{2} = 8.30$
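Both median examples follow the same positioning-point rule, which can be sketched in Python as below. The function name `median_by_position` is hypothetical; it returns the 1-based positioning point together with the median.

```python
def median_by_position(values):
    """Return (positioning point, median) using the (n + 1) / 2 rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the ordered sequence
    if n % 2 == 1:
        # Odd n: the positioning point is a whole number.
        return position, ordered[int(position) - 1]
    # Even n: average the two values on either side of the positioning point.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return position, (lower + upper) / 2


odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

odd_position, odd_median = median_by_position(odd_sample)
even_position, even_median = median_by_position(even_sample)
print(odd_position, odd_median)
print(even_position, round(even_median, 2))
```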
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data
Mode Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43
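The three cases above can be reproduced with a small Python sketch. The helper `modes` is a name chosen for this illustration; it returns an empty list when no value repeats, and assumes the input is not empty.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency ([] if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


no_mode = modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
one_mode = modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9])
two_modes = modes([21, 28, 28, 41, 43, 43])
print(no_mode, one_mode, two_modes)

# The same counting works for qualitative data.
print(modes(["Acct.", "Econ.", "Acct.", "Mgmt."]))
```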
Thinking Challenge
You’re a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of new
stock issues: 17, 16, 21, 18,
13, 16, 12, 11.
Describe the stock prices
in terms of central
tendency.
Central Tendency Solution*
Mean
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_8}{8}$
$= \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8}$
$= 15.5$
Central Tendency Solution*
Median
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8
Positioning Point = $\frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$
Median = $\frac{16 + 16}{2} = 16$
Central Tendency Solution*
Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16
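As a cross-check, the three answers can be reproduced with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). This is a sketch for verification, not the method used on the slides.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)    # averages the two middle values
mode_prices = statistics.multimode(prices)  # list of all most-frequent values

print(mean_price, median_price, mode_prices)
```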
Summary of Central Tendency Measures
Measure    Formula                         Description
Mean       $\sum X_i / n$                  Balance Point
Median     Position $\frac{n + 1}{2}$      Middle Value When Ordered
Mode       none                            Most Frequent
Shape
1. Describes how data are distributed
2. Measures of Shape
• Skew: degree of asymmetry (symmetric data have zero skew)
Range
1. Measure of dispersion
2. Difference between largest & smallest
observations
Range = $X_{\text{largest}} - X_{\text{smallest}}$
3. Ignores how data are distributed
[Figure: two dot plots on a scale from 7 to 10 with differently distributed data; in both, Range = 10 – 7 = 3.]
Variance &
Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{X}$ or $\mu$)
[Figure: dot plot on a scale from 4 to 12 with the mean $\bar{X} = 8.3$ marked.]
Sample Variance Formula
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$
n – 1 in denominator! (Use N if Population Variance)
Sample Standard Deviation
Formula
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}}$
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$
$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \dots + (7.7 - 8.3)^2}{6 - 1}$
$= 6.368$
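The worked example can be verified with a direct translation of the sample variance formula into Python. The function name `sample_variance` is chosen for this sketch.

```python
raw_data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


s_squared = sample_variance(raw_data)
print(round(s_squared, 3))
```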
Thinking Challenge
• You’re a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of
new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
• What are the variance
and standard deviation
of the stock prices?
Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$
$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \dots + (11 - 15.5)^2}{8 - 1}$
$= 11.14$
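A short Python sketch can confirm the sample variance and derive the sample standard deviation as its square root. The `population` flag is a hypothetical parameter added only to show the effect of dividing by N instead of n – 1; treating these eight prices as a whole population is an assumption made purely for that comparison.

```python
import math

prices = [17, 16, 21, 18, 13, 16, 12, 11]


def variance(values, population=False):
    """Variance with n - 1 in the denominator (sample) or N (population)."""
    n = len(values)
    center = sum(values) / n
    sum_of_squares = sum((x - center) ** 2 for x in values)
    return sum_of_squares / (n if population else n - 1)


sample_var = variance(prices)
sample_sd = math.sqrt(sample_var)
population_var = variance(prices, population=True)  # illustration only

print(round(sample_var, 2), round(sample_sd, 2), population_var)
```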
Variation Solution*
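Measure                            Formula                                          Description
Standard Deviation (Sample)        $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$    Dispersion about Sample Mean
Standard Deviation (Population)    $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$            Dispersion about Population Mean
Variance (Sample)                  $\frac{\sum (X_i - \bar{X})^2}{n - 1}$           Squared Dispersion about Sample Mean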
Interpreting Standard
Deviation
Interpreting Standard Deviation:
Chebyshev’s Theorem
• Applies to any shape data set
• No useful information about the fraction of data in the
interval $\bar{x} - s$ to $\bar{x} + s$
• At least 3/4 of the data lies in the interval
$\bar{x} - 2s$ to $\bar{x} + 2s$
• At least 8/9 of the data lies in the interval
$\bar{x} - 3s$ to $\bar{x} + 3s$
• In general, for k > 1, at least $1 - 1/k^2$ of the data lies
in the interval $\bar{x} - ks$ to $\bar{x} + ks$
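To make the bound concrete, the sketch below computes $1 - 1/k^2$ and compares it with the fraction of the stock prices from the Thinking Challenge that actually falls within k standard deviations of the mean. The function names are hypothetical; the theorem only guarantees a minimum, so the observed fraction may be much larger.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]


def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def fraction_within(values, k):
    """Observed fraction of values in [x_bar - k*s, x_bar + k*s]."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    inside = [x for x in values if x_bar - k * s <= x <= x_bar + k * s]
    return len(inside) / len(values)


for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), fraction_within(prices, k))
```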
Interpreting Standard Deviation:
Chebyshev’s Theorem
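[Figure: number line marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$; the interval $\bar{x} - s$ to $\bar{x} + s$ is labeled "No useful information".]
$\bar{x} = 15.5$, $s = 3.34$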
Numerical Measures of
Relative Standing: Percentiles
• Describes the relative location of a
measurement compared to the rest of the data
• The pth percentile is a number such that p% of
the data falls below it and (100 – p)% falls
above it
• Median = 50th percentile
Percentile Example
• You scored 560 on the GMAT exam. This
score puts you in the 58th percentile.
• What percentage of test takers scored lower
than you did?
• What percentage of test takers scored higher
than you did?
Percentile Example
• What percentage of test takers scored lower
than you did?
58% of test takers scored lower than 560.
• What percentage of test takers scored higher
than you did?
(100 – 58)% = 42% of test takers scored
higher than 560.
Numerical Measures of
Relative Standing: Z–Scores
• Describes the relative location of a
measurement compared to the rest of the data
• Sample z–score: $z = \frac{x - \bar{x}}{s}$
• Population z–score: $z = \frac{x - \mu}{\sigma}$
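As an illustration of the sample z–score, the sketch below applies the formula to the stock prices from the Thinking Challenge (mean 15.5, standard deviation about 3.34). Using that data set here is a choice made for this example, and `z_score` is a hypothetical helper name.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]


def z_score(x, center, spread):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - center) / spread


x_bar = statistics.mean(prices)
s = statistics.stdev(prices)  # sample standard deviation

z_high = z_score(21, x_bar, s)  # highest price
z_low = z_score(11, x_bar, s)   # lowest price
print(round(z_high, 2), round(z_low, 2))
```

A positive z–score means the measurement is above the mean, a negative one means it is below.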
Positioning Point of $Q_i$ = $\frac{i(n + 1)}{4}$
Quartile (Q1) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$Q_1$ Position = $\frac{1(n + 1)}{4} = \frac{1(6 + 1)}{4} = 1.75 \approx 2$
$Q_1 = 6.3$
Quartile (Q2) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$Q_2$ Position = $\frac{2(n + 1)}{4} = \frac{2(6 + 1)}{4} = 3.5$
$Q_2 = \frac{7.7 + 8.9}{2} = 8.3$
Quartile (Q3) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
$Q_3$ Position = $\frac{3(n + 1)}{4} = \frac{3(6 + 1)}{4} = 5.25 \approx 5$
$Q_3 = 10.3$
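The three quartile examples can be reproduced with the sketch below. The rounding rule is inferred from the worked examples and is an assumption: a position ending in .5 averages the two neighbouring values, and any other position is rounded to the nearest whole number. Other textbooks and software interpolate instead, so their quartiles can differ. The sketch assumes a sample with at least a few observations.

```python
import math


def quartile(values, i):
    """Quartile Q_i (i = 1, 2, 3) using the positioning point i(n + 1) / 4.

    The rounding rule is an assumption inferred from the worked examples.
    """
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4  # 1-based position
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Halfway between two positions: average the neighbours.
        return (ordered[lower - 1] + ordered[lower]) / 2
    nearest = lower + 1 if fraction > 0.5 else lower
    return ordered[nearest - 1]


raw_data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(raw_data, i) for i in (1, 2, 3))
print(q1, round(q2, 2), q3)
```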
Numerical Data Properties & Measures
Central Tendency    Variation              Shape
Mean                Range                  Skew
Median              Interquartile Range
Mode                Variance
                    Standard Deviation
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
• Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values
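The sketch below computes the interquartile range for the six-value data set from the quartile examples (Q1 = 6.3, Q3 = 10.3) and then replaces the largest value with an artificial outlier to illustrate point 5. The outlier value 117.0 and the helper names are made up for this illustration, and the quartile rounding rule is the one inferred from the worked examples (an assumption).

```python
import math


def quartile(values, i):
    """Q_i via positioning point i(n + 1) / 4 (rounding rule is an assumption)."""
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        return (ordered[lower - 1] + ordered[lower]) / 2
    return ordered[(lower + 1 if fraction > 0.5 else lower) - 1]


def interquartile_range(values):
    """Q3 - Q1: the spread of the middle 50% of the data."""
    return quartile(values, 3) - quartile(values, 1)


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
with_outlier = [10.3, 4.9, 8.9, 117.0, 6.3, 7.7]  # hypothetical extreme value

print(round(interquartile_range(data), 2))
print(round(interquartile_range(with_outlier), 2))
# The range, by contrast, changes with the extreme value.
print(round(max(data) - min(data), 2), round(max(with_outlier) - min(with_outlier), 2))
```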
Thinking Challenge
• You’re a financial analyst for
Prudential-Bache Securities.
You have collected the
following closing stock prices
of new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
• What are the quartiles, Q1
and Q3, and the interquartile
range?
Quartile Solution*
Q1
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$Q_1$ Position = $\frac{1(n + 1)}{4} = \frac{1(8 + 1)}{4} = 2.25 \approx 2$
$Q_1 = 12$
Quartile Solution*
Q3
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$Q_3$ Position = $\frac{3(n + 1)}{4} = \frac{3(8 + 1)}{4} = 6.75 \approx 7$
$Q_3 = 18$
Interquartile Range Solution*
Interquartile Range
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
Shape & Box Plot
[Figure: three scattergrams showing a positive relationship, a negative relationship, and no relationship.]
Scattergram Example
• You’re a marketing analyst for Hasbro Toys.
You gather the following data:
Ad $ (x) Sales (Units) (y)
1 1
2 1
3 2
4 2
5 4
• Draw a scattergram of the data
Scattergram Example
[Figure: scattergram of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis, 0 to 5).]
Time Series Plot
• Used to graphically display data produced over
time
• Shows trends and changes in the data over
time
• Time recorded on the horizontal axis
• Measurements recorded on the vertical axis
• Points connected by straight lines
Time Series Plot Example
• The following data show the average retail price of regular gasoline in New York City for 8 weeks in 2006.
• Draw a time series plot for this data.
Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298
Time Series Plot Example
[Figure: time series plot of Price (vertical axis, 2.05 to 2.35) against Date (horizontal axis, weekly from 10/16 to 12/4), with points connected by straight lines.]
Distorting the Truth
with Descriptive Techniques
Errors in Presenting Data
1. Using ‘chart junk’
2. No relative basis in
comparing data
batches
3. Compressing the
vertical axis
4. No zero point on the
vertical axis
‘Chart Junk’
[Figure: two charts of quarterly data (Q1 to Q4), one with a vertical axis from 0 to 100 and one with a vertical axis from 0 to 25.]
No Zero Point
on Vertical Axis
[Figure: two charts of Monthly Sales ($) by month (J, M, M, J, S, N). Bad Presentation: vertical axis runs from 36 to 45, with no zero point. Good Presentation: vertical axis runs from 0 to 60.]
Conclusion
1. Described Qualitative Data Graphically
2. Described Numerical Data Graphically
3. Explained Numerical Data Properties
4. Described Summary Measures
5. Analyzed Numerical Data Using Summary
Measures