
**Numerical Descriptive Measures**

USING STATISTICS: Evaluating the Performance of Mutual Funds

3.1 MEASURES OF CENTRAL TENDENCY, VARIATION, AND SHAPE

• The Mean

• The Median

• The Mode

• Quartiles

• The Geometric Mean

• The Range

• The Interquartile Range

• The Variance and the Standard Deviation

• The Coefficient of Variation

• Z Scores

• Shape

• Visual Explorations: Exploring Descriptive Statistics

• Microsoft Excel Descriptive Statistics Output

• Minitab Descriptive Statistics Output

3.2 NUMERICAL DESCRIPTIVE MEASURES FOR A POPULATION

• The Population Mean

• The Population Variance and Standard Deviation

• The Empirical Rule

• The Chebychev Rule

3.3 COMPUTING NUMERICAL DESCRIPTIVE MEASURES FROM A FREQUENCY DISTRIBUTION

3.4 EXPLORATORY DATA ANALYSIS

• The Five-Number Summary

• The Box-and-Whisker Plot

3.5 THE COVARIANCE AND THE COEFFICIENT OF CORRELATION

• The Covariance

• The Coefficient of Correlation

3.6 PITFALLS IN NUMERICAL DESCRIPTIVE MEASURES AND ETHICAL ISSUES

A.3 USING SOFTWARE FOR DESCRIPTIVE STATISTICS

• A.3.1 Microsoft Excel

• A.3.2 Minitab

• A.3.3 (CD-ROM Topic) SPSS

LEARNING OBJECTIVES

In this chapter, you learn:

• To describe the properties of central tendency, variation, and shape in numerical data

• To calculate descriptive summary measures for a population

• To construct and interpret a box-and-whisker plot

• To describe the covariance and the coefficient of correlation

72

CHAPTER THREE Numerical Descriptive Measures

USING STATISTICS

Evaluating the Performance of Mutual Funds

Return to the study of mutual funds introduced in Chapter 2. You want to decide which types of mutual funds to invest in. In the last chapter you learned how to present data in tables and charts. However, when dealing with numerical data, such as the return on investments in mutual funds in 2003, you also need to summarize the data and ask statistical questions. What is the central tendency for returns of the various funds? For example, what is the mean return in 2003 for the low-risk, average-risk, and high-risk mutual funds? How much variability is present in the returns? Are the returns for high-risk funds more variable than for average-risk funds or low-risk funds? How can you use this information when deciding what mutual funds to invest in?

For numerical variables, you need more than the visual picture of a variable that you get from the graphs discussed in Chapter 2. For example, for the 2003 returns, you would like to determine not only whether the riskier funds had a higher 2003 return, but also whether they had greater variation, and how the returns for each risk group were distributed. You also want to examine whether there is a relationship between the expense ratio and the 2003 return. Reading this chapter will allow you to learn about some of the methods to measure:

• central tendency, the extent to which all of the data values group around a central value

• variation, the amount of dispersion or scattering of values away from a central point

• shape, the pattern of the distribution of values from the lowest value to the highest value

You will also learn about the covariance and the coefficient of correlation, which help measure the strength of the association between two numerical variables.

3.1 MEASURES OF CENTRAL TENDENCY, VARIATION, AND SHAPE

You can characterize any set of data by measuring its central tendency, variation, and shape. Most sets of data show a distinct central tendency to group around a central point. When people talk about an “average value” or the “middle value” or the most popular or frequent value, they are talking informally about the mean, median, and mode, three measures of central tendency.

Variation measures the spread or dispersion of values in a data set. One simple measure of variation is the range, the difference between the highest and lowest value. More commonly used in statistics are the standard deviation and variance, two measures explained later in this section.

The shape of a data set represents a pattern of all the values from the lowest to highest value. As you will learn later in this section, many data sets have a pattern that looks approximately like a bell, with a peak of values somewhere in the middle.


The Mean

The arithmetic mean (typically referred to as the mean) is the most common measure of central tendency. The mean is the only common measure in which all the values play an equal role. The mean serves as a “balance point” in a set of data (like the fulcrum on a seesaw). You calculate the mean by adding together all the values in a data set and then dividing that sum by the number of values in the data set.

The symbol $\bar{X}$, called X bar, is used to represent the mean of a sample. For a sample containing n values, the equation for the mean of a sample is written as

$$\bar{X} = \frac{\text{sum of the values}}{\text{number of values}}$$

Using the series $X_1, X_2, \ldots, X_n$ to represent the set of n values and n to represent the number of values, the equation becomes:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

By using summation notation (discussed fully in Appendix B), you replace the numerator $X_1 + X_2 + \cdots + X_n$ by the term $\sum_{i=1}^{n} X_i$, which means sum all the $X_i$ values from the first X value, $X_1$, to the last X value, $X_n$, to form Equation (3.1), a formal definition of the sample mean.

SAMPLE MEAN

The sample mean is the sum of the values divided by the number of values.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{3.1}$$

where

$\bar{X}$ = sample mean

n = number of values or sample size

$X_i$ = ith value of the variable X

$\sum_{i=1}^{n} X_i$ = summation of all $X_i$ values in the sample

Because all the values play an equal role, a mean will be greatly affected by any value that is greatly different from the others in the data set. When you have such extreme values, you should avoid using the mean.

The mean can suggest what is a “typical” or central value for a data set. For example, if you knew the typical time it takes you to get ready in the morning, you might be able to better plan your morning and minimize any excessive lateness (or earliness) going to your destination. Suppose you define the time to get ready as the time in minutes (rounded to the nearest minute) from when you get out of bed to when you leave your home. You collect the times shown below for 10 consecutive work days:

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time (minutes) | 39 | 29 | 43 | 52 | 39 | 44 | 40 | 31 | 44 | 35 |


TIMES

The mean time is 39.6 minutes, computed as follows:

$$\bar{X} = \frac{\text{sum of the values}}{\text{number of values}} = \frac{\sum_{i=1}^{n} X_i}{n}$$

$$\bar{X} = \frac{39 + 29 + 43 + 52 + 39 + 44 + 40 + 31 + 44 + 35}{10} = \frac{396}{10} = 39.6$$

Even though no one day in the sample actually had the value 39.6 minutes, allotting about 40 minutes to get ready would be a good rule for planning your mornings, but only because the 10 days do not contain extreme values.

Contrast this to the case in which the value on day four was 102 minutes instead of 52 minutes. This extreme value would cause the mean to rise to 44.6 minutes as follows:

$$\bar{X} = \frac{\text{sum of the values}}{\text{number of values}} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{446}{10} = 44.6$$

The one extreme value has increased the mean by more than 10% from 39.6 to 44.6 minutes. In contrast to the original mean that was in the “middle,” greater than 5 of the get-ready times (and less than the 5 other times), the new mean is greater than 9 of the 10 get-ready times. The extreme value has caused the mean to be a poor measure of central tendency.
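The effect of the extreme value on the mean can be checked with a short Python sketch. The function name `sample_mean` and the variable names are illustrative choices, not part of the text; the data are the 10 get-ready times given above.

```python
# Get-ready times (minutes) for the 10 consecutive work days in the text.
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]


def sample_mean(values):
    """Equation (3.1): sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_original = sample_mean(times)

# Variant discussed in the text: day four takes 102 minutes instead of 52.
times_extreme = times.copy()
times_extreme[3] = 102
mean_extreme = sample_mean(times_extreme)

# Number of get-ready times that fall below the new mean.
below_new_mean = sum(t < mean_extreme for t in times_extreme)

print(round(mean_original, 1))  # 39.6
print(round(mean_extreme, 1))   # 44.6
print(below_new_mean)           # 9
```

The last line reproduces the observation that the new mean exceeds 9 of the 10 times, which is why it no longer describes a "typical" morning.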

EXAMPLE 3.1

THE MEAN 2003 RETURN FOR SMALL CAP MUTUAL FUNDS WITH HIGH RISK

The 121 mutual funds that are part of the “Using Statistics” scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the mean 2003 return for the small cap mutual funds with high risk.

SOLUTION The mean 2003 return for the small cap mutual funds with high risk (MUTUALFUNDS2004) is 51.53, calculated as follows:

$$\bar{X} = \frac{\text{sum of the values}}{\text{number of values}} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{463.8}{9} = 51.53$$

The ordered array for the nine small cap mutual funds with high risk is:

37.3 39.2 44.2 44.5 53.8 56.6 59.3 62.4 66.5

Four of these returns are below the mean of 51.53 and five of these returns are above the mean.


The Median

The median is the value that splits a ranked set of data into two equal parts. The median is not affected by extreme values, so you can use the median when extreme values are present. The median is the middle value in a set of data that has been ordered from lowest to highest value.

To calculate the median for a set of data, you first rank the values from smallest to largest. Then use Equation (3.2) to compute the rank of the value that is the median.

MEDIAN

50% of the values are smaller than the median and 50% of the values are larger than the median.

$$\text{Median} = \frac{n + 1}{2}\ \text{ranked value} \tag{3.2}$$

You compute the median value by following one of two rules:

• Rule 1 If there are an odd number of values in the data set, the median is the middle ranked value.

• Rule 2 If there are an even number of values in the data set, then the median is the average of the two middle ranked values.

To compute the median for the sample of 10 times to get ready in the morning, you rank the daily times as follows:

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ranked values | 29 | 31 | 35 | 39 | 39 | 40 | 43 | 44 | 44 | 52 |

Median = 39.5 (between the fifth and sixth ranked values)

Because the result of dividing n + 1 by 2 is (10 + 1)/2 = 5.5 for this sample of 10, you must use Rule 2 and average the fifth and sixth ranked values, 39 and 40. Therefore, the median is 39.5. The median of 39.5 means that for half of the days, the time to get ready is less than or equal to 39.5 minutes, and for half of the days the time to get ready is greater than or equal to 39.5 minutes. The median time to get ready of 39.5 minutes is very close to the mean time to get ready of 39.6 minutes.
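The two median rules can be sketched in Python as follows. The function name `median_by_rank` is an illustrative choice, and the 102-minute variant is the extreme-value scenario described earlier for the mean; the sketch assumes a non-empty list of numbers.

```python
def median_by_rank(values):
    """Median via the (n + 1)/2 ranked value, using Rule 1 (odd n) or Rule 2 (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        # Rule 1: (n + 1)/2 is a whole number, so take that ranked value.
        return ranked[(n + 1) // 2 - 1]
    # Rule 2: average the two middle ranked values (ranks n/2 and n/2 + 1).
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
times_extreme = [39, 29, 43, 102, 39, 44, 40, 31, 44, 35]  # day four = 102 minutes

median_times = median_by_rank(times)
median_extreme = median_by_rank(times_extreme)

print(median_times)    # 39.5
print(median_extreme)  # 39.5
```

The median stays at 39.5 when the extreme value is introduced, in contrast to the mean, which rose from 39.6 to 44.6.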

EXAMPLE 3.2

COMPUTING THE MEDIAN FROM AN ODD-SIZED SAMPLE

The 121 mutual funds that are part of the “Using Statistics” scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the median 2003 return for the nine small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION Because the result of dividing n + 1 by 2 is (9 + 1)/2 = 5 for this sample of nine, using Rule 1, the median is the fifth ranked value. The percentage returns in 2003 for the nine small cap mutual funds with high risk are ranked from the smallest to the largest:

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Ranked values | 37.3 | 39.2 | 44.2 | 44.5 | 53.8 | 56.6 | 59.3 | 62.4 | 66.5 |

The median return is 53.8, the fifth ranked value. Half the small cap high-risk mutual funds have returns equal to or below 53.8 and half have returns equal to or above 53.8.

The Mode

The mode is the value in a set of data that appears most frequently. Like the median and unlike the mean, extreme values do not affect the mode. You should use the mode only for descriptive purposes as it is more variable from sample to sample than either the mean or the median.

Often there is no mode or there are several modes in a set of data. For example, consider the time to get ready data shown below.

29 31 35 39 39 40 43 44 44 52

There are two modes, 39 minutes and 44 minutes, since each of these values occurs twice.

EXAMPLE 3.3

COMPUTING THE MODE

A systems manager in charge of a company’s network keeps track of the number of server failures that occur in a day. Compute the mode for the following data that represents the number of server failures in a day for the past two weeks.

1 3 0 3 26 2 7 4 0 2 3 3 6 3

SOLUTION The ordered array for these data is

0 0 1 2 2 3 3 3 3 3 4 6 7 26

Because 3 appears five times, more times than any other value, the mode is 3. Thus, the systems manager can say that the most common occurrence is having three server failures in a day. For this data set, the median is also equal to 3 while the mean is equal to 4.5. The extreme value 26 is an outlier. For these data, the median and the mode better measure central tendency than the mean.

A set of data will have no mode if none of the values is “most typical.” Example 3.4 presents a data set with no mode.

EXAMPLE 3.4

DATA WITH NO MODE

Compute the mode for the 2003 return for the small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION The ordered array for these data is

37.3 39.2 44.2 44.5 53.8 56.6 59.3 62.4 66.5

These data have no mode. None of the values is most typical because each value appears once.
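The three situations above (one mode, two modes, no mode) can be reproduced with a small Python sketch. The function name `modes` and the convention of returning an empty list when no value repeats are assumptions made for this illustration; the sketch expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Each value appears once, so none is "most typical": no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


server_failures = [1, 3, 0, 3, 26, 2, 7, 4, 0, 2, 3, 3, 6, 3]
get_ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
high_risk_returns = [37.3, 39.2, 44.2, 44.5, 53.8, 56.6, 59.3, 62.4, 66.5]

print(modes(server_failures))    # [3]
print(modes(get_ready_times))    # [39, 44]
print(modes(high_risk_returns))  # []
```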

Quartiles

Quartiles split a set of data into four equal parts. The first quartile Q1 divides the smallest 25.0% of the values from the other 75.0% that are larger. The second quartile Q2 is the median: 50.0% of the values are smaller than the median and 50.0% are larger. The third quartile Q3 divides the smallest 75.0% of the values from the largest 25.0%. Equations (3.3) and (3.4) define the first and third quartiles.¹

¹ Q1, the median, and Q3 are also the 25th, 50th, and 75th percentiles, respectively. Equations (3.2), (3.3), and (3.4) can be expressed generally in terms of finding percentiles: (p × 100)th percentile = p × (n + 1) ranked value.

FIRST QUARTILE Q1

25.0% of the values are smaller than Q1, the first quartile, and 75.0% are larger than the first quartile Q1.

$$Q_1 = \frac{n + 1}{4}\ \text{ranked value} \tag{3.3}$$

THIRD QUARTILE Q3

75.0% of the values are smaller than the third quartile Q3, and 25.0% are larger than the third quartile Q3.

$$Q_3 = \frac{3(n + 1)}{4}\ \text{ranked value} \tag{3.4}$$

Use the following rules to calculate the quartiles:

• Rule 1 If the result is a whole number, then the quartile is equal to that ranked value. For example, if the sample size n = 7, the first quartile Q1 is equal to the (7 + 1)/4 = second ranked value.

• Rule 2 If the result is a fractional half (2.5, 4.5, etc.), then the quartile is equal to the average of the corresponding ranked values. For example, if the sample size n = 9, the first quartile Q1 is equal to the (9 + 1)/4 = 2.5 ranked value, halfway between the second ranked value and the third ranked value.

• Rule 3 If the result is neither a whole number nor a fractional half, you round the result to the nearest integer and select that ranked value. For example, if the sample size n = 10, the first quartile Q1 is equal to the (10 + 1)/4 = 2.75 ranked value. Round 2.75 to 3 and use the third ranked value.

To illustrate the computation of the quartiles for the time-to-get-ready data, rank the data from smallest to largest.

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ranked values | 29 | 31 | 35 | 39 | 39 | 40 | 43 | 44 | 44 | 52 |

The first quartile is the (n + 1)/4 = (10 + 1)/4 = 2.75 ranked value. Using the third rule for quartiles, you round up to the third ranked value. The third ranked value for the get-ready time data is 35 minutes. You interpret the first quartile of 35 to mean that on 25% of the days the time to get ready is less than or equal to 35 minutes, and on 75% of the days, the time to get ready is greater than or equal to 35 minutes.

The third quartile is the 3(n + 1)/4 = 3(10 + 1)/4 = 8.25 ranked value. Using the third rule for quartiles, you round this down to the eighth ranked value. The eighth ranked value for the get-ready time data is 44 minutes. You interpret this to mean that on 75% of the days, the time to get ready is less than or equal to 44 minutes, and on 25% of the days, the time to get ready is greater than or equal to 44 minutes.
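The three quartile rules translate into a short Python sketch. The helper names `ranked_value` and `quartiles` are illustrative, and the sketch assumes at least two values. Note that these textbook rules differ from the interpolation schemes used by many software packages, so library functions may return slightly different quartiles.

```python
def ranked_value(ranked, position):
    """Apply the three rules to a 1-based rank position in already-ranked data."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        # Rule 1: whole-number rank, take that ranked value.
        return ranked[whole - 1]
    if fraction == 0.5:
        # Rule 2: fractional half, average the two neighboring ranked values.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Rule 3: otherwise round to the nearest rank.
    return ranked[round(position) - 1]


def quartiles(values):
    """Return (Q1, Q3) using Equations (3.3) and (3.4)."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = ranked_value(ranked, (n + 1) / 4)
    q3 = ranked_value(ranked, 3 * (n + 1) / 4)
    return q1, q3


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
q1, q3 = quartiles(times)
print(q1, q3)  # 35 44
```

Because the rank positions are always multiples of 0.25, the exact comparisons with 0 and 0.5 are safe in floating point here.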

EXAMPLE 3.5

COMPUTING THE QUARTILES

The 121 mutual funds that are part of the “Using Statistics” scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the first quartile (Q1) and third quartile (Q3) 2003 return for the small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION Ranked from smallest to largest, the percentage return in 2003 for the nine small cap mutual funds with high risk is:

| Ranks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Ranked values | 37.3 | 39.2 | 44.2 | 44.5 | 53.8 | 56.6 | 59.3 | 62.4 | 66.5 |

For these data

$$Q_1 = \frac{n + 1}{4}\ \text{ranked value} = \frac{9 + 1}{4}\ \text{ranked value} = 2.5\ \text{ranked value}$$

Therefore, using the second rule, Q1 is the 2.5 ranked value, halfway between the second ranked value and the third ranked value. Since the second ranked value is 39.2 and the third ranked value is 44.2, the first quartile Q1 is halfway between 39.2 and 44.2. Thus,

$$Q_1 = \frac{39.2 + 44.2}{2} = 41.7$$

To find the third quartile Q3

$$Q_3 = \frac{3(n + 1)}{4}\ \text{ranked value} = \frac{3(9 + 1)}{4}\ \text{ranked value} = 7.5\ \text{ranked value}$$

Therefore, using the second rule, Q3 is the 7.5 ranked value, halfway between the seventh ranked value and the eighth ranked value. Since the seventh ranked value is 59.3 and the eighth ranked value is 62.4, the third quartile Q3 is halfway between 59.3 and 62.4. Thus,

$$Q_3 = \frac{59.3 + 62.4}{2} = 60.85$$

The first quartile of 41.7 indicates that 25% of the returns in 2003 for small cap high-risk funds are below or equal to 41.7 and 75% are greater than or equal to 41.7. The third quartile of 60.85 indicates that 75% of the returns in 2003 for small cap high-risk funds are below or equal to 60.85 and 25% are greater than or equal to 60.85.

The Geometric Mean

The geometric mean and the geometric rate of return measure the status of an investment over time. The geometric mean measures the rate of change of a variable over time. Equation (3.5) defines the geometric mean.

GEOMETRIC MEAN

The geometric mean is the nth root of the product of n values.

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \tag{3.5}$$

Equation (3.6) defines the geometric mean rate of return.

GEOMETRIC MEAN RATE OF RETURN

$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \tag{3.6}$$

where

$R_i$ is the rate of return in time period i

To illustrate using these measures, consider an investment of $100,000 that declined to a value of $50,000 at the end of year 1 and then rebounded back to its original $100,000 value at the end of year 2. The rate of return for this investment for the two-year period is 0, because the starting and ending value of the investment is unchanged. However, the arithmetic mean of the yearly rates of return of this investment is

$$\bar{X} = \frac{(-0.50) + (1.00)}{2} = 0.25 \text{ or } 25\%$$

since the rate of return for year 1 is

$$R_1 = \frac{50{,}000 - 100{,}000}{100{,}000} = -0.50 \text{ or } -50\%$$

and the rate of return for year 2 is

$$R_2 = \frac{100{,}000 - 50{,}000}{50{,}000} = 1.00 \text{ or } 100\%$$

Using Equation (3.6), the geometric mean rate of return for the two years is

$$\begin{aligned}
\bar{R}_G &= [(1 + R_1) \times (1 + R_2)]^{1/n} - 1 \\
&= [(1 + (-0.50)) \times (1 + (1.0))]^{1/2} - 1 \\
&= [(0.50) \times (2.0)]^{1/2} - 1 \\
&= [1.0]^{1/2} - 1 \\
&= 1 - 1 = 0
\end{aligned}$$

Thus, the geometric mean rate of return more accurately reflects the (zero) change in the value of the investment for the two-year period than does the arithmetic mean.
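The contrast between the arithmetic mean and the geometric mean rate of return can be sketched in Python. The function name `geometric_mean_rate_of_return` is an illustrative choice; rates are expressed as decimals (for example, -0.50 for a 50% loss).

```python
def geometric_mean_rate_of_return(rates):
    """Equation (3.6): nth root of the product of (1 + R_i), minus 1."""
    product = 1.0
    for rate in rates:
        product *= 1 + rate
    return product ** (1 / len(rates)) - 1


# $100,000 falls to $50,000 (-50%) and then recovers to $100,000 (+100%).
yearly_rates = [-0.50, 1.00]

arithmetic_mean = sum(yearly_rates) / len(yearly_rates)
geometric_rate = geometric_mean_rate_of_return(yearly_rates)

print(arithmetic_mean)  # 0.25
print(geometric_rate)   # 0.0
```

The arithmetic mean suggests a 25% yearly gain even though the investment ended where it started, while the geometric mean rate of return is zero.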

EXAMPLE 3.6

COMPUTING THE GEOMETRIC MEAN RATE OF RETURN

The percentage change in the NASDAQ Composite Index was −31.53% in 2002 and +50.01% in 2003. Compute the geometric rate of return.

SOLUTION Using Equation (3.6), the geometric mean rate of return in the NASDAQ Composite Index for the two years is

$$\begin{aligned}
\bar{R}_G &= [(1 + R_1) \times (1 + R_2)]^{1/n} - 1 \\
&= [(1 + (-0.3153)) \times (1 + (0.5001))]^{1/2} - 1 \\
&= [(0.6847) \times (1.5001)]^{1/2} - 1 \\
&= [1.0271]^{1/2} - 1 \\
&= 1.0135 - 1 = 0.0135
\end{aligned}$$

The geometric rate of return in the NASDAQ Composite Index for the two years is 1.35%.

The Range

The range is the simplest numerical descriptive measure of variation in a set of data.

RANGE

The range is equal to the largest value minus the smallest value.

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}} \tag{3.7}$$

To determine the range of the times to get ready, you rank the data from smallest to largest:

29 31 35 39 39 40 43 44 44 52

Using Equation (3.7), the range is 52 − 29 = 23 minutes. The range of 23 minutes indicates that the largest difference between any two days in the time to get ready in the morning is 23 minutes.

EXAMPLE 3.7

COMPUTING THE RANGE IN THE 2003 RETURN OF SMALL CAP HIGH-RISK MUTUAL FUNDS

The 121 mutual funds that are part of the “Using Statistics” scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the range of the 2003 return for the small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION Ranked from the smallest to the largest, the 2003 return for the nine small cap mutual funds with high risk is:

37.3 39.2 44.2 44.5 53.8 56.6 59.3 62.4 66.5

Therefore, using Equation (3.7), the range = 66.5 − 37.3 = 29.2. The largest difference between any two returns for the small cap mutual funds with high risk is 29.2.

The range measures the total spread in the set of data. Although the range is a simple measure of total variation in the data, it does not take into account how the data are distributed between the smallest and largest values. In other words, the range does not indicate if the values are evenly distributed throughout the data set, clustered near the middle, or clustered near one or both extremes. Thus, using the range as a measure of variation when at least one value is an extreme value is misleading.

The Interquartile Range

The interquartile range (also called midspread) is the difference between the third and first quartiles in a set of data.

INTERQUARTILE RANGE

The interquartile range is the difference between the third quartile and the first quartile.

$$\text{Interquartile range} = Q_3 - Q_1 \tag{3.8}$$

The interquartile range measures the spread in the middle 50% of the data; therefore, it is not influenced by extreme values. To determine the interquartile range of the times to get ready

29 31 35 39 39 40 43 44 44 52

you use Equation (3.8) and the earlier results on page 78, Q1 = 35 and Q3 = 44.

Interquartile range = 44 − 35 = 9 minutes

Therefore, the interquartile range in the time to get ready is 9 minutes. The interval 35 to 44 is often referred to as the middle fifty.

Because the interquartile range does not consider any value smaller than Q1 or larger than Q3, it cannot be affected by extreme values. Summary measures such as the median, Q1, Q3, and the interquartile range, which cannot be influenced by extreme values, are called resistant measures.

EXAMPLE 3.8

COMPUTING THE INTERQUARTILE RANGE FOR THE 2003 RETURN OF SMALL CAP HIGH-RISK MUTUAL FUNDS

The 121 mutual funds that are part of the “Using Statistics” scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the interquartile range of the 2003 return for the small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION Ranked from smallest to largest, the 2003 return for the nine small cap mutual funds with high risk is:

37.3 39.2 44.2 44.5 53.8 56.6 59.3 62.4 66.5

Using Equation (3.8) and the earlier results on page 78, Q1 = 41.7 and Q3 = 60.85.

Interquartile range = 60.85 − 41.7 = 19.15

Therefore, the interquartile range in the 2003 return is 19.15.
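The difference between the range and a resistant measure such as the interquartile range can be seen in a short Python sketch. The function names are illustrative, the quartile helper follows the three textbook rounding rules, and the 102-minute variant is the extreme-value scenario used earlier for the mean.

```python
def quartile(ranked, position):
    """Pick a quartile from ranked data using the three rounding rules in the text."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:                    # Rule 1: whole-number rank
        return ranked[whole - 1]
    if fraction == 0.5:                  # Rule 2: average the two neighbors
        return (ranked[whole - 1] + ranked[whole]) / 2
    return ranked[round(position) - 1]   # Rule 3: nearest rank


def value_range(values):
    """Equation (3.7): largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Equation (3.8): Q3 minus Q1."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = quartile(ranked, (n + 1) / 4)
    q3 = quartile(ranked, 3 * (n + 1) / 4)
    return q3 - q1


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
times_extreme = [39, 29, 43, 102, 39, 44, 40, 31, 44, 35]  # day four = 102 minutes

print(value_range(times), interquartile_range(times))                  # 23 9
print(value_range(times_extreme), interquartile_range(times_extreme))  # 73 9
```

The single extreme value more than triples the range, while the interquartile range is unchanged.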

The Variance and the Standard Deviation

Although the range and the interquartile range are measures of variation, they do not take into consideration how the values distribute or cluster between the extremes. Two commonly used measures of variation that take into account how all the values in the data are distributed are the variance and the standard deviation. These statistics measure the “average” scatter around the mean: how larger values fluctuate above it and how smaller values distribute below it.

A simple measure of variation around the mean might take the difference between each value and the mean and then sum these differences. However, if you did that, you would find that because the mean is the balance point in a set of data, for every set of data these differences would sum to zero. One measure of variation that would differ from data set to data set would square the difference between each value and the mean and then sum these squared differences. In statistics, this quantity is called a sum of squares (or SS). This sum is then divided by the number of values minus 1 (for sample data) to get the sample variance (S²). The square root of the sample variance is the sample standard deviation (S).

Because the sum of squares is a sum of squared differences that by the rules of arithmetic will always be nonnegative, neither the variance nor the standard deviation can ever be negative. For most sets of data, the variance and standard deviation will be a positive value, although both of these statistics will be zero if there is no variation at all in a set of data and each value in the sample is the same.

For a sample containing n values, $X_1, X_2, X_3, \ldots, X_n$, the sample variance (given by the symbol S²) is

$$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$

Equation (3.9) expresses the equation using summation notation.

SAMPLE VARIANCE

The sample variance is the sum of the squared differences around the mean divided by the sample size minus one.

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \tag{3.9}$$

where

$\bar{X}$ = mean

n = sample size

$X_i$ = ith value of the variable X

$\sum_{i=1}^{n} (X_i - \bar{X})^2$ = summation of all the squared differences between the $X_i$ values and $\bar{X}$

SAMPLE STANDARD DEVIATION

The sample standard deviation is the square root of the sum of the squared differences around the mean divided by the sample size minus one.

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \tag{3.10}$$

If the denominator were n instead of n − 1, Equation (3.9) [and the inner term in Equation (3.10)] would calculate the average of the squared differences around the mean. However, n − 1 is used because of certain desirable mathematical properties possessed by the statistic S² that make it appropriate for statistical inference (which will be discussed in Chapter 7). As the sample size increases, the difference between dividing by n or n − 1 becomes smaller and smaller.

You will most likely use the sample standard deviation as your measure of variation [defined in Equation (3.10)]. Unlike the sample variance, which is a squared quantity, the standard deviation is always a number that is in the same units as the original sample data. The standard deviation helps you to know how a set of data clusters or distributes around its mean. For almost all sets of data, the majority of the observed values lie within an interval of plus and minus one standard deviation above and below the mean. Therefore, knowledge of the mean and the standard deviation usually helps define where at least the majority of the data values are clustering.

To hand-calculate the sample variance S² and the sample standard deviation S:

Step 1: Compute the difference between each value and the mean.

Step 2: Square each difference.

Step 3: Add the squared differences.

Step 4: Divide this total by n − 1 to get the sample variance.

Step 5: Take the square root of the sample variance to get the sample standard deviation.

Table 3.1 shows the first four steps for calculating the variance and standard deviation for the getting ready times data with a mean ($\bar{X}$) equal to 39.6 (see page 74 for the calculation of the mean). The second column of Table 3.1 shows Step 1. The third column of Table 3.1 shows Step 2. The sum of the squared differences (Step 3) is shown at the bottom of Table 3.1. This total is then divided by 10 − 1 = 9 to compute the variance (Step 4).

TABLE 3.1 Computing the Variance of the Getting Ready Times ($\bar{X}$ = 39.6)

| Time (X) | Step 1: $(X_i - \bar{X})$ | Step 2: $(X_i - \bar{X})^2$ |
|---|---|---|
| 39 | −0.60 | 0.36 |
| 29 | −10.60 | 112.36 |
| 43 | 3.40 | 11.56 |
| 52 | 12.40 | 153.76 |
| 39 | −0.60 | 0.36 |
| 44 | 4.40 | 19.36 |
| 40 | 0.40 | 0.16 |
| 31 | −8.60 | 73.96 |
| 44 | 4.40 | 19.36 |
| 35 | −4.60 | 21.16 |
| | Step 3: Sum | 412.40 |
| | Step 4: Divide by (n − 1) | 45.82 |
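The five hand-calculation steps can be sketched in Python for the get-ready times. Variable names are illustrative. The final cross-check uses `statistics.variance` and `statistics.stdev` from the Python standard library, which also divide by n − 1.

```python
import math
import statistics

times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
n = len(times)
mean = sum(times) / n

differences = [x - mean for x in times]       # Step 1: difference from the mean
squared = [d ** 2 for d in differences]       # Step 2: square each difference
sum_of_squares = sum(squared)                 # Step 3: add the squared differences
variance = sum_of_squares / (n - 1)           # Step 4: divide by n - 1
std_dev = math.sqrt(variance)                 # Step 5: take the square root

print(round(sum_of_squares, 2))  # 412.4
print(round(variance, 2))        # 45.82
print(round(std_dev, 2))         # 6.77

# The unsquared differences sum to zero (up to floating-point rounding).
print(abs(sum(differences)) < 1e-9)  # True

# Cross-check against the standard library's sample variance and standard deviation.
print(math.isclose(variance, statistics.variance(times)))  # True
print(math.isclose(std_dev, statistics.stdev(times)))      # True
```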

You can also calculate the variance by substituting values for the terms in Equation (3.9):

$$\begin{aligned}
S^2 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \\
&= \frac{(39 - 39.6)^2 + (29 - 39.6)^2 + \cdots + (35 - 39.6)^2}{10 - 1} \\
&= \frac{412.4}{9} = 45.82
\end{aligned}$$

Because the variance is in squared units (in squared minutes for these data), to compute the standard deviation you take the square root of the variance. Using Equation (3.10) on page 82, the sample standard deviation S is

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{45.82} = 6.77$$

This indicates that the get-ready times in this sample are clustering within 6.77 minutes around the mean of 39.6 minutes (i.e., clustering between $\bar{X} - 1S = 32.83$ and $\bar{X} + 1S = 46.37$). In fact, 7 out of 10 get-ready times lie within this interval.

Using the second column of Table 3.1, you can also calculate the sum of the differences between each value and the mean to be zero. For any set of data, this sum will always be zero:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0 \text{ for all sets of data}$$

This property is one of the reasons that the mean is used as the most common measure of central tendency.

EXAMPLE 3.9

COMPUTING THE VARIANCE AND STANDARD DEVIATION OF THE 2003 RETURN OF SMALL CAP HIGH-RISK MUTUAL FUNDS

The 121 mutual funds that are part of the “Using Statistics” scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the variance and standard deviation of the 2003 return for the small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION Table 3.2 illustrates the computation of the variance and standard deviation for the return in 2003 for the small cap mutual funds with high risk. Using Equation (3.9) on page 82

$$\begin{aligned}
S^2 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \\
&= \frac{(44.5 - 51.53)^2 + (39.2 - 51.53)^2 + \cdots + (66.5 - 51.53)^2}{9 - 1} \\
&= \frac{891.16}{8} = 111.395
\end{aligned}$$

TABLE 3.2 Computing the Variance of the 2003 Return for the Small Cap Mutual Funds with High Risk

$\bar{X} = 51.5333$

Return 2003    Step 1: (Xi − X̄)    Step 2: (Xi − X̄)²
44.5           −7.0333              49.4678
39.2           −12.3333             152.1111
62.4           10.8667              118.0844
59.3           7.7667               60.3211
56.6           5.0667               25.6711
53.8           2.2667               5.1378
37.3           −14.2333             202.5878
44.2           −7.3333              53.7778
66.5           14.9667              224.0011
Step 3: Sum:                        891.16
Step 4: Divide by (n − 1):          111.395

Using Equation (3.10) on page 82, the sample standard deviation S is

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}} = \sqrt{111.395} = 10.55$$

The standard deviation of 10.55 indicates that the 2003 returns for the small cap mutual funds with high risk are clustering within 10.55 around the mean of 51.53 (i.e., clustering between $\bar{X} - 1S = 40.98$ and $\bar{X} + 1S = 62.08$). In fact, 55.6% (5 out of 9) of the 2003 returns lie within this interval.

The following summarizes the characteristics of the range, interquartile range, variance, and standard deviation.
• The more spread out, or dispersed, the data are, the larger the range, interquartile range, variance, and standard deviation.
• The more concentrated, or homogeneous, the data are, the smaller the range, interquartile range, variance, and standard deviation.
• If the values are all the same (so that there is no variation in the data), the range, interquartile range, variance, and standard deviation will all equal zero.
• None of the measures of variation (the range, interquartile range, standard deviation, and variance) can ever be negative.

The Coefficient of Variation
Unlike the previous measures of variation presented, the coefficient of variation is a relative measure of variation that is always expressed as a percentage rather than in terms of the units of the particular data. The coefficient of variation, denoted by the symbol CV, measures the scatter in the data relative to the mean.

COEFFICIENT OF VARIATION
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.

$$CV = \left(\frac{S}{\bar{X}}\right)100\% \qquad (3.11)$$

where
S = sample standard deviation
$\bar{X}$ = sample mean

For the sample of 10 get-ready times, since $\bar{X} = 39.6$ and S = 6.77, the coefficient of variation is

$$CV = \left(\frac{S}{\bar{X}}\right)100\% = \left(\frac{6.77}{39.6}\right)100\% = 17.10\%$$

For the get-ready times, the standard deviation is 17.1% of the size of the mean.

You will find the coefficient of variation very useful when comparing two or more sets of data that are measured in different units, as Example 3.10 illustrates.

EXAMPLE 3.10 COMPARING TWO COEFFICIENTS OF VARIATION WHEN TWO VARIABLES HAVE DIFFERENT UNITS OF MEASUREMENT
The operations manager of a package delivery service is deciding on whether to purchase a new fleet of trucks. When packages are stored in the trucks in preparation for delivery, you need to consider two major constraints: the weight (in pounds) and the volume (in cubic feet) for each item. The operations manager samples 200 packages, and finds that the mean weight is 26.0 pounds, with a standard deviation of 3.9 pounds, and the mean volume is 8.8 cubic feet, with a standard deviation of 2.2 cubic feet. How can the operations manager compare the variation of the weight and the volume?

SOLUTION Because the measurement units differ for the weight and volume constraints, the operations manager should compare the relative variability in the two types of measurements. For weight, the coefficient of variation is

$$CV_W = \left(\frac{3.9}{26.0}\right)100\% = 15.0\%$$

For volume, the coefficient of variation is

$$CV_V = \left(\frac{2.2}{8.8}\right)100\% = 25.0\%$$

Thus, relative to the mean, the package volume is much more variable than the package weight.

Z Scores
An extreme value or outlier is a value located far away from the mean. Z scores are useful in identifying outliers. The larger the absolute value of the Z score, the farther the distance from the value to the mean. The Z score is the difference between the value and the mean, divided by the standard deviation.
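Before turning to the Z score formula, the coefficient-of-variation comparisons above can be checked numerically. The following is a minimal sketch; the helper name `coefficient_of_variation` is an illustrative choice, and the inputs are the rounded summary statistics quoted in the text.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (S / X-bar) * 100."""
    return std_dev / mean * 100

# Get-ready times: S = 6.77 minutes, mean = 39.6 minutes.
cv_get_ready = coefficient_of_variation(6.77, 39.6)

# Example 3.10: package weight (pounds) and volume (cubic feet).
cv_weight = coefficient_of_variation(3.9, 26.0)
cv_volume = coefficient_of_variation(2.2, 8.8)

print(f"get-ready times: {cv_get_ready:.2f}%")
print(f"weight: {cv_weight:.1f}%, volume: {cv_volume:.1f}%")
```

Because the units cancel in the ratio, the two package measurements can be compared directly even though one is in pounds and the other in cubic feet.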

Z SCORES

$$Z = \frac{X - \bar{X}}{S} \qquad (3.12)$$

For the time to get ready in the morning data, the mean is 39.6 minutes and the standard deviation is 6.77 minutes. The time to get ready on the first day is 39.0 minutes. You compute the Z score for day 1 from

$$Z = \frac{X - \bar{X}}{S} = \frac{39.0 - 39.6}{6.77} = -0.09$$

Table 3.3 shows the Z scores for all 10 days. The largest Z score is 1.83 for day 4, on which the time to get ready was 52 minutes. The lowest Z score was −1.57 for day 2, on which the time to get ready was 29 minutes. As a general rule, a Z score is considered an outlier if it is less than −3.0 or greater than +3.0. None of the times met that criterion to be considered outliers.

TABLE 3.3 Z Scores for the 10 Get-Ready Times

Time (X)    Z Score
39          −0.09
29          −1.57
43          0.50
52          1.83
39          −0.09
44          0.65
40          0.06
31          −1.27
44          0.65
35          −0.68
Mean                  39.6
Standard deviation    6.77

EXAMPLE 3.11 COMPUTING THE Z SCORES OF THE 2003 RETURN OF SMALL CAP HIGH-RISK MUTUAL FUNDS
The 121 mutual funds that are part of the "Using Statistics" scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the Z scores of the 2003 return for the small cap mutual funds with high risk. MUTUALFUNDS2004

SOLUTION Table 3.4 illustrates the Z scores of the 2003 return for the small cap mutual funds with high risk. The largest Z score is 1.42 for a percentage return of 66.5. The lowest Z score is −1.35 for a percentage return of 37.3. As a general rule, a Z score is considered an outlier if it is less than −3.0 or greater than +3.0. None of the percentage returns met that criterion to be considered outliers.
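The Z score calculation for the get-ready times can be sketched with Python's standard `statistics` module. This is an illustration under the assumption that the unrounded sample standard deviation is used (the text rounds S to 6.77 first); the resulting Z scores agree with the text to two decimal places.

```python
import statistics

# Get-ready times (in minutes) for days 1 through 10.
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

mean = statistics.mean(times)
std_dev = statistics.stdev(times)  # sample standard deviation (n - 1 denominator)

# Z score: difference between the value and the mean, divided by S.
z_scores = [(x - mean) / std_dev for x in times]

# General rule from the text: outlier if Z < -3.0 or Z > +3.0.
outliers = [x for x, z in zip(times, z_scores) if abs(z) > 3.0]

for day, (x, z) in enumerate(zip(times, z_scores), start=1):
    print(f"day {day:2d}: time = {x}, Z = {z:.2f}")
print("outliers:", outliers)
```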

TABLE 3.4 Z Scores of the 2003 Return for the Small Cap Mutual Funds with High Risk

Return 2003    Z Scores
44.5           −0.67
39.2           −1.17
62.4           1.03
59.3           0.74
56.6           0.48
53.8           0.21
37.3           −1.35
44.2           −0.69
66.5           1.42
Mean                  51.53
Standard Deviation    10.55

Shape
A third important property that describes a set of numerical data is shape. Shape is the pattern of the distribution of data values throughout the entire range of all the values. A distribution will either be symmetrical, when low and high values balance each other out, or skewed, not symmetrical and showing an imbalance of low values or high values. Shape influences the relationship of the mean to the median in the following ways:
• Mean < median; negative or left-skewed
• Mean = median; symmetric or zero skewness
• Mean > median; positive or right-skewed

Figure 3.1 depicts three data sets, each with a different shape.

FIGURE 3.1 A Comparison of Three Data Sets Differing in Shape (Panel A: Negative, or left-skewed; Panel B: Symmetrical; Panel C: Positive, or right-skewed)

The data in panel A are negative, or left-skewed. In this panel, most of the values are in the upper portion of the distribution. There is a long tail and distortion to the left that is caused by some extremely small values. These extremely small values pull the mean downward so that the mean is less than the median.

The data in panel B are symmetrical. Each half of the curve is a mirror image of the other half of the curve. The low and high values on the scale balance, and the mean equals the median.

The data in panel C are positive, or right-skewed. In this panel, most of the values are in the lower portion of the distribution. There is a long tail on the right of the distribution and a distortion to the right that is caused by some extremely large values. These extremely large values pull the mean upward so that the mean is greater than the median.

Visual Explorations: Exploring Descriptive Statistics
Use the Visual Explorations Descriptive Statistics procedure to see the effect of changing data values on measures of central tendency, variation, and shape. Open the Visual Explorations.xla macro workbook and select VisualExplorations Descriptive Statistics from the Microsoft Excel menu bar. Read the instructions in the popup box (see illustration on page 89) and click OK to examine a dot scale diagram for the sample of 10 get-ready times used throughout this chapter.
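The kind of exploration this procedure supports can also be sketched in a few lines of Python, outside the Excel workbook. The sketch below replaces one of the 10 get-ready times with an extreme value of 10 minutes and compares the summary measures; which day is replaced is an arbitrary choice made here for illustration.

```python
import statistics

# Get-ready times (in minutes) from the running example.
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

# Hypothetical change: the first day's time becomes an extreme 10 minutes.
changed = times.copy()
changed[0] = 10

mean_before, median_before = statistics.mean(times), statistics.median(times)
mean_after, median_after = statistics.mean(changed), statistics.median(changed)

print(f"before: mean = {mean_before}, median = {median_before}")
print(f"after:  mean = {mean_after}, median = {median_after}")
print(f"std dev before = {statistics.stdev(times):.2f}, "
      f"after = {statistics.stdev(changed):.2f}")
```

Under these assumptions the mean falls from 39.6 to 36.7 while the median stays at 39.5, so the single extreme low value pulls the mean below the median, which is the left-skewed pattern described for panel A.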

Experiment by entering an extreme value such as 10 minutes into one of the tinted cells of column A. Which measures are affected by this change? Which ones are not? You can flip between the "before" and "after" diagrams by repeatedly pressing Ctrl-Z (undo) followed by Ctrl-Y (redo) to help see the changes the extreme value caused in the diagram.

Microsoft Excel Descriptive Statistics Output
The Microsoft Excel Data Analysis ToolPak generates the mean, median, mode, standard deviation, variance, range, minimum, maximum, and count (sample size) on a single worksheet, all of which have been discussed in this section. In addition, Excel computes the standard error, along with statistics for kurtosis and skewness. The standard error is the standard deviation divided by the square root of the sample size and will be discussed in Chapter 7. Skewness measures the lack of symmetry in the data and is based on a statistic that is a function of the cubed differences around the mean. A skewness value of zero indicates a symmetric distribution. Kurtosis measures the relative concentration of values in the center of the distribution as compared with the tails and is based on the differences around the mean raised to the fourth power. This measure is not discussed in this text (see reference 2).

FIGURE 3.2 Microsoft Excel Descriptive Statistics of the 2003 Returns Based on Risk Level

From Figure 3.2, the Excel descriptive statistics output for the 2003 return of the funds based on risk level, there appear to be slight differences in the 2003 percentage return for the three risk levels. High-risk funds had a slightly higher mean and median than did low-risk and average-risk funds. There was very little difference in the standard deviations of the three groups.

Minitab Descriptive Statistics Output
For descriptive statistics, Minitab computes the sample size (labeled as N), the mean, median, standard deviation (labeled StDev), coefficient of variation (labeled CoefVar), minimum, maximum, range, first and third quartiles, and interquartile range (labeled IQR), all of which have been discussed in this section.

FIGURE 3.3 Minitab Descriptive Statistics of the 2003 Returns Based on Risk Level

From Figure 3.3, the Minitab descriptive statistics output for the 2003 return of the funds based on risk level, there appear to be slight differences in the 2003 percentage return for the three risk levels. High-risk funds had a slightly higher mean, median, and quartiles than did low-risk and average-risk funds. There was very little difference in the standard deviations or interquartile ranges of the three groups.

PROBLEMS FOR SECTION 3.1

Learning the Basics

3.1 The following is a set of data from a sample of n = 5:
7 4 9 8 2
a. Compute the mean, median, and mode.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.

3.2 The following is a set of data from a sample of n = 6:
7 4 9 7 3 12
a. Compute the mean, median, and mode.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.

3.3 The following set of data is from a sample of n = 7:
12 7 4 9 0 7 3
a. Compute the mean, median, and mode.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Describe the shape of the data set.

3.4 The following is a set of data from a sample of n = 5:
7 −5 −8 7 9
a. Compute the mean, median, and mode.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Describe the shape of the data set.

c. b. What would be the effect on your answers in (a) and (b) if the last value for grade Y were 588 instead of 578? Explain. Which grade of tire is providing better quality? Explain. range.7 The following data represent the total fat for burgers and chicken items from a sample of fastfood chains. For the burgers and chicken items separately: a. Columbia Utah State University. first quartile. c.425 922 308 a. D1). Fees—and Ire. an increase of 6. Logan 1. 46. and Shape 3. first quartile..1: Measures of Central Tendency. median. Athens University of Illinois. Manhattan University of Maine. Compute the geometric mean rate of return.5 Suppose that the rate of return for a particular stock during the past two years was 10% and 30%. range.” USA Today. Variation. Compute the mean. . b. and Z scores. The following represents the change in the cost of tuition. or SPSS.30.” Copyright © 2004 by Consumers Union of U. standard deviation. 2004.0 510 22. FASTFOOD Burgers 19 31 34 35 39 39 43 Chicken 7 9 15 16 16 18 22 25 27 33 39 Source: Extracted from “Quick Bites.0 530 19..” Copyright © 2001 by Consumers Union of U. Durham Ohio State University. NY 10703–1057. Compute the mean.) PH Grade ASSIST Applying the Concepts Problems 3. and third quartile. how? d. median.0 420 16. Yonkers.8 The median price of a home in December 2003 rose to $173. and standard deviation. January 27. Based on the results of (a) through (c).9 In the 2002–2003 academic year. what conclusions can you reach concerning the differences in total fat of burgers and chicken items? 3.10 The following data COFFEEDRINK represent the calories and fat (in grams) of 16-ounce iced coffee drinks at Dunkin’ Donuts and Starbucks. Berkeley University of Georgia. interquartile range. Adapted with permission from Consumer Reports. what conclusions can you reach concerning the change in costs between the 2001–2002 and 2002–2003 academic years? 3.5 22.20 can be solved manually or by using Microsoft Excel. 
Oxford University of New Hampshire. SELF Test 3.6 The operations manager of a plant that manufactures tires wants to compare the actual inner diameter of two grades of tires. March 2001. compute the mean. For the full year.1 million homes (James R.0 Source: Extracted from “Coffee as Candy at Dunkin’ Donuts and Starbucks.0 260 350 3. interquartile range. c. Product Dunkin’ Donuts Iced Mocha Swirl latte (whole milk) Starbucks Coffee Frappuccino blended coffee Dunkin’ Donuts Coffee Coolatta (cream) Starbucks Iced Coffee Mocha Expresso (whole milk and whipped cream) Starbucks Mocha Frappuccino blended coffee (whipped cream) Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream) Starbucks Chocolate Frappuccino Blended Crème (whipped cream) Calories Fat 240 8.. are as follows: PH Grade ASSIST Grade X Grade Y 568 570 575 578 584 573 574 575 577 578 a.S.7% from December 2002.0 350 20. 1A–2A). and the results representing the inner diameters of the tires. how? d. 3. Compute the variance. August 8. A sample of five tires of each grade was selected.6–3. a shared dormitory room. June 2004. median. many public universities in the United States raised tuition and fees due to a decrease in state subsidies (Mary Beth Marklein. a. Minitab. “Public Universities Raise Tuition.10 and a rate of return of 30% is recorded as 0. Why do you think the article reports the median home price and not the mean home price? 91 3. and third quartile. Based on the results of (a) through (c). Hagerty. each of which is expected to be 575 millimeters. coefficient of variation.. Inc.589 593 1. Adapted with permission from Consumer Reports. Columbus University of South Carolina. 9. Inc. Orono University of Mississippi. b.200. COLLEGECOST University Change in Cost ($) University of California. and the most popular meal plan between the 2001–2002 academic year and the 2002–2003 academic year for a sample of 10 public universities. Compute the variance. and coefficient of variation. 
“Housing Prices Continue to Rise. 2002.” The Wall Street Journal. Are the data skewed? If so.3. For each of the two grades of tires. standard deviation. Are the data skewed? If so. ranked from smallest to largest.720 708 1. (Note: A rate of return of 10% is recorded as 0.223 869 423 1. NY 10703–1057.S. b. Describe the shape of the distribution of the price of homes sold. Urbana–Champaign Kansas State University. Yonkers. sales hit a record 6.

Are the data skewed? If so. what conclusions can you reach concerning the price of 3-megapixel digital cameras at a camera specialty store during 2003? 3. Compute the mean. b. c. A. 2003. 3.5 15. Are the data skewed? If so. Compute the mean. cities during a week in October 2003. standard deviation. and third quartile. interquartile range. Repeat (a) through (c).13 A software development and consulting firm located in the Phoenix metropolitan area develops software for supply chain management systems using systematic software reuse. 3. coefficient of variation. first quartile. Based on the results of (a) through (c). range. Based on the results of (a) through (c). b.) d. what conclusions can you reach concerning the calories and fat in iced coffee drinks at Dunkin’ Donuts and Starbucks? 3.0 45. a. coefficient of variation. Rothenberger.5 75. range. and standard deviation. For each variable (hotel cost and rental car cost). “A Performance Measure for Software Reuse Projects.342 instead of 342. variance. REUSE 50 62.M. The waiting time in minutes (defined as the time the customer enters the line to . Looking at the distribution of times to failure. standard deviation. October 10. which measures of location do you think are most appropriate and which least appropriate to use for these data? Why? b. Louis New Orleans Detroit Cleveland Atlanta Orlando Miami Pittsburgh Boston New York Washington. how? d. S.049 631 512 266 492 562 298 a.5 37. Comment on the difference in the results.0 47. what conclusions can you reach concerning the daily cost of a hotel and rental car? 3. coefficient of variation. and Z scores. The following data are given as a percentage of the total code written for a software system that is part of the reuse database. b.0 25. Instead of starting from scratch when writing and developing new custom software systems. median. 1131–1153. 
What would you advise if the manufacturer wanted to be able to say in advertisements that these batteries “should last 400 hours”? (Note: There is no right answer to this question. Are there any outliers? Explain. Compute the variance. first quartile. median. range. median. and Z scores. J. Compute the mean. 30(Fall 1999). interquartile range. variance. Compute the range. Are the data skewed? If so. c. The numbers of hours they were used until failure were: BATTERIES 342 426 317 545 264 451 1. how? d. a. c.92 CHAPTER THREE Numerical Descriptive Measures For each variable (calories and fat). and Z scores. and K. and third quartile.” Decision Sciences. CAMERA 340 450 450 280 220 340 290 370 400 310 340 430 270 380 a. Based on the results of (a) through (c). Dooley.15 A bank branch located in a commercial district of a city has developed an improved process for serving customers during the noon to 1:00 P. c. Hotel Cars 205 179 185 210 128 145 177 117 221 159 205 128 165 180 198 158 132 283 269 204 47 41 49 38 32 48 49 41 56 41 50 32 34 46 41 40 39 67 69 40 Source: Extracted from The Wall Street Journal. Suppose that the first value was 1. and mode. HOTEL-CAR City San Francisco Los Angeles Seattle Phoenix Denver Dallas Houston Minneapolis Chicago St.11 The following data represent the daily hotel cost and rental car cost for 20 U. b. first quartile. Interpret the summary measures calculated in (a) and (b).C. interquartile range. Eight analysts at the firm were asked to estimate the reuse rate when developing a new software system. how? d.14 A manufacturer of flashlight batteries took a sample of 13 batteries from a day’s production and used them continuously until they were drained. Are there any outliers? Explain. a. lunch period. and mode. c. standard deviation. Compute the mean.0 Source: M.12 The cost of 14 models of 3-megapixel digital cameras at a camera specialty store during 2003 was as follows. Are there any outliers? Explain. and standard deviation. W4. 
the firm uses a database of reusable components totaling more than 2. the point is to consider how to make such a statement precise. Compute the mean.000. Calculate the range. median. and third quartile. Compute the variance. D. using this value. Compute the variance. median.000 lines of code collected from 10 years of continuous reuse effort.

68 5.90 −9. and the money market deposit.19 3. b.66 5.64 4. and Shape when he or she reaches the teller window) of all customers during this hour is recorded over a period of one week.44 −6.79 a. A random sample of 15 customers is selected.20 The time period from 2000 to 2003 saw a great deal of volatility in the value of metals. 2004. Compare the results of (b) to those of problems 3.79 8. and the Wilshire 5000 Index. 2004. Year Platinum Gold Silver 2003 2002 2001 2000 34. Passenger car sales increased 61% in 2002 and 55% in 2003 (Peter Wonacott.” On the basis of the results of (a) and (b).89 Source: Extracted from The Wall Street Journal.38 5. Are there any outliers? Explain.8 24. Compare the results of (b) to those of problems 3. Variation. and silver from 2000 to 2003.01 8.40 −20. As a customer walks into the branch office during the lunch hour. gold.74 3.02 1. a. Year One Year 30 Month Money Market 2003 2002 2001 2000 1. a.3 −23. What conclusions can you reach concerning the geometric rates of return of the four stock indexes? c.12 6. the Standard & Poor’s 500. standard deviation. evaluate the accuracy of this statement. interquartile range. and the results are as follows: BANK1 4. The waiting time in minutes (defined as the time the customer enters the line to the time he or she reaches the teller window) of all customers during these hours is recorded over a period of one week. and coefficient of variation. and Z scores.9 Source: Extracted from The Wall Street Journal. What conclusions can you reach concerning the geometric rates of return of the three deposits? c. and third quartile.5 24.5 −21. Compute the variance.35 10. and silver. b.0 −5.0 5. Compute the variance.M.18 (b) and 3. 3.98 3.20 (b).21 5.03 −3.09 Source: Extracted from The Wall Street Journal. The branch manager replies.2 24. Calculate the geometric rate of return for the Dow Jones Industrial Index. the Standard & Poor’s 500. 
range.17 China is the fastest-growing market for passenger car sales and fourth biggest after the United States.1: Measures of Central Tendency.” The Wall Street Journal. gold. What conclusions can you reach concerning the geometric rates of return of the three metals? c.93 3. b. and the money market deposit from 2000 to 2003. b. range.82 8. is also concerned with the noon to 1 P. 2004. and the results are as follows: BANK2 9.19 The time period from 2000 to 2003 saw a great deal of volatility in the value of investments. standard deviation. January 2.20 26.50 6. the Russell 2000 Index. January 2.61 1.76 2.97 −10. Compute the mean.18 (b) and 3.” On the basis of the results of (a) and (b). evaluate the accuracy of this statement. Japan. The data in the following table BANKRETURN represent the total rate of return of the one-year certificate of deposit. Calculate the geometric rate of return for the one-year certificate of deposit. Compute the geometric mean rate of increase.46 1. The data in the following table STOCKRETURN represent the total rate of return of the Dow Jones Industrial Index.34 3. 3.60 5.20 1. a.73 2. and third quartile.40 −22.90 −10.08 6.20 (b).55 3. first quartile. how? d.02 29. January 2.64 0. As a customer walks into the branch office during the lunch hour.73 3. he asks the branch manager how long he can expect to wait.54 3.40 −21. Year DJIA SP500 Russell2000 Wilshire5000 2003 2002 2001 2000 25. “Almost certainly less than five minutes.91 5.02 5.10 45. the 30-month certificate of deposit.10 −11.61.16 Suppose that another branch.77 2. and Germany. 2004.47 a. “Almost certainly less than five minutes. lunch hour. (Hint: Denote an increase of 61% as R1 = 0. Are the data skewed? If so.49 6. median. c.2 1.46 6. the Russell 2000 Index. 3. Compute the mean. interquartile range. c.01 −5. the 30-month certificate of deposit.3 19.97 5. median.5 1.19 (b) and 3.5 −3. how? d. . and the Wilshire 5000 Index from 2000 to 2003. 
The data in the following table METALRETURN represent the total rate of return for platinum.) SELF Test 3. she asks the branch manager how long she can expect to wait. first quartile. “A Fear Amid China’s Car Boom. The branch manager replies.58 −1. 3. A random sample of 15 customers is selected.19 (b).90 8. b.02 5.17 9. A17).10 0. Calculate the geometric rate of return for platinum. Are there any outliers? Explain.30 −15. February 2. Compare the results of (b) to those of problems 3. Are the data skewed? If so.13 4. coefficient of variation.18 The time period from 2000 to 2003 saw a great deal of volatility in the value of stocks.20 4. located in a residential area.

3.2 NUMERICAL DESCRIPTIVE MEASURES FOR A POPULATION

Section 3.1 presented various statistics that described the properties of central tendency, variation, and shape for a sample. If your data set represents numerical measurements for an entire population, you need to calculate and interpret parameters, summary measures for a population. In this section, you will learn about three descriptive population parameters: the population mean, population variance, and population standard deviation.

To help illustrate these parameters, first review Table 3.5 that contains the five biggest bond funds (in terms of total assets) as of March 1, 2004. The 52-week return for each of these funds is also listed. LARGEST BONDS

TABLE 3.5 2003 Return for the Population Consisting of the Five Largest Bond Funds

Bond Fund                     52-Week Return (in %)
Vanguard GNMA                 3.8
Vanguard Total Bond Index     6.5
Pimco Total Return Admin      7.0
Pimco Total Return Instl      7.3
America Bond Fund             12.9

Source: Extracted from The Wall Street Journal, March 25, 2004, C2.

The Population Mean
The population mean is represented by the symbol µ, the Greek lowercase letter mu. Equation (3.13) defines the population mean.

POPULATION MEAN
The population mean is the sum of the values in the population divided by the population size N.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \qquad (3.13)$$

where
µ = population mean
$X_i$ = ith value of the variable X
$\sum_{i=1}^{N} X_i$ = summation of all $X_i$ values in the population

To compute the mean return for the population of bond funds given in Table 3.5, use Equation (3.13).

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{3.8 + 6.5 + 7.0 + 7.3 + 12.9}{5} = \frac{37.5}{5} = 7.5$$

Thus, the mean 2003 return for these bond funds is 7.5%.
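The population mean calculation can be sketched as follows. This is a minimal illustration that assumes the five bond fund returns of Table 3.5 form the entire population; the dictionary name `returns` is chosen here for illustration.

```python
# 52-week returns (in %) for the population of the five largest bond funds.
returns = {
    "Vanguard GNMA": 3.8,
    "Vanguard Total Bond Index": 6.5,
    "Pimco Total Return Admin": 7.0,
    "Pimco Total Return Instl": 7.3,
    "America Bond Fund": 12.9,
}

N = len(returns)                  # population size
mu = sum(returns.values()) / N    # Equation (3.13): sum of all values divided by N

print(f"N = {N}, population mean = {mu:.1f}%")
```

The arithmetic is the same as for a sample mean; what changes is the interpretation, since µ is a parameter describing every member of the population rather than an estimate.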

The Population Variance and Standard Deviation
The population variance and the population standard deviation measure variation in a population. Like the related sample statistics, the population standard deviation is the square root of the population variance. The symbol σ², the Greek lowercase letter sigma squared, represents the population variance and the symbol σ, the Greek lowercase letter sigma, represents the population standard deviation. Equations (3.14) and (3.15) define these parameters. The denominators for the right-side terms in these equations use N and not the (n − 1) term that is used in the equations for the sample variance and standard deviation [see Equations (3.9) and (3.10) on page 82].

POPULATION VARIANCE
The population variance is the sum of the squared differences around the population mean divided by the population size N.

$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N} \qquad (3.14)$$

where
µ = population mean
$X_i$ = ith value of the variable X
$\sum_{i=1}^{N}(X_i - \mu)^2$ = summation of all the squared differences between the $X_i$ values and µ

POPULATION STANDARD DEVIATION

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}} \qquad (3.15)$$

To compute the population variance for the data of Table 3.5 on page 94, you use Equation (3.14).

$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N} = \frac{(3.8-7.5)^2 + (6.5-7.5)^2 + (7.0-7.5)^2 + (7.3-7.5)^2 + (12.9-7.5)^2}{5} = \frac{13.69 + 1.00 + 0.25 + 0.04 + 29.16}{5} = \frac{44.14}{5} = 8.828$$

Thus, the variance of the returns is 8.828 squared percentage return. The squared units make the variance hard to interpret. You should use the standard deviation that uses the original units of the data (percentage return). From Equation (3.15),

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}} = \sqrt{8.828} = 2.97$$

Therefore, the typical 2003 return differs from the mean of 7.5 by approximately 2.97. This large amount of variation suggests that these large bond funds produce results that differ greatly.

The Empirical Rule
In most data sets, a large portion of the values tend to cluster somewhat near the median. In right-skewed data sets, this clustering occurs to the left of the mean, that is, at a value less than the mean. In left-skewed data sets, the values tend to cluster to the right of the mean, that is, at a value greater than the mean. In symmetrical data sets, where the median and mean are the same, the values often tend to cluster around the median and mean, producing a bell-shaped distribution. You can use the empirical rule to examine the variability in bell-shaped distributions:
• Approximately 68% of the values are within a distance of ±1 standard deviation from the mean.
• Approximately 95% of the values are within a distance of ±2 standard deviations from the mean.
• Approximately 99.7% are within a distance of ±3 standard deviations from the mean.

The empirical rule helps you measure how the values distribute above and below the mean. This can help you to identify outliers when analyzing a set of numerical data. The empirical rule implies that for bell-shaped distributions only about one out of 20 values will be beyond two standard deviations from the mean in either direction. As a general rule, you can consider values not found in the interval µ ± 2σ as potential outliers. The rule also implies that only about three in 1,000 will be beyond three standard deviations from the mean. Therefore, values not found in the interval µ ± 3σ are almost always considered outliers. For heavily skewed data sets, or those not appearing bell-shaped for any other reason, the Chebyshev rule discussed on page 97 should be applied instead of the empirical rule.

EXAMPLE 3.12 USING THE EMPIRICAL RULE
A population of 12-ounce cans of cola is known to have a mean fill-weight of 12.06 ounces and a standard deviation of 0.02 ounces. The population is also known to be bell-shaped. Describe the distribution of fill-weights. Is it very likely that a can will contain less than 12 ounces of cola?

SOLUTION
µ ± σ = 12.06 ± 0.02 = (12.04, 12.08)
µ ± 2σ = 12.06 ± 2(0.02) = (12.02, 12.10)
µ ± 3σ = 12.06 ± 3(0.02) = (12.00, 12.12)

Using the empirical rule, approximately 68% of the cans will contain between 12.04 and 12.08 ounces, approximately 95% will contain between 12.02 and 12.10 ounces, and approximately 99.7% will contain between 12.00 and 12.12 ounces. Therefore, it is highly unlikely that a can will contain less than 12 ounces.

6 How Data Vary Around the Mean % of Values Found in Intervals Around the Mean Interval (µ − σ.06 ounces and a standard deviation of 0. However. and at least 88. a population of 12-ounce cans of cola is known to have a mean fill-weight of 12.04. between 0 and 11. TABLE 3.12.89% Approximately 68% Approximately 95% Approximately 99. You can state that at least 75% of the cans will contain between 12. The results you compute using the sample statistics are approximations since you used sample statistics ( X .7% USING THE CHEBYSHEV RULE As in Example 3.3. σ). Consider k = 2. µ + 3σ) EXAMPLE 3. 12. The rule indicates at least what percentage of the values fall within a given distance from the mean. Is it very likely that a can will contain less than 12 ounces of cola? SOLUTION µ ± σ = 12.08 ) µ ± 2σ = 12.6 compares the Chebyshev and empirical rules. Therefore.00 and 12. 12. µ + σ) (µ − 2σ.00. 12. The Chebyshev rule is very general and applies to any type of distribution.06 ± 2( 0.2: Numerical Descriptive Measures for a Population 97 The Chebyshev Rule The Chebyshev rule (reference 1) states that for any data set.10 ounces. You can use these two rules for understanding how data are distributed around the mean when you have sample data. .02.13 Chebyshev (for any distribution) Empirical Rule (bell-shaped distribution) At least 0% At least 75% At least 88. However. regardless of shape.08 ounces. Describe the distribution of fill-weights. µ + 2σ) (µ − 3σ.06 ± 0. the shape of the population is unknown and you cannot assume that it is bell-shaped.12 ounces. if the data set is approximately bell-shaped. The Chebyshev rule states that at least [1 − (1/2)2] × 100% = 75% of the values must be found within ±2 standard deviations of the mean.02 = (12.12 ) Because the distribution may be skewed.11% of the cans contain less than 12 ounces. In each case. you cannot use the empirical rule.02 and 12.04 and 12. Table 3.89% will contain between 12. 
use the value you calculated for X in place of µ and the value you calculated for S in place of σ. the empirical rule will more accurately reflect the greater concentration of data close to the mean.10 ) µ ± 3σ = 12. you cannot say anything about the percentage of cans containing between 12.02 ) = (12. S) and not population parameters (µ. Using the Chebyshev rule.02 ) = (12.06 ± 3( 0. the percentage of values that are found within distances of k standard deviations from the mean must be at least (1 − 1/k2) × 100% You can use this rule for any value of k greater than 1.02.
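The two rules can be compared side by side in a few lines of Python. This is only an illustrative sketch: the function names `chebyshev_lower_bound` and `interval` are hypothetical, and the inputs are the cola-can values from Examples 3.12 and 3.13 (mean 12.06, standard deviation 0.02).

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean.

    Holds for any distribution shape. For k <= 1 the rule gives no
    useful information, so the bound is reported as 0.
    """
    if k <= 1:
        return 0.0
    return 1.0 - 1.0 / k**2


# Empirical-rule approximations; these apply only to bell-shaped data.
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(mean: float, sd: float, k: float) -> tuple:
    """Return (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


mu, sigma = 12.06, 0.02  # cola-can example
for k in (1, 2, 3):
    low, high = interval(mu, sigma, k)
    print(
        f"k={k}: ({low:.2f}, {high:.2f})  "
        f"Chebyshev: at least {chebyshev_lower_bound(k):.2%}  "
        f"empirical: about {EMPIRICAL[k]:.1%}"
    )
```

With sample data, the same functions would be called with $\bar{X}$ and S in place of µ and σ, and the results read as approximations.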

a. Compute the mean. In addition.8 10.24 Consider a population of 1.3 13.5 12. whichever is appropriate.5 11.5 9. are there any outliers? Explain.5 (Q1) and 10. Are you surprised at the results in (b)? 3.3 10.7 11. and standard deviation for the population. Compute the population mean. c.2 Learning the Basics 3. Index Bond Fund of America A Franklin Calif.7 11. Compute the variance and standard deviation for this population. b.22 The following is a set of data for a population with N = 10: 7 5 6 6 6 4 8 6 9 3 a.7 10.9 a. and standard deviation for this population. at least 93.5 16.0 8.20 and that σ.5 10. ±2.2 11.5 9. within ±2 standard deviations of the mean? c. Using the results in (c).75% of these funds are expected to have one-year total returns between what two amounts? 3.21 The following is a set of data for a population with N = 10: 7 5 11 8 3 6 2 1 9 8 a.2 10.25 The following table ASSETS represents the assets in billions of dollars of the five largest bond funds. and within ±3 standard deviations of the mean? c. According to the Chebyshev rule.9 9.8 10.6 9.5 10.3 10.1 12.3 8. the mean one-year total percentage return achieved by all the funds.23 The following data represent the quarterly sales tax receipts (in thousands of dollars) submitted to the comptroller of the Village of Fair Lake for the period ending March 2004 by all 50 business establishments in that locale: TAX SELF Test 10. Compute the variance and standard deviation for this population.7 12.5 (Q3). is 2. How have the results changed? 3. to further explain the variation in this data set. what percentage of these funds is expected to be a. Compute the mean. b.024 mutual funds that primarily invested in large companies.1 9.6 a. Interpret the standard deviation. suppose you determined that the range in the one-year total returns is from −2. Compute the population standard deviation.1 6. is 8. A Vanguard Short-Term Corp.98 CHAPTER THREE Numerical Descriptive Measures PROBLEMS FOR SECTION 3. 
According to the empirical rule.2 15.4 10.8 10.5 9. within ±1 standard deviation of the mean? PH Grade ASSIST b. What proportion of these businesses have quarterly sales tax receipts within ±1. Compute the mean for this population.0 7.0 8. Use the empirical rule or the Chebyshev rule. You determined that µ. Assets (Billions $) 19. According to the Chebyshev rule.3 11.5 7. c. or ±3 standard deviations of the mean? d.0 11. the standard deviation.75.9 14. respectively.1 and that the quartiles are.0 12. b.1 11. PH Grade ASSIST 3.6 10. d.0 to 17. Compare and contrast your findings with what would be expected on the basis of the empirical rule.5 7.8 7. Compare and contrast your findings versus what would be expected based on the empirical rule. PH Grade ASSIST Applying the Concepts 3. variance. Compute the population mean.8 13.27 The data in the file DOWRETURN give the 10-year annualized return (1994–2003) for the 30 companies in the Dow Jones Industrials.3 12.7 11. or ±3 standard deviations of the mean? c.6 8. what percentage of these funds are expected to be within ±1.0 12.4 5.6 9. a. Is there a lot of variability in the assets of the bond funds? 3.5 9.0 13. variance. ±2. b. b.1 12. Are you surprised at the results in (b)? d.8 8. Compute the mean for this population of the five largest bond funds. b. Tax-Free Inc.9 6. Compute the population standard deviation.6 11. within ±2 standard deviations of the mean. 5.26 The data in the file ENERGY contains the per capita energy consumption in kilowatt hours for each of the 50 states and the District of Columbia during 1999. . Do (a) through (c) with the District of Columbia removed. Interpret this parameter. Interpret this number. Bond Fund Vanguard GNMA Vanguard Total Bond Mkt. What proportion of these states has average per capita energy consumption within ±1 standard deviation of the mean. Interpret these parameters.

3.3 COMPUTING NUMERICAL DESCRIPTIVE MEASURES FROM A FREQUENCY DISTRIBUTION

Sometimes you have only a frequency distribution, not the raw data. When this occurs, you can compute approximations to the mean and the standard deviation.

When you have data from a sample that has been summarized into a frequency distribution, you can compute an approximation of the mean by assuming that all values within each class interval are located at the midpoint of the class.

APPROXIMATING THE MEAN FROM A FREQUENCY DISTRIBUTION

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} \tag{3.16}$$

where
$\bar{X}$ = sample mean
$n$ = number of values or sample size
$c$ = number of classes in the frequency distribution
$m_j$ = midpoint of the jth class
$f_j$ = number of values in the jth class

To calculate the standard deviation from a frequency distribution, you assume that all values within each class interval are located at the midpoint of the class.

APPROXIMATING THE STANDARD DEVIATION FROM A FREQUENCY DISTRIBUTION

$$S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} \tag{3.17}$$

Example 3.14 illustrates the computation of the mean and the standard deviation from a frequency distribution.

EXAMPLE 3.14 APPROXIMATING THE MEAN AND STANDARD DEVIATION FROM A FREQUENCY DISTRIBUTION

Consider the frequency distribution of the 2003 return of growth funds (Table 3.7). Compute the mean and standard deviation.

TABLE 3.7 Frequency Distribution of the 2003 Return for Growth Mutual Funds

Annual Percentage 2003 Return   Frequency
10 but less than 20             2
20 but less than 30             9
30 but less than 40             13
40 but less than 50             15
50 but less than 60             5
60 but less than 70             5
Total                           49
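Equations (3.16) and (3.17) translate directly into code. The sketch below is illustrative only: the helper name `grouped_mean_sd` and the tuple layout for the classes are assumptions, and the class limits and frequencies are those of Table 3.7.

```python
from math import sqrt

# Table 3.7 as (lower class limit, upper class limit, frequency).
classes = [
    (10, 20, 2),
    (20, 30, 9),
    (30, 40, 13),
    (40, 50, 15),
    (50, 60, 5),
    (60, 70, 5),
]


def grouped_mean_sd(classes):
    """Approximate the sample mean and standard deviation of grouped data.

    Every value in a class is treated as if it sat at the class midpoint,
    which is the assumption behind Equations (3.16) and (3.17).
    """
    n = sum(f for _, _, f in classes)
    midpoints = [((low + high) / 2, f) for low, high, f in classes]
    mean = sum(m * f for m, f in midpoints) / n                    # Equation (3.16)
    squared_dev = sum((m - mean) ** 2 * f for m, f in midpoints)
    sd = sqrt(squared_dev / (n - 1))                               # Equation (3.17)
    return mean, sd


mean, sd = grouped_mean_sd(classes)
print(f"approximate mean = {mean:.2f}, approximate standard deviation = {sd:.2f}")
```

Because the raw values are unknown, both results are approximations whose quality depends on how evenly the values are spread within each class.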

SOLUTION

The computations that you need to calculate the approximations of the mean and standard deviation of the 2003 return for growth mutual funds are summarized in Table 3.8.

TABLE 3.8 Computations Needed to Calculate the Approximations of the Mean and Standard Deviation of the 2003 Return for Growth Mutual Funds

Percentage Return     Number of Funds (f_j)   Midpoint (m_j)   m_j f_j   (m_j − X̄)   (m_j − X̄)²   (m_j − X̄)² f_j
10 but less than 20   2                       15               30        −25.51      650.7601     1,301.5202
20 but less than 30   9                       25               225       −15.51      240.5601     2,165.0409
30 but less than 40   13                      35               455       −5.51       30.3601      394.6813
40 but less than 50   15                      45               675       4.49        20.1601      302.4015
50 but less than 60   5                       55               275       14.49       209.9601     1,049.8005
60 but less than 70   5                       65               325       24.49       599.7601     2,998.8005
Total                 49                                       1,985                              8,212.2449

Using Equations (3.16) and (3.17) on page 99,

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{1{,}985}{49} = 40.51$$

and

$$S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} = \sqrt{\frac{8{,}212.2449}{49 - 1}} = \sqrt{171.08843} = 13.08$$

PROBLEMS FOR SECTION 3.3

Learning the Basics

3.28 Given the following frequency distribution for n = 100:

Class Intervals   Frequency
0 to under 10     10
10 to under 20    20
20 to under 30    40
30 to under 40    20
40 to under 50    10
Total             100

Approximate
a. the mean.
b. the standard deviation.

3.29 Given the following frequency distribution for n = 100:

Class Intervals   Frequency
0 to under 10     40
10 to under 20    25
20 to under 30    15
30 to under 40    15
40 to under 50    5
Total             100

Approximate
a. the mean.
b. the standard deviation.

2 98. approximate the a. do you think the mean and the standard deviation of the accounts receivable have changed substantially from March to April? Explain.4 97.32 The following data represent the distribution of the ages of employees within two different divisions of a publishing company.0 For U.0 1. do U. Age of Employees (Years) 20—Under 30 30—Under 40 40—Under 50 50—Under 60 60—Under 70 A Frequency B Frequency 8 17 11 8 2 15 32 20 4 0 For each of the two divisions (A and B).0 12.S.0 3. mean.4 75. . approximate the a. mean. d.000 $10.0 100.0 32.0 32 54 61 68 68 70 71 72 44..7 94. Two independent samples of 50 accounts were selected for each of the two months. Construct a frequency distribution for each group. variation.000 to under $8.4 U. EXPLORATORY DATA ANALYSIS Section 3.000 $8. 3.0 92.S.S. On the basis of the results of (b) and (c). On the basis of the results of (a).0 4. c.000 $6. On the basis of the results of (a) and (b).4 94.and foreign-made automobiles seem to differ in their braking distance? Explain.0 44.000 $4. do you think there are differences in the age distribution between the two divisions? Explain.-Made Automobile Models “Less Than” Braking Indicated Values Distance (in Ft) Number Percentage 210 220 230 240 (continued) 0 1 2 3 0.4: Exploratory Data Analysis Applying the Concepts 3. On the basis of (a) and (b).000 Total For each month.-Made Automobile Models “Less Than” Braking Indicated Values Distance (in Ft) Number Percentage 250 260 270 280 290 300 310 320 4 8 11 17 21 23 25 25 101 Foreign-Made Automobile Models “Less Than” Indicated Values Number Percentage 16. b. On the basis of the results of (a). standard deviation.S.000 $2.and foreign-made automobiles a. approximate the standard deviation of the braking distance.4 Foreign-Made Automobile Models “Less Than” Indicated Values Number Percentage 0 1 4 19 0.0 8. b. 
Another way of describing numerical data is thrpough exploratory data analysis that includes the five-number summary and the box-and-whisker plot (references 5 and 6).6 26.. and shape.000 to under $10.0 84.4 5.31 The following table contains the cumulative frequency distributions and cumulative percentage distributions of braking distance (in feet) at 80 miles per hour for a sample of 25 U. 3. b. standard deviation.30 A wholesale appliance distributing firm wished to study its accounts receivable for two successive months.0 84.6 100.000 to under $6. c.3.000 to under $4. c.1 discussed sample statistics for numerical data that are measures of central tendency.S. approximate the mean of the braking distance.0 100.000 to under $12.0 68. The results are summarized in the following table: Frequency Distributions for Accounts Receivable Amount March Frequency April Frequency 6 13 17 10 4 0 50 10 14 13 10 0 3 50 $0 to under $2.-manufactured automobile models and for a sample of 72 foreign-made automobile models in a recent year: U.

The Five-Number Summary

A five-number summary that consists of

Xsmallest   Q1   Median   Q3   Xlargest

provides a way to determine the shape of the distribution. Table 3.9 explains how the relationships among the "five numbers" allow you to recognize the shape of a data set.

TABLE 3.9 Relationships among the Five-Number Summary and the Type of Distribution

Comparison: Distance from Xsmallest to the median versus the distance from the median to Xlargest.
  Left-Skewed: The distance from Xsmallest to the median is greater than the distance from the median to Xlargest.
  Symmetric: Both distances are the same.
  Right-Skewed: The distance from Xsmallest to the median is less than the distance from the median to Xlargest.

Comparison: Distance from Xsmallest to Q1 versus the distance from Q3 to Xlargest.
  Left-Skewed: The distance from Xsmallest to Q1 is greater than the distance from Q3 to Xlargest.
  Symmetric: Both distances are the same.
  Right-Skewed: The distance from Xsmallest to Q1 is less than the distance from Q3 to Xlargest.

Comparison: Distance from Q1 to the median versus the distance from the median to Q3.
  Left-Skewed: The distance from Q1 to the median is greater than the distance from the median to Q3.
  Symmetric: Both distances are the same.
  Right-Skewed: The distance from Q1 to the median is less than the distance from the median to Q3.

For the sample of 10 get-ready times, the smallest value is 29 minutes and the largest value is 52 minutes (see pages 75 and 77). Calculations done previously in section 3.1 show that the median = 39.5, the first quartile = 35, and the third quartile = 44. Therefore, the five-number summary is

29   35   39.5   44   52

The distance from Xsmallest to the median (39.5 − 29 = 10.5) is slightly less than the distance from the median to Xlargest (52 − 39.5 = 12.5). The distance from Xsmallest to Q1 (35 − 29 = 6) is slightly less than the distance from Q3 to Xlargest (52 − 44 = 8). Therefore, the get-ready times are slightly right-skewed.

EXAMPLE 3.15 COMPUTING THE FIVE-NUMBER SUMMARY OF THE 2003 PERCENTAGE RETURN OF SMALL CAP HIGH-RISK MUTUAL FUNDS

The 121 mutual funds that are part of the "Using Statistics" scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Compute the five-number summary of the 2003 return for the small cap mutual funds with high risk. (Data file: MUTUALFUNDS2004)
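The three comparisons of Table 3.9 can be sketched as a small helper. The function name `shape_hints` and its dictionary keys are hypothetical, chosen only for illustration; the input is the five-number summary of the get-ready times (29, 35, 39.5, 44, 52).

```python
def shape_hints(x_smallest, q1, median, q3, x_largest):
    """Apply the three distance comparisons of Table 3.9.

    Each comparison yields its own hint, and the hints may disagree,
    so the result is a dictionary rather than a single verdict.
    """

    def compare(lower_distance, upper_distance):
        if lower_distance > upper_distance:
            return "left-skewed"
        if lower_distance < upper_distance:
            return "right-skewed"
        # Exact equality is rare with real data; treat it as symmetric.
        return "symmetric"

    return {
        "median": compare(median - x_smallest, x_largest - median),
        "whiskers": compare(q1 - x_smallest, x_largest - q3),
        "box": compare(median - q1, q3 - median),
    }


hints = shape_hints(29, 35, 39.5, 44, 52)
for comparison, verdict in hints.items():
    print(f"{comparison}: {verdict}")
```

Reporting the three hints separately is a deliberate choice: when they conflict, the summary alone does not settle the shape and a plot of the data is more informative.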

SOLUTION

From previous computations for the 2003 return for the small cap mutual funds with high risk (see pages 76 and 78), the median = 53.8, the first quartile = 41.7, and the third quartile = 60.85. In addition, the smallest value in the data set is 37.3 and the largest value is 66.5. Therefore, the five-number summary is

37.3   41.7   53.8   60.85   66.5

The distance from the median to Xlargest (66.5 − 53.8 = 12.7) is less than the distance from Xsmallest to the median (53.8 − 37.3 = 16.5). This indicates left skewness. The distance from Xsmallest to Q1 (41.7 − 37.3 = 4.4) is slightly less than the distance from Q3 to Xlargest (66.5 − 60.85 = 5.65). This indicates slight right-skewness. Therefore, the results are inconsistent.

The Box-and-Whisker Plot

A box-and-whisker plot provides a graphical representation of the data based on the five-number summary. Figure 3.4 illustrates the box-and-whisker plot for the get-ready times.

FIGURE 3.4 Box-and-Whisker Plot of the Time to Get Ready (horizontal axis: Time (minutes), 20 to 55; marked positions: Xsmallest, Q1, Median, Q3, Xlargest)

The vertical line drawn within the box represents the median. The vertical line at the left side of the box represents the location of Q1 and the vertical line at the right side of the box represents the location of Q3. Thus, the box contains the middle 50% of the values in the distribution. The lower 25% of the data are represented by a line (i.e., a whisker) connecting the left side of the box to the location of the smallest value, Xsmallest. Similarly, the upper 25% of the data are represented by a whisker connecting the right side of the box to Xlargest.

The box-and-whisker plot of the get-ready times in Figure 3.4 indicates very slight right-skewness since the distance between the median and the highest value is slightly more than the distance between the lowest value and the median. The right whisker is slightly longer than the left whisker.

EXAMPLE 3.16 THE BOX-AND-WHISKER PLOT OF THE 2003 PERCENTAGE RETURN OF LOW-RISK, AVERAGE-RISK, AND HIGH-RISK MUTUAL FUNDS

The 121 mutual funds that are part of the "Using Statistics" scenario (see page 72) are classified according to the risk level of the mutual funds (low, average, and high) and type (small cap, mid cap, and large cap). Construct the box-and-whisker plot of the 2003 return for low-risk, average-risk, and high-risk mutual funds. (Data file: MUTUALFUNDS2004)

SOLUTION

Figure 3.5 is the Minitab box-and-whisker plot of the 2003 return for low-risk, average-risk, and high-risk mutual funds. Minitab displays the box-and-whisker plot vertically from bottom (low) to top (high). The asterisk (*) for the average-risk fund represents the presence of outlier values (see footnote 2).

FIGURE 3.5 Minitab Box-and-Whisker Plot of the 2003 Return for Low-Risk, Average-Risk, and High-Risk Mutual Funds

Footnote 2: If there are outliers, the whiskers in the Minitab box-and-whisker plot extend to 1.5 times the interquartile range beyond the quartiles or to the highest value.

The median percentage return and the quartiles are higher for the high-risk funds than for the low-risk and average-risk funds. The low-risk funds appear to be slightly right-skewed since the upper whisker is longer than the lower whisker. The average-risk funds are right-skewed due to the extremely large return of one fund (78). The high-risk funds appear left-skewed because of the long lower whisker, but the median return is closer to the first quartile than to the third quartile.

Figure 3.6 demonstrates the relationship between the box-and-whisker plot and the polygon for four different types of distributions. (Note: The area under each polygon is split into quartiles corresponding to the five-number summary for the box-and-whisker plot.)

FIGURE 3.6 Box-and-Whisker Plots and Corresponding Polygons for Four Distributions (Panel A: Bell-shaped distribution; Panel B: Left-skewed distribution; Panel C: Right-skewed distribution; Panel D: Rectangular distribution)

4(d) on page 90. In these distributions. The number of hours until failure are in the file.35 The following is a set of data from a sample of n = 7: 12 7 4 9 0 7 3 a.589 593 1.425 922 308 a. . Discuss. Urbana–Champaign Kansas State University. Construct the box-and-whisker plot and describe the shape.33 The following is a set of data from a sample of n = 5: 7 4 9 8 2 a. Construct the box-and-whisker plot and describe the shape. 2002. a shared dormitory room.4: Exploratory Data Analysis 105 Panels A and D of Figure 3. 75% of all values are found between the left edge of the box (Q1) and the end of the right whisker (Xlargest). Orono University of Mississippi. 3.XLS University Change in Cost ($) University of California.” USA Today. The concentration of values is on the low end of the scale (i. b. Construct the box-and-whisker plot and describe the shape. List the five-number summary. the left side of the box-and-whisker plot). c. the mean and median are equal. many public universities in the United States raised tuition and fees due to a decrease in state subsidies (Mary Beth Marklein. Durham Ohio State University. the right side). and the most popular meal plan from the 2001–2002 academic year to the 2002–2003 academic year for a sample of 10 public universities.36 The following is a set of data from a sample of n = 5: 7 −5 −8 7 9 a.6 is left-skewed. Discuss. List the five-number summary. For this left-skewed distribution. c.38 In the 2002–2003 academic year. Compare your answer in (b) with that from problem 3. Fees—and Ire.. PH Grade ASSIST 3.37–3.3(d) on page 90. and the remaining 25% of the values are dispersed along the long right whisker at the upper end of the scale. Compare your answer in (b) with that from problem 3. b. Berkeley University of Georgia. 3. The few small values distort the mean toward the left tail. Here. the length of the left whisker is equal to the length of the right whisker. 1A–2A). Minitab. b. In addition. August 8. 
List the five-number summary. PROBLEMS FOR SECTION 3.2(d) on page 90. Compare your answer in (b) with that from problem 3.6 is right-skewed. COLLEGECOST. Discuss. Athens University of Illinois.223 869 423 1. Construct the box-and-whisker plot and describe the shape.4 Learning the Basics PH Grade ASSIST 3. or SPSS. Columbus University of South Carolina.1(d) on page 90.37 A manufacturer of flashlight batteries took a sample of 13 batteries from a day’s production and used them continuously until they were drained. the long left whisker contains the smallest 25% of the values. b. BATTERIES PH Grade ASSIST SELF Test 342 426 317 545 264 451 1. Columbia Utah State University.3. Panel B of Figure 3. List the five-number summary.049 631 512 266 492 562 298 a. Compare your answer in (b) with that from problem 3. c.42 can be solved manually or by using Microsoft Excel.e.720 708 1. Panel C of Figure 3. b. Oxford University of New Hampshire. Logan 1. Construct the box-and-whisker plot and describe the shape. The following represents the change in the cost of tuition. 75% of all data values are found between the beginning of the left whisker (Xsmallest) and the right edge of the box (Q3). Manhattan University of Maine. Discuss.6 are symmetric. Construct the box-and-whisker plot and describe the shape.e. 3. b. demonstrating the distortion from symmetry in this data set. and the median line divides the box in half. “Public Universities Raise Tuition. the skewness indicates that there is a heavy clustering of values at the high end of the scale (i. Therefore. c. List the five-number summary. List the five-number summary. Applying the Concepts Problems 3.. PH Grade ASSIST 3.34 The following is a set of data from a sample of n = 6: 7 4 9 7 3 12 a.

. b.20 4. you used scatter diagrams to visually examine the relationship between two numerical variables.” Copyright © 2000 by Consumers Union of U.08 6. Yonkers.64 4.40 The following data represent the bounced check fee (in dollars) for a sample of 23 banks for direct-deposit customers who maintain a $100 balance and the monthly service fee (in dollars) for direct-deposit customers if their accounts fall below the minimum required balance of $1500 for a sample of 26 banks.01 8. Dooley.46 6.47 a.5 15.5 Burgers 19 31 34 35 39 39 43 Chicken 7 9 15 16 16 18 22 25 27 33 39 Source: Extracted from “Quick Bites.13 4. and describe the shape of the distribution for the burgers and chicken items.55 3. and the results are as follows: BANK1 4. March 2001.17 illustrates its use. Construct the box-and-whisker plot for the burgers and the chicken items. June 2000. located in a residential area.19 3. List the five-number summary of the waiting time at the two bank branches.77 2.” Copyright © 2001 by Consumers Union of U..10 0.49 6.35 10.” Decision Sciences..38 5. A random sample of 15 customers is selected. b.02 5. a.91 5.79 8. Equation (3. List the five-number summary for the burgers and for the chicken items. Construct the box-and-whisker plot of the bounced check fee and the monthly service fee. 1131–1153. List the five-number summary of the bounced check fee and of the monthly service fee.73 3. Rothenberger. c.5 75. c.0 Source: M.66 5. Should you compare the two bank branches? Explain. A random sample of 15 customers is selected. b. the firm uses a database of reusable components totaling more than 2.68 5. The waiting time in minutes (defined as the time the customer enters the line until he or she reaches the teller window) of all customers during these hours is recorded over a period of one week.42 A bank branch located in a commercial district of a city has developed an improved process for serving customers during the noon to 1:00 P.21 5. In this section. 
List the five-number summary. 46. Inc.S.5. Yonkers. 30(Fall 1999).17 9. The Covariance The covariance measures the strength of the linear relationship between two numerical variables (X and Y). Form the box-and-whisker plot and describe the shape of the data.106 CHAPTER THREE Numerical Descriptive Measures 3.0 25.34 3.79 Another branch. Adapted with permission from Consumer Reports.90 8.12 6. and K.02 5. the covariance and the coefficient of correlation that measure the strength of the relationship between two numerical variables are discussed.82 8.18) defines the sample covariance and Example 3. is also concerned with the noon to 1 P.50 6. lunch period. THE COVARIANCE AND THE COEFFICIENT OF CORRELATION In section 2. The waiting time in minutes (operationally defined as the time the customer enters the line to the time he or reaches the teller window) of all customers during this hour is recorded over a period of one week. b. and the results are as follows: BANK2 9. What similarities and differences are there in the distribution of the waiting time at the two bank branches? d.0 47. A. a.54 3. What similarities and differences are there in the distributions for the burgers and the chicken items? 3. J.41 The following data represent the total fat for burgers and chicken items from a sample of fast-food chains. BANKCOST1 BANKCOST2 26 28 20 20 21 22 25 25 18 25 15 20 18 20 25 25 22 30 30 30 15 20 29 12 8 5 5 6 6 10 10 9 7 10 7 7 5 0 10 6 9 12 0 5 10 8 5 5 9 Source: Extracted from “The New Face of Banking.000. “A Performance Measure for Software Reuse Projects. c. FASTFOOD 3.000 lines of code collected from 10 years of continuous reuse effort. Eight analysts at the firm were asked to estimate the reuse rate when developing a new software system. 3. The following data are given as a percentage of the total code written for a software system that is part of the reuse database. REUSE 50 62. Adapted with permission from Consumer Reports. 
What similarities and differences are there in the distributions for the bounced check fee and the monthly service fee? 3. NY 10703–1057.M.0 45. lunch hour.39 A software development and consulting firm located in the Phoenix metropolitan area develops software for supply chain management systems using systematic software reuse. a. Inc. NY 10703–1057. Construct the box-and-whisker plot and describe the shape of the distribution of the two bank branches. Instead of starting from scratch when writing and developing new custom software systems.M.S..5 37..

THE SAMPLE COVARIANCE

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \tag{3.18}$$

EXAMPLE 3.17 COMPUTING THE SAMPLE COVARIANCE

Consider the expense ratio and the 2003 return for the small cap high-risk funds. Compute the sample covariance.

SOLUTION

Table 3.10 presents the expense ratio and 2003 return for the small cap high-risk funds and Figure 3.7 contains a Microsoft Excel worksheet that calculates the covariance for these data. The Calculations area of Figure 3.7 breaks down Equation (3.18) into a set of smaller calculations. From cell C17, or by using Equation (3.18) directly, the covariance is 1.19738.

$$\operatorname{cov}(X, Y) = \frac{9.579}{9 - 1} = 1.19738$$

TABLE 3.10 Expense Ratio and 2003 Return for the Small Cap High-Risk Funds

Expense Ratio   2003 Return
1.25            37.3
0.72            39.2
1.57            44.2
1.40            44.5
1.33            53.8
1.61            56.6
1.68            59.3
1.42            62.4
1.20            66.5

FIGURE 3.7 Microsoft Excel Worksheet for the Covariance between Expense Ratio and 2003 Return for the Small Cap High-Risk Funds
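As a cross-check on the worksheet, Equation (3.18) can be evaluated directly. This is a minimal sketch, not the worksheet's method: the function name `sample_covariance` is hypothetical, and the nine (expense ratio, return) pairs are an assumed reading of Table 3.10, chosen because they reproduce the reported sum of cross-products (9.579) and covariance (1.19738).

```python
# Assumed reading of Table 3.10: nine small cap high-risk funds.
expense_ratio = [1.25, 0.72, 1.57, 1.40, 1.33, 1.61, 1.68, 1.42, 1.20]
return_2003 = [37.3, 39.2, 44.2, 44.5, 53.8, 56.6, 59.3, 62.4, 66.5]


def sample_covariance(x, y):
    """Sample covariance of paired values, following Equation (3.18)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 pairs")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1)


cov_xy = sample_covariance(expense_ratio, return_2003)
print(f"cov(X, Y) = {cov_xy:.5f}")
```

The result depends on the units of both variables, so the number by itself says little about how strong the relationship is.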

The covariance has a major flaw as a measure of the linear relationship between two numerical variables. Since the covariance can have any value, you are unable to determine the relative strength of the relationship. To better determine the relative strength of the relationship, you need to compute the coefficient of correlation.

The Coefficient of Correlation

The coefficient of correlation measures the relative strength of a linear relationship between two numerical variables. The values of the coefficient of correlation range from −1 for a perfect negative correlation to +1 for a perfect positive correlation. Perfect means that if the points were plotted in a scatter diagram, all the points could be connected with a straight line. When dealing with population data for two numerical variables, the Greek letter ρ is used as the symbol for the coefficient of correlation. Figure 3.8 illustrates three different types of association between two variables.

FIGURE 3.8 Types of Association between Variables (Panel A: Perfect negative correlation, r = −1; Panel B: No correlation, r = 0; Panel C: Perfect positive correlation, r = +1)

In panel A of Figure 3.8 there is a perfect negative linear relationship between X and Y. Thus, the coefficient of correlation ρ equals −1, and when X increases, Y decreases in a perfectly predictable manner. Panel B shows a situation in which there is no relationship between X and Y. In this case, the coefficient of correlation ρ equals 0, and as X increases, there is no tendency for Y to increase or decrease. Panel C illustrates a perfect positive relationship where ρ equals +1. In this case, Y increases in a perfectly predictable manner when X increases.

When you have sample data, the sample coefficient of correlation r is calculated. When using sample data, you are unlikely to have a sample coefficient of exactly +1, 0, or −1. Figure 3.9 on page 109 presents scatter diagrams along with their respective sample coefficients of correlation r for six data sets, each of which contains 100 values of X and Y.

In panel A, the coefficient of correlation r is −0.9. You can see that for small values of X there is a very strong tendency for Y to be large. Likewise, the large values of X tend to be paired with small values of Y. The data do not all fall on a straight line, so the association between X and Y cannot be described as perfect. The data in panel B have a coefficient of correlation equal to −0.6, and the small values of X tend to be paired with large values of Y. The linear relationship between X and Y in panel B is not as strong as in panel A. Thus, the coefficient of correlation in panel B is not as negative as in panel A. In panel C the linear relationship between X and Y is very weak, r = −0.3, and there is only a slight tendency for the small values of X to be paired with the larger values of Y. Panels D through F depict data sets that have positive coefficients of correlation because small values of X tend to be paired with small values of Y, and the large values of X tend to be associated with large values of Y.

FIGURE 3.9 Six Scatter Diagrams Created from Minitab and Their Sample Coefficients of Correlation r (Panels A through F)

In the discussion of Figure 3.9, the relationships were deliberately described as tendencies and not as cause-and-effect. This wording was used on purpose. Correlation alone cannot prove that there is a causation effect, that is, that the change in the value of one variable caused the change in the other variable. A strong correlation can be produced simply by chance, by the effect of a third variable not considered in the calculation of the correlation, or by a cause-and-effect relationship. You would need to perform additional analysis to determine which of these three situations actually produced the correlation. Therefore, you can say that causation implies correlation, but correlation alone does not imply causation.

Equation (3.19) defines the sample coefficient of correlation r and Example 3.18 illustrates its use.

THE SAMPLE COEFFICIENT OF CORRELATION

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \tag{3.19}$$

where

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

$$S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$

Example 3.18 illustrates the computation of the sample coefficient of correlation using Equation (3.19).

EXAMPLE 3.18 COMPUTING THE SAMPLE COEFFICIENT OF CORRELATION

Consider the expense ratio and the 2003 return for the small cap high-risk funds. From Figure 3.10 and Equation (3.19), compute the sample coefficient of correlation.

SOLUTION

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} = \frac{1.19738}{(0.287663)(10.554383)} = 0.3943786$$

FIGURE 3.10 Microsoft Excel Worksheet for the Sample Coefficient of Correlation r between the Expense Ratio and the 2003 Return for Small Cap High-Risk Funds
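Equation (3.19) can also be checked in code. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the nine (expense ratio, return) pairs are an assumed reading of Table 3.10 that reproduces the standard deviations and covariance quoted in the solution.

```python
from math import sqrt

# Assumed reading of Table 3.10: nine small cap high-risk funds.
expense_ratio = [1.25, 0.72, 1.57, 1.40, 1.33, 1.61, 1.68, 1.42, 1.20]
return_2003 = [37.3, 39.2, 44.2, 44.5, 53.8, 56.6, 59.3, 62.4, 66.5]


def sample_sd(values):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def sample_covariance(x, y):
    """Sample covariance, Equation (3.18)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample coefficient of correlation r, Equation (3.19)."""
    return sample_covariance(x, y) / (sample_sd(x) * sample_sd(y))


s_x = sample_sd(expense_ratio)
s_y = sample_sd(return_2003)
r = sample_correlation(expense_ratio, return_2003)
print(f"S_X = {s_x:.6f}, S_Y = {s_y:.6f}, r = {r:.4f}")
```

Dividing by both standard deviations removes the units, which is why r always falls between −1 and +1 while the covariance does not.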

3.5: The Covariance and the Coefficient of Correlation

The expense ratio and the 2003 return for the small cap high-risk funds are positively correlated. Those mutual funds with the lowest expense ratios tend to be associated with the lowest 2003 returns. Those mutual funds with the highest expense ratios tend to be associated with the highest 2003 returns. This relationship is fairly weak, as indicated by a coefficient of correlation, r = 0.394. You cannot assume that having a low expense ratio caused the low 2003 return. You can only say that this is what tended to happen in the sample. As with all investments, past performance does not guarantee future performance.

In summary, the coefficient of correlation indicates the linear relationship, or association, between two numerical variables. When the coefficient of correlation gets closer to +1 or −1, the linear relationship between the two variables is stronger. When the coefficient of correlation is near 0, little or no linear relationship exists. The sign of the coefficient of correlation indicates whether the data are positively correlated (i.e., the larger values of X are typically paired with the larger values of Y) or negatively correlated (i.e., the larger values of X are typically paired with the smaller values of Y). The existence of a strong correlation does not imply a causation effect. It only indicates the tendencies present in the data.

PROBLEMS FOR SECTION 3.5

Learning the Basics

3.43 The following is a set of data from a sample of n = 11 items:

X    7   5   8   3   6  10  12   4   9  15  18
Y   21  15  24   9  18  30  36  12  27  45  54

a. Compute the covariance.
b. Compute the coefficient of correlation.
c. How strong is the relationship between X and Y? Explain.

Applying the Concepts

Problems 3.44–3.49 can be solved manually or by using Microsoft Excel, Minitab, or SPSS.

3.44 A recent article (J. Clements, “Why Investors Should Put up to 30% of Their Stock Portfolio in Foreign Funds,” The Wall Street Journal, November 26, 2003, D1) that discussed investment in foreign stocks stated that the coefficient of correlation between the return on investment of U.S. stocks and International Large Cap stocks was 0.80, U.S. stocks and International Small Cap stocks was 0.53, U.S. stocks and International Bonds was 0.03, U.S. stocks and Emerging market stocks was 0.71, and U.S. stocks and Emerging market debt was 0.58.
a. What conclusions about the strength of the relationship between the return on investment of U.S. stocks and these five other types of investments can you make?
b. Compare the results of (a) to those of problem 3.45 (a).

3.45 A recent article (J. Clements, “Why Investors Should Put up to 30% of Their Stock Portfolio in Foreign Funds,” The Wall Street Journal, November 26, 2003, D1) that discussed investment in foreign bonds stated that the coefficient of correlation between the return on investment of U.S. bonds and International Large Cap stocks was −0.20, U.S. bonds and International Small Cap stocks was −0.13, U.S. bonds and International Bonds was 0.48, U.S. bonds and Emerging market stocks was −0.18, and U.S. bonds and Emerging market debt was 0.10.
a. What conclusions about the strength of the relationship between the return on investment of U.S. bonds and these five other types of investments can you make?
b. Compare the results of (a) to those of problem 3.44 (a).

3.46 The following data COFFEEDRINK represent the calories and fat (in grams) of 16-ounce iced coffee drinks at Dunkin’ Donuts and Starbucks:

Product                                                                      Calories   Fat
Dunkin’ Donuts Iced Mocha Swirl latte (whole milk)                              240      8.0
Starbucks Coffee Frappuccino blended coffee                                     260      3.5
Dunkin’ Donuts Coffee Coolatta (cream)                                          350     22.0
Starbucks Iced Coffee Mocha Expresso (whole milk and whipped cream)             350     20.0
Starbucks Mocha Frappuccino blended coffee (whipped cream)                      420     16.0
Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream)          510     22.0
Starbucks Chocolate Frappuccino Blended Crème (whipped cream)                   530     19.0

Source: Extracted from “Coffee as Candy at Dunkin’ Donuts and Starbucks,” Copyright © 2004 by Consumers Union of U.S., Inc., Yonkers, NY 10703–1057. Adapted with permission from Consumer Reports, June 2004, 9.

SELF Test 3. c.6 227.1 243. A1. a.1 107.7 9. Adapted with permission from Consumer Reports.50 4. Source: Extracted from N. 25. What conclusions can you reach about the relationship between the turnover rate of pre-boarding screeners and the security violations detected? a.00 2.” Copyright 2002 by Consumers Union of U. What conclusions can you reach about the relationship between the battery capacity and the digital-mode talk time? d.5 3.9 191.48 The following data SECURITY represent the turnover rate of pre-boarding screeners at airports in 1998–1999 and the security violations detected per million passengers.9 a.47 The following data represent the value of exports and imports in 2001 for various countries: EXPIMP Country Exports Imports 874.” The Wall Street Journal. Faces Test at New Trade Talks. your interpre- . NY 10703–1057. Is this borne out by the data? PITFALLS IN NUMERICAL DESCRIPTIVE MEASURES AND ETHICAL ISSUES In this chapter you studied how a set of numerical data can be characterized by various statistics that measure the properties of central tendency.1 158. What conclusions can you reach about the relationship between exports and imports. b.00 2.8 912.5 266.1 31.3 116. c. February 2002.5 121. d.25 2. c.3 10.8 25. Compute the coefficient of correlation. Compute the coefficient of correlation.50 2. Compute the coefficient of correlation. St.2 18. b.8 14.1 30. variation..8 403. Compute the coefficient of correlation.2 202.3 13.9 6. Krueger.2 141. c. Compute the sample covariance. Which do you think is more valuable in expressing the relationship between calories and fat—the covariance or the coefficient of correlation? Explain. Compute the covariance.S.” The New York Times. Miller.49 The following data CELLPHONE represent the digitalmode talk time in hours and the battery capacity in milliampere-hours of cellphones.25 2. b.25 2.8 1180.4 122.25 1.75 1. Your next step is analysis and interpretation of the calculated statistics.9 7.75 2.1 730.5 10.0 176. 
and shape. C2. Compute the covariance.2 349..25 2. Inc. Louis Atlanta Houston Boston Chicago Denver Dallas Baltimore Seattle/Tacoma San Francisco Orlando Washington–Dulles Los Angeles Detroit San Juan Miami New York–JFK Washington–Reagan Honolulu Turnover Source: Extracted from Alan B.7 31. d.5 15. Which do you think is more valuable in expressing the relationship between exports and imports—the covariance or the coefficient of correlation? Explain. “Post-Iraq Influence of U.00 450 900 900 900 700 800 800 900 900 Source: Extracted from “Service Shortcomings.00 3. September 9. Yonkers.0 European Union United States Japan China Canada Hong Kong Mexico South Korea Taiwan Singapore 3. November 15. City City Turnover Violations 416 375 237 207 200 193 156 155 140 11. Talk Time Battery Capacity Talk Time Battery Capacity 4.75 800 1500 1300 1550 900 875 750 1100 850 1.112 CHAPTER THREE Numerical Descriptive Measures a.S. King and S.25 2. Compute the covariance.75 1.25 3.6 Violations 110 100 90 88 79 70 64 53 47 37 20.1 13. b. Your analysis is objective.2 259. “A Small Dose of Common Sense Would Help Congress Break the Gridlock over Airport Security.6 22. You would expect cellphones with higher battery capacity to have a higher talk time.9 14. 2003. What conclusions can you reach about the relationship between calories and fat? 3.50 2.2 21. 2001.5 150.

Summary

Your analysis is objective; your interpretation is subjective. You must avoid errors that may arise either in the objectivity of your analysis or in the subjectivity of your interpretation.

The analysis of the mutual funds based on risk level is objective and reveals several impartial findings. Objectivity in data analysis means reporting the most appropriate descriptive summary measures for a given data set. Now that you have read the chapter and have become familiar with various descriptive summary measures and their strengths and weaknesses, how should you proceed with the objective analysis? Because the data distribute in a slightly asymmetrical manner, shouldn’t you report the median in addition to the mean? Doesn’t the standard deviation provide more information about the property of variation than the range? Should you describe the data set as right-skewed?

On the other hand, data interpretation is subjective. Different people form different conclusions when interpreting the analytical findings. Everyone sees the world from different perspectives. Thus, because data interpretation is subjective, you must do it in a fair, neutral, and clear manner.

Ethical Issues

Ethical issues are vitally important to all data analysis. As a daily consumer of information, you need to question what you read in newspapers and magazines, what you hear on the radio or television, and what you see on the World Wide Web. Over time, much skepticism has been expressed about the purpose, the focus, and the objectivity of published studies. Perhaps no comment on this topic is more telling than a quip often attributed to the famous nineteenth-century British statesman Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”

Ethical considerations arise when you are deciding what results to include in a report. You should document both good and bad results. In addition, when making oral presentations and presenting written reports, you need to give results in a fair, objective, and neutral manner. Unethical behavior occurs when you willfully choose an inappropriate summary measure (e.g., the mean for a very skewed set of data) to distort the facts in order to support a particular position. In addition, unethical behavior occurs when you selectively fail to report pertinent findings because it would be detrimental to the support of a particular position.

SUMMARY

This chapter was about numerical descriptive measures. In this and the previous chapter, you studied descriptive statistics: how data are presented in tables and charts, and then summarized, described, analyzed, and interpreted. When dealing with the mutual fund data, you were able to present useful information through the use of pie charts, histograms, and other graphical methods. You explored characteristics of past performance such as central tendency, variability, and shape using numerical descriptive measures such as the mean, median, quartiles, range, standard deviation, and coefficient of correlation. Table 3.11 provides a list of the numerical descriptive measures covered in this chapter. In the next chapter, the basic principles of probability are presented in order to bridge the gap between the subject of descriptive statistics and the subject of inferential statistics.

TABLE 3.11 Summary of Numerical Descriptive Measures

Type of Analysis: Describing central tendency, variation, and shape of a numerical variable
Numerical Data: Mean, median, mode, quartiles, geometric mean, range, interquartile range, standard deviation, variance, coefficient of variation, Z scores, box-and-whisker plot (sections 3.1–3.4)

Type of Analysis: Describing the relationship between two numerical variables
Numerical Data: Covariance, coefficient of correlation (section 3.5)

KEY FORMULAS

Sample Mean
$$ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{3.1} $$

Median
$$ \text{Median} = \frac{n + 1}{2}\ \text{ranked value} \tag{3.2} $$

First Quartile Q1
$$ Q_1 = \frac{n + 1}{4}\ \text{ranked value} \tag{3.3} $$

Third Quartile Q3
$$ Q_3 = \frac{3(n + 1)}{4}\ \text{ranked value} \tag{3.4} $$

Geometric Mean
$$ \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \tag{3.5} $$

Geometric Mean Rate of Return
$$ \bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \tag{3.6} $$

Range
$$ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \tag{3.7} $$

Interquartile Range
$$ \text{Interquartile range} = Q_3 - Q_1 \tag{3.8} $$

Sample Variance
$$ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \tag{3.9} $$

Sample Standard Deviation
$$ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \tag{3.10} $$

Coefficient of Variation
$$ CV = \left(\frac{S}{\bar{X}}\right) 100\% \tag{3.11} $$

Z Scores
$$ Z = \frac{X - \bar{X}}{S} \tag{3.12} $$

Population Mean
$$ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \tag{3.13} $$

Population Variance
$$ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \tag{3.14} $$

Population Standard Deviation
$$ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \tag{3.15} $$

Approximating the Mean from a Frequency Distribution
$$ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} \tag{3.16} $$

Approximating the Standard Deviation from a Frequency Distribution
$$ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} \tag{3.17} $$

Sample Covariance
$$ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \tag{3.18} $$

Sample Coefficient of Correlation
$$ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \tag{3.19} $$

KEY TERMS

arithmetic mean; box-and-whisker plot; central tendency; Chebyshev rule; coefficient of correlation; coefficient of variation; covariance; dispersion; empirical rule; extreme value; five-number summary; geometric mean; interquartile range; left-skewed; mean; median; midspread; mode; outlier; population mean; population standard deviation; population variance; Q1: first quartile; Q2: second quartile
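Three of the key formulas (geometric mean rate of return, coefficient of variation, and Z scores) translate almost line for line into code. The sketch below uses Python's standard `statistics` module, whose `stdev` uses the n − 1 divisor of the sample standard deviation. The function names and the two-period returns are hypothetical, chosen only for illustration.

```python
import math
import statistics


def geometric_mean_rate_of_return(rates):
    """[(1 + R1) x (1 + R2) x ... x (1 + Rn)]^(1/n) - 1, rates as decimals."""
    growth = math.prod(1.0 + r for r in rates)
    return growth ** (1.0 / len(rates)) - 1.0


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def z_scores(values):
    """(X - mean) / S for every value, with S the sample standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


# Hypothetical returns: +50% in one period, then -50% in the next.
# The arithmetic mean return is 0%, but the investment actually shrank.
hypothetical_returns = [0.50, -0.50]
print(f"{geometric_mean_rate_of_return(hypothetical_returns):.1%}")  # -13.4%
```

The usage line shows why the geometric mean rate of return is the appropriate average for rates of change over time: a gain of 50% followed by a loss of 50% leaves 75% of the original value, which the arithmetic mean of the two rates hides.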

5 grams of tea in a bag.45 5.61 5. interquartile range. Construct a box-and-whisker plot.59 What is meant by the property of shape? 3.50 5. on average.53 How do you interpret the first quartile. 3. two problems arise.53 5.40 5.67 5.68–3. Interpret the measures of central tendency and variation within the context of this problem.53 5.77 5. We recommend that you solve problems 3.58 5. First.Chapter Review Problems Q3: third quartile 77 quartiles 77 range 80 resistant measures 81 right-skewed 88 sample coefficient of correlation 109 CHAPTER sample covariance 106 sample mean 73 sample standard deviation sample variance 82 shape 72 skewed 88 spread 72 REVIEW Checking Your Understanding 3.54 5. there are 5. interquartile range.65 5. and the extremely fast filling operation of the machine (approximately 170 bags a minute).40 5.49 5.57 5. possible requests for additional medical information and medical exams. median.55 What does the Z score measure? 3.5 grams of tea in a bag? If you were in charge of this process. Second. variance. c.25 5.55 5. variance.57 5. The following table provides the weight in grams of a sample of 50 tea bags produced in one hour by a single machine.32 5. differences in the density of the tea. The ability to deliver approved policies to customers in a timely manner is critical to the profitability of this service to the bank. Is the company meeting the requirement set forth on the label that.52 What are the differences among the mean. The approval process consists of underwriting.51 5.44 5. Compute the mean.57 5.51 What is meant by the property of central tendency? 3. Are the data skewed? If so. if any. and a policy compilation stage during which the policy pages are generated and sent to the bank for delivery.60 How do the covariance and the coefficient of correlation differ? Applying the Concepts You can solve problems 3.58 How do the empirical rule and the Chebychev rule differ? 3. first quartile. 
and what are the advantages and disadvantages of each? 3. a random sample of 27 approved policies was selected and the following total processing time in days was recorded: INSURANCE 73 19 16 64 28 28 31 90 60 56 31 56 22 18 45 48 17 17 17 91 92 63 50 51 69 16 17 .56 5. how? e. the company is giving away product.67 manually or by using Microsoft Excel.86 using Microsoft Excel.44 5.47 5.58 5. For this product. Getting an exact amount of tea in a bag is problematic 115 standard deviation 82 sum of squares 82 symmetrical 88 variance 82 variation 76 Z scores 86 82 PROBLEMS because of variation in the temperature and humidity inside the factory.36 a. Compute the range.56 5. would you try to make concerning the distribution of weights in the individual bags? 3. standard deviation. customers may not be able to brew the tea to be as strong as they wish.53 5. and coefficient of variation. and coefficient of variation.54 5.61 5.50 5.47 5. and mode.52 5. Minitab. TEABAGS 5.61–3. there are 5. or SPSS.46 5. a medical information bureau check.34 5. savings banks are permitted to sell a form of life insurance called Savings Bank Life Insurance (SBLI).40 5.45 5.55 5. During a period of one month.29 5. and third quartile? 3. If the bags are underfilled. or SPSS.62 5.42 5. and third quartile.44 5.41 5. standard deviation. Minitab. If the average amount of tea in a bag exceeds the label weight. on average.62 In New York State.53 5. what changes. Why should the company producing the tea bags be concerned about the central tendency and variation? d. the company may be in violation of the truth-in-labeling laws. and what are the advantages and disadvantages of each? 3.54 What is meant by the property of variation? 3. median. the label weight on the package indicates that.67 5.50 5.61 A quality characteristic of interest for a tea-bag-filling process is the weight of the tea in the individual bags.58 5. 
b.63 5.42 5.56 What are the differences among the various measures of variation such as the range. which includes a review of the application.32 5.50 What are the properties of a set of numerical data? 3. median.57 How does the empirical rule help explain the ways in which the values in a set of numerical data cluster and distribute? 3.
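As an illustration of the five-number summary asked for in these problems, the sketch below uses the 27 INSURANCE processing times listed in problem 3.62. It relies on `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the (n + 1)/4, (n + 1)/2, and 3(n + 1)/4 ranked positions. With n = 27 those positions are the whole numbers 7, 14, and 21, so no interpolation is involved. For other sample sizes this function interpolates linearly, which is an assumption that may differ from the rounding convention of a particular textbook.

```python
import statistics

# Total processing time in days for the 27 approved policies (problem 3.62).
insurance_days = [
    73, 19, 16, 64, 28, 28, 31, 90, 60, 56, 31, 56, 22, 18,
    45, 48, 17, 17, 17, 91, 92, 63, 50, 51, 69, 16, 17,
]


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a sample."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(insurance_days)
interquartile_range = summary[3] - summary[1]
print(summary)              # (16, 18.0, 45.0, 63.0, 92)
print(interquartile_range)  # 45.0
```

The median (45) sits much closer to Q3 (63) than to Q1 (18), so the upper half of the middle 50% is less spread out than the lower half. This is the kind of observation a box-and-whisker plot makes visible.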

Calculate the mean.66 Problems with a telephone line that prevent a customer from receiving or making calls are disconcerting to both the customer and the telephone company. Construct a box-and-whisker plot and describe the shape. and third quartile. what would you say? Explain.15 3.97 Central Office II Time to Clear Problems (minutes) 7.64 also produces electric insulators. first quartile. standard deviation.55 3.652 1. d.403 8. and 15 installation crews.65 The manufacturing company in problem 3. how? d.634 1.465 8. FURNITURE 54 5 35 137 31 27 152 2 123 81 74 27 11 19 126 110 110 29 61 35 94 31 26 5 12 4 165 32 29 28 29 26 25 1 14 13 13 10 5 27 4 52 30 22 36 26 20 23 33 68 a.688 1. and coefficient of variation.60 0.420 8. and coefficient of variation.32 3.411 8.728 1.752 1. Compute the mean. The company requires that the width of the trough be between 8.382 8. Construct a box-and-whisker plot and describe the shape.58 4.409 a.30 2.373 8. had undergone a major expansion in the past several years. Compute the mean.405 8.476 8.48 1. List the five-number summary.64 A manufacturing company produces steel housings for electrical equipment. The distance from one side of the form to the other is critical because of weatherproofing in outdoor applications.447 8. Interpret the measures of central tendency and variability in (a). On the basis of the results of (a) through (c).75 0. and third quartile.383 8. range. . interquartile range. interquartile range. Construct a side-by-side box-and-whisker plot.762 1. In particular.10 1.498 8. a measurer. Compute the range.75 0.439 8. c. median.396 8. and standard deviation for the force variable. standard deviation. how? d. variance.784 1. Are the data skewed? If so. and coefficient of variation.810 1.866 1.93 1. Interpret these measures of central tendency and variability. On the basis of the results of (a) through (c).774 1.80 1. a short-circuit is likely to occur.756 1. 
A large family-held department store selling furniture and flooring.60 1. variance.53 0.85 0.866 1.429 8.413 8.385 8.592 1.23 0. if you had to tell the president of the company how long a customer should expect to wait to have a complaint resolved.481 8. including carpet. first quartile. b. c. the flooring department had expanded from 2 installation crews to an installation supervisor. The following data represent the number of days between the receipt of the complaint and the resolution of the complaint.656 1.52 1. b. If the insulators break when in use.93 5.410 8.680 1.458 8. Construct a box-and-whisker plot.60 4.419 8. Calculate the mean.116 CHAPTER THREE Numerical Descriptive Measures a.05 6.92 0.427 8.10 0. TROUGH 8. A sample of 50 complaints concerning carpet installation was selected during a recent year.45 0.460 8.65 0.744 1.429 8. b. Compute the range. range.31 and 8.317 8. The following are the widths of the troughs in inches for a sample of n = 49. variance. The following data represent samples of 20 problems reported to two different offices of a telephone company and the time to clear these problems (in minutes) from the customers’ lines: PHONE Central Office I Time to Clear Problems (minutes) 1.479 8.870 1.414 8.414 8. The data from 30 insulators from this experiment are as follows: FORCE 1. first quartile.610 1. What can you conclude about the strength of the insulators if the company requires a force measurement of at least 1.420 8. median. Force is measured by observing how many pounds must be applied to the insulator before it breaks. b.460 8. d.10 0.662 1.410 8.52 3.63 One of the major measures of the quality of service provided by any organization is the speed with which it responds to customer complaints. median. how? d.734 1.405 8.500 pounds? 3.48 1.412 8.61 inches. c.422 8.489 8.02 0. Compute the mean.764 1.31 inches and 8. interquartile range.351 8.436 8. 
What can you conclude about the number of troughs that will meet the company’s requirements of troughs being between 8.810 1. and standard deviation for the width.53 4.550 1.348 8.97 1.498 8.522 1. standard deviation. Are the data skewed? If so.447 8. b.72 For each of the two central office locations: a. are there any differences between the two central offices? Explain.484 8.481 8.75 0. Are the data skewed? If so. Compute the range. c.734 1.48 3.462 8.02 3. To test the strength of the insulators.420 8.323 8. It is produced using a 250-ton progressive punch press with a wipe-down operation putting two 90-degree forms in the flat steel to make the trough. c.08 1. The main component part of the housing is a steel trough that is made out of a 14-gauge steel coil. median.736 a.662 1.415 8.312 8.61 inches wide? 3. What would you tell a customer who enters the bank to purchase this type of insurance policy and asks how long the approval process takes? 3.65 1.78 2. 3.820 1.444 8.10 1. destructive testing is carried out to determine how much force is required to break the insulators.343 8. Construct a box-and-whisker plot.60 0. median.696 1.788 1. and third quartile.
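The question "Are the data skewed? If so, how?" recurs throughout these problems. A quick numerical check compares the mean with the median. The sketch below does this for the 50 FURNITURE complaint-resolution times in problem 3.63. The helper name `skew_direction` is hypothetical, and the mean-versus-median comparison is a rule of thumb that is best confirmed with a box-and-whisker plot.

```python
import statistics

# Days between receipt and resolution of 50 carpet-installation complaints
# (problem 3.63).
complaint_days = [
    54, 5, 35, 137, 31, 27, 152, 2, 123, 81,
    74, 27, 11, 19, 126, 110, 110, 29, 61, 35,
    94, 31, 26, 5, 12, 4, 165, 32, 29, 28,
    29, 26, 25, 1, 14, 13, 13, 10, 5, 27,
    4, 52, 30, 22, 36, 26, 20, 23, 33, 68,
]


def skew_direction(values):
    """Rule of thumb: mean above median suggests right skew, below suggests left."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetrical"


mean_days = statistics.mean(complaint_days)
median_days = statistics.median(complaint_days)
print(mean_days, median_days, skew_direction(complaint_days))
# 43.04 28.5 right-skewed
```

A handful of very long resolution times pull the mean (43.04 days) well above the median (28.5 days), which is why the median is often the more representative figure to report for data like these.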

Are the data skewed? If so.29 7. b. Adapted with permission from Consumer Reports.92 Plant B 9.67 In many manufacturing processes the term “work-inprocess” (often abbreviated WIP) is used. 12(2003).25 10.37 6. first quartile. and sugar in grams for 33 breakfast cereals. and third quartile. fiber in grams. for the variables of cost per serving. C.. Are the data for any of the types of food skewed? If so. NY 10703–1057.62 8. and fat in grams for 97 varieties of dry and canned dog and cat food. Construct a side-by-side box-and-whisker plot for the four types (dry dog food.62 5. protein in grams.. On the basis of the results of (a) through (c).62 12. and third quartile. standard deviation. Construct a box-and-whisker plot. February 1998. protein in grams.17 13. such as bobble-head giveaways.Chapter Review Problems 3.69 State budget cuts forced a rise in tuition at public universities during the 2003–2004 academic year.75 15.92 11. Construct a side-by-side box-and-whisker plot. and coefficient of variation for the difference in 117 tuition between 2002–2003 and 2003–2004 for in-state students and out-of-state students.58 5. fiber in grams. interquartile range. Construct a five-number summary for the 43 games where promotions were held and for the 37 games without promotions. first quartile.54 11.50 7. Yonkers. interquartile range. 3.41 14. Compute the mean. median. What conclusions can you reach concerning the cost per ounce in cents. median.71 10. 18–19.75 12. standard deviation. interquartile range. c. Adapted with permission from Consumer Reports.S. calories. For each variable: a.25 5. and canned cat food).46 9. dry cat food. how? d. variance. b.25 9. 173–183).54 8. “Promotion Timing in Major League Baseball and the Stacking Effects of Factors that Increase Game Attractiveness.62 7. variance. and the sugar in grams for the 33 breakfast cereals? 3.68 The data contained in the file CEREALS consists of the cost in dollars per ounce.S.13 13.45 8.42 10. 
Compute the mean.46 21. how? . median. b. how? d. Compute the range. interquartile range. c. standard deviation.71 For each of the two plants: a.. 3. October 1999. Yonkers. canned dog food. and bound. a. and coefficient of variation. b. Inc. WIP Plant A 5. standard deviation. are there any differences between the two plants? Explain. C.46 16. variance. Are the data skewed? If so. median. first quartile. one for the 43 games where promotions were held and one for the 37 games without promotions. and coefficient of variation. The data file ROYALS includes the following variables for the Kansas City Royals during the 2002 baseball season: GAME = Home games in the order they were played ATTENDANCE = Paid attendance for the game PROMOTION—Y = a promotion was held. Inc. The following data represent samples of 20 books at each of two production plants and the processing time (operationally defined as the time in days from when the books came off the press to when they were packed in cartons) for these jobs. For the four types of food (dry dog food. variance. In a book manufacturing plant the WIP represents the time it takes for sheets from a press to be folded. Compute the range. Compute the range. how? d. Compute the mean.58 9.50 7. Krehbiel. gathered.00 2. N = no promotion was held a.96 4. Source: Extracted from Copyright 1999 by Consumers Union of U.” Sport Marketing Quarterly.21 6.62 25. NY 10703–1057. What conclusions can you reach concerning the difference in tuition between 2002–2003 and 2003–2004 for in-state students and out-of-state students? 3. sewn. 33–34. Construct a box-and-whisker plot of the difference in tuition between 2002–2003 and 2003–2004 for in-state students and out-of-state students.33 14.04 5. d. calories. c. c. c.29 13. increase attendance at Major League Baseball games? An article in Sport Marketing Quarterly reported on the effectiveness of marketing promotions (T. tipped on end sheets. 
Calculate the mean and standard deviation of attendance for the 43 games where promotions were held and for the 37 games without promotions. dry cat food and canned cat food). Source: Extracted from Copyright 1998 by Consumers Union of U.29 7. Compute the range.41 11. Compute the mean. canned dog food. Boyd and T. Discuss the results of (a) through (c) and comment on the effectiveness of promotions at Royals’ games during the 2002 season. Are the data skewed? If so.. cups per can. Construct a graphical display containing two boxand-whisker plots.29 16.70 Do marketing promotions. b.42 11.71 The data contained in the file PETFOOD2 consist of the cost per serving. and third quartile. The data in the file TUITION include the difference in tuition between 2002–2003 and 2003–2004 for in-state students and outof-state students. first quartile. and coefficient of variation. and fat in grams: a. and third quartile for the difference in tuition between 2002–2003 and 2003–2004 for in-state students and out-of-state students.

all other operating revenue.S. and coefficient of variation. b. text cost. For each of the variables of average travel-to-work time in minutes. Construct a box-and-whisker plot. Shingles that experience low amounts of granule loss are expected to last longer in normal use than shingles that experience high amounts of granule loss. a sampling of 700.” Copyright © 2002 by Consumers Union of U. turning circle requirement. player compensation and benefits. regular season gate receipts. Compute the coefficient of correlation between price and each of the following: text speed. Compute the range.8 grams of granule loss if it is expected to last the length of the warranty period. Inc. median. players arguing that owners are making money. 3. The data file GRANULE contains a sample of 170 measurements made on the company’s Boston shingles. Compute the mean. the file BB2001 contains team-by-team statistics on ticket prices.118 CHAPTER THREE Numerical Descriptive Measures d. b. national and other local expenses.73 The data in the file STATES represent the results of the American Community Survey. . yearly filter energy cost. percentage of homes with eight or more rooms. What conclusions can you reach concerning the regular season gate receipts.8 grams or less. b. weight. and 140 measurements made on Vermont shingles. Yonkers. Source: Extracted from “Printers. interquartile range. 47. do you think that any of the other variables might be useful in predicting printer price? Explain. a. 51. March 2002. Census. radio. variance. percentage of homes with eight or more rooms. and percentage of mortgage-paying homeowners whose housing costs exceed 30% of income? 3. Accelerated-life testing exposes the shingle to the stresses it would be subject to in a lifetime of normal use in a laboratory setting via an experiment that takes only a few minutes to conduct. Yonkers. and income from baseball operations. and color photo cost. and cable receipts. median household income. 
color photo time. c. text cost. Compute the coefficient of correlation between price and energy cost. What conclusions can you reach concerning the average travel-to-work time in minutes.. median household income. a shingle is repeatedly scraped with a brush for a short period of time and the amount of shingle granules that are removed by the brushing is weighed (in grams). Adapted with permission from Consumer Reports. national and other local expenses. April 2002. Compute the correlation between the number of wins and player compensation and benefits. List the five-number summary for the Boston shingles and for the Vermont shingles.” Copyright © 2002 by Consumers Union of U. In this situation. median. c.72 The manufacturer of Boston and Vermont asphalt shingles provide their customers with a 20-year warranty on most of their products. Construct a box-and-whisker plot. Compute the coefficient of correlation between price and filter cost. To determine whether a shingle will last as long as the warranty period. canned dog food. radio. February 2002. a.75 The data in the file AIRCLEANERS represent the price.. and color photo cost of computer printers. Inc. player compensation and benefits. color photo time. NY 10703–1057. NY 10703–1057. interquartile range. Yonkers. How strong is the relationship between these two variables? e.” Copyright © 2002 by Consumers Union of U. local television.S. b. What conclusions can you reach concerning any differences among the four types (dry dog food. and third quartile. and luggage capacity. and income from baseball operations? 3. Compute the mean. c. For each of these variables. standard deviation. 3. In addition to data related to team statistics for the 2001 season. AUTO2002 Source: Extracted from “The 2002 Cars. and cable receipts. In this test. Adapted with permission from Consumer Reports. length.74 The economics of baseball has caused a great deal of controversy with owners arguing that they are losing money. 
... and percentage of mortgage-paying homeowners whose housing costs exceed 30% of income: ... 000 households taken in each state during the 2000 U.S. ...

... accelerated-life testing is conducted at the manufacturing plant. ... a shingle should experience no more than 0. ... Construct side-by-side box-and-whisker plots for the two brands of shingles and describe the shapes of the distributions. Comment on the shingles' ability to achieve a granule loss of 0. ...

... and fans complaining about how expensive it is to attend a game and watch games on cable television. ... the fan cost index, ... local television, ... all other operating revenue, ...

... dry cat food, and canned cat food)?

... and yearly filter cost of room air cleaners. ... What conclusions about the relationship of energy cost and filter cost to the price of the air cleaners can you make?
Source: Extracted from "Portable Room Air Cleaners," ... Inc., ... NY 10703–1057. Adapted with permission from Consumer Reports, ...

3.76 The data in the file PRINTERS represent the price, text speed, ...

3.77 You want to study characteristics of the model year 2002 automobiles in terms of the following variables: miles per gallon, length, width, turning circle requirement, weight, and luggage capacity. For each of these variables:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data for any of the variables skewed? If so, how?
d. What conclusions can you draw concerning the 2002 automobiles?

3.78 Refer to the data of problem 3.77. You want to compare sports utility vehicles (SUVs) with non-SUV vehicles in terms of miles per gallon, length, width, turning circle requirement, weight, and luggage capacity. For SUVs and non-SUV vehicles, for each of these variables:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a side-by-side box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach concerning differences between SUVs and non-SUVs?

3.79 Zagat's publishes restaurant ratings for various locations in the United States. The data file RESTRATE contains the Zagat rating for food, decor, service, and the price per person for a sample of 50 restaurants located in New York City, and 50 restaurants located on Long Island.
Source: Extracted from Zagat Survey 2002 New York City Restaurants and Zagat Survey 2002 Long Island Restaurants.
For New York City and Long Island restaurants, for the variables of food rating, decor rating, service rating, and price per person:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a side-by-side box-and-whisker plot for the New York City and Long Island restaurants. Are the data skewed? If so, how?
d. What conclusions can you reach concerning differences between New York City and Long Island restaurants?

3.80 As an illustration of the misuse of statistics, an article by Glenn Kramon ("Coaxing the Stanford Elephant to Dance," The New York Times Sunday Business Section, November 11, 1990) implied that costs at Stanford Medical Center had been driven up higher than at competing institutions because the former was more likely than other organizations to treat indigent, Medicare, Medicaid, sicker, and more complex patients. The chart below was provided to compare the average 1989 to 1990 hospital charges for three medical procedures (coronary bypass, simple birth, and hip replacement) at three competing institutions (El Camino, Sequoia, and Stanford).

[Figure: bar chart titled "What Health Care Costs: A comparison of average 1989–90 hospital charges in California for various operations." Vertical axis: dollars, 0 to 50,000. Bars for El Camino, Sequoia, and Stanford across coronary bypass, simple birth, and hip replacement; one entry is marked N/A.]
Sequoia and El Camino Hospitals are Stanford Medical Center's main local competition. El Camino costs are the average of high and low charges for a simple birth with a two-day stay and a hip replacement with a nine-day stay. Sequoia costs are averages of the middle 50% of all charges for each operation. Stanford data are the average cost of all operations.
Source: Stanford Medical Center, Sequoia Hospital, and El Camino Hospital.

Suppose you were working in a medical center. Your CEO knows you are currently taking a course in statistics and calls you in to discuss this. She tells you that the article was presented in a discussion group setting as part of a meeting of regional area medical center CEOs last night and that one of them mentioned that this chart was totally meaningless and asked her opinion. She now requests that you prepare her response. You smile, take a deep breath, and reply . . .
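Parts (a) and (b) of the problems above request the same set of measures for each variable. A minimal Python sketch of those calculations follows. It assumes the standard library's `statistics.quantiles` with its default "exclusive" method, which interpolates between ranked values and so can differ slightly from the rounding rules used in the text or from software output. The function name `describe` and the sample values are hypothetical and are not taken from any data file named in the problems.

```python
import statistics


def describe(values):
    """Return the measures requested in parts (a) and (b) for one numerical variable."""
    data = sorted(values)
    # Assumption: the default "exclusive" method interpolates at position p * (n + 1).
    # Textbook rounding rules and other software may give slightly different quartiles.
    q1, _, q3 = statistics.quantiles(data, n=4)
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "variance": statistics.variance(data),  # sample variance (n - 1 divisor)
        "std_dev": std_dev,
        # Coefficient of variation as a percentage; undefined when the mean is 0.
        "cv_percent": std_dev / mean * 100,
    }


# Hypothetical values used only to exercise the function.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
for name, value in describe(sample).items():
    print(f"{name}: {value:.2f}")
```

The coefficient of variation is only meaningful for variables measured on a scale where the mean is not zero, which is why the sketch leaves that case to raise an error rather than return a number.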

3.81 You are planning to study for your statistics examination with a group of classmates, one of whom you particularly want to impress. This individual has volunteered to use Microsoft Excel, Minitab, or SPSS to get the needed summary information, tables, and charts for a data set containing several numerical and categorical variables assigned by the instructor for study purposes. This person comes over to you with the printout and exclaims, "I've got it all: the means, the medians, the standard deviations, the box-and-whisker plots, the pie charts, for all our variables. The problem is, some of the output looks weird, like the box-and-whisker plots for gender and for major and the pie charts for grade point index and for height. Also, I can't understand why Professor Krehbiel said we can't get the descriptive stats for some of the variables. I got it for everything! See, the mean for height is 68.23, the mean for grade point index is 2.76, the mean for gender is 1.50, the mean for major is 4.33." What is your reply?

Report Writing Exercises

3.82 The data found in the data file BEER represent the price of a six-pack of 12-ounce bottles, the calories per 12 fluid ounces, the percent of alcohol content per 12 fluid ounces, the type of beer (craft lagers, craft ales, imported lagers, regular and ice beers, and light and nonalcoholic beers), and the country of origin (U.S. versus imported) for each of the 69 beers that were sampled.
Your task is to write a report based on a complete descriptive evaluation of each of the numerical variables (price, calories, and alcoholic content) regardless of type of product or origin. Then perform a similar evaluation comparing each of these numerical variables based on type of product: craft lagers, craft ales, imported lagers, regular and ice beers, and light or nonalcoholic beers. In addition, perform a similar evaluation comparing and contrasting each of these numerical variables based on the origins of the beers: those brewed in the United States versus those that were imported. Appended to your report should be all appropriate tables, charts, and numerical descriptive measures.
Source: Extracted from "Beers," Copyright © 1996 by Consumers Union of U.S., Inc., Yonkers, NY 10703–1057. Adapted with permission from Consumer Reports, June 1996.

TEAM PROJECTS

The data file MUTUALFUNDS2004 contains information regarding 12 variables from a sample of 121 mutual funds. The variables are:
Fund: The name of the mutual fund
Category: Type of stocks comprising the mutual fund (small cap, mid cap, large cap)
Objective: Objective of stocks comprising the mutual fund (growth or value)
Assets: In millions of dollars
Fees: Sales charges (no or yes)
Expense ratio: Ratio of expenses to net assets in percentage
2003 Return: Twelve-month return in 2003
Three-year return: Annualized return 2001–2003
Five-year return: Annualized return 1999–2003
Risk: Risk-of-loss factor of the mutual fund classified as low, average, or high
Best quarter: Best quarterly performance 1999–2003
Worst quarter: Worst quarterly performance 1999–2003

3.83 For expense ratio in percentage, 2003 Return, three-year return, and five-year return:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach concerning these variables?

3.84 You wish to compare mutual funds that have fees to those that do not have fees. For each of these two groups, for the variables expense ratio in percentage, 2003 Return, three-year return, and five-year return:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach about differences between mutual funds that have fees and those that do not have fees?

3.85 You wish to compare mutual funds that have a growth objective to those that have a value objective. For each of these two groups, for the variables expense ratio in percentage, 2003 Return, three-year return, and five-year return:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach about differences between mutual funds that have a growth objective and those that have a value objective?

3.86 You wish to compare small cap, mid cap, and large cap mutual funds. For each of these three groups, for the variables expense ratio in percentage, 2003 Return, three-year return, and five-year return:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach about differences between small cap, mid cap, and large cap mutual funds?
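Problems 3.84 through 3.86 compare groups defined by a categorical variable, and part (c) asks whether each group's distribution is skewed. The following Python sketch shows one way to build the five-number summary behind a box-and-whisker plot for each group. The record layout, the field names `fees` and `return_2003`, and the values are hypothetical and are not taken from the MUTUALFUNDS2004 file. The skewness check is a rough heuristic that compares only the distances from the median to the two extremes, and the quartiles again assume the default interpolating method of `statistics.quantiles`.

```python
from collections import defaultdict
from statistics import median, quantiles


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for one group."""
    ordered = sorted(values)
    # Assumption: default "exclusive" quartile method; other rules differ slightly.
    q1, _, q3 = quantiles(ordered, n=4)
    return (ordered[0], q1, median(ordered), q3, ordered[-1])


def skew_direction(summary):
    """Rough heuristic: compare median-to-extreme distances on each side."""
    smallest, _, med, _, largest = summary
    left_tail = med - smallest
    right_tail = largest - med
    if right_tail > left_tail:
        return "right-skewed"
    if left_tail > right_tail:
        return "left-skewed"
    return "approximately symmetric"


def summarize_by_group(records, group_key, value_key):
    """Group records by a categorical field and summarize a numerical field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {name: five_number_summary(vals) for name, vals in groups.items()}


# Hypothetical records used only for illustration.
funds = [
    {"fees": "yes", "return_2003": 10.0},
    {"fees": "yes", "return_2003": 12.0},
    {"fees": "yes", "return_2003": 14.0},
    {"fees": "yes", "return_2003": 30.0},
    {"fees": "no", "return_2003": 5.0},
    {"fees": "no", "return_2003": 10.0},
    {"fees": "no", "return_2003": 15.0},
]

for group, summary in summarize_by_group(funds, "fees", "return_2003").items():
    print(group, summary, skew_direction(summary))
```

Placing the per-group summaries next to each other mirrors what a side-by-side box-and-whisker plot shows graphically, which makes differences in center, spread, and shape easier to describe in part (d).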

RUNNING CASE
MANAGING THE SPRINGVILLE HERALD

For what variable in the Chapter 2 Managing the Springville Herald case (see page 62) are numerical descriptive measures needed? For the variable you identify:
1. Compute the appropriate numerical descriptive measures, and generate a box-and-whisker plot.
2. Identify another graphical display that might be useful and construct it. What conclusions can you form from that plot that cannot be made from the box-and-whisker plot?
3. Summarize your findings in a report that can be included with the task force's study.

WEB CASE

Apply your knowledge about the proper use of numerical descriptive measures in this continuing Web Case from Chapter 2.
Visit the StockTout Investing Service Web site www.prenhall.com/Springville/StockToutHome.htm a second time and reexamine their supporting data and then answer the following:
1. Reexamine the data you inspected when working on the Web Case for Chapter 2. Can descriptive measures be computed for any variables? How would such summary statistics support StockTout's claims? How would those summary statistics affect your perception of StockTout's record?
2. Evaluate the methods StockTout used to summarize the results of its customer survey www.prenhall.com/Springville/ST_Survey.htm. Is there anything you would do differently to summarize these results?
3. Note that the last question of the survey has fewer responses. What factors may have limited the number of responses to that question?

REFERENCES

1. Kendall, M. G., and A. Stuart, The Advanced Theory of Statistics, vol. 1 (London: Charles W. Griffin, 1958).
2. Microsoft Excel 2003 (Redmond, WA: Microsoft Corporation, 2002).
3. Minitab Version 14 (State College, PA: Minitab Inc., 2004).
4. SPSS Base 12.0 Brief Guide (Upper Saddle River, NJ: Prentice Hall, 2003).
5. Tukey, J., Exploratory Data Analysis (Reading, MA: Addison-Wesley, 1977).
6. Velleman, P. F., and D. C. Hoaglin, Applications, Basics, and Computing of Exploratory Data Analysis (Boston, MA: Duxbury Press, 1981).

Appendix 3 Using Software for Descriptive Statistics

A3.1 MICROSOFT EXCEL

For Descriptive Statistics
Use the Data Analysis ToolPak. Open to the worksheet containing the data you want to summarize. Select Tools > Data Analysis. From the list that appears in the Data Analysis dialog box, select Descriptive Statistics and click OK. In the Descriptive Statistics dialog box (see Figure A3.1), enter the cell range of the data in the Input Range box. Choose the Columns option and Labels in First Row if you are using data that are arranged like the data in the Excel files on the CD-ROM packaged with this text. Finish by selecting New Worksheet Ply, Summary statistics, Kth Largest, and Kth Smallest, and clicking OK. Results appear on a separate worksheet.

FIGURE A3.1 Data Analysis Descriptive Statistics Dialog Box

OR you can use any of these sample statistics worksheet functions in your own formulas including AVERAGE (for mean), MEDIAN, MODE, QUARTILE, STDEV, VAR, MIN, MAX, SUM, COUNT, LARGE, or SMALL. To enter one of these functions into a worksheet, select an empty cell and then select Insert > Function. In the Function dialog box, select Statistical from the drop-down list and then scroll to and select the function you want to use. Click OK. In the Function Arguments dialog box, enter the cell range of the data to be summarized and click OK. (For LARGE and SMALL, enter 1 as the K value, and for QUARTILE, enter either 1 or 3 as the Quart value, for either first or third quartile.) In versions of Microsoft Excel earlier than Excel 2003, you may encounter errors in results when using the QUARTILE function.

For Box-and-Whisker Plot
See section G.5 (Box-and-Whisker Plot) if you want PHStat2 to produce a box-and-whisker plot as a Microsoft Excel chart. (There are no Microsoft Excel commands that directly produce box-and-whisker plots.)

For Covariance
Open the Covariance.xls Excel file, shown in Figure 3.7 on page 107. Follow the onscreen instructions for modifying the table area if you want to use this worksheet with other pairs of variables. Note in Figure 3.7 that cell C15 contains a formula that uses the COUNT function. This allows Excel to automatically update the value of n when the size of the table area is changed and ensures that the n − 1 term is always correct.

For Coefficient of Correlation
Open the Correlation.xls Excel file, shown in Figure 3.10 on page 110. The worksheet uses the CORREL function to calculate the coefficient of correlation. As shown in Figure 3.10, the formula =E17/(E18 * E19) could also be used in this particular worksheet to calculate the statistic, since the covariance, SX, and SY already appear in the worksheet. Follow the onscreen instructions for modifying the table area if you want to use this worksheet with other pairs of variables. Note in Figure 3.10 that cell E16 contains a formula that uses the COUNT function. This allows Excel to automatically update the value of n when the size of the table area is changed, and ensures that the n − 1 term is always correct.

A3.2 MINITAB

Computing Descriptive Statistics
To produce descriptive statistics for the 2003 return for different risk levels shown in Figure 3.3 on page 90, open the MUTUALFUNDS2004.MTW worksheet. Select Stat > Basic Statistics > Display Descriptive Statistics.
Step 1: In the Display Descriptive Statistics dialog box (see Figure A3.2), enter C7 or ‘Return 2003’ in the Variables: edit box. Enter C10 or Risk in the By variables (optional): edit box.

FIGURE A3.2 Minitab Display Descriptive Statistics Dialog Box

Step 2: Select the Statistics button. In the Display Descriptive Statistics - Statistics dialog box (see Figure A3.3), select the Mean, Standard deviation, Coefficient of variation, First quartile, Median, Third quartile, Interquartile range, Minimum, Maximum, Range, and N total (the sample size) check boxes. Click the OK button to return to the Display Descriptive Statistics dialog box. Click the OK button again to compute the descriptive statistics.

FIGURE A3.3 Minitab Display Descriptive Statistics - Statistics Dialog Box

Using Minitab to Create a Box-and-Whisker Plot
To create a box-and-whisker plot for the 2003 return for different risk levels shown in Figure 3.5 on page 104, open the MUTUALFUNDS2004.MTW worksheet. Select Graph > Boxplot.
Step 1: In the Boxplots dialog box (see Figure A3.4), select the One Y With Groups choice. (If you want to create a box-and-whisker plot for one group, select the One Y Simple choice.) Click the OK button.

FIGURE A3.4 Minitab Boxplots Dialog Box

Step 2: In the Boxplot - One Y, With Groups dialog box (see Figure A3.5), enter C7 or ‘Return 2003’ in the Graph variables: edit box. Enter C10 or Risk in the Categorical variables edit box. Click the OK button. The output will be similar to Figure 3.5 on page 104.

FIGURE A3.5 Minitab Boxplots - One Y, With Groups Dialog Box

Calculating a Coefficient of Correlation
To compute the coefficient of correlation for the expense ratio and the 2003 return for all the mutual funds, open the MUTUALFUNDS2004.MTW worksheet. Select Stat > Basic Statistics > Correlation. In the Correlation dialog box (see Figure A3.6), enter C6 or ‘Expense ratio’ and C7 or ‘Return 2003’. Click the OK button.

FIGURE A3.6 Minitab Correlation Dialog Box
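The Excel instructions above note that the coefficient of correlation can be obtained either from CORREL or by dividing the covariance by the product of the two sample standard deviations, with n − 1 in each denominator. The short Python sketch below illustrates that second route outside of any spreadsheet. The function names and the sample values are hypothetical; the data do not come from the MUTUALFUNDS2004 file.

```python
from math import sqrt


def sample_covariance(x, y):
    """Sample covariance of paired values, using the n - 1 divisor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross_products / (n - 1)


def sample_std_dev(x):
    """Sample standard deviation, computed as the square root of cov(x, x)."""
    return sqrt(sample_covariance(x, x))


def correlation(x, y):
    """Coefficient of correlation: covariance divided by the product S_X * S_Y."""
    return sample_covariance(x, y) / (sample_std_dev(x) * sample_std_dev(y))


# Hypothetical paired values used only for illustration.
expense_ratio = [1.0, 2.0, 3.0, 4.0]
return_2003 = [2.0, 4.0, 6.0, 8.0]
print(round(sample_covariance(expense_ratio, return_2003), 4))
print(round(correlation(expense_ratio, return_2003), 4))
```

Because the same n − 1 divisor appears in the covariance and in both standard deviations, it cancels in the ratio, so the correlation does not depend on whether sample or population formulas are used, provided the choice is consistent.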
