
benn.8206.05.

pgs 12/15/06 8:22 AM Page 321

Statistical Reasoning

Statistical thinking will one day be as necessary for efficient citizenship as the
ability to read and write.

H. G. Wells

Is your drinking water safe? Do most people approve of the President's tax plan? How
much is the cost of health care rising? These questions and thousands more like them
can be answered only through statistical studies. Indeed, statistical information
appears in the news every day, making the ability to understand and reason with
statistics crucial to modern life.

UNIT 5A
Fundamentals of Statistics: We discuss how statistical studies are conducted, with
emphasis on the importance of sampling.

UNIT 5B
Should You Believe a Statistical Study? We develop eight useful guidelines for
evaluating statistical claims.

UNIT 5C
Statistical Tables and Graphs: We investigate basic tables and graphs, including
frequency tables, bar graphs, pie charts, histograms, and line charts.

UNIT 5D
Graphics in the Media: News media go well beyond the basics with fancy statistical
graphics. We explore common types of media graphics.

UNIT 5E
Correlation and Causality: One of the most important uses of statistics is to identify
cause-and-effect relationships. We investigate how to interpret correlations and how
to decide whether a correlation is the result of causality.


322 CHAPTER 5 Statistical Reasoning

UNIT 5A Fundamentals of Statistics

The subject of statistics plays a major role in modern society. It's used to determine
whether a new drug is effective in treating cancer. It's involved when agricultural
inspectors check the safety of the food supply. It's used in every opinion poll and
survey. In business, it's used for market research. Sports statistics are part of daily
conversation for millions of people. Indeed, you'll be hard-pressed to think of a topic
that is not linked in some way to statistics.
But what is (or are) statistics? There are two answers, because the term statistics can
be either singular or plural. When it is singular, statistics refers to the science of
statistics. The science of statistics helps us collect, organize, and interpret data,
which are numbers or other pieces of information about some topic. When it is plural,
the word statistics refers to the data themselves, especially those that describe or
summarize something. For example, if there are 30 students in your class and they range
in age from 17 to 64, the numbers "30 students," "17 years," and "64 years" are
statistics that describe your class.

By the Way
You'll sometimes hear the word data used as a singular synonym for information, but
technically the word data is plural. One piece of information is called a datum, and
two or more pieces are called data.

TWO DEFINITIONS OF STATISTICS

Statistics is the science of collecting, organizing, and interpreting data.


Statistics are the data that describe or summarize something.

How Statistics Works


Statistical studies are conducted in many different ways and for many different pur-
poses, but they all share a few characteristics. To get the basic ideas, consider the
Nielsen ratings, which are used to estimate the numbers of people watching various
television shows. These ratings are used, for example, to determine the most popular
television show of the week.
Suppose the Nielsen ratings tell you that Lost was last week's most popular show,
with 22 million viewers. You probably know that no one actually counted all 22 million
people. But you may be surprised to learn that the Nielsen ratings are based on
the television-viewing habits of people in only 5000 homes. To understand how
Nielsen can draw a conclusion about millions of Americans from 5000 homes, we
need to investigate the principles behind statistical research.
Nielsen's goal is to draw conclusions about the viewing habits of all Americans. In
the language of statistics, we say that Nielsen is interested in the population of all
Americans. The characteristics of this population that Nielsen seeks to learn, such
as the number of people watching each television show, are called population
parameters. Note that, although we usually think of a population as a group of people,
in statistics a population can be any kind of group: people, animals, or things.
For example, in a study of college costs, the population might be all colleges and
universities, and the population parameters might include prices for tuition, fees,
and housing.

HISTORICAL NOTE
Statistics originated with the collection of census and tax data, which are affairs
of state. That is why the word state is at the root of the word statistics.

Nielsen seeks to learn about the population of all Americans by studying a much
smaller sample of Americans in depth. More specifically, Nielsen has devices (called
"people meters") attached to televisions in 5000 homes, so the people who live in
these homes make up the sample of Americans that Nielsen studies. The individual
measurements that Nielsen collects from the sample, such as who is watching each
show at each time, constitute the raw data. Nielsen then consolidates these raw data
into a set of numbers that characterize the sample, such as the percentage of young
male viewers watching Lost. These numbers are called sample statistics.

By the Way
Arthur C. Nielsen founded his company and invented market research in 1923. He
began producing ratings for radio programs in 1942 and added television ratings in
the 1960s. Nielsen's "people meters," attached to all the televisions in 5000 homes,
tell the company when each television is on and what show is being watched. People
in the homes are supposed to push buttons that tell Nielsen who is watching each
television. Nielsen can thereby determine the breakdown of viewership by age, sex,
and ethnicity, as well as total viewing numbers.

DEFINITIONS

The population in a statistical study is the complete set of people or things being
studied. The sample is the subset of the population from which the raw data are
actually obtained.

Population parameters are specific characteristics of the population that a
statistical study is designed to estimate. Sample statistics are numbers or
observations that summarize the raw data.

EXAMPLE 1 Population and Sample

For each of the following cases, describe the population, sample, population parame-
ters, and sample statistics.
a. Agricultural inspectors for Jefferson County measure the levels of residue
from three common pesticides on 25 ears of corn from each of the 104 corn-
producing farms in the county.
b. Anthropologists determine the average brain size of early Neanderthals in
Europe by studying skulls found at three sites in southern Europe.
SOLUTION

a. The inspectors seek to learn about the population of all ears of corn grown
in the county. They do this by studying a sample that consists of 25 ears
from each farm. The population parameters are the average levels of residue
from the three pesticides on all corn grown in the county. The sample sta-
tistics describe the average levels of residue that are actually measured on
the corn in the sample.
b. The anthropologists seek to learn about the population of all early
Neanderthals in Europe. Specifically, they seek to determine the average brain
size of all Neanderthals, which is the population parameter in this case. The
sample consists of the relatively few individual Neanderthals whose skulls
are found at the three sites. The sample statistic is the average brain size
(skull size) of the individuals in the sample. Now try Exercises 25-30.

The Process of a Statistical Study


Because Nielsen does not study the entire population of all Americans, it cannot actu-
ally measure any population parameters. Instead, the company tries to infer reasonable
values for population parameters from the sample statistics (which it did measure).

The process of inference is simple in principle, though it must be carried out with
great care. For example, suppose Nielsen finds that 7% of the people in its sample
watched Lost. If this sample accurately represents the entire population of all
Americans, then Nielsen can infer that approximately 7% of all Americans watched the
show. In other words, the sample statistic of 7% is used as an estimate for the
population parameter. (By using statistical techniques that we'll discuss in Unit 6D,
Nielsen can also estimate the uncertainty in the inferred population parameters.)
Once Nielsen has estimates of the population parameters, it can draw general
conclusions about what Americans were watching. The process used by Nielsen Media
Research is similar to that used in many statistical studies. Figure 5.1 summarizes the
general relationships among a population, a sample, the sample statistics, and the
population parameters.

By the Way
Statisticians often divide their subject into two major branches. Descriptive
statistics is the branch that deals with describing data in the form of tables,
graphs, or sample statistics. Inferential statistics is the branch that deals with
inferring (or estimating) population characteristics from sample data.

BASIC STEPS IN A STATISTICAL STUDY

1. State the goal of your study precisely. That is, determine the population you
want to study and exactly what you'd like to learn about it.
2. Choose a representative sample from the population.
3. Collect raw data from the sample and summarize these data by finding sample
statistics of interest.
4. Use the sample statistics to infer the population parameters.
5. Draw conclusions: Determine what you learned and whether you achieved your
goal.
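
The five steps can be traced in a short Python simulation. This is only a sketch under stated assumptions: the population is synthetic (a made-up list of one million viewers, 7% of whom watched a show), and the names `population`, `sample`, and `sample_statistic` are ours for illustration, not part of any real ratings system.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Step 1: goal = estimate the fraction of the population that watched a show.
# Hypothetical population: 1 = watched, 0 = did not watch (7% watched).
POPULATION_SIZE = 1_000_000
watchers = 70_000
population = [1] * watchers + [0] * (POPULATION_SIZE - watchers)

# Step 2: choose a simple random sample of 5000 members.
sample = random.sample(population, 5000)

# Step 3: collect raw data (the 0/1 values) and summarize them.
sample_statistic = sum(sample) / len(sample)

# Step 4: use the sample statistic as the estimate of the population parameter.
estimated_parameter = sample_statistic

# Step 5: draw conclusions. We can compare with the true value here
# only because the population is synthetic.
population_parameter = sum(population) / len(population)
print(f"estimate: {estimated_parameter:.3f}, true value: {population_parameter:.3f}")
```

In a real study the last comparison is impossible, because the population parameter is unknown; that is why the quality of the sample matters so much.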

FIGURE 5.1 Elements of a statistical study. [Flowchart: START, 1. Identify goals.
From the POPULATION, 2. Draw from population, giving the SAMPLE. 3. Collect raw data
and summarize, giving SAMPLE STATISTICS. 4. Make inferences about population, giving
POPULATION PARAMETERS. 5. Draw conclusions.]



EXAMPLE 2 Unemployment Survey
Each month, the U.S. Labor Department surveys 60,000 households to determine
characteristics of the U.S. work force. One population parameter of interest is the
U.S. unemployment rate, defined as the percentage of people who are unemployed
among all those who are either employed or actively seeking employment. Describe
how the five basic steps of a statistical study apply to this research.

SOLUTION The steps apply as follows.

Step 1. The goal of the research is to learn about the employment (or
unemployment) within the population of all Americans who are either
employed or actively seeking employment.
Step 2. The Labor Department chooses a sample consisting of people employed
or seeking employment in 60,000 households.
Step 3. The Labor Department asks questions of the people in the sample, and
their responses constitute the raw data for the research. The Department
then consolidates these data into sample statistics, such as the percentage
of people in the sample who are unemployed.
Step 4. Based on the sample statistics, the Labor Department makes estimates of
the corresponding population parameters, such as the unemployment
rate for the entire United States.
Step 5. The Labor Department draws conclusions based on the population
parameters and other information. For example, it might use the current
and past unemployment rates to draw conclusions about whether jobs
have been created or lost. Now try Exercises 31-36.

By the Way
According to the Labor Department, someone who is not working is not necessarily
unemployed. For example, stay-at-home moms and dads are not counted among the
unemployed unless they are actively trying to find a job, and people who had been
trying to find work but gave up in frustration are not counted as unemployed.
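
The definition of the unemployment rate in Example 2 can be written as a small calculation. The sketch below is illustrative only: the tallies are invented, not Labor Department figures, and the function name `unemployment_rate` is our own.

```python
def unemployment_rate(employed, unemployed_seeking):
    """Percentage unemployed among the labor force (employed + actively seeking)."""
    labor_force = employed + unemployed_seeking
    if labor_force == 0:
        raise ValueError("labor force is empty")
    return 100 * unemployed_seeking / labor_force

# Hypothetical survey tallies (not real data).
employed = 94_000
unemployed_seeking = 6_000
not_in_labor_force = 50_000  # not working and not seeking work: deliberately excluded

rate = unemployment_rate(employed, unemployed_seeking)
print(f"sample unemployment rate: {rate:.1f}%")  # prints 6.0%
```

Note that people who are neither employed nor actively seeking work never enter the calculation, which matches the definition given in the example.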

Choosing a Sample
Choosing a sample may be the most important step in any statistical study. If the
sample fairly represents the population as a whole, then it's reasonable to make
inferences from the sample to the population. But if the sample is not representative,
then there's little hope of drawing accurate conclusions about the population.
Suppose you want to determine the average height and weight of students at a
large university by measuring the heights and weights of a sample of 100 students. A
sample consisting only of members of the football and basketball teams would not be
reliable, because these athletes tend to be larger than most students. In contrast, sup-
pose you select your sample with a computer program that randomly draws student
numbers from the entire university population. In this case, the 100 students in your
sample are likely to be representative of the entire student body. You can therefore
expect that the average height and weight of students in the sample are reasonable
estimates of the averages for all students.
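
A quick simulation illustrates the contrast between the two samples. This is a sketch with invented numbers: the student body, the height distributions, and the variable names are all assumptions made for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical student body: 9,900 non-athletes and 100 team athletes (heights in cm).
non_athletes = [random.gauss(170, 8) for _ in range(9_900)]
athletes = [random.gauss(190, 8) for _ in range(100)]
students = non_athletes + athletes

true_mean = statistics.mean(students)

# Unrepresentative sample: only members of the teams.
team_sample_mean = statistics.mean(athletes)

# Simple random sample of 100 students drawn from the whole student body.
random_sample_mean = statistics.mean(random.sample(students, 100))

print(f"all students:  {true_mean:.1f} cm")
print(f"team sample:   {team_sample_mean:.1f} cm")
print(f"random sample: {random_sample_mean:.1f} cm")
```

Under these assumptions the team-only average lands far above the true average, while the random sample's average falls close to it.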

DEFINITION

A representative sample is a sample in which the relevant characteristics of the


sample members match those of the population.

Now try Exercises 37-38.



A sample drawn with a computer program that selects students at random is an
example of a simple random sample. More technically, simple random sampling
means that every sample of a particular size has the same chance of being selected. In
the case of the student sample, every set of 100 students has an equal chance of being
selected by the computer program.
Simple random sampling is usually the best way to choose a representative sample.
However, it is not always practical or necessary, so other sampling techniques are
sometimes used. The following box summarizes four of the most common sampling
techniques, and Figure 5.2 illustrates the ideas.

COMMON SAMPLING METHODS

Simple random sampling: We choose a sample of items in such a way that every
sample of a given size has an equal chance of being selected.
Systematic sampling: We use a simple system to choose the sample, such as
selecting every 10th or every 50th member of the population.
Convenience sampling: We use a sample that is convenient to select, such as peo-
ple who happen to be in the same classroom.
Stratified sampling: We use this method when we are concerned about differences
among subgroups, or strata, within a population. We first identify the subgroups
and then draw a simple random sample within each subgroup. The total
sample consists of all the samples from the individual subgroups.
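
The four methods in the box above can be sketched in a few lines of Python. The population here is hypothetical (1000 student IDs tagged with a class year), and the sample sizes are arbitrary choices made for illustration.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1000 student IDs, each tagged with a class year (stratum).
years = ["first-year", "sophomore", "junior", "senior"]
population = [(student_id, years[student_id % 4]) for student_id in range(1000)]

# Simple random sampling: every set of 40 members is equally likely.
simple_random = random.sample(population, 40)

# Systematic sampling: every 25th member, starting from a random offset.
start = random.randrange(25)
systematic = population[start::25]

# Convenience sampling: whoever is easiest to reach (here, the first 40 on the list).
convenience = population[:40]

# Stratified sampling: a simple random sample of 10 within each class year.
stratified = []
for year in years:
    stratum = [member for member in population if member[1] == year]
    stratified.extend(random.sample(stratum, 10))

for name, chosen in [("simple random", simple_random), ("systematic", systematic),
                     ("convenience", convenience), ("stratified", stratified)]:
    print(f"{name}: {len(chosen)} members")
```

Each method yields 40 members, but only the stratified sample guarantees exactly 10 from every class year, and the convenience sample reflects nothing more than the order of the list.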

FIGURE 5.2 Common sampling techniques. [Four panels. Simple Random Sampling: Every
sample of the same size has an equal chance of being selected. Computers are often
used to generate random telephone numbers. Convenience Sampling: Use results that are
readily available (illustrated by a passerby being asked, "Hey! Do you support the
death penalty?"). Systematic Sampling: Select every kth member. Stratified Sampling:
Partition the population into at least two strata, then draw a sample from each.]

Regardless of what type of sampling is used, always keep the following two key
ideas in mind:
No matter how a sample is chosen, the study can be successful only if the sample
is representative of the population.
Even if a sample is chosen in the best possible way, it is still just a sample (as
opposed to the entire population). Thus, we can never be sure that a sample is rep-
resentative of the population. In general, a larger sample is more likely to be rep-
resentative of the population, as long as it is chosen well.

EXAMPLE 3 Sampling Methods
Identify the type of sampling used in each of the following cases, and comment on
whether the sample is likely to be representative of the population.
a. You are conducting a survey of students in a dormitory. You choose your
sample by knocking on the door of every 10th room.
b. To survey opinions on a possible property tax increase, a research firm
randomly draws the addresses of 150 homeowners from a public list of all
homeowners.
c. Agricultural inspectors for Jefferson County check the levels of residue from
three common pesticides on 25 ears of corn from each of the 104 corn-producing
farms in the county.
d. Anthropologists determine the average brain size of early Neanderthals in
Europe by studying skulls found at three sites in southern Europe.
SOLUTION
a. Choosing every 10th room makes this a systematic sample. The sample may
be representative, as long as students were randomly assigned to rooms.
b. The records presumably list all homeowners, so drawing randomly from
this list produces a simple random sample. It has a good chance of being
representative of the population.
c. Each farm may have different pesticide use, so the inspectors consider corn
from each farm as a subgroup (stratum) of the full population. By checking
25 ears of corn from each of the 104 farms, the inspectors are using stratified
sampling. If the ears are collected randomly on each farm, each set of
25 is likely to be representative of its farm.
d. By studying skulls found at selected sites, the anthropologists are using a
convenience sample. They have little choice, because only a few skulls
remain from the many Neanderthals who once lived in Europe. However, it
seems reasonable to assume that these skulls are representative of the larger
population. Now try Exercises 39-44.

By the Way
Neanderthals lived between about 100,000 and 30,000 years ago in Eurasia and northern
Africa. They were physiologically distinct from modern humans, but scientists are not
yet sure whether they represented a separate species or could interbreed with Homo
sapiens. Neanderthals developed many aspects of culture, including caring for the
sick and burying their dead. Skull measurements suggest that Neanderthals had larger
brains than modern humans.

Watching Out for Bias


Consider a study designed to estimate the average weight of all men at a college. As we
discussed earlier, a sample consisting only of football players would not be representa-
tive of the population with respect to weight. We say that this sample is biased because
the men in the sample differ in a critical way from typical men at the college. More
generally, the term bias refers to any problem in the design or conduct of a statistical
study that tends to favor certain results.

Besides occurring in a poorly chosen sample, bias can arise in many other ways.
For example, a researcher may be biased if he or she has a personal stake in the
outcome of the study. In that case, the researcher might distort (intentionally or
unintentionally) the true meaning of the data. You should always be on the lookout
for any type of bias that may affect the results or interpretation of a statistical
study. We'll discuss sources of bias further in Unit 5B.

DEFINITION

A statistical study suffers from bias if its design or conduct tends to favor certain
results.

Time out to think


Thinking about issues of bias, explain why television networks use Nielsen to measure
ratings rather than doing it themselves.

Types of Statistical Study


Broadly speaking, most statistical studies fall into one of two categories: observational
studies and experiments. Nielsen's studies of television viewing are observational
because they are designed to observe the television-viewing behavior of the people in
its 5000 sample homes. Note that observational studies may still involve some
interaction. For example, an opinion poll is observational, even though researchers may
conduct in-depth interviews, because the poll's goal is to learn (observe) people's
opinions, not to change them. Similarly, a study in which individuals in the sample are
weighed is also observational, because the measurement process records (observes)
but does not change a person's weight.
In contrast, consider a medical study designed to test whether large doses of vita-
min C can help prevent colds. To conduct this study, the researchers must ask some
people in the sample to take large doses of vitamin C. This type of statistical study is
called an experiment, because some participants receive a treatment (in this case,
vitamin C) that they would not otherwise receive.

TWO BASIC TYPES OF STATISTICAL STUDY

1. In an observational study, researchers observe or measure characteristics of the
sample members but do not attempt to influence or modify these characteristics.
2. In an experiment, researchers apply a treatment to some or all of the sample
members and then look to see whether the treatment has any effects.

It is difficult to determine whether an experimental treatment works unless you
compare groups that receive the treatment to groups that don't. In the vitamin C
study, for example, researchers might create two groups of people: a treatment
group that takes large doses of vitamin C and a control group that does not take
vitamin C. The researchers can then look for differences in the numbers of colds
among people in the two groups. Having a control group is usually crucial to
interpreting the results of experiments.
In an experiment, it is very important for the treatment and control groups to be
alike in all respects except for the treatment. For example, if the treatment group
consisted of active people with good diets and the control group consisted of sedentary
people with poor diets, we could not attribute any differences in colds to vitamin C
alone. To avoid this type of problem, assignments to the control and treatment groups
must be done randomly.

With proper treatment, a cold can be cured in a week. Left to itself, it may linger
for seven days.
A MEDICAL FOLK SAYING

TREATMENT AND CONTROL GROUPS

The treatment group in an experiment is the group of sample members who
receive the treatment being tested.
The control group in an experiment is the group of sample members who do not
receive the treatment being tested.
It is important for the treatment and control groups to be selected randomly and
to be alike in all respects except for the treatment.
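
Random assignment can be sketched as a shuffle followed by a split. The code below is a minimal illustration, assuming a hypothetical list of participant IDs; the function name `assign_groups` is ours.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{n:03d}" for n in range(1, 101)]  # hypothetical participant IDs
treatment, control = assign_groups(participants, seed=4)
print(len(treatment), len(control))  # prints 50 50
```

Because chance alone decides who lands in each group, traits such as diet or activity level tend to be spread similarly across both groups, which is what makes the comparison fair.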

The Placebo Effect and Blinding


For experiments involving people, using a treatment and a control group might not
be enough to get reliable results. The problem is that people can be affected by their
beliefs as well as by real treatments. For example, stress and other psychological
factors have been shown to affect resistance to colds. If people taking vitamin C get
fewer colds than people who don't, we can't conclude that the vitamin C was
responsible. It might be that people stayed healthier because they believed that
vitamin C works. Therefore, people in the control group should be given a placebo
(in this case, pills that look like vitamin C pills but don't actually contain
vitamin C). As long as the participants don't know whether they are in the treatment
or control group (that is, whether they got the real pills or the placebo), any effect
arising from psychological factors, known as a placebo effect, should affect both
groups equally. Then, if people in the vitamin C group get fewer colds than people in
the control group, we have evidence that vitamin C really works.

By the Way
The placebo effect can be surprisingly powerful. Consider a drug now used to combat
balding, which was tested on balding men. The drug maker was pleased to learn that
86% of the men receiving the drug either stopped balding or grew new hair. But
remarkably, so did 42% of the men who received the placebo! In other studies, as many
as 75% of participants receiving a placebo have actually improved.

DEFINITIONS

A placebo lacks the active ingredients of a treatment being tested in a study, but is
identical in appearance to the treatment. Thus, study participants cannot
distinguish the placebo from the real treatment.

The placebo effect refers to the situation in which patients improve simply
because they believe they are receiving a useful treatment.

In statistical terminology, the practice of keeping people in the dark about who is
in the treatment group and who is in the control group is called blinding. A
single-blind experiment is one in which the participants don't know which group they
belong to, but the experimenters (the people administering the treatment) do know.
Using a placebo is one way to create a single-blind experiment. Sometimes, a
single-blind experiment can still be unreliable if the experimenters can subtly
influence outcomes. For example, in an experiment that involves interviews, the
experimenters might speak differently to people who received the real treatment than to
those who received the placebo. This type of problem can be avoided by making the
experiment double-blind, which means neither the participants nor the experimenters
know who belongs to each group. (Of course, someone must keep track of
the two groups in order to evaluate the results at the end. In typical double-blind
experiments, researchers hire experimenters to make any necessary contact with the
participants.)

BLINDING IN EXPERIMENTS

An experiment is single-blind if the participants do not know whether they are
members of the treatment group or members of the control group, but the
experimenters do know.
An experiment is double-blind if neither the participants nor the experimenters
(people administering the treatment) know who belongs to the treatment group
and who belongs to the control group.

EXAMPLE 4 What's Wrong with This Experiment?

For each of the experiments described below, identify any problems and explain how
the problems could have been avoided.
a. A chiropractor wants to know if his adjustments relieve back pain. He
performs adjustments on 25 patients with back pain. Afterward, 18 of the
patients say they feel better. He concludes that the adjustments are an
effective treatment.
b. A new drug for attention deficit disorder (ADD) is supposed to make the
affected children more polite. Randomly selected children suffering from
ADD are divided into treatment and control groups. Those in the control
group receive a placebo that looks just like the real drug. The experiment is
single-blind. Experimenters interview the children one-on-one to decide
whether they became more polite.
SOLUTION

a. The 25 patients who receive adjustments represent a treatment group, but
this study lacks a control group. The patients may be feeling better because
of a placebo effect rather than any real effect of the adjustments. The
chiropractor might have improved his study by hiring an actor to do a fake
adjustment (one that feels like a real manipulation, but doesn't actually
conform to chiropractic guidelines) on a control group. Then he could have
compared the results in the two groups to see whether a placebo effect was
involved.

[Cartoon: DILBERT reprinted by permission of United Feature Syndicate, Inc.]

b. Because the experimenters know which children received the real drug, dur-
ing the interviews they may inadvertently speak differently or interpret
behavior differently with these children. In that case, their conclusions
might not be valid. The experiment should have been double-blind, so that
the experimenters conducting the interviews would not have known which
children received the real drug and which children received the placebo.
Now try Exercises 45-50.

Case-Control Studies
Sometimes it may be impractical or unethical to conduct an experiment. For example,
suppose we want to study how alcohol consumed during pregnancy affects newborn
babies. Because it is already known that alcohol can be harmful during pregnancy, it
would be unethical to divide a sample of pregnant mothers randomly into two groups
and then force the members of one group to consume alcohol. However, we may be
able to conduct a case-control study, in which the participants naturally form groups
by choice. In this example, the cases consist of mothers who consume alcohol during
pregnancy by choice, and the controls consist of mothers who choose not to consume
alcohol.
A case-control study is observational because the researchers do not change the
behavior of the participants. But it also resembles an experiment because the cases
effectively represent a treatment group and the controls represent a control group.

DEFINITIONS

A case-control study is an observational study that resembles an experiment
because the sample naturally divides into two (or more) groups. The participants
who engage in the behavior under study form the cases, which makes them like a
treatment group in an experiment. The participants who do not engage in the
behavior are the controls, making them like a control group in an experiment.

EXAMPLE 5 Which Type of Study?

For each of the following questions, what type of statistical study is most likely to lead
to an answer? Why?
a. What is the average income of stock brokers?
b. Do seat belts save lives?
c. Can lifting weights improve runners' times in a 10-kilometer race?
d. Can a new herbal remedy reduce the severity of colds?
SOLUTION

a. An observational study can tell us the average income of stock brokers. We
need only survey (observe) the brokers.
b. It would be unethical to do an experiment in which some people were told
to wear seat belts and others were told not to wear them. Instead, we can
conduct an observational case-control study. Some people choose to wear seat
belts (the cases) and others choose not to wear them (the controls). By
comparing the death rates in accidents between cases and controls, we can learn
whether seat belts save lives. (They do.)
c. We need an experiment to determine whether lifting weights can improve
runners' 10K times. One group of runners will be put on a weight-lifting
program, and a control group will be asked to stay away from weights. We
must try to ensure that all other aspects of their training are similar. Then
we can see whether the runners in the lifting group improve their times
more than those in the control group. Note that we cannot use blinding in
this experiment because there is no way to prevent participants from knowing
whether they are lifting weights.
d. We should use a double-blind experiment, in which some participants get the
actual remedy while others get a placebo. We need double-blind conditions
because the severity of a cold may be affected by mood or other factors
that experimenters might inadvertently influence.
Now try Exercises 51-56.

Surveys and Opinion Polls


Surveys and opinion polls may be the most common types of statistical study, and we
must be very careful when we interpret them. Fortunately, survey and poll results
usually include something called the margin of error.
Suppose a poll finds that 76% of the public supports the President, with a margin
of error of 3 percentage points. The 76% is a sample statistic; that is, 76% of the
people in a sample said they support the President. The margin of error helps us
understand how well this sample statistic is likely to approximate the true population
parameter (in this case, the percentage of all Americans who support the President).
By adding and subtracting the margin of error from the sample statistic, we find a
range of values, or a confidence interval, likely to contain the population parameter.
In this case, we add and subtract 3 percentage points to find a confidence interval
from 73% to 79%.

By the Way
Politicians and marketers often pretend they are trying to conduct a true opinion poll
or survey when, in fact, they are deliberately trying to get particular results. These
types of surveys are called "push polls" because they try to push people's opinions.

DEFINITION

The margin of error in a statistical study is used to describe a confidence interval that is likely to contain the true population parameter. We find this interval by subtracting and adding the margin of error from the sample statistic obtained in the study. That is, the confidence interval is
from (sample statistic − margin of error)
to (sample statistic + margin of error)
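
As a minimal sketch of the definition above, the short Python function below subtracts and adds the margin of error to obtain the interval. The function name `confidence_interval` is our own choice for illustration, and both inputs are assumed to be expressed in percentage points.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high), in the same units as the inputs."""
    if margin_of_error < 0:
        raise ValueError("margin of error must be non-negative")
    low = sample_statistic - margin_of_error
    high = sample_statistic + margin_of_error
    return (low, high)


# The presidential-support poll from the text: 76% with a 3-point margin.
low, high = confidence_interval(76, 3)
print(f"from {low}% to {high}%")  # from 73% to 79%
```

The same call works for any poll that reports a sample statistic together with its margin of error.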

How confident can we be in a poll result? Unless we are told otherwise, we assume that the margin of error is defined to give us 95% confidence that the confidence interval contains the population parameter. We'll discuss the precise meaning of 95% confidence in Unit 6D, but for now you can think of it as follows: If the poll were repeated 20 times with 20 different samples, we would expect about 19 of the 20 polls (that is, 95% of the polls) to have a confidence interval that contains the true population parameter.
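
One way to get a feel for this idea is to simulate many repeated polls on a computer and count how often the confidence interval contains the true value. The sketch below rests on assumptions that go beyond this unit: the true level of support (52%), the sample size (1000), and the number of repeated polls (2000) are invented for illustration, and the margin of error is computed with a common approximation, 1.96 × sqrt(0.25 / n), which is not derived in this text.

```python
import math
import random


def simulate_polls(true_share=0.52, sample_size=1000, num_polls=2000, seed=1):
    """Return the fraction of simulated polls whose interval contains true_share."""
    rng = random.Random(seed)
    # Assumed (approximate, slightly conservative) 95% margin of error.
    margin = 1.96 * math.sqrt(0.25 / sample_size)
    hits = 0
    for _ in range(num_polls):
        # Each simulated respondent says "yes" with probability true_share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        sample_share = yes / sample_size
        if sample_share - margin <= true_share <= sample_share + margin:
            hits += 1
    return hits / num_polls


coverage = simulate_polls()
print(f"Fraction of intervals containing the true value: {coverage:.3f}")
```

Under these assumptions the printed fraction should come out close to 0.95, which is the "19 out of 20" idea described above.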

EXAMPLE 6 Close Election
An election eve poll finds that 52% of surveyed voters plan to vote for Smith, and she
needs a majority (more than 50%) to win without a runoff. The margin of error in the
poll is 3 percentage points. Will she win?

SOLUTION We subtract and add the margin of error of 3 percentage points to find a confidence interval

from 52% − 3% = 49% to 52% + 3% = 55%

We can be 95% confident that the actual percentage of people planning to vote for her is between 49% and 55%. Because this confidence interval leaves open the possibility of both a majority and less than a majority, this election is too close to call.
Now try Exercises 57–60.
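
The reasoning in Example 6 can be sketched as a small decision rule: compare both ends of the confidence interval with the share needed to win. The function name `call_race` and its three labels are hypothetical, chosen only for this illustration, and the rule assumes that a candidate needs strictly more than the threshold.

```python
def call_race(poll_share, margin_of_error, threshold=50.0):
    """Classify a poll result against a winning threshold (percentage points)."""
    low = poll_share - margin_of_error
    high = poll_share + margin_of_error
    if low > threshold:
        return "likely majority"      # whole interval lies above the threshold
    if high <= threshold:
        return "likely no majority"   # whole interval lies at or below it
    return "too close to call"        # the interval straddles the threshold


print(call_race(52, 3))  # too close to call
```

The first two labels still carry only the confidence level of the poll (95% unless stated otherwise), so they describe what is likely rather than what is certain.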

Time out to think


In Example 6, suppose the poll found the candidate had 55% of the vote. Should
she be confident of a win?

EXERCISES 5A

QUICK QUIZ
Choose the best answer to each of the following questions. Explain your reasoning with one or more complete sentences.

1. You conduct a poll in which you randomly select 1000 registered voters from Texas and ask if they approve of the job their governor is doing. The population for this study is
a. all registered voters in the state of Texas.
b. the 1000 people that you interview.
c. the governor of Texas.

2. Results of the poll described in Exercise 1 would most likely suffer from bias if you chose the participants from
a. all registered voters in Texas.
b. all people with a Texas driver's license.
c. people who donated money to the governor's campaign.

3. When we say that a sample is representative of the population, we mean that
a. the results found for the sample are similar to those we would find for the entire population.
b. the sample is very large.
c. the sample was chosen in the best possible way.

4. Consider an experiment designed to see whether cash incentives improve school attendance. The researcher chooses two groups of 100 high school students. She offers one group $10 for every week of perfect attendance. She tells the other group that they are part of an experiment but does not give them any incentive. The students who do not receive an incentive represent
a. the treatment group.
b. the control group.
c. the observation group.

5. The experiment described in Exercise 4 is
a. single-blind.
b. double-blind.
c. not blind.

6. The purpose of a placebo is
a. to prevent participants from knowing whether they belong to the treatment group or the control group.
b. to distinguish between the cases and the controls in a case-control study.
c. to determine whether diseases can be cured without any treatment.

7. If we see a placebo effect in an experiment to test a new treatment designed to cure warts, we know that
a. the experiment was not properly double-blind.
b. the experimental groups were too small.
c. warts were cured among members of the control group.

8. An experiment is single-blind if
a. it lacks a treatment group.
b. it lacks a control group.
c. the participants do not know whether they belong to the treatment or control group.

9. Poll X predicts that Powell will receive 49% of the vote, while Poll Y predicts that he will receive 53% of the vote. Both polls have a margin of error of 3 percentage points. What can you conclude?
a. One of the two polls must have been conducted poorly.
b. The two polls are consistent with each other.
c. Powell will receive 51% of the vote.

10. A survey reveals that 12% of Americans believe Elvis is still alive, with a margin of error of 4 percentage points. The confidence interval for this poll is
a. from 10% to 14%.
b. from 8% to 16%.
c. from 4% to 20%.

REVIEW QUESTIONS
11. Why do we say that the term statistics has two meanings? Describe both meanings.

12. Define the terms population, sample, population parameter, and sample statistics as they apply to statistical studies.

13. Describe the five basic steps in a statistical study, and give an example of their application.

14. Why is it so important that a statistical study use a representative sample? Briefly describe four common sampling methods.

15. What is bias? How can it affect a statistical study? Give examples of several forms of bias.

16. Describe and contrast observational studies and experiments. What do we mean by the treatment group and control group in an experiment? What do we mean by the cases and controls in an observational case-control study?

17. What is a placebo? Describe the placebo effect and how it can make experiments difficult to interpret. How can making an experiment single-blind or double-blind help?

18. What is meant by the margin of error in a survey or opinion poll? How is it used to identify a confidence interval?

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning.

19. In my experimental study, I used a sample that was larger than the population.

20. I followed all the guidelines for sample selection carefully, yet my sample still did not reflect the characteristics of the population.

21. I wanted to test the effects of vitamin C on colds, so I gave the treatment group vitamin C and gave the control group vitamin D.

22. I don't believe the results of the experiment, because the results were based on interviews but the study was not double-blind.

23. The pre-election poll found that Kennedy would get 58% of the vote, with a margin of error of 4%, but he ended up losing the election.

24. By choosing my sample carefully, I can make a good estimate of the average height of Americans by measuring the heights of only 500 people.

BASIC SKILLS & CONCEPTS

Population and Sample. For the studies described in Exercises 25–30, describe the population, sample, population parameters, and sample statistics.

25. In order to gauge public opinion on how to handle Iran's growing nuclear program, the Pew Research Center surveyed 1001 Americans by telephone.

26. Astronomers typically determine the distance to a galaxy (a galaxy is a huge collection of billions of stars) by measuring the distances to just a few stars within it and taking the mean (average) of these distance measurements.

27. In a USA Today Internet poll, readers responded voluntarily to the question "Do you consume at least one caffeinated beverage every day?"

28. The Gallup Organization conducted a poll of 1003 Americans in its household panel who plan to take a summer vacation to determine what percentage of people plan to cancel their summer vacation because of the increase in gasoline prices.

29. Harris Interactive surveyed 2435 U.S. adults nationwide and asked them to rate the quality of American public schools.

30. The American Institute of Education conducts an annual study of attitudes of incoming college students by surveying approximately 261,000 first-year students at 462 colleges and universities. There are approximately 1.6 million first-year college students in this country.

Steps in a Study. Describe how you would apply the five basic steps of a statistical study to the issues in Exercises 31–36.

31. You want to determine the average number of hours per day students at a middle school spend listening to iPods.

32. As an airline marketing executive, you want to know if there has been an increase in frustration with air travel among business travelers.

33. You want to know the percentage of male college students in America who do Sudoku puzzles at least once per week.

34. You want to know the typical percentage of the bill that is left as a tip in restaurants.

35. You want to know the average lifetime of windshield wipers on cars made in Japan.

36. You want to know the percentage of high school students who are vegetarians.

37. Representative Sample? You want to determine the mean (average) number of hours spent studying each week by high school girls. Which of the following samples is most likely to be representative, and why? Also explain why each of the other choices is not likely to make a representative sample for this study.
The girls' track team
The girls in an advanced placement calculus course
The girls in the cast of the current theater production
The first 50 girls you meet in the school cafeteria

38. Representative Sample? You want to determine the typical dietary habits of students at a college. Which of the following would make the best sample, and why? Also explain why each of the other choices would not make a good sample for this study.
Students in a single dormitory
Students majoring in public health
Students who participate in intercollegiate sports
Students enrolled in a required mathematics class

Identify the Sampling Method. Exercises 39–44 each describe a sample. Identify the sampling method as simple random sampling, systematic sampling, convenience sampling, or

stratified sampling. Briefly explain why you think this sampling method was chosen.

39. An IRS (Internal Revenue Service) auditor randomly selects for audits 30 taxpayers in each of the filing status categories: single, head of household, married filing jointly, and married filing separately.

40. People magazine chooses its 25 most beautiful women by looking at responses from readers who voluntarily mail in a survey printed in the magazine.

41. A study of the use of antidepressants selects 50 participants whose ages are between 20 and 29, 50 participants whose ages are between 30 and 39, and 50 participants whose ages are between 40 and 49.

42. Every 100th computer chip that is produced is given a reliability test.

43. A computer randomly selects 400 names from a list of all registered voters. Those selected are surveyed to predict who will win the election for Mayor.

44. A taste test for chips and salsa is given at the entrance to a supermarket.

Type of Study. For Exercises 45–50, state whether the study is an observational study or an experiment. If it is an experiment, describe the treatment and control groups and discuss whether single- or double-blinding is needed. If it is observational, state whether it is a case-control study and, if it is, distinguish between the cases and the controls.

45. A study at the University of Southern California separated 108 volunteers into groups, based on psychological tests designed to determine how often they lied and cheated. Those with a tendency to lie had different brain structures than those who did not lie (British Journal of Psychiatry).

46. A National Cancer Institute study of 716 melanoma patients and 1014 cancer-free patients matched by age, sex, and race found that those having a single large mole had twice the risk of melanoma. Having 10 or more moles was associated with a 12 times greater risk of melanoma (Journal of the American Medical Association).

47. In a study done at Boston University, researchers took snapshots of 4000 white adults every four years for 30 years and determined that 9 of 10 men and 7 of 10 women will eventually become overweight (Annals of Internal Medicine).

48. A breast cancer study began by asking 25,624 women questions about how they spent their leisure time. The health of these women was tracked over the next 15 years. Those women who said they exercise regularly were found to have a lower incidence of breast cancer (New England Journal of Medicine).

49. A (hypothetical) study of 45 swimmers found that those who were placed on a weight-training regimen in addition to daily swimming workouts improved their times by 3.5%.

50. A survey of 275,811 first-year college students revealed that 32.4% of these students had an A average in high school (Higher Education Research Institute).

Which Type of Study? For each of the questions in Exercises 51–56, what type of statistical study is most likely to lead to an answer? Why?

51. How many hours per week does the average public school teacher work?

52. What is the percentage of American voters who favor a constitutional amendment banning gay marriages?

53. Do teenagers with a diet high in dairy products have a higher incidence of acne?

54. Do drivers of the same model car get better mileage with high-ethanol fuel?

55. Does a multi-vitamin a day reduce the incidence of strokes?

56. Are the Sunday horoscopes in a local newspaper more accurate than the weekday horoscopes?

Margin of Error. Each of Exercises 57–60 states both a sample statistic and a margin of error. Find the confidence interval in each case, and answer any additional questions asked. Be sure to explain your answers clearly.

57. A poll is conducted the day before a state election for Senator. There are only two candidates running. The poll shows that 53% of the voters surveyed favor the Republican candidate, with a margin of error of 2.5 percentage points. Should the Republican plan a victory party? Why or why not?

58. A poll is conducted the day before an election for U.S. Representative. There are only two candidates running. The poll shows that 48.5% of the voters surveyed favor the Democratic candidate, with a margin of error of 2.0 percentage points. Based on this poll, should the Democratic candidate expect to lose the election? Why or why not?

59. Of 133 adult Americans surveyed in a Gallup poll who said their vacation plans had changed because of high gasoline prices, 58% said they had changed their destination or shortened their trip. With a margin of error of 9.0 percentage points, can you say that a majority of Americans changed their destination or shortened their trip?

60. In a survey of 1002 people, 701 (which is 70%) said that they voted in the most recent presidential election (based

on data from ICR Research Group). The margin of error for the survey was 3 percentage points. However, actual voting records show that only 61% of all eligible voters actually did vote. Does this necessarily imply that people lied when they answered the survey?

FURTHER APPLICATIONS

Experiment Results. Consider an experiment designed to determine the effectiveness of a new drug. The drug is given to participants in the treatment group, while participants in the control group receive a placebo. For each set of results described in Exercises 61–64, discuss whether there appears to be evidence that the treatment is effective.

61. 70% of those in the treatment group showed improvement; 30% of those in the placebo group showed improvement.

62. 45% of those in the treatment group showed improvement; 45% of those in the placebo group showed improvement.

63. 90% of those in the treatment group showed improvement; 50% of those in the placebo group showed improvement.

64. 25% of those in the treatment group showed improvement; 50% of those in the placebo group showed improvement.

Interpreting Real Studies. For each of Exercises 65–70, do the following:
a. Identify the population and the population parameter of interest.
b. Briefly describe the sample and sample statistic for the study.
c. Find the confidence interval likely to contain the population parameter of interest.

65. In a TIME/CNN poll, 748 adults were asked whether they believed their children would have a higher standard of living than they have; 63% of those polled said yes. The margin of error was 3.7 percentage points.

66. A Gallup poll of 1002 American adults determined that 81% of those surveyed believed that the state of moral values in the country overall was getting worse. The margin of error was 3.2 percentage points.

67. Based on its survey of 60,000 households (see Example 2), the U.S. Labor Department reported an unemployment rate of 6.4% in June 2003. The margin of error for the report was 0.2 percentage point.

68. The Pew Research Center asked 1546 adult Americans whether humans would land on Mars within the next 50 years; 76% of these people said either "definitely yes" or "probably yes." The margin of error for the poll was 2.5 percentage points.

69. A Fox News opinion poll asked 900 registered voters, "Do you personally think the government is listening to your phone conversations?" Thirty percent of those surveyed responded "yes" and 58% responded "no." The margin of error was 3.0 percentage points.

70. A Roper Organization survey of 2000 adults revealed that 64% of those surveyed kept money in a regular savings account. The margin of error for the survey was 2.2 percentage points.

WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

71. Current Nielsen Ratings. Find the Nielsen ratings for the past week. What were the three most popular television shows? Explain both the rating and the share for each show.

72. Nielsen Sample. Use information available on the Nielsen Media Research Web site to answer each of the following questions.
a. How does Nielsen select the sample of homes to be included in a viewer survey?
b. Describe a few ways by which Nielsen attempts to check that the results from its people meter surveys are accurate.
c. Based on what you have learned, do you think the Nielsen ratings are reliable? If so, why? If not, why not?

73. Attitude Update. The Pew Research Center for the People and the Press studies public attitudes toward the press, politics, and public policy issues. Go to its Web site and find the latest survey about attitudes. Write a one-page summary of what Pew surveyed, how it conducted the survey, and what it found.

74. Labor Statistics. Use the Bureau of Labor Statistics Web page to learn about its monthly survey. Choose one aspect of the survey, such as how the sample is chosen or how it is used to compare unemployment rates over time. Write a short summary of what you learn.

75. Professional Polling. Visit the Web site of a national polling organization and report on a recent poll. Write a short description of the poll and its results, commenting on features such as sampling technique, sample size, and margin of error.

IN THE NEWS
76. Statistics in the News. Select three news stories from the past week that involve statistics in some way. In each case, write one or two paragraphs describing the role of statistics in the story.

77. Statistics in Your Major. Write two to three paragraphs describing the ways in which you think the science of statistics is important in your major field of study. (If you have not chosen a major, answer this question for a major that you are considering.)

78. Statistics in Sports. Choose a sport and describe three different statistics commonly tracked by participants in or spectators of the sport. In each case, briefly describe the importance of the statistic to the sport.

79. Sample and Population. Find a report in today's news concerning any type of statistical study. What is the population being studied? What is the sample? Why do you think the sample was chosen as it was?

80. Poor Sampling. In a recent newspaper or magazine, find an article about a study that attempts to describe some characteristic of a population, but that you believe involved poor sampling (for example, a sample that was too small or unrepresentative of the population under study). Describe the population, the sample, and what you think was wrong with the sample. Briefly discuss how you think the poor sampling affected the study results.

81. Good Sampling. In a recent newspaper or magazine, find an article that describes a statistical study in which the sample was well chosen. Describe the population, the sample, and why you think the sample was a good one.

82. Margin of Error. Find a report of a recent survey or poll. Interpret the sample statistic and margin of error quoted for the survey or poll.

UNIT 5B Should You Believe a Statistical Study?

Most statistical research is carried out with integrity and care. Nevertheless, statistical
research is sufficiently complex that bias can arise in many different ways. We should
always examine reports of statistical research carefully, looking for anything that
might make us question the results. In this unit, we discuss eight guidelines that can
help you answer the question "Should I believe a statistical study?"

Guideline 1: Identify the Goal, Population, and Type of Study
Before evaluating the details of a statistical study, we must know what it is about.
Based on what you hear or read, try to answer basic questions such as these:

What was the goal of the study?
What was the population under study? Was the population clearly and appropriately defined?
What type of study was used? Was the type appropriate for the goal?

If you can't find reasonable answers to these questions, it will be difficult to evaluate other aspects of the study.

EXAMPLE 1 Appropriate Type of Study?
A newspaper reports: "Researchers gave each of the 100 participants their astrological horoscopes, and asked them whether the horoscopes appeared to be accurate. Eighty-five percent of the participants reported that the horoscopes were accurate. The researchers concluded that horoscopes are valid most of the time." Analyze this study according to Guideline 1.

SOLUTION The goal of the study was to determine the validity of horoscopes. Based on the news report, it appears that the study was observational: The researchers simply asked the participants about the accuracy of the horoscopes. However, because the accuracy of a horoscope is somewhat subjective, this study should have been a controlled experiment in which some people were given their actual horoscope and others were given a fake horoscope. Then the researchers could have looked for differences between the two groups. Moreover, because researchers could easily influence the results by how they questioned the participants, the experiment should have been double-blind. In summary, the type of study was inappropriate to the goal and its results are meaningless.
Now try Exercises 19–20.

By the Way
Surveys show that nearly half of Americans believe their horoscopes. However, in controlled experiments, the predictions of horoscopes come true no more often than would be expected by chance.

Guideline 2: Consider the Source


Statistical studies are supposed to be objective, but the people who carry them out and
fund them may be biased. Thus, it is important to consider the source of a study and
evaluate the potential for biases that might invalidate its conclusions.

EXAMPLE 2 Is Smoking Healthy?
By 1963, enough research on the health dangers of smoking had accumulated that the Surgeon General of the United States publicly announced that smoking is bad for health. Research done since that time has built further support for this claim. However, while the vast majority of studies show that smoking is unhealthy, a few studies found no dangers from smoking, and perhaps even health benefits. These studies generally were carried out by the Tobacco Research Institute, funded by the tobacco companies. Analyze the Tobacco Research Institute studies according to Guideline 2.

SOLUTION Tobacco companies had a financial interest in minimizing the dangers of smoking. Because the studies carried out at the Tobacco Research Institute were funded by the tobacco companies, there may have been pressure on the researchers to produce results to the companies' liking. This potential for bias does not mean their research was biased, but the fact that it contradicts virtually all other research on the subject should be cause for concern.
Now try Exercises 21–22.

Copyright 1998, 2004 by Sidney Harris.

Guideline 3: Look for Bias in the Sample


Look for bias that may prevent the sample from being representative of the population. There are two particularly common forms of bias that can affect sample selection.

BIAS IN CHOOSING A SAMPLE

Selection bias occurs whenever researchers select their sample in a way that tends to make it unrepresentative of the population. For example, a pre-election poll that surveys only registered Republicans has selection bias because it is unlikely to reflect the opinions of all voters.

Participation bias occurs primarily with surveys and polls; it arises whenever people choose whether to participate. Because people who feel strongly about an issue are more likely to participate, their opinions may not represent the larger population that is less emotionally attached to the issue. (Surveys or polls in which people choose whether to participate are often called self-selected or voluntary response surveys.)

By the Way
After decades of arguing to the contrary, in 1999 the Philip Morris Company, the world's largest seller of tobacco products, publicly acknowledged that smoking causes lung cancer, heart disease, emphysema, and other serious diseases. Shortly thereafter, Philip Morris changed its name to Altria.

CASE STUDY The 1936 Literary Digest Poll

The Literary Digest, a popular magazine of the 1930s, successfully predicted the outcomes of several elections using large polls. In 1936, editors of the Literary Digest conducted a particularly large poll in advance of the presidential election. They randomly chose a sample of 10 million people from various lists, including names in telephone books and rosters of country clubs. They mailed a postcard ballot to each of these 10 million people. About 2.4 million people returned the postcard ballots. Based on the returned ballots, the editors of the Literary Digest predicted that Alf Landon would win the presidency by a margin of 57% to 43% over Franklin Roosevelt. Instead, Roosevelt won with 62% of the popular vote. How did such a large survey go so wrong?
The sample suffered from both selection bias and participation bias. The selection bias arose because the Literary Digest chose its 10 million names in ways that favored affluent people. For example, selecting names from telephone books meant choosing only from those who could afford telephones back in 1936. Similarly, country club members are usually quite wealthy. The selection bias favored Landon because he was the Republican, and affluent voters of the 1930s tended to vote for Republican candidates.
The participation bias arose because return of the postcard ballots was voluntary. People who felt most strongly about the election were more likely to be among those who returned their postcard ballots. This bias also tended to favor Landon because he was the challenger: people who did not like President Roosevelt could express their desire for change by returning the postcards. Together, the two forms of bias made the sample results useless, despite the large number of people surveyed.

HISTORICAL NOTE
A young pollster named George Gallup conducted his own survey prior to the 1936 election. Sending postcards to only 3000 randomly selected people, he correctly predicted not only the outcome of the election, but also the outcome of the Literary Digest poll to within 1%. Gallup went on to establish a very successful polling organization.
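
The lesson of the Literary Digest case study, that a large biased sample can be worse than a small random one, can be illustrated with a short simulation. This is only a sketch under invented assumptions: the population size, the share of affluent voters (30%), and the support rates for Roosevelt in each group (35% and 74%) are hypothetical numbers chosen for illustration, not historical data. The sketch models selection bias only, not participation bias.

```python
import random


def build_population(size, rng):
    """Build voters as (is_affluent, supports_roosevelt); all rates are invented."""
    voters = []
    for _ in range(size):
        affluent = rng.random() < 0.30              # hypothetical share of affluent voters
        support_rate = 0.35 if affluent else 0.74   # hypothetical support rates
        voters.append((affluent, rng.random() < support_rate))
    return voters


def share(voters):
    """Fraction of the given voters who support Roosevelt."""
    return sum(supports for _, supports in voters) / len(voters)


rng = random.Random(1936)  # fixed seed so the run is repeatable
population = build_population(200_000, rng)

# Large sample with selection bias: drawn only from affluent voters.
affluent_only = [voter for voter in population if voter[0]]
big_biased = rng.sample(affluent_only, 20_000)

# Small simple random sample drawn from the whole population.
small_random = rng.sample(population, 3_000)

true_share = share(population)
biased_estimate = share(big_biased)
random_estimate = share(small_random)

print(f"True support in the simulated population: {true_share:.3f}")
print(f"Estimate from 20,000 affluent voters:      {biased_estimate:.3f}")
print(f"Estimate from 3,000 random voters:         {random_estimate:.3f}")
```

In this simulated population the small random sample should land close to the true level of support, while the much larger affluent-only sample misses badly. Adding more people to a biased sample does not remove the bias.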

EXAMPLE 3 Self-Selected Poll

The television show Nightline conducted a poll in which viewers were asked whether the United Nations headquarters should be kept in the United States. Viewers could respond to the poll by paying 50 cents to call a 900 phone number with their opinions. The poll drew 186,000 responses, of which 67% favored moving the United Nations out of the United States. Around the same time, a poll using simple random sampling of 500 people found that 72% wanted the United Nations to stay in the United States. Which poll is more likely to be representative of the general opinions of Americans?

SOLUTION The Nightline sample suffered from severe participation bias. Not only did viewers choose whether to call in for the survey, but they had to pay to participate. This cost made it even more likely that respondents would be those who felt a need for change. Thus, despite its large number of respondents, the Nightline survey was too biased to be trusted. In contrast, a simple random sample of 500 people is quite likely to be representative, so the finding of this small survey has a better chance of representing the true opinions of all Americans.
Now try Exercises 23–24.

By the Way
More than a third of all Americans routinely shut the door or hang up the phone when contacted for a survey, thereby making self-selection a problem for legitimate pollsters. One reason people hang up may be the proliferation of selling under the guise of market research (often called sugging), in which a telemarketer pretends you are part of a survey in order to get you to buy something.

Guideline 4: Look for Problems in Defining or Measuring the Variables of Interest
Statistical studies usually attempt to measure something, and we call the things being
measured the variables of interest in the study. The term variable simply refers to an
item or quantity that can vary or take on different values. For example, variables in
the Nielsen ratings include show being watched and number of viewers.

DEFINITION

A variable is any item or quantity that can vary or take on different values. The
variables of interest in a statistical study are the items or quantities that the study
seeks to measure.

Results of a statistical study may be especially difficult to interpret if the variables under study are difficult to define or measure. For example, imagine trying to conduct a study of how exercise affects resting heart rates. The variables of interest would be amount of exercise and resting heart rate. However, both variables are difficult to define and measure. In the case of amount of exercise, it's not clear what the definition covers: Does it include walking to class? Even if we specify the definition, how can we measure amount of exercise given that some forms of exercise are more vigorous than others? The following two examples describe real cases in which defining or measuring variables caused problems in statistical studies.

Time out to think


How would you measure your resting heart rate? Describe some difficulties in defin-
ing and measuring resting heart rate.

EXAMPLE 4 Can Money Buy Love?

A Roper poll reported in USA Today involved a survey of the wealthiest 1% of Americans. The survey found that these people would pay an average of $487,000 for true love, $407,000 for great intellect, $285,000 for talent, and $259,000 for eternal youth. Analyze this result according to Guideline 4.
SOLUTION The variables in this study are very difficult to define. How, for example, do you define true love? And does it mean true love for a day, a lifetime, or something else? Similarly, does the ability to balance a spoon on your nose constitute talent? Because the variables are so poorly defined, it's likely that different people interpreted them differently, making the results very difficult to interpret.
Now try Exercise 25.

EXAMPLE 5 Illegal Drug Supply

Law enforcement authorities try to stop illegal drugs from entering the country. A
commonly quoted statistic is that they succeed in stopping only about 10% to 20% of
the drugs entering the United States. Should you believe this statistic?
SOLUTION There are essentially two variables in the study: quantity of illegal drugs
intercepted and quantity of illegal drugs NOT intercepted. It should be relatively easy to
measure the quantity of illegal drugs that law enforcement officials intercept. However,
because the drugs are illegal, it's unlikely that anyone is reporting the quantity of
drugs that are not intercepted. How, then, can anyone know that the intercepted
drugs are 10% to 20% of the total? In a New York Times analysis, a police officer was
quoted as saying that his colleagues refer to this type of statistic as "P.F.A.," for
"pulled from the air." Now try Exercise 26.

Guideline 5: Watch Out for Confounding Variables
Variables that are not intended to be part of the study can sometimes make it difficult
to interpret results properly. Such variables are often called confounding variables,
because they confound (confuse) a study's results.
It's not always easy to discover confounding variables. Sometimes they are discovered
years after a study was completed, and sometimes they are not discovered at all.
Fortunately, confounding variables are sometimes more obvious and can be discovered
simply by thinking hard about factors that may have influenced a study's
results.

By the Way
Many hardware stores sell simple kits that you can use to test whether radon gas is
accumulating in your home. If it is, the problem can be eliminated by installing an
appropriate radon mitigation system, which usually consists of a fan that blows the
radon out from under the house before it can get in.

EXAMPLE 6 Radon and Lung Cancer

Radon is a radioactive gas produced by natural processes (the decay of uranium) in the
ground. The gas can leach into buildings through the foundation and can accumulate
in relatively high concentrations if doors and windows are closed. Imagine a study
that seeks to determine whether radon gas causes lung cancer by comparing the lung
cancer rate in Colorado, where radon gas is fairly common, with the lung cancer rate
in Hong Kong, where radon gas is less common. Suppose the study finds that the
lung cancer rates are nearly the same. Is it fair to conclude that radon is not a
significant cause of lung cancer?

SOLUTION The variables under study are amount of radon and lung cancer rate. However,
because smoking can also cause lung cancer, smoking rate may be a confounding
variable in this study. In particular, the smoking rate in Hong Kong is much higher
than the smoking rate in Colorado, so any conclusions about radon and lung cancer
must take the smoking rate into account. In fact, careful studies have shown that
radon gas can cause lung cancer, and the U.S. Environmental Protection Agency
(EPA) recommends taking steps to prevent radon from building up indoors.
Now try Exercises 27–28.

Guideline 6: Consider the Setting and Wording in Surveys
Even when a survey is conducted with proper sampling and with clearly defined terms
and questions, it's important to watch out for problems in the setting or wording that
might produce inaccurate or dishonest responses. Dishonest responses are particularly
likely when the survey concerns sensitive subjects, such as personal habits or
income. For example, the question "Do you cheat on your income taxes?" is unlikely
to elicit honest answers from those who cheat, especially if the setting does not
guarantee complete confidentiality.
In other cases, even honest answers may not be accurate if the wording of questions
invites bias. Sometimes just the order of the words in a question can affect the
outcome. A poll conducted in Germany asked the following two questions:

Would you say that traffic contributes more or less to air pollution than industry?
Would you say that industry contributes more or less to air pollution than traffic?

With the first question, 45% answered traffic and 32% answered industry. With the
second question, only 24% answered traffic while 57% answered industry. Thus, simply
changing the order of the words traffic and industry dramatically changed the survey
results.

By the Way
People are more likely to choose the item that comes first in a survey because of what
psychologists call the availability error, the tendency to make judgments based on what
is available in the mind. Professional polling organizations must be very careful to
avoid this problem, sometimes by posing the question to some people in one order and to
others in the opposite order.
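
The size of the word-order effect in the German poll can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the four percentages quoted above; the variable names are our own, and the "neither answer" remainder is simply what is left over from 100%, not a figure reported by the poll.

```python
# Percentages reported for the two question orders in the German poll.
traffic_first = {"traffic": 45, "industry": 32}   # question naming traffic first
industry_first = {"traffic": 24, "industry": 57}  # question naming industry first

# Change, in percentage points, when the word order is reversed.
shifts = {
    answer: industry_first[answer] - traffic_first[answer]
    for answer in ("traffic", "industry")
}
for answer, shift in shifts.items():
    print(f"{answer}: {traffic_first[answer]}% -> {industry_first[answer]}% ({shift:+d} points)")

# Share of respondents who gave neither answer (remainder of 100%).
neither_first = 100 - sum(traffic_first.values())
neither_second = 100 - sum(industry_first.values())
print(f"neither answer: {neither_first}% vs. {neither_second}%")
```

A shift of more than 20 percentage points in each answer is far larger than the share of respondents who gave neither answer, which is why the order of the words matters here.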

EXAMPLE 7 Do You Want a Tax Cut?

The Republican National Committee commissioned a poll to find out whether
Americans supported a tax-cut proposal. Asked whether they favored the tax cut,
67% of respondents answered yes. Should we conclude that Americans supported the
proposal?

SOLUTION A question like "Do you favor a tax cut?" is biased because it does not
give other options (much like the fallacy of limited choice discussed in Unit 1A). In fact,
an independent poll conducted at the same time gave respondents a list of options for
using surplus revenues. This poll found that 31% wanted the money devoted to Social
Security, 26% wanted it used to reduce the national debt, and only 18% favored using
it for a tax cut. (The remaining 25% of respondents chose a variety of other options.)
Now try Exercises 29–30.

Guideline 7: Check That Results Are Presented Fairly


Even when a statistical study is done well, it may be misrepresented in graphs or con-
cluding statements. Researchers may occasionally misinterpret the results of their
own studies or jump to conclusions that are not supported by the results, particularly
when they have personal biases toward certain interpretations. In other cases, news
reporters or others may misinterpret a survey or jump to unwarranted conclusions
that make a story seem more spectacular. Misleading graphs are an especially com-
mon problem (see Unit 5D). In general, you should look for inconsistencies between
the interpretation of a study (in pictures and words) and any actual data given with it.

EXAMPLE 8 Does the School Board Need a Statistics Lesson?

The school board in Boulder, Colorado, created a hubbub when it announced that
28% of Boulder school children were reading "below grade level," and hence concluded
that methods of teaching reading needed to be changed. The announcement
was based on reading tests on which 28% of Boulder school children scored below the
national average for their grade. Do these data support the board's conclusion?
SOLUTION The fact that 28% of Boulder children scored below the national average
for their grade implies that 72% scored at or above the national average. Thus,
the school board's ominous statement about students reading "below grade level"
makes sense only if "grade level" means the national average score for a particular
grade. This interpretation of "grade level" is curious because it means that half the
students in the nation are always below grade level, no matter how high the scores.
The conclusion that teaching methods needed to be changed was not justified by
these data. Now try Exercises 31–32.
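
The claim that half the nation is always "below grade level" under this interpretation can be illustrated with a short Python sketch. It rests on two assumptions that are ours, not the text's: "average" is read as the median, and the ten national scores are invented purely for illustration. With a mean instead of a median, the share below average need not be exactly one half.

```python
from statistics import median

def share_below_median(scores):
    """Return the fraction of scores strictly below the median."""
    cutoff = median(scores)
    return sum(score < cutoff for score in scores) / len(scores)

# Hypothetical national scores, invented purely for illustration.
national = [52, 58, 61, 65, 70, 74, 77, 81, 88, 93]
improved = [score + 20 for score in national]  # every student scores 20 points higher

print(share_below_median(national))
print(share_below_median(improved))

# Boulder: 28% scored below the national average, so the rest scored at or above it.
boulder_at_or_above = 100 - 28
print(boulder_at_or_above)
```

Raising every score by the same amount moves the median by that amount too, so the share of students below it does not change.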

Guideline 8: Stand Back and Consider the Conclusions

Finally, even if a study seems reasonable according to all the previous guidelines, you
should stand back and consider the conclusions. Ask yourself questions such as
these:
Did the study achieve its goals?
Do the conclusions make sense?
Can you rule out alternative explanations for the results?
If the conclusions do make sense, do they have any practical significance?

Extraordinary claims require extraordinary evidence.
CARL SAGAN (1934–1996)

EXAMPLE 9 Practical Significance
An experiment is conducted in which the weight losses of people who try a new "Fast
Diet Supplement" are compared to the weight losses of a control group of people who
try to lose weight in other ways. After eight weeks, the results show that the treatment
group lost an average of 1/2 pound more than the control group. Assuming that it has
no dangerous side effects, does this study suggest that the Fast Diet Supplement is a
good treatment for people wanting to lose weight?
SOLUTION Compared to the average person's body weight, the difference of 1/2 pound
hardly matters at all. Thus, while the statistics in this case may be interesting, they
don't seem to have much practical significance. Now try Exercises 33–36.
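
To put the half-pound difference from Example 9 in perspective, the following minimal Python sketch expresses it as a share of body weight. The 150-pound body weight is an assumed, illustrative figure that does not appear in the example.

```python
extra_loss_lb = 0.5           # extra weight lost by the treatment group (Example 9)
assumed_body_weight_lb = 150  # hypothetical body weight, assumed for illustration only

# Express the extra loss as a fraction of the assumed body weight.
share = extra_loss_lb / assumed_body_weight_lb
print(f"{share:.2%} of body weight")
```

Under this assumption the extra loss is well under 1% of body weight, which supports the conclusion that the result has little practical significance.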

SUMMARY Eight Guidelines for Evaluating a Statistical Study

1. Identify the goal of the study, the population considered, and the type of study.
2. Consider the source, particularly with regard to whether the researchers may be
biased.
3. Look for bias that may prevent the sample from being representative of the
population.
4. Look for problems in defining or measuring the variables of interest, which can
make it difficult to interpret results.
5. Watch out for confounding variables that can invalidate the conclusions of a
study.
6. Consider the setting and the wording of questions in any survey, looking for
anything that might tend to produce inaccurate or dishonest responses.
7. Check that results are presented fairly in graphs and concluding statements,
since both researchers and media often create misleading graphics or jump to
conclusions that the results do not support.
8. Stand back and consider the conclusions. Did the study achieve its goals? Do the
conclusions make sense? Do the results have any practical significance?

EXERCISES 5B

QUICK QUIZ
Choose the best answer to each of the following questions.
Explain your reasoning with one or more complete sentences.

1. You read about an issue that was subject to an observational study when clearly it
should have been studied with a double-blind experiment. The results from the
observational study are therefore
a. still valid, but a little less reliable.
b. valid, but only if you first correct for the fact that the wrong type of study was done.
c. essentially meaningless.

2. A study conducted by the oil company Exxon Mobil shows that there was no lasting
damage from a large oil spill in Alaska. This conclusion
a. is definitely invalid, because the study was biased.
b. may be correct, but the potential for bias means that you should look very closely
at how the conclusion was reached.
c. could be correct if it falls within the confidence interval of the study.

3. Consider a study designed to learn about the social networks of all college freshmen,
in which the researchers randomly interviewed students living in on-campus dormitories.
The way this sample was chosen means the study will suffer from
a. selection bias.
b. participation bias.
c. confounding variables.

4. The show American Idol selects winners based on votes cast by anyone who wants to
vote. This means that the winner
a. is the person most Americans want to win.
b. may or may not be the person most Americans want to win, because the voting is
subject to participation bias.
c. may or may not be the person most Americans want to win, because the voting should
have been double-blind.

5. Consider an experiment in which you measure the weights of 6-year-olds. The variable
of interest in this study is
a. the size of the sample.
b. the weights of 6-year-olds.
c. the ages of the children under study.

6. Consider a survey in which 1000 people are asked "How often do you go to the
dentist?" The variable of interest in this study is
a. the number of visits to the dentist.
b. the 1000-person size of the sample.
c. the integers 0 through 5.

7. Imagine a survey of randomly selected people found that people who used sunscreen
were more likely to have been sunburned in the past year. Which explanation for this
result seems most likely?
a. Sunscreen is useless.
b. The people in the study all used sunscreen that had passed its expiration date.
c. People who use sunscreen are more likely to spend time in the sun.

8. You want to know whether people prefer Smith or Jones for mayor, and you are
considering two possible ways to word the question. Wording X is "Do you prefer Smith or
Jones for mayor?" Wording Y is "Do you prefer Jones or Smith for mayor?" (That is, the
names are reversed in the two wordings.) The best approach is to
a. use Wording X for everyone.
b. use the same wording for everyone; it doesn't matter whether it is Wording X or
Wording Y.
c. use Wording X for half the people and Wording Y for the other half.

9. A self-selected survey is one in which
a. the people being surveyed decide which question to answer.
b. people decide for themselves whether to be part of the survey.
c. the people who design the survey are also the survey participants.

10. If a statistical study is carefully conducted in every possible way, then
a. its results must be correct.
b. we can have confidence in its results, but it is still possible that they are not
correct.
c. we say that the study is perfectly biased.

REVIEW QUESTIONS

11. Briefly describe each of the eight guidelines for evaluating statistical studies.
Give an example to which each guideline applies.

12. Describe and contrast selection bias and participation bias in sampling. Give an
example of each.

13. What do we mean by variables of interest in a study?

14. What are confounding variables, and what problems can they cause?

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does
not make sense (or is clearly false). Explain your reasoning.

15. The TV survey got more than 1 million phone-in responses, so it is clearly more valid
than the survey by the professional pollsters, which involved interviews with only
a few hundred people.

16. The survey of religious beliefs suffered from selection bias because the
questionnaires were handed out only at Catholic churches.

17. My experiment proved beyond a doubt that vitamin C can reduce the severity of colds,
because I controlled the experiment carefully for every possible confounding variable.

18. Everyone who jogs for exercise should try the new training regimen, because careful
studies suggest it can increase your speed by 1%.

BASIC SKILLS & CONCEPTS
Would You Believe This Study? Exercises 19–30 each describe some aspect of a statistical
study. Based solely on the information given in each case, decide whether you have any
reason to doubt the results of the study. Explain your reasoning.

19. Researchers who want to assess the quality of school lunches in American elementary
schools visit a school in Topeka, Kansas.

20. An experimental, double-blind study finds that people who eat more fast food are more
likely to feel tired throughout the day.

21. The staff at the conservative Heritage Foundation conducted a study to find out what
people think of the new Democratic tax plan.

22. A study financed by a major pharmaceutical company finds that its new drug is no more
effective against high blood pressure than older, less expensive drugs.

23. A TV talk show host asks the TV audience, "Do you support a national speed limit of
55 mph?" and asks people to vote by telephone at a toll-free number.

24. In trying to determine whether their candidate for governor has a chance of defeating
the incumbent Democrat, the Republican Party conducts a survey of 1000 of its
members, selected at random.

25. A study claims to have found that Europeans lead more fulfilling lives than Americans.

26. A government study finds, based on people who had their tax returns audited, that 15%
of taxpayers understate their income.

27. In a study designed to determine whether people who wear helmets while riding a
bicycle have fewer accidents, researchers tracked 500 riders with helmets for one month.

28. A study seeks to learn about obesity among children. The researchers monitor the
eating and exercise habits of the children in the study, carefully recording everything
they eat and all their activity.

29. A consumer pollster for soft drinks asked customers in a supermarket, "Do you prefer
Zinger sodas or some other brand?"

30. To gauge public opinion on whether there should be a constitutional amendment to ban
flag burning, a survey asked people, "Do you support the American flag?"

Would You Believe This Claim? Exercises 31–36 each describe a claim based on a
statistical study. Based solely on the information given in each case, decide whether you
have any reason to doubt the claim. Explain your reasoning.

31. A study involving 200 long-distance runners claimed that a new energy drink is
preferable for all athletes.

32. Citing statistical data indicating that half the children in the school district are
of above average weight, the School Board claims to have proved that new exercise classes
should be mandated for everyone.

33. The U.S. Census Bureau claims that a larger proportion of U.S. residents than ever
have earned high school and college diplomas.

34. Based on data showing that a new cold treatment can shorten the average duration of a
cold from 7 days to 6.8 days, the company that sells the treatment claims that
everyone should use it.

35. A study of 20 nations (in the Canadian Medical Association Journal) discovered that
Germany has the most mean annual visits to a doctor (8.5), while Finland has the
fewest (3.2).

36. Researchers, monitoring the health of 200 people who take at least two pills per day,
claim that people who take pills regularly have better health.

FURTHER APPLICATIONS
Bias. Exercises 37–44 present situations in which bias may be an issue. Describe one
potential source of bias in the situation, and briefly discuss whether the bias should
affect your view of the situation.

37. People visiting the Web site SaveTheAnimals.com can vote on whether or not euthanasia
of prairie dogs is acceptable.

38. Market researchers conduct a survey at a supermarket on a weekday between 10:00 a.m.
and noon to determine what fraction of customers use coupons.

39. An exit poll designed to predict the winner of a local election uses interviews with
everyone who votes between 7:00 and 7:30 a.m.

40. An exit poll designed to predict the winner of a national election uses interviews
with randomly selected voters in New York.

41. In order to determine the opinions of people in the 18- to 24-year age group on
controlling illegal immigration, researchers survey a random sample of 1000 National
Guard members in this age group.

42. A college mails survey forms to all current seniors, asking for the students' choice
of their all-time best and worst professor. Students are asked to return the survey in
the campus mail.

43. Planned Parenthood members are surveyed to determine whether American adults prefer
abstinence, counseling and education, or morning-after pills for high school students.

44. Scientists working for Greenpeace (which opposes genetically engineered crops)
conduct a study to determine whether Monsanto's new, genetically engineered soybean
poses any threat to the environment.

45. It's All in the Wording. Princeton Survey Research Associates did a study for
Newsweek magazine illustrating the effects of wording in a survey. Two questions were
asked:
Do you personally believe that abortion is wrong?
Whatever your own personal view of abortion, do you favor or oppose a woman in this
country having the choice to have an abortion with the advice of her doctor?
To the first question, 57% of the respondents replied yes, while 36% responded no. In
response to the second question, 69% of the respondents favored the choice, while
24% opposed the choice. Discuss why the two questions produced seemingly contradictory
results. How could the results of the questions be used selectively by various
groups?

46. Tax or Spend? A Gallup poll asked the following two questions:
Do you favor a tax cut or "increased spending on other government programs"?
Result: 75% for tax cut.
Do you favor a tax cut or "spending to fund new retirement savings accounts, as well as
increased spending on education, defense, Medicare and other programs"?
Result: 60% for the spending.
Discuss why the two questions produced seemingly contradictory results. How could the
results of the questions be used selectively by various groups?

Stat-Bytes. Politicians must make their political statements (often called sound-bytes)
very short because the attention span of listeners is so short. A similar effect occurs
in reporting statistical news. Major statistical studies are often reduced to one
or two sentences. The summaries of statistical reports in Exercises 47–52 are taken from
various news sources. Discuss what crucial information is missing and what more you would
want to know before you acted on the report.

47. The Atlantic, summarizing a Federal Highway Administration report, says that the
worst traffic bottleneck in the United States is the U.S. 101/I-405 interchange, which
generates 27,144 hours of delay every year.

48. CNN reports on a Zagat Survey of America's Top Restaurants which found that only nine
restaurants achieved a rare 29 out of a possible 30 rating and none of those
restaurants is in the Big Apple.

49. USA Today reports that two-thirds of adults say that cell phone use during a dinner
for two at a nice restaurant is unacceptable.

50. Only 2% of the estates of Americans who died in the past year paid estate taxes,
while 60% of Americans favor repealing estate taxes.

51. Time Magazine reports that 28% of Americans polled believe the Bible is literally
true, down from 38% in 1976.

52. Thirty percent of newborns in India would qualify for intensive care if they were
born in the United States.

Accurate Headlines? Exercises 53–55 give a headline and a brief description of the
statistical news story that accompanied the headline. In each case, discuss whether the
headline accurately represents the story.

53. Headline: "Drugs shown in 98 percent of movies"
Story summary: A "government study" claims that drug use, drinking, or smoking was
depicted in 98% of the top movie rentals (Associated Press).

54. Headline: "Sex more important than jobs"
Story summary: A survey found that 82% of 500 people interviewed by phone ranked a
satisfying sex life as important or very important, while 79% ranked job satisfaction
as important or very important (Associated Press).

55. Headline: "Grape juice may fight disease"
Story summary: A study of 15 people, partially funded by Welch Foods, found that grape
juice helps to expand blood vessels and increase the levels of HDL cholesterol. Both
constricted blood vessels and low HDL levels are risk factors for heart disease
(Milwaukee Journal Sentinel).

56. Exercise and Dementia. A recent study in the Annals of Internal Medicine was
summarized by the Associated Press, in part, as follows:
The study followed 1740 people aged 65 and older who showed no signs of dementia at the
outset. The participants' health was evaluated every two years for six years. Out of the
original pool, 1185 were later found to be free of dementia, 77 percent of whom reported
exercising three or more times a week; 158 people showed signs of dementia, only 67
percent of whom said they exercised that much. The rest either died or withdrew
from the study.

a. How many people completed the study?

b. Fill in the following two-way table (with numbers of individuals), using the figures
given in the above passage:

               Exercise    No exercise    Total
Dementia
No dementia
Total

c. Draw a Venn diagram with two overlapping circles to illustrate the data.

WEB PROJECTS
Find useful links for Web Projects on the text Web site:
www.aw.com/bennett-briggs

57. Polling Organization. Go to the Web site for a major professional polling
organization. Study results from a recent poll, and evaluate the poll according to the
guidelines in this section.

58. Harper's Index. Go to the Web site for the Harper's Index and study a few of the
recently quoted statistics. Be sure to select the option on the page that allows you to
see the sources for the statistics. Choose three statistics that you find particularly
interesting, and discuss whether, in accord with the guidelines given in this section,
you believe them.

IN THE NEWS

59. Applying the Guidelines. Find a recent newspaper article or television report about a
statistical study on a topic that you find interesting. Write a short report applying
each of the eight guidelines given in this section. (Some of the guidelines may not apply
to the particular study you are analyzing. In that case, explain why the guideline is not
applicable.)

60. Believable Results. Find a recent news report about a statistical study whose results
you believe are meaningful and important. In one page or less, summarize the study
and explain why you find it believable.

61. Unbelievable Results. Find a recent news report about a statistical study whose
results you don't believe are meaningful or important. In one page or less, summarize the
study and why you don't believe its claims.

62. Legal Experts. Find a news report concerning a major ongoing trial. Find out whether
any of the expert witnesses are being paid by either side. Based on what you
learn, describe whether you think the experts are giving biased testimony.

63. Biased Questioning? Find a recent news report of responses to a single question in an
opinion poll. State the exact words of the question and the results of the poll.
Analyze the question and the reported results for potential biases. At the end of your
analysis, state whether you believe the results, and defend your opinion.

UNIT 5C Statistical Tables and Graphs

Whether you look at a newspaper, a corporate annual report, or a government study,
you are almost sure to see tables and graphs of statistical data. Some of these tables
and graphs are simple; others can be quite complex. Some make it easy to understand
the data; others may be confusing or even misleading. In this unit, we'll investigate
some of the basic principles behind tables and graphs, preparing for more complex
graphics in Unit 5D.

Frequency Tables
A teacher makes the following list of the grades she gave to her 25 students on an
essay:
A C C B C D C C F D C C C B B A B D B A A B F C B

This list contains all the raw data, but it isn't easy to read. A better way to display
these data is with a frequency table, a table showing the number of times, or
frequency, that each grade appears (Table 5.1). The five possible grades are called the
categories for the table.

TABLE 5.1
Grade    Frequency
A        4
B        7
C        9
D        3
F        2
Total    25

There are two common variations on the idea of frequency. The relative frequency
for a category expresses its frequency as a fraction or percentage of the total.
For example, 4 of the 25 students received A grades, so the relative frequency for A
grades is 4/25, or 16%. The total relative frequency must always be 1, or 100%.
However, because of rounding, you may sometimes find that the relative frequencies
in a table or chart add up to slightly more or less than 100%.
The cumulative frequency is the number of responses in a particular category
and all preceding categories. For example, the cumulative frequency for grades of C
and above is 20, because 20 students received grades of either A, B, or C.
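
The tally behind Table 5.1 can be reproduced directly from the teacher's list. The following is a minimal Python sketch using the standard library's `Counter`; the variable names are our own choices, not part of the text.

```python
from collections import Counter

# The 25 essay grades, in the order the teacher listed them.
grades = "A C C B C D C C F D C C C B B A B D B A A B F C B".split()

# Counter tallies how many times each grade appears (its frequency).
frequency = Counter(grades)

for grade in "ABCDF":
    print(grade, frequency[grade])
print("Total", sum(frequency.values()))
```

Listing the categories explicitly in the loop keeps the rows in grade order rather than in the order the grades happen to appear in the raw data.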

DEFINITION

A basic frequency table has two columns:
The first column lists all the categories of data.
The second column lists the frequency of each category, which is the number of
times each category appears in the data set.
Additional columns may include relative frequency (frequency expressed as a
fraction or percentage of the total) or cumulative frequency (total of frequencies
for the given category and all previous categories).

EXAMPLE 1 Relative and Cumulative Frequency

Add to Table 5.1 columns showing the relative and cumulative frequencies.
SOLUTION Table 5.2 shows the new columns and calculations.

TABLE 5.2
Grade    Frequency    Relative Frequency    Cumulative Frequency
A        4            4/25 = 16%            4
B        7            7/25 = 28%            7 + 4 = 11
C        9            9/25 = 36%            9 + 7 + 4 = 20
D        3            3/25 = 12%            3 + 9 + 7 + 4 = 23
F        2            2/25 = 8%             2 + 3 + 9 + 7 + 4 = 25
Total    25           1 = 100%              25

Now try Exercises 25–26.
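
The two extra columns of Table 5.2 follow mechanically from the frequencies. Here is a minimal Python sketch of that calculation, assuming the frequencies are already known and listed in grade order; the dictionary layout and names are our own.

```python
from itertools import accumulate

# Frequencies from Table 5.1, listed from A down to F.
frequencies = {"A": 4, "B": 7, "C": 9, "D": 3, "F": 2}
total = sum(frequencies.values())

# Relative frequency: each frequency as a fraction of the total.
relative = {grade: count / total for grade, count in frequencies.items()}

# Cumulative frequency: running total of this category and all previous ones.
cumulative = dict(zip(frequencies, accumulate(frequencies.values())))

for grade in frequencies:
    print(f"{grade}  {frequencies[grade]:>2}  {relative[grade]:>4.0%}  {cumulative[grade]:>2}")
```

The last cumulative value always equals the total, which is a quick check that no category was skipped.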

Time out to think


Briefly explain why the total relative frequency should always be 1, or 100%.

Data Types
Essay grades represent subjective ratings, not actual measurements or counts. We say
that the grade categories are qualitative, because they represent qualities such as bad
or good. In contrast, scores on a multiple-choice exam are quantitative, because they
represent an actual count (or measurement) of the number of correct answers. As
we'll see shortly, distinguishing between qualitative and quantitative data can be
useful in creating tables or graphs.

DATA TYPES

Qualitative data describe qualities or nonnumerical categories.


Quantitative data represent counts or measurements.

EXAMPLE 2 Data Types
Classify each of the following types of data as either qualitative or quantitative.
a. Brand names of shoes in a consumer survey
b. Heights of students
c. Audience ratings of a film on a scale of 1 to 5, where 5 means excellent
SOLUTION
a. Brand names are nonnumerical categories, so they are qualitative data.
b. Heights are measurements, so they are quantitative data.
c. Although the film rating categories involve numbers, the numbers represent
subjective opinions about a film, not counts or measurements. Thus, they
are qualitative data, despite being stated as numbers.
Now try Exercises 27–34.

Time out to think


Give another example in which numbers are used to represent qualitative data
rather than quantitative data.

Binning Data
When we deal with quantitative data categories, it's often useful to group, or bin, the
data into categories that cover a range of possible values. For example, in a table of
income levels, it might be useful to create bins of $0 to $20,000, $20,001 to $40,000,
and so on. In this case, the frequency of each bin is simply the number of people with
incomes in that bin.

EXAMPLE 3 Binned Exam Scores

Consider the following set of 20 scores from a 100-point exam:
76 80 78 76 94 75 98 77 84 88 81 72 91 72 74 86 79 88 72 75
Determine appropriate bins and make a frequency table. Include columns for relative
and cumulative frequency, and interpret the cumulative frequency for this case.
SOLUTION The scores range from 72 to 98. One way to group the data is with 5-point
bins. The first bin represents scores from 95 to 99, the second bin represents scores
from 90 to 94, and so on. Note that there is no overlap between bins. We then count the
frequency (the number of scores) in each bin. For example, only 1 score is in bin 95 to
99 (the high score of 98) and 2 scores are in bin 90 to 94 (the scores of 91 and 94).
Table 5.3 shows the complete frequency table. In this case, we interpret the cumulative
frequency of any bin to be the total number of scores in or above that bin. For
example, the cumulative frequency of 6 for the bin 85 to 89 means that 6 scores are
either between 85 and 89 or higher than 89.

TABLE 5.3 Frequency Table for Binned Exam Scores


Scores      Frequency   Relative Frequency   Cumulative Frequency
95 to 99    1           0.05 = 5%            1
90 to 94    2           0.10 = 10%           3
85 to 89    3           0.15 = 15%           6
80 to 84    3           0.15 = 15%           9
75 to 79    7           0.35 = 35%           16
70 to 74    4           0.20 = 20%           20
Total       20          1.00 = 100%          20

Now try Exercises 35–36.
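
The binning in Example 3 can also be carried out by computer. The following is a minimal Python sketch, not part of the original example; the function names bin_start and frequency_table are hypothetical, and the code assumes whole-number scores and 5-point bins that start at multiples of 5.

```python
from collections import Counter

# The 20 exam scores from Example 3.
scores = [76, 80, 78, 76, 94, 75, 98, 77, 84, 88,
          81, 72, 91, 72, 74, 86, 79, 88, 72, 75]

BIN_WIDTH = 5  # 5-point bins: 70 to 74, 75 to 79, ..., 95 to 99


def bin_start(score, width=BIN_WIDTH):
    """Return the lowest score of the bin that contains `score`."""
    return (score // width) * width


def frequency_table(data, width=BIN_WIDTH):
    """Build rows of (bin label, frequency, relative freq., cumulative freq.).

    Bins are listed from highest to lowest, so the cumulative frequency
    counts the scores in or above each bin, as in Table 5.3.
    """
    counts = Counter(bin_start(x, width) for x in data)
    total = len(data)
    rows = []
    cumulative = 0
    for start in range(max(counts), min(counts) - 1, -width):
        freq = counts.get(start, 0)  # an empty bin has frequency 0
        cumulative += freq
        label = f"{start} to {start + width - 1}"
        rows.append((label, freq, freq / total, cumulative))
    return rows


table = frequency_table(scores)
for label, freq, rel, cum in table:
    print(f"{label:>8}  {freq:>2}  {rel:>5.0%}  {cum:>2}")
```

Each printed row should agree with the corresponding row of Table 5.3. Changing BIN_WIDTH to 10 gives the coarser 10-point bins.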

Bar Graphs and Pie Charts
Bar graphs and pie charts are commonly used to show data when the categories are
qualitative. You are probably familiar with both, but let's review the basic ideas.
Consider the essay grade data in Table 5.1. A bar graph would show each category
with a bar whose length corresponded to its frequency. If you make a bar graph by
hand (as opposed to with a computer), you should measure the bar lengths carefully to
make sure they correctly correspond to the frequencies. In Figure 5.3, for example,
the vertical axis is marked with frequencies 1/2 centimeter apart. Thus, the bar for
A grades is 2 centimeters long, because the frequency of A grades is 4. Note that the
left side of the bar graph in Figure 5.3 is marked with frequency, while the right
side is marked with relative frequency. As you can see, bar graphs make it easy to
display both frequency and relative frequency simultaneously.

[Figure: bar graph titled "Essay Grade Data". Horizontal axis: Grade (A, B, C, D, F).
Left vertical axis: Frequency of grade (0 to 10). Right vertical axis: Relative
frequency (4% to 36%).]
FIGURE 5.3 Bar graph for the essay grade data in Table 5.1.

In contrast, pie charts are used primarily for relative frequencies, because the
total pie must always represent the total relative frequency of 100%. The size of
each wedge is proportional to the relative frequency of the category it represents.
Figure 5.4 shows a pie chart for the essay grade data. To make comparisons easier,
relative frequencies are often written on pie chart wedges.

[Figure: pie chart with wedges A 16%, B 28%, C 36%, D 12%, F 8%.]
FIGURE 5.4 Pie chart for the essay grade data in Table 5.1.
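
If you draw a pie chart by hand, each wedge's central angle is its relative frequency times the 360 degrees of a full circle. The short Python sketch below illustrates that arithmetic using the percentages printed on the wedges of Figure 5.4; the variable names are our own and the angles are not stated in the text.

```python
# Relative frequencies (percent) read from the wedges of Figure 5.4.
relative_frequency = {"A": 16, "B": 28, "C": 36, "D": 12, "F": 8}

# A pie chart must account for 100% of the data.
assert sum(relative_frequency.values()) == 100

# Each wedge gets the same share of the full 360-degree circle.
wedge_angle = {grade: pct / 100 * 360 for grade, pct in relative_frequency.items()}

for grade, angle in wedge_angle.items():
    print(f"{grade}: {relative_frequency[grade]}% -> {angle:.1f} degrees")
```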

Nowadays, most people make graphs with the aid of computers that measure bar
lengths or wedge sizes automatically. However, you must still specify any labels or
axis marks you want on a graph. This labeling is extremely important: Without
proper labels, a graph is meaningless. The following summary lists the important
labels for graphs. Of course, not all labels are necessary in all cases. For example, pie
charts do not require a vertical or horizontal scale. Notice how these rules were
applied in Figure 5.3.

IMPORTANT LABELS FOR GRAPHS

Title/caption: The graph should have a title or caption (or both) that explains
what is being shown and, if applicable, lists the source of the data.
Vertical scale and title: Numbers along the vertical axis should clearly indicate
the scale. The numbers should line up with the tick marks (the marks along the
axis that precisely locate the numerical values). Include a label that describes the
variable shown on the vertical axis.
Horizontal scale and title: The categories should be clearly indicated along the
horizontal axis. (Tick marks may not be necessary for qualitative data, but should
be included for quantitative data.) Include a label that describes the variable shown
on the horizontal axis.
Legend: If multiple data sets are displayed on a single graph, include a legend or
key to identify the individual data sets.

E X A M P L E 4 Carbon Dioxide Emissions


Carbon dioxide is released into the atmosphere primarily by the combustion of fossil
fuels (oil, coal, natural gas). Table 5.4 lists the eight countries that emit the most car-
bon dioxide each year. Make bar graphs for the total emissions and the emissions per
person. Put the bars in descending order of size.

TABLE 5.4 The World's Eight Leading Emitters of Carbon Dioxide

Country           Total Carbon Dioxide Emissions          Per Person Carbon Dioxide Emissions
                  (millions of metric tons of carbon)     (metric tons of carbon)
United States     1582                                    5.4
China             966                                     0.7
Russia            438                                     3.0
Japan             329                                     2.6
India             280                                     0.3
Germany           230                                     2.8
Canada            164                                     5.2
United Kingdom    154                                     2.6

Source: U.S. Department of Energy, based on 2003 emissions.

SOLUTION The categories are the countries. Because country names are qualitative
data, a bar graph is appropriate.
The values for total carbon dioxide emissions go from 154 to 1582 (millions of
tons), so a range of 0 to 1600 makes a good choice for the vertical scale. Each bar's
height corresponds to its data value, and we label the category (country) under the
bar. Figure 5.5a shows the bar graph for total emissions, with bars in order of decreas-
ing height.
The data values for per person emissions range from 0.3 to 5.4 (tons), so a range of
0 to 6 will work for this vertical scale. Figure 5.5b shows the bar graph, again with
bars placed in order of descending height.
HISTORICAL NOTE
A bar graph with the bars in descending order is often called a Pareto chart, after
Italian economist Vilfredo Pareto (1848–1923).

[Figure: two bar graphs with countries along the horizontal axis. (a) "Total CO2
Emissions"; vertical axis: CO2 emissions (millions of metric tons of carbon), 0 to
1500. (b) "Per Person CO2 Emissions"; vertical axis: Per capita CO2 emissions (metric
tons of carbon), 0 to 6.]
FIGURE 5.5 Bar graphs for (a) total carbon dioxide emissions by country and (b) per
person carbon dioxide emissions by country.
Now try Exercises 37–38.
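
Putting bars in descending order is simply a matter of sorting the table by the column being graphed. The following Python sketch shows one way to do this with the data of Table 5.4; the helper names descending and text_bars are hypothetical, and the rows of "#" marks are only a rough stand-in for a real bar graph.

```python
# Table 5.4: (total emissions in millions of metric tons of carbon,
#             per person emissions in metric tons of carbon)
emissions = {
    "United States": (1582, 5.4),
    "China": (966, 0.7),
    "Russia": (438, 3.0),
    "Japan": (329, 2.6),
    "India": (280, 0.3),
    "Germany": (230, 2.8),
    "Canada": (164, 5.2),
    "United Kingdom": (154, 2.6),
}


def descending(column):
    """Return (country, value) pairs sorted from largest to smallest value."""
    pairs = [(country, values[column]) for country, values in emissions.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def text_bars(pairs, units_per_mark):
    """Print one row of '#' marks per country, scaled by units_per_mark."""
    for country, value in pairs:
        print(f"{country:<15}{'#' * round(value / units_per_mark)} {value}")


by_total = descending(0)
by_person = descending(1)

text_bars(by_total, units_per_mark=50)     # one mark per 50 million tons
text_bars(by_person, units_per_mark=0.2)   # one mark per 0.2 ton
```

Because the two columns are sorted independently, the countries appear in a different order in each list, just as they do in the two panels of Figure 5.5.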

Time out to think


Note that the two bar graphs in Figure 5.5 do not show the countries in the same
order. Why not? What can we learn by comparing the two graphs? Explain.

E X A M P L E 5 Simple Pie Chart


Among the registered voters in Rochester County, 25% are Democrats, 25% are
Republicans, and 50% are Independents. Make a pie chart showing the breakdown of
party affiliations in Rochester County.
SOLUTION The wedge sizes should correspond to the relative frequencies. Thus,
the wedges for Republicans and Democrats each occupy one-fourth of the pie, while
the wedge for Independents occupies the remaining half of the pie (Figure 5.6).
Note the importance of clear labeling.

[Figure: pie chart titled "Registered Voters in Rochester County" with wedges
Democrat 25%, Republican 25%, Independent 50%.]
FIGURE 5.6 Party affiliations of registered voters in Rochester County.

Now try Exercises 39–40.

E X A M P L E 6 Student Majors

Figure 5.7 is a pie chart showing planned major areas for first-year college
students. Make a bar graph showing the same data, with the bars in order of
decreasing size. What are the three most popular major areas? Comment on the relative
ease with which this question can be answered with the pie chart and the bar graph.

[Figure: pie chart titled "What Students Expect to Major In" with wedges Undecided
8.3%, Other Fields 9.9%, Technical 2.1%, Arts and Humanities 12.1%, Social Sciences
10.0%, Biological Sciences 6.6%, Professional 11.6%, Business 16.7%, Physical
Sciences 2.6%, Engineering 8.7%, Education 11.0%.]
FIGURE 5.7 Planned major areas for first-year college students.
Source: The Chronicle of Higher Education.

SOLUTION Figure 5.8 shows the bar graph for the data. Note that, because we have only
relative frequency data from the pie chart, we can show only relative frequencies on
the bar graph. This bar graph makes it immediately obvious that the three most
popular major areas are business (16.7%), arts and humanities (12.1%), and
professional (11.6%). (Professional includes fields with professional licensing, such
as architecture, nursing, and pharmacy.) In contrast, it takes a fair amount of study
of the pie chart before we can easily list the three most popular major areas.

[Figure: bar graph titled "What Students Expect to Major In". Vertical axis:
Percentage of students (0 to 18). Bars in descending order: Business, Arts and
Humanities, Professional, Education, Social Sciences, Other, Engineering, Undecided,
Biology, Physical Sciences, Technical.]
FIGURE 5.8 Bar graph for the data in Figure 5.7.
Now try Exercises 41–42.

Time out to think


Example 6 discussed an advantage of a bar graph over a pie chart for showing the
data concerning major areas. Do you think the pie chart has any advantages over
the bar graph? If so, what?

Histograms and Line Charts


For quantitative data categories, the two most common types of graphics are
histograms and line charts. Figure 5.9a shows a histogram for the binned exam data of
Table 5.3. Figure 5.9b shows a line chart for the same data.

[Figure: two panels, each titled "Exam Scores". Horizontal axis: Scores (70 to 100).
Vertical axis: Frequency (0 to 8).]
FIGURE 5.9 (a) Histogram for the data in Table 5.3. (b) Line chart for the same data.

A histogram is essentially a bar graph in which the data categories are quantitative.
Thus, the bars on a histogram must follow the natural order of the numerical
categories. In addition, the widths of histogram bars have a specific meaning. For
example, the width of each bar in Figure 5.9a represents 5 points on the exam.
Because there are no gaps between the categories, the bars on a histogram touch each
other.
A line chart serves the same basic purpose as a histogram, but instead of using
bars, a line chart connects a series of dots. When data are binned, the dot is placed
at the center of each bin. Histograms and line charts are often used to show how some
variable changes with time. For example, the line chart in Figure 5.10 shows how the
U.S. homicide rate has changed with time. The categories are time intervals. In this
case, each bin represents a year in the data. Histograms and line charts with time on
the horizontal axis are often called time-series diagrams.

Technical Note
Different books define the terms histogram and bar graph differently. In this book, a
bar graph is any graph that uses bars, and histograms are bar graphs used for
quantitative data categories.

[Figure: line chart titled "U.S. Homicide Rate". Horizontal axis: Year (1960 to
2004). Vertical axis: Homicides per 100,000 people (0 to 12).]
FIGURE 5.10 U.S. homicide rate per 100,000 people.
Source: FBI Uniform Crime Reports.

DEFINITIONS
A histogram is a bar graph for quantitative data categories. The bars have a natural
order and the bar widths have specific meaning.
A line chart shows the data value for each category as a dot, and the dots are
connected with lines. For each dot, the horizontal position is the center of the bin
it represents and the vertical position is the data value for the bin.
A time-series diagram is a histogram or line chart in which the horizontal axis
represents time.

E X A M P L E 7 Oscar-Winning Actresses
Table 5.5 shows the ages of 34 recent Academy Award-winning actresses at the time
when they won their award. Make a histogram and a line chart to display these data.
Discuss the results.

TABLE 5.5

Age      Number of Actresses
20–29    7
30–39    15
40–49    6
50–59    1
60–69    3
70–79    1
80–89    1

SOLUTION The fact that the categories are 10-year bins makes the data quantitative.
Thus, a histogram is appropriate. Figure 5.11a shows the histogram. The bars touch
one another because there are no gaps between the categories.
Figure 5.11b shows the same data as a line chart. The histogram is also included to
show how it relates to the line chart. In looking at these data, we see that actresses are
most likely to win Oscars when they are fairly young.

[Figure: two panels, each titled "Ages of 34 Academy Award-Winning Actresses".
Horizontal axis: Age at time of award (10 to 90). Vertical axis: Number of actresses
(0 to 20).]
FIGURE 5.11 (a) Histogram for ages of 34 recent Academy Award-winning actresses.
(b) Line chart for the same data, with histogram overlaid for comparison.
Now try Exercises 43–44.
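
To turn binned data into a line chart, we need the coordinates of each dot: the center of the bin and the frequency of the bin. The Python sketch below computes them for Table 5.5. It assumes, as the touching bars of the histogram suggest, that a bin such as 20–29 spans ages 20 up to (but not including) 30, so that its center is 25; the variable names are our own.

```python
# Table 5.5: lower edge of each 10-year age bin -> number of actresses.
BIN_WIDTH = 10
actresses = {20: 7, 30: 15, 40: 6, 50: 1, 60: 3, 70: 1, 80: 1}

# Each dot sits at the horizontal center of its bin, at the bin's height.
# Assumption: a bin such as "20-29" spans ages 20 up to (not including) 30,
# so its center is 25.
dots = [(lower + BIN_WIDTH / 2, count) for lower, count in sorted(actresses.items())]

total = sum(count for _, count in dots)
peak_center, peak_count = max(dots, key=lambda dot: dot[1])

print(dots)
print(f"{total} actresses; tallest bin is centered at age {peak_center:.0f}")
```

The tallest dot falls in the 30–39 bin, which is consistent with the observation that actresses are most likely to win Oscars when they are fairly young.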

E X A M P L E 8 Reading a Time-Series Diagram


Figure 5.12 shows a time-series line chart of stock, bond, and gold prices over a
12-week period. Suppose that, on July 7, you invested $100 in a stock fund that tracks
the S&P 500, $100 in a bond fund that follows the Lehman Index, and $100 in gold.
If you sold all three funds on August 4, how much did you gain or lose?

HISTORICAL NOTE
Gold was once considered to be a solid investment and an important part of any
investment portfolio. However, gold prices have languished in recent decades. In
2006, gold was worth only about $650 per ounce, much less than its inflation-adjusted
value of more than $2000 per ounce in 1980.

[Figure: line chart titled "MARKET GAUGE: COMPARING INVESTMENTS", with the note "How
$100 invested 12 weeks ago in stocks (measured by the S.&P. 500), bonds (Lehman
Treasury Bond Index) and gold would have fared through yesterday." Lines: Stocks,
Bonds, Gold. Horizontal axis: weekly dates from July 7 to Sept. 29. Vertical axis:
$95 to $105.]
FIGURE 5.12

SOLUTION The graph shows that the $100 in the stock fund would have been worth
about $101 on August 4. The $100 bond investment would have declined in value to
about $96. The gold investment would have held its initial value of $100. Thus, on
August 4, your complete portfolio would have been worth $101 + $96 + $100 =
$297. You would have lost $3 on your total investment of $300.
Now try Exercises 45–46.
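
The arithmetic of Example 8 can be checked with a few lines of Python. This is only an illustrative sketch: the three values are the approximate readings from Figure 5.12 given in the solution, and the percent change is an extra step not computed in the text.

```python
# Approximate August 4 values read from Figure 5.12 for each $100 invested.
invested_each = 100
value_on_aug_4 = {"stocks": 101, "bonds": 96, "gold": 100}

total_invested = invested_each * len(value_on_aug_4)
total_value = sum(value_on_aug_4.values())
change = total_value - total_invested
percent_change = change / total_invested * 100

print(f"Portfolio value: ${total_value}")
print(f"Change: ${change} ({percent_change:.1f}%)")
```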

EXERCISES 5C

QUICK QUIZ
Choose the best answer to each of the following questions.
Explain your reasoning with one or more complete sentences.

1. In a class of 100 students, 25 students received a grade of
B. What was the relative frequency of a B grade?
a. 25
b. 0.25
c. It cannot be calculated with the information given.

2. For the class described in Exercise 1, what was the
cumulative frequency of a grade of B or above?
a. 25
b. 0.25
c. It cannot be calculated with the information given.

3. Which of the following is an example of qualitative data?
a. waist sizes in inches b. ratings of restaurants
c. meal costs at restaurants

4. The sizes of the wedges in a pie chart tell you
a. the number of categories in the pie chart.
b. the frequencies of the categories in the pie chart.
c. the relative frequencies of the categories in the pie chart.

5. You have a table listing ten tourist attractions and their
annual numbers of visitors. Which type of display would
be most appropriate for these data?
a. a bar graph b. a pie chart c. a line chart

6. Where should you put the names of the ten tourist attractions
when you make your display of the data described in
Exercise 5?
a. They should be in the title of the display.
b. They should be in alphabetical order along the vertical
axis.
c. They should be listed along the horizontal axis.

7. You have a list of the GPAs of 100 college graduates, precise
to the nearest 0.001. You want to make a frequency
table for these data. A good first step would be to
a. group all the data into bins 0.2 of a grade point wide.
b. draw a pie chart for the 100 individual GPAs.
c. count how many people have identical GPAs.

8. You have a list of the average gasoline price for each month
during the past year. Which type of display would be most
appropriate for these data?
a. a bar graph b. a pie chart c. a line chart

9. A histogram is
a. a graph that shows how some quantity has changed
through history.
b. a graph that shows cumulative frequencies.
c. a bar chart for quantitative data.

10. You have a histogram and you want to convert it into a line
chart. A good first step would be to
a. make a list of all the categories in alphabetical order.
b. place a dot at the top of each bar, in the center of the bar.
c. calculate all the relative frequencies that you can read
from the histogram.

REVIEW QUESTIONS
11. What is a frequency table? Explain what we mean by the
categories and frequencies. What do we mean by relative
frequency? What do we mean by cumulative frequency?

12. What is the distinction between qualitative data and
quantitative data? Give a few examples of each.

13. What is the purpose of binning? Give an example in which
binning is useful.

14. What two types of graphs are most common when the
categories are qualitative data? Describe the construction of
each.

15. Describe the importance of labeling on a graph, and briefly
discuss the kinds of labels that should be included on
graphs.

16. What two types of graphs are most common when the
categories are quantitative data? Describe the construction of
each.

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense
(or is clearly true) or does not make sense (or is clearly false).
Explain your reasoning.

17. I made a frequency table with two columns, one labeled
"State" and one labeled "State Capitol."

18. The relative frequency of B grades in our class was 0.3.

19. Your bar graph must be wrong, because your bars are
wider than the ones shown on the teacher's answer key.

20. Your bar graph must be wrong, because it shows different
frequencies than the ones shown on the teacher's answer
key.

21. Your pie chart must be wrong, because you have the 45%
frequency wedge near the upper left and the answer key
shows it near the lower right.

22. Your pie chart must be wrong, because when I added the
percentages on your wedges, they totaled 124%.

23. I was unable to make a bar chart, because the data
categories were qualitative rather than quantitative.

24. I rearranged the bars on my histogram so that the tallest
bar would come first.

BASIC SKILLS & CONCEPTS
Frequency Tables. Make a frequency table for the data in
each of Exercises 25–26. Include columns for relative frequency
and cumulative frequency. Briefly explain the meaning of each
column.

25. Final grades of 20 students in a math class:
AA BBBBB CCCCCCCC DDD FF

26. A film section of a local newspaper lists 5 five-star films
(the highest rating), 10 four-star films, 20 three-star films,
15 two-star films, and 5 one-star films.

Qualitative vs. Quantitative. In Exercises 27–34, determine
whether the variable described is qualitative or quantitative, and
explain why.

27. The hair color of individuals

28. The average service time in a bank

29. The responses of people in a sausage taste test where
0 = inedible up to 5 = outstanding

30. The lowest high temperature in each month of the year in
Sedona, Arizona

31. The responses (yes, no, undecided) to the question "Will
you vote for a new water treatment plant?"

32. The total income of each household in America

33. The dessert selections at a restaurant used in a customer
preference poll

34. The number of people voting for each dessert selection in
a restaurant preference poll

Binned Frequency Tables. In Exercises 35–36, use the
indicated bin size to make a frequency table for the following set of
exam scores:

89 67 78 75 64 70 83 95 69 84
77 88 98 90 92 68 86 79 60 96

Include columns for relative frequency and cumulative
frequency. Briefly explain the meaning of each column.

35. Use 5-point bins (95 to 99, 90 to 94, etc.).

36. Use 10-point bins (90 to 99, 80 to 89, etc.).

37. Largest States. The following table shows the five most
populous U.S. states as of 2004. Make a bar graph for these
data, with the bars in descending order.

State         Population
California    35.9 million
Texas         22.5 million
New York      19.2 million
Florida       17.4 million
Illinois      12.7 million

38. Food Franchises. The table below shows the five food
companies with the most franchises. Make a bar graph for
these data, with the bars in descending order.

Company                   Number of franchises
McDonald's                22,183
Subway                    21,444
Kentucky Fried Chicken    10,040
Domino's Pizza            6953
Dunkin' Donuts            5759

Constructing Pie Charts. Exercises 39–40 each give a data set.
Compute the percentage for each category and construct a pie
chart for the data.

39. Six candidates ran for three seats on the City Council. The
vote tallies for the candidates are given in the table below.

Candidate    Votes
Aniston      2380
Clooney      1030
Cruise       987
Jolie        1753
Pitt         1914
Streep       2208

40. In a pizza preference poll, 92 people voted for their
favorite toppings as follows.

Topping       Votes
Anchovies     8
Cheese        27
Pepperoni     16
Sausage       36
Vegetarian    23

41. Government Income. The pie chart in Figure 4.12 on
p. 308 shows the makeup of federal government receipts.
Make a bar graph for these data.

42. Government Spending. The pie chart in Figure 4.13 on
p. 309 shows the makeup of federal government spending.
Make a bar graph for these data.

43. Oscar-Winning Actors. The following data show the
ages of 34 recent Academy Award-winning actors at the
time they won their award. Make a frequency table for
these data, using bins of 20–29, 30–39, and so on. Then
draw both a histogram and a line chart to display the
binned data.

32 37 36 32 51 53 33 61 35 45 55 39
76 37 42 40 32 60 38 56 48 48 40 43
62 43 42 44 41 56 39 46 31 47 40 43

44. Oscar Winners. In words, contrast the graphs in Example 7
with those you drew in Exercise 43. Do actors appear
to be more likely to win Oscars when they are younger,
older, or neither? Do you think these graphs indicate any
difference in how movie makers treat male and female
performers? Defend your opinion.

45. Homicide Rates. Study Figure 5.10. Write one to two
paragraphs summarizing how the homicide rate has
changed with time since 1960.

46. Death Rates. Figure 5.13 shows overall death rates in the
United States during the 20th century. Note that the spike
in 1919 was due to a worldwide epidemic of influenza.
Write a few sentences summarizing the overall trend,
describing how much the death rate changed during the
century, and putting the 1919 spike into context in terms
of its impact on the population.

[Figure: line chart titled "Death Rates per 1000 Population". Horizontal axis: Year
(1900 to 2000). Vertical axis: Rate (5 to 20).]
Figure 5.13 Source: National Center for Health Statistics.

FURTHER APPLICATIONS
Statistical Graphs. Each of Exercises 47–56 gives a table of
data. For each exercise, do the following:
a. Explain whether the data categories are qualitative or
quantitative.
b. If the data categories are qualitative, draw either a bar graph
or a pie chart for the data. If the data categories are
quantitative, draw either a histogram or a line chart for the data.
c. Write a one-paragraph summary of any interesting
information revealed by the graphic.

47. The following frequency table gives the ages of the Nobel
Prize winners in literature at the time of their award for
1990 through 2005.

Age      Number of winners
58–59    2
60–61    1
62–63    3
64–65    0
66–67    1
68–69    2
70–71    1
72–73    2
74–75    2
76–77    2

48. The following table lists the top eight retail companies in
the United States, by total sales volume.

Company        Sales (billions of dollars)
Albertsons     36.8
Home Depot     45.7
JC Penney      33.0
Kmart          37.0
Kroger         49.0
Sears          40.9
Target         36.9
Wal-Mart       193.3

Source: Wall Street Journal Almanac.

49. The following table shows the average SAT scores for
various ethnic groups in the United States in 2005.

Ethnic group              Average SAT score
White                     1068
Black                     864
Native American           982
Asian/Pacific Islander    1091
Hispanic                  917

Source: The College Board.

50. The following table lists the ten musical groups with the
most platinum albums in the United States (1,000,000
sales).

Group            Number of platinum albums
The Beatles      92
The Eagles       81
Led Zeppelin     80
AC/DC            60
Aerosmith        59
Pink Floyd       54
Van Halen        50
U2               45
Alabama          44
Fleetwood Mac    44

51. The following table lists areas of the world's major land
masses.

Land mass        Area (millions of sq. miles)
Asia             17.2
Africa           11.6
North America    9.3
South America    6.9
Australia        3.0
Europe           3.8
Antarctica       5.1
All others       2.1

52. The following table gives the percentages of total energy
produced in the United States from various sources.

Energy source    Percentage of total energy
Coal             32.2%
Natural gas      31.0%
Crude oil        16.4%
Nuclear power    11.7%
Renewable        8.7%

Source: U.S. Department of Energy.

53. The following table gives the stated religions of first-year
college students. (Note: The "other religions" category
consists of religions that were stated by less than 1% of the
students in the sample.)

Religion                   Percent of sample
Baptist                    11.6
Catholic                   30.5
Episcopal                  1.7
Jewish                     2.8
Lutheran                   5.8
Methodist                  6.4
Mormon                     1.5
Presbyterian               4.0
United Church of Christ    1.5
Other religions            19.3
No religion                14.9

Source: UCLA Higher Education Research Institute.

54. The following table gives the rates of violent crimes (rape,
robbery, assault, theft) by age of victim. Rates are units of
crimes per 1000 people aged 12 or older.

Age group    Crime rate
12–15        51.6
16–19        53.0
20–24        43.3
25–34        26.4
35–49        18.5
50–64        10.3
>65          2.0

Source: Bureau of Justice Statistics.

55. The following table gives average family size in the United
States since 1940.

Year    Family size    Year    Family size
1940    3.76           1980    3.29
1950    3.54           1985    3.23
1960    3.67           1990    3.17
1965    3.70           1995    3.19
1970    3.58           2000    3.17
1975    3.42           2003    3.19

Source: U.S. Bureau of Census.

56. Drunk Driving Deaths. Figure 5.14 shows the number
of automobile fatalities in the United States in which
alcohol was involved for each year from 1982 to 2003.

[Figure: graph titled "Alcohol-Related Fatalities". Horizontal axis: Year (1982 to
2002). Vertical axis: Fatalities (0 to 30,000).]
Figure 5.14 Source: National Highway Traffic Safety Administration.

a. How many alcohol-related fatalities were there in 1982?
in 2003? Comment on the overall trend over this period.
b. What is the percent change in alcohol-related fatalities
over this period?
c. The total numbers of automobile fatalities in 1982 and
2003 were 43,945 and 42,643, respectively. What percentage
of all fatalities in these two years involved alcohol?
d. In view of your answer to part c, can you offer
explanations for the trend in these data? Explain.

57. Ages of Presidents. The following table gives the order
of the presidents of the United States and the ages at
which they first took office.
a. Find a creative way to display these data.
b. Which presidents could have said that they were the
youngest president (or the same age in years as the
youngest) at the time they took office?
c. Which presidents could have said that they were the
oldest president (or the same age in years as the oldest) at
the time they took office?
d. Write a paragraph describing significant features of the
data.

Order   1  2  3  4  5  6  7  8  9 10 11
Age    57 61 57 57 58 57 61 54 68 51 49

Order  12 13 14 15 16 17 18 19 20 21 22
Age    64 50 48 65 52 56 46 54 49 50 47

Order  23 24 25 26 27 28 29 30 31 32 33
Age    55 55 54 42 51 56 55 51 54 51 60

Order  34 35 36 37 38 39 40 41 42 43
Age    62 43 55 56 61 52 69 64 46 54

WEB PROJECTS
Find useful links for Web Projects on the text Web site:
www.aw.com/bennett-briggs

58. CO2 Emissions. Look for updated data concerning
international carbon dioxide emissions at the Web site for the
International Energy Annual, published by the U.S. Energy
Information Administration (EIA). Create an updated or
expanded version of Figure 5.5. Discuss any new features
of your updated graphs.

59. Energy Table. Explore some of the many energy tables at
the U.S. Energy Information Administration (EIA) Web
site. Choose a table that you find interesting, and make a
graph of its data. You may choose any of the graph types
discussed in this section. Explain how you made your
graph, and briefly discuss what can be learned from it.

60. Statistical Abstract. Go to the Web site for the Statistical
Abstract of the United States. Explore the selection of
frequently requested tables. Choose one table of interest to
you, and make a graph from its data. You may choose any
of the graph types discussed in this section. Explain how
you made your graph, and briefly discuss what can be
learned from it.

IN THE NEWS
61. Frequency Tables. Find a recent news article that
includes some type of frequency table. Briefly describe the
table and how it is useful to the news report. Do you
think the table was constructed in the best possible way
for the article? If so, why? If not, what would you have
done differently?

62. Bar Graph. Find a recent news article that includes a bar
graph with qualitative data categories. Briefly explain what
the graph shows, and discuss whether it helps make the
point of the news article.

63. Pie Chart. Find a recent news article that includes a pie
chart. Briefly discuss the effectiveness of the pie chart. For
example, would it be better if the data were displayed in a
bar graph rather than a pie chart? Could the pie chart be
improved in other ways?

64. Histogram. Find a recent news article that includes a
histogram. Briefly explain what the histogram shows, and
discuss whether it helps make the point of the news article.
Are the labels clear? Is the histogram a time-series
diagram? Explain.

65. Line Chart. Find a recent news article that includes a line
chart. Briefly explain what the line chart shows, and discuss
whether it helps make the point of the news article. Are the
labels clear? Is the line chart a time-series diagram? Explain.

UNIT 5D Graphics in the Media

Now that we've discussed basic types of statistical graphs, we are ready to explore some
of the fancier graphics that appear daily in the news. We will also discuss several cau-
tions to keep in mind when interpreting media graphics.

Graphics Beyond the Basics


Many graphical displays of data go beyond the basic types discussed in Unit 5C. Here,
we explore a few of the types that are most common in the news media.

Multiple Bar Graphs

A multiple bar graph is a simple extension of a regular bar graph. It has two or more
sets of bars that allow comparison between two or more data sets. All the data sets
must involve the same categories so that they can be displayed on the same graph. For
example, Figure 5.15 is a multiple bar graph showing trends in home computing. The
categories are years. The two sets of bars represent two different measures of home
computing: ownership of personal computers and connection to the Internet. Note that
a legend clearly identifies the two sets of bars.

[Figure: multiple bar graph titled "PC and On-Line Households in the U.S., 1995–2003
(In millions)". Legend: Households with PCs; On-line households. Horizontal axis:
1995, 1997, 1999, 2001, 2003. Vertical axis: 0 to 80.]
FIGURE 5.15 Trends in home computing.
Source: Statistical Abstract of the United States.

E X A M P L E 1 Computing Trends
Summarize two major trends shown in Figure 5.15.
SOLUTION The most obvious trend is that both data sets show an increase with time.
That is, the number of homes with computers and the number of online homes both
increased with time. We see a second trend by comparing the bars within each year. In
1995, the number of online homes (about 10 million) was less than one-third the
number of homes with computers (about 33 million). By 2003, the number of online
homes (about 62 million) was about 90% of the number of homes with computers (about
70 million). This tells us that a higher percentage of computer users are going
online.
Now try Exercises 23–24.
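
The second trend in Example 1 comes from comparing the two bars within each year, which amounts to dividing one bar height by the other. The brief Python sketch below repeats that comparison with the approximate values quoted in the solution; the dictionary layout and names are our own.

```python
# Approximate bar heights read from Figure 5.15 (millions of households).
households = {
    1995: {"pc": 33, "online": 10},
    2003: {"pc": 70, "online": 62},
}

online_share = {year: h["online"] / h["pc"] for year, h in households.items()}

for year, share in online_share.items():
    print(f"{year}: online homes are {share:.0%} of homes with computers")
```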

Stack Plots
Another common type of graph, called a stack plot, shows different data sets in a ver-
tical stack. Figure 5.16 uses a stack plot to show trends in death rates (deaths per
100,000 people) for four diseases since 1900. Each disease has its own color-coded
region, or wedge; note the importance of the legend. The thickness of a wedge at a
particular time tells you its value at that time: When a wedge is thick it has a large
value, and when it is thin it has a small value.
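Reading a value from a stack plot amounts to one subtraction: the top of the wedge minus the bottom of the wedge. As a minimal sketch, the code below applies this to the cardiovascular wedge for 1980, using the approximate boundary readings (620 and 180) annotated on Figure 5.16. The function name `wedge_thickness` is hypothetical.

```python
def wedge_thickness(top, bottom):
    """Value of a stack-plot wedge: upper boundary minus lower boundary."""
    return top - bottom

# Cardiovascular wedge in 1980, using the approximate readings from Figure 5.16.
cardio_1980 = wedge_thickness(top=620, bottom=180)
print(cardio_1980)  # 440 (deaths per 100,000)
```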

[Figure: stack plot titled "Death Rates for Various Diseases: 1900–2004"; vertical axis:
deaths per 100,000 (0 to 900); horizontal axis: year (1900 to 2000); wedges: Pneumonia,
Cardiovascular, Tuberculosis, Cancer. Annotations: In a stack plot, the thickness of a
wedge at a particular time tells you its value. For 1980, the top of the cardiovascular
wedge is at about 620 along the vertical axis and the bottom is at about 180. So the 1980
death rate for cardiovascular disease was about 620 − 180 = 440 (deaths per 100,000).]
FIGURE 5.16 A stack plot showing trends in death rates from four diseases.

EXAMPLE 2 Stack Plot
Based on Figure 5.16, what was the death rate for cardiovascular disease in 1980? Dis-
cuss the general trends visible on this graph.
SOLUTION For 1980, the cardiovascular wedge extends from about 180 to 620 on
the vertical axis, so its thickness is about 440. Thus, the death rate in 1980 for cardio-
vascular disease was about 440 deaths per 100,000 people. The graph shows several
important trends. First, the downward slope of the top wedge shows that the overall
death rate from these four diseases decreased substantially, from nearly 800 deaths per
100,000 in 1900 to about 525 in 2003. The drastic decline in the thickness of the
tuberculosis wedge shows that this disease was once a major killer, but has been nearly
wiped out since 1950. Meanwhile, the cancer wedge shows that the death rate from
cancer rose steadily until the mid-1990s, but has dropped somewhat since then.
Now try Exercises 25–28.

By the Way
Since the mid-1980s, there has been a small but noticeable resurgence of tuberculosis in
the United States. Part of the resurgence is due to new strains of the disease that resist
most common drug treatments.

Graphs of Geographical Data
We are often interested in geographical patterns in data. Figure 5.17 shows one common
way of displaying geographical data. In this case, the data on per capita (per person)
income are shown state by state. The legend explains that different colors represent
different income levels. Similar colors are used for similar income levels. Thus, it is
easy to see that income levels tend to be highest in the northeast and lowest in the south.

[Figure: U.S. map titled "State Per Capita Income," with each state and DC shaded by
income level. Key: $20,000–$24,999; $25,000–$29,999; $30,000–$34,999; $35,000–$39,999;
$40,000–$44,999.]
FIGURE 5.17 Per capita income in the 50 states (2002).
Source: U.S. Department of Commerce.

The display in Figure 5.17 works well because each state is associated with a
unique income level. For data that vary continuously across geographical areas, a
contour map is more convenient. Figure 5.18 shows a contour map of temperature
over the United States at a particular time. Each of the contours connects locations
with the same temperature. For example, the temperature is 50°F everywhere along
the contour labeled 50°F and 60°F everywhere along the contour labeled 60°F.
Between these two contours, the temperature is between 50°F and 60°F. Note that in
regions where contours are tightly spaced, there are greater temperature changes. For
example, the closely packed contours in the northeast indicate that the temperature
varies substantially over small distances. To make the graph easier to read, the regions
between adjacent contours are color-coded.

[Figure: U.S. map with temperature contours labeled from 20°F to 80°F and color-coded
regions between adjacent contours. Annotations: Widely separated contours mean large
regions have nearly the same temperature. Closely packed contours mean a large
temperature difference over a short distance.]
FIGURE 5.18 A contour map of temperature.

The greatest value of a picture is when it forces us to notice what we never
expected to see.
JOHN TUKEY

EXAMPLE 3 Interpreting Geographical Data
Study Figures 5.17 and 5.18, using them to answer the following questions.

a. Which state(s) had the highest per capita income in 2002?
b. Were there any temperatures above 80°F in the United States on the date
shown in Figure 5.18? If so, where?

SOLUTION

a. Connecticut was the only state with a per capita income in the highest category
shown on the graph ($40,000–$44,999), so it had the highest per capita
income. (The District of Columbia was also in this category, but it is not a
state.)
b. The 80° contour passes through southern Florida, so the parts of Florida
south of this contour had a high temperature above 80°.
Now try Exercises 29–30.

Time out to think


Look for a weather map in today's news. How are the temperature contours
shown? Interpret the temperature data.

Three-Dimensional Graphics

Today, computer software makes it easy to give almost any graph a three-dimensional
appearance. For example, Figure 5.19 shows the bar graph of Figure 5.3, but dressed up
with a three-dimensional look. It may look nice, but the three-dimensional effects are
purely cosmetic. They don't provide any information that wasn't already in the
two-dimensional graph in Figure 5.3. As this example shows, many three-dimensional
graphics really only make two-dimensional data look a little fancier.

[Figure: bar graph titled "Essay Grade Data"; vertical axis: frequency of grade (0 to 9);
horizontal axis: grade (A, B, C, D, F).]
FIGURE 5.19 This graph has a three-dimensional appearance, but shows only
two-dimensional data.

In contrast, each of the three axes in Figure 5.20 carries distinct information, making it
a true three-dimensional graph. Researchers studying migration patterns of a bird species
(the Bobolink) counted the number of birds flying over seven New York cities throughout
the night. As shown on the inset map, the cities were aligned east-west so that the
researchers would learn what parts of the state the birds flew over, and at what times of
night, as they headed south for the winter. Thus, the three axes measure number of birds,
time of night, and east-west location.

[Figure: three-dimensional graph titled "Sonic Mapping Traces Bird Migration." Text in
graphic: Sensors across New York State counted each occurrence of the nocturnal flight
call of the bobolink to trace the fall migration on the night of Aug. 28–29, 1993. The
data showed the heaviest swath passing over the eastern part of the state. Axes: number of
birds (0 to 70); hours after 8:30 p.m. (1 to 8); location (Cuba, Alfred, Beaver Dams,
Ithaca, Richford, Oneonta, Jefferson). An inset map of New York shows the seven locations.
Source: Bill Evans/Cornell Laboratory of Ornithology.]
FIGURE 5.20 This graph shows true three-dimensional data.
Source: New York Times.

EXAMPLE 4 Three-Dimensional Bird Migration

Based on Figure 5.20, at about what time was the largest number of birds flying
over the east-west line marked by the seven cities? Over what part of New York did
most of the birds fly? Approximately how many birds passed over Oneonta around
12:00 midnight?

SOLUTION The number of birds detected in all the cities peaked between 3 and
5 hours after 8:30 p.m., or between about 11:30 p.m. and 1:30 a.m. More birds flew
over the two easternmost cities of Oneonta and Jefferson than over cities farther west.
Thus, most of the birds were flying over the eastern part of the state. To answer the
specific question about Oneonta, note that 12:00 midnight is the midpoint of time
category 4. On the graph, this time aligns with the dip between peaks on the line at
Oneonta. Looking across to the number of birds axis, we see that about 30 birds were
flying over Oneonta at that time. Now try Exercises 31–39.
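The time axis in Figure 5.20 is measured in hours after 8:30 p.m., so each reading has to be converted to a clock time. The sketch below shows one way to do the conversion, using a 24-hour clock to keep the output unambiguous. The name `clock_time` is hypothetical, and treating time category 4 as running from 3 to 4 hours after the start (so that its midpoint is 3.5 hours) is an assumption made to match the solution.

```python
from datetime import datetime, timedelta

# Observations began at 8:30 p.m. on Aug. 28, 1993 (date taken from the graphic).
START = datetime(1993, 8, 28, 20, 30)

def clock_time(hours_after_start):
    """Convert 'hours after 8:30 p.m.' into a 24-hour clock time (HH:MM)."""
    moment = START + timedelta(hours=hours_after_start)
    return moment.strftime("%H:%M")

print(clock_time(3))    # 23:30, that is, 11:30 p.m.
print(clock_time(5))    # 01:30, that is, 1:30 a.m.
print(clock_time(3.5))  # 00:00, that is, midnight (assumed midpoint of category 4)
```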

Combination Graphics
All of the graphic types we have studied so far are common and fairly easy to create.
But the media today are often filled with many varieties of even more complex graphics.
For example, Figure 5.21 shows a graphic concerning the participation of women
in the summer Olympics. This single graphic combines a line chart, many pie charts,
and numerical data. It is certainly a case of a picture being worth far more than a
thousand words.

[Figure: combination graphic titled "The Ever-Growing Presence of Women in Summer
Olympics." A line chart shows the total number of women participating (vertical axis from
0 to 5,000) at each summer Olympics from 1900 to 2004, with no games held in some years.
Pie charts along the line show the percentage of women participants, rising from 1.6% to
44.0%. Numbers along the bottom give the number of events for women, rising from 3 to 135.
Source: International Olympic Committee.]
FIGURE 5.21 Source: Adapted from The New York Times.

EXAMPLE 5 Olympic Women
Describe three trends shown in Figure 5.21.
SOLUTION The line chart shows that the total number of women competing in the
summer Olympics has risen fairly steadily, especially since the 1960s, reaching nearly
5000 in the 2004 games. The pie charts show that the percentage of women among all
competitors has also increased, reaching 44% in the 2004 games. The bold red numbers
at the bottom show that the number of events for women has also increased
dramatically, reaching 135 in the 2004 games.
Now try Exercises 40–41.

Time out to think


Do you think the upward trend of the pie charts in Figure 5.21 will continue over the
next few Olympic games? Why or why not?

A Few Cautions about Graphics


As we have seen, graphics can offer clear and meaningful summaries of statistical data.
However, even well-made graphics can be misleading if we are not careful in inter-
preting them, and poorly made graphics are almost always misleading. Moreover,
some people use graphics in deliberately misleading ways. Here, we discuss a few of
the more common ways in which graphics can lead us astray.

Perceptual Distortions

Many graphics are drawn in a way that distorts our perception of them. Figure 5.22
shows one of the most common types of distortion. The dollar-shaped bars are used
to represent the declining value of the dollar over time. The lengths of the bars
represent the data, but our eyes tend to focus on the areas of the bars. For example, the
bottom bar is supposed to show that a dollar in 2005 was worth only 42% as much as a
dollar in 1980. Its length is indeed 42% that of the top bar, but its area is much
smaller in comparison (about 18% of the area of the top bar). This gives the
perception that the value of the dollar shrank even more than it really did.
Now try Exercises 42–43.
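The 18% figure for the dollar-shaped bars follows from simple geometry. If a picture is shrunk by the same factor in both width and height, its area shrinks by the square of that factor. The sketch below assumes the dollar bills in Figure 5.22 are scaled this way. The function name `apparent_area_ratio` is made up for illustration.

```python
def apparent_area_ratio(length_ratio):
    """Area ratio of a shape scaled by length_ratio in both width and height."""
    return length_ratio ** 2

# The 2005 dollar is drawn 42% as long as the 1980 dollar.
ratio = apparent_area_ratio(0.42)
print(f"{ratio:.0%}")  # 18%
```

So a bar that is correctly drawn at 42% of the original length occupies only about 18% of the original area, which is what the eye tends to compare.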

[Figure: three dollar-shaped bars of decreasing size, labeled 1980 $1.00, 1990 $0.63, and
2005 $0.42.]
FIGURE 5.22 The lengths of the dollars are proportional to their spending power, but our
eyes are drawn to the areas, which decline more than the lengths.

Watch the Scales

Figure 5.23a shows the percentage of college students between 1910 and 2005 who
were women. At first glance, it appears that this percentage grew by a huge margin
after about 1950. But the vertical axis scale does not begin at zero and does not end at
100%. The increase is still substantial but looks far less dramatic if we redraw the
graph with the vertical axis covering the full range of 0 to 100% (Figure 5.23b). From
a mathematical point of view, leaving out the zero point on a scale is perfectly honest
and can make it easier to see small-scale trends in the data. Nevertheless, as this
example shows, it can be visually deceptive if you don't study the scale carefully.
Now try Exercises 44–45.
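To see why a truncated vertical axis exaggerates change, compare how tall two data points are drawn above the bottom of the axis. The sketch below uses hypothetical values (40% and 55%), not readings from Figure 5.23, and the function name `drawn_height_ratio` is also made up. Only the axis starting points (30 and 0) correspond to the two panels of the figure.

```python
def drawn_height_ratio(earlier, later, axis_start):
    """Ratio of the drawn heights of two values above the bottom of the axis."""
    return (later - axis_start) / (earlier - axis_start)

# Hypothetical percentages for two different years (not taken from Figure 5.23).
earlier, later = 40, 55

print(drawn_height_ratio(earlier, later, axis_start=30))  # 2.5
print(drawn_height_ratio(earlier, later, axis_start=0))   # 1.375
```

With the axis starting at 30, the later value is drawn 2.5 times as high as the earlier one, even though it is only 1.375 times as large.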

[Figure: two line graphs titled "Women as a Percentage of All College Students." Graph (a):
vertical axis "Percent women" from 30 to 60. Graph (b): vertical axis "Percent women" from
0 to 100. Horizontal axis in both graphs: year (1920 to 2000).]
FIGURE 5.23 Both graphs show the same data, but they look very different because their vertical
scales have different ranges.
Source: National Center for Education Statistics and Bureau of Labor Statistics.

Sometimes the scale may not be deceptive, but still requires care to avoid
misinterpretation. Consider Figure 5.24a, which shows how the speeds of the fastest
computers have increased with time. At first glance, it appears that speeds have been
increasing linearly. For example, it might look as if the speed increased by the same
amount from 1990 to 2000 as it did from 1950 to 1960. However, if we look closely,
we see that each tick mark on the vertical scale represents a tenfold increase in speed.
Now we see that computer speed grew from about 1 to 100 calculations per second
between 1950 and 1960, and from about 100 million to 10 billion calculations per
second between 1990 and 2005. This type of scale is called an exponential scale (or
logarithmic scale), because each unit corresponds to a power of 10. In general,
exponential scales are useful for displaying data that vary over a huge range of values. You
can see this usefulness by looking at Figure 5.24b, where the computer data have been
recast with an ordinary scale. Because the speeds have grown so rapidly, the ordinary
scale makes it impossible to see any detail in the early years shown on the graph.
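On an exponential scale, a value's position along the axis is its power of 10, that is, its base-10 logarithm. The sketch below is an illustration using the approximate speeds quoted in the paragraph above. The names `speeds` and `axis_position` are hypothetical.

```python
import math

# Approximate speeds (calculations per second) quoted in the text.
speeds = {1950: 1, 1960: 100, 1990: 1e8, 2005: 1e10}

def axis_position(value):
    """Position of a value on an exponential (logarithmic) scale: its power of 10."""
    return math.log10(value)

# Equal factors take up equal distances on the exponential scale...
early_step = axis_position(speeds[1960]) - axis_position(speeds[1950])
late_step = axis_position(speeds[2005]) - axis_position(speeds[1990])
print(early_step, late_step)

# ...even though the actual increases differ enormously.
print(speeds[1960] - speeds[1950])  # 99
print(speeds[2005] - speeds[1990])  # 9900000000.0
```

Both periods show a hundredfold increase, so both cover two units on the exponential scale, while the actual increases are about 99 and about 9.9 billion calculations per second.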
[Figure: two graphs titled "Computer Speed." Graph (a): vertical axis "Calculations per
second" on an exponential scale with tick labels 10^2, 10^5, 10^8, and 10^11. Graph (b):
vertical axis "Billions of calculations per second" on an ordinary scale (0, 50, 100).
Horizontal axis in both graphs: year (1950 to 2000).]
FIGURE 5.24 Both graphs show the same data, but the one on the left uses an exponential scale.

By the Way
In 1965, Intel co-founder Gordon E. Moore predicted that advances in technology would
allow computer chips to double in power roughly every two years. This idea is now called
Moore's law, and it has held fairly true ever since Moore first stated it.

Now try Exercise 46.

Time out to think


Based on Figure 5.24a, can you predict the speed of the fastest computers in 2015?
Could you make the same prediction with Figure 5.24b? Explain.

Percentage Change Graphs


Is college getting more or less expensive? A quick look at Figure 5.25 might give the
impression that the cost for private colleges has been holding fairly steady while the
cost for public colleges fell steeply in 2006 after rising in prior years.
But look more closely and you'll see that this is not the case at all. The vertical axis
in Figure 5.25 represents the percentage increase in costs. A flat graph means only that
costs increased by the same percentage each year, not that costs held steady. Similarly,
the drop in 2006 for public colleges means only that the cost rose by less in that year
than in the preceding years.
In fact, actual costs (not adjusted for inflation) for both public and private colleges
have risen substantially with time, as shown in Figure 5.26. Moreover, because the
rate of inflation (as measured by the Consumer Price Index; see Unit 3D) has been
less than the rate of increase in college costs, the real cost of public colleges has steadily
risen. Graphs that show percentage change are very common, particularly with
economic data. Although they are perfectly honest, you can be misled unless you
interpret them with great care.
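A small calculation shows how a percentage change graph can fall while the underlying quantity keeps rising. The costs below are hypothetical numbers chosen for easy arithmetic, not College Board data, and the function name `percent_changes` is made up for this sketch.

```python
def percent_changes(costs):
    """Percentage change from each value to the next."""
    return [100 * (new - old) / old for old, new in zip(costs, costs[1:])]

# Hypothetical yearly costs in dollars; the cost rises every year.
costs = [10_000, 11_000, 11_550]

print(percent_changes(costs))  # [10.0, 5.0]
```

The cost rises in both years, but the percentage increase drops from 10% to 5%, so a graph of percentage change would slope downward.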
[Figure: graph titled "Changes in College Costs"; vertical axis: percentage change from
previous academic year (0 to 16%); horizontal axis: academic year (95–96 to 05–06);
series: Public, Private.]
FIGURE 5.25 This graph shows the rate of increase with time in tuition and fees at
four-year public and private colleges.
Source: The College Board.

[Figure: graph titled "Actual College Costs"; vertical axis in dollars (0 to $24,000);
horizontal axis: academic year (95–96 to 05–06); series: Public, Private.]
FIGURE 5.26 This graph shows the change with time in the actual cost (not adjusted for
inflation) of tuition and fees at four-year public and private colleges. You can use the
rise in these costs to calculate the percentage increases shown in Figure 5.25.
Source: The College Board.
Now try Exercise 47.
Pictographs
Pictographs are graphs embellished with additional artwork. The artwork may make
the graph more appealing, but it can also distract or mislead. Figure 5.27 is a
pictograph showing the rise in world population from 1804 to 2054 (numbers for future
years are based on United Nations projections). The lengths of the bars correspond
correctly to world population for the different years listed. However, the artistic
embellishments of this graph are deceptive in several ways. For example, your eye
may be drawn to the figures of people lining the globe. Because this line of people
rises from the left side of the pictograph to the center and then falls, it might give the
impression that future world population will be declining. In fact, the line of people is
purely decorative and carries no information.

[Figure: pictograph titled "World Population (in billions of people)"; bars for 1 to 9
billion people are labeled with the years 1804, 1927, 1960, 1974, 1987, 1999, 2013, 2028,
and 2054, and decorated with figures of people lining a globe.]
FIGURE 5.27 Source: Data from United Nations Population Division, World Population
Prospects.

By the Way
If world population continues to double at the same rate as in the late 20th century, it
will reach 34 billion by 2100 and 192 billion by 2200. By about 2650, human population
would be so large that it would not fit on the Earth, even if everyone stood
elbow-to-elbow everywhere.
Perhaps the most serious problem with this pictograph is that it makes it appear
that world population has been rising linearly. However, notice that the time intervals
on the horizontal axis are not uniform in size. For example, the interval between the
bars for 1 billion and 2 billion people is 123 years (from 1804 to 1927), but the inter-
val between the bars for 5 billion and 6 billion people is only 12 years (from 1987 to
1999).
Pictographs are very common, but as this example shows, you have to study them
carefully to extract the essential information and not be distracted by the cosmetic
effects. Now try Exercise 48.
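The uneven spacing of the horizontal axis becomes obvious once the time between successive bars is computed. The sketch below pairs the year labels on Figure 5.27 with successive billions. The text confirms the pairings for 1, 2, 5, and 6 billion, and the remaining pairings are assumed from the order of the labels. The names `milestones` and `years_per_billion` are hypothetical.

```python
# Year in which world population reached (or is projected to reach) each billion,
# as labeled on Figure 5.27; pairings other than 1, 2, 5, and 6 billion are assumed.
milestones = {1: 1804, 2: 1927, 3: 1960, 4: 1974, 5: 1987,
              6: 1999, 7: 2013, 8: 2028, 9: 2054}

def years_per_billion(table):
    """Years elapsed between each pair of successive billions."""
    billions = sorted(table)
    return {(a, b): table[b] - table[a] for a, b in zip(billions, billions[1:])}

gaps = years_per_billion(milestones)
for (a, b), years in gaps.items():
    print(f"{a} to {b} billion: {years} years")
```

The first gap is 123 years and the gap from 5 to 6 billion is only 12 years, so equally spaced bars hide how sharply growth accelerated.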

EXERCISES 5D

QUICK QUIZ

Choose the best answer to each of the following questions.
Explain your reasoning with one or more complete sentences.
1. Consider Figure 5.15. Suppose you were given data for the number of households with
high-speed Internet access in each of the years shown. How would you add these data to
the graphic?
a. Add a third bar for each year.
b. Stack the high-speed data on top of the on-line bars.
c. Put a small pie chart on top of each pair of bars.
2. Consider Figure 5.16. According to this graph, the approximate death rate from
tuberculosis in 1950 was
a. 2 per 100,000.
b. 20 per 100,000.
c. 200 per 100,000.
3. Consider Figure 5.17. According to this graph, what is per capita income in Oregon (OR)?
a. between $25,000 and $30,000
b. exactly $25,000
c. It cannot be determined from the graph.
4. Consider Figure 5.18. According to this map, the temperature in Iowa (IA) was
a. 30°F. b. 40°F. c. between 30°F and 40°F.

5. Consider Figure 5.18. Notice the small loop labeled 40°F near the southeast corner of
Idaho (ID). What can you say about temperatures within that small region?
a. They were 40°F.
b. They were higher than 40°F but lower than 50°F.
c. They could have been anything above 40°F.
6. Suppose you are given a contour map showing elevation (altitude) for the state of
Vermont. The region with the most closely spaced contours represents
a. the highest altitude.
b. the lowest altitude.
c. the steepest terrain.
7. Consider Figure 5.21. Approximately how many women participated in the 1948 Olympics?
a. 19 b. 9.4 c. 450
8. Consider Figure 5.23a. The way the graph is drawn
a. makes the graph completely invalid.
b. makes the changes from one decade to the next appear larger than they really were.
c. makes it more difficult to see the upward and downward trends that have occurred
over time.
9. Consider Figure 5.24a. Moving one tick mark up the vertical axis represents an
increase in computer speed of
a. 1 billion calculations per second.
b. a factor of 2.
c. a factor of 10.
10. Consider Figure 5.25. In years where the graph slopes downward with time,
a. college costs decreased.
b. the cost of college rose, but by a lower percentage than in previous years.
c. the cost of college rose, but the new cost represented a lower proportion of the
average person's income.

REVIEW QUESTIONS
11. Briefly describe the construction and use of multiple bar graphs and stack plots.
12. What are geographical data? Briefly describe at least two ways to display geographical
data. Be sure to explain the meaning of contours on a contour map.
13. What are three-dimensional graphics? Explain the difference between graphics that only
appear three-dimensional and those that show truly three-dimensional data.
14. Describe how perceptual distortions can arise in graphics and how they can be misleading.
15. How can graphics be misleading when the scales do not go all the way to zero? Why are
such graphics sometimes useful?
16. What is an exponential scale? When is an exponential scale useful?
17. Explain how a graph that shows percentage change can show descending bars (or a
descending line) even when the variable of interest is increasing.
18. What is a pictograph? How can a pictograph enhance a graph? How can it make a graph
misleading?

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or does
not make sense (or is clearly false). Explain your reasoning.
19. My bar chart contains more information than yours, because I made my bars
three-dimensional.
20. I used an exponential scale because the data values for my categories ranged from 7 to
450,000.
21. There's been only a very slight rise in our stock price over the past few months, but I
wanted to make it look dramatic so I started the vertical scale from the lowest price
rather than from zero.
22. A graph showing the yearly rate of increase in the number of computer users has a
slight downward trend, even though the actual number of users is rising.

BASIC SKILLS & CONCEPTS
23. Net Grain Production. Net grain production is the difference between the amount of
grain a country produces and the amount of grain its citizens consume. It is positive if
the country produces more than it consumes, and negative if the country consumes more than
it produces. Figure 5.28 shows the net grain production of four countries in 1990 and
projected for 2030.
a. Which of the four countries had to import grain to meet its needs in 1990?
b. Which of the four countries are expected to need to import grain to meet needs in 2030?
c. Given that India and China are the world's two most populous countries, what does this
graph tell you about how world agriculture will have to change between now and 2030?

[Figure: multiple bar graph titled "Net Grain Production, 1990 and 2030 (projected)";
vertical axis: millions of tons (100 to −250); categories: U.S., China, India, Russia;
legend: 1990, 2030.]
FIGURE 5.28

24. Education and Earnings. Figure 5.29 shows median earnings in three different years
according to level of education.
a. Briefly explain the meaning of each of the three sets of bars on the graph.
b. Compare in words the change in earnings between 1985 and 2000 for people with
bachelor's degrees to the change for people who did not graduate from high school. What
do these data say about the value of a college education?
c. The graph has a three-dimensional appearance. Is it showing true three-dimensional
data, or is the appearance purely cosmetic? Do you think the three-dimensional appearance
helps or hinders the display?

[Figure: three-dimensional multiple bar graph titled "Median Earnings of Workers 21 Years
and Over by Educational Attainment, 1985 to 2000"; vertical axis from 0 to $80,000;
legend: 2000, 1995, 1985; categories: Overall, Not high school graduate, High school,
Some college/Associate degree, Bachelor's degree, Advanced degree.]
FIGURE 5.29 Source: TIME Almanac, 1999, p. 886 and U.S. Census Bureau.

25. Stack Plot. Answer the following based on Figure 5.16.
a. State whether the death rate for each of the four diseases individually decreased or
increased between 1900 and 2003.
b. When was the death rate due to cardiovascular diseases the greatest, and what was it?
c. What was the death rate due to cancer in 2000?
d. Based on the trends in the graph, speculate on which of these four diseases will be
responsible for the most deaths in 2050. Explain.

26. College Degrees. Figure 5.30 shows the numbers of college degrees awarded to men and
women over time.
a. Estimate the numbers of college degrees awarded to men and to women (separately) in
1930 and in 2005.
b. Did men or women earn more degrees in 1980? Did men or women earn more degrees in 2005?
c. During what decade did the total number of degrees awarded increase the most?
d. Compare the total numbers of degrees awarded in 1950 and 2005.
e. Do you think the stack plot is an effective way to display these data? Briefly discuss
other ways that might have been used instead.

[Figure: stack plot titled "College Degrees Awarded"; vertical axis: college graduates
(thousands), 0 to 1400; horizontal axis: year (1900 to 2000); wedges: Women, Men.]
FIGURE 5.30

27. Federal Spending. Figure 5.31 shows the changes in major spending categories of the
federal budget. (Payments to individuals includes Social Security and Medicare; net
interest represents interest payments on the national debt; all other represents
non-defense discretionary spending.) Interpret the stack plot and discuss some of the
trends it reveals.
a. Find the percentage of the budget that went to net interest in 1990, 1995, and 2005.
b. Find the percentage of the budget that went to defense in 1960, 1980, and 2005.
c. Find the percentage of the budget that went to payments to individuals in 1980, 2000,
and 2005.

[Figure: stack plot titled "Percentage Composition of Federal Government Outlays";
vertical axis: percent (0 to 100); horizontal axis: year (1960 to 2005); wedges: Payments
to individuals, National defense, Net interest, All other.]
FIGURE 5.31 Source: Office of Management and Budget.

28. Federal Trends. Consider Figure 5.31. Summarize at least three trends shown in the
figure.

29. School Segregation. One way of measuring segregation is to determine the likelihood
that a black student will have white classmates. A New York Times study found that, by
this measure, segregation increased significantly in the 1990s. Figure 5.32 shows the
probability that a black student had white classmates, by county, during the 1997–1998
academic year. Do there appear to be any significant regional differences? Can you pick
out any differences between urban and rural areas? Discuss possible explanations for a few
of the trends that you see in the figure.

[Figure: U.S. county map titled "Probability That a Black Student Would Have White
Classmates." Key as extracted: Less than 10%; 20%–40%; 40%–60%; 60%–80%; More than 80%;
Counties with no data or no black students.]
FIGURE 5.32 Source: New York Times, April 2, 2000.

30. Contour Elevations. Contour maps are often used to show geographical elevations.
Figure 5.33 shows elevation contours around Boulder, Colorado. Discuss a few key features
shown on the map.

[Figure: contour map with a compass rose (N, E, S, W).]
FIGURE 5.33

U.S. Age Distribution. Parts (a) and (b) of Figure 5.34 display the age distribution of
the U.S. population from 1960 to 2050 (projected) in two different ways; the age
categories are in opposite order so that all of the data can be viewed. Use these graphs
to answer the questions in Exercises 31–39.

[Figure: two three-dimensional bar graphs, (a) and (b), titled "U.S. Age Distribution";
vertical axis: percent of population (0 to 35); horizontal axis: year (1960, 1970, 1980,
1990, 2000, 2010, 2050); age categories: under 5, 5–17, 18–24, 25–44, 45–65, over 65,
shown in opposite order in (a) and (b).]
FIGURE 5.34

31. Briefly describe the meaning of each bar.
32. Do these graphs display true three-dimensional data, or is the three-dimensional look
cosmetic?
33. How has the percentage of the youngest Americans changed since 1960?
34. Estimate the percentage of 5- to 17-year-olds in 1960 and in 2000.
35. Estimate the percentage of 45- to 65-year-olds in 1960 and in 2010.
36. In which year did (will) the 25- to 44-year-old group comprise the largest percentage
of the population?
37. In which year did (will) the 45- to 65-year-old group comprise the largest percentage
of the population?
38. Which age group is expected to see the greatest increase between 2000 and 2050?
39. Describe the most significant changes that you see in the U.S. population between 1960
and 2050.
40. Extending the Olympic Graph. Make a list of all the data you would need in order to
extend the graph in Figure 5.21 to the 2008 Olympics and beyond.
41. Data for 2008 Olympics. Use the Web to find the data you need to extend Figure 5.21
(see Exercise 40) through the 2008 Olympics (assuming they have occurred by the time you
read this problem). Then photocopy the graph and add the new data on the same graph.

[Figure: two television sets of different sizes, titled "Homes with Cable TV"; labels:
1980, 18 million homes; 2005, 73 million homes.]
FIGURE 5.35

42. Volume Distortion. Figure 5.35 uses television sets to represent the numbers of homes
with cable in 1980 and

2005. Note that the heights of the TVs represent the numbers of homes. Briefly explain how
the graph creates a perceptual distortion that exaggerates the true change in the number
of homes with cable.

43. Three-Dimensional Pies. The pie charts in Figure 5.36 represent the percentage of
Americans in three age categories in 1990 and 2050 (projected). Briefly explain how the
three-dimensional effects create a perceptual distortion in this case. Why would flat pies
(without the three-dimensional effects) give a more accurate representation of the data?

[Figure: two three-dimensional pie charts titled "1990 Age Distribution" and "2050 Age
Distribution," each with sectors labeled 65–84, 85+, and Others.]
FIGURE 5.36 Source: U.S. Census Bureau.

44. Comparing Earnings. Figure 5.37 compares the average weekly earnings of men and women.
Identify any misleading aspects of the display. Draw the display in a fairer way.

[Figure: bar graph; vertical axis: average weekly earnings (500 to 800); bars: Men, Women.]
FIGURE 5.37 Source: U.S. Census Bureau.

45. Braking Distances. Figure 5.38 shows the braking distance for four different cars.
Discuss the ways in which it might be deceptive. How much greater is the braking distance
of Lincolns than the braking distance of Oldsmobiles? Draw the display in a fairer way.

[Figure: horizontal bar graph; horizontal axis: braking distance (feet), 170 to 210; bars:
Lincoln, Saab, Lexus, Oldsmobile.]
FIGURE 5.38 Source: Car and Driver.

46. Cellular Phone Users. The following table shows the number of cell phone subscribers
in the United States for selected years between 1990 and 2003. Display the data using both
an ordinary vertical scale and an exponential vertical scale. (Hint: For the exponential
scale, use tick marks at 1 million, 10 million, and 100 million.) Which graph is more
useful? Why?

Year    Subscribers (millions)
1990      5.3
1995     33.8
1997     55.3
1998     69.2
1999     86.0
2000    109.5
2001    128.3
2002    140.8
2003    158.7

47. Rising College Costs. Refer to Figures 5.25 and 5.26 to answer the following questions.
a. In what academic year did public college costs rise by the largest percentage? What was
the percentage increase?
b. In the same year (as part a), what was the percentage increase in private college costs?
c. In the same year, which had the larger increase in actual cost (in dollars): public or
private colleges? Explain.

48. World Population. Recast Figure 5.27 with a proper horizontal axis. What trends are
clear in your new graph that are not clear in the original? Explain.

FURTHER APPLICATIONS
Creating Graphics. Exercises 49–52 give tables of real data. For each table, make
a graphical display of the data. You may choose any graphic type that you feel is
appropriate to the data set. In addition to making the display, write a few
sentences explaining why you chose this type of display and a few sentences
describing interesting patterns in the data.

49. Percent Never Married. The following table shows the percentages, for 1970 and
2003, of men and women in various age categories who were never married.

             Women            Men
Age       1970   2003     1970   2003
20–24     35.8   75.4     54.7   86.0
25–29     10.5   40.3     19.1   54.6
30–34      6.2   22.7      9.4   33.1
35–39      5.4   14.3      7.2   21.8
40–44      4.9   12.2      6.3   17.4

Source: U.S. Census Bureau.

50. Alcohol on the Road. The following table gives the total number of automobile
fatalities and the number of fatalities in which alcohol was involved for 1982 to
2004. All figures are in thousands of deaths.

Year    Total     Alcohol
1982    43,945    26,173
1984    44,257    24,762
1986    46,087    25,017
1988    47,087    23,833
1990    44,599    22,587
1992    39,250    18,290
1994    40,716    17,308
1996    42,065    17,749
1998    41,501    16,673
2000    41,945    17,380
2002    42,815    17,419
2004    42,643    17,013

Source: National Highway Traffic Safety Administration.

51. Daily Newspapers. The following table gives the number of daily newspapers and
their total circulation (in millions) for selected years since 1920.

        Number of           Circulation
Year    daily newspapers    (millions)
1920    2042                27.8
1930    1942                39.6
1940    1878                41.1
1950    1772                53.9
1960    1763                58.8
1970    1748                62.1
1980    1747                62.2
1990    1611                62.3
2000    1485                56.1
2003    1456                55.2

Source: Editor & Publisher.

52. Firearm Fatalities. The following table summarizes deaths due to firearms in
different nations in a recent year.

                                              Fatal
Country      Total     Homicides   Suicides   accidents
U.S.         35,563    15,835      18,503     1225
Germany      1197      168         1004       25
Canada       1189      176         975        38
Australia    536       96          420        20
Spain        396       76          219        101
U.K.         277       72          193        12
Sweden       200       27          169        4
Vietnam      131       85          16         30
Japan        93        34          49         10

Source: Coalition to Stop Gun Violence.

53. Seasonal Effects on Schizophrenia? The graph in Figure 5.39 shows data
regarding the relative risk of schizophrenia among people born in different months.
a. Note that the scale of the vertical axis does not include zero. Sketch the same
risk curve using an axis that includes zero. Comment on the effect of this change.
b. Each value of the relative risk is shown with a dot at its most likely value and
with an error bar indicating the range in which the data value probably lies. The study

concludes that the risk was also significantly associated with the season of birth.
Given the size of the error bars, does this claim appear justified? (Is it possible
to draw a flat line that passes through all of the error bars?)

[FIGURE 5.39: relative risk (vertical axis, 0.6 to 1.4) plotted against month of
birth (January through December), with error bars. Source: New England Journal of
Medicine.]

54. Starting Salaries for Men and Women. Consider the data in the table below
showing the average starting salaries for men and women with various levels of
education. Construct a graphical display and write two paragraphs that demonstrate
as clearly as possible the evident disparity in the salaries of men and women.

                       Male        Female
Overall                $44,726     $28,367
Not a HS graduate      21,447      14,214
HS graduate only       33,266      21,659
Some college           36,419      22,615
Associate's degree     43,462      29,537
Bachelor's degree      63,084      38,447
Master's degree        76,896      48,205
Professional           136,128     72,445
Doctorate              95,894      73,516

Source: U.S. Census Bureau, 2003.

WEB PROJECTS
Find useful links for Web Projects on the text Web site:
www.aw.com/bennett-briggs

55. Weather Maps. Many Web sites offer contour maps with current weather data. For
example, you can use the Yahoo Weather site to generate many different contour
weather maps. Generate at least two contour weather maps and discuss what they show.

56. Cancer Cure. As shown in Figure 5.16, cancer is one of the leading causes of
death today. Nevertheless, scientists have made great progress in treating many
forms of cancer. Go to the American Cancer Society Web site and investigate
research into cancer cures. Read about one or two recent studies, and write a short
report on what you learn. Be sure to include graphics in your report.

57. USA Snapshot. The USA Today Web site offers a daily pictograph for its USA
Snapshot. Study today's snapshot. Briefly discuss its purpose and effectiveness.

IN THE NEWS
58. News Graphics. Find a recent news report that shows a multiple bar graph or
stack plot. Comment on the effectiveness of the display. Could another display have
been used to depict the same data?

59. Geographical Data. Find an example of a graph of geographical data in a recent
news report. Comment on the effectiveness of the display. Could another display
have been used to depict the same data?

60. Three-Dimensional Effects. Find an example of a three-dimensional display in a
recent news report. Are the data three-dimensional or are the three-dimensional
effects cosmetic? Comment on the effectiveness of the display. Could another
display have been used to depict the same data?

61. Graphic Confusion. Find an example in a recent news report of a graph that is
misleading in one of the ways discussed in this unit. Explain what makes the graph
misleading, and describe how it could have been drawn more honestly.

62. Outstanding News Graph. Find a graph from a recent news report that, in your
opinion, is truly outstanding in displaying data visually. Discuss what the graph
shows, and explain why you think it is so outstanding.

UNIT 5E Correlation and Causality

A major goal of many statistical studies is to determine whether one factor causes
another. For example, does smoking cause lung cancer? In this unit, we will discuss
how statistics can be used to search for correlations that might suggest a
cause-and-effect relationship. Then we'll explore the more difficult task of
establishing causality.

Seeking Correlation
What does it mean when we say that smoking causes lung cancer? It certainly does not
mean that you'll get lung cancer if you smoke a single cigarette. It does not even
mean that you'll definitely get lung cancer if you smoke heavily for many years,
since some heavy smokers do not get lung cancer. Rather, it is a statistical
statement meaning that you are much more likely to get lung cancer if you smoke
than if you don't smoke.

Smoking is one of the leading causes of statistics.
FLETCHER KNEBEL

Let's try to understand how researchers learned that smoking causes lung cancer.
Before they could investigate cause, researchers first needed to establish
correlations between smoking and cancer. The process of establishing correlations
began with observations. The early observations were informal. Doctors noticed that
smokers made up a surprisingly high proportion of their patients with lung cancer.
This suggestion of a linkage led to carefully conducted studies in which
researchers compared lung cancer rates among smokers and nonsmokers. These studies
showed clearly that heavier smokers were more likely to get lung cancer. In more
formal terms, we say that there is a correlation between the variables amount of
smoking and incidence of lung cancer. A correlation is a special type of
relationship between variables, in which a rise or fall in one goes along with a
corresponding rise or fall in the other.

DEFINITION

A correlation exists between two variables when higher values of one variable
consistently go with higher values of another or when higher values of one vari-
able consistently go with lower values of another.

Here are a few other examples of correlations:

There is a correlation between the variables height and weight for people. That is,
taller people tend to weigh more than shorter people.

There is a correlation between the variables demand for apples and price of apples.
That is, demand tends to decrease as prices increase.

There is a correlation between practice time and skill among piano players. That is,
those who practice more tend to be more skilled.

Establishing a correlation between two variables does not mean that a change in
one variable causes a change in the other. Thus, finding the correlation between
smoking and lung cancer did not by itself prove that smoking causes lung cancer. We
could imagine, for example, that some gene predisposes a person both to smoking and
to lung cancer. Nevertheless, identifying the correlation was the crucial first
step in learning that smoking causes lung cancer.

By the Way
Smoking is linked to many serious diseases besides lung cancer, including heart
disease and emphysema. Smoking is also linked with less lethal health conditions
such as premature skin wrinkling and sexual impotence.

Time out to think


Suppose there really were a gene that made people prone to both smoking and
lung cancer. Explain why we would still find a strong correlation between smoking
and lung cancer in that case, but would not be able to say that smoking caused
lung cancer.

Scatter Diagrams
Table 5.6 shows the production cost and gross receipts (total revenue from ticket
sales) for the 15 biggest-budget science fiction and fantasy movies of all time (through
mid-2006). Movie executives presumably hope there is a favorable correlation
between the production budget and the receipts. That is, they hope that spending
more to produce a movie will result in higher box office receipts. But is there such a
correlation? We can look for a correlation by making a scatter diagram showing the
relationship between the variables production cost and gross receipts.

TABLE 5.6 Biggest-Budget Science Fiction and Fantasy Movies


Production Cost Gross Receipts
Movie (millions of dollars) (millions of dollars)
King Kong (2005) 207 218
Spider-Man 2 (2004) 200 373
Chronicles of Narnia (2005) 180 292
Waterworld (1995) 175 88
Van Helsing (2004) 170 120
Polar Express (2004) 170 172
Terminator 3 (2003) 170 150
Poseidon (2006) 160 52
Batman Begins (2005) 150 205
Harry Potter/Goblet of Fire (2005) 150 290
Armageddon (1998) 140 201
Men in Black 2 (2002) 140 190
Spider-Man (2002) 139 403
Final Fantasy: The Spirits Within (2001) 137 32
Hulk (2003) 137 132

Note: Gross receipts are for United States only; worldwide receipts are often sub-
stantially higher. These figures are not adjusted for inflation.

DEFINITION

A scatter diagram is a graph in which each point represents the values of two
variables.

The following procedure describes how we make the scatter diagram, which is
shown in Figure 5.40:
1. We assign one variable to each axis, and we label each axis with values that
comfortably fit the data. Here, we assign production cost to the horizontal axis
and gross receipts to the vertical axis. We choose a range of $50 to $250 million
for the production cost axis and $0 to $450 million for the gross receipts axis.
2. For each movie in Table 5.6, we plot a single point at the horizontal position
corresponding to its production cost and the vertical position corresponding to
its gross receipts. For example, the point for the movie Waterworld goes at a
position of $175 million on the horizontal axis and $88 million on the vertical
axis. The dashed lines on Figure 5.40 show how we locate this point.
3. (Optional) If we wish, we can label data points, as is done for selected points in
Figure 5.40.

[FIGURE 5.40 Scatter diagram for the data in Table 5.6. Horizontal axis: Production
cost (millions of dollars), 50 to 250. Vertical axis: Gross receipts (millions of
dollars), 0 to 450. Labeled points: Spider-Man, Spider-Man 2, Harry Potter/Goblet of
Fire, Chronicles of Narnia, King Kong, Batman Begins, Terminator 3, Hulk, Van
Helsing, Waterworld, Poseidon.]

Technical Note
We often have some reason to think that one variable depends at least in part on
the other. In the case of Figure 5.40, we might guess that gross receipts should
depend on the production cost. We therefore call production cost the explanatory
variable and gross receipts the response variable, because the production cost
might help explain the gross receipts. The explanatory variable is usually plotted
on the horizontal axis and the response variable on the vertical axis.

Time out to think


By studying Table 5.6, associate each of the unlabeled data points in Figure 5.40
with a particular movie.

Types of Correlation
Look carefully at the scatter diagram for movies in Figure 5.40. The dots seem to be
scattered about with no apparent pattern. In other words, at least for these big-budget
movies, there appears to be little or no correlation between the amount of money
spent producing the movie and the amount of money it earned in gross receipts.
Now consider the scatter diagram in Figure 5.41, which shows the weights (in
carats) and retail prices of 23 diamonds. Here, the dots show a clear upward trend,
indicating that larger diamonds generally cost more. The correlation is not perfect.
For example, the heaviest diamond is not the most expensive. But the overall trend
seems fairly clear. Because the prices tend to increase with the weights, we say that
Figure 5.41 shows a positive correlation.

[FIGURE 5.41 A scatter diagram for diamond weights and prices. Horizontal axis:
Weight (carats), 0 to 2.5. Vertical axis: Price (dollars), 0 to 18,000. Annotation:
Higher weight generally goes with higher price, so this is a positive correlation.]

[FIGURE 5.42 A scatter diagram for life expectancy and infant mortality. Horizontal
axis: Life expectancy (years), 50 to 80. Vertical axis: Infant mortality (deaths per
1000 live births), 0 to 120. Labeled countries: Bangladesh, Pakistan, India, Egypt,
Brazil, Peru, Kenya, Guatemala, Mexico, Russia, Israel, Czech Republic, Greece,
South Korea, Canada, Australia. Annotation: Higher life expectancy generally goes
with lower infant mortality, so this is a negative correlation.]

In contrast, Figure 5.42 shows a scatter diagram for the variables life expectancy and
infant mortality in 16 countries. We again see a clear trend, but this time it is a
negative correlation: Countries with higher life expectancy tend to have lower infant
mortality.
Besides stating whether a correlation exists, we can also discuss its strength. The
more closely the data follow the general trend, the stronger is the correlation.

RELATIONSHIPS BETWEEN TWO DATA VARIABLES

No correlation: There is no apparent relationship between the two variables.

Positive correlation: Both variables tend to increase (or decrease) together.

Negative correlation: The two variables tend to change in opposite directions,
with one increasing while the other decreases.

Strength of a correlation: The more closely two variables follow the general
trend, the stronger the correlation (which may be either positive or negative). In a
perfect correlation, all data points lie on a straight line.

By the Way
In statistics, the correlation coefficient provides a quantitative measure of the
strength of a correlation. It is defined to be 1 for a perfect (meaning all data
points lie on a single straight line) positive correlation, −1 for a perfect
negative correlation, and 0 for no correlation.
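
The correlation coefficient can be computed directly from paired data. The sketch below assumes the standard (Pearson) formula, which the text does not spell out, and the function name correlation_coefficient is our own. It applies the formula to two made-up data sets that lie exactly on straight lines and to the movie data in Table 5.6.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient for paired data (a value from -1 to 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying exactly on straight lines (not from the text).
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # perfect positive correlation
falling = [10 - 3 * x for x in xs]   # perfect negative correlation

# Table 5.6: production cost and gross receipts (millions of dollars).
cost = [207, 200, 180, 175, 170, 170, 170, 160, 150, 150, 140, 140, 139, 137, 137]
receipts = [218, 373, 292, 88, 120, 172, 150, 52, 205, 290, 201, 190, 403, 32, 132]

r_rising = correlation_coefficient(xs, rising)
r_falling = correlation_coefficient(xs, falling)
r_movies = correlation_coefficient(cost, receipts)

print(f"rising line:  r = {r_rising:.2f}")
print(f"falling line: r = {r_falling:.2f}")
print(f"movies:       r = {r_movies:.2f}")
```

The two straight-line data sets give values of 1 and −1, matching the definition above. The movie data give a value much closer to 0 than to 1, consistent with the scattered appearance of Figure 5.40.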

EXAMPLE 1 Inflation and Unemployment

Prior to the 1990s, most economists assumed that the unemployment rate and the
inflation rate were negatively correlated. That is, when unemployment goes down,
inflation goes up, and vice versa. Table 5.7 shows unemployment and inflation data
for the period 1990–2006. Make a scatter diagram for these data. Based on your
diagram, does it appear that the data support the historical claim of a link
between the unemployment and inflation rates?

TABLE 5.7 U.S. Inflation and Unemployment


Unemployment Inflation Unemployment Inflation
Year Rate (%) Rate (%) Year Rate (%) Rate (%)
1990 5.6 5.4 1999 4.3 2.2
1991 6.8 4.2 2000 4.0 3.4
1992 7.5 3.0 2001 4.2 1.8
1993 6.9 3.0 2002 5.8 1.6
1994 6.1 2.6 2003 6.0 2.3
1995 5.6 2.8 2004 5.5 2.7
1996 5.4 3.0 2005 5.1 3.4
1997 4.9 2.3 2006 4.6 3.4
1998 4.6 2.3

Source: U.S. Bureau of Labor Statistics; 2006 data through May of that year.

SOLUTION We make the scatter diagram by plotting the variable unemployment rate
on the horizontal axis and the variable inflation rate on the vertical axis. To make the
graph easy to read, we use values ranging from 3.5% to 8% for the unemployment
rate and from 0 to 6% for the inflation rate. Figure 5.43 shows the result. To the eye,
there does not appear to be any obvious correlation between the two variables. (A
calculation confirms that there is no appreciable correlation.) Thus, these data do not
support the historical claim of a negative correlation between the unemployment and
inflation rates.

[FIGURE 5.43 Scatter diagram for the data in Table 5.7. Horizontal axis: Unemployment
rate (%). Vertical axis: Inflation rate (%).]
Now try Exercises 23–24.
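
The solution mentions a calculation without showing it. One way it might be carried out is sketched below, assuming the calculation is the usual correlation coefficient: the average product of the two variables' deviations from their means, divided by the product of their standard deviations. The variable names are our own; the data are the 17 rows of Table 5.7.

```python
from statistics import mean, pstdev

# Table 5.7: U.S. unemployment and inflation rates (%), 1990 through 2006.
unemployment = [5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.6,
                4.3, 4.0, 4.2, 5.8, 6.0, 5.5, 5.1, 4.6]
inflation = [5.4, 4.2, 3.0, 3.0, 2.6, 2.8, 3.0, 2.3, 2.3,
             2.2, 3.4, 1.8, 1.6, 2.3, 2.7, 3.4, 3.4]

mean_u = mean(unemployment)
mean_i = mean(inflation)

# Average product of deviations from the two means (the covariance).
covariance = mean([(u - mean_u) * (i - mean_i)
                   for u, i in zip(unemployment, inflation)])

# Dividing by both (population) standard deviations gives a unit-free value
# between -1 and 1.
r = covariance / (pstdev(unemployment) * pstdev(inflation))

print(f"correlation coefficient: r = {r:.2f}")
```

A strong negative correlation would give a value near −1. For these data the result is nowhere near −1, which agrees with the conclusion that the table does not support the historical claim.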

EXAMPLE 2 Accuracy of Weather Forecasts


The scatter diagrams in Figure 5.44 show two weeks of data comparing the actual
high temperature for the day with the same-day forecast (left diagram) and the three-
day forecast (right diagram). Discuss the types of correlation on each diagram.

[FIGURE 5.44 Comparison of actual high temperatures with same-day and three-day
forecasts. Left panel: Actual temperature (°F) versus Same-day forecast (°F). Right
panel: Actual temperature (°F) versus Three-day forecast (°F).]

SOLUTION Both scatter diagrams show a general trend in which higher predicted
temperatures mean higher actual temperatures. Thus, both show positive correla-
tions. However, the points in the left diagram lie more nearly on a straight line, indi-
cating a stronger correlation than in the right diagram. This makes sense, because we
expect weather forecasts to be more accurate on the same day than three days in
advance. Now try Exercises 25–26.

Possible Explanations for a Correlation


We began by stating that correlations can help us search for cause-and-effect
relationships. But we've already seen that causality is not the only possible
explanation for a correlation. For example, the predicted temperatures on the
horizontal axis of Figure 5.44 certainly do not cause the actual temperatures on
the vertical axis. The following box summarizes three possible explanations for a
correlation.

POSSIBLE EXPLANATIONS FOR A CORRELATION

1. The correlation may be a coincidence.
2. Both variables might be directly influenced by some common underlying cause.
3. One of the correlated variables may actually be a cause of the other. Note that,
even in this case, we may have identified only one of several causes.

EXAMPLE 3 Explanation for a Correlation


Consider the correlation between infant mortality and life expectancy in Figure 5.42.
Which of the three possible explanations for a correlation applies? Explain.
SOLUTION The negative correlation between infant mortality and life expectancy is
probably an example of common underlying cause. Both variables respond to an
underlying variable that we might call quality of health care. In countries where health
care is better in general, infant mortality is lower and life expectancy is higher.
Now try Exercises 27–28.

EXAMPLE 4 How to Get Rich in the Stock Market (Maybe)

Every financial advisor has a strategy for predicting the direction of the stock
market. Most focus on fundamental economic data, such as interest rates and
corporate profits. But an alternative strategy relies on a remarkable correlation
between the Super Bowl winner in January and the direction of the stock market for
the rest of the year: The stock market tends to rise when a team from the old,
pre-1970 NFL wins the Super Bowl, and tends to fall otherwise. This correlation
successfully matched 28 of the first 32 Super Bowls to the stock market. Suppose
that the Super Bowl just ended and the winner was the Detroit Lions, an old NFL
team. Should you invest all your spare cash (and maybe even some that you borrow)
in the stock market?
SOLUTION Based on the reported correlation, you might be tempted to invest, since
the old-NFL winner suggests a rising stock market over the rest of the year. However,
this investment would make sense only if you believed that the Super Bowl result
actually causes the stock market to move in a particular direction. This belief is clearly
preposterous, and the correlation is undoubtedly a coincidence. If you are going to
invest, don't base your investment on this correlation. Now try Exercises 29–34.

By the Way
The Super Bowl Indicator went into a slump after Super Bowl 32, correctly
predicting the stock market's direction in only one of the next seven years.

Establishing Causality
Suppose you have discovered a correlation and suspect causality. How can you test
your suspicion? Let's return to the issue of smoking and lung cancer. The strong
correlation between smoking and lung cancer did not by itself prove that smoking
causes lung cancer. In principle, we could have looked for proof with a controlled
experiment. But such an experiment would be unethical, since it would require
forcing a group of randomly selected people to smoke cigarettes. So how was smoking
established as a cause of lung cancer?

The truth is rarely pure and never simple.
OSCAR WILDE

The answer involves several lines of evidence. First, researchers found correlations
between smoking and lung cancer among many groups of people: women, men, and
people of different races and cultures. Second, among groups of people that seemed
otherwise identical, lung cancer was found to be rarer in nonsmokers. Third, people
who smoked more and for longer periods of time were found to have higher rates of
lung cancer. Fourth, when researchers accounted for other potential causes of lung
cancer (such as exposure to radon gas or asbestos), they found that almost all the
remaining lung cancer cases occurred among smokers.
These four lines of evidence made a strong case, but still did not rule out the possi-
bility that some other factor, such as genetics, predisposes people both to smoking
and to lung cancer. However, two additional lines of evidence made this possibility
highly unlikely. One line of evidence came from animal experiments. In controlled
experiments, animals were divided into randomly chosen treatment and control
groups. The experiments still found a correlation between inhalation of cigarette
smoke and lung cancer, which seems to rule out a genetic factor, at least in the
animals. The final line of evidence came from biologists studying cell cultures (that is,
small samples of human lung tissue). The biologists discovered the basic process by
which ingredients in cigarette smoke can create cancer-causing mutations. This
process does not appear to depend in any way on specific genetic factors, making it all
but certain that lung cancer is caused by smoking and not by any preexisting genetic
factor.

The following box summarizes these ideas about establishing causality. Generally
speaking, the case for causality is stronger when more of these guidelines are met.

GUIDELINES FOR ESTABLISHING CAUSALITY

To investigate whether a suspected cause actually causes an effect:

1. Look for situations in which the effect is correlated with the suspected cause
even while other factors vary.
2. Among groups that differ only in the presence or absence of the suspected
cause, check that the effect is similarly present or absent.
3. Look for evidence that larger amounts of the suspected cause produce larger
amounts of the effect.
4. If the effect might be produced by other potential causes (besides the suspected
cause), make sure that the effect still remains after accounting for these other
potential causes.
5. If possible, test the suspected cause with an experiment. If the experiment
cannot be performed with humans for ethical reasons, consider doing the experiment
with animals, cell cultures, or computer models.
6. Try to determine the physical mechanism by which the suspected cause produces
the effect.

By the Way
The first four guidelines for establishing causality are called Mill's methods,
after the English philosopher and economist John Stuart Mill (1806–1873). Mill was
a leading scholar of his time and an early advocate of women's right to vote.

Time out to think


There's a great deal of controversy concerning whether animal experiments are
ethical. What is your opinion of animal experiments? Defend your opinion.

CASE STUDY Air Bags and Children

By the mid-1990s, passenger-side air bags had become commonplace in cars.
Statistical studies showed that the air bags saved many lives in moderate- to
high-speed collisions. But a disturbing pattern also appeared. In at least some
cases, young children, especially infants and toddlers in child car seats, were
killed by air bags in low-speed collisions.
At first, many safety advocates found it difficult to believe that air bags could be the
cause of the deaths. But the observational evidence became stronger, meeting the first
four guidelines for establishing causality. For example, the greater risk to infants in
child car seats fit Guideline 3, because it indicated that being closer to the air bags
increased the risk of death. (A child car seat sits on top of the built-in seat, thereby
putting a child closer to the air bags than the child would be otherwise.)
To seal the case, safety experts undertook experiments using dummies. They found
that children, because of their small size, often sit where they could be easily hurt by
the explosive opening of an air bag. The experiments also showed that an air bag
could impact a child car seat hard enough to cause death, thereby revealing the
physical mechanism by which the deaths occurred.

By the Way
Based on these studies, the government now recommends that child car seats never be
used on the front seat, and that children under age 12 sit in the back seat if
possible.

CASE STUDY What Is Causing Global Warming?

Statistical measurements show that the global average temperature (the average
temperature everywhere on Earth's surface) has risen about 1.5°F in the past
century, with more than half of this warming occurring in just the past 30 years.
But what is causing this so-called global warming?
Scientists have for decades suspected that the temperature rise is tied to an
increase in the atmospheric concentration of carbon dioxide and other greenhouse
gases. Comparative studies of Earth and other planets, particularly Venus and Mars,
show that the greenhouse gas concentration is the single most important factor in
determining a planet's average temperature. It is even more important than distance
from the Sun. For example, Venus, which is about 30% closer than Earth to the Sun,
would be only about 45°F warmer than Earth if it had an Earth-like atmosphere. But
because Venus has a thick atmosphere made almost entirely of carbon dioxide, its
actual surface temperature is about 880°F, hot enough to melt lead. The reason
greenhouse gases cause warming is that they slow the escape of heat from a planet's
surface, thereby raising the surface temperature.
In other words, the physical mechanism by which greenhouse gases cause warming
is well understood (satisfying Guideline 6 on our list), and there is no doubt that a
large rise in carbon dioxide concentration would eventually cause Earth to become
much warmer. Nevertheless, as you've surely heard, many people have questioned
whether the current period of global warming really is due to humans or whether it
might be due to natural variations in the carbon dioxide concentration or other
natural factors.
In an attempt to answer these questions, the United States and other nations have
devoted billions of dollars over the past two decades to an unprecedented effort to
understand Earth's climate. We still have much more to learn, but the research to date
makes a strong case for human input of greenhouse gases as the cause of global
warming. Two lines of evidence make the case particularly strong.
The first line of evidence comes from careful measurements of past and present
carbon dioxide concentrations in Earth's atmosphere. Figure 5.45 shows the data.
Notice that past changes in the carbon dioxide concentration correlate clearly with
temperature changes, confirming that we should expect a rising greenhouse gas
concentration to cause rising temperatures. Moreover, while the past data show that
the carbon dioxide concentration does indeed vary naturally, they also show that
the recent rise is much greater than any natural increase during the past several
hundred thousand years. Human activity is the only viable explanation for the huge
recent increase in carbon dioxide concentration.
The second line of evidence comes from experiments. We cannot perform controlled
experiments with our entire planet, but we can run experiments with computer
models that simulate the way Earth's climate works. Earth's climate is incredibly
complex, and many uncertainties remain in attempts to model the climate on
computers. However, today's models are the result of decades of work and refinement.
Each time a model of the past failed to match real data, scientists sought to
understand the missing (or incorrect) ingredients in the model and then tried again
with improved models. Today's models are not perfect, but they match real climate
data quite well, giving scientists confidence that the models have predictive
value. Figure 5.46 compares model data and real data, showing good agreement and
clearly suggesting that human activity is the cause of global warming. If you
include the effects of the greenhouse gases put into the atmosphere by humans, the
models agree with the data, but if you leave out these effects, the models fail.

By the Way
Carbon dioxide and other greenhouse gases are present naturally in Earth's
atmosphere, which is a good thing. Without them, Earth's average temperature would
be a frigid −10°F; with them, the global average temperature is about 59°F. From
this perspective, the problem with global warming is that human input of carbon
dioxide and other greenhouse gases into our atmosphere is rapidly causing our
planet to have too much of a good thing.

By the Way
Global warming is a major issue because computer models suggest it will have severe
consequences. Among the predicted consequences are an increase in the strength and
frequency of hurricanes and other severe storms, a rise in sea level due to both
heating of the oceans and melting of glacial ice, and major changes to local
weather patterns around the world.

[FIGURE 5.45 The atmospheric concentration of carbon dioxide and global average
temperature over the past 400,000 years. The recent CO2 data (right) represent
direct measurements (at Mauna Loa, Hawaii); the past data come from studies of air
bubbles trapped in Antarctic ice. The concentration is measured in parts per
million (ppm). Axes: Temperature change (°C, relative to past millennium) and CO2
(ppm) versus Years ago (400,000 to 0); right panel, CO2 (ppm) versus Year (1960 to
2010). Annotations: Periods of higher CO2 concentration coincide with times of
higher global average temperature. Human use of fossil fuels has raised CO2 levels
above all peaks occurring in the past 400,000 years.]

[FIGURE 5.46 This graph compares the predictions of various climate models (green
swath) with observed temperature changes (red line) since about 1860. The agreement
is not perfect, telling us we still have much to learn, but it is good enough to
give us confidence that greenhouse gases are indeed causing global warming. Axes:
Change (compared to past average global temperature) (°C) versus Year (1850 to
2000). Annotations: Observations show a clear rise in average global temperatures
(red line) . . . agreeing with models (green swath) that include effects of
greenhouse gases released by humans.]

Time out to think


Check the idea that human activity causes global warming against each of the six
guidelines for establishing causality.

Confidence in Causality
If human activity is causing global warming, we'd be wise to change our activities so as
to stop it. But while we have good reason to think that this is the case, not everyone is
yet convinced. Moreover, the changes needed to slow global warming might be very
expensive. How do we decide when we've reached the point where something like
global warming requires steps to address it?
In an ideal world, we would continue to study the issue until we could establish
for certain that human activity is the cause of global warming. However, we have
seen that it is difficult to establish causality and often impossible to prove causality
beyond all doubt. We are therefore forced to make decisions about global warming,
and many other important issues, despite remaining uncertainty about cause and
effect.
In other areas of mathematics, accepted techniques help us deal with uncertainty
by allowing us to calculate numerical measures of possible errors. But there are no
accepted ways to assign such numbers to the uncertainty that comes with questions of
causality. Fortunately, another area of study has dealt with practical problems of
causality for hundreds of years: our legal system. You may be familiar with the
following three broad ways of expressing a legal level of confidence.

BROAD LEVELS OF CONFIDENCE IN CAUSALITY

Possible cause: We have discovered a correlation, but cannot yet determine
whether the correlation implies causality. In the legal system, possible cause (such
as thinking that a particular suspect possibly caused a particular crime) is often the
reason for starting an investigation.

Probable cause: We have good reason to suspect that the correlation involves
cause, perhaps because some of the guidelines for establishing causality are
satisfied. In the legal system, probable cause is the general standard for getting a judge
to grant a warrant for a search or wiretap.

Cause beyond reasonable doubt: We have found a physical model that is so
successful in explaining how one thing causes another that it seems unreasonable to
doubt the causality. In the legal system, cause beyond reasonable doubt is the usual
standard for conviction. It generally demands that the prosecution show how and
why (essentially the physical model) the suspect committed the crime. Note that
beyond reasonable doubt does not mean beyond all doubt.

By the Way
For criminal trials, the Supreme Court endorsed this guidance from Justice Ginsburg:
"Proof beyond a reasonable doubt is proof that leaves you firmly convinced of the
defendant's guilt. There are very few things in this world that we know with absolute
certainty, and in criminal cases the law does not require proof that overcomes every
possible doubt. If, based on your consideration of the evidence, you are firmly convinced
that the defendant is guilty of the crime charged, you must find him guilty. If on the
other hand, you think there is a real possibility that he is not guilty, you must give him
the benefit of the doubt and find him not guilty."

While these broad levels remain fairly vague, they give us at least some common
language for discussing confidence in causality. If you study law, you will learn much
more about the subtleties of interpreting these terms. However, because statistics has
little to say about them, we will not discuss them much further in this book.

Time out to think


Given what you know about global warming, do you think that human activity is a
possible cause, probable cause, or cause beyond reasonable doubt? Defend your
opinion. Based on your level of confidence in the causality, how would you recom-
mend setting policies with regard to global warming?

EXERCISES 5E

QUICK QUIZ
Choose the best answer to each of the following questions.
Explain your reasoning with one or more complete sentences.

1. If X is correlated with Y,
a. X causes Y.
b. increasing values of X go with increasing values of Y.
c. increasing values of X go with either increasing or decreasing values of Y.

2. Consider Figure 5.42. According to this diagram, life expectancy in Russia is about
a. 22 years. b. 63 years. c. 58 years.

3. If the points on a scatter diagram fall on a nearly straight line sloping upward,
the two variables have
a. a strong positive correlation.
b. a weak negative correlation.
c. no correlation.

4. If the points on a scatter diagram fall into a broad swath that slopes downward,
the two variables have
a. a strong positive correlation.
b. a weak negative correlation.
c. no correlation.

5. When can you rule out the possibility that changes to variable X cause changes to
variable Y?
a. when there is no correlation between X and Y
b. when there is a negative correlation between X and Y
c. when a scatter diagram of the two variables shows points lying in a straight line

6. What type of correlation would you expect between wages and the unemployment rate?
a. none
b. positive: higher wages would go with higher unemployment
c. negative: higher wages would go with lower unemployment

7. You have found a higher rate of birth defects among babies born to women exposed to
second-hand smoke. To support a claim that the second-hand smoke caused the birth
defects, what else should you expect to find?
a. evidence that higher rates of defects are correlated with exposure to greater
amounts of smoke
b. evidence that these types of birth defects occur only in babies whose mothers were
exposed to smoke, and never to any other babies
c. evidence that the types of birth defects in these babies are more debilitating than
other types of birth defects

8. Consider Figure 5.45. According to this graph, how does the CO2 concentration today
compare to the highest CO2 concentrations during the 400,000 years before humans
began industry?
a. The values are about the same.
b. Today's value is about 10% higher.
c. Today's value is about 30% higher.

9. Based on the trend shown in Figure 5.45, predict the CO2 concentration in the year 2040.
a. 390 ppm b. 420 ppm c. 600 ppm

10. A jury finding that a person is guilty beyond reasonable doubt is supposed to mean that
a. the person is definitely guilty.
b. the 12 members of the jury each felt that there was more than a 50% chance that the
person was guilty.
c. any reasonable person would conclude that the evidence was sufficient to establish guilt.

REVIEW QUESTIONS
11. What is a correlation? Give three examples of pairs of variables that are correlated.

12. What is a scatter diagram, and how do you make one? How can we use a scatter
diagram to look for a correlation?

13. Define and distinguish among positive correlation, negative correlation, and no
correlation. How do we determine the strength of a correlation?

14. Describe the three general categories of explanation for a correlation. Give an
example of each.

15. Briefly describe each of the six guidelines presented in this unit for establishing
causality. Give an example of the application of each guideline.

16. Briefly describe three levels of confidence in causality and how they can be useful
when we do not have absolute proof of causality.

DOES IT MAKE SENSE?
Decide whether each of the following statements makes sense (or is clearly true) or
does not make sense (or is clearly false). Explain your reasoning.

17. There is a strong negative correlation between the price of tickets and the number
of tickets sold. This suggests that if we want to sell a lot of tickets, we should lower
the price.

18. There is a strong positive correlation between the amount of time spent studying
and grades in mathematics classes. This suggests that if you want to get a good grade,
you should spend more time studying.

19. I found a nearly perfect positive correlation between variable A and variable B,
and therefore was able to conclude that an increase in variable A causes an increase in
variable B.

20. I found a nearly perfect negative correlation between variable C and variable D,
and therefore was able to conclude that an increase in variable C causes a decrease in
variable D.

21. I had originally suspected that an increase in variable E would cause a decrease in
variable F, but I no longer believe this because I found no correlation between the
two variables.

22. I agree that we should require kids to wear helmets if helmets really lower injury
rates, but it makes no sense to start this requirement until we have absolute proof that
helmets cause the lower injury rate.

BASIC SKILLS & CONCEPTS
Interpreting Scatter Diagrams. Exercises 23-26 each show a scatter diagram with its
axes labeled. For each exercise, do the following:
a. Indicate the variables for which we can seek a correlation with this diagram.
b. State whether the diagram shows a positive correlation, a negative correlation, or no
correlation. If there is a positive or negative correlation, state whether it is strong
or weak.
c. In words, summarize any conclusions you can draw from the diagram.

23. [Scatter diagram: "2004 Model Cars." Horizontal axis: weight of cars (pounds),
1500 to 4500. Vertical axis: city gas mileage (mi/gal), 10 to 35.]

24. [Scatter diagram: "U.S. Presidential Elections, 1964-2004." Horizontal axis: voter
turnout (%), 50 to 70. Vertical axis: unemployment (%), 0 to 8.]

25. [Scatter diagram: "Employees of Big Co." Horizontal axis: salary level (dollars per
year), $30,000 to $270,000. Vertical axis: percent of income given to charity, 0 to 10.]

26. [Scatter diagram: "U.S. Farms 1950-2000." Horizontal axis: number of farms
(millions), 0 to 6. Vertical axis: average size (acres), 0 to 500.]

Types of Correlation. Exercises 27-34 list pairs of variables. State the units you would
use to measure each of the two variables (for example, pounds, years, or miles per hour).
Then state whether you believe the two variables are correlated. If you believe they are
correlated, state whether the correlation is positive or negative and strong or weak.
Explain your reasoning.

27. Latitude north of the equator and average high temperature in June

28. Height of individual and amount of pocket change

29. Age and time spent daily on cell phone

30. Altitude on a mountain hike and air pressure

31. Population of a state and average salary of public school teachers

32. Population of a state and percentage of foreign-born residents

33. Fertility rate of women and life expectancy in the country

34. Family income of public school students and experience of teacher

FURTHER APPLICATIONS
Making Scatter Diagrams. Exercises 35-40 each give a table of data. In each case, do the
following:
a. Make a scatter diagram for the data.
b. State whether the two variables appear to be correlated and, if so, whether the
correlation is positive or negative and strong or weak.
c. Suggest a reason for the correlation (or lack of correlation). If you suspect
causality, briefly discuss what further evidence you would need to establish it.

35. Defense and Economy. The table below gives the per capita gross national product
and the per capita expenditure on defense for eight developed countries. Gross national
product (GNP) is a measure of the total economic output of a country in monetary terms.
Per capita GNP is the GNP averaged over every person in the country.

Country           Per capita GNP ($)   Per capita defense ($)
Australia         26,900               350
France            31,000               553
Germany           30,120               328
Israel            17,380               1673
Japan             37,180               310
Norway            52,000               659
United Kingdom    33,940               583
United States     41,400               1128

36. The following table gives number of home runs and batting average for baseball's
Most Valuable Players, 1996-2005 (NL = National League and AL = American League).

Player                          Home runs   Batting average
Ken Caminiti (1996 NL)          40          .326
Juan Gonzalez (1996 AL)         47          .314
Larry Walker (1997 NL)          49          .366
Ken Griffey Jr. (1997 AL)       56          .304
Sammy Sosa (1998 NL)            66          .308
Juan Gonzalez (1998 AL)         45          .318
Chipper Jones (1999 NL)         45          .319
(continued)

Player                          Home runs   Batting average
Ivan Rodriguez (1999 AL)        35          .332
Jeff Kent (2000 NL)             33          .334
Jason Giambi (2000 AL)          43          .333
Barry Bonds (2001 NL)           73          .328
Ichiro Suzuki (2001 AL)         8           .350
Barry Bonds (2002 NL)           46          .370
Miguel Tejada (2002 AL)         34          .308
Barry Bonds (2003 NL)           45          .341
Alex Rodriguez (2003 AL)        47          .298
Barry Bonds (2004 NL)           45          .362
Vladimir Guerrero (2004 AL)     39          .337
Albert Pujols (2005 NL)         41          .330
Alex Rodriguez (2005 AL)        48          .321

37. The following table gives per capita personal income and percent of the population
below the poverty level for ten states in 2004.

State            Per capita personal income (dollars)   Percent of population below poverty level
California       35,019                                 13.1
Colorado         36,063                                 9.7
Illinois         34,351                                 12.6
Iowa             30,560                                 8.9
Minnesota        35,861                                 7.4
Montana          26,857                                 15.1
Nevada           33,405                                 10.9
New Hampshire    37,040                                 5.8
Utah             26,606                                 9.1
West Virginia    25,872                                 17.4
Source: U.S. Census Bureau; U.S. Bureau of Economic Analysis.

38. The following table gives the average hours of television watched in households in
five categories of annual income. (Hint: For the first and last categories of the
household income data, place the dot at the position corresponding to $25,000 and
$65,000, respectively. For other categories, place the dot at the center of each bin.)

Household income      Weekly TV hours
Less than $30,000     56.3
$30,000-$40,000       51.0
$40,000-$50,000       50.5
$50,000-$60,000       49.7
More than $60,000     48.7
Source: Nielsen Media Research.

39. The following table gives the average teacher salary and the expenditure on public
education per pupil for ten states in 2004.

State            Average teacher salary (dollars)   Per pupil expenditure (dollars)
Alabama          38,325                             6701
Alaska           51,736                             9808
Arizona          41,843                             5474
Connecticut      57,337                             11,774
Massachusetts    53,181                             10,772
North Dakota     35,441                             6683
Oregon           49,169                             7587
Texas            40,476                             7168
Utah             38,976                             5245
Wyoming          39,532                             9673
Source: National Education Association.

40. The following table gives mean daily Caloric intake (all residents) and infant
mortality rate (per 1000 births) for ten countries.

Country          Mean daily Calories   Infant mortality rate (per 1000 births)
Afghanistan      1523                  154
Austria          3495                  6
Burundi          1941                  114
Colombia         2678                  24
Ethiopia         1610                  107
Germany          3443                  6
Liberia          1640                  153
New Zealand      3362                  7
Turkey           3429                  44
United States    3671                  7
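
The hint in Exercise 38 describes a small procedure for turning income categories into
horizontal positions on a scatter diagram. The Python sketch below is one possible way to
express it and is not part of the original exercise. The function name dot_positions and
the use of None for a missing bin edge are assumptions made only for illustration; the
$25,000 and $65,000 positions and the table values come from the exercise itself.

```python
# Horizontal positions for the dots in the Exercise 38 scatter diagram.
# Closed bins are (low, high); an open-ended bin uses None for the missing edge.
# The names and data layout here are hypothetical, chosen only for illustration.

OPEN_LOW_POSITION = 25_000   # hint: plot "Less than $30,000" at $25,000
OPEN_HIGH_POSITION = 65_000  # hint: plot "More than $60,000" at $65,000


def dot_positions(bins):
    """Return one x-coordinate (in dollars) for each income bin."""
    positions = []
    for low, high in bins:
        if low is None:
            positions.append(OPEN_LOW_POSITION)
        elif high is None:
            positions.append(OPEN_HIGH_POSITION)
        else:
            positions.append((low + high) / 2)  # center of a closed bin
    return positions


income_bins = [(None, 30_000), (30_000, 40_000), (40_000, 50_000),
               (50_000, 60_000), (60_000, None)]
weekly_tv_hours = [56.3, 51.0, 50.5, 49.7, 48.7]  # values from the Exercise 38 table

# Pair each x-position with its table value to get the points to plot.
for x, y in zip(dot_positions(income_bins), weekly_tv_hours):
    print(f"plot a dot at (${x:,.0f}, {y} hours)")
```

The sketch only produces the coordinates; drawing the diagram and judging the
correlation are left to the exercise.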

Correlation and Causality. Exercises 41-46 make statements about a correlation. In each
case, state the correlation clearly (for example, there is a positive correlation between
variable A and variable B). Then state whether the correlation is most likely due to
coincidence, a common underlying cause, or a direct cause. Explain your answer.

41. In a large resort city, the crime rate increased at the same time that the number of
tourists increased.

42. Over the past three decades, the number of miles of freeways in Los Angeles has
grown, and traffic congestion has worsened.

43. When gasoline prices rise, sales of sport utility vehicles decline.

44. Sales of ice cream in a local restaurant are positively correlated with sales of
swimming suits at a local store.

45. Automobile gas mileage decreases with tire pressure.

46. Over a period of twenty years, the number of ministers and priests in a city
increased, as did attendance at movies.

47. Identifying Causes: Headaches. You are trying to identify the cause of late-afternoon
headaches that plague you several days each week. For each of the following tests and
observations, explain which of the six guidelines for establishing causality you used and
what you concluded.
• The headaches occur only on days that you go to work.
• If you stop drinking Coke at lunch on days you go to work, the headaches persist.
• In the summer, the headaches occur less frequently if you open the windows of your
office slightly. They occur even less often if you open the windows of your office fully.
Having made all these observations, what reasonable conclusion can you reach about the
cause of the headaches?

48. Smoking and Lung Cancer. There is a strong correlation between tobacco smoking and
incidence of lung cancer, and most physicians believe that tobacco smoking causes lung
cancer. Yet, not everyone who smokes gets lung cancer. Briefly describe how smoking could
cause cancer when not all smokers get cancer.

49. Longevity of Orchestra Conductors. A famous study in Forum on Medicine (1978)
concluded that the mean lifetime of conductors of major orchestras was 73.4 years, about
5 years longer than that of all American males at the time. The author claimed that a
life of music causes a longer life. Evaluate the claim of causality and propose other
explanations for the longer life expectancy of conductors.

50. High-Voltage Power Lines. Suppose that people living near a particular high-voltage
power line have a higher incidence of cancer than people living farther from the power
line. Can you conclude that the high-voltage power line is the cause of the elevated
cancer rate? If not, what other explanations might there be for it? What other types of
research would you like to see before you conclude that high-voltage power lines cause
cancer?

51. Soccer and Birthdays. A recent study revealed that the best soccer players in the
world tend to have birthdays in the earlier months of the year. Is this a coincidence or
can you find a plausible explanation?

WEB PROJECTS
Find useful links for Web Projects on the text Web site: www.aw.com/bennett-briggs

52. Success in the NFL. Use the Web to find last season's NFL team statistics. Make a
table showing the following for each team: number of wins, average yards gained on
offense per game, and average yards allowed on defense per game. Make scatter diagrams to
explore the correlations between offense and wins and between defense and wins. Discuss
your findings. Do you think that there are other team statistics that would yield
stronger correlations with the number of wins?

53. Statistical Abstract. Explore the "frequently requested tables" at the Web site for
the Statistical Abstract of the United States. Choose data that are of interest to you
and explore at least two correlations. Briefly discuss what you learn from the
correlations.

54. Air Bags and Children. Starting from the Web site of the National Highway Traffic
Safety Administration, research the latest studies on the safety of air bags, especially
with regard to children. Write a short report summarizing your findings and offering
recommendations for improving child safety in cars.

55. Global Warming. Use the Web to find recent information about global warming and its
potential consequences. Discuss the evidence linking human activity to global warming. In
light of your findings, suggest how we should deal with the issue of global warming.

56. Tobacco Lawsuits. Tobacco companies have been the subject of many lawsuits relating
to the dangers of smoking. Research one recent lawsuit. What were the plaintiffs trying
to prove? What statistical evidence did they use? How well do you think they established
causality? Did they win? Summarize your findings in one to two pages.

IN THE NEWS
57. Correlations in the News. Find a recent news report that describes some type of
correlation. Describe the correlation. Does the article give any sense of the strength of
the correlation? Does it suggest that the correlation reflects any underlying causality?
Briefly discuss whether you believe the implications the article makes with respect to
the correlation.

58. Causation in the News. Find a recent news report in which a statistical study has led
to a conclusion of causation. Describe the study and the claimed causation. Do you think
the claim of causation is legitimate? Explain.

59. Legal Causation. Find a news report concerning an ongoing legal case, either civil or
criminal, in which establishing causality is important to the outcome. Briefly describe
the issue of causation in the case and how the ability to establish or refute causality
will influence the outcome of the case.

CHAPTER 5 SUMMARY

UNIT 5A
KEY TERMS
statistics (is a science; are data)
population, sample
population parameters, sample statistics
bias
observational study
case-control study
experiment
placebo, placebo effect
blinding (single-blind, double-blind)
margin of error
confidence interval
KEY IDEAS AND SKILLS
Understand and interpret the five basic steps in a statistical study.
Understand the importance of a representative sample.
Be familiar with four common sampling methods:
  simple random sampling
  systematic sampling
  convenience sampling
  stratified sampling
Distinguish between observational studies and experiments; also recognize observational
case-control studies.
Understand the placebo effect and the importance of blinding in experiments.
Find a confidence interval from a margin of error:
  from (sample statistic - margin of error)
  to (sample statistic + margin of error)

UNIT 5B
KEY TERMS
selection bias
participation bias
variable (in a statistical study)
KEY IDEAS AND SKILLS
Understand and apply eight guidelines for evaluating a statistical study.
(Continues on the next page)
benn.8206.05.pgs 12/15/06 8:23 AM Page 398

398 CHAPTER 5 Statistical Reasoning

UNIT 5C
KEY TERMS
frequency table (categories, frequency, relative frequency, cumulative frequency)
data types (qualitative, quantitative)
bar chart
pie chart
histogram
line chart
time-series diagram
KEY IDEAS AND SKILLS
Interpret and create frequency tables.
Interpret and create bar graphs and pie charts.
Interpret and create histograms and line charts.

UNIT 5D
KEY TERMS
multiple bar graph
stack plot
geographical data
contour map
KEY IDEAS AND SKILLS
Interpret multiple bar graphs, stack plots, contour maps, and other media graphs.
Distinguish between true three-dimensional data and graphs that have a
three-dimensional look for cosmetic reasons only.
Be aware of common cautions about graphs.

UNIT 5E
KEY TERMS
correlation
cause
scatter diagram
KEY IDEAS AND SKILLS
Distinguish between correlation and causality.
Create and interpret scatter diagrams and use them to identify correlations:
  positive, negative, or no correlation
  strength of correlation
Know three possible explanations for a correlation:
  coincidence
  common underlying cause
  true cause
Understand and apply six guidelines for establishing causality.
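
The Unit 5A rule for finding a confidence interval from a margin of error (subtract the
margin from the sample statistic for the lower end, add it for the upper end) can be
written as a short Python sketch. The function name confidence_interval and the poll
numbers below are hypothetical, chosen only to illustrate the arithmetic; they do not
come from the chapter.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high) = (statistic - margin, statistic + margin)."""
    if margin_of_error < 0:
        raise ValueError("margin of error must be non-negative")
    return (sample_statistic - margin_of_error,
            sample_statistic + margin_of_error)


# Hypothetical poll: 52% support with a margin of error of 3 percentage points.
low, high = confidence_interval(52, 3)
print(f"from {low}% to {high}%")  # prints "from 49% to 55%"
```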
