
Graphs for categorical data
Bar Chart
When to Use Categorical data

How to construct
– Draw a horizontal line; write the categories or
labels below the line at regularly spaced
intervals
– Draw a vertical line; label the scale using
frequency or relative frequency
– Place equal-width rectangular bars above each
category label with a height determined by its
frequency or relative frequency
Bar Chart (continued)
What to Look For
Frequently or infrequently occurring
categories

Collect the following data and then display the data in a bar chart:
What is your favorite ice cream flavor?
Vanilla, chocolate, strawberry, or other


Each year the Princeton Review conducts a survey of
students applying to college and of parents of college
applicants. In 2009, 12,715 high school students
responded to the question “Ideally how far from home
would you like the college you attend to be?” Also,
3007 parents of students applying to college
responded to the question “How far from home would
you like the college your child attends to be?” The data
are displayed in the frequency table below.

Frequency
Ideal Distance          Students   Parents
Less than 250 miles     4450       1594
250 to 500 miles        3942        902
500 to 1000 miles       2416        331
More than 1000 miles    1907        180

Create a comparative bar chart with these data.
What should you do first?
Relative Frequency
Ideal Distance          Students   Parents
Less than 250 miles     .35        .53
250 to 500 miles        .31        .30
500 to 1000 miles       .19        .11
More than 1000 miles    .15        .06

Found by dividing the frequency by the total number of
students (or parents).
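The relative frequencies in the table can be reproduced with a short calculation. The following is a minimal sketch; the function name `relative_frequencies` is an assumption made for illustration, and the counts are copied from the frequency table.

```python
# Frequencies copied from the Princeton Review table.
categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
students = [4450, 3942, 2416, 1907]
parents = [1594, 902, 331, 180]


def relative_frequencies(counts):
    """Divide each frequency by the total for that group."""
    total = sum(counts)
    return [count / total for count in counts]


student_rel = relative_frequencies(students)
parent_rel = relative_frequencies(parents)

# Print one row per category, rounded to two decimals like the table.
for label, s, p in zip(categories, student_rel, parent_rel):
    print(f"{label:<22}{s:6.2f}{p:6.2f}")
```

Because the two groups differ greatly in size (12,715 students versus 3007 parents), the raw frequencies are not directly comparable, while the relative frequencies are.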
What does this graph
show about the ideal
distance college
should be from home?
Segmented (or Stacked) Bar Charts

When to Use Categorical data

How to construct
– MUST first calculate relative frequencies
– Draw a bar representing 100% of the group
– Divide the bar into segments corresponding to
the relative frequencies of the categories
Remember the Princeton survey . . .

Create a segmented bar graph with these data.

Relative Frequency
Ideal Distance          Students   Parents
Less than 250 miles     .35        .53
250 to 500 miles        .31        .30
500 to 1000 miles       .19        .11
More than 1000 miles    .15        .06

First draw a bar that represents 100% of the students who
answered the survey.
Next, divide the bar into segments.
Do the same thing for parents, and don’t forget a key
denoting each category.

[Figure: segmented bar chart of relative frequency (vertical
axis, 0 to 1.0) for Students and Parents, with a key for
Less than 250 miles, 250 to 500 miles, 500 to 1000 miles,
and More than 1000 miles]

Notice that this segmented bar chart displays the same
relationship between the opinions of students and parents
concerning the ideal distance that college is from home as
the double bar chart does.
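Dividing each bar into segments amounts to stacking the relative frequencies. The sketch below computes where each segment starts and ends on the 0 to 1 scale; `segment_bounds` is a hypothetical helper name, and the inputs are the rounded relative frequencies from the table.

```python
# Relative frequencies from the table (already rounded to two decimals).
students = [0.35, 0.31, 0.19, 0.15]
parents = [0.53, 0.30, 0.11, 0.06]


def segment_bounds(rel_freqs):
    """Return (bottom, top) for each stacked segment of one bar."""
    bounds = []
    bottom = 0.0
    for rf in rel_freqs:
        top = bottom + rf
        # Round only for display; the running total stays unrounded.
        bounds.append((round(bottom, 2), round(top, 2)))
        bottom = top
    return bounds


student_segments = segment_bounds(students)
parent_segments = segment_bounds(parents)

print("Students:", student_segments)
print("Parents: ", parent_segments)
```

The last segment of each bar ends at 1.0, which is the "100% of the group" that the bar represents.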
Pie (Circle) Chart
When to Use Categorical data

How to construct
– Draw a circle to represent the entire data set
– Calculate the size of each “slice”:
Relative frequency × 360°
– Using a protractor, mark off each slice

To describe
– comment on which category had the largest
proportion or smallest proportion
Typos on a résumé do not make a very good
impression when applying for a job. Senior
executives were asked how many typos in a
résumé would make them not consider a job
candidate. The resulting data are summarized
in the table below.
Number of Typos   Frequency   Relative Frequency
1                 60          .40
2                 54          .36
3                 21          .14
4 or more         10          .07
Don’t know         5          .03

Create a pie chart for these data.
First draw a circle to represent the entire data set.
Next, calculate the size of the slice for “1 typo”:
.40 × 360º = 144º
Draw that slice.
Repeat for each slice.

Here is the completed pie chart created using Minitab.

What does this pie chart tell us about the number of
typos occurring in résumés before the applicant would not
be considered for a job?
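The slice sizes for all five categories follow the same rule, relative frequency × 360°. This is a minimal sketch using the rounded relative frequencies from the table; the dictionary names are assumptions made for illustration.

```python
# Rounded relative frequencies from the typo table.
relative_frequency = {
    "1": 0.40,
    "2": 0.36,
    "3": 0.14,
    "4 or more": 0.07,
    "Don't know": 0.03,
}

# Size of each slice in degrees: relative frequency x 360.
angles = {label: rf * 360 for label, rf in relative_frequency.items()}

for label, angle in angles.items():
    print(f"{label:<12}{angle:6.1f} degrees")
```

The angles should add up to 360° (up to floating-point rounding), which is a quick check that no slice was missed.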
Dotplot
When to Use Small numerical data
sets

How to construct
– Draw a horizontal line and mark it with an
appropriate numerical scale
– Locate each value in the data set along the scale
and represent it by a dot. If there are two or
more observations with the same value, stack the
dots vertically
Dotplot (continued)
What to Look For
– The representative or typical value
– The extent to which the data values spread out
– The nature of the distribution along the number line
– The presence of unusual values

Collect the following data and then display the data in a dotplot:
How many body piercings do you have?
Double Bar Charts
When to Use Categorical data

How to construct
– Constructed like bar charts, but with two (or
more) groups being compared
– MUST use relative frequencies on the vertical
axis
– MUST include a key to denote the different bars
Why MUST we use relative frequencies?
Stem-and-Leaf Displays
When to Use Univariate numerical data
Use for small to moderate sized data sets. Doesn’t work
well for large data sets.

How to construct
– Select one or more of the leading digits for the stem
– List the possible stem values in a vertical column
– Record the leaf for each observation beside each
corresponding stem value
– Indicate the units for stems and leaves in a key
or legend

Each number is split into two parts:
Stem - consists of the first digit(s)
Leaf - consists of the final digit(s)

Be sure to list every stem from the smallest to the
largest value.
If you have a long list of leaves behind a few stems,
you can split stems in order to spread out the
distribution.
Can also create comparative stem-and-leaf displays.

To describe
– comment on the center, spread, and shape of the
distribution and if there are any unusual features

Remember the data set collected in Chapter 1: how many
piercings do you have? Would a stem-and-leaf display be a
good graph for this distribution? Why or why not?
The following data are the price per ounce for different
brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23

Create a stem-and-leaf display with these data.

What would an appropriate stem be?
List the stems vertically.
For the observation “0.32”, write the 2 behind the “3” stem.
Continue recording each leaf with the corresponding stem.

Stem   Leaf
1      7
2      1 9 8 3
3      2 6
4
5      4

Describe this distribution.
The median price per ounce for dandruff shampoo is $0.285,
with a range of $0.37. The distribution is positively
skewed with an outlier at $0.54.
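The display and the summary values for the shampoo data can be checked with a short script. This is a sketch under two assumptions: prices are converted to whole cents so the tenths digit is the stem and the hundredths digit is the leaf, and the helper name `stem_and_leaf` is hypothetical.

```python
from statistics import median

prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]

# Work in whole cents so digits can be split without floating-point surprises.
cents = [round(p * 100) for p in prices]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits), in data order."""
    lowest, highest = min(values) // 10, max(values) // 10
    # List every stem from smallest to largest, even if it has no leaves.
    stems = {stem: [] for stem in range(lowest, highest + 1)}
    for v in values:
        stems[v // 10].append(v % 10)
    return stems


display = stem_and_leaf(cents)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("Key: 3 | 2 represents $0.32 per ounce")

median_price = median(cents) / 100
price_range = (max(cents) - min(cents)) / 100
print("median:", median_price, "range:", price_range)
```

The empty stem 4 is kept on purpose: the gap is what makes the value at $0.54 stand out as an outlier.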
The Census Bureau projects the median age in 2030 for
the 50 states and Washington D.C. A stem-and-leaf
display is shown below.

Notice that you really cannot see a distinctive shape for
this distribution due to the long list of leaves.

We can split the stems in order to better see the shape of
the distribution. We use L for lower leaf values (0-4) and
H for higher leaf values (5-9).

Notice that now you can see the shape of this distribution.
The following data are on the percentage of primary-
school-aged children who are enrolled in school for 19
countries in Northern Africa and for 23 countries in
Central Africa.

Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9
98.8 91.6 97.8 96.1 92.2 94.9 98.6 86.6 96.9
88.9

Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9
43.0 85.0 63.4 58.4 61.9 40.9 73.9 34.8 74.4
97.4 61.0 66.7 79.6

Create a comparative stem-and-leaf display.
What is an appropriate stem?
Let’s truncate the leaves to the unit place.
“4.6” becomes “4”

Be sure to use comparative language when describing these
distributions!
The median percentage of primary-school-aged children
enrolled in school is larger for countries in Northern
Africa than in Central Africa, but the ranges are about
the same. The distribution for countries in Northern
Africa is strongly negatively skewed, but the distribution
for countries in Central Africa is approximately
symmetrical.
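Truncating the leaves and lining up both groups on a shared set of stems can be sketched as follows. The helper name `truncated_leaves` and the choice to print Central Africa on the left are assumptions made for illustration; the values are copied as they appear in the data listing (19 for Northern Africa and 22 for Central Africa).

```python
northern = [54.6, 34.3, 48.9, 77.8, 59.6, 88.5, 97.4, 92.5, 83.9,
            98.8, 91.6, 97.8, 96.1, 92.2, 94.9, 98.6, 86.6, 96.9, 88.9]
central = [58.3, 34.6, 35.5, 45.4, 38.6, 63.8, 53.9, 61.9, 69.9,
           43.0, 85.0, 63.4, 58.4, 61.9, 40.9, 73.9, 34.8, 74.4,
           97.4, 61.0, 66.7, 79.6]

# Shared stems (tens digit) so both sides use the same scale.
STEMS = range(3, 10)


def truncated_leaves(values):
    """Truncate to the units place, then group units digits by tens digit."""
    stems = {stem: [] for stem in STEMS}
    for v in values:
        whole = int(v)  # truncation, not rounding: 54.6 -> 54
        stems[whole // 10].append(whole % 10)
    return {stem: sorted(leaves) for stem, leaves in stems.items()}


north_display = truncated_leaves(northern)
central_display = truncated_leaves(central)

print(f"{'Central Africa':>20} |   | Northern Africa")
for stem in STEMS:
    # Left-hand leaves are reversed so they read outward from the stem.
    left = " ".join(str(d) for d in reversed(central_display[stem]))
    right = " ".join(str(d) for d in north_display[stem])
    print(f"{left:>20} | {stem} | {right}")
print("Key: 5 | 4 represents 54%")
```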
Histograms
When to Use Univariate numerical data

How to construct Discrete data
Constructed differently for discrete versus continuous data
– Draw a horizontal scale and mark it with the possible
values for the variable
– Draw a vertical scale and mark it with frequency or
relative frequency
– Above each possible value, draw a rectangle centered
at that value with a height corresponding to its
frequency or relative frequency
For comparative histograms, use two separate graphs with
the same scale on the horizontal axis
To describe
– comment on the center, spread, and shape of the
distribution and if there are any unusual features
Queen honey bees mate shortly after they
become adults. During a mating flight, the queen
usually takes several partners, collecting sperm
that she will store and use throughout the rest of
her life. A study on honey bees provided the
following data on the number of partners for 30
queen bees.

12 2 4 6 6 7 8 7 8 11 8 3 5 6 7 10 1 9
7 6 9 7 5 4 7 4 6 7 8 10

Create a histogram for the number of partners of the queen bees.
First draw a horizontal axis, scaled with the possible
values of the variable of interest.
Next draw a vertical axis, scaled with frequency or
relative frequency.
Draw a rectangle above each value with a height
corresponding to the frequency.

[Figure: histogram of number of partners (horizontal axis,
0 to 12) against frequency (vertical axis, 0 to 7)]

Suppose we use relative frequency instead of frequency on the vertical axis.

What do you notice about the shapes of these two histograms?
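The bar heights for both versions of the histogram can be tabulated directly from the bee data. This is a minimal sketch using the 30 observations listed above; the variable names are assumptions made for illustration.

```python
from collections import Counter

partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11, 8, 3, 5, 6, 7, 10, 1, 9,
            7, 6, 9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

counts = Counter(partners)
n = len(partners)

# One bar per possible value, including any value with a count of zero.
frequency = {value: counts.get(value, 0)
             for value in range(min(partners), max(partners) + 1)}
relative_frequency = {value: count / n for value, count in frequency.items()}

# A rough text histogram: value, bar of '#' marks, relative frequency.
for value in frequency:
    bar = "#" * frequency[value]
    print(f"{value:>2} {bar:<8} {relative_frequency[value]:.3f}")
```

Every relative frequency is the corresponding frequency divided by the same number (30), so only the vertical scale changes between the two histograms.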


Histograms
When to Use Univariate numerical data

How to construct Continuous data
This is the type of histogram that most students are
familiar with.

– Mark the boundaries of the class intervals on the
horizontal axis
– Draw a vertical scale and mark it with frequency or
relative frequency
– Draw a rectangle directly above each class interval
with a height corresponding to its frequency or
relative frequency
To describe
– comment on the center, spread, and shape of the
distribution and if there are any unusual features
A study examined the number of hours spent watching TV
per day for a sample of children age 1 and for a sample
of children age 3. Below are comparative histograms.

[Figure: comparative histograms, Children Age 1 and
Children Age 3]

Notice the common scale on the horizontal axis.
Write a few sentences comparing the distributions.

The median number of hours spent watching TV per day was
greater for the 1-year-olds than for the 3-year-olds. The
distribution for the 3-year-olds was more strongly skewed
right than the distribution for the 1-year-olds, but the
two distributions had similar ranges.


Histograms with unequal intervals
When to use
- when you have a concentration of data in the middle
with some extreme values

How to construct
- construct similar to histograms with continuous data,
but with density on the vertical axis

$$\text{density} = \frac{\text{relative frequency for interval}}{\text{width of interval}}$$
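The density calculation can be sketched in a few lines. The class intervals and relative frequencies below are hypothetical values chosen only to illustrate unequal widths; they do not come from any data set in these slides, and `densities` is an assumed helper name.

```python
# Hypothetical class intervals and relative frequencies, for illustration only.
intervals = [(0, 10), (10, 15), (15, 20), (20, 40)]
relative_frequency = [0.10, 0.30, 0.40, 0.20]


def densities(class_intervals, rel_freqs):
    """density = relative frequency for interval / width of interval"""
    return [rf / (upper - lower)
            for (lower, upper), rf in zip(class_intervals, rel_freqs)]


density = densities(intervals, relative_frequency)

# The area of each rectangle (density x width) recovers the relative frequency.
areas = [d * (upper - lower) for d, (lower, upper) in zip(density, intervals)]

for (lower, upper), d in zip(intervals, density):
    print(f"{lower:>2} to <{upper:<3} density = {d:.3f}")
```

With density on the vertical axis, the area of each rectangle, not its height, represents the proportion of observations in the interval, so wide intervals are not visually exaggerated.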
Cumulative Relative Frequency Plot
When to use
- used to answer questions about percentiles.
A percentile is a value with a given percent of
observations at or below that value.
How to construct
- Mark the boundaries of the intervals on the
horizontal axis
- Draw a vertical scale and mark it with cumulative
relative frequency
- Plot the point corresponding to the upper end of
each interval with its cumulative relative
frequency, including the beginning point
- Connect the points.
The National Climatic Center has been collecting
weather data for many years. The annual rainfall
amounts for Albuquerque, New Mexico from 1950 to
2008 were used to create the frequency distribution
below.

Find the cumulative relative frequency for each interval.

Annual Rainfall   Relative     Cumulative relative
(in inches)       frequency    frequency
4 to <5           0.052        0.052
5 to <6           0.103        0.155  (0.052 + 0.103)
6 to <7           0.086        0.241  (0.155 + 0.086)
7 to <8           0.103
8 to <9           0.172
9 to <10          0.069
10 to <11         0.207
11 to <12         0.103
12 to <13         0.052
13 to <14         0.052

Continue this pattern to complete the table.
Annual Rainfall   Relative     Cumulative relative
(in inches)       frequency    frequency
4 to <5           0.052        0.052
5 to <6           0.103        0.155
6 to <7           0.086        0.241
7 to <8           0.103        0.344
8 to <9           0.172        0.516
9 to <10          0.069        0.585
10 to <11         0.207        0.792
11 to <12         0.103        0.895
12 to <13         0.052        0.947
13 to <14         0.052        0.999

In the context of this problem, explain the meaning of a
value in the cumulative relative frequency column.
Why isn’t the final value one (1)?

To create a cumulative relative frequency plot, graph a
point for the upper value of each interval and its
cumulative relative frequency.
Plot a point for each interval.
Plot a starting point at (4,0).
Connect the points.
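The cumulative column and the points to plot can be generated from the relative frequencies with a running sum. This is a minimal sketch; the variable names are assumptions made for illustration, and the values are copied from the rainfall table.

```python
from itertools import accumulate

# Upper end of each interval (4 to <5, 5 to <6, ..., 13 to <14).
upper_ends = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
relative_frequency = [0.052, 0.103, 0.086, 0.103, 0.172,
                      0.069, 0.207, 0.103, 0.052, 0.052]

# Running total of the relative frequencies, rounded like the table.
cumulative = [round(c, 3) for c in accumulate(relative_frequency)]

# Points to plot: the starting point (4, 0), then one point per interval.
points = [(4, 0.0)] + list(zip(upper_ends, cumulative))

for x, y in points:
    print(f"({x}, {y})")
```

The final cumulative value comes out as 0.999 rather than 1, most likely because each relative frequency in the table was rounded to three decimal places before adding.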
[Figure: cumulative relative frequency plot; horizontal
axis Rainfall (2 to 14), vertical axis Cumulative relative
frequency (0 to 1.0)]

What proportion of years had rainfall amounts that were
9.5 inches or less?
Approximately 0.55

Approximately 30% of the years had annual rainfall less
than what amount?
Approximately 7.5 inches

Which interval of rainfall amounts had a larger
proportion of years: 9 to 10 inches or 10 to 11 inches?
Explain.
The interval 10 to 11 inches, because its slope is steeper,
indicating a larger proportion occurred.
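Reading values off the plot corresponds to linear interpolation along the connected line segments. The sketch below reproduces the three readings numerically; the function names are hypothetical, and the points are the ones from the completed rainfall table plus the starting point (4, 0).

```python
points = [(4, 0.0), (5, 0.052), (6, 0.155), (7, 0.241), (8, 0.344),
          (9, 0.516), (10, 0.585), (11, 0.792), (12, 0.895),
          (13, 0.947), (14, 0.999)]


def proportion_at_or_below(x, pts):
    """Read the cumulative relative frequency at rainfall x."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


def rainfall_at_proportion(p, pts):
    """Read the rainfall amount at cumulative relative frequency p."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 <= p <= y1:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p is outside the plotted range")


# Slope of each segment = proportion of years falling in that interval
# (every interval here is one inch wide).
slopes = {(x0, x1): (y1 - y0) / (x1 - x0)
          for (x0, y0), (x1, y1) in zip(points, points[1:])}

print(round(proportion_at_or_below(9.5, points), 2))
print(round(rainfall_at_proportion(0.30, points), 2))
print(slopes[(9, 10)] < slopes[(10, 11)])
```

Interpolation gives a proportion close to 0.55 at 9.5 inches and a rainfall amount between 7.5 and 7.6 inches at the 30% mark, in line with the approximate readings from the graph.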
Displaying Bivariate Numerical Data
Scatterplots
When to Use Bivariate numerical data
Scatterplots are discussed in much greater depth in
Chapter 5.
How to construct
- Draw a horizontal scale and mark it with
appropriate values of the independent variable
- Draw a vertical scale and mark it with appropriate
values of the dependent variable
- Plot each point corresponding to the
observations
To describe
- comment on the relationship between the variables
Time Series Plots
When to Use
- measurements collected over time at regular
intervals
Can be considered bivariate data where the y-variable is
the variable measured and the x-variable is time
How to construct
- Draw a horizontal scale and mark it with
appropriate values of time
- Draw a vertical scale and mark it with appropriate
values of the observed variable
- Plot each point corresponding to the
observations and connect
To describe
- comment on any trends or patterns over time
The accompanying time-series plot of movie box
office totals (in millions of dollars) over 18
weeks in the summer for 2001 and 2002
appeared in USA Today (September 3, 2002).

Describe any
trends or
patterns that
you see.
