Algebra 1

Section 9.3

Data with 2 Variables
Two Way Frequency Tables

Sometimes a data set holds information about more than one characteristic. For example,
people can be interviewed about their age and color preference. To summarize the frequencies of
each possible category for both variables, a two way frequency table can be used. One variable’s
categories are listed as columns and the other’s categories are listed as rows. Each cell at the
intersection of a row and a column gives the number of individuals who fit both categories. Totals
are also given at the end of each row and column, so the data for each variable on its own can also
be read from the table. For example, consider the two way frequency table below, which summarizes
the gender and favorite color of a group of people who were interviewed.

Color     Male   Female   Total
Red          3       12      15
Blue        11        5      16
Green       10        4      14
Purple       4        8      12
Orange       3        5       8
Total       31       34      65

The intersection of Female and Green, for example, represents the number of people interviewed
who were female and preferred green, in this case 4.

The intersection of the total column and row represents the total number of people who were
interviewed, in this case 65.
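
The totals in the table can be checked by hand or with a short program. The following is a minimal Python sketch that assumes the counts are stored in a nested dictionary; the variable names (counts, row_totals, column_totals, grand_total) are illustrative choices and do not come from the text.

```python
# Counts copied from the table above: favorite color (rows) by gender (columns).
counts = {
    "Red":    {"Male": 3,  "Female": 12},
    "Blue":   {"Male": 11, "Female": 5},
    "Green":  {"Male": 10, "Female": 4},
    "Purple": {"Male": 4,  "Female": 8},
    "Orange": {"Male": 3,  "Female": 5},
}

# Row totals: how many people chose each color.
row_totals = {color: sum(row.values()) for color, row in counts.items()}

# Column totals: how many males and how many females were interviewed.
column_totals = {
    gender: sum(row[gender] for row in counts.values())
    for gender in ("Male", "Female")
}

# Grand total: everyone who was interviewed.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

The row totals and the column totals must both add up to the same grand total, which is a quick way to catch an arithmetic mistake in a two way frequency table.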

There are several statistics about data with two variables that can be found directly from two
way frequency tables. These statistics and other characteristics of data with two variables will be
discussed in this section.

Relative Frequencies And Association

Data with two variables can show not only information about the variables separately, but also
the effect the variables have on each other, or the association between the variables.

Relative frequencies help show whether there is an association between categories of two
different variables. They can be computed from a two way frequency table. For reference, an example
two way frequency table is given below.

Color     Male   Female   Total
Red          3       12      15
Blue        11        5      16
Green       10        4      14
Purple       4        8      12
Orange       3        5       8
Total       31       34      65

Joint frequency is the frequency of two conditions happening together. For example, in the table
above, the joint frequency of males who like the color green is 10. These are occurrences of people
both fitting under the category of male and fitting under the category of preferring green. Each
cell in a table that is the intersection of two conditions (not totals) represents a joint frequency.

Marginal frequency can be found on the margin of a table. It is the frequency of a single category
for a single variable occurring. It is found in a cell that is the intersection of a total column or
row and a non-total column or row. For example, the marginal frequency of people who like purple in
the table above is 12. That is, 12 people in total liked purple in the interview.
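
The difference between joint and marginal frequency can be sketched in Python as two kinds of lookup on the same table. This is only an illustration; the function names below are hypothetical and were chosen for this example.

```python
# Counts copied from the table above: favorite color (rows) by gender (columns).
counts = {
    "Red":    {"Male": 3,  "Female": 12},
    "Blue":   {"Male": 11, "Female": 5},
    "Green":  {"Male": 10, "Female": 4},
    "Purple": {"Male": 4,  "Female": 8},
    "Orange": {"Male": 3,  "Female": 5},
}

def joint_frequency(color, gender):
    """Number of people who fit both categories (an inner cell of the table)."""
    return counts[color][gender]

def marginal_frequency_of_color(color):
    """Number of people who chose this color, regardless of gender (a row total)."""
    return sum(counts[color].values())

def marginal_frequency_of_gender(gender):
    """Number of people of this gender, regardless of color (a column total)."""
    return sum(row[gender] for row in counts.values())

print(joint_frequency("Green", "Male"))        # inner cell
print(marginal_frequency_of_color("Purple"))   # row total
print(marginal_frequency_of_gender("Female"))  # column total
```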

Conditional relative frequency is not found directly in a table. It is the ratio of a joint frequency
to a marginal frequency. For example, among the 16 people who like blue (the marginal frequency), the
conditional relative frequency of men is 11 (the joint frequency of men who like blue) divided by 16,
and the conditional relative frequency of women is 5 (the joint frequency of women who like blue)
divided by 16. The results are 0.6875 and 0.3125 respectively. This means that, among people who like
blue, men outnumber women by more than two to one. Since nearly equal numbers of men (31) and women
(34) were interviewed, this suggests there is an association between gender and having blue as one’s
favorite color.
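
The division described above can be written as a short Python sketch. The variable names (blue, blue_total, conditional) are illustrative only.

```python
# Joint frequencies for the Blue row of the table above.
blue = {"Male": 11, "Female": 5}

# Marginal frequency: everyone who prefers blue.
blue_total = sum(blue.values())

# Conditional relative frequency of each gender, given that the person prefers blue.
conditional = {gender: count / blue_total for gender, count in blue.items()}

print(conditional["Male"])    # 11 / 16
print(conditional["Female"])  # 5 / 16
```

The conditional relative frequencies within one row always add up to 1, because every person counted in the row total belongs to exactly one of the columns.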

If associations were found between all or most of the category combinations, it would be plausible
to assume that there is an association in general between gender and color preference. Note, however,
that an association between the variables does not necessarily imply that one variable causes the
other.
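
One informal way to look at all the category combinations at once is to compare, for each color, the share of males among people who chose that color with the share of males among everyone interviewed. The Python sketch below does this for the table above. It is a rough comparison under the assumption that shares far from the overall share hint at an association; it is not a formal statistical test, and the variable names are illustrative.

```python
# Counts copied from the table above: favorite color (rows) by gender (columns).
counts = {
    "Red":    {"Male": 3,  "Female": 12},
    "Blue":   {"Male": 11, "Female": 5},
    "Green":  {"Male": 10, "Female": 4},
    "Purple": {"Male": 4,  "Female": 8},
    "Orange": {"Male": 3,  "Female": 5},
}

# Share of males among everyone interviewed (31 out of 65).
total_male = sum(row["Male"] for row in counts.values())
total_people = sum(sum(row.values()) for row in counts.values())
overall_male_share = total_male / total_people

# Share of males among the people who chose each color.
male_share_by_color = {
    color: row["Male"] / sum(row.values()) for color, row in counts.items()
}

for color, share in male_share_by_color.items():
    print(f"{color:<7} {share:.3f}  (overall {overall_male_share:.3f})")
```

If gender and color preference were unrelated, each color's share would be expected to sit close to the overall share.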

Scatter Plots

Data with two quantitative variables can be represented visually on scatter plots, which show
graphically whether the variables are associated and what form the association takes.

One of the variables is plotted on one axis while the other variable is plotted on a
perpendicular axis. Then, each data point containing two values is plotted on the axes as if the
data point were Cartesian (xy) coordinates.

For example, consider the scatter plot below that represents the association between the height
of a group of trees and their ages.

[Figure: scatter plot, Height of Tree vs. Age of Tree. Horizontal axis: Age (years); vertical axis: Height (feet).]

In general, the height of the tree increases as the age of the tree increases. This means there is
an association between the variables. In fact, the association looks linear. To find a linear equation
that might model the situation shown in the scatter plot, use the form y = mx + b and notice that
b = 0 since the height of the tree is 0 feet when the tree’s age is 0 years. To find the slope, take any
two points $(x_1, y_1)$ and $(x_2, y_2)$ that lie along the trend and use the formula
$m = \frac{y_2 - y_1}{x_2 - x_1}$.
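
The slope calculation can be sketched in Python as follows. The two points used here are hypothetical values chosen only for illustration; they are not read from the scatter plot. The function names slope and predicted_height are likewise assumptions of this example.

```python
def slope(point_1, point_2):
    """Slope of the line through two points: change in y divided by change in x."""
    x1, y1 = point_1
    x2, y2 = point_2
    if x2 == x1:
        raise ValueError("slope is undefined when both points have the same x value")
    return (y2 - y1) / (x2 - x1)

# Hypothetical points in the form (age in years, height in feet).
m = slope((2, 2.5), (6, 7.5))

# The text takes b = 0 because a tree of age 0 has height 0.
b = 0

def predicted_height(age):
    """Height predicted by the linear model y = mx + b."""
    return m * age + b

print(m)
print(predicted_height(4))
```

Different pairs of points from a real scatter plot will usually give slightly different slopes, so the resulting equation is a model of the trend rather than an exact description of every point.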

Not every scatter plot will show an association between variables, however. For example, consider
the plot below, which compares the population of a city with the height of the city’s
mayor.

[Figure: scatter plot, Height of Mayor vs. Population of City. Horizontal axis: City Population (hundred thousand); vertical axis: Height (feet).]

There is no apparent association between the variables. One can tell because the dots on the
scatter plot appear to follow no pattern.

Examples

Here are a few examples to test the concepts provided in this section. Answers can be found on
the following pages.

1. The following two way frequency table summarizes the vegetation found in dry and wet climates.

Vegetation      Dry   Wet   Total
Cacti            10     0      10
Thorny Shrubs     8     4      12
Wild Grass        5    20      25
Leafy Trees       1    10      11
Fruit Trees       6    15      21
Total            30    49      79

Based on the data in the above table, is there an association between climate type of an area
and vegetation found in that area?

2. After a survey was taken, the individual in charge of the survey wanted to know about the
participants. The table below summarizes the age and gender of the participants of the survey.

Age     Male   Female   Total
14         4        6      10
15         2        4       6
16         8        6      14
17        11        9      20
18        16       13      29
19        21       19      40
20        14       10      24
21         8        4      12
Total     84       71     155

a. What is the joint frequency of 19 year old males?
b. What is the marginal frequency of female participants?
c. What are the conditional relative frequencies for 16 year old participants?

3. The table below shows the temperature in degrees Fahrenheit on certain days and the number
of people at the park on those days. Make a scatter plot of the data. Does the data suggest
an association between temperature and people at the park? If there is an association, is it linear?

Temperature (degrees Fahrenheit)   People at the Park
30                                 10
40                                 12
50                                 15
60                                 25
70                                 27
80                                 17
90                                  8

Solutions

These are the solutions to the questions on the previous pages.

1. The conditional relative frequencies for each type of vegetation vary significantly between wet
and dry climates. That is, the vegetation found in an area clearly depends on the area’s climate.
Thus, there is an association between these variables. Cacti and thorny shrubs are significantly
more likely to grow in dry climates than wet climates. Wild grass and both types of trees are
significantly more likely to grow in wet climates than dry climates.
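
The conditional relative frequencies behind this answer can be worked out with a short Python sketch. For each type of vegetation it computes the share of observations that came from dry climates and then lists the types found mostly in dry climates. The variable names are illustrative, and the 0.5 cutoff is simply the point at which dry observations outnumber wet ones.

```python
# Counts copied from the table in question 1: vegetation (rows) by climate (columns).
vegetation = {
    "Cacti":         {"Dry": 10, "Wet": 0},
    "Thorny Shrubs": {"Dry": 8,  "Wet": 4},
    "Wild Grass":    {"Dry": 5,  "Wet": 20},
    "Leafy Trees":   {"Dry": 1,  "Wet": 10},
    "Fruit Trees":   {"Dry": 6,  "Wet": 15},
}

# Conditional relative frequency of a dry climate, given the type of vegetation.
dry_share = {
    plant: row["Dry"] / (row["Dry"] + row["Wet"])
    for plant, row in vegetation.items()
}

# Types of vegetation observed more often in dry climates than in wet climates.
mostly_dry = [plant for plant, share in dry_share.items() if share > 0.5]

for plant, share in dry_share.items():
    print(f"{plant:<14} dry: {share:.3f}  wet: {1 - share:.3f}")
print(mostly_dry)
```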

2. a. The joint frequency of 19 year old males can be found in the table in the cell intersected by
the male column and the 19 row. The joint frequency is 21.
b. The marginal frequency of female participants is the total number of female participants. In
this case, that’s 71.
c. There are 14 total participants who are 16 years old. The conditional relative frequencies of
males and females are found by dividing the joint frequencies of 16 year old male and 16 year old
female participants, respectively, by 14. For females, the conditional relative frequency is thus
$\frac{6}{14} \approx 0.429$. For males, the conditional relative frequency is thus
$\frac{8}{14} \approx 0.571$.

3. The scatter plot for the data is given below.

[Figure: scatter plot, Temperature vs. People at the Park. Horizontal axis: Temperature (degrees Fahrenheit); vertical axis: People at the park.]

The graph suggests there is an association between the variables. As the temperature increases
up to 70 degrees, the number of people at the park increases. However, once the temperature
increases past this point, the number of people at the park decreases. A linear relationship would
keep rising or keep falling at a roughly constant rate over the whole range. Thus, there is an
association between temperature and people at the park but the association is not linear.
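
The rise and fall can also be seen numerically. The Python sketch below computes the change in attendance from each temperature to the next; the variable names are illustrative. A mix of positive and negative changes rules out a single straight-line trend for this data, although changes that all share one sign would not by themselves prove that a relationship is linear.

```python
# Data copied from the table in question 3.
temperatures = [30, 40, 50, 60, 70, 80, 90]
people = [10, 12, 15, 25, 27, 17, 8]

# Change in attendance between each pair of consecutive temperatures.
changes = [later - earlier for earlier, later in zip(people, people[1:])]

# Temperature at which attendance was highest.
peak_temperature = temperatures[people.index(max(people))]

# True only if attendance never decreases or never increases across the table.
is_monotonic = all(c >= 0 for c in changes) or all(c <= 0 for c in changes)

print(changes)
print(peak_temperature)
print(is_monotonic)
```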
