
Chapter 3 Association: Contingency, Correlation, and Regression

Section 3.2 The Association Between Two Quantitative Variables

Example 4
Numerical and graphical summaries
Worldwide Internet and Facebook Use
Picture the Scenario
The number of worldwide Internet users and the number of users of social
networking sites such as Facebook have grown significantly over the past
decade. This growth, though, has not been distributed evenly throughout
the world. Countries such as Australia, Sweden, and the Netherlands have
achieved an Internet penetration of more than 80%, while only 7.1% of
India’s population uses the Internet. The story with Facebook is similar.
More than 40% of the populations of countries such as the United States and
Australia use Facebook, compared to fewer than 3% of the populations of
countries such as China, India, and Russia.
The Internet Use data file on the text CD contains recent data for 33
countries on Internet penetration, Facebook penetration, broadband sub-
scription percentage, and other variables related to Internet use. In this
example, we’ll investigate the relationship between Internet penetration and
Facebook penetration. Note that we will often say “use” instead of “penetra-
tion” in these two variable names. Table 3.4 displays the values of these two
variables for each of the 33 countries.

Table 3.4 Internet and Facebook Penetration Rates For 33 Countries

Country Internet Penetration Facebook Penetration

Argentina 49.40% 30.53%

Australia 80.60% 46.01%

Belgium 67.30% 36.98%

Brazil 37.76% 4.39%

Canada 72.30% 52.08%

Chile 50.90% 46.14%

China 22.40% 0.05%

Colombia 38.80% 25.90%

Egypt 12.90% 5.68%

France 65.70% 32.91%

Germany 67.00% 14.07%

Hong Kong 69.50% 52.33%

India 7.10% 1.52%

Indonesia 10.50% 13.49%

Italy 48.80% 30.62%

Japan 73.80% 2.00%

Malaysia 62.80% 37.77%

Mexico 24.90% 16.80%

Netherlands 82.90% 20.54%

Peru 26.20% 13.34%

Philippines 21.50% 19.68%

Poland 52.00% 11.79%

Russia 27.00% 2.99%

Saudi Arabia 22.70% 11.65%

South Africa 10.50% 7.83%

Spain 66.80% 30.24%

Sweden 80.70% 44.72%

Taiwan 66.10% 38.21%

Thailand 20.50% 10.29%

Turkey 35.00% 31.91%

USA 77.33% 46.98%

UK 70.18% 45.97%

Venezuela 25.50% 28.64%


Source: Data from www.internetworldstats.com and www.checkfacebook.com.

Question to Explore
Use numerical and graphical summaries to describe the shape, center, and vari-
ability of the distributions of Internet penetration and Facebook penetration.
Think It Through
Using MINITAB, we obtain the following numerical measures of center and
variability:

Variable       N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Internet Use  33  47.00  24.40     7.00  24.00   49.00  68.50    83.00
Facebook Use  33  24.73  16.49     0.00  11.00   26.00  38.00    52.00
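
The same measures can be sketched in Python with the standard `statistics` module. This is a minimal illustration, assuming the Table 3.4 values are typed in by hand rather than read from the Internet Use data file; the names `DATA` and `summarize` are illustrative, not part of the text. Results may differ slightly from the MINITAB output, because software packages can round the data and compute quartiles in slightly different ways.

```python
import statistics

# Table 3.4 as (country, Internet penetration %, Facebook penetration %).
DATA = [
    ("Argentina", 49.40, 30.53), ("Australia", 80.60, 46.01), ("Belgium", 67.30, 36.98),
    ("Brazil", 37.76, 4.39), ("Canada", 72.30, 52.08), ("Chile", 50.90, 46.14),
    ("China", 22.40, 0.05), ("Colombia", 38.80, 25.90), ("Egypt", 12.90, 5.68),
    ("France", 65.70, 32.91), ("Germany", 67.00, 14.07), ("Hong Kong", 69.50, 52.33),
    ("India", 7.10, 1.52), ("Indonesia", 10.50, 13.49), ("Italy", 48.80, 30.62),
    ("Japan", 73.80, 2.00), ("Malaysia", 62.80, 37.77), ("Mexico", 24.90, 16.80),
    ("Netherlands", 82.90, 20.54), ("Peru", 26.20, 13.34), ("Philippines", 21.50, 19.68),
    ("Poland", 52.00, 11.79), ("Russia", 27.00, 2.99), ("Saudi Arabia", 22.70, 11.65),
    ("South Africa", 10.50, 7.83), ("Spain", 66.80, 30.24), ("Sweden", 80.70, 44.72),
    ("Taiwan", 66.10, 38.21), ("Thailand", 20.50, 10.29), ("Turkey", 35.00, 31.91),
    ("USA", 77.33, 46.98), ("UK", 70.18, 45.97), ("Venezuela", 25.50, 28.64),
]


def summarize(values):
    """Return measures of center and variability for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "StDev": statistics.stdev(values),  # sample standard deviation
        "Minimum": min(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(values),
    }


internet_summary = summarize([row[1] for row in DATA])
facebook_summary = summarize([row[2] for row in DATA])

for name, summary in [("Internet Use", internet_summary),
                      ("Facebook Use", facebook_summary)]:
    print(name, {key: round(value, 2) for key, value in summary.items()})
```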

Figure 3.4 portrays the distributions using histograms. We observe that the dis-
tribution of Internet use is unimodal and skewed to the left. The distribution of
Facebook use can be characterized as roughly symmetric with multiple modes.

Figure 3.4 MINITAB Histograms of Internet Use and Facebook Use for the 33 Countries. Question: Which nations, if any,
might be outliers in terms of Internet use? Facebook use? Which graphical display would more clearly identify potential outliers?

Insight
The histograms portray each variable separately. How can we portray the
association between Internet use and Facebook use on a single display?
We’ll study how to do that next.

Try Exercise 3.13, part a

Looking for a Trend: The Scatterplot


With two quantitative variables, it is common to denote the response variable y
and the explanatory variable x. We use this notation because graphical plots for
examining the association use the y-axis for values of the response variable and
the x-axis for values of the explanatory variable. This graphical plot is called a
scatterplot.

Scatterplot
A scatterplot is a graphical display for two quantitative variables using the horizontal (x)
axis for the explanatory variable x and the vertical (y) axis for the response variable y. The
values of x and y for a subject are represented by a point relative to the two axes. The
observations for the n subjects are n points on the scatterplot.
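
The definition can be illustrated with a short, dependency-free Python sketch. It places a handful of (x, y) pairs from Table 3.4 on a coarse character grid, treating Internet use as the explanatory variable x (horizontal axis) and Facebook use as the response variable y (vertical axis). The helper `text_scatterplot`, the grid size, and the axis limits are illustrative assumptions; in practice a statistical package or plotting library would draw the graph.

```python
# A few (x, y) pairs from Table 3.4: x = Internet use (%), y = Facebook use (%).
POINTS = {
    "India": (7.10, 1.52),
    "Egypt": (12.90, 5.68),
    "China": (22.40, 0.05),
    "Brazil": (37.76, 4.39),
    "Italy": (48.80, 30.62),
    "Japan": (73.80, 2.00),
    "USA": (77.33, 46.98),
    "Netherlands": (82.90, 20.54),
}


def text_scatterplot(points, width=50, height=15, x_max=100.0, y_max=60.0):
    """Draw each (x, y) pair as '*' on a character grid.

    x increases from left to right and y increases from bottom to top,
    so each subject contributes exactly one point to the display.
    """
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in points:
        col = round(x / x_max * width)
        row = height - round(y / y_max * height)  # row 0 is the top of the plot
        grid[row][col] = "*"
    lines = ["|" + "".join(row) for row in grid]   # vertical (y) axis
    lines.append("+" + "-" * (width + 1))          # horizontal (x) axis
    return "\n".join(lines)


plot = text_scatterplot(POINTS.values())
print(plot)
```

With a grid this coarse, two nearby observations could land in the same cell, so the sketch is only meant to show how each subject's x and y values determine where its point is plotted.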

Example 5

Scatterplots
Internet and Facebook Use


Picture the Scenario
We return to the data from Example 4 for 33 countries on Internet and
Facebook use.

Questions to Explore
a. Display the relationship between Internet use and Facebook use with
a scatterplot.
b. What can we learn about the association by inspecting the scatterplot?

Think It Through
a. The first step is to identify the response variable and the explana-
tory variable. We’ll study how Facebook use depends on Internet
use. A temporal relationship exists between when individuals become
Internet users and when they become Facebook users; the former pre-
cedes the latter. We will treat Internet use as the explanatory variable
and Facebook use as the response variable. Thus, we use x to denote
Internet use and y to denote Facebook use. We plot Internet use on
the horizontal axis and Facebook use on the vertical axis. Any statisti-
cal software package can create a scatterplot. Using data such as that
in Table 3.4, place Internet use in one column and Facebook use in
another. Select the variable that plays the role of x and the variable
that plays the role of y. Figure 3.5 shows the scatterplot created with
MINITAB.
Consider the observation for the United States. It has Internet use x = 77
and Facebook use y = 47. Find its point circled in black.

Figure 3.5 MINITAB Scatterplot for Internet Use and Facebook Use for 33
Countries. The point for Japan is labeled and has coordinates x = 74 and y = 2.
Question: Is there any point that you would identify as standing out in some way? Which
country does it represent, and how is it unusual in the context of these variables?

b. Here are some things we learn by inspecting the scatterplot:

• There is a clear trend. Nations with larger percentages of Internet
use generally have larger percentages of Facebook use.
• For countries with relatively low Internet use (below 20%), there is
little variability in Facebook use. Facebook use ranges from about
2% to 13% across these countries.
• For countries with high Internet use (above 20%), there is high
variability in Facebook use. Facebook use ranges from nearly 0%
(China) to about 52% for these countries.
• The point for Japan seems unusual. Its Internet use is among the
highest of all countries (74%), while its Facebook use is among the
lowest (2%). Based on values for other countries with similarly high
Internet use, we might expect Facebook use to be between 25% and
50% rather than 2%. Although not as unusual as Japan, Facebook
use for the Netherlands (21%) is a little lower than we'd expect for
a country with such high Internet use (83%). Can you identify the
point for the Netherlands on the scatterplot?
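
These observations can be checked against Table 3.4 with a short Python sketch. It assumes the table values are entered by hand; the names `DATA`, `facebook_range`, `low`, `high`, and `top_internet`, and the 70% cutoff used to pick out "similarly high Internet use", are illustrative choices rather than anything specified in the text.

```python
# Table 3.4 as (country, Internet penetration %, Facebook penetration %).
DATA = [
    ("Argentina", 49.40, 30.53), ("Australia", 80.60, 46.01), ("Belgium", 67.30, 36.98),
    ("Brazil", 37.76, 4.39), ("Canada", 72.30, 52.08), ("Chile", 50.90, 46.14),
    ("China", 22.40, 0.05), ("Colombia", 38.80, 25.90), ("Egypt", 12.90, 5.68),
    ("France", 65.70, 32.91), ("Germany", 67.00, 14.07), ("Hong Kong", 69.50, 52.33),
    ("India", 7.10, 1.52), ("Indonesia", 10.50, 13.49), ("Italy", 48.80, 30.62),
    ("Japan", 73.80, 2.00), ("Malaysia", 62.80, 37.77), ("Mexico", 24.90, 16.80),
    ("Netherlands", 82.90, 20.54), ("Peru", 26.20, 13.34), ("Philippines", 21.50, 19.68),
    ("Poland", 52.00, 11.79), ("Russia", 27.00, 2.99), ("Saudi Arabia", 22.70, 11.65),
    ("South Africa", 10.50, 7.83), ("Spain", 66.80, 30.24), ("Sweden", 80.70, 44.72),
    ("Taiwan", 66.10, 38.21), ("Thailand", 20.50, 10.29), ("Turkey", 35.00, 31.91),
    ("USA", 77.33, 46.98), ("UK", 70.18, 45.97), ("Venezuela", 25.50, 28.64),
]


def facebook_range(rows):
    """Return (minimum, maximum) Facebook use among the given rows."""
    values = [facebook for _, _, facebook in rows]
    return min(values), max(values)


# Split the countries at 20% Internet use, as in the discussion of variability.
low = [row for row in DATA if row[1] < 20]
high = [row for row in DATA if row[1] >= 20]
print("Internet use below 20%:", len(low), "countries, Facebook range", facebook_range(low))
print("Internet use above 20%:", len(high), "countries, Facebook range", facebook_range(high))

# Among countries with Internet use above 70% (an illustrative cutoff),
# list Facebook use from lowest to highest to spot unusual points.
top_internet = sorted((row for row in DATA if row[1] > 70), key=lambda row: row[2])
for country, internet, facebook in top_internet:
    print(f"{country:12s} Internet {internet:5.2f}%  Facebook {facebook:5.2f}%")
```

In the sorted listing, Japan and the Netherlands come first, which matches their description as the points that stand out from the overall trend.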

Insight
Although the points for Japan and, to a lesser extent, the Netherlands, can
be considered atypical, there is a clear overall association. The countries with
lower Internet use tend to have lower Facebook use, and the countries with
high Internet use tend to have high Facebook use.

Try Exercise 3.12, parts a and b

How to Examine a Scatterplot


We examine a scatterplot to study association. How do values on the response
variable change as values of the explanatory variable change? As Internet use
gets higher, for instance, we see that Facebook use gets higher. When there’s
a trend in a scatterplot, what’s the direction? Is the association positive or
negative?
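
One rough way to check the direction numerically is sketched below in Python: split the countries at the median Internet use and compare typical Facebook use in the two halves. This is an informal check under the assumption that a median split is informative enough, not a method from the text, and the names `DATA`, `lower`, and `upper` are illustrative.

```python
import statistics

# Table 3.4 as (country, Internet penetration %, Facebook penetration %).
DATA = [
    ("Argentina", 49.40, 30.53), ("Australia", 80.60, 46.01), ("Belgium", 67.30, 36.98),
    ("Brazil", 37.76, 4.39), ("Canada", 72.30, 52.08), ("Chile", 50.90, 46.14),
    ("China", 22.40, 0.05), ("Colombia", 38.80, 25.90), ("Egypt", 12.90, 5.68),
    ("France", 65.70, 32.91), ("Germany", 67.00, 14.07), ("Hong Kong", 69.50, 52.33),
    ("India", 7.10, 1.52), ("Indonesia", 10.50, 13.49), ("Italy", 48.80, 30.62),
    ("Japan", 73.80, 2.00), ("Malaysia", 62.80, 37.77), ("Mexico", 24.90, 16.80),
    ("Netherlands", 82.90, 20.54), ("Peru", 26.20, 13.34), ("Philippines", 21.50, 19.68),
    ("Poland", 52.00, 11.79), ("Russia", 27.00, 2.99), ("Saudi Arabia", 22.70, 11.65),
    ("South Africa", 10.50, 7.83), ("Spain", 66.80, 30.24), ("Sweden", 80.70, 44.72),
    ("Taiwan", 66.10, 38.21), ("Thailand", 20.50, 10.29), ("Turkey", 35.00, 31.91),
    ("USA", 77.33, 46.98), ("UK", 70.18, 45.97), ("Venezuela", 25.50, 28.64),
]

# Median of the explanatory variable x (Internet use).
internet_median = statistics.median(row[1] for row in DATA)

# Response variable y (Facebook use) for countries below and above that median.
# The country sitting exactly at the median is left out of both groups.
lower = [row[2] for row in DATA if row[1] < internet_median]
upper = [row[2] for row in DATA if row[1] > internet_median]

lower_median = statistics.median(lower)
upper_median = statistics.median(upper)

print(f"Median Facebook use, lower Internet half:  {lower_median:.1f}%")
print(f"Median Facebook use, higher Internet half: {upper_median:.1f}%")
print("Direction:", "positive" if upper_median > lower_median else "negative or none")
```

For these data the median Facebook use is about 12.5% in the lower half and about 38% in the upper half, consistent with the upward trend seen in the scatterplot.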
