
Business Statistics

Fourth Canadian Edition

Chapter 6
Scatterplots, Association,
and Correlation

Copyright © 2021 Pearson Canada Inc.


Ch. 6: Scatterplots, Association, and
Correlation (1 of 2)
Learning Objectives
1) Draw a scatterplot and use it to analyze the relationship
between two variables
2) Calculate the correlation as a measure of linear
relationship between two variables
3) Distinguish between correlation and causation



Ch. 6: Scatterplots, Association, and
Correlation (2 of 2)
A scatterplot, which plots one quantitative variable against
another, can be an effective display for data
Scatterplots are the ideal way to picture associations
between two quantitative variables

Figure 6.1 Monthly Canadian/U.S. exchange rate and oil prices.
Sources: Based on OPEC basket price of oil; Bank of Canada exchange
rates (January–November 2014).



6.1 Looking at Scatterplots (1 of 4)
The direction of the association is important
A pattern that runs from the upper left to the lower right is
said to be negative

A pattern running from the lower left to the upper right is
called positive

Look for direction: What’s the sign (positive, negative, or neither)?



6.1 Looking at Scatterplots (2 of 4)
The second thing to look for in a scatterplot is its form
If there is a straight line relationship, it will appear as a cloud
or swarm of points stretched out in a generally
consistent, straight form. This is called linear form.
Sometimes the relationship curves gently, while still
increasing or decreasing steadily; sometimes it curves
sharply up then down

Look for form: Is it straight, curved, something exotic, or no pattern?



6.1 Looking at Scatterplots (3 of 4)
The third feature to look for in a scatterplot is the strength of
the relationship
Do the points appear tightly clustered in a single stream or
do the points seem to be so variable and spread out that we
can barely discern any trend or pattern?

Look for strength: How much scatter?



6.1 Looking at Scatterplots (4 of 4)
Finally, always look for the unexpected
An outlier is an unusual observation, standing away from the
overall pattern of the scatterplot

Look for unusual features: Are there unusual observations or subgroups?



6.2 Assigning Roles to Variables in
Scatterplots (1 of 3)
To make a scatterplot of two quantitative variables, assign
one to the y-axis and the other to the x-axis
Be sure to label the axes clearly, and indicate the scales of
the axes with numbers
Each variable has units, and these should appear with the
display—usually near each axis
Since we are investigating two variables, we call this branch
of Statistics bivariate analysis



6.2 Assigning Roles to Variables in
Scatterplots (2 of 3)
Each point is placed on a scatterplot at a position that
corresponds to values of the two variables
The point’s horizontal location is specified by its x-value, and
its vertical location is specified by its y-value
Together, these two values are known as coordinates and
written (x, y)



6.2 Assigning Roles to Variables in
Scatterplots (3 of 3)
One variable plays the role of the explanatory or predictor
variable, while the other takes on the role of the response
variable
We place the explanatory variable on the x-axis and the
response variable on the y-axis
The x- and y-variables are sometimes referred to as the
independent and dependent variables, respectively



6.3 Understanding Correlation (1 of 6)
The ratio of the sum of the products $z_x z_y$ for every point in the
scatterplot to $n - 1$ is called the correlation coefficient:

$$ r = \frac{\sum z_x z_y}{n - 1} $$

Two of the more common alternative formulas for correlation are:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

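As a rough check that the z-score definition and the deviation-based formula give the same number, here is a minimal Python sketch. The function names and the four data pairs are made up for illustration; they do not come from the textbook.

```python
import math
from statistics import mean, stdev

def corr_from_z_scores(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

def corr_from_deviations(xs, ys):
    """r = sum of deviation products / sqrt(sum of squared x-devs * sum of squared y-devs)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only to exercise both formulas.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 3.5, 8.0]

r_z = corr_from_z_scores(xs, ys)
r_dev = corr_from_deviations(xs, ys)
print(round(r_z, 6), round(r_dev, 6))  # the two values should agree
```

Both functions assume at least two points and non-constant data, since a standard deviation of zero would cause a division by zero.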


6.3 Understanding Correlation (2 of 6)
Finding the Correlation Coefficient
Suppose the data pairs are:

x    6   10   14   19   21
y    5    3    7    8   12

Then $\bar{x} = 14$, $\bar{y} = 7$, $s_x = 6.20$, and $s_y = 3.39$.

Deviations in x    Deviations in y    Product
6 − 14 = −8        5 − 7 = −2         −8 × −2 = 16
10 − 14 = −4       3 − 7 = −4         16
14 − 14 = 0        7 − 7 = 0          0
19 − 14 = 5        8 − 7 = 1          5
21 − 14 = 7        12 − 7 = 5         35

Add up the products: 16 + 16 + 0 + 5 + 35 = 72

Finally, we divide by (n − 1) × sx × sy = (5 − 1) × 6.20 × 3.39
= 84.07.
The ratio is the correlation coefficient: r = 72/84.07 = 0.856.
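The arithmetic of this worked example can be reproduced in a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative.

```python
from statistics import mean, stdev

# Data pairs from the worked example.
xs = [6, 10, 14, 19, 21]
ys = [5, 3, 7, 8, 12]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)   # means of x and y
s_x, s_y = stdev(xs), stdev(ys)     # sample standard deviations (divide by n - 1)

# Product of the x-deviation and y-deviation for each point.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
total = sum(products)

r = total / ((n - 1) * s_x * s_y)
print(products, total, round(s_x, 2), round(s_y, 2), round(r, 3))
```

Because the code keeps the unrounded standard deviations, its result can differ in the third decimal place from the slide's 0.856, which was computed with sx and sy rounded to two decimals.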
6.3 Understanding Correlation (3 of 6)
Correlation Conditions
Correlation measures the strength of the linear association
between two quantitative variables
Before you use correlation, you must check three conditions:
• Quantitative Variables Condition: Correlation
applies only to quantitative variables
• Linearity Condition: Correlation measures the strength only
of the linear association
• Outlier Condition: Unusual observations can distort
the correlation
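To see why the Linearity Condition matters, consider a perfectly predictable but curved relationship. The sketch below uses made-up data with y = x² on x-values symmetric about zero; the `correlation` helper is an illustrative implementation of the deviation formula, not code from the textbook.

```python
import math

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x, but the form is a curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # essentially zero, despite the strong (nonlinear) association
```

A correlation near zero here does not mean "no association"; it means no linear association, which is why the scatterplot should be checked first.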



6.3 Understanding Correlation (4 of 6)
Correlation Properties
• The sign of a correlation coefficient gives the direction of
the association
• Correlation is always between −1 and +1
• Correlation treats x and y symmetrically
• Correlation has no units
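A few of these properties can be checked numerically. The following is a small sketch on invented data; the helper function and the numbers are assumptions for illustration only.

```python
import math

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with an upward trend.
xs = [2, 4, 5, 7, 9]
ys = [10, 14, 13, 20, 22]

r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)                      # swap the roles of x and y
r_flipped = correlation(xs, [-y for y in ys])   # reverse the direction of y

print(round(r_xy, 4), round(r_yx, 4), round(r_flipped, 4))
```

Swapping the variables leaves r unchanged, and reversing the direction of one variable changes only the sign.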



6.3 Understanding Correlation (5 of 6)
Correlation Properties
• Correlation is not affected by changes in the center or
scale of either variable.
• Correlation measures the strength of the linear association
between the two variables.
• Correlation is sensitive to unusual observations.
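The sketch below illustrates two of these properties on invented data: changing the centre and scale of both variables leaves r unchanged, while adding a single unusual observation can change it drastically. The data, the conversion constants, and the helper are all illustrative assumptions.

```python
import math

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, nearly linear data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
r = correlation(xs, ys)

# Shift and rescale both variables (as a change of units would).
xs_rescaled = [2.54 * x + 10 for x in xs]
ys_rescaled = [1.8 * y + 32 for y in ys]
r_rescaled = correlation(xs_rescaled, ys_rescaled)

# Add one unusual observation far from the overall pattern.
r_with_outlier = correlation(xs + [20], ys + [0.0])

print(round(r, 3), round(r_rescaled, 3), round(r_with_outlier, 3))
```

In this example the single added point is enough to turn a strong positive correlation into a negative one.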



6.3 Understanding Correlation (6 of 6)
Correlation Tables
Correlation tables are compact and give a lot of summary
information at a glance. There, you’ll see the correlations
between pairs of variables in a data set arranged in a table.

Table 6.1 A correlation table for some variables collected on a sample of Amazon books.

            #Pages   Width   Thick   Pub year
#Pages       1.000
Width        0.003   1.000
Thick        0.813   0.074   1.000
Pub year     0.253   0.012   0.309   1.000
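A table laid out like Table 6.1 can be built by computing r for every pair of variables. The sketch below does this for three made-up variables on five hypothetical books; the numbers are invented and are not the Amazon sample behind Table 6.1.

```python
import math

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented measurements for five hypothetical books.
data = {
    "#Pages": [120, 250, 310, 480, 600],
    "Width": [5.5, 6.0, 5.2, 6.1, 5.8],
    "Thick": [0.4, 0.7, 0.9, 1.3, 1.5],
}
names = list(data)

# Full symmetric table of pairwise correlations.
table = {a: {b: correlation(data[a], data[b]) for b in names} for a in names}

# Print only the lower triangle, since the upper triangle repeats it.
print(" " * 8 + " ".join(f"{name:>7}" for name in names))
for i, a in enumerate(names):
    cells = " ".join(f"{table[a][b]:7.3f}" for b in names[: i + 1])
    print(f"{a:<8}{cells}")
```

The diagonal is always 1.000 because each variable is perfectly correlated with itself, and only half the table is shown because correlation treats the two variables symmetrically.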



6.4 Straightening Scatterplots (1 of 3)
Example: The cost of generating electric power from solar
has been steadily declining. The figure shows the price of
systems installed in Germany, during 2009-2013.
The time series plot of the data does not seem to indicate a
strong linear association:



6.4 Straightening Scatterplots (2 of 3)

Figure 6.2 Price of solar installations in Germany, 2009–2013, in Euros/Watt. Source: “Analysis of 13
years of successful PV development in Germany under the EEG with a focus on 2013,” Renewable
International, March 2014, Bernard Chabot.



6.4 Straightening Scatterplots (3 of 3)
However, if we look at the logarithm of the values, the plot looks
straighter, so the correlation is now a more appropriate measure of
association.

Simple transformations such as the logarithm, square root, and
reciprocal can sometimes straighten a scatterplot’s form.

Figure 6.3 Logarithm (to the base 10) of the price of solar installations
in Germany shown in Figure 6.2.
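The effect of a logarithmic re-expression can be sketched with synthetic data. The series below is an invented price that falls by a constant percentage each period; it is not the German solar data from Figure 6.2, and the starting value and decay rate are arbitrary assumptions.

```python
import math

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic series: the price drops 40% per period (a curved, not straight, decline).
periods = list(range(12))
prices = [4.0 * 0.6 ** t for t in periods]

r_raw = correlation(periods, prices)

# Re-express the prices with the base-10 logarithm, as in Figure 6.3.
log_prices = [math.log10(p) for p in prices]
r_log = correlation(periods, log_prices)

print(round(r_raw, 3), round(r_log, 3))
```

For this synthetic series the logged values fall on a straight line, so the correlation after transformation is essentially −1, whereas the raw values give a weaker correlation that understates how predictable the decline is.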



6.5 Lurking Variables and Causation
There is no way to conclude from a high correlation alone that one
variable causes the other.
There’s always the possibility that some third variable, a lurking
variable, is simultaneously affecting both of the variables you
have observed.
Figure 6.4 Life Expectancy and numbers of Doctors per
Person in 40 countries shows a fairly strong, positive
linear relationship with a correlation of 0.705.
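A small simulation can illustrate how a lurking variable produces a high correlation between two variables that do not affect each other. Everything here is simulated under assumed settings (a fixed seed, normal noise, 500 points); it is not the life expectancy data of Figure 6.4.

```python
import math
import random

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)  # fixed seed so the run is repeatable
n = 500

# z is the lurking variable; x and y each depend on z but not on each other.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)

# Remove z's contribution (possible here only because the simulation knows z).
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print(round(r_xy, 3), round(r_rest, 3))
```

In the simulation x and y are strongly correlated, yet once the shared influence of z is removed the remaining correlation is close to zero. With real data the lurking variable is usually unobserved, which is why correlation alone cannot establish causation.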



What Can Go Wrong?
• Don’t say “correlation” when you mean “association”
• Don’t correlate categorical variables
• Make sure the association is linear
• Beware of outliers and multiple clusters
• The correlation between just two data points is
meaningless.
• Don’t confuse correlation with causation
• Watch out for lurking variables
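The warning about two data points can be checked directly: any two points with different x-values and different y-values lie exactly on a line, so r is forced to be +1 or −1. The points below are arbitrary, made-up values, and the helper is an illustrative implementation.

```python
import math

def correlation(xs, ys):
    """Pearson r; assumes equal lengths, n >= 2, and non-constant data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two arbitrary points: (1, 2) and (5, 3), then the same x-values with y reversed.
r_up = correlation([1, 5], [2, 3])
r_down = correlation([1, 5], [3, 2])

print(r_up, r_down)
```

The result says nothing about the strength of any underlying relationship; it only reflects that two points always determine a line.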



What Have We Learned? (1 of 2)
• Begin an investigation by looking at a scatterplot; we are
interested in its
– Direction
– Form
– Strength
• The sign of the correlation tells us the direction of the
association
• The magnitude of the correlation tells us the strength of
a linear association
• Correlation has no units, so shifting or scaling the data,
standardizing, or even swapping the variables has no
effect on the numerical value
What Have We Learned? (2 of 2)
To use correlation we have to check certain conditions for
the analysis to be valid:
• Check the Linearity Condition
• Watch out for unusual observations
We’ve learned not to make the mistake of assuming that a
high correlation or strong association is evidence of a cause-
and-effect relationship.

