
# Unit 5: Scatter Plots

## I. Vocabulary List

Definitions will be given in notes.

scatter plot
regression
correlation
line of best fit/trend line
correlation coefficient
residual
residual plot
observed value
predicted value

## II. Scatter Plot Basics

Researchers, such as
anthropologists, are
often interested in how
two measurements are
related. The statistical
study of the relationship
between variables is
called regression.

Displaying data visually can help you see relationships. A scatter plot is a graph with points plotted to show a possible relationship between two sets of data. A scatter plot is an effective way to display some types of data.

Is a scatter plot discrete or continuous? Discrete.

## B. Graphing a Scatter Plot with Given Data

1. The table shows the number of cookies left in a jar at various times after they were baked. Graph a scatter plot using the given data.

Use the table to make ordered pairs for the scatter plot. The x-value represents the time since the cookies were baked and the y-value represents the number of cookies left in the jar. Plot the ordered pairs.
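The pairing step can be sketched in a few lines of Python. The table values below are hypothetical stand-ins, since the original table is not reproduced here; the names `hours_since_baked` and `cookies_in_jar` are also assumptions made for illustration.

```python
# Hypothetical table values (NOT the data from the original table).
hours_since_baked = [0, 1, 2, 3, 4]   # x-values: time since baking
cookies_in_jar = [24, 20, 15, 9, 4]   # y-values: cookies left in the jar

# Pair each x-value with the y-value in the same table row.
ordered_pairs = list(zip(hours_since_baked, cookies_in_jar))

# Each pair (x, y) is one point to plot on the scatter plot.
for x, y in ordered_pairs:
    print(f"({x}, {y})")
```

Each printed pair corresponds to one dot on the graph, with time on the horizontal axis and cookies on the vertical axis.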

## III. Describing Correlation

A scatter plot is helpful in understanding the
form, direction, and strength of the relationship
between two variables. Correlation is the
strength and direction of the linear relationship
between the two variables.

## Ex. 1: Describe the correlation illustrated by the scatter plot.

As the average daily temperature increased, the number of visitors increased. There is a positive correlation between the two data sets.

## Ex. 2: Describe the correlation illustrated by the scatter plot.

As the elevation in Nevada increases, the mean annual temperature decreases. There is a negative correlation between the two data sets.

## IV. Line of Best Fit

If there is a strong linear relationship between two
variables (positive or negative), a line of best fit,
or a line that best fits the data, can be used to
make predictions. This is also called a trend line.
When drawing a line of best fit, try to have about
the same number of points above and below the
line of best fit.

Ex. 1: The scatter plot shows a relationship between the total amount of
money collected at the concession stand and the total number of tickets
sold at a movie theater. Based on this relationship, predict how much
money will be collected at the concession stand when 150 tickets have been
sold.

a. Draw a line of fit and use it to make a prediction.

Draw a line that has about the same number of points above and below it. Your line may or may not go through data points.

b. Find the point on the line whose x-value is 150. The corresponding y-value is 750.

Based on the data, \$750 is a reasonable prediction of how much money will be collected when 150 tickets have been sold.

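c. Write the equation of the line of fit in slope-intercept form, $y = mx + b$.

Use two points on the line: $(120, 600)$ and $(150, 750)$.

Find the slope: $m = \dfrac{750 - 600}{150 - 120} = 5$

Find the y-intercept: $b = 600 - 5(120) = 0$, so $y = 5x$.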
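The slope-intercept calculation can be checked with a short Python sketch. The helper name `line_through` is an assumption made for illustration; the two points are the ones read off the line of fit in this example.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + b for b
    return slope, intercept

# Points taken from the line of fit in this example.
m, b = line_through((120, 600), (150, 750))

# Use y = mx + b to predict concession money for 150 tickets.
prediction = m * 150 + b
print(m, b, prediction)
```

Running it reproduces the slope of 5, the intercept of 0, and the \$750 prediction from part b.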

Ex. 2: Albany and Sydney are about the same distance from the equator. Make a scatter plot with Albany's temperature as the independent variable. Name the type of correlation. Then sketch a line of best fit and find its equation.

Step 1: Plot the data points.

Step 2: Identify the correlation.

Notice that the data set is negatively correlated: as the temperature rises in Albany, it falls in Sydney.

Step 3: Sketch a line of best fit.

Draw a line that splits the data evenly above and below.

Step 4: Identify two points on the line.

For this data, you might select $(35, 64)$ and $(85, 41)$.

Step 5: Find the slope of the line that models the data, then use the point-slope form to write its equation.

Slope: $m = \dfrac{41 - 64}{85 - 35} = -0.46$

$y - y_1 = m(x - x_1)$ (point-slope form)

$y - 64 = -0.46(x - 35)$ (substitute)

$y = -0.46x + 80.1$ (simplify)
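One way to check the equation from Step 5 is to confirm that it passes through both points chosen in Step 4. This is only a sanity-check sketch; the function name `line_of_best_fit` is an assumption made for illustration.

```python
def line_of_best_fit(x):
    """Equation from Step 5: y = -0.46x + 80.1."""
    return -0.46 * x + 80.1

# Slope from the two points chosen in Step 4.
slope = (41 - 64) / (85 - 35)
print("slope:", slope)

# The line should return (approximately) the original y-values.
for x, y in [(35, 64), (85, 41)]:
    print(x, y, round(line_of_best_fit(x), 1))
```

If either rounded value differed from the point's y-value, that would point to an arithmetic slip in the simplify step.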

## V. Correlation Coefficient (With Technology)

The correlation coefficient $r$ is a measure of how well the data set is fit by a model. In other words, it tells you how closely the data follow the line of best fit.
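For the curious, the standard (Pearson) formula for $r$ can be sketched in Python. The data below are hypothetical and the function name `correlation_coefficient` is an assumption made for illustration; values of $r$ always fall between -1 and 1.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data sets (not from the notes).
xs = [1, 2, 3, 4, 5]
r_perfect = correlation_coefficient(xs, [2, 4, 6, 8, 10])   # points on a rising line
r_negative = correlation_coefficient(xs, [10, 8, 6, 4, 2])  # points on a falling line
r_noisy = correlation_coefficient(xs, [2, 4, 5, 4, 5])      # rising, with scatter
print(r_perfect, r_negative, r_noisy)
```

Points that fall exactly on a rising line give $r = 1$, points exactly on a falling line give $r = -1$, and scattered data land somewhere in between.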

Don't worry, that's why we have graphing calculators!!!

You can use a graphing calculator to perform a linear regression and find the correlation coefficient $r$.
To display the correlation coefficient $r$, you may have to turn on the diagnostic mode. To do this, press [keys not shown] and choose the DiagnosticOn mode. Press enter, and then press enter again to activate it.

## Example 2: Anthropology Application

Anthropologists can
use the femur, or
thighbone, to estimate
the height of a human
being. The table shows
the results of a
randomly selected
sample.


a. Make a scatter plot of the data with femur length as the independent variable.

The scatter plot is shown at right.

b. Find the correlation coefficient $r$ and the line of best fit. Interpret the slope of the line of best fit in the context of the problem.

Enter the data into lists L1 and L2 on a graphing calculator. Do this by pressing STAT and then 1: Edit... Use the linear regression feature by pressing STAT, choosing CALC, and selecting 4:LinReg. The equation of the line of best fit is $h \approx 2.91l + 54.04$.
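A linear regression feature typically fits the line by least squares. As a rough sketch of that calculation, the Python below computes a slope and intercept by hand. The femur and height values are hypothetical (the table from this example is not reproduced here), so the result will not match $h \approx 2.91l + 54.04$; the function name `linear_regression` is also an assumption.

```python
def linear_regression(xs, ys):
    """Least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept

# Hypothetical data in cm (NOT the table from this example).
femur_cm = [40, 42, 44, 46]     # plays the role of list L1
height_cm = [170, 176, 182, 188]  # plays the role of list L2

slope, intercept = linear_regression(femur_cm, height_cm)
print(f"h = {slope}l + {intercept}")
```

With real measurements the points would not sit exactly on a line, and the calculator would report an $r$ value below 1 alongside the equation.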

!!! If you do not see $r^2$ and $r$, you did not correctly turn on DiagnosticOn. Try it again.

The slope is about 2.91, so for each 1 cm increase in femur length, the predicted increase in a human being's height is 2.91 cm.

The correlation coefficient is $r \approx 0.986$. What type of correlation does it have?

Strong positive

c. A man's femur is 41 cm long. Predict the man's height.

The equation of the line of best fit is $h \approx 2.91l + 54.04$. Use the equation to predict the man's height.

For a 41-cm-long femur,

$h \approx 2.91(41) + 54.04$ (substitute 41 for $l$)

$h \approx 173.35$

The height of a man with a 41-cm-long femur would be about 173 cm.
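The substitution can be written as a small Python function. The function name `predict_height` is an assumption made for illustration; the coefficients are the ones from the line of best fit in this example.

```python
def predict_height(femur_cm):
    """Predicted height in cm from h = 2.91l + 54.04."""
    return 2.91 * femur_cm + 54.04

height = predict_height(41)
print(round(height, 2))   # height to the nearest hundredth of a cm
print(round(height))      # height to the nearest cm
```

The same function can be reused for any femur length in the range covered by the data; predictions far outside that range are less trustworthy.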

## Check It Out! Example 2

The gas mileage for randomly selected cars based upon engine horsepower is given in the table.


a. Make a scatter
plot of the data
with horsepower
as the independent
variable.
The scatter plot is
shown on the right.

b. Find the correlation coefficient $r$ and the line of best fit. Interpret the slope of the line of best fit in the context of the problem.

Enter the data into lists L1 and L2 on a graphing calculator. Use the linear regression feature by pressing STAT, choosing CALC, and selecting 4:LinReg. The equation of the line of best fit is $y \approx -0.15x + 47.5$.

The slope is about $-0.15$, so for each 1 unit increase in horsepower, gas mileage drops 0.15 mi/gal.

The correlation coefficient is $r \approx -0.916$, which indicates a strong negative correlation.

c. Predict the gas mileage for a 210-horsepower engine.

The equation of the line of best fit is $y \approx -0.15x + 47.5$. Use the equation to predict the gas mileage. For a 210-horsepower engine,

$y \approx -0.15(210) + 47.5$

$y \approx 16$

The predicted gas mileage is about 16 mi/gal.
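The prediction and the meaning of the slope can both be checked with a short Python sketch. The function name `predicted_mpg` is an assumption made for illustration; the equation is the line of best fit from this example, with the negative slope that matches the negative correlation.

```python
def predicted_mpg(horsepower):
    """Predicted gas mileage (mi/gal) from y = -0.15x + 47.5."""
    return -0.15 * horsepower + 47.5

# Prediction for a 210-horsepower engine.
mpg_210 = predicted_mpg(210)

# Interpreting the slope: the change in the prediction when
# horsepower goes up by exactly 1 unit.
change_per_hp = predicted_mpg(211) - predicted_mpg(210)

print(round(mpg_210, 1), round(change_per_hp, 2))
```

The second value equals the slope, which is why the slope is read as "mi/gal lost per additional unit of horsepower."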

## Example 3

Find the following information for this data set on the number of grams of fat and the number of calories in sandwiches served at Dave's Deli.

Use the equation of the line of best fit to predict the number of grams of fat in a sandwich with 420 Calories. How close is your answer to the value given in the table?

a. Make a scatter plot of the data with fat as the independent variable.

The scatter plot is shown on the right.

b. Find the correlation coefficient and the equation of the line of best fit. Draw the line of best fit on your scatter plot.

The correlation coefficient is $r = 0.682$. The equation of the line of best fit is $y \approx 11.1x + 309.8$.

c. Predict the amount of fat in a sandwich with 420 Calories. How accurate do you think your prediction is?

Calories is the dependent variable, so substitute 420 for $y$ and solve for $x$.

$420 \approx 11.1x + 309.8$

$110.2 \approx 11.1x$

$9.9 \approx x$

The line predicts about 10 grams of fat. This is not close to the 15 g in the table.
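Because Calories is the dependent variable here, the prediction means solving the equation for $x$ rather than substituting into it. A minimal Python sketch of that step follows; the function name `fat_for_calories` is an assumption made for illustration, and the 15 g comparison value is the one quoted from the table.

```python
def fat_for_calories(calories, slope=11.1, intercept=309.8):
    """Solve calories = slope * fat + intercept for fat (grams)."""
    return (calories - intercept) / slope

predicted_fat = fat_for_calories(420)
table_fat = 15                      # value quoted from the table
miss = table_fat - predicted_fat    # how far off the prediction is

print(round(predicted_fat, 1), round(miss, 1))
```

A miss of about 5 g on a value of 15 g is large, which fits the fairly weak correlation of $r = 0.682$.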

## VI. Residuals

A residual is the difference between the observed value of the response variable (the actual data point you were given) and the value predicted by the line of best fit (the y-value you would get if you substituted x into the line of best fit equation).

In other words, it is the measurement of how far the data fall from the line of best fit.

Residual = observed y - predicted y

Residual Plots

A residual plot is a scatter plot of all of the residual values. It helps us assess the fit of a regression line.

If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern.

The uniform scatter of points indicates that the regression line fits the data well, so the line is a good model.

A curved pattern shows that the relationship is not linear.

the line. The response variable y has
more spread for larger values of the
explanatory variable x, so the prediction
will be less accurate when x is large.

Ex 1: Complete each table using the given values. A calculator will be very useful. Round answers to one decimal place. Construct the residual plot. Be sure to label the independent and dependent variables, along with the units.

| y (Observed Value) | Predicted Value | Residual Value |
|---|---|---|
| 6 | | |
| 13 | | |
| 22 | | |
| 26 | | |
| 27 | | |
| 31 | | |

[Residual plot: blank grid, vertical axis labeled Residual]

Does the residual plot suggest a linear relationship? Explain.

Ex 1, completed table:

| y (Observed Value) | Predicted Value | Residual Value |
|---|---|---|
| 6 | 8.7 | -2.7 |
| 13 | 13.6 | -0.6 |
| 22 | 18.4 | 3.6 |
| 26 | 23.3 | 2.7 |
| 27 | 28.2 | -1.2 |
| 31 | 33.1 | -2.1 |

[Residual plot: vertical axis labeled Residual]

Does the residual plot suggest a linear relationship? Explain.

Yes, because there is no pattern.
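The residual column can be reproduced with a short Python sketch that applies Residual = observed y - predicted y to each row. The predicted values are the ones from the completed table; the first observed value (6) is an assumption, inferred from that row's predicted value of 8.7 and residual of -2.7.

```python
# Observed values; the first entry (6) is inferred, not given directly.
observed = [6, 13, 22, 26, 27, 31]
# Predicted values from the completed table.
predicted = [8.7, 13.6, 18.4, 23.3, 28.2, 33.1]

# Residual = observed y - predicted y, rounded to one decimal place.
residuals = [round(obs - pred, 1) for obs, pred in zip(observed, predicted)]

for obs, pred, res in zip(observed, predicted, residuals):
    print(obs, pred, res)
```

Plotting these residuals against the x-values gives the residual plot. A positive residual means the point lies above the line, and a negative residual means it lies below.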