
Unit 5: Scatter Plots

I. Vocabulary List

Definitions will be given in notes

scatter plot
regression
correlation
line of best fit/trend line
correlation coefficient
residual
residual plot
observed value
predicted value

II. Scatter Plot Basics

Researchers, such as
anthropologists, are
often interested in how
two measurements are
related. The statistical
study of the relationship
between variables is
called regression.

A. Definition and Use

Displaying data visually can help you see
relationships. A scatter plot is a graph with points
plotted to show a possible relationship between
two sets of data. A scatter plot is an effective way to
display some types of data.
Is a scatter plot discrete or continuous?
Discrete: it is made of separate points, not a connected line.

B. Graphing a Scatter Plot with Given Data


1. The table shows the number of cookies left in a jar at different times after
they were baked. Graph a scatter plot using the given data.

Use the table to make ordered pairs for the scatter plot.
The x-value represents the time since the cookies were baked and the
y-value represents the number of cookies left in the jar.
Plot the ordered pairs.
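The pairing step can be sketched in Python. The table itself is not reproduced in these notes, so the values below are made-up placeholders, not the original data, and the variable names are only illustrative.

```python
# Hypothetical table values (the original table is not shown in these notes).
hours_since_baked = [0, 1, 2, 3, 4]   # x-values: time since the cookies were baked
cookies_left = [24, 20, 13, 9, 4]     # y-values: cookies left in the jar

# Pair each x-value with the y-value from the same column of the table.
ordered_pairs = list(zip(hours_since_baked, cookies_left))

for x, y in ordered_pairs:
    print(f"({x}, {y})")
```

Each printed pair is one point to plot on the scatter plot.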

III. Describing Correlation


A scatter plot is helpful in understanding the
form, direction, and strength of the relationship
between two variables. Correlation is the
strength and direction of the linear relationship
between the two variables.

Ex. 1: Describe the correlation illustrated by the
scatter plot.
As the average daily
temperature increased,
the number of visitors
increased.
There is a positive
correlation between the
two data sets.

Ex. 2: Describe the correlation illustrated by the scatter
plot.
As the elevation in Nevada
increases, the mean
annual temperature
decreases.
There is a negative
correlation between the
two data sets.
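One way to check the direction of a correlation without a graph is to look at the sign of the summed products of deviations from the means. The sketch below assumes that approach; the function name and both data sets are hypothetical and are not read from the plots in these notes.

```python
def correlation_direction(xs, ys):
    """Return 'positive', 'negative', or 'none' for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # A point above both means (or below both) adds a positive product;
    # a point above one mean and below the other adds a negative product.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"

# Hypothetical data, made up for illustration only.
temperature_f = [60, 65, 70, 75, 80]
visitors = [110, 150, 170, 220, 260]
elevation_ft = [1000, 2000, 3000, 4000, 5000]
mean_temp_f = [68, 63, 59, 55, 50]

print(correlation_direction(temperature_f, visitors))    # positive
print(correlation_direction(elevation_ft, mean_temp_f))  # negative
```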

IV. Line of Best Fit


If there is a strong linear relationship between two
variables (positive or negative), a line of best fit,
or a line that best fits the data, can be used to
make predictions. This is also called a trend line.
Helpful Hint
When drawing a line of best fit, try to have about
the same number of points above and below the
line of best fit.

Ex. 1: The scatter plot shows a relationship between the total amount of
money collected at the concession stand and the total number of tickets
sold at a movie theater. Based on this relationship, predict how much
money will be collected at the concession stand when 150 tickets have been
sold.

a. Draw a line of fit and use it to make a prediction.
Draw a line that has about the same
number of points above and below it. Your
line may or may not go through data points.

b. Based on the data, $750 is a reasonable prediction of
how much money will be collected when 150 tickets
have been sold.
Find the point on the line whose x-value is 150.
The corresponding y-value is 750.

c. Write the equation of the line of fit in slope-intercept form.

y = mx + b
Points (120, 600); (150, 750)
Find the slope: m = (750 - 600)/(150 - 120) = 5
y = 5x (the y-intercept b is 0)
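The same two-point calculation can be sketched in Python. The helper name `line_through` is hypothetical; the two points are the ones used above.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

m, b = line_through((120, 600), (150, 750))
print(m, b)          # 5.0 0.0
print(m * 150 + b)   # 750.0, the prediction for 150 tickets
```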

Ex 2:
Albany and Sydney are
about the same distance
from the equator. Make
a scatter plot with
Albany's temperature as
the independent
variable. Name the type
of correlation. Then
sketch a line of best fit
and find its equation.

Step 1 Plot the data points.


Step 2 Identify the correlation.
Notice that the data set is
negatively correlated: as the
temperature rises in Albany, it
falls in Sydney.

Step 3 Sketch a line of best fit.
Draw a line that splits
the data evenly above
and below.

Step 4 Identify two points on the line.
For this data, you might select (35, 64) and
(85, 41).

Step 5 Find the slope of the line that models the
data. Use the point-slope form.
y - y1 = m(x - x1)         Point-slope form.
y - 64 = -0.46(x - 35)     Substitute.
y = -0.46x + 80.1          Simplify.

An equation that models the data is y = -0.46x + 80.1.
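Step 5 does not show where the slope comes from. A worked version, using the two points chosen in Step 4 as (x1, y1) = (35, 64) and (x2, y2) = (85, 41), might look like this:

```latex
\begin{align*}
m &= \frac{y_2 - y_1}{x_2 - x_1} = \frac{41 - 64}{85 - 35} = \frac{-23}{50} = -0.46 \\
y - 64 &= -0.46(x - 35) \\
y - 64 &= -0.46x + 16.1 \\
y &= -0.46x + 80.1
\end{align*}
```

The negative slope agrees with the negative correlation identified in Step 2.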

V. Correlation Coefficient (With Technology)


The correlation coefficient r is a measure of how
well the data set is fit by a linear model. In other words,
it tells you how closely the points follow the line of best fit.
The value of r is always between -1 and 1.

Don't worry, that's why we have
graphing calculators!!!

You can use a graphing calculator to perform a
linear regression and find the correlation
coefficient r.
To display the correlation
coefficient r, you may have
to turn on the diagnostic
mode. To do this, press
and choose the
DiagnosticOn mode.
Press enter, and then press
enter again to activate it.
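For readers without a graphing calculator, here is a minimal Python sketch of the same idea. It assumes an ordinary least-squares fit, which is what "linear regression" usually means, and it is not meant to describe how any particular calculator works internally. The function name and the data are hypothetical.

```python
import math

def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x-values and y-values must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient, between -1 and 1
    return slope, intercept, r

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r = linear_regression(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
# y = 0.60x + 2.20, r = 0.775
```

The sign of r matches the sign of the slope, and values of r near 1 or -1 mean the points lie close to the line.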

Example 2: Anthropology Application


Anthropologists can
use the femur, or
thighbone, to estimate
the height of a human
being. The table shows
the results of a
randomly selected
sample.

Example 2 Continued

a. Make a scatter
plot of the data
with femur
length as the
independent
variable.

The scatter plot is
shown at right.

b. Find the correlation coefficient r and the
line of best fit. Interpret the slope of the
line of best fit in the context of the problem.
Enter the data into lists L1
and L2 on a graphing
calculator. Do this by
pressing STAT and then
1: Edit... Use the linear
regression feature by
pressing STAT, choosing
CALC, and selecting
4:LinReg. The equation of
the line of best fit is
h ≈ 2.91l + 54.04.

!!! If you do not see r² and r, you did not
correctly turn on
DiagnosticOn. Try it
again.

The slope is about 2.91, so for each 1 cm
increase in femur length, the predicted increase
in a human being's height is 2.91 cm.
The correlation coefficient is r ≈ 0.986. What type
of correlation does it have?

Strong positive

c. A man's femur is 41 cm long. Predict the
man's height.
The equation of the line of best fit is
h ≈ 2.91l + 54.04. Use the equation to predict the
man's height.
For a 41-cm-long femur,
h ≈ 2.91(41) + 54.04      Substitute 41 for l.
h ≈ 173.35
The height of a man with a 41-cm-long femur
would be about 173 cm.
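The substitution can be checked with a short Python sketch. The function name is hypothetical; the slope and intercept are the ones from the line of best fit in this example.

```python
def predict_height_cm(femur_cm):
    """Predicted height from the line of best fit h ≈ 2.91l + 54.04."""
    return 2.91 * femur_cm + 54.04

print(round(predict_height_cm(41), 2))   # 173.35

# Slope interpretation: 1 cm more femur adds about 2.91 cm of predicted height.
print(round(predict_height_cm(42) - predict_height_cm(41), 2))   # 2.91
```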

Check It Out! Example 2

The gas mileage for randomly selected cars
based upon engine horsepower is given in the
table.

Check It Out! Example 2 Continued


a. Make a scatter
plot of the data
with horsepower
as the independent
variable.
The scatter plot is
shown on the right.

b. Find the correlation coefficient r and the line of
best fit. Interpret the slope of the line of best
fit in the context of the problem.
Enter the data into lists L1
and L2 on a graphing
calculator. Use the linear
regression feature by
pressing STAT, choosing
CALC, and selecting
4:LinReg. The equation of
the line of best fit is
y ≈ -0.15x + 47.5.

The slope is about -0.15, so for each 1 unit
increase in horsepower, the predicted gas mileage drops
about 0.15 mi/gal.

The correlation coefficient is r ≈ -0.916, which
indicates a strong negative correlation.

c. Predict the gas mileage for a 210-horsepower
engine.
The equation of the line of best fit is
y ≈ -0.15x + 47.5. Use the equation to predict
the gas mileage. For a 210-horsepower engine,
y ≈ -0.15(210) + 47.5      Substitute 210 for x.
y ≈ 16.0

The mileage for a 210-horsepower engine would be
about 16.0 mi/gal.

Example 3

Find the following information for this data set
on the number of grams of fat and the number
of calories in sandwiches served at Dave's Deli.

Use the equation of the line of best fit to predict
the number of grams of fat in a sandwich with
420 Calories. How close is your answer to the
value given in the table?

a. Make a scatter plot of the data with fat
as the independent variable.
The scatter plot is
shown on the right.

b. Find the correlation coefficient and the
equation of the line of best fit. Draw the
line of best fit on your scatter plot.
The correlation coefficient is
r ≈ 0.682. The equation of
the line of best fit is
y ≈ 11.1x + 309.8.

c. Predict the amount of fat in a sandwich
with 420 Calories. How accurate do you
think your prediction is?
420 ≈ 11.1x + 309.8      Calories is the dependent variable.
110.2 ≈ 11.1x
9.9 ≈ x
The line predicts about 10 grams of fat. This is not close
to the 15 g in the table.
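Because Calories is the dependent variable, this prediction means solving the line of best fit for x instead of substituting into it. A minimal Python sketch of that step, using a hypothetical function name and the slope and intercept from part b:

```python
def predict_fat_g(calories, slope=11.1, intercept=309.8):
    """Solve calories = slope * fat + intercept for fat (in grams)."""
    return (calories - intercept) / slope

fat = predict_fat_g(420)
print(round(fat, 1))   # 9.9
```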

VI. Residuals

A residual is the difference between the observed
value of the response variable (the actual
data point you were given) and the value
predicted by the line of best fit (the y-value
you would get if you substituted x
into the line of best fit equation).
In other words, it measures
how far a data point falls from the line of best
fit.
Residual = observed y - predicted y

Residual Plots
A residual plot is a scatter plot of the residual
values against the x-values. It helps us assess the fit
of a regression line.
If the regression line captures the overall
relationship between x and y, the residuals should
have no systematic pattern.

Things to look out for with residual plots

A uniform scatter of points
indicates that the regression line
fits the data well, so the line is a
good model.
This will help you on your FR.

A curved pattern shows that the
relationship is not linear.

Increasing or decreasing spread about
the line shows that prediction accuracy changes with x.
For example, when the response variable y has
more spread for larger values of the
explanatory variable x, the prediction
will be less accurate when x is large.

Ex 1: Complete the table using the given values. A calculator will be very
useful. Round answers to one decimal place. Construct the residual plot. Be
sure to label the independent and dependent variables, along with the units.

Line of Best Fit Equation: y = 4.88x + 3.8

x    y (Observed Value)    Predicted Value    Residual Value
1    6
2    13
3    22
4    26
5    27
6    31

[Residual plot: residual on the vertical axis, x on the horizontal axis]

Does the residual plot suggest a linear relationship?
Explain.

Ex 1 Continued

Line of Best Fit Equation: y = 4.88x + 3.8

x    y (Observed Value)    Predicted Value    Residual Value
1    6                     8.7                -2.7
2    13                    13.6               -0.6
3    22                    18.4               3.6
4    26                    23.3               2.7
5    27                    28.2               -1.2
6    31                    33.1               -2.1

[Residual plot: residual on the vertical axis, x on the horizontal axis]

Does the residual plot suggest a linear relationship?
Explain.
Yes, because there is no pattern.
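The predicted values and residuals in this example can be recomputed with a short Python sketch. Two things here are assumptions: the x-values 1 through 6 are inferred from the predicted values and the equation y = 4.88x + 3.8, and the first observed value, 6, is inferred from the first predicted value and residual (8.7 and -2.7). The function and variable names are hypothetical.

```python
def predicted(x):
    """Predicted y from the line of best fit y = 4.88x + 3.8."""
    return 4.88 * x + 3.8

xs = [1, 2, 3, 4, 5, 6]              # assumed: inferred from the predicted values
observed = [6, 13, 22, 26, 27, 31]   # first value (6) is inferred, not printed in the table

predicted_values = [round(predicted(x), 1) for x in xs]
# Residual = observed y - predicted y
residuals = [round(y - predicted(x), 1) for x, y in zip(xs, observed)]

for x, y, y_hat, res in zip(xs, observed, predicted_values, residuals):
    print(x, y, y_hat, res)
```

Plotting `residuals` against `xs` gives the residual plot for this example.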