Professional Documents
Culture Documents
• Which roller coaster variables do you think are strongly related to the
top speed on the ride?
Problem 4.3
Statisticians measure the strength of a linear relationship between two variables
using a number called the correlation coefficient. This number is a decimal
between -1 and 1. When the points lie close to a straight line, the correlation
coefficient is close to -1 or 1.
continued on the next page >
A 1. The graph below has a correlation coefficient of 1.0. What do you think
a correlation coefficient of 1.0 means?
A cc of 1.0 means the graph
has positive association that is
perfectly predictable.
2. Which of the six scatter plots below (a)–(f ) has a correlation coefficient
of -1.0? What do you think a correlation coefficient of -1 .0 means?
Letter b. It goes downward and it makes a straight line. (negative association)
a. 0.4 b. -1 c. 0.8
It has positive association and the It has positive association but its still
dots are more spread out. not 100% accurate. (still tight)
d. -0.4 e. -0.8 f. 0
B The scatter plot below shows the relationship between the top speed of a
roller coaster and its maximum drop. The pink dots represent wood-frame
roller coasters. The blue dots represent steel-frame coasters.
120
110
100
90
80
Top Speed (mph)
70
60
50
40
30
20
10
0
0 40 80 120 160 200 240 280 320 360 400 440
Maximum Drop (ft)
Yes, there is a strong 1. Suppose you drew one linear model for all the data in the graph.
positive association
between max drop and Could you use the model to make an accurate prediction about the
max speed. top speed of the roller coaster with a given maximum drop? Explain.
2. Estimate the correlation coefficient for the top speed and the
maximum drop. Is the correlation coefficient closest to -1, -0.5, 0,
0.5, or 1? Closest to 1
3. Is the maximum drop of a roller coaster likely to be one of the causes
of the top speed of the coaster? Why or why not?
Yes because the higher the coaster, the faster it will go going down.
continued on the next page >
120
110
100
90
80
Top Speed (mph)
70
60
50
40
30
20
10
0
0 1000 2000 3000 4000 5000 6000 7000
Track Length (ft)
1. Suppose you drew one linear model for all the data in the graph. Yes, however I cannot
Could you use the model to make an accurate prediction about the predict with 100%
certainty.
top speed of the roller coaster with a given track length? Explain.
2. Estimate the correlation coefficient for the top speed and track length.
Is the correlation coefficient closest to -1, -0.5, 0, 0.5, or 1? Closest to 0.5
3. Is the track length of a roller coaster likely to be one of the causes of
the top speed of the coaster? Why or why not? No, because the points of the graph are
scattered. It doesn't look like the length of the
coaster affects the speed.
4. Computer and calculator data analysis tools can take data pairs like
those plotted above and calculate exact correlation coefficients. Use
the tool that you have available to find the correlation coefficient for
the sample of (track length, top speed) data in the table.
Track Length (ft) 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 5,000
Top Speed (mph) 5 20 45 50 40 50 55 60 85 65
120
110
100
90
80
Top Speed (mph)
70
60
50
40
30
20
10
0
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
Ride Time (min)
Yes, but it is 1. Suppose you drew one linear model for all the data in the graph.
difficult to accurately
predict exact Could you use the model to make an accurate prediction about the
coordinates. top speed of the roller coaster with a given ride time? Explain.
2. Estimate the correlation coefficient for the top speed and ride time. Is
the correlation coefficient closest to -1, - 0.5, 0, 0.5, or 1? 0.5
3. Suppose most of the points on a scatter plot cluster near a line, with
only a few that don’t fit the pattern. The points that lie outside a cluster
are called outliers. Use the graph above. Find each point. Then decide
whether the point is an outlier. If it is, explain why you think it is an
outlier. Far from cluster
a. (1.75, 50) b. (0.30, 80) c. (3.35, 75)
d. (0.28, 120) e. (0.80, 21) f. (1.0, 10) Not close to the
Farthest from cluster model line
g. Use the scatter plot in Question C. Find two outliers on that graph
and estimate their coordinates (track length, top speed).
(4,80) (3.7, 65) continued on the next page >
90
80
70
Number of Riders
60
50
40
30
20
10
0
10 20 30 40 50 60 70 80
Age of Riders
It is closest to -1. 3. Estimate the correlation coefficient for the number of riders and age of
riders. Is the correlation coefficient closest to -1, -0.5, 0, 0.5, or 1?
4. Are any of the data points outliers? If so, estimate the coordinates of
those points. (80,1)
(77,3)
(15,44)
F Is it possible to have a correlation coefficient close to - 1 or 1 with only a
few outliers? Explain your thinking. Yes, because a few outliers do not affect the
overall association.