
Unit 4 Review Exercises Name __________________________________________

1. Born to be old? Is there a relationship between the gestational period (time from conception to
birth) of an animal and its average life span? The figure shows a scatterplot of the gestational period
and average life span for 43 species of animals as well as the least-squares regression line.

(a) Describe the association shown in the scatterplot. 


There is a moderately strong, positive, linear association between gestational period and
average life span for these species of animals.

(b) Point A is the hippopotamus. Would you consider this point an outlier, high leverage point, and/or
influential point? Explain your reasoning for each.


Point A has a large residual, so it is an outlier. It is not a high-leverage point
because its gestational period is similar to those of the other species. Point A is
an influential point because its large residual likely weakens the correlation
substantially; if it were removed, the correlation would move closer to 1.

(c) Point B is the Asian elephant. Would you consider this point an outlier, high leverage point, and/or
influential point? Explain your reasoning for each.

Point B is not an outlier because its residual is small. However, it is a
high-leverage point because its gestational period is much longer than those of
the other species. Point B is not an influential point: although its x-value is far
from the rest of the data, its residual is small, so the slope of the regression line
would not change much if Point B were removed.
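
The reasoning in parts (b) and (c) can be checked with a small sketch. The data below are made up for illustration (they are NOT the animal data, which are not reproduced here), and the helper `fit` is a hypothetical name. Five points lie exactly on a line; one extra "A-like" point has a typical x-value but a large residual, and one extra "B-like" point has an extreme x-value but falls on the line.

```python
from math import sqrt

def fit(xs, ys):
    """Return (slope, intercept, r) of the least-squares line for xs, ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

# Hypothetical data: five points exactly on y = 2x.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 6, 8, 10]

base = fit(base_x, base_y)
# "A-like" point: typical x, but y far above the line (large residual).
with_a = fit(base_x + [3], base_y + [12])
# "B-like" point: extreme x, but y right on the line (small residual).
with_b = fit(base_x + [10], base_y + [20])

for label, (slope, intercept, r) in [("base", base), ("with A", with_a), ("with B", with_b)]:
    print(f"{label}: slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

In this made-up example the A-like point pulls the correlation well below 1 and shifts the intercept, while the B-like point leaves the slope and the correlation essentially unchanged. Real data will rarely behave this cleanly.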

2. Penguins diving A study of king penguins looked for a relationship between how deep the penguins
dive to seek food and how long they stay under water. For all but the shallowest dives, there is a linear
relationship that is different for different penguins. The study gives a scatterplot for one penguin titled
“The Relation of Dive Duration (y) to Depth (x).” Duration y is measured in minutes and depth x is in
meters. The report then says, “The regression equation for this bird is: ŷ = 2.69 + 0.0138x.”

(a) What is the slope of the regression line? Interpret this value. 


The slope is 0.0138. The predicted dive duration increases by 0.0138 minutes for each
additional meter of dive depth.

(b) Does the y intercept of the regression make any sense? If so, interpret it. If not, explain why not. 


The y intercept says that a dive to a depth of 0 meters is predicted to last 2.69
minutes. This does not make sense because a dive cannot have a depth of 0 meters,
and the linear relationship does not hold for the shallowest dives.

(c) According to the regression line, how long does a typical dive to a depth of 200 meters last? 


ŷ = 2.69 + 0.0138(200) = 5.45

A typical dive to a depth of 200 meters lasts about 5.45 minutes.

(d) One of these penguins dives down to 100 meters and it takes a duration of 3 minutes and 15
seconds. To the nearest tenth of a minute, what is this penguin’s residual? Explain in context what this
value means.

3 minutes 15 seconds = 3.25 minutes
ŷ = 2.69 + 0.0138(100) = 4.07
residual = 3.25 − 4.07 = −0.82 ≈ −0.8

The penguin’s dive was about 0.8 minutes shorter than the duration predicted for a
dive to 100 meters.

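The arithmetic in parts (c) and (d) can be double-checked with a short sketch. The coefficients come from the regression equation quoted in the problem; the function names are illustrative only.

```python
# Coefficients reported in the study: y-hat = 2.69 + 0.0138x
INTERCEPT = 2.69   # minutes
SLOPE = 0.0138     # minutes per meter of depth

def predicted_duration(depth_m):
    """Predicted dive duration (minutes) for a dive to depth_m meters."""
    return INTERCEPT + SLOPE * depth_m

def residual(actual_min, depth_m):
    """Residual = actual duration - predicted duration."""
    return actual_min - predicted_duration(depth_m)

pred_200 = predicted_duration(200)      # part (c)
actual_100 = 3 + 15 / 60                # 3 min 15 s expressed in minutes
res_100 = residual(actual_100, 100)     # part (d)

print(round(pred_200, 2))   # predicted minutes for a 200 m dive
print(round(res_100, 1))    # residual to the nearest tenth of a minute
```

A negative residual means the observed dive was shorter than the regression line predicts for that depth.
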
3. Stats teachers’ cars A random sample of AP® Statistics teachers was asked to report the age (in
years) and mileage of their primary vehicles. A scatterplot of the data, a least-squares regression
printout, and a residual plot are provided below.

(a) Give the equation of the least-squares regression line for these data. Identify any variables you use.

ŷ = 3704 + 12188x, where ŷ = predicted mileage (in miles) and x = age of car (in years)

(b) One teacher reported that her 6-year-old car had 65,000 miles on it. Find and interpret its residual.
ŷ = 3704 + 12188(6) = 76,832
residual = 65,000 − 76,832 = −11,832

The teacher’s car has 11,832 fewer miles than the mileage predicted by the regression
line for cars that are 6 years old.

(c) What’s the correlation between car age and mileage? Interpret this value in context.

r = √0.837 ≈ 0.915 (positive because the slope of the regression line is positive)

There is a strong, positive linear relationship between car age and mileage.

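As a rough check on parts (b) and (c), the sketch below recomputes the residual and recovers r from r². It assumes the intercept 3704, slope 12188, and r² = 0.837 used in the answers on this sheet; the variable names are illustrative.

```python
from math import copysign, sqrt

INTERCEPT = 3704    # miles
SLOPE = 12188       # miles per year of age
R_SQUARED = 0.837   # value of r^2 used in part (e)

# Part (b): residual = actual - predicted for the 6-year-old car.
predicted = INTERCEPT + SLOPE * 6
car_residual = 65_000 - predicted

# Part (c): r is the square root of r^2, with the same sign as the slope.
r = copysign(sqrt(R_SQUARED), SLOPE)

print(predicted, car_residual)
print(round(r, 3))
```

Taking the sign of r from the slope matters: r² alone cannot tell a positive association from a negative one.
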
(d) Is a linear model appropriate for these data? Explain how you know.

A linear model is appropriate for these data because the residual plot shows
random scatter with no leftover pattern.

(e) Interpret the values of s and r².

s: The actual mileage of these cars typically differs from the mileage predicted by
the regression line by about 20,870.5 miles.

r²: About 83.7% of the variation in mileage is accounted for by the least-squares
regression line relating mileage to age.

4. Late bloomers? Japanese cherry trees tend to blossom early when spring weather is warm and later
when spring weather is cool. Here are some data on the average March temperature (in °C) and the day
in April when the first cherry blossom appeared over a 24-year period:

(a) Calculate the correlation and equation of the least-squares regression line. Interpret the
correlation, slope, and y intercept of the line in this setting.


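r = −0.85
LSRL: ŷ = 33.12 − 4.686x, where x = average March temperature (in °C) and
ŷ = predicted day in April of the first bloom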

Slope - The predicted number of days in April until the first bloom decreases by 4.686
days for each 1°C increase in average March temperature.

Y intercept - If the average March temperature were 0°C, the predicted number of days
in April until the first bloom would be 33.12.

Correlation - There is a moderately strong, negative linear relationship between average March
temperature (in °C) and the number of days in April until the first bloom.

(b) Suppose that the average March temperature this year was 8.2°C. Would you be willing to use the
equation in part (a) to predict the date of first bloom? Explain. 


No, I would not be willing to use the equation in part (a) to predict the date of the first
bloom because 8.2°C is outside the range of average March temperatures in the data.
The pattern may change beyond that range in a way the regression line does not
capture, so the prediction would be an unreliable extrapolation.

(c) Calculate and interpret the residual for the year when the average March temperature was 4.5°C.
Show your work. 


ŷ = 33.12 − 4.686(4.5) = 12.033
residual = 10 − 12.033 = −2.033

In the year when the average March temperature was 4.5°C, the first bloom came
2.033 days earlier than the regression line predicted.

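The residual in part (c) can be verified with a brief sketch. It assumes the line ŷ = 33.12 − 4.686x from part (a) and the observed first-bloom day of 10 used in the work above; the names are illustrative.

```python
INTERCEPT = 33.12   # predicted day in April when x = 0 degrees C
SLOPE = -4.686      # change in predicted day per 1 degree C

def predicted_day(march_temp_c):
    """Predicted day in April of first bloom for a given average March temperature."""
    return INTERCEPT + SLOPE * march_temp_c

observed_day = 10                        # observed first-bloom day at 4.5 degrees C
bloom_pred = predicted_day(4.5)
bloom_residual = observed_day - bloom_pred

# A 1-degree increase changes the prediction by exactly the slope.
one_degree_change = predicted_day(5.5) - predicted_day(4.5)

print(round(bloom_pred, 3), round(bloom_residual, 3), round(one_degree_change, 3))
```

The last line of the calculation restates the slope interpretation from part (a): each additional degree lowers the predicted bloom day by about 4.686 days.
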
5. Nap time Data is collected from a team of child physicians asking each one what they think the
recommended number of hours of sleep per day is for individuals of various ages. The data point at
(2,14.5) represents an average of 14.5 hours of sleep recommended from these physicians for
individuals who are 2 years old.

(a) Describe the association shown in the scatterplot.

There is a moderately strong, negative, curved association between age and the
recommended hours of sleep per day.

(b) A different researcher wants to use this data to analyze the sleep recommendations for individuals
up to and including 12 years of age only. Describe the association shown in the scatterplot if you ignore
any of the data values of people over the age of 12.

For individuals up to and including 12 years of age, there is a moderately strong,
negative, linear association between age and recommended hours of sleep per day.

(c) Explain how your answers for (a) and (b) highlight the problem with extrapolating from a previous
set of data values.

The pattern relating age to recommended hours of sleep changes after age 12,
where the recommended hours of sleep flatten out significantly. This shows the
danger of extrapolating: if the data beyond age 12 had not been collected, the
change in pattern might have remained unknown.
