Olympic Swimmers' Heights vs. Times in the 200m Butterfly


For our topic, we chose to see whether Olympic swimmers' heights affect their times in the
200 meter butterfly. We collected our data by searching the internet for Olympic swimmers,
then entered each swimmer's height and time into a spreadsheet. Our explanatory variable is
the height of the swimmers, in inches. Our response variable is each swimmer's time in the
200m butterfly, in seconds. Our scatter plot shows a negative correlation. We expected a
stronger correlation, but it turned out to be weak: our correlation coefficient, r, is -0.33.
The correlation coefficient tells us how strongly height and time are linearly related; values
near -1 or 1 indicate a strong relationship, and values near 0 indicate a weak one. The average
height is 72.29 inches and the average time is 117.46 seconds. We found the averages on our
calculator by pressing STAT, EDIT, and typing our data set into L1 and L2. We then went to
STAT, CALC, 2-Var Stats, where x̄ is the average height and ȳ is the average time.
Afterwards, we found the least-squares regression line for our data: ŷ = -0.284x + 137.996.
We found the regression line by going to STAT, CALC, 4: LinReg(ax+b).
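
The calculator steps above can also be sketched in code. This is only an illustration of how
the averages, r, and the least-squares line are computed. The function name linear_regression
and the four sample points are assumptions made for the example; they are placeholders, not
the swimmers' data from this report.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n          # average of the explanatory variable
    mean_y = sum(ys) / n          # average of the response variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)             # correlation coefficient
    return slope, intercept, r

# Placeholder values only, NOT the data collected for this report.
heights = [70, 72, 74, 76]             # inches
times = [119.0, 118.0, 117.5, 116.5]   # seconds
slope, intercept, r = linear_regression(heights, times)
print(slope, intercept, r)
```

One way to sanity-check a regression line is that it should pass through the point of
averages. Plugging the average height of 72.29 into the report's equation gives about 117.47,
which is close to the reported average time of 117.46.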

The marginal change is -0.284. For each 1 inch increase in height, the predicted time changes
by -0.284 seconds; that is, it decreases by 0.284 seconds. The slope of the regression line
tells us how many units the response variable is expected to change for each unit change in
the explanatory variable. This expected change in the response variable for each unit change
in the explanatory variable is called the marginal change of the response variable. In other
words, the marginal change is the amount the predicted y value changes per unit of x.
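
As a small check of what the marginal change means, the sketch below uses the slope and
intercept from the report's regression line. The function names and the heights 72 and 73
are chosen only for illustration.

```python
SLOPE = -0.284        # seconds per inch, from the report's regression line
INTERCEPT = 137.996   # seconds

def predicted_time(height_in):
    """Predicted 200m butterfly time (seconds) for a height in inches."""
    return SLOPE * height_in + INTERCEPT

def predicted_change(delta_inches):
    """Expected change in time (seconds) for a change in height (inches)."""
    return SLOPE * delta_inches

# Moving from 72 to 73 inches changes the prediction by exactly the slope.
one_inch = predicted_time(73) - predicted_time(72)
print(round(one_inch, 3))             # -0.284
print(round(predicted_change(3), 3))  # -0.852
```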

There are three influential points in the data set. Influential points are points that
noticeably change the regression line when they are removed; we treated the outliers in our
data as possible influential points. We looked for outliers in the y values, which would be
any time lower than 109.98 seconds or higher than 124.51 seconds. Based on this information,
we found 3 such points: (66, 126.37), (69, 124.72), and (71, 124.64). We removed each of these
points from our data set and recalculated our statistics. The new results showed that our data
was even more scattered than before. The new equation of our regression line was
ŷ = -0.066x + 121.77 and our new r = 0.014. With the influential points removed, there was
even less of a correlation than we had found with our full data set. This could be because
the outliers in our data set actually supported our theory that taller swimmers have quicker
times and shorter swimmers have slower times.
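
The outlier check described above can be sketched as follows. The two cutoffs and the three
points come from the report. The report does not say how the cutoffs were calculated (a
common choice is the 1.5 x IQR rule, but that is an assumption), so the sketch simply takes
the cutoffs as given. The function name is_outlier is made up for the example.

```python
LOW_CUTOFF = 109.98    # seconds, lower bound stated in the report
HIGH_CUTOFF = 124.51   # seconds, upper bound stated in the report

def is_outlier(time_s, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """True if a time falls below the low cutoff or above the high cutoff."""
    return time_s < low or time_s > high

# The three (height, time) points the report identified.
flagged_points = [(66, 126.37), (69, 124.72), (71, 124.64)]
all_flagged = all(is_outlier(t) for _, t in flagged_points)
print(all_flagged)          # True: each time is above 124.51
print(is_outlier(117.46))   # False: the average time is inside the bounds
```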

The original value of our coefficient of determination, r², was 0.11. Because this value is
close to 0, our data set has only a very weak linear correlation. If this number had been
closer to 1, it would have meant that our data had a stronger correlation; however, 0.11 is
close to zero, so our coefficient of determination tells us that there is little linear
relationship between the height of swimmers and their swimming times.

The coefficient of determination indicates how strong the correlation in a data set is, and
it tells you how much of the variation is explained by the linear model. Because our
r² = 0.11, we can tell that 11% of the variation in times is explained. We find this percent
by multiplying r² by 100. If 11% of the variation is explained by the linear model, then 89%
of the variation is unexplained by our linear model. Because such a large percent of the
variation is unexplained, other factors, possibly lurking variables, are likely influencing
our y values.
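
The arithmetic behind these percentages can be checked with a short sketch. It assumes the
coefficient of determination is the square of the correlation coefficient r = -0.33 reported
earlier, which holds for a regression line with one explanatory variable. The variable names
are chosen for the example.

```python
r = -0.33                                # correlation coefficient from the report
r_squared = r ** 2                       # coefficient of determination, about 0.1089
explained_pct = round(r_squared * 100)   # percent of variation explained by the line
unexplained_pct = 100 - explained_pct    # percent left unexplained

print(round(r_squared, 2))   # 0.11
print(explained_pct)         # 11
print(unexplained_pct)       # 89
```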

One lurking variable that could be affecting an Olympic swimmer's time is simply that the
swimmer might not have had their best day. Swimmers rarely set a personal record when they
compete at the Olympics, which shows that some of them have swum better on another day.
Another lurking variable could be injuries. Some swimmers compete with sore shoulders or
cramped legs, and injuries could affect how fast a swimmer finishes in their heat. Both of
these lurking variables could affect our data set because some swimmers may have swum slower
times than usual. Shorter swimmers and taller swimmers could both have been affected, so any
of our times could have changed. This means that while these lurking variables may have
affected our data, they may not have affected the results we concluded.

Our data had only a weak correlation, which makes it hard for us to trust predictions made by
interpolation or extrapolation. Interpolation is predicting values for x values that are
between observed x values in the data set. For example, we predicted what a 68 inch tall
swimmer's time might be. To find this, we plugged 68 into our regression line equation
(ŷ = -0.284x + 137.996): -0.284(68) + 137.996 = 118.68. We found that a swimmer with a height
of 68 inches might swim a time of 118.68 seconds. While interpolation can be close to the
correct answer, extrapolation is more likely to be inaccurate because it deals with a data
point outside the range of the data set. Extrapolation is predicting values for x values that
are beyond the observed x values in a data set. An example of extrapolation for our data set
would be predicting the swim time of an 80 inch tall swimmer. To estimate the swim time, we
plugged in 80 for x in our regression line equation and got 115.28 seconds. This means that,
based on our data, an 80 inch tall swimmer might swim the 200 meter butterfly in 115.28
seconds. However, this number could be inaccurate because it is outside of our data set, and
because there is not a strong correlation between our data values.
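
The two predictions can be reproduced with a short sketch that plugs a height into the
report's regression line and labels the prediction as interpolation or extrapolation. The
function names are made up for the example, and the height range of 66 to 78 inches is a
placeholder assumption, since the report does not list the shortest and tallest swimmers.

```python
def predict_time(height_in, slope=-0.284, intercept=137.996):
    """Predicted 200m butterfly time (seconds) from the report's regression line."""
    return slope * height_in + intercept

def prediction_type(height_in, min_height, max_height):
    """Label a prediction by whether the height lies inside the observed range."""
    if min_height <= height_in <= max_height:
        return "interpolation"
    return "extrapolation"

# ASSUMPTION: placeholder range of observed heights, not taken from the data.
MIN_HEIGHT, MAX_HEIGHT = 66, 78

for height in (68, 80):
    label = prediction_type(height, MIN_HEIGHT, MAX_HEIGHT)
    print(height, round(predict_time(height), 2), label)
```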
Times every other one:
http://london2012.nytimes.com/swimming/mens-200m-butterfly#heats
Times all:
https://www.rio2016.com/en/swimming-standings-sw-mens-200m-butterfly