CHAPTER 13
Statistics

Copyright © Cengage Learning. All rights reserved.


Section 13.5 Linear Regression and Correlation


Linear Regression

When performing research studies, scientists often wish to
know whether two variables are related. If the variables are
determined to be related, a scientist may then wish to find
an equation that can be used to model the relationship.

For instance, a geologist might want to know whether there
is a relationship between the duration of an eruption of a
geyser and the time between eruptions.

A first step in this determination is to collect some data.


Data involving two variables are called bivariate data.

Table 13.11 gives bivariate data showing the time between
two eruptions and the duration of the second eruption for
10 eruptions of the geyser Old Faithful.

Table 13.11

Once the data are collected, a scatter diagram or scatter
plot can be drawn, as shown in Figure 13.15.

Figure 13.15

One way for the geologist to create a model of the
relationship between the time between two eruptions and
the duration of the second eruption is to find a line that
approximates the data points plotted in the scatter plot.

There are many such lines that can be drawn, as shown in
Figure 13.16.

Figure 13.16
Of all the possible lines that can be drawn, the one that is
usually of most interest is called the line of best fit or the
least-squares regression line.

The least-squares regression line is the line that fits the data
better than any other line that might be drawn.

The least-squares regression line is defined as follows.
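
In symbols, a standard way to state this definition is sketched below. It assumes n data points (x_i, y_i) and a candidate line y = ax + b; the index notation is an assumption made here for illustration. The least-squares regression line is the choice of a and b that makes the sum of the squared vertical deviations between the data points and the line as small as possible:

```latex
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2
```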

Applying some techniques from calculus, it is possible to
find a formula for the least-squares line.
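
A standard form of that formula, written with the slope a and y-intercept b used in this section, is sketched below. The summations run over all n data points, and \bar{x} and \bar{y} denote the means of the x-values and y-values; this notation is assumed here for illustration.

```latex
\hat{y} = ax + b, \qquad
a = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {n\sum x^2 - \left(\sum x\right)^2}, \qquad
b = \bar{y} - a\bar{x}
```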

To apply the formula to the data for Old Faithful, we first
find the value of each summation.

Next, we use these values to find the value of a.

We then find the mean of the x-values and the mean of the
y-values and use them to find the y-intercept, b.
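
The hand calculation described above (the summations first, then the slope a, then the two means and the y-intercept b) can be sketched in Python. The function name `least_squares_line` and the sample data are hypothetical; the data are small placeholders chosen for easy arithmetic, not the values of Table 13.11.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    # Step 1: the summations used by the formula.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Step 2: the slope a.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom

    # Step 3: the means, then the y-intercept b.
    x_bar = sum_x / n
    y_bar = sum_y / n
    b = y_bar - a * x_bar
    return a, b


# Hypothetical placeholder data (NOT Table 13.11).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(round(a, 4), round(b, 4))  # 0.6 2.2
```

With real data, the same three steps apply unchanged; only the lists `xs` and `ys` differ.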

The regression equation is

The graph of the regression equation and a scatter plot
of the data are shown at right.

We can now use the regression equation to estimate the
duration of an eruption given the time between eruptions.
For instance, if the time between two eruptions is 200
seconds, then the estimated duration of the second
eruption is

The approximate duration of the eruption is 78 seconds.

As our example demonstrates, it can be challenging to
calculate all of the values needed to find a regression line.
Fortunately, many computer programs and calculators can
perform these calculations.

The following example shows the use of a TI-84 to find the
regression line for the Old Faithful data.

Enter the data from Table 13.11 into L1 and L2, as shown
below.

Table 13.11

Press the STAT key, tab to CALC, and scroll to
4:LinReg(ax+b). Then press ENTER.

Scroll to the Store RegEQ line. Press the VARS key and
scroll to Y-VARS. Press ENTER twice.

Press ENTER twice. The slope a and y-intercept b of the
least-squares line are shown. You will see two additional
values, r² and r, displayed on the screen.

The equation for the regression line is stored in Y1.
Using 200 seconds as the time between eruptions,
we can calculate the expected duration of the eruption
as follows.

Press the VARS key and scroll to Y-VARS. Press ENTER twice.
Now enter “(200)” and press ENTER.

The predicted duration of the eruption is approximately
78 seconds.

Here is an additional example of calculating regression
lines. Professor R. McNeill Alexander wanted to determine
whether the stride length of a dinosaur, as shown by its
fossilized footprints, could be used to estimate the speed
of the dinosaur.

Stride length for an animal is defined as the distance x from
a particular point on a footprint to that same point on the
next footprint of the same foot.

Because dinosaurs are extinct, Alexander and fellow
scientist A. S. Jayes carried out experiments with many
types of animals, including adult men, dogs, camels,
ostriches, and elephants.

Some of the results from these experiments are recorded in
Table 13.12.

Table 13.12 Speed for Selected Stride Lengths

Example 1 – Find the Equation of a Least-Squares Line

Find the equation of the least-squares line for the ordered
pairs in the table below.

Example 1 – Solution
Enter the data into a calculator or software program that
supports regression equations. Here are the results using a
TI-84 calculator.

The regression equation is

Once the equation of the least-squares line is found, it can
be used to make predictions. This procedure is illustrated in
the next example.

Example 2 – Use a Least-Squares Line to Make a Prediction

Use the equation of the least-squares line from Example 1
to predict the average speed of an adult man for each of
the following stride lengths. Round your results to the
nearest tenth of a meter per second.

a. 2.8 m b. 4.8 m

Example 2(a) – Solution
From Example 1, the regression equation is

Substitute 2.8 for x and evaluate the resulting expression.

The predicted average speed of an adult man with a stride
length of 2.8 m is 4.3 m/s.

Example 2(b) – Solution cont’d

From Example 1, the regression equation is

Substitute 4.8 for x and evaluate the resulting expression.

The predicted average speed of an adult man with a stride
length of 4.8 m is 9.8 m/s.

The procedure in Example 2a made use of an equation to
determine a point between given data points. This
procedure is referred to as interpolation.

In Example 2b, an equation was used to determine a point to
the right of the given data points.

The process of using an equation to determine a point to the
right or left of given data points is referred to as
extrapolation. See Figure 13.18.

Figure 13.18 Interpolation and extrapolation
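
The distinction between interpolation and extrapolation can be sketched as a simple range check on the x-value being predicted. The function name `prediction_kind` and the x-values below are hypothetical placeholders (they are not the entries of Table 13.12); the sketch only assumes, as the text states, that 2.8 lies between data points and 4.8 lies to the right of them.

```python
def prediction_kind(x_new, xs):
    """Classify a prediction at x_new relative to the observed x-values."""
    lowest, highest = min(xs), max(xs)
    if lowest <= x_new <= highest:
        return "interpolation"  # x_new lies within the observed range
    return "extrapolation"      # x_new lies left or right of the data


# Hypothetical placeholder x-values (NOT the entries of Table 13.12).
observed_x = [2.5, 3.0, 3.5, 4.0, 4.5]

print(prediction_kind(2.8, observed_x))  # interpolation
print(prediction_kind(4.8, observed_x))  # extrapolation
```

Predictions made by extrapolation rely on the linear pattern continuing outside the observed range, so they are generally treated with more caution.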
Linear Correlation Coefficient

To determine the strength of a linear relationship between
two variables, statisticians use a statistic called the linear
correlation coefficient, which is denoted by the variable r
and is defined as follows.
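
A standard computational form of the linear correlation coefficient is sketched below. It assumes n ordered pairs (x, y), with each summation running over all n pairs; the notation is chosen here for illustration.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;
          \sqrt{n\sum y^2 - \left(\sum y\right)^2}}
```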

If the linear correlation coefficient r is positive, the
relationship between the variables has a positive
correlation. In this case, if one variable increases, the
other variable also tends to increase.

If r is negative, the linear relationship between the variables
has a negative correlation. In this case, if one variable
increases, the other variable tends to decrease.

The closer | r | is to 1, the stronger the linear relationship
between the variables.
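
The sign and size of r can be explored with a short Python sketch based on the standard summation formula for the linear correlation coefficient. The function name `linear_correlation` and both data sets are hypothetical placeholders, not data from this section.

```python
from math import sqrt


def linear_correlation(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return numerator / denominator


# Hypothetical placeholder data.
xs = [1, 2, 3, 4, 5]
r_up = linear_correlation(xs, [2, 4, 5, 4, 5])     # y tends to rise with x
r_down = linear_correlation(xs, [10, 8, 6, 4, 2])  # y falls exactly linearly

print(round(r_up, 2))    # 0.77
print(round(r_down, 2))  # -1.0
```

The first data set gives a positive r below 1 (a positive but imperfect linear relationship); the second lies exactly on a falling line, so r is -1.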

Example 3 – Find a Linear Correlation Coefficient
Find the linear correlation coefficient for stride length
versus speed of an adult man. Use the data in the table below.
Round your result to the nearest hundredth.

Example 3 – Solution
The correlation coefficient r is displayed on the same screen
as the values for the regression equation. The screen from
Example 1 is shown below.

The linear correlation coefficient, rounded to the nearest
hundredth, is 0.99.

The linear correlation coefficient indicates the strength of a
linear relationship between two variables; however, it does
not indicate the presence of a cause-and-effect
relationship.

In your work with applications that involve the linear
correlation coefficient r, it is important to remember the
following properties of r.
