
Chapter 17: Introduction to Regression

Introduction to Linear Regression
• The Pearson correlation measures the
degree to which a set of data points forms a
straight-line relationship.
• Regression is a statistical procedure that
determines the equation for the straight
line that best fits a specific set of data.

Introduction to Linear Regression (cont.)

• Any straight line can be represented by an
equation of the form Y = bX + a, where b
and a are constants.
• The value of b is called the slope constant
and determines the direction and degree
to which the line is tilted.
• The value of a is called the Y-intercept and
determines the point where the line
crosses the Y-axis.
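
The roles of the two constants can be checked with a short sketch. The function name line and the values b = 0.5 and a = 2.0 are hypothetical, chosen only for illustration.

```python
def line(x, b, a):
    """Return Y for a straight line with slope b and Y-intercept a."""
    return b * x + a

# Hypothetical constants, for illustration only.
b, a = 0.5, 2.0

# At X = 0 the line crosses the Y-axis, so Y equals the intercept a.
y_at_zero = line(0, b, a)

# Each 1-point increase in X changes Y by the slope b.
change_per_unit = line(4, b, a) - line(3, b, a)

print(y_at_zero, change_per_unit)
```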
Introduction to Linear Regression (cont.)

• How well a set of data points fits a straight line
can be measured by calculating the distance
between the data points and the line.
• The total error between the data points and the
line is obtained by squaring each distance and
then summing the squared values.
• The regression equation is designed to produce
the minimum sum of squared errors.
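
As a rough illustration of the total squared error, the sketch below measures each distance vertically (actual Y minus the Y on the line), squares it, and sums. The scores are made up, and the function name sum_squared_error is hypothetical. The first line tested is the least-squares line for these scores; the second is an arbitrary competitor.

```python
def sum_squared_error(xs, ys, b, a):
    """Total squared vertical distance between the points and the line Y = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_best = sum_squared_error(xs, ys, b=0.6, a=2.2)   # least-squares line for these scores
sse_other = sum_squared_error(xs, ys, b=1.0, a=1.0)  # an arbitrary competing line

print(sse_best, sse_other)
```

For these scores the least-squares line gives a total squared error of about 2.4, compared with 4.0 for the competing line.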

Introduction to Linear Regression (cont.)

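The equation for the regression line is Ŷ = bX + a, where
b = SP/SS_X and a = M_Y - bM_X. Here SP is the sum of products of
the X and Y deviations, SS_X is the sum of squared X deviations,
and M_X and M_Y are the means of X and Y.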
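
A minimal sketch of finding the slope and intercept from raw scores, using the standard least-squares formulas b = SP/SS_X and a = M_Y - b(M_X). The scores are made up and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line through paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: sum of products of the X and Y deviations from their means.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS_X: sum of squared X deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
print(f"Y-hat = {b:.2f}X + {a:.2f}")
```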

Introduction to Linear Regression (cont.)
• The ability of the regression equation to
accurately predict the Y values is
measured by first computing the
proportion of the Y-score variability that is
predicted by the regression equation (r²) and
the proportion that is not predicted (1 - r²).
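
The two proportions can be sketched from the Pearson correlation: r² is the predicted proportion and 1 - r² is the unpredicted proportion, so multiplying each by SS_Y splits the Y variability into a predicted part and an unpredicted part. The scores are made up and the helper name sum_of_products is hypothetical.

```python
import math

def sum_of_products(us, vs):
    """Sum of products of deviations from the means (gives SS when us is vs)."""
    mean_u = sum(us) / len(us)
    mean_v = sum(vs) / len(vs)
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sp = sum_of_products(xs, ys)
ss_x = sum_of_products(xs, xs)
ss_y = sum_of_products(ys, ys)

r = sp / math.sqrt(ss_x * ss_y)        # Pearson correlation
predicted_proportion = r ** 2          # proportion of Y variability predicted
unpredicted_proportion = 1 - r ** 2    # proportion not predicted

ss_regression = predicted_proportion * ss_y
ss_residual = unpredicted_proportion * ss_y

print(round(predicted_proportion, 3), round(ss_regression, 3), round(ss_residual, 3))
```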

Introduction to Linear Regression (cont.)

• The unpredicted variability can be used to
compute the standard error of estimate,
which is a measure of the average
distance between the actual Y values and
the predicted Y values.
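
A sketch of the calculation, assuming the usual definition: the standard error of estimate is the square root of SS_residual divided by its degrees of freedom, with df = n - 2 for a single predictor. The scores are made up, and the slope and intercept are assumed to come from a least-squares fit to those scores.

```python
import math

def standard_error_of_estimate(xs, ys, b, a):
    """Square root of SS_residual / df, with df = n - 2 for one predictor."""
    residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
    ss_residual = sum(e ** 2 for e in residuals)
    df = len(xs) - 2
    return math.sqrt(ss_residual / df)

# Made-up scores, with an assumed least-squares line Y-hat = 0.6X + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

se = standard_error_of_estimate(xs, ys, b=0.6, a=2.2)
print(round(se, 3))
```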

Introduction to Linear Regression (cont.)

• Finally, the overall significance of the regression
equation can be evaluated by computing an F-ratio.
• A significant F-ratio indicates that the equation
predicts a significant portion of the variability in
the Y scores (more than would be expected by
chance alone).
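• To compute the F-ratio, you first calculate a
variance or MS for the predicted variability and
for the unpredicted variability:
MS_regression = SS_regression / df_regression, with df_regression = 1
MS_residual = SS_residual / df_residual, with df_residual = n - 2
F = MS_regression / MS_residual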
Introduction to Linear Regression (cont.)
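
The F-ratio for a single predictor can be sketched as follows. Each MS is an SS divided by its degrees of freedom (df = 1 for the predicted variability and df = n - 2 for the unpredicted variability), and F is the ratio of the two MS values. The SS values and n are made-up inputs, and the function name regression_f_ratio is hypothetical.

```python
def regression_f_ratio(ss_regression, ss_residual, n):
    """F-ratio for linear regression with one predictor and n pairs of scores."""
    df_regression = 1
    df_residual = n - 2
    ms_regression = ss_regression / df_regression  # variance for the predicted variability
    ms_residual = ss_residual / df_residual        # variance for the unpredicted variability
    return ms_regression / ms_residual, df_regression, df_residual

# Made-up values, for illustration only.
f_ratio, df1, df2 = regression_f_ratio(ss_regression=3.6, ss_residual=2.4, n=5)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

Whether the resulting F is significant still has to be judged against the critical value in an F distribution table for the same df values.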

Introduction to Multiple Regression
with Two Predictor Variables
• In the same way that linear regression
produces an equation that uses values of
X to predict values of Y, multiple
regression produces an equation that
uses two different variables (X1 and X2) to
predict values of Y.
• The equation is determined by a
least-squared-error solution that minimizes the
sum of the squared distances between the actual Y
values and the predicted Y values.
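
A minimal sketch of the least-squared-error solution for two predictors, using the standard formulas written in terms of SS and SP values. The scores are made up and the function names are hypothetical. The formulas break down when X1 and X2 are perfectly correlated, because the denominator is then zero.

```python
def sum_of_products(us, vs):
    """Sum of products of deviations from the means (gives SS when us is vs)."""
    mean_u = sum(us) / len(us)
    mean_v = sum(vs) / len(vs)
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))

def fit_two_predictors(x1, x2, y):
    """Return (b1, b2, a) for the least-squares equation Y-hat = b1*X1 + b2*X2 + a."""
    ss_x1 = sum_of_products(x1, x1)
    ss_x2 = sum_of_products(x2, x2)
    sp_x1x2 = sum_of_products(x1, x2)
    sp_x1y = sum_of_products(x1, y)
    sp_x2y = sum_of_products(x2, y)
    denom = ss_x1 * ss_x2 - sp_x1x2 ** 2  # zero if X1 and X2 are perfectly correlated
    b1 = (sp_x1y * ss_x2 - sp_x1x2 * sp_x2y) / denom
    b2 = (sp_x2y * ss_x1 - sp_x1x2 * sp_x1y) / denom
    n = len(y)
    a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    return b1, b2, a

# Made-up scores, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2, 4, 5, 4, 5]

b1, b2, a = fit_two_predictors(x1, x2, y)
residuals = [yi - (b1 * u + b2 * v + a) for u, v, yi in zip(x1, x2, y)]
print(f"Y-hat = {b1:.3f}*X1 + {b2:.3f}*X2 + {a:.3f}")
```

One way to check a least-squares solution is that the residuals sum to zero and are uncorrelated with each predictor.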
Introduction to Multiple Regression
with Two Predictor Variables (cont.)
• For two predictor variables, the general form of
the multiple regression equation is:
Ŷ = b1X1 + b2X2 + a

• The ability of the multiple regression equation to
accurately predict the Y values is measured by
first computing the proportion of the Y-score
variability that is predicted by the regression
equation (R²) and the proportion that is not predicted (1 - R²).
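
The predicted and unpredicted proportions can be sketched directly from the actual and predicted Y values: SS_residual is the sum of squared differences between them, and the predicted proportion (R²) is the share of SS_Y that remains. The scores are made up, and the coefficients are assumed to come from a least-squares fit to those scores.

```python
# Made-up scores, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2, 4, 5, 4, 5]

# Coefficients assumed to come from a least-squares fit to these scores.
b1, b2, a = 5 / 9, 1 / 18, 13 / 6

y_hat = [b1 * u + b2 * v + a for u, v in zip(x1, x2)]
mean_y = sum(y) / len(y)

ss_y = sum((yi - mean_y) ** 2 for yi in y)                     # total Y variability
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))  # unpredicted variability
ss_regression = ss_y - ss_residual                             # predicted variability

r_squared = ss_regression / ss_y
print(f"predicted: {r_squared:.3f}, unpredicted: {1 - r_squared:.3f}")
```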

Introduction to Multiple Regression
with Two Predictor Variables (cont.)

As with linear regression, the unpredicted variability (SS
and df) can be used to compute a standard error of
estimate that measures the standard distance between the
actual Y values and the predicted values.
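
A brief sketch of that calculation, assuming the usual degrees of freedom for two predictors, df = n - 3. The SS_residual and n values are made up, and the function name is hypothetical.

```python
import math

def standard_error_two_predictors(ss_residual, n):
    """Square root of SS_residual / df, with df = n - 3 for two predictors."""
    df_residual = n - 3
    ms_residual = ss_residual / df_residual
    return math.sqrt(ms_residual)

# Made-up values, for illustration only.
se = standard_error_two_predictors(ss_residual=24, n=15)
print(round(se, 3))
```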
Introduction to Multiple Regression
with Two Predictor Variables (cont.)
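• In addition, the overall significance of the
multiple regression equation can be
evaluated with an F-ratio:
F = MS_regression / MS_residual, with df = 2 for the regression
(two predictors) and df = n - 3 for the residual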
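
The F-ratio for two predictors follows the same pattern as in linear regression, with different degrees of freedom: df = 2 for the predicted variability and df = n - 3 for the unpredicted variability. The sketch below uses made-up SS values and a hypothetical function name.

```python
def multiple_regression_f_ratio(ss_regression, ss_residual, n):
    """F-ratio for a regression equation with two predictors and n sets of scores."""
    df_regression = 2
    df_residual = n - 3
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual, df_regression, df_residual

# Made-up values, for illustration only.
f_ratio, df1, df2 = multiple_regression_f_ratio(ss_regression=36, ss_residual=24, n=15)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```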

Partial Correlation
• A partial correlation measures the
relationship between two variables (X and
Y) while eliminating the influence of a third
variable (Z).
• Partial correlations are used to reveal the
real, underlying relationship between two
variables when researchers suspect that
the apparent relation may be distorted by
a third variable.

Partial Correlation (cont.)
• For example, there probably is no
underlying relationship between weight
and mathematics skill for elementary
school children.
• However, both of these variables are
positively related to age: Older children
weigh more and, because they have spent
more years in school, have higher
mathematics skills.

Partial Correlation (cont.)
• As a result, weight and mathematics skill
will show a positive correlation for a
sample of children that includes several
different ages.
• A partial correlation between weight and
mathematics skill, holding age constant,
would eliminate the influence of age and
show the true correlation, which is near
zero.
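
The weight and mathematics example can be sketched with the standard formula for a partial correlation computed from the three pairwise Pearson correlations. The correlation values below are hypothetical, chosen so that age accounts for nearly all of the apparent relationship, and the function name is an assumption.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the influence of Z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations: X = weight, Y = mathematics skill, Z = age.
r_weight_math = 0.65
r_weight_age = 0.80
r_math_age = 0.80

r_partial = partial_correlation(r_weight_math, r_weight_age, r_math_age)
print(round(r_partial, 3))
```

With these values the ordinary correlation of 0.65 between weight and mathematics skill drops to roughly 0.03 once age is held constant.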
