
# Chapter 17: Introduction to Regression

Introduction to Linear Regression
• The Pearson correlation measures the
degree to which a set of data points form a
straight line relationship.
• Regression is a statistical procedure that
determines the equation for the straight
line that best fits a specific set of data.


• Any straight line can be represented by an
equation of the form Y = bX + a, where b
and a are constants.
• The value of b is called the slope constant
and determines the direction and degree
to which the line is tilted.
• The value of a is called the Y-intercept and
determines the point where the line
crosses the Y-axis.

• How well a set of data points fits a straight line
can be measured by calculating the distance
between the data points and the line.
• The total error between the data points and the
line is obtained by squaring each distance and
then summing the squared values.
• The regression equation is designed to produce
the minimum sum of squared errors.
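
The squared-error idea above can be sketched in a few lines. The scores below are hypothetical, invented only for illustration, and `sum_squared_error` is an illustrative helper name. For these scores, the line Y = 0.6X + 2.2 is the least-squares line, so it should produce a smaller total than any competing line.

```python
# Hypothetical scores, invented for illustration (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def sum_squared_error(b, a, xs, ys):
    """Square each distance between Y and the line's prediction bX + a, then sum."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

sse_best = sum_squared_error(0.6, 2.2, X, Y)   # least-squares line for these scores
sse_other = sum_squared_error(1.0, 1.0, X, Y)  # an arbitrary competing line

print(f"SSE for Y = 0.6X + 2.2: {sse_best:.2f}")
print(f"SSE for Y = 1.0X + 1.0: {sse_other:.2f}")
```

The first total comes out near 2.40 and the second at 4.00, which matches the claim that the regression line gives the minimum sum of squared errors.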


The equation for the regression line is Ŷ = bX + a, where b and a are the values that minimize the sum of squared errors.
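
The formula itself did not survive in this copy of the slides. As a hedged reconstruction, the standard least-squares solution for a single predictor is shown below. The symbols SP (sum of products of deviations), SS_X (sum of squared deviations for X), and M_X, M_Y (the means) are assumed notation, not taken from the slides.

```latex
\[
\hat{Y} = bX + a, \qquad
b = \frac{SP}{SS_X} = \frac{\sum (X - M_X)(Y - M_Y)}{\sum (X - M_X)^2}, \qquad
a = M_Y - b\,M_X
\]
```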

• The ability of the regression equation to
accurately predict the Y values is
measured by first computing the
proportion of the Y-score variability that is
predicted by the regression equation and
the proportion that is not predicted.
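
One common way to compute these two proportions is through the squared Pearson correlation: r² is the predicted proportion and 1 - r² is the unpredicted proportion. The sketch below assumes that approach and uses hypothetical scores; the names `ss_regression` and `ss_residual` are illustrative labels for the predicted and unpredicted variability.

```python
import math

# Hypothetical scores, invented for illustration (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of squared deviations and sum of products of deviations.
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

r = sp / math.sqrt(ss_x * ss_y)   # Pearson correlation
r_squared = r ** 2                # proportion of Y variability that is predicted

ss_regression = r_squared * ss_y        # predicted variability
ss_residual = (1 - r_squared) * ss_y    # unpredicted variability

print(f"r^2 = {r_squared:.2f}")
print(f"predicted SS = {ss_regression:.2f}, unpredicted SS = {ss_residual:.2f}")
```

For these scores about 60% of the Y variability is predicted (SS of roughly 3.6) and about 40% is not (SS of roughly 2.4); the two parts add up to the total variability of Y.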


• The unpredicted variability can be used to
compute the standard error of estimate,
which is a measure of the standard
distance between the actual Y values and
the predicted Y values.
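
A minimal sketch of that computation, assuming the usual definition (the square root of the unpredicted SS divided by its degrees of freedom, with df = n - 2 for one predictor). The scores are hypothetical and the variable names are illustrative.

```python
import math

# Hypothetical scores, invented for illustration (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Least-squares slope and intercept.
ss_x = sum((x - mean_x) ** 2 for x in X)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
b = sp / ss_x
a = mean_y - b * mean_x

# Unpredicted variability: squared distances between actual and predicted Y.
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

df = n - 2  # two values (b and a) are estimated from the data
standard_error = math.sqrt(ss_residual / df)

print(f"standard error of estimate = {standard_error:.3f}")
```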


• Finally, the overall significance of the regression
equation can be evaluated by computing an F-ratio.
• A significant F-ratio indicates that the equation
predicts a significant portion of the variability in
the Y scores (more than would be expected by
chance alone).
• To compute the F-ratio, you first calculate a
variance or MS for the predicted variability and
for the unpredicted variability:
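
The formulas that followed this bullet are missing from this copy. A hedged reconstruction using the standard definitions for regression with one predictor is given here; the subscripts "regression" (predicted) and "residual" (unpredicted) are assumed labels, and n is the number of X, Y pairs.

```latex
\[
\begin{aligned}
MS_{\text{regression}} &= \frac{SS_{\text{regression}}}{df_{\text{regression}}}, & df_{\text{regression}} &= 1 \\
MS_{\text{residual}} &= \frac{SS_{\text{residual}}}{df_{\text{residual}}}, & df_{\text{residual}} &= n - 2 \\
F &= \frac{MS_{\text{regression}}}{MS_{\text{residual}}}
\end{aligned}
\]
```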

Introduction to Multiple Regression
with Two Predictor Variables
• In the same way that linear regression
produces an equation that uses values of
X to predict values of Y, multiple
regression produces an equation that
uses two different variables (X1 and X2) to
predict values of Y.
• The equation is determined by a least
squared error solution that minimizes the
squared distances between the actual Y
values and the predicted Y values.
• For two predictor variables, the general form of
the multiple regression equation is:
Ŷ = b1X1 + b2X2 + a

• The ability of the multiple regression equation to
accurately predict the Y values is measured by
first computing the proportion of the Y-score
variability that is predicted by the regression
equation and the proportion that is not predicted.
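
As a sketch of how the two-predictor equation and its predicted proportion can be obtained, the code below solves the least-squares problem in closed form from sums of squares and sums of products. The data are hypothetical, and `sum_products` and the other names are illustrative helpers, not part of the slides.

```python
# Hypothetical scores, invented for illustration (not from the slides).
X1 = [1, 2, 3, 1, 2, 3]
X2 = [1, 1, 1, 3, 3, 3]
Y = [4.5, 5.0, 8.5, 6.5, 7.0, 10.5]

def mean(values):
    return sum(values) / len(values)

def sum_products(us, vs):
    """Sum of products of deviations; with us == vs this is a sum of squares."""
    mu, mv = mean(us), mean(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs))

ss_x1 = sum_products(X1, X1)
ss_x2 = sum_products(X2, X2)
ss_y = sum_products(Y, Y)
sp_x1y = sum_products(X1, Y)
sp_x2y = sum_products(X2, Y)
sp_x1x2 = sum_products(X1, X2)

# Closed-form least-squares solution for two predictors.
# The denominator is zero only if X1 and X2 are perfectly correlated.
denom = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (sp_x1y * ss_x2 - sp_x1x2 * sp_x2y) / denom
b2 = (sp_x2y * ss_x1 - sp_x1x2 * sp_x1y) / denom
a = mean(Y) - b1 * mean(X1) - b2 * mean(X2)

# Split the Y variability into predicted and unpredicted parts.
predicted = [b1 * x1 + b2 * x2 + a for x1, x2 in zip(X1, X2)]
ss_residual = sum((y - p) ** 2 for y, p in zip(Y, predicted))
ss_regression = ss_y - ss_residual
r_squared = ss_regression / ss_y

print(f"Y-hat = {b1:.2f}*X1 + {b2:.2f}*X2 + {a:.2f}")
print(f"predicted proportion R^2 = {r_squared:.2f}")
```

For these scores the equation is Ŷ = 2X1 + 1X2 + 1, and 88% of the Y variability is predicted (SS of 22 out of 25), leaving an unpredicted SS of 3.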


• As with linear regression, the unpredicted variability (SS
and df) can be used to compute a standard error of
estimate that measures the standard distance between the
actual Y values and the predicted values.
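
Written out under the usual definitions, and assuming n is the number of individuals, the computation is the same as for linear regression except that one more degree of freedom is lost for the second predictor. The subscript "residual" is an assumed label for the unpredicted variability.

```latex
\[
\text{standard error of estimate}
= \sqrt{\frac{SS_{\text{residual}}}{df_{\text{residual}}}},
\qquad df_{\text{residual}} = n - 3
\]
```
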
• In addition, the overall significance of the
multiple regression equation can be
evaluated with an F-ratio:
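
The formula is missing from this copy. A hedged reconstruction from the standard definitions for two predictors follows; the subscripts "regression" and "residual" are assumed labels for the predicted and unpredicted variability, and n is the number of individuals.

```latex
\[
F = \frac{MS_{\text{regression}}}{MS_{\text{residual}}}
  = \frac{SS_{\text{regression}} / 2}{SS_{\text{residual}} / (n - 3)},
\qquad df = 2,\; n - 3
\]
```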

Partial Correlation
• A partial correlation measures the
relationship between two variables (X and
Y) while eliminating the influence of a third
variable (Z).
• Partial correlations are used to reveal the
real, underlying relationship between two
variables when researchers suspect that
the apparent relation may be distorted by
a third variable.
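
The slides do not give a formula at this point. For reference, the standard first-order partial correlation between X and Y with Z held constant can be written from the three pairwise Pearson correlations, where the subscripted r values are assumed notation:

```latex
\[
r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}
{\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}}
\]
```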

• For example, there probably is no
underlying relationship between weight
and mathematics skill for elementary
school children.
• However, both of these variables are
positively related to age: Older children
weigh more and, because they have spent
more years in school, have higher
mathematics skills.

• As a result, weight and mathematics skill
will show a positive correlation for a
sample of children that includes several
different ages.
• A partial correlation between weight and
mathematics skill, holding age constant,
would eliminate the influence of age and
show the true correlation, which is near
zero.
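
The weight and mathematics example can be sketched numerically with the standard first-order partial correlation formula. The three correlations below are hypothetical values chosen for illustration, not data from the slides, and `partial_correlation` is an illustrative helper name.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the influence of Z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations, invented for illustration.
r_weight_math = 0.25   # apparent relationship across mixed ages
r_weight_age = 0.50    # older children weigh more
r_math_age = 0.50      # older children have higher mathematics skills

r_partial = partial_correlation(r_weight_math, r_weight_age, r_math_age)
print(f"weight-math correlation holding age constant: {r_partial:.2f}")
```

With these values the apparent correlation of 0.25 is fully accounted for by age, so the partial correlation is 0.00.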
