
REGRESSION

Main source: Gravetter, F. J., & Wallnau, L. B. (2013)
Additional source: Leech, N. L., Barrett, K. C., & Morgan, G. A. (2011)
Important Terminology
• Linear Regression
• Line of Best Fit
• Regression Equation (Regression Line)
• Standard Error of Estimate
The General Idea About Linear Regression
Suppose we are asked to investigate the relationship
between two variables, VarP (the independent variable)
and VarQ (the dependent variable):

Pair VarP VarQ

Pair 1 10 7
Pair 2 20 12
Pair 3 30 17
Pair 4 40 22

What would be the predicted value of VarQ if VarP = 15? If VarP = 25? How do you predict these?
Notice that if we connect these
points, we would get a straight line.
This line fits ALL the ‘observed’
points. This straight line is called the
line of best fit or regression line.

The line of best fit defines a basis for predicting values
of Q, given values of P (and vice versa).
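
From the table, the gradient and Y-intercept of this line can be read off directly (this worked illustration is mine; the slides pose the question without stating the equation):

gradient = (12 − 7) / (20 − 10) = 0.5
Y-intercept = 7 − (0.5)(10) = 2

so the line is Q = 0.5 × P + 2, which predicts Q = 9.5 when VarP = 15 and Q = 14.5 when VarP = 25.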

The concept of the line of best fit can be extended to
form a basis for linear regression.
A researcher investigates the relationship between an individual's
score on a Reading Aptitude Test and the average number of
hours he/she spends reading (Hours). The data gathered
from 10 students are as follows:

Student   Score on Reading Aptitude Test (X)   Hours (Y)
S1        20                                    5
S2         5                                    1
S3         5                                    2
S4        40                                    7
S5        30                                    8
S6        35                                    9
S7         5                                    3
S8         5                                    2
S9        15                                    5
S10       40                                    8

What would be the predicted number of hours spent by a
student with a Reading Aptitude score of 27? How do you
predict this?
Scatterplot (Hours Spent vs Reading Aptitude Test)
Notice that in this case, we cannot
connect these points with a straight line.
This is because the relation between the X and
Y values is not a perfect positive
relationship (in fact, r = 0.94).

However, the concept of the line of best fit
can be employed with some adjustments:
draw a regression line such that, given the
observed relation between X and Y, we can
attempt to make predictions of Y which,
although not perfect, would involve the
least degree of prediction error (i.e. this
regression line is a compromise in getting
the line of best fit).

Note that the line may not pass through
ANY of the ‘observed’ points!
The procedure which produces a
linear regression line (and thereby provides
us with a basis for the best
prediction) is called linear regression.
In using this type of prediction, we
make a vital assumption: that the
variables used to make the prediction
are linearly related.

Linear regression can only be
performed when both variables being
investigated are measured on an
interval/ratio scale.
Scatterplot (Hours Spent vs Reading Aptitude Scores), with the regression line drawn through the observed points.
Correlation vs Linear Regression

Correlation measures the degree to which a set of data
points form a straight-line relationship.

Regression is a statistical procedure that determines the
equation for the straight line that best fits a specific set of
data.
Linear Regression
• The regression line is represented by a
linear equation which takes the form
Ŷ = bX + a
where b and a are constants.
• The value of b is called the gradient of the
regression line and determines the
direction and degree to which the line is
tilted.
• The value of a is called the Y-intercept and
determines the point where the line
crosses the Y-axis.
Illustration: the Line of Best Fit and the Regression Line, each drawn with Gradient = 1 and Y-intercept = 4.
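
(Reading the labels above, both pictured lines have gradient 1 and Y-intercept 4, i.e. they correspond to the equation Ŷ = (1)X + 4 = X + 4; this restatement is mine, not part of the original slides.)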


Numerical Computation of a and b

Given an independent variable X which relates to a
dependent variable Y, the equation for the regression line
is given by:

$$\hat{Y} = bX + a$$

where

$$b = \frac{SP}{SS_X} = \frac{\sum (X - M_X)(Y - M_Y)}{\sum (X - M_X)^2}$$

and

$$a = M_Y - b\,M_X$$

(Here M_X and M_Y denote the means of X and Y.)
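
Applying these formulas to the Reading Aptitude data above also answers the earlier question about a student scoring 27. The following is a minimal Python sketch (the variable names and the rounded results are mine, not part of the original slides):

```python
# Reading Aptitude scores (X) and Hours spent reading (Y) from the table above
x = [20, 5, 5, 40, 30, 35, 5, 5, 15, 40]
y = [5, 1, 2, 7, 8, 9, 3, 2, 5, 8]

n = len(x)
mean_x = sum(x) / n                     # M_X = 20
mean_y = sum(y) / n                     # M_Y = 5

sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))   # SP = 370
ss_x = sum((xi - mean_x) ** 2 for xi in x)                        # SS_X = 2050

b = sp / ss_x                           # gradient, about 0.18
a = mean_y - b * mean_x                 # Y-intercept, about 1.39

y_hat = b * 27 + a                      # predicted Hours for a score of 27, about 6.3
print(f"Yhat = {b:.3f}X + {a:.3f}; prediction at X = 27: {y_hat:.2f}")
```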
The Standard Error of Estimate (SEE)

Regression equations are more accurate than just
guessing. But even if you are using a regression
equation, you should not fall into the trap of thinking that
these equations will give you predictions that are 100%
accurate. As long as the correlation coefficients you are
using are not perfect (i.e. r = +1 or r = −1), your
predictions will not be perfect. In all psychological and
biological populations, some unexplained variance will
cause inaccuracies in predictions made using regression
equations.

As you may recall, the coefficient of determination (r²)
expresses the amount of variance in one variable that is
explained by the other variable.
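
For the reading-aptitude example above, where r = 0.94, this works out to roughly r² = (0.94)² ≈ 0.88: about 88% of the variance in Hours is explained by the Reading Aptitude scores, and about 12% is left unexplained (this worked figure is mine, derived from the r reported earlier).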
Whenever the correlation coefficient is not +1.00 or
−1.00, unexplained variance will always cause some
error in prediction. This error of prediction is called the
standard error of estimate.
The standard error of estimate (SEE) is
the standard deviation of actual values
from the value estimated from the
regression equation.
The smaller the standard error of
estimate, the closer to the actual real-
world values the prediction is likely to
be.
Range of deviation of Y scores one SEE above and one SEE
below the line of best fit for the sample study, where SEE = 0.963.
Additional Statistic Related to Linear Regression

$$SEE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

where

$$SS_{residual} = \sum (Y - \hat{Y})^2 = (1 - r^2)\,SS_Y$$

and

$$df = n - 2$$

(SS_Y = Σ(Y − M_Y)² is the sum of squared deviations of Y.)
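
As a rough illustration of this formula (a sketch that is not part of the original slides; the function name and variable names are mine), SEE can be computed directly from the residuals of the fitted line:

```python
import math

def standard_error_of_estimate(x, y):
    """Return SEE = sqrt(sum((Y - Yhat)^2) / (n - 2)) for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # least-squares gradient (b) and Y-intercept (a), as defined earlier
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    a = mean_y - b * mean_x
    # residual sum of squares, divided by df = n - 2
    ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_residual / (n - 2))
```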
EXAMPLE

Consider the following pairs of scores, where X is an
independent variable which can be used to predict the
value of Y (the dependent variable).

X   Y
2   3
6   11
0   6
4   6
5   7
7   12
5   10
3   9

a. Sketch the scatterplot of X versus Y.
b. Use SPSS to:
   i. Compute the Pearson correlation r between X and Y.
   ii. Find the regression equation relating X and Y.
   iii. Compare the value of r with the R obtained from the regression. What can you say about these values?
   iv. What would be the predicted value of Y when X = 4.2?
Performing Linear Regression using SPSS

The Data
(Use similar procedures as described before to produce the scatterplot of X vs Y.)
Producing Linear Regression
Analyze → Regression → Linear
Select the dependent variable and move it to the
Dependent box. Select the independent variable
and move it to the Independent(s) box. Use the
drop-down arrow to select Enter as the Method.
Click Statistics and check the Estimates and
Model fit boxes. Click Continue, then OK.
The Output

Scatterplot (X vs Y)

Pearson correlation r = R = 0.750

The coefficients table reports the Y-intercept (the Constant) and the gradient (the B value for X).
Determining the Coefficients b and a to form Ŷ = bX + a:

Look at the values of B in the unstandardized regression
coefficients column:
b = 1 (the regression coefficient of the independent variable X, i.e. the gradient of the regression line)
a = 4 (the constant in the regression equation, i.e. the Y-intercept of the regression line)

i.e. the regression equation is:

Ŷ = X + 4

When X = 4.2, then Y = 4.2 + 4 = 8.2
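
For readers without SPSS, the same numbers can be reproduced with a few lines of Python. This is only a sketch of an equivalent analysis (it is not part of the original slides and assumes SciPy is available):

```python
from scipy import stats

# X and Y pairs from the example table
x = [2, 6, 0, 4, 5, 7, 5, 3]
y = [3, 11, 6, 6, 7, 12, 10, 9]

result = stats.linregress(x, y)            # least-squares fit of Y on X
print(f"r = {result.rvalue:.3f}")          # Pearson correlation: 0.750
print(f"b = {result.slope:.3f}")           # gradient: 1.000
print(f"a = {result.intercept:.3f}")       # Y-intercept: 4.000
print(f"Yhat(4.2) = {result.slope * 4.2 + result.intercept:.1f}")   # 8.2
```

As expected for a simple regression with a single predictor, r equals the R reported by the regression, and the fitted equation matches Ŷ = X + 4.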


Interpretation of SPSS Output
(cf. Leech, Barrett & Morgan, 2011, p. 101)
