Professional Documents
Culture Documents
Pair 1 10 7
Pair 2 20 12
Pair 3 30 17
Pair 4 40 22
Pair 1 10 7
Pair 2 20 12
Pair 3 30 17
Pair 4 40 22
Notice that if we connect these
points, we would get a straight line.
This line fits ALL the ‘observed’
points.This straight line is called the
line of best fit or regression line
Score on
Reading
Student Hours (Y)
Aptitude Test
(X)
S1 20 5
S2 5 1
S3 5 2
S4 40 7
S5 30 8
S6 35 9
S7 5 3
S8 5 2
S9 15 5
S10 be the 40
What would predicted number of8hours spent by a
student with a Reading Aptitude score of 27? How do you
predict this?
Scatterplot (Hours Spent vs Reading Aptitude Test)
Notice that in this case, we cannot
connect these points with a straight line.
This is because the relation between X and
Y values is not a perfect positive
relationship (in fact, r = 0.94).
However, the concept of the line of best fit
can be employed with some adjustments;
draw a regression line such that, given the
observed relation between X and Y, we can
attempt to make predictions of Y which,
although not perfect, would involved the
least degree of prediction error (i.e. this
regression line is a compromise in getting
the line of best fit.
Note that the line may not pass through
ANY of the ‘observed’ points!
The procedure which produces a
linear regression line (which provides
us with a basis for the best
prediction) is called linear regression.
In using this type of prediction, we
make a vital assumption, that the
variables used to make the prediction
are linearly related.
Linear regression can only be
performed when both variables being
investigated are in the form of
interval/ratio
Scatterplot (Hours Spent vs Reading
Apptitude Scores)
regression line
Correlation vs Linear
Regression
The Correlation Regression is a
measures the degree statistical procedure
to which a set of data that determines the
points form a straight equation for the
line relationship. straight line that best
fits a specific set of
data.
Y Y
X X
Linear Regression
• The regression line is represented by
linear equation which takes form of
Ŷ = bX + a
where b and a are constants.
• The value of b is called the gradient of the
regression line and determines the
direction and degree to which the line is
tilted.
• The value of a is called the Y-intercept and
determines the point where the line
crosses the Y-axis.
Gradient = 1 Gradient = 1
Y-intercept = 4 Y-intercept = 4
where
and
The Standard Error of Estimates
(SEE)
Regression equations are more accurate than just
guessing. But even if you are using a regression
equation, you should not fall into the trap of thinking that
these equations will give you predictions that are 100%
accurate. As long as the correlation coefficients you're
using are not perfect (ie. r = 1 or r = -1), your
predictions will not be perfect. In all psychological and
biological populations some unexplained variance will
cause inaccuracies in predictions made using regression
equations.
where
and
=n-2
EXAMPLE
Consider the following pairs of scores where X is
an independent variable which can be used to X Y
predict the value of Y (dependent variable).
2 3
a. Sketch the scatterplot of X versus Y
6 11
b. Use SPSS to:
0 6
i. Compute Pearson correlation r
between X and Y 4 6
ii. Find regression equation relating X 5 7
and Y
7 12
iii. Compare the value of r with R
obtained from the regression. What 5 10
can you say about these values? 3 9
iv. What would be the predicted value of
Y when X = 4.2?
Performing Linear Regressin using SPSS
The Data
(Use similar procedures as described
before to produce scatterplot of X vs Y)
Producing Linear Regression
Analyze Regression Linear
Select the dependent variable and move to the
Dependent box. Select the independent variable
and move to the Independent(s) box. Use the
drop-down arrow to select Enter as the Method.
Click Statistics and check Estimates and
Model fit boxes. Click Continue. OK.
The Output
Scatterplot (X vs Y)
X
Pearson correlation r = R =
0.750
Y-intercept
gradient
Determining the Coefficients b and a to form Ŷ = bX + a: