
Chapter 7

Correlation and Simple Linear Regression
Objectives
When you have completed this chapter, you
will be able to:
1. Understand and interpret the terms dependent variable and
independent variable.
2. Plot a scatter diagram.
3. Recognize the types of relationship that exist between two
variables from the plot of scatter diagrams.
4. Calculate and interpret the coefficient of correlation.
5. Find the regression line equation of y on x using the least
squares method.
6. Compute the expected value of y for a given x.
Correlation Analysis is a group of statistical techniques to
measure the association between two variables.

A Scatter Diagram is a chart that shows the relationship between
two variables.

The Dependent Variable is the variable being predicted or estimated.

The Independent Variable provides the basis for estimation. It
is the predictor variable.
Independent variable, x, or dependent variable, y?

• Weight (y) and Height (x)
• Sales calls (x) and Units sold (y)
• Saving (y) and Income (x)
• Revision hours (x) and Grading (y)
• Pressure of balloon (y) and Temperature in a room (x)
The Coefficient of Correlation (r) is a measure of the
strength of the linear relationship between two variables.
• Also called Pearson’s r and Pearson’s product moment correlation coefficient.
• It requires interval or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect and strong correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
[Figure: scatter plot of Y against X, both axes from 0 to 10. Perfect Positive Linear Correlation, r = 1]
[Figure: scatter plot of Y against X, both axes from 0 to 10. Perfect Negative Linear Correlation, r = -1]


[Figure: scatter plot of Y against X, both axes from 0 to 10. Zero Correlation, r = 0]
[Figure: scatter plot of Y against X, both axes from 0 to 10. Positive Linear Correlation, 0 < r < 1]
[Figure: scatter plot of Y against X, both axes from 0 to 10. Negative Linear Correlation, -1 < r < 0]


[Figure: scatter plot of Y against X, both axes from 0 to 10. Nonlinear Correlation]
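The nonlinear panel is a reminder that r only captures straight-line association. The sketch below is an illustration under stated assumptions: the points are made up (X = 0 to 10, not data from this chapter), and `pearson_r` is a hypothetical helper name that applies the chapter's sums-of-squares formula for r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the sums of squares SS_xy, SS_xx and SS_yy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Made-up illustration data, not taken from the chapter.
xs = list(range(11))                                  # X = 0, 1, ..., 10
r_line = pearson_r(xs, list(xs))                      # points on the line Y = X
r_curve = pearson_r(xs, [(x - 5) ** 2 for x in xs])   # symmetric U-shaped curve

print(round(r_line, 4))   # 1.0
print(round(r_curve, 4))  # 0.0
```

The U-shaped data are perfectly related to X, yet r is 0, which is why a scatter diagram should be inspected before relying on r alone.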
Example 1
The manager of MRM System randomly selected 10 sales
representatives and determined the number of sales calls each one
made last month and the number of units of the product he/she
sold.

Sales Representative   Number of sales calls   Number of units sold
Mohd Ali               14                      28
Budiyanto              35                      66
Cheng Long             22                      38
Fathimah               29                      70
Hashim                 6                       22
Kamarul                15                      27
Rajagopal              17                      28
Roslina                20                      47
Swee Lee               12                      14
Siti Rahimah           29                      68
Represent the above information
in a Scatter Diagram.

There is a positive and strong correlation between the
number of sales calls and units sold.
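We calculate the coefficient of correlation
from the following formula (SS = Sum of Squares):

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

where

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$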
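The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `sums_of_squares` and the variable names are assumptions, and the data are the sales calls and units sold from Example 1.

```python
import math

def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy) using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy

# Example 1 data: x = number of sales calls, y = number of units sold.
calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]
units = [28, 66, 38, 70, 22, 27, 28, 47, 14, 68]

ss_xx, ss_yy, ss_xy = sums_of_squares(calls, units)
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(round(ss_xx, 1), round(ss_yy, 1), round(ss_xy, 1))
print(round(r, 4))
```

With this data r comes out at roughly 0.92, in line with the strong positive pattern seen in the scatter diagram.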
x      y      x²      y²       xy
14     28     196     784      392   (14 × 28 = 392)
35     66     1225    4356     2310
22     38     484     1444     836
29     70     841     4900     2030
6      22     36      484      132
15     27     225     729      405
17     28     289     784      476
20     47     400     2209     940
12     14     144     196      168
29     68     841     4624     1972
Total:
199    408    4681    20510    9661
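Substituting the column totals (n = 10) into the formulas:

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 4681 - \frac{199^2}{10} = 720.9$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 20510 - \frac{408^2}{10} = 3863.6$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 9661 - \frac{(199)(408)}{10} = 1541.8$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{1541.8}{\sqrt{(720.9)(3863.6)}} \approx 0.9238$$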

There is a positive and strong correlation between the number
of sales calls and units sold.
Simple Linear Regression

The complete regression model (population regression line):

$$y = A + Bx + \varepsilon$$

which is taken from $y = A + Bx$, where
A = y-intercept
B = slope
ε = error term

Estimated regression model:

$$\hat{y} = a + bx$$

where
ŷ = the estimated or predicted value of y
a = estimated value of A
b = estimated value of B
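The least squares regression line

$$\hat{y} = a + bx$$

$$b = \frac{SS_{xy}}{SS_{xx}}, \qquad a = \bar{y} - b\,\bar{x}$$

where

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$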
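For Example 1 (x̄ = 199/10 = 19.9, ȳ = 408/10 = 40.8):

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{1541.8}{720.9} = 2.1387$$

$$a = \bar{y} - b\,\bar{x} = 40.8 - 2.1387(19.9) = -1.7601$$

The least squares regression line

$$\hat{y} = -1.7601 + 2.1387x$$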
b = 2.1387 indicates that for each additional sales
call made, about two more units are sold.
a = -1.7601 indicates that the line crosses the Y-axis
below the origin.
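As a cross-check on the slope and intercept, here is a minimal Python sketch of the least squares fit for the Example 1 data. The function name `least_squares_line` is an assumption made for illustration; the arithmetic follows b = SS_xy / SS_xx and a = ȳ − b x̄.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = ss_xy / ss_xx                # slope
    a = sum_y / n - b * sum_x / n    # intercept: y_bar - b * x_bar
    return a, b

# Example 1 data: x = number of sales calls, y = number of units sold.
calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]
units = [28, 66, 38, 70, 22, 27, 28, 47, 14, 68]

a, b = least_squares_line(calls, units)
print(round(a, 4), round(b, 4))
```

Because the code keeps b at full precision, the intercept comes out near -1.7604 rather than the -1.7601 obtained when b is first rounded to 2.1387. The difference is rounding only.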
If the sales representative makes 20 calls in a month,
how many units will be sold?
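A worked substitution, assuming the fitted line from this example with a = -1.7601 and b = 2.1387:

$$\hat{y} = -1.7601 + 2.1387(20) = -1.7601 + 42.774 = 41.0139$$

So a representative who makes 20 calls is expected to sell about 41 units.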
Example 2
An Oxford publisher is concerned about the cost of
textbooks to students. He believes there is
a relationship between the
number of pages in the text and
the selling price of the book.
To provide insight into the
problem he selects a sample of
eight textbooks currently on
sale in the bookstore. Draw a
scatter diagram. Compute the
correlation coefficient.
Book                          Pages   Price (RM)
Introduction to History       500     84
Basic Algebra                 700     75
Introduction to Psychology    800     99
Introduction to Sociology     600     72
Business Management           400     69
Introduction to Biology       600     81
Fundamentals of Jazz          600     63
Principles of Nursing         800     93
Scatter Diagram

There is a positive and moderate correlation between the number
of pages and the selling price.
x      y     x²          y²        xy
500    84    250000      7056      42000   (500 × 84 = 42000)
700    75    490000      5625      52500
800    99    640000      9801      79200
600    72    360000      5184      43200
400    69    160000      4761      27600
600    81    360000      6561      48600
600    63    360000      3969      37800
800    93    640000      8649      74400
Total:
5000   636   3,260,000   51,606    405,300


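We calculate the coefficient of correlation by substituting
the column totals (n = 8) into the formulas:

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 3260000 - \frac{5000^2}{8} = 135000$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 51606 - \frac{636^2}{8} = 1044$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 405300 - \frac{(5000)(636)}{8} = 7800$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{7800}{\sqrt{(135000)(1044)}} \approx 0.657$$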

There is a positive and moderate association between
the number of pages and the selling price of the book.
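The sums of squares can also be written as sums of deviations from the means, for example SS_xx = Σ(x − x̄)², which is algebraically the same as Σx² − (Σx)²/n. The sketch below uses that deviation form on the Example 2 data as an independent check. It is an illustration only, and the variable names are assumptions.

```python
import math

# Example 2 data: x = number of pages, y = selling price (RM).
pages = [500, 700, 800, 600, 400, 600, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
mean_x = sum(pages) / n     # x_bar
mean_y = sum(prices) / n    # y_bar

# Deviation (definitional) form of the sums of squares.
ss_xx = sum((x - mean_x) ** 2 for x in pages)
ss_yy = sum((y - mean_y) ** 2 for y in prices)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, prices))

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(ss_xx, ss_yy, ss_xy)
print(round(r, 3))
```

An r of roughly 0.66 is consistent with describing the association as positive and moderate.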
Develop a regression equation for the information
given that can be used to estimate the selling price
based on the number of pages.
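With x̄ = 5000/8 = 625 and ȳ = 636/8 = 79.5:

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{7800}{135000} = 0.0578$$

$$a = \bar{y} - b\,\bar{x} = 79.5 - 0.0578(625) = 43.375$$

The least squares regression line

$$\hat{y} = 43.375 + 0.0578x$$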
The regression equation is:

$$\hat{y} = 43.375 + 0.0578x$$

The slope of the line is 0.0578.
Each additional page adds about RM0.06 to the price.

The line crosses the Y-axis at RM43.375.
A book with no pages would cost about RM43.38.
Estimate the selling price of an 800-page book.

Price = 43.375 + 0.0578(Number of Pages)
      = 43.375 + 0.0578(800)
      = RM89.615, or about RM89.62
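The prediction step is a single substitution into the fitted line. The sketch below is illustrative only: `predict` is a hypothetical helper name, the rounded coefficients are the ones quoted in this example, and the full-precision version recomputes b and a from the table totals (SS_xy = 7800, SS_xx = 135000, x̄ = 625, ȳ = 79.5).

```python
def predict(a, b, x):
    """Value of the fitted line y_hat = a + b * x at x."""
    return a + b * x

# Rounded coefficients as quoted in the example.
slide_price = predict(43.375, 0.0578, 800)

# Same fit without rounding b first (values derived from the table totals).
b_full = 7800 / 135000
a_full = 79.5 - b_full * 625
full_price = predict(a_full, b_full, 800)

print(round(slide_price, 3))
print(round(full_price, 3))
```

The two answers differ by less than one sen, so rounding b to 0.0578 has no practical effect on the estimate.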
Exercise

A study is done to see whether there is a relationship
between a student’s grade point average (GPA) and
the number of hours the student studies per week.

Hours   12     9      16     3      15     5      16
GPA     3.52   3.31   3.75   2.10   4.00   1.69   3.74

(a) Plot the scatter diagram.
(b) Find the linear correlation coefficient. Explain your answer.
(c) Find the least squares regression line.
(d) Predict the grade point average of a student who studies 10.5
hours per week.
[Figure: scatter diagram, GPA Versus the Number of Hours. Vertical axis: GPA, 0.00 to 4.50. Horizontal axis: the number of hours, 2 to 18]


x      y       x²     y²        xy
12     3.52    144    12.3904   42.24
9      3.31    81     10.9561   29.79
16     3.75    256    14.0625   60
3      2.10    9      4.41      6.3
15     4.00    225    16        60
5      1.69    25     2.8561    8.45
16     3.74    256    13.9876   59.84
Total:
76     22.11   996    74.6627   266.62

SSxx = 170.8571
SSyy = 4.826686
SSxy = 26.56857

r = 0.9252
There is a strong and positive correlation between the
GPA and the number of hours studied by students.
The least squares regression line, with a = 1.4703 and b = 0.1555, is

$$\hat{y} = 1.4703 + 0.1555x$$

When x = 10.5,

$$\hat{y} = 1.4703 + 0.1555(10.5) = 3.1031$$

The predicted GPA of a student who studies 10.5 hours per week is 3.1031.
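The whole exercise can be checked end to end with a short script. This is a sketch under the chapter's formulas, not an official solution: the function name `fit_and_correlate` is an assumption, and the data are the hours and GPA values from the exercise.

```python
import math

def fit_and_correlate(xs, ys):
    """Return (r, a, b): Pearson's r plus the least squares intercept and slope."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    return r, a, b

# Exercise data: x = hours studied per week, y = GPA.
hours = [12, 9, 16, 3, 15, 5, 16]
gpa = [3.52, 3.31, 3.75, 2.10, 4.00, 1.69, 3.74]

r, a, b = fit_and_correlate(hours, gpa)
predicted_gpa = a + b * 10.5   # part (d): student who studies 10.5 hours

print(round(r, 4), round(a, 4), round(b, 4))
print(round(predicted_gpa, 3))
```

The predicted GPA agrees with the hand calculation to about three decimal places. Any difference in the fourth decimal comes from rounding a and b before substituting.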
