
DR. AMADO T. ALIMURUNG
Instructor
CORRELATION

Correlation refers to the statistical association between two
variables. A correlation exists between two variables when one of
them is related to the other in some way.
A scatterplot is the best place to start. A scatterplot
or scatter diagram is a graph of the paired (x, y)
sample data with a horizontal x-axis and a vertical y-axis.
A relationship has no correlation when the points
on a scatterplot do not show any direction or pattern.

A relationship is non-linear when the points on a scatterplot
follow a pattern but not a straight line.

A relationship is linear when the points on a scatterplot follow
a somewhat straight-line pattern.
Positive linear relationships have points that incline upwards to
the right: as x values increase, y values increase. Negative linear
relationships have points that decline downwards to the right: as
x values increase, y values decrease.
PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT

The most widely used measure of correlation is the Pearson Product
Moment Correlation Coefficient, or simply Pearson r.
Formula:

\[
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
\]
where:
x – the observed data for the independent variable
y – the observed data for the dependent variable
n – the sample size
∑x – the summation of the x values
∑y – the summation of the y values
∑x² – the summation of the squares of the x values
∑y² – the summation of the squares of the y values
∑xy – the summation of the products of the x and y values
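The formula can be sketched in Python as follows. This is only an illustration of the raw-sum formula above; the function name `pearson_r` and the toy data are hypothetical and do not come from the lecture.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r for paired samples, computed with the raw-sum formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when all x values (or all y values) are identical.
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical toy data lying exactly on the line y = 2x.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 6, 8, 10]
print(pearson_r(toy_x, toy_y))
```

Because the toy points fall exactly on an upward-sloping line, the result is a perfect positive correlation.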
Range of Correlation     Degree of Correlation
0.80 – 1.00              very strong positive
0.60 – 0.79              strong positive
0.40 – 0.59              moderate positive
0.20 – 0.39              weak positive
0.00 – 0.19              insignificant
0.00 – (-0.19)           very weak negative
(-0.20) – (-0.39)        weak negative
(-0.40) – (-0.59)        moderate negative
(-0.60) – (-0.79)        strong negative
(-0.80) – (-1.00)        very strong negative
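As a rough sketch, the table can be turned into a lookup function. The name `degree_of_correlation` is hypothetical. The table does not say how to treat values that fall between two bands (for example 0.795), so this sketch assumes each band starts at its lower bound in absolute value.

```python
def degree_of_correlation(r):
    """Map a correlation coefficient to the label used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)
    if size >= 0.80:
        strength = "very strong"
    elif size >= 0.60:
        strength = "strong"
    elif size >= 0.40:
        strength = "moderate"
    elif size >= 0.20:
        strength = "weak"
    else:
        # The table labels the lowest band differently on each side of zero.
        return "insignificant" if r >= 0 else "very weak negative"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(degree_of_correlation(0.91))
print(degree_of_correlation(-0.45))
```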
Example: A study was conducted to investigate the relationship
existing between the grades in Statistics and Computer subjects.
A random sample of 10 students in a certain college was taken,
and the data are as follows:

Student      A    B    C    D    E    F    G    H    I    J
Statistics   75   83   80   77   89   78   92   86   93   84
Computer     78   87   78   76   92   81   89   89   91   84

Question: Is there a relationship between the performance of
the students in Statistics and Computer subjects?
Solution:
Student    x     y     xy      x²      y²
A          75    78    5850    5625    6084
B          83    87    7221    6889    7569
C          80    78    6240    6400    6084
D          77    76    5852    5929    5776
E          89    92    8188    7921    8464
F          78    81    6318    6084    6561
G          92    89    8188    8464    7921
H          86    89    7654    7396    7921
I          93    91    8463    8649    8281
J          84    84    7056    7056    7056
n = 10   ∑x = 837   ∑y = 845   ∑xy = 71030   ∑x² = 70413   ∑y² = 71717
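The columns and totals of this table can be rebuilt with a few lines of Python, which is a convenient way to check the arithmetic. The variable names below are illustrative only.

```python
# Grades from the example: x = Statistics, y = Computer.
statistics_grades = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]
computer_grades = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]

# One row per student: (x, y, xy, x squared, y squared).
rows = [(x, y, x * y, x * x, y * y)
        for x, y in zip(statistics_grades, computer_grades)]

# Column totals: (sum x, sum y, sum xy, sum x squared, sum y squared).
totals = tuple(sum(column) for column in zip(*rows))
n = len(rows)

for row in rows:
    print(*row)
print("n =", n, "totals =", totals)
```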
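Substituting the totals into the formula:

\[
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \\
  &= \frac{10(71030) - (837)(845)}{\sqrt{\left[10(70413) - (837)^2\right]\left[10(71717) - (845)^2\right]}} \\
  &= \frac{3035}{\sqrt{(3561)(3145)}} \\
  &= \frac{3035}{3346.54} \\
  &= 0.9069 \text{ or } 0.91
\end{aligned}
\]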
Therefore, there is a very strong positive relationship between the
performance of the students in Statistics and Computer subjects.
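The substitution step can be checked in Python by starting from the totals alone. This is a small verification sketch, and the variable names are illustrative.

```python
from math import sqrt

# Totals from the solution table.
n = 10
sum_x, sum_y = 837, 845
sum_xy, sum_x2, sum_y2 = 71030, 70413, 71717

numerator = n * sum_xy - sum_x * sum_y   # top of the fraction
x_term = n * sum_x2 - sum_x ** 2         # first bracket under the root
y_term = n * sum_y2 - sum_y ** 2         # second bracket under the root

r = numerator / sqrt(x_term * y_term)
print(numerator, x_term, y_term, round(r, 4))
```

The three intermediate values should match 3035, 3561 and 3145 from the worked solution, and r rounds to 0.9069.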
LINEAR REGRESSION

A simple linear regression is a mathematical equation that allows
us to predict a response for a given predictor value. It is used
in the process of prediction.
Prediction is calculating scores of the criterion variable (ŷ) on
the basis of knowledge of the predictor (x).
Linear Regression Formula
ŷ = a + bx
which is called the least squares line or the simple regression line
where: a – the y-intercept
b – the slope
x – the predictor variable
ŷ – the estimate of the mean value of the response variable for
any value of the predictor variable
The y-intercept is the predicted value of the response (y) when
x = 0. The slope describes the change in y for each one-unit
change in x.
The values of a and b can be computed by using the following
formulas:

\[
b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}
\]

\[
a = \mathrm{Mny} - b \cdot \mathrm{Mnx}
\]

where: Mny – the mean of the y values
Mnx – the mean of the x values
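The two formulas can be sketched as a small Python helper. The names `least_squares_line` and `predict`, and the toy data, are hypothetical and only illustrate the formulas for a and b.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x, using the formulas for a and b."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    slope_denominator = n * sum_x2 - sum_x ** 2
    if slope_denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")

    b = (n * sum_xy - sum_x * sum_y) / slope_denominator
    a = sum_y / n - b * (sum_x / n)   # mean of y minus b times mean of x
    return a, b


def predict(a, b, x):
    """Predicted response for a given predictor value."""
    return a + b * x


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b, predict(a, b, 10))
```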
Example: Given the following data on the grades in Statistics and
Computer subjects:

Student      A    B    C    D    E    F    G    H    I    J
Statistics   75   83   80   77   89   78   92   86   93   84
Computer     78   87   78   76   92   81   89   89   91   84

Question: What regression equation could be used, and what would
be the predicted Computer grade of a student who has a grade of
85 in Statistics?
Solution:
Student    x     y     xy      x²      y²
A          75    78    5850    5625    6084
B          83    87    7221    6889    7569
C          80    78    6240    6400    6084
D          77    76    5852    5929    5776
E          89    92    8188    7921    8464
F          78    81    6318    6084    6561
G          92    89    8188    8464    7921
H          86    89    7654    7396    7921
I          93    91    8463    8649    8281
J          84    84    7056    7056    7056
n = 10   ∑x = 837   ∑y = 845   ∑xy = 71030   ∑x² = 70413   ∑y² = 71717
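Substituting the totals into the formula for b:

\[
\begin{aligned}
b &= \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \\
  &= \frac{10(71030) - (837)(845)}{10(70413) - (837)^2} \\
  &= \frac{3035}{3561} \\
b &= 0.85
\end{aligned}
\]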
Mnx = 837/10 = 83.7
Mny = 845/10 = 84.5

a = Mny – b(Mnx)
= 84.5 – (0.85)(83.7)
a = 13.36

The regression equation is ŷ = a + bx, so
ŷ = 13.36 + 0.85x

For x = 85:
ŷ = 13.36 + (0.85)(85)
ŷ = 85.61 or 86
Therefore, if the grade of a student in Statistics (x) is
85, then the predicted Computer grade is 86.
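The worked solution rounds b to 0.85 before computing a. The sketch below, with illustrative variable names, repeats the calculation with b left unrounded and compares it with the rounded values from the solution. The unrounded intercept comes out at about 13.16 rather than 13.36, but the predicted grade for x = 85 still rounds to 86 either way.

```python
# Totals from the solution table.
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 837, 845, 71030, 70413

mean_x = sum_x / n   # 83.7
mean_y = sum_y / n   # 84.5

# Slope and intercept with no intermediate rounding.
b_exact = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_exact = mean_y - b_exact * mean_x

# Rounded values as they appear in the worked solution.
a_doc, b_doc = 13.36, 0.85

statistics_grade = 85
pred_exact = a_exact + b_exact * statistics_grade
pred_doc = a_doc + b_doc * statistics_grade

print(round(b_exact, 4), round(a_exact, 2))
print(round(pred_exact, 2), round(pred_doc, 2))
```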
Activity: Determine the relationship between the family
monthly income and the grades of the students.

Student         A       B       C       D       E       F       G
Family Income   30000   21000   45000   54000   86000   34000   49000
Grade           1.25    1.75    3.0     2.75    3.0     2.25    2.5


Answer:
Solution: r = 0.749 or 0.75; there is a strong positive
relationship between the family monthly income and the grades
of the students.

Student   Family Income (x)   Grades (y)   xy       x²            y²
A         30000               1.25         37500    900000000     1.5625
B         21000               1.75         36750    441000000     3.0625
C         45000               3.0          135000   2025000000    9
D         54000               2.75         148500   2916000000    7.5625
E         86000               3.0          258000   7396000000    9
F         34000               2.25         76500    1156000000    5.0625
G         49000               2.5          122500   2401000000    6.25
n = 7     ∑x = 319000         ∑y = 16.5    ∑xy = 814750   ∑x² = 17235000000   ∑y² = 41.5
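The answer can be cross-checked in Python. Note that this sketch does not use the raw-sum formula from the notes. It uses the algebraically equivalent mean-centered (deviation) form of Pearson r, which avoids working with the very large squared incomes. The function name `pearson_r_centered` is hypothetical.

```python
from math import sqrt

# Activity data: x = family monthly income, y = grade.
income = [30000, 21000, 45000, 54000, 86000, 34000, 49000]
grades = [1.25, 1.75, 3.0, 2.75, 3.0, 2.25, 2.5]


def pearson_r_centered(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]

    sum_dxdy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_dx2 = sum(dx * dx for dx in dev_x)
    sum_dy2 = sum(dy * dy for dy in dev_y)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


r = pearson_r_centered(income, grades)
print(round(r, 3))
```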
