
Correlation

A correlation is a relationship between two variables. The data
can be represented by the ordered pairs (x, y) where x is the
independent (or explanatory) variable, and y is the dependent
(or response) variable.
A scatter plot can be used to determine whether a linear
(straight line) correlation exists between two variables.

Example:

x 1 2 3 4 5
y –4 –2 –1 0 2

[Figure: scatter plot of the example data, with x on the horizontal axis and y on the vertical axis]
Larson & Farber, Elementary Statistics: Picturing the World, 3e 2
Example of Correlation
Is there an association between:
• Children’s IQ and parents’ IQ?
• Degree of social trust and number of
memberships in voluntary associations?
• Urban growth and deterioration of air quality?
• Donor funding and number of publications by
Ph.D. students?
• Number of police patrols and number of crimes?
• Grade on an exam and time spent on the exam?
Correlation Represents
a Linear Relationship
• Correlation involves a linear relationship.
• "Linear" refers to the fact that, when we graph two
correlated variables, the points fall along a line.
• Correlation tells you how much two variables are linearly
related, not necessarily how much they are related in
general.
• In some cases two variables may have a
strong, or even perfect, relationship, yet the relationship is
not at all linear. In these cases, the correlation coefficient
might be zero.
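A minimal sketch of that last point, assuming the usual raw-sum formula for the sample correlation coefficient r. The helper name `pearson_r` and the parabola data are hypothetical, chosen only for illustration: y is determined exactly by x, yet the linear correlation is zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: a perfect but curved (parabolic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # the numerator is exactly 0, so r is 0
```

Because the x-values are symmetric about zero, the positive and negative products cancel and r carries no trace of the relationship.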
Specific Example
For seven random summer days, a person recorded the
temperature and their water consumption during
a three-hour period spent outside.

Temperature (F) Water Consumption (Glasses)

75 16
83 20
85 25
85 27
92 32
97 48
99 48

How “strong” is the linear relationship?
Correlation Coefficient
The correlation coefficient is a measure of the strength and the
direction of a linear relationship between two variables. The
symbol r represents the sample correlation coefficient. The
formula for r is

\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
\]

The range of the correlation coefficient is –1 to 1. If x and y have
a strong positive linear correlation, r is close to 1. If x and y have
a strong negative linear correlation, r is close to –1. If there is no
linear correlation or a weak linear correlation, r is close to 0.
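
As a rough sketch of how the formula for r translates into code, the function below computes the five sums and combines them. The function name `pearson_r` and the variable names are assumptions made for illustration. It is applied to the temperature and water-consumption data from the earlier example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r from the raw-sum formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data from the water-consumption example (seven summer days).
temperature_f = [75, 83, 85, 85, 92, 97, 99]
glasses = [16, 20, 25, 27, 32, 48, 48]

r_water = pearson_r(temperature_f, glasses)
print(round(r_water, 3))
```

For these seven days r comes out at roughly 0.96, which indicates a strong positive linear relationship between temperature and water consumption.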



Correlation
• Measures the relative strength of the linear
relationship between two variables
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
Calculating a Correlation Coefficient
In Words    In Symbols
1. Find the sum of the x-values.    \(\sum x\)
2. Find the sum of the y-values.    \(\sum y\)
3. Multiply each x-value by its corresponding y-value and find the sum.    \(\sum xy\)
4. Square each x-value and find the sum.    \(\sum x^2\)
5. Square each y-value and find the sum.    \(\sum y^2\)
6. Use these five sums to calculate the correlation coefficient.

\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
\]

Correlation Coefficient
Example:
Calculate the correlation coefficient r for the following data.

x y xy \(x^2\) \(y^2\)
1 –3 –3 1 9
2 –1 –2 4 1
3 0 0 9 0
4 1 4 16 1
5 2 10 25 4

\(\sum x = 15\), \(\sum y = -1\), \(\sum xy = 9\), \(\sum x^2 = 55\), \(\sum y^2 = 15\)

\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
= \frac{5(9) - (15)(-1)}{\sqrt{5(55) - 15^2}\,\sqrt{5(15) - (-1)^2}}
= \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986
\]

There is a strong positive linear correlation between x
and y.
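
The same calculation can be sketched step by step in code, mirroring the six steps listed earlier. The variable names are assumptions made for illustration; the data are the five pairs from this example.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
n = len(xs)

# Steps 1 and 2: sums of the x-values and of the y-values.
sum_x = sum(xs)
sum_y = sum(ys)

# Step 3: multiply each x by its corresponding y, then sum the products.
xy = [x * y for x, y in zip(xs, ys)]
sum_xy = sum(xy)

# Steps 4 and 5: square each value, then sum the squares.
x2 = [x * x for x in xs]
y2 = [y * y for y in ys]
sum_x2 = sum(x2)
sum_y2 = sum(y2)

# Step 6: combine the five sums into the correlation coefficient.
r_example = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)

# Print the working table, one row per data pair, followed by the sums.
for row in zip(xs, ys, xy, x2, y2):
    print(*row)
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r_example, 3))
```

Keeping the intermediate columns makes it easy to compare each row against the hand-computed table.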
Correlation Coefficient
Example:
The following data represent the number of hours 12 different
students watched television during the weekend and the score
each student earned on a test the following Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50

Correlation Coefficient
Example continued:
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
[Figure: scatter plot of test score (y, 20 to 100) against hours watching TV (x, 2 to 10)]
Correlation Coefficient
Example continued:
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
xy 0 85 164 222 285 340 380 420 348 455 525 500
\(x^2\) 0 1 4 9 9 25 25 25 36 49 49 100
\(y^2\) 9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500

\(\sum x = 54\), \(\sum y = 908\), \(\sum xy = 3724\), \(\sum x^2 = 332\), \(\sum y^2 = 70836\)

\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
= \frac{12(3724) - (54)(908)}{\sqrt{12(332) - 54^2}\,\sqrt{12(70836) - 908^2}} \approx -0.831
\]

There is a strong negative linear correlation.


As the number of hours spent watching TV increases, the test
scores tend to decrease.
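
As a cross-check on this result, the sketch below uses the deviation-from-the-mean form of the correlation coefficient, which is algebraically equivalent to the raw-sum formula but is not the form given in these slides. The function name `pearson_r_deviation` is an assumption made for illustration.

```python
from math import sqrt

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

def pearson_r_deviation(xs, ys):
    """r computed from deviations about the means of x and y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (sign gives the direction).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r_tv = pearson_r_deviation(hours, scores)
print(round(r_tv, 3))
```

Both forms should agree to within rounding, giving r of about -0.831 for these twelve students.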
Direction of the relationship
Variables can be positively or negatively
correlated.
Positive correlation: as the value of one variable
increases, the value of the other variable tends to increase.

Negative correlation: as the value of one variable
increases, the value of the other variable tends to decrease.
Scatter Diagram
• A scatter diagram is a graphical method to
display the relationship between two variables

• A scatter diagram plots pairs of bivariate
observations (x, y) on the X-Y plane

• Y is called the dependent variable

• X is called the independent variable


Scattergrams

[Figure: three scattergrams of Y against X, showing positive correlation, negative correlation, and no correlation]


Strength of the relationship
The magnitude of correlation:

• is indicated by its numerical value, ignoring the sign
• expresses the strength of the linear
relationship between the variables.
Example

• A researcher believes that there is a
linear relationship between the BMI (kg/m²)
of pregnant mothers and the birth-weight
(BW, in kg) of their newborns

• The following data set provides
information on 15 pregnant mothers who
were contacted for this study
BMI (kg/m²) Birth-weight (kg)

20 2.7
30 2.9
50 3.4
45 3.0
10 2.2
30 3.1
40 3.3
25 2.3
50 3.5
20 2.5
10 1.5
55 3.8
60 3.7
50 3.1
35 2.8
Scatter diagram of BMI and Birthweight

[Figure: scatter diagram with BMI on the horizontal axis (0 to 70) and birth-weight on the vertical axis (0 to 4)]
Correlation Coefficient, R
• R is a measure of the strength of the linear
association between two variables, x and y.

• Most statistical packages and some hand
calculators can calculate R

• For the data in our Example R=0.94

• R has some unique characteristics:
• R takes values between -1 and +1
• R=0 represents no linear relationship
between the two variables
• R>0 implies a direct linear relationship
• R<0 implies an inverse linear relationship
• The closer R comes to either +1 or -1, the
stronger is the linear relationship
Pearson’s correlation coefficient
There are many kinds of correlation coefficients, but
the most commonly used measure of correlation is
Pearson’s correlation coefficient (r).

• The Pearson r ranges from -1 to +1.
• The sign indicates the direction.
• The numerical value indicates the strength.
• Perfect correlation: -1 or +1
• No correlation: 0
• A correlation of zero indicates that the values are not linearly related.
• However, it is possible that they are related in a curvilinear fashion.
Scatter Plots of Data with
Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, and r = 0 in the top row and r = +1, r = +.3, and r = 0 in the bottom row]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Linear Correlation
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]
Linear Correlation
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]
Linear Correlation
[Figure: scatter plot illustrating no relationship]
Difference between Correlation
and Regression

• Correlation Coefficient, R, measures the
strength of bivariate association

• The regression line is a prediction
equation that estimates the values of y for
any given x
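
To make the contrast concrete, here is a hedged sketch of a regression line for the television data from the earlier example. These slides do not give the regression formulas; the code assumes the standard least-squares slope and intercept, and the names `slope`, `intercept`, and `predict_score` are hypothetical.

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
n = len(hours)

sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

# Assumed least-squares formulas (not stated in the slides):
# slope m = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2), intercept b = mean(y) - m*mean(x)
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)

def predict_score(x):
    """Estimated test score for x hours of television."""
    return slope * x + intercept

print(round(slope, 3), round(intercept, 2))
print(round(predict_score(4), 1))
```

The correlation coefficient summarizes how tightly the points cluster around a line, while the fitted line itself supplies an estimate of y for a chosen x. Here the slope is about -4.07 points per hour.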
Limitations of the correlation
coefficient
• Though R measures how closely the two
variables approximate a straight line, it
does not validly measure the strength of a
nonlinear relationship
• When the sample size, n, is small, we also
have to be careful about the reliability of
the correlation
• Outliers could have a marked effect on R
• Causal Linear Relationship
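
The effect of a single outlier can be sketched as follows. The base data are the five pairs from the earlier worked example (r ≈ 0.986); the extra point (6, –4) is hypothetical and added only to illustrate the sensitivity. The helper name `pearson_r` is likewise an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r from the raw-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / (
        sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    )

# Five pairs from the worked example.
xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
r_without = pearson_r(xs, ys)

# Same data plus one hypothetical outlier far below the trend.
r_with = pearson_r(xs + [6], ys + [-4])

print(round(r_without, 3), round(r_with, 3))
```

With only five or six observations, one unusual point is enough to pull r from near 1 down to near 0, which also illustrates the caution about small samples.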
Coefficient of Determination \(r^2\)
• The percentage of shared variance is represented
by the square of the correlation coefficient, \(r^2\).
• Variance indicates the amount of variability in a
set of data.
• If the two variables are correlated, that means that
we can account for some of the variance in one
variable by the other variable.
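
As a worked illustration, take the television example from earlier, where r ≈ -0.831. Squaring that value gives the coefficient of determination:

```latex
r^2 = (-0.831)^2 \approx 0.691
```

Read this way, roughly 69% of the variance in test scores is shared with the number of hours spent watching television, and about 31% is left unaccounted for by that variable.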
Correlation and Causation
• Two things going together does not necessarily
mean that one causes the other.
• One variable can be strongly related to another, yet
not cause it. Correlation does not imply causality.
• When there is a correlation between X and Y:
• Does X cause Y, or Y cause X, or both?
• Or is there a third variable Z causing both X and
Y, so that X and Y are correlated?
Correlation Coefficient
Assignment 1:
The following data show the respective foot weights X and Y
of a sample of 8 fathers and their oldest sons.
Calculate the correlation coefficient r.

Weight X of father’s foot (kg) 1 3 4 6 8 9 11 14
Weight Y of son’s foot (kg) 1 2 4 4 5 7 8 9


Correlation Coefficient
Assignment 2:
The following data show the final grades in Algebra and Physics
obtained by 10 students selected at random from a large group of
students at Namilyango College.
Calculate the correlation coefficient r.

Algebra (X) 75 80 93 65 87 71 98 68 84 77
Physics (Y) 82 78 86 72 91 80 95 72 89 74


Correlation Coefficient
Assignment 3:
The following table shows the ages X and systolic blood pressure
Y of 12 women.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.

Age (X) 56 42 72 36 63 47 55 49 38 42 68 60
Blood pressure (Y) 47 25 60 18 49 28 50 45 15 40 52 55
