
Statistics And Probability

PEARSON’S PRODUCT–MOMENT CORRELATION COEFFICIENT
Presented by: Jill Sarah C. Ilagan
Who Discovered Pearson’s Product–Moment Correlation Coefficient

WOODGROVE 2
BANK
PEARSON’S
PRODUCT–MOMENT CORRELATION COEFFICIENT

Pearson’s product–moment correlation coefficient, or Pearson’s r, was developed by Karl Pearson (1948) from a related
idea introduced by Sir Francis Galton in the late 1800s. In
addition to being the first of the correlational measures to be
developed, it is also the most commonly used measure of
association.
All subsequent correlation measures have been developed
from Pearson’s equation and are adaptations engineered to control
for violations of the assumptions that must be met in order to use
Pearson’s equation (Burns & Grove, 2005; Polit & Beck, 2006).
Pearson’s r measures the strength and direction of the linear
association between two interval or ratio variables; a significance
test on r then gives the probability of obtaining an association at
least that strong if there were no correlation in the population.

When to Use Pearson’s Product–Moment Correlation Coefficient (PPMCC)


The Pearson product-moment correlation coefficient (or Pearson
correlation coefficient, for short) is a measure of the strength of a linear
association between two variables and is denoted by r. Basically, a
Pearson product-moment correlation attempts to draw a line of best fit
through the data of two variables, and the Pearson correlation coefficient,
r, indicates how far away all these data points are from this line of best fit
(i.e., how well the data points fit this line of best fit).

• When the relationship between two variables needs to be studied, a good
way of representing the data graphically is to use a scatter diagram. With
a scatter diagram, a series of points is plotted. The x- and y-coordinates
of each point are taken from the values of the variables.
• The data and scatter diagram below show the height of young children
(cm) plotted against their mass (kg).

• From the graph it is clear that there is a relationship between the two
variables. Generally, as height increases so does mass.
• This can be emphasized by plotting a line of best fit. This should pass
through the point which represents the mean values of the height and
mass.
• To measure how strong the correlation between the two variables is,
Pearson’s product–moment correlation coefficient can be calculated.
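The idea that r describes how closely the points sit around a line of best fit can be checked numerically. The Python sketch below assumes an ordinary least-squares line (the slides do not name a fitting method), confirms that this line passes through the point of means, and shows that r² equals the share of the variation in y explained by the line, 1 − SS_res/SS_tot. The function names and the height/mass values are hypothetical, made up for illustration rather than taken from the slide's data.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar  # this intercept puts (x_bar, y_bar) on the line
    return a, b


def pearson_r(xs, ys):
    """Pearson's r from sums of squared and cross-multiplied deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def explained_share(xs, ys):
    """1 - SS_res / SS_tot: how much of the spread in y the fitted line accounts for."""
    a, b = least_squares_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical heights (cm) and masses (kg) of young children, for illustration only
heights = [85, 90, 95, 100, 105, 110]
masses = [12.0, 12.5, 14.5, 15.0, 17.5, 18.0]

a, b = least_squares_line(heights, masses)
r = pearson_r(heights, masses)
print(f"line: mass = {a:.2f} + {b:.3f} * height")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, explained share = {explained_share(heights, masses):.3f}")
```

The closer the points lie to the fitted line, the closer r² (and therefore |r|) gets to 1.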

• The value of Pearson’s product–moment coefficient (r) indicates the
strength and direction of the linear correlation between two variables. It
takes values in the range –1 ≤ r ≤ 1.
• The value of r is calculated using the formula $r = \frac{s_{xy}}{s_x s_y}$,
where $s_x$ is the standard deviation of x, $s_y$ is the standard deviation
of y and $s_{xy}$ is the covariance of x and y.
• A value of r near –1 implies a strong negative correlation between
the two variables.
• A value of r near +1 implies a strong positive correlation between
the two variables.
• A value of r near 0 implies there is little or no correlation between
the two variables.
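The formula above translates directly into code. The sketch below is a minimal Python version, assuming the sample (N − 1) definitions of covariance and standard deviation; dividing by N instead gives the same r, because the divisor cancels between the numerator and the denominator. The function names are illustrative, not taken from any library.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """s_xy: sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_sd(values):
    """s_x: the square root of a variable's covariance with itself (its variance)."""
    return sqrt(sample_covariance(values, values))


def pearson_r(xs, ys):
    """r = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0: points on a rising line
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0: points on a falling line
print(round(pearson_r([1, 2, 3], [1, 0, 1]), 6))        # 0.0: no linear trend
```

A variable with no spread at all (every value equal) has a standard deviation of 0, so r is undefined for it and this sketch would raise a division error.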
For the data given earlier, the values can be
entered into a GDC (graphic display calculator) and
the value of r calculated.

This value of r shows that there is a
moderate positive correlation between a child’s
height and mass.

In summary, correlation coefficients are used
to assess the strength and direction of the linear
relationships between pairs of variables. When
both variables are normally distributed, use
Pearson's correlation coefficient; otherwise, use
Spearman's correlation coefficient.
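Spearman's coefficient is usually computed as Pearson's r applied to the ranks of the data instead of the raw values, with tied values sharing the average of their ranks. The Python sketch below shows that relationship under those assumptions; the function names are hypothetical, and the data (y = x²) are made up because they rise perfectly steadily without lying on a straight line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's coefficient as Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical data: y = x squared
print(spearman_rho(x, y))        # 1.0
print(round(pearson_r(x, y), 3))  # 0.981
```

Spearman's coefficient is exactly 1 here because y always increases with x, while Pearson's r falls a little short of 1 because the points curve away from a straight line.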

Using Stepwise Method

Do people tend to marry other people of about the same age? Our experience tells us
"yes," but how confident are we in this answer? One way to address the question is to look
at pairs of ages for a sample of married couples. The sample data below do just that
with the ages of 10 married couples. Going across the columns we see that, yes,
husbands and wives tend to be of about the same age, with men having a tendency to be
slightly older than their wives. This is no big surprise, but at least the data bear out our
experience, which is not always the case. What we know of statistics, however, tells us
that what we see is not always significant. So let's apply the Pearson r formula and see
what happens.

Husband (x): 36 72 37 36 51 50 47 50 37 41
Wife (y):    35 67 33 35 50 46 47 42 36 41

Always start an investigation of bivariate data with a graph. Notice the
scatterplot indicates a fairly strong, linear association between the ages
of husbands and wives.
[Scatterplot: wife’s age (vertical axis, 0 to 80) plotted against husband’s age (horizontal axis, 30 to 75)]
XY is simply the product of columns X and Y. Therefore, 36 × 35 = 1260, 72 × 67 = 4824, and so on.
Another issue to consider is how many data points we have. Although we have 10 ages for husbands and 10 for
wives, we do not have 20 independent data points. Instead, we have 10 pairs of data points, so N = 10. Remember,
correlation concerns the relationship between the two variables, and husbands and wives definitely form pairs. It
certainly wouldn't make sense to treat all 20 individuals as independent and allow them to pair up in some random
manner!
N = number of pairs
So with the above information we can calculate Sx (S for the X column) and Sy (S for the Y column), and we
are ready to go. Since you have had plenty of practice so far, do that now. Find the mean and the standard deviation
of x and of y. You should find:
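A minimal Python sketch of this step, using the standard-library `statistics` module and assuming the sample standard deviation (divisor N − 1), which is what `statistics.stdev` computes. The variable names are illustrative.

```python
import statistics

# Ages of the 10 married couples; position i in each list is one couple
husbands = [36, 72, 37, 36, 51, 50, 47, 50, 37, 41]  # x
wives = [35, 67, 33, 35, 50, 46, 47, 42, 36, 41]     # y

if len(husbands) != len(wives):
    raise ValueError("every husband's age needs a matching wife's age")
n = len(husbands)  # N counts pairs (10), not individuals (20)

mean_x = statistics.mean(husbands)
mean_y = statistics.mean(wives)
s_x = statistics.stdev(husbands)  # sample standard deviation, divisor N - 1
s_y = statistics.stdev(wives)

print(f"N = {n}")
print(f"mean of x = {mean_x:.1f}, s_x = {s_x:.2f}")
print(f"mean of y = {mean_y:.1f}, s_y = {s_y:.2f}")
```

With these data the sketch reports means of 45.7 and 43.2, with s_x ≈ 11.16 and s_y ≈ 10.17; `statistics.pstdev` would give the population (divisor N) versions instead, which leaves r unchanged.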

Pair   X (husband)   X²      Y (wife)   Y²      XY
1      36            1296    35         1225    1260
2      72            5184    67         4489    4824
3      37            1369    33         1089    1221
4      36            1296    35         1225    1260
5      51            2601    50         2500    2550
6      50            2500    46         2116    2300
7      47            2209    47         2209    2209
8      50            2500    42         1764    2100
9      37            1369    36         1296    1332
10     41            1681    41         1681    1681
Σ      457           22005   432        19594   20737
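Column totals like these are easy to slip on by hand, so a short Python check is useful. This sketch, with illustrative variable names, recomputes every total in the table and then feeds them into the raw-score form of Pearson's r, which is algebraically equivalent to r = s_xy/(s_x s_y).

```python
from math import sqrt

husbands = [36, 72, 37, 36, 51, 50, 47, 50, 37, 41]  # X
wives = [35, 67, 33, 35, 50, 46, 47, 42, 36, 41]     # Y

n = len(husbands)  # number of pairs
sum_x = sum(husbands)
sum_y = sum(wives)
sum_x2 = sum(x * x for x in husbands)
sum_y2 = sum(y * y for y in wives)
sum_xy = sum(x * y for x, y in zip(husbands, wives))
print("totals:", sum_x, sum_x2, sum_y, sum_y2, sum_xy)  # totals: 457 22005 432 19594 20737

# Raw-score (computational) form of Pearson's r, built from the column totals only
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(f"r = {r:.3f}")  # r = 0.974
```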

Now just plug it all into the formula.
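Written out with the totals from the table, and assuming the raw-score form of the formula (algebraically equivalent to r = s_xy/(s_x s_y)), the calculation runs as follows:

```latex
\begin{aligned}
r &= \frac{N\sum XY - \sum X \sum Y}
          {\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{10(20737) - (457)(432)}
          {\sqrt{\left[10(22005) - 457^2\right]\left[10(19594) - 432^2\right]}} \\
  &= \frac{207370 - 197424}{\sqrt{(220050 - 208849)(195940 - 186624)}} \\
  &= \frac{9946}{\sqrt{11201 \times 9316}} \approx \frac{9946}{10215.1} \approx 0.974
\end{aligned}
```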

So, based on our data, there is a strong positive correlation of 0.974 between the ages of husbands and wives.
Now, is this correlation significant? To check whether a Pearson r is significant, we turn to a new table of
critical values. The concept, however, remains the same: we compare our calculated r with a critical r value
found in the table.

A hypothesis test for correlation starts with a null hypothesis of "zero correlation." The
alternative hypothesis is "there is a non-zero correlation." In the hypotheses H0: ρ = 0 and Ha: ρ ≠ 0,
the Greek letter ρ (rho) represents the true correlation in the population from which our sample is drawn.
For a Pearson r correlation, df = N − 2, where N is still the number of pairs; here N = 10, so df = 8.
So what is our decision? Even if we use the more conservative α = 0.01, our calculated r (0.974)
exceeds the critical r from the table (0.765 for df = 8). So our correlation is significant.

Critical values of Pearson’s r (two-tailed)

df     α = 0.05   α = 0.01
1      0.99700    0.99990
2      0.95000    0.99000
3      0.87800    0.95900
4      0.81100    0.91700
5      0.75400    0.87400
6      0.70700    0.83400
7      0.66600    0.79800
8      0.63200    0.76500
9      0.60200    0.73500
10     0.57600    0.70800
11     0.55300    0.68400
12     0.53200    0.66100
13     0.51400    0.64100
14     0.49700    0.62300
15     0.48200    0.60600
16     0.46800    0.59000
17     0.45600    0.57500
18     0.44400    0.56100
19     0.43300    0.54900
20     0.42300    0.53700
21     0.41300    0.52600
22     0.40400    0.51500
23     0.39600    0.50500
24     0.38800    0.49600
25     0.38100    0.48700
26     0.37400    0.47900
27     0.36700    0.47100
28     0.36100    0.46300
29     0.35500    0.45600
30     0.34900    0.44900
35     0.32500    0.41800
40     0.30400    0.39300
45     0.28800    0.37200
50     0.27300    0.35400
60     0.25000    0.32500
70     0.23200    0.30300
80     0.21700    0.28300
90     0.20500    0.26700
100    0.19500    0.25400
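The decision rule can be sketched in Python. The dictionary below copies only the df = 1 to 10 rows of the critical-value table (an excerpt, so larger samples would need the remaining rows), and the function names are hypothetical. The sketch also computes the equivalent test statistic t = r√(N − 2)/√(1 − r²), which follows a t distribution with N − 2 degrees of freedom when ρ = 0; critical r values like those in the table are derived from that distribution.

```python
from math import sqrt

# Two-tailed critical values of Pearson's r (excerpt: the df = 1 to 10 rows of the table)
CRITICAL_R = {
    1: {0.05: 0.99700, 0.01: 0.99990},
    2: {0.05: 0.95000, 0.01: 0.99000},
    3: {0.05: 0.87800, 0.01: 0.95900},
    4: {0.05: 0.81100, 0.01: 0.91700},
    5: {0.05: 0.75400, 0.01: 0.87400},
    6: {0.05: 0.70700, 0.01: 0.83400},
    7: {0.05: 0.66600, 0.01: 0.79800},
    8: {0.05: 0.63200, 0.01: 0.76500},
    9: {0.05: 0.60200, 0.01: 0.73500},
    10: {0.05: 0.57600, 0.01: 0.70800},
}


def correlation_decision(r, n_pairs, alpha):
    """Return (df, critical r, significant?) for H0: rho = 0 against Ha: rho != 0."""
    df = n_pairs - 2
    if df not in CRITICAL_R or alpha not in CRITICAL_R[df]:
        raise KeyError(f"no critical value for df={df}, alpha={alpha} in this excerpt")
    critical = CRITICAL_R[df][alpha]
    # two-tailed: only the size of r matters, so compare |r| with the critical value
    return df, critical, abs(r) >= critical


def t_statistic(r, n_pairs):
    """Equivalent statistic t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n_pairs - 2) / sqrt(1 - r * r)


df, critical, significant = correlation_decision(0.974, 10, 0.01)
print(f"df = {df}, critical r = {critical}, significant: {significant}")
print(f"t = {t_statistic(0.974, 10):.2f} on {df} df")
```

For the husbands-and-wives data this reports df = 8, a critical r of 0.765 at α = 0.01, and a significant result, matching the decision reached from the table.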

THANK YOU
