Pearson Correlation & Regression Guide

This document discusses Pearson product-moment correlation. It begins by introducing Karl Pearson, who founded the world's first university statistics department and contributed significantly to statistics. The document then defines correlation as a statistical method to determine if a relationship exists between two variables, and can measure the direction and strength of that relationship. Positive, negative, and zero correlations are described. Examples are provided to demonstrate each type. The document also discusses scatter plots and using them to visualize relationships between variables. It introduces the Pearson correlation coefficient and what strengths of correlation the values represent. An example is provided to demonstrate calculating the Pearson correlation coefficient between two sets of data and interpreting the results.

Pearson Product-Moment Correlation
SAS Mathematics Faculty


Objectives
1. Use the methods of linear regression and correlation to predict the value of a variable given certain conditions.
2. Solve for and interpret the Pearson r and the coefficient of determination, r².
Karl Pearson (1857-1936)

- He was an influential English mathematician and biostatistician.
Karl Pearson (1857-1936)
- In 1911, he founded the world’s first university statistics department at University College London, and contributed significantly to the fields of biometrics, meteorology, social Darwinism and eugenics.
What is Correlation?
It is a statistical method used to
determine whether a relationship between
two variables exists.
It also measures the direction and strength of the linear relationship between two variables.
Direction may be positive, negative or zero.
Types of Correlation
• Positive correlation
• Negative correlation
• Zero correlation
Positive Correlation
A positive correlation exists
when high values of one variable
correspond to high values in the
other variable or low values in one
variable correspond to low values in
the other variable.
Positive Correlation
[Scatter plot: Score in English (vertical axis) against Score in Mathematics (horizontal axis)]
The graph indicates a direct correlation between variables x and y, which appears to be increasing.
Negative Correlation
A negative correlation exists when high values of one variable correspond to low values in the other variable or low values in one variable correspond to high values in the other variable.
Negative Correlation
[Scatter plot: Score in English (vertical axis) against Score in Mathematics (horizontal axis)]
This time the trend of the data is decreasing; hence, the variables are negatively correlated.
Zero Correlation
A zero correlation exists when
high values in one variable
correspond to either high or low
values in the other variable.
Zero Correlation
[Scatter plot: Score in English (vertical axis) against Score in Mathematics (horizontal axis)]
The scatter graph of the data above is neither increasing nor decreasing. This graph represents a zero correlation.
Example:
Determine the direction of
relationship between the following
pairs of variables. Is it positive,
negative or zero?
Example:
1. Room rate and size of a room in
a hotel

A. Positive
B. Negative
C. Zero
Example:
2. Weight and height of students

A. Positive
B. Negative
C. Zero
Example:
3. Pressure and volume of a gas

A. Positive
B. Negative
C. Zero
Example:
4. IQ and height of persons

A. Positive
B. Negative
C. Zero
Example:
5. Number of customers and sales
in a department store

A. Positive
B. Negative
C. Zero
What is Correlation?
Strength can be perfect, strong or
high, moderate, low, zero or no
correlation.

Note:
Correlation between two variables
does not prove X causes Y or Y causes
X.
A scatter plot (or scatter diagram) shows how the points collected from a set of bivariate data are scattered on the Cartesian plane.
It is a graphical representation of the relationship between two variables and gives a good visual picture of that relationship.
Types of Linear Correlation
Example:
Construct the scatterplots for the
following bivariate data using
Microsoft Excel and describe the
relationship in terms of direction
and strength.
Sample 1
[Scatter plot: horizontal axis 10 to 22, vertical axis 0 to 60]
Sample 2
[Scatter plot: horizontal axis 0 to 8, vertical axis 0 to 18]
Sample 3
[Scatter plot: horizontal axis 0 to 120, vertical axis 0 to 120]
Pearson Product-Moment Correlation
- The most widely used statistic for measuring the degree of relationship between linearly related variables.

- The Pearson r correlation requires both variables to be normally distributed.
Testing Normality of the Data
• To determine whether the data follow a normal distribution, we can use a graphical or a numerical method.
Graphical method
• Histogram
• It plots the observed values against their frequency and gives a visual estimate of whether the distribution is bell-shaped or not.
Graphical method
• Q-Q probability plots
• It plots the quantiles of the observed values against the quantiles expected under a normal distribution (represented by the line).
Graphical method
• Q-Q probability plots
• If the data is normally distributed, the points in a Q-Q plot will lie on a straight diagonal line.
Remember:
• Graphical methods are typically not very
useful when the sample size is small.
Numerical method
• Shapiro-Wilk Test
• One of the most popular tests for checking the normality assumption. It has good power properties and is based on the correlation between the given observations and their associated normal scores.
Hypotheses of Normality Test
• Ho: The sample data follow a normal distribution.
• Ha: The sample data do not follow a normal distribution.
• When we are testing normality:
• If the p-value is greater than alpha, we fail to reject Ho and treat the data as normal.
• If the p-value is less than alpha, we reject Ho and conclude that the data are NOT normal.
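The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `looks_normal` is hypothetical, and the p-values are the Shapiro-Wilk results this deck reports for the toddler example (0.749 for age, 0.231 for height) rather than values computed here.

```python
def looks_normal(p_value, alpha=0.05):
    """Apply the slide's rule: a p-value above alpha means Ho (normality) is not rejected."""
    return p_value > alpha

# p-values as reported in the deck's toddler example (not recomputed here)
reported_p = {"age": 0.749, "height": 0.231}

for name, p in reported_p.items():
    verdict = "treated as normal" if looks_normal(p) else "not normal"
    print(f"{name}: p = {p} -> {verdict}")
```

Failing to reject Ho does not prove normality; it only means the sample gives no evidence against it.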
Pearson Product-Moment Correlation
The following summarizes the correlation coefficient and strength of relationships:

r                 Strength of relationship
0.00              no correlation, no relationship
±0.01 to ±0.20    very low correlation, almost negligible relationship
±0.21 to ±0.40    slight correlation, definite but small relationship
±0.41 to ±0.70    moderate correlation, substantial relationship
±0.71 to ±0.90    high correlation, marked relationship
±0.91 to ±0.99    very high correlation, very dependable relationship
±1.00             perfect correlation, perfect relationship
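The table above can be turned into a small lookup. The sketch below is one possible reading of it, not part of the original slides: the function name `describe_strength` is hypothetical, and values that fall between two bands (for example 0.205) are assigned to the higher band, which is an assumption.

```python
def describe_strength(r):
    """Return the verbal label from the strength table for a Pearson r."""
    size = abs(r)  # the sign gives direction only; strength uses the magnitude
    if size > 1.0:
        raise ValueError("r must lie between -1 and 1")
    if size == 0.0:
        return "no correlation"
    if size <= 0.20:
        return "very low correlation"
    if size <= 0.40:
        return "slight correlation"
    if size <= 0.70:
        return "moderate correlation"
    if size <= 0.90:
        return "high correlation"
    if size < 1.0:
        return "very high correlation"
    return "perfect correlation"

print(describe_strength(0.75))   # high correlation
print(describe_strength(-0.35))  # slight correlation
```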
Example
• A medical researcher wants to know if a linear relationship exists between toddlers’ age (months) and their height (cm). Data are shown below:

Age   Height
12    75
13    72
15    70
18    80
20    81
24    80
Example

• Do the following:
• Determine whether the data are normal.
• Determine the correlation.
• Identify whether there is a significant relationship between the toddlers’ age and their height.
Solution
• The data is normal.
• In the Q-Q plot for age, the points lie almost on the diagonal line. Hence, we can say that the age data are normal.
Solution
• The data is normal.
• In the Q-Q plot for height, most points lie almost on the diagonal line. Hence, we can say that the height data are also normal.
Solution
• The data is normal.
• Based on the table, the computed p-value for age (0.749) and the computed p-value for height (0.231) are both greater than alpha (0.05). Hence, the data for age and height are normal.
• Determine the correlation.

• Using the Pearson Product-Moment Correlation formula, you can create the following columns:

        X     Y     X²      Y²     XY
        12    75    144    5625    900
        13    72    169    5184    936
        15    70    225    4900   1050
        18    80    324    6400   1440
        20    81    400    6561   1620
        24    80    576    6400   1920
Total  102   458   1838   35070   7866
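The column totals feed the standard raw-score form of the Pearson formula, r = (nΣXY - ΣXΣY) / sqrt((nΣX² - (ΣX)²)(nΣY² - (ΣY)²)). The formula itself did not survive on the slide, so the sketch below assumes that form; the function name `pearson_r` is hypothetical.

```python
from math import sqrt

age = [12, 13, 15, 18, 20, 24]      # X, in months
height = [75, 72, 70, 80, 81, 80]   # Y, in cm

def pearson_r(x, y):
    """Pearson r from the raw-score sums (the same columns as the table)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(age, height)
print(round(r, 2))  # 0.75
```

With the totals from the table, the numerator is 6(7866) - (102)(458) = 480 and the denominator is sqrt(624 × 656), which gives r of about 0.75.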
Interpretation

The Pearson correlation coefficient (r = 0.75) indicates a high positive correlation and a marked relationship between the toddlers’ age in months and their height. It means that as the toddlers’ age increases, their height tends to increase.
Example
• To identify whether there is a significant relationship between the two variables:
• Step 1. State the null and alternative hypotheses.
• Step 2. Determine the value of alpha.
• Step 3. Identify the test statistic.
T test
- Used to test if there is a significant relationship between two sets of scores.
Example
• Step 4. Determine the degrees of freedom, computed t-value, and critical t-value.
• Step 5. If the absolute value of the computed t is greater than or equal to the critical value of t, reject the null hypothesis. If it is less than the critical value of t, fail to reject the null hypothesis.
• Step 6. Formulate your conclusion and interpretation.
Solution
• Step 1.
• Ho: There is no significant relationship between the toddlers’ age (in months) and their height.
• Ha: There is a significant relationship between the toddlers’ age (in months) and their height.
Solution
• Step 2. The value of α is 0.05.
• Step 3. The t-test will be used with degrees of freedom df = n - 2.
• Step 4. Since n = 6, the degrees of freedom is df = 6 - 2 = 4.
Solution
• To find the computed t-value:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{0.75\sqrt{4}}{\sqrt{1-0.5625}} \approx 2.27$$

To find the critical t-value, use the t-table. For a two-tailed test with α = 0.05 and df = 4, the critical value is 2.776.
Solution
• Step 5. Since the computed t-value (2.27) is less than the critical t-value (2.776), we fail to reject the null hypothesis.
• Step 6. At α = 0.05, there is no significant relationship between the toddlers’ age (in months) and their height. With only six pairs of observations, even r = 0.75 is not large enough to reach significance.
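The arithmetic in Steps 4 and 5 can be checked with a short Python sketch. It assumes the usual test statistic for a correlation, t = r·sqrt(n - 2) / sqrt(1 - r²), and takes the critical value 2.776 from the deck (two-tailed, alpha = 0.05, df = 4); the function name `t_for_r` is hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = 0.75, 6
df = n - 2                 # degrees of freedom: 4
t = t_for_r(r, n)
critical_t = 2.776         # from the t-table, as quoted in the deck

reject_null = abs(t) >= critical_t
print(df, round(t, 2), reject_null)  # 4 2.27 False
```

Because the computed t is below the critical value, the null hypothesis is not rejected at this alpha level.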
Coefficient of Determination

- This tells us how much of the variation in the dependent variable (Y) is due to or can be attributed to the independent variable (X).

- This is denoted as r².
Example

From our previous example, if r = 0.75, then r² = 0.5625.


Interpretation

Fifty-six percent of the variation in toddlers’ height is due to or can be attributed to the variation in the toddlers’ age. The remaining 44% is due to other factors.
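A minimal sketch of the same calculation in Python, using the rounded r = 0.75 from the toddler example (the variable names are illustrative only):

```python
r = 0.75                       # Pearson r from the toddler example
r_squared = r ** 2             # coefficient of determination

explained_pct = round(r_squared * 100)   # share attributed to age
other_pct = 100 - explained_pct          # share left to other factors

print(r_squared, explained_pct, other_pct)  # 0.5625 56 44
```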
Summary

|r| closer to 0 = weaker relationship
|r| closer to 1.0 = stronger relationship
Summary

r = ±1.0 means a perfect linear relationship

r = 0 could mean many things:
• No relationship at all between X & Y
• Non-linear relationship between X & Y
• Restricted range on X and/or Y
• Outlier may be causing problems
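The non-linear case is easy to demonstrate. In the sketch below, which uses made-up data not taken from the slides, Y is completely determined by X (Y = X²), yet Pearson r is exactly zero because the relationship is not linear. The function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]        # illustrative values, symmetric about zero
y = [v ** 2 for v in x]      # perfect but non-linear dependence

print(pearson_r(x, y))  # 0.0
```

A scatter plot of such data would show the curve immediately, which is one reason to plot the data before relying on r.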
Thank you!
