
Bicol University Guinobatan
Agricultural and Biosystems Engineering Department

CORRELATION
ANALYSIS
MATH 4: Engineering Data Analysis

Presentation by
GROUP 3 | III-ABE2a

OVERVIEW
Introduction
Lesson 1: Correlation Analysis
Lesson 2: Correlation Coefficient
    Spearman's Rank Correlation Coefficient
    Kendall Rank Correlation
    Pearson Product-Moment Coefficient
Lesson 3: Differences of the Correlation Coefficients
Lesson 4: Uses and Advantages
References
Thank You

CORRELATION ANALYSIS

Introduction
Correlation analysis is a statistical method used to evaluate the
strength and direction of the linear relationship between two
quantitative variables. In other words, it helps to assess
whether and how changes in one variable are associated with
changes in another variable. This analysis is crucial in various
fields, including economics, biology, psychology, finance, and
many others.



LESSON 1: CORRELATION ANALYSIS

WHAT IS CORRELATION ANALYSIS?
Correlation analysis in research is a statistical method used to
measure the strength of the linear relationship between two
variables and compute their association. Simply put, correlation
analysis quantifies how closely a change in one variable is
associated with a change in the other.

LESSON 2: CORRELATION COEFFICIENT

CORRELATION
COEFFICIENT
The correlation coefficient is a unitless measure of the strength of
the linear relationship between the variables involved in a correlation
analysis. It is easy to identify because it is represented by the
symbol r, and its value always lies between -1 and 1.


There are usually three different ways of measuring statistical correlation, named after:

CHARLES SPEARMAN    MAURICE KENDALL    KARL PEARSON

Each coefficient reports the end result as a single value: r for Pearson’s, ρ for Spearman’s,
and τ for Kendall’s. Spearman’s rank and Pearson’s coefficient are the two most widely used,
and the choice between them depends on the type of data the researcher has.

Pearson Correlation Coefficient (r)

Also known as Pearson’s r, the bivariate correlation, or the Pearson product-moment
correlation coefficient (PPMCC).

The most common way of measuring a linear correlation.
A number between -1 and 1 that measures the strength and direction of the linear
relationship between two quantitative variables.
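
As a rough illustration of how r can be computed by hand, here is a minimal Python sketch. The function name `pearson_r` and the sample data are assumptions made for this example; they are not part of the original slides.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is an exact increasing linear function of x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
print(round(pearson_r(x, y), 4))  # 1.0
```

Because the hypothetical points fall exactly on a rising line, r comes out as 1. Reversing the order of y would give -1.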

PEARSON PRODUCT-MOMENT COEFFICIENT


Pearson Coefficient

Descriptive statistic: it summarizes the characteristics of a dataset.

Inferential statistic: it lets us test whether there is a significant
relationship between two variables.

VISUALIZING THE
Pearson correlation coefficient

The Pearson correlation coefficient reflects:
How close the observations are to a line of best fit.
Whether the slope of the line of best fit is negative or positive.
When the slope is negative, r is negative.
When the slope is positive, r is positive.
RULES OF THUMB
The Pearson correlation coefficient is a good choice when all
of the following are true:

1. Both variables are quantitative.
2. The variables are normally distributed.
3. The data have no outliers.
4. The relationship is linear.
Steps in Hypothesis Testing

1. State the null and alternative hypotheses.
2. Choose the level of significance α. By convention, α = 0.05.
3. Select the appropriate test statistic.
4. Establish the critical region. The degrees of freedom (df) are df = n - 2.
Choose a one-tailed or two-tailed test; two-tailed is an appropriate
choice for correlations.
5. Compute the value of the test statistic from the sample data.
6. Make a statistical decision.
7. Make an interpretation and, if possible, draw a conclusion.
SAMPLE: Imagine you’re studying the relationship between newborns’
weight and length. You have the weights and lengths of the 10 babies
born last month at your local hospital. The weight and length of these 10
newborns have a Pearson correlation coefficient of 0.47. Determine
whether there is a significant relationship between weight and length.

Given:
r = 0.47
n = 10
SOLUTION:
4. Rejection Region
If the t value is greater than the critical value, then
the relationship is statistically significant (p < α).
If the t value is less than the critical value, then the
relationship is not statistically significant (p > α).

Find the critical value of t:
df = n - 2 = 10 - 2 = 8
The significance level is usually .05.
For a one-tailed test of significance at α = .05 and
df = 8, the critical value of t (t*) is 1.86.
(For a two-tailed test at the same α and df, it is 2.306.)
SOLUTION:
5. t-value

The weight and length of 10 newborns have a Pearson
correlation coefficient of 0.47.
Since we know that n = 10 and r = 0.47, we can calculate the
t value:
t = r√(n - 2) / √(1 - r²) = 0.47 × √8 / √(1 - 0.47²) ≈ 1.51
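
The t-value step can be checked numerically. The sketch below assumes the standard test statistic for a Pearson correlation, t = r√(n - 2) / √(1 - r²); the function name `correlation_t` is hypothetical, and the critical value is simply the one quoted in the example rather than computed.

```python
import math

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, given sample r and sample size n."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.47, 10
t_value = correlation_t(r, n)
t_critical = 1.86  # one-tailed, alpha = .05, df = 8 (value quoted in the example)

print(f"t = {t_value:.3f}, df = {n - 2}")  # t = 1.506, df = 8
print("significant" if t_value > t_critical else "not significant")  # not significant
```

With r = 0.47 and n = 10, the statistic stays below the quoted critical value, which matches the decision reached in the worked solution.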
SOLUTION:
6. Statistical Decision: The t value is less than the critical value, so we do not reject
the null hypothesis that the Pearson correlation coefficient of the population (ρ) is 0.

7. Interpretation / Conclusion: For the correlation between weight and
length in a sample of 10 newborns, the t value is less than the critical
value of t. Therefore, there is no significant relationship between weight
and length (p > .05).

Note that a sample size of 10 is very small. It’s possible that you would find a significant relationship if
you increased the sample size.
Spearman's Rank
Correlation Coefficient
The Spearman’s rank coefficient of correlation or Spearman correlation coefficient is
a nonparametric measure of rank correlation (statistical dependence of ranking
between two variables).
Named after Charles Spearman, it is often denoted by the Greek letter ‘ρ’ (rho) and
is primarily used for data analysis.
It measures the strength and direction of the association between two ranked
variables.

SPEARMAN'S RANK CORRELATION COEFFICIENT


But before we talk about the Spearman correlation coefficient, it is important to
understand Pearson’s correlation first. A Pearson correlation is a statistical measure of
the strength of a linear relationship between paired data.
For its calculation and significance testing, Pearson’s correlation requires the
following assumptions about the data to hold true:
Interval or ratio level
Linearly related
Bivariate normally distributed
If your data doesn’t meet the above assumptions, then you would need Spearman’s
coefficient.
It is necessary to know what a monotonic function is to understand the Spearman correlation
coefficient. A monotonic function is one that either never decreases or never increases
as its independent variable increases. A monotonic function can be explained using
the image below:

The image explains three concepts in monotonic functions:
1. Monotonically increasing: When the ‘x’ variable increases, the ‘y’ variable never
decreases.
2. Monotonically decreasing: When the ‘x’ variable increases, the ‘y’ variable never
increases.
3. Not monotonic: When the ‘x’ variable increases, the ‘y’ variable sometimes
increases and sometimes decreases.
A monotonic relationship is less restrictive than the linear relationship assumed
by Pearson’s coefficient.
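
The three cases can also be checked programmatically. This is a minimal sketch, assuming the y values are already listed in order of increasing x; the function name `classify_monotonic` and the extra "constant" label (for data that never changes) are additions made for this example.

```python
def classify_monotonic(y_values):
    """Classify y values (already ordered by increasing x) by monotonicity."""
    pairs = list(zip(y_values, y_values[1:]))
    never_decreases = all(b >= a for a, b in pairs)
    never_increases = all(b <= a for a, b in pairs)
    if never_decreases and never_increases:
        return "constant"  # edge case not covered by the three slide cases
    if never_decreases:
        return "monotonically increasing"
    if never_increases:
        return "monotonically decreasing"
    return "not monotonic"

# Hypothetical y values for x = 1, 2, 3, 4
print(classify_monotonic([1, 2, 2, 5]))  # monotonically increasing
print(classify_monotonic([9, 7, 7, 1]))  # monotonically decreasing
print(classify_monotonic([1, 4, 2, 6]))  # not monotonic
```

Note that flat stretches are allowed in the first two cases, since the definition only requires that y never moves in the opposite direction.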

Although monotonicity is not a strict requirement for computing the Spearman correlation
coefficient, the coefficient measures the strength and direction of a monotonic
relationship. It is therefore not meaningful to compute it when the relationship
between the variables is already known to be non-monotonic.

Formula and Calculation with Example:

ρ = 1 - (6 ∑d²) / (n(n² - 1))

Here,
n = number of data points of the two variables
d = difference in the ranks of the i-th element
The Spearman coefficient, ρ, can take a value between -1 and +1, where:
A ρ value of +1 means a perfect positive association of ranks.
A ρ value of 0 means no association of ranks.
A ρ value of -1 means a perfect negative association of ranks.
The closer the ρ value is to 0, the weaker the association between the two ranks.

Sample Problem
We must be able to rank the data before proceeding with Spearman’s rank
coefficient of correlation. It is also important to observe whether, as one variable
increases, the other variable follows a monotonic relation.
At every level, you will need to compare the values of the two variables. Here is how
the calculations work:
The scores of 9 students in History and Geography are given in the table
below.

Step 1 - Create a table of the data obtained.
Step 2 - Start by ranking the two data sets. Data ranking can be achieved by
assigning the ranking “1” to the biggest number in the column, “2” to the second
biggest number and so forth. The smallest value will get the lowest ranking.
This should be done for both sets of measurements.
Step 3 - Add a third column d to your data set; d here denotes the difference
between ranks. For example, if the first student’s History rank is 3 and the Geography rank
is 5, then the difference in rank is 2. In the fourth column, square your d values.
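
Steps 2 and 3 can be sketched in Python as follows. The score table is not reproduced in this text, so the scores below are illustrative placeholders chosen to give ∑d² = 12, the total used in the worked example; the helper name `rank_descending` is also an assumption.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1 (assumes no tied values)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

# Illustrative scores for 9 students (placeholders, not the original table)
history = [35, 23, 47, 17, 10, 43, 9, 6, 28]
geography = [30, 33, 45, 23, 8, 49, 12, 4, 31]

rank_h = rank_descending(history)
rank_g = rank_descending(geography)
d = [h - g for h, g in zip(rank_h, rank_g)]   # difference between ranks
d_squared = [diff ** 2 for diff in d]          # squared differences

print(rank_h)          # [3, 5, 1, 6, 7, 2, 8, 9, 4]
print(rank_g)          # [5, 3, 2, 6, 8, 1, 7, 9, 4]
print(d)               # [-2, 2, -1, 0, -1, 1, 1, 0, 0]
print(sum(d_squared))  # 12
```

The sign of each d does not matter for the final result, because only the squared differences enter the formula. Tied scores would need average ranks, which this sketch does not handle.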

Step 4 - Add up all your d² values; the sum is 12 (∑d² = 12).
Step 5 - Insert these values into the formula.

ρ = 1 - [(6 × 12) / (9(81 - 1))]
= 1 - (72/720)
= 1 - 0.1
= 0.9
The Spearman’s rank correlation for this data is 0.9. As mentioned above, a
ρ value close to +1 indicates a strong positive association of ranks.
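
The arithmetic above can be reproduced with a short sketch. It assumes the no-ties form of the formula, ρ = 1 - 6∑d² / (n(n² - 1)); the function name `spearman_rho` is hypothetical.

```python
def spearman_rho(sum_d_squared, n):
    """Spearman's rho from the sum of squared rank differences (no ties)."""
    if n < 2:
        raise ValueError("need at least 2 paired observations")
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Values from the worked example: sum of d squared = 12, n = 9 students
rho = spearman_rho(sum_d_squared=12, n=9)
print(round(rho, 2))  # 0.9
```

When every rank difference is zero the function returns 1, and a complete reversal of ranks returns -1, which matches the interpretation of the coefficient's range.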


Kendall Rank
Correlation
Also commonly known as “Kendall’s tau coefficient”.

Kendall's Tau is a correlation coefficient and is thus a measure of the relationship
between two variables. It was introduced by Maurice Kendall in 1938 (Kendall 1938).

Kendall's Tau measures the strength of the relationship between two variables but,
unlike Pearson’s coefficient, it is non-parametric.

The data need not be normally distributed. The variables of interest can be continuous
or ordinal and should have a monotonic relationship.
Assumptions for
Kendall’s Tau
Assumptions mean that your data must satisfy certain properties in order for
statistical method results to be accurate.

The assumptions for Kendall’s Tau include:

Continuous or ordinal

Monotonicity
KENDALL RANK CORRELATION
Continuous or Ordinal
Continuous means that the variable can take on any reasonable value.

Ordinal variables are categories that have an inherent order.
Monotonicity

This means that the direction of the relationship between the variables is
consistent. For instance, when one variable goes up, the other goes up
(in general).


When to use Kendall’s Tau?
You should use Kendall’s Tau in the following scenario:

1. You want to know the relationship between two variables.
2. Your variables of interest are continuous with outliers, or ordinal.
3. You have only two variables.
Kendall’s Tau is very similar to Spearman’s rank correlation. However,
Kendall’s Tau should be preferred over Spearman’s when only a small
amount of data with many tied ranks is available.
Calculate Kendall's Tau

We can calculate Kendall's Tau with this formula:

τ = (C - D) / (C + D)

where C is the number of concordant pairs and D is the number of discordant pairs.


What are Concordant and Discordant Pairs?
A pair of observations is concordant if the subject who is
higher on one variable is also higher on the other variable.
A pair of observations is discordant if the subject who is
higher on one variable is lower on the other variable.
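
One common way to apply these definitions is to look at the sign of the product of the two differences. The sketch below assumes each observation is an (x, y) tuple; the function name `classify_pair` and the "tied" label for equal values are additions made for this example.

```python
def classify_pair(obs_a, obs_b):
    """Return 'concordant', 'discordant', or 'tied' for two (x, y) observations."""
    (x_a, y_a), (x_b, y_b) = obs_a, obs_b
    # Same sign of both differences means the pair is ordered the same way on x and y
    product = (x_a - x_b) * (y_a - y_b)
    if product > 0:
        return "concordant"
    if product < 0:
        return "discordant"
    return "tied"  # at least one variable has equal values in this pair

# Hypothetical observations
print(classify_pair((1, 3), (2, 4)))  # concordant
print(classify_pair((1, 3), (2, 1)))  # discordant
```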



Example Kendall's Tau
Suppose two doctors rank 6 patients by descending physical health. One of the two
doctors, in this case the female, is now defined as the reference and the patients are
sorted from 1 to 6.
Each pair of patients is then marked with "+" if the two doctors order the pair the
same way (concordant) and with "-" if they order it differently (discordant):

-
++
- + -
++++
++++-
Now we can easily calculate the number of concordant and discordant pairs.
We get the number of concordant pairs by counting all the "+". In our example
we have a total of 11.

We get the number of discordant pairs by counting all the "-". In our
example we have a total of 4.
C is 11 and D is 4.

Using the formula:

τ = (C - D) / (C + D) = (11 - 4) / (11 + 4) = 7/15 ≈ 0.47

Alternate formula (with n the number of patients and no tied ranks):

τ = (C - D) / (n(n - 1)/2) = (11 - 4) / 15 ≈ 0.47
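
The counting can be reproduced with a short sketch. The ranking table itself is not reproduced in this text, so the second doctor's ranks below are an assumption: they were chosen to be consistent with the +/- pattern and with the counts C = 11 and D = 4. The function name `kendall_tau` is hypothetical, and the code assumes there are no tied ranks.

```python
from itertools import combinations

def kendall_tau(ranks_x, ranks_y):
    """Return (C, D, tau) with tau = (C - D) / (C + D); assumes no tied ranks."""
    concordant = discordant = 0
    for i, j in combinations(range(len(ranks_x)), 2):
        product = (ranks_x[i] - ranks_x[j]) * (ranks_y[i] - ranks_y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    tau = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, tau

reference = [1, 2, 3, 4, 5, 6]  # reference doctor: patients sorted from 1 to 6
other = [3, 1, 4, 2, 6, 5]      # assumed ranks, consistent with the +/- pattern

c, d, tau = kendall_tau(reference, other)
print(c, d, round(tau, 3))  # 11 4 0.467
```

With 6 patients there are 6 × 5 / 2 = 15 pairs in total, so C + D = 15 when no ranks are tied.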
Kendall's Tau
Significance
In the case of Kendall's Tau, the null and alternative hypotheses are:

Null hypothesis: the correlation coefficient Tau = 0 (there is no correlation).

Alternative hypothesis: the correlation coefficient Tau ≠ 0 (there is a correlation).
LESSON 3: DIFFERENCES OF THE CORRELATION COEFFICIENTS
Pearson Correlation Coefficient:
Appropriate for continuous, numerical data that is approximately normally distributed.
It assumes a linear relationship between the variables.

Spearman's Rank Correlation Coefficient and Kendall's Tau:
Non-parametric measures suitable for ordinal, interval, or ratio data. They do not
assume a specific distribution.
They do not assume a linear relationship and are based on the ranks of the data.
Pearson Correlation Coefficient:
Measures the strength and direction of a linear relationship between two
variables. The coefficient ranges from -1 to 1, where:
-1 indicates a perfect negative linear relationship,
1 indicates a perfect positive linear relationship, and
0 indicates no linear relationship.

Spearman's Rank Correlation Coefficient:
It assesses the monotonic relationship between two variables by comparing the
ranks of the observations. The coefficient can range from -1 to 1, with:
-1 indicating a perfect inverse monotonic relationship,
1 indicating a perfect monotonic relationship, and
0 indicating no monotonic relationship.

Kendall's Tau:
It compares the number of concordant and discordant pairs of observations.
The coefficient, often denoted as τ (tau), ranges from -1 to 1, with the same
interpretation as Spearman's rank correlation.
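
To make the linear versus monotonic distinction concrete, here is a small sketch comparing the two coefficients on hypothetical data where y = x³, which is monotonic but not linear. All function names and the data are assumptions for illustration, and the ranking helper assumes no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values with 1 for the largest (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho via the sum of squared rank differences (no ties)."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear

print(round(pearson_r(x, y), 3))  # below 1: the points do not lie on a straight line
print(spearman_rho(x, y))         # 1.0: the ranks agree perfectly
```

Spearman's coefficient reaches 1 because the ordering of y matches the ordering of x exactly, while Pearson's r falls short of 1 because the relationship is curved.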

LESSON 4: USES AND ADVANTAGES
USES OF CORRELATION
ANALYSIS
Correlation analysis is used to study practical cases in which the researcher
can't manipulate individual variables. For example, correlation analysis is
used to measure the correlation between a patient's blood pressure and
the medication used.

Marketers use it to measure the effectiveness of advertising. Researchers
measure the increase or decrease in sales associated with a specific marketing
campaign.

ADVANTAGES
01 Awareness of the behavior between two variables: A
correlation helps to identify the absence or presence of
a relationship between two variables. It tends to be more
relevant to everyday life.

02 A good starting point for research: It proves to be a
good starting point when a researcher starts investigating
relationships for the first time.

03 Uses for further studies: Researchers can identify the
direction and strength of the relationship between two
variables and narrow the findings down in later
studies.

04 Simple metrics: Research findings are simple to classify.
The findings can range from -1.00 to 1.00. There can be
only three potential broad outcomes of the analysis: a positive
correlation, a negative correlation, or no correlation.

REFERENCES
https://www.questionpro.com/features/correlation-analysis.html#:~:text=What%20is%20correlation%20analysis%3F,the%20change%20in%20the%20other.
https://blog.flexmr.net/correlation-analysis-definition-exploration
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/bs704_multivariable5.html
https://www.questionpro.com/blog/spearmans-rank-coefficient-of-correlation/#:~:text=For%20example%2C%20if%20the%20first,column%2C%20square%20your%20d%20values.&text=The%20Spearman's%20Rank%20Correlation%20for,a%20perfect%20association%20of%20rank.
Presentation by
GROUP 3 | III-ABE2a

Caryl R. Loterte
Kimberly Jane M. Mitra
Emmanuel John D. Olitan
William C. Pamparo

THANK YOU!
