
Quantitative Methods

Correlation Analysis

Module 010 - Correlation Analysis

At the end of this module you are expected to:


1. Explain what correlation analysis is;
2. Understand what Pearson product-moment correlation is; and
3. Recognize Spearman rank correlation.
Correlation Analysis

Methods of correlation and regression can be used in order to analyze the extent and
the nature of relationships between different variables. Correlation analysis is used
to understand the nature of relationships between two individual variables. For
example, if we aim to study the impact of foreign direct investment (FDI) on the level
of economic growth in Vietnam, then two variables can be specified as the amounts
of FDI and GDP for the same period.

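The correlation coefficient r is calculated with the following formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}} $$

where x and y are the paired values of the two variables and n is the size of the sample.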
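The formula can be checked with a short Python sketch. The function name `pearson_r` and the five (x, y) pairs are made up for illustration; they are not FDI or GDP figures.

```python
import math


def pearson_r(x, y):
    """Pearson's r using the raw-sums formula above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired sample (not real data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = (5·66 - 15·20) / √((5·55 - 15²)(5·86 - 20²)) = 30 / √1500 ≈ 0.775.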

The value of the correlation coefficient can be interpreted in the following manner:

• If r is equal to 1, there is a perfect positive correlation between the two variables;

• If r is equal to -1, there is a perfect negative correlation between the two variables;

• If r is equal to zero, there is no linear correlation between the two variables.

In practical terms, the closer r is to 1, the stronger the positive association between FDI
and GDP growth in Vietnam. Similarly, if r is less than 0, the closer it is to -1, the
stronger the negative association between FDI and GDP growth in Vietnam. If r is equal to
zero, FDI shows no linear association with GDP change in Vietnam within the given sample.

Course Module
The most popular forms of correlation analysis used in business studies include
Pearson product-moment correlation, Spearman rank correlation and
autocorrelation. Correlation analysis is a method of statistical evaluation used to
study the strength of a relationship between two numerically measured, continuous
variables (e.g. height and weight). This type of analysis is useful when a
researcher wants to establish whether there are possible connections between variables.
A common misconception is that correlation analysis determines cause and effect; it
does not, because other variables that are not included in the research may have
influenced the results.

If correlation is found between two variables, it means that when there is a systematic
change in one variable, there is also a systematic change in the other; the variables
change together over a certain period of time. Depending on the numerical values
measured, such a correlation can be either positive or negative.

Positive correlation exists if one variable increases simultaneously with the other, i.e.
the high numerical values of one variable relate to the high numerical values of the
other.

Negative correlation exists if one variable decreases when the other increases, i.e. the
high numerical values of one variable relate to the low numerical values of the other.

Pearson’s product-moment coefficient is a measure of correlation and ranges
between +1 and -1, depending on the correlation. +1 indicates the strongest positive
correlation possible, and -1 indicates the strongest negative correlation possible.
Therefore, the closer the coefficient is to either of these numbers, the stronger the
correlation in the data it represents. On this scale 0 indicates no correlation, hence
values closer to zero indicate weaker correlation than those closer to +1 or -1.
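To see the two ends of the scale, the Python sketch below computes r for made-up points lying exactly on a rising line and on a falling line. The helper name `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * a + 1 for a in x]    # y increases as x increases
falling = [10 - 3 * a for a in x]  # y decreases as x increases

print(pearson_r(x, rising))   # 1.0
print(pearson_r(x, falling))  # -1.0
```

Any set of points lying exactly on a straight line gives +1 or -1 whatever the slope: r reflects how closely the points follow a line, not how steep the line is.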

If two numerical data sets are correlated, positively or negatively, the coefficient can
help you predict future trends between the two variables. However, you cannot be 100%
sure that such a prediction will be correct, because correlation does not establish cause
and effect.

Autocorrelation (serial correlation) is the correlation between values of the same
variable at different times. The autocorrelation coefficient at a given lag is calculated by
applying the Pearson product-moment correlation formula to the series and a lagged
(shifted) copy of itself. Because an unshifted series is perfectly correlated with itself, the
autocorrelation function begins with a coefficient of 1 at lag 0.
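A minimal Python sketch of the lag-k autocorrelation described above: each value is paired with the value k steps later, and Pearson's formula is applied to those pairs. The series and the function name `autocorrelation` are hypothetical. Statistical packages often use a slightly different estimator (based on the mean and variance of the whole series), so their values can differ somewhat for short series.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def autocorrelation(series, lag):
    """Pearson correlation between the series and a copy shifted by `lag` steps."""
    if not 0 <= lag <= len(series) - 2:
        raise ValueError("lag must leave at least two overlapping pairs")
    original = series[: len(series) - lag]  # values at times t
    shifted = series[lag:]                  # values at times t + lag
    return pearson_r(original, shifted)


# Hypothetical series that alternates up and down.
series = [1, -1, 1, -1, 1, -1]
acf = [autocorrelation(series, k) for k in range(4)]
print([round(v, 6) for v in acf])  # [1.0, -1.0, 1.0, -1.0]
```

At lag 0 the series is compared with itself, which is why the function starts at 1; at lag 1 every value is paired with its opposite, giving -1.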

Correlation coefficient ‘r’ illustrated above is just a mathematical formula and you
don’t have to calculate correlation coefficient manually. For a bachelor’s degree
dissertation most supervisors accept correlation tests that have been run on a simple
Excel spreadsheet. For master’s or PhD level studies, on the other hand, you will have
to use more advanced statistical software such as SPSS or NCSS for your correlation
analysis.

Correlation analysis as a research method offers a range of advantages. This method
allows data from many subjects to be analysed simultaneously. Moreover, correlation
analysis can study a wide range of variables and their interrelations. On the negative
side, correlation findings do not indicate causation, i.e. cause-and-effect
relationships.

Pearson Product Moment Correlation

The Pearson product-moment correlation is calculated by taking the ratio of the
sample covariance of the two variables to the product of their two standard deviations,
and it illustrates the strength of a linear relationship. The Pearson correlation coefficient
is not robust: because it measures only linear association, strong nonlinear relationships
between the variables may not be recognized. It is also sensitive to outlying points, so it
is not resistant.
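The covariance definition in the first sentence can be checked numerically. This Python sketch uses the standard-library `statistics.mean` and `statistics.stdev` (sample standard deviation, dividing by n - 1) and computes the sample covariance by hand; the helper name and the data are made up for illustration.

```python
import statistics


def pearson_from_covariance(x, y):
    """r = sample covariance / (sample SD of x * sample SD of y)."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    # Sample covariance, dividing by n - 1 to match statistics.stdev.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_from_covariance(x, y)
print(round(r, 4))  # 0.7746
```

Because the n - 1 factors cancel, using population formulas (dividing by n) for both the covariance and the standard deviations gives the same r.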

The Pearson product-moment correlation coefficient (Pearson’s correlation, for
short) is a measure of the strength and direction of association that exists between
two variables measured on at least an interval scale.

For example, you could use a Pearson’s correlation to understand whether there is an
association between exam performance and time spent revising. You could also use a
Pearson's correlation to understand whether there is an association between
depression and length of unemployment.

A Pearson’s correlation attempts to draw a line of best fit through the data of two
variables, and the Pearson correlation coefficient, r, indicates how far away all these
data points are from this line of best fit (i.e., how well the data points fit this
model/line of best fit).
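One way to make the "line of best fit" idea concrete, sketched in Python: fit an ordinary least-squares line, measure how much of the spread in y it leaves unexplained, and compare the result with r². For a straight-line fit with an intercept, 1 - SS_res/SS_tot equals r², so the closer the points lie to the line, the closer |r| is to 1. The function names and data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)
mean_y = sum(y) / len(y)
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained spread
ss_tot = sum((b - mean_y) ** 2 for b in y)                              # total spread
r_squared = 1 - ss_res / ss_tot

print(round(slope, 4), round(intercept, 4))                  # 0.6 2.2
print(round(r_squared, 4), round(pearson_r(x, y) ** 2, 4))   # 0.6 0.6
```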

Note: If one of your two variables is dichotomous you can use a point-biserial
correlation instead, or if you have one or more control variables, you can run
a Pearson's partial correlation.

Assumptions:

When you choose to analyse your data using Pearson’s correlation, part of the process
involves checking to make sure that the data you want to analyse can actually be
analysed using Pearson’s correlation. You need to do this because it is only
appropriate to use Pearson’s correlation if your data "passes" four assumptions that
are required for Pearson’s correlation to give you a valid result. In practice, checking
for these four assumptions just adds a little bit more time to your analysis, requiring
you to click a few more buttons in SPSS Statistics when performing your analysis, as
well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these four assumptions, do not be surprised if, when
analysing your own data using SPSS Statistics, one or more of these assumptions is
violated (i.e., is not met). This is not uncommon when working with real-world data
rather than textbook examples, which often only show you how to carry out Pearson’s
correlation when everything goes well! However, don’t worry. Even when your data
fails certain assumptions, there is often a solution to overcome this. First, let’s take a
look at these four assumptions:

Assumption #1: Your two variables should be measured at the interval or ratio
level (i.e., they are continuous). Examples of variables that meet this criterion include
revision time (measured in hours), intelligence (measured using IQ score), exam
performance (measured from 0 to 100), weight (measured in kg), and so forth.

Assumption #2: There is a linear relationship between your two variables. Whilst
there are a number of ways to check whether a linear relationship exists between
your two variables, we suggest creating a scatterplot using SPSS Statistics, where you
can plot the one variable against the other variable, and then visually inspect the
scatter plot to check for linearity. Your scatterplot may look something like one of the
following:

Figure 1. Linear Relationship


URL: https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-
statistics.php
Retrieved: September 08, 2018

If the relationship displayed in your scatterplot is not linear, you will have to either
run a nonparametric equivalent to Pearson’s correlation or transform your data,
which you can do using SPSS Statistics. In practice this means: (a) creating a scatterplot
in SPSS Statistics to check for linearity; (b) interpreting the scatterplot; and (c)
transforming your data in SPSS Statistics if there is not a linear relationship between
your two variables.

Note: Pearson's correlation determines the degree to which a relationship is linear.
Put another way, it determines whether there is a linear component of association
between two continuous variables. As such, linearity is not actually an assumption of
Pearson's correlation. However, you would not normally want to pursue a Pearson's
correlation to determine the strength and direction of a linear relationship when you
already know the relationship between your two variables is not linear. Instead, the
relationship between your two variables might be better described by another
statistical measure. For this reason, it is not uncommon to view the relationship
between your two variables in a scatterplot to see if running a Pearson's correlation
is the best choice as a measure of association or whether another measure would be
better.

Assumption #3: There should be no significant outliers. Outliers are simply single
data points within your data that do not follow the usual pattern (e.g., in a study of
100 students’ IQ scores, where the mean score was 108 with only a small variation
between students, one student had a score of 156, which is very unusual, and may
even put her in the top 1% of IQ scores globally). The following scatterplots highlight
the potential impact of outliers:

Figure 2. The Effect of an Outlier on a Pearson Correlation.


URL: https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-
statistics.php
Retrieved: September 08, 2018

Pearson’s correlation coefficient, r, is sensitive to outliers, which can have a very large
effect on the line of best fit and the Pearson correlation coefficient. In some cases,
therefore, including outliers in your analysis can lead to misleading results, so it is best
if there are no outliers or they are kept to a minimum. Fortunately, when using
SPSS Statistics to run Pearson’s correlation on your data, you can easily screen for
outliers, for example by inspecting a scatterplot, and then choose among several options
for dealing with any outliers you find.
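The sensitivity to outliers is easy to demonstrate. In this Python sketch (made-up data, hypothetical helper name), ten points lie exactly on a line, so r = 1; changing a single y value to an extreme one is enough to pull r down to about -0.05.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))
clean = list(x)          # y = x exactly
with_outlier = list(x)
with_outlier[-1] = -10   # one extreme value replaces y = 10

r_clean = pearson_r(x, clean)
r_outlier = pearson_r(x, with_outlier)
print(round(r_clean, 3), round(r_outlier, 3))  # 1.0 -0.051
```

Nine of the ten points still follow the same perfect line, yet the coefficient now suggests no linear relationship at all, which is why a scatterplot check matters.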
Assumption #4: Your variables should be approximately normally distributed. In
order to assess the statistical significance of the Pearson correlation, you need to have
bivariate normality, but this assumption is difficult to assess, so a simpler method is
more commonly used. This simpler method involves determining the normality of
each variable separately. To test for normality you can use the Shapiro-Wilk test of
normality, which is easy to run in SPSS Statistics. If your data fail this assumption, you
can use a nonparametric alternative such as Spearman’s rank correlation.

Spearman Rank

Before learning about Spearman’s correlation it is important to understand Pearson’s
correlation, which is a statistical measure of the strength of a linear relationship
between paired data. Its calculation and subsequent significance testing require
the following data assumptions to hold:

• interval or ratio level;
• linearly related;
• bivariate normally distributed.

If your data do not meet the above assumptions, use Spearman’s rank correlation instead.
Spearman’s rank correlation requires the data to be sorted and each value to be assigned
a rank, with rank 1 assigned to the lowest value. Moreover, if a value appears more than
once, the tied values are each given the average of the ranks they occupy.
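The ranking rule just described (rank 1 for the lowest value, tied values share their average rank) can be sketched in Python as follows. The function name `average_ranks` and the scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


scores = [10, 30, 20, 20, 40]
print(average_ranks(scores))  # [1.0, 4.0, 2.5, 2.5, 5.0]
```

The two 20s would occupy ranks 2 and 3, so each receives (2 + 3) / 2 = 2.5, and the next value continues at rank 4.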

What is a monotonic relationship?

A monotonic relationship is a relationship that does one of the following: (1) as the
value of one variable increases, so does the value of the other variable; or (2) as the
value of one variable increases, the other variable value decreases. Examples of
monotonic and non-monotonic relationships are presented in the diagram below:

Figure 3: Monotonic and Non-monotonic Relationships


URL: https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-
statistics.php
Retrieved: September 08, 2018

Monotonic function

To understand Spearman’s correlation it is necessary to know what a monotonic
function is. A monotonic function is one that either never increases or never
decreases as its independent variable increases. The following graphs illustrate
monotonic functions:

Figure 4: The monotonic function.


URL: http://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
Retrieved: September 08, 2018

Monotonically increasing - as the x variable increases the y variable never decreases;

Monotonically decreasing - as the x variable increases the y variable never increases;

Not monotonic - as the x variable increases the y variable sometimes decreases and
sometimes increases.
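The three definitions can be turned into a small Python check, assuming the y values are listed in order of increasing x. The function name `monotonic_type` and the sample lists are illustrative.

```python
def monotonic_type(ys):
    """Classify y values (listed in order of increasing x) using the three definitions above."""
    steps = [later - earlier for earlier, later in zip(ys, ys[1:])]
    if all(step >= 0 for step in steps):
        return "monotonically increasing"  # y never decreases
    if all(step <= 0 for step in steps):
        return "monotonically decreasing"  # y never increases
    return "not monotonic"                 # y sometimes rises and sometimes falls


print(monotonic_type([1, 2, 2, 5]))  # monotonically increasing
print(monotonic_type([9, 4, 4, 0]))  # monotonically decreasing
print(monotonic_type([1, 3, 2, 4]))  # not monotonic
```

A constant sequence satisfies both of the first two definitions; this sketch simply reports it as increasing.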

Spearman’s correlation coefficient

Spearman’s correlation coefficient is a statistical measure of the strength of a
monotonic relationship between paired data. In a sample it is denoted by rs and is by
design constrained as follows:

$$ -1 \le r_s \le 1 $$

Its interpretation is similar to that of Pearson’s, i.e. the closer rs is to ±1, the
stronger the monotonic relationship. Correlation is an effect size, and so we can
verbally describe the strength of the correlation using the following guide for the
absolute value of rs:

.00-.19 “very weak”
.20-.39 “weak”
.40-.59 “moderate”
.60-.79 “strong”
.80-1.0 “very strong”
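The guide can be applied mechanically, as in this Python sketch. The bands above are written to two decimal places, so values such as .195 fall between them; the sketch assumes half-open cut points at .20, .40, .60 and .80. The function name is hypothetical.

```python
def describe_strength(r_s):
    """Verbal label for |r_s| following the guide above (cut points are an assumption)."""
    size = abs(r_s)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


print(describe_strength(0.45))   # moderate
print(describe_strength(-0.85))  # very strong
```

The sign is ignored because the guide describes strength only; the sign of rs gives the direction.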

The calculation of Spearman’s correlation coefficient and subsequent significance
testing of it require the following data assumptions to hold:

Interval or ratio level or ordinal;

Monotonically related.

Note that, unlike Pearson’s correlation, Spearman’s correlation has no requirement of
normality, and hence it is a nonparametric statistic. Let us consider some examples to
illustrate it. The following table gives x and y values for the relationship y = exp(x).
From the graph we can see that this is a perfectly increasing monotonic relationship.

Figure 5: The monotonic relationship.


URL: http://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
Retrieved: September 08, 2018

The calculation of Pearson’s correlation for this data gives a value of .699 which does
not reflect that there is indeed a perfect relationship between the data. Spearman’s
correlation for this data however is 1, reflecting the perfect monotonic relationship.
Spearman’s correlation works by calculating Pearson’s correlation on the ranked
values of this data. Ranking (from low to high) is obtained by assigning a rank of 1 to
the lowest value, 2 to the next lowest and so on. If we look at the plot of the ranked
data, then we see that they are perfectly linearly related.

Figure 6: The Linear Relationship.


URL: http://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
Retrieved: September 08, 2018
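The idea that Spearman’s correlation is Pearson’s correlation computed on ranks can be sketched in Python. The x values 1 to 10 are an assumption (the source table is not reproduced here), so the Pearson value printed will not necessarily match the .699 quoted above; the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank 1 for the lowest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rs: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = list(range(1, 11))
y = [math.exp(v) for v in x]  # perfectly monotonic, but not linear

print(round(pearson_r(x, y), 3))   # noticeably less than 1
print(round(spearman_r(x, y), 3))  # 1.0
```

Ranking turns the curved exponential pattern into the straight line of ranks shown in Figure 6, which is why rs reaches 1 while r does not.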

In the figures below, various samples and their corresponding sample
correlation coefficient values are presented. The first three represent the
“extreme” monotonic correlation values of -1, 0 and 1:

(Panels: perfect -ve monotonic correlation; no correlation; perfect +ve monotonic correlation)

Figure 7: Monotonic Correlation.
URL: http://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
Retrieved: September 08, 2018

In practice, what we observe in a sample are values such as the following:

Figure 8: Monotonic Correlation.


URL: http://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
Retrieved: September 08, 2018

Note: Spearman’s correlation coefficient is a measure of a monotonic
relationship, and thus a value of rs = 0 does not imply there is no relationship
between the variables. For example, the following scatterplot has rs = 0, which
implies no (monotonic) correlation; however, there is a perfect quadratic
relationship:

Figure 9: Perfect Quadratic Relationship


Retrieved: September 08, 2018
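The quadratic case can be reproduced with a Python sketch: for x values placed symmetrically around zero and y = x², the ranks rise on one side and fall on the other, so they cancel and rs is 0 even though y is completely determined by x. The x values -3 to 3 and the helper names are assumptions; the figure's own data are not given.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank 1 for the lowest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_r(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


x = list(range(-3, 4))       # -3, -2, ..., 3
y = [v * v for v in x]       # perfect quadratic relationship
print(round(spearman_r(x, y), 3))  # 0.0
```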

References and Supplementary Materials


Books and Journals
1. Lee Baker; 2018; Beginner’s Guide to Correlation Analysis; Chicago; Green Apple
Publishing
2. Balaram Koduri; 2015; Quantitative Correlation Analysis of Motor and Dysphonia
Features of Parkinson Disease; Texas; University of North Texas
Online Supplementary Reading Materials
1. Spearman Rank; http://learntech.uwe.ac.uk/da/Default.aspx?pageid=1441; September
08, 2018
2. Spearman Correlation;
http://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf; September 08,
2018

Online Instructional Videos


1. Correlation analysis is about observing when and to what extent various securities and
markets move together or inversely. Since these relationships are dynamic, it is useful to
measure them historically and monitor them in real time. For example, this analysis may
be useful in revealing which securities in a portfolio provide diversification and which
securities may be duplicating unwanted risk. Fred Palmliden presents two custom
indicators designed to shed light on intermarket relationships.;
https://www.youtube.com/watch?v=ZG_WLFmXJJc; September 08, 2018
2. How to run a correlation analysis using Excel and write up the findings for a report;
https://www.youtube.com/watch?v=zEXK6M93lb8; September 08, 2018
3. In this brief presentation, Kelly Clement shows you what correlation analysis is, and
how to use it in your market analysis.;
https://www.youtube.com/watch?v=eFVNyjq0TB8; September 08, 2018
