DAFM - Unit 7 - Correlation and Regression
Unit 7
Correlation and Regression
Unit 7 - Correlation and Regression
Learning Objectives
© Copyright 2017, all rights reserved. Manipal Global Education Services Pvt. Ltd. 2
Table of contents
S.No Details
1. Introduction
2. Introducing Correlation
2.1 Types of Correlation
2.1.1 Positive Correlation & Negative Correlation
2.1.2 Simple, Partial, and Multiple Correlation
2.1.3 Linear and Non-Linear Correlation
3. Measuring Correlation
3.1 Scatter Diagrams for Correlation
3.2 Karl Pearson’s Correlation Coefficient
3.2.1 Calculating Correlation Coefficient with Pearson’s and Spreadsheet
3.3 Spearman’s Rank Correlation Coefficient
3.3.1 Calculating Correlation Coefficient with Spearman’s rank correlation
4. Introducing Regression
5. Measuring Regression
5.1 Scatter Diagrams for Regression
5.2 Representation in a Spreadsheet
5.2.1 Deriving the Regression Equation
6. Chapter Summary
1. Introduction
• Is there any relationship between these two stocks?
• Is there a relationship between employee absence and customer dissatisfaction?
These are some of the questions that involve the concepts of correlation and regression.
The utility of these concepts in banking is beyond question.
Once you understand the concepts and how they work, you will be able to apply them
in almost any banking situation. Why are these two concepts bundled together? They
are interrelated, and an understanding of correlation forms the basis for understanding
regression and the coefficient of determination.
This unit deals with these two concepts. You will learn how two variables interact with each other,
and the examples in this unit show how the concepts can be used in financial planning
and other situations.
2. Introducing Correlation
We observe in our everyday lives that an increase in prices contracts the demand for
goods and expands the supply. When prices decrease, supply shrinks and demand
expands. Apart from these, we intuitively feel that there is some relationship
between variables such as fertilisers and yield, production and profit, remuneration and
motivation, and so on. So, it is evident that the study of the relationship between
two variables is necessary in many fields. In 1887, Sir Francis Galton was the
first person to study the relationship between the heights of sons and their fathers
based on collected data.
Let us look at it from another perspective. The addition of 10-11% chromium makes
steel stainless, thereby increasing the life of a stainless-steel product. It is easier
to measure the amount of chromium than the life of the product. If we can
establish a relationship between the percentage of chromium and product life, we
can predict the product’s lifetime accurately.
Fig. 7.1: Increased Sales Correlates with Festival Season
Correlation refers to the relationship between two or more variables. In other words, when two or more variables move in tune with each other,
they are said to be correlated.
Correlation analysis deals with:
• Measuring the relationship between variables
• Testing the significance of the relationship
• Giving a confidence interval for the population correlation measure
3. Measuring Correlation
Correlation indicates not only the direction of a relationship between variables but also its strength, which is
essential for making rational decisions. There are many ways to represent correlation visually or numerically.
In this unit, we shall discuss three simple and well-known measures of correlation.
• Scatter plot.
• Karl Pearson’s correlation coefficient.
• Spearman’s rank correlation.
2. If the dots lie close to a straight line that runs from left bottom to right top, then the variables are said to be positively correlated.
3. If the dots lie exactly on a straight line that runs from left top to right bottom, then the variables are said to be perfectly negatively correlated.
4. If the dots lie very close to a straight line that runs from left top to right bottom, then the variables are said to be negatively correlated.
5. If the dots lie all over, then the variables are not correlated.
1. $$ r = \frac{\sum xy / N}{\sigma_x \sigma_y} = \frac{\sum xy}{N \sigma_x \sigma_y} $$
Where,
x = X – X̅
y = Y – Y̅
N is the number of paired observations. Please note that [∑xy/N] is called the covariance of x and y.
σ (sigma) stands for standard deviation.
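The definitions above can be turned into a short calculation. The following is a minimal Python sketch, assuming the coefficient is computed as the covariance ∑xy/N divided by the product of the two standard deviations. The function name `pearson_r` and the sample figures are illustrative only and are not taken from this unit.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the means: x = X - mean(X), y = Y - mean(Y)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy)) / n  # sum(xy) / N
    sigma_x = math.sqrt(sum(a * a for a in dx) / n)
    sigma_y = math.sqrt(sum(b * b for b in dy) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return covariance / (sigma_x * sigma_y)

# Hypothetical data (not from this unit): Y rises in step with X
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
r_example = pearson_r(hours, score)  # close to +1: perfect positive correlation
```

Because the same N appears in the covariance and in both standard deviations, it cancels out, so dividing by N or by N - 1 throughout would give the same r.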
We choose one of these three equations depending upon the available information.
2.
X Y
9 10
6 16
8 12
8 8
6 14
1. Put the data in an Excel worksheet.
2. Select Data → Data Analysis.
3. Select Correlation and click the OK button.
It is the same value obtained by manual computation. From now on, you may wish to use the
tool directly!
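As a cross-check outside the spreadsheet, the sketch below applies the raw-sums form of Pearson's coefficient to the five X and Y pairs visible in the worksheet table above. This is an illustration under assumptions: the raw-sums form is a standard equivalent of the deviation formula and may not be the one the unit used, and if the original table contained more rows than the five shown, the spreadsheet's figure would differ. The function name is hypothetical.

```python
import math

def pearson_r_from_sums(xs, ys):
    """Pearson's r from raw sums, without computing the means first."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# The five pairs visible in the worksheet table (assumed to be the full data set)
x_values = [9, 6, 8, 8, 6]
y_values = [10, 16, 12, 8, 14]
r_sheet = pearson_r_from_sums(x_values, y_values)
print(round(r_sheet, 3))  # about -0.825 for these five pairs
```

A negative result here would mean that Y tends to fall as X rises.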
$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$
Where D is the difference between the ranks assigned to the two variables and N stands for the number of paired observations. We will shortly look at how D is computed,
through an example. The value of r or ρ (pronounced as Spearman’s rho) lies between –1 and +1, and its interpretation is the same as that of Karl Pearson’s
correlation coefficient. Karl Pearson’s r and Spearman’s r will be nearly equal when the distribution is normal.
Spearman’s Rank correlation coefficient assumes the following:
a. The data need not follow a normal distribution; it is a non-parametric measure.
b. The variables under study are affected by a large number of independent causes so as to form a normal distribution.
When we do not know the shape of the population distribution and the data are qualitative, Spearman’s Rank correlation coefficient is used to measure the
relationship. Unlike Pearson’s product-moment correlation coefficient, the Spearman rank correlation coefficient does not assume:
So, if there is any doubt that either of the two variables has a normal distribution, you should use the Spearman rank correlation and not the
Pearson correlation. Also, in the case of a non-linear relationship, the Spearman correlation may be a better indicator than the Pearson correlation.
As with Pearson’s r, a negative value of ρ indicates a negative relationship. A statistically significant ρ implies that it reflects a true correlation in the
population rather than a chance result.
S.No   Rank by Judge 1   Rank by Judge 2   D    D^2
1      5                 6                 -1   1
2      6                 4                 2    4
3      4                 5                 -1   1
4      3                 1                 2    4
5      2                 2                 0    0
6      7                 7                 0    0
7      1                 3                 -2   4
Total                                           14
Since the result is positive and the value is in the strong range, we may conclude that the ranks given by the two
judges are positively correlated.
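The rank table above can be checked with a few lines of Python. This is a minimal sketch, assuming the usual no-ties form of Spearman's formula, ρ = 1 – 6∑D²/(N(N² – 1)), and assuming that the second and third columns of the table are the ranks given by the two judges. The names `spearman_rho`, `judge_1` and `judge_2` are illustrative.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rank lists with no tied ranks."""
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two equal-length rank lists with at least two items")
    n = len(ranks_a)
    # D is the difference between the two ranks given to the same item
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Ranks from the table above (sum of D^2 = 14, N = 7)
judge_1 = [5, 6, 4, 3, 2, 7, 1]
judge_2 = [6, 4, 5, 1, 2, 7, 3]
rho = spearman_rho(judge_1, judge_2)
print(rho)  # 0.75, i.e. 1 - (6 * 14) / (7 * 48)
```

The shortcut formula holds only when no ranks are tied; with tied ranks a correction, or Pearson's formula applied to the ranks, is normally used instead.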
4. Introducing Regression
In professional life, you encounter a large number of variables. As a banker, you may deal with variables such as:
• Deposits and loans
• Service fees
• Branch/area/zone profitability
• Population profile of the locality/area/city of bank branch
• Rates and other guidelines from the RBI
• Economic conditions
• Political climate
It makes sense to think that some of these variables are interdependent and that a change in one variable affects the others. Let us try to understand the
cause-and-effect relationship between variables in the banking sector.
1. What happens when a rival bank opens a branch in your vicinity?
2. What happens when a new multi-storied complex is opened for occupation next door?
3. What happens when a new safety vault is installed in your branch?
4. What happens to GDP growth when GST is implemented?
This is a list of possible changes in the environment in which you operate your branch. As a consequence of these changes, you need to analyse the following:
1. Do any of the above changes impact your business?
2. In case of a possible impact, which variable(s) does it affect?
3. Will the impact be positive or negative?
4. What will be the quantum of impact?
5. How can the impact be assessed?
Regression analysis helps answer such questions. It helps you understand and estimate how changes in one variable affect another variable. The
variable causing the change is called the independent variable and the one getting changed is called the dependent variable.
5. Measuring Regression
Regression is usually represented as: Y = a + bX
Where,
• Y = Dependent variable
• a = Intercept
• b = Slope
• X = Independent variable
• Y here stands for the number of deposits and is called the dependent variable because the number of deposits depends on the interest rate.
• The interest rate is the explanatory variable or the independent variable. If the interest rate goes up, more deposits are expected to be opened.
[Scatter plot: Deposit (vertical axis) against Interest rate (horizontal axis)]
The scatter plot shows that deposits increase as interest rates increase. While the scatter plot does give a picture of the
association, we need to express it in numbers.
Fig. 7.13: Regression in a Spreadsheet (line fit plot of Y and Predicted Y against X Variable 1)
The output shows the following table. Note the R square value (marked in yellow). This is one of the first things you need to see when analysing data.
SUMMARY OUTPUT
Regression Statistics
Multiple R          1
R Square            1
Adjusted R Square   1
Standard Error      0
Observations        10
ANOVA
             df   SS     MS     F       Significance F
Regression   1    1320   1320   #NUM!   #NUM!
Residual     8    0      0
Total        9    1320
               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      90             0                65535    #         90          90          90            90
X Variable 1   -4             0                65535    #         -4          -4          -4            -4
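To see where the R Square value comes from, the sketch below recomputes it from the sums of squares shown in the ANOVA table. It is an illustration under assumptions: it uses the tax-rate and deposit figures tabulated later in this unit, takes the intercept (90) and slope (-4) from the output above, and uses the standard definition R² = 1 – SS residual / SS total. The function name is hypothetical.

```python
# Tax rate (X) and deposit amount in million rupees (Y), 2008 to 2017,
# as tabulated later in this unit
tax_rate = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]
deposits = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]

def r_squared(xs, ys, intercept, slope):
    """Return (R square, total SS, residual SS) for the line Y = intercept + slope * X."""
    mean_y = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    ss_total = sum((y - mean_y) ** 2 for y in ys)                   # "Total" row
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # "Residual" row
    return 1 - ss_residual / ss_total, ss_total, ss_residual

r2, ss_total, ss_residual = r_squared(tax_rate, deposits, 90, -4)
print(r2, ss_total, ss_residual)  # 1.0 1320.0 0
```

With a residual sum of squares of zero, every point lies exactly on the line, which is also why the spreadsheet cannot compute F and shows #NUM!.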
Let’s test the fitted equation, Y = 90 – 4X, using the year 2013. The tax rate in 2013 is 10%, therefore Y is:
Y = 90 – 4 × 10
Y = 90 – 40 = 50
Crosschecking with the deposit volume in the year 2013, it is indeed 50. So how can we use this result? If you know that the tax rate will be reduced to 5%
in the next budget, then the deposit volumes are likely to be:
Y = 90 – 4 × 5 = 70, i.e. 70 million rupees
The above example is fictitious, as it is very unlikely that you will get a perfect R square value of 1. In real life, there could also be multiple factors affecting
the prediction. The above analysis with only two variables is called bivariate analysis. You can also do the same analysis with several independent variables, in which
case it is called multiple regression.
There are many other areas where regression analysis is effective for understanding a relationship and using it to forecast the
dependent variable. Many business decisions depend on correctly estimating or forecasting future conditions, and as a banker you will find
multiple uses, such as making sales forecasts. You can even use these forecasts to substantiate your point, but remember that reality can be very different, as
a large number of random events can blunt any prediction. For example, the demonetization period saw a dip in business which no forecast could
have foreseen. The 2008 crisis escaped the detection of the best systems and even the best-known experts in the field. Nevertheless, most
forecasts in the banking sector are relatively stable, for example forecasts of the growth of deposits or of the loan portfolio.
Formula Method
We try to establish a linear relationship between the dependent and independent variables to evaluate the extent to which the dependent variable changes for
every unit change in the independent variable.
$$ Y = a + bX $$
$$ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} $$
$$ a = \bar{Y} - b\bar{X} $$
Let's calculate the regression equation using the formula for the data below.
Year   Effective tax rate for women (X)   Term deposit amount by women, in million rupees (Y)
2008   15                                 30
2009   14                                 34
2010   13                                 38
2011   12                                 42
2012   11                                 46
2013   10                                 50
2014   9                                  54
2015   8                                  58
2016   7                                  62
2017   6                                  66
Year    Effective tax rate for women (X)   Term deposit amount by women, in million rupees (Y)   X^2    XY
2008    15                                 30                                                    225    450
2009    14                                 34                                                    196    476
2010    13                                 38                                                    169    494
2011    12                                 42                                                    144    504
2012    11                                 46                                                    121    506
2013    10                                 50                                                    100    500
2014    9                                  54                                                    81     486
2015    8                                  58                                                    64     464
2016    7                                  62                                                    49     434
2017    6                                  66                                                    36     396
Total   105                                480                                                   1185   4710
If you know that the tax rate will be reduced to 5% in the next budget, then the deposit volumes are likely to be:
Y = 90 – 4 × 5 = 70, i.e. 70 million rupees
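The column totals in the table above are all that the formula method needs. The following is a minimal Python sketch of that calculation, assuming the slope and intercept formulas b = (∑XY – nX̅Y̅) / (∑X² – nX̅²) and a = Y̅ – bX̅. The function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # 4710 for the table above
    sum_x2 = sum(x * x for x in xs)              # 1185 for the table above
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b

tax_rate = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]        # X
deposits = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]    # Y, in million rupees

a, b = fit_line(tax_rate, deposits)
print(a, b)       # 90.0 -4.0, i.e. Y = 90 - 4X
print(a + b * 5)  # 70.0, the predicted deposits at a 5% tax rate
```

With n = 10, X̅ = 10.5 and Y̅ = 48, the slope works out to (4710 – 5040) / (1185 – 1102.5) = –330 / 82.5 = –4 and the intercept to 48 + 4 × 10.5 = 90, matching the spreadsheet output.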
6. Chapter Summary
Here are the key points discussed in this unit.
• Correlation refers to the relationship between two or more variables.
• The variables in a relationship may exhibit various types of correlation such as positively correlated or negatively correlated.
• Two types of data are available – parametric and non-parametric.
• To calculate correlation, either Karl Pearson’s method or Spearman’s Rank correlation method can be used based on the data type.
• Regression is a concept to estimate the changes induced in one variable by another variable. The variable causing the change is
called the independent variable and the one getting changed is called the dependent variable.
• Both correlation and regression can be analyzed visually through scatter diagrams and can also be represented on a spreadsheet.