
Data Analysis for Managers

Unit 7
Correlation and Regression
Unit 7 - Correlation and Regression

Learning Objectives

At the completion of this unit, you will be able to:


• Explain the concept of correlation and types of correlation.
• Use Karl Pearson’s correlation coefficient and Spearman’s rank correlation coefficient to measure the correlation.
• Recognize the need for regression analysis.
• Apply the regression equations to calculate correlation coefficient.

© Copyright 2017, all rights reserved. Manipal Global Education Services Pvt. Ltd. 2

Table of contents
1. Introduction
2. Introducing Correlation
   2.1 Types of Correlation
       2.1.1 Positive Correlation & Negative Correlation
       2.1.2 Simple, Partial, and Multiple Correlation
       2.1.3 Linear and Non-Linear Correlation
3. Measuring Correlation
   3.1 Scatter Diagrams for Correlation
   3.2 Karl Pearson’s Correlation Coefficient
       3.2.1 Calculating Correlation Coefficient with Pearson’s and Spreadsheet
   3.3 Spearman’s Rank Correlation Coefficient
       3.3.1 Calculating Correlation Coefficient with Spearman’s rank correlation
4. Introducing Regression
5. Measuring Regression
   5.1 Scatter Diagrams for Regression
   5.2 Representation in a Spreadsheet
       5.2.1 Deriving the Regression Equation
6. Chapter Summary


1. Introduction
• Is there any relationship between these two stocks?
• Is there a relationship between employee absence and customer dissatisfaction?
These are some of the questions that call for the concepts of correlation and regression.
The utility of these concepts in banking is beyond question.
Once you understand how these concepts work, you will be able to apply them
in almost any banking situation. Why are these two concepts bundled together? They
are interrelated, and an understanding of correlation forms the basis for understanding
regression, including the coefficient of determination.
This unit deals with these two concepts. You will learn how two variables interact with each other
and how to use that knowledge in financial planning. The examples in this unit show
how these concepts can be used in financial planning and other situations.


2. Introducing Correlation
We observe in our everyday lives that an increase in prices contracts the demand for
goods and expands the supply. When prices decrease, supply shrinks and demand
expands. Apart from these, we intuitively feel that there is some relationship
between variables such as fertilisers and yield, production and profit, remuneration and
motivation, and so on. So, it is evident that the study of the relationship between
two variables is necessary in many fields. In 1887, Sir Francis Galton was the
first person to study the relationship between the heights of sons and their fathers
based on collected data.
Let us look at it from another perspective. Addition of 10-11% of chromium makes
steel stainless, thereby increasing the life of a stainless-steel product. It is easier
to measure the amount of chromium than the life of the product. If we can
establish a relationship between the percentage of chromium and product life, we
can predict the product’s lifetime from its chromium content.

Fig. 7.1: Increased Sales Correlates with Festival Season

Correlation refers to the relationship between two or more variables. In other words, when two or more variables move in tune with each other,
they are said to be correlated.
Correlation analysis deals with:
• Measuring the relationship between variables
• Testing the significance of the relationship
• Giving a confidence interval for the population correlation measure

2.1 Types of Correlation


The variables in a relationship may exhibit various types of correlation:
• When the variables move in the same direction, they are said to be positively correlated.
• If the variables move in opposite directions, they are said to be negatively correlated.
• If the variables move without reference to each other or in a disorderly fashion, then there is no correlation between them.
Correlation can be understood from three perspectives:
1. Positive or negative
2. Simple, partial and multiple
3. Linear and non-linear

2.1.1 Positive Correlation & Negative Correlation


In a positive correlation, two variables X and Y vary in the same direction. If variable X increases,
variable Y also increases; when variable X decreases, variable Y also decreases.

Fig. 7.2: Positive Correlation

For example, when the interest on deposits increases, deposits also increase. When the price of a product decreases,
the supply of the product also decreases.

In a negative correlation, the two variables move in opposite directions. For example, when the
interest on loans increases, the demand for loans decreases.

Fig. 7.2: Negative Correlation

2.1.2 Simple, Partial, and Multiple Correlation


When a relationship involves only two variables, it is called a simple correlation. Partial and multiple correlations involve three or more variables. Partial
correlation analyses the relationship between only two of those variables while keeping all the other variables constant.
Multiple correlation assesses how the movement in one variable relates to the combined movement of all the other variables.
A study of the relationship between advertisement expenditure and sales is an example of a simple correlation, as it involves just two variables.
The demand for a product is influenced by various factors: the price of the commodity, customers’ tastes and preferences, the number of customers in the
market, the income of people, changes in the price of related goods, and so on. If a researcher studies the relationship between any two of these variables while keeping
all the other variables constant, it is a partial correlation.
If the researcher studies the influence of all the above-listed factors on demand together, it is a multiple correlation.


2.1.3 Linear and Non-Linear Correlation


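The distinction between these correlations depends on the constancy of the ratio of change between the variables. In a linear correlation, a unit change in one
variable is accompanied by a constant amount of change in the other, so the plotted points lie along a straight line. In a non-linear correlation, this ratio of change varies.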

3. Measuring Correlation
Correlation not only indicates the direction of a relationship between the variables but also helps us understand the strength of the relationship, which is
essential for making rational decisions. Correlation can be represented visually or numerically. In this unit, we shall discuss three simple and
widely used measures of correlation.
• Scatter plot.
• Karl Pearson’s correlation coefficient.
• Spearman’s rank correlation.

3.1 Scatter Diagrams for Correlation


A scatter diagram, which is a diagrammatic representation of the relationship between variables, is also called a correlogram. Visual inspection of the scatter diagram
gives a clear indication of the relationship between two variables. The ordered pairs of observed values are plotted on the xy-plane as dots.
Therefore, it is also known as a dot diagram. A scatter diagram:
• Gives the direction in which the variables are related.
• Does not give any quantitative measure for comparison between sets of data.


A scatter diagram can be visualised as any of the following plots.


1. If the dots lie exactly on a straight line that runs from left bottom to right top, then the variables are said to be perfectly positively correlated.

Fig. 7.3: Perfect Positive Correlation

2. If the dots lie close to a straight line that runs from left bottom to right top, then the variables are said to be positively correlated.

Fig. 7.4: Positive Correlation


3. If the dots lie exactly on a straight line that runs from left top to right bottom, then the variables are said to be perfectly negatively correlated.

Fig. 7.5: Perfect Negative Correlation

4. If the dots lie very close to a straight line that runs from left top to right bottom, then the variables are said to be negatively correlated.

Fig. 7.6: Negative Correlation


5. If the dots are scattered all over the plot without any pattern, then the variables are not correlated.

Fig. 7.7: Non-Correlation

3.2 Karl Pearson’s Correlation Coefficient


We have seen that scatter diagrams give a good idea of the direction and strength of the correlation between two variables. However, a scatter diagram gives only a
visual estimate and not a numerical value for comparison or other purposes.
Karl Pearson’s correlation coefficient is best suited to numerical (quantitative) data. The correlation coefficient, also called the linear
correlation coefficient, is a measure of the strength and direction of a linear relationship between two variables. The linear correlation coefficient is
sometimes referred to as the Pearson product moment correlation coefficient in honour of its developer, Karl Pearson. Incidentally, Karl Pearson was a
protégé of Sir Francis Galton and was the first holder of the Galton chair of eugenics at the University of London. Karl Pearson's Correlation Coefficient is also
called Standardised Covariance.
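The mathematical expression of Karl Pearson's Correlation Coefficient is given as:
\[ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \]
where X and Y are the two variables, and N stands for the number of paired values.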


r may take any value between -1 and +1.


• r = 0 indicates no linear correlation between the two variables.
• r = -1 indicates perfect negative correlation.
• r = +1 indicates perfect positive correlation.
• -1 < r < 0 indicates negative correlation.
• 0 < r < 1 indicates positive correlation.
The mathematical expression of Karl Pearson's Correlation Coefficient may take several other forms, all giving the same result. These forms are given
below:

1. \[ r = \frac{\sum xy}{N \sigma_x \sigma_y} \]
Where,
x = X – X̅
y = Y – Y̅

N is the number of paired observations. Please note that ∑xy/N is called the covariance of x and y.
σ (sigma) stands for the standard deviation.
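The covariance form can be rewritten purely in terms of the deviations x and y. The short derivation below is a sketch that assumes σ is the population standard deviation (divisor N), which is consistent with the covariance being defined as ∑xy/N:

```latex
\begin{align*}
r &= \frac{\tfrac{1}{N}\sum xy}{\sigma_x \, \sigma_y}
   = \frac{\tfrac{1}{N}\sum xy}{\sqrt{\tfrac{1}{N}\sum x^2}\,\sqrt{\tfrac{1}{N}\sum y^2}}
   = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}},
\qquad x = X - \bar{X},\; y = Y - \bar{Y}.
\end{align*}
```

The factor 1/N cancels between numerator and denominator, which is why the different forms give the same value of r.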


We choose one of these three equations depending upon the available information.
2.

3.2.1 Calculating Correlation Coefficient with Pearson’s and Spreadsheet


Let us consider an example with two variables - interest rate and credit volume.
Problem: Find the correlation coefficient between the interest rate and the volume of credit for the last five years.
Interest rate                    9    6    8    8    6
Volume of Credit (in Crores)    10   16   12    8   14

Solution 1 – Using Karl Pearson's Correlation Coefficient:


           X     Y    X²    Y²    XY
           9    10    81   100    90
           6    16    36   256    96
           8    12    64   144    96
           8     8    64    64    64
           6    14    36   196    84
Total     37    60   281   760   430

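We use the formula:
\[ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}
     = \frac{5(430) - (37)(60)}{\sqrt{\left[5(281) - 37^2\right]\left[5(760) - 60^2\right]}}
     = \frac{-70}{\sqrt{36 \times 200}} \approx -0.82496 \]

Thus, the correlation coefficient by Karl Pearson’s method is -0.82496.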


Here r is negative i.e. the variables move in opposite directions. The value is near -1. This indicates that there is a strong reason to believe that as the
credit volume goes up, interest rate comes down and vice versa.
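The manual computation can be cross-checked with a few lines of code. The following is a minimal sketch, not part of the original unit; the function name `pearson_r` and the variable names are illustrative assumptions, and the data is the interest rate and credit volume table from the problem above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums (N, sum X, sum Y, sum X^2, sum Y^2, sum XY)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from the worked example: interest rate (X) and volume of credit (Y).
interest_rate = [9, 6, 8, 8, 6]
credit_volume = [10, 16, 12, 8, 14]

r = pearson_r(interest_rate, credit_volume)
print(round(r, 5))  # -0.82496
```

The intermediate sums inside the function are the same column totals as in the table (37, 60, 281, 760, 430), so each step can be compared with the hand calculation.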


Solution 2 – Using popular spreadsheet software:

   X     Y
   9    10
   6    16
   8    12
   8     8
   6    14

1. Put the data in an Excel worksheet.
2. Select Data → Data Analysis.
3. Select Correlation and click the OK button.

Fig. 7.8: Selecting the Correlation Tool


4. Highlight the data (A1:B6) including column headings as Input Range.


5. Select Columns for Grouped By.
6. Select F9 as Output Range.
7. Click the OK button.

Fig. 7.9: Selecting the Output Range under Correlation


The computed correlation coefficient is -0.82496.

Fig. 7.10: Computed Value in a Spreadsheet

It is the same value obtained by manual computation. From now on, you may wish to use the
tool directly!


3.3 Spearman’s Rank Correlation Coefficient


Recall the concepts discussed in Unit 1, where we distinguished quantitative data from qualitative data. Data on which arithmetic operations such as the mean and
standard deviation are meaningful are classified as numerical data. Data on which such operations are not meaningful are classified as
qualitative (categorical) data. Spearman’s rank correlation is best suited to ordinal data, that is, data that can be ranked.
Numerical data must first be converted to ranks (an ordinal scale) before Spearman’s rank correlation can be applied.
Although not as powerful as Pearson’s coefficient, Spearman’s rank correlation still provides useful insight into the relationship between two variables.
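Converting numerical data to ranks can be sketched as follows. This is an illustrative helper, not something defined in the unit: the name `average_ranks`, the `highest_first` option, and the choice to give tied values the average of the positions they share are all assumptions (averaging ties is a common convention). The sample data is the interest rate series used in the Pearson example of this unit.

```python
def average_ranks(values, highest_first=True):
    """Return ranks (1 = first place); tied values share the mean of their positions."""
    # Indices of the values in ranking order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


interest_rate = [9, 6, 8, 8, 6]
ranks = average_ranks(interest_rate)
print(ranks)  # [1.0, 4.5, 2.5, 2.5, 4.5]
```

When ties are present, the simple difference-of-ranks formula is only an approximation, so results on tied data should be read with some caution.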
Spearman’s rank correlation coefficient is defined as:
\[ \rho = 1 - \frac{6\sum D^2}{N\left(N^2 - 1\right)} \]
Where D is the difference between the ranks assigned to each pair of observations, and N stands for the number of paired observations. We will shortly look at how D is computed,
through an example. The value of ρ (Spearman’s rho, sometimes also written as r) lies between –1 and +1, and its interpretation is the same as that of Karl Pearson’s
correlation coefficient. Karl Pearson’s r and Spearman’s ρ will be nearly equal when the distribution is normal.
Spearman’s rank correlation coefficient is a non-parametric measure:
a. It does not require the variables to follow a normal distribution.
b. By contrast, Pearson’s coefficient assumes that the variables under study are affected by a large number of independent causes so as to form a normal distribution.
When we do not know the shape of the population distribution, or when the data is qualitative (ranked), Spearman’s rank correlation coefficient is used to measure the
relationship. Unlike Pearson’s product moment correlation coefficient, the Spearman rank correlation coefficient does not assume:

• The nature of the frequency distribution of the variables.

• That the relationship between the variables is linear.

So, if there is any doubt about whether either of the two variables has a normal distribution, you should use the Spearman rank correlation and not the
Pearson correlation. Also, in the case of a non-linear relationship, the Spearman correlation may be a better indicator than the Pearson correlation.

Practical guidelines for Interpreting Spearman’s ρ:

• For values of ρ of 0.9 to 1, the correlation is very strong.

• For values of ρ between 0.7 and 0.89, correlation is strong.

• For values of ρ between 0.5 and 0.69, correlation is moderate.

• For values of ρ between 0.3 and 0.49, correlation is moderate to low.

• For values of ρ between 0.16 and 0.29, correlation is weak to low.

• For values of ρ below 0.16, correlation is too low to be meaningful.

As with Pearson’s r, a negative value of ρ indicates a negative relationship. A statistically significant ρ implies that it reflects a true correlation in the
population rather than a chance result.
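The guideline bands can be written as a small lookup. This is only a sketch: the function name `describe_rho` is hypothetical, and because the bands in the text leave small gaps (for example between 0.89 and 0.9), the code assumes that the lower bound of each band acts as the cut-off.

```python
def describe_rho(rho):
    """Label the strength and direction of Spearman's rho using the guideline bands."""
    size = abs(rho)  # strength is judged on the magnitude; the sign gives the direction
    if size >= 0.9:
        strength = "very strong"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "moderate to low"
    elif size >= 0.16:
        strength = "weak to low"
    else:
        return "too low to be meaningful"
    direction = "positive" if rho > 0 else "negative"
    return f"{strength} {direction}"


print(describe_rho(0.75))   # strong positive
print(describe_rho(-0.95))  # very strong negative
```

Such labels are rules of thumb; whether a given ρ is statistically significant also depends on the number of observations.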


3.3.1 Calculating Correlation Coefficient with Spearman’s rank correlation


Let us consider an example with ranked data to apply Spearman’s rank correlation.
Problem:
In a singing competition, two judges assigned the following ranks for 7 candidates.
Competitor    1    2    3    4    5    6    7
Judge I       5    6    4    3    2    7    1
Judge II      6    4    5    1    2    7    3

Find Spearman’s rank correlation coefficient.


Solution:
Competitor    R1 (Judge 1)    R2 (Judge 2)    D = R1 – R2    D²
1             5               6               -1             1
2             6               4                2             4
3             4               5               -1             1
4             3               1                2             4
5             2               2                0             0
6             7               7                0             0
7             1               3               -2             4
Total                                                        14

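With ∑D² = 14 and N = 7, Spearman’s formula gives:
\[ \rho = 1 - \frac{6\sum D^2}{N\left(N^2 - 1\right)} = 1 - \frac{6 \times 14}{7 \times 48} = 1 - 0.25 = 0.75 \]
Since the result is positive and the value is in the strong range, we may conclude that the ranks given by the two
judges are positively correlated.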
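The same calculation can be sketched in code. The function name `spearman_rho` is an illustrative assumption; it applies the difference-of-ranks formula directly and therefore expects ranks without ties, as in the judges' data.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho from two lists of ranks: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))  # sum of squared rank differences
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Ranks assigned by the two judges to the seven competitors.
judge_1 = [5, 6, 4, 3, 2, 7, 1]
judge_2 = [6, 4, 5, 1, 2, 7, 3]

rho = spearman_rho(judge_1, judge_2)
print(rho)  # 0.75
```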

4. Introducing Regression
In professional life, you encounter a large number of variables. As a banker, you may deal with variables such as:
• Deposits and loans
• Service fees
• Branch/area/zone profitability
• Population profile of the locality/area/city of bank branch
• Rates and other guidelines from the RBI
• Economic conditions
• Political climate


It makes sense to think that some of these variables are inter-dependent and change in a variable affects others. Let us try to understand the cause-and-
effect relationship between variables of the banking sector.
1. What happens when a rival bank opens a branch in your vicinity?
2. What happens when a new multi-storied complex is opened for occupation next door?
3. What happens when a new safety vault is installed in your branch?
4. What happens to GDP growth when GST is implemented?

This is a list of possible changes in the environment in which you operate your branch. As a consequence of these changes, you need to analyse the following:
1. Does any of the above changes impact your business?
2. In case of a possible impact, which variable(s) does it affect?
3. Will the impact be positive or negative?
4. What will be the quantum of impact?
5. How can the impact be assessed?
Regression analysis provides answers to such questions. It helps you understand and estimate how changes in one variable affect another variable. The
variable causing the change is called the independent variable and the one getting changed is called the dependent variable.


5. Measuring Regression
Regression is usually represented as: Y = a + bx
Where,
• Y = Dependent variable
• a = Intercept
• b = Slope
• x = Independent variable
• For example, Y may stand for the number of deposits. It is called the dependent variable because the number of deposits depends on the interest rate.
• The interest rate is then the explanatory variable or the independent variable x. If the interest rate goes up, more deposits will be opened.

5.1 Scatter Diagrams for Regression


Similar to correlation, regression can also be measured using various methods. As explained earlier, the scatter diagram helps to
analyse regression visually. Recall the previous example of an increasing number of deposits against increased interest rates. To
verify whether the number of new deposits is dependent on the interest rate with the scatter diagram, follow these steps:
1. Retrieve data from the branch records and fill in the number of deposits opened with each change in the interest rate.

Interest rate    Deposit volume (in millions)
4                400
5                550
6                720
7                900
8                1060
9                1210
10               1380

2. Put it on a scatter plot to identify trends.


Fig. 7.10: Regression Trend (scatter plot of deposit volume, in millions, against interest rate)

The scatter plot shows that deposits increase as interest rates increase. While the scatter plot does give a picture of the
association, we need to express it in numbers.
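One way to put the association into numbers is to fit a straight line Y = a + bx through the plotted points by least squares. The sketch below is an illustration rather than the unit's own procedure; the function name `fit_line` is an assumption, and the data is the interest rate and deposit volume table from step 1.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


interest_rate = [4, 5, 6, 7, 8, 9, 10]
deposit_volume = [400, 550, 720, 900, 1060, 1210, 1380]  # in millions

a, b = fit_line(interest_rate, deposit_volume)
print(round(a, 2), round(b, 2))  # roughly -261.43 and 164.29
```

Under these figures, the slope suggests that each one-point rise in the interest rate goes with roughly 164 million in additional deposit volume over the observed range of 4 to 10.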


5.2 Representation in a Spreadsheet


Using a spreadsheet, a numerical value can be obtained for regression. To carry out the regression
analysis in an Excel sheet, consider an imaginary situation. You might have noticed in the past that a lowering
of income tax rates increases the amount of deposits. However, you may not have learnt how to quantify
this increase.
You may not be sure whether the deposit amounts increased every time income tax
rates were reduced. So, you wish to understand whether there is a sufficient cause-and-effect relationship between these two
variables. Once you understand this relationship, you may be able to forecast the amount of new term
deposits.
You have probably understood by now that the income tax rate is the
independent variable x and the term deposit amount is the dependent
variable y. Let’s try to find out whether the change in effective tax
rate has an impact on the volume of term deposits.
Consider the following hypothetical data:

Year    Effective Tax rate for women    Term Deposit amount by women (in million rupees)
2008    15                              30
2009    14                              34
2010    13                              38
2011    12                              42
2012    11                              46
2013    10                              50
2014    9                               54
2015    8                               58
2016    7                               62
2017    6                               66


To use the regression function in Excel:


1. Go to File → Options → Add-Ins. You will get the following dialogue box.

Fig. 7.12: Add-Ins Window


2. Select Analysis ToolPak and click the Go button.


3. Check Analysis ToolPak in the list that appears and click the OK button.
4. Check whether the same is enabled. It will be visible in the Data menu as Data Analysis.
5. Select the relevant data and click the Data Analysis option.
6. Select the Regression function from the Data Analysis list that appears.
7. Select the independent variable array (tax rate) in the ‘x’ range box and the dependent variable array (deposit volume) in the ‘y’ range box.
Keep the other options in the default mode.
8. Check the Line Fit Plots option. The line fit plot shows the observed and predicted values of Y against the explanatory variable.
The output will be shown as below:

Fig. 7.13: Regression in a Spreadsheet (X Variable 1 Line Fit Plot: Y and Predicted Y plotted against X Variable 1)


The output also includes the following table. Note the R Square value. This is one of the first things to check when analysing the data.
SUMMARY OUTPUT

Regression Statistics
Multiple R           1
R Square             1
Adjusted R Square    1
Standard Error       0
Observations         10

ANOVA
              df    SS      MS      F        Significance F
Regression    1     1320    1320    #NUM!    #NUM!
Residual      8     0       0
Total         9     1320

                Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       90              0                 65535     #          90           90           90             90
X Variable 1    -4              0                 65535     #          -4           -4           -4             -4

The R square value:


• Indicates the degree of closeness of the data to the regression line.
• Shows goodness of fit.
• Is known as the coefficient of determination.
• Is always between 0 and 1.
• Is the square of the correlation between x and y.
R squared in simple terms refers to the degree of predictability. The following are various interpretations of the R square value:
• An R square of 0.80 means that 80% of the variance of the dependent variable can be predicted from the independent variable.
• An R square of 1 means that the movement of the dependent variable (y) can be 100% predicted from the movement of the independent variable (x).
• An R square of 0 means that the independent variable explains none of the variance of the dependent variable, that is, there is no linear relationship.
From the solution, we can infer that the movement in deposit volumes is fully explained by the movement of tax rates, as the R square is 1, which shows a perfect
fit.
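The R square reported by the spreadsheet can be reproduced as the square of the correlation between x and y. The following is a minimal sketch under that definition; the function name `r_squared` is an assumption, and the data is the hypothetical tax rate and term deposit table used for the spreadsheet example.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression (r squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares (1320 for this data)
    return s_xy ** 2 / (s_xx * s_yy)


tax_rate = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]
deposits = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]  # in million rupees

r2 = r_squared(tax_rate, deposits)
print(round(r2, 6))  # 1.0
```

Note that `s_yy` equals the Total SS of 1320 shown in the ANOVA table, which is a quick way to confirm that the same data is being analysed.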
How will this help you?
You can derive the regression equation, which helps to make forecasts or predictions. Understanding the regression tool enables you to take decisions on
firmer foundations.


5.2.1 Deriving the Regression Equation


In a regression equation, we can predict the value of variable ‘Y’ that is dependent on the value of an independent variable ‘x’. Recalling the simple linear
function of regression:
Y = a + bx
Using this equation, we try to predict Y based on x where ‘a’ is the intercept and ‘b’ is the slope. In the above deposit problem, we inferred R squared as
100%, which indicates a good fit. How do we now derive the simple linear equation from the output?
In the regression output given above the intercept is 90, which is mentioned under coefficient in the intercept row. The slope is -4. Substituting these values
in the regression equation:
Y = 90 – 4 (x) which means
Deposit volume (Y) = 90 – 4 times the tax rate (x)

Let’s test it by considering the tax rate for the year 2013. The tax rate in 2013 is 10, therefore Y is:
Y = 90 – 4 × 10
Y = 90 – 40 = 50
Crosschecking with the deposit volume in the year 2013, it is indeed 50. So how can we use this result? If you know that the tax rate will be reduced to 5%
in the next budget, then the deposit volumes are likely to be:


Y = 90 – 4 × 5 = 70, that is, 70 million rupees
The above example is fictitious, as it is very unlikely that you will get a perfect 1 as R square value. In real life, there could also be multiple factors affecting
the prediction. The above analysis with only two variables is called bivariate analysis. You can also do the same analysis with multiple variables in which
case it is called multiple regression.
There are many other areas where regression analysis proves effective for understanding a relationship and using it to forecast the
dependent variable. Many business decisions depend on correctly estimating or forecasting future conditions, and as a banker, you will find
multiple uses such as making sales forecasts. You can even use these forecasts to substantiate your point, but remember that reality can be very different, as
a large number of random events can blunt any prediction. For example, the demonetization period saw a dip in business which no forecast could
have foreseen. The 2008 crisis escaped the detection of the best systems and even the best-known experts in the field. Nevertheless, most
forecasts in the banking sector are relatively stable, for example, forecasts related to the growth of deposits or growth in the loan portfolio.

Formula Method
We try to establish a linear relationship between dependent and independent variables to evaluate, to what extent the dependent variable changes for
every unit change in the independent variable.
Y = a + bx

\[ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} \]

\[ a = \bar{Y} - b\bar{X} \]

Let's calculate the regression equation using the formula for the data below.

Year     Effective Tax rate    Term Deposit amount by women    X^2     XY
         for women (X)         in million rupees (Y)
2008     15                    30                              225     450
2009     14                    34                              196     476
2010     13                    38                              169     494
2011     12                    42                              144     504
2012     11                    46                              121     506
2013     10                    50                              100     500
2014     9                     54                              81      486
2015     8                     58                              64      464
2016     7                     62                              49      434
2017     6                     66                              36      396
Total    105                   480                             1185    4710

If you know that the tax rate will be reduced to 5% in the next budget, then the deposit volumes are likely to be:
Y = 90 – 4 × 5 = 70 → 70 million
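The formula method can be traced step by step in code using the column totals from the table. This is a sketch for checking the arithmetic; the variable names and the helper `predict` are illustrative assumptions.

```python
tax_rate = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]       # X
deposits = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66]   # Y, in million rupees

n = len(tax_rate)                                      # 10
sum_x = sum(tax_rate)                                  # 105
sum_y = sum(deposits)                                  # 480
sum_x2 = sum(x * x for x in tax_rate)                  # 1185
sum_xy = sum(x * y for x, y in zip(tax_rate, deposits))  # 4710

mean_x = sum_x / n                                     # 10.5
mean_y = sum_y / n                                     # 48.0

# b = (sum XY - n * mean X * mean Y) / (sum X^2 - n * mean X^2) = -330 / 82.5
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
# a = mean Y - b * mean X = 48 + 4 * 10.5
a = mean_y - b * mean_x


def predict(x):
    """Predicted term deposit amount (million rupees) for tax rate x."""
    return a + b * x


print(a, b, predict(5))  # 90.0 -4.0 70.0
```

The slope and intercept agree with the coefficients in the spreadsheet output, and the prediction for a 5% tax rate matches the 70 million figure above.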


6. Chapter Summary
Here are the key points discussed in this unit.
• Correlation refers to the relationship between two or more variables.
• The variables in a relationship may exhibit various types of correlation such as positively correlated or negatively correlated.
• Two types of data are distinguished: quantitative (numerical) and qualitative (categorical or ordinal).
• To calculate correlation, either Karl Pearson’s method or Spearman’s Rank correlation method can be used based on the data type.
• Regression is a concept to estimate the changes induced in one variable by another variable. The variable causing the change is
called the independent variable and the one getting changed is called the dependent variable.
• Both correlation and regression can be analyzed visually through scatter diagrams and can also be represented on a spreadsheet.
