
CHAPTER - 8

Correlation

BBS – 1st Year

Business Statistics

Learning Objectives
Upon completion of this chapter, students should be able to:
➢ Understand the meaning of correlation
➢ Identify the types of correlation
➢ Explain the uses and significance of correlation
➢ Apply the methods of studying correlation
➢ Compute correlation for a bivariate frequency distribution
➢ Compute the probable error of the correlation coefficient
➢ Compute rank correlation
Business Statistics, BBS 1st Year Correlation

1. Introduction to Correlation
In the previous chapters, we discussed various descriptive statistical measures used to summarize a
mass of data. However, those measures were limited to a single set of data. In this chapter, we study
the relationship between two sets of data. The method of measuring the relationship between two or
more variables is known as correlation. In other words, correlation is a statistical device designed to
measure the degree of association between two or more variables.

2. Types of Correlation
The following six types of correlation can be observed between variables:
2.1 Positive Correlation
The relationship is said to be positive when two variables move in the same direction, i.e. when one
variable increases the other also increases, and when one decreases the other also decreases.

2.2 Negative Correlation


The relationship is said to be negative when two variables move in opposite directions, i.e. an
increase in one variable causes a decrease in the other.

2.3 Linear Correlation


The correlation is said to be linear if the amount of change in one variable bears a constant ratio to
the amount of change in the other, i.e. the ratio of the corresponding changes is always the same.
When plotted on a graph, such data lie on a straight line.

2.4 Non-Linear Correlation


The correlation is said to be non-linear (curvilinear) if the amount of change in one variable does not
bear a constant ratio to the amount of change in the other, i.e. the ratio of the corresponding changes
varies. When plotted on a graph, such data lie on a curve.
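As an illustrative sketch (the data and the helper names `change_ratios` and `is_linear` are hypothetical, not from the text), the linear/non-linear distinction can be checked by comparing the ratio of successive changes ΔY/ΔX: it stays constant for a straight-line relationship and varies for a curved one.

```python
def change_ratios(xs, ys):
    """Return ΔY/ΔX for each pair of successive observations (hypothetical helper)."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


def is_linear(xs, ys, tol=1e-9):
    """Treat the relationship as linear if every ΔY/ΔX equals the first one."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


x = [1, 2, 3, 4, 5]
y_linear = [2 * v + 3 for v in x]   # 5, 7, 9, 11, 13
y_curved = [v ** 2 for v in x]      # 1, 4, 9, 16, 25

print(change_ratios(x, y_linear))   # [2.0, 2.0, 2.0, 2.0]
print(change_ratios(x, y_curved))   # [3.0, 5.0, 7.0, 9.0]
print(is_linear(x, y_linear), is_linear(x, y_curved))  # True False
```

Note that for Y = 2X + 3 the ratio Y/X itself is not constant (5, 3.5, 3, ...), which is why the changes, rather than the raw values, are compared.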
2.5 Partial Correlation
It is the degree of association between the dependent variable and one particular independent
variable out of several independent variables, while the effect of the other independent variables is
kept constant.

2.6 Multiple Correlation


When the relationship among three or more variables is studied simultaneously, it is called multiple
correlation. In fact, it is the relationship between the dependent variable and all the independent
variables taken together.


3. Use and Significance of Correlation


➢ It is used in both the physical and the social sciences,
➢ It is useful for measuring the relationship between variables such as price, quantity, cost,
sales and profit in economics and business studies,
➢ It helps in measuring the degree of relationship between variables such as income
and expenditure, price and supply, supply and demand, advertisement expenses and sales,
etc.,
➢ It serves as the basis for understanding regression analysis.

4. Methods of Studying Correlation


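The correlation between two variables can be studied using the following three methods: the scatter
diagram, Karl Pearson's correlation coefficient and Spearman's rank correlation coefficient.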

4.1 Graphical Method or Scatter Diagram Method


In this method, the values of one variable are taken along the X-axis and the values of the other
variable along the Y-axis. Each pair of X and Y values is plotted on the graph as a dot; the resulting
diagram is called a scatter diagram. The correlation between the variables is identified by observing
the pattern (trend) of the dots. Some examples are:

[Figure: scatter diagrams illustrating perfectly positive, perfectly negative, relatively positive, relatively negative and no correlation]

Advantages/Merits
➢ It is easy to draw,
➢ It is the simplest and most attractive method of identifying correlation,
➢ It is easy to understand and interpret,
➢ It is not unduly affected by extreme values.
Demerits/Disadvantages
➢ It cannot measure the precise extent of correlation,
➢ It shows only the direction and a rough idea of the degree of correlation; the degree cannot be
expressed as a number.
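Scatter diagrams are normally drawn on graph paper or with a charting library. As a dependency-free sketch, the hypothetical helper `ascii_scatter` below draws a rough text scatter diagram so that the pattern of the dots can be inspected; the advertisement and sales figures are made up for illustration.

```python
def ascii_scatter(xs, ys, width=20, height=10):
    """Return a crude text scatter diagram: each (X, Y) pair becomes a '*'."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto the grid; guard against a zero range.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1)) if x_hi > x_lo else 0
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1)) if y_hi > y_lo else 0
        grid[height - 1 - row][col] = "*"  # the top text line holds the largest Y
    return "\n".join("".join(line) for line in grid)


# Hypothetical data: advertisement expenses (X) and sales (Y).
adv = [2, 3, 5, 6, 8, 9]
sales = [10, 14, 17, 22, 25, 30]
print(ascii_scatter(adv, sales, width=15, height=8))
```

Dots rising from the lower left to the upper right suggest positive correlation; dots falling from the upper left to the lower right suggest negative correlation.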


4.2 Karl Pearson's Correlation Coefficient


Karl Pearson's correlation coefficient measures the degree of association between two variables only
to the extent that the association is linear. If X and Y are two variables, the correlation between X
and Y is denoted by $r_{xy}$ (or simply r). Karl Pearson's correlation coefficient is also called the
product moment correlation coefficient, the simple correlation coefficient, or simply the correlation.
It can be calculated using any of the following formulae:
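$$r = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}, \qquad \text{where } \operatorname{Cov}(X,Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n}$$
and $\sigma_X$, $\sigma_Y$ are the standard deviations of X and Y.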

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2}\,\sqrt{\sum (Y-\bar{Y})^2}}$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}}, \qquad \text{where } x = X-\bar{X},\; y = Y-\bar{Y}$$

$$r = \frac{\sum xy}{n\,\sigma_X\,\sigma_Y}$$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$
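These forms are algebraically equivalent. The sketch below, using hypothetical data and function names, computes r three ways (deviation form, covariance form with divisor n, and the direct form from actual values) so that the agreement can be checked.

```python
from math import sqrt


def pearson_deviation(xs, ys):
    """r = Σxy / (√Σx² · √Σy²), with x = X − X̄ and y = Y − Ȳ."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / (sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy)))


def pearson_covariance(xs, ys):
    """r = Cov(X, Y) / (σX · σY), using divisor n as in the definition of Cov."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def pearson_raw(xs, ys):
    """r = (nΣXY − ΣX·ΣY) / (√(nΣX² − (ΣX)²) · √(nΣY² − (ΣY)²))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sx2 - sx ** 2) * sqrt(n * sy2 - sy ** 2))


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
for f in (pearson_deviation, pearson_covariance, pearson_raw):
    print(f.__name__, round(f(X, Y), 4))   # each prints 0.7746
```

The direct form avoids computing means first, which is convenient when X̄ and Ȳ are not whole numbers.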

From Assumed Mean (by changing origin)

$$r = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}, \qquad \text{where } u = X - A,\; v = Y - B$$

A = Assumed mean of X
B = Assumed mean of Y
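By changing origin and scale (step deviation)

$$r = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}, \qquad \text{where } u = \frac{X-A}{h},\; v = \frac{Y-B}{k}$$

h = class interval (common factor) of X
k = class interval (common factor) of Y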
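A minimal sketch of the step-deviation formula, assuming made-up data and a hypothetical function name `r_step_deviation`. Calling it with A = B = 0 and h = k = 1 reduces it to the direct formula, so the two calls should agree.

```python
from math import sqrt


def r_step_deviation(xs, ys, a, b, h, k):
    """Pearson's r via u = (X − A)/h and v = (Y − B)/k.

    A, B are assumed means and h, k are common factors (e.g. class intervals)
    chosen by the user; convenient choices simply keep u and v small.
    """
    u = [(x - a) / h for x in xs]
    v = [(y - b) / k for y in ys]
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    su2 = sum(p * p for p in u)
    sv2 = sum(q * q for q in v)
    return (n * suv - su * sv) / (sqrt(n * su2 - su ** 2) * sqrt(n * sv2 - sv ** 2))


# Hypothetical data
X = [10, 20, 30, 40, 50]
Y = [100, 140, 150, 180, 230]

r_plain = r_step_deviation(X, Y, a=0, b=0, h=1, k=1)      # no change of origin or scale
r_coded = r_step_deviation(X, Y, a=30, b=150, h=10, k=10)  # A = 30, B = 150, h = k = 10
print(round(r_plain, 4), round(r_coded, 4))
```

With A = 30, B = 150 and h = k = 10 the coded values are u = −2, −1, 0, 1, 2 and v = −5, −1, 0, 3, 8, which are much easier to handle by hand than the original figures.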
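For a Bivariate Frequency Distribution

$$r = \frac{n\sum fXY - \sum fX \sum fY}{\sqrt{n\sum fX^2 - (\sum fX)^2}\,\sqrt{n\sum fY^2 - (\sum fY)^2}}$$

$$r = \frac{n\sum fuv - \sum fu \sum fv}{\sqrt{n\sum fu^2 - (\sum fu)^2}\,\sqrt{n\sum fv^2 - (\sum fv)^2}}$$

where f is the frequency of each cell, n = Σf is the total frequency, X and Y are the class mid-values,
and u, v are the deviations defined above.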
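A minimal sketch of the first bivariate formula, assuming a table whose rows are Y values and whose columns are X values (this layout, the function name `r_bivariate` and the frequencies are all hypothetical). Each cell frequency f weights the products fX, fY, fXY, fX² and fY².

```python
from math import sqrt


def r_bivariate(x_values, y_values, freq):
    """Pearson's r for a bivariate frequency table.

    x_values: mid-values (or values) of X, one per column
    y_values: mid-values (or values) of Y, one per row
    freq[i][j]: number of observations with Y = y_values[i] and X = x_values[j]
    """
    n = sfx = sfy = sfxy = sfx2 = sfy2 = 0
    for i, y in enumerate(y_values):
        for j, x in enumerate(x_values):
            f = freq[i][j]
            n += f
            sfx += f * x
            sfy += f * y
            sfxy += f * x * y
            sfx2 += f * x * x
            sfy2 += f * y * y
    numerator = n * sfxy - sfx * sfy
    denominator = sqrt(n * sfx2 - sfx ** 2) * sqrt(n * sfy2 - sfy ** 2)
    return numerator / denominator


# Hypothetical 2 x 3 table: rows are Y = 1, 2; columns are X = 1, 2, 3.
x_mid = [1, 2, 3]
y_mid = [1, 2]
table = [
    [2, 1, 0],   # Y = 1
    [0, 1, 2],   # Y = 2
]
print(round(r_bivariate(x_mid, y_mid, table), 4))  # 0.8165
```

Here n = 6, ΣfX = 12, ΣfY = 9, ΣfXY = 20, ΣfX² = 28 and ΣfY² = 15, giving r = 12 / (√24 · √9) ≈ 0.8165.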


Interpretation

[Figure: scale of r from -1 (perfectly negative) through 0 to +1 (perfectly positive); -1 to -0.5 highly negative, -0.5 to 0 lower negative, 0 to +0.5 lower positive, +0.5 to +1 highly positive]

➢ If r = +1, Perfectly Positive Correlation
➢ If r = -1, Perfectly Negative Correlation
➢ If r > 0, Positive Correlation
➢ If r < 0, Negative Correlation
➢ If r = 0, No (linear) Correlation
➢ If 0.5 ≤ r < 1, High Degree of Positive Correlation
➢ If 0 < r < 0.5, Low Degree of Positive Correlation
➢ If -1 < r ≤ -0.5, High Degree of Negative Correlation
➢ If -0.5 < r < 0, Low Degree of Negative Correlation
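The interpretation rules can be written as a small lookup. This is a sketch: the function name `describe_r` is hypothetical, and treating exactly ±0.5 as "high" is an assumption about the boundary, since the 0.5 cut-off is a rule of thumb rather than a fixed standard.

```python
def describe_r(r):
    """Verbal description of r using a 0.5 cut-off (assumption: |r| = 0.5 counts as high)."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 1:
        return "perfectly positive correlation"
    if r == -1:
        return "perfectly negative correlation"
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    degree = "high" if abs(r) >= 0.5 else "low"
    return f"{degree} degree of {direction} correlation"


for value in (1, 0.82, 0.3, 0, -0.2, -0.75, -1):
    print(value, describe_r(value))
```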
Properties of Karl Pearson's Correlation Coefficient
➢ The value of the correlation coefficient always lies between -1 and +1, i.e. -1 ≤ r ≤ +1.
➢ The correlation coefficient is the geometric mean of the two regression coefficients, i.e.
$$r = \pm\sqrt{b_{XY}\cdot b_{YX}}$$
where $b_{XY}$ = regression coefficient of the regression line of X on Y,
$b_{YX}$ = regression coefficient of the regression line of Y on X.
The sign of r is the same as the common sign of the two regression coefficients.
➢ The correlation coefficient is independent of change of origin and scale (provided the scale
factors have the same sign).
➢ Two independent variables are uncorrelated, but uncorrelated variables need not be independent.
➢ The correlation coefficient is a relative measure, i.e. a pure number free from the units of
measurement.
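A quick numerical check of two of these properties, using made-up data and hypothetical helper names: r equals the geometric mean of b_YX = Σxy/Σx² and b_XY = Σxy/Σy² (x, y being deviations from the means), and a variable completely determined by X (Y = X² on a range symmetric about zero) can still have r = 0.

```python
from math import sqrt


def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson_r(xs, ys):
    dx, dy = deviations(xs), deviations(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def regression_coefficients(xs, ys):
    """b_YX = Σxy/Σx² (Y on X) and b_XY = Σxy/Σy² (X on Y)."""
    dx, dy = deviations(xs), deviations(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sum(a * a for a in dx), sxy / sum(b * b for b in dy)


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(X, Y)   # 0.6 and 1.0
r = pearson_r(X, Y)
gm = sqrt(b_yx * b_xy)
print(round(r, 4), round(gm, 4))   # 0.7746 0.7746 (both coefficients are positive, so r > 0)

# Dependent but uncorrelated: Y2 is completely determined by X2, yet r = 0.
X2 = [-2, -1, 0, 1, 2]
Y2 = [x * x for x in X2]
print(pearson_r(X2, Y2))   # 0.0
```

The second case is a reminder that r measures only linear association: the relationship Y = X² is exact but not linear.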
Merits / Advantages
➢ It is one of the most popular methods of measuring correlation,
➢ It is a simple mathematical technique,
➢ It measures the co-variation between two variables,
➢ It quantifies the relationship, which makes the study easier,
➢ It also gives the direction of correlation, i.e. positive or negative.


Demerits / Limitations
➢ It assumes a linear relationship between the variables,
➢ Its value is unduly affected by extreme items,
➢ It involves comparatively lengthy calculations to determine the value of correlation,
➢ Existence of correlation does not necessarily indicate cause and effect relationship between
two variables.
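The effect of extreme items can be seen with a small made-up data set (the data and the helper `pearson_r` are hypothetical illustrations): a single extreme pair turns a perfect positive correlation into a negative one.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]                        # Y = 2X exactly
print(round(pearson_r(X, Y), 3))            # 1.0

# Add a single extreme pair (hypothetical, e.g. a data-entry error).
X_out = X + [6]
Y_out = Y + [-20]
print(round(pearson_r(X_out, Y_out), 3))    # about -0.438
```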

4.3 Probable Error of Correlation Coefficient


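Probable error is a method of testing the reliability of a calculated correlation coefficient. It is
denoted by P.E.(r) and is computed as follows:

$$\text{P.E.}(r) = 0.6745 \times \text{S.E.}(r), \qquad \text{where } \text{S.E.}(r) = \frac{1-r^2}{\sqrt{n}}$$

$$\text{Therefore, } \text{P.E.}(r) = 0.6745 \times \frac{1-r^2}{\sqrt{n}}$$

where n is the number of pairs of observations.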

Note: Using the sample correlation and its P.E., we can also estimate the range within which the
population correlation coefficient is likely to lie: population correlation = r ± P.E.(r).
Interpretation
➢ If |r| < P.E.(r), the correlation is not significant.
➢ If |r| > 6 P.E.(r), the correlation is significant.
➢ In all other cases, the significance cannot be decided.
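A minimal sketch of the probable-error test, assuming a hypothetical sample result r = 0.8 from n = 64 pairs (the function names are also hypothetical). It computes P.E.(r), applies the rules above and gives the limits r ± P.E.(r).

```python
from math import sqrt


def probable_error(r, n):
    """P.E.(r) = 0.6745 × (1 − r²)/√n."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def judge(r, n):
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "not significant"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        verdict = "cannot be decided"
    return pe, verdict


r, n = 0.8, 64                         # hypothetical sample result
pe, verdict = judge(r, n)
print(round(pe, 4), verdict)            # 0.0304 significant
print("limits:", round(r - pe, 4), "to", round(r + pe, 4))  # limits: 0.7696 to 0.8304
```

Here S.E.(r) = (1 − 0.64)/8 = 0.045, so P.E.(r) ≈ 0.0304 and 6 P.E.(r) ≈ 0.182, well below r = 0.8.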

4.4 Spearman's Rank Correlation Coefficient


When the correlation between two variables is computed using ranks, it is called a rank correlation
coefficient. Ranking is the assignment of an order of priority according to relative importance or
merit. This method is mainly used to measure the correlation between two qualitative characteristics
that cannot be measured quantitatively, e.g. beauty, knowledge, honesty, intelligence, etc. In this
chapter, we study the rank correlation coefficient given by Spearman, also known as Spearman's Rank
Correlation.
4.4.1 Normal Rank Correlation Coefficient
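If no rank is repeated, i.e. no value occurs more than once in either series, the normal rank
correlation coefficient is calculated as:

$$R = 1 - \frac{6\sum d^2}{n(n^2-1)}, \qquad \text{where } d = R_1 - R_2$$

Here R1 and R2 are the ranks of the two variables and n is the number of pairs of observations.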
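A minimal sketch of the no-repetition formula, assuming hypothetical ranks given by two judges to five candidates (the function name `spearman_no_ties` is also hypothetical).

```python
def spearman_no_ties(rank1, rank2):
    """R = 1 − 6Σd² / (n(n² − 1)), with d = R1 − R2; assumes no repeated ranks."""
    n = len(rank1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five candidates.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_no_ties(judge_1, judge_2))  # 0.8
```

Here d = −1, 1, −1, 1, 0, so Σd² = 4 and R = 1 − 24/120 = 0.8.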


4.4.2 Repeated Rank


In some cases, values may be repeated in a series, i.e. a value may occur twice or more. Each
repeated value is then given the average of the ranks it would otherwise occupy, and the normal rank
correlation formula no longer gives an appropriate result. Thus, we use the following formula, which
adds a correction factor for each repeated value:

$$R = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2-1)}{12} + \frac{m_2(m_2^2-1)}{12} + \cdots\right]}{n(n^2-1)}$$

Where, m1 = number of times the first repeated value occurs,
m2 = number of times the second repeated value occurs, and so on.
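A sketch of the repeated-rank procedure under stated assumptions: values are ranked in ascending order (ranking in descending order gives the same R as long as both series are ranked the same way), tied values share the average of their ranks, and one correction term m(m² − 1)/12 is added for every repeated value in either series. The data and function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # rank of the first occurrence
        count = ordered.count(v)               # how many times v is repeated
        ranks.append(first + (count - 1) / 2)  # average of ranks first .. first + count - 1
    return ranks


def tie_correction(values):
    """Σ m(m² − 1)/12 over every value that is repeated m times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


X = [10, 20, 20, 30, 40]   # 20 appears twice -> ranks 2.5, 2.5
Y = [15, 25, 35, 35, 45]   # 35 appears twice -> ranks 3.5, 3.5
print(average_ranks(X), average_ranks(Y))
print(spearman_with_ties(X, Y))  # 0.875
```

Here Σd² = 1.5, each series contributes 2(2² − 1)/12 = 0.5, and R = 1 − 6 × 2.5/120 = 0.875.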


4.4.3 Interpretation
The interpretation of the rank correlation coefficient is exactly the same as that of Karl Pearson's
correlation coefficient, i.e. its value ranges from -1 to +1.


➢ If R = +1, Perfectly Positive Correlation
➢ If R = -1, Perfectly Negative Correlation
➢ If R > 0, Positive Correlation
➢ If R < 0, Negative Correlation
➢ If R = 0, No Correlation
➢ If 0.5 ≤ R < 1, High Degree of Positive Correlation
➢ If 0 < R < 0.5, Low Degree of Positive Correlation
➢ If -1 < R ≤ -0.5, High Degree of Negative Correlation
➢ If -0.5 < R < 0, Low Degree of Negative Correlation


5. Coefficient of Determination
Coefficient of determination measures the proportion of the variation in the dependent variable that
is due to changes in the independent variable. The value of the dependent variable may change
because of various factors, but the part of that change caused by the independent variable is
represented by the coefficient of determination. It is the ratio of the explained variation to the total
variation, because it gives the proportion of the variation in the dependent variable that is explained
by the independent variable.
This concept can be linked with the concept of dispersion. The standard deviation shows the
variability of the data, and the coefficient of determination gives the proportion of the total variation
that is caused by the independent variable.
Coefficient of Determination = $r^2$
Total Variation = square of the standard deviation = $\sigma^2$
Explained Variation = $\sigma^2 \times r^2$
Unexplained Variation = $\sigma^2 - (\sigma^2 \times r^2) = \sigma^2(1 - r^2)$
Interpretation
If the coefficient of determination is 0.60, it indicates that 60% of the variation in the dependent
variable is explained by the independent variable and the remaining 40% is not explained.
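A minimal sketch of the split of total variation, assuming hypothetical values r = 0.8 and a standard deviation of 10 for the dependent variable (the function name `variation_split` is also hypothetical).

```python
def variation_split(r, sd_y):
    """Split the total variation σ² of the dependent variable using r²."""
    total = sd_y ** 2
    r2 = r ** 2                       # coefficient of determination
    explained = r2 * total
    unexplained = total - explained   # equals (1 − r²)σ²
    return r2, total, explained, unexplained


r2, total, explained, unexplained = variation_split(r=0.8, sd_y=10)  # hypothetical values
print(f"r² = {r2:.2f}: {explained:.1f} of {total:.1f} explained, {unexplained:.1f} unexplained")
# r² = 0.64: 64.0 of 100.0 explained, 36.0 unexplained
```

So a correlation of 0.8 means that 64% of the variation is explained, not 80%.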
