1. Introduction
Economics is the nexus and engine that runs society: it affects societal well-being, raising
standards of living when economies prosper and lowering them when economies perform
poorly. From a household budget to international trade, economics ranges from the micro- to
the macro-level.
When studying processes and phenomena of scientific interest, there is often a need
to clarify the features and properties of the various factors that influence them. When a large
body of statistical material is available, multidimensional mathematical and statistical
methods, including correlation and regression analysis, can be successfully applied to study
these factors. These methods yield conclusions and forecasts that can confirm or refute
particular results of scientific research.
2. Purpose
The purpose of this report is to illustrate the concept of correlation analysis, detail the
methodology of its use, present its applications in economics, and explain the interpretation
of the results when conducting economic studies.
3. Measures of Associations
Measures of association quantify whether two variables are related and the degree to
which they are related.
3.1. Covariance
The covariance is a measure of association between two variables. Let the variables
be X and Y, with corresponding means x̄ and ȳ. The covariance is defined as:

Cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / N
The covariance is the sum of the cross product of the deviations of the values of X and
Y from their means divided by the population size.
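As a quick check on this definition, the population covariance can be computed in a few lines of Python (the data here are illustrative, not taken from the report):

```python
# Population covariance: the mean cross-product of deviations from the means.
def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(covariance(x, y))  # 1.6
```

A positive result indicates that the two variables tend to deviate from their means in the same direction.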
3.2. Correlation
3.2.1. Introduction
The correlation between two variables measures the strength of the relationship
between them but it doesn’t indicate the cause and effect relationship between the
variables. Correlation measures co-variation, not causation. Causation means
that a change in one variable causes a change in the other variable. In other
words, just because two events or things occur together does not imply that one
is the cause of the other. A positive “linear” correlation between two variables say
X and Y implies that high values of X are associated with high values of Y, and that
low values of X are associated with low values of Y. It does not imply that X causes
Y.
In 1885, Sir Francis Galton first defined the term "regression" and completed the
theory of bivariate correlation. A decade later, Karl Pearson developed the index that we still
use to measure correlation, Pearson's ‘r’.
The complete name of the correlation coefficient deceives many into a belief that Karl
Pearson developed this statistical measure himself. Although Pearson did develop a rigorous
treatment of the mathematics of the Pearson Product Moment Correlation, it was the
imagination of Sir Francis Galton that originally conceived modern notions of correlation and
regression. Galton, a cousin of Charles Darwin was an accomplished 19th century scientist
who made substantial scientific contributions to biology, psychology and applied statistics.
Figure 1. Sir Francis Galton and Karl Pearson.
It was in 1888 that Galton first wrote about correlation:
“Two variable organs are said to be co-related when the variation of the one is
accompanied on the average by more or less variation of the other, and in the same
direction … It is easy to see that co-relation must be the consequence of the variations
of the two organs being partly due to common causes. If they were wholly due to
common causes, the co-relation would be perfect, as is approximately the case with
the symmetrically disposed parts of the body. If they were in no respect due to
common causes, the co-relation would be nil …”
Karl Pearson, Galton's colleague and friend, pursued the refinement of correlation with
such vigor that the statistic r, a statistic Galton called the index of co-relation and Pearson
called the Galton coefficient of reversion, is known today as Pearson's r.
In 1896, Pearson published his first rigorous treatment of correlation and regression in the
Philosophical Transactions of the Royal Society of London. In this paper, Pearson credited
Bravais with ascertaining the initial mathematical formulae for correlation. Pearson noted that
Bravais happened upon the product-moment (that is, the "moment" or mean of a set of
products) method for calculating the correlation coefficient but failed to prove that this
provided the best fit to the data. Using an advanced statistical proof, Pearson demonstrated
that optimum values of both the regression slope and the correlation coefficient could be
xy
calculated from the product-moment, ∑ , where x and y are deviations of observed values
N
from their respective means and N is the number of pairs.
If two variables change in the same direction (i.e. if one increases the other
also increases, or if one decreases, the other also decreases), then this is called a positive
correlation. Examples are:
Advertising and sales;
Heights and weights;
Household income and expenditure;
Price and supply of commodities;
Amount of rainfall and yield of crops.
If two variables change in the opposite direction (i.e. if one increases, the other decreases and
vice versa), then the correlation is called a negative correlation. Examples are:
Price and demand of commodities;
Volume and pressure of a perfect gas.
The nature of the graph gives an idea of the type of correlation between two variables.
If the graph is a straight line, the correlation is called a "linear correlation"; if the graph is not
a straight line, the correlation is non-linear or curvilinear.
In general, two variables x and y are said to be linearly related if there exists a
relationship of the form y = a + bx, where 'a' and 'b' are real numbers. For constant values of
a and b, this is simply a straight line when plotted for different values of x and y. Such
relations generally occur in the physical sciences but are rarely encountered in the economic
and social sciences.
Scatter plots (also called scatter diagrams) are used to investigate graphically the
possible relationship between two variables without calculating any numerical value. In this
method, the values of the two variables are plotted on graph paper, one along the horizontal
(X) axis and the other along the vertical (Y) axis. Plotting the data gives points (dots) on the
graph, which are generally scattered; hence the name "scatter plot". The manner in which
these points are scattered suggests the degree and direction of correlation. The degree of
correlation is denoted by 'r', and its direction is given by the positive or negative sign.
Figure 2. Degree of Correlation
Though this method is simple and gives a rough idea of the existence and degree of
correlation, it is not reliable. As it is not a mathematical method, it cannot measure the degree
of correlation precisely.
Karl Pearson's coefficient of correlation gives a precise numerical expression for the measure
of correlation. It is denoted by 'r'. The value of 'r' gives the magnitude of the correlation and
its sign denotes its direction. The mathematical formula for computing r is:

r = Σ (xᵢ − x̄)(yᵢ − ȳ) / (N σx σy),

where σx and σy are the standard deviations of x and y respectively, x̄ and ȳ are the means of
x and y respectively, and N is the number of pairs of observations.
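The formula can be verified with a short Python sketch (the data are illustrative and the helper name `pearson_r` is ours):

```python
def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Population standard deviations, matching the N in the formula above.
    sd_x = (sum((v - mean_x) ** 2 for v in x) / n) ** 0.5
    sd_y = (sum((v - mean_y) ** 2 for v in y) / n) ** 0.5
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return cov / (sd_x * sd_y)

print(round(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 4))  # 0.8
```

Because the same N appears in both the numerator and the denominator, the result is unchanged if sample (N − 1) conventions are used throughout instead.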
Spearman's rank correlation method is based on the ranks of the items rather than on
their actual values. Its advantage over the other methods is that it can be used even when the
actual values of the items are unknown. For example, if we want to know the correlation
between honesty and wisdom of the girls of a class, we can use this method by assigning
ranks to the girls. It can also be used to find the degree of agreement between the judgments
of two examiners or two judges. The formula is:
R = 1 − (6 Σ D²) / (N (N² − 1)),

where R is the rank correlation coefficient, D is the difference between the ranks of paired
items, and N is the number of observations.
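A minimal sketch of the rank-correlation formula in Python, using hypothetical rankings of five candidates by two judges:

```python
def spearman_r(rank1, rank2):
    n = len(rank1)
    # Sum of squared differences between paired ranks.
    d_sq = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_r(judge_a, judge_b))  # 0.8
```

Identical rankings give R = 1, and completely reversed rankings give R = −1.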
Through the coefficient of correlation, we can measure the degree or extent of the
correlation between two variables, and also determine whether the correlation is
positive or negative.
Correlation may be positive, negative or zero, but r always lies within the limits ±1,
i.e. −1 ≤ r ≤ +1. The + and − signs denote positive and negative linear correlations,
respectively.
1. Perfect correlation: If two variables change in the same direction and in the same
proportion, the correlation between the two is perfect positive. According to Karl Pearson the
coefficient of correlation in this case is +1. On the other hand, if the variables change in the
opposite direction and in the same proportion, the correlation is perfect negative. Its
coefficient of correlation is -1. In practice we rarely come across these types of correlations.
2. Absence of correlation: If two variables exhibit no relationship between them, i.e. a change
in one variable does not lead to a change in the other, then we can firmly say that there is no
correlation between the two variables. In such a case the coefficient of correlation is 0.
3. Limited degrees of correlation: If two variables are neither perfectly correlated nor
completely uncorrelated, then we term the correlation a limited correlation.
If x and y have a strong positive linear correlation, r is close to +1. An r value of
exactly +1 indicates a perfect positive correlation.
If x and y have a strong negative linear correlation, r is close to -1. An r value of
exactly -1 indicates a perfect negative correlation.
If there is no linear correlation or a weak linear correlation, r is close to 0.
Demand is positively correlated with price: when demand for a good rises while supply is
unchanged, consumers compete for it and bid its price up. In contrast, supply is negatively
correlated with price. When supply decreases without a corresponding decrease in demand,
prices increase: the same number of consumers now compete for a reduced number of goods,
which makes each good more valuable in the eyes of the consumer.
Like demand and price, consumer spending and GDP are examples of positively
correlated variables where movement by one variable causes movement by the other. In this
case, consumer spending is the variable that affects a change in GDP. Firms set production
levels based on demand, and demand is measured by consumer spending. As the level of
consumer spending moves up and down, production levels strive to match the change in
demand, resulting in a positive relationship between the two variables.
5. Case Studies
5.1. Case Study 1: Jobless Rate and Delinquent Loan Rate in the USA
At the beginning of 2009, the economic downturn resulted in the loss of jobs and an
increase in delinquent loans for housing. The national unemployment rate was 6.5% and the
percentage of delinquent loans was 6.12% (The Wall Street Journal, January 27, 2009). In
projecting where the real estate market was headed in the coming year, economists studied
the relationship between the jobless rate and the percentage of delinquent loans. The
expectation was that if the jobless rate continued to increase, there would also be an increase
in the percentage of delinquent loans. The data below show the jobless rate and the
delinquent loan percentage for 27 major real estate markets.
Metro Area    Jobless Rate (%) X    Delinquent Loan (%) Y
Atlanta 7.1 7.02
Boston 5.2 5.31
Charlotte 7.8 5.38
Chicago 7.8 5.4
Dallas 5.8 5
Denver 5.8 4.07
Detroit 9.3 6.53
Houston 5.7 5.57
Jacksonville 7.3 6.99
Las Vegas 7.6 11.12
Los Angeles 8.2 7.56
Miami 7.1 12.11
Minneapolis 6.3 4.39
Nashville 6.6 4.78
New York 6.2 5.78
Orange County 6.3 6.08
Orlando 7 10.05
Philadelphia 6.2 4.75
Phoenix 5.5 7.22
Portland 6.5 3.79
Raleigh 6 3.62
Sacramento 8.3 9.24
St. Louis 7.5 4.4
San Diego 7.1 6.91
San Francisco 6.8 5.57
Seattle 5.5 3.87
Tampa 7.5 8.42
For the above data, the Correlation Coefficient r = 0.44. Hence there is a low positive
correlation between Jobless rate and Delinquent house loan rate. A scatter diagram plotting
the two variables is shown below.
Figure: Scatter diagram of Delinquent Loan (%) against Jobless Rate (%).
5.2. Case Study 2: COVID-19 Confirmed Cases, Recoveries and Deaths in India
COVID-19 related data (daily confirmed cases, recovered cases and deaths) from 01 April
2020 to 31 October 2021 have been analysed in this example. A screenshot of the sample data
is shown below. The data were collected from https://www.covid19india.org.
It is natural to believe that there are strong correlations between daily confirmed
cases and daily recovered cases, and between daily confirmed cases and daily deaths. To
verify this statistically, correlation analyses of confirmed cases against recovered cases and
of confirmed cases against deaths were performed. For confirmed cases against recovered
cases, the correlation coefficient is 0.9155; for confirmed cases against deaths, it is 0.8334.
This confirms the assumption that the correlations are strong. The corresponding scatter
diagrams are shown below:
Figure: Scatter diagram of daily recovered cases against daily confirmed cases.
Figure: Scatter diagram of daily deaths against daily confirmed cases.
5.3. Case Study 3: Lockdown Stringency and GDP Contraction in India
This case study is taken from the Indian Economic Survey 2020-2021. Evidence from the
experience of the Spanish flu establishes that cities that intervened with lockdowns earlier
and more aggressively experienced a stronger economic recovery in the long run. Learning
from this experience, India implemented an early and stringent lockdown from late March to
May 2020 to curb the spread of COVID-19. With the economy brought to a standstill for two
complete months, the inevitable effect was a 23.9 per cent contraction in GDP compared to
the same quarter of the previous year. This contraction was consistent with the stringency of
the lockdown.
Figure. Correlation between Stringency and GDP Contraction during April-June 2020
Note: Bubble size corresponds to number of deaths as on 31st December, 2020; number of deaths per
lakh indicated with the bubble.
5.4. Case Study 4: Measure of Inflation Which Reflected Economic Activity Better in
2020-21
For the period April 2020 to November 2020, Consumer Price Index – Combined
(CPI-C) is weakly related to Index of Industrial Production (IIP) growth while Wholesale
Price Index (WPI) inflation and CPI-C Core inflation are positively and strongly related to
IIP growth. Therefore, core CPI-C inflation and WPI inflation have been more in sync with
the demand conditions in the economy. During the period April 2020 to November 2020, the
correlation coefficient between WPI inflation and Year over Year (YoY) growth in IIP is
around 0.8 while the correlation coefficient between CPI-C core inflation and IIP growth has
been 0.9. The correlation between IIP growth and CPI inflation during the same period is 0.2.
Similarly, we can see high correlation of CPI Core inflation and WPI inflation with other
metrics of production and demand in the Indian economy as shown in the Table below. Food
items have a large weight of around 39 per cent in the CPI-C index. This means that shocks to
food prices due to COVID-19 can have large impacts on CPI-C inflation.
Table: Correlation of Production vs. Demand Metrics with CPI-C, Core and WPI Inflation
in 2020-21
6. Conclusion
In summary, correlation coefficients are used to assess the strength and direction of
the linear relationships between pairs of variables. When both variables are normally
distributed, use Pearson's correlation coefficient; otherwise, use Spearman's correlation
coefficient. Spearman's correlation coefficient is more robust to outliers than Pearson's
correlation coefficient. Correlation coefficients do not communicate information about
whether one variable moves in response to another. There is no attempt to establish one
variable as dependent and the other as independent. Thus, relationships identified using
correlation coefficients should be interpreted for what they are: associations, not causal
relationships.
The use of correlation analysis extends to numerous important fields. For example, in
finance, correlation analysis can be used to measure the degree of linear relationships
between interest rates and stock returns, money supply and inflation, stock and bond returns,
and exchange rates.
Some of its shortcomings include its unreliability, its sensitivity to outliers, and its tendency
to suggest linear relationships where none exist.
7. References