
3.0 Correlation analysis

Economic literature often discusses variables in pairs, for example quantity and price, consumption and income, and saving and investment.

Looking at pairs of variables in this way is called bivariate analysis.

Multivariate relationships involve more than two variables, for example output, capital and labour; or investment, savings and income.

Correlation analysis may be defined as the study of the degree (strength) and direction of association between or among variables.

When the correlation is between two variables, it is called simple correlation; when it involves more than two variables, it is called multiple correlation.

3.1 Scatter diagram:

A scatter plot or scatter graph is a diagram that uses Cartesian coordinates to display values for
two variables for a set of data. In a scatter graph the two sets of data collected are plotted against
each other using the scales on each axis. The diagram helps to show whether a correlation can be
established between the variables.

3.2 Types of correlation

i. Positive correlation

An increase in the value of one variable is associated with an increase in the value of the other variable, and a decrease with a decrease, for example price and quantity supplied. A positive correlation can be represented on a scatter diagram.
ii. Negative correlation

An increase in the value of one variable is associated with a decrease in the value of the other variable, for example the distance travelled by a car and the fuel left in its tank. A negative correlation can also be represented on a scatter diagram.

iii. Zero / no correlation

Changes in the value of one variable show no association with changes in the value of the other variable.
Note: A scatter diagram also shows the strength of association. The nearer the points lie to the line of best fit, the stronger the relationship; the more widely they are dispersed away from the line of best fit, the weaker the relationship.

Reading assignment: Read and make notes about perfect positive correlation, perfect negative
correlation, strong positive, strong negative, weak positive, and weak negative correlation.

3.3 The Correlation coefficient

The correlation coefficient measures the strength (and direction) of the correlation between two variables. For quantitative variables we use the product moment (Pearson's) correlation coefficient, whereas for qualitative variables we use Spearman's rank correlation coefficient.

1. Quantitative measures

Quantitative variables are those that can be expressed as numerical values (things we can measure, count or weigh). For these we use the Pearson (product moment) correlation coefficient.

Note: It is often referred to simply as the correlation coefficient and is denoted by r.

n xy   x  y
r xy 
   
n x 2   2  n
x    y  
2
 y   2

Properties of the correlation coefficient

- It is symmetric: rxy = ryx


- It is a value that falls between -1 and 1 inclusive. That is to say -1 ≤ rxy ≤ 1

Note: rxy = 1 means perfect positive correlation

rxy = -1 means perfect negative correlation

rxy = 0 means no/zero correlation

Example 3.1

Given the data below, where Y is the quantity supplied in kg and X is the price in dollars ($):
a) Compute the correlation coefficient and comment.
b) Construct a scatter diagram and comment.

Solution

a)

Y    X   XY    X²   Y²
10   2   20    4    100
20   4   80    16   400
30   6   180   36   900
50   8   400   64   2500
40   10  400   100  1600
60   12  720   144  3600
80   14  1120  196  6400
90   16  1440  256  8100
90   18  1620  324  8100
120  20  2400  400  14400
ΣY = 590, ΣX = 110, ΣXY = 8380, ΣX² = 1540, ΣY² = 46,100

n xy   x  y 10  8380   110  590 


 
r xy
n x 2   2  n
  x   
 y
2
   y  
2
10  1540 110 10  46100  590 
2 2

 0.9792

Comment: There is a high positive correlation between Y and X.
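As a cross-check on the arithmetic in Example 3.1, the following sketch rebuilds the XY, X² and Y² columns in Python and applies the product moment formula to their totals. The variable names are illustrative only.

```python
import math

# Data from Example 3.1: Y = quantity supplied (kg), X = price ($).
Y = [10, 20, 30, 50, 40, 60, 80, 90, 90, 120]
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# Column totals from the table.
n = len(X)
sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Product moment (Pearson) formula written in terms of the totals.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_xy = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 110 590 8380 1540 46100
print(round(r_xy, 4))                        # 0.9792
```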


b) Scatter diagram
Comment: There is a positive correlation between X and Y.
Defects of the quantitative correlation coefficient
a) The more observations there are, the more products there are and the larger the sum of the products of the deviations of x and y from their means. If this sum is used as a measure of association, x and y may appear more highly correlated than they really are.

Remedy: divide the sum of the products by n, the number of observations. The result is the covariance,

$$\text{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$

b) The variables may be measured in different units, e.g. quantity in kg and price in dollars.
Remedy: divide the covariance by the product of the standard deviations of x and y; the result is unitless.
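The second remedy can be illustrated with a short sketch: dividing the covariance by the product of the two standard deviations gives r, and re-expressing price in cents (an assumed change of units, chosen only for illustration) multiplies the covariance by 100 but leaves r unchanged. The helper functions are hypothetical and use population formulas (dividing by n).

```python
import math


def covariance(xs, ys):
    """Population covariance: sum of products of deviations divided by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def std_dev(values):
    """Population standard deviation (divides by n, matching covariance above)."""
    return math.sqrt(covariance(values, values))


def correlation(xs, ys):
    """Covariance scaled by both standard deviations, so the units cancel."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Example 3.1 data: price in dollars (X) and quantity supplied in kg (Y).
price_dollars = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
quantity_kg = [10, 20, 30, 50, 40, 60, 80, 90, 90, 120]

# The same prices in cents (hypothetical unit change).
price_cents = [100 * p for p in price_dollars]

print(covariance(price_dollars, quantity_kg), covariance(price_cents, quantity_kg))  # 189.0 18900.0
print(round(correlation(price_dollars, quantity_kg), 4))  # 0.9792
print(round(correlation(price_cents, quantity_kg), 4))    # 0.9792
```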
Assumptions of the quantitative correlation coefficient
- X and Y should be quantifiable.
- X and Y should be accurately measured.
The coefficient of determination, r²
The coefficient of determination, r², is the ratio of the explained variation to the total variation and is obtained by squaring the value of rxy, i.e.

$$\text{CD} = r^2 = r_{xy}^2$$
Example 3.2: Compute the coefficient of determination in Example 3.1 and interpret.
Solution

r r  0.9792  0.958  0.958  100  95.8%


2 2
xy

The coefficient of determination gives the proportion of all the variation (in the y-values) that is
explained (by the variation in the x-values).
Interpretation: Approximately 95.8% of the variations in the quantity supplied (Y) are due to
the changes in the price (X).
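To see that r² really is the ratio of explained to total variation, the sketch below (a minimal illustration, not part of the original method) fits the least squares line to the Example 3.1 data, measures the variation of the fitted values and of the observed Y values about the mean of Y, and takes their ratio. Without intermediate rounding it gives about 0.9588.

```python
# Example 3.1 data: X = price ($), Y = quantity supplied (kg).
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Y = [10, 20, 30, 50, 40, 60, 80, 90, 90, 120]
n = len(X)

# Least squares line Y = a + bX (same formulas as Section 3.4.1).
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

mean_y = sum_y / n
fitted = [a + b * x for x in X]

# Explained variation: spread of the fitted values about the mean of Y.
explained = sum((f - mean_y) ** 2 for f in fitted)
# Total variation: spread of the observed Y values about their mean.
total = sum((y - mean_y) ** 2 for y in Y)

r_squared = explained / total
print(round(r_squared, 4))  # 0.9588, i.e. roughly 95.9%
```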
2. Qualitative measures
If either of the two assumptions above is violated, the best measure is Spearman's rank correlation coefficient. That is, if X and Y are qualitative variables whose categories can be ranked (e.g. education level), or if X and Y are inaccurately measured, we use Spearman's rank correlation coefficient. It is denoted by ρ and is given by

$$\rho = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$

where d = difference in ranks and n = number of pairs of X and Y.
Note: Spearman’s rank correlation .
Example 3.3:
The data below show the performance in English and Mathematics tests of S.6 candidates in a certain school.
English (X) 40 60 41 75 60 60 38
Math (Y) 10 10 50 80 50 34 52

a) Compute Spearman’s rank correlation coefficient and comment.


b) Construct a scatter diagram for the data and comment.
Solution
X   Y   Rx  Ry   d     d²
40  10  6   6.5  -0.5  0.25
60  10  3   6.5  -3.5  12.25
41  50  5   3.5  1.5   2.25
75  80  1   1    0     0
60  50  3   3.5  -0.5  0.25
60  34  3   5    -2    4
38  52  7   2    5     25
                       Σd² = 44

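Note: d = Rx - Ry or d = Ry - Rx; the sign does not affect d². Ranks are assigned from the highest value (rank 1), and tied values share the average of the ranks they occupy.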

a)

$$\rho = 1 - \frac{6(44)}{7\left(7^2 - 1\right)} = 1 - \frac{264}{336} \approx 0.2143$$

Comment: There is a low positive correlation between performance in Mathematics and English.
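The ranking and the formula can be reproduced with the short Python sketch below. The helpers `ranks_descending` and `spearman_rho` are hypothetical names; they rank from the highest value (rank 1) and give tied values the average of the positions they occupy, as in the table above. When there are ties, the d² formula is an approximation to the coefficient obtained by applying Pearson's formula to the ranks, so some software may report a slightly different value.

```python
def ranks_descending(values):
    """Rank values with 1 = largest; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1              # first position of v in the sorted list
        count = order.count(v)                  # how many times v occurs
        ranks.append(first + (count - 1) / 2)   # average of positions first..first+count-1
    return ranks


def spearman_rho(xs, ys):
    """Return (sum of d squared, rho) using rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    rx = ranks_descending(xs)
    ry = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Example 3.3 data.
english = [40, 60, 41, 75, 60, 60, 38]
maths = [10, 10, 50, 80, 50, 34, 52]

sum_d2, rho = spearman_rho(english, maths)
print(ranks_descending(english))  # [6.0, 3.0, 5.0, 1.0, 3.0, 3.0, 7.0]
print(sum_d2, round(rho, 4))      # 44.0 0.2143
```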
b) Scatter diagram

Comment: There is little or no correlation between X and Y.


Limitations of correlation
- It is only applicable to linear relationships.
- It does not specify the nature of the relationship, i.e. it does not establish causation.

3.4 Linear regression

Linear regression attempts to model the relationship between two variables by fitting a linear
equation to observed data. One variable is considered to be an explanatory variable, and the other
is considered to be a dependent variable.

3.4.1 The line of regression


The linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The constant a is the Y-intercept and the constant b is the gradient (slope) of the regression line.

$$Y = a + bX$$

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y - b\sum x}{n}$$
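The formulas for b and a translate directly into code. The following is a minimal sketch with a hypothetical helper `fit_line`, checked on made-up points that lie exactly on Y = 1 + 2X, so the fit should recover a = 1 and b = 2.

```python
def fit_line(xs, ys):
    """Least squares line Y = a + bX using the summation formulas for b and a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # gradient
    a = (sum_y - b * sum_x) / n                                    # Y-intercept
    return a, b


# Hypothetical points lying exactly on Y = 1 + 2X.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```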

Example 3.4: Using the data in Example 3.1,

a) Obtain the regression line


b) Interpret the equation obtained
c) Use the equation obtained to predict the quantity supplied (Y) when the price is 15
dollars.

Solution

n xy   x y (10  8380 )  (110  590) 18900


a) b    5.727
 
=
n x   x
2
(10  1540 ) 110
2 2
3300

a
 y  b x  590  5.727 *110  3.997
n 10
Y=-3.997+5.727X
b) Interpretation: When the price (X) increases by 1 dollar, the quantity supplied (Y) increases by approximately 5.727 kg.
c) When X = 15 dollars:
Using Y = -3.997 + 5.727X,
Y = -3.997 + 5.727(15) ≈ 81.9 kg
