
Correlation Analysis

Definition
 The extent to which two or more things are related to one another ("co-related").
OR
 If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
 Correlation is a bivariate measure of the strength of the linear association between two variables.
 Thus, correlation analysis is a statistical tool used to describe the degree to which one variable is linearly related to another.
 Univariate distribution: a distribution involving only one variable.
 Bivariate distribution: a distribution in which the focus is simultaneously on two variables.
 In a correlated bivariate distribution, movement in one variable is accompanied by movement in the other variable.
 EXAMPLE: a husband's and wife's ages move together; so do the price and demand of commodities.
Types of Correlation:

1. Positive and negative (based on the direction of change)
2. Simple, partial and multiple (based on the number of variables)
3. Linear and non-linear (based on the change in proportion)
Positive and negative correlation
 If the two variables deviate in the same direction, i.e., if an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other, the correlation is said to be direct or positive.
 EXAMPLE: income and expenditure; height and weight of a group of persons.
 If the two variables deviate in opposite directions, i.e., if an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other, the correlation is said to be inverse or negative.
 EXAMPLE: price and demand of a commodity; volume and pressure of a perfect gas.
 Correlation is said to be perfect if the
deviation in one variable is followed by a
corresponding and proportional deviation in
the other.
Simple, Partial and Multiple correlation
 Simple correlation: Only two variables are involved, and the relationship between those two variables is studied.
 Partial correlation: More than two variables are considered, but the relationship between two of them is studied while the other variables are held constant.
 EXAMPLE: Production of wheat depends on many factors such as rainfall, quality of seed, manure etc. If the relation between production of wheat per hectare and quality of seed is studied keeping rainfall and manure constant, then the correlation is said to be partial.
 Multiple correlation: Here, the relationship among three or more variables is studied simultaneously.
 In the above example, if we study the relationship between production and the other factors simultaneously, the relationship is called multiple correlation.
Linear and non-linear correlation
■ Linear correlation:
When the two variables are plotted, the points lie along a straight line. (The ratio of change between the two variables is constant.)
■ Non-linear correlation:
When the two variables are plotted, the points lie along a curve. (The ratio of change between the two variables is not constant.)
Methods of studying correlation

 Graphical method
 Mathematical method
Graphical method (scatter diagram)
 Scatter diagram is mainly used to represent
bivariate data.
 These diagrams indicate the existence of a
relationship, as well as the strength of that
relationship.
 It is an easy way to highlight any relationship
that may exist and its type, whether direct or
inverse.
Steps of drawing the Scatter diagram
 Collect data on two variables, one
independent and the other dependent.
 Draw a diagram with the “cause” or
independent variable labeled on the
horizontal (X) axis and the “effect” or
dependent variable labeled on vertical (Y)
axis.
11-2 Scatter Plots - Example
 Construct a scatter plot for the data obtained in a study of age and blood pressure of six randomly selected subjects.
 The data are given below.

Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
[Figure: scatter plot showing a positive relationship. Horizontal axis: Age (40 to 70); vertical axis: Pressure (120 to 150).]
11-2 Scatter Plots - Other Examples
[Figure: scatter plot showing a negative relationship. Horizontal axis: Number of absences (5 to 15); vertical axis: Final grade (40 to 90).]
[Figure: scatter plot showing no relationship. Horizontal axis: x (0 to 70); vertical axis: y (0 to 10).]
Merits: Scatter diagram
 Simple to draw and understand.
 Attractive method of finding correlation.
 Gives a rough idea at a glance of whether the correlation is positive or negative.
 Not influenced by extreme items.
 First step in finding the correlation.
Demerits: Scatter diagram
 The exact degree of correlation cannot be calculated.
Mathematical method
(Correlation Coefficient)
 Provides a numerical description of the strength or degree to which two variables are linearly related.
 Karl Pearson's coefficient of correlation
 Spearman's rank coefficient of correlation
 Sample correlation coefficient, r.
 Population correlation coefficient, ρ.
Karl Pearson's coefficient of correlation

 It is a quantity that gives the amount of linear relationship between the variables.
 Mathematically,
\[ r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y} \]
where $\sigma_X$ is the standard deviation of X, $\sigma_Y$ is the standard deviation of Y, and $\operatorname{cov}(X, Y)$ is the covariance of X and Y (a measure of how the two vary together).

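Formula for the Correlation Coefficient r

\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]

where n is the number of data pairs.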
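As a rough illustration, the raw-sums formula for r can be sketched in Python and applied to the age and blood pressure data from the scatter plot example. The function name `pearson_r` is hypothetical, chosen only for this sketch, and no special handling is included for a series with zero variance.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Age (x) and pressure (y) for subjects A to F in the scatter plot example.
age = [43, 48, 56, 61, 67, 70]
pressure = [128, 120, 135, 143, 141, 152]

r = pearson_r(age, pressure)
print(round(r, 3))  # 0.897
```

The value of about 0.897 agrees with the positive relationship seen in the scatter plot.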
Range of Values for the Correlation Coefficient

Strong negative      No linear        Strong positive
relationship         relationship     relationship
-1                   0                +1
Interpretation

Interpretation                   Value of r
High positive correlation        0.75 ≤ r < 1
Moderate positive correlation    0.50 ≤ r < 0.75
Low positive correlation         0 < r < 0.50

Interpretation                   Value of r
High negative correlation        -0.75 ≥ r > -1
Moderate negative correlation    -0.50 ≥ r > -0.75
Low negative correlation         0 > r > -0.50
 The Karl Pearson correlation technique works best with linear relationships.
 It does not work well with curvilinear relationships (in which the relationship does not follow a straight line).
 Example: age and health care. They are related, but the relationship does not follow a straight line: young children and older people both tend to use much more health care than teenagers or young adults.
(Multiple regression is used to examine curvilinear relationships.)
Properties of correlation coefficient
 The correlation coefficient measures the strength or degree of linear relationship.
 The value of r lies between +1 and -1.
 r is independent of both a change of origin (subtracting a constant from every value of X and Y) and a change of scale (dividing or multiplying every value of X and Y by a constant), i.e.
if u = (X - A)/I and v = (Y - B)/J, with I and J of the same sign, then
r(X, Y) = r(U, V)
 The correlation coefficient is symmetric:
r_xy = r_yx = r
The relation between x and y is the same as that between y and x.
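The change-of-origin, change-of-scale and symmetry properties can be checked numerically. The sketch below reuses the age and blood pressure data from the scatter plot example. The constants `A`, `I`, `B`, `J` are arbitrary values chosen for illustration, and `pearson_r` is a hypothetical helper written in deviation form.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [43, 48, 56, 61, 67, 70]          # age
y = [128, 120, 135, 143, 141, 152]    # pressure

# Arbitrary (assumed) origins and positive scale factors.
A, I = 50, 5
B, J = 130, 10
u = [(xi - A) / I for xi in x]
v = [(yi - B) / J for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
r_yx = pearson_r(y, x)

print(abs(r_xy - r_uv) < 1e-9)  # True: origin and scale do not matter
print(abs(r_xy - r_yx) < 1e-9)  # True: r is symmetric

# A negative scale factor on one variable only flips the sign of r.
w = [-ui for ui in u]
print(abs(pearson_r(w, v) + r_xy) < 1e-9)  # True
```

The last check shows why the scale factors need the same sign for r to be unchanged: the magnitude is always preserved, but the sign follows the product of the two scale factors.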
Example 1
 The following table shows 10 years of data on the advertisement expenditure and sales of a company. Calculate the correlation coefficient between these two variables for this company.
 Determine the coefficient of determination.
 Determine the probable error and point out whether the coefficient of correlation is significant.
 Find the limits within which r lies for another sample from the same universe.

S. No.   Ad. expenditure (in thousands)   Sales (units)
1        50                               700
2        50                               650
3        50                               600
4        40                               500
5        30                               450
6        20                               400
7        20                               300
8        15                               250
9        10                               210
10       5                                200
Example 2
 Calculate the correlation coefficient from the following data:

Export of raw cotton (crores)   Export of manufactured goods (crores)
42                              56
44                              49
58                              53
55                              58
89                              65
98                              76
66                              58
EXAMPLE

The deviations from the respective means of the X and Y series are given below:

x   -4   -3   -2   -1    0    1    2    3    4
y    3   -3   -4    0    4    1    2   -2   -1

Calculate the Karl Pearson coefficient of correlation from the above data.
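Because the values given are already deviations from the means, the raw-sums formula reduces to a ratio of sums of products and squares of the deviations. A worked sketch of the arithmetic, with x and y denoting the deviations in the table above:

```latex
\[
r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}
\]
\[
\sum xy = (-4)(3) + (-3)(-3) + (-2)(-4) + (-1)(0) + (0)(4) + (1)(1) + (2)(2) + (3)(-2) + (4)(-1)
        = -12 + 9 + 8 + 0 + 0 + 1 + 4 - 6 - 4 = 0
\]
\[
\sum x^2 = 16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16 = 60, \qquad
\sum y^2 = 9 + 9 + 16 + 0 + 16 + 1 + 4 + 4 + 1 = 60
\]
\[
r = \frac{0}{\sqrt{60 \times 60}} = 0
\]
```

On this arithmetic the two series show no linear correlation.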
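Correlation of Bivariate grouped data
 Here, frequencies are involved.

\[ r = \frac{N\sum f\,xy - \left(\sum f\,x\right)\left(\sum f\,y\right)}{\sqrt{\left[N\sum f\,x^2 - \left(\sum f\,x\right)^2\right]\left[N\sum f\,y^2 - \left(\sum f\,y\right)^2\right]}} \]
where f is the frequency of the cell with mid-values (x, y), the sums run over all cells of the table, and $N = \sum f$ is the total frequency.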
Example 3
 Find the coefficient of correlation between the age and the sum assured from the following table:

Age group    Sum assured (in Rs.)
             10000   20000   30000   40000   50000
20-30        4       6       3       7       1
30-40        2       8       15      7       1
40-50        3       9       12      6       2
50-60        8       4       2       -       -
Example 4
 Find the coefficient of correlation from the following bivariate frequency distribution:

Sales revenue    Advertising expenditure (Rs. '000)
(Rs. lakhs)      5-10    10-15   15-20   20-25
75-125           4       1       -       -
125-175          7       6       2       1
175-225          1       3       4       2
225-275          1       1       3       4
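For a bivariate frequency table, one way to compute r is to treat each cell as f repetitions of the pair formed by its class mid-values and then apply the usual raw-sums formula with frequency weights. The sketch below does this for the Example 4 table. It assumes that class mid-points represent each class and that a dash means a frequency of zero; the function name `grouped_pearson_r` is hypothetical.

```python
from math import sqrt

def grouped_pearson_r(x_mids, y_mids, freq):
    """r for a two-way frequency table; freq[i][j] pairs y_mids[i] with x_mids[j]."""
    total = sum_fx = sum_fy = sum_fxy = sum_fx2 = sum_fy2 = 0
    for i, y in enumerate(y_mids):
        for j, x in enumerate(x_mids):
            f = freq[i][j]
            total += f
            sum_fx += f * x
            sum_fy += f * y
            sum_fxy += f * x * y
            sum_fx2 += f * x * x
            sum_fy2 += f * y * y
    numerator = total * sum_fxy - sum_fx * sum_fy
    denominator = sqrt((total * sum_fx2 - sum_fx ** 2) *
                       (total * sum_fy2 - sum_fy ** 2))
    return numerator / denominator

# Example 4: mid-points of the advertising classes (columns) and sales classes (rows).
ad_mids = [7.5, 12.5, 17.5, 22.5]
sales_mids = [100, 150, 200, 250]
freq = [
    [4, 1, 0, 0],   # 75-125
    [7, 6, 2, 1],   # 125-175
    [1, 3, 4, 2],   # 175-225
    [1, 1, 3, 4],   # 225-275
]

r = grouped_pearson_r(ad_mids, sales_mids, freq)
print(round(r, 3))  # 0.596
```

Working with mid-values directly gives the same r as the step-deviation shortcut often used by hand, since r does not depend on origin or scale.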
Coefficient of determination

 The coefficient of determination represents the percentage of variation in the dependent variable explained by the independent variable.
 The coefficient of determination is denoted by r².
 r² = explained variance ∕ total variance
 r² lies between 0 and 1.
Interpretation
 If r = 0.7, then r² = 0.49.
 This implies that 49% of the variation in the dependent variable can be attributed to the independent variable.
 In other words, 49% of the variability has been explained and the remaining 51% is unaccounted for.
 A value of r² close to one indicates that almost all the variability in the dependent variable is accounted for by the independent variable.
 In Example 1, r = 0.976
 Coefficient of determination: r² = 0.95
This means 95% of the variation in sales is explained by advertising expenditure.
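The arithmetic for Example 1 can be checked with a few lines of Python. This is only a sketch: it takes the quoted r = 0.976 as given rather than recomputing it, and the function name `explained_and_unexplained` is hypothetical.

```python
def explained_and_unexplained(r):
    """Return (r^2, 1 - r^2): the explained and unexplained shares of variation."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    r_squared = r ** 2
    return r_squared, 1 - r_squared

# r = 0.976 is the value quoted for Example 1.
explained, unexplained = explained_and_unexplained(0.976)
print(round(explained, 4))    # 0.9526
print(round(unexplained, 4))  # 0.0474
```

Rounded to two places this is the 0.95 quoted above, leaving about 5% of the variation in sales unexplained.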
Coefficient of non-determination

 The coefficient of non-determination is the ratio of unexplained variation to total variation.
 The coefficient of non-determination is denoted by K².
 K² = 1 - r²
 K² = unexplained variance ∕ total variance
Standard error

 If r is the correlation coefficient between the two variables X and Y for a sample of n observations, the standard error of r is given by:
\[ SE(r) = \frac{1 - r^2}{\sqrt{n}} \]
Probable error

 Def: The probable error of the coefficient of correlation is an amount which, if added to and subtracted from the mean correlation coefficient, produces limits within which the chances are even that a coefficient of correlation from a series selected at random will fall.
\[ PE(r) = 0.6745\,\frac{1 - r^2}{\sqrt{N}} \]
where:
r = coefficient of correlation
N = number of pairs
 Limits of the population correlation coefficient are:
\[ \rho = r \pm PE(r) \]
where ρ is the population correlation coefficient.
Functions: Probable error

 If r > 6 PE(r), r is significant.
 If r < PE(r), r is insignificant.
Example 5

 A student calculates the value of r as 0.7 when the value of N is 5 and concludes that r is highly significant. Is he correct?
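The standard error, probable error and the 6 PE rule can be sketched in Python and applied to Example 5 and to Example 1 (r = 0.976, N = 10). Two choices here are assumptions of the sketch rather than statements from the slides: the magnitude |r| is compared with PE so that negative coefficients are handled, and values falling between PE and 6 PE are labelled "inconclusive". The function names are hypothetical.

```python
from math import sqrt

def standard_error(r, n):
    """SE(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

def probable_error(r, n):
    """PE(r) = 0.6745 * SE(r)."""
    return 0.6745 * standard_error(r, n)

def assess(r, n):
    """Return PE, the limits r - PE and r + PE, and a verdict from the 6 PE rule."""
    pe = probable_error(r, n)
    if abs(r) > 6 * pe:
        verdict = "significant"
    elif abs(r) < pe:
        verdict = "insignificant"
    else:
        verdict = "inconclusive"  # assumed label for PE <= |r| <= 6 PE
    return pe, (r - pe, r + pe), verdict

# Example 5: r = 0.7 with N = 5.
pe5, limits5, verdict5 = assess(0.7, 5)
print(round(pe5, 4), round(6 * pe5, 3), verdict5)  # 0.1538 0.923 inconclusive

# Example 1: r = 0.976 with N = 10.
pe1, limits1, verdict1 = assess(0.976, 10)
print(round(pe1, 4), verdict1)                      # 0.0101 significant
print(round(limits1[0], 3), round(limits1[1], 3))   # 0.966 0.986
```

With only 5 pairs, r = 0.7 is smaller than 6 PE (about 0.923), so under this rule it cannot be called highly significant.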
Rank Correlation
 It is another measure of correlation
 It is used when the distribution of the data is such
that it is not possible to quantify it but only rank it in
a certain order on the basis of a certain attribute.
 Helps to correlate two sets of qualitative
observations which are subject to ranking such as
qualitative productivity ratings (poor, fair, good,
very good etc.) for a group of workers by two
independent observers.
 Spearman’s rank correlation
coefficient is a distribution-free
measure (which does not make any
assumptions about the parameters
of the population), since no strict
assumptions are made about the
form of the population from which
sample observations are drawn.
Example
 Ten competitors in a beauty contest were ranked by two judges in the following order:

First judge     1   6   5   10   3   2   4   9    7   8
Second judge    6   4   9   8    1   2   3   10   5   7

Calculate Spearman's rank correlation coefficient. Is there an association between the rankings?
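When no ranks are tied, Spearman's coefficient is usually computed as R_s = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks given to the same competitor and n is the number of competitors. A minimal sketch for the two judges above (the function name `spearman_from_ranks` is hypothetical):

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's R_s for two untied rankings of the same n items."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

first_judge = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
second_judge = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rs = spearman_from_ranks(first_judge, second_judge)
print(round(rs, 3))  # 0.636
```

Here Σd² = 60, so R_s = 1 - 360/990, which suggests a moderate positive association between the two rankings.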
Example
 Ten competitors in a beauty contest were ranked by three judges in the following order:

First judge     1   6   5   10   3   2    4   9    7   8
Second judge    3   5   8   4    7   10   2   1    6   9
Third judge     6   4   9   8    1   2    3   10   5   7

 Use the method of rank correlation to determine which pair of judges has the nearest approach to common tastes in beauty.
Example
 A large manufacturing firm wants to determine whether a relationship exists between the number of work-hours an employee misses per year and the employee's annual wages (in thousands of rupees). A sample of 15 employees produced the data shown in the following table.
 Calculate Spearman's rank correlation coefficient as a measure of the strength of the relationship between work-hours missed and annual wages.

Employee   Hours   Wages
1          49      15.8
2          36      17.5
3          127     11.3
4          91      13.2
5          72      13
6          34      14.5
7          155     11.8
8          11      20.2
9          191     10.8
10         6       18.8
11         63      13.8
12         79      12.7
13         43      15.1
14         57      24.2
15         82      13.9
Tied / Repeated Ranks
 When two or more individuals get the same rank with respect to either of the two characteristics being studied, a common rank is assigned to the repeated observations.
 This common rank is the average of the ranks which these observations would have received if they had been different from one another.
 EXAMPLE: If two observations were ranked equal at the fourth place, then both would be given the rank (4 + 5)/2 = 4.5, the next observation would be ranked 6, and so on.
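Then the rank correlation coefficient is given by:
\[ R_s = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}\, m_1\left(m_1^2 - 1\right) + \frac{1}{12}\, m_2\left(m_2^2 - 1\right) + \cdots\right\}}{n\left(n^2 - 1\right)} \]
where d is the difference between the two ranks of an observation, n is the number of pairs, and $m_i$ is the number of times the i-th repeated item is repeated, i = 1, 2, ...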
Example
 The scores of 10 students on the midterm examination and the final examination are given. Compute the rank correlation coefficient.

Student   Midterm score   Final exam score
Neha      82              94
Chani     81              92
Aditi     80              85
Sumit     68              75
Aditya    70              73
Mohit     92              95
Reha      76              69
Rahul     80              86
Sakshi    86              90
Charu     62              69
Example
 An examination of 8 applicants for a clerical post was taken by a firm. From the marks obtained by the applicants in the accounts and statistics papers, compute the rank correlation coefficient.

Applicant   Marks in accounts   Marks in statistics
A           15                  40
B           20                  30
C           28                  50
D           12                  30
E           40                  20
F           60                  10
G           20                  30
H           80                  60
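The tie-corrected rank formula can be sketched on the applicants' marks above, where 20 occurs twice in accounts and 30 occurs three times in statistics. The sketch ranks from the smallest mark upward, which is an assumption; ranking from the largest downward gives the same coefficient as long as both series are ranked in the same direction. All function names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the average rank."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    """Spearman's R_s with the correction term added for repeated ranks."""
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

accounts = [15, 20, 28, 12, 40, 60, 20, 80]
stats_marks = [40, 30, 50, 30, 20, 10, 30, 60]

print(average_ranks(accounts))  # [2.0, 3.5, 5.0, 1.0, 6.0, 7.0, 3.5, 8.0]
print(spearman_with_ties(accounts, stats_marks))  # 0.0
```

For these marks Σd² = 81.5 and the two correction terms add 0.5 and 2, so the numerator is 6 × 84 = 504 = 8 × 63 and the coefficient comes out to zero.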
EXAMPLE

 The coefficient of correlation between X and Y for 20 items is 0.3. The mean of X is 15 and that of Y is 20, while the standard deviations are 4 and 5 respectively. At the time of calculation, one item was wrongly taken as 17 instead of 27 in the X series and as 35 instead of 30 in the Y series. Find the correct coefficient of correlation.
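One way to approach this kind of correction is to recover the raw sums from the reported summary figures, swap the wrong values for the right ones, and recompute r. The sketch below assumes that the standard deviations were computed with divisor n and that the wrong X and Y values belong to the same pair; variable names are illustrative only.

```python
from math import sqrt

n, r_reported = 20, 0.3
mean_x, mean_y = 15, 20
sd_x, sd_y = 4, 5

# Recover the (incorrect) raw sums from the summary figures.
sum_x = n * mean_x                                        # 300
sum_y = n * mean_y                                        # 400
sum_x2 = n * (sd_x ** 2 + mean_x ** 2)                    # 4820
sum_y2 = n * (sd_y ** 2 + mean_y ** 2)                    # 8500
sum_xy = n * (r_reported * sd_x * sd_y + mean_x * mean_y) # 6120

# Replace the wrongly recorded pair (17, 35) with the correct pair (27, 30).
wrong_x, right_x = 17, 27
wrong_y, right_y = 35, 30
sum_x += right_x - wrong_x
sum_y += right_y - wrong_y
sum_x2 += right_x ** 2 - wrong_x ** 2
sum_y2 += right_y ** 2 - wrong_y ** 2
sum_xy += right_x * right_y - wrong_x * wrong_y

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_corrected = numerator / denominator
print(round(r_corrected, 3))  # 0.515
```

Under these assumptions the corrected sums are ΣX = 310, ΣY = 395, ΣX² = 5260, ΣY² = 8175 and ΣXY = 6335, giving a corrected coefficient of about 0.515.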
EXAMPLE
 Calculate the coefficient of correlation between the values of X and Y.
 If each X value is multiplied by 2 and increased by 3, and each Y value is multiplied by 3 and decreased by 4, find the new coefficient of correlation. Comment on the result.

X    Y
21   18
22   20
28   25
32   30
35   31
36   32
