
CORRELATION

18
STATISTICAL RELATIONSHIP

A statistical relationship exists if a change in one variable (X) is accompanied by a systematic
change (increase/decrease) in another variable (Y).
We study relationships of the following types:
• Relation between height and weight
• Relation between height and age
• Relation between price and demand

19
STATISTICAL RELATIONSHIP

• Such a relationship is called a statistical relationship.
• Special methods have been developed to discover the existence
of statistical relationship between two variables.
• When both the variables are quantitative we use the term
correlation analysis to describe the methods designed to find
out if the statistical relationship between the two variables
exists or not.

20
CORRELATION

• Correlation is a measure of the degree to which two variables vary
together. In other words, two variables are said to be correlated if they
tend to vary together in a consistent direction.
• Correlation analysis is a statistical method used to determine whether a
relationship between variables exists.
• The correlation studied here is the linear association between two random variables.

21
TYPES OF CORRELATION

1. POSITIVE CORRELATION
• A positive relationship exists when both
variables increase or decrease at the same time.
For instance, a person’s height and weight are related, and the
relationship is positive, since the taller a person
is, generally, the more the person weighs.

22
TYPES OF CORRELATION

2. NEGATIVE CORRELATION
• In a negative relationship, as one variable
increases, the other variable decreases, and vice versa.
• For example, if you measure the strength of people
over 60 years of age, you will find that as age
increases, strength generally decreases. The word
"generally" is used here because there are exceptions.
• The volume of a gas decreases as the pressure
increases (at constant temperature).
• GPA tends to decrease as the hours of video games
played increase.

23
TYPES OF CORRELATION

3. NO CORRELATION
• No correlation means that the two variables do not follow the same or
opposite trend together.
• Example: the number of freckles on a person’s face and the
number of T-shirts they own.
• Example: the average time spent watching TV in a week and the size of the
television.

24
CORRELATION ANALYSIS

Correlation Analysis involves various methods and techniques used for studying and
measuring the extent of the relationship between two variables.
Two methods are most commonly used to study correlation:
• Scatter Diagram
• Pearson Product Moment Coefficient of Correlation

25
SCATTER DIAGRAM

• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the variable x and
the variable y.
• The scatter plot is a visual way to describe the nature of the relationship between the independent and
dependent variables.
• The scales of the variables can be different, and the coordinates of the axes are determined by the smallest
and largest data values of the variables.
• A first step in finding whether or not a relationship between two variables exists is to plot each
pair of observations (X, Y) along two axes; the pattern of the resulting points reveals any
correlation present.
• If a relationship between the variables exists, then the points in the scatter diagram will show a
tendency to cluster around a straight line or some curve.

Video Aid: https://youtu.be/sHbX58y5D4U
26
EXAMPLE 1
CONSTRUCT A SCATTER PLOT FOR THE DATA SHOWN FOR CAR RENTAL COMPANIES IN THE
UNITED STATES FOR A RECENT YEAR.

Company   Cars (in ten thousands)   Revenue (in billions)
A         63                        7
B         29                        3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F         8.5                       1.5

27
USING EXCEL TO DRAW A SCATTER PLOT

[Figure: scatter plot of Revenue (in billions) against Cars (in ten thousands)]

The above plot suggests a positive relationship,
since as the number of cars rented increases,
revenue tends to increase also.

28
EXAMPLE 2
CONSTRUCT A SCATTER PLOT FOR THE DATA OBTAINED IN A STUDY ON THE NUMBER OF
ABSENCES AND THE FINAL GRADES OF SEVEN RANDOMLY SELECTED STUDENTS FROM A
STATISTICS CLASS. THE DATA ARE SHOWN HERE.

Student   Number of absences (X)   Final grade % (Y)
A         6                        82
B         2                        86
C         15                       43
D         9                        74
E         12                       58
F         5                        90
G         8                        78

[Figure: scatter plot of Final Grade (%) against Number of Absences]

The plot of the data suggests a negative relationship,
since as the number of absences increases, the final
grade decreases.
29
EXAMPLE 3
CONSTRUCT A SCATTER PLOT FOR THE DATA OBTAINED IN A STUDY ON THE NUMBER OF
HOURS THAT NINE PEOPLE EXERCISE EACH WEEK AND THE AMOUNT OF MILK (IN OUNCES)
EACH PERSON CONSUMES PER WEEK. THE DATA ARE SHOWN.

Subject   Hours (X)   Amount (Y)
A         3           48
B         0           8
C         2           32
D         5           64
E         8           10
F         5           32
G         10          56
H         2           72
I         1           48

[Figure: scatter plot of Amount against Hours]

The plot of the data shows no specific type of
relationship, since no pattern is discernible.
30
PEARSON PRODUCT MOMENT
CORRELATION COEFFICIENT

• A numerical measure of the strength of the linear relationship between two variables is called
Pearson’s product moment correlation coefficient or, sometimes, the coefficient of simple correlation.
• The coefficient of correlation between the two variables (X, Y), generally denoted by ‘r’ or ‘r_xy’, is
defined by:

\[ r_{xy} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^{2} \, \sum (Y-\bar{Y})^{2}}} \]
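As a minimal sketch of the definition above, the deviation form can be written in a few lines of plain Python. The function name `pearson_r` and the error handling are illustrative choices, not part of the slides; the sample data are the absences and final grades from the scatter plot example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (definitional form).

    The name and the error checks are illustrative assumptions.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sums of squared deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)

# Absences (X) and final grades (Y) of the seven students
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]
print(round(pearson_r(absences, grades), 3))  # about -0.944
```

The negative value agrees with the downward pattern described for that scatter plot.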

31
COEFFICIENT OF CORRELATION

• For computational purposes, we have an alternative form of ‘r’:

\[ r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\{n\sum X^{2} - (\sum X)^{2}\}\{n\sum Y^{2} - (\sum Y)^{2}\}}} \]

• This is a more convenient and useful form.
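The computational form is algebraically the same as the definition. A short derivation, using only \(\bar{X} = \sum X / n\) and \(\bar{Y} = \sum Y / n\), fills in the step:

\[ \sum (X-\bar{X})(Y-\bar{Y}) = \sum XY - n\bar{X}\bar{Y} = \frac{n\sum XY - \sum X \sum Y}{n} \]

\[ \sum (X-\bar{X})^{2} = \sum X^{2} - n\bar{X}^{2} = \frac{n\sum X^{2} - (\sum X)^{2}}{n}, \qquad \sum (Y-\bar{Y})^{2} = \frac{n\sum Y^{2} - (\sum Y)^{2}}{n} \]

Substituting these into the definition, the factor \(1/n\) in the numerator cancels against \(\sqrt{(1/n)(1/n)} = 1/n\) in the denominator, which leaves the computational form.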


• The coefficient of correlation ‘r’ is a pure number (independent of the units in
which the variables are measured). It takes values ranging from +1 for a
perfect positive linear relationship to -1 for a perfect negative linear relationship,
with the intermediate value of zero indicating no linear relationship between X and
Y.

32
Correlation Coefficient Interpretation Guideline
The correlation coefficient ranges from -1 (a perfect negative correlation) to +1 (a perfect positive correlation).

[Figure: scale from -1 to +1 marked in steps of 0.2, with bands labelled very strong, strong, moderate, weak and very weak correlation]

33
GRAPHICAL INTERPRETATION OF R

Perfect Correlation:
r = 1 or r = −1

Strong Correlation:
r > 0.70 or r < −0.70

Moderate Correlation:
0.30 ≤ r ≤ 0.70 or −0.70 ≤ r ≤ −0.30

Weak Correlation:
0 ≤ r < 0.30 or −0.30 < r < 0
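The cut-offs on this slide can be turned into a small lookup. The sketch below assumes the 0.30 and 0.70 thresholds given here (other textbooks use different bands), and the function name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Label the strength of a correlation using the 0.30 / 0.70 cut-offs.

    Only the magnitude |r| matters for strength; the sign gives the direction.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no linear relationship"
    return strength, direction

for value in (0.982, -0.98, 0.45, -0.10):
    print(value, describe_strength(value))
```

Because strength depends only on the absolute value, r = -0.98 is classed as strong in the same way as r = 0.982.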

34
EXAMPLE 1 (SLIDE #27)

Company   Cars (in ten thousands) X   Revenue (in billions) Y   XY        X²         Y²
A         63                          7                         441       3969       49
B         29                          3.9                       113.10    841        15.21
C         20.8                        2.1                       43.68     432.64     4.41
D         19.1                        2.8                       53.48     364.81     7.84
E         13.4                        1.4                       18.76     179.56     1.96
F         8.5                         1.5                       12.75     72.25      2.25
Total     ΣX = 153.8                  ΣY = 18.7                 ΣXY = 682.77   ΣX² = 5859.26   ΣY² = 80.67

35
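\[ r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\{n\sum X^{2} - (\sum X)^{2}\}\{n\sum Y^{2} - (\sum Y)^{2}\}}} \]

\[ r_{xy} = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{\{(6 \times 5859.26) - (153.8)^{2}\}\{(6 \times 80.67) - (18.7)^{2}\}}} \]

\[ r_{xy} = \frac{4096.62 - 2876.06}{\sqrt{11501.12 \times 134.33}} = \frac{1220.56}{1242.95} \]

\[ r_{xy} = 0.982 \]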

Interpretation:
The correlation coefficient suggests a strong positive linear relationship between the number of cars a rental agency has and its
annual revenue.
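The column totals and the final value can be checked with a short script. This is a sketch of the computational form applied to the six rental companies; the function name `pearson_r_sums` is an illustrative choice.

```python
from math import sqrt

def pearson_r_sums(xs, ys):
    """Pearson's r from raw sums (computational form); also returns the sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator, (sum_x, sum_y, sum_xy, sum_x2, sum_y2)

cars = [63, 29, 20.8, 19.1, 13.4, 8.5]      # in ten thousands
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]      # in billions

r, sums = pearson_r_sums(cars, revenue)
print([round(s, 2) for s in sums])  # compare with the Total row of the table
print(round(r, 3))                  # about 0.982
```

The raw-sum form is convenient by hand, but with floating-point data of large magnitude the deviation form is usually the numerically safer choice.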

36
EXAMPLE 3: THE FOLLOWING TABLE SHOWS THE PRICE (IN 1000 $) AND DEMAND
(IN 100S) OF ELECTRIC FANS IN A SUMMER. CALCULATE THE COEFFICIENT OF
CORRELATION BETWEEN X AND Y AND INTERPRET THE RESULTS.

Price (X)   Demand (Y)
3           17
4           16
5           15
6           13
7           11
8           10
9           8
10          9
11          7
12          6

Solution:
First we have to find the arithmetic mean of X and Y.
For X,
\[ \bar{X} = \frac{\sum X}{n} = \frac{75}{10} = 7.5 \]
For Y,
\[ \bar{Y} = \frac{\sum Y}{n} = \frac{112}{10} = 11.2 \]
37
CALCULATIONS
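The calculation table for this slide did not survive extraction. As a hedged reconstruction, the sketch below rebuilds the deviation columns from the price and demand data of the example and the means 7.5 and 11.2; the column layout is an assumption, while the three totals are the ones used in the formula that follows.

```python
from math import sqrt

price = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]        # X
demand = [17, 16, 15, 13, 11, 10, 8, 9, 7, 6]    # Y

n = len(price)
x_bar = sum(price) / n    # 7.5
y_bar = sum(demand) / n   # 11.2

print(f"{'X':>4}{'Y':>4}{'X-Xbar':>8}{'Y-Ybar':>8}"
      f"{'product':>9}{'(X-Xbar)^2':>12}{'(Y-Ybar)^2':>12}")

s_xy = s_xx = s_yy = 0.0
for x, y in zip(price, demand):
    dx, dy = x - x_bar, y - y_bar
    s_xy += dx * dy       # running sum of cross-products
    s_xx += dx * dx       # running sum of squared X deviations
    s_yy += dy * dy       # running sum of squared Y deviations
    print(f"{x:>4}{y:>4}{dx:>8.1f}{dy:>8.1f}"
          f"{dx * dy:>9.2f}{dx * dx:>12.2f}{dy * dy:>12.2f}")

print(f"Totals: {s_xy:.2f}  {s_xx:.2f}  {s_yy:.2f}")  # -104.00  82.50  135.60
r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 2))  # -0.98
```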

38
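Putting the sums in the formula, we get

\[ r_{xy} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^{2} \, \sum (Y-\bar{Y})^{2}}} \]

\[ r_{xy} = \frac{-104.00}{\sqrt{(82.50)(135.60)}} \]

\[ r_{xy} = \frac{-104.00}{105.77} = -0.98 \]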

The value r = -0.98 shows that a strong negative linear
relationship exists between price (X) and demand (Y). This
means that as the price increases, the demand for electric fans
decreases.
39
PROPERTIES OF ‘R’

• The correlation coefficient r is symmetrical with respect to the variables X and Y:
\[ r_{xy} = r_{yx} \]
• The correlation coefficient r lies between -1 and +1:
\[ -1 \le r \le +1 \]
• The correlation coefficient r is independent of origin and scale. If u and v are
obtained from X and Y by a change of origin and scale, then:
\[ r_{xy} = r_{uv} \]
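A brief justification of the last property, as a sketch: the constants a, b, h and k below are notation introduced here for illustration and do not appear on the slides. Let

\[ u = \frac{X-a}{h}, \qquad v = \frac{Y-b}{k}, \qquad h > 0,\; k > 0. \]

Then \(u-\bar{u} = (X-\bar{X})/h\) and \(v-\bar{v} = (Y-\bar{Y})/k\), so

\[ r_{uv} = \frac{\frac{1}{hk}\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\frac{1}{h^{2}}\sum (X-\bar{X})^{2} \cdot \frac{1}{k^{2}}\sum (Y-\bar{Y})^{2}}} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^{2} \, \sum (Y-\bar{Y})^{2}}} = r_{xy}. \]

The shift constants a and b drop out entirely, and the scale factors cancel. If exactly one of h and k were negative, the magnitude of r would be unchanged but its sign would flip.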

40
EXAMPLE 6:

The following table shows the price ($) and supply (in 100s) of refrigerators in
summer. Find the correlation between X and Y.

Price (X)    78   89   97   69   59   79   68   61
Supply (Y)   125  137  156  112  107  136  123  108

If u = X - 69 and v = Y - 112,
then show that r_xy = r_uv.

41
SOLUTION

Sr. no.   X          Y           XY            X²           Y²
1         78         125         9750          6084         15625
2         89         137         12193         7921         18769
3         97         156         15132         9409         24336
4         69         112         7728          4761         12544
5         59         107         6313          3481         11449
6         79         136         10744         6241         18496
7         68         123         8364          4624         15129
8         61         108         6588          3721         11664
Total     ΣX = 600   ΣY = 1004   ΣXY = 76812   ΣX² = 46242  ΣY² = 128012

42
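\[ r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\{n\sum X^{2} - (\sum X)^{2}\}\{n\sum Y^{2} - (\sum Y)^{2}\}}} \]

\[ r_{xy} = \frac{8(76812) - (600)(1004)}{\sqrt{\{8(46242) - (600)^{2}\}\{8(128012) - (1004)^{2}\}}} \]

\[ r_{xy} = \frac{614496 - 602400}{\sqrt{9936 \times 16080}} = \frac{12096}{12640.0506} \]

\[ r_{xy} = 0.9570 \]
This shows a strong positive correlation between price and supply.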

43
LET'S CALCULATE U = X-69 AND V = Y-112

Sr. no.   X          Y           u = X-69   v = Y-112   u²           v²           uv
1         78         125         9          13          81           169          117
2         89         137         20         25          400          625          500
3         97         156         28         44          784          1936         1232
4         69         112         0          0           0            0            0
5         59         107         -10        -5          100          25           50
6         79         136         10         24          100          576          240
7         68         123         -1         11          1            121          -11
8         61         108         -8         -4          64           16           32
Total     ΣX = 600   ΣY = 1004   Σu = 48    Σv = 108    Σu² = 1530   Σv² = 3468   Σuv = 2160

44
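Putting the values in the formula, we get

\[ r_{uv} = \frac{n\sum uv - \sum u \sum v}{\sqrt{\{n\sum u^{2} - (\sum u)^{2}\}\{n\sum v^{2} - (\sum v)^{2}\}}} \]

\[ r_{uv} = \frac{8(2160) - (48)(108)}{\sqrt{\{8(1530) - (48)^{2}\}\{8(3468) - (108)^{2}\}}} \]

\[ r_{uv} = \frac{17280 - 5184}{\sqrt{9936 \times 16080}} = \frac{12096}{12640.0506} \]

\[ r_{uv} = 0.9570 \]

So r_uv = r_xy, which illustrates that the correlation coefficient is independent of origin. The same result holds under a change of scale.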
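The comparison can be repeated numerically. The sketch below recomputes r for the refrigerator data before and after the shift u = X - 69, v = Y - 112, and adds one hypothetical change of scale (dividing u by 10 and v by 4) that is not part of the example, to show the scale half of the property as well.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r using the computational (raw sums) form."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

price = [78, 89, 97, 69, 59, 79, 68, 61]
supply = [125, 137, 156, 112, 107, 136, 123, 108]

# Change of origin used in the example
u = [x - 69 for x in price]
v = [y - 112 for y in supply]

# Hypothetical change of scale (divisors chosen only for illustration)
u_scaled = [value / 10 for value in u]
v_scaled = [value / 4 for value in v]

r_xy = pearson_r(price, supply)
r_uv = pearson_r(u, v)
r_scaled = pearson_r(u_scaled, v_scaled)

print(round(r_xy, 4), round(r_uv, 4), round(r_scaled, 4))
print(isclose(r_xy, r_uv), isclose(r_xy, r_scaled))
```

All three values agree up to floating-point rounding, matching the hand calculation of about 0.9570.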

45
PRACTICE QUESTIONS

1. Draw a scatter plot and calculate the coefficient of correlation of the supply and
demand data given below.

Supply (X)   1  2  3  4  5   6   7   8
Demand (Y)   3  4  6  8  10  12  14  15

2. If n = 10, Σxy = 736, x̄ = 7.5, ȳ = 11.2, Σ(x − x̄)² = 82.50 and Σ(y − ȳ)² = 135.60,
find r_xy.

3. For the following correlations
• r1 = 0.29, r2 = -0.63, r3 = 0.15, r4 = -0.34, r5 = 0.04
• Which one is strongest and which one is weakest, and why?

46
