
Module-3.

Exploring the Bivariate Data using Correlation and Regression Analysis

Dr. K.Mahipal Reddy


Ph.D (IIT Madras)
Assistant Professor
VIT-AP University, Amaravati

March 2, 2024
Syllabus

Module No. 2: Exploring the Data using Correlation and Regression

Correlation
Regression line.

Exploring the Bivariate Data using Correlation and Regression Dr. K.M.REDDY
/ 18
Motivation for Correlation
The statistical techniques discussed so far deal with only one variable.
In many research situations one has to consider two variables
simultaneously to know whether they are linearly related and,
if so, what type of relationship exists between them.
This leads to bivariate (two-variable) data analysis, namely
correlation analysis.
The concept of correlation helps to answer questions of the following
types.
Is study time (in hours) related to the marks scored in the
examination?
Is it worth spending on advertisement for the promotion of sales?
Is there any relationship between exercise (hours) and blood pressure?
Are the price of a commodity and its demand related?
Is there any relationship between rainfall and the production of rice?
Definition of correlation
Correlation is a statistical measure which helps in analyzing the
interdependence of two or more variables.
Correlation analysis attempts to measure the strength of relationships
between two variables by means of a single number called a
correlation coefficient.
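The measure ρ of linear association between two variables X and Y is
estimated by Karl Pearson's correlation coefficient r, where

$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad \text{or} \quad r = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}, \quad \text{or} \quad r = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}}{\sqrt{\frac{\sum x_i^2}{n} - (\bar{x})^2}\,\sqrt{\frac{\sum y_i^2}{n} - (\bar{y})^2}}, $$

$$ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), $$

Cov(X,Y) is the covariance of X and Y, $\bar{x} = \frac{\sum x_i}{n}$, $\bar{y} = \frac{\sum y_i}{n}$.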
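As a minimal sketch of the formulas above, the following Python computes r in two of the equivalent forms and shows that they agree. The function names and the small data set are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(x, y):
    """r = Sxy / sqrt(Sxx * Syy), using deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)

def pearson_r_raw(x, y):
    """Same r from raw sums: (sum(x*y)/n - x_bar*y_bar) / (sd_x * sd_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    sd_x = math.sqrt(sum(xi ** 2 for xi in x) / n - x_bar ** 2)
    sd_y = math.sqrt(sum(yi ** 2 for yi in y) / n - y_bar ** 2)
    return cov / (sd_x * sd_y)

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))      # 0.7746
print(round(pearson_r_raw(x, y), 4))  # 0.7746
```

The deviation form is usually the safer one numerically; the raw-sum form is convenient for hand calculation from a table of x, y, x², y² and xy.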
Properties of correlation

Covariance: Covariance is a statistical measure of how two variables
vary together.
The covariance is given by
$$ \mathrm{Cov}(X,Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{N} $$
If Cov(X,Y) > 0 then X and Y are positively correlated
If Cov(X,Y) < 0 then X and Y are negatively correlated
If Cov(X,Y) = 0 then X and Y are uncorrelated
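The sign rule above can be sketched in a few lines of Python. The helper names, the tolerance `tol` and the sample data are assumptions made for illustration only.

```python
def covariance(x, y):
    """Cov(X, Y) = sum((x - x_bar) * (y - y_bar)) / N."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

def direction(cov, tol=1e-12):
    """Classify by the sign of the covariance; tol absorbs floating-point noise."""
    if cov > tol:
        return "positively correlated"
    if cov < -tol:
        return "negatively correlated"
    return "uncorrelated"

hours = [1, 2, 3, 4]      # hypothetical study hours
marks = [40, 55, 60, 75]  # hypothetical marks
c = covariance(hours, marks)
print(c, direction(c))    # 13.75 positively correlated
```

The covariance only gives the direction of the linear relationship; its size depends on the units of X and Y, which is why r divides it by the two standard deviations.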

Types of simple correlations

Regression is the estimation or prediction of unknown values of one
variable from known values of another variable.
A line of regression is the straight line which gives the best fit, in the
least-squares sense, to the given data.
The line of regression of y on x is given by
$$ y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}) $$
The regression coefficient of y on x is
$$ b_{yx} = \frac{\sum xy}{\sum x^2} = r \frac{\sigma_y}{\sigma_x} $$
Here r is the correlation coefficient, x̄ is the mean of x, ȳ is the mean of y, and in the sums x and y denote deviations from their means.
Similarly, the line of regression of x on y is given by
$$ x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y}) $$
The regression coefficient of x on y is
$$ b_{xy} = \frac{\sum xy}{\sum y^2} = r \frac{\sigma_x}{\sigma_y} $$
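A minimal Python sketch of both regression lines follows. The function name and the data are hypothetical. Multiplying the two coefficient formulas above gives b_yx · b_xy = r², which the example confirms.

```python
def regression_lines(x, y):
    """Return (b_yx, a_yx, b_xy, a_xy) for the lines
    y = a_yx + b_yx * x  (y on x)  and  x = a_xy + b_xy * y  (x on y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_yx = s_xy / s_xx  # regression coefficient of y on x
    b_xy = s_xy / s_yy  # regression coefficient of x on y
    return b_yx, y_bar - b_yx * x_bar, b_xy, x_bar - b_xy * y_bar

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, a_yx, b_xy, a_xy = regression_lines(x, y)
print(round(b_yx, 4), round(a_yx, 4))  # 0.6 2.2
print(round(b_xy, 4), round(a_xy, 4))  # 1.0 -1.0
print(round(b_yx * b_xy, 4))           # 0.6, which equals r squared
```

The two lines are different unless r = ±1; both pass through the point (x̄, ȳ).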
1. Example of Correlation and Regression

The average prices of stocks and bonds listed on the New York Stock
Exchange during the years 1950 through 1959 are given as

(a) Find the correlation coefficient.


(b) Interpret the results.
(c) Fit a regression line of y on x.

The tabular form is

(a) Correlation coefficient:
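$$ \bar{x} = \frac{\sum x_i}{n} = \frac{452.64}{10} = 45.26, \quad \bar{y} = \frac{\sum y_i}{n} = \frac{975.16}{10} = 97.52, \quad n = 10. $$
$$ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = 449.38, \quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 93.69, $$
$$ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = -94.67 $$
The correlation coefficient is
$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-94.67}{\sqrt{(449.38)(93.69)}} = -0.4614 $$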
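The arithmetic in part (a) can be checked with a short Python snippet. It uses only the three sums quoted above; the variable names are illustrative.

```python
import math

# Sums quoted in part (a) of the example.
s_xx, s_yy, s_xy = 449.38, 93.69, -94.67

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))  # -0.4614
```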

(b) Interpretation of the result (conclusion): there is a moderate negative
correlation between stock and bond prices.

(c) Fit a regression line of y on x
The line of regression of y on x is given by
$$ y - \bar{y} = b_{yx} (x - \bar{x}), \quad b_{yx} = r \frac{\sigma_y}{\sigma_x} $$
where the regression coefficient of y on x is
$$ b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{S_{xy}}{S_{xx}} = \frac{-94.67}{449.38} = -0.21 $$
$$ \bar{x} = \frac{\sum x_i}{n} = \frac{452.64}{10} = 45.26, \quad \bar{y} = \frac{\sum y_i}{n} = \frac{975.16}{10} = 97.52 $$
The regression line of y on x is therefore
$$ y - 97.52 = (-0.21)(x - 45.26) $$
$$ y = (-0.21)x + 97.52 + 0.21 \times 45.26 $$
$$ y = (-0.21)x + 107.02 $$
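The fitted line can be reproduced from the summary values with the sketch below. The variable names and the `predict` helper are assumptions for illustration. Note that the hand calculation rounds the slope to −0.21 before computing the intercept; carrying the unrounded slope shifts the intercept slightly.

```python
# Summary values quoted in the example.
s_xy, s_xx = -94.67, 449.38
x_bar, y_bar = 45.26, 97.52

b_yx = s_xy / s_xx                              # unrounded slope, about -0.2107
b_rounded = round(b_yx, 2)                      # -0.21, as in the hand calculation
intercept_rounded = y_bar - b_rounded * x_bar   # about 107.02
intercept_full = y_bar - b_yx * x_bar           # about 107.05

def predict(x, slope=b_rounded, intercept=intercept_rounded):
    """Predicted y for a given x on the fitted line (hypothetical helper)."""
    return intercept + slope * x

print(b_rounded, round(intercept_rounded, 2))  # -0.21 107.02
print(round(intercept_full, 2))                # 107.05
```

Either way the line passes through (x̄, ȳ), so `predict(45.26)` returns 97.52 up to floating-point error.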
2. Example of Correlation and Regression

Find the correlation coefficient between x and y

and also fit a regression line of x on y.


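Answer: The correlation coefficient is
$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{1591}{\sqrt{1368 \times \frac{4111}{2}}} \approx \frac{352}{371} \approx 0.9487863 $$
where
$$ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad \bar{x} = \frac{\sum x_i}{n}, \quad \bar{y} = \frac{\sum y_i}{n}. $$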

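The line of regression of x on y is given by
$$ x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y}) $$

The regression coefficient of x on y is
$$ b_{xy} = \frac{\sum xy}{\sum y^2} = r \frac{\sigma_x}{\sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

$$ \bar{x} = 74.5, \quad \bar{y} = 125.75, \quad b_{xy} = 0.77 $$

The line of regression of x on y is
$$ x - 74.5 = 0.77(y - 125.75) $$
$$ x = 0.77y + 74.5 - 0.77 \times 125.75 = 0.77y - 22.33 $$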
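The numbers in this example can be cross-checked with the sketch below. It assumes that the sums in the answer read S_xx = 1368, S_yy = 4111/2 and S_xy = 1591 (an interpretation of the figures shown in the answer, since the data table itself is not reproduced here). With those values the two regression coefficients multiply to r², which recovers the stated correlation coefficient.

```python
import math

# Assumed reading of the sums in the answer (the data table is not available).
s_xx, s_yy, s_xy = 1368, 4111 / 2, 1591
x_bar, y_bar = 74.5, 125.75

b_xy = s_xy / s_yy            # regression coefficient of x on y, about 0.774
b_yx = s_xy / s_xx            # regression coefficient of y on x
r = math.sqrt(b_xy * b_yx)    # positive root, because s_xy > 0

# Intercept using the slope rounded to two decimals, as in the hand calculation.
intercept = x_bar - round(b_xy, 2) * y_bar

print(round(b_xy, 2))       # 0.77
print(round(r, 7))          # 0.9487863
print(round(intercept, 2))  # -22.33
```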

3. Example of Correlation and Regression

4. Example of Correlation and Regression

Find the correlation coefficient between x and y from the data

Answer: The correlation coefficient is
$$ r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-\frac{3737}{3}}{\sqrt{944 \times \frac{15488}{9}}} \approx \frac{-1250}{1279} \approx -0.9773 $$

5. Example of Correlation and Regression

Find the correlation coefficient between x and y from the data

Answer: The correlation coefficient is
$$ r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = -0.9581 $$

Spearman’s rank correlation

A group of n individuals may be arranged in order of merit with respect to
some characteristic. The same group would give a different order for
different characteristics.
Consider the orders corresponding to two characteristics A and B. The
correlation between the n pairs of ranks is called the rank correlation
in the characteristics A and B for that group of individuals.
The rank correlation coefficient is calculated by

$$ \rho = 1 - \frac{6\left[\sum d_i^2 + \text{c.f.}\right]}{n(n^2 - 1)} $$

where d_i is the difference between the two ranks of the i-th individual and c.f. is the correction factor for repeated ranks.

In some cases, we may have two or more equal observations in either
of the two series or in both series. In such cases we award a common
rank to the repeated items and then find the rank correlation.
The correction factor (c.f.) is obtained individually for every repeated
rank in the x-series as well as in the y-series.
Finally, the total correction factor is added to $\sum d_i^2$
to obtain the rank correlation coefficient.
The rank correlation coefficient is calculated by

$$ \rho = 1 - \frac{6\left[\sum d_i^2 + \text{c.f.}\right]}{n(n^2 - 1)} $$

Here n is the number of given pairs.

The correction factor (c.f.) is given by
$$ \text{c.f.} = \frac{m(m^2 - 1)}{12} $$
where m is the number of times a rank is repeated.
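The correction factor lends itself to a short Python sketch. The function names and the tied sample values below are hypothetical; the formula m(m² − 1)/12 is the one stated above, summed over every repeated value in each series.

```python
from collections import Counter

def correction_factor(values):
    """Sum m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m * m - 1) / 12
               for m in Counter(values).values() if m > 1)

def total_correction(x, y):
    """Total c.f. = c.f. of the x-series + c.f. of the y-series."""
    return correction_factor(x) + correction_factor(y)

# Hypothetical series: one value repeated twice in x, one repeated thrice in y.
x = [10, 20, 20, 30]
y = [5, 5, 5, 7]
print(correction_factor(x))    # 0.5
print(correction_factor(y))    # 2.0
print(total_correction(x, y))  # 2.5
```

Counting repeated observations is equivalent to counting repeated ranks, because equal observations always receive the same common rank.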
Example 1:
Ten students got the following percentage of marks in chemistry and
physics
Student              1   2   3   4   5   6   7   8   9  10
Marks in Chemistry  78  36  98  25  75  82  90  62  65  39
Marks in Physics    84  51  91  60  68  62  86  58  63  47

Calculate the rank correlation coefficient.

Solution:
Student  Marks in   Marks in  Rank (x_i)  Rank (y_i)  d_i = x_i - y_i  d_i^2
         Chemistry  Physics
1        78         84        4           3           1                1
2        36         51        9           9           0                0
3        98         91        1           1           0                0
4        25         60        10          7           3                9
5        75         68        5           4           1                1
6        82         62        3           6           -3               9
7        90         86        2           2           0                0
8        62         58        7           8           -1               1
9        65         63        6           5           1                1
10       39         47        8           10          -2               4
Here $\sum d_i^2 = 26$, n = 10.
Rank correlation coefficient
$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(26)}{10(99)} = 0.8424 $$
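The ranking and the coefficient in this example can be reproduced with the sketch below. The helper names are assumptions; the ranking helper gives rank 1 to the highest mark and is valid only when there are no ties, as is the case here.

```python
def ranks_desc(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Return (rho, sum of squared rank differences) for untied data."""
    rx, ry = ranks_desc(x), ranks_desc(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

chemistry = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
physics = [84, 51, 91, 60, 68, 62, 86, 58, 63, 47]

rho, sum_d2 = spearman_rho(chemistry, physics)
print(ranks_desc(chemistry))   # [4, 9, 1, 10, 5, 3, 2, 7, 6, 8]
print(sum_d2, round(rho, 4))   # 26 0.8424
```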

Example 2:
• Calculate the rank correlation coefficient of the data:
X   80   78   75   75   68   57   60   59
Y  110  111  114  114  114  116  115  117
Solution:
Calculation
x    y    rank x_i  rank y_i  d_i = x_i - y_i  d_i^2
80   110  1         8         -7               49
78   111  2         7         -5               25
75   114  3.5       5         -1.5             2.25
75   114  3.5       5         -1.5             2.25
68   114  5         5         0                0
57   116  8         2         6                36
60   115  6         3         3                9
59   117  7         1         6                36

$$ \sum d_i^2 = 159.5 $$
• In the x-series: rank 3.5 is repeated twice (m = 2), so
$$ \text{c.f.}_1 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5 $$
• In the y-series: rank 5 is repeated thrice (m = 3), so
$$ \text{c.f.}_2 = \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} = 2 $$
Correction factor = c.f.1 + c.f.2 = 0.5 + 2 = 2.5
• Rank correlation is
$$ \rho = 1 - \frac{6\left[\sum d_i^2 + \text{c.f.}\right]}{n(n^2 - 1)} = 1 - \frac{6(159.5 + 2.5)}{8(8^2 - 1)} \approx -0.9286 $$
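The tied-rank calculation above can be verified directly from the ranks in the table. This is a small check, not a general routine; the variable names are illustrative. The exact value is −0.928571..., which rounds to −0.9286 at four decimals (truncating instead gives −0.9285).

```python
# Ranks taken from the calculation table of this example.
rank_x = [1, 2, 3.5, 3.5, 5, 8, 6, 7]
rank_y = [8, 7, 5, 5, 5, 2, 3, 1]
n = len(rank_x)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))

# One rank repeated twice in x (m = 2) and one repeated thrice in y (m = 3).
cf = 2 * (2**2 - 1) / 12 + 3 * (3**2 - 1) / 12

rho = 1 - 6 * (sum_d2 + cf) / (n * (n**2 - 1))
print(sum_d2, cf, round(rho, 4))  # 159.5 2.5 -0.9286
```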
Example 3:
• Calculate the rank correlation coefficient for the following data:
X: 48 33 40 9 16 16 65 24 16 57
Y: 13 13 24 6 15 4 20 9 6 19
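No solution is given here for this example. As a sketch of how the tie-corrected formula could be applied to it, the Python below assigns common (average) ranks to repeated values, adds the correction factor m(m² − 1)/12 for each repeated value, and evaluates ρ. The function names are assumptions, and rank 1 is given to the largest value, following the earlier examples.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        greater = sum(1 for u in values if u > v)
        equal = sum(1 for u in values if u == v)
        ranks.append(greater + (equal + 1) / 2)
    return ranks

def correction_factor(values):
    """Sum m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12
               for m in Counter(values).values() if m > 1)

def spearman_tied(x, y):
    """Return (rho, sum of d^2, total correction factor)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = correction_factor(x) + correction_factor(y)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1)), sum_d2, cf

X = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
Y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]

rho, sum_d2, cf = spearman_tied(X, Y)
print(sum_d2, cf, round(rho, 4))  # 41.0 3.0 0.7333
```

Here 16 occurs three times in X (c.f. = 2), while 13 and 6 each occur twice in Y (c.f. = 0.5 each), giving a total correction factor of 3.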