March 2, 2024
Syllabus
Correlation
Regression line.
Exploring the Bivariate Data using Correlation and Regression Dr. K.M.REDDY
Motivation for Correlation
The statistical techniques discussed so far are for only one variable.
In many research situations one has to consider two variables
simultaneously to know whether they are linearly related and, if so,
what type of relationship exists between them.
This leads to bivariate (two-variable) data analysis, namely
correlation analysis.
The correlation concept helps to answer questions of the following
types:
Is study time (in hours) related to marks scored in the examination?
Is it worth spending on advertisement for the promotion of sales?
Is there any relationship between exercise (hours) and blood pressure?
Are the price of a commodity and its demand related?
Is there any relationship between rainfall and production of rice?
Definition of correlation
Correlation is a statistical measure which helps in analyzing the
interdependence of two or more variables.
Correlation analysis attempts to measure the strength of relationships
between two variables by means of a single number called a
correlation coefficient.
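The measure $\rho$ of linear association between two variables $X$ and $Y$ is
estimated by Karl Pearson's correlation coefficient $r$, where

$$
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}},
\quad\text{or}\quad
r = \frac{\operatorname{Cov}(X,Y)}{\sigma_x \sigma_y},
\quad\text{or}\quad
r = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}}
         {\sqrt{\frac{\sum x_i^2}{n} - (\bar{x})^2}\;\sqrt{\frac{\sum y_i^2}{n} - (\bar{y})^2}},
$$

$$
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).
$$

$\operatorname{Cov}(X,Y)$ is the covariance of $X$ and $Y$, $\bar{x} = \frac{\sum x_i}{n}$, $\bar{y} = \frac{\sum y_i}{n}$.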
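The first form of the definition, $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$, can be sketched in a few lines of Python. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the slides, and the sketch does not handle a constant series (where $S_{xx}$ or $S_{yy}$ is zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data for illustration only (not taken from the slides).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]
r = pearson_r(hours, marks)
print(round(r, 4))
```

A value of `r` close to +1 indicates a strong increasing linear relationship, and a value close to -1 a strong decreasing one.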
Properties of correlation
Types of simple correlations
Regression is the estimation or prediction of unknown values of one
variable from known values of another variable.
A line of regression is the straight line that gives the best fit, in the
least-squares sense, to the given data.
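The line of regression of $y$ on $x$ is given by

$$
y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}).
$$

The regression coefficient of $y$ on $x$ is

$$
b_{yx} = \frac{\sum xy}{\sum x^2} = r\,\frac{\sigma_y}{\sigma_x},
$$

where, in the sums, $x$ and $y$ denote deviations from the means, so that $\sum xy = S_{xy}$ and $\sum x^2 = S_{xx}$.
Here $r$ is the correlation coefficient, $\bar{x}$ is the mean of $x$, and $\bar{y}$ is the mean of $y$.

Similarly, the line of regression of $x$ on $y$ is given by

$$
x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}).
$$

The regression coefficient of $x$ on $y$ is

$$
b_{xy} = \frac{\sum xy}{\sum y^2} = r\,\frac{\sigma_x}{\sigma_y}.
$$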
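As a rough illustration of the two regression coefficients, the sketch below computes them as $S_{xy}/S_{xx}$ and $S_{xy}/S_{yy}$ and predicts $y$ from the line of $y$ on $x$. The function names and the data are assumptions made for this example. Multiplying the two formulas above gives $b_{yx}\,b_{xy} = r^2$, which the last line checks numerically.

```python
import math

def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_yx = s_xy / s_xx  # slope of the line of y on x
    b_xy = s_xy / s_yy  # slope of the line of x on y
    return x_bar, y_bar, b_yx, b_xy

def predict_y(x, x_bar, y_bar, b_yx):
    """Evaluate the line y - y_bar = b_yx * (x - x_bar) at x."""
    return y_bar + b_yx * (x - x_bar)

# Made-up data for illustration only (not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(xs, ys)
r_squared = b_yx * b_xy  # equals r**2 by the two formulas above
print(b_yx, b_xy, predict_y(6, x_bar, y_bar, b_yx), r_squared)
```

Note that the two lines are different unless $r = \pm 1$: the line of $y$ on $x$ minimises vertical errors, while the line of $x$ on $y$ minimises horizontal ones.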
1. Example of Correlation and Regression
The average prices of stocks and bonds listed on the New York Stock
Exchange during the years 1950 through 1959 are given as
The tabular form is
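(a) Correlation coefficient:

$$
\bar{x} = \frac{\sum x_i}{n} = \frac{452.64}{10} = 45.26, \quad
\bar{y} = \frac{\sum y_i}{n} = \frac{975.16}{10} = 97.52, \quad n = 10.
$$

$$
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = 449.38, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 93.69, \quad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = -94.67.
$$

The correlation coefficient is

$$
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-94.67}{\sqrt{(449.38)(93.69)}} = -0.4614.
$$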
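The arithmetic in part (a) can be rechecked from the summary values alone. The sketch below uses only the totals and sums of squares quoted in the solution (the underlying price table is not reproduced here), and the variable names are chosen for this example.

```python
import math

# Summary values quoted in part (a) of the worked example.
n = 10
sum_x, sum_y = 452.64, 975.16
s_xx, s_yy, s_xy = 449.38, 93.69, -94.67

x_bar = sum_x / n
y_bar = sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(x_bar, 2), round(y_bar, 2), round(r, 4))
```

The negative sign of `r` indicates that, over these years, higher average stock prices tended to go with lower average bond prices, though the linear association is only moderate.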
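(c) Fit a regression line of y on x

The line of regression of $y$ on $x$ is given by

$$
y - \bar{y} = b_{yx}\,(x - \bar{x}), \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x},
$$

where the regression coefficient of $y$ on $x$ is

$$
b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{S_{xy}}{S_{xx}} = \frac{-94.67}{449.38} = -0.21,
$$

with

$$
\bar{x} = \frac{\sum x_i}{n} = \frac{452.64}{10} = 45.26, \quad
\bar{y} = \frac{\sum y_i}{n} = \frac{975.16}{10} = 97.52,
$$

$$
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).
$$

Substituting these values, the fitted line of $y$ on $x$ is

$$
y - 97.52 = -0.21\,(x - 45.26).
$$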
The line of regression of $x$ on $y$ is given by

$$
x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}).
$$
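As a worked check, and assuming the same summary values as in part (a) ($S_{xy} = -94.67$, $S_{yy} = 93.69$, $\bar{x} = 45.26$, $\bar{y} = 97.52$), the regression coefficient of $x$ on $y$ can be evaluated as $S_{xy}/S_{yy}$, mirroring the $y$ on $x$ case:

$$
b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{S_{xy}}{S_{yy}} = \frac{-94.67}{93.69} \approx -1.01,
\qquad
x - 45.26 \approx -1.01\,(y - 97.52).
$$

The product of the two unrounded coefficients, $(-0.2107)(-1.0105) \approx 0.2129$, agrees with $r^2 = (-0.4614)^2 \approx 0.2129$, which is a quick consistency check on the arithmetic.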
3. Example of Correlation and Regression
4. Example of Correlation and Regression
5. Example of Correlation and Regression
Spearman’s rank correlation
$$
\rho = 1 - \frac{6\left[\sum d_i^2 + \text{c.f.}\right]}{n(n^2 - 1)}
$$
In some cases, two or more observations are equal in one or both
series. In such cases we assign a common rank to the repeated items
and then find the rank correlation.
The correction factor (c.f.) is obtained individually for every repeated
rank in the x-series as well as in the y-series.
Finally, the total correction factor is added to $\sum d_i^2$
to obtain the rank correlation coefficient.
The rank correlation coefficient is calculated by

$$
\rho = 1 - \frac{6\left[\sum d_i^2 + \text{c.f.}\right]}{n(n^2 - 1)}.
$$
Example 2:
• Calculate the rank correlation coefficient of the data:
X 80 78 75 75 68 57 60 59
Y 110 111 114 114 114 116 115 117
Solution:
Calculation (here x_i and y_i denote the ranks of x and y)

x     y     rank x_i   rank y_i   d_i = x_i - y_i   d_i^2
80    110   1          8          -7                49
78    111   2          7          -5                25
75    114   3.5        5          -1.5              2.25
75    114   3.5        5          -1.5              2.25
68    114   5          5          0                 0
57    116   8          2          6                 36
60    115   6          3          3                 9
59    117   7          1          6                 36

$\sum d_i^2 = 159.5$
• In the x-series, rank 3.5 is repeated twice ($m = 2$):

$$
\text{c.f.}_1 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5
$$

• In the y-series, rank 5 is repeated thrice ($m = 3$):

$$
\text{c.f.}_2 = \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} = 2
$$

Correction factor $= \text{c.f.}_1 + \text{c.f.}_2 = 0.5 + 2 = 2.5$

• Rank correlation is

$$
\rho = 1 - \frac{6\left[\sum d_i^2 + \text{c.f.}\right]}{n(n^2 - 1)}
     = 1 - \frac{6(159.5 + 2.5)}{8(8^2 - 1)} = -0.9286
$$
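The steps of Example 2 (average ranks for ties, a correction term of $m(m^2-1)/12$ per tied group, then the formula for $\rho$) can be sketched in Python as follows. The function names are assumptions made for this illustration, and ranks are assigned with 1 for the largest value, as in the table above.

```python
from collections import Counter

def average_ranks(values):
    """Rank with 1 = largest; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def correction_factor(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho(xs, ys):
    """Rank correlation with the tie correction added to the sum of d_i^2."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = correction_factor(xs) + correction_factor(ys)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))

# Data of Example 2.
x = [80, 78, 75, 75, 68, 57, 60, 59]
y = [110, 111, 114, 114, 114, 116, 115, 117]
rho = spearman_rho(x, y)
print(round(rho, 4))
```

Ranking from the smallest value instead of the largest changes the individual ranks but leaves every $d_i^2$, and therefore $\rho$, unchanged, provided both series are ranked in the same direction.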
Example 3:
• Calculate the rank correlation coefficient for the following data:
X: 48 33 40 9 16 16 65 24 16 57
Y: 13 13 24 6 15 4 20 9 6 19