
CORRELATION AND REGRESSION

Unit-4

MISSION: CHRIST is a nurturing ground for an individual’s holistic development to make effective contribution to the society in a dynamic environment
VISION: Excellence and Service
CORE VALUES: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence
CHRIST
Deemed to be University

SCATTER PLOT
 A scatter plot is used to detect whether two variables are related.
 One variable is placed on the X-axis and the other on the Y-axis.
 The plot is made by marking each pair of values (x, y) as a point on the graph.


COVARIANCE
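 Covariance is a measure of the relationship between two variables.
 Let X and Y be two variables. The covariance between them is given by:
1. For raw data:
$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
or,
$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$$
2. For frequency data:
$$\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i} f_i (x_i - \bar{x})(y_i - \bar{y})$$
or,
$$\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i} f_i x_i y_i - \bar{x}\,\bar{y}, \qquad N = \sum_{i} f_i$$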
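
The covariance calculation can be sketched in Python as follows. This is a minimal illustration, assuming the population form (division by the number of observations, not by n - 1); the function name `covariance` and the sample values are hypothetical and are not taken from the slides.

```python
def covariance(xs, ys, freqs=None):
    """Population covariance of paired data, optionally weighted by frequencies."""
    if freqs is None:
        freqs = [1] * len(xs)            # raw data: every pair counts once
    total = sum(freqs)                   # n for raw data, N = sum of f_i otherwise
    mean_x = sum(f * x for f, x in zip(freqs, xs)) / total
    mean_y = sum(f * y for f, y in zip(freqs, ys)) / total
    return sum(f * (x - mean_x) * (y - mean_y)
               for f, x, y in zip(freqs, xs, ys)) / total


# Hypothetical raw data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov_raw = covariance(x, y)

# Equivalent "shortcut" form: mean of the products minus the product of the means
n = len(x)
cov_shortcut = sum(a * b for a, b in zip(x, y)) / n - (sum(x) / n) * (sum(y) / n)

# Hypothetical frequency data: the pair (1, 2) occurs twice, (2, 4) once, (3, 6) twice
cov_freq = covariance([1, 2, 3], [2, 4, 6], freqs=[2, 1, 2])

print(cov_raw, cov_shortcut, cov_freq)
```

Both forms give the same value for the raw data, which is why either one may be used in hand calculations.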


CORRELATION
 A scatter plot is used to make a guess about the relationship between two variables, but it cannot tell how strong the relationship is.
 To overcome this difficulty, correlation coefficients are used.
 A correlation coefficient measures how strong the linear relationship between two variables is.
 Generally, two types of correlation coefficients are used:
1. Karl Pearson’s Correlation Coefficient
2. Spearman’s Rank Correlation Coefficient

KARL PEARSON’S CORRELATION COEFFICIENT


 It is also called the product-moment correlation coefficient.
Assumptions:
1. It is used when both variables are continuous.
2. Both variables should be normally distributed.
 It is given by the formula:
$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}}$$
 Where,
 Cov(X, Y) = covariance between X and Y
 V(X) = variance of X
 V(Y) = variance of Y
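
As a rough sketch, Pearson's coefficient can be computed as the covariance divided by the square root of the product of the two variances. The function name `pearson_r` and the data below are hypothetical; population (divide-by-n) moments are assumed, although the choice of divisor cancels in the ratio.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation: Cov(X, Y) / sqrt(V(X) * V(Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```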


KARL PEARSON’S CORRELATION COEFFICIENT


Properties:
1. If X and Y are two continuous variables and a, b, c and d are constants with a and c nonzero and of the same sign, then for the new variables W = aX + b and Z = cY + d the correlation between W and Z is the same as that between X and Y, i.e., r(W, Z) = r(X, Y). In other words, correlation is free from change of origin and scale. (If a and c have opposite signs, only the sign changes: r(W, Z) = –r(X, Y).)

2. If two variables are independent, then the correlation between them is always 0. (The converse is not true.)

3. The value of Pearson’s correlation coefficient always lies between –1 and +1.
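
The first two properties can be checked numerically. The sketch below uses hypothetical data and constants (a = 3, b = 7, c = 0.5, d = -2) and a hypothetical helper `pearson_r`. It also shows that the sign of r flips when a and c have opposite signs, and that a variable can be completely determined by another (V = U²) while still having zero correlation with it.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation from population covariance and variances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = pearson_r(x, y)

# Change of origin and scale: W = aX + b, Z = cY + d with a, c > 0
w = [3 * xi + 7 for xi in x]
z = [0.5 * yi - 2 for yi in y]
r_wz = pearson_r(w, z)

# Opposite signs for a and c flip the sign of r
z_neg = [-0.5 * yi - 2 for yi in y]
r_wz_neg = pearson_r(w, z_neg)

# Converse of property 2 fails: V depends on U exactly, yet r(U, V) = 0
u = [-2, -1, 0, 1, 2]
v = [ui ** 2 for ui in u]
r_uv = pearson_r(u, v)

print(r_xy, r_wz, r_wz_neg, r_uv)
```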

KARL PEARSON’S CORRELATION COEFFICIENT


 Pearson’s correlation coefficient takes values between –1 and +1.
Interpretation:
1. r < 0; x and y are said to be negatively correlated.
2. r > 0; x and y are said to be positively correlated.
3. r = 0; x and y are said to be uncorrelated.
4. |r| < 0.3; x and y are slightly correlated.
5. 0.3 ≤ |r| ≤ 0.75; x and y are moderately correlated.
6. |r| > 0.75; x and y are strongly correlated.
Ex: r = 0.89. The variables under consideration are strongly positively correlated, i.e., as the value of one variable increases, the value of the other tends to increase as well, and the linear association between X and Y is very strong.
Ex: r = –0.58. The variables under consideration are moderately negatively correlated, i.e., as the value of one variable increases, the value of the other tends to decrease, and the linear association between X and Y is moderate.
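
The interpretation rules above can be written as a small helper. This is only a sketch: the function name `interpret_r` is hypothetical, and placing the boundary values |r| = 0.3 and |r| = 0.75 in the "moderately" band is an assumption, since the thresholds themselves are rules of thumb.

```python
def interpret_r(r):
    """Describe a correlation coefficient using the rule-of-thumb thresholds."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "uncorrelated"
    direction = "positively" if r > 0 else "negatively"
    size = abs(r)
    if size < 0.3:
        strength = "slightly"
    elif size <= 0.75:        # assumption: boundary values count as moderate
        strength = "moderately"
    else:
        strength = "strongly"
    return f"{strength} {direction} correlated"


first = interpret_r(0.89)
second = interpret_r(-0.58)
print(first)
print(second)
```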

SPEARMAN’S RANK CORRELATION


 It is used when the variables are of discrete type, are in ranking form, or are not normally distributed.
 To obtain it, the observations of each variable (say x & y) are first ranked separately, in descending order, without changing their places.
 Then the difference in ranks ($d_i$) is calculated for corresponding values of $x_i$ and $y_i$.


SPEARMAN’S RANK CORRELATION


 It is used when the assumptions of Karl Pearson’s product-moment correlation are not fulfilled, i.e.,
 when the variables are discrete or ranked in some manner, or
 when the variables are not normally distributed.
 It is given by the formula:
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the ranks of the i-th pair and n is the number of pairs.
 Spearman’s correlation coefficient takes values between –1 and +1.


Interpretation:
1. ρ < 0; x and y are said to be negatively correlated.
2. ρ > 0; x and y are said to be positively correlated.
3. ρ = 0; x and y are said to be uncorrelated.
4. |ρ| < 0.3; x and y are slightly correlated.
5. 0.3 ≤ |ρ| ≤ 0.75; x and y are moderately correlated.
6. |ρ| > 0.75; x and y are highly correlated.
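
The ranking procedure can be sketched in Python as follows. The sketch assumes the standard no-ties formula ρ = 1 − 6Σdᵢ² / (n(n² − 1)) and ranks in descending order (rank 1 for the largest value). The function names and data are hypothetical, and tied values are not handled.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation for data without ties."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    # d_i is the difference between the two ranks of the i-th pair
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical paired observations
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 48, 41]
rho = spearman_rho(x, y)
print(ranks_descending(x), ranks_descending(y), rho)
```

Here the rank differences are 0, 1, −1, 1, −1, so Σdᵢ² = 4 and ρ = 1 − 24/120 = 0.8, which the rules above classify as highly positively correlated.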

LINEAR REGRESSION
 The scatter plot shows in what manner two variables are related, whereas correlation tells how strong the linear relationship between the two variables is. Neither can tell what change would occur in one variable when a unit change is made in the other.
 Regression analysis is used to quantify the amount of change in one variable when changes are made in the other variable.
 Generally, two types of regression analysis are used:
1. Linear regression
2. Curvilinear regression
 Linear regression is used when the data are of quantitative (continuous) type.
 It shows what change would occur in one variable when a unit change is made in the other variable.


LINEAR REGRESSION
 The model for linear regression is given as follows.
 If only two variables are involved in the study:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
 If more than two variables are involved in the study, the model can be extended as:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$$
 This model is called the Multiple Linear Regression model.


The Least-Squares Line Y = β0 + β1X
• Summarizes bivariate data: predicts Y from X with the smallest errors (in the vertical direction, along the Y axis)
• Intercept is 15.32: the predicted salary at 0 years of experience
• Slope is 1.673: the average increase in salary for each additional year of experience
[Figure: scatter plot of Salary (Y) against Experience (X), with the fitted line Salary = 15.32 + 1.673 × Experience]
Irwin/McGraw-Hill © Andrew F. Siegel, 2003
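
To make the idea of "smallest errors in the vertical direction" concrete, the sketch below fits a least-squares line to a small hypothetical data set (not the data behind the figure) and shows that changing the fitted slope increases the sum of squared vertical errors. The function names are hypothetical. The last function simply evaluates the line quoted on the slide, Salary = 15.32 + 1.673 × Experience.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the line minimizing squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def predict_salary(years, intercept=15.32, slope=1.673):
    """Evaluate the fitted line quoted on the slide."""
    return intercept + slope * years


# Hypothetical data, chosen only for illustration
experience = [1, 3, 5, 7, 9]
salary = [17, 20, 24, 27, 30]

b0, b1 = fit_least_squares(experience, salary)
best = sum_squared_errors(experience, salary, b0, b1)
worse = sum_squared_errors(experience, salary, b0, b1 + 0.1)

print(b0, b1, best, worse)
print(predict_salary(10))   # predicted salary at 10 years of experience
```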

LINEAR REGRESSION
 Here Y is called the response variable or dependent variable.
 The $X_j$'s are called predictors, regressors, or independent variables.
 ε is called the random error term, which arises from unavoidable causes.
 $\beta_0$ is called the intercept parameter.
 The $\beta_j$'s are called slope parameters. $\beta_j$ tells what change would occur in the response variable if a unit change is made in the j-th predictor, with the other predictors held fixed.

ASSUMPTIONS FOR LINEAR REGRESSION
To perform linear regression analysis, certain assumptions are made:
1. The dependent variable and the predictors must be continuous.
2. The predictors must be linearly related to the dependent variable.
3. The dependent variable (more precisely, the error term ε) must be normally distributed.
4. The predictors must be uncorrelated with each other (no multicollinearity).
5. The variance of the dependent variable must be homogeneous, i.e., the error variance must be constant (homoscedasticity).
6. The dependent variable must not depend upon its past values, i.e., no autocorrelation (for time-dependent data).

LINEAR REGRESSION
 Let us consider the simplest case of linear regression, i.e., the case when the relation is to be studied between two variables only.
 Let X and Y be the two variables under consideration, and let $\{(x_i, y_i);\ i = 1, 2, \dots, n\}$ denote the n pairs of observations on these two variables.
 Then the equation of the line of X on Y is given by:
$$x_i = \alpha_0 + \beta_{xy}\, y_i + \varepsilon_i$$
 In the same manner, the equation of the line of Y on X is given by:
$$y_i = \alpha_1 + \beta_{yx}\, x_i + \varepsilon'_i$$
 The main purpose of regression analysis is to obtain estimates of the unknown parameters $\alpha_0$, $\alpha_1$, $\beta_{xy}$ and $\beta_{yx}$.
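
One common way to estimate these parameters is ordinary least squares, under which the slope of Y on X is Cov(X, Y)/V(X), the slope of X on Y is Cov(X, Y)/V(Y), and each line passes through the point of means. The sketch below assumes that method; the function name `regression_lines` and the data are hypothetical.

```python
def regression_lines(xs, ys):
    """Least-squares estimates for the line of X on Y and the line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n

    beta_yx = cov / var_x                 # slope of Y on X
    alpha_1 = mean_y - beta_yx * mean_x   # intercept of Y on X
    beta_xy = cov / var_y                 # slope of X on Y
    alpha_0 = mean_x - beta_xy * mean_y   # intercept of X on Y
    return {"alpha_0": alpha_0, "beta_xy": beta_xy,
            "alpha_1": alpha_1, "beta_yx": beta_yx}


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
est = regression_lines(x, y)
print(est)
```

For these data the estimated line of Y on X is y = 2.2 + 0.6x and the estimated line of X on Y is x = −1 + 1.0y. The two lines differ because each minimizes errors in a different direction.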


PROPERTIES OF REGRESSION COEFFICIENTS
1. The estimate of a regression coefficient is unaffected by a change of origin but not by a change of scale.

2. If $\beta_{yx}$ and $\beta_{xy}$ are the estimates of the regression coefficients for the line of Y on X and the line of X on Y respectively, and r is the correlation between X and Y, then
a) $\beta_{xy}\,\beta_{yx} = r^2$
b) $\beta_{xy} / \beta_{yx} = V(X)/V(Y)$
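
These properties can be verified numerically. The sketch below assumes least-squares estimates (slope of Y on X = Cov/V(X), slope of X on Y = Cov/V(Y)); the helper name `summary` and the data are hypothetical.

```python
import math


def summary(xs, ys):
    """Return both regression slopes, r, and the two variances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return {"b_yx": cov / var_x, "b_xy": cov / var_y,
            "r": cov / math.sqrt(var_x * var_y),
            "var_x": var_x, "var_y": var_y}


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
base = summary(x, y)

# Property 2(a): the product of the two slopes equals r squared
product = base["b_xy"] * base["b_yx"]

# Property 2(b): the ratio of the slopes equals V(X) / V(Y)
ratio = base["b_xy"] / base["b_yx"]

# Property 1: shifting the origin leaves the slopes unchanged ...
shifted = summary([xi + 10 for xi in x], [yi - 3 for yi in y])

# ... but rescaling X by 2 halves b_yx and doubles b_xy
scaled = summary([2 * xi for xi in x], y)

print(product, base["r"] ** 2, ratio, shifted, scaled)
```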
