
CORRELATION AND REGRESSION
INTRODUCTION
• Bivariate distribution: a distribution in which each unit of the series assumes two values.


• Multivariate distribution: a distribution in which each unit of the series assumes more than two values.


• Correlation: a statistical tool used to study the relationship between two variables.


• There are various methods for studying and measuring the extent of the relationship between two variables; one of them is correlation analysis.
• “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.”
  (Croxton and Cowden)


Types of correlation
• Positive and negative correlation.
• Linear and non-linear correlation.


Positive and negative correlation
• Positive or direct correlation: an increase in the values of one variable results, on average, in a corresponding increase in the values of the other variable.

• Negative or inverse correlation: an increase in the values of one variable results, on average, in a corresponding decrease in the values of the other variable.
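As a minimal sketch of how the direction of correlation can be read from data, the hypothetical helper `direction` below uses the sign of the covariance (the average product of deviations from the means). The data are made up purely for illustration.

```python
def covariance(x, y):
    """Population covariance: mean of the products of deviations from the means."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def direction(x, y):
    """Hypothetical helper: 'positive', 'negative' or 'none' by the sign of the covariance."""
    cov = covariance(x, y)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# Made-up data for illustration only.
hours_studied = [1, 2, 3, 4, 5]
marks = [40, 48, 55, 61, 70]               # tends to rise with hours studied
price = [10, 12, 14, 16, 18]
quantity_demanded = [95, 90, 82, 75, 70]   # tends to fall as price rises

print(direction(hours_studied, marks))       # positive
print(direction(price, quantity_demanded))   # negative
```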
Linear and non-linear correlation
• Linear correlation: corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of the values.

• Non-linear or curvilinear correlation: corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a varying rate.
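The distinction can be checked numerically in a small sketch, assuming x rises in unit steps: a linear relationship shows the same change in y at every step, while a curvilinear one does not. The function names and the two example curves (y = 3x + 2 and y = x²) are hypothetical choices for illustration.

```python
def changes_per_unit_step(y):
    """Successive differences of y, assuming the matching x values rise by exactly 1 each step."""
    return [b - a for a, b in zip(y, y[1:])]


def is_linear(y, tol=1e-9):
    """True if every successive change in y equals the first one (within tol)."""
    diffs = changes_per_unit_step(y)
    return all(abs(d - diffs[0]) <= tol for d in diffs)


x = [1, 2, 3, 4, 5]
y_linear = [3 * xi + 2 for xi in x]   # y = 3x + 2: constant change of 3 per unit of x
y_curved = [xi ** 2 for xi in x]      # y = x^2: the change grows with x

print(changes_per_unit_step(y_linear))           # [3, 3, 3, 3]
print(changes_per_unit_step(y_curved))           # [3, 5, 7, 9]
print(is_linear(y_linear), is_linear(y_curved))  # True False
```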
Correlation and Causation
• Causation always implies correlation, but the converse is not true: a high correlation does not by itself establish a cause-and-effect relationship.
• A high degree of correlation between two variables may be due to any of the following:
  ◦ Mutual dependence.
  ◦ Both variables being influenced by the same external factors.
  ◦ Pure chance.
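A small sketch of the second reason: below, two series are each computed only from a third, hypothetical "temperature" series, never from each other, yet Karl Pearson's r between them comes out high. All names and numbers are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


temperature = [20, 22, 25, 28, 30, 33]             # hypothetical common external factor
ice_cream_sales = [10 * t for t in temperature]    # depends on temperature only
swimmers_rescued = [t // 5 for t in temperature]   # also depends on temperature only

r = pearson_r(ice_cream_sales, swimmers_rescued)
print(round(r, 2))  # 0.95: strongly correlated, though neither causes the other
```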
Methods of studying correlation
Commonly used methods for studying the correlation between two variables are:
• Scatter diagram method.
• Karl Pearson's coefficient of correlation (covariance method).
• Two-way frequency table.
• Rank method.
• Concurrent deviations method.


Scatter diagram method
• A scatter diagram is one of the simplest ways of representing a bivariate distribution diagrammatically, and it provides one of the simplest tools for ascertaining the correlation between two variables.
• The following points should be noted when interpreting a scatter diagram for the correlation between the two variables:
  ◦ If the points are very dense, i.e. close to each other, a fairly good amount of correlation may be expected. If the points are widely scattered, a poor correlation may be expected.
  ◦ If the points reveal any trend, the variables are said to be correlated; if no trend is revealed, the variables are uncorrelated.
  ◦ If all the points lie on a straight line starting from the left bottom and going up towards the right top, the correlation is perfect and positive; if all the points lie on a straight line starting from the left top and coming down to the right bottom, the correlation is perfect and negative.
  ◦ If there is an upward trend rising from the lower left-hand corner to the upper right-hand corner, the correlation is positive. If the points depict a downward trend from the upper left-hand corner to the lower right-hand corner, the correlation is negative.
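The trend and straight-line rules above can be sketched in code. The hypothetical helper `describe_scatter` reads the trend direction from the sign of the sum of deviation products, and tests whether all points lie on one straight line using the equality case of the Cauchy-Schwarz inequality, (Σdxdy)² = Σdx² · Σdy². It does not attempt the "density" rule, which is a visual judgment.

```python
def describe_scatter(points, tol=1e-9):
    """Hypothetical helper applying the trend and straight-line rules to (x, y) points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0 or syy == 0 or sxy == 0:
        return "no trend"  # vertical/horizontal spread only, or no upward/downward tendency
    # All points lie on one non-horizontal straight line exactly when sxy^2 == sxx * syy.
    on_one_line = abs(sxy * sxy - sxx * syy) <= tol * sxx * syy
    trend = "positive" if sxy > 0 else "negative"
    return ("perfect " if on_one_line else "") + trend


print(describe_scatter([(1, 2), (2, 4), (3, 6)]))                  # perfect positive
print(describe_scatter([(1, 6), (2, 4), (3, 2)]))                  # perfect negative
print(describe_scatter([(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]))  # positive
```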
• Karl Pearson’s coefficient of correlation: a mathematical method for measuring the intensity or magnitude of the linear relationship between two variable series.
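Karl Pearson's coefficient is usually written r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], or equivalently r = cov(x, y) / (σx σy), which is why it is also called the covariance method. The sketch below computes it both ways on a small made-up data set; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """r = sum of deviation products / sqrt(product of sums of squared deviations)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either series is constant")
    return sxy / math.sqrt(sxx * syy)


def pearson_r_from_covariance(x, y):
    """Same coefficient as cov(x, y) / (sigma_x * sigma_y), using population moments."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (sigma_x * sigma_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # made-up data
print(round(pearson_r(x, y), 4))                  # 0.7746
print(round(pearson_r_from_covariance(x, y), 4))  # 0.7746
```

Here Σ(x − x̄)(y − ȳ) = 6, Σ(x − x̄)² = 10 and Σ(y − ȳ)² = 6, so r = 6/√60 ≈ 0.7746; the 1/n factors cancel in the covariance form, so both functions agree.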

• Method of concurrent deviations: a method of determining the correlation between two series when high precision is not required.
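The usual textbook formula for this method is r_c = ±√[±(2c − n)/n], where n is the number of pairs of deviations (one less than the number of observations), c is the number of concurrent deviations (pairs in which both series move in the same direction), and the same sign is taken inside and outside the root. A minimal sketch, assuming that a "no change" in either series does not count as concurrent; the function name is hypothetical.

```python
import math


def concurrent_deviation_r(x, y):
    """Coefficient of concurrent deviations: r_c = +/- sqrt(+/- (2c - n) / n)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")

    def sign(d):
        # +1 for a rise, -1 for a fall, 0 for no change
        return (d > 0) - (d < 0)

    n = len(x) - 1  # number of pairs of deviations
    # c: pairs where x and y change in the same direction (product of signs is +)
    c = sum(
        1
        for i in range(1, len(x))
        if sign(x[i] - x[i - 1]) * sign(y[i] - y[i - 1]) > 0
    )
    ratio = (2 * c - n) / n
    # Same sign inside and outside the square root.
    return math.copysign(math.sqrt(abs(ratio)), ratio)


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 4, 5]   # made-up data: 3 of the 4 deviations are concurrent
print(concurrent_deviation_r(x, y))  # sqrt(0.5), about 0.7071
```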
Rank correlation method
• Rank correlation is used for statistical series in which the variables under consideration are not capable of quantitative measurement but can be arranged in serial order.
• The rank correlation coefficient can be found under two conditions:
  ◦ When actual ranks are given.
  ◦ When ranks are not given.
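The rank method is commonly carried out with Spearman's rank correlation coefficient, R = 1 − 6Σd² / [n(n² − 1)], where d is the difference between the two ranks of each item and n is the number of items. The sketch below covers both conditions: ranks given directly, and raw values that are ranked first. It assumes no tied values (ties need average ranks and a correction factor, which this sketch does not handle), and all names and data are hypothetical.

```python
def to_ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks and a correction; not handled here")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_from_ranks(rank_x, rank_y):
    """Spearman's R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of equal length (at least 2 items)")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def spearman_from_values(x, y):
    """Ranks not given: rank each series first, then apply the same formula."""
    return spearman_from_ranks(to_ranks(x), to_ranks(y))


# Actual ranks given (e.g. two judges ranking five entries; made-up data)
print(spearman_from_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))             # 0.8
# Ranks not given: only observed values (made-up data)
print(spearman_from_values([10, 20, 30, 40, 50], [15, 12, 40, 33, 60]))  # 0.8
```

Ranking from the largest value instead of the smallest gives the same R, as long as both series are ranked in the same direction.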


Regression
• Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known value of the other variable.
• Linear regression: the regression curve is a straight line.
• Non-linear regression: the regression curve is not a straight line.
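A linear regression line of y on x is commonly fitted by least squares: the slope is b_yx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is a = ȳ − b_yx·x̄, giving the estimate ŷ = a + b_yx·x. A minimal sketch with made-up data and a hypothetical function name:

```python
def fit_regression_y_on_x(x, y):
    """Least-squares line y = a + b*x, with b = Sxy / Sxx and a = mean_y - b * mean_x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # made-up data
a, b = fit_regression_y_on_x(x, y)
print(round(a, 4), round(b, 4))  # 2.2 0.6
print(round(a + b * 3.5, 4))     # estimated y for the known value x = 3.5: 4.3
```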
Correlation Vs. Regression Analysis
• Correlation literally means the relationship between two or more variables that vary together. Regression means stepping back, or returning to the average value.

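• Correlation need not imply a cause-and-effect relationship between the variables under study. Regression analysis treats one variable as the cause (independent variable) and the other as the effect (dependent variable), so it expresses a cause-and-effect relationship between them.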
• The correlation coefficient measures the linear relationship between x and y and is independent of the units of measurement. It is a pure number lying between −1 and +1. Regression is an absolute measure, expressed in the units of the variables, and in regression x and y are not interchangeable: one is treated as dependent on the other.
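The contrast between a pure number and an absolute measure can be seen by changing the units of x. In the sketch below, the same made-up data are expressed with x in metres and then in centimetres: Pearson's r does not change, while the regression coefficient of y on x is divided by 100. The variable names and units are hypothetical.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_y_on_x(x, y):
    """Least-squares regression coefficient of y on x: Sxy / Sxx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


length_m = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical x values in metres
weight_kg = [2, 4, 5, 4, 5]                 # hypothetical y values
length_cm = [100 * v for v in length_m]     # the same x values in centimetres

print(round(pearson_r(length_m, weight_kg), 4), round(pearson_r(length_cm, weight_kg), 4))        # 0.7746 0.7746
print(round(slope_y_on_x(length_m, weight_kg), 4), round(slope_y_on_x(length_cm, weight_kg), 4))  # 0.6 0.006
```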

• Correlation analysis is confined to the study of linear relationships between the variables and therefore has limited applications. Regression analysis has wider applications, as it studies linear as well as non-linear relationships between the variables.
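As a small illustration, consider points that lie exactly on the curve y = x², placed symmetrically around x = 0. Pearson's r is 0 because there is no linear relationship, yet a non-linear regression curve recovers the relationship exactly. The sketch fits the curve y = a + b·x² by least squares after substituting u = x². This model is one simple form of non-linear regression, chosen here as an assumption for illustration; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation (measures linear relationship only)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_y_on_x_squared(x, y):
    """Least-squares fit of the curve y = a + b * x^2, via the substitution u = x^2."""
    u = [xi ** 2 for xi in x]
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    suu = sum((ui - mean_u) ** 2 for ui in u)
    b = suy / suu
    a = mean_y - b * mean_u
    return a, b


x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]   # exactly y = x^2
print(pearson_r(x, y))           # 0.0: no linear correlation
print(fit_y_on_x_squared(x, y))  # (0.0, 1.0): the curve y = x^2 is recovered
```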