When the observations on x and y are given as actual values rather than as the ranks 1, 2, 3, …, n, we shall have to convert the data into ranks. The highest (or lowest) observation is given rank 1, the next highest (or next lowest) observation is given rank 2, and so on.
It is immaterial in which way (ascending or descending) the ranks are assigned. However, the same approach should be followed for all the variables under consideration.
SPEARMAN’S RANK CORRELATION
SPECIAL CASE: When ranks are repeated
In the case of attributes, if there is a tie, i.e., if two or more individuals are placed together in any classification w.r.t. an attribute, or if, in the case of variable data, there is more than one item with the same value in either or both of the series, then Spearman’s formula for calculating the rank correlation coefficient breaks down, since in this case the variables X and Y do not take the values from 1 to n, and consequently X̄ ≠ Ȳ (we had initially assumed that X̄ = Ȳ, both series being the ranks 1, 2, …, n).
In such a case, common ranks are assigned to the repeated items. These common ranks are the arithmetic mean of the ranks which these items would have got had they been different from each other, and the next item gets the rank next to the ranks already used in computing the common ranks.
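This mid-rank rule can be sketched as follows; the helper below is illustrative and not part of the text (libraries such as SciPy's `rankdata` implement the same averaging):

```python
def average_ranks(values):
    """Assign ranks (1 = largest) to values, giving tied items the
    arithmetic mean of the ranks they would otherwise have occupied."""
    # Sort indices by value, descending, so the highest value gets rank 1.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of equal values starting at this position.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end would have received ranks pos+1..end+1;
        # every tied item gets their arithmetic mean.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks
```

For example, the values 50, 40, 40, 30 receive ranks 1, 2.5, 2.5, 4: the two tied items share the mean of ranks 2 and 3, and the next item takes rank 4.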
If only a small proportion of the ranks are tied, this technique may be applied together with the rank correlation coefficient formula by adding the correction factor m(m² − 1)/12 to Σd², where m is the number of times an item is repeated. This factor is to be added for each repeated value in both the series, giving

ρ = 1 − 6 [Σd² + Σ m(m² − 1)/12] / [n(n² − 1)]
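A sketch combining the mid-rank rule with this tie correction; the function name and sample figures are illustrative, not from the text:

```python
def spearman_tie_corrected(x, y):
    """Spearman's rho with the standard tie correction:
    rho = 1 - 6*(sum(d^2) + CF) / (n*(n^2 - 1)),
    where CF adds m*(m^2 - 1)/12 for each group of m tied values
    in either series."""
    def ranks_and_correction(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        correction = 0.0
        pos = 0
        while pos < len(order):
            end = pos
            while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
                end += 1
            m = end - pos + 1                   # size of the tied group
            mean_rank = (pos + 1 + end + 1) / 2
            for k in range(pos, end + 1):
                ranks[order[k]] = mean_rank
            correction += m * (m * m - 1) / 12  # zero when m == 1
            pos = end + 1
        return ranks, correction

    n = len(x)
    rx, cx = ranks_and_correction(x)
    ry, cy = ranks_and_correction(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * (d2 + cx + cy) / (n * (n * n - 1))
```

With no ties the correction vanishes and the classical formula is recovered.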
CHARACTERISTICS OF SPEARMAN’S RANK CORRELATION COEFFICIENT

● We always have Σd = 0, which provides a check for numerical calculations.
● Since Spearman’s rank correlation coefficient is nothing but the Pearsonian correlation coefficient between the ranks, it can be interpreted in the same way as Karl Pearson’s correlation coefficient.
● Karl Pearson’s correlation coefficient assumes that the parent population from which the sample observations are drawn is normal. If this assumption is violated, we need a measure which is distribution free (or non-parametric). A distribution-free measure is one which does not make any assumptions about the parameters of the population. Spearman’s ρ is such a measure, since no strict assumptions are made about the form of the population from which the sample observations are drawn.
● Spearman’s formula is easy to understand and apply as compared with Karl Pearson’s formula. The values obtained by the two formulae, r and ρ, are generally different. The difference arises because when ranking is used instead of the full set of observations, there is always some loss of information. Unless many ties exist, the coefficient of rank correlation should be slightly lower than the Pearsonian coefficient.
● Spearman’s formula is the only formula to be used for finding the correlation coefficient if we are dealing with qualitative characteristics which cannot be measured quantitatively but can be arranged serially. It can also be used where actual data are given. In the case of extreme observations, Spearman’s formula is preferred to Pearson’s formula.
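The claim that Spearman’s ρ is just Pearson’s r computed on the ranks can be checked directly. The figures below are illustrative, not from the text, and the ranking helper assumes no ties:

```python
def pearson(x, y):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank 1 = smallest value; assumes no ties in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

x = [35, 40, 28, 50, 60]           # illustrative figures
y = [37, 25, 38, 56, 70]
rho = pearson(ranks(x), ranks(y))  # Pearson's r applied to the ranks
d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
n = len(x)
rho_formula = 1 - 6 * d2 / (n * (n * n - 1))  # classical Spearman formula
```

Both routes give the same ρ, which is the point of the interpretation remark above.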
The term “best fit” is interpreted in accordance with the Principle of Least Squares, which consists in minimising the sum of the squares of the residuals, or errors of estimate, i.e., the deviations between the given observed values of the variable and the corresponding estimated values given by the line of best fit.

We may minimise the sum of the squares of the errors parallel to the y-axis or parallel to the x-axis; the former gives the equation of the line of regression of y on x, and the latter gives the equation of the line of regression of x on y.
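The two minimisations can be sketched as follows; the function name and sample data are illustrative, not from the text:

```python
def regression_coefficients(x, y):
    """Least-squares slopes of the two regression lines:
    byx (y on x, minimising vertical errors) and
    bxy (x on y, minimising horizontal errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    byx = sxy / sxx   # slope of the line y = my + byx*(x - mx)
    bxy = sxy / syy   # slope of the line x = mx + bxy*(y - my)
    return byx, bxy
```

Note that the two slopes generally differ; they coincide in effect only when the correlation is perfect.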
REGRESSION ANALYSIS
LINES OF REGRESSION
Line of regression of y on x:  y − ȳ = byx (x − x̄),  where byx = r σy/σx
Line of regression of x on y:  x − x̄ = bxy (y − ȳ),  where bxy = r σx/σy

We can also obtain an estimate of x for any given value of y using the equation of the line of regression of y on x, but the estimates so obtained will not be best-fit estimates, since this equation is obtained by minimising the sum of the squares of the errors of estimate in y and not in x.
WHY TWO LINES OF REGRESSION
Hence, to estimate or predict x for any given value of y, we use the regression equation of x on y, which is derived by minimising the sum of the squares of the errors of estimate in x. Here x is the dependent variable and y is the independent variable.
If θ is the acute angle between the two lines of regression, then

tan θ = [(1 − r²) / |r|] · [σx σy / (σx² + σy²)]

In the particular case of perfect correlation, r = ±1, so tan θ = 0 and θ = 0 or π, i.e., the two lines will either coincide or be parallel. But since both the lines of regression intersect at (x̄, ȳ), they cannot be parallel. Hence, in the case of perfect correlation, positive or negative, the two lines of regression coincide.

If r = 0, then tan θ → ∞ and θ = π/2, i.e., if the two variables are uncorrelated, the two lines of regression are perpendicular to each other.
ANGLE BETWEEN THE REGRESSION LINES
We have seen that if r=0, the two lines of regression are perpendicular to each other and
in the particular case of perfect correlation, the two lines of regression coincide.
This leads us to the conclusion that the higher the degree of correlation between the variables, the smaller the angle between the lines.
On the other hand, the angle between the lines increases, i.e., the lines move apart, as the value of the correlation coefficient decreases. In other words, if the lines of regression make a larger angle, they indicate a poor degree of correlation between the variables, and ultimately, for r = 0, θ = π/2, i.e., the two lines become perpendicular. Thus, by plotting the lines of regression on graph paper, we can get an approximate idea of the degree of correlation between the two variables under study.
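This relationship between r and the angle can be sketched using the standard formula tan θ = ((1 − r²)/|r|) · σxσy/(σx² + σy²); the function name below is illustrative:

```python
import math

def regression_angle(r, sx, sy):
    """Acute angle (radians) between the two regression lines for
    correlation coefficient r and standard deviations sx, sy,
    via tan(theta) = ((1 - r^2)/|r|) * (sx*sy/(sx^2 + sy^2))."""
    if r == 0:
        return math.pi / 2   # uncorrelated: the lines are perpendicular
    t = (1 - r * r) / abs(r) * (sx * sy) / (sx * sx + sy * sy)
    return math.atan(t)
```

For r = ±1 the angle is 0 (the lines coincide), and the angle grows steadily as |r| falls toward 0.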
COEFFICIENT OF REGRESSION
Let us consider the line of regression of y on x: y = a + bx. The principle of least squares leads to the two normal equations

Σy = n a + b Σx
Σxy = a Σx + b Σx²

We know that both the lines of regression pass through the point (x̄, ȳ). Solving the above two equations simultaneously, we get

b = byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = r σy/σx,  a = ȳ − byx x̄
SOME MORE FORMULAE
To find the regression coefficients and the correlation coefficient from the two lines of regression: to obtain the coefficient of regression of y on x, we write the regression equation in the form y = a + bx; the coefficient of x is then byx.

Given the two lines of regression, how do we determine which is the line of regression of y on x and which is the line of regression of x on y? Let us assume that line I is the equation of the line of regression of y on x and line II is the equation of the line of regression of x on y. We can then go on to obtain byx and bxy, and hence r² = byx · bxy.
If r² ≤ 1, our assumption is correct. However, if r² comes out to be greater than 1, then our assumption is incorrect, because r² must lie between 0 and 1.
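This r² check can be sketched as follows; the slopes passed in are illustrative inputs, not taken from the text, and the sketch assumes the two slopes have the same sign (as they must, both carrying the sign of r):

```python
def identify_regression_lines(b1, b2):
    """Test the assumption that line I (slope b1 when solved for y)
    is the regression of y on x and line II (slope b2 when solved
    for x) is the regression of x on y. The assumption holds iff
    r^2 = b1*b2 lies in [0, 1]; otherwise the roles are swapped
    and r^2 = 1/(b1*b2)."""
    r2 = b1 * b2
    if 0 <= r2 <= 1:
        return r2, True    # assumption correct: I is y on x
    return 1 / r2, False   # assumption wrong: swap the roles
```

For example, slopes 0.5 and 0.8 give r² = 0.4 and confirm the assumption; slopes 2.0 and 1.25 give a product of 2.5 > 1, so the roles must be swapped.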
CORRELATION ANALYSIS V/s REGRESSION ANALYSIS
1. Correlation literally means the relationship between two or more variables which vary in
sympathy so that the movements in one tend to be accompanied by the corresponding
movements in the other(s). On the other hand, regression means stepping back or returning to
the average value and is a mathematical measure expressing the average relationship between
the two variables.
2. The correlation coefficient rxy between two variables x and y is a measure of the direction and degree of the linear relationship between them, and this relationship is mutual. It is symmetric, i.e., ryx = rxy, and it is immaterial which of x and y is the dependent variable and which is the independent variable. Regression analysis, on the other hand, aims at establishing the functional relationship between the two variables under study and then using this relationship to predict or estimate the value of the dependent variable for any given value of the independent variable. It also reflects the nature of the variables, i.e., which is the dependent variable and which is the independent variable. Regression coefficients are not symmetric in x and y, i.e., byx ≠ bxy.
3. Correlation need not imply cause and effect relationship between the variables under
study. However, regression analysis clearly indicates the cause and effect relationship
between the variables. The variable corresponding to cause is taken as independent
variable and the variable corresponding to effect is taken as dependent variable.
4. The correlation coefficient rxy is a relative measure of the linear relationship between x and y and is independent of the units of measurement. It is a pure number lying between −1 and +1.
On the other hand, the regression coefficients byx and bxy are absolute measures representing the change in the value of the variable y (or x) for a unit change in the value of the variable x (or y). Once the functional form of the regression curve is known, by substituting the value of the independent variable we can obtain the value of the dependent variable, and this value will be in the units of measurement of that variable.
The y-series (Purchases) is: 71, 75, 69, 97, 70, 91, 39, 61, 80, 47 (n = 10). Taking deviations dx = x − x̄ = x − 90 and dy = y − ȳ = y − 70:

  x     y     dx     dy     dx²    dy²    dx·dy
 91    71      1      1      1      1       1
 97    75      7      5     49     25      35
 67    70    −23      0    529      0       0
  …     …      …      …      …      …       …
Σx = 900   Σy = 700   Σdx = 0   Σdy = 0   Σdx² = 6360   Σdy² = 2868   Σdx·dy = 3900

Regression coefficients:

byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = Σdx·dy / Σdx² = 3900/6360 = 0.6132
bxy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)² = Σdx·dy / Σdy² = 3900/2868 = 1.360

Regression equations: the equation of the line of regression of y on x is

y − ȳ = byx (x − x̄)
⇒ y − 70 = 0.6132 (x − 90) = 0.6132x − 55.188
⇒ y = 0.6132x − 55.188 + 70.000
⇒ y = 0.6132x + 14.812
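As a quick arithmetic check, the line of regression of y on x can be recomputed from the totals quoted in the example alone; the variable names below are mine:

```python
# Totals quoted in the worked example (n = 10, x_bar = 90, y_bar = 70).
sum_dxdy = 3900
sum_dx2 = 6360
sum_dy2 = 2868
x_bar, y_bar = 90, 70

byx = sum_dxdy / sum_dx2          # coefficient of regression of y on x
bxy = sum_dxdy / sum_dy2          # coefficient of regression of x on y

# Line of regression of y on x: y - y_bar = byx * (x - x_bar),
# i.e. y = byx * x + intercept.
intercept = y_bar - byx * x_bar

r_squared = byx * bxy             # must lie between 0 and 1
```

The slope comes out to 0.6132 and the intercept to about 14.81, matching the equation above, and byx·bxy stays below 1 as it must.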