
● Statistical methods of Central Tendency, Dispersion etc. are helpful for the purpose of comparison and analysis of distributions involving only one variable, i.e., a Univariate Distribution.
● We may however come across certain series where each item of the series may assume the values of two or more variables.
● Distributions in which each unit of the series assumes 2 values are called Bivariate Distributions.
● Distributions in which each unit of the series assumes more than 2 values are called Multivariate Distributions.
● Thus, in a Bivariate Distribution we are given a set of pairs of observations, one value of each pair being the value of each of the two variables.

Examples of Bivariate Distributions:
➢ The series of marks of individual students in 2 subjects in an exam.
➢ The series of Sales revenue & Advertising expenditure of different companies in a particular year.
CORRELATION & CORRELATION ANALYSIS
In a Bivariate distribution, we may be interested to find if
there is any relationship between the 2 variables under
study.
The CORRELATION is a statistical tool which studies
the relationship between the two variables.
CORRELATION ANALYSIS involves methods and
techniques used for studying and measuring the extent
of the relationship between the two variables.
CORRELATION

● ‘When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as Correlation’.
● ‘Correlation is an analysis of the covariation between two or more variables’.
● ‘The effect of Correlation is to reduce the range of uncertainty of our predictions’.

Two variables are said to be correlated if a change in one variable results in a corresponding change in the other variable.
TYPES OF CORRELATION
❖ Positive & Negative Correlation:
➢ If the values of the two variables deviate in the same direction, correlation is
said to be Positive or Direct.
➢ If the values of the two variables deviate in opposite direction, correlation is
said to be Negative or Inverse.
❖ Linear & Non-Linear Correlation:
➢ Correlation between two variables is said to be Linear, if corresponding to a
unit change in one variable there is a constant change in the other variable
over the entire range of the variable value.
➢ The relationship between two variables is said to be Non-Linear or Curvilinear, if corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate.
❖ Simple, Partial & Multiple Correlation:
➢ If only two variables are involved in a study, then the
correlation is said to be Simple Correlation.
➢ When three or more variables are involved in a study,
and we consider only two variables influencing each
other while the effect(s) of the other variable(s) is/are
held constant, it is a case of Partial Correlation.
However, if we study the influence of all the variables at
the same time, it is a case of Multiple Correlation.
In all these cases involving two or more variables, we may be interested in seeing:

➔ If there is any association between the variables.
➔ If there is an association, is it strong enough to be useful.
➔ If so, what form the relationship between the two variables takes.
➔ How we can make use of that relationship for predictive purposes, that is, forecasting, and
➔ How good such predictions will be.
CORRELATION & CAUSATION

Correlation analysis enables us to have an idea about the degree and direction of the relationship between the two variables under study. However, it fails to reflect upon the cause and effect relationship between the variables.

In a bivariate distribution, if the variables have a cause and effect relationship, they are bound to vary in sympathy with each other and therefore there is bound to be a high degree of correlation between them.

In other words, causation always implies correlation. However, the converse is not true, i.e., even a fairly high degree of correlation between the two variables need not imply a cause and effect relationship between them.
REASONS FOR HIGH DEGREE OF CORRELATION

➢ Mutual Dependence: the phenomena under study may inter-influence each other. Such a situation is usually observed in data relating to economic and business situations. Example - Supply & Demand and Fashion.
➢ Both the variables being influenced by the same external factor: A high degree of correlation between the two variables may be due to the effect or interaction of a third variable or a number of variables on each of these two variables. Example - Yield of two unrelated crops.
➢ Pure Chance: It may happen that a small randomly selected sample from a bivariate distribution may show a fairly high degree of correlation, though actually the variables may not be correlated in the population. Such correlation may be attributed to chance fluctuation.
CORRELATION ANALYSIS

Correlation Analysis is a statistical technique used to indicate the nature and degree of relationship existing between one variable and the other(s).

It is also used along with Regression Analysis to measure how well the regression line explains the variations of the dependent variable with the independent variable.

Types of correlation analysis methods:

1. Scatter Diagram.
2. Correlation Graph.
3. Pearson’s Coefficient of Correlation.
4. Spearman’s Rank Correlation.
5. Concurrent Deviation Method.
SCATTER DIAGRAM
● This method is also known as Dotogram or Dot Diagram.
● The Scatter diagram is one of the simplest methods of diagrammatic representation of a bivariate distribution and provides us with one of the simplest tools for ascertaining the correlation between two variables.
● Under this method, both the variables are plotted on the
graph paper by putting dots. The diagram so obtained is
called Scatter Diagram (It is customary to take the dependent
variable along the Y axis and the independent variable along
the X axis).
From scatter diagram we can form a fairly good, though rough, idea about the
relationship between the two variables. We should keep the following points in
mind while interpreting correlation from a scatter diagram:
➢ If the plotted points are very close to each other, it indicates high degree of
correlation and vice-versa.
➢ If the points on the scatter diagram reveal any trend, the variables are said
to be correlated and if no trend is revealed the variables are uncorrelated.
➢ If there is an upward trend rising from lower left hand corner and going
upwards to the upper right hand corner, the correlation is positive, since
this reveals that the values of the two variables move in the same
direction.
➢ If the points depict a downward trend from the upper left
corner to the lower right hand corner, the correlation is
negative since in this case the values of the two variables
move in the opposite direction.
➢ In particular, if all the points lie on a straight line starting from
the left bottom and going up towards the right top, the
correlation is perfect and positive.
➢ If all the points lie on a straight line starting from left top and
coming down to the right bottom, the correlation is perfect and
negative.
➢ The method of Scatter Diagram is readily comprehensible and
enables us to form a rough idea of the nature of the
relationship between the two variables merely by observation.
➢ Moreover, this method is not affected by extreme observations.
➢ However, this method is not suitable if the number of observations is fairly large.
➢ The method of scatter diagram only tells us about the nature of
the relationship whether it is positive or negative and whether it
is high or low. It doesn’t provide us an exact measure of the
extent of the relationship between the two variables.
KARL PEARSON’S COEFFICIENT OF CORRELATION

COVARIANCE METHOD:

A mathematical method for measuring the intensity or the magnitude of the linear relationship between two variables was suggested by Karl Pearson and it is by far the most widely used method in practice.

Karl Pearson’s measure, known as the Pearsonian Correlation Coefficient between 2 variables X and Y, usually denoted by r(X,Y) or simply r, is a numerical measure of the linear relationship between them, defined as the ratio of the covariance between X and Y to the product of the standard deviations of X and Y.
Symbolically, when (x1, y1), (x2, y2), ..., (xn, yn) are n pairs of observations of the variables X and Y in a bivariate distribution:

r(X,Y) = Cov(X,Y) / (σx · σy)

Substituting the expressions for Cov(X,Y), σx and σy, we can write the Pearsonian correlation coefficient as:

r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]
Properties:

● The Pearsonian Correlation Coefficient cannot exceed 1 numerically, i.e., −1 ≤ r ≤ 1.
  ○ This provides us with a check on our calculations.
  ○ The sign of r indicates the nature of the correlation.
  ○ The following table sums up the degree of correlation corresponding to various values of r:

Value of r      Degree of Correlation
±1              Perfect Correlation
0.90 or more    Very high degree of Correlation
0.75 to 0.90    Sufficiently high degree of Correlation
0.60 to 0.75    Moderate degree of Correlation
0.30 to 0.60    Only the possibility of Correlation
< 0.30          Possibly no Correlation
0               Absence of Correlation
● The Pearsonian Correlation Coefficient is independent of change of origin and scale.
  ○ If U = (X−A)/h and V = (Y−B)/k, where A, B, h and k are constants and h > 0, k > 0, then the correlation coefficient between X and Y is the same as the correlation coefficient between U and V, i.e., r(X,Y) = r(U,V).

The formula for Pearson’s Correlation Coefficient becomes quite tedious to use in numerical problems if X and/or Y are fractional or if X and Y are large. In such cases, we can conveniently change the origin and scale (if possible) in X or/and Y to get new variables U and V, and compute the correlation between U and V as:

r(X,Y) = r(U,V) = [n·Σuv − Σu·Σv] / √[(n·Σu² − (Σu)²) · (n·Σv² − (Σv)²)]
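The invariance under change of origin and scale can be checked numerically. A small sketch, with made-up data and arbitrarily chosen constants A, h, B, k (h, k > 0):

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

x = [120, 140, 130, 150, 160]
y = [2.1, 2.5, 2.2, 2.8, 3.0]
# U = (X - A)/h and V = (Y - B)/k with A=140, h=10, B=2.5, k=0.1
u = [(xi - 140) / 10 for xi in x]
v = [(yi - 2.5) / 0.1 for yi in y]
print(abs(pearson_r(x, y) - pearson_r(u, v)) < 1e-9)  # True: r(X,Y) = r(U,V)
```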

● 2 independent variables are uncorrelated, but the converse is not true.
  ○ If X and Y are independent variables, then r(X,Y) = 0.
  ○ However, the converse of the theorem is not true, i.e., uncorrelated variables need not necessarily be independent. Uncorrelation between the two variables simply implies the absence of any linear relationship between them. They may however be related in some form other than a straight line, i.e., quadratic, logarithmic etc.
● The Pearsonian Correlation Coefficient is the geometric mean of the 2 regression coefficients, i.e., r = ±√(byx · bxy).
● The square of the Pearsonian Correlation Coefficient is known as the Coefficient of Determination.

Coefficient of Determination

The most common interpretation of the coefficient of determination is how well the regression model fits the observed data. For example, a coefficient of determination of 0.6 shows that 60% of the variation in the data is explained by the regression model.

The coefficient of determination, in statistics, R² (or r²), is a measure that assesses the ability of a model to predict or explain an outcome in the linear regression setting. More specifically, R² indicates the proportion of the variance in the dependent variable (Y) that is predicted or explained by linear regression and the predictor variable (X, also known as the independent variable).
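The "proportion of variance explained" reading of R² can be verified numerically: fit a least-squares line, then compare the residual sum of squares to the total sum of squares. A sketch with illustrative data; the helper `fit_line` is an assumed name, not from the slides:

```python
def fit_line(x, y):
    """Least-squares line y = a + b·x (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]
a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]

my = sum(y) / len(y)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
ss_tot = sum((yi - my) ** 2 for yi in y)                  # total variation in Y
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))
```

For a straight-line fit this R² coincides with the square of Pearson's r between x and y.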
There are two functions of the Probable Error:

● Determination of Limits: The limits of the population correlation coefficient are

  r ± PE(r), where PE(r) = 0.6745 · (1 − r²)/√N

  implying that if we take another random sample of the size N from the same population, then the observed value of the correlation coefficient in the second sample can be expected to lie within the limits given above, with 0.5 probability.

● Interpretation of ‘r’:
  ○ If r < PE(r), there is no evidence of correlation, i.e., a case of insignificant correlation.
  ○ If r > 6·PE(r), correlation is significant.
  ○ If r < 6·PE(r), correlation is insignificant.
  ○ If the probable error is small, correlation exists where r > 0.5.
SPEARMAN’S RANK CORRELATION
Sometimes we come across statistical series in which the variables under consideration are not capable of quantitative measurement, but can be arranged in serial order. This happens when we are dealing with qualitative characteristics (attributes) such as bravery, honesty, character, morality etc., which cannot be measured quantitatively but can be arranged serially.

In such situations Karl Pearson’s coefficient of correlation cannot be used.

Charles Edward Spearman developed a formula in 1904 which consists in obtaining the correlation coefficient between the ranks of n individuals in the two attributes under study.
Suppose we want to find if two characteristics A, say intelligence, and B, say beauty, are related or not. Both the characteristics are incapable of quantitative measurement but we can arrange a group of n individuals in order of merit (ranks) w.r.t. proficiency in the 2 characteristics.

Let the random variables X and Y denote the ranks of the individuals in the characteristics A and B respectively. If we assume that there is no tie, i.e., if no two individuals get the same rank in a characteristic, then X and Y assume numerical values ranging from 1 to n.

Spearman’s Rank Correlation Coefficient, usually denoted by ρ (Rho), is given by the formula:

ρ = 1 − 6·Σd² / (n(n² − 1))

Where d is the difference between the pair of ranks of the same individual in the two characteristics and n is the number of pairs.
Spearman’s Rank Correlation Coefficient lies between −1 and 1, i.e., −1 ≤ ρ ≤ 1.

ρ = 1 iff Σd² = 0. Now, Σd² = 0 iff each d = 0, i.e., the ranks of an individual are the same in both the characteristics.

On the other hand, ρ will be minimum, i.e., ρ = −1, if Σd² is maximum, i.e., the deviation d is maximum, which is so if the ranks of the individuals in the two characteristics are in the reverse (opposite) order, as in the table below:

Example:

ρ = 1:                 ρ = −1:
x  1  2  3  ...  n     x  1  2    3    ...  n
y  1  2  3  ...  n     y  n  n-1  n-2  ...  1

Computation of Rank Correlation Coefficient.

Two methods:

1. When actual ranks are given.
2. When ranks are not given.
CASE 1 - When actual ranks are given.

In this situation the following steps are involved:

1. Compute d, the difference of ranks.
2. Compute d².
3. Obtain the sum Σd².
4. Use the formula for ρ.
CASE 2 - When ranks are not given.

Spearman’s Rank Correlation Coefficient formula can also be used even if we are dealing with variables which are measured quantitatively, i.e., when the actual data, but not the ranks, relating to the two variables are given.

In such a case we shall have to convert the data into ranks. The highest (smallest) observation is given rank 1. The next highest (next smallest) observation is given rank 2, and so on.

It is immaterial in which way (ascending or descending) the ranks are assigned. However, the same approach should be followed for all the variables under consideration.
SPECIAL CASE - When ranks are repeated.

In case of attributes, if there is a tie, i.e., if any two or more individuals are placed together in any classification w.r.t. an attribute, or if in case of variable data there is more than one item with the same value in either or both the series, then Spearman’s formula for calculating the Rank Correlation Coefficient breaks down, since in this case the variables X and Y do not take the values from 1 to n and consequently x̄ ≠ ȳ (we had initially assumed that x̄ = ȳ).
In such a case, common ranks are assigned to the repeated items. These common ranks are the arithmetic mean of the ranks which these items would have got if they were different from each other, and the next item will get the rank next to the ranks used in computing the common ranks.
If only a small proportion of the ranks are tied, this technique may be applied together with the rank correlation coefficient formula.

If a large proportion of ranks are tied, it is advisable to apply an adjustment or a correction factor (C.F.) to the formula as below:

In the formula, add the factor m(m² − 1)/12 to Σd²,

where m is the no. of times an item is repeated. This factor is to be added for each repeated value in both the series.
CHARACTERISTICS OF SPEARMAN’S RANK CORRELATION COEFFICIENT

● We always have −1 ≤ ρ ≤ 1, which provides a check for numerical calculations.
● Since Spearman’s rank correlation coefficient is nothing but the Pearsonian correlation coefficient between the ranks, it can be interpreted in the same way as Karl Pearson’s correlation coefficient.
● Karl Pearson’s correlation coefficient assumes that the parent population from which sample observations are drawn is normal. If this assumption is violated, then we need a measure which is distribution free (or non-parametric). A distribution free measure is one which does not make any assumptions about the parameters of the population. Spearman’s ρ is such a measure, since no strict assumptions are made about the form of the population from which the sample observations are drawn.
● Spearman’s formula is easy to understand and apply as compared with Karl Pearson’s formula. The values obtained by the two formulae, r and ρ, are generally different. The difference arises due to the fact that when ranking is used instead of the full set of observations, there is always some loss of information. Unless many ties exist, the coefficient of rank correlation should be slightly lower than the Pearsonian coefficient.
● Spearman’s formula is the only formula to be used for finding the correlation coefficient if we are dealing with qualitative characteristics which cannot be measured quantitatively but can be arranged serially. It can also be used where actual data are given. In case of extreme observations, Spearman’s formula is preferred to Pearson’s formula.

● Spearman’s formula has its limitations also. It is not practicable in the case of a bivariate frequency distribution. For N > 30, this formula should not be used unless the ranks are given, since in the contrary case the calculations are quite time consuming.
LIMITATIONS OF CORRELATION ANALYSIS

Correlation Analysis is a statistical tool which should be properly used so that correct results can be obtained. Given below are some limitations/errors frequently made in the use of Correlation Analysis:

● Correlation analysis cannot determine a cause and effect relationship. One should not assume that a change in the Y variable is caused by a change in the X variable unless one is reasonably sure that one variable is the cause while the other is the effect.
● Another mistake that occurs frequently is on account of misinterpretation of the coefficient of correlation.
Suppose in one case r = 0.7; it will be wrong to interpret that the correlation explains 70% of the total variation in Y. The error can be seen easily when we calculate the Coefficient of Determination. Here the Coefficient of Determination will be 0.49. This means that only 49% of the total variation in Y is explained.

Another mistake in the interpretation of the coefficient of correlation occurs when one concludes a positive or negative relationship even though the two variables are actually unrelated. For example, the age of students and their score in an examination have no relation with each other. The two variables may show similar movements, but there does not seem to be a common link between them.
REGRESSION ANALYSIS
In business, it often becomes necessary to have a forecast so that the management can take a decision regarding a product or a particular course of action.

In order to make a forecast, one has to ascertain some relationship between two or more variables relevant to a particular situation.

For example, a company is interested to know how the demand for television sets will increase in the next 5 years, keeping in mind the growth of population in a certain town. Here, it is clearly assumed that the increase in population will lead to an increased demand for TVs. Thus, determining the nature and extent of the relationship between these two variables becomes important for the company.
In Correlation analysis, we studied in some depth the linear correlation between 2 variables. Here we have a similar concern, the association between variables, except that we develop it further in two aspects.

FIRST, we learn how to build statistical models of the relationship between the variables to have a better understanding of their features.

SECOND, we extend the models to consider their use in forecasting.
The literal meaning of the word ‘REGRESSION’ is stepping back or returning to the average value. The term was first used by the British biometrician Sir Francis Galton in connection with some studies he made on estimating the extent to which the stature of the sons of tall parents reverts or regresses back to the mean stature of the population.

He studied the relationship between the heights of about 1000 fathers and sons and published the results in a paper ‘Regression towards Mediocrity in Hereditary Stature’. The interesting features of his study were:

I. Tall fathers have tall sons and short fathers have short sons.
II. The average height of the sons of a group of tall fathers is less than that of the fathers, and the average height of the sons of a group of short fathers is more than that of the fathers.
In other words, Galton’s studies revealed that the offspring of abnormally tall or short parents tend to revert or step back to the average height of the population, a phenomenon which Galton described as ‘Regression to Mediocrity’.

He concluded that if the average height of a certain group of fathers is ‘a’ cms above (below) the general average height, then the average height of their sons will be (a · r) cms above (below) the general average height, where r is the correlation coefficient between the heights of the given group of fathers and their sons. In this case the correlation is positive and, since 0 < r < 1, the sons’ average deviation a · r is smaller than a.
But today the word Regression as used in statistics has a much wider perspective, without any reference to biometry.

Regression Analysis, in general, means the estimation or prediction of the unknown value of one variable from the known value of the other variable.

It is extensively used in almost all sciences - natural, social and physical. It is especially used in business and economics to study the relationship between two or more variables that are related causally, and for the estimation of demand and supply curves, cost functions, production and consumption functions etc.
Prediction or estimation is one of the major problems in almost all spheres of human activity. The estimation or prediction of future production, consumption, prices, investment, sales, profit, income etc. is of paramount importance to a businessman or economist. Population estimates and population projections are indispensable for the effective planning of an economy. Pharmaceutical concerns are interested in studying or estimating the effect of new drugs.

Regression Analysis is one of the scientific techniques for making such predictions.

“Regression Analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data”.
We come across a number of interrelated events in our day-to-day life. For example, the yield of a crop depends on the rainfall, the price of a product depends on the production and advertising expenditure, the demand for a particular product depends on its price, the expenditure of a person depends on his/her income, and so on.

The regression analysis confined to the study of only two variables at a time is termed Simple Regression.

But quite often the values of a particular phenomenon may be affected by a multiplicity of factors. The regression analysis for studying more than 2 variables at a time is known as Multiple Regression.
In Regression Analysis there are two types of variables:

➔ The variable whose value is influenced, or is to be predicted, is called the Dependent or Regressed or Explained variable, and
➔ The variable which influences the values, or is used for prediction, is called the Independent or Regressor or Predictor or Explanator variable.
LINEAR & NON-LINEAR REGRESSION

If the given bivariate data are plotted on a graph, the points so obtained on the scatter diagram will more or less concentrate round a curve, called the Curve of Regression.

The mathematical equation of the regression curve, usually called the Regression Equation, enables us to study the average change in the value of the dependent variable for any given value of the independent variable.

If the Regression Curve is a straight line, we say that there is Linear Regression between the variables under study. The equation of such a curve is the equation of a straight line, i.e., a first degree equation in the variables x and y (y = bx + a).

However, if the curve of regression is not a straight line, the regression is termed Curved or Non-Linear Regression. The regression equation will then be a functional relation between x and y involving terms in x and y of degree higher than one, i.e., involving terms of the type x², y², xy etc.
LINES OF REGRESSION

A line of regression is the line which gives the best estimate of one variable for any given value of the other variable.

In case of two variables x and y, we shall have two lines of regression: one of y on x and the other of x on y.

The Line of Regression of y on x is the line which gives the best estimate for the value of y for any given value of x.

Similarly, the Line of Regression of x on y is the line which gives the best estimate for the value of x for any given value of y.

The term best fit is interpreted in accordance with the Principle of Least Squares, which consists in minimising the sum of the squares of the residuals or the errors of estimates, i.e., the deviations between the given observed values of the variable and their corresponding estimated values as given by the line of best fit.

We may minimise the sum of the squares of the errors parallel to the y axis or parallel to the x axis; the former gives the equation of the line of regression of y on x and the latter gives the equation of the line of regression of x on y.
REGRESSION LINE of Y on X:   y − ȳ = byx (x − x̄),   where byx = r·σy/σx

REGRESSION LINE of X on Y:   x − x̄ = bxy (y − ȳ),   where bxy = r·σx/σy
Let y = a + bx be the line of regression of y on x. The Principle of Least Squares leads to the two normal equations:

Σy = n·a + b·Σx
Σxy = a·Σx + b·Σx²

Solving the 2 above equations, we can find the equation of the line of regression of y on x.
Let x = A + By be the line of regression of x on y. The normal equations are:

Σx = n·A + B·Σy
Σxy = A·Σy + B·Σy²

Solving the 2 above equations, we can find the equation of the line of regression of x on y.

The two regression equations imply that the lines of regression of y on x and of x on y both pass through the point (x̄, ȳ).

In other words, the mean values (x̄, ȳ) can be obtained as the point of intersection of the two regression lines.
WHY TWO LINES OF REGRESSION
There are always 2 lines of regression, one of y on x and the other of x on y.
The line of regression of y on x is used to estimate or predict the value of y for
any given value of x i.e., when y is the dependent variable and x is the
independent variable. The estimate so obtained will be best in the sense that it
will have the minimum possible error as defined by the principles of least
squares.

We can also obtain an estimate of x for any given value of y using this equation, but the estimate so obtained will not be the best, since this equation is obtained by minimising the sum of the squares of the errors of estimates in y and not in x.
Hence, to estimate or predict x for any given value of y, we use the regression equation of x on y, which is derived by minimising the sum of the squares of the errors of estimates in x. Here x is the dependent variable and y is the independent variable.

The two regression equations are not reversible or interchangeable, for the simple reason that the basis and assumptions for deriving these equations are quite different.

The regression equation of y on x is obtained by minimising the sum of the squares of the errors parallel to the y axis, while the regression equation of x on y is obtained by minimising the sum of the squares of the errors parallel to the x axis.

In the particular case of perfect correlation, positive or negative, the line of regression of y on x coincides with the line of regression of x on y.

In general we always have two lines of regression, except in the particular case of perfect correlation, when the two lines coincide and we get only 1 line.
ANGLE BETWEEN THE REGRESSION LINES

If θ is the acute angle between the two lines of regression, then:

tan θ = [(1 − r²)/|r|] · [σx·σy / (σx² + σy²)]

In the particular case of perfect correlation, r = ±1, so tan θ = 0 and θ = 0 or π, i.e., the two lines will either coincide or be parallel. But since both the lines of regression intersect at (x̄, ȳ), they cannot be parallel. Hence, in case of perfect correlation, positive or negative, the two lines of regression coincide.

If r = 0, then tan θ → ∞, i.e., θ = π/2. Thus, if the two variables are uncorrelated, the two lines of regression become perpendicular to each other.

Hence, if r = 0, the two lines of regression are perpendicular to each other; they are parallel to the x axis and the y axis respectively, and intersect at (x̄, ȳ).

We have seen that if r = 0, the two lines of regression are perpendicular to each other, and that in the particular case of perfect correlation the two lines of regression coincide.

This leads us to the conclusion that for a higher degree of correlation between the variables, the angle between the lines is smaller.

On the other hand, the angle between the lines increases, i.e., the lines move apart, as the value of the correlation coefficient decreases. In other words, if the lines of regression make a larger angle, they indicate a poor degree of correlation between the variables and ultimately, for r = 0, the two lines become perpendicular. Thus, by plotting the lines of regression on a graph paper, we can have an approximate idea about the degree of correlation between the two variables under study.
COEFFICIENT OF REGRESSION
Let us consider the line of regression of y on x: y = bx + a.

The coefficient of x, ‘b’, which is the slope of the line of regression of y on x, is called the Coefficient of Regression of y on x, denoted byx. It represents the increment in the value of the dependent variable y for a unit change in the independent variable x.

In other words, it represents the rate of change of y w.r.t. x. Similarly, the coefficient of regression of x on y, bxy, represents the rate of change of x w.r.t. y.
Accordingly, the equation of the line of regression of y on x is given by:

y − ȳ = byx (x − x̄), where byx = r·σy/σx

And the equation of the line of regression of x on y becomes:

x − x̄ = bxy (y − ȳ), where bxy = r·σx/σy
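These slopes can be computed directly from data. A minimal sketch (hypothetical data, standard library only; byx and bxy are computed as Cov(x, y)/Var(x) and Cov(x, y)/Var(y)):

```python
from statistics import mean

def regression_slopes(x, y):
    """Return (byx, bxy): byx = Cov(x, y)/Var(x) is the slope of the line of
    regression of y on x; bxy = Cov(x, y)/Var(y) is that of x on y."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sxx, sxy / syy

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
byx, bxy = regression_slopes(x, y)
# Line of y on x: y - ybar = byx*(x - xbar); line of x on y: x - xbar = bxy*(y - ybar)
print(byx, bxy)  # 0.6 1.0
```

Note that the common factor Cov(x, y) in the numerators is what makes the two slopes always share the same sign.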
FORMULAE FOR NUMERICAL COMPUTATION

Writing dx = x − x̄ and dy = y − ȳ:

byx = Σdxdy / Σdx²   and   bxy = Σdxdy / Σdy²
CHARACTERISTICS OF REGRESSION COEFFICIENT

● The correlation coefficient between two variables x and y is a symmetric function of x and y. However, the regression coefficients are not symmetric functions of x and y.
● We have: byx = Cov(x, y)/σx², bxy = Cov(x, y)/σy² and r = Cov(x, y)/(σx·σy).

From the above formulae, we observe that the sign of each regression coefficient depends on the covariance term, since the variances are positive. If Cov(x, y) is positive, both the regression coefficients are positive, and if Cov(x, y) is negative, both the regression coefficients are negative. Further, if Cov(x, y) is positive, all the 3 quantities (byx, bxy and r) are positive, and vice versa.
PROPERTIES OF REGRESSION COEFFICIENT
● The correlation coefficient is the geometric mean (GM) of the regression coefficients: r² = byx·bxy.
● If one of the regression coefficients is greater than 1 in modulus, the other regression coefficient must be less than 1 in modulus, since byx·bxy = r² ≤ 1.
● The arithmetic mean of the modulus values of the regression coefficients is greater than or equal to the modulus value of the correlation coefficient.
● Regression coefficients are independent of change of origin but not of scale.
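These properties can be checked numerically. The sketch below uses the Sales/Purchases data from the worked example later in these notes; the helper function is an illustration, not part of the original text.

```python
import math
from statistics import mean

def regression_coefficients(x, y):
    """byx = Cov(x, y)/Var(x), bxy = Cov(x, y)/Var(y)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    byx = sxy / sum((a - xbar) ** 2 for a in x)
    bxy = sxy / sum((b - ybar) ** 2 for b in y)
    return byx, bxy

x = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]
byx, bxy = regression_coefficients(x, y)
# r is the geometric mean of the regression coefficients, with their common sign
r = math.copysign(math.sqrt(byx * bxy), byx)

assert math.isclose(r * r, byx * bxy)        # r^2 = byx * bxy
assert not (abs(byx) > 1 and abs(bxy) > 1)   # both cannot exceed 1 in modulus
assert (abs(byx) + abs(bxy)) / 2 >= abs(r)   # AM of |coefficients| >= |r|
print(round(byx, 4), round(bxy, 4), round(r, 4))
```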
SOME MORE FORMULAE
To find the Mean Values from the two Lines of Regression:

Let us suppose that the two lines of regression are:
a1x + b1y + c1 = 0 and a2x + b2y + c2 = 0

We know that both the lines of regression pass through the point (x̄, ȳ). Solving the above two equations simultaneously, we get the mean values x̄ and ȳ.
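A sketch of this step: solving the pair of lines by Cramer's rule gives the point of intersection, i.e., the means. The two lines in the example call are hypothetical.

```python
def means_from_lines(a1, b1, c1, a2, b2, c2):
    """Intersection of a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0.
    Both lines of regression pass through (xbar, ybar), so the
    intersection gives the two means."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("parallel lines: not a valid pair of regression lines")
    xbar = (b1 * c2 - b2 * c1) / det
    ybar = (a2 * c1 - a1 * c2) / det
    return xbar, ybar

# Hypothetical lines: 3x + 2y - 26 = 0 and 6x + y - 31 = 0
print(means_from_lines(3, 2, -26, 6, 1, -31))  # (4.0, 7.0)
```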
To find the Regression Coefficients and the Correlation Coefficient from the two lines of Regression:

Let us suppose that the two lines of regression are given.

To obtain the coefficient of regression of y on x, we write the corresponding regression equation in the form y = a + bx. Then b, the coefficient of x, gives the value of byx. Similarly, writing the other equation in the form x = a′ + b′y, the coefficient b′ of y gives bxy. The correlation coefficient then follows from r² = byx·bxy.
Given the two lines of Regression, how to determine which is the line of regression of y on x and which is the line of regression of x on y:

Suppose the 2 lines of regression are:
a1x + b1y + c1 = 0 ……….I
a2x + b2y + c2 = 0 ……….II

Let's assume that I is the equation of the line of regression of y on x and II is the equation of the line of regression of x on y. We can then go on to obtain byx and bxy, and hence r² = byx·bxy.

If r² ≤ 1, then our assumption is correct. However, if r² comes out to be greater than 1, then our assumption is incorrect, because r² must lie between 0 and 1; in that case, I is the line of regression of x on y and II is the line of regression of y on x.
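This check can be sketched as follows (the pair of lines in the example call is hypothetical; solving line I for y gives its slope as a byx candidate, and solving line II for x gives a bxy candidate):

```python
def identify_regression_lines(a1, b1, a2, b2):
    """Given lines I: a1*x + b1*y + c1 = 0 and II: a2*x + b2*y + c2 = 0,
    decide which is the line of regression of y on x.
    Assume I is y on x: solving I for y gives byx = -a1/b1;
    solving II for x gives bxy = -b2/a2. Valid only if r^2 = byx*bxy <= 1."""
    byx, bxy = -a1 / b1, -b2 / a2
    r2 = byx * bxy
    if 0 <= r2 <= 1:
        return "I is y on x, II is x on y", r2
    # assumption fails (r^2 > 1): the roles of the two lines are swapped
    byx, bxy = -a2 / b2, -b1 / a1
    return "I is x on y, II is y on x", byx * bxy

# Hypothetical lines: 4x - 5y + 33 = 0 (I) and 20x - 9y - 107 = 0 (II)
label, r2 = identify_regression_lines(4, -5, 20, -9)
print(label)          # I is y on x, II is x on y
print(round(r2, 2))   # 0.36
```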
CORRELATION ANALYSIS V/s REGRESSION ANALYSIS
1. Correlation literally means the relationship between two or more variables which vary in
sympathy so that the movements in one tend to be accompanied by the corresponding
movements in the other(s). On the other hand, regression means stepping back or returning to
the average value and is a mathematical measure expressing the average relationship between
the two variables.
2. Correlation coefficient rxy between two variables x and y is a measure of the direction and degree of the linear relationship between the two variables, which is mutual. It is symmetric, i.e., ryx = rxy, and it is immaterial which of x and y is the dependent variable and which is the independent variable.

Regression analysis aims at establishing the functional relationship between the two variables under study and then using this relationship to predict or estimate the value of the dependent variable for any given value of the independent variable. It also reflects upon the nature of the variables, i.e., which is the dependent variable and which is the independent variable. Regression coefficients are not symmetric in x and y, i.e., byx ≠ bxy.
3. Correlation need not imply a cause and effect relationship between the variables under study. However, regression analysis clearly indicates the cause and effect relationship between the variables: the variable corresponding to cause is taken as the independent variable and the variable corresponding to effect is taken as the dependent variable.

4. Correlation coefficient rxy is a relative measure of the linear relationship between x and y and is independent of the units of measurement. It is a pure number lying between ±1.

On the other hand, the regression coefficients, byx and bxy, are absolute measures representing the change in the value of the variable y (or x) for a unit change in the value of the variable x (or y). Once the functional form of the regression curve is known, by substituting the value of the independent variable we can obtain the value of the dependent variable, and this value will be in the units of measurement of the dependent variable.
5. There may be nonsense correlation between two variables which is due to pure chance and has no practical relevance, e.g., the correlation between the size of shoe and the intelligence of a group of individuals. There is no such thing as nonsense regression.

6. Correlation analysis is confined only to the study of linear relationship between the variables and, therefore, has limited applications. Regression analysis has much wider applications as it studies linear as well as non-linear relationships between the variables.
From the following data, obtain the two regression equations:

Sales (x):     91  97  108  121  67  124  51  73  111  57
Purchases (y): 71  75   69   97  70   91  39  61   80  47
  x     y    dx = x - 90   dy = y - 70    dx²    dy²    dxdy
 91    71         1             1           1      1       1
 97    75         7             5          49     25      35
108    69        18            -1         324      1     -18
121    97        31            27         961    729     837
 67    70       -23             0         529      0       0
124    91        34            21        1156    441     714
 51    39       -39           -31        1521    961    1209
 73    61       -17            -9         289     81     153
111    80        21            10         441    100     210
 57    47       -33           -23        1089    529     759

Σx = 900   Σy = 700   Σdx = 0   Σdy = 0   Σdx² = 6360   Σdy² = 2868   Σdxdy = 3900
Here x̄ = 900/10 = 90 and ȳ = 700/10 = 70.

byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = Σdxdy / Σdx² = 3900/6360 = 0·6132

bxy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)² = Σdxdy / Σdy² = 3900/2868 = 1·360

Regression Equations
Equation of the line of regression of y on x is:
y − ȳ = byx (x − x̄)
⇒ y − 70 = 0·6132 (x − 90) = 0·6132x − 55·188
⇒ y = 0·6132x + 14·812
Equation of the line of regression of x on y is:
x − x̄ = bxy (y − ȳ)
⇒ x − 90 = 1·360 (y − 70) = 1·360y − 95·20
⇒ x = 1·360y − 5·20
Remark. We have
r² = byx·bxy = 0·6132 × 1·360 = 0·8340 ⇒ r = ±√0·8340 = ±0·9132
But since both the regression coefficients are positive, r must be positive. Hence, r = 0·9132.
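The whole computation can be verified with a short script (standard library only; byx, bxy and r are as defined earlier in these notes):

```python
from statistics import mean

sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]      # x
purchases = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]      # y

xbar, ybar = mean(sales), mean(purchases)                 # 90 and 70
dxdy = sum((x - xbar) * (y - ybar) for x, y in zip(sales, purchases))
dx2 = sum((x - xbar) ** 2 for x in sales)
dy2 = sum((y - ybar) ** 2 for y in purchases)
print(dxdy, dx2, dy2)                 # 3900 6360 2868

byx, bxy = dxdy / dx2, dxdy / dy2
print(round(byx, 4), round(bxy, 4))   # 0.6132 1.3598

# y on x: y = byx*x + (ybar - byx*xbar); x on y: x = bxy*y + (xbar - bxy*ybar)
print(round(ybar - byx * xbar, 3))    # 14.811
print(round(xbar - bxy * ybar, 3))    # -5.188
r = (byx * bxy) ** 0.5                # positive root: both coefficients are positive
print(round(r, 3))                    # 0.913
```

Any small differences in the intercepts against the hand computation come from intermediate rounding of the regression coefficients.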