
Online Class Etiquette and Precautions for the Students

• Access by outsiders is not allowed.
• Class content must not be shared with outsiders.
• Audio/video recordings of the online sessions must not be shared
on social media platforms such as Facebook, Twitter, etc.
• Any student found to breach this code of conduct will, upon
proper investigation, be subject to penalty as per the University
Guidelines.
Correlation Analysis

• The word combines two parts: "co" (together) and "relation"
(connection).
• A large number of problems involve the use of two or more
variables, i.e., movements in one are accompanied by
movements in the other.
• Examples: Income and expenditure
- Sales and advertisement
- Sales of masks and Covid-19
- E-learning apps and Covid-19
- Price of a commodity and quantity demanded
Correlation Analysis

• The statistical tool with the help of which we measure the
strength of the association between two or more variables is
termed correlation.

• A.M. Tuttle: "An analysis of the covariation of two or more
variables is usually called correlation."

• W.I. King: Correlation "means that between two series or groups
of data there exists some causal connection."
Significance of the study of Correlation

• Provides the degree of relationship between the variables


• Helpful for the purpose of prediction and forecasting
• Verifying the reliability and accuracy of the data
• Important from a business and economic point of view
CORRELATION DOES NOT NECESSARILY MEAN CAUSATION

• Two variables may be related to each other, but this does not
mean that one variable causes the other. For example, we
may find that logical reasoning and creativity are correlated,
but that does not mean that if we could increase people's logical
reasoning ability, we would produce greater creativity. We
need to conduct an actual experiment to unequivocally
demonstrate a causal relationship. But if it is true that
influencing someone's logical reasoning ability does influence
their creativity, then the two variables must be correlated
with each other. In other words, causation always implies
correlation; however, the converse is not true.
CORRELATION DOES NOT NECESSARILY MEAN
CAUSATION

1. Chance coincidence: A small bivariate sample may show a
relationship even though no such relationship exists in the
population.

2. Influence of other variables: It is possible that both
variables are influenced by one or more other variables. For
example, expenditure on food and entertainment for a given
number of households shows a positive relationship because
both have increased over time. But this is due to the rise in
family incomes over the same period. In other words, the two
variables have been influenced by a third variable: the increase
in family incomes.
CORRELATION DOES NOT NECESSARILY MEAN
CAUSATION

3. Mutual dependence: Both variables may be influencing each
other, so that we cannot say which is the cause and which is
the effect. For example, take the case of price and demand. A
rise in the price of a commodity may lead to a decline in the
demand for it. Here, price is the cause and demand is the
effect. In another situation, an increase in demand may lead
to a rise in price. Here, demand is the cause while price is the
effect, which is just the reverse of the earlier situation. In such
situations it is difficult to identify which variable is causing the
effect on which, as both are influencing each other.
CORRELATION DOES NOT NECESSARILY MEAN
CAUSATION

4. Influence of another variable: a positive correlation between
the yields of rice and jute may arise because both depend on
the amount of rainfall.
The foregoing discussion clearly shows that correlation does
not by itself indicate any causal or functional relationship. The
correlation coefficient is merely a mathematical relationship
and has nothing to do with cause and effect. It
only reveals co-variation between two variables. When there
is no cause-and-effect relationship in a bivariate series
and one interprets the relationship as causal, such a
correlation is called spurious or nonsense correlation.
Obviously, this will be misleading. As such, one has to be very
careful in correlation exercises and look into other relevant
factors before concluding a cause-and-effect relationship.
The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of the


strength of the relationship between two variables. It
requires interval or ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and
positive values indicate a direct relationship.

Types of correlation

1. Positive Correlation: occurs when an increase in one variable
is accompanied by an increase in the other.
2. Negative Correlation: occurs when an increase in one variable
is accompanied by a decrease in the other.
3. No Correlation: occurs when there is no linear
dependency between the variables.
4. Partial Correlation: measures the strength of the relationship
between two variables while controlling for the effect of one
or more other variables.
Types of correlation

6. Linear Correlation: Correlation is said to be linear if the ratio of change
is constant. When the output of a factory doubles when the number of
workers is doubled, this is an example of linear correlation.
• If a change in one variable is accompanied by a change in the other
variable in a constant ratio, it is a case of linear correlation. Observe the
following data:
• X : 10 20 30 40 50
• Y : 25 50 75 100 125

7. Non-linear Correlation: Correlation is said to be non-linear if the ratio
of change is not constant. For example, doubling the rainfall will not
double the harvest.
Perfect Correlation

Correlation Coefficient - Interpretation

Methods of Coefficient of Correlation

i. Scatter diagram
ii. Karl Pearson's Coefficient of Correlation
iii. Spearman's Rank Coefficient of Correlation
iv. Standard Error of the Coefficient of Correlation

i. Scatter diagram: This method is also known as a Dotogram or Dot
diagram. The scatter diagram is one of the simplest methods of
diagrammatic representation of a bivariate distribution. Under this
method, both variables are plotted on graph paper as dots. The diagram
so obtained is called a "Scatter Diagram". By studying the diagram, we
can form a rough idea of the nature and degree of the relationship
between the two variables.
Scatter diagram

1. If the plotted points are very close to each other, this indicates a high
degree of correlation. If the plotted points are far from each other, it
indicates a low degree of correlation.
2. If the points on the diagram reveal any trend (either upward or downward),
the variables are said to be correlated; if no trend is revealed, the
variables are uncorrelated.
3. If there is an upward trend rising from the lower left-hand corner to the
upper right-hand corner, the correlation is positive, since this reveals that
the values of the two variables move in the same direction. If, on the
other hand, the points depict a downward trend from the upper left-hand
corner to the lower right-hand corner, the correlation is negative, since in
this case the values of the two variables move in opposite directions.
4. In particular, if all the points lie on a straight line starting from the lower
left and going up towards the upper right, the correlation is perfect and
positive; if all the points lie on a straight line starting from the upper left
and coming down to the lower right, the correlation is perfect and negative.
Scatter diagram

Figure 4-1 Scatter Diagrams


2. PEARSON’S COEFFICIENT OF CORRELATION



A mathematical method for measuring the intensity or magnitude of the
linear relationship between two variables was suggested by Karl Pearson
(1857-1936), a British biometrician and statistician, and it is by far the
most widely used method in practice. Karl Pearson's measure, known as the
Pearsonian correlation coefficient between two variables X and Y, usually
denoted by r(X,Y), rxy, or simply r, is a numerical measure of the linear
relationship between them and is defined as the ratio of the covariance
between X and Y to the product of the standard deviations of X and Y.
Symbolically, r = Cov(X, Y) / (σx σy).
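This ratio-of-covariance definition can be sketched directly in code. A minimal Python illustration (the helper name `pearson_r` is mine, not from the slides), applied to the perfectly linear X, Y series from the linear-correlation example:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product
    of their standard deviations (population form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# The perfectly linear series X: 10..50, Y: 25..125 used earlier:
X = [10, 20, 30, 40, 50]
Y = [25, 50, 75, 100, 125]
print(pearson_r(X, Y))  # a constant ratio of change gives r = 1.0
```

Because Y changes in a constant ratio with X, the coefficient comes out at its maximum value of +1.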
PEARSON’S COEFFICIENT OF CORRELATION

• Assumption of Karl Pearson’s Coefficient of Correlation:

• Linear relationship
• Causal relationship
• Error of measurement
PEARSON’S COEFFICIENT OF CORRELATION
Problem
Problem
Properties of Pearsonian Correlation Coefficient

1. The Pearsonian correlation coefficient cannot exceed 1 in
absolute value. In other words, it lies between -1 and +1.
Symbolically,
-1 ≤ r ≤ 1

2. The sign of r indicates the nature of the correlation. A positive
value of r indicates positive correlation, whereas a negative
value indicates negative correlation; r = 0 indicates absence of
correlation.
Properties of Pearsonian Correlation Coefficient
Properties of Pearsonian Correlation
Coefficient
4. The Pearsonian correlation coefficient is independent of change of origin
and scale. Mathematically, if the given variables X and Y are transformed to
new variables U and V by a change of origin and scale, then the correlation
coefficient between X and Y is the same as the correlation coefficient between
U and V, i.e., r(X,Y) = r(U,V), so rxy = ruv.
5. Two independent variables are uncorrelated, but the converse is not true:
if X and Y are independent variables, then rxy = 0.
6. The Pearsonian coefficient of correlation is the geometric mean of the two
regression coefficients. The signs of both regression coefficients are
the same, and so the value of r will also have the same sign.
Properties of Pearsonian Correlation
Coefficient
7. The square of the Pearsonian correlation coefficient is known as
the coefficient of determination. The coefficient of
determination, which measures the percentage of variation in
the dependent variable that is accounted for by the
independent variable, is a much more useful measure for
interpreting the value of r.
Coefficient of Determination
The coefficient of determination (r2) is the proportion of
the total variation in the dependent variable (Y) that is
explained or accounted for by the variation in the
independent variable (X). It is the square of the coefficient
of correlation.

• It ranges from 0 to 1.
• It does not give any information on the direction of the
relationship between the variables.
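As a minimal sketch of this squaring step (the function name is mine; the value r = 0.759 is the one used in the Copier Sales worked example in these slides):

```python
def coefficient_of_determination(r):
    """Square of the correlation coefficient: the proportion of the
    variation in Y accounted for by the variation in X."""
    return r ** 2

# r = 0.759 is the correlation from the Copier Sales worked example:
r2 = coefficient_of_determination(0.759)
print(round(r2, 3))  # 0.576 -> 57.6 percent of the variation explained
```

Note that squaring discards the sign, which is why r² says nothing about the direction of the relationship.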

Probable Error of Correlation Coefficient

• The probable error of the correlation coefficient is a measure
for testing the reliability of the observed value of the
correlation coefficient, when we consider it as satisfying the
conditions of random sampling. If r is the observed value
of the correlation coefficient in a sample of N pairs of
observations for the two variables under consideration, then
the probable error, denoted by PE(r), is expressed as
PE(r) = 0.6745 × (1 − r²) / √N.
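A small Python helper for the probable-error formula PE(r) = 0.6745 (1 − r²)/√N; the rules of thumb in the comment are the standard textbook conventions, and the sample values r = 0.8, N = 25 are purely illustrative:

```python
import math

def probable_error(r, n):
    """PE(r) = 0.6745 * (1 - r**2) / sqrt(n).
    Conventional rules of thumb: if |r| < PE(r), the correlation is
    not significant; if |r| > 6 * PE(r), it is considered significant."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Illustrative values (hypothetical): r = 0.8 observed in n = 25 pairs
pe = probable_error(0.8, 25)
print(round(pe, 4))  # 0.6745 * 0.36 / 5 = 0.0486
```

Here 0.8 is far more than six times 0.0486, so by the rule of thumb the observed correlation would be treated as significant.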
Probable Error of Correlation Coefficient
Problem
Problem
Problem

Using the Copier Sales of America data, compute:
i. The correlation coefficient and the coefficient of
determination. Interpret the result.
ii. The probable error of the data, and explain its
significance.

Correlation Coefficient - Example

How do we interpret a correlation of 0.759?


First, it is positive, so we see there is a direct relationship between
the number of sales calls and the number of copiers sold. The value
of 0.759 is fairly close to 1.00, so we conclude that the association
is strong.

However, does this mean that more sales calls cause more sales?
No, we have not demonstrated cause and effect here, only that the
two variables—sales calls and copiers sold—are related.
Coefficient of Determination (r2) - Example

• The coefficient of determination, r², is 0.576, found by (0.759)².

• This is a proportion or a percent; we can say that 57.6 percent of the
variation in the number of copiers sold is explained, or accounted for, by
the variation in the number of sales calls.

Lag and Lead in Correlation

• In the correlation of time series, the investigator may find there is a
gap before a cause-and-effect relationship is established.

• For example, the supply of a commodity may increase today,
but it may not have an immediate effect on prices; it may take
a few days or even months for prices to adjust to the increased
supply.

• The difference in the period before a cause-and-effect
relationship is established is called the lag. Ignoring this time gap
produces fallacious conclusions. The pairing of items is
adjusted according to the time lag.
Lag and Lead in correlation
• Pran Juice is studying the effect of its latest advertising campaign. People
chosen at random are called and asked how much juice they bought in
the past week and how many advertisements they saw in the past week.

Months         : Jan Feb Mar Apr May Jun Jul Aug
Number of ads  :   3   7   4   2   0   4   1   2
Juice purchased:  11  18   9   4   7   6   3   8

• Allowing for a two-month time lag, calculate the coefficient of correlation.
Calculate the sample coefficient of determination. Interpret the result.
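One way to handle the lag is to pair each month's ad count with the juice purchased two months later, and then compute Pearson's r on the shifted pairs. A sketch under that pairing convention (the direction of the shift is an assumption about how the lag is applied):

```python
import math

def pearson_r(x, y):
    """Pearson's r via the sum-of-products form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

ads   = [3, 7, 4, 2, 0, 4, 1, 2]    # Jan..Aug
juice = [11, 18, 9, 4, 7, 6, 3, 8]  # Jan..Aug

lag = 2                 # ads in month t paired with purchases in month t + 2
x = ads[:-lag]          # Jan..Jun ad counts
y = juice[lag:]         # Mar..Aug purchases
r = pearson_r(x, y)
print(round(r, 3))      # about 0.135 for this pairing
```

The shifted pairing leaves only six usable pairs, which is why ignoring or misjudging the lag can change the conclusion substantially.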

SPEARMAN’S RANK CORRELATION

• Sometimes we come across statistical series in which the variables


under consideration are not capable of quantitative measurement
but can be arranged in serial order.

• This happens when we are dealing with qualitative characteristics


(attributes) such as honesty, beauty, character, morality, etc., which
cannot be measured quantitatively but can be arranged serially. In
such situations Karl Pearson’s coefficient of correlation cannot be
used as such.

• Charles Edward Spearman, a British psychologist, developed a
formula in 1904 for obtaining the correlation coefficient between
the ranks of N individuals in the two attributes under study.
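Spearman's formula for untied ranks is ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference between the two ranks of each individual. A minimal sketch (the judges and candidates are hypothetical illustrative data):

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation (no tied ranks):
    rho = 1 - 6 * sum(d**2) / (n * (n**2 - 1)), d = rank difference."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to five candidates:
judge1 = [1, 2, 3, 4, 5]
judge2 = [2, 1, 4, 3, 5]
print(spearman_rho(judge1, judge2))  # 1 - 6*4/(5*24) = 0.8
```

Identical rankings give ρ = +1 and completely reversed rankings give ρ = −1, mirroring the range of Pearson's r.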
Rank correlation
Repeated Ranks
Rank correlation
Rank correlation
Problem
Testing the Significance of
the Correlation Coefficient

H0: ρ = 0 (the correlation in the population is 0)


H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
t > tα/2,n-2 or t < -tα/2,n-2

Testing the Significance of
the Correlation Coefficient - Example

H0: ρ = 0 (the correlation in the population is 0)


H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
t > tα/2,n-2 or t < -tα/2,n-2
t > t0.025,8 or t < -t0.025,8
t > 2.306 or t < -2.306
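The test statistic behind these critical values is t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. A sketch using the example's values (r = 0.759, n = 10 salespeople):

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values from the worked example: r = 0.759 for n = 10 salespeople
t = t_statistic(0.759, 10)
print(round(t, 3))  # 3.297, which exceeds the critical value 2.306
```

Since 3.297 > 2.306, the computed t lands in the rejection region, matching the conclusion drawn in the example below.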

Testing the Significance of
the Correlation Coefficient - Example

The computed t (3.297) is within the rejection region; therefore, we reject H0. This means the
correlation in the population is not zero. From a practical standpoint, it indicates to the sales
manager that, in the population of salespeople, there is a correlation between the number of
sales calls made and the number of copiers sold.
Correlation (Contd.)

❖ Spearman's rank coefficient of correlation is suitable for qualitative data such as honesty,
efficiency, intelligence, etc.; it is not applicable to large or grouped data sets.

Here, d is the difference between two ranks.

If there is a tie in ranks:

❖ Standard error of the coefficient of correlation:

Pearson's correlation coefficient

Assumptions of Karl Pearson's Coefficient of Correlation:

• Linear relationship: If the paired observations of the two
variables are plotted on a scatter diagram, the plotted
points will cluster around a straight line.
• Causal relationship: There should be a cause-and-effect
relationship between the forces affecting the distribution of
the observations in the two variables; otherwise the
correlation is not meaningful.
• Error of measurement: If the error of measurement is reduced
to a minimum, the correlation is more meaningful.
Pearson's correlation coefficient

• If the variables are independent, Pearson's correlation coefficient is 0, but


the converse is not true because the correlation coefficient detects only
linear dependencies between two variables.

• Correlation refers broadly to statistical dependence: it is the measure of
how two or more variables are related to one another. There are several
correlation coefficients, often denoted ρ (rho), r, or R, measuring the degree
of correlation.
• The most common of these is the Pearson correlation coefficient, which is
sensitive only to a linear relationship between two variables (which may be
present even when one variable is a nonlinear function of the other).

• Other correlation coefficients – such as Spearman's rank correlation – have


been developed to be more robust than Pearson's, that is, more sensitive to
nonlinear relationships. Mutual information can also be applied to measure
dependence between two variables.
Pearson's correlation coefficient

• For example, suppose the random variable X is symmetrically distributed
about zero, and Y = X². Then Y is completely determined by X, so that X
and Y are perfectly dependent, but their correlation is zero; they are
uncorrelated. However, in the special case when X and Y are jointly
normal, uncorrelatedness is equivalent to independence.

• Even though uncorrelated data does not necessarily imply independence,
one can check whether random variables are independent if their mutual
information is 0.
