
# PROBABILITY AND DATA ANALYSIS

CORRELATION ANALYSIS

## PRANOTI DEEPAK DARE

121620003
F.Y.MTECH(CONSTRUCTION MANAGEMENT)
CONTENTS
Necessity
Introduction
Significance
Correlation and causation
Types of correlation
Methods of studying correlation
Multiple correlation
Limitations
CIVIL ENGINEERING NECESSITY
Many hydrologic variables are related to each
other through cause and effect
Changes in the values of one or more variables
cause changes in some other variable
When simultaneous observations on such
hydrological variables are available, one may be
interested in finding out how strong the
association is.
Linear association between hydrologic variables
is expressed by the correlation
If one variable drives the other, they may be
correlated, as rainfall and runoff
The variables may also be correlated if they
share the same cause, such as river discharge,
concentration or transport rates of sediment
"Correlation analysis attempts to determine the
degree of relationship between variables." - Ya-Kun-Chou
"Correlation is an analysis of the covariation
between two or more variables." - A.M. Tuttle
INTRODUCTION
Correlation: A LINEAR association between two
random variables
Correlation analysis shows us how to determine
both the nature and strength of relationship
between two variables
When variables are dependent on time
correlation is applied
Correlation lies between -1 and +1
A zero correlation indicates that there is no
linear relationship between the variables
A correlation of -1 indicates a perfect negative
correlation
A correlation of +1 indicates a perfect positive
correlation
SIGNIFICANCE

If the correlation coefficient is significantly different
from 0, we say that the correlation coefficient is
"significant"
If the correlation coefficient is not significantly
different from 0 (it is close to 0), we say that the
correlation coefficient is "not significant"
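One common way to judge whether r is significantly different from 0 is a t-test with n - 2 degrees of freedom. The slides do not name a test, so the sketch below is an assumption: the function name `t_statistic` and the sample values r = 0.6, n = 27 are illustrative only, and the resulting t still has to be compared against a tabulated critical value.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for a sample of n pairs (requires |r| < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.6 computed from n = 27 pairs of observations
t = t_statistic(0.6, 27)
print(round(t, 2))
```

A larger |t| for the same n gives stronger evidence that the correlation is not zero.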
CORRELATION DOES NOT NECESSARILY IMPLY
CAUSATION

A correlation does not mean that there is causation.

Correlation means that there is a relationship between
two variables.
Causation means that a change in the
explanatory variable produces a change in the
response variable.
Even if a correlation is very strong, this is not by itself
good evidence that a change in x will cause a change in
y
EXAMPLE
One study in Victorian England showed a strong
correlation between people wearing top hats, and their
life expectancy. This relationship was shown to be
very strong (high r).
Does this mean that had Queen Victoria provided free
top-hats for all, the life expectancy in England would
have shot up?
There is a confirmed correlation. However, there is
NO causation. That is, wearing top hats does not
cause people to live longer.
Which situation describes a correlation that is
not a causal relationship?

(1) The rooster crows, and the Sun rises.

(2) The more miles driven, the more gasoline
needed.
(3) The more powerful the microwave, the
faster the food cooks.
(4) The faster the pace of a runner, the quicker
the runner finishes.
POSITIVE CORRELATION
If all the plotted points lie on a straight line rising from the
lower left-hand corner to the upper right-hand
corner, the correlation is perfectly positive
Perfectly positive correlation is denoted by
r = +1
NEGATIVE CORRELATION
If all the plotted points lie on a straight line falling
from the upper left-hand corner to the lower right-hand
corner, the correlation is perfectly negative
Perfectly negative correlation is denoted by
r = -1
Depends upon the direction of change
of the variables

Positive: If the two variables tend to move together in the same direction, it is called positive or direct correlation
Examples: price and supply, height and weight, yield and rainfall
Negative: If the two variables tend to move in opposite directions, it is called negative or inverse correlation
Examples: price and demand, yield of crop and price
Simple: One independent and one dependent variable
Example: quantity of money and price level
Multiple: One dependent and more than one independent variable
Example: price, demand and supply
Partial: One dependent variable and more than one independent variable, but only one independent variable is considered and the other independent variables are held constant
Example: price and demand, eliminating the supply side
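For three variables, the partial correlation between x and y with z held constant can be computed from the three pairwise correlation coefficients. The slides do not give a formula, so the sketch below uses the standard first-order expression as an assumption; the function name `partial_r` and the coefficient values are hypothetical.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z on both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations: x = price, y = demand, z = supply
r_price_demand_given_supply = partial_r(r_xy=0.8, r_xz=0.6, r_yz=0.5)
print(round(r_price_demand_given_supply, 3))
```

If z is uncorrelated with both x and y, the partial correlation reduces to the simple correlation r_xy.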
Linear: When plotted on a graph, the points tend to fall on a straight line
Non-linear: When plotted on a graph, the points do not fall on a straight line
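The distinction matters because the correlation coefficient only captures linear association. The sketch below uses made-up data in which y depends exactly on x (y = x squared), yet the coefficient comes out as zero; the helper name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient from sums of deviation products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_linear = [2 * a + 1 for a in x]   # points on a straight line
y_curved = [a ** 2 for a in x]      # points on a parabola

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

A value of r near zero therefore rules out a linear relationship only, not a relationship of every kind.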
METHODS OF STUDYING CORRELATION
Scatter Diagram Method
Karl Pearson's Coefficient of Correlation Method
Spearman's Rank Correlation Method

SCATTER DIAGRAM METHOD

Simplest method of studying the relationship
between two variables diagrammatically
One variable is represented along the horizontal
axis and the second variable along the vertical
axis
For each pair of observations of two variables, we
put a dot in the plane
REGRESSION LINE
The straight line of best fit drawn through the
points on a scatterplot
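"Best fit" is usually taken to mean the least-squares line, which minimises the sum of squared vertical distances from the points. The slides do not state the criterion, so this is an assumption; the function name `fit_line` and the rainfall and runoff figures are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

rainfall = [10, 20, 30, 40, 50]   # hypothetical observations
runoff = [3, 7, 8, 14, 15]
slope, intercept = fit_line(rainfall, runoff)
print(round(slope, 2), round(intercept, 2))
```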
CORRELATION: LINEAR RELATIONSHIPS
Strong relationship = good linear fit
Points clustered closely around a line show a strong
correlation
COEFFICIENT OF CORRELATION
A measure of the strength of the linear
relationship between two variables that is
defined as the (sample) covariance of the
variables divided by the product of their (sample) standard
deviations
Represented by r

r lies between -1 and +1: $-1 \le r \le +1$
The + and - signs are used for positive linear
correlations and negative linear correlations,
respectively
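The definition above (covariance divided by the product of the standard deviations) can be sketched directly. The function name `pearson_r` and the rainfall and runoff figures are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient of paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of deviation products; the 1/(n - 1) factors of the sample
    # covariance and standard deviations cancel in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rainfall = [10, 20, 30, 40, 50]   # hypothetical observations
runoff = [3, 7, 8, 14, 15]
r = pearson_r(rainfall, runoff)
print(round(r, 3))
```

Reversing the sign of one variable reverses the sign of r without changing its magnitude.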
INTERPRETING CORRELATION
COEFFICIENT R
Strong correlation: r > .70 or r < -.70
Moderate correlation: r is between .30 and .70 or r
is between -.70 and -.30
Weak correlation: r is between 0 and .30 or r is
between -.30 and 0
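These bands can be written as a small lookup on the absolute value of r. The function name `strength` is hypothetical, and the handling of values exactly at .30 and .70 is an assumption, since the slide does not say which band the boundaries belong to.

```python
def strength(r):
    """Classify a correlation coefficient using the .30 and .70 cut-offs."""
    size = abs(r)   # the sign gives direction, not strength
    if size > 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"   # assumption: boundary values count as moderate
    return "weak"

for value in (0.95, -0.80, 0.50, -0.10):
    print(value, strength(value))
```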
COEFFICIENT OF DETERMINATION
Coefficient of determination lies between 0 and 1
Represented by $r^2$
The coefficient of determination is a measure of
how well the regression line represents the data
If the regression line passes exactly through
every point on the scatter plot, it would be able to
explain all of the variation
The farther the line is from the points, the less
of the variation it is able to explain
$r^2$ is useful because it gives the proportion of the
variance (fluctuation) of one variable that is
predictable from the other variable
It is a measure that allows us to determine how
certain one can be in making predictions from a
certain model/graph
The coefficient of determination is the ratio of the
explained variation to the total variation
The coefficient of determination is such that
$0 \le r^2 \le 1$, and denotes the strength of the linear
association between x and y
The coefficient of determination represents the
proportion of the variation in y that is accounted for by the line
of best fit
For example, if r = 0.922, then $r^2$ = 0.850,
which means that 85% of the total variation in y
can be explained by the linear relationship
between x and y (as described by the regression
equation)
The other 15% of the total variation in y remains
unexplained
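The ratio of explained to total variation can be checked numerically. The sketch below assumes a least-squares regression line and uses hypothetical rainfall and runoff figures; the function name `coefficient_of_determination` is illustrative.

```python
def coefficient_of_determination(x, y):
    """Explained variation divided by total variation for the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    explained = sum((f - mean_y) ** 2 for f in fitted)   # variation captured by the line
    total = sum((b - mean_y) ** 2 for b in y)            # variation of y about its mean
    return explained / total

rainfall = [10, 20, 30, 40, 50]   # hypothetical observations
runoff = [3, 7, 8, 14, 15]
r_squared = coefficient_of_determination(rainfall, runoff)
print(round(r_squared, 3))
```

For these figures about 95% of the variation in runoff is explained by the line and about 5% is left unexplained.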
EXAMPLE:
STRONG CORRELATION AS r=0.95
SPEARMAN'S RANK COEFFICIENT
A method to determine correlation when the data
is not available in numerical form
As an alternative, the method of rank
correlation is used
When the values of the two variables are
converted to their ranks, and the correlation is
obtained from those ranks, the correlation is known as
rank correlation.

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

$r_s$ = rank correlation coefficient
Some authors use the symbol $\rho$ for rank correlation.
$\sum D^2$ = sum of squares of differences between the pairs of ranks.
n = number of pairs of observations.
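With no tied values, the rank coefficient follows from the usual formula, 1 minus 6 times the sum of squared rank differences divided by n(n^2 - 1). The sketch below assumes rank 1 goes to the largest value; the function names and the marks of five students are hypothetical.

```python
def ranks(values):
    """Rank 1 = largest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Rank correlation from the sum of squared rank differences."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

test_1 = [90, 80, 70, 60, 50]   # hypothetical marks in the first test
test_2 = [85, 88, 60, 65, 40]   # hypothetical marks in the second test
r_s = spearman(test_1, test_2)
print(r_s)
```

Here the rank differences are -1, 1, -1, 1, 0, so the sum of squares is 4 and the coefficient is 1 - 24/120.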
COMPUTATION FOR TIED OBSERVATIONS
For example
If a value is repeated twice at the 5th rank,
the common rank to be assigned to each item is the
average of 5 and 6, i.e. 5.5
If the ranks are tied, it is required to apply a
correction factor
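The correction formula itself did not survive on the slide. A form often given in textbooks adds (m^3 - m)/12 to the sum of squared rank differences for every group of m tied values; the sketch below uses that form as an assumption, not as the slide's own formula. The function names and the scores are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    result = []
    for v in values:
        first = order.index(v) + 1   # first rank position occupied by v
        count = order.count(v)       # number of times v occurs
        result.append(first + (count - 1) / 2)
    return result

def tie_correction(values):
    """Assumed correction: sum of (m^3 - m) / 12 over each group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

scores_a = [90, 85, 80, 75, 70, 70, 60]   # 70 occurs twice, at ranks 5 and 6
scores_b = [88, 92, 70, 75, 60, 65, 50]
print(average_ranks(scores_a))
print(round(spearman_with_ties(scores_a, scores_b), 4))
```

The two tied scores each receive rank 5.5, matching the example on the slide.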
EXAMPLE
Interpretation: There is uniformity in the
performance of students in the two tests
MULTIPLE CORRELATION
Statistical technique that predicts values of one
variable on the basis of two or more other
variables
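How well two predictors jointly account for a third variable is summarised by the coefficient of multiple correlation, R. For the two-predictor case it can be written in terms of the three pairwise correlations; the slides do not show this formula, so the sketch below is an assumption based on the standard expression, with a hypothetical function name and hypothetical coefficient values.

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y on x1 and x2 from pairwise correlations (requires |r_12| < 1)."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Hypothetical pairwise correlations between y, x1 and x2
R = multiple_r(r_y1=0.8, r_y2=0.6, r_12=0.5)
print(round(R, 3))
```

When the second predictor is uncorrelated with both y and the first predictor, R reduces to the magnitude of the simple correlation between y and x1.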
DID YOU KNOW???

Rooster crowing is perfectly
correlated (r=1.0) with the sun
rising?
LIMITATIONS
Correlation is among the most powerful techniques available
to researchers.
These techniques require:
Every variable is measured at the interval-ratio level
Each independent variable has a linear relationship
with the dependent variable
Independent variables do not interact with each other
Independent variables are uncorrelated with each other
When these requirements are violated (as they often are),
these techniques will produce biased and/or inefficient
estimates.
THANK YOU!