
CORRELATION
ANALYSIS
MBA A
Newish
Jashan
Jotdeep Singh
Yogesh

Introduction
Correlation is a linear association between two random variables
Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables
Correlation is also applied when the variables depend on time
Correlation lies between -1 and +1
A zero correlation indicates that there is no linear relationship between the variables
A correlation of -1 indicates a perfect negative correlation
A correlation of +1 indicates a perfect positive correlation

Types of Correlation
There are three ways of classifying correlation

Type 1: positive, negative, no or perfect correlation
Type 2: linear or non-linear correlation
Type 3: simple, multiple or partial correlation

Type 1
Positive: two related variables are such that when one increases (decreases), the other also increases (decreases)
Negative: two variables are such that when one increases (decreases), the other decreases (increases)
No correlation: both variables are independent
Perfect: the correlation is exactly +1 or -1

Type 2
Linear: when plotted on a graph, the points tend to fall on a straight line
Non-linear: when plotted on a graph, the points do not fall on a straight line

Type 3
Simple: one independent and one dependent variable
Multiple: one dependent and more than one independent variable
Partial: one dependent variable and more than one independent variable, but only one independent variable is considered while the other independent variables are held constant

Methods of Studying Correlation


Scatter Diagram Method
Karl Pearson's Coefficient of Correlation Method
Spearman's Rank Correlation Method

Correlation: Linear
Relationships

Strong relationship = good linear fit

[Figure: two scatter plots of Symptom Index against dose in mg. Drug A: very good fit. Drug B: moderate fit.]

Points clustered closely around a line show a strong correlation.
The line is a good predictor (good fit) for the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a)

Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations
Represented by r
r lies between -1 and +1
r conveys both magnitude and direction

$-1 \le r \le +1$
The + and - signs are used for positive linear correlations and negative linear correlations, respectively


$$r_{xy} = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}}$$

Shared variability of X and Y variables on the top
Individual variability of X and Y variables on the bottom
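
The computational formula for r can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the dose data are hypothetical, chosen so that the points lie exactly on a rising or a falling line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums (shared variability over individual variability)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: both series are exact straight lines in the dose.
doses = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(doses, rising)     # perfect positive correlation
r_down = pearson_r(doses, falling)  # perfect negative correlation
print(r_up, r_down)
```

With real data the points scatter around the line, so r falls strictly between -1 and +1.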

Interpreting Correlation
Coefficient r
strong correlation: r > .70 or r < -.70
moderate correlation: r is between .30 and .70 or r is between -.30 and -.70
weak correlation: r is between 0 and .30 or r is between 0 and -.30
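
These cut-offs can be expressed as a small helper. The sketch below is illustrative only: the function name `describe_correlation` is hypothetical, and placing the boundary values .30 and .70 in the moderate band is an assumption, since the slide does not say which band they belong to.

```python
def describe_correlation(r):
    """Label r by strength and direction using the .30 and .70 cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:  # assumption: boundary values count as moderate
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.922))
print(describe_correlation(-0.5))
```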

Coefficient of Determination
The coefficient of determination lies between 0 and 1
Represented by $r^2$
The coefficient of determination is a measure of how well the regression line represents the data
If the regression line passes exactly through every point on the scatter plot, it explains all of the variation
The further the line is from the points, the less it is able to explain

$r^2$ is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable
It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph
The coefficient of determination is the ratio of the explained variation to the total variation
The coefficient of determination is such that $0 \le r^2 \le 1$, and it denotes the strength of the linear association between x and y

The coefficient of determination represents the proportion of the variation in the data that is accounted for by the line of best fit
For example, if r = 0.922, then $r^2$ = 0.850
This means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation)
The other 15% of the total variation in y remains unexplained
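
The ratio of explained variation to total variation can be checked numerically. The sketch below is only an illustration: the function name `explained_over_total` and the five data points are hypothetical, and the line is fitted by ordinary least squares.

```python
def explained_over_total(xs, ys):
    """Fit y = a + b*x by least squares, then return explained / total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope of the fitted line
    a = mean_y - b * mean_x  # intercept of the fitted line
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((a + b * x - mean_y) ** 2 for x in xs)
    return explained / total

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = explained_over_total(xs, ys)
print(round(r2, 3))

# The slide's own example: r = 0.922 gives r squared of about 0.850.
slide_r2 = round(0.922 ** 2, 3)
print(slide_r2)
```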

Spearman's rank correlation coefficient

When the data is not available in numerical form, the method of rank correlation is used as an alternative. The values of the two variables are converted to their ranks, and the correlation obtained from those ranks is known as rank correlation.

Computation of Rank Correlation


Spearman's rank correlation coefficient can be calculated when:
Actual ranks are given
Ranks are not given, but grades are given and not repeated
Ranks are not given, and grades are given and repeated
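
The second case (grades given, none repeated) can be sketched as below. The slides do not state the formula, so this uses the standard shortcut for untied ranks, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference between the two ranks of an item. The function names and the two judges' marks are hypothetical, and repeated grades are deliberately rejected because they need a tie correction that this sketch does not cover.

```python
def ranks(values):
    """Rank 1 goes to the largest value; repeated values are not supported."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values need tie handling, not covered here")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rank correlation for untied data via the d-squared shortcut."""
    rank_x, rank_y = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical marks given by two judges to the same five candidates.
judge_a = [86, 72, 95, 60, 78]
judge_b = [80, 75, 90, 65, 70]
rho = spearman_rho(judge_a, judge_b)
print(rho)
```

When actual ranks are already given (the first case), the ranking step can be skipped and the differences taken directly.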


BUSINESS STATISTICS
PRESENTATION
ON
REGRESSION ANALYSIS

OBJECTIVES OF THE PRESENTATION-

What is regression analysis


Types and methods of regression analysis
Practical aspect of regression analysis with an
example

INTRODUCTION
Regression analysis is the statistical tool employed for the purpose of forecasting or making estimates.
Here we make use of various mathematical formulas and assumptions to describe a real-world situation.
In every situation, estimation becomes easy once it is known that the variable to be estimated is related to and dependent on some other variable.

For making estimates we first have to model the relationship between the variables involved.
Models can broadly be classified into linear regression and multiple regression.

Linear regression
Linear regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of another variable.
More precisely, if X and Y are two related variables, then linear regression analysis helps us to predict the value of Y for a given value of X, or vice versa.
For example, the age of a human being and maturity are related variables. Linear regression analysis can then predict the level of maturity given the age of a human being.

Multiple regression
Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more variables, also called the predictors.
Multiple regression analysis helps us to predict the value of Y for given values of X1, X2, ..., Xk.
For example, the yield of rice per acre depends upon quality of seed, fertility of soil, fertilizer used, temperature and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique.

Dependent and Independent Variables
By linear regression, we mean models with just one independent and one dependent variable. The variable whose value is to be predicted is known as the dependent variable, and the one whose known value is used for prediction is known as the independent variable.
By multiple regression, we mean models with just one dependent and two or more independent variables. The variable whose value is to be predicted is known as the dependent variable, and the ones whose known values are used for prediction are known as independent variables.

Methods of solving regression models

1) GRAPHICAL METHOD

In the graphical method the average relationship between the dependent variable and the independent variable is expressed by a line called the line of best fit.
Example:

Experience (in years)

Income (in 000)

15

150

10

120

60

40

70

90

[Figure: income plotted against experience, with the line of best fit]

2) ALGEBRAIC METHOD

In this method we make use of the regression equation and regression coefficients.

Regression equation (Linear)
A statistical technique used to explain or predict the behaviour of a dependent variable
The general equation is given by

$y = a + bx$

a is the intercept
b is the slope of the line
From the general equation we derive the normal equations.
Summing the general equation over all N observations gives the first normal equation:

$\sum Y = N a + b \sum X$

Multiplying the general equation by x and then summing gives the second normal equation:

$\sum XY = a \sum X + b \sum X^2$

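Regression equation (Multiple)
General equation: $y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
Normal equations for multiple regression with two independent variables are:
$\sum Y = N a + b_1 \sum X_1 + b_2 \sum X_2$
$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$
$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$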
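
For two independent variables, the three normal equations form a 3 by 3 linear system in a, b1 and b2. The sketch below builds that system from the sums and solves it with Cramer's rule, chosen here only for brevity. The function names and the data are hypothetical: y is constructed as exactly 1 + 2*x1 + 3*x2 so that the fit can be checked by eye.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_two_predictors(x1, x2, y):
    """Solve the three normal equations for (a, b1, b2) by Cramer's rule."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    coeffs = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    d = det3(coeffs)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in coeffs]
        for r in range(3):
            replaced[r][col] = rhs[r]  # swap one column for the right-hand side
        solution.append(det3(replaced) / d)
    return tuple(solution)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(a, b1, b2)
```

The system has a unique solution only when the two predictors are not perfectly collinear, otherwise the determinant is zero.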

Lines of Regression
There are two lines of regression- that of Y on X and X on Y.
The line of regression of Y on X is given by Y = a + bX where a and b
are unknown constants known as intercept and slope of the equation.
This is used to predict the unknown value of variable Y when value of
variable X is known.
On the other hand, the line of regression of X on Y is given by X = c +
dY which is used to predict the unknown value of variable X using the
known value of variable Y.
Often, only one of these lines makes sense.
Exactly which of these is appropriate for the analysis at hand depends on which variable is labelled dependent and which independent in the problem being analyzed.

Regression coefficients
The coefficient of X in the line of regression of Y on X is called the regression coefficient of Y on X and is denoted by $b_{yx}$.
It represents the change in the value of the dependent variable (Y) corresponding to a unit change in the value of the independent variable (X).
Similarly, the coefficient of Y in the line of regression of X on Y is called the regression coefficient of X on Y and is denoted by $b_{xy}$.
The two regression coefficients are $b_{yx}$ and $b_{xy}$.
The formulas for the two regression coefficients are:

$$b_{yx} = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$$

$$b_{xy} = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - (\sum Y)^2}$$

How Good Is the Regression?


Once a regression equation has been constructed, we can check how good it is by examining the coefficient of determination ($R^2$).
$R^2$ always lies between 0 and 1.
The closer $R^2$ is to 1, the better the model and its predictions.

PRACTICAL ASPECT OF REGRESSION ANALYSIS

Here we will show a linear regression analysis between two variables X and Y.
Variable X is driving experience (in years) and variable Y is the number of road accidents (in a year).
The number of road accidents is the dependent variable, related to the independent variable X, i.e. driving experience.

X (driving experience)      5   2  12   9  15   6  25  16
Y (no. of road accidents)  64  87  50  71  44  56  42  60

From the data we will show:
The estimated regression line for the data.
The number of road accidents expected when the driving experience is 10 years and 30 years.
The coefficient of determination ($R^2$), which tells us what percentage of the variation in the dependent variable is explained by the independent variable.

The following is the tabular representation of the data related to driving experience and number of road accidents.

X      Y      XY      X^2     Y^2
5      64     320     25      4096
2      87     174     4       7569
12     50     600     144     2500
9      71     639     81      5041
15     44     660     225     1936
6      56     336     36      3136
25     42     1050    625     1764
16     60     960     256     3600
Total  90     474     4739    1396    29642

(The totals are, in order: sum of X = 90, sum of Y = 474, sum of XY = 4739, sum of X^2 = 1396, sum of Y^2 = 29642.)

Since the estimated regression line is given by Y = a + bX, we use the normal equations to calculate the values of a and b.

$\sum Y = N a + b \sum X$
$\sum XY = a \sum X + b \sum X^2$

Substituting the totals:
8a + 90b = 474 (Eq. 1)
90a + 1396b = 4739 (Eq. 2)

Solving the two equations gives the values of a and b:
Value of a = 76.66
Value of b = -1.5476
The estimated regression line is
Y = 76.66 - 1.5476 X
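
The solution of the two normal equations can be reproduced with a short Python sketch. It is an illustration rather than the presenters' own working: the variable names are hypothetical, and the X values 2, 9 and 6, which are not legible in the slide's first data table, are an assumption recovered from its XY and X^2 columns.

```python
# Driving experience (years) and road accidents per year from the example.
# Assumption: X values 2, 9 and 6 are recovered from the XY and X^2 columns.
experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]

n = len(experience)
sum_x = sum(experience)
sum_y = sum(accidents)
sum_xy = sum(x * y for x, y in zip(experience, accidents))
sum_x2 = sum(x * x for x in experience)

# Eliminating a from the two normal equations gives the slope b,
# and the first normal equation then gives the intercept a.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

def predict(years):
    """Estimated number of accidents for a given driving experience."""
    return a + b * years

print(round(a, 2), round(b, 4))
print(round(predict(10), 2), round(predict(30), 2))
```

Because the code keeps a and b unrounded, its predictions can differ in the third decimal place from values computed with the rounded coefficients.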

[Figure: trend line Y = 76.66 - 1.5476 X, with number of accidents plotted against experience]

A new driver is considered inexperienced and faces a higher risk of accidents, so road accidents depend on driving experience. There is a negative relationship between the two variables, and the trend line slopes downward in this case.
The value of a is 76.66, which means that a driver with 0 years of experience is expected to have 76.66 road accidents.
The value of b means that every extra year of driving experience reduces the number of road accidents by 1.5476.
No. of accidents with 10 years of experience:
Y = 76.66 - 1.5476 X
Y = 76.66 - 1.5476 (10)
Y = 61.184

No. of accidents with 30 years of experience:
Y = 76.66 - 1.5476 X
Y = 76.66 - 1.5476 (30)
Y = 30.232

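Now we find the coefficient of determination for the data using the regression coefficients.

$$b_{yx} = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2} = \frac{8(4739) - 90 \cdot 474}{8(1396) - (90)^2} = -1.547$$

$$b_{xy} = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - (\sum Y)^2} = \frac{8(4739) - 90 \cdot 474}{8(29642) - (474)^2} = -0.381$$

Now
$$R^2 = b_{yx} \cdot b_{xy} = (-1.547)(-0.381) = 0.5894$$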

From the above coefficient of determination we can say that almost 59% of the variance of the dependent variable is explained by the independent variable.
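
The product of the two regression coefficients can be verified from the totals used in the working. This is a minimal sketch with hypothetical variable names. Since the coefficients are kept unrounded here, the product comes out near 0.59 and can differ in the third decimal place from a value computed with rounded coefficients.

```python
# Totals from the worked example.
n = 8
sum_x, sum_y = 90, 474
sum_xy, sum_x2, sum_y2 = 4739, 1396, 29642

# Both regression coefficients share the same numerator.
numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # regression coefficient of Y on X
b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # regression coefficient of X on Y

r_squared = b_yx * b_xy
print(round(b_xy, 3), round(r_squared, 2))
```

Both coefficients are negative, so their product is positive, as a coefficient of determination must be.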


Conceptual Framework of SENSEX and Nifty

Stock Market Indices


Stock market performance is quantified by calculating an index using the benchmark scrips. As is widely known, SENSEX (Sensitive Index) is associated with the Bombay Stock Exchange and S&P CNX NIFTY is associated with the National Stock Exchange.

Bombay Stock Exchange


There are 23 stock exchanges in India. The Bombay Stock Exchange (BSE) is the largest, with over 6,000 stocks listed. The BSE accounts for over two thirds of the total trading volume in the country.
Established in 1875, the exchange is also the oldest in Asia. Among the twenty-two Stock Exchanges recognized by the Government of India under the Securities Contracts (Regulation) Act, 1956, it was the first one to be recognized and it is the only one that had the privilege of getting permanent recognition.

Scrips at BSE

ACC
AIRTEL
BHEL
DLF
GRASIM
GUJARAT AMBUJA
HDFC
HDFC BANK
HINDALCO
HUL
ICICI BANK
INFOSYS
SUN PHARMA IND. LTD
ITC
L&T
MARUTI
MAHINDRA & MAHINDRA
NTPC
ONGC
RANBAXY
RELIANCE COMMUNICATION
RELIANCE INFRASTRUCTURE
RIL
STERLITE INDUSTRIES LTD
SBI
TCS
TATA MOTORS
TATA STEEL
TATA POWER COMPANY LTD
WIPRO

National Stock Exchange


The National Stock Exchange (NSE), located in Bombay, is India's first debt market.
It was set up in 1993 to encourage stock exchange reform through system modernization and competition.
The instruments traded are treasury bills, government securities and bonds issued by public sector companies.

How are the SENSEX 30 stocks selected?
Listing history
Trading frequency
Rank based on the market cap (should be among the top 100)
Market capitalization weight
Industry / sector they belong to
Historical record

Methodology of SENSEX
SENSEX has been calculated since 1986. Initially it was based on the Total Market Capitalization methodology; the methodology was changed in 2003 to Free-Float Market Capitalization.
Hence, these days, the SENSEX is based on the free-float market cap of the 30 SENSEX stocks traded on the BSE relative to the base value, which is 100 (1978-79), and it is calculated every 15 seconds.

SENSEX is calculated using the "Free-float Market Capitalization" methodology, wherein the level of the index at any point of time reflects the free-float market value of 30 component stocks relative to a base period.
The market capitalization of a company is determined by multiplying the price of its stock by the number of shares issued by the company.
This market capitalization is further multiplied by the free-float factor to determine the free-float market capitalization.

How SENSEX is calculated?

The formula for calculating the SENSEX is:

SENSEX = (Sum of free-float market cap of 30 benchmark stocks) × Index Factor

where
Index Factor = 100 / Market cap value in 1978-79
100 is the index value during 1978-79.
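
The calculation can be sketched in Python as below. Everything numeric here is hypothetical: the sketch uses three made-up stocks instead of 30, an invented base-period market cap, and illustrative function names. It only shows how price, shares issued, free-float factor and index factor combine.

```python
def free_float_market_cap(price, shares_issued, free_float_factor):
    """Market cap (price x shares issued) scaled by the free-float factor."""
    return price * shares_issued * free_float_factor

def index_level(free_float_caps, base_market_cap, base_index=100):
    """Sum of free-float market caps multiplied by the index factor."""
    index_factor = base_index / base_market_cap
    return sum(free_float_caps) * index_factor

# Hypothetical constituents: (price, shares issued, free-float factor).
stocks = [
    (100, 1000, 0.5),
    (50, 2000, 0.4),
    (200, 500, 0.6),
]
caps = [free_float_market_cap(p, s, f) for p, s, f in stocks]

# Hypothetical base-period market cap; base index value 100 as for SENSEX.
level = index_level(caps, base_market_cap=1000)
print(level)
```

Passing `base_index=1000` with a different base-period market cap would apply the same function to an index that uses a base value of 1000.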

How NIFTY is calculated?

The National Stock Exchange (NSE) is associated with NIFTY, which is calculated by the same methodology but with two key differences.
1. Base year is 1995 and base value is 1000.
2. NIFTY is calculated based on 50 stocks.

Formulae for valuation

$$\text{SENSEX} = \frac{\text{Free-float market capitalization}}{\text{Market capitalization in 1978-79}} \times \text{Base index points of 1978-79}$$