Uploaded by Newish Kainauria

Correlation and regression

© All Rights Reserved

CORRELATION ANALYSIS

MBA A

Newish

Jashan

Jotdeep Singh

Yogesh

Introduction

Correlation is a linear association between two random variables.

Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.

When variables are dependent on time, correlation is applied.

The correlation lies between -1 and +1.

A correlation of 0 indicates no linear relationship between the variables.

A correlation of -1 indicates a perfect negative correlation.

A correlation of +1 indicates a perfect positive correlation.

Types of Correlation

There are three types of correlation.

Type 1: Positive, Negative, No correlation, Perfect

Positive: if two variables are such that when one increases (decreases), the other also increases (decreases).

Negative: if two variables are such that when one increases (decreases), the other decreases (increases).

No correlation: if both the variables are independent.

Type 2: Linear, Non-linear

Linear: when plotted on a graph, the points tend to follow a straight line.

Non-linear: when plotted on a graph, the points do not follow a straight line.

Type 3: Simple, Multiple, Partial

Simple: one dependent and one independent variable.

Multiple: one dependent and more than one independent variable.

Partial: one dependent variable and more than one independent variable, but only one independent variable is considered and the other independent variables are held constant.

Scatter Diagram Method

Correlation: Linear Relationships

[Figure: two scatter plots of Symptom Index (vertical axis) with fitted lines; one panel is labelled "Moderate fit".]

The line is a good predictor (good fit) when the points lie close to it. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).

Coefficient of Correlation

A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.

Represented by r.

r lies between -1 and +1.

Magnitude and Direction

$-1 \le r \le +1$

The + and - signs are used for positive linear correlations and negative linear correlations, respectively.

$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

Shared variability of X and Y variables on the top.

Individual variability of X and Y variables on the bottom.
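The raw-sums formula for r can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` is made up here, and the eight (X, Y) pairs are assumed to be the driving-experience data from the regression example in the second half of this deck.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Shared variability of X and Y on the top.
    numerator = n * sum_xy - sum_x * sum_y
    # Individual variability of X and of Y on the bottom.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Assumed data: driving experience (years) and road accidents per year.
experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]

r = pearson_r(experience, accidents)
print(f"r = {r:.3f}")  # a negative value: more experience, fewer accidents
```

The sign of the result gives the direction of the relationship and its absolute value gives the strength.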

Interpreting Correlation

Coefficient r

strong correlation: r > .70 or r < -.70

moderate correlation: r is between .30 and .70, or r is between -.30 and -.70

weak correlation: r is between 0 and .30, or r is between 0 and -.30

Coefficient of Determination

The coefficient of determination lies between 0 and 1.

Represented by $r^2$.

The coefficient of determination is a measure of how well the regression line represents the data.

If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation.

The further the line is away from the points, the less it is able to explain.

It gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.

It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph.

The coefficient of determination is the ratio of the explained variation to the total variation.

The coefficient of determination is such that $0 \le r^2 \le 1$, and denotes the strength of the linear association between x and y.

It is sometimes loosely described as the percent of the data that is closest to the line of best fit.

For example, if $r^2 = 0.85$, then 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation).

The other 15% of the total variation in y remains unexplained.
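The "explained variation over total variation" definition can be checked numerically. The sketch below fits a least-squares line and takes the ratio directly; the five data points and the function name are hypothetical and chosen only to keep the arithmetic small.

```python
def coefficient_of_determination(xs, ys):
    """r^2 as explained variation divided by total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept of the line y = a + b*x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]
    # Variation of the fitted values around the mean of y (explained).
    explained = sum((f - mean_y) ** 2 for f in fitted)
    # Variation of the observed values around the mean of y (total).
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = coefficient_of_determination(xs, ys)
print(f"r^2 = {r2:.2f}")  # prints r^2 = 0.60
```

Here 60% of the variation in y is explained by the line and the remaining 40% is unexplained.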

Rank correlation is a method to determine correlation when the data is not available in numerical form. When the values of the two variables are converted to their ranks and the correlation is obtained from those ranks, the correlation is known as rank correlation.

Spearman's rank correlation coefficient can be calculated when:

Actual ranks are given

Ranks are not given, but grades are given and not repeated

Ranks are not given, and grades are given and repeated
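The three cases above can be handled by one routine if repeated grades share the average of the ranks they occupy. The sketch below ranks the grades and then applies the ordinary correlation formula to the ranks; when no grade repeats this matches the usual shortcut 1 - 6Σd²/(n(n² - 1)), which is a standard formula rather than one stated in these slides. Function names and the two judges' scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank from highest (rank 1) down; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's coefficient as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores awarded by two judges to five candidates.
judge_a = [80, 64, 75, 40, 55]
judge_b = [70, 60, 75, 35, 50]

rho = spearman_rho(judge_a, judge_b)
print(f"rho = {rho:.2f}")
```

If actual ranks are already given, they can be passed in place of the scores only after reversing their order of merit, because this sketch treats the largest value as rank 1.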

BUSINESS STATISTICS PRESENTATION ON REGRESSION ANALYSIS

Types and methods of regression analysis

Practical aspect of regression analysis with an

example

Regression analysis is employed for the purpose of forecasting or making estimates.

Here we make use of various mathematical formulas and assumptions to describe a real-world situation.

Estimation becomes easier once it is known that the variable to be estimated is related to and dependent on some other variable.

... between the variables involved.

Models can broadly be classified into linear regression and multiple regression.

Linear regression

Linear regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of another variable.

More precisely, if X and Y are two related variables, then linear regression analysis helps us to predict the value of Y for a given value of X, or vice versa.

For example, the age of a human being and maturity are related variables. Linear regression analysis can then predict the level of maturity given the age of a human being.

Multiple regression

Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more variables, also called the predictors.

Multiple regression analysis helps us to predict the value of Y for given values of X1, X2, ..., Xk.

For example, the yield of rice per acre depends upon quality of seed, fertility of soil, fertilizer used, temperature and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique.

Dependent and Independent Variables

By linear regression, we mean models with just one independent and one dependent variable. The variable whose value is to be predicted is known as the dependent variable, and the one whose known value is used for prediction is known as the independent variable.

By multiple regression, we mean models with just one dependent and two or more independent variables. The variable whose value is to be predicted is known as the dependent variable, and the ones whose known values are used for prediction are known as independent variables.

METHOD-

The relationship between the dependent variable and the independent variable is expressed by a line called the line of best fit.

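Example:

[Table: Experience (in years) against Income (in 000). Only fragments survive: experience values 15 and 10; income values 150, 120, 60, 40, 70 and 90.]

[Figure: plot of income (vertical axis, 30 to 240) against experience (horizontal axis) with the line of best fit.]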

2) ALGEBRAIC METHOD-

The algebraic method uses regression equations and regression coefficients.

Regression equation (Linear).

A statistical technique used to explain or predict the behaviour of a dependent variable.

The general equation is given by

$y = a + bx$

a is the intercept

b is the slope of the line

From the general equation we obtain the normal equations.

Summing the general equation over all N observations gives the first normal equation:

$\sum Y = N a + b \sum X$

Multiplying the general equation by x and then summing gives the second normal equation:

$\sum XY = a \sum X + b \sum X^2$
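For reference, the two normal equations can be solved for a and b in closed form. The derivation below is a sketch using the same symbols as above: multiply the second equation by N, multiply the first by $\sum X$, and subtract to eliminate a.

```latex
\begin{aligned}
\sum Y &= N a + b \sum X \\
\sum XY &= a \sum X + b \sum X^2 \\[4pt]
N \sum XY - \sum X \sum Y &= b \left[ N \sum X^2 - \Bigl(\sum X\Bigr)^2 \right] \\[4pt]
b &= \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - \bigl(\sum X\bigr)^2} \\[4pt]
a &= \frac{\sum Y - b \sum X}{N}
\end{aligned}
```

The expression for b is the same as the regression coefficient of Y on X.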

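Regression equation (Multiple).

General equation: $y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$

Normal equations for multiple regression with two independent variables are:

$\sum Y = N a + b_1 \sum X_1 + b_2 \sum X_2$

$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$

$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$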
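With two independent variables the three normal equations form a 3 x 3 linear system in a, b1 and b2. The sketch below builds that system from the data and solves it with Cramer's rule; this solver is chosen here only because it is short, and the function names and data are hypothetical. The data are generated from y = 1 + 2·x1 + 3·x2 so the recovered coefficients are easy to check.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_two_predictors(x1, x2, y):
    """Solve the normal equations for y = a + b1*x1 + b2*x2."""
    n = len(y)

    def sum_prod(u, v):
        return sum(p * q for p, q in zip(u, v))

    # Coefficient matrix: one row per normal equation, columns a, b1, b2.
    matrix = [
        [n, sum(x1), sum(x2)],
        [sum(x1), sum_prod(x1, x1), sum_prod(x1, x2)],
        [sum(x2), sum_prod(x1, x2), sum_prod(x2, x2)],
    ]
    rhs = [sum(y), sum_prod(x1, y), sum_prod(x2, y)]

    d = det3(matrix)
    if d == 0:
        raise ValueError("normal equations have no unique solution")

    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coefficients.append(det3(replaced) / d)
    return tuple(coefficients)


# Hypothetical data built from y = 1 + 2*x1 + 3*x2 (no noise).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(a, b1, b2)
```

For more than two predictors a general linear solver would be the more practical choice.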

Lines of Regression

There are two lines of regression- that of Y on X and X on Y.

The line of regression of Y on X is given by Y = a + bX where a and b

are unknown constants known as intercept and slope of the equation.

This is used to predict the unknown value of variable Y when value of

variable X is known.

On the other hand, the line of regression of X on Y is given by X = c +

dY which is used to predict the unknown value of variable X using the

known value of variable Y.

Often, only one of these lines makes sense.

Exactly which of these will be appropriate for the analysis in hand will

depend on labeling of dependent and independent variable in the

problem to be analyzed.

The coefficient of X in the line of regression of Y on X is called the regression coefficient of Y on X and is denoted by $b_{yx}$.

It represents the change in the value of the dependent variable (Y) corresponding to a unit change in the value of the independent variable (X).

Similarly, the coefficient of Y in the line of regression of X on Y is called the regression coefficient of X on Y and is denoted by $b_{xy}$.

The two regression coefficients are $b_{yx}$ and $b_{xy}$.

The formulas for the two regression coefficients are

$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$$

$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$$

Once a regression equation has been constructed, we can check how good it is by examining the coefficient of determination ($R^2$).

$R^2$ always lies between 0 and 1.

The closer $R^2$ is to 1, the better the model and its predictions.

Consider an example with two variables X and Y.

Variable X is taken as driving experience (in years) and variable Y is taken as the number of road accidents (in a year).

The number of road accidents is the dependent variable, related to the independent variable X, i.e. driving experience.

X (driving experience):      5    2   12    9   15    6   25   16

Y (no. of road accidents):  64   87   50   71   44   56   42   60

From the data we will find:

The estimated regression line for the data.

The number of road accidents when the driving experience is 10 years and 30 years.

The coefficient of determination ($R^2$), which tells us what percentage of the variation in the dependent variable is explained by the independent variable.

Calculation table for driving experience and number of road accidents.

X        Y        X.Y        X2         Y2
5        64       320        25         4096
2        87       174        4          7569
12       50       600        144        2500
9        71       639        81         5041
15       44       660        225        1936
6        56       336        36         3136
25       42       1050       625        1764
16       60       960        256        3600
ΣX=90    ΣY=474   ΣXY=4739   ΣX2=1396   ΣY2=29642

Using the normal equations we calculate the values of a and b.

$\sum Y = N a + b \sum X$

$\sum XY = a \sum X + b \sum X^2$

8a + 90b = 474 (Eq. 1)

90a + 1396b = 4739 (Eq. 2)

Solving both equations, we get:

Value of a = 76.66

Value of b = -1.5476

The estimated regression line is

Y = 76.66 - 1.5476 X
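The values of a and b can be verified by solving the two normal equations as a 2 x 2 linear system. The sketch below uses Cramer's rule with the sums from this example (N = 8, ΣX = 90, ΣY = 474, ΣXY = 4739, ΣX² = 1396); the helper name `solve_2x2` is made up for the illustration.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Eq. 1:  8a +   90b =  474   (sum of Y  = N*a + b * sum of X)
# Eq. 2: 90a + 1396b = 4739   (sum of XY = a * sum of X + b * sum of X^2)
a, b = solve_2x2(8, 90, 474, 90, 1396, 4739)

print(f"a = {a:.2f}")  # intercept, about 76.66
print(f"b = {b:.4f}")  # slope, about -1.5476
```

The negative slope confirms that the fitted line falls as driving experience increases.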

[Figure: plot of the estimated regression line Y = 76.66 - 1.5476 X, with number of accidents on the vertical axis and driving experience on the horizontal axis.]

Road accidents depend on driving experience: a new driver is inexperienced and faces a higher risk of accident. There is therefore a negative relationship between the two variables, and the trend line slopes downward in this case.

The value of a is 76.66, which means that a driver with 0 years of experience is expected to have 76.66 road accidents.

The value of b means that for every extra year of driving experience, the number of road accidents decreases by 1.5476.

No. of accidents with 10 years of experience:

Y = 76.66 - 1.5476 X

Y = 76.66 - 1.5476 (10)

Y = 61.184

No. of accidents with 30 years of experience:

Y = 76.66 - 1.5476 X

Y = 76.66 - 1.5476 (30)

Y = 30.232

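Coefficient of determination ($R^2$) using regression coefficients.

$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \frac{8(4739) - 90 \cdot 474}{8(1396) - (90)^2} = -1.547$$

$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2} = \frac{8(4739) - 90 \cdot 474}{8(29642) - (474)^2} = -0.381$$

Now

$R^2 = b_{yx} \cdot b_{xy} = (-1.547)(-0.381) = 0.5894$

About 58.94% of the variance of the dependent variable is explained by the independent variable.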
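The same calculation can be reproduced from the column totals. The sketch below is illustrative only (the function name is an assumption); it keeps full precision, so the product comes out slightly above the hand-calculated 0.5894, which multiplies coefficients already rounded to three decimals.

```python
def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (b_yx, b_xy) from the raw sums."""
    cross = n * sum_xy - sum_x * sum_y
    b_yx = cross / (n * sum_x2 - sum_x ** 2)  # regression of Y on X
    b_xy = cross / (n * sum_y2 - sum_y ** 2)  # regression of X on Y
    return b_yx, b_xy


# Totals from the driving-experience example.
b_yx, b_xy = regression_coefficients(
    n=8, sum_x=90, sum_y=474, sum_xy=4739, sum_x2=1396, sum_y2=29642
)
r_squared = b_yx * b_xy

print(f"b_yx = {b_yx:.3f}")
print(f"b_xy = {b_xy:.3f}")
print(f"R^2  = {r_squared:.4f}")
```

Because both coefficients share the same numerator, their product is never negative, which is consistent with $R^2$ lying between 0 and 1.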

SENSEX and Nifty

Stock market performance is quantified by calculating an index using the benchmark scrips. As is well known, SENSEX (Sensitive Index) is associated with the Bombay Stock Exchange and S&P CNX NIFTY is associated with the National Stock Exchange.

There are 23 stock exchanges in India. The Bombay Stock Exchange is the largest, with over 6,000 stocks listed. The BSE accounts for over two thirds of the total trading volume in the country.

Established in 1875, the exchange is also the oldest in Asia. Among the twenty-two Stock Exchanges recognized by the Government of India under the Securities Contracts (Regulation) Act, 1956, it was the first one to be recognized and it is the only one that had the privilege of getting permanent recognition.

Scrips at BSE

ACC
AIRTEL
BHEL
DLF
GRASIM
GUJARAT AMBUJA
HDFC
HDFC BANK
HINDALCO
HUL
ICICI BANK
INFOSYS
SUN PHARMA IND. LTD
ITC
L&T
MARUTI
MAHINDRA & MAHINDRA
NTPC
ONGC
RANBAXY
RELIANCE COMMUNICATION
RELIANCE INFRASTRUCTURE
RIL
STERLITE INDUSTRIES LTD
SBI
TCS
TATA MOTORS
TATA STEEL
TATA POWER COMPANY LTD
WIPRO

The National Stock Exchange (NSE), located in Bombay, is India's first debt market.

It was set up in 1993 to encourage stock exchange reform through system modernization and competition.

The instruments traded are treasury bills, government securities and bonds issued by public sector companies.

How are the SENSEX 30 stocks selected?

Listing History
Trading Frequency
Rank based on the Market Cap (should be among top 100)
Market Capitalization weight
Industry / sector they belong to
Historical Record

Methodology of SENSEX

SENSEX has been calculated since 1986. Initially it was calculated using the Total Market Capitalization methodology; the methodology was changed in 2003 to Free Float Market Capitalization.

Hence, these days, the SENSEX is based on the free-float market cap of the 30 SENSEX stocks traded on the BSE relative to the base value, which is 100 (1978-79), and it is calculated every 15 seconds.

SENSEX is calculated using the "Free-float Market Capitalization" methodology, wherein the level of the index at any point of time reflects the free-float market value of 30 component stocks relative to a base period.

The market capitalization of a company is determined by multiplying the price of its stock by the number of shares issued by the company.

This market capitalization is further multiplied by the free-float factor to determine the free-float market capitalization.

SENSEX = (Sum of free-float market cap of 30 benchmark stocks) * Index Factor

where

Index Factor = 100 / Market Cap Value in 1978-79.

100 is the Index value during 1978-79.
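The index formula can be illustrated with a toy calculation. Everything numeric below is hypothetical: three made-up stocks stand in for the 30 benchmark scrips, and the base-period market cap is invented so that the arithmetic is easy to follow. Only the structure (price x shares x free-float factor, summed, times the index factor) follows the description above.

```python
def free_float_index(stocks, base_market_cap, base_value=100.0):
    """Index level from (price, shares_issued, free_float_factor) tuples."""
    # Free-float market cap of each stock = price * shares * free-float factor.
    total_free_float_cap = sum(
        price * shares * factor for price, shares, factor in stocks
    )
    # Index factor = base index value / market cap value in the base period.
    index_factor = base_value / base_market_cap
    return total_free_float_cap * index_factor


# Hypothetical component stocks: (price, shares issued, free-float factor).
stocks = [
    (100.0, 1_000, 0.5),  # free-float cap 50,000
    (50.0, 2_000, 0.8),   # free-float cap 80,000
    (200.0, 500, 0.4),    # free-float cap 40,000
]
base_market_cap = 85_000  # hypothetical base-period market cap

index_level = free_float_index(stocks, base_market_cap)
print(f"index level = {index_level:.2f}")
```

In this toy case the total free-float cap is twice the base-period value, so the index stands at about twice its base value of 100.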

The National Stock Exchange (NSE) is associated with NIFTY, which is calculated by the same methodology but with two key differences.

1. Base year is 1995 and base value is 1000.

2. NIFTY is calculated based on 50 stocks.
