© All Rights Reserved


Chapter 5,6 Regression Analysis.pptx


Dr Sunil D Lakdawala

Sunil_lakdawala@hotmail.com

Regression Line

a: Intercept, b: Slope

Draw the line passing through (1, 5) and (2, 7).

Find a and b.

Predict the value of Y for X = 5.

Is the relation direct?

Similarly, draw the line passing through (0, 6) and (1, 3).

Find a and b. Is the relation direct?

25-Aug-17 2
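As a quick check, the two-point exercises above can be worked in a few lines of Python (a sketch; the helper `line_through` is not from the slides):

```python
# Fit the line Y = a + b*X through two points, as in the exercise above.
def line_through(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # slope
    a = y1 - b * x1             # intercept
    return a, b

a, b = line_through((1, 5), (2, 7))    # a = 3, b = 2 (direct relation)
y_at_5 = a + b * 5                     # predicted Y for X = 5 -> 13
a2, b2 = line_through((0, 6), (1, 3))  # a = 6, b = -3 (inverse relation)
```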

Simple Regression

Simple regression is the process of constructing a mathematical model or function that can be used to predict or determine one variable from another variable.

The variable to be predicted is called the dependent variable and is denoted by Y.

The predictor is called the independent variable or explanatory variable, and is denoted by X.

Regression Line (Cont)

See the # of Passengers vs. Cost data.

Find the error, the absolute error and the squared error.

Can we take the error? The absolute error? The squared error? What are their characteristics?

Draw the line such that the squared error is least.

The equation of the simple linear regression line is given by

Yi = a + b·Xi + εi

Minimize Σ εi² by finding the best fit for a, b.

Problem

The data are displayed in the table. The values in the first column denote the number of passengers for 12 five-hundred-mile commercial airline flights using Boeing 737s during the same season of the year.

We use these data to develop a regression model to predict cost from the number of passengers.

Regression Line (Cont)

b = (Σ XiYi − n·X̄·Ȳ) / (Σ Xi² − n·X̄²)

a = Ȳ − b·X̄

Se (Standard Error) = sqrt( Σ (Yi − Ŷi)² / (n − 2) )

Assumption: errors are normally distributed.

What is the interpretation of the standard error? In comparison with the mean value of Y? In terms of a percentage?

Look at the example of Cost vs. Passengers.
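The formulas above translate directly into code. A minimal sketch (the data are made up so the fit is exact; they are not the passengers/cost table):

```python
from math import sqrt

# Least-squares a, b and standard error Se, per the formulas above:
#   b = (Σ XiYi − n·X̄·Ȳ) / (Σ Xi² − n·X̄²),  a = Ȳ − b·X̄,
#   Se = sqrt(Σ (Yi − Ŷi)² / (n − 2))
def fit_line(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
        (sum(x * x for x in xs) - n * xbar ** 2)
    a = ybar - b * xbar
    se = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    return a, b, se

# these points lie exactly on Y = 1 + 2X, so a = 1, b = 2, Se = 0
a, b, se = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```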

Regression Line (Cont)

Interpretation of Standard Error (Cont)

What is the range of cost for 80 passengers with 95% confidence and with 90% confidence?

For n > 30, use the Z distribution.

For n < 30, normality cannot be assumed; the t distribution must be used.

What will be the range for the above problem?

Is the range the same for every Y? Is that true?

Assumption: one is predicting within the range of the data.
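The simple interval from this slide, Ŷ ± t·Se, can be sketched as follows (all numbers are illustrative, not the cost example's actual values; 2.228 is the two-tailed 95% t value for 10 degrees of freedom):

```python
# Rough interval for a predicted Y: Y_hat ± t * Se (the simple form used
# on the slide; the exact prediction interval widens away from X̄).
def rough_interval(y_hat, se, t):
    return y_hat - t * se, y_hat + t * se

lo, hi = rough_interval(4.83, 0.177, 2.228)   # illustrative numbers
```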

Correlation Analysis

The degree to which one variable is linearly related to another.

Coefficient of Determination:

r² = 1 − Σ (Yi − Ŷi)² / Σ (Yi − Ȳ)² (between 0 and 1)

= 1 − Ratio

Ratio: the variation between the actual and predicted values w.r.t. the variation of Yi from the mean (the unexplained part), i.e.

(variation of Y around the regression line) / (variation of Y around its own mean)

r² = 1 − the ratio of the above two.

r² = 0.78 means 78% of the variation of Y about Ȳ is explained by the regression; 22% is not explained.
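r² = 1 − SSE/SST as defined above, in code (the data and the fitted a, b are made up for illustration):

```python
# Coefficient of determination: r² = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ȳ)².
def r_squared(xs, ys, a, b):
    ybar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst

# nearly linear data; a and b are its least-squares fit
r2 = r_squared([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8], a=0.09, b=1.97)
```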

Interpretation of r²

If Ŷi = Yi for all i, the variables are perfectly linearly related, and the value is 1.

If the first term equals the second term (SSE = SST), there is no linear relation, and the value is 0.

Take the examples of Table 12-6 (DO), Table 12-13, Figure 12-13 and Figure 12-14 to find r² (Henke).

Calculate the values using Excel.

Coefficient of Correlation

r = sqrt(r²)

See Fig 12.16.

If r = 0.6, how good is the regression? How much variation in Y is explained by the regression? (Only r² = 0.36, i.e. 36%.)

Inferences about population parameters

Instead of the point value of b, we want to find the range of b at a 90% confidence level.

Find t for the given degrees of freedom and the given confidence level.

Find sb (the standard error of b):

sb = Se / sqrt( Σ Xi² − n·X̄² )

The range is b − t·sb to b + t·sb.
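A sketch of the slope interval b ± t·sb (illustrative data; 2.353 is the two-tailed 90% t value for 3 degrees of freedom):

```python
from math import sqrt

# Confidence interval for the slope: b ± t·sb,
# with sb = Se / sqrt(Σ Xi² − n·X̄²).
def slope_interval(xs, ys, t):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum(x * x for x in xs) - n * xbar ** 2
    b = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / sxx
    a = ybar - b * xbar
    se = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    sb = se / sqrt(sxx)
    return b - t * sb, b + t * sb

lo, hi = slope_interval([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8], 2.353)
```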

The Equation

The simple linear regression model is given by

Yi = β0 + β1·Xi + εi

Minimize Σ εi² by finding the best fit for β0, β1.

Table 5-2: Pulp Price regression (Makridakis)

Table 5-4: PVC regression (Makridakis)

Residual / Error

Outliers: observations with large residuals (Table 5-4)

Influential observations: observations that have a great influence on the fitted equation. Usually they are extreme observations. (See the King Kong problem, Fig 5-8, Makridakis.)

The Equation (Cont)

Spurious relationship: X (number of deaths by drowning) and Y (consumption of Coke) might be highly related, but there may not be a causal relationship. Both might be increasing in summer!

Lurking variable: an explanatory variable not included in the regression that is highly related to both X and Y (e.g., season in the above case).

Confounding variables: new car sales may depend upon both price as well as advertisement expenditure. The last two are called confounding variables.

The Equation (cont)

The regression analysis is performed under the following assumptions.

Y = β0 + β1·X + ε

1. Residual:

(Y − Ŷ) should be near zero. (Y − Ŷ) is the residual, denoted by ε.

2. Residual plot:

Plot X vs. (Y − Ŷ). ε should be random (i.e., normally distributed) and should not show any trend (unlike Y = X²).

3. Standard Error Se = sqrt( Σ (Y − Ŷ)² / (N − K − 1) ); K = 1 for simple regression; Σ (Y − Ŷ)² is the sum of squared errors.

Se should be acceptable (Se / Ȳ gives a good idea of the error).

68% of residuals should be within Se.

95% of residuals should be within 2·Se.
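The 68% / 95% residual checks in assumption 3 can be coded directly (the residual values below are made up):

```python
from math import sqrt

# Fraction of residuals within 1·Se and 2·Se (assumption 3 above).
def residual_check(residuals, k=1):
    n = len(residuals)
    se = sqrt(sum(r * r for r in residuals) / (n - k - 1))
    within_1se = sum(abs(r) <= se for r in residuals) / n
    within_2se = sum(abs(r) <= 2 * se for r in residuals) / n
    return se, within_1se, within_2se

res = [0.5, -0.3, 0.2, -0.6, 0.1, 0.4, -0.2, 0.3, -0.4, 0.0]
se, w1, w2 = residual_check(res)
```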

The Equation (Cont)

4. Correlation (r) is a measure of the linear association between the variables. Even if the variables have a strong nonlinear relationship, r might be very small (see Fig 5-7, Makridakis).

5. For small n, r is notoriously unstable. For n = 30 or more, it starts becoming stable.

6. r can change drastically due to extreme values. (See the King Kong problem, Fig 5-8, Makridakis, where just one extreme point changes r from 0.527 to 0.940.) What should we do?

7. Coefficient of Determination: R² should be high, towards 1. Interpretation of R² and r.

The Equation (Cont)

8. The p-value should be smaller than 0.05 (i.e., 95% confidence) for rejecting the null hypothesis (β0 = 0 / β1 = 0).

F = t² = MS(Regression) / MS(Residual) (t value for β1)

Significance F = p-value for β1.

9. Adjusted R² = 1 − [ Σ (Y − Ŷ)² / (N − K − 1) ] / [ Σ (Y − Ȳ)² / (N − 1) ]

10. The model should make common sense, i.e., when X changes by 1, Y changes by the slope. A +ve or −ve change should make common sense.

11. Predictions are valid only within the range from which the model was made.
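Item 9's adjusted R² in code (the SSE/SST values plugged in are illustrative):

```python
# Adjusted R² = 1 − (SSE / (N − K − 1)) / (SST / (N − 1)), per item 9.
def adjusted_r2(sse, sst, n, k):
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

r2_adj = adjusted_r2(sse=0.091, sst=38.9, n=5, k=1)   # illustrative values
```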

The Equation (Cont)

12. Please see equation 5.19 for the error interval on a predicted value.

13. Please see the equations for β0 and β1 and their error intervals on page 216.

14. Residuals vs. the explanatory variable should not show any pattern (no trend, no seasonality, etc.).

15. Residuals should have zero mean and should be normally distributed.

Data and Analysis


Summary Output


Residuals

A residual is the difference between the actual Y value and the Y value predicted by the regression model, for each value of the dependent variable.

Residuals

The sum of the squared residuals is called the sum of squares of error (SSE).

The standard error of the estimate is the standard deviation of the error of the regression model.

Coefficient of Determination

A measure of fit for regression models is the coefficient of determination.

The coefficient of determination is the proportion of the variability of the dependent variable (Y) accounted for or explained by the independent variable (X).

It is denoted by r².

It lies between 0 and 1.

r² in Airlines Cost

r² = .899

This means that about 89.9% of the variability of the cost of flying a Boeing 737 airplane on a commercial flight is accounted for or predicted by the number of passengers.

This also means that about 10.1% of the variation in airline flight cost, Y, is unaccounted for by X, or unexplained by the regression model.

Correlation

Correlation is a measure of association. It measures the strength of the relatedness of two variables.

For example, we may be interested in determining the correlation between the prices of two stocks in the same industry. How strong are these correlations?

The Pearson product-moment correlation coefficient is given by

r = Σ (Xi − X̄)(Yi − Ȳ) / sqrt( Σ (Xi − X̄)² · Σ (Yi − Ȳ)² )
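The Pearson coefficient formula can be sketched as:

```python
from math import sqrt

# Pearson product-moment correlation:
# r = Σ(Xi − X̄)(Yi − Ȳ) / sqrt(Σ(Xi − X̄)² · Σ(Yi − Ȳ)²)
def pearson_r(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - xbar) ** 2 for x in xs) *
               sum((y - ybar) ** 2 for y in ys))
    return num / den

r_pos = pearson_r([1, 2, 3], [2, 4, 6])   # perfect positive -> 1.0
r_neg = pearson_r([1, 2, 3], [3, 2, 1])   # perfect negative -> -1.0
```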

Correlation

1. The measure is applicable only if both variables being analyzed have at least an interval level of data.

2. r is a measure of the linear correlation of two variables.

3. r = +1 denotes a perfect positive relationship between two sets of variables.

4. r = −1 denotes a perfect negative correlation, which indicates an inverse relationship between two variables.

5. r = 0 means that there is no linear relationship between the two variables.

6. The coefficient of determination is the square of the correlation coefficient: r².


Factors to be taken care of

The relationship may not be linear; try out the example Y = SQR(X).

Do not try to predict Y outside the range of the values used for building the model (try predicting for Y = SQR(X)).

Multiple Regression Model

The general equation which describes the multiple regression model is given by

Yi = β0 + β1·X1i + β2·X2i + … + βk·Xki + εi

Minimize Σ εi² by finding the best βi.

Assumptions made in the model are:

1. Residual:

(Y − Ŷ) should be near zero. (Y − Ŷ) is the residual, denoted by ε.

Plot Xi vs. (Y − Ŷ) for each Xi. ε should be random and should not show any trend (unlike Y = X²).

2. Standard Error Se = sqrt( Σ (Y − Ŷ)² / (N − K − 1) ); K = number of independent variables; Σ (Y − Ŷ)² is the sum of squared errors.

Se should be acceptable (Se / Ȳ gives a good idea of the error).

68% of residuals should be within Se.

95% of residuals should be within 2·Se.
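A self-contained sketch of fitting two predictors by solving the normal equations (XᵀX)β = Xᵀy; the data below are generated from a known equation so the fit recovers it exactly:

```python
# Solve A·x = b by Gauss-Jordan elimination with partial pivoting.
def solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Least squares for Y = b0 + b1·X1 + b2·X2 via the normal equations.
def fit_two(x1, x2, y):
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(XtX, Xty)

# data generated from Y = 2 + 3·X1 − 1·X2 (no noise)
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 3 * a - 1 * b for a, b in zip(x1, x2)]
beta = fit_two(x1, x2, y)   # ~ [2.0, 3.0, -1.0]
```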

Multiple Regression Model (Cont)

4. The p-value should be smaller than 0.05 (i.e., 95% confidence) for rejecting the null hypothesis (βi = 0).

F = t² = MS(Regression) / MS(Residual) (overall t value)

Significance F = overall p-value. For rejecting the null hypothesis (β1 = 0 and β2 = 0 …), this should be small.

5. The model should make common sense, i.e., when X changes by 1, Y changes by the slope. A +ve or −ve change should make common sense.

6. Adjusted R² should not be very different from R². By adding more variables, one can always make R² large, but Adjusted R² might be much smaller than R².

Problem

See Fig 6-1, Bankdata Regression.

A real estate study was conducted in a small city to determine what variables, if any, are related to the market price of a home.

Several variables were explored, including the number of bedrooms, the number of bathrooms, the age of the house, the number of square feet of living space, the total number of square feet of space, and how many garages the house had.

Suppose that the business analyst wants to develop a regression model to predict the market price of a home by two variables: "total number of square feet in the house" and the age of the house.

The data are given in the table.

The Fitted Model

Ŷ = 57.351 + 0.0177·X1 − 0.6663·X2

Interpretation:

The Y-intercept is equal to 57.351. In this example, the Y-intercept does not have any practical significance.

The coefficient of X1 (total number of square feet in the house) is 0.0177. This means that a 1-unit increase in square footage would result in a predicted increase of (0.0177)($1,000) = $17.70 in the price of the home if the age were held constant.

The coefficient of X2 (age) is −0.6663. The negative sign on the coefficient denotes an inverse relationship between the age of a house and the price of the house: the older the house, the lower the price. In this case, if the total number of square feet in the house is kept constant, a 1-unit increase in the age of the house (1 year) will result in (−0.6663)($1,000) = −$666.30, a predicted drop in the price.
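Plugging a hypothetical house into the fitted equation (the 2,500 sq ft / 10-year figures are made up for illustration):

```python
# Fitted model from the slide: price in $1,000s,
# X1 = total square feet, X2 = age in years.
def predicted_price(sqft, age):
    return 57.351 + 0.0177 * sqft - 0.6663 * age

p = predicted_price(2500, 10)   # hypothetical 2,500 sq ft, 10-year-old house
```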

Testing the Model

r² = 0.715

Testing the overall model

Significance tests of the regression coefficients

Analysis of Residuals


Multicollinearity

Multicollinearity refers to two or more independent variables of a multiple regression model being highly correlated. This causes problems in the interpretation of results.

In particular, these problems are:

It is difficult, if not impossible, to interpret the estimates of the regression coefficients.

Inordinately small t values for the regression coefficients may result.

The standard deviations of regression coefficients are overestimated.

The algebraic sign of estimated regression coefficients may be the opposite of what would be expected for a particular predictor variable.

Search Procedures

All possible regressions:

Take all possible combinations of the K variables (2^K − 1 models). Choose the best model.

Forward selection:

Start with one variable: try out all variables one at a time and choose the best one.

Then add a 2nd variable, and so on.

Backward elimination:

Start with all variables; repeatedly drop the least significant variable.

Stepwise regression:

Same as forward selection, but at every step also check that each variable already included is still significant (acceptable p-value).

Factors to be taken care of

The value of R² could be inflated; consider adjusted R².

A better model does not imply cause and effect between the independent variables and the dependent variable (some other factor might be causing both).

The value of a regression coefficient may not directly tell about its importance, because of:

Different units

Multicollinearity

To deal with the same:

Use search procedures.

Use r between pairs of variables; for a large value, do not take both.

Non Linear Models (5/4 - Makridakis)

Nonlinearity in parameters: more complex (one may be able to use a transformation to convert to the linear case in certain situations).

Nonlinearity in variables.

Local regression (see 5/4/3 of Makridakis).

Non-Linear Model

Y = β0 + β1·X1 + β2·X2² + ε

Choose Y1 = X1; Y2 = X2².

Y = β0 + β1·X1 + β2·X1·X2 + ε

Choose Y1 = X1; Y2 = X1·X2.

Y = β0·β1^X

Log(Y) = Log(β0) + X·Log(β1); now it is in linear form.

Similarly, Y = β0·X^β1 and Y = 1 / (β0 + β1·X1 + β2·X2) can be converted into linear regressions.
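The log transformation for Y = β0·β1^X, sketched in code (the data are generated from known β's so the fit recovers them):

```python
from math import log, exp

# Linearize Y = b0 * b1**X by regressing log(Y) on X:
# log(Y) = log(b0) + X*log(b1).
def fit_exponential(xs, ys):
    n = len(xs)
    ls = [log(y) for y in ys]
    xbar, lbar = sum(xs) / n, sum(ls) / n
    slope = (sum(x * l for x, l in zip(xs, ls)) - n * xbar * lbar) / \
            (sum(x * x for x in xs) - n * xbar ** 2)
    intercept = lbar - slope * xbar
    return exp(intercept), exp(slope)   # b0, b1

# data from Y = 2 * 1.5**X, so the fit recovers b0 = 2, b1 = 1.5
xs = [0, 1, 2, 3, 4]
ys = [2 * 1.5 ** x for x in xs]
b0, b1 = fit_exponential(xs, ys)
```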

Indicator (Dummy) Variable

For Gender, define Gender = 1 for Male and Gender = 0 for Female.

For Region (N/S/W/E), do not map to 0, 0.33, 0.66 and 1.0 (why? the categories are unordered).

Define three variables, X1, X2 and X3:

X1 = 1 for the North region, 0 otherwise

X2 = 1 for the South region, 0 otherwise

X3 = 1 for the West region, 0 otherwise

X1 = X2 = X3 = 0 represents the East region.
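The N/S/W/E encoding above, as code:

```python
# Dummy-encode the unordered Region variable; East is the baseline
# (all three indicators zero).
REGIONS = ("North", "South", "West")

def encode_region(region):
    return tuple(1 if region == r else 0 for r in REGIONS)

x1_x2_x3 = encode_region("South")   # (0, 1, 0)
baseline = encode_region("East")    # (0, 0, 0)
```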

Others (Pg 270 Makridakis)

Trading day variation:

Introduce seven variables, T1: # of Mondays in the month, T2: # of Tuesdays in the month, ..

Holiday effect:

V = 1 if Diwali falls in this month (or part of Diwali)

Interventions (Pg 271 Makridakis)

Seat belt legislation was introduced; due to that, car accidents went down.

Introduce a dummy variable I = 0 (before the seat belt legislation) and I = 1 (after the seat belt legislation).

More complex models can be introduced if the effect is spread over some time.

See Figure 8-15 for the intervention variable.

Effect of Advertising Expenditure on Sale (Pg 271 Makridakis)

Suppose advertisement expense is one of the input variables, and the effect of advertisement expense lasts for 3 months (say). One can model this as follows:

Yt = b0 + b1·X1,t + b2·X1,t−1 + b3·X1,t−2 + ..
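The lagged-advertising model can be evaluated as below (the coefficients and spend figures are hypothetical):

```python
# Y_t = b0 + b1·X1,t + b2·X1,t−1 + b3·X1,t−2: sales respond to the
# current month's ad spend and the two previous months'.
def predict_sales(b, spend, t):
    b0, b1, b2, b3 = b
    return b0 + b1 * spend[t] + b2 * spend[t - 1] + b3 * spend[t - 2]

b = (10.0, 0.5, 0.3, 0.1)   # hypothetical coefficients
spend = [4, 6, 8, 5, 7]     # monthly ad spend
y = predict_sales(b, spend, t=4)   # uses spend[4], spend[3], spend[2]
```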

Miscellaneous

Variance-Covariance Matrix

Vector X = (X1, X2, X3, …)

Let μ(i) be the arithmetic average of X(i).

Σ(i,j) = Σk (X(i,k) − μ(i))·(X(j,k) − μ(j)) / N

Σ(i,i) (the diagonal) gives the variance of X(i).
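The covariance matrix definition above, as code (rows of X are variables, columns are observations; in the sample data, X2 is exactly 2·X1):

```python
# Σ(i,j) = (1/N) · Σk (X[i][k] − μ[i]) · (X[j][k] − μ[j])
def cov_matrix(X):
    m, n = len(X), len(X[0])
    mu = [sum(row) / n for row in X]
    return [[sum((X[i][k] - mu[i]) * (X[j][k] - mu[j]) for k in range(n)) / n
             for j in range(m)] for i in range(m)]

S = cov_matrix([[1, 2, 3, 4], [2, 4, 6, 8]])
# diagonal entries are variances; S[0][1] = S[1][0] is the covariance
```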
