Learning Objectives

Understand what regression analysis is and where it is used.

Regression

Provides a conceptually simple method for investigating functional relationships between one or more independent (explanatory) variables, or factors, and a dependent variable (the outcome of interest).

Equivalently, it is a model connecting the response (dependent) variable to one or more explanatory (predictor, independent) variables.

Regression in Business

Predict the future joint distribution of asset returns
Construct an optimal portfolio (choose weights)
Estimate the effect of price and advertisement on sales
Decide on the optimal price and ad campaign
Predict the future probability of default using known characteristics of the borrower
Decide whether or not to lend (and if so, how much)

Sales volume, market movement (ice cream, houses)
Customer complaints over time
Key product specialization
Predict the demographics and types of future workforce for large companies
Estimate training impact

What price should I charge for my car? What will the interest rate be next month? Will this person like that movie?

Does your income increase if you complete this course? Will tax incentives change purchasing behaviour? Is my advertising campaign working?

Where to start?

Linear Prediction

Example: Predicting house price
Problem: Predict market price based on observed characteristics

Solution:
Look at property-related data where we know the price and some observed characteristics
Build a decision rule that predicts price as a function of the observed characteristics

What characteristics do we use?

Size, number of rooms, attached baths, garage space, UPS facility, neighbourhood, etc.

It is easy to quantify variables like price and size, but what about other variables like aesthetics and workmanship?

To keep things simple, let's focus only on size.

The value that we seek to predict is called the dependent (or explained) variable, and we denote this as Y.

The variable that we use to guide prediction is the independent (or explanatory) variable, and this is labelled X.

Linear prediction

Recall that the equation of a line is: Y = b0 + b1X
We add the random residual term: Y = b0 + b1X + u

The intercept value is in units of Y (Rs. 1,00,000).
The slope is in units of Y per unit of X (Rs. 1,00,000 per 1,000 sq ft).

Y = b0 + b1X + u

Intercept b0: when X = 0, Y = b0, so the intercept is the predicted value of Y when X = 0.
Slope b1: when X increases by 1 unit (1,000 sq ft), Y increases on average by b1 units (each unit of Y being Rs. 1,00,000).

Linear Prediction

We can now predict the price of a house when we know only the size
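As a minimal sketch of the prediction rule Y = b0 + b1X, the snippet below uses made-up coefficients. The values of `B0` and `B1` and the function name `predict_price` are hypothetical and are not taken from the slides; only the units follow the text (price in Rs. 1,00,000, size in 1,000 sq ft).

```python
# Hypothetical coefficients, chosen only for illustration (not from the slides).
B0 = 5.0   # intercept, in units of Rs. 1,00,000
B1 = 20.0  # slope, in Rs. 1,00,000 per 1,000 sq ft


def predict_price(size_thousand_sqft: float) -> float:
    """Predicted price (in Rs. 1,00,000) for a house of the given size (in 1,000 sq ft)."""
    return B0 + B1 * size_thousand_sqft


if __name__ == "__main__":
    # A 1,500 sq ft house is entered as 1.5 because X is measured in 1,000 sq ft.
    print(predict_price(1.5))
```

With these assumed values, each additional 1,000 sq ft adds `B1` units to the predicted price, which is the slope interpretation given above.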

Y = dependent variable
X1, X2, X3, ..., Xp = independent variables
The linear relationship is written as:
$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p + u$

Estimating this model requires statistical tools better than simple graphical methods.

Least Squares Method

A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This amount is called the residual, or error.

$Y_i = b_0 + b_1 X_i + u_i$

$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$

Fitted value

What is the fitted value?

The dots are the observed values and the line represents our fitted values, given by $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$.

What is the residual for the ith observation?

$\hat{u}_i = Y_i - \hat{Y}_i$

The total of the residuals may be small even when the individual residuals are widely scattered, because positive residuals can cancel out negative ones.

OLS Criteria
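The ordinary least squares (OLS) criterion chooses the intercept and slope that minimize the sum of squared residuals, which avoids the cancellation problem of a plain sum. The sketch below applies the standard closed-form solution for one explanatory variable. The data (`sizes`, `prices`) and the function names are made up for illustration and do not come from the slides.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed value minus fitted value for each observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sum_sq(values):
    """Sum of squares of a list of numbers."""
    return sum(v ** 2 for v in values)


# Made-up data: size in 1,000 sq ft, price in Rs. 1,00,000.
sizes = [1.0, 1.5, 2.0, 2.5, 3.0]
prices = [25.0, 33.0, 47.0, 52.0, 68.0]

b0_hat, b1_hat = ols_fit(sizes, prices)
ols_res = residuals(sizes, prices, b0_hat, b1_hat)

# A flat line through the mean price also has residuals that sum to zero,
# but its squared residuals are far larger than those of the OLS line.
mean_price = sum(prices) / len(prices)
flat_res = residuals(sizes, prices, mean_price, 0.0)

if __name__ == "__main__":
    print("OLS line:  b0 =", b0_hat, " b1 =", b1_hat)
    print("Sum of residuals (OLS, flat):", sum(ols_res), sum(flat_res))
    print("Sum of squared residuals (OLS, flat):", sum_sq(ols_res), sum_sq(flat_res))
```

Both lines have residuals that total zero, so the plain sum cannot tell them apart; the sum of squares can.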

How well does the sample regression line fit the data? We want to know what proportion of the variation in Y our model explains.

r2: Measures the goodness of fit

[Figure: Ballentine view of r2, showing the two extreme cases r2 = 0 and r2 = 1]
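One common way to compute r2 as "the proportion of variation in Y explained" is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that definition; the function name `r_squared` and the small data lists are illustrative assumptions, not taken from the slides.

```python
def r_squared(y, y_hat):
    """r2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1.0 - rss / tss


# Made-up observations.
observed = [1.0, 2.0, 3.0]

if __name__ == "__main__":
    # Fitted values equal to the observations: the model explains everything.
    print(r_squared(observed, [1.0, 2.0, 3.0]))
    # Fitted values equal to the mean of Y: the model explains nothing.
    print(r_squared(observed, [2.0, 2.0, 2.0]))
```

The two calls correspond to the extreme cases r2 = 1 and r2 = 0.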

Sam wants to predict the sales of a compact cassette tape recorder across stores using advertisement and price data, where:
Sales = number of units sold
Advertisement = number of times the product is advertised within the store
Price = price in dollars
Predict the sales of the compact cassette tape recorder if Advertisement = 7 and Price = $132.

$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + u_i$

$\text{Sales} = b_0 + b_1(\text{Advertisement}) + b_2(\text{Price}) + \text{Error}$

Coefficients(a)

                            Unstandardized Coefficients   Standardized Coefficients
Model 1                     B          Std. Error         Beta                        t        Sig.
(Constant)                  219.231    86.242                                         2.542    .085
Number of Advertisement     6.381      2.180              .847                        2.927    .061
Price in Dollars            -1.671     .684               -.706                       -2.441   .092

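Estimated Equation

$\widehat{\text{Sales}} = 219.231 + 6.381(\text{Advertisement}) - 1.671(\text{Price})$

Interpretation
Constant b0: when Advertisement and Price are both zero, average sales = 219.231 units.
Slopes:
b1: if Advertisement increases by 1, sales increase by about 6.4 units, holding Price constant.
b2: if Price increases by $1, sales decrease by about 1.67 units, holding Advertisement constant.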

Prediction

Predict Sales when Advertisement = 7 and Price = $132:
Sales = 219.231 + 6.381 x 7 - 1.671 x 132
= 219.231 + 44.667 - 220.572
= 43.326 units of sale
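The prediction can be checked by plugging the reported coefficients into the estimated equation. In the sketch below the coefficients are the B values from the SPSS coefficients table; the constant and function names are illustrative choices.

```python
# Unstandardized coefficients (B) as reported in the SPSS coefficients table.
B_CONSTANT = 219.231
B_ADVERTISEMENT = 6.381
B_PRICE = -1.671


def predict_sales(advertisement: float, price: float) -> float:
    """Predicted units sold for a given advertisement count and price in dollars."""
    return B_CONSTANT + B_ADVERTISEMENT * advertisement + B_PRICE * price


if __name__ == "__main__":
    print(round(predict_sales(7, 132), 3))  # 43.326
```

The three terms are 219.231, 6.381 x 7 = 44.667 and -1.671 x 132 = -220.572, which add up to about 43.3 units.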

R-Square

SPSS output

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .884(a)   .782       .637                16.108

R-Square = 0.782 indicates that the model explains 78.2% of the variation in the Y variable (Sales).
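The Adjusted R Square in the model summary can be reproduced from R Square with the usual degrees-of-freedom correction. The sketch below assumes n = 6 observations (inferred from the total df of 5 in the ANOVA output, not stated directly in the slides) and k = 3 estimated parameters (the constant plus two slopes).

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-square, with n observations and k estimated parameters
    (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


R_SQUARE = 0.782  # from the model summary
N_OBS = 6         # assumption: total df = 5 implies n = 6
K_PARAMS = 3      # constant, Advertisement, Price

if __name__ == "__main__":
    print(round(adjusted_r_squared(R_SQUARE, N_OBS, K_PARAMS), 3))
```

Under these assumptions the result rounds to the .637 reported by SPSS. The gap between .782 and .637 reflects how few observations are available relative to the number of parameters.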

Hypothesis testing

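$H_0: b_2 = b_2^{*}$
$H_1: b_2 \neq b_2^{*}$
where $b_2^{*}$ denotes the hypothesized value of the coefficient.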

For df = n - k and the chosen level of significance, read the critical value from the t table.
Decision rule: if the calculated |t| is greater than the table value of t, reject H0.

Hypothesis testing

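Test whether each of the slope coefficients has any impact on the Y variable at a significance level of 0.05.

$H_0: b_1 = 0$
$H_1: b_1 \neq 0$

Significance level for Advertisement (SPSS output): 0.061
0.061 > 0.05 => Do not reject H0
Advertisement has no statistically significant impact on Sales at the 0.05 level.

The same test for Price ($H_0: b_2 = 0$ against $H_1: b_2 \neq 0$) gives Sig. = 0.092 > 0.05 => Do not reject H0
Price has no statistically significant impact on Sales at the 0.05 level.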

Coefficients(a)

                            Unstandardized Coefficients   Standardized Coefficients
Model 1                     B          Std. Error         Beta                        t        Sig.
(Constant)                  219.231    86.242                                         2.542    .085
Number of Advertisement     6.381      2.180              .847                        2.927    .061
Price in Dollars            -1.671     .684               -.706                       -2.441   .092
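The t values in the table are each coefficient divided by its standard error, and the decision rule compares |t| with the t-table value. The sketch below assumes df = n - k = 3 (with n = 6 inferred from the ANOVA total df of 5, and k = 3 parameters) and uses 3.182, the two-sided 5% critical value for 3 degrees of freedom from a standard t table. The names `t_statistic` and `reject_h0` are illustrative.

```python
# Two-sided 5% critical value of t for 3 degrees of freedom (standard t table).
# Assumption: df = n - k = 6 - 3, with n = 6 inferred from the ANOVA total df.
T_CRITICAL = 3.182

# (B, Std. Error) pairs from the SPSS coefficients table.
COEFFICIENTS = {
    "Number of Advertisement": (6.381, 2.180),
    "Price in Dollars": (-1.671, 0.684),
}


def t_statistic(b: float, std_error: float) -> float:
    """t value for H0: coefficient = 0."""
    return b / std_error


def reject_h0(t_value: float, t_critical: float = T_CRITICAL) -> bool:
    """Decision rule: reject H0 when |t| exceeds the table value."""
    return abs(t_value) > t_critical


if __name__ == "__main__":
    for name, (b, se) in COEFFICIENTS.items():
        t_value = t_statistic(b, se)
        decision = "reject H0" if reject_h0(t_value) else "do not reject H0"
        print(f"{name}: t = {t_value:.3f}, {decision}")
```

Neither |t| exceeds the critical value, which agrees with both Sig. values being above 0.05. The recomputed t for Price differs from the table in the third decimal because the printed B and Std. Error are themselves rounded.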

Is the regression as a whole significant? Test whether at least one X variable has an impact on Y.

$H_0: b_1 = b_2 = 0$ (Y does not depend on any X)
$H_1:$ at least one $b_i \neq 0$ (Y depends on at least one X)

Statistic used: F statistic, given in the ANOVA table of the SPSS output.
At a significance level of 0.05: if Sig. < 0.05, then reject H0.

ANOVA(b)

Model 1       Sum of Squares   df   Mean Square   F   Sig.
Regression                     2
Residual                       3
Total                          5

a. Predictors: (Constant), Price in Dollars, Number of Advertisement
b. Dependent Variable: Sales (units sold)

Sig. = 0.102 > 0.05. Hence do not reject H0: at the 0.05 level, there is no evidence that Y depends on any of the X variables.
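As a rough cross-check of the overall test, the F statistic can be approximated from R Square, and its p-value has a simple closed form when the numerator df is exactly 2, as it is here. The sketch below assumes n = 6 and k = 3 (inferred from the df values 2, 3 and 5), and uses the rounded R Square of .782, so the F it produces is an approximation rather than the value SPSS printed.

```python
def f_from_r_squared(r2: float, n: int, k: int) -> float:
    """Overall F statistic computed from R-square, with k parameters
    (intercept included) and n observations."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def f_p_value_numerator_df_2(f_value: float, df_residual: int) -> float:
    """P(F > f_value) for an F distribution with 2 numerator df.
    This closed form is valid ONLY when the numerator df equals 2."""
    return (1.0 + 2.0 * f_value / df_residual) ** (-df_residual / 2.0)


R_SQUARE = 0.782  # from the model summary
N_OBS = 6         # assumption: total df = 5 implies n = 6
K_PARAMS = 3      # constant plus two slopes, so regression df = 2

f_value = f_from_r_squared(R_SQUARE, N_OBS, K_PARAMS)
p_value = f_p_value_numerator_df_2(f_value, N_OBS - K_PARAMS)

if __name__ == "__main__":
    print(f"F = {f_value:.3f}, Sig. = {p_value:.3f}")
    print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Under these assumptions F comes out near 5.38 with a p-value near 0.102, in line with the reported Sig. and the decision not to reject H0.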

