
# Regression Analysis

## Ordinary Least Squares Method

Learning Objectives

Understand what regression analysis is and where it is used.

## Use the model to make a business decision

Regression

Regression provides a conceptually simple method for investigating functional relationships between one or more independent explanatory variables (factors) and a dependent variable (outcome of interest).

The relationship is expressed in the form of an equation or a model connecting the response (dependent) variable and one or more explanatory (predictor or independent) variables.

Regression in Business

## Risk analysis for investment (Optimal Portfolio choice)

Predict the future joint distribution of asset returns. Construct an optimal portfolio (choose weights).

Estimate the effect of price and advertisement on sales. Decide what the optimal price and ad campaign are.

## Credit scoring models

Predict the future probability of default using known characteristics of the borrower. Decide whether or not to lend (and if so, how much).

Regression in Business

## Sales or market forecast

Sales volume and market movement (ice cream, houses). Customer complaints over time. Key product specialization.

Predict the demographics and types of future workforce for large companies. Estimate training impact.

## Straight prediction question:

What price should I charge for my car? What will the interest rate be next month? Will this person like that movie?

## Explanation and Understanding

Does your income increase if you complete this course? Will tax incentives change purchasing behaviour? Is my advertising campaign working?

Where to start?

## Scatterplot exploring 2-3 dimensions.

Linear Prediction

Example: Predicting house price

Problem: Predict market price based on observed characteristics.

Solution: Look at property-related data where we know the price and some observed characteristics. Build a decision rule that predicts price as a function of the observed characteristics.

## Linear Prediction: Predicting house prices

What characteristics do we use?

## Many factors or characteristics affect the price of house

Size, number of rooms, attached baths, garage space, UPS facility, neighbourhood, etc.

It is easy to quantify variables like price and size, but what about other variables like aesthetics and workmanship?

## Linear Prediction: Predicting house prices

To keep things simple, let's focus only on size.

The value that we seek to predict is called the dependent (or explained) variable, and we denote this as

## Y = price of house (e.g. lakhs of rupees)

The variable that we use to guide prediction is the independent (or explanatory) variable, and this is labelled X = size of house (in units of 1,000 sq ft).

## Line here is the Trend line

Linear prediction

Recall that the equation of a line is: Y = b0 + b1X.

We add the random residual term: Y = b0 + b1X + u.

## Where b0 is the intercept and b1 is the slope

The intercept value is in units of Y (Rs.1,00,000). The slope is in units of Y per unit of X (Rs.1,00,000 per 1,000 sq ft).

## Linear regression: interpretation

Y = b0 + b1X + u

Intercept b0: when X = 0, Y = b0, so the intercept is the best predictor of Y when X = 0.

Slope b1: when X increases by 1 unit (1,000 sq ft), Y changes by b1 units (each unit of Y being Rs.1,00,000).

Linear Prediction

## Hat indicates an estimate

Linear Prediction

We can now predict the price of a house when we know only the size
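As a minimal sketch of this prediction rule, the snippet below evaluates the fitted line for a given size. The coefficient values are made up for illustration only; the slides do not report estimates for the house-price example, and the function name `predict_price` is hypothetical.

```python
def predict_price(size_thousand_sqft, b0_hat, b1_hat):
    """Predicted price (lakhs of rupees) from the fitted line Y_hat = b0_hat + b1_hat * X."""
    return b0_hat + b1_hat * size_thousand_sqft


# Hypothetical estimates, NOT taken from the slides
b0_hat = 10.0  # intercept, in lakhs of rupees
b1_hat = 25.0  # slope, in lakhs of rupees per 1,000 sq ft

# Predicted price of a 1,500 sq ft house (X = 1.5)
price = predict_price(1.5, b0_hat, b1_hat)
print(price)  # 47.5
```

With X = 0 the function returns the intercept, which matches the interpretation of b0 given earlier.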

## Regression Model: General

Y = dependent variable; X1, X2, X3, ..., Xp = independent variables. The linear relationship is written as:

## Y = b0 + b1X1 + b2X2 + ... + bpXp + u

Estimating this model requires statistical tools better than simple graphical methods: the least squares method.
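With several explanatory variables, the least squares estimates can be obtained by solving the normal equations (X'X) b = X'y. The sketch below does this in plain Python with Gaussian elimination. The data and the helper names `solve_linear_system` and `fit_ols` are assumptions made for illustration, not part of the slides, and the code does not guard against perfectly collinear predictors. In practice a statistics package performs this step.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] for predictor rows (x1, ..., xp) and outcomes y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [3, 7, 7, 11, 12]

coefficients = fit_ols(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the illustrative data contain no error term, the fit recovers the generating coefficients (2, 3, -1) up to floating-point rounding.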

## Least Squares Regression Model

A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This amount is called the residual or error.

$$Y_i = b_0 + b_1 X_i + u_i$$

## Estimated using Least Squares Regression Model

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$

Fitted value
What is the fitted value?

The dots are the observed values and the line represents our fitted values, given by $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$.

## Residual or error term

What is the residual for the ith observation?

$$\hat{u}_i = Y_i - \hat{Y}_i$$

## Objective: Minimize the total of residuals to get best fit

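The total may be small even though the individual residuals are widely scattered. Also, positive residuals may cancel out negative residuals, resulting in a small total.

OLS Criteria

OLS therefore minimizes the sum of squared residuals, $\sum_i \hat{u}_i^2 = \sum_i (Y_i - \hat{Y}_i)^2$, rather than their plain total.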
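The least squares criterion picks the line with the smallest sum of squared residuals. The sketch below fits a one-variable line with the standard closed-form formulas and shows why the plain total of residuals is a poor criterion. The data and function names are hypothetical, chosen only to illustrate the point.

```python
def fit_simple_ols(x, y):
    """Return (b0_hat, b1_hat) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


def total_residual(b0, b1, x, y):
    """Plain sum of residuals: positives and negatives can cancel."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))


def sum_squared_residuals(b0, b1, x, y):
    """The quantity that OLS minimizes."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: size (1,000 sq ft) and price (lakhs of rupees)
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [30.0, 50.0, 80.0, 100.0]

b0_hat, b1_hat = fit_simple_ols(sizes, prices)
ols_total = total_residual(b0_hat, b1_hat, sizes, prices)
ols_ssr = sum_squared_residuals(b0_hat, b1_hat, sizes, prices)

# A flat line at the mean price also has residuals that total zero,
# yet it fits far worse, which the squared criterion reveals.
flat_total = total_residual(65.0, 0.0, sizes, prices)
flat_ssr = sum_squared_residuals(65.0, 0.0, sizes, prices)

print(b0_hat, b1_hat, ols_total, ols_ssr, flat_total, flat_ssr)
```

Both lines have residuals that total zero, but their sums of squared residuals differ sharply, so only the squared criterion tells them apart.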

## The coefficient of determination r2

How well does the sample regression line fit the data? We want to know what proportion of the variation in Y our model explains.

## This is given by the r-square statistic, the coefficient of determination

r2: Measures the goodness of fit

## The coefficient of determination r2

Ballentine view of r2

r2 = 0

r2 = 1
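The two extremes can be checked numerically. The sketch below uses the standard definition r2 = 1 - (residual sum of squares) / (total sum of squares); the data and the function name `r_squared` are hypothetical.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical observed prices and fitted values from a line
y = [30.0, 50.0, 80.0, 100.0]
y_fit = [29.0, 53.0, 77.0, 101.0]

r2_line = r_squared(y, y_fit)             # close to 1: the line explains most variation
r2_perfect = r_squared(y, y)              # fitted values equal observed values
r2_mean_only = r_squared(y, [65.0] * 4)   # predicting the mean for every observation

print(round(r2_line, 4), r2_perfect, r2_mean_only)
```

A perfect fit gives r2 = 1, while predicting the mean of Y for every observation gives r2 = 0.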

## Example for multivariate regression model

Sam wants to predict the sale of compact cassette tape recorders across stores using advertisement and price data, where:

Sales = number of units sold

Advertisement = number of times the product is advertised within the store

Price = price in dollars

Predict the sale of compact cassette tape recorders if advertisement = 7 and price = \$132.

## Least Squares Regression Model

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + u_i$$

## Least Squares Regression

$$\text{Sales} = b_0 + b_1(\text{Advertisement}) + b_2(\text{Price}) + \text{Error}$$

Coefficients

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
| --- | --- | --- | --- | --- | --- |
| (Constant) | 219.231 | 86.242 | | 2.542 | .085 |
| Number of Advertisement | 6.381 | 2.180 | .847 | 2.927 | .061 |
| Price in Dollars | -1.671 | .684 | -.706 | -2.441 | .092 |

## Least Squares Regression

Estimated Equation

## Sales = 219.231 + 6.381 (Advertisement) - 1.671 (Price)

Interpretation

Constant b0: when advertisement and price are both zero, average sales = 219.231.

Slopes:

b1: if the number of advertisements increases by 1, sales increase by about 6.4 units, holding price constant.

b2: if price increases by \$1, sales decrease by about 1.67 units, holding advertisement constant.

Prediction

Predict Sales when Advertisement = 7 and Price = \$132:

Sales = 219.231 + 6.381 x 7 - 1.671 x 132 = 219.231 + 44.667 - 220.572 = 43.326 units of sale
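The arithmetic can be checked by evaluating the estimated equation directly. The coefficients below are the values from the SPSS coefficients table; the function name `predict_sales` is only illustrative.

```python
# Estimated coefficients from the SPSS coefficients table
B0_HAT = 219.231    # constant
B_ADVERT = 6.381    # slope on number of advertisements
B_PRICE = -1.671    # slope on price in dollars


def predict_sales(advertisement, price):
    """Predicted units sold from the estimated regression equation."""
    return B0_HAT + B_ADVERT * advertisement + B_PRICE * price


sales = predict_sales(advertisement=7, price=132)
print(round(sales, 3))  # 43.326
```

The advertisement term contributes 6.381 x 7 = 44.667 and the price term subtracts 1.671 x 132 = 220.572, giving a prediction of about 43.3 units.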

R-Square

SPSS output

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .884 (a) | .782 | .637 | 16.108 |

a. Predictors: (Constant), Price in Dollars, Number of Advertisement

R-Square = 0.782 indicates that the model explains 78.2% of the variation in the Y variable.
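The Adjusted R Square in the SPSS output can be reproduced from R Square with the usual adjustment for sample size and number of predictors. In the sketch below, n = 6 is an assumption inferred from the total degrees of freedom (5) in the ANOVA output, and k = 2 is the number of predictors.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# r2 from the Model Summary; n = 6 is inferred, not stated in the slides
adj_r2 = adjusted_r_squared(r2=0.782, n=6, k=2)
print(round(adj_r2, 3))  # 0.637
```

The result agrees with the reported value of .637. The adjustment is large here because the sample is very small relative to the number of predictors.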

Hypothesis testing

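## Testing individual slope coefficients using t test

The test asks whether the sample estimate is consistent with a hypothesized population value $b_2^{*}$:

$$H_0: b_2 = b_2^{*} \qquad H_1: b_2 \neq b_2^{*}$$

For df = n - k and the chosen level of significance, read the critical value from the t table. Decision rule: if the calculated |t| exceeds the table value, reject H0.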

Hypothesis testing

Test whether each of the slope coefficients has any impact on the Y variable at a significance level of 0.05. For advertisement:

$$H_0: b_1 = 0 \qquad H_1: b_1 \neq 0$$

$$t = \frac{\hat{b}_1 - 0}{se(\hat{b}_1)} = \frac{6.381 - 0}{2.18} = 2.927$$

Significance level (SPSS output): 0.061. Since 0.061 > 0.05, do not reject H0: advertisement has no statistically significant impact on Sales at the 5% level.

## Individual t test: significance of slope coefficient

Coefficients

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
| --- | --- | --- | --- | --- | --- |
| (Constant) | 219.231 | 86.242 | | 2.542 | .085 |
| Number of Advertisement | 6.381 | 2.180 | .847 | 2.927 | .061 |
| Price in Dollars | -1.671 | .684 | -.706 | -2.441 | .092 |
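The t statistics in the table can be recomputed as estimate divided by standard error and compared against a critical value. The sketch below assumes n = 6 observations (inferred from the ANOVA total df of 5) and k = 3 estimated coefficients including the constant, so df = n - k = 3. The two-sided 5% critical value for 3 df, 3.182, is taken from a standard t table.

```python
# (estimate, standard error) pairs from the SPSS coefficients table
coefficients = {
    "Number of Advertisement": (6.381, 2.180),
    "Price in Dollars": (-1.671, 0.684),
}

# Two-sided 5% critical value of Student's t with 3 df (standard t table).
# df = 3 rests on the assumption n = 6, k = 3 described in the text.
T_CRITICAL = 3.182


def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


results = {}
for name, (estimate, std_error) in coefficients.items():
    t = t_statistic(estimate, std_error)
    reject_h0 = abs(t) > T_CRITICAL
    results[name] = (round(t, 3), reject_h0)
    print(name, round(t, 3), "reject H0" if reject_h0 else "do not reject H0")
```

Neither |t| exceeds 3.182, which agrees with both Sig. values being above 0.05. The recomputed t for price differs from the reported -2.441 in the third decimal because the table inputs are themselves rounded.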

## Hypothesis testing: The overall significance

Is the regression as a whole significant? Test whether at least one X variable has an impact on Y.

## H0: b1 = b2 = 0 (Y doesn't depend on X)

H1: at least one bi ≠ 0 (Y depends on at least one X)

Statistic used: the F statistic, given in the ANOVA table of the SPSS output. At a significance level of 0.05, if Sig < 0.05, then reject H0.

## Hypothesis testing : F test

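ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- |
| Regression | | 2 | 1396.212 | 5.381 | .102 (a) |
| Residual | | 3 | 259.470 | | |
| Total | | 5 | | | |

a. Predictors: (Constant), Price in Dollars, Number of Advertisement

b. Dependent Variable: Sales (units sold)

Sig = 0.102 > 0.05, hence do not reject H0. At the 5% level there is not enough evidence that Y depends on any of the X variables.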
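The F statistic is the ratio of the regression mean square to the residual mean square, and the same two numbers tie back to the Model Summary. The sketch below uses the mean squares and degrees of freedom from the ANOVA output; the variable names are illustrative, and the sums of squares are derived here as mean square times df rather than read from the slides.

```python
import math

# Values from the ANOVA output
ms_regression, df_regression = 1396.212, 2
ms_residual, df_residual = 259.470, 3
sig = 0.102
alpha = 0.05

# F = regression mean square / residual mean square
f_stat = ms_regression / ms_residual
reject_h0 = sig < alpha

# Consistency checks against the Model Summary (derived, not read from the slides)
std_error_estimate = math.sqrt(ms_residual)
ss_regression = ms_regression * df_regression
ss_residual = ms_residual * df_residual
r2 = ss_regression / (ss_regression + ss_residual)

print(round(f_stat, 3), reject_h0)                  # 5.381 False
print(round(std_error_estimate, 3), round(r2, 3))   # 16.108 0.782
```

The recomputed F, standard error of the estimate, and R-square match the reported 5.381, 16.108, and .782, so the three SPSS tables are mutually consistent.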