
MMGT6012

Business Tools for Management

TOPIC 5: Multiple Regression Modelling


Dr. Matthew Beck
ITLS, Business School

The University of Sydney Page 1


5. Multiple Regression Modelling

Assumptions of Regression

Linear relationship between X and Y

No multicollinearity
– Independent variables are not highly correlated with each other

Normality of Error
– Error values (ε) are normally distributed for any given value of X

Homoscedasticity
– The probability distribution of the errors has constant variance

Independence of Errors
– Error values are statistically independent


The Simple Linear Regression Function

In the population the regression model is:

Yi = β0 + β1Xi + εi

In the sample the regression model is:

Ŷi = b0 + b1Xi


[Figure: scatter of Y against X with the line Yi = β0 + β1Xi + εi, marking the intercept β0, the slope β1 and, for a given Xi, the observed value of Y, the predicted value of Y, and the random error εi between them]

The objective is to minimise the errors: the fitted line is the one with the smallest sum of squared errors.
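A minimal sketch of what minimising the errors means in practice, assuming the usual least-squares criterion (the sum of squared errors). The four data points and the function names fit_line and sum_squared_errors are made up for illustration and are not from the slides.

```python
def fit_line(x, y):
    """Return (b0, b1) minimising the sum of squared errors for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """Sum of squared gaps between observed y and the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

b0, b1 = fit_line(x, y)
best = sum_squared_errors(x, y, b0, b1)
# Shifting the fitted line up by 0.5 makes the total squared error larger
worse = sum_squared_errors(x, y, b0 + 0.5, b1)
print(f"slope={b1:.3f} best SSE={best:.3f} shifted SSE={worse:.3f}")
```

Any other intercept or slope gives a larger sum of squared errors than the fitted pair, which is the sense in which the line "minimises all errors".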

The Multiple Linear Regression Function

In the population the regression model (for k independent variables) is:

Yi = β0 + β1X1i + β2X2i + ⋯ + βkXki + εi

In the sample the regression model is:

Ŷi = b0 + b1X1i + b2X2i + ⋯ + bkXki


Two Variable Model

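Ŷ = b0 + b1X1 + b2X2

[Figure: the fitted regression plane drawn in three dimensions, with Y on the vertical axis and X1 and X2 on the other two axes]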

Multiple Regression Example

A local golf store wants to evaluate factors thought to influence demand for boxes of golf balls

Dependent Variable:
– Golf Ball Sales

Independent Variables:
– Price (in $)
– Advertising (in $100s)

Data are collected for 15 weeks

Multiple Regression Example

Sales = β0 + β1(Price) + β2(Advert)

Week  Sales  Price  Advert
1     450    33     3.3
2     560    45     3.3
3     450    48     3.0
4     530    48     4.5
5     450    41     3.0
6     480    45     4.0
7     530    27     3.0
8     570    38     3.7
9     550    42     3.5
10    590    30     4.0
11    440    43     3.5
12    400    47     3.2
13    540    35     4.0
14    550    30     3.5
15    400    42     2.7
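One of the assumptions listed at the start is that the independent variables are not correlated with each other. A quick, hedged way to look at this for the golf ball data is the pairwise Pearson correlation; the sketch below transcribes the 15 weeks from the table, and the helper name correlation is an illustrative choice rather than anything from the slides.

```python
from math import sqrt

# Data transcribed from the 15-week table (Advert is in $100s)
sales = [450, 560, 450, 530, 450, 480, 530, 570, 550, 590, 440, 400, 540, 550, 400]
price = [33, 45, 48, 48, 41, 45, 27, 38, 42, 30, 43, 47, 35, 30, 42]
advert = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_price_advert = correlation(price, advert)
r_sales_price = correlation(sales, price)
r_sales_advert = correlation(sales, advert)
print(f"Price vs Advert: {r_price_advert:.3f}")
print(f"Sales vs Price:  {r_sales_price:.3f}")
print(f"Sales vs Advert: {r_sales_advert:.3f}")
```

The correlation between Price and Advert is close to zero, so multicollinearity does not look like a concern for these two variables.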

Multiple Regression Example

Visualising the relationships

[Figure: two scatterplots, Sales vs. Price and Sales vs. Advertising]


Multiple Regression Example

How well does the model fit the data?

– R² reports the proportion of the variation in Y explained by the variation in the X variables
– 51.9% of the variation in sales is explained by variations in price and advertising


Multiple Regression Example

Are the coefficients significant?


– Can use either the sig value (< 0.05) or the absolute value of the t statistic (> 1.96)


Multiple Regression Example

Use the coefficients to construct the regression equation:


– Sales = 406.9 – 4.149(Price) + 73.773(Advert)
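The coefficients above come from the regression output. As a rough cross-check, the sketch below fits the same model to the 15 weeks of data by ordinary least squares in plain Python. The helper fit_ols is an illustrative name and solves the normal equations directly; it is not how SPSS computes its output, but it should land on the same coefficients and R².

```python
def fit_ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus columns."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(p):
            if r != c:
                a[r] = [vr - a[r][c] * vc for vr, vc in zip(a[r], a[c])]
    return [row[p] for row in a]


# Data transcribed from the 15-week table (Advert is in $100s)
sales = [450, 560, 450, 530, 450, 480, 530, 570, 550, 590, 440, 400, 540, 550, 400]
price = [33, 45, 48, 48, 41, 45, 27, 38, 42, 30, 43, 47, 35, 30, 42]
advert = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

b0, b1, b2 = fit_ols([price, advert], sales)

# R-square = 1 - (unexplained variation / total variation)
fitted = [b0 + b1 * p + b2 * a for p, a in zip(price, advert)]
mean_sales = sum(sales) / len(sales)
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))
sst = sum((y - mean_sales) ** 2 for y in sales)
r_squared = 1 - sse / sst

print(f"Sales = {b0:.1f} + ({b1:.3f})(Price) + ({b2:.3f})(Advert)")
print(f"R-square = {r_squared:.3f}")
```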


Multiple Regression Example

Sales = 406.9 – 4.149(Price) + 73.773(Advert)

– As price increases by 1 unit ($1), sales decrease by 4.1 units, holding advertising constant
– As advertising increases by 1 unit ($100), sales increase by 73.8 units, holding price constant


Multiple Regression Example

Use the regression model to predict sales:


– What if the price was set to $40 per box and advertising was $450?

Sales = 406.9 – 4.149(Price) + 73.773(Advert)

Sales = 406.9 – 4.149(40) + 73.773(4.5)

Sales = 572.9 boxes of golf balls

Note that advertising was initially measured in hundreds of dollars, so convert 450 to 4.5 “hundreds”
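The same prediction can be written as a small function, which makes the unit conversion for advertising explicit. This is a sketch using the coefficients reported on the slide; the function name predict_sales is illustrative.

```python
def predict_sales(price_dollars, advert_dollars):
    """Predicted weekly sales (boxes) from the fitted equation on the slide."""
    # The model measures advertising in hundreds of dollars
    advert_hundreds = advert_dollars / 100
    return 406.9 - 4.149 * price_dollars + 73.773 * advert_hundreds


prediction = predict_sales(price_dollars=40, advert_dollars=450)
print(round(prediction, 1))  # 572.9
```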


Class Activity

Within what range are you 95% confident that the actual sales of golf balls will fall?


Summarising What We Know

We know that we do the following for regression:


– Assess model fit
– Assess significance of coefficients
– Interpret model
– Make predictions


Adding to Our Knowledge

There are several issues with multiple regression modelling:


– R-square never decreases when a new X variable is added to the model
– Using multiple t-tests to individually test each coefficient one at a time
increases the chance of making a Type I error
– How do we compare the impacts of different independent variables?
– How do we add categorical variables to our regression model?
– How do we remove variables that are insignificant?


Adjusted R-square

R-square:
– Never decreases when a new X variable is added to the model

Adjusted R-Square:
– Calibrates the R² based on how many X variables we are using
– Does using an extra X (k = number of X’s) add any benefit?

R²adj = 1 – (1 – R²) × (n – 1) / (n – k – 1)
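As a worked instance of the formula, the sketch below plugs in the golf ball example (R² = 0.519, n = 15 weeks, k = 2 X variables), taking n to be the number of observations. The second call is a hypothetical case, not from the slides: a third X variable that lifts R² only slightly, to show how the adjusted value can fall even though R² rises.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-square for n observations and k X variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Golf ball example: R-square = 0.519 with 15 weeks and 2 X variables
adj_two_vars = adjusted_r_squared(0.519, n=15, k=2)

# Hypothetical: a third X variable that raises R-square only to 0.520
adj_three_vars = adjusted_r_squared(0.520, n=15, k=3)

print(f"k=2: {adj_two_vars:.3f}")
print(f"k=3: {adj_three_vars:.3f}")
```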


Adjusted R-square

Adjusted R-square is used for comparing model fit:


– Used to determine which model fits the data better
– Can only compare models with the same Y (dependent) variable

R-Square is used for interpreting model fit:


– Used to interpret how well the chosen model is performing


ANOVA (F test)

Using a t-test:
– Every time we use a t-test we allow for a 5% chance of a mistake
– The more times we do a test the more likely it is we will make a mistake

ANOVA (F test):
– Test of overall model significance
– H0: β1 = β2 = … = βk = 0 (no linear relationship)
– H1: at least one βi ≠ 0 (at least one Xi affects Y)
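Two short calculations may help here; both are sketches under stated assumptions. The first shows how quickly the chance of at least one mistake grows when several 5% tests are run, assuming the tests are independent. The second computes the overall F statistic from R² using the standard identity F = (R²/k) / ((1 – R²)/(n – k – 1)), applied to the golf ball example (R² = 0.519, n = 15, k = 2). The critical value 3.89 is the usual 5% table value for F with 2 and 12 degrees of freedom; the function names are illustrative.

```python
def familywise_error(alpha, tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** tests


def f_statistic(r_squared, n, k):
    """Overall F statistic, with k and n - k - 1 degrees of freedom."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# Five separate t-tests at the 5% level
chance_of_mistake = familywise_error(alpha=0.05, tests=5)

# Golf ball example
f_value = f_statistic(0.519, n=15, k=2)
F_CRITICAL_2_12 = 3.89  # 5% table value for F(2, 12)

print(f"Chance of at least one mistake in 5 tests: {chance_of_mistake:.3f}")
print(f"F = {f_value:.2f}, reject H0: {f_value > F_CRITICAL_2_12}")
```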


Standardised Betas

Slope Coefficients:
– Tell us the average change in Y for a one unit change in X
– They are a function of how X is measured!

Predicting Public Transport Use:


– Distance to work is in kilometres
– Age is in years
– Income is in dollars
– Number of people in family is in ???


Standardised Betas

Recall our discussion of the Normal distribution:


– We could standardise data to remove units of measurement

z = (x – μ) / σ

x = value of interest
μ = mean of the data
σ = standard deviation

We can do the same to the data in regression:


– We could convert columns of data to z-scores
– Use the z-scores as the X variables in the regression model
– These slope coefficients would have a standardised unit of measure
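A hedged sketch of the idea using the golf ball data. It assumes the common convention, used for the Beta column in SPSS output, of standardising Y as well as the X variables; under that convention a standardised beta equals the unstandardised slope multiplied by sd(X) / sd(Y). The slopes are the ones reported earlier in the slides, and the function name z_scores is illustrative.

```python
from statistics import mean, stdev

# Data transcribed from the 15-week table (Advert is in $100s)
sales = [450, 560, 450, 530, 450, 480, 530, 570, 550, 590, 440, 400, 540, 550, 400]
price = [33, 45, 48, 48, 41, 45, 27, 38, 42, 30, 43, 47, 35, 30, 42]
advert = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def z_scores(values):
    """Convert a column of data to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Unstandardised slopes reported on the slides
b_price, b_advert = -4.149, 73.773

# Standardised beta = slope * sd(X) / sd(Y)
beta_price = b_price * stdev(price) / stdev(sales)
beta_advert = b_advert * stdev(advert) / stdev(sales)

print(f"Standardised beta, Price:  {beta_price:.3f}")
print(f"Standardised beta, Advert: {beta_advert:.3f}")
print(f"First three Price z-scores: {[round(z, 2) for z in z_scores(price)[:3]]}")
```

Comparing absolute values, advertising has a somewhat larger relative impact on sales than price in this example, even though its raw coefficient is measured in very different units.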


Standardised Betas

Only absolute values matter:

Standardised Betas are used for comparison NOT interpretation


Dummy Coding

Adding continuous variables is easy:


– Coefficients tell you how Y changes as X changes by one unit
– Have numerical properties ($10 is twice as much as $5)

Can categorical variables change by one unit?


– Male = 1
– Female = 2
– Does being female mean you are twice a male?


Dummy Coding

Regression modelling relies on numerical properties:


– We need a method to separate the numerical properties from the numbers
that are used to represent different categories of outcomes

Dummy coding is the process of including categorical variables:


– You create a “dummy” variable for each category within a variable
– A numerical proxy for the presence or absence of a category

Typically:
– 0 = absence of category/characteristic
– 1 = presence of category/characteristic


Dummy Coding – Simple Example

Predicting income based on gender:


– Female = 0
– Male = 1

Coefficients (Dependent Variable: INCOME)

Model           Unstandardized B   Std. Error   Standardized Beta   t         Sig.
1 (Constant)    450.694            1.033                            436.194   .000
  GENDER        109.010            1.393        .999                78.243    .000


Class Activity

Income = 450.69 + 109.01(Gender)

– How much do you predict females (0) earn?

– How much do you predict males (1) earn?

– What is the difference in weekly income?


Dummy Coding – Visual Example

Does seasonality affect golf ball sales?

– Winter = 0
– Summer = 1

Ball Sales  Price  Season
275         36     0
338         42     1
420         30     1
254         31     0
373         47     1
418         34     1
199         38     0
243         45     0
259         30     0
403         40     1
343         41     1
242         33     0
191         39     0
217         45     0
351         34     1
364         32     1
283         33     0
330         47     1
324         43     1

Scatterplot ignoring Season:

[Figure: scatterplot of Sales vs. Price with all weeks pooled]


Dummy Coding – Visual Example

Looking at sales in Summer and Winter side-by-side:

[Figure: two scatterplots of Sales vs. Price, one for Winter and one for Summer]


Dummy Coding – Visual Example

Looking at sales in Summer and Winter on the one graph:

[Figure: scatterplot of Sales vs. Price with Winter and Summer weeks shown on the same axes]


Dummy Coding – Visual Example

The effect of price is the same no matter what season it is:


– Just in summer we sell a constant amount more

Y = b0 + b1X1 + b2X2

– X2 = 1 (summer) → b0 + b1X1 + b2(1) → b0 + b1X1 + b2 → (b0 + b2) + b1X1
– X2 = 0 (winter) → b0 + b1X1 + b2(0) → b0 + b1X1 + 0 → b0 + b1X1

Same slope (b1), different constant (b0 + b2 in summer, b0 in winter)

A dummy variable gives the average difference between two categories
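The same-slope, different-constant idea can be sketched on the seasonal golf ball data listed earlier (Winter = 0, Summer = 1). The helper fit_ols is an illustrative least-squares solver written for this sketch; its results have not been compared with the SPSS output, so treat the printed numbers as indicative only.

```python
def fit_ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus columns."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(p):
            if r != c:
                a[r] = [vr - a[r][c] * vc for vr, vc in zip(a[r], a[c])]
    return [row[p] for row in a]


# Data transcribed from the seasonal table (Season: 0 = Winter, 1 = Summer)
ball_sales = [275, 338, 420, 254, 373, 418, 199, 243, 259, 403,
              343, 242, 191, 217, 351, 364, 283, 330, 324]
price = [36, 42, 30, 31, 47, 34, 38, 45, 30, 40,
         41, 33, 39, 45, 34, 32, 33, 47, 43]
season = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1,
          1, 0, 0, 0, 1, 1, 0, 1, 1]

b0, b1, b2 = fit_ols([price, season], ball_sales)

# Same slope b1 in both seasons; the summer line sits b2 units higher
winter_at_40 = b0 + b1 * 40
summer_at_40 = (b0 + b2) + b1 * 40

print(f"Winter: Sales = {b0:.1f} + ({b1:.2f})(Price)")
print(f"Summer: Sales = {b0 + b2:.1f} + ({b1:.2f})(Price)")
print(f"Summer minus winter at any price: {summer_at_40 - winter_at_40:.1f}")
```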


Dummy Coding – Visual Example

Looking at regression output from SPSS:

What is the regression equation?


– On average, how many more golf balls do we sell in summer each week?


Dummy Coding – Multiple Categories

Thus far we have only looked at a variable with two categories:


– How many variables (X’s) were used in the regression equation?
– We compared the impact of one category (1) against a base (0)

If we have a variable with more than two categories:


– Create (k-1) new variables
– Where k is the # of categories

Dummy Coding – Multiple Categories

Sales  Colour  Colour Code
1623   White   2
1259   Yellow  1
68     Pink    3
110    Pink    3
346    Pink    3
1780   White   2
1335   Yellow  1
1861   White   2
1697   White   2
1870   White   2
83     Pink    3
323    Pink    3
1163   Yellow  1
1720   White   2
1090   Yellow  1
1716   White   2
1300   Yellow  1
1886   White   2
1548   White   2

[Figure: scatterplot of Sales against Colour Code with fitted trend line y = -563.36x + 2303.6, R² = 0.3865]
Sales  Colour  Colour Code  Yellow  White  Pink
1623 White 2
1259 Yellow 1
68 Pink 3
110 Pink 3
346 Pink 3
1780 White 2
1335 Yellow 1
1861 White 2
1697 White 2
1870 White 2
83 Pink 3
323 Pink 3
1163 Yellow 1
1720 White 2
1090 Yellow 1
1716 White 2
1300 Yellow 1
1886 White 2
1548 White 2


Dummy Coding – Multiple Categories

The variable with the zeros is referred to as the base variable:


– In the previous slide our base variable was the colour white
– I generally use the modal (most frequent) category as the base
– We compare the impact of each category against this base
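A sketch of the k – 1 rule for the colour data: three colours need two dummy variables, with White (the most frequent colour) as the base. It uses only the rows listed in the table, and the helper fit_ols is an illustrative solver, so the coefficients are not claimed to match the SPSS output. With dummies only, the constant is the average sales of the base colour and each coefficient is that colour's average difference from the base.

```python
def fit_ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus columns."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(p):
            if r != c:
                a[r] = [vr - a[r][c] * vc for vr, vc in zip(a[r], a[c])]
    return [row[p] for row in a]


# (Sales, Colour) rows transcribed from the table
data = [(1623, "White"), (1259, "Yellow"), (68, "Pink"), (110, "Pink"),
        (346, "Pink"), (1780, "White"), (1335, "Yellow"), (1861, "White"),
        (1697, "White"), (1870, "White"), (83, "Pink"), (323, "Pink"),
        (1163, "Yellow"), (1720, "White"), (1090, "Yellow"), (1716, "White"),
        (1300, "Yellow"), (1886, "White"), (1548, "White")]

sales = [s for s, _ in data]
# k - 1 = 2 dummies; White is the base, so it gets no column of its own
yellow = [1 if colour == "Yellow" else 0 for _, colour in data]
pink = [1 if colour == "Pink" else 0 for _, colour in data]

b0, b_yellow, b_pink = fit_ols([yellow, pink], sales)

print(f"Sales = {b0:.1f} + ({b_yellow:.1f})(Yellow) + ({b_pink:.1f})(Pink)")
```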

SPSS output for this regression model:


Class Activity

1. How well does the model now fit compared to previously?

2. What is the regression equation?

3. How do you interpret the regression coefficients?


Stepwise Regression

The process we use to remove insignificant variables

1. Perform regression and put all your X variables in the model

2. Identify the X variable with the biggest sig value

3. If its sig value is above 0.05, remove it and re-run the regression

4. Keep going one by one until only significant variables remain
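The steps above can be sketched as a loop. One assumption to flag: the Python standard library has no t distribution, so instead of sig values this sketch uses the |t| > 1.96 rule mentioned earlier in the slides; the variable with the biggest sig value is the one with the smallest |t|. The function names are illustrative, and the golf ball data is reused as the example.

```python
def invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                aug[r] = [vr - aug[r][c] * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def t_statistics(columns, y):
    """t statistic of each slope when y is regressed on an intercept plus columns."""
    n = len(y)
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)]
                      for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    mse = sse / (n - p)
    # Standard error of b[i] is sqrt(mse * i-th diagonal of (X'X)^-1)
    t = [b[i] / (mse * xtx_inv[i][i]) ** 0.5 for i in range(p)]
    return t[1:]  # drop the intercept


def backward_eliminate(data, y, t_crit=1.96):
    """Drop the weakest X one at a time until every remaining |t| >= t_crit."""
    kept = list(data)
    while kept:
        t = t_statistics([data[name] for name in kept], y)
        weakest = min(range(len(kept)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_crit:
            break
        del kept[weakest]  # remove it, then refit on the next pass
    return kept


sales = [450, 560, 450, 530, 450, 480, 530, 570, 550, 590, 440, 400, 540, 550, 400]
data = {
    "Price": [33, 45, 48, 48, 41, 45, 27, 38, 42, 30, 43, 47, 35, 30, 42],
    "Advert": [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7],
}
print(backward_eliminate(data, sales))
```

On the golf ball data both variables clear the 1.96 cut-off, so nothing is removed; with more candidate X variables the loop would drop them one at a time in the same way.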


Quick Summary of Topic 5

Measures of Model Performance:


– R-square: Percent of variation in Y explained by variations in the X’s
– Adjusted R-square: Used to compare performance of different models of Y
– ANOVA: Used to assess overall model significance

Coefficients:
– T-test: Used to determine if coefficients are significant or not
– Stepwise: Used to remove insignificant X’s one at a time
– Unstandardised: Used to create regression line and measure impact of X
– Standardised: Used to compare relative impacts of the different X
– Interpretation of slope coefficient for continuous vs. categorical X’s differs


Class Activity

1. Locate the "Class Data – Clean.xlsx" data file

2. In this data, identify a Y variable of interest:

– Conduct regression modelling on the Y variable

3. Create a PowerPoint presentation that shows:


– Your initial model and a list of what variables you removed
– Your final model, interpretation of model performance
– Interpretation of slope coefficients and what it tells you about the Y variable

4. Email to matthew.beck@sydney.edu.au
– GROUP NAME in subject line
