MMGT6012 - Topic 5 - Multiple Regression Modelling
Assumptions of Regression
No multicollinearity
– Independent variables are not highly correlated with each other
Normality of Error
– Error values (ε) are normally distributed for any given value of X
Homoscedasticity
– The probability distribution of the errors has constant variance
Independence of Errors
– Error values are statistically independent
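Figure: Simple linear regression, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, plotted as Y against X. The line has intercept $\beta_0$ and slope $\beta_1$; at a given $X_i$, the random error $\varepsilon_i$ is the gap between the observed value of Y and the predicted value of Y on the line.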
The University of Sydney Page 4
5. Multiple Regression Modelling
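Figure: The same regression diagram, annotated "Objective is to minimise all errors!" In practice, least squares chooses the intercept and slope that minimise the sum of squared errors, $\sum_i (Y_i - \hat{Y}_i)^2$, so that positive and negative errors cannot cancel each other out.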
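For a single X, the least-squares line has a closed form. The sketch below is plain Python; `ols_simple` and `sse` are hypothetical helper names, not course material, and the small data set in the usage line is made up purely for illustration. It computes b0 and b1 and lets you check that nudging either coefficient makes the sum of squared errors larger.

```python
def ols_simple(xs, ys):
    """Return (b0, b1) that minimise the sum of squared errors for one X.

    b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors for a candidate line Y-hat = b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative (made-up) data
xs, ys = [1, 2, 3, 4], [2, 4, 5, 8]
b0, b1 = ols_simple(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```

Any other intercept or slope, for example `sse(xs, ys, b0, b1 + 0.1)`, gives a larger total squared error than the least-squares pair.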
Figure: With two X variables the fitted model is a plane in (X1, X2, Y) space: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
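A minimal sketch of fitting that plane by least squares with NumPy, assuming the X's are stored column-wise in an array. `fit_multiple_regression` is a hypothetical helper name, and the demo data are invented so that the true coefficients are known in advance (no noise is added).

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Least-squares estimates [b0, b1, ..., bk] for Y-hat = b0 + b1*X1 + ... + bk*Xk.

    X is an (n, k) array of predictors; a column of ones is prepended so the
    first returned value is the intercept b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Made-up data generated exactly from Y = 5 + 2*X1 - 3*X2
X_demo = np.array([[1, 0], [0, 1], [2, 1], [3, 2], [1, 3]])
y_demo = 5 + 2 * X_demo[:, 0] - 3 * X_demo[:, 1]
print(fit_multiple_regression(X_demo, y_demo))
```

Because the demo data contain no random error, the printed coefficients should be very close to 5, 2 and -3.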
Dependent Variable:
– Golf Ball Sales
Independent Variables:
– Price (in $)
– Advertising (in $100)
Figure: Scatterplots of golf ball sales (0 to 600) against price (0 to 60) and against advertising (0 to 5, in $100)
As advertising increases by 1 unit ($100), sales increase by 73.8 units on average, holding price constant
Predicted sales = 572.9 boxes of golf balls
Note that advertising was measured in hundreds of dollars, so convert $450 to 4.5 “hundreds”
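The unit conversion is easy to forget when plugging values into the fitted equation. This is a sketch only: `predict_sales` is a hypothetical helper, and because the intercept and price coefficient of the golf ball model are not listed in these notes, they are left as parameters rather than filled in. Only the advertising slope (73.8 per $100) comes from the slides.

```python
def predict_sales(b0, b_price, b_adv, price, advertising_dollars):
    """Predicted golf ball sales from a fitted two-X model.

    Advertising was measured in hundreds of dollars, so a raw dollar amount
    is divided by 100 before use (e.g. $450 -> 4.5).
    b0 and b_price must come from the fitted model's output; they are not
    assumed here.
    """
    advertising_hundreds = advertising_dollars / 100
    return b0 + b_price * price + b_adv * advertising_hundreds
```

Passing 450 rather than 4.5 as advertising would inflate the advertising contribution a hundredfold, which is why the conversion is done inside the function.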
Class Activity
What range of sales can you be 95% confident will contain the actual sales of golf balls?
Adjusted R-square
R-square:
– Never decreases when a new X variable is added to the model
Adjusted R-Square:
– Adjusts R² for how many X variables we are using
– Asks: does using an extra X add any benefit? (k = number of X’s, n = number of observations)
$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$
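The adjusted R-square formula translates directly into code. This is a minimal sketch; `adjusted_r2` is a hypothetical helper name, with `n` the number of observations and `k` the number of X variables.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square: 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    r2: ordinary R-square, n: number of observations, k: number of X variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than X variables plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(adjusted_r2(0.80, 21, 2))  # about 0.778
print(adjusted_r2(0.80, 21, 5))  # about 0.733
```

Holding R² at 0.80 with n = 21, moving from 2 to 5 X variables lowers adjusted R² from about 0.778 to about 0.733: extra X's have to raise R² enough to justify themselves.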
ANOVA (F test)
Using a t-test:
– At a 5% significance level, every t-test carries a 5% chance of a mistake (rejecting a true H0)
– The more tests we run, the more likely it is that at least one of them is a mistake
ANOVA (F test):
– Test of overall model significance
– H0: β1 = β2 = … = βk = 0 (no linear relationship)
– H1: at least one βi ≠ 0 (at least one Xi affects Y)
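Both points can be made numeric. The sketch below uses hypothetical helper names. It computes the overall F statistic from R², using F = (R²/k) / ((1 − R²)/(n − k − 1)), which is compared against an F distribution with k and n − k − 1 degrees of freedom. It also computes the chance of at least one false positive across several tests, assuming the tests are independent (an assumption; t-tests on one model's coefficients generally are not).

```python
def overall_f_statistic(r2, n, k):
    """Overall F statistic for H0: beta_1 = ... = beta_k = 0.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), i.e. MSR / MSE.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def chance_of_at_least_one_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests tests rejects a true H0,
    assuming the tests are independent and each uses level alpha."""
    return 1 - (1 - alpha) ** num_tests


print(chance_of_at_least_one_false_positive(1))  # about 0.05
print(chance_of_at_least_one_false_positive(5))  # about 0.226
```

With five separate t-tests the chance of at least one mistake rises to roughly 23%, which is why a single overall F test is used first.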
Standardised Betas
Slope Coefficients:
– Tell us the average change in Y for a one-unit change in X (holding the other X’s constant)
– Their size depends on the units in which X is measured, so they cannot be compared directly across X’s!
Standardised Betas
$z = \frac{x - \mu}{\sigma}$
where x = value of interest, μ = mean of the data, σ = standard deviation
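A sketch of the z-score formula and of converting an unstandardised slope into a standardised beta, b × s_X / s_Y, which equals the slope obtained after converting X and Y to z-scores. The helper names are hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumed choice; the ratio s_X / s_Y is the same whether sample or population standard deviations are used, as long as both use the same one.

```python
import statistics


def z_scores(values):
    """Standardise values: z = (x - mu) / sigma (sample standard deviation)."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(x - mu) / sigma for x in values]


def standardised_beta(b, sd_x, sd_y):
    """Convert an unstandardised slope b (Y units per X unit) into a
    standardised beta (standard deviations of Y per standard deviation of X)."""
    return b * sd_x / sd_y


print(z_scores([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```

Because standardised betas are unit-free, they can be compared across X’s measured in different units (for example dollars versus hundreds of dollars).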
Dummy Coding
Typically:
– 0 = absence of category/characteristic
– 1 = presence of category/characteristic
Coefficients(a)
Model 1       B          Std. Error   Beta    t          Sig.
(Constant)    450.694    1.033                436.194    .000
GENDER        109.010    1.393        .999    78.243     .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficient)
a. Dependent Variable: INCOME
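With a 0/1 dummy, the fitted equation gives one predicted value per group, and the slope is simply the difference between the two group means. This sketch copies the coefficients from the SPSS output above; `predicted_income` is a hypothetical helper, and which gender was coded 1 is not stated in the output.

```python
# Coefficients copied from the SPSS output (dependent variable: INCOME)
B_CONSTANT = 450.694
B_GENDER = 109.010


def predicted_income(gender):
    """Predicted INCOME for a 0/1 dummy-coded GENDER."""
    if gender not in (0, 1):
        raise ValueError("GENDER must be dummy coded as 0 or 1")
    return B_CONSTANT + B_GENDER * gender


print(predicted_income(0))  # 450.694: the constant is the group coded 0
print(predicted_income(1))  # about 559.704: constant + 109.010
```

So for a categorical X the slope is read as "the group coded 1 earns 109.010 more on average than the group coded 0", not as a change per unit.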
Class Activity
– Winter = 0
– Summer = 1

Sales   X    Season
254     31   0
373     47   1
418     34   1
199     38   0
243     45   0
259     30   0

Figure: Scatterplot ignoring Season (Sales, 0 to 450, against X, 0 to 50)
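Adding Season as a dummy variable shifts the intercept without changing the slope, so the model becomes two parallel lines, one per season. This NumPy sketch uses only the six rows listed in the activity, reading the columns as Sales, the continuous X, and Season (0 = Winter, 1 = Summer). It is meant to show the mechanics, not the activity's official answer.

```python
import numpy as np

# Rows from the class activity: Sales, X, Season (0 = Winter, 1 = Summer)
data = np.array([
    [254, 31, 0],
    [373, 47, 1],
    [418, 34, 1],
    [199, 38, 0],
    [243, 45, 0],
    [259, 30, 0],
], dtype=float)
sales, x, season = data[:, 0], data[:, 1], data[:, 2]

# Least-squares fit of Sales-hat = b0 + b_x * X + b_season * Season
design = np.column_stack([np.ones(len(sales)), x, season])
(b0, b_x, b_season), *_ = np.linalg.lstsq(design, sales, rcond=None)

# The dummy shifts the intercept: two parallel lines sharing the slope b_x
print(f"Winter: Sales-hat = {b0:.1f} + {b_x:.2f} * X")
print(f"Summer: Sales-hat = {b0 + b_season:.1f} + {b_x:.2f} * X")
```

The Season coefficient is the average difference in sales between summer and winter at the same value of X.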
Stepwise Regression
3. Remove it!!
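The rule of removing insignificant X’s one at a time can be sketched as backward elimination: fit, drop the X with the smallest |t| if it is not significant, refit, and repeat. This NumPy version makes several assumptions. It uses |t| ≥ 2 as a rough stand-in for p < 0.05, although the exact cutoff depends on the degrees of freedom. `backward_eliminate` is a hypothetical helper, not SPSS’s stepwise procedure.

```python
import numpy as np


def backward_eliminate(X, y, names, t_crit=2.0):
    """Backward elimination: refit, drop the least significant X (smallest |t|)
    while it is below t_crit, and return the names of the X's that remain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        dof = len(y) - design.shape[1]
        s2 = resid @ resid / dof  # estimated error variance
        se = np.sqrt(s2 * np.diag(np.linalg.inv(design.T @ design)))
        t = np.abs(coef[1:] / se[1:])  # t statistics, skipping the constant
        weakest = int(np.argmin(t))
        if t[weakest] >= t_crit:
            break  # every remaining X is significant
        del keep[weakest]  # remove it, then refit on the next pass
    return [names[i] for i in keep]
```

Refitting after each removal matters: once one X is dropped, the remaining coefficients and their t statistics change.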
Coefficients:
– t-test: Used to determine whether coefficients are significant
– Stepwise: Used to remove insignificant X’s one at a time
– Unstandardised: Used to create regression line and measure impact of X
– Standardised: Used to compare the relative impacts of the different X’s
– Interpretation of slope coefficient for continuous vs. categorical X’s differs
Class Activity
4. Email to matthew.beck@sydney.edu.au
– GROUP NAME in subject line