
Multiple Regression
Background
Multiple Linear Regression is widely used in academia and in marketing research (MR)
We can consider it the start of Multivariate Analysis, for our course
Any idea what the following are:
◦ Multivariate analysis
◦ Multiple Linear Regression (MLR)
Different levels of an independent variable are
associated with corresponding changes in the
dependent variable
◦ What is an IV? What is a DV?
◦ The IV is denoted by X, while the DV is denoted by Y
◦ We can loosely say X causes Y
Any idea how regression works? The principle behind
it? On what scales are the IV and the DV measured?
◦ Assume one X, one Y
Normally, the IV and DV are continuous, not discrete
◦ Meaning?

In regression, a line is repeatedly fitted to the scatter-plot of X and Y


◦ The line of best fit is the regression line

Consider the following data


X Y
0 0
1 1
2 2
3 3
4 4
5 5
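For the perfectly linear data above, a least-squares fit recovers slope 1 and intercept 0 exactly. A minimal sketch using the closed-form formulas for simple regression (no library fitting routine):

```python
# Closed-form simple linear regression on the perfectly linear data above.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 1, 2, 3, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x  # intercept

print(b, a)  # 1.0 0.0 — the line Y = X fits with zero error
```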
Let us plot the points
Drawing a line of best fit is child’s play
◦ The association is perfectly linear
In real life, we rarely find data that are so perfect
◦ We may instead find data with points scattered around a line
Thus, the line of best fit is the regression line
◦ There is some error
◦ But the idea is to minimise this error; how is this done?
The least-squares criterion is used
Different lines are fitted, the errors squared, and the line with the smallest
sum of squared errors is finally chosen
◦ Hence MLR is sometimes called OLS, or Ordinary Least Squares

Why should one square the errors and then add? Why not just add up?
The idea is 2-fold
◦ Squaring stops +ve and –ve errors from cancelling each other out
◦ We penalise large errors
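Both points can be seen in a tiny sketch with hypothetical residuals: summing raw errors lets positives and negatives cancel, while squaring prevents cancellation and penalises large errors more heavily.

```python
# Hypothetical residuals (+ve and -ve) from some candidate line.
errors = [2.0, -2.0, 1.0, -1.0]

raw_sum = sum(errors)                      # 0.0 — looks like a perfect fit!
squared_sum = sum(e ** 2 for e in errors)  # 10.0 — reveals the misfit,
                                           # and the 2.0s count 4x the 1.0s
print(raw_sum, squared_sum)
```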

This is the 1-IV case; the idea is the same with n IVs (a plane or hyperplane is fitted instead of a line)


◦ Impossible to show on the board

Now let us consider some real data and perform a regression
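As a stand-in for the in-class demo (the original dataset is not reproduced here), a minimal two-IV regression sketch on made-up data, using NumPy's least-squares solver:

```python
import numpy as np

# Made-up data generated from y = 5 + 2*x1 + 3*x2 (no noise, for illustration).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 5 + 2 * x1 + 3 * x2

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: minimises the sum of squared errors.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # ≈ [5., 2., 3.]  (intercept, b1, b2)
```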


Some Considerations
Can also handle non-metric or categorical IVs e.g. gender influences
shopping time
◦ This is called dummy coding
◦ Basically dummy regression is the same as an ANOVA
◦ Both are forms of the General Linear Model
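The dummy-regression/ANOVA equivalence can be seen directly: with hypothetical shopping-time data and a 0/1 gender dummy, the slope on the dummy is exactly the difference of group means, which is what a one-way ANOVA (or t-test) compares.

```python
import numpy as np

# Hypothetical shopping times (minutes); gender dummy-coded 0 / 1.
gender = np.array([0.0, 0, 0, 1, 1, 1])
time = np.array([20.0, 24.0, 22.0, 30.0, 34.0, 32.0])

X = np.column_stack([np.ones_like(time), gender])
coef, *_ = np.linalg.lstsq(X, time, rcond=None)
intercept, b = coef

# intercept = mean of group 0 (22.0); b = difference of group means (10.0)
print(intercept, b)
```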

While MLR is useful, it has certain prerequisites and limitations


There should not be collinearity between the IVs
◦ This inflates standard errors and makes the estimates unstable
◦ First step is therefore to get the correlation matrix in
Excel/SPSS
◦ How to remove this collinearity?
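The slides suggest Excel/SPSS for the correlation matrix; a Python sketch on made-up IVs shows how near-collinearity shows up as a correlation close to ±1 (dropping or combining one of the offending IVs is a common remedy):

```python
import numpy as np

# Hypothetical IVs: x2 is almost a rescaled copy of x1 (collinear), x3 is not.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1 + np.array([0.0, 0.1, -0.1, 0.05, 0.0])
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Rows are variables; entry [i, j] is the correlation between variable i and j.
corr = np.corrcoef([x1, x2, x3])
print(corr.round(2))  # corr[0, 1] near 1.0 flags the x1/x2 collinearity
```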
One should also go into MLR with sufficient research on
likely relationships
◦ Else, may end up doing sample-specific data mining
◦ No guarantee about robustness of results
◦ The shot-gun approach should be avoided
◦ MR firms may not agree
There should not be heteroscedasticity (unequal error variance) in the DV
◦ 2 marks bonus for saying this orally in the final!
◦ This can be worked around by transforming the data using a log,
inverse, or square-root transform
Cannot handle non-linear relationships
◦ Consider the following data
X Y
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
SPSS will give you a decent regression but it misses the point
◦ Have to use polynomial regression, beyond scope
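The table above is exactly y = x², so a straight line systematically misses, while a degree-2 polynomial fit recovers the relationship. A minimal sketch (polynomial regression itself is beyond our scope):

```python
import numpy as np

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
y = x ** 2  # the data from the table above

linear = np.polyfit(x, y, 1)     # forced straight line: systematic misfit
quadratic = np.polyfit(x, y, 2)  # coefficients highest-degree first

print(quadratic.round(6))  # ≈ [1., 0., 0.], i.e. y = x²
```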

Must take great care to ensure all relevant IVs are put in, else one may reach
utterly erroneous conclusions e.g.
◦ regressing Sales on Ad while leaving out Price, SP
Ideally have some likely results in mind before going in for data
collection
◦ MR firms screw up here
◦ We academics score big here
◦ Why is this important?

If no working knowledge is available, use stepwise regression


◦ It will give you the order of importance
In exploratory research, it is OK to use it
◦ Not a big fan of stepwise
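A toy sketch of forward selection (one common flavour of stepwise) on made-up data: greedily add the IV that most improves R². Real packages additionally apply entry/exit significance tests; this is only to show how the "order of importance" emerges.

```python
import numpy as np

def r_squared(X, y):
    """R² of an OLS fit of y on X (X already includes the intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
n = 50
iv = {"x1": rng.normal(size=n), "x2": rng.normal(size=n), "x3": rng.normal(size=n)}
# Made-up DV: x2 matters most, x1 a little, x3 not at all.
y = 5 * iv["x2"] + 2 * iv["x1"] + rng.normal(scale=0.5, size=n)

# Greedy forward selection: repeatedly add the IV that most improves R².
selected, order = [np.ones(n)], []
for _ in range(len(iv)):
    best = max(iv.keys() - set(order),
               key=lambda name: r_squared(np.column_stack(selected + [iv[name]]), y))
    order.append(best)
    selected.append(iv[best])

print(order)  # x2 enters first (strongest), then x1, then x3
```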
Key Terms – A Review
Coefficient of Determination, R², gives the proportion of variation in Y
explained by X (or X1, X2 and so on)
◦ Also called variance explained
◦ Better would be adjusted R2

“b” is the unstandardised weight and β is the standardised weight


◦ Standardised weights are needed since different IVs may be measured in different units
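These terms can be computed by hand on made-up data with two IVs on very different scales; the standardised βs put them on a comparable footing.

```python
import numpy as np

# Made-up data: two IVs on very different scales, plus small noise.
x1 = np.array([1.0, 2, 3, 4, 5, 6])
x2 = np.array([100.0, 300, 200, 500, 400, 600])
y = 3 * x1 + 0.01 * x2 + np.array([0.1, -0.2, 0.0, 0.15, -0.05, 0.1])

X = np.column_stack([np.ones_like(y), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ coef
ss_res = resid @ resid
ss_tot = (y - y.mean()) @ (y - y.mean())
n, k = len(y), 2  # observations, number of IVs

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra IVs

# Standardised weights: beta = b * sd(x) / sd(y), unit-free and comparable.
beta1 = coef[1] * x1.std() / y.std()
beta2 = coef[2] * x2.std() / y.std()
print(round(r2, 3), round(adj_r2, 3), round(beta1, 2), round(beta2, 2))
```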
F-Value and t-value must be looked at too
Any doubts?
Do you want to learn how regression can handle
◦ Categorical data
◦ Interaction effects? What problems will come here?
◦ Need demo?
Summary
MLR is a very useful tool
◦ It has wide applications

But one must be careful to avoid violating fundamental assumptions, mainly multicollinearity
◦ Esp. in MR
