
Multiple Regression

Topic 5

Agenda

Background
Example with Real Data
Some Considerations
Key Terms
Summary

Background
Multiple Linear Regression is widely
used in academia and also in MR
We can consider it the start of
Multivariate Analysis, for our course
Any idea what the following are:
Multivariate analysis
Multiple Linear Regression (MLR)

Background
Multivariate analysis is hard to define well
Some say anytime you have more than 2
variables, it is multivariate
Some say that you need to have many
combinations of variables i.e. variates
Some say that you need to have multiple
dependent variables

For all practical purposes, the following can be
considered multivariate
MLR
Factor analysis

Background

Discriminant analysis
Cluster Analysis
Conjoint analysis
Canonical Correlation
Structural Equation Modeling

We shall consider just MLR, factor,
discriminant and cluster analyses
Linear regression involves finding a linear
relationship between an independent variable
and a dependent variable

Background
Different levels of an independent variable
are associated with corresponding changes
in the dependent variable
What is an IV? What is a DV?
IV is denoted by X, while DV is denoted Y
We can loosely say X causes Y

Any idea how regression works? The
principle behind it? On what scale is the IV
measured, and the DV?
Assume one X, one Y

Background
Normally, the IV & DV are continuous,
not discrete
Meaning?

In regression, a line is repeatedly
fitted in the scatter-plot of X and Y
The line of best fit is the regression line

Consider the following data

Background
[Table of X and Y values]

Background
Let us plot the points
Drawing a line of best fit is child's play
The association is perfectly linear

In real life, we rarely find data that are so
perfect
Instead, we may find data such as the following

Thus, the line of best fit is the regression line


There is some error
But the idea is to minimise this error; how is this
done?

Background
The least-squares criterion is followed
Different lines are fitted, the errors are
squared and summed, and the line with the
smallest sum of squared errors is chosen
Sometimes, MLR is called OLS or
Ordinary Least Squares regression
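
The idea of trying different lines and keeping the one with the smallest sum of squared errors can be sketched as below. This is only an illustration: the five data points, the grid of candidate lines and the helper name sse are assumptions made up for this sketch, and real software uses the closed-form solution rather than a grid.

```python
import numpy as np

# Hypothetical toy data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sse(intercept, slope):
    """Sum of squared errors of the line y = intercept + slope * x."""
    errors = y - (intercept + slope * x)
    return float(np.sum(errors ** 2))

# "Different lines are fitted": a grid of candidate intercepts and slopes
candidates = [(a, b)
              for a in np.linspace(1.0, 3.4, 13)   # intercepts 1.0, 1.2, ..., 3.4
              for b in np.linspace(0.0, 1.2, 13)]  # slopes 0.0, 0.1, ..., 1.2
best_intercept, best_slope = min(candidates, key=lambda ab: sse(*ab))

# Closed-form least-squares solution, for comparison with the grid search
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(best_intercept, best_slope, sse(best_intercept, best_slope))
print(intercept, slope, sse(intercept, slope))
```

Both routes land on the same line here because the grid happens to contain the exact solution; in general the grid only approximates it.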

Why should one square the errors
and then add? Why not just add up?

Background
The idea is 2-fold
Squaring stops +ve and -ve errors from cancelling out
We penalise large errors
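
A small sketch of both points, using made-up numbers (the data and the slopes tried are assumptions for illustration only). Any line passing through the point of means has raw errors that add up to zero, so the plain sum cannot tell a good line from a bad one, while the sum of squares can.

```python
import numpy as np

# Hypothetical toy data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slopes = [-1.0, 0.0, 0.6, 2.0]
raw_sums, squared_sums = [], []
for b in slopes:
    a = y.mean() - b * x.mean()          # force the line through (mean x, mean y)
    errors = y - (a + b * x)
    raw_sums.append(float(errors.sum()))             # +ve and -ve errors cancel
    squared_sums.append(float((errors ** 2).sum()))  # no cancelling after squaring

print(raw_sums)       # all zero, up to rounding
print(squared_sums)   # smallest for slope 0.6

# Same total absolute error, but one large error costs more once squared
small_errors = np.array([1.0, 1.0, 1.0, 1.0])
one_big_error = np.array([0.0, 0.0, 0.0, 4.0])
print((small_errors ** 2).sum(), (one_big_error ** 2).sum())
```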

This is the 1-IV case; it works similarly with n IVs
Impossible to show on the board
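
Although a fit with several IVs cannot be drawn, it can be computed. A minimal sketch with two IVs, assuming made-up data constructed with no error so that the true weights (1, 2 and 3) are known in advance; the variable names are illustrative.

```python
import numpy as np

# Hypothetical data: Y is built exactly as 1 + 2*X1 + 3*X2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per IV
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution; the same call works for any number of IVs
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # intercept, weight of X1, weight of X2
```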

Now let us consider some real data
and perform a regression

Some Considerations
MLR can also handle non-metric or
categorical IVs e.g. gender influences
shopping time
This is done through dummy coding
Basically dummy regression is the same
as an ANOVA
Both are forms of the General Linear
Model
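
A sketch of the gender and shopping time example with a 0/1 dummy, assuming six invented shopping times. The intercept comes out as the mean of the group coded 0, the weight as the difference between the two group means, and the regression F equals the one-way ANOVA F computed by hand.

```python
import numpy as np

# Hypothetical shopping times in minutes (invented for illustration)
time = np.array([30.0, 35.0, 40.0, 50.0, 55.0, 60.0])
female = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # dummy: 0 = male, 1 = female
n = len(time)

# Regression of time on the dummy
X = np.column_stack([np.ones(n), female])
coef, *_ = np.linalg.lstsq(X, time, rcond=None)
intercept, slope = coef
fitted = X @ coef
ss_res = np.sum((time - fitted) ** 2)
ss_reg = np.sum((fitted - time.mean()) ** 2)
f_reg = (ss_reg / 1) / (ss_res / (n - 2))

# One-way ANOVA on the same two groups, computed by hand
groups = [time[female == 0], time[female == 1]]
ss_between = sum(len(g) * (g.mean() - time.mean()) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
f_anova = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

print(intercept, slope)   # mean of males, female minus male difference
print(f_reg, f_anova)     # the two F values agree
```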

While MLR is useful, it has certain
prerequisites and limitations

Some Considerations
There should not be collinearity between
the IVs
Collinearity makes the estimates unstable and
inflates their standard errors
First step is therefore to get the correlation
matrix in Excel/SPSS
How to remove this collinearity?
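
The same check can be sketched outside Excel/SPSS. The three IVs below are invented for illustration: x2 is almost a copy of x1 and x3 is unrelated to x1. Besides the correlation matrix, the sketch computes variance inflation factors (VIFs) as the diagonal of the inverse correlation matrix; the VIF is an addition here, not something named on the slide, and the usual cut-off of about 10 is only a rule of thumb.

```python
import numpy as np

# Hypothetical IVs (invented for illustration)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])   # almost a copy of x1
x3 = np.array([3.0, 1.0, 2.0, 2.0, 1.0, 3.0])   # uncorrelated with x1

# Correlation matrix of the IVs (each row of the stacked array is one variable)
corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(np.round(corr, 3))

# VIF of each IV = corresponding diagonal element of the inverse correlation matrix
vif = np.diag(np.linalg.inv(corr))
print(np.round(vif, 2))   # very large for x1 and x2, close to 1 for x3
```

One common response, when two IVs are this strongly correlated, is to drop one of them or combine them into a single measure.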

One should also go into MLR with sufficient
research on likely relationships
Else, one may end up doing sample-specific data
mining
There is then no guarantee about the robustness of results

Some Considerations
The shot-gun approach should be avoided
MR firms may not agree

There should not be heteroscedasticity, i.e. the
spread of the errors in the DV should be constant
2 marks bonus for saying this orally in the final!
This can often be got around by transforming the data
using log, inverse or square root
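
Why a log transformation can help is sketched below with constructed numbers (an assumption for illustration, not real data): at each X there is a low and a high value of Y whose gap grows with X, which is the heteroscedastic pattern. After taking logs the gap is the same at every X.

```python
import numpy as np

# Hypothetical data: at each X, Y takes a low and a high value around exp(0.2 * X)
x = np.arange(1.0, 11.0)
y_low = np.exp(0.2 * x - 0.3)
y_high = np.exp(0.2 * x + 0.3)

# Spread of Y at each X on the raw scale: grows as X grows
spread_raw = y_high - y_low
print(np.round(spread_raw, 2))

# Spread of log(Y) at each X: constant, so the problem disappears
spread_log = np.log(y_high) - np.log(y_low)
print(np.round(spread_log, 2))
```

This works here because the errors were built to be multiplicative; with other error patterns an inverse or square-root transformation may suit better.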

MLR cannot directly handle non-linear relationships


Consider the following data

Some Considerations
X     Y
1     1
2     4
3     9
4     16
5     25
6     36
7     49
8     64

Some Considerations
SPSS will give you a decent-looking linear
regression, but it misses the point: Y is X squared
Have to use polynomial regression,
beyond scope
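
For the curious, a sketch using the X and Y values above. A straight line gives a high R² yet is the wrong model, while a second-degree polynomial recovers Y = X² exactly. numpy.polyfit is used here only as a convenient stand-in for what SPSS would do.

```python
import numpy as np

# The data from the slide: X = 1..8, Y = X squared
x = np.arange(1.0, 9.0)
y = x ** 2

# Straight-line fit: looks decent on R-squared alone
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
r2_linear = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(slope, intercept, round(r2_linear, 3))

# The residuals give the game away: +ve at the ends, -ve in the middle
print(np.round(y - fitted, 1))

# Second-degree polynomial fit: coefficients for X^2, X and the constant
quad_coefs = np.polyfit(x, y, 2)
print(np.round(quad_coefs, 6))
```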

Must take great care to ensure that all relevant
IVs are put in, else one may reach utterly
erroneous conclusions e.g.
regressing Sales on Ad, leaving out Price, SP
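
A sketch of how badly this can go, assuming invented figures in which Sales = 10 + 2 Ad - 5 Price exactly and Ad and Price tend to rise together. With Price left out, the weight of Ad even changes sign. The variable names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical data: advertising and price tend to rise together
ad = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
price = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
sales = 10.0 + 2.0 * ad - 5.0 * price   # true weights: +2 for Ad, -5 for Price

ones = np.ones_like(ad)

# Full model: Sales on Ad and Price
full_coef, *_ = np.linalg.lstsq(np.column_stack([ones, ad, price]), sales, rcond=None)

# Mis-specified model: Sales on Ad only, Price left out
short_coef, *_ = np.linalg.lstsq(np.column_stack([ones, ad]), sales, rcond=None)

print(np.round(full_coef, 3))    # recovers 10, 2, -5
print(np.round(short_coef, 3))   # the weight of Ad is now negative
```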

Some Considerations
Ideally have some likely results in
mind before going in for data
collection
MR firms screw up here
We academics score big here
Why is this important?

In case there is no working knowledge,
use stepwise regression
It will give you the order of importance

Some Considerations
In exploratory research, ok to use it
Not a big fan of stepwise
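
A bare-bones sketch of the forward flavour of stepwise regression, on simulated data (an assumption for illustration): at each step it adds the IV that raises R² the most. Real packages use F-to-enter and F-to-remove thresholds and can also drop variables; this sketch only produces an order of entry, and the helper name r_squared is made up.

```python
import numpy as np

# Simulated data: the first IV matters a lot, the second a little, the third not at all
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def r_squared(columns):
    """R-squared of the regression of y on the given columns of X (plus intercept)."""
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

# Forward selection: repeatedly add the IV that gives the largest R-squared
selected, remaining = [], [0, 1, 2]
while remaining:
    best = max(remaining, key=lambda j: r_squared(selected + [j]))
    selected.append(best)
    remaining.remove(best)
    print(selected, round(float(r_squared(selected)), 4))
```

The order is specific to this sample, which is exactly the reason for caution with stepwise methods.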

Key Terms: A Review

Coefficient of Determination, R²,
gives the proportion of variation in Y
explained by X (or X1, X2 and so on)
Also called variance explained
Adjusted R² is better, since it corrects for the number of IVs
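
How the two figures are computed, sketched on five invented data points with one IV. The adjusted version uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample size and p the number of IVs.

```python
import numpy as np

# Hypothetical toy data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n, p = len(y), 1   # sample size and number of IVs

# Fit the regression line by least squares
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# R-squared: share of the variation in Y explained by the regression
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Adjusted R-squared: penalises extra IVs, matters most in small samples
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(r2, 4), round(adj_r2, 4))
```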

b is the unstandardised weight and
β is the standardised weight
β is needed since different IVs may be
in different units
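
A sketch of the difference between the unstandardised weight b and the standardised weight (commonly written β), assuming two invented IVs on very different scales. The standardised weight is b multiplied by the standard deviation of the IV and divided by that of the DV, which is the same as running the regression on z-scores.

```python
import numpy as np

# Hypothetical IVs in very different units (invented for illustration)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([100.0, 300.0, 200.0, 600.0, 400.0, 500.0])
y = 10.0 + 2.0 * x1 + 0.01 * x2

def fit(columns, target):
    """Least-squares weights (intercept first) of target on the given columns."""
    design = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Unstandardised weights: depend on the units of each IV
b = fit([x1, x2], y)

# Standardised weights via the conversion formula
sd = lambda v: v.std(ddof=1)
beta = b[1:] * np.array([sd(x1), sd(x2)]) / sd(y)

# Same thing via a regression on z-scores
z = lambda v: (v - v.mean()) / sd(v)
beta_check = fit([z(x1), z(x2)], z(y))[1:]

print(np.round(b, 4))      # 2 versus 0.01: looks like X1 matters 200 times more
print(np.round(beta, 4))   # on a common scale, X1 matters only twice as much
```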

Key Terms: A Review

The F-value and t-values must be looked
at too
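
Where these two numbers come from, sketched on five invented data points with one IV, using the standard OLS formulas. The t-value of a weight is the weight divided by its standard error; the F-value compares explained to unexplained variation. With a single IV, F is simply the square of the t-value of the slope.

```python
import numpy as np

# Hypothetical toy data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n, p = len(y), 1   # sample size and number of IVs

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

ss_res = np.sum((y - fitted) ** 2)         # unexplained variation
ss_reg = np.sum((fitted - y.mean()) ** 2)  # explained variation

# t-values: each weight divided by its standard error
error_variance = ss_res / (n - p - 1)
std_errors = np.sqrt(np.diag(error_variance * np.linalg.inv(X.T @ X)))
t_values = coef / std_errors

# F-value for the regression as a whole
f_value = (ss_reg / p) / (ss_res / (n - p - 1))

print(np.round(t_values, 4))   # intercept, slope
print(round(float(f_value), 4))
```

Whether these values are significant still has to be judged against the t and F distributions with the matching degrees of freedom, which SPSS reports as p-values.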
Any doubts?
Do you want to learn how regression
can handle
Categorical data
Interaction effects? What problems will
come here?
Need demo?

Summary
MLR is a very useful tool
It has wide applications

But one must be careful to avoid violating
fundamental assumptions, mainly the absence of
multicollinearity
Esp. in MR