
# Multiple Regression

Topic 5

Agenda

Background
Example with Real Data
Some Considerations
Key Terms
Summary

Background
Multiple Linear Regression is widely
used in academia and also in market research (MR)
We can consider it the start of
Multivariate Analysis for our course
Any idea what the following are:
Multivariate analysis
Multiple Linear Regression (MLR)

Background
Multivariate analysis is hard to define well
Some say anytime you have more than 2
variables, it is multivariate
Some say that you need to have many
combinations of variables i.e. variates
Some say that you need to have multiple
dependent variables

## For all practical purposes, the following can be considered multivariate
MLR
Factor analysis

Discriminant analysis
Cluster Analysis
Conjoint analysis
Canonical Correlation
Structural Equation Modeling

## We shall consider just MLR, factor, discriminant and cluster analyses

Linear regression involves finding a linear
relationship between an independent variable
and a dependent variable

Background
Different levels of an independent variable
are associated with corresponding changes
in the dependent variable
What is an IV? What is a DV?
IV is denoted by X, while DV is denoted Y
We can loosely say X causes Y

## Any idea how regression works? The principle behind it?

On what scale is the IV? The DV?
Assume one X, one Y

Background
Normally, the IV & DV are continuous,
not discrete
Meaning?

## In regression, a line is repeatedly fitted in the scatter-plot of X and Y

The line of best fit is the regression line

## Consider the following data

Background
(Data table: X and Y values)

Background
Let us plot the points
Drawing a line of best fit is child's play
The association is perfectly linear

## In real life, we rarely find data that are so perfect

We may instead find data such as the following

## Thus, the line of best fit is the regression line

There is some error
But the idea is to minimise this error; how is this
done?

Background
The least-squares criterion is followed
Different lines are fitted, the errors
squared and summed, and the line with the
smallest sum of squared errors is chosen finally
Sometimes, MLR is called OLS, or
Ordinary Least Squares
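
The least-squares idea can be sketched in a few lines of Python. The data below are hypothetical, invented only for illustration, and the helper names `sum_sq_errors` and `least_squares_line` are not from the lecture. In practice no trial and error is needed, because the line with the smallest sum of squared errors has a closed-form solution.

```python
# Hypothetical data, invented for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_sq_errors(a, b, xs, ys):
    """Sum of squared errors for the candidate line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form intercept and slope that minimise the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a_hat, b_hat = least_squares_line(x, y)
best_sse = sum_sq_errors(a_hat, b_hat, x, y)

# A nearby candidate line, for comparison with the least-squares line.
other_sse = sum_sq_errors(a_hat + 0.5, b_hat - 0.1, x, y)

print(f"intercept={a_hat:.3f}, slope={b_hat:.3f}, SSE={best_sse:.4f}")
print(f"a nearby line has SSE={other_sse:.4f}")
```

Any other line tried against the same data gives a larger sum of squared errors than the fitted one.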

## Why should one square the errors?

Background
The idea is 2-fold
Squaring stops +ve and -ve errors from cancelling each other out
Squaring penalises large errors more heavily
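
A small sketch of both reasons, using hypothetical data that lie exactly on y = 2x. The flat line through the mean of Y is clearly a poor fit, yet its signed errors still add up to zero; only the squared errors tell the two lines apart.

```python
# Hypothetical data, invented for illustration; the points lie exactly on y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]


def errors(a, b):
    """Signed errors of the line y = a + b*x on the data above."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


good = errors(0.0, 2.0)  # the line the data actually follow
bad = errors(6.0, 0.0)   # a flat line through the mean of y

# Signed errors cancel for both lines, so their sum cannot rank them.
sum_good, sum_bad = sum(good), sum(bad)

# Squared errors do not cancel, and an error of 4 counts 16 while an error of 2 counts 4.
sse_good = sum(e ** 2 for e in good)
sse_bad = sum(e ** 2 for e in bad)

print(sum_good, sum_bad, sse_good, sse_bad)
```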

## This is the 1-IV case; it works similarly with n IVs

Impossible to show on the board
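
Although the n-IV case cannot be drawn, it is easy to compute. The sketch below assumes NumPy is available and uses hypothetical data in which Y is built as exactly 1 + 2·X1 + 3·X2, so the fitted weights can be checked against known values.

```python
import numpy as np

# Hypothetical data: y is constructed as exactly 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per IV.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of X @ coef ~ y.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coef

print(f"intercept={intercept:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

Adding another IV only means adding another column to `X`; the least-squares criterion itself is unchanged.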

## Now let us consider some real data and perform a regression

Some Considerations
MLR can also handle non-metric or
categorical IVs e.g. gender influences
shopping time
This is done through dummy coding
Basically dummy regression is the same
as an ANOVA
Both are forms of the General Linear
Model
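
A minimal sketch of dummy coding, assuming NumPy and a hypothetical set of shopping times. With one 0/1 dummy, the intercept equals the mean of the reference group and the slope equals the difference between the two group means, which is the comparison an ANOVA on the same data would make.

```python
import numpy as np

# Hypothetical shopping times in minutes, invented for illustration.
gender = ["F", "F", "F", "M", "M", "M"]
time = np.array([50.0, 60.0, 70.0, 30.0, 40.0, 50.0])

# Dummy coding: 1 for "F", 0 for "M" (the reference group).
is_female = np.array([1.0 if g == "F" else 0.0 for g in gender])

X = np.column_stack([np.ones_like(is_female), is_female])
(intercept, slope), *_ = np.linalg.lstsq(X, time, rcond=None)

mean_m = time[is_female == 0].mean()
mean_f = time[is_female == 1].mean()

print(f"intercept={intercept:.1f} (mean of M = {mean_m:.1f})")
print(f"slope={slope:.1f} (mean of F minus mean of M = {mean_f - mean_m:.1f})")
```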

## While MLR is useful, it has certain prerequisites and limitations

Some Considerations
There should not be collinearity between
the IVs
This makes the coefficient estimates unstable, with inflated standard errors
First step is therefore to get the correlation
matrix in Excel/SPSS
How to remove this collinearity?
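
The correlation-matrix check can also be sketched in Python instead of Excel/SPSS. The example assumes NumPy and hypothetical IVs in which `x2` is almost a copy of `x1`. It also computes the variance inflation factor (VIF), a diagnostic not mentioned on the slide; the helper `vif` is written here for illustration.

```python
import numpy as np

# Hypothetical IVs: x2 is almost a copy of x1, x3 is essentially unrelated to both.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = x1 + np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2])
x3 = np.array([2.0, 5.0, 1.0, 6.0, 6.0, 1.0, 5.0, 2.0])
ivs = np.column_stack([x1, x2, x3])

# Correlation matrix of the IVs (columns are variables).
corr = np.corrcoef(ivs, rowvar=False)


def vif(ivs, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other IVs."""
    target = ivs[:, j]
    others = np.delete(ivs, j, axis=1)
    X = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)


vifs = [vif(ivs, j) for j in range(ivs.shape[1])]

print(np.round(corr, 3))
print([round(v, 1) for v in vifs])
```

A commonly quoted rule of thumb treats a VIF above about 10 as a warning sign. One common remedy is to drop or combine one of the offending IVs.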

## One should also go into MLR with sufficient research on likely relationships

Else, may end up doing sample-specific data
mining
No guarantee about robustness of results

Some Considerations
The shot-gun approach should be avoided
MR firms may not agree

## There should not be heteroscedasticity

i.e. the spread of the DV around the regression line should stay constant across values of the IVs
2 marks bonus for saying this orally in the final!
This can be got around by transforming the data
using log, inverse, square root
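
A rough sketch of why a log transform can help, using hypothetical data with multiplicative errors. For simplicity it measures deviations from the known trend y = x rather than from a fitted line, which is an assumption of this example. On the raw scale the spread grows with X; on the log scale it is the same everywhere.

```python
import math

# Hypothetical data with multiplicative errors: y is x times 1.5 or 0.5, alternately.
x = [float(i) for i in range(1, 21)]
factor = [1.5 if i % 2 == 0 else 0.5 for i in range(20)]
y = [xi * f for xi, f in zip(x, factor)]


def spread(values):
    """Range (max minus min) as a crude measure of spread."""
    return max(values) - min(values)


# Raw scale: deviations from the trend y = x get larger as x grows.
raw_dev = [yi - xi for xi, yi in zip(x, y)]
raw_low, raw_high = spread(raw_dev[:10]), spread(raw_dev[10:])

# Log scale: log(y) - log(x) = log(factor), so the spread no longer depends on x.
log_dev = [math.log(yi) - math.log(xi) for xi, yi in zip(x, y)]
log_low, log_high = spread(log_dev[:10]), spread(log_dev[10:])

print(f"raw spread: low x = {raw_low:.2f}, high x = {raw_high:.2f}")
print(f"log spread: low x = {log_low:.3f}, high x = {log_high:.3f}")
```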

## Cannot handle non-linear relationships

Consider the following data

Some Considerations

| X | Y |
|---|---|
| 1 | 1 |
| 2 | 4 |
| 3 | 9 |
| 4 | 16 |
| 5 | 25 |
| 6 | 36 |
| 7 | 49 |
| 8 | 64 |

Some Considerations
SPSS will give you a decent
regression but it misses the point
Have to use polynomial regression,
beyond scope
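
For the curious, the point can be checked on the X and Y values given above (Y = X²). The sketch assumes NumPy; `r_squared` is a small helper written for this example. The straight line reaches a respectable R², yet the quadratic fit shows what the line misses.

```python
import numpy as np

x = np.arange(1, 9, dtype=float)  # X = 1..8, as in the data above
y = x ** 2                        # Y = 1, 4, 9, ..., 64


def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    ss_error = np.sum((y - fitted) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_error / ss_total


linear = np.polyfit(x, y, deg=1)     # [slope, intercept]
quadratic = np.polyfit(x, y, deg=2)  # [x^2 weight, x weight, intercept]

r2_linear = r_squared(y, np.polyval(linear, x))
r2_quadratic = r_squared(y, np.polyval(quadratic, x))

print(f"linear:    slope={linear[0]:.2f}, intercept={linear[1]:.2f}, R2={r2_linear:.3f}")
print(f"quadratic: x^2 weight={quadratic[0]:.2f}, R2={r2_quadratic:.3f}")
```

Polynomial regression is still linear regression in the technical sense: X² is simply added as an extra IV.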

## Must take great care in ensuring all relevant IVs are put in, else may reach utterly erroneous conclusions

e.g. regressing Sales on Ad, leaving out Price, SP
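
A sketch of what leaving out an IV can do, assuming NumPy and hypothetical data in which Sales is built as exactly 10 + 2·Ad − 3·Price, with Price tending to be lower when Ad is higher. The helper `ols` is written for this example.

```python
import numpy as np

# Hypothetical data: sales are constructed as exactly 10 + 2*ad - 3*price,
# and price tends to be lower when ad spend is higher.
ad = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
price = np.array([5.0, 3.0, 4.0, 2.0, 4.0, 1.0])
sales = 10.0 + 2.0 * ad - 3.0 * price


def ols(y, *ivs):
    """Least-squares coefficients: intercept first, then one per IV."""
    X = np.column_stack([np.ones(len(y)), *ivs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


full = ols(sales, ad, price)  # both IVs included
short = ols(sales, ad)        # price left out

print(f"Ad weight with Price in the model:  {full[1]:.3f}")
print(f"Ad weight with Price left out:      {short[1]:.3f}")
```

With Price omitted, the Ad weight absorbs part of the Price effect and overstates what advertising does on its own.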

Some Considerations
Ideally have some likely results in
mind before going in for data
collection
MR firms screw up here
Why is this important?

## In case no working knowledge is there, use stepwise regression

It will give you the order of importance of the IVs

Some Considerations
In exploratory research, ok to use it
Not a big fan of stepwise
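
To make the idea concrete, here is a simplified forward-selection sketch, assuming NumPy and hypothetical data. It adds IVs one at a time by the gain in R² and stops when the gain falls below an arbitrary threshold (`min_gain`, an assumption of this example). Statistical packages typically decide entry and removal with F-tests or p-values, so this is not a reproduction of their procedure.

```python
import numpy as np


def r_squared(y, ivs):
    """R^2 of the least-squares fit of y on the given list of IV columns."""
    X = np.column_stack([np.ones(len(y)), *ivs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def forward_selection(y, candidates, min_gain=0.01):
    """Add IVs one at a time, each time taking the largest gain in R^2."""
    chosen, current_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        gains = {
            name: r_squared(y, [candidates[c] for c in chosen] + [col]) - current_r2
            for name, col in remaining.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining IV adds enough explained variation
        chosen.append(best)
        current_r2 += gains[best]
        del remaining[best]
    return chosen, current_r2


# Hypothetical data: y depends strongly on x1, weakly on x2, and not at all on x3.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 5.0, 1.0, 6.0, 6.0, 1.0, 5.0, 2.0])
x3 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
y = 5.0 * x1 + x2

order, final_r2 = forward_selection(y, {"x1": x1, "x2": x2, "x3": x3})
print(order, round(final_r2, 4))
```

The order of entry is what gets read as the "order of importance", and it can change from sample to sample, which is one reason to treat stepwise results with caution.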

## Key Terms: A Review

Coefficient of Determination, R²,
gives the proportion of variation in Y
explained by X (or X1, X2 and so on)
Also called variance explained
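
As a worked sketch with hypothetical data, R² can be computed as one minus the ratio of unexplained variation to total variation in Y.

```python
# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for one IV.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)               # total variation in Y
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variation left unexplained
r2 = 1.0 - ss_error / ss_total

print(f"R2 = {r2:.2f}")
```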

## b is the unstandardised weight and β is the standardised weight

β is needed since different IVs may be
in different units
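
A sketch of the link between the two weights, assuming NumPy and hypothetical data with IVs on very different scales. Each standardised weight is the unstandardised weight multiplied by sd(X)/sd(Y), which is the same as running the regression on z-scores. The helpers `ols_slopes` and `zscore` are written for this example.

```python
import numpy as np

# Hypothetical data: price in currency units (large numbers), rating on a 1-5 scale.
price = np.array([100.0, 150.0, 200.0, 250.0, 300.0, 350.0])
rating = np.array([3.0, 5.0, 2.0, 4.0, 1.0, 5.0])
sales = np.array([52.0, 60.0, 38.0, 44.0, 22.0, 40.0])


def ols_slopes(y, ivs):
    """Least-squares weights of the IVs (intercept fitted but not returned)."""
    X = np.column_stack([np.ones(len(y)), ivs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


def zscore(a):
    """Standardise to mean 0 and standard deviation 1, column by column."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


ivs = np.column_stack([price, rating])

# Unstandardised weights: in units of sales per unit of each IV.
b = ols_slopes(sales, ivs)

# Standardised weights: unit-free, so they can be compared across IVs.
beta = b * ivs.std(axis=0, ddof=1) / sales.std(ddof=1)

# Same result from regressing standardised sales on standardised IVs.
beta_check = ols_slopes(zscore(sales), zscore(ivs))

print("b   :", np.round(b, 4))
print("beta:", np.round(beta, 4))
```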

## Key Terms: A Review

F-Value and t-value must be looked
at too
Any doubts?
Do you want to learn how regression
can handle
Categorical data
Interaction effects? What problems will
come here?
Need demo?
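
As a short demo of the F-value and t-values, the sketch below assumes NumPy and hypothetical data with two IVs. It uses the standard OLS formulas; turning the values into p-values would need the F and t distributions, which are left out here.

```python
import numpy as np

# Hypothetical data, invented for illustration.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 5.0, 1.0, 6.0, 6.0, 1.0, 5.0, 2.0])
y = np.array([5.3, 9.8, 8.1, 14.6, 17.2, 14.3, 19.9, 18.8])

n, k = len(y), 2  # number of observations, number of IVs
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

ss_error = resid @ resid
ss_total = np.sum((y - y.mean()) ** 2)
df_error = n - k - 1

# F-value: explained variation per IV over unexplained variation per residual df.
f_value = ((ss_total - ss_error) / k) / (ss_error / df_error)

# t-value of each coefficient: the estimate divided by its standard error.
cov_coef = (ss_error / df_error) * np.linalg.inv(X.T @ X)
t_values = coef / np.sqrt(np.diag(cov_coef))

print(f"F = {f_value:.1f}")
print("t (intercept, x1, x2) =", np.round(t_values, 2))
```

The F-value tests the model as a whole, while each t-value tests a single weight.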

Summary
MLR is a very useful tool
It has wide applications

## But must be careful to avoid violating fundamental assumptions, mainly the absence of multicollinearity

Esp. in MR