
Simple Regression Analysis

Basically, we are looking to find how a one-unit change in some variable is going to affect another variable.

Y = a + bX
X is the independent variable
Y is the dependent variable
a and b are constants
Bivariate Regression
[Scatterplot: second-year sales (Y), $0 to $800, plotted against first-year sales (X), $0 to $500]
Bivariate Regression
Now, let's suppose that we'd like to make a prediction of second-year sales in dollars, based on first-year sales in dollars.
We can develop a regression equation: the equation for the best-fitting line in the constant-dollar coordinate system.
(The line is developed by minimizing the squared deviations, yielding an optimal line. Sounds tricky, but look at the picture and you can see how this works.)
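That "best-fitting line" can be sketched in a few lines of Python. The slide's underlying data points are not given, so the numbers below are made up for illustration; `np.polyfit` with degree 1 performs exactly this least-squares fit and happens to land near the slide's line with these values:

```python
import numpy as np

# Hypothetical first-year (X) and second-year (Y) sales, in dollars;
# the slide's actual data points are not shown, so these are made up.
x = np.array([100.0, 150.0, 200.0, 300.0, 400.0, 500.0])
y = np.array([240.0, 290.0, 370.0, 500.0, 620.0, 760.0])

# Degree-1 polyfit finds the least-squares line: it minimizes the
# sum of squared vertical deviations of the points from the line.
b, a = np.polyfit(x, y, 1)   # returns [slope, intercept]
print(f"Y = {a:.2f} + {b:.2f}X")
```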
Bivariate Regression
[Scatterplot with the fitted line Y = $103 + $1.31X: second-year sales (Y) against first-year sales (X)]
Bivariate Regression
Now, let us suppose that we would like to estimate second-year sales for someone with $325 in first-year sales.
What would you estimate those sales to be, if
Y = $103 + $1.31X ?
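The prediction is just a matter of plugging $325 into the fitted equation:

```python
# Predicted second-year sales for $325 of first-year sales,
# using the slide's fitted equation Y-hat = $103 + $1.31 X.
a, b = 103.0, 1.31
x = 325.0
y_hat = a + b * x
print(f"${y_hat:.2f}")   # $528.75
```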
Bivariate Regression
a is also called the intercept term, and b is called the slope with respect to X.
What does the intercept tell us? What does the slope tell us?
The point of any regression algorithm is to solve for those coefficients.
Bivariate Regression
But, how does it do that?
The most common form of regression is
called least squares or ordinary least
squares regression.
It tries to minimize the sum of squared
errors of prediction.
Bivariate Regression
An error of prediction is the difference
between what someone actually scored on Y,
and what the equation would predict they
would score, based on his or her X value.
For person i: ei = Yi − Ŷi
Bivariate Regression
[Scatterplot with the fitted line: second-year sales (Y) against first-year sales (X)]
Bivariate Regression
[Scatterplot with the fitted line; the vertical gap between a data point and the line is labeled "error, e"]
Bivariate Regression
Least-squares regression minimizes the sum of those squared errors for all people in the data set:
minimize: Σi ei²
by its choice of a and b.
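In the bivariate case that minimization has a well-known closed-form solution. A minimal sketch with hypothetical data (the slide's data set is not reproduced):

```python
import numpy as np

# Hypothetical (X, Y) pairs; the slide's data are not given.
x = np.array([100.0, 150.0, 200.0, 300.0, 400.0, 500.0])
y = np.array([240.0, 290.0, 370.0, 500.0, 620.0, 760.0])

# Closed-form least-squares solution: the (a, b) that minimize
# sum(e_i^2), where e_i = y_i - (a + b*x_i).
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

e = y - (a + b * x)      # errors of prediction (residuals)
sse = np.sum(e ** 2)     # the quantity being minimized
```

A useful by-product of this solution: the residuals always sum to zero when an intercept is included.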
Bivariate Regression

So, we have an index of how well the equation does at minimizing that sum of squared errors: r², called "r-squared."
It tells us how much of the variance our model (bivariate in this case) accounts for.
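r² can be computed as one minus the ratio of unexplained to total variation. A sketch, continuing with the same hypothetical data as above:

```python
import numpy as np

# Hypothetical data and the least-squares fit (see earlier sketch).
x = np.array([100.0, 150.0, 200.0, 300.0, 400.0, 500.0])
y = np.array([240.0, 290.0, 370.0, 500.0, 620.0, 760.0])
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

e = y - (a + b * x)
sse = np.sum(e ** 2)                  # unexplained variation
sst = np.sum((y - y.mean()) ** 2)     # total variation in Y
r_squared = 1 - sse / sst             # proportion of variance accounted for
```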
Multiple Regression
When we have multiple predictor variables,
we simply extend the form of the bivariate
regression equation:

Y = a + b1X1 + b2X2 + b3X3 + …
Multiple Regression
Ŷ = a + b1X1 + b2X2 + b3X3 + …
As before, the goal of the multiple regression algorithm is to find the values of a and the b's that minimize the sum of squared errors.
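With several predictors the same minimization is solved as a linear least-squares problem. A sketch with hypothetical data (two predictors; a column of ones is added so the intercept a is estimated too):

```python
import numpy as np

# Hypothetical data: two predictors and an outcome.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
y  = np.array([3.1, 7.0, 7.2, 11.1, 11.0])

# Design matrix [1, X1, X2]; lstsq minimizes the sum of squared errors.
A = np.column_stack([np.ones_like(X1), X1, X2])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coefs   # intercept and the two slopes
```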
Multiple Regression
Ŷ = a + b1X1 + b2X2 + b3X3 + …
So, let's take an example.
Suppose we are trying to predict how much people will buy from an online toy store (Y), based on their demographic characteristics.
Multiple Regression
We have these independent variables (Xs):
X1: Estimated family income ($/year)
X2: Grandparent status (1=yes, 0=no)
X3: Head of household's years of education (0-20)
Multiple Regression: the data

Name     Y: sales    X1: income   X2: grand   X3: educ
J.R.E.   $1,484.92   $590,000     0           16
C.A.D.   $22.94      $23,918      1           12
W.J.C.   $662.61     $350,000     0           19
H.R.C.   $3.98       $44,000      0           19
N.A.N.   $29.04      $18,650      1           13
Multiple Regression
And, let's suppose our equation is solved:
Ŷ = $10 + $0.001X1 + $90.00X2 + $0.10X3
X1: Estimated family income ($/year)
X2: Grandparent status (1=yes, 0=no)
X3: Head of household's years of education (6-20)
Multiple Regression
So, what are predicted sales for this household?
Ŷ = $10 + $0.001X1 + $90.00X2 + $0.10X3
X1: annual income of $50,000/year
X2: not grandparents
X3: 16 years of education (bachelor's degree)
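Working through the arithmetic for this household:

```python
# Plugging the household's values into the solved equation
# Y-hat = $10 + $0.001*X1 + $90.00*X2 + $0.10*X3
x1 = 50_000   # annual family income
x2 = 0        # not grandparents
x3 = 16       # years of education
y_hat = 10 + 0.001 * x1 + 90.00 * x2 + 0.10 * x3
print(f"${y_hat:.2f}")   # $61.60
```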
Multiple Regression
How good is this equation? Is it worth
using?
That is a matter for R², our measure of goodness-of-fit to the data:
0 ≤ R² ≤ 1.00,
with 0.0 indicating no fit and 1.0 indicating perfect fit.
The fit means how much of the variance is being accounted for by the equation.
Multiple Regression
Statistical tests:
There is a significance test of the null hypothesis that multiple R² is 0.0 in the population.
If rejected, the equation can be used.
If not rejected, then the equation should not be used (why?).
Just remember: this test JUST tells us whether the equation works OVERALL, not whether everything in it contributes.
Multiple Regression
Statistical tests:
There is also a t-test of the null hypothesis that a b value is 0.0 in the population. (We'll see this in two slides.)
If the test finds the b to be different from zero, that b contributes; if not, it doesn't.
Multiple Regression Using SPSS
1. Click ANALYZE
2. Select REGRESSION
3. Click LINEAR
4. Move the outcome to DEPENDENT Box
5. Move predictors to INDEPENDENT(S) box
6. Click OK

(This process list is just for reference; no need to memorize.)

Copyright Houghton Mifflin
Company. All rights reserved.
Multiple Regression Using SPSS (Cont'd)
[SPSS screenshots, shown just as a reference]
Multiple Regression: Overall

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .934a   .873       .858                2.23
a. Predictors: (Constant), Number of Competing Detergents, Advertising Expenditures for Bright ($ in 100)

ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   580.373          2    290.187       58.293   .000a
  Residual     84.627           17   4.978
  Total        665.000          19
a. Predictors: (Constant), Number of Competing Detergents, Advertising Expenditures for Bright ($ in 100)
b. Dependent Variable: Dollar Sales of Bright ($ in Thousands)
Very important: Sig must ALWAYS be < .05 for us to believe the model.
Multiple Regression: Components

Coefficients(a)
Model                                   B       Std. Error   Beta    t        Sig.
1 (Constant)                            8.854   6.717                1.318    .205
  Advertising Expenditures
  for Bright ($ in 100)                 .808    .324         .619    2.496    .023
  Number of Competing Detergents        -.498   .376         -.328   -1.324   .203
a. Dependent Variable: Dollar Sales of Bright ($ in Thousands)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)

If we want to know the relative contribution of each predictor, we use the standardized form, Beta.
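The standardized Beta for a predictor is its unstandardized b rescaled by the standard deviations: Beta = b × sd(X) / sd(Y), which is the same number you get by regressing z-scores on z-scores. A sketch with hypothetical bivariate data:

```python
import numpy as np

# Hypothetical data; none of these values come from the SPSS output above.
rng = np.random.default_rng(1)
x = rng.normal(0, 3, 100)
y = 2.0 * x + rng.normal(0, 1, 100)

# Unstandardized slope b, then its standardized version.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta = b * x.std() / y.std()

# Same value from regressing z-scored Y on z-scored X.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
beta_z = np.sum(zx * zy) / np.sum(zx ** 2)
```

Because Betas are on a common scale, they can be compared across predictors the way the raw b's (in mixed units like dollars and years) cannot.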
A Live Example
