
MMGT6012

Business Tools for Management

TOPIC 4: Introduction to Regression


Dr. Matthew Beck
ITLS, Business School

The University of Sydney Page 1


4. Introduction to Regression

Class Activity

Graph the data in the table:

Y    X
4    7
3    5
3    4
1    2
15   8
1    2
3    4
1    2
2    3
3    5

Draw what you think is the straight line that fits best

Using this line:

– What do you predict Y will be if X = 4?
– How close is this prediction to the actual value?


Class Activity

Graph the data in the table:

Y    X
21   1
20   2
22   3
24   4
26   5
28   6
23   7
22   8
22   9
18   10
19   11
20   12

Draw what you think is the straight line that fits best:

– What do you predict Y will be if X = 5?
– How close is this prediction to the actual value?

Now draw any line that best fits the data:

– What do you now predict Y will be if X = 5?
– How close is this prediction to the actual value?


Class Activity

Finally, graph the data in the table:

Y    X
12   5
18   8
12   5
10   4
12   5
14   6
16   7
2    0
8    3
14   6

Draw what you think is the straight line that fits best:

– How well does the line fit the data?

Can you work out the equation of this line?


What is Linear Regression?

Process of fitting a straight (linear) line that best fits the data

Estimating the equation for that straight line:


– Constant (what does Y equal when X equals zero)
– Slope (how much does Y go up or down as X changes)
– Some error (our line is not perfect)

How did you try to fit the line?

– What mental process did you use to fit your straight lines?

[Figure: scatter plot of Y against X with the fitted straight line. At a given Xi, the observed value of Y and the predicted value of Y (on the line) differ by the random error εi for that Xi value. The line has intercept β0 and slope β1.]

Yi = β0 + β1Xi + εi

Objective is to minimise all errors!
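The slides do not say how the individual errors are combined into a single quantity to minimise. A common choice, assumed here, is ordinary least squares, which minimises the sum of squared errors. Under that assumption, and writing b0 and b1 for the estimated intercept and slope, the best-fitting line works out as:

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2
\qquad\Longrightarrow\qquad
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```

Squaring stops positive and negative errors from cancelling each other out and penalises large misses more heavily than small ones.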

Assumptions of Regression

Linear relationship between X and Y

No multicollinearity
– Independent variables are not correlated with each other

Normality of Error
– Error values (ε) are normally distributed for any given value of X

Homoscedasticity
– The probability distribution of the errors has constant variance

Independence of Errors
– Error values are statistically independent


Clean Your Data!

Y    X
4    7
3    5
3    4
1    2
15   8
1    2
3    4
1    2
2    3
3    5

[Figure: scatter plot of Y (0 to 16) against X (0 to 10) with fitted line y = 1.5859x - 3.0606]


Clean Your Data!

Y    X
4    7
3    5
3    4
1    2
5    8
1    2
3    4
1    2
2    3
3    5

[Figure: scatter plot of Y (0 to 16) against X (0 to 10) with fitted line y = 0.6263x - 0.0303]
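The effect of the single value 15 versus 5 can be checked directly. The sketch below is an illustration only: it assumes the trend lines on the two charts were fitted by ordinary least squares (the slides do not name the method), and `fit_line` is a helper name introduced here, not something from the course material.

```python
# Data from the two "Clean Your Data!" slides: same X values, two versions of Y.
x = [7, 5, 4, 2, 8, 2, 4, 2, 3, 5]
y_raw = [4, 3, 3, 1, 15, 1, 3, 1, 2, 3]    # fifth value recorded as 15
y_clean = [4, 3, 3, 1, 5, 1, 3, 1, 2, 3]   # fifth value recorded as 5


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


raw_intercept, raw_slope = fit_line(x, y_raw)
clean_intercept, clean_slope = fit_line(x, y_clean)

print(f"with 15: y = {raw_slope:.4f}x + ({raw_intercept:.4f})")
print(f"with 5:  y = {clean_slope:.4f}x + ({clean_intercept:.4f})")
```

One changed value out of ten moves the slope from roughly 1.59 to roughly 0.63, which is why the data should be checked before any line is fitted.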


Simple Linear Regression

Only one independent variable, X

Relationship between X and Y is described by a linear function

Changes in Y are assumed to be caused by changes in X


The Linear Regression Function

In the population the regression model is:

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖

– 𝑌𝑖 = dependent variable
– 𝛽0 = constant
– 𝛽1 = slope coefficient
– 𝑋𝑖 = independent variable
– 𝜀𝑖 = error term

𝛽0 + 𝛽1 𝑋𝑖 is the linear component; 𝜀𝑖 is the random component


The Linear Regression Function

In the sample the regression model is:

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖

– 𝑌𝑖 = estimated Y-value
– 𝛽0 = estimated constant
– 𝛽1 = estimated coefficient
– 𝑋𝑖 = observed X-value

The model is linear in the parameters


The Linear Regression Function

Constant:
– The average value of Y when X is equal to zero

Slope coefficient:
– The average change in Y for a one unit change in X

Error:
– The difference between the observed Y and the predicted Y
– Also called the residual
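As a worked illustration of the error (residual), take the line y = 0.6263x - 0.0303 from the cleaned data on the "Clean Your Data!" slide and the observation X = 7, Y = 4 from the same table. The choice of this particular observation is ours, purely for illustration.

```latex
\hat{Y} = -0.0303 + 0.6263 \times 7 = 4.3538
\qquad
e = Y - \hat{Y} = 4 - 4.3538 = -0.3538
```

The residual is negative, so this observation sits slightly below the fitted line.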


Simple Regression - Example

Some facts about the property market:


– Currently ~9 million homes in Australia
– Average size of 245 m² (largest in the world)
– Average number of residents = 2.6 (down from 3.1 in 1976)
– Average price in 2014 of $571,500
– Prices rose in Sydney by 12.2% in the year
– Accounts for $1.5 trillion of household debt


Simple Regression - Example


Thinking about buying a home:
– Relationship between house price and house size

Is a house fairly valued or not?
– Buy for yourself or buy as an investment

A random sample of 10 houses is selected:
– Dependent variable (Y) = house price in $1000s
– Independent variable (X) = house size in square metres

Price   Size
735     130
936     149
837     158
924     174
597     102
657     144
1215    218
972     223
957     132
765     158


Simple Regression - Example

1. Is the data clean?


2. What might the relationship look like?
3. How consistent might the relationship be?
[Figure: scatter plot of House Price ($'000), axis 0 to 1400, against house size, axis 0 to 250]


Simple Regression - Example

1. Where is the straight line of best fit?


2. What is the linear equation for this straight line?

[Figure: scatter plot of House Price ($'000), axis 0 to 1400, against house size, axis 0 to 250, with the fitted line y = 3.6665x + 277.26. The slope is 3.6665 and the constant is 277.26.]
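The line on the chart can be reproduced from the ten sampled houses. The sketch below is one way to do it in Python, assuming the line was fitted by ordinary least squares (the slides do not name the method); all variable names are ours.

```python
# Sample of 10 houses from the slides: price in $1000s, size in square metres.
price = [735, 936, 837, 924, 597, 657, 1215, 972, 957, 765]
size = [130, 149, 158, 174, 102, 144, 218, 223, 132, 158]

n = len(size)
mean_size = sum(size) / n
mean_price = sum(price) / n

# Assumption: ordinary least squares, so slope = Sxy / Sxx.
sxy = sum((x - mean_size) * (y - mean_price) for x, y in zip(size, price))
sxx = sum((x - mean_size) ** 2 for x in size)

slope = sxy / sxx                            # average change in price per extra square metre
constant = mean_price - slope * mean_size    # value of the line at size = 0

print(f"Price = {constant:.2f} + {slope:.2f}(Size)")
```

Rounded to two decimals, the result should agree with the constant of 277.26 and the slope of 3.67 used on the following slides.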


Simple Regression - Example

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖

𝑌 = 277.26 + 3.67𝑋

𝑃𝑟𝑖𝑐𝑒 = 277.26 + 3.67(𝑆𝑖𝑧𝑒)

The constant is the average value of Y when X = 0


– Can a house ever have a size of 0?
– So what does the constant actually tell us here?


Simple Regression - Example

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖

𝑌 = 277.26 + 3.67𝑋

𝑃𝑟𝑖𝑐𝑒 = 277.26 + 3.67(𝑆𝑖𝑧𝑒)

The slope is the average change in Y when X changes by 1 unit


– As SIZE goes up by one unit (one square metre)
– House PRICE rises by 3.67 units ($3670)
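To see the slope at work, compare the predicted price of two houses that differ by one square metre. The sizes 150 and 151 are hypothetical values chosen for illustration; both fall inside the range of sizes in the sample.

```latex
\begin{aligned}
\text{Price}(150) &= 277.26 + 3.67 \times 150 = 827.76 \\
\text{Price}(151) &= 277.26 + 3.67 \times 151 = 831.43 \\
\text{Price}(151) - \text{Price}(150) &= 3.67 \quad (\text{i.e. } \$3670)
\end{aligned}
```

The difference equals the slope whichever pair of sizes one unit apart is chosen, because the model is a straight line.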


Model Performance

We estimate average impacts of X on Y:


– We may be interested in knowing how good these averages are
– How well we can explain changes in Y by the different values of X


Model Performance

Coefficient of Determination:
– Also called the R-Square (R2) value

The proportion of the total variation in the dependent variable that is explained by variation in the independent variable

0 ≤ R2 ≤ 1


Simple Regression - Example

R2 = 0.59
Y = 277.26 + 3.67X

[Figure: scatter plot of House Price ($'000), axis 0 to 1400, against house size, axis 0 to 250, with the fitted line]
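The R2 value for the house data can be recomputed as one minus the share of variation left unexplained by the line. This is a sketch under the assumption of an ordinary least squares fit; the variable names are ours. It also computes the correlation r between size and price, whose square equals R2 when there is a single independent variable.

```python
# Sample of 10 houses from the slides: price in $1000s, size in square metres.
price = [735, 936, 837, 924, 597, 657, 1215, 972, 957, 765]
size = [130, 149, 158, 174, 102, 144, 218, 223, 132, 158]

n = len(size)
mean_size = sum(size) / n
mean_price = sum(price) / n

sxy = sum((x - mean_size) * (y - mean_price) for x, y in zip(size, price))
sxx = sum((x - mean_size) ** 2 for x in size)
slope = sxy / sxx
constant = mean_price - slope * mean_size

# Unexplained variation: squared gaps between observed and predicted prices.
sse = sum((y - (constant + slope * x)) ** 2 for x, y in zip(size, price))
# Total variation: squared gaps between observed prices and their mean.
sst = sum((y - mean_price) ** 2 for y in price)

r_squared = 1 - sse / sst
r = sxy / (sxx * sst) ** 0.5   # Pearson correlation between size and price

print(f"R2 = {r_squared:.2f}, r = {r:.2f}")
```

An R2 of about 0.59 says that a little under 60% of the variation in price across these ten houses is accounted for by size.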


R2 and the Correlation Statistic (r)

[Figure: four scatter plots of Y against X, each with a fitted line Ŷi = b0 + b1Xi]

– R2 = 1, r = +1
– R2 = 1, r = -1
– R2 = .8, r = +0.9
– R2 = 0, r = 0


Measuring Error

The standard error of the regression line:


– Represents the average error around the regression line
– How wrong the regression model is on average (in the units Y is measured in)
– Smaller values are better because they mean the data are closer to the line

[Figure: two scatter plots of Y against X]

Measuring Error

The standard error is relative to the units of Y:


– The size of the error is relative to the size of the Y variable

We assume the error term is normal:


– 95% of observations are within ± 2 standard deviations of the mean!!

We can not only predict Y:

– We can also give a range of Y where we are 95% confident Y is in that range
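The idea can be sketched with the house data. The code assumes an ordinary least squares fit and the usual definition of the standard error of the regression (square root of the sum of squared residuals divided by n - 2); neither formula is given on the slides. The house size of 150 square metres is a hypothetical value chosen for illustration.

```python
import math

# Sample of 10 houses from the slides: price in $1000s, size in square metres.
price = [735, 936, 837, 924, 597, 657, 1215, 972, 957, 765]
size = [130, 149, 158, 174, 102, 144, 218, 223, 132, 158]

n = len(size)
mean_size = sum(size) / n
mean_price = sum(price) / n
sxy = sum((x - mean_size) * (y - mean_price) for x, y in zip(size, price))
sxx = sum((x - mean_size) ** 2 for x in size)
slope = sxy / sxx
constant = mean_price - slope * mean_size

# Standard error of the regression, in the units of Y ($1000s).
residuals = [y - (constant + slope * x) for x, y in zip(size, price)]
std_error = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

new_size = 150  # hypothetical house size, for illustration only
predicted = constant + slope * new_size
low = predicted - 2 * std_error    # rule of thumb from the slide: +/- 2 standard errors
high = predicted + 2 * std_error

print(f"standard error = {std_error:.1f}")
print(f"predicted price = {predicted:.1f}, approximate 95% range = {low:.1f} to {high:.1f}")
```

This follows the slide's plus or minus 2 rule of thumb. A formal prediction interval would be somewhat wider, because it also allows for uncertainty in the estimated line and uses a t value suited to the small sample.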


Class Activity

1. Predict the price for the above home:


– What range of prices are you 95% confident the true price will be in?

2. Is this apartment one you would invest in given your model?


Issues with Prediction

Can only use regression to predict like things:


– Are the houses in our data set representative of the house we are trying to
predict a value of Y (price) for?
– Can only predict Y within the range of the X values we have

[Figure: scatter plot of House Price ($'000), axis 0 to 1400, against house size, axis 0 to 250, annotated "Cannot predict values for large houses or small houses"]
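One simple way to respect this limit in practice is to refuse to predict outside the sizes that were actually sampled. The function below is a hypothetical helper written for illustration; it uses the rounded equation Price = 277.26 + 3.67(Size) from the earlier slides and the sizes of the ten sampled houses.

```python
# Sizes (square metres) of the 10 sampled houses from the slides.
size = [130, 149, 158, 174, 102, 144, 218, 223, 132, 158]

CONSTANT = 277.26   # constant from the fitted equation on the slides
SLOPE = 3.67        # slope from the fitted equation on the slides


def predict_price(size_m2):
    """Return the predicted price in $1000s, or None when size_m2 lies
    outside the range of sizes in the sample (hypothetical helper)."""
    if not min(size) <= size_m2 <= max(size):
        return None  # extrapolation: the sample says nothing about such houses
    return CONSTANT + SLOPE * size_m2


print(predict_price(150))   # inside the sampled range of 102 to 223
print(predict_price(400))   # outside the sampled range
```

Returning None rather than a number forces whoever uses the model to notice that the question falls outside what the data can support.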

Measuring Error and Model Performance

[Figures: five scatter plots of Y against X (X from 0 to 500), each with its fitted line; one plot per row below]

Fitted line             R2       Standard Error
y = 0.5014x + 9.6693    0.9994   0.000759380428589896
y = 0.5005x + 9.5696    0.9529   0.007062134927559
y = 0.4658x + 15.539    0.3818   0.03764192536044
y = 0.6328x - 19.306    0.0806   0.1357083240931
y = 0.522x + 30.018     0.0085   0.3538647424814

As we have more error in our model it gets harder to fit a line

What do you think it means if the best fitting line has NO SLOPE?
– Think about what a flat line tells you about Y as X goes up or down…

The slope is the average impact of X on Y:


– If that average has NO SLOPE what impact does X have on Y?


Testing the Impact of X on Y

If only there was some way of testing an average against a fixed value…

Testing the Impact of X on Y

One Sample t Test:


– Test an average against a fixed value

We want to test the average impact of X against zero impact:

– H0: β1 = 0
– H1: β1 ≠ 0

If the slope is equal to zero (i.e., β1 = 0) then:

– Y = β0 + β1X1

Becomes:
– Y = β0


Testing the Impact of X on Y

One Sample t Test:


– The formula is actually quite simple!

t = (b1 - β1) / Sb1,   d.f. = n - 2

– b1 = regression slope coefficient
– β1 = hypothesized slope (i.e. 0)
– Sb1 = standard error of the slope coefficient

Calculate t and compare it to the critical value of 1.96 (the 5% significance level for a large sample)
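The formula can be applied to the house data as a sketch. The code assumes an ordinary least squares fit and the usual formula for the standard error of the slope (standard error of the regression divided by the square root of Sxx); that formula is not given on the slides, and the variable names are ours.

```python
import math

# Sample of 10 houses from the slides: price in $1000s, size in square metres.
price = [735, 936, 837, 924, 597, 657, 1215, 972, 957, 765]
size = [130, 149, 158, 174, 102, 144, 218, 223, 132, 158]

n = len(size)
mean_size = sum(size) / n
mean_price = sum(price) / n
sxy = sum((x - mean_size) * (y - mean_price) for x, y in zip(size, price))
sxx = sum((x - mean_size) ** 2 for x in size)
slope = sxy / sxx
constant = mean_price - slope * mean_size

# Standard error of the regression, then standard error of the slope (Sb1).
sse = sum((y - (constant + slope * x)) ** 2 for x, y in zip(size, price))
std_error = math.sqrt(sse / (n - 2))
se_slope = std_error / math.sqrt(sxx)

hypothesised_slope = 0                       # H0: the slope is zero
t_stat = (slope - hypothesised_slope) / se_slope
df = n - 2

print(f"b1 = {slope:.4f}, Sb1 = {se_slope:.4f}, t = {t_stat:.3f}, d.f. = {df}")
```

The resulting t is well above 1.96. With only 8 degrees of freedom a t table gives a slightly larger two-tailed 5% critical value (about 2.31), and t exceeds that as well, so the conclusion does not depend on which of the two is used.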


Testing the Impact of X on Y

[Figures: five scatter plots of Y against X (X from 0 to 500), each with its fitted line; one plot per row below]

Fitted line             R2       Standard Error          t
y = 0.5014x + 9.6693    0.9994   0.000759380428589896    660.263
y = 0.5005x + 9.5696    0.9529   0.007062134927559       70.867
y = 0.4658x + 15.539    0.3818   0.03764192536044        12.375
y = 0.6328x - 19.306    0.0806   0.1357083240931         4.663
y = 0.522x + 30.018     0.0085   0.3538647424814         1.622

Compare each t to the critical value of 1.96

Class Activity

Does house size have a significant impact on house price?
