Class Activity
Draw what you think is the straight line that fits best.
Using this line: [rest of the prompt not recovered]
[Figures: three versions of the activity scatter plot. Data values surviving from the slides: 3 5, 3 4, 1 2, 15 8, 22 3, 18 8]
Process of fitting a straight (linear) line that best fits the data
Yi = β0 + β1Xi + εi
[Figure: regression line of Y against X. At a given Xi, the observed value of Y differs from the predicted value of Y on the line by the random error εi for this Xi value. The line has intercept β0 and slope β1.]
The University of Sydney Page 7
4. Introduction to Regression
The objective is to minimise the errors (εi) between the observed and predicted values of Y across all observations!
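The slides do not name the criterion used to "minimise" the errors. A common choice, and the one the fitted equations later in these slides appear consistent with, is ordinary least squares, which minimises the sum of squared errors. Under that assumption, and writing b0 and b1 for the estimates of β0 and β1, the objective and its solution can be sketched as:

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2
\qquad\Longrightarrow\qquad
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```

Squaring stops positive and negative errors from cancelling each other out, and it penalises large errors more heavily than small ones.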
Assumptions of Regression
No multicollinearity
– Independent variables are not correlated with each other
Normality of Error
– Error values (ε) are normally distributed for any given value of X
Homoscedasticity
– The probability distribution of the errors has constant variance
Independence of Errors
– Error values are statistically independent
Y X
4 7
3 5
3 4
1 2
15 8
1 2
3 4
1 2
2 3
3 5
Fitted line: y = 1.5859x - 3.0606
[Figure: scatter plot of the data with the fitted line; X axis 0 to 10, Y axis 0 to 16]
Y X
4 7
3 5
3 4
1 2
5 8
1 2
3 4
1 2
2 3
3 5
Fitted line: y = 0.6263x - 0.0303
[Figure: scatter plot of the data with the fitted line; X axis 0 to 10, Y axis 0 to 16]
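The two tables above differ in a single observation (Y = 15 versus Y = 5 at X = 8), yet the fitted lines are very different. The following is a minimal sketch, assuming the lines were fitted by ordinary least squares; the function name `fit_line` and the variable names are illustrative and not from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit with one predictor: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared X deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# X and Y columns copied from the two class-activity tables
X = [7, 5, 4, 2, 8, 2, 4, 2, 3, 5]
Y_first = [4, 3, 3, 1, 15, 1, 3, 1, 2, 3]   # fifth observation is Y = 15
Y_second = [4, 3, 3, 1, 5, 1, 3, 1, 2, 3]   # same data with Y = 5

b0_first, b1_first = fit_line(X, Y_first)
b0_second, b1_second = fit_line(X, Y_second)

print(f"first table:  slope = {b1_first:.4f}, intercept = {b0_first:.4f}")
print(f"second table: slope = {b1_second:.4f}, intercept = {b0_second:.4f}")
```

Under this assumption the computed coefficients agree with the equations shown on the slides to four decimal places, which illustrates how strongly one unusual observation can pull a least-squares line.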
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖
– Linear component: 𝛽0 + 𝛽1 𝑋𝑖
– Random component: 𝜀𝑖
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 (linear in the parameters)
Constant:
– The average value of Y when X is equal to zero
Slope coefficient:
– The average change in Y for a one unit change in X
Error:
– The difference between the observed Y and the predicted Y
– Also called the residual
[Figure: scatter plot of Y against X; X axis 0 to 250, Y axis 0 to 1200]
[Figure: scatter plot with the fitted line Y = 3.66665x + 277.26; the constant is labelled on the plot]
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖
𝑌 = 277.26 + 3.67𝑋
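To make the roles of the constant, the slope coefficient, and the error concrete, here is a small sketch that uses the fitted equation Y = 277.26 + 3.67X from the slide. The observation (X = 100, Y = 700) is hypothetical and chosen only for illustration; the slides do not list the underlying data.

```python
B0 = 277.26  # constant: average Y when X is zero (from the slide)
B1 = 3.67    # slope: average change in Y per one-unit change in X (from the slide)


def predict(x):
    """Predicted Y on the fitted line for a given X."""
    return B0 + B1 * x


# Hypothetical observation, not taken from the slides
x_obs, y_obs = 100, 700.0

y_hat = predict(x_obs)
residual = y_obs - y_hat  # error: observed Y minus predicted Y

print(f"predicted Y at X = {x_obs}: {y_hat:.2f}")
print(f"residual: {residual:.2f}")
print(f"change in predicted Y for one extra unit of X: {predict(x_obs + 1) - y_hat:.2f}")
```

A positive residual means the observation lies above the line, and a negative one means it lies below.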
Model Performance
Coefficient of Determination:
– Also called the R-Square (R²) value
– 0 ≤ R² ≤ 1
R² = 0.59
Y = 277.26 + 3.67X
[Figure: scatter plot with the fitted line; X axis 0 to 250, Y axis 0 to 1200]
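The data behind R² = 0.59 is not included in the slides, so the sketch below uses the two class-activity tables instead. It assumes the usual definition R² = 1 − SSE/SST, where SSE is the sum of squared errors around the fitted line and SST is the sum of squared deviations of Y from its mean. The function name `r_squared` is illustrative.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line with one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Unexplained variation: squared errors around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations of Y from its mean
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# X and Y columns copied from the two class-activity tables
X = [7, 5, 4, 2, 8, 2, 4, 2, 3, 5]
Y_first = [4, 3, 3, 1, 15, 1, 3, 1, 2, 3]
Y_second = [4, 3, 3, 1, 5, 1, 3, 1, 2, 3]

r2_first = r_squared(X, Y_first)
r2_second = r_squared(X, Y_second)
print(f"R-squared, first table:  {r2_first:.3f}")
print(f"R-squared, second table: {r2_second:.3f}")
```

An R² close to 1 means the line accounts for most of the variation in Y, while a value close to 0 means it accounts for very little.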
[Figure: four scatter plots, each with the fitted line Ŷi = b0 + b1Xi]
– R² = 1, r = +1
– R² = 1, r = -1
– R² = 0.8, r = +0.9
– R² = 0, r = 0
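With a single X variable, R² equals the square of the correlation coefficient r, and r takes the sign of the slope. The sketch below illustrates three of the cases above with small made-up data sets; the values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, not from the slides
x = [1, 2, 3, 4]
cases = {
    "perfect rising line": [2, 4, 6, 8],
    "perfect falling line": [8, 6, 4, 2],
    "no linear trend": [1, 2, 2, 1],
}

results = {}
for label, y in cases.items():
    r = pearson_r(x, y)
    results[label] = (r, r ** 2)  # with one X variable, R-squared is r squared
    print(f"{label}: r = {r:+.2f}, R-squared = {r ** 2:.2f}")
```

Because squaring removes the sign, R² alone cannot tell you whether the relationship is rising or falling; the sign of r or of the slope does that.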
Measuring Error
[Figure: two plots of Y against X]
Class Activity
[Figures: scatter plots with fitted lines, X axis 0 to 500, ordered from the tightest fit to the loosest]
– y = 0.5014x + 9.6693, R² = 0.9994
– y = 0.5005x + 9.5696, R² = 0.9529
– y = 0.4658x + 15.539, R² = 0.3818
– y = 0.522x + 30.018, R² = 0.0085
[One further plot, with a Y axis from -600 to 1000, has no surviving equation or R² value]
What do you think it means if the best fitting line has NO SLOPE?
– Think about what a flat line tells you about Y as X goes up or down…
With β1 = 0, Yi = β0 + β1Xi becomes:
– Y = β0
t = (b1 − β1) / Sb1
d.f. = n − 2
where:
– b1 = regression slope coefficient
– β1 = hypothesized slope (i.e. 0)
– Sb1 = standard error of the slope coefficient
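The slope test can be sketched in a few lines. The slides do not show how Sb1 is calculated; the sketch assumes the usual textbook formula Sb1 = sqrt(MSE / Σ(Xi − X̄)²) with MSE = SSE / (n − 2), and applies it to the first class-activity table. The function name `slope_t_test` is illustrative.

```python
import math


def slope_t_test(xs, ys, hypothesized_slope=0.0):
    """Return (t statistic, degrees of freedom) for the slope of a fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    # Assumed formula for the standard error of the slope coefficient
    s_b1 = math.sqrt((sse / df) / sxx)
    t = (b1 - hypothesized_slope) / s_b1
    return t, df


# First class-activity table (the one containing Y = 15 at X = 8)
X = [7, 5, 4, 2, 8, 2, 4, 2, 3, 5]
Y = [4, 3, 3, 1, 15, 1, 3, 1, 2, 3]

t_stat, df = slope_t_test(X, Y)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The t value is then compared with a critical value from the t distribution with n − 2 degrees of freedom. A large absolute t is evidence against a flat line, that is, against β1 = 0.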
Class Activity