Relations between variables
Scatter plots
• A scatter plot (or scatter diagram) can be used to
show the relationship between two variables.
[Figure: example scatter plots of Y versus X]
Types of Relations
[Figure: scatter plots illustrating linear and curvilinear relations between X and Y]
Types of Relations (Cont'd)
[Figure: scatter plot illustrating no relationship between X and Y]
Historical Origins of Regression Models
Simple Linear Regression Models

Yi = β0 + β1Xi + εi

where Yi is the dependent variable, Xi the independent variable, β0 the Y intercept, β1 the slope coefficient, and εi the random error term. The model consists of a linear component (β0 + β1Xi) and a random error component (εi).

Assumptions:
• E(εi) = 0
• Variance(εi) = σ2
• Covariance(εi, εj) = 0 for i ≠ j
Model Assumptions
• Normality of Error
Error values (ε) are (normally) distributed for any
given value of X.
• Homoscedasticity
The probability distribution of the errors has
constant variance.
• Independence of Errors
Error values are statistically independent.
Simple Linear Regression Equation

Ŷi = b0 + b1Xi

where Ŷi is the estimated (or predicted) Y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.
Slopes and the Intercept
Example
Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Graphical Presentation
[Figure: scatter plot of house price ($1000s) versus square feet]
Graphical Presentation
[Figure: scatter plot of house price ($1000s) versus square feet with fitted regression line: intercept = 98.248, slope = 0.10977]
Interpretation of the Slope Coefficient b1
Here b1 = 0.10977: the estimated mean house price rises by 0.10977 ($1000s), about $110, for each additional square foot.
Predictions using the Fitted Model
To predict the price for a house with 2,000 square feet, we write:

Ŷ = 98.248 + 0.10977(2000) = 317.78

so the predicted price is approximately $317,800.
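The arithmetic above is easy to verify; a minimal Python sketch using the fitted coefficients reported earlier (variable names are illustrative):

```python
# Fitted coefficients from the house-price regression (price in $1000s)
b0 = 98.248    # intercept
b1 = 0.10977   # slope (price change per additional square foot)

x_h = 2000                 # square feet of the house to price
y_hat = b0 + b1 * x_h      # predicted price in $1000s
print(round(y_hat, 2))     # 317.79, i.e. about $317,800
```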
Interpolation vs. Extrapolation
• When using a regression model for prediction, only predict within the relevant range of the data. Do not try to extrapolate beyond the range of observed X's.
[Figure: scatter plot of house price ($1000s) versus square feet, marking the relevant range of X]
Estimation (Method of Least Squares)

• b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²   (sums over i = 1, …, n)
• b0 = Ȳ − b1X̄
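The least-squares formulas can be applied directly to the house-price data from the sample table; a minimal sketch in plain Python (no libraries assumed):

```python
# House-price data from the sample table (Y in $1000s, X in square feet)
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2,  b0 = Ybar - b1*Xbar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(round(b1, 5), round(b0, 3))  # 0.10977 98.248, matching the fitted line shown earlier
```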
Estimation of the error-term variance σ2
• The estimator of σ2 is

s2 = MSE = SSE / (n − 2) = Σ(Yi − Ŷi)² / (n − 2) = Σ ei² / (n − 2)   (sums over i = 1, …, n)
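A sketch of this estimator on the same house-price data, fitting first with the least-squares formulas from the previous slide:

```python
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# s^2 = MSE = SSE / (n - 2), where SSE is the sum of squared residuals
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
s = mse ** 0.5            # root MSE (standard error of the estimate)
print(round(s, 2))        # about 41.33 (in $1000s)
```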
Method of Maximum Likelihood (Normal Error Model)
• b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²   (sums over i = 1, …, n)
• b0 = Ȳ − b1X̄
• The MLE of σ2 is SSE / n = (n − 2)s2 / n
• MLE of b0 = LSE of b0 (unbiased)
• MLE of b1 = LSE of b1 (unbiased)
• MLE of σ2 < unbiased estimator of σ2 [asymptotically unbiased]
Example
Regression output from SAS

Parameter Estimates
Variable   Label           DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  Intercept        1  2.11405             0.32089         6.59     <.0001
ACT        ACT test score   1  0.03883             0.01277         3.04     0.0029
A Scatter Plot with Regression Line
The Slope
• Slope estimate: b1 = 0.03883
• Fitted value at X = 30: Ŷ = 2.11405 + 0.03883(30) = 3.27895
More on Inference
Yi = β0 + β1Xi + εi
Assumptions: the εi are independent N(0, σ2) (normal error model).
Distribution of b1

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σ ki Yi,   where ki = (Xi − X̄) / Σ(Xi − X̄)²   (sums over i = 1, …, n)
Distribution of b1
It follows that

(b1 − β1) / s{b1} ~ t(n − 2),   with s2{b1} = MSE / Sxx
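Returning to the house-price data, s{b1} and the resulting t statistic can be computed directly; a minimal sketch:

```python
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s_b1 = (mse / s_xx) ** 0.5      # s{b1}: estimated standard error of b1
t_stat = b1 / s_b1              # t statistic for H0: beta1 = 0
print(round(s_b1, 4), round(t_stat, 2))  # 0.033 3.33
```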
Testing (Two-sided Test of β1)
H0: β1 = 0
H1: β1 ≠ 0
Test statistic: T = b1 / s{b1}
Two-sided Confidence Interval for β1

( b1 − t(α/2, n−2) s{b1},  b1 + t(α/2, n−2) s{b1} )
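For the house-price data, n − 2 = 8 degrees of freedom give t(0.025, 8) ≈ 2.306 (a standard table value, hard-coded below to stay library-free); a 95% interval for β1 can then be sketched as:

```python
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s_b1 = (mse / s_xx) ** 0.5

t_crit = 2.306   # t(0.025, n-2) for 8 df, from a t table
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(round(lower, 4), round(upper, 4))  # 0.0337 0.1858
```

Since the interval excludes 0, the two-sided test at α = 0.05 rejects H0: β1 = 0, consistent with the t statistic of about 3.33.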
One-sided Test for β1
H0: β1 ≤ k
H1: β1 > k
Test statistic: T = (b1 − k) / s{b1}
One-sided Test for β1
H0: β1 ≥ k
H1: β1 < k
Test statistic: T = (b1 − k) / s{b1}
Distribution of b0
Recall that b0 = Ȳ − b1X̄.

b0 ~ N(β0, σ2 [1/n + X̄² / Σ(Xi − X̄)²])   (sum over i = 1, …, n)

(b0 − β0) / s{b0} ~ t(n − 2)

Proof will be discussed in class.
Two-sided Test for β0
H0: β0 = k
H1: β0 ≠ k
Test statistic: T = (b0 − k) / s{b0}
Two-sided Confidence Interval for β0

( b0 − t(α/2, n−2) s{b0},  b0 + t(α/2, n−2) s{b0} )
One-sided Test for β0
H0: β0 ≤ k
H1: β0 > k
Test statistic: T = (b0 − k) / s{b0}
One-sided Test for β0
H0: β0 ≥ k
H1: β0 < k
Test statistic: T = (b0 − k) / s{b0}
Estimation of the mean of the response
variable for a given level of X
Example
Suppose you want to develop a model to predict selling
price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is
selected to study the relationship between selling price
(Y, in $000) and assessed value (X, in $000). The houses
in the city had been reassessed at full value 1 year prior
to the study.
Estimation of the mean of the response
variable for a given level of X
Let Xh denote the level of X for which we wish to estimate the mean response [to be estimated by Ŷh].
(Note: given Xh, the mean response is E(Yh) = β0 + β1Xh according to the model.)
For estimation, given Xh: Ŷh = b0 + b1Xh

E(Yh) − Ŷh ~ N(0, σ2 [1/n + (Xh − X̄)² / Σ(Xi − X̄)²])   (sum over i = 1, …, n)
Confidence Interval for E(Yh)

( Ŷh − t(α/2, n−2) s √(1/n + (Xh − X̄)² / Σ(Xi − X̄)²),
  Ŷh + t(α/2, n−2) s √(1/n + (Xh − X̄)² / Σ(Xi − X̄)²) )
Prediction of a new observation Yh(new)
Prediction of a new observation Yh(new)
Ŷh = b0 + b1Xh

Yh(new) − Ŷh ~ N(0, σ2 [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²])   (sum over i = 1, …, n)
Prediction Interval for Yh(new)

( Ŷh − t(α/2, n−2) s √(1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²),
  Ŷh + t(α/2, n−2) s √(1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²) )

Note the extra "1 +" inside the square root: predicting an individual new observation is less precise than estimating the mean response, so this interval is wider than the confidence interval for E(Yh).
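The same house-price setup with Xh = 2000 illustrates the wider interval; only the "1 +" term differs from the mean-response sketch:

```python
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_h = 2000
y_hat = b0 + b1 * x_h
# Standard error for a NEW observation: note the extra "1 +" term
se_pred = (mse * (1 + 1 / n + (x_h - x_bar) ** 2 / s_xx)) ** 0.5
t_crit = 2.306   # t(0.025, 8)
lower, upper = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
print(f"{lower:.1f} {upper:.1f}")  # roughly 215.5 420.1 (in $1000s)
```

The prediction interval (about 215.5 to 420.1) is much wider than the confidence interval for the mean response at the same Xh.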
ANOVA and Regression Analysis
ANOVA and Regression
Total Variation = Unexplained Variation + Explained Variation
SST = SSE + SSR
ANOVA and Regression
ANOVA Table

Source      Sum of Squares (SS)  Degrees of freedom (df)  Mean squares (MS)   F
Regression  SSR                  1                        MSR = SSR/1         MSR/MSE
Error       SSE                  n − 2                    MSE = SSE/(n − 2)
Total       SST                  n − 1

• Coefficient of Determination
  a) R2 = SSR/SST
  b) 0 ≤ R2 ≤ 1
• Coefficient of Correlation
  a) r = ±√R2 (taking the sign of b1)
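The decomposition SST = SSR + SSE and the resulting R2 can be verified on the house-price data; a minimal sketch:

```python
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

y_fit = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((fi - y_bar) ** 2 for fi in y_fit)           # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))  # unexplained variation

assert abs(sst - (ssr + sse)) < 1e-5   # SST = SSR + SSE
r2 = ssr / sst                         # coefficient of determination
r = r2 ** 0.5 if b1 > 0 else -(r2 ** 0.5)  # correlation takes the sign of b1
print(round(r2, 4), round(r, 4))  # 0.5808 0.7621
```

So about 58% of the variation in house price is explained by square footage in this sample.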