
Linear Regression Analysis

Multiple Linear Regression in Matrix Notation


• The MLR Model in Scalar Form
Yi = β0 + β1X1i + … + βkXki + εi, where εi ∼ iid N(0, σ²).
Consider now writing an equation for each observation:
Y1 = β0 + β1X11 + … + βkXk1 + ε1
Y2 = β0 + β1X12 + … + βkXk2 + ε2
⋮
Yn = β0 + β1X1n + … + βkXkn + εn
The MLR Model in Matrix Form
𝑌1 0 + 1 𝑋1 + ⋯ + 𝑘 𝑋𝑘1 1
𝑌2 0 + 1 𝑋2 + ⋯ + 𝑘 𝑋𝑘2 2
= + ⋮
⋮ ⋮
𝑌𝑛 0 + 1 𝑋𝑛 + ⋯ + 𝑘 𝑋𝑘𝑛 𝑛

Or,
\[
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
• X is called the design matrix.
• β is the vector of parameters.
• ε is the error vector.
• Y is the response vector.
The Design Matrix
\[
X_{n \times (k+1)} =
\begin{pmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{pmatrix}
\]
Vector of Parameters
0
1
β(k+1)×1 =

𝑘
Vector of Error Terms
1
2
𝑛×1 = ⋮
𝑛
Vector of Responses
\[
Y_{n \times 1} =
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
\]
Multiple Linear Regression Model in Matrix Form
Thus,
Y = Xβ + ε.
Covariance matrix of ε: Cov(ε) = σ²I.
Covariance matrix of Y: Cov(Y) = Cov(Xβ + ε) = σ²I.
Distributional Assumptions in Matrix Form
ε ∼ N(0, σ²I), where I is the n × n identity matrix.
• The ones on the diagonal specify that the variance of each εi is 1 times σ², i.e., σ².
• The zeros off the diagonal specify that the covariance between different εi's is zero.
• This implies that the correlations are zero.
• Under normality, zero correlation implies that the εi's are independent.
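
As a concrete illustration of these assumptions, the following minimal sketch (in Python with numpy; the values of n, k, β, and σ are arbitrary illustrative choices, not from the notes) generates data from Y = Xβ + ε with ε ∼ N(0, σ²I):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 50, 2                          # illustrative sample size and number of predictors
beta = np.array([1.0, 2.0, -0.5])     # (beta_0, beta_1, beta_2); arbitrary values
sigma = 1.5                           # error standard deviation; arbitrary

# Design matrix: a column of ones followed by k predictor columns.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k))])

# epsilon ~ N(0, sigma^2 I): independent errors, mean 0, common variance sigma^2.
eps = rng.normal(0.0, sigma, size=n)

# The MLR model in matrix form.
Y = X @ beta + eps
```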
Parameter Estimation
Least Squares
• Residuals are ε = Y − Xβ.
• Want to minimize sum of squared residuals.
1
2
𝑖=1 𝜀𝑖 = (1 2 … 𝑛 ) ⋮ = '
𝑛 2

𝑛

• We want to minimize ε'ε = (Y − Xβ)'(Y − Xβ).
• We take the derivative with respect to the vector β.
• This is like a scalar quadratic function: (Y − Xβ)².
• By the chain rule, the derivative works out to 2 times the derivative of (Y − Xβ)' with respect to β, times (Y − Xβ).
• That is, d/dβ ((Y − Xβ)’(Y − Xβ)) = −2X’(Y − Xβ).
• We set this equal to 0 (a vector of zeros), and solve for β.
• So, −2X’(Y − Xβ) = 0.
• Or, X’Y = X’Xβ (the “normal” equations).
• Solving this equation for β gives the least squares solution for β.
• Multiply on the left by the inverse of the matrix X’X.
• The matrix X'X is a (k+1) × (k+1) square matrix (in SLR, k = 1, so it is 2 × 2).
• β̂ = (X'X)−1X'Y.
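
A minimal sketch of this computation, continuing the simulated X and Y above (solving the normal equations directly is numerically preferable to forming the explicit inverse):

```python
# Normal equations: (X'X) beta_hat = X'Y.
XtX = X.T @ X
XtY = X.T @ Y

# Solve the linear system rather than inverting X'X explicitly.
beta_hat = np.linalg.solve(XtX, XtY)
```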
As in simple linear regression, we can write X'X out explicitly:
\[
X'X =
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
X_{11} & X_{12} & \cdots & X_{1n} \\
\vdots & \vdots & & \vdots \\
X_{k1} & X_{k2} & \cdots & X_{kn}
\end{pmatrix}
\begin{pmatrix}
1 & X_{11} & \cdots & X_{k1} \\
1 & X_{12} & \cdots & X_{k2} \\
\vdots & \vdots & & \vdots \\
1 & X_{1n} & \cdots & X_{kn}
\end{pmatrix}
=
\begin{pmatrix}
n & \sum_{i=1}^{n} X_{1i} & \cdots & \sum_{i=1}^{n} X_{ki} \\
\sum_{i=1}^{n} X_{1i} & \sum_{i=1}^{n} X_{1i}^2 & \cdots & \sum_{i=1}^{n} X_{1i}X_{ki} \\
\vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} X_{ki} & \sum_{i=1}^{n} X_{ki}X_{1i} & \cdots & \sum_{i=1}^{n} X_{ki}^2
\end{pmatrix}
\]
Distribution of β̂
We know that
β̂ = (X'X)−1X'Y.
The only random quantity involved is Y, so the distribution of β̂ follows from the distribution of Y.
Since Y ∼ N(Xβ, σ²I), we have
E(β̂) = (X'X)−1X'E(Y) = (X'X)−1X'Xβ = β,
σ²{β̂} = Cov(β̂) = σ²(X'X)−1.
Since σ² is estimated by the MSE, s², σ²{β̂} is estimated by s²(X'X)−1.
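
Continuing the sketch, the estimated covariance matrix s²(X'X)−1 and the coefficient standard errors can be computed as:

```python
p = X.shape[1]                        # p = k + 1 parameters
resid = Y - X @ beta_hat              # residuals
s2 = resid @ resid / (n - p)          # MSE, the estimate of sigma^2
cov_beta = s2 * np.linalg.inv(XtX)    # estimated Cov(beta_hat) = s^2 (X'X)^{-1}
se_beta = np.sqrt(np.diag(cov_beta))  # standard errors of the coefficients
```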
ANOVA Table
Sources of variation are:
Regression
Error (Residual)
Total

SS and df add as:
SSR + SSE = TSS
dfR + dfE = dfT
ANOVA Table

Source       df                        SS    MS                   F
Regression   p − 1 = (k + 1) − 1 = k   SSR   MSR = SSR/(p − 1)    MSR/MSE
Error        n − p = n − (k + 1)       SSE   MSE = SSE/(n − p)
Total        n − 1                     TSS
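
The table's entries follow directly from the fitted values; continuing the same sketch:

```python
Y_hat = X @ beta_hat                  # fitted values
TSS = np.sum((Y - Y.mean()) ** 2)     # total SS, df = n - 1
SSE = np.sum((Y - Y_hat) ** 2)        # error SS, df = n - p
SSR = TSS - SSE                       # regression SS, df = p - 1
MSR = SSR / (p - 1)
MSE = SSE / (n - p)
F = MSR / MSE                         # overall F statistic
```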
Inference for Individual Regression Coefficients
• Confidence Interval for β̂j
• We know that β̂ ∼ N(β, σ²(X'X)−1).
• Define the estimated covariance matrix s²{β̂}p×p = MSE × (X'X)−1, where p = k + 1.
• CI for βj:

β̂j ± t(n − (k+1), 1 − α/2) × s.e.{β̂j}.


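A sketch of this interval for one coefficient, using scipy.stats.t for the critical value (alpha and j are illustrative choices):

```python
from scipy import stats

alpha = 0.05
j = 1                                          # coefficient of interest, e.g. beta_1
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)  # t_{n-(k+1), 1-alpha/2}
ci = (beta_hat[j] - t_crit * se_beta[j],
      beta_hat[j] + t_crit * se_beta[j])
```
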
Significance Test for βj
H0 : βj = 0

• Test statistic: t∗ = β̂j / s.e.(β̂j)
• Under H0, t∗ follows a t-distribution with (n − p) d.f.
• This tests the significance of the j-th independent variable in the model.
• In SLR, the t-test for β1 is equivalent to the F-test.
• In MLR, the t-tests are used for testing the significance of each βj individually.
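
Continuing the sketch, the t statistics and two-sided p-values for all coefficients at once:

```python
t_stat = beta_hat / se_beta                          # t* = beta_hat_j / s.e.(beta_hat_j)
p_values = 2 * stats.t.sf(np.abs(t_stat), df=n - p)  # two-sided p-values
```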
Example:
Consider a data set on house prices.
area (x1)     rooms (x2)   age (x3)   price (y)
(00 sq. ft)                (years)    (in 000)
23            3            8          6562
15            2            7          4569
24            4            9          6897
29            5            4          7562
31            7            6          8234
25            3            10         7485
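
As a sketch, the matrix formulas above can be applied directly to this data set (the numbers are taken from the table; no fitted results are claimed here):

```python
import numpy as np

# House-price data from the table above.
area  = np.array([23, 15, 24, 29, 31, 25])               # x1, hundreds of sq. ft
rooms = np.array([ 3,  2,  4,  5,  7,  3])               # x2
age   = np.array([ 8,  7,  9,  4,  6, 10])               # x3, years
price = np.array([6562, 4569, 6897, 7562, 8234, 7485])   # y, thousands

# Design matrix with an intercept column; beta_hat = (X'X)^{-1} X'Y.
X = np.column_stack([np.ones(len(price)), area, rooms, age])
beta_hat = np.linalg.solve(X.T @ X, X.T @ price)
```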
