The derivative of the log-likelihood function (4) at β̂ with respect to σ² is

$$\begin{aligned}
\frac{\mathrm{dLL}(\hat{\beta}, \sigma^2)}{\mathrm{d}\sigma^2}
&= \frac{\mathrm{d}}{\mathrm{d}\sigma^2} \left[ -\frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta}) \right] \\
&= -\frac{n}{2} \cdot \frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta}) \\
&= -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta})
\end{aligned} \qquad (8)$$
and setting this derivative to zero gives the MLE for σ 2 :
$$\begin{aligned}
\frac{\mathrm{dLL}(\hat{\beta}, \hat{\sigma}^2)}{\mathrm{d}\sigma^2} &= 0 \\
0 &= -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2(\hat{\sigma}^2)^2} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta}) \\
\frac{n}{2\hat{\sigma}^2} &= \frac{1}{2(\hat{\sigma}^2)^2} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta}) \qquad (9) \\
\frac{2(\hat{\sigma}^2)^2}{n} \cdot \frac{n}{2\hat{\sigma}^2} &= \frac{2(\hat{\sigma}^2)^2}{n} \cdot \frac{1}{2(\hat{\sigma}^2)^2} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta}) \\
\hat{\sigma}^2 &= \frac{1}{n} (y - X\hat{\beta})^{\mathrm{T}} V^{-1} (y - X\hat{\beta})
\end{aligned}$$
Together, (7) and (9) constitute the MLE for multiple linear regression.
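To see both estimators in action, here is a minimal numerical sketch in Python (the function name, simulated data and use of NumPy are illustrative, not from the text). Equation (7) is cited but not reproduced in this excerpt; the sketch assumes its standard weighted least squares form β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y and computes σ̂² from (9):

```python
import numpy as np

def mle_linear_regression(y, X, V):
    """MLE for y = X beta + eps, eps ~ N(0, sigma^2 V)."""
    V_inv = np.linalg.inv(V)
    # beta-hat: weighted least squares estimate (assumed form of equation (7))
    beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    # sigma^2-hat: equation (9)
    resid = y - X @ beta_hat
    return beta_hat, (resid @ V_inv @ resid) / len(y)

# illustrative usage: n = 100 observations, p = 3 predictors, V = I_n
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(100)
beta_hat, sigma2_hat = mle_linear_regression(y, X, np.eye(100))
print(beta_hat, sigma2_hat)  # close to (1, -2, 0.5) and 1, respectively
```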
1.5.24 Maximum log-likelihood
Theorem: Consider a linear regression model (→ III/1.5.1) m with correlation structure (→ I/1.14.5) V

$$m: \; y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 V) . \qquad (1)$$
Then, the maximum log-likelihood (→ I/4.1.4) for this model is
$$\mathrm{MLL}(m) = -\frac{n}{2} \log\left(\frac{\mathrm{RSS}}{n}\right) - \frac{n}{2} \left[1 + \log(2\pi)\right] \qquad (2)$$

under uncorrelated observations (→ III/1.5.1), i.e. if V = Iₙ, and

$$\mathrm{MLL}(m) = -\frac{n}{2} \log\left(\frac{\mathrm{wRSS}}{n}\right) - \frac{n}{2} \left[1 + \log(2\pi)\right] - \frac{1}{2} \log |V| \qquad (3)$$

in the general case, i.e. if V ≠ Iₙ, where RSS is the residual sum of squares (→ III/1.5.9) and wRSS is the weighted residual sum of squares (→ III/1.5.22).
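Before the proof, a quick numerical plausibility check of (2) for the uncorrelated case V = Iₙ (a sketch only; variable names and the use of SciPy are illustrative choices): the closed-form maximum log-likelihood should match the log-density evaluated directly at the maximum likelihood estimates.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n, X = 50, rng.standard_normal((50, 2))
y = X @ np.array([0.7, -1.2]) + rng.standard_normal(n)

# MLEs under V = I_n: OLS beta-hat and sigma^2-hat = RSS / n
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta_hat) ** 2)

# equation (2): MLL = -(n/2) log(RSS/n) - (n/2) [1 + log(2 pi)]
mll_formula = -n/2 * np.log(rss/n) - n/2 * (1 + np.log(2*np.pi))

# direct evaluation of the log-likelihood at the MLEs
mll_direct = multivariate_normal.logpdf(y, mean=X @ beta_hat,
                                        cov=(rss/n) * np.eye(n))
print(np.isclose(mll_formula, mll_direct))  # True
```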
Proof: The likelihood function (→ I/5.1.2) for multiple linear regression is given by (→ III/1.5.23)
where 1 − α is the confidence level and $\chi^2_{1,1-\alpha}$ is the (1 − α)-quantile of the chi-squared distribution (→ II/3.7.1) with 1 degree of freedom.
Proof: The confidence interval (→ I/3.2.1) is defined as the interval that, under infinitely repeated
random experiments (→ I/1.1.1), contains the true parameter value with a certain probability.
Let us define the likelihood ratio (→ I/4.1.6)
$$\Lambda(\phi) = \frac{p(y \mid \phi, \hat{\lambda})}{p(y \mid \hat{\phi}, \hat{\lambda})} \quad \text{for all} \quad \phi \in \Phi \qquad (4)$$
and compute the log-likelihood ratio (→ I/4.1.7)
log Λ(ϕ) = log p(y|ϕ, λ̂) − log p(y|ϕ̂, λ̂) . (5)
Wilks’ theorem states that, when comparing two statistical models with parameter spaces Θ1 and
Θ0 ⊂ Θ1 , as the sample size approaches infinity, the quantity calculated as −2 times the log-ratio of
maximum likelihoods follows a chi-squared distribution (→ II/3.7.1), if the null hypothesis is true:
$$H_0: \theta \in \Theta_0 \quad \Rightarrow \quad -2 \log \frac{\max_{\theta \in \Theta_0} p(y \mid \theta)}{\max_{\theta \in \Theta_1} p(y \mid \theta)} \sim \chi^2_{\Delta k} \quad \text{as} \quad n \to \infty \qquad (6)$$
where Δk is the difference in dimensionality between Θ₀ and Θ₁. Applied to our example in (5), we note that Θ₁ = {ϕ, ϕ̂} and Θ₀ = {ϕ}, such that Δk = 1 and Wilks' theorem implies:

$$-2 \log \Lambda(\phi) \sim \chi^2_1 . \qquad (7)$$
Using the quantile function (→ I/1.9.1) $\chi^2_{k,p}$ of the chi-squared distribution (→ II/3.7.1), a (1 − α)-confidence interval is therefore given by all values ϕ that satisfy

$$-2 \log \Lambda(\phi) \leq \chi^2_{1,1-\alpha} . \qquad (8)$$
Applying (5) and rearranging, we can evaluate
$$\begin{aligned}
-2 \left[ \log p(y \mid \phi, \hat{\lambda}) - \log p(y \mid \hat{\phi}, \hat{\lambda}) \right] &\leq \chi^2_{1,1-\alpha} \\
\log p(y \mid \phi, \hat{\lambda}) - \log p(y \mid \hat{\phi}, \hat{\lambda}) &\geq -\frac{1}{2} \chi^2_{1,1-\alpha} \qquad (9) \\
\log p(y \mid \phi, \hat{\lambda}) &\geq \log p(y \mid \hat{\phi}, \hat{\lambda}) - \frac{1}{2} \chi^2_{1,1-\alpha}
\end{aligned}$$
which is equivalent to the confidence interval given by (3).
■
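The construction in (8)–(9) can be illustrated numerically for a simple case, taking ϕ to be the mean of normal data and the nuisance parameter λ (here, the standard deviation) fixed at its MLE as in (4); the model, grid and names below are illustrative choices, not part of the original proof.

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.5, size=200)

phi_hat = y.mean()   # MLE of the parameter of interest
lam_hat = y.std()    # MLE of the nuisance (ddof=0)

def loglik(phi):
    # log p(y | phi, lambda-hat), nuisance fixed at its MLE as in (4)
    return norm.logpdf(y, loc=phi, scale=lam_hat).sum()

# threshold from (9): log-likelihood may drop by at most chi2(1) quantile / 2
alpha = 0.05
threshold = loglik(phi_hat) - chi2.ppf(1 - alpha, df=1) / 2

# keep all candidate values satisfying (8); their range is the interval
grid = np.linspace(phi_hat - 1, phi_hat + 1, 2001)
mask = np.array([loglik(g) >= threshold for g in grid])
inside = grid[mask]
print(inside.min(), inside.max())  # approx. phi_hat +/- 1.96 lam_hat / sqrt(200)
```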
Sources:
• Wikipedia (2020): “Confidence interval”; in: Wikipedia, the free encyclopedia, retrieved on 2020-02-19; URL: https://en.wikipedia.org/wiki/Confidence_interval#Methods_of_derivation.
• Wikipedia (2020): “Likelihood-ratio test”; in: Wikipedia, the free encyclopedia, retrieved on 2020-02-19; URL: https://en.wikipedia.org/wiki/Likelihood-ratio_test#Definition.
• Wikipedia (2020): “Wilks’ theorem”; in: Wikipedia, the free encyclopedia, retrieved on 2020-02-19; URL: https://en.wikipedia.org/wiki/Wilks%27_theorem.
$$\mathrm{Cov}(z) = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix} = I_n . \qquad (7)$$
2) Next, consider an n × n matrix A solving the equation AAᵀ = Σ. Such a matrix exists, because Σ is defined to be positive definite (→ II/4.1.1). Then, x can be represented as a linear transformation (→ II/4.1.13) of z:

$$x = Az + \mu \sim \mathcal{N}(A 0_n + \mu, \, A I_n A^{\mathrm{T}}) = \mathcal{N}(\mu, \Sigma) . \qquad (8)$$
Thus, the covariance (→ I/1.13.1) of x can be written as:
Cov(x) = Cov(Az + µ) . (9)
With the invariance of the covariance matrix under addition (→ I/1.13.14)
Cov(x + a) = Cov(x) (10)
and the scaling of the covariance matrix upon multiplication (→ I/1.13.15)
Cov(Ax) = A Cov(x) Aᵀ ,  (11)
this becomes:
$$\begin{aligned}
\mathrm{Cov}(x) &= \mathrm{Cov}(Az + \mu) \\
&\overset{(10)}{=} \mathrm{Cov}(Az) \\
&\overset{(11)}{=} A \, \mathrm{Cov}(z) \, A^{\mathrm{T}} \\
&\overset{(7)}{=} A I_n A^{\mathrm{T}} \\
&= A A^{\mathrm{T}} \\
&= \Sigma .
\end{aligned} \qquad (12)$$
■
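A short simulation makes step 2) concrete (a sketch; the Cholesky factor is one convenient choice of A with AAᵀ = Σ, and the numbers are arbitrary): draw z ∼ N(0, Iₙ), form x = Az + µ as in (8), and check that the empirical covariance approaches Σ as in (12).

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

# A with A A^T = Sigma; the Cholesky factor is one such matrix
A = np.linalg.cholesky(Sigma)

# x = A z + mu with z ~ N(0, I_n), cf. equation (8)
z = rng.standard_normal((3, 100_000))
x = A @ z + mu[:, None]

# the empirical covariance should be close to Sigma, cf. equation (12)
print(np.round(np.cov(x), 2))
```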
Sources:
• Rosenfeld, Meni (2016): “Deriving the Covariance of Multivariate Gaussian”; in: StackExchange Mathematics, retrieved on 2022-09-15; URL: https://math.stackexchange.com/questions/1905977/deriving-the-covariance-of-multivariate-gaussian.
4.1.11 Differential entropy
Theorem: Let x follow a multivariate normal distribution (→ II/4.1.1)
x ∼ N (µ, Σ) . (1)
Then, the differential entropy (→ I/2.2.1) of x in nats is
$$\mathrm{h}(x) = \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln |\Sigma| + \frac{1}{2} n . \qquad (2)$$
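As a sanity check of (2), the closed-form expression can be compared with SciPy's built-in entropy of the multivariate normal, which is likewise reported in nats (a minimal sketch; the example Σ is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = Sigma.shape[0]

# equation (2): h(x) = (n/2) ln(2 pi) + (1/2) ln|Sigma| + n/2
h_formula = n/2 * np.log(2*np.pi) + np.log(np.linalg.det(Sigma))/2 + n/2

# SciPy computes the same quantity in nats
h_scipy = multivariate_normal(mean=np.zeros(n), cov=Sigma).entropy()
print(np.isclose(h_formula, h_scipy))  # True
```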
2 Multivariate normal data
2.1 General linear model
2.1.1 Definition
Definition: Let Y be an n × v matrix and let X be an n × p matrix. Then, a statement asserting a
linear mapping from X to Y with parameters B and matrix-normally distributed (→ II/5.1.1) errors
E
Y = XB + E, E ∼ MN (0, V, Σ) (1)
is called a multivariate linear regression model or simply, “general linear model”.
• Y is called “data matrix”, “set of dependent variables” or “measurements”;
• X is called “design matrix”, “set of independent variables” or “predictors”;
• B are called “regression coefficients” or “weights”;
• E is called “noise matrix” or “error terms”;
• V is called “covariance across rows”;
• Σ is called “covariance across columns”;
• n is the number of observations;
• v is the number of measurements;
• p is the number of predictors.
When rows of Y correspond to units of time, e.g. subsequent measurements, V is called “temporal
covariance”. When columns of Y correspond to units of space, e.g. measurement channels, Σ is called
“spatial covariance”.
When the covariance matrix V is a scalar multiple of the n×n identity matrix, this is called a general
linear model with independent and identically distributed (i.i.d.) observations:
$$V = \lambda I_n \quad \Rightarrow \quad E \sim \mathcal{MN}(0, \lambda I_n, \Sigma) \quad \Rightarrow \quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \lambda \Sigma) . \qquad (2)$$
Otherwise, it is called a general linear model with correlated observations.
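A minimal simulation of model (1) in the i.i.d. case (2) might look as follows (a sketch; the dimensions, Σ and λ are arbitrary illustrative choices, and each row of E is drawn as εᵢ ∼ N(0, λΣ)):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, v = 200, 3, 2   # observations, predictors, measurements
lam = 0.5             # scalar lambda in V = lambda * I_n

X = rng.standard_normal((n, p))        # design matrix
B = rng.standard_normal((p, v))        # regression coefficients
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])         # covariance across columns

# with V = lambda * I_n, the rows of E are i.i.d. N(0, lambda * Sigma), cf. (2)
E = rng.multivariate_normal(np.zeros(v), lam * Sigma, size=n)
Y = X @ B + E                          # the general linear model (1)
```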
Sources:
• Wikipedia (2020): “General linear model”; in: Wikipedia, the free encyclopedia, retrieved on 2020-03-21; URL: https://en.wikipedia.org/wiki/General_linear_model.
2.1.2 Ordinary least squares
Theorem: Given a general linear model (→ III/2.1.1) with independent observations

$$Y = XB + E, \quad E \sim \mathcal{MN}(0, \sigma^2 I_n, \Sigma) , \qquad (1)$$

the ordinary least squares (→ III/1.5.3) parameter estimates are given by

$$\hat{B} = (X^{\mathrm{T}} X)^{-1} X^{\mathrm{T}} Y . \qquad (2)$$
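A numerical sketch of (2) (illustrative names; for simplicity the errors are drawn i.i.d. per entry, i.e. with Σ proportional to the identity): the closed-form estimate coincides with NumPy's least squares solution applied to all columns of Y at once.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((500, 4))
B_true = rng.standard_normal((4, 3))
Y = X @ B_true + 0.1 * rng.standard_normal((500, 3))

# equation (2): B-hat = (X^T X)^{-1} X^T Y, solved without explicit inversion
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# the same estimate via NumPy's least squares routine
B_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(B_hat, B_lstsq))  # True
```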
Proof: Let B̂ be the ordinary least squares (→ III/1.5.3) (OLS) solution and let Ê = Y − X B̂ be
the resulting matrix of residuals. According to the exogeneity assumption of OLS, the errors have
conditional mean (→ I/1.10.1) zero
1 Probability theory
1.1 Random experiments
1.1.1 Random experiment
Definition: A random experiment is any repeatable procedure that results in one (→ I/1.2.2) out
of a well-defined set of possible outcomes.
• The set of possible outcomes is called sample space (→ I/1.1.2).
• A set of zero or more outcomes is called a random event (→ I/1.2.1).
• A function that maps from events to probabilities is called a probability function (→ I/1.5.1).
Together, sample space (→ I/1.1.2), event space (→ I/1.1.3) and probability function (→ I/1.1.4)
characterize a random experiment.
Sources:
• Wikipedia (2020): “Experiment (probability theory)”; in: Wikipedia, the free encyclopedia, re-
trieved on 2020-11-19; URL: https://en.wikipedia.org/wiki/Experiment_(probability_theory).
1.1.2 Sample space
Definition: Given a random experiment (→ I/1.1.1), the set of all possible outcomes from this
experiment is called the sample space of the experiment. A sample space is usually denoted as Ω and
specified using set notation.
Sources:
• Wikipedia (2021): “Sample space”; in: Wikipedia, the free encyclopedia, retrieved on 2021-11-26;
URL: https://en.wikipedia.org/wiki/Sample_space.
1.1.3 Event space
Definition: Given a random experiment (→ I/1.1.1), an event space E is any set of events, where
an event (→ I/1.2.1) is any set of zero or more elements from the sample space (→ I/1.1.2) Ω of this
experiment.
Sources:
• Wikipedia (2021): “Event (probability theory)”; in: Wikipedia, the free encyclopedia, retrieved on
2021-11-26; URL: https://en.wikipedia.org/wiki/Event_(probability_theory).
1.1.4 Probability space
Definition: Given a random experiment (→ I/1.1.1), a probability space (Ω, E, P ) is a triple con-
sisting of
• the sample space (→ I/1.1.2) Ω, i.e. the set of all possible outcomes from this experiment;
• an event space (→ I/1.1.3) E ⊆ 2^Ω, i.e. a set of subsets from the sample space, called events (→ I/1.2.1);
• a probability measure P : E → [0, 1], i.e. a function mapping from the event space (→ I/1.1.3)
to the real numbers, observing the axioms of probability (→ I/1.4.1).
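For example (a minimal instance, not part of the original definition): a single toss of a fair coin is described by the probability space with sample space Ω = {h, t}, event space E = 2^Ω = {∅, {h}, {t}, {h, t}}, and probability measure P(∅) = 0, P({h}) = P({t}) = 1/2, P({h, t}) = 1.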