Logistic Regression
Jia Li
Department of Statistics
The Pennsylvania State University
Email: jiali@stat.psu.edu
http://www.stat.psu.edu/~jiali
• Goal: preserve linear classification boundaries.
• By the Bayes rule, the decision boundary between classes k and l is the set of points where the posterior probabilities are equal:

    Pr(G = k | X = x) = Pr(G = l | X = x) .
Assumptions
• The model assumes K − 1 log-odds, each linear in x, with class K as the reference:

    log[Pr(G = 1 | X = x) / Pr(G = K | X = x)] = β_{10} + β_1^T x
    log[Pr(G = 2 | X = x) / Pr(G = K | X = x)] = β_{20} + β_2^T x
    ...
    log[Pr(G = K−1 | X = x) / Pr(G = K | X = x)] = β_{(K−1)0} + β_{K−1}^T x
• The log-ratio between any pair of classes follows:

    log[Pr(G = k | X = x) / Pr(G = l | X = x)] = β_{k0} − β_{l0} + (β_k − β_l)^T x .

• Number of parameters: (K − 1)(p + 1).
• Denote the entire parameter set by θ = {β_{10}, β_1, β_{20}, β_2, ..., β_{(K−1)0}, β_{K−1}}.
• The class posterior probabilities are

    Pr(G = k | X = x) = exp(β_{k0} + β_k^T x) / (1 + Σ_{l=1}^{K−1} exp(β_{l0} + β_l^T x)),   k = 1, ..., K − 1,

    Pr(G = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} exp(β_{l0} + β_l^T x)) .
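As a numerical sketch (not from the slides; the function name and the array layout are my own), the two formulas above can be evaluated jointly, with class K as the reference:

```python
import numpy as np

def class_probs(x, B):
    """Posterior probabilities under the K-class logistic model.

    B is a (K-1) x (p+1) array whose l-th row holds (beta_l0, beta_l^T);
    x is a length-p input. The last class K serves as the reference.
    """
    x1 = np.concatenate(([1.0], x))          # prepend 1 for the intercept
    scores = B @ x1                          # beta_l0 + beta_l^T x, l = 1..K-1
    denom = 1.0 + np.exp(scores).sum()       # shared denominator
    return np.concatenate((np.exp(scores), [1.0])) / denom   # length K
```

Because all K probabilities share one denominator, they sum to 1 by construction.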
• Similarities:
  • Both attempt to estimate Pr(G = k | X = x).
  • Both have linear classification boundaries.
• Differences:
  • Linear regression on the indicator matrix approximates Pr(G = k | X = x) by a linear function of x; the fitted values are not guaranteed to fall between 0 and 1 or to sum to 1.
  • Logistic regression models Pr(G = k | X = x) as a nonlinear function of x that is guaranteed to lie between 0 and 1 and to sum to 1.
Binary Classification
• Code the binary response as y_i = 1 if g_i = 1 and y_i = 0 if g_i = 2.
• If y_i = 1, i.e., g_i = 1, the contribution of x_i to the log-likelihood is log p(x_i; β).
• If y_i = 0, i.e., g_i = 2, the contribution is log(1 − p(x_i; β)).
• With the intercept absorbed into x (so x includes a constant 1),

    p(x; β) = Pr(G = 1 | X = x) = exp(β^T x) / (1 + exp(β^T x)),
    1 − p(x; β) = Pr(G = 2 | X = x) = 1 / (1 + exp(β^T x)) .
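A minimal sketch of these two formulas (illustrative; x is assumed to already carry the constant 1):

```python
import numpy as np

def p_binary(x, beta):
    """p(x; beta) = Pr(G=1 | X=x) = exp(beta^T x) / (1 + exp(beta^T x)).

    Written as the equivalent 1 / (1 + exp(-beta^T x)), which avoids
    overflow for large positive beta^T x.
    """
    return 1.0 / (1.0 + np.exp(-float(beta @ x)))
```

Pr(G = 2 | X = x) is then simply 1 − p_binary(x, beta).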
• The element on the jth row and nth column of the Hessian is (counting from 0):

    ∂²L(β) / ∂β_j ∂β_n
      = − Σ_{i=1}^{N} [ (1 + e^{β^T x_i}) e^{β^T x_i} x_{ij} x_{in} − (e^{β^T x_i})² x_{ij} x_{in} ] / (1 + e^{β^T x_i})²
      = − Σ_{i=1}^{N} [ x_{ij} x_{in} p(x_i; β) − x_{ij} x_{in} p(x_i; β)² ]
      = − Σ_{i=1}^{N} x_{ij} x_{in} p(x_i; β)(1 − p(x_i; β)) .
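The final expression says the Hessian is −X^T W X with W = diag(p_i(1 − p_i)). A small sketch (illustrative names) that also confirms the matrix is symmetric and negative semidefinite, as the derivation implies:

```python
import numpy as np

def hessian(X, beta):
    """Hessian of the binary log-likelihood: entry (j, n) equals
    -sum_i x_ij x_in p(x_i; beta)(1 - p(x_i; beta)), i.e. -X^T W X."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
    w = p * (1.0 - p)                        # diagonal of W
    return -(X * w[:, None]).T @ X           # row-scaling X avoids forming W
```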
Pseudo Code
1. 0 → β.
2. Compute y by setting its elements to
       y_i = 1 if g_i = 1, 0 if g_i = 2,   i = 1, 2, ..., N.
3. Compute p by setting its elements to
       p(x_i; β) = e^{β^T x_i} / (1 + e^{β^T x_i}),   i = 1, 2, ..., N.
4. Compute the diagonal matrix W; its ith diagonal element is p(x_i; β)(1 − p(x_i; β)), i = 1, 2, ..., N.
5. z ← Xβ + W^{−1}(y − p).
6. β ← (X^T W X)^{−1} X^T W z.
7. If the stopping criterion is met, stop; otherwise go back to step 3.
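The steps above translate directly into code; a minimal sketch (the function name and convergence tolerance are my own, and no safeguard is included for separable data, where the iteration diverges):

```python
import numpy as np

def irls(X, y, tol=1e-8, max_iter=50):
    """Binary logistic regression via the IRLS steps above.

    X is N x (p+1) with a leading column of ones; y has entries in {0, 1}.
    """
    beta = np.zeros(X.shape[1])                       # step 1: 0 -> beta
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # step 3: fitted probs
        w = p * (1.0 - p)                             # step 4: diagonal of W
        z = X @ beta + (y - p) / w                    # step 5: adjusted response
        Xw = X * w[:, None]                           # rows of X scaled by w_i
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ z)  # step 6: weighted LS
        if np.max(np.abs(beta_new - beta)) < tol:     # step 7: stopping rule
            return beta_new
        beta = beta_new
    return beta
```

Step 2 (building y from the class labels) is assumed done by the caller.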
Computational Efficiency
1. 0 → β.
2. Compute y by setting its elements to
       y_i = 1 if g_i = 1, 0 if g_i = 2,   i = 1, 2, ..., N.
3. Compute p by setting its elements to
       p(x_i; β) = e^{β^T x_i} / (1 + e^{β^T x_i}),   i = 1, 2, ..., N.
4. Compute the N × (p + 1) matrix X̃ by multiplying the ith row of X by p(x_i; β)(1 − p(x_i; β)), i = 1, 2, ..., N:

       X = [x_1^T; x_2^T; ...; x_N^T],
       X̃ = [p(x_1; β)(1 − p(x_1; β)) x_1^T; ...; p(x_N; β)(1 − p(x_N; β)) x_N^T].

5. β ← β + (X^T X̃)^{−1} X^T (y − p).
6. If the stopping criterion is met, stop; otherwise go back to step 3.
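The same Newton update written without forming the N × N matrix W: only the rows of X are rescaled (a sketch under the same caveats as before: illustrative names, no safeguard against divergence on separable data):

```python
import numpy as np

def irls_rowscaled(X, y, tol=1e-8, max_iter=50):
    """IRLS for binary logistic regression using X-tilde in place of W."""
    beta = np.zeros(X.shape[1])                        # 0 -> beta
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # step 3: fitted probs
        Xt = X * (p * (1.0 - p))[:, None]              # step 4: X-tilde
        step = np.linalg.solve(X.T @ Xt, X.T @ (y - p))  # step 5 increment
        beta = beta + step
        if np.max(np.abs(step)) < tol:                 # step 6: stopping rule
            return beta
    return beta
```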
Example
• The fitted model is

    Pr(G = 1 | X = x) = e^{0.7679 − 0.6816 X_1 − 0.3664 X_2} / (1 + e^{0.7679 − 0.6816 X_1 − 0.3664 X_2}),
    Pr(G = 2 | X = x) = 1 / (1 + e^{0.7679 − 0.6816 X_1 − 0.3664 X_2}).

• The classification rule is:

    Ĝ(x) = 1 if 0.7679 − 0.6816 X_1 − 0.3664 X_2 ≥ 0,
           2 if 0.7679 − 0.6816 X_1 − 0.3664 X_2 < 0.
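Plugging the fitted coefficients into the rule gives a one-line classifier (a sketch; the names are my own):

```python
import numpy as np

# Fitted coefficients from the example: (intercept, X1, X2).
BETA = np.array([0.7679, -0.6816, -0.3664])

def classify(x1, x2):
    """Return class 1 when the linear score is nonnegative, else class 2."""
    score = float(BETA @ np.array([1.0, x1, x2]))
    return 1 if score >= 0 else 2
```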
• Within the training data set, the classification error rate is 28.12%.
• Sensitivity: 45.9%.
• Specificity: 85.8%.
Multiclass Case (K ≥ 3)
• When K ≥ 3, β is a (K − 1)(p + 1)-vector formed by stacking the K − 1 coefficient blocks:

    β = (β_{10}, β_{11}, ..., β_{1p}, β_{20}, β_{21}, ..., β_{2p}, ..., β_{(K−1)0}, ..., β_{(K−1)p})^T .
• Let β̄_l = (β_{l0}, β_l^T)^T, with the convention β̄_K = 0 for the reference class.
• The likelihood function becomes

    L(β) = Σ_{i=1}^{N} log p_{g_i}(x_i; β)
         = Σ_{i=1}^{N} log [ e^{β̄_{g_i}^T x_i} / (1 + Σ_{l=1}^{K−1} e^{β̄_l^T x_i}) ]
         = Σ_{i=1}^{N} [ β̄_{g_i}^T x_i − log(1 + Σ_{l=1}^{K−1} e^{β̄_l^T x_i}) ] .
• The gradient with respect to the single component β_{kj} is

    ∂L(β) / ∂β_{kj} = Σ_{i=1}^{N} x_{ij} (I(g_i = k) − p_k(x_i; β)) .
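The displayed component can be checked against a finite-difference derivative of the log-likelihood (a sketch; the function names, the convention β̄_K = 0, and the test data are my own):

```python
import numpy as np

def loglik(B, X, g):
    """Multiclass log-likelihood. B is (K-1) x (p+1) with rows beta_l-bar;
    X is N x (p+1) with a leading ones column; g holds labels in 1..K,
    where class K is the reference (beta_K-bar = 0)."""
    S = X @ B.T                                    # S[i, l-1] = beta_l-bar^T x_i
    logZ = np.log1p(np.exp(S).sum(axis=1))         # log(1 + sum_l e^{...})
    idx = np.clip(g - 1, 0, B.shape[0] - 1)        # safe column index
    num = np.where(g <= B.shape[0], S[np.arange(len(g)), idx], 0.0)
    return float((num - logZ).sum())

def grad_kj(B, X, g, k, j):
    """The gradient component sum_i x_ij (I(g_i = k) - p_k(x_i; beta))."""
    S = X @ B.T
    P = np.exp(S) / (1.0 + np.exp(S).sum(axis=1, keepdims=True))
    return float((X[:, j] * ((g == k) - P[:, k - 1])).sum())
```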
• Matrix form.
• y is the concatenated indicator vector of dimension N(K − 1):

    y = (y_1^T, y_2^T, ..., y_{K−1}^T)^T,
    y_k = (I(g_1 = k), I(g_2 = k), ..., I(g_N = k))^T,   1 ≤ k ≤ K − 1.

• p is the concatenated vector of fitted probabilities of dimension N(K − 1):

    p = (p_1^T, p_2^T, ..., p_{K−1}^T)^T,
    p_k = (p_k(x_1; β), p_k(x_2; β), ..., p_k(x_N; β))^T,   1 ≤ k ≤ K − 1.
• In matrix form,

    ∂L(β) / ∂β = X̃^T (y − p),
    ∂²L(β) / ∂β ∂β^T = −X̃^T W X̃ .

• The formula for updating β^{new} in the binary classification case holds for the multiclass case.
Computation Issues
Simulation
LDA Result
• The fitted coefficient vector is β = (−0.3288, −1.3275)^T:

    Pr(G = 1 | X = x) = e^{−0.3288 − 1.3275 x} / (1 + e^{−0.3288 − 1.3275 x}) .