Main dimension:
- Supervised: answers/labels are known; very common.
- Unsupervised: no answers/labels; find patterns in the data.
- Semi-supervised: not discussed in this course.
- Inductive learning: learning rules (will not be discussed).
- Reinforcement learning: reward-based learning (already discussed).
Secondary dimension:
- Offline or batch: all the data is available at once. Currently the
  most common application.
- Online: data arrives sequentially, and a decision has to be made on
  each data item as it comes. The data is never available all at once,
  so the model has to be updated as the data streams in.
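The difference between the two settings can be illustrated with the simplest possible "model", a mean estimate. The sketch below is only an illustration under that assumption: the class name `RunningMean` and the sample stream are hypothetical and do not come from the course material.

```python
class RunningMean:
    """Online estimate of a mean: one update per incoming data item."""

    def __init__(self):
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # current estimate

    def update(self, x):
        # Incremental update: no past data needs to be stored.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


stream = [2.0, 4.0, 6.0]              # hypothetical data stream

online = RunningMean()
for item in stream:                    # online: process items one at a time
    online.update(item)

batch_mean = sum(stream) / len(stream)  # batch: all data available at once
```

Both routes reach the same estimate here; the online version simply never needs the whole data set in memory.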
Supervised learning
\[
P(C_i \mid x) = \frac{P(x \mid C_i)\, P(C_i)}{P(x)}, \qquad i \in \{1, 2\}
\]
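As a small numerical illustration of Bayes' rule for two classes, the sketch below computes the posteriors from given likelihoods and priors. The function name `posteriors` and the numbers are hypothetical, chosen only for the example.

```python
import numpy as np


def posteriors(likelihoods, priors):
    """Return P(C_i | x) for each class from P(x | C_i) and P(C_i)."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    joint = likelihoods * priors   # P(x | C_i) P(C_i)
    evidence = joint.sum()         # P(x) = sum_i P(x | C_i) P(C_i)
    return joint / evidence


# Hypothetical values for a single observation x and two classes.
post = posteriors(likelihoods=[0.6, 0.2], priors=[0.3, 0.7])
print(post)
```

Even though the second class has the larger prior, the likelihood of the first class is high enough here to give it the larger posterior.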
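\[
\begin{aligned}
p(\text{error}) &= \sum_{i=1}^{C} p(\text{error} \mid C_i)\, P(C_i) \\
&= \sum_{i=1}^{C} P(C_i) \left[ 1 - \int_{R_i} p(x \mid C_i)\, dx \right] \\
&= 1 - \overbrace{\left[ \sum_{i=1}^{C} P(C_i) \int_{R_i} p(x \mid C_i)\, dx \right]}^{(A)} \qquad (1)
\end{aligned}
\]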
\[
p(\text{correct}) = \sum_{i=1}^{C} \int_{R_i} \max_i \left[ P(C_i)\, p(x \mid C_i) \right] dx
\]
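The probability of a correct decision can be checked numerically. The sketch below assumes two one-dimensional Gaussian classes with equal priors; these parameters, the grid, and the helper `gauss_pdf` are assumptions made only for the example. It integrates the largest joint density P(C_i) p(x|C_i) over a fine grid.

```python
import numpy as np


def gauss_pdf(x, mean, std):
    """One-dimensional Gaussian density."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


# Hypothetical two-class problem.
priors = np.array([0.5, 0.5])
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])

# Grid wide enough that the tails outside it are negligible.
x = np.linspace(-10.0, 12.0, 200001)
dx = x[1] - x[0]

# joint[i, n] = P(C_i) p(x_n | C_i)
joint = priors[:, None] * gauss_pdf(x[None, :], means[:, None], stds[:, None])

# Picking the largest joint density at every x defines the regions R_i
# implicitly, so a Riemann sum of the maximum approximates p(correct).
p_correct = np.sum(joint.max(axis=0)) * dx
p_error = 1.0 - p_correct
print(p_correct, p_error)
```

For this symmetric setup the decision boundary lies midway between the means, at x = 1, and the error comes out at roughly 0.159.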
\[
r_k = \sum_{i=1}^{C} \lambda_{ki} \int_{R_i} p(x \mid C_k)\, dx
\]
Not all errors are equal II
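- The goal is to choose regions $R_i$ such that the expected or average
  risk is minimized. The expected risk is:
\[
\begin{aligned}
r &= \sum_{k=1}^{C} P(C_k)\, r_k \\
&= \sum_{k=1}^{C} P(C_k) \sum_{i=1}^{C} \lambda_{ki} \int_{R_i} p(x \mid C_k)\, dx \\
&= \sum_{i=1}^{C} \int_{R_i} \left( \sum_{k=1}^{C} \lambda_{ki}\, p(x \mid C_k)\, P(C_k) \right) dx \qquad (2)
\end{aligned}
\]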
- To minimize the risk (2), choose the regions $R_i$ so that each $R_i$ is
  given the suitable label $C_i$. This means minimizing each of the $C$
  integrals, which is achieved if $x$ is given label $C_i$ as follows:
  $x \in R_i$ if $\ell_i < \ell_j$, $\forall j \neq i$, where
  $\ell_i = \sum_{k=1}^{C} \lambda_{ki}\, p(x \mid C_k)\, P(C_k)$ and
  $\ell_j = \sum_{k=1}^{C} \lambda_{kj}\, p(x \mid C_k)\, P(C_k)$ with $j \neq i$.
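The rule above can be sketched in a few lines. Following the risk formula, entry `loss[k, i]` holds the cost of deciding class i when the true class is k. The function name `min_risk_label`, the 0-based labels, and all numbers are hypothetical, picked only to show how an asymmetric loss can change the decision.

```python
import numpy as np


def min_risk_label(likelihoods, priors, loss):
    """Return the 0-based label with the smallest ell_i, and all ell values.

    loss[k, i] is lambda_ki: the cost of choosing C_i when the truth is C_k.
    """
    joint = np.asarray(likelihoods, dtype=float) * np.asarray(priors, dtype=float)
    loss = np.asarray(loss, dtype=float)
    # ell[i] = sum_k loss[k, i] * p(x | C_k) P(C_k)
    ell = loss.T @ joint
    return int(np.argmin(ell)), ell


# Hypothetical observation with two classes.
likelihoods = [0.6, 0.2]
priors = [0.3, 0.7]

zero_one = [[0, 1], [1, 0]]      # all errors cost the same
costly_miss = [[0, 1], [5, 0]]   # deciding class 0 when truth is class 1 costs 5

label_01, ell_01 = min_risk_label(likelihoods, priors, zero_one)
label_cm, ell_cm = min_risk_label(likelihoods, priors, costly_miss)
print(label_01, ell_01)
print(label_cm, ell_cm)
```

With equal costs the first class wins; once the second kind of error is made five times as expensive, the decision flips to the second class.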
Not all errors are equal III
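- The above is really a weighted version of the BDR with $\lambda_{ki}$ as
  weights. The loss/risk can be specified by a loss/risk matrix:
\[
\begin{pmatrix}
\lambda_{11} & \cdots & \lambda_{1C} \\
\vdots & \ddots & \vdots \\
\lambda_{C1} & \cdots & \lambda_{CC}
\end{pmatrix}
\]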
\[
\lambda_{ki} =
\begin{cases}
0 & k = i \\
1 & k \neq i
\end{cases}
\]
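With this 0-1 loss the weighted rule reduces to picking the largest posterior. A short derivation sketch, using only the quantities already defined and the total-probability identity $p(x) = \sum_{k} p(x \mid C_k)\, P(C_k)$:

```latex
\begin{aligned}
\ell_i &= \sum_{k=1}^{C} \lambda_{ki}\, p(x \mid C_k)\, P(C_k)
        = \sum_{k \neq i} p(x \mid C_k)\, P(C_k) \\
       &= p(x) - p(x \mid C_i)\, P(C_i)
        = p(x) \left[ 1 - P(C_i \mid x) \right]
\end{aligned}
```

Since $p(x)$ does not depend on $i$, the smallest $\ell_i$ belongs to the class with the largest posterior $P(C_i \mid x)$.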
\[
p(x \mid C_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}}
\exp\left[ -\frac{1}{2} (x - \bar{\mu}_i)^T \Sigma_i^{-1} (x - \bar{\mu}_i) \right]
\]
\[
w_i = \Sigma_i^{-1} \bar{\mu}_i, \qquad
w_{i0} = -\frac{1}{2} \bar{\mu}_i^T \Sigma_i^{-1} \bar{\mu}_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(C_i)
\]
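The sketch below shows how these coefficients can be used to classify a point. It assumes the usual quadratic discriminant form $g_i(x) = x^T W_i x + w_i^T x + w_{i0}$ with $W_i = -\frac{1}{2} \Sigma_i^{-1}$; that form is not shown in the formulas above and is an assumption here. It equals $\ln p(x \mid C_i) + \ln P(C_i)$ up to a constant shared by all classes. The function name and the example parameters are hypothetical.

```python
import numpy as np


def gaussian_discriminant(x, mean, cov, prior):
    """Evaluate g_i(x) = x^T W_i x + w_i^T x + w_i0 for one Gaussian class."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    cov_inv = np.linalg.inv(cov)
    _, log_det = np.linalg.slogdet(cov)   # ln |Sigma_i|, computed stably

    W = -0.5 * cov_inv                    # assumed quadratic term
    w = cov_inv @ mean                    # w_i
    w0 = -0.5 * mean @ cov_inv @ mean - 0.5 * log_det + np.log(prior)  # w_i0
    return x @ W @ x + w @ x + w0


# Hypothetical two-class example in two dimensions.
x = np.array([1.0, 0.5])
g1 = gaussian_discriminant(x, mean=[0.0, 0.0], cov=np.eye(2), prior=0.5)
g2 = gaussian_discriminant(x, mean=[2.0, 2.0],
                           cov=[[2.0, 0.3], [0.3, 1.0]], prior=0.5)

# Assign x to the class with the larger discriminant value.
label = 1 if g1 > g2 else 2
print(g1, g2, label)
```

If all classes share one covariance matrix, the quadratic term is the same for every class and can be dropped, which leaves a discriminant that is linear in x.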