Machine Learning (Zhou Zhihua)
[Mitchell ,
(learning
et a l., 2001].
2
(data
(attribute
(attribute (samp1e
(feature vector).
=
Xi = (X i1; Xi2; . . . ;
), (dimensionality).
(training) ,
(regression).
(binary (positive
(negative
(multi-class
Y1) , (X2 , Y2) ,..., (Xm ,
=
(testing
(testing
f(x).
(supervised (unsupervised
(unseen instance).
(distribution)
(independent and identically
4
(inductive .
1
2
3
4
"
1.3 5
*) ^
x 3 x 3+1=
7 11
(version
6
*) ^ *)"
(inductive bias) ,
(feature
1. 4 7
y
&
B
,
/' "
-,
\ A
(Occam's
= _X 2 + 6x +
*) ^ *) ^
8
y y ,
B
J
/
B
, ‘BEES
(a) (b)
I (1. 1)
h æEX-X
P(h I
f f h
I X ,'ca)
= L IX,
æEX-X h
h
1. 4 9
f) , (1. 3)
f f
Free Lunch
[Wolpert , 1996; Wolpert and Macready , 1995].
*)
*) ^
^
"
10
(Logic
(General Problem
A.
E. A.
S. Michalski
B.
J.
S. G.
[Michalski et a l.,
1. 5 11
G.
[Carbonell ,
R. S. et a l.,
E. A.
[Cohen and
S.
Logic
12
S.
J. J.
D. E.
(statistical
Vector
(kernel
N. J.
1. 6 13
Intel
14
and DeCoste ,
ZJZ;;225;
T.
(data
1.6 15
S.
S.
16
(Sparse Distributed
http://www.cs.waikato.
ac.nz/ml/weka j.
[Michalski et a l.,
Morgan
1.7 17
A.
and Feigenbaum ,
[Dietterich ,
et a l.,
1996;
(principle of multiple
explanations)
Transactions on Pattem
Analysis and Machine Com-
on Neural
19
1.1
1. 2
^ *))
v ^ ,
^
V (A = *)
=
1. 3
1.4*
= I
h
1. 5
20
(2007). 3(12):35-44.
(John McCarthy, ,
and
If a learner misclassifies a of the m samples, its error rate is E = a/m, i.e. (a/m) × 100%, and its accuracy is 1 − a/m. More generally, the difference between the learner's prediction and the ground-truth output is called the error; the error measured on the training set is the training error (empirical error), and the error on new, unseen samples is the generalization error.
24
(model
(testing
(testing
2.2 25
x 30% = 70%.
(stratified
26
Cross-validation first partitions the dataset D into k mutually exclusive subsets of similar size: D = D1 ∪ D2 ∪ … ∪ Dk, with Di ∩ Dj = ∅ for i ≠ j. In each round, the union of k − 1 subsets is used for training and the remaining subset for testing; this yields k training/testing rounds, whose test results are averaged. The procedure is called k-fold cross-validation, and k = 10 is the most common choice (10-fold cross-validation; Figure 2.2 illustrates the partition D1, …, D10 and the ten train/test rounds).
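The partitioning just described can be sketched directly in code; `k_fold_indices` and `cross_validate` are illustrative helper names, not from the book:

```python
def k_fold_indices(m, k):
    """Partition sample indices 0..m-1 into k mutually exclusive folds
    of near-equal size, as k-fold cross-validation requires."""
    return [list(range(i, m, k)) for i in range(k)]

def cross_validate(m, k):
    """Yield (train, test) index pairs: each fold serves as the test
    set once while the remaining k-1 folds form the training set."""
    folds = k_fold_indices(m, k)
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In practice the folds are usually drawn by stratified sampling so that each fold preserves the class proportions of D; the sketch above splits by index only.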
2.2 27
Bootstrapping [Efron and Tibshirani, 1993] draws m samples with replacement from a dataset D of m samples to form D'. The probability that a given sample is never drawn is (1 − 1/m)^m, which tends to 1/e ≈ 0.368 as m → ∞.   (2.1)
Thus about 36.8% of the original samples never appear in D'; they can be used as the test set. The resulting estimate is called the out-of-bag estimate.
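A minimal sketch of the bootstrap split (the function name and the fixed seed are illustrative assumptions):

```python
import random

def bootstrap_split(m, seed=0):
    """Draw m sample indices with replacement (the bootstrap sample);
    the indices that never appear form the out-of-bag test set."""
    rng = random.Random(seed)
    picked = [rng.randrange(m) for _ in range(m)]
    appeared = set(picked)
    out_of_bag = [i for i in range(m) if i not in appeared]
    return picked, out_of_bag
```

For large m the out-of-bag fraction concentrates near 1/e ≈ 0.368, matching Eq. (2.1).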
28
(parameter tuning).
(validation
measure).
2.3 29
(2.2)
E (f ;Ð) = (2.3)
(2 .4)
1- E (f ;D) .
, (2.6)
30
acc (f; Ð) L EM Z
(2.7)
The confusion matrix for binary classification:

                  predicted +    predicted −
   actual +           TP             FN
   actual −           FP             TN

Precision P and recall R are defined as

    P = TP / (TP + FP),                          (2.8)
    R = TP / (TP + FN).                          (2.9)
2.3 31
10
./
0.2
Ol 0 2 04 0 6
The Break-Even Point (BEP) is the value at which precision equals recall. A more commonly used measure is F1, the harmonic mean of precision and recall:

    F1 = 2 × P × R / (P + R) = 2 × TP / (N + TP − TN),   (2.10)

where N is the total number of samples. When n confusion matrices are available (e.g. from multiple binary tasks), one can first compute (P_i, R_i) on each matrix and then average, giving the macro-averaged measures:

    macro-P = (1/n) Σ_{i=1}^{n} P_i,   (2.12)
    macro-R = (1/n) Σ_{i=1}^{n} R_i,   (2.13)
    macro-F1 = 2 × macro-P × macro-R / (macro-P + macro-R).   (2.14)

Alternatively, one can first average the matrix elements to obtain TP̄, FP̄, TN̄, FN̄ and then compute the micro-averaged measures:

    micro-P = TP̄ / (TP̄ + FP̄),   (2.15)
    micro-R = TP̄ / (TP̄ + FN̄),   (2.16)
    micro-F1 = 2 × micro-P × micro-R / (micro-P + micro-R).   (2.17)
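A minimal sketch of the three averaging schemes; each confusion matrix is represented here as a (TP, FP, FN) triple, and the helper names are our own. Note that averaging the matrix elements or summing them gives the same micro-P and micro-R, since the 1/n factor cancels in the ratio:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from one confusion matrix (Eqs. 2.8-2.10)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def macro_f1(matrices):
    """Average P and R over the matrices, then combine (Eq. 2.14)."""
    ps = [prf(*m)[0] for m in matrices]
    rs = [prf(*m)[1] for m in matrices]
    mp, mr = sum(ps) / len(ps), sum(rs) / len(rs)
    return 2 * mp * mr / (mp + mr)

def micro_f1(matrices):
    """Pool the matrix elements first, then compute P and R (Eq. 2.17)."""
    tp = sum(m[0] for m in matrices)
    fp = sum(m[1] for m in matrices)
    fn = sum(m[2] for m in matrices)
    return prf(tp, fp, fn)[2]
```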
2.3.3
(cut
(Receiver Operating
[Spackman ,
    TPR = TP / (TP + FN),   (2.18)
    FPR = FP / (TN + FP).   (2.19)
(Figure 2.4: ROC curve and the area under it, AUC.)

AUC (Area Under ROC Curve) can be estimated by summing the trapezoids under the ROC curve:

    AUC = (1/2) Σ_{i=1}^{m−1} (x_{i+1} − x_i)(y_i + y_{i+1}).   (2.20)

AUC is closely tied to the ranking quality of the predictions: over all positive-negative sample pairs, the ranking loss ℓ_rank charges 1 when the positive sample is scored lower than the negative one and 1/2 when the two are scored equally, and

    AUC = 1 − ℓ_rank.   (2.22)
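Equation (2.22) says AUC is one minus the pairwise ranking loss, i.e. the probability that a randomly drawn positive sample is scored above a randomly drawn negative one, with ties counting one half. A minimal sketch of this rank-based computation (`auc` is our own helper name):

```python
def auc(pos_scores, neg_scores):
    """AUC = 1 - l_rank: the fraction of (positive, negative) pairs in
    which the positive sample is scored higher, counting ties as 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```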
(unequa1 cost).
(cost
costii =
17'1 / .,
>
costl0 5 1 qa-
9"-
36
(total
E (f; D; cost) (f
+ L I. (2.23)
(cost
pX costOl
(2.24)
+ (1 - p)
(normaliza-
FNR == 1 -
2.4 37
0.5 1. 0
2010]
=
38
(1 - x
P(Ê; f) = (2.26)
f)/8f =
=
0.25
0.20
0.15
0, 10
= 0.3)
(binomial
e = maxf (2.27)
2 .4 39
('Binomial' ,
www.r-project.org
(2.28)
(2.29)
EO)
(2.30)
-10 5
Tt
‘4 5 10
= 10)
1
40
-1).
Commonly used critical values for the two-tailed t-test:

       k        2        5       10       20       30
   α = 0.05  12.706   2.776    2.262    2.093    2.045
   α = 0.10   6.314   2.132    1.833    1.729    1.699
(paired t-tests
.ð. 1 , .ð.2 , . . .
(2.31)
x
2.4 41
1998].
0.5(ßi +
f..L
Tt = (2.32)
5
0.2 I: ut
2.4 .3
(contingency table)
eOO eOl
eîo e l1
-
le01 -
- e lO l- 1)2
(2.33)
^
('Chisquare' , 0.1
= 2
42
2 .4.4
2,
Dl 1 2 3
D2 l 2.5 2.5
D3 1 2 3
D4 1 2 3
1 2.125 2.875
+ -
- - (N -1)Tx2
YF Z N(k-1)-TX2' (2.35)
The variable τ_F follows an F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. Commonly used critical values for F:

α = 0.05
  N \ k      2       3       4       5       6       7       8       9      10
   4      10.128   5.143   3.863   3.259   2.901   2.661   2.488   2.355   2.250
   5       7.709   4.459   3.490   3.007   2.711   2.508   2.359   2.244   2.153
   8       5.591   3.739   3.072   2.714   2.485   2.324   2.203   2.109   2.032
  10       5.117   3.555   2.960   2.634   2.422   2.272   2.159   2.070   1.998
  15       4.600   3.340   2.827   2.537   2.346   2.209   2.104   2.022   1.955
  20       4.381   3.245   2.766   2.492   2.310   2.179   2.079   2.000   1.935

α = 0.1
  N \ k      2       3       4       5       6       7       8       9      10
   4       5.538   3.463   2.813   2.480   2.273   2.130   2.023   1.940   1.874
   5       4.545   3.113   2.606   2.333   2.158   2.035   1.943   1.870   1.811
   8       3.589   2.726   2.365   2.157   2.019   1.919   1.843   1.782   1.733
  10       3.360   2.624   2.299   2.108   1.980   1.886   1.814   1.757   1.710
  15       3.102   2.503   2.219   2.048   1.931   1.845   1.779   1.726   1.682
  20       2.990   2.448   2.182   2.020   1.909   1.826   1.762   1.711   1.668
If the null hypothesis "all algorithms perform equally" is rejected, a post-hoc test is used, e.g. the Nemenyi test, whose critical difference is

    CD = q_α sqrt( k(k + 1) / (6N) ),   (2.36)

with commonly used values of q_α:

       k        2       3       4       5       6       7       8       9      10
   α = 0.05  1.960   2.344   2.569   2.728   2.850   2.949   3.031   3.102   3.164
   α = 0.1   1.645   2.052   2.291   2.459   2.589   2.693   2.780   2.855   2.920
44
=
QO.05 = =
1. 0 3.0
(bias-variance
f(x;
2.5 45
[(YD-y)2] (2.39)
= ED [(f 2]
= ED [(f 2J
variance
Tibshirani ,
1989],
2.6 47
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
- . (2 .43)
, x-x
(2 .44)
2.9
2.10*
49
Bradley, A. P. (1997). "The use of the area under the ROC curve in the evaluation of machine learning algorithms." Pattern Recognition, 30(7):1145-1159.
Breiman, L. (1996). "Bias, variance, and arcing classifiers." Technical Report 460, Statistics Department, University of California, Berkeley, CA.
Demsar, J. (2006). "Statistical comparison of classifiers over multiple data sets." Journal of Machine Learning Research, 7:1-30.
Dietterich, T. G. (1998). "Approximate statistical tests for comparing supervised classification learning algorithms." Neural Computation, 10(7):1895-1923.
P. uni
CA.
.
Gosset ,
(Student's t-test).
,
Pearson,
(University College
=
f(æ) + b, (3.1)
f(æ) (3.2)
(un-
derstandability)
=
54
(0 , (1 , 0, 0).
(3.3)
(5quare 1055)
== arg min ) (f
i:i
(Euclidean
(least
estimation).
434-t
no-nu-
z'
'b\lll-/
(3.5)
Z
(3.6)
f(x) =
Z Yi(Xi - x)
(3.7)
3.2 55
(3.8)
f(Xi) + ,
= x
X11 X12
= (y - X 'ÛJ )T (y -
2X 1 y) (3.10)
definite ma-
(3.11)
=
56
(3.13)
(log-linear
U
30
20
2 X
3.3 57
y = (3.15)
(generalized linear
(link
g(-) =
the unit-step function:

    y = 0,    z < 0;
        0.5,  z = 0;                          (3.16)
        1,    z > 0,

which predicts positive for z > 0, negative for z < 0, and either class for z = 0.
58
the logistic function is used as a surrogate function:

    y = 1 / (1 + e^{−z}).   (3.17)

Substituting z = w^T x + b gives

    y = 1 / (1 + e^{−(w^T x + b)}).   (3.18)
which can be rewritten as

    ln( y / (1 − y) ) = w^T x + b.   (3.19)

Viewing y as the probability of x being positive and 1 − y as the probability of it being negative, the ratio

    y / (1 − y)   (3.20)

is the odds, and its logarithm

    ln( y / (1 − y) )   (3.21)

is the log odds (logit).
(logistic
3.3 59
= 11
p(y = 11 x)
=wTx+b. (3.22)
p(y = 01 x)
A wTæ+b
(3.23)
'
p(U=01z)=1 (3.24)
.
(maximum likelihood
(log-
likelihood)
1 (3.25)
X = (x; = 1 1
= p(y = 0 1 = 1-
+ . (3.26)
= E . (3.27)
and
Vandenberghe , descent
= . (3.28)
+1 at \
(3.29)
60
: Xi(Yi - P1 , (3.30)
mtt
ß )) . (3.31)
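The maximum-likelihood estimate of β = (w; b) has no closed form; the text applies Newton's method, but any numerical optimizer of the convex log-likelihood works. A minimal gradient-descent sketch (the learning rate, iteration count, and the helper names `fit_logistic` and `predict` are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, iters=2000):
    """Gradient descent on the negative log-likelihood: the gradient
    w.r.t. beta is -sum_i x_i (y_i - p1(x_i; beta)), cf. Eq. (3.30)."""
    # beta = (w; b): append a constant 1 to each input, x_hat = (x; 1).
    beta = [0.0] * (len(xs[0]) + 1)
    data = [list(x) + [1.0] for x in xs]
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for x, y in zip(data, ys):
            p1 = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for j, xi in enumerate(x):
                grad[j] -= xi * (y - p1)
        beta = [b - lr * g / len(data) for b, g in zip(beta, grad)]
    return beta

def predict(beta, x):
    return sigmoid(sum(b * xi for b, xi in zip(beta, list(x) + [1.0])))
```

Newton's method converges in far fewer iterations on this objective; gradient descent is shown only because it keeps the sketch short.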
Discriminant
Xl
"
=
3.4 61
- wT :E ow +
(3.32)
scatter matrix)
+:E 1
(3.33)
æEXo
scatter matrix)
Sb = , (3.34)
(3.35)
(generalized
Rayleigh
1,
(3.36)
s.t. wTSww = 1.
Sb W (3.37)
62
Sb W , (3.38)
. (3.39)
St = Sb + Sw
, (3.40)
N
(3 .4 1)
SWi = L (3.42)
Sb = St - Sw
T
m (3 .43)
Sw ,
3.5 63
tr (WTSbW)
(3 .44)
,
E
. (3 .45)
(classifier)
=} h -7 "+"
(Error Correcting
Output
ECOC [Dietterich and Bakiri ,
3.5 65
(coding
and et a l.,
11 h fa 14 15 h h fa 16
•• ••
C1• • 32 V3 C1 • • 4 4
C •
2 • 4 4 C2 • • 2 2
C •
3 • 1 2 C3 • • 5
C •
4 • 22 v'2
••
66
y >
(3 .46)
l-y
(3 .4 7)
3.7 67
y' y m
1- y' 1- Y --
m+
(3.48)
(rebal-
(rescaling).
ance).
(upsam-
pling)
(threshold-moving).
[Chawla
et a l.,
EasyEnsemble [Liu et
(cost-sensitive
(sparse
(sparsity
LASSO
[Tibshirani ,
68
et al.,
[Crammer and Singer ,
et 2006 , 2008].
(Directed Acyclic
et al.,
(misclassification
and Liu ,
and Liu ,
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8*
3.9
70
J
Crammer, K. and Y. Singer. (2002). "On the learnability and design of
codes for multiclass problems." Machine Learning, 47(2-3):201-233.
"3L"
74
A)
2: if then
3:
4: end if
5: if A= 0
6: return
7: end if
for do
10:
11: if
12: return
13: else
14: A\
15: end if
16: end for
4.2 75
Information entropy is the most common purity measure. Let p_k (k = 1, 2, …, |Y|) be the proportion of class k in sample set D; then

    Ent(D) = − Σ_{k=1}^{|Y|} p_k log2 p_k.   (4.1)

The smaller Ent(D) is, the purer D is. For a discrete attribute a with V possible values, splitting D into subsets D^1, …, D^V yields the information gain

    Gain(D, a) = Ent(D) − Σ_{v=1}^{V} (|D^v| / |D|) Ent(D^v).   (4.2)

The ID3 (Iterative Dichotomiser 3) decision-tree algorithm selects the split attribute by maximizing information gain.
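As a concrete illustration of the entropy and information-gain definitions (Eqs. 4.1 and 4.2); the helper names `entropy` and `info_gain` are our own:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Ent(D) = -sum_k p_k log2 p_k  (Eq. 4.1)."""
    m = len(labels)
    return -sum((c / m) * log2(c / m) for c in Counter(labels).values())

def info_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v)  (Eq. 4.2),
    where `values` holds attribute a's value for each sample."""
    m = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        gain -= len(subset) / m * entropy(subset)
    return gain
```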
(8. 8 9. 9\
1"'7) =
76
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
D1
D2 D3
4, 6 , 10 , 13 ,
3 , 7, 8 , 9 ,
= = 11 , 12 , 14,
=
fLlDU!
IDI
= x
= 0.143; = 0.141;
= 0.381; = 0.289;
= 0.006.
2, 3 , 4 , 5, 6 , 8 , 10 ,
= 0.043; = 0.458;
= 0.331; = 0.458;
= 0.4 58.
78
the gain ratio:

    Gain_ratio(D, a) = Gain(D, a) / IV(a),   (4.3)

where

    IV(a) = − Σ_{v=1}^{V} (|D^v| / |D|) log2 (|D^v| / |D|)   (4.4)

is the intrinsic value of attribute a; it grows with the number of values a can take.
= 0.874 (V = 2) , = 1.580 (V = 3) ,
= 4.088 (V = 17).
C4.5
4.3 79
The CART (Classification and Regression Trees) algorithm [Breiman et al., 1984] uses the Gini index:

    Gini(D) = 1 − Σ_{k=1}^{|Y|} p_k²,   (4.5)

and selects the attribute minimizing the Gini index over the candidate set A:

    a* = argmin_{a∈A} Gini_index(D, a).
pruning) [Quinlan,
80
4.3 81
4%
4%
4%
100% = 42.9%
(decision
stump).
4.4 83
1%
4%
1993] .
t
84
, (4.7)
[Quinlan , 1993].
Gain(D , a) t)
-LEnt(Df) , (4.8)
IDI
a,
12345678
0.697 0.460
0.774 0.376
0.634 0.264
0.608 0.318
0.556 0.215
0.403 0.237
0.481 0.149
0 .437 0.211
0.666 0.091
0.243 0.267
0.245 0.057
0.343 0.099
0.639 0.161
0.657 0.198
0.360 0.370
0.593 0.042
0.719 0.103
4.4 85
{0.049 , 0.074 , 0.095 , 0.101 , 0.126 , 0.155 , 0.179 , 0.204 , 0.213 , 0.226 , 0.250 , 0.265 ,
0.292 , 0.344 , 0.373 ,
= 0.109; = 0.143;
= 0.141; = 0.381;
= 0.289; = 0.006;
= 0.262; = 0.349.
.
86
7, 14 ,
..,
ÌJ k
= 1 , 2 , .. . , ÌJ k ,
LZLEL-L
(4.9)
~pf
(1 k IYI) , (4.10)
(1 v V) . (4.11)
4.4 87
Gain(D , a) = p x Gain(D , a)
\tll1/
× En4L ~D En 4tu
ND 4 -i9"
IYI
Ent(15) = -
Ent(15) = - LPk
(6. 6 8. 8\
= - I -::-: . -::-: 14 J = 0.985.
\11FE/\11/
NDND
EE
MM)) - - -
oo
ubub -
oo
FDFb nu03
nu--
1inu
q4qL
.•
88
14
0.306 = 0.252 .
17
= 0.252; = 0.171;
= 0.145; = 0 .424;
= 0.289; = 0.006.
9, 13 , 14 , 12 ,
4.5 89
0.697 0.460
0.774 0.376
0.634 0.264
0.608 0.318
0.556 0.215
0.403 0.237
0.481 0.149
0.437 0.211
,.•
0.666 0.091
0.243 0.267
0.245 0.057
0.343 0.099
0.639 0.161
0.657 0.198
0.360 0.370
0.593 0.042
0.719 0.103
90
06EZE +
+
+
+
o. 2
o O. 2 O. 4 O. 6 O. 8
(multivariate decision
(oblique
decision tree).
(univariate decision
4.5 91
/ '-....
O. 2
o 0.2 O. 4 O. 6 0.8
92
[Quinlan ,
4.1
4.2
4.3
4.4
4.5
http://archive.ics.uci.edu/mlj. 4.6
4.7
4.8*
4.9
4.10
94
achine
Kohonen 1988
Networks
[Kohonen ,
U
98
(activation
1.0
(b)
• (Xl ^ = 1, e= f(l . Xl + 1 . X2 -
5.2 99
'W l/ \'W 2
Xl X2
= X2 = = 1;
• = = 0, e= f( -0.6. Xl + O.
X2 = = = 1.
(dummy
(5.1)
ý)Xi , (5.2)
and Papert ,
100
(Figure: the "AND", "OR", and "NOT" problems are linearly separable, while "XOR" is not; a two-layer perceptron with one hidden layer solves "XOR".)
5.3 101
(connection
Yl) ,
.. . , (Xm , Ym)} , Xi E Yi
=
102
Yl
Xl Xi Xd
Bj) , (5.3)
(5.4)
+
x
. (5.5)
(5.6)
5.3 103
_
(5.7)
't.
(5.8)
f'(x)=f(x)(1-f(x)) , (5.9)
- yj) f'
- fjj)(yj (5.10)
6. W hj (5.11)
6. ()j (5.12)
6. Vih , (5.13)
(5.14)
åb h
104
=
j=l
= bh (l - . (5.15)
2: repeat
3: for all (Xk ,
4:
5:
6:
7:
8: end for
9:
, (5.16)
5.3 105
0.53 1.72
2f .
1
of + +
2
error
(one round ,
gradient
[Hornik et al.,
(early
(5.17)
(local
(global
v 0) 111(w;
w;
v ,- , -,
5 .4 107
algorithms) [Goldberg ,
108
5.5.1
RBF(Radial and
= Ci) , (5.18)
= (5.19)
5 .5 .2
(stability-
plasticity
5.5.3
SOM(Self-Organizing
(Self-Organizing Fea-
matching
110
(construc-
(
5.5 111
5.5.5
neural (recurrent neural
networks"
1987].
5.5.6
(energy-based
E {O ,
E(s) =- (5.20)
i=l j= i+ l i=l
(station
ary distribution)
112
(a)
P(s) (5.21)
Boltzmann
(Contrastive
= rr (5.22)
P (hj I v) (5.23)
j=1
(5.24)
5.6 113
(deep
layer-wise
(pre- (fine-
belief [Hinton
(weight
Neural
[LeCun and 1995; LeCun et a1.,
et a 1.,
114
et al. , 1998]
(feat ure
(representation learning) .
(feature
5.7 115
[Haykin , [Bishop ,
Computation , Neural
IEEE 'JIransactions on Neural Networks and Learning Systems;
on Neu-
ral Networks.
5.1
5.2
5.3
5.4
5.5
5.6
http://archive.ics.ucí.edu/ml/
5.7
5.8
5.9*
5.10
http://yann.lecun.com/
117
networks architectures."
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Boston, MA.
Gori, M. and A. Tesi. (1992). "On the problem of local minima in backpropagation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):76-86.
Haykin, S. (1998). Neural Networks: A Comprehensive Foundation, 2nd edition. Prentice-Hall, Upper Saddle River, NJ.
Hinton, G. (2010). "A practical guide to training restricted Boltzmann machines." Technical Report UTML TR 2010-003, Department of Computer Science, University of Toronto.
Hinton, G., S. Osindero, and Y.-W. Teh. (2006). "A fast learning algorithm for deep belief nets." Neural Computation, 18(7):1527-1554.
Hornik, K., M. Stinchcombe, and H. White. (1989). "Multilayer feedforward networks are universal approximators." Neural Networks, 2(5):359-366.
Kohonen, T. (1982). "Self-organized formation of topologically correct feature maps." Biological Cybernetics, 43(1):59-69.
Kohonen, T. (1988). "An introduction to neural computing." Neural Networks, 1(1):3-16.
Kohonen, T. (2001). Self-Organizing Maps, 3rd edition. Springer, Berlin.
LeCun, Y. and Y. Bengio. (1995). "Convolutional networks for images, speech, and time-series." In The Handbook of Brain Theory and Neural Networks (M. A. Arbib, ed.), MIT Press, Cambridge, MA.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. (1998). "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324.
D. J. C. (1992). "A 1
Minsky,
Paul
UCSD
!'l"
= {(Xl , Y2) , . . . , (X m , Ym)} ,
Xl
(6.1)
=
122
÷
(6.2)
=
, T'o".J n llA.I <lV1., I V
> v,
_,.,..-
+ b<
b +1, Yi = +1 ; (6.3)
WTXi +b -1.
(6 .4)
Ilwll'
(margin).
X2
+ +/ ...T _ ,
+
Xl
To find the separating hyperplane with maximum margin, we solve

    max_{w,b}  2 / ||w||
    s.t.  y_i (w^T x_i + b) ≥ 1,  i = 1, 2, …, m,   (6.5)

which is equivalent to minimizing ||w||²:

    min_{w,b}  (1/2) ||w||²
    s.t.  y_i (w^T x_i + b) ≥ 1,  i = 1, 2, …, m.   (6.6)
Vector
f(æ) (6.7)
quadratic
(dual
, (6.8)
, (6.9)
(6.10)
( tu )
4·i-t4
max -
124
S.t. = 0 ,
i=l
i =
f(x) =
+b , (6.12)
(Karush-Kuhn-
(> O
(6.13)
(Yd(Xi) - 1) = 0 ..
=
0,
=
et a l.,
:? 0 :? 0 , (6.14)
"uo
(6.15)
=C (6.16)
= 1,
=1 (6.17)
= {i > 0, i =
(6.18)
126
>
(6 .20)
'"
S.t.
T
x WU
U
Z Z nhv
6.3 127
;;;:: 0 , i = 1, 2,..., m .
= = cþ(Xi)Tcþ(Xj) , (6.22)
(ker-
nel trick).
=0,
i=l
;;;:: 0 , i = 1, 2,… , m.
f(x) +b
+b
i=l
(6.24)
(kernel
vector expansion).
128
X1) Xj) Xm )
K= X1) Xm )
X1)
= x 'f Xj
d = = (x 'f Xj)d
= exp
= exp (-
= 0, B < 0
(6.25)
6.4 129
z) (6.26)
= z)g(z ) (6.27)
(80ft
X2
Xl
(hard
130
+ (6.28)
-1) , (6.29)
    ℓ_{0/1}(z) = 1, if z < 0;
                 0, otherwise.          (6.30)
Since ℓ_{0/1} is non-convex and discontinuous, it is replaced in practice by a surrogate loss. Three common choices are:

    hinge loss:        ℓ_hinge(z) = max(0, 1 − z);
    exponential loss:  ℓ_exp(z) = exp(−z);
    logistic loss:     ℓ_log(z) = log(1 + exp(−z)).

Using the hinge loss and introducing slack variables ξ_i ≥ 0 yields the soft-margin support vector machine:

    min_{w,b,ξ_i}  (1/2)||w||² + C Σ_{i=1}^{m} ξ_i   (6.35)
    s.t.  y_i (w^T x_i + b) ≥ 1 − ξ_i,
          ξ_i ≥ 0,  i = 1, 2, …, m.
+ b)) - (6.36)
, (6.37)
(6.38)
(6.39)
132
lllax (6 .40)
a
s.t.
i = 1, 2,… , m.
0 ,
-1 + Çi 0,
(6 .4 1)
(Xi) - 1 + Çi) = 0,
Çi = 0.
= =
=
< =
= =
>
and
6.5 133
C(f (x i) ,
(structural risk)
C(f (Xi) , (empirical risk)
Vector
134
_-0 0
, (6.43)
I 0, if Izl E ;
(6 .44)
=<
l Izl - E , otherwi8e
utLz (6 .45)
f! (z )
if Izl E
otherwise
Z
6.5 135
S.t. E+ Çi ,
f(x;)
0, i = 1 , 2,… , m.
C
mZM
b,
(6 .4 7)
0= , (6.48)
, (6.49)
iii . (6.50)
mhMMA (6.51 )
S.t. =0,
âi c.
136
f_ -
- f(Xi) - =0,
(6.52)
= 0 , ÇiÇi = 0 ,
(0 = 0 , (0 -
=
Yi - f(Xi) - =
- Yi -
-
f(x) = (6.53)
- Yi - f - Çi) = O. <
(6.54)
i=l
(6.55)
6.6 137
f(x) (6.56)
i=1
=
(representer
and
Smola ,
: ]Rm •-7
2fE (6.57)
(kernel
h(x) . (6.59)
138
(6.60)
(6.61)
$
st = - (6.62)
Xi) , (6.64)
(6.65)
(6.66)
mo
(6.67)
m1
M= 11 1) T , (6.68)
6.7 139
(6.69)
(6.70)
and Vapnik ,
(statistical
plane
et a l.,
[Hsieh et al.,
[Tsang et a l.,
and Seeger ,
and
et a l., 2012].
6.1
6.2
csie. ntu.edu. tw/
6.3
6.4
6.5
6.6
6.7
6.8
6.9
6.10*
142
based on risk A ls
145
N. Vapnik ,
"Nothing is
theory."
decision
=
I
loss)
(conditional risk)
(risk)
N
R(Ci I æ) I æ) (7.1)
decision
I
= argminR(c (7.3)
optimal
risk). 1
148
J 0, if i = j ; (7.4)
21-IL otherwise ,
= argmaxP(c I x) , (7.6)
cEY
P(c I I
(discriminative
I
(generative
P(æ , c)
P(c I æ) (7.7)
P(æ)
P (c) P(x I c)
P(c I æ) = (7.8)
P(æ)
P(æ I
(likelihood);
I
I c).
I
7.2 149
I I
2005;
Likelihood
P(Dc = rr (7.9)
ax L L( QUC)
PAV
C
a?4
(7.11)
rv N(!-"c ,
(7.12)
(7.13)
{I'!
I
I
Bayes
(attribute conditional
P(c I I c) , (7.14)
P(z)PW)ff
7.3 151.
), 'hM
3 -aTi FDU ax p c pz c
(7.15)
I C).
For the class prior and the conditional probability of a discrete attribute value, naive Bayes uses the frequency estimates

    P(c) = |D_c| / |D|,   (7.16)
    P(x_i | c) = |D_{c,x_i}| / |D_c|,   (7.17)

where D_c is the set of class-c samples and D_{c,x_i} the subset of D_c taking value x_i on the i-th attribute. For a continuous attribute a density is assumed, typically Gaussian:

    p(x_i | c) = 1/(√(2π) σ_{c,i}) exp( −(x_i − μ_{c,i})² / (2σ_{c,i}²) ).   (7.18)
0.460 ?
17
152
1 c):
1 .... ( 0
1 ..... ( (0.697 -
0.195 ….t-' \ 2.0.195 2 ) -
-
V2ir. 0.101 \ 2.0.101 2 )
1 \ _. (\ fìD t:.'
)'
7.3 153
X 10- 5 .
> 6.80 x
To avoid zero probabilities for attribute values unseen in the training set, smoothing is applied, typically the Laplacian correction:

    P̂(c) = (|D_c| + 1) / (|D| + N),   (7.19)
    P̂(x_i | c) = (|D_{c,x_i}| + 1) / (|D_c| + N_i),   (7.20)

where N is the number of classes in D and N_i the number of possible values of the i-th attribute.
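A direct sketch of the two corrected estimates; the counts in the test follow the watermelon-dataset example in the text (17 samples, 8 positive, 9 negative), and the function names are our own:

```python
def smoothed_class_prior(class_counts, c):
    """P_hat(c) = (|D_c| + 1) / (|D| + N)  (Eq. 7.19),
    where N is the number of classes."""
    total = sum(class_counts.values())
    return (class_counts[c] + 1) / (total + len(class_counts))

def smoothed_likelihood(count_cxi, count_c, n_values):
    """P_hat(x_i | c) = (|D_{c,x_i}| + 1) / (|D_c| + N_i)  (Eq. 7.20),
    where N_i is the number of values attribute i can take."""
    return (count_cxi + 1) / (count_c + n_values)
```

Even a value never observed with class c now gets a small positive probability instead of zeroing out the whole product in Eq. (7.15).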
17 + 2
154
3+1
0+1
(lazy
(semi-naïve Bayes
(One-Dependent
I
7.4 155
SPODE (Super-Parent
mutual information)
I(xi , Xj ;
Xj I
m' [Webb
et al. , 2005].
c,
6+1
y,
(belief
Acyclic
7.5 157
Probability
FB ( Xi
= , Xd
l 7ri) = (7.26)
i=l i=l
j_ X4 I X2.
158
(common
= . (7.27)
(marginal
(marginal-
ization)
j_ X4 I Xl
y j_ z I
(direct
ed)
1988].
(moral
(moralization) [Cowell et a l., 1999].
=
7.5 159
I Xl , X3 1- X2 I X5
X3 1- X5 I
(Minimal Description
Itp
= =
s(B I
= (Akaike Information
= (Bayesian
Information
D) I D) . (7.31
=
s(B I I
(7.32)
et
7.5 161
(evidence).
=
= q I E = e) ,
=
=
=
P(Q =q I E (7.33)
(random
(Markov
P(Q I IE =
= q I E = e).
162
= (G , 8);
1: nn = 0
2: qO
3: for t = 1, 2, . . . , T do
4: for do
5: Z = E U Q \ {Qi};
6: z=euqt-l\{qf-l};
7: IZ = z)j
8: Z
9: qt
10: end for
11: if qt = q then
12: nq = nq + 1
13: end if
14: end for
P(Q = q I E =
"1"
7.6
(latent
(marginal likelihood)
EM (Expectation- et al.,
P(Z I
I
IX,
1983].
descent)
164
and pazzani ,
1991].
[Friedman et al. , [Webb et al.,
(lazy Bayesian Rule) [Zheng and Webb ,
[Kohavi ,
(Bayesian
2006].
J.
1990; Chickering et
[Friedman and Goldszmidt ,
and Domingos ,
1997; Heckerman , 1998].
7.7 165
mÎxture
7.1
7.2*
7.3
7.4
P(Xi I
7.5
7.6
7.7
x 2=
.
7.8
y ..l z I
7.9
7.10
167
Bayes ,
,
(individual
(base learner) ,
(base learning
(component
172
× h1 X h1 × ×
h2 × hz v' × hz × ×
ha v' × d h3 V V x ha × ×
× ×
(8.1)
/ T \
H(x) = sign I ) (8.2)
8.2 Boosting 173
. (8.3)
(Random Forest).
8.2 Boosting
(additive
T
H(x) (8.4)
function) [Friedman et
fexp(H I Ð) = . (8.5)
174
1: 1)1(X) = 11m.
2: for t = 1, 2 ,. ..., T do
3: ht = 'c (D , 1)t);
7:
Ð ,(æ) " f if ht(x) = f(x)
Z, if
iH-
dj-
n··- Zt
1-3JU-AU
A-A
P (f (x) = 11 æ)
(8.7)
P (f (x) =
,-_ P (f (x) = 11
l(1 ... P (f (x) == -11 æ))j
\2-h
=
1 100) = -11 æ)
-1 , P (f (x) = 1 æ) < P (f (x)
1 1 æ)
= y æ) ,
1 (8.8)
consistent
8.2 175
eexp Ðt ) = ]
I Ðt ) = - Et) + (8.10)
1, (1 - Et \
(8.11)
I Ð) =
f(æ)ht(æ)j . (8.12)
= hr(æ) =
Ð) 1_-f(ælH+_ , (æl (,
( 1-
"1_\ 1.. 1_\ , f2(æ)h;(æ )\1
) I
ht(æ) = I Ð)
h
176
=
| I
r • I, (8.14)
h I I
(æ)
Dt(æ) = u. (,.,.\, , (8.15)
I
ht(æ) = I ... r •
,- r - ,-, II
f(æ)h(æ) = 1- , (8.17)
ht(æ) ] (8.18)
D(æ) e-f(æ)Ht(æ)
(æ) =
[e-f(æ)Ht(æ)]
1) (æ)
(æ)]
= Dt( æ) . , (8.19)
8.2 Boosting 177
0.6 • EZE
0.21
o Q2 Q4 Q6 QS
nu Zb
2 0.4 O. 6 0.8
<
(a) J
(c)
178
8.3
8.3.1 Bagging
Bagging [Breiman ,
Bootstrap
sampling).
1: for do
2: ht = .l3 (D , Ðb.)
3: end for
H(x) = argmax I:;;=l[(ht (x)
yEY
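The Bagging procedure above can be sketched directly; `train_bagging`, `bagging_predict`, and the deliberately toy base learner `stump` are illustrative names, not from the book:

```python
import random
from collections import Counter

def train_bagging(data, base_learner, T, seed=0):
    """Train T base learners, each on a bootstrap sample of `data`."""
    rng = random.Random(seed)
    models = []
    for _ in range(T):
        sample = [data[rng.randrange(len(data))] for _ in data]
        models.append(base_learner(sample))
    return models

def bagging_predict(models, x):
    """Combine the base learners by majority voting, as H(x) above."""
    votes = Counter(h(x) for h in models)
    return votes.most_common(1)[0][0]

def stump(sample):
    """Toy 1-D base learner: threshold halfway between class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > t else 0
```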
8.3 179
T (0 (m) +0
[Zhou. 2012].
T
Hoob(æ) = = y) . lI (æ 1:. Dt) , (8.20)
t:l
Eoob . (8.21)
[Breiman ,
180
ji
o
; 0 4
'" 0.2
4
0.2
0.8 0.8
(a) (b)
=
= = log2 d
[Breiman ,
0 .34 0.028
0 .3 2 , - - -\
0.22 0.008
(a) (b)
8 .4 181
Bagging
/
.!
/ < h3 . .
.f ,/ . f \"
• h1 :
..,,' '"1
2000]
averaging)
H(x ) (8.22)
182
• averaging)
T
(8.23)
Breiman T
voting)
T N T
if >
(8.24)
otherwise.
voting)
yoting)
H(x) = . (8.26)
h;
(hard voting).
h; I
(soft voting).
= {(æl ,
1: for t = 1 , 2,… , T do
2: ht =
3: end for
4:
5: for i = 1, 2,..., m do
6: for t = 1, 2,… , Tdo
7: Zit = ht(æi)j
8: end for
9: D' = D' U
10: end for
11: h' =
H(æ) = h'(h1(æ) , h 2(æ) ,..., hT(æ))
=D\
Linear
and Witten ,
2002].
8.5 185
Model A
[Clarke ,
:
(arnbigui ty
I æ) = , (8.27)
A(h I æ) = æ)
I æ) E(hi I
= LWiE(hi I æ) -E(H I æ)
i=l
= E(h I æ) - E(H I æ) . (8.31)
186
Ei =J x)p(x)dx , (8.33)
Ai =J x)p(x (8.34)
E(H).
E =J E(H I (8.35)
E=E-A. (8.36)
and Vedelsby ,
(error-ambiguity decomposition).
8.5 187
= Y2) , . . . , (X m ,
{-1 ,
hi = -1
b d
measure)
d-Mb+c
1, 8 ,;.; = (8.37)
m
coefficient)
- bc
(8.38)
c)(c + d)(b + d)
• Q-statistic )
bc (8.39)
p-1 qa-Ba
= (8 .40)
Pl (8 .41)
m
P2 (8 .42)
m2
188
0 .40 0.40
0.35 0.35
0.30 0.30
0 .25
v
'
.......
0 .1 5
0. 10 r
nuo A 06
U
0 0 .2 0.4 0.6 0.8 - 0.2
( ) B F
(a) D
base
8.5 189
1: for t = 1, 2, . . . ,T do
3: Dt =
4: ht = 5:. (D t )
5: end for
H(x) = (MapFt (x))
yEY
and Schapire ,
Mult iBoosting
[Webb ,
[Demiriz et a l.,
and Wyner ,
2014].
of
and Whitaker , 2003; Tang et a l.,
2012]
(selec-
tive [Rokach , 2010a]; [Zhou et a l.,
[Zhou ,
and
2012]
192
8.1
k
P(H(n)";; k) (8 .43)
i=O
> 0, k = (p -
8.2
-
(8 >
8.3
8.5
8.6
8.7
8.8
Iterative
8.9*
193
Breiman,
(unsupervised
ty
(anomaly
(clustering).
=
= (X í1; Xí2;... ;
Il = = ø
= U7=1
(cluster
(validity
198
(intra-cluster (inter-cluster
(reference (external
(internal
index).
= = {Cl ,
= {q , 02 ,...,
D8 = {(æi , æj)
(i <
c+ d =
• J accard
(9.5)
• and Mallows
(9.6)
9.3 199
ICI(IOI - , (9.8)
dmin(Ci ,
• Bouldin
\
I ) . (9.12)
• )
DI = _min. ir ( ) • (9.13)
l-'Tii: J
(distance
= Xj ; (9.15)
200
= Xj2; . • . ;
The most common choice is the Minkowski distance:

    dist_mk(x_i, x_j) = ( Σ_{u=1}^{n} |x_iu − x_ju|^p )^{1/p}.   (9.18)

For p = 2 it is the Euclidean distance, and for p = 1 the Manhattan distance (also called the city-block distance).
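A one-function sketch of this distance family (helper name ours):

```python
def minkowski(x, z, p=2):
    """Minkowski distance (sum_u |x_u - z_u|^p)^(1/p); p=2 gives the
    Euclidean distance, p=1 the Manhattan (city-block) distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1 / p)
```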
(continuous
(categorical
(numerical
(nominal attribute)
(ordinal
(non-ordinal
attribute)
(Value Difference Metric) [Stanfill and Waltz ,
(9.21 )
9.3 201
(weighted
0 (i =
(similarity
(distance metric
< d3
202
based
9.4.1
LæEGi
et a l.,
Input: sample set D = {x1, x2, …, xm}; cluster number k.
1: randomly select k samples from D as the initial mean vectors {μ1, μ2, …, μk}
2: repeat
3:   C_i = ∅ (1 ≤ i ≤ k)
4:   for j = 1, 2, …, m do
5:     d_ji = ||x_j − μ_i||_2
6:     λ_j = argmin_{i∈{1,…,k}} d_ji
7:     C_{λ_j} = C_{λ_j} ∪ {x_j}
8:   end for
9:   for i = 1, 2, …, k do
10:    μ'_i = (1/|C_i|) Σ_{x∈C_i} x
11:    if μ'_i ≠ μ_i then
12:      μ_i = μ'_i
13:    else
14:      keep the current μ_i unchanged
15:    end if
16:  end for
17: until none of the mean vectors changed
Output: cluster partition C = {C1, C2, …, Ck}
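The algorithm translates almost line-for-line into code; a minimal sketch with illustrative names (initialization by sampling k data points, a fixed iteration cap, and squared Euclidean distance are the assumptions):

```python
import random

def k_means(points, k, iters=100, seed=0):
    """Alternate between assigning each point to its nearest mean
    vector and recomputing each mean, until the means stop changing."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]

    def dist2(p, m):
        return sum((a - b) ** 2 for a, b in zip(p, m))

    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist2(p, means[c]))
            clusters[idx].append(p)
        new_means = [[sum(col) / len(cl) for col in zip(*cl)] if cl else means[i]
                     for i, cl in enumerate(clusters)]
        if new_means == means:
            break
        means = new_means
    return means, clusters
```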
X12 ,
= (0.697;
0.369 , 0.506 ,
03 = {X1 , X2 , X3 , X4 , X21 , X22 , X24 , X25 , X26 , X27 , X28 , X29 , X30}'
(Figure: results of the k-means algorithm on the watermelon dataset after different iteration rounds; "+" marks the current mean vectors.)
(Learning Vector
= {(Xl , Yl ), (Xm ,
9.4 205
, (Xm , Ym)}j
2:
3:
4: i d ji = Ilxj -pil12j
5: djij
6: if Yj =
7: (Xj -
8: else
9: p' = (Xj -
10: end if
11:
12:
, Pq}
p' , (9.25)
=
= . (9.26)
206
(Iossy com
(vector
Ri = {x EX Illx -PiI12:( . (9.27)
(Voronoi tessellation).
Cl, C2 , C2 , Cl , Cl.
= (0.722; 0 .442) .
IEI: (9.28)
E- 1 :
9.4 207
(Figure: results of the LVQ algorithm on the watermelon dataset after different iteration rounds; "+" marks the prototype vectors.)
= Cl ,
PM (
PM(X) . p (x , (9.29)
>
coefficient) =1
208
=
2 ,...
P(Zj =
P (Zj = i) . PM(Xj i)
PM(Zj =i 1 Xj) =
PM(Xj)
. p(Xj :E i)
(9.30)
p(Xj :E l)
. (9.31)
:Ei)
\Ill-/'PA
LL D -qJ
\-atfF/
m 2 QU qdq,"
J.Li , :E i )
:E i ) = 0 , (9.33)
:E 1)
= PM(Zj
9.4 209
(9.34)
= j=1 (9.35)
0, =
(9.36)
p(Xj (9.37)
p(Xj
(9.38)
k}
210
8:
9: •
10:
11: 11 ,,;; k}
12:
13: Ci ,,;;
=
= :1: 6 ,
= 0.219 , 1'12 = =
0.361 , a; = = 0.316
21=(::;;;;:;)Ji=(;:;:::17) z•(:;;;;;:;) ,
9.5 211
(Figure: results of Gaussian mixture clustering on the watermelon dataset after different iteration rounds.)
=
(density-based
5- (neigh-
patial Clustering of Appli-
cations with Noise"
D=
= I : : ; E};
212
= Xi , Pn
0 0...- - 0
h XA
OY\/21 , , \ ,J , J d )
(9.39)
(9.40)
(seed) ,
9.5 213
MinPts).
2: for do
3:
4: if ;;;:: MinPts then
5: 0 = OU{Xj}
6: end if
7: end for
f=D
10: while do
11: f o1d = f;
12: =< 0>;
13: f = f \ {o};
14: while do
15:
16: if then
18:
19: f =f \ t:l i
20: end if
21: end while
22: k= k+ = f o1d \f;
23: O=O\Ck
24: end while
= {Cl, C 2 ,..., Ck}
0.11 , MinPts = 5.
D= X9 , X13 , X14 , X18 , X19 , X24 ,
D = D \ 01 =
X13 , X14 , X24 , X25 , X28 ,
214
(Figure: clusters generated by the DBSCAN algorithm (ε = 0.11, MinPts = 5) after different numbers of rounds.)
"0"
C2 = X17 , X 21} ,
C3 = ;
C4 = X 27 , X28 , X 30 } .
tive N ESti
9.6 215
dorff
9.2 , (9.41)
dmax ( Ci , C j )
_.
= Ifl ax (9.42)
= {X1 , X2 , … , Xm};
dmax
AGNES
216
0.7
0.6
0.4
"'
…
0.2
0.0
C7 = {æ11 , æ12}.
9.7 217
(Figure: results of the AGNES algorithm on the watermelon dataset with the cluster number set to 7, 6, 5, and 4.)
218
silhouette width)
1988; Halkidi et a1., 2001; Maulik and
2002].
k-
[Schölkopf et al., clustering) [von Luxburg ,
et al.,
et al., 2012].
220
9.1
disth(X , Z) = - zl12 . (9 .4 5)
9.3
9.4
9.5
9.6
9.7
9.8
9.9*
9.10*
221
tion
145.
Hinneburg, A. and D. A. Keim. (1998). "An efficient approach to clustering in large multimedia databases with noise." In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD), 58-65, New York, NY.
Hodge, V. J. and J. Austin. (2004). "A survey of outlier detection methodologies." Artificial Intelligence Review, 22(2):85-126.
Huang, Z. (1998). "Extensions to the k-means algorithm for clustering large data sets with categorical values." Data Mining and Knowledge Discovery, 2(3):283-304.
Jacobs, D. W., D. Weinshall, and Y. Gdalyahu. (2000). "Classification with non-metric distances: Image retrieval and class representation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(22):583-600.
Jain, A. K. (2009). "Data clustering: 50 years beyond k-means." Pattern Recognition Letters, 31(8):651-666.
Jain, A. K. and R. C. Dubes. (1988). Algorithms for Clustering Data. Prentice Hall, Upper Saddle River, NJ.
Jain, A. K., M. N. Murty, and P. J. Flynn. (1999). "Data clustering: A review." ACM Computing Surveys, 3(31):264-323.
Kaufman, L. and P. J. Rousseeuw. (1987). "Clustering by means of medoids." In Statistical Data Analysis Based on the L1-Norm and Related Methods (Y. Dodge, ed.), 405-416, Elsevier, Amsterdam, The Netherlands.
Kaufman, L. and P. J. Rousseeuw. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York, NY.
Kohonen, T. (2001). Self-Organizing Maps, 3rd edition. Springer, Berlin.
Liu, F. T., K. M. Ting, and Z.-H. Zhou. (2012). "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data, 6(1):Article 3.
Maulik, U. and S. Bandyopadhyay. (2002). "Performance evaluation of some clustering algorithms and validity indices." IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12):1650-1654.
McLachlan, G. and D. Peel. (2000). Finite Mixture Models. John Wiley & Sons, New York, NY.
Mitchell, T. (1997). Machine Learning. McGraw Hill, New York, NY.
Pelleg, D. and A. Moore. (2000). "X-means: Extending k-means with efficient estimation of the number of clusters." In Proceedings of the 17th International Conference on Machine Learning (ICML), 727-734, Stanford, CA.
Rousseeuw, P. J. (1987). "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis." Journal of Computational and Applied Mathematics, 20:53-65.
Schölkopf, B., A. Smola, and K.-R. Müller. (1998). "Nonlinear component analysis as a kernel eigenvalue problem." Neural Computation, 10(5):1299-1319.
Stanfill, C. and D. Waltz. (1986). "Toward memory-based reasoning." Communications of the ACM, 29(12):1213-1228.
Minkowski ,
- 3) + (33 - 23) =
(Kaunas)
10.1
(lazy
(eager learning) .
//
/ / / / -\Y< \
'+ I /
l / f
l l l+
\ \- /
k =
226
:L p2(clæ)
1 - p2(c* I æ)
and Hart ,
(dense
8=
(103 )20 =
10.2 227
[Bellman , 1957]
(curse of
dimensionality) .
red uction)
.., ...' ..
...
-:'..'
' _' • 9 L & .......
;f d dzh
..
• t .. fz -. .... L'
,::_
.:....
Z E ]R d'x m , d'
- zj ll = distij .
bij = bij
(10.5)
j=1
m m
nunhu
= ,
i=1 j=1
tr(B)
(10.7)
(10.8)
(10.9)
decomposition) , B =
A = ;;:: ... V
nu
Z= Xm
-EA
10.3 229
. (10.12)
<<- d.
z=wTx , (10.13)
Z E
(i =1
Component
230
=
IIwil12 = 1, =0
( <
Zi2;...;
= TWTXi + const
(W T (10.14)
Xi X ;
    min_W  −tr(W^T X X^T W),  s.t.  W^T W = I,   (10.15)

or equivalently

    max_W  tr(W^T X X^T W),  s.t.  W^T W = I.   (10.16)
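The constrained trace maximization is solved by an eigendecomposition of X X^T; the leading projection direction can also be found by power iteration, as in this pure-Python sketch (the name `first_principal_component` and the fixed iteration count are illustrative assumptions):

```python
def first_principal_component(X, iters=200):
    """Power iteration on the scatter matrix of the centered samples to
    find the unit vector w maximizing w^T X X^T w, i.e. the first PC."""
    d = len(X[0])
    # center the samples
    mean = [sum(col) / len(X) for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, mean)] for row in X]
    # scatter matrix S = sum_i x_i x_i^T
    S = [[sum(r[i] * r[j] for r in Xc) for j in range(d)] for i in range(d)]
    w = [1.0] * d
    for _ in range(iters):
        w = [sum(S[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return w
```

For d' > 1 components one would deflate S and repeat, or simply take the top-d' eigenvectors of the eigendecomposition directly.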
10.3 231
X2
0.045
2 Xl
, (10.17)
;;:: ... =
æ ij
., Wd"
).
i= l j= l
t. (10.18)
232
(c)
[Schölkopf et al.,
(10.19)
10 .4 233
, (10.20)
i = 1, 2,. . .
(10.21)
(10.22)
Xj) = . (10.23)
(10.24)
(K)ij A = .
(j = 1, 2, . . .
æ) , (10.25)
234
(manifold
[Tenenbaum et al.,
10.5 235
1: for i = 1, 2,... , rn do
2:
3:
4: end for
Xj);
7: return
Xk
Xk
z
•
• Xl
Xi = + WilXI , (10.26)
(10.27)
Jbm
S.t. = 1,
= (Xi - Xj)T(Xi -
kEQi
(10.28)
l ,sEQi
(10.29)
10.6 237
M = (1 - W? (1 - W) , (10.30)
(10.31)
ZZT = 1.
8: return
=
= , (10.33)
0, W = (W)ii
distance)
C.
Component
[Goldberger et al.,
10.6 239
(10.35)
3 ,
Pi , (10.36)
= (10.37)
-
(10.38)
i=l
et a l., 2005]
(Xi'
mjp - (10.39)
"
240
et a1., 1996];
1997].
[Fisher , [Baudat
and Anouar ,
(Canonical Correlation
[Harden et
view
[Yang et al. ,
[Ye et a 1., Zhou ,
and Bader , 2009].
et al. ,
et a1.,
et
10.7 241
and Saul ,
et al., 2007; Zhan et
et
al.,
[Davis et a l.,
242
10.1
10.2
10.3
10 .4
10.5
10.6
http://vision.ucsd.edu/content
jyale-face-database
10.7
10.8*
10.10
243
Pearson ,
College
,
(relevant
(irrelevant
(feature selection).
(data
(redundant
248
(su bset
(subset
(i =
{D l, D2 ,...,
, (1 1. 1)
11.2 249
IYI
Ent(D) =- (1 1. 2)
Y1) ,
(X2 , Y2) , ..., (x m ,
250
+ , (1 1. 3)
=
= -
and Rendell ,
[Kononenko ,
(l = 1, 2,…, IYI;
8j
11. 3 251
1:
2: d= IAI;
3: A* = A;
4: t = 0;
5: while t < T do
6:
7: d'= IA'I;
8: E' =
9: < d)) then
10: t = 0;
11: E=E';
12: d= d';
13: A* =A'
14: else
15: t=t+l
16: end if
17: end while
252
y l),
I)Yi - WTXi)2 (1 1. 5)
(1 1.6)
(1 1. 7)
(Least Absolute
Selection Operator)
50.
11 .4 253
W1
\!
J
W2
Gradient Descent ,
[Boyd and Vandenberghe ,
minf(x)
z
(11.8)
( -ti1ai
-
-i
-EL
zkrmk jvkzk)
= Xk -
Xk+1 - . (11.13)
( Il. P
I < z' ;
Xk +1 = < 0, (1 1. 14)
I z'!
11. 5 255
(codebook)
(dictionary
learning)
(sparse
(1 1. 15)
256
minllxi - . (1 1. 16)
(1 1. 17)
= A = E ]R kxm , 11 .
[Aharon et a l.,
=
)=1 IIF
X- J - bíoí
\ J7=' / IIF
2F ( )
n E LU
.
4tinxu
= 2:#i
11.6 257
pressíve sens-
sensing) [Donoho , 2006; a l.,
mg
n <<:
y = q, æ , (11.19)
(1 1. 20)
258
x m (n<<
E
-ms s nu
(1 1. 22)
s.t. Y = As .
L1
et a l.,
s.t. Y = As .
11. 6 259
(Basis
Pursuit [Chen et al. , 1998].
(collaborative filter-
"
5 ? 3 2
?
53?· ?
whuq'-qr·
5 ?
3 5 4
and Recht ,
rank(X) (1 1. 24)
s.t. (X)ij =
260
(nuclear norm):
norm)
min{m ,n}
IIXII*= , (1 1. 25)
IIXII* (1 1. 26)
Programming ,
n<<
[Recht , 2011].
and Fu kunaga ,
et al.,
(Akaike Criterion) [Akaike , [Blum and
Langley , [Forman ,
and
John , et al.,
[Quinlan ,
and Pederson , 1997; Jain and Zongker ,
and Elisseeff, 2003; Liu et al.,
1 1. 7 261
LASSO
LASSO [Yuan and
Lin , LASSO [Tibshirani et al.,
(group
sparse coding)
et a l., 2008; Wang et al., 2010].
2006; et al.,
et al. ,
et al., 2010]. [Baraniuk ,
(http://www.yelab.net/software/SLEP/).
262
1 1. 1
1 1.. 2
1 1. 3
11.4
1 1.5
1 1. 6
11.7
1 1. 8
1 1. 9
11.1 0*
263
Candès, E. J., J. Romberg, and T. Tao. (2006). "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information." IEEE Transactions on Information Theory, 52(2):489-509.
Chen, S. S., D. L. Donoho, and M. A. Saunders. (1998). "Atomic decomposition by basis pursuit." SIAM Journal on Scientific Computing, 20(1):33-61.
Donoho, D. L. (2006). "Compressed sensing." IEEE Transactions on Information Theory, 52(4):1289-1306.
Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. (2004). "Least angle regression." Annals of Statistics, 32(2):407-499.
264
68(1):49-67
Zou, H. and T. Hastie. (2005). "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society - Series B, 67(2):301-320.
1867-1918
learning
and identically
E(h;Ð) , (12.1)
Ê(h;D) (12.2)
:::;;
(12.5)
) (12.6)
(12.8)
12.2
Approximately
1984].
=
(concept
class)
(hypothesis
12.2 269
<
P(E(h) E) 1- 8 , (12.9)
< E, 8 <
270
Learning
E, 1/6,
PAC learnable)
;?;
(properly PAC
=1=
1111
12.3 271
P(h(x) = Y) = 1 -
= 1- E(h)
<1 • E • (12.10)
(12.12)
<5, (12.13)
(12.14)
.
272
<E<
P(IE(h) - .
<E<
<8<
exp(
•
(agnostic
PAC
0 8 <
"
m ?:
(12.20)
12.4
(Vapnik-
dimension) 1971].
.
=
E N, 0 < E <
= 2m } . (12.23)
12.4 275
E b} , X = = +1 ,
= 0.5 , X2
{h [o ,l] ' h[O ,2]' h[l ,2J'
< X4 <
(X5 ,
X =
E
= =
1972]:
(12.24)
= 1, =
- 1, d - - 1, = {Xl' X2 , … , Xm } ,
D' = {Xl' X2 ,..., Xm- l} ,
={ I 3h ,
11iID I = (12.25)
(12.26)
1) 1) (12.27)
I
1l D I
1 (m 1) + (m 1)
(n:_-;:l) o.
=
= -=-11) )
{12.28)
12.4 277
= (:tt
m;;'d.
(:t i:; (7)
=
d, 0 <8<
P II E(h)
_",
- ,;:::" , 18d ln
m' - --- 0 I
\
1- 8 . (12.29)
E= 11
m
•
278
(12.30)
Risk
E(g) . (12.31)
K=; ,
(12.32)
_ f- __,..
E(g) E(g)
(12.34)
m 2 '
P( E(h) -
E(h) -
= E(h) - E(g) + E
12.5 279
12.5
Rademach-
er(1892-1969)
= {(X l, Y1) , (x m ,
Ê(h)
1
mz2
(12.36)
ar324ph(24) (12.37)
Yi) ,
280
(12.38)
, (12.39)
Rz(F) =
Rm (F) (12 .4 1)
et
2012]:
+ (12.42)
(12.43)
Êz (f) ,
(12 .44)
282
J J
+3 (12 .46)
•
12.5 283
,x m } , 0 < Ó <
= X x
!h (z) = !h (x , y) = ,
{!h: h
sup 1
J
WL
-
- Yi ih(Xi))]
(J
(12.50)
(12.51)
•
284
et
a l., 2012]
Rm (tl) (12.52)
Dt =
(12.55)
1-m
avb Z D at ZD z 9" vhuEU
stability):
Z =
= z)
[Bousquet and
0< ð<
D) + ß + (4mß + M) (12.59)
ß=
D) - i( 'c,
-
i( 'c, (12.60)
Risk
12.7 287
Jl (g ,V) =
hEH
,
Ed-2'x
E-AVBq"
p mE
I Jl (g , V) - ê( g , D) I ,,;;
Jl('c,
Jl( Jl (g ,
,,;; Jl('c, D) - Jl (g , D) +E
E
[Valiant ,
[Kearns and Vazirani ,
288
and Chervonenkis ,
and
deterministic
et al., 1996].
289
12.1
12.2
12 .4
12.5
12.6
12.7
12.8
12.9
+
12.10*
290
Shelah, S. (1972). "A combinatorial problem; stability and order for models and theories in infinitary languages." Pacific Journal of Mathematics, 41(1):247-261.
Valiant, L. G. (1984). "A theory of the learnable." Communications of the ACM, 27(11):1134-1142.
Vapnik, V. N. and A. Ya. Chervonenkis. (1971). "On the uniform convergence of relative frequencies of events to their probabilities." Theory of Probability and Its Applications, 16(2):264-280.
292
G.
theory of the
learnable"
l<<
(active
!"
294
..
"+"? " _ " ?
••
-•
4
+
•••
(duster
(manifold
13.2 295
296
= {1 , 2,...
N
p(x) p(X , (13.1)
0, = 1;
I(x) = argmaxp(y = j 1 x)
N
= (13.2)
p(x :Ei )
p(8 = i 1 x) = ;. (13.3)
p(x :E i)
= j 1
= j
1 8 =
= j 18 = i) = =j 1 = o.
= j 1 8 = i,
= ...,
13.2 297
Dz
IN \
LL(Dz U Du) = . p(Xj :E i) . p(Yj 1e = i ,xj) 1
/
:E i)
ji
'YIJ" = N (13.5)
. p(Xj :E i)
Z eqJ
(13.6)
:E i =
$30Jz z \z30tL
(13.7)
(13.8)
Mixture-based semi-supervised classifiers of this kind appear in [Miller and Uyar, 1997] and, for text with naive Bayes models, in [Nigam et al., 2000]. The approach is simple, but performance can degrade badly when the assumed generative model does not match the data.

13.3 Semi-Supervised SVM

Semi-supervised support vector machines (S3VM) extend SVMs to partially labeled data under the low-density separation assumption: among the separators consistent with the labeled examples, prefer one that passes through a low-density region of the unlabeled data. The best-known variant, TSVM [Joachims, 1999], assigns each unlabeled example a tentative label (label assignment) and searches over these assignments together with the separator.
Formally, given D_l = {(x1, y1), ..., (x_l, y_l)} and D_u = {x_{l+1}, x_{l+2}, ..., x_m} with l ≪ u and l + u = m, TSVM learns both (w, b) and the assignments ŷ = (ŷ_{l+1}, ..., ŷ_m) ∈ {−1, +1}^u:

min_{w,b,ŷ,ξ}  (1/2)||w||² + C_l Σ_{i=1}^{l} ξ_i + C_u Σ_{i=l+1}^{m} ξ_i   (13.9)
s.t.  y_i (wᵀ x_i + b) ≥ 1 − ξ_i ,  i = 1, 2, ..., l ,
      ŷ_i (wᵀ x_i + b) ≥ 1 − ξ_i ,  i = l+1, l+2, ..., m ,
      ξ_i ≥ 0 ,  i = 1, 2, ..., m ,

where C_l and C_u weight the labeled and unlabeled slack terms. Searching over all ŷ is intractable, so TSVM proceeds iteratively: train an SVM on the labeled data, label the unlabeled points, start with C_u ≪ C_l, and repeatedly swap the tentative labels of pairs ŷ_i, ŷ_j of opposite sign whose slacks satisfy ξ_i + ξ_j > 2 (each swap decreases the objective), doubling C_u until C_u = C_l [Joachims, 1999].
13.4 Graph-Based Semi-Supervised Learning

Map the data to a graph whose nodes are the m samples and whose edge weights encode similarity, e.g. the Gaussian affinity

(W)_ij = exp( −||x_i − x_j||² / (2σ²) )  if i ≠ j,  and 0 otherwise ,

with bandwidth σ > 0. Labels then propagate along heavy edges. Learning f with ŷ_i = sign(f(x_i)) can be cast as minimizing an energy function [Zhu et al., 2003]:

E(f) = (1/2) Σ_{i=1}^m Σ_{j=1}^m (W)_ij ( f(x_i) − f(x_j) )²
     = fᵀ (D − W) f ,   (13.12)

where D = diag(d1, ..., d_m), with d_i = Σ_j (W)_ij, is the degree matrix. Partition into labeled and unlabeled blocks:

W = [ W_ll  W_lu ; W_ul  W_uu ] ,   D = [ D_ll  0 ; 0  D_uu ] .

Minimizing E(f) with f_l fixed to the given labels yields

f_u = (D_uu − W_uu)^{-1} W_ul f_l .   (13.16)

Writing P = D^{-1} W, with P_uu = D_uu^{-1} W_uu and P_ul = D_uu^{-1} W_ul, this becomes

f_u = (I − P_uu)^{-1} P_ul f_l ,   (13.17)

so the predictions on the unlabeled points follow from the labels in closed form.
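The closed-form harmonic solution (13.17) is a few lines of numpy. The sketch below is illustrative (the two-blob data, the bandwidth, and the function name are my own assumptions): two labeled points, one per cluster, label everything else.

```python
import numpy as np

def propagate_labels(X, y_l, l, sigma=1.0):
    """Graph label propagation: f_u = (I - P_uu)^{-1} P_ul f_l
    on a Gaussian-affinity graph. First l rows of X are labeled."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                  # (W)_ii = 0
    P = W / W.sum(axis=1, keepdims=True)      # P = D^{-1} W
    P_uu, P_ul = P[l:, l:], P[l:, :l]
    f_u = np.linalg.solve(np.eye(len(X) - l) - P_uu, P_ul @ y_l)
    return np.sign(f_u)

# two well-separated blobs; one labeled point in each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .3, (20, 2)), rng.normal(3, .3, (20, 2))])
X = np.vstack([X[:1], X[20:21], X[1:20], X[21:]])   # labeled rows first
out = propagate_labels(X, np.array([-1.0, 1.0]), l=2)
print(out[:19], out[19:])   # first blob -> -1, second blob -> +1
```

Because cross-blob affinities are tiny, each unlabeled point's harmonic value is dominated by its own cluster's labeled anchor.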
For multi-class problems, [Zhou et al., 2004] propose an iterative label-propagation scheme. Let Y ∈ R^{(l+u)×|Y|} encode the known labels ((Y)_ij = 1 if labeled x_i has y_i = j, else 0), let D = diag(d1, d2, ..., d_m), and define the normalized affinity S = D^{-1/2} W D^{-1/2}. A label matrix F = (F1ᵀ, F2ᵀ, ..., F_mᵀ)ᵀ, with F_i = ((F)_{i1}, ..., (F)_{i|Y|}), is initialized as

F(0) = Y   (13.18)

and updated by

F(t+1) = α S F(t) + (1 − α) Y ,   (13.19)

where α ∈ (0, 1) trades propagation against fidelity to the given labels. At convergence each sample i = 1, 2, ..., m receives y_i = argmax_{1≤j≤|Y|} (F)_ij.
Input: labeled data D_l = {(x1, y1), ...}; unlabeled data D_u; graph parameter σ; α.
1: construct W and S = D^{-1/2} W D^{-1/2};
2: F(0) = Y;
3: t = 0;
4: repeat
5:   F(t+1) = α S F(t) + (1 − α) Y;
6:   t = t + 1;
7: until convergence to F*;
8: for i = l+1, l+2, ..., l+u: y_i = argmax_j (F*)_ij.

The iteration converges to the closed form

F* = (1 − α)(I − α S)^{-1} Y ,   (13.20)

which is also the minimizer of the regularization framework

Q(F) = (1/2) Σ_{i,j=1}^m (W)_ij || F_i / √d_i − F_j / √d_j ||² + μ Σ_{i=1}^m || F_i − Y_i ||² ,   (13.21)

with μ = (1 − α)/α balancing smoothness on the graph against agreement with the labels.
Graph-based methods are transductive by nature and scale poorly, since storing W alone costs O(m²).

13.5 Disagreement-Based Methods

Disagreement-based methods train multiple learners and let them label unlabeled data for one another. The classic representative is co-training [Blum and Mitchell, 1998], designed for multi-view data: each instance carries two attribute sets (views), e.g. the image track and the sound track of a video. Co-training assumes each view is sufficient (strong enough to train a good classifier by itself) and that the views are conditionally independent given the label. Each classifier selects the unlabeled examples it labels most confidently and hands them, as pseudo-labeled training data, to the other. Later work showed the idea also works without explicit views, e.g. tri-training and related single-view variants [Goldman and Zhou, 2000; Zhou and Li, 2005].
Co-training (sketch):

Input: labeled D_l with two views; unlabeled D_u; pool size s; per-round positives p and negatives n; rounds T.
1: create a pool D_s of s examples drawn from D_u;
2: D_u = D_u \ D_s;
3: for j = 1, 2 do
4:   train h_j on view j of D_l;
5: end for
6: for t = 1, 2, ..., T do
7:   for j = 1, 2 do
8:     let h_j pick from D_s its p most confident positives D_p and n most confident negatives D_n;
9:     add these pseudo-labeled examples to the other view's training set;
10:    D_s = D_s \ (D_p ∪ D_n); replenish D_s from D_u;
11:  end for
12:  if neither h1 nor h2 changed then break else retrain h1, h2 end if
13: end for
Output: h1 and h2 (combined, e.g. by averaging their predictions).
13.6 Semi-Supervised Clustering

Supervision for clustering often comes as must-link / cannot-link constraints: must-link(x_i, x_j) requires the pair to share a cluster, cannot-link(x_i, x_j) forbids it. Constrained k-means [Wagstaff et al., 2001] extends k-means to respect them:

Input: data D; must-link set M; cannot-link set C; cluster number k.
1: initialize k means μ1, ..., μ_k;
2: repeat
3:   C_j = ∅ (1 ≤ j ≤ k);
4:   for i = 1, 2, ..., m do
5:     compute d_ij = ||x_i − μ_j|| for all j; K = {1, ..., k}; is_merged = false;
6:     while ¬ is_merged do
7:       r = argmin_{j∈K} d_ij;
8:       if assigning x_i to C_r violates no constraint in M ∪ C then
9:         C_r = C_r ∪ {x_i}; is_merged = true;
10:      else
11:        K = K \ {r}; if K = ∅ then report failure end if
12:      end if
13:    end while
14:  end for
15:  for j = 1, 2, ..., k do μ_j = (1/|C_j|) Σ_{x∈C_j} x; end for
16: until no mean changes
Output: clusters {C1, C2, ..., C_k}.
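The constraint-checking loop above can be sketched directly. This is a minimal illustration in the spirit of Wagstaff et al.'s COP-KMeans (the toy data, seed, and helper names are my own assumptions, not the book's):

```python
import numpy as np

def violates(i, r, labels, must, cannot):
    """Would putting point i into cluster r break a constraint?"""
    for a, b in must:
        if i in (a, b):
            j = b if i == a else a
            if labels[j] >= 0 and labels[j] != r:
                return True
    for a, b in cannot:
        if i in (a, b):
            j = b if i == a else a
            if labels[j] == r:
                return True
    return False

def cop_kmeans(X, k, must, cannot, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels[:] = -1
        for i in range(len(X)):
            for r in np.argsort(((X[i] - centers) ** 2).sum(1)):
                if not violates(i, r, labels, must, cannot):
                    labels[i] = r          # nearest feasible center wins
                    break
            if labels[i] < 0:
                raise ValueError("no feasible assignment")
        new = np.array([X[labels == j].mean(0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = cop_kmeans(X, 2, must=[(0, 1)], cannot=[(0, 2)])
print(labels)
```

The must-link keeps the first two points together and the cannot-link forces the third into the other cluster, regardless of initialization.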
On the watermelon dataset 4.0, take for example the cannot-link constraints

C = {(x2, x21), (x21, x2), (x13, x23), (x23, x13), (x19, x23), (x23, x19)} ,

together with the must-link constraints, and x6, x12, ... as the initial means. (Figure: the clusters found by constrained k-means after successive rounds; axes are density and sugar content, ticks 0.1-0.9.)
A second form of supervision is a small set of labeled samples (seeds). Constrained seed k-means [Basu et al., 2002] uses the labeled subset S = ∪_{j=1}^k S_j (S_j the seeds of cluster j) to initialize the means and pin the seed assignments:

Input: data D; seed sets S1, ..., S_k; cluster number k.
1: for j = 1, 2, ..., k do μ_j = (1/|S_j|) Σ_{x∈S_j} x; end for
2: repeat
3:   C_j = ∅ (1 ≤ j ≤ k);
4:   for j = 1, 2, ..., k do
5:     for all x ∈ S_j do C_j = C_j ∪ {x}; end for
6:   end for
7:   for all x_i ∈ D \ S do
8:     d_ij = ||x_i − μ_j||; r = argmin_j d_ij; C_r = C_r ∪ {x_i};
9:   end for
10:  for j = 1, 2, ..., k do μ_j = (1/|C_j|) Σ_{x∈C_j} x; end for
11: until no mean changes
Output: clusters {C1, C2, ..., C_k}.
Given seed sets for each of the three clusters (e.g. S3 = {x14, x17}), the algorithm pins the seeds and updates as usual. (Figure: clusters found by constrained seed k-means on the watermelon 4.0 data after successive rounds; axes are density and sugar content, ticks 0.1-0.9.)
13.7 Further Reading

Semi-supervised learning traces back to [Shahshahani and Landgrebe, 1994]; co-training [Blum and Mitchell, 1998] began the disagreement-based line, and graph mincut methods appear in [Blum and Chawla, 2001]. Generative approaches include [Nigam et al., 2000]; scalable S3VM optimization is studied in [Collobert et al., 2006]; unlabeled data can even hurt when the generative model is misspecified [Cozman and Cohen, 2002]. [Chapelle et al., 2006b] is a comprehensive collection; disagreement-based methods are surveyed in [Zhou and Li, 2010] and active learning in [Settles, 2009].
Exercises

13.1
13.2
13.3
13.4
13.5
13.6*
13.7*
13.8
13.10
References

Basu, S., A. Banerjee, and R. J. Mooney. (2002). "Semi-supervised clustering by seeding." In Proceedings of the 19th International Conference on Machine Learning (ICML), 19-26, Sydney, Australia.

Belkin, M., P. Niyogi, and V. Sindhwani. (2006). "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples." Journal of Machine Learning Research, 7:2399-2434.

Blum, A. and S. Chawla. (2001). "Learning from labeled and unlabeled data using graph mincuts." In Proceedings of the 18th International Conference on Machine Learning (ICML), 19-26, Williamston, MA.

Blum, A. and T. Mitchell. (1998). "Combining labeled and unlabeled data with co-training." In Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT), 92-100, Madison, WI.

Chapelle, O., M. Chi, and A. Zien. (2006a). "A continuation method for semi-supervised SVMs." In Proceedings of the 23rd International Conference on Machine Learning (ICML), 185-192, Pittsburgh, PA.

Chapelle, O., B. Schölkopf, and A. Zien, eds. (2006b). Semi-Supervised Learning. MIT Press, Cambridge, MA.

Chapelle, O., J. Weston, and B. Schölkopf. (2003). "Cluster kernels for semi-supervised learning." In Advances in Neural Information Processing Systems 15 (NIPS) (S. Becker, S. Thrun, and K. Obermayer, eds.), 585-592, MIT Press, Cambridge, MA.

Chapelle, O. and A. Zien. (2005). "Semi-supervised learning by low density separation." In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS), 57-64, Savannah Hotel, Barbados.

Collobert, R., F. Sinz, J. Weston, and L. Bottou. (2006). "Trading convexity for scalability." In Proceedings of the 23rd International Conference on Machine Learning (ICML), 201-208, Pittsburgh, PA.

Cozman, F. G. and I. Cohen. (2002). "Unlabeled data can degrade classification performance of generative classifiers." In Proceedings of the 15th International Conference of the Florida Artificial Intelligence Research Society (FLAIRS).
14 Probabilistic Graphical Models

14.1 Hidden Markov Model

Probabilistic graphical models express dependencies among variables with graphs: directed models (Bayesian networks) and undirected models (Markov networks). The hidden Markov model (HMM) is the simplest dynamic Bayesian network: hidden state variables y1, ..., y_n over a state set S = {s1, ..., s_N}, observations x1, ..., x_n over an observation set O = {o1, ..., o_M}, and the Markov property that each state depends only on its predecessor. The joint distribution is

P(x1, y1, ..., x_n, y_n) = P(y1) P(x1 | y1) Π_{i=2}^n P(y_i | y_{i-1}) P(x_i | y_i) .   (14.1)

An HMM is thus specified by λ = [A, B, π]: the state-transition probabilities A, the observation probabilities B, and the initial-state distribution π.
14.2 Markov Random Field

A Markov random field (MRF) is an undirected graphical model whose joint distribution factorizes over potential functions on cliques. For x = {x1, x2, ..., x_n}, with C the set of maximal cliques,

P(x) = (1/Z) Π_{Q∈C} ψ_Q(x_Q) ,   (14.2)

where x_Q are the variables of clique Q, ψ_Q its potential, and Z = Σ_x Π_{Q∈C} ψ_Q(x_Q) the normalization (partition function). Separation in the graph encodes independence: if every path between node sets A and B passes through the separating set C, then

x_A ⊥ x_B | x_C   (global Markov property),

verified by factorizing P(x_A, x_B, x_C) and checking P(x_A, x_B | x_C) = P(x_A | x_C) P(x_B | x_C) (14.5)-(14.6). It implies the local Markov property (a variable is independent of the rest given its neighbors) and the pairwise Markov property (non-adjacent x_u, x_v satisfy x_u ⊥ x_v | x_{V\{u,v}}). As an example of potentials,

ψ_{AC}(x_A, x_C) = 1.5 if x_A = x_C, and 0.1 otherwise ;
ψ_{BC}(x_B, x_C) = 0.2 if x_B = x_C, and 1.3 otherwise ,

favoring agreement between A and C and disagreement between B and C. Potentials are commonly taken exponential,

ψ_Q(x_Q) = e^{−H_Q(x_Q)} ,   (14.8)

with H_Q an energy function on the clique, e.g. a sum of pairwise terms α_{uv} x_u x_v and unary terms β_v x_v.
14.3 Conditional Random Field

A conditional random field (CRF) models P(y | x) directly for structured outputs: observations x = {x1, x2, ..., x_n} and labels y = {y1, y2, ..., y_n}. In part-of-speech tagging, for instance, the sentence "The boy knocked at the watermelon" receives the tag sequence [D] [N] [V] [P] [D] [N]. In the chain-structured CRF the label variables y1, y2, ..., y_n form a chain and each y_i is connected to x. The conditional distribution is

P(y | x) = (1/Z) exp( Σ_j Σ_{i=1}^{n-1} λ_j t_j(y_{i+1}, y_i, x, i) + Σ_k Σ_{i=1}^{n} μ_k s_k(y_i, x, i) ) ,   (14.11)

with transition feature functions t_j defined on adjacent labels, status feature functions s_k on single labels, weights λ_j, μ_k, and normalizer Z. For example,

s_k(y_i, x, i) = 1 if y_i = [V] and x_i = "knock", and 0 otherwise.
14.4 Learning and Inference

The core inference task is computing marginals, P(x_E) = Σ_{x_F} P(x_E, x_F) (marginalization)   (14.13). Variable elimination exploits the factorization: take the directed graph x1 → x2 → x3 with x3 → x4 and x3 → x5, and target P(x5):

P(x5) = Σ_{x4} Σ_{x3} Σ_{x2} Σ_{x1} P(x1) P(x2 | x1) P(x3 | x2) P(x4 | x3) P(x5 | x3) .   (14.14)

Pushing each sum inward and writing m_ij(x_j) for the message produced when x_i is eliminated:

m12(x2) = Σ_{x1} P(x1) P(x2 | x1) ,
m23(x3) = Σ_{x2} P(x3 | x2) m12(x2) ,   (14.15)
m43(x3) = Σ_{x4} P(x4 | x3) ,
P(x5) = Σ_{x3} P(x5 | x3) m23(x3) m43(x3) = m35(x5) .   (14.16)

Belief propagation organizes these sums as messages on the graph,

m_ij(x_j) = Σ_{x_i} ψ(x_i, x_j) Π_{k∈n(i)\{j}} m_ki(x_i) ,   (14.19)

with P(x_i) ∝ Π_{k∈n(i)} m_ki(x_i); on a tree, two sweeps (leaves to root, then root to leaves) deliver every marginal at once.
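The elimination order for P(x5) can be executed directly. The sketch below uses binary variables and randomly generated conditional tables (the numbers are illustrative assumptions, not from the text), and checks the messages against brute-force summation of the joint:

```python
import numpy as np

rng = np.random.default_rng(1)
def cpt(shape):                       # random conditional table, rows sum to 1
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p1   = cpt((2,))      # P(x1)
p2_1 = cpt((2, 2))    # P(x2|x1), row indexed by parent value
p3_2 = cpt((2, 2))
p4_3 = cpt((2, 2))
p5_3 = cpt((2, 2))

# eliminate x1, x2, x4 in turn (messages of 14.15-14.16)
m12 = p1 @ p2_1                 # m12(x2) = sum_x1 P(x1) P(x2|x1)
m23 = m12 @ p3_2                # m23(x3)
m43 = p4_3.sum(axis=1)          # m43(x3) = sum_x4 P(x4|x3)  (all ones)
p5  = (m23 * m43) @ p5_3        # P(x5) = sum_x3 P(x5|x3) m23 m43

# brute force over all 2^5 configurations agrees
joint = (p1[:, None, None, None, None] * p2_1[:, :, None, None, None]
         * p3_2[None, :, :, None, None] * p4_3[None, None, :, :, None]
         * p5_3[None, None, :, None, :])
print(np.allclose(p5, joint.sum(axis=(0, 1, 2, 3))))
```

Elimination touches a handful of small tables, while the brute-force sum grows exponentially with the number of variables.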
14.5 Approximate Inference

Exact inference is intractable in general graphs, so one resorts to approximate inference: stochastic methods (MCMC sampling) or deterministic methods (variational inference).

14.5.1 MCMC Sampling

Many tasks only need expectations under p:

E_p[f] = ∫ f(x) p(x) dx ,   (14.21)

approximated from samples x1, ..., x_N drawn from p:

Ê[f] = (1/N) Σ_{i=1}^N f(x_i) .   (14.22)

Markov chain Monte Carlo (MCMC) constructs a Markov chain whose stationary distribution is exactly p and averages over its states; stationarity is guaranteed by the detailed-balance condition p(x) T(x' | x) = p(x') T(x | x') (14.23)-(14.25), where T is the transition kernel.
The Metropolis-Hastings (MH) algorithm [Metropolis et al., 1953; Hastings, 1970] realizes this with a proposal distribution Q(x* | x^{t-1}) and an acceptance probability A(x* | x^{t-1}); a proposal may be rejected, in which case the chain stays put. Detailed balance

p(x^{t-1}) Q(x* | x^{t-1}) A(x* | x^{t-1}) = p(x*) Q(x^{t-1} | x*) A(x^{t-1} | x*)   (14.27)

holds with the acceptance rule given below.

Input: target p; proposal Q; initial state x0.
1: for t = 1, 2, ... do
2:   sample x* ~ Q(x* | x^{t-1});
3:   sample u ~ Uniform(0, 1);
4:   if u ≤ A(x* | x^{t-1}) then x^t = x* else x^t = x^{t-1} end if
5: end for
6: return x1, x2, ... (after discarding a burn-in prefix)
where

A(x* | x^{t-1}) = min( 1, p(x*) Q(x^{t-1} | x*) / ( p(x^{t-1}) Q(x* | x^{t-1}) ) ) .   (14.28)

Gibbs sampling is the special case that updates one variable at a time from its full conditional p(x_i | x_{−i}); every such proposal is accepted.
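A random-walk MH sampler is a few lines; with a symmetric Gaussian proposal the Q terms in (14.28) cancel. The target below (a standard normal, known only up to its normalizer) and all parameter choices are illustrative assumptions:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n, step=1.0, seed=0):
    """Random-walk MH: symmetric Gaussian proposal, so the acceptance
    ratio reduces to p(x*)/p(x), compared here in log space."""
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n):
        x_star = x + step * rng.normal()
        if np.log(rng.random()) <= log_p(x_star) - log_p(x):
            x = x_star            # accept
        out.append(x)             # on rejection the chain stays put
    return np.array(out)

# unnormalized log density of N(0, 1)
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n=20000)
print(samples[2000:].mean(), samples[2000:].std())  # roughly 0 and 1
```

Note that log_p never needs the partition function — the reason MCMC suits graphical models, whose normalizer Z is the expensive part.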
14.5.2 Variational Inference

Plate notation [Buntine, 1994] draws repeated variables once, inside a rectangle annotated with the repetition count. With observed x = {x1, ..., x_N}, latent variables z, and parameters Θ, a typical model has

p(x | Θ) = Π_{i=1}^N Σ_z p(x_i, z | Θ) ,   (14.29)

whose log-likelihood is maximized over Θ, e.g. by EM. Variational inference restricts the approximate posterior to a tractable family; the mean field assumption factorizes it as

q(z) = Π_{i=1}^M q_i(z_i) .   (14.35)

Maximizing the evidence lower bound over one factor at a time gives the update

q_j*(z_j) = exp( E_{i≠j}[ ln p(x, z) ] ) / ∫ exp( E_{i≠j}[ ln p(x, z) ] ) dz_j ,   (14.40)

where E_{i≠j}[·] denotes expectation under Π_{i≠j} q_i; cycling over j until convergence yields the best factorized approximation to the true posterior.
14.6 Topic Models

Topic models are directed graphical models for discrete data such as text; Latent Dirichlet Allocation (LDA) is the most prominent. With T documents, K topics (k = 1, 2, ..., K), and N words per document, each topic β_k is a distribution over the vocabulary, each document t has topic proportions Θ_t with a Dirichlet prior (see C.1.6), each word slot n draws a topic z_{t,n} from Θ_t and then the word w_{t,n} from the chosen topic. The joint distribution is

p(W, z, β, Θ | α, η) = Π_t p(Θ_t | α) Π_k p(β_k | η) ( Π_n P(w_{t,n} | z_{t,n}, β_k) P(z_{t,n} | Θ_t) ) ,   (14.41)

with Dirichlet parameters α and η. Learning maximizes the marginal log-likelihood LL(α, η | W) (14.43); since the posterior p(z, β, Θ | W, α, η) is intractable (14.44), inference proceeds by Gibbs sampling or variational methods.
14.7 Further Reading

Belief propagation is due to [Pearl, 1986; Pearl, 1988]; Gibbs sampling for image restoration to [Geman and Geman, 1984]; CRFs to [Lafferty et al., 2001], with a tutorial in [Sutton and McCallum, 2012]. Loopy Belief Propagation is analyzed in [Murphy et al., 1999; Mooij and Kappen, 2007]; factor graphs and the sum-product algorithm in [Kschischang et al., 2001]; clique-tree propagation in [Lauritzen and Spiegelhalter, 1988]. Standard texts include [Koller and Friedman, 2009] and [Jordan, 1998]; variational methods are surveyed in [Wainwright and Jordan, 2008]. Nonparametric extensions lead to infinite latent feature models [Ghahramani and Griffiths, 2006]; topic models grew out of probabilistic latent semantic analysis [Hofmann, 2001] and LDA [Blei et al., 2003]; MCMC practice is covered in [Gilks et al., 1996].
Exercises

14.1
14.2
14.3
14.4
14.5
14.6
14.7
14.8
14.9*
14.10*
References

Buntine, W. L. (1994). "Operations for learning with graphical models." Journal of Artificial Intelligence Research, 2:159-225.

Geman, S. and D. Geman. (1984). "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images." IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741.

Ghahramani, Z. and T. L. Griffiths. (2006). "Infinite latent feature models and the Indian buffet process." In Advances in Neural Information Processing Systems 18 (NIPS) (Y. Weiss, B. Schölkopf, and J. C. Platt, eds.), 475-482, MIT Press, Cambridge, MA.

Gilks, W. R., S. Richardson, and D. J. Spiegelhalter. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, Boca Raton, FL.

Gonzalez, J. E., Y. Low, and C. Guestrin. (2009). "Residual splash for optimally parallelizing belief propagation." In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, FL.

Hastings, W. K. (1970). "Monte Carlo sampling methods using Markov chains and their applications." Biometrika, 57(1):97-109.

Hofmann, T. (2001). "Unsupervised learning by probabilistic latent semantic analysis." Machine Learning, 42(1):177-196.

Jordan, M. I., ed. (1998). Learning in Graphical Models. Kluwer, Dordrecht, The Netherlands.

Koller, D. and N. Friedman. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA.

Kschischang, F. R., B. J. Frey, and H.-A. Loeliger. (2001). "Factor graphs and the sum-product algorithm." IEEE Transactions on Information Theory, 47(2):498-519.

Lafferty, J. D., A. McCallum, and F. C. N. Pereira. (2001). "Conditional random fields: Probabilistic models for segmenting and labeling sequence data." In Proceedings of the 18th International Conference on Machine Learning (ICML), 282-289, Williamstown, MA.

Lauritzen, S. L. and D. J. Spiegelhalter. (1988). "Local computations with probabilities on graphical structures and their application to expert systems." Journal of the Royal Statistical Society - Series B, 50(2):157-224.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. (1953). "Equations of state calculations by fast computing machines." Journal of Chemical Physics, 21(6):1087-1092.

Mooij, J. M. and H. J. Kappen. (2007). "Sufficient conditions for convergence of the sum-product algorithm." IEEE Transactions on Information Theory, 53(12):4422-4437.
Pearl, J. (1986). "Fusion, propagation, and structuring in belief networks." Artificial Intelligence, 29(3):241-288.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA.

(Judea Pearl, 1936- , received the 2011 Turing Award for his work on probabilistic and causal reasoning.)

15 Rule Learning

15.1 Basic Concepts

Rule learning [Fürnkranz et al., 2012] acquires logical rules (rule) of the form

⊕ ← f1 ∧ f2 ∧ ... ∧ f_L ,   (15.1)

where the body is a conjunction of literals f_k, the head ⊕ is the decision, and L is the rule's length.
Several rules may fire on one example and conflict; conflict resolution strategies include voting, imposing an order on the rules (ordered rules with priority form a decision list), and metarule-based resolution, with a default rule handling examples no rule covers. Rules are propositional (propositional logic over atomic attribute tests) or first-order (relational rules with variables and quantifiers), the latter strictly more expressive.

15.2 Sequential Covering

Most rule learners follow sequential covering (separate-and-conquer): learn one rule, remove the training examples it covers, and repeat on the remainder. A single rule is grown either top-down (general-to-specific, adding one literal at a time — the more robust choice for noisy propositional data) or bottom-up (specific-to-general, starting from a seed example, typical of first-order learners). Greedy growth is usually widened with beam search, keeping the best b candidate rules at each step, as in CN2 [Clark and Niblett, 1989].
15.3 Pruning Optimization

Rule growth is guided by measures such as accuracy or CN2's Likelihood Ratio Statistic (LRS). Overfitting is countered by pruning: Reduced Error Pruning (REP) [Brunk and Pazzani, 1991] splits the data into a growing set and a pruning set, repeatedly deleting literals or rules while pruning-set performance improves; IREP (Incremental Reduced Error Pruning) [Fürnkranz and Widmer, 1994] prunes each rule immediately after growing it. RIPPER [Cohen, 1995] (Repeated Incremental Pruning to Produce Error Reduction; JRIP in WEKA) combines the IREP* variant with post-processing optimization:

Input: data D; rounds k.
1: R = IREP*(D);
2: i = 0;
3: repeat
4:   generate revised and re-grown variants of each rule in R and keep the best (the optimization step);
5:   D_i = the examples of D not covered by R;
6:   R_i = IREP*(D_i);
7:   R = R ∪ R_i; i = i + 1;
8: until i = k
Output: rule set R
The optimization step considers, for each rule, a replacement rule (re-grown from scratch) and a revised rule (the original with literals appended), choosing among them and the original [Fürnkranz et al., 2012].

15.4 First-Order Rule Learning

Propositional rules cannot express relations between objects. Recasting the watermelon data relationally, attribute comparisons between melon pairs become atoms such as darker_color(1, 6), curlier_root(6, 7), and so on over melon indices (1, 6, 10, 14, 15, 16, 17, ...); the known facts form the background knowledge, and the target relation, e.g. better(X, Y), is what rules are learned for:

better(X, Y) ← darker_color(X, Y) ∧ curlier_root(X, Y) .

FOIL (First-Order Inductive Learner) grows such clauses top-down, at each step adding the candidate literal with the largest FOIL gain:
F_Gain = m̂+ × ( log2( m̂+ / (m̂+ + m̂−) ) − log2( m+ / (m+ + m−) ) ) ,   (15.3)

where m+ and m− count the positive and negative bindings covered before adding the literal, m̂+ and m̂− after; the factor m̂+ weights the information gain by the positives still covered.
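The gain (15.3) is a one-liner; the coverage numbers below are an illustrative assumption, not a worked example from the text:

```python
from math import log2

def foil_gain(pos, neg, pos_new, neg_new):
    """FOIL gain (15.3): change in information about positive bindings,
    weighted by the positives the specialized rule still covers."""
    return pos_new * (log2(pos_new / (pos_new + neg_new))
                      - log2(pos / (pos + neg)))

# adding a literal narrows coverage from (6+, 6-) to (4+, 1-)
print(round(foil_gain(6, 6, 4, 1), 3))  # 2.712
```

The gain rewards literals that raise the positive ratio without discarding too many positive bindings.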
15.5 Inductive Logic Programming

Inductive Logic Programming (ILP) learns first-order rules that may contain function symbols and nesting, e.g. P(f(X)), giving the expressive power of Prolog at the cost of a vastly larger hypothesis space; bottom-up generalization is therefore central.

15.5.1 Least General Generalization

The Least General Generalization (LGG) [Plotkin, 1970] generalizes two clauses as little as possible. For terms, LGG(s, t) = s if s = t, and otherwise replaces the differing pair by a new variable, applied recursively and consistently at every argument position; the LGG of two clauses is built from the LGGs of their corresponding literals.
For example, from better(1, 10) ← ... and better(1, 15) ← ..., the differing constants 10 and 15 are replaced by a variable Y throughout, giving LGG(10, 15) = Y and a generalized clause better(1, Y) ← ... (15.5). When background knowledge must be taken into account, one uses the Relative Least General Generalization (RLGG) [Plotkin, 1971], as in GOLEM [Muggleton and Feng, 1990]; [Lavrač and Džeroski, 1993] give a systematic treatment.
15.5.2 Inverse Resolution

Resolution [Robinson, 1965] underpins automated theorem proving; inverse resolution [Muggleton and Buntine, 1988] runs it backwards to invent new rules and even new predicates. Propositionally, if L1 = ¬L2 = L, then C1 = A ∨ L and C2 = B ∨ ¬L resolve to

C = (C1 − {L}) ∨ (C2 − {¬L}) = A ∨ B ,   (15.7)

written C = C1 · C2 (15.8); inversely, e.g. C2 = (C − (C1 − {L})) ∨ {¬L}. [Muggleton, 1995] identifies four inverse-resolution operators (premises above, conclusion below; p ← A∧B abbreviates a rule):

absorption:          from p ← A∧B and q ← A, derive p ← q∧B (keeping q ← A);   (15.9)
identification:      from p ← A∧B and p ← A∧q, derive q ← B;   (15.10)
intra-construction:  from p ← A∧B and p ← A∧C, derive q ← B, p ← A∧q, q ← C (inventing q);   (15.11)
inter-construction:  from p ← A∧B and q ← A∧C, derive p ← r∧B, r ← A, q ← r∧C (inventing r).   (15.12)-(15.13)

At the first-order level these require unification: a substitution θ = {X/Y, ...} applied to a clause C gives Cθ; θ is a most general unifier (MGU) of two literals if it unifies them and every other unifier can be obtained from it by further substitution (of θ1 = {1/X} and θ2 = {1/X, 2/Y}, only the more general applies as an MGU). First-order resolution resolves C1 = A ∨ L1 and C2 = B ∨ ¬L2 when L1 θ1 = ¬L2 θ2 for substitutions θ1, θ2:
C = (C1 − {L1}) θ1 ∨ (C2 − {L2}) θ2 .   (15.14)

Inverting this,

C2 = ( C − (C1 − {L1}) θ1 ) θ2^{-1} ∨ { ¬L1 θ1 θ2^{-1} } ,   (15.15)-(15.16)

so new clauses — and, via intra-construction, new predicates — can be derived from examples, e.g. with substitutions θ1 = {1/M, X/N} and θ2 = ... on the watermelon relations.
15.6 Further Reading

Rule learning is the main symbolism learning approach [Michalski, 1983]; [Fürnkranz et al., 2012] is a comprehensive treatment. Classic propositional learners include AQ, CN2 [Clark and Niblett, 1989], and PRISM [Cendrowska, 1987]; NELL (Never-Ending Language Learning) [Carlson et al., 2010] couples rule learning with large-scale text extraction. ILP systems include GOLEM [Muggleton and Feng, 1990], PROGOL [Muggleton, 1995], and Aleph [Srinivasan, 1999]; FOIL is due to [Quinlan, 1990], and Datalog [Ceri et al., 1989] connects rule learning to databases. Probabilistic extensions (probabilistic ILP) [De Raedt et al., 2008] include relational Bayesian networks [Jaeger, 1997], stochastic logic programs [Muggleton, 2000], Bayesian Logic Programs [Kersting et al., 2000], and Markov logic networks [Richardson and Domingos, 2006], collectively known as statistical relational learning [Getoor and Taskar, 2007].
Exercises

15.1
15.2
15.3
15.4
15.5
15.6
15.7
15.8
15.9*
15.10*
References

Robinson, J. A. (1965). "A machine-oriented logic based on the resolution principle." Journal of the ACM, 12(1):23-41.

Srinivasan, A. (1999). "The Aleph manual." .../machlearn/Aleph/aleph.html.

Winston, P. H. (1970). "Learning structural descriptions from examples." Ph.D. thesis, Department of Electrical Engineering, MIT, Cambridge, MA.

Wnek, J. and R. S. Michalski. (1994). "Hypothesis-driven constructive induction in AQ17-HCI: A method and experiments." Machine Learning, 2(14):139-168.
16 Reinforcement Learning

16.1 Task and Reward

Reinforcement learning is usually formalized as a Markov Decision Process (MDP): E = (X, A, P, R), with state space X, action space A, transition probabilities P, and reward function R. In the watermelon-growing illustration, an action may lead deterministically (p = 1) to a bad state with a large negative reward (e.g. r = −100 for a rotten melon). A policy π selects actions; a deterministic policy has π(x, a) = 1 for the chosen action. The machine's goal is a policy maximizing long-run cumulative reward.
16.2 K-Armed Bandit

16.2.1 Exploration versus Exploitation

The simplest setting is the K-armed bandit: K arms, each pull of arm k returning a random reward, and the cumulative reward over T pulls to be maximized (16.1). Pure exploitation (always pull the empirically best arm) may lock onto a suboptimal arm, while pure exploration estimates all arms well but forgoes reward — the Exploration-Exploitation dilemma.

16.2.2 ε-Greedy

ε-greedy explores with probability ε (picking an arm uniformly at random) and exploits with probability 1 − ε. The average reward of arm k after n pulls with rewards v1, v2, ..., v_n is

Q(k) = (1/n) Σ_{i=1}^n v_i ,   (16.2)

which can be maintained incrementally, storing only the count and the running average:

Q_n(k) = Q_{n-1}(k) + (1/n) ( v_n − Q_{n-1}(k) ) .   (16.3)
Input: arm count K; reward function R; trials T; exploration rate ε.
1: r = 0;
2: ∀ i = 1, 2, ..., K: Q(i) = 0, count(i) = 0;
3: for t = 1, 2, ..., T do
4:   if rand() < ε then
5:     k = an arm chosen uniformly at random
6:   else
7:     k = argmax_i Q(i)
8:   end if
9:   v = R(k);
10:  r = r + v;
11:  Q(k) = ( Q(k) × count(k) + v ) / ( count(k) + 1 );
12:  count(k) = count(k) + 1;
13: end for
Output: cumulative reward r

If the reward distribution drifts, a fixed ε is appropriate; otherwise ε can be annealed, e.g. ε_t = 1/√t.
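The procedure above runs as-is in a few lines. The Gaussian rewards, arm means, and trial count below are illustrative assumptions, not from the text:

```python
import random

def epsilon_greedy(means, T=50000, eps=0.1, seed=1):
    """epsilon-greedy on a K-armed bandit with unit-variance Gaussian
    rewards; 'means' are the hidden expected rewards of the arms."""
    rng = random.Random(seed)
    K = len(means)
    Q = [0.0] * K
    count = [0] * K
    total = 0.0
    for _ in range(T):
        if rng.random() < eps:
            k = rng.randrange(K)                  # explore
        else:
            k = max(range(K), key=Q.__getitem__)  # exploit
        v = rng.gauss(means[k], 1.0)              # pull arm k
        total += v
        Q[k] += (v - Q[k]) / (count[k] + 1)       # incremental (16.3)
        count[k] += 1
    return total / T, count

avg, count = epsilon_greedy([0.1, 0.5, 0.9])
print(round(avg, 2), count.index(max(count)))  # best arm (index 2) dominates
```

With ε = 0.1 the per-pull average approaches 0.9 minus the exploration cost spent on the two inferior arms.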
16.2.3 Softmax

Softmax allocates exploration according to the current estimates, pulling arm k with Boltzmann probability

P(k) = e^{Q(k)/τ} / Σ_{i=1}^K e^{Q(i)/τ} ,   (16.4)

where the temperature τ > 0 interpolates between exploitation (τ → 0) and uniform exploration (τ → ∞).

Input: K; R; T; τ.
1: r = 0;
2: ∀ i = 1, 2, ..., K: Q(i) = 0, count(i) = 0;
3: for t = 1, 2, ..., T do
4:   sample k according to (16.4);
5:   v = R(k);
6:   update r, Q(k), count(k) as in ε-greedy;
7: end for
Output: cumulative reward r
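The sampling rule (16.4) is directly expressible with weighted choice; the arm values and temperature below are illustrative assumptions:

```python
import math
import random

def softmax_pick(Q, tau, rng):
    """Boltzmann exploration (16.4): sample an arm with probability
    proportional to exp(Q(k)/tau)."""
    w = [math.exp(q / tau) for q in Q]
    return rng.choices(range(len(Q)), weights=w, k=1)[0]

rng = random.Random(0)
Q = [0.1, 0.5, 0.9]
picks = [softmax_pick(Q, tau=0.1, rng=rng) for _ in range(1000)]
print(picks.count(2) / 1000)  # low temperature: the best arm dominates
```

Raising τ flattens the weights toward uniform exploration; lowering it recovers greedy selection.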
(Figure: average cumulative reward of ε-greedy (ε = 0.1) and Softmax (τ = 0.01) on a 2-armed bandit over 3000 trials.)

16.3 Model-Based Learning

With the MDP E = (X, A, P, R) known, a policy π is evaluated by its state-value function: the T-step accumulated reward

V_T^π(x) = E_π [ (1/T) Σ_{t=1}^T r_t | x0 = x ] ,   (16.5)-(16.6)

or the γ-discounted accumulated reward V_γ^π(x) = E_π [ Σ_{t=0}^{+∞} γ^t r_{t+1} | x0 = x ], and by the state-action value Q^π(x, a), the same expectations conditioned additionally on a0 = a (16.7)-(16.8).
The value functions obey Bellman recursions; in the T-step case,

V_T^π(x) = Σ_{a∈A} π(x, a) Σ_{x'∈X} P^a_{x→x'} ( (1/T) R^a_{x→x'} + ((T−1)/T) V^π_{T−1}(x') ) ,   (16.10)

computed by dynamic programming:

Input: MDP E = (X, A, P, R); policy π; horizon T.
1: V(x) = 0;
2: for t = 1, 2, ... do
3:   ∀ x: V'(x) = Σ_a π(x, a) Σ_{x'∈X} P^a_{x→x'} ( (1/t) R^a_{x→x'} + ((t−1)/t) V(x') );
4:   if t = T + 1 then break else V = V' end if
5: end for
Output: V

The optimal value function satisfies the Bellman optimality equation

V*(x) = max_{a∈A} Σ_{x'∈X} P^a_{x→x'} ( (1/T) R^a_{x→x'} + ((T−1)/T) V*_{T−1}(x') ) ,   (16.12)-(16.14)

and analogously Q*(x, a) carries the max inside the successor term (16.16).
A policy is improved from its value function via

π'(x) = argmax_{a∈A} Q^π(x, a) ,   (16.17)

and alternating evaluation with improvement gives policy iteration:

Input: MDP E = (X, A, P, R); horizon T.
1: V(x) = 0; π(x, a) = 1/|A(x)|;
2: loop
3:   for t = 1, 2, ... do
4:     ∀ x: V'(x) = Σ_a π(x, a) Σ_{x'∈X} P^a_{x→x'} ( (1/t) R^a_{x→x'} + ((t−1)/t) V(x') );
5:     if t = T + 1 then break else V = V' end if
6:   end for
7:   π'(x) = argmax_{a∈A} Q(x, a);
8:   if π'(x) = π(x) ∀ x then break else π = π' end if
9: end loop
Output: optimal policy π
Policy improvement can be folded into evaluation, giving value iteration, which iterates the Bellman optimality backup

V(x) ← max_{a∈A} Σ_{x'∈X} P^a_{x→x'} ( (1/t) R^a_{x→x'} + ((t−1)/t) V(x') ) .   (16.18)

Input: MDP E = (X, A, P, R); threshold θ.
1: V(x) = 0;
2: for t = 1, 2, ... do
3:   ∀ x: V'(x) = max_a Σ_{x'∈X} P^a_{x→x'} ( (1/t) R^a_{x→x'} + ((t−1)/t) V(x') );
4:   if max_{x∈X} |V(x) − V'(x)| < θ then break else V = V' end if
5: end for
Output: policy π(x) = argmax_{a∈A} Σ_{x'∈X} P^a_{x→x'} ( R^a_{x→x'} + ... V'(x') )   (16.19)
16.4 Model-Free Learning

Without P and R, policies are evaluated from sampled trajectories < x0, a0, r1, x1, a1, r2, ..., x_{T−1}, a_{T−1}, r_T, x_T >. Monte Carlo RL averages, over many episodes, the returns observed after each state-action pair. To keep visiting all pairs, trajectories are generated with an ε-greedy version of the current policy,

π^ε(x) = π(x) with probability 1 − ε, and a uniformly random action with probability ε ,   (16.20)

so every action has probability at least ε/|A|. On-policy Monte Carlo control:

Input: environment E; action space A; start state x0; episodes S; ε.
1: Q(x, a) = 0; count(x, a) = 0; π arbitrary;
2: for s = 1, 2, ..., S do
3:   run π^ε from x0, obtaining < x0, a0, r1, ..., x_{T−1}, a_{T−1}, r_T, x_T >;
4:   for t = 0, 1, ..., T−1 do
5:     R = ( 1/(T−t) ) Σ_{i=t+1}^{T} r_i;
6:     Q(x_t, a_t) = ( Q(x_t, a_t) × count(x_t, a_t) + R ) / ( count(x_t, a_t) + 1 );
7:     count(x_t, a_t) = count(x_t, a_t) + 1;
8:   end for
9:   π(x) = argmax_{a'} Q(x, a');
10: end for
Output: π

Off-policy learning instead evaluates π using trajectories from π^ε, corrected by importance sampling: for f under density p, sampled from q,

E[f] = ∫ p(x) f(x) dx = ∫ q(x) ( p(x)/q(x) ) f(x) dx ,   (16.23)

estimated as (1/m) Σ_i ( p(x_i)/q(x_i) ) f(x_i) (16.24); for trajectories the weight is the product over steps of the ratio of action probabilities under π and π^ε, Π_t π(x_t, a_t) / π^ε(x_t, a_t) (16.25)-(16.27).
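The reweighting identity (16.23)-(16.24) can be checked numerically. The sketch below estimates E_p[x²] under p = N(0, 1) using samples from q = N(1, 1); the densities, sample size, and variable names are illustrative assumptions:

```python
import math
import random

def pdf(x, mu):
    """Normal density with sigma = 1."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

rng = random.Random(0)
m = 200000
f = lambda x: x * x          # E_p[f] = Var = 1 under N(0, 1)
est = 0.0
for _ in range(m):
    x = rng.gauss(1.0, 1.0)                  # sample from q
    est += pdf(x, 0.0) / pdf(x, 1.0) * f(x)  # weight p(x)/q(x)  (16.24)
print(round(est / m, 2))  # close to the exact value 1
```

The same mechanics underlie the trajectory weights: every sampled quantity is multiplied by how much more (or less) likely it would have been under the target policy.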
The corresponding off-policy Monte Carlo algorithm multiplies each sampled return by the product of probability ratios (16.28) before the averaging and improvement steps.

Temporal Difference (TD) learning combines Monte Carlo sampling with dynamic-programming bootstrapping. Writing the running mean incrementally,

Q^π_{t+1}(x, a) = Q^π_t(x, a) + ( 1/(t+1) ) ( r_{t+1} − Q^π_t(x, a) ) ,

and replacing the tail of the return by the current estimate of the successor pair's value gives the TD update

Q^π(x, a) ← Q^π(x, a) + α ( R^a_{x→x'} + γ Q^π(x', a') − Q^π(x, a) ) ,   (16.31)

with step size α; convergence of the related Q-learning update was established by [Watkins and Dayan, 1992].
Sarsa (on-policy; it consumes the quintuple (x, a, r, x', a')):

Input: E; A; x0; step size α; ε.
1: Q(x, a) = 0; π(x, a) = 1/|A(x)|;
2: x = x0; a = π^ε(x);
3: for t = 1, 2, ... do
4:   take a, observe r, x';
5:   a' = π^ε(x');
6:   Q(x, a) = Q(x, a) + α ( r + γ Q(x', a') − Q(x, a) );
7:   π(x) = argmax_{a''} Q(x, a'');
8:   x = x', a = a';
9: end for
Output: π

Q-learning (off-policy; the target uses the greedy successor action):

1: Q(x, a) = 0; π(x, a) = 1/|A(x)|;
2: x = x0;
3: for t = 1, 2, ... do
4:   a = π^ε(x); take a, observe r, x';
5:   Q(x, a) = Q(x, a) + α ( r + γ max_{a'} Q(x', a') − Q(x, a) );
6:   π(x) = argmax_{a''} Q(x, a'');
7:   x = x';
8: end for
Output: π
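The Q-learning loop above runs end-to-end on a tiny environment. The chain MDP below (walk right to reach the rewarding terminal state) and all hyperparameters are illustrative assumptions, not from the text:

```python
import random

def q_learning(n_states=5, episodes=400, alpha=0.5, gamma=0.9,
               eps=0.2, seed=0):
    """Tabular Q-learning on a chain MDP: states 0..n-1, actions
    0 = left / 1 = right, reward 1 only on reaching the right end."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        x = 0
        while x != n_states - 1:
            a = rng.randrange(2) if rng.random() < eps \
                else max((0, 1), key=lambda a_: Q[x][a_])   # eps-greedy
            x2 = max(0, x - 1) if a == 0 else x + 1
            r = 1.0 if x2 == n_states - 1 else 0.0
            # off-policy target: greedy value of the successor state
            Q[x][a] += alpha * (r + gamma * max(Q[x2]) - Q[x][a])
            x = x2
    return Q

Q = q_learning()
print([max((0, 1), key=lambda a: Q[x][a]) for x in range(4)])  # [1, 1, 1, 1]
```

The learned greedy action is "right" in every non-terminal state, with Q values decaying by the factor γ per step of distance from the reward.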
16.5 Value Function Approximation

So far values were stored in tables (tabular value functions). With continuous state spaces X ⊆ R^n, the value function is approximated, in the simplest case linearly:

V_θ(x) = θᵀ x ,   (16.32)

learning θ by minimizing the squared error to the true values,

E_θ = E_{x~π} [ ( V^π(x) − V_θ(x) )² ] ,   (16.33)

by gradient descent, −∂E_θ/∂θ ∝ ( V^π(x) − V_θ(x) ) x (16.34)-(16.35), with the unknown V^π(x) replaced by the TD target r + γ V_θ(x'):

θ ← θ + α ( r + γ V_θ(x') − V_θ(x) ) x .   (16.36)

To approximate Q instead, the action is encoded into the input, e.g. by appending a one-hot action vector (0; ...; 1; ...; 0). The Sarsa loop then becomes:

1: θ = 0;
2: x = x0; a = π^ε(x) with π(x) = argmax_{a''} Q_θ(x, a'');
3: for t = 1, 2, ... do
4:   take a, observe r, x';
5:   a' = π^ε(x');
6:   θ = θ + α ( r + γ Q_θ(x', a') − Q_θ(x, a) ) ∇_θ Q_θ(x, a);
7:   x = x', a = a';
8: end for
16.6 Imitation Learning

Expert demonstrations can bootstrap reinforcement learning — variously called apprenticeship learning, learning from demonstration, learning by watching, or imitation learning.

Direct imitation treats each demonstrated state-action pair as a supervised example: learn a policy from the pairs in the expert trajectories, then refine it by RL.

When rewards are hard to specify, inverse reinforcement learning (IRL) infers a reward function under which the expert's trajectories are optimal [Abbeel and Ng, 2004]. Assume a linear reward R(x) = wᵀ x and let x̄^π = E[ Σ_t γ^t x_t | π ] be a policy's expected discounted feature vector. Optimality of the expert π* means

wᵀ x̄* − wᵀ x̄^π ≥ 0  for every policy π ,   (16.37)-(16.38)

so w is found by maximizing the margin min_π wᵀ ( x̄* − x̄^π ) subject to ||w|| ≤ 1 (16.39), alternating between the two subproblems:

Input: expert trajectory dataset D; an RL solver.
1: x̄* = mean discounted feature vector of the expert trajectories;
2: π = a random policy;
3: for t = 1, 2, ... do
4:   compute x̄^{π_i} for the policies found so far; w = argmax_w min_i wᵀ ( x̄* − x̄^{π_i} ) s.t. ||w|| ≤ 1;
5:   π = optimal policy for reward wᵀ x;
6: end for
Output: reward function wᵀ x and policy π
16.7 393
[Sutton ,
p.22.
and Dayan ,
and Niranjan , 1994].
et a l.,
and Scherrer , [Dann et
(approximate dynamic
Exercises

16.2
16.3
16.4
16.5
16.6
16.7
16.8
16.9
16.10*
(Andrey Andreyevich Markov, 1856-1922, Russian mathematician.)

Appendix A: Matrix Operations

A.1 Basic Operations

For A ∈ R^{m×n}, the transpose Aᵀ satisfies (Aᵀ)_ij = A_ji (A.1) and (AB)ᵀ = BᵀAᵀ (A.2). A square A is invertible if A^{-1}A = AA^{-1} = I. The determinant is

det(A) = Σ_{σ∈S_n} sgn(σ) A_{1σ1} A_{2σ2} ... A_{nσn} ,   (A.9)

summing over all permutations of {1, ..., n}; det(I) = 1, det(cA) = c^n det(A), det(AB) = det(A) det(B), det(Aᵀ) = det(A), and det(A^{-1}) = det(A)^{-1}. The trace tr(A) = Σ_i A_ii satisfies tr(Aᵀ) = tr(A), tr(A + B) = tr(A) + tr(B), tr(AB) = tr(BA), and its cyclic generalization (A.15)-(A.18).
A.2 Derivatives

For f: R^n → R, the gradient is ∇f(x) = ( ∂f/∂x1, ..., ∂f/∂x_n )ᵀ (A.19)-(A.20) and the Hessian ∇²f(x) has entries ∂²f/∂x_i∂x_j (A.21). Differentiation is linear (A.22)-(A.23), and common identities include

∂(xᵀa)/∂x = ∂(aᵀx)/∂x = a ,   (A.24)-(A.25)
∂(xᵀAx)/∂x = (A + Aᵀ) x ,   (A.26)
∂ tr(AB)/∂A = Bᵀ ,  ∂ tr(AᵀB)/∂A = B ,   (A.27)-(A.28)
∂ det(A)/∂A = det(A) (A^{-1})ᵀ .   (A.29)
The chain rule composes derivatives: for f(x) = g(h(x)), ∂f/∂x = (∂h/∂x) ∂g/∂h (A.31). For example, for the weighted squared error f = (Ax − b)ᵀ W (Ax − b) with symmetric W:

∂f/∂x = ∂/∂x (Ax − b)ᵀ W (Ax − b) = 2 Aᵀ W (Ax − b) .   (A.32)
A.3 Singular Value Decomposition

Any A ∈ R^{m×n} factors as

A = U Σ Vᵀ ,   (A.33)

with U ∈ R^{m×m} and V ∈ R^{n×n} orthogonal (their columns are the left and right singular vectors) and Σ ∈ R^{m×n} diagonal with nonnegative singular values. SVD solves the low-rank matrix approximation problem: for k ≤ rank(A),

min_{Ã ∈ R^{m×n}} || A − Ã ||_F   s.t. rank(Ã) = k   (A.34)

is solved by truncation,

Ã_k = U_k Σ_k V_kᵀ ,   (A.35)

keeping the k largest singular values and the corresponding singular vectors (the Eckart-Young theorem).
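The truncation (A.35) and the Eckart-Young optimality can be verified numerically; the random matrix and sizes below are illustrative assumptions:

```python
import numpy as np

def best_rank_k(A, k):
    """Truncated SVD (A.35): the rank-k minimizer of ||A - B||_F."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
A2 = best_rank_k(A, 2)

U, s, _ = np.linalg.svd(A)
print(np.linalg.matrix_rank(A2))             # 2
# the residual norm equals sqrt of the discarded squared singular values
print(np.isclose(np.linalg.norm(A - A2), np.sqrt((s[2:] ** 2).sum())))
```

The residual check is the Eckart-Young theorem in action: no rank-2 matrix can get closer in Frobenius norm than sqrt(σ3² + σ4²).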
Appendix B: Optimization

B.1 Lagrange Multipliers

To minimize f(x) subject to an equality constraint g(x) = 0, note that at an optimum ∇f must be normal to the constraint surface, so there exists a multiplier λ with

∇f(x) + λ ∇g(x) = 0 ,   (B.2)

i.e. the optimum is a stationary point of the Lagrangian L(x, λ) = f(x) + λ g(x) (B.3). For an inequality constraint g(x) ≤ 0, either the constraint is inactive (λ = 0) or active (g(x) = 0, λ > 0), which the KKT conditions summarize: g(x) ≤ 0, λ ≥ 0, λ g(x) = 0. The general problem with m equality and n inequality constraints is

min_x f(x)   (B.4)
s.t.  h_i(x) = 0  (i = 1, ..., m) ,
      g_j(x) ≤ 0  (j = 1, ..., n) .
B.2 Lagrange Duality

Its Lagrangian is L(x, μ, λ) = f(x) + Σ_i μ_i h_i(x) + Σ_j λ_j g_j(x) (B.5)-(B.6). The primal problem induces the dual function Γ: R^m × R^n → R,

Γ(μ, λ) = inf_{x∈D} L(x, μ, λ) ,   (B.7)

which lower-bounds the primal optimum p* for every λ ⪰ 0 (B.8)-(B.9). Maximizing it,

max_{μ,λ} Γ(μ, λ)   s.t. λ ⪰ 0 ,   (B.10)-(B.11)

gives the dual problem with optimum d* ≤ p* (weak duality); when d* = p* strong duality holds, e.g. for convex problems satisfying Slater's condition. Linear and quadratic programs (B.12 ff.) are the standard examples.
B.3 Semidefinite Programming and Descent Methods

A semidefinite program optimizes a linear functional of a symmetric matrix variable under affine constraints:

min_X  tr(C X)
s.t.  tr(A_i X) = b_i ,  i = 1, 2, ..., m ,
      X ⪰ 0 .

For unconstrained smooth minimization, the first-order expansion f(x + Δx) ≈ f(x) + Δxᵀ ∇f(x) (B.17) shows that Δx = −γ ∇f(x) with a small step γ decreases f: the gradient descent method. Coordinate descent (coordinate ascent, for maximization) instead cycles through the coordinates of x = (x1, x2, ..., x_d), minimizing over one at a time:

x_i^{t+1} = argmin_y f( x_1^{t+1}, ..., x_{i-1}^{t+1}, y, x_{i+1}^{t}, ..., x_d^{t} ) ,   (B.18)

producing a sequence x0, x1, x2, ... with non-increasing objective values; for smooth convex f it converges to a stationary point, though non-smoothness can trap it away from the optimum.
Appendix C: Probability Distributions

C.1.1 Uniform Distribution

For x ∈ [a, b], U(x | a, b) = 1/(b − a), with

E[x] = (a + b)/2 ,   (C.2)
var[x] = (b − a)²/12 .   (C.3)

C.1.2 Bernoulli Distribution

(Named for Jacob Bernoulli, 1654-1705.) For x ∈ {0, 1} with parameter μ: P(x | μ) = μ^x (1 − μ)^{1−x}, with

E[x] = μ ,   (C.5)
var[x] = μ(1 − μ) .   (C.6)

C.1.3 Binomial Distribution

The number m of successes in N Bernoulli trials:

Bin(m | N, μ) = (N choose m) μ^m (1 − μ)^{N−m} ,   (C.7)
E[m] = N μ ,   (C.8)
var[m] = N μ (1 − μ) .   (C.9)

C.1.4 Multinomial Distribution

For a discrete variable with d outcomes, one-hot encoded as x with Σ_i x_i = 1 and parameters μ_i ≥ 0, Σ_i μ_i = 1:

P(x | μ) = Π_i μ_i^{x_i} ,   (C.11)
E[x_i] = μ_i ,  var[x_i] = μ_i (1 − μ_i) ,  cov[x_j, x_i] = I(j = i) μ_i − μ_j μ_i .   (C.12)-(C.13)

Over N independent trials, the counts (m1, m2, ..., m_d) follow

Mult(m1, m2, ..., m_d | N, μ) = ( N! / (m1! m2! ... m_d!) ) Π_i μ_i^{m_i} ,   (C.14)
E[m_i] = N μ_i .   (C.15)
C.1.5 Beta Distribution

(Figure: Beta densities for several (a, b).) For μ ∈ (0, 1) with parameters a, b > 0:

Beta(μ | a, b) = ( Γ(a + b) / ( Γ(a) Γ(b) ) ) μ^{a−1} (1 − μ)^{b−1} = μ^{a−1} (1 − μ)^{b−1} / B(a, b) ,   (C.18)
E[μ] = a / (a + b) ,   (C.19)
var[μ] = a b / ( (a + b)² (a + b + 1) ) ,   (C.20)

where Γ is the gamma function and B(a, b) = Γ(a)Γ(b)/Γ(a + b) (C.21)-(C.22). Beta(μ | 1, 1) is the uniform distribution, and the Beta is the conjugate prior of the Bernoulli/binomial parameter.
C.1.6 Dirichlet Distribution

For μ = (μ1, ..., μ_d) on the simplex (μ_i ≥ 0, Σ_i μ_i = 1) with parameters α = (α1, ..., α_d) and α̂ = Σ_i α_i:

Dir(μ | α) = ( Γ(α̂) / ( Γ(α1) ... Γ(α_d) ) ) Π_i μ_i^{α_i − 1} ,   (C.23)
E[μ_i] = α_i / α̂ ,   (C.24)
var[μ_i] = α_i (α̂ − α_i) / ( α̂² (α̂ + 1) ) ,  cov[μ_j, μ_i] = −α_j α_i / ( α̂² (α̂ + 1) )  (j ≠ i) ,   (C.25)-(C.26)

the conjugate prior of the multinomial.

C.1.7 Gaussian Distribution

The univariate Gaussian

N(x | μ, σ²) = ( 1/√(2πσ²) ) exp( −(x − μ)² / (2σ²) )   (C.27)

has E[x] = μ (C.28) and var[x] = σ² (C.29).
The d-dimensional Gaussian with mean μ and covariance Σ is

p(x | μ, Σ) = ( 1 / ( (2π)^{d/2} |Σ|^{1/2} ) ) exp( −(1/2)(x − μ)ᵀ Σ^{-1} (x − μ) ) ,   (C.30)

with E[x] = μ (C.31) and cov[x] = Σ (C.32). The Gaussian is conjugate to itself: for i.i.d. data X = {x1, x2, ..., x_m} with sample mean m_x, a Gaussian prior on the mean yields a Gaussian posterior whose precision adds the prior and data precisions (C.33); with a gamma prior on the precision, the posterior is again gamma, its parameter updated by the sufficient statistics (e.g. b' = b plus a term built from m and Σ_i (x_i − m_x)²).
C.3 KL Divergence

The Kullback-Leibler divergence between distributions P and Q with densities p, q is

KL(P‖Q) = ∫ p(x) ln( p(x)/q(x) ) dx ,   (C.34)

which is nonnegative, zero iff P = Q almost everywhere, and asymmetric (hence not a metric). Expanding,

KL(P‖Q) = ∫ p(x) ln p(x) dx − ∫ p(x) ln q(x) dx = −H(P) + H(P, Q) ,   (C.35)-(C.36)

where H(P) is the entropy of P and H(P, Q) the cross entropy of P and Q.
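For discrete distributions the integral in (C.34) becomes a sum; the two example distributions below are illustrative assumptions:

```python
import math

def kl_discrete(p, q):
    """KL(P||Q) for discrete distributions (C.34): sum of p ln(p/q),
    with the convention 0 ln 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_discrete(p, q))
print(kl_discrete(q, p))   # different value: KL is asymmetric
print(kl_discrete(p, p))   # 0.0 when the distributions coincide
```

The asymmetry visible here is exactly why KL(P‖Q) and KL(Q‖P) lead to different approximations in variational inference.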
Index

AdaBoost, 173
Bagging, 178
Boosting, 173, 190
ECOC, 64
LASSO, 252, 261
LVQ, 204, 218
MDP, 371
MvM, 63
OvO, 63
OvR, 63
ReLU, 114
RIPPER, 353
RKHS, 128
Softmax, 375
WEKA, 16
Colophon: published 2016; CIP classification TP181; http://www.wqbook.com; postal code 100084; tel. 010-62782989, 13701121933, 010-62770175, 010-62786544, 010-62776969; print run 1-5000; product code 064027-01.
Notation

x — scalar
x (bold) — vector
A — matrix
I — identity matrix
sup(·) — supremum
I(·) — indicator function (1 if the argument holds, 0 otherwise)
sign(·) — sign function (−1, 0, 1 as the argument is negative, zero, positive)
Contents

1 Introduction ... 1
1.1 Overview ... 1
1.2 Basic Terminology ... 2
1.3 Hypothesis Space ... 4
1.4 Inductive Bias ... 6
1.5 Brief History ... 10
1.6 State of Application ... 13
1.7 Further Reading ... 16
Exercises
References ... 20
Break Time ... 22

4 Decision Trees ... 73
4.1 Basic Process ... 73
4.2 Split Selection ... 75
4.3 Pruning ... 79
4.4 Continuous and Missing Values ... 83
4.5 Multivariate Decision Trees ... 88
4.6 Further Reading ... 92
Exercises ... 93
References ... 94
Break Time ... 95

5 Neural Networks ... 97
5.1 Neuron Model ... 97
5.2 Perceptrons and Multi-Layer Networks ... 98
5.3 Error Backpropagation ... 101
5.4 Global Minimum and Local Minimum ... 106
5.5 Other Common Neural Networks ... 108
5.6 Deep Learning ... 113
5.7 Further Reading ... 115
Exercises ... 116
References ... 117
Break Time ... 120

6 Support Vector Machines ... 121
6.1 Margin and Support Vectors ... 121
6.2 Dual Problem ... 123
6.3 Kernel Functions ... 126
6.4 Soft Margin and Regularization ... 129
6.5 Support Vector Regression ... 133

8 Ensemble Learning ... 171
8.1 Individual and Ensemble ... 171
8.2 Boosting ... 173
8.3 Bagging and Random Forest ... 178
8.4 Combination Strategies ... 181
8.5 Diversity ... 185
8.6 Further Reading ... 190
Exercises ... 192
References ... 193
Break Time ... 196

9 Clustering ... 197
9.1 Clustering Task ... 197
9.2 Performance Measures ... 197
9.3 Distance Computation ... 199
9.4 Prototype Clustering ... 202
9.5 Density Clustering ... 211

10 Dimensionality Reduction and Metric Learning ... 225
10.1 k-Nearest Neighbor Learning ... 225
10.2 Low-Dimensional Embedding ... 226
10.3 Principal Component Analysis ... 229
10.4 Kernelized Linear Dimensionality Reduction ... 232
10.5 Manifold Learning ... 234
10.6 Metric Learning ... 237
10.7 Further Reading ... 240
Exercises ... 242
References ... 243
Break Time ... 246

11 Feature Selection and Sparse Learning ... 247
11.1 Subset Search and Evaluation ... 247
11.2 Filter Methods ... 249
11.3 Wrapper Methods ... 250
11.4 Embedded Methods and L1 Regularization ... 252
11.5 Sparse Representation and Dictionary Learning ... 254
11.6 Compressed Sensing ... 257
11.7 Further Reading ... 260
Exercises ... 262
References ... 263
Break Time ... 266

12 Computational Learning Theory
12.5 Rademacher Complexity ... 279
12.6 Stability ... 284
12.7 Further Reading ... 287
Exercises ... 289
References ... 290
Break Time ... 292

13 Semi-Supervised Learning ... 293
13.1 Unlabeled Samples ... 293
13.2 Generative Methods ... 295
13.3 Semi-Supervised SVM ... 298
13.4 Graph-Based Semi-Supervised Learning ... 300
13.5 Disagreement-Based Methods ... 304
13.6 Semi-Supervised Clustering ... 307
13.7 Further Reading ... 311
Exercises ... 313
References ... 314
Break Time ... 317

14 Probabilistic Graphical Models ... 319
14.1 Hidden Markov Model ... 319
14.2 Markov Random Field ... 322
14.3 Conditional Random Field ... 325
14.4 Learning and Inference ... 328
14.5 Approximate Inference ... 331
14.6 Topic Models ... 337
14.7 Further Reading ... 339
Exercises ... 341
References ... 342
Break Time ... 345

15 Rule Learning ... 347
15.1 Basic Concepts ... 347
15.2 Sequential Covering ... 349
15.3 Pruning Optimization ... 352
15.4 First-Order Rule Learning ... 354
15.5 Inductive Logic Programming ... 357
15.6 Further Reading ... 363
Exercises ... 365
References ... 366
Break Time ... 369

16 Reinforcement Learning ... 371
16.1 Task and Reward ... 371
16.2 K-Armed Bandit ... 373
16.3 Model-Based Learning ... 377
16.4 Model-Free Learning ... 382
16.5 Value Function Approximation ... 388
16.6 Imitation Learning ... 390
16.7 Further Reading ... 393
Exercises ... 394
References ... 395
Break Time ... 397

Appendix A Matrix Operations ... 399
Appendix B Optimization ... 403
Appendix C Probability Distributions ... 409
Postscript ... 417
Index ... 419