
2011/11/28

[Figure: motivating examples of matrix data: a sensors x time signal array, and a movies x users rating table (Star Wars, Titanic, Blade Runner rated by User 1, User 2, User 3).]

Matrix data X_1, X_2, X_3, X_4: a linear model with a low-rank weight matrix

  f(X) = \langle X, W \rangle + b,    W = A B^\top

Tensor data: a multilinear (Tucker-type) model

  X = G \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_K U^{(K)}

Vector data x_1, x_2, x_3, x_4: linear and quadratic features

  f(x) = \langle x, w_1 \rangle + \langle x x^\top, W_2 \rangle + \cdots

[Timeline slides, labels partly lost: sparse (\ell_1) estimation (2006-2009), low-rank matrix estimation (2010-), low-rank tensor estimation (?).]

Example (2006-2009): right or left? (binary classification of brain signals)

Tucker decomposition (2010-):

  X = C \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}

Brain-computer interface
Aims to decode thoughts or commands from human brain signals [Wolpaw+ 2002].

[Diagram: thoughts and commands are encoded in brain activity; decoding inverts this map from the acquired signal.]

Signal acquisition (EEG, MEG, ...)

P300 speller system [Farwell & Donchin 1988]
An evoked response (the P300) is elicited roughly 300 ms after a rare, attended stimulus.

P300 speller system

  A  B  C  D  E  F
  G  H  I  J  K  L
  M  N  O  P  Q  R
  S  T  U  V  W  X
  Y  Z  1  2  3  4
  5  6  7  8  9  _

The rows and columns of the 6 x 6 character grid flash one after another; an evoked response (ER) appears only when the row or column containing the attended character flashes.

ER detected when the 3rd row (M N O P Q R) flashes.
ER detected when the 4th column (D J P V 2 8) flashes.
=> The character must be P.
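A minimal sketch (my own illustration, not from the slides) of this intersection logic; the detector scores below are made-up placeholders:

import numpy as np

# The 6 x 6 P300 speller grid
GRID = np.array([list("ABCDEF"), list("GHIJKL"), list("MNOPQR"),
                 list("STUVWX"), list("YZ1234"), list("56789_")])

def decode_character(row_scores, col_scores):
    """Pick the row and column flashes with the strongest ER evidence."""
    r = int(np.argmax(row_scores))
    c = int(np.argmax(col_scores))
    return GRID[r, c]

# Hypothetical detector outputs: the 3rd row and 4th column stand out
print(decode_character([0.1, 0.0, 0.9, 0.2, 0.1, 0.0],
                       [0.2, 0.1, 0.0, 0.8, 0.1, 0.1]))  # -> P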


Learning the detector from training data (X_1, y_1), ..., (X_n, y_n)

X_i: signal matrix (channels x time) for the i-th flash
y_i = +1 or -1 (ER present or absent)

  \mathrm{minimize}_{W, b} \; \sum_{i=1}^{n} \ell( f(X_i), y_i ) + R(W)

  f(X) = \langle X, W \rangle + b


Schatten 1-norm (nuclear norm / trace norm): the sum of the singular values,

  \|W\|_{S_1} = \sum_j \sigma_j(W)

Its proximal operator is singular-value soft-thresholding: with the SVD X = U S V^\top,

  \mathrm{argmin}_W \left( \tfrac{1}{2} \|X - W\|_F^2 + \lambda \|W\|_{S_1} \right) = U \max(S - \lambda, 0) V^\top
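A small numpy sketch of this proximal operator; the function name and test matrix are mine:

import numpy as np

def prox_nuclear(X, lam):
    """argmin_W 0.5*||X - W||_F^2 + lam*||W||_S1: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # rank 3
W = prox_nuclear(X + 0.1 * rng.standard_normal((50, 40)), lam=2.0)
print(np.linalg.matrix_rank(W))  # typically 3: small singular values are zeroed out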

Modeling the P300 speller (decoding)

Suppose we have a detector f(X) that detects the P300 response in signal X. Attach one detector score to each row (f_1, ..., f_6) and each column (f_7, ..., f_12) of the grid. This is nothing but learning two 6-class classifiers.

How do we do this? The flashes arrive in random order (e.g., 12 2 8 1 3 4 11 9 5 6 10 7); the row scores and the column scores are each combined through a multinomial likelihood.
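A sketch of the two 6-class softmax (multinomial) read-outs described above; the scores are hypothetical detector outputs, not data from the talk:

import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

# Detector scores f_1..f_6 (rows) and f_7..f_12 (columns), accumulated over repetitions
row_scores = [0.3, -0.1, 2.1, 0.0, 0.2, -0.4]
col_scores = [0.1, 0.0, -0.2, 1.8, 0.3, 0.1]
p_row, p_col = softmax(row_scores), softmax(col_scores)
print(p_row.argmax(), p_col.argmax())  # most probable row and column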

Experiment
Two subjects (A & B) from BCI competition III
64 channels x 37 time-points (600 ms @ 60 Hz)
12 epochs x 15 repetitions x 85 letters = 15,300 epochs in the training set
100 letters for test
Linear detector function (the bias is irrelevant)

[Figure: classification accuracy (%) and number of active components as functions of the regularization constant \lambda (log scale), for Subject A and Subject B.]

Tomioka & Müller 2009

[Figure: spatial filters and patterns with the corresponding time courses of the leading singular components, for Subject A and Subject B; the evoked responses peak around 300 ms.]





Summary so far:
- The Farwell & Donchin P300 speller reduces to learning 2 x 6-class classifiers.
- Schatten 1-norm regularization yields low-rank detectors with few active components.


Tensor rank (for a K-th order tensor X \in R^{n_1 \times \cdots \times n_K}): the minimum number R of rank-one terms such that

  X = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(K)}

Determining the rank R of a given tensor is NP-hard.

CANDECOMP / PARAFAC (CP) decomposition

Matrix case: X = A B^\top.

CP decomposition (third order):

  X = \sum_{r=1}^{R} a_r \circ b_r \circ c_r = [[A, B, C]]

Uniqueness (Kruskal 77):

  k_A + k_B + k_C \geq 2R + 2

where k_A, the k-rank of A, is the largest k such that every k columns of A are linearly independent.

Kolda & Bader 2009

rank(X) = 3:

  X = a_1 \circ b_1 \circ c_2 + a_1 \circ b_2 \circ c_1 + a_2 \circ b_1 \circ c_1

rank(Y_\epsilon) = 2:

  Y_\epsilon = \tfrac{1}{\epsilon} (a_1 + \epsilon a_2) \circ (b_1 + \epsilon b_2) \circ (c_1 + \epsilon c_2) - \tfrac{1}{\epsilon} a_1 \circ b_1 \circ c_1

yet \|X - Y_\epsilon\|_F \to 0 as \epsilon \to 0.

(The set of tensors of rank at most 2 is not closed: the best rank-2 approximation of X does not exist.)
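A quick numpy check (my own illustration) that the rank-2 tensor Y_eps approaches the rank-3 tensor X as eps shrinks:

import numpy as np

rng = np.random.default_rng(0)
a1, a2, b1, b2, c1, c2 = (rng.standard_normal(4) for _ in range(6))
outer3 = lambda a, b, c: np.einsum('i,j,k->ijk', a, b, c)

X = outer3(a1, b1, c2) + outer3(a1, b2, c1) + outer3(a2, b1, c1)  # rank 3
for eps in [1.0, 0.1, 0.01]:
    Y = (outer3(a1 + eps*a2, b1 + eps*b2, c1 + eps*c2)
         - outer3(a1, b1, c1)) / eps                               # rank <= 2
    print(eps, np.linalg.norm(X - Y))  # the gap shrinks linearly in eps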

Tucker decomposition [Tucker 66]

An n_1 x n_2 x n_3 tensor X is expressed by an r_1 x r_2 x r_3 core C and factor matrices U^{(1)}, U^{(2)}, U^{(3)}:

  X_{ijk} = \sum_{a=1}^{r_1} \sum_{b=1}^{r_2} \sum_{c=1}^{r_3} C_{abc} U^{(1)}_{ia} U^{(2)}_{jb} U^{(3)}_{kc}

CP is the special case of Tucker with a superdiagonal core (r_1 = r_2 = r_3).
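In numpy this multilinear product is a single einsum; a sketch with illustrative shapes:

import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r1, r2, r3 = 10, 9, 8, 3, 2, 4
C = rng.standard_normal((r1, r2, r3))                     # core
U1, U2, U3 = (rng.standard_normal(s) for s in [(n1, r1), (n2, r2), (n3, r3)])
# X_ijk = sum_{a,b,c} C_abc U1_ia U2_jb U3_kc
X = np.einsum('abc,ia,jb,kc->ijk', C, U1, U2, U3)
print(X.shape)  # (10, 9, 8)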

Mode-k unfolding (matricization): arrange the mode-k fibers of the tensor as the columns of a matrix.

  Mode-1: X_{(1)} \in R^{I_1 \times I_2 I_3}
  Mode-2: X_{(2)} \in R^{I_2 \times I_3 I_1}
  Mode-3: X_{(3)} \in R^{I_3 \times I_1 I_2}

[Diagram: an I_1 x I_2 x I_3 tensor cut into slices and laid out as the matrices X_{(1)} and X_{(2)}.]
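A sketch of mode-k unfolding in numpy (this uses one common fiber ordering; the exact column order in the slides' convention may differ):

import numpy as np

def unfold(X, k):
    """Mode-k unfolding: move axis k to the front, then flatten the rest."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 0).shape, unfold(X, 1).shape, unfold(X, 2).shape)
# (2, 12) (3, 8) (4, 6)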

For a tensor in Tucker form, X = C \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}, the unfoldings factor as

  Mode-1: X_{(1)} = U^{(1)} C_{(1)} (U^{(3)} \otimes U^{(2)})^\top,   rank(X_{(1)}) = r_1
  Mode-2: X_{(2)} = U^{(2)} C_{(2)} (U^{(1)} \otimes U^{(3)})^\top,   rank(X_{(2)}) = r_2
  Mode-3: X_{(3)} = U^{(3)} C_{(3)} (U^{(2)} \otimes U^{(1)})^\top,   rank(X_{(3)}) = r_3

The Tucker (multilinear) rank (r_1, r_2, r_3) is thus the tuple of ranks of the unfoldings X_{(k)}.
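An illustrative numeric check that the unfolding ranks of a random Tucker-form tensor recover (r_1, r_2, r_3):

import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((3, 2, 4))                        # core: rank (3, 2, 4)
U1, U2, U3 = (rng.standard_normal(s) for s in [(10, 3), (9, 2), (8, 4)])
X = np.einsum('abc,ia,jb,kc->ijk', C, U1, U2, U3)

unfold = lambda T, k: np.moveaxis(T, k, 0).reshape(T.shape[k], -1)
print([np.linalg.matrix_rank(unfold(X, k)) for k in range(3)])  # [3, 2, 4]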

CP vs. Tucker

  CP: even determining the rank R is NP-hard, and so is computing the decomposition.
  Tucker: the multilinear rank is obtained from the SVD of each unfolding; the core is r_1 x r_2 x r_3.

The convex approach below builds on the Tucker notion of rank.


Overlapped Schatten 1-norm:

  \|X\|_{S_1} := \frac{1}{K} \sum_{k=1}^{K} \|X_{(k)}\|_{S_1}

i.e., the Schatten 1-norm of the mode-k unfolding, averaged over the K modes.

(Cf. Liu+09, Signoretto+10, Tomioka+10, Gandy+11)
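A sketch of the overlapped Schatten 1-norm, and of the mean (spectral) norm used as its counterpart in the analysis below, assuming the 1/K-averaged definitions:

import numpy as np

unfold = lambda X, k: np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def overlapped_s1(X):
    """(1/K) * sum_k ||X_(k)||_S1: averaged nuclear norms of the unfoldings."""
    return np.mean([np.linalg.svd(unfold(X, k), compute_uv=False).sum()
                    for k in range(X.ndim)])

def mean_spectral(X):
    """(1/K) * sum_k ||X_(k)||_Sinf: averaged spectral norms of the unfoldings."""
    return np.mean([np.linalg.svd(unfold(X, k), compute_uv=False).max()
                    for k in range(X.ndim)])

X = np.random.default_rng(0).standard_normal((5, 6, 7))
print(overlapped_s1(X), mean_spectral(X))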



Tensor completion:

  \mathrm{minimize}_X \; \|X\|_{S_1}   subject to   X_{ijk} = Y_{ijk}  for all observed (i, j, k)

This is a convex problem.

[Figure: estimation error vs. fraction of observed elements; the convex method reaches the optimization tolerance, while the exact Tucker decomposition can get stuck at larger error.]
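As a rough illustration only (the slides' convex problem is typically solved with ADMM-type methods, e.g., Gandy+11), here is a simple heuristic that soft-thresholds each unfolding and re-imposes the observed entries:

import numpy as np

def svt(M, lam):
    """Singular-value soft-thresholding of a matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

def complete(Y, mask, lam=0.5, iters=200):
    """Heuristic tensor completion: average mode-wise SVT, keep observed entries."""
    X = np.where(mask, Y, 0.0)
    for _ in range(iters):
        modes = []
        for k in range(X.ndim):
            P = np.moveaxis(X, k, 0)
            M = svt(P.reshape(P.shape[0], -1), lam).reshape(P.shape)
            modes.append(np.moveaxis(M, 0, k))
        X = np.mean(modes, axis=0)
        X[mask] = Y[mask]              # re-impose the observed entries exactly
    return X

rng = np.random.default_rng(0)
T = np.einsum('ia,ja,ka->ijk', *(rng.standard_normal((12, 2)) for _ in range(3)))
mask = rng.random(T.shape) < 0.6       # observe 60% of the entries
print(np.linalg.norm(complete(T, mask) - T) / np.linalg.norm(T))  # relative error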

Compressed sensing (vector case):

  \mathrm{minimize}_x \; \|x\|_1   subject to   A x = y,    A \in R^{n \times N}  (n \ll N)
Donoho-Tanner phase transition

Donoho & Tanner (2010) give precise undersampling theorems for when \ell_1 minimization succeeds.

Matrix case: a rank-r n x n matrix has r(2n - r) degrees of freedom out of n^2 entries; Recht, Fazel & Parrilo (2010), Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, show that on the order of p \gtrsim r(2n - r) random measurements suffice.

True tensor W^* with multilinear rank (r_1, ..., r_K); noisy linear measurements

  y_i = \langle X_i, W^* \rangle + \epsilon_i   (i = 1, ..., M)

Estimator:

  \hat{W} = \mathrm{argmin}_{W \in R^{n_1 \times \cdots \times n_K}} \; \frac{1}{2} \|y - \mathfrak{X}(W)\|_2^2 + \lambda_M \|W\|_{S_1}    (N = \prod_{k=1}^{K} n_k)

where \mathfrak{X}: R^N \to R^M is the design, \mathfrak{X}(W) = (\langle X_1, W \rangle, ..., \langle X_M, W \rangle)^\top.

(cf. Negahban & Wainwright 11)
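For concreteness, a sketch that evaluates this objective for a given W (names are mine; the overlapped norm uses the 1/K-averaged definition above):

import numpy as np

unfold = lambda W, k: np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def overlapped_s1(W):
    return np.mean([np.linalg.svd(unfold(W, k), compute_uv=False).sum()
                    for k in range(W.ndim)])

def objective(W, Xs, y, lam):
    """0.5*||y - X(W)||^2 + lam*||W||_S1, with X(W)_i = <X_i, W>."""
    residual = y - np.array([np.vdot(Xi, W) for Xi in Xs])
    return 0.5 * residual @ residual + lam * overlapped_s1(W)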

Restricted strong convexity: the design \mathfrak{X} satisfies RSC on a set C with constant \kappa(\mathfrak{X}) > 0 if

  \frac{1}{2M} \|\mathfrak{X}(\Delta)\|_2^2 \geq \frac{\kappa(\mathfrak{X})}{2} \|\Delta\|_F^2   for all \Delta \in C.

If M > N this can hold for all \Delta (no restriction); otherwise \mathfrak{X} has a nontrivial null space and the restriction to C matters.

Adjoint of the design:

  \mathfrak{X}^*(\epsilon) = \sum_{i=1}^{M} \epsilon_i X_i

and the regularization constant must dominate the noise: \lambda_M \geq 2 \|\mathfrak{X}^*(\epsilon)\|_{mean} / M.

Mean (dual) norm: the spectral norms of the unfoldings, averaged over the modes,

  \|X\|_{mean} := \frac{1}{K} \sum_{k=1}^{K} \|X_{(k)}\|_{S_\infty},    \|X\|_{S_\infty} := \max_j \sigma_j(X)

Key tool (a Hölder-type inequality):

  \langle W, X \rangle \leq \|W\|_{S_1} \|X\|_{mean}

Theorem: if \lambda_M \geq 2 \|\mathfrak{X}^*(\epsilon)\|_{mean} / M and RSC holds with constant \kappa(\mathfrak{X}), then

  \|\hat{W} - W^*\|_F \leq \frac{32 \lambda_M}{\kappa(\mathfrak{X})} \cdot \frac{1}{K} \sum_{k=1}^{K} \sqrt{r_k}
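A quick Monte Carlo sanity check of the Hölder-type inequality (my own illustration, using the averaged definitions of both norms):

import numpy as np

unfold = lambda X, k: np.moveaxis(X, k, 0).reshape(X.shape[k], -1)
svals = lambda X, k: np.linalg.svd(unfold(X, k), compute_uv=False)

rng = np.random.default_rng(0)
for _ in range(5):
    W, X = rng.standard_normal((4, 5, 6)), rng.standard_normal((4, 5, 6))
    s1 = np.mean([svals(W, k).sum() for k in range(3)])         # ||W||_S1
    mean_norm = np.mean([svals(X, k).max() for k in range(3)])  # ||X||_mean
    assert np.vdot(W, X) <= s1 * mean_norm + 1e-9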



What remains is to bound \|\mathfrak{X}^*(\epsilon)\|_{mean} (following Negahban & Wainwright).

Tensor denoising (M = N, \mathfrak{X} = identity): \|\mathfrak{X}(\Delta)\|_2^2 = \|\Delta\|_F^2, so \kappa(\mathfrak{X}) = 1/M. For Gaussian noise, each unfolding \mathfrak{X}^*(\epsilon)_{(k)} is an n_k x (N/n_k) Gaussian random matrix, hence

  E \|\mathfrak{X}^*(\epsilon)\|_{mean} \leq \frac{\sigma}{K} \sum_{k=1}^{K} \left( \sqrt{n_k} + \sqrt{N / n_k} \right)    (N = \prod_k n_k)

and the choice \lambda_M = 2 \|\mathfrak{X}^*(\epsilon)\|_{mean} / M applies.
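The Gaussian operator-norm bound behind this estimate is easy to check numerically (an illustration, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 20, 15, 10
N = n1 * n2 * n3
for nk in (n1, n2, n3):
    ops = [np.linalg.svd(rng.standard_normal((nk, N // nk)),
                         compute_uv=False)[0] for _ in range(20)]
    print(nk, np.mean(ops), np.sqrt(nk) + np.sqrt(N / nk))  # empirical mean vs. bound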

Consequence (denoising, M = N):

  \frac{\|\hat{W} - W^*\|_F}{\sqrt{N}} = O_p\left( \|n^{-1}\|_{1/2}^{1/2} \; \|r\|_{1/2}^{1/2} \right)

where

  \|n^{-1}\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \frac{1}{\sqrt{n_k}} \right)^2,    \|r\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{r_k} \right)^2

The product \|n^{-1}\|_{1/2}^{1/2} \|r\|_{1/2}^{1/2} plays the role of a normalized rank.

[Figure: simulations at noise levels \sigma = 0.01 and \sigma = 0.1. Mean squared error \|\hat{W} - W^*\|_F^2 / N plotted against the normalized rank \|n^{-1}\|_{1/2}^{1/2} \|r\|_{1/2}^{1/2} for sizes [50 50 20] and [100 100 50]; the error grows linearly in the normalized rank, as predicted.]


Random Gaussian designs X_i: by the same random-matrix bound,

  E \|\mathfrak{X}^*(\epsilon)\|_{mean} \leq \frac{\sigma \sqrt{M}}{K} \sum_{k=1}^{K} \left( \sqrt{n_k} + \sqrt{N / n_k} \right),    \lambda_M = 2 \|\mathfrak{X}^*(\epsilon)\|_{mean} / M

and RSC holds (with \kappa(\mathfrak{X}) = 1/64) once

  M \geq c \, \|n^{-1}\|_{1/2}^{1/2} \, \|r\|_{1/2}^{1/2} \, N

(M: number of measurements, N: number of elements)

Three steps of the analysis:

1. M \geq c \, \|n^{-1}\|_{1/2}^{1/2} \, \|r\|_{1/2}^{1/2} \, N samples guarantee restricted strong convexity.

2. The regularization constant scales as

  \lambda_M \simeq \frac{2\sigma}{\sqrt{M}} \cdot \frac{1}{K} \sum_{k=1}^{K} \left( \sqrt{n_k} + \sqrt{N / n_k} \right)

3. Combining the two,

  \frac{\|\hat{W} - W^*\|_F}{\sqrt{N}} = O_p\left( \|n^{-1}\|_{1/2}^{1/2} \, \|r\|_{1/2}^{1/2} \, \sqrt{\frac{N}{M}} \right)

The normalized rank appearing in these steps can be computed directly, as in the sketch below.
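A tiny helper (my naming) for the normalized rank that appears in these bounds and on the plot axes:

import numpy as np

def normalized_rank(ns, rs):
    """||n^{-1}||_{1/2}^{1/2} * ||r||_{1/2}^{1/2} for dimensions ns, multilinear rank rs."""
    K = len(ns)
    n_half = (sum(1.0 / np.sqrt(n) for n in ns) / K) ** 2
    r_half = (sum(np.sqrt(r) for r in rs) / K) ** 2
    return np.sqrt(n_half * r_half)

print(normalized_rank([50, 50, 20], [5, 5, 3]))  # e.g., a rank-(5,5,3) 50x50x20 tensor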

[Figure: estimation error vs. fraction of observed elements for the convex method and the exact Tucker decomposition, with the optimization tolerance marked (as above).]

[Figure: fraction of runs reaching error <= 0.01 vs. normalized rank \|n^{-1}\|_{1/2}^{1/2} \|r\|_{1/2}^{1/2}; third-order tensors of size [50 50 20] and [100 100 50], and matrices of size [50 20] and [100 40]. Curves for different sizes align when plotted against the normalized rank, and the phase-transition threshold scales with it.]



Summary
- Unlike the SVD-based Tucker decomposition, the overlapped Schatten 1-norm turns low-rank tensor estimation into a convex problem.
- The estimation error and the sample complexity are governed by the normalized rank \|n^{-1}\|_{1/2}^{1/2} \|r\|_{1/2}^{1/2}.

Wolpaw, Birbaumer, McFarland, Pfurtscheller, Vaughan (2002) Brain-computer interfaces for communication and control. Clin. Neurophysiol. 113, 767-791.

Farwell & Donchin (1988) Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 70 (6), 510-523.

Tomioka & Müller (2009) A regularized discriminative framework for EEG analysis with application to brain-computer interface. Neuroimage, 49 (1), 415-432.

Kolda & Bader (2009) Tensor decompositions and applications. SIAM Review, 51(3):455-500.

Tucker (1966) Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279-311.

Gandy, Recht, & Yamada (2011) Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27:025010.

Liu, Musialski, Wonka, & Ye (2009) Tensor completion for estimating missing values in visual data. In Proc. ICCV.

Signoretto, de Lathauwer, & Suykens (2010) Nuclear norms for tensors and their use for convex multilinear estimation. Tech Report 10-186, K.U.Leuven.

Donoho & Tanner (2010) Precise undersampling theorems. Proceedings of the IEEE, 98(6):913-924.

Recht, Fazel, & Parrilo (2010) Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471-501.

Tomioka, Hayashi, & Kashima (2011) Estimation of low-rank tensors via convex optimization. Technical report, arXiv:1010.0789, 2011.

Tomioka, Suzuki, Hayashi, & Kashima (2011) Statistical performance of convex tensor decomposition. Advances in NIPS 24. 2011, Granada, Spain.
