Improving neural networks by preventing co-adaptation of feature detectors
arXiv:1207.0580v1 [cs.NE]
Correspondence: hinton@cs.toronto.edu

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data.
A feedforward, artificial neural network learns its feature detectors by backpropagating error derivatives (1). When the network is large relative to the training set, many different settings of the weights can model the training data well, but most of them generalize poorly to held-out data. This "overfitting" can be greatly reduced by "dropout": on each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other particular hidden units being present. At test time all of the hidden units are used, but their outgoing weights are halved to compensate for the fact that twice as many of them are active; this "mean network" approximates averaging the predictions of the exponentially many thinned networks sampled during dropout training.
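The dropout procedure just described can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's code: the single-hidden-layer shape, the function names, and the pairing of 20% input dropout with 50% hidden-unit dropout are assumptions chosen to match the text.

```python
import numpy as np

def forward_train(x, W1, b1, W2, b2, rng, p_in=0.2, p_hid=0.5):
    """One stochastic forward pass: drop inputs w.p. p_in, hidden units w.p. p_hid."""
    x = x * (rng.random(x.shape) >= p_in)        # randomly omit input units
    h = np.maximum(0.0, x @ W1 + b1)             # hidden activations
    h = h * (rng.random(h.shape) >= p_hid)       # randomly omit hidden units
    return h @ W2 + b2                           # output logits

def forward_test(x, W1, b1, W2, b2, p_in=0.2, p_hid=0.5):
    """The 'mean network': all units present, outgoing weights scaled down."""
    h = np.maximum(0.0, x @ ((1.0 - p_in) * W1) + b1)  # inputs' outgoing weights x 0.8
    return h @ ((1.0 - p_hid) * W2) + b2               # hidden units' outgoing weights halved
```

Scaling a unit's outgoing weights by its keep probability makes the test-time net compute the expected input each unit would receive under the random masks.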
Dropout training works best when combined with a separate constraint on each hidden unit: instead of penalizing the squared length (L2 norm) of the whole weight vector, an upper bound is placed on the L2 norm of the incoming weight vector of each individual unit, and the weights are renormalized whenever the constraint is violated. For a network with a single hidden layer of N units and a softmax output layer, using the mean network with halved outgoing weights is exactly equivalent to taking the geometric mean of the probability distributions over labels predicted by all 2^N possible dropout networks, so the mean network assigns at least as high a log probability to the correct answer as the mean of the log probabilities assigned by the individual dropout networks (2).
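The geometric-mean equivalence for a single hidden layer can be checked numerically by enumerating all 2^N dropout masks. In the sketch below, the hidden activities, weights, and sizes are arbitrary illustrative values (not from the paper); renormalizing the geometric mean of the 2^N softmax distributions reproduces the mean network with halved outgoing weights.

```python
import itertools
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
N, K = 6, 3                        # N hidden units, K output classes
h = rng.random(N)                  # hidden activities for one input
W = rng.standard_normal((N, K))    # outgoing weights of the hidden layer
b = rng.standard_normal(K)         # output biases

# Geometric mean of the label distributions of all 2^N dropout networks.
log_ps = []
for mask in itertools.product([0, 1], repeat=N):
    m = np.array(mask)
    log_ps.append(np.log(softmax((h * m) @ W + b)))
geo = np.exp(np.mean(log_ps, axis=0))
geo /= geo.sum()                   # renormalize the geometric mean

# Mean network: every hidden unit present, outgoing weights halved.
mean_net = softmax(h @ (0.5 * W) + b)
```

The identity holds because each hidden unit appears in exactly half of the 2^N masks, so averaging the logits over all masks multiplies each unit's contribution by 0.5, and the per-mask normalizing constants cancel when the geometric mean is renormalized.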
MNIST is a widely used benchmark of handwritten digit classification. It contains 60,000 28x28 training images and 10,000 test images. Performance on the test set can be greatly improved by enhancing the training data with transformed images (3), by wiring knowledge about spatial transformations into a convolutional network (4), or by using generative pre-training to extract useful features without using the labels (5). Dropout gives large improvements without any of these aids (Fig. 1).
Figure 1: Test errors on MNIST for a variety of neural network architectures trained with backpropagation using 50% dropout of the hidden units and 20% dropout of the input units. [figure: error curves omitted]
Dropout also improves acoustic modeling on TIMIT, a widely used benchmark for recognition of clean speech, where deep neural networks whose outputs are fed to a hidden Markov model (HMM) have recently displaced Gaussian-mixture HMM systems (6) (7, 8). The networks take as input 21 adjacent frames with an advance of 10 ms per frame and have 4 fully-connected hidden layers of 4000 units per layer and 185 softmax output units that are subsequently merged into the 39 distinct phone classes used in the benchmark. Fine-tuning with 50% dropout of the hidden units and 20% dropout of the input units gives lower frame classification error than standard fine-tuning (Fig. 2).

Figure 2: Frame classification error on the TIMIT benchmark for standard and dropout fine-tuning of various network architectures. [figure: error curves omitted]
CIFAR-10 is a benchmark task that requires classifying 32x32 color images drawn from 10 classes. The images were collected from the web and labeled by hand; there are 50,000 training images and 10,000 test images (9) (Fig. 3). The best published error rate on the test set, without using transformed training data, is 18.5% (10). A convolutional neural network (described in Appendix D) achieves 16.6%, and adding 50% dropout to its last hidden layer reduces the error to 15.6%.
ImageNet is an extremely challenging object recognition dataset consisting of millions of labeled images (11). We used the version released for the 2010 competition, which has 1000 classes.

Figure 3: Ten examples of the class "bird" from the CIFAR-10 test set, illustrating the variability within a class. [figure: images omitted]

The winning entry in the 2010 competition achieved a test error rate of 47.2%, which has since been improved to 45.7% (12). A convolutional neural network with a final 1000-way softmax and L2 weight constraints achieves a comparable 48.6%; using 50% dropout in the sixth, globally-connected hidden layer reduces the error to 42.4% (Appendix E).
Dropout also helps on tasks that have no spatial structure for a convolutional network to exploit. Using the 50 most frequent classes of the Reuters corpus, 201,369 documents were each represented by the counts of the 2000 most frequent non-stopwords, with each count C transformed to log(1 + C). A feedforward network with 2 hidden layers of 2000 units gave a test error of 31.05%, and training the same network with 50% dropout (Appendix C) reduced the error to 29.62%.
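The document representation described above is easy to reproduce. In this sketch the four-word vocabulary stands in for the 2000 most frequent non-stopwords; the function name and vocabulary are illustrative assumptions, not the paper's code.

```python
import numpy as np
from collections import Counter

# Stand-in vocabulary; the paper uses the 2000 most frequent non-stopwords.
VOCAB = ["market", "stock", "trade", "price"]

def featurize(tokens, vocab=VOCAB):
    """Represent a document by log(1 + C) of each vocabulary word's count C."""
    counts = Counter(tokens)
    return np.log1p(np.array([counts[w] for w in vocab], dtype=float))
```

The log(1 + C) transform compresses large counts so that a word occurring 100 times does not dominate one occurring a handful of times.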
Dropout works well because it prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. With dropout at 0.5, each hidden unit must instead learn to work with a randomly chosen sample of the other units, which pushes it toward features that are individually useful.
Figure 4: Some ImageNet test cases with the probabilities of the 5 best labels predicted by the network trained with 50% dropout. [figure: images omitted]
Dropout is related to several earlier ways of combining models. Averaging the predictions of many separately trained networks, as in a mixture of experts (13), or weighting models by their posterior probabilities, as in Bayesian model averaging (14), is expensive at both training and test time. Dropout is also related to "bagging" (15), in which each member of an ensemble is trained on a different bootstrap sample of the training data, as in random forests (16). Dropout differs from standard bagging in that the exponentially many dropout networks share their weights, and each one is trained on very few of the training cases, if any. Finally, there is an intriguing similarity to the role of sex in evolution (17): sexual reproduction breaks up sets of co-adapted genes, favoring genes that work well with many different random partners, just as dropout favors hidden units that are useful in many different contexts (17).
1. D. E. Rumelhart, G. E. Hinton, R. J. Williams, Nature 323, 533 (1986).
2. G. E. Hinton, Neural Computation 14, 1771 (2002).
3. D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber, Neural Computation 22, 3207 (2010).
4. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Proceedings of the IEEE 86, 2278 (1998).
5. G. E. Hinton, R. Salakhutdinov, Science 313, 504 (2006).
6. A. Mohamed, G. Dahl, G. Hinton, IEEE Transactions on Audio, Speech, and Language
Processing, 20, 14 (2012).
7. G. Dahl, D. Yu, L. Deng, A. Acero, IEEE Transactions on Audio, Speech, and
Language Processing, 20, 30 (2012).
8. N. Jaitly, P. Nguyen, A. Senior, V. Vanhoucke, An Application of Pretrained Deep Neural Networks to Large Vocabulary Conversational Speech Recognition, Tech. Rep. 001, Department of Computer Science, University of Toronto (2012).
9. A. Krizhevsky, Learning multiple layers of features from tiny images, Tech. Rep. 001, Department of Computer Science, University of Toronto (2009).
10. A. Coates, A. Y. Ng, ICML (2011), pp. 921-928.
11. J. Deng, et al., CVPR09 (2009).
12. J. Sanchez, F. Perronnin, CVPR11 (2011).
13. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, G. E. Hinton, Neural Computation 3, 79 (1991).
14. R. M. Neal, Bayesian Learning for Neural Networks, Lecture Notes in Statistics No.
118 (Springer-Verlag, New York, 1996).
15. L. Breiman, Machine Learning 24, 123 (1996).
16. L. Breiman, Machine Learning 45, 5 (2001).
17. A. Livnat, C. Papadimitriou, J. Dushoff, M. W. Feldman, PNAS 105, 19803 (2008).
18. R. R. Salakhutdinov, G. E. Hinton, Artificial Intelligence and Statistics (2009).
19. D. D. Lewis, Y. Yang, T. G. Rose, F. Li, Journal of Machine Learning Research 5, 361 (2004).
20. We thank N. Jaitly for help with TIMIT, H. Larochelle, R. Neal, K. Swersky and C.K.
I. Williams for helpful discussions, and NSERC, Google and Microsoft Research for
funding. GEH and RRS are members of the Canadian Institute for Advanced Research.
A MNIST

A.1 Details of dropout training

The MNIST dataset consists of 28x28 digit images: 60,000 for training and 10,000 for testing. The task is to classify each image into one of the 10 digit classes. Dropout networks of four architectures were trained: 784-800-800-10, 784-1200-1200-10, 784-2000-2000-10, and 784-1200-1200-1200-10, with 50% dropout of the hidden units and 20% dropout of the input pixels. Training used stochastic gradient descent on minibatches of 100 cases, with an initial learning rate of 10.0 applied to the average gradient in each minibatch and decayed by a factor of 0.998 after each epoch. The squared length of the incoming weight vector of each hidden unit was constrained to an upper bound l; whenever an update violated the constraint, the incoming weights of that unit were renormalized. All experiments used l = 15, and weights were initialized with small random values on the order of 0.01. Momentum started at 0.5, increased linearly to 0.99 over the first 500 epochs, and remained at 0.99 thereafter, with each weight update scaled by (1 - momentum). Networks were trained for 3000 epochs.
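The weight constraint used here differs from a conventional L2 penalty: each hidden unit's incoming weight vector is projected back onto a ball of squared length l = 15 whenever an update pushes it outside. A minimal NumPy sketch (the function name and the column layout of W are assumptions):

```python
import numpy as np

def constrain_incoming_sq_length(W, l=15.0):
    """Renormalize any hidden unit whose incoming weight vector has squared
    length > l. W has shape (n_in, n_hid); column j feeds hidden unit j."""
    sq = (W ** 2).sum(axis=0)                              # squared length per unit
    scale = np.where(sq > l, np.sqrt(l / np.maximum(sq, 1e-12)), 1.0)
    return W * scale                                       # direction preserved
```

Unlike weight decay, this leaves small weight vectors untouched and only rescales units that exceed the bound, which permits the large initial learning rate described above.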
A.2 Details of dropout fine-tuning of pretrained networks

Dropout can also be used in the fine-tuning phase of networks that are first pretrained as a stack of restricted Boltzmann machines (5). A 784-500-500-2000 network was pretrained (for code see footnote 1) and then fine-tuned with dropout-backpropagation, using 50% dropout of the hidden units, 20% dropout of the inputs, a learning rate of 1.0, and minibatches of 100 cases. After 1000 epochs, standard fine-tuning gave 118 test errors while dropout fine-tuning gave 92. A deep Boltzmann machine (18) of architecture 784-500-1000-10 (for code see footnote 2) was also fine-tuned with dropout-backpropagation (using the 1000-unit DBM features (18)); standard fine-tuning gave 94 errors and dropout fine-tuning gave 79, which appears to be the best published result on this task without using prior knowledge about spatial transformations.
A.3 Further dropout experiments

A 784-500-500 network was also trained with and without dropout for the comparison shown in Fig. 5, using minibatches of 100 cases and the same dropout settings as in A.1.
B TIMIT

The TIMIT corpus contains recordings of 630 speakers of 8 major dialects of American English, each reading 10 phonetically rich sentences.

1 For code see http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
2 For code see http://www.utstat.toronto.edu/~rsalakhu/DBM.html
The speech was preprocessed with the Kaldi speech recognition toolkit (footnote 3), using 25-ms analysis windows advanced by 10 ms, and the resulting features were normalized to have zero mean and unit variance. Training used minibatches of 100 cases. To compare standard and dropout fine-tuning, networks with input windows of 15 to 31 frames, with 3, 4, or 5 hidden layers, and with 2000 or 4000 units per layer were explored (Fig. 6); dropout fine-tuning was better in every case.
B.1 Pretraining

The TIMIT networks were pretrained as a stack of restricted Boltzmann machines (RBMs), following (5).

3 http://kaldi.sourceforge.net
Figure 6: (a, b) Frame classification error on TIMIT for standard and dropout fine-tuning of different architectures. [figure: panels omitted]
The RBMs were trained with a learning rate of 0.01. Because the input features were normalized to unit variance, the Gaussian visible units of the first-level RBM used a fixed standard deviation of 1.0. Momentum was 0.5 for the first 20 epochs and 0.9 thereafter, with each update scaled by the learning rate times (1 - momentum), and an L2 weight decay of 0.001 was applied. The higher-level binary RBMs were trained for 100 epochs, with visible biases initialized to log(p/(1 - p)), where p is the mean activity of the corresponding unit in the training data. The first-level RBM was trained for 50 epochs.
B.2 Fine-tuning

The pretrained stack of RBMs was unrolled into a feedforward network and fine-tuned with dropout-backpropagation, using a learning rate of 1.0 applied to the average gradient in each minibatch and momentum increased from 0.5 to 0.9, following the schedule used for MNIST dropout training. Networks were fine-tuned for 200 epochs, with the learning rate annealed to 0.1 by the end of training. Figure 6 shows that dropout fine-tuning outperformed standard fine-tuning for every architecture tried.
C Reuters

The Reuters Corpus Volume I (RCV1-v2) (19) is an archive of 804,414 newswire stories (footnote 4) that have been categorized into 103 topics, organized hierarchically under four top-level groups: corporate/industrial, economics, government/social, and markets. Keeping the 50 most frequent classes left 402,738 documents, of which 25% were held out for testing; each document was represented by the counts of the 2000 most frequent non-stopwords.

4 The corpus is available at http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm
Figure 7: (a, b) Classification error on the Reuters task for networks trained with and without dropout. [figure: panels omitted]

Networks with architectures 2000-2000-1000-50 and 2000-1000-1000-50 were trained with dropout-backpropagation, using the same hyperparameters as for MNIST dropout training (Appendix A.1), for 500 epochs. Figure 7 shows that dropout reduces the test error for both architectures.
D CIFAR-10

The CIFAR-10 dataset is a labeled subset of the Tiny Images dataset of 32x32 color images. CIFAR-10 consists of 60,000 images drawn from 10 classes, with 5,000 training images and 1,000 test images per class (footnote 5). The CIFAR-10 experiments used these images directly, without enlarging the training set with transformed images.

5 The CIFAR dataset is available at http://www.cs.toronto.edu/~kriz/cifar.html.
E ImageNet

ImageNet is a dataset of millions of labeled images collected from the web, with labels verified using Amazon's Mechanical Turk. We used the version released for the 2010 competition, which has 1000 classes. The training set contains roughly 1.3 million images, with 50,000 validation and 150,000 test images. Compared with CIFAR-10, ImageNet has 1000 rather than 10 classes and far greater variability within each class. Because ImageNet images vary in size, they were rescaled to 256x256 for these experiments.
F Convolutional neural networks

The models used for CIFAR-10 and ImageNet are deep convolutional neural networks (CNNs). In a CNN, each unit in a convolutional layer receives input from only a small local patch of the layer below, and the units of a filter map share the same weights, so a feature detector learned at one image position is replicated across all positions. Weight sharing greatly reduces the number of free parameters compared with a fully-connected network of the same size, and it matches the architecture to the translation-invariant statistics of images, which makes CNNs much easier to train on object recognition tasks. Dropout was applied in the last hidden layer of these networks, as described below.

F.1 Convolutional layers

Each convolutional layer convolves its input with a bank of learned filters and passes the results through a nonlinearity to produce a set of feature maps.
F.2 Pooling

Each pooling unit summarizes a neighborhood of N x N units in the filter map below. Pooling subsamples the maps and gives the network some tolerance to small translations of the input.
F.3 Neuron nonlinearities

The hidden units use the "max-with-zero" nonlinearity: a unit i whose total input at position (x, y) in its map is z^i_{x,y} outputs

a^i_{x,y} = max(0, z^i_{x,y}).

The input to the networks consists of raw RGB pixel values.
F.4 Response normalization

The activity of each unit is normalized by the summed activity of adjacent filters at the same spatial position, which encourages competition between feature detectors.

F.5 Locally-connected layers

Some layers are locally connected like convolutional layers, and also use the max-with-zero nonlinearity, but their filter weights are not shared across positions.
F.6 Training

The networks were trained with stochastic gradient descent on minibatches of 128 cases, using momentum of 0.9: the update for each weight w accumulates a velocity that combines 0.9 times the previous velocity with the current gradient scaled by the learning rate.
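The momentum update described above can be written as a two-line rule. This is a generic sketch of SGD with momentum using illustrative names; learning rates and their schedules are as discussed elsewhere in this appendix:

```python
import numpy as np

def momentum_step(w, v, grad, lr, mu=0.9):
    """v(t+1) = mu * v(t) - lr * grad;  w(t+1) = w(t) + v(t+1)."""
    v = mu * v - lr * grad   # velocity: decayed history plus new gradient step
    return w + v, v
```

With mu = 0.9, consistent gradients compound into steps up to 10x larger than a single lr * grad, while oscillating gradients largely cancel.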
F.7 Learning rate schedule

The learning rate was annealed during training: it was reduced in steps of a factor of 10 whenever the validation error stopped improving, which typically happened 2 or 3 times during training.
G Models for CIFAR-10

The CNN trained on CIFAR-10 without dropout has three convolutional layers. Each convolutional layer is followed by 3x3 max-pooling with a stride of 2 and by response normalization over N = 9 adjacent filters with parameters alpha = 0.001 and beta = 0.75; the output layer is a 10-way softmax. Each convolutional layer has 64 filters of size 5x5 (applied to the maps of the layer below). The dropout version of the network adds a locally-connected layer with 16 filters of size 3x3 after the last convolutional layer and applies 50% dropout to its output before the final softmax.
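Response normalization over N adjacent filters, as used in the CIFAR-10 network, can be sketched as follows. The exact normalization formula is not recoverable from this copy, so the cuda-convnet-style form below, combined with the N = 9, alpha = 0.001, beta = 0.75 parameters mentioned above, is an assumption:

```python
import numpy as np

def local_response_norm(a, N=9, alpha=0.001, beta=0.75):
    """a: activities of shape (num_filters, H, W). Each unit is divided by a
    function of the summed squared activity of N adjacent filters at the
    same spatial position."""
    F = a.shape[0]
    half = N // 2
    out = np.empty_like(a)
    for i in range(F):
        lo, hi = max(0, i - half), min(F, i + half + 1)   # window of adjacent filters
        denom = (1.0 + (alpha / N) * (a[lo:hi] ** 2).sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out
```

Because the denominator grows with neighboring filters' activity, a strongly responding filter suppresses its neighbors at the same position, implementing the competition described in F.4.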
H Models for ImageNet

The CNN trained on ImageNet takes as input a 224x224 patch cropped from the rescaled 256x256 image, and it has seven hidden layers. The first convolutional layer has 64 filters of size 11x11 applied with a stride of 4 pixels, producing maps of size 56x56; it is followed by 3x3 max-pooling with a stride of 2. The second convolutional layer has 256 filters of size 5x5. The higher hidden layers also use the max-with-zero nonlinearity and include locally-connected layers without weight sharing, built from groups of filters (up to 512 maps, in groups of 32 and 16). The last hidden layer is globally connected with 4096 units and is trained with 50% dropout, and the output layer is a 1000-way softmax. At test time, the softmax predictions for several 224x224 patches of each 256x256 test image are averaged to produce the final distribution over the 1000 classes.
Without dropout, this large network overfits ImageNet severely; applying 50% dropout in the globally-connected hidden layer is what allows it to reach 42.4% test error. This matches the pattern on the other benchmarks: dropout helps most when a network has far more parameters than the training labels can constrain.