Improving neural networks by preventing co-adaptation of feature detectors

G. E. Hinton*, N. Srivastava, A. Krizhevsky, I. Sutskever and R. R. Salakhutdinov

Department of Computer Science, University of Toronto,
6 King's College Rd, Toronto, Ontario M5S 3G4, Canada

*To whom correspondence should be addressed. E-mail: hinton@cs.toronto.edu

arXiv:1207.0580v1 [cs.NE] 3 Jul 2012

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.

Feedforward neural networks use layers of non-linear "hidden" units between their inputs and outputs. By adapting the weights on the incoming connections of these hidden units, they learn feature detectors that enable them to predict the correct output when given an input vector (1). If the relationship between the input and the correct output is complicated and the network has enough hidden units to model it accurately, there will typically be many different settings of the weights that can model the training set almost perfectly, especially if there is only a limited amount of labeled training data. Each of these weight vectors will make different predictions on held-out test data, and almost all of them will do worse on the test data than on the training data because the feature detectors have been tuned to work well together on the training data but not on the test data.

Overfitting can be reduced by using "dropout" to prevent complex co-adaptations on the training data. On each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other hidden units being present. Another way to view the dropout procedure is as a very efficient way of performing model averaging with neural networks. A good way to reduce the error on the test set is to average the predictions of a very large number of different networks, but training many separate networks and applying each of them to the test data is computationally expensive. Random dropout makes it possible to train a huge number of different networks in a reasonable time: there is almost certainly a different network for each presentation of each training case, but all of these networks share the same weights for the hidden units that are present.
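To make the procedure concrete, here is a minimal numpy sketch of a dropout training pass and the corresponding test-time pass for a single hidden layer. The layer sizes, the random seed and the use of logistic hidden units loosely follow the MNIST experiments below, but all names here are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 784 inputs, 800 logistic hidden units, 10 classes.
W1 = 0.01 * rng.standard_normal((784, 800))
W2 = 0.01 * rng.standard_normal((800, 10))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_forward(x):
    x = x * (rng.random(x.shape) < 0.8)    # drop a random 20% of the input pixels
    h = sigmoid(x @ W1)
    h = h * (rng.random(h.shape) < 0.5)    # omit each hidden unit with probability 0.5
    return h @ W2                          # class logits of this particular dropout network

def test_forward(x):
    # "Mean network": every unit is present, but each unit's outgoing weights are
    # scaled by the probability that the unit was retained during training.
    h = sigmoid((0.8 * x) @ W1)
    return h @ (0.5 * W2)
```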

We train the dropout networks with standard stochastic gradient descent on mini-batches of training cases, but we modify the penalty term that is normally used to prevent the weights from growing too large. Instead of penalizing the squared length (L2 norm) of the whole weight vector, we set an upper bound on the L2 norm of the incoming weight vector of each individual hidden unit; if an update violates this constraint, we renormalize the incoming weights of that unit by division. Using a constraint rather than a penalty prevents weights from growing very large no matter how large the proposed update is, which makes it possible to start with a very large learning rate that decays during learning, allowing a far more thorough search of the weight space.
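A sketch of this constraint as it might be applied after each weight update; the column-per-hidden-unit layout and the function name are assumptions made for illustration.

```python
import numpy as np

def constrain_incoming_weights(W, max_sq_len):
    """Rescale each hidden unit's incoming weight vector (a column of W)
    whose squared L2 norm exceeds the upper bound."""
    sq = (W ** 2).sum(axis=0)
    scale = np.sqrt(np.minimum(1.0, max_sq_len / np.maximum(sq, 1e-12)))
    return W * scale

# After every SGD step, e.g. with the bound l = 15 used in appendix A.1:
# W1 = constrain_incoming_weights(W1, 15.0)
```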

At test time, we use the "mean network" that contains all of the hidden units but with their outgoing weights halved to compensate for the fact that twice as many of them are active. In practice, this gives very similar performance to averaging over a large number of dropout networks. In networks with a single hidden layer of N units and a "softmax" output layer for computing the probabilities of the class labels, using the mean network is exactly equivalent to taking the geometric mean of the probability distributions over labels predicted by all 2^N possible dropout networks. Assuming the dropout networks do not all make identical predictions, the prediction of the mean network is guaranteed to assign a higher log probability to the correct answer than the mean of the log probabilities assigned by the individual dropout networks (2). Similarly, for regression with linear output units, the squared error of the mean network is always better than the average of the squared errors of the dropout networks.
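The single-hidden-layer equivalence is easy to verify numerically. The sketch below, with arbitrarily small sizes chosen only so that the enumeration is fast, runs all 2^N dropout networks and confirms that the normalized geometric mean of their label distributions equals the output of the mean network:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, C = 4, 3                       # hidden units, class labels (small for speed)
h = rng.random(N)                 # hidden activities for one input case
W = rng.standard_normal((C, N))   # hidden-to-softmax weights
b = rng.standard_normal(C)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Label distributions of all 2^N dropout networks, combined by geometric mean.
log_probs = [np.log(softmax(W @ (np.array(m) * h) + b))
             for m in itertools.product([0, 1], repeat=N)]
geo = np.exp(np.mean(log_probs, axis=0))
geo /= geo.sum()

# The "mean network": every hidden unit present, outgoing weights halved.
mean_net = softmax(W @ (0.5 * h) + b)
print(np.allclose(geo, mean_net))  # True
```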

The MNIST benchmark task (see appendix A) is to classify 28x28 images of individual hand-written digits; there are 60,000 training images and 10,000 test images. Performance on the test set can be greatly improved by enhancing the training data with transformed images (3), by wiring knowledge about spatial transformations into a convolutional neural network (4), or by using generative pre-training of the feature detectors (5).

Without any of these, the best published result for a standard feedforward neural network is about 160 errors. This can be reduced to about 130 errors by using 50% dropout with separate L2 constraints on the incoming weights of each hidden unit, and further to about 110 errors by also dropping out a random 20% of the pixels (figure 1).
Dropout can also be combined with generative pre-training, but in this case we use a small learning rate and no weight constraints to avoid losing the feature detectors discovered by the pre-training. The pre-trained deep belief net described in (5) gave 118 errors when fine-tuned with standard backpropagation and 92 errors when fine-tuned with 50% dropout of the hidden units (for code, see footnote 1).

Figure 1: The error rate on the MNIST test set for a variety of neural network architectures trained with backpropagation using 50% dropout for all hidden layers. The lower set of curves also use 20% dropout of the input pixels. Fine-tuning a pre-trained deep Boltzmann machine with 50% dropout gives 79 errors, the best result reported here (see appendix A.2).
Dropout also improves neural networks used for speech recognition. TIMIT is a widely used benchmark for recognition of clean speech with a small vocabulary. Speech recognition systems use hidden Markov models (HMMs) to deal with temporal variability, and they need an acoustic model that determines how well a frame of coefficients extracted from the acoustic input fits each state of each hidden Markov model. Deep neural networks that map a window of frames to a distribution over HMM states have recently achieved excellent recognition results on TIMIT (6) and have since been shown to work well on large-vocabulary tasks (7, 8).

Figure 2 shows the frame classification error rate on the core test set of TIMIT when a window of 21 adjacent frames, advanced by 10 ms per frame, is used to classify the central frame. The networks have 4 fully-connected hidden layers of 4000 units per layer.
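For readers who want to reproduce this input encoding, here is one way to build such context windows from a (frames x coefficients) feature array; the edge handling is an assumption, since the extracted text does not specify it.

```python
import numpy as np

def context_windows(frames, width=21):
    """Stack `width` adjacent frames around each frame of a (T, d) array,
    yielding (T, width*d) network inputs; edges repeat the first/last frame."""
    T, d = frames.shape
    half = width // 2
    padded = np.concatenate([np.repeat(frames[:1], half, axis=0),
                             frames,
                             np.repeat(frames[-1:], half, axis=0)])
    return np.stack([padded[t:t + width].reshape(-1) for t in range(T)])
```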

Figure 2: The frame classification error rate on the core test set of the TIMIT benchmark, comparing standard and dropout fine-tuning for different network architectures. Dropout of 50% of the hidden units and 20% of the input units improves classification.

The output layer has 185 "softmax" units, whose predictions are subsequently merged into the 39 distinct classes used for the benchmark. Fine-tuning with 50% dropout of the hidden units improves the frame classification of every architecture we tried (figure 2).

To get a phone recognition rate, the class probabilities produced by the neural network are given to a decoder that finds the most probable sequence of HMM states using a Viterbi search through the hidden Markov models.

Fine-tuned without dropout, the best network gave a phone error rate of 22.7%; fine-tuned with dropout, it gave 19.7%, which is a record for methods that do not use any information about speaker identity.

CIFAR-10 is a benchmark task for object recognition. It uses 32x32 color images of 10 different object classes that were found by searching the web for the name of each class (e.g. dog) or its subclasses (e.g. Golden Retriever). The images were hand-labeled to produce a reliable subset of the tiny images dataset (9); figure 3 shows some examples. There are 50,000 training images and 10,000 test images.
The best published error rate on the test set, without using transformed data, is 18.5% (10). We achieved 16.6% using a convolutional neural network
(described in appendix D), and adding 50% dropout to the last hidden layer reduced this to 15.6%.
ImageNet is an extremely challenging object recognition dataset consisting of millions of labeled images in thousands of categories (11). In the subset used for the ILSVRC-2010 competition, there are 1000 classes with roughly 1000 examples of each class.

Figure 3: Ten examples of the class "bird" from the CIFAR-10 test set, illustrating the variability within a class.

The best published results on this test set are 47.2% for the winning entry in the 2010 competition and, more recently, 45.7% (12).

We achieved 48.6% using a convolutional neural network with a final 1000-way softmax and L2 weight decay. Adding 50% dropout to the sixth hidden layer reduced this to 42.4% (see appendix E).

The networks described so far all had image or speech inputs. To check that dropout also helps in a quite different domain, we applied it to text classification. We trained networks to predict which of 50 topic classes a Reuters newswire story belongs to, using 201,369 training documents (the subset of the corpus is described in appendix C). Each document was represented by the counts of the 2000 most frequent non-stopwords, with each count C transformed to log(1 + C). A feedforward network with 2 hidden layers of
2000 units gave a test error rate of 31.05%; with 50%
dropout (see appendix C) this fell to 29.62%.
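The document representation is simple to reproduce. A sketch follows, where `vocab_index`, a mapping from each of the 2000 chosen words to a feature dimension, is a hypothetical helper:

```python
from collections import Counter
import numpy as np

def doc_features(tokens, vocab_index):
    """2000-dimensional log(1 + count) representation of one tokenized document."""
    x = np.zeros(len(vocab_index))
    for tok, c in Counter(tokens).items():
        j = vocab_index.get(tok)   # None for words outside the 2000 most frequent
        if j is not None:
            x[j] = np.log1p(c)     # the paper's log(1 + C) count transformation
    return x
```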
Dropout is therefore not specific to vision or speech: it reduced the test error in every domain we tried. In all of these experiments, hidden units were dropped out with the same probability of 0.5, so dropout required almost no tuning.
Figure 4: Some Imagenet test cases with the probabilities of the best 5 labels, as predicted by the convolutional network trained with 50% dropout.

When the amount of training data is small relative to the capacity of the model, it usually helps to average the predictions of many different models. A well-established way of doing this with neural networks is the "mixture of experts" (13), in which a gating network encourages different expert networks to specialize on different subsets of the cases. Dropout achieves a related effect in a very different way: instead of training a few large experts on different subsets of the data, it trains an exponential number of overlapping "experts" that share their weights.

The ideal Bayesian way to average models is to weight every possible setting of the parameters by its posterior probability given the training data (14), but for large networks this is prohibitively expensive. Dropout with a probability of 0.5 can be viewed as a very cheap alternative that gives equal weight to each of the exponentially many networks that share the remaining weights.

A popular alternative to Bayesian model averaging is "bagging", in which different models are trained on different random selections of cases from the training set and all models are given equal weight in the combination (15).
Bagging is most often used with models such as decision trees because these are very quick to fit to data and very quick at test time (16). Dropout allows a similar approach to be applied to feedforward neural networks, which are much more powerful models.
Dropout can be seen as an extreme form of bagging in which each model is trained on a single case and each parameter of the model is very strongly regularized by sharing it with the corresponding parameter in all the other models. This is a much better regularizer than the standard method of shrinking parameters towards zero.

A familiar and extreme case of dropout is "naive Bayes", in which each input feature is trained separately, in a context where all the other features are absent, and the feature models are multiplied together at test time. When training data is scarce, this often works much better than logistic regression, which trains each input feature to work well in the context of all the others.

Finally, there is an intriguing similarity between dropout and a recent theory of the role of sex in evolution (17). One possible interpretation of the theory of mixability articulated in (17) is that sex breaks up sets of co-adapted genes, and this means that achieving a function by using a large set of co-adapted genes is not nearly as robust as achieving the same function, perhaps less than optimally, in multiple alternative ways, each of which only uses a small number of co-adapted genes. This allows evolution to avoid dead-ends in which improvements in fitness require co-ordinated changes to a large number of co-adapted genes. It also reduces the probability that small changes in the environment will cause large decreases in fitness, a phenomenon known as "overfitting" in the field of machine learning.

References and Notes
1. D. E. Rumelhart, G. E. Hinton, R. J. Williams, Nature 323, 533 (1986).
2. G. E. Hinton, Neural Computation 14, 1771 (2002).
3. D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber, Neural Computation 22, 3207
(2010).
4. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Proceedings of the IEEE 86, 2278 (1998).
5. G. E. Hinton, R. Salakhutdinov, Science 313, 504 (2006).
6. A. Mohamed, G. Dahl, G. Hinton, IEEE Transactions on Audio, Speech, and Language
Processing, 20, 14 (2012).
7. G. Dahl, D. Yu, L. Deng, A. Acero, IEEE Transactions on Audio, Speech, and
Language Processing, 20, 30 (2012).
8. N. Jaitly, P. Nguyen, A. Senior, V. Vanhoucke, An Application of Pretrained Deep
Neural Networks to Large Vocabulary Conversational Speech Recognition, Tech.
Rep. 001, Department of Computer Science, University of Toronto (2012).
9. A. Krizhevsky, Learning multiple layers of features from tiny images, Tech. Rep. 001,
Department of Computer Science, University of Toronto (2009).
10. A. Coates, A. Y. Ng, ICML (2011), pp. 921-928.
11. J. Deng, et al., CVPR09 (2009).
12. J. Sanchez, F. Perronnin, CVPR11 (2011).
13. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, G. E. Hinton, Neural Computation 3, 79 (1991).
14. R. M. Neal, Bayesian Learning for Neural Networks, Lecture Notes in Statistics No.
118 (Springer-Verlag, New York, 1996).
15. L. Breiman, Machine Learning 24, 123 (1996).
16. L. Breiman, Machine Learning 45, 5 (2001).
17. A. Livnat, C. Papadimitriou, J. Dushoff, M. W. Feldman, PNAS 105, 19803 (2008).
18. R. R. Salakhutdinov, G. E. Hinton, Artificial Intelligence and Statistics (2009).
19. D. D. Lewis, Y. Yang, T. G. Rose, F. Li, Journal of Machine Learning Research 5, 361 (2004).
20. We thank N. Jaitly for help with TIMIT, H. Larochelle, R. Neal, K. Swersky and C.K.
I. Williams for helpful discussions, and NSERC, Google and Microsoft Research for
funding. GEH and RRS are members of the Canadian Institute for Advanced Research.

A MNIST

A.1 Details of dropout training

The MNIST dataset consists of 28x28 images of hand-written digits, with 60,000 training images and 10,000 test images (we used no enhancements of the training data, such as distortions). We trained dropout networks with four architectures: 784-800-800-10, 784-1200-1200-10, 784-2000-2000-10 and 784-1200-1200-1200-10. Hidden units were dropped out with a probability of
50%, and input pixels were dropped out with a probability of
20%. Training used stochastic gradient descent on mini-batches of 100 cases, with a learning rate that started at
10.0 (applied to the average gradient over each mini-batch) and was multiplied by 0.998 after every
epoch of training.
The squared length of the incoming weight vector of each hidden unit was constrained to an upper bound l,
with l = 15 for all experiments, and the weights were initialized from a zero-mean Gaussian with standard deviation 0.01.
Momentum started at 0.5 and increased linearly to its final value over the first 500
epochs, after which it remained at 0.99; the gradient contribution to each update was multiplied by (1 - momentum).
Networks were trained for 3000 epochs. In summary, the learning rate and momentum schedules were

ε(t) = ε0 f^t,    p(t) = (t/T) pf + (1 - t/T) pi for t < T, and p(t) = pf for t >= T,

and each weight update was

Δw(t) = p(t) Δw(t-1) - (1 - p(t)) ε(t) <∇w L>,

with hyperparameters ε0 = 10.0, f = 0.998, pi = 0.5, pf = 0.99 and T = 500.
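In code, the schedules and update rule read as follows; the function names are ours, and `avg_grad` is assumed to be the gradient averaged over the mini-batch, as stated above.

```python
def learning_rate(t, eps0=10.0, f=0.998):
    # eps(t) = eps0 * f^t: geometric decay per epoch.
    return eps0 * f ** t

def momentum(t, p_i=0.5, p_f=0.99, T=500):
    # Linear ramp from p_i to p_f over the first T epochs, then constant.
    return p_f if t >= T else (t / T) * p_f + (1.0 - t / T) * p_i

def update(w, delta_w, avg_grad, t):
    p = momentum(t)
    delta_w = p * delta_w - (1.0 - p) * learning_rate(t) * avg_grad
    return w + delta_w, delta_w
```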

A.2 Dropout fine-tuning of pre-trained networks

We also tested whether dropout helps when fine-tuning networks whose weights were initialized by generative pre-training. For pre-trained networks we used a smaller learning rate and no weight constraints, to avoid destroying the feature detectors found by the pre-training.

The first model is the pre-trained 784-500-500-2000 deep belief net described in (5) (for code see footnote
1).
It was fine-tuned with the same
dropout rates as above: 50% for the hidden units and
20% for the input pixels, with a learning rate of 1.0
and mini-batches of 100 cases.
After 1000 epochs, standard fine-tuning gave 118 errors on the test set, while
dropout fine-tuning gave 92 errors.
The second model is a pre-trained deep Boltzmann machine (18) (for code see footnote 2), fine-tuned as a
784-500-1000-10 network with dropout-backpropagation
(the 1000-unit layer is that of the DBM described in (18)).
Fine-tuning this model with dropout gave
79 errors, compared with 94 errors for standard fine-tuning.

A.3 Effect of dropout on the learned features

To see what effect dropout has on the feature detectors, we trained a 784-500-500 network on MNIST with standard backpropagation and with
dropout-backpropagation, and visualized the features learned by 100 randomly chosen first-layer hidden units in each case (figure
5).
With standard backpropagation, many units learn features that are only helpful in combination with other specific units, whereas with
dropout each unit is pushed towards features that are useful on their own.

B TIMIT

TIMIT is a corpus of read speech designed for acoustic-phonetic studies. It contains recordings of 630 speakers drawn from
8 dialect regions of American English, each reading 10 phonetically rich sentences.
1 For code see http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
2 For code see http://www.utstat.toronto.edu/~rsalakhu/DBM.html

Figure 5: Visualization of the features learned by first-layer hidden units on MNIST with (a) standard backpropagation and (b) dropout backpropagation.

All speech processing was done with the open-source Kaldi toolkit (see footnote 3),
using 25 ms analysis windows advanced by 10 ms per frame.
Each input dimension was normalized to have mean 0 and variance 1, and training used mini-batches of 100 cases.
We trained dropout networks on input windows of 15 and 31 frames, with 3, 4 and 5 hidden layers
of 2000 and 4000 units per layer; figure 6 compares these architectures with and without
dropout.

B.1 Pretraining

As on MNIST, dropout fine-tuning on TIMIT can be combined with generative pre-training (5). We pre-trained a stack of restricted Boltzmann machines (RBMs), using a Gaussian-binary RBM for the real-valued acoustic input and binary-binary RBMs for the higher layers.
3 http://kaldi.sourceforge.net

Figure 6: Frame classification error rates on the TIMIT core test set for the architectures described above, fine-tuned (a) without and (b) with dropout.

All weights were initialized from a zero-mean Gaussian with standard deviation 0.01,
and the visible units of the Gaussian RBM had their variance fixed at 1.0.
Momentum started at 0.5 and was raised to 0.9 after 20 epochs; the learning rate for the Gaussian RBM was 0.
001, multiplied by (1 - momentum), with an L2 weight cost of 0.001, and this RBM was trained for 100
epochs.
The binary-binary RBMs were trained with a learning rate of 0.01.
Visible biases were initialized to log(p/(1 - p)), where p is the mean activation of the corresponding visible unit, and
each of these RBMs was trained for 50 epochs.
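As a concrete reference, one contrastive-divergence (CD-1) update for a binary-binary RBM might look like the sketch below; using probabilities rather than samples in the negative phase is a common choice and an assumption here, and the momentum and L2 weight cost described above are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, bv, bh, lr=0.01):
    """One CD-1 update for a binary-binary RBM; v0 is a (batch, visible) array."""
    ph0 = sigmoid(v0 @ W + bh)                 # data-driven hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sampled binary hidden states
    pv1 = sigmoid(h0 @ W.T + bv)               # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + bh)                # hidden probabilities for it
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n   # positive minus negative statistics
    bv += lr * (v0 - pv1).mean(axis=0)
    bh += lr * (ph0 - ph1).mean(axis=0)
```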

B.2 Fine-tuning

The pre-trained stack was fine-tuned with dropout-backpropagation, using the same dropout rates as for MNIST.
The learning rate started at 1.0 (applied to the average gradient over each
mini-batch) and was annealed by a factor of 10, down to 0.1, over the course of training, while momentum increased from 0.5 to 0.9. Networks were fine-tuned for
200 epochs.
Figure 6 shows that dropout fine-tuning gives a lower frame classification error than standard fine-tuning for every architecture we tried.
Dropout also makes the results less sensitive to the choice of architecture.

C Reuters

The Reuters Corpus Volume I (RCV1-v2) (19) contains 804,414 newswire stories that have been hand-labeled with
103 topics (see footnote 4). The topics are organized in a tree whose four top-level groups are corporate/industrial, economics,
government/social and markets, and which extends
3 levels below the root.
Of the 63 topics at the second level of the tree, we removed the rarely used ones (11 topics contain very few documents),
leaving the classes used below.

4 The corpus is available at http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm

We also discarded documents labeled with more than one of the remaining classes (about 25% of them), leaving 50 classes and
402,738 documents, which we split into equal-sized training and test sets.
Each document was represented by the counts of the 2000 most frequent non-stopwords in the corpus.

Figure 7: Classification error rates on the Reuters test set for the architectures described below, trained (a) without and (b) with dropout.

We trained a feedforward network with dropout-backpropagation, using architecture
2000-2000-1000-50 and the same dropout rates and hyperparameters as for MNIST (appendix A.1),
for 500 epochs.
Figure 7 compares the 2000-
2000-1000-50 and 2000-1000-1000-50 architectures trained with and without dropout: in both cases dropout gives a substantially lower test error.

D CIFAR-10

The CIFAR-10 dataset is a labeled subset of the tiny images dataset, which contains 80 million 32x32 color images collected from the web.

CIFAR-10 consists of 60,000 of these images (see footnote
5),
divided into 10 classes with 5,000 training images and 1,000 test images per class.

5 The CIFAR dataset is available at http://www.cs.toronto.edu/~kriz/cifar.html.

We trained on the CIFAR-10 training set without any enhancement of the data, such as transformed or additional images.
The conventions used by our CIFAR-10 networks are described in appendix F,
and the details specific to CIFAR-10 are given in appendix G.

E ImageNet

ImageNet is a dataset of millions of labeled images in thousands of categories. The images were collected from the web and labeled by people using Amazon's
Mechanical Turk crowd-sourcing tool.
In 2010, a subset of roughly 1000 images in each of 1000 categories was released for the ImageNet Large-Scale Visual Recognition Challenge; in all,
there are about 1.3 million training images, 50,000 validation images and 150,000 test images.
ImageNet is far more demanding than CIFAR-10, with 1000 classes instead of 10 and much more variable images.
ImageNet consists of variable-resolution images, so we
down-sampled them all to 256x256 before using them. The details of our ImageNet network are given in appendix H.

F Convolutional neural networks

The models we used for CIFAR-10 and ImageNet are convolutional neural networks (CNNs).

CNNs differ from ordinary, fully-connected neural networks in two ways that make them especially well suited to object recognition. First, each hidden unit receives input only from a small, local patch of the units in the layer below
(its receptive field), reflecting the fact that nearby pixels are much more strongly correlated than distant ones.

Second, the hidden units are organized into banks of feature detectors, and all of the units within a bank share the same weights, applied at different positions in the image
(so a feature detector that is useful in one place, such as an edge detector, is replicated everywhere).

Weight sharing drastically reduces the number of learned parameters, so CNNs need less data to train and generalize much better on images than fully-connected networks of comparable size. Because the convolutional layers are already strongly regularized by weight sharing, we apply dropout to the later layers, which contain most of the parameters.

F.1 Overview

A CNN consists of a sequence of convolutional layers, usually interleaved with pooling and normalization layers, followed by locally-connected or fully-connected layers and a softmax output layer. The subsections below describe each kind of layer.

F.2 Convolutional layers

A convolutional layer consists of N banks of units, with one unit in each bank for each position of the layer's input. We write a^i_{x,y} for the activity of the unit in bank i at position
(x, y). Each unit applies the bank's shared filter to the receptive field centered at its position, so a bank computes the response of one learned filter at every
location in the image.
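The weight sharing is perhaps clearest in code. Below is a naive (slow but explicit) implementation of one bank of convolutional units; the array shapes are assumptions chosen for illustration.

```python
import numpy as np

def conv_banks(image, filters, stride=1):
    """Apply N shared-weight filters at every position (valid convolution).
    image: (H, W, C); filters: (N, fh, fw, C); returns (N, H', W') activities."""
    H, W, C = image.shape
    N, fh, fw, _ = filters.shape
    Ho = (H - fh) // stride + 1
    Wo = (W - fw) // stride + 1
    out = np.zeros((N, Ho, Wo))
    for y in range(Ho):
        for x in range(Wo):
            patch = image[y*stride:y*stride+fh, x*stride:x*stride+fw]
            out[:, y, x] = (filters * patch).sum(axis=(1, 2, 3))  # same weights everywhere
    return out
```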

F.3 Neuron nonlinearities

All hidden units in our networks use the "max-with-zero" nonlinearity: the unit in bank
i at position (x, y) outputs

a^i_{x,y} = max(0, z^i_{x,y}),

where z^i_{x,y} is its total input (a weighted sum of the activities it sees, plus a bias). We found that these units train faster than the more conventional logistic or tanh units on these tasks.
The units of the first convolutional layer receive the
RGB values of the image pixels as input.
F.4 Pooling

Pooling layers summarize the activities of local patches of units within a bank. A max-pooling layer reports, at each position, the maximum activity in a small square neighbourhood of units; adjacent pooling units can be separated by a stride smaller than the width of the neighbourhood, in which case the pools overlap. Pooling reduces the spatial resolution of the banks and gives the network some tolerance to small translations of the input.
F.5 Local response normalization

After the max-with-zero nonlinearity, some layers normalize each unit's activity using the activities of units at the same position in adjacent banks, which encourages competition between feature detectors. The normalized activity has the form

b^i_{x,y} = a^i_{x,y} / (1 + (α/N) Σ_j (a^j_{x,y})^2)^β,

where the sum runs over the N banks adjacent to bank i, and α and β are fixed parameters (the values used for CIFAR-10 are given in appendix G).
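A sketch of a normalization of this form; the exact formula and its treatment of banks near the edges are assumptions reconstructed from the parameter names N, α and β, not a verbatim transcription of the paper's definition.

```python
import numpy as np

def response_normalize(a, N=9, alpha=0.001, beta=0.75):
    """a: (banks, H, W) activities. Divide each unit by a function of the
    squared activities of the N adjacent banks at the same position
    (edge banks simply use fewer neighbours)."""
    banks = a.shape[0]
    half = N // 2
    out = np.empty_like(a)
    for i in range(banks):
        lo, hi = max(0, i - half), min(banks, i + half + 1)
        denom = (1.0 + (alpha / N) * (a[lo:hi] ** 2).sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out
```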

F.6 Training

All of the CNNs were trained with stochastic gradient descent on mini-batches of 128 cases, using a momentum of 0.9
and a small L2 weight decay. The update for the i-th weight w_i was

v(t+1) = 0.9 v(t) - ε <∂E/∂w_i> - ε λ w_i,    w_i(t+1) = w_i(t) + v(t+1),

where <∂E/∂w_i> is the derivative of the objective E with respect to w_i averaged over the mini-batch, ε is the learning rate and λ is the weight-decay coefficient.
We trained with the cuda-convnet package on an NVIDIA GTX 580 GPU. A CIFAR-10 model takes about
90 minutes to train; an ImageNet model takes much longer.
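The update rule above, written as one Python step; the weight-decay coefficient shown is a placeholder, since its value does not survive in the extracted text.

```python
def sgd_step(w, v, avg_grad, lr, mu=0.9, wd=1e-4):
    # v(t+1) = 0.9 v(t) - eps<dE/dw> - eps*lambda*w ;  w(t+1) = w(t) + v(t+1)
    v = mu * v - lr * avg_grad - lr * wd * w
    return w + v, v
```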

F.7 Further details

The remaining architectural details specific to CIFAR-10 and ImageNet are given in appendices G and H.

G CIFAR-10 network details

Our baseline CNN for CIFAR-10 has three convolutional layers, each followed by 3x3 max-pooling with a stride of 2. Response normalization with
N =
9, α = 0.001 and β = 0.75 is applied, and the output layer is a 10-way softmax.
Each convolutional layer contains 64 banks of 5x5 filters (applied to the RGB input in the first layer and to the banks of the previous layer in the others).

The network we trained with dropout on CIFAR-10 additionally has a locally-connected layer
of 16 banks of 3x3 filters after the last pooling layer;
50% dropout is applied to the output of this layer before it reaches the 10-way softmax.

H ImageNet network details

Our CNN for ImageNet was trained on 224x224 patches selected at random from the 256x256
down-sampled images; training on patches rather than on whole images greatly increases the effective amount of training data. The network has
7 hidden layers, with max-pooling layers of 3x3 regions and a
stride of 2 following the convolutional layers. The first convolutional layer contains 64 banks of
11x11 filters applied to the RGB input with a stride of 4 (so the receptive
fields of neighbouring units overlap), and the second contains 256 banks of 5x5 filters
(with each bank connected to a subset of the banks below rather than to all 256 of its possible inputs); after the first layer, each of the
64 banks covers a
56x56 grid of positions before being pooled with a stride of 2.
All hidden units use the max-with-zero nonlinearity, and response normalization follows the
max-with-zero units of the early layers. The higher layers contain 512 banks of filters,
organized into groups of 32, with each bank connected to 16
of the 512 banks below; these layers mix convolutional and locally-connected weights, again in groups of 32.
The last hidden layer is fully connected, with 4096 units, and 50% dropout is applied to its output.
It feeds a 1000-way softmax, and the softmax produces a distribution over the 1000 labels. At test time, each 256x256 image is classified by applying the network to its central
224x224 patch.

Dropout made a large difference on ImageNet. A network with this many parameters overfits badly on only 1.3 million training images, and applying
50% dropout to the final, fully-connected hidden layer reduced the test error from 48.6% to 42.4%, as reported in the main text.

Dropout again proved to be an effective and inexpensive way to prevent the co-adaptation of feature detectors.
