www.ijiset.com
ISSN 2348 – 7968
STOCHASTICALLY REDUCING
OVERFITTING IN DEEP NEURAL NETWORK USING
DROPOUT
Abstract
Deep neural networks are trained with a large number of parameters, which are likely to co-adapt and overfit. Overfitting is a challenging problem in deep neural networks. Dropout training has shown a significant effect in improving deep neural networks. The aim of this dissertation is to study dropout and other methods that are built on dropout regularization. Real-world data is noisy, e.g. it has missing features or is unlabeled or unstructured. We study a method that distorts the data prior to training so that the distortion acts as a regularizer. This creates data that correlates with real-world data. A Restricted Boltzmann Machine (RBM) is a probabilistic energy-based graphical model with no hidden-to-hidden or visible-to-visible connections. RBMs are stacked to form a Deep RBM (DRBM) for training.
Keywords: Deep neural networks, Regularization, Overfitting, Distorted Distribution.
2. Deep Restricted Boltzmann Machine
Figure 1 RBM (diagram with units labeled c1, c3, c4, cm and b1, b3, b4, bn)
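For reference, the energy function of an RBM such as the one in Figure 1 is commonly written as below. This is a sketch under assumptions the text does not state: binary units, and the index convention of the CD-k listing in Section 3 (m visible units v_j with biases b_j, n hidden units h_i with biases c_i, weights w_ij). The symbol Z for the normalizing constant is introduced here for illustration.

```latex
E(v, h) = -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} h_i v_j
          - \sum_{j=1}^{m} b_j v_j
          - \sum_{i=1}^{n} c_i h_i ,
\qquad
p(v, h) = \frac{1}{Z} e^{-E(v, h)} ,
\qquad
Z = \sum_{v, h} e^{-E(v, h)}
```

Because there are no hidden-to-hidden or visible-to-visible connections, the conditionals factorize over units, which is what makes Gibbs sampling in CD-k cheap.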
IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 2 Issue 5, May 2015.
A Restricted Boltzmann Machine is an undirected, energy-based graphical model. In energy-based models, the energy function defines the distribution of the data over the hidden and visible units. The energy function is as described in the equation.
RBMs can be stacked to form a DRBM, which can be trained using CD-k for faster learning. A DRBM can be used for various machine learning tasks such as computer vision, natural language processing, and handwriting recognition.
[Figure 2: workflow diagram with boxes labeled Dataset, Pre-training, and State Activation.]
3. Contrastive Divergence K-steps
Input: RBM (V_1, ..., V_m, H_1, ..., H_n), training batch S
Output: Gradient approximation Δw_ij, Δb_j and Δc_i for i = 1, ..., n, j = 1, ..., m
init Δw_ij = Δb_j = Δc_i = 0 for i = 1, ..., n, j = 1, ..., m
for all v ∈ S do
    v^(0) ← v
    for t = 0, ..., k-1 do
        for i = 1, ..., n do
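The CD-k listing above breaks off inside the inner loop. The following is a minimal NumPy sketch of the commonly used k-step contrastive divergence update for a binary RBM, not the authors' implementation. The function name `cd_k`, the array layout (W of shape n x m, b for visible biases, c for hidden biases), and the use of hidden probabilities rather than samples in the final update are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k(W, b, c, batch, k=1, rng=None):
    """Approximate the gradients dW, db, dc with k Gibbs steps per sample.

    W     : (n, m) weights between n hidden and m visible units
    b, c  : (m,) visible biases and (n,) hidden biases
    batch : (num_samples, m) array of binary training vectors
    """
    rng = np.random.default_rng() if rng is None else rng
    dW = np.zeros_like(W)
    db = np.zeros_like(b)
    dc = np.zeros_like(c)
    for v0 in batch:
        v0 = np.asarray(v0, dtype=float)
        v = v0.copy()                       # v^(0) <- v
        for _ in range(k):                  # t = 0, ..., k-1
            # Sample every hidden unit given the current visible state.
            p_h = sigmoid(W @ v + c)
            h = (rng.random(p_h.shape) < p_h).astype(float)
            # Sample every visible unit given the hidden state.
            p_v = sigmoid(W.T @ h + b)
            v = (rng.random(p_v.shape) < p_v).astype(float)
        # Data term minus model term after k steps.
        p_h0 = sigmoid(W @ v0 + c)
        p_hk = sigmoid(W @ v + c)
        dW += np.outer(p_h0, v0) - np.outer(p_hk, v)
        db += v0 - v
        dc += p_h0 - p_hk
    return dW, db, dc
```

A caller would typically scale the returned sums by a learning rate divided by the batch size before adding them to W, b and c.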
Figure 2 Workflow for training the DRBM. The data is first distorted as shown in Fig. 3 and then pre-trained using RBMs; only the CD-k algorithm is applied for learning the weights.
4. Distorted Distribution
The data will be distorted by different distributions. The distribution will define the characteristics of the new features. The new distribution takes the form
P(x' | x) = \prod_{n=1}^{N} P_D(x'_n | x_n; \theta_n)
where
x' are the new (distorted) features,
\theta_n are the model parameters,
x are the original features,
P_D is the type of distribution.
P_D can take one of the following forms:
1. Dropout, in which the n-th feature is randomly set to zero with probability p_n;
2. Gaussian noise on the n-th feature with variance σ;
3. Laplace noise on the n-th feature with variance λ.
[Figure 3 diagram; box labels: Dataset; Data Pre-Processing (1. Import data, 2. Convert to required format, 3. Explore data); Add Random Noise (1. Select data, 2. Apply distortion, 3. Generate data); Randomize noise selection (1. Gaussian noise, 2. Laplace noise, 3. Dropout noise).]
Figure 3 Workflow for distorting data. The data is first pre-processed, then a randomly selected distribution is applied.
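The three forms of P_D listed above can be sketched as a single NumPy function that distorts each feature independently, as the product form of P(x' | x) suggests. This is an illustrative sketch only: the function name `distort`, its arguments, and the choice to pass the noise variance (converted internally to the scale each NumPy sampler expects) are assumptions, not the authors' code.

```python
import numpy as np

def distort(x, kind, param, rng=None):
    """Return a distorted copy x' of x, drawing each feature independently.

    kind  : "dropout", "gaussian" or "laplace" (the choice of P_D)
    param : drop probability p_n for dropout, or the noise variance for
            gaussian/laplace; a scalar or one value per feature
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    param = np.asarray(param, dtype=float)
    if kind == "dropout":
        # Keep a feature with probability 1 - p_n, otherwise set it to zero.
        keep = rng.random(x.shape) >= param
        return x * keep
    if kind == "gaussian":
        # A normal distribution with variance s has standard deviation sqrt(s).
        return x + rng.normal(0.0, np.sqrt(param), size=x.shape)
    if kind == "laplace":
        # A Laplace distribution with scale s has variance 2 * s**2.
        return x + rng.laplace(0.0, np.sqrt(param / 2.0), size=x.shape)
    raise ValueError(f"unknown distortion: {kind}")

# Randomized noise selection, as in the Figure 3 workflow.
rng = np.random.default_rng(0)
x = np.ones((4, 3))
kind = rng.choice(["dropout", "gaussian", "laplace"])
x_new = distort(x, kind, 0.5, rng)
```

Applying `distort` several times to the same dataset yields additional distorted copies, which is one way to enlarge the training set before pre-training.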
4. Conclusions
function. To train our model on a variety of datasets, we increase the dataset size by applying different distributions. A comparison with GPU performance would be informative and could also explore the distributed aspect. These methods can be extended to NLP, computer vision, and multimodal learning.
The DRBM model is implemented by stacking RBMs. It would be interesting to observe the results of applying the method to other models such as convolutional neural networks (CNN) and Deep Belief Nets (DBN).
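Stacking RBMs, as mentioned above, is often done greedily: each RBM is trained on the hidden activations of the one below it. The sketch below assumes that scheme, binary RBMs, and full-batch CD-1 updates; the source does not give these details, and the names `train_rbm` and `stack_rbms` and all hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Train one binary RBM with full-batch CD-1; returns (W, b, c)."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_hidden, n_visible))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        p_h0 = sigmoid(data @ W.T + c)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # One Gibbs step: reconstruct the visibles, then the hiddens.
        p_v1 = sigmoid(h0 @ W + b)
        v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W.T + c)
        # CD-1 update, averaged over the batch.
        W += lr * (p_h0.T @ data - p_h1.T @ v1) / n_samples
        b += lr * (data - v1).mean(axis=0)
        c += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b, c

def stack_rbms(data, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: one RBM per entry in layer_sizes."""
    layers = []
    x = np.asarray(data, dtype=float)
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b, c))
        # Hidden probabilities of this layer are the input to the next one.
        x = sigmoid(x @ W.T + c)
    return layers
```

For example, `stack_rbms(X, [256, 64])` would train two RBMs, the second on the 256-dimensional hidden activations of the first.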