2nd International Conference on Modelling, Simulation and Applied Mathematics (MSAM 2017)
Abstract—Though deep autoencoder networks show excellent ability in feature learning, their poor performance on test data works against the visualization and classification of images. In particular, a standard neural net with multiple hidden layers typically fails to work when the sample size is small. In order to improve generalization ability and reduce over-fitting, we apply dropout to optimize deep autoencoder networks. In this paper, we propose face recognition based on deep autoencoder networks with dropout. Our experiments show that deep autoencoder networks with dropout yield a significantly lower test error and bring a new conception to pattern recognition with deep learning.

Keywords-deep autoencoder networks; dropout; face recognition

… adaptively selected the number of hidden variables to combine subgroups of features into a network and improved the statistical significance of the identified features [10]. However, the above methods are limited in image recognition with deep neural networks, so it is worth exploring a more effective way to reduce the over-fitting of deep networks. Preventing the co-adaptation of feature detectors by dropout improved the performance of neural networks on supervised learning tasks in 2012 [11].

In this paper, to address poor generalization, we model deep autoencoder networks with dropout and propose face recognition based on this model for small samples. Our experiments show that this method yields a significantly lower classification error on a small facial dataset.
FIGURE I. PRE-TRAINING A DEEP AUTOENCODER NETWORK WITH FOUR LAYERS. [(A) The stacked encoder X ∈ R^100 → Y1 ∈ R^50 → Y2 ∈ R^10 with weights W1, W2; (B) the first autoencoder, X ∈ R^100 → Y1 ∈ R^50 → X ∈ R^100; (C) the second autoencoder, Y1 ∈ R^50 → Y2 ∈ R^10 → Y1 ∈ R^50.]
By the above pre-training, we obtain an approximately optimal model. What we need is Y2, generated from the input X through the two encoder layers, and to establish its relationship with the label information of the input X. Further, we obtain the optimal model of the deep autoencoder networks by supervised fine-tuning, as shown in Figure II.

FIGURE II. FINE-TUNING DEEP AUTOENCODER NETWORKS. [X → Y1 → Y2 → classifier → prediction; the label error drives the fine-tuning of W1 and W2.]
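To make the pipeline of Figures I and II concrete, the following is a minimal numpy sketch (ours, not the authors' implementation) of greedy layer-wise pre-training for the two encoder layers of Figure I. The 100 → 50 → 10 sizes come from the figure, while the sigmoid activation, learning rate, epoch count, and the helper name train_autoencoder are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=100):
    """Pre-train one encoder layer by reconstructing its own input."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))      # encoder weights
    b = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))  # decoder weights
    b_dec = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)               # encode
        X_hat = sigmoid(H @ W_dec + b_dec)   # decode (reconstruct X)
        # back-propagate the squared reconstruction error
        d_out = (X_hat - X) * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * H.T @ d_out / len(X)
        b_dec -= lr * d_out.mean(axis=0)
        W -= lr * X.T @ d_hid / len(X)
        b -= lr * d_hid.mean(axis=0)
    return W, b

# Greedy layer-wise pre-training, 100 -> 50 -> 10 as in Figure I.
X = rng.random((200, 100))            # placeholder facial feature vectors
W1, b1 = train_autoencoder(X, 50)     # autoencoder of panel (B)
Y1 = sigmoid(X @ W1 + b1)
W2, b2 = train_autoencoder(Y1, 10)    # autoencoder of panel (C)
Y2 = sigmoid(Y1 @ W2 + b2)            # the code used for classification
```

Each autoencoder is trained to reconstruct its own input, so W1 and W2 can initialize the deep network before the supervised fine-tuning of Figure II.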
The optimal model takes the visual features of a facial image as input and outputs the class information of the image through the connected classifier. Though deep autoencoder networks show excellent ability in feature learning, their poor performance on test data works against the visualization and classification of images. In order to improve generalization ability and reduce over-fitting, we apply dropout to optimize the deep autoencoder networks.
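As an illustration of this classification stage (a hedged sketch: the paper says only "classifier", so the softmax head, the cross-entropy gradient, and the 40-class setting, matching the ORL database mentioned later, are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

n_classes = 40                        # e.g. the 40 subjects of the ORL database
Y2 = rng.random((200, 10))            # codes from the pre-trained encoder
labels = rng.integers(0, n_classes, size=200)
Wc = rng.normal(0.0, 0.1, (10, n_classes))
bc = np.zeros(n_classes)

for _ in range(100):                  # assumed epochs and learning rate
    G = softmax(Y2 @ Wc + bc)         # predicted class probabilities
    G[np.arange(len(labels)), labels] -= 1.0   # gradient: softmax - one-hot
    Wc -= 0.1 * Y2.T @ G / len(labels)
    bc -= 0.1 * G.mean(axis=0)

prediction = np.argmax(Y2 @ Wc + bc, axis=1)   # predicted class per image
```

In full fine-tuning, this label error would also be back-propagated through W2 and W1 rather than stopping at the classifier weights.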
As previously described, random vector m samples the
A motivation for dropout comes from a theory of the role of outputs yil of that layer, which create the thinned
sex in evolution [12]. The explanation for the superiority of
sexual reproduction is that, sexual reproduction is not just to y l are used as input to the next layer. By dropping a
outputs ~ i
reduce complex co-adaptations by mix-ability of genes, but unit out, we mean temporarily removing it from the network.
also to allow new gens to enhance their ability to fit the
Each unit is retained with a fixed probability p independent of
environment [13]. Similarly, each hidden unit in a neural
network trained with dropout must learn to work with a other units and the choice of which units to drop is random.
randomly chosen sample of other units and prevent complex This amounts to sampling a sub-network from a larger network
co-adaptations among units. This should reduce over-fitting when the process is applied at each layer. The feed-forward
each hidden unit on the training data [14]. operation with dropout becomes Figure 5.
Based on the above, we find that training dropout neural nets is equivalent to adding a probabilistic process to each hidden layer, as shown in Figure III(B). Combined with probability, we need a random vector m, each element of which has probability p of being 1, to sample each hidden unit; thus m obeys an independent Bernoulli distribution.

FIGURE III. COMPARISON OF THE BASIC OPERATIONS OF A STANDARD AND DROPOUT NETWORK: (A) A STANDARD NETWORK; (B) A DROPOUT NETWORK.
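As a tiny worked example of sampling such a vector m (our illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                    # probability of keeping a unit
y = np.array([0.8, 0.3, 0.9, 0.5, 0.1])   # outputs of one hidden layer
m = rng.binomial(1, p, size=y.shape)      # random 0/1 vector, m ~ Bernoulli(p)
y_thinned = y * m                         # dropped units contribute nothing
```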
Consider a neural network with L = 4 layers. Let l ∈ {0, 1, ..., L-1} index the hidden layers of the network. Let x^l denote the vector of inputs into layer l and y^l denote the vector of outputs from layer l. W^l and b^l are the weights and biases at layer l. The feed-forward operation of a standard neural network can be described as in Figure IV, where f is any activation function.

Training standard neural nets:

x_i^{l+1} = W_i^{l+1} y^l + b_i^{l+1}
y_i^{l+1} = f(x_i^{l+1}),   l = 0, 1, ..., L-1

FIGURE IV. STANDARD NEURAL NET MODEL. [Inputs x_1, ..., x_d; bias units +1 with b^1, b^2, b^3; weights W^1, W^2, W^3.]
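A minimal numpy sketch of this standard feed-forward operation (our illustration; the sigmoid activation and the layer widths are assumptions, since the paper fixes only L = 4):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # f is any activation function; sigmoid is assumed here
    return 1.0 / (1.0 + np.exp(-x))

# Assumed layer widths; the paper fixes only L = 4.
sizes = [100, 50, 10, 10]
W = [rng.normal(0.0, 0.1, (sizes[l], sizes[l + 1])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

def feed_forward(y0):
    """x^{l+1} = W^{l+1} y^l + b^{l+1}; y^{l+1} = f(x^{l+1})."""
    y = y0
    for Wl, bl in zip(W, b):
        x = y @ Wl + bl   # pre-activation input to the next layer
        y = f(x)          # activated output of the next layer
    return y

y_out = feed_forward(rng.random(100))
```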
As previously described, the random vector m samples the outputs y_i^l of that layer, and the thinned outputs ỹ_i^l thus created are used as input to the next layer. By dropping a unit out, we mean temporarily removing it from the network. Each unit is retained with a fixed probability p independent of the other units, and the choice of which units to drop is random. This amounts to sampling a sub-network from a larger network when the process is applied at each layer. The feed-forward operation with dropout becomes that of Figure V.

Training dropout neural nets:

m_i^{l+1} ~ Bernoulli(p)
x_i^{l+1} = W_i^{l+1} ỹ^l + b_i^{l+1}
y_i^{l+1} = f(x_i^{l+1})
ỹ_i^{l+1} = y_i^{l+1} * m_i^{l+1},   l = 0, 1, ..., L-2

output: y_i^L = f(x_i^L)
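The corresponding training-time dropout pass, again as a hedged numpy sketch of the equations above (the layer widths and p = 0.5 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def dropout_forward(y0, W, b, p=0.5):
    """Training-time pass: thin every hidden layer's outputs with an
    independent Bernoulli(p) mask; the output layer stays unmasked."""
    y_thin = y0                                   # no input dropout assumed
    for Wl, bl in zip(W[:-1], b[:-1]):
        y = f(y_thin @ Wl + bl)                   # y^{l+1} = f(x^{l+1})
        m = rng.binomial(1, p, size=y.shape)      # m^{l+1} ~ Bernoulli(p)
        y_thin = y * m                            # thinned outputs ỹ^{l+1}
    return f(y_thin @ W[-1] + b[-1])              # output y^L = f(x^L)

# Assumed widths matching the previous sketch.
sizes = [100, 50, 10, 10]
W = [rng.normal(0.0, 0.1, (sizes[l], sizes[l + 1])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
y_out = dropout_forward(rng.random(100), W, b, p=0.5)
```

At test time, the standard dropout recipe runs the full network with weights scaled by p, so that expected activations match training; the sketch above covers only the training-time pass.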
ACKNOWLEDGMENT
We would like to thank the Speech, Vision and Robotics Group of the Cambridge University Engineering Department for their warm-hearted help with the ORL Database of Faces, downloaded from their web site, and for their excellent suggestions on face recognition. This work is supported by the National Natural Science Foundation of China (NSFC) under Grant 11301493.