The expression for the normal distribution curve is given by

$p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$ (1)

where μ represents the mean value and σ denotes the standard deviation. In our system, we set μ to be 0 and σ to be 1. However, due to the nature of neural networks, μ has no influence on the classification results. Figure 4 gives a visual example of the histogram remapping. Here the mapped pixel values are rescaled back to the 8-bit interval for visualization purposes.

In this system, we construct a Gabor filter bank consisting of 40 filters. We choose eight orientations to capture subtle features of the facial expression, and five scales to efficiently represent features of a 128×128 image. The other parameters selected are γ = α = √2 and f_max = 0.25, which are also appropriate for the image size.

We extract the Gabor features of a grey-scale image by convolving the image I(a, b) with the Gabor filter bank $\psi_{u,v}(a, b)$, i.e.,

$F(a, b) = I(a, b) * \psi_{u,v}(a, b)$. (5)

The magnitude responses of a sample image filtered by two of the Gabor filters are shown in Fig. 5.
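The filter-bank construction and convolution described above can be sketched in numpy. The paper's exact Gabor formulation is not reproduced here, so the kernel form, the 31×31 window, the √2 frequency spacing between scales, and the reading of the α parameter as the second Gaussian sharpness parameter η are all assumptions of this sketch.

```python
import numpy as np

def gabor_kernel(size, f, theta, gamma=np.sqrt(2), eta=np.sqrt(2)):
    # Complex Gabor kernel at centre frequency f and orientation theta.
    # gamma/eta are Gaussian sharpness parameters; the paper's
    # "gamma = alpha = sqrt(2)" is read here as gamma = eta = sqrt(2) (assumption).
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = (f ** 2 / (np.pi * gamma * eta)) * np.exp(
        -(f ** 2 / gamma ** 2) * xr ** 2 - (f ** 2 / eta ** 2) * yr ** 2)
    return envelope * np.exp(2j * np.pi * f * xr)

def gabor_magnitudes(image, n_scales=5, n_orientations=8, f_max=0.25):
    # Magnitude responses of the 40-filter bank (5 scales x 8 orientations),
    # computed by FFT-based circular convolution (border handling simplified).
    responses = []
    for u in range(n_scales):
        f = f_max / (np.sqrt(2) ** u)            # frequency of scale u
        for v in range(n_orientations):
            theta = v * np.pi / n_orientations   # orientation v
            k = gabor_kernel(31, f, theta)
            K = np.fft.fft2(k, s=image.shape)
            responses.append(np.abs(np.fft.ifft2(np.fft.fft2(image) * K)))
    return np.stack(responses)                   # shape (40, H, W)
```

For a 128×128 input this yields a (40, 128, 128) array of magnitude responses, one slice per filter in the 40-filter bank.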
$u_\phi \cdot \phi(x_j) = \sum_{i=1}^{n} \alpha_i \left( \phi(x_i) \cdot \phi(x_j) \right)$. (7)

Denote the kernel function by

$k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. (8)

Hence, the nonlinear principal components can be extracted implicitly using the kernel function, without the explicit projection of input vectors into the high-dimensional space. This makes Kernel PCA have a computational complexity similar to that of linear PCA.

In this system, we use the fractional power polynomial kernel, which is defined by

$k(x, x') = \operatorname{sgn}(x^\top x') \cdot |x^\top x'|^{0.8}$. (9)

In Section 5, we will show that the nonlinear principal components extracted by KPCA achieve better recognition rates than the linear principal components extracted using PCA.

IV. FEEDFORWARD DEEP NEURAL NETWORKS

Deep neural networks are ones in which there are multiple hidden layers. Since each hidden layer computes a nonlinear transform of the previous layer, multiple hidden layers have the power to generate much more complex features of the input. As a result, a deep network can learn significantly more complex functions than a shallow network. It has been shown that a k-layer network can represent functions that a (k − 1)-layer network can only represent with an exponentially large number of hidden units [18].

A. The network architecture

The deep network we design contains one input layer, two hidden layers and one output layer, as shown in Fig. 6. The inputs are feature vectors obtained after KPCA. Each hidden layer contains 200 units. In the function, z = Wx, where x is the input vector and W is the weight parameter.

The output layer is a softmax classifier. Each output unit will output the probability of the input image being its corresponding expression. The output h(x) of the softmax classifier for an input vector x is given by

$h_W(x) = \begin{bmatrix} P(y = 1 \mid x; W) \\ P(y = 2 \mid x; W) \\ \vdots \\ P(y = m \mid x; W) \end{bmatrix} = \frac{1}{\sum_{j=1}^{m} e^{W_j x}} \begin{bmatrix} e^{W_1 x} \\ e^{W_2 x} \\ \vdots \\ e^{W_m x} \end{bmatrix}$ (11)

where m equals the number of emotions, and W is the weight parameter.

B. Greedy layer-wise training

Even though the significant power of deep networks has been proved theoretically and appreciated for decades, researchers found it difficult to train deep networks. Traditional gradient-based optimization algorithms are not effective when the gradient is propagated across multiple layers of non-linear functions [19][20]. Reasons include insufficiency of labeled data, convergence to local optima, and diffusion of gradients.

In order to address those problems, Hinton et al. proposed a greedy layer-wise unsupervised training strategy based on restricted Boltzmann machines (RBM) [21]. Bengio et al. further improved the greedy layer-wise procedure with autoassociator networks. The main idea of the method is to train the different layers of the deep network one at a time [18]. This is the training strategy that we apply.

In our training process, the two hidden layers of the network are first trained using unlabeled images. They try to learn an identity function, where the desired output is the same as the input. This process is unsupervised feature learning. The use of unlabeled images helps the network learn good feature representations prior to supervised learning. Then we feed the labeled image data into the two pre-trained hidden layers and perform forward propagation to obtain feature vectors. Those feature vectors are used to train the output layer, which is a softmax classifier. Supervised training is applied here, where the target value of an output unit is 1 if the labeled emotion is the same as the one it represents, and 0 otherwise. We apply fine-tuning of the whole network as the final step: we treat all layers as one single model and use the back-propagation algorithm to improve all the weights in one iteration.

V. EXPERIMENTAL RESULTS

We use the Extended Cohn-Kanade Dataset for training and testing the deep neural network. It is a very popular database used to evaluate the performance of facial expression recognition systems. The number of images contained in the dataset is shown in TABLE I.

TABLE I. Number of images of the seven emotions in CK+ Dataset

Emotion      Number of Images
Angry        45
Disgust      59
Fear         25
Happiness    69
Contempt     19

For each test set, we have recorded the recognition rate for each emotion as well as the total recognition rate. TABLE II shows the recognition rates using the original/uniform/normal histogram distributions and PCA/KPCA.

Based on the results, we can see that histogram-remapped images significantly outperform images with no histogram manipulation. Of the two histogram manipulation techniques, fitting a normal distribution leads to better recognition rates than histogram equalization. Moreover, Kernel PCA is more powerful than PCA in terms of improving recognition rates. Overall, fitting a normal histogram distribution combined with KPCA yields the best performance.

B. Recognition of seven facial expressions

In order to further test the robustness of the system and do a reliable benchmark comparison, we perform recognition tests on seven emotions, including the six basic emotions and contempt. A confusion matrix of the test results is presented in TABLE III.

TABLE III. Confusion matrix of seven emotion recognition

      An     Di     Fe     Ha     Sa     Su     Co
An    84.4   3.0    2.2    0      10.4   0      0
Ha    0      0      0      100    0      0      0
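The paper gives no implementation details for the training procedure of Section IV.B, so the numpy sketch below illustrates the greedy layer-wise strategy under several assumptions: tied-weight autoassociators for the unsupervised stage, small placeholder data and layer sizes in place of the KPCA feature vectors and 200-unit hidden layers, and only two stand-in classes. The final fine-tuning pass through the whole network is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=50):
    # One autoassociator layer: learn to reconstruct X from its own code,
    # encoding with W and decoding with W.T (tied weights, an assumption).
    W = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    for _ in range(epochs):
        H = sigmoid(X @ W)              # encode
        R = sigmoid(H @ W.T)            # decode
        dR = (R - X) * R * (1 - R)      # error signal at the reconstruction
        dH = (dR @ W) * H * (1 - H)     # error signal at the hidden code
        W -= lr * (X.T @ dH + dR.T @ H) / len(X)   # squared-error gradient
    return W

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

# Toy features standing in for the KPCA vectors (placeholder data).
X = rng.normal(size=(200, 30))
y = (X[:, 0] > 0).astype(int)           # two stand-in "emotion" labels

# Greedy layer-wise unsupervised pretraining of the two hidden layers.
W1 = pretrain_layer(X, 20)
H1 = sigmoid(X @ W1)
W2 = pretrain_layer(H1, 20)
H2 = sigmoid(H1 @ W2)

# Supervised training of the softmax output layer on the frozen features;
# targets are 1-of-m vectors, as in the paper.
m = 2
T = np.eye(m)[y]
W_out = np.zeros((20, m))
for _ in range(200):
    P = softmax(H2 @ W_out)
    W_out -= 0.5 * H2.T @ (P - T) / len(X)   # cross-entropy gradient step
```

Fine-tuning would continue from here by back-propagating the supervised loss through W_out, W2, and W1 as one model, updating all weights together.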