
Gender Recognition using Facial Patterns by Convolutional Neural Networks
Abstract:
The face is a unique feature of the human body and can be used for gender recognition of any human being; however, recognizing the gender of human beings in computer vision is not an easy task. Gender classification and recognition using facial images plays an important role in many interactive applications.
This paper applies deep learning techniques, in particular convolutional neural networks, to the recognition and classification of human gender. The dataset used in this gender recognition method is the Facial Recognition Technology (FERET) database, collected for this purpose as part of the FERET program. The dataset comprises 578 facial images. Several deep learning techniques were trained on this dataset to recognize gender, and the highest classification accuracy was obtained using the "AlexNet" classifier. The accuracy obtained is better than that of earlier proposed methods, and the classification speed is also higher, but the results are still not perfect, and further research on gender recognition using facial patterns is required.
Keywords: Gender recognition, Facial patterns, AlexNet classifier, Deep learning, Convolutional
neural networks

Introduction:
Gender recognition using facial patterns involves the recognition and classification of human gender, since crucial information about gender can be perceived from the human face.
Gender recognition is basically a binary classification technique used to distinguish males from females, and the problem is an active research topic that has attracted the attention of many computer vision researchers. Facial patterns are very helpful in providing the information required for human interactive applications. Gender recognition is also used in customer-oriented advertising, content-based indexing, surveillance, targeted advertising, searching, and demographics. According to research, the difference between femininity and masculinity can be really helpful in improving the performance of various computer vision and biometric applications.
The main focus of this paper is gender recognition from facial patterns using different deep learning descriptors, and the AlexNet classifier in particular. Many other gender recognition methods have been proposed in the literature in recent years; a summary of those methods, with their advantages and limitations, follows.
In [1], a multi-agent system has been proposed for the classification of gender and age using facial images. The proposed method is basically an integration of various techniques such as LBP, Fisherfaces, ANN, and Eigenfaces, combined with filters (Sobel, Gabor) for preprocessing and acquisition of images. The greatest advantage of the proposed system is that it performs classification in real time with a reduced workload.
In [2], a cumulative benchmark approach has been used to automatically extract appearance and geometric features of the face from the front view. Three diverse classifiers have been used for this purpose, namely AdaBoost, SVM, and a neural network. It is an efficient method, but stability and scalability of the images are required for feature extraction.
In [3], the facial patterns are treated as correlated features by training a single CNN that jointly learns from all facial patterns. This CNN has fewer parameters, which makes it less sensitive to overfitting and easier to train on the dataset. However, the accuracy is not perfect.
In [4], a fusion-based gender identification method is used in which facial patterns serve as input. Four different frameworks are required, namely LBP, PCA, SVM, and LDA. Weighted voting has been used to fuse the decisions obtained from the four frameworks. This method obtains a recognition rate of 94% for neutral faces of the FEI face dataset, which is equal to the state-of-the-art rate for this dataset.
In [5], a gender recognition method has been proposed that uses multiresolution statistical descriptors derived from the histogram of the Discrete Wavelet Transform. This method yields better accuracy.
In this research paper, we have evaluated different texture and deep descriptors, i.e. GLCM, LBP, AlexNet, VGG, ResNet, etc. The FERET dataset has been used, consisting of 478 pictures. The experimental results show that ResNet gives the best performance, with an accuracy of 98%.

Proposed Methodology:
A Convolutional Neural Network (CNN, or ConvNet) is a special kind of multi-layer neural network, designed for the recognition and identification of visual patterns from pixel images with minimal preprocessing. The ImageNet project is a very large visual database designed for visual object recognition research. There is also an annual software contest related to the ImageNet project, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which software programs compete to proficiently and correctly recognize and detect scenes and objects.
In this research paper we have used Convolutional Neural Networks (CNNs) to train on our two datasets (FERET), containing 234 pictures each, and performed gender recognition using facial patterns.

AlexNet Architecture:
AlexNet was much larger than previous CNNs used for computer vision tasks (e.g. Yann LeCun's LeNet, 1998). It has about 650,000 neurons and 60 million parameters, and it took five to six days to train on two GTX 580 3 GB GPUs. Today's CNNs are far more complex and can run efficiently on faster GPUs, even on very large datasets.
AlexNet consists of 8 layers in total, of which 5 are convolutional layers and 3 are fully connected layers.
Multiple convolutional kernels (a.k.a. filters) are used to extract interesting features from an image. There are usually many kernels of the same size in a single convolutional layer; e.g. the first convolutional layer of AlexNet contains 96 kernels, each of size 11x11x3. The height and width of the kernels are equal, and the depth is the same as the number of channels.
The first two convolutional layers are each followed by an Overlapping Max Pooling layer, described next. The third, fourth, and fifth convolutional layers are connected directly. The fifth convolutional layer is also followed by an Overlapping Max Pooling layer, whose output goes into a series of two fully connected layers. The second fully connected layer feeds into a softmax classifier with 1000 class labels.
ReLU nonlinearity is applied after all the convolutional and fully connected layers. After the application of ReLU nonlinearity to the first and second convolutional layers, local normalization is performed before pooling. However, this normalization was later found not to be very useful, so its details will not be discussed here.
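As a rough sanity check on these layer sizes, the spatial dimension after a convolution or pooling step follows the standard output-size formula. The sketch below is our own illustration, not the authors' code; the 227×227 input, 11×11 kernel, and stride 4 are the hyperparameters commonly quoted for AlexNet's first layer.

```python
def conv_out(size, kernel, stride, pad=0):
    # standard formula: floor((size - kernel + 2*pad) / stride) + 1
    return (size - kernel + 2 * pad) // stride + 1

# Conv1: 96 kernels of 11x11x3, stride 4, applied to a 227x227x3 input
s = conv_out(227, kernel=11, stride=4)   # 55x55 feature maps
# followed by overlapping max pooling (3x3 window, stride 2)
s = conv_out(s, kernel=3, stride=2)      # 27x27 feature maps
```

The same arithmetic can be chained through the remaining layers to verify that the tensor shapes are consistent before any training is attempted.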
Overlapping Max Pooling
Max Pooling layers are usually used to downsample the height and width of the tensors, keeping the depth the same. Overlapping Max Pool layers are very similar to ordinary Max Pool layers, except that the adjacent windows over which the max is computed overlap each other. The authors used pooling windows of size 3×3 with a stride of 2 between adjacent windows. This overlapping pooling reduced the top-5 error rate by 0.3% and the top-1 error rate by 0.4%, compared to non-overlapping pooling with windows of size 2×2 and a stride of 2, which would give the same output dimensions.
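To make the overlap concrete, here is a minimal NumPy sketch of 2-D max pooling with a 3×3 window and stride 2 (our own illustrative implementation, not the authors' code). Because the window is larger than the stride, adjacent pooling regions share a row and a column.

```python
import numpy as np

def max_pool2d(x, window=3, stride=2):
    """Max pooling over a 2-D array; window > stride gives overlapping pools."""
    h = (x.shape[0] - window) // stride + 1
    w = (x.shape[1] - window) // stride + 1
    out = np.empty((h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            # each output value is the max over a window x window patch
            out[i, j] = x[i * stride:i * stride + window,
                          j * stride:j * stride + window].max()
    return out

x = np.arange(25).reshape(5, 5)
pooled = max_pool2d(x, window=3, stride=2)  # a 5x5 input shrinks to 2x2
```

With `window=2, stride=2` the patches would tile the input without overlap; the 3×3/stride-2 setting is what the AlexNet authors report as reducing the error rates quoted above.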
ReLU Nonlinearity:
The use of the ReLU (Rectified Linear Unit) nonlinearity is an important feature of AlexNet. Sigmoid or tanh activation functions were usually used to train neural network models. AlexNet showed that, with ReLU nonlinearity, deep CNNs could be trained much faster than with saturating activation functions such as sigmoid or tanh. The figure below shows that using ReLUs (solid curve), AlexNet could achieve a 25% training error rate six times faster than an equivalent network using tanh (dotted curve). This was tested on another dataset, CIFAR-10.
Let's see why training is faster with the ReLU function. The ReLU function is given by the formula
f(x) = max(0, x)
Above are the plots of the two functions, ReLU and tanh. The tanh function saturates at very high or very low values of z; in these regions, the slope of the function is very close to zero, which can slow down gradient descent. On the other hand, the ReLU function's slope does not approach zero for large positive values of z, so the optimization converges faster. The slope is still zero for negative values of z, but most of the neurons in a neural network usually end up having positive values. ReLU is preferred over the sigmoid function for the same reason.
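The saturation argument can be checked numerically. The sketch below (our own illustration) compares the gradients of ReLU and tanh at a large positive input: tanh's gradient has all but vanished, while ReLU's is exactly 1.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # slope is 1 for positive inputs, 0 otherwise -- it never saturates for x > 0
    return (np.asarray(x) > 0).astype(float)

def tanh_grad(x):
    # slope is 1 - tanh(x)^2, which shrinks toward 0 for large |x| (saturation)
    return 1.0 - np.tanh(x) ** 2

z = 10.0
# at z = 10, tanh's gradient is on the order of 1e-8 while ReLU's is 1.0,
# which is why gradient descent makes much larger progress through ReLU units
```
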
Overfitting
When we memorized answers to questions in school without any understanding, we were unable to answer questions that required thinking and understanding.
The same is the case with neural networks: the size of a neural network determines its capacity to learn, but if you are not vigilant, it will try to memorize the examples in the training data without understanding the concept. As a consequence, the neural network will work extremely well on the training data but fail to learn the real concept, and so it will fail to work well on unseen and new test data. This phenomenon is called overfitting.
Methods to Reduce Overfitting:
AlexNet uses different methods to reduce overfitting.
Data Augmentation
The first is data augmentation: showing a neural network different variations of the same image helps prevent overfitting. You are forcing it not to memorize! Often it is possible to generate additional data from existing data free of cost. Here are a few of the methods and tricks used by the AlexNet team to reduce overfitting.
Data Augmentation by Mirroring
If we have an image of a human in our training set, its mirror image is also a valid image of a human. An example is demonstrated in the figure below. So we can double the size of the training dataset simply by flipping each image about the vertical axis.
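In array terms, mirroring is just a reversal along the width axis. A minimal NumPy sketch (our own illustration, using a toy H×W×C array in place of a real photo):

```python
import numpy as np

img = np.arange(12).reshape(2, 3, 2)  # toy H x W x C "image"
mirrored = img[:, ::-1, :]            # flip about the vertical axis

# the leftmost column of the mirror is the rightmost column of the original,
# and flipping twice recovers the original image exactly
```
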
Data Augmentation by Random Crops
In addition, randomly cropping the original image also produces additional data that is just a shifted version of the original.
The authors of AlexNet extracted random crops of size 227×227 from inside the 256×256 image boundary to use as the network's inputs. They increased the size of the data by a factor of 2048 using this method.
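A random crop is a random offset followed by a slice. The sketch below is our own illustration of the idea, using a blank 256×256×3 array in place of a real training image:

```python
import numpy as np

def random_crop(img, size=227):
    """Extract a random size x size crop from an H x W x C image."""
    h, w = img.shape[:2]
    # choose the top-left corner so the crop stays inside the image boundary
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

img = np.zeros((256, 256, 3), dtype=np.uint8)
crop = random_crop(img, size=227)  # always 227 x 227 x 3, at a random offset
```

Combining random crops with the horizontal flips above multiplies the number of distinct training views of each image, which is where the large augmentation factor comes from.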

Notice that the four randomly cropped images look very similar, but they are not exactly the same. The neural network thus learns that minor shifts of pixels do not change the fact that the image is still that of a human. Without data augmentation, it would not have been possible for the authors to use such a large network, because it would have suffered from substantial overfitting.
Dropout
With about 60M parameters to train, the authors experimented with other ways to reduce overfitting as well. They applied another technique, called dropout, introduced by G. E. Hinton in another paper in 2012. The dropout technique drops each neuron from the neural network with a probability of 0.5. A dropped neuron contributes to neither the forward nor the backward pass, which helps reduce overfitting.
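The original paper rescales activations at test time; the sketch below (our own illustration, not the authors' code) uses the equivalent "inverted dropout" formulation common today, which rescales the surviving units during training instead.

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors."""
    if not training:
        return activations  # no units are dropped at test time
    mask = np.random.rand(*activations.shape) >= p
    # dividing by (1 - p) keeps the expected activation unchanged
    return activations * mask / (1.0 - p)

a = np.ones(10000)
out = dropout(a, p=0.5)
# roughly half the units are zeroed; the survivors are scaled to 1 / (1 - p)
```
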
References:
[1] G-Briones, Alfonso, et al. "A multi-agent system for the classification of gender and age from images." Computer Vision and Image Understanding 172, pp. 98-106, 2018.
[2] Kumar, Vivek, S. Srivastava, T. Jain, and A. Jain. "Local Invariant Feature-Based Gender Recognition from Facial Images." In Soft Computing for Problem Solving, pp. 869-878, 2019.
[3] Aslam, A., Hussain, B., Cetin, A. E., Umar, A. I., and Ansari, R. "Gender classification based on isolated facial features and foggy faces using jointly trained deep convolutional neural network." Journal of Electronic Imaging, 27(5), 053023, 2018.
[4] Ghojogh, Benyamin, Saeed Bagheri Shouraki, Hoda Mohammadzade, and Ensieh Iranmehr. "A Fusion-based Gender Recognition Method Using Facial Images." In Electrical Engineering (ICEE), Iranian Conference on, pp. 1493-1498. IEEE, 2018.
[5] Sheetlani, Jitendra, Chitra Dhawale, and Rajmohan Pardeshi. "Gender Identification from Frontal Facial Images Using Multiresolution Statistical Descriptors." In Computing, Communication and Signal Processing, pp. 977-986. Springer, Singapore, 2019.
