Estimation of Age and Gender Using Convolutional Neural Network

1. INTRODUCTION
With the approval of the General Data Protection Regulation (GDPR), the rules on using a
client's personal information have become very strict. Since it is currently not legal to
store a person's sensitive information (name, email, phone number) without their
consent, there is a need for an alternative way to gather clients' information for
marketing purposes.
Age and gender play fundamental roles in social interactions. Languages reserve
different salutations and grammar rules for men or women, and very often different
vocabularies are used when addressing elders compared to young people. Despite the
basic roles these attributes play in our day-to-day lives, the ability to automatically
estimate them accurately and reliably from face images is still far from meeting the
needs of commercial applications. This is particularly perplexing when considering
recent claims to super-human capabilities in the related task of face recognition.
Recent successes on specific tasks, such as visual recognition and classification
problems, have presented a new way to solve complex issues. Among the issues that have
received an increasing amount of attention are age and gender classification.
The model's accuracy depends not only on the implemented network but also on
the data that is used to train it. The training data is what allows the system to learn to
identify and predict outcomes correctly; therefore, in general, the more data that is fed
to the network, the better the results. When it comes to face detection or age/gender
classification, some of the most important difficulties are partial occlusions and
low-quality images. These directly influence the results, as the model has less
information to work with, which makes prediction harder. The same applies when
a human makes the prediction: if the image has low quality, it is harder for a human
to understand what is being seen and, therefore, to make a prediction. Age
and gender classification is an inherently challenging problem, more so than
many other tasks in computer vision. The main reason for this discrepancy in
difficulty lies in the nature of the data that is needed to train these types of systems.
Department of MCA, Kuvempu University
1.1 BACKGROUND
In recent decades, CNNs, image processing and machine vision have developed rapidly,
and they have become a very important part of artificial intelligence and
the interface between humans and technology. These technologies have been applied
widely in industry and medicine. Successes on specific
tasks, such as visual recognition and classification problems, have presented a new
way to solve complex issues.
Despite the importance of identifying faces in image files using
CNNs, and although the problem has been studied for at least 30 years, the advances
achieved seem somewhat modest.
In the case of image processing, some problems can be solved by using morphological
mathematical operations, which are easy to implement and understand. However,
more complex problems often demand more sophisticated approaches. Techniques like
neural networks can be very powerful if properly applied, and in many cases
they are increasingly in demand.
One of the most common approaches to image feature extraction here is based on the age
and gender features of the face in the image. A simple example of such a feature is the
numerical age.
The main aim of this project is to create an efficient system that is able to detect
faces in images and to classify those faces by age and gender. The following goals
can be derived from this aim:
• The system should be able to detect faces in images. Provided an image with
people, the system needs to identify most faces correctly and provide good
images as input to the next part of the system, which is the age and gender
classification model.
• The system should be able to classify a face into a set of age and gender classes. We
need an accurate approach to age and gender classification of
customers, so that we can achieve a high categorization accuracy. For age
prediction, a high accuracy could be difficult to achieve, as even humans find it
hard to predict the age of a person from their facial characteristics alone; hence, in
this case, a lower accuracy might be acceptable.
• The system should be configurable in terms of the age classes used. This gives the
user some control over which classes to use: the user should be able to
configure the age classes as they see fit, without having to re-train the underlying
model.
• Evaluate currently available models for both face detection and age/gender
classification. Because AI systems have grown rapidly over recent years,
there are multiple models capable of detecting faces and others capable of
classifying faces by age and gender. Therefore, an analysis needs to be conducted
to determine which ones achieve better results for our problem, enabling us to
decide which models to use in our system.
• Create a system integrating the best available models identified.
• Evaluate the failing outputs of the integrated models in order to find an underlying
reason for such failures, and adapt the system to overcome them and improve the
classification accuracy. An analysis of the wrongly predicted outputs
needs to be done in order to understand why such failures occur. The objective
is, after understanding why such images tend to fail, to find a possible solution and
adapt the system to increase the accuracy further.
• Chapter 1 gives the Introduction of this project.
• Chapter 2 gives a Literature Review.
• Chapter 3 gives a brief picture of the Methodology.
• Chapter 4 explains in detail the Results and Discussion of the proposed work.
• Chapter 5 deals with the Conclusion.
• Chapter 6 gives the Bibliography.
2. LITERATURE REVIEW
This chapter provides a brief survey of the existing methods available in the
literature for age and gender prediction. A good number of such methods exist, and
various techniques are used to detect and analyze a face in a video or in an image.
This section reviews a few of the works related to our project.
Ranganatha S and Dr. Y P Gowramma [April 2015] noted that the demand for
biometric security systems has risen due to a wide range of surveillance, access control
and law enforcement applications. Among the various biometric security systems
based on fingerprint, iris, voice or speech, and signature, face recognition by age and
gender seems to be the most easily accessible, non-intrusive and universal system. It is
easy to use, can be used efficiently for mass scanning, which is quite difficult in the case
of other biometrics, and also increases user friendliness in human-computer
interaction. Several interesting research attempts have been made in the field of face
recognition. There are three main divisions of face recognition based on the face data
acquisition type: methods that deal with intensity images, those that operate on video
sequences, and those that are based on other sensor inputs. Their paper provides an
overview of a few widely used methods in each of these divisions, along with their
advantages and disadvantages.
Vladimir Khryashchev, Alexander Ganin, Olga Stepanova, Anton Lebedev et al. [2016]
proposed a new framework for facial expression recognition using an attentional
convolutional network. Attention plays an important role in
detecting facial expressions, which can enable neural networks with fewer than
10 layers to compete with much deeper networks for emotion recognition.
Xiaofeng Wang, Azliza Mohd Ali, Plamen Angelov et al. [2017] reported that
face recognition was achieved successfully but is affected by illumination,
pose and facial expression. Their approach used facial features such as the eyebrows,
nose and mouth length, with local face points obtained using Dlib in OpenCV.
D D Pribavkin, P Y Yakimov et al. [2019] proposed a model in which, during
preprocessing, the AdaBoost method is used to remove irrelevant features and the
Viola-Jones algorithm is used to extract Haar-like features, which are given as input
to a CNN model for processing.
Michal Uricar et al. proposed a supervised SVM approach for predicting gender
and age from a facial image using deep features extracted from DEX. For
gender they used a binary SVM classifier, and for age estimation they trained several
varieties of multi-class SO-SVM classifier on 10 different splits of
the ChaLearn LAP 2016 dataset. Each multi-class SO-SVM predictor needs a
SoftMax expected value refinement. The authors conclude that the best approach is to
combine the representational power of deep features with the robustness
of the SO-SVM model for prediction.
Tsun-Yi Yang et al. proposed a novel CNN model called the Soft Stagewise
Regression Network (SSR-Net) for estimating age from a single facial
image. They address age estimation by performing multi-class classification and
then turning the classification results into regression by computing the expected values.
Each stage is responsible only for refining the decision of the stage
before it, giving a more precise age estimate.
Gil Levi and Tal Hassner additionally included data augmentation
techniques (oversampling and center-crop) to allow the dataset to be expanded
and the model to be trained more efficiently. They also applied dropout
during training in order to reduce over-fitting of their model; dropout consists
of randomly ignoring the outputs of some neurons. Their accuracy percentages were
86.8% for gender classification and 50.7% for age classification.
Rajeev Ranjan et al. also achieved over 90% gender accuracy using pretrained face
recognition models. As opposed to the previous work, they created separate models for
each specific task, because separate networks allowed them to design faster and
more portable models. Additionally, the running time of all models combined is less
than that of the all-in-one model. In this work, they were also able to achieve the
highest age classification accuracy reported so far, 70.5% on exact group estimation.
Note that the 1-off group classification accuracy reached 96.2% (1-off is when a
person is classified within one group of the correct one).
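The difference between exact and 1-off accuracy can be illustrated with a short sketch. It assumes that age groups are encoded as consecutive integer indices and that 1-off accuracy also counts exact matches; the function name and sample data are hypothetical and are not taken from the cited work.

```python
def exact_and_one_off_accuracy(true_groups, predicted_groups):
    """Return (exact, one_off) accuracy for ordered age-group indices."""
    if len(true_groups) != len(predicted_groups) or not true_groups:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(true_groups, predicted_groups))
    exact = sum(t == p for t, p in pairs)
    # Assumption: a prediction at most one group away counts as 1-off correct.
    one_off = sum(abs(t - p) <= 1 for t, p in pairs)
    return exact / len(pairs), one_off / len(pairs)


# Hypothetical group indices for five faces.
exact, one_off = exact_and_one_off_accuracy([0, 1, 2, 3, 4], [0, 2, 2, 1, 4])
print(exact, one_off)
```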
G. Antipov et al. tested multiple age encodings. As opposed to gender
classification, where there are only two possible values (male/female), in age
estimation we could be trying to predict an exact age (person A is 29 years old) or to
classify a person into an age group (person A is in the group [20-30]). Their experiments
showed that the encoding used could change the results of age estimation by about 0.95
(MAE, mean absolute error). The best encoding was the Label Distribution Age Encoding,
which treats age as a set of classes representing all possible ages, with the content of
each vector cell given by a Gaussian distribution centered at the target age, as opposed
to pure per-year classification, in which each cell holds a binary
value (0/1).
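As a rough illustration of the two encodings, the sketch below builds a binary per-year vector and a Gaussian label distribution for the same target age. The maximum age of 100 and the standard deviation of 2 years are assumptions chosen for the example, not values reported by the authors.

```python
import math


def one_hot_age(age, max_age=100):
    """Pure per-year encoding: 1 at the target age, 0 elsewhere."""
    return [1.0 if a == age else 0.0 for a in range(max_age + 1)]


def label_distribution_age(age, max_age=100, sigma=2.0):
    """Gaussian centered at the target age, normalized to sum to 1.

    sigma is a hypothetical spread, chosen only for illustration.
    """
    weights = [math.exp(-((a - age) ** 2) / (2 * sigma ** 2))
               for a in range(max_age + 1)]
    total = sum(weights)
    return [w / total for w in weights]


encoding = label_distribution_age(29)
print(round(encoding[28], 4), round(encoding[29], 4), round(encoding[30], 4))
```

Unlike the binary vector, the Gaussian encoding gives neighbouring ages a non-zero target value, so a prediction of 28 or 30 for a 29-year-old is penalized less than a prediction of 60.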
Shixing Chen et al. proposed a framework in which sub-networks produce binary
outputs that are finally combined to obtain the age label for an
input facial image. Features were learned independently for each
age class, so that patterns specific to the various age classes could be discovered,
leading to the final estimate. To reduce over-fitting, labelled data was used and each
age group was trained individually.
F. Dornaika et al. used a regression tool in both the training and testing
phases, and were therefore also interested in efficient shallow regressors. They used
the PLS (Partial Least Squares) regressor, a statistical method that can discover the
relationship between observed variables X and Y, where X represents the
observations and Y the associated responses. PLS is a powerful statistical approach which
can achieve dimensionality reduction and classification/regression, similar to PCA.
Many researchers have attempted age and gender prediction. Some
approaches identify images based on the age and gender information of the face. They
also classify the images by gender (male or female) using classifier algorithms.
The proposed project, as seen in Figure 2.1, takes a simple approach
without many complications. Many researchers have proposed
methods for detecting faces in an image; among these, our project uses a simple
and robust prediction of age and gender by using another object as reference. The
proposed system uses a CNN, which can readily predict age and gender.
Input Image from Database -> Image Pre-processing -> Face Detection ->
Feature Extraction -> Classification -> Age and Gender Detection
Figure 2.1: Flowchart of proposed algorithm
For age and gender detection, Deep EXpectation (DEX) is used for age estimation,
building on the progress in image classification fueled by deep learning. From the deep
learning literature we take four key ideas that we apply to our solution: (i) the deeper
the neural network (by sheer increase of parameters / model complexity), the better
its capacity to model highly non-linear transformations, with some optimal depth on
current architectures; (ii) the larger and more diverse the datasets used for training, the
better the network learns to generalize and the more robust it becomes to
over-fitting; (iii) the alignment of the object in the input image impacts the overall
performance; (iv) when the training data is small, we should fine-tune a
network pre-trained for comparable inputs and goals, and thereby benefit from the
transferred knowledge.
The areas of age and gender classification have been studied for decades. Various
different approaches have been taken over the years to tackle this problem, with
varying levels of success. Some of the recent age classification approaches are
surveyed in detail in. Very early attempts focused on the identification of manually
tuned facial features and used differences in these features’ dimensions and ratios as
signs of varying age. The features of interest in this approach included the size of the
eyes, mouth, ears, and the distances between them. While some early attempts have
shown fairly high accuracies on constrained input images (near ideal lighting, angle,
and visibility), few have attempted to address the difficulties that arise from real-
world variations in picture quality/clarity.
For the deep learning architecture of our Convolutional Neural Network (CNN), we can
start from CNNs pre-trained on the large ImageNet dataset to classify images, so as to
obtain a meaningful representation and a smooth, warm start for further
fine-tuning on relatively smaller face datasets. Fine-tuning the CNNs on facial images
with age annotations is an important step for superior performance, because the CNN
adapts to best fit the particular data distribution and perform effective age estimation.
Due to the shortage of facial images with apparent age annotation, we explore the
benefit of fine-tuning on crawled Internet face images with available age. While age
estimation is naturally a regression problem, we go further and cast age estimation
as a multi-class classification of age bins followed by a SoftMax expected value
refinement.
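The SoftMax expected value refinement can be sketched as follows: the network's per-bin scores are turned into probabilities, and the predicted age is the probability-weighted average of the ages that the bins represent. The function names and the three-bin example are hypothetical; this is a minimal illustration rather than the DEX implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits, bin_ages):
    """Expected value E = sum_i p_i * y_i over the age bins."""
    probs = softmax(logits)
    return sum(p * age for p, age in zip(probs, bin_ages))


# Three hypothetical bins representing ages 10, 20 and 30.
print(expected_age([0.0, 0.0, 0.0], [10, 20, 30]))  # equal scores: the mean, ~20.0
print(expected_age([0.0, 2.0, 1.0], [10, 20, 30]))
```

Because the output is a weighted average, the estimate can fall between bin centers, which a plain arg-max over the bins cannot do.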
OpenCV's Haar cascades were used to detect and extract a face region, which was then
classified using the CNN model. It was found to be best neither to subtract the training
mean nor to normalize the pixels within the detected face region before classifying it.
3. METHODOLOGY
To understand how we could achieve our goals, an
investigation was done to study what kind of methods and techniques were being
applied by other authors to provide a model that could solve our proposed problem.
The investigation for the literature review was done using several known
platforms such as Google Scholar, ACM Digital Library, and IEEE Digital Library. A
separate investigation was done for the face detection and the age/gender classification
tasks, analyzing separate papers for each of them. All the methods applied were
analyzed as part of the specific paper, and the results achieved for each of the
problems presented were extracted and registered. Once the feature extraction is
complete, two files are obtained: training feature data and test feature data.
Classification is performed using a
Convolutional Neural Network, which is trained on the training file and then uses the
test file to perform the classification task on the test data. Consequently, it loads all
the data files (training and test data files) and modifies the data according to the
proposed model chosen.
Figure 3.1: Main stages of the system
With this, we were able to have a summary of the performances achieved and the
techniques used, so that we could make an informed decision on how to create our
system. The scope was limited to research from 2015 onwards, although
a few older documents were used as well. We wanted to consider mainly recent
papers on the specified subjects, since recognition models are improving and
evolving at a fast pace, and what was used in the last decade may not be the best
approach today. Furthermore, an additional investigation was done to find existing
models that could be used for our own purposes and testing. In image recognition
research, a lot has been done on general feature extraction and on recognition between
different classes and objects. In the case of special-domain recognition, taking into
account the unique characteristics of the category improves the
performance of the system.
3.1 PATTERN RECOGNITION
To identify an item is to recognize it and associate it with its appropriate name; for
example, recognizing that the automobile in front of a house is a Honda Accord, or that
a region in a picture is a human face. Identifying a face in an image here involves
recognizing age and gender from one or more characteristics, that is, predicting an age
value and a gender. Accurate identification can be very helpful in
biometrics, as well as in distinguishing one person from another.
First, let us look at some common characteristics of images that are useful in
identifying them. Within a face class dealing with age and gender, where the field of
study is concerned with the identification, classification and prediction of images, most
of an image's characteristics may be more obvious.
Pattern recognition is a very important area within computer vision, and the aim of
pattern recognition/classification is to classify or recognize patterns based on
features extracted from them. Pattern recognition involves three steps: 1) pre-processing,
2) feature extraction and 3) classification. In pre-processing, one usually
processes the image data so that it is in a suitable form; for example, one obtains an
isolated object after this step. In the second step, one measures the properties of the
object of interest, and in the third step, one determines the class of the object based
on its features. The main pattern recognition steps are shown in Figure 3.2.
Input Image -> Image Acquisition -> Image Pre-processing ->
Feature Extraction and Classification -> Age and Gender Detection
Figure 3.2: Main Pattern Recognition Steps
3.2 IMAGE PRE-PROCESSING
Before these operations, some of the images are rotated manually to help the
program orient the image apex towards the right side. Afterwards, automatic
pre-processing techniques are applied to all of the images. These pre-processing steps
are illustrated on an image, as seen in the figure.
The rectangle of interest (ROI) of the image should include all the pixels whose values
are smaller than a specific threshold, from which a binary image is retrieved.
In this approach the threshold is obtained automatically according to the face detection
of the image files. Then the contour of the face in the image can be extracted.
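The thresholding and ROI step can be sketched on a small grayscale image stored as a list of rows. The fixed threshold of 128 is an assumption for illustration only (the text obtains the threshold automatically), and both function names are hypothetical.

```python
def binarize(image, threshold):
    """Mark pixels whose value is below the threshold as foreground (1)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def bounding_rectangle(binary):
    """Return (top, left, bottom, right) enclosing all foreground pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for row in binary for c, value in enumerate(row) if value]
    if not rows:
        return None  # no pixel fell below the threshold
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical 4 x 4 grayscale image with a dark region in the middle.
image = [
    [200, 200, 200, 200],
    [200, 50, 60, 200],
    [200, 40, 200, 200],
    [200, 200, 200, 200],
]
binary = binarize(image, threshold=128)
print(bounding_rectangle(binary))  # (1, 1, 2, 2)
```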
3.3 FEATURE EXTRACTION
After pre-processing, the essential task in pattern recognition is to
measure the properties of an object, because objects have to be detected based on these
computed properties. In the feature extraction step, the task is to describe the regions
based on a chosen representation; for example, a region may be represented by its
boundary, and its boundary described by properties (features) such as age and gender.
There are two types of representation: an external representation and an internal
representation. An external representation is chosen when the primary focus is on face
detection characteristics. An internal representation is selected when the primary
focus is on age and gender.
Sometimes the data is used directly to obtain descriptors for properties such as age
and gender, for example in determining the gender of the face in an image. The aim of
description is to quantify a representation of an object. This implies that one can
compute results based on its properties.
3.4 CLASSIFICATION (RECOGNITION)
Once the features have been extracted, they are used to classify and
identify an object. Here, an age and gender estimation classifier classifies an image on
the basis of the detected face, producing an age value and a gender (male or
female).
In general pattern recognition systems, there are two steps in building a classifier:
training and testing (or recognition). These steps can be further broken into sub-steps.
Training:
1 Pre-processing: Process the data so it is in a suitable form.
2 Feature Extraction: Reduce the amount of data by extracting relevant
information, usually resulting in a vector of scalar values.
3 Model Estimation: From the finite set of feature vectors, estimate a
model (usually statistical) for each class of the training data.
Recognition:
1 Pre-processing.
2 Feature Extraction: (both steps are the same as above).
3 Classification: Compare the feature vectors to the various models and
find the closest match. One can match against the feature vectors obtained from the
training set.
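The model estimation and closest-match steps above can be sketched with a very simple statistical model: one mean feature vector per class, with recognition by smallest Euclidean distance. This nearest-mean classifier is an illustrative stand-in chosen for brevity, not the CNN classifier used in this project, and the feature values and class labels are hypothetical.

```python
import math


def estimate_models(training_vectors, labels):
    """Model estimation: one mean feature vector per class."""
    sums, counts = {}, {}
    for vector, label in zip(training_vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(vector, models):
    """Classification: return the label of the closest class model."""
    return min(models, key=lambda label: math.dist(vector, models[label]))


# Hypothetical two-dimensional feature vectors for two classes.
train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
labels = ["class_a", "class_a", "class_b", "class_b"]
models = estimate_models(train, labels)
print(classify([1.1, 1.0], models), classify([5.1, 4.9], models))
```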
3.5 DEEP LEARNING
Machine Learning is mainly used for object identification, classification problems,
and prediction problems, and is divided into Supervised Learning and Unsupervised
Learning. Deep learning can be described as a set of techniques used as part of
Machine Learning, more specifically in neural networks, which work with a set
of layers that allow the input data to be
decomposed and analyzed. Those layers are not specifically designed for each
problem; they are part of a generic procedure that is adaptable to multiple problem
types. So, the same network structure can be used for different problems. How the
network is trained and which data is used is what distinguishes the behavior of such
a model.
Deep Learning has shown many improvements on various tasks that were for many
years hard to solve by other algorithms, showing better results in multiple studies
compared to previous works. Such tasks include problems where the input has a
complex structure and cannot be easily learned by traditional artificial intelligence
algorithms, such as image classification (where each image is represented as a pixel
array).
3.6 CONVOLUTIONAL NEURAL NETWORK (CNN)
A convolutional neural network consists of an input layer, hidden layers and an output
layer. In any feed-forward neural network, any middle layers are called hidden
because their inputs and outputs are masked by the activation function and
final convolution. In a convolutional neural network, the hidden layers include layers
that perform convolutions. Typically this includes a layer that performs a dot product
of the convolution kernel with the layer's input, and its activation function is commonly
ReLU. This is followed by other layers such as pooling layers, fully connected layers,
and normalization layers.
Convolutional layer:
In a CNN, the input is a tensor with shape (number of inputs) x (input height) x
(input width) x (input channels). After passing through a convolutional layer, the
image becomes abstracted to a feature map, also called an activation map, with shape
(number of inputs) x (feature map height) x (feature map width) x (feature
map channels). A convolutional layer within a CNN generally has the following
attributes:
• Convolutional filters/kernels defined by a width and height (hyper-parameters).
• The number of input channels and output channels (hyper-parameters). The number of
input channels of a layer must equal the number of output channels (also called depth)
of its input.
• Additional hyper-parameters of the convolution operation, such as padding,
stride, and dilation.
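How these hyper-parameters determine the feature map shape can be sketched with the output-size formula commonly used by deep learning libraries, floor((size + 2 x padding - dilation x (kernel - 1) - 1) / stride) + 1. The function names below are hypothetical, and the formula is the usual convention rather than something stated in this report.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def conv_output_shape(input_shape, kernel_hw, out_channels,
                      stride=1, padding=0, dilation=1):
    """Map (inputs, height, width, channels) to the feature map shape."""
    n, height, width, _ = input_shape
    kernel_h, kernel_w = kernel_hw
    return (
        n,
        conv_output_size(height, kernel_h, stride, padding, dilation),
        conv_output_size(width, kernel_w, stride, padding, dilation),
        out_channels,
    )


# 8 RGB images of 100 x 100 pixels, 16 filters of size 5 x 5.
print(conv_output_shape((8, 100, 100, 3), (5, 5), 16))             # (8, 96, 96, 16)
print(conv_output_shape((8, 100, 100, 3), (5, 5), 16, padding=2))  # (8, 100, 100, 16)
```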
Convolutional layers convolve the input and pass its result to the next layer. This is
similar to the response of a neuron in the visual cortex to a specific stimulus. Each
convolutional neuron processes data only for its receptive field. Although fully
connected feedforward neural networks can be used to learn features and classify
data, this architecture is generally impractical for larger inputs such as high resolution
images. It would require a very high number of neurons, even in a shallow
architecture, due to the large input size of images, where each pixel is a relevant input
feature. For instance, a fully connected layer for a (small) image of size 100 x 100 has
10,000 weights for each neuron in the second layer. Instead, convolution reduces the
number of free parameters, allowing the network to be deeper. For example,
regardless of image size, using a 5 x 5 tiling region, each with the same shared
weights, requires only 25 learnable parameters. Using regularized weights over fewer
parameters avoids the vanishing gradients and exploding gradients problems seen
during backpropagation in traditional neural networks. Furthermore, convolutional
neural networks are ideal for data with a grid-like topology (such as images) as spatial
relations between separate features are taken into account during convolution and/or
pooling.
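The weight-sharing argument can be made concrete with a small sketch that slides one kernel over an image and compares parameter counts. As in most deep learning libraries, the kernel is applied without flipping (strictly a cross-correlation); the function names and the sample image are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h) for j in range(kernel_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def fully_connected_weights_per_neuron(height, width):
    """Every pixel is connected to the neuron by its own weight."""
    return height * width


def shared_kernel_weights(kernel_h, kernel_w):
    """One kernel is reused at every position, whatever the image size."""
    return kernel_h * kernel_w


print(fully_connected_weights_per_neuron(100, 100))  # 10000
print(shared_kernel_weights(5, 5))                   # 25
print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1], [1, 1]]))
# [[12, 16], [24, 28]]
```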
Pooling layer:
Convolutional networks may include local and/or global pooling layers along with
traditional convolutional layers. Pooling layers reduce the dimensions of data by
combining the outputs of neuron clusters at one layer into a single neuron in the next
layer. Local pooling combines small clusters; tiling sizes such as 2 x 2 are commonly
used. Global pooling acts on all the neurons of the feature map. There are two
common types of pooling in popular use: max and average. Max pooling uses the
maximum value of each local cluster of neurons in the feature map, while average
pooling takes the average value.
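A minimal sketch of local and global pooling on a single-channel feature map follows. It assumes non-overlapping tiles (stride equal to the tile size) and ignores any rows or columns left over at the border; the function names are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Local pooling over non-overlapping size x size tiles."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            tile = [feature_map[r + i][c + j]
                    for i in range(size) for j in range(size)]
            row.append(max(tile) if mode == "max" else sum(tile) / len(tile))
        pooled.append(row)
    return pooled


def global_pool(feature_map, mode="max"):
    """Global pooling reduces the whole feature map to one value."""
    values = [v for row in feature_map for v in row]
    return max(values) if mode == "max" else sum(values) / len(values)


fm = [[1, 3, 2, 4], [5, 7, 6, 8], [9, 11, 10, 12], [13, 15, 14, 16]]
print(pool2d(fm, mode="max"))      # [[7, 8], [15, 16]]
print(pool2d(fm, mode="average"))  # [[4.0, 5.0], [12.0, 13.0]]
print(global_pool(fm, mode="average"))  # 8.5
```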
Fully connected layer:
Fully connected layers connect every neuron in one layer to every neuron in another
layer. It is the same as a traditional multi-layer perceptron neural network (MLP). The
flattened matrix goes through a fully connected layer to classify the images.
Receptive field:
In neural networks, each neuron receives input from some number of locations in the
previous layer. In a convolutional layer, each neuron receives input from only a
restricted area of the previous layer called the neuron's receptive field. Typically, the
area is a square (e.g. 5 by 5 neurons). Whereas, in a fully connected layer, the
receptive field is the entire previous layer. Thus, in each convolutional layer, each
neuron takes input from a larger area in the input than previous layers. This is due to
applying the convolution over and over, which takes into account the value of a pixel,
as well as its surrounding pixels. When using dilated layers, the number of pixels in
the receptive field remains constant, but the field is more sparsely populated as its
dimensions grow when combining the effect of several layers.
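The growth of the receptive field through stacked layers can be sketched with the standard recurrence: each layer widens the field by (kernel - 1) x dilation x (product of the strides of the earlier layers). The helper below is hypothetical and describes each layer as a (kernel, stride, dilation) triple along one dimension.

```python
def receptive_field(layers):
    """Receptive field width, in input pixels, of a neuron after all layers.

    layers is a sequence of (kernel, stride, dilation) triples.
    """
    field, jump = 1, 1  # jump: distance in input pixels between adjacent neurons
    for kernel, stride, dilation in layers:
        field += (kernel - 1) * dilation * jump
        jump *= stride
    return field


print(receptive_field([(5, 1, 1)]))             # 5
print(receptive_field([(3, 1, 1), (3, 1, 1)]))  # 5
print(receptive_field([(3, 1, 2)]))             # 5
```

The last two lines show two ways to cover a width of 5: stacking two 3-wide layers, or one dilated 3-wide layer that samples only 3 positions across the same span.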
Weights:
Each neuron in a neural network computes an output value by applying a specific
function to the input values received from the receptive field in the previous layer.
The function that is applied to the input values is determined by a vector of weights
and a bias (typically real numbers). Learning consists of iteratively adjusting these
biases and weights.
The vector of weights and the bias are called filters and represent
particular features of the input (e.g., a particular shape). A distinguishing feature of
CNNs is that many neurons can share the same filter. This reduces the memory
footprint because a single bias and a single vector of weights are used across all
receptive fields that share that filter, as opposed to each receptive field having its own
bias and vector weighting.
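The idea that learning iteratively adjusts weights and bias can be illustrated with a single neuron trained by gradient descent. The sketch assumes a linear neuron (no activation function), a squared-error loss and a learning rate of 0.1; all of these are simplifications chosen for the example, and the names are hypothetical.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def gradient_step(inputs, target, weights, bias, learning_rate=0.1):
    """One update for the loss 0.5 * (output - target) ** 2."""
    error = neuron_output(inputs, weights, bias) - target
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


inputs, target = [1.0, 2.0], 1.0
weights, bias = [0.0, 0.0], 0.0
for _ in range(20):
    weights, bias = gradient_step(inputs, target, weights, bias)
print(round(neuron_output(inputs, weights, bias), 6))
```

In a convolutional layer the same weights and bias would be applied at every receptive field, so the updates from all positions accumulate on one shared filter.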
A specific kind of deep neural network is the convolutional network, which is
commonly referred to as a CNN or ConvNet. It is a deep, feed-forward artificial neural
network. Feed-forward neural networks are also called multi-layer perceptrons
(MLPs), which are the quintessential deep learning models. The models are called
"feed-forward" because information flows straight through the model: there are no
feedback connections in which outputs of the model are fed back into itself.
Convolutional neural networks have been one of the most influential innovations in
the field of computer vision. They have performed a lot better than traditional
computer vision and have produced state-of-the-art results. These neural networks
have proven to be successful in many different real-life case studies and applications,
like:
• Image classification, object detection, segmentation, face recognition;
• Self-driving cars that leverage CNN-based vision systems;
• Classification of crystal structure using a convolutional neural network;
• And many more.
Since the ImageNet competition, these networks have achieved better results
in image classification than other algorithms, which has made them the most widely
used approach for recognition and detection tasks, approaching
human performance. Convolutional Neural Networks (CNNs) are a type of
neural network that generalizes better than previous neural networks.
CNN is designed to process data in the form of multiple arrays, passing the input
between layers that extract the necessary features of the input and assign to those
features calculated weights (filters) that will decide which are more important.
Figure 3.3: Example of Convolutional Neural Network.
Figure 3.3 depicts a typical architecture of a CNN, where the input image is fed into
one layer and the output of that layer is then fed into the next ones. Each layer
has a specific purpose. The Convolutional Layer is responsible for extracting
features from the image (resulting in a feature map). It uses a set of filters to make
computations over the initial vector and extract features from it, feeding the result to
the next layer; this can be used, for example, to detect edges in images. The Fully
Connected Layer classifies the results based on the high-level features
extracted in previous layers, assigning a classification.
Convolutional neural networks are one of the main approaches to image
classification and image recognition with neural networks. Scene labeling, object
detection and face recognition are some of the areas where convolutional
neural networks are widely used.
The convolution layer is the first layer to extract features from an input image. By
learning image features from small squares of input data, the convolutional layer
preserves the spatial relationship between pixels. Convolution is a mathematical
operation that takes two inputs: an image matrix and a kernel (or filter).
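As a rough illustration of the operation described above, the following sketch slides a
small kernel over an image matrix and sums the element-wise products at each position.
The 4x4 image and the vertical-edge kernel are made-up values chosen for illustration,
and, as in most CNN libraries, the kernel is applied without flipping (strictly speaking,
cross-correlation). It is not the implementation used in this project.

```python
def convolve2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1) over nested lists."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(image_h - kernel_h + 1):
        row = []
        for c in range(image_w - kernel_w + 1):
            # Sum of element-wise products between the kernel and the patch under it.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)
```

The only non-zero responses in the feature map sit in the column where the brightness
changes, which is the sense in which a filter "detects" an edge.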
Using deep convolutional neural networks (CNNs) follows a pattern in the computer
vision community, as CNNs are increasingly shown to provide unparalleled performance
for other types of image classification. One of the earliest applications of CNNs was
LeNet-5. However, deeper architectures were infeasible at the time due to the
performance and cost of hardware. In recent years, fast and cheap compute has revived
interest in CNNs, showing that deep architectures are now both feasible and effective,
and later work has continued to increase the depth of such networks to achieve even
better performance. The authors of the referenced work leveraged these advances to
build a powerful network that showed state-of-the-art performance. They advocate a
relatively shallow network, however, in order to prevent over-fitting on the relatively
small dataset they were operating on. Deeper networks, although generally more
expressive, also have a greater tendency to fit noise in the data. So, while deeper
architectures show improved performance when trained on millions of images, shallower
architectures showed improvements for this use case.
Image recognition in CNN
CNNs are often used in image recognition systems. An error rate of 0.23% on the MNIST
database has been reported. Another paper on using CNNs for image classification
reported that the learning process was "surprisingly fast"; in the same paper, the best
published results as of 2011 were achieved on the MNIST database and the NORB database.
Subsequently, a similar CNN called AlexNet won the ImageNet Large Scale Visual
Recognition Challenge 2012.
When applied to facial recognition, CNNs achieved a large decrease in error rate.
Another paper reported a 97.6% recognition rate on "5,600 still images of more than
10 subjects". CNNs were used to assess video quality in an objective way after
manual training; the resulting system had a very low root mean square error.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object
classification and detection, with millions of images and hundreds of object classes. In
ILSVRC 2014, almost every highly ranked team used a CNN as its basic framework. The
winner, GoogLeNet (the foundation of DeepDream), increased the mean average precision of
object detection to 0.439329 and reduced classification error to 0.06656, the best
result at that time. Its network applied more than 30 layers. The performance of
convolutional neural networks on the ImageNet tests was close to that of humans. The
best algorithms still struggle with objects that are small or thin, such as a small ant
on the stem of a flower or a person holding a quill in their hand. They also have
trouble with images that have been distorted with filters, an increasingly common
phenomenon with modern digital cameras. By contrast, those kinds of images rarely
trouble humans. Humans, however, tend to have trouble with other issues. For example,
they are not good at classifying objects into fine-grained categories such as the
particular breed of dog or species of bird, whereas convolutional neural networks
handle this well.
One of the first applications of convolutional neural networks (CNNs) is perhaps the
LeNet-5 network for optical character recognition. Compared to modern deep CNNs, that
network was relatively modest due to the limited computational resources of the time
and the algorithmic challenges of training bigger networks. Though much potential lay
in deeper CNN architectures (networks with more neuron layers), only recently have they
become prevalent, following the dramatic increase in computational power (due to
Graphical Processing Units), the amount of training data readily available on the
Internet, and the development of more effective methods for training such complex
models. One recent and notable example is the use of deep CNNs for image classification
on the challenging ImageNet benchmark. Deep CNNs have additionally been applied
successfully to human pose estimation, face parsing, facial key point detection, speech
recognition and action classification. This report describes their application to the
tasks of age and gender classification from unconstrained photos.
A CNN for age and gender estimation
Gathering a large, labeled image training set for age and gender estimation from social
image repositories either requires access to personal information on the subjects
appearing in the images (their birth date and gender), which is often private, or is
tedious and time-consuming to label manually. Datasets for age and gender estimation
from real-world social images are therefore relatively limited in size and are
presently no match for the much larger image classification datasets (e.g. the ImageNet
dataset). Overfitting is a common problem when machine learning based methods are used
on such small image collections. This problem is exacerbated when considering deep
convolutional neural networks due to their huge numbers of model parameters. Care must
therefore be taken to avoid overfitting under such circumstances.
CNN is an effective deep learning architecture that takes multimedia input such as
images, videos or other 2D/3D data. Gender and age estimation from facial features
deals with image and video data, so much of the research on facial gender and age
estimation focuses on CNN models. A CNN model holds weights for every hidden neuron,
expressed mathematically as multi-dimensional matrices. As the image passes through
the hidden layers of the CNN, the dimensions of the weight matrices and of the data
change after every convolutional layer. CNN frameworks are built from a fixed
combination of layer types: convolutional layers, sub-sampling (pooling) layers, and
fully connected layers. Convolutional layers are the building blocks of a CNN and
perform the basic convolution operation. Sub-sampling layers help control over-fitting
by reducing the number of parameters and the size of the representation, typically
using the max operation (max pooling). Finally, each neuron of the fully connected
layer is connected to all activations of the previous layer. An additional ReLU layer
implements the non-linearity (rectification) in CNNs. The last layer is usually a
SoftMax layer, which assigns a probability to each output neuron. Researchers have
proposed several CNN architectures based on these basic concepts.
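The three auxiliary operations named above can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not the layers of any particular
framework: the feature map and the class scores are made-up values, pooling uses a fixed
2x2 window with stride 2, and the function names are hypothetical.

```python
import math


def relu(feature_map):
    """Rectification: negative activations are replaced by zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and stride 2 (halves each dimension)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical 4x4 feature map produced by a convolutional layer.
feature_map = [
    [1.0, -2.0, 3.0, 0.5],
    [-1.0, 4.0, -3.0, 2.0],
    [0.0, -5.0, 6.0, -1.0],
    [2.0, 1.0, -2.0, -4.0],
]
pooled = max_pool_2x2(relu(feature_map))

# Hypothetical raw scores for three output classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(pooled, probabilities)
```

Pooling reduces the 4x4 map to 2x2, which is how these layers shrink the representation
passed on to later layers.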
3.7 OPENCV GOOGLENET
OpenCV is an open-source library, available for Python, that is used for computer
vision in artificial intelligence, machine learning, face recognition, etc.
In OpenCV, CV is an abbreviation of computer vision, which is defined as a field of
study that helps computers understand the content of digital images such as
photographs and videos.
The purpose of computer vision is to understand the content of images. It extracts a
description from the pictures, which may be an object, a text description, a
three-dimensional model, and so on. For example, cars can be equipped with computer
vision so that they can identify different objects around the road, such as traffic
lights, pedestrians and traffic signs, and act accordingly.
Machines can be made to see by converting what they capture into numbers and storing
those numbers in memory. A computer converts an image into numbers using its pixel
values. A pixel is the smallest unit of a digital image or graphic that can be
displayed and represented on a digital display device.
The last network tested for face detection was OpenCV's GoogLeNet, which takes around
0.08 seconds to process one image, similar to the MTCNN approach. OpenCV is known for
offering multiple modules for visual recognition tasks, and this model is its first to
use Deep Learning to detect faces. The tests followed the same configuration as both
previous tests, and the results reached an 82.4% detection rate, which outperforms the
two previous models. Table 3 shows a summary of the accuracy of all three models. This
last network has two outputs. It gives a confidence value (from 0 to 1) representing
the probability that the detected region is in fact a real face, which indicates how
confident the network is in its prediction, and a set of coordinates for the detected
face, which can be used to crop the faces so that they can be passed to the age and
gender model.
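The way the two outputs are used together can be sketched as follows. The layout assumed
here (one tuple per detection holding a confidence and corner coordinates normalised to
the range 0 to 1) is an assumption for illustration and may differ from the actual
network output; the function names, the 0.5 threshold and the sample values are likewise
hypothetical.

```python
def select_face_boxes(detections, image_width, image_height, min_confidence=0.5):
    """Keep confident detections and convert them to pixel boxes."""
    boxes = []
    for confidence, x1, y1, x2, y2 in detections:
        if confidence < min_confidence:
            continue  # unlikely to be a real face
        # Scale normalised coordinates to pixels and clamp to the image borders.
        left = max(0, int(x1 * image_width))
        top = max(0, int(y1 * image_height))
        right = min(image_width, int(x2 * image_width))
        bottom = min(image_height, int(y2 * image_height))
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom, confidence))
    return boxes


def crop(image_rows, box):
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box[:4]
    return [row[left:right] for row in image_rows[top:bottom]]


# Hypothetical detector output: (confidence, x1, y1, x2, y2).
detections = [
    (0.97, 0.25, 0.25, 0.75, 0.75),
    (0.12, 0.00, 0.00, 0.10, 0.10),
]
# Hypothetical 8x8 image whose "pixels" record their own (row, column) position.
image = [[(r, c) for c in range(8)] for r in range(8)]

boxes = select_face_boxes(detections, image_width=8, image_height=8)
face = crop(image, boxes[0])
print(boxes, len(face), len(face[0]))
```

Only the first detection passes the threshold, and the resulting crop is what would be
handed to the age and gender model.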
3.8 FACE DETECTION
Intersection over union (IoU) is a popular evaluation metric used in object detection
problems to measure the accuracy of predictions using their bounding boxes, which are
rectangular boxes that mark the area of an object in an image. For this evaluation to
be possible, we need both the annotated bounding box and the bounding box predicted by
the detector. Once we have those two, the metric is the area of their overlap divided
by the area of their union. The result is a number that ranges from 0 to 1 and gives an
estimate of how accurate our prediction is.
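A minimal sketch of the metric follows, assuming boxes are given as (x1, y1, x2, y2)
corner coordinates; the function name and the two sample boxes are hypothetical.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the overlapping rectangle, if there is one.
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])
    # Width or height becomes zero when the boxes do not overlap.
    intersection = max(0, inter_x2 - inter_x1) * max(0, inter_y2 - inter_y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


annotated = (10, 10, 50, 50)   # hypothetical ground-truth box
predicted = (30, 30, 70, 70)   # hypothetical detector output
iou = intersection_over_union(annotated, predicted)
print(round(iou, 3))
```

Here the boxes share a 20x20 region out of a combined area of 2800 pixels, so the score
is low; identical boxes would score 1 and disjoint boxes 0.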
Analyzed Papers
A main challenge for a system that needs to detect faces in images is the visual
variation between images, such as pose and lighting, which requires efficient models
that can differentiate a face from other background objects. Haoxiang Li et al. address
those problems by using several Convolutional Neural Networks arranged in multiple
detection stages, each followed by a calibration stage. Their solution involves
multiple convolutional layers, pooling layers, normalization layers, and one fully
connected layer.
Normalization layers normalize the inputs of each layer, improving training time and
efficiency. This is needed because the distribution of a layer's input changes
whenever the parameters of the previous layers change, which slows down the training
of the model. As a result, Li et al. managed to outperform other models at the time on
two different datasets, namely the Face Detection Dataset and Benchmark (FDDB) and
Annotated Faces in the Wild (AFW).
Figure3.4 - Diagram of the face detection model.
3.9 GENDER AND AGE ESTIMATION
The problem of automatically extracting age-related attributes from facial images has
received increasing attention in recent years and many methods have been put forth.
Detailed surveys of such methods are available in the literature. We note that despite
our focus here on age group classification rather than precise age estimation (i.e.,
age regression), the survey below includes methods designed for either task. Early
methods for age
estimation are based on calculating ratios between different measurements of facial
features. Once facial features (e.g. eyes, nose, mouth, chin, etc.) are localized and
their sizes and distances measured, ratios between them are calculated and used for
classifying the face into different age categories according to hand-crafted rules.
Gender classification. Detailed surveys of gender classification methods are available
in the literature. Here we quickly survey relevant methods. One of the early methods
for gender classification used a neural network trained on a small set of near-frontal
face images. In other work, the combined structure of the head (obtained using a laser
scanner) and image intensities were used for classifying gender. Classifiers have also
been applied directly to image intensities, and AdaBoost has been used for the same
purpose, here again applied to image intensities. Finally, viewpoint-invariant age and
gender classification has been presented. More recently, the Weber's Local texture
Descriptor was used for gender recognition, demonstrating near-perfect performance on
the FERET benchmark. In another study, intensity, shape and texture features were used
with mutual information, again obtaining near-perfect results on the FERET benchmark.
More recently, a similar approach was used to model age progression in subjects under
18 years old. As those methods require accurate localization of facial features, a
challenging problem by itself, they are unsuitable for in-the-wild images which one may
expect to find on social platforms.
Gender and Age Estimation
The investigation of gender and age classification models led to the same outcome as
the investigation of face detection systems. Facial gender classification and age
estimation have many challenges. Gender recognition from face images is an important
task in computer vision, as many applications depend on a correct gender assessment.
Examples of these applications include visual surveillance, marketing, intelligent user
interfaces and demographic studies. The gender recognition problem is usually divided
into several steps, similarly to other classification problems: object detection,
preprocessing, feature extraction and classification. In the detection phase, the face
region is detected and cropped from the image. Then, a preprocessing technique is used
to reduce variations in scale and illumination. After this normalization, the feature
extraction step aims at obtaining representative and discriminative descriptors of the
face region. Finally, a binary classifier that learns the differences between male and
female representations is trained. Gender prediction is treated as a two-class problem,
male or female. Classifying gender is easy for a human but not for a machine. Many
gender classification methods and models have been built on additional information from
hairstyle, body shape, clothing and facial features. In age estimation, it is not yet
feasible to predict the exact age, so age grouping is still used for age prediction
from facial images. Moreover, the number of good datasets for age estimation and gender
classification is limited, which constrains intensive research. Even so, deep neural
networks have shown good performance in predicting age and gender from images. This was
validated by S. Lapuschkin et al., who achieved an accuracy of 92.6% on gender
prediction and 62.8% on exact age class prediction. For this, they used a pre-trained
model (VGG-16), using both ReLU and pooling layers in their CNN. Other work followed
the same approach with a pre-trained model and achieved similar results on gender
classification, using the Caffe model and Face Recognition pre-trained models.
Figure3.5 - Final model workflow
4.RESULTS AND DISCUSSION
4.1 AVAILABLE DATABASE
The proposed work uses 6 categories and more than 25 images. The images were mainly
collected from Google and captured in real time for age and gender recognition, and a
dataset was therefore created from these sources and from several other platforms.
Downloads or more information can be found. This database is attractive since at least
20 images of the same category are present, which is essential for good recognition at
a large scale.
4.2 RESULTING DATABASE
Finally, by mixing the three sources described in the previous section, the final
database is organized into age-based categories. The distribution is roughly as
follows: 60% from the real-time dataset and 30% from Google. Age and gender are treated
as separate sets of categories.
The age and gender categories used for testing are as follows:
1. New born child - Age 0-2
2. Child - Age 3-9
3. Teen - Age 10-15
4. Young - Age 16-24
5. Young Adult - Age 25-40
6. Adult - Age 41-59
7. Senior - Age 60-101
Gender classification on 2 categories
1. Male
2. Female
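The category lists above translate directly into a lookup table. The sketch below is
illustrative only: the constant and function names are hypothetical, and the boundaries
are copied from the list as written.

```python
# (label, lowest age, highest age), both bounds inclusive.
AGE_GROUPS = [
    ("New born child", 0, 2),
    ("Child", 3, 9),
    ("Teen", 10, 15),
    ("Young", 16, 24),
    ("Young Adult", 25, 40),
    ("Adult", 41, 59),
    ("Senior", 60, 101),
]
GENDERS = ("Male", "Female")


def age_group(age):
    """Return the category label for an age in whole years."""
    for label, lowest, highest in AGE_GROUPS:
        if lowest <= age <= highest:
            return label
    raise ValueError(f"age {age} is outside the supported range 0-101")


print(age_group(7), age_group(24), age_group(25))
```

Ages that sit next to a boundary, such as 24 and 25, fall into different groups even
though the faces may look alike, which is one reason predictions near class boundaries
are harder to get right.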
4.3 RESULTS
To test the efficiency of the system, one can collect additional pictures that are not
present in the database (for example, images that contain no human face) and see
whether the system recognizes them. But to obtain significant results, another set of
suitable test images would have to be found. So, a ground truth evaluation of the
database has been conducted instead.
It consists of going through all the images in the database and searching for the best
match for each one. If the matched image belongs to the same category as the image
under test, then it is a successful recognition. By doing this for the whole database,
the performance of the system can be evaluated by establishing the recognition rate.
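The evaluation loop described above can be sketched as follows. How the "best match" is
found is not specified in this report, so the sketch assumes each image is represented
by a feature vector and uses squared Euclidean distance; the sample vectors, labels and
function name are hypothetical.

```python
def recognition_rate(samples):
    """Fraction of images whose closest other image has the same category.

    samples is a list of (feature_vector, category) pairs.
    """
    hits = 0
    for i, (features, category) in enumerate(samples):
        best_category, best_distance = None, float("inf")
        for j, (other_features, other_category) in enumerate(samples):
            if i == j:
                continue  # an image must not be matched with itself
            distance = sum((a - b) ** 2 for a, b in zip(features, other_features))
            if distance < best_distance:
                best_category, best_distance = other_category, distance
        if best_category == category:
            hits += 1  # successful recognition
    return hits / len(samples)


# Hypothetical two-dimensional features for five database images.
samples = [
    ((0.0, 0.1), "Child"),
    ((0.1, 0.0), "Child"),
    ((1.0, 1.0), "Adult"),
    ((0.9, 1.1), "Adult"),
    ((0.2, 0.2), "Adult"),  # lies close to the "Child" images, so it is missed
]
rate = recognition_rate(samples)
print(rate)
```

Four of the five images find a best match in their own category, giving a recognition
rate of 0.8 for this toy database.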
4.4 SAMPLE SCREENSHOTS
In this section, various results of the project are discussed along with screenshots.
The conclusion of the project and possible future enhancements are also described.
Figure 4.1: User interface page
This page is the user interface used to run the program. It lets the user select an
image file from the device.
Figure 4.2: Output page
On this page, the image is displayed with the age and gender estimated from the
detected face.
Figure 4.3: Output page
On this page, the image is displayed with the estimated age and gender for the detected
face of a single person.
Figure 4.4: Output page
This is the output page. Here, the image is displayed with the estimated age and gender
for the detected faces of two persons.
Figure 4.5: Console page
On this page, both age and gender are printed to the console along with timings.
5.CONCLUSION
Convolutional Neural Networks (CNNs) have advanced rapidly over the last years. This
enabled us to create, using multiple models and frameworks, a system capable of
detecting faces and classifying them by age and gender. The main objective of this
research work was to create an efficient system able to detect faces in images and to
classify those faces by age and gender, and to evaluate wrong outputs in order to find
the underlying reasons for such failures. To fulfill this objective, various frameworks
capable of detecting faces in images and of classifying them into age and gender
classes were tested and validated in order to understand which would best fit our
problem.
For the age and gender classification, we have validated that such systems perform
poorly when their confidence threshold to accept a prediction as valid is low. From
our analysis, we can conclude that examples with a low confidence level have a higher
probability of being incorrect. This can be used to filter out examples that need
another kind of validation to guarantee that they are indeed correct. Adding a
human-in-the-loop validation system for the filtered samples could increase the overall
accuracy and provide more reliable statistics. The fact that makeup affects the
classification of the model, together with the existence of wrongly annotated samples
in the dataset, made it harder to confirm the actual impact that wrong predictions near
the class boundaries have on the overall accuracy of the system.
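The filtering step suggested above can be sketched as a simple split on the confidence
value. The threshold of 0.8, the record layout and the sample predictions are
assumptions made for illustration; they are not values reported in this work.

```python
def split_by_confidence(predictions, threshold=0.8):
    """Separate predictions that can be accepted from those needing review."""
    accepted, needs_review = [], []
    for prediction in predictions:
        if prediction["confidence"] >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)  # candidate for human-in-the-loop checks
    return accepted, needs_review


# Hypothetical model outputs.
predictions = [
    {"image": "a.jpg", "gender": "Female", "age_group": "Young", "confidence": 0.95},
    {"image": "b.jpg", "gender": "Male", "age_group": "Adult", "confidence": 0.55},
    {"image": "c.jpg", "gender": "Male", "age_group": "Senior", "confidence": 0.81},
]
accepted, needs_review = split_by_confidence(predictions)
print([p["image"] for p in accepted], [p["image"] for p in needs_review])
```

Raising the threshold sends more samples to review and should leave fewer errors among
the accepted ones, so the value is a trade-off between reviewing effort and reliability.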
Finally, the proposed system was created, and we achieved good results on age and
gender classification when compared to existing state-of-the-art models, while for
face detection, results exceeded the state-of-the-art models that were analyzed. Until
the presented system is actually deployed in a real environment, the performance
obtained with the dataset cannot be truly validated. Once deployed and field validated,
the system can potentially be installed not only in supermarkets but in any
commercial area that could benefit from customer statistics gathering, helping in the
management decision making.
Future Work
The dataset used in our tests has some limitations when it comes to wrongly annotated
images and a lack of diverse angles. The faces are mostly frontally aligned in the
current dataset, as opposed to what would be obtained in a supermarket environment,
where the camera would view the customers from a top-down angle. The final model
should be further tested in a real environment to see how it adapts to real data,
which was not available at the time this work was conducted.
In this work, an alternative model was implemented for gender classification only, as
it demonstrated better results when the initial model is not confident about the
gender of the person. Similarly, another model could be used for age classification in
cases where the initial model has low confidence, since, as shown in our tests, those
predictions tend to be wrong more frequently. As future work, such an alternative model
could be created and adopted in order to increase the overall accuracy.
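One way such a fallback could be wired up is sketched below. The two stub classifiers,
the 0.8 threshold and the rule of keeping whichever answer is more confident are
assumptions made for illustration, not a description of the models used in this work.

```python
def classify_with_fallback(image, primary, fallback, min_confidence=0.8):
    """Use the primary model unless it is unsure, then consult the fallback.

    Each model is a callable returning (label, confidence).
    """
    label, confidence = primary(image)
    if confidence >= min_confidence:
        return label, confidence, "primary"
    alt_label, alt_confidence = fallback(image)
    # Keep whichever of the two answers is more confident.
    if alt_confidence > confidence:
        return alt_label, alt_confidence, "fallback"
    return label, confidence, "primary"


# Hypothetical stand-ins for the initial and the alternative gender model.
def primary_stub(image):
    return ("Male", 0.55) if image == "blurry.jpg" else ("Female", 0.93)


def fallback_stub(image):
    return ("Female", 0.88)


result_clear = classify_with_fallback("clear.jpg", primary_stub, fallback_stub)
result_blurry = classify_with_fallback("blurry.jpg", primary_stub, fallback_stub)
print(result_clear, result_blurry)
```

The same wrapper would work for age classification by passing in age models instead of
gender models.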
6.BIBLIOGRAPHY
1. Bartlett, M.S.; Littlewort, G.; Fasel, I.; Movellan, J.R. Real Time Face
Detection and Facial Expression Recognition: Development and Applications
to Human Computer Interaction. In Proceedings of the Conference on
Computer Vision and Pattern Recognition Workshop, Madison, WI, USA, 16–
22 June 2003; Volume 5, pp. 53–53.
2. T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary
patterns: Application to face recognition.
3. Y. Fu, G. Guo, and T. S. Huang. Age synthesis and estimation via faces: A
survey. Trans. Pattern Anal. Mach. Intell., 32(11):1955–1976, 2010.
4. Shuo Yang, Ping Luo, Chen Change Loy & Xiaoou Tang, “From Facial Parts
Responses to Face Detection: A Deep Learning Approach”, 2015, In
Proceedings of the IEEE International Conference on Computer Vision p.
3676-3684. doi: 10.1109/ICCV.2015.419
5. Shixing Chen, Caojin Zhang and Ming Dong, Deep Age Estimation: From
Classification to Ranking, transactions on multimedia 2017.
6. M Uricár, R Timofte, R Rothe, J Matas and L Van Gool Structured output svm
prediction of apparent age, gender and smile from deep features, Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops 2016.
7. Tsun-Yi Yang, Yi-Hsuan Huang, Yen-Yu Lin, Pi-Cheng Hsiu and Yung-Yu
Chuang, SSR-Net: A Compact Soft Stagewise Regression Network for Age
Estimation, Proceedings of the Twenty-Seventh International Joint.
8. F. Dornaika, Arganda-Carreras and C. Belver, Age estimation in facial
images through transfer learning, Machine Vision and Applications 2018.
9. Gil Levi and Tal Hassner Age and Gender Classification using Convolutional
Neural Networks, Intelligence Advanced Research Projects Activity (IARPA)
2015.
10. Mane, S., Shah, G.: Facial recognition, expression recognition, and gender
identification. In: Data Management, Analytics and Innovation, pp. 275-290.
Springer, Singapore (2019).
11. Fang, J., et al.: Multi-stage learning for gender and age prediction.
Neurocomputing 334, 114-124 (2019).
12. Ito, k., et al: Age and gender prediction from face images using convolutional
neural network. In: 2018 Asia-Pacific Signal and information Processing
Association Annual Summit and Conference (APSIPA ASC), IEEE (2018).
13. Hosseini, S., et al.: Age and gender classification using Wide Convolutional
Neural Network and Gabor Filter. In: 2018 International Workshop on
Advanced Image Technology (IWAIT). IEEE (2018).
14. Ke, P., et al: A Novel Face Recognition Algorithm is based on the
combination of LBP and CNN In 2018 14 TH IEEE International Conference
On Signal Processing (ICSP) in 2018.
15. Lee, S.H., et al: Age and gender estimation using deep residual learning
network. In: 2018 International Workshop on Advanced Image Technology
(IWAIT). IEEE (2018).
16. Verma, S., Jariwala, K.N: Age and gender classification using histogram of
oriented gradients and back propagation neural network (2018).
17. Ait-Sahalia, Y., Xiu, D.: Principal component analysis of High-frequency
data.J. Am. Stat. Assoc. 114, 1-17 (2018).
18. Dabiri, Z., Lang, S.: Comparison of independent component analysis,
principal component analysis, minimum noise fraction transformation for tree
species classification using APEX hyperspectral imagery. ISPRS Int. J. Geo-Inf.
7(12), 488 (2018).
19. Sze, V., et al.: Efficient processing of deep neural network a tutorial and
survey. In, Proceedings of the IEEE105.12,pp.2295-2329(2017).
20. Dehghan, A., et al.: Dager: deep age, gender and emotion recognition using
convolutional neural network. arXiv preprint (2017).
21. Liu, W., et al: A survey of deep neural network architectures and their
application. Neurocomputing 234,11-26(2017).
22. Lapuschkin, S., et al.: Understanding and comparing deep neural network for
age and gender classification. In: Proceeding of IEEE International
Conference on Computer Vision (2017).
23. Masanet, J., Albiol, A., Paredes, R.: Local deep neural network for gender
recognition. Pattern Recogn. Lett. 70, 80-86 (2016).
24. Zhang, K., et al: Gender and smile classification using deep convolutional
neural networks. In: Proceeding of the IEEE Conference on Computer Vision
and Pattern Recognition Workshops (2016).
25. Santarcangelo, V., Farinella, G.M., Battiato, S.: Gender recognition: methods,
datasets and results. In: 2015 IEEE International Conference on Multimedia &
Expo Workshops (ICMEW). IEEE (2015).
26. Rothe, R., Timofte, R., Van Gool, L.: Dex: deep expectation of apparent age
from a single image. In: Proceedings of the IEEE Conference on Computer
Vision Workshops (2015).
27. Levi, G., Hassner, T: Age and gender classification using convolutional neural
networks. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops (2015).
28. Van de Wolfshaar, J., Karaaba, M.F., Wiering., M. A: Deep convolutional
neural networks and support vector machines for gender recognition. In: 2015
IEEE Symposium Series on Computational Intelligence. IEEE (2015).
29. Kalansuriya, T.R., Dharmaratne, A. T: Neural network-based age and gender
classification for facial images. ICTer 7(2).1-10(2014).
30. Sang, D.V., Cuong, L.T.B., Van Thieu, V: Multi-task learning for smile
detection, emotion recognition and gender classification. In: Proceedings of
the Eight International Symposium on Information and Communication
Technology. ACM (2014).
31. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary
patterns: Application to face recognition.