
ÉCOLE NATIONALE SUPÉRIEURE DE TECHNIQUES AVANCÉES

MI204 - Reconnaissance d’Images

TP3: Image Classification with CNNs

Diogo Santos Gimenez


Eduardo Scardellato e Silva

Palaiseau, France 2024


Table of contents

1 Introduction
2 Structuring the data
3 Network architecture
4 Training
5 Hyperparameters
6 Deepening the model
7 Overfitting
  7.1 Learning rate
  7.2 Optimization algorithm
8 Activation maps
9 Bibliography

1 Introduction
In this lab (TP) we experiment with convolutional neural networks (CNNs) for image classification. For this we use the Keras API, which makes it very simple to build neural architectures, train them and test them. To be able to use our own machines with their limited computing power (notably the absence of a GPU), we work on a suitably reduced number of small images from the CIFAR-10 dataset, and with small networks that are only partially representative of the capabilities of deep learning. The main objective is to understand the structure of convolutional networks, the dynamics of training, and the influence of the different parameters on a model's performance.

2 Structuring the data


In image recognition, as in many other AI applications, the dataset is divided into three parts: training, validation and testing. This structured approach aims to ensure the robustness of the model and of its results, allowing it to generalize properly to new data. We now describe each of these sets (a short loading/splitting sketch follows the list):
— Training set: it is the foundation of the model's learning process. It must have sufficient quality and size, usually between 60% and 80% of the dataset, since it directly influences the model's ability to learn the patterns within the data. Besides that, the model must not rely on it alone, because this can lead to overfitting, where the model learns this set too well, impairing its performance on new data.
— Validation set: it is mainly used to mitigate the occurrence of overfitting. It is employed during the training phase to fine-tune the parameters of the model and to provide an unbiased evaluation of it. Through it we can assess the model's learning progress, and we can use the model's performance on the validation set to tweak our model. It usually represents 10% to 20% of the dataset.
— Test set: it is used as the final assessment, ensuring that the model's predictions are not merely reflective of the patterns seen in the training and validation sets, by exposing the model to data it has never seen and thus offering an unbiased evaluation of its performance.
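As an illustration of this split, here is a minimal sketch of how CIFAR-10 can be loaded with Keras and a validation set carved out of the training images; the exact proportions and variable names in our notebook may differ.

    from tensorflow.keras.datasets import cifar10

    # CIFAR-10 ships with a predefined train/test split (50,000 / 10,000 images).
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    # Reserve the last 20% of the training images as a validation set.
    n_val = int(0.2 * len(x_train))
    x_val, y_val = x_train[-n_val:], y_train[-n_val:]
    x_train, y_train = x_train[:-n_val], y_train[:-n_val]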
So it’s clear that this approach not only facilitates the effective training of models
but also ensures their reliability and generalizability to new data. It strikes a balance
between learning from the data and avoiding overfitting, thereby maximizing the model’s
performance in practical applications.
Another important aspect of the data fed to the model is normalization/standardization. It helps to increase the efficiency of machine learning models, particularly CNNs. By standardizing the range of pixel values across images,
normalization ensures that the model does not become biased towards particular scales or
magnitudes of input data. This standardization facilitates faster and more stable conver-
gence during training, as it prevents the gradients from becoming too large or too small,
thereby enhancing the training speed and overall performance. Moreover, normalization
makes the model more robust and generalizable to variations in lighting, contrast, and
other photographic characteristics of images.
In our notebook we used the function 'standardize' to normalize the CIFAR-10 images. It follows these steps (a minimal sketch is given after the list):
1. Mean Calculation (img_data_mean) : The mean of each color channel is calculated
across the entire image (height and width), effectively reducing each image to a mean
per channel. This helps to center the data around zero.
2. Standard Deviation Calculation (img_data_std) : The standard deviation is also
calculated for each color channel of each image. This measures the spread of pixel
values relative to the mean.
3. Normalization : Each pixel in the image is normalized by subtracting the mean
and dividing by the standard deviation. This operation is applied individually to
each color channel of each image.
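A minimal sketch of such a per-image standardization, assuming the images are stored as an array of shape (n_images, height, width, channels); the exact 'standardize' helper of our notebook may differ in detail:

    import numpy as np

    def standardize(img_data):
        # Per-image, per-channel mean and standard deviation over height and width.
        img_data = img_data.astype(np.float32)
        mean = img_data.mean(axis=(1, 2), keepdims=True)
        std = img_data.std(axis=(1, 2), keepdims=True)
        # Subtract the mean and divide by the standard deviation, channel by channel.
        return (img_data - mean) / (std + 1e-7)  # small epsilon avoids division by zero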
Normalizing inputs is a crucial step in pre-processing that significantly benefits the
learning process in several ways. Firstly, it can accelerate the convergence of the learning
algorithm by standardizing features to be on a comparable scale. This adjustment not only
speeds up the learning process but also ensures a smoother and more efficient path to
achieving optimal performance. Secondly, normalization aids in reducing initial bias within
the network. It addresses the issue where certain weights may disproportionately influence
the learning outcome, either by being too large or too small relative to others, thereby
preventing inefficient learning patterns. Furthermore, by bringing data into a standardized
format, normalization enhances the overall performance and stability of the model, parti-
cularly when dealing with unseen data. It minimizes the risk that the model will overfit
to the scale or distribution of the training data, thereby making predictions more reliable
and robust across different datasets. Lastly, in the context of convolutional neural networks
(CNNs) and other deep learning frameworks, normalized input is essential for preventing
neuron saturation and ensuring that gradients are kept within a manageable range during
the backpropagation process. This is vital for the effective training of deep neural net-
works, as it helps in maintaining the gradient flow across many layers without diminishing
or exploding, thus facilitating deeper and more complex learning architectures.

3 Network architecture
The architecture of our neural network is defined using a Sequential model in Keras, with the following layers (a Keras sketch is given after the list):
— Input Layer :
— Shape : (32, 32, 3) - This defines the input shape as images of 32 × 32 pixels with 3 color channels (RGB).
— Convolutional Layer (Conv2D) :
— After a Conv2D layer with padding=’same’, the height and width remain the
same if the stride is 1. The depth becomes the number of filters used in that
layer.
— Filters : 8 - Specifies the number of filters used in the convolution operation.
— Kernel Size : (3, 3) - The size of the filter that will be used to convolve around
the input.
— Activation : ReLU (Rectified Linear Unit) - Introduces non-linearity to the mo-
del, allowing it to learn more complex patterns.
— Padding : ’same’ - Ensures that the output of the convolution operation has the
same spatial dimensions as the input.
— Regularization : L2 with a coefficient of 0.00 - Implies that, in practice, there is
no regularization effect.
— Dropout Layer :
— Rate : 0.0 - Indicates that during training, no units are dropped out as the rate
is zero.
— Max Pooling Layer (MaxPool2D) :
— Reduces the height and width by the pool size (for example, a 2 × 2 pool will halve the height and width).
— Pool Size : (2, 2) - Determines the window size for the pooling operation, which is 2 × 2 pixels in this case.
— Flattening Layer (Flatten) :
— converts the entire 3D feature map into a 1D vector, so its size is the product
of the height, width, and depth of the previous layer’s output.
— This layer serves to flatten the feature maps into a single dimension.
— Dense Layers :
— Dense layers do not have height and width; they are fully connected layers where the 'depth' is the number of neurons in the layer.
— First Dense Layer : 64 units with ReLU activation and L2 regularization coeffi-
cient of 0.00.
— Second Dense Layer : 10 units with softmax activation for multiclass classifica-
tion and L2 regularization coefficient of 0.00.
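A minimal Keras sketch of this architecture, using the values listed above (with a regularization coefficient of 0.00 and a dropout rate of 0.0, these two mechanisms have no practical effect here):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, Dropout, MaxPool2D, Flatten, Dense
    from tensorflow.keras.regularizers import l2

    model = Sequential([
        Conv2D(8, (3, 3), activation='relu', padding='same',
               kernel_regularizer=l2(0.00), input_shape=(32, 32, 3)),
        Dropout(0.0),
        MaxPool2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu', kernel_regularizer=l2(0.00)),
        Dense(10, activation='softmax', kernel_regularizer=l2(0.00)),
    ])
    model.summary()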
Besides that, it is important to note the trainable parameters of our model, i.e. the parameters of the network that are adjusted by learning during training; this includes the weights and biases of the network's layers. In our network, two types of layers have trainable parameters: the Dense layers and the Convolutional layers. Their counts are calculated as follows (a numerical check on our model is given just after the list):

— Convolutional Layers : ((m × n × d) + 1) × k, which can be written as: ((width of the filter × height of the filter × number of filters in the previous layer) + 1) × number of filters.
— Dense Layers : (c × p) + (1 × c), which can be written as: (current layer neurons × previous layer neurons) + (1 × current layer neurons).
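Applied to the architecture of Section 3, these formulas give the following counts (which is also what model.summary() reports):

    Conv2D (8 filters of size 3 × 3 on 3 input channels): ((3 × 3 × 3) + 1) × 8 = 224
    Dense with 64 units (fed by the flattened 16 × 16 × 8 = 2048 values after pooling): (2048 × 64) + 64 = 131,136
    Dense with 10 units: (64 × 10) + 10 = 650
    Total trainable parameters: 224 + 131,136 + 650 = 132,010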
In our case the output layer uses a softmax activation function, which encodes the class probabilities. The softmax function converts the output neurons' raw values into probabilities that sum to one, each neuron representing the probability that the input belongs to a particular class. The number of neurons in the output layer therefore corresponds to the number of classes in the classification task: for a problem with 10 classes, the output layer has 10 neurons, each representing the probability of one class.
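For reference, the standard definition of the softmax applied to the raw outputs (logits) z_1, ..., z_K of the last layer is

    softmax(z_i) = exp(z_i) / (exp(z_1) + ... + exp(z_K)),  for i = 1, ..., K,

so every output lies between 0 and 1 and the K outputs sum to one.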

4 Training
In the training part of our task, three notions are fundamental to the process:
— Epoch : A full pass through the entire training dataset; many epochs are usually required for good convergence.
— Step : One pass through a batch of the training dataset, contributing to one epoch.
— Batch : A subset of the training data for one update of the model’s parameters.
We now want to analyse how changing the batch size affects the learning process. In the beginning we used a batch size of 32, and we also tried 16 and 64. When we increased the batch size, we found that it sped up the computation of a single step, but we did not really notice an influence on the time it took to compute an epoch; when we decreased it, we observed roughly the opposite. Regarding the learning curves, increasing the batch size makes the model struggle to generalize, as seen from the more significant fluctuations and the poorer validation performance. The larger batch size could be causing the optimization process to settle in less favourable regions of the weight space. With a smaller batch size the model seems to learn better, as indicated by the smoother curves and the better validation performance. This might be due to the more frequent updates allowing the model to navigate the optimization landscape more efficiently.
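In code, the experiment amounts to changing the batch_size argument of fit(). A minimal sketch follows, where build_model() is a hypothetical helper that re-creates the network of Section 3 so that each run starts from freshly initialized weights, and the loss shown assumes integer class labels:

    histories = {}
    for batch_size in (16, 32, 64):
        model = build_model()  # hypothetical helper: rebuilds the small CNN of Section 3
        model.compile(optimizer='sgd',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        # batch_size controls how many images contribute to each gradient step.
        histories[batch_size] = model.fit(x_train, y_train,
                                          validation_data=(x_val, y_val),
                                          batch_size=batch_size, epochs=20)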

Figure 1 – Learning curves with batchsize = 16

Figure 2 – Learning curves with batchsize = 32

Figure 3 – Learning curves with batchsize = 64

Figure 4 – Visualization of how the batch size influences the fitting

Another important aspect of the training of our model is the optimization function, which is at the center of the learning process. The choice of an optimization function affects the speed and quality of learning. Simple methods like Stochastic Gradient Descent (SGD) provide a foundational approach, introducing randomness in parameter
updates that can help in finding global minima but may require careful tuning and longer
training times.
Enhancements like SGD with Momentum accelerate learning by adding a fraction of
the previous update to the current one, which propels the parameters towards the optimum
more steadily and helps avoid getting stuck in local minima.
Advanced optimizers like Adam adapt the learning rate for each parameter based on
estimates of first and second moments of the gradients. This adaptability makes Adam
exceptionally well-suited for high-dimensional parameter spaces in CNNs, where it consis-
tently achieves good performance with minimal parameter tuning.
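A sketch of the three optimizers discussed, as they would be instantiated in Keras (the learning-rate and momentum values below are illustrative, not the ones of our notebook):

    from tensorflow.keras.optimizers import SGD, Adam

    sgd_plain    = SGD(learning_rate=0.01)                # plain stochastic gradient descent
    sgd_momentum = SGD(learning_rate=0.01, momentum=0.9)  # adds a fraction of the previous update
    adam         = Adam(learning_rate=0.001)              # per-parameter adaptive learning rates

    # Any of these objects can then be passed to model.compile(optimizer=...).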

5 Hyperparameters
Convolutional Neural Networks (CNNs) rely heavily on hyperparameters, which are preset configurations that shape the network's structure and learning dynamics. The following ones are present in our notebook (a sketch showing where the training-related ones enter the code is given after the list):
— Number of Filters in Conv2D : filters=8 influences the complexity and com-
putational load of the model.
— Kernel Size in Conv2D : kernel_size=(3, 3) affects the granularity of features extracted.
— Activation Functions : ’relu’ and ’softmax’ govern the non-linear transforma-
tions applied to inputs.
— Padding in Conv2D : ’same’ ensures the output tensor’s spatial dimensions are
preserved.
— Regularization : l2(0.00) applies regularization to mitigate overfitting, though it’s
set to have no effect in this model.
— Dropout Rate : Dropout(0.0) is meant to prevent overfitting by randomly deacti-
vating input units, though it’s not active in this case.
— Size of Dense Layer : Dense(64, ...) determines the complexity of feature com-
binations before the output layer.
— Batch Size : batch_size=64 and batch_size=8 influence the gradient estimation and speed of updates.
— Number of Epochs : epochs=20 and epochs=10 affect how well the model learns
from the entire dataset.
— Learning Rate of SGD : learning_rate=0.01 is crucial for the convergence speed and precision of the model.
— Momentum in SGD : momentum=0.0 is intended to accelerate convergence, though
it’s not utilized in this setup.

6 Deepening the model
Looking at the initial model, we can see that it was defined with simplicity and usability in mind, that is, to run as fast as possible. This means that the results of the model are not as good, converging to a precision of 41.80% on the test images, as shown in Figure 5.

Figure 5 – Result of the initial model

We can also see it’s normalised confusion matrix and it’s learning curves, as it follows :

Figure 6 – Learning curves for the original model

Figure 7 – Normalised Confusion Matrix for the original model

From the learning curves, we see that the training loss is gradually decreasing, which is
a good sign of the model learning from the training dataset. However, the validation loss
is erratic and tends to increase, indicating that the model may be overfitting; this is further suggested by the training accuracy, which improves over epochs while the validation accuracy remains low and unstable.
The confusion matrix, which shows normalized values indicating the proportion of pre-
dictions for each class, reveals several insights. No class is being predicted with very high
accuracy ; most classes have precision well below 50%. The model is particularly struggling
with certain classes, confusing them with others, such as ’ship’ being frequently misclas-
sified as ’plane’ and ’truck’, and ’cat’ being misclassified as ’dog’. The diagonal elements,
which represent correct predictions, are not dominant, as they ideally would be. This
confusion among classes indicates that the model is not learning distinctive features for
each class effectively.
Overall, the original model is currently not performing optimally and would benefit from
strategies aimed at reducing overfitting and improving its ability to distinguish between
classes.
With this in mind, we set out to design our own model. We based ourselves on the available literature about CNNs and image recognition and tried to implement it in the notebook that was given to us, but we ran into many compatibility issues. So we decided to create another notebook called ”Our model”. In the next part we describe what we added to it and how it helps us achieve better results:
1. Data Usage :
— Our model : We used more images from the database.
— Improvement : With this we expected to improve the model’s learning by
providing it with more data.
2. Model Architecture :
— Our model : We implemented a more complex and deeper CNN architecture,
with multiple convolutional layers interspersed with BatchNormalization and
Dropout layers. This allows for capturing more sophisticated features from the
data.
— Improvement : The added complexity is designed to better handle variability
in the image, which can lead to higher classification accuracy.
3. Regularization :
— Our model : We applied L2 regularization to all convolutional layers, in addi-
tion to strategically using Dropout to prevent overfitting. This helps keep the
model generalizable to unseen data.
— Improvement : More rigorous regularization and the use of Dropout helped to
build a model that not only learns well from training data but also maintains
its performance on test data.
4. Data Preprocessing :
— Our model : We utilized Z-score normalization, which standardizes the input
data to have a mean of zero and standard deviation of one. In addition, we
employed data augmentation techniques through ImageDataGenerator.
— Improvement : Z-score normalization and data augmentation enhance the mo-
del’s ability to handle variations in input data, resulting in better generalization
and robustness.
5. Optimization :
— Our model : We adopted an adaptive scheme for the learning rate (PiecewiseConstantDecay), allowing the learning rate to be fine-tuned at different stages of training (a sketch of this and of the data augmentation is given after this list).
— Improvement : This enables the model to better adapt over the course of
training, potentially achieving better results in fewer epochs, and adjusting to
changes in the gradient.
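A minimal sketch of two of these ingredients, data augmentation with ImageDataGenerator and a piecewise-constant learning-rate schedule; the augmentation parameters, schedule boundaries, momentum and epoch count below are illustrative, not necessarily the exact values of our notebook:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.optimizers.schedules import PiecewiseConstantDecay

    # Random shifts and horizontal flips produce new variants of the training images.
    datagen = ImageDataGenerator(width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 horizontal_flip=True)

    # Learning rate drops from 0.01 to 0.001 after 30,000 optimizer steps.
    schedule = PiecewiseConstantDecay(boundaries=[30000], values=[0.01, 0.001])

    model.compile(optimizer=SGD(learning_rate=schedule, momentum=0.9),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(datagen.flow(x_train, y_train, batch_size=64),
              validation_data=(x_val, y_val), epochs=50)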
Our proposed model gives us the following results :

Figure 8 – Result of our proposed model

Figure 9 – Learning curves for our proposed model

Figure 10 – Normalised Confusion Matrix for our proposed model

In the learning curves, the training and validation loss both show a decreasing trend, with the validation loss following the training loss closely, which is indicative
of good generalization. The training and validation accuracy curves are both rising, with
the validation accuracy closely tracking the training accuracy. This suggests the model is
learning effectively and not overfitting as significantly as before.
The normalized confusion matrix shows high values along the diagonal for all classes,
indicating a high true positive rate for each category. Classes like ’ship’, ’truck’, and ’frog’
have particularly high values, showing that the model predicts these with high accuracy.
While there are some off-diagonal elements, which indicate misclassifications, they are
relatively low, suggesting that the model is reasonably good at distinguishing between
the different classes.
Finally, the precision table confirms the confusion matrix findings, with high precision
scores across all classes. The overall accuracy of the model on the test set is 84.04%, which
is quite strong for a 10-class classification problem.
Overall, these images show that the model has been well-tuned and is performing well
across all classes, with good generalization from training to validation datasets.

7 Overfitting
To explore the occurrence of overfitting, we decided to modify the original model given to us. To be specific, we increased the learning rate (to 0.1), chose a simpler optimization algorithm (plain SGD), and increased the number of epochs (to 100), all with the intention of causing overfitting in our model.
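A minimal sketch of the modified training setup (only the compile and fit calls change; the batch size and loss shown are assumptions, the rest of the original model is kept):

    from tensorflow.keras.optimizers import SGD

    # Deliberately aggressive settings meant to provoke overfitting.
    model.compile(optimizer=SGD(learning_rate=0.1),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=32, epochs=100)

This setup gave us the following results: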

Figure 11 – Result of the overfitted model

Figure 12 – Learning curve of the overfitted model

The learning curves reveal some potential reasons for this poor performance. The trai-
ning loss decreases initially but then becomes erratic, while the validation loss is highly
unstable and generally increases over time. The training accuracy shows improvement but
plateaus at a level that is not particularly high, and the validation accuracy is both low
and volatile. This, however, does not necessarily indicate overfitting, which is typically cha-
racterized by a low training loss with a high validation loss, and a high training accuracy
with a low validation accuracy, with the gaps between the two growing over time. As it is
shown in the next figure :

Figure 13 – Overfitted learning curve

However, we also see the model performing really well on the training data and poorly on the validation data, as shown below:

Figure 14 – Training results of the overfitted model

So we are going to assume that the model is overfitted, and we are going to analyse how each of the parameters that we changed from the original model affects our results.

7.1 Learning rate


Here we only made the learning rate smaller, from 0.1 to 0.01. These were our results :

Figure 15 – Results when we changed the learning rate

Figure 16 – Learning curve when we changed the learning rate

The learning curve shows a model that seems to be learning effectively at first, as
indicated by the rapid decrease in training loss. However, the validation loss soon surpasses
the training loss and continues to rise for the remainder of the training, which is a clear
sign of overfitting. Even tough the accuracy has increased.

7.2 Optimization algorithm
Here we used ADAM (lr=0.01), which gave us the following results :

Figure 17 – Results when we used ADAM

Figure 18 – Learning curve when we used ADAM

The learning curve depicted in the image shows both training and validation loss oscillating significantly throughout the training process. The training and validation accuracies are extremely low, barely above 10%, and they also display a lot of volatility. This behavior suggests that the model is not learning effectively; basically it is only as good as random chance. Another curious thing is that it classifies everything as a frog. This can be caused by the high learning rate, which makes the weight updates too large, or the high number of epochs may be contributing together with the high learning rate. A similar case happens when we only reduce the number of epochs.

8 Activation maps
The activation maps show brighter areas where the filters of the first convolutional layer respond more strongly to features in the input images. These
could be edges, textures, or patterns that the layer has learned to recognize. This kind of
visualization helps understand what the neural network is paying attention to in the early
stages of feature extraction.
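A minimal sketch of how such first-layer activation maps can be extracted with Keras (the layer selection and variable names are ours, not necessarily those of the notebook):

    from tensorflow.keras import Model

    # Sub-model that outputs the feature maps of the first convolutional layer.
    first_conv = next(layer for layer in model.layers if 'conv' in layer.name)
    activation_model = Model(inputs=model.input, outputs=first_conv.output)

    # Shape (n_images, height, width, n_filters): each channel is one activation map.
    activation_maps = activation_model.predict(x_test[:1])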
Now we'll look at the activation maps from our proposed model:

Figure 19 – Activation map from our proposed model

We can clearly see that the first layer looks at edges in the images, as if trying to figure
out the contours of the elements. We expect that as we go into deeper layers, they will
start ”looking” for more complex features like eyes, legs and wheels.
For comparison, we'll look at the overfitted model's activation map.

Figure 20 – Activation map from the overfitted model

We can clearly see that it is worse at detecting edges and outlining, which reflects the
poor performance of this model.

9 Bibliography
— Keras documentation: Keras 3 API documentation. (n.d.). Keras: Deep Learning for humans. https://keras.io/api/
— Sengupta, J. (2023, May 21). How to decide the hyperparameters in CNN. Medium. https://medium.com/@sengupta.joy4u/how-to-decide-the-hyperparameters-in-cnn-bfa37b608046
— Vasudev, R. (2019, February 11). Understanding and Calculating the number of Parameters in Convolution Neural Networks (CNNs). Medium. https://towardsdatascience.com/understanding-and-calculating-the-number-of-parameters-in-convolution-neural-networks-cnns-fc88790d530d
— Convolutional Neural Network (CNN). TensorFlow Core. (n.d.). https://www.tensorflow.org/tutorials/images/cnn
