
Horizontal Max Pooling: a novel approach for noise reduction in Max Pooling for better feature detection in DNN
Yash More,
M. Tech.- Computer Science and Engineering,
D. Y. Patil International University(DYPIU),
Pune, India.
Email : yashsanjaymore2@gmail.com
ORCID : 0000-0001-8053-2506

Kunal Dumbre,
M. Tech.- Computer Science and Engineering,
D. Y. Patil International University(DYPIU),
Pune, India.
Email : arjun.dumbre23@gmail.com

Abstract: Convolutional Neural Networks (CNNs), a class of Deep Neural Networks (DNNs), have
become a main focus for better and faster feature detection. More and more deep neural
networks have been proposed to improve the efficiency and accuracy of models, yet all of these
models rest on the same building blocks: convolution, subsampling, pooling, activation
functions and so on. Many researchers have shown interest in using different activation
functions such as ReLU, leaky ReLU and, more recently, softmax, but they all still use a
similar pooling approach, "Max Pooling". Max pooling detects features more prominently, but it
also highlights some unwanted features, referred to as noise. In this paper we introduce a new
pooling approach, "Horizontal Max Pooling", to reduce this noise for better feature detection
and extraction. Although this paper contains a single image example, we have tested this
approach on many different images and matrices; each test resulted in noise reduction and a
visible, noticeable change.

Keywords: Horizontal Max Pooling, HMax Pooling, Max Pooling, Convolutional Neural Network, Deep Neural
Network, CNN, DNN.

1. Introduction :

A CNN (Convolutional Neural Network) model is a type of DNN (Deep Neural Network) model
that allows us to extract higher-level representations of the image content. Unlike classical
image recognition, where you define the image features yourself, a CNN takes the image's raw
pixel data, trains the model, and then extracts the features automatically for better
classification [1][2].

The reference paper taken for this study was “ImageNet Classification with Deep
Convolutional Neural Networks” by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton
from the University of Toronto. It is based on the AlexNet CNN model using the ImageNet data
set, in which the researchers tried to improve on the performance of the already existing
networks [1].

In a CNN model we try to detect particular features in the input image and produce
feature maps (convolved features) [2]. These convolved features change depending on the
model, the filter and the computations we perform on the input data, which is nothing but the
image's raw pixel data converted into a multidimensional matrix. We convolve (slide a small
window, referred to as a kernel or a filter) through this matrix and perform the required
computations.
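The sliding-window operation described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `convolve2d_valid`, the 4x4 input and the 2x2 kernel are made up for the example, and, as is usual in CNN code, the kernel is not flipped (strictly speaking this is cross-correlation).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]      # the part of the image under the kernel
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Hypothetical input matrix and kernel, chosen only for illustration.
image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2],
                  [0, 1, 2, 3]], dtype=float)
kernel = np.array([[1, 0],
                   [0, -1]], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)   # 3x3 feature map; the top-left entry is 1*1 + 5*(-1) = -4
```

A 4x4 input and a 2x2 kernel give a 3x3 feature map, which is the kind of matrix the pooling layers discussed next operate on.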

While performing these computations we need a pooling technique, which reduces the time needed
to process the data by directly reducing the size of the feature maps, and with it the amount of
computation performed in the network, without losing the features that are important for the
CNN model to learn.

The rest of the paper is organized as follows: Section 2 reviews different pooling techniques.
Section 3 identifies the problem with the current state-of-the-art "Max Pooling" technique.
Section 4 presents the proposed "Horizontal Max Pooling" technique for noise reduction and
better feature extraction. Section 5 contains the conclusion and Section 6 lists the
references used for the study.

2. Pooling Layer :

2.1. Pooling Layer :

A common CNN model architecture is to have a number of convolution and pooling layers
stacked one after the other [3].

2.2. Why Use Pooling Layers :

Pooling layers are used to reduce the dimensions of the feature maps. This reduces the
number of parameters to learn and the amount of computation performed in the network [3].

The pooling layer summarizes the features present in a region of the feature map generated by a
convolution layer. So, further operations are performed on summarized features instead of
precisely positioned features generated by the convolution layer. This makes the model more
robust to variations in the position of the features in the input image[3].

2.3. Types of Pooling Layers :

2.3.a. Min Pooling :

Min pooling is a pooling operation that selects the minimum element from the region of the
feature map covered by the filter. It is mostly used when the image has a light background since
min pooling will select darker pixels[5].

Min pooling only takes the minimum value from the kernel window, so if we apply it to an
image matrix we can get a blurry image, making it harder to recognise the lines that separate
objects.

The filter size can be user defined, and results may vary if we use a different filter.

2.3.b. Average Pooling :

Average Pooling is a pooling operation that calculates the average value from the region of the
feature map covered by the filter. It is usually used after a convolutional layer. It adds a small
amount of translation invariance - meaning translating the image by a small amount does not
significantly affect the values of most pooled outputs. It extracts features more smoothly than
max pooling [4].

The final output of average pooling on an image matrix is a smoothed image, which makes the
features less distinguishable from one another than with max pooling.

2.3.c. Max Pooling :

Max pooling is a pooling operation that selects the maximum element from the region of the
feature map covered by the filter. Thus, the output of a max-pooling layer is a feature
map containing the most prominent features of the previous feature map [3].

Max pooling extracts more pronounced features like edges as compared to min and average
pooling techniques. Max Pooling is the most commonly used pooling technique.

2.3.d. Global Pooling :

Instead of down sampling patches of the input feature map, global pooling down samples the
entire feature map to a single value. This would be the same as setting the "filter size" to the size
of the input feature map(input image matrix).
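As a rough illustration of the four pooling types above, the sketch below applies min, average and max pooling with a 2x2 window and stride 2, and then global pooling, to a small matrix. The helper name `pool2d` and the matrix values are invented for this example; they are not the Matrix A used in Figure 1.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: window of `size` x `size`, stride equal to `size`."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size            # drop rows/columns that do not fill a window
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    reducers = {"min": np.min, "average": np.mean, "max": np.max}
    return reducers[mode](windows, axis=(1, 3))  # reduce over each window

# Hypothetical 4x4 feature map, chosen only for illustration.
example = np.array([[1, 3, 2, 9],
                    [5, 7, 4, 6],
                    [8, 0, 1, 1],
                    [2, 6, 3, 5]], dtype=float)

min_pooled = pool2d(example, mode="min")          # [[1, 2], [0, 1]]
avg_pooled = pool2d(example, mode="average")      # [[4, 5.25], [4, 2.5]]
max_pooled = pool2d(example, mode="max")          # [[7, 9], [8, 5]]

# Global pooling: the window covers the whole feature map, leaving a single value.
global_max = pool2d(example, size=4, mode="max")      # [[9]]
global_avg = pool2d(example, size=4, mode="average")  # [[3.9375]]
```

Each 2x2 pooling halves both dimensions of the map, while global pooling collapses it to one value regardless of its size.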

Figure 1 shows the output of these pooling approaches on "Matrix A", where Matrix A is taken
as the input image matrix and Matrices B, C and D are the outputs of the pooling operations.

[ Figure 1 : Results of different pooling operations on matrix A ]

The above figure is based on matrix calculations, so it might be confusing and does not give a
visual representation of the effect of these layers on a real image. Figure 2 shows how
these pooling layers affect an image.

[Figure 2 : Effect of different pooling operations on a 9x9 size image[7].]

In the above figure, "Original Image" shows the image taken as input. "Min pooling" shows the
effect of the min pooling operation: the image appears with a dark and dim background, and the
seed part in the middle of the flowers is missing, so important features of the image matrix
are lost. "Average pooling" shows the effect of the average pooling operation: the image is
blurry and smooth compared to the original, again losing some of the features of the input
image. "Max pooling" shows the effect of the max pooling operation: the image has been
sharpened and the details are more prominent than with the other pooling operations.

The same scenario is shown in the example below (Figures 3.1 and 3.2), where we have taken
a line and applied these pooling techniques to it to show the difference in a more
understandable and visually distinguishable way.

[Figure 3.1 : Comparison of different approaches of the pooling layers in which Min pooling gives better result for
images with white background and black object.[7]]

[Figure 3.2 : Comparison of different approaches of the pooling layers in which Max pooling gives better result for
the images with black background and white object (Ex: MNIST dataset).[7]]

In Figures 3.1 and 3.2 above we can see that:

⦁ If we apply Min pooling to an image it darkens the image, which is why nothing is visible
in Figure 3.2 with the black background, while in Figure 3.1 the line is clearly visible
against the white background. Because of this we can state that it is unreliable and that
it also reduces the clarity of features by darkening the image,
⦁ If we apply Average pooling it smooths and blurs the image, thus again losing some
features: in both Figures 3.1 and 3.2 the result of average pooling is a line with some
light pixels around it, making the feature less distinguishable,
⦁ If we apply Max pooling to an image, conversely to Min pooling, it sharpens the image, thus
making the edges and features more prominent.

3. Problem Identification :

While using max pooling and sharpening the image, we are actually over-emphasizing the features
and highlighting unwanted features of the input matrix, referred to as "noise", by taking the
max value from the filter window. This noise can produce unwanted results, which raises
concerns about using max pooling on a large dataset, as it may hamper prediction accuracy.

In this paper we therefore introduce a new pooling technique called “Horizontal Max
Pooling” or “HMax Pooling”, with which we endeavor to reduce the unwanted noise
picked up by the max pooling operation. In this way we aim for better feature detection
and increased prediction accuracy in CNN models.

4. Horizontal Max (HMax) Pooling :

In HMax Pooling we take the max value after horizontal pixel additions, dividing each sum
by 2 to normalize the output. HMax pooling gives more prominent edges than average pooling
and picks up less noise than max pooling, which makes it a better feature extraction technique
than the current standard pooling techniques. It does not alter the image too much: it
preserves more features than average pooling and is less noisy than max pooling.

The algorithm for HMax Pooling is as follows,

4.1. HMax Pooling Algorithm :

Step 1 : START,

Step 2 : Take an image as input and convert it to a numpy array,

Step 3 : Convolve through the array with kernel size = 2x2, stride = 2x2, padding = valid, and
save the resulting windows in a variable named "pools",

Step 4 : Loop through these pools,

Step 5 : Here "HMax Pooling" starts on the numpy array.
Add the 1st and 3rd pixel values, divide the sum by 2 to normalize the output and
save it in a variable (a). Similarly, add the 2nd and 4th pixel values, divide the sum
by 2 to normalize and save it in a variable (b),

Step 6 : Compare the two variables a and b and append the max value to a new numpy
array (output array),

Step 7 : Convert this output array to a viewable image; this is the "HMax
Pooled" image,

Step 8 : STOP.

The above algorithm takes a raw image as input, converts it to a numpy array (a matrix with n
dimensions) and performs "HMax pooling" on that array as given in Step 5 and Step 6.
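Steps 5 and 6 can be written compactly for a single 2x2 window. The notation is introduced here for illustration and is not taken from the paper: p_1 to p_4 are the four pixel values in the order used in Step 5, assumed to be the row-major order of the window.

```latex
a = \frac{p_1 + p_3}{2}, \qquad
b = \frac{p_2 + p_4}{2}, \qquad
\mathrm{HMax}(p_1, p_2, p_3, p_4) = \max(a, b)
```

Under that ordering assumption, p_1 and p_3 lie in the same column of the window, so each average is taken over one column and the max compares the two columns side by side. Because the average of the whole window equals (a + b) / 2, the following bound holds for any window:

```latex
\frac{p_1 + p_2 + p_3 + p_4}{4} = \frac{a + b}{2}
\;\le\; \max(a, b)
\;\le\; \max(p_1, p_2, p_3, p_4)
```

So the HMax output of a window always lies between its average-pooled and max-pooled values.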

[Note : In the above algorithm we have taken a stride and kernel size of 2x2, so there are
four pixel values in one kernel window. If the kernel size changes to 3x3, the pixel addition
changes as well: the 1st group of three horizontal pixels is added, then the 2nd and the 3rd
groups, and the sums are compared for the max value.]

A flowchart for the HMax Pooling operations is shown below. The flowchart is based entirely on
the pseudocode of the presented pooling approach, so that it can also be followed through the
code.

[Figure 4 : Flowchart for the HMax Pooling approach]

The algorithm works as follows: we take a raw image as input and convert it to a numpy array.
Then we create a variable named "pools" to store the convolved sub-arrays (with stride and
kernel size both 2x2). This pools variable, which holds the input image split into sub-arrays,
is passed to a function named "Horizontal_Max_Pooling" that contains the HMax Pooling
definition.

This function takes the pools variable as input and returns a new numpy array containing the
size-reduced feature matrix. Since we have kept the stride and kernel at 2x2, each window holds
only 4 pixel values, with addresses 00, 01, 10 and 11. We add the 1st and 3rd pixels, divide
the result by 2 and save it in a variable "a". Likewise we add the 2nd and 4th pixels, divide
the result by 2 and save it in a variable "b".

Then we compare these two variables, and whichever is larger is inserted into the new numpy
array "pooled". This pooled array is the result of the HMax Pooling approach on the input
image matrix, as a numpy array.
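A minimal NumPy sketch of this description follows. The names `pools`, `pooled` and `Horizontal_Max_Pooling` come from the text; the helper `extract_pools`, the vectorised form (instead of an explicit loop) and the 4x4 input values are assumptions made for the example, not the authors' code.

```python
import numpy as np

def extract_pools(image, size=2):
    """Split a 2-D array into non-overlapping size x size windows (stride = size, valid padding)."""
    h, w = image.shape
    h, w = h - h % size, w - w % size              # valid padding: ignore incomplete windows
    return (image[:h, :w]
            .reshape(h // size, size, w // size, size)
            .swapaxes(1, 2))                       # shape: (rows, cols, size, size)

def Horizontal_Max_Pooling(pools):
    """HMax pooling for 2x2 windows: max of the (1st, 3rd) and (2nd, 4th) pixel averages."""
    p1 = pools[:, :, 0, 0]    # address 00
    p2 = pools[:, :, 0, 1]    # address 01
    p3 = pools[:, :, 1, 0]    # address 10
    p4 = pools[:, :, 1, 1]    # address 11
    a = (p1 + p3) / 2.0
    b = (p2 + p4) / 2.0
    return np.maximum(a, b)

# Hypothetical input; the top-right window contains one isolated bright pixel (100).
image = np.array([[10, 20, 0,   0],
                  [30, 40, 0, 100],
                  [ 5,  5, 8,   2],
                  [ 5,  5, 4,   6]], dtype=float)

pools = extract_pools(image)
pooled = Horizontal_Max_Pooling(pools)
print(pooled)   # [[30. 50.] [ 5.  6.]]; plain max pooling would give [[40, 100], [5, 8]]
```

In this toy input the isolated value 100 is passed through unchanged by max pooling but halved to 50 by HMax pooling. For a real image, assuming Pillow is available (the paper does not name an image library), `np.asarray(Image.open(path).convert("L"), dtype=float)` would supply the input array and `Image.fromarray(pooled.astype(np.uint8))` would turn the result back into a viewable image as in Step 7.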

After applying the HMax Pooling technique to "Matrix E", the output is "Matrix F", as
shown below in Figure 5,

[Figure 5 : Showing the output for HMax Pool on the given "Matrix E".]

Figure 5 above shows the result of the HMax pooling algorithm, "Matrix F", after it is applied
to the raw input image matrix "Matrix E". The process is similar to the Min, Max and Average
pooling techniques from Figure 1, but the calculation differs as stated in the HMax pooling
algorithm.

If we now add up the numbers in "Matrix F", we get the value shown below,

16 + 16 + 85 + 31 = 148,

Consider this 148 as the feature value of "Matrix F". Calculating the same for the Max,
Average and Min pooling matrices (Matrix B, C and D respectively) from "Figure 1", we
get,

Min Pool = 11.5,
Average Pool = 119 and,
Max Pool = 199.

Max pooling has the highest value but also contains noise. Average pooling, at 119, sits in
the middle but blurs the image, resulting in feature loss. Min pooling has the lowest value
and retains the fewest features.

If we compare these values with the HMax Pooling value (148) and plot them on a straight line,
we can see that HMax pooling sits between "Average pooling" and "Max Pooling": it is higher
than average pooling, retaining more features, and lower than max pooling, containing less
noise. This scenario is shown below in Figure 6,

[Figure 6 : Figure showing the comparison of different pooling layers based on the calculated feature values]
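The comparison can be reproduced with a few lines of Python. The HMax value is the sum of the four Matrix F entries quoted in the text; the 2x2 arrangement of those entries is an assumption (it does not affect the sum), and the other three values are copied from the paper as reported, not recomputed.

```python
import numpy as np

# Entries of Matrix F as quoted in the text; the 2x2 layout is assumed.
matrix_f = np.array([[16, 16],
                     [85, 31]])
hmax_value = int(matrix_f.sum())     # 16 + 16 + 85 + 31 = 148

# Feature values for the other pooling types, as reported in the paper.
feature_values = {
    "Min Pool": 11.5,
    "Average Pool": 119,
    "HMax Pool": hmax_value,
    "Max Pool": 199,
}

# Order the techniques along the straight line from lowest to highest value.
ordering = sorted(feature_values, key=feature_values.get)
print(ordering)   # ['Min Pool', 'Average Pool', 'HMax Pool', 'Max Pool']
```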

When applying HMax Pooling we are actually focusing on the horizontal pixel groups from
the kernel window taken from the convolutions. The focus areas of all pooling techniques are
shown below in Figure 7.

[Figure 7 : Representation of the pixel focus area in a 2x2 filter window with stride 2x2. This is an
assumed scenario of a matrix and not an actual output; the actual output is shown in "Figure 8"]

As we can see in "Figure 7", Max pooling focuses only on the maximum values from each
2x2 kernel window, like 20,30,112,37. Similarly, Min pooling focuses on the minimum values,
like 8,0,0,34,4. Average pooling focuses on the average of the kernel window, so its focus
area is close to the middle. Finally, HMax pooling focuses on horizontal pixels, so its focus
area is close to the middle like average pooling, but smaller and slimmer.

In Figure 7 above, Average pooling and HMax Pooling might look similar, but the output images
they produce are distinguishable.

Let's see how an image is affected when we apply the different pooling approaches to
it,

[Figure 8 : Result of different pooling approaches on a real life image]

In "Figure 8" above we can see that applying "Min Pool" darkens the image. After
using "Average pool" the image is not viewable because of the averaging: the focus is
shifted only to the average-valued pixels, which is why the image looks like this.
Notice that the Max and HMax pool results are close to each other, but if we observe both very
closely we can see that the HMax pooled image contains less noise than the max pooled
image. We can verify this by applying different edge detection filters to both and seeing which
technique is able to detect more features.

The different edge detection filters used for verification are shown below in "Figure 9.1",
and "Figure 9.2" shows the results of applying these edge detection filters to the max pooled
and HMax pooled images.

[Figure 9.1 : Figure showing the different edge detection filters used to verify a better feature detection
approach]

[Figure 9.2 : Figure showing results after applying different filters shown in "Figure 9.1" on max pooled and HMax
pooled image from "Figure 8"]

In Figure 9.2 we can see that, after applying the different edge detection filters to the max
pooled and HMax pooled images, the HMax pooled image shows better feature detection than
the max pooled one. The same technique was used on multiple images and matrices,
and every time the result was consistent, with HMax Pooling giving better feature detection
than max pooling.
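The verification step can be sketched as follows. The actual filters are only given in Figure 9.1, so this example substitutes the standard 3x3 Sobel kernels as a stand-in; the function names and the small test image are likewise assumptions made for illustration.

```python
import numpy as np

# Standard Sobel kernels, used here as a stand-in for the filters of Figure 9.1.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(image, kernel):
    """Apply `kernel` to `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def edge_magnitude(image):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    gx = filter2d_valid(image, SOBEL_X)
    gy = filter2d_valid(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Hypothetical test image: a vertical step edge between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 10.0

edges = edge_magnitude(step)
print(edges[0])   # [ 0. 40. 40.  0.]: the response is non-zero only around the edge
```

Calling `edge_magnitude` on a max pooled array and on an HMax pooled array of the same image gives two edge maps that can be compared side by side, in the manner of Figure 9.2.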

5. Conclusion :

Finally, from the results of Section 4 we can conclude that there is a noticeable difference
between max pooling and HMax pooling, with HMax giving better feature detection than max
pooling. Applying this HMax pooling technique in a CNN model, where max pooling has been
used until now, may improve prediction accuracy; this is the future scope for this pooling
technique.

6. References :

[1] Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, University of Toronto,
"ImageNet Classification with Deep Convolutional Neural Networks", URL :
https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
[2] Understanding CNN (Convolutional Neural Network), URL :
https://towardsdatascience.com/understanding-cnn-convolutional-neural-network-69fd626ee7d4
[3] CNN | Introduction to Pooling Layer, URL :
https://www.geeksforgeeks.org/cnn-introduction-to-pooling-layer/
[4] Average Pooling, URL : https://paperswithcode.com/method/average-pooling
[5] What are some deep details about pooling layers in CNN? by Fatima Hasan, URL :
https://www.educative.io/answers/what-are-some-deep-details-about-pooling-layers-in-cnn
[6] A Gentle Introduction to Pooling Layers for Convolutional Neural Networks, Global Pooling
Layers, URL :
https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/
[7] Maxpooling vs minpooling vs average pooling, Average, Max and Min pooling of size 9x9
applied on an image, URL :
https://medium.com/@bdhuma/which-pooling-method-is-better-maxpooling-vs-minpooling-vs-average-pooling-95fb03f45a9
[8] Guide to Different Padding Methods for CNN Models, URL :
https://analyticsindiamag.com/guide-to-different-padding-methods-for-cnn-models/
