
U-Net Architecture For Image Segmentation

Presented by
Shashikanta Sahoo
Siddharth Mohanty
Table Of Contents:
• Introduction To U-Net
• Understanding The U-Net Architecture
Introduction To U-Net:

The U-Net architecture, first published in 2015, has been revolutionary in the field of deep learning. It won the International Symposium on Biomedical Imaging (ISBI) cell tracking challenge of 2015 in numerous categories by a large margin. Demonstrated applications include the segmentation of neuronal structures in electron microscopy stacks and in transmitted-light microscopy images.
With the U-Net architecture, the segmentation of a 512×512 image can be computed on a modern GPU in under a second. Its phenomenal success has inspired many variants and modifications, including LadderNet, U-Net with attention, the recurrent and residual convolutional U-Net (R2-UNet), and U-Nets with residual blocks or densely connected blocks.

• However, earlier sliding-window approaches suffered two main drawbacks that the U-Net architecture addresses. Since each pixel was classified from its own patch, neighboring patches overlapped heavily, producing a great deal of redundant computation. Another limitation was that training was quite slow and consumed a lot of time and resources. Together, these drawbacks made the sliding-window approach impractical for many tasks.

• The U-Net is an elegant architecture that solves most of these issues. It builds on the concept of fully convolutional networks. The intent of the U-Net is to capture both context and precise localization, which the shape of the architecture achieves: successive contracting layers are followed by upsampling operators that recover higher-resolution outputs close to the size of the input images.
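The contracting path described above can be checked with simple size arithmetic. The sketch below is not the authors' code; it only assumes the building blocks named in the original paper: 3×3 unpadded ("valid") convolutions and 2×2 max pooling with stride 2.

```python
# Size bookkeeping for a U-Net-style contracting path (a sketch, not the
# authors' code). Assumes 3x3 unpadded ("valid") convolutions and 2x2 max
# pooling with stride 2, as in the original paper.

def conv3x3_valid(size):
    # A 3x3 convolution without padding removes one pixel on each border.
    return size - 2

def maxpool2x2(size):
    # 2x2 max pooling with stride 2 halves the spatial size.
    return size // 2

size = 572  # input resolution used in the original U-Net paper
for level in range(4):  # four downsampling steps
    size = conv3x3_valid(conv3x3_valid(size))
    print(f"after convs at level {level}: {size}x{size}")
    size = maxpool2x2(size)

size = conv3x3_valid(conv3x3_valid(size))  # bottleneck convolutions
print(f"bottleneck: {size}x{size}")
```

Running this reproduces the sizes printed on the paper's architecture diagram (568, 280, 136, 64, and 28 at the bottleneck).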
Understanding The U-Net Architecture:
• The architecture shows that an input image is passed through the model, followed by a couple of convolutional layers with the ReLU activation function. We can see the image size reducing from 572×572 to 570×570 and finally to 568×568. This reduction occurs because the authors use unpadded convolutions (convolutions defined as "valid"), which shrink the overall spatial dimensions. Apart from the convolution blocks, we can also see an encoder block on the left side, followed by the decoder block on the right side.
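The size reduction from unpadded convolutions can be verified directly. This is a sketch in PyTorch (the original implementation used Caffe); the channel counts follow the first block of the paper's diagram.

```python
# Two unpadded ("valid") 3x3 convolutions shrink a 572x572 input to
# 570x570 and then 568x568. A PyTorch sketch, not the authors' code.
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=0)   # 572 -> 570
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=0)  # 570 -> 568

x = torch.randn(1, 1, 572, 572)  # one single-channel image, as in the paper
h1 = conv1(x)
h2 = conv2(torch.relu(h1))
print(h1.shape)  # torch.Size([1, 64, 570, 570])
print(h2.shape)  # torch.Size([1, 64, 568, 568])
```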
• The encoder block steadily reduces the image size using max-pooling layers with stride 2, while repeated convolutional layers use an increasing number of filters. In the decoder, the number of filters in the convolutional layers decreases, with gradual upsampling in the following layers all the way to the top. We can also see skip connections that carry outputs from the encoder across to the corresponding layers in the decoder blocks.
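One encoder step and its matching decoder step can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' code; because the valid convolutions make the decoder maps smaller than the encoder maps, the skip connection center-crops the encoder features before concatenating, as in the original paper.

```python
# One encoder step and the matching decoder step of a U-Net-style model
# (a sketch, not the authors' code): 3x3 valid convolutions, 2x2 max
# pooling with stride 2, and a 2x2 up-convolution.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # The repeated U-Net block: two unpadded 3x3 convolutions with ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3), nn.ReLU(inplace=True),
    )

def center_crop(feat, target_hw):
    # Crop encoder features to the decoder's (smaller) spatial size.
    _, _, h, w = feat.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feat[:, :, top:top + th, left:left + tw]

enc = double_conv(1, 64)                       # encoder block
pool = nn.MaxPool2d(2, stride=2)               # halves the resolution
mid = double_conv(64, 128)                     # next, deeper block
up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # 2x2 up-convolution
dec = double_conv(128, 64)                     # decoder block after concat

x = torch.randn(1, 1, 572, 572)
skip = enc(x)            # [1, 64, 568, 568] saved for the skip connection
down = mid(pool(skip))   # [1, 128, 280, 280]
upped = up(down)         # [1, 64, 560, 560]
merged = torch.cat([center_crop(skip, upped.shape[-2:]), upped], dim=1)
out = dec(merged)        # [1, 64, 556, 556]
print(out.shape)
```

Concatenating the cropped encoder features doubles the channel count (64 + 64 = 128) entering the decoder block, which is why its first convolution takes 128 input channels.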
• These skip connections are a vital concept: they preserve fine-grained features from the earlier layers so that they contribute more strongly to the final values. They have also been shown empirically to produce better results and faster model convergence. In the final convolution block, a couple of convolutional layers are followed by the last convolution layer. This layer has 2 filters, with an appropriate activation function to produce the resulting output, and it can be changed according to the purpose of the project you are working on.
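The final layer described above can be sketched as a 1×1 convolution mapping the last 64-channel feature map to one channel per class (2 in the original paper). A PyTorch sketch, with the 388×388 output resolution taken from the paper's diagram:

```python
# Final U-Net layer (a sketch): a 1x1 convolution producing one output
# channel per class. Change out_channels to adapt it to other tasks.
import torch
import torch.nn as nn

final = nn.Conv2d(64, 2, kernel_size=1)  # 2 output classes
feat = torch.randn(1, 64, 388, 388)      # last decoder feature map in the paper
logits = final(feat)
print(logits.shape)  # torch.Size([1, 2, 388, 388])

# A softmax over the channel dimension turns the logits into per-pixel
# class probabilities.
probs = torch.softmax(logits, dim=1)
```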
