Attention U-Net, ResUnet, U-Net++, U²-Net
Published in AIGuys, 03.02.2023

U-Net
V-Net
U-Net++
R2U-Net
Attention U-Net
ResUnet
U²-Net
UNET3+
TransUNET
Swin-UNET
Let’s take a look at the all-new, exciting world of U-Net. Feel free to skip the
explanation of U-Net itself (most of you are already familiar with it).
U-Net
To understand the architecture of U-Net, let’s first understand the task. Given
an input image, the network should generate a segmentation mask, which means
each pixel should be classified as belonging to the desired object or not (see the
figure below). The idea behind U-Net is to feed the image to an encoder that keeps
reducing the spatial size of the feature block; after sufficient training, the network
learns to keep only the important features and discard the less useful data. The
output of the encoder, followed by a decoder, then generates the desired output
mask. The problem was that the decoder layers did not get enough context from the
encoder output alone to generate the segmentation mask.

The great idea introduced in the U-Net paper to solve this context issue was to add
skip connections from the encoder to the decoder, tapped just before each
downsampling step. In the U-Net architecture shown below, after each pair of conv
blocks the feature map is downsampled by half, and from each pair of conv blocks a
skip connection takes the encoder features and concatenates them to the
corresponding decoder level, giving the decoder enough context to generate a
proper segmentation mask. If we replace the concatenation with addition, we get a
network called LinkNet, which performs similarly to U-Net (in some cases even
beating it).
The basic principle behind every U-Net variant is to first shrink the feature block
spatially and then expand it back to the original size, creating a bottleneck that
forces the network to learn the important features, and to add skip connections
between the encoder and decoder so that the network has enough context while
generating the segmentation mask.
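As a toy sketch (NumPy only, with no learned weights), here is the shape arithmetic of a skip connection, including the channel concatenation used by U-Net versus the element-wise addition used by LinkNet:

```python
import numpy as np

# Toy sketch of the U-Net skip idea (shapes only, no learned weights).
# enc: encoder feature map (channels, H, W). We downsample with 2x2 max-pooling,
# "decode" by nearest-neighbour upsampling, then merge the saved encoder
# features with the decoder features.

def maxpool2x2(x):
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x2(x):
    return x.repeat(2, axis=1).repeat(2, axis=2)

enc = np.random.rand(64, 32, 32)        # encoder features before downsampling
bottleneck = maxpool2x2(enc)            # (64, 16, 16)
dec = upsample2x2(bottleneck)           # back to (64, 32, 32)

concat = np.concatenate([enc, dec], axis=0)  # U-Net skip: channels stack up
added = enc + dec                            # LinkNet skip: channels stay fixed
print(concat.shape, added.shape)             # (128, 32, 32) (64, 32, 32)
```

Note the design trade-off this exposes: concatenation doubles the channel count the following conv block must process, while addition keeps it fixed, which is one reason LinkNet is lighter.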
V-Net
The original V-Net was proposed for 3D (volumetric) data, but it can still be used
with 2D images. The main difference from U-Net is that V-Net replaces the
up-sampling and down-sampling pooling layers with convolutional layers. The idea
is that the max-pool operation discards a lot of information, so replacing it with
strided convolutions helps preserve more of it.
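The difference can be sketched on a single channel: max-pooling keeps only one value per window, while a strided convolution (here a simple 2x2 average standing in for learned weights) mixes all of them:

```python
import numpy as np

# Toy 1-channel comparison: max-pooling keeps only the max of each 2x2 window,
# while a stride-2 2x2 convolution computes a weighted sum of all four values,
# so no input value is discarded outright. The kernel k is a stand-in for
# learned parameters.

def maxpool2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def strided_conv2x2(x, k):
    h, w = x.shape
    patches = x.reshape(h // 2, 2, w // 2, 2)   # non-overlapping 2x2 windows
    return np.einsum('hiwj,ij->hw', patches, k)

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.full((2, 2), 0.25)                       # averaging kernel as an example
print(maxpool2x2(x))          # [[ 5.  7.] [13. 15.]] - only window maxima survive
print(strided_conv2x2(x, k))  # [[ 2.5  4.5] [10.5 12.5]] - every value contributes
```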
U-Net++
This network takes the idea of skip connections one step further. Why take the
contextual information only from the encoder level at the same spatial dimension
as the decoder? Instead, UNet++ takes the context from every encoder level,
rescales it accordingly, and feeds it to every level of the decoder through a dense
set of nested skip connections. See the figure below to understand it clearly.
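A shape-only NumPy sketch of the nested skips (conv blocks and learned weights are omitted, and the channel counts are illustrative):

```python
import numpy as np

# UNet++ toy sketch: node X[i][j] mixes the earlier nodes at its own level,
# X[i][0..j-1], with the upsampled output of the deeper node X[i+1][j-1].
# "Mixing" is just channel concatenation here; a real UNet++ follows each
# node with conv blocks that shrink the channels back down.

def up(x):
    return x.repeat(2, axis=1).repeat(2, axis=2)

X = {}
X[0, 0] = np.random.rand(32, 64, 64)    # shallowest encoder features
X[1, 0] = np.random.rand(64, 32, 32)
X[2, 0] = np.random.rand(128, 16, 16)

# first dense-skip column
X[0, 1] = np.concatenate([X[0, 0], up(X[1, 0])], axis=0)           # (96, 64, 64)
X[1, 1] = np.concatenate([X[1, 0], up(X[2, 0])], axis=0)           # (192, 32, 32)
# second column reuses every earlier node at level 0
X[0, 2] = np.concatenate([X[0, 0], X[0, 1], up(X[1, 1])], axis=0)  # (320, 64, 64)
print(X[0, 2].shape)
```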
R2U-Net
This paper applies the idea of recurrent neural networks to give the network
temporal dynamic behavior. I’m not sure why this should give better results; in my
testing, it completely failed to converge on a custom dataset.
Attention U-Net
This network borrows the attention mechanism from NLP and uses it in the skip
connections. It gives the skip connections an extra signal about which region to
focus on while segmenting the given object, so it works well even with very small
objects. It is a little more complex to implement on your own from scratch, but the
idea behind it is simple and quite ingenious.
Attention U-Net (Image taken from the original Attention U-Net paper)
Attention mechanism (Image taken from the original Attention U-Net paper)
The attention gate works as follows:

1. The vector x is taken from the skip connection at the current encoder level.
2. The gating vector, g, is taken from the next-lowest layer of the network. It has
smaller spatial dimensions and a better feature representation, given that it
comes from deeper in the network.
3. x goes through a strided convolution so that its dimensions become 64x32x32,
and g goes through a 1x1 convolution so that its dimensions also become
64x32x32.
4. The two vectors are summed element-wise. Aligned features produce larger
values, while unaligned ones become relatively smaller.
5. The result goes through a ReLU activation layer and a 1x1 convolution that
collapses the channels, giving dimensions 1x32x32.
6. This map goes through a sigmoid layer, which scales it to the range [0, 1],
producing the attention coefficients (weights), where coefficients closer to 1
indicate more relevant features. The coefficients are then upsampled to x’s
resolution and multiplied with x to weight the skip features.
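The steps above can be sketched in NumPy, with 1x1 convolutions written as channel-wise einsums and random weights standing in for learned parameters (the channel counts chosen for x and g are illustrative assumptions):

```python
import numpy as np

# Toy NumPy attention gate. The strided conv on x is simplified to a 1x1 conv
# applied to every second pixel; all weights are random stand-ins for learned
# parameters, and the channel counts (32 for x, 64 for g) are assumptions.
rng = np.random.default_rng(0)

x = rng.standard_normal((32, 64, 64))   # skip-connection features
g = rng.standard_normal((64, 32, 32))   # gating signal from one level deeper

W_x = rng.standard_normal((64, 32))     # "strided conv" weights (out, in)
W_g = rng.standard_normal((64, 64))     # 1x1 conv weights for g
psi = rng.standard_normal((64,))        # final 1x1 conv, 64 -> 1 channel

theta_x = np.einsum('oc,chw->ohw', W_x, x[:, ::2, ::2])  # (64, 32, 32)
phi_g = np.einsum('oc,chw->ohw', W_g, g)                 # (64, 32, 32)

f = np.maximum(theta_x + phi_g, 0.0)                     # element-wise sum + ReLU
att = 1.0 / (1.0 + np.exp(-np.einsum('c,chw->hw', psi, f)))  # sigmoid -> (32, 32)

# upsample the coefficients back to x's resolution and gate the skip features
att_up = att.repeat(2, axis=0).repeat(2, axis=1)
x_gated = x * att_up[None]
print(att.shape, x_gated.shape)
```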
ResUnet
ResUnet is a very interesting idea: it takes the performance gains of residual
networks and combines them with U-Net. The architecture of ResUnet is shown
below. In my testing, I’ve found it to be a very capable network, though with a
slightly larger number of parameters.
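A minimal sketch of the residual-block idea that ResUnet builds on, using per-pixel (1x1) stand-in convolutions so the identity shortcut is easy to see; a real ResUnet block uses 3x3 convolutions with batch normalization:

```python
import numpy as np

# Toy residual block: output = F(x) + x. F is two stand-in "convolutions"
# applied per pixel (1x1 convs) plus a ReLU; the "+ x" identity shortcut is
# what lets gradients flow through easily.
rng = np.random.default_rng(1)

def conv1x1(x, w):
    return np.einsum('oc,chw->ohw', w, x)

def residual_block(x, w1, w2):
    out = np.maximum(conv1x1(x, w1), 0.0)   # conv + ReLU
    out = conv1x1(out, w2)                  # conv
    return out + x                          # identity shortcut

x = rng.standard_normal((64, 16, 16))
w1 = rng.standard_normal((64, 64)) * 0.1
w2 = rng.standard_normal((64, 64)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)                              # (64, 16, 16) - shape is preserved
```

With all weights zero the block reduces to the identity, which illustrates why residual networks are easy to optimize: the block only has to learn a correction on top of x.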
U²-Net
It applies the idea of U-Net inside each conv block; it is basically a U-Net of
U-Nets.
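A shape-only sketch of the nesting: each stage of the outer U runs a miniature inner encoder-decoder (standing in for U²-Net's RSU blocks; convolutions and weights are omitted):

```python
import numpy as np

# Toy "U-Net inside U-Net": each outer-U stage is itself a small U-shaped
# encoder-decoder instead of a plain conv block. Only shapes are tracked;
# an additive skip keeps the channel count fixed for simplicity.

def down(x):   # 2x2 max-pool
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def up(x):     # nearest-neighbour upsample
    return x.repeat(2, axis=1).repeat(2, axis=2)

def inner_unet(x, depth):
    """A miniature U inside one block of the outer U."""
    if depth == 0:
        return x
    skip = x
    y = inner_unet(down(x), depth - 1)
    return up(y) + skip

x = np.random.rand(16, 64, 64)
stage1 = inner_unet(x, depth=2)            # one outer-encoder stage
stage2 = inner_unet(down(stage1), depth=2) # next stage at half resolution
print(stage1.shape, stage2.shape)          # (16, 64, 64) (16, 32, 32)
```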
UNET3+
This is similar to UNet++ but with fewer parameters. It works extremely well,
comparable to Attention U-Net while using even fewer parameters. Another novel
idea in this paper is that classification results are also used to guide the
segmentation (the Classification Guided Module). The full implementation details
of this paper are beyond the scope of this blog.
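A shape-only sketch of a full-scale skip, the idea UNET3+ adds on top of U-Net: one decoder level receives features from every encoder scale, resampled to its own resolution and concatenated (a real UNET3+ also reduces each source to a fixed channel count first):

```python
import numpy as np

# UNET3+ toy sketch of full-scale skips: a single decoder node aggregates
# ALL encoder scales, each resampled to the node's resolution. Channel
# counts are illustrative; conv blocks are omitted.

def resize_to(x, hw):
    c, h, w = x.shape
    if h > hw:                        # downsample by max-pooling
        f = h // hw
        return x.reshape(c, hw, f, hw, f).max(axis=(2, 4))
    if h < hw:                        # upsample by nearest neighbour
        f = hw // h
        return x.repeat(f, axis=1).repeat(f, axis=2)
    return x

enc = [np.random.rand(32, 64, 64), np.random.rand(64, 32, 32),
       np.random.rand(128, 16, 16), np.random.rand(256, 8, 8)]

target = 32                           # build the decoder node at 32x32
dec_in = np.concatenate([resize_to(e, target) for e in enc], axis=0)
print(dec_in.shape)                   # (32+64+128+256, 32, 32) = (480, 32, 32)
```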
TransUNET
TransUNET is based on the idea of using Transformers. Vision Transformers
recently made a huge splash in the field, so the researchers behind this paper
added them to U-Net as well. It is indeed a very capable network, but it takes a long
time to train, much slower than other U-Net variants (I couldn’t run it because it
has more than 400 million parameters). Explaining the transformer mechanism
would be another blog post in itself.
Swin-UNET
This architecture is also based on Transformers, but this time Swin Transformers.
It is also fairly slow to train, though manageable on an RTX-class GPU. In my
testing, it gave some decent results but was still too big and slow compared to the
smaller, faster, and better-performing Attention U-Net.
Conclusion
In my testing of all these U-Net variants on a custom dataset, I found that
Attention U-Net and UNet3+ were the best-performing networks with a limited
number of parameters (fewer than 10 million). Other networks might outperform
these two, but they require a huge amount of data and compute.