
Facebook AI and UC Berkeley pick a fight with Transformers

With all the insane hype around GPT-3, DALL-E, PaLM, and many more, now is the
perfect time to cover this paper.

Go through the Machine Learning news these days, and you will see Transformers
everywhere (watch this video from IBM Technology for a quick overview of the idea). And
for good reason. Since their introduction, Transformers have taken the world of
Deep Learning by storm. While they were traditionally associated with Natural
Language Processing, Transformers are now being used in Computer Vision Pipelines
too. Just in the last few weeks, we have seen the use of Transformers in some
insane applications in Computer Vision. Thus, it seemed like Transformers would
replace Convolutional Neural Networks (CNNs) for generic Computer Vision tasks.

Researchers at Facebook AI, however, have something to add. In their paper, “A
ConvNet for the 2020s”, the authors posit that a large part of the reason that
Transformers have been outperforming CNNs in Vision-related tasks has been the
superior training protocols used by Transformers (which are a newer architecture).
Thus, by improving the pipeline around the models, they argue that we can close the
performance gap between Transformers and CNNs. In their words,

In this work, we reexamine the design spaces and test the limits of what a pure
ConvNet can achieve. We gradually “modernize” a standard ResNet toward the design
of a vision Transformer, and discover several key components that contribute to the
performance difference along the way.

The results are quite interesting, and they show that CNNs can even outperform
Transformers in certain tasks. This is more proof that your Deep Learning Pipelines
can be improved with better training, rather than simply going for bigger models.
In this article, I will cover some interesting findings from their paper. But first,
some context on Transformers and CNNs and the advantages of each kind of
architecture in Computer Vision tasks.

CNNs: The OG Computer Vision Networks

Convolutional Neural Networks have been the OG Computer Vision Architecture since
their inception. In fact, the foundations of CNNs are older than I am. CNNs were
literally built for vision.
