
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Discover how Swin Transformer improves the efficiency and accuracy of vision
transformers for image recognition.

by Lakshya Karwa
Introduction
1 Vision Transformers Overview

What are vision transformers, why are they important, and what are their limitations?

2 The Swin Transformer Problem

How does Swin Transformer address the limitations of traditional vision transformers?

3 Existing Work

A review of recent research and innovations in the field of vision transformers.

4 Research Highlights

Key findings from the Swin Transformer model and how they address the problem.
Vision Transformers Overview

• Vision transformers: a type of neural network for visual recognition tasks. They capture long-range dependencies in images, which is difficult for typical CNNs.
• Limitations?
  ◦ Computationally expensive
  ◦ Require more data for robust training
• Swin Transformer: combines ideas from CNNs and Transformers by building hierarchical feature maps from image patches
• Complexity? Linear in image size? How?
• Recent innovations: use of attention mechanisms to improve efficiency, and unsupervised learning techniques
• Research highlight: high accuracy, lower time complexity, exceptional results
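
The "linear? How?" question above can be made concrete with the complexity formulas from the Swin Transformer paper: global self-attention costs 4hwC² + 2(hw)²C, while window-based self-attention costs 4hwC² + 2M²hwC, where h × w is the number of patches, C the channel dimension, and M the window size. The sketch below simply evaluates those two formulas; the function names and the example sizes (a 56 × 56 patch grid, C = 96, M = 7) are illustrative assumptions, not measurements.

```python
def global_msa_cost(h: int, w: int, C: int) -> int:
    """Cost of global multi-head self-attention: quadratic in the token count h*w."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def window_msa_cost(h: int, w: int, C: int, M: int) -> int:
    """Cost of window-based self-attention: linear in h*w for a fixed window size M."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


if __name__ == "__main__":
    # Illustrative sizes only (assumed): 56x56 patch grid, 96 channels, 7x7 windows.
    for side in (56, 112, 224):
        g = global_msa_cost(side, side, 96)
        l = window_msa_cost(side, side, 96, 7)
        print(f"{side}x{side} patches: global={g:,} windowed={l:,}")
```

Because M is fixed, quadrupling the number of patches quadruples the windowed cost, whereas the (hw)² term in the global cost grows sixteenfold.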
Vision Transformer vs. Swin Transformer
Swin Transformer Architecture

1 Hierarchical Feature Maps

How the Swin Transformer uses hierarchical feature maps for improved accuracy.

2 Shifted Windows

What are shifted windows and how do they help overcome the limitations of traditional patch-based transformers?

3 Patch Merging

Explanation of the Swin Transformer's patch merging mechanism and its effect on performance.
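
As a rough illustration of patch merging, the sketch below concatenates each 2 × 2 group of neighbouring patches (giving 4C channels at half the resolution) and projects the result to 2C channels with a linear layer, as described in the Swin Transformer paper. It is a minimal NumPy sketch, not the reference implementation: the function name `patch_merging` and the random `weight` matrix are assumptions for illustration, and the normalization step applied before the projection in practice is omitted.

```python
import numpy as np


def patch_merging(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Merge 2x2 neighbouring patches: (H, W, C) -> (H/2, W/2, 2C).

    `weight` is a hypothetical (4C, 2C) projection matrix standing in
    for the learned linear layer.
    """
    H, W, C = x.shape
    assert H % 2 == 0 and W % 2 == 0, "H and W must be even"
    # Pick the four patches of every 2x2 neighbourhood.
    x0 = x[0::2, 0::2, :]  # top-left
    x1 = x[1::2, 0::2, :]  # bottom-left
    x2 = x[0::2, 1::2, :]  # top-right
    x3 = x[1::2, 1::2, :]  # bottom-right
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (H/2, W/2, 4C)
    return merged @ weight  # linear projection to (H/2, W/2, 2C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 96  # assumed channel width for the example
    feature_map = rng.standard_normal((56, 56, C))
    weight = rng.standard_normal((4 * C, 2 * C))
    print(patch_merging(feature_map, weight).shape)
```

Each merging step halves the spatial resolution and doubles the channel count, which is what produces the hierarchical feature maps listed in item 1.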
Shifted Window Approach
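
The shifted window idea can be sketched in a few lines: the feature map is split into non-overlapping M × M windows, and in alternating layers it is first cyclically shifted by ⌊M/2⌋ patches so that the new windows straddle the previous window boundaries. The NumPy sketch below shows only the partitioning and the cyclic shift; the helper names are assumptions, and the attention mask that keeps wrapped-around patches from attending to each other is deliberately left out.

```python
import numpy as np


def window_partition(x: np.ndarray, M: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping (M, M, C) windows."""
    H, W, C = x.shape
    assert H % M == 0 and W % M == 0, "H and W must be multiples of M"
    x = x.reshape(H // M, M, W // M, M, C)
    # Bring the two window-index axes together, then flatten them.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M, M, C)


def shifted_window_partition(x: np.ndarray, M: int) -> np.ndarray:
    """Cyclically shift the map by M // 2 along both axes, then partition."""
    shift = M // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, M)


if __name__ == "__main__":
    # Toy 8x8 map with one channel; each value encodes its own position.
    x = np.arange(64).reshape(8, 8, 1)
    regular = window_partition(x, 4)
    shifted = shifted_window_partition(x, 4)
    print(regular.shape, shifted.shape)
    print(regular[0, :, :, 0])
    print(shifted[0, :, :, 0])
```

Printing the first window of each partition shows that the shifted window covers patches that belonged to four different regular windows, which is how information flows across window boundaries between consecutive layers.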
Swin Transformer Architecture
Results and Performance
State-of-the-Art Comparison

How does the Swin Transformer compare to other models in terms of accuracy and efficiency?

Image Dataset Analysis

Evaluation of the Swin Transformer's performance on image classification on ImageNet and object detection on COCO.

Limitations and Considerations

What factors may affect the Swin Transformer's performance and how can they be addressed?
Application and Future Work

Possible Applications

How can Swin Transformer be used in industries ranging from healthcare to advertising?

Extension Possibilities

What are some potential areas for expanding the Swin Transformer model?

Further Research

What are some possible research questions and areas for improvement?
Shortfalls
1 Hardware Requirements

What are the minimum hardware requirements for training and using the Swin Transformer model?

2 Computational Resources

What are the substantial computational resource requirements of the model?

3 Data Size Requirements

Are there any specific dataset and data size requirements when it comes to training and using the model?
Conclusion
Impact

How does the Swin Transformer contribute to the field of computer vision and artificial intelligence?

Future Potential

What are the potential benefits and implications of improved image recognition technology?
