
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Discover how Swin Transformer improves the efficiency and accuracy
of vision transformers for image recognition.

by Lakshya Karwa
Introduction
1 Vision Transformers Overview

What are vision transformers, why are they important, and what are their limitations?

2 The Swin Transformer Problem

How does Swin Transformer address the limitations of traditional vision transformers?

3 Existing Work

A review of recent research and innovations in the field of vision transformers.

4 Research Highlights

Key findings from the Swin Transformer model and how they address the problem.
Vision Transformers Overview
• Vision transformers: a type of neural network for visual recognition tasks that captures long-range dependencies in images, which is difficult for conventional CNNs
• Limitations?
➢ Computationally expensive
➢ Require more data for robust training
• Swin Transformer: combines ideas from CNNs (hierarchical feature maps) with Transformers operating on image patches
• Complexity? Linear in image size, because self-attention is computed only within fixed-size local windows
• Recent innovations: use of attention mechanisms to improve efficiency, and unsupervised learning techniques
• Research highlight: high accuracy and lower time complexity, with strong results on ImageNet classification and COCO object detection
Vision Transformer Vs Swin Transformer
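The question of linear versus quadratic complexity raised in the overview can be made concrete with the cost estimates given in the Swin Transformer paper. For a feature map of h × w patches with C channels and window size M, global multi-head self-attention costs roughly 4hwC² + 2(hw)²C, while window-based attention costs roughly 4hwC² + 2M²hwC. The sketch below simply evaluates those two expressions; the function names are illustrative, and the values C = 96 and M = 7 are assumed here from the paper's smallest configuration.

```python
def msa_flops(h: int, w: int, c: int) -> int:
    """Approximate cost of global self-attention: quadratic in the patch count h*w."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_flops(h: int, w: int, c: int, m: int = 7) -> int:
    """Approximate cost of attention inside m x m windows: linear in h*w."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    # Assumed illustration values: C = 96 channels, M = 7 window size.
    for side in (56, 112, 224):
        global_cost = msa_flops(side, side, 96)
        window_cost = wmsa_flops(side, side, 96)
        print(
            f"{side}x{side} patches: global {global_cost:.3e}, "
            f"windowed {window_cost:.3e}, ratio {global_cost / window_cost:.1f}"
        )
```

Doubling the side length quadruples the number of patches. The windowed cost grows by exactly that factor of four, while the global cost grows faster because of the (hw)² term.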
Swin Transformer Architecture

1 Hierarchical Feature Maps

How the Swin Transformer uses hierarchical feature maps for improved accuracy.

2 Shifted Windows

What are shifted windows and how do they help overcome the limitations of traditional patch-based transformers?

3 Patch Merging

Explanation of the Swin Transformer's patch merging mechanism and its effect on performance.
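As a rough illustration of patch merging, the NumPy sketch below concatenates the channels of each 2 × 2 group of neighbouring patches and projects the result from 4C to 2C channels, halving the spatial resolution. It is a simplified sketch, not the reference implementation: the function name is hypothetical, the projection weight is random rather than learned, and the layer normalization used in the actual model is omitted.

```python
import numpy as np


def patch_merging(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Merge each 2x2 group of neighbouring patches.

    x:      feature map of shape (H, W, C), with H and W even
    weight: projection of shape (4C, 2C); random here, learned in a real model
    returns a feature map of shape (H/2, W/2, 2C)
    """
    h, w, c = x.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    assert weight.shape == (4 * c, 2 * c), "weight must map 4C to 2C channels"
    # Gather the four patches of every 2x2 neighbourhood and stack their channels.
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )  # shape (H/2, W/2, 4C)
    # Linear projection from 4C down to 2C channels.
    return merged @ weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((56, 56, 96))
    projection = rng.standard_normal((384, 192))
    print(patch_merging(features, projection).shape)  # (28, 28, 192)
```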
Shifted Window Approach
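The shifted window idea can be sketched with a cyclic shift followed by an ordinary window partition: shifting the feature map by half a window before partitioning makes each new window straddle several of the previous layer's windows, so information can flow between them. The helper names below are hypothetical, and the attention mask that the actual model applies to wrapped-around regions is left out of this sketch.

```python
import numpy as np


def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping m x m windows.

    Returns an array of shape (num_windows, m*m, C).
    """
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    x = x.reshape(h // m, m, w // m, m, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, m * m, c)


def shifted_window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Cyclically shift the map by m // 2 in both directions, then partition."""
    shift = m // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, m)


if __name__ == "__main__":
    # A 4x4 map with one channel whose values are the patch indices 0..15.
    grid = np.arange(16).reshape(4, 4, 1)
    print(window_partition(grid, 2)[0, :, 0])          # [0 1 4 5]
    print(shifted_window_partition(grid, 2)[0, :, 0])  # [ 5  6  9 10]
```

In this toy example the first shifted window holds patches 5, 6, 9 and 10, one from each of the four regular windows, which is the cross-window connection the approach relies on.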
Swin Transformer Architecture
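To make the hierarchy concrete, the short sketch below lists the feature-map shape at each stage: the image is first split into patches, and every later stage halves the resolution and doubles the channel count. The function name is illustrative, and the defaults (224 × 224 input, 4 × 4 patches, 96 channels, four stages) are assumed from the paper's smallest configuration.

```python
def stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return the (height, width, channels) of the feature map at each stage."""
    side = image_size // patch_size  # patches per side after patch embedding
    dim = embed_dim
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            # Patch merging between stages: half the resolution, twice the channels.
            side //= 2
            dim *= 2
        shapes.append((side, side, dim))
    return shapes


if __name__ == "__main__":
    for index, shape in enumerate(stage_shapes(), start=1):
        print(f"stage {index}: {shape}")
    # stage 1: (56, 56, 96)
    # stage 2: (28, 28, 192)
    # stage 3: (14, 14, 384)
    # stage 4: (7, 7, 768)
```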
Results and Performance
State-of-the-Art Comparison

How does the Swin Transformer compare to other models in terms of accuracy and efficiency?

Image Dataset Analysis

Evaluation of the Swin Transformer's performance on image classification (ImageNet) and object detection (COCO).

Limitations and Considerations

What factors may affect the Swin Transformer's performance and how can they be addressed?
Application and Future Work

Possible Applications

How can Swin Transformer be used in industries ranging from healthcare to advertising?

Extension Possibilities

What are some potential areas for expanding the Swin Transformer model?

Further Research

What are some possible research questions and areas for improvement?
Shortfalls
1 Hardware Requirements

What are the minimum hardware requirements for training and using the Swin Transformer model?

2 Computational Resources

How substantial are the model's computational resource requirements?

3 Data Size Requirements

Are there any specific dataset and data size requirements when it comes to training and using the model?
Conclusion
Impact

How does the Swin Transformer contribute to the field of computer vision and artificial intelligence?

Future Potential

What are the potential benefits and implications of improved image recognition technology?
