References

[1] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc., 2020.

[2] MMClassification Contributors. OpenMMLab's image classification toolbox and benchmark. https://github.com/open-mmlab/mmclassification, 2020.

[3] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019.

[4] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929, 2020.

[5] Jacob Gildenblat and contributors. PyTorch library for CAM methods. https://github.com/jacobgil/pytorch-grad-cam, 2021.

[6] Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[7] Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. GShard: Scaling giant models with conditional computation and automatic sharding, 2020.

[8] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. CoRR, abs/2103.14030, 2021.

[9] Maria-Elena Nilsback and Andrew Zisserman. 17 category flower dataset, 2006.

[10] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2015.

[11] Ilya O. Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. MLP-Mixer: An all-MLP architecture for vision. CoRR, abs/2105.01601, 2021.

[12] Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, and Hervé Jégou. ResMLP: Feedforward networks for image classification with data-efficient training. CoRR, abs/2105.03404, 2021.

[13] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.

[14] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 568–578, 2021.