Famous Architecture Papers with Code
Image Classification:
Network in Network [Paper] [Note] [Torch Code]
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint
arXiv:1312.4400 (2013).
VGG [Paper] [Note] [Torch Code]
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for
large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
GoogLeNet [Paper] [Note] [Torch Code]
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition. 2015.
ResNet [Paper] [Note] [Torch Code]
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Popular Modules
Dropout [Paper] [Note]
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from
overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
Batch Normalization [Paper] [Note]
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by
reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
Object Detection in Image:
R-CNN [Paper] [Note] [Code]
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies
for accurate object detection and semantic segmentation
Spatial pyramid pooling in deep convolutional networks for visual recognition [Paper] [Note] [Code]
He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional
networks for visual recognition[J]. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 2015, 37(9): 1904-1916.
Fast R-CNN [Paper] [Note] [Code]
Ross Girshick, Fast R-CNN, arXiv:1504.08083.
Faster R-CNN, Microsoft Research [Paper] [Note] [Code] [Python Code]
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-
Time Object Detection with Region Proposal Networks, arXiv:1506.01497.
End-to-end people detection in crowded scenes [Paper] [Note] [Code]
Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded
scenes, arXiv:1506.04878.
You Only Look Once: Unified, Real-Time Object Detection [Paper] [Note] [Code]
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once:
Unified, Real-Time Object Detection, arXiv:1506.02640
Adaptive Object Detection Using Adjacency and Zoom Prediction [Paper] [Note]
Lu Y, Javidi T, Lazebnik S. Adaptive Object Detection Using Adjacency and Zoom
Prediction[J]. arXiv:1512.07711, 2015.
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent
Neural Networks [Paper] [Note]
Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick. arXiv:1512.04143, 2015.
G-CNN: an Iterative Grid Based Object Detector [Paper]
Mahyar Najibi, Mohammad Rastegari, Larry S. Davis. arXiv:1512.07729, 2015.
Object Detection in Video:
Seq-NMS for Video Object Detection [Paper] [Note]
Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad
Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. Seq-NMS
for Video Object Detection. arXiv preprint arXiv:1602.08465, 2016
Image Captioning:
Exploring Nearest Neighbor Approaches for Image Captioning [Paper]
Devlin J, Gupta S, Girshick R, et al. Exploring Nearest Neighbor Approaches for
Image Captioning[J]. arXiv preprint arXiv:1505.04467, 2015.
Show and Tell: A Neural Image Caption Generator [Paper] [Note]
Vinyals, Oriol, et al. "Show and tell: A neural image caption generator."
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
2015.
Image Generation:
Pixel Recurrent Neural Networks [Paper] [Note]
van den Oord A, Kalchbrenner N, Kavukcuoglu K. Pixel Recurrent Neural
Networks[J]. arXiv preprint arXiv:1601.06759, 2016.
Variational Autoencoder [Paper] [Note]
Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint
arXiv:1312.6114, 2013.
DRAW: A recurrent neural network for image generation [Paper] [Torch Code] [TensorFlow Code] [Note]
Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for
image generation[J]. arXiv preprint arXiv:1502.04623, 2015.
Scribbler: Controlling Deep Image Synthesis with Sketch and Color [Paper] [Note]
Patsorn Sangkloy, Jingwan Lu, et al. Scribbler: Controlling Deep Image Synthesis
with Sketch and Color. arXiv preprint arXiv:1612.00835, 2016.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial
Networks [Paper]
Radford A, Metz L, Chintala S. Unsupervised representation learning with deep
convolutional generative adversarial networks[J]. arXiv preprint arXiv:1511.06434,
2015.
Improved Techniques for Training GANs [Paper]
Salimans T, Goodfellow I, Zaremba W, et al. Improved Techniques for Training
GANs[J]. arXiv preprint arXiv:1606.03498, 2016.
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative
Adversarial Nets [Paper]
Chen X, Duan Y, Houthooft R, et al. InfoGAN: Interpretable Representation
Learning by Information Maximizing Generative Adversarial Nets[J]. arXiv preprint
arXiv:1606.03657, 2016.
Image-to-Image Translation with Conditional Adversarial Networks [Paper] [Note]
[Torch Code] [TensorFlow Code]
Isola P, Zhu J Y, Zhou T, et al. Image-to-Image Translation with Conditional
Adversarial Networks[J]. arXiv preprint arXiv:1611.07004, 2016.
Learning to Generate Images of Outdoor Scenes from Attributes and Semantic
Layouts [Paper] [Note]
Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem. Learning to Generate
Images of Outdoor Scenes from Attributes and Semantic Layouts [J]. arXiv
preprint arXiv:1612.00215, 2016.
Learning to Discover Cross-Domain Relations with Generative Adversarial
Networks [Paper] [Note]
Kim, Taeksoo, et al. "Learning to Discover Cross-Domain Relations with
Generative Adversarial Networks." arXiv preprint arXiv:1703.05192 (2017).
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial
Networks [Paper] [Note]
Zhu J Y, Park T, Isola P, et al. Unpaired Image-to-Image Translation using Cycle-
Consistent Adversarial Networks[J]. arXiv preprint arXiv:1703.10593, 2017.
BEGAN: Boundary Equilibrium Generative Adversarial Networks [Paper] [Note]
Berthelot, David, Thomas Schumm, and Luke Metz. "BEGAN: Boundary Equilibrium
Generative Adversarial Networks." arXiv preprint arXiv:1703.10717 (2017).
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial
Networks [Paper] [Note] [TensorFlow Code]
Zhang, Han, et al. "StackGAN: Text to Photo-realistic Image Synthesis with
Stacked Generative Adversarial Networks." arXiv preprint arXiv:1612.03242 (2016).
Image & Language
Learning Deep Representations of Fine-Grained Visual Descriptions [Paper] [Note]
Reed, Scott, et al. "Learning deep representations of fine-grained visual
descriptions." Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2016.
Activation Maximization
Synthesizing the preferred inputs for neurons in neural networks via deep generator
networks [Paper] [Note]
Nguyen A, Dosovitskiy A, Yosinski J, et al. Synthesizing the preferred inputs for
neurons in neural networks via deep generator networks[J]. arXiv preprint
arXiv:1605.09304, 2016.
Style Transfer
A neural algorithm of artistic style [Paper] [Note]
Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv
preprint arXiv:1508.06576, 2015.
Perceptual losses for real-time style transfer and super-resolution [Paper] [Note]
Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and
super-resolution[J]. arXiv preprint arXiv:1603.08155, 2016.
Super Resolution
Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-
Resolution [Paper] [Note]
Il Jun Ahn, Woo Hyun Nam. Texture Enhancement via High-Resolution Style
Transfer for Single-Image Super-Resolution [J]. arXiv preprint arXiv:1612.00085,
2016.
Others
Fully convolutional networks for semantic segmentation [Paper] [Note]
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic
segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2015: 3431-3440.
Open Courses
CS231n: Convolutional Neural Networks for Visual Recognition [Course Page]
CS224d: Deep Learning for Natural Language Processing [Course Page]
Online Books
Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
By: Dr. Mazhar Javed Awan