
Understanding deep learning
A COMPLETE NOVICE’S PERSPECTIVE
Deep learning overview
Why now?
1. Data deluge
2. Cheaper GPUs
3. New techniques
Why is it popular?
Unprecedented performance across many tasks, including:
1. Machine translation
2. Speech recognition
3. Computer vision
4. Reinforcement learning
5. Natural language processing
Machine translation: Before deep learning
Rule-based machine translation (1970s)
◦ Bilingual dictionaries and hand-written linguistic rules
◦ Interlingua
◦ Find a ‘universal language’ to act as an intermediate layer
◦ An impossible task in practice; rules can’t handle the exceptions

Example-based machine translation (1980s)


◦ 1984, Makoto Nagao (Kyoto University)
◦ Learn by analogy from existing example translations

Statistical machine translation (1990s)


◦ Use corpora to extract statistical relationships
Machine translation: Deep learning
2014 paper from Yoshua Bengio’s lab
◦ Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
◦ https://arxiv.org/abs/1406.1078

Basic idea: Recurrent Neural Network Encoder-Decoder


Machine translation: Deep learning
27 September, 2016
A Neural Network for Machine Translation, at Production Scale
◦ https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
A few years ago we started using Recurrent Neural Networks (RNNs) to directly learn the mapping
between an input sequence (e.g. a sentence in one language) to an output sequence (that same
sentence in another language) [2].
Whereas Phrase-Based Machine Translation (PBMT) breaks an input sentence into words and phrases
to be translated largely independently, Neural Machine Translation (NMT) considers the entire input
sentence as a unit for translation.
The advantage of this approach is that it requires fewer engineering design choices than previous
Phrase-Based translation systems. When it first came out, NMT showed equivalent accuracy with
existing Phrase-Based translation systems on modest-sized public benchmark data sets.
Machine translation: Deep learning
Speech recognition
Object recognition
Automatic colouring
Style transfer
Automatic text generation
NLP with deep learning
Word embeddings
Turn text into numbers
◦ Word2Vec

Perform operations on them


Based on shallow neural networks (the resulting vectors are then used as input to deep neural networks); a minimal example is sketched below
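A minimal sketch of the idea, using the gensim library (one possible implementation choice; the toy corpus and sizes are purely illustrative):

```python
# A minimal Word2Vec sketch using gensim (one possible implementation choice).
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

# Train a small skip-gram model; vector_size and window are illustrative values.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Each word is now a numeric vector, so we can do arithmetic on the vectors,
# e.g. look up the words closest to "king".
print(model.wv.most_similar("king"))
```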
Intuition
Automatic hierarchical feature extraction
Types of neural networks
Simple feedforward neural networks
Most common type
◦ Input: 1 vector
◦ Output: a probability, a real number, or multiple outputs
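A minimal sketch of such a network in Keras (the 10-dimensional input and layer sizes are illustrative assumptions):

```python
# A minimal feedforward network sketch using tf.keras.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),              # input: 1 vector of 10 features
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output: a single probability
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```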
Recurrent neural network
Like a feedforward network, but the signal feeds back into itself
Recurrent neural networks
Useful for sequences where the past can affect the future
◦ Natural language
◦ Time series (e.g. finance)

Provide ‘memory’ to neural networks


LSTM (Long Short-Term Memory)
◦ Captures longer-range dependencies
◦ Gated Recurrent Units (GRUs) are a simpler variant
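A minimal sketch of a recurrent model with an LSTM layer in Keras (sequence length and feature sizes are illustrative):

```python
# A minimal recurrent network sketch with an LSTM layer, using tf.keras.
import tensorflow as tf

model = tf.keras.Sequential([
    # Input: sequences of 20 time steps, each with 8 features (illustrative).
    tf.keras.layers.Input(shape=(20, 8)),
    # The LSTM layer carries a hidden state ('memory') across time steps.
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),  # e.g. predict the next value of a time series
])
model.compile(optimizer="adam", loss="mse")
```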
RNN: Neural machine translation
Seq2Seq model
◦ Deep recurrent architecture
◦ Je suis étudiant -> I am a student
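A condensed sketch of the encoder-decoder idea in Keras (vocabulary and layer sizes are illustrative; training details such as teacher forcing are omitted):

```python
# A condensed encoder-decoder (seq2seq) sketch using tf.keras.
import tensorflow as tf

vocab_size, embed_dim, hidden = 5000, 64, 128  # illustrative sizes

# Encoder: reads the source sentence and summarises it in its final state.
enc_inputs = tf.keras.Input(shape=(None,))
enc_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(hidden, return_state=True)(enc_emb)

# Decoder: generates the target sentence, starting from the encoder's state.
dec_inputs = tf.keras.Input(shape=(None,))
dec_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_out, _, _ = tf.keras.layers.LSTM(hidden, return_sequences=True,
                                     return_state=True)(dec_emb,
                                                        initial_state=[state_h, state_c])
outputs = tf.keras.layers.Dense(vocab_size, activation="softmax")(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```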
RNN: Text generation
Feed a sequence of characters
◦ Predict the next character
◦ Recurrent units keep the context

Then feed the output back into itself!
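A sketch of that sampling loop once a character-level model has been trained; `model`, `char_to_idx` and `idx_to_char` are assumed to exist and are not defined here:

```python
# Sampling loop for a trained character-level model. Assumes `model` maps a
# window of seq_len character indices to a probability distribution over the
# next character, and `char_to_idx` / `idx_to_char` are lookup tables.
import numpy as np

def generate(model, seed_text, char_to_idx, idx_to_char, seq_len=40, n_chars=200):
    text = seed_text  # seed is assumed to be at least seq_len characters long
    for _ in range(n_chars):
        window = [char_to_idx[c] for c in text[-seq_len:]]
        x = np.array(window)[None, :]            # shape: (1, seq_len)
        probs = model.predict(x, verbose=0)[0]   # next-character distribution
        next_idx = np.random.choice(len(probs), p=probs)
        text += idx_to_char[next_idx]            # feed the output back in
    return text
```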


Convolutional neural networks
Use a sliding window to capture parts of an image
◦ Then use pooling
◦ E.g. keep only 1 pixel out of 9, or average their values

Allows the extraction of higher-level features


◦ By utilising feature locality
◦ And ignoring noise
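A minimal convolution-plus-pooling sketch in Keras (image size, filter counts and the 10-class output are illustrative):

```python
# A minimal convolutional network sketch using tf.keras.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),               # small RGB images
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),  # sliding 3x3 window
    tf.keras.layers.MaxPooling2D((3, 3)),                   # keep 1 value out of each 3x3 block
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # higher-level features
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),        # e.g. 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```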
Feature extraction
Image classification
VGG (right), Inception module (bottom), AlexNet (middle)
Reinforcement learning
Deep Q-learning
Approach by Google DeepMind
◦ AI company based in London

Create AI that can play video games


◦ Goal to extend to real environments

Current evolution
◦ Networks play against each other
◦ Managed to beat professional Go players
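A heavily simplified sketch of the core deep Q-learning update (state size, action count and discount factor are illustrative; the replay buffer, target network and exploration schedule used in practice are omitted):

```python
# Simplified deep Q-learning update (replay buffer, target network and
# epsilon-greedy exploration are omitted for brevity).
import numpy as np
import tensorflow as tf

n_actions, gamma = 4, 0.99  # illustrative action count and discount factor

# Q-network: maps a state vector to one estimated Q-value per action.
q_net = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),   # illustrative 8-dimensional state
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_actions),
])
q_net.compile(optimizer="adam", loss="mse")

def q_update(state, action, reward, next_state, done):
    """One Bellman-style update: Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    target = q_net.predict(state[None, :], verbose=0)[0]
    future = 0.0 if done else np.max(q_net.predict(next_state[None, :], verbose=0)[0])
    target[action] = reward + gamma * future
    q_net.fit(state[None, :], target[None, :], verbose=0)
```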
Generative Adversarial Network
Putting it all together
Image captioning
Combination of convolutional units and RNN
The same architecture (but with 3D convolutions) can be used for video captioning
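A rough sketch of that combination: a pretrained convolutional network encodes the image and a recurrent decoder generates the caption. The choice of MobileNetV2 and all sizes are illustrative assumptions:

```python
# Image captioning sketch: a CNN encoder plus an RNN decoder, using tf.keras.
import tensorflow as tf

vocab_size, embed_dim, hidden = 8000, 128, 256  # illustrative sizes

# Convolutional encoder: a pretrained network used as a fixed feature extractor.
cnn = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                        input_shape=(224, 224, 3))
cnn.trainable = False

image_in = tf.keras.Input(shape=(224, 224, 3))
image_feat = tf.keras.layers.Dense(hidden)(cnn(image_in))

# Recurrent decoder: generates the caption word by word, conditioned on the
# image by initialising the LSTM state from the image features (one common choice).
caption_in = tf.keras.Input(shape=(None,))
emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(caption_in)
dec = tf.keras.layers.LSTM(hidden, return_sequences=True)(
    emb, initial_state=[image_feat, image_feat])
words = tf.keras.layers.Dense(vocab_size, activation="softmax")(dec)

model = tf.keras.Model([image_in, caption_in], words)
```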
Style transfer
Feed random images to pretrained network
Dual loss (content and style)
Train to combine the two
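A sketch of that dual loss in plain NumPy; in practice the feature maps come from a pretrained network such as VGG, and the helper functions here are illustrative:

```python
# Dual-loss sketch for style transfer, in plain NumPy. The generated image is
# optimised so its features match the content image (content loss) while its
# feature statistics, the Gram matrices, match the style image (style loss).
import numpy as np

def gram_matrix(feats):
    # feats: (height * width, channels) feature map from one network layer.
    return feats.T @ feats / feats.shape[0]

def content_loss(content_feats, generated_feats):
    return np.mean((content_feats - generated_feats) ** 2)

def style_loss(style_feats, generated_feats):
    return np.mean((gram_matrix(style_feats) - gram_matrix(generated_feats)) ** 2)

def total_loss(content_feats, style_feats, generated_feats, alpha=1.0, beta=1e3):
    # alpha and beta trade off content preservation against style strength.
    return (alpha * content_loss(content_feats, generated_feats)
            + beta * style_loss(style_feats, generated_feats))
```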
Image colourisation
Image generation
Through GAN (left – real, right – generated)
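A minimal sketch of the GAN setup: a generator maps random noise to images and a discriminator tries to tell real images from generated ones (sizes are illustrative, and the adversarial training loop is omitted):

```python
# Minimal GAN components: a generator maps noise to images, a discriminator
# classifies images as real or generated. Sizes assume 28x28 grayscale images.
import tensorflow as tf

latent_dim = 64

generator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape((28, 28, 1)),
])

discriminator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of being real
])

# Generate a batch of fake images from random noise and score them.
noise = tf.random.normal((16, latent_dim))
fake_images = generator(noise)
print(discriminator(fake_images).shape)  # (16, 1)
```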
Image translation through GANs
Tools for deep learning
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
TensorFlow
◦ Google
◦ Very flexible
PyTorch
◦ Open source
◦ Facebook, Nvidia, Twitter and other companies develop it
◦ Useful for research
Keras
◦ Higher-level Python interface for TensorFlow
Caffe
◦ Berkeley AI Research
◦ Useful for computer vision
Commoditised services
Google Cloud AI
◦ https://cloud.google.com/products/machine-learning/
◦ Vision, speech-to-text, text-to-speech, translation, and others

IBM
◦ https://www.ibm.com/watson/products-services/
◦ Visual recognition, translation, sentiment analysis, entity extraction

Microsoft Azure
◦ https://azure.microsoft.com/en-gb/solutions/
◦ Vision, NLP, etc.
So when to use deep learning
Amazing for anything relating to
◦ Audio
◦ Computer vision
◦ NLP
Drawbacks
◦ Loads of data
◦ Lots of processing power
◦ Thousands of hyperparameters
◦ Months of training
When to use
◦ ML or stats better for many problems (especially when datasets are smaller)
◦ If you face a computer vision, audio, etc. problem then deep learning is the best bet
◦ Try using a commoditised service before developing your own
◦ Developing your own solution can be cost-effective in the long run (plus you keep the IP)
Learn more
Tesseract Academy
◦ http://tesseract.academy
◦ https://www.youtube.com/playlist?list=PLVce3C5Hi9BBfabvhEzYQTQDYEg2vtuxH
◦ Data science, big data and blockchain for executives and managers.

The Data Scientist
◦ Personal blog
◦ Covers data science, analytics, blockchain, tokenomics and many other subjects
◦ http://thedatascientist.com/what-deep-learning-is-and-isnt/
