
5/20/2017 Learning Deep Learning with Keras

Piotr Migdał - blog


Learning Deep Learning with Keras

30 Apr 2017 Piotr Migdał [machine-learning] [deep-learning] [overview]
see: tweet by François Chollet (the creator of Keras) with over 140 retweets
see: Facebook post by Kaggle with over 200 shares
see: like it? upvote it on Hacker News :)

I teach deep learning both for a living (as the main instructor, in a Kaggle-winning team[1]) and as a part of my volunteering with the Polish Children's Fund, giving workshops to gifted high-school students[2]. I want to share a few things I've learnt about teaching (and learning) deep learning.

Whether you want to start learning deep learning for your career, to have a nice adventure (e.g. with detecting huggable objects) or to get insight into machines before they take over[3], this post is for you! Its goal is not to teach neural networks by itself, but to provide an overview and to point to didactically useful resources.

Don't be afraid of artificial neural networks - it is easy to start! In fact, my biggest regret is having delayed learning them because of their perceived difficulty. To start, all you need is really basic programming, very simple mathematics and knowledge of a few machine learning concepts. I will explain where to start with these requirements.

In my opinion, the best way to start is with a high-level, interactive approach (see also: Quantum mechanics for high-school students and my Quantum Game with Photons). For that reason, I suggest starting with image recognition tasks in Keras, a popular neural network library in Python. If you would like to train neural networks with less code than in Keras, the only viable option is to use pigeons. Yes, seriously: pigeons spot cancer as well as human experts!

What is deep learning and why is it cool?

Deep learning is a name for machine learning techniques using many-layered artificial neural networks. Occasionally people use the term artificial intelligence, but unless you want to sound sci-fi, it is reserved for problems that are currently considered too hard for machines - a frontier that keeps moving rapidly. This is a field that has exploded in the last few years, reaching human-level accuracy in visual recognition tasks (among many other tasks). Unlike quantum computing or nuclear fusion, it is a technology that is being applied right now, not some possibility for the future. There is a rule of thumb:

Pretty much anything that a normal person can do in <1 sec, we can now automate with AI.
- Andrew Ng's tweet

Some people go even further, extrapolating that statement to experts. It's no surprise that companies like Google and Facebook are at the cutting edge of progress. In fact, every few months I am blown away by something exceeding my expectations, e.g.:

The Unreasonable Effectiveness of Recurrent Neural Networks[4] for generating fake Shakespeare, Wikipedia entries and LaTeX articles
A Neural Algorithm of Artistic Style - style transfer (and for videos!)
Real-time Face Capture and Reenactment
Colorful Image Colorization
Plug & Play Generative Networks for photorealistic image generation
Dermatologist-level classification of skin cancer along with other medical diagnostic tools
Image-to-Image Translation (pix2pix) - sketch to photo
Teaching Machines to Draw - sketches of cats, dogs etc.

It looks like some sorcery. If you are curious what neural networks are, take a look at this
series of videos for a smooth introduction:

Neural Networks Demystified by Stephen Welch - video series

A Visual and Interactive Guide to the Basics of Neural Networks by J Alammar

These techniques are data-hungry. See a plot of the AUC score for logistic regression, random forest and deep learning on the Higgs dataset (the number of data points is given in millions):

In general, there is no guarantee that, even with a lot of data, deep learning does better than other techniques, for example tree-based ones such as random forest or boosted trees.

Let's play!
Do I need some Skynet to run it? Actually, no - it's a piece of software, like any other. And you can even play with it in your browser:

TensorFlow Playground for point separation, with a visual interface

ConvNetJS for digit and image recognition
Keras.js Demo - to visualize and use real networks in your browser (e.g. ResNet-50)

Or, if you want to use Keras in Python, see this minimal example - just to convince yourself that you can use it on your own computer.

Python and machine learning

I mentioned basic Python and machine learning as requirements. They are already covered in my introduction to data science in Python, and in its statistics and machine learning section, respectively.

For Python, if you already have the Anaconda distribution (covering most data science packages), the only thing you need is to install TensorFlow and Keras.

When it comes to machine learning, you don't need to learn many techniques before jumping into deep learning. Later, though, it is good practice to check whether a given problem can be solved with much simpler methods. For example, random forest is often a lockpick, working out-of-the-box for many problems. You do need to understand why we train and then test a classifier (to validate its predictive power). To get the gist of it, start with this beautiful tree-based animation:

Visual introduction to machine learning by Stephanie Yee and Tony Chu

Also, it is good to understand logistic regression, which is a building block of almost any
neural network for classification.
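To make "train, then test" and logistic regression concrete, here is a minimal NumPy sketch (not the author's code). Logistic regression is a single linear unit followed by a sigmoid, fitted by gradient descent on log-loss and then scored on held-out samples it never saw during training. The data (two overlapping Gaussian blobs), the 75/25 split and the learning rate are made-up assumptions for illustration.

```python
import numpy as np

# Hypothetical toy data: two overlapping 2D Gaussian blobs, labels 0 and 1.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
               rng.normal(1.0, 1.0, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Shuffle, then hold out 25% of the samples as a test set.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
split = int(0.75 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Logistic regression: one linear unit + sigmoid, fitted by gradient descent on log-loss.
w = np.zeros(2)
b = 0.0
lr = 0.1
for epoch in range(500):
    p = sigmoid(X_train @ w + b)   # predicted probabilities of class 1
    err = p - y_train              # gradient of log-loss w.r.t. the logit, per sample
    w -= lr * X_train.T @ err / len(y_train)
    b -= lr * err.mean()


def accuracy(X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)


train_acc = accuracy(X_train, y_train)
test_acc = accuracy(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Comparing the two numbers is the point: a model that scores much better on the training set than on the test set has memorized rather than learned.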

Deep learning (that is, neural networks with many layers) uses mostly very simple mathematical operations - just many of them. Here are a few that you can find in almost any network (look at this list, but don't get intimidated):

vectors, matrices, multi-dimensional arrays,

addition, multiplication,
convolutions to extract and process local patterns,
activation functions: sigmoid, tanh or ReLU to add non-linearity,
softmax to convert vectors into probabilities,
log-loss (cross-entropy) to penalize wrong guesses in a smart way (see also Kullback-Leibler Divergence Explained),
gradients and chain-rule (backpropagation) for optimizing network parameters,
stochastic gradient descent and its variants (e.g. momentum).
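Several items from this list fit in a dozen lines of NumPy. The sketch below uses hypothetical logits for three classes to show ReLU, softmax and log-loss, and then checks a standard backpropagation result: the gradient of log-loss with respect to the logits is the softmax output minus the one-hot target. The check uses central finite differences; the step size is an assumption chosen for double precision.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softmax(z):
    # Subtracting the max avoids overflow in exp and does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def log_loss(probs, target):
    # Cross-entropy for one example whose true class index is `target`.
    return -np.log(probs[target])


hidden = relu(np.array([-1.5, 0.0, 2.0]))   # negative values are clipped to zero
logits = np.array([2.0, 1.0, -1.0])         # hypothetical network outputs for 3 classes
target = 0
probs = softmax(logits)
loss = log_loss(probs, target)

# Backpropagation through softmax + log-loss: gradient w.r.t. logits is probs - one_hot.
analytic_grad = probs - np.eye(len(logits))[target]

# Numerical check with central finite differences, one logit at a time.
eps = 1e-6
numeric_grad = np.array([
    (log_loss(softmax(logits + eps * d), target)
     - log_loss(softmax(logits - eps * d), target)) / (2 * eps)
    for d in np.eye(len(logits))
])

print("relu:", hidden)
print("probabilities:", probs.round(3), "loss:", round(float(loss), 3))
print("gradient check passed:", np.allclose(analytic_grad, numeric_grad, atol=1e-6))
```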

If your background is in mathematics, statistics, physics[5] or signal processing, most likely you already know more than enough to start!

If your last contact with mathematics was in high school, don't worry. The mathematics of deep learning is simple to the point that a convolutional neural network for digit recognition can be implemented in a spreadsheet (with no macros), see: Deep Spreadsheets with ExcelNet. It is only a proof-of-principle solution - not only inefficient, but also lacking the most crucial part: the ability to train new networks.
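To see why a spreadsheet is enough, here is a minimal NumPy sketch of the core operation: a single-channel 2D convolution (strictly, cross-correlation, which is what deep learning libraries compute) with no padding and stride 1, followed by ReLU. Each output cell is just a weighted sum of a 3x3 patch, i.e. one spreadsheet formula. The image and kernel are toy examples made up for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 2D convolution as used in deep learning (cross-correlation),
    no padding ('valid' mode), stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # One output cell = weighted sum of one kh x kw patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical 5x5 image with a vertical dark-to-bright edge,
# and a kernel that responds to exactly that kind of edge.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = np.maximum(0.0, conv2d_valid(image, kernel))  # followed by ReLU
print(feature_map)
```

The 5x5 input shrinks to a 3x3 feature map, which is bright only where the patch straddles the edge.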

The basics of vector calculus are crucial not only for deep learning, but also for many other machine learning techniques (e.g. word2vec, which I wrote about). To learn them, I recommend starting from one of the following:

J. Ström, K. Åström, and T. Akenine-Möller, Immersive Linear Algebra - a linear algebra book with fully interactive figures
Applied Math and Machine Learning Basics: Linear Algebra from the Deep Learning
Linear algebra cheat sheet for deep learning by Brendan Fortuner

Since there are many references to NumPy, it may be useful to learn its basics:

From Python to Numpy by Nicolas P. Rougier

SciPy lectures: The NumPy array object
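As a small NumPy warm-up, the sketch below pushes a batch through one dense (fully connected) layer, relu(x @ W + b). The sizes (a batch of 4 flattened 28x28 images, 128 units) and the random values are assumptions for illustration; the point is the matrix shapes and how the bias is broadcast over the batch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical batch of 4 flattened 28x28 images (random numbers in place of pixels).
x = rng.random((4, 28 * 28))

# One dense layer with 128 units: h = relu(x @ W + b).
W = rng.normal(0.0, 0.01, size=(28 * 28, 128))
b = np.zeros(128)
h = np.maximum(0.0, x @ W + b)   # b has shape (128,) and is broadcast over the 4 rows

print(x.shape, W.shape, h.shape)
print("parameters in this layer:", W.size + b.size)
```

The parameter count is 784 * 128 weights plus 128 biases, i.e. (inputs + 1) * units.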

At the same time, look back at the meme, at the "What mathematicians think I do" part. It's totally fine to start from magically working code, treating neural network layers like LEGO bricks.

There is a handful of popular deep learning libraries, including TensorFlow, Theano, Torch and Caffe. Each of them has a Python interface (now also for Torch: PyTorch).

So, which to choose? First, as always, screw all subtle performance benchmarks, as
premature optimization is the root of all evil. What is crucial is to start with one which is easy
to write (and read!), one with many online resources, and one that you can actually install on
your computer without too much pain.

Bear in mind that core frameworks are multidimensional array expression compilers with GPU support. Current neural networks can be expressed as such. However, if you just want to work with neural networks, then by the rule of least power I recommend starting with a framework made just for neural networks. For example:

If you like the philosophy of Python (brevity, readability, one preferred way to do things), Keras is for you. It is a high-level library for neural networks, using TensorFlow or Theano as its backend. Also, if you want a propaganda picture, here is a possibly biased (or overfitted?) popularity ranking:

The state of deep learning frameworks (from GitHub metrics), April 2017. - François Chollet (Keras creator)

If you want to consult a different source, based on arXiv papers rather than GitHub activity, see A Peek at Trends in Machine Learning by Andrej Karpathy. Popularity is important - it means that if you want to search for a network architecture, googling for it (e.g. "UNet Keras") is likely to return an example. Where to start learning it? The Keras documentation is nice, and its blog is a valuable resource. For a complete, interactive introduction to deep learning with Keras in Jupyter Notebook, I really recommend:

Deep Learning with Keras and TensorFlow by Valerio Maggio

For shorter ones, try one of these:

Visualizing parts of Convolutional Neural Networks using Keras and Cats by Erik Reppel
Deep learning for complete beginners: convolutional neural networks with Keras by Petar Veličković
Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras by Jason Brownlee (Theano tensor dimension order[6])

There are a few add-ons to Keras which are especially useful for learning it. I created an ASCII summary for sequential models to show data flow inside networks (in a nicer way than model.summary()). It shows layers, dimensions of data (x, y, channels) and the number of free parameters (to be optimized). For example, for a network for digit recognition it might look like:


Input ##### 32 32 3
Conv2D \|/ ------------------- 896 0.1%
relu ##### 32 32 32
Conv2D \|/ ------------------- 9248 0.7%
relu ##### 30 30 32
MaxPooling2D Y max ------------------- 0 0.0%
##### 15 15 32
Dropout | || ------------------- 0 0.0%
##### 15 15 32
Conv2D \|/ ------------------- 18496 1.5%
relu ##### 15 15 64
Conv2D \|/ ------------------- 36928 3.0%
relu ##### 13 13 64
MaxPooling2D Y max ------------------- 0 0.0%
##### 6 6 64
Dropout | || ------------------- 0 0.0%
##### 6 6 64
Flatten ||||| ------------------- 0 0.0%
##### 2304
Dense XXXXX ------------------- 1180160 94.3%
relu ##### 512
Dropout | || ------------------- 0 0.0%
##### 512
Dense XXXXX ------------------- 5130 0.4%
softmax ##### 10
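The numbers in this summary can be reproduced by hand, which is a good exercise in understanding layers. The sketch below is plain Python, not the add-on's code. It assumes 3x3 kernels, 'same' padding for the first Conv2D of each pair and 'valid' padding for the second, and 2x2 max pooling, which is what the shapes suggest. Dropout layers are left out, as they change neither shapes nor parameter counts. A Conv2D layer has (kernel_height * kernel_width * input_channels + 1) * filters parameters, and a Dense layer has (inputs + 1) * units.

```python
def summarize(input_shape, layers, kernel=3, pool=2):
    """Track the data shape and count trainable parameters layer by layer."""
    shape = input_shape
    rows = []
    for layer in layers:
        kind = layer[0]
        params = 0
        if kind == "conv":
            _, filters, padding = layer
            h, w, c = shape
            params = (kernel * kernel * c + 1) * filters  # weights + one bias per filter
            if padding == "valid":                        # 'same' keeps height and width
                h, w = h - kernel + 1, w - kernel + 1
            shape = (h, w, filters)
        elif kind == "pool":
            h, w, c = shape
            shape = (h // pool, w // pool, c)             # floor division: 13 -> 6
        elif kind == "flatten":
            size = 1
            for d in shape:
                size *= d
            shape = (size,)
        elif kind == "dense":
            _, units = layer
            params = (shape[0] + 1) * units               # weights + one bias per unit
            shape = (units,)
        rows.append((kind, shape, params))
    return rows


LAYERS = [
    ("conv", 32, "same"), ("conv", 32, "valid"), ("pool",),
    ("conv", 64, "same"), ("conv", 64, "valid"), ("pool",),
    ("flatten",), ("dense", 512), ("dense", 10),
]

rows = summarize((32, 32, 3), LAYERS)
total = sum(params for _, _, params in rows)
for kind, shape, params in rows:
    print(f"{kind:8} {str(shape):14} {params:>8} {100 * params / total:5.1f}%")
print("total parameters:", total)
```

The result matches the summary: 1180160 of the 1250858 parameters (about 94%) sit in the first Dense layer, right after Flatten.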

You might also be interested in nicer progress bars with keras-tqdm, exploration of activations at each layer with quiver, or converting Keras models to JavaScript, runnable in a browser, with Keras.js.

If not Keras, then I recommend starting with bare TensorFlow. It is a bit more low-level and
verbose, but makes it straightforward to optimize various multidimensional array (or, well,
tensor) operations. A few good resources:

the official TensorFlow Tutorial is very good

Learn TensorFlow and deep learning, without a Ph.D. by Martin Görner
TensorFlow Tutorial and Examples for beginners by Aymeric Damien (with Python 2.7)
Simple tutorials using Google's TensorFlow Framework by Nathan Lintz

In any case, TensorBoard makes it easy to keep track of the training process. It can also be
used with Keras, via callbacks.

Theano is similar to TensorFlow, but a bit older and harder to start with. For example, you need to manually write updates of variables. Typical neural network layers are not included, so one often uses libraries such as Lasagne. If you're looking for a place to start, I like this:

Theano Tutorial by Marek Rei
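For a feel of what "manually writing updates" means, here is stochastic gradient descent with momentum (from the earlier list) spelled out in NumPy rather than Theano. The objective is a made-up quadratic with a known minimum, so the result can be checked; in a real network, grad(w) would come from backpropagation on a mini-batch. The learning rate and momentum values are illustrative assumptions.

```python
import numpy as np

# Hypothetical objective: f(w) = 0.5 * w @ A @ w - b @ w, minimized at w* = A^{-1} b.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])


def grad(w):
    # Gradient of f; in a neural network this would come from backpropagation.
    return A @ w - b


w = np.zeros(2)          # parameters
velocity = np.zeros(2)   # momentum buffer, one entry per parameter
lr, momentum = 0.1, 0.9

for step in range(300):
    # The update rules, written out by hand:
    velocity = momentum * velocity - lr * grad(w)
    w = w + velocity

w_star = np.linalg.solve(A, b)
print("found:", w.round(4), "exact:", w_star.round(4))
```

In Theano, the two update lines would be expressed as (shared variable, new value) pairs passed to the compiled training function; libraries like Keras generate them for you.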

At the same time, if you see some nice code in Torch or PyTorch, don't be afraid to install and run it!

Every machine learning problem needs data. You cannot just tell it "detect if there is a cat in this picture" and expect the computer to tell you the answer. You need to show many instances of cats, and pictures not containing cats, and (hopefully) it will learn to generalize to other cases. So, you need some data to start. And it is not a drawback of machine learning or just deep learning - it is a fundamental property of any learning!

Before you dive into uncharted waters, it is good to take a look at some popular datasets. The key part about them is that they are popular. It means that you can find a lot of examples of what works, and have a guarantee that these problems can be solved with neural networks.

Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision]. - François Chollet's tweet

Still, I recommend starting with the MNIST digit recognition dataset (60k grayscale 28x28 images), included in keras.datasets. It is not necessary to master it; it is enough to get a sense that it works at all (or to test the basics of Keras on your local machine).
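Whatever the dataset, the arrays usually need the same light preprocessing before training a convolutional network. The sketch below uses random uint8 arrays as a hypothetical stand-in for the MNIST arrays that keras.datasets.mnist.load_data() returns (images of shape (n, 28, 28) and integer labels 0-9), so it runs without downloading anything; the 100-sample size is arbitrary.

```python
import numpy as np

# Hypothetical stand-in for MNIST arrays: uint8 images (n, 28, 28) and labels 0-9.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
y_train = rng.integers(0, 10, size=100)

# Typical preprocessing before a convolutional network:
x = x_train.astype("float32") / 255.0      # scale pixel values to [0, 1]
x = x.reshape(-1, 28, 28, 1)               # add a channel axis (channels last)
y = np.eye(10, dtype="float32")[y_train]   # one-hot encode the labels

print(x.shape, y.shape, float(x.min()), float(x.max()))
```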

Indeed, I once even proposed that the toughest challenge facing AI workers is to answer the question: "What are the letters 'A' and 'I'?" - Douglas R. Hofstadter (1995)

A more interesting dataset, and harder for classical machine learning algorithms, is
notMNIST (letters A-J from strange fonts). If you want to start with it, here is my code for
notMNIST loading and logistic regression in Keras.

If you want to play with image recognition, there is the CIFAR dataset, a dataset of 32x32 photos (also in keras.datasets). It comes in two versions: 10 simple classes (including cats, dogs, frogs and airplanes) and 100 harder and more nuanced classes (including beaver, dolphin, otter, seal and whale). I strongly suggest starting with CIFAR-10, the simpler version. Beware, more complicated networks may take quite some time (~12h on the CPU of my 7-year-old MacBook Pro).

Deep learning requires a lot of data. If you want to train your network from scratch, it may
require as many as ~10k images even if low-resolution (32x32). Especially if data is scarce,
there is no guarantee that a network will learn anything. So, what are the ways to go?

use really low res (if your eye can see it, no need to use higher resolution)
get a lot of data (for images like 256x256 it may be: millions of instances)
re-train a network that already saw a lot
generate much more data (with rotations, shifts, distortions)

Often, it's a combination of everything mentioned here.
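The last option, generating more data, can be sketched in a few lines of NumPy. This is a minimal illustration, not a replacement for a library augmentation pipeline: it applies random flips, rotations by multiples of 90 degrees and small wrap-around shifts to a hypothetical random 32x32 RGB image. For natural photos, small-angle rotations that keep objects upright are often more appropriate; 90-degree rotations are used here only because they need no interpolation.

```python
import numpy as np


def augment(image, rng):
    """Randomly flip, rotate (by a multiple of 90 degrees) and shift an (h, w, channels) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                              # horizontal flip
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k=k, axes=(0, 1))               # rotate in the image plane
    dy, dx = (int(s) for s in rng.integers(-2, 3, size=2))
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))     # shift up to 2 px, wrapping around
    return out


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # hypothetical 32x32 RGB image
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)
```

Each augmented copy contains exactly the same pixel values, only rearranged, so the label stays valid while the network sees a "new" example.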

Standing on the shoulders of giants

Creating a new neural network has a lot in common with cooking - there are typical ingredients (layers) and recipes (popular network architectures). The most important cooking contest is the ImageNet Large Scale Visual Recognition Challenge, with recognition of hundreds of classes in a dataset of half a million photos. Look at these Neural Network Architectures, typically using 224x224x3 input (chart by Eugenio Culurciello):

Circle size represents the number of parameters (a lot!). It doesn't mention SqueezeNet, though, an architecture vastly reducing the number of parameters (e.g. 50x fewer).
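One way SqueezeNet saves parameters is its "fire module": a 1x1 "squeeze" convolution down to a few channels, followed by parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated. The plain-Python arithmetic below compares one such module with an ordinary 3x3 convolution that has the same number of input and output channels. The channel sizes are assumptions chosen for illustration, and the ratio applies to this one block, not to the whole network.

```python
def conv_params(kernel, c_in, c_out):
    # weights (kernel x kernel x c_in per filter) plus one bias per filter
    return (kernel * kernel * c_in + 1) * c_out


c_in = 128
plain = conv_params(3, c_in, 128)   # ordinary 3x3 convolution, 128 -> 128 channels

# Fire module: squeeze to 16 channels, then expand to 64 (1x1) + 64 (3x3) = 128 channels.
squeeze, expand1, expand3 = 16, 64, 64
fire = (conv_params(1, c_in, squeeze)
        + conv_params(1, squeeze, expand1)
        + conv_params(3, squeeze, expand3))

print(plain, fire, round(plain / fire, 1))
```

Most of the saving comes from feeding the expensive 3x3 filters only 16 channels instead of 128.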

A few key networks for image classification can be readily loaded from the keras.applications
module: Xception, VGG16, VGG19, ResNet50, InceptionV3. Some others are not as plug &
play, but still easy to find online - yes, there is SqueezeNet in Keras. These networks serve
two purposes:

they give insight into useful building blocks and architectures

they are great candidates for retraining (so-called transfer learning, when using the architecture along with pre-trained weights)

Some other important network architectures for images:

U-Net: Convolutional Networks for Biomedical Image Segmentation

Retina blood vessel segmentation with a convolution neural network - Keras
Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition,
using Keras
A Neural Algorithm of Artistic Style
Neural Style Transfer & Neural Doodles implemented in Keras by Somshubra
A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN by Dhruv

Another set of insights:

The Neural Network Zoo by Fjodor van Veen

How to train your Deep Neural Network - how many layers, parameters, etc

For very small problems (e.g. MNIST, notMNIST), you can use your personal computer -
even if it is a laptop and computations are on CPU.

For small problems (e.g. CIFAR, the unreasonable RNN), you might still be able to use a PC, but it requires much more patience and trade-offs.

For medium and larger problems, essentially the only way to go is to use a machine with a strong graphics card (GPU). For example, it took us 2 days to train a model for satellite image processing for a Kaggle competition, see our:

Deep learning for satellite imagery via image segmentation by Arkadiusz Nowaczyński

On a strong CPU it would have taken weeks, see:

Benchmarks for popular convolutional neural network models by Justin Johnson

The easiest, and the cheapest, way to use a strong GPU is to rent a remote machine on a per-hour basis. You can use Amazon (it is not only a bookstore!); here are some guides:

Keras with GPU on Amazon EC2 - a step-by-step instruction by Mateusz Sieniawski, my

Running Jupyter notebooks on GPU on AWS: a starter guide by Francois Chollet

Further learning
I encourage you to interact with code. For example, notMNIST or CIFAR-10 can be great starting points. Sometimes the best start is to take someone else's code and run it, then see what happens when you modify parameters.

For learning how it works, this one is a masterpiece:

CS231n: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy and
the lecture videos

When it comes to books, there is a wonderful one, starting with an introduction to mathematics and machine learning context (it even covers log-loss and entropy in a way I like!):

Deep Learning, An MIT Press book by Ian Goodfellow, Yoshua Bengio and Aaron

Alternatively, you can use this one (it may be good for an introduction with interactive materials, but I've found the style a bit long-winded):

Neural Networks and Deep Learning by Michael Nielsen

Other materials
There are many applications of deep learning (it's not only image recognition!). I collected some introductory materials to cover its various aspects (beware: they are of varying difficulty). Don't try to read them all - I list them for inspiration, not intimidation!

The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
How convolutional neural networks see the world - Keras Blog
What convolutional neural networks look at when they see nudity - Clarifai Blog
Convolutional neural networks for artistic style transfer by Harish Narayanan
Dreams, Drugs and ConvNets - my slides (NSFW); I am considering turning it into a
longer post on machine learning vs human learning, based on common mistakes
Yes you should understand backprop by Andrej Karpathy
Transfer Learning using Keras by Prakash Vanapalli
Generative Adversarial Networks (GANs) in 50 lines of code (PyTorch)
Minimal and Clean Reinforcement Learning Examples
An overview of gradient descent optimization algorithms by Sebastian Ruder
Picking an optimizer for Style Transfer by Slav Ivanov
Building Autoencoders in Keras by Francois Chollet
Understanding LSTM Networks by Chris Olah
Recurrent Neural Networks & LSTMs by Rohan Kapur
Oxford Deep NLP 2017 course
List of resources
How to Start Learning Deep Learning by Ofir Press
A Guide to Deep Learning by YN^2
Staying up-to-date:
r/MachineLearning Reddit channel covering most of new stuff
- an interactive, visual, open-access journal for machine learning research, with expository articles
my links at - though, just saving, not an
automatic recommendation
@fastml_extra Twitter channel
GitXiv for papers with code
don't be afraid to read academic papers; some are well-written and insightful (if you own a Kindle or another e-reader, I recommend Dontprint)
Data (usually from challenges)
AF Classification from a short single lead ECG recording: the PhysioNet/Computing
in Cardiology Challenge 2017

iNaturalist 2017 Competition (675k images with 5k species), vide Mushroom AI

I would like to thank Kasia Kulma, Martina Pugliese, Paweł Subko, Monika Pawłowska and Łukasz Kidziński for helpful feedback on the content, and Sarah Martin for polishing my

If you recommend a source that helped you with your adventure with deep learning, feel free to contact me! (@pmigdal for short links, an email for longer remarks.)

The deep learning meme is not mine - I've just rewritten it from Theano to Keras (with TensorFlow backend).

1. NOAA Right Whale Recognition, Winners' Interview (1st place, Jan 2016), and a fresh one: Deep learning for satellite imagery via image segmentation (4th place, Apr

2. This January, during a 5-day workshop, 6 high-school students participated in a rather NSFL project - constructing a neural network for detecting trypophobia triggers, see e.g. grzegorz225/trypophobia-detector and cytadela8/trypophobia_detector.

3. It made a few episodes of webcomics obsolete: xkcd: Tasks (totally, by Park or Bird?), xkcd: Game AI (partially, by AlphaGo), PHD Comics: If TV Science was more like REAL Science (not exactly, but still it's cool, by LapSRN).

4. The title alludes to The Unreasonable Effectiveness of Mathematics in the Natural Sciences by Eugene Wigner (1960), one of my favourite texts in philosophy of science, along with More is Different by P. W. Anderson (1972) and Genesis and Development of a Scientific Fact (pdf here) by Ludwik Fleck (1935).

5. If your background is in quantum information, the only thing you need to change is to
. Just expect less tensor structure, but more convolutions.

6. Is it only me, or does Theano tensor dimension order sound like some secret convent? Before you start searching how to join it: it is about the shape of multi-dimensional arrays: (samples, channels, x, y) rather than TensorFlow's (samples, x, y, channels).
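Converting between the two orders is a single axis permutation. A minimal NumPy sketch on a hypothetical batch of eight 32x32 RGB images (Keras 2 calls the two conventions channels_first and channels_last):

```python
import numpy as np

# Hypothetical batch: 8 RGB images of 32x32 pixels in Theano order (samples, channels, x, y).
batch_th = np.arange(8 * 3 * 32 * 32).reshape(8, 3, 32, 32)

# Move the channel axis to the end: TensorFlow order (samples, x, y, channels).
batch_tf = np.transpose(batch_th, (0, 2, 3, 1))

# The inverse permutation restores the original layout.
batch_back = np.transpose(batch_tf, (0, 3, 1, 2))

print(batch_th.shape, "->", batch_tf.shape)
```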




Piotr Migdał - a data science freelancer, with a PhD in quantum physics; based in Warsaw, Poland. Believing in side projects, active in gifted education, developing the Quantum Game and working as a data science instructor.

Python (+ of R) for data analysis and machine learning, JavaScript for data visualization. Currently focusing on deep learning.