You'll also learn about model evaluation and validation, an important technique for training
and assessing neural networks. We also have guest instructor Andrew Trask, author of
Grokking Deep Learning, who will develop a neural network for processing text and predicting
sentiment.
You'll also use convolutional networks to build an autoencoder, a network architecture used
for image compression and denoising. Then, you'll use a pretrained neural network (VGGnet)
to classify images of flowers the network has never seen before, a technique known
as transfer learning.
Then, you'll learn about word embeddings and implement the Word2Vec model, a network
that can learn about semantic relationships between words. These are used to increase the
efficiency of networks when you're processing text.
Applying Deep Learning
Back to Home
01. Introduction
02. Style Transfer
03. DeepTraffic
04. Flappy Bird
05. Books to Read
02. Style Transfer
Style Transfer
As an example of the kind of things you'll be building with deep learning models, here is a
really fun project, fast style transfer. Style transfer allows you to take famous paintings, and
recreate your own images in their styles! The network learns the underlying techniques of
those paintings and figures out how to apply them on its own. This model was trained on the
styles of famous paintings and is able to transfer those styles to other images and even videos!
I used it to style my cat Chihiro in the style of Hokusai's The Great Wave Off Kanagawa.
DeepTraffic
Another great application of deep learning is in simulating traffic and making driving
decisions. You can find the DeepTraffic simulator here. The network here is attempting
to learn a driving strategy such that the car is moving as fast as possible
using reinforcement learning. The network is rewarded when the car chooses actions
that result in it moving fast. It's this feedback that allows the network to find a
strategy of actions for optimal speed.
To learn more about setting the parameters and training the network, read
the overview here.
Discuss how you built your network and your results with your fellow students in
Study Groups.
04. Flappy Bird
Flappy Bird
In this example, you'll get to see a deep learning agent playing Flappy Bird! You have the
option to train the agent yourself, but for now let's just start with the pre-trained network given
by the author. Note that the following agent is able to play without being told any information
about the structure of the game or its rules. It automatically discovers the rules of the game by
finding out how it did on each iteration.
Instructions
1. Install miniconda or anaconda if you have not already. You can follow our tutorial for help.
2. Create an environment for flappybird
o Mac/Linux: conda create --name=flappybird python=2.7
o Windows: conda create --name=flappybird python=3.5
3. Enter your conda environment
o Mac/Linux: source activate flappybird
o Windows: activate flappybird
4. conda install -c menpo opencv3
5. pip install pygame
6. pip install tensorflow
7. git clone https://github.com/yenchenlin/DeepLearningFlappyBird.git
8. cd DeepLearningFlappyBird
9. python deep_q_network.py
If all went correctly, you should see a deep learning agent playing Flappy Bird! The
repository contains instructions for training your own agent if you're interested!
Books to read
We believe that you learn best when you are exposed to multiple perspectives on the
same idea. As such, we recommend checking out a few of the books below to get an
added perspective on Deep Learning.
Neural Networks And Deep Learning by Michael Nielsen. This book is more
rigorous than Grokking Deep Learning and includes a lot of fun, interactive
visualizations to play with.
The Deep Learning Textbook from Ian Goodfellow, Yoshua Bengio, and Aaron
Courville. This online book contains a lot of material and is the most rigorous of
the three books suggested.
INSTRUCTOR NOTE:
Anaconda is a distribution of packages built for data science. It comes with conda, a
package and environment manager. You'll be using conda to create environments for
isolating your projects that use different versions of Python and/or different packages.
You'll also use it to install, uninstall, and update packages in your environments. Using
Anaconda has made my life working with data much more pleasant.
04. Installing Anaconda
Installation instructions
Installing Anaconda
Anaconda is available for Windows, Mac OS X, and Linux. You can find the installers
and installation instructions at https://www.anaconda.com/download/.
If you already have Python installed on your computer, this won't break anything.
Instead, the default Python used by your scripts and programs will be the one that
comes with Anaconda.
Choose the Python 3.6 version; you can install Python 2 versions later. Also, choose
the 64-bit installer if you have a 64-bit operating system; otherwise, go with the 32-bit
installer. Go ahead and choose the appropriate version, then install it. Continue on
afterwards!
After installation, you're automatically in the default conda environment with all
packages installed, which you can see below. You can check out your own install by
entering conda list into your terminal.
On Windows
A bunch of applications are installed along with Anaconda:
To avoid errors later, it's best to update all the packages in the default environment.
Open the Anaconda Prompt application. In the prompt, run the following
commands:
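conda upgrade conda
conda upgrade --all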
Note: In the previous step, running conda upgrade conda should not be necessary
because --all includes the conda package itself, but some users have encountered
errors without it.
In the rest of this lesson, I'll be asking you to use commands in your terminal. I highly
suggest you start working with Anaconda this way, then later use the GUI if you'd like.
Troubleshooting
If you see a "conda command not found" error and are using ZShell, you
have to do the following:
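A common fix (a sketch; the path assumes a default Anaconda install in your home directory, so adjust it if yours differs) is to add conda's bin directory to your PATH in ~/.zshrc and reload the shell:

# in ~/.zshrc (the install path here is an assumption based on a default install)
export PATH="$HOME/anaconda3/bin:$PATH"

Then run source ~/.zshrc .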
You can install multiple packages at the same time. Something like conda install
numpy scipy pandas will install all those packages simultaneously. It's also possible to
specify which version of a package you want by adding the version number such
as conda install numpy=1.10 .
Conda also automatically installs dependencies for you. For example, scipy depends
on numpy : it uses and requires it. If you install just scipy ( conda install scipy ),
Conda will also install numpy if it isn't already installed.
Most of the commands are pretty intuitive. To uninstall, use conda remove
package_name . To update a package, use conda update package_name . If you want to
update all packages in an environment, which is often useful, use conda update
--all . And finally, to list installed packages, it's conda list , which you've seen
before.
If you don't know the exact name of the package you're looking for, you can try
searching with conda search *search_term* . For example, I know I want to
install Beautiful Soup, but I'm not sure of the exact package name. So, I try conda
search *beautifulsoup* . Note that your shell might expand the wildcard * before
running the conda command. To fix this, wrap the search string in single or double
quotes like conda search '*beautifulsoup*' .
It returns a list of the Beautiful Soup packages available with the appropriate package
name, beautifulsoup4 .
conda install numpy
conda install pandas
conda install numpy pandas
07. More environment actions
Saving and loading environments
A really useful feature is sharing environments so others can install all the packages
used in your code, with the correct versions. You can save the packages to a YAML file
with conda env export > environment.yaml . The first part conda env export writes
out all the packages in the environment, including the Python version.
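For example, an exported file looks something like this (a trimmed sketch; the environment name and package versions are illustrative):

name: my-env
channels:
  - defaults
dependencies:
  - python=3.6
  - numpy=1.13.3
  - pandas=0.20.3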
Above you can see the name of the environment and all the dependencies (along with
versions) are listed. The second part of the export command, >
environment.yaml writes the exported text to a YAML file environment.yaml . This file
can now be shared and others will be able to create the same environment you used
for the project.
Listing environments
If you forget what your environments are named (happens to me sometimes),
use conda env list to list out all the environments you've created. You should see a
list of environments; there will be an asterisk next to the environment you're currently
in. The default environment, the environment used when you aren't in one, is
called root .
Removing environments
If there are environments you don't use anymore, conda env remove -n env_name will
remove the specified environment (here, named env_name ).
I’ve also found it useful to create environments for each project I’m working on. It works great
for non-data related projects too like web apps with Flask. For example, I have an environment
for my personal blog using Pelican.
Sharing environments
When sharing your code on GitHub, it's good practice to make an environment file and include
it in the repository. This will make it easier for people to install all the dependencies for your
code. I also usually include a pip requirements.txt file using pip freeze (learn more
here) for people not using conda.
More to learn
To learn more about conda and how it fits in the Python ecosystem, check out this article by
Jake Vanderplas: Conda myths and misconceptions. And here's the conda documentation you
can reference later.
09. On Python versions at Udacity
Python versions at Udacity
Most Nanodegree programs at Udacity will be (or are already) using Python 3 almost
exclusively.
At this point, there are enough new features in Python 3 that it doesn't make much sense to
stick with Python 2 unless you're working with old code. All new Python code should be
written for version 3. Read more here.
For most of Python's history, including Python 2, printing was done with a statement, like so:
print "Hello", "world!"
> Hello world!
The print function was back-ported to Python 2 in version 2.6 through
the __future__ module:
# In Python 2.6+
from __future__ import print_function
print("Hello", "world!")
> Hello world!
The print statement doesn't work in Python 3. If you want to print something and have it
work in both Python versions, you'll need to import print_function in your Python 2 code.
Jupyter Notebooks
Back to Home
01. Instructor
02. What are Jupyter notebooks?
03. Installing Jupyter Notebook
04. Launching the notebook server
05. Notebook interface
06. Code cells
07. Markdown cells
08. Keyboard shortcuts
09. Magic keywords
10. Converting notebooks
11. Creating a slideshow
12. Finishing up
02. What are Jupyter notebooks?
What are Jupyter notebooks?
Welcome to this lesson on using Jupyter notebooks. The notebook is a web application that
allows you to combine explanatory text, math equations, code, and visualizations all in one
easily sharable document. For example, here's one of my favorite notebooks shared recently,
the analysis of gravitational waves from two colliding black holes detected by the LIGO
experiment. You could download the data, run the code in the notebook, and repeat the
analysis, in effect detecting the gravitational waves yourself!
Notebooks have quickly become an essential tool when working with data. You'll find them
being used for data cleaning and exploration, visualization, machine learning, and big data
analysis. Here's an example notebook I made for my personal blog that shows off many of the
features of notebooks. Typically you'd be doing this work in a terminal, either the normal
Python shell or with IPython. Your visualizations would be in separate windows, any
documentation would be in separate documents, along with various scripts for functions and
classes. However, with notebooks, all of these are in one place and easily read together.
Notebooks are also rendered automatically on GitHub. It’s a great feature that lets you easily
share your work. There is also http://nbviewer.jupyter.org/ that renders the notebooks from
your GitHub repo or from notebooks stored elsewhere.
Literate programming
Notebooks are a form of literate programming proposed by Donald Knuth in 1984. With
literate programming, the documentation is written as a narrative alongside the code instead of
sitting off on its own. In Donald Knuth's words,
Instead of imagining that our main task is to instruct a computer what to do, let us concentrate
rather on explaining to human beings what we want a computer to do.
After all, code is written for humans, not for computers. Notebooks provide exactly this
capability. You are able to write documentation as narrative text, along with code. This is not
only useful for the people reading your notebooks, but for your future self coming back to the
analysis.
Just a small aside: recently, this idea of literate programming has been extended to a whole
programming language, Eve.
From Jupyter documentation
The central point is the notebook server. You connect to the server through your browser and
the notebook is rendered as a web app. Code you write in the web app is sent through the
server to the kernel. The kernel runs the code and sends it back to the server, then any output is
rendered back in the browser. When you save the notebook, it is written to the server as a
JSON file with a .ipynb file extension.
The great part of this architecture is that the kernel doesn't need to run Python. Since the
notebook and the kernel are separate, code in any language can be sent between them. For
example, two of the earlier non-Python kernels were for the R and Julia languages. With an R
kernel, code written in R will be sent to the R kernel where it is executed, exactly the same as
Python code running on a Python kernel. IPython notebooks were renamed because notebooks
became language agnostic. The new name Jupyter comes from the combination
of Julia, Python, and R. If you're interested, here's a list of available kernels.
Another benefit is that the server can be run anywhere and accessed via the internet. Typically
you'll be running the server on your own machine where all your data and notebook files are
stored. But, you could also set up a server on a remote machine or cloud instance like
Amazon's EC2. Then, you can access the notebooks in your browser from anywhere in the
world.
03. Installing Jupyter Notebook
Installing Jupyter Notebook
By far the easiest way to install Jupyter is with Anaconda. Jupyter notebooks automatically
come with the distribution. You'll be able to use notebooks from the default environment.
Jupyter notebooks are also available through pip with pip install jupyter notebook .
04. Launching the notebook server
Launching the notebook server
To start a notebook server, enter jupyter notebook in your terminal or console. This will
start the server in the directory you ran the command in. That means any notebook files will be
saved in that directory. Typically you'd want to start the server in the directory where your
notebooks live. However, you can navigate through your file system to where the notebooks
are.
When you run the command (try it yourself!), the server home should open in your browser.
By default, the notebook server runs at http://localhost:8888 . If you aren't familiar with
this, localhost means your computer and 8888 is the port the server is communicating on.
As long as the server is still running, you can always come back to it by going to
http://localhost:8888 in your browser.
If you start another server, it'll try to use port 8888 , but since it is occupied, the new server
will run on port 8889 . Then, you'd connect to it at http://localhost:8889 . Every
additional notebook server will increment the port number like this.
If you tried starting your own server, it should look something like this:
05. Notebook interface
Notebook interface
When you create a new notebook, you should see something like this:
06. Code cells
Code cells
Most of your work in notebooks will be done in code cells. This is where you write your code
and it gets executed. In code cells you can write any code, assigning variables, defining
functions and classes, importing packages, and more. Any code executed in one cell is
available in all other cells.
To give you some practice, I created a notebook you can work through. Download the
notebook Working With Code Cells below then run it from your own notebook server. (In
your terminal, change to the directory with the notebook file, then enter jupyter notebook )
Your browser might try to open the notebook file without downloading it. If that happens,
right click on the link then choose "Save Link As…"
07. Markdown cells
Markdown cells
As mentioned before, cells can also be used for text written in Markdown. Markdown is a
formatting syntax that allows you to include links, style text as bold or italicized, and format
code. As with code cells, you press Shift + Enter or Control + Enter to run a Markdown
cell, which renders the Markdown to formatted text. Including text allows you to write a
narrative alongside your code, as well as documenting your code and the thoughts that went
into it.
You can find the documentation here, but I'll provide a short primer.
Headers
You can write headers using the pound/hash/octothorpe symbol # placed before the text.
One # renders as an h1 header, two # s render as an h2, and so on. It looks like this:
# Header 1
## Header 2
### Header 3
renders as
Header 1
Header 2
Header 3
Links
Linking in Markdown is done by enclosing text in square brackets and the URL in
parentheses, like this [Udacity's home page](https://www.udacity.com) for a link
to Udacity's home page.
Emphasis
You can add emphasis through bold or italics with asterisks or underscores ( * or _ ). For
italics, wrap the text in one asterisk or underscore, _gelato_ or *gelato* renders as gelato.
Bold text uses two symbols, **aardvark** or __aardvark__ looks like aardvark.
Either asterisks or underscores are fine as long as you use the same symbol on both sides of
the text.
Code
There are two different ways to display code, inline with text and as a code block separated
from the text. To format inline code, wrap the text in backticks. For
example, `string.punctuation` renders as string.punctuation .
To create a code block, start a new line and wrap the text in three backticks
```
import requests
response = requests.get('https://www.udacity.com')
```
or indent each line of the code block with four spaces.

    import requests
    response = requests.get('https://www.udacity.com')
Math expressions
You can create math expressions in Markdown cells using LaTeX symbols. Notebooks use
MathJax to render the LaTeX symbols as math symbols. To start math mode, wrap the LaTeX
in dollar signs $y = mx + b$ for inline math. For a math block, use double dollar signs,
$$
y = \frac{a}{b+c}
$$
This is a really useful feature, so if you don't have experience with LaTeX please read this
primer on using it to create math expressions.
Wrapping up
Here's a cheatsheet you can use as a reference for writing Markdown. My advice is to make
use of the Markdown cells. Your notebooks will be much more readable compared to a bunch
of code blocks.
08. Keyboard shortcuts
Keyboard shortcuts
Notebooks come with a bunch of keyboard shortcuts that let you use your keyboard to interact
with the cells, instead of using the mouse and toolbars. They take a bit of time to get used to,
but when you're proficient with the shortcuts you'll be much faster at working in notebooks.
To learn more about the shortcuts and get practice using them, download the
notebook Keyboard Shortcuts below. Again, your browser might try to open it, but you want
to save it to your computer. Right click on the link, then choose "Save Link As…"
09. Magic keywords
Magic keywords
Magic keywords are special commands you can run in cells that let you control the notebook
itself or perform system calls such as changing directories. For example, you can set up
matplotlib to work interactively in the notebook with %matplotlib .
Magic commands are preceded with one or two percent signs ( % or %% ) for line magics and
cell magics, respectively. Line magics apply only to the line the magic command is written on,
while cell magics apply to the whole cell.
NOTE: These magic keywords are specific to the normal Python kernel. If you are using other
kernels, these most likely won't work.
Timing code
At some point, you'll probably spend some effort optimizing code to run faster. Timing how
quickly your code runs is essential for this optimization. You can use the timeit magic
command to time how long it takes for a function to run, like so:
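For example (a minimal sketch; the code being timed is just a stand-in), %timeit times a single line, while %%timeit at the top of a cell times the whole cell:

%timeit sum(range(100000))

%%timeit
total = 0
for i in range(100000):
    total += i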
10. Converting notebooks
Converting notebooks
Notebooks are just big JSON files with the extension .ipynb .
Since notebooks are JSON, it is simple to convert them to other formats. Jupyter comes with a
utility called nbconvert for converting to HTML, Markdown, slideshows, etc.
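For example, to convert a notebook file to HTML, run nbconvert from your terminal (notebook.ipynb is a placeholder name):

jupyter nbconvert --to html notebook.ipynb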
The slides are created in notebooks like normal, but you'll need to designate which cells are
slides and the type of slide the cell will be. In the menu bar, click View > Cell Toolbar >
Slideshow to bring up the slide cell menu on each cell.
This will show a menu dropdown on each cell that lets you choose how the cell shows up in
the slideshow.
Choose slide type
Slides are full slides that you move through left to right. Sub-slides show up in the slideshow
by pressing up or down. Fragments are hidden at first, then appear with a button press. You
can skip cells in the slideshow with Skip and Notes leaves the cell as speaker notes.
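Once you've marked up the cells, you can convert the notebook into a slideshow with nbconvert (again, notebook.ipynb is a placeholder name); adding --post serve converts it and immediately serves it in your browser:

jupyter nbconvert notebook.ipynb --to slides
jupyter nbconvert notebook.ipynb --to slides --post serve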
Back to Home
01. Introduction
02. Data Dimensions
03. Data in NumPy
04. Element-wise Matrix Operations
05. Element-wise Operations in NumPy
06. Matrix Multiplication: Part 1
07. Matrix Multiplication: Part 2
08. NumPy Matrix Multiplication
09. Matrix Transposes
10. Transposes in NumPy
11. NumPy Quiz
Part 02_Neural Networks
Back to Home
01. Instructor
02. Introduction
03. Classification Problems 1
04. Classification Problems 2
05. Linear Boundaries
06. Higher Dimensions
07. Perceptrons
08. Why "Neural Networks"?
09. Perceptrons as Logical Operators
10. Perceptron Trick
11. Perceptron Algorithm
12. Non-Linear Regions
13. Error Functions
14. Log-loss Error Function
15. Discrete vs Continuous
16. Softmax
17. One-Hot Encoding
18. Maximum Likelihood
19. Maximizing Probabilities
20. Cross-Entropy 1
21. Cross-Entropy 2
22. Multi-Class Cross Entropy
23. Logistic Regression
24. Gradient Descent
25. Logistic Regression Algorithm
26. Pre-Lab: Gradient Descent
27. Notebook: Gradient Descent
28. Perceptron vs Gradient Descent
29. Continuous Perceptrons
30. Non-linear Data
31. Non-Linear Models
32. Neural Network Architecture
33. Feedforward
34. Backpropagation
35. Pre-Lab: Analyzing Student Data
36. Notebook: Analyzing Student Data
37. Outro
AND and OR Perceptrons
AND Perceptron
What are the weights and bias for the AND perceptron?
Set the weights ( weight1 , weight2 ) and bias ( bias ) to the correct values that calculate the
AND operation as shown above.
Start Quiz:
import pandas as pd

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', ' Input 2', ' Linear Combination', ' Activation Output', ' Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
The OR perceptron is very similar to an AND perceptron. In the image below, the OR
perceptron has the same line as the AND perceptron, except the line is shifted down.
What can you do to the weights and/or bias to achieve this? Use the following AND
perceptron to create an OR Perceptron.
OR Perceptron Quiz
Increase the weights
Decrease the weights
Increase a single weight
Decrease a single weight
Increase the magnitude of the bias
Decrease the magnitude of the bias
NOT Perceptron
Unlike the other perceptrons we looked at, the NOT operation only cares about one
input. The operation returns a 0 if the input is 1 and a 1 if it's a 0 . The other inputs
to the perceptron are ignored.
In this quiz, you'll set the weights ( weight1 , weight2 ) and bias ( bias ) to values
that calculate the NOT operation on the second input and ignore the first input.
Start Quiz:
quiz.py
import pandas as pd

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', ' Input 2', ' Linear Combination', ' Activation Output', ' Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
XOR Perceptron
Quiz: Build an XOR Multi-Layer Perceptron
Now, let's build a multi-layer perceptron from the AND, NOT, and OR perceptrons to
create XOR logic!
The neural network below contains 3 perceptrons, A, B, and C. The last one (AND) has
been given for you. The input to the neural network is from the first node. The output
comes out of the last node.
The multi-layer perceptron below calculates XOR. Each perceptron is a logic operation
of AND, OR, and NOT. However, the perceptrons A, B, and C don't indicate their
operation. In the following quiz, set the correct operations for the perceptrons to
calculate XOR.
QUIZ QUESTION:
Set the operations for the perceptrons in the XOR neural network.
ANSWER CHOICES:
Match each perceptron (A, B, C) with one of the operators: NOT, AND, OR.
10. Perceptron Trick
Perceptron Trick
In the last section you used your logic and your mathematical knowledge to create perceptrons
for some of the most common logical operators. In real life, though, we can't be building these
perceptrons ourselves. The idea is that we give them the result, and they build themselves. For
this, here's a pretty neat trick that will help us.
Closer
Farther
SOLUTION: Closer
Time for some math!
Now that we've learned that misclassified points want the line to move closer to
them, let's do some math. The following video shows a mathematical trick that modifies the
equation of the line so that it comes closer to a particular point.
11. Perceptron Algorithm
Perceptron Algorithm
And now, with the perceptron trick in our hands, we can fully develop the perceptron
algorithm! The following video will show you the pseudocode, and in the quiz below, you'll
have the chance to code it in Python.
Recall that the perceptron step works as follows. For a point with coordinates
$(p, q)$, label $y$, and prediction given by the equation $\hat{y} = \mathrm{step}(w_1x_1 + w_2x_2 + b)$:

- If the point is correctly classified, do nothing.
- If the point is classified positive, but it has a negative label, subtract $\alpha p$, $\alpha q$, and $\alpha$ from $w_1$, $w_2$, and $b$ respectively, where $\alpha$ is the learning rate.
- If the point is classified negative, but it has a positive label, add $\alpha p$, $\alpha q$, and $\alpha$ to $w_1$, $w_2$, and $b$ respectively.

Then click on test run to graph the solution that the perceptron algorithm gives you.
It'll actually draw a set of dotted lines that show how the algorithm approaches the
best solution, given by the black solid line.
Feel free to play with the parameters of the algorithm (number of epochs, learning
rate, and even the randomizing of the initial parameters) to see how your initial
conditions can affect the solution!
Start Quiz:
perceptron.py data.csv solution.py

import numpy as np

# Setting the random seed; feel free to change it and see different solutions.
np.random.seed(42)

def stepFunction(t):
    if t >= 0:
        return 1
    return 0
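For reference, here is one possible implementation of the perceptron trick for this quiz (a sketch; the helper name prediction and the default learn_rate value are assumptions consistent with the stub above):

def prediction(X, W, b):
    # Classify a single point with the current line
    return stepFunction((np.matmul(X, W) + b)[0])

def perceptronStep(X, y, W, b, learn_rate=0.01):
    # Apply the perceptron trick once to every point
    for i in range(len(X)):
        y_hat = prediction(X[i], W, b)
        if y[i] - y_hat == 1:
            # Positive point classified as negative: move the line closer
            W[0] += X[i][0] * learn_rate
            W[1] += X[i][1] * learn_rate
            b += learn_rate
        elif y[i] - y_hat == -1:
            # Negative point classified as positive: move the line closer
            W[0] -= X[i][0] * learn_rate
            W[1] -= X[i][1] * learn_rate
            b -= learn_rate
    return W, b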
We pick back up on log-loss error with the gradient descent concept.
Which of the following conditions should be met in order to apply gradient descent?
(Check all that apply.)
The error function should be discrete
The error function should contain only positive values
The error function should be differentiable
The error function should be normalized
The error function should be continuous
SOLUTION: The error function should be differentiable, and the error function should be continuous.
15. Discrete vs Continuous
Discrete vs Continuous Predictions
In the last few videos, we learned that continuous error functions are better than discrete error
functions, when it comes to optimizing. For this, we need to switch from discrete to
continuous predictions. The next two videos will guide us in doing that.
16. Softmax
Multi-Class Classification and Softmax
Softmax Quiz
cos
log
exp
SOLUTION: exp
Quiz: Coding Softmax
And now, your time to shine! Let's code the formula for the Softmax function in
Python.
Start Quiz:
softmax.py solution.py

import numpy as np

def softmax(L):
    expL = np.exp(L)
    sumExpL = sum(expL)
    result = []
    for i in expL:
        result.append(i * 1.0 / sumExpL)
    return result
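For example, softmax([2.0, 1.0, 0.1]) returns approximately [0.66, 0.24, 0.10]: the scores become probabilities that sum to 1.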
INSTRUCTOR NOTE:
Correction: At 2:18, the top right point should be labelled -log(0.7) instead of -log(0.2) .
Let's dive into the details. The next video will show you how to calculate an error function.
If this sounds anything like the perceptron algorithm, this is no coincidence! We'll see
it in a bit.
Instructions
In this notebook, you'll be implementing the functions that build the gradient descent
algorithm, namely:
When you implement them, run the train function, and this will graph several of
the lines that are drawn in successive gradient descent steps. It will also graph the
error function, and you can see it decreasing as the number of epochs grows.
This is a self-assessed lab. If you need any help or want to check your answers, feel
free to check out the solutions notebook in the same folder, or by clicking here.
27. Notebook: Gradient Descent
Workspace
This section contains a workspace (either a Jupyter Notebook workspace or an online
code editor workspace) that cannot be automatically downloaded or generated here.
Please access the classroom with your account and manually download the workspace to your
local machine. Note that for some courses, Udacity uploads the workspace files
onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
So, this is just a small recap video that will get us ready for what's coming.
The first two videos will show us how to combine two perceptrons into a third, more
complicated one.
Now, not all neural networks look like the one above. They can be way more
complicated! In particular, we can do the following things:
Multi-Class Classification
And here we elaborate a bit more on what can be done if our neural network needs
to model data with more than one output.
Error Function
Just as before, neural networks will produce an error function, which, in the end, is
what we'll be minimizing. The following video shows the error function for a neural
network.
It sounds more complicated than it actually is. Let's take a look in the next few videos. The
first video will show us a conceptual interpretation of what backpropagation is.
And the next few videos will go deeper into the math. Feel free to tune out, since this
part gets handled by Keras pretty well. If you'd like to go start training networks right
away, go to the next section. But if you enjoy calculating lots of derivatives, let's dive
in!
In the video below at 1:24, the edges should be directed to the sigmoid function and
not the bias at that last layer; the edges of the last layer currently point to the bias,
which is incorrect.
Chain Rule
We'll need to recall the chain rule to help us calculate derivatives.
Recall that the sigmoid function has a beautiful derivative, which we can see in the
following calculation. This will make our backpropagation step much cleaner.
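For reference, this is the standard identity:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$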
35. Pre-Lab: Analyzing Student Data
Lab: Analyzing Student Data
Now, we're ready to put neural networks into practice. We'll analyze a dataset of student
admissions at UCLA.
Instructions
In this notebook, you'll be implementing some of the steps in the training of the neural
network, namely:
This is a self-assessed lab. If you need any help or want to check your answers, feel free to
check out the solutions notebook in the same folder, or by clicking here.
This section contains a workspace (either a Jupyter Notebook workspace or an online
code editor workspace) that cannot be automatically downloaded or generated here.
Please access the classroom with your account and manually download the workspace to your
local machine. Note that for some courses, Udacity uploads the workspace files
onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
Back to Home
And as a bonus, we'll be implementing this in a very effective way using matrix
multiplication with NumPy!
02. Gradient Descent
Gradient Descent with Squared Errors
We want to find the weights for our neural networks. Let's start by thinking about the goal.
The network needs to make predictions as close as possible to the real values. To measure this,
we use a metric of how wrong the predictions are, the error. A common metric is the sum of
the squared errors (SSE):
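In the form used in this lesson, with $\mu$ indexing the data records and $j$ the output units, the SSE is

$$E = \frac{1}{2}\sum_{\mu}\sum_{j}\left(y_j^{\mu} - \hat{y}_j^{\mu}\right)^2$$

The $\frac{1}{2}$ is included for convenience when taking the derivative.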
Remember that the output of a neural network, the prediction, depends on the weights.
Gradient is another term for rate of change or slope. If you need to brush up on this
concept, check out Khan Academy's great lectures on the topic.
The gradient is just a derivative generalized to functions with more than one variable.
We can use calculus to find the gradient at any point in our error function, which
depends on the input weights. You'll see how the gradient descent step is derived on
the next page.
Below I've plotted an example of the error of a neural network with two inputs, and
accordingly, two weights. You can read this like a topographical map where points on
a contour line have the same error and darker contour lines correspond to larger
errors.
At each step, you calculate the error and the gradient, then use those to determine
how much to change each weight. Repeating this process will eventually find weights
that are close to the minimum of the error function, the black dot in the middle.
Caveats
Since the weights will just go wherever the gradient takes them, they can end up
where the error is low, but not the lowest. These spots are called local minima. If the
weights are initialized with the wrong values, gradient descent could lead the weights
into a local minimum, illustrated below.
Notes
Check out Khan Academy's Multivariable calculus lessons if you are unfamiliar with the
subject.
import numpy as np

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))
learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))
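For reference, one possible completion of a single gradient descent step for the snippet above (a sketch following the update rule from this lesson; variable names continue the stub's conventions):

h = np.dot(x, w)                       # input to the output unit
nn_output = sigmoid(h)                 # network output (prediction)
error = y - nn_output                  # output error
error_term = error * sigmoid_prime(h)  # output error term
del_w = learnrate * error_term * x     # gradient descent step for each weight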
Data cleanup
You might think there will be three input units, but we actually need to transform the
data first. The rank feature is categorical; the numbers don't encode any sort of
relative values. Rank 2 is not twice as much as rank 1, and rank 3 is not 1.5 times rank
2. Instead, we need to use dummy variables to encode rank , splitting the data into
four new columns encoded with ones or zeros. Rows with rank 1 have one in the rank
1 dummy column, and zeros in all other columns. Rows with rank 2 have one in the
rank 2 dummy column, and zeros in all other columns. And so on.
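One way to do this with pandas (a sketch; get_dummies is pandas' built-in one-hot encoder, and the rank column name follows the admissions data described here):

import pandas as pd

data = pd.read_csv('binary.csv')
# Make dummy variables for rank, then drop the original column
data = pd.concat([data, pd.get_dummies(data['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)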
We'll also need to standardize the GRE and GPA data, which means to scale the values
such that they have zero mean and a standard deviation of 1. This is necessary
because the sigmoid function squashes really small and really large inputs. The
gradient of really small and large inputs is zero, which means that the gradient
descent step will go to zero too. Since the GRE and GPA values are fairly large, we
have to be really careful about how we initialize the weights or the gradient descent
steps will die off and the network won't train. Instead, if we standardize the data, we
can initialize the weights easily and everyone is happy.
This is just a brief run-through; you'll learn more about preparing data later. If you're
interested in how I did this, check out the data_prep.py file in the programming
exercise below.
Now that the data is ready, we see that there are six input features: gre , gpa , and the
four rank dummy variables.
After you've written these parts, run the training by pressing "Test Run". The MSE will
print out, as well as the accuracy on a test set, the fraction of correctly predicted
admissions.
Feel free to play with the hyperparameters and see how it changes the MSE.
Start Quiz:
gradient.py data_prep.py binary.csv solution.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records; x is the input, y is the target
        pass  # (the per-record TODOs are elided in this excerpt)

    # TODO: Update weights using the learning rate and the average change in weights
    weights += 0
admissions = pd.read_csv('binary.csv')

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records; x is the input, y is the target
Derivation
Before, we were dealing with only one output node, which made the code
straightforward. However, now that we have multiple input units and multiple hidden
units, the weights between them will require two indices: $w_{ij}$, where $i$ denotes
input units and $j$ denotes hidden units.

For example, the following image shows our network, with its input units labeled $x_1$,
$x_2$, and $x_3$, and its hidden nodes labeled $h_1$ and $h_2$:

The lines indicating the weights leading to $h_1$ have been colored differently from
those leading to $h_2$ just to make it easier to read.

Now to index the weights, we take the input unit number for the $i$ and the hidden
unit number for the $j$. That gives us

$w_{11}$

$w_{12}$

The following image includes all of the weights between the input layer and the
hidden layer, labeled with their appropriate $w_{ij}$ indices:
Start Quiz:
multilayer.py solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

print('Hidden-layer Output:')
print(hidden_layer_out)
print('Output-layer Output:')
print(output_layer_out)
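A possible completion of the elided middle of this exercise, which belongs between the fake-data line and the print statements above (a sketch; the 0.1 weight scale is an assumption, and the variable names match the print statements):

# Initialize the input-to-hidden and hidden-to-output weights
weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# Forward pass: input -> hidden layer
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

# Forward pass: hidden layer -> output
output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)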
07. Backpropagation
Backpropagation
Now we've come to the problem of how to make a multilayer neural network learn.
Before, we saw how to update weights with gradient descent. The backpropagation
algorithm is just an extension of that, using the chain rule to find the error with
respect to the weights connecting the input layer to the hidden layer (for a two-layer
network).
To update the weights to hidden layers using gradient descent, you need to know
how much error each of the hidden units contributed to the final output. Since the
output of a layer is determined by the weights between layers, the error resulting
from units is scaled by the weights going forward through the network. Since we
know the error at the output, we can use the weights to work backwards to hidden
layers.
Then, the gradient descent step is the same as before, just with the new errors:

$$\Delta w_{ij} = \eta \, \delta^h_j \, x_i$$

where $w_{ij}$ are the weights between the inputs and hidden layer and $x_i$ are the
input unit values. This form holds for however many layers there are. The weight steps
are equal to the step size times the output error of the layer times the values of the
inputs to that layer.
Let's walk through the steps of calculating the weight updates for a simple two layer
network. Suppose there are two input values, one hidden unit, and one output unit,
with sigmoid activations on the hidden and output units. The following image depicts
this network. (Note: the input values are shown as nodes at the bottom of the image,
while the network's output value is shown as $\hat{y}$ at the top. The inputs
themselves do not count as a layer, which is why this is considered a two-layer
network.)
It turns out this is exactly how we want to calculate the weight update step. As before,
if you have your inputs as a 2D array with one row, you can also
do hidden_error*inputs.T , but that won't work if inputs is a 1D array.
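One way to handle the 1D case (a minimal sketch; the values here are hypothetical) is to add a second axis so broadcasting produces the full matrix of weight steps:

import numpy as np

inputs = np.array([0.1, -0.2, 0.4])      # 1D array of three input values (hypothetical)
hidden_error = np.array([[0.3, -0.1]])   # hypothetical error terms, shape (1, 2)

# inputs[:, None] turns the 1D array into a (3, 1) column vector, so
# broadcasting against the (1, 2) error row yields the (3, 2) weight steps
delta_w = inputs[:, None] * hidden_error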
Backpropagation exercise
Below, you'll implement the code to calculate one backpropagation update step for
two sets of weights. I wrote the forward pass - your goal is to code the backward pass.
Things to do
Start Quiz:
backprop.py solution.py
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

## Backwards pass
## TODO: Calculate output error
error = target - output
- Make a forward pass through the network, calculating the output $\hat{y}$.
- Calculate the error gradient in the output unit, $\delta^o = (y - \hat{y}) \, f'(z)$, where $z = \sum_j W_j a_j$ is the input to the output unit.
- Propagate the errors to the hidden layer: $\delta^h_j = \delta^o W_j f'(h_j)$.
- Update the weight steps:
Start Quiz:
backprop.py data_prep.py binary.csv solution.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # TODO: Calculate the output
        hidden_input = None
        hidden_output = None
        output = None

        ## Backward pass ##
        # TODO: Calculate the network's prediction error
        error = None
Note: This code takes a while to execute, so Udacity's servers sometimes return with an error
saying it took too long. If that happens, it usually works if you try again.
Start Quiz:
backprop.py data_prep.py binary.csv solution.py

import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # Calculate the network's prediction error
        error = y - output
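The excerpt stops here; one possible completion of the backward pass, continuing inside the same loop (a sketch that follows the update rules derived above):

        # Calculate the error term for the output unit
        output_error_term = error * output * (1 - output)

        # Propagate errors to the hidden layer and calculate its error term
        hidden_error = np.dot(output_error_term, weights_hidden_output)
        hidden_error_term = hidden_error * hidden_output * (1 - hidden_output)

        # Accumulate the weight steps for this batch
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:, None]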
## Introduction
Udacity Workspaces with GPU support are available for some projects as an alternative to
manually configuring your own remote server with GPU support. These workspaces provide a
Jupyter notebook server directly in your browser. This lesson will briefly introduce the
Workspaces interface.
Important Notes:
Workspaces sessions are connections from your browser to a remote server. Each student has
a limited number of GPU hours allocated on the servers (the allocation is significantly more
than completing the projects is expected to take). There is currently no limit on the number of
Workspace hours when GPU mode is disabled.
Workspace data stored in the user's home folder is preserved between sessions (and can be
reset as needed, e.g., to get project updates).
Only 3 gigabytes of data can be stored in the home folder.
Workspace sessions are preserved if your connection drops or your browser window is
closed; simply return to the classroom and re-open the workspace page. However, workspace
sessions are automatically terminated after a period of inactivity. This will prevent you from
leaving a session connection open and burning through your time allocation. (See the section
on active connections below.)
The kernel state is preserved as long as the notebook session remains open, but it
is not preserved if the session is closed. If you exit the notebook for more than half an hour
and the session is closed, you will need to re-run any previously-run cells before continuing.
## Overview
The default workspaces interface
When the workspace opens, you'll see the normal Jupyter file browser. From this
interface you can open a notebook file, start a remote terminal session, enable the
GPU, submit your project, reset the workspace data, and more. Clicking the three
bars in the top left corner above the Jupyter logo will toggle hiding the classroom
lessons sidebar.
NOTE: You can always return to the file browser page from anywhere else in the
workspace by clicking the Jupyter logo in the top left corner.
## Opening a notebook
View of the project notebook
Clicking the name of a notebook (*.ipynb) file in the file list will open a standard
Jupyter notebook view of the project. The notebook session will remain open as long
as you are active, and will be automatically terminated after 30 minutes of inactivity.
You can exit a notebook by clicking on the Jupyter logo in the top left corner.
NOTE: Notebooks continue to run in the background unless they are stopped. IF
GPU MODE IS ACTIVE, IT WILL REMAIN ACTIVE AFTER CLOSING OR STOPPING A
NOTEBOOK. YOU CAN ONLY STOP GPU MODE WITH THE GPU TOGGLE BUTTON.
(See next section.)
## Enabling GPU Mode
GPU Workspaces can also be run without time restrictions when the GPU mode is
disabled. The "Enable"/"Disable" button (circled in red in the image) can be used to
toggle GPU mode. NOTE: Toggling GPU support may switch the physical server
your session connects to, which can cause data loss UNLESS YOU CLICK THE
SAVE BUTTON BEFORE TOGGLING GPU SUPPORT.
Do not try to permanently hold the workspace session active when you do not have a
process running (e.g., do not try to hold the session open in the background)—the
limits are in place to preserve your GPU time allocation; there is no guarantee that
you'll receive additional time if you exceed the limit.
Make sure that you save the results of the long running task to disk as soon as the
task ends (e.g., checkpoint your model parameters for deep learning networks);
otherwise the workspace will disconnect 30 minutes after the active process ends, and
the results will be lost.
Example using keep_awake (both helpers come from the workspace_utils.py module provided with these workspaces; the imports shown are an assumption of that setup):

from workspace_utils import keep_awake

for i in keep_awake(range(5)):
    # anything that happens inside this loop will keep the workspace active
    # do iteration with lots of work here
    pass

Example using active_session :

from workspace_utils import active_session

with active_session():
    # do long-running work here
    pass
## Submitting a Project
The Submit Project Button
Some workspaces are able to directly submit projects on your behalf (i.e., you
do not need to manually submit the project in the classroom). To submit your project,
simply click the "Submit Project" button (circled in red in the above image).
If you do not see the "Submit Project" button, then project submission is not enabled
for that workspace. You will need to manually download your project files and submit
them in the classroom.
NOTE: YOU MUST ENSURE THAT YOUR SUBMISSION INCLUDES ALL REQUIRED
FILES BEFORE SUBMITTING -- INCLUDING ANY FILE CONVERSIONS (e.g., from
ipynb to HTML)
## Opening a Terminal
The "New" menu button
Jupyter workspaces support several views, including the file browser and notebook
view already covered, as well as shell terminals. To open a terminal shell, click the
"New" menu button at the top right of the file browser view and select "Terminal".
## Terminals
Jupyter terminal shell interface
Terminals provide a full Bash shell that you can use to install or update software
packages, fetch updates from github repositories, or run any other terminal
commands. As with the notebook view, you can return to the file browser view by
clicking on the Jupyter logo at the top left corner of the window.
NOTE: Your data & changes are persistent across workspace sessions. Any
changes you make will need to be repeated if you later reset your workspace
data.
## Resetting Data
The Menu Button
The "Menu" button in the bottom left corner provides support for resetting your
Workspaces. The "Refresh Workspace" button will refresh your session, which has no
effect on the changes you've made in the workspace.
The "Reset Data" button discards all changes and restores a clean copy of the
workspace. Clicking the button will open a dialog that requires you to type "Reset
data" in a confirmation dialog. ALL OF YOUR DATA WILL BE LOST.
Resetting should only be required if Udacity makes changes to the project and you
can't get them via git pull , or if you destroy the contents of the workspace. If you
do need to reset your data, you are strongly encouraged to download a copy of your
work from the file interface before clicking Reset Data.
Back to Home
This section contains a workspace (either a Jupyter Notebook workspace or an online
code editor workspace) that cannot be automatically downloaded or generated here.
Please access the classroom with your account and manually download the workspace to your
local machine. Note that for some courses, Udacity uploads the workspace files
onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
In this project, you'll get to build a neural network from scratch to carry out a prediction
problem on a real dataset! By building a neural network from the ground up, you'll have a
much better understanding of gradient descent, backpropagation, and other concepts that are
important to know before we move to higher level tools such as Tensorflow. You'll also get to
see how to apply these networks to solve real prediction problems!
Instructions
1. Download the project materials from our GitHub repository. You can download the
repository with git clone https://github.com/udacity/deep-learning.git . Our
files in the GitHub repo are the most up to date, so it's the best place to get the project files.
2. cd into the first-neural-network directory.
3. Download anaconda or miniconda based on the instructions in the Anaconda lesson.
4. Create a new conda environment:
jupyter notebook
If you need help running the notebook file, check out the Jupyter notebook lesson.
Submission
Before submitting your solution to a reviewer, you are required to submit your project to
Udacity's Project Assistant, which will provide some initial feedback. It will give you
feedback within a minute or two on whether your project will meet all specifications.
It is possible to submit projects which do not pass all tests; you can expect to get feedback
from your Udacity reviewer on these within 3-4 days.
The setup for the project assistant is simple. If you have not installed the client tool from a
different Nanodegree program already, then you may do so with the command pip install
udacity-pa .
To submit your code to the project assistant, run udacity submit from within the top-level
directory of the project. You will be prompted for a username and password. If you log in
using Google or Facebook, visit this link for alternate login instructions.
This process will create a zipfile in your top-level directory named
first_neural_network-result-.zip , where there will be a number between result- and
.zip . This is the file that you should submit to the Udacity reviews system.
Upload that file into the system and hit Submit Project below!
If you run into any issues using the project assistant, please check this page to troubleshoot;
feel free to post your problem in Knowledge if it isn't covered by one of the displayed cases!
What to do afterwards
If you're waiting for new content or to get the review back, here's a great video from Frank
Chen about the history of deep learning. It's a 45 minute video, sort of a short documentary,
starting in the 1950s and bringing us to the current boom in deep learning and artificial
intelligence.
Your first neural network

Code Functionality
- All code works appropriately and passes all unit tests: All the code in the notebook runs in Python without errors and passes the provided unit tests.
- Sigmoid activation function: The sigmoid activation function is implemented correctly.

Forward Pass
- Forward Pass - Training: The forward pass is correctly implemented for the network's training.
- Forward Pass - Run: The run method correctly produces the desired regression output for the network.

Backward Pass
- Batch Weight Change: The network correctly implements the backward pass for each batch, correctly updating the weight change.
- Updating the weights: Updates to both the input-to-hidden and hidden-to-output weights are implemented correctly.

Hyperparameters
- Number of epochs: The number of epochs is chosen such that the network is trained well enough to accurately make predictions.
- Number of hidden units: The number of hidden units is chosen such that the network is able to accurately predict without overfitting.
- Learning rate: The learning rate is chosen such that the network successfully converges, but is still time efficient.
- Output Nodes: The number of output nodes is properly selected to solve the desired problem.
- Final Results: The training loss is below 0.09 and the validation loss is below 0.18.
Part 02-Module 01-Lesson 06_Sentiment Analysis
03. Materials
Materials
As you follow along this lesson, it's extremely important that you open the Jupyter notebook
and attempt the exercises. Much of the value in this experience will come from seeing how
your solution is different from Andrew's and playing around with the code in your own way.
Make this lesson count!
Workspace
The best way to open the notebook is to click here, which will open it in a new window. We
recommend you to work on the notebook in that window, and watch the videos in this one.
You can also get to the notebook by clicking the "Next" button in the classroom.
If you want to download the notebooks yourself, you can clone them from our GitHub
repository. You can either download the repository with git clone
https://github.com/udacity/deep-learning.git , or download it as an archive file
from this link.
Note: the notebooks for these lessons have been updated since the videos were recorded. In
most cases that just means your notebook will contain more hints and explanatory text than
what you see in the videos, but there may be some minor differences in the code as well. With these changes, you will still be able to follow along with the lessons, and should have an easier time understanding the project material.
Solutions
If you need help, feel free to look at the solutions in the same folder.
04. The Notebooks
Workspace
This section contains a workspace (either a Jupyter Notebook workspace or an online code editor workspace), which cannot be automatically reproduced here. Please access the classroom with your account and manually download the workspace to your local machine. Note that for some courses, Udacity uploads the workspace files onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
In this project, you'll test your theory of what features of a review correlate with the label!
Here are your specific steps:
Mini Project 1
Task List:
Work in the Project 1 section of Sentiment_Classification_Projects.ipynb .
Follow the notebook’s instructions to test the correlation between review features and labels.
Task Feedback:
Nice work! In the next video, Andrew will explain his solution.
09. Mini Project 2
Instructions
In the following mini project, you'll convert the inputs and outputs of the dataset into numbers.
Namely, you will convert each review string into a vector, and each label into a 0 or 1 .
You’ll need to make a few additions to the notebook, but the main work will be implementing
two functions, whose signatures are shown below:
def update_input_layer(review):
    """ Modify the global layer_0 to represent the vector form of review.
    The element at a given index of layer_0 should represent
    how many times the given word occurs in the review.
    Args:
        review(string) - the string of the review
    Returns:
        None
    """
    global layer_0
    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    ## Your code here
    pass

def get_target_for_label(label):
    """Convert a label to `0` or `1`.
    Args:
        label(string) - Either "POSITIVE" or "NEGATIVE".
    Returns:
        `0` or `1`.
    """
    pass
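For reference, a minimal sketch of what these two functions could look like. It assumes a global word2index dictionary that maps each vocabulary word to its column in layer_0 (the notebook builds such a mapping), and a layer_0 of shape (1, vocabulary size):

def update_input_layer(review):
    global layer_0
    # Clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    # Count how many times each known word occurs in the review
    for word in review.split(' '):
        if word in word2index:
            layer_0[0][word2index[word]] += 1

def get_target_for_label(label):
    # "POSITIVE" maps to 1, "NEGATIVE" maps to 0
    return 1 if label == 'POSITIVE' else 0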
Mini Project 2
Task List:
Work in the Project 2 section of Sentiment_Classification_Projects.ipynb .
Follow the notebook’s instructions to convert your inputs and outputs to numbers.
Create a global vocabulary set vocab.
Initialize a global layer_0 that is a vector of the size of the text vocabulary. All values should be initialized to 0.
Implement update_input_layer.
Implement get_target_for_label.
Task Feedback:
Nice work! In the next video, Andrew will share his solution.
Keras
Back to Home
01. Intro
02. Keras
03. Pre-Lab: Student Admissions in Keras
04. Lab: Student Admissions in Keras
05. Optimizers in Keras
06. Mini Project Intro
07. Pre-Lab: IMDB Data in Keras
08. Lab: IMDB Data in Keras
Keras
Hi again! Now we know all there is to know about training and optimizing neural networks, and we've actually trained a few of them in NumPy. But this is not what we normally do in real life. There are many packages which will make our lives much easier. The two main ones that we'll learn in this course are Keras and TensorFlow. In this lesson, we'll learn to use Keras.
The way we'll learn is by writing lots of code and building lots of models. We'll start by
building a simple neural network that will solve the XOR problem. Then, we'll build a
bigger neural network that will analyze the student data that we have analyzed in a
previous section.
And finally, we'll have a lab in which you'll be able to build a neural network yourself,
which will process text, and make predictions on the sentiment of movie reviews in
IMDB.
02. Keras
Neural Networks in Keras
Luckily, every time we need to use a neural network, we won't need to code the activation function, gradient descent, etc. There are lots of packages for this, which we recommend you check out, including the following:
Keras
TensorFlow
Caffe
Theano
Scikit-learn
And many others!
In this course, we will learn Keras. Keras makes coding deep neural networks simpler. To
demonstrate just how easy it is, you're going to build a simple fully-connected network in a
few dozen lines of code.
We’ll be connecting the concepts that you’ve learned in the previous lessons to the methods
that Keras provides.
The general idea for this example is that you'll first load the data, then define the network, and
then finally train the network.
Sequential Model
from keras.models import Sequential
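The Sequential class is a wrapper for the neural network model; once created, it exposes the add(), compile(), fit(), and evaluate() methods used below:

# Create the Sequential model
model = Sequential()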
Layers
The Keras Layer class provides a common interface for a variety of standard neural network
layers. There are fully connected layers, max pool layers, activation layers, and more. You can
add a layer to a model using the model's add() method. For example, a simple model with a
single hidden layer might look like this:
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation

# X has shape (num_rows, num_cols), where the training data are stored
# as row vectors
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)

# Create the Sequential model
model = Sequential()

# 1st Layer - Add a fully connected layer of 32 nodes with the same
# input shape as the training samples in X
model.add(Dense(32, input_dim=X.shape[1]))
The activation "layers" in Keras are equivalent to specifying an activation function in the
Dense layers (e.g., model.add(Dense(128)); model.add(Activation('softmax')) is
computationally equivalent to model.add(Dense(128, activation="softmax"))) ), but it is
common to explicitly separate the activation layers because it allows direct access to the
outputs of each layer before the activation is applied (which is useful in some model
architectures).
Once we have our model built, we need to compile it before it can be run. Compiling the Keras model calls the backend (TensorFlow, Theano, etc.) and binds the optimizer, loss function, and other parameters required before the model can be run on any input data. We'll specify the loss function to be categorical_crossentropy, which works with one-hot encoded classes (including the two-class case here), and specify adam as the optimizer (which is a reasonable default when speed is a priority). And finally, we can specify what metrics we want to evaluate the model with. Here we'll use accuracy.
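A sketch of that compile call:

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])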
We can then inspect the resulting architecture with:
model.summary()
The model is trained with the fit() method, through the following command that specifies
the number of training epochs and the message level (how much information we want
displayed on the screen during training).
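A sketch of that command (the epoch count is illustrative; older Keras versions spell the argument nb_epoch):

model.fit(X, y, epochs=1000, verbose=0)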
Finally, the trained model can be scored with the evaluate() method:
model.evaluate()
Pretty simple, right? Let's put it into practice.
Quiz
Let's start with the simplest example. In this quiz you will build a simple multi-layer
feedforward neural network to solve the XOR problem.
1. Set the first layer to a Dense() layer with an output width of 8 nodes and
the input_dim set to the size of the training samples (in this case 2).
2. Add a tanh activation function.
3. Set the output layer width to 1, since the output has only two classes. (We can use 0 for one
class and 1 for the other)
4. Use a sigmoid activation function after the output layer.
5. Run the model for 50 epochs.
This should give you an accuracy of 50%. That's ok, but certainly not great. Out of 4 input
points, we're correctly classifying only 2 of them. Let's try to change some parameters around
to improve. For example, you can increase the number of epochs. You'll pass this quiz if you
get 75% accuracy. Can you reach 100%?
To get started, review the Keras documentation about models and layers.
The Keras example of a Multi-Layer Perceptron network is similar to what you need to do
here. Use that as a guide, but keep in mind that there will be a number of differences.
Start Quiz:
network.py network_solution.py
import numpy as np
from keras.utils import np_utils
import tensorflow as tf
# Using TensorFlow 1.0.0; use tf.python_io in later versions
tf.python.control_flow_ops = tf
# Our data
X = np.array([[0,0],[0,1],[1,0],[1,1]]).astype('float32')
y = np.array([[0],[1],[1],[0]]).astype('float32')
xor.compile(loss="categorical_crossentropy", optimizer="adam",
metrics = ['accuracy'])
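For reference, one possible solution sketch. The labels are one-hot encoded into two columns so they match the categorical_crossentropy loss in the starter code; on Keras 2, replace nb_epoch with epochs:

import numpy as np
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Activation

# Our data
X = np.array([[0,0],[0,1],[1,0],[1,1]]).astype('float32')
y = np.array([[0],[1],[1],[0]]).astype('float32')

# One-hot encode the labels
y = np_utils.to_categorical(y)

# Build the model
xor = Sequential()
xor.add(Dense(8, input_dim=2))   # hidden layer of 8 nodes
xor.add(Activation('tanh'))
xor.add(Dense(2))                # one output node per class
xor.add(Activation('sigmoid'))

xor.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

# Train for enough epochs to escape the 50% plateau
xor.fit(X, y, nb_epoch=1000, verbose=0)
print(xor.evaluate(X, y))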
As you follow along with this lesson, you are encouraged to work in the referenced
Jupyter notebooks at the end of the page. We will present a solution to you, but
please try creating your own deep learning models! Much of the value in this
experience will come from playing around with the code in your own way.
Workspace
To open this notebook, you have two options:
Instructions
This is more of a follow-along lab. We'll show you the steps to build the network.
However, at the end of the lab you'll be given the opportunity to improve the model,
and try to improve on its performance. Here are the main steps in this lab.
First, let's start by looking at the data. For that, we'll use the read_csv function in
pandas.
import pandas as pd
data = pd.read_csv('student_data.csv')
print(data)
Here we can see that the first column is the label y , which corresponds to
acceptance/rejection. Namely, a label of 1 means the student got accepted, and a
label of 0 means the student got rejected.
When we plot the data, we get the following graphs, which show that unfortunately,
the data is not as nicely separable as we'd hope:
So one thing we can do is make one graph for each of the 4 ranks. In that case, we get
this:
Pre-processing the data
Ok, there's a bit more hope here. It seems that the better the student's grades and test scores, the more likely they are to be accepted. And the rank has something to do with it. So what we'll do is one-hot encode the rank, and our 6 input variables will be:
Test (GRE)
Grades (GPA)
Rank 1
Rank 2
Rank 3
Rank 4.
The last 4 inputs will be binary variables that have a value of 1 if the student has that
rank, or 0 otherwise.
So, first things first, let's notice that the test scores have a range of 800, while the
grades have a range of 4. This is a huge discrepancy, and it will affect our training.
Normally, the best thing to do is to normalize the scores so they are between 0 and 1.
We can do this as follows:
data["gre"] = data["gre"]/800
data["gpa"] = data["gpa"]/4.0
Now, we split our data input into X, and the labels y , and one-hot encode the output,
so it appears as two classes (accepted and not accepted).
X = np.array(data)[:,1:]
y = keras.utils.to_categorical(np.array(data["admit"]))
Building the model architecture
And finally, we define the model architecture. We can use different architectures, but
here's an example:
model = Sequential()
model.add(Dense(128, input_dim=6))
model.add(Activation('sigmoid'))
model.add(Dense(32))
model.add(Activation('sigmoid'))
model.add(Dense(2))
model.add(Activation('sigmoid'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
model.summary()
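To actually train the network, we call fit(); a sketch (the epoch count and batch size are illustrative, and older Keras versions spell the argument nb_epoch):

model.fit(X, y, epochs=200, batch_size=100, verbose=0)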
The error function is given by categorical_crossentropy , which is the one we've
been using, but there are other options. There are several optimizers which you can
choose from in order to improve your training. Here we use adam, but there are
others that are useful, such as rmsprop. These use a variety of techniques that we'll
outline in upcoming pages in this lesson.
And there you go, you've trained your first neural network to analyze a dataset. Now,
in the following pages, you'll learn many techniques to improve the training process.
04. Lab: Student Admissions in Keras
Workspace
This section contains a workspace (either a Jupyter Notebook workspace or an online code editor workspace), which cannot be automatically reproduced here. Please access the classroom with your account and manually download the workspace to your local machine. Note that for some courses, Udacity uploads the workspace files onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
SGD
This is Stochastic Gradient Descent. It uses the following parameters:
Learning rate.
Momentum (This takes the weighted average of the previous steps, in order to get a bit of
momentum and go over bumps, as a way to not get stuck in local minima).
Nesterov Momentum (This slows down the gradient when it's close to the solution).
Adam
Adam (Adaptive Moment Estimation) uses a more complicated exponential decay that consists
of not just considering the average (first moment), but also the variance (second moment) of
the previous steps.
RMSProp
RMSProp (RMS stands for Root Mean Square) decreases the learning rate by dividing it by an exponentially decaying average of squared gradients.
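In Keras, these optimizers are configured when compiling a model; a sketch (the parameter values are illustrative, not tuned):

from keras.optimizers import SGD

# SGD with momentum and Nesterov momentum enabled
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Adam and RMSProp can also be passed by name to use their defaults
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])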
07. Pre-Lab: IMDB Data in Keras
Mini Project: Using Keras to analyze IMDB Movie Data
Now, you're ready to shine! In this project, we will analyze a dataset from IMDB and use it to predict the sentiment of a review.
Workspace
To open this notebook, you have two options:
Instructions
In this lab, we will preprocess the data for you, and you'll be in charge of building and training
the model in Keras.
The dataset
This lab uses a dataset of 25,000 IMDB reviews. Each review comes with a label. A label of 0
is given to a negative review, and a label of 1 is given to a positive review. The goal of this lab
is to create a model that will predict the sentiment of a review, based on the words in the review. You can find more information about this dataset on the Keras website.
Now, the input already comes preprocessed for us for convenience. Each review is encoded as
a sequence of indexes, corresponding to the words in the review. The words are ordered by
frequency, so the integer 1 corresponds to the most frequent word ("the"), the integer 2 to the
second most frequent word, etc. By convention, the integer 0 corresponds to unknown words.
Then, the sentence is turned into a vector by simply concatenating these integers. For instance, if the sentence is "To be or not to be." and the indices of the words are as follows:
"to": 5
"be": 8
"or": 21
"not": 3
then the review gets encoded as the vector (5, 8, 21, 3, 5, 8).
The data comes preloaded in Keras, which means we don't need to open or read any files manually. The command to load it is the following, which will actually split the data into training and testing sets and labels:
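A sketch of the load call (argument names and defaults follow the keras.datasets.imdb API):

from keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data(path="imdb.npz",
                                                      num_words=None,
                                                      skip_top=0,
                                                      maxlen=None,
                                                      seed=113,
                                                      start_char=1,
                                                      oov_char=2,
                                                      index_from=3)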
num_words: Top most frequent words to consider. This is useful if you don't want to consider
very obscure words such as "Ultracrepidarian."
skip_top: Top words to ignore. This is useful if you don't want to consider the most common
words. For example, the word "the" would add no information to the review, so we can skip it
by setting skip_top to 2 or higher.
We first prepare the data by one-hot encoding it into (0,1)-vectors as follows: If, for example,
we have 10 words in our vocabulary, and the vector is (4,1,8), we'll turn it into the vector
(1,0,0,1,0,0,0,1,0,0).
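One way to do this encoding in Keras is with the Tokenizer's sequences_to_matrix method (a sketch; num_words=1000 is illustrative):

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=1000)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')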
Now it's your turn to use all you've learned! You can build a neural network using Keras, train
it, and evaluate it! Make sure you also use methods such as dropout or regularization, and
good Keras optimizers to do this. A good accuracy to aim for is 85%. Can your model achieve
this?
Help
This is a self-assessed lab. If you need any help or want to check your answers, feel free to
check out the solutions notebook in the same folder, or click here.
08. Lab: IMDB Data in Keras
Workspace
This section contains a workspace (either a Jupyter Notebook workspace or an online code editor workspace), which cannot be automatically reproduced here. Please access the classroom with your account and manually download the workspace to your local machine. Note that for some courses, Udacity uploads the workspace files onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
Intro to TensorFlow
Now that you are an expert in Neural Networks with Keras, you're more than ready to learn
TensorFlow. In the following sections of this Nanodegree Program, you will be using Keras
and TensorFlow alternately. Keras is great for building neural networks quickly, but it
abstracts a lot of the details. TensorFlow is great for understanding how neural networks
operate on a lower level. This lesson will teach you what you need to know of TensorFlow,
and give you some exercises to practice.
This lesson builds on the knowledge from the Deep Neural Networks lesson. If you need to refresh your memory on any of the topics, such as Linear Functions, Softmax, Cross Entropy, Batching, Epochs, etc., feel free to go back and watch them again.
Linear Functions
Softmax
Cross Entropy
Batching and Epochs
Enjoy!
02. Installing TensorFlow
Throughout this lesson, you'll apply your knowledge of neural networks on real datasets
using TensorFlow (link for China), an open source Deep Learning library created by Google.
You’ll use TensorFlow to classify images from the notMNIST dataset - a dataset of images of
English letters from A to J. You can see a few example images below.
Your goal is to automatically detect the letter based on the image in the dataset. You’ll be
working on your own computer for this lab, so, first things first, install TensorFlow!
Install
As usual, we'll be using Conda to install TensorFlow. You might already have a TensorFlow
environment, but check to make sure you have all the necessary packages.
OS X or Linux
Run the following commands to set up your environment:
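A typical sequence looks like the following sketch (the exact Python version and package list may differ from the classroom instructions):

conda create -n tensorflow python=3.5
source activate tensorflow
conda install numpy pandas matplotlib jupyter notebook
pip install tensorflow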
import tensorflow as tf
tf.Variable()
x = tf.Variable(5)
The tf.Variable class creates a tensor with an initial value that can be modified, much like a
normal Python variable. This tensor stores its state in the session, so you must initialize the
state of the tensor manually. You'll use the tf.global_variables_initializer() function
to initialize the state of all the Variable tensors.
Initialization
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
The tf.global_variables_initializer() call returns an operation that will initialize all
TensorFlow variables from the graph. You call the operation using a session to initialize all the
variables as shown above. Using the tf.Variable class allows us to change the weights and
bias, but an initial value needs to be chosen.
Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights keeps the model from becoming stuck in the same place every time you train it. You'll learn more about this in the next lesson, when you study gradient descent.
Similarly, choosing weights from a normal distribution prevents any one weight from
overwhelming other weights. You'll use the tf.truncated_normal() function to generate
random numbers from a normal distribution.
tf.truncated_normal()
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
The tf.truncated_normal() function returns a tensor with random values from a normal
distribution whose magnitude is no more than 2 standard deviations from the mean.
Since the weights are already helping prevent the model from getting stuck, you don't need to
randomize the bias. Let's use the simplest solution, setting the bias to 0.
tf.zeros()
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
The tf.zeros() function returns a tensor with all zeros.
You'll be classifying the handwritten numbers 0 , 1 , and 2 from the MNIST dataset using
TensorFlow. The above is a small sample of the data you'll be training on. Notice how some of
the 1 s are written with a serif at the top and at different angles. The similarities and
differences will play a part in shaping the weights of the model.
Left: Weights for labeling 0. Middle: Weights for labeling 1. Right: Weights for labeling 2.
The images above are trained weights for each label ( 0 , 1 , and 2 ). The weights display the
unique properties of each digit they have found. Complete this quiz to train your own weights
using the MNIST dataset.
Instructions
1. Open quiz.py
o Implement get_weights to return a tf.Variable of weights
o Implement get_biases to return a tf.Variable of biases
o Implement xW + b in the linear function
2. Open sandbox.py
o Initialize all weights
Since xW in xW + b is matrix multiplication, you have to use the tf.matmul() function
instead of tf.multiply() . Don't forget that order matters in matrix multiplication,
so tf.matmul(a,b) is not the same as tf.matmul(b,a) .
Start Quiz:
sandbox.py  quiz.py  quiz_solution.py  sandbox_solution.py
# Solution is available in the other "sandbox_solution.py" tab
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from quiz import get_weights, get_biases, linear


def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []
    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # Look at a sample of the training images (10000 is an assumed
    # sample count that keeps the quiz fast)
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):
        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (a 28*28 image is 784 features) and labels (assumed: 3)
n_features = 784
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed (assumed value)
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))
Start Quiz:
sandbox.py  quiz.py  quiz_solution.py  sandbox_solution.py
# Solution is available in the other "quiz_solution.py" tab
import tensorflow as tf
def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    pass

def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    return tf.Variable(tf.zeros(n_labels))
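The quiz also asks for get_weights and linear; here are sketches consistent with the initialization described earlier in this section:

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function xW + b
    """
    return tf.add(tf.matmul(input, w), b)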
05. Quiz: TensorFlow Softmax
TensorFlow Softmax
The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1, and also normalizes the outputs so that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It's the perfect function to use as the output activation for a network predicting multiple classes.
TensorFlow Softmax
We're using TensorFlow to build neural networks and, appropriately, there's a function for calculating softmax: tf.nn.softmax().
Quiz
Use the softmax function in the quiz below to return the softmax of the logits.
Start Quiz:
quiz.py solution.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf

def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)
    # TODO: Calculate the softmax of the logits and run it in a
    # session, feeding logit_data into the placeholder
    return output
Start Quiz:
quiz.py solution.py
# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf

def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        # Feed the logit data into the placeholder and run the softmax op
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output
06. Quiz: TensorFlow Cross Entropy
Cross Entropy in TensorFlow
As with the softmax function, TensorFlow has a function to do the cross entropy calculations
for us.
Let's take what you learned from the video and create a cross entropy function in TensorFlow.
To create a cross entropy function in TensorFlow, you'll need to use two new functions:
tf.reduce_sum()
tf.log()
Reduce Sum
x = tf.reduce_sum([1, 2, 3, 4, 5]) # 15
The tf.reduce_sum() function takes an array of numbers and sums them together.
Natural Log
x = tf.log(100.0) # 4.60517
This function does exactly what you would expect it to do. tf.log() takes the natural log of
a number.
Quiz
Print the cross entropy using softmax_data and one_hot_encod_label .
Start Quiz:
quiz.py solution.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# TODO: Print cross entropy from session
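A sketch of the solution, with illustrative input values (the quiz supplies its own softmax_data and one_hot_encod_label):

cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

# Illustrative inputs
softmax_data = [0.7, 0.2, 0.1]
one_hot_encod_label = [1.0, 0.0, 0.0]

with tf.Session() as sess:
    print(sess.run(cross_entropy, feed_dict={
        softmax: softmax_data,
        one_hot: one_hot_encod_label}))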
Mini-batching is a technique for training on subsets of the dataset instead of all the data at one
time. This provides the ability to train a model, even if a computer lacks the memory to store
the entire dataset.
Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously
across all samples. However, this is a small price to pay in order to be able to run the model at
all.
It's also quite useful combined with SGD. The idea is to randomly shuffle the data at the start
of each epoch, then create the mini-batches. For each mini-batch, you train the network
weights with gradient descent. Since these batches are random, you're performing SGD with
each batch.
Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.
train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
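The two lines above are only an excerpt; a sketch of the rest of the setup they rely on (the dataset path and the random-normal initialization are assumptions carried over from this lesson's other examples):

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import tensorflow as tf

n_input = 784    # 28*28 image
n_classes = 10   # digits 0-9

mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)
train_features = mnist.train.images
test_features = mnist.test.images

weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))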
Calculate the memory size of train_features , train_labels , weights , and bias in
bytes. Ignore memory for overhead, just calculate the memory required for the stored data.
You may have to look up how much memory a float32 requires, using this link.
[Interactive quiz: enter the memory size of each array in bytes. NOTE: The solutions are expressed as RegEx patterns, which Udacity uses to check the given answers.]
But larger datasets that you'll use in the future will be measured in gigabytes or more. It's possible to purchase more memory, but it's expensive. A Titan X GPU with 12 GB of memory costs over $1,000.
Instead, in order to run large models on your machine, you'll learn how to use mini-
batching.
TensorFlow Mini-batching
In order to use mini-batching, you must first divide your data into batches.
Unfortunately, it's sometimes impossible to divide the data into batches of exactly
equal size. For example, imagine you'd like to create batches of 128 samples each
from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd
wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 =
1000)
In that case, the size of the batches would vary, so you need to take advantage of
TensorFlow's tf.placeholder() function to receive the varying batch sizes.
Continuing the example, if each sample had n_input = 784 features and n_classes
= 10 possible labels, the dimensions for features would be [None,
n_input] and labels would be [None, n_classes] .
The None dimension is a placeholder for the batch size. At runtime, TensorFlow will
accept any batch size greater than 0.
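In code, that's a pair of placeholders with a None batch dimension (n_input and n_classes as defined above):

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])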
Going back to our earlier example, this setup allows you to
feed features and labels into the model as either the batches of 128 samples or
the single batch of 104 samples.
Question 2
Using the parameters below, how many batches are there, and what is the last batch size?
batch_size is 128
[Interactive quiz: enter the number of batches and the size of the last batch; answers are checked in the classroom.]
Now that you know the basics, let's learn how to implement mini-batching.
Question 3
Implement the batches function to batch features and labels . The function should
return each batch with a maximum size of batch_size . To help you with the quiz, look
at the following example output of a working batches function.
# 4 Samples of features
example_features = [
['F11','F12','F13','F14'],
['F21','F22','F23','F24'],
['F31','F32','F33','F34'],
['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
['L11','L12'],
['L21','L22'],
['L31','L32'],
['L41','L42']]
[
# 2 batches:
# First is a batch of size 3.
# Second is a batch of size 1
[
# First Batch is size 3
[
# 3 samples of features.
# There are 4 features per sample.
['F11', 'F12', 'F13', 'F14'],
['F21', 'F22', 'F23', 'F24'],
['F31', 'F32', 'F33', 'F34']
], [
# 3 samples of labels.
# There are 2 labels per sample.
['L11', 'L12'],
['L21', 'L22'],
['L31', 'L32']
]
], [
# Second Batch is size 1.
# Since batch size is 3, there is only one sample left from the 4 samples.
[
# 1 sample of features.
['F41', 'F42', 'F43', 'F44']
], [
# 1 sample of labels.
['L41', 'L42']
]
]
]
Implement the batches function in the "quiz.py" file below.
Start Quiz:
sandbox.py  quiz.py  quiz_solution.py
from quiz import batches
from pprint import pprint

# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]

# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

# PPrint prints data structures like 2d arrays, so they are easier to read.
# A batch size of 3 produces the example output shown above.
pprint(batches(3, example_features, example_labels))
Start Quiz:
sandbox.py  quiz.py  quiz_solution.py
import math


def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []

    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)

    return output_batches
Let's use mini-batching to feed batches of MNIST features and labels into a linear
model.
Set the batch size and run the optimizer over all the batches with
the batches function. The recommended batch size is 128. If you have memory
restrictions, feel free to make it smaller.
Start Quiz:
quiz.py  helper.py  quiz_solution.py
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches
learning_rate = 0.001
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels,
1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# TODO: Set batch size
batch_size = None
assert batch_size is not None, 'You must set the batch size'
init = tf.global_variables_initializer()
Start Quiz:
quiz.py  helper.py  quiz_solution.py
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches
learning_rate = 0.001
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels,
1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'
init = tf.global_variables_initializer()
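The excerpts above omit the middle of the quiz file; the session it builds toward looks roughly like this sketch (the placeholders, variables, cost, and optimizer follow the patterns from earlier in this lesson, and train_features/test_features come from the loaded MNIST data):

# Features and Labels (variable batch dimension)
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(init)

    # Train the optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for the test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))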
The accuracy is low, but you probably know that you could train on the dataset more than once. You'll go over this subject in the next section, where we talk about "epochs".
08. Epochs
Epochs
An epoch is a single forward and backward pass of the whole dataset. This is used to increase
the accuracy of the model without requiring more data. This section will cover epochs in
TensorFlow and how to choose the right number of epochs.
train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.global_variables_initializer()
batch_size = 128
epochs = 10
learn_rate = 0.001
# Training cycle
for epoch_i in range(epochs):
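    # (Sketch of the missing loop body, assuming 'sess', the feature/label
    #  placeholders, and the optimizer op from the surrounding session,
    #  plus an 'import math' at the top of the file.)
    batch_count = int(math.ceil(len(train_features) / batch_size))
    # Loop over all batches
    for batch_i in range(batch_count):
        batch_start = batch_i * batch_size
        batch_features = train_features[batch_start:batch_start + batch_size]
        batch_labels = train_labels[batch_start:batch_start + batch_size]
        # Run the optimizer on the current batch
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})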
This model continues to improve accuracy up to Epoch 9. Let's increase the number of epochs
to 100.
...
Epoch: 79 - Cost: 0.111 Valid Accuracy: 0.86
Epoch: 80 - Cost: 0.11 Valid Accuracy: 0.869
Epoch: 81 - Cost: 0.109 Valid Accuracy: 0.869
....
Epoch: 85 - Cost: 0.107 Valid Accuracy: 0.869
Epoch: 86 - Cost: 0.107 Valid Accuracy: 0.869
Epoch: 87 - Cost: 0.106 Valid Accuracy: 0.869
Epoch: 88 - Cost: 0.106 Valid Accuracy: 0.869
Epoch: 89 - Cost: 0.105 Valid Accuracy: 0.869
Epoch: 90 - Cost: 0.105 Valid Accuracy: 0.869
Epoch: 91 - Cost: 0.104 Valid Accuracy: 0.869
Epoch: 92 - Cost: 0.103 Valid Accuracy: 0.869
Epoch: 93 - Cost: 0.103 Valid Accuracy: 0.869
Epoch: 94 - Cost: 0.102 Valid Accuracy: 0.869
Epoch: 95 - Cost: 0.102 Valid Accuracy: 0.869
Epoch: 96 - Cost: 0.101 Valid Accuracy: 0.869
Epoch: 97 - Cost: 0.101 Valid Accuracy: 0.869
Epoch: 98 - Cost: 0.1 Valid Accuracy: 0.869
Epoch: 99 - Cost: 0.1 Valid Accuracy: 0.869
Test Accuracy: 0.8696000006198883
From looking at the output above, you can see the model doesn't increase the validation
accuracy after epoch 80. Let's see what happens when we increase the learning rate.
learn_rate = 0.1
In the upcoming TensorFlow Lab, you'll get the opportunity to choose your own learning rate, epoch count, and batch size to improve the model's accuracy.
09. Pre-Lab: NotMNIST in TensorFlow
TensorFlow Neural Network Lab
TensorFlow Lab
We've prepared a Jupyter notebook that will guide you through the process of creating a single
layer neural network in TensorFlow. You'll implement data normalization, then build and train
the network with TensorFlow.
To start the notebook server, run the following command from the directory that contains the notebook:
jupyter notebook
This should open a browser window for you. If it doesn't, go to http://localhost:8888/tree. The port number might be different if you have other notebook servers running, so try 8889 instead of 8888 if you can't find the right server.
You should see the notebook intro_to_tensorflow.ipynb; this is the notebook you'll be working on. The notebook has 3 problems for you to solve.
This is a self-assessed lab. Compare your answers to the solutions here. If you have any
difficulty completing the lab, Udacity provides a few services to answer any questions you
might have.
Help
Remember that you can get assistance from your mentor, the Forums (click the link on the left
side of the classroom), or the Slack channel. You can also review the concepts from the
previous lessons.
10. Lab: NotMNIST in TensorFlow
Workspace
This section contains a workspace (either a Jupyter Notebook workspace or an online code editor workspace), which cannot be automatically reproduced here. Please access the classroom with your account and manually download the workspace to your local machine. Note that for some courses, Udacity uploads the workspace files onto https://github.com/udacity, so you may be able to download them there.
Workspace Information:
The first thing we'll learn to implement in TensorFlow is a ReLU hidden layer. The ReLU, or rectified linear unit, is a non-linear function: it is 0 for negative inputs and x for all inputs x > 0.
As before, the following nodes build on the knowledge from the Deep Neural Networks lesson. If you need to refresh your memory, you can go back and watch them again.
ReLU
Feedforward
Dropout
12. Quiz: TensorFlow ReLUs
TensorFlow ReLUs
TensorFlow provides the ReLU function as tf.nn.relu() , as shown below.
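A sketch of its use in a hidden layer (the weight and bias names are illustrative):

# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)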
Quiz
Below you'll use the ReLU function to turn a linear single layer network into a non-linear
multilayer network.
Start Quiz:
quiz.py solution.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf
output = None
hidden_layer_weights = [
[0.1, 0.2, 0.4],
[0.4, 0.6, 0.6],
[0.5, 0.9, 0.1],
[0.8, 0.2, 0.8]]
out_weights = [
[0.1, 0.6],
[0.2, 0.1],
[0.7, 0.9]]
# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0,
-4.0], [11.0, 12.0, 13.0, 14.0]])
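One possible solution sketch (zero-initialized biases of widths 3 and 2 are assumed, to match the weight shapes):

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Hidden layer with ReLU activation
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)

# Output layer
output = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output))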
Step by Step
In the following walkthrough, we'll step through TensorFlow code written to classify the digits in the MNIST database. If you would like to run the network on your computer, the file is provided here. You can find this and many more examples of TensorFlow at Aymeric Damien's GitHub repository.
Code
TensorFlow MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
You'll use the MNIST dataset provided by TensorFlow, which batches and one-hot encodes the data for you.
Learning Parameters
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128 # Decrease batch size if you don't have enough memory
display_step = 1
Input
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])
Multilayer Perceptron
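The multilayer perceptron itself looks roughly like the following sketch (the weights/biases dictionaries, n_input = 784, and the hidden layer width are assumptions; the 28x28x1 input is flattened first):

n_input = 784          # 28*28 pixels
n_hidden_layer = 256   # assumed hidden layer width

# Store layers' weights & biases
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Flatten the 28x28x1 input image
x_flat = tf.reshape(x, [-1, n_input])

# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)

# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])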
Optimizer
# Define loss and optimizer
cost = tf.reduce_mean(\
tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
.minimize(cost)
This is the same optimization technique used in the Intro to TensorFlow lab.
Session
# Initializing the variables
init = tf.global_variables_initializer()
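The training session then loops over epochs and batches; a sketch (assuming the cost and optimizer defined above):

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run the optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})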
That's it! Going from one layer to two is easy. Adding more layers to the network allows you
to solve more complicated problems.
14. Save and Restore TensorFlow Models
Save and Restore TensorFlow Models
Training a model can take hours. But once you close your TensorFlow session, you lose all the
trained weights and biases. If you were to reuse the model in the future, you would have to
train it all over again!
Fortunately, TensorFlow gives you the ability to save your progress using a class
called tf.train.Saver . This class provides the functionality to save any tf.Variable to
your file system.
Saving Variables
Let's start with a simple example of saving weights and bias Tensors. For the first example
you'll just save two variables. Later examples will save all the weights in a practical model.
import tensorflow as tf
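# (Sketch continuing the example; the [2, 3] and [3] shapes are illustrative.)

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights: {}'.format(sess.run(weights)))
    print('Bias: {}'.format(sess.run(bias)))

    # Save the model
    saver.save(sess, save_file)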
If you're using TensorFlow 0.11.0RC1 or newer, a file called "model.ckpt.meta" will also be
created. This file contains the TensorFlow graph.
Loading Variables
Now that the Tensor Variables are saved, let's load them back into a new model.
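A sketch of the loading code (same illustrative shapes as above):

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weights: {}'.format(sess.run(weights)))
    print('Bias: {}'.format(sess.run(bias)))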
You'll notice you still need to create the weights and bias Tensors in Python.
The tf.train.Saver.restore() function loads the saved data into weights and bias .
Since tf.train.Saver.restore() sets all the TensorFlow Variables, you don't need to
call tf.global_variables_initializer() .
learning_rate = 0.001
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Let's train that model, then save the weights:
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph, train, then save the model
# (the mnist data, features/labels placeholders, and an optimizer
#  minimizing the cost are assumed, as set up earlier in this lesson)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

    # Save the trained model
    saver.save(sess, save_file)

# Load the saved model and check its accuracy against the test set
saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})
That's it! You now know how to save and load a trained model in TensorFlow. Let's look at
loading weights and biases into modified models in the next section.
15. Finetuning
Loading the Weights and Biases into a New Model
Sometimes you might want to adjust, or "finetune" a model that you have already trained and
saved.
However, loading saved Variables directly into a modified model can generate errors. Let's go
over how to avoid these problems.
Naming Error
TensorFlow uses a string identifier for Tensors and Operations called name . If a name is not
given, TensorFlow will create one automatically. TensorFlow will give the first node the
name <Type> , and then give the name <Type>_<number> for the subsequent nodes. Let's see
how this can affect loading a model with a different order of weights and bias :
import tensorflow as tf

save_file = 'model.ckpt'

# Save with weights created first, then bias
# (TensorFlow auto-names them "Variable:0" and "Variable_1:0")
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
saver = tf.train.Saver()

# ...save the session, then rebuild the graph with the order switched
# (bias first, weights second) and try to restore it:
saver = tf.train.Saver()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.
You'll notice that the name properties for weights and bias are different than when you
saved the model. This is why the code produces the "Assign requires shapes of both tensors to
match" error. The code saver.restore(sess, save_file) is trying to load weight data
into bias and bias data into weights .
Instead of letting TensorFlow set the name property, let's set it manually:
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Assign explicit names so the restore matches Variables by name,
# not by creation order (the names here are illustrative)
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()
That worked! The Tensor names match and the data loaded correctly.
16. Quiz: TensorFlow Dropout
TensorFlow Dropout
Figure 1: Taken from the paper "Dropout: A Simple Way to Prevent Neural Networks from
Overfitting" (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)
TensorFlow provides the tf.nn.dropout() function, which you can use to implement
dropout.
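A sketch of a hidden layer wrapped in dropout (the weights/biases lists are illustrative, as in the ReLU quiz above):

keep_prob = tf.placeholder(tf.float32)  # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])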
keep_prob allows you to adjust the number of units to drop. In order to compensate for
dropped units, tf.nn.dropout() multiplies all units that are kept (i.e. not dropped)
by 1/keep_prob .
During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of
the model.
Quiz 1
Take a look at the code snippet below. Do you see what's wrong?
There's nothing wrong with the syntax, however the test accuracy is extremely low.
...
...
sess.run(optimizer, feed_dict={
features: batch_features,
labels: batch_labels,
keep_prob: 0.5})
Dropout doesn't work with batching.
The keep_prob value of 0.5 is too low.
There shouldn't be a value passed to keep_prob when testing for accuracy.
keep_prob should be set to 1.0 when evaluating validation accuracy.
Note: Output will be different every time the code is run. This is caused by dropout
randomizing the units it drops.
Start Quiz:
quiz.py solution.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf
hidden_layer_weights = [
[0.1, 0.2, 0.4],
[0.4, 0.6, 0.6],
[0.5, 0.9, 0.1],
[0.8, 0.2, 0.8]]
out_weights = [
[0.1, 0.6],
[0.2, 0.1],
[0.7, 0.9]]
# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4],
[11.0, 12.0, 13.0, 14.0]])
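One possible solution sketch, keeping 50% of the hidden units for this forward pass (the zero biases and the 0.5 keep probability are illustrative):

keep_prob = tf.placeholder(tf.float32)  # probability to keep units

weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Hidden layer with ReLU activation and dropout
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

# Output layer
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))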