
Self-Driving Car

Program Syllabus

Core Curriculum
This section consists of all the lessons and projects you need to complete in order to receive your
certificate.
3 Parts
14 Projects

Part 1

Computer Vision and Deep Learning


In this term, you'll become an expert in applying Computer Vision and Deep Learning to automotive problems. You will teach the car to detect lane lines, predict steering angle, and more, all based on just camera data!
Part 2

Sensor Fusion, Localization, and Control


In this term, you'll learn how to use an array of sensor data to perceive the environment and
control the vehicle. You'll evaluate sensor data from camera, radar, lidar, and GPS, and use these
in closed-loop controllers that actuate the vehicle.
Part 3

Path Planning, Concentrations, and Systems
In this term, you'll learn how to plan where the vehicle should go, how the vehicle systems work
together to get it there, and you'll perform a deep-dive into a concentration of your choice.

Up Next

Part 1, Lesson 2: Finding Lane Lines

Extracurricular
This section consists of extra lessons and projects you can choose to complete in order to increase your
chances of changing careers.
2 Parts
7 Projects

Part 1

Career: Job Search Strategies


Opportunity can come when you least expect it, so when your dream job comes along, you want
to be ready. In the following lessons, you will learn strategies for conducting a successful job
search, including developing a targeted resume and cover letter for that job.
Part 2

Career: Networking
Networking is a very important component of a successful job search. In the following lesson,
you will learn how to tell your unique story to recruiters in a succinct, professional, but
relatable way.

1. Welcome to the Self-Driving Car Nanodegree Program

https://www.youtube.com/watch?v=jHA__A61nqc

2. Meet Your Instructors

https://www.youtube.com/watch?v=QiflJFVOt18

3. Overview of ND Program
https://www.youtube.com/watch?v=RZ5iolr4RGs

4. What Projects Will You Build?

https://www.youtube.com/watch?v=JGpXenoW0dk

5. Career Support

https://www.youtube.com/watch?time_continue=9&v=4MGOyNXh4EQ
6. Nanodegree Support

Getting Support
There are several ways in which you will receive support during the program from Udacity's network
of Mentors and Reviewers, as well as your fellow students.

Mentorship
You can think of your Mentor as your Advisor in the Nanodegree.
Your in-classroom Mentor will be your guide through the program and will do the following:
Check in with you weekly to make sure that you are on track.
Help you set learning goals.
Guide you to supplementary resources when you get stuck.
Respond to any questions you have about the program.

If you have questions or comments about the Mentorship experience, or if you're having trouble
reaching your mentor, please email mentorship-support@udacity.com.

Forum Q&A

Udacity Discourse will be your home for the forums and the wiki.
Aside from your Mentor, the forums are a great place to ask in-depth and technical questions.
Questions in the forums will be answered by both paid mentors and other students. Make sure to like
answers as you read them, and feel free to post answers yourself!
We will be using Discourse for the forums, and you should be able to access these forums anytime by
following the forum link on the left hand side of the classroom. Once you are there, check out the
different categories and subcategories, and post a question if you have one!

Slack Community

Your private Slack team will be the best place to chat live with students and staff.
Slack is the best place for live discussion and interaction with your community of students. If you
haven't joined already, you can sign up here. (Note that this Slack instance is for enrolled students and
is different from the ND013 Slack Team.)
Reviews

Our global team of Reviewers will code review each of your project submissions, usually within 24
hours.
For each project you submit, you will receive detailed feedback from a project Reviewer.
Sometimes, a reviewer might ask you to resubmit a project to meet specifications. In that case, an
indication of needed changes will also be provided. Note that you can submit a project as many times
as needed to pass.

Feedback
Please help us improve the program by submitting bugs and issues to our Waffle board.
In order to keep our content up-to-date and address issues quickly, we've set up a Waffle board to track
error reports and suggestions.
If you find an error, check there to see if it has already been filed. If it hasn't, you can file an issue by
clicking on the "Add issue" button, adding a title, and entering a description in the details (you will
need a GitHub account for this).
Links and screenshots, if available, are always appreciated!

An example of adding an issue to our Waffle board.

Quiz Question
Have you signed up for Slack? Have you visited the forums?

7. Deadline Policy
When we use the term deadline with regards to Nanodegree program projects, we use it in one of two
ways:
A final deadline for passing all projects
Ongoing suggested deadline for individual projects

It is very important to understand the distinction between the two, as your progress in the program is
measured against the deadlines we've established. Please see below for an explanation of what each
usage means.

A final deadline for passing all projects


Passing a project in this context means that a Udacity Reviewer has marked your project as Meets
Specifications. In order to graduate a term, you have to pass all projects by the last day of the term.
If you do not pass all projects by the last day of the term, the following happens:
You will receive a 4-week extension to complete any outstanding projects. You will receive this
extension a maximum of one time. Once you submit and pass all projects, you can enroll in the
next term, which will potentially be with a later class. If you do not submit and pass all projects
within the 4-week extension, you will be removed from the program.

Ongoing suggested deadlines for individual projects

An example of a suggested deadline.


The deadlines you see in your classroom are suggestions for when you should ideally pass each project.
They are meant to help keep you on track so that you maintain an appropriate pace throughout the
program, one that will see you graduate on time!
Please note that you can submit your project as many times as you need to. There are no penalties if
you miss these deadlines. However, if you miss these deadlines and fall behind, you will be at risk of
not passing all projects on time, so it is a recommended best practice to try to meet each suggested
deadline.
8. Self-Driving Car History


Stanley - The car that Sebastian Thrun and his team at Stanford built to win the DARPA Grand
Challenge.
The recent advancements in self-driving cars are built on decades of work by people around the world.
In the next video, you'll get a chance to step back and learn about some of this work and how your own
contributions may one day fit into this narrative.
In particular, you'll get a chance to relive the DARPA Grand Challenge, one of the great milestones in
self-driving car technology, and meet some of the people who took on this seemingly impossible task.
This video is not required, but we highly encourage you to watch it when you get the chance.
We hope you enjoy it as much as we did!

9. The Great Robot Race

https://www.youtube.com/watch?v=saVZ_X9GfIM
10. Self-Driving Car Quiz

And now for a fun quiz... test your knowledge of the self-driving car industry!
Question 1 of 2
Can you guess how many self-driving cars will be on the road by the year 2020?

10,000,000

Question 2 of 2
Can you guess which of the following companies are CURRENTLY developing self-driving cars?

All of the above

Discussion Forum: https://discussions.udacity.com/c/nd013-finding-lane-lines


1. Setting up the Problem
https://www.youtube.com/watch?v=aIkAcXVxf2w

Quiz Question
Which of the following features could be useful in the identification of lane lines on the road?
Color

Shape

Orientation

Position in the image

2. Color Selection
https://www.youtube.com/watch?time_continue=1&v=bNOWJ9wdmhk

Quiz Question
What color is pure white in our combined red + green + blue [R, G, B] image?

[255, 255, 255]

3. Color Selection Code Example

Coding up a Color Selection


Let's code up a simple color selection in Python.
No need to download or install anything, you can just follow along in the browser for now.
We'll be working with the same image you saw previously.

Check out the code below. First, I import pyplot and image from matplotlib. I also import
numpy for operating on the image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

I then read in an image and print out some stats. I'll grab the x and y sizes and make a copy of the
image to work with. NOTE: Always make a copy of arrays or other variables in Python. If instead you
say "a = b", then all changes you make to "a" will be reflected in "b" as well!
# Read in the image and print out some stats
image = mpimg.imread('test.jpg')
print('This image is: ', type(image),
      'with dimensions:', image.shape)

# Grab the x and y size and make a copy of the image
ysize = image.shape[0]
xsize = image.shape[1]
# Note: always make a copy rather than simply using "="
color_select = np.copy(image)
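To see why the copy matters, here's a tiny sketch (the variable names are just for illustration) of how "a = b" aliases a NumPy array, while np.copy() makes an independent one:

```python
import numpy as np

a = np.zeros(3)
b = a            # "b" is just another name for the same array
b[0] = 99
print(a[0])      # 99.0 -- modifying b modified a too!

c = np.copy(a)   # an independent copy
c[1] = 7
print(a[1])      # 0.0 -- a is unaffected by changes to c
```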

Next I define a color threshold in the variables red_threshold, green_threshold, and
blue_threshold, and populate rgb_threshold with these values. This vector contains the
minimum values for red, green, and blue (R, G, B) that I will allow in my selection.
# Define our color selection criteria
# Note: if you run this code, you'll find these are not sensible values!!
# But you'll get a chance to play with them soon in a quiz
red_threshold = 0
green_threshold = 0
blue_threshold = 0
rgb_threshold = [red_threshold, green_threshold, blue_threshold]

Next, I'll select any pixels below the threshold and set them to zero.
After that, all pixels that meet my color criterion (those above the threshold) will be retained, and those
that do not (below the threshold) will be blacked out.
# Identify pixels below the threshold
thresholds = (image[:,:,0] < rgb_threshold[0]) \
           | (image[:,:,1] < rgb_threshold[1]) \
           | (image[:,:,2] < rgb_threshold[2])
color_select[thresholds] = [0,0,0]

# Display the image
plt.imshow(color_select)
plt.show()

The result, color_select, is an image in which pixels that were above the threshold have been
retained, and pixels below the threshold have been blacked out.
In the code snippet above, red_threshold, green_threshold and blue_threshold are all
set to 0, which implies all pixels will be included in the selection.
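To make the mask behavior concrete, here's a tiny sketch with made-up pixel values and a more sensible threshold:

```python
import numpy as np

# A toy 2x2 "image": two bright pixels and two with at least one dark channel
image = np.array([[[200, 200, 200], [50, 50, 50]],
                  [[255, 255, 255], [10, 120, 240]]], dtype=np.uint8)
rgb_threshold = [180, 180, 180]

# True wherever ANY channel falls below its threshold
thresholds = (image[:, :, 0] < rgb_threshold[0]) \
           | (image[:, :, 1] < rgb_threshold[1]) \
           | (image[:, :, 2] < rgb_threshold[2])
print(thresholds)
# [[False  True]
#  [False  True]]  -> only the two bright pixels would be retained
```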

In the next quiz, you will modify the values of red_threshold, green_threshold and
blue_threshold until you retain as much of the lane lines as possible while dropping everything
else. Your output image should look like the one below.
Image after color selection

4. QUIZ: Color Selection


In the next quiz, I want you to modify the values of the variables red_threshold,
green_threshold, and blue_threshold until you are able to retain as much of the lane lines
as possible, while getting rid of most of the other stuff. When you run the code in the quiz, your image
will be output with an example image next to it. Tweak these variables such that your input image (on
the left below) looks like the example image on the right.
5. Region Masking
https://www.youtube.com/watch?v=ngN9Cr-QfiI

Coding up a Region of Interest Mask


Awesome! Now you've seen that with a simple color selection we have managed to eliminate almost
everything in the image except the lane lines.
At this point, however, it would still be tricky to extract the exact lines automatically, because we still
have some other objects detected around the periphery that aren't lane lines.

In this case, I'll assume that the front facing camera that took the image is mounted in a fixed position
on the car, such that the lane lines will always appear in the same general region of the image. Next, I'll
take advantage of this by adding a criterion to only consider pixels for color selection in the region
where we expect to find the lane lines.
Check out the code below. The variables left_bottom, right_bottom, and apex represent the
vertices of a triangular region that I would like to retain for my color selection, while masking
everything else out. Here I'm using a triangular mask to illustrate the simplest case, but later you'll use
a quadrilateral, and in principle, you could use any polygon.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

# Read in the image and print some stats
image = mpimg.imread('test.jpg')
print('This image is: ', type(image),
      'with dimensions:', image.shape)

# Pull out the x and y sizes and make a copy of the image
ysize = image.shape[0]
xsize = image.shape[1]
region_select = np.copy(image)

# Define a triangle region of interest
# Keep in mind the origin (x=0, y=0) is in the upper left in image processing
# Note: if you run this code, you'll find these are not sensible values!!
# But you'll get a chance to play with them soon in a quiz
left_bottom = [0, 539]
right_bottom = [900, 300]
apex = [400, 0]

# Fit lines (y=Ax+B) to identify the 3 sided region of interest
# np.polyfit() returns the coefficients [A, B] of the fit
fit_left = np.polyfit((left_bottom[0], apex[0]), (left_bottom[1], apex[1]), 1)
fit_right = np.polyfit((right_bottom[0], apex[0]), (right_bottom[1], apex[1]), 1)
fit_bottom = np.polyfit((left_bottom[0], right_bottom[0]),
                        (left_bottom[1], right_bottom[1]), 1)

# Find the region inside the lines
XX, YY = np.meshgrid(np.arange(0, xsize), np.arange(0, ysize))
region_thresholds = (YY > (XX*fit_left[0] + fit_left[1])) & \
                    (YY > (XX*fit_right[0] + fit_right[1])) & \
                    (YY < (XX*fit_bottom[0] + fit_bottom[1]))

# Color pixels red which are inside the region of interest
region_select[region_thresholds] = [255, 0, 0]

# Display the image
plt.imshow(region_select)
plt.show()

6. Color and Region Combined


Combining Color and Region Selections
Now you've seen how to mask out a region of interest in an image. Next, let's combine the mask and
color selection to pull only the lane lines out of the image.
Check out the code below. Here we're doing both the color and region selection steps, requiring that a
pixel meet both the mask and color selection requirements to be retained.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

# Read in the image
image = mpimg.imread('test.jpg')

# Grab the x and y sizes and make two copies of the image
# With one copy we'll extract only the pixels that meet our selection,
# then we'll paint those pixels red in the original image to see our selection
# overlaid on the original.
ysize = image.shape[0]
xsize = image.shape[1]
color_select = np.copy(image)
line_image = np.copy(image)

# Define our color criteria
red_threshold = 0
green_threshold = 0
blue_threshold = 0
rgb_threshold = [red_threshold, green_threshold, blue_threshold]

# Define a triangle region of interest
# Keep in mind the origin (x=0, y=0) is in the upper left in image processing
# Note: if you run this code, you'll find these are not sensible values!!
# But you'll get a chance to play with them soon in a quiz ;)
left_bottom = [0, 539]
right_bottom = [900, 300]
apex = [400, 0]

# Fit lines (y=Ax+B) to the three sides of the triangle
fit_left = np.polyfit((left_bottom[0], apex[0]), (left_bottom[1], apex[1]), 1)
fit_right = np.polyfit((right_bottom[0], apex[0]), (right_bottom[1], apex[1]), 1)
fit_bottom = np.polyfit((left_bottom[0], right_bottom[0]),
                        (left_bottom[1], right_bottom[1]), 1)

# Mask pixels below the threshold
color_thresholds = (image[:,:,0] < rgb_threshold[0]) | \
                   (image[:,:,1] < rgb_threshold[1]) | \
                   (image[:,:,2] < rgb_threshold[2])

# Find the region inside the lines
XX, YY = np.meshgrid(np.arange(0, xsize), np.arange(0, ysize))
region_thresholds = (YY > (XX*fit_left[0] + fit_left[1])) & \
                    (YY > (XX*fit_right[0] + fit_right[1])) & \
                    (YY < (XX*fit_bottom[0] + fit_bottom[1]))

# Mask color selection
color_select[color_thresholds] = [0,0,0]
# Find where image is both colored right and in the region
line_image[~color_thresholds & region_thresholds] = [255,0,0]

# Display our two output images
plt.imshow(color_select)
plt.show()
plt.imshow(line_image)
plt.show()

In the next quiz, you can vary your color selection and the shape of your region mask (vertices of a
triangle left_bottom, right_bottom, and apex), such that you pick out the lane lines and
nothing else.

7. Quiz: Color and Region Selection

In this next quiz, I've given you the values of red_threshold, green_threshold, and
blue_threshold but now you need to modify left_bottom, right_bottom, and apex to
represent the vertices of a triangle identifying the region of interest in the image. When you run the
code in the quiz, your output result will be several images. Tweak the vertices until your output looks
like the examples shown below.

8. Finding Lines of Any Color


So you found the lane lines... simple, right? Now you're ready to upload the algorithm to the car and
drive autonomously, right?? Well, not quite yet ;)
As it happens, lane lines are not always the same color, and even lines of the same color under different
lighting conditions (day, night, etc.) may fail to be detected by our simple color selection.
What we need is to take our algorithm to the next level to detect lines of any color using sophisticated
computer vision methods.
So, what is computer vision?

9. What is Computer Vision?

https://www.youtube.com/watch?v=wxQhfSdxjKU

In the rest of this lesson, we'll introduce some computer vision techniques with enough detail for you to
get an intuitive feel for how they work.
You'll learn much more about these topics during the Computer Vision module later in the program.
We also recommend the free Udacity course, Introduction to Computer Vision.

Throughout this Nanodegree Program, we will be using Python with OpenCV for computer vision
work. OpenCV stands for Open-Source Computer Vision. For now, you don't need to download or
install anything, but later in the program we'll help you get these tools installed on your own computer.
OpenCV contains extensive libraries of functions that you can use. The OpenCV libraries are well
documented, so if you're ever feeling confused about what the parameters in a particular function are
doing, or anything else, you can find a wealth of information at opencv.org.
10. Canny Edge Detection
https://www.youtube.com/watch?v=Av2GsgQWX8I
https://www.youtube.com/watch?time_continue=6&v=LQM--KPJjD0

Note! The standard location of the origin (x=0, y=0) for images is the top-left corner, with y
values increasing downward and x increasing to the right. This might seem weird at first, but if
you think about an image as a matrix, it makes sense that the (0, 0) element is in the upper left.
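A quick sketch of this indexing convention using a tiny NumPy array as a stand-in image:

```python
import numpy as np

img = np.arange(12).reshape(3, 4)  # a tiny 3x4 grayscale "image"
# Rows are indexed by y (increasing downward), columns by x (increasing right):
top_left = img[0, 0]        # row 0, column 0
bottom_right = img[2, 3]    # last row (y = ysize-1), last column (x = xsize-1)
print(top_left, bottom_right)   # 0 11
```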

Now let's try a quiz. Below, I'm plotting a cross section through this image. Where are the areas in the
image that are most likely to be identified as edges?

Quiz Question
The red line in the plot above shows where I took a cross section through the image. The wiggles in the
blue line indicate changes in intensity along that cross section through the image. Check all the boxes
of the letters along this cross section, where you expect to find strong edges.
A

E
11. Canny to Detect Lane Lines

Canny Edge Detection in Action


Now that you have a conceptual grasp on how the Canny algorithm works, it's time to use it to find the
edges of the lane lines in an image of the road. So let's give that a try.
First, we need to read in an image:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
image = mpimg.imread('exit-ramp.jpg')
plt.imshow(image)
Here we have an image of the road, and it's fairly obvious by eye where the lane lines are, but what
about using computer vision?
Let's go ahead and convert to grayscale.
import cv2 #bringing in OpenCV libraries
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) #grayscale conversion
plt.imshow(gray, cmap='gray')

Let's try our Canny edge detector on this image. This is where OpenCV gets useful. First, we'll have a
look at the parameters for the OpenCV Canny function. You will call it like this:
edges = cv2.Canny(gray, low_threshold, high_threshold)

In this case, you are applying Canny to the image gray and your output will be another image called
edges. low_threshold and high_threshold are your thresholds for edge detection.

The algorithm will first detect strong edge (strong gradient) pixels above the high_threshold, and
reject pixels below the low_threshold. Next, pixels with values between the low_threshold
and high_threshold will be included as long as they are connected to strong edges. The output
edges is a binary image with white pixels tracing out the detected edges and black everywhere else.
See the OpenCV Canny Docs for more details.
What would make sense as a reasonable range for these parameters? In our case, converting to
grayscale has left us with an 8-bit image, so each pixel can take 2^8 = 256 possible values. Hence, the
pixel values range from 0 to 255.
This range implies that derivatives (essentially, the value differences from pixel to pixel) will be on the
scale of tens or hundreds. So, a reasonable range for your threshold parameters would also be in
the tens to hundreds.
As far as a ratio of low_threshold to high_threshold, John Canny himself recommended a
low to high ratio of 1:2 or 1:3.
We'll also include Gaussian smoothing before running Canny, which is essentially a way of
suppressing noise and spurious gradients by averaging (check out the OpenCV docs for GaussianBlur).
cv2.Canny() actually applies Gaussian smoothing internally, but we include it here because you can
get a different result by applying further smoothing (and it's not an adjustable parameter within
cv2.Canny()!).

You can choose the kernel_size for Gaussian smoothing to be any odd number. A larger
kernel_size implies averaging, or smoothing, over a larger area. The example in the previous
lesson was kernel_size = 3.

Note: If this is all sounding complicated and new to you, don't worry! We're moving pretty fast through
the material here, because for now we just want you to be able to use these tools. If you would like to
dive into the math underpinning these functions, please check out the free Udacity course, Intro to
Computer Vision, where the third lesson covers Gaussian filters and the sixth and seventh lessons cover
edge detection.
# Doing all the relevant imports
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2

# Read in the image and convert to grayscale
image = mpimg.imread('exit-ramp.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# Define a kernel size for Gaussian smoothing / blurring
# Note: this step is optional as cv2.Canny() applies a 5x5 Gaussian internally
kernel_size = 3
blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)

# Define parameters for Canny and run it
# NOTE: if you try running this code you might want to change these!
low_threshold = 1
high_threshold = 10
edges = cv2.Canny(blur_gray, low_threshold, high_threshold)

# Display the image
plt.imshow(edges, cmap='Greys_r')
plt.show()

Here I've called the OpenCV function Canny on a Gaussian-smoothed grayscaled image called
blur_gray and detected edges with thresholds on the gradient of high_threshold, and
low_threshold.

In the next quiz you'll get to try this on your own and mess around with the parameters for the Gaussian
smoothing and Canny Edge Detection to optimize for detecting the lane lines and not a lot of other
stuff.

12. Quiz: Canny Edge Detection Quiz


Now it's your turn! Try using Canny on your own and fiddle with the parameters for the Gaussian
smoothing and edge detection to optimize for detecting the lane lines well without detecting a lot of
other stuff. Your result should look like the example shown below.
13. Hough Transform

Using the Hough Transform to Find Lines from Canny Edges

https://www.youtube.com/watch?v=JFwj5UtKmPY

In image space, a line is plotted as x vs. y, but in 1962, Paul Hough devised a method for representing
lines in parameter space, which we will call Hough space in his honor.
In Hough space, I can represent my "x vs. y" line as a point in "m vs. b" instead. The Hough Transform
is just the conversion from image space to Hough space. So, the characterization of a line in image
space will be a single point at the position (m, b) in Hough space.

So now I'd like to check your intuition: if a line in image space corresponds to a point in Hough
space, what would two parallel lines in image space correspond to in Hough space?
Question 1 of 5
What will be the representation in Hough space of two parallel lines in image space?

Alright, so a line in image space corresponds to a point in Hough space. What does a point in
image space correspond to in Hough space?
A single point in image space has many possible lines that pass through it, but not just any lines, only
those with particular combinations of the m and b parameters. Rearranging the equation of a line, we
find that a single point (x,y) corresponds to the line b = y - xm.
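As a quick numeric sketch of this relationship (the two points below are made up for illustration): each image-space point (x, y) traces out its own line b = y - xm in Hough space, and the Hough-space lines for two points intersect at the (m, b) of the image-space line through both points.

```python
import numpy as np

# Two made-up image-space points lying on the line y = 2x + 1
p1 = (1, 3)
p2 = (3, 7)

m = np.linspace(-5, 5, 11)     # a grid of candidate slopes
b1 = p1[1] - p1[0] * m         # Hough-space line for p1: b = y1 - x1*m
b2 = p2[1] - p2[0] * m         # Hough-space line for p2

# The two Hough-space lines intersect where b1 == b2
i = np.argmin(np.abs(b1 - b2))
print(m[i], b1[i])             # 2.0 1.0 -> the (m, b) of y = 2x + 1
```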
So what is the representation of a point in image space in Hough space?
Question 2 of 5
What does a point in image space correspond to in Hough space?
A

What if you have 2 points in image space. What would that look like in Hough space?
Question 3 of 5
What is the representation in Hough space of two points in image space?

Alright, now we have two intersecting lines in Hough Space. How would you represent their
intersection at the point (m0, b0) in image space?
Question 4 of 5
What does the intersection point of the two lines in Hough space correspond to in image space?
A) A line in image space that passes through both (x1, y1) and (x2, y2)

https://www.youtube.com/watch?v=XQf7FOhwOVk

So, what happens if we run a Hough Transform on an image of a square? What will the corresponding
plot in Hough space look like?

https://www.youtube.com/watch?v=upKjISd3aBk

Question 5 of 5
What happens if we run a Hough Transform on an image of a square? What will the corresponding plot
in Hough space look like?

14. Hough Transform to Find Lane Lines


Implementing a Hough Transform on Edge
Detected Image
Now you know how the Hough Transform works, but to accomplish the task of finding lane lines, we
need to specify some parameters to say what kind of lines we want to detect (i.e., long lines, short lines,
bendy lines, dashed lines, etc.).
To do this, we'll be using an OpenCV function called HoughLinesP that takes several parameters.
Let's code it up and find the lane lines in the image we detected edges in with the Canny function (for a
look at coding up a Hough Transform from scratch, check this out).
Here's the image we're working with:

Let's look at the input parameters for the OpenCV function HoughLinesP that we will use to find
lines in the image. You will call it like this:
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)
In this case, we are operating on the image edges (the output from Canny) and the output from
HoughLinesP will be lines, which will simply be an array containing the endpoints (x1, y1, x2, y2)
of all line segments detected by the transform operation. The other parameters define just what kind of
line segments we're looking for.
First off, rho and theta are the distance and angular resolution of our grid in Hough space.
Remember that, in Hough space, we have a grid laid out along the (ρ, θ) axes. You need to specify rho
in units of pixels and theta in units of radians.

So, what are reasonable values? Well, rho takes a minimum value of 1, and a reasonable starting place
for theta is 1 degree (pi/180 in radians). Scale these values up to be more flexible in your definition of
what constitutes a line.
The threshold parameter specifies the minimum number of votes (intersections in a given grid cell)
a candidate line needs to have to make it into the output. The empty np.array([]) is just a
placeholder, no need to change it. min_line_length is the minimum length of a line (in pixels)
that you will accept in the output, and max_line_gap is the maximum distance (again, in pixels)
between segments that you will allow to be connected into a single line. You can then iterate through
your output lines and draw them onto the image to see what you got!

So, here's what it's going to look like:


# Do relevant imports
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2

# Read in and grayscale the image
image = mpimg.imread('exit-ramp.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# Define a kernel size and apply Gaussian smoothing
kernel_size = 5
blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)

# Define our parameters for Canny and apply
low_threshold = 50
high_threshold = 150
edges = cv2.Canny(blur_gray, low_threshold, high_threshold)

# Define the Hough transform parameters
# Make a blank the same size as our image to draw on
rho = 1
theta = np.pi/180
threshold = 1
min_line_length = 10
max_line_gap = 1
line_image = np.copy(image)*0  # creating a blank to draw lines on

# Run Hough on edge detected image
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)

# Iterate over the output "lines" and draw lines on the blank
for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 10)

# Create a "color" binary image to combine with line image
color_edges = np.dstack((edges, edges, edges))

# Draw the lines on the edge image
combo = cv2.addWeighted(color_edges, 0.8, line_image, 1, 0)
plt.imshow(combo)
plt.show()
As you can see I've detected lots of line segments! Your job, in the next exercise, is to figure out which
parameters do the best job of optimizing the detection of the lane lines. Then, you'll want to apply a
region of interest mask to filter out detected line segments in other areas of the image. Earlier in this
lesson you used a triangular region mask, but this time you'll get a chance to use a quadrilateral region
mask using the cv2.fillPoly() function (keep in mind though, you could use this same method to
mask an arbitrarily complex polygon region). When you're finished you'll be ready to apply the skills
you've learned to do the project at the end of this lesson.
15. Quiz: Hough Transform Quiz
Now it's your turn to play with the Hough Transform on an edge-detected image. You'll start with the
image on the left below. If you "Test Run" the quiz, you'll get output that looks like the center image.
Your job is to modify the parameters for the Hough Transform and impose a region of interest mask to
get output that looks like the image on the right. In the code, I've given you a framework for defining a
quadrilateral region of interest mask.

16. Finding Lane Lines in a Video Stream


https://www.youtube.com/watch?v=LatP7XUPgIE

17. Starter Kit Installation


In this term, you'll use Python 3 for programming quizzes, labs, and projects. The following will guide
you through setting up the programming environment on your local machine.
There are two ways to get up and running:
1. Anaconda
2. Docker
We recommend you first try setting up your environment with Anaconda. It's faster to get up and
running and has fewer moving parts.
If the Anaconda installation gives you trouble, try Docker instead.
Follow the instructions in this README: https://github.com/udacity/CarND-Term1-Starter-Kit/blob/master/README.md
Here is a great link for learning more about Anaconda and Jupyter Notebooks: https://classroom.udacity.com/courses/ud1111

18. Run Some Code!

Run Some Code!


Now that everything is installed, let's make sure it's working!
1. Clone and navigate to the starter kit test repository.
# NOTE: This is DIFFERENT from https://github.com/udacity/CarND-Term1-Starter-Kit.git
git clone https://github.com/udacity/CarND-Term1-Starter-Kit-Test.git
cd CarND-Term1-Starter-Kit-Test

2. Launch the Jupyter notebook with Anaconda or Docker. This notebook is simply to make sure
the installed packages are working properly. The instructions for the first project are on the
next page.
# Anaconda
source activate carnd-term1  # if currently deactivated, i.e. at the start of a new terminal session
jupyter notebook test.ipynb

# Docker (PowerShell syntax)
docker run -it --rm -p 8888:8888 -v ${pwd}:/src udacity/carnd-term1-starter-kit test.ipynb
# OR (bash syntax)
docker run -it --rm -p 8888:8888 -v `pwd`:/src udacity/carnd-term1-starter-kit test.ipynb
3. Go to http://localhost:8888/notebooks/test.ipynb in your browser and run
all the cells. Everything should execute without error.

Troubleshooting
ffmpeg
NOTE: If you don't have ffmpeg installed on your computer you'll have to install it for moviepy to
work. If this is the case you'll be prompted by an error in the notebook. You can easily install ffmpeg
by running the following in a code cell in the notebook.
import imageio
imageio.plugins.ffmpeg.download()

Once it's installed, moviepy should work.

Docker
To get the latest version of the docker image, you may need to run:
docker pull udacity/carnd-term1-starter-kit

Warning! The image is ~2GB!

19. Project Expectations

Project Expectations
For each project in Term 1, keep in mind a few key elements:
rubric
code
writeup
submission

Rubric
Each project comes with a rubric detailing the requirements for passing the project. Project reviewers
will check your project against the rubric to make sure that it meets specifications.
Before submitting your project, compare your submission against the rubric to make sure you've
covered each rubric point.
Here is an example of a project rubric:
Example of a project rubric

Code
Every project in the term includes code that you will write. For some projects we provide code
templates, often in a Jupyter notebook. For other projects, there are no code templates.
In either case, you'll need to submit your code files as part of the project. Each project has specific
instructions about what files are required. Make sure that your code is commented and easy for the
project reviewers to follow.
For the Jupyter notebooks, sometimes you must run all of the code cells and then export the notebook
as an HTML file. The notebook will contain instructions for how to do this.
Because running the code can take anywhere from several minutes to a few hours, the HTML file
allows project reviewers to see your notebook's output without having to run the code.
Even if the project requires submission of the HTML output of your Jupyter notebook, please submit
the original Jupyter notebook itself, as well.
Writeup
All of the projects in Term 1 require a writeup. The writeup is your chance to explain how you
approached the project.
It is also an opportunity to show your understanding of key concepts in the program.
We have provided writeup templates for every project so that it is clear what information needs to be in
each writeup. These templates can be found in each project repository, with the title
writeup_template.md.

Your writeup report should explain how you satisfied each requirement in the project rubric.
The writeups can be turned in either as Markdown files (.md) or PDF files.

Submission
When submitting a project, you can either submit it as a link to a GitHub repository
(https://github.com/ )or as a ZIP file. When submitting a GitHub repository, we advise creating a new
repository, specific to the project you are submitting.
GitHub repositories are a convenient way to organize your projects and display them to the world. A
GitHub repository also has a README.md file that opens automatically when somebody visits your
GitHub repository link.
As a suggestion, the README.md file for each repository can include the following information:
a list of files contained in the repository with a brief description of each file
any instructions someone might need for running your code
an overview of the project

Here is an example of a README file:


Example of a README file
If you are unfamiliar with GitHub , Udacity has a brief GitHub tutorial
(http://blog.udacity.com/2015/06/a-beginners-git-github-tutorial.html )to get you started. Udacity also
provides a more detailed free course on git and GitHub(https://www.udacity.com/course/how-to-use-
git-and-github--ud775 ).
To learn about README files and Markdown, Udacity provides a free course on READMEs (https://www.udacity.com/course/writing-readmes--ud777), as well.
GitHub also provides a tutorial( https://guides.github.com/features/mastering-markdown/ ) about
creating Markdown files.

20. Finding Lane Lines on the Road

Finding Lane Lines on the Road


Due
Jun 1

Project Submission
Navigate to the project repository on GitHub (https://github.com/udacity/CarND-LaneLines-P1 )and
have a look at the Readme file for detailed instructions on how to get setup with Python and OpenCV
and how to access the Jupyter Notebook containing the project code. You will need to download, or
git clone, this repository in order to complete the project.

In this project, you will be writing code to identify lane lines on the road, first in an image, and later in
a video stream (really just a series of images). To complete this project you will use the tools you
learned about in the lesson, and build upon them.
Your first goal is to write code including a series of steps (pipeline) that identify and draw the lane lines
on a few test images. Once you can successfully identify the lines in an image, you can cut and paste
your code into the block provided to run on a video stream.
You will then refine your pipeline with parameter tuning and by averaging and extrapolating the lines.
Finally, you'll make a brief writeup report. The github repository has a writeup_template.md that
can be used as a guide.
Have a look at the video clip called "P1_example.mp4" in the repository to see an example of what your final output should look like. Two videos are provided for you to run your code on. These are called "solidWhiteRight.mp4" and "solidYellowLeft.mp4".

Evaluation
Once you have completed your project, use the Project
Rubric(https://review.udacity.com/#!/rubrics/322/view ) to review the project. If you have covered
all of the points in the rubric, then you are ready to submit! If you see room for improvement in any
category in which you do not meet specifications, keep working!
Your project will be evaluated by a Udacity reviewer according to the same Project
Rubric(https://review.udacity.com/#!/rubrics/322/view ). Your project must "meet specifications" in
each category in order for your submission to pass.

Submission
What to include in your submission
You may submit your project as a zip file or with a link to a github repo. The submission must include
two files:
Jupyter Notebook with your project code
writeup report (md or pdf file)

Ready to submit your project?


Click on the "Submit Project" button and follow the instructions to submit!
Congratulations! You've completed this project

1. Meet the Careers Team

https://www.youtube.com/watch?time_continue=2&v=oR1IxPTTz0U

2. Mercedes-Benz

https://www.youtube.com/watch?v=Z_hi4djW5aw
3. NVIDIA

https://www.youtube.com/watch?v=C6Rt9lxMqHs
4. Uber ATG

https://www.youtube.com/watch?v=V23NZzX0efY

5. Your Udacity Profile


Connect to Hiring Partners through your
Udacity Professional Profile
In addition to the Career Lessons and Projects you'll find in your Nanodegree program, you have a
Udacity Professional Profile linked in the left sidebar.
Your Udacity Professional Profile features important, professional information about yourself. When
you make your profile public, it becomes accessible to our Hiring Partners:
Your profile will also connect you with recruiters and hiring managers who come to Udacity to hire
skilled Nanodegree graduates.
As you complete projects in your Nanodegree program, they will be automatically added to your
Udacity Professional Profile to ensure you're able to show employers the skills you've gained through
the program. In order to differentiate yourself from other candidates, make sure to go in and customize
those project cards. In addition to these projects, be sure to:
Keep your profile updated with your basic info and job preferences, such as location
Ensure you upload your latest resume
Return regularly to your Profile to update your projects and ensure you're showcasing your best
work
If you are looking for a job, make sure to keep your Udacity Professional Profile updated and visible to
recruiters!

6. Get Started

When you're ready to get started on your job search, head back to your syllabus and click on
"Extracurricular."
There you'll find two optional modules built by our Careers team: Job Search Strategies and
Networking.
The Udacity Careers team has put together this custom curriculum to help you in your job search. From
writing your resume and cover letter, to creating profiles on LinkedIn and GitHub, the team is here to
help you secure your dream job!
These modules and their associated projects are completely optional, but we highly recommend you complete them to succeed in the job market. Udacity Hiring Partners (https://career-resource-center.udacity.com/hiring-partners-jobs) are excited to hire students and alumni. We want to help you optimize your application materials and target them to specific jobs!

1. Introduction to Deep Learning


https://www.youtube.com/watch?time_continue=5&v=uyLRFMI4HkA
2. Starting Machine Learning

https://www.youtube.com/watch?v=UIycORUrPww

ND013 01 L Intro To Neural Networks


3. Linear Regression Quiz
https://www.youtube.com/watch?v=sf51L0RN6zc

Quiz Question
What's the best estimate for the price of a house?

4. Linear Regression Answer


https://www.youtube.com/watch?time_continue=3&v=L5QBqYDNJn0

5. Linear to Logistic Regression


Linear to Logistic Regression
Linear regression helps predict values on a continuous spectrum, like predicting what the price of a
house will be.
How about classifying data among discrete classes?
Here are examples of classification tasks:
Determining whether a patient has cancer
Identifying the species of a fish
Figuring out who's talking on a conference call

Classification problems are important for self-driving cars. A self-driving car might need to classify whether an object crossing the road is a car, a pedestrian, or a bicycle. Or it might need to identify which type of traffic sign is coming up, or what a stop light is indicating.
In the next video, Luis will demonstrate a classification algorithm called "logistic regression". He'll use logistic regression to predict whether a student will be accepted to a university.
Logistic regression will lead to neural networks, which are a much more advanced classification tool.

6. Logistic Regression Quiz


https://www.youtube.com/watch?v=kSs6O3R7JUI

Quiz Question
Does the student get Accepted?

7. Logistic Regression Answer


https://www.youtube.com/watch?time_continue=14&v=1iNylA3fJDs

Logistic Regression - Solution

8. Neural Networks
https://www.youtube.com/watch?time_continue=1&v=Mqogpnp1lrU
9. Perceptron
Perceptron
Now you've seen how a simple neural network makes decisions: by taking in input data, processing that
information, and finally, producing an output in the form of a decision! Let's take a deeper dive into the
university admission example to learn more about processing the input data.
Data, like test scores and grades, are fed into a network of interconnected nodes. These individual
nodes are called perceptrons, or artificial neurons, and they are the basic unit of a neural network. Each
one looks at input data and decides how to categorize that data. In the example above, the input either
passes a threshold for grades and test scores or doesn't, and so the two categories are: yes (passed the
threshold) and no (didn't pass the threshold). These categories then combine to form a decision -- for
example, if both nodes produce a "yes" output, then this student gains admission into the university.
Let's zoom in even further and look at how a single perceptron processes input data.
The perceptron above is one of the two perceptrons from the video that help determine whether or not a
student is accepted to a university. It decides whether a student's grades are high enough to be accepted
to the university. You might be wondering: "How does it know whether grades or test scores are more
important in making this acceptance decision?" Well, when we initialize a neural network, we don't
know what information will be most important in making a decision. It's up to the neural network to
learn for itself which data is most important and adjust how it considers that data.
It does this with something called weights.

Weights
When input comes into a perceptron, it gets multiplied by a weight value that is assigned to this particular input. For example, the perceptron above has two inputs, tests (for test scores) and grades, so it has two associated weights that can be adjusted individually. These weights start out as random values, and as the neural network learns more about what kind of input data leads to a student being accepted into a university, the network adjusts the weights based on any errors in categorization that result from the previous weights. This is called training the neural network.
A higher weight means the neural network considers that input more important than other inputs, and a lower weight means that the data is considered less important. An extreme example would be if test scores had no effect at all on university acceptance; then the weight of the test score input would be zero and it would have no effect on the output of the perceptron.

Summing the Input Data


Each input to a perceptron has an associated weight that represents its importance. These weights are
determined during the learning process of a neural network, called training. In the next step, the
weighted input data are summed to produce a single value, that will help determine the final output -
whether a student is accepted to a university or not. Let's see a concrete example of this.
We weight x_test by w_test and add it to x_grades weighted by w_grades.

When writing equations related to neural networks, the weights will always be represented by some
type of the letter w. It will usually look like a W when it represents a matrix of weights or a w when it
represents an individual weight, and it may include some additional information in the form of a
subscript to specify which weights (you'll see more on that next). But remember, when you see the
letter w, think weights.
In this example, we'll use w_grades for the weight of grades and w_test for the weight of test. For the image above, let's say that the weights are: w_grades = −1, w_test = −0.2. You don't have to be concerned with the actual values, but their relative values are important. w_grades is 5 times larger than w_test, which means the neural network considers the grades input 5 times more important than test in determining whether a student will be accepted into a university.
The perceptron applies these weights to the inputs and sums them in a process known as linear combination. In our case, this looks like w_grades·x_grades + w_test·x_test = −1·x_grades − 0.2·x_test.
Now, to make our equation less wordy, let's replace the explicit names with numbers. Let's use 1 for
grades and 2 for tests. So now our equation becomes
w_1·x_1 + w_2·x_2
In this example, we just have 2 simple inputs: grades and tests. Let's imagine we instead had m different inputs and we labeled them x_1, x_2, ..., x_m. Let's also say that the weight corresponding to x_1 is w_1 and so on. In that case, we would express the linear combination succinctly as:
∑_{i=1}^{m} w_i·x_i
Here, the Greek letter sigma (∑) is used to represent summation. It simply means to evaluate the equation to the right multiple times and add up the results. In this case, the equation it will sum is w_i·x_i.
But where do we get w_i and x_i?
∑_{i=1}^{m} means to iterate over all i values, from 1 to m.
So to put it all together, ∑_{i=1}^{m} w_i·x_i means the following:
Start at i = 1
Evaluate w_1·x_1 and remember the result
Move to i = 2
Evaluate w_2·x_2 and add the result to w_1·x_1
Continue repeating that process until i = m, where m is the number of inputs.

One last thing: you'll see equations written many different ways, both here and when reading on your own. For example, you will often just see ∑_i instead of ∑_{i=1}^{m}. The first is simply a shorter way of writing the second. That is, if you see a summation without a starting number or a defined end value, it just means perform the sum for all of them. And sometimes, if the value to iterate over can be inferred, you'll see it as just ∑. Just remember they're all the same thing: ∑_{i=1}^{m} w_i·x_i = ∑_i w_i·x_i = ∑ w_i·x_i.
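The summation is easy to check in code. Here is a minimal sketch with made-up numbers for the grades and test inputs, using the weights from this example:

```python
import numpy as np

# Hypothetical input values for this illustration.
x = np.array([4.0, 3.0])    # x_1 = grades, x_2 = test
w = np.array([-1.0, -0.2])  # w_1 = w_grades, w_2 = w_test

# The summation written as an explicit loop over i ...
total = 0.0
for i in range(len(x)):
    total += w[i] * x[i]

# ... is exactly a dot product.
assert np.isclose(total, np.dot(w, x))
print(total)
```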

Calculating the Output with an Activation Function


Finally, the result of the perceptron's summation is turned into an output signal! This is done by feeding
the linear combination into an activation function.
Activation functions are functions that decide, given the inputs into the node, what should be the node's
output? Because it's the activation function that decides the actual output, we often refer to the outputs
of a layer as its "activations".
One of the simplest activation functions is the Heaviside step function. This function returns a 0 if the
linear combination is less than 0. It returns a 1 if the linear combination is positive or equal to zero. The
Heaviside step function is shown below, where h is the calculated linear combination:
The Heaviside Step Function
In the university acceptance example above, we used the weights w_grades = −1, w_test = −0.2. Since w_grades and w_test are negative values, the activation function will only return a 1 if grades and test are 0! This is because the range of values from the linear combination using these weights and inputs is (−∞, 0] (i.e. negative infinity to 0, including 0 itself).
It's easiest to see this with an example in two dimensions. In the following graph, imagine any points
along the line or in the shaded area represent all the possible inputs to our node. Also imagine that the
value along the y-axis is the result of performing the linear combination on these inputs and the
appropriate weights. It's this result that gets passed to the activation function.
Now remember that the step activation function returns 1 for any inputs greater than or equal to zero. As you can see in the image, only one point has a y-value greater than or equal to zero: the point right at the origin, (0, 0):
Now, we certainly want more than one possible grade/test combination to result in acceptance, so we need to adjust the results passed to our activation function so it activates, that is, returns 1, for more inputs. Specifically, we need to find a way so all the scores we'd like to consider acceptable for admissions produce values greater than or equal to zero when linearly combined with the weights into our node.
One way to get our function to return 1 for more inputs is to add a value to the results of our linear
combination, called a bias.
A bias, represented in equations as b, lets us move values in one direction or another.
For example, the following diagram shows the previous hypothetical function with an added bias of +3. The blue shaded area shows all the values that now activate the function. But notice that these are produced with the same inputs as the values shown shaded in grey, just adjusted higher by adding the bias term:
Of course, with neural networks we won't know in advance what values to pick for biases. That's OK, because, just like the weights, the bias can also be updated and changed by the neural network during training. So after adding a bias, we now have a complete perceptron formula:

Perceptron Formula
This formula returns 1 if the input (x_1, x_2, ..., x_m) belongs to the accepted-to-university category or returns 0 if it doesn't. The input is made up of one or more real numbers, each one represented by x_i, where m is the number of inputs.
Then the neural network starts to learn! Initially, the weights (w_i) and bias (b) are assigned a random value, and then they are updated using a learning algorithm like gradient descent. The weights and biases change so that the next training example is more accurately categorized, and patterns in data are "learned" by the neural network.
Now that you have a good understanding of perceptrons, let's put that knowledge to use. In the next section, you'll create the AND perceptron from the Neural Networks video by setting the values for weights and bias.
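The complete perceptron formula can be sketched as a small function. This is a hedged illustration; the weights and bias in the usage line are made up for the university example, not the quiz answer:

```python
def perceptron(inputs, weights, bias):
    # Linear combination of inputs and weights, plus the bias ...
    h = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ... passed through the Heaviside step function.
    return 1 if h >= 0 else 0

# Made-up example: grades = 8, test = 6, with illustrative weights.
print(perceptron([8, 6], [1.0, 0.2], bias=-7))  # h = 2.2 >= 0, so prints 1
```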

10. AND Perceptron Quiz


What are the weights and bias for the AND perceptron?
Set the weights (weight1, weight2) and the bias (bias) to the correct values that calculate the AND operation as shown above.
In this case, there are two inputs as seen in the table above (let's call the first column input1 and the
second column input2), and based on the perceptron formula, we can calculate the output.

First, the linear combination will be the sum of the weighted inputs: linear_combination = weight1*input1 + weight2*input2. Then we can put this value into the biased Heaviside step function, which will give us our output (0 or 1):

Perceptron Formula

import pandas as pd

# TODO: Set weight1, weight2, and bias
weight1 = 1.0
weight2 = 1.0
bias = -1.2

# DON'T CHANGE ANYTHING BELOW
# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []

# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))

If you still need a hint, think of a concrete example like so:


Consider input1 and input2 both equal to 1. For an AND perceptron, we want the output to also equal 1! The output is determined by the weights and the Heaviside step function such that
output = 1, if weight1*input1 + weight2*input2 + bias >= 0
or
output = 0, if weight1*input1 + weight2*input2 + bias < 0

So, how can you choose the values for weights and bias so that if both inputs = 1, the output = 1?

11. OR & NOT Perceptron Quiz


OR Perceptron
The OR perceptron is very similar to an AND perceptron. In the image below, the OR perceptron has
the same line as the AND perceptron, except the line is shifted down. What can you do to the weights
and/or bias to achieve this? Use the following AND perceptron to create an OR Perceptron.
Question 1 of 2
What are two ways to go from an AND perceptron to an OR perceptron?
Increase the weights

Decrease the magnitude of the bias

NOT Perceptron
Unlike the other perceptrons we looked at, the NOT operation only cares about one input. The operation returns a 0 if the input is 1 and a 1 if it's a 0. The other inputs to the perceptron are ignored.

In this quiz, you'll set the weights (weight1, weight2) and the bias (bias) to values that calculate the NOT operation on the second input and ignore the first input.

import pandas as pd

# TODO: Set weight1, weight2, and bias
weight1 = -1.0
weight2 = -4.0
bias = 2.0

# DON'T CHANGE ANYTHING BELOW
# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [True, False, True, False]
outputs = []

# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))

We have a perceptron that can do AND, OR, or NOT operations. Let's do one more, XOR. In the next
section, you'll learn how a neural network solves more complicated problems like XOR.

12. XOR Perceptron Quiz


XOR Perceptron
An XOR perceptron is a logic gate that outputs 0 if the inputs are the same and 1 if the inputs are different. Unlike the previous perceptrons, this problem isn't linearly separable. To handle more complex problems like this, we can chain perceptrons together.
Let's build a neural network from the AND, NOT, and OR perceptrons to create XOR logic. Let's first
go over what a neural network looks like.

The above neural network contains 4 perceptrons: A, B, C, and D. The input to the neural network comes in at the first node, and the output comes out of the last node. The weights are indicated by the line thickness between the perceptrons. You can ignore any link between perceptrons with a low weight, like the one from A to C. For perceptron C, you can ignore all input to and from it. For simplicity we won't be showing bias, but it's still in the neural network.
Quiz
The neural network above calculates XOR. Each perceptron is a logic operation of OR, AND, Passthrough, or NOT. The Passthrough operation just passes its input to the output. However, the perceptrons A, B, and C don't indicate their operation. In the following quiz, set the correct operations for the three perceptrons to calculate XOR.
Note: Any line with a low weight can be ignored.

Quiz Question
Set the operations for the perceptrons in the XOR neural network.

Submit to check your answer choices!

Perceptron

Operations
A
NOT
B
AND
C
OR
You've seen that a perceptron can solve linearly separable problems. To solve more complex problems, you use more perceptrons. You saw this by calculating the AND, OR, NOT, and XOR operations using perceptrons. These operations can be used to create any computer program. With enough data and time, a neural network can solve any problem that a computer can calculate. However, you wouldn't build Twitter with a neural network. A neural network is like any other tool: you have to know when to use it.
The power of a neural network isn't in building it by hand, like we were doing. It's the ability to learn from examples. In the next few sections, you'll learn how a neural network sets its own weights and biases.
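The idea of chaining perceptrons can be sketched directly in code. The gate weights below are hand-picked illustrations (any weights satisfying the truth tables would do), and the XOR wiring follows the classic AND/OR/NOT decomposition rather than the exact network from the quiz:

```python
def AND(a, b):
    return int(1.0 * a + 1.0 * b - 1.5 >= 0)

def OR(a, b):
    return int(1.0 * a + 1.0 * b - 0.5 >= 0)

def NOT(a):
    return int(-1.0 * a + 0.5 >= 0)

def XOR(a, b):
    # XOR is true when (a OR b) is true but (a AND b) is not.
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))
```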
13. The Simplest Neural Network

The simplest neural network


So far you've been working with perceptrons where the output is always one or zero. The input to the
output unit is passed through an activation function, f(h), in this case, the step function.
The step activation function.
The output unit returns the result of f(h), where h is the input to the output unit:
h = ∑_i w_i·x_i + b
The diagram below shows a simple network. The linear combination of the weights, inputs, and bias
form the input h, which passes through the activation function f(h), giving the final output of the
perceptron, labeled y.

Diagram of a simple neural network. Circles are units, boxes are operations.
The cool part about this architecture, and what makes neural networks possible, is that the activation
function, f(h) can be any function, not just the step function shown earlier.
For example, if you let f(h)=h, the output will be the same as the input. Now the output of the network
is
y = ∑_i w_i·x_i + b
This equation should be familiar to you; it's the same as the linear regression model!
Other activation functions you'll see are the logistic (often called the sigmoid), tanh, and softmax
functions. We'll mostly be using the sigmoid function for the rest of this lesson:
sigmoid(x) = 1/(1 + e^(−x))

The sigmoid function


The sigmoid function is bounded between 0 and 1, and as an output can be interpreted as a probability
for success. It turns out, again, using a sigmoid as the activation function results in the same
formulation as logistic regression.
This is where it stops being a perceptron and begins being called a neural network. In the case of simple networks like this, neural networks don't offer any advantage over general linear models such as logistic regression.
But, as you saw with the XOR perceptron, stacking units lets you model linearly inseparable data, which is impossible to do with regression models.
Once you start using activation functions that are continuous and differentiable, it's possible to train the
network using gradient descent, which you'll learn about next.

Simple network exercise


Below you'll use NumPy to calculate the output of a simple network with two input nodes and one
output node with a sigmoid activation function. Things you'll need to do:
Implement the sigmoid function.
Calculate the output of the network.

The sigmoid function is


sigmoid(x) = 1/(1 + e^(−x))
For the exponential, you can use Numpy's exponential function, np.exp.

And the output of the network is


y = f(h) = sigmoid(∑_i w_i·x_i + b)
For the weighted sum, you can do a simple element-wise multiplication and sum, or use NumPy's dot product function.
Simple.py
import numpy as np

def sigmoid(x):
    # TODO: Implement sigmoid function
    return 1 / (1 + np.exp(-x))

inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1

# TODO: Calculate the output
output = sigmoid(np.dot(inputs, weights) + bias)

print('Output:')
print(output)

solution.py

import numpy as np

def sigmoid(x):
    # TODO: Implement sigmoid function
    return 1 / (1 + np.exp(-x))

inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1

# TODO: Calculate the output
output = sigmoid(np.dot(weights, inputs) + bias)

print('Output:')
print(output)

14. Gradient Descent

Learning weights
You've seen how you can use perceptrons for AND and XOR operations, but there we set the weights
by hand. What if you want to perform an operation, such as predicting college admission, but don't
know the correct weights? You'll need to learn the weights from example data, then use those weights
to make the predictions.
To figure out how we're going to find these weights, start by thinking about the goal. We want the
network to make predictions as close as possible to the real values. To measure this, we need a metric
of how wrong the predictions are, the error. A common metric is the sum of the squared errors (SSE):
E = 1/2 ∑_μ ∑_j [y_j^μ − ŷ_j^μ]²
where ŷ is the prediction and y is the true value, and you take the sum over all output units j and another sum over all data points μ. This might seem like a really complicated equation at first, but it's fairly simple once you understand the symbols and can say what's going on in words.
First, the inside sum over j. This variable j represents the output units of the network. So this inside sum is saying: for each output unit, find the difference between the true value y and the predicted value from the network ŷ, then square the difference, then sum up all those squares.
Then the outer sum over μ is a sum over all the data points. So, for each data point you calculate the inner sum of the squared differences for each output unit. Then you sum up those squared differences for each data point. That gives you the overall error for all the output predictions for all the data points.
The SSE is a good choice for a few reasons. The square ensures the error is always positive and larger
errors are penalized more than smaller errors. Also, it makes the math nice, always a plus.
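As a concrete sketch (the toy arrays here are ours, not from the lesson), the inner sum over output units and the outer sum over data points collapse to a single NumPy expression:

```python
import numpy as np

# Targets and predictions for 4 data points, 2 output units each
y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
y_hat = np.array([[0.9, 0.2],
                  [0.1, 0.7],
                  [0.8, 0.9],
                  [0.2, 0.1]])

# Inner sum over output units j, outer sum over data points mu
sse = 0.5 * np.sum((y - y_hat) ** 2)
print(sse)   # ~0.125
```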
Remember that the output of a neural network, the prediction, depends on the weights:
ŷ_j = f(Σ_i w_ij x_i)
and accordingly the error depends on the weights:
E = ½ Σ_μ Σ_j [y_j^μ − f(Σ_i w_ij x_i^μ)]²
We want the network's prediction error to be as small as possible and the weights are the knobs we can
use to make that happen. Our goal is to find weights w_ij that minimize the squared error E. To do this
with a neural network, typically you'd use gradient descent.

Enter Gradient Descent

https://www.youtube.com/watch?v=29PmNG7fuuM

As Luis said, with gradient descent, we take multiple small steps towards our goal. In this case, we
want to change the weights in steps that reduce the error. Continuing the analogy, the error is our
mountain and we want to get to the bottom. Since the fastest way down a mountain is in the steepest
direction, the steps taken should be in the direction that minimizes the error the most. We can find this
direction by calculating the gradient of the squared error.
Gradient is another term for rate of change or slope. If you need to brush up on this concept, check out
Khan Academy's great lectures on the topic.
To calculate a rate of change, we turn to calculus, specifically derivatives. The derivative of a function
f(x) gives you another function f′(x) that returns the slope of f(x) at point x. For example, consider
f(x) = x². The derivative of x² is f′(x) = 2x. So, at x = 2, the slope is f′(2) = 4. Plotting this out, it looks like:
Example of a gradient
The gradient is just a derivative generalized to functions with more than one variable. We can use
calculus to find the gradient at any point in our error function, which depends on the input weights.
You'll see how the gradient descent step is derived on the next page.
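As a quick sanity check of that slope claim (a sketch we've added, not part of the lesson), you can compare the analytic derivative of f(x) = x² against a finite-difference approximation:

```python
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x

# Finite-difference approximation of the slope at x = 2
eps = 1e-6
x = 2.0
numeric_slope = (f(x + eps) - f(x - eps)) / (2 * eps)

print(f_prime(x))      # 4.0
print(numeric_slope)   # ~4.0
```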
Below I've plotted an example of the error of a neural network with two inputs, and accordingly, two
weights. You can read this like a topographical map where points on a contour line have the same error
and darker contour lines correspond to larger errors.
At each step, you calculate the error and the gradient, then use those to determine how much to change
each weight. Repeating this process will eventually find weights that are close to the minimum of the
error function, the black dot in the middle.
Gradient descent steps to the lowest error

Caveats
Since the weights will just go wherever the gradient takes them, they can end up where the error is
low, but not the lowest. These spots are called local minima. If the weights are initialized with the
wrong values, gradient descent could lead the weights into a local minimum, illustrated below.
Gradient descent leading into a local minimum
There are methods to avoid this, such as using momentum.

15. Gradient Descent: The Math


https://www.youtube.com/watch?time_continue=3&v=7sxA5Ap8AWM

16. Gradient Descent: The Code

Gradient Descent: The Code


From before we saw that one weight update can be calculated as:
Δw_i = η δ x_i
with the error term δ as
δ = (y − ŷ) f′(h) = (y − ŷ) f′(Σ w_i x_i)
Now I'll write this out in code for the case of only one output unit. We'll also be using the sigmoid as
the activation function f(h).
import numpy as np

# Defining the sigmoid function for activations
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])

# The learning rate, eta in the weight step equation
learnrate = 0.5

# The neural network output (y-hat)
nn_output = sigmoid(x[0] * weights[0] + x[1] * weights[1])
# or nn_output = sigmoid(np.dot(x, weights))

# Output error (y - y-hat)
error = y - nn_output

# Error term (lowercase delta)
error_term = error * sigmoid_prime(np.dot(x, weights))

# Gradient descent step
del_w = [learnrate * error_term * x[0],
         learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x

gradient.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

# Calculate one gradient descent step for each weight

# TODO: Calculate output of neural network
nn_output = sigmoid(np.dot(w, x))

# TODO: Calculate error of neural network
error = y - nn_output

# TODO: Calculate change in weights
del_w = learnrate * error * nn_output * (1 - nn_output) * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

# Calculate one gradient descent step for each weight

# TODO: Calculate output of neural network
nn_output = sigmoid(np.dot(x, w))

# TODO: Calculate error of neural network
error = y - nn_output

# TODO: Calculate change in weights
del_w = learnrate * error * nn_output * (1 - nn_output) * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
17. Implementing Gradient Descent

Implementing gradient descent


Okay, now we know how to update our weights:
Δw_ij = η δ_j x_i,
how do we translate this into code?
As an example, I'm going to have you use gradient descent to train a network on graduate school
admissions data (found at http://www.ats.ucla.edu/stat/data/binary.csv). This dataset has three input
features: GRE score, GPA, and the rank of the undergraduate school (numbered 1 through 4).
Institutions with rank 1 have the highest prestige, those with rank 4 have the lowest.
The goal here is to predict if a student will be admitted to a graduate program based on these features.
For this, we'll use a network with one output layer with one unit. We'll use a sigmoid function for the
output unit activation.

Data cleanup
You might think there will be three input units, but we actually need to transform the data first. The
rank feature is categorical, the numbers don't encode any sort of relative values. Rank 2 is not twice
as good as rank 1, and rank 3 is not 1.5 times rank 2. Instead, we need to use dummy variables to
encode rank, splitting the data into four new columns encoded with ones or zeros. Rows with rank 1
have one in the rank 1 dummy column, and zeros in all other columns. Rows with rank 2 have one in
the rank 2 dummy column, and zeros in all other columns. And so on.
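As a sketch of what this encoding looks like (toy rank values, not the actual admissions file), pandas' get_dummies — the same function used in data_prep.py below — performs exactly this split:

```python
import pandas as pd

# A toy 'rank' column (values 1 through 4 in the real data)
ranks = pd.DataFrame({'rank': [1, 2, 4, 2]})

# One dummy column per rank value that appears, filled with ones and zeros
dummies = pd.get_dummies(ranks['rank'], prefix='rank').astype(int)
print(dummies)   # columns rank_1, rank_2, rank_4; a single 1 per row
```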
We'll also need to standardize the GRE and GPA data, which means to scale the values such they have
zero mean and a standard deviation of 1. This is necessary because the sigmoid function squashes really
small and really large inputs. The gradient of really small and large inputs is zero, which means that the
gradient descent step will go to zero too. Since the GRE and GPA values are fairly large, we have to be
really careful about how we initialize the weights or the gradient descent steps will die off and the
network won't train. Instead, if we standardize the data, we can initialize the weights easily and
everyone is happy.
This is just a brief run-through, you'll learn more about preparing data later. If you're interested in how
I did this, check out the data_prep.py file in the programming exercise below.
Ten rows of the data after transformations.
Now that the data is ready, we see that there are six input features: gre, gpa, and the four rank
dummy variables.

Mean Square Error


We're going to make a small change to how we calculate the error here. Instead of the SSE, we're going
to use the mean of the square errors (MSE). Now that we're using a lot of data, summing up all the
weight steps can lead to really large updates that make the gradient descent diverge. To compensate for
this, you'd need to use quite a small learning rate. Instead, we can just divide by the number of records
in our data, m, to take the average. This way, no matter how much data we use, our learning rates will
typically be in the range of 0.01 to 0.001. Then, we can use the MSE,
E = (1/2m) Σ_μ (y^μ − ŷ^μ)²,
to calculate the gradient and the result is the same as before, just averaged instead of summed.

Here's the general algorithm for updating the weights with gradient descent:
Set the weight step to zero: Δw_i = 0
For each record in the training data:
Make a forward pass through the network, calculating the output ŷ = f(Σ_i w_i x_i)
Calculate the error gradient in the output unit, δ = (y − ŷ) f′(Σ_i w_i x_i)
Update the weight step Δw_i = Δw_i + δ x_i
Update the weights w_i = w_i + η Δw_i / m, where η is the learning rate and m is the number of records.
Here we're averaging the weight steps to help reduce any large variations in the training data.
Repeat for e epochs.

You can also update the weights on each record instead of averaging the weight steps after going
through all the records.
Remember that we're using the sigmoid for the activation function, f(h) = 1 / (1 + e^(−h)),
and the gradient of the sigmoid is f′(h) = f(h)(1 − f(h)),
where h is the input to the output unit:
h = Σ_i w_i x_i

Implementing with Numpy


For the most part, this is pretty straightforward with NumPy.
First, you'll need to initialize the weights. We want these to be small such that the input to the sigmoid
is in the linear region near 0 and not squashed at the high and low ends. It's also important to initialize
them randomly so that they all have different starting values and diverge, breaking symmetry. So, we'll
initialize the weights from a normal distribution centered at 0. A good value for the scale is 1/√n, where
n is the number of input units. This keeps the input to the sigmoid low for increasing numbers of input
units.
weights = np.random.normal(scale=1/n_features**.5, size=n_features)
NumPy provides a function that calculates the dot product of two arrays, which conveniently calculates
h for us. The dot product multiplies two arrays element-wise, the first element in array 1 is multiplied
by the first element in array 2, and so on. Then, each product is summed.
# input to the output layer
output_in = np.dot(weights, inputs)

And finally, we can update Δw_i and w_i by incrementing them with weights += ..., which is
shorthand for weights = weights + ....

Efficiency tip!
You can save some calculations since we're using a sigmoid here. For the sigmoid function,
f′(h) = f(h)(1 − f(h)). That means that once you calculate f(h), the activation of the output unit, you can
use it to calculate f′(h) for the error term.
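In code, the shortcut looks like this (a small sketch with variable names of our choosing):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

h = 0.5        # input to the output unit
error = 0.2    # y - y_hat

# Reuse the already-computed activation: f'(h) = f(h) * (1 - f(h)),
# so there's no need for a second sigmoid/exp evaluation
activation = sigmoid(h)
error_term = error * activation * (1 - activation)
```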

Programming exercise
Below, you'll implement gradient descent and train the network on the admissions data. Your goal here
is to train the network until you reach a minimum in the mean square error (MSE) on the training set.
You need to implement:
The network output: output.
The error gradient: error.
Update the weight step: del_w +=.
Update the weights: weights +=.

After you've written these parts, run the training by pressing "Test Run". The MSE will print out, as
well as the accuracy on a test set, the fraction of correctly predicted admissions.
Feel free to play with the hyperparameters and see how it changes the MSE.

Gradient.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # TODO: Calculate the output
        output = sigmoid(np.dot(x, weights))

        # TODO: Calculate the error
        error = y - output

        # TODO: Calculate change in weights
        del_w += error * output * (1 - output) * x

    # TODO: Update weights
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

data_prep.py

import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data) * 0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

binary.csv

solution.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network parameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Activation of the output unit
        output = sigmoid(np.dot(x, weights))

        # The error, the target minus the network output
        error = y - output

        # The gradient descent step, the error times the gradient times the inputs
        del_w += error * output * (1 - output) * x

    # Update the weights here. The learning rate times the
    # change in weights, divided by the number of records to average
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
18. Multilayer Perceptrons
https://www.youtube.com/watch?v=Rs9petvTBLk

Implementing the hidden layer


Prerequisites
Below, we are going to walk through the math of neural networks in a multilayer perceptron. With
multiple perceptrons, we are going to move to using vectors and matrices. To brush up, be sure to view
the following:
1. Khan Academy's introduction to vectors.
2. Khan Academy's introduction to matrices.

Derivation
Before, we were dealing with only one output node, which made the code straightforward. However,
now that we have multiple input units and multiple hidden units, the weights between them will require
two indices: w_ij, where i denotes input units and j denotes hidden units.
For example, the following image shows our network, with its input units labeled x_1, x_2, and x_3, and its
hidden nodes labeled h_1 and h_2:
The lines indicating the weights leading to h_1 have been colored differently from those leading to h_2
just to make it easier to read.
Now to index the weights, we take the input unit number for the i and the hidden unit number for the j.
That gives us w_11 for the weight leading from x_1 to h_1, and w_12 for the weight leading from x_1 to h_2.
The following image includes all of the weights between the input layer and the hidden layer, labeled
with their appropriate wij indices:
Before, we were able to write the weights as an array, indexed as w_i.
But now, the weights need to be stored in a matrix, indexed as w_ij. Each row in the matrix will
correspond to the weights leading out of a single input unit, and each column will correspond to the
weights leading in to a single hidden unit. For our three input units and two hidden units, the weights
matrix looks like this:

Be sure to compare the matrix above with the diagram shown before it so you can see where the
different weights in the network end up in the matrix.
To initialize these weights in Numpy, we have to provide the shape of the matrix. If features is a 2D
array containing the input data:
# Number of records and input units
n_records, n_inputs = features.shape
# Number of hidden units
n_hidden = 2
weights_input_to_hidden = np.random.normal(0, n_inputs**-0.5, size=(n_inputs, n_hidden))

This creates a 2D array (i.e. a matrix) named weights_input_to_hidden with dimensions
n_inputs by n_hidden. Remember how the input to a hidden unit is the sum of all the inputs
multiplied by the hidden unit's weights. So for each hidden layer unit, h_j, we need to calculate the
following:
h_j = Σ_i w_ij x_i

To do that, we now need to use matrix multiplication.


In this case, we're multiplying the inputs (a row vector here) by the weights. To do this, you take the
dot (inner) product of the inputs with each column in the weights matrix. For example, to calculate the
input to the first hidden unit, j=1, you'd take the dot product of the inputs with the first column of the
weights matrix, like so:

Calculating the input to the first hidden unit with the first column of the weights matrix.

And for the second hidden layer input, you calculate the dot product of the inputs with the second
column. And so on and so forth.
In NumPy, you can do this for all the inputs and all the outputs at once using np.dot
hidden_inputs = np.dot(inputs, weights_input_to_hidden)

You could also define your weights matrix such that it has dimensions n_hidden by n_inputs and
then multiply like so, where the inputs form a column vector:
Note: The weight indices have changed in the above image and no longer match up with the labels
used in the earlier diagrams. That's because, in matrix notation, the row index always precedes the
column index, so it would be misleading to label them the way we did in the neural net diagram. Just
keep in mind that this is the same weight matrix as before, but rotated so the first column is now the
first row, and the second column is now the second row. If we were to use the labels from the earlier
diagram, the weights would fit into the matrix in the following locations:

Weight matrix shown with labels matching earlier diagrams.


Remember, the above is not a correct view of the indices, but it uses the labels from the earlier neural
net diagrams to show you where each weight ends up in the matrix.
The important thing with matrix multiplication is that the dimensions match. For matrix multiplication
to work, there has to be the same number of elements in the dot products. In the first example, there are
three columns in the input vector, and three rows in the weights matrix. In the second example, there
are three columns in the weights matrix and three rows in the input vector. If the dimensions don't
match, you'll get this:
# Same weights and features as above, but swapped the order
hidden_inputs = np.dot(weights_input_to_hidden, features)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-1bfa0f615c45> in <module>()
----> 1 hidden_inputs = np.dot(weights_input_to_hidden, features)

ValueError: shapes (3,2) and (3,) not aligned: 2 (dim 1) != 3 (dim 0)

The dot product can't be computed for a 3x2 matrix and 3-element array. That's because the 2 columns
in the matrix don't match the number of elements in the array. Some of the dimensions that could work
would be the following:
The rule is that if you're multiplying an array from the left, the array must have the same number of
elements as there are rows in the matrix. And if you're multiplying the matrix from the left, the number
of columns in the matrix must equal the number of elements in the array on the right.
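A short sketch of both working orders (the shapes are chosen to match the 3x2 example above; the array values are ours):

```python
import numpy as np

inputs = np.array([0.5, -0.2, 0.1])       # 3 input values
weights = np.random.normal(size=(3, 2))   # 3 inputs x 2 hidden units

# Array on the left: its 3 elements pair with the 3 rows of the matrix
hidden_in = np.dot(inputs, weights)
print(hidden_in.shape)    # (2,)

# Matrix on the left: its 3 columns pair with a 3-element column vector
hidden_in2 = np.dot(weights.T, inputs[:, None])
print(hidden_in2.shape)   # (2, 1)
```

Both orders compute the same two hidden-unit inputs; only the shape of the result differs.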

Making a column vector


You see above that sometimes you'll want a column vector, even though by default Numpy arrays work
like row vectors. It's possible to get the transpose of an array like so arr.T, but for a 1D array, the
transpose will return a row vector. Instead, use arr[:,None] to create a column vector:

print(features)
> array([ 0.49671415, -0.1382643 , 0.64768854])

print(features.T)
> array([ 0.49671415, -0.1382643 , 0.64768854])

print(features[:, None])
> array([[ 0.49671415],
[-0.1382643 ],
[ 0.64768854]])

Alternatively, you can create arrays with two dimensions. Then, you can use arr.T to get the column
vector.

np.array(features, ndmin=2)
> array([[ 0.49671415, -0.1382643 , 0.64768854]])

np.array(features, ndmin=2).T
> array([[ 0.49671415],
[-0.1382643 ],
[ 0.64768854]])

I personally prefer keeping all vectors as 1D arrays; it just works better in my head.

Programming quiz
Below, you'll implement a forward pass through a 4x3x2 network, with sigmoid activation functions
for both layers.
Things to do:
Calculate the input to the hidden layer.
Calculate the hidden layer output.
Calculate the input to the output layer.
Calculate the output of the network.

Multiplier.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# TODO: Make a forward pass through the network
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)

solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# TODO: Make a forward pass through the network
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)

19. Backpropagation
https://www.youtube.com/watch?v=MZL97-2joxQ

Backpropagation
Now we've come to the problem of how to make a multilayer neural network learn. Before, we saw
how to update weights with gradient descent. The backpropagation algorithm is just an extension of
that, using the chain rule to find the error with respect to the weights connecting the input layer to
the hidden layer (for a two layer network).
To update the weights to hidden layers using gradient descent, you need to know how much error each
of the hidden units contributed to the final output. Since the output of a layer is determined by the
weights between layers, the error resulting from units is scaled by the weights going forward through
the network. Since we know the error at the output, we can use the weights to work backwards to
hidden layers.
For example, in the output layer, you have errors δ^o_k attributed to each output unit k. Then, the error
attributed to hidden unit j is the output errors, scaled by the weights between the output and hidden
layers (and the gradient):
δ^h_j = Σ_k W_jk δ^o_k f′(h_j)
Then, the gradient descent step is the same as before, just with the new errors:
Δw_ij = η δ^h_j x_i
where w_ij are the weights between the inputs and hidden layer and x_i are input unit values. This form
holds for however many layers there are. The weight steps are equal to the step size times the output
error of the layer times the values of the inputs to that layer:
Δw = η δ_output V_in
Here, you get the output error, δ_output, by propagating the errors backwards from higher layers. And
the input values, V_in, are the inputs to the layer, the hidden layer activations to the output unit for
example.

Working through an example


Let's walk through the steps of calculating the weight updates for a simple two layer network. Suppose
there are two input values, one hidden unit, and one output unit, with sigmoid activations on the hidden
and output units. The following image depicts this network. (Note: the input values are shown as nodes
at the bottom of the image, while the network's output value is shown as ŷ at the top. The inputs
themselves do not count as a layer, which is why this is considered a two layer network.)
Assume we're trying to fit some binary data and the target is y = 1. We'll start with the forward pass, first
calculating the input to the hidden unit
h = Σ_i w_i x_i = 0.4 × 0.1 + (−0.2) × 0.3 = −0.02
and the output of the hidden unit
a = f(h) = sigmoid(−0.02) = 0.495.
Using this as the input to the output unit, the output of the network is
ŷ = f(W · a) = sigmoid(0.1 × 0.495) = 0.512.
With the network output, we can start the backwards pass to calculate the weight updates for both
layers. Using the fact that for the sigmoid function f′(W · a) = f(W · a)(1 − f(W · a)), the error term for
the output unit is
δ^o = (y − ŷ) f′(W · a) = (1 − 0.512) × 0.512 × (1 − 0.512) = 0.122.
Now we need to calculate the error for the hidden unit with backpropagation. Here we'll scale the error
from the output unit by the weight W connecting it to the hidden unit. For the hidden unit error,
δ^h_j = Σ_k W_jk δ^o_k f′(h_j), but since we have one hidden unit and one output unit, this is much simpler:
δ^h = W δ^o f′(h) = 0.1 × 0.122 × 0.495 × (1 − 0.495) = 0.003
Now that we have the errors, we can calculate the gradient descent steps. The hidden to output weight
step is the learning rate, times the output unit error, times the hidden unit activation value:
ΔW = η δ^o a = 0.5 × 0.122 × 0.495 = 0.0302
Then, for the input to hidden weights w_i, it's the learning rate times the hidden unit error, times the
input values:
Δw_i = η δ^h x_i = (0.5 × 0.003 × 0.1, 0.5 × 0.003 × 0.3) = (0.00015, 0.00045)
From this example, you can see one of the effects of using the sigmoid function for the activations. The
maximum derivative of the sigmoid function is 0.25, so the errors in the output layer get reduced by at
least 75%, and errors in the hidden layer are scaled down by at least 93.75%! You can see that if you
have a lot of layers, using a sigmoid activation function will quickly reduce the weight steps to tiny
values in layers near the input. This is known as the vanishing gradient problem. Later in the course
you'll learn about other activation functions that perform better in this regard and are more commonly
used in modern network architectures.
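You can check the worked example above with a few lines of NumPy (same inputs, weights, and target as in the example; the numbers in the text are rounded):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.1, 0.3])     # input values
w = np.array([0.4, -0.2])    # input-to-hidden weights
W = 0.1                      # hidden-to-output weight
y = 1                        # target
learnrate = 0.5

# Forward pass
h = np.dot(w, x)                       # ~-0.02
a = sigmoid(h)                         # ~0.495
y_hat = sigmoid(W * a)                 # ~0.512

# Backward pass
delta_o = (y - y_hat) * y_hat * (1 - y_hat)   # ~0.122
delta_h = W * delta_o * a * (1 - a)           # ~0.003

# Weight steps
delta_W = learnrate * delta_o * a             # ~0.0302
delta_w = learnrate * delta_h * x             # ~(0.00015, 0.00045)
```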

Implementing in NumPy
For the most part you have everything you need to implement backpropagation with NumPy.
However, previously we were only dealing with error terms from one unit. Now, in the weight update,
we have to consider the error for each unit in the hidden layer, δ_j:
Δw_ij = η δ_j x_i
Firstly, there will likely be a different number of input and hidden units, so trying to multiply the errors
and the inputs as row vectors will throw an error
hidden_error*inputs
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-3b59121cb809> in <module>()
----> 1 hidden_error*inputs

ValueError: operands could not be broadcast together with shapes (3,) (6,)

Also, Δw_ij is a matrix now, so the right side of the assignment must have the same shape as the left side.
Luckily, NumPy takes care of this for us. If you multiply a row vector array with a column vector array,
it will multiply the first element in the column by each element in the row vector and set that as the first
row in a new 2D array. This continues for each element in the column vector, so you get a 2D array that
has shape (len(column_vector), len(row_vector)).
hidden_error*inputs[:,None]
array([[ -8.24195994e-04, -2.71771975e-04, 1.29713395e-03],
[ -2.87777394e-04, -9.48922722e-05, 4.52909055e-04],
[ 6.44605731e-04, 2.12553536e-04, -1.01449168e-03],
[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00]])

It turns out this is exactly how we want to calculate the weight update step. As before, if you have your
inputs as a 2D array with one row, you can also do hidden_error*inputs.T, but that won't work
if inputs is a 1D array.
Backpropagation exercise
Below, you'll implement the code to calculate one backpropagation update step for two sets of weights.
I wrote the forward pass, your goal is to code the backward pass.
Things to do
Calculate the network error.
Calculate the output layer error gradient.
Use backpropagation to calculate the hidden layer error.
Calculate the weight update steps.

Backprop.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = None

# TODO: Calculate error gradient for output layer
del_err_output = None

# TODO: Calculate error gradient for hidden layer
del_err_hidden = None

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = None

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = None

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)
solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = target - output

# TODO: Calculate error gradient for output layer
del_err_output = error * output * (1 - output)

# TODO: Calculate error gradient for hidden layer
del_err_hidden = np.dot(del_err_output, weights_hidden_output) * \
                 hidden_layer_output * (1 - hidden_layer_output)

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * del_err_output * hidden_layer_output

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = learnrate * del_err_hidden * x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)
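The `x[:, None]` trick in the input-to-hidden update is worth a closer look: it turns the 1-D input vector into a column so that NumPy broadcasting produces the outer product needed to match the weight matrix's shape. A minimal sketch with made-up values:

```python
import numpy as np

# Illustrative values only: one 3-feature input record and one
# error-gradient term per hidden unit, as in the exercise above
x = np.array([0.5, 0.1, -0.2])          # shape (3,)
del_err_hidden = np.array([0.2, -0.4])  # shape (2,)

# x[:, None] has shape (3, 1); broadcasting it against (2,) yields
# the (3, 2) outer product, matching weights_input_hidden's shape
delta = x[:, None] * del_err_hidden

print(delta.shape)                               # (3, 2)
print(np.allclose(delta, np.outer(x, del_err_hidden)))  # True
```

So each weight step delta[i, j] is input i times the error gradient of hidden unit j, exactly as the update rule requires.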

20. Implementing Backpropagation

Implementing backpropagation
Now we've seen that the error in the output layer is

    δ_k = (y_k − ŷ_k) f′(a_k)

and the error in the hidden layer is

    δ_j = Σ_k δ_k W_jk f′(h_j)

(For the sigmoid, f′(h) = f(h)(1 − f(h)), which is why the code computes terms like output * (1 - output).)

For now we'll only consider a simple network with one hidden layer and one output unit. Here's the
general algorithm for updating the weights with backpropagation:
Set the weight steps for each layer to zero:
    The input to hidden weights Δw_ij = 0
    The hidden to output weights ΔW_j = 0
For each record in the training data:
    Make a forward pass through the network, calculating the output ŷ
    Calculate the error gradient in the output unit, δ^o = (y − ŷ) f′(z), where z = Σ_j W_j a_j, the input
    to the output unit.
    Propagate the errors to the hidden layer, δ_j^h = δ^o W_j f′(h_j)
    Update the weight steps:
        ΔW_j = ΔW_j + δ^o a_j
        Δw_ij = Δw_ij + δ_j^h a_i
Update the weights, where η is the learning rate and m is the number of records:
    W_j = W_j + η ΔW_j / m
    w_ij = w_ij + η Δw_ij / m
Repeat for e epochs.
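A good habit when implementing these gradient formulas is to sanity-check them against a numerical derivative. The sketch below (all values made up, same tiny 3-2-1 network as the earlier exercise) compares the analytic weight gradient from the backprop formulas with a central-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative values only
x = np.array([0.5, 0.1, -0.2])
y = 0.6
W_ih = np.array([[0.5, -0.6], [0.1, -0.2], [0.1, 0.7]])
W_ho = np.array([0.1, -0.3])

def loss(W_ih, W_ho):
    h = sigmoid(x @ W_ih)
    out = sigmoid(h @ W_ho)
    return 0.5 * (y - out) ** 2

# Analytic gradients from the backprop formulas
h = sigmoid(x @ W_ih)
out = sigmoid(h @ W_ho)
delta_o = (y - out) * out * (1 - out)       # output-unit error gradient
delta_h = delta_o * W_ho * h * (1 - h)      # propagated to hidden layer
grad_W_ho = delta_o * h                     # weight step for W_j (ascent direction)
grad_W_ih = delta_h * x[:, None]            # weight step for w_ij

# Numerical derivative of the loss w.r.t. one weight, W_ho[0]
eps = 1e-6
Wp, Wm = W_ho.copy(), W_ho.copy()
Wp[0] += eps
Wm[0] -= eps
num = (loss(W_ih, Wp) - loss(W_ih, Wm)) / (2 * eps)

# Backprop gives the direction that *decreases* the error, so it should
# match the negative of the loss derivative
print(np.isclose(-grad_W_ho[0], num))  # True
```

If the two values disagree, there is a bug in the gradient code; this check catches sign and indexing mistakes early.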

Backpropagation exercise
Now you're going to implement the backprop algorithm for a network trained on the graduate school
admission data. You should have everything you need from the previous exercises to complete this one.
Your goals here:
Implement the forward pass.
Implement the backpropagation algorithm.
Update the weights.

Backprop.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # TODO: Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # TODO: Calculate the error
        error = y - output

        # TODO: Calculate error gradient in output unit
        output_error = error * output * (1 - output)

        # TODO: propagate errors to hidden layer
        hidden_error = np.dot(output_error, weights_hidden_output) * \
                       hidden_output * (1 - hidden_output)

        # TODO: Update the change in weights
        del_w_hidden_output += output_error * hidden_output
        del_w_input_hidden += hidden_error * x[:, None]

    # TODO: Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_output = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output, weights_hidden_output))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

data_prep.py

import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data) * 0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']
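The two prep steps in data_prep.py (one-hot encoding the rank column and standardizing the continuous features) can be seen on a toy frame. The values below are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the admissions data
df = pd.DataFrame({'gre': [520, 660, 700], 'rank': [1, 2, 1]})

# One-hot encode rank, as data_prep.py does
df = pd.concat([df, pd.get_dummies(df['rank'], prefix='rank')], axis=1)
df = df.drop('rank', axis=1)

# Standardize gre to zero mean, unit standard deviation
mean, std = df['gre'].mean(), df['gre'].std()
df['gre'] = (df['gre'] - mean) / std

print(list(df.columns))  # ['gre', 'rank_1', 'rank_2']
```

Standardizing matters here because the sigmoid saturates for large inputs; keeping features near zero keeps the gradients from vanishing during training.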

binary.csv

solution.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # Calculate the error
        error = y - output

        # Calculate error gradient in output unit
        output_error = error * output * (1 - output)

        # Propagate errors to hidden layer
        hidden_error = np.dot(output_error, weights_hidden_output) * \
                       hidden_output * (1 - hidden_output)

        # Update the change in weights
        del_w_hidden_output += output_error * hidden_output
        del_w_input_hidden += hidden_error * x[:, None]

    # Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_output = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output, weights_hidden_output))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

21. Further Reading

Further reading
Backpropagation is fundamental to deep learning. TensorFlow and other libraries will perform the
backprop for you, but you should still understand the algorithm thoroughly. We'll be going over backprop
again, but here are some extra resources for you:
From Andrej Karpathy: Yes, you should understand backprop

Also from Andrej Karpathy, a lecture from Stanford's CS231n course

22. Create Your Own NN

In this lesson, you learned the power of perceptrons: how much a single perceptron can do, and how
much more a neural network built from multiple perceptrons can do. You also learned how each
perceptron can learn from past samples to arrive at a solution.
Now that you understand the basics of a neural network, the next step is to build one. In the next
lesson, you'll build your own neural network.
23. Summary
https://www.youtube.com/watch?v=m8xslYUBXYo