
Program Syllabus

Core Curriculum

This section consists of all the lessons and projects you need to complete in order to receive your certificate.

3 Parts

14 Projects

1. Part 1

In this term, you'll become an expert in applying Computer Vision and Deep Learning to automotive problems. You will teach the car to detect lane lines, predict steering angle, and more, all based on just camera data!

2. Part 2 (Locked)

In this term, you'll learn how to use an array of sensor data to perceive the environment and control the vehicle. You'll evaluate sensor data from camera, radar, lidar, and GPS, and use these in closed-loop controllers that actuate the vehicle.

3. Part 3 (Locked): Path Planning, Concentrations, and Systems

In this term, you'll learn how to plan where the vehicle should go, how the vehicle systems work together to get it there, and you'll perform a deep-dive into a concentration of your choice.

Up Next

Extracurricular

This section consists of extra lessons and projects you can choose to complete in order to increase your chances of changing careers.

2 Parts

7 Projects

1. Part 1

Opportunity can come when you least expect it, so when your dream job comes along, you want to be ready. In the following lessons, you will learn strategies for conducting a successful job search, including developing a targeted resume and cover letter for that job.

2. Part 2 - Career: Networking

Networking is a very important component of a successful job search. In the following lesson, you will learn how to tell your unique story to recruiters in a succinct and professional, but relatable, way.

Up Next

https://www.youtube.com/watch?v=jHA__A61nqc

https://www.youtube.com/watch?v=QiflJFVOt18

3. Overview of ND Program

https://www.youtube.com/watch?v=RZ5iolr4RGs

https://www.youtube.com/watch?v=JGpXenoW0dk

5. Career Support

https://www.youtube.com/watch?time_continue=9&v=4MGOyNXh4EQ

6. Nanodegree Support

Getting Support

There are several ways in which you will receive support during the program from Udacity's network

of Mentors and Reviewers, as well as your fellow students.

Mentorship

You can think of your Mentor as your Advisor in the Nanodegree.

Your in-classroom Mentor will be your guide through the program and will do the following:

- Check in with you weekly to make sure that you are on track.
- Help you set learning goals.
- Guide you to supplementary resources when you get stuck.
- Respond to any questions you have about the program.

If you have questions or comments about the Mentorship experience, or if you're having trouble reaching your mentor, please email mentorship-support@udacity.com.

Forum Q&A

Udacity Discourse will be your home for the forums and the wiki.

Aside from your Mentor, the forums are a great place to ask in-depth and technical questions. Questions in the forums will be answered by both paid mentors and other students. Make sure to like answers as you read them, and feel free to post answers yourself!

We will be using Discourse for the forums, and you should be able to access them anytime by following the forum link on the left-hand side of the classroom. Once you are there, check out the different categories and subcategories, and post a question if you have one!

Slack Community

Your private Slack team will be the best place to chat live with students and staff.

Slack is the best place for live discussion and interaction with your community of students. If you haven't joined already, you can sign up here. (Note that this Slack instance is for enrolled students and is different from the ND013 Slack Team.)

Reviews

Our global team of Reviewers will code review each of your project submissions, usually within 24 hours.

For each project you submit, you will receive detailed feedback from a project Reviewer.

Sometimes, a reviewer might ask you to resubmit a project to meet specifications. In that case, an indication of the needed changes will also be provided. Note that you can submit a project as many times as needed to pass.

Feedback

Please help us improve the program by submitting bugs and issues to our Waffle board.

In order to keep our content up-to-date and address issues quickly, we've set up a Waffle board to track error reports and suggestions.

If you find an error, check there to see if it has already been filed. If it hasn't, you can file an issue by clicking on the "Add issue" button, adding a title, and entering a description in the details (you will need a GitHub account for this).

Links and screenshots, if available, are always appreciated!

Quiz Question

Have you signed up for Slack? Have you visited the forums?

7. Deadline Policy


When we use the term deadline with regard to Nanodegree program projects, we use it in one of two ways:

- A final deadline for passing all projects
- Ongoing suggested deadlines for individual projects

It is very important to understand the distinction between the two, as your progress in the program is measured against the deadlines we've established. Please see below for an explanation of what each usage means.

Passing a project in this context means that a Udacity Reviewer has marked your project as "Meets Specifications." In order to graduate from a term, you have to pass all projects by the last day of the term.

If you do not pass all projects by the last day of the term, the following happens:

You will receive a 4-week extension to complete any outstanding projects. You will receive this extension a maximum of one time. Once you submit and pass all projects, you can enroll in the next term, which will potentially be with a later class. If you do not submit and pass all projects within the 4-week extension, you will be removed from the program.

The deadlines you see in your classroom are suggestions for when you should ideally pass each project. They are meant to help keep you on track so that you maintain an appropriate pace throughout the program, one that will see you graduate on time!

Please note that you can submit your project as many times as you need to, and there are no penalties if you miss these deadlines. However, if you miss these deadlines and fall behind, you will be at risk of not passing all projects on time, so it is a recommended best practice to try to meet each suggested deadline.

8. Self-Driving Car History

Stanley - the car that Sebastian Thrun and his team at Stanford built to win the DARPA Grand Challenge.

The recent advancements in self-driving cars are built on decades of work by people around the world. In the next video, you'll get a chance to step back and learn about some of this work and how your own contributions may one day fit into this narrative.

In particular, you'll get a chance to relive the DARPA Grand Challenge, one of the great milestones in self-driving car technology, and meet some of the people who took on this seemingly impossible task.

This video is not required, but we highly encourage you to watch it when you get the chance. We hope you enjoy it as much as we did!

https://www.youtube.com/watch?v=saVZ_X9GfIM

10. Self-Driving Car Quiz

the self-driving car industry!

Question 1 of 2

Can you guess how many self-driving cars will be on the road by the year 2020?

10,000,000

Question 2 of 2

Can you guess which of the following companies are CURRENTLY developing self-driving cars?

1. Setting up the Problem

https://www.youtube.com/watch?v=aIkAcXVxf2w

Quiz Question

Which of the following features could be useful in the identification of lane lines on the road?

Color

Shape

Orientation

2. Color Selection

https://www.youtube.com/watch?time_continue=1&v=bNOWJ9wdmhk

Quiz Question

What color is pure white in our combined red + green + blue [R, G, B] image?

Let's code up a simple color selection in Python. No need to download or install anything; you can just follow along in the browser for now. We'll be working with the same image you saw previously.

Check out the code below. First, I import pyplot and image from matplotlib. I also import numpy for operating on the image.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

I then read in an image and print out some stats. I'll grab the x and y sizes and make a copy of the image to work with. NOTE: Always make a copy of arrays or other variables in Python. If instead you say "a = b", then all changes you make to "a" will be reflected in "b" as well!

# Read in the image and print out some stats
image = mpimg.imread('test.jpg')
print('This image is: ', type(image),
      'with dimensions:', image.shape)

# Grab the x and y sizes and make a copy of the image
ysize = image.shape[0]
xsize = image.shape[1]
# Note: always make a copy rather than simply using "="
color_select = np.copy(image)
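The copy-versus-assignment note above is easy to demonstrate. Here is a minimal sketch using a small made-up array (not an image) showing the difference between "b = a" and np.copy():

```python
import numpy as np

a = np.array([10, 20, 30])

b = a            # assignment: "b" is just another name for the same array
b[0] = 99        # mutating through "b" changes "a" too
print(a[0])      # 99

c = np.copy(a)   # np.copy() allocates a new, independent array
c[1] = 0         # mutating the copy leaves "a" untouched
print(a[1])      # 20
```

The same rule applies to the image arrays above: without np.copy(image), blacking out pixels in color_select would also modify image.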

Next, I define my color selection criteria in the variables red_threshold, green_threshold, and blue_threshold, and populate rgb_threshold with these values. This vector contains the minimum values for red, green, and blue (R,G,B) that I will allow in my selection.

# Define our color selection criteria
# Note: if you run this code, you'll find these are not sensible values!!
# But you'll get a chance to play with them soon in a quiz
red_threshold = 0
green_threshold = 0
blue_threshold = 0
rgb_threshold = [red_threshold, green_threshold, blue_threshold]

Next, I'll select any pixels below the threshold and set them to zero. After that, all pixels that meet my color criterion (those above the threshold) will be retained, and those that do not (below the threshold) will be blacked out.

# Identify pixels below the threshold
thresholds = (image[:,:,0] < rgb_threshold[0]) \
           | (image[:,:,1] < rgb_threshold[1]) \
           | (image[:,:,2] < rgb_threshold[2])
color_select[thresholds] = [0,0,0]

# Display the image
plt.imshow(color_select)
plt.show()

The result, color_select, is an image in which pixels that were above the threshold have been retained, and pixels below the threshold have been blacked out. In the code snippet above, red_threshold, green_threshold, and blue_threshold are all set to 0, which implies all pixels will be included in the selection.
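To make the selection logic concrete, here is a small sketch of the same thresholding applied to a hypothetical 2x2 "image" (the pixel values and thresholds are made up for illustration):

```python
import numpy as np

# A made-up 2x2 RGB image: a bright (near-white) top row, a dark bottom row
image = np.array([[[250, 250, 250], [240, 235, 245]],
                  [[ 40,  40,  40], [ 90, 100,  80]]], dtype=np.uint8)
color_select = np.copy(image)

rgb_threshold = [200, 200, 200]

# A pixel is rejected if ANY of its channels falls below its threshold
thresholds = (image[:, :, 0] < rgb_threshold[0]) \
           | (image[:, :, 1] < rgb_threshold[1]) \
           | (image[:, :, 2] < rgb_threshold[2])
color_select[thresholds] = [0, 0, 0]

print(thresholds)
# [[False False]
#  [ True  True]]  -> the dark bottom row is blacked out, the bright row kept
```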

Image after color selection

In the next quiz, I want you to modify the values of red_threshold, green_threshold, and blue_threshold until you retain as much of the lane lines as possible while getting rid of most of the other stuff. When you run the code in the quiz, your image will be output with an example image next to it. Tweak these variables such that your input image (on the left below) looks like the example image on the right.

5. Region Masking

https://www.youtube.com/watch?v=ngN9Cr-QfiI

Awesome! Now you've seen that with a simple color selection we have managed to eliminate almost everything in the image except the lane lines.

At this point, however, it would still be tricky to extract the exact lines automatically, because we still have some other objects detected around the periphery that aren't lane lines.

In this case, I'll assume that the front-facing camera that took the image is mounted in a fixed position on the car, such that the lane lines will always appear in the same general region of the image. Next, I'll take advantage of this by adding a criterion to only consider pixels for color selection in the region where we expect to find the lane lines.

Check out the code below. The variables left_bottom, right_bottom, and apex represent the vertices of a triangular region that I would like to retain for my color selection, while masking everything else out. Here I'm using a triangular mask to illustrate the simplest case, but later you'll use a quadrilateral, and in principle, you could use any polygon.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

image = mpimg.imread('test.jpg')
print('This image is: ', type(image),
      'with dimensions:', image.shape)

# Pull out the x and y sizes and make a copy of the image
ysize = image.shape[0]
xsize = image.shape[1]
region_select = np.copy(image)

# Define a triangular region of interest
# Keep in mind the origin (x=0, y=0) is in the upper left in image processing
# Note: if you run this code, you'll find these are not sensible values!!
# But you'll get a chance to play with them soon in a quiz
left_bottom = [0, 539]
right_bottom = [900, 300]
apex = [400, 0]

# Fit lines (y = Ax + B) to identify the 3-sided region of interest
# np.polyfit() returns the coefficients [A, B] of the fit
fit_left = np.polyfit((left_bottom[0], apex[0]), (left_bottom[1], apex[1]), 1)
fit_right = np.polyfit((right_bottom[0], apex[0]), (right_bottom[1], apex[1]), 1)
fit_bottom = np.polyfit((left_bottom[0], right_bottom[0]), (left_bottom[1], right_bottom[1]), 1)

# Find the region inside the lines
XX, YY = np.meshgrid(np.arange(0, xsize), np.arange(0, ysize))
region_thresholds = (YY > (XX*fit_left[0] + fit_left[1])) & \
                    (YY > (XX*fit_right[0] + fit_right[1])) & \
                    (YY < (XX*fit_bottom[0] + fit_bottom[1]))

# Color pixels red which are inside the region of interest
region_select[region_thresholds] = [255, 0, 0]

# Display the image
plt.imshow(region_select)

Combining Color and Region Selections

Now you've seen how to mask out a region of interest in an image. Next, let's combine the mask and color selection to pull only the lane lines out of the image.

Check out the code below. Here we're doing both the color and region selection steps, requiring that a pixel meet both the mask and color selection requirements to be retained.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

image = mpimg.imread('test.jpg')

# Grab the x and y sizes and make two copies of the image
# With one copy we'll extract only the pixels that meet our selection,
# then we'll paint those pixels red in the original image to see our selection
# overlaid on the original.
ysize = image.shape[0]
xsize = image.shape[1]
color_select = np.copy(image)
line_image = np.copy(image)

# Define our color criteria
red_threshold = 0
green_threshold = 0
blue_threshold = 0
rgb_threshold = [red_threshold, green_threshold, blue_threshold]

# Define a triangular region of interest
# Keep in mind the origin (x=0, y=0) is in the upper left in image processing
# Note: if you run this code, you'll find these are not sensible values!!
# But you'll get a chance to play with them soon in a quiz ;)
left_bottom = [0, 539]
right_bottom = [900, 300]
apex = [400, 0]

# Fit lines (y = Ax + B) to identify the 3-sided region of interest
fit_left = np.polyfit((left_bottom[0], apex[0]), (left_bottom[1], apex[1]), 1)
fit_right = np.polyfit((right_bottom[0], apex[0]), (right_bottom[1], apex[1]), 1)
fit_bottom = np.polyfit((left_bottom[0], right_bottom[0]), (left_bottom[1], right_bottom[1]), 1)

# Mask pixels below the threshold
color_thresholds = (image[:,:,0] < rgb_threshold[0]) | \
                   (image[:,:,1] < rgb_threshold[1]) | \
                   (image[:,:,2] < rgb_threshold[2])

# Find the region inside the lines
XX, YY = np.meshgrid(np.arange(0, xsize), np.arange(0, ysize))
region_thresholds = (YY > (XX*fit_left[0] + fit_left[1])) & \
                    (YY > (XX*fit_right[0] + fit_right[1])) & \
                    (YY < (XX*fit_bottom[0] + fit_bottom[1]))

# Mask color selection
color_select[color_thresholds] = [0,0,0]
# Find where image is both colored right and in the region
line_image[~color_thresholds & region_thresholds] = [255,0,0]

# Display our two output images
plt.imshow(color_select)
plt.imshow(line_image)

In the next quiz, I've given you the values of red_threshold, green_threshold, and blue_threshold, but now you need to modify left_bottom, right_bottom, and apex to represent the vertices of a triangle identifying the region of interest in the image. Vary the shape of your region mask such that you pick out the lane lines and nothing else. When you run the code in the quiz, your output result will be several images. Tweak the vertices until your output looks like the examples shown below.

Start Quiz

So you found the lane lines... simple, right? Now you're ready to upload the algorithm to the car and drive autonomously, right?? Well, not quite yet ;)

As it happens, lane lines are not always the same color, and even lines of the same color under different lighting conditions (day, night, etc.) may fail to be detected by our simple color selection.

What we need is to take our algorithm to the next level and detect lines of any color using sophisticated computer vision methods.

So, what is computer vision?

9. What is Computer Vision?

https://www.youtube.com/watch?v=wxQhfSdxjKU

In the rest of this lesson, we'll introduce some computer vision techniques with enough detail for you to get an intuitive feel for how they work.

You'll learn much more about these topics during the Computer Vision module later in the program.

We also recommend the free Udacity course, Introduction to Computer Vision.

Throughout this Nanodegree Program, we will be using Python with OpenCV for computer vision work. OpenCV stands for Open-Source Computer Vision. For now, you don't need to download or install anything, but later in the program we'll help you get these tools installed on your own computer.

OpenCV contains extensive libraries of functions that you can use. The OpenCV libraries are well documented, so if you're ever feeling confused about what the parameters in a particular function are doing, or anything else, you can find a wealth of information at opencv.org.

10. Canny Edge Detection

https://www.youtube.com/watch?v=Av2GsgQWX8I

https://www.youtube.com/watch?time_continue=6&v=LQM--KPJjD0

Note! The standard location of the origin (x=0, y=0) for images is in the top-left corner, with y values increasing downward and x increasing to the right. This might seem weird at first, but if you think about an image as a matrix, it makes sense that the (0, 0) element is in the upper left.

Now let's try a quiz. Below, I'm plotting a cross section through this image. Where are the areas in the image that are most likely to be identified as edges?

Quiz Question

The red line in the plot above shows where I took a cross section through the image. The wiggles in the blue line indicate changes in intensity along that cross section through the image. Check all the boxes of the letters along this cross section where you expect to find strong edges.

A

E

11. Canny to Detect Lane Lines

Now that you have a conceptual grasp on how the Canny algorithm works, it's time to use it to find the edges of the lane lines in an image of the road. So let's give that a try.

First, we need to read in an image:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

image = mpimg.imread('exit-ramp.jpg')
plt.imshow(image)

Here we have an image of the road, and it's fairly obvious by eye where the lane lines are, but what about using computer vision?

Let's go ahead and convert to grayscale.

import cv2  # bringing in OpenCV libraries

gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)  # grayscale conversion
plt.imshow(gray, cmap='gray')

Let's try our Canny edge detector on this image. This is where OpenCV gets useful. First, we'll have a look at the parameters for the OpenCV Canny function. You will call it like this:

edges = cv2.Canny(gray, low_threshold, high_threshold)

In this case, you are applying Canny to the image gray, and your output will be another image called edges. low_threshold and high_threshold are your thresholds for edge detection.

The algorithm will first detect strong edge (strong gradient) pixels above the high_threshold, and reject pixels below the low_threshold. Next, pixels with values between the low_threshold and high_threshold will be included as long as they are connected to strong edges. The output edges is a binary image with white pixels tracing out the detected edges and black everywhere else. See the OpenCV Canny Docs for more details.

What would make sense as a reasonable range for these parameters? In our case, converting to grayscale has left us with an 8-bit image, so each pixel can take 2^8 = 256 possible values. Hence, the pixel values range from 0 to 255.

This range implies that derivatives (essentially, the value differences from pixel to pixel) will be on the scale of tens or hundreds. So, a reasonable range for your threshold parameters would also be in the tens to hundreds.

As far as a ratio of low_threshold to high_threshold, John Canny himself recommended a low to high ratio of 1:2 or 1:3.

We'll also include Gaussian smoothing before running Canny, which is essentially a way of suppressing noise and spurious gradients by averaging (check out the OpenCV docs for GaussianBlur). cv2.Canny() actually applies Gaussian smoothing internally, but we include it here because you can get a different result by applying further smoothing (and it's not a changeable parameter within cv2.Canny()!).

You can choose the kernel_size for Gaussian smoothing to be any odd number. A larger kernel_size implies averaging, or smoothing, over a larger area. The example in the previous lesson was kernel_size = 3.

Note: If this is all sounding complicated and new to you, don't worry! We're moving pretty fast through the material here, because for now we just want you to be able to use these tools. If you would like to dive into the math underpinning these functions, please check out the free Udacity course, Intro to Computer Vision, where the third lesson covers Gaussian filters and the sixth and seventh lessons cover edge detection.

# Do all the relevant imports
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2

# Read in the image and convert to grayscale
image = mpimg.imread('exit-ramp.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# Define a kernel size for Gaussian smoothing / blurring
# Note: this step is optional as cv2.Canny() applies a 5x5 Gaussian internally
kernel_size = 3
blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)

# Define parameters for Canny and run it
# NOTE: if you try running this code you might want to change these!
low_threshold = 1
high_threshold = 10
edges = cv2.Canny(blur_gray, low_threshold, high_threshold)

# Display the image
plt.imshow(edges, cmap='Greys_r')

Here I've called the OpenCV function Canny on a Gaussian-smoothed grayscaled image called blur_gray and detected edges with thresholds on the gradient of high_threshold and low_threshold.

Now it's your turn! In the next quiz, try using Canny on your own and fiddle with the parameters for the Gaussian smoothing and Canny Edge Detection to optimize for detecting the lane lines well without detecting a lot of other stuff. Your result should look like the example shown below.

Start Quiz

13. Hough Transform

Using the Hough Transform to Find Lines from Canny Edges

https://www.youtube.com/watch?v=JFwj5UtKmPY

In image space, a line is plotted as x vs. y, but in 1962, Paul Hough devised a method for representing lines in parameter space, which we will call Hough space in his honor.

In Hough space, I can represent my "x vs. y" line as a point in "m vs. b" instead. The Hough Transform is just the conversion from image space to Hough space. So, the characterization of a line in image space will be a single point at the position (m, b) in Hough space.

So now I'd like to check your intuition: if a line in image space corresponds to a point in Hough space, what would two parallel lines in image space correspond to in Hough space?

Question 1 of 5

What will be the representation in Hough space of two parallel lines in image space?

Alright, so a line in image space corresponds to a point in Hough space. What does a point in image space correspond to in Hough space?

A single point in image space has many possible lines that pass through it, but not just any lines: only those with particular combinations of the m and b parameters. Rearranging the equation of a line, we find that a single point (x, y) corresponds to the line b = y - xm.
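This duality is easy to check numerically. The sketch below uses made-up points to verify that every line through a fixed image-space point (x, y) satisfies b = y - xm (a line in Hough space), and that two image-space points pin down a single (m0, b0):

```python
import numpy as np

x, y = 3.0, 7.0  # a single made-up point in image space

# Every line y = m*x + b through (x, y) must have b = y - x*m,
# so the point traces out a LINE in (m, b) Hough space
for m in np.linspace(-2.0, 2.0, 5):
    b = y - x * m
    assert np.isclose(m * x + b, y)  # the line really passes through (x, y)

# Two image-space points give two Hough-space lines; their intersection
# (m0, b0) is the one image-space line through BOTH points
x1, y1 = 1.0, 2.0
x2, y2 = 4.0, 8.0
m0 = (y2 - y1) / (x2 - x1)  # solve y1 - x1*m = y2 - x2*m for m
b0 = y1 - x1 * m0
print(m0, b0)  # 2.0 0.0 -> the line y = 2x passes through both points
```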

So what is the representation of a point in image space in Hough space?

Question 2 of 5

What does a point in image space correspond to in Hough space?

A

What if you have two points in image space? What would that look like in Hough space?

Question 3 of 5

What is the representation in Hough space of two points in image space?

Alright, now we have two intersecting lines in Hough space. How would you represent their intersection at the point (m0, b0) in image space?

Question 4 of 5

What does the intersection point of the two lines in Hough space correspond to in image space?

A) A line in image space that passes through both (x1, y1) and (x2, y2)

https://www.youtube.com/watch?v=XQf7FOhwOVk

So, what happens if we run a Hough Transform on an image of a square? What will the corresponding plot in Hough space look like?

https://www.youtube.com/watch?v=upKjISd3aBk

Question 5 of 5

What happens if we run a Hough Transform on an image of a square? What will the corresponding plot in Hough space look like?

Implementing a Hough Transform on an Edge-Detected Image

Now you know how the Hough Transform works, but to accomplish the task of finding lane lines, we need to specify some parameters to say what kind of lines we want to detect (i.e., long lines, short lines, bendy lines, dashed lines, etc.).

To do this, we'll be using an OpenCV function called HoughLinesP that takes several parameters. Let's code it up and find the lane lines in the image where we detected edges with the Canny function (for a look at coding up a Hough Transform from scratch, check this out).

Here's the image we're working with:

Let's look at the input parameters for the OpenCV function HoughLinesP that we will use to find lines in the image. You will call it like this:

lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)

In this case, we are operating on the image edges (the output from Canny), and the output from HoughLinesP will be lines, which will simply be an array containing the endpoints (x1, y1, x2, y2) of all line segments detected by the transform operation. The other parameters define just what kind of line segments we're looking for.

First off, rho and theta are the distance and angular resolution of our grid in Hough space. Remember that, in Hough space, we have a grid laid out along the (ρ, θ) axes. You need to specify rho in units of pixels and theta in units of radians.

So, what are reasonable values? Well, rho takes a minimum value of 1, and a reasonable starting place for theta is 1 degree (pi/180 in radians). Scale these values up to be more flexible in your definition of what constitutes a line.

The threshold parameter specifies the minimum number of votes (intersections in a given grid cell) a candidate line needs to have to make it into the output. The empty np.array([]) is just a placeholder; no need to change it. min_line_length is the minimum length of a line (in pixels) that you will accept in the output, and max_line_gap is the maximum distance (again, in pixels) between segments that you will allow to be connected into a single line. You can then iterate through your output lines and draw them onto the image to see what you got!

# Do relevant imports
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2

# Read in and grayscale the image
image = mpimg.imread('exit-ramp.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# Define a kernel size and apply Gaussian smoothing
kernel_size = 5
blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)

# Define our parameters for Canny and apply
low_threshold = 50
high_threshold = 150
edges = cv2.Canny(blur_gray, low_threshold, high_threshold)

# Define the Hough transform parameters
# Make a blank the same size as our image to draw on
rho = 1
theta = np.pi/180
threshold = 1
min_line_length = 10
max_line_gap = 1
line_image = np.copy(image)*0  # creating a blank to draw lines on

# Run Hough on the edge-detected image
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)

# Iterate over the output "lines" and draw lines on the blank
for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 10)

# Create a "color" binary image to combine with the line image
color_edges = np.dstack((edges, edges, edges))

# Draw the lines on the edge image
combo = cv2.addWeighted(color_edges, 0.8, line_image, 1, 0)
plt.imshow(combo)

As you can see, I've detected lots of line segments! Your job, in the next exercise, is to figure out which parameters do the best job of optimizing the detection of the lane lines. Then, you'll want to apply a region of interest mask to filter out detected line segments in other areas of the image. Earlier in this lesson you used a triangular region mask, but this time you'll get a chance to use a quadrilateral region mask using the cv2.fillPoly() function (keep in mind, though, you could use this same method to mask an arbitrarily complex polygon region). When you're finished, you'll be ready to apply the skills you've learned to do the project at the end of this lesson.

15. Quiz: Hough Transform Quiz

Now it's your turn to play with the Hough Transform on an edge-detected image. You'll start with the image on the left below. If you "Test Run" the quiz, you'll get output that looks like the center image. Your job is to modify the parameters for the Hough Transform and impose a region of interest mask to get output that looks like the image on the right. In the code, I've given you a framework for defining a quadrilateral region of interest mask.

Start Quiz

https://www.youtube.com/watch?v=LatP7XUPgIE

In this term, you'll use Python 3 for programming quizzes, labs, and projects. The following will guide you through setting up the programming environment on your local machine.

There are two ways to get up and running:

1. Anaconda

2. Docker

We recommend you first try setting up your environment with Anaconda. It's faster to get up and running and has fewer moving parts.

If the Anaconda installation gives you trouble, try Docker instead.

Follow the instructions in this README: https://github.com/udacity/CarND-Term1-Starter-Kit/blob/master/README.md

Here is a great link for learning more about Anaconda and Jupyter Notebooks: https://classroom.udacity.com/courses/ud1111

Now that everything is installed, let's make sure it's working!

1. Clone and navigate to the starter kit test repository.

# NOTE: This is DIFFERENT from https://github.com/udacity/CarND-Term1-Starter-Kit.git
git clone https://github.com/udacity/CarND-Term1-Starter-Kit-Test.git
cd CarND-Term1-Starter-Kit-Test

2. Launch the Jupyter notebook with Anaconda or Docker. This notebook is simply to make sure the installed packages are working properly. The instructions for the first project are on the next page.

# Anaconda
source activate carnd-term1  # If currently deactivated, i.e. start of a new terminal session
jupyter notebook test.ipynb

# Docker
docker run -it --rm -p 8888:8888 -v ${pwd}:/src udacity/carnd-term1-starter-kit test.ipynb
# OR
docker run -it --rm -p 8888:8888 -v `pwd`:/src udacity/carnd-term1-starter-kit test.ipynb

3. Go to http://localhost:8888/notebooks/test.ipynb in your browser and run all the cells. Everything should execute without error.

Troubleshooting

ffmpeg

NOTE: If you don't have ffmpeg installed on your computer, you'll have to install it for moviepy to work. If this is the case, you'll be prompted by an error in the notebook. You can easily install ffmpeg by running the following in a code cell in the notebook.

import imageio
imageio.plugins.ffmpeg.download()

Docker

To get the latest version of the docker image, you may need to run:

docker pull udacity/carnd-term1-starter-kit

Project Expectations

For each project in Term 1, keep in mind a few key elements:

rubric

code

writeup

submission

Rubric

Each project comes with a rubric detailing the requirements for passing the project. Project reviewers

will check your project against the rubric to make sure that it meets specifications.

Before submitting your project, compare your submission against the rubric to make sure you've

covered each rubric point.

Here is an example of a project rubric:

Example of a project rubric

Code

Every project in the term includes code that you will write. For some projects we provide code

templates, often in a Jupyter notebook. For other projects, there are no code templates.

In either case, you'll need to submit your code files as part of the project. Each project has specific

instructions about what files are required. Make sure that your code is commented and easy for the

project reviewers to follow.

For the Jupyter notebooks, sometimes you must run all of the code cells and then export the notebook

as an HTML file. The notebook will contain instructions for how to do this.

Because running the code can take anywhere from several minutes to a few hours, the HTML file

allows project reviewers to see your notebook's output without having to run the code.

Even if the project requires submission of the HTML output of your Jupyter notebook, please submit

the original Jupyter notebook itself, as well.

Writeup

All of the projects in Term 1 require a writeup. The writeup is your chance to explain how you

approached the project.

It is also an opportunity to show your understanding of key concepts in the program.

We have provided writeup templates for every project so that it is clear what information needs to be in

each writeup. These templates can be found in each project repository, with the title

writeup_template.md.

Your writeup report should explain how you satisfied each requirement in the project rubric.

The writeups can be turned in either as Markdown files (.md) or PDF files.

Submission

When submitting a project, you can either submit it as a link to a GitHub repository (https://github.com/) or as a ZIP file. When submitting a GitHub repository, we advise creating a new repository, specific to the project you are submitting.

GitHub repositories are a convenient way to organize your projects and display them to the world. A

GitHub repository also has a README.md file that opens automatically when somebody visits your

GitHub repository link.

As a suggestion, the README.md file for each repository can include the following information:

a list of files contained in the repository with a brief description of each file

any instructions someone might need for running your code

an overview of the project

Example of a README file

If you are unfamiliar with GitHub, Udacity has a brief GitHub tutorial (http://blog.udacity.com/2015/06/a-beginners-git-github-tutorial.html) to get you started. Udacity also provides a more detailed free course on git and GitHub (https://www.udacity.com/course/how-to-use-git-and-github--ud775).

To learn about README files and Markdown, Udacity provides a free course on READMEs (https://www.udacity.com/course/writing-readmes--ud777), as well.

GitHub also provides a tutorial (https://guides.github.com/features/mastering-markdown/) about creating Markdown files.

Due

Jun 1

Project Submission

Navigate to the project repository on GitHub (https://github.com/udacity/CarND-LaneLines-P1) and have a look at the README file for detailed instructions on how to get set up with Python and OpenCV and how to access the Jupyter Notebook containing the project code. You will need to download, or git clone, this repository in order to complete the project.

In this project, you will be writing code to identify lane lines on the road, first in an image, and later in

a video stream (really just a series of images). To complete this project you will use the tools you

learned about in the lesson, and build upon them.

Your first goal is to write code including a series of steps (pipeline) that identify and draw the lane lines

on a few test images. Once you can successfully identify the lines in an image, you can cut and paste

your code into the block provided to run on a video stream.

You will then refine your pipeline with parameter tuning and by averaging and extrapolating the lines.

Finally, you'll make a brief writeup report. The github repository has a writeup_template.md that

can be used as a guide.

Have a look at the video clip called "P1_example.mp4" in the repository to see an example of what your final output should look like. Two videos are provided for you to run your code on. These are called "solidWhiteRight.mp4" and "solidYellowLeft.mp4".

Evaluation

Once you have completed your project, use the Project Rubric (https://review.udacity.com/#!/rubrics/322/view) to review the project. If you have covered all of the points in the rubric, then you are ready to submit! If you see room for improvement in any category in which you do not meet specifications, keep working!

Your project will be evaluated by a Udacity reviewer according to the same Project Rubric (https://review.udacity.com/#!/rubrics/322/view). Your project must "meet specifications" in each category in order for your submission to pass.

Submission

What to include in your submission

You may submit your project as a zip file or with a link to a github repo. The submission must include

two files:

Jupyter Notebook with your project code

writeup report (md or pdf file)

Click on the "Submit Project" button and follow the instructions to submit!

Congratulations! You've completed this project

https://www.youtube.com/watch?time_continue=2&v=oR1IxPTTz0U

2. Mercedes-Benz

https://www.youtube.com/watch?v=Z_hi4djW5aw

3. NVIDIA

https://www.youtube.com/watch?v=C6Rt9lxMqHs

4. Uber ATG

https://www.youtube.com/watch?v=V23NZzX0efY

Connect to Hiring Partners through your

Udacity Professional Profile

In addition to the Career Lessons and Projects you'll find in your Nanodegree program, you have a

Udacity Professional Profile linked in the left sidebar.

Your Udacity Professional Profile features important, professional information about yourself. When

you make your profile public, it becomes accessible to our Hiring Partners:

Your profile will also connect you with recruiters and hiring managers who come to Udacity to hire

skilled Nanodegree graduates.

As you complete projects in your Nanodegree program, they will be automatically added to your

Udacity Professional Profile to ensure you're able to show employers the skills you've gained through

the program. In order to differentiate yourself from other candidates, make sure to go in and customize

those project cards. In addition to these projects, be sure to:

Keep your profile updated with your basic info and job preferences, such as location

Ensure you upload your latest resume

Return regularly to your Profile to update your projects and ensure you're showcasing your best

work

If you are looking for a job, make sure to keep your Udacity Professional Profile updated and visible to

recruiters!

6. Get Started

When you're ready to get started on your job search, head back to your syllabus and click on

"Extracurricular."

There you'll find two optional modules built by our Careers team: Job Search Strategies and

Networking.

The Udacity Careers team has put together this custom curriculum to help you in your job search. From

writing your resume and cover letter, to creating profiles on LinkedIn and GitHub, the team is here to

help you secure your dream job!

These modules and their associated projects are completely optional, but we highly recommend you complete them to succeed in the job market. Udacity Hiring Partners (https://career-resource-center.udacity.com/hiring-partners-jobs) are excited to hire students and alumni. We want to help you optimize your application materials and target them to specific jobs!

https://www.youtube.com/watch?time_continue=5&v=uyLRFMI4HkA

2. Starting Machine Learning

https://www.youtube.com/watch?v=UIycORUrPww

3. Linear Regression Quiz

https://www.youtube.com/watch?v=sf51L0RN6zc

Quiz Question

What's the best estimate for the price of a house?

https://www.youtube.com/watch?time_continue=3&v=L5QBqYDNJn0

Linear to Logistic Regression

Linear regression helps predict values on a continuous spectrum, like predicting what the price of a

house will be.

How about classifying data among discrete classes?

Here are examples of classification tasks:

Determining whether a patient has cancer

Identifying the species of a fish

Figuring out who's talking on a conference call

Classification problems are important for self-driving cars. Self-driving cars might need to classify whether an object crossing the road is a car, a pedestrian, or a bicycle. Or they might need to identify which type of traffic sign is coming up, or what a stop light is indicating.

In the next video, Luis will demonstrate a classification algorithm called "logistic regression". He'll use

logistic regression to predict whether a student will be accepted to a university.

Linear regression will lead to neural networks, which are a much more advanced classification tool.

https://www.youtube.com/watch?v=kSs6O3R7JUI

Quiz Question

Does the student get Accepted?

https://www.youtube.com/watch?time_continue=14&v=1iNylA3fJDs

8. Neural Networks

https://www.youtube.com/watch?time_continue=1&v=Mqogpnp1lrU

9. Perceptron

Perceptron

Now you've seen how a simple neural network makes decisions: by taking in input data, processing that

information, and finally, producing an output in the form of a decision! Let's take a deeper dive into the

university admission example to learn more about processing the input data.

Data, like test scores and grades, are fed into a network of interconnected nodes. These individual

nodes are called perceptrons, or artificial neurons, and they are the basic unit of a neural network. Each

one looks at input data and decides how to categorize that data. In the example above, the input either

passes a threshold for grades and test scores or doesn't, and so the two categories are: yes (passed the

threshold) and no (didn't pass the threshold). These categories then combine to form a decision -- for

example, if both nodes produce a "yes" output, then this student gains admission into the university.

Let's zoom in even further and look at how a single perceptron processes input data.

The perceptron above is one of the two perceptrons from the video that help determine whether or not a

student is accepted to a university. It decides whether a student's grades are high enough to be accepted

to the university. You might be wondering: "How does it know whether grades or test scores are more

important in making this acceptance decision?" Well, when we initialize a neural network, we don't

know what information will be most important in making a decision. It's up to the neural network to

learn for itself which data is most important and adjust how it considers that data.

It does this with something called weights.

Weights

When input comes into a perceptron, it gets multiplied by a weight value that is assigned to this particular input. For example, the perceptron above has two inputs, tests (for test scores) and grades, so it has two associated weights that can be adjusted individually. These weights start out as random values, and as the neural network learns more about what kind of input data leads to a student being accepted into a university, the network adjusts the weights based on any errors in categorization that result from the previous weights. This is called training the neural network.

A higher weight means the neural network considers that input more important than other inputs, and a lower weight means that the data is considered less important. An extreme example would be if test scores had no effect at all on university acceptance; then the weight of the test score input would be zero and it would have no effect on the output of the perceptron.

Each input to a perceptron has an associated weight that represents its importance. These weights are

determined during the learning process of a neural network, called training. In the next step, the

weighted input data are summed to produce a single value, that will help determine the final output -

whether a student is accepted to a university or not. Let's see a concrete example of this.

We weight x_test by w_test and add it to x_grades weighted by w_grades.

When writing equations related to neural networks, the weights will always be represented by some

type of the letter w. It will usually look like a W when it represents a matrix of weights or a w when it

represents an individual weight, and it may include some additional information in the form of a

subscript to specify which weights (you'll see more on that next). But remember, when you see the

letter w, think weights.

In this example, we'll use w_grades for the weight of grades and w_test for the weight of test. For the image above, let's say that the weights are: w_grades = -1, w_test = -0.2. You don't have to be concerned with the actual values, but their relative values are important. w_grades is 5 times larger than w_test, which means the neural network considers the grades input 5 times more important than test in determining whether a student will be accepted into a university.

The perceptron applies these weights to the inputs and sums them in a process known as linear combination. In our case, this looks like w_grades·x_grades + w_test·x_test = -1·x_grades - 0.2·x_test.

Now, to make our equation less wordy, let's replace the explicit names with numbers. Let's use 1 for grades and 2 for tests. So now our equation becomes

w_1·x_1 + w_2·x_2

In this example, we just have 2 simple inputs: grades and tests. Let's imagine we instead had m different inputs and we labeled them x_1, x_2, ..., x_m. Let's also say that the weight corresponding to x_1 is w_1 and so on. In that case, we would express the linear combination succinctly as:

∑_{i=1}^{m} w_i·x_i

Here, the Greek letter sigma (∑) is used to represent summation. It simply means to evaluate the equation to the right multiple times and add up the results. In this case, the equation it will sum is w_i·x_i.

But where do we get w_i and x_i?

∑_{i=1}^{m} means to iterate over all i values, from 1 to m.

So to put it all together, ∑_{i=1}^{m} w_i·x_i means the following:

Start at i = 1

Evaluate w_1·x_1 and remember the result

Move to i = 2

Evaluate w_2·x_2 and add it to the result of w_1·x_1

Continue repeating that process until i = m, where m is the number of inputs.

One last thing: you'll see equations written many different ways, both here and when reading on your own. For example, you will often just see ∑_i instead of ∑_{i=1}^{m}. The first is simply a shorter way of writing the second. That is, if you see a summation without a starting number or a defined end value, it just means perform the sum for all of them. And sometimes, if the values to iterate over can be inferred, you'll see it as just ∑. Just remember they're all the same thing: ∑_{i=1}^{m} w_i·x_i = ∑_i w_i·x_i = ∑ w_i·x_i.
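The summation maps directly to code. As a small sketch (the variable names and values here are illustrative, not from the lesson), here are three equivalent ways to compute ∑_i w_i·x_i with NumPy:

```python
import numpy as np

w = np.array([1.0, -0.2])   # example weights (hypothetical values)
x = np.array([0.8, 0.6])    # example inputs (hypothetical values)

# 1. Explicit loop, mirroring the "start at i = 1 and add up the results" description
total = 0.0
for w_i, x_i in zip(w, x):
    total += w_i * x_i

# 2. Element-wise multiply, then sum
total_sum = np.sum(w * x)

# 3. Dot product, the idiomatic NumPy form
total_dot = np.dot(w, x)

print(total, total_sum, total_dot)  # all three agree
```

All three forms compute the same linear combination; the dot product is the one you'll see most in practice.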

Finally, the result of the perceptron's summation is turned into an output signal! This is done by feeding

the linear combination into an activation function.

Activation functions are functions that decide, given the inputs into the node, what should be the node's

output? Because it's the activation function that decides the actual output, we often refer to the outputs

of a layer as its "activations".

One of the simplest activation functions is the Heaviside step function. This function returns a 0 if the

linear combination is less than 0. It returns a 1 if the linear combination is positive or equal to zero. The

Heaviside step function is shown below, where h is the calculated linear combination:

The Heaviside Step Function
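In code, the Heaviside step function is a one-line comparison. A minimal sketch (the helper name is mine, not from the lesson):

```python
def heaviside(h):
    # Returns 1 if the linear combination h is greater than or equal to 0, otherwise 0
    return int(h >= 0)

print(heaviside(-0.5), heaviside(0.0), heaviside(2.3))  # 0 1 1
```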

In the university acceptance example above, we used the weights w_grades = -1, w_test = -0.2. Since w_grades and w_test are negative values, the activation function will only return a 1 if grades and test are 0! This is because the range of values from the linear combination using these weights and inputs is (-∞, 0] (i.e. negative infinity to 0, including 0 itself).

It's easiest to see this with an example in two dimensions. In the following graph, imagine any points

along the line or in the shaded area represent all the possible inputs to our node. Also imagine that the

value along the y-axis is the result of performing the linear combination on these inputs and the

appropriate weights. It's this result that gets passed to the activation function.

Now remember that the step activation function returns 1 for any inputs greater than or equal to zero. As you can see in the image, only one point has a y-value greater than or equal to zero: the point right at the origin, (0, 0):

Now, we certainly want more than one possible grade/test combination to result in acceptance, so we need to adjust the results passed to our activation function so it activates (that is, returns 1) for more inputs. Specifically, we need to find a way so all the scores we'd like to consider acceptable for admissions produce values greater than or equal to zero when linearly combined with the weights into our node.

One way to get our function to return 1 for more inputs is to add a value to the results of our linear

combination, called a bias.

A bias, represented in equations as b, lets us move values in one direction or another.

For example, the following diagram shows the previous hypothetical function with an added bias of +3. The blue shaded area shows all the values that now activate the function. But notice that these are produced with the same inputs as the values shown shaded in grey, just adjusted higher by adding the bias term:

Of course, with neural networks we won't know in advance what values to pick for biases. That's OK, because just like the weights, the bias can also be updated and changed by the neural network during training. So after adding a bias, we now have a complete perceptron formula:

Perceptron Formula

This formula returns 1 if the input (x1,x2,...,xm) belongs to the accepted-to-university category or

returns 0 if it doesn't. The input is made up of one or more real numbers, each one represented by xi,

where m is the number of inputs.
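The complete perceptron formula can be sketched directly in code: a weighted sum of the inputs, plus the bias, passed through the Heaviside step (function and variable names here are illustrative, not from the lesson):

```python
import numpy as np

def perceptron(inputs, weights, bias):
    # Linear combination: sum of w_i * x_i, plus the bias b
    h = np.dot(weights, inputs) + bias
    # Heaviside step activation: 1 if h >= 0, else 0
    return int(h >= 0)

# Hypothetical example: two inputs with negative weights and a positive bias
print(perceptron(np.array([1.0, 1.0]), np.array([-1.0, -0.2]), 3.0))  # 1
```

Note how the bias shifts the linear combination: with bias 0 the same inputs would fail the threshold, but with bias +3 they activate the perceptron.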

Then the neural network starts to learn! Initially, the weights (w_i) and bias (b) are assigned a random value, and then they are updated using a learning algorithm like gradient descent.

biases change so that the next training example is more accurately categorized, and patterns in data are

"learned" by the neural network.

Now that you have a good understanding of perceptrons, let's put that knowledge to use. In the next section, you'll create the AND perceptron from the Neural Networks video by setting the values for weights and bias.

What are the weights and bias for the AND perceptron?

Set the weights (weight1, weight2) and bias (bias) to the correct values that calculate the AND operation as shown above.

In this case, there are two inputs as seen in the table above (let's call the first column input1 and the

second column input2), and based on the perceptron formula, we can calculate the output.

First, the linear combination will be the sum of the weighted inputs: linear_combination =

weight1*input1 + weight2*input2 then we can put this value into the biased Heaviside step

function, which will give us our output (0 or 1):

Perceptron Formula

import pandas as pd

# Set weight1, weight2, and bias so the perceptron computes AND
weight1 = 1.0
weight2 = 1.0
bias = -1.2

# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []

# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))

Consider input1 and input2 both equal to 1; for an AND perceptron, we want the output to also equal 1! The output is determined by the weights and Heaviside step function such that

output = 1, if weight1*input1 + weight2*input2 + bias >= 0

or

output = 0, if weight1*input1 + weight2*input2 + bias < 0

So, how can you choose the values for weights and bias so that if both inputs = 1, the output = 1?

OR Perceptron

The OR perceptron is very similar to an AND perceptron. In the image below, the OR perceptron has

the same line as the AND perceptron, except the line is shifted down. What can you do to the weights

and/or bias to achieve this? Use the following AND perceptron to create an OR Perceptron.

Question 1 of 2

What are two ways to go from an AND perceptron to an OR perceptron?

Increase the weights
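Both routes from AND to OR can be checked numerically. The sketch below starts from the AND weights used earlier (weight1 = weight2 = 1.0, bias = -1.2) and shows that either increasing the weights or decreasing the magnitude of the negative bias produces OR (the exact adjusted values are illustrative):

```python
def step_perceptron(x1, x2, w1, w2, b):
    # Heaviside step on the linear combination
    return int(w1 * x1 + w2 * x2 + b >= 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_truth = [0, 1, 1, 1]

# Option 1: increase the weights (AND bias kept at -1.2)
outputs_bigger_weights = [step_perceptron(x1, x2, 2.0, 2.0, -1.2) for x1, x2 in inputs]

# Option 2: decrease the magnitude of the (negative) bias
outputs_smaller_bias = [step_perceptron(x1, x2, 1.0, 1.0, -0.5) for x1, x2 in inputs]

print(outputs_bigger_weights == or_truth)  # True
print(outputs_smaller_bias == or_truth)    # True
```

Geometrically, both adjustments shift the decision line down so that a single active input is enough to cross the threshold.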

NOT Perceptron

Unlike the other perceptrons we looked at, the NOT operations only cares about one input. The

operation returns a 0 if the input is 1 and a 1 if it's a 0. The other inputs to the perceptron are ignored.

In this quiz, you'll set the weights (weight1, weight2) and bias bias to the values that calculate

the NOT operation on the second input and ignores the first input.

import pandas as pd

# Set weight1, weight2, and bias so the perceptron computes NOT on the second input
weight1 = -1.0
weight2 = -4.0
bias = 2.0

# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [True, False, True, False]
outputs = []

# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
    linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
    output = int(linear_combination >= 0)
    is_correct_string = 'Yes' if output == correct_output else 'No'
    outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])

# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', 'Input 2', 'Linear Combination', 'Activation Output', 'Is Correct'])
if not num_wrong:
    print('Nice! You got it all correct.\n')
else:
    print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))

We have a perceptron that can do AND, OR, or NOT operations. Let's do one more, XOR. In the next

section, you'll learn how a neural network solves more complicated problems like XOR.

XOR Perceptron

An XOR perceptron is a logic gate that outputs 0 if the inputs are the same and 1 if the inputs are different. Unlike the previous perceptrons, this data isn't linearly separable. To handle more complex problems like this, we can chain perceptrons together.

Let's build a neural network from the AND, NOT, and OR perceptrons to create XOR logic. Let's first

go over what a neural network looks like.

The above neural network contains 4 perceptrons: A, B, C, and D. The input to the neural network comes from the first node. The output comes out of the last node. The weights are indicated by the line thickness between the perceptrons. You can ignore any link between perceptrons with a low weight, like the one from A to C. For perceptron C, you can ignore all input to and from it. For simplicity we won't be showing the bias, but it's still in the neural network.

Quiz

The neural network above calculates XOR. Each perceptron is a logic operation of OR, AND, Passthrough, or NOT. The Passthrough operation just passes its input to the output. However, the perceptrons A, B, and C don't indicate their operation. In the following quiz, set the correct operations for the three perceptrons to calculate XOR.

Note: Any line with a low weight can be ignored.

Quiz Question

Set the operations for the perceptrons in the XOR neural network.

Perceptron

Operations

A

NOT

B

AND

C

OR
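The chained-perceptron idea can be verified in code. One standard composition of XOR from these gates (which may be labeled differently in the diagram above) is XOR(a, b) = AND(OR(a, b), NOT(AND(a, b))); the sketch below implements each gate as a step perceptron with hand-picked weights and checks the XOR truth table:

```python
def AND(a, b):
    return int(a + b - 1.5 >= 0)   # weights 1, 1; bias -1.5

def OR(a, b):
    return int(a + b - 0.5 >= 0)   # weights 1, 1; bias -0.5

def NOT(a):
    return int(-a + 0.5 >= 0)      # weight -1; bias 0.5

def XOR(a, b):
    # Chain the perceptrons: output 1 only when exactly one input is 1
    return AND(OR(a, b), NOT(AND(a, b)))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, XOR(a, b))
```

No single step perceptron can compute XOR, but three of them chained together can, which is exactly the point of stacking units into a network.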

You've seen that a perceptron can solve linearly separable problems. To solve more complex problems, you use more perceptrons. You saw this by calculating AND, OR, NOT, and XOR operations using perceptrons. These operations can be used to create any computer program. With enough data and time, a neural network can solve any problem that a computer can calculate. However, you wouldn't build Twitter using a neural network. A neural network is like any tool: you have to know when to use it.

The power of a neural network isn't building it by hand, like we were doing. It's the ability to learn from examples. In the next few sections, you'll learn how a neural network sets its own weights and biases.

13. The Simplest Neural Network

So far you've been working with perceptrons where the output is always one or zero. The input to the

output unit is passed through an activation function, f(h), in this case, the step function.

The step activation function.

The output unit returns the result of f(h), where h is the input to the output unit:

h = ∑_i w_i·x_i + b

The diagram below shows a simple network. The linear combination of the weights, inputs, and bias

form the input h, which passes through the activation function f(h), giving the final output of the

perceptron, labeled y.

Diagram of a simple neural network. Circles are units, boxes are operations.

The cool part about this architecture, and what makes neural networks possible, is that the activation

function, f(h) can be any function, not just the step function shown earlier.

For example, if you let f(h) = h, the output will be the same as the input. Now the output of the network is

y = ∑_i w_i·x_i + b

This equation should be familiar to you; it's the same as the linear regression model!

Other activation functions you'll see are the logistic (often called the sigmoid), tanh, and softmax

functions. We'll mostly be using the sigmoid function for the rest of this lesson:

sigmoid(x) = 1/(1 + e^(-x))

The sigmoid function is bounded between 0 and 1, and as an output can be interpreted as a probability

for success. It turns out, again, using a sigmoid as the activation function results in the same

formulation as logistic regression.

This is where it stops being a perceptron and begins being called a neural network. In the case of

simple networks like this, neural networks don't offer any advantage over general linear models such as

logistic regression.

But, as you saw with the XOR perceptron, stacking units lets you model linearly inseparable data, which is impossible to do with regression models alone.

Once you start using activation functions that are continuous and differentiable, it's possible to train the

network using gradient descent, which you'll learn about next.

Below you'll use NumPy to calculate the output of a simple network with two input nodes and one

output node with a sigmoid activation function. Things you'll need to do:

Implement the sigmoid function.

Calculate the output of the network.

sigmoid(x) = 1/(1 + e^(-x))

For the exponential, you can use NumPy's exponential function, np.exp.

y = f(h) = sigmoid(∑_i w_i·x_i + b)

For the weighted sum, you can do a simple element-wise multiplication and sum, or use NumPy's dot product function.

Simple.py

import numpy as np

def sigmoid(x):
    # TODO: Implement sigmoid function
    return 1 / (1 + np.exp(-x))

# Input data (the original values were lost in extraction; these are placeholders)
inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1

output = sigmoid(np.dot(inputs, weights) + bias)

print('Output:')
print(output)

solution.py

import numpy as np

def sigmoid(x):
    # TODO: Implement sigmoid function
    return 1 / (1 + np.exp(-x))

# Input data (the original values were lost in extraction; these are placeholders)
inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1

# TODO: Calculate the output
output = sigmoid(np.dot(weights, inputs) + bias)

print('Output:')
print(output)

Learning weights

You've seen how you can use perceptrons for AND and XOR operations, but there we set the weights

by hand. What if you want to perform an operation, such as predicting college admission, but don't

know the correct weights? You'll need to learn the weights from example data, then use those weights

to make the predictions.

To figure out how we're going to find these weights, start by thinking about the goal. We want the

network to make predictions as close as possible to the real values. To measure this, we need a metric

of how wrong the predictions are, the error. A common metric is the sum of the squared errors (SSE):

E = 1/2 ∑_μ ∑_j [y_j^μ - ŷ_j^μ]²

where ŷ is the prediction and y is the true value, and you take the sum over all output units j and another sum over all data points μ. This might seem like a really complicated equation at first, but it's fairly simple once you understand the symbols and can say what's going on in words.

First, the inside sum over j. This variable j represents the output units of the network. So this inside sum is saying: for each output unit, find the difference between the true value y and the predicted value from the network ŷ, then square the difference, then sum up all those squares.

Then the other sum over μ is a sum over all the data points. So, for each data point you calculate the inner sum of the squared differences for each output unit. Then you sum up those squared differences for each data point. That gives you the overall error for all the output predictions for all the data points.

The SSE is a good choice for a few reasons. The square ensures the error is always positive and larger

errors are penalized more than smaller errors. Also, it makes the math nice, always a plus.
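The double sum translates into a single NumPy expression over a 2-D array. A small sketch (the target and prediction values are made up for illustration):

```python
import numpy as np

# Hypothetical targets and predictions: 3 data points (rows), 2 output units (columns)
y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y_hat = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.9]])

# Inner sum over output units j, outer sum over data points mu,
# all scaled by 1/2 as in the SSE formula
sse = 0.5 * np.sum((y - y_hat) ** 2)
print(sse)  # about 0.135
```

Because np.sum collapses over all axes by default, the one call performs both the sum over output units and the sum over data points.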

Remember that the output of a neural network, the prediction, depends on the weights

ŷ_j^μ = f(∑_i w_ij · x_i^μ)

and accordingly the error depends on the weights

E = 1/2 ∑_μ ∑_j [y_j^μ - f(∑_i w_ij · x_i^μ)]²

We want the network's prediction error to be as small as possible and the weights are the knobs we can use to make that happen. Our goal is to find weights w_ij that minimize the squared error E. To do this with a neural network, typically you'd use gradient descent.

https://www.youtube.com/watch?v=29PmNG7fuuM

As Luis said, with gradient descent, we take multiple small steps towards our goal. In this case, we

want to change the weights in steps that reduce the error. Continuing the analogy, the error is our

mountain and we want to get to the bottom. Since the fastest way down a mountain is in the steepest

direction, the steps taken should be in the direction that minimizes the error the most. We can find this

direction by calculating the gradient of the squared error.

Gradient is another term for rate of change or slope. If you need to brush up on this concept, check out

Khan Academy's great lectures on the topic.

To calculate a rate of change, we turn to calculus, specifically derivatives. A derivative of a function f(x) gives you another function f′(x) that returns the slope of f(x) at point x. For example, consider f(x) = x². The derivative of x² is f′(x) = 2x. So, at x = 2, the slope is f′(2) = 4. Plotting this out, it looks like:

Example of a gradient

The gradient is just a derivative generalized to functions with more than one variable. We can use

calculus to find the gradient at any point in our error function, which depends on the input weights.

You'll see how the gradient descent step is derived on the next page.

Below I've plotted an example of the error of a neural network with two inputs, and accordingly, two

weights. You can read this like a topographical map where points on a contour line have the same error

and darker contour lines correspond to larger errors.

At each step, you calculate the error and the gradient, then use those to determine how much to change

each weight. Repeating this process will eventually find weights that are close to the minimum of the

error function, the black dot in the middle.

Gradient descent steps to the lowest error

Caveats

Since the weights will just go wherever the gradient takes them, they can end up where the error is

low, but not the lowest. These spots are called local minima. If the weights are initialized with the

wrong values, gradient descent could lead the weights into a local minimum, illustrated below.

Gradient descent leading into a local minimum

There are methods to avoid this, such as using momentum.
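Momentum isn't derived here, but the idea is easy to sketch: keep a running velocity that blends the current gradient step with a decaying average of past steps, so the weights can coast through shallow local minima. A minimal sketch, with made-up function names and a simple quadratic error:

```python
import numpy as np

def gradient_descent_momentum(grad, w, learnrate=0.1, beta=0.9, steps=200):
    """Gradient descent with a momentum term: each step blends the
    current gradient with an exponentially decaying average of past steps."""
    velocity = np.zeros_like(w)
    for _ in range(steps):
        velocity = beta * velocity - learnrate * grad(w)
        w = w + velocity
    return w

# Example: minimize f(w) = (w - 3)**2, whose gradient is 2*(w - 3)
w_final = gradient_descent_momentum(lambda w: 2 * (w - 3), np.array([0.0]))
print(w_final)
```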

https://www.youtube.com/watch?time_continue=3&v=7sxA5Ap8AWM

From before we saw that one weight update can be calculated as:

Δw_i = η δ x_i

with the error term δ as

δ = (y − ŷ) f′(h) = (y − ŷ) f′(∑_i w_i x_i)

Now I'll write this out in code for the case of only one output unit. We'll also be using the sigmoid as

the activation function f(h).

import numpy as np

# Defining the sigmoid function for activations
def sigmoid(x):
    return 1/(1+np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])

# The learning rate, eta in the weight step equation
learnrate = 0.5

# The neural network output
nn_output = sigmoid(x[0]*weights[0] + x[1]*weights[1])
# or nn_output = sigmoid(np.dot(x, weights))

# Output error
error = y - nn_output

# Error term
error_term = error * sigmoid_prime(np.dot(x, weights))

# Gradient descent step
del_w = [learnrate * error_term * x[0],
         learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x

gradient.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

# TODO: Calculate output of neural network
nn_output = sigmoid(np.dot(w, x))

# TODO: Calculate error of neural network
error = y - nn_output

# TODO: Calculate change in weights
del_w = learnrate * error * nn_output * (1 - nn_output) * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

learnrate = 0.5
x = np.array([1, 2])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5])

# Calculate output of neural network
nn_output = sigmoid(np.dot(x, w))

# Calculate error of neural network
error = y - nn_output

# Calculate change in weights
del_w = learnrate * error * nn_output * (1 - nn_output) * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

17. Implementing Gradient Descent

Okay, now we know how to update our weights

Δw_ij = η δ_j x_i,

how do we translate this into code?

As an example, I'm going to have you use gradient descent to train a network on graduate school

admissions data (found at http://www.ats.ucla.edu/stat/data/binary.csv). This dataset has three input

features: GRE score, GPA, and the rank of the undergraduate school (numbered 1 through 4).

Institutions with rank 1 have the highest prestige, those with rank 4 have the lowest.

The goal here is to predict if a student will be admitted to a graduate program based on these features.

For this, we'll use a network with one output layer with one unit. We'll use a sigmoid function for the

output unit activation.

Data cleanup

You might think there will be three input units, but we actually need to transform the data first. The

rank feature is categorical, so the numbers don't encode any sort of relative values. Rank 2 is not twice

as much as rank 1, and rank 3 is not 1.5 times rank 2. Instead, we need to use dummy variables to

encode rank, splitting the data into four new columns encoded with ones or zeros. Rows with rank 1

have one in the rank 1 dummy column, and zeros in all other columns. Rows with rank 2 have one in

the rank 2 dummy column, and zeros in all other columns. And so on.
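This dummy-variable encoding is exactly what pandas' get_dummies does. A small illustration on a made-up rank column:

```python
import pandas as pd

# Hypothetical mini-dataset with only the categorical rank column
df = pd.DataFrame({'rank': [1, 2, 3, 4, 1]})

# One dummy column per rank value, encoded with ones and zeros
dummies = pd.get_dummies(df['rank'], prefix='rank')
print(dummies)
```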

We'll also need to standardize the GRE and GPA data, which means to scale the values so that they have

zero mean and a standard deviation of 1. This is necessary because the sigmoid function squashes really

small and really large inputs. The gradient of really small and large inputs is zero, which means that the

gradient descent step will go to zero too. Since the GRE and GPA values are fairly large, we have to be

really careful about how we initialize the weights or the gradient descent steps will die off and the

network won't train. Instead, if we standardize the data, we can initialize the weights easily and

everyone is happy.

This is just a brief run-through, you'll learn more about preparing data later. If you're interested in how

I did this, check out the data_prep.py file in the programming exercise below.

Ten rows of the data after transformations.

Now that the data is ready, we see that there are six input features: gre, gpa, and the four rank

dummy variables.

We're going to make a small change to how we calculate the error here. Instead of the SSE, we're going

to use the mean of the square errors (MSE). Now that we're using a lot of data, summing up all the

weight steps can lead to really large updates that make the gradient descent diverge. To compensate for

this, you'd need to use quite a small learning rate. Instead, we can just divide by the number of records

in our data, m, to take the average. This way, no matter how much data we use, our learning rates will

typically be in the range of 0.01 to 0.001. Then, we can use the MSE,

E = (1/2m) ∑_μ (y^μ − ŷ^μ)²,

to calculate the gradient and the result is the same as before, just averaged instead of summed.

Here's the general algorithm for updating the weights with gradient descent:

Set the weight step to zero: Δw_i = 0

For each record in the training data:

Make a forward pass through the network, calculating the output ŷ = f(∑_i w_i x_i)

Calculate the error term for the output unit, δ = (y − ŷ) f′(∑_i w_i x_i)

Update the weight step Δw_i = Δw_i + δ x_i

Update the weights w_i = w_i + η Δw_i / m, where η is the learning rate and m is the number of records.

Here we're averaging the weight steps to help reduce any large variations in the training data.

Repeat for e epochs.

You can also update the weights on each record instead of averaging the weight steps after going

through all the records.
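To make the two update schedules concrete, here is a minimal sketch of one epoch under each scheme; the function names and the tiny dataset are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def epoch_batch(weights, features, targets, learnrate):
    """Accumulate weight steps over all records, then apply the average."""
    del_w = np.zeros_like(weights)
    for x, y in zip(features, targets):
        output = sigmoid(np.dot(x, weights))
        del_w += (y - output) * output * (1 - output) * x
    return weights + learnrate * del_w / len(features)

def epoch_online(weights, features, targets, learnrate):
    """Update the weights immediately after each record instead."""
    for x, y in zip(features, targets):
        output = sigmoid(np.dot(x, weights))
        weights = weights + learnrate * (y - output) * output * (1 - output) * x
    return weights

# Tiny made-up dataset just to show both schedules running
features = np.array([[0.1, 0.3], [0.2, 0.1]])
targets = np.array([1.0, 0.0])
print(epoch_batch(np.zeros(2), features, targets, 0.5))
print(epoch_online(np.zeros(2), features, targets, 0.5))
```

The per-record ("online") version takes noisier steps but needs no accumulator; the averaged version matches the algorithm written out above.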

Remember that we're using the sigmoid for the activation function, f(h) = 1/(1 + e^(−h)),

and the gradient of the sigmoid is f′(h) = f(h)(1 − f(h)),

where h is the input to the output unit,

h = ∑_i w_i x_i

For the most part, this is pretty straightforward with NumPy.

First, you'll need to initialize the weights. We want these to be small such that the input to the sigmoid

is in the linear region near 0 and not squashed at the high and low ends. It's also important to initialize

them randomly so that they all have different starting values and diverge, breaking symmetry. So, we'll

initialize the weights from a normal distribution centered at 0. A good value for the scale is 1/√n, where

n is the number of input units. This keeps the input to the sigmoid low for increasing numbers of input

units.

weights = np.random.normal(scale=1/n_features**.5, size=n_features)

NumPy provides a function that calculates the dot product of two arrays, which conveniently calculates

h for us. The dot product multiplies two arrays element-wise, the first element in array 1 is multiplied

by the first element in array 2, and so on. Then, each product is summed.

# input to the output layer

output_in = np.dot(weights, inputs)

And finally, we can update Δw_i and w_i by incrementing them with weights += ..., which is

shorthand for weights = weights + ....

Efficiency tip!

You can save some calculations since we're using a sigmoid here. For the sigmoid function,

f′(h) = f(h)(1 − f(h)). That means that once you calculate f(h), the activation of the output unit, you can

reuse it to calculate the gradient for the error term.

Programming exercise

Below, you'll implement gradient descent and train the network on the admissions data. Your goal here

is to train the network until you reach a minimum in the mean square error (MSE) on the training set.

You need to implement:

The network output: output.

The error gradient: error.

Update the weight step: del_w +=.

Update the weights: weights +=.

After you've written these parts, run the training by pressing "Test Run". The MSE will print out, as

well as the accuracy on a test set, the fraction of correctly predicted admissions.

Feel free to play with the hyperparameters and see how it changes the MSE.

Gradient.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # TODO: Calculate the output
        output = sigmoid(np.dot(x, weights))

        # TODO: Calculate the error
        error = y - output

        # TODO: Calculate change in weights
        del_w += error * output * (1 - output) * x

    # TODO: Update weights, dividing by the number of records to average
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

data_prep.py

import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

binary.csv

solution.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1/n_features**.5, size=n_features)

epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Calculate the output
        output = sigmoid(np.dot(x, weights))

        # Calculate the error
        error = y - output

        # The gradient descent step, the error times the gradient times the inputs
        del_w += error * output * (1 - output) * x

    # Update the weights, divided by the number of records to average
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

18. Multilayer Perceptrons

https://www.youtube.com/watch?v=Rs9petvTBLk

Prerequisites

Below, we are going to walk through the math of neural networks in a multilayer perceptron. With

multiple perceptrons, we are going to move to using vectors and matrices. To brush up, be sure to view

the following:

1. Khan Academy's introduction to vectors.

2. Khan Academy's introduction to matrices.

Derivation

Before, we were dealing with only one output node, which made the code straightforward. However,

now that we have multiple input units and multiple hidden units, the weights between them will require

two indices: w_ij, where i denotes input units and j denotes hidden units.

For example, the following image shows our network, with its input units labeled x1,x2, and x3, and its

hidden nodes labeled h1 and h2:

The lines indicating the weights leading to h1 have been colored differently from those leading to h2

just to make it easier to read.

Now to index the weights, we take the input unit number for the i and the hidden unit number for the j.

That gives us w_11 for the weight leading from x_1 to h_1, and w_12 for the weight leading from x_1 to h_2.

The following image includes all of the weights between the input layer and the hidden layer, labeled

with their appropriate wij indices:

Before, we were able to write the weights as an array, indexed as wi.

But now, the weights need to be stored in a matrix, indexed as wij. Each row in the matrix will

correspond to the weights leading out of a single input unit, and each column will correspond to the

weights leading in to a single hidden unit. For our three input units and two hidden units, the weights

matrix looks like this:

    [ w_11  w_12 ]
    [ w_21  w_22 ]
    [ w_31  w_32 ]

Be sure to compare the matrix above with the diagram shown before it so you can see where the

different weights in the network end up in the matrix.

To initialize these weights in Numpy, we have to provide the shape of the matrix. If features is a 2D

array containing the input data:

# Number of records and input units
n_records, n_inputs = features.shape
# Number of hidden units
n_hidden = 2
weights_input_to_hidden = np.random.normal(0, n_inputs**-0.5,
                                           size=(n_inputs, n_hidden))

This creates a 2D array (i.e. a matrix) with dimensions n_inputs by n_hidden. Remember how the input to a hidden unit is the sum of all the inputs

multiplied by the hidden unit's weights. So for each hidden layer unit, h_j, we need to calculate the

following:

h_j = ∑_i w_ij x_i

In this case, we're multiplying the inputs (a row vector here) by the weights. To do this, you take the

dot (inner) product of the inputs with each column in the weights matrix. For example, to calculate the

input to the first hidden unit, j=1, you'd take the dot product of the inputs with the first column of the

weights matrix, like so:

Calculating the input to the first hidden unit with the first column of the weights matrix.

And for the second hidden layer input, you calculate the dot product of the inputs with the second

column. And so on and so forth.

In NumPy, you can do this for all the inputs and all the outputs at once using np.dot

hidden_inputs = np.dot(inputs, weights_input_to_hidden)
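You can confirm that the single np.dot over the whole matrix matches the column-by-column dot products; the array values below are just random illustration data:

```python
import numpy as np

np.random.seed(42)
inputs = np.random.randn(3)        # three input units
weights = np.random.randn(3, 2)    # 3 inputs x 2 hidden units

# Input to each hidden unit, one column at a time
h1 = np.dot(inputs, weights[:, 0])
h2 = np.dot(inputs, weights[:, 1])

# Same result in one call
hidden_inputs = np.dot(inputs, weights)
print(np.allclose(hidden_inputs, [h1, h2]))
```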

You could also define your weights matrix such that it has dimensions n_hidden by n_inputs then

multiply like so where the inputs form a column vector:

Note: The weight indices have changed in the above image and no longer match up with the labels

used in the earlier diagrams. That's because, in matrix notation, the row index always precedes the

column index, so it would be misleading to label them the way we did in the neural net diagram. Just

keep in mind that this is the same weight matrix as before, but rotated so the first column is now the

first row, and the second column is now the second row. If we were to use the labels from the earlier

diagram, the weights would fit into the matrix in the following locations:

Remember, the above is not a correct view of the indices, but it uses the labels from the earlier neural

net diagrams to show you where each weight ends up in the matrix.

The important thing with matrix multiplication is that the dimensions match. For matrix multiplication

to work, there has to be the same number of elements in the dot products. In the first example, there are

three columns in the input vector, and three rows in the weights matrix. In the second example, there

are three columns in the weights matrix and three rows in the input vector. If the dimensions don't

match, you'll get this:

# Same weights and features as above, but swapped the order

hidden_inputs = np.dot(weights_input_to_hidden, features)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-11-1bfa0f615c45> in <module>()

----> 1 hidden_in = np.dot(weights_input_to_hidden, X)

The dot product can't be computed for a 3x2 matrix and 3-element array. That's because the 2 columns

in the matrix don't match the number of elements in the array. Some of the dimensions that could work

would be the following:

The rule is that if you're multiplying an array from the left, the array must have the same number of

elements as there are rows in the matrix. And if you're multiplying the matrix from the left, the number

of columns in the matrix must equal the number of elements in the array on the right.
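A quick sketch of these dimension rules with throwaway arrays:

```python
import numpy as np

arr = np.ones(3)         # 3-element array
mat = np.ones((3, 2))    # 3x2 matrix

# Array on the left: its length must match the number of rows in the matrix
print(np.dot(arr, mat).shape)

# Matrix on the left: its column count must match the array's length,
# so a 3x2 matrix times a 3-element array raises a ValueError
try:
    np.dot(mat, arr)
except ValueError:
    print("dimensions don't match")
```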

You see above that sometimes you'll want a column vector, even though by default Numpy arrays work

like row vectors. It's possible to get the transpose of an array like so arr.T, but for a 1D array, the

transpose will return a row vector. Instead, use arr[:,None] to create a column vector:

print(features)

> array([ 0.49671415, -0.1382643 , 0.64768854])

print(features.T)

> array([ 0.49671415, -0.1382643 , 0.64768854])

print(features[:, None])

> array([[ 0.49671415],

[-0.1382643 ],

[ 0.64768854]])

Alternatively, you can create arrays with two dimensions. Then, you can use arr.T to get the column

vector.

np.array(features, ndmin=2)

> array([[ 0.49671415, -0.1382643 , 0.64768854]])

np.array(features, ndmin=2).T

> array([[ 0.49671415],

[-0.1382643 ],

[ 0.64768854]])

I personally prefer keeping all vectors as 1D arrays, it just works better in my head.

Programming quiz

Below, you'll implement a forward pass through a 4x3x2 network, with sigmoid activation functions

for both layers.

Things to do:

Calculate the input to the hidden layer.

Calculate the hidden layer output.

Calculate the input to the output layer.

Calculate the output of the network.

multilayer.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# TODO: Make a forward pass through the network
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)

solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# Make a forward pass through the network
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)

19. Backpropagation

https://www.youtube.com/watch?v=MZL97-2joxQ

Backpropagation

Now we've come to the problem of how to make a multilayer neural network learn. Before, we saw

how to update weights with gradient descent. The backpropagation algorithm is just an extension of

that, using the chain rule to find the error with respect to the weights connecting the input layer to

the hidden layer (for a two layer network).

To update the weights to hidden layers using gradient descent, you need to know how much error each

of the hidden units contributed to the final output. Since the output of a layer is determined by the

weights between layers, the error resulting from units is scaled by the weights going forward through

the network. Since we know the error at the output, we can use the weights to work backwards to

hidden layers.

For example, in the output layer, you have errors δ^o_k attributed to each output unit k. Then, the error

attributed to hidden unit j is the output errors, scaled by the weights between the output and hidden

layers (and the gradient):

δ^h_j = ∑_k W_jk δ^o_k f′(h_j)

Then, the gradient descent step is the same as before, just with the new errors:

Δw_ij = η δ^h_j x_i

where w_ij are the weights between the inputs and hidden layer and x_i are input unit values. This form

holds for however many layers there are. The weight steps are equal to the step size times the output

error of the layer times the values of the inputs to that layer:

Δw = η δ_output V_in

Here, you get the output error, δ_output, by propagating the errors backwards from higher layers. And

the input values, V_in, are the inputs to the layer (the hidden layer activations to the output unit, for

example).

Let's walk through the steps of calculating the weight updates for a simple two layer network. Suppose

there are two input values, one hidden unit, and one output unit, with sigmoid activations on the hidden

and output units. The following image depicts this network. (Note: the input values are shown as nodes

at the bottom of the image, while the network's output value is shown as ŷ at the top. The inputs

themselves do not count as a layer, which is why this is considered a two layer network.)

Assume we're trying to fit some binary data and the target is y = 1. We'll start with the forward pass, first

calculating the input to the hidden unit

h = ∑_i w_i x_i = 0.1 × 0.4 + 0.3 × (−0.2) = −0.02

and the output of the hidden unit

a = f(h) = sigmoid(−0.02) = 0.495.

Using this as the input to the output unit, the output of the network is

ŷ = f(W · a) = sigmoid(0.1 × 0.495) = 0.512.

With the network output, we can start the backwards pass to calculate the weight updates for both

layers. Using the fact that for the sigmoid function f′(W · a) = f(W · a)(1 − f(W · a)), the error term for the

output unit is

δ^o = (y − ŷ) f′(W · a) = (1 − 0.512) × 0.512 × (1 − 0.512) = 0.122.

Now we need to calculate the error term for the hidden unit with backpropagation. Here we'll scale the error

from the output unit by the weight W connecting it to the hidden unit. For the hidden unit error,

δ^h_j = ∑_k W_jk δ^o_k f′(h_j), but since we have one hidden unit and one output unit, this is much simpler:

δ^h = W δ^o f′(h) = 0.1 × 0.122 × 0.495 × (1 − 0.495) = 0.003

Now that we have the errors, we can calculate the gradient descent steps. The hidden-to-output weight

step is the learning rate, times the output unit error, times the hidden unit activation value:

ΔW = η δ^o a = 0.5 × 0.122 × 0.495 = 0.0302

Then, for the input-to-hidden weights w_i, it's the learning rate times the hidden unit error, times the

input values:

Δw_i = η δ^h x_i = (0.5 × 0.003 × 0.1, 0.5 × 0.003 × 0.3) = (0.00015, 0.00045)
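You can verify the arithmetic of this worked example with a few lines of NumPy; the variable names below are just for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Values from the worked example above
x = np.array([0.1, 0.3])      # inputs
w = np.array([0.4, -0.2])     # input-to-hidden weights
W = 0.1                       # hidden-to-output weight
y = 1                         # target
learnrate = 0.5

h = np.dot(w, x)                                # -0.02
a = sigmoid(h)                                  # ~0.495
y_hat = sigmoid(W * a)                          # ~0.512

delta_o = (y - y_hat) * y_hat * (1 - y_hat)     # ~0.122
delta_h = W * delta_o * a * (1 - a)             # ~0.003

delta_W = learnrate * delta_o * a               # ~0.0302
delta_w = learnrate * delta_h * x               # ~(0.00015, 0.00045)
print(delta_W, delta_w)
```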

From this example, you can see one of the effects of using the sigmoid function for the activations. The

maximum derivative of the sigmoid function is 0.25, so the errors in the output layer get reduced by at

least 75%, and errors in the hidden layer are scaled down by at least 93.75%! You can see that if you

have a lot of layers, using a sigmoid activation function will quickly reduce the weight steps to tiny

values in layers near the input. This is known as the vanishing gradient problem. Later in the course

you'll learn about other activation functions that perform better in this regard and are more commonly

used in modern network architectures.
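You can see this scaling effect numerically; a minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1 - s)

# The sigmoid derivative peaks at 0.25 (at x = 0)
print(sigmoid_prime(0.0))

# Each layer scales the backpropagated error by at most 0.25,
# so with n layers the error shrinks by at least a factor of 0.25**n
for n in [1, 2, 5, 10]:
    print(n, 0.25 ** n)
```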

Implementing in NumPy

For the most part you have everything you need to implement backpropagation with NumPy.

However, previously we were only dealing with error terms from one unit. Now, in the weight update,

we have to consider the error for each unit in the hidden layer, δ_j:

Δw_ij = η δ_j x_i

Firstly, there will likely be a different number of input and hidden units, so trying to multiply the errors

and the inputs as row vectors will throw an error

hidden_error*inputs

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-22-3b59121cb809> in <module>()

----> 1 hidden_error*x

ValueError: operands could not be broadcast together with shapes (3,) (6,)

Also, wij is a matrix now, so the right side of the assignment must have the same shape as the left side.

Luckily, NumPy takes care of this for us. If you multiply a row vector array with a column vector array,

it will multiply the first element in the column by each element in the row vector and set that as the first

row in a new 2D array. This continues for each element in the column vector, so you get a 2D array that

has shape (len(column_vector), len(row_vector)).

hidden_error*inputs[:,None]

array([[ -8.24195994e-04, -2.71771975e-04, 1.29713395e-03],

[ -2.87777394e-04, -9.48922722e-05, 4.52909055e-04],

[ 6.44605731e-04, 2.12553536e-04, -1.01449168e-03],

[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00],

[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00],

[ 0.00000000e+00, 0.00000000e+00, -0.00000000e+00]])

It turns out this is exactly how we want to calculate the weight update step. As before, if you have your

inputs as a 2D array with one row, you can also do hidden_error*inputs.T, but that won't work

if inputs is a 1D array.
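The same broadcasting result can also be written with np.outer; the values below are made up to match the shapes in the error message above:

```python
import numpy as np

hidden_error = np.array([0.1, -0.2, 0.3])             # one error term per hidden unit
inputs = np.array([0.5, 0.1, -0.2, 0.0, 0.0, 0.0])    # six input values

# Broadcasting a row vector against a column vector gives shape (6, 3)
step = hidden_error * inputs[:, None]

# np.outer computes the same outer product explicitly
print(np.allclose(step, np.outer(inputs, hidden_error)))
```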

Backpropagation exercise

Below, you'll implement the code to calculate one backpropagation update step for two sets of weights.

I wrote the forward pass, your goal is to code the backward pass.

Things to do

Calculate the network error.

Calculate the output layer error gradient.

Use backpropagation to calculate the hidden layer error.

Calculate the weight update steps.

Backprop.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = None

# TODO: Calculate error term for output layer
del_err_output = None

# TODO: Calculate error term for hidden layer
del_err_hidden = None

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = None

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = None

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

solution.py

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
# Calculate error
error = target - output

# Calculate error term for output layer
del_err_output = error * output * (1 - output)

# Calculate error term for hidden layer
del_err_hidden = np.dot(del_err_output, weights_hidden_output) * \
                 hidden_layer_output * (1 - hidden_layer_output)

# Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * del_err_output * hidden_layer_output

# Calculate change in weights for input layer to hidden layer
delta_w_i_h = learnrate * del_err_hidden * x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

Implementing backpropagation

Now we've seen that the error term in the output layer is

δ^o_k = (y_k − ŷ_k) f′(a_k)

and the error term in the hidden layer is

δ^h_j = ∑_k W_jk δ^o_k f′(h_j)

For now we'll only consider a simple network with one hidden layer and one output unit. Here's the

general algorithm for updating the weights with backpropagation:

Set the weight steps for each layer to zero:

The input-to-hidden weights Δw_ij = 0

The hidden-to-output weights ΔW_j = 0

For each record in the training data:

Make a forward pass through the network, calculating the output ŷ

Calculate the error term for the output unit, δ^o = (y − ŷ) f′(z), where z = ∑_j W_j a_j, the input

to the output unit.

Propagate the errors to the hidden layer, δ^h_j = δ^o W_j f′(h_j)

Update the weight steps:

ΔW_j = ΔW_j + δ^o a_j

Δw_ij = Δw_ij + δ^h_j a_i

Update the weights, where η is the learning rate and m is the number of records:

W_j = W_j + η ΔW_j / m

w_ij = w_ij + η Δw_ij / m

Repeat for e epochs.

Backpropagation exercise

Now you're going to implement the backprop algorithm for a network trained on the graduate school

admission data. You should have everything you need from the previous exercises to complete this one.

Your goals here:

Implement the forward pass.

Implement the backpropagation algorithm.

Update the weights.

Backprop.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # TODO: Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # TODO: Calculate the error
        error = y - output

        # TODO: Calculate error term for the output unit
        output_error = error * output * (1 - output)

        # TODO: Propagate errors to hidden layer
        hidden_error = np.dot(output_error, weights_hidden_output) * \
                       hidden_output * (1 - hidden_output)

        # TODO: Update the change in weights
        del_w_hidden_output += output_error * hidden_output
        del_w_input_hidden += hidden_error * x[:, None]

    # TODO: Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_output = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output, weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

data_prep.py

import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:, field] = (data[field] - mean) / std

# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

binary.csv

solution.py

import numpy as np
from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))

        ## Backward pass ##
        # Calculate the error
        error = y - output

        # Calculate error term for the output unit
        output_error = error * output * (1 - output)

        # Propagate errors to hidden layer
        hidden_error = np.dot(output_error, weights_hidden_output) * \
                       hidden_output * (1 - hidden_output)

        # Update the change in weights
        del_w_hidden_output += output_error * hidden_output
        del_w_input_hidden += hidden_error * x[:, None]

    # Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_output = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output, weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, " WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

Further reading

Backpropagation is fundamental to deep learning. TensorFlow and other libraries will perform backprop for you, but you should still thoroughly understand the algorithm. We'll be going over backprop again, but here are some extra resources for you:

From Andrej Karpathy: Yes, you should understand backprop
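One good way to convince yourself you understand backprop is a gradient check: compare the analytic gradient from the chain rule against a numerical finite-difference estimate. A minimal sketch for the hidden-to-output weights of a network like the one above (the toy data, shapes, and variable names here are illustrative, not the course's):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
x = np.random.randn(3)          # one input record (toy data)
y = 1.0                         # its target
w_ih = np.random.randn(3, 2)    # input -> hidden weights
w_ho = np.random.randn(2)       # hidden -> output weights

def loss(w_ih, w_ho):
    hidden = sigmoid(np.dot(x, w_ih))
    out = sigmoid(np.dot(hidden, w_ho))
    return 0.5 * (y - out) ** 2

# Analytic gradient of the loss w.r.t. w_ho via backprop (same chain rule
# as the solution code: error term times upstream activation)
hidden = sigmoid(np.dot(x, w_ih))
out = sigmoid(np.dot(hidden, w_ho))
output_error = (y - out) * out * (1 - out)
grad_w_ho = -output_error * hidden   # dLoss/dw_ho

# Numerical gradient: perturb each weight by a small epsilon
eps = 1e-6
num_grad = np.zeros_like(w_ho)
for i in range(len(w_ho)):
    w_plus, w_minus = w_ho.copy(), w_ho.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    num_grad[i] = (loss(w_ih, w_plus) - loss(w_ih, w_minus)) / (2 * eps)

print(np.allclose(grad_w_ho, num_grad, atol=1e-5))   # True if the backprop math is right
```

If the two gradients disagree, the bug is almost always a dropped sign or a missing sigmoid-derivative factor in the chain rule.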

In this lesson, you learned about the power of perceptrons: how much a single perceptron can do, and how much more a neural network built from multiple perceptrons can. You also learned how each perceptron can learn from past samples to arrive at a solution.

Now that you understand the basics of a neural network, the next step is to build one. In the next lesson, you'll build your own neural network.
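The perceptron-learning idea summarized above fits in a few lines of code. A minimal sketch on AND-gate data (the dataset, learning rate, and epoch count are illustrative choices, not from the lesson):

```python
import numpy as np

# AND-gate training data: inputs and labels (illustrative)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
learnrate = 0.1

# Perceptron learning rule: nudge weights on every misclassified sample
for _ in range(20):
    for xi, target in zip(X, y):
        pred = int(np.dot(xi, w) + b >= 0)
        w += learnrate * (target - pred) * xi
        b += learnrate * (target - pred)

preds = [int(np.dot(xi, w) + b >= 0) for xi in X]
print(preds)   # [0, 0, 0, 1] once the perceptron has converged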

23. Summary

https://www.youtube.com/watch?v=m8xslYUBXYo
