
10/26/21, 9:43 AM machine-learning/digit_recognition.ipynb at master · vnikov/machine-learning

Machine Learning Engineer Nanodegree


Deep Learning

Project: Build a Digit Recognition Program


In this notebook, a template is provided for you to implement your functionality in stages, as required to successfully complete this project. If you need additional code that cannot be included in the notebook, make sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.

Step 1: Design and Test a Model Architecture


Design and implement a deep learning model that learns to recognize sequences of digits. Train the model using synthetic data generated by concatenating character images from notMNIST or MNIST. To produce synthetic digit sequences for testing, you can, for example, limit yourself to sequences of up to five digits and use five classifiers on top of your deep network. You will need to add an extra ‘blank’ character to account for shorter number sequences.
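One way to build the synthetic training set described above is sketched below in NumPy. It assumes the source digits are already loaded as an array of equally sized grayscale images with labels 0-9. The index 10 for the 'blank' class, the helper name `make_sequence`, and the zero padding on the right are illustrative choices, not part of the project template.

```python
import numpy as np

BLANK = 10   # hypothetical label for the extra 'blank' class (digits use 0-9)
MAX_LEN = 5  # sequences of up to five digits, one classifier per position


def make_sequence(images, labels, rng, max_len=MAX_LEN):
    """Concatenate 1..max_len randomly chosen digit images side by side.

    images: (n, h, w) array of equally sized digit images, e.g. MNIST
    labels: (n,) array of digit labels 0-9
    Returns an (h, max_len * w) image, zero-padded on the right, and a
    (max_len,) label vector whose unused positions hold BLANK.
    """
    n, h, w = images.shape
    length = int(rng.integers(1, max_len + 1))      # sequence length 1..max_len
    idx = rng.integers(0, n, size=length)           # which digits to use
    canvas = np.zeros((h, max_len * w), dtype=images.dtype)
    canvas[:, : length * w] = np.concatenate(images[idx], axis=1)
    target = np.full(max_len, BLANK, dtype=np.int64)
    target[:length] = labels[idx]
    return canvas, target


# Usage with random noise standing in for real MNIST / notMNIST digits.
rng = np.random.default_rng(0)
fake_images = rng.random((100, 28, 28)).astype(np.float32)
fake_labels = rng.integers(0, 10, size=100)
x, y = make_sequence(fake_images, fake_labels, rng)
print(x.shape)  # (28, 140)
print(y)        # digit labels followed by BLANK (10) padding
```

Padding with a fixed-width canvas keeps every training example the same size, so a single network input shape can handle numbers of any length up to five.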

There are various aspects to consider when thinking about this problem:

Your model can be derived from a deep neural net or a convolutional network.
You could experiment with sharing, or not sharing, the weights between the softmax classifiers.
You can also use a recurrent network in your deep neural net to replace the classification layers and directly emit the sequence of digits one at a time.
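The NumPy sketch below contrasts shared and separate weights for the five softmax classifiers. It assumes, hypothetically, that the network produces one feature vector per digit position (for instance by splitting its final feature map into five slices). If every head instead read the same single feature vector, shared weights would make all five predictions identical, so separate weights would be required. The names `heads_forward` and `softmax` are illustrative, and this is framework-agnostic NumPy rather than Keras code.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def heads_forward(features, W, b):
    """One softmax classifier per digit position.

    features: (batch, positions, d), one feature vector per position
    W, b:     (d, classes) and (classes,) for weights shared by all heads, or
              (positions, d, classes) and (positions, classes) for separate heads
    Returns class probabilities of shape (batch, positions, classes).
    """
    if W.ndim == 2:   # shared: the same matrix is applied at every position
        logits = np.einsum("bpd,dc->bpc", features, W) + b
    else:             # separate: position p uses its own matrix W[p]
        logits = np.einsum("bpd,pdc->bpc", features, W) + b
    return softmax(logits)


rng = np.random.default_rng(0)
batch, positions, d, classes = 4, 5, 64, 11   # 10 digits + 'blank'
feats = rng.standard_normal((batch, positions, d))
shared = heads_forward(feats, rng.standard_normal((d, classes)), np.zeros(classes))
separate = heads_forward(feats, rng.standard_normal((positions, d, classes)),
                         np.zeros((positions, classes)))
print(shared.shape, separate.shape)  # (4, 5, 11) (4, 5, 11)
```

With d = 64 features and 11 classes, the shared variant has 64 × 11 + 11 = 715 parameters, while the separate variant has five times as many (3,575), which is the trade-off the weight-sharing experiment explores.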

You can use Keras to implement your model. Read more at keras.io.

Here is an example of a published baseline model on this problem (video). You are not expected to model your architecture precisely on this model or to reach the same performance levels; it is meant to show an example of an approach used to solve this particular problem. We encourage you to try out different architectures for yourself and see what works best for you. Here is a useful forum post discussing the architecture as described in the paper, and here is another one discussing the loss function.
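For the loss, one simple option (an assumption here, not necessarily the formulation used in the paper) is to add up the softmax cross-entropy of the five position classifiers, treating 'blank' as an ordinary class. A minimal NumPy sketch, with the hypothetical name `sequence_cross_entropy`:

```python
import numpy as np


def sequence_cross_entropy(probs, targets, eps=1e-12):
    """Summed per-position cross-entropy, averaged over the batch.

    probs:   (batch, positions, classes) softmax outputs, one head per position
    targets: (batch, positions) integer labels; shorter numbers are padded
             with the 'blank' class, which is scored like any other class
    """
    batch, positions, _ = probs.shape
    # probability each head assigned to its correct class -> (batch, positions)
    picked = probs[np.arange(batch)[:, None], np.arange(positions)[None, :], targets]
    return float(-np.log(picked + eps).sum(axis=1).mean())


probs = np.full((2, 5, 11), 1.0 / 11)            # uniform guesses over 11 classes
targets = np.array([[1, 2, 10, 10, 10],          # "12"
                    [7, 10, 10, 10, 10]])        # "7"
print(sequence_cross_entropy(probs, targets))    # about 5 * ln(11), i.e. 11.99
```

Summing rather than averaging over positions means every head contributes a full cross-entropy term, so a mistake at any single position raises the loss.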

https://github.com/vnikov/machine-learning/blob/master/projects/digit_recognition/digit_recognition.ipynb 1/5

Implementation
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you
have completed your implementation and are satisfied with the results, be sure to thoroughly answer the
questions that follow.

### Your code implementation goes here.
### Feel free to use as many code cells as needed.

Question 1
What approach did you take in coming up with a solution to this problem?

Answer:

Question 2
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.)

Answer:

Question 3
How did you train your model? How did you generate your synthetic dataset? Include examples of images
from the synthetic data you constructed.

Answer:

Step 2: Train a Model on a Realistic Dataset


Once you have settled on a good architecture, you can train your model on real data. In particular, the Street View House Numbers (SVHN) dataset is a good large-scale dataset collected from house numbers in Google Street View. Because the digits in this more challenging dataset are not neatly lined up and vary in skew, font and color, training on it will likely require some hyperparameter exploration to perform well.
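Before training on SVHN, each house number's digit labels must be mapped onto the same fixed-length target format used for the synthetic data. In SVHN's annotations the digit '0' carries label 10, so the sketch below remaps it to 0 so that index 10 can serve as the 'blank' class. It assumes you have already read the per-digit labels for each image (for example from the format-1 `digitStruct.mat` files); `svhn_to_target`, `BLANK`, and `MAX_LEN` are hypothetical names.

```python
import numpy as np

BLANK = 10    # hypothetical 'blank' class, as in the synthetic-data setup
MAX_LEN = 5   # longest number the five classifiers can represent


def svhn_to_target(digit_labels, max_len=MAX_LEN):
    """Map one SVHN house number's digit labels to a fixed-length target.

    SVHN labels the digit '0' as 10, so it is remapped to 0 here, which frees
    index 10 for BLANK. Returns None for numbers longer than max_len so the
    caller can skip them.
    """
    digits = [0 if int(d) == 10 else int(d) for d in digit_labels]
    if len(digits) > max_len:
        return None
    target = np.full(max_len, BLANK, dtype=np.int64)
    target[: len(digits)] = digits
    return target


print(svhn_to_target([1, 10, 5]).tolist())  # house number "105" -> [1, 0, 5, 10, 10]
```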

Implementation
Use the code cell (or multiple code cells, if necessary) to implement the second step of your project. Once you
have completed your implementation and are satisfied with the results, be sure to thoroughly answer the
questions that follow.

### Your code implementation goes here.
### Feel free to use as many code cells as needed.


Question 4
Describe how you set up the training and testing data for your model. How does the model perform on a
realistic dataset?

Answer:

Question 5
What changes did you have to make, if any, to achieve "good" results? Were there any options you explored
that made the results worse?

Answer:

Question 6
What were your initial and final results with testing on a realistic dataset? Do you believe your model is doing
a good enough job at classifying numbers correctly?

Answer:

Step 3: Test a Model on Newly-Captured Images


Take several pictures of numbers that you find around you (at least five), and run them through your classifier
on your computer to produce example results. Alternatively (optionally), you can try using OpenCV /
SimpleCV / Pygame to capture live images from a webcam and run those through your classifier.

Implementation
Use the code cell (or multiple code cells, if necessary) to implement the third step of your project. Once you
have completed your implementation and are satisfied with the results, be sure to thoroughly answer the
questions that follow.

### Your code implementation goes here.
### Feel free to use as many code cells as needed.

Question 7
Choose five candidate images of numbers you took from around you and provide them in the report. Are
there any particular qualities of the image(s) that might make classification difficult?

Answer:


Question 8
Is your model able to perform equally well on captured pictures or a live camera stream when compared to
testing on the realistic dataset?

Answer:

Optional: Question 9
If necessary, provide documentation for how an interface was built for your model to load and classify newly acquired images.

Answer: Leave blank if you did not complete this part.

Step 4: Explore an Improvement for a Model


There are many things you can do once you have the basic classifier in place. One example would be to also localize where the numbers are in the image. The SVHN dataset provides bounding boxes that you can use to train a localizer: train a model to regress the bounding-box coordinates with a regression loss, and then test it.
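A minimal localization setup could regress the single box that encloses all digits. SVHN annotates each digit with `left`, `top`, `width`, and `height`; the sketch below merges those into one (x0, y0, x1, y1) box, normalizes it by the image size, and uses mean squared error as the regression loss. Intersection over union (IoU) is included as one common way to measure how well predicted boxes match the ground truth. The L2 loss on normalized corners and all function names are assumptions for illustration; other regression losses are possible.

```python
import numpy as np


def union_box(lefts, tops, widths, heights):
    """Smallest (x0, y0, x1, y1) box enclosing all per-digit SVHN boxes."""
    lefts = np.asarray(lefts, dtype=float)
    tops = np.asarray(tops, dtype=float)
    rights = lefts + np.asarray(widths, dtype=float)
    bottoms = tops + np.asarray(heights, dtype=float)
    return np.array([lefts.min(), tops.min(), rights.max(), bottoms.max()])


def normalize_box(box, img_w, img_h):
    """Scale pixel coordinates to [0, 1] so targets do not depend on image size."""
    return np.asarray(box, dtype=float) / np.array([img_w, img_h, img_w, img_h], dtype=float)


def mse_loss(pred, target):
    """Mean squared error between predicted and ground-truth box coordinates."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))


def area(box):
    """Area of an (x0, y0, x1, y1) box; zero if the box is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


box = union_box(lefts=[10, 25], tops=[5, 7], widths=[12, 11], heights=[20, 19])
print(box.tolist())                       # [10.0, 5.0, 36.0, 26.0]
target = normalize_box(box, img_w=100, img_h=50)
print(mse_loss(target, target))           # 0.0 for a perfect prediction
print(iou([0, 0, 2, 2], [1, 1, 3, 3]))    # 1/7, about 0.143
```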

Implementation
Use the code cell (or multiple code cells, if necessary) to implement the fourth step of your project. Once you
have completed your implementation and are satisfied with the results, be sure to thoroughly answer the
questions that follow.

### Your code implementation goes here.
### Feel free to use as many code cells as needed.

Question 10
How well does your model localize numbers on the testing set from the realistic dataset? Do your
classification results change at all with localization included?

Answer:

Question 11
Test the localization function on the images you captured in Step 3. Does the model accurately calculate a
bounding box for the numbers in the images you found? If you did not use a graphical interface, you may
need to investigate the bounding boxes by hand. Provide an example of the localization created on a
captured image.

Answer:


Optional Step 5: Build an Application or Program for a Model


Take your project one step further. If you're interested, consider building an Android application, or a more robust Python program, that can take input images and display the classified numbers and even the bounding boxes. You could, for example, build an augmented reality app that overlays your answer on the image, as the Word Lens app does.
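For an overlay in the spirit of the app described above, the predicted bounding box can be drawn directly onto the camera frame. The NumPy sketch below draws a rectangle outline on a copy of an RGB image; the function name `draw_box` and its defaults are illustrative. Rendering the predicted digits as text would additionally need a drawing library, such as PIL's `ImageDraw`.

```python
import numpy as np


def draw_box(image, box, color=(0, 255, 0), thickness=2):
    """Return a copy of an (H, W, 3) uint8 image with a rectangle outline drawn on it.

    box: (x0, y0, x1, y1) in pixels, e.g. a localizer prediction scaled back
         from [0, 1] coordinates by the image width and height
    """
    out = image.copy()
    h, w = out.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    x0, y0 = max(0, x0), max(0, y0)     # clip the box to the image
    x1, y1 = min(w, x1), min(h, y1)
    t = thickness
    out[y0:y0 + t, x0:x1] = color                 # top edge
    out[max(y0, y1 - t):y1, x0:x1] = color        # bottom edge
    out[y0:y1, x0:x0 + t] = color                 # left edge
    out[y0:y1, max(x0, x1 - t):x1] = color        # right edge
    return out


frame = np.zeros((50, 80, 3), dtype=np.uint8)
overlay = draw_box(frame, (10, 5, 40, 30))
print(overlay[5, 20].tolist(), overlay[15, 20].tolist())  # [0, 255, 0] [0, 0, 0]
```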

Loading a TensorFlow model into a camera app on Android is demonstrated in the TensorFlow Android demo
app, which you can simply modify.

If you decide to explore this optional route, be sure to document your interface and implementation, along
with significant results you find. You can see the additional rubric items that you could be evaluated on
by following this link.

Optional Implementation
Use the code cell (or multiple code cells, if necessary) to implement this optional step of your project. Once you
have completed your implementation and are satisfied with the results, be sure to thoroughly answer the
questions that follow.

### Your optional code implementation goes here.
### Feel free to use as many code cells as needed.

Documentation
Provide additional documentation sufficient for detailing the implementation of the Android application or
Python program for visualizing the classification of numbers in images. It should be clear how the program or
application works. Demonstrations should be provided.

Write your documentation here.

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the IPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.
