
COMP24111 Lab 2 Handwritten Digit Recognition

Deadline: Start of lab, week beginning 25 Oct. Extension policy: NO EXTENSIONS (EVEN ON REQUEST).

You need to do 4 things:
1) Set up your path (and ensure it stays set up for every Matlab session).
2) Copy some datafiles.
3) Read the help text and experiment with the utility functions I have provided.
4) Then (and ONLY then) do the lab exercises.

SET UP YOUR PATH
Two directories need to be added to your Matlab path:
  /opt/info/courses/COMP24111/lab2
  /opt/info/courses/COMP24111/lab3
You have to do this *within* Matlab - either with the addpath function (type: help addpath), which modifies the path only for the current session, or permanently with the pathtool (type: help pathtool).

COPY SOME DATAFILES
Two files need to be copied to your working directory for this exercise:
  /opt/info/courses/COMP24111/lab2/usps_main.mat
  /opt/info/courses/COMP24111/lab2/usps_benchmark.mat
These are samples of US Postal Service handwritten digit data. The usps_main set contains 500 examples of each of the digits 0-9 (so 5,000 digits in total). The usps_benchmark set is a different 500 of each. The aim of the project is to correctly recognise the benchmark examples of digits 3, 6, and 8. You should use only data from the main set for training your KNN rule.

READ THE HELP TEXT AND EXPERIMENT WITH THESE UTILITY FUNCTIONS
If your path is correctly set up, you have access to the following utility functions:

  knearest        - implementation of k-nearest neighbour rule
  showdigit       - displays image for a single digit
  getonedigit     - retrieves pixels for a single digit
  showdata        - displays a group of digit images
  crossfold       - partitions the data
  shufflerows     - shuffles the rows
  extractfeatures - transforms the raw pixels

Use the help command to find out how they work. Be sure to know what arguments they take and what they return, so you don't bother the demonstrators with something you could solve by just typing: help showdigit

After that, you can use the various functions listed above. As a starter, try the following:

  load usps_main.mat
  d = getonedigit(3, 123, maindata);
  showdigit(d);

Notice that we passed the variable maindata as an argument; this is the variable that was loaded when you typed load usps_main.mat. A number 3 should be displayed on the screen. This is the 123rd example of a number 3. There are 500 examples in total of each of the digits 0-9. Try others for yourself; for example, this should show the 485th instance of a 9 digit:

  d = getonedigit(9, 485, maindata);
  showdigit(d);

Since Matlab indexes arrays from 1 (I hope you all realised that by now), to get examples of the digit 0 you use array index 10, as follows:

  showdigit( getonedigit(10, 312, maindata) )

Notice that we bypassed the need to create the variable d here, and just passed the output of getonedigit directly into the showdigit function. The other utility functions also have compatible arguments/return values. Take note: all the getonedigit function is doing is pulling out columns of the variable maindata. You could do this equally well yourself, and probably write something to make it more flexible. Look at the code by typing: edit getonedigit

Now, the lab exercises...

PART 1 Training and Testing


1. Write a script that constructs a training dataset of just 3 and 8 digits. To start, pick 100 of each randomly. Your matrix should end up 2D: 200 rows by 256 columns. Remember to keep the true label for each digit in another array, called labels (or whatever you want). HINT: you may need the reshape command, or to look at the source for getonedigit.m.

2. Use the KNN rule to classify each of the digits in your training set, and report the accuracy. Plot a graph to display the accuracy as you vary K. What does k=1 mean? What does it mean as k gets bigger?

3. Break your training set randomly into 2 parts, say the first half and second half of the examples. One part you will use for training, and one part for testing. Plot a testing accuracy graph, again varying K.

4. Repeat the random split and reclassify. Do you get the same behaviour? Plot an average, and standard deviation as error bars. Remember, all graphs should have axis labels and a title. If you don't know what Matlab commands to use, try Googling to find out.

You can visualise your classifications using the showdata function provided. When you are comfortable with all this, extend the above to predict digits 3, 6, and 8.
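The steps above can be sketched roughly as follows. This is only an outline: it assumes getonedigit returns one digit as a 16x16 pixel image (hence the reshape to 256 values), and it guesses at the argument order of knearest - check help knearest and help getonedigit for the real signatures before copying anything.

```matlab
% Build a 200x256 training matrix of 3s and 8s (100 random examples each).
load usps_main.mat                       % creates the variable maindata
idx3 = randperm(500);  idx8 = randperm(500);
data   = zeros(200, 256);
labels = [ones(100,1)*3; ones(100,1)*8];
for i = 1:100
    data(i,:)     = reshape(getonedigit(3, idx3(i), maindata), 1, 256);
    data(100+i,:) = reshape(getonedigit(8, idx8(i), maindata), 1, 256);
end

% Shuffle, then split half/half into a training and a testing part.
order = randperm(200);
data  = data(order,:);    labels   = labels(order);
train = data(1:100,:);    trainlab = labels(1:100);
test  = data(101:200,:);  testlab  = labels(101:200);

% Testing accuracy over a range of k (knearest signature is assumed here).
ks = 1:2:15;  acc = zeros(size(ks));
for j = 1:numel(ks)
    pred   = knearest(ks(j), train, trainlab, test);
    acc(j) = mean(pred == testlab) * 100;
end
plot(ks, acc, '-o');
xlabel('k'); ylabel('Testing accuracy (%)'); title('KNN accuracy vs k');
```

Repeating the shuffle-and-split several times and collecting the acc vectors gives you the mean and standard deviation needed for the error-bar plot (see help errorbar).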

PART 2 Feature Extractors for Digit Recognition


From the /opt/info/courses/COMP24111/lab2 directory, take a copy of the function extractfeatures, and rename it extractmyfeatures. Remember, Matlab is similar to Java in that the name of the function has to be the same as the name of the file containing it. The extractfeatures function transforms the 256 pixels of any image into a lower-dimensional representation, called a set of features. The features are meant to summarise the pixel information, but in fewer dimensions. The current function sums up the pixel values in each column, giving a feature set of size 16, since the images are all 16x16. Another might be to use all the odd-numbered pixels, giving a feature set of size 128. Each of these is likely to give me different accuracy when I try to classify the digits.
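As a sketch, the two extractors described above might look like this inside your extractmyfeatures.m - this assumes the function receives the 256 pixel values of a single image, which may differ from the real extractfeatures interface, so compare against the copied file:

```matlab
function features = extractmyfeatures(pixels)
% Column sums of the 16x16 image: 256 pixels down to 16 features.
img = reshape(pixels, 16, 16);
features = sum(img, 1);            % 1x16 vector of column sums

% Alternative idea from the handout: keep every odd-numbered pixel,
% giving 128 features instead of 16:
%   features = pixels(1:2:end);
end
```

Your own extractors replace the body of this function; the rest of your scripts then run unchanged on the reduced feature vectors.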

Use your imagination to design new feature extractors which preserve the information in the images but reduce the number of dimensions necessary. Is there an observable trade-off between how many features you use and the accuracy you get? It is clear that using fewer features for the KNN rule would be better in both time and memory requirements. You should evaluate a score for yourself as:

  score = (accuracy on benchmark set) / (number of features used)

For example, if I got 90% accuracy using the 256 original pixel values as features, my score is:

  score = 90 / 256 = 0.35

Obviously, if I used only 16 features and still got 90% accuracy, I'd have a higher score, about 5.6. Just for a bit of fun, we will collate scores (remember to write yours on your report below) and I will announce in a later lecture what techniques proved most effective for feature extraction.

Remember, you should only evaluate your accuracy on the 3, 6, and 8 digits within the benchmark set. This means the total number of examples in your benchmark set should be 1500: five hundred 3s, five hundred 6s and five hundred 8s. The other digits should be discarded, or you can try to classify them just for fun.

Remember, you can use the function showdata to visualise your classifications. White boxes are placed around digits that you have predicted incorrectly. What do you notice? Are you always misclassifying examples of the 8 digits? Or the 3 digits? The simple accuracy measure we adopted does not tell us which digits we commonly mis-predict. This information might obviously be useful to know; we wouldn't want to use a KNN rule that can never recognise the number 8. A far more useful evaluation measure is to look at the confusion matrix: http://en.wikipedia.org/wiki/Confusion_matrix

Try building a confusion matrix for your classifications on the benchmark set.
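One minimal way to build that confusion matrix, assuming you have two vectors truelab and predlab holding the true and predicted labels for the benchmark digits (these variable names are illustrative, not part of the provided utilities):

```matlab
classes = [3 6 8];
C = zeros(3, 3);            % rows: true class, columns: predicted class
for i = 1:3
    for j = 1:3
        C(i,j) = sum(truelab == classes(i) & predlab == classes(j));
    end
end
disp(C)                     % diagonal: correct; off-diagonal: confusions
```

With all 1500 benchmark digits, each row of C sums to 500, and entry C(3,1), for instance, counts how many true 8s were predicted as 3s.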

PART 3 Something a bit more challenging Perceptrons


Implement a Perceptron, as described in the lecture. Remember that a Perceptron can only do two-class problems, so you should reduce the dataset to just recognising digits 6 vs. 8, or 3 vs. 8, or 3 vs. 6. Compare the performance to a 2-class KNN, in terms of both time and space complexity. How long does training take? How much memory is necessary? What testing accuracy do you achieve?
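A minimal sketch of the basic perceptron update rule, assuming train is an N x 256 matrix of two-class examples and t holds targets of +1 (say, the 6s) and -1 (the 8s); your lecture version may differ in details such as the learning rate or stopping rule:

```matlab
[N, D] = size(train);
w = zeros(D, 1);  b = 0;  eta = 0.1;      % weights, bias, learning rate
for epoch = 1:100
    errors = 0;
    for n = 1:N
        y = sign(train(n,:) * w + b);
        if y ~= t(n)                      % misclassified: move the boundary
            w = w + eta * t(n) * train(n,:)';
            b = b + eta * t(n);
            errors = errors + 1;
        end
    end
    if errors == 0, break; end            % converged (if linearly separable)
end
pred = sign(test * w + b);                % classify the test set
```

Note how little the trained classifier stores (one 256-element weight vector plus a bias) compared with KNN, which must keep the whole training set in memory at test time - a good starting point for your space-complexity comparison.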

PART 4 Bonus marks


Additional bonus marks are available for truly exceptional students: those able to show observations and knowledge that were not explicitly supplied in lectures. To restate: to obtain marks in this category you should have fully completed parts 1-3 and then shown evidence of additional learning beyond the supplied lecture notes. Examples of things you could do: learn about other forms of evaluation than just a confusion matrix, other feature extraction routines, or other classifiers than just the ones we learnt in the lectures.

DELIVERABLES
By the deadline, you should submit two files, using the submit command as you did last year:

  lab2.zip - all your .m files, zipped up
  lab2.pdf - your report, as detailed below

A report should be submitted, on one side of A4, as a PDF file. Anything beyond 1 side will be ignored. You should give a full description of the feature extraction you designed for Part 2, and any graphs that you think represent your achievements in Part 2 (and Part 3 if you did it). Take note, we are not interested in the details of your code, what functions are called, what they return, etc. This module is about algorithms, and is indifferent to how you program them. There is no specific format; marks will be allocated roughly on the basis of: rigorous experimentation, knowledge displayed when talking to the demonstrator, imagination in Part 2 above, how informative / well presented your graphs are, grammar, and ease of reading.

The lab is marked out of 15:

  Part 1 Training and Testing   5 marks
  Part 2 Feature Extractors     5 marks
  Part 3 Perceptrons            2 marks
  Part 4 Bonus                  3 marks

Remember, a mark of 12/15 constitutes a first class mark, 75%.
