
An Introduction to Machine Learning in AWS
Emily Robinson
bit.ly/awsdogs
This talk will cover:
• The why and what of “the cloud” and Amazon Web Services (AWS)
• Connecting to AWS from Python
• Helpful AWS terms/libraries
• Using those libraries to:
  • Upload images to Amazon’s storage system, S3
  • Train an image classification model
  • Evaluate the model’s performance
• How to learn more!

This talk won’t cover:
• ML using any other cloud technology
• The math/theory behind the model
• Any other algorithms
• How to make a good model
Background
AWS and my learning journey
What is the Cloud?
What is AWS?

*https://aws.amazon.com/products/
What is AWS SageMaker?

*https://aws.amazon.com/sagemaker/
Why the Cloud?
• Get access to a super computer
• Unless you want to buy* a $5,500 NVIDIA GPU, you probably can’t train a
128-layer neural network on millions of images on your laptop
• Can start with their model “blueprint”
• Full pipeline – store your data, train your model, create an endpoint
for real-time inference, monitor performance

*Or win one in #Sliced like my brother Dave


Why AWS?
• It’s what my former company Warby Parker used
• But seriously, see if your company already uses a cloud technology
• Otherwise check out Google Cloud, Azure (Microsoft), IBM Cloud,
Saturn Cloud – lots of options out there. A lot of them have a free tier
or free credits
AWS SageMaker Pricing 101
• You are billed by the hour, with the price determined by the machine
you’re using (more powerful = more money)
• https://aws.amazon.com/sagemaker/pricing/ lists prices for everything
• Can be as cheap as $0.05 per hour, up to $28.15 per hour
• Some things like training jobs automatically stop when they’re done.
You can also shut things down when you’re not using them, though it
can take a few minutes to restart
• You need to submit a support ticket to use the more expensive
machines
• Can see your bill at https://console.aws.amazon.com/billing and set
budget alerts
How I got started with ML in AWS
Things I didn’t know before this project
• Image classification
• “The cloud”
• Deep learning
• AWS SageMaker SDK
• What SageMaker was
• What an SDK was
• Whether I’d accidentally rack up a $30,000 bill and get fired
Today’s image classification
problem
aka the hardest ML problem of our time
Is a dog an 11 or 12 ... or a 13 or 14 out of 10?
Courtesy of WeRateDogs (@dog_rates)
Dataset
• Using rtweet, download.file, and some regex, got images and ratings
from 356 @dog_rates tweets
• 235 images were rated 11/10 or 12/10; 121 were rated 13/10, 14/10, or the
very rare 15/10
• Rescaled all the images to be the same size with magick
• Divided randomly into 70/20/10 training/validation/holdout folders
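
The original data prep was done in R (rtweet, magick). As a rough Python stand-in for the last bullet only, here is a minimal sketch of a random 70/20/10 split. The filenames, the seed, and the function name are made up for illustration; the real code is in the repo.

```python
import random

def split_images(filenames, seed=42):
    """Randomly assign filenames to training/validation/holdout (70/20/10)."""
    shuffled = sorted(filenames)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * 0.7)
    n_valid = int(len(shuffled) * 0.2)
    return {
        "training": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_valid],
        "holdout": shuffled[n_train + n_valid:],   # whatever is left over
    }

# Hypothetical filenames standing in for the 356 downloaded images
splits = split_images([f"dog_{i:03d}.jpg" for i in range(356)])
print({name: len(files) for name, files in splits.items()})
```

With 356 images this gives 249 training, 71 validation, and 36 holdout images.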

Code is available at github.com/robinsones/weratedogs


Spoiler Alert: How’d the model end up doing?

This is undoubtedly the worst image classification that’s ever run

But it did run...


Setup
Disclaimer

There will be a lot of code.

That code will be totally new to you.

Don’t try to memorize the code. Focus on the steps we need to do, and come
back to the exact code later at github.com/robinsones/weratedogs.
Create an AWS account

https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/
Create your keys
Save your credentials to ~/.aws/credentials
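
The credentials file is a small INI-style text file with a [default] section holding the two keys. A minimal sketch that only builds the file text is below. The placeholder values and the function name are assumptions for illustration; paste in your own keys and never commit them to git.

```python
import configparser
import io

def credentials_file_text(access_key_id, secret_access_key, profile="default"):
    """Return text in the layout AWS tools expect in ~/.aws/credentials."""
    config = configparser.ConfigParser()
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()

# Placeholder values only; replace with the keys you created
print(credentials_file_text("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"))
```

Once the file is in place, boto3 and the SageMaker SDK pick the keys up automatically, so they never need to appear in your scripts.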
Install and import the python packages
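
The exact package list from the slide isn't reproduced here. Assuming the two main ones are boto3 (the general AWS library) and sagemaker (the SageMaker SDK), a small sketch to check they are installed before importing them:

```python
import importlib.util

# Assumed package list; install with: pip install boto3 sagemaker
REQUIRED_PACKAGES = ["boto3", "sagemaker"]

def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Install these first:", ", ".join(missing))
else:
    import boto3
    import sagemaker
    print("boto3 and sagemaker are ready to use")
```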
Create the S3 Client and bucket
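
A minimal sketch of this step with boto3, not necessarily the code from the slide. The bucket name and region are placeholders (bucket names must be globally unique). One quirk worth knowing: us-east-1 is the default location, so the LocationConstraint setting is only passed for other regions.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the arguments for create_bucket; us-east-1 takes no LocationConstraint."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

def create_s3_client_and_bucket(bucket_name, region="us-east-1"):
    """Create an S3 client and a new bucket, returning the client."""
    import boto3  # imported here so the helper above works without boto3 installed
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
    return s3

# Example call (hypothetical bucket name):
# s3 = create_s3_client_and_bucket("my-weratedogs-bucket")
```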
Upload images
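
A sketch of uploading a local folder of images, assuming the training/validation/holdout folders sit under one local root and all images are .jpg files. The "images" prefix and the function names are illustrative. The S3 client is passed in, so any boto3 S3 client works.

```python
from pathlib import Path

def s3_key_for(local_path, local_root, prefix="images"):
    """Turn a local file path into an S3 key that keeps the folder structure."""
    relative = Path(local_path).relative_to(local_root)
    return f"{prefix}/{relative.as_posix()}"

def upload_folder(s3_client, local_root, bucket_name, prefix="images"):
    """Upload every .jpg under local_root and return the S3 keys written."""
    uploaded = []
    for path in sorted(Path(local_root).rglob("*.jpg")):
        key = s3_key_for(path, local_root, prefix)
        s3_client.upload_file(str(path), bucket_name, key)  # Filename, Bucket, Key
        uploaded.append(key)
    return uploaded

# Example call (hypothetical names):
# upload_folder(s3, "data", "my-weratedogs-bucket")
```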
(Optional) View files in S3

https://console.aws.amazon.com/s3
Create tables of image info
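
For image files, SageMaker's built-in image classification algorithm reads a tab-separated ".lst" table with one row per image: a row index, the class label, and the image's relative path. A minimal sketch, assuming class 0 means 11/10 or 12/10 and class 1 means 13/10 and up (that label coding is an assumption for illustration):

```python
def lst_lines(records):
    """Build .lst rows from (relative_path, class_index) pairs."""
    return [
        f"{index}\t{label}\t{path}"
        for index, (path, label) in enumerate(records)
    ]

def write_lst(records, filename):
    """Write the rows to a .lst file, one image per line."""
    with open(filename, "w") as handle:
        handle.write("\n".join(lst_lines(records)) + "\n")

# Hypothetical records: (image path relative to the channel folder, class)
example = [("dog_001.jpg", 0), ("dog_002.jpg", 1)]
print("\n".join(lst_lines(example)))
```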
Upload tables
SageMaker Code
SageMaker has a high-level and a low-level API
In general AWS has many ways to do something
Create IAM SageMaker Role

https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html
Get Role ARN
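
One way to look up the ARN is the IAM client's get_role call. A minimal sketch; the role name in the example is a placeholder for whatever you named the role in the previous step.

```python
def get_role_arn(iam_client, role_name):
    """Return the ARN of an existing IAM role."""
    return iam_client.get_role(RoleName=role_name)["Role"]["Arn"]

# Example call (hypothetical role name):
# import boto3
# role_arn = get_role_arn(boto3.client("iam"), "MySageMakerRole")
```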
Create Estimator
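
A sketch of this step using the high-level SageMaker Python SDK (version 2 naming assumed). The instance type and output path are assumptions, not the talk's settings. Training this algorithm needs a GPU instance, which is where the per-hour price comes in.

```python
def estimator_settings(role_arn, bucket_name, instance_type="ml.p2.xlarge"):
    """Collect the Estimator arguments in one place (values are assumptions)."""
    return {
        "role": role_arn,
        "instance_count": 1,
        "instance_type": instance_type,
        "output_path": f"s3://{bucket_name}/output",
    }

def create_estimator(role_arn, bucket_name, region="us-east-1"):
    """Create an Estimator for the built-in image classification algorithm."""
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    # Look up the container image for the built-in algorithm in this region
    image_uri = image_uris.retrieve(framework="image-classification", region=region)
    return Estimator(
        image_uri=image_uri,
        sagemaker_session=sagemaker.Session(),
        **estimator_settings(role_arn, bucket_name),
    )
```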
Set Hyperparameters
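
The hyperparameter names below are ones the built-in image classification algorithm accepts, but every value is an illustrative guess rather than the talk's setting. The image shape assumes 224x224 color images, and 249 is the size of a 70% training split of 356 images.

```python
def image_classification_hyperparameters(num_training_samples, num_classes=2):
    """Return example hyperparameters; tune these for a real project."""
    return {
        "num_layers": 18,                  # depth of the ResNet
        "image_shape": "3,224,224",        # channels,height,width
        "num_classes": num_classes,        # 11-12/10 versus 13+/10
        "num_training_samples": num_training_samples,
        "mini_batch_size": 32,
        "epochs": 10,
        "learning_rate": 0.001,
        "use_pretrained_model": 1,         # start from pretrained weights
    }

hyperparameters = image_classification_hyperparameters(num_training_samples=249)
# With an Estimator in hand:
# estimator.set_hyperparameters(**hyperparameters)
```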
Create Data Channels, Format & Location of Your Data
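
With image files plus .lst tables, the algorithm expects four channels: train, validation, train_lst, and validation_lst. A sketch assuming the S3 layout used for uploads was an "images" prefix with training/ and validation/ folders and the tables next to them (that layout is an assumption):

```python
def channel_uris(bucket_name, prefix="images"):
    """Map each channel name to the S3 location of its data."""
    base = f"s3://{bucket_name}/{prefix}"
    return {
        "train": f"{base}/training/",
        "validation": f"{base}/validation/",
        "train_lst": f"{base}/training.lst",
        "validation_lst": f"{base}/validation.lst",
    }

def create_channels(bucket_name, prefix="images"):
    """Wrap each location in a TrainingInput describing its format."""
    from sagemaker.inputs import TrainingInput

    return {
        name: TrainingInput(uri, content_type="application/x-image",
                            s3_data_type="S3Prefix")
        for name, uri in channel_uris(bucket_name, prefix).items()
    }
```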
Train Model
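
Training is a single call to fit on the estimator. A minimal sketch that also gives the job a timestamped name so it is easy to find in the console; the "weratedogs" base name and both function names are just for illustration.

```python
from datetime import datetime, timezone

def make_job_name(base="weratedogs", now=None):
    """Build a unique job name from letters, digits, and hyphens."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"

def train(estimator, channels, job_name=None):
    """Start the training job and wait for it to finish."""
    job_name = job_name or make_job_name()
    estimator.fit(inputs=channels, job_name=job_name)
    return job_name
```

By default fit blocks until the job finishes and streams the logs, and the training instance shuts down on its own afterwards.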
(Optional) View Training Job

https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/jobs
View Training Job Logs
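
The full logs live in CloudWatch, but a quick status and metrics summary is available from the low-level SageMaker client's describe_training_job call. A minimal sketch (the function name and the job name in the example are hypothetical):

```python
def training_job_summary(sagemaker_client, job_name):
    """Return the status and final metrics of a training job."""
    job = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": job["TrainingJobStatus"],
        "secondary_status": job["SecondaryStatus"],
        # FinalMetricDataList may be absent while the job is still starting
        "metrics": {
            metric["MetricName"]: metric["Value"]
            for metric in job.get("FinalMetricDataList", [])
        },
    }

# Example call (hypothetical job name):
# import boto3
# training_job_summary(boto3.client("sagemaker"), "weratedogs-2021-08-01-12-30-00")
```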
Generate Predictions with Batch Transform Job
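
A sketch of a batch transform job with the high-level SDK, assuming a trained estimator and holdout images already in S3. The instance type and S3 locations are assumptions; inference can run on a cheaper CPU instance than training.

```python
def run_batch_transform(estimator, input_uri, output_uri,
                        instance_type="ml.m4.xlarge"):
    """Score every image under input_uri and write results to output_uri."""
    transformer = estimator.transformer(
        instance_count=1,
        instance_type=instance_type,
        output_path=output_uri,
    )
    transformer.transform(
        data=input_uri,
        data_type="S3Prefix",                 # treat input_uri as a folder
        content_type="application/x-image",
    )
    transformer.wait()                        # block until the job is done
    return transformer

# Example call (hypothetical locations):
# run_batch_transform(estimator,
#                     "s3://my-weratedogs-bucket/images/holdout/",
#                     "s3://my-weratedogs-bucket/predictions")
```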
Fetch Predictions
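
Batch transform writes one output file per input image (the input name plus ".out") under the output path. The sketch below assumes each file holds JSON like {"prediction": [p0, p1]}, with one probability per class, and picks the most likely class. Treat that format as an assumption to check against your own output files.

```python
import json

def parse_prediction(body):
    """Return (predicted_class, probability) from one output file's contents."""
    probabilities = json.loads(body)["prediction"]
    predicted_class = max(range(len(probabilities)),
                          key=probabilities.__getitem__)
    return predicted_class, probabilities[predicted_class]

def fetch_prediction(s3_client, bucket_name, key):
    """Download one .out file from S3 and parse it."""
    body = s3_client.get_object(Bucket=bucket_name, Key=key)["Body"].read()
    return parse_prediction(body)

# Example call (hypothetical bucket and key):
# fetch_prediction(s3, "my-weratedogs-bucket", "predictions/dog_001.jpg.out")
```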
Spoiler Alert: How’d the model end up doing?

This is undoubtedly the worst image classification that’s ever run

But it did run...


Model Results
Conclusion
Resources
• Python SageMaker SDK Docs: https://sagemaker.readthedocs.io/
• AWS Example Notebooks:
https://github.com/aws/amazon-sagemaker-examples
• SageMaker Developer Guide:
https://docs.aws.amazon.com/sagemaker/latest/dg
• Udacity SageMaker deployment notebooks:
https://github.com/udacity/sagemaker-deployment
• Deep Learning with Python, Second Edition by François Chollet
Top Takeaways
• It’s actually pretty hard (but not impossible!) to rack up a huge bill
• There are a lot of resources for AWS, and a lot of ways to do things in it
• There is a lot to learn, but you can get value by just doing one piece.
One step at a time.
Acknowledgments
• Jacqueline Nolis for the slide design
• David Robinson for code advice
• My former manager and teammates at Warby Parker
• All the good dogs (aka every dog)
Thank you!

bit.ly/awsdogs_py

hookedondata.org

datascicareer.com
