
AIM 216-R1

Intro to Amazon SageMaker Debugger: Get insights into ML model training

Satadal Bhattacharjee
Principal Product Manager, Amazon AI
Amazon Web Services

Jayant Thomas
Sr. Director, AI Engineering

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
• Review Amazon SageMaker
• Machine Learning Training Challenges
• Introducing Amazon SageMaker Debugger
• Sample Notebooks
• Customer Story
• Q&A
Our mission at AWS
Put machine learning in the hands of every developer
Build, Train, Deploy Machine Learning Models Quickly at Scale
Amazon SageMaker Studio IDE (NEW!)
• Ground Truth
• ML Algorithms & Marketplace
• Quick-start notebooks
• Experiments (NEW!)
• Training & Tuning
• Debugger (NEW!)
• Autopilot (NEW!)
• Reinforcement Learning
• Deployment & Hosting
• Monitoring (NEW!)
• Neo
Frameworks
Amazon SageMaker
Amazon SageMaker: Fully Managed Training
Amazon-Provided Algorithms
• XGBoost - Gradient Boosted Trees
• Matrix Factorization
• Regression
• Principal Component Analysis
• K-Means Clustering
• And More!
Bring Your Own Script (Amazon SageMaker builds the Container)
Bring Your Own Algorithm (You build the Container)

Fully Managed, Distributed, Auto-Scaled, Secured
But how do I gain insights into my training?
Challenges with Machine Learning Training
Debugging machine learning training is painful:
Large neural networks with many layers
+ Many connections
+ Computationally intensive
= Extraordinarily difficult to inspect, debug, and profile the ‘black box’
Challenges with Machine Learning Training
Debugging machine learning training is painful:
Manually print debug data
+ Manually analyze the debug data
+ Use open source tools for charting
= Valuable data scientist/ML practitioner time wasted
There’s got to be a better way!
And there is…
Introducing Amazon SageMaker Debugger
Training data analysis, debugging, & alert generation
• Relevant data capture: data is automatically captured for analysis
• Automatic data analysis: debug data with no code changes
• Automatic error detection: errors are automatically detected and alerts are sent
• Faster training: analyze and debug across distributed clusters
• Amazon SageMaker Studio integration: analyze & debug from Amazon SageMaker Studio
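"Relevant data capture" means the training job records tensors on a schedule, without hand-written print statements. The toy sketch below illustrates only that idea; `ToyHook` is a hypothetical class, not the smdebug hook API, and the `save_interval=4, start_step=2` schedule is an assumption chosen to reproduce the step pattern shown in the Tensor Analysis listing.

```python
import numpy as np


class ToyHook:
    """Hypothetical capture hook (not the smdebug API).

    Stores a copy of each named tensor at steps start_step, start_step + save_interval, ...
    """

    def __init__(self, save_interval=4, start_step=2):
        self.save_interval = save_interval
        self.start_step = start_step
        self.saved = {}  # tensor name -> {step: numpy array}

    def should_save(self, step):
        """True if `step` falls on the capture schedule."""
        return step >= self.start_step and (step - self.start_step) % self.save_interval == 0

    def record(self, step, tensors):
        """Copy tensors into storage when this step is scheduled for capture."""
        if not self.should_save(step):
            return
        for name, value in tensors.items():
            # Copy, because training usually updates weights in place
            self.saved.setdefault(name, {})[step] = np.array(value, copy=True)


# Usage: a simulated 40-step training loop with one hypothetical weight tensor
hook = ToyHook()
w = np.linspace(-1.0, 1.0, 8)
for step in range(40):
    w *= 0.99  # stand-in for an in-place optimizer update
    hook.record(step, {"dense/kernel:0": w})

print(sorted(hook.saved["dense/kernel:0"]))  # prints [2, 6, 10, 14, 18, 22, 26, 30, 34, 38]
```

Copying each array matters here: without it, every saved step would point at the same buffer and show only the final weights.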
How does Amazon SageMaker Debugger work?
Diagram: within Amazon SageMaker, "Training in progress" and "Analysis in progress", connected through the customer's S3 bucket and Amazon CloudWatch Events, leading to three actions:
• Action: Stop the training
• Action: Analyze using the Debugger SDK in an Amazon SageMaker Notebook
• Action: Visualize tensors using charts (Amazon SageMaker Studio visualization)

• No code change is necessary to emit debug data with built-in algorithms and custom training scripts
• Analysis occurs in real time as data is emitted, making real-time alerts possible
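The loop in the diagram (training emits data, analysis checks it while training is still running, and an action such as stopping the job can follow) can be sketched in plain Python. Everything here is hypothetical: `exploding_tensor` is a toy check for NaN, infinite, or very large values with an assumed threshold, and unlike this single-process sketch, the analysis in the diagram runs alongside training and reads the data from S3.

```python
import math


def exploding_tensor(values, threshold=1e6):
    """Toy check: True if any value is NaN/inf or its magnitude exceeds `threshold`."""
    return any(not math.isfinite(v) or abs(v) > threshold for v in values)


def run_with_rule(emitted, rule):
    """Consume (step, values) pairs as training emits them and apply `rule` to each.

    Returns ("issues_found", step) at the first step where the rule fires; this is
    where an action such as "stop the training" would be triggered. Otherwise
    returns ("no_issues", last_step).
    """
    last_step = None
    for step, values in emitted:
        last_step = step
        if rule(values):
            return "issues_found", step
    return "no_issues", last_step


def fake_training(n_steps=10, blow_up_at=6):
    """Hypothetical training run whose weight overflows to infinity at `blow_up_at`."""
    w = 1.0
    for step in range(n_steps):
        w = math.inf if step >= blow_up_at else w * 2
        yield step, [w, -w]


print(run_with_rule(fake_training(), exploding_tensor))  # prints ('issues_found', 6)
```

Because `fake_training` is a generator, returning early also means no further steps are produced, which mimics stopping the job.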
Amazon SageMaker Studio – Real-time Built-in Rules
Amazon SageMaker Studio – Real-time Alerts
Amazon SageMaker Studio – Compare Loss Curves
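One kind of check behind built-in rules and alerts is whether the training loss has stopped decreasing (Debugger ships a built-in rule for this). The sketch below is a toy, pure-Python version of that idea, not the built-in rule's algorithm; `window` and `min_rel_improvement` are hypothetical parameters, and the two loss curves are made-up data.

```python
def loss_not_decreasing(losses, window=3, min_rel_improvement=0.01):
    """Toy loss-plateau check (window size and tolerance are assumptions).

    Compares the mean of each `window` consecutive losses with the mean of the
    preceding `window`. Returns the index of the first loss at which the newer
    mean failed to improve by at least `min_rel_improvement` (relative), or None.
    """
    for end in range(2 * window, len(losses) + 1):
        previous = sum(losses[end - 2 * window:end - window]) / window
        current = sum(losses[end - window:end]) / window
        if current > previous * (1 - min_rel_improvement):
            return end - 1
    return None


# Usage: two hypothetical loss curves, one still improving and one that plateaus
improving = [2.0, 1.6, 1.3, 1.1, 0.9, 0.75, 0.6, 0.5, 0.4, 0.3]
plateau = [2.0, 1.5, 1.2, 1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
print(loss_not_decreasing(improving))  # prints None
print(loss_not_decreasing(plateau))    # prints 9
```

Running the same function on the loss curves of two training jobs is one simple way to compare when each run stalls.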
Custom Analysis
• Fetch specific tensors as NumPy arrays
• Slice and dice the data however you like
• Plot your own graphs to dig deeper into the neural network
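Once tensors are available as NumPy arrays (the Tensor Analysis example uses `tr.tensor(name).value(step)` for this), ordinary NumPy slicing applies. In the sketch below, `kernel_by_step` is a hypothetical dictionary of random kernels standing in for real Debugger output; the shapes and steps are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in for arrays fetched step by step from a trial:
# a 3x3 conv kernel with 2 input channels and 4 output filters, saved at four steps.
rng = np.random.default_rng(42)
kernel_by_step = {
    step: rng.normal(scale=0.05, size=(3, 3, 2, 4)) * (1 + step / 10)
    for step in (2, 6, 10, 14)
}

steps = sorted(kernel_by_step)
stacked = np.stack([kernel_by_step[s] for s in steps])  # shape (4, 3, 3, 2, 4): step axis first

# Slice: follow a single filter (last axis) across all saved steps
filter0 = stacked[..., 0]  # shape (4, 3, 3, 2)

# Dice: L2 norm of every filter at every step -> shape (4 steps, 4 filters)
norms = np.linalg.norm(stacked.reshape(len(steps), -1, 4), axis=1)

for step, row in zip(steps, norms):
    print(step, np.round(row, 3))
```

To plot, pass `steps` and one column of `norms` to a plotting library such as matplotlib.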
Custom Analysis – Image Classification Model
• Track distributions of gradients, activations, and weights
• Identify training issues in real time
[Figure: Vanishing gradients]
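Tracking gradient distributions per layer is what makes vanishing gradients visible: magnitudes shrink toward zero as they flow back from the output. The sketch below is a toy version with hypothetical names (`gradient_summary`, `vanishing_layers`, `block{i}`) and an assumed threshold of `1e-7`; the gradients are synthetic, shrinking 10x per layer.

```python
import numpy as np


def gradient_summary(grad):
    """Distribution summary of |gradient| for one tensor: mean and 5th/50th/95th percentiles."""
    a = np.abs(np.asarray(grad, dtype=float)).ravel()
    p5, p50, p95 = np.percentile(a, [5, 50, 95])
    return {"mean_abs": float(a.mean()), "p5": float(p5), "p50": float(p50), "p95": float(p95)}


def vanishing_layers(grads_by_layer, threshold=1e-7):
    """Toy check: names of layers whose mean |gradient| falls below `threshold` (assumed value)."""
    return [name for name, g in grads_by_layer.items()
            if gradient_summary(g)["mean_abs"] < threshold]


# Usage: hypothetical gradients that shrink 10x per layer moving back from the output
base = np.linspace(-1.0, 1.0, 101)  # mean |value| is 51/101, about 0.505
grads = {f"block{i}": base * 10.0 ** (-i) for i in range(10)}  # block0 is nearest the output

print(vanishing_layers(grads))  # prints ['block7', 'block8', 'block9']
```

Logging `gradient_summary` for every layer at every saved step gives the per-layer distributions to chart over time.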
The Change Healthcare Intelligent Healthcare Network™
Accelerating Healthcare Transformation
Change Healthcare AI innovations are accelerating healthcare transformation by tackling cost and quality:
• More accurate decision FIRST
• Carried out by machines or optimal person
• Reducing rework loops
Model Development through Production
• Amazon SageMaker Debugger custom models
• Using a custom workflow
• End points
Model Development Process
[Diagram components: data sources (FTP into S3 from external sources, Amazon Redshift, Athena, AWS Glue); Amazon SageMaker Debugger notebooks; custom Amazon SageMaker Debugger models; custom scripts; Artifactory as Docker images; train & debug pipelines (Apache Airflow, Amazon SageMaker SDK workflow); EC2; Amazon SageMaker; hyperparameter tuning; trained model ready for deployment]
Deployment Architecture
[Diagram components: customer systems; internal consumers; customer onboarding; AWS WAF rules (registration/API rules); API Gateway (public); API Gateway (VPCE); Elastic Load Balancer; Lambda 'Score'; Amazon SageMaker end points; Kinesis Stream; Kinesis Data Firehose; S3; AWS Glue Catalog]
Existing Process Overview
Source: Internal Change Healthcare transactions processed

Existing Process Overview
Claim Forms

AI Augmented Process Overview
Claim Forms After Preprocessing
[Figure: Original Forms | Processed Forms]


AI Augmented Process: Next Steps
• Focused on improving accuracy
• Gain additional insights into the 10-layer CNN through model instrumentation
• Utilize Amazon SageMaker Debugger to optimize time spent by data scientists on modeling
Improved Model Training Using SageMaker Debugger
[Diagram components: data source (Amazon S3, 300K images (synthetic)); pre-processing; Amazon SageMaker TensorFlow model; hooks/rules; Amazon SageMaker Debugger; tensors; visualization; early stops; trained model ready for deployment; workflow: Apache Airflow, Amazon SageMaker SDK]
Amazon SageMaker Debugger Usage
• No instrumentation was required in existing code: no code changes were needed to instrument models for debugging purposes
• Created custom analysis of the tensors
• Enabled termination of model training in case of inconsistencies, using hooks
• E.g., break the training job when the emitted tensors showed blank images
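The slide does not show how the blank-image check was written. Below is a hypothetical, framework-free version of such a check: an image counts as blank when its values are (nearly) constant. The function names, the tolerance, and the synthetic batches are all assumptions; in a Debugger setup this logic would sit inside a custom rule or hook, whose API is not shown here.

```python
import numpy as np


def is_blank(image, tol=1e-6):
    """True if an image tensor is (nearly) constant, e.g. all zeros or all white."""
    img = np.asarray(image, dtype=np.float64)
    return float(img.max() - img.min()) <= tol


def first_blank_image(batches):
    """Scan (step, batch) pairs; return (step, index) of the first blank image, else None.

    Each batch has shape (N, H, W) or (N, H, W, C); every image is checked on its own.
    """
    for step, batch in batches:
        for index, image in enumerate(batch):
            if is_blank(image):
                return step, index
    return None


# Usage: hypothetical preprocessed batches in which step 3 contains one all-zero image
rng = np.random.default_rng(0)
batches = []
for step in range(5):
    batch = rng.random((4, 28, 28))
    if step == 3:
        batch[1] = 0.0  # simulate a form image that came out of preprocessing blank
    batches.append((step, batch))

print(first_blank_image(batches))  # prints (3, 1)
```

Returning the step and index, rather than just a boolean, makes it easier to trace the offending input back through preprocessing before the job is stopped.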
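Amazon SageMaker Debugger Implementation
```python
import sagemaker
from sagemaker.debugger import Rule, rule_configs
from sagemaker.tensorflow import TensorFlow

sess = sagemaker.Session()

sagemaker_simple_estimator = TensorFlow(
    role=sagemaker.get_execution_role(),
    # ... (remaining estimator arguments are elided on the slide)
    sagemaker_session=sess,
    rules=[
        # Built-in rule that flags tensors containing NaN or infinite values
        Rule.sagemaker(
            rule_configs.exploding_tensor(),
            rule_parameters={
                "tensor_regex": ...,  # value is cut off on the slide
            },
        ),
    ],
)
```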
Tensor Analysis
```python
>>> from smdebug.trials import create_trial
>>> tr = create_trial('/tmp/smdebug')
>>> print(tr.tensors())
['model/convolution_25/kernel:0', 'model/convolution_23/kernel:0',
 'model/convolution_28/bias:0', 'model/convolution_30/bias:0',
 'model/convolution_24/kernel:0', 'model/convolution_28/kernel:0',
 'model/convolution_30/kernel:0', 'model/convolution_26/kernel:0',
 'model/convolution_33/bias:0', ...]
>>> print(tr.tensor('model/convolution_25/kernel:0').steps())
[2, 6, 10, 14, 18, 22, 26, 30, 34, 38]
>>> print(tr.tensor('model/convolution_25/kernel:0').value(2))
array([ 0.02875063,  0.02498713, -0.02165263, -0.0030316 ,  0.03755578,
        0.03020793,  0.02638954, -0.02373646,  0.03169937,  0.03428428,
       -0.05006   ,  0.04924595,  0.0103638 ,  0.03005257, -0.01988248,
       -0.02163161,  0.01634527,  0.01877102,  0.02208846,  0.00245956,
       -0.01727855,  0.0081707 ,  0.01851869, ...
```
Improved Model Accuracy
• Detected inconsistencies in real time
• 56% reduction in human-processed claims

Source: Internal findings after initial deployments


Future Plans
We plan to use Amazon SageMaker Debugger as the default to train all the Change Healthcare models (40+ models).

Develop standard hooks for early stops and add custom rules to detect problems specific to our models.
Related breakouts
AIM213-R & R1 – Introducing Amazon SageMaker Model Monitor
AIM214-R1 – Introducing Amazon SageMaker Studio
AIM215-R & R1 – Introducing Amazon SageMaker Autopilot
AIM230-R & R1 – Introducing a new experience for Amazon SageMaker Notebooks
Thank you!
