Creating ML Datasets
Infrastructure
© 2017 Google Inc. All rights reserved. Google and the Google logo are trademarks of Google Inc. All other
company and product names may be trademarks of the respective companies with which they are associated.
9:30 Intros and Google & ML (Rupert Whitehead, DevRel Ecosystem Lead UK & Ireland)
10:00 Intro to ML and Problem Framing (Barbara Fusinska, Strategic Cloud Engineer)
14:15 From traditional data processing to automated and serverless architectures (Dan Anghel)
16:45-17:00 Closing notes and Q&A (David Roldan, Google Cloud for Startups, Europe)
Barbara Fusinska, Strategic Cloud Engineer
Zack Akil, Developer Advocate, Cloud Machine Learning
Dan Anghel, Strategic Cloud Engineer
Hoi Lam, Developer Advocate
Omer Mahmood, Sales Engineer
CLASSICAL PROGRAMMING: RULES + DATA → ANSWERS
MACHINE LEARNING: ANSWERS + DATA → RULES
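The contrast between the two diagrams can be sketched in a few lines. This is a toy illustration only, assuming a spam task where a message is flagged by its number of exclamation marks; the function names, the data, and the threshold search are hypothetical and not taken from the talk.

```python
# Classical programming: a person writes the rule; rule + data -> answers.
def handwritten_rule(exclamation_count):
    return exclamation_count > 3  # threshold chosen by a person


# Machine learning: answers + data -> rule (here, a learned threshold).
def learn_threshold(counts, labels):
    """Pick the threshold that classifies the most labelled examples correctly."""
    best_threshold, best_correct = 0, -1
    for threshold in sorted(set(counts)):
        correct = sum((c > threshold) == y for c, y in zip(counts, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Hypothetical labelled data: exclamation counts and whether each message was spam.
counts = [0, 1, 2, 5, 7, 9]
labels = [False, False, False, True, True, True]
learned = learn_threshold(counts, labels)
```

The learned rule is then used exactly like the handwritten one (`count > learned`), but it came from the examples rather than from a person.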
20%
of all responses
sent on mobile
Extending features of products
● Classify pictures in Google Photos
● Spam detection in Gmail
Google Translate
Totally new products
Democratising AI with research
When applied properly, machine learning can
deliver a double-digit improvement to most
businesses.
Barbara Fusinska
Strategic Cloud Engineer,
Google
What is ML?
ML systems learn how to combine input to produce useful predictions on never-before-seen data
LOTS of Inputs → LOTS of Outputs
What Can ML Do?
REGRESSION: Predict numerical values. E.g. click-through-rate
CLASSIFICATION: Which of N labels? Cat, dog, horse, or bear
CLUSTERING: Most similar other examples. Most relevant documents (unsupervised)
GENERATION: Complex output. E.g. image caption, translation
Static models are easy to build, but they tend to go stale
Dynamic models are fresher, but require more engineering
The ML Surprise - Effort Allocation
[Figure: effort allocation (axis 0.25, 0.5, 0.75, 1), expectation vs. reality, for: Defining KPIs, Collecting data, Building infrastructure, Optimizing ML algorithm, Integration]
*Informally based on many conversations with new ML practitioners
The top 10 pitfalls organizations hit when first trying ML!
● You thought training your own ML algorithm would be faster than writing software
● You haven’t looked, but assume the data is ready for use
● You forgot to put & keep humans in the loop
● You launched a product whose initial value-prop was its ML algorithm
● You made a great end-to-end ML system that optimizes for the wrong thing
● You forgot to measure if your algorithm improves things in the real world
● You confused the ease and value-add of using someone else’s pre-trained ML algorithm with building your own
● You thought after research, production ML algorithms were trained only once
● You want to design your own in-house perception or NLP algorithm
Two ways to add ML to your apps
● Custom ML models
● Friendly machine learning: Natural Language API, Video Intelligence API
There are pre-trained machine learning services available on Google Cloud
● TensorFlow
● Machine Learning Engine
● Translation API
● Natural Language API
● ...
Learning Objectives
● Construct an ML problem
Success Stories in ML at Google
Google Photos
YouTube Recommendations
Traits of Good ML Problems
Problems vs Decisions
Good Predictions and Decisions
Prediction: Probability someone will click on a search result
Decision: If P(click) > 0.20, prefetch the web page
Prediction: What fraction of a video ad the user will watch
Decision: If low, don’t show the user the ad
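The step from a prediction to a decision can be sketched as a thin layer of product logic on top of the model output. This is a minimal sketch: the 0.20 threshold comes from the slide, while the function names and the `min_fraction` cut-off are hypothetical.

```python
PREFETCH_THRESHOLD = 0.20  # from the slide: prefetch if P(click) > 0.20


def should_prefetch(p_click):
    """Turn a predicted click probability into a product decision."""
    return p_click > PREFETCH_THRESHOLD


def should_show_ad(predicted_watch_fraction, min_fraction=0.25):
    """Show the ad only if the user is predicted to watch enough of it.

    min_fraction is a hypothetical cut-off; the slide only says "if low".
    """
    return predicted_watch_fraction >= min_fraction
```

Keeping the threshold outside the model means the product decision can be tuned without retraining.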
Properties of Good Output
Let’s Start with an Example
We want the ML model to predict how popular a video just uploaded will
be in the future
Quantify it. How will you know your system has succeeded or failed?
KRs: properly predict 95% of top videos 28 days after being uploaded
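One possible way to quantify this key result is to measure, 28 days after upload, what fraction of the videos that actually became top videos had been predicted as such. The sketch below assumes that reading; the function names and the video IDs are hypothetical.

```python
def top_video_hit_rate(predicted_top, actual_top):
    """Fraction of videos that really became top videos and were predicted as such."""
    actual = set(actual_top)
    if not actual:
        raise ValueError("no top videos to evaluate against")
    return len(actual & set(predicted_top)) / len(actual)


def key_result_met(predicted_top, actual_top, target=0.95):
    """Compare the hit rate against the 95% target from the slide."""
    return top_video_hit_rate(predicted_top, actual_top) >= target


# Hypothetical video IDs, evaluated 28 days after upload.
predicted = ["a", "b", "c", "x"]
actual = ["a", "b", "c", "d"]
rate = top_video_hit_rate(predicted, actual)
```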
Success and Failure Metrics - Example 2
Quantify it. How will you know your system has succeeded or failed?
KRs: achieve 35% less CPU cost for transcoding without sacrificing
user experience
More About Metrics
Lab: Framing a machine learning problem
Dan Anghel
Strategic Cloud Engineer,
Google
Using ML Output in Your Products
Square Footage → ML Model → PRICE
ML Model Output
Pseudocode
if model_output == VERY_POPULAR:
    transcode_best()
elif model_output == SOMEWHAT_POPULAR:
    transcode_good()
else:
    transcode_basic()
Alternative
Model Boundaries
A B C decision A B C
What’s the Difference?
Red and Blue have same distance along the line, but
Blue has wrong product behavior
Key Lesson
Heuristics
Example
Why Heuristics?
Identify Your Problem Type
Classification Problem Type
Example
Start Simple!
Biggest Gain in ML is First Launch
[Figure: gains by stage: Initial System, Good Heuristics, ML v1, v2, v3, v4]
Start Simple!
Lab: Your Output
Barbara Fusinska
Strategic Cloud Engineer,
Google
If ML is a rocket engine,
data is the fuel
Unstructured data
accounts for
90% of enterprise data*
*Source: IDC
Learning Objectives
● Representing features
Raw data to features
A good feature is one that represents raw data in a form
conducive for ML
A good feature should be relatable to the objective
● We need a reasonable
hypothesis for why feature
value matters
Quiz: Are these features related to the objective or not?
Feature values should be known at prediction time
Feature definitions shouldn’t change over time
city_id:“br/sao_paulo”
inferred_city_cluster_id:219
Quiz: Value knowable at prediction time or not?
Objective: Predict whether a credit-card transaction is fraudulent
Feature: Whether item is new at store (and cannot have been purchased before). Good feature?
Feature: Category of item being purchased. Good feature?
There should be enough examples of every feature value
Raw data are converted to numeric features in different ways
{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
},
[..., 1, 2.50, …, ]
[..., 0, 8.99, …, ]
[..., 0, 3.45, …, ]
…
In estimator API, this is a feature column
Numeric values can be used as-is
{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
},
[…, 2.50, …, 1.4, …]
INPUT_COLUMNS = [
    …,
    layers.real_valued_column('price'),
    …
]
Real-valued-column is a type of feature column
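Independently of any framework, using numeric values as-is amounts to copying them into the feature vector. A minimal sketch, assuming the record layout from the slide; the helper name `numeric_features` is hypothetical.

```python
record = {
    "transactionId": 42,
    "name": "Ice Cream",
    "price": 2.50,
    "tags": ["cold", "dessert"],
    "servedBy": {"employeeId": 72365, "waitTime": 1.4, "customerRating": 4},
    "storeLocation": {"latitude": 35.3, "longitude": -98.7},
}


def numeric_features(rec):
    """Copy values that are already meaningful numbers straight into the vector."""
    return [float(rec["price"]), float(rec["servedBy"]["waitTime"])]


vector = numeric_features(record)
```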
Overly specific attributes should be discarded
{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
},
Don’t train on IDs or other super-specific information
Categorical values could be one-hot encoded
{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": "72365",
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
},
employeeId:  8345  72365  87654  98723  23451
one-hot:        0      1      0      0      0
Sparse-column is a type of feature column
layers.sparse_column_with_keys('employeeId',
    keys=['8345', '72365', '87654', '98723', '23451']),
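One-hot encoding itself is simple enough to show without TensorFlow. A minimal sketch using the employee IDs from the slide; the helper name and the all-zeros policy for unknown values are assumptions.

```python
EMPLOYEE_KEYS = ["8345", "72365", "87654", "98723", "23451"]  # vocabulary from the slide


def one_hot(value, keys):
    """Return a list with a 1 in the position of `value` and 0 elsewhere.

    Values outside the vocabulary map to all zeros (an assumption; other
    policies, such as a dedicated "unknown" slot, are equally valid).
    """
    return [1 if key == value else 0 for key in keys]


encoded = one_hot("72365", EMPLOYEE_KEYS)
```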
Preprocess data to create a vocabulary of keys
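Building the vocabulary is a single pass over the training data. A minimal sketch in plain Python; the helper name and the sample values are hypothetical, and sorting is only there to make the key order reproducible.

```python
def build_vocabulary(values):
    """Collect the distinct values seen in the training data, in sorted order."""
    return sorted(set(values))


# Hypothetical employee IDs observed in the training set.
seen = ["72365", "8345", "72365", "98723"]
vocab = build_vocabulary(seen)
```

The same vocabulary has to be stored and reused at prediction time so that each key keeps its position.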
Options for encoding categorical data
These are all different ways to create a sparse column
If you know the keys beforehand:
layers.sparse_column_with_keys('employeeId',
    keys=['8345', '72365', '87654', '98723', '23451']),
If the values are already integers in [0, N):
layers.sparse_column_with_integerized_feature('employeeId', 5)
If you do not have a vocabulary of all possible values:
layers.sparse_column_with_hash_bucket('employeeId', 500)
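The hash-bucket option can be illustrated without TensorFlow: each value is hashed and reduced modulo the number of buckets, so no vocabulary is needed, at the cost of occasional collisions. The sketch below uses MD5 purely for illustration; it is not the hash function TensorFlow uses, and the helper name is hypothetical.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map an arbitrary string to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


bucket = hash_bucket("72365", 500)
```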
Customer rating can be used as continuous or as one-hot encoded value
{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
},
[..., 0,0,0,1,0, ...]
(OR)
[..., 4, …]
Don’t mix magic numbers with data
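A common case is a sentinel such as -1 standing in for a missing value. One way to avoid it, sketched here under the assumption that the customer rating is optional, is to emit a separate indicator feature alongside the value; the helper name is hypothetical.

```python
def rating_features(rating):
    """Encode an optional rating as [value, is_present].

    A missing rating becomes value 0.0 with is_present 0.0, so the model can
    tell "no rating" apart from a genuinely low rating.
    """
    if rating is None:
        return [0.0, 0.0]
    return [float(rating), 1.0]
```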
Discretize floats that are not meaningful
lat = layers.real_valued_column('latitude')
dlat = layers.bucketized_column(lat, boundaries=np.arange(32,42,1).tolist())
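What bucketizing does to a single latitude can be shown with the standard library. This is a sketch of the idea, not of TensorFlow's implementation; it reuses the boundaries 32 to 41 from the snippet above, and the helper names are hypothetical.

```python
import bisect

BOUNDARIES = list(range(32, 42))  # same values as np.arange(32, 42, 1).tolist()


def latitude_bucket(latitude):
    """Return the index of the bucket containing `latitude`.

    Bucket 0 is everything below 32, bucket 1 is [32, 33), and so on;
    the last bucket (10) is everything from 41 upward.
    """
    return bisect.bisect_right(BOUNDARIES, latitude)


def one_hot_bucket(latitude):
    """One-hot vector over the len(BOUNDARIES) + 1 buckets."""
    index = latitude_bucket(latitude)
    return [1 if i == index else 0 for i in range(len(BOUNDARIES) + 1)]
```

The model then learns one weight per latitude band instead of treating latitude as a quantity whose magnitude matters.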
Crazy outliers hurt trainability
[Figure: rooms-per-person feature with an outlier ("50 rooms per person!?") next to the same feature capped to a max of 4.0]
features['capped_rooms'] = tf.clip_by_value(
    features['rooms'],
    clip_value_min=0,
    clip_value_max=4
)
[Figure: Dataset split into Training and Validation data]
[Figure: Experimental Dataset split into Training, Validation, and Test data]
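A three-way split like the one in the diagram can be made repeatable by hashing a stable example ID. This is one possible approach, not necessarily the one used in the lab; the 80/10/10 proportions, the MD5 hash, and the function name are assumptions.

```python
import hashlib


def assign_split(example_id, train_pct=80, validation_pct=10):
    """Deterministically assign an example to train, validation, or test.

    Hashing the ID keeps each example in the same split across runs.
    """
    digest = hashlib.md5(str(example_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + validation_pct:
        return "validation"
    return "test"


splits = [assign_split(i) for i in range(1000)]
```

The test split is held back until the end, so that choices made while looking at validation results do not leak into the final evaluation.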
Lab: Dataset Design
Companies / Mentors pairing
Dan Anghel
Strategic Cloud Engineer,
Google
Production ML Systems
Production ML System
● Configuration
● Data collection
● Data verification
● Feature extraction
● ML code
● Machine resource management
● Analysis tools
● Process management tools
● Serving infrastructure
● Monitoring
Reuse generic systems wherever you can
Spark
Hadoop
TF Serving
Dataflow
Your ML Model
GCP tools support end-to-end production ML Models
[Figure: production ML components mapped to GCP tools. Components shown: data collection, data verification, machine resource management, serving infrastructure. Tools shown: Cloud Dataflow, Cloud Composer (upcoming), Stackdriver, serverless]
Production ML systems on Google Cloud
Ingest/Capture: Google App Engine, Google Cloud Pub/Sub
Process: Cloud Dataflow, Cloud Dataproc
Store: BigQuery Storage (tables)
Analyze: BigQuery
Learn: ML Engine, Text API, Speech API
Visualize: 3rd party tools (Tableau, Qlik, iCharts)
Consumers: data analysts, smart apps
Introduction to Dataflow
Lab: ML Production Systems
● Complete Exercise 11
● Time: 1.5 h
Next Steps & Startup Ecosystem Wrap Up
Buzz Word Bingo
Data science is hard
Paper, Rock, Scissors of AI
Glove sensor data → Linear transformation → Probability
Google Cloud Next
Calls to Action
Testing Cloud???
https://goo.gl/hY72bR
Next Steps
● Questions?
● Feedback survey
Appendix
Google Cloud Startup Program types
Start on Google, Stay with Google
3. Features need to be numeric because neural networks are
weighing and adding machines
[Figure: two-layer neural network; the input x is multiplied by the weights (matmul) and the bias b is added]
Image source: https://commons.wikimedia.org/wiki/File:Two_layer_ann.svg
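The "weighing and adding" description can be made concrete with a minimal sketch in plain Python. The function name dense_layer and all of the values below are illustrative assumptions, not taken from the slides; a real network would use a library matmul rather than explicit loops.

```python
def dense_layer(x, weights, b):
    """Compute one layer's pre-activation output: matmul(x, weights) + b.

    x: list of n numeric features
    weights: n rows, each holding m weights (one column per output unit)
    b: list of m biases
    """
    outputs = []
    for j in range(len(b)):
        # Weigh each input by its weight, then add everything up.
        total = sum(x[i] * weights[i][j] for i in range(len(x)))
        outputs.append(total + b[j])
    return outputs


# Hypothetical values: two numeric features feeding two units.
x = [2.0, 3.0]
weights = [[0.5, -1.0],
           [1.0, 2.0]]
b = [0.5, 0.0]
layer_out = dense_layer(x, weights, b)
```

Every step is a multiplication or an addition, which is why a feature such as a string has to be converted to numbers before it can enter the network.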
Quiz: Which of these are numeric?
Features of a discount coupon, used to predict the number of coupons that will be used. Is each one numeric?
● Percent value of the discount (e.g. 10% off, 20% off, etc.)
● Item category (1 for dairy, 2 for deli, 3 for canned goods, etc.)
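One way to read the quiz: the percent discount has a meaningful magnitude (20% off is twice 10% off), whereas the category codes are arbitrary labels, so using them as raw numbers would imply that canned goods are "three times" dairy. A common remedy is one-hot encoding, sketched below. The function name one_hot and the example values are illustrative assumptions.

```python
def one_hot(category_id, num_categories):
    """Turn a 1-based category code into a one-hot list of 0.0/1.0 values."""
    if not 1 <= category_id <= num_categories:
        raise ValueError("unknown category id: %r" % category_id)
    return [1.0 if i == category_id else 0.0
            for i in range(1, num_categories + 1)]


# Codes from the quiz: 1 = dairy, 2 = deli, 3 = canned goods.
deli = one_hot(2, 3)

# The percent discount already has a meaningful magnitude, so it is
# used as a number directly (expressed as a fraction here).
discount = 20 / 100.0
features = [discount] + deli
```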
By applying the same transform code to historical data as to
real-time data, you can limit training-serving skew
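A minimal sketch of that idea, assuming a single transform function shared by a batch path and a serving path. The field names and helper functions are hypothetical and only illustrate the principle; they are not the pipeline from the slides.

```python
def transform(record):
    """Single source of truth for feature engineering (hypothetical fields)."""
    return {
        "fare_amount": float(record["fare_amount"]),
        "is_weekend": 1.0 if record["dayofweek"] in ("Sat", "Sun") else 0.0,
    }


def build_training_set(historical_records):
    # Batch path: applied to every historical record before training.
    return [transform(r) for r in historical_records]


def features_for_prediction(live_record):
    # Serving path: applied to each incoming request, using the same code.
    return transform(live_record)


historical = [{"fare_amount": "12.5", "dayofweek": "Sat"},
              {"fare_amount": "7", "dayofweek": "Tue"}]
training_rows = build_training_set(historical)
served_row = features_for_prediction({"fare_amount": "12.5", "dayofweek": "Sat"})
```

Because both paths call the same function, an identical raw record yields identical features at training time and at prediction time.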
Three potential architectures for dynamic training
Cloud Functions can launch asynchronous training jobs
AppEngine can be used for user-triggered training jobs
[Diagram: steps 1 to 4; labelled input: Web Request]
Dataflow can be used for continuous training
Lab: Design a system architecture for dynamic models
Lab: Implementing a Machine Learning model in TensorFlow
Recall that there are three possible places to do feature
engineering, and that tf.transform is a hybrid of #1 and #2
[Diagram: pipeline stages Inputs, Pre-processing, Feature creation, Train model and Model, with Hyper-parameter tuning and Preprocessed features; callouts 3, 2 and 1 mark the three places]
tf.transform is a hybrid of Beam and TensorFlow
First, set up the schema of the training dataset
Next, run the analyze-and-transform PTransform on the training
dataset to get back the preprocessed training data and the
transform function
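The two phases can be illustrated without Beam or TensorFlow. The sketch below is a plain-Python analogy, not the tf.transform API: analyze makes a full pass over the training data to collect statistics, and the resulting transform function applies them row by row. All function names and sample values are assumptions made for illustration.

```python
def analyze(training_rows):
    """Full pass over the training data to collect dataset-level statistics."""
    lats = [row["dropofflat"] for row in training_rows]
    days = sorted({row["dayofweek"] for row in training_rows})
    return {
        "lat_min": min(lats),
        "lat_max": max(lats),
        "day_vocab": {day: index for index, day in enumerate(days)},
    }


def make_transform_fn(stats):
    """Freeze the statistics into a per-row function that can be reused."""
    # Fall back to 1.0 so a constant column does not divide by zero.
    span = (stats["lat_max"] - stats["lat_min"]) or 1.0

    def transform_fn(row):
        return {
            # Scale to [0, 1] using the training min and max.
            "dropofflat": (row["dropofflat"] - stats["lat_min"]) / span,
            # Map the string to its vocabulary index; -1 if unseen.
            "dayofweek": stats["day_vocab"].get(row["dayofweek"], -1),
        }

    return transform_fn


def analyze_and_transform(training_rows):
    transform_fn = make_transform_fn(analyze(training_rows))
    return [transform_fn(row) for row in training_rows], transform_fn


train = [{"dropofflat": 40.0, "dayofweek": "Mon"},
         {"dropofflat": 41.0, "dayofweek": "Sun"},
         {"dropofflat": 40.5, "dayofweek": "Mon"}]
transformed_train, transform_fn = analyze_and_transform(train)

# Evaluation data reuses transform_fn; its own min/max are never computed.
eval_row = transform_fn({"dropofflat": 40.25, "dayofweek": "Wed"})
```

Returning the transform function alongside the transformed data is what lets the same statistics be applied later to evaluation data and at serving time.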
Write out the preprocessed training data into TFRecords, the
most efficient format for TensorFlow
import os
from apache_beam.io import tfrecordio
from tensorflow_transform.coders import example_proto_coder

_ = transformed_data | tfrecordio.WriteToTFRecord(
    os.path.join(OUTPUT_DIR, 'train'),  # filenames will look like train-00003-of-00015
    # Note that we use the schema of the transformed metadata here.
    coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))
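The preprocessing function is restricted to functions you can
call from a TensorFlow graph
import tensorflow as tf
import tensorflow_transform as tft

def preprocess(inputs):
    import datetime
    result = {}  # create features from the inputs
    result['fare_amount'] = inputs['fare_amount']  # passed through unchanged
    # Vocabulary: map each distinct string to an integer id
    # (later tf.transform releases name this tft.compute_and_apply_vocabulary).
    result['dayofweek'] = tft.string_to_int(inputs['dayofweek'])
    ...
    # Scaling: rescale to [0, 1] using the dataset-wide min and max.
    result['dropofflat'] = tft.scale_to_0_1(inputs['dropofflat'])
    # Other TensorFlow functions can be used as well.
    result['passengers'] = tf.cast(inputs['passengers'], tf.float32)
    return result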
Writing out the eval dataset is similar, except that we reuse the
transform function computed from the training data
raw_test_data = (p
    | beam.io.Read(beam.io.BigQuerySource(...))
    | 'eval_filter' >> beam.Filter(is_valid))
# Reuse transform_fn computed from the training data; nothing is re-analyzed.
transformed_test_dataset = (((raw_test_data, raw_data_metadata), transform_fn)
    | beam_impl.TransformDataset())
transformed_test_data, _ = transformed_test_dataset
_ = transformed_test_data | tfrecordio.WriteToTFRecord(
    os.path.join(OUTPUT_DIR, 'eval'),
    coder=example_proto_coder.ExampleProtoCoder(
        transformed_metadata.schema))
For serving, we need to write out the transformation metadata
_ = (transform_fn
    | transform_fn_io.WriteTransformFn(os.path.join(OUTPUT_DIR, 'metadata')))
Change our input function to read TFRecord, not CSV (1/2)
Change our input function to read TFRecord, not CSV (2/2)
Closing and Q&A
with David Roldan
Next Steps by 9th September
Omitted slides
What do these search queries have in common?
ML converts examples into knowledge
Query: Coffee near me
[Map: Bill’s Diner, 3 min; Anna’s Gourmet Cafe, 5 min]
ML converts examples into knowledge
Query: Coffee near me
[Map: Bill’s Diner, 3 min; Anna’s Gourmet Cafe, 15 min]
How far will you go for gourmet coffee?
[Plot: User Happiness (y-axis) against Distance (x-axis), 1 KM to 10 KM]
Good learning involves blending all the users’ preferences
[Plot: User Happiness (y-axis) against Distance (x-axis), 1 KM to 10 KM; annotation: Product excellence]
ML can be used to solve many problems for which you are
writing rules today
Rules today:
● Apply rules program to make decisions.
● Add new rules in response to bug reports.
With ML:
● Deploy model at scale to make predictions.
● Continuous training of model on data.
Machine learning scales better than hand-coded rules
query = ‘Giants’
RankBrain (a deep neural network for search
ranking) improved performance significantly
● #3 signal for Search ranking, out of hundreds
● #1 improvement to ranking quality in 2+ years