
ML Kickstarter

From raw data to an initial TF model

Google Developers Launchpad

© 2017 Google Inc. All rights reserved. Google and the Google logo are trademarks of Google Inc. All other company and product names may be trademarks of the respective companies with which they are associated.

How Google does ML


Agenda
Intro to ML

Framing the Problem

Creating ML Datasets

Infrastructure

9:30 Intros and Google & ML (Rupert Whitehead, DevRel Ecosystem Lead UK & Ireland)

10:00 Intro to ML and Problem Framing (Barbara Fusinska, Strategic Cloud Engineer)

10:45 Hands-on Lab: Problem Framing (worksheet exercises 1-3)

11:15 Coffee break

11:30 Defining Output (Dan Anghel, Strategic Cloud Engineer)

11:45 Hands-on Lab: Defining Output (worksheet exercises 4-7)

12:15 Lunch (and mentor sync-up)

13:15 Creating ML Datasets (Barbara Fusinska)

13:45 Hands-on Lab: Dataset Design (worksheet exercises 8-10)

14:15 From traditional data processing to automated and serverless architectures (Dan Anghel)

15:00 Coffee break

15:15 Hands-on Lab: Building an end-to-end ML system (worksheet exercise 11)

16:45-17:00 Closing notes and Q&A (David Roldan, Google Cloud for Startups, Europe)

17:00-17:30 Quick mentor sync-up


Meet the Googlers

Rupert Whitehead, Developer Relations Regional Lead UK & Ireland
David Roldan, Cloud startups, EMEA
Bella Rose, Operations, Cloud
Pawel Nowak, Software Engineer


Meet the Googlers

Barbara Fusinska, Strategic Cloud Engineer
Zack Akil, Developer Advocate, Cloud Machine Learning
Dan Anghel, Strategic Cloud Engineer
Hoi Lam, Developer Advocate
Omer Mahmood, Sales Engineer


Meet our community Mentors

Matthias Feys, CTO, ML6
William Fletcher, Machine Learning Researcher, Datatonic
Sergii Khomenko, Lead Data Scientist, Stylight
Steven Gray, Senior Teaching Fellow, UCL


Meet our community Mentors

Janis Klaise, Data Scientist, Seldon
Kryzstof Suwada, Data Scientist, Brainly
David Kas, Deep RL Hacker, Instadeep


“Machine learning. This is the next transformation… the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want.”
Eric Schmidt
Executive Chairman of the Board
Google


CLASSICAL PROGRAMMING: RULES + DATA → ANSWERS


MACHINE LEARNING: ANSWERS + DATA → RULES


● Infrastructure / operational efficiency
● New features on existing product
● Developing totally new product


Gmail - Smart Reply

Inbox: 20% of all responses sent on mobile

Extending features of products

● Classify pictures in Google Photos
● Spam detection in Gmail
● Recommendations for the next video in YouTube
● Targeted ads to display in AdWords
There are over 4000 TensorFlow machine learning models in production at Google, and machine learning has transformed our company.

Google Translate

Totally new products

● Intelligent voice interactions: Google Home
● ML-powered camera: Google Clips
● Pedestrian detection: self-driving cars

Democratising AI with research

When applied properly, machine learning can
deliver a double-digit improvement to most
businesses.

- Tushar Chandra, Google Distinguished Engineer


How Google does ML


Agenda
Intro to ML

Framing the Problem

Creating ML Datasets

Infrastructure

Barbara Fusinska
Strategic Cloud Engineer,
Google
What is ML?

ML systems learn
how to combine input
to produce useful predictions
on never-before-seen data

[Figure: SW Engineering: Inputs → Outputs. Machine learning: learn from LOTS of Inputs and LOTS of Outputs, then map new Inputs → Outputs.]

What Can ML Do?

REGRESSION
Predict numerical values
E.g. click-through rate

CLASSIFICATION
Which of N labels?
E.g. cat, dog, horse, or bear

CLUSTERING (unsupervised)
Most similar other examples
Most relevant documents

GENERATION
Complex output
E.g. image caption, translation

Static models are easy to build, but they tend to go stale
Dynamic models are fresher, but require more engineering

STATIC MODELS
● Trained once, offline
● Easy to build and test
● Easy to let become stale

DYNAMIC MODELS
● Add training data over time
● Engineering is harder
● Have to do progressive validation
● Regularly sync out updated version
● Will adapt to changes
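Progressive validation, listed above for dynamic models, means scoring each new example with the current model before that example is used for training, so the running metric always reflects data the model has not yet seen. A minimal sketch, assuming a hypothetical stream of (features, label) pairs and a tiny from-scratch online logistic regression (not a TensorFlow API):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def progressive_validation(stream, num_features, learning_rate=0.1):
    """Evaluate-then-train over a stream of (features, label) pairs.

    Returns the average log loss, where every example was scored by the
    model *before* the model was updated on it, plus the final parameters.
    """
    weights = [0.0] * num_features
    bias = 0.0
    total_loss, count = 0.0, 0
    for features, label in stream:
        # 1. Score the example with the current, not-yet-updated model.
        z = bias + sum(w * x for w, x in zip(weights, features))
        p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        total_loss += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
        count += 1
        # 2. Only then take one SGD step on that same example.
        error = p - label
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        bias -= learning_rate * error
    average_loss = total_loss / count if count else float("nan")
    return average_loss, weights, bias


# Hypothetical stream: feature 0 signals label 1, feature 1 signals label 0.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
loss, weights, bias = progressive_validation(stream, num_features=2)
```

Because the model is never scored on an example it has already trained on, this single loop gives both a fresh model and an honest running estimate of its quality.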

The ML Surprise - Effort Allocation

Defining KPIs

Collecting data

Building infrastructure

Optimizing ML algorithm

Integration

The ML Surprise - Effort Allocation

[Chart: Effort Allocation, Expectation. Categories: Defining KPIs, Collecting data, Building infrastructure, Optimizing ML algorithm, Integration]

*Informally based on many conversations with new ML practitioners

The ML Surprise - Effort Allocation

[Chart: Effort Allocation, Expectation vs Reality. Categories: Defining KPIs, Collecting data, Building infrastructure, Optimizing ML algorithm, Integration]

The top 10 pitfalls organizations hit when first trying ML!

● You thought training your own ML algorithm would be faster than writing SW
● You haven’t collected the data yet
● You haven’t looked, but assume the data is ready for use
● You forgot to put & keep humans in the loop
● You launched a product whose initial value-prop was its ML algorithm
● You made a great end-to-end ML system that optimizes for the wrong thing
● You forgot to measure whether your algorithm improves things in the real world
● You confused the ease and value-add of using someone else’s pre-trained ML algorithm with building your own
● You thought that, after research, production ML algorithms were trained only once
● You want to design your own in-house perception or NLP algorithm

Two ways to add ML to your apps

Custom ML models: TensorFlow, Machine Learning Engine

Friendly machine learning: Vision API, Speech API, Translation API, Natural Language API, Video Intelligence API

There are pre-trained machine learning services available on Google Cloud

Custom ML models: TensorFlow, Machine Learning Engine

Pre-trained ML models: Vision API, Speech API, Jobs API, Translation API, Natural Language API, ...

How Google does ML


Agenda
Intro to ML

Framing the Problem

Creating ML Datasets

Infrastructure

Learning Objectives

● Decide where and when to use ML

● Construct an ML problem

Success Stories in ML at Google

● Suggested short responses to emails

● Trained on historical Gmail threads

Google Photos

● Classification of untagged photos

● Originally trained on Image Search and G+ tags

YouTube Recommendations

● Suggest videos to watch

● Trained on user behavior

Traits of Good ML Problems

● Before collecting data, know your problem!
● Collecting data, then looking for correlations, is dangerous
● You need to make decisions, not just predictions

Problems vs Decisions

● Say we can predict the probability that someone will “thumbs down” a video
● ...but what can we do with that knowledge?

Good Predictions and Decisions

Prediction → Decision

● What video does the user want to watch next? → Show those videos in the recommendations bar
● Probability that someone will click on a search result → If P(click) > 0.20, prefetch the web page
● What fraction of a video ad will the user watch? → If low, don’t show the user the ad
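The second row turns a probability into an action with a fixed threshold. A minimal sketch of that step, using the 0.20 threshold from the slide; the (url, p_click) input format and the function names are hypothetical:

```python
PREFETCH_THRESHOLD = 0.20  # from the slide: prefetch if P(click) > 0.20


def should_prefetch(p_click, threshold=PREFETCH_THRESHOLD):
    """Turn a predicted click probability into a yes/no product decision."""
    return p_click > threshold


def pages_to_prefetch(scored_results, threshold=PREFETCH_THRESHOLD):
    """Return the URLs worth prefetching.

    `scored_results` is a hypothetical list of (url, p_click) pairs, where
    p_click comes from whatever click model you have trained.
    """
    return [url for url, p_click in scored_results if should_prefetch(p_click, threshold)]


results = [
    ("https://example.com/a", 0.35),
    ("https://example.com/b", 0.20),  # exactly at the threshold: not prefetched
    ("https://example.com/c", 0.05),
]
prefetch = pages_to_prefetch(results)  # ["https://example.com/a"]
```

The model only supplies P(click); the decision and its threshold live in product code, which is why the threshold is a separate, tunable constant.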

Properties of Good Output

● Need examples with known output (labeled data)
  ○ E.g. classifying abusive video comments: do you have a system to log such comments?
● Must be quantifiable with a definition a machine can produce
  ○ “Did the user enjoy reading the article?” vs. “Did the user share the article?”
● Should be connected to your ideal outcome
  ○ This output is what your model will try to optimize, i.e. the objective

Let’s Start with an Example

Q: What would you like your machine learned model to do?

We want the ML model to predict how popular a just-uploaded video will be in the future.

Q: What’s your desired outcome?

Example 1: Our ideal outcome is to suggest videos that people find useful, entertaining and worth their time.
Example 2: Skip HD transcoding for less popular videos to minimize resource costs.
Success and Failure Metrics - Example 1

Quantify it. How will you know your system has succeeded or failed?

Success metric: number of popular videos correctly predicted

KRs: correctly predict 95% of top videos 28 days after being uploaded

The ML model is deemed unsuccessful if the number of popular videos correctly predicted is no better than with the current heuristics.
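One way to quantify “number of popular videos correctly predicted” is recall over the videos that actually ended up in the top set at day 28, compared against the same number for the current heuristic. A minimal sketch, assuming hypothetical sets of video IDs; recall alone rewards predicting every video as popular, so in practice you would track precision alongside it:

```python
def top_video_recall(predicted_top, actual_top):
    """Fraction of the actually-top videos that were predicted to be top."""
    actual = set(actual_top)
    if not actual:
        return 0.0
    return len(actual & set(predicted_top)) / len(actual)


def evaluate(model_pred, heuristic_pred, actual_top, target=0.95):
    """Check the KR (95% recall) and the failure criterion from the slide."""
    model_recall = top_video_recall(model_pred, actual_top)
    heuristic_recall = top_video_recall(heuristic_pred, actual_top)
    return {
        "model_recall": model_recall,
        "heuristic_recall": heuristic_recall,
        "meets_kr": model_recall >= target,
        # Unsuccessful if no better than the current heuristics.
        "failed": model_recall <= heuristic_recall,
    }


report = evaluate(
    model_pred={"a", "b", "c", "x"},   # hypothetical model picks
    heuristic_pred={"a", "y"},         # hypothetical heuristic picks
    actual_top={"a", "b", "c", "d"},   # videos actually in the top set at day 28
)
```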

Success and Failure Metrics - Example 2

Quantify it. How will you know your system has succeeded or failed?

Success metric: CPU resource utilization

KRs: achieve 35% less CPU cost for transcoding without sacrificing user experience

The ML model is deemed unsuccessful if the CPU cost reduction is smaller than the CPU cost of training and serving the model.
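The two criteria above are simple arithmetic once the costs are measured in a common unit. A minimal sketch, assuming hypothetical CPU-cost figures (e.g. CPU-hours per day); the “without sacrificing user experience” part of the KR is not modeled here:

```python
def transcoding_outcome(baseline_cpu, cpu_with_model, model_cpu, target_reduction=0.35):
    """Compare transcoding CPU savings against the KR and the model's own cost.

    baseline_cpu:   transcoding CPU cost without the model
    cpu_with_model: transcoding CPU cost when the model picks what to transcode
    model_cpu:      CPU cost of training and serving the model
    All values are in the same (hypothetical) unit.
    """
    savings = baseline_cpu - cpu_with_model
    reduction = savings / baseline_cpu
    return {
        "reduction": reduction,
        "meets_kr": reduction >= target_reduction,
        # Unsuccessful if savings do not cover the model's own CPU cost.
        "failed": savings < model_cpu,
    }


good = transcoding_outcome(1000.0, 600.0, 50.0)  # 40% reduction, covers model cost
bad = transcoding_outcome(1000.0, 950.0, 80.0)   # saves 50 but the model costs 80
```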

More About Metrics

● How will you measure your metrics?
● How long will it take to determine success or failure?

Lab: Framing a machine learning problem

● Make a copy of the ML problem framing worksheet: goo.gl/ZdaA4b
● Complete Exercises 1-3
● Time: 30 min

Dan Anghel
Strategic Cloud Engineer,
Google
Using ML Output in Your Products

● When will your prediction be made?
  ○ As the user is using the product
  ○ Precomputed offline
● How will it be used in your product?

[Figure: Square Footage → ML Model → PRICE]

ML Model Output

Example: The output will be one of 3 classes of videos, based on watch time 28 days after upload:
● Very popular (top 3%)
● Somewhat popular (next 7%)
● Not popular (bottom 90%)
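To train such a classifier you first need labels. A minimal sketch of deriving the three classes from day-28 watch time by rank, assuming a hypothetical dict of video IDs to hours watched; the tie-breaking rule is an arbitrary choice made for this sketch:

```python
def label_by_watch_time(watch_hours):
    """Assign one of three popularity classes from day-28 watch time.

    watch_hours: dict mapping a (hypothetical) video_id to hours watched
    28 days after upload. Top 3% -> VERY_POPULAR, the next 7% ->
    SOMEWHAT_POPULAR, the remaining 90% -> NOT_POPULAR. Ties keep
    dictionary order (sorted() is stable), an arbitrary choice here.
    """
    ranked = sorted(watch_hours, key=watch_hours.get, reverse=True)
    n = len(ranked)
    very_cutoff = n * 3 // 100       # size of the top-3% bucket
    somewhat_cutoff = n * 10 // 100  # top 3% + next 7% = top 10%
    labels = {}
    for rank, video_id in enumerate(ranked):
        if rank < very_cutoff:
            labels[video_id] = "VERY_POPULAR"
        elif rank < somewhat_cutoff:
            labels[video_id] = "SOMEWHAT_POPULAR"
        else:
            labels[video_id] = "NOT_POPULAR"
    return labels


# Hypothetical data: 100 videos, v99 watched the most.
labels = label_by_watch_time({f"v{i}": float(i) for i in range(100)})
```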

Pseudocode

if model_output == VERY_POPULAR:
    transcode_best()
elif model_output == SOMEWHAT_POPULAR:
    transcode_good()
else:
    transcode_basic()
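A runnable version of the pseudocode above needs the class constants to exist. A minimal sketch, assuming the classes are modeled as a Python Enum; the tier strings "best", "good" and "basic" are placeholders standing in for the hypothetical transcode_best(), transcode_good() and transcode_basic() calls:

```python
from enum import Enum


class Popularity(Enum):
    """The three classes the model outputs (names follow the pseudocode)."""
    NOT_POPULAR = 0
    SOMEWHAT_POPULAR = 1
    VERY_POPULAR = 2


def choose_transcode_tier(model_output):
    """Map the classifier's output directly to a transcoding decision.

    The returned strings are placeholders for the hypothetical
    transcode_best(), transcode_good() and transcode_basic() calls.
    """
    if model_output == Popularity.VERY_POPULAR:
        return "best"
    elif model_output == Popularity.SOMEWHAT_POPULAR:
        return "good"
    else:
        return "basic"


tier = choose_transcode_tier(Popularity.SOMEWHAT_POPULAR)  # "good"
```

Because each class maps to exactly one action, the product code contains no tuning knobs: the decision is whatever the model outputs.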

Alternative

Use regression to output number of hours watched

if model_output > BEST_THRESHOLD:
    transcode_best()
Model Boundaries

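[Figure: Model boundaries. Classification model: the model output is A, B or C, so the task being optimized is the decision itself and product code simply acts on it. Regression model: the model outputs a number, and product code turns it into decision A, B or C.]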

What’s the Difference?

During training, a regression model does not know about the threshold.

[Figure: Red and Blue predictions plotted against the label]

Red and Blue are the same distance from the label along the line, but Blue leads to the wrong product behavior.
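The Red/Blue point can be reproduced with two numbers. A minimal sketch, assuming a hypothetical threshold of 100 hours and a true value of 110 hours: both predictions are 10 hours off, so the regression loss cannot tell them apart, yet only one of them triggers the right action:

```python
THRESHOLD = 100.0  # hypothetical decision threshold on hours watched


def squared_error(prediction, label):
    """What a regression model minimizes during training."""
    return (prediction - label) ** 2


def decision(hours):
    """What the product actually does with the prediction."""
    return "transcode_best" if hours > THRESHOLD else "transcode_basic"


label = 110.0  # true hours watched, so the right decision is transcode_best
red = 120.0    # 10 hours too high: same side of the threshold as the label
blue = 100.0   # 10 hours too low: falls on the other side of the threshold

for name, prediction in [("red", red), ("blue", blue)]:
    same_decision = decision(prediction) == decision(label)
    print(name, squared_error(prediction, label), same_decision)
# red 100.0 True
# blue 100.0 False
```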

Key Lesson

Output your decision when possible

Heuristics

How might you solve a problem if you didn’t use ML?

Example

Assume new videos uploaded by popular creators will become popular
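This heuristic can be written down as a baseline before any model exists. A minimal sketch, assuming a hypothetical history of (creator_id, day-28 watch hours) pairs for past uploads and an illustrative cut-off; “popular creator” is defined here as a high average watch time, which is one possible reading of the heuristic:

```python
from collections import defaultdict


def creator_average_watch_time(history):
    """history: list of (creator_id, day28_watch_hours) for past uploads."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for creator_id, hours in history:
        totals[creator_id] += hours
        counts[creator_id] += 1
    return {c: totals[c] / counts[c] for c in totals}


def heuristic_predict_popular(creator_id, creator_avg, cutoff_hours):
    """Predict 'popular' for a new upload if its creator's past uploads
    averaged more than cutoff_hours. Unknown creators default to 0."""
    return creator_avg.get(creator_id, 0.0) > cutoff_hours


# Hypothetical history and cut-off.
avg = creator_average_watch_time([("alice", 500.0), ("alice", 700.0), ("bob", 10.0)])
predictions = {c: heuristic_predict_popular(c, avg, 100.0) for c in ("alice", "bob", "carol")}
```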

Why Heuristics?

● Gives you a baseline
● Forces you to think about what signals are helpful

Identify Your Problem Type

Choose based on your model’s output:

● Binary classification
● Unidimensional regression
● Multi-class single-label classification
● Multi-class multi-label classification
● Multidimensional regression
● Clustering (unsupervised)
● Other

Classification Problem Type

How many categories to pick from?
● 2: binary classification (e.g. click or no click?)
● More than 2: multi-class classification (e.g. type of animal?)

How many categories for a single example?
● 1: multi-class single-label (e.g. which type is this animal?)
● Several: multi-class multi-label (e.g. what are all the animals in this picture?)
Regression Problem Types

How many numbers are output?
● 1: unidimensional regression, aka “regression” (e.g. how many minutes of video will this user watch?)
● More than 1: multidimensional regression (e.g. what is the [latitude, longitude] of the location in the photo?)
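The problem types differ mainly in the shape of the label each example carries. A minimal sketch of those shapes, using the animal classes from the earlier slide; the numeric values are illustrative only:

```python
ANIMALS = ["cat", "dog", "horse", "bear"]


def one_hot(label, classes):
    """Multi-class single-label: exactly one position is 1."""
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels, classes):
    """Multi-class multi-label: a 1 for every class present."""
    present = set(labels)
    return [1 if c in present else 0 for c in classes]


binary_label = 1                                # binary: click (1) or no click (0)
single_label = one_hot("dog", ANIMALS)          # [0, 1, 0, 0]
multi_label = multi_hot(["cat", "bear"], ANIMALS)  # [1, 0, 0, 1]
regression_label = 12.5                         # unidimensional: minutes watched (illustrative)
multi_regression_label = [51.5, -0.13]          # multidimensional: [latitude, longitude] (illustrative)
```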

Example

“Our problem is best framed as 3-class, single-label classification, which predicts which of three classes {very popular, somewhat popular, not popular} a video will be in 28 days after being uploaded.”

Start Simple!

● A simple model is easier to implement and understand
● Once you have a full pipeline, you can iterate on the model more easily
● Is your expensive neural network model actually better than a simple model?
● Simple models are probably better than you think

Biggest Gain in ML is First Launch

[Chart: gains across Initial System, Good Heuristics, v1 ML, v2, v3, v4]

Start Simple!

Rephrase your problem as either
● Binary classification
● Unidimensional regression

Example: “We will predict whether uploaded videos are likely to become very popular (binary classification).”
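Rephrasing the 3-class problem as binary classification can be as simple as collapsing the labels. A minimal sketch, assuming the class names used in the earlier pseudocode:

```python
def to_binary_label(three_class_label):
    """Collapse the 3-class label into 'will this video be very popular?'."""
    return 1 if three_class_label == "VERY_POPULAR" else 0


three_class = ["VERY_POPULAR", "SOMEWHAT_POPULAR", "NOT_POPULAR", "NOT_POPULAR"]
binary = [to_binary_label(label) for label in three_class]  # [1, 0, 0, 0]
```

With “very popular” defined as the top 3%, only about 3 in 100 examples are positive, so expect a heavily imbalanced binary problem.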

Lab: Your Output

● Make a copy of the ML problem framing worksheet: https://goo.gl/ZdaA4b
● Complete Exercises 4-7
● Time: 30 min


How Google does ML


Agenda
Intro to ML

Framing the Problem

Creating ML Datasets

Infrastructure

Barbara Fusinska
Strategic Cloud Engineer,
Google

If ML is a rocket engine,
data is the fuel


Unstructured data
accounts for
90% of enterprise data*

*Source: IDC

Learning Objectives

● Go from raw data to features

● Representing features

Raw data to features

Feature engineering is necessary because raw data doesn’t come to us as feature vectors. The process of creating features from raw data is feature engineering.

Raw data:
0 : {
  house_info : {
    num_rooms: 6
    num_bedrooms: 3
    street_name: “Main Street”
    num_basement_rooms: -1
    ...
  }
}

Feature vector:
[6.0, 1.0, 0.0, 0.0, 0.0, 9.321, -2.20, 1.01, 0.0, ...]
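A minimal sketch of feature engineering for the record above, assuming a hypothetical encoding scheme (not the one behind the slide’s vector): counts are used as numbers, the street name is one-hot encoded against a known list, and -1 in num_basement_rooms is assumed to mean “missing”, so it is split into an indicator plus a cleaned count:

```python
def house_to_features(house_info, street_vocab):
    """Turn one raw house record into a numeric feature vector.

    The layout of the vector is a hypothetical scheme for illustration.
    """
    features = [
        float(house_info["num_rooms"]),
        float(house_info["num_bedrooms"]),
    ]
    # One-hot encode the street name against a known vocabulary.
    features += [1.0 if house_info["street_name"] == s else 0.0 for s in street_vocab]
    # Assumption: -1 marks "missing". Keep magnitudes meaningful by adding a
    # missing-indicator and clamping the count itself at 0.
    basement = house_info["num_basement_rooms"]
    features += [1.0 if basement < 0 else 0.0, float(max(basement, 0))]
    return features


house = {"num_rooms": 6, "num_bedrooms": 3,
         "street_name": "Main Street", "num_basement_rooms": -1}
vector = house_to_features(house, street_vocab=["Main Street", "Elm Street"])
# [6.0, 3.0, 1.0, 0.0, 1.0, 0.0]
```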

A good feature is one that represents raw data in a form conducive to ML

● To be conducive to ML, a feature should:
  ○ Be related to the objective
  ○ Be known at prediction time
  ○ Be numeric with meaningful magnitude
  ○ Have enough examples
  ○ Bring human insight to the problem

A good feature should be related to the objective

● We need a reasonable hypothesis for why the feature value matters
● Different problems in the same domain may need different features

[Figure: example features: breed, age, eye color]

Quiz: Are these features related to the objective or not?

Objective: Predict total number of customers who will use a certain discount coupon
Features (good feature?):
● Font of the text with which the discount is advertised on partner websites
● Price of the item the coupon applies to
● Number of items in stock

Objective: Predict whether a credit card transaction is fraudulent
Features (good feature?):
● Whether cardholder has purchased these items at this store before
● Credit card chip reader speed
● Category of item being purchased
● Expiry date of credit card

Feature values should be known at prediction time

● Causal: cannot rely on future information
● Must ingest that data in a timely manner
● Is it legal/ethical to collect and use that information?
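For a count feature such as “items sold last month”, being known at prediction time means the computation may only look at events before the prediction date. A minimal sketch, assuming hypothetical (sale_date, quantity) records; sales from the current month, or later, are excluded so no future information leaks into the feature:

```python
from datetime import date


def items_sold_previous_month(sales, prediction_date):
    """Count items sold in the calendar month before prediction_date.

    sales: list of (sale_date, quantity). Only sales in
    [first day of previous month, first day of current month) are used.
    """
    first_of_month = prediction_date.replace(day=1)
    if first_of_month.month == 1:
        prev_start = first_of_month.replace(year=first_of_month.year - 1, month=12)
    else:
        prev_start = first_of_month.replace(month=first_of_month.month - 1)
    return sum(qty for sale_date, qty in sales if prev_start <= sale_date < first_of_month)


sales = [
    (date(2017, 9, 15), 5),    # two months back: ignored
    (date(2017, 10, 2), 3),    # previous month: counted
    (date(2017, 10, 20), 4),   # previous month: counted
    (date(2017, 11, 1), 100),  # current month: not yet complete, ignored
]
feature = items_sold_previous_month(sales, date(2017, 11, 10))  # 7
```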

Feature definitions shouldn’t change over time

city_id:“br/sao_paulo”

inferred_city_cluster_id:219

Quiz: Value knowable at prediction time or not?
Objective: Predict total number of customers who will use a certain discount coupon
Features (good feature?):
● Total number of discountable items sold
● Number of discountable items sold the previous month
● Number of customers who viewed ads about item

Objective: Predict whether a credit-card transaction is fraudulent
Features (good feature?):
● Whether cardholder has purchased these items at this store before
● Whether item is new at store (and cannot have been purchased before)
● Category of item being purchased
● Online or in-person purchase?

There should be enough examples of every feature value

● Each value of each feature in the dataset has to be understandable in context
● If you have category=auto, you must have enough transactions (fraud/no-fraud) of auto purchases
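Checking this before training is a matter of counting examples per (feature value, label) pair. A minimal sketch, assuming hypothetical transaction dicts with a `category` feature and an `is_fraud` label; the minimum count is an illustrative choice:

```python
from collections import Counter


def underrepresented_values(examples, feature, min_per_label, label_key, labels=(0, 1)):
    """Return feature values lacking at least min_per_label examples of every label.

    examples: list of dicts, e.g. {"category": "auto", "is_fraud": 0}.
    """
    counts = Counter((ex[feature], ex[label_key]) for ex in examples)
    values = {ex[feature] for ex in examples}
    return sorted(
        value for value in values
        # Counter returns 0 for (value, label) pairs never seen.
        if any(counts[(value, lab)] < min_per_label for lab in labels)
    )


examples = (
    [{"category": "grocery", "is_fraud": 0}] * 10
    + [{"category": "grocery", "is_fraud": 1}] * 10
    + [{"category": "auto", "is_fraud": 0}] * 3  # no fraud examples for auto at all
)
rare = underrepresented_values(examples, "category", 5, "is_fraud")  # ["auto"]
```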

Raw data are converted to numeric features in different ways
{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
}

Example feature vectors:
[..., 1, 2.50, …, ]
[..., 0, 8.99, …, ]
[..., 0, 3.45, …, ]

In the Estimator API, each column of these vectors is a feature column.

Numeric values can be used as-is

{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
}

Feature vector: [ , 2.50, …, 1.4, ]

# layers is tf.contrib.layers (TensorFlow 1.x)
INPUT_COLUMNS = [
    ...,
    layers.real_valued_column('price'),
    ...
]

Real-valued column is a type of feature column.

Overly specific attributes should be discarded

{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
}

Don’t train on IDs or other super-specific information.
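A simple heuristic for spotting ID-like attributes is to flag columns whose values are almost all distinct. The sketch below is an assumption-laden illustration: the `id_like_columns` helper, the 0.9 ratio and the toy rows are made up, and a flagged column still deserves a human judgment call.

```python
def id_like_columns(rows, max_unique_ratio=0.9):
    """Flag columns where the share of distinct values exceeds `max_unique_ratio`."""
    n = len(rows)
    flagged = []
    for column in rows[0]:
        distinct = len({row[column] for row in rows})
        if distinct / n > max_unique_ratio:
            flagged.append(column)
    return flagged


# Hypothetical rows: a unique transaction ID plus two low-cardinality columns.
rows = [
    {"transactionId": i, "price": 2.50 if i % 2 else 8.99, "storeId": i % 3}
    for i in range(100)
]

print(id_like_columns(rows))  # ['transactionId']
```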

Categorical values could be one-hot encoded

{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": "72365",
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
}

employeeId:  8345  72365  87654  98723  23451
one-hot:        0      1      0      0      0

layers.sparse_column_with_keys('employeeId',
                               keys=['8345', '72365', '87654', '98723', '23451'])

A sparse column is a type of feature column.
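The encoding in the table above can be sketched in a few lines of plain Python. The `one_hot` helper is hypothetical, and mapping unknown IDs to an all-zero vector is a choice made for this sketch, not a statement about how TensorFlow handles out-of-vocabulary keys.

```python
# Known keys, in the same order as on the slide.
EMPLOYEE_KEYS = ["8345", "72365", "87654", "98723", "23451"]


def one_hot(value, keys):
    """One-hot encode `value` against a fixed list of known keys.

    In this sketch, unknown values map to an all-zero vector.
    """
    return [1 if value == key else 0 for key in keys]


print(one_hot("72365", EMPLOYEE_KEYS))  # [0, 1, 0, 0, 0]
```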

Preprocess data to create a vocabulary of keys

[Diagram: Inputs → Pre-processing → Feature creation → Train model → Model. The pre-processing step produces the vocabulary of keys (Vocab) used when creating features.]
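A vocabulary of keys can be built in a single pass over the training data. In the sketch below, the `build_vocabulary` helper, the toy employee IDs and the `min_count=2` cutoff are assumptions; the idea is that the vocabulary comes from training data only and is then reused unchanged for evaluation and serving.

```python
from collections import Counter


def build_vocabulary(values, min_count=1):
    """Return the sorted keys that occur at least `min_count` times."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n >= min_count)


# Hypothetical training values for employeeId.
train_employee_ids = ["72365", "8345", "72365", "23451", "8345", "99999"]

vocab = build_vocabulary(train_employee_ids, min_count=2)
index = {key: i for i, key in enumerate(vocab)}
print(vocab)  # ['72365', '8345']
print(index)  # {'72365': 0, '8345': 1}
```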

Options for encoding categorical data

These are all different ways to create a sparse column.

If you know the keys beforehand:
layers.sparse_column_with_keys('employeeId',
                               keys=['8345', '72365', '87654', '98723', '23451'])

If your data is already indexed, i.e., has integers in [0, N):
layers.sparse_column_with_integerized_feature('employeeId', 5)

If you don’t have a vocabulary of all possible values:
layers.sparse_column_with_hash_bucket('employeeId', 500)
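The hash-bucket option can be illustrated without TensorFlow: hash each value with a stable hash and take it modulo the number of buckets. This sketch uses MD5 purely as an assumption for illustration; TensorFlow uses a different hash function, so the bucket IDs will not match. Python's built-in `hash()` is avoided because string hashing is randomized per process.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map a string to one of `num_buckets` buckets using a stable hash (MD5 here)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


bucket = hash_bucket("72365", 500)
print(bucket)  # some integer in [0, 500), identical on every run
```

Different keys can collide in the same bucket; a larger bucket count reduces collisions at the cost of more parameters.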

Customer rating can be used as a continuous or as a one-hot encoded value

{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": 4
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
}

One-hot:    [..., 0, 0, 0, 1, 0, ...]
(OR)
Continuous: [..., 4, ...]
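Both encodings from the slide can be sketched as follows, assuming ratings on a 1 to 5 scale (an assumption; the slide does not state the range). The continuous form assumes the rating is ordered and evenly spaced, while the one-hot form lets the model learn a separate weight for each rating value.

```python
# Assumed rating scale.
RATINGS = [1, 2, 3, 4, 5]


def rating_as_one_hot(rating):
    """One slot per possible rating."""
    return [1 if rating == r else 0 for r in RATINGS]


def rating_as_continuous(rating):
    """Use the rating value directly."""
    return [float(rating)]


print(rating_as_one_hot(4))     # [0, 0, 0, 1, 0]
print(rating_as_continuous(4))  # [4.0]
```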

Don’t mix magic numbers with data

{
  "transactionId": 42,
  "name": "Ice Cream",
  "price": 2.50,
  "tags": ["cold", "dessert"],
  "servedBy": {
    "employeeId": 72365,
    "waitTime": 1.4,
    "customerRating": -1
  },
  "storeLocation": {
    "latitude": 35.3,
    "longitude": -98.7
  }
}

Here -1 is a magic number meaning "no rating". Instead, encode the rating together with a separate "has rating?" indicator:

[..., 0,0,0,1,1, ...]  # 4
[..., 0,0,0,0,0, ...]  # -1
(OR)
[..., 4,1, ...]  # 4
[..., 0,0, ...]  # -1

[Diagram: for each data point, "Has rating?" branches to "No rating" or "Rating".]
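The continuous variant above, with the rating and a separate indicator, can be sketched like this. The `MISSING` constant and `rating_features` helper are hypothetical names for illustration.

```python
MISSING = -1  # magic value used in the raw data for "no rating"


def rating_features(raw_rating):
    """Return [rating, has_rating] instead of passing the magic number through."""
    if raw_rating == MISSING:
        return [0.0, 0.0]
    return [float(raw_rating), 1.0]


print(rating_features(4))   # [4.0, 1.0]
print(rating_features(-1))  # [0.0, 0.0]
```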

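Discretize floats whose magnitude is not meaningful (such as latitude)

import numpy as np
from tensorflow.contrib import layers  # TensorFlow 1.x

lat = layers.real_valued_column('latitude')
dlat = layers.bucketized_column(lat, boundaries=np.arange(32, 42, 1).tolist())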
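What bucketizing does can be sketched with the standard `bisect` module, using the same boundaries (32, 33, ..., 41). Each value maps to a bucket index: 0 for values below 32, 1 for [32, 33), and so on up to 10 for values of 41 or more. The `bucketize` helper is a hypothetical stand-in, not the TensorFlow implementation; the bucket index would then typically be one-hot encoded.

```python
import bisect

# Same boundaries as np.arange(32, 42, 1).tolist(): 32, 33, ..., 41
BOUNDARIES = list(range(32, 42))


def bucketize(value, boundaries=BOUNDARIES):
    """Return the index of the half-open bucket [b_i, b_i+1) that contains `value`."""
    return bisect.bisect_right(boundaries, value)


print(bucketize(35.3))  # 4, the bucket [35, 36)
```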

Crazy outliers hurt trainability

[Figure: histogram of "Rooms Per Person", including an outlier of 50 rooms per person(!?), next to "Capped Rooms Per Person", the same feature capped to a max of 4.0]

features['capped_rooms'] = tf.clip_by_value(
    features['rooms'],
    clip_value_min=0.0,
    clip_value_max=4.0
)

Split the dataset and experiment with models

[Diagram: the dataset is split into a training set and a validation set]
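A common way to make the split repeatable is to hash a key of each record and assign it by bucket, so the same record always lands in the same split. In the sketch below, the `is_validation` helper, the 20% validation share, MD5 as the hash and integer keys are all assumptions for illustration.

```python
import hashlib


def is_validation(key, validation_percent=20):
    """Deterministically assign a record to the validation split by hashing its key."""
    bucket = int(hashlib.md5(str(key).encode("utf-8")).hexdigest(), 16) % 100
    return bucket < validation_percent


records = range(1000)  # stand-in for record keys
validation = [r for r in records if is_validation(r)]
training = [r for r in records if not is_validation(r)]
print(len(training), len(validation))  # roughly 800 / 200
```

In practice, hash a field that groups related records, so near-duplicates do not end up on both sides of the split.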


You can use the validation dataset to experiment with model complexity

1. Start with a model with random weights.
2. Train: calculate the error on the training dataset, then change the weights wi so that the error goes down.
3. Evaluate: is the model good enough?
   ● Yes! Use the model for prediction.
   ● No. Keep iterating.

Beware of overfitting as you increase model complexity.

[Figure: underfit vs. overfit]

To evaluate the final model there are two options

Option 1: Cross-validate. Split the dataset into training and validation data several times, using a different validation portion each time.

Option 2: Use independent test data. Train and validate on an experimental dataset (training + validation data), and keep a separate test set for evaluating the final model.
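Option 1 can be sketched by generating k-fold index splits: each fold serves as validation data exactly once while the rest is used for training. The `k_fold_splits` helper is a hypothetical illustration; shuffling the data before splitting is assumed to happen elsewhere.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for k-fold cross-validation."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n))
        yield training, validation
        start += size


for train_idx, val_idx in k_fold_splits(10, 3):
    print(len(train_idx), len(val_idx))
# 6 4
# 7 3
# 7 3
```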

Lab: Dataset Design

● Complete Exercises 8-10
● Time: 30 min

Companies / Mentors pairing

Company | Mentor | Design Review
Artio | David Kas | Janis Klaise
Cognism | Matthias Feys | Zack Ackil
DataDome | Krzystof Suwada | Dan Anghel
Dream Agility | Dan Anghel | Hoi Lam
Lab Geni | Barbara Fusinska | David Kas
Panopy | Will Fletcher | Will Fletcher
Personalyze.ai | Zack Ackil | Matthias Feys

Dan Anghel
Strategic Cloud Engineer,
Google

Production ML Systems

Production ML System

[Diagram: ML code surrounded by the other components of a production ML system: data collection, data verification, feature extraction, machine resource management, analysis tools, serving infrastructure, process management tools, configuration and monitoring]

Reuse generic systems wherever you can

[Diagram: your ML model alongside generic systems such as Spark, Hadoop, TF Serving and Dataflow]

GCP tools support end-to-end production ML models

[Diagram: the production ML components (data collection, data verification, machine resource management, serving infrastructure, feature extraction, ML code, analysis tools, process management, configuration, monitoring) mapped to GCP tools: Cloud Dataflow, serverless infrastructure, Cloud Machine Learning Engine, TensorBoard, Cloud Datalab, Cloud Composer (upcoming) and Stackdriver]

Production ML systems on Google Cloud

● Ingest/Capture: Google App Engine, Google Cloud Logging, Google Cloud Pub/Sub
● Process: Cloud Dataflow, Cloud Dataprep, Cloud Dataproc
● Store: BigQuery Storage (tables), Cloud Storage (objects)
● Analyze: BigQuery
● Learn: ML Engine, Vision API, Text API, Speech API
● Visualize: 3rd-party tools (Tableau, Qlik, iCharts) for data analysts; Cloud Datalab for data scientists; smart apps
Introduction to Dataflow

Lab: ML Production Systems

● Complete Exercise 11
● Time: 1.5 h

[Diagram: Collection of data → Model training → Model evaluation → Invoking predictions]

Next Steps & Startup Ecosystem Wrap Up

Buzzword Bingo

MACHINE LEARNING
OPERATIONAL INTELLIGENCE
BIG DATA ANALYTICS
STREAMING ANALYTICS
ARTIFICIAL INTELLIGENCE
NEURAL NETWORKS
IT OPERATIONS ANALYTICS
BLOCKCHAIN
DATA SCIENCE
BIG DATA

Data science is hard

Paper, Rock, Scissors of AI

Data beats algorithm.
Simple algorithm beats model.
Model beats hyperparameter tuning.

[Diagram: glove sensor data → linear transformation → probability]

Google Cloud Next
10-11 Oct ‘18
UK-LON

Calls to Action

1) Registration for Next


a) g.co/next18/london
b) NEXTLONDON-DAY2-SRTUP-214
2) Office Hours and Architecture & Design Sessions
a) http://bit.ly/nextofficehours
3) Startup Party
a) http://bit.ly/nextstartupparty

Testing Cloud???

https://goo.gl/hY72bR

Next Steps

● Have a clearly defined ML problem
● Have training data with labels accessible to you and/or your mentors
● Install TensorFlow on your computer
● Have a well-defined ML production system (from data collection, training and visualization to serving infrastructure)
● We strongly recommend that you start your development ahead of the Kickstarter day
● Email your completed worksheet (with your company name) to rosei@google.com by Friday COB this week!

Next Steps

● Questions?

● Feedback survey

Appendix

Google Cloud Startup Program types
Start on Google, Stay with Google

Start Program
● Idea or MVP companies
● $3,000 in GCP and Firebase credits
● Invitations to Google Cloud events
● Training through Code Labs
● 6 months free of Hire (US ONLY)
● $500 in Qwiklabs credits
● 5 free users on G Suite Basic for new Accounts
● Easy Transition to Spark Program

Spark Program
● Seed stage companies
● $20,000 in Google Cloud Platform and Firebase credits for 1 year
● Invitations to Google Cloud events
● Weekly technical office hours
● Training through Code Labs
● 6 months free of Hire (US ONLY)
● $500 in Qwiklabs credits
● $200/month Google Maps API for 1 year
● Up to 10 free users on G Suite Basic for 1 year for new G Suite Accounts
● Easy Transition to Surge Program

Surge Program
● Series A companies
● $100,000 in Google Cloud Platform and Firebase Credits for 2 years
● Invitations to Google Cloud events
● Weekly technical office hours
● 1:1 architecture reviews
● Training through Code Labs
● 6 months free of Hire (US ONLY)
● $500 in Qwiklabs credits
● $600/month Google Maps API for 1 year
● Up to 10 free users on G Suite Basic for 1 year for new G Suite Accounts

3. Features need to be numeric because neural networks are weighing and adding machines

[Figure: a two-layer neural network; inputs x are multiplied by weights (matmul) and a bias b is added]
Image source: https://commons.wikimedia.org/wiki/File:Two_layer_ann.svg
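To see why, the sketch below computes one layer's output as weighted sums plus biases (the `matmul` and `b` in the figure) in plain Python. The weights, bias and input values are made up for illustration; a raw string such as "cold" cannot be multiplied by a weight, which is why non-numeric features must be encoded first.

```python
def dense(x, weights, bias):
    """Compute y = W·x + b: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


x = [2.0, 1.0]                 # two numeric input features
W = [[0.5, -1.0], [1.0, 2.0]]  # illustrative weights, one row per output
b = [0.5, 0.0]                 # illustrative biases

print(dense(x, W, b))  # [0.5, 4.0]
```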

Quiz: Which of these are numeric?

Features of a discount coupon, used to predict the number of coupons that will be used. Numeric?

● Percent value of the discount (e.g. 10% off, 20% off, etc.)
● Size of the coupon (e.g. 4 cm², 24 cm², 48 cm², etc.)
● Font an advertisement is in (Arial, Times New Roman, etc.)
● Color of coupon (red, black, blue, etc.)
● Item category (1 for dairy, 2 for deli, 3 for canned goods, etc.)

Note: Non-numeric features can be used; we just need to find a way to represent them in numeric form.
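Consider both real-time and historical data when building data processing pipelines

[Diagram: Cloud Pub/Sub, BigQuery and Cloud Storage connected through Cloud Dataflow]

Pipeline p = Pipeline.create();

p.begin()
    // Read real-time messages from a Pub/Sub topic
    .apply(PubsubIO.readStrings().fromTopic(topic))
    // Group elements into 60-minute sliding windows
    .apply(Window.<String>into(SlidingWindows.of(Duration.standardMinutes(60))))
    // Filter1, Group1, Filter2 and Transform1 are placeholder transforms defined elsewhere;
    // Transform1 is assumed to output TableRow elements
    .apply(ParDo.of(new Filter1()))
    .apply(new Group1())
    .apply(ParDo.of(new Filter2()))
    .apply(new Transform1())
    // Write the results to a BigQuery table
    .apply(BigQueryIO.writeTableRows().to(tbl));

p.run();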
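Sliding windows overlap, so a single event belongs to several windows. The Java snippet does not set a period, so Beam applies a default one; the sketch below uses a 15-minute period as an arbitrary assumption and computes, in plain Python, which 60-minute windows aligned to the epoch contain a given timestamp. The `sliding_windows` helper is hypothetical and only illustrates the windowing idea, not Beam's implementation.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def sliding_windows(ts, size=timedelta(minutes=60), period=timedelta(minutes=15)):
    """Return the [start, end) windows of length `size`, one starting every
    `period` (aligned to the epoch), that contain timestamp `ts`."""
    start = ts - (ts - EPOCH) % period  # latest window start <= ts
    windows = []
    while start > ts - size:            # [start, start + size) still covers ts
        windows.append((start, start + size))
        start -= period
    return windows


event_time = datetime(2018, 10, 10, 9, 20)
for start, end in sliding_windows(event_time):
    print(start.time(), "-", end.time())
# 09:15:00 - 10:15:00
# 09:00:00 - 10:00:00
# 08:45:00 - 09:45:00
# 08:30:00 - 09:30:00
```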

By applying the same transform code to historical data as to
real-time data, you can limit training-serving skew

(Same Beam pipeline and diagram as on the previous slide.)
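The core idea can be sketched in plain Python: define the feature transform once and call the same function from both the batch (historical) path and the streaming (real-time) path. The `transform` and `on_message` functions, the field names and the 4.0 cap on wait time are hypothetical choices for illustration.

```python
def transform(record):
    """Hypothetical shared feature transform, used by both paths below."""
    rating = record.get("customerRating", -1)
    return {
        "price": float(record["price"]),
        "capped_wait": min(max(float(record["waitTime"]), 0.0), 4.0),
        "has_rating": 0.0 if rating == -1 else 1.0,
    }


# Batch path: historical records (e.g. exported from BigQuery) for training.
historical = [
    {"price": 2.5, "waitTime": 1.4, "customerRating": 4},
    {"price": 8.99, "waitTime": 12.0},
]
training_features = [transform(r) for r in historical]


# Streaming path: one real-time message (e.g. from Pub/Sub) at a time.
def on_message(record):
    return transform(record)


print(on_message(historical[1]) == training_features[1])  # True
```

Because both paths call the same function, a change to the feature logic cannot silently apply to one path and not the other.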

Three potential architectures for dynamic training

● Cloud Functions for asynchronous training jobs
● App Engine for user-triggered training jobs
● Cloud Dataflow for continuous training

Cloud Functions can launch asynchronous training jobs

1. New file in Cloud Storage
2. Cloud Function launched
3. Start Cloud MLE training job
4. MLE writes new model

App Engine can be used for user-triggered training jobs

1. User makes web request
2. MLE training job launched
3. MLE writes new model
4. Statistics on Cloud MLE

Dataflow can be used for continuous training

1. Messages into Pub/Sub
2. Messages aggregated with Dataflow
3. Aggregated data is stored into BigQuery
4. Cloud MLE launched on new data in BigQuery
5. Updated model is deployed

Lab: Design a system architecture for dynamic models

Assume that you are building an ML model to predict tomorrow’s widget demand based on the last 7 days of observations.

● How would you build this model if it were static? Do a napkin sketch.
● Repeat the napkin sketch of the system architecture, but for a dynamic model.
● Think carefully about what data to collect and what performance to monitor.
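Whatever the architecture, each training example for this lab pairs the last 7 days of demand with the next day's demand. The sketch below builds such examples from a daily series; the `make_examples` helper and the demand numbers are hypothetical. A static model would run this once over historical data, while a dynamic model would rerun it as new days arrive.

```python
def make_examples(daily_demand, window=7):
    """Turn a daily series into (last `window` days, next day's demand) pairs."""
    return [(daily_demand[i - window:i], daily_demand[i])
            for i in range(window, len(daily_demand))]


demand = [10, 12, 11, 13, 15, 14, 16, 18, 17]
for features, label in make_examples(demand):
    print(features, "->", label)
# [10, 12, 11, 13, 15, 14, 16] -> 18
# [12, 11, 13, 15, 14, 16, 18] -> 17
```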

Lab: Implementing a Machine Learning model in TensorFlow

In this lab, you will implement a simple machine learning model using the Estimator API:

● Train from data in a pandas DataFrame
● Implement a Linear Regression model in TensorFlow
● Train and evaluate the model
● Repeat with a Deep Neural Network model in TensorFlow

Recall that there are three possible places to do feature engineering, and that tf.transform is a hybrid of #1 and #2

[Diagram: Inputs → Pre-processing → Feature creation → Train model → Model, with hyper-parameter tuning. Feature engineering can happen in:
1. TensorFlow (feature creation in the input_fn)
2. Dataflow (pre-processing that produces preprocessed features)
3. tf.transform]

tf.transform is a hybrid of Beam and TensorFlow

There are two PTransforms: AnalyzeAndTransformDataset and TransformDataset.

These correspond to two phases:
1. Analysis phase (compute min/max/vocab etc. using Beam)
   ○ Executed in Beam while creating the training dataset
2. Transform phase (scale/string->int etc. using TensorFlow)
   ○ Executed in TensorFlow during prediction
   ○ Executed in Beam to create training/evaluation datasets
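The two phases can be mimicked with a toy class: `analyze` makes a full pass over the training data to compute statistics (min/max and a vocabulary), and `transform` applies those stored statistics to any single row, whether training, evaluation or serving. This is a plain-Python analogy under stated assumptions, not the tf.Transform API; the class name and fields are hypothetical.

```python
class TwoPhaseTransform:
    """Toy analyze/transform split, loosely mirroring tf.Transform's two phases."""

    def analyze(self, rows):
        """Analysis phase: compute full-pass statistics over the training data."""
        fares = [r["fare_amount"] for r in rows]
        self.fare_min, self.fare_max = min(fares), max(fares)
        days = sorted({r["dayofweek"] for r in rows})
        self.vocab = {day: i for i, day in enumerate(days)}
        return self

    def transform(self, row):
        """Transform phase: apply the stored statistics to a single row."""
        span = self.fare_max - self.fare_min
        return {
            "fare_scaled": (row["fare_amount"] - self.fare_min) / span if span else 0.0,
            "dayofweek_id": self.vocab.get(row["dayofweek"], -1),  # -1 = unseen value
        }


train = [{"fare_amount": 5.0, "dayofweek": "Mon"},
         {"fare_amount": 25.0, "dayofweek": "Tue"}]
t = TwoPhaseTransform().analyze(train)  # once, on training data only
print(t.transform({"fare_amount": 15.0, "dayofweek": "Tue"}))
# {'fare_scaled': 0.5, 'dayofweek_id': 1}
```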

First, set up the schema of the training dataset

import tensorflow as tf
# dataset_schema / dataset_metadata come from older tf.Transform releases
from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema

# TensorFlow type for each input column
raw_data_schema = {
    colname: dataset_schema.ColumnSchema(tf.string, ...)
    for colname in 'dayofweek,key'.split(',')
}
raw_data_schema.update({
    colname: dataset_schema.ColumnSchema(tf.float32, ...)
    for colname in 'fare_amount,pickuplon, … ,dropofflat'.split(',')
})
# Use the schema to create a metadata "template"
raw_data_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.Schema(raw_data_schema))

Next, run the analyze-and-transform PTransform on the training dataset to get back the preprocessed training data and the transform function

raw_data = (p
            # 1. Read in data as usual for Beam
            | beam.io.Read(beam.io.BigQuerySource(query=myquery, use_standard_sql=True))
            # 2. Filter out data that you don’t want to train with
            | beam.Filter(is_valid))

# 3. Pass raw data + metadata template to AnalyzeAndTransformDataset
# 4. Get back the transformed dataset and a reusable transform function
transformed_dataset, transform_fn = (
    (raw_data, raw_data_metadata)
    | beam_impl.AnalyzeAndTransformDataset(preprocess))

Write out the preprocessed training data into TFRecords, the most efficient format for TensorFlow

# Unpack the transformed dataset into the data and its metadata
transformed_data, transformed_metadata = transformed_dataset
_ = transformed_data | tfrecordio.WriteToTFRecord(
    os.path.join(OUTPUT_DIR, 'train'),  # filenames will look like train-00003-of-00015
    # Note that we use the metadata schema here
    coder=example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))

The preprocessing function is restricted to functions you can call from a TensorFlow graph

import tensorflow as tf
import tensorflow_transform as tft

def preprocess(inputs):
    import datetime
    result = {}
    # Create features from the inputs
    result['fare_amount'] = inputs['fare_amount']
    result['dayofweek'] = tft.string_to_int(inputs['dayofweek'])  # vocabulary
    ...
    result['dropofflat'] = tft.scale_to_0_1(inputs['dropofflat'])  # scaling
    result['passengers'] = tf.cast(inputs['passengers'], tf.float32)  # other TensorFlow functions
    return result

Writing out the eval dataset is similar, except that we reuse the transform function computed from the training data

raw_test_data = (p
                 # Unique labels avoid clashing with the training-data transforms
                 | 'eval_read' >> beam.io.Read(beam.io.BigQuerySource(...))
                 | 'eval_filter' >> beam.Filter(is_valid))

# Apply the transform function computed on the training data (no re-analysis)
transformed_test_dataset = (((raw_test_data, raw_data_metadata), transform_fn)
                            | beam_impl.TransformDataset())
transformed_test_data, _ = transformed_test_dataset
_ = transformed_test_data | tfrecordio.WriteToTFRecord(
    os.path.join(OUTPUT_DIR, 'eval'),
    coder=example_proto_coder.ExampleProtoCoder(
        transformed_metadata.schema))

For serving, we need to write out the transformation metadata

_ = (transform_fn
     | transform_fn_io.WriteTransformFn(os.path.join(OUTPUT_DIR, 'metadata')))
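The point of writing the transform out is that serving must reuse the statistics computed at training time rather than recompute them. The sketch below illustrates that idea with a JSON file; this is an analogy only, since tf.Transform saves the transform function as a TensorFlow graph, not JSON. The file name, statistics and helper names are hypothetical.

```python
import json
import os
import tempfile


def write_transform_metadata(stats, output_dir):
    """Save statistics computed during analysis so serving can reuse them."""
    path = os.path.join(output_dir, "transform_metadata.json")
    with open(path, "w") as f:
        json.dump(stats, f)
    return path


def serve_transform(raw, stats):
    """Apply the saved statistics to one raw serving request."""
    span = stats["fare_max"] - stats["fare_min"]
    vocab = stats["dayofweek_vocab"]
    day = raw["dayofweek"]
    return {
        "fare_scaled": (raw["fare_amount"] - stats["fare_min"]) / span,
        "dayofweek_id": vocab.index(day) if day in vocab else -1,
    }


with tempfile.TemporaryDirectory() as output_dir:
    # Training side: statistics computed from the training data only.
    path = write_transform_metadata(
        {"fare_min": 5.0, "fare_max": 25.0, "dayofweek_vocab": ["Mon", "Tue"]},
        output_dir)
    # Serving side: load the same statistics instead of recomputing them.
    with open(path) as f:
        loaded = json.load(f)

print(serve_transform({"fare_amount": 15.0, "dayofweek": "Tue"}, loaded))
# {'fare_scaled': 0.5, 'dayofweek_id': 1}
```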

Change our input function to read TFRecord, not CSV (1/2)

def make_input_fn(data_paths, num_epochs, batch_size, mode):
    def input_fn():
        # Start with the schema (same as during preprocessing)
        input_schema = {}
        input_schema[LABEL_COLUMN] = tf.FixedLenFeature(
            shape=[1], dtype=tf.float32, default_value=0.0)
        for name in ['dayofweek', 'key']:
            input_schema[name] = tf.FixedLenFeature(
                shape=[1], dtype=tf.string, default_value='null')
        ...

Change our input function to read TFRecord, not CSV (2/2)

        # gzip_reader_fn, add_engineered, LABEL_COLUMN and KEY_FEATURE_COLUMN
        # are defined elsewhere in the lab code
        keys, features = tf.contrib.learn.io.read_keyed_batch_features(
            data_paths[0] if len(data_paths) == 1 else data_paths,
            batch_size,
            input_schema,
            reader=gzip_reader_fn,
            reader_num_threads=4,
            queue_capacity=batch_size * 2,
            randomize_input=(mode != tf.contrib.learn.ModeKeys.EVAL),
            num_epochs=(1 if mode == tf.contrib.learn.ModeKeys.EVAL else num_epochs))
        target = features.pop(LABEL_COLUMN)
        features[KEY_FEATURE_COLUMN] = keys
        return add_engineered(features), target
    return input_fn

Closing and Q&A
with David Roldan

Next Steps by 9th September

● Have a clearly defined ML problem
● Have training data with labels accessible to you and/or your mentors
● Install TensorFlow on your computer
● Set up access to Colab
● Have a well-defined ML production system (from data collection, training and visualization to serving infrastructure)
● We strongly recommend that you start your development ahead of the Kickstarter day

Omitted slides

What do these search queries have in common?

● Japanese toys in san francisco
● Buy live lobster in kissimmee fl
● Bee hive removal pasadena md
● Vegan donuts near me

ML converts examples into knowledge

Query: "Coffee near me"
● Bill’s Diner: 3 min away
● Anna’s Gourmet Cafe: 5 min away

ML converts examples into knowledge

Query: "Coffee near me"
● Bill’s Diner: 3 min away
● Anna’s Gourmet Cafe: 15 min away

How far will you go for gourmet coffee?

[Figure: user happiness vs. distance, from 1 km to 10 km]
Good learning involves blending all the users’ preferences

[Figure: user happiness vs. distance (1 km to 10 km), with a "Product excellence" curve]

ML can be used to solve many problems for which you are writing rules today

Rule-based application (query → rules → response):
● Code up rules based on human expertise.
● Apply the rules program to make decisions.
● Add new rules in response to bug reports.

Model (query → model → response):
● Train a model based on data.
● Deploy the model at scale to make predictions.
● Continuously train the model on data.


Machine learning scales better than hand-coded rules

query = ‘Giants’
● user location = ‘Bay Area’? → results about SF Giants
● user location = ‘New York’? → results about NY Giants
● user location = ‘other’? → results about giants

RankBrain (a deep neural network for search ranking) improved performance significantly

Search: machine learning for search engines
● #3 signal for Search ranking, out of hundreds
● #1 improvement to ranking quality in 2+ years