
Machine Learning

for the Enterprise


Presented by Matt Kirk
YourChiefScientist.com
Sponsored by IBM
How important is
machine learning to you?
On a scale of one to ten
Why wasn’t
that number lower?
What is your motivation?
Big Data, Big Problems, Big
Algorithms, Big Algos, Big
Business, Big Innovation, Big
Ethics, Biggie Smalls, Big The
Movie.
71 uses of Machine Learning
Amazon recommendations
Nest Thermostat
Spotify Discover Weekly
Facebook lookalike audiences
Amazon photos organization
Siri, Google, Cortana, Alexa
Chat bots and implied intent
AlphaGo
MarI/O or tool assisted gaming
Yelp photo aggregation thematic clusters
Google autocomplete
Sense energy meter
Hedge funds and Renaissance Technologies
Fraud detection
Check deposit recognition
Policing with data
Netflix recommendations
Better pricing with Airbnb
Painting with neural nets
Predict maintenance problems
Robotic Process Automation at FortressIQ
Detecting defects with images like food, cracks in subfloors, etc.
Recommended connections
FB News feed
Learned database indexes
Personalization with Sailthru
A/B testing with Optimizely
Malware detection at Malwarebytes
Facial detection at border crossings with Rekognition
Self-driving cars with Tesla
Deep fakes
Better workouts with data at Volt
Summarizing legalese with LegalRobot
Preventing money laundering at PayPal
Verifying identity at SheerID
How the Oakland A's won with Moneyball
Emotions and sentiment with IBM Watson
Age detection using Rude Carnie
Waze
Semantic similarity using word2vec
Playing music
Customer lifetime value calculation
What ads to serve to whom, and when?
Forecasting the weather
Better insurance by monitoring your driving patterns
Object detection with YOLO and R-CNNs
Learn where military bases are (Strava accidentally gave this out)
Anomaly detection in technical trading
Detecting sexual orientation with computer vision
InnerEye
23andMe relatives
Predicting lost connections in airlines
Reducing emissions with deep learning at Siemens
Predicting delivery time at Postmates
Why do summer songs all sound the same?
Roombas
Forthcoming article: YourChiefScientist.com
Big Data, Big Problems, Big
Algorithms, Big ML, Big Business,
Big Innovation, Big Ethics, Biggie
Smalls, Big The Movie.
Why you are here (roll call)
A CTO who wants to use Machine Learning at their company?

A business analyst who wants to utilize algorithms?

A developer who wants to roll machine learning into their toolkit?

A data scientist who wants to learn more about the business?

A lifelong learner who is ready to dive in head first?


What this says about you
You’re committed to invest in your own knowledge

You like a good challenge

You’re ready to see whether this ML stuff is hype or !hype

Somebody’s gotta do something with all this data


Akrasia
Top 4 reasons why not
1. I don’t have enough time to learn everything that’s needed

2. I don’t understand how to deploy it

3. Machine learning is overhyped.

4. Machine learning causes more harm than good


I don’t have enough time
This individual believes...
To become an effective ML engineer you need to
1. Relearn
a. C/C++
b. Distributed computing
c. CUDA, OpenMP, MPI
2. Math
a. Calculus
b. Linear Algebra
c. Probability and statistics
3. Get a Ph.D in ML
4. Re-implement all the machine learning algorithms by hand
“if you skip them, you won't be able to do
any serious work in machine learning, or
even understand latest papers.
You will be a script kiddie, not a hacker.”
I don’t know how to deploy it
Machine learning is overhyped
Machine learning causes harm
I don’t blame you….
If you followed the common sense advice…
you’d become ML powered in 2057.
We are going to break
through this
Why should you listen to me?
➔ Over 10 years doing this
➔ 2 published O’Reilly books on ML
➔ I have a master’s degree in CS from GaTech
➔ Believability over track record
3 person startups
Unicorn startups
Enterprises (Finance Industry)
➔ I’m here to support your growth.
Why IBM?
➔ They did all the hard work
so you don’t have to
➔ Watson Studio is simple…
as you will see
➔ Been involved with AI
since the beginning
➔ Watson’s track record
Matt Kirk, YourChiefScientist.com
Miguel Maldonado, IBM – AI Expert Labs
Top 4 reasons why not
1. I don’t have enough time to learn everything that’s needed

2. I don’t understand how to deploy it

3. Machine learning is overhyped.

4. Machine learning causes more harm than good


Top 4 reasons why now.
1. IBM Watson did all the work for us.

2. Watson cloud takes care of deployment.

3. The cloud is obviously not hype.

4. IBM’s commitment to ethical usage of data.


Ready, Fire, Aim.
With IBM’s support.
What this course looks like
Introduction to Machine Learning 9:00 - 10:00
Stretch 10:00 - 10:05
Visualizations 10:05 - 11:30
Break, stretch, use the restroom 11:30 - 11:45
Predicting Churn 11:45 - 1:15
Lunch 1:15 - 2:15
Decision optimization 2:15 - 3:45
Break, stretch, use the restroom 3:45 - 4:00
Wrap up, freebies, and thank you 4:00 - 5:00
How we will achieve this
1. Lecture
2. Quiz
3. Demonstrations and Discussion
4. Labs
5. Stretches and Breaks
Lecture (Learn by Listening)
It won’t be academic.

It will be about introducing a mindset behind the theory.

As well as practical applications of it.

Write your questions down to ask during Demo & Lab time.
Quiz (Retain that knowledge)
Simple poll of the workshop to retain the knowledge.

These serve as a way for you to remember key insights about what
we’re talking about.
Discussion (Learn by Reflection)
The purpose of the demo is to introduce and guide you on the Lab
sections.

I will introduce the data and general purpose of the lab sections.

Also I will give you helpful guidance and things to experiment with.
Lab (Learn by Doing)
This is the section where you get to learn and try out what we’ve
been working with.

This is meant for you to experiment.

Try things, fail, and if you don’t finish it all in 1.5 hours that’s ok.

I’m here to help and will be bouncing around.


Breaks
By the end of this class you will
➔ Have spent the time to know enough machine learning
➔ Understand how to go about deploying into the real world
➔ Understand why ML isn’t just hype
➔ Understand how to use ML for good, not harm
➔ Learn the tacit knowledge not taught at universities
What Machine Learning is…
and is not
● What is it?
● Who is using it?
● The classes
● Relation to Artificial Intelligence
● Relation to Optimization
● What math should I know?
● Benefits / Difficulties
What is machine learning?
A toolkit of algorithms that finds insight from data.
Deductive reasoning
➔ Philosophical Logic
Modus Ponens
Syllogism
Contrapositives

➔ Economics
➔ Political Science
➔ Theory of Rationality
Problems with deduction
➔ Predictable irrationality

➔ Logical fallacies

➔ Local minima
Inductive reasoning
➔ Evidence based
➔ Statistics
➔ Generalizations
➔ Statistical Syllogisms
➔ Proof by Induction
➔ Prediction
➔ Analogies
➔ Causal inference
Problems with induction
➔ Biases
Confirmation bias
Attribution bias
Favoritism
Inductive bias
Racism
Sexism
➔ Black Swans
➔ Not all variables are available
Weapons of Math Destruction
Author Cathy O'Neil

Creditworthiness → Confirmation Bias

Racism → Attribution Bias

Sexism → Generalizations
Who uses machine learning?
Social Media
➔ Facebook newsfeed

➔ Yelp photo aggregation

➔ Waze

➔ Strava

➔ What kind of ads to serve to whom


Healthcare
➔ InnerEye

➔ Better workouts with Volt athletics

➔ 23andme relatives

➔ Robotic procedures
Retail
➔ Target targeting pregnant women

➔ Recommendation engines

➔ Personalized shopping feeds

➔ Predicting customer lifetime value

➔ Fraud detection
Finance
➔ Hedge Funds like Renaissance Tech
➔ Anomaly detection in technical trading
➔ Executing trades effectively at JP Morgan
➔ Feature extraction of SEC filings
➔ Fraud detection
➔ Check deposit detection
Marketing
➔ Look-alike audiences

➔ Advertisements

➔ Personalization in marketing

➔ Thematic clustering
Data security
➔ Malware detection

➔ Cloudflare

➔ Learned network topologies

➔ DDOS mitigation

➔ Fake news detection with Grover


3 Classes of ML Algorithms
➔ Supervised Learning
Finding a function that maps data to values based on previous observations
f(x) = y
➔ Unsupervised Learning
Algorithm looks for patterns in the data without any guidance of values
f(x) = x
➔ Reinforcement Learning
Algorithm looks to maximize rewards over a time period, given previous observations
max f_t(x) = y_t
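To make the first two classes concrete, here is a minimal sketch in plain Python. The toy data, function names, and the choice of a least-squares line (supervised) and a one-dimensional 2-means clustering (unsupervised) are illustrative assumptions, not part of the workshop labs.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised: learn f(x) = slope * x + intercept from labeled pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two clusters.

    Assumes the points contain at least two distinct values.
    """
    low, high = min(points), max(points)  # initial centroids
    for _ in range(iterations):
        near_low = [p for p in points if abs(p - low) <= abs(p - high)]
        near_high = [p for p in points if abs(p - low) > abs(p - high)]
        low, high = mean(near_low), mean(near_high)
    return low, high


# Toy data (hypothetical): the labels follow y = 2x + 1 exactly.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Toy data (hypothetical): two obvious groups, no labels given.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids = two_means(points)
```

The supervised function needs the answers (ys) to learn from; the clustering function only ever sees the points themselves.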


So what is Deep Learning?

Large artificial neural networks that learn their own features.


“Large neural nets”
Andrew Ng
“Feature Learning”
Yoshua Bengio
Information Bottleneck
Machine Learning
Finding insight from a mountain of data
➔ Unsupervised Learning
Clustering ⟷ Group data
Matrix Factorization ⟷ Generalize data
Autoencoders ⟷ Generalize data
➔ Supervised Learning
Classify ⟷ Analogies
Regression ⟷ Predict
➔ Reinforcement Learning
Policy Iteration ⟷ Statistical Syllogism
Artificial Intelligence
Artificial Intelligence
➔ Planning
➔ Expert Systems
➔ Ontologies
➔ Perception
➔ Mostly deductive reasoning
➔ Heuristics
➔ Learning, Machine Learning
What is AI?
Codifying thought:
+ Induction
+ Deduction
+ Line between human and computer goes to zero
What is AI?
Journey To AI
AI

INFUSE – Operationalize AI with trust and transparency

ANALYZE - Scale insights with AI everywhere

ORGANIZE - Create a trusted analytics foundation

COLLECT - Make data simple and accessible


Optimization theory
What is Optimization theory
➔ Find the best element from some set of alternatives
➔ Sometimes with constraints
➔ Shows up in:
Queueing theory
Finance
Operations research
Manufacturing
Derivative Optimization
➔ Fermat showed that the
extreme points of a function
exist where the derivative
vanishes (1636)

➔ Linear Optimization
(Dantzig + von Neumann)
Computational Optimization
Using computers to find the best:
➔ Hill Climbing
➔ Genetic Algorithms
➔ Simulated Annealing
➔ LIPO
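As a rough illustration of the simplest method on this list, here is a hill climbing sketch in plain Python. The objective function, step size, and iteration count are assumptions chosen for the example.

```python
import random


def hill_climb(f, start, step=0.1, iterations=1000, seed=0):
    """Maximize f by trying a random nearby point and keeping it if it is better."""
    rng = random.Random(seed)
    best_x, best_value = start, f(start)
    for _ in range(iterations):
        candidate = best_x + rng.uniform(-step, step)
        value = f(candidate)
        if value > best_value:
            best_x, best_value = candidate, value
    return best_x, best_value


def objective(x):
    # Hypothetical objective with a single peak at x = 3.
    return -(x - 3) ** 2


best_x, best_value = hill_climb(objective, start=0.0)
```

For this objective the derivative -2(x - 3) vanishes at x = 3, so the search result can be checked against the analytic answer. On a function with several peaks, plain hill climbing can stall on a local one, which is part of the motivation for methods like simulated annealing.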
Optimization / Machine Learning
➔ Optimization
Mathematical techniques to find best option within a given search space or numerical set.
➔ Machine Learning
Engineering techniques to find best option. More folk theorems here, like deep learning architectures.
Optimization Theory: Pure ML
Machine Learning: Applied ML
OK but how much math
do I really need to know?
What should you focus on?
➔ Pure Math
Algebra (Group Theory)
Calculus, real and complex analysis
Geometry, topology
Combinatorics
Logic
Number Theory
➔ Applied Math
Differential equations
Physics
Computer Science
Information Theory
Probability Theory + Statistics
Game Theory
Optimization
Linear Algebra
Machine learning is like a stew
The optimized combination of:
+ Computer Science
+ Information Theory
+ Statistics
+ Probability
+ Domain expertise
Machine learning is like a stew
The optimized combination of:
+ Computer Science
+ Learn a programming language
+ Information Theory
+ Entropy
+ Statistics & Probability
+ Mean, Median, Mode
+ Distributions
+ Domain expertise
Information Theory: Entropy
Weather in Death Valley

It’s always hot there.

Q-Codes: QSL, QRZ

Philips Codes: 73, 88

10 codes: 10-4

Chat Acronyms: lol, rofl

KJ7IUD 73
Information Theory: Entropy
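A small sketch of the idea in Python, assuming Shannon's definition of entropy in bits. The two weather samples are made up for illustration: a forecast that never changes carries no information, while one with many equally likely outcomes carries the most.

```python
from collections import Counter
from math import log2


def entropy(observations):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over the observed outcomes."""
    counts = Counter(observations)
    total = len(observations)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# Hypothetical forecasts: Death Valley is always hot, so a report carries no surprise.
death_valley = ["hot"] * 8
# Hypothetical forecasts for a place where four outcomes are equally likely.
variable_city = ["hot", "cold", "rain", "snow"] * 2

h_death_valley = entropy(death_valley)
h_variable_city = entropy(variable_city)
```

The same intuition sits behind codes like 73 or 10-4: messages that come up all the time carry little surprise, so they can be given very short codes.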
Statistics: Mean, Median, Mode
➔ Mode
The highest probability

➔ Median
50% is below and 50% above

➔ Mean
weighted average
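A quick sketch with Python's standard statistics module shows how the three can disagree. The sample is hypothetical, with one outlier added on purpose.

```python
from statistics import mean, median, mode

# Hypothetical sample: monthly support tickets per customer.
tickets = [1, 2, 2, 3, 4, 5, 18]

typical_mode = mode(tickets)      # most frequent value
typical_median = median(tickets)  # half the values fall below, half above
typical_mean = mean(tickets)      # arithmetic average, pulled up by the outlier 18
```

When the three numbers are far apart, as they are here, the data is skewed and reporting only the mean hides that.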
Distributions to know
➔ Discrete
Poisson: phone calls received
Binomial: heads / tails on a coin
➔ Continuous
Normal: height in a classroom
Uniform: random number
Exponential: particle decay
Discrete Distributions
Poisson Binomial
Continuous Distributions
Normal Uniform Exponential
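One way to get a feel for these five distributions is to draw samples from each. This is a sketch using only Python's standard library; every parameter (rates, means, sample size, seed) is an assumption chosen for illustration, and the Poisson and binomial samplers are hand-rolled helpers rather than library calls.

```python
import math
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable
N = 20_000


def poisson(lam):
    """Knuth's method: multiply uniform draws until the product drops below e^-lam."""
    threshold, product, count = math.exp(-lam), rng.random(), 0
    while product > threshold:
        product *= rng.random()
        count += 1
    return count


def binomial(trials, p):
    """Number of successes in a fixed number of independent yes/no trials."""
    return sum(rng.random() < p for _ in range(trials))


samples = {
    "poisson": [poisson(3) for _ in range(N)],              # calls per hour, rate 3
    "binomial": [binomial(10, 0.5) for _ in range(N)],      # heads in 10 coin flips
    "normal": [rng.gauss(170, 10) for _ in range(N)],       # height in cm
    "uniform": [rng.uniform(0, 1) for _ in range(N)],       # random number in [0, 1]
    "exponential": [rng.expovariate(0.5) for _ in range(N)],  # decay time, rate 0.5
}
sample_means = {name: mean(values) for name, values in samples.items()}
```

The sample means should land close to the theoretical ones (3, 5, 170, 0.5 and 2 for these parameters); plotting a histogram of each list gives the familiar shapes.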
Lying with Statistics
➔ Biased samples
➔ Biased averages
➔ Discarded data
➔ Graph manipulation
➔ The dead cat phenomena
➔ Correlation vs Causation
High interest credit card debt
➔ Entanglement
➔ Hidden Feedback Loops
➔ Undeclared Consumers
➔ Unstable Data dependencies
➔ Underutilized data dependencies
➔ Correction Cascade
➔ Glue Code
➔ Pipeline Jungles
➔ Experimental Code Paths
➔ Configuration Debt
➔ Fixed Thresholds
➔ Correlation Changes

http://bit.ly/1zwONap
Discussion
Time to stretch!
Visualizations

● Relevance to you
● Best practices
● Bad practices
● Analyze Iris with Jupyter notebook and Matplotlib
● Analyze thermostat readings using IBM Watson Studio
● Q&A
What works better?
My own story about
visualizations
The Tibetans believe in 8 senses
The occipital lobe is quite large
Seeing is important to
influence thought
Types of Visualizations
➔ Numerical
➔ Categorical
➔ Combination
➔ Maps
➔ Network
➔ Time series
Great resource: data-to-viz.com
The 6+1 Edward Tufte Principles
1. Show comparisons
2. Show causality
3. Use multivariate data
4. Completely integrate modes
5. Establish credibility
6. Focus on content
7. Ruthless pruning
Comparisons
Causality
Multivariate
Integrated
Credibility
➔ Authenticity with data
Where’d it come from?
How did it get here?
➔ Honesty with data
Telling the actual truth not the expected or perceived truth
➔ Open with data
Missing data
Outliers
Errors: precision, recall, accuracy, F1 score etc.
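For reference, the error metrics named above can be computed from four confusion-matrix counts. This is a minimal sketch using the standard definitions; the counts are hypothetical.

```python
def classification_report(tp, fp, fn, tn):
    """Compute four headline metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of everything flagged, how much was right
    recall = tp / (tp + fn)     # of everything that should be flagged, how much was found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 930 true negatives.
report = classification_report(tp=40, fp=10, fn=20, tn=930)
```

With these counts accuracy is 97% while recall is only about 67%, which is why being open about more than one error number matters when the classes are imbalanced.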
Content
➔ Minimize architecture

➔ Focus on the story

➔ Get rid of chart junk


Ruthless pruning
➔ “Omit needless words”
➔ 5S for visualizing:
Sort it out: get rid of needless pixels
Straighten the data. Set it in order
Sweep it up: cleanse if needed
Standardize how visualization is done
Sustain the beauty
Data to Viz Caveats (top 10)
1. Cut y-axis only when time series
2. Don’t use pie charts
3. Add jitter to Boxplots
4. Avoid error bars on Bar plots
5. Use only one y-axis
6. Simpson’s paradox
7. Comparing apples to oranges (normalize data when needed)
8. Don’t make me think.
9. Compare percentages with log
10. Mind your aspect ratio
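Simpson's paradox is easier to believe once you see the arithmetic. The sketch below uses made-up conversion counts for two hypothetical email variants: A wins inside every segment, yet B wins once the segments are pooled, because the variants were shown to very different mixes of users.

```python
# Hypothetical conversion counts (successes, attempts) for two email variants.
results = {
    "A": {"mobile": (90, 100), "desktop": (300, 1000)},
    "B": {"mobile": (800, 1000), "desktop": (20, 100)},
}


def rate(successes, attempts):
    return successes / attempts


def overall_rate(segments):
    """Pool all segments of one variant into a single conversion rate."""
    successes = sum(s for s, _ in segments.values())
    attempts = sum(a for _, a in segments.values())
    return successes / attempts


for variant, segments in results.items():
    per_segment = {name: rate(*counts) for name, counts in segments.items()}
    print(variant, per_segment, round(overall_rate(segments), 3))
```

A chart that only shows the pooled bars would tell the opposite story from a chart split by segment, so it is worth checking both before publishing either.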
Discussion
Andon Cups
Getting started with Iris

https://bit.ly/2m2r0n3
Visualize Fast with IBM Watson

https://bit.ly/2kszxj0
Churn Prediction

● Relevance to you
● Best practices
● Bad practices
● Using Watson to determine churn
● Discussion & Q&A
Churn at ClickFunnels
Expert Blindspot
[Figure: 2x2 grid. Columns: you think they want / don’t want. Rows: customers want / don’t want.]
➔ Market traction (the sweet spot!): 19% of startup failures are from competitors winning [1]
➔ Missed opportunity
➔ No market need: 42% of startup failures are here [1]
➔ Irrelevant work
➔ Time wasters
[1] Top 20 reasons why startups fail - CB Insights
CRISP-DM &
IBM Data Science &
TACT
CRISP-DM
Data Science Methodology
TACT
[Figure: the TACT cycle (Target, Arrange, Compose, Transmit) across the Customer Realm and the Product Realm]
tactmethod.com
IBM Data Science Methodology
➔ Target the right target
Understand the business
Build your analytic approach
➔ Arrange the data
Gather requirements
Collect data
Understand the data
Prepare the data
➔ Compose a model
Model: AutoML
➔ Transmit
Evaluate
Deploy
Gather feedback
Discussion
Andon Cups
Churn Prediction with IBM

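If the lab notebook fails on this import (the module was removed in scikit-learn 0.20):
from sklearn import cross_validation
replace it with:
import sklearn.model_selection as cross_validation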
https://ibm.co/2mjdNXl
Decision Optimization

● Relevance to you
● Best practices
● Bad practices
● Using Watson to make decisions
● Discussion & Q&A
Emails at Conversica
Principles - Ray Dalio
➔ Bridgewater Associates:
biggest hedge fund.
➔ Runs the company on:
Principles
Algorithms
Recipes
➔ “Coach” app internal to
Bridgewater Associates
Objective
➔ Minimize / Maximize

➔ Argmin / Argmax

➔ Convexity / Non-convex

➔ Continuous / Discrete

➔ Differentiable / Not

➔ Example: Mean in MVO


Constraints
➔ Linear / Non linear

➔ Unconstrained

➔ Example: MVO by Markowitz
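To show how an objective and its constraints fit together, here is a toy mean-variance optimization for two assets in plain Python. All inputs (returns, variances, covariance, risk aversion) are hypothetical, and the grid search is a stand-in for a real solver such as the one used in the lab.

```python
# Hypothetical inputs for two assets: expected returns, variances, covariance.
expected_return = (0.08, 0.12)
variance = (0.04, 0.09)
covariance = 0.006
risk_aversion = 3.0  # assumed trade-off between return and risk


def portfolio_mean(w):
    """Expected return when a fraction w is in asset 1 and 1 - w in asset 2."""
    return w * expected_return[0] + (1 - w) * expected_return[1]


def portfolio_variance(w):
    return (
        w**2 * variance[0]
        + (1 - w) ** 2 * variance[1]
        + 2 * w * (1 - w) * covariance
    )


def objective(w):
    """Mean-variance utility: reward return, penalize risk."""
    return portfolio_mean(w) - 0.5 * risk_aversion * portfolio_variance(w)


# Constraints: the weights sum to 1 (built into the 1 - w term) and
# 0 <= w <= 1 (no short selling), enforced by only searching that grid.
grid = [i / 100 for i in range(101)]
best_w = max(grid, key=objective)
weights = (best_w, 1 - best_w)
```

The objective here is a concave quadratic in w, so a grid search is overkill, but it keeps the roles visible: the objective scores a candidate, the constraints decide which candidates are allowed.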


Best Practices
➔ Sometimes satisfying the constraints is the most important part (e.g. sudoku)
➔ Lean towards convex problems if possible:
Quadratic or linear problems are easy to solve
Mostly convex-looking problems are OK to solve iteratively too, e.g. error minimization
➔ If optimizing iteratively, set a convergence criterion
➔ A global minimum is much harder to find than a local minimum
➔ Remember that good enough is probably better than what we do naturally
with our minds.
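A convergence criterion can be as simple as "stop when the steps get tiny". The sketch below shows this with gradient descent on a convex quadratic; the objective, learning rate, and tolerance are assumptions picked for the example.

```python
def gradient_descent(
    gradient, start, learning_rate=0.1, tolerance=1e-8, max_iterations=10_000
):
    """Minimize a differentiable function, stopping once the steps become tiny."""
    x = start
    for iteration in range(1, max_iterations + 1):
        step = learning_rate * gradient(x)
        x -= step
        if abs(step) < tolerance:  # convergence criterion
            return x, iteration
    return x, max_iterations  # gave up: the iteration cap is a second safeguard


def gradient(x):
    # Gradient of the convex quadratic f(x) = (x - 4)^2 + 1 (a hypothetical objective).
    return 2 * (x - 4)


minimum, iterations_used = gradient_descent(gradient, start=0.0)
```

Because the problem is convex, the local minimum the loop settles into is also the global one. On a non-convex problem the same loop would still stop, just not necessarily at the best point.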
Common pitfalls
➔ Not defining constraints properly (spinning out of control)

➔ Using wrong indicators of success

➔ Making up a problem that doesn’t exist

➔ Not taking into consideration costs to optimization

➔ Forgetting about externalities


Discussion
Andon Cups
Decision Optimization with IBM

https://bit.ly/2kMwe6j
Decision Optimization Examples

https://bit.ly/2mbdMoe
Why IBM Watson Studio?
What is your motivation?
If you stay till the end

We’ve got some cool stuff for you


If you aren’t using ML
now, you’re too late
If you aren’t using ML
now, you’re too late
to start from scratch
Imagine where you’ll be a
year from now
Or 5 years from now, and remember that number at the beginning.
Let me share a secret
with you
Nobody starts from scratch
This isn’t easy stuff
➔ Optimizing a model takes forever

➔ It’s hard to share modeling

➔ Deploying models requires engineers

➔ Visualizing data is hard

➔ Deep Learning is a dark art


AutoAI in IBM
Collaborative Data Science
APIs around models
Enhanced visual modeling
Automated Deep Learning
I’ve been here before
➔ Optimizing a model takes forever

➔ It’s hard to share modeling

➔ Deploying models requires engineers

➔ Visualizing data is hard

➔ Deep Learning is a dark art


Honestly… if I had any
one of these 10 years ago
I’d be ecstatic
IBM cloud benefits and
strategies
● TO HASH
Private cloud and on premises
So seriously check it out

https://ibm.co/2kybYS5
Free stuff
Discussion
What Machine Learning is…
and is not
● Algorithms that take data to insight
● The many uses of machine learning
● The classes
● What is Deep Learning, AI, Optimization in relation?
● What math should I know?
● The High Interest Credit Card Debt
Visualizations

● Visualizations influence thought strongly


● Strategies for award winning visuals
● Caveats for visuals
● Up and running fast with IBM Watson
Churn Prediction

● Why predict churn at all?


● Methods for world class machine learning models
● Breakout and time to calculate churn based on Kaggle
Decision Optimization

● Why optimize decisions?


● Objective and constraint
● Tactics and Strategies for optimizations
● Breakout in decision optimization
Why IBM Watson Studio?

● AutoAI
● Deployment solved
● Visualization distilled
● Deep Learning dispelled
● Collaborative work environment
● OSS Friendly: Python, R, Scala, Spark. Doesn’t matter
What was your number?

On a scale of one to ten


What are you going to do
now?
3 months, 6 months, a year from now?
Thank you
matt@matthewkirk.com
YourChiefScientist.com

Miguel.Maldonado@ibm.com
IBM.com/MachineLearning
Rate today’s session

Session page on conference website O’Reilly Events App
