11/20/19
Formal approach to engineering ML systems?
Software engineering | Engineering ML systems
SDLC, Agile, SCRUM, SRS, … | ?
Software design: OOP, design patterns, SOA, … | Offline modeling: preprocessing, feature engineering, learning, evaluation
Programming languages: Java, Python, C++, Scala | ML tools/packages: TensorFlow, spark.ml, sklearn, R
Data structures & algorithms: sort, trees, lists | ML concepts & algorithms: linear models, neural models, bias-variance
ML Application Life Cycle
Stages (a cycle): Problem formulation → Data definitions → Offline ML modeling → Production system design → Production system implementation → Pre-deployment testing → Deployment & maintenance → Online evaluation & evolution → back to problem formulation
Skills: Product + ML + Engg.; Engg. + ML; ML; Engg.
ML Application Life Cycle: Problem formulation
Application requirements → ML & optimization problems
ML Application Life Cycle: Data definitions
Precise sources & definitions of all data elements
Checks:
• Different types of leakage
• Data quality issues
• Distributional violations
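Two of these checks are easy to automate. A minimal stdlib-only sketch, assuming each dataset is a list of dicts keyed by feature name: data quality issues are approximated by a missing-value rate, and distributional violations by the two-sample Kolmogorov–Smirnov statistic between training and production values. Leakage checks are not covered here. The function names and the thresholds (5% missing, KS above 0.1) are hypothetical choices for illustration, not values from the slides.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def missing_rate(rows: List[Row], column: str) -> float:
    """Fraction of rows where `column` is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x before comparing them.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def check_feature(train: List[Row], prod: List[Row], column: str,
                  max_missing: float = 0.05, max_ks: float = 0.1) -> List[str]:
    """Return human-readable issues for one numeric feature (empty list = no issues)."""
    issues = []
    for name, rows in (("train", train), ("prod", prod)):
        rate = missing_rate(rows, column)
        if rate > max_missing:
            issues.append(f"{name}: {column} missing rate {rate:.2f}")
    train_vals = [r[column] for r in train if r.get(column) is not None]
    prod_vals = [r[column] for r in prod if r.get(column) is not None]
    if train_vals and prod_vals:
        d = ks_statistic(train_vals, prod_vals)
        if d > max_ks:
            issues.append(f"{column}: KS statistic {d:.2f} exceeds {max_ks}")
    return issues
```

Running `check_feature` per feature on a training sample and a recent production sample gives a short list of issues to investigate before modeling starts.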
ML Application Life Cycle: Offline ML modeling
Offline training & evaluation of …
Offline Modeling
Data exploration → Data sampling/splitting → Data preprocessing → Feature engineering → Model training, evaluation & fine-tuning → Meet business goals?
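A compact, stdlib-only sketch of this loop (feature engineering omitted), assuming a toy dataset of numeric feature vectors with 0/1 labels. It swaps in a k-nearest-neighbours classifier purely for illustration; what matters is the ordering: split first, fit preprocessing statistics on the training split only, tune the hyperparameter `k` on the validation split, and evaluate on the test split once. All names and split fractions are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


def split(data: List[Example], seed: int = 0,
          train_frac: float = 0.6, val_frac: float = 0.2):
    """Shuffle once, then cut into train / validation / test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_scaler(train: List[Example]):
    """Preprocessing statistics (mean, std per feature) from the training split only."""
    columns = list(zip(*(x for x, _ in train)))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant features
    return means, stds


def transform(rows: List[Example], scaler) -> List[Example]:
    """Standardize every feature with the training-split statistics."""
    means, stds = scaler
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in rows]


def knn_predict(train: List[Example], x: Sequence[float], k: int) -> int:
    """Majority vote among the k training examples closest in Euclidean distance."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train: List[Example], rows: List[Example], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)


def offline_modeling(data: List[Example], ks=(1, 3, 5)) -> Tuple[int, float]:
    train, val, test = split(data)
    scaler = fit_scaler(train)
    train, val, test = [transform(part, scaler) for part in (train, val, test)]
    # Fine-tuning: pick k on the validation split ...
    best_k = max(ks, key=lambda k: accuracy(train, val, k))
    # ... and report the untouched test split once, at the end.
    return best_k, accuracy(train, test, best_k)
```

The returned test accuracy is what gets compared against the business goal; if it falls short, the loop goes back to exploration or feature engineering rather than re-tuning on the test split.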
ML Application Life Cycle: Production system design
Functional production system
• Scalability
• Responsiveness
• Fault tolerance
• Security
• …
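Of the properties listed, responsiveness and fault tolerance are the easiest to show in a few lines. A hedged sketch, assuming the model is any Python callable that maps a feature vector to a score: each request runs in a thread pool with a latency budget, and a timeout or an exception falls back to a default score. The function name, the 100 ms budget and the fallback value are hypothetical.

```python
import concurrent.futures as cf
from typing import Callable, Sequence, Tuple

# One shared pool so each request does not pay thread start-up cost.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def predict_with_fallback(model: Callable[[Sequence[float]], float],
                          features: Sequence[float],
                          timeout_s: float = 0.1,
                          default: float = 0.0) -> Tuple[float, str]:
    """Return (score, source); source records whether the model or a fallback answered."""
    future = _POOL.submit(model, features)
    try:
        return future.result(timeout=timeout_s), "model"
    except cf.TimeoutError:
        # Responsiveness: answer within the latency budget. The slow call keeps
        # running in its worker thread; Python threads cannot be cancelled.
        return default, "timeout"
    except Exception:
        # Fault tolerance: a crashing model degrades to the default score.
        return default, "error"
```

Returning the `source` tag alongside the score lets the service count fallbacks, which is what monitoring would alert on.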
ML Application Life Cycle: Pre-deployment testing
Equivalence checks for offline modeling vs. production settings
• Data fetch process
• Entire model pipeline
• Data distributions
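A minimal sketch of the "entire model pipeline" check, assuming both the offline and the production pipeline can be called as Python functions on the same raw instance: any disagreement beyond a numeric tolerance is reported with the instance id. The function name, the `id` key and the tolerances are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional

Instance = Dict[str, Optional[float]]


def equivalence_report(instances: List[Instance],
                       offline_predict: Callable[[Instance], float],
                       production_predict: Callable[[Instance], float],
                       rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> dict:
    """Run both pipelines on identical inputs and collect disagreements."""
    mismatches = []
    for inst in instances:
        a, b = offline_predict(inst), production_predict(inst)
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((inst["id"], a, b))
    rate = len(mismatches) / len(instances) if instances else 0.0
    # Keep a handful of concrete examples for debugging.
    return {"n": len(instances), "mismatch_rate": rate, "examples": mismatches[:10]}
```

A typical catch is a parity bug such as offline code filling a missing value with 0 while production fills it with a mean; only the affected instances show up in `examples`.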
ML Application Life Cycle
Automation of
• Predictions for new instances
• Data quality monitoring
• Data logging & attribution
• Periodic re-training
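Each automation item can be reduced to a small, testable function. A hedged sketch of three of them: JSON-line prediction logs that carry a model version for attribution, an age-based re-training trigger and a missing-value alert. The record fields, the 7-day re-training period and the 5% alert threshold are hypothetical; scheduling (cron, workflow engine) is out of scope.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta


@dataclass
class PredictionRecord:
    instance_id: str
    model_version: str  # attribution: which model produced this score
    score: float
    logged_at: str      # ISO-8601 timestamp


def log_prediction(instance_id: str, model_version: str, score: float,
                   now: datetime) -> str:
    """Serialize one prediction as a JSON line for later joins with outcomes."""
    record = PredictionRecord(instance_id, model_version, score, now.isoformat())
    return json.dumps(asdict(record))


def needs_retraining(trained_at: datetime, now: datetime,
                     max_age: timedelta = timedelta(days=7)) -> bool:
    """Periodic re-training: retrain once the deployed model is older than max_age."""
    return now - trained_at >= max_age


def data_quality_alert(missing_rate: float, threshold: float = 0.05) -> bool:
    """Data quality monitoring: alert when the recent missing-value rate is too high."""
    return missing_rate > threshold
```

Logging the model version with every score is what later makes it possible to attribute an online metric change to a specific model.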
ML Application Life Cycle: Online evaluation & evolution
• A/B testing on key prediction quality & business metrics
• Assessment of system performance aspects
• Diagnosis to find areas of improvement
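When the key metric is a rate (conversion, fraction of fraudulent orders caught), one common choice for the A/B test is a two-proportion z-test. A minimal sketch, assuming users are randomly and independently assigned to two arms and samples are large enough for the normal approximation; other metrics (revenue per user, latency) need different tests.

```python
import math
from typing import Tuple


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates between arms A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both arms all-0 or all-1: no evidence of a difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

With 100/1000 conversions in control and 150/1000 in treatment, z is about 3.4, well past the usual 1.96 cutoff for a two-sided 5% test.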
System Objectives
[Figure: ML application life cycle with each stage rated by the skills it needs (Product + ML + Engg., Engg. + ML, ML, Engg.) as H (high, application specific), P (partial) or L; the ratings are application dependent]
ML & Data Science Learning Programs
[Figure: topics of ML & data science learning programs: deployment issues, modeling process, ML pipelines, learning algorithms, data]
Factors for Success of ML Systems
[Figure: factors for success of ML systems: ML pipelines, deployment issues, modeling process, learning algorithms, data]
Problem Formulation
Business Problem: Optimize a decision process to improve business metrics
[Figure: ML models feeding a decision process; its decisions, together with external response, determine business metrics]
Ask “why?” to arrive at the right ML problem(s)!
Reseller Fraud Example
Objective: reduce return-shipping expenses; increase the number of users served (especially at sale time)
Decision process:
• Partner with the reseller when there is potential to expand the user base
• Block fraudulent orders or introduce friction (e.g., disable COD/free returns)
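This decision process consumes the outputs of more than one model. A hedged sketch, assuming two hypothetical model scores (probability that an order is fraudulent, probability that the buyer is a reseller) plus a business flag for expansion potential; the thresholds and the rule order are placeholders that would in practice be chosen against the business objective (return-shipping cost, users served).

```python
from typing import Literal

Action = Literal["partner", "block", "friction", "allow"]


def decide(fraud_prob: float, reseller_prob: float, expansion_potential: bool,
           block_at: float = 0.9, friction_at: float = 0.5,
           partner_at: float = 0.7) -> Action:
    """Map scores from separate models to one action per order (hypothetical thresholds)."""
    if reseller_prob >= partner_at and expansion_potential:
        return "partner"
    if fraud_prob >= block_at:
        return "block"
    if fraud_prob >= friction_at:
        return "friction"  # e.g. disable COD / free returns
    return "allow"
```

Keeping the thresholds outside the models means the business can move them (for example, stricter during a sale) without retraining anything.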
OBSERVATIONS
• Sources of data
Instance Definition
Modeling Metrics - Classification
Ethical and Fairness Constraints
Need to be incorporated in the modeling metrics!
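One way to incorporate such a constraint into the modeling metrics is to report a fairness measure next to accuracy and penalize it. A hedged sketch using demographic parity (the gap in positive-prediction rates between groups), which is only one of several fairness definitions; the penalty weight and function names are hypothetical, and the appropriate definition is application dependent.

```python
from collections import defaultdict
from typing import Hashable, List, Tuple


def positive_rate_gap(preds: List[int], groups: List[Hashable]) -> float:
    """Demographic parity gap: max minus min positive-prediction rate across groups."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for p, g in zip(preds, groups):
        counts[g][0] += p
        counts[g][1] += 1
    rates = [pos / n for pos, n in counts.values()]
    return max(rates) - min(rates)


def fairness_aware_score(preds: List[int], labels: List[int], groups: List[Hashable],
                         penalty: float = 1.0) -> Tuple[float, float, float]:
    """Return (penalized score, accuracy, parity gap)."""
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    gap = positive_rate_gap(preds, groups)
    return acc - penalty * gap, acc, gap
```

Reporting accuracy and the gap separately, not just the penalized score, keeps the trade-off visible when comparing candidate models.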
Deployment Constraints
Data Definitions
• Precisely record all sources & definitions for all data elements
– ids, features, targets and metric factors, for training, evaluation and production
• Establish parity across training/evaluation/production
– definitions, level sets, units, time windows, missing value handling, correct snapshots
• Review for common data leakage
– peeking into the future, target leakage
• Proactively collect information on data quality issues and resolve them
– missing/invalid value causes, data corruptions
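Peeking into the future typically happens when a feature is joined by entity alone and picks up a value recorded after the moment the prediction would have been made. A minimal sketch of a point-in-time join, assuming each feature is stored as timestamped snapshots per entity: only values recorded strictly before the prediction time are used. Names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Snapshot = Tuple[int, float]  # (timestamp, feature value), sorted by timestamp


def point_in_time_value(snapshots: List[Snapshot], as_of: int) -> Optional[float]:
    """Latest value recorded strictly before `as_of`; None if nothing existed yet."""
    times = [t for t, _ in snapshots]
    idx = bisect.bisect_left(times, as_of)  # first snapshot at or after as_of
    return snapshots[idx - 1][1] if idx > 0 else None


def build_training_rows(instances: List[Tuple[str, int, int]],
                        snapshots: Dict[str, List[Snapshot]]) -> List[dict]:
    """Join each (entity, prediction_time, label) to its feature as of prediction time."""
    return [
        {"entity": e, "as_of": t, "feature": point_in_time_value(snapshots.get(e, []), t), "label": y}
        for e, t, y in instances
    ]
```

The `None` returned for "no snapshot yet" is itself a missing value, so it has to be handled identically in training and production to keep the parity described above.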
Offline Modeling
Deployment
Thank You !
Happy Modeling !
Contact: srujana@gmail.com