MLOps Asilla 20221124

Machine Learning in Production
Lương Anh Tuấn: 2022-11-24
copyright© Asilla, Inc.

What we need to learn
- Design an ML production system end-to-end: project scoping, data needs, modeling strategies, and deployment requirements.
- Establish a model baseline, address concept drift, and prototype how to develop, deploy, and continuously improve a productionized ML application.
- Build data pipelines by gathering, cleaning, and validating datasets. Establish the data lifecycle by using data lineage and provenance metadata tools.
- Apply best practices and progressive delivery techniques to maintain and monitor a continuously operating production system.

What is Machine Learning in Production (MLOps)?
- Machine learning engineering for production combines the foundational concepts of machine learning with the functional expertise of modern software development and engineering roles.
- Machine Learning Engineering for Production (MLOps) covers how to conceptualize, build, and maintain integrated systems that continuously operate in production.
- MLOps must handle relentlessly evolving data.
- MLOps systems must run non-stop at minimum cost while delivering maximum performance.
- MLOps applies AI technology to solve real-world problems.

Machine Learning Project Lifecycle
https://www.datarobot.com/wiki/machine-learning-life-cycle/

Machine Learning Workflow
- Data Engineering: data acquisition & data preparation,
- ML Model Engineering: ML model training & serving, and
- Code Engineering: integrating the ML model into the final product.
The complete development pipeline includes three levels of change: Data, ML Model, and Code.
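To make the three levels of change concrete, here is a toy Python sketch. The data, the function names, and the single-threshold "model" are all hypothetical; each section only marks where that kind of change would live in a real project.

```python
# Toy sketch: all data, names, and the threshold "model" are hypothetical.
from statistics import mean


# --- Data Engineering: data acquisition & data preparation ---------------
def acquire_data():
    # Stand-in for reading from a database, API, or file store.
    return [
        {"temperature": 21.5, "label": 0},
        {"temperature": None, "label": 1},  # missing value, cleaned below
        {"temperature": 35.0, "label": 1},
        {"temperature": 19.0, "label": 0},
        {"temperature": 33.5, "label": 1},
    ]


def prepare_data(rows):
    # Drop rows with a missing feature (a real pipeline might impute instead).
    return [r for r in rows if r["temperature"] is not None]


# --- ML Model Engineering: ML model training & serving --------------------
def train_model(rows):
    # "Model" = midpoint between the mean feature value of each class.
    neg = mean(r["temperature"] for r in rows if r["label"] == 0)
    pos = mean(r["temperature"] for r in rows if r["label"] == 1)
    return {"threshold": (neg + pos) / 2}


def serve_model(model, temperature):
    return int(temperature >= model["threshold"])


# --- Code Engineering: integrating the ML model into the final product ---
def build_application(model):
    def handle_request(payload):
        # Application concerns: input checks and response format.
        if "temperature" not in payload:
            return {"error": "missing 'temperature'"}
        return {"alert": bool(serve_model(model, payload["temperature"]))}

    return handle_request


model = train_model(prepare_data(acquire_data()))
app = build_application(model)
print(app({"temperature": 34.0}))  # {'alert': True}
```

A change at any of the three levels (new raw data, a different training rule, or new request handling) means rebuilding only from that stage onward.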

MLOps pipeline

What is a feature store and why use one
Feature stores make it easy to:

- Productionize new features without extensive
engineering support
- Automate feature computation, backfills, and
logging
- Share and reuse feature pipelines across teams
- Track feature versions, lineage, and metadata
- Achieve consistency between training and
serving data
- Monitor the health of feature pipelines in
production
https://www.tecton.ai/blog/what-is-a-feature-store/
Feature Store Example
- Feature Management: Discover, Use, Monitor, and Govern End-to-End Feature Pipelines
- Feature Logic: Design and Define ML Features
- Feature Repository: Register and Collaborate on Feature Definitions
- Feature Engine: Transform Raw Data into Fresh Feature Values
- Feature Store: Store and Serve Fresh Feature Values
https://www.tecton.ai/
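The components above can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical and are not Tecton's (or any other product's) API, and Feature Management concerns such as discovery, monitoring, and governance are left out. What it shows is that one registered definition drives both the online serving path and the training rows, which is how a feature store keeps training and serving data consistent.

```python
# Hypothetical in-memory sketch; not the API of any real feature store product.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FeatureDefinition:
    """Feature Logic: a named, versioned transformation of one raw record."""
    name: str
    version: int
    transform: Callable[[dict], float]


@dataclass
class FeatureRepository:
    """Feature Repository: a shared registry of feature definitions."""
    definitions: Dict[str, FeatureDefinition] = field(default_factory=dict)

    def register(self, definition: FeatureDefinition) -> None:
        self.definitions[definition.name] = definition

    def lineage(self) -> Dict[str, int]:
        """Which version of each feature is currently registered."""
        return {name: d.version for name, d in self.definitions.items()}


@dataclass
class FeatureStore:
    """Feature Store: latest feature values per entity, for online serving."""
    repo: FeatureRepository
    online: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def _compute(self, raw: dict) -> Dict[str, float]:
        # Feature Engine: turn a raw record into feature values.
        return {name: d.transform(raw) for name, d in self.repo.definitions.items()}

    def materialize(self, entity_id: str, raw: dict) -> None:
        self.online[entity_id] = self._compute(raw)

    def get_online_features(self, entity_id: str) -> Dict[str, float]:
        return self.online[entity_id]

    def training_rows(self, raw_records: List[dict]) -> List[Dict[str, float]]:
        # Same code path as serving, so training and serving features match.
        return [self._compute(r) for r in raw_records]


# Usage: two hypothetical features computed from a user's order amounts.
repo = FeatureRepository()
repo.register(FeatureDefinition("order_total", 1, lambda r: sum(r["amounts"])))
repo.register(FeatureDefinition("order_count", 1, lambda r: float(len(r["amounts"]))))
store = FeatureStore(repo)
raw = {"user_id": "u1", "amounts": [10.0, 5.5]}
store.materialize("u1", raw)
print(store.get_online_features("u1"))  # {'order_total': 15.5, 'order_count': 2.0}
```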

Three levels of ML automation
1. Manual process. This is the typical data science process, performed at the beginning of implementing ML. This level is experimental and iterative in nature. Every step in each pipeline, such as data preparation and validation, and model training and testing, is executed manually. This work is commonly done with Rapid Application Development (RAD) tools such as Jupyter Notebooks.
2. ML pipeline automation. The next level executes model training automatically and introduces continuous training of the model: whenever new data is available, model retraining is triggered. This level of automation also includes data and model validation steps.
3. CI/CD pipeline automation. The final level introduces a CI/CD system to perform fast and reliable ML model deployments in production. The core difference from the previous level is that we now automatically build, test, and deploy the Data, ML Model, and ML training pipeline components.
https://ml-ops.org/content/mlops-principles#experiments-tracking
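Level 2 (ML pipeline automation) can be sketched as an event handler that retrains whenever a new batch arrives, with a data-validation gate before training and a model-validation gate before promotion. This is a minimal sketch under stated assumptions: the schema, the least-squares toy model, and the rule "promote only if the held-out error does not increase" are illustrative choices, not a prescribed design.

```python
# Assumptions (hypothetical): the schema, the toy model, and the promotion rule.
from statistics import mean

SCHEMA = {"x": float, "y": float}  # assumed expected columns and types


def validate_data(rows):
    """Data validation: reject empty batches, missing columns, or wrong types."""
    if not rows:
        return False
    return all(
        set(row) == set(SCHEMA)
        and all(isinstance(row[k], t) for k, t in SCHEMA.items())
        for row in rows
    )


def train(rows):
    """Toy model: least-squares line y = a * x + b."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return {"a": a, "b": my - a * mx}


def mse(model, rows):
    return mean((model["a"] * r["x"] + model["b"] - r["y"]) ** 2 for r in rows)


class ContinuousTrainer:
    def __init__(self, holdout):
        self.holdout = holdout  # fixed evaluation set for model validation
        self.training_data = []
        self.current_model = None

    def on_new_data(self, batch):
        """Called whenever new data is available (the retraining trigger)."""
        if not validate_data(batch):
            return "rejected: data validation failed"
        self.training_data.extend(batch)
        candidate = train(self.training_data)
        # Model validation: promote only if not worse on the holdout set.
        if self.current_model is None or mse(candidate, self.holdout) <= mse(
            self.current_model, self.holdout
        ):
            self.current_model = candidate
            return "promoted"
        return "kept current model"


holdout = [{"x": 4.0, "y": 9.0}, {"x": 5.0, "y": 11.0}]
trainer = ContinuousTrainer(holdout)
first = trainer.on_new_data([{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}])
second = trainer.on_new_data([{"x": "3", "y": 7.0}])  # wrong type for x
third = trainer.on_new_data([{"x": 3.0, "y": 7.0}])
print(first, "|", second, "|", third)
```

Level 3 would add CI/CD around this: the pipeline code itself is built, tested, and deployed automatically rather than only the model being retrained.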

MLOps Infrastructure Stack
An example of the technology stack might include the following open-source tools:
- Data Analysis: Python, Pandas
- Source Control: Git
- Test & Build Services: PyTest & Make
- Deployment Services: Git, DVC
- Model & Dataset Registry: DVC (AWS S3)
- Feature Store: Project code library
- ML Metadata Store: DVC
- ML Pipeline Orchestrator: DVC & Make
https://github.com/EthicalML/awesome-production-machine-learning
https://github.com/lfai/lfai-landscape

https://ml-ops.org/content/state-of-mlops
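Dataset versioning in tools such as DVC relies on identifying files by a hash of their contents. The sketch below illustrates that idea with the Python standard library only. It is not the DVC API: the `DatasetRegistry` class, the JSON index file, and the choice of SHA-256 are assumptions made for illustration.

```python
# Illustration of content-hash versioning; NOT the DVC API (names are hypothetical).
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class DatasetRegistry:
    """Maps human-readable version tags to content hashes in a JSON file."""

    def __init__(self, index_path: Path):
        self.index_path = index_path
        self.index = json.loads(index_path.read_text()) if index_path.exists() else {}

    def register(self, tag: str, data_path: Path) -> str:
        digest = file_digest(data_path)
        self.index[tag] = {"path": str(data_path), "sha256": digest}
        self.index_path.write_text(json.dumps(self.index, indent=2))
        return digest

    def verify(self, tag: str) -> bool:
        """True if the file on disk still matches the registered version."""
        entry = self.index[tag]
        return file_digest(Path(entry["path"])) == entry["sha256"]


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("x,y\n1,3\n2,5\n")
    registry = DatasetRegistry(Path(tmp) / "datasets.json")
    digest_v1 = registry.register("v1", data)
    before = registry.verify("v1")
    data.write_text("x,y\n1,3\n2,6\n")  # a silent change to the data
    after = registry.verify("v1")
    reloaded = DatasetRegistry(Path(tmp) / "datasets.json").index["v1"]["sha256"]
print(before, after)
```

Because the version is derived from the bytes themselves, any silent edit to a registered dataset is detectable, which supports the reproducibility goals summarized in the principles below.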
Summary of MLOps Principles and Best Practices
Each MLOps principle applies to three areas: Data, ML Model, and Code.
Versioning
- Data: 1) Data preparation pipelines; 2) Feature store; 3) Datasets; 4) Metadata
- ML Model: 1) ML model training pipeline; 2) ML model (object); 3) Hyperparameters; 4) Experiment tracking
- Code: 1) Application code; 2) Configurations
Testing
- Data: 1) Data validation (error detection); 2) Feature creation unit testing
- ML Model: 1) Model specification is unit tested; 2) ML model training pipeline is integration tested; 3) ML model is validated before being operationalized; 4) ML model staleness test (in production); 5) Testing ML model relevance and correctness; 6) Testing non-functional requirements (security, fairness, interpretability)
- Code: 1) Unit testing; 2) Integration testing for the end-to-end pipeline
Automation
- Data: 1) Data transformation; 2) Feature creation and manipulation
- ML Model: 1) Data engineering pipeline; 2) ML model training pipeline; 3) Hyperparameter/parameter selection
- Code: 1) ML model deployment with CI/CD; 2) Application build
Reproducibility
- Data: 1) Backup data; 2) Data versioning; 3) Extract metadata; 4) Versioning of feature engineering
- ML Model: 1) Hyperparameter tuning is identical between dev and prod; 2) The order of features is the same; 3) Ensemble learning: the combination of ML models is the same; 4) The model pseudo-code is documented
- Code: 1) Versions of all dependencies in dev and prod are identical; 2) Same technical stack for dev and production environments; 3) Reproducing results by providing container images or virtual machines
Deployment
- Data: 1) Feature store is used in dev and prod environments
- ML Model: 1) Containerization of the ML stack; 2) REST API; 3) On-premise, cloud, or edge
- Code: 1) On-premise, cloud, or edge
Monitoring
- Data: 1) Data distribution changes (training vs. serving data); 2) Training vs. serving features
- ML Model: 1) ML model decay; 2) Numerical stability; 3) Computational performance of the ML model
- Code: 1) Predictive quality of the application on serving data
https://ml-ops.org/content/mlops-principles#summary-of-mlops-principles-and-best-practices
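One way to monitor data distribution changes between training and serving data is the Population Stability Index (PSI), computed per feature over shared bins as PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the training and serving proportions in bin i. PSI is one common choice among several; the source does not name a metric. The equal-width binning, the epsilon for empty bins, and the 0.2 alert threshold are assumptions, the threshold being a frequently cited rule of thumb rather than a standard.

```python
# PSI drift check; binning, epsilon, and the 0.2 threshold are assumptions.
import bisect
import math
from typing import List, Sequence


def bin_edges(reference: Sequence[float], n_bins: int) -> List[float]:
    """Interior edges of equal-width bins spanning the training data."""
    lo, hi = min(reference), max(reference)
    step = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [lo + i * step for i in range(1, n_bins)]


def proportions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; out-of-range values go to the outer bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def psi(train_values, serving_values, n_bins: int = 10) -> float:
    """Sum over bins of (actual - expected) * ln(actual / expected)."""
    edges = bin_edges(train_values, n_bins)
    expected = proportions(train_values, edges)
    actual = proportions(serving_values, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_alert(train_values, serving_values, threshold: float = 0.2) -> bool:
    return psi(train_values, serving_values) > threshold


train = [float(i % 100) for i in range(1000)]       # 0..99, uniform
same = [float((i * 7) % 100) for i in range(500)]   # same distribution
shifted = [float(50 + i % 50) for i in range(500)]  # all mass in 50..99
print(round(psi(train, same), 3), drift_alert(train, shifted))
```

In production this check would typically run per feature on a schedule, comparing a window of serving data against the training snapshot.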
Best practices for MLOps principles
Each best practice applies to three areas: Data, ML Model, and Code.
Documentation
- Data: 1) Data sources; 2) Decisions on how and where to get data; 3) Labelling methods
- ML Model: 1) Model selection criteria; 2) Design of experiments; 3) Model pseudo-code
- Code: 1) Deployment process; 2) How to run locally
Project Structure
- Data: 1) Data folder for raw and processed data; 2) A folder for the data engineering pipeline; 3) Test folder for data engineering methods
- ML Model: 1) A folder that contains the trained model; 2) A folder for notebooks; 3) A folder for feature engineering; 4) A folder for ML model engineering
- Code: 1) A folder for bash/shell scripts; 2) A folder for tests; 3) A folder for deployment files (e.g., Docker files)
https://ml-ops.org/content/mlops-principles#summary-of-mlops-principles-and-best-practices
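The project-structure practices can be turned into a small scaffolding script. The source only says what each folder is for; the concrete folder names in `LAYOUT` below are hypothetical, and writing a README into each folder is an added convention that also covers part of the documentation practice.

```python
# Folder names are hypothetical; only each folder's purpose comes from the source.
import tempfile
from pathlib import Path
from typing import List

LAYOUT = {
    # Data
    "data/raw": "raw data",
    "data/processed": "processed data",
    "src/data_pipeline": "data engineering pipeline",
    "tests/data_pipeline": "tests for data engineering methods",
    # ML Model
    "models": "trained models",
    "notebooks": "notebooks",
    "src/features": "feature engineering",
    "src/model": "ML model engineering",
    # Code
    "scripts": "bash/shell scripts",
    "tests": "tests",
    "deploy": "deployment files (e.g., Dockerfiles)",
}


def scaffold(root: Path) -> List[Path]:
    """Create every folder (safe to rerun) with a README stating its purpose."""
    created = []
    for rel, purpose in LAYOUT.items():
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"
        if not readme.exists():
            readme.write_text(f"# {rel}\n\nPurpose: {purpose}\n")
        created.append(folder)
    return created


# Demo in a throwaway directory; rerunning must not fail or overwrite READMEs.
with tempfile.TemporaryDirectory() as tmp:
    folders = scaffold(Path(tmp))
    folders_again = scaffold(Path(tmp))
    layout_ok = all(f.is_dir() and (f / "README.md").is_file() for f in folders)
print(len(folders), layout_ok)
```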

References
https://github.com/alexeygrigorev/mlbookcamp-code
https://mlops-guide.github.io/Workflow/
https://github.com/paiml/practical-mlops-book
https://ml-ops.org/
https://mlops.community/blog/
https://stanford-cs329s.github.io/syllabus.html
https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook
https://github.com/DataTalksClub/mlops-zoomcamp
https://github.com/kennethleungty/MLOps-Specialization-Notes
https://drive.google.com/drive/folders/1GpaAv1KhNNtYg3EsI6RpufFj8Tylv0UQ?usp=share_link
https://www.run.ai/guides/machine-learning-operations

