AWS FMOps LLMOps Operationalise GenAI Using MLOps Principles

300 LEVEL
FMOps/LLMOps: Operationalise
Generative AI using MLOps principles
Dr Sokratis Kartakis (he/him) Heiko Hotz (he/him)

Senior MLOps SA Architect, EMEA Senior LLM SA Architect, EMEA
Amazon Web Services Amazon Web Services
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
MLOps Foundation Overview

• MLOps KPIs, Maturity, People, Processes, Technology
Generative AI (GenAI) & MLOps

• Main Definitions
MLOps & FMOps/LLMOps Differentiators

• Processes & People
• Providers, fine-tuners, & consumers
• Select & Adapt the FM on a Specific Context
• Evaluate & Monitor Fine-tuned Models
• Data & Deployment
• Technology
What is MLOps?
MLOps Definition
MLOps
Machine Learning
& Operations People
The combination of people,

processes, and technology to
productionize ML solutions Technology Processes
efficiently.
MLOps Foundation Expected Outcomes
STANDARDIZE OPERATIONS AND INFRASTRUCTURE FOR YOUR DATA SCIENCE
MLOps Expected
Business Goal Technical Metric Before MLOps Business Value
Outcomes
Time to value up to Improve Speed-to-Value
1 Be more efficient in delivery < 3 months
(from idea to production) 12 months by 4x
Time to productionize existing ML use Reduce FTE overhead
2 Simplify route-to-live 3-6 months < 2 weeks
cases in average 8x
Focus on innovation
Standardize infrastructure, data,
3 % Template driven development n/a > 85% increasing re-usability
& code
by 85%
Standardize onboarding of new Time to instantiate a new MLOps Accelerate ML adoption
4 40 days < 1 hours
teams and ML use cases infrastructure & ML projects across all business areas
Execute the ML solutions without Your data is safe in your
5 Ensure high security standards n/a No internet
internet access in a private cloud private cloud
Reduce platform, people and operation costs
Customer references building MLOps foundation and business benefits:

• NatWest: https://aws.amazon.com/solutions/case-studies/natwest-group-case-study
• BP: https://aws.amazon.com/solutions/case-studies/bp-machine-learning-case-study
MLOps Maturity Model
del
Models in Production
Mo Scalable
erating
Op Reliable
Templatize and
Repeatable Productionize Multiple ML
Introduce Testing,
Solutions by Multiple
Monitoring, and Multi-
Teams
account Deployment
Standardize Code
Initial Repositories & ML Solution
Deployment
Establish the
Experimentation
Environment
MLOps Maturity
MLOps Key Personas and Roles
Platform Team Business
Secure Cloud/Data/ML Platform Viz Dashboards, ML Adoption, & ROI
Advance Analytics Team Data Science Team

Data Lake Experimentation & MLOps
MLOps Engineer/Admin Business Stakeholder
Standardize CI/CD, user/service role, Product Owners
model consumption, testing and Define business problem, business KPIs, and
deployment methodology make business decisions
Data Engineer Data Scientist
Prepare & Ingest data Create the best ML models
building ETL pipelines to solve business problems
Security Business Stakeholder

Assess data, user, and service Data & ML Consumers
access creating policies and Consumers of ML results from other BUs,
guardrails driving business decision making
Data Owners ML Engineer
Manage data sharing Collaborate with DS to Risk & Compliance
and provide access productionize ML
Approve & Review Models
Architects/ SysOps Engineer

Standardize account infrastructure,
connectivity, user roles Auditors/Risk & Compliance
implementation Review models, data sources,
code artifacts
MLOps Foundation People & Processes
SEPARATION OF CONCERNS IS KEY FOR SUCCESS
ML Governance Product Owner Model Auditors &

Lead Data Scientists Approvers Compliance
MLOps System Centralized model and code artifact
Engineers Administrators storage/versioning/auditing
ML Production
Environment
Security
Data Scientists Data Scientists ML Engineers
Platform Model
Experimentation Model Build Model Test
Administration Deployment
Prove that ML can solve a business
Automate model build/training Automate model testing and Serving and monitoring the model
problem, i.e. PoC
providing scaled data guardrails testing
Provision infrastructure Ingest data Data

Provide user access Prepare, combine and catalogue Data Data Engineers Data Owners Business Stakeholders
Provide data access Visualize data ML Consumers
MLOPs Scalable Phase
MULTIPLE TEAMS AND ML USE CASES ADOPT MLOPS
Generative AI (GenAI) & MLOps
MLOps & FMOps/LLMOps Differentiators
GenAI Use Case Domains
Text-to-Text = LLM – Unlabeled data {text}

• Chatbots
• Writing assistants e.g. summarization
• Programming assistants
Text-to-Image - Labeled data {text, image}

Adapt
Train • Generate fantasy images
• Generate new product design
Terabytes of Foundation Models
data Billions of Parameters Text-to-Audio or Video – (un)labeled data
(coming soon)
• Music composers
Key Definitions
Foundation Model Operations

Machine Learning Operations
Productionize ML solutions MLOps FMOps Productionize GenAI Solutions
efficiently (Text-Text/ Image/ Video/ Audio/
…)
Technology
Processes
People
Large Language Model Operations

LLMOps Productionize Large Language
Model-based solutions
MLOps & FMOps Differentiators
MLOps FMOps
Processes & People
Providers, fine-tuners, & consumers
Select & Adapt the FM on a Specific Context

Technology - Fine-tuning, parameter-efficient fine-tuning, prompt engineering
- Proprietary, open source based on the application
Processes
Evaluate & Monitor Fine-tuned Models

Human feedback, prompt management, toxicity/bias…
People
Data & Model Deployment
Data privacy, multi-tenancy, & cost, latency, and precision
Technology
MLOps, data, & application layers
MLOps FMOps Processes & People


Processes - Proprietary, open source based on the application

Human feedback, prompt management, toxicity/bias…
People
Technology
GenAI User Types & Skills
can be also can become
Generative AI Providers Fine-Tuners Consumers

User Types
Entities who build foundation Fine-tune foundational models Interact with Generative AI
models from scratch from providers to fit custom services from provider or
themselves and provide them requirements. Orchestrate the tuner by text prompting or
as a product to tuner and deployment of the model as a visual interface to complete
consumer. service for use by consumers. desired actions.
Deep end-to-end ML, NLP Strong end-to-end ML expertise No ML expertise required. Mostly
expertise and data science, and knowledge of model application developers or end-
labeler “squad“ deployment and inference. users with understanding of the
Skills Strong domain knowledge for service capabilities. Only prompt
tuning including prompt engineering is required for better
engineering. results.
Productionize applications
MLOps is required where DevOps/AppDev is
more relevant than MLOps
17
The Journey of Consumers
GenAI Processes - Consumers
Select, evaluate, & use FM as Inputs/Outputs & Rating

a black-box & adapt context Interaction with the GenAI Solutions.
Using multiple chained models and Aim to improve outcomes by
prompt engineering techniques to achieve penalizing or rewarding GenAI
context adaptation (if necessary). Expose solution outputs providing insights
the solution to the end users for prompt engineering
19
Select FM - Consumers
Step 1. Understand Step 2. Test & evaluate the Step 3. Select the best
top proprietary and top selected FMs (e.g. top 3) FM based on your
open source FM priorities
capabilities
13 K
available 20 3 1
FMs
Quick short listing: Use case-based benchmarking: Priority-based decision:

Use a small set of test Evaluate the models based on Select based on business
prompts based on the predefined prompts and outputs priorities cost, latency, precision
task (prompt catalog)
20
Step 1. Understand top FM capabilities
Main FM Capability Matrix
Proprietary or open-source FM
Commercial License Fine-tunable
Speed #Parameter Context Window

(#tokens where a tokens is ~0.75x words)
Trained on Quality
(instructions, code, interne data)
Existing Customer Skills
21
Step 1. Proprietary FM Capabilities
GPU
Company Can be used Available on
Model Name # Params instance Speed Context Window Trained on Fine-tunable
Name Commercially AWS
req.
Internet Data,
Bedrock,
J2 Ultra Instruct Yes 178 B p4d.24xl 8K Code, No
Jumpstart/SM Instructions
Internet Data,
AI21 Bedrock,
J2 Mid Instruct Yes 17 B g5.12xl 8K Code, No
Jumpstart/SM Instructions
Internet Data,
AI21 Summarize Yes g4dn.12xl Jumpstart/SM ~13 K Instructions No
Amazon Titan Text Large Yes n/a n/a Bedrock 4K n/a No

Internet Data,
Code,
Anthropic Claude Yes n/a n/a Bedrock 12 K Instructions,
No
Human feedback
Generate Model n/a Internet Data,

Yes n/a Jumpstart/SM 4K Instructions No
Command (50 B)
Cohere
Generate Model n/a Internet Data,
Yes n/a Jumpstart/SM 4K Instructions No
Command-Light (6 B)
Internet Data
LightOn Lyra-Fr 10B Yes 10 B g5.12xl Jumpstart/SM ? (French) No
Bedrock,
Stability AI SDXL Yes n/a g5.xl - <Text, Image> No
Jumpstart/SM
*Last update 16 Jun 2023 23

Step 1. Open-source FM Capabilities
GPU
Company Can be used Available on
Model Name # Params instance Speed Context Window Trained on Fine-tunable
Name Commercially AWS
req.
Internet Data,
FLAN-UL2 Yes 20 B g5.12xl Jumpstart/SM 2K Code, Yes
Instructions
Google
Internet Data,
FLAN-T5-XXL Yes 11 B g5.xl Jumpstart/SM 512 Code, Yes
Instructions
Internet Data,
Eleuther GPT-J Yes 6B g5.xl Jumpstart/SM 512 Code Yes
Internet Data,
Falcon-40B-
Yes 40 B g5.12xl Jumpstart/SM 2K Code, Yes
Instruct Instructions
TII
Internet Data,
Falcon-7B-
Yes 7B g5.xl Jumpstart/SM 2K Code, Yes
Instruct Instructions
Starcoder Yes 15 B g5.12xl SM 8K Code Yes

BigCode
Santa Coder Yes 1.1 B g5.xl SM 2K Code Yes
Internet Data,
LMSYS Org Vicuna-13B No 13 B g5.xl SM 2K Code, Yes
Instructions
Internet Data,
Meta Llama-65B No 65 B g5.48xl SM 2K Code Yes
Stability AI SD 2.1 Yes - g5.xl Jumpstart/SM - <Text, Image> Yes

*Last update 16 Jun 2023 24
Step 1. EU AI Act Matters for FM Selection
https://crfm.stanford.edu/2023/06/15/eu-ai-act.html
Step 1. Understand FM Capabilities
Can be the same person
Prompt Engineers GenAI Developers

Design Initial prompts Short list the top models based
on the initial prompts
Business new Design initial
GenAI use case prompts
GenAI use case example : Prompt examples:

Financial documents “Based on … Summarize the text” Selected top 3 FM example:
summarization “After reviewing X provide the summary” Titan Text Large
“Give me the summary” Claude
Falcon-7B-Instruct
Note:
Prompt = Input (data, data source, & instructions) + query
26
Step 2. Evaluate the top FMs
Accuracy metrics Classic ML metrics e.g. precision,

High evaluation precision but not …
Yes many use cases
Does the use case
provide discrete
Yes outcomes/outputs?
No Similarity metrics
Medium evaluation precision that ROUGE, cosine similarity [0,1], …
might require human evaluation but
many use cases
Does labelled data
exist?
No Human in the Loop Manually human feedback
based on predefined
(HIL)
No High evaluation precision but costly assessment rules e.g. using
and time consuming GroundTruth
Should we automate
the test process?
Feed the outputs to a ”reliable”
Yes LLM LLM and instruct it to rate the
Unknown evaluation precision
outcome with an score and
(depend on the LLM) but automated
explanation
27
Step 2. Evaluate the top FMs - Examples
Evaluation Prompt Catalog Evaluation Results
Prompt Output Evaluation Prompt Labeled LLM Score Feedback

(Input + Query) Pre-created Output Method Output Output
e.g. Labeled
Accuracy “Who is the PM of “Rishi “Rishi 1.0 -
“Give me the full name of “Rishi Sunak” metric the UK?” Sunak” Sunak” precision
the UK PM”
Similarity <Text to ”The <Summ 0.65 cos -
<Text to summarize> + ”The summary is …” metric summarize> + “Give summary ary> sim
“Give me the summary” me the summary” is …”
”Generate a story based n/a HIL/LLM ”Generate a story - <Story> 4/5 <Free
on the SnowWhite book” based on the text>
SnowWhite book”
“Generate the source code n/a Evaluate the top 3 FM
for a retail website” e.g. Titan Text Large, HIL/LLM “Generate the - Code> 3/5 <Free
Claude, Falcon-7B- source code for a text>
*Public or private data can be used retail website”
Instruct
Model Evaluation Score HIL/LLM Feedback

FM1 5/5 <Feedback summary>
Generate aggregated
results for the top FMs FM2 3/5 <Feedback summary>
Prompt Engineers GenAI Developers Prompt Testers
Design evaluation prompts Model Evaluations Conduct HIL activities FM3 4/5 <Feedback summary>
28
EX
AM
Step 3. Select the best FM based on priorities PLE
Model Speed
FM1 ⚡⚡
FM2 ⚡
FM3 ⚡
Speed No priority
High speed, smaller model, Model Selection:

lower precision, smaller cost FM2
P1: Precision Precision Cost P0: lower cost
Model Evaluation Score HIL/LLM Feedback Model Cost

FM1 5/5 <Feedback summary> Lower speed, larger model, FM1 $$$$
FM2 4/5 <Feedback summary> higher precision, larger cost FM2 $
FM3 3/5 <Feedback summary> FM3 $$$
29
GenAI Processes for LLM - Consumers
GenAI Developers & Prompt Engineers/Testers DevOps/AppDevs
Backend Front-end
LLM-based GenAI Solution
New Test Set
3. Test & 5. Input/ Develop & Input/ Output

1. Select FM Test Prompt Output Deploy Web & Rating
lineage WebUI GenAI End-users
Filtering Application Interaction Use web application
(input & outputs)
rate the quality of
output
4. Chain 6. Rating Test

2. Prompt Mechanisms Functionality
Prompts &
Engineering (thumbs up/down,
Applications rating, text)
30
© 2023, Amazon Web Services, Inc. or its aﬃliates. All rights reserved.
GenAI Technology for LLM - Consumers
Amazon Bedrock
33
GenAI Processes - Consumers
Select, evaluate, & use FM as Inputs/Outputs & Rating

a black-box & adapt context Interaction with the GenAI Solutions. Aim to
Using multiple chained models and prompt improve outcomes by penalizing or
engineering techniques to achieve context rewarding GenAI solution outputs providing
adaptation (if necessary). Expose the solution insights for prompt engineering
to the end users
Share data & fine-tune

Create Private User Accounts models as a black box
Create user accounts to websites that later
can upload new data and get personalized Upload small amount of data and behind
outcomes (e.g. images) the scenes fine tune a model that can
interact to retrieve personalized results
34
GenAI Processes for LLM - Consumers
GenAI Developers & Prompt Engineers/Testers DevOps/AppDevs
Backend Front-end
New Test Set
LLM-based GenAI Solution
1.Select FM 3.
3. Test &
Chain Test & Test
5. Input/ Develop & Input/ Output
or Select
1. Use Fine-
FM Test Prompt
Prompts & Prompt
Output Deploy Web & Rating WebUI
tuned Mode lineage
Applications lineage
Filtering GenAI End-users
(input & outputs)
Application Interaction Use web application
(input & outputs)
rate the quality of
output
Create User
4.
4. Input/
Chain 6.
5. Rating Test
2. Prompt Account &
Prompts
Output& Mechanisms Functionality
Engineering (thumbs up/down,
Share Data
Applications
Filtering rating, text)
Fine-tune Personalized Models

Fine-tune FM using APIs
35
36
Amazon Bedrock
37
The Journey of Providers
GenAI Providers Productionize FM using MLOps
TRAIN MULTIPLE FOUNDATIONS MODELS
The Journey of Fine-tuners
GenAI Processes - Fine-Tuners
Data Labeling Fine-tune Deployment & Monitor

Human in the loop to label Customization for Prompt Engineering Human in the loop
trillions, thousands, or specific domains Trade-off between cost, feedback/rating, result
hundreds of data precision, & latency similarities, toxicity rate,
Chaim of models new methods under
Prompt designing and research…
engineering
Filtering input prompts and
results using embeddings
45
Fine-Tuning, PEFT & Training
Prompt
‘right’ model
Select the
(Inputs)
Fine-Tuning
(Training Job)
Requires high computation power (high cost but lower
Pre-trained FM than training) to calculate all the weights of Large FM
(deep learning model)
Higher accuracy
Thousands of example
input/outputs Fine-Tuned FM
Parameter-Efficient Fine-
Tuning (PEFT - Training Job)
Tens of example Requires low computation power (reduced cost e.g. 1/10 of Completion
input/outputs fine-tuning) as it adds small new layers in the Large FM (deep
learning model) (Outputs)
Lower accuracy
MLOPs Technology
MLOPs & GenAI Technology - Fine-tuner
Multi-tenancy
Only Open Source
deployment
FM can be stored in
Model Registry
Select FM Models
Human in the loop:

Manual testing of
the fine-tuned
models
Data & Open Source Fine-tuned FM Deployment
AWS Account – Fine-tuner
Model Registry
Main Model Group
V1 V2 V3
Preloaded Open-
Source FM
Source FM
Open
FM1 Source
Notebook Internet Code
or
Code Repo Internet connection is
required to get locally the
FMs
Amazon Model
SageMaker
Customer Data Endpoint
Data & Proprietary Fine-tuned FM Deployment
AWS Account – Proprietary FM Provider 1

Use
AWS Account – Fine-tuner model Model Registry
package
Main Model Group
Internet
Notebook
or V1 V2 V3
Code Repo
Producer Production Ready FM
Amazon Model
SageMaker
Customer Data Endpoint
AWS Account – Proprietary FM Provider 2
AWS Account – Proprietary FM Provider N
MLOPs &
Generative AI
Technology –
MLOps
MLOps
Fine-tuner
T H R E E M A I N L AY E R S A R E
INTERCONNECTED
Data
Data
Generative AI
Application
Generative AI Application
MLOPs &
Generative AI
Technology – Amazon 4
Fine-tuner Bedrock
THREE MAIN LAYERS ARE
INTERCONNECTED
People & Processes
MLOps Key Personas and Roles
Platform Team Business
Secure Cloud/Data/ML Platform Viz Dashboards, ML Adoption, & ROI
Advance Analytics Team Data Science Team

Data Lake Experimentation & MLOps
MLOps Engineer/Admin Business Stakeholder
Standardize CI/CD, user/service role, Product Owners
model consumption, testing and Define business problem, business KPIs, and
Data Engineer Data Scientist
Prepare & Ingest data Create the best ML models
building ETL pipelines to solve business problems
Security Business Stakeholder

Assess data, user, and service Data & ML Consumers
access creating policies and Consumers of ML results from other BUs,
guardrails driving business decision making
Data Owners ML Engineer
Manage data sharing Collaborate with DS to Risk & Compliance
and provide access productionize ML
Architects/ SysOps Engineer

Standardize account infrastructure,
connectivity, user roles Auditors/Risk & Compliance
implementation Review models, data sources,
code artifacts
MLOps & FMOps Key Personas and Roles
Advance Analytics Team Data Science Team Platform Team Business
Data Lake Experimentation & MLOps Secure Cloud/Data/ML Platform Viz Dashboards, ML Adoption, & ROI
Data Engineer Data Scientist MLOps Engineer/Admin Business Stakeholder

Prepare & Ingest data Create the best ML models Standardize CI/CD, user/service role, Product Owners
building ETL pipelines to solve business problems model consumption, testing and Define business problem, business KPIs, and
Data Owners ML Engineer Security & Architects Business Stakeholder

Manage data sharing Collaborate with DS to Assess data, user, and service Data & ML Consumers
and provide access productionize ML access creating policies and Consumers of ML results from other BUs,
infrastructure driving business decision making
Risk & Compliance

Auditors/Risk & Compliance

Review models, data sources,
code artifacts
MLOps & FMOps Key Personas and Roles
Advance Analytics Team Data Science Team Platform Team Business
Data Lake Experimentation & MLOps Secure Cloud/Data/ML Platform Viz Dashboards, ML Adoption, & ROI
Data Engineer Data Scientist MLOps Engineer/Admin Business Stakeholder

Prepare & Ingest data Create the best ML models Standardize CI/CD, user/service role, Product Owners
building ETL pipelines to solve business problems model consumption, testing and Define business problem, business KPIs, and
Data Owners ML Engineer Security & Architects Business Stakeholder

Manage data sharing Collaborate with DS to Assess data, user, and service Data & ML Consumers
and provide access productionize ML access creating policies and Consumers of ML results from other BUs,
infrastructure driving business decision making
Labeler Team Data Science Team Extension Application Developer Team End-Users
Data Preparation at Scale Context Adaptation Integrate GenAI models in applications Consume Generative AI applications
Data Labelers/Editors Fine Tuners Generative AI Developers, AppDev, & Generative AI End-users
Label or edit billions of Data for Select the corresponding Prompt Engineers/Testers Consume Generative AI solutions as black
FM models and hundreds of data FM, evaluate the model & Design prompt inputs, create examples of box, share data and rate the quality of
for fine tuning interacting with design the deployment prompt input/outputs, and test the output
data lake using a dedicated method/infrastructure engineered prompts, develop the GenAI
website application and front-end
Generative AI Personas
Data Labelers/Editors Fine Tuners Generative AI Developers, AppDev, Generative AI End-users

Label trillions of Data for FM Select the corresponding & Prompt Engineers/Tests Consume Generative AI solutions as black
models and hundreds of data for FM, evaluate the model & Design prompt inputs, create examples of box, share data and rate the quality of
fine tuning interacting with data design the deployment prompt input/outputs, and test the output
lake using a dedicated website method/infrastructure engineered prompts, develop the GenAI
application and front-end
59
Generative AI Personas
Data Labelers/Editors Fine Tuners Generative AI Developers Generative AI End-users

Label trillions of Data for FM Select the corresponding Select, test, evaluate the FM, filter Consume Generative AI solutions as black
models and hundreds of data for FM, evaluate the model & inputs/outputs, and develop the GenAI box, share data and rate the quality of
fine tuning interacting with data design the deployment application back-end (e.g. LangChain Experts) output
lake using a dedicated website method/infrastructure
AppDev
Develop the front-end of the GenAI
application
Prompt Engineers Prompt Testers

Design the input/output Test at scale the Generative
prompts to adapt the AI solution (back-end/
solution to the context and front-end) and feed their
test the initial version results to the prompt test
repository
60
MLOps Foundation People & Processes

ML Production
Environment
Security
Data Scientists Data Scientists ML Engineers
Platform Model
problem, i.e. PoC
Provision infrastructure Ingest data Data

Provide user access Prepare, combine and catalogue Data Data Engineers Data Owners Business Stakeholders
Provide data access Visualize data ML Consumers
MLOps & GenAI Foundation People & Processes

ML Production
Environment
Security
Data Scientists Fine-Tuners Data Scientists ML Engineers

Fine-Tuners
Platform Model
problem, i.e. PoC
Provision infrastructure Ingest data Data Web UI
Provide user access Prepare, combine and catalogue Data Data Engineers Data Owners Business Stakeholders Data
Provide data access Visualize data ML Consumers Labelers / Editors
Application Generative AI Prompt AppDev

Web UI
Generative AI
Developers Engineers /Testers
End-users
Bonus: GenAI/LLM Vulnerabilities
https://owasp.org/www-project-top-10-for-large-
language-model-applications/descriptions/
GenAI/LLM Vulnerabilities 1/2
Prompt Injections: Bypassing filters or manipulating the LLM using carefully crafted prompts that make
the model ignore previous instructions or perform unintended actions.
Data Leakage: Accidentally revealing sensitive information, proprietary algorithms, or other

confidential details through the LLM’s responses.
Inadequate Sandboxing: Failing to properly isolate LLMs when they have access to external resources
or sensitive systems, allowing for potential exploitation and unauthorized access.
Unauthorized Code Execution: Exploiting LLMs to execute malicious code, commands, or actions on the
underlying system through natural language prompts.
SSRF Vulnerabilities: Exploiting LLMs to perform unintended requests or access restricted resources,
such as internal services, APIs, or data stores.
GenAI/LLM Vulnerabilities 2/2
Overreliance on LLM-generated Content: Excessive dependence on LLM-generated content without
human oversight can result in harmful consequences.
Inadequate AI Alignment: Failing to ensure that the LLM’s objectives and behavior align with the
intended use case, leading to undesired consequences or vulnerabilities.
Insufficient Access Controls: Not properly implementing access controls or authentication, allowing
unauthorized users to interact with the LLM and potentially exploit vulnerabilities.
Improper Error Handling: Exposing error messages or debugging information that could reveal
sensitive information, system details, or potential attack vectors.
Training Data Poisoning: Maliciously manipulating training data or fine-tuning procedures to introduce
vulnerabilities or backdoors into the LLM
Conclusion
MLOps FMOps Processes & People


Processes - Proprietary, open source based on the application

A/B testing & human feedback
People
Technology
Thank you! Please complete the session
survey in the mobile app
Dr Sokratis Kartakis (he/him) Heiko Hotz (he/him)

Senior MLOps SA Architect, EMEA Senior LLM SA Architect, EMEA
Amazon Web Services Amazon Web Services
@sokratis.kartakis
© 2023, Amazon Web Services, Inc. or its affiliates.

aﬃliates. All rights reserved.

AWS FMOps LLMOps Operationalise GenAI Using MLOps Principles

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

AWS FMOps LLMOps Operationalise GenAI Using MLOps Principles

Uploaded by

Copyright:

Available Formats

300 LEVEL

Dr Sokratis Kartakis (he/him) Heiko Hotz (he/him)

MLOps Foundation Overview

Generative AI (GenAI) & MLOps

MLOps & FMOps/LLMOps Differentiators

The combination of people,

Reduce platform, people and operation costs

Customer references building MLOps foundation and business benefits:

Advance Analytics Team Data Science Team

Security Business Stakeholder

Architects/ SysOps Engineer

ML Governance Product Owner Model Auditors &

Data Scientists Data Scientists ML Engineers

Provision infrastructure Ingest data Data

Text-to-Text = LLM – Unlabeled data {text}

Text-to-Image - Labeled data {text, image}

Foundation Model Operations

Large Language Model Operations

Select & Adapt the FM on a Specific Context

Evaluate & Monitor Fine-tuned Models

MLOps FMOps Processes & People

Select & Adapt the FM on a Specific Context

Evaluate & Monitor Fine-tuned Models

can be also can become

Generative AI Providers Fine-Tuners Consumers

Select, evaluate, & use FM as Inputs/Outputs & Rating

Quick short listing: Use case-based benchmarking: Priority-based decision:

Main FM Capability Matrix

Commercial License Fine-tunable

Speed #Parameter Context Window

Existing Customer Skills

Amazon Titan Text Large Yes n/a n/a Bedrock 4K n/a No

Generate Model n/a Internet Data,

*Last update 16 Jun 2023 23

Starcoder Yes 15 B g5.12xl SM 8K Code Yes

Stability AI SD 2.1 Yes - g5.xl Jumpstart/SM - <Text, Image> Yes

Prompt Engineers GenAI Developers

GenAI use case example : Prompt examples:

Accuracy metrics Classic ML metrics e.g. precision,

Evaluation Prompt Catalog Evaluation Results

Prompt Output Evaluation Prompt Labeled LLM Score Feedback

Model Evaluation Score HIL/LLM Feedback

High speed, smaller model, Model Selection:

P1: Precision Precision Cost P0: lower cost

Model Evaluation Score HIL/LLM Feedback Model Cost

GenAI Developers & Prompt Engineers/Testers DevOps/AppDevs

New Test Set

3. Test & 5. Input/ Develop & Input/ Output

4. Chain 6. Rating Test

Select, evaluate, & use FM as Inputs/Outputs & Rating

Share data & fine-tune

GenAI Developers & Prompt Engineers/Testers DevOps/AppDevs

Fine-tune Personalized Models

Data Labeling Fine-tune Deployment & Monitor

Human in the loop:

AWS Account – Fine-tuner

AWS Account – Proprietary FM Provider 1

AWS Account – Proprietary FM Provider N

Advance Analytics Team Data Science Team

Security Business Stakeholder

Architects/ SysOps Engineer

Data Engineer Data Scientist MLOps Engineer/Admin Business Stakeholder

Data Owners ML Engineer Security & Architects Business Stakeholder