
The New Stack and Ops for AI

https://www.youtube.com/watch?v=XGJNo8TpuVA
Going from prototype to production

A framework to help guide you in moving your app from prototype to production.

1. User Experience - Challenges
Control for uncertainty
Build guardrails for steerability and safety
Build a transparent UX
Keep the human in the loop

Communicate the system’s capabilities and limitations to the user


Guide the user through Human/AI collaboration

Guardrails = Safety controls for LLMs


Guardrails are essential for UX, especially for applications in regulated industries.
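One simple way to implement an input guardrail (not necessarily the exact approach shown in the talk) is to screen user input with OpenAI's Moderation API before it ever reaches the chat model. A minimal sketch, assuming the official openai Python SDK (v1.x) and an OPENAI_API_KEY in the environment; the system prompt and refusal message are illustrative:

from openai import OpenAI

client = OpenAI()

def answer_with_guardrail(user_input: str) -> str:
    # Safety guardrail: block flagged input instead of forwarding it to the model.
    moderation = client.moderations.create(input=user_input)
    if moderation.results[0].flagged:
        return "Sorry, I can't help with that request."

    # Steerability guardrail: a constrained system prompt keeps answers on-topic.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a support assistant. "
             "Only answer questions about our product; otherwise politely decline."},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content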
2. Model Consistency
Constrain model behavior
JSON mode => allows you to force the model to output JSON; new parameter json_schema …
Reproducible output: you can get significantly more reproducible output using the seed parameter
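A minimal sketch of both controls, assuming the openai Python SDK (v1.x) and a model that supports JSON mode and seed (gpt-4-1106-preview here); the prompt and field names are illustrative:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    seed=42,  # same seed + same parameters => significantly more reproducible output
    response_format={"type": "json_object"},  # JSON mode: force valid JSON output
    messages=[
        {"role": "system", "content": "Extract the person's name and city as JSON "
         "with keys 'name' and 'city'."},
        {"role": "user", "content": "Alice moved to Paris last year."},
    ],
)

print(response.choices[0].message.content)  # e.g. {"name": "Alice", "city": "Paris"}
# system_fingerprint identifies the backend configuration; compare it across
# runs when checking how reproducible your outputs actually are.
print(response.system_fingerprint)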

Ground the model (using a knowledge store)


In the input context, explicitly give the model “grounded facts” to reduce the likelihood of hallucinations (as in RAG)
Example (flow from the slide): function call API => call an API function against a grounded fact source => structured answer.

Grounded fact sources: search index, retrieval, database, etc.
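A minimal sketch of this grounding pattern, assuming the openai Python SDK; the tiny in-memory fact list and keyword matching stand in for a real search index, retrieval system, or database:

from openai import OpenAI

client = OpenAI()

def retrieve_facts(query: str) -> list[str]:
    # Stand-in for a real grounded fact source (search index, retrieval, database).
    knowledge_store = [
        "Our premium plan costs $20/month.",
        "Refunds are available within 30 days of purchase.",
    ]
    words = query.lower().split()
    return [fact for fact in knowledge_store
            if any(word in fact.lower() for word in words)]

def grounded_answer(question: str) -> str:
    facts = retrieve_facts(question)
    context = "\n".join(f"- {fact}" for fact in facts)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # Grounded facts go explicitly into the input context.
            {"role": "system", "content":
                "Answer using ONLY the facts below. If they do not contain "
                "the answer, say you don't know.\n\nFacts:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content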

3. Evaluating Performance - Strategies to evaluate


1. Create eval suites for your specific use cases: https://github.com/openai/evals
Log and track your eval runs
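A minimal sketch of a use-case-specific eval suite that logs each run, assuming the openai Python SDK; the test cases, substring scoring, and eval_runs.jsonl log file are illustrative (openai/evals provides a much fuller framework):

import json
import time
from openai import OpenAI

client = OpenAI()

EVAL_CASES = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "What is the capital of France?", "expected": "Paris"},
]

def run_eval(model: str = "gpt-3.5-turbo") -> dict:
    results = []
    for case in EVAL_CASES:
        answer = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": case["input"]}],
        ).choices[0].message.content
        # Simple substring check; real suites use stricter, task-specific scoring.
        results.append({**case, "answer": answer,
                        "passed": case["expected"].lower() in answer.lower()})
    run = {
        "model": model,
        "timestamp": time.time(),
        "accuracy": sum(r["passed"] for r in results) / len(results),
        "results": results,
    }
    # Log and track: append every run so results can be compared over time.
    with open("eval_runs.jsonl", "a") as f:
        f.write(json.dumps(run) + "\n")
    return run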

2. Model-graded evals (using AI to grade AI)


GPT-4 is a strong evaluator; use binary metrics
Use a metric closely correlated with what your users would expect
If GPT-4 is too expensive/slow for evals, you can fine-tune a GPT-3.5 “judge” by distilling GPT-4’s output
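A minimal sketch of a model-graded eval with a binary PASS/FAIL metric, assuming the openai Python SDK; the judge prompt wording is an illustrative assumption:

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Reference answer: {reference}
Assistant answer: {answer}
Reply with exactly one word: PASS if the assistant answer is correct and
consistent with the reference, FAIL otherwise."""

def judge(question: str, reference: str, answer: str) -> bool:
    # GPT-4 as the grader; if it is too slow/expensive, distill its verdicts
    # into a fine-tuned GPT-3.5 judge and swap the model name here.
    verdict = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
    ).choices[0].message.content.strip().upper()
    return verdict.startswith("PASS")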
4. Managing Latency & Cost (orchestration)
Two strategies for managing cost and latency (a sketch of both follows below):
1. Use semantic caching (reduce the number of round trips you’re making)
2. Route to cheaper models
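A minimal sketch of both strategies combined, assuming the openai Python SDK and numpy; the similarity threshold, embedding model, and the "short query => cheaper model" routing heuristic are illustrative assumptions:

import numpy as np
from openai import OpenAI

client = OpenAI()
_cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def _embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(result.data[0].embedding)

def answer(query: str, similarity_threshold: float = 0.95) -> str:
    q = _embed(query)
    # 1. Semantic cache: reuse a stored answer if a past query is similar enough,
    #    avoiding an extra round trip to the chat model.
    for cached_q, cached_answer in _cache:
        sim = float(q @ cached_q / (np.linalg.norm(q) * np.linalg.norm(cached_q)))
        if sim >= similarity_threshold:
            return cached_answer
    # 2. Routing: send short/simple queries to a cheaper model.
    model = "gpt-3.5-turbo" if len(query.split()) < 20 else "gpt-4"
    result = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content
    _cache.append((q, result))
    return result

In practice you would tune the similarity threshold against real traffic: too low and users get mismatched cached answers, too high and the cache rarely hits.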


LLMOps
