
A Survey of Techniques for

Maximizing LLM Performance


Techniques for Refining the Performance of Large Language Models
https://www.youtube.com/watch?v=ahnGLM-RC1Y
Organized by Richard
Twitter: richchat
WeChat official account: 檬查查
Optimizing LLMs is hard
• Extracting signal from the noise is not easy
• Performance can be abstract and difficult to measure
• Knowing when to use which optimization is hard
• Today's talk is about maximizing performance.
You should leave here with:
• A mental model of what the options are
• An appreciation of when to use one over the other
• The confidence to continue on the journey yourself
Optimizing LLM performance is not always linear
1 – Prompt Engineering
Prompt engineering - Strategies for optimization

Start with
• Write clear instructions
• Split complex tasks into simpler subtasks
• Give GPTs time to "think"
• Test changes systematically

Extend to
• Provide reference text
• Use external tools
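The starting strategies above can be sketched as a prompt-assembly helper. This is a hypothetical illustration, not an official API: the task, subtasks, and reference text are placeholder values.

```python
# Hypothetical sketch: composing a prompt that applies the starting
# strategies (clear instructions, simpler subtasks, time to "think",
# grounding in reference text). All inputs are illustrative.

def build_prompt(task: str, steps: list[str], reference: str) -> str:
    """Assemble a prompt with explicit instructions and staged subtasks."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return (
        f"You are a careful assistant. Task: {task}\n\n"
        "Work through the following subtasks in order, and show your "
        "reasoning for each step before giving a final answer.\n"
        f"{numbered}\n\n"
        "Use only the reference text below; if the answer is not there, "
        "say so.\n"
        f"Reference:\n{reference}"
    )

prompt = build_prompt(
    task="Summarize the refund policy.",
    steps=["Find the relevant clauses", "Summarize them in two sentences"],
    reference="Refunds are accepted within 30 days with a receipt.",
)
print(prompt)
```

Testing changes systematically then means varying one part of this template at a time and measuring against a fixed evaluation set.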
Prompt engineering
Intuition - Best place to start, and can be a pretty good place to finish
Good for
• Testing and learning early
• When paired with evaluation, it provides your baseline and sets up further optimization

Not good for
• Introducing new information
• Reliably replicating a complex style or method, i.e. learning a new programming language
• Minimizing token usage
2 – Retrieval-Augmented Generation
RAG vs fine-tuning
RAG
• Giving the model access to domain-specific context
RAG
Intuition - If you want to give your LLM domain knowledge, then RAG is
likely the best next step

Good for
• Introducing new information to the model to update its knowledge
• Reducing hallucinations by controlling content

Not good for
• Embedding understanding of a broad domain
• Teaching the model to learn a new language, format or style
• Reducing token usage
RAG – success story
RAG - Cautionary Tale
RAG - How to think about eval
3 – Fine-tuning
Fine-tuning
• Continuing the training process on a smaller, domain-specific dataset to optimize a model for a specific task
Fine-tuning Benefits
• Improve model performance on a specific task
  • Often a more effective way of improving model performance than prompt engineering or few-shot learning (FSL)
• Improve model efficiency
  • Reduce the number of tokens needed to get a model to perform well on your task
  • Distill the expertise of a large model into a smaller one
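Fine-tuning starts with a training dataset. The sketch below prepares examples in the chat-style JSONL format used by common fine-tuning APIs; the system message and question/answer pairs are placeholders for a real domain-specific dataset.

```python
import json

# Sketch: converting domain examples into chat-format JSONL records
# for fine-tuning. One JSON object per line, each holding a full
# system/user/assistant exchange. All content here is illustrative.

examples = [
    ("What is your return window?", "Returns are accepted within 30 days."),
    ("Do you ship overseas?", "Yes, we ship to most countries."),
]

records = [
    {
        "messages": [
            {"role": "system", "content": "You are a support agent for Acme."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    for question, answer in examples
]

jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl.splitlines()[0])
```

Because the desired tone and structure are baked into the assistant turns, the fine-tuned model can reproduce them without lengthy instructions at inference time.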
Fine-tuning
Intuition - If prompt engineering isn't helping, fine-tuning likely isn't right for your use-case

Good for
• Emphasizing knowledge that already exists in the model
• Customizing the structure or tone of responses
• Teaching a model very complex instructions

Not good for
• Adding new knowledge to the base model
• Quickly iterating on a new use-case
Fine-tuning – Canva success story
Fine-tuning - Cautionary Tale
Fine-tuning Steps
Fine-tuning Best Practices
Fine-tuning+RAG
Best of both worlds
• Fine-tune the model to understand complex instructions
• Minimize prompt-engineering tokens
• More space for retrieved context
• Use RAG to inject relevant knowledge into the context
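The combination above can be sketched as a prompt builder for a fine-tuned model. Because the model was trained on the task format, the prompt carries only retrieved context and the question; `retrieve` here is a hypothetical stand-in for any retrieval step.

```python
# Sketch: fine-tuning + RAG. The fine-tuned model already understands
# the task, so no lengthy instructions are needed, leaving more of the
# context window for retrieved knowledge. retrieve() is a placeholder.

def retrieve(question: str) -> str:
    # Placeholder: a real system would query a vector store.
    return "Refunds are accepted within 30 days with a receipt."

def build_rag_prompt(question: str) -> str:
    context = retrieve(question)
    # Minimal scaffolding: the fine-tuned model was trained on this
    # structure, so instruction tokens are saved for context instead.
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_rag_prompt("What is the refund policy?")
print(prompt)
```

Compare this with the untuned case, where the same prompt would also need several paragraphs of instructions and examples.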
4 – Application of Theory
Challenge
RAG – What we did
RAG - Evaluation
Fine-tuning
The optimization flow
Thank you!