This document discusses techniques for optimizing large language model (LLM) performance, including prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. Prompt engineering involves writing clear instructions and testing changes systematically. RAG gives the model access to domain-specific context to reduce hallucinations. Fine-tuning continues training on a smaller dataset to improve performance on a specific task. The techniques each have different strengths, and an optimization flow may combine them, such as fine-tuning a model and using RAG to inject relevant knowledge.
A Survey of Techniques for Maximizing LLM Performance
Techniques for Maximizing LLM Performance
https://www.youtube.com/watch?v=ahnGLM-RC1Y
Organized by Richard | Twitter: richchat | WeChat official account: 檬查查

Optimizing LLMs is hard
• Extracting signal from the noise is not easy
• Performance can be abstract and difficult to measure
• It is not obvious when to use which optimization

Today's talk is about maximizing performance. You should leave here with:
• A mental model of what the options are
• An appreciation of when to use one over the other
• The confidence to continue on the journey yourself

Optimizing LLM performance is not always linear.

1 – Prompt Engineering

Strategies for optimization
Start with:
• Write clear instructions
• Split complex tasks into simpler subtasks
• Give GPTs time to "think"
• Test changes systematically
Extend to:
• Provide reference text
• Use external tools

Intuition: the best place to start, and it can be a pretty good place to finish.

Good for:
• Testing and learning early
• When paired with evaluation, it provides your baseline and sets up further optimization
Not good for:
• Introducing new information
• Reliably replicating a complex style or method, e.g. learning a new programming language
• Minimizing token usage

2 – Retrieval-Augmented Generation (RAG)

RAG vs. fine-tuning: RAG gives the model access to domain-specific context.

Intuition: if you want to give your LLM domain knowledge, RAG is likely the best next step.
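The retrieval step at the heart of RAG can be sketched in a few lines. The toy example below substitutes a bag-of-words cosine similarity for a real embedding model (an assumption made so the sketch stays self-contained; a production system would use an embedding API and a vector store), but the flow is the same: embed the documents, retrieve the best match for the query, and inject it into the prompt as context.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real RAG system would call an embedding model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str]) -> str:
    # Return the document most similar to the query.
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

def build_prompt(query: str, documents: list[str]) -> str:
    # Inject the retrieved domain-specific context into the prompt,
    # constraining the model to answer from it (reducing hallucinations).
    context = retrieve(query, documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email between 9am and 5pm UTC.",
]
print(build_prompt("What is the refund policy?", docs))
```

Swapping `embed` for real embeddings and `max` for a vector-store query turns this sketch into the usual RAG pipeline.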
Good for:
• Introducing new information to the model to update its knowledge
• Reducing hallucinations by controlling content
Not good for:
• Embedding understanding of a broad domain
• Teaching the model to learn a new language, format, or style
• Reducing token usage

RAG – success story
RAG – cautionary tale
RAG – how to think about evaluation

3 – Fine-Tuning

Fine-tuning: continuing the training process on a smaller, domain-specific dataset to optimize a model for a specific task.

Benefits:
• Improve model performance on a specific task; often more effective than prompt engineering or few-shot learning (FSL)
• Improve model efficiency; reduce the number of tokens needed to get a model to perform well on your task
• Distill the expertise of a large model into a smaller one

Intuition: if prompt engineering isn't helping, fine-tuning likely isn't right for your use case.
Good for:
• Emphasizing knowledge that already exists in the model
• Customizing the structure or tone of responses
• Teaching a model very complex instructions
Not good for:
• Adding new knowledge to the base model
• Quickly iterating on a new use case

Fine-tuning – Canva success story
Fine-tuning – cautionary tale
Fine-tuning steps
Fine-tuning best practices

Fine-tuning + RAG: best of both worlds
• Fine-tune the model to understand complex instructions
• Minimize prompt-engineering tokens, leaving more space for retrieved context
• Use RAG to inject relevant knowledge into the context

4 – Application of Theory

Challenge
RAG – what we did
RAG – evaluation
Fine-tuning
The optimization flow

Thank you!
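The combined fine-tuning + RAG flow above can be sketched as prompt assembly: because complex instructions were learned during fine-tuning, the runtime system message stays tiny, and the freed-up budget goes to retrieved context. The model identifier and the character-based budget below are illustrative assumptions, not values from the talk.

```python
# Placeholder identifier for a hypothetical fine-tuned model.
FINE_TUNED_MODEL = "ft:example-model:acme::0001"

def assemble_messages(question: str, retrieved_chunks: list[str],
                      max_context_chars: int = 2000) -> list[dict]:
    # The fine-tuned model already follows the complex instructions,
    # so the system prompt is minimal; most of the budget holds
    # RAG-retrieved context. Chunks are assumed ranked by relevance.
    context, used = [], 0
    for chunk in retrieved_chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop once the context budget is exhausted
        context.append(chunk)
        used += len(chunk)
    return [
        {"role": "system", "content": "Use the provided context."},
        {"role": "user", "content": "\n\n".join(context) + "\n\nQ: " + question},
    ]

messages = assemble_messages(
    "What changed in v2?",
    ["v2 adds streaming support.", "v1 was batch-only."],
)
```

A real deployment would pass `messages` to the fine-tuned model through a chat completions call; that call is omitted here since it depends on the provider.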