
How to Enhance the Output of LLMs
What Prompt Engineering, RAG & Fine-Tuning Are All About

Prompt Engineering, RAG & Fine-Tuning
When integrating Large Language Models (LLMs) into your product or application,
ensuring high-quality output is essential. But how can you actually achieve that?
To enhance the performance of LLMs, three key techniques come into play: Prompt
Engineering, RAG, and Fine-Tuning.

The choice of technique depends on two factors:

(1) the need for external knowledge
(2) the extent of required model adaptation (e.g. behavior, writing style)

By carefully considering these aspects, you can tailor your approach to optimize LLM outputs for specific tasks and domains.

*Visualization by Akshay Pachaar



1. Prompt Engineering

What is Prompt Engineering?


Prompt engineering is the practice of designing and refining prompts (questions or instructions) to elicit specific responses from an LLM.
By being more precise with your instructions, the LLM generates more accurate and relevant information for your specific needs. This skill is needed both when using an LLM (like ChatGPT) directly and when implementing an LLM in your product.
While prompt engineering is perfect for fast iteration and prototyping with LLMs, it falls short (as a stand-alone technique) for more complex use cases, since the prompt is limited in length.

Example of a Prompt:

Priming: "You are a health expert that answers questions about nutrition and health in a helpful way."

Style & tone instructions: "Use an informal and motivating language and explain background information about the human body in a very simple way."

Dynamic content: "How should I set up my nutrition plan if I want to run a marathon in 6 months?"
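In code, these three parts could be assembled as follows. This is a minimal sketch, assuming the OpenAI Python SDK; the model name is illustrative and not part of the original example.

```python
# Minimal sketch: assembling priming, style & tone, and dynamic content into one LLM call.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

priming = ("You are a health expert that answers questions about "
           "nutrition and health in a helpful way.")
style_and_tone = ("Use an informal and motivating language and explain background "
                  "information about the human body in a very simple way.")
dynamic_content = ("How should I set up my nutrition plan if I want to run "
                   "a marathon in 6 months?")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": f"{priming} {style_and_tone}"},  # static instructions
        {"role": "user", "content": dynamic_content},                  # changes per request
    ],
)
print(response.choices[0].message.content)
```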


2. Retrieval Augmented Generation (RAG)

What is RAG?
… explained with a prompt
Retrieval Augmented Generation (RAG) is a technique that equips LLMs with external
knowledge.
Think of it as enhancing a prompt with an extra layer of context. However, the added
context isn't just a few details – it's an entire dataset of knowledge automatically
integrated into the prompt. This enriches the LLM's understanding and enables it to
generate more informed and contextually relevant responses.

Integrated into the prompt: "Use this to help if relevant: {knowledge}"

External knowledge:
{knowledge} = Handbook of Nutrition & Sport

"Chapter 3: Nutrition to boost endurance
To support endurance during high-intensity sport it is important to have a balanced nutrition plan consisting of…"


How does RAG work?


RAG achieves this by connecting the base LLM to external knowledge sources through retrieval mechanisms.
Consequently, the prompt is automatically injected with relevant document snippets from the database, such as parts of a book chapter (often so-called vector databases are used, which are good at retrieving text content). Using this knowledge, the base LLM produces responses that are both contextually accurate and enriched with a broader understanding of the user's input.
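To make this concrete, here is a minimal sketch of the retrieval-and-injection flow. The keyword-overlap scoring is a hypothetical stand-in for the embedding similarity search a real vector database would perform; the knowledge snippets are illustrative.

```python
# Minimal sketch of the RAG flow: retrieve relevant snippets, inject them into the prompt.
# The toy word-overlap score stands in for vector similarity in a real vector database.
KNOWLEDGE_BASE = [
    "Chapter 3: Nutrition to boost endurance. To support endurance during "
    "high-intensity sport it is important to have a balanced nutrition plan.",
    "Chapter 7: Hydration. Fluid intake should be adjusted to training load.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k snippets sharing the most words with the query (toy retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Automatically inject retrieved snippets into the prompt."""
    knowledge = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    return (f"Use this to help if relevant:\n{knowledge}\n\n"
            f"Question: {question}")

# The enriched prompt is then sent to the base LLM.
print(build_prompt("How should I plan my nutrition for endurance sport?"))
```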


When is RAG used?


Even though the base LLM is trained on knowledge from the internet, there is domain/context-specific knowledge and up-to-date information that the base model does not have.
In those cases the output can be improved through RAG. One example is an expert chatbot on a very specific topic (e.g. ESG in real estate).

Example of a Use Case:
Expert chatbot for ESG in the context of real estate

Advantages:
● Less expensive than fine-tuning for the setup
● Easy to keep content up-to-date
● Less technical expertise required than fine-tuning
● Transparency in response generation

Disadvantages:
● Increasing costs through long prompts
● Possibility of hallucinations through retrieval of small information snippets


3. Fine-Tuning

What is Fine-Tuning?
… explained with a prompt

Fine-tuning is a technique that adapts an LLM to solve an instruction in a certain way or format.
It operates similarly to a prompting technique known as few-shot prompting. In this method, the prompt does not include explicit explanations; instead, it uses labeled examples illustrating how a task is performed or what the desired output should look like. For fine-tuning, however, the LLM is actually trained on a large number of such examples.

Example of Few-Shot Prompting:

Priming: "You are a sales lead qualifier that will determine whether an inbound mail is qualified or unqualified."

Few-shot examples: "Here are examples of how we qualify:

We are a project development company and we are having problems filing reportings to the bank since we are missing one data source for all financials of our construction project. Can we arrange a demo?
-> Qualified

We are a general contractor and we are looking for a solution for our accountant to manage invoices. Can we arrange a demo?
-> Unqualified"

Dynamic content: "Now qualify the following mails:"
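Sent to a chat model, these few-shot examples could be encoded as alternating user/assistant messages. A minimal sketch, again assuming the OpenAI Python SDK; the new mail and model name are illustrative.

```python
# Minimal sketch: few-shot prompting via alternating user/assistant message pairs.
from openai import OpenAI

client = OpenAI()

priming = ("You are a sales lead qualifier that will determine whether "
           "an inbound mail is qualified or unqualified.")
few_shot_examples = [
    ("We are a project development company and we are having problems filing "
     "reportings to the bank since we are missing one data source for all "
     "financials of our construction project. Can we arrange a demo?", "Qualified"),
    ("We are a general contractor and we are looking for a solution for our "
     "accountant to manage invoices. Can we arrange a demo?", "Unqualified"),
]

messages = [{"role": "system", "content": priming}]
for mail, label in few_shot_examples:
    messages.append({"role": "user", "content": mail})        # example input
    messages.append({"role": "assistant", "content": label})  # desired output

new_mail = "We need one tool to track all costs across our projects. Demo possible?"
messages.append({"role": "user", "content": new_mail})        # mail to qualify now

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)  # e.g. "Qualified"
```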


How does Fine-Tuning work?

Fine-tuning trains an LLM for a precise task or specific behavior, eliminating the need for external information retrieval.
This process involves training a base model on task-specific data (a training set) consisting of labeled examples aligned with the target task. Subsequently, users engage directly with the fine-tuned LLM instead of the base model.
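As a minimal sketch, such a training set can be written as one labeled example per line in the chat-style JSONL format used, for instance, by OpenAI's fine-tuning API; the examples and file name are illustrative.

```python
# Minimal sketch: writing labeled examples to a JSONL training file for fine-tuning.
import json

priming = ("You are a sales lead qualifier that will determine whether "
           "an inbound mail is qualified or unqualified.")

labeled_examples = [
    ("We are missing one data source for all financials of our construction "
     "project. Can we arrange a demo?", "Qualified"),
    ("We are looking for a solution for our accountant to manage invoices. "
     "Can we arrange a demo?", "Unqualified"),
    # in practice: many more labeled examples
]

with open("training_set.jsonl", "w") as f:
    for mail, label in labeled_examples:
        record = {"messages": [
            {"role": "system", "content": priming},
            {"role": "user", "content": mail},        # example input
            {"role": "assistant", "content": label},  # desired output
        ]}
        f.write(json.dumps(record) + "\n")
```

This file is then used to train the base model; afterwards, requests go to the resulting fine-tuned model instead of the base model.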


When is Fine-Tuning used?


The base LLM is trained to provide answers to questions, but there are instances
where you desire outputs in a particular format or task execution in a specific
manner.
Fine-tuning becomes essential when tailoring the behavior of the LLM, refining its
writing style, or incorporating domain-specific knowledge to align with specific
nuances, tones, terminologies or the demands of a particular job.

Example of a Use Case:
… when you have a lot of example data showing what the output should look like
… when words fall short for instructions

Advantages:
● Improving output when words fall short and examples are a better explanation
● Increasing speed through use of a smaller model
● Saving costs through reducing the prompt length

Disadvantages:
● High effort for adaptations due to static data snapshots
● Possibility of hallucinations when faced with unfamiliar input
● High effort in generating training data beforehand
● Fine-tuning tends to be a "black box"


Recommended Resources
… to understand more

● "Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use" by Entry Point AI
● "RAG vs Fine Tuning — Which Is the Best Tool to Boost Your LLM Application?" by Heiko Hotz


Feel free to reach out to us!
Julia Bastian & Sebastian Schuon

#LLM | #AIEnablement | #AlascoOnAI
