Topics:
1. What are LLMs
2. What is RAG
3. Comparison of RAG and GPT
4. What is LlamaIndex
5. How to get started with RAG
6. Applications and future aspects of RAG
7. Conclusion
8. Bibliography
(5)
https://learnbybuilding.ai/tutorials/rag-from-scratch
https://www.anaconda.com/blog/how-to-build-a-retrieval-augmented-generation-chatbot
https://dev.to/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg
My goal is to tell you all the tricks and traps that lurk past the basics, but before we
get there, we have to cover the basic idea behind Retrieval Augmented Generation.
With RAG you supply an AI with relevant text just-in-time to help patch gaps in its
knowledge base. This is essential if you want a general purpose model like GPT-3.5
or GPT-4 to answer questions about your own data.
RAG begins with a vector database, which allows you to quickly search through large amounts of text for relevant information. The vector database I see recommended most often is Pinecone, and it's the one I chose.
Pinecone has a lot of powerful features that many vector databases don't.
However, Pinecone’s free tier is pretty limited and the paid tiers are all expensive.
There are strong open source alternatives like FAISS. I’d say Pinecone is a solid choice
but I wouldn’t call it a best practice. Choose the vector database that’s right for you.
Note - if you want to use OpenAI and Pinecone, you'll need to sign up for API access to both. It took me days to get access to both systems, so I recommend getting a head start on this.
Before you can use a vector database you need to ‘embed’ a text string, which means
you need to convert it into a very big array of floating point numbers. It’s important
to choose the embedding method that matches your AI, but creating an embedding
is extremely easy.
If you want to create a vector embedding with OpenAI, you just call the text-embedding-ada-002 model and get back a 1536-dimension vector:
import openai  # openai-python v0.x (legacy) API

openai.api_key = "..."  # your OpenAI API key

response = openai.Embedding.create(
    input="Your text string goes here",
    model="text-embedding-ada-002"
)
embeddings = response['data'][0]['embedding']  # a list of 1536 floats
After that you store the embedding in your vector database. If you're using Pinecone and OpenAI, you'll store your vectors in an 'index'. Set up an index with the cosine metric and 1536 dimensions to hold your vectors.
Not all embedding models produce 1536 dimensions, so that number will change if you use a different model. However, every vector database I've encountered uses cosine as the default method for measuring similarity.
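To make "cosine" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional toy vectors are made up for illustration; real embeddings would have 1536 dimensions:

```python
import math

def cosine_similarity(a, b):
    # cosine = dot(a, b) / (|a| * |b|); ranges from -1 to 1,
    # where 1 means the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]   # same direction as v1, just scaled
v3 = [-1.0, 0.0, 1.0]  # points a different way

print(round(cosine_similarity(v1, v2), 3))  # 1.0 (identical direction)
print(cosine_similarity(v1, v3) < cosine_similarity(v1, v2))  # True
```

Because cosine ignores vector length and compares only direction, two embeddings of texts with similar meaning score close to 1 even if the texts differ in length.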
When the user performs a search, your vector database will find the most similar stored text - and it'll do it very fast.
What is cool about this type of search is that it doesn't require using the same words; it works as long as the meaning is similar. That is why it's called 'semantic search'.
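The retrieval step described above can be sketched as a brute-force nearest-neighbour search. Real vector databases use approximate indexes to stay fast at scale, and the toy vectors and document IDs here are stand-ins for real embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "vector database": document ID -> embedding.
vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 1.0, 0.1],
    "doc3": [0.8, 0.2, 0.1],
}

def search(query_vec, k=2):
    # Rank every stored vector by similarity to the query, keep the top k.
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

print(search([1.0, 0.0, 0.0]))  # ['doc1', 'doc3']
```

The query vector never has to match any stored vector exactly; the nearest neighbours by cosine similarity win, which is the mechanical heart of semantic search.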
After you get back the vectors, you have to associate them with your text so you can pass it on to the AI. So how should you associate your vector with its text?
I've seen many people store the text as metadata directly on the vector, but I very strongly recommend storing a related database ID instead. Doing that gives you far more flexibility and a great external backup of your vector database's contents.
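The recommended pattern can be sketched like this. The dict stands in for a real relational database table, and the record layout and names are illustrative, not any particular vector database's schema:

```python
# Stand-in for a relational database table: text_id -> original text.
text_store = {
    42: "Pinecone indexes are configured with a dimension and a metric.",
    43: "FAISS is an open-source similarity-search library.",
}

# What you upsert into the vector database: the embedding plus a small
# metadata payload holding only the database ID, not the full text.
vector_record = {
    "id": "vec-42",
    "values": [0.12, -0.07, 0.33],      # toy embedding
    "metadata": {"text_id": 42},
}

# At query time, the vector DB returns the matching record; you resolve
# its ID against your own database to recover the text for the prompt.
matched_id = vector_record["metadata"]["text_id"]
retrieved_text = text_store[matched_id]
print(retrieved_text)
```

If you ever need to re-embed your corpus (say, after switching embedding models), the text lives safely in your own database rather than only inside the vector store.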
Finally, supply the retrieved text to the AI in a prompt, which it can use (hopefully) to provide the user with a correct answer. Easy, right?
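Assembling that final prompt is plain string formatting. The template wording below is just one reasonable choice, not a standard:

```python
def build_rag_prompt(question, retrieved_chunks):
    # Concatenate the retrieved text as context, then ask the question.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What metric does the index use?",
    ["The index was created with metric='cosine' and dimension=1536."],
)
print(prompt)
```

The "using only the context" instruction is the part that steers the model toward the retrieved text instead of its own (possibly stale) training data.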
Of course, you can probably guess that it’s not quite that simple in practice. I’ll cover
all the gotchas and problems you’ll face in future posts. Thanks for reading!
(6)
Seven Real-World Applications of Retrieval-Augmented
Generation (RAG) Models
Retrieval-augmented generation models have demonstrated versatility across
multiple domains. Some of the real-world applications of RAG models are:
RAG models can power question-answering systems that retrieve and generate
accurate responses, enhancing information accessibility for individuals and
organizations. For example, a healthcare organization can use RAG models to
develop a system that answers medical queries by retrieving information from
medical literature and generating precise responses.
4. Information retrieval
First up: “Chatify” your website. With a setup time of just 30 minutes, SquirroGPT
empowers you to elevate your website's user experience, providing an interactive,
engaging, and responsive chat interface. Whether it's addressing user queries,
offering support, or facilitating seamless navigation, SquirroGPT is equipped to
handle it all, ensuring your visitors find exactly what they're looking for with ease and
convenience.
Responding to Requests for Proposals (RFPs) or Requests for Information (RFI) can
be a daunting and time-consuming task, requiring meticulous attention to detail and
extensive knowledge of your company’s offerings. Enter SquirroGPT, designed to
revolutionize the way you handle RFPs/RFIs. SquirroGPT can quickly and accurately
generate comprehensive response sheets to even the most complex RFPs/RFIs,
ensuring your proposals are coherent, compelling, and to the point. By leveraging
SquirroGPT, companies can not only expedite the RFP/RFI answering process but
also significantly enhance the quality and precision of their responses, thereby
increasing the chances of securing valuable contracts.
GPT-Enabled Fashion Recommendations
Applications of RAG
Enhanced Factual Accuracy: LLMs are notorious for generating fluent but
factually unreliable text. RAG overcomes this by allowing the model to
consult trustworthy sources for factual grounding, making responses more
reliable.
Fine-tuning for Specific Domains: RAG can be fine-tuned for specific domains
by incorporating domain-specific knowledge bases. This will allow for the
generation of highly specialized content tailored to a particular field.