Building LLM Applications: Open-Source RAG (Part 7)
By Vipra Singh
Learn Large Language Models (LLM) through the lens of a Retrieval Augmented
Generation (RAG) Application.
2. Data Preparation
3. Sentence Transformers
4. Vector Database
6. LLM
8. Evaluation
9. Serving LLMs
Table of Contents
· 1. Introduction
∘ 1.1. LLMs
∘ 1.2. LLM Providers
∘ 1.3. Vector Databases
∘ 1.4. Embedding Models
∘ 1.5. Orchestration Tools
∘ 1.6. Quality Tuning Tools
∘ 1.7. Data Tools
∘ 1.8. Infrastructure
· 2. Build an LLM application from scratch
∘ 2.1. Prepare the data
∘ 2.2. Create the embeddings + retriever
∘ 2.3. Load quantized model
∘ 2.4. Setup the LLM chain
∘ 2.5. Compare the results
· 3. LLM Server
· 4. Chatbot Applications
· 5. Application 1: Chat with multiple PDFs
· 6. Application 2: Chatbot with Open WebUI
· 7. Application 3: Deploy Chatbot using Docker
· Conclusion
· Credits
1. Introduction
Our previous blog posts extensively explored Large Language Models (LLMs),
covering their evolution and wide-ranging applications. Now, let’s take a closer look
at the core of this journey: Building LLM Applications locally.
In this blog post, we’ll create a basic LLM Application using LangChain.
Let's start by looking at the tools in the LLM app stack.
We will see more LLM apps implemented, and more of them will take on production characteristics. These include, but are not limited to, observability, data versioning, and enterprise features built on top of the basic pieces.
LLMs
LLM Providers
Vector Databases
Embedding Models
Orchestration
Quality Tuning
Infrastructure
Data Tools
1.1. LLMs
Large language models are all the rage in AI. They have enabled us to work with AI through natural language, a goal that researchers and practitioners everywhere have been striving toward for decades. The rise of generative adversarial networks in 2014, the emergence of transformers in 2018, and the steady growth in compute capability over the years have all led to this moment and this technology.
It’s not accurate or fair to say that LLMs will change the world. They already have.
1.2. LLM Providers
OpenAI (GPT)
Meta (Llama)
Google (Gemini)
Mistral
Deci AI
DeciLM-7B is the latest in a family of LLMs by Deci AI. With its 7.04 billion
parameters and an architecture optimized for high throughput, it achieves top
performance on the Open LLM Leaderboard. It ensures diversity and efficiency in
training through a blend of open source and proprietary datasets and innovative
techniques like Grouped-Query Attention. Deci 7B supports 8192 tokens and is
under an Apache 2.0 license. — Harpreet Sahota
Symbl AI
We [the founders] come from a telecom background, where we saw a need for latency-sensitive, low-memory language models. Symbl AI features a unique AI model that focuses on understanding speech from end to end. It includes the ability to do speech-to-text as well as analyze and understand what was said. — Surbhi Rathore
Claude by Anthropic
AI Bloks
language models that are fine-tuned and CPU-friendly, and designed to stack
together for a comprehensive solution. — Namee Oberst
Arcee AI
Abacus AI
Nous Research
Solar by Upstage
LLMs are expensive. Even more so for developing countries. There needs to be a
solution to this. That’s why we made Solar. Solar is small enough to fit on a chip and
accessible enough that anyone can access it. — Sung Kim
OctoAI
“When we started OctoAI, we knew models would only get larger, making GPU
resources scarce. This led us to focus our systems expertise on serving AI workloads
efficiently at scale. Today OctoAI serves the latest text-gen and media-gen
foundation models, via OpenAI-compatible APIs, so developers can get the best out
of open source innovation in a cost-effective package.” — Thierry Moreau
Fireworks AI
Martian
1.3. Vector Databases
Milvus
Milvus is an open source vector database aimed at making it possible to work with
billions of vectors. Aimed at enterprise scale, Milvus also includes many enterprise
features like multi-tenancy, role based access control, and data replications. —
Yujian Tang
Weaviate
Chroma
Qdrant
Astra DB
ApertureData
“We built Aperture Data with the intention of simplifying interactions with
multimodal data types for DS/ML teams. Our biggest value proposition is that we
can merge vector data, intelligence graphs, and multimodal data for querying
through one API.” — Vishakha Gupta
Pinecone
LanceDB
LanceDB runs in your app with no servers to manage. Zero vendor lock-in. LanceDB
is a developer-friendly, open source database for AI. It is based on DuckDB and the
Lance data format. — Jasmine Wang
ElasticSearch
Zilliz
Zilliz Cloud intends to solve the unstructured data problem. Built on the highly
scalable, reliable, and popular open source vector database Milvus, Zilliz Cloud
offers devs the ability to customize their vector search, scale seamlessly to billions
of vectors, and do it all without having to manage a complex infrastructure. —
Charles Xie
1.4. Embedding Models
Hugging Face
Voyage AI
MixedBread
MixedBread looks to change the way that AI and people interact with Data. It’s
backed by a strong research and software engineering team. — Aamir Shakir
Jina AI
1.5. Orchestration Tools
A whole new set of orchestration tools rose around LLMs. The primary reason?
Orchestration of LLM apps includes prompting, an entirely new category. These
tools are made by people on the cutting edge of both “prompt engineering” and
machine learning.
LlamaIndex
“We built the first version of LlamaIndex at the cusp of the ChatGPT boom to solve
one of the most pressing problems with LLM tooling — how to harness this
reasoning capability and apply it on top of a user’s data. Today we’re a mature data
framework in Python and TypeScript that provides comprehensive
tooling/integrations (150+ data loaders, 50+ templates, 50+ vector stores) to build out
any LLM application over your data, from RAG to agents.” — Jerry Liu
LangChain
HayStack
Semantic Kernel
AutoGen
Flyte
engineers and data scientists to streamline their work across teams and resources
easily. You can access the power of Flyte, fully managed in your Cloud with
[Link]” — Ketan Umare
Flowise AI
Boundary ML
1.6. Quality Tuning Tools
Arize AI
“My co-founder, Aparna Dhinakaran, came from Uber’s ML team and I came from
TubeMogul, where we both realized the hardest problems we faced were
troubleshooting real world AI and making sense of AI performance. Arize has a
unique combination of people who have been working for decades on AI system
performance evaluation, highly usable observability tools, and large data systems.
We have a foundation in open source, and support a community version of our
software called Phoenix.” — Jason Lopatecki, CEO and Co-Founder of Arize AI.
WhyLabs
Deepchecks
Aporia
TruEra
Honey Hive
Guardrails AI
BrainTrust Data
Patronus AI
Giskard
Quotient
Quotient provides an end to end platform to quantitatively test changes in your LLM
application. After many conversations at GitHub, we decided that quantitatively
testing LLM apps was a big problem. Our special sauce is that we provide domain
specific evaluation for business use cases. — Julia Neagu
Galileo
1.7. Data Tools
In 2012, data was your best friend in any AI/ML application. In 2024, the story is a
little different, but not by much. The quality of your data is still critical. These tools
help you ensure that your data is labeled correctly, that you're using the right
datasets, and that you can move your data around easily.
Voxel51
“Models are only as good as the data they’re trained on, so what’s in your datasets?
We built Voxel51 to organize your unstructured data in a centralized, searchable,
and visualizable location that uniquely allows you to build automations that
improve the quality of the training data that you feed to your models.” — Brian
Moore
DVC
XetHub
We built XetHub after building Apple’s ML data platform and watching ML teams
struggle because their tools & processes didn’t scale and weren’t aligned with
software teams. XetHub has scaled Git to 100TB (per repo) and offers a GitHub-like
experience with tailor-made features for ML (custom views, model diffs, automatic
summarization, block-based deduplication, streaming mounts, dashboarding, and
more). — Rajat Arya
Kafka
Airbyte
ByteWax
“Bytewax first fills a gap in the Python ecosystem for a Python-native stream
processor that is production-ready. Second it aims at the developer experience
problem with existing stream processing tools with an easy-to-use and intuitive API
and a straight-forward deployment story: `pip install bytewax && python -m
[Link] my_dataflow.py`” — Zander Matheson
[Link]
Spark
Pulsar
Floom
Flink
Proton by Timeplus
Apache NiFi
ActiveLoop
HumanLoop
SuperLinked
Skyflow (Privacy)
Skyflow is a data privacy vault service inspired by the zero trust approach used by
companies like Apple and Netflix. It isolates and protects sensitive customer data,
transforming it into non-sensitive references to reduce security risks. Skyflow’s APIs
ensure privacy-safe model training by excluding sensitive data, preventing
inference from prompts or user-provided files for operations like RAG. — Sean
Falconer
VectorFlow
Daios
Pathway
Mage AI
Flexor
1.8. Infrastructure
A March 2024 addition and shakeup to this stack, infrastructure tools are critical to building LLM apps. These tools allow you to build your app first and defer the production work until later. They let you serve, train, and evaluate LLMs and LLM-based applications.
BentoML
I started BentoML because I saw how tough it was to run and serve AI models
efficiently. With traditional cloud infra, handling heavy GPU workloads and dealing
with large models can be a real headache. In short, we make it super easy for AI
developers to get their AI inference service up and running. We’re all about open
source here, supported by an awesome community that’s always contributing. —
Chaoyu Yang
Databricks
LastMile AI
TitanML
Lots of enterprises want to self host language models but don’t have the
infrastructure to do it well. TitanML provides that infrastructure to let developers
build applications. The special sauce is that it focuses on optimization for enterprise
workloads like batch inference, multimodal, and embedding models. — Meryem
Arik
ConfidentialMind
Snowflake
Upstash
There needs to be a way to keep track of state for stateless tools, and the developers
need to be served in this space. — Enes Akar
Unbody
We make AI accessible for non-AI developers and build private data pipelines for AI functionality. — Amir Houieh
NIM by NVIDIA
Parea AI
2. Build an LLM application from scratch
Because RAG does not require fine-tuning, you have the freedom to swap your LLM for a more powerful one when it becomes available, or switch to a smaller distilled version should you need faster inference.
Let's illustrate building a RAG pipeline using an open-source LLM, an embedding model, and LangChain.
2.1. Prepare the data
# If running in Google Colab, you may need to run this cell to make sure you're using a UTF-8 locale
import locale
locale.getpreferredencoding = lambda: "UTF-8"
First, you need to acquire a GitHub personal access token to access the GitHub API.
By default, pull requests are considered issues as well; here we chose to exclude them from the data by setting include_prs=False.
Setting state="all" means we will load both open and closed issues.
from langchain_community.document_loaders import GitHubIssuesLoader

loader = GitHubIssuesLoader(repo="huggingface/peft", access_token=ACCESS_TOKEN, include_prs=False, state="all")
docs = loader.load()
The content of individual GitHub issues may be longer than what an embedding
model can take as input. If we want to embed all of the available content, we need to
chunk the documents into appropriately sized pieces.
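The splitter itself can be any of LangChain's text splitters. A minimal sketch, assuming a character-based splitter with illustrative chunk sizes:

```python
from langchain.text_splitter import CharacterTextSplitter

# Illustrative values; size the chunks to fit your embedding model's input limit
splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=30)
```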
chunked_docs = splitter.split_documents(docs)
2.2. Create the embeddings + retriever
To create document chunk embeddings we'll use the HuggingFaceEmbeddings and the
BAAI/bge-base-en-v1.5 embeddings model. There are many other embedding
models available on the Hub, and you can keep an eye on the best-performing ones
by checking the Massive Text Embedding Benchmark (MTEB) Leaderboard.
To create the vector database, we'll use FAISS, a library developed by Facebook AI. It offers efficient similarity search and clustering of dense vectors, which is what we need here. FAISS is currently one of the most widely used libraries for nearest-neighbor search on massive datasets.
We'll access both the embedding model and FAISS via the LangChain API.
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

db = FAISS.from_documents(chunked_docs, HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5"))
The vector database is now set up; next, we need to set up the next piece of the chain: the model.
2.3. Load quantized model
With many models being released every week, you may want to substitute this model with the latest and greatest. The best way to keep track of open-source LLMs is to check the Open LLM Leaderboard.
To make inference faster, we will load the quantized version of the model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "HuggingFaceH4/zephyr-7b-beta"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
2.4. Setup the LLM chain
First, create a text_generation pipeline using the loaded model and its tokenizer.
Next, create a prompt template — this should follow the format of the model, so if
you substitute the model checkpoint, make sure to use the appropriate formatting.
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=400,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
prompt_template = """
<|system|>
Answer the question based on your knowledge. Use the following context to help:
{context}
</s>
<|user|>
{question}
</s>
<|assistant|>
"""
prompt = PromptTemplate(
input_variables=["context", "question"],
template=prompt_template,
)
Note: You can also use tokenizer.apply_chat_template to convert a list of messages (as
dicts: {'role': 'user', 'content': '(...)'} ) into a string with the appropriate chat
format.
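For example, a minimal sketch (the question string here is just illustrative):

```python
messages = [{"role": "user", "content": "How do you combine multiple adapters?"}]
# tokenize=False returns the formatted prompt string instead of token ids
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```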
Finally, we need to combine the llm_chain with the retriever to create a RAG chain.
We pass the original question through to the final generation step, as well as the
retrieved context docs:
retriever = db.as_retriever()
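A minimal sketch of that assembly with the LangChain expression language, assuming the prompt and llm objects defined above:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm_chain = prompt | llm | StrOutputParser()

# The retriever fills {context}; the original question is passed through unchanged to {question}
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | llm_chain
```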
2.5. Compare the results
First, let's see what kind of answer we can get with just the model itself, no context added:
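A sketch of the plain llm_chain call, assuming the library-specific question used throughout this example:

```python
question = "How do you combine multiple adapters?"
llm_chain.invoke({"context": "", "question": question})
```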
Output:
To combine multiple adapters, you need to ensure that they are compatible with
1. Identify the types of connectors you need: Before combining adapters, determ
2. Check compatibility: Make sure that the adapters you choose are compatible w
3. Connect the adapters: Once you have identified the compatible adapters, conn
4. Test the connection: After connecting all the adapters, test the connection
Now let's see what the same model produces when the retrieved context is added:
rag_chain.invoke(question)
Output:
Based on the provided context, here are some potential ways to combine multipl
```python
from peft import Peft
# Load adapter 2
adapter2 = Peft("adapter2").requires_grad_(False)
adapter2(base_model).load_state_dict([Link]("path/to/[Link]"))
super().__init__()
[Link] = forward
```python
from peft import Peft
As we can see, the added context really helps the exact same model provide a much more relevant and informed answer to the library-specific question.
Notably, combining multiple adapters for inference has since been added to the library, and one can find this information in the documentation, so for the next iteration of this RAG pipeline it may be worth including documentation embeddings.
So, now we have an understanding of how to build an LLM RAG Application from
scratch.
For the applications below, we will be using Ollama as our LLM server. Let's start by understanding more about the LLM server.
3. LLM Server
The most critical component of this app is the LLM server. With Ollama, we have a
robust LLM Server that can be set up locally.
What is Ollama?
Ollama isn’t a single language model but a framework that lets us run multiple
open-source LLMs locally on our machine. Think of it like a platform for playing
different language models like Llama 2, Mistral, etc., instead of a specific player
itself.
Additionally, we can use the Langchain SDK, which is a tool for working with Ollama
more conveniently.
Using Ollama on the command line is very simple. The following are commands we can use to run Ollama on our computer.
ollama pull — This command pulls a model from the Ollama model hub.
ollama list — This command is used to see the list of downloaded models.
ollama run — This command is used to run a model. If the model is not already
downloaded, it will pull the model and serve it.
ollama serve — This command is used to start the server, to serve the
downloaded models.
We can download these models to our local machine, and then interact with those
models through a command line prompt. Alternatively, when we run the model,
Ollama also runs an inference server hosted at port 11434 (by default) that we can
interact with through APIs and other libraries like Langchain.
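For example, a minimal sketch using the LangChain integration, assuming the Ollama server is running locally on the default port and a model such as mistral has already been pulled:

```python
from langchain_community.llms import Ollama

# Talks to the local Ollama inference server on its default port
llm = Ollama(model="mistral", base_url="http://localhost:11434")
print(llm.invoke("Why is the sky blue?"))
```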
As of this post, Ollama has 74 models, which also include categories like embedding
models.
Source : Ollama
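The embedding models can be used through the same interface. A sketch, assuming an embedding model such as nomic-embed-text has been pulled:

```python
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # any pulled embedding model works here
vector = embeddings.embed_query("What is retrieval augmented generation?")
print(len(vector))  # dimensionality of the embedding
```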
4. Chatbot Applications
The three essential chatbot applications that we will be building next are:
1. Chat with multiple PDFs
2. Chatbot with Open WebUI
3. Deploy Chatbot using Docker
5. Application 1: Chat with multiple PDFs
Our tech stack is super easy with Langchain, Ollama, and Streamlit.
Architecture
LLM Server: The most critical component of this app is the LLM server. Thanks
to Ollama, we have a robust LLM Server that can be set up locally, even on a
laptop. While [Link] is an option, I find Ollama, written in Go, easier to set
up and run.
RAG: Undoubtedly, the two leading libraries in the LLM domain are Langchain
and LlamaIndex. For this project, I'll be using Langchain due to my familiarity
with it from my professional experience. An essential component for any RAG
framework is vector storage. We’ll be using Chroma here, as it integrates well
with Langchain.
Chat UI: The user interface is also an important component. Although there are
many technologies available, I prefer using Streamlit, a Python library, for peace
of mind.
The chatbot can access information from various PDFs. Here's a breakdown:
Processing: The LangChain API prepares the data for the large language model (LLM), as sketched below.
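A minimal sketch of what that processing step could look like, assuming PyPDF for loading, a recursive splitter, FastEmbed embeddings, and Chroma as the vector store (the exact file names and parameters in the repository may differ):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import FastEmbedEmbeddings

docs = PyPDFLoader("example.pdf").load()  # load one uploaded PDF
chunks = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100).split_documents(docs)

# Embed the chunks and index them in Chroma, then expose a retriever for the chain
vector_store = Chroma.from_documents(documents=chunks, embedding=FastEmbedEmbeddings())
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
```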
Folder Structure
4. Install Ollama and pull the LLM model specified in [Link] [we have already covered setting up Ollama in the section above].
6. Run the [Link] file using the Streamlit CLI. Execute the following command:
Image by Author
6. Application 2: Chatbot with Open WebUI
We can install Ollama directly on our local machine, or we can deploy the Ollama docker container locally. The choice is ours; either will work with the LangChain Ollama interface, the official Ollama Python interface, and the open-webui interface.
Below are the instructions for installing Ollama directly on our local system:
Next, open the terminal and execute the following command to pull the latest models. While there are many other LLM models available, I chose Mistral-7B for its compact size and competitive quality.
The setup procedure is the same for all other models: we need to pull and run them.
Image Source
Run the docker command below to deploy the open-webui docker container on the local machine.
Image by Author
3. Open Browser
[Link]
Image by Author
To get started, we need to register for the first time. Simply click on the "Sign up"
button to create our account.
Image by Author
Image by Author
Depending on which LLM we deployed on our local machine, those options will be
reflected in the drop-down to select.
Image by Author
Image by Author
Image by Author
Below is a demo GIF showcasing how we can use open-webui to chat with images.
Source: open-webui
Let’s look into how we can use our customized models with Ollama. Below are the
steps for the same.
1. Create a file named Modelfile, with a FROM instruction pointing to the local file path of the model we want to import.
FROM ./vicuna-33b.Q4_0.gguf
Customize a prompt
Models from the Ollama library can be customized with a prompt. For example, to
customize the llama2 model:
Create a Modelfile :
FROM llama2
For more examples, see the examples directory. For more information on working
with a Modelfile, see the Modelfile documentation.
7. Application 3: Deploy Chatbot using Docker
The following picture shows the architecture of how the containers interact and which ports they will be using.
We build two containers:
The Ollama container uses a host volume to store and load the models (/root/.ollama is mapped to the local ./data/ollama). It listens on port 11434 (the external port, which is internally mapped to 11434).
The Streamlit application container exposes port 8501 and calls the Ollama container over the shared docker network.
GitHub Repository :
Folder Structure
Ollama also ships as a docker image, which allows us to run the Ollama server as a container. This is very useful for building microservices applications that use Ollama models, since we can easily deploy our applications in the docker ecosystem, such as OpenShift, Kubernetes, and others. To run Ollama in docker, we use the docker run command, as shown below. Before this, we should have docker installed on our system.
Image by Author
We should then be able to interact with this container using docker exec , as shown
below, and run the prompts.
Image by Author
Image by Author
Note that docker containers are ephemeral, and whatever models we pull will disappear when we restart the container. We will solve this issue next, where we build a distributed Streamlit application from the ground up and map the volume of the container to the host.
Ollama is a powerful tool that enables new ways of creating and running LLM
applications on the cloud. It simplifies the development process and offers flexible
deployment options. It also allows for easy management and scaling of the
applications.
We are using Ollama and calling the model through the Ollama LangChain integration (which is part of langchain_community).
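A minimal sketch of what that call looks like inside the Streamlit app, assuming the Ollama container is reachable under the compose service name ollama on its default port (hostname, model, and widget layout here are illustrative):

```python
import streamlit as st
from langchain_community.llms import Ollama

# Inside the compose network, the Ollama service is reachable by its service name
llm = Ollama(model="phi", base_url="http://ollama:11434")

st.title("Chat with Ollama")
prompt = st.text_input("Your prompt")
if prompt:
    st.write(llm.invoke(prompt))
```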
Let's now define a Dockerfile to build the docker image of the Streamlit application. We use the Python docker image as the base image and create a working directory called /app. We then copy our application files there and run pip install to install all the dependencies. Finally, we expose port 8501 and start the Streamlit application.
We can build the docker image using the docker build command, as shown below.
Image by Author
We can check that the Docker image has been built using docker images.
Image by Author
Let's now build a docker-compose configuration file to define the network of the Streamlit application and the Ollama container, so that they can interact with each other. We will also define the various port configurations, as shown in the picture above. For Ollama, we will also map the volume, so that whatever models are pulled are persisted.
[Link]
Image by Author
Image by Author
Let’s now download the required model, by logging into the docker container using
the docker exec command as shown below.
Since we are using the model phi, we pull that model and test it by running it. In the screenshot below, the phi model is downloaded and starts running (since we are using the -it flag, we can interact with it and test it with sample prompts).
We can see the downloaded model files and manifests in our local folder ./data/ollama (which is internally mapped to /root/.ollama for the container, which is where Ollama looks for the downloaded models to serve).
Image by Author
Let's try running the prompt "generate a story about dog called bozo". We should be able to see console logs reflecting the API calls coming from our Streamlit application, as shown below.
In the screenshot below, we can see the response I got for the prompt I sent.
There we go. It was super fun working on this blog, getting Ollama to work with LangChain and deploying them on Docker using Docker Compose.
Conclusion
The blog explores building Large Language Model (LLM) applications locally, focusing on Retrieval-Augmented Generation (RAG) chains. It covers components like the LLM server powered by Ollama, the LangChain framework, Chroma for vector storage, and Streamlit for web apps. It details creating chatbot applications using Ollama, LangChain, ChromaDB, and Streamlit, with GitHub repo structures and Docker deployment. Overall, it offers a practical guide to developing LLM applications efficiently.
Credits
In this blog post, we have compiled information from various sources, including research papers, technical blogs, official documentation, YouTube videos, and more. Each source has been appropriately credited beneath the corresponding images, with source links provided.
1. [Link]7168449062336225280-3n_p/
2. [Link]eac28b9dc1e7
3. [Link]
4. [Link]
5. [Link]ollama-deploy-on-docker-5dfcfd140363
Your claps help me create more valuable content for our vibrant Python or ML
community.