
Building LLM Applications: Open-Source RAG (Part 7)

Vipra Singh · 27 min read · Mar 17, 2024

Learn Large Language Models (LLM) through the lens of a Retrieval Augmented
Generation (RAG) Application.

Posts in this Series


1. Introduction

2. Data Preparation

3. Sentence Transformers

4. Vector Database

5. Search & Retrieval

6. LLM

7. Open-Source RAG ( This Post )

8. Evaluation

9. Serving LLMs

10. Advanced RAG

Table of Contents
· 1. Introduction
∘ 1.1. LLMs
∘ 1.2. LLM Providers
∘ 1.3. Vector Databases
∘ 1.4. Embedding Models
∘ 1.5. Orchestration Tools
∘ 1.6. Quality Tuning Tools
∘ 1.7. Data Tools
∘ 1.8. Infrastructure
· 2. Build an LLM application from scratch
∘ 2.1. Prepare the data
∘ 2.2. Create the embeddings + retriever
∘ 2.3. Load quantized model
∘ 2.4. Setup the LLM chain
∘ 2.5. Compare the results
· 3. LLM Server
· 4. Chatbot Applications
· 5. Application 1: Chat with multiple PDFs
· 6. Application 2: Chatbot with Open WebUI
· 7. Application 3: Deploy Chatbot using Docker
· Conclusion
· Credits

1. Introduction

Source

Our previous blog posts extensively explored Large Language Models (LLMs),
covering their evolution and wide-ranging applications. Now, let’s take a closer look
at the core of this journey: Building LLM Applications locally.

In this blog post, we’ll create a basic LLM Application using LangChain.

Afterward, we’ll proceed to develop three additional open-source LLM Applications


locally.

Credits : Yujian Tang

Let’s start with looking into the tools in the LLM App Stack.

We will see more LLM apps implemented, and we'll start to see more of them take on production characteristics. These include, but are not limited to, observability, data versioning, and enterprise features on top of the basic pieces.

As of the March 2024 update, this article contains 67 companies in 8 categories:

LLMs

LLM Providers

Vector Databases

Embedding Models

Orchestration

Quality Tuning

Infrastructure

Data Tools

1.1. LLMs

Large language models are all the rage in AI. They have enabled us to work with AI through natural language, a goal that researchers and practitioners everywhere have been striving toward for decades. The 2014 rise of generative adversarial networks, combined with the 2018 emergence of transformers and the increased compute capabilities over the years, has led to this moment and this technology.

It’s not accurate or fair to say that LLMs will change the world. They already have.

OpenAI (GPT)

Meta (Llama)

Google (Gemini)

Mistral

Deci AI

DeciLM-7B is the latest in a family of LLMs by Deci AI. With its 7.04 billion
parameters and an architecture optimized for high throughput, it achieves top
performance on the Open LLM Leaderboard. It ensures diversity and efficiency in
training through a blend of open source and proprietary datasets and innovative
techniques like Grouped-Query Attention. Deci 7B supports 8192 tokens and is
under an Apache 2.0 license. — Harpreet Sahota

Symbl AI

We [the founders] come from a telecom background, where we saw a need for latency-sensitive, low-memory language models. Symbl AI features a unique AI model that focuses on understanding speech from end to end. It includes the ability to do speech-to-text as well as analyze and understand what was said. — Surbhi Rathore

Claude by Anthropic

AI Bloks

We built AI Bloks to solve the problem of automating LLM workflows in the enterprise in private cloud. Our product ecosystem has one of the most comprehensive open-source development frameworks and models for enterprise-focused LLM workflows. We have an integrated RAG framework with over 40 small language models that are fine-tuned and CPU-friendly, and designed to stack together for a comprehensive solution. — Namee Oberst

Arcee AI

Abacus AI

Nous Research

Solar by Upstage

LLMs are expensive, even more so for developing countries. There needs to be a solution to this. That's why we made Solar. Solar is small enough to fit on a chip and accessible enough that anyone can use it. — Sung Kim

1.2. LLM Providers


Amazon (Bedrock)

OctoAI

“When we started OctoAI, we knew models would only get larger, making GPU
resources scarce. This led us to focus our systems expertise on serving AI workloads
efficiently at scale. Today OctoAI serves the latest text-gen and media-gen
foundation models, via OpenAI-compatible APIs, so developers can get the best out
of open source innovation in a cost-effective package.” — Thierry Moreau

Fireworks AI

Martian

1.3. Vector Databases


Vector databases are a critical piece of the LLM stack. They provide the ability to
work with unstructured data. Otherwise impossible to work with, unstructured data
can be run through machine learning models to produce a vector embedding.
Vector databases use these vector embeddings to find similar vectors.

Milvus (This is my project! Give it a GitHub star!)

Milvus is an open source vector database aimed at making it possible to work with
billions of vectors. Aimed at enterprise scale, Milvus also includes many enterprise
features like multi-tenancy, role based access control, and data replications. —
Yujian Tang
Weaviate

Chroma

Qdrant

Astra DB

ApertureData

“We built Aperture Data with the intention of simplifying interactions with
multimodal data types for DS/ML teams. Our biggest value proposition is that we
can merge vector data, intelligence graphs, and multimodal data for querying
through one API.” — Vishakha Gupta

Pinecone

LanceDB

LanceDB runs in your app with no servers to manage. Zero vendor lock-in. LanceDB
is a developer-friendly, open source database for AI. It is based on DuckDB and the
Lance data format. — Jasmine Wang

ElasticSearch

Zilliz

Zilliz Cloud intends to solve the unstructured data problem. Built on the highly
scalable, reliable, and popular open source vector database Milvus, Zilliz Cloud
offers devs the ability to customize their vector search, scale seamlessly to billions
of vectors, and do it all without having to manage a complex infrastructure. —
Charles Xie

1.4. Embedding Models


Embedding models are the models that create vector embeddings. These are a
critical piece of the LLM stack that are often confused with LLMs. I blame OpenAI’s
naming conventions + the intense fervor around the need to learn this new
technology. LLMs can be used as embedding models, but embedding models have
existed long before the rise of LLMs.

Hugging Face

Voyage AI

“Voyage AI offers general-purpose, domain-specific, and fine-tuned embedding models with the best retrieval quality and efficiency.” — Tengyu Ma

MixedBread

MixedBread looks to change the way that AI and people interact with Data. It’s
backed by a strong research and software engineering team. — Aamir Shakir

Jina AI
1.5. Orchestration Tools
A whole new set of orchestration tools rose around LLMs. The primary reason?
Orchestration of LLM apps includes prompting, an entirely new category. These
tools are made by people on the cutting edge of both “prompt engineering” and
machine learning.

LlamaIndex

“We built the first version of LlamaIndex at the cusp of the ChatGPT boom to solve
one of the most pressing problems with LLM tooling — how to harness this
reasoning capability and apply it on top of a user’s data. Today we’re a mature data
framework in Python and TypeScript that provides comprehensive
tooling/integrations (150+ data loaders, 50+ templates, 50+ vector stores) to build out
any LLM application over your data, from RAG to agents.” — Jerry Liu

LangChain

HayStack

Semantic Kernel

AutoGen

Flyte

“Flyte is redefining the landscape of machine learning and data engineering workflows by leveraging containerization and Kubernetes to orchestrate complex, scalable, and reliable workflows. With a focus on reproducibility and efficiency, Flyte provides a unified platform for running computational tasks that allow ML engineers and data scientists to streamline their work across teams and resources easily. You can access the power of Flyte, fully managed in your Cloud with [Link]” — Ketan Umare

Flowise AI

Flowise is an orchestration tool built on top of LangChain and LlamaIndex.


Developing LLM apps requires a whole new set of dev tooling; that's why we created Flowise: to allow developers to build, evaluate, and observe their LLM apps in one single platform. We were the first to open up the new frontier of low-code LLM app builders and the first open-source platform that integrates different LLM frameworks, allowing devs to customize for their use cases. — Henry Heng

Boundary ML

We created BAML, a new config language optimized for expressing AI functions. BAML offers built-in type-safety, a native VSCode playground, arbitrary model support, observability, and support for both Python and TypeScript. On top of that, it's open source! — Vaibhav Gupta
1.6. Quality Tuning Tools
Build your app first, then worry about quality. But really, quality is important. These tools exist because: a) a lot of LLM-based results are subjective but need a way to be measured, b) if you're using something in production, you need to make sure it's good, and c) seeing how different parameters affect your application shows you how to improve it.

Arize AI

“My co-founder, Aparna Dhinakaran, came from Uber’s ML team and I came from
TubeMogul, where we both realized the hardest problems we faced were
troubleshooting real world AI and making sense of AI performance. Arize has a
unique combination of people who have been working for decades on AI system
performance evaluation, highly usable observability tools, and large data systems.
We have a foundation in open source, and support a community version of our
software called Phoenix.” — Jason Lopatecki, CEO and Co-Founder of Arize AI.

WhyLabs

“WhyLabs helps AI practitioners build reliable and trustworthy AI systems. As our customers expanded from predictive to GenAI use cases, security and control became their biggest barriers to launching to production. To solve that, we open sourced LangKit, a tool that enables teams to prevent abuse, misinformation, and data leakage in LLM applications. For enterprise teams, the WhyLabs Platform layers on top of LangKit to provide a collaborative control and root cause analysis center.” — Alessya Vijnic

Deepchecks

“We started Deepchecks as a response to the overwhelming cost of building, aligning, and observing AI. Our special sauce is in providing our users with an automated scoring mechanism. This allows us to combine multiple considerations such as quality and compliance to score an LLM response.” — Philip Tannor

Aporia

TruEra

TruEra's AI Quality solutions evaluate, debug, and monitor machine learning models and large language model applications for higher quality, trustworthiness, and faster deployment. For large language model applications, TruEra uses feedback functions to evaluate the performance of LLM applications without labels. Combined with deep traceability, this brings unparalleled observability into any AI app. TruEra works across the model lifecycle, is independent of model development platforms, and embeds easily into your existing AI stack. — Josh Reini

Honey Hive

Guardrails AI

Exploding Gradients (RAGAS)

BrainTrust Data

BrainTrust Data is a robust, software-engineering-centric solution for evaluating LLM apps. It allows for quick iteration, offers a different set of evaluations from other MLOps tools focused on usage, and lets users define their own functions. — Albert Zhang

Patronus AI

Giskard

Quotient

Quotient provides an end-to-end platform to quantitatively test changes in your LLM application. After many conversations at GitHub, we decided that quantitatively testing LLM apps was a big problem. Our special sauce is that we provide domain-specific evaluation for business use cases. — Julia Neagu

Galileo
1.7. Data Tools
In 2012, data was your best friend in any AI/ML application. In 2024, the story is a little different, but not by much. The quality of your data is still critical. These tools help you ensure that your data is labeled correctly, that you're using the right datasets, and that you can move your data around easily.

Voxel51

“Models are only as good as the data they’re trained on, so what’s in your datasets?
We built Voxel51 to organize your unstructured data in a centralized, searchable,
and visualizable location that uniquely allows you to build automations that
improve the quality of the training data that you feed to your models.” — Brian
Moore

DVC

XetHub

We built XetHub after building Apple’s ML data platform and watching ML teams
struggle because their tools & processes didn’t scale and weren’t aligned with
software teams. XetHub has scaled Git to 100TB (per repo) and offers a GitHub-like
experience with tailor-made features for ML (custom views, model diffs, automatic
summarization, block-based deduplication, streaming mounts, dashboarding, and
more). — Rajat Arya

Kafka

Airbyte

ByteWax

“Bytewax first fills a gap in the Python ecosystem for a Python-native stream processor that is production-ready. Second, it aims at the developer experience problem with existing stream processing tools with an easy-to-use and intuitive API and a straightforward deployment story: `pip install bytewax && python -m bytewax.run my_dataflow.py`” — Zander Matheson

Unstructured.io

At Unstructured we're building data engineering tools to make it effortless to transform unstructured data from raw to ML-ready. Today, developers and data scientists spend more than 80% of their time on data preparation; our mission is to give this time back to them to focus on model training and application development. — Brian Raymond

Spark

Pulsar

Floom

Flink

Proton by Timeplus

Apache NiFi

ActiveLoop

HumanLoop

SuperLinked

Skyflow (Privacy)

Skyflow is a data privacy vault service inspired by the zero trust approach used by
companies like Apple and Netflix. It isolates and protects sensitive customer data,
transforming it into non-sensitive references to reduce security risks. Skyflow’s APIs
ensure privacy-safe model training by excluding sensitive data, preventing
inference from prompts or user-provided files for operations like RAG. — Sean
Falconer

VectorFlow

Daios

Pathway

Mage AI

Flexor

1.8. Infrastructure
A March 2024 addition and shakeup to this stack, infrastructure tools are critical to building LLM apps. These tools let you build your app first and defer the production work until later. They allow you to serve, train, and evaluate LLMs and LLM-based applications.

BentoML

I started BentoML because I saw how tough it was to run and serve AI models
efficiently. With traditional cloud infra, handling heavy GPU workloads and dealing
with large models can be a real headache. In short, we make it super easy for AI
developers to get their AI inference service up and running. We’re all about open
source here, supported by an awesome community that’s always contributing. —
Chaoyu Yang

Databricks

LastMile AI

TitanML

Lots of enterprises want to self host language models but don’t have the
infrastructure to do it well. TitanML provides that infrastructure to let developers
build applications. The special sauce is that it focuses on optimization for enterprise
workloads like batch inference, multimodal, and embedding models. — Meryem
Arik

Determined AI (acquired by HPE)

Pachyderm (acquired by HPE)

ConfidentialMind

ConfidentialMind is building a deployable API stack for enterprises. What makes it unique? The ability to deploy everything on-prem and plug and play with open-source tools. — Markku Räsänen

Snowflake

Upstash

There needs to be a way to keep track of state for stateless tools, and the developers
need to be served in this space. — Enes Akar

Unbody

We make AI accessible for non-AI developers and build private data pipelines for AI functionalities. — Amir Houieh

NIM by NVIDIA

Parea AI

Parea is making it easier to build and evaluate LLM applications by bringing an agnostic framework and model for LLMs. — Joel Alexander

Below is a representative architecture of a RAG application:

Source

2. Build an LLM application from scratch


We will quickly build a RAG (Retrieval Augmented Generation) pipeline for a project's GitHub issues using the HuggingFaceH4/zephyr-7b-beta model and LangChain.

Here’s a quick illustration:

The external data is converted into embedding vectors with a separate embeddings model, and the vectors are kept in a database. Embeddings models are typically small, so updating the embedding vectors on a regular basis is faster, cheaper, and easier than fine-tuning a model.

At the same time, the fact that fine-tuning is not required gives you the freedom
to swap your LLM for a more powerful one when it becomes available, or switch
to a smaller distilled version, should you need faster inference.

Let’s illustrate building a RAG using an open-source LLM, embeddings model, and
LangChain.

First, let’s install the required dependencies:

!pip install -q torch transformers accelerate bitsandbytes sentence-transformers faiss-gpu

# If running in Google Colab, you may need to run this cell to make sure you're using a UTF-8 locale
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!pip install -q langchain

2.1. Prepare the data


In this example, we'll load all of the issues (both open and closed) from the PEFT library's repo.

First, you need to acquire a GitHub personal access token to access the GitHub API.

from getpass import getpass


ACCESS_TOKEN = getpass("YOUR_GITHUB_PERSONAL_TOKEN")

Next, we'll load all of the issues in the huggingface/peft repo:

By default, pull requests are considered issues as well; here we chose to exclude them from the data by setting include_prs=False.

Setting state = "all" means we will load both open and closed issues.

from langchain.document_loaders import GitHubIssuesLoader

loader = GitHubIssuesLoader(
    repo="huggingface/peft",
    access_token=ACCESS_TOKEN,
    include_prs=False,
    state="all",
)
docs = loader.load()

The content of individual GitHub issues may be longer than what an embedding
model can take as input. If we want to embed all of the available content, we need to
chunk the documents into appropriately sized pieces.

The most common and straightforward approach to chunking is to define a fixed chunk size and decide whether there should be any overlap between chunks. Keeping some overlap between chunks allows us to preserve some semantic context between them. The recommended splitter for generic text is the RecursiveCharacterTextSplitter, and that's what we'll use here.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)

chunked_docs = splitter.split_documents(docs)
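As a quick sanity check (purely illustrative, not part of the original notebook), we can inspect how many chunks were produced and what one looks like:

```python
# Illustrative check of the chunking result; output will vary with the repo state.
print(f"{len(docs)} issues -> {len(chunked_docs)} chunks")
print(chunked_docs[0].page_content[:200])  # first 200 characters of the first chunk
print(chunked_docs[0].metadata)            # issue metadata carried over from the loader
```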

2.2. Create the embeddings + retriever


Now that the docs are all of the appropriate size, we can create a database with their
embeddings.

To create document chunk embeddings we’ll use the HuggingFaceEmbeddings and the
BAAI/bge-base-en-v1.5 embeddings model. There are many other embedding
models available on the Hub, and you can keep an eye on the best-performing ones
by checking the Massive Text Embedding Benchmark (MTEB) Leaderboard.

To create the vector database, we’ll use FAISS , a library developed by Facebook AI.
This library offers efficient similarity search and clustering of dense vectors, which
is what we need here. FAISS is currently one of the most used libraries for NN
search in massive datasets.

We’ll access both the embedding model and FAISS via LangChain API.

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

db = FAISS.from_documents(
    chunked_docs, HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
)

We need a way to return (retrieve) the documents given an unstructured query. For that, we'll use the as_retriever method using the db as a backbone:

search_type="similarity" means we want to perform similarity search between the query and the documents.

search_kwargs={'k': 4} instructs the retriever to return the top 4 results.

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})
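As a quick check (with a hypothetical query), we can invoke the retriever directly and look at the returned chunks:

```python
# Hypothetical query, just to confirm the retriever returns issue chunks.
sample_docs = retriever.get_relevant_documents("How do I merge LoRA adapters?")
for d in sample_docs:
    print(d.metadata.get("title"), "->", d.page_content[:80])
```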

The vector database and retriever are now set up; next we need to set up the next piece of the chain — the model.

2.3. Load quantized model


For this example, we chose HuggingFaceH4/zephyr-7b-beta, a small but powerful model.

With many models being released every week, you may want to substitute this model with the latest and greatest. The best way to keep track of open-source LLMs is to check the Open LLM Leaderboard.

To make inference faster, we will load the quantized version of the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "HuggingFaceH4/zephyr-7b-beta"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

2.4. Setup the LLM chain


Finally, we have all the pieces we need to set up the LLM chain.

First, create a text_generation pipeline using the loaded model and its tokenizer.

Next, create a prompt template — this should follow the format of the model, so if
you substitute the model checkpoint, make sure to use the appropriate formatting.

from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from transformers import pipeline
from langchain_core.output_parsers import StrOutputParser

text_generation_pipeline = pipeline(
model=model,
tokenizer=tokenizer,
task="text-generation",
temperature=0.2,
do_sample=True,
repetition_penalty=1.1,
return_full_text=True,
max_new_tokens=400,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

prompt_template = """
<|system|>
Answer the question based on your knowledge. Use the following context to help:
{context}
</s>
<|user|>
{question}
</s>
<|assistant|>
"""

prompt = PromptTemplate(
input_variables=["context", "question"],
template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()

Note: You can also use tokenizer.apply_chat_template to convert a list of messages (as
dicts: {'role': 'user', 'content': '(...)'} ) into a string with the appropriate chat
format.
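For reference, here is a minimal sketch of that alternative (the message contents are just the template above re-expressed as chat messages; {context} and {question} remain placeholders for the LangChain PromptTemplate):

```python
# Build the same prompt via the tokenizer's chat template instead of
# hand-writing Zephyr's special tokens.
messages = [
    {"role": "system", "content": "Answer the question based on your knowledge. Use the following context to help: {context}"},
    {"role": "user", "content": "{question}"},
]
prompt_template = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt = PromptTemplate(
    input_variables=["context", "question"], template=prompt_template
)
```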

Finally, we need to combine the llm_chain with the retriever to create a RAG chain.
We pass the original question through to the final generation step, as well as the
retrieved context docs:

from langchain_core.runnables import RunnablePassthrough

retriever = db.as_retriever()

rag_chain = {"context": retriever, "question": RunnablePassthrough()} | llm_chain

2.5. Compare the results


Let’s see the difference RAG makes in generating answers to the library-specific
questions.

question = "How do you combine multiple adapters?"

First, let’s see what kind of answer we can get with just the model itself, no context
added:

llm_chain.invoke({"context": "", "question": question})

Output:

To combine multiple adapters, you need to ensure that they are compatible with

1. Identify the types of connectors you need: Before combining adapters, determ

2. Check compatibility: Make sure that the adapters you choose are compatible w

3. Connect the adapters: Once you have identified the compatible adapters, conn

4. Test the connection: After connecting all the adapters, test the connection

5. Secure the connections: To prevent accidental disconnections, use cable ties

Remember, combining multiple adapters can sometimes result in signal loss or in

rag_chain.invoke(question)

Output:

Based on the provided context, here are some potential ways to combine multiple adapters:

1. Load each adapter separately and concatenate their outputs:

```python
from peft import Peft

# Load the base model and adapter 1


base_model = AutoModelForSequenceClassification.from_pretrained("your_base_model")
adapter1 = Peft("adapter1").requires_grad_(False)
adapter1(base_model).load_state_dict(torch.load("path/to/[Link]"))

# Load adapter 2
adapter2 = Peft("adapter2").requires_grad_(False)
adapter2(base_model).load_state_dict(torch.load("path/to/[Link]"))

# Concatenate the outputs of both adapters


def forward(self, input_ids, attention_mask):
    x = self.base_model(input_ids, attention_mask)[0]
    x = torch.cat([x, adapter1(x), adapter2(x)], dim=-1)
    return x

# Create a new model class that includes the concatenated outputs

class MyModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.forward = forward

# Instantiate the new model class and use it for inference


my_model = MyModel()
```

2. Freeze multiple adapters and selectively activate them during inference:

```python
from peft import Peft

# Load the base model and all ad

As we can see, the added context really helps the exact same model provide a much more relevant and informed answer to the library-specific question.

Notably, combining multiple adapters for inference has been added to the library,
and one can find this information in the documentation, so for the next iteration of
this RAG it may be worth including documentation embeddings.
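A minimal sketch of that idea follows (the URL is just an example documentation page; any set of pages could be loaded the same way, and WebBaseLoader additionally requires beautifulsoup4):

```python
# Illustrative only: pull a documentation page, chunk it with the same
# splitter, and add it to the existing FAISS index alongside the issues.
from langchain_community.document_loaders import WebBaseLoader

doc_pages = WebBaseLoader(
    "https://huggingface.co/docs/peft/developer_guides/lora"  # example page
).load()
db.add_documents(splitter.split_documents(doc_pages))
```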

So, now we have an understanding of how to build an LLM RAG Application from
scratch.

Google Colab (Credits): [Link]ks/en/rag_zephyr_langchain.ipynb

Next, we will use our understanding to build 3 more LLM Applications.

For the below applications, we will be using Ollama as our LLM Server. Let’s start
with understanding more about LLM Server below.

3. LLM Server
The most critical component of this app is the LLM server. With Ollama, we have a
robust LLM Server that can be set up locally.

What is Ollama?

Ollama can be installed from [Link] site.

Ollama isn’t a single language model but a framework that lets us run multiple
open-source LLMs locally on our machine. Think of it like a platform for playing
different language models like Llama 2, Mistral, etc., instead of a specific player
itself.

Additionally, we can use the Langchain SDK, which is a tool for working with Ollama
more conveniently.

Using Ollama on the command line is very simple. The following commands can be used to run Ollama on our computer.

ollama pull — This command pulls a model from the Ollama model hub.

ollama rm — This command is used to remove the already downloaded model


from the local computer.

ollama cp — This command is used to make a copy of the model.

ollama list — This command is used to see the list of downloaded models.

ollama run — This command is used to run a model. If the model is not already downloaded, it will pull the model and serve it.

ollama serve — This command is used to start the server, to serve the
downloaded models.

We can download these models to our local machine, and then interact with those
models through a command line prompt. Alternatively, when we run the model,
Ollama also runs an inference server hosted at port 11434 (by default) that we can
interact with through APIs and other libraries like Langchain.

As of this post, Ollama has 74 models, which also include categories like embedding
models.
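For example, once a model has been pulled and the Ollama server is running, a minimal sketch of calling it from Python through LangChain looks like this (the model name and prompt are just examples):

```python
# Minimal sketch: query the local Ollama server (default port 11434)
# through the langchain_community integration.
from langchain_community.llms import Ollama

llm = Ollama(model="llama2", base_url="http://localhost:11434")
print(llm.invoke("Explain Retrieval Augmented Generation in one sentence."))
```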

Source : Ollama

4. Chatbot Applications
The 3 essential Chatbot applications that we will be building next are :

1. Chat with multiple PDFs using LangChain, ChromaDB & Streamlit

2. Chatbot with Open WebUI

3. Deploy Chatbot using Docker.

By the end of these 3 applications, we will build an intuition of how industrial


applications are built and deployed at scale.

RAG Application High-level Flow

5. Application 1: Chat with multiple PDFs


We will build an application similar to ChatPDF but simpler, where users can upload multiple PDF documents and ask questions through a straightforward UI.

Our tech stack is super easy with Langchain, Ollama, and Streamlit.

Architecture

LLM Server: The most critical component of this app is the LLM server. Thanks
to Ollama, we have a robust LLM Server that can be set up locally, even on a
laptop. While [Link] is an option, I find Ollama, written in Go, easier to set
up and run.
RAG: Undoubtedly, the two leading libraries in the LLM domain are LangChain and LlamaIndex. For this project, I'll be using LangChain due to my familiarity with it from my professional experience. An essential component for any RAG framework is vector storage. We'll be using Chroma here, as it integrates well with LangChain.

Chat UI: The user interface is also an important component. Although there are
many technologies available, I prefer using Streamlit, a Python library, for peace
of mind.

Okay, let’s start setting it up.

The chatbot can access information from various PDFs. Here’s a breakdown:

Data Source: Multiple PDFs

Storage: ChromaDB’s vector store (efficiently stores and retrieves information)

Processing: LangChain API prepares the data for a large language model (LLM)

LLM Integration: Likely involves Retrieval-Augmented Generation (RAG) for


enhanced responses

User Interface: Streamlit creates a user-friendly chat interface
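To make the flow concrete, here is a minimal, self-contained sketch of such an app (illustrative only; the actual code in the repository differs, and the file, model, and parameter names are assumptions):

```python
# Sketch: multi-PDF chat with LangChain + Chroma + Ollama + Streamlit.
# Assumes `ollama run llama2` is serving locally and pypdf is installed.
import tempfile

import streamlit as st
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

st.title("Chat with your PDFs")
uploaded = st.file_uploader("Upload PDFs", type="pdf", accept_multiple_files=True)
question = st.text_input("Ask a question about the documents")

if uploaded and question:
    docs = []
    for f in uploaded:
        # PyPDFLoader needs a file path, so write each upload to a temp file first.
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(f.read())
        docs.extend(PyPDFLoader(tmp.name).load())

    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1024, chunk_overlap=80
    ).split_documents(docs)
    vectordb = Chroma.from_documents(
        chunks, HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
    )

    qa = RetrievalQA.from_chain_type(
        llm=Ollama(model="llama2"),
        retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
    )
    st.write(qa.invoke({"query": question})["result"])
```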

GitHub Repo Structure:


[Link]

Folder Structure

1. Clone the repository to the local machine.

git clone [Link]


cd 7_Ollama/local-pdf-chatbot

2. Create a Virtual Environment:

python3 -m venv myenv


source myenv/bin/activate

3. Install the requirements:

pip install -r requirements.txt

4. Install Ollama and pull the LLM model specified in [Link]. [We have already covered setting up Ollama in the section above.]

5. Run the Llama 2 model using Ollama:

ollama pull llama2


ollama run llama2

6. Run the [Link] file using the Streamlit CLI. Execute the following command:

streamlit run [Link]

Image by Author

6. Application 2: Chatbot with Open WebUI


In this application, we will be building a chatbot with Open WebUI instead of Streamlit / Chainlit / Gradio as the UI.

Below are the steps :

1. Install Ollama & Deploy an LLM

We can install Ollama directly on our local machine or deploy the Ollama Docker container locally. The choice is ours; either will work with the LangChain Ollama interface, the official Ollama Python interface, and the Open WebUI interface.

Below are the instructions for installing Ollama directly on our local systems:

Setting up and running Ollama is straightforward. First, visit [Link] and download the app appropriate for our operating system.

Next, open the terminal and execute the following command to pull the latest models. While there are many other LLM models available, I chose Mistral-7B for its compact size and competitive quality.

ollama pull llama2

ollama run llama2

The set-up procedure is the same for all other models. We need to pull and run.

Image Source

2. Install open-webui (ollama-webui)

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.

Official GitHub Repo: [Link]

Run the docker command below to deploy the open-webui Docker container on the local machine.

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v ollama-webui:/app/backend/data --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main

(The project has since been renamed to Open WebUI, so check the official README for the current image name and volume if this command has changed.)

Image by Author

3. Open Browser

Open the browser and go to localhost on port 3000:

http://localhost:3000

Image by Author

To get started, we need to register for the first time. Simply click on the "Sign up"
button to create our account.

Image by Author

Once registered, we will be routed to the home page of open-webui.

Image by Author

Depending on which LLM we deployed on our local machine, those options will be
reflected in the drop-down to select.

Image by Author

Once selected, chat.

Image by Author

This saves us from having to create Streamlit or Gradio UI interfaces to experiment with various open-source LLMs, for presentations, etc.

We can chat with any PDF file now :

Image by Author

Below is a demo GIF showcasing how we can use open-webui to chat with images.

Source: open-webui

Let’s look into how we can use our customized models with Ollama. Below are the
steps for the same.

How to use a customized model?

Import from GGUF

Ollama supports importing GGUF models in the Modelfile:

1. Create a file named Modelfile , with a FROM instruction with the local filepath to
the model we want to import.

FROM ./vicuna-33b.Q4_0.gguf

2. Create the model in Ollama

ollama create example -f Modelfile

3. Run the model

ollama run example

Import from PyTorch or Safetensors

See the guide on importing models for more information.

Customize a prompt

Models from the Ollama library can be customized with a prompt. For example, to
customize the llama2 model:

ollama pull llama2

Create a Modelfile :

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Next, create and run the model:

ollama create mario -f ./Modelfile


ollama run mario
>>> hi
Hello! It's your friend Mario.

For more examples, see the examples directory. For more information on working
with a Modelfile, see the Modelfile documentation.

7. Application 3: Deploy Chatbot using Docker


Let's build the chatbot application using LangChain. To access our model from the Python application, we will build a simple Streamlit chatbot application. We will deploy this Python application in one container and run Ollama in a different container. We will build the infrastructure using docker-compose.

The following picture shows the architecture of how the containers interact, and
what ports they will be accessing.

Source

We build 2 containers,

Ollama container uses the host volume to store and load the models
( /root/.ollama is mapped to the local ./data/ollama ). Ollama container listens
on 11434 (external port, which is internally mapped to 11434)

The Streamlit chatbot application listens on 8501 (external port, which is internally mapped to 8501).

GitHub Repository :

Folder Structure

Clone the repository to the local machine.

git clone [Link]


cd 7_Ollama/docker-pdf-chatbot

Create a Virtual Environment:

python3 -m venv ./ollama-langchain-venv


source ./ollama-langchain-venv/bin/activate

Install the requirements:

pip install -r [Link]

Ollama also ships as a Docker image, which allows us to run the Ollama server in a container. This is very useful for building microservices applications that use Ollama models: we can easily deploy our applications in Docker ecosystems such as OpenShift, Kubernetes, and others. To run Ollama in Docker, we use the docker run command, as shown below. Before this, we should have Docker installed on our system.

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Below is the output :

Image by Author

We should then be able to interact with this container using docker exec , as shown
below, and run the prompts.

docker exec -it ollama ollama run phi

Image by Author

In the above command, we are running the phi model inside the Docker container.

We can also send prompts to the Ollama REST API directly:

curl http://localhost:11434/api/generate -d '{
  "model": "phi:latest",
  "prompt": "Who are you?",
  "stream": false
}'

Below is the result :

Image by Author

Note that Docker containers are ephemeral, and whatever models we pull will disappear when we restart the container. We will solve this issue below, where we build the Streamlit application setup with docker-compose and map the volume of the container to the host.

Ollama is a powerful tool that enables new ways of creating and running LLM
applications on the cloud. It simplifies the development process and offers flexible
deployment options. It also allows for easy management and scaling of the
applications.

Now, let’s get started with the Streamlit application.

We are using Ollama and calling the model through the Ollama LangChain integration (which is part of langchain_community).

Let's define the dependencies in requirements.txt.
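These will typically include streamlit, langchain, and langchain_community (check the repository for the exact pinned versions). A minimal sketch of the Streamlit app itself could look like the following; the file name, service name, and model are assumptions for illustration, not the repository's actual code:

```python
# main.py (illustrative sketch). Inside docker-compose, the Ollama container
# is reached via its service name rather than localhost; "ollama-container"
# is an assumed service name here.
import streamlit as st
from langchain_community.llms import Ollama

llm = Ollama(base_url="http://ollama-container:11434", model="phi")

st.title("Ollama + LangChain chatbot")
prompt = st.text_input("Your prompt")
if prompt:
    st.write(llm.invoke(prompt))
```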

Let’s now define a Dockerfile to build the docker image of the Streamlit application.

We are using the Python docker image, as the base image, and creating a working
directory called /app . We are then copying our application files there, and running
the pip installs to install all the dependencies. We are then exposing the port 8501
and starting the streamlit application.
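A sketch of such a Dockerfile, matching the description above (file names are assumptions; the repository's actual Dockerfile may differ):

```dockerfile
# Illustrative Dockerfile sketch for the Streamlit app described above.
FROM python:3.11-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8501
CMD ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
```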

We can build the docker image using docker build command, as shown below.

docker build . -t viprasingh/ollama-langchain:0.1

Image by Author

We should be able to check that the Docker image is built using the docker images command, as shown below.

Image by Author

Let’s now build a docker-compose configuration file, to define the network of the
Streamlit application and the Ollama container, so that they can interact with each
other. We will also be defining the various port configurations, as shown in the
picture above. For Ollama, we will also be mapping the volume, so that whatever
models are pulled, are persisted.

[Link]
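A sketch of what this docker-compose file could look like, based on the ports and volume mapping described above (service names are assumptions; the repository's actual file may differ):

```yaml
# Illustrative docker-compose.yml sketch: Ollama + the Streamlit app.
version: "3.8"
services:
  ollama-container:
    image: ollama/ollama
    volumes:
      - ./data/ollama:/root/.ollama   # persist pulled models on the host
    ports:
      - "11434:11434"

  app:
    build: .
    ports:
      - "8501:8501"
    depends_on:
      - ollama-container
```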

We can bring up the applications by running the docker-compose up command. Once we execute docker-compose up, we see that both containers start running, as shown in the screenshot below.

Image by Author

We should be able to see the containers running by executing the docker-compose ps command, as shown below.

Image by Author

We now check if Ollama is running by calling http://localhost:11434, as shown in the screenshot below.

Let’s now download the required model, by logging into the docker container using
the docker exec command as shown below.

docker exec -it docker-pdf-chatbot-ollama-container-1 ollama run phi

Since we are using the model phi, we are pulling that model and testing it by running it. In the screenshot below, the phi model is downloaded and starts running (since we are using the -it flag, we can interact and test with sample prompts).

We can see the downloaded model files and manifests in our local folder
./data/ollama (which is internally mapped to /root/.ollama for the container,
which is where Ollama looks for the downloaded models to serve)

Image by Author

Let's now access our Streamlit application by opening http://localhost:8501 in the browser. The following screenshot shows the interface.

Let's try to run the prompt "generate a story about dog called bozo". We should be able to see the console logs reflecting the API calls coming from our Streamlit application, as shown below.

The screenshot below shows the response I got for the prompt I sent.

We can bring down the deployment by calling docker-compose down. The following screenshot shows the output.

There we go. It was super fun working on this blog, getting Ollama to work with LangChain and deploying them on Docker using Docker Compose.

Conclusion
The blog explores building Large Language Model (LLM) applications locally,
focusing on Retrieval-Augmented Generation (RAG) chains. It covers components
like the LLM Server powered by Ollama, LangChain framework, Chroma for
embeddings, and Streamlit for web apps. It details creating Chatbot applications
using Ollama, LangChain, ChromaDB, and Streamlit, with GitHub repo structures
and Docker deployment. Overall, it offers a practical guide to developing LLM
applications efficiently.

Credits
In this blog post, we have compiled information from various sources, including research papers, technical blogs, official documentation, YouTube videos, and more. Each source has been appropriately credited beneath the corresponding images, with source links provided.

Below is a consolidated list of references:

1. [Link]7168449062336225280-3n_p/

2. [Link]eac28b9dc1e7

3. [Link]

4. [Link]

5. [Link]ollama-deploy-on-docker-5dfcfd140363

Thank you for reading!


If this guide has enhanced your understanding of Python and Machine Learning:

Please show your support with a clap 👏 or several claps!

Your claps help me create more valuable content for our vibrant Python or ML
community.

Feel free to share this guide with fellow Python or AI / ML enthusiasts.

Your feedback is invaluable — it inspires and guides my future posts.

Connect with me!


Vipra
