Flowise AI Tutorial #3 File Loaders, Text Splitters, Embeddings & Vector Stores

Uploaded by

Emanuele Sassù

Flowise makes it easy to create AI applications using a clean and intuitive user
interface. The true power of Flowise is the ability to create AI apps that make use
of a custom knowledge base using our own data. We can add the ability to upload
files to our application, like PDF documents and text files, and we can then chat to
our documentation using Flowise.

Before we build this document chatbot, though, there are a few concepts that we
first need to understand. This is a topic that I could probably create an entire
video on, but let's have a look at the fundamentals for this video.

In this example we have a file containing the nursery rhyme "Mary Had a Little
Lamb", and on the right-hand side we have a chat app where we can ask the model
questions. As an example, we might want to ask the assistant questions about the
contents of this file. But in order for the model to have a view of the contents of
this file, we first have to provide the contents of the file as context to our
application, which could look like this. That would be the same as copying the text
from the file, pasting it into ChatGPT, and then asking ChatGPT questions about the
file.

That could work, but there is an issue with it, and that issue is the token limit.
A token represents a word or a part of a word, and if the contents of our file were
quite large, we would exceed the token limit quite easily.
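
To put rough numbers on the token limit, here is a small Python sketch. It assumes the common rule of thumb of about four characters per token for English text and a made-up limit of 4,096 tokens. The function names are hypothetical, and a real tokenizer would give different counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate: about one token per four characters (assumption)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(file_text: str, question: str, token_limit: int) -> bool:
    """Check whether the whole file plus the question stays under the limit."""
    return estimate_tokens(file_text) + estimate_tokens(question) <= token_limit


TOKEN_LIMIT = 4096  # hypothetical limit, used only for illustration

rhyme = "Mary had a little lamb, its fleece was white as snow. "
question = "Who is Mary?"

small_file = rhyme          # a short file fits comfortably
large_file = rhyme * 500    # the same text repeated to simulate a big file

print("small file fits:", fits_in_context(small_file, question, TOKEN_LIMIT))
print("large file fits:", fits_in_context(large_file, question, TOKEN_LIMIT))
```

The short rhyme fits easily, while the simulated large file does not, which is the situation where pasting the whole file into the chat stops working.
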
Simply copying the file into the chat and then asking questions will therefore
become an issue. Ideally, we only want to grab the sections that are relevant to
our question and feed only those into the chat as context. Thankfully Flowise, or
rather LangChain, which Flowise is based on, offers a neat and elegant solution to
work around this limitation.

So let's talk about text splitters. Text splitters allow us to take the content of
the file and then break that content up into chunks.
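
As a rough sketch of what a text splitter does, here is a simplified character-based splitter in Python. The function name `split_text` and the sizes are made up for illustration, and the overlap parameter is my own addition to show how neighbouring chunks can share some text. This is not how Flowise or LangChain implement it; their recursive splitter also tries to break on natural separators such as paragraphs and spaces.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk after the first starts with the last chunk_overlap
    characters of the chunk before it.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


rhyme = (
    "Mary had a little lamb, its fleece was white as snow; "
    "and everywhere that Mary went, the lamb was sure to go."
)

# Illustrative sizes only, kept small so the overlap is easy to see.
chunks = split_text(rhyme, chunk_size=40, chunk_overlap=10)
for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
```

Printing the chunks shows the tail of each chunk repeated at the head of the next one, so a sentence cut at a chunk boundary is still partly visible in both.
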
Let's also talk about documents. Documents should not be confused with files.
Instead, a document is LangChain's definition of a chunk of text, which we got from
the text splitter, but the document also includes metadata like the file name and
any other information that we want to store about this piece of text. As an example,
let's imagine a scenario where we want to upload a folder full of files to our
application. The document loader will loop through each of those files, use the
text splitter to break each file up into chunks, and use the metadata to store the
name of the file that contained each piece of text.

Now that we've
broken up our file content into logical pieces of text called chunks, and then
converted those chunks into LangChain documents which also contain metadata, we
need to store that information somewhere. We'll store it in a database called a
vector database. Vector databases are a fascinating topic, and it would take way too
long to explain how they work in this video, but they basically store the data as
vector arrays. A vector array is a list of numbers that represents a piece of text,
and it helps the AI find similar documents when we chat to it.

In order to convert our text into a vector array, we need to run an embeddings
function. The embeddings function is specific to the model provider, and it
converts our text into a numeric form that can be compared for similarity. Because
we are using OpenAI in our example, we will use the OpenAI embeddings function to
convert our text to this array. Our vector array, along with the text and metadata,
can now be stored as records in our vector database. Furthermore, vector stores
also group similar pieces of text, or similar embeddings, in clusters.

This means that if we now ask our application "Who is Mary?", our chat app will
first go to the vector store to perform a similarity search, so it will go and
extract the documents that are related or similar to "Mary". It will then return a
list of similar documents back to our application. By default, I think Flowise
returns the top four results. It is these results that then get included in the
conversation as context, greatly reducing the number of tokens needed for our
conversation.
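
The similarity search just described can be sketched in a few lines of Python. This is only a toy: `embed` counts words instead of calling a real embeddings model, `ToyVectorStore` and the file name `mary.txt` are made up for illustration, and the default of four results simply mirrors the top four mentioned above.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embeddings function: a bag-of-words count vector."""
    words = (word.strip(".,;!?").lower() for word in text.split())
    return Counter(word for word in words if word)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Document:
    """A chunk of text plus metadata such as the source file name."""
    text: str
    metadata: dict = field(default_factory=dict)


class ToyVectorStore:
    """Keeps (vector, document) records in memory (hypothetical class)."""

    def __init__(self) -> None:
        self.records: list[tuple[Counter, Document]] = []

    def add(self, doc: Document) -> None:
        self.records.append((embed(doc.text), doc))

    def similarity_search(self, query: str, k: int = 4) -> list[Document]:
        """Return the k documents whose vectors are closest to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self.records,
            key=lambda record: cosine_similarity(query_vector, record[0]),
            reverse=True,
        )
        return [doc for _, doc in ranked[:k]]


store = ToyVectorStore()
for line in [
    "Mary had a little lamb",
    "Its fleece was white as snow",
    "Everywhere that Mary went",
    "The lamb was sure to go",
]:
    store.add(Document(line, {"source": "mary.txt"}))  # hypothetical file name

results = store.similarity_search("Who is Mary?", k=2)
for doc in results:
    print(doc.metadata["source"], "|", doc.text)
```

Only the returned chunks would be passed to the model as context, rather than the whole file. A real embeddings model captures meaning rather than exact word matches, so its rankings would differ from this word-count version.
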
Enough theory. Let's now go and build this document chatbot in Flowise. Go back to
the dashboard and let's create a new chatflow. Let's save this and call it "Document
Chatbot".

What we also need is a file to upload to our application. Ideally, you want to
create a file with information that GPT wasn't trained on, so I used ChatGPT to
generate a unique story and added the contents to a file. Some details of the story:
it is some sort of love story about a young woman named Emily, an architect who
moved to a small town where she met a man called Lucas. So there are some details in
the story that GPT usually won't be aware of. Go ahead, create a file, and save it
on your machine.

Back in our project, let's have a look
at what we need. As I mentioned in the previous videos, our chatflows always
require either a chain or an agent to generate output. In this example we don't
have to use any external tools, so we do not need agents and we will be using
chains. If we go to our nodes and open up Chains, we can see a chain called
Conversational Retrieval QA Chain. This is a document QA chain, and it is perfect
for what we need. Let's drop this chain on the canvas and configure it.

If we look at the inputs, this chain requires an LLM as well as a vector store.
Remember, in our theory we showed that our data will be retrieved from a vector
store. Let's set up our language model. Under Nodes, let's go to Chat Models and
drag and drop the ChatOpenAI model onto the canvas, and we can immediately connect
this LLM to the chain.
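
Roughly, a retrieval chain like this one takes the question, asks the vector store for relevant chunks, and hands both to the LLM. The sketch below shows that flow in Python with stand-in functions. The names `answer_question`, `fake_retrieve`, and `fake_llm` are hypothetical, no real model is called, and chat history, which a conversational chain would also take into account, is left out.

```python
from typing import Callable


def answer_question(
    question: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> str:
    """Fetch relevant chunks, put them in the prompt, and ask the model."""
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)


def fake_retrieve(question: str) -> list[str]:
    """Stand-in for the vector store: always returns the same two chunks."""
    return [
        "Emily is an architect who moved to a small town.",
        "In the town she met a man called Lucas.",
    ]


seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for the chat model: records the prompt, returns a canned reply."""
    seen_prompts.append(prompt)
    return "Emily is an architect."


reply = answer_question("Who is Emily?", fake_retrieve, fake_llm)
print(seen_prompts[0])
print("---")
print(reply)
```

Printing the recorded prompt makes it visible that the model only ever sees the retrieved chunks and the question, which is why the chain needs both an LLM and a vector store as inputs.
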
We also need to provide our OpenAI API key, like so. We can leave the model on
gpt-3.5-turbo.

Let's now also set up the vector store. To do this, let's click on Nodes. Under
Nodes, we can open up Vector Stores, and within Vector Stores we've got quite a few
options. For this demo we'll simply use the In-Memory Vector Store, but in a
production environment you might want to consider one of the other options, like
Pinecone or Supabase. Let's add our In-Memory Vector Store to the canvas and hook
it up to our chain. Our chain now has access to an LLM as well as a vector store,
so let's go ahead and load our data into the vector store.

First off, let's
load documents into the vector store. You might recall from the theory that
"document" here does not refer to a text file, but to a LangChain document, which is
basically a chunk of text with metadata. So how do we create documents? This is
actually quite easy: all we need to do is add a document loader to our project.
Under Document Loaders we have quite a few options. We have the Cheerio Web
Scraper, we can upload CSV files and DOCX files, and we can even load a folder with
multiple different files within it. Let's keep it simple and add the Text File
document loader to our project. This Text File node will allow us to upload files
from our machine, and it will create documents from the content of that file, so
let's hook this node up to our vector store.

On its own, though, this will upload our text file in its entirety and create one
single document with metadata from that file, which is not what we want. What we
want to do instead is upload our file, split the file contents up into chunks, and
then create documents from those chunks. So optionally we can attach a text
splitter to
this node. In Nodes, go to Text Splitters. Within Text Splitters we'll select the
Recursive Character Text Splitter and add it to our canvas. We can then connect the
text splitter to the Text Splitter parameter on our Text File node.

We can now tell the text splitter how big these chunks need to be. The default is
1000 characters; let's make that smaller by changing it to 200 characters. The size
of the chunks is really up to you, but keep in mind that the intention is to grab
these chunks and add them to the context of our conversation, and the smaller the
chunks the better, because the smaller the context, the fewer tokens we use, which
drives down costs. We can also specify a chunk overlap, which we'll change to
something like 20 characters. This means that each chunk might have a section of
the chunk before and after it available in its contents.

So now we are able to upload files by selecting the file from our machine. This
will take our file, chunk it based on these parameters, and then convert each of
the chunks into a LangChain document, which is then stored in our vector database.

But this brings us to
the final component of our chain, and that is embeddings. In order for the AI to
make sense of the content that we're storing in the database, the text needs to be
converted into vector arrays, and to do that we need to call the embeddings
function. This is quite easy to set up as well. In the nodes we can go to
Embeddings, and under Embeddings we can select the embedding function related to
our LLM. Because we are using OpenAI as the LLM up here, we will simply select the
OpenAI Embeddings function, drop it onto the canvas, and connect it to our vector
store. In order to call this OpenAI embeddings API, we need to provide the OpenAI
API key as well.

We can now go ahead and save this chatflow, and we should be able to test it out.
Let's open up the chat and ask a question specific to our file. Let's pull up the
file to the side over here so we can test this out. We know that the main character
is called Emily, so let's ask it a question: "Who is Emily?" Let's send this, and
that is perfect: Emily is indeed an architect, and she is from Everdale. Let's ask
it "Who is Lucas?" It seems that our story does not provide enough information
about Lucas, so let's rephrase the question a bit: "Are Emily and Lucas friends?"
And indeed, apparently they are friends.

So this is a fantastic way to upload documentation, like large PDF files or a
folder full of content, and then ask questions related to that content. You now
have a fully functional document chatbot.

I hope you enjoyed this content. Please like and subscribe to support my channel,
and please tell me down in the comments what you would like to see me cover next. I
look forward to seeing you in the next one. Bye!
