0% found this document useful (0 votes)
1K views3 pages

Flowise AI Tutorial #3 File Loaders, Text Splitters, Embeddings & Vector Stores

Uploaded by

Emanuele Sassù
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1K views3 pages

Flowise AI Tutorial #3 File Loaders, Text Splitters, Embeddings & Vector Stores

Uploaded by

Emanuele Sassù
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd

flow wise makes it easy to create AI applications using a clean and intuitive user

interface the true power of flow wise is the ability to create AI apps that make
use of a custom knowledge base using our own data we can add the ability to upload
files to our application like PDF documents and text files and we can then chat to
our documentation using flowwise before we build this document chatbot though there
are a few Concepts that we first need to understand this is a topic that I could
probably create an entire video on but let's have a look at the fundamentals for
this video in this example we have a file containing the nursery rhyme Mary had a
little lamb and on the right hand side we have a chat app where we can also model
questions as an example we might want to ask the assistant questions about the
contents in this file but in order for the model to have a view of the contents of
this file we first have to provide contains of this fall as context into our
application which could look like this and that would be the same as copying the
text from the file pasting it into chat GPT and then asking chatgpt questions about
the file that could work but there is an issue with that and the issue we have is
with the token limit a token represents a word or a part of a word and if the
content's overfall was quite large we will exceed the token limit quite easily so
simply copying the file into the chat and then asking questions will become an
issue so therefore ideally we only want to grab the sections that are relevant to
our question and only feed that into the chat as context and thankfully flow wise
or Lang chain which flowwise is based on offers a neat and elegant solution to work
around this limitation so let's talk about text Splitters text Splitters allows us
to take the content of the file and then break the content up into chunks let's
also talk about documents and documents should not be confused with files but
instead a document is a line chain definition of a chunk of text which we got from
the text splitter but the document also includes metadata like the file name and
any other information that we want to store about this piece of text as an example
let's imagine a scenario where we want to upload a folder full of files to our
application the document loader will Loop through each of those files and then it
will use the text splitter to break the follow-up into chunks and it will use the
metadata to store the file name that contained that piece of text now that we've
broken up our file content into logical pieces of text called chunks and then
converted those chunks into line chain documents which also contain metadata we now
need to store that information somewhere and how this works is we'll actually store
that information in a database called a vector database Vector databases are a
fascinating topic but it will take way too long to explain how they work in this
video but Vector databases basically store the data as Vector arrays a vector array
is something that the AI will understand and it will assist the AI with finding
similar documents when we chat to it but in order for us to convert our text into a
vector array we need to run a function called embeddings embeddings is a unique
algorithm that is unique per language model and this will convert our text into
something that our model will understand because we are using openai in our example
we will use the openai embeddings function to convert our text to this array our
Vector array along with the text and metadata can now be stored as records in in
our Vector database and furthermore Vector stores also group similar pieces of text
or similar embeddings in clusters this means that in our application if we now had
to ask our application who is Mary our chat app will first Guide to the vector
store to perform a similarity search so it will go and extract all documents that
are related or similar to Mary it will then return a list of similar documents back
to our application by default our think flowwise Returns the top four results back
to the application it is then these results that get included in the conversation
as context greatly reducing the amount of tokens needed for our conversation enough
Theory let's now go and build this document chat bot in flow wise go back to the
dashboard and let's create a new new chat flow let's save this and call it document
chatbot what we also need is a file to upload to our application ideally you want
to create a file with information that GPT wasn't trained on so I use chatgpt to
generate a unique story and I added the contents to a file some details of the
story is that this is some sort of love story about a young woman named Emily and
she's an architect who moved to a small town where she met a man called Lucas so
there are some details in the story that GPT usually won't be aware of so go ahead
create a file and save it on your machine so back in our project let's have a look
at what we need as I mentioned in the previous videos our chat flows always require
either a chain or an agent to generate output in this example we don't have to use
any external tool so we do not need to use agents and we will be using chains if we
go to our nodes and open up chains we can see this chain called conversational
retrieval QA chain and this is a document QA chain and this is perfect for what we
need let's drop this chain on the canvas and let's configure it if we look at the
inputs this chain requires an llm as well as a vector store remember in our Theory
we showed that our data will be retrieved from a vector store let's set up our
language model under nodes let's go to chat models and let's drag and drop the chat
openai model onto the canvas and we can immediately connect this llm to the chain
we also need to provide our openai API key like so we can leave the model on GPT
3.5 turbo let's now also set up the vector store to do this let's click on nodes
under nodes we can open up Vector stores and within Vector stores we've got quite a
few options for this demo we'll simply use the in memory Vector store but in a
production environment you might want to consider one of these other options like
pinecan or Super Bass let's add our in-memory Vector store to the canvas and let's
hook it up to our chain our chain now has access to an llm as well as a vector
store so let's go ahead and load our data into the vector store first off let's
load documents into the vector store and you might recall from the theory that this
document is not referring to a text file but instead it's referring to a line chain
document this is basically a chunk of text with metadata so how do we create
documents this is actually quite easy all we need to do is add a document loader to
our project so under document loaders we have quite a few options we have the
Cheerio webscriper we can upload CSV v files docx files we can even load a folder
with multiple different files within it let's keep it simple and add the text file
document loader to our project this text file node will allow us to upload files
from our machine it will go and create documents from the content of that file so
let's hook this node up to our Vector store but what this is going to do is it's
going to upload our text file in its entirety and create one single document with
metadata from that file which is not what we want but what we want to do instead is
upload our file and then split our file contents up into chunks and then from those
chunks we want to create documents so optionally we can attach a text splitter to
this node in nodes go to text Splitters with in-text Splitters we'll select the
recursive character text splitter and then add that to our canvas we you can then
connect the stick splitter to the text splitter parameter on our text file node we
can now tell the text splitter how big these chunks need to be and the default is
1000 characters let's make that smaller by changing it to 200 characters the size
of the chunks is really up to you but just keep in mind the intention is to grab
these chunks and then add that to the context of our conversation and the smaller
the chunks the better because the smaller the context the less tokens we use which
drives down costs we can also specify a chunk overlap which will change to
something like 20 characters this means that each chunk might have a section of the
chunk before and after it available in its contents so now we are able to upload
files by selecting the file from our machine and this will now take our fall chunk
it based on these parameters and then come convert each of the chunks into a line
chain document which is then stored in our Vector database but this brings us to
the final component of our chain and that is embeddings in order for the AI to make
sense of the content that we're storing in the database it needs to convert the
text into Vector arrays and in order to convert the text into a vector array we
need to call the embeddings function this is quite easy to set up as well in the
nodes we can go to embeddings and under embeddings we can select the embedding
function related to our llm because we are using open AI as the lrm up here we will
simply select the openai embeddings function and drop that onto the canvas and we
can connect that to our Vector store and in order to call this openai embeddings
API we need to provide the open AI API key as well we can now go ahead and save
this chat flow and we should be able to test this out let's open up the chat and
let's ask a question specific to our file so let's actually pull up this file to
the site over here so we can test this out we know that the main character is
called Emily so let's ask it a question who is Emily and let's send this and that
is perfect Emily is indeed an architect and she is from everdale let's ask it who
is Lucas and it seems that
our story does not provide enough information about Lucas so this rephrase the
question a bit are Emily and Lucas friends and indeed apparently they are friends
so this is a fantastic way to upload documentation like large PDF files or a folder
full of content and then ask questions related to that content and you now have a
fully functional document chat bot I hope you enjoyed this content please please
like And subscribe to support my channel and please tell me down in the comments
what you would like to see me cover next I look forward to seeing you in the next
one bye

flow wise makes it easy to create AI applications using a clean and intuitive user 
interface the true power of flow wise is
there are some details in the story that GPT usually won't be aware of so go ahead 
create a file and save it on your machine
the site over here so we can test this out we know that the main character is 
called Emily so let's ask it a question who is

You might also like