Creating a Local LLM Vector Store from PDFs with KNIME and GPT4All - by Markus Lauber - Low Code for Data Science - Dec 2023 - Medium
https://medium.com/low-code-for-advanced-data-science/creating-a-local-llm-vector-store-from-pdfs-with-knime-and-gpt4all-311bf61dd20e
You can also read my initial article, “KNIME, AI Extension and local Large Language Models (LLM)”.
KNIME Workflow to create and use a GPT4All LLM and a local Vector Store from your own Document (PDF) (https://forum.knime.com/t/gpt4all-embeddings/75594/5?u=mlauber71).
You will have to install GPT4All and the KNIME AI Extension. If you experience
problems, please also refer to the “GPT4All Installation behind a Firewall” section of
this article. Also, as of Q1/2024 there is a bug when using GPT4All offline
or behind a firewall (which this is all about), so you will have to make some
adaptations in the code.
Select the model (.gguf) you want to use in the component. The name will
automatically be added to the text.
Search for, drag and drop the Sentence Extractor node and execute it on the
“Document” column from the PDF Parser node. This will split the document cell into
multiple rows: one row per sentence. Then use a Row Filter node to
remove all sentences with fewer than 5 terms. You can experiment with additional text
preparations.
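The sentence-splitting and filtering steps above could be sketched in plain Python like this. The regex boundary rule and the five-term threshold are rough stand-ins for what the Sentence Extractor and Row Filter nodes actually do, not their real implementation:

```python
import re

def split_into_sentences(document: str, min_terms: int = 5) -> list[str]:
    """Split a document into sentences and drop those with fewer than
    min_terms words, mirroring the Sentence Extractor + Row Filter steps."""
    # Naive sentence boundary: split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if len(s.split()) >= min_terms]

text = ("Fill the water tank up to the MAX mark. Press start. "
        "The machine heats the water and brews the coffee automatically.")
sentences = split_into_sentences(text)
```

Here the short sentence “Press start.” would be filtered out, just as the Row Filter drops sentences below 5 terms.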
Search for, drag and drop the FAISS Vector Store Creator node, and connect it to the
Embeddings4All Connector and your string-sections output. Execute the
node on the column with the strings to create the vector store. You can either
download a suitable embedding model (currently: all-MiniLM-L6-v2-f16.gguf) or
let the node do the work.
Save the vector store by adding a Model Writer node. To save it properly, you
can use a relative path and specify the name of the vector store such as
“vector_store.model”.
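Conceptually, what these two steps produce and persist is a collection of text snippets paired with embedding vectors. The following is a toy sketch only: the bag-of-words “embedding” stands in for a real model such as all-MiniLM-L6-v2-f16.gguf, and pickle stands in for the Model Writer node writing to the workflow data area:

```python
import pickle
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector
    # keyed by lowercased terms. FAISS would store dense float vectors.
    return Counter(text.lower().split())

sentences = [
    "Fill the water tank up to the MAX mark.",
    "Descale the machine every three months.",
    "Use the steam wand to froth milk for cappuccino.",
]

# The "vector store": each sentence paired with its embedding.
store = [(s, toy_embed(s)) for s in sentences]

# Persist under a relative path, analogous to the Model Writer node
# saving "vector_store.model".
with open("vector_store.model", "wb") as fh:
    pickle.dump(store, fh)
```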
Pose a set of questions to the model and your vector store (https://hub.knime.com/-/spaces/-/~RgLTaML-8RjQVBfi/current-state/).
Load the model into the GPT4All Chat Model Connector. Here you can use
the Flow Variable from the left side.
The Model Reader node reads from the workflow data area the vector store
you previously created in part A). The Vector Store Retriever will try to find 15
relevant documents to add to the prompt you pose to the Large
Language Model:
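What the Vector Store Retriever does, in essence, is a nearest-neighbour search: embed the question and return the k most similar stored snippets. A hedged toy sketch, using cosine similarity over bag-of-words vectors where FAISS would search dense embeddings efficiently:

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(store, question: str, k: int = 2) -> list[str]:
    """Return the k stored sentences most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:k]]

store = [(s, toy_embed(s)) for s in [
    "Fill the water tank up to the MAX mark.",
    "Descale the machine every three months.",
    "Use the steam wand to froth milk for cappuccino.",
]]
top = retrieve(store, "How do I froth milk?", k=2)
```

In the workflow the same idea runs with 15 documents instead of 2, and the retrieved snippets are appended to the prompt.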
Tell the Vector Store Retriever how many documents to find for your question.
You might want to edit some of the information and instructions around the
question you want to ask. This might be a good place to test the effects
of different prompts.
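The “information and instructions around the question” boil down to a prompt template. A sketch of what such a wrapper could look like; the exact wording is illustrative only and is precisely what you would tweak when experimenting:

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Wrap the user's question with a role, instructions, and the
    retrieved context snippets. Wording is illustrative only."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "You are a helpful assistant answering questions about a coffee "
        "machine, using only the context below.\n"
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "How do I froth milk?",
    ["Use the steam wand to froth milk for cappuccino."],
)
```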
Do some prompt engineering and also add the additional information (adapted from: https://hub.knime.com/-/spaces/-/~WNe6bb2w2bemYBWE/current-state/).
Step 4: Connect the Vector Store Retriever and the LLM Prompter
Run the questions through the model and the Vector Store.
Drag in the Vector Store Retriever node and the LLM Prompter node,
and add a String Manipulation node in between for prompt engineering.
You can save the table via an Excel Writer node. Optionally, you can use a
Table View node to compare the answers from the LLM with the ones imported
with the questions for reference.
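The batch part of this step amounts to a simple loop: run each question through the retrieval-plus-prompting pipeline and collect the results in a table. A minimal sketch, with a placeholder in place of the real retriever and LLM call, and CSV standing in for the Excel Writer node:

```python
import csv

QUESTIONS = ["How do I descale the machine?", "How do I froth milk?"]

def ask_llm(question: str) -> str:
    # Placeholder for Vector Store Retriever + LLM Prompter;
    # returns a canned string in this sketch.
    return f"(model answer to: {question})"

rows = [{"question": q, "answer": ask_llm(q)} for q in QUESTIONS]

# Analogous to the Excel Writer node; CSV keeps the sketch dependency-free.
with open("answers.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["question", "answer"])
    writer.writeheader()
    writer.writerows(rows)
```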
C) ‘Live’ Chat with your GPT4All model using your Vector Store
In addition to asking your questions in a batch, I created a KNIME Component
where you can ‘live’ chat with the model and the information you provided
from the coffee machine manual. The component will take your question,
select suitable documents from the vector store, and then give you the
answer:
A live chat with the model based on your PDF’s data (https://hub.knime.com/-/spaces/-/~RgLTaML-8RjQVBfi/current-state/).
The initial task/role is only provided once, though you can change it. You can
also edit the prompt within the component if you want.
Load the model into the GPT4All Chat Model Connector. Here you can use
the Flow Variable from the left side where you selected the model.
Step 2: Select the initial role for the prompt and the number of documents
to be searched
Besides your precise question, there should be a defined role and some
additional instructions.
What happens inside the component is that the answers are stored in a KNIME
table and reloaded, so your conversation is stored and shown in
the chat window.
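That loop can be sketched in a few lines: each turn is answered independently (matching the component's lack of conversational memory, noted below), while question/answer pairs are appended to a history table purely for redisplay. The `answer` function here is a hypothetical placeholder for the retrieve-and-prompt pipeline:

```python
def answer(question: str) -> str:
    # Placeholder for retrieving from the vector store and calling the LLM.
    return f"(answer to: {question})"

# The "KNIME table" holding the conversation, used only for display.
history: list[dict[str, str]] = []

def chat_turn(question: str) -> str:
    reply = answer(question)  # built fresh each turn, without prior turns
    history.append({"question": question, "answer": reply})
    return reply

chat_turn("How do I descale the machine?")
chat_turn("Can it make tea?")
```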
Currently the chat does *not* refer to items that have already been discussed
(as it would with a live connection to ChatGPT). But the upside is
that the conversation happens entirely on your own machine, without the
data being sent to the internet.
Even if you cannot process very large amounts of data (depending on the
power of your machine), you might be able to test prompts and see whether your
vector store works.
The setup can still sometimes give false answers or make things up. It
seems to help to refer to the coffee machine (although this is already done
by the ‘wrapper’ around the question). It might be good to experiment
with the prompts.
What does not work well are ‘negative’ questions or things that are not in the
manual. I asked the setup whether the machine can also make tea, and the
answer was somewhat evasive: you can boil water, but it did not say that
this might not be the best idea. The model seems reluctant to say
no. Maybe something to add to the prompt.
You should focus on what you expect to be in the document. I am not sure
how well it works to combine it with the general content of the model.
When creating KNIME document types (that you use to build the vector
store) you can add additional metadata, like author, page number, title and
so on. It might be useful to try and add that to the answers to have some
reference.
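A sketch of the idea: documents carrying metadata alongside their text, so retrieved passages can be cited with title and page. The field names are illustrative, not KNIME's actual document schema:

```python
# Documents with metadata attached, in the spirit of KNIME document types.
docs = [
    {"text": "Descale the machine every three months.",
     "title": "Coffee Machine Manual", "page": 12, "author": "ACME"},
    {"text": "Use the steam wand to froth milk.",
     "title": "Coffee Machine Manual", "page": 18, "author": "ACME"},
]

def cite(doc: dict) -> str:
    """Format a retrieved passage with a reference built from its metadata."""
    return f'{doc["text"]} [{doc["title"]}, p. {doc["page"]}]'

references = [cite(d) for d in docs]
```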
Note: this article has been edited to describe more precisely the use of
embedding models when creating vector stores.
Senior Data Scientist working with KNIME, Python, R and Big Data Systems in the telco
industry