
Student Assistant for Querying Multi-Modal Data using Large Language Models

We are looking for a student assistant for an exciting project, carried out together with our
partner Hochtief / Nexplore, that aims to enhance data accessibility through natural language
querying of multi-modal data (tables, text, images, …).

Project Description:
Extracting information from large collections of documents is a difficult task, especially when
the required information is scattered throughout these documents in texts, images, diagrams
(plots) and tables. Together with our partner Hochtief / Nexplore we aim to build a system
that allows querying multi-modal data extracted from their large PDF collections using simple
natural language queries. In particular, we build on CAESURA
(https://arxiv.org/abs/2308.03424), a system that translates natural language queries over
multi-modal data into several processing steps using Large Language Models to answer
user queries. However, several challenges remain. For instance, CAESURA currently
relies on external off-the-shelf machine learning tools from Hugging Face to extract
information from modalities other than text.

Illustrating Example from the medical domain:


A user is interested in the effectiveness of different drugs for treating diabetes and has access to
a PDF collection reporting on a large number of clinical trials. An example query
could be “Get the effect of different diabetes medications on blood glucose levels”. In order
to answer such a query, CAESURA would need to identify the documents concerning
diabetes treatments first, and afterwards extract the effect on blood glucose levels of each
drug. This is challenging, since the information about the effect on blood glucose levels
might be in different places in each PDF, and might sometimes be provided in the text and
sometimes in tables or plots. For each modality, CAESURA would need to choose the
correct extraction tool, e.g. a PlotQA ML model to extract information on blood glucose levels
from plots.
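
The two steps in this example (select the relevant documents, then pick an extraction tool per modality) can be illustrated with a minimal Python sketch. It is not CAESURA's actual implementation: the Document and Item classes, the keyword-based relevance check, and the three extractor functions are hypothetical placeholders that stand in for real models such as a PlotQA model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Item:
    modality: str  # "text", "table", or "plot"
    content: str   # placeholder for raw text, table cells, or an image path


@dataclass
class Document:
    doc_id: str
    summary: str
    items: List[Item]


# Hypothetical stand-ins for real extraction tools (e.g. a PlotQA model).
def extract_from_text(item: Item, question: str) -> str:
    return f"text:{item.content}"


def extract_from_table(item: Item, question: str) -> str:
    return f"table:{item.content}"


def extract_from_plot(item: Item, question: str) -> str:
    return f"plot:{item.content}"


EXTRACTORS: Dict[str, Callable[[Item, str], str]] = {
    "text": extract_from_text,
    "table": extract_from_table,
    "plot": extract_from_plot,
}


def answer_query(docs: List[Document], topic: str, question: str) -> List[Tuple[str, str, str]]:
    """Step 1: keep documents about the topic. Step 2: dispatch each item by modality."""
    results = []
    for doc in docs:
        # Assumption: a simple keyword match stands in for real relevance detection.
        if topic.lower() not in doc.summary.lower():
            continue
        for item in doc.items:
            extractor = EXTRACTORS.get(item.modality)
            if extractor is None:
                continue  # no tool available for this modality
            results.append((doc.doc_id, item.modality, extractor(item, question)))
    return results


docs = [
    Document("trial-1", "Diabetes drug trial", [Item("plot", "fig2.png"), Item("text", "results section")]),
    Document("trial-2", "Hypertension study", [Item("table", "table 1")]),
]
results = answer_query(docs, "diabetes", "effect on blood glucose levels")
```

In this sketch only trial-1 passes the relevance step, and its plot and text items are routed to different extractors. Supporting a new modality amounts to registering one more function in EXTRACTORS.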

Task:
Extend CAESURA with tools for querying plots in addition to other modalities. To this end,
existing PlotQA models should be fine-tuned or new PlotQA models should be developed.

Qualifications:
- Proficiency in machine learning, computer vision, and NLP concepts.
- Hands-on experience with deep learning frameworks such as PyTorch and
TensorFlow.

Language: English / German

If you're passionate about advancing AI and data accessibility, we encourage you to apply.
Your contributions will be instrumental in solving real-world problems and making information
more accessible than ever before.

If you are interested, please send your application documents (CV, transcript) to
matthias.urban@cs.tu-darmstadt.de and carsten.binnig@cs.tu-darmstadt.de
