Introduction
What is GPT4RoI?
source - https://arxiv.org/pdf/2307.03601.pdf
The input to the LLM begins with a prefix prompt that provides an overview
of the image. When the input text contains a spatial instruction (a
reference to a region of the image), the embedding of that reference is
replaced, during tokenization and conversion to embeddings, with the
RoIAlign features of the corresponding bounding box.
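The replacement step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the GPT4RoI implementation: the placeholder token name `<region1>`, the helper names, and the choice to average the sampled bins into a single vector are all hypothetical. It also assumes the feature map's channel count already equals the embedding size, whereas a real model would normally project region features to that size.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
FeatureMap = Sequence[Sequence[Sequence[float]]]  # H x W x C nested lists
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map coords


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> Vector:
    """Interpolate the channel vector at a continuous (x, y) location."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the valid grid
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return [
        (1 - dy) * ((1 - dx) * fmap[y0][x0][c] + dx * fmap[y0][x1][c])
        + dy * ((1 - dx) * fmap[y1][x0][c] + dx * fmap[y1][x1][c])
        for c in range(len(fmap[0][0]))
    ]


def roi_align_vector(fmap: FeatureMap, box: Box, bins: int = 2) -> Vector:
    """Simplified RoIAlign: sample each bin centre, then average (assumption)."""
    x1, y1, x2, y2 = box
    bin_w, bin_h = (x2 - x1) / bins, (y2 - y1) / bins
    samples = [
        bilinear_sample(fmap, x1 + (i + 0.5) * bin_w, y1 + (j + 0.5) * bin_h)
        for j in range(bins)
        for i in range(bins)
    ]
    return [sum(channel) / len(samples) for channel in zip(*samples)]


def embed_with_regions(
    tokens: Sequence[str],
    vocab_embeddings: Dict[str, Vector],
    fmap: FeatureMap,
    boxes: Dict[str, Box],
) -> List[Vector]:
    """Look up token embeddings, swapping region placeholders for RoI features."""
    embedded = []
    for token in tokens:
        if token in boxes:  # hypothetical placeholder such as "<region1>"
            embedded.append(roi_align_vector(fmap, boxes[token]))
        else:
            embedded.append(list(vocab_embeddings[token]))
    return embedded
```

For example, calling `embed_with_regions(["describe", "<region1>"], vocab, fmap, {"<region1>": (0, 0, 2, 2)})` returns the ordinary embedding for `describe` and a region feature vector in place of the `<region1>` embedding.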
source - https://arxiv.org/pdf/2307.03601.pdf
● Local: The code is available on GitHub, along with instructions on
how to install and run the model. You will need to install its
dependencies on your machine first.
● Online: If you don’t want to install anything on your machine, you
can use the online demo of GPT4RoI. The demo lets you interact
with the model using different instructions and RoIs on various
images. You can also upload your own images and text prompts and
see how the model responds. The demo is a quick way to explore
the capabilities of GPT4RoI.
Limitations
Conclusion
source
research paper - https://arxiv.org/abs/2307.03601
research document - https://arxiv.org/pdf/2307.03601.pdf
GitHub repo - https://github.com/jshilong/GPT4RoI
License - https://github.com/jshilong/GPT4RoI/blob/main/LICENSE
Demo link - http://139.196.83.164:7000/