Introduction
Have you ever wished for a virtual assistant that can do more than just
answer your questions or play your favorite songs? A virtual assistant
that can understand your goals, plan the best actions, execute them across
multiple modalities, inspect the results, and learn from feedback? If so,
you might be interested in AssistGPT, a new multi-modal assistant developed by
researchers from Show Lab at the National University of Singapore,
Microsoft Research Asia, and Microsoft.
What is AssistGPT?
AssistGPT has several key features that make it a powerful and versatile
multi-modal assistant:
● It can help users with their daily tasks and chores that involve
multiple steps and modalities. For example, it can book a flight,
order food, create a presentation, or play a game using natural
language commands and visual interfaces.
● It can assist users with their learning and education by adapting to
their needs and preferences. For example, it can learn new skills
or concepts by asking questions or searching for information on
the web. It can also provide suggestions or feedback using natural
language responses.
● It can entertain users by engaging with their hobbies and interests
and generating diverse, creative content. For example, it can play
games with users through natural language dialogue and visual
feedback. It can also generate stories, poems, songs, or images.
● It can support users with their health and wellness by monitoring
and improving their well-being. For example, it can track users’
AssistGPT Architecture
source - https://arxiv.org/pdf/2306.08640.pdf
The planner's role is to generate a sequence of actions that fulfills the
user's goal. It has two sub-modules: the instruction parser and
the action planner. The instruction parser receives a natural language
instruction from the user and converts it into a formal representation of
the goal. The action planner then uses the text generation capabilities of
the language model to turn that goal into a sequence of actions.
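To make the planner's two stages concrete, here is a minimal Python sketch. It is hypothetical and is not AssistGPT's actual code: the `Goal` and `Action` classes, the toy regex standing in for the instruction parser, the `tool|key=value` plan format, and the `fake_llm` stub standing in for the language model are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Goal:
    """Hypothetical formal representation of a user's goal."""
    intent: str
    arguments: Dict[str, str] = field(default_factory=dict)


@dataclass
class Action:
    """One step of the plan: a tool name plus its arguments."""
    tool: str
    arguments: Dict[str, str]


def parse_instruction(instruction: str) -> Goal:
    """Instruction parser stand-in: maps natural language to a Goal.

    A single regex rule replaces what would really be model-driven parsing.
    """
    match = re.match(r"book a flight to (.+)", instruction.strip(), re.IGNORECASE)
    if match:
        return Goal("book_flight", {"destination": match.group(1).strip()})
    return Goal("unknown", {"text": instruction})


def plan_actions(goal: Goal, generate: Callable[[str], str]) -> List[Action]:
    """Action planner stand-in: asks a text generator for a plan and parses it.

    The generator is expected to return one action per line in the
    (assumed) format `tool|key=value,key=value`.
    """
    prompt = (
        f"Goal: {goal.intent} {goal.arguments}\n"
        "List the actions, one per line, as tool|key=value,key=value"
    )
    actions = []
    for line in generate(prompt).splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the model's reply
        tool, _, arg_text = line.partition("|")
        arguments = {}
        for pair in arg_text.split(","):
            key, sep, value = pair.partition("=")
            if sep:  # ignore fragments without an "="
                arguments[key.strip()] = value.strip()
        actions.append(Action(tool.strip(), arguments))
    return actions


def fake_llm(prompt: str) -> str:
    """Canned reply standing in for a real language model call."""
    return "search_flights|destination=Singapore\nbook_flight|destination=Singapore"


if __name__ == "__main__":
    goal = parse_instruction("Book a flight to Singapore")
    for action in plan_actions(goal, fake_llm):
        print(action)
```

Passing the text generator in as a function keeps the parsing logic testable without a live model; in a real system `fake_llm` would be replaced by an actual language model call.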
The inspector's task is to examine the executor's output and verify
whether it matches the expected outcome. It consists of two sub-modules: the
action parser and the action verifier. The action parser converts an
action from the planner into a formal representation of the expected
outcome. The action verifier compares the executor's output with the
expected outcome using the text generation capabilities of the language
model. If it finds a mismatch or an error, the action verifier can flag it or
ask the user for clarification.
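The inspector's check can also be sketched in Python. This is a hypothetical illustration, not the authors' implementation: the `ExpectedOutcome` type, the rule that the expected outcome simply echoes the action's arguments, the three-way `Verdict`, and the `judge` callable standing in for the language model are all assumptions. An exact-match comparison runs first, so the (assumed) model is only consulted when the output differs literally from what was expected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


@dataclass
class Action:
    """An action issued by the planner (hypothetical shape)."""
    tool: str
    arguments: Dict[str, str]


@dataclass
class ExpectedOutcome:
    """Formal representation of what the executor should return."""
    tool: str
    required: Dict[str, str]


class Verdict(Enum):
    OK = "ok"
    MISMATCH = "mismatch"  # flag the step as failed
    NEEDS_CLARIFICATION = "needs_clarification"  # ask the user


def parse_action(action: Action) -> ExpectedOutcome:
    """Action parser stand-in: expect the output to echo the action's arguments."""
    return ExpectedOutcome(action.tool, dict(action.arguments))


def verify(
    output: Dict[str, str],
    expected: ExpectedOutcome,
    judge: Callable[[str], str],
) -> Verdict:
    """Action verifier stand-in: compare executor output with the expectation."""
    if output.get("status") == "error":
        return Verdict.MISMATCH
    missing = [key for key in expected.required if key not in output]
    if missing:
        # The output lacks information the check needs, so ask the user.
        return Verdict.NEEDS_CLARIFICATION
    if all(output[key] == value for key, value in expected.required.items()):
        return Verdict.OK
    # Values differ literally; let the language model decide if they are equivalent.
    prompt = (
        f"Expected {expected.required} from {expected.tool}, got {output}. "
        "Does the output satisfy the expectation? Answer yes or no."
    )
    return Verdict.OK if judge(prompt).strip().lower() == "yes" else Verdict.MISMATCH


def stub_judge(prompt: str) -> str:
    """Stand-in for the language model: accepts the airport code for Singapore."""
    return "yes" if "SIN" in prompt else "no"


if __name__ == "__main__":
    expected = parse_action(Action("book_flight", {"destination": "Singapore"}))
    print(verify({"destination": "SIN", "status": "booked"}, expected, stub_judge))
```

Returning a verdict rather than raising an exception leaves the caller free to decide whether a mismatch triggers a retry, a re-plan, or a question to the user.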
Limitations
AssistGPT is a novel and powerful assistant that can perform a wide range of
tasks using natural language and vision. However, it also has limitations
that future work will need to address.
Conclusion
source
project details - https://showlab.github.io/assistgpt/
research paper - https://arxiv.org/abs/2306.08640
project link - https://assistgpt-project.github.io/