I recently started an AI-focused educational newsletter that already has over 160,000
subscribers. TheSequence is a no-BS (meaning no hype, no news, etc) ML-oriented newsletter
that takes 5 minutes to read. The goal is to keep you up to date with machine learning
projects, research papers, and concepts. Please give it a try by subscribing below:
Recent advancements in large language models (LLMs) have revolutionized the field,
equipping them with new capabilities like natural dialogue, mathematical reasoning,
and program synthesis. However, LLMs still face inherent limitations. Their stored
knowledge is frozen into fixed weights, and their computation is confined to a static
graph and a limited context window. Additionally, as the world evolves,
LLMs need retraining to update their knowledge and reasoning abilities. To overcome
these limitations, researchers have started empowering LLMs with tools. By granting
access to extensive and dynamic knowledge bases and enabling complex
computational tasks, LLMs can leverage search technologies, databases, and
computational tools. Leading LLM providers have begun integrating plugins that allow
LLMs to invoke external tools through APIs. This transition from a limited set of hand-
coded tools to accessing a vast array of cloud APIs has the potential to transform LLMs
into the primary interface for computing infrastructure and the web. Tasks such as
booking vacations or hosting conferences could be as simple as conversing with an
LLM that has access to flight, car rental, hotel, catering, and entertainment web APIs.
API calls often come with constraints, adding complexity to the LLM’s comprehension
and categorization of the calls. For example, a prompt may require invoking an image
classification model with specific parameter size and accuracy constraints. These
challenges highlight the need for LLMs to understand not only the functional
description of an API call but also reason about the embedded constraints.
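As a toy illustration of constraint-aware API selection, the sketch below filters candidate image classification models by parameter count and accuracy. The candidate list, field names, and thresholds are illustrative assumptions, not from the paper:

```python
# Hypothetical candidates; parameter counts and accuracies are approximate
# published figures used purely for illustration.
candidates = [
    {"api": "torchvision.models.resnet18", "params_millions": 11.7, "top1_acc": 69.8},
    {"api": "torchvision.models.resnet50", "params_millions": 25.6, "top1_acc": 76.1},
    {"api": "torchvision.models.mobilenet_v3_small", "params_millions": 2.5, "top1_acc": 67.7},
]

def pick_api(candidates, max_params, min_acc):
    """Return the first candidate satisfying both constraints, or None."""
    for c in candidates:
        if c["params_millions"] <= max_params and c["top1_acc"] >= min_acc:
            return c["api"]
    return None

# A prompt like "classify images with a model under 15M parameters and at
# least 69% top-1 accuracy" maps to these two constraints:
print(pick_api(candidates, max_params=15, min_acc=69.0))
# → torchvision.models.resnet18
```

An LLM answering such a prompt must perform this kind of reasoning implicitly: understanding both what each API does and whether its numeric properties satisfy the stated constraints.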
The Dataset
The dataset spans three domains: Torch Hub, TensorFlow Hub, and HuggingFace. Torch
Hub contributes 95 APIs, TensorFlow Hub a much larger collection of 696 APIs, and
HuggingFace the largest share, with 925 APIs.
To make the dataset more useful, each API is paired with a set of 10 uniquely tailored
instructions. These instructions serve as guides for both training and evaluation,
ensuring that every API is represented by concrete tasks rather than a bare signature.
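A single record in such a dataset might look like the sketch below. The field names and schema are assumptions for illustration, not the paper's exact format:

```python
import json

# Hypothetical APIBench-style record: one API, its documentation, and the
# 10 paired natural-language instructions.
record = {
    "domain": "HuggingFace",
    "api_call": "pipeline('sentiment-analysis')",
    "api_doc": "Constructs a pipeline for the given task ...",
    "instructions": [
        f"Instruction {i}: a natural-language task that this API solves."
        for i in range(1, 11)
    ],
}

assert len(record["instructions"]) == 10  # 10 instructions per API
print(json.dumps(record, indent=2)[:120])
```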
The Architecture
Gorilla introduces the notion of retriever-aware training, where the instruction-tuned
dataset includes an additional field with retrieved API documentation for reference.
This approach aims to teach the LLM to parse and answer questions based on the
provided documentation. The authors demonstrate that this technique allows the LLM
to adapt to changes in API documentation, improves performance, and reduces
hallucination errors.
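Retriever-aware training can be sketched as appending the retrieved documentation to the instruction when building each fine-tuning example. The helper and field names below are assumptions, not the paper's exact pipeline:

```python
def make_training_example(instruction, api_call, retrieved_doc=None):
    """Build one fine-tuning example; optionally append retrieved API docs,
    mirroring the retriever-aware training idea."""
    prompt = instruction
    if retrieved_doc is not None:
        prompt += f"\nUse this API documentation for reference: {retrieved_doc}"
    return {"prompt": prompt, "completion": api_call}

ex = make_training_example(
    "Classify an image of a cat.",
    "model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)",
    retrieved_doc="torch.hub.load(repo, model, ...) loads a model from a hub repo.",
)
print(ex["prompt"])
```

Training on examples of this shape is what teaches the model to ground its answer in the attached documentation instead of its frozen weights.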
During inference, users provide prompts in natural language. Gorilla can operate in
two modes: zero-shot and retrieval. In zero-shot mode, the prompt is directly fed to the
Gorilla LLM model, which returns the recommended API call to accomplish the task or
goal. In retrieval mode, the retriever (either BM25 or GPT-Index) retrieves the most up-
to-date API documentation from the API Database. This documentation is
concatenated with the user prompt, along with a message indicating the reference to
the API documentation. The concatenated input is then passed to Gorilla, which
outputs the API to be invoked. Prompt tuning is not performed beyond the
concatenation step in this system.
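The retrieval-mode pipeline can be sketched as follows, with a simple token-overlap scorer standing in for BM25 or GPT-Index, and the final concatenated string standing in for the input that would be passed to Gorilla:

```python
def retrieve(prompt, api_docs):
    """Return the doc sharing the most tokens with the prompt
    (a crude stand-in for BM25/GPT-Index)."""
    q = set(prompt.lower().split())
    return max(api_docs, key=lambda d: len(q & set(d.lower().split())))

def build_input(prompt, api_docs):
    doc = retrieve(prompt, api_docs)
    # Concatenate the retrieved doc, flagged as reference material; no
    # further prompt tuning happens beyond this step.
    return f"{prompt}\nUse this API documentation for reference: {doc}"

docs = [
    "torch.hub.load loads a pretrained model for image classification.",
    "transformers.pipeline builds a pipeline for text sentiment analysis.",
]
print(build_input("I need sentiment analysis for product reviews", docs))
```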
AST sub-tree matching plays a crucial role in identifying the specific API being called
within the dataset. Since API calls can have multiple arguments, each of these
arguments needs to be matched. Additionally, considering that Python allows for
default arguments, it is essential to define which arguments to match for each API in
the database.
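A simplified version of AST-based call matching can be written with Python's `ast` module. This is a sketch of the idea (match the function name plus a chosen subset of arguments, so defaults may be omitted), not the paper's evaluation code:

```python
import ast

def call_matches(code, func_name, required_kwargs):
    """Check whether `code` contains a call to `func_name` whose keyword
    arguments include all of `required_kwargs`."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)  # e.g. "torch.hub.load"
            kwargs = {kw.arg: ast.literal_eval(kw.value)
                      for kw in node.keywords if kw.arg is not None}
            if name == func_name and all(
                kwargs.get(k) == v for k, v in required_kwargs.items()
            ):
                return True
    return False

code = "model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)"
print(call_matches(code, "torch.hub.load", {"pretrained": True}))  # → True
```

Because only the arguments listed in `required_kwargs` are checked, a generated call that relies on Python's default arguments can still match the reference call in the database.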
Gorilla in Action
Together with the paper, the researchers open-sourced a version of Gorilla. The release
includes a notebook with many examples. Additionally, the following video clearly
shows some of the magic of Gorilla.
[Embedded video: gorilla_720p.mp4, hosted on drive.google.com]
Gorilla is one of the most interesting approaches in the tool-augmented LLM space.
Hopefully, we will see the model distributed through some of the main ML hubs.
CEO of IntoTheBlock, President of Faktory, I write The Sequence Newsletter, Guest lecturer at Columbia
University and Wharton, Angel Investor, Author, Speaker.