Seminar report on
“Transformative Insights: LIDA Automating Visualizations
with LLMs”
submitted in partial fulfilment of the requirements of
the award of the degree of
Bachelor of Technology
In
Computer Engineering
By
Aryan Rana, 20EPCCS032
under the guidance of
Dr Keshav Dev Gupta
Associate Professor
Department of Computer Engineering
(Session 2023-24)
Class: 7CS-A
CANDIDATE’S DECLARATION
I hereby declare that the work which is being presented in this seminar report entitled
“Transformative Insights: LIDA Automating Visualizations with LLMs” in the
partial fulfilment for the award of the Degree of Bachelor of Technology in Computer
Engineering, submitted in the Department of Computer Engineering, Poornima College
of Engineering, Jaipur, is an authentic record of my own work done during session
2023-24 under the supervision of Dr Keshav Dev Gupta, Associate Professor,
Department of Computer Engineering.
I have not submitted the matter embodied in this seminar report for the award of any
other degree.
Signature
Name of Candidate: Aryan Rana
Registration no.: PCE20CS032
RTU Roll No: 20EPCCS032
Class: 7CS-A
Dated: 4th December 2023
Place: Jaipur
SUPERVISOR’S CERTIFICATE
This is to certify that the above statement made by the candidate is correct to the best of
my knowledge.
This is to certify that Aryan Rana, 20EPCCS032, of IV year (VII Sem),
Department of Computer Engineering, has submitted this seminar report
entitled “Transformative Insights: LIDA Automating Visualizations with LLMs”
under the supervision of Dr Keshav Dev Gupta, Associate Professor in the
Department of Computer Engineering, as per the requirements of the
Bachelor of Technology program of Poornima College of Engineering,
Jaipur.
Aryan Rana
20EPCCS032
Candidate’s Declaration
Department Certificate
Acknowledgement
List of Figures
List of Acronyms
Abstract
1.1 General
1.2 Background
3.1.1 Global Workspace Theory and Consciousness
3.1.2 Large Language Models and Image Generation Models
3.1.3 Summarizer
3.1.4 Goal Explorer
5.1 Conclusion
References
LIST OF FIGURES
Fig 2: Calling API
Fig 3: Process of LIDA
Fig 6: Visualization
Fig 7: Prompts
ABSTRACT
This presentation delves into the intricacies of LIDA, a groundbreaking tool designed to
empower users in the seamless creation of visualizations. LIDA addresses critical
subtasks in the visualization process, including the interpretation of data semantics,
enumeration of relevant visualization goals, and the generation of precise visualization
specifications. The methodology adopted by LIDA involves a multi-stage generation
approach, skilfully integrating Large Language Models (LLMs) and Image Generation
Models (IGMs) to orchestrate well-defined pipelines. Comprising four distinct modules,
namely the Summarizer, Goal Explorer, VisGenerator, and Infographer, LIDA provides
a holistic solution for grammar-agnostic visualization and infographic generation. Its
hybrid user interface, combining direct manipulation and multilingual natural language,
facilitates interactive chart, infographic, and data story creation. The tool's flexibility,
educational utility, and seamless integration through a Python API make it a versatile
asset across various domains. This presentation aims to unravel the capabilities of
LIDA, positioning it as a key player in advancing the field of data visualization.
INTRODUCTION
1.1 General
Data visualization is the process of transforming data into graphical representations that can
communicate information effectively and efficiently. Data visualization can help users to explore,
analyse, and understand data, as well as to communicate insights and findings to others. However,
creating data visualizations is not a trivial task. It requires domain knowledge, programming skills,
familiarity with design principles, and clear visualization goals. Moreover, different data sets may require diverse types of
visualizations, such as charts, graphs, maps, diagrams, or infographics. Therefore, there is a need for
tools that can automate the process of data visualization and generate visualizations that are suitable
for the data and the user’s needs.
1.2 Background
Data visualization is a multidisciplinary field that involves computer science, statistics, design, and
cognition. There are many methods and tools for creating data visualizations, ranging from low-level
programming libraries to high-level graphical user interfaces. However, most of these methods and
tools require users to have some prior knowledge and skills in data analysis, programming, and
visualization design. Moreover, users need to specify the type and parameters of the visualization
they want to create, which may not be easy or intuitive for some users. Therefore, there is a gap
between the user’s needs and the available methods and tools for data visualization.
To bridge this gap, researchers have proposed using natural language processing (NLP) and
artificial intelligence (AI) techniques to automate the process of data visualization. For example,
some systems, such as DataTone and NL4DV, allow users to query data and generate visualizations
using natural language. Others, such as VizML, Data2Vis, and ChartSeer, use machine learning models
to learn the mapping between data and visualizations. General-purpose generative models, such as
DALL-E and GPT-3, have also been used to synthesize images or text from prompts.
However, these approaches have limitations:
They are restricted to a specific grammar or syntax for natural language queries or
commands, which may not be natural or flexible for some users.
They are limited to a predefined set of visualization types or templates, which may not
cover all the possible or desirable visualizations for different data sets or scenarios.
This report examines LIDA, a tool that aims to overcome these limitations and provide a more
general and flexible solution for data visualization. LIDA uses large language models (LLMs) and
image generation models (IGMs) to generate grammar-agnostic visualizations and infographics from
a data set. LLMs are neural network models that are trained on large corpora of text and can
generate natural language text for tasks such as summarization, translation, question
answering, and code generation. IGMs are neural network models that are trained on large collections
of images and can generate realistic images for tasks such as text-to-image synthesis and image
manipulation. LIDA combines LLMs and IGMs to create data visualizations that are not tied to a
single visualization grammar, template, or style. It can generate visualizations that are suitable
for the data and the user’s goals, as well as infographics that are data-faithful and stylized.
Because the LLM writes the visualization code directly, LIDA can target different visualization
libraries, such as Matplotlib, Seaborn, Altair, and D3.
LIDA is a tool that can automatically generate grammar-agnostic visualizations and infographics
from any data set using large language models (LLMs) and image generation models (IGMs). LIDA
consists of four modules: a SUMMARIZER that converts data into a natural language summary, a
GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that
generates, refines, executes, and filters visualization code, and an INFOGRAPHER module that
yields data-faithful stylized graphics using IGMs.
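The way the first three modules feed into one another can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LIDA's actual API: the function names (`summarize`, `explore_goals`, `generate_vis_code`, `run_pipeline`) and the `fake_llm` stand-in are hypothetical, and the INFOGRAPHER stage is left out because it needs an image model.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in type for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Placeholder model so the sketch runs offline; it only echoes the stage name."""
    stage = prompt.split(":", 1)[0]
    return f"<{stage} output>"


def summarize(rows: List[Dict[str, object]], llm: LLM) -> str:
    """Stage 1: turn the raw data into a compact natural language summary."""
    columns = sorted(rows[0].keys()) if rows else []
    return llm(f"SUMMARIZE: {len(rows)} rows, columns={columns}")


def explore_goals(summary: str, llm: LLM, n: int = 2) -> List[str]:
    """Stage 2: ask for n visualization goals, grounded in the summary."""
    return [llm(f"GOAL {i + 1}: based on {summary}") for i in range(n)]


def generate_vis_code(summary: str, goal: str, llm: LLM) -> str:
    """Stage 3: ask for visualization code that addresses one goal."""
    return llm(f"VISCODE: goal={goal}; summary={summary}")


def run_pipeline(rows: List[Dict[str, object]], llm: LLM = fake_llm) -> Dict[str, object]:
    """Chain the stages: each one consumes the output of the previous one."""
    summary = summarize(rows, llm)
    goals = explore_goals(summary, llm)
    code = [generate_vis_code(summary, goal, llm) for goal in goals]
    return {"summary": summary, "goals": goals, "code": code}


rows = [{"country": "A", "gdp": 1.0}, {"country": "B", "gdp": 2.0}]
result = run_pipeline(rows)
```

The point of the sketch is the data flow: the summary is computed once and reused as grounding context for every later stage, which matches the role the report gives to the SUMMARIZER.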
[Victor Dibia 12 July 2023] presented a paper on ‘LIDA: A Tool for Automatic
Generation of Grammar-Agnostic Visualizations and Infographics using Large
Language Models’.
LIDA is available through a user interface and a Python API, and its system architecture and
features are described in the accompanying paper. LIDA leverages the language modelling and code
writing capabilities of state-of-the-art LLMs like ChatGPT and GPT-4. LIDA also provides several
operations on generated visualizations, such as visualization explanation, self-evaluation,
automatic repair, and recommendation. LIDA is open source on GitHub and can be installed via pip.
It also has a demo website where users can try it out on their own data.
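3.1.1 Global Workspace Theory and Consciousness
The name LIDA is shared with an older, unrelated system: the LIDA (Learning Intelligent
Distribution Agent) cognitive architecture, which is built on the Global Workspace Theory
(GWT) of consciousness proposed by Bernard Baars (1988; 1997). The visualization tool
discussed in this report does not depend on GWT; the theory is summarized here as
background on the name. GWT is a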
psychological and neurobiological theory that explains how consciousness arises and
functions in the brain. According to GWT, consciousness is a global phenomenon that
emerges from the interaction of many specialized and distributed brain processes. GWT
proposes that the brain consists of a large number of unconscious processors that
operate in parallel and compete for access to a limited capacity global workspace. The
global workspace is a neural network that integrates and broadcasts information to the
rest of the brain. The information that reaches the global workspace becomes conscious
and available for further processing, such as memory, attention, action selection, and
learning. GWT also suggests that consciousness is a dynamic and adaptive process that
responds to changing environmental and internal demands.
The LIDA cognitive architecture implements GWT by modelling the global workspace as a
module that selects the most salient and relevant information from the sensory input, the
episodic memory, and the declarative memory, and broadcasts it to the rest of the system.
The information that enters the global workspace is called the conscious content, and it
triggers various cognitive processes, such as goal generation, action selection, and
learning. The architecture also models the unconscious processors as modules that perform
distinct functions, such as perception, memory, attention, and action. It simulates the
competition and cooperation among these modules by using activation and inhibition
mechanisms, and it simulates the dynamic and adaptive nature of consciousness by using
feedback loops and learning mechanisms.
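The competition-and-broadcast cycle described above can be sketched as a toy model. It is only an illustration of the idea, under simplifying assumptions: the `Processor` class, the `workspace_cycle` function, the activation values, and the multiplicative inhibition factor are all hypothetical choices, not taken from any published implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Processor:
    """A specialized module that competes for the workspace (hypothetical model)."""
    name: str
    activation: float
    received: List[str] = field(default_factory=list)

    def propose(self) -> str:
        return f"content from {self.name}"


def workspace_cycle(processors: List[Processor], inhibition: float = 0.5) -> str:
    """Run one cycle: the most active processor wins and its content is broadcast."""
    winner = max(processors, key=lambda p: p.activation)
    content = winner.propose()
    # Broadcast: every processor, including the winner, receives the content.
    for processor in processors:
        processor.received.append(content)
    # Inhibition: damp the winner so that other processors can win later cycles.
    winner.activation *= inhibition
    return content


processors = [
    Processor("perception", 0.9),
    Processor("episodic memory", 0.6),
    Processor("declarative memory", 0.3),
]
first = workspace_cycle(processors)
second = workspace_cycle(processors)
```

After the first cycle the activation of the perception processor drops from 0.9 to 0.45, so the episodic memory processor (0.6) wins the second cycle. This is the limited-capacity behaviour the theory describes: only one content occupies the workspace at a time.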
3.1.2 Large Language Models and Image Generation Models
The theoretical foundation of the LIDA visualization tool is the use of large language
models (LLMs) and image generation models (IGMs) for data visualization and infographic
generation. LLMs are neural network models that are trained on large corpora of text and
can generate natural language text for tasks such as summarization, translation,
question answering, and code generation. IGMs are neural network models that are
trained on large collections of images and can generate realistic images for tasks such
as text-to-image synthesis and image manipulation. LIDA uses LLMs and IGMs to create
grammar-agnostic visualizations and infographics from a data set.
LIDA uses these models in two ways. First, LIDA uses LLMs to generate natural
language summaries, visualization goals, and visualization code from the data, leveraging
the language modelling and code writing capabilities of state-of-the-art LLMs
like ChatGPT and GPT-4. LIDA does not rely on a predefined grammar, syntax,
template, or style for generating natural language or code. Instead, the data
and the user’s preferences form the input to the LLM, and the summary, goals, or code
form its output. LIDA also uses LLMs to provide explanations, evaluations, repairs, and
recommendations for the generated visualizations. Second, LIDA uses IGMs to generate
stylized and customized graphics from the data and the visualization code, leveraging the
image synthesis and manipulation capabilities of IGMs, a family that includes models such
as DALL-E and VQGAN. LIDA does not rely on predefined graphic elements, layouts, or themes
for generating graphics. Instead, the data, the visualization code, and the user’s
preferences form the input to the IGM, and the output is an infographic that aims to be
both data-faithful and aesthetic.
3.1.3 Summarizer
The summarizer module converts data into a rich but compact natural language
summary. The summarizer uses a large language model (LLM) to generate a text that
describes the main features, patterns, and insights of the data. The summary serves as a
grounding context for all subsequent operations and helps the user to understand the
data better. For example, given a data set of the population and GDP of different
countries, the summarizer might generate a summary like this:
The data set contains information about the population and GDP of 195 countries in the
world. The data shows that China has the largest population with 1.4 billion people,
followed by India with 1.3 billion and the United States with 328 million. The data also
shows that the United States has the highest GDP with 21.4 trillion dollars, followed by
China with 14.9 trillion and Japan with 5.1 trillion. The data reveals a positive
correlation between population and GDP, but also a large variation in the GDP per
capita among the countries.
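Before an LLM can write a summary like the one above, the data has to be condensed into something that fits in a prompt. The following is a minimal sketch of that step, assuming a rule-based pass that records each column's type and basic statistics; the helper names `column_summary` and `build_summary_prompt` and the prompt wording are hypothetical, and the three rows reuse the population figures from the example.

```python
import json
from statistics import mean
from typing import Dict, List, Union

Value = Union[str, int, float]


def column_summary(rows: List[Dict[str, Value]], column: str) -> Dict[str, object]:
    """Describe one column: numeric columns get statistics, others get sample values."""
    values = [row[column] for row in rows]
    if all(isinstance(v, (int, float)) for v in values):
        return {
            "column": column,
            "type": "number",
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    distinct = sorted(set(str(v) for v in values))
    return {
        "column": column,
        "type": "category",
        "unique": len(distinct),
        "samples": distinct[:3],
    }


def build_summary_prompt(rows: List[Dict[str, Value]]) -> str:
    """Pack the per-column descriptions into a prompt for a (not shown) LLM call."""
    fields = [column_summary(rows, column) for column in rows[0]]
    payload = {"rows": len(rows), "fields": fields}
    return "Describe this data set in a short paragraph.\n" + json.dumps(payload, indent=2)


rows = [
    {"country": "China", "population": 1_400_000_000},
    {"country": "India", "population": 1_300_000_000},
    {"country": "United States", "population": 328_000_000},
]
prompt = build_summary_prompt(rows)
```

Sending statistics instead of raw rows keeps the prompt small regardless of how many rows the data set has, which is one plausible reason for describing the summary as "rich but compact".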
3.1.4 Goal Explorer
The goal explorer module enumerates visualization goals given the data. The goal
explorer uses a large language model (LLM) to generate a list of possible questions or
objectives that the user might have for visualizing the data. The goal explorer also ranks
the goals according to their relevance and importance. The goal explorer helps the user
to explore the data from different perspectives and to discover new insights. For
example, given the same data set of the population and GDP of different countries, the
goal explorer might generate a list of goals like this:
How do the population and GDP of different countries compare? (High priority)
Which countries have the highest and lowest GDP per capita? (High priority)
How do the population and GDP of different regions or continents compare? (Medium priority)
What is the distribution of population and GDP across the world? (Medium priority)
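One way to obtain a ranked list like the one above is to ask the model for structured output and sort it afterwards. The sketch below assumes the model is asked to reply in JSON; the prompt wording, the `question` and `priority` keys, and the helper names are hypothetical, and the reply string is hand-written rather than real model output.

```python
import json
from typing import Dict, List

# Lower number means the goal is shown earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def build_goal_prompt(summary: str, n: int) -> str:
    """Build the request sent to the LLM (wording is an assumption)."""
    return (
        f"Given this data summary:\n{summary}\n"
        f"Propose {n} visualization goals as a JSON list of objects "
        'with keys "question" and "priority" (high, medium, or low).'
    )


def parse_goals(llm_reply: str) -> List[Dict[str, str]]:
    """Parse the reply, drop malformed goals, and rank the rest by priority."""
    goals = json.loads(llm_reply)
    valid = [
        goal
        for goal in goals
        if isinstance(goal, dict)
        and goal.get("question")
        and goal.get("priority") in PRIORITY_ORDER
    ]
    # sorted() is stable, so goals with equal priority keep the model's order.
    return sorted(valid, key=lambda goal: PRIORITY_ORDER[goal["priority"]])


# Hand-written example reply, including one malformed goal with an empty question.
reply = """[
  {"question": "What is the distribution of population and GDP across the world?", "priority": "medium"},
  {"question": "Which countries have the highest and lowest GDP per capita?", "priority": "high"},
  {"question": "", "priority": "high"}
]"""
goals = parse_goals(reply)
```

Validating the reply before ranking matters because model output is not guaranteed to follow the requested format; here the goal with the empty question is silently dropped.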
3.1.5 VizGenerator
The vizgenerator module generates, refines, executes, and filters visualization code. The
vizgenerator uses a large language model (LLM) to generate code that can create
visualizations for the data and the goals. Because the code is written by the LLM, it is
not tied to one visualization grammar and can target libraries such as Matplotlib,
Seaborn, Altair, or D3. The vizgenerator also refines the code by adding or modifying
parameters, such as labels, titles, colours, or scales. The vizgenerator then executes the
code and filters the output by checking the validity, quality, and data-faithfulness of the
visualizations. The vizgenerator helps the user to create visualizations that are suitable
for the data and the goals, as well as to customize the visualizations according to their
preferences. For example, given the same data set of the population and GDP of
different countries, and the goal of comparing the population and GDP of different
countries, the vizgenerator might generate Matplotlib code that draws a scatter plot of
GDP against population.
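The execute-and-filter step can be sketched without a plotting library. In this minimal illustration, which is not LIDA's implementation, each candidate is a code string that should define a `chart` dictionary; the candidate strings, the `execute` and `is_data_faithful` helpers, and the faithfulness rule (one plotted value per data row) are all hypothetical. The two data rows reuse the figures from the summary example.

```python
from typing import Dict, List, Optional

DATA = [
    {"country": "China", "population": 1.4e9, "gdp": 14.9e12},
    {"country": "United States", "population": 328e6, "gdp": 21.4e12},
]

# Hand-written stand-ins for code an LLM might return.
CANDIDATES = [
    # Valid: a scatter specification of GDP against population.
    "chart = {'mark': 'point', 'x': [r['population'] for r in data], 'y': [r['gdp'] for r in data]}",
    # Invalid: uses a column that does not exist, so execution raises KeyError.
    "chart = {'mark': 'bar', 'x': [r['region'] for r in data], 'y': [r['gdp'] for r in data]}",
    # Invalid: runs without error but never defines `chart`.
    "total = sum(r['gdp'] for r in data)",
]


def execute(code: str, data: List[Dict[str, object]]) -> Optional[dict]:
    """Run one candidate and return its chart, or None if it fails.

    exec() on untrusted code is unsafe; a real system would need sandboxing.
    """
    scope = {"data": data}
    try:
        exec(code, scope)
    except Exception:
        return None
    chart = scope.get("chart")
    return chart if isinstance(chart, dict) else None


def is_data_faithful(chart: dict, data: List[Dict[str, object]]) -> bool:
    """Toy check: the chart must plot exactly one x and one y value per data row."""
    return len(chart.get("x", [])) == len(data) and len(chart.get("y", [])) == len(data)


def generate_and_filter(candidates: List[str], data: List[Dict[str, object]]) -> List[dict]:
    """Keep only the candidates that execute and pass the faithfulness check."""
    charts = []
    for code in candidates:
        chart = execute(code, data)
        if chart is not None and is_data_faithful(chart, data):
            charts.append(chart)
    return charts


charts = generate_and_filter(CANDIDATES, DATA)
```

Only the first candidate survives. Generating several candidates and discarding the ones that fail is a simple way to tolerate the occasional invalid output from a model.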
3.1.6 Infographer
The infographer module yields data-faithful stylized graphics using image generation
models (IGMs). The infographer uses a neural network model that can generate realistic
images conditioned on text and on an existing chart. The infographer can generate stylized
or customized graphics, such as charts, maps, diagrams, or infographics. The infographer
also aims to keep the graphics data-faithful, meaning that they accurately represent the
data and do not introduce distortion or bias. The infographer helps the user to create
graphics that are more appealing, engaging, and informative. For example, given the same
data set of the population and GDP of different countries, and the same code generated by
the vizgenerator, the infographer might generate a stylized version of the resulting chart.
Fig 7: Prompts
5.1 Conclusion:
LIDA is a tool that uses large language models (LLMs) and image generation
models (IGMs) to generate grammar-agnostic visualizations and infographics from a
data set. LIDA consists of four modules: a SUMMARIZER that converts data into a
natural language summary, a GOAL EXPLORER that enumerates visualization goals
given the data, a VISGENERATOR that generates, refines, executes, and filters
visualization code, and an INFOGRAPHER module that yields data-faithful stylized
graphics using IGMs. LIDA is available through a user interface and a Python API, and
its system architecture and features are described in the accompanying paper. LIDA
leverages the language modelling and code writing capabilities of state-of-the-art LLMs
like ChatGPT and GPT-4. LIDA also provides several operations on generated
visualizations, such as visualization explanation, self-evaluation, automatic repair, and
recommendation. LIDA is open source on GitHub and can be installed via pip. It also has
a demo website where users can try it out on their own data.
LIDA’s visualizations are not constrained by a single grammar, syntax, template, or style.
It can generate visualizations that are suitable for the data and the user’s goals, as
well as infographics that are data-faithful and stylized, and it can target visualization
libraries such as Matplotlib, Seaborn, Altair, and D3.
There are several directions for future work and improvement of LIDA.
References
[2] https://github.com/microsoft/lida
[3] https://microsoft.github.io/lida/
[4] Medium: LIDA | Automatically Generate Visualization with LLMs | The Future of Data Visualization