
A

Seminar report on
“Transformative Insights: LIDA Automating Visualizations
with LLMs”
submitted in partial fulfilment of the requirements of
the award of the degree of
Bachelor of Technology
In
Computer Engineering
By
Aryan Rana, 20EPCCS032
under the guidance of
Dr Keshav Dev Gupta
Associate Professor
Department of Computer Engineering

(Session 2023-24)
Class: 7CS-A

Department of Computer Engineering,
Poornima College of Engineering
ISI-6, RIICO Institutional Area, Sitapura, Jaipur – 302022
CANDIDATE’S DECLARATION

I hereby declare that the work which is being presented in this seminar report entitled
“Transformative Insights: LIDA Automating Visualizations with LLMs” in the
partial fulfilment for the award of the Degree of Bachelor of Technology in Computer
Engineering, submitted in the Department of Computer Engineering, Poornima College
of Engineering, Jaipur, is an authentic record of my own work done during session
2023-24 under the supervision of Dr Keshav Dev Gupta, Associate Professor,
Department of Computer Engineering.

I have not submitted the matter embodied in this seminar report for the award of any
other degree.

Signature
Name of Candidate: Aryan Rana
Registration no.: PCE20CS032
RTU Roll No: 20EPCCS032.
Class: 7CS-A
Dated: 4th December 2023
Place: Jaipur

SUPERVISOR’S CERTIFICATE

This is to certify that the above statement made by the candidate is correct to the best of
my knowledge.

Dated: 4th December 2023
Place: Jaipur

Dr Keshav Dev Gupta
Associate Prof, Dept. of Computer Engg.

Poornima College of Engineering (Seminar-7CS-A), Department of Computer Engineering 2


DEPARTMENT CERTIFICATE

This is to certify that Aryan Rana, 20EPCCS032, of IV year (VII Sem),
Department of Computer Engineering, has submitted this seminar report
entitled “Transformative Insights: LIDA Automating Visualizations with LLMs”
under the supervision of Dr Keshav Dev Gupta, Associate Professor in the
Department of Computer Engineering, as per the requirements of the
Bachelor of Technology program of Poornima College of Engineering,
Jaipur.

Dr Keshav Dev Gupta, Coordinator - Seminar, Department of Computer Engineering

Dr Saurabh Shandilya, Coordinator - Seminar, Department of Computer Engineering

Dr. Nikita Jain, HOD, Department of Computer Engineering



ACKNOWLEDGEMENT

I would like to convey my profound sense of reverence and admiration to my supervisor
Dr Keshav Dev Gupta, Associate Professor, Department of Computer Engineering,
Poornima College of Engineering, for his intense concern, attention, priceless direction,
guidance, and encouragement throughout this seminar research. I am grateful to Dr
Keshav Dev Gupta and Dr Saurabh Shandilya, Seminar Coordinators, for their helpful
attitude and keen interest in the completion of this seminar.

My special heartfelt gratitude goes to Dr. Nikita Jain, Head, Department of
Computer Engineering, Poornima College of Engineering, for unvarying support,
guidance, and motivation during the course of this seminar research.
I am grateful to Dr. Mahesh Bundele, Director & Principal of Poornima College of
Engineering, for his helpful attitude and keen interest in the timely completion of this
seminar report.
I extend my heartiest gratitude to all the teachers who extended their cooperation to
steer the topic towards its successful completion. I am also thankful to the non-teaching
staff of the department for their support in the preparation of this work.

Aryan Rana
20EPCCS032



Table of Contents

Candidate’s Declaration
Department Certificate
Acknowledgement
Table of Contents
List of Figures
List of Acronyms
Abstract
Chapter 1 Introduction
    1.1 General
    1.2 Background
Chapter 2 Literature Review
    2.1 Summary of Review Papers
Chapter 3 Theoretical Aspects
    3.1 Theoretical Aspects
        3.1.1 Global Workspace Theory and Consciousness
        3.1.2 Large Language Models and Image Generation Models
        3.1.3 Summarizer
        3.1.4 Goal Explorer
        3.1.5 VisGenerator
        3.1.6 Infographer
Chapter 4 Snapshots
Chapter 5 Conclusion and Future Scope
    5.1 Conclusion
    5.2 Future Scope
References



LIST OF FIGURES

Fig. 1  Summary Process of LIDA
Fig. 2  Calling API
Fig. 3  Process of LIDA
Fig. 4  Dataset and calling the API
Fig. 5  Docker Compose file
Fig. 6  Visualization
Fig. 7  Prompts



LIST OF ACRONYMS

S. No.  Acronym  Full Form
1       LLM      Large Language Models
2       API      Application Programming Interface
3       IGM      Image Generation Models
4       GWT      Global Workspace Theory
5       NLP      Natural Language Processing



ABSTRACT

This report examines LIDA, a tool designed to
empower users in the seamless creation of visualizations. LIDA addresses critical
subtasks in the visualization process, including the interpretation of data semantics,
enumeration of relevant visualization goals, and the generation of precise visualization
specifications. The methodology adopted by LIDA involves a multi-stage generation
approach, skilfully integrating Large Language Models (LLMs) and Image Generation
Models (IGMs) to orchestrate well-defined pipelines. Comprising four distinct modules,
namely the Summarizer, Goal Explorer, VisGenerator, and Infographer, LIDA provides
a holistic solution for grammar-agnostic visualization and infographic generation. Its
hybrid user interface, combining direct manipulation and multilingual natural language,
facilitates interactive chart, infographic, and data story creation. The tool's flexibility,
educational utility, and seamless integration through a Python API make it a versatile
asset across various domains. This report aims to present the capabilities of
LIDA, positioning it as a key player in advancing the field of data visualization.



CHAPTER - 1

INTRODUCTION
1.1 General
Data visualization is the process of transforming data into graphical representations that can
communicate information effectively and efficiently. Data visualization can help users to explore,
analyse, and understand data, as well as to communicate insights and findings to others. However,
creating data visualizations is not a trivial task. It requires domain knowledge, programming skills,
familiarity with design principles, and clearly defined visualization goals. Moreover, different data
sets may require diverse types of visualizations, such as charts, graphs, maps, diagrams, or
infographics. Therefore, there is a need for
tools that can automate the process of data visualization and generate visualizations that are suitable
for the data and the user’s needs.

1.2 Background

Data visualization is a multidisciplinary field that involves computer science, statistics, design, and
cognition. There are many methods and tools for creating data visualizations, ranging from low-level
programming libraries to high-level graphical user interfaces. However, most of these methods and
tools require users to have some prior knowledge and skills in data analysis, programming, and
visualization design. Moreover, users need to specify the type and parameters of the visualization
they want to create, which may not be easy or intuitive for some users. Therefore, there is a gap
between the user’s needs and the available methods and tools for data visualization.

To bridge this gap, some researchers have proposed to use natural language processing (NLP) and
artificial intelligence (AI) techniques to automate the process of data visualization. For example,
some systems allow users to query data and generate visualizations using natural language, such as
DataTone and NL4DV. Some systems use machine learning models to learn the mapping
between data and visualizations, such as Data2Vis, VizML, and ChartSeer. Generative models
such as DALL-E and GPT-3 can also synthesize images or text from a natural language prompt.

However, most of these systems have some limitations, such as:

• They are restricted to a specific grammar or syntax for natural language queries or
commands, which may not be natural or flexible for some users.
• They are limited to a predefined set of visualization types or templates, which may not
cover all the possible or desirable visualizations for different data sets or scenarios.
• They are dependent on the quality and availability of the training data, which may not be
representative or sufficient for some domains or tasks.
• They are not able to generate stylized or customized visualizations or infographics, which
may be more appealing or informative for some users or audiences.

This report examines LIDA, a tool proposed by Victor Dibia that aims to overcome these limitations
and provide a more general and flexible solution for data visualization. LIDA uses large language
models (LLMs) and image generation models (IGMs) to generate grammar-agnostic visualizations and
infographics from any data set. LLMs are neural network models that are trained on large corpora of
text and can generate natural language text for various tasks, such as summarization, translation,
question answering, and code generation. IGMs are neural network models that are trained on large
collections of images and can generate realistic images for tasks such as text-to-image synthesis
and image manipulation. LIDA combines LLMs and IGMs to create data visualizations that are not
constrained by any grammar, syntax, template, or style. It can generate visualizations that are
suitable for the data and the user’s goals, as well as infographics that are data-faithful and
stylized. Because it is grammar-agnostic, LIDA is compatible with any programming language and
visualization library, such as Matplotlib, Seaborn, Altair, and D3.



CHAPTER – 2
LITERATURE REVIEW

2.1 Summary of Review Papers

LIDA is a tool that can automatically generate grammar-agnostic visualizations and infographics
from any data set using large language models (LLMs) and image generation models (IGMs). LIDA
consists of four modules: a SUMMARIZER that converts data into a natural language summary, a
GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that
generates, refines, executes, and filters visualization code, and an INFOGRAPHER module that
yields data-faithful stylized graphics using IGMs.

Victor Dibia (12 July 2023) presented the paper ‘LIDA: A Tool for Automatic
Generation of Grammar-Agnostic Visualizations and Infographics using Large
Language Models’ [1].

The paper describes the system architecture and features of LIDA, which is available through a user
interface and a Python API. LIDA leverages the language modelling and code-writing capabilities of
state-of-the-art LLMs such as ChatGPT and GPT-4. It also provides several operations on generated
visualizations: visualization explanation, self-evaluation, automatic repair, and recommendation.
Because it is grammar-agnostic, LIDA is compatible with any programming language and visualization
library, such as Matplotlib, Seaborn, Altair, and D3. LIDA is open source on GitHub [2] and can be
installed via pip. It also has a demo website where users can try it out on their own data.
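
The flow through the first three modules can be sketched as a chain of calls in which each stage receives the output of the previous one. This is only an illustrative sketch and not the LIDA library's own API: the function names `summarize`, `explore_goals`, `generate_code` and the `stub_llm` stand-in are hypothetical, and the prompts are invented for the example. A real deployment would replace `stub_llm` with a call to an actual LLM.

```python
from typing import Callable, List

# Any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def summarize(data_description: str, llm: LLM) -> str:
    """Stage 1 (hypothetical): turn a description of the data into a summary."""
    return llm(f"Summarize this data set:\n{data_description}")


def explore_goals(summary: str, llm: LLM, n: int = 2) -> List[str]:
    """Stage 2 (hypothetical): ask for n visualization goals, one per line."""
    reply = llm(
        f"Given this summary, list {n} visualization goals, one per line:\n{summary}"
    )
    return [line.strip() for line in reply.splitlines() if line.strip()][:n]


def generate_code(summary: str, goal: str, llm: LLM) -> str:
    """Stage 3 (hypothetical): ask for plotting code, grounded in the summary."""
    return llm(f"Write plotting code for the goal '{goal}'.\nData summary:\n{summary}")


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model so that the sketch runs offline."""
    if prompt.startswith("Summarize"):
        return "Two numeric columns: population and GDP, one row per country."
    if prompt.startswith("Given this summary"):
        return "Compare GDP across countries\nShow GDP per capita"
    return "# plotting code would appear here"


summary = summarize("country,population,gdp", stub_llm)
goals = explore_goals(summary, stub_llm)
code = generate_code(summary, goals[0], stub_llm)
print(goals)
print(code)
```

The point of the design is that the summary is passed to every later stage, so that goals and code stay grounded in the same description of the data.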



CHAPTER – 3
THEORETICAL ASPECTS

3.1 Theoretical Aspects

3.1.1 Global Workspace Theory and Consciousness

The name LIDA is also used for a cognitive architecture, the Learning Intelligent
Distribution Agent, which is distinct from the visualization tool discussed in this report.
The main theoretical foundation of that architecture is the Global Workspace Theory
(GWT) of consciousness, proposed by Bernard Baars (1988; 1997). GWT is a
psychological and neurobiological theory that explains how consciousness arises and
functions in the brain. According to GWT, consciousness is a global phenomenon that
emerges from the interaction of many specialized and distributed brain processes. GWT
proposes that the brain consists of a large number of unconscious processors that
operate in parallel and compete for access to a limited-capacity global workspace. The
global workspace is a neural network that integrates and broadcasts information to the
rest of the brain. The information that reaches the global workspace becomes conscious
and available for further processing, such as memory, attention, action selection, and
learning. GWT also suggests that consciousness is a dynamic and adaptive process that
responds to changing environmental and internal demands.

The LIDA cognitive architecture implements GWT by modelling the global workspace as
a module that selects the most salient and relevant information from the sensory input,
the episodic memory, and the declarative memory, and broadcasts it to the rest of the
system. The information that enters the global workspace is called the conscious
content, and it triggers various cognitive processes, such as goal generation, action
selection, and learning. The architecture also models the unconscious processors as
modules that perform distinct functions, such as perception, memory, attention, and
action. It simulates the competition and cooperation among these modules by using
activation and inhibition mechanisms, and it simulates the dynamic and adaptive nature
of consciousness by using feedback loops and learning mechanisms.



3.1.2 Large Language Models and Image Generation Models

Another theoretical foundation of LIDA is the use of large language models (LLMs)
and image generation models (IGMs) for data visualization and infographic generation.
LLMs are neural network models that are trained on large corpora of text and can
generate natural language text for various tasks, such as summarization, translation,
question answering, and code generation. IGMs are neural network models that are
trained on large collections of images and can generate realistic images for tasks
such as text-to-image synthesis and image manipulation. LIDA uses LLMs and IGMs to
create grammar-agnostic visualizations and infographics from any data set.

LIDA uses LLMs and IGMs in two ways. First, LIDA uses LLMs to generate natural
language summaries, visualization goals, and visualization code from the data. It
leverages the language modelling and code-writing capabilities of state-of-the-art LLMs
such as ChatGPT and GPT-4. LIDA does not rely on any predefined grammar, syntax,
template, or style for generating natural language or code. Instead, the data and the
user’s preferences form the input to the LLMs, and the summaries, goals, and code are
their output. LIDA also uses LLMs to provide explanations, evaluations, repairs, and
recommendations for the generated visualizations. Second, LIDA uses IGMs to generate
stylized and customized graphics from the data and the visualization code. It leverages
the image synthesis and manipulation capabilities of state-of-the-art IGMs such as
DALL-E and VQGAN. LIDA does not rely on any predefined graphic elements, layouts, or
themes for generating graphics. Instead, the data, the visualization code, and the user’s
preferences form the input to the IGMs, and the output is a data-faithful and aesthetic
infographic.

Fig 1: Summary Process of LIDA



3.1.3 Summarizer

The summarizer module converts data into a rich but compact natural language
summary. The summarizer uses a large language model (LLM) to generate a text that
describes the main features, patterns, and insights of the data. The summary serves as a
grounding context for all subsequent operations and helps the user to understand the
data better. For example, given a data set of the population and GDP of different
countries, the summarizer might generate a summary like this:

The data set contains information about the population and GDP of 195 countries in the
world. The data shows that China has the largest population with 1.4 billion people,
followed by India with 1.3 billion and the United States with 328 million. The data also
shows that the United States has the highest GDP with 21.4 trillion dollars, followed by
China with 14.9 trillion and Japan with 5.1 trillion. The data reveals a positive
correlation between population and GDP, but also a large variation in the GDP per
capita among the countries.
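
One way to ground such a summary is to compute simple per-column facts before any LLM is involved, and then hand those facts to the model as context. The sketch below assumes that approach; the helper names `summarize_columns` and `to_prompt` are hypothetical, and the two-row data set reuses only figures quoted in the example summary above.

```python
import csv
import io

# Tiny illustrative data set (figures taken from the example summary above).
RAW_CSV = """country,population_millions,gdp_trillion_usd
China,1400,14.9
United States,328,21.4
"""


def _as_number(text):
    """Return float(text) if the cell parses as a number, else None."""
    try:
        return float(text)
    except ValueError:
        return None


def summarize_columns(csv_text, max_samples=2):
    """Build a compact per-column summary: type, range or cardinality, samples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {"n_rows": len(rows), "columns": {}}
    for name in reader.fieldnames or []:
        cells = [row[name] for row in rows]
        numbers = [_as_number(cell) for cell in cells]
        if cells and all(n is not None for n in numbers):
            info = {"type": "number", "min": min(numbers), "max": max(numbers)}
        else:
            info = {"type": "string", "n_unique": len(set(cells))}
        info["samples"] = cells[:max_samples]
        summary["columns"][name] = info
    return summary


def to_prompt(summary):
    """Render the summary as plain text that could be given to an LLM."""
    lines = [f"The data set has {summary['n_rows']} rows."]
    for name, info in summary["columns"].items():
        if info["type"] == "number":
            lines.append(f"- {name}: number, range {info['min']} to {info['max']}")
        else:
            lines.append(f"- {name}: string, {info['n_unique']} unique values")
    return "\n".join(lines)


summary = summarize_columns(RAW_CSV)
print(to_prompt(summary))
```

Keeping the summary this compact matters because it is reused as context in every later step, so it has to fit comfortably inside a prompt.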

3.1.4 Goal Explorer

The goal explorer module enumerates visualization goals given the data. The goal
explorer uses a large language model (LLM) to generate a list of possible questions or
objectives that the user might have for visualizing the data. The goal explorer also ranks
the goals according to their relevance and importance. The goal explorer helps the user
to explore the data from different perspectives and to discover new insights. For
example, given the same data set of the population and GDP of different countries, the
goal explorer might generate a list of goals like this:
• How do the population and GDP of different countries compare? (High priority)
• Which countries have the highest and lowest GDP per capita? (High priority)
• How do the population and GDP of different regions or continents compare?
(Medium priority)
• What is the distribution of population and GDP across the world? (Medium
priority)
• How have the population and GDP of different countries changed over time?
(Low priority)
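
The ranking step can be sketched as a sort over goal records. This is a minimal illustration that assumes each goal carries one of the three priority labels used in the list above; the `Goal` class and `rank_goals` function are hypothetical names, not part of LIDA.

```python
from dataclasses import dataclass

# Map the priority labels used in the list above to sortable ranks.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Goal:
    question: str
    priority: str  # "high", "medium", or "low"


def rank_goals(goals):
    """Return goals ordered from highest to lowest priority (stable sort)."""
    return sorted(goals, key=lambda goal: PRIORITY_RANK[goal.priority])


goals = [
    Goal("How have the population and GDP of different countries changed over time?", "low"),
    Goal("Which countries have the highest and lowest GDP per capita?", "high"),
    Goal("What is the distribution of population and GDP across the world?", "medium"),
    Goal("How do the population and GDP of different countries compare?", "high"),
]

for goal in rank_goals(goals):
    print(f"[{goal.priority}] {goal.question}")
```

Because Python's sort is stable, goals with the same priority keep the order in which they were generated.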

3.1.5 VisGenerator

The VisGenerator module generates, refines, executes, and filters visualization code. It
uses a large language model (LLM) to generate code that creates visualizations for the
data and the goals. It can generate code in any programming language, such as Python,
R, or C++, and for any visualization library or grammar, such as Matplotlib, Seaborn,
Altair, or D3. The VisGenerator also refines the code by adding or modifying
parameters, such as labels, titles, colours, or scales. It then executes the code and
filters the output by checking the validity, quality, and data-faithfulness of the
visualizations. In this way the VisGenerator helps the user to create visualizations that
are suitable for the data and the goals, and to customize them according to their
preferences. For example, given the same data set of the population and GDP of
different countries, and the goal of comparing the population and GDP of different
countries, the VisGenerator might generate code like this:

Fig 2: Calling API
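
The execute-and-filter step described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: candidate snippets are plain strings (standing in for LLM output), each snippet is expected to assign its result to a variable named `chart`, and a snippet is kept only if it runs without raising an exception. The function names `run_candidate` and `filter_candidates` are hypothetical, and the chart is represented as a plain dictionary rather than a real plot.

```python
def run_candidate(code, data):
    """Execute one generated snippet; return its `chart` value or None on failure."""
    namespace = {"data": data}
    try:
        # NOTE: exec is only acceptable for trusted or sandboxed code.
        exec(code, namespace)
    except Exception:
        return None
    return namespace.get("chart")


def filter_candidates(candidates, data):
    """Keep only the snippets that run without error and define `chart`."""
    results = []
    for code in candidates:
        chart = run_candidate(code, data)
        if chart is not None:
            results.append((code, chart))
    return results


# GDP in trillion US dollars, as quoted in the Summarizer example.
data = {"China": 14.9, "United States": 21.4, "Japan": 5.1}

candidates = [
    # Valid: builds a simple bar-chart specification sorted by value.
    "chart = {'type': 'bar', 'x': sorted(data, key=data.get, reverse=True)}",
    # Invalid: refers to a country that is not in the data (KeyError).
    "chart = {'type': 'bar', 'y': data['Germany']}",
    # Invalid: syntax error, as generated code may occasionally contain.
    "chart = {'type': 'bar'",
]

valid = filter_candidates(candidates, data)
print(len(valid), "of", len(candidates), "candidates survived")
```

Generating several candidates and discarding the ones that fail is a simple way to tolerate occasional errors in generated code without involving the user.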



3.1.6 Infographer

The infographer module yields data-faithful stylized graphics using image generation
models (IGMs). The infographer uses a neural network model that can generate realistic
images from text or code. The infographer can generate stylized or customized graphics,
such as charts, maps, diagrams, or infographics. The infographer also ensures that the
graphics are data-faithful, meaning that they accurately represent the data and do not
introduce any distortion or bias. The infographer helps the user to create graphics that
are more appealing, engaging, and informative. For example, given the same data set of
the population and GDP of different countries, and the same code generated by the
VisGenerator, the infographer might generate a stylized image of the resulting chart.

Fig 3: Process of LIDA



CHAPTER – 4
SNAPSHOTS

Fig 4: Dataset and calling the API.

Fig 5: Docker Compose file.



Fig 6: Visualization

Fig 7: Prompts



CHAPTER – 5
CONCLUSION AND FUTURE SCOPE

5.1 Conclusion

LIDA is a novel tool that uses large language models (LLMs) and image generation
models (IGMs) to generate grammar-agnostic visualizations and infographics from any
data set. LIDA consists of four modules: a SUMMARIZER that converts data into a
natural language summary, a GOAL EXPLORER that enumerates visualization goals
given the data, a VISGENERATOR that generates, refines, executes, and filters
visualization code, and an INFOGRAPHER module that yields data-faithful stylized
graphics using IGMs. LIDA is available through a user interface and a Python API, and
its system architecture and features are described in an accompanying paper. LIDA
leverages the language modelling and code-writing capabilities of state-of-the-art LLMs
such as ChatGPT and GPT-4. LIDA also provides several operations on generated
visualizations, such as visualization explanation, self-evaluation, automatic repair, and
recommendation. LIDA is open source on GitHub and can be installed via pip. It also
has a demo website where users can try it out on their own data.

LIDA creates data visualizations and infographics that accurately represent the data,
and it is compatible with any programming language and visualization library, such as
Matplotlib, Seaborn, Altair, and D3. Its visualizations are not constrained by any
grammar, syntax, template, or style; they are suited to the data and the user’s goals,
and its infographics are data-faithful and stylized.



5.2 Future Scope

There are several directions for future work and improvement of LIDA. Some of them
are:

• To extend LIDA to support more languages, data types, and visualization
libraries.
• To improve the quality and diversity of the generated visualizations and
infographics by using more advanced LLMs and IGMs, such as GPT-5 and
DALL-E 2.
• To enhance the user interface and the user experience of LIDA by adding more
features and functionalities, such as voice input, interactive feedback, and
personalization.
• To evaluate LIDA on more data sets and visualization tasks, such as time series
analysis, geospatial analysis, and network analysis.
• To explore the ethical and social implications of using LIDA for data
visualization and infographic generation, such as data privacy, data bias, and
data literacy.



REFERENCES

[1] Victor Dibia, ‘LIDA: A Tool for Automatic Generation of Grammar-Agnostic
Visualizations and Infographics using Large Language Models’

[2] https://github.com/microsoft/lida

[3] https://microsoft.github.io/lida/

[4] Medium: LIDA | Automatically Generate Visualization with LLMs | The Future of
Data Visualization
