approach
Meg Ermer
School of Computing
Southern Adventist University
Collegedale, Tennessee
mermer@southern.edu

Germán H. Alférez
School of Computing
Southern Adventist University
Collegedale, Tennessee
harveya@southern.edu
Abstract—The art of storytelling is multifaceted and nonlinear, involving multiple characters, themes, and symbols while often jumping between the present and the past. While media forms such as novels can encapsulate these complexities, it is often difficult to visualize a narrative in an easy-to-understand format. Our contribution is a graph-based system that lets users organize and visualize such narratives. Events and characters are represented as nodes, and their relationships are represented as edges. Neo4j is used as the database management system to store the graph and run queries on it, and Streamlit and Pyvis are used to present the database in the user interface.
Index Terms—data visualization, data science, graph, database, node, edge, literature, narrative
I. INTRODUCTION
Striking similarities exist between the fields of literature and data science. Both require strong competencies in reading comprehension and writing fluency, as well as the ability to summarize and draw conclusions from an abundance of information. Within data science, the discipline of data visualization enables data analysts to quickly see trends and summarize findings through bar graphs, scatter plots, histograms, and many other graphical representations. In literature, however, few tools exist to visually represent a plot for readers, leaving literary scholars to draw conclusions almost entirely from their own interpretation of the text. The tools that do help visualize certain aspects of narratives are either very advanced general-purpose tools that must be adapted to fit a narrative storyline [1], or they rely almost exclusively on a timeline approach with no way to represent relationships between different characters and events [2].
Thus, our contribution is a web application that allows the reader to add the characters, relationships, and events of a narrative to a graph database. The reader can then use the user interface to run queries on this data, which return the corresponding nodes and edges in an easy-to-read form. This enables the user to visualize the general trajectory of a narrative as a whole rather than having to draw conclusions directly from the raw text.
This paper is organized as follows: Section II presents the state of the art. Section III provides the theoretical framework for this tool, and Section IV describes the methodology used in its creation. Finally, Section V presents the resulting contribution, and Section VI discusses the implications of our contribution and explores opportunities for further research.
II. STATE OF THE ART
VOSviewer [1] is a software tool for building and visualizing bibliometric networks with information about researchers, journals, and individual publications. VOSviewer's text mining functionality also allows co-occurrence networks of key terms to be extracted and visualized from scientific literature. While this software can be adapted to visualize nearly any concept in terms of nodes and edges, it is not designed with narrative summaries in mind. As a result, there is no easy way to display different types of relationships, and the software is not oriented toward time-centered data.
Bandara et al. [2] designed a timeline-based approach to narrative storylines with Yarn visualization. This proposal allows the timelines of multiple characters to be visualized simultaneously without merging them into a single timeline. Yarn allows the user to efficiently generate all possible timelines within a narrative and to view storyline layouts that depict nonlinear point-of-view narratives. However, this visualization does not display elements that are not centered on characters, such as relationships with places or objects of importance. Moreover, each node on the timeline is action-oriented, so the relationships between characters cannot be seen directly; only the actions that result from those relationships are shown.
StoryPrint [3] is an interactive visualization for script-based storylines. StoryPrint uses a radial layout, with rings representing characters, scene delineation, and setting. This ring method allows the viewer to contextualize the frequency of a character's dialogue and the length of a scene in proportion to the total length of the story. It also facilitates comparative analysis by allowing the viewer to easily recognize patterns in characters' emotions, interactions, and frequency of dialogue. StoryPrint also displays character emotion through color overlays that represent a scale from negative to positive experience.
While each of these tools and proposals contributes valuable features to narrative visualization, they all lack the ability to represent different types of relationships between data. In literature, the relationships among people, places, and elements are as important as the people, places, and elements themselves.
III. THEORETICAL FRAMEWORK
There are several preexisting concepts and technologies that contributed to the creation of this web application, as visualized in Fig. 1.
A. Directed Graph
A directed graph G = (V, E) has a nonempty set of nodes V and a set of edges E that is a subset of the Cartesian square of the node set, E ⊆ V × V [4]. Each edge is therefore an ordered pair (u, v) running from node u to node v. Nodes can be thought of as circles that represent different entities, and edges can be thought of as arrows that connect one entity to another.
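To make the definition concrete, the following minimal Python sketch stores a directed graph as a node set V and a set of ordered pairs E, rejecting any edge whose endpoints are not already in V so that E ⊆ V × V always holds. The class name and the character and place names are illustrative only and do not come from the authors' application.

```python
from dataclasses import dataclass, field


@dataclass
class DirectedGraph:
    """A directed graph G = (V, E) where E is a subset of V x V (illustrative sketch)."""

    nodes: set = field(default_factory=set)   # V
    edges: set = field(default_factory=set)   # E, as ordered pairs (u, v)

    def add_node(self, node):
        self.nodes.add(node)

    def add_edge(self, source, target):
        # Both endpoints must already be nodes, which keeps E inside V x V.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError(f"unknown node in edge ({source!r}, {target!r})")
        self.edges.add((source, target))

    def successors(self, node):
        # Direction matters: (u, v) is an edge from u to v only.
        return {v for (u, v) in self.edges if u == node}

    def is_valid(self):
        # The definition requires a nonempty node set and E to be a subset of V x V.
        square = {(u, v) for u in self.nodes for v in self.nodes}
        return bool(self.nodes) and self.edges <= square
```

Because (u, v) and (v, u) are distinct pairs, an edge from Darcy to Pemberley does not imply one from Pemberley to Darcy; a self-loop (u, u) is also permitted by the definition, since it belongs to V × V.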
B. Database
A database management system (DBMS) consists of a collection of interrelated data, usually called a database, and a set of programs to access that data [5]. Databases are widely used by universities, enterprises, banks, airlines, and many other organizations to store information and records about finances, accounting, students, and transactions.
C. Graph Database
A graph database is a database that emphasizes the relations between data as much as the data itself [6]. Its storage can be visualized like a directed graph, with the data as nodes and the relationships between the data as edges. This allows the database to better describe the complexity of connections within a dataset. Neo4j [11] is the leading native graph database and uses Cypher as its query language. Cypher plays a role similar to that of SQL in relational databases, but, according to [7], it is considerably more efficient for connected data and has easier-to-understand syntax. Because relationships are stored explicitly, a Cypher query can traverse them with a single pattern, eliminating the multiple joins and view creations that many equivalent SQL queries require.
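The point about joins can be illustrated with a small, hypothetical schema. The sketch below uses Python's built-in sqlite3 module to answer "which places were visited by friends of a given character?", which takes four joins in SQL, and keeps the corresponding single Cypher pattern as a string for comparison. The table names, the labels (:Character, :Place), and the relationship types (FRIEND_OF, VISITED) are assumptions made for this example; the Cypher query is not executed, and the sketch says nothing about relative performance.

```python
import sqlite3

# Cypher equivalent, shown for comparison only (not executed here).
# Labels and relationship types are hypothetical.
CYPHER_FRIENDS_PLACES = """
MATCH (:Character {name: $name})-[:FRIEND_OF]->(:Character)-[:VISITED]->(p:Place)
RETURN DISTINCT p.name
"""

# Relational version: every hop through a relationship costs a join.
SQL_FRIENDS_PLACES = """
SELECT DISTINCT p.name
FROM characters AS c
JOIN friendships AS f ON f.a_id = c.id
JOIN characters AS friend ON friend.id = f.b_id
JOIN visits AS v ON v.character_id = friend.id
JOIN places AS p ON p.id = v.place_id
WHERE c.name = ?
ORDER BY p.name
"""


def build_demo_db():
    """Create an in-memory database with a tiny, made-up narrative."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE characters (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE friendships (a_id INTEGER, b_id INTEGER);
        CREATE TABLE visits (character_id INTEGER, place_id INTEGER);
        INSERT INTO characters VALUES (1, 'Elizabeth'), (2, 'Jane'), (3, 'Charlotte');
        INSERT INTO places VALUES (1, 'Netherfield'), (2, 'Hunsford'), (3, 'Pemberley');
        INSERT INTO friendships VALUES (1, 2), (1, 3);
        INSERT INTO visits VALUES (2, 1), (3, 2), (1, 3);
    """)
    return conn


def places_visited_by_friends(conn, name):
    """Return the sorted names of places visited by the character's friends."""
    rows = conn.execute(SQL_FRIENDS_PLACES, (name,)).fetchall()
    return [row[0] for row in rows]
```

For example, places_visited_by_friends(build_demo_db(), "Elizabeth") lists Hunsford and Netherfield but not Pemberley, which Elizabeth visited herself. Each additional hop in the question would add further joins to the SQL, whereas the Cypher pattern would only grow by one more relationship step.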
D. Streamlit
Streamlit [8] is an open-source Python library designed for building data science web apps. Its syntax and structure allow users to create clean applications with few lines of code. Streamlit is distinctive in that the front end is generated from the back-end Python code, and its data caching speeds up computation pipelines.
Fig. 1. The concept map of technologies and theories that compose the resulting web application.
E. Pyvis
Pyvis [9] is a Python library that allows users to build and visualize network graphs. Graphs built with Pyvis are interactive, allowing users to hover over and drag nodes and edges. Pyvis is built around the JavaScript library vis.js.
F. Web Application
A web application, or web app, is an application hosted on a web server and delivered over the internet through a browser [10]. Unlike native applications, web applications do not need to be downloaded because they are accessed over a network. A web application has three parts: a web server, an application server, and a database. The web server manages client requests, the application server carries out the requested task, and the database stores the information needed to complete the transaction.
IV. METHODOLOGY
To create a web application that visualizes literary narrative data, a suitable database management system (DBMS) must first be chosen, and then suitable libraries must be selected to represent the data in the front end. This section outlines the components involved in creating the application.
A. Choosing a Database Management System
A DBMS was needed to store the data that the reader inputs. Because the data was to be represented as a graph, a graph database was chosen. This avoided the long, join-heavy SQL queries that would otherwise be needed to string together relationships between data stored in a relational database. Neo4j [11], the most popular graph DBMS today, was selected because it provides Cypher, a reliable and easy-to-use query language. Neo4j's desktop application also provided a convenient way to check that the queries were being executed properly.
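As a sketch of how reader input might be stored in Neo4j, the helper below builds a parameterized Cypher MERGE statement for one node and runs it through any object with a neo4j-style run() method. The labels in ALLOWED_LABELS and the function names are hypothetical and are not taken from the authors' application. Cypher does not accept node labels as parameters, so the label is checked against a whitelist before it is placed in the query text, while the name and other properties are passed as parameters.

```python
# Hypothetical labels for narrative data; the authors' actual labels are not given.
ALLOWED_LABELS = {"Character", "Event", "Place", "Object"}


def node_merge_query(label, name, **properties):
    """Return (query, parameters) that create a node once per (label, name)."""
    # Labels cannot be $parameters in Cypher, so only whitelisted labels are interpolated.
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unsupported label: {label!r}")
    query = (
        f"MERGE (n:{label} {{name: $name}}) "
        "SET n += $props "
        "RETURN n.name AS name"
    )
    return query, {"name": name, "props": properties}


def save_node(session, label, name, **properties):
    """Run the MERGE through a neo4j Session (or any object with a compatible run())."""
    query, params = node_merge_query(label, name, **properties)
    return session.run(query, params).single()["name"]
```

With the official neo4j Python driver, a session obtained from GraphDatabase.driver(uri, auth=(user, password)).session() can be passed as session. Using MERGE rather than CREATE means that submitting the same character twice updates a single node instead of producing a duplicate.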
B. Choosing Front-end Libraries
The web application uses two Python libraries to display the user interface. Initially, a React front end with Material UI components was considered, but time and convenience were the key factors in settling on Python-based libraries instead of incorporating JavaScript. Because Streamlit [8] is designed specifically for data science, it is used to let the reader enter data from the narrative and submit it to the DBMS through the various input widgets built into the library. Rendering the graph database was the other task of the front end. At first, the NetworkX library was considered for this task, but ultimately Pyvis [9] was chosen because of its cleaner and more interactive visualizations.
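The following sketch shows one way query results could be turned into a Pyvis network. It assumes rows of the form (source name, relationship type, target name), such as those returned by a Cypher query like MATCH (a)-[r]->(b) RETURN a.name, type(r), b.name; this row format and the function names are assumptions, not the authors' code. The pure conversion step is kept separate from rendering so that it can be tested without Pyvis installed, and pyvis is imported only inside render_network.

```python
def to_vis_elements(rows):
    """Split (source, rel_type, target) rows into unique nodes and labeled edges."""
    nodes = []   # unique node names, in first-seen order
    seen = set()
    edges = []   # (source, target, rel_type) triples
    for source, rel_type, target in rows:
        for name in (source, target):
            if name not in seen:
                seen.add(name)
                nodes.append(name)
        edges.append((source, target, rel_type))
    return nodes, edges


def render_network(rows, path="narrative.html"):
    """Write an interactive, directed Pyvis graph to an HTML file and return its path."""
    from pyvis.network import Network  # optional dependency, imported only when rendering

    nodes, edges = to_vis_elements(rows)
    net = Network(height="600px", width="100%", directed=True)
    for name in nodes:
        net.add_node(name, label=name)
    for source, target, rel_type in edges:
        # "title" appears as a hover tooltip; "label" is drawn along the edge
        net.add_edge(source, target, title=rel_type, label=rel_type)
    net.write_html(path)
    return path
```

Inside a Streamlit app, one option is to read the generated HTML file and pass its contents to streamlit.components.v1.html(html_string, height=...) so that the interactive graph appears on the page.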
V. RESULTS
Fig. 3. The resulting graph displays a network of people, places, and things from the story.
Fig. 7. The helper method that creates and returns a relationship queries the database to create a connection between the two nodes.
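Fig. 7 itself is not reproduced in this text. As a hedged illustration only, the sketch below shows what a transaction function of this kind can look like with the official Neo4j Python driver. The function name mirrors the caption, but the relationship types in ALLOWED_REL_TYPES, the matching on a name property, and the returned tuple are assumptions. Because Cypher does not accept a relationship type as a parameter, the type is validated against a whitelist before it is formatted into the query, whereas the node names are passed as parameters.

```python
# Hypothetical relationship types; Cypher cannot take a relationship type
# as a $parameter, so it is validated before being formatted into the query.
ALLOWED_REL_TYPES = {"FRIEND_OF", "ENEMY_OF", "PARTICIPATES_IN", "LOCATED_IN"}

QUERY_TEMPLATE = (
    "MATCH (a {{name: $source}}), (b {{name: $target}}) "
    "MERGE (a)-[r:{rel_type}]->(b) "
    "RETURN a.name AS source, type(r) AS rel_type, b.name AS target"
)


def relationship_query(rel_type):
    """Return the Cypher text for one whitelisted relationship type."""
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"unsupported relationship type: {rel_type!r}")
    return QUERY_TEMPLATE.format(rel_type=rel_type)


def create_and_return_relationship(tx, source, target, rel_type):
    """Transaction function: connect two existing nodes and return the edge.

    Returns None when either node is missing, because MATCH then yields no rows.
    """
    result = tx.run(relationship_query(rel_type), source=source, target=target)
    record = result.single()
    if record is None:
        return None
    return record["source"], record["rel_type"], record["target"]
```

With version 5 of the driver, it could be called as session.execute_write(create_and_return_relationship, "Elizabeth", "Jane", "FRIEND_OF"); 4.x drivers use session.write_transaction instead. If several nodes share a name, the MATCH clause produces several rows and single() returns only the first (with a warning), so a uniqueness constraint on names or matching by node ID would be safer in practice.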