Professional Documents
Culture Documents
The demand for the construction of complex visualizations is growing in many disciplines of science, as users
are faced with ever increasing volumes of data to analyze. The authors present VisTrails, an open source
provenance-management system that provides infrastructure for data exploration and visualization.
Finally, before constructing a vi- this detailed provenance of the pipe- extensible infrastructure lets users in-
sualization, users must often acquire, line evolution as a visualization trail, tegrate a wide range of libraries. This
generate, or transform a given data or vistrail. makes the system suitable for other
set—for example, to calibrate a simu- The stored provenance ensures exploratory tasks, including data min-
lation, they must obtain data from sen- that users will be able to reproduce ing and integration.
sors, generate data from a simulation, the visualizations and lets them easily
and finally construct and compare the navigate through the space of pipe- Creating an Interactive
visualizations for both data sets. Most lines created for a given exploration Visualization with VisTrails
visualization systems, however, don’t task. The VisTrails interface lets users To illustrate the issues involved in
give users adequate support for cre- query, interact with, and understand creating visualizations and how prov-
ating complex pipelines that support the visualization process’s history. In enance can aid in this process, we
multiple libraries and services. particular, they can return to previous present the following scenario, com-
versions of a pipeline and change the mon in medical data visualization.
VisTrails: Provenance specification or parameters to gener- Starting from a volumetric computed
for Visualization ate a new visualization without losing tomography (CT) data set, we generate
The VisTrails system (www.vistrails. previous changes. different visualizations by exploring
org) we developed at the Univer- Another important feature of the the data through volume rendering,
sity of Utah is a new visualization action-based provenance model is isosurfacing (extracting a contour), and
system that provides a comprehensive that it enables a series of operations slicing. Note that with proper modi-
provenance-management infrastruc- that greatly simplify the exploration fications, this example also works for
ture and can be easily combined with process and could reduce the time to visualizing other types of data (for ex-
existing visualization libraries. Unlike insight. In particular, the model al- ample, tetrahedral meshes).
previous systems, VisTrails uses an lows the flexible reuse of pipelines
action-based provenance model that and provides a scalable mechanism Dataflow-Processing Networks
uniformly captures changes to both for creating and comparing numer- and Visual Programming
parameter values and pipeline defini- ous visualizations as well as their cor- A useful paradigm for building visu-
tions by unobtrusively tracking all responding pipelines. Although we alization applications is the dataflow
changes that users make to pipelines originally built VisTrails to support model. A data flow is a directed graph in
in an exploration task. We refer to exploratory visualization tasks, its which nodes represent computations,
September/October 2007 83
V is u a l i z a t i o n C o r n e r
Figure 1. Dataflow programming for visualization. (a) We commonly use a script to describe a pipeline from existing
libraries such as the Visualization Toolkit (VTK). (b) Visual programming interfaces, such as the one VisTrails provides,
facilitate the creation and maintenance of these dataflow pipelines. The green rectangles represent modules, and
the black lines represent connections. (c) The end result of the script or VisTrails pipeline is a set of interactive
visualizations.
and edges represent data streams: for creating visualization pipelines— ing a high-level, structured workflow
each node or module corresponds to for example, scripting in a modern description is that we can use expres-
a procedure that’s applied on the in- dynamic language, such as Python. sive languages in querying and updat-
put data and generates some output Consider Figure 1a, which defines the ing workflows.
data as a result. The flow of data in workflow via a script written in Py-
the graph determines the order in thon that uses VTK to read a volume Comparing and Exploring
which a dataflow system executes the data set from a file, extract an isosur- Multiple Visualizations
processing nodes. In visualization, we face, map the isosurface to renderable Regardless of the specific mechanism
commonly refer to a dataflow network geometry, and then finally render it in we use to define a pipeline, the visu-
as a visualization pipeline. (For this an interactive window. alization process’s end goal is to gain
article, we use the terms workflow, Visual programming interfaces for insight from the data. To obtain such
data flow, and pipeline interchange- designing data flows have become insight, users must often generate
ably.) Figure 1b shows an example of popular and several systems, such as and compare multiple visualizations.
the data flow used to derive the im- SCIRun, have adopted them. These Going back to our scenario, several
ages shown in Figure 1c. The green interfaces give users a more intuitive alternatives exist for rendering our
rectangles represent modules, and view of the pipeline. They also dy- CT data. Isosurfacing is a commonly
the black lines represent connec- namically perform type checking and used technique. Given a function f:
tions. Most of the modules in Figure guide the connection between mod- R n → R and a value a, an isosurface
1 are from VTK, and labels on each ules’ input and output ports—once consists of the set of points in a do-
module indicate the corresponding the user selects a module’s output, main that map to a—that is, Sa = {x ∈
VTK class. In this figure, we natu- connections are allowed only to the R n: f(x) = a}.
rally think of data flowing from top target module’s appropriate input. The range of a values determines all
to bottom, eventually being rendered VisTrails automatically pulls edges possible isosurfaces that the user can
and presented for display. toward the correct input port. As we generate. To identify “good” a values
We can use different mechanisms discuss later, another benefit of hav- that represent a data set’s important
features, we can look at the range of isosurface value as four steps between dering might also show the bones and
values taken by a, and their frequen- 50 and 70, displayed horizontally in internal organs.
cy, in the form of a histogram. Using the VisTrails spreadsheet. Volume rendering and isosurfacing
VisTrails, we can straightforwardly This spreadsheet lets us compare are complementary techniques, and
extend the isosurface pipeline to visualizations in different dimensions they can generate very similar imag-
also display a histogram of the data. (row, column, sheet, and time); we ery depending on parameters. In fact,
VisTrails provides a very simple plug can also link the spreadsheet’s cells to distinguishing between them can be
in functionality that you can use to synchronize the interactions between difficult. The VisTrails system lets us
add packages and libraries, including visualizations. Note that VisTrails compare workflows using a visual dif-
your own. For our example, we used leverages the dataflow specifications ference interface. To demonstrate this
matplotlib’s 2D plotting functionality to identify and avoid redundant op- capability, we compute the difference
(http://matplotlib.sourceforge.net) to erations. By using the same cache between the original isosurface-gen-
generate the histogram at the top of for different cells in the spreadsheet, eration pipeline and the new volume-
Figure 1c. This histogram helps in VisTrails lets users efficiently explore rendering pipeline. Figure 3 shows
data exploration by suggesting re- numerous related visualizations. the visual difference of the workflows
gions of interest in the volume. The that we can inspect, along with their
plot shows that the highest frequency Comparing different visualization tech- resulting visualizations. In Figure 3a,
features lie between the ranges [0,25] niques. Volume rendering is a power- we use volume rendering to create the
and [58,68]. To identify the features ful computer graphics technique for image, in which we can see the skin
that correspond to these ranges, we visualizing 3D data. Whereas many on top of the bone structure; Figure
must explore these regions directly visualization algorithms focus on 3b shows only the bone structure ren-
through visualization. creating a rendering of surfaces— dered with our standard isosurface
although they might be surfaces of technique. This ability to (efficiently)
Scalable exploration of parameter spaces. 3D objects—volume rendering lets us compare workflows and visualizations
VisTrails provides an interface for see “inside” the volume. This tech- is one of the benefits of the VisTrails
parameter exploration that lets users nique models the volume as cloud- action-based provenance model and
specify a set of parameters to explore, like cells of semitransparent material. becomes increasingly important as a
as well as how to explore, group, and A surface rendering of the human workflow becomes more complex and
display them. As a simple 1D example, body might show only the skin, for is shared among collaborators.
Figure 2 shows an exploration of the example, but a complete volume ren- With other workflow systems, these
September/October 2007 85
V is u a l i z a t i o n C o r n e r
(a)
(b) (c)
Figure 3. A visual difference between different pipelines in VisTrails. We show the difference between pipelines that
generated (a) volume rendering and (b) isosurface visualizations. (c) The interface distinguishes shared modules in
dark gray, the modules unique to isosurfacing in blue, those unique to direct volume rendering in orange, and those
with parameter changes in light gray.
(c)
(a) (b)
Figure 4. Multiple rendering techniques. (a) VisTrails renders visualizations by combining volume rendering and
isosurfacing and updates them with user interactions. (b) The corresponding pipeline represents the data flow for
creating interactive visualizations. (c) VisTrails provides a fully browseable history of the exploration process that led
to this final set of visualizations.
comparisons are challenging because (scripting) comparisons. Although the subgraph isomorphism is NP-complete),
they require module-by-module (vi- former can be computationally intrac- the latter could lead to results that are
sual programming) or line-by-line table (the related decision problem of hard to interpret.
Interacting with visualizations. The teractively updates these slices along hypotheses. Whereas for simple
images we generated so far corre- with the plane during user interac- experiments, manual approaches
spond to simple, static workflows. To tions. Figure 4 shows the resulting to provenance management might
perform a more dynamic compari- visualizations along with the com- be feasible, complex computational
son between volume rendering and plex dynamic workflow required to tasks involving large volumes of data
isosurfacing, we add a feedback loop produce them. or multiple researchers require au-
into the workflow to let users adjust tomated approaches. As these tasks’
the visualization interactively. We Provenance and complexity and scale increases, data
thus build a new workflow that uses Exploratory Visualization organization becomes a major com-
the isosurfacing and volume render- The combination of multiple visual- ponent of the process.
ing algorithms simultaneously. We ization algorithms, different librar- VisTrails manages the data ma-
add a clipping plane into the volume ies, and the interactions between nipulated and metadata created in
visualization to assign the volume them considerably complicates the the course of an exploratory task.3 As
regions used for each algorithm. In workflow specification. In addition, a user (or group of users) generates
addition, we use a point on the plane creating a set of visualizations from a series of visualizations, VisTrails
to define axis-aligned slices of the data is not always a linear process transparently tracks all the steps this
volume that we display in distinct and often involves several itera- exploration followed—that is, the
spreadsheet cells. The pipeline in- tions as a user formulates and tests modules and connections added and
September/October 2007 87
V is u a l i z a t i o n C o r n e r
Figure 5. The VisTrails query-by-example interface. (a) Users can define a set of modules and parameters in the visual
programming interface to create a query template. (b) The query results are shown in the history tree, which users
can browse for specific instances of the match (inset).
lets us further explore the visualiza- Users can then click on the individual 4. D.A. Norman, Things That Make Us Smart:
Defending Human Attributes in the Age of the
tion—by trying different isosurface modules to view the execution log re- Machine, Addison-Wesley, 1994.
values, for example [see Figure 2]). In cords associated with them. 5. G. Kindlmann, “Lack of Reproducibility Hin-
addition, we can compare the pipe- ders Visualization Science,” IEEE Visualization
lines by dragging one node on top Compendium, IEEE CS Press, 2006, p. 69.
T
of the other (see Figure 3). These he VisTrails project has focused 6. T. Jankun-Kelly, K.-L. Ma, and M. Gertz,
“A Model and Framework for Visualization
computed differences are useful for on creating an infrastructure to Exploration,” IEEE Trans. Visualization and
understanding the visualization pro- manage the provenance data of ex- Computer Graphics, vol. 13, no. 2, 2007, pp.
cess, and the user can also reuse them. ploratory tasks. With this infrastruc- 357–369.
In this case, we applied the modules ture in place, our research focus is 7. D.P. Groth and K. Streefkerk, “Provenance
and Annotation for Visual Exploration
unique to “Isosurfacing” to “Volume now on what we can do with all the Systems,” IEEE Trans. Visualization and
Rendering” to create a new pipeline provenance accumulatation. By min- Computer Graphics, vol. 12, no. 6, 2006, pp.
called “Combined Rendering,” that ing this information, we hope to learn 1500–1510.
uses a cutting plane to define regions useful patterns that can guide users 8. C. Scheidegger et al., Querying and
Creating Visualizations by Analogy,” to be
for the rendering methods. VisTrails in assembling and refining complex published in IEEE Trans. Visualization and
can automatically apply pipeline dif- computational tasks. Computer Graphics, 2007.
ferences (like a patch) to derive new
pipelines in a process we call visual- Acknowledgments
ization creation by analogy.8 This article summarizes work being Claudio T. Silva is an associate professor at
Another benefit to having a high- done in the VisTrails project. It’s only the University of Utah. His research inter-
level specification of the visualiza- possible through the work of all our ests include visualization, geometry pro-
tion process is that users can query team members: Erik Anderson, Jason cessing, graphics, and high-performance
the pipelines and their execution Callahan, David Koop, Emanuele computing. Silva has a PhD in computer
instances. Scientists can query a vis- Santos, Carlos E. Scheidegger, and science from SUNY at Stony Brook. He
trail to find anomalies in previously Huy T. Vo. The data used in this ar- is a member of the IEEE, the ACM, Eu-
generated visualizations and locate ticle is available courtesy of the Na- rographics, and Sociedade Brasileira
data products and visualizations tional Library of Medicine’s Visible de Matematica. Contact him at csilva@
based on operations applied in the Human Project. The US National Sci- cs.utah.edu.
visualization process. VisTrails sup- ence Foundation partially supported
ports simple, keyword-based queries this work under grants IIS-0513692, Juliana Freire is an assistant professor at
as well as structured queries. In addi- CCF-0401498, EIA-0323604, CNS- the University of Utah. Her research inter-
tion to providing information about 0541560, OCE-0424602, and OISE- ests include scientific data management,
the results (for example, workflow 0405402. The US Department of Web information systems, and information
identifiers and attributes), VisTrails Energy, an IBM Faculty Award, and integration. Freire has a PhD in computer
can visually display query results by a University of Utah Seed Grant also science from SUNY at Stony Brook. She is a
highlighting the workflows and mod- partially supported this work. member of the ACM and the IEEE. Contact
ules that satisfy the query. Table 1 her at juliana@cs.utah.edu.
shows an example.
References
Users might also define queries by Steven P. Callahan is a research assistant and
1. W. Schroeder, K. Martin, and B. Lorensen,
example.8 As Figure 5 illustrates, us- The Visualization Toolkit: An Object-Oriented PhD candidate at the University of Utah. His
ers can construct (or copy and paste) Approach To 3D Graphics, Kitware, 2003. research interests include scientific visual-
a pipeline fragment into the VisTrails 2. E.A. Lee and T.M. Parks, “Dataflow Process ization, visualization systems, and computer
query tab to identify in the history Networks,” Proc. IEEE, vol. 83, no. 5, 1995, graphics. Callahan has an MS in computa-
pp. 773–801.
tree all nodes that contain that frag- tional engineering and science from the
3. S. Callahan et al., “Managing the Evolution
ment. They can then browse through of Dataflows with VisTrails (extended ab- University of Utah. Contact him at stevec@
the highlighted nodes and click on one stract),” Proc. IEEE Workshop on Workflow and sci.utah.edu.
September/October 2007 89