
Visualizing Multi-Dimensional Data

Stephen G. Eick
Visual Insights, Inc.

Computer Graphics, February 2000

Introduction

With the decreasing cost of storage and the increased bandwidth of networks, it has become technically feasible and cost effective to store huge volumes of fine-grain data. This data typically consists of transactions, sales records and customer information, and is stored in warehouses or data marts. When properly analyzed, it provides a rich source for understanding business data. Transactions collected by operational systems are frequently stored in relational tables.

Figure 1: Microsoft Excel's PivotTable.

For a variety of reasons (data cleanliness, scalability, efficiency of the relational method, difficulty in building schemas and computational complexity), it is difficult to report, analyze, distribute and make business decisions using raw relational tables. The problem is that the relational model and SQL [3], the standard interface for manipulating relational tables, are not well suited for analysis tasks [2, 6]. Analysis queries submitted against warehouses engineered for fast transaction archiving frequently run extremely slowly. (Note: A multi-million dollar warehouse may only be able to support one or two power analysis users.) An approach promulgated by business intelligence software vendors for overcoming the analysis problems associated with relational databases involves aggregating transactions into multi-dimensional databases, or data cubes.

Data Cubes

A data cube, the raw data structure in a multi-dimensional database, organizes information along a sequence of categories. The categorizing variables are called dimensions, and the data, called measures, are stored in cells. The cube dimensions are frequently organized into hierarchies and usually include a dimension representing time. Multi-dimensional databases automatically aggregate measures across hierarchical dimensions, support hierarchical navigation, expand and collapse dimensions, enable drill-down, drill-up or drill-across, and facilitate comparisons through time.

Perhaps the most common use of data cubes is to store aggregated transaction information. In this case, for example, the cube dimensions might be product, store, department, customer number, region and month, and the measures might be COGS, sales and profit. The dimensions are predefined indices into a cube cell, and the measures in a cell are roll-ups over the transactions. The roll-ups or aggregations are usually sums but may include other functions such as averages, standard deviations, percentages, etc.

For example, the values for the dimensions may be:

region: north, south, east, west
product: shoes, shirts
month: Jan, Feb, ..., Dec

Then the cell corresponding to [north][shirts][Feb] is the total sales of shirts for the northern region for the month of February. The hierarchical levels for the dimension time may be year, quarter, month, day and hour. Thus

$$\mathrm{sales}[\mathrm{north}][\mathrm{shirts}][\mathrm{Q1}] = \mathrm{sales}[\mathrm{north}][\mathrm{shirts}][\mathrm{Jan}] + \mathrm{sales}[\mathrm{north}][\mathrm{shirts}][\mathrm{Feb}] + \mathrm{sales}[\mathrm{north}][\mathrm{shirts}][\mathrm{Mar}].$$

It is not necessary that all cells be populated, nor is it a requirement that the hierarchies be symmetric for all dimensions.

PivotTable Reports

The standard interface for understanding and manipulating data cubes is called a PivotTable or Cross Tab. Although implementations vary among vendors, Figure 1 shows a typical example, a Microsoft Excel PivotTable. Cells are arranged in a row by column by page grid, with one page showing at any time. The values of the row (Product), column (State) and page (QTR) dimensions index the table cells and adjust the visible page. Each cell contains five measures: Sales, Expenses, Profit, COGS and Marketing. Margins are totaled for each measure along the edges, with grand totals in the lower right-hand corner (not shown). The row dimension, Product, is organized into a two-level hierarchy, with Product nested within Product Type.

A PivotTable is an interactive textual report. Standard PivotTable manipulations include:

• Assigning dimensions to the rows, columns and pages using the menus, toolbars and wizards.
• Navigating hierarchical dimensions by collapsing or expanding the hierarchies.
• Aggregating results across different dimensions.

Multi-Dimensional OLAP Cube Visualization

A problem with PivotTable reports is that they are not an effective tool for understanding multi-dimensional databases. From Figure 1, for example, even if the font were readable, about the only thing it shows is that certain products are not sold in certain states. Seeing patterns, discovering trends, navigating hierarchies in the multi-dimensional data, showing what changed from quarter to quarter and finding relationships between sales and profits is impossible. Seemingly simple analysis tasks, such as identifying the three largest cells, locating the two rows with the smallest totals, drilling into a subset of the cells or finding the biggest growth trends, are time consuming and tedious at best and frequently impossible.

To overcome this problem, we have developed a series of techniques for visual discovery in multi-dimensional data cubes. This work is part of an ongoing effort to develop a complete family of tools, called ADVIZOR, targeting different aspects of visual query and analysis.

Our techniques are organized into three perspectives. A perspective is a set of linked visual components (or views) that are displayed together on the same screen. The views in a perspective work together to enable a particular type of visual analysis. One perspective, focused on visualizing a single measure, answers "what" questions. Another, focused on showing two or three measures simultaneously, answers "why" questions. The third explores ways to show three to 20 measures using a variant of parallel coordinates.

Single Measure

Figure 2: A landscape visualization of the multi-dimensional data from the PivotTable in Figure 1 for the profits measure.

Figure 2 shows a Single Measure perspective visualizing the PivotTable pictured in Figure 1. The Single Measure perspective presents the entire PivotTable for one measure, profits in this case.

The perspective in Figure 2 is organized into three parts. Controls across the top follow the standard Microsoft conventions. Along the left are three interactive Bar Charts, one for each dimension, showing profits totaled by state, by product type and by product within product type. In the center of the display, a three-dimensional landscape view, called a 3D Multiscape [4], shows profit by product-state combination. The height (or depth, for negative values) of each 3D Multiscape bar along the z-axis encodes profits or losses. The Bar Charts present the PivotTable marginal totals, the aggregation of profits by dimension value, whereas the 3D Multiscape shows cell details.

The colors in the perspective, set in the top control panel, are tied to the state dimension. The Bar Charts along the left perform two functions: first, to provide measure summary detail, with the color slices showing contributions by state, and second, to act as filters in the visual analysis process.

The 3D Multiscape provides a "big picture" overview of the PivotTable. It shows, for example, how profits vary by state, by product and by product type. The down-pointing 3D Multiscape bars represent negative profits (losses).

Figure 3: 3D Multiscape's navigation controls. Top: popup menu; bottom: 3D Multiscape toolbar.

3D Multiscape's navigation control and popup menu, shown in Figure 3, provide a rich interface for tuning the visualization. Classes of interactive operations include:

• Symbol shape controls
• Toggles to activate backwalls and wall grid lines
• Row and column ordering
• Select (see the section titled Selection), pan, rotation and zoom controls
• Rotation buttons that smoothly animate the 3D Multiscape to a side, top or angled (home) orientation. See Figure 4.

Multiple Measures

Figure 5 shows a Multiple Measures perspective visualizing both profit and sales. The biggest difference between a Single and a Multiple Measures perspective is that the 3D Multiscape is replaced with a Scatterplot showing two measures, one each on the x- and y-axes, and a third measure (or dimension) tied to color. This leads to visual insights involving relationships among three measures.

The Data Sheet [4] below the Scatterplot shows line-item details for the points in the Scatterplot. Mousing over any individual point both labels the point coordinates and shows its corresponding details for all of the measures in the Data Sheet.

The Multiple Measures perspective shows patterns between two (or three, if color is used) measures and is intended to answer "why" questions. Users discover what happened using the Single Measure perspective, e.g., profits are down for a particular state-product combination, and use the Multiple Measures perspective to discover why, e.g., marketing costs were unusually high for this combination.

Anchored Measures

The Anchored Measures perspective, shown in Figure 7, focuses on displaying an arbitrary number of measures, say between three and 20, using techniques from multidimensional visualization. Anchored Measures combines a weighted ParaBox [4], a combination of bubble plots, parallel coordinates [5] and Box plots. (Note: The outer box in the Box plot represents the range of values from the 5th to the 95th percentile. The inner, dark grey box represents the range into which the middle 50% of the values fall, i.e., the 25th to 75th percentiles. Outliers are values that are plotted outside (above or below) the outer box, i.e., below the 5th or above the 95th percentile.)

The dimensions and measures are organized along a series of parallel axes, as with a parallel coordinates plot. The first three axes, indicated by the bubble plots, correspond to the dimensions of the original PivotTable, and the remainder, indicated by the Box plots, correspond to the measures. The values of the measures from each row-by-column cell of the PivotTable are plotted as points on the axis of the corresponding Box plot, and lines are drawn between the bubble and Box plots connecting the dimension and measure values. Each line in Figure 7 shows the measure values for one cell. The sizes of the bubbles in the State, Product Type and Product columns show the number of cells that have each respective value.

From this perspective one can easily identify measure values that are "outliers" or extreme values, since they touch extreme regions in the Box plot. The lines in Figure 7 are colored by Product Type. The red lines corresponding to monthly green tea purchases in Nevada show losses in profit (negative profits), since they touch the bottom of the Profit axis. By noticing the values of the other measures for the red lines, we can easily tell that the reasons for the losses are low sales, high COGS and high marketing costs.

Visually Navigating PivotTables

The analysis power of our perspectives (and of virtually any other visualization tool) is increased by providing efficient techniques to navigate through the visualization. Figure 8 shows our toolbar, which serves as a command and control center for PivotTable navigation. There are eight classes of navigational controls:

• Perspective (labeled display) determines how the PivotTable is shown, either as a Single (Figure 2), Multiple (Figure 5) or Anchored Measure (Figure 7) perspective.
• Color By determines which measure is used to color all the views (Bar Charts, 3D Multiscape, Scatterplot and ParaBox). The Color Legend shows the current color mapping (not shown).
• Measure sets the measure that is displayed in the Single Measure and Multiple Measures perspectives.
• PivotTable Dimension Arrangement manipulates the rows and columns. (Note: This control is not actually on the toolbar but rather on the left-hand side of the display.)
• Selection and Visibility, described below, are key capabilities for focusing and drilling into interesting parts of the cube.
• Bookmarks to save and return to the current state.
• Reporting and Printing for sharing analysis results.
• Undo and Redo, which are becoming standard in Windows applications.

Selection

Using the mouse, users may sweep out regions on the views. The items in the swept-out region become the selection set and are drawn in color. The unselected set is drawn in gray. Following the standard Windows model, there are four selection modes:

• Replace, the default mode, causes the new selection set to replace the old.
• Intersect combines the previous and current selection sets to form a new set that is no larger than either.
• Add extends the previous selection by forming a union with it and the new selection.
• Subtract removes the swept-out region from the previous selection.

In addition, since selection is so important, we have extended these modes by adding buttons for:

• Select All causes all items to be selected.
• Unselect All causes all items to be unselected.
• Toggle inverts the selection set (items previously selected become unselected, and those previously unselected become selected).
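The four sweep modes and the three extra buttons map directly onto set operations. The sketch below is a minimal Python illustration under that assumption, not ADVIZOR's code: the class name `SelectionSet`, its methods and the mode strings are hypothetical, and items are assumed to be hashable identifiers such as PivotTable cell keys.

```python
from typing import Hashable, Iterable, Set


class SelectionSet:
    """Hypothetical model of the selection modes and buttons described above.

    `universe` holds every item shown in a view; `selected` is the current
    selection set (drawn in color). Everything else is unselected (gray).
    """

    def __init__(self, universe: Iterable[Hashable]) -> None:
        self.universe: Set[Hashable] = set(universe)
        self.selected: Set[Hashable] = set()

    def sweep(self, region: Iterable[Hashable], mode: str = "replace") -> None:
        """Apply one mouse sweep; only items present in the view count."""
        swept = set(region) & self.universe
        if mode == "replace":       # default: the new set replaces the old one
            self.selected = swept
        elif mode == "intersect":   # keep items that are in both sets
            self.selected &= swept
        elif mode == "add":         # union of the old and new selections
            self.selected |= swept
        elif mode == "subtract":    # remove the swept region from the old set
            self.selected -= swept
        else:
            raise ValueError(f"unknown selection mode: {mode!r}")

    def select_all(self) -> None:
        self.selected = set(self.universe)

    def unselect_all(self) -> None:
        self.selected = set()

    def toggle(self) -> None:
        # Selected items become unselected and unselected items become selected.
        self.selected = self.universe - self.selected


# Usage: ten items, then a sequence of sweeps in different modes.
s = SelectionSet(range(10))
s.sweep({1, 2, 3, 4})              # replace   -> {1, 2, 3, 4}
s.sweep({3, 4, 5}, "intersect")    # intersect -> {3, 4}
s.sweep({8}, "add")                # add       -> {3, 4, 8}
s.sweep({4}, "subtract")           # subtract  -> {3, 8}
s.toggle()                         # toggle    -> {0, 1, 2, 4, 5, 6, 7, 9}
```

Sweeping only intersects with the items actually in the view, so a region that extends past the plotted data cannot select anything that is not displayed.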

Visibility: Excluding and Restoring


A key capability in visual analysis is the ability to focus in on particular regions of interest. There are two aspects of the visibility capability: excluding and restoring data. The Exclude button eliminates the unselected data items from the display so that only the selected set is visible. The Restore button brings them back, making them visible again. Working together, selecting and excluding are extremely powerful. Starting with a large PivotTable, users can quickly and easily identify unusual patterns and interactively select them. Using the Exclude button, users can easily drill in and focus on the interesting regions.

Figure 5: Multiple Measures perspective showing sales and profits.
The Totals Table (lower left corner) provides immediate feedback on what is selected and excluded, which helps users navigate without getting lost. If the PivotTable data came from a SQL Server 7.0 OLAP cube, drill-down and drill-up operations cause ADVIZOR to attach directly to the cube to fetch data. This significantly increases scalability, since data is fetched only when needed.
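Exclude and Restore can be modeled as a second set, the excluded items, layered on top of the selection, with totals recomputed over whatever remains visible. The following minimal Python sketch assumes that model; `VisibilityModel` and its fields are hypothetical names, and the `totals()` dictionary only mimics the kind of feedback the Totals Table gives, not its actual layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set


@dataclass
class VisibilityModel:
    """Hypothetical Exclude/Restore model; `values` maps item -> measure value."""
    values: Dict[Hashable, float]
    selected: Set[Hashable] = field(default_factory=set)
    excluded: Set[Hashable] = field(default_factory=set)

    @property
    def visible(self) -> Set[Hashable]:
        # Everything that has not been excluded is still on the display.
        return set(self.values) - self.excluded

    def exclude(self) -> None:
        # Hide every visible item that is not selected, so only the
        # selected set remains on the display.
        self.excluded |= self.visible - self.selected

    def restore(self) -> None:
        # Bring every excluded item back.
        self.excluded.clear()

    def totals(self) -> Dict[str, float]:
        # Counts and sums over the visible and the visible-and-selected items.
        vis = self.visible
        sel = self.selected & vis
        return {
            "visible_items": len(vis),
            "selected_items": len(sel),
            "visible_total": sum(self.values[k] for k in vis),
            "selected_total": sum(self.values[k] for k in sel),
        }


# Usage: three items, two selected, then Exclude.
m = VisibilityModel({"a": 1.0, "b": 2.0, "c": 3.0}, selected={"a", "c"})
m.exclude()
print(m.totals())
# {'visible_items': 2, 'selected_items': 2, 'visible_total': 4.0, 'selected_total': 4.0}
```

Keeping exclusion separate from selection means Restore never disturbs the current selection, so a user can exclude, inspect, restore and exclude again without re-sweeping.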
To compare, for example, how profits vary between Coffee and Espresso, a convenient strategy is to select those product types and exclude the others. To accomplish this in the Single Measure perspective, the user would use the dimension Bar Charts as filtering tools, selecting just the bars corresponding to Coffee and Espresso in the Product Type Bar Chart. Pressing the Exclude button simplifies the perspective to these two Product Types, making it immediately apparent that Colombian Coffee is quite profitable in Massachusetts and Espresso is extremely profitable in New York (see Figure 9).

Figure 6: Scatterplot navigation controls. Top: popup menu; bottom: Scatterplot toolbar.
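The Coffee-versus-Espresso comparison combines two ideas: selecting bars in one dimension's Bar Chart selects every cell carrying those values, and the linked views then re-aggregate over the cells that remain visible. A minimal Python sketch, assuming cells are keyed by (product, state) and that a hypothetical `parent` mapping encodes the Product-within-Product Type hierarchy; the function name `exclude_by_type` is illustrative only.

```python
from collections import defaultdict
from typing import Dict, Mapping, Set, Tuple

CellKey = Tuple[str, str]  # (product, state)


def exclude_by_type(cells: Mapping[CellKey, float],
                    parent: Mapping[str, str],
                    keep_types: Set[str]
                    ) -> Tuple[Dict[CellKey, float], Dict[str, float]]:
    """Keep cells whose product rolls up to a chosen type; re-total by type."""
    # Selecting the chosen Product Type bars and pressing Exclude leaves
    # only the cells whose product belongs to one of those types.
    visible = {key: value for key, value in cells.items()
               if parent[key[0]] in keep_types}

    # The linked Product Type Bar Chart rolls the visible cells up one
    # level of the Product -> Product Type hierarchy.
    by_type: Dict[str, float] = defaultdict(float)
    for (product, _state), value in visible.items():
        by_type[parent[product]] += value
    return visible, dict(by_type)
```

The same roll-up applies to any hierarchy level (for example months to quarters), which is why the dimension Bar Charts can show marginal totals at whatever level is currently expanded.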

Write-Back: Exporting Result Sets


A visual analysis involves posing questions, formulating hypotheses and discovering results. As part of a holistic analysis process, these results must translate into business actions that yield value. Our approach for supporting the complete analysis process involves both visual discovery and the creation of result sets.
Figure 7: Anchored Measure perspective.

A result set, a sub-cube created by selecting and excluding, may be exported (also called "write-back") to Excel, where it appears as a new PivotTable on a new worksheet. Furthermore, by integrating with Microsoft Office 97 and Office 2000, important visualizations may be copied to the clipboard and inserted in PowerPoint presentations, printed, analyzed further in Excel, saved as .html files and browsed in Internet Explorer or Netscape Navigator, or distributed as text for further action.
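ADVIZOR writes a result set back into Excel as a genuine PivotTable; the sketch below does not reproduce that Office integration. It only illustrates, under stated assumptions, the reshaping step behind write-back: visible cells (hypothetically represented as dictionaries of dimension values and measures) are laid out as a row-by-column grid with margin totals and emitted as CSV text, a format Excel can open. The function name `write_back_csv` is hypothetical, and unpopulated cells stay blank because not every cell of a cube need be populated.

```python
import csv
import io
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def write_back_csv(cells: Iterable[Mapping[str, Any]], row_dim: str,
                   col_dim: str, measure: str) -> str:
    """Pivot a result set into a rows-by-columns grid with margin totals."""
    grid: Dict[Any, Dict[Any, float]] = defaultdict(lambda: defaultdict(float))
    for cell in cells:
        grid[cell[row_dim]][cell[col_dim]] += float(cell[measure])

    rows = sorted(grid, key=str)
    cols = sorted({c for row in grid.values() for c in row}, key=str)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([row_dim] + cols + ["Total"])
    for r in rows:
        # .get() does not trigger the defaultdict, so missing cells stay blank.
        values = [grid[r].get(c) for c in cols]
        writer.writerow([r] + ["" if v is None else v for v in values]
                        + [sum(grid[r].values())])
    # Bottom margin: column totals plus the grand total.
    col_totals = [sum(grid[r].get(c, 0.0) for r in rows) for c in cols]
    writer.writerow(["Total"] + col_totals + [sum(col_totals)])
    return out.getvalue()
```

With three invented cells (Espresso/New York 5, Espresso/Nevada -2, Colombian/Nevada 3, measure Profit), `write_back_csv(cells, "Product", "State", "Profit")` yields a header row `Product,Nevada,New York,Total`, one row per product with a blank for the missing Colombian/New York cell, and a final `Total,1.0,5.0,6.0` row.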
Case Study: A Profitability Analysis

The PivotTable shown in Figure 1 involves a multi-quarter business profitability study. The complete dataset, stored as an Excel PivotTable, contains a wide range of business metrics. The analysis goal is to study profitability by product and market, identify profitability problems and highlight reasons for the problems. The business strategy is to maximize profitability in Nevada by adjusting product mix.

Figure 8: Toolbar supports rich navigation.
In the master PivotTable the original dimensions included: QTR, Months, Market, State, Mrkt Size, Product Type, Product and Decaf. The master PivotTable also included the measures: Profit, Margin, Sales, COGS, Tot Exp, Marktg, Payroll, Misc, Inventory, Opening, Additions, Ending, Margin Rat, Profit Ratio, Bdgt Profit, Bdgt Margin, Bdgt Sales, Bdgt COGS, Bdgt Payroll and Bdgt Additions.
Focusing in on profitability, Figure 2 shows profitability by product-state combination. Identifying the tallest bars by interactively touching the dimensional Bar Charts with the mouse shows that Colombian is the most profitable product overall and California is the most profitable state (Figure 10). Rotating the 3D Multiscape for better viewing of the bars and labeling the tallest bars shows that the most profitable product in any one state is Colombian coffee (in Massachusetts), followed by Colombian (in California) (Figure 10).
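The two readings above, the tallest bars in the dimension Bar Charts and the tallest (or deepest) bars in the 3D Multiscape, amount to an argmax over marginal totals and a ranking over individual cells. A minimal Python sketch, assuming the single measure is held as a hypothetical mapping from (product, state) to profit; the function names are illustrative, and any data passed in would be the user's own.

```python
import heapq
from collections import defaultdict
from typing import Any, Dict, Mapping, Tuple

Key = Tuple[str, str]  # (product, state)


def marginal(cells: Mapping[Key, float], axis: int) -> Dict[str, float]:
    """Roll cell values up to one dimension (0 = product, 1 = state)."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in cells.items():
        totals[key[axis]] += value
    return dict(totals)


def profitability_summary(cells: Mapping[Key, float],
                          k: int = 2) -> Dict[str, Any]:
    """Tallest marginal bars plus the k largest and k most negative cells."""
    by_product = marginal(cells, 0)
    by_state = marginal(cells, 1)
    losses = [kv for kv in cells.items() if kv[1] < 0]
    return {
        # Tallest bar in each dimension Bar Chart.
        "best_product": max(by_product, key=by_product.__getitem__),
        "best_state": max(by_state, key=by_state.__getitem__),
        # Tallest upward 3D Multiscape bars.
        "top_cells": heapq.nlargest(k, cells.items(), key=lambda kv: kv[1]),
        # Deepest downward bars (losses), most negative first.
        "worst_cells": heapq.nsmallest(k, losses, key=lambda kv: kv[1]),
    }
```

Note that the best product overall and the best single cell can differ: a product that is moderately profitable in many states can out-total one that is very profitable in a single state, which is why the perspective shows marginal totals and cell details side by side.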
Figure 9: Comparing profits, Coffee and Espresso.

Notice also that one bar in Figure 10 stands out since it is large and points down (indicating a loss). Labeling this bar shows that it corresponds to Green Tea sales in Nevada. Selecting, excluding and switching to the Multiple Measures perspective shows the performance of Green Tea in Nevada over the last several quarters (Figure 5).

Switching to the Multiple Measures perspective in Figure 5 and the Anchored Measures perspective in Figure 7, we can dig into why there are losses by looking at the relationship of five measures: profits, sales, COGS, Tot Exp and Marktg. There is a consistent pattern: profits and sales of Green Tea are low where COGS, Tot Exp and Marktg are high. The business problem is clear: a lack of Green Tea sales, along with high costs, is causing a large loss in Nevada. The data shown in Figure 5 can then be written back as a reduced PivotTable for further Excel analysis.

Discussion

Dimensional databases are a particularly useful way to organize business metrics. PivotTables are a widely used tool for manipulating and reporting on these metrics. The problem that we address involves PivotTable understanding. Because of their size and textual nature, it is difficult for users to understand and make sense of PivotTable data, e.g., to see patterns, identify trends and spot outliers. Our experience is that understanding even a tiny PivotTable with 20 or more cells is hard, and understanding bigger tables is impossible. Graphical tools supplied with Excel and by other vendors are also not scalable and not particularly useful. In practice, users understand large PivotTables by breaking them up into multiple small tables. There are three disadvantages in using smaller, reduced-size PivotTables:

• Aggregations mask important details.
• Subsets obscure overall patterns.
• Predefined comparisons prevent users from discovering unexpected and unanticipated results.

We overcome these problems using the following "sense making" operations:

• Gestalt: Single Measure (Figure 2) provides a crisp overview of an entire PivotTable, showing overall patterns.
• Extremes: interactively selecting and labeling the largest bars, in both the positive and negative directions, highlights the extreme values.
• Outliers: Anchored Measures (Figure 7) allows the user to discern the items containing unusually high or low measures.
• Sorting and Ranking: the row and column Bar Charts, as well as the 3D Multiscape, are easily sorted for quick comparison and comprehension.
• Margin totals: the row and column dimensional Bar Charts show margin totals.
• Scalability: it is easily possible to analyze PivotTables with several hundred rows and columns, eliminating the need to decompose a cube into sets of sub-cubes.
• Color: tying color to either a dimension or a measure increases the information-carrying capacity of the visual display.
• Visual Navigation: manipulating row, column and page dimensions (not shown) facilitates navigation through the PivotTable.
• Showing multiple measures: the Anchored and Multiple Measures perspectives provide novel ways to visualize several measures simultaneously, enabling users to discover interactions among several measures.
• Selecting and Excluding: by interactively selecting important regions of the data and focusing in on them by excluding unselected data, it is possible to see details within context. Our visual navigational model provides a rich selection mechanism, making it possible, for example, to focus in on arbitrary regions of a PivotTable.
• Result set write-back: real analysis sessions consist of both visual and textual analysis. Users can visually select significant subsets of the cube, e.g., regions where a particular product is not profitable, and export the subset back to Excel as a new PivotTable for further analysis.
• PivotTable drill-up and drill-down: if the Excel PivotTable came from SQL Server 7, ADVIZOR supports accessing data at the next level down or up within any dimensional hierarchy.

Figure 10: Profitability Analysis. Top: most profitable product and state overall. Bottom: most and least profitable state-product combinations.

Implementation

Our initial implementation, named ADVIZOR/2000, integrates with MS Office as an Excel add-in and is launched from Excel's PivotTable toolbar. An architectural view of its high-level design is shown in Figure 11.

To run ADVIZOR/2000, the user creates a PivotTable in Excel using Excel's PivotTable wizard. The source data can originate from an Excel worksheet, a relational database or an OLAP cube accessible from OLAP Services. When ADVIZOR/2000 is launched, its initial Single Measure perspective is automatically populated with data scraped from the PivotTable in the active worksheet. If the Excel PivotTable was created from a SQL Server 7.0 data cube, ADVIZOR/2000 connects to SQL Server and uses the cube schema harvested from the PivotTable. Dimensional browsing and other cube navigation operations, such as drilling up, down and across, are then accomplished by selecting cube slices and pulling raw slice data from OLAP Services.

The ADVIZOR/2000 container application is written in Visual Basic 6.0. It consists essentially of a thin container and about a dozen controls for navigation and visualization. Internally, PivotTable data is kept in the Data Pool, an in-memory store that supports manipulations and case-based linking [4].

ADVIZOR/2000's architecture is quite flexible. We have created prototype stand-alone versions of ADVIZOR/2000 that attach to OLAP Services, integrate with Business Objects (Note: http://www.businessobjects.com) and attach to Knosys' ProClarity.

Figure 11: High-level software design.

Summary

We have developed a tool that explores techniques for visualizing multi-dimensional databases. It displays data cubes in three ways:

• Single Measure perspective, consisting of linked Bar Charts (representing the dimensions of the PivotTable) and a 3D Multiscape (three-dimensional landscape) showing one measure by the row and column dimensions (Figure 2).
• Multiple Measures perspective, which replaces the 3D Multiscape with a color-coded Scatterplot to show two or three measures simultaneously.
• Anchored Measures perspective, which combines a parallel coordinates plot, Box plots and bubble plots to show three or more measures simultaneously.

We extend multi-dimensional data analysis techniques in at least four ways:

• Larger tables: using existing PivotTable interfaces, it is difficult to understand tables with more than tens of rows and columns. Our visual techniques scale to hundreds.
• Dimension totals: the linked Bar Charts show measure roll-ups by dimension. These Bar Charts can easily show dimensions with thousands of entities.
• Ad hoc comparisons: our selection and navigation model enables users to compare arbitrary regions of tables rather than predefined subtables.
• Rich navigation controls: we introduce techniques for navigating within a perspective, manipulating images and navigating within a multi-dimensional data cube.

Acknowledgments

This project has involved many key engineers, developers and creative thinkers on the Visual Insights staff, including Tim Barg, Sue Burkwald, Brenda Garity, Dianne Hackborn, Bill Hammond, Barbara Mirel, John Pyrce, Kurt Rivard, Bill Swanson and Michael Tatelman.

References

1. Becker, Richard A., William S. Cleveland and Ming-Jen Shyu. "The Visual Design and Control of Trellis Display," Journal of Computational and Graphical Statistics, 5, pp. 123-155, 1996. See also http://cm.bell-labs.com/stat/doc/trellis.jcgs.col.ps.
2. Codd, E.F. "Extending the Database Relational Model to Capture More Meaning," Association for Computing Machinery, 1997.
3. Date, C.J. and Hugh Darwen. A Guide to the SQL Standard, Addison-Wesley, Reading, MA, 1997.
4. Eick, Stephen G. ADVIZOR: A Technical Overview, August 1999, available at http://www.visualinsights.com.
5. Inselberg, Alfred. "Don't Panic ... Do it in Parallel," Journal of Computational and Graphical Statistics, 14, pp. 53-77, January 1999.
6. Thomsen, Erik. OLAP Solutions, John Wiley & Sons, Inc., New York, NY, 1997.
7. Tufte, Edward R. The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, 1983.
