Stephen G. Eick
Visual Insights, Inc.
With the decreasing cost of storage and the increased bandwidth of networks, it has become technically feasible and cost effective to store huge volumes of fine-grain data. This data typically consists of transactions, sales records and customer information, and is stored in warehouses or data marts. When properly analyzed, it provides a rich source for business analysis. Transactions collected by operational systems are frequently stored in relational tables.
For a variety of reasons, including data cleanliness, scalability, the efficiency of the relational method, the difficulty of building schemas and computational complexity, it is difficult to report, analyze, distribute and make business decisions using raw relational tables. The problem is that the relational model and SQL [3], the standard interface for manipulating relational tables, are not well suited for analysis tasks [2, 6]. Analysis queries submitted against warehouses engineered for fast transaction archiving frequently run extremely slowly. (Note: A multi-million dollar warehouse may only be able to support one or two power analysis users.) One approach to overcoming the analysis problems associated with relational databases, promulgated by business intelligence software vendors, involves aggregating transactions into multi-dimensional databases, or data cubes.
Data Cubes
A data cube, the raw data structure in a multi-dimensional database, organizes information along a sequence of categories. The categorizing variables are called dimensions, and the data, called measures, are stored in cells. The cube dimensions are frequently organized into hierarchies and usually include a dimension representing time. Multi-dimensional databases automatically aggregate measures across hierarchical dimensions, support hierarchical navigation, expand and collapse dimensions, enable drill-down, drill-up or drill-across and facilitate comparisons through time.
Perhaps the most common use of data cubes is to store aggregated transaction information. In this case, for example, the cube dimensions might be product, store, department, customer number, region and month, and the measures might be COGS, sales and profit. The dimensions are predefined indices into a cube cell, and the measures in a cell are roll-ups over the transactions. The roll-ups or aggregations are usually sums but may include
February 2000
Figure 3: 3D Multiscape's navigation controls. Top: popup menu; Bottom: 3D Multiscape toolbar.
we have extended these modes by adding
buttons for:
• Select All causes all items to be selected.
• Unselect All causes all items to be unse-
lected.
• Toggle inverts the selection set (those
items previously selected become unse-
lected, those previously unselected
become selected).
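The three buttons above amount to simple set operations on the current selection. The Python sketch below is a minimal model under stated assumptions: items are plain hashable ids, and the `Mode` values (replace, add, intersect, subtract) are a hypothetical reading of the selection modes named on the toolbar in Figure 8, not ADVIZOR's actual interface.

```python
from collections.abc import Hashable, Iterable
from enum import Enum


class Mode(Enum):
    """Hypothetical selection modes; names follow toolbar labels, semantics assumed."""
    REPLACE = "replace"
    ADD = "add"
    INTERSECT = "intersect"
    SUBTRACT = "subtract"


class Selection:
    """Minimal selection model over a fixed set of item ids (an assumption)."""

    def __init__(self, items: Iterable[Hashable]) -> None:
        self.items = frozenset(items)
        self.selected: set = set()

    def select_all(self) -> None:
        # Select All: every item becomes selected.
        self.selected = set(self.items)

    def unselect_all(self) -> None:
        # Unselect All: the selection set becomes empty.
        self.selected = set()

    def toggle(self) -> None:
        # Toggle: selected items become unselected and vice versa.
        self.selected = set(self.items - self.selected)

    def apply(self, picked: Iterable[Hashable], mode: Mode) -> None:
        """Combine newly picked items with the current selection."""
        new = set(picked) & self.items  # ignore ids outside the item set
        if mode is Mode.REPLACE:
            self.selected = new
        elif mode is Mode.ADD:
            self.selected |= new
        elif mode is Mode.INTERSECT:
            self.selected &= new
        elif mode is Mode.SUBTRACT:
            self.selected -= new
```

For example, picking two bars in replace mode and then pressing Toggle leaves every other bar selected.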
64 Computer Graphics
be copied to the clipboard and inserted in PowerPoint presentations, printed, analyzed further in Excel, saved as .html files and browsed in Internet Explorer or Netscape Navigator, or distributed as text for further action.
Figure 8: Toolbar supports rich navigation. Button labels: Create a new PivotTable of the visible data; Set/Restore; Undo; Redo; Write a bookmark; Delete a bookmark; Color query; Legend; Print selected views; Totals Table; Select all items; Unselect all items; Toggle the selection; Exclude the unselected items from view; Restore the excluded items; Replace, Add, Intersect with, or Subtract from the currently selected items; Preferences; Help.
Case Study: A Profitability Analysis
The PivotTable shown in Figure 1 involves a multi-quarter business profitability study. The complete dataset, stored as an Excel PivotTable, contains a wide range of business metrics. The analysis goal is to study profitability by product and market, identify profitability problems and highlight reasons for the problems. The business strategy is to maximize profitability in Nevada by adjusting the product mix.
In the master PivotTable the original dimensions included: QTR, Months, Market, State, Mrkt Size, Product Type, Product and Decaf. The master PivotTable also included the measures: Profit, Margin, Sales, COGS, Tot Exp, Marktg, Payroll, Misc, Inventory, Opening, Additions, Ending, Margin Rat, Profit Ratio, Bdgt Profit, Bdgt Margin, Bdgt Sales, Bdgt COGS, Bdgt Payroll and Bdgt Additions.
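The dimension/measure split above can be made concrete with a small roll-up: each distinct combination of dimension values indexes one cell, and each measure in the cell is the sum over the matching transactions. The Python sketch below is a minimal illustration under assumptions: it uses only three dimensions (Product, State, QTR) and three measures (Sales, COGS and profit), the record layout is hypothetical, and summation is the only aggregation shown.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption, not the case-study schema):
# (product, state, qtr, sales, cogs, profit)
Record = tuple[str, str, str, float, float, float]
CellKey = tuple[str, str, str]  # (Product, State, QTR)

MEASURES = ("Sales", "COGS", "Profit")


def roll_up(records: list[Record]) -> dict[CellKey, dict[str, float]]:
    """Aggregate records into cube cells keyed by (product, state, qtr).

    Every measure in a cell is the sum of that measure over all records
    whose dimension values match the cell key.
    """
    cube = defaultdict(lambda: dict.fromkeys(MEASURES, 0.0))
    for product, state, qtr, *values in records:
        cell = cube[(product, state, qtr)]
        for name, value in zip(MEASURES, values):
            cell[name] += value
    return dict(cube)
```

Each lambda call builds a fresh dictionary, so cells never share measure totals.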
Focusing on profitability, Figure 2 shows profitability by product-state combination. Identifying the tallest bars by interactively touching the dimensional Bar Charts with the mouse shows that Colombian is the most profitable product overall and California is the most profitable state (Figure 10). Rotating the 3D Multiscape for better viewing of the bars and labeling the tallest bars shows that the most profitable product in any one state is Colombian coffee (in Massachusetts), followed by Colombian (in California) (Figure 10).
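Reading the tallest bars amounts to two computations: margin totals (the roll-up shown in each dimensional Bar Chart) and a ranking of individual product-state cells in both directions, so that large losses surface alongside large profits. The Python sketch below illustrates both on a toy table; the profit figures are made up for illustration and are not the case-study data.

```python
from collections import defaultdict

Cell = tuple[str, str]  # (product, state)


def margin_totals(cells: dict[Cell, float]) -> tuple[dict[str, float], dict[str, float]]:
    """Sum a measure over every row (product) and every column (state)."""
    by_product: dict[str, float] = defaultdict(float)
    by_state: dict[str, float] = defaultdict(float)
    for (product, state), value in cells.items():
        by_product[product] += value
        by_state[state] += value
    return dict(by_product), dict(by_state)


def extremes(cells: dict[Cell, float], k: int = 1) -> tuple[list, list]:
    """Return the k largest cells and the k smallest (most negative) cells."""
    ranked = sorted(cells.items(), key=lambda item: item[1])
    largest = ranked[::-1][:k]  # descending by value
    smallest = ranked[:k]       # ascending, so the biggest losses come first
    return largest, smallest
```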
Notice also that one bar in Figure 10 stands out since it is large and points down (indicating a loss). Labeling this bar shows that it corresponds to Green Tea sales in Nevada. Selecting, excluding and switching to the Multiple Measures perspective shows the performance of Green Tea in Nevada over the last several quarters (Figure 5).
Figure 9: Comparing profits, Coffee and Espresso.
Switching to the Multiple Measures perspective in Figure 5 and the Anchored Measures perspective in Figure 7, we can dig into why there are losses by looking at the relationship of five measures: profits, sales, COGS, Tot Exp and Marktg. There is a consistent pattern: profits and sales of Green Tea are low where COGS, Tot Exp and Marktg are high. The business problem is clear: a lack of Green Tea sales, along with high costs, is causing a large loss in Nevada. The data shown in Figure 5 can then be written back as a reduced PivotTable for further Excel analysis.
Discussion
Dimensional databases are a particularly useful way to organize business metrics. PivotTables are a widely used tool for manipulating and reporting on these metrics. The problem that we address involves PivotTable understanding. It is difficult for users to understand and make sense of PivotTable data (e.g., to see patterns, identify trends and spot outliers) because of its size and textual nature. Our experience is that understanding even a tiny PivotTable with 20 or more cells is hard, and understanding bigger tables is impossible. Graphical tools supplied with Excel and by other vendors are also not scalable and not particularly useful. In practice, users understand large PivotTables by breaking them up into multiple small tables. There are three disadvantages in using smaller, reduced-size PivotTables:
• Aggregations mask important details.
• Subsets obscure overall patterns.
• Predefined comparisons prevent users from discovering unexpected and unanticipated results.
We overcome these problems using the following "sense making" operations:
• Gestalt: Single Measure (Figure 2) provides a crisp overview of an entire PivotTable, showing overall patterns.
• Extremes: interactively selecting and labeling the largest bars, in both the positive and negative directions, highlights the extreme values.
• Outliers: Anchored Measures (Figure 5) allows the user to discern the items containing unusually high or low measures.
• Sorting and Ranking: the row and column Bar Charts, as well as the 3D Multiscape, are easily sorted for easy comparison and comprehension.
• Margin totals: the row and column dimensional Bar Charts show margin totals.
• Scalability: it is easily possible to analyze PivotTables with several hundred rows
Excel as a new PivotTable for further analysis.
• PivotTable drill-up and drill-down: If the Excel PivotTable came from SQL Server 7, ADVIZOR supports accessing data at the next level down or up within any dimensional hierarchy.
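Drilling up and down moves between adjacent levels of a dimensional hierarchy, such as quarters (QTR) and months (Months) in the case study. ADVIZOR does this against a SQL Server cube; the Python sketch below only illustrates the idea in memory, assuming a hypothetical two-level time hierarchy and summation as the roll-up.

```python
from collections import defaultdict

# Hypothetical time hierarchy: each quarter's child months.
HIERARCHY: dict[str, list[str]] = {
    "Qtr1": ["Jan", "Feb", "Mar"],
    "Qtr2": ["Apr", "May", "Jun"],
}
# Inverted index from each month to its parent quarter.
PARENT: dict[str, str] = {
    month: qtr for qtr, months in HIERARCHY.items() for month in months
}


def drill_up(by_month: dict[str, float]) -> dict[str, float]:
    """Move one level up: sum month-level values into their quarters."""
    totals: dict[str, float] = defaultdict(float)
    for month, value in by_month.items():
        totals[PARENT[month]] += value
    return dict(totals)


def drill_down(by_month: dict[str, float], qtr: str) -> dict[str, float]:
    """Move one level down: return the month-level values under one quarter."""
    return {m: by_month[m] for m in HIERARCHY[qtr] if m in by_month}
```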
Implementation
Our initial implementation, named ADVIZOR/2000, integrates with MS Office as an Excel add-in and is launched from Excel's PivotTable toolbar. An architectural view of its high-level design is shown in Figure 11.
To run ADVIZOR/2000, the user creates a PivotTable in Excel using Excel's PivotTable wizard. The source data can originate from an Excel worksheet, a relational database or an OLAP cube accessible from OLAP services.
When ADVIZOR/2000 is launched, its initial Single Measure perspective is automatically populated with data scraped from the PivotTable in the active worksheet. If the Excel PivotTable was created from an SQL Server 7.0 data cube, ADVIZOR/2000 connects to SQL Server and uses the cube schema harvested from the PivotTable. Dimensional browsing and other cube navigation operations, such as drilling up, down and across, are then accomplished by selecting cube slices and pulling raw slice data from OLAP services.
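Selecting a cube slice means fixing the values of some dimensions and keeping every cell that matches them, for example all Green Tea cells in Nevada across quarters. In ADVIZOR/2000 the slice data is pulled from OLAP services; the Python sketch below is only an in-memory stand-in and does not use the OLAP Services interface. The dimension names and cell layout are assumptions for illustration.

```python
# Hypothetical cube schema: the position of each dimension in a cell key.
DIMENSIONS = ("Product", "State", "QTR")

CellKey = tuple[str, str, str]


def slice_cube(cube: dict[CellKey, float], **fixed: str) -> dict[CellKey, float]:
    """Return the cells whose coordinates match every fixed dimension value.

    Dimensions not named in `fixed` are left free, so fixing fewer
    dimensions returns a larger sub-cube.
    """
    unknown = set(fixed) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    positions = {DIMENSIONS.index(dim): value for dim, value in fixed.items()}
    return {
        key: value
        for key, value in cube.items()
        if all(key[i] == v for i, v in positions.items())
    }
```

Keeping the dimension order in one tuple makes a slice a simple positional filter; an OLAP server would answer the same request with a query against the cube.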
The ADVIZOR/2000 container application is written in Visual Basic 6.0. It consists essentially of a thin container and about a dozen controls for navigation and visualization. Internally, PivotTable data is kept in the Data Pool, an in-memory store that supports manipulations and case-based linking [4].
ADVIZOR/2000's architecture is quite flexible. We have created prototype stand-alone versions of ADVIZOR/2000 that attach to OLAP services, integrate with Business Objects (Note: http://www.businessobjects.com) and attach to Knosys' ProClarity.
Figure 10: Profitability Analysis. Top: most profitable product and state overall. Bottom: most and least profitable state-product combinations.
and columns, eliminating the need to decompose a cube into sets of sub-cubes.
• Color: tying color to either a dimension or a measure increases the information-carrying capacity of the visual display.
• Visual Navigation: manipulating row, column and page dimensions (not shown) facilitates navigation through the PivotTable.
• Showing multiple measures: the Anchored and Multiple Measures perspectives provide unique and novel ways to visualize several measures simultaneously. These perspectives enable users to discover interactions among several measures.
Figure 11: High-level software design.
Summary
We have developed a tool that explores techniques for visualizing multi-dimensional databases. It displays data cubes in three ways:
• Single Measure perspective, consisting of linked Bar Charts (representing the dimensions of the PivotTable) and a 3D Multiscape (three-dimensional landscape) showing one measure by the row and column dimensions (Figure 2).
• Multiple Measures perspective, which replaces the 3D Multiscape with a color-coded Scatterplot to show two or three measures simultaneously.
• Anchored Measures perspective, which combines a parallel coordinates plot, a Box plot and bubble plots to show three or more measures simultaneously.
We extend multi-dimensional data analysis techniques in at least four ways:
• Larger tables: Using existing PivotTable interfaces, it is difficult to understand tables with more than tens of rows and columns. Our visual techniques scale to hundreds.
• Dimension totals: The linked Bar Charts show measure roll-ups by dimension. These Bar Charts can easily show dimensions with thousands of entities.
• Ad hoc comparisons: Our selection and navigation model enables users to compare arbitrary regions of tables rather than pre-defined subtables.
• Rich navigation controls: We introduce techniques for navigating within a perspective, manipulating images and navigating within a multi-dimensional data cube.
Acknowledgments
This project has involved many key engineers, developers and creative thinkers on the Visual Insights staff, including Tim Barg, Sue Burkwald, Brenda Garity, Dianne Hackborn, Bill Hammond, Barbara Mirel, John Pyrce, Kurt Rivard, Bill Swanson and Michael Tatelman.
References
1. Becker, Richard A., William S. Cleveland and Ming-Jen Shyu. "The Design and Control of Trellis Display," Journal of Computational and Graphical Statistics, 5, pp. 123-155, 1996. See also http://cm.bell-labs.com/stat/doc/trellis.jcgs.col.ps.
2. Codd, E.F. "Extending the Database Relational Model to Capture More Meaning," Association for Computing Machinery, 1997.
3. Date, C.J. and Hugh Darwen. A Guide to the SQL Standard, Addison-Wesley, Reading, MA, 1997.
4. Eick, Stephen G. ADVIZOR: A Technical Overview, August 1999, available at http://www.visualinsights.com.
5. Inselberg, Alfred. "Don't Panic ... Do it in Parallel," Journal of Computational and Graphical Statistics, 14, pp. 53-77, January, 1999.
6. Thomsen, Erik. OLAP Solutions, John Wiley & Sons, Inc., New York, NY, 1997.
7. Tufte, Edward R. The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, 1983.