
Advanced Visualization for

Clementine 1.0 User's Guide

For more information about SPSS software products, please visit our Web site at
http://www.spss.com or contact
SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668
SPSS is a registered trademark and the other product names are the trademarks
of SPSS Inc. for its proprietary computer software. No material describing such
software may be produced or distributed without the written permission of the
owners of the trademark and license rights in the software and the copyrights in
the published materials.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS.
Use, duplication, or disclosure by the Government is subject to restrictions as set forth
in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software
clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker
Drive, 11th Floor, Chicago, IL 60606-6412.
General notice: Other product names mentioned herein are used for identification
purposes only and may be trademarks of their respective companies.
This product contains software developed by the Apache Software Foundation.
Copyright 2000 by the Apache Software Foundation. All rights reserved. Software
from the Apache Software Foundation is licensed as is, without warranty of any
kind, and SPSS disclaims any and all liability for damages.
This product includes software developed by Eric Young (eay@mincom.oz.au).
Copyright 1995-1997 by Eric Young. All rights reserved.
This product contains IBM Runtime Environment for AIX, Java 2 Technology
Edition Runtime Modules. Copyright 1999, 2000 by IBM Corporation.
Microsoft and Windows are registered trademarks of Microsoft Corporation.
UNIX is a registered trademark of The Open Group.
DataDirect, INTERSOLV, SequeLink, and DataDirect Connect are registered
trademarks of DataDirect Technologies.
Advanced Visualization for Clementine 1.0
Copyright 2004 by SPSS.
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the publisher.

Preface

Advanced Visualization for Clementine provides you with several new graph nodes
that allow you to explore and visualize your data in new ways. The following new
graph types are available: box plot, bar chart, pie chart, scatterplot matrix, parallel
coordinates chart, map, table heat map, categorical heat map, panel plot, and link
analysis plot.
Clementine is the SPSS enterprise-strength data mining workbench. Clementine
helps organizations improve customer and citizen relationships through an in-depth
understanding of data. Organizations use the insight gained from Clementine to retain
profitable customers, identify cross-selling opportunities, attract new customers,
detect fraud, reduce risk, and improve government service delivery.
Clementine's visual interface invites users' specific business expertise, which
leads to more powerful predictive models and shortens time-to-solution. Clementine
offers many modeling techniques, such as prediction, classification, segmentation,
and association detection algorithms. Once models are created, Clementine Solution
Publisher enables their delivery enterprise-wide to decision makers or to a database.

Compatibility
Clementine is designed to operate on computer systems running Windows Me,
Windows XP Home and Professional, Windows 2000, Windows 2003, or Windows
NT 4.0 with Service Pack 6.

Serial Numbers
Your serial number is your identification number with SPSS Inc. You will need
this serial number when you contact SPSS Inc. for information regarding support,
payment, or an upgraded system. The serial number was provided with your
Clementine system.

Customer Service
If you have any questions concerning your shipment or account, contact your local
office, listed on the SPSS Web site at http://www.spss.com/worldwide/. Please have
your serial number ready for identification.

Training Seminars
SPSS Inc. provides both public and onsite training seminars. All seminars feature
hands-on workshops. Seminars will be offered in major cities on a regular basis. For
more information on these seminars, contact your local office, listed on the SPSS
Web site at http://www.spss.com/worldwide/.

Technical Support
The services of SPSS Technical Support are available to registered customers. Student
Version customers can obtain technical support only for installation and environmental
issues. Customers may contact Technical Support for assistance in using Clementine
products or for installation help for one of the supported hardware environments. To
reach Technical Support, see the SPSS Web site at http://www.spss.com, or contact
your local office, listed on the SPSS Web site at http://www.spss.com/worldwide/. Be
prepared to identify yourself, your organization, and the serial number of your system.

Tell Us Your Thoughts


Your comments are important. Please let us know about your experiences with SPSS
products. We especially like to hear about new and interesting applications using
Clementine. Please send e-mail to suggest@spss.com or write to SPSS Inc., Attn.:
Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago, IL
60606-6412.

Contacting SPSS
If you would like to be on our mailing list, contact one of our offices, listed on our
Web site at http://www.spss.com/worldwide/.


Contents

1 Introduction to Advanced Visualization for Clementine . . . 1
Advanced Visualization Overview . . . 1
Installing Advanced Visualization for Clementine . . . 2

2 Bar Charts . . . 5
Bar Chart Node Overview . . . 5
Setting Options for the Bar Chart Node . . . 6
Using a Bar Chart . . . 7
Bar Chart Example . . . 8

3 Box Plots . . . 11
Box Plot Node Overview . . . 11
Setting Options for the Box Plot Node . . . 12
Using a Box Plot . . . 13
Box Plot Example . . . 13

4 Panel Plots . . . 15
Panel Plot Node Overview . . . 15
Setting Options for the Panel Plot Node . . . 16
Using a Panel Plot . . . 17
Panel Plot Example . . . 19

5 Pie Charts . . . 23
Pie Chart Node Overview . . . 23
Setting Options for the Pie Chart Node . . . 24
Using a Pie Chart . . . 25
Pie Chart Example . . . 25

6 Scatterplot Matrix . . . 27
Scatterplot Matrix Overview . . . 27
Setting Options for the Scatterplot Matrix Node . . . 28
Using a Scatterplot Matrix Chart . . . 29
Scatterplot Matrix Example . . . 29

7 Parallel Coordinates . . . 33
Parallel Coordinates Plot Node Overview . . . 33
Setting Options for the Parallel Coordinates Node . . . 34
Using a Parallel Coordinates Chart . . . 35
Parallel Coordinates Example . . . 36

8 Link Analysis Plots . . . 39
Link Analysis Plot Node Overview . . . 39
Setting Options for the Link Analysis Plot Node . . . 41
Using a Link Analysis Plot . . . 42
Link Analysis Plot Example . . . 43

9 Categorical Heat Maps . . . 47
Categorical Heat Map Node Overview . . . 47
Setting Options for the Categorical Heat Map Node . . . 48
Using a Categorical Heat Map . . . 49
Categorical Heat Map Example . . . 49

10 Table Heat Maps . . . 53
Table Heat Map Node Overview . . . 53
Setting Options for the Table Heat Map Node . . . 54
Using a Table Heat Map . . . 55
Table Heat Map Example . . . 55

11 Map Charts . . . 59
Map Charts Overview . . . 59
Setting Options for the Map Chart Node . . . 60
Using a Map Chart . . . 62
Map Chart Example . . . 62

Index . . . 67

Chapter 1

Introduction to Advanced Visualization for Clementine

Advanced Visualization Overview


Advanced Visualization for Clementine provides several new graph nodes which
allow you to visualize and explore your data in new ways.
The following additional graph nodes are available:

Bar chart

Box plot

Pie chart

Scatterplot matrix (SPLOM)

Parallel coordinates chart

Map

Table heat map

Categorical heat map

Panel plot

Link analysis plot

The following sections provide procedures for creating these graphs in addition to
examples of each graph type.
Note: If you are using data sets that are extremely wide (with a large number of
fields), and you experience any performance problems, you should use a Filter node
in your stream to keep only the fields that you need for your graphs.

Installing Advanced Visualization for Clementine


The Advanced Visualization package is a free add-on component for Clementine. You
will receive a separate installation CD. Use this CD to install Advanced Visualization
on the computer on which Clementine is installed. You can use Clementine in
client/server mode, but note that you must install the Advanced Visualization package
on each client machine that needs access to the additional graph nodes.

System Requirements
The system requirements for installing are:

Hardware. Pentium-compatible processor or higher and a monitor with 1024 x 768 resolution or higher (support for 65,536 colors is recommended). A CD-ROM drive for installation is also required.

Software. You must have Clementine 9.0 already installed on your system.

Operating system. Windows 98, Windows 2000, or Windows NT 4.0 with Service Pack 6 or higher.

Minimum free disk space. 50 MB is required for application components.

Minimum RAM. 256 MB or more of RAM is required.

Installation Procedure
To install Advanced Visualization, simply insert the CD and follow the instructions.
The InstallShield Wizard will guide you through the installation.
E Insert the installation CD into the CD-ROM drive.
E In Windows Explorer, navigate to the CD-ROM drive and run setupwin32.exe.

Figure 1-1
Installation wizard

E Click Next to begin.


E Follow the instructions that appear on the screen. You will need to specify a
destination directory for the Advanced Visualization components. The default
installation directory is C:\Program Files\Clementine\9.0\CEMI\AdvancedVisCEMI.
E To continue, click Next through all steps of the installation wizard.
E Click Finish to complete the initial installation.

When you have completed the installation, a number of files will have been added
to your computer; the new graph nodes will automatically be associated with
Clementine. You will see ten new graph nodes in the Graphs palette.

Chapter 2

Bar Charts

Bar Chart Node Overview

A bar chart summarizes values of one field within categories of another. The height
of the bars in the chart may represent a function of either the measure field or the
cluster definition field.
Figure 2-1
Bar graph of jackpots in a casino

Figure 2-2
Setting options for a Bar chart node

Setting Options for the Bar Chart Node


The following options are available before you create a bar chart:
Category field. Select the field to display as bars. You can select from flag, set, or
unknown type fields. You must specify a field in this drop-down list.
Measure field. Select the measure that will determine the size of the bars. You
can select from scale or unknown type fields. You must specify a measure in this
drop-down list.
Cluster field. Optionally, select a field that will be used to cluster the bars in the graph.
You can select from flag, set, and unknown field types.
Statistic. Select the type of statistic that will be represented by the bars. The following
statistics are available: count, proportion, maximum, mean, median, minimum, sum,
range, standard deviation, and confidence interval.
Panel by. Optionally specify a field by which you will panel the bar chart.


And. Optionally specify another field by which you will panel the bar chart along a
second axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
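
As a rough illustration of what the Statistic option summarizes, the following Python sketch computes several of the listed statistics for a measure within each category. It is not part of Clementine: the function name summarize and the sample records are hypothetical, and proportion and confidence interval are left out because this guide does not define how they are calculated.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical mapping from a statistic name to a function over a list of numbers.
STATISTICS = {
    "count": len,
    "maximum": max,
    "mean": mean,
    "median": median,
    "minimum": min,
    "sum": sum,
    "range": lambda values: max(values) - min(values),
    "standard deviation": stdev,  # sample standard deviation; needs 2+ values
}


def summarize(records, category, measure, statistic):
    """Return {category value: statistic of the measure within that category}."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category]].append(record[measure])
    func = STATISTICS[statistic]
    return {key: func(values) for key, values in groups.items()}


# Made-up records, for illustration only.
records = [
    {"Bank": "AA", "Jack_Pots": 10.0},
    {"Bank": "AA", "Jack_Pots": 30.0},
    {"Bank": "BB", "Jack_Pots": 5.0},
    {"Bank": "BB", "Jack_Pots": 15.0},
    {"Bank": "BB", "Jack_Pots": 40.0},
]

# One bar per category; the bar height is the chosen statistic.
bar_heights = summarize(records, "Bank", "Jack_Pots", "median")
print(bar_heights)
```

Changing the last argument to another key of STATISTICS corresponds to choosing a different entry in the Statistic list.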

Using a Bar Chart


After you create a bar chart, you can use the toolbar buttons to print or save the graph
or you can use the graph control panel to change the graph settings.
Table 2-1
Graph toolbar buttons

Toolbar button    Description
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over the bars.
Print the graph.
Save the graph in PNG format.
Show the graph control panel. You use this to modify any settings in the graph.
Hide the graph control panel.

Modifying a Bar Chart


Use the graph control panel to modify any of the settings after you create the graph.
To view or hide the graph control panel, click the double-arrow button at the top-right
of the graph window. In addition to the original settings, you can also change the
orientation of the bars. For more information, see Setting Options for the Bar Chart
Node on p. 6.

E If the graph control panel is not visible, click the double-arrow button at the top-right
of the window.
E Change any of the original settings as desired.
E Click the bar orientation buttons to specify whether bars are displayed vertically or
horizontally.
E Click Update.

You may want to either print or save the graph before modifying it. If you want to
compare different graphs side-by-side, you can return to the node dialog, specify
new settings, and create a new graph.
Using the control panel to modify the graph is useful if you want to quickly change
one setting, such as statistic or measure, to compare results or discover new patterns.

Bar Chart Example


The following example shows how a paneled bar chart can be used to examine
where and when jackpots are awarded in a casino. The data are in the following
format:
Mach_id,Bank,Day,Coin-IN,Win,Jack_Pots,Pulls
AA101,AA,Monday,252257.01,16426.39,34047.78,293153.67
AA101,AA,Tuesday,182218.29,2702.41,34026.15,216650.56
AA101,AA,Wednesday,285342.87,8348.42,71840.9,208672.75
AA101,AA,Thursday,193859.11,9550.96,31282.31,215915.27

Suppose you want to find the largest median jackpot within banks of slot machines
for each day of the week. In the node dialog, specify the following settings:
E Select Bank as the Category field.
E Select Jack_Pots as the Measure field.
E Select Median as the Statistic.
E Select Day as the Paneling field.
E Click Execute.

The graph appears in a new window.

Figure 2-3
Bar graph of jackpots in casino paneled by days of the week

Examining the panel for Saturday, there seems to be a spike in jackpots for one of
the banks of slots. Looking at the other days, the same bank seems to be awarding a
disproportionately high number of jackpots. If you hover over the other high bars,
you can verify that it is bank KK that is giving out the most jackpots.
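
The summary behind this paneled chart can be approximated outside Clementine. The following Python sketch groups the four sample rows shown earlier by day and bank and takes the median of Jack_Pots in each group. The function name median_by_panel is hypothetical, and the sample covers only one machine in bank AA, so it cannot reproduce the bank KK result described above.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# The four sample rows from this example; a real file would hold every machine and bank.
SAMPLE = """Mach_id,Bank,Day,Coin-IN,Win,Jack_Pots,Pulls
AA101,AA,Monday,252257.01,16426.39,34047.78,293153.67
AA101,AA,Tuesday,182218.29,2702.41,34026.15,216650.56
AA101,AA,Wednesday,285342.87,8348.42,71840.9,208672.75
AA101,AA,Thursday,193859.11,9550.96,31282.31,215915.27
"""


def median_by_panel(rows, category, measure, panel):
    """Group rows by (panel, category) and return the median of the measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[panel], row[category])].append(float(row[measure]))
    return {key: median(values) for key, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
medians = median_by_panel(rows, category="Bank", measure="Jack_Pots", panel="Day")
for (day, bank), value in medians.items():
    print(day, bank, value)
```

Each (day, bank) pair corresponds to one bar in one panel of the chart.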
It might be interesting to isolate the high jackpot bank by using a Select node to
select records that have a jackpot value greater than 30,000. Instead, we could look
at the coin-in values to make sure that the bank of slot machines isn't giving out a
high number of jackpots per paid play.
E If you do not already see the graph control panel, click the double-arrowed button
at the top-right of the graph window.


E Select Coin-IN as the Measure field.

E Click Update.
Figure 2-4
Looking at number of coins inserted into the slot machines per day

We see that of all the days, Saturday has the most coins played. Bank KK has the
highest amount of coins played that day, so it's possible that players had identified
machines in that area as hot machines and kept on playing them. If the number of
coins played were low, we might investigate the machines to make sure the payout
settings were correct.
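
One way to make the comparison between jackpots and coins played explicit is to divide one by the other for each bank and day. The Python sketch below does this for the four sample rows from this example. The function name payout_ratio is hypothetical, and treating Jack_Pots divided by Coin-IN as the quantity of interest is an assumption made for illustration, not a measure defined by Clementine.

```python
import csv
import io
from collections import defaultdict

# The four sample rows from this example.
SAMPLE = """Mach_id,Bank,Day,Coin-IN,Win,Jack_Pots,Pulls
AA101,AA,Monday,252257.01,16426.39,34047.78,293153.67
AA101,AA,Tuesday,182218.29,2702.41,34026.15,216650.56
AA101,AA,Wednesday,285342.87,8348.42,71840.9,208672.75
AA101,AA,Thursday,193859.11,9550.96,31282.31,215915.27
"""


def payout_ratio(rows):
    """Return {(Day, Bank): total Jack_Pots / total Coin-IN} for each group."""
    jackpots = defaultdict(float)
    coin_in = defaultdict(float)
    for row in rows:
        key = (row["Day"], row["Bank"])
        jackpots[key] += float(row["Jack_Pots"])
        coin_in[key] += float(row["Coin-IN"])
    # Skip groups with no coins played to avoid dividing by zero.
    return {key: jackpots[key] / coin_in[key] for key in jackpots if coin_in[key] > 0}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ratios = payout_ratio(rows)
for (day, bank), ratio in sorted(ratios.items()):
    print(f"{day:<10} {bank} {ratio:.3f}")
```

A bank whose ratio is far above the others on the same day would be a candidate for checking payout settings.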

Chapter 3

Box Plots

Box Plot Node Overview

Box plots are another way to look at the distribution of certain fields in detail. Box
plots show the median, inter-quartile range, outliers, and extreme cases of individual
fields. When using these plots, you can get some indication of your data's symmetry
and skewness.
Figure 3-1
Box plot

The box for each field plotted represents the range of values for the quartiles that
are above and below the median. That is, the box contains the middle 50%, or
inter-quartile range, of the data. The horizontal line within each box represents
the median value. If the median is not in the middle of the box, then this indicates
that the data is skewed. The thin lines extending above and below the box are the
whiskers, which represent the maximum and minimum values. Circles outside the
box represent outliers. If outliers are present, then the whiskers extend to one and a half
times the inter-quartile range.
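
The values that a box plot draws can be sketched in a few lines of Python. The example below assumes one common convention: quartiles from the standard library's statistics.quantiles, fences at one and a half times the inter-quartile range beyond the box, and whiskers ending at the most extreme values inside those fences. Clementine's exact quartile method is not stated in this guide, so the function box_plot_summary and its results are illustrative only.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Compute the values a box plot draws, using one common convention."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)  # quartile methods differ between tools
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest value within the fences
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up values with one obvious outlier.
summary = box_plot_summary([2, 3, 4, 5, 6, 7, 8, 9, 10, 40])
print(summary)
```

In this made-up sample, the value 40 lies beyond the upper fence and is reported as an outlier, while the upper whisker stops at 10.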

Figure 3-2
Setting options for a Box plot node

Setting Options for the Box Plot Node


The following options are available before you create a box plot:
Category field. Select the field to display as boxes. You can select from flag, set, or
unknown type fields. You must specify a field in this drop-down list.
Measure field. Select the measure that will determine the size of the boxes. This field
is what will be summarized within the category field that you specify. You can select
from scale or unknown type fields. You must specify a measure in this drop-down list.
Cluster field. Optionally, select a field that will be used to cluster the boxes in the
chart. You can select from flag, set, and unknown field types.
Panel by. Optionally specify a field by which you will panel the box plots.
And. Optionally specify another field by which you will panel the box plots along
the other axis.


Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.

Using a Box Plot


After you create the box plot, you can hover your mouse pointer over the boxes or
outliers to view tooltips. The tooltips show the values that define the box, whiskers,
and outliers.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 3-1
Graph toolbar buttons

Toolbar button    Description
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over boxes, whiskers, or outliers.
Print the graph.
Save the graph in PNG format.

Box Plot Example


The following example builds on the sample cluster stream, showing how to use a box
plot to examine the distribution of data within the clusters, broken down by drug type
and cholesterol level. The stream used in this example is named cluster.str and is
located in the demos folder of your Clementine installation.
E In the cluster stream, attach a Box Plot node to the model node named Kmeans.
E In the Box Plot node dialog, select $KM-KMeans as the Category field.
E Select Na as the Measure field.

E Select standardize data.
E Select Drug and Cholesterol as the paneling fields.
E Click Execute.

The graph appears in a new window.


Figure 3-3

We see that the only outlier exists in cluster-5, for the combination of drugX and
normal cholesterol levels. For drugY and normal cholesterol levels, cluster-1 and
cluster-5 have fairly even distributions. In comparison, if you look at the combination
of drugY and high cholesterol levels, both cluster-1 and cluster-3 have skewed
distributions, since the median for each is closer to the upper end of the inter-quartile
range.

Chapter 4

Panel Plots

Panel Plot Node Overview

Paneled plots, sometimes called condition plots, are plots of two fields, conditioned
by a third or a fourth field. Panel plots are useful for looking at any graph subject
to the conditions of other fields. Since the plot is paneled, you can view the results
side by side.
Figure 4-1
Panel plot

The plot itself is very general, allowing for different types of graphs, such as
scatterplots, bar charts, box plots, linear smoothing lines, or any combination of these.
The conditional fields define the panels. If the fields are sets, then the conditions are
the categories within those sets. If the conditional fields are scale, then the field is
split up into ranges of the continuous field, with each range defining a condition.
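
The idea of splitting a scale field into ranges, with each range defining a panel, can be sketched as follows. This guide does not say how the ranges are chosen, so the example assumes equal-width ranges. The function names equal_width_ranges and panel_index and the sample values are hypothetical.

```python
def equal_width_ranges(values, n_ranges):
    """Split the span of the values into n_ranges equal-width (low, high) intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_ranges
    return [(low + i * width, low + (i + 1) * width) for i in range(n_ranges)]


def panel_index(value, ranges):
    """Return the index of the range containing value; the top edge joins the last range."""
    for i, (low, high) in enumerate(ranges):
        if low <= value < high:
            return i
    return len(ranges) - 1


ages = [18, 22, 35, 41, 58, 60]  # made-up values of a scale field
ranges = equal_width_ranges(ages, 3)
panels = [panel_index(age, ranges) for age in ages]
print(ranges)
print(panels)
```

Records that share a panel index would be drawn together in the same panel.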

Axes are automatically shared when there are two paneling variables. Panels
automatically shuffle when the aspect ratio of the graph output window is changed,
if there is only a single paneling field.
Figure 4-2
Setting options for a Panel Plot node

Setting Options for the Panel Plot Node


The following options are available before you create a panel plot:
Panel by. Specify a field by which you will panel the plots.
And. Optionally specify another field by which you will panel the plots along the
other axis.
X Axis. Select the field whose values will be plotted along the X axis. The
drop-down list will contain all fields of scale and unknown type in your data set.
Y Axis. Select the field whose values will be plotted along the Y axis. The
drop-down list will contain all fields of scale and unknown type in your data set.


Element. Select the type of element that you want displayed in the plot: points, bars,
lines, path, area, or box. For each type of element, the default statistic will be
used. Note that you can add or delete elements in the control panel after you
create the plot.
Color by. Select the field that will determine the color of elements in the plot. Displays
all field types.
Size by. Select the field that will determine the size of elements in the plot. Displays
all field types.
Shape by. Select the field that will determine the shape of elements in the plot.
Displays all field types.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you specify the desired options, click Execute to create the graph.

Using a Panel Plot


After you create a panel plot, you can use the toolbar buttons to print or save the graph
or you can use the graph control panel to change the graph settings.
Table 4-1
Graph toolbar buttons

Toolbar button    Description
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over elements in the graphs.
Enable zooming. To zoom in on an area of the plot, click and drag a rectangle around the area you want to explore. All of the paneled plots will zoom to the selected area.
Zoom out one level.
Print the graph.
Save the graph in PNG format.
Show the graph control panel. You use this to modify any settings in the graph.
Hide the graph control panel.

Modifying a Panel Plot


You can modify any of the original settings after you create the panel plot. For more
information, see Setting Options for the Panel Plot Node on p. 16.
Additional Chart Options

The graph control panel offers a few additional options not available in the node
dialog.
Plot Elements. You have the option to add multiple elements to the plot.

Note: You cannot change any elements that you have added to the plot. If you want a
different element, first remove the element you do not want, then add a new element.
Statistic. The statistic for the current element that you are adding. By default, the
first element you specified when you created the graph has no statistic applied. The
following statistics are available: count, mean, minimum, maximum, median, sum,
range, standard deviation, and confidence interval.
Jitter. This option is useful for plots that have many points in the same location.
Use this option to slightly disperse the points so that individual plot points can be
more easily distinguished.
Graph orientation. Click the icons for the desired graph orientation.
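
The effect of the Jitter option can be illustrated with a short Python sketch that shifts each point by a small random offset so that overlapping points separate. The function name jitter, the uniform noise, and the amount parameter are assumptions made for illustration. This guide does not describe how Clementine computes the displacement.

```python
import random


def jitter(points, amount=0.1, seed=None):
    """Return a copy of (x, y) points, each shifted by up to +/- amount on both axes."""
    rng = random.Random(seed)  # a fixed seed makes the result repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three records that share the same location and would overlap in a plot.
points = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]
jittered = jitter(points, amount=0.05, seed=42)
for x, y in jittered:
    print(f"{x:.3f}, {y:.3f}")
```

The offsets are kept small relative to the data range so that the overall pattern of the plot is unchanged.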

Since panel plots are generalized plots, you can include any number of elements in
your plots. For example, you could include points in a scatterplot and then add linear
regression lines.
To add plot elements:
E Select the type of element you want to add from Plot elements.

E Select the Statistic that you want applied to the element.
E If desired, select the Color by, Shape by, and Size by fields.
E Click Add.

The element is added to the element list, located directly beneath the Add button.
E After you have added all desired elements, click Update.

The new element appears in the graph.


The list box underneath the Add and delete buttons displays all of the current elements
in the graph. Each line of text represents an element. The element type begins each
line of text, followed by the various settings for that element.
To remove plot elements:
E In the element list box, select the element you want to remove from the plot.
E Click the delete button, which is the red X next to the Add button.

The element is removed from the list.


E After you have removed all desired elements from the list, click Update.
The elements are removed from the graph.

Panel Plot Example


The following example shows you one way you can use a panel plot. You'll create a
scatterplot and then add a linear regression line to reveal the slope of the relationship
between the promotion cost and revenue increase for each product in a sales
promotion. The stream used in this example is named goodsplot.str and is located in
the demos folder of your Clementine installation.
E In the goodsplot stream, attach a Panel Plot node to the Derive node named Increase.
E In the Panel Plot node dialog, select Class as the paneling field.
E Select Promotion as the X axis field.
E Select Increase as the Y axis field.
E Select Points as the Element type.

E Click Execute.

The graph appears in a new window.


Figure 4-3
Paneled scatterplot with the graph control panel

The points show a general relationship between cost and revenue. To get a better idea
of the relationship, we can add a linear regression line to the plots.
E If the graph control panel is not visible, click the double-arrowed button at the
top-right of the graph window.


E In the graph control panel, select Line from the Plot Elements drop-down list.
E Select Linear from the Statistic drop-down list.
E Click Add.

E Click Update.

The linear regression lines appear in each of the paneled scatterplots. We see that
confections and drinks have the greatest increase in revenue with respect to cost
of promotion.
Figure 4-4
Using the graph control panel to add regression lines to the plot

Chapter 5

Pie Charts

Pie Chart Node Overview

You use a pie chart to visually represent the number of cases or percentage of
various categories as pieces in a pie. This allows you to quickly view the relative
distribution within a category of a field.
Figure 5-1
Pie chart showing distribution of regions

Categories with larger slices of the pie indicate a relatively larger number of cases
or a higher percentage.

Figure 5-2
Setting options for a Pie Chart node

Setting Options for the Pie Chart Node


The following options are available before you create a pie chart:
Show Labels. Display labels indicating the value of each pie slice.
Slice by. Specify the field that will be summarized in the pie slices.
Display. Select whether counts or percentages are displayed in the slices of the pie
chart.
Percentage of. If you choose to display percentages, specify a field that determines
the percentage. The field must be of the type range.


Note: If you choose to display percentages and do not specify any field for this
option, the pie chart will display counts.
Panel by. Optionally specify a field by which you will panel the pie chart.


And. Optionally specify another field by which you will panel the pie charts along
the other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
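
The difference between the two Display choices can be sketched in Python. Counts tally the records in each slice. For percentages, the example assumes that each slice shows its share of the total of the Percentage of field, which is one plausible reading of the option rather than documented behavior. The function names, the income field, and the records are hypothetical.

```python
from collections import Counter, defaultdict


def slice_counts(records, slice_by):
    """Number of records in each slice."""
    return Counter(record[slice_by] for record in records)


def slice_percentages(records, slice_by, percentage_of):
    """Each slice's share of the total of a numeric field, in percent (assumed meaning)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[slice_by]] += record[percentage_of]
    grand_total = sum(totals.values())
    return {key: 100.0 * value / grand_total for key, value in totals.items()}


# Made-up records, for illustration only.
records = [
    {"mortgage": "y", "income": 25.0},
    {"mortgage": "n", "income": 20.0},
    {"mortgage": "n", "income": 25.0},
    {"mortgage": "n", "income": 30.0},
]

counts = slice_counts(records, "mortgage")
percentages = slice_percentages(records, "mortgage", "income")
print(dict(counts))
print(percentages)
```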

Using a Pie Chart


After you create the pie chart, you can hover your mouse pointer over the pie slices to
view tooltips. The tooltips show the values that define each slice.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 5-1
Graph toolbar buttons

Toolbar button

Description
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over the pie slices.
Toggle the display of category labels on the
slices.
Print the graph.
Save the graph in PNG format.

Pie Chart Example


The following example shows how you can use a paneled pie chart to see the
proportions of respondents that have a mortgage, broken down by region and gender.
The stream used in this example is named mailshot2.str and is located in the demos
folder of your Clementine installation.
E In the mailshot2 stream, attach a Pie Chart node to the Type node.

E In the Pie Chart node dialog, select mortgage as the slice field.
E Select Counts for the Display.
E Select region and sex as the paneling fields.
E Click Execute.

The graph appears in a new window.


Figure 5-3
Paneled pie chart showing whether respondents have a mortgage

We see that for both males and females, across all regions, the majority of respondents
do not have mortgages. In addition, for females in rural and suburban areas, less than a
quarter of the respondents have a mortgage. In the other population segments, about
one-third of the respondents have a mortgage.

Chapter

6

Scatterplot Matrix

Scatterplot Matrix Overview


A scatterplot matrix, or splom, plots all possible combinations of two or more numeric
fields against one another. Scatterplots highlight the relationship between the fields by
plotting the actual values along two axes. Plotting several fields in a matrix of graphs
allows you to quickly determine which fields exhibit relationships with each other.
Figure 6-1
Scatterplot matrix

The diagonal of a scatterplot matrix consists of histograms for each of the fields, as
plotting a field against itself does not add any value to the plot.
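The grid layout described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the node's implementation: the function name splom_cells and the tuple format are hypothetical, and the field names are borrowed from the example later in this chapter.

```python
from itertools import product

def splom_cells(fields):
    """Map each (row, column) grid position to the kind of plot drawn there."""
    cells = {}
    for (i, y_field), (j, x_field) in product(enumerate(fields), repeat=2):
        if i == j:
            # Diagonal: a field against itself is replaced by a histogram.
            cells[(i, j)] = ("histogram", x_field)
        else:
            # Off-diagonal: scatterplot of the column field against the row field.
            cells[(i, j)] = ("scatter", x_field, y_field)
    return cells

cells = splom_cells(["farmsize", "claimvalue", "claimdiff"])
```

For n plotted fields the grid has n * n cells, of which n are histograms and the rest are scatterplots.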

Figure 6-2
Setting options for a Splom node

Setting Options for the Scatterplot Matrix Node


The following options are available before you create a scatterplot matrix chart:
Plot. Select the fields to plot. You can select from all field types in this list box. At
least two plot fields are required to display an actual scatterplot matrix. If you specify
only one field to plot, the graph displays a histogram of the selected field.
Bin data. Check this box if you want to bin your data for fields of the type range.
Selecting this option can be useful when your data set is large.
Number of bins. If you choose to bin your data, specify the number of bins.
E After you specify the desired options, click Execute to create the graph.


Using a Scatterplot Matrix Chart


After you create the scatterplot matrix, you can hover your mouse pointer over the
points in the scatter plots or bars in the histograms to view tooltips. The tooltips
show the values that define each point or bar.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 6-1
Graph toolbar buttons

Toolbar button

Description
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over the points in the scatter
plots or bars in the histogram.
Enable brushing. With brushing enabled, you
can hover the mouse pointer over points in the
scatterplots or bars in the histograms in the
graph to highlight them. The corresponding
points and bars are highlighted in all
scatterplots and histograms.
Print the graph.
Save the graph in PNG format.

Scatterplot Matrix Example


The following example shows you a common use for a scatterplot matrix. You can
use the chart to reveal patterns within each cluster and help validate the model. The
stream used in this example is named fraud.str and is located in the demos folder of
your Clementine installation.
E In the fraud stream, attach a Splom node to the derive node named claimdiff.
E In the Splom node dialog, select the following fields as the Plot fields:

farmsize
claimvalue
claimdiff
landquality
farmincome
estincome

E Click Execute.

The graph appears in a new window.


Figure 6-3
Scatterplot matrix


We see that several combinations of fields have linear relationships, indicated by the
groupings that look similar to lines going from the bottom-left to the top-right of
the scatterplots. However, other combinations of fields may be more interesting. If
we examine the scatterplot of claim value against claim difference, we see that the
highest values of claim difference occur for the lowest claim values. Similarly, we
also see that claim difference is highest for low-income farms. Finally, the scatterplot
of farm size against claim difference shows some outliers. While the majority of
plot points show a generally low value for claim difference, for some small- and
medium-sized farms the claim difference is unusually high.

Chapter

7

Parallel Coordinates

Parallel Coordinates Plot Node Overview


A parallel coordinates plot is a multivariate display that shows all values for the
selected fields, connected by lines. Each field has a separate axis in the plot. Each
record in your data set is represented by a single line, which connects the values of
the plotted fields on the axes.
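The way each record becomes a single line can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the node's algorithm: the function name polylines and the sample records are hypothetical, and each axis is assumed to be scaled between the minimum and maximum of its own field.

```python
def polylines(records, fields):
    """Return one list of (axis_index, position) vertices per record."""
    lo = {f: min(r[f] for r in records) for f in fields}
    hi = {f: max(r[f] for r in records) for f in fields}
    lines = []
    for r in records:
        line = []
        for axis, f in enumerate(fields):
            span = hi[f] - lo[f]
            # Position along the axis, 0.0 at the minimum and 1.0 at the maximum.
            pos = (r[f] - lo[f]) / span if span else 0.5
            line.append((axis, pos))
        lines.append(line)
    return lines

# Made-up records for illustration.
records = [
    {"Age": 20, "Na": 0.9, "K": 0.02},
    {"Age": 60, "Na": 0.5, "K": 0.08},
]
lines = polylines(records, ["Age", "Na", "K"])
```

Each inner list is the sequence of points that a plot would connect to draw one record across the three axes.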
Figure 7-1
A parallel coordinates plot

These plots are useful for detecting trends across variables in addition to revealing
outliers in the data. Most often, the analysis is performed on fields that are ranges, but
you can include set fields in the plot.

Figure 7-2
Setting options for a parallel coordinates chart

Setting Options for the Parallel Coordinates Node


The following options are available before you create a parallel coordinates chart:
Plot. The fields that you want to plot in the graph.
Standardize data. When selected, all the values are plotted on the same range, rather
than between the maximum and minimum of each range. The graph is standardized
by default.
Color by. Field by which you will color the lines.
Bin data. Check this box if you want to bin your data for fields that are ranges.
Selecting this option can be useful when your data set is large.
Number of bins. If you choose to bin your data, specify the number of bins.


Panel by. Optionally specify a field by which you will panel the chart.
And. Optionally specify another field by which you will panel the charts along the
other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.

Using a Parallel Coordinates Chart


After you create the parallel coordinates chart, you can hover your mouse pointer
over the plotted lines at the intersections of the axes to view tooltips. The tooltips
show the values that define each line.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 7-1
Graph toolbar buttons

Toolbar button

Description
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over the plotted lines at the
intersections.
Toggle fisheye lensing. When enabled, click
on an area in the graph to zoom the local area
as if you were viewing the graph through a
camera fisheye lens. Hold the mouse button
down and move the mouse pointer to apply
the fisheye effect to other areas of the graph.
This is useful if you need to explore an area
where there is a high concentration of
lines.

Enable brushing. In parallel coordinates
plots, it is often difficult to distinguish
individual lines when many are clustered
around a certain area. With brushing enabled,
you can hover the mouse pointer over lines
in the graph to highlight them. You can then
easily view the intersections of the highlighted
line.
Print the graph.
Save the graph in PNG format.

Parallel Coordinates Example


The following example shows you a common use for a parallel coordinates chart. You
can use the chart to reveal patterns within each cluster and help validate the model.
The stream used in this example is named cluster.str and is located in the demos
folder of your Clementine installation.
E In the cluster stream, attach a Parallel Coordinates node to the model node named
Kmeans.
E In the Parallel Coordinates node dialog, select Age, Na, and K as the Plot fields.
E Select Standardize data.
E Select $KM-KMeans as the Color by and Panel by fields.
E Click Execute.

The graph appears in a new window.

Figure 7-3
Using a parallel coordinates plot to detect clusters

We see that in cluster-1, clusters appear at the lower values of Age and K and at higher
values of Na. In cluster-2, we see clustering at the opposite ends of the ranges,
with high values for Age and K, and lower values for Na. However, in cluster-3,
we see that there are no clear clusters. The lines are distributed along each of the
three axes without any clear grouping.

Chapter

8

Link Analysis Plots

Link Analysis Plot Node Overview


A link analysis plot is a graph that shows nodes and the connections between those
nodes. For example, you might want to examine the paths users take through a Web
site, by tracking how many times users go from one page to the next. You could also
create organization charts that show hierarchies or use the plot as part of text analysis to
show how various concepts are interconnected. Note that a link analysis plot does not
determine what the connections between the nodes are; it simply displays connections
described by the data.

Figure 8-1
Example of a graph showing the paths users take through a Web site

Your data must be in a specific format if you want to use a link analysis plot. Each
record defines a single connection, or link, using two fields, a FROM field and a TO
field, that must be of the type string. The values for these fields represent nodes that are
connected. For example, if there is a connection between nodes named A and B,
then there should be a record where A is the value for the FROM field and B is
the value for the TO field. Note that if the values of the FROM and TO fields are
identical, a node will point to itself. Multiple records that define connections between
the same nodes in the same direction are not aggregated, so you should remove or
aggregate such records before creating the plot.
Other fields can be provided that describe the attributes for each connection. An
example of this would be a field that defines the size of the links. You can also have
fields in your data that define the color and shape of the connections; these aesthetics
are applied to the links.
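Because repeated connections are not aggregated by the plot, one way to prepare the data beforehand is to combine them yourself. The Python sketch below is a hypothetical illustration: the function name aggregate_links and the sample rows are made up, and summing the weights is only one reasonable choice (you might prefer a count or a mean). Within Clementine, you would typically do the equivalent preparation in the stream before the graph node.

```python
from collections import defaultdict

def aggregate_links(rows):
    """Sum the weight of rows that share the same (from, to) pair.

    Direction matters: ("A", "B") and ("B", "A") stay separate links.
    """
    totals = defaultdict(int)
    for src, dst, weight in rows:
        totals[(src, dst)] += weight
    return [(src, dst, w) for (src, dst), w in totals.items()]

# Made-up rows: two A-to-B records, one B-to-A record, and one self-link.
rows = [("A", "B", 3), ("A", "B", 2), ("B", "A", 1), ("C", "C", 4)]
links = aggregate_links(rows)
```

After aggregation, each FROM and TO combination appears in exactly one record, which is the shape the plot expects.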

Figure 8-2
Setting options for a Link Analysis plot node

Setting Options for the Link Analysis Plot Node


The following options are available before you create a link analysis plot:
From. Specify the field that contains values indicating the nodes from which the
connections begin. This field can be a flag, set, or unknown. This field is required.
To. Specify the field that contains values indicating the nodes where the connections
end. This field can be a flag, set, or unknown. This field is required.
Link size. Optionally specify a field that describes the size of the links between the
nodes. The field that you select must be of the type range.
Link color. Optionally specify another field that determines the color of the connection
between the nodes. The field that you select must be of the type range.
Link shape. Optionally specify another field that determines the shape of the
connection between the nodes. The field that you select must be of the type range.


Graph layout. Specify the type of layout for the connections between the nodes.
Options available are: circle, network, random, and tree.

Note: If you select the tree layout, your data must be in the format of a tree. That is,
the data must describe a tree with a single root. If you select a tree layout and the data
is not in this format, the graph will be blank.
Link style. Specify how the links between the nodes will appear. Options available
are: arc, elbow, straight, and zigzag.


Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
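Since a tree layout produces a blank graph when the data do not describe a tree with a single root, it can help to check the links before choosing that layout. The following Python sketch shows one way to do so. It is not part of the product: the function name is_single_root_tree is hypothetical, and it assumes the links are available as (from, to) pairs.

```python
def is_single_root_tree(links):
    """Return True if (from, to) pairs describe a tree with exactly one root."""
    parents = {}
    nodes = set()
    for src, dst in links:
        nodes.update((src, dst))
        if dst in parents or src == dst:
            return False  # two parents, a duplicate link, or a self-link
        parents[dst] = src
    roots = nodes - set(parents)
    if len(roots) != 1:
        return False
    root = next(iter(roots))
    # Every node must reach the root by following parent links (no cycles).
    for node in nodes:
        seen = set()
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

ok = is_single_root_tree([("a", "b"), ("a", "c"), ("b", "d")])
two_roots = is_single_root_tree([("a", "b"), ("c", "d")])
```

In this sketch ok is True, while two_roots is False because both a and c lack an incoming link.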

Using a Link Analysis Plot


After you create the graph, you can explore the links between the nodes, drag
nodes, or use fisheye lensing to view areas of the graph in more detail. Tooltips
allow you to view data values of either the nodes or links by hovering your mouse
pointer over them.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 8-1
Graph toolbar buttons

Toolbar button

Description
Show the nodes as labels instead of points.
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over nodes or connecting links.
Toggle node dragging. When enabled, allows
you to move nodes by dragging them.

Toggle fisheye lensing. When enabled, click
on an area in the graph to zoom the local area
as if you were viewing the graph through a
camera fisheye lens. Hold the mouse button
down and move the mouse pointer to apply
the fisheye effect to other areas of the graph.
This is useful if you need to explore an area
where there is a high concentration of nodes
and links.
Enable zooming. To zoom in on an area of
the graph, click and drag a rectangle around
the area you want to explore.
Zoom out one level.
Print the graph.
Save the graph in PNG format.

Link Analysis Plot Example


The following example shows you a common use for a link analysis plot. It displays
the paths users take through a Web site. The data set contains Web page names
and the number of times users went from one page to the next.
The data are in the following format:
START,END,Traffic
page0,page1,1234
page1,page2,1345
page2,page3,2153
page2,page4,4343
page2,page5,6533
page1,page6,6763
...

In the node dialog, specify the following settings:


E Select START as the From field.
E Select END as the To field.

E Select Traffic as the Link size.
E Select network as the Graph layout.
E Select straight as the Link style.
E Click Execute.

The graph appears in a new window.


Figure 8-3
Using a link analysis plot to examine the paths users take through a Web site

The nodes represent pages in the Web site and the lines in between the nodes
represent the amount of traffic from one page to the other. Thicker lines represent
more traffic and the legend shows how much traffic is indicated by the link thickness.


You may first want to view the nodes as points instead of labels to view the general
layout and to see where page traffic is lightest and heaviest.
In this example, we see that the heaviest page traffic is at the top of the graph,
where the links are thickest. Suppose you want to examine the areas with the heaviest
traffic in more detail. From the graph legend, we see that the highest amount of traffic
ranges from eight thousand to ten thousand page hits. Attach a Select node in between
the source node and the graph node to select only those cases where the value for
Traffic is greater than or equal to eight thousand. When you execute the graph node
again, you will see only those nodes that have the highest traffic between them.
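The selection that the Select node performs here can be mimicked outside Clementine as a simple filter. The Python sketch below is an illustration only: the function name heavy_links is hypothetical, and the sample text repeats the rows shown earlier in this example. Because none of those sample rows reaches eight thousand, the demonstration call uses a lower, arbitrary threshold of 6000.

```python
import csv
import io

# The sample rows shown earlier in this example.
SAMPLE = """START,END,Traffic
page0,page1,1234
page1,page2,1345
page2,page3,2153
page2,page4,4343
page2,page5,6533
page1,page6,6763
"""

def heavy_links(csv_text, threshold=8000):
    """Keep only the links whose Traffic value is at least the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["Traffic"]) >= threshold]

heavy = heavy_links(SAMPLE, threshold=6000)
```

Only the links ending at page5 and page6 pass the lower threshold. With the default of 8000, the sample rows yield an empty list.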
Figure 8-4
Graph after using select node to view only high traffic links


There are significantly fewer nodes in this graph, allowing you to view nodes with
the heaviest links without the clutter of all the nodes in the data set. Note that all the
nodes in the graph need not be interconnected, as you have selected only a subset of
all the nodes. You can easily identify the heaviest link in the entire data set, which
exists between pages 15 and 17.
By visualizing the traffic in this way, you can gain insight as to how visitors
are using your Web site. You might make decisions about where to concentrate
advertising, or if traffic is light in some areas, where you might need to improve
navigation to other pages.

Chapter

9

Categorical Heat Maps

Categorical Heat Map Node Overview


A heat map is a table that uses colors instead of numbers to represent values for
the cells. Heat maps are used in a variety of industries, such as finance or life
sciences. Since the values are represented by colors, they are particularly effective in
highlighting concentrations of similar values in the table or revealing outliers.
Figure 9-1
Categorical heat map

The input data for a categorical heat map must have categorical data in which the
unique categories of two set fields define the rows and the columns of the heat map. A
table cell in this context is the combination of a category from each of the defining set
fields. This is similar to a traditional cross table, except colors are used for the values
of the table cells. A statistic is applied to data contained within each cell and the result
of the statistic is displayed using color. Paneling is supported for this type of heat map.
The other type of heat map is a table heat map. For more information, see Table
Heat Map Node Overview in Chapter 10 on p. 53.
Figure 9-2
Setting options for a Categorical Heat Map node

Setting Options for the Categorical Heat Map Node


The following options are available before you create a categorical heat map:
Row. Specify the set field that will be represented in the rows of the heat map.
Column. Specify the set field that will be represented in the columns of the heat map.
Summary. Specify the field on which the statistic will be applied.
Statistic. Choose a statistic that will be applied to the data contained within the cell
defined by each row and column category combination. The value of the statistic
is displayed using a range of colors, indicated in the legend. The following options
are available: count, maximum, mean, median, minimum, and sum.


Panel by. Optionally specify a field by which you will panel the heat map.
And. Optionally specify another field by which you will panel the heat map along
the other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
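The role of the Row, Column, Summary, and Statistic options can be sketched as a grouped calculation. The Python fragment below is an illustration rather than the node's implementation: the function name heat_map_cells is hypothetical, the field names follow the example later in this chapter, and the record values are made up.

```python
from collections import defaultdict
from statistics import mean

def heat_map_cells(records, row, column, summary, statistic=mean):
    """Apply a statistic to the summary values in each (row, column) cell."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[row], rec[column])].append(rec[summary])
    # Category combinations with no records have no entry, so no color.
    return {cell: statistic(values) for cell, values in groups.items()}

# Made-up records for illustration.
records = [
    {"Education": "Bachelor", "Family_Type": "Individual", "Earn_Hour": 2.0},
    {"Education": "Bachelor", "Family_Type": "Individual", "Earn_Hour": 4.0},
    {"Education": "Some_College", "Family_Type": "Secondary", "Earn_Hour": 2.13},
]
cells = heat_map_cells(records, "Education", "Family_Type", "Earn_Hour")
counts = heat_map_cells(records, "Education", "Family_Type", "Earn_Hour", statistic=len)
```

Passing a different function, such as len, max, or sum, corresponds to choosing a different Statistic. Combinations that never occur in the data produce no entry at all.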

Using a Categorical Heat Map


After you create the heat map, you can hover your mouse pointer over the colored
table cells to view tooltips. The tooltips show the values that make up a data cell.
Note that not all cells will have colors. If a cell is empty, that indicates that no data
exists for the Summary field that you chose for that given combination of categories.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 9-1
Graph toolbar buttons

Toolbar button

Description
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over the colored table cells.
Print the graph.
Save the graph in PNG format.

Categorical Heat Map Example


The following example shows how a categorical heat map is used to examine earnings
data from a census survey containing several demographic categories. The data are in
the following format:
Education, Family_Type, Marital, Race, Gender, College, Earn_Hour
Bachelor, Individual, Never_Mar, White, Male, NA, 2.0
Bachelor, Family, Married_SP, White, Male, NA, 2.0
Some_College, Individual, Divorced, White, Female, Sophomore, 2.0
Some_College, Secondary, Never_Mar, White, Female, Junior, 2.13
Mater, Individual, Never_Mar, White, Female, NA, 2.16
.....

In the node dialog, specify the following settings:


E Select Education as the Row field.
E Select Family_Type as the Column field.
E Select Earn_Hour as the Summary field. Values for this field will be represented
by the colors in the heat map.


E Select Mean as the Statistic.
E Select Gender and Marital as the paneling fields.
E Click Execute.

The graph appears in a new window.

Figure 9-3
Categorical heat map showing earnings

In the heat map of the divorced males panel, we see that those with bachelor's degrees
seem to have high income. The absence of colored cells in the bottom left panel
indicates that we do not have much income data for widowed males.

Chapter

10

Table Heat Maps

Table Heat Map Node Overview


A heat map is a table that uses colors instead of numbers to represent values for the
cells. Since the values are represented by colors, they are particularly effective in
highlighting concentrations of similar values in the table or revealing outliers.
Figure 10-1
Table heat map

The input data for a table heat map must be tabular data that contains one symbolic
field that defines the labels for each row of data and n fields of numeric data,
where each field becomes a column in the heat map. In this case, each cell in the
resulting heat map is a numeric value from the original dataset, displayed using color.
Clustering is automatically performed to intelligently sort both the rows and columns
of the heat map. This will group like values into common regions within the heat
map, which can make detecting patterns easier. Since the range of the fields you use
to define the columns may be on different scales, a standardize option is provided.
The other type of heat map is a categorical heat map. For more information, see
Categorical Heat Map Node Overview in Chapter 9 on p. 47.
Figure 10-2
Setting options for a Table Heat Map node

Setting Options for the Table Heat Map Node


The following options are available before you create a table heat map:
Rows. Specify the field that will be represented in the rows of the table heat map.
Columns. Specify the fields that will be represented in the columns of the table heat
map.
Standardize data. Choose this option if you want to standardize your data; your data
will be converted such that the values will range from 0 to 1, rather than the range of
values in your data set.


Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
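Converting values so that they range from 0 to 1 is commonly done with min-max rescaling. The Python sketch below shows that technique as an assumption about what the Standardize data option does. The guide does not give the exact formula, and whether rescaling is applied per column field, as assumed here, is not stated. The function name standardize_columns is hypothetical. The sample values are the E11 and E13 columns from the example later in this chapter.

```python
def standardize_columns(table):
    """Rescale each numeric column of {field: [values]} to the 0-1 range."""
    scaled = {}
    for field, values in table.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has no spread, so map it to 0.0 throughout.
        scaled[field] = [(v - lo) / span if span else 0.0 for v in values]
    return scaled

table = {
    "E11": [1.7, 5.75, 2.53, 0.04],
    "E13": [0.34, 4.41, 3.27, 0.51],
}
scaled = standardize_columns(table)
```

After rescaling, the smallest value in each column becomes 0 and the largest becomes 1, so columns measured on different scales share one color range.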

Using a Table Heat Map


After you create the heat map, you can hover your mouse pointer over the colored
table cells to view tooltips. The tooltips show the values that make up a data cell.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 10-1
Graph toolbar buttons

Toolbar button

Description
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over colored table cells.
Print the graph.
Save the graph in PNG format.

Table Heat Map Example


The following example shows how a table heat map is used to look at gene expression
values. Note that the rows and columns are sorted using clustering to help identify the
genes that are highly expressed across time intervals. The data are in the following
format:
gene,E11,E13,E15,E18,E21,P0,P7,P14,A
keratin,1.7,0.34,0.52,0.4,0.68,0.46,0.32,0.08,0.0
cellubrevin,5.75,4.41,1.19,2.13,2.3,2.53,3.89,3.95,2.72
nestin,2.53,3.27,5.2,2.8,1.5,1.12,0.53,0.51,0.44
MAP2,0.04,0.51,1.55,1.65,1.66,1.49,1.43,1.58,1.89
.....


The stream file is Spinal_cord.str and the data file is Spinal_Cord.csv.


In the node dialog, specify the following settings:
E Select gene as the Row field.
E Select all available range fields as the Columns.
E Select Standardize data.
E Click Execute.

The graph appears in a new window.


Figure 10-3
Table heat map showing gene expression values


With the tooltips on, hover over the darkest red cells in the heat map. The
tooltips indicate that the genes Ka1, actin, and NFL have the highest expression
values.
To examine how standardizing the data changes the appearance of the
heat map, re-create the graph without the standardize option. You will see that because
gene expression levels span various ranges of values, it is more difficult to
see concentrations for a specific gene.
Figure 10-4
Table heat map with non-standardized data

Chapter

11

Map Charts

Map Charts Overview


If you are working with demographic data, you can turn them into visual, easy-to-read
output using Map charts. You can also specify your own custom maps. For more
information about using custom maps, see the Map Editor tutorial.
Figure 11-1
Map of the United States tracking response rate in an election

Figure 11-2
Setting options for a Map Chart node

Setting Options for the Map Chart Node


The following options are available before you create a map:
Type of map. Select either United States or Custom.

Note: The system provides basic United States maps. If you want to use your own
map, select Custom. If you use a custom map, then you must create a map file with
the Map Editor. For more information about using custom maps, see the Map Editor
tutorial.
Map. Required only if you select United States as the type of map. Specify the type of
map that you want to use. The maps available are: US States, US Counties, US Lower
48 States, and US Lower 48 State Counties.


Map file. Select the name of the .zip file that you created with the Map Editor.
Required only if you use a custom map.


Map layer. Select the layer name in your map file that specifies the geography that you
want to plot. You assign this layer when you build the map file using the Map Editor.
Required only if you use a custom map.
Map attribute. Select the geographic attribute in the map file that matches the values in
the Map attribute data field from the data set. You choose to include attributes when
you build the map file using the Map Editor. Required only if you use a custom map.
Map attribute data. This field is required for both United States and custom maps.
Select the field in your data that contains the values that match the geographic
attributes (for example, state names or FIPS codes). For United States maps, this
data field must contain the full state names (for example, Arizona or Michigan)
when you select either US Lower 48 States or US States, or valid county FIPS codes
when you select US Lower 48 State Counties or US Counties. The field must be of
type String.
Note: A node named StateCodes.nod and a stream named USMapExample.str
are available in the data folder of your Advanced Visualization installation (the
default installation location is
C:\Program Files\Clementine\9.0\CEMI\AdvancedVisCEMI\data). This example
stream and node contain a Reclassify node that you can use to translate state
abbreviations into the required state names.
Color by. Select the field that will be represented by colors in the regions of the map.
Statistic. Select the statistic to apply to the Color by field. The following statistics are
available: count, proportion, maximum, mean, median, minimum, and sum.


Projection. Optionally specify the type of projection that will be used to display
the map. The following projections are available: Lambert, Mercator, Transverse
Mercator, or none.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower-right corner of the graph window.
Background color. Select from several colors to specify the background of the map.
E After you set the desired options, click Execute to create the graph.
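United States state maps require full state names in the Map attribute data field, so data that contain abbreviations must be translated first. Within Clementine, the supplied StateCodes.nod node is intended for this. The Python sketch below only illustrates the same idea of a lookup table. The names STATE_NAMES and to_state_name are hypothetical, and the table is deliberately partial.

```python
# Partial, illustrative lookup table; a real one would list every state.
STATE_NAMES = {
    "AZ": "Arizona",
    "IL": "Illinois",
    "MI": "Michigan",
}

def to_state_name(value):
    """Translate a state abbreviation to its full name.

    Values that are not known abbreviations are returned unchanged,
    so fields that already hold full names pass through untouched.
    """
    code = value.strip().upper()
    return STATE_NAMES.get(code, value)

names = [to_state_name(v) for v in ["az", "MI", "Ohio"]]
```

Leaving unrecognized values unchanged makes it easy to spot afterward which entries still need attention.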


Using a Map Chart


After you create the map, you can view tooltips for areas of the map or zoom in to
areas to examine them more closely. Tooltips allow you to view data values for areas
of the map by hovering your mouse pointer over them.
If you want to change any graph settings, you must return to the node dialog,
change the settings, and execute the node to create a new graph. The graph toolbar
contains several buttons that allow you to interact with the graph.
Table 11-1
Graph toolbar buttons

Toolbar button

Description
Toggle the display of tooltips. If enabled, data
values appear in tooltips when you hover the
mouse pointer over areas of the map.
Enable zooming. To zoom in on an area of
the graph, click and drag a rectangle around
the area you want to explore.
Zoom out one level.
Print the graph.
Save the graph in PNG format.

Map Chart Example


The following example shows you a common use for a map chart; it examines the
response rate in the 2000 U.S. election. The stream used is named map.str and it
references the data file named rti_stFips.csv.
E In the Map node dialog, select United States as the type of map.
E Select US Counties as the Map field.
E Select fips as the Map attribute data field.
E Select resprate as the Color by field.
E Select Mean as the statistic.

E Select Lambert as the projection.
E Click Execute.

The graph appears in a new window.


Figure 11-3
Map

Note that some states are missing data for this survey. As a result, the graph is
rendered without those states. Since the data are broken down into counties, it will be
easier to examine the results by zooming in on an area of interest.
E In the graph window, click the zoom toolbar button to enable zooming.

E With the mouse pointer, click and drag a rectangle around the area you want to
explore further.
Note: The area highlighted by the rectangle you drag will conform to the type of
projection that you use. A Lambert projection will yield curved rectangular selection
areas, while a Mercator projection will yield straight rectangular selection areas.
E The map zooms in on the area you selected.
Figure 11-4
Zooming in on an area of the map

After zooming, you can easily explore the counties. Hover your mouse pointer over
each area to view the value of the response rate of each county.

Figure 11-5
Detailed view of the map

Index

.spc file, 2
advanced visualization
  overview, 1
bar chart
  creating, 6
  example, 8
  introduction, 5
  modifying, 7
  using, 7
box plot
  creating, 12
  example, 13
  introduction, 11
  using, 13
categorical heat map
  creating, 48
  example, 49
  introduction, 47
  using, 49
CEMI, 2
installing
  on Windows, 2
link analysis plot
  creating, 41
  example, 43
  introduction, 39
  toolbar, 42
  using, 42
map chart
  creating, 59, 60
  example, 62
  using, 62
panel plot
  creating, 16
  example, 19
  introduction, 15
  modifying, 18
  using, 17
parallel coordinates chart
  creating, 34
  example, 36
  using, 35
parallel coordinates plot
  introduction, 33
pie chart
  creating, 24
  example, 25
  introduction, 23
  using, 25
scatterplot matrix
  example, 29
  using, 29
scatterplot matrix chart
  creating, 27, 28
  setting options, 28
system requirements, 2
table heat map
  creating, 54
  example, 55
  introduction, 53
  using, 55
updating Clementine, 2
