For more information about SPSS software products, please visit our Web site at
http://www.spss.com or contact
SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668
SPSS is a registered trademark and the other product names are the trademarks
of SPSS Inc. for its proprietary computer software. No material describing such
software may be produced or distributed without the written permission of the
owners of the trademark and license rights in the software and the copyrights in
the published materials.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS.
Use, duplication, or disclosure by the Government is subject to restrictions as set forth
in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software
clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker
Drive, 11th Floor, Chicago, IL 60606-6412.
General notice: Other product names mentioned herein are used for identification
purposes only and may be trademarks of their respective companies.
This product contains software developed by the Apache Software Foundation.
Copyright 2000 by the Apache Software Foundation. All rights reserved. Software
from the Apache Software Foundation is licensed as is, without warranty of any
kind, and SPSS disclaims any and all liability for damages.
This product includes software developed by Eric Young (eay@mincom.oz.au).
Copyright 1995-1997 by Eric Young. All rights reserved.
This product contains IBM Runtime Environment for AIX, Java 2 Technology
Edition Runtime Modules. Copyright 1999, 2000 by IBM Corporation.
Microsoft and Windows are registered trademarks of Microsoft Corporation.
UNIX is a registered trademark of The Open Group.
DataDirect, INTERSOLV, SequeLink, and DataDirect Connect are registered
trademarks of DataDirect Technologies.
Advanced Visualization for Clementine 1.0
Copyright 2004 by SPSS.
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the publisher.
Preface
Advanced Visualization for Clementine provides you with several new graph nodes
that allow you to explore and visualize your data in new ways. The following new
graph types are available: box plot, bar chart, pie chart, scatterplot matrix, parallel
coordinates chart, map, table heat map, categorical heat map, panel plot, and link
analysis plot.
Clementine is the SPSS enterprise-strength data mining workbench. Clementine
helps organizations improve customer and citizen relationships through an in-depth
understanding of data. Organizations use the insight gained from Clementine to retain
profitable customers, identify cross-selling opportunities, attract new customers,
detect fraud, reduce risk, and improve government service delivery.
Clementine's visual interface invites users to apply their specific business expertise, which
leads to more powerful predictive models and shortens time-to-solution. Clementine
offers many modeling techniques, such as prediction, classification, segmentation,
and association detection algorithms. Once models are created, Clementine Solution
Publisher enables their delivery enterprise-wide to decision makers or to a database.
Compatibility
Clementine is designed to operate on computer systems running Windows Me,
Windows XP Home and Professional, Windows 2000, Windows 2003, or Windows
NT 4.0 with Service Pack 6.
Serial Numbers
Your serial number is your identification number with SPSS Inc. You will need
this serial number when you contact SPSS Inc. for information regarding support,
payment, or an upgraded system. The serial number was provided with your
Clementine system.
Customer Service
If you have any questions concerning your shipment or account, contact your local
office, listed on the SPSS Web site at http://www.spss.com/worldwide/. Please have
your serial number ready for identification.
Training Seminars
SPSS Inc. provides both public and onsite training seminars. All seminars feature
hands-on workshops. Seminars will be offered in major cities on a regular basis. For
more information on these seminars, contact your local office, listed on the SPSS
Web site at http://www.spss.com/worldwide/.
Technical Support
The services of SPSS Technical Support are available to registered customers. Student
Version customers can obtain technical support only for installation and environmental
issues. Customers may contact Technical Support for assistance in using Clementine
products or for installation help for one of the supported hardware environments. To
reach Technical Support, see the SPSS Web site at http://www.spss.com, or contact
your local office, listed on the SPSS Web site at http://www.spss.com/worldwide/. Be
prepared to identify yourself, your organization, and the serial number of your system.
Contacting SPSS
If you would like to be on our mailing list, contact one of our offices, listed on our
Web site at http://www.spss.com/worldwide/.
Contents
1 Introduction to Advanced Visualization for Clementine
2 Bar Charts
3 Box Plots
4 Panel Plots
5 Pie Charts
6 Scatterplot Matrix
7 Parallel Coordinates
8 Link Analysis Plots
9 Categorical Heat Maps
10 Table Heat Maps
11 Map Charts
Index
Chapter 1
Introduction to Advanced Visualization for Clementine
Bar chart
Box plot
Pie chart
Map
Panel plot
The following sections provide procedures for creating these graphs in addition to
examples of each graph type.
Note: If you are using data sets that are extremely wide (with a large number of
fields), and you experience any performance problems, you should use a Filter node
in your stream to keep only the fields that you need for your graphs.
System Requirements
The system requirements for installing Advanced Visualization are:
Software. You must have Clementine 9.0 already installed on your system.
Operating system. Windows 98, Windows 2000, or Windows NT 4.0 with Service
Pack 6 or higher.
Installation Procedure
To install Advanced Visualization, simply insert the CD and follow the instructions.
The InstallShield Wizard will guide you through the installation.
E Insert the installation CD into the CD-ROM drive.
E In Windows Explorer, navigate to the CD-ROM drive and run setupwin32.exe.
Figure 1-1
Installation wizard
When you have completed the installation, a number of files will have been added
to your computer; the new graph nodes will automatically be associated with
Clementine. You will see ten new graph nodes in the Graphs palette.
Chapter 2
Bar Charts
Bar Chart Node Overview
A bar chart summarizes values of one field within categories of another. The height
of the bars in the chart may represent a function of either the measure field or the
cluster definition field.
Figure 2-1
Bar graph of jackpots in a casino
Figure 2-2
Setting options for a Bar chart node
statistics are available: count, proportion, maximum, mean, median, minimum, sum,
range, standard deviation, or confidence interval.
Panel by. Optionally specify a field by which you will panel the bar chart.
And. Optionally specify another field by which you will panel the bar chart along a
second axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over the bars.
Print the graph.
Save the graph in PNG format.
Show the graph control panel. You use this to modify any settings in the graph.
Hide the graph control panel.
E If the graph control panel is not visible, click the double-arrow button at the top-right
of the window.
E Change any of the original settings as desired.
E Click the bar orientation buttons to specify whether bars are displayed vertically or
horizontally.
E Click Update.
You may want to either print or save the graph before modifying it. If you want to
compare different graphs side-by-side, you can return to the node dialog, specify
new settings, and create a new graph.
Using the control panel to modify the graph is useful if you want to quickly change
one setting, such as statistic or measure, to compare results or discover new patterns.
Suppose you want to find the largest median jackpot within banks of slot machines
for each day of the week. In the node dialog, specify the following settings:
E Select Bank as the Category field.
E Select Jack_Pots as the Measure field.
E Select Median as the Statistic.
E Select Day as the Paneling field.
E Click Execute.
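Each bar in this chart is a statistic computed over one group of records. As a rough sketch of that calculation (not Clementine's implementation), the Python below groups records by the paneling field and the category field and takes the median of the measure. The dictionary keys reuse the example's field names (Day, Bank, Jack_Pots); the record values are invented toy numbers.

```python
from collections import defaultdict
from statistics import median


def median_by_panel(records, panel_field, category_field, measure_field):
    """Group records by (panel, category) and take the median of the measure."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[panel_field], rec[category_field])].append(rec[measure_field])
    # One bar per (panel, category) pair; its height is the median of that group.
    return {key: median(values) for key, values in groups.items()}


# Toy records; the values are made up for illustration only.
records = [
    {"Day": "Sat", "Bank": "KK", "Jack_Pots": 40000},
    {"Day": "Sat", "Bank": "KK", "Jack_Pots": 20000},
    {"Day": "Sat", "Bank": "AA", "Jack_Pots": 5000},
    {"Day": "Sun", "Bank": "KK", "Jack_Pots": 12000},
]
bars = median_by_panel(records, "Day", "Bank", "Jack_Pots")
```

Taking `max(bars, key=bars.get)` on the result picks out the panel and bank with the largest median, which is the bar the example goes looking for.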
Figure 2-3
Bar graph of jackpots in casino paneled by days of the week
In the panel for Saturday, there seems to be a spike in jackpots for one of
the banks of slots. Looking at the other days, the same bank seems to be awarding a
disproportionately high number of jackpots. If you hover over the other high bars,
you can verify that it is bank KK that is giving out the most jackpots.
It might be interesting to isolate the high jackpot bank by using a Select node to
select records that have a value of jackpot greater than 30,000. Instead, we could look
at the coin-in values to make sure that the bank of slot machines isn't giving out a
high number of jackpots per paid play.
E If you do not already see the graph control panel, click the double-arrowed button
at the top-right of the window.
E Click Update.
Figure 2-4
Looking at number of coins inserted into the slot machines per day
We see that of all the days, Saturday has the most coins played. Bank KK has the
highest amount of coins played that day, so it's possible that players had identified
machines in that area as hot machines and kept on playing them. If the number of
coins played were low, we might investigate the machines to make sure the payout
settings were correct.
Chapter 3
Box Plots
Box Plot Node Overview
Box plots are another way to look at the distribution of certain fields in detail. Box
plots show the median, inter-quartile range, outliers, and extreme cases of individual
fields. When using these plots, you can get some indication of your data's symmetry
and skewness.
Figure 3-1
Box plot
The box for each field plotted represents the range of values for the quartiles that
are above and below the median. That is, the box contains the middle 50%, or
inter-quartile range, of the data. The horizontal line within each box represents
the median value. If the median is not in the middle of the box, then this indicates
that the data is skewed. The thin lines extending above and below the box are the
whiskers, which represent the maximum and minimum values. Circles outside the
box represent outliers. If outliers are present, the whiskers extend to one and a half
times the inter-quartile range.
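The quantities a box plot draws can be computed directly. The following Python sketch uses `statistics.quantiles` (Python 3.8+) with the "inclusive" method and the common rule that flags values more than 1.5 times the IQR beyond the box as outliers. Both the quartile method and the rule that whiskers stop at the most extreme non-outlier value are assumptions; this guide does not say exactly how Clementine computes them.

```python
import statistics


def box_plot_stats(values):
    """Summarize values the way a box plot does (quartile method is assumed)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: the middle 50% of the data
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme values that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(outliers),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Here the box spans 3.0 to 7.0 around a median of 5.0, and the value 100 falls far beyond the upper fence, so it would be drawn as an outlier circle.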
Figure 3-2
Setting options for a Box plot node
is what will be summarized within the category field that you specify. You can select
from scale or unknown type fields. You must specify a measure in this drop-down list.
Cluster field. Optionally, select a field that will be used to cluster the boxes in the
chart. You can select from flag, set, and unknown field types.
Panel by. Optionally specify a field by which you will panel the box plots.
And. Optionally specify another field by which you will panel the box plots along
the other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over boxes, whiskers, or outliers.
Print the graph.
Save the graph in PNG format.
E Select standardize data.
E Select Drug and Cholesterol as the paneling fields.
E Click Execute.
We see that the only outlier exists in cluster-5, for the combination of drugX and
normal cholesterol levels. For drugY and normal cholesterol levels, cluster-1 and
cluster-5 have fairly even distributions. In comparison, if you look at the combination
of drugY and high cholesterol levels, both cluster-1 and cluster-3 have skewed
distributions, since the median for each is closer to the upper end of the inter-quartile
range.
Chapter 4
Panel Plots
Panel Plot Node Overview
Paneled plots, sometimes called condition plots, are plots of two fields, conditioned
by a third or a fourth field. Panel plots are useful for looking at any graph subject
to the conditions of other fields. Since the plot is paneled, you can view the results
side by side.
Figure 4-1
Panel plot
The plot itself is very general, allowing for different types of graphs, such as
scatterplots, bar charts, box plots, linear smoothing lines, or any combination of these.
The conditional fields define the panels. If the fields are sets, then the conditions are
the categories within those sets. If a conditional field is a scale field, its values
are split into ranges, with each range defining a condition.
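The guide does not say how the ranges for a scale conditioning field are chosen. A minimal Python sketch, assuming equal-width ranges between the field's minimum and maximum, shows how each value could be assigned to a panel:

```python
def split_into_ranges(values, n_ranges):
    """Assign each value of a scale field to one of n equal-width ranges.

    Returns {range_index: [values...]}. Range i covers
    [lo + i * width, lo + (i + 1) * width), and the last range also
    includes the maximum. Equal-width ranges are an assumption.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_ranges
    panels = {i: [] for i in range(n_ranges)}
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # Clamp so the maximum value lands in the last range, not a new one.
        panels[min(idx, n_ranges - 1)].append(v)
    return panels


panels = split_into_ranges([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n_ranges=2)
```

Each key then defines one panel, in the same way each category does for a set field.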
Axes are automatically shared when there are two paneling variables. If there is only
a single paneling field, the panels automatically rearrange when the aspect ratio of the
graph output window is changed.
Figure 4-2
Setting options for a Panel Plot node
other axis.
X Axis. Select the field whose values are plotted along the X axis. The
drop-down list will contain all scale and unknown type fields in your data set.
Y Axis. Select the field whose values are plotted along the Y axis. The
drop-down list will contain all scale and unknown type fields in your data set.
Element. Select the type of element that you want displayed in the plot: points, bars,
lines, path, area, and box. For each type of element, the default statistic will be
used. Note that you can add or delete different elements in the control panel after you
create the plot.
Color by. Select the field that will determine the color of elements in the plot. Displays
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over elements in the graphs.
Enable zooming. To zoom in on an area of the plot, click and drag a rectangle around the area you want to explore. All of the paneled plots will zoom to the selected area.
Zoom out one level.
Print the graph.
Save the graph in PNG format.
Show the graph control panel. You use this to modify any settings in the graph.
Hide the graph control panel.
The graph control panel offers a few additional options not available in the node
dialog.
Plot Elements. You have the option to add multiple elements to the plot.
Note: You cannot change any elements that you have added to the plot. If you want a
different element, first remove the element you do not want, then add a new element.
Statistic. The statistic for the current element that you are adding. By default, the
first element you specified when you created the graph has no statistic applied. The
following statistics are available: count, mean, minimum, maximum, median, sum,
range, standard deviation, confidence interval.
Jitter. This option is useful for plots that have many points in the same location.
Use this option to slightly disperse the points so that individual plot points can be
more easily distinguished.
Graph orientation. Click the icons for the desired graph orientation.
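The Jitter option works by adding a small random offset to each point. The sketch below is a minimal Python illustration, assuming offsets drawn uniformly from [-amount, amount] on both axes; the guide does not document the distribution or offset size Clementine uses, and the `amount` parameter is hypothetical.

```python
import random


def jitter(points, amount, seed=None):
    """Return copies of (x, y) points, each shifted by a small random offset."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three points stacked at the same location become three distinguishable points.
stacked = [(1.0, 2.0)] * 3
spread = jitter(stacked, amount=0.05, seed=42)
```

Keeping `amount` small relative to the axis range matters: the goal is to separate overlapping marks, not to move points far enough to change how the data reads.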
Since panel plots are generalized plots, you can include any number of elements in
your plots. For example, you could include points in a scatterplot and then add linear
regression lines.
To add plot elements:
E Select the type of element you want to add from Plot elements.
E Select the Statistic that you want applied to the element.
E If desired, select the Color by, Shape by, and Size by fields.
E Click Add.
The element is added to the element list, located directly beneath the Add button.
E After you have added all desired elements, click Update.
E Click Execute.
The points show a general relationship between cost and revenue. To get a better idea
of the relationship, we can add a linear regression line to the plots.
E If the graph control panel is not visible, click the double-arrowed button at the
top-right of the window.
E Click Update.
The linear regression lines appear in each of the paneled scatterplots. We see that
confections and drinks have the greatest increase in revenue with respect to cost
of promotion.
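Each regression line is fitted separately to the records in one panel. A minimal Python sketch of that idea follows, assuming ordinary least squares (the guide only calls them linear regression lines). The panel labels and (cost, revenue) numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_panel(rows):
    """rows: iterable of (panel, x, y); returns {panel: (slope, intercept)}."""
    panels = {}
    for panel, x, y in rows:
        xs, ys = panels.setdefault(panel, ([], []))
        xs.append(x)
        ys.append(y)
    return {panel: fit_line(xs, ys) for panel, (xs, ys) in panels.items()}


# Toy (panel, cost, revenue) rows; the labels and numbers are hypothetical.
rows = [
    ("Drinks", 1, 3), ("Drinks", 2, 5), ("Drinks", 3, 7),
    ("Bakery", 1, 2), ("Bakery", 2, 3), ("Bakery", 3, 4),
]
fits = fit_per_panel(rows)
```

Comparing slopes across panels is the numeric counterpart of comparing how steeply the lines rise in each plot.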
Figure 4-4
Using the graph control panel to add regression lines to the plot
Chapter 5
Pie Charts
Pie Chart Node Overview
You use a pie chart to visually represent the number of cases or percentage of
various categories as pieces in a pie. This allows you to quickly view the relative
distribution within a category of a field.
Figure 5-1
Pie chart showing distribution of regions
Categories with larger slices of the pie indicate a relatively larger number of cases
or a higher percentage.
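The size of each slice follows from simple counting. A minimal Python sketch, assuming the slice field arrives as a list of category values (the "N"/"Y" codes here are hypothetical, not the actual values of any field in this guide's examples):

```python
from collections import Counter


def pie_slices(values):
    """Return {category: (count, percentage)} for a list of category values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


slices = pie_slices(["N", "N", "N", "Y"])
```

Multiplying a percentage by 3.6 gives the slice angle in degrees, so the 75% slice here spans 270 degrees of the pie.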
Figure 5-2
Setting options for a Pie Chart node
chart.
Percentage of. If you choose to display percentages, specify a field which determines
And. Optionally specify another field by which you will panel the pie charts along
the other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over the pie slices.
Toggle the display of category labels on the slices.
Print the graph.
Save the graph in PNG format.
E In the Pie Chart node dialog, select mortgage as the slice field.
E Select Counts for the Display.
E Select region and sex as the paneling fields.
E Click Execute.
We see that for both males and females, across all regions, the majority of respondents
don't have mortgages. In addition, for females in rural and suburban areas, less than a
quarter of the respondents had a mortgage. Of the other population segments, about
one-third of the respondents had a mortgage.
Chapter 6
Scatterplot Matrix
The diagonal of a scatterplot matrix consists of histograms for each of the fields, as
plotting a field against itself does not add any value to the plot.
Figure 6-2
Setting options for a Splom node
least two plot fields are required to display an actual scatterplot matrix. If you specify
only one field to plot, the graph displays a histogram of the selected field.
Bin data. Check this box if you want to bin your data for fields of the type scale.
Selecting this option can be useful when your data set is large.
Number of bins. If you choose to bin your data, specify the number of bins.
E After you specify the desired options, click Execute to create the graph.
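Binning replaces individual points with counts per cell, so the number of marks to draw no longer grows with the number of records. The Python sketch below assumes equal-width bins on each axis of one scatterplot; how Clementine actually places the bins and renders the counts is not described in this guide.

```python
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins covering [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def bin_points(xs, ys, n_bins):
    """Count points falling in each (x_bin, y_bin) cell of a scatterplot."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    return Counter(
        (bin_index(x, x_lo, x_hi, n_bins), bin_index(y, y_lo, y_hi, n_bins))
        for x, y in zip(xs, ys)
    )


cells = bin_points([0, 1, 9, 10], [0, 1, 9, 10], n_bins=2)
```

With two bins per axis, the four points collapse into two occupied cells holding two points each.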
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over the points in the scatterplots or bars in the histograms.
Enable brushing. With brushing enabled, you can hover the mouse pointer over points in the scatterplots or bars in the histograms in the graph to highlight them. The corresponding points and bars are highlighted in all scatterplots and histograms.
Print the graph.
Save the graph in PNG format.
farmsize
claimvalue
claimdiff
landquality
farmincome
estincome
E Click Execute.
We see that several combinations of fields have linear relationships, indicated by the
groupings that look similar to lines going from the bottom-left to the top-right of
the scatterplots. However, other combinations of fields may be more interesting. If
we examine the scatterplot of claim value against claim difference, we see that the
highest values of claim difference occur for the lowest claim values. Similarly, we
see that claim difference is highest for low-income farms. Finally, the scatterplot
of farm size against claim difference shows some outliers. While the majority of
plot points show a generally low value for claim difference, for some small- and
medium-sized farms the claim difference is unusually high.
Chapter 7
Parallel Coordinates
These plots are useful for detecting trends across variables in addition to revealing
outliers in the data. Most often, the analysis is performed on fields that are ranges, but
you can include set fields in the plot.
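In a parallel coordinates chart, each plot field gets its own vertical axis and each record is drawn as a line connecting its values on those axes. The Python sketch below computes those line vertices for numeric fields, rescaling each field to 0 to 1 with its own minimum and maximum. That rescaling rule is an assumption about what the node's standardize option does, and the Age, Na, and K values are toy numbers.

```python
def polylines(records, fields):
    """Turn each record into the vertices of a parallel coordinates line.

    Each field gets its own vertical axis at x = 0, 1, 2, ...; the y value is
    the record's value rescaled to 0-1 using that field's min and max
    (an assumed standardization rule).
    """
    lows = {f: min(r[f] for r in records) for f in fields}
    highs = {f: max(r[f] for r in records) for f in fields}
    lines = []
    for r in records:
        vertices = []
        for axis, f in enumerate(fields):
            span = highs[f] - lows[f]
            y = (r[f] - lows[f]) / span if span else 0.0
            vertices.append((axis, y))
        lines.append(vertices)
    return lines


records = [
    {"Age": 20, "Na": 0.9, "K": 0.02},
    {"Age": 60, "Na": 0.5, "K": 0.08},
]
lines = polylines(records, ["Age", "Na", "K"])
```

Records that share a profile, such as low Age, high Na, and low K, produce lines that run close together, which is what makes groupings visible in this kind of chart.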
Figure 7-2
Setting options for a parallel coordinates chart
than between the maximum and minimum of each range. The graph is standardized
by default.
Color by. Field by which you will color the lines.
Bin data. Check this box if you want to bin your data for fields that are ranges.
Selecting this option can be useful when your data set is large.
Number of bins. If you choose to bin your data, specify the number of bins.
Panel by. Optionally specify a field by which you will panel the chart.
And. Optionally specify another field by which you will panel the charts along the
other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over the plotted lines at the intersections.
Toggle fisheye lensing. When enabled, click on an area in the graph to zoom the local area as if you were viewing the graph through a camera fisheye lens. Hold the mouse button down and move the mouse pointer to apply the fisheye effect to other areas of the graph. This is useful if you need to explore an area where there is a high concentration of lines.
Enable brushing. In parallel coordinates plots, it may often be difficult to distinguish individual lines when many are clustered around a certain area. With brushing enabled, you can hover the mouse pointer over lines in the graph to highlight them. You can then easily view the intersections of the highlighted line.
Print the graph.
Save the graph in PNG format.
Kmeans.
E In the Parallel Coordinates node dialog, select Age, Na, and K as the Plot fields.
E Select standardize data.
E Select $KM-KMeans as the Color by and Panel by fields.
E Click Execute.
Figure 7-3
Using a parallel coordinates plot to detect clusters
We see that in cluster-1, the lines concentrate at lower values of Age and K and at higher
values of Na. In cluster-2, we see concentrations at the opposite ends of the ranges,
with high values for Age and K, and lower values for Na. However, in cluster-3,
there are no clear groupings. The lines are distributed along each of the
three axes without any clear pattern.
Chapter 8
Link Analysis Plots
Figure 8-1
Example of a graph showing the paths users take through a Web site
Your data must be in a specific format if you want to use a link analysis plot. Each
record defines a single connection, or link, using two fields (a FROM field and a TO
field) that must be of the type string. The values for these fields represent nodes that are
connected. For example, if there is a connection between nodes named A and B,
then there should be a record where A is the value for the FROM field and B is
the value for the TO field. Note that if the values of the FROM and TO fields are
identical, a node will point to itself. Multiple records that define connections between
the same nodes in the same direction are not aggregated, so you should avoid them or
prepare your data before creating the plot.
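One way to prepare such data is to collapse repeated FROM/TO pairs into a single record before the graph node. The Python sketch below assumes each record is a dictionary with "FROM" and "TO" keys plus a numeric size field, and that summing the size values is the right way to combine duplicates; depending on what the field measures, a maximum or mean might suit better.

```python
from collections import defaultdict


def aggregate_links(records, size_field):
    """Merge records that repeat the same FROM -> TO link, summing size_field.

    Direction matters: A -> B and B -> A remain separate links.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["FROM"], rec["TO"])] += rec[size_field]
    return [
        {"FROM": src, "TO": dst, size_field: total}
        for (src, dst), total in totals.items()
    ]


links = aggregate_links(
    [
        {"FROM": "A", "TO": "B", "Traffic": 100},
        {"FROM": "A", "TO": "B", "Traffic": 50},
        {"FROM": "B", "TO": "A", "Traffic": 10},
    ],
    "Traffic",
)
```

The two A-to-B records become one link with a combined size of 150, while the reverse B-to-A link is kept on its own.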
Other fields can be provided that describe the attributes for each connection. An
example of this would be a field that defines the size of the links. You can also have
fields in your data that define the color and shape of the connections; these aesthetics
are applied to the links.
Figure 8-2
Setting options for a Link Analysis plot node
end. This field can be a flag, set, or unknown. This field is required.
Link size. Optionally specify a field that describes the size of the links between the
nodes. The field that you select must be of the type range.
Link color. Optionally specify another field that determines the color of the connection
between the nodes. The field that you select must be of the type range.
Link shape. Optionally specify another field that determines the shape of the
connection between the nodes. The field that you select must be of the type range.
Graph layout. Specify the type of layout for the connections between the nodes.
Options available are: circle, network, random, and tree.
Note: If you select the tree layout, your data must be in the format of a tree. That is,
the data must describe a tree with a single root. If you select a tree layout and the data
is not in this format, the graph will be blank.
Link style. Specify how the links between the nodes will appear. Options available
Toolbar button descriptions:
Show the nodes as labels instead of points.
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over nodes or connecting links.
Toggle node dragging. When enabled, allows you to move nodes by dragging them.
Toggle fisheye lensing. When enabled, click on an area in the graph to zoom the local area as if you were viewing the graph through a camera fisheye lens. Hold the mouse button down and move the mouse pointer to apply the fisheye effect to other areas of the graph. This is useful if you need to explore an area where there is a high concentration of nodes and links.
Enable zooming. To zoom in on an area of the graph, click and drag a rectangle around the area you want to explore.
Zoom out one level.
Print the graph.
Save the graph in PNG format.
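Because the tree layout produces a blank graph when the data is not a single-rooted tree, it can help to check the FROM/TO pairs first. The Python sketch below tests three conditions that are assumed here to define such a tree: no node has more than one parent (and none links to itself), exactly one node has no parent, and every node can be reached from that root. The page names in the example are hypothetical.

```python
def is_single_root_tree(links):
    """Check that (FROM, TO) pairs describe a tree with exactly one root."""
    parents = {}
    nodes = set()
    for src, dst in links:
        nodes.update((src, dst))
        if dst in parents or src == dst:
            return False  # a second parent, or a self-link, breaks the tree shape
        parents[dst] = src
    roots = nodes - set(parents)
    if len(roots) != 1:
        return False
    # Walk down from the root; in a tree every node is reachable from it.
    children = {}
    for src, dst in links:
        children.setdefault(src, []).append(dst)
    seen, stack = set(), [next(iter(roots))]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == nodes


site = [("home", "products"), ("home", "support"), ("products", "p1")]
```

Running the check on `site` returns True; adding a second link into "p1" from another page, or a separate link between two unrelated pages, makes it return False.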
E Select Traffic as the Link size.
E Select network as the Graph layout.
E Select straight as the Link style.
E Click Execute.
The nodes represent pages in the Web site and the lines between the nodes
represent the amount of traffic from one page to the other. Thicker lines represent
more traffic and the legend shows how much traffic is indicated by the link thickness.
You may first want to view the nodes as points instead of labels to view the general
layout and to see where page traffic is lightest and heaviest.
In this example, we see that the heaviest page traffic is at the top of the graph,
where the links are thickest. Suppose you want to examine the areas with the heaviest
traffic in more detail. From the graph legend, we see that the highest amount of traffic
ranges from eight thousand to ten thousand page hits. Attach a Select node in between
the source node and the graph node to select only those cases where the value for
Traffic is greater than or equal to eight thousand. When you execute the graph node
again, you will see only those nodes that have the highest traffic between them.
Figure 8-4
Graph after using a Select node to view only high-traffic links
There are significantly fewer nodes in this graph, allowing you to view nodes with
the heaviest links without the clutter of all the nodes in the data set. Note that the
nodes in the graph are not necessarily all interconnected, because you have selected only a
subset of all the links. You can easily identify the heaviest link in the entire data set, which
exists between pages 15 and 17.
By visualizing the traffic in this way, you can gain insight as to how visitors
are using your Web site. You might make decisions about where to concentrate
advertising, or if traffic is light in some areas, where you might need to improve
navigation to other pages.
Chapter 9
Categorical Heat Maps
Categorical Heat Map Node Overview
The input data for a categorical heat map must have categorical data in which the
unique categories of two set fields define the rows and the columns of the heat map. A
table cell in this context is the combination of a category from each of the defining set
fields. This is similar to a traditional cross table, except colors are used for the values
of the table cells. A statistic is applied to data contained within each cell and the result
of the statistic is displayed using color. Paneling is supported for this type of heat map.
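The value behind each colored cell can be sketched as a grouped statistic, much like a cross table. In the Python below, the field names education, marital, and income are hypothetical and the records are toy values. Passing `len`, `max`, `statistics.median`, `min`, or `sum` as `stat` gives the other statistics the node offers (count, maximum, median, minimum, and sum).

```python
from collections import defaultdict
from statistics import mean


def heat_map_cells(records, row_field, col_field, value_field, stat=mean):
    """Apply stat to the values in each (row category, column category) cell."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_field], rec[col_field])].append(rec[value_field])
    # Cells with no records are simply absent, like uncolored cells in the chart.
    return {cell: stat(values) for cell, values in cells.items()}


records = [
    {"education": "Bachelor", "marital": "Divorced", "income": 3.0},
    {"education": "Bachelor", "marital": "Divorced", "income": 2.0},
    {"education": "Some_College", "marital": "Never_Mar", "income": 2.0},
]
cells = heat_map_cells(records, "education", "marital", "income")
```

The color scale in the legend would then be mapped over the range of these cell values.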
The other type of heat map is a table heat map. For more information, see Table
Heat Map Node Overview in Chapter 10 on p. 53.
Figure 9-2
Setting options for a Categorical Heat Map node
defined by each row and column category combination. The value of the statistic
is displayed using a range of colors, indicated in the legend. The following options
are available: count, maximum, mean, median, minimum, and sum.
Panel by. Optionally specify a field by which you will panel the heat map.
And. Optionally specify another field by which you will panel the heat map along
the other axis.
Legend. Optionally specify the location of the legend in the chart. The location of
the legend is specified using standard compass directions. For example, selecting se
(Southeast) will place the legend in the lower right corner of the graph window.
E After you set the desired options, click Execute to create the graph.
Toolbar button descriptions:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over the colored table cells.
Print the graph.
Save the graph in PNG format.
Bachelor, Individual, Never_Mar, White, Male, NA, 2.0
Bachelor, Family, Married_SP, White, Male, NA, 2.0
Some_College, Individual, Divorced, White, Female, Sophomore, 2.0
Some_College, Secondary, Never_Mar, White, Female, Junior, 2.13
Mater, Individual, Never_Mar, White, Female, NA, 2.16
.....
Figure 9-3
Categorical heat map showing earnings
In the heat map of the divorced males panel, we see that those with bachelor's degrees
seem to have high income. The absence of colored cells in the bottom left panel
indicates that we do not have much income data for widowed males.
Chapter 10
Table Heat Maps
Table Heat Map Node Overview
The input data for a table heat map must be tabular data that contains one symbolic
field that defines the labels for each row of data and n fields of numeric data,
where each field becomes a column in the heat map. In this case, each cell in the
resulting heat map is a numeric value from the original data set, displayed using color.
Clustering is automatically performed to intelligently sort both the rows and columns
of the heat map. This will group like values into common regions within the heat
map, which can make detecting patterns easier. Since the fields you use
to define the columns may be on different scales, a standardize option is provided.
The other type of heat map is a categorical heat map. For more information, see
Categorical Heat Map Node Overview in Chapter 9 on p. 47.
Figure 10-2
Setting options for a Table Heat Map node
map.
Standardize data. Choose this option if you want to standardize your data; your data
will be converted such that the values will range from 0 to 1, rather than the range of
values in your data set.
Table Heat Maps
Legend. Optionally specify the location of the legend in the chart, using standard
compass directions. For example, selecting se (Southeast) places the legend in the
lower right corner of the graph window.
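The compass convention can be read as a pair of placements: north/south pick the vertical edge and east/west pick the horizontal edge. In the minimal sketch below, only the code se is taken from this manual; the full set of eight codes, the function name, and the position names it returns are assumptions made for illustration.

```python
# Assumed set of codes; the manual documents only "se" explicitly.
COMPASS_CODES = {"n", "ne", "e", "se", "s", "sw", "w", "nw"}


def legend_position(code):
    """Return a (vertical, horizontal) placement for a compass-direction code."""
    code = code.strip().lower()
    if code not in COMPASS_CODES:
        raise ValueError(f"not a compass direction: {code!r}")
    # North/south choose the top or bottom edge; neither means vertically centered.
    vertical = "top" if "n" in code else "bottom" if "s" in code else "middle"
    # East/west choose the right or left edge; neither means horizontally centered.
    horizontal = "right" if "e" in code else "left" if "w" in code else "center"
    return vertical, horizontal
```

Under these assumptions, `legend_position("se")` returns `("bottom", "right")`, matching the lower right corner described above.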
E After you set the desired options, click Execute to create the graph.
The graph window toolbar contains the following buttons:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over colored table cells.
Print the graph.
Save the graph in PNG format.
With tooltips enabled, hover over the darkest red cells in the heat map. The tooltips
indicate that the genes Ka1, actin, and NFL have the highest expression values.
To see how standardizing the data changes the appearance of the heat map, re-create
the graph without the Standardize option. Because gene expression levels span
different ranges of values, it is more difficult to see concentrations for a specific gene.
Figure 10-4
Table heat map with non-standardized data
Chapter
11
Map Charts
Figure 11-2
Setting options for a Map Chart node
Note: The system provides basic United States maps. If you want to use your own
map, select Custom. If you use a custom map, then you must create a map file with
the Map Editor. For more information about using custom maps, see the Map Editor
tutorial.
Map. Required only if you select United States as the type of map. Specify the map
that you want to use. The available maps are US States, US Counties, US Lower 48
States, and US Lower 48 State Counties.
Map file. Required only if you use a custom map. Select the name of the .zip file that
you created with the Map Editor. The attributes that you choose to include when you
build the map file with the Map Editor are matched against the Map attribute data
field from the data set.
Map attribute data. This field is required for both United States and custom maps.
Select the field in your data that contains the values that match the geographic
attributes (for example, state names or FIPS codes). For United States maps, this
field must contain full state names (for example, Arizona or Michigan) when you select
US States or US Lower 48 States, or valid county FIPS codes when you select US
Counties or US Lower 48 State Counties. The field must be of type String.
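One reason the field must be a String: county FIPS codes are five digits (a two-digit state code followed by a three-digit county code) and can begin with zero, so a numeric field would turn "01001" (Autauga County, Alabama) into 1001. The sketch below is a hypothetical pre-check you could run on your data before building the graph; the function and set names are illustrative, the state list is deliberately partial, and the FIPS test only checks the format, not whether the code exists.

```python
import re

# Illustrative subset only: a real check would list every state the chosen map contains.
STATE_NAMES = {"Arizona", "Michigan", "New York", "Texas"}

# Five ASCII digits: 2-digit state code + 3-digit county code.
COUNTY_FIPS = re.compile(r"[0-9]{5}")

STATE_MAPS = {"US States", "US Lower 48 States"}
COUNTY_MAPS = {"US Counties", "US Lower 48 State Counties"}


def matches_map(value, map_type):
    """Return True if `value` has the form the chosen United States map expects."""
    if not isinstance(value, str):
        raise TypeError("Map attribute data must come from a String field")
    if map_type in STATE_MAPS:
        return value in STATE_NAMES  # full state names, not abbreviations
    if map_type in COUNTY_MAPS:
        return COUNTY_FIPS.fullmatch(value) is not None  # format check only
    raise ValueError(f"unknown map type: {map_type!r}")
```

For instance, `matches_map("01001", "US Counties")` is true, while `matches_map(str(int("01001")), "US Counties")` is false because the leading zero was lost.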
Note: A node named StateCodes.nod and a stream named USMapExample.str are
available in the data folder of your Advanced Visualization installation (by default,
C:\Program Files\Clementine\9.0\CEMI\AdvancedVisCEMI\data). This example
stream and node contain a Clementine Reclassify node that you can use to translate
state abbreviations into the required full state names.
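The translation that the Reclassify node performs amounts to a lookup from abbreviation to full name. The sketch below shows that idea outside Clementine, for example to check a data file before importing it. The dictionary is a deliberately partial, hypothetical table; the shipped StateCodes.nod node remains the way to do this inside a stream.

```python
from typing import Optional

# Partial lookup table for illustration; extend it to cover every state you need.
ABBREV_TO_NAME = {
    "AZ": "Arizona",
    "MI": "Michigan",
    "NY": "New York",
    "TX": "Texas",
}


def reclassify_state(abbrev: str) -> Optional[str]:
    """Translate a two-letter state abbreviation to the full state name, or None if unmapped."""
    return ABBREV_TO_NAME.get(abbrev.strip().upper())
```

Usage: `[reclassify_state(a) for a in ["az", "MI ", "ZZ"]]` gives `["Arizona", "Michigan", None]`; a None result flags a value that would not match any state on the map.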
Color by. Select the field that will be represented by colors in the regions of the map.
Statistic. Select the statistic to apply to the Color by field. The following statistics are
The graph window toolbar contains the following buttons:
Toggle the display of tooltips. If enabled, data values appear in tooltips when you hover the mouse pointer over areas of the map.
Enable zooming. To zoom in on an area of the graph, click and drag a rectangle around the area you want to explore.
Zoom out one level.
Print the graph.
Save the graph in PNG format.
E Select Lambert as the projection.
E Click Execute.
Note that some states are missing data for this survey. As a result, the graph is
rendered without those states. Because the data are broken down by county, it is
easier to examine the results by zooming in on an area of interest.
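Why regions without data drop out can be sketched as a join between records and map regions followed by a per-region statistic. In this hedged sketch the function name is hypothetical, the region keys are county FIPS codes, and the mean stands in for whichever statistic you select for the Color by field; it is not a description of the node's internals.

```python
from collections import defaultdict
from statistics import mean


def region_values(records, regions, statistic=mean):
    """Aggregate Color by values per map region.

    records: iterable of (region_key, color_by_value) pairs from the data.
    regions: set of region keys present in the map (state names or FIPS codes).
    statistic: function applied to each region's values; mean is only an example.

    Regions with no matching records are absent from the result, so a renderer
    that draws only regions with a value would skip them.
    """
    grouped = defaultdict(list)
    for key, value in records:
        if key in regions:  # keys that match no map region are ignored
            grouped[key].append(value)
    return {key: statistic(values) for key, values in grouped.items()}
```

With regions `{"01001", "01003", "01005"}` and records `[("01001", 0.25), ("01001", 0.75), ("01003", 0.5)]`, the result is `{"01001": 0.5, "01003": 0.5}`; county 01005 has no value and would not be drawn.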
E In the graph window, click the zoom toolbar button to enable zooming.
E With the mouse pointer, click and drag a rectangle around the area you want to
explore further.
Note: The area highlighted by the rectangle you drag conforms to the projection that
you use. A Lambert projection yields a selection area with curved edges, while a
Mercator projection yields a selection area with straight edges.
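The difference comes from how each projection draws a parallel of latitude: Mercator maps it to a horizontal straight line, while Lambert conformal conic maps it to a circular arc. The sketch below uses the standard spherical formulas for both projections (as given in Snyder's projection manual) and projects points along 40°N. The unit-sphere radius, the standard parallels 33°N and 45°N, the origin latitude 39°N, and the central meridian 96°W are assumptions typical for maps of the conterminous United States, not settings documented for this node.

```python
import math

R = 1.0  # unit sphere (assumed)


def mercator(lat_deg, lon_deg, lon0_deg=-96.0):
    """Spherical Mercator: x = R * dlon, y = R * ln(tan(pi/4 + lat/2))."""
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg - lon0_deg)
    return R * dlon, R * math.log(math.tan(math.pi / 4 + lat / 2))


def lambert_conformal_conic(lat_deg, lon_deg, lat1=33.0, lat2=45.0, lat0=39.0, lon0=-96.0):
    """Spherical Lambert conformal conic with two standard parallels."""
    p1, p2, p0 = map(math.radians, (lat1, lat2, lat0))
    phi = math.radians(lat_deg)
    dlon = math.radians(lon_deg - lon0)

    def t(p):
        return math.tan(math.pi / 4 + p / 2)

    n = math.log(math.cos(p1) / math.cos(p2)) / math.log(t(p2) / t(p1))  # cone constant
    F = math.cos(p1) * t(p1) ** n / n
    rho = R * F / t(phi) ** n    # radius of the arc for this latitude
    rho0 = R * F / t(p0) ** n    # radius at the origin latitude
    theta = n * dlon
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# Walk along the 40th parallel from 110°W to 82°W.
lons = [-110.0, -103.0, -96.0, -89.0, -82.0]
merc_ys = [mercator(40.0, lon)[1] for lon in lons]
lcc_ys = [lambert_conformal_conic(40.0, lon)[1] for lon in lons]
```

All Mercator y values are identical (a straight edge), whereas the Lambert y value is lowest at the central meridian and rises toward both ends (a curved edge), which is why a dragged selection follows curved boundaries under Lambert.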
E The map zooms in on the area you selected.
Figure 11-4
Zooming in on an area of the map
After zooming, you can easily explore the counties. Hover your mouse pointer over
each area to view the response rate for that county.
Figure 11-5
Detailed view of the map
Index
.spc file, 2
advanced visualization
  overview, 1
bar chart
  creating, 6
  example, 8
  introduction, 5
  modifying, 7
  using, 7
box plot
  creating, 12
  example, 13
  introduction, 11
  using, 13
installing
  on Windows, 2
map chart
  creating, 59, 60
  example, 62
  using, 62
panel plot
  creating, 16
  example, 19
  introduction, 15
  modifying, 18
  using, 17
parallel coordinates chart
  creating, 34
  example, 36
  using, 35
parallel coordinates plot
  introduction, 33
pie chart
  creating, 24
  example, 25
  introduction, 23
  using, 25
scatterplot matrix
  example, 29
  using, 29
table heat map
  creating, 54
  example, 55
  introduction, 53
  using, 55
updating Clementine, 2