
Importance-Driven Time-Varying Data Visualization
Chaoli Wang, Hongfeng Yu, Kwan-Liu Ma
University of California, Davis
Importance-Driven Volume Rendering

[Viola et al. 04]


Differences

• Importance-driven volume rendering [Viola et al. 04]
  • Medical or anatomical data sets
  • Pre-segmented objects
  • Importance assignment
  • Focus on rendering
• This work
  • Time-varying scientific data sets
  • No segmentation or pre-defined objects are given
  • Importance measurement
  • Focus on data analysis
Questions

• How to capture the important aspects of the data?


• Importance – amount of change, or “unusualness”
• How to utilize the importance measure?
• Data classification
• Abnormality detection
• Time budget allocation
• Time step selection
Related Work

• Time-varying data visualization


• Spatial and temporal coherence
[Shen et al. 94, Westermann 95, Shen et al. 99]
• Compression, rendering, presentation
[Guthe et al. 02, Lum et al. 02, Woodring et al. 03]
• Transfer function specification
[Jankun-Kelly et al. 01, Akiba et al. 06]
• Time-activity curve (TAC) [Fang et al. 07]
• Local statistical complexity (LSC) [Jänicke et al. 07]
Importance Analysis

• Block-wise approach
• Importance evaluation
• Amount of information a block contains by itself
• New information w.r.t. other blocks in the time series
• Information theory
• Entropy
• Mutual information
• Conditional entropy
Information Theory

• Entropy
$H(X) = -\sum_{x \in X} p(x) \log p(x)$

• Mutual information
$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$
• Conditional entropy
$H(X \mid Y) = H(X) - I(X;Y)$

p(x), p(y): Marginal probability distribution functions
p(x,y): Joint probability distribution function
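
A minimal NumPy sketch (not the authors' code) of the three quantities above, computed from a joint histogram; using log base 2 is an assumption and gives values in bits.

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_x p(x) log p(x); empty bins contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) from a joint distribution p(x,y) stored as a 2D array."""
    p_x = p_xy.sum(axis=1)                      # marginal p(x)
    p_y = p_xy.sum(axis=0)                      # marginal p(y)
    mask = p_xy > 0
    ratio = p_xy[mask] / np.outer(p_x, p_y)[mask]
    return np.sum(p_xy[mask] * np.log2(ratio))

def conditional_entropy(p_xy):
    """H(X|Y) = H(X) - I(X;Y)."""
    return entropy(p_xy.sum(axis=1)) - mutual_information(p_xy)

# Example: normalize a table of joint bin counts into p(x,y).
counts = np.array([[8.0, 2.0], [1.0, 9.0]])
p_xy = counts / counts.sum()
print(entropy(p_xy.sum(axis=1)), mutual_information(p_xy), conditional_entropy(p_xy))
```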
Relations with Venn Diagram

[Venn diagram: H(X) and H(Y) overlap in I(X;Y); the non-overlapping parts are H(X|Y) and H(Y|X)]

I(X;Y) = I(Y;X), but H(X|Y) ≠ H(Y|X)


Entropy in Multidimensional Feature Space

• Feature vector F = (f1, f2, f3, …)
  • Data value
  • Gradient magnitude or other derivatives
  • Domain-specific quantities
• Multidimensional histogram
  • Use the normalized bin count as probability p(x)

$H(X) = -\sum_{x \in X} p(x) \log p(x)$
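
A sketch of the per-block entropy described above, assuming `feature_vectors` is an (n_voxels, 3) array holding F = (f1, f2, f3) for one data block; the bin counts are illustrative, not the paper's settings.

```python
import numpy as np

def block_entropy(feature_vectors, bins=(16, 16, 16)):
    # Multidimensional histogram over the feature space.
    counts, _ = np.histogramdd(feature_vectors, bins=bins)
    # Normalized bin counts serve as the probability p(x).
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# e.g., block_entropy(np.random.rand(5000, 3)) for a synthetic block
```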
Importance in Joint Feature-Temporal Space

• Consider two data blocks X and Y, with feature vectors F = (f1, f2, f3, …), at
  • the same spatial location
  • neighboring time steps
• Use joint feature-temporal histogram
• Use the normalized bin count as probability p(x,y)
• Run-length encode the histogram
$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$
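
A simplified sketch of the joint feature-temporal histogram: each voxel's feature vector in blocks X and Y (same location, neighboring time steps) is quantized to a bin id, and the 2D histogram of (id_X, id_Y) pairs gives p(x,y). The run-length encoding mentioned above is omitted, and the quantization scheme and bin count are my assumptions.

```python
import numpy as np

def quantize(features, bins=8):
    """Map each voxel's feature vector to a single flat histogram bin id."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = (hi - lo) + 1e-12                      # avoid division by zero
    idx = np.minimum(((features - lo) / span * bins).astype(int), bins - 1)
    return np.ravel_multi_index(idx.T, (bins,) * features.shape[1])

def joint_distribution(block_x, block_y, bins=8):
    """Joint feature-temporal histogram of two blocks, normalized to p(x, y)."""
    ids_x = quantize(block_x, bins)
    ids_y = quantize(block_y, bins)
    n = bins ** block_x.shape[1]
    counts = np.zeros((n, n))
    np.add.at(counts, (ids_x, ids_y), 1.0)        # one count per voxel pair
    return counts / counts.sum()
```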
Importance Value Calculation

• Consider a time window for neighboring blocks

• Importance of a data block Xj at time step t:


$A_{X_j,t} = \sum_{i=1}^{M} w_i \, H(X_{j,t} \mid Y_{j,i})$

• Importance of time step t:


$A_t = \sum_{j=1}^{N} A_{X_j,t}$
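
A sketch of the importance sums above, reusing the `entropy`, `mutual_information`, and `joint_distribution` helpers from the earlier sketches; uniform weights w_i are an assumption, not the paper's choice.

```python
import numpy as np

def block_importance(block_t, neighbor_blocks, weights=None, bins=8):
    """A_{X_j,t}: weighted sum of H(X_{j,t} | Y_{j,i}) over the time window."""
    M = len(neighbor_blocks)
    w = np.full(M, 1.0 / M) if weights is None else np.asarray(weights, float)
    total = 0.0
    for w_i, block_i in zip(w, neighbor_blocks):
        p_xy = joint_distribution(block_t, block_i, bins)
        h_x = entropy(p_xy.sum(axis=1))                     # H(X_{j,t})
        total += w_i * (h_x - mutual_information(p_xy))     # H(X_{j,t} | Y_{j,i})
    return total

def time_step_importance(block_importances):
    """A_t: sum of block importances over the N blocks of one time step."""
    return float(np.sum(block_importances))
```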
Importance Curve – Earthquake Data Set

[Importance curve: importance (I) over time (T) shows a regular pattern]
Importance Curve – Climate Data Set

[Importance curve: importance (I) over time (T) shows a periodic pattern]
Importance Curve – Vortex Data Set

[Importance curve: importance (I) over time (T) shows a turbulent pattern]
Clustering Importance Curves

• Hybrid k-means clustering [Kanungo et al. 02]


• Lloyd’s algorithm
• Local search by swapping centroids
• Avoid getting trapped in local minima
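
For illustration only: the paper uses the hybrid k-means of [Kanungo et al. 02] (Lloyd's algorithm plus a centroid-swapping local search), while this stand-in uses scikit-learn's standard k-means just to show the data layout. `curves` is assumed to be an (n_blocks, n_time_steps) array, one importance curve per block.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_importance_curves(curves, n_clusters=4, seed=0):
    """Group blocks whose importance curves behave similarly over time."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.asarray(curves))   # cluster id per block
    return labels, km.cluster_centers_            # representative curves
```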
Clustering All Time Steps vs. Time Segments

• 599 time steps, grouped into 50 segments
• 1200 time steps, grouped into 120 segments
• 90 time steps, 90 segments
Cluster Highlighting – Earthquake Data Set
Cluster Highlighting – Hurricane Data Set
Cluster Highlighting – Climate Data Set
Cluster Highlighting – Vortex Data Set
Cluster Highlighting – Combustion Data Set
Abnormality Detection

A: El Niño
B: La Niña
Time Budget Allocation

• Allocate time budget based on importance value


$\Delta_t = \Delta \cdot \frac{A_t}{\sum_{i=1}^{T} A_i}$

• Animation time
• Non-even allocation
• Rendering time
• Assign to each time step (and each block in a time step)
• Adjust the sampling spacing accordingly
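
A sketch of the allocation rule above: each time step receives a share of the total budget (animation seconds, or rendering samples) proportional to its importance A_t. Variable names are mine, not the paper's.

```python
import numpy as np

def allocate_budget(importance, total_budget):
    """Split total_budget across time steps in proportion to A_t."""
    a = np.asarray(importance, dtype=float)
    return total_budget * a / a.sum()

# e.g., seconds_per_step = allocate_budget(A, 60.0) for a 60-second animation
```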
Time Step Selection

• Uniform selection
• Importance-driven selection
• Select the first time step
• Partition the remaining time steps into (K-1) segments
• In each time segment, select the time step that maximizes the joint entropy with the time steps already selected
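
A simplified sketch of the selection loop: keep the first time step, split the rest into (K-1) segments, and in each segment pick the highest-scoring candidate. As an assumption, the score here is the conditional entropy with respect to the most recently selected time step (a cheap stand-in for maximizing the joint entropy of the whole selection); `cond_entropy` is a user-supplied function.

```python
import numpy as np

def select_time_steps(n_steps, k, cond_entropy):
    """Pick k representative time steps out of n_steps (importance-driven)."""
    selected = [0]                                     # always keep the first step
    for segment in np.array_split(np.arange(1, n_steps), k - 1):
        scores = [cond_entropy(t, selected[-1]) for t in segment]
        selected.append(int(segment[int(np.argmax(scores))]))
    return selected
```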
Precomputation and Clustering Performance

• The test data sets with their parameter settings, sizes of joint feature-temporal histograms,
and timings for histogram calculation.

• Timing for clustering all time steps of the five test data sets.
Choices of Window and Bin Sizes

• The importance curve of the vortex data set with different time window sizes (W) and
numbers of bins for feature components F = (f1, f2, f3).
Choices of # of Clusters and Block Size

[Panels vary the number of clusters (3, 4, 5) and the block size (50×50×20, 20×20×20, 10×10×20)]

• The cluster of the highest importance values under different choices of number of clusters
and block size. Top row: color adjustment only. Bottom row: color and opacity adjustment.
Artifacts Along Block Boundaries

[Block sizes 20×20×20 and 10×10×20]
Summary

• Importance-driven data analysis and visualization


• Quantify data importance using conditional entropy
• Cluster the importance curves
• Leverage the importance in visualization
• Limitations
• Block-based classification
• Size of joint feature-temporal histogram
• Extensions
• Non-uniform data partition
• Incorporate domain knowledge
• Dimension reduction
Acknowledgements

• NSF
• CCF-0634913, CNS-0551727, OCI-0325934, OCI-0749227, and OCI-0749217
• DOE SciDAC Program
• DE-FC02-06ER25777, DE-FG02-08ER54956, and DE-FG02-05ER54817
• Data sets
• Combustion: Jacqueline H. Chen, SNL
• Climate: Andrew T. Wittenberg, NOAA
• Earthquake: CMU quake group
• Hurricane: NSF, IEEE Visualization 2004 Contest
