
There are problems with the following visualization designs. Please write down what you feel is wrong with these designs.

(a) [5 marks]
The y-axis is upside down; the chart needs to show the full scale (Tufte design principle).

Ignoring conventions. At first glance, it looks like gun deaths are on the decline in Florida. But a closer look shows that the y-axis is upside-down, with zero at the top and the maximum value at the bottom. As gun deaths increase, the line slopes downward, violating a well-established convention that y-values increase as we move up the page.

(b) [5 marks]
Unjustified 3D design.

Principle: The overall design is based on a metaphor of a DNA molecule that consists of two twisting helices. Likewise, our design visually represents two conflicting sides.

Community strands: By default, we polarize the community strand between two sentiment poles (e.g., positive and negative) and interpolate the data samples at different timestamps along a horizontal timeline. Thus, a smooth curving strand is created. In our design, we encode the sentiment information on each community strand to enhance the visual patterns driven by sentiments. First, the sentiment information is quantitatively represented as the screen distance between a community strand and a sentiment pole. Leaning toward one sentiment pole indicates that most of the people in the community share that sentiment. Second, the sentiment information is also represented by a color gradient from green to red, pertaining to two sentiment poles. The community size implies its influence and is encoded as the thickness of the strand. With this encoding scheme, we are able to identify whether the sentiment change of a community is caused by thousands of people or only by tens of people.

Event box: Inside the event box, keywords are represented as small bars. The size and color of the bar encode the normalized frequency and the sentiment of the keyword, respectively. Within different time windows, users may discuss the topics using different keywords, thus resulting in various events under the same topic. By default, an event box’s size varies based on the distance between two community strands. This design spontaneously assigns more space and forms a multi-focus view for displaying the details when divergence is large.

User group: A user group is visualized as a circle embedded within the community strand. The users within the group are represented as dots whose sizes and colors represent the users’ normalized activeness and their sentiments, respectively.

The design can clearly support tasks like comparison between two groups of opinions: 5 marks
Follow design principles & intuitive design: 5 marks
Good scalability: 3 marks
Design rationale: 2 marks

Edge-bundling is widely used to reduce visual clutter in graphs. The following figures show the original graph and three different edge-bundling algorithms. You do not need to know the details of these algorithms for answering this question.

(a) The original graph with 1715 nodes and 9780 edges showing the immigration among different states in the USA; (b) The edges are bundled using FDEB with inverse-linear model; (c) The edges are bundled with GBEB; (d) The edges are bundled using FDEB with inverse-quadratic model.

A. Based on your comparison of Fig. (a) with Figs. (b), (c), and (d), in your opinion, what are the advantages and disadvantages of edge bundling? [5 marks]
Advantage: reduce visual clutter and show the overall pattern (2.5 marks)
Disadvantage: lose detailed information (2.5 marks)

4. Problem Solving [10 marks]
Multi-dimension data is very common in many applications. Here is a sample dataset with five attributes.

The following figure shows the paths along which Napoleon’s troops marched to and retreated from Moscow:
Suppose we also know the exact date (year/month) for each dot in the map and the number of soldiers at that date.
Please design a visualization to encode the above information (i.e., time and number of soldiers).

3. The Securities and Futures Commission of Hong Kong has built a database to store all the news articles related to finance and stock. They once contacted us to develop a visualization system which can show the sentiments of different groups of people towards a finance policy and whether the sentiments change over time. For example, from the following articles, we know that Meng Xiaoning is negative on the first policy but is positive on the second policy.

--------------------
Article 1: Financial industry in Hong Kong and China slams government debt swap programme
….
“The government’s manner of enforcing the debt swap has been very crude,” said Meng Xiaoning, chief executive of TF Securities in Hong Kong.

Article 2: China's watchdogs step in to avert Sinosteel bond default
Mainland regulators make rare intervention in bond market, with the company's debt holders told to redeem their notes a month later
…
Meng Xiaoning, the chief executive of TF Securities, said he was cautiously optimistic on the drive to internationalise the country's bond market.
…
-----------------

Please design visualizations:
• Given a policy, visualize the sentiments of a group of people toward this policy and whether they change over time
• Given a person, visualize the sentiments of this person towards a set of policies and whether they change over time
• Your system also needs to show, at the request of users, the keyword summary of the newspapers related to the sentiments, like the one above

Sample Answer
There are mainly four factors to consider in the visualization: policies, people, sentiments, and time. The detailed information in the third requirement can be displayed during interaction.

1. One policy, group of people
For a given policy, the sentiment changes of a group of people can be visualized as a theme river. We can categorize the sentiment into positive, neutral, and negative, and use the width of the river to encode the strength / number of the sentiments.
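For the theme-river answer to the "one policy, group of people" case, the data preparation can be sketched as follows. This is only a minimal illustration: the record layout (period, person, score), the score range of -1 to 1, and the width of the neutral band are all assumptions, not part of the original question. Each count in the result is the width of one river layer at that time step.

```python
from collections import Counter, defaultdict

def classify(score, neutral_band=0.2):
    """Map a sentiment score in [-1, 1] to a category (band width is an assumption)."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

def river_layers(records):
    """Return {period: Counter(category -> count)}; each count is one layer's width."""
    layers = defaultdict(Counter)
    for period, _person, score in records:
        layers[period][classify(score)] += 1
    return dict(layers)

# Hypothetical records: (period, person, sentiment score towards one policy)
records = [
    ("2015-05", "A", -0.8), ("2015-05", "B", -0.5), ("2015-05", "C", 0.1),
    ("2015-10", "A", 0.6), ("2015-10", "B", 0.0), ("2015-10", "C", 0.7),
]
layers = river_layers(records)
for period in sorted(layers):
    print(period, dict(layers[period]))
```

Stacking the three counts per period and smoothing between periods gives the river shape; the same table could also drive a plain stacked area chart.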
2. One person, set of policies
A similar river-based visualization can be used. We can visualize each policy as a river, then encode the sentiment with color.
An advanced setting is multiple policies and groups of people. The river-based visualization can be extended to fulfill this requirement.
For example, the following Opinion Flow is a good design.

(a) Please design a visualization to show the data. [5 marks]
Multi-dimensional data visualization techniques such as parallel coordinates, scatter plot matrix
(b) From your visualization, what kind of tasks can be performed? [5 marks]
1. Clustering
2. Outlier detection
3. Positive / negative relation between different attributes
...
(5 marks for two or above reasonable tasks)

Please analyze the color scheme used in the PG defense demo:
http://vis.cse.ust.hk/pqeDefenceVis/

Answer 1: (10 marks)
Please refer to: https://robots.thoughtbot.com/analyzing-minards-visualization-of-napoleons-1812-march
Other answers sampled from the submitted exercises.

B. You are asked to design a controlled experiment to evaluate the three edge-bundling algorithms. The evaluation should be quantitative and as rigorous as possible. Please write down your detailed plan to conduct the evaluation. [15 marks]

Basic Info: within-subject design, recruit a group of participants (say 20) (3 marks)
Task: Track how many destination points an edge-bundle can split into (5 marks)
Dataset: 5 different real-world graphs (as synthetic graphs may have a uniform trend for each edge direction)
Technique: 3 different edge bundling algorithms with the default parameters
Independent variables: Edge bundling techniques & datasets
Controlled variables: 1920*1080 resolution display, same keyboard & mouse
(5 marks for above design setting)
Dependent variables: the number of destination points & response time
Analysis: use the original graph as the baseline, and use statistical methods such as ANOVA to check whether there are significant differences between different techniques (2 marks)
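For the analysis step of the edge-bundling experiment, one option that matches a within-subject design is a one-way repeated-measures ANOVA. The sketch below is an assumption about how that step could look, not the marking scheme's required method: it computes only the F statistic in plain Python, and the response times are invented for illustration. The p-value would still have to come from an F-distribution table or a statistics package.

```python
def repeated_measures_f(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the measurement of subject s under condition c.
    Returns (F, df_condition, df_error).
    """
    n = len(data)        # number of subjects
    k = len(data[0])     # number of conditions (techniques)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Between-subject variation is removed before forming the error term.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error

# Hypothetical response times in seconds:
# rows = participants, columns = the three bundling techniques
times = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 8.0],
]
f_stat, df1, df2 = repeated_measures_f(times)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

With 20 participants and three techniques the degrees of freedom would be 2 and 38; the same function can be run once per dependent variable (number of destination points, response time).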
3. Problem Solving [10 marks]
In the class, the professor mentioned that visualization is also some kind of translation, i.e., translating data to visual forms. It should obey the same rules of language translations (e.g., translating English to Chinese). Do you agree or disagree with his metaphor? Specifically, please write down the similarity and difference between language translation (e.g., English -> Chinese) and visual encoding (i.e., Data -> Visual Form).

Similarity: it is necessary to truthfully represent the data and to avoid fake patterns. (5 marks)
Difference: in visualization, it is common to utilize some interaction techniques such as filtering and clustering to represent a subset of the data for amplifying human cognition. (5 marks)

Answer 2: (10 marks)
Use the width of the trajectory to encode the number of soldiers.
Use a sequential color scheme to encode the time.

Answer 3: (6 marks) (Geo-Time)
Use the width of the trajectory to encode the number of soldiers.
Use the Z-axis to encode time.
Reasons to deduct marks: 3D has occlusion and distortion problems. You need to provide extra discussion on how you can eliminate these problems in a 3D context.
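The encoding in Answer 2 for the Napoleon question (trajectory width for the number of soldiers, a sequential color scheme for time) could be computed roughly as below. This is a sketch under assumptions: the waypoint values, the 40-pixel maximum width, and the light-to-dark blue ramp are illustrative choices, not data from the figure.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def stroke_width(soldiers, max_soldiers, max_width=40.0):
    """Width proportional to troop count, so halving the army halves the line."""
    return max_width * soldiers / max_soldiers

def sequential_color(t, light=(222, 235, 247), dark=(8, 48, 107)):
    """Single-hue sequential ramp: t=0 gives the light end, t=1 the dark end."""
    r, g, b = (round(lerp(lo, hi, t)) for lo, hi in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"

# (months since the start of the campaign, soldiers): illustrative values only
waypoints = [(0, 400_000), (3, 100_000), (6, 10_000)]
last_month = waypoints[-1][0]
max_soldiers = max(s for _, s in waypoints)

for month, soldiers in waypoints:
    w = stroke_width(soldiers, max_soldiers)
    c = sequential_color(month / last_month)
    print(f"month {month}: width={w:.1f}px color={c}")
```

Keeping the ramp within one hue preserves the ordinal reading of time, which is the reason a sequential scheme is preferred over a categorical one here.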
Answer 4: (4 marks)
Use the thickness of the line to encode time.
Use color opacity to encode the number of soldiers.
Reasons to deduct marks: unintuitive. Not many visualizations use this kind of encoding and people are not familiar with it.

Your parents travelled in Hong Kong a few days ago and they would like you to help them “record” their pleasant trips. Suppose you have access to the GPS data from their mobile phone and you also know the accurate time they visited each location. Their traveling data look like this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2016/12/4
8 a.m. the ICC Tower
12.30 a.m. Victoria Harbor
3 p.m. Victoria Peak
5 p.m. Temple Street Night Market
7.30 p.m. Night lights spectacle: the Symphony of Lights

2016/12/5
9 a.m. Hong Kong Museum of History
1 p.m. Ocean Park

2016/12/6
8.30 a.m. the Golden Mile of Nathan Road
2 p.m. Zoological and Botanical Garden
….
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(a) Please design a spatio-temporal visualization scheme to display their trips [8 marks]
E.g.: flowmap + color (encoding time)
Ineffective design (e.g. like 3D): -2 marks
Violate design principles (e.g. categorical color for ordinal attributes): -4 marks
(b) Please design some narrative visualization schemes to make your visualization more engaging [7 marks]
E.g. using video or slides to represent the visualization
-3 marks if no description is provided for narrative / storytelling
-3 marks if no description is provided for the engagement

1. In the main view, what does the color encode? Do you agree with the color scheme used? What are your suggestions?
Answer: Green represents male students, pink represents female students. Circle and rectangle represent Ph.D. and M.Phil. students respectively, while a circle with a black ring surrounding it means that the student took the PQE but finally quit the Ph.D. program (i.e., graduated with an M.Phil. degree or just left with nothing).

2. In the supervisor view, what does the color encode?
Answer: Actually, the different colors just represent different clusters of professors according to the number of graduated students supervised by each professor.

If we want to use the color to encode the number of PG students graduated by each supervisor, please suggest a color map.
Answer: A sequential color scheme with at most two hues should be used here.

If we want to use the color to encode the research area of supervisors, please suggest a color map.
Answer: A qualitative color scheme should be used here. Make sure not to use too many colors.

Can you find any problem with the following data visualizations?
1.

For each HKUST student, we have the following multivariate information:
Name:
Date of Birth: year/month/day
Gender: male or female
CGA: 0.0 to 4.3
School: Engineering, Science, Business, and Humanities
Department: CSE, ECE, …
Year of study: 1, 2, 3, 4
Please design a visualization to encode the above information.

Solutions
This is a classic problem of visualizing multivariate data. Common techniques for visualizing multivariate data include parallel coordinates (and variants such as circular PC), scatterplot matrix, and so on. Since the data here does not have a very high dimension, we only give a few simple but effective examples.
Answer 1: (Parallel Coordinates)

Sample Answer:
1. If the graph keeps growing (e.g., I have 1000 collaborators), how to handle visual clutter?
1) A good choice is to use node aggregation. We can pack low-importance nodes into a few large nodes. The details of large nodes will only be displayed during interaction. Edges between center nodes are also reduced to reduce visual clutter.
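The node-aggregation idea from the collaborator-graph answer (pack low-importance nodes into a few large nodes) can be sketched as below. The choices here are assumptions made for illustration only: importance is taken to be node degree, the threshold is arbitrary, and each low-importance node is packed into the group of the first important neighbor it is connected to.

```python
from collections import defaultdict

def aggregate_low_importance(edges, min_degree=3):
    """Pack nodes with degree < min_degree into one aggregate per important neighbour.

    Returns (new_edges, members): the reduced edge list and, for each
    aggregate node, the original nodes it hides (shown on interaction).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    important = {n for n, d in degree.items() if d >= min_degree}

    group_of = {}
    for u, v in edges:
        for node, other in ((u, v), (v, u)):
            if node not in important and other in important:
                group_of.setdefault(node, f"group({other})")
    for node in degree:                      # nodes with no important neighbour
        if node not in important:
            group_of.setdefault(node, "group(misc)")

    members = defaultdict(set)
    for node, group in group_of.items():
        members[group].add(node)

    def label(n):
        return n if n in important else group_of[n]

    new_edges = {tuple(sorted((label(u), label(v))))
                 for u, v in edges if label(u) != label(v)}
    return sorted(new_edges), dict(members)

# Hypothetical collaboration graph
edges = [("me", "ann"), ("me", "bob"), ("me", "p1"), ("me", "p2"),
         ("ann", "p3"), ("ann", "p4"), ("ann", "bob")]
new_edges, members = aggregate_low_importance(edges)
print(new_edges)
print(members)
```

In this toy graph seven edges collapse to four; the hidden members of each aggregate are kept so they can be expanded on demand.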
Node packing: https://bl.ocks.org/mbostock/ca5b03a33affa4160321
2) Another choice is to use interactive filtering. However, this interaction will introduce more perception and operation loads to users.

2. How to show the evolution of my collaboration network? (i.e., how does this network change over time?)
A survey of dynamic graph/network visualization can be found in this paper: The state of the art in visualizing dynamic graphs by Beck et al. in EuroVis 2014.
Typically, there are mainly two types of methods: animation and timeline. Compared with timeline-based visualization, animation is more consistent, but more overwhelming to users, since they need to memorize the past frames of networks. In practice, the choice should be decided based on specific requirements.

Since the given data only has six dimensions (not including the Name), a good choice is parallel coordinates.
Parallel coordinates can be used with interactions such as brushing to filter data and reduce visual clutter. See https://bl.ocks.org/jasondavies/1341281

(Iconography)
The given data contains both categorical data and numerical data. Another choice is to map each record into a glyph (e.g. radar plot, or use color to encode categorical fields), then plot multiple records as glyphs onto a 2D plane (simply select 2 dimensions or use dimensionality reduction like PCA or MDS). An example is as follows:

The proliferation of online social media has enabled people to spread opinions and ideas at an unprecedented speed. Such opinions and ideas can be expressed by individuals, and divergences often occur when people oppose each other and want to achieve incompatible goals. For example, in a political campaign, people supporting different parties may debate through social media for their own political perspectives. Another example is marketing, where the makers of competing products can launch persuasion campaigns on social media to gain the attention of social media users. Some topics or events may get many people involved and they could have diverging views on the same topic. For example, the US presidential election debates in 2016 triggered a series of arguments because of the different political views of two parties. It attracted great attention of the public, leading to heated discussions on Twitter. On social media sites, some people show their support by expressing positive comments on certain persons or events, while others may attack their opponents with negative words.

Suppose you have the following data and also attributes computed from the data: 1) The original Twitter Data containing all the Tweets about a topic. Each tweet has a time stamp, some text, and is associated with a user; 2) Two sets of people who have different opinions on this topic. To simplify the design, only user names will be used in your system; 3) A sentiment score is already computed for every tweet about this topic from these two sets of people.

You are asked to design a visualization system to answer the following questions: 1) When does a divergence start and end? How does it evolve? Your visualization should be able to reveal temporal data patterns that represent the process of a social divergence. 2) Who are involved in each divergence side? Your visualization should highlight the people who were involved in the divergence. 3) Why does a divergence occur? The analysts may need to read the tweets to find out the reasons.

Please first describe how you want to design the system, especially the principles to follow, and then sketch some key visualization schemes in your system.

Truncated axis. The y-axis is truncated to have a range from 34% to 42%. Doing so makes it look like the top tax rate will grow dramatically after Bush Tax Cuts expired! At a glance, the bar sizes imply that rates in Jan 1, 2013 will be several times higher. However, the tax rate will just grow by about 4.6 percentage points, from 35% to 39.6%.
2.
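The distortion in the truncated-axis tax-rate chart can be quantified with Tufte's lie factor, the ratio between the relative change shown by the graphic and the relative change in the data. The sketch below assumes the bars start at the 34% baseline given in the text and uses 35% and 39.6% as the two rates; the function name is illustrative.

```python
def lie_factor(v1, v2, baseline):
    """Relative change of bar heights drawn from `baseline`, divided by
    the relative change of the underlying values."""
    shown = ((v2 - baseline) - (v1 - baseline)) / (v1 - baseline)
    actual = (v2 - v1) / v1
    return shown / actual

lf = lie_factor(35.0, 39.6, baseline=34.0)   # bars drawn from 34%
honest = lie_factor(35.0, 39.6, baseline=0.0)  # bars drawn from zero
print(round(lf, 1), round(honest, 1))
```

A lie factor of 1 means the graphic is proportional to the data, which is what a zero baseline gives; with the 34% baseline the second bar is drawn 5.6 times as tall as the first although the rate rises by only about 13%.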

The advantage of this glyph-based visualization is that you can more easily detect patterns that involve more than two dimensions (which is difficult in parallel coordinates or a scatterplot matrix). Its disadvantage is that the visualization will become overwhelming when the dataset is too large.

See https://link.springer.com/chapter/10.1007%2F978-3-540-33037-0_8 for more information (you should be able to access this resource on campus)

Please design a controlled user study to evaluate these three methods. You need to provide the scheme of the user study (within- or between-subjects), the number of subjects you plan to invite and their background, the tasks you give, the data you collect, the hypothesis you want to verify, and how you plan to analyze the data.

Reference answer
1) Subjects: about 20 subjects; 10 with knowledge in visualization, and 10 with little knowledge in visual analysis. Their genders and ages should be uniformly distributed.
2) Example analytical tasks:
a) Identify the most popular state that people migrate into and out of;
b) For a given state, identify the state that most people migrate to/from;
3) Data to collect: for each participant, record (i) the correctness of the performed tasks; (ii) the time used to perform each task.
4) Analysis of test data:
a) calculate statistics (average, standard deviation, etc.) of the performance data to evaluate effectiveness, efficiency, etc.;
b) visualize the performance data (boxplot, bar chart, etc.) to more intuitively see the dominance and stability of the two designs;
c) perform a hypothesis test (suppose A is better than B on a task and calculate the p-value).

Cumulative graphs. We can’t tell much from this graph. It’s moving up and to the right, so things must be going well! But the non-cumulative graph paints a different picture:

Now things are a lot clearer. Revenues have been declining for the past ten years! If we scrutinize the cumulative graph, it’s possible to tell that the slope is decreasing as time goes on, indicating shrinking revenue. However, it’s not immediately obvious, and the graph is incredibly misleading.
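The relation between the cumulative and the non-cumulative graph can be made concrete with a few lines of arithmetic. The revenue figures below are invented for illustration; the point is only that a running total keeps rising even while every yearly value falls, and that differencing recovers the yearly values.

```python
from itertools import accumulate

# Hypothetical revenue per year, shrinking every year
yearly = [100, 90, 80, 65, 50]

# What the cumulative chart plots: a running total that can only go up
cumulative = list(accumulate(yearly))

# Differencing consecutive totals recovers the per-year values
recovered = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

print(cumulative)
print(recovered)
```

Plotting `recovered` rather than `cumulative` is what makes the decline visible at a glance.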
