Computational Journalism
Columbia Journalism School
Week 8: Visualization and Network Analysis
November 3, 2017
This class
Visualization as perception
Visualization design
Social network theory
Network analysis in journalism
Visualization as Perception
Topic links in Gödel, Escher, Bach
Visualization allows people to offload cognition to the
perceptual system, using carefully designed images as a
form of external memory.
- Tamara Munzner
Pop-Out Effects
Visual Comparisons: length, orientation, size, color, extents, correlations
Design Study Methodology: Reflections from the Trenches and the Stacks, Sedlmair et al., 2012
Visualization Design
Inward and Outward Grand Challenges for Visualization, Tamara Munzner
A multi-level typology of abstract visualization tasks, Brehmer & Munzner
Sequential Narrative
People tend to marry, do business with, spend time with, etc. people from
similar backgrounds... and people who have social ties tend to be similar.
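The tendency described above (usually called homophily) can be checked on real network data by comparing how often linked people share an attribute against what random pairing would give. The sketch below assumes a hypothetical `group` mapping from each person to one attribute value; the function names are illustrative, not from any library.

```python
from collections import Counter

def same_group_edge_share(edges, group):
    """Observed fraction of edges whose two endpoints share the same attribute value."""
    same = sum(1 for a, b in edges if group[a] == group[b])
    return same / len(edges)

def random_pair_baseline(group):
    """Probability that two distinct, uniformly random nodes share an attribute value."""
    counts = Counter(group.values())
    n = len(group)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Hypothetical toy data: five people in two groups, four social ties
group = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y"}
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("a", "d")]
print(same_group_edge_share(edges, group))  # 0.75
print(random_pair_baseline(group))          # 0.4
```

An observed share well above the baseline is consistent with homophily, though on its own it cannot say whether similar people form ties or ties make people similar.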
Two major analysis methods, once you have the network data (collecting it may be a very
manual process):
Look at a visualization
Apply algorithm
We can visualize the graph and use our eyes, or we can compute
centrality values algorithmically.
Degree centrality: number of edges touching a node
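Degree centrality is simple enough to compute by hand from an edge list. The sketch below assumes an undirected network given as a hypothetical list of (person, person) pairs; a library such as NetworkX offers the same measure, but here it is written out with the standard library only.

```python
from collections import Counter

def degree_centrality(edges):
    """Return {node: degree}, where degree counts the edges touching each node (undirected)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1  # each edge adds one to both of its endpoints
        degree[b] += 1
    return dict(degree)

# Hypothetical call network: each pair is one person-to-person link
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranking = sorted(degree_centrality(edges).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # [('A', 3), ('B', 2), ('C', 2), ('D', 1)]
```

Ranking nodes by degree is the algorithmic counterpart to spotting the most-connected node by eye in a visualization.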
WASHINGTON - Time and again, Texas Gov. Rick Perry picked up his office phone in
the months before he would announce his bid for the presidency. He dialed wealthy
friends who were his big fundraisers and state officials who owed him for their jobs.
Perry also met with a Texas executive who would later co-found an independent
political committee that has promised to raise millions to support Perry but is prohibited
from coordinating its activities with the governor.
- Jack Gillum, Perry called top donors from work phones, AP, 6 Dec 2011
The state of the art: Panama Papers
Graph Databases in Theory
Source data is incredibly dirty. Current methods have low recall (~70%).
Unlinked records
Soft record linkage
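Soft record linkage means treating two records as a possible match when their names are similar rather than identical. The sketch below is one simple way to do that, assuming string similarity from Python's `difflib.SequenceMatcher` and a hypothetical threshold of 0.85; it is not the method used on the Panama Papers, and the company names are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def soft_link(records, threshold=0.85):
    """Return candidate matching pairs (a, b, score), highest score first."""
    pairs = []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Hypothetical records: the first two probably refer to the same company
records = ["Acme Holdings Ltd", "ACME Holding Ltd.", "Zenith Partners SA"]
print(soft_link(records))  # [('Acme Holdings Ltd', 'ACME Holding Ltd.', 0.97)]
```

The threshold trades precision for recall: lowering it catches more true matches but produces more candidate pairs that a reporter has to check by hand.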
Graph Databases in Practice
Incomplete data. Building a network often requires scraping from documents. Bulk data is
often unavailable or impractical to obtain, and some records must be purchased one at a
time. Instead, reporting involves interactive data enrichment.
Graph queries are not that helpful. Cypher was available to PP investigators, but no one
outside the core team learned it. Moreover, it's not clear how often reporting problems
can be expressed as a graph query. Even "find path between" queries did not produce any
(documented) leads on PP.
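A "find path between" query asks for the shortest chain of links connecting two entities. In a graph database this would be written in a query language such as Cypher; the sketch below shows the same idea as breadth-first search in plain Python, assuming a hypothetical undirected edge list of officers, companies, and intermediaries.

```python
from collections import deque

def find_path(edges, start, goal):
    """Breadth-first search for a shortest path between two entities, or None if unconnected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical ownership network
edges = [("Officer X", "Company A"), ("Company A", "Intermediary B"),
         ("Intermediary B", "Company C"), ("Company C", "Officer Y"),
         ("Officer X", "Company D")]
print(find_path(edges, "Officer X", "Officer Y"))
```

A path like this only shows that two entities are connected; whether the connection means anything is still a reporting question.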
Networks need to be narratives. The most useful networks are hand-built for a particular
line of reporting.
Maps, not data visualizations
Query results vs. hand-built graphs