
Distance Measures for Dynamic Citation Networks

M. Bommarito D. Katz J. Zelner J. Fowler

May 21, 2010

Outline

1 Goals
Supreme Court Citation Network

2 Citation Dynamics and Sinks

3 Distance Measures for Dynamic Citation Networks

4 How does the “sink” method perform?
Simulation Results
United States Supreme Court

5 Conclusion and Future Directions

Goals Supreme Court Citation Network

Goals & Data

Goal: Can we uncover mesoscopic patterns within the jurisprudence of the United States Supreme Court?

Data:
1 |V| ≈ 36k vertices, |E| ≈ 280k edges
2 Years covered: 1791-2005


Standard Solution

Standard Solution: Obtain vertex community membership by applying an out-of-the-box community detection method.

Methods:
1 Edge-Betweenness (Girvan & Newman 2002)
2 Fast-Greedy (Clauset et al. 2004)
3 Leading (or more) Eigenvector (Newman 2006, Richardson et al. 2009)
4 Walktrap (Pons & Latapy 2006)


Expectations

Expectation: Dyadic relationships should be fairly stable.

If two vertices are in the same community m at t, they should be in the same community n (not necessarily identical to m) at t + 1.

Formally, this can be written as “pairwise stability” σ:

\[ \sigma = P\left(C_i^{t+1} = C_j^{t+1} \mid C_i^{t} = C_j^{t}\right) \]

$C_i^{t}$: community membership of vertex i at time t

This conception of stability avoids many issues with community tracking.
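
The pairwise stability above can be estimated empirically from two membership snapshots. The following is a minimal sketch, not the authors' implementation: the function name `pairwise_stability`, the dictionary representation of memberships, and the toy labels are all assumptions made for illustration. It counts the vertex pairs that share a community at t and reports the fraction that still share one at t + 1, so community labels never need to be matched across time steps.

```python
from itertools import combinations


def pairwise_stability(before, after):
    """Estimate sigma = P(C_i^{t+1} = C_j^{t+1} | C_i^t = C_j^t).

    `before` and `after` map vertex -> community label at t and t + 1.
    Only vertices present in both snapshots are compared.
    """
    shared = sorted(set(before) & set(after))
    together_before = 0
    together_both = 0
    for i, j in combinations(shared, 2):
        if before[i] == before[j]:
            together_before += 1
            if after[i] == after[j]:
                together_both += 1
    if together_before == 0:
        return None  # sigma is undefined when no pair is co-assigned at t
    return together_both / together_before


# Hypothetical toy memberships; labels at t and t + 1 need not correspond.
membership_t = {"a": 0, "b": 0, "c": 0, "d": 1}
membership_t1 = {"a": 5, "b": 5, "c": 7, "d": 7, "e": 7}
sigma = pairwise_stability(membership_t, membership_t1)
```

In this toy case three pairs share a community at t and only one of them (a, b) still does at t + 1, so `sigma` is 1/3.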


Results

[Figure: Pairwise Stability and Q plotted by year, in two panels: Fast-Greedy (1800-1900) and Eigenvector (1800-1850).]

The results of these approaches do not match our expectation.


Research Source

Title: On the Stability of Community Detection Algorithms on Longitudinal Citation Data.
Michael J. Bommarito II, Daniel M. Katz, Jonathan L. Zelner.
Forthcoming in Proceedings of ASNA 2009 (ETH-Zurich).

Goal: Compare out-of-the-box community detection methods under different parameters of a citation model w.r.t.:
1 Average number of resulting communities across all time steps
2 Average pairwise stability of all vertex pairs across all time steps


Results


Implications

Citation networks are different.

1 Patterns within citation networks are not well-revealed by these methods.
2 Qualitative conclusions may vary dramatically based on the chosen method.
3 The “appropriateness” of each method may depend on parameters of the generating process.

Citation Dynamics and Sinks

Citation Dynamics

What are the basic growth rules of a citation network?

1 Documents and their citations are introduced into the network in sequence.
2 Documents cannot create new outbound citations after introduction.

These rules guarantee that any resulting network is an acyclic digraph.

The simplest topological ordering is just the order of vertex introduction.
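
The two growth rules can be illustrated with a small generator. This is only a sketch under stated assumptions: citations are drawn uniformly at random from earlier documents, and the names `grow_citation_network` and `max_citations` are hypothetical. Because every citation points from a later document to an earlier one, the introduction order is a valid ordering of the vertices and no cycle can form.

```python
import random


def grow_citation_network(n_documents, max_citations=3, seed=0):
    """Grow a citation network one document at a time.

    Each new document may cite only documents that already exist, and its
    outbound citations are never modified afterwards.
    Returns a dict: document ID -> list of cited document IDs.
    """
    rng = random.Random(seed)
    cites = {}
    for doc in range(n_documents):
        earlier = list(range(doc))
        # Assumption of this sketch: uniform random choice of cited documents.
        k = min(len(earlier), rng.randint(0, max_citations))
        cites[doc] = rng.sample(earlier, k)
    return cites


def is_acyclic_in_introduction_order(cites):
    """True if every citation points from a later document to an earlier one."""
    return all(cited < doc for doc, targets in cites.items() for cited in targets)


network = grow_citation_network(50)
```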


Dynamic Acyclic Digraphs

What properties do we have?

1 Each component has at least one “sink” and one “source.”
2 Sinks are vertices with zero out-degree. The first vertex in a topological ordering (cited documents before citing ones) must be a sink.
3 Sources are vertices with zero in-degree. The last vertex in a topological ordering (cited documents before citing ones) must be a source.
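
The definitions of sinks and sources translate directly into code. The sketch below assumes the network is stored as a dictionary from each document to the list of documents it cites; the function name and the toy network are hypothetical.

```python
def sinks_and_sources(cites):
    """Return (sinks, sources) for a citation network.

    `cites` maps each document to the list of documents it cites.
    Sinks have zero out-degree; sources have zero in-degree.
    """
    cited_somewhere = {target for targets in cites.values() for target in targets}
    sinks = {doc for doc, targets in cites.items() if not targets}
    sources = {doc for doc in cites if doc not in cited_somewhere}
    return sinks, sources


# Hypothetical toy network; IDs follow the order of introduction.
toy = {0: [], 1: [], 2: [0], 3: [0, 1], 4: [2, 3]}
sinks, sources = sinks_and_sources(toy)
```

Here documents 0 and 1 are sinks and document 4 is the only source, which matches the claim that the first document introduced is a sink and the last one is a source.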


Sinks

If sinks have zero out-degree, they must represent the point at which at least one idea is introduced into the network.

Either the document “invents” the idea or the head of the citation arc was not sampled in the dataset.

Weak vs. Strong - Dimensional Data can help identify Weak Sinks


Six Degrees of Marbury v. Madison

Distance Measures for Dynamic Citation Networks

Basic Idea of the Distance Measure

If two vertices share more “ideas,” they should be more similar.

Alternative Example: Articles in Political Science

1 American Politics
2 Congress
3 Committee Assignments
4 Formal Theory

We want to be able to use clustering methods, so we then construct a distance measure from this basic premise.


A Simple Distance Measure

Simplest Distance Measure: Proportion of Possibly Shared Ideas

\[ D_{i,j} = 1 - \frac{|S_i \cap S_j|}{|S_i \cup S_j|} \]

$S_i$: the set of sink vertex IDs for vertex i

Note that this is only one way to translate from similarity to distance.

Also note that the distance between vertices i and j does not change over time: outbound citations are fixed at introduction, so each vertex's sink set is fixed as well.
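
The simple measure is a Jaccard distance on sink sets, and it can be sketched in a few lines. The code below assumes that the sink set of a vertex is the set of sinks reachable from it by following citations, that a sink's own set contains only itself, and that integer IDs follow the order of introduction; these conventions and the function names are assumptions of the sketch rather than details taken from the slides.

```python
def sink_sets(cites):
    """Map each vertex to the set of sinks reachable from it along citations.

    Assumes vertex IDs are integers in order of introduction, so every cited
    vertex has a smaller ID and is processed before the vertices citing it.
    A sink's own set is taken to be {itself} (an assumption of this sketch).
    """
    reachable = {}
    for v in sorted(cites):
        if not cites[v]:
            reachable[v] = {v}
        else:
            reachable[v] = set().union(*(reachable[u] for u in cites[v]))
    return reachable


def sink_distance(s_i, s_j):
    """D_ij = 1 - |S_i & S_j| / |S_i | S_j| (Jaccard distance on sink sets)."""
    union = s_i | s_j
    if not union:
        return 0.0  # convention for two empty sets, chosen for this sketch
    return 1.0 - len(s_i & s_j) / len(union)


# Hypothetical toy network: document -> documents it cites.
toy = {0: [], 1: [], 2: [0], 3: [0, 1], 4: [1]}
S = sink_sets(toy)
d_23 = sink_distance(S[2], S[3])
d_24 = sink_distance(S[2], S[4])
```

Documents 2 and 3 share one of two sinks, giving a distance of 0.5, while documents 2 and 4 share none, giving a distance of 1.0. The resulting pairwise distances could then be passed to any standard clustering routine.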


Flexible Framework for More Detailed Specifications

What if the story is more complicated?

1 Minimum path length to a sink
2 Number of paths to a sink
3 Total number of shared ancestors
4 Total elapsed time along path

Example with arbitrary f for path length and number of shared ancestors:

\[ D_{i,j} = 1 - \frac{\sum_{s \in S_i \cap S_j} f(A_{i,s}, P_{i,s}, A_{j,s}, P_{j,s})}{\sum_{s \in S_i \cup S_j} f(A_{i,s}, P_{i,s}, A_{j,s}, P_{j,s})} \]
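
One possible way to implement the general form is sketched below. The slides leave f and the exact definitions of A and P open, so everything specific here is an assumption: features are stored in dictionaries keyed by (vertex, sink), a missing feature is passed to f as `None` when only one of the two vertices reaches a sink, and the example weighting `closeness` (which uses only path length and ignores A) is hypothetical.

```python
def weighted_sink_distance(i, j, sinks_of, A, P, f):
    """D_ij = 1 - (sum of f over shared sinks) / (sum of f over all sinks).

    sinks_of[v] is the sink set S_v; A[(v, s)] and P[(v, s)] are per-vertex,
    per-sink features. A missing key is passed to f as None, so f must
    handle sinks reached by only one of the two vertices.
    """
    def term(s):
        return f(A.get((i, s)), P.get((i, s)), A.get((j, s)), P.get((j, s)))

    shared = sinks_of[i] & sinks_of[j]
    either = sinks_of[i] | sinks_of[j]
    denominator = sum(term(s) for s in either)
    if denominator == 0:
        return 0.0  # convention chosen for this sketch
    return 1.0 - sum(term(s) for s in shared) / denominator


def closeness(a_i, p_i, a_j, p_j):
    """Hypothetical f: weight a sink by its shortest available path length."""
    lengths = [p for p in (p_i, p_j) if p is not None]
    return 1.0 / (1 + min(lengths))


# Hypothetical sink sets and minimum path lengths for two vertices.
sinks_of = {"x": {"s1", "s2"}, "y": {"s1"}}
P = {("x", "s1"): 1, ("x", "s2"): 3, ("y", "s1"): 2}
A = {}  # ancestor-based features are left unused in this example
d_weighted = weighted_sink_distance("x", "y", sinks_of, A, P, closeness)
d_simple = weighted_sink_distance("x", "y", sinks_of, A, P, lambda *args: 1)
```

With a constant f the general form reduces to the simple proportion-of-shared-sinks measure, which is why `d_simple` equals 0.5 here. The `closeness` weighting pulls the two vertices closer (distance 1/3) because the sink they share is nearer than the one they do not.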

How does the “sink” method perform? Simulation Results

Simulation
1 Directed
2 Two vertex types
3 Asymmetric vertex connection probabilities
4 Preferential attachment mechanism (Two-Dimensional)


Simulation Results

How does the “sink” method perform? United States Supreme Court

United States Supreme Court

The Early Years of the United States Supreme Court



Supreme Court Results Using the Sink Method

Conclusion and Future Directions

Conclusion

1 There are issues with existing community detection methods in dynamic citation networks.

2 Our sink-based method provides more reasonable qualitative results than other methods we’ve tried.

3 Future direction: application to a larger segment of the SCOTUS data, together with a qualitative strategy designed to evaluate the outputs.
