Requirements
Bring your own laptop with Java and Gephi installed, and make sure Gephi is up to date (menu Help > Check for Updates). Bring a mouse with a scroll wheel. If you want, bring your own dataset and check that it loads correctly in Gephi.[1]
[1] http://gephi.org/users/supported-graph-formats/
"The greatest value of a picture is when it forces us to notice what we never expected to see." (John W. Tukey)
Dummy Example
Observation: visual salience at specific file sizes.
External knowledge: these sizes correspond to films.
New hypothesis on the data: films are heavily exchanged, so the study might dig in this direction.
P2P file size distribution (Latapy et al., 2008)
An early graph visualization tool: Pajek (1996), by Vladimir Batagelj and Andrej Mrvar
1. Make complex things simple.
2. Extract small information from large data.
3. Present truth, do not deceive.
http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
At different levels
On multiple dimensions: from 1 dimension to N dimensions
Over time: from T+0 to T+N
Global: connectivity, density, centralization
Local: communities, bridges between communities, local centers vs. periphery
Individual: centrality, distances, neighborhood, location, local authority vs. hub
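To make the global and individual metrics concrete, here is a minimal, dependency-free sketch. It computes density and Freeman degree centralization (global), and normalized degree centrality and breadth-first hop distances (individual), on a hypothetical six-node graph. Gephi computes several of these in its Statistics panel; this code only illustrates what the numbers mean and is not Gephi's implementation.

```python
from collections import deque


def density(nodes, edges):
    """Global: share of possible undirected edges that exist, m / (n(n-1)/2)."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0


def degrees(nodes, edges):
    """Count neighbors per node (undirected graph, no duplicate edges assumed)."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_centrality(nodes, edges):
    """Individual: degree divided by the n - 1 possible neighbors."""
    n = len(nodes)
    return {v: d / (n - 1) for v, d in degrees(nodes, edges).items()}


def degree_centralization(nodes, edges):
    """Global: Freeman degree centralization (needs n >= 3).

    1.0 for a star, 0.0 when every node has the same degree.
    """
    deg = degrees(nodes, edges)
    n = len(nodes)
    top = max(deg.values())
    return sum(top - d for d in deg.values()) / ((n - 1) * (n - 2))


def distances_from(source, nodes, edges):
    """Individual: hop distances from source, by breadth-first search."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Hypothetical toy graph: two triangles joined by the bridge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]

print(round(density(nodes, edges), 3))
print(degree_centrality(nodes, edges)["c"])
print(degree_centralization(nodes, edges))
print(distances_from("a", nodes, edges))
```

On this graph, the density is 7/15 (about 0.467), the bridge nodes c and d have the highest degree centrality (0.6), and the centralization is 0.2, far from the 1.0 of a star.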
Social: who with whom, communities, brokerage, influence and power, homophily
Semantic: topics, thematic clusters
Geographic: spatial phenomena
Actors, territory; algorithms, thresholds; communication goals
Guideline by number of nodes
1-100: lists, with edges as a bonus; focus on qualitative data.
100-1,000: easy to read, obvious patterns. Focus on entities (in context). Metrics are tools to describe the graph (centrality, bridging...). Links help to build and interpret categories of entities. Challenge: mixing attribute crossing and connectivity.
1,000-50,000: hard to read, with the problem of hidden signals: track patterns with various layouts and filtering. Focus on structures. Metrics are tools to build the graph (cosine similarity...). Categories help to understand the structure. Challenge: pattern recognition.
> 50,000: requires high computational power.
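"Metrics are tools to build the graph" means edges can be computed rather than observed. A minimal sketch, assuming each entity is described by a sparse feature vector (here hypothetical word counts per document): link two entities when the cosine similarity of their vectors reaches a chosen threshold. The function names and the data are illustrative only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_edges(profiles, threshold):
    """Link every pair of entities whose similarity is at least `threshold`."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        score = cosine(profiles[a], profiles[b])
        if score >= threshold:
            edges.append((a, b, score))  # the score can serve as edge weight
    return edges


# Hypothetical word counts per document.
profiles = {
    "doc1": {"graph": 3, "layout": 1},
    "doc2": {"graph": 2, "layout": 1, "film": 1},
    "doc3": {"film": 4, "p2p": 2},
}

for threshold in (0.5, 0.3):
    print(threshold, similarity_edges(profiles, threshold))
```

The threshold is an analyst's choice that directly shapes the resulting graph: at 0.5 only doc1 and doc2 are linked (similarity about 0.904), while at 0.3 the weaker doc2-doc3 link (about 0.365) also appears.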
Gephi now!
Gephi in a Nutshell
Like Photoshop for graphs: it helps data analysts reveal patterns and trends, highlight outliers, and tell stories with their data.
Network visualization platform
Open source, supported by a community
Built for performance and usability
Extensible by plug-ins
Runs on Windows, Mac OS X, Linux
Gephi Community
Nonprofit organization
Communities
Contributors
Mathieu Bastian, Mathieu Jacomy, Eduardo Ramos Ibáñez, Sébastien Heymann, Guillaume Ceccarelli, André Panisson, Antonio Patriarca, Cezary Bartosiak, Martin Škurla, Patrick McSweeney, Yi Du, Hélder Suzuki, Daniel Bernardes, Ernesto Aneiro, Keheliya Gallaba, Luiz Ribeiro, Urban Škudnik, Vojtech Bardiovsky, Yudi Xue
Community Mission
Provide sustainable software
Maintain the technical ecosystem
Build a business ecosystem
Face cutting-edge technological challenges with a long-term vision
Distribute the software as open source
Community Values
Open innovation: ideas and features come from the entire community. Decisions are made transparently. We consider this technology a public good and will keep it open source.
Diversity of Uses
Business, leisure :-), communication, academic, art
Three representations of the same graph (nodes a, b, c, d, e): graphical, XML, tabular (adjacency matrix)
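The tabular representation of a graph is usually an adjacency matrix, while most file formats store an edge list. A minimal sketch of converting between the two for an undirected graph; the star-shaped matrix below is a hypothetical example, not the one from the original slide.

```python
def matrix_to_edges(labels, matrix):
    """Read an undirected adjacency matrix; scan the upper triangle so each edge appears once."""
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if matrix[i][j]:
                edges.append((labels[i], labels[j]))
    return edges


def edges_to_matrix(labels, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


labels = ["a", "b", "c", "d", "e"]
# Hypothetical matrix: a star where a is linked to every other node.
matrix = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]

edges = matrix_to_edges(labels, matrix)
print(edges)
```

The matrix costs n x n cells whatever the number of links, whereas the edge list grows with the number of edges, which is why sparse real-world graphs are usually exchanged as edge lists.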
Software I/O
Input:
databases: MySQL, PostgreSQL, SQL Server, Neo4j
files: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Graphviz DOT, UCINet DL, Netdraw VNA, Tulip TLP, Excel spreadsheet
graph streaming
user input
Output:
files: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Excel spreadsheet
images: SVG, PDF, PNG
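For the file route, the simplest input is a spreadsheet-style edge table. A minimal sketch using only the standard library, assuming the importer recognizes columns named Source and Target (plus an optional Weight); check the column mapping offered by your Gephi version's import dialog. The edge data is hypothetical.

```python
import csv
import io


def write_edge_table(edges, out):
    """Write (source, target, weight) tuples as CSV with a Source,Target,Weight header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])


# Hypothetical weighted edges.
edges = [("a", "b", 2), ("a", "c", 1), ("b", "c", 5)]

buffer = io.StringIO()
write_edge_table(edges, buffer)
print(buffer.getvalue())

# Writing to disk works the same way; newline="" lets the csv module control line endings.
# with open("edges.csv", "w", newline="") as f:
#     write_edge_table(edges, f)
```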
Feature support by file format: CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML, NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*
Do you need many features or few features?
GEXF, Spreadsheet, GraphML, Guess GDF, GML, UCINet DL, Netdraw VNA, Graphviz DOT, Pajek NET, CSV, Tulip TLP
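At the "many features" end, GEXF is the XML format developed by the Gephi project. A minimal sketch of writing a GEXF file with one node attribute, assuming the GEXF 1.2 draft namespace and element names (gexf, graph, attributes, nodes, attvalues, edges); the graph, the "country" attribute and the function name are hypothetical. Check the GEXF specification before relying on it.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes, edges, directed=False):
    """Serialize a graph with one string node attribute ("country") to GEXF 1.2.

    nodes: {node_id: {"label": str, "country": str}}
    edges: [(source_id, target_id), ...]
    """
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {
        "mode": "static",
        "defaultedgetype": "directed" if directed else "undirected",
    })
    # Declare the node attribute once; each node refers to it by id "0".
    attributes = ET.SubElement(graph, "attributes", {"class": "node"})
    ET.SubElement(attributes, "attribute",
                  {"id": "0", "title": "country", "type": "string"})
    xml_nodes = ET.SubElement(graph, "nodes")
    for node_id, data in nodes.items():
        node = ET.SubElement(xml_nodes, "node", {"id": node_id, "label": data["label"]})
        values = ET.SubElement(node, "attvalues")
        ET.SubElement(values, "attvalue", {"for": "0", "value": data["country"]})
    xml_edges = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(xml_edges, "edge",
                      {"id": str(i), "source": source, "target": target})
    return ET.tostring(gexf, encoding="unicode")


# Hypothetical two-blog graph.
nodes = {
    "b1": {"label": "Blog One", "country": "IE"},
    "b2": {"label": "Blog Two", "country": "FR"},
}
gexf_text = to_gexf(nodes, [("b1", "b2")])
print(gexf_text)
```

To save the file with an XML declaration, wrap the root element in ET.ElementTree and call its write method with encoding="UTF-8" and xml_declaration=True. A CSV edge list cannot carry the typed node attribute declared here, which is the practical difference between the two ends of the scale.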
Using Gephi
Team work
1. Create a team of 2 to 3 people.
2. Choose a dataset.
3. Explore it for 1 hour.
4. Two teams present their preliminary findings.
GitHub is an application used by nearly a million people to store over two million code repositories, making GitHub the largest code host in the world.
Started in 2008, it combines the features of an online social network and a software repository to lower the barriers to collaboration and make code easier to contribute to. https://github.com
Your mission (should you decide to accept it): find research hypotheses based on your exploration
Example question: are the Perl communities based on geography?
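One way to turn this question into a measurable hypothesis is to compare how many links join developers from the same location with how many would be expected by chance. A minimal sketch of Newman's assortativity coefficient for a categorical attribute, assuming an undirected graph; the developers, locations and links are hypothetical, and Gephi is not needed for this check.

```python
from collections import Counter


def same_attribute_share(edges, attr):
    """Observed fraction of edges joining two nodes with the same attribute value."""
    same = sum(1 for u, v in edges if attr[u] == attr[v])
    return same / len(edges)


def expected_same_share(edges, attr):
    """Fraction expected if edge ends were paired at random (degree-weighted)."""
    ends = Counter()
    for u, v in edges:
        ends[attr[u]] += 1
        ends[attr[v]] += 1
    total = sum(ends.values())
    return sum((count / total) ** 2 for count in ends.values())


def nominal_assortativity(edges, attr):
    """Newman's assortativity coefficient for a categorical attribute (undirected)."""
    observed = same_attribute_share(edges, attr)
    expected = expected_same_share(edges, attr)
    if expected == 1:
        raise ValueError("undefined when every edge end has the same value")
    return (observed - expected) / (1 - expected)


# Hypothetical developers and their locations.
location = {"u1": "Paris", "u2": "Paris", "u3": "Paris",
            "u4": "Berlin", "u5": "Berlin", "u6": "Tokyo"}
edges = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"),
         ("u4", "u5"), ("u3", "u4"), ("u5", "u6")]

print(round(same_attribute_share(edges, location), 3))
print(round(nominal_assortativity(edges, location), 3))
```

A value near 0 means same-location links are no more frequent than chance; values toward 1 support the idea that geography structures the communities. Here 4 of 6 links are local and the coefficient is 5/13 (about 0.385), a moderate tendency.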
Blogroll network
Nodes: blogs with more than two blogroll links
Edges: blogroll links (in-links)
Post-link network
Nodes: blogs with more than two blogroll links
Edges: hyperlinks inside a post, from one blog to another (post-links)
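Both networks share the same node set, so building them is a filter followed by a restriction. A minimal sketch, assuming crawl data is available as (linking blog, linked blog) pairs and that "more than two blogroll links" counts links in either direction; the source does not say whether in-links, out-links or both were counted, and the data below is hypothetical.

```python
from collections import Counter


def blogs_with_enough_links(blogroll_links, minimum=2):
    """Keep blogs involved in more than `minimum` blogroll links (in or out)."""
    counts = Counter()
    for source, target in blogroll_links:
        counts[source] += 1
        counts[target] += 1
    return {blog for blog, count in counts.items() if count > minimum}


def restrict_links(links, blogs):
    """Keep directed links whose two ends are retained blogs, dropping self-links."""
    return [(s, t) for s, t in links if s in blogs and t in blogs and s != t]


# Hypothetical crawl data: (linking blog, linked blog).
blogroll = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("B", "C"), ("D", "A")]
post_links = [("A", "B"), ("B", "D"), ("C", "C"), ("C", "A")]

blogs = blogs_with_enough_links(blogroll)
blogroll_network = restrict_links(blogroll, blogs)
post_link_network = restrict_links(post_links, blogs)
print(sorted(blogs), blogroll_network, post_link_network)
```

Because both edge sets live on the same nodes, the two maps can be compared directly, for example to see whether bloggers cite in posts the same blogs they list in their blogrolls.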
Hands-On!
Start: load a graph.
Apply a layout.
Color the nodes by a qualitative variable in the Partition panel.
Size the nodes by a quantitative variable in the Ranking panel.
Start to explore: compute metrics, filter the network.
End: export maps to PDF in the Preview tab, then save.
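The Partition and Ranking steps follow two different mappings: categories to distinct colors, and numbers to a size range. A minimal sketch of both ideas, assuming a fixed palette, a linear size scale from 10 to 50, and hypothetical attributes; Gephi's own panels offer their own palettes and interpolation options, so this only illustrates the principle.

```python
def partition_colors(attribute, palette):
    """Give each distinct category its own palette color (cycling if categories outnumber colors)."""
    categories = sorted(set(attribute.values()))
    color_of = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return {node: color_of[value] for node, value in attribute.items()}


def ranking_sizes(values, min_size=10.0, max_size=50.0):
    """Map a numeric attribute linearly onto [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {
        node: min_size if span == 0 else min_size + (v - low) / span * (max_size - min_size)
        for node, v in values.items()
    }


# Hypothetical node attributes.
country = {"a": "IE", "b": "FR", "c": "IE"}
degree = {"a": 1, "b": 3, "c": 5}

print(partition_colors(country, ["#1f77b4", "#ff7f0e"]))
print(ranking_sizes(degree))
```

Colors suit qualitative variables because they carry no order, while size suits quantitative ones because readers compare magnitudes.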
Presentations
GitHub Repository
Irish Blogosphere
Gephi Documentation
Web site: http://gephi.org
Support: http://forum.gephi.org
Wiki: http://wiki.gephi.org
Source code: https://launchpad.net/gephi
Online Tutorials
Tutorial in Spanish
https://code.google.com/p/camon/wiki/Taller_Gephi
Thank You!
Credits
[slide 11] images from Drew Conway
http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
[slide 22 top left] Benoît Vidal at MFG Labs
[slide 22 bottom center] Franck Ghitalla at UTC
[slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang
http://jeunhotsang.com/blog/2010/12/07/prototype/
Special Thanks to Franck Ghitalla and Mathieu Jacomy for their insightful discussions.