Data Visualization with GEPHI

Luv Walia – 10BM60043 Prabhjot Singh Bhatia – 10BM60060
Gephi has been dubbed the Photoshop of data analytics. It is open source software for visualizing and manipulating complex data networks in an intuitive manner. This user guide attempts to present a walkthrough for the new user.

Class of 2012 Vinod Gupta School of Management IIT Kharagpur

Contents
Introduction
    What this tutorial is about and what it is not about
    Who This Tutorial Is For
    Prerequisites
    About Gephi
    Features
    Uses for Gephi in Business
Fundamentals
    Installing
    Opening a file
Graph Visualization
Layout Algorithms
Installing plugins

The cover page image was created with Gephi Version 0.8.1 Beta using Force Atlas 2 layout algorithm


Introduction
Data visualization is the representation of processed data using graphical means, so as to make it easy to communicate the information clearly and effectively. There is a trade-off to be made between aesthetics and functionality. Gephi helps achieve this trade-off effortlessly.

What this tutorial is about and what it is not about
This tutorial is highly practice-oriented. It guides the reader through data visualization, but limits itself to Gephi and to the basic tools and techniques available in it; it does not attempt to discuss all available techniques. The tutorial uses an example dataset to show how the techniques are applied, and screenshots have been included for the same. The tutorial is based on the latest available version, 0.8.1 beta. Future versions may or may not contain the features listed here, or may implement them in a different manner.

In addition, this tutorial does not specifically discuss the following topics:
o The concepts of data visualization
o The algorithms followed by the various plugins
o The internal working of the software

Who This Tutorial Is For
This tutorial is aimed at the budding business professional, who is new to the software and wishes to get started with data visualization.

Prerequisites
A basic understanding of data analysis techniques is necessary. Additionally, one must know how the results of these analyses are to be interpreted to solve a real-life problem. However, no prior data visualization experience is necessary. To try out some of the advanced techniques for live data capture and visualization, one must be comfortable with programming and with setting up a server connected to the internet.

About Gephi
[Pronounced: G-fai] Gephi, an open source network visualization platform, has a rich set of built-in functionality and an intuitive user interface. The software provides a powerful and interactive visualization and exploration tool for all kinds of networks and complex systems, all with a smooth learning curve. As software for Exploratory Data Analysis, Gephi provides a robust toolkit to explore, understand and manipulate graph structures, and to reveal hidden insights. An analyst can form hypotheses, discover patterns and identify faults introduced during data collection, all through a slick visual interface that gives an overall perspective of things. Gephi is a complementary tool to statistics, since the importance of visual thinking has finally been recognized. Additionally, Gephi has built-in tools for Social Network Analysis.


Features:
o Realtime Visualization: Gephi sports the fastest graph visualization engine, which helps an analyst create and analyze a variety of scenarios to make accurate decisions, faster.
o SNA Metrics: Gephi incorporates all major metrics currently used to perform a social network analysis (SNA), such as:
  - Betweenness: an indicator of influence
  - Diameter: an indicator of the reach of the network (the longest of the shortest paths between any two nodes)
  - Closeness: an indicator of how fast an individual can reach its entire network
  - Clustering Coefficient: an indicator of how closely knit a particular group of nodes is
  - Average shortest path: an indicator of how many nodes must be crossed to reach a particular node
  - PageRank: the importance of a page
  - HITS: the social value of links and content on a page
o Clustering and Hierarchical graphs: Gephi helps us create clusters and sub-clusters out of the given network graphs.
o Support for large datasets: What differentiates Gephi from other similar software is its ability to work with very large datasets, up to 50,000 nodes.
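To make two of these metrics concrete, here is a minimal sketch in plain Python of how the degree and the clustering coefficient of a node could be computed. It is an illustration only and not Gephi's own implementation; the edge list and the names build_adjacency and clustering_coefficient are assumptions made for this example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adjacency[node]
    if len(neighbours) < 2:
        return 0.0  # fewer than two neighbours: no pair to check
    possible = len(neighbours) * (len(neighbours) - 1) / 2
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return linked / possible


# Hypothetical toy network: a triangle a-b-c with one extra node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adjacency = build_adjacency(edges)
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
```

In this toy network, node "a" sits in a fully connected neighbourhood, while node "c" has one neighbour ("d") that is not linked to the others, which lowers its coefficient.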

Uses for Gephi in Business
Gephi can help visualize any kind of network data. Specifically, from a business viewpoint, Gephi can be of help in a number of ways:
o Marketing
  - Segmentation: Gephi provides a built-in clustering tool to group customers from a product/service targeting perspective.
  - Targeting: whom to target and, more importantly, whom NOT to target. Gephi helps us find the users with the most influence, and hence identify them as potential targets for marketing communication.
o Customer Relationship Management
  - Identify the worth of a customer, based on his or her network.
  - Decide whether or not to go the extra mile to retain that customer.
o Organizational Development
  - Just as we employ social network analysis for customers, a large organization could apply the same concepts to its own employees and generate meaningful insights that help in running the organization more effectively.
o Mergers & Acquisitions
  - How successful is the merger? Gephi can help answer this question by analyzing the past and the present scenarios.
o Team Building
  - What set of employees could bond well?
  - Where can conflicts arise?
  - Who are the unsung heroes/leaders?
  - Where do the barriers to internal communication lie?
o Human Resources
  - Gephi can help us identify potential candidates best suited for a particular position. It could also help us target a particular geography to hunt for potential candidates.

Gephi can help us answer all the above questions, given the right set of data.

Fundamentals
Installing
Get Gephi from this link: https://gephi.org/users/download/. Being Java based, Gephi is available for Windows, Linux and Macintosh. The installation is a simple process.
NOTE: Java must be installed and configured on the system before attempting to install Gephi. To get Java, visit this link: http://www.oracle.com/technetwork/java/javase/downloads/index.html . The Java Runtime Environment is enough to just run Gephi. However, to build plugins for Gephi, one must have the Java Development Kit installed.

Opening a file
Gephi cannot work on raw data. It needs the data to be processed into a graph format (for example, .gexf). To accomplish this, we can take the help of other enterprise-grade FOSS software such as R. However, for the purpose of demonstration, we shall be working with the sample datasets included in the Gephi toolkit. More specifically, we shall be using the social network datasets, available here: http://wiki.gephi.org/index.php/Datasets
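For readers who wish to prepare their own data rather than use the sample datasets, the following is a minimal sketch of how a small edge list could be turned into a .gexf document with Python's standard library. The element and attribute names follow the GEXF 1.2 draft format; the function name to_gexf and the sample people are assumptions made for this example, and real datasets usually carry more attributes than shown here.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes, edges, directed=True):
    """Serialize {node_id: label} and [(source_id, target_id)] to a GEXF string."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {
        "mode": "static",
        "defaultedgetype": "directed" if directed else "undirected",
    })
    nodes_element = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_element, "node", {"id": node_id, "label": label})
    edges_element = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(edges_element, "edge", {
            "id": str(index), "source": source, "target": target,
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data: three people and who contacts whom.
people = {"0": "Alice", "1": "Bob", "2": "Carol"}
contacts = [("0", "1"), ("1", "2"), ("2", "0")]
gexf_text = to_gexf(people, contacts)
```

Writing gexf_text to a file with a .gexf extension should give a file that can be opened in Gephi in the same way as the sample datasets.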

o Open the graph file (File > Open…).
o Import Report: when the file is opened, a report is created that sums up the data and lists any issues:
  - Number of nodes
  - Number of edges
  - Type of graph
o Click on OK to validate and see the graph.
o Use the mouse to move and scale the visualization:
  - Zoom: mouse wheel
  - Pan: right mouse drag
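The three figures in the import report can be checked independently of Gephi. The sketch below, using only Python's standard library, counts the nodes and edges of a small GEXF document and reads its graph type. The sample document and the function name import_report are assumptions made for this example, and falling back to "undirected" when the attribute is missing is also an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"gexf": "http://www.gexf.net/1.2draft"}

# Hypothetical sample document with three nodes and two edges.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="0" label="Alice"/>
      <node id="1" label="Bob"/>
      <node id="2" label="Carol"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>"""


def import_report(gexf_text):
    """Return the node count, edge count and graph type of a GEXF document."""
    root = ET.fromstring(gexf_text)
    graph = root.find("gexf:graph", NS)
    return {
        "nodes": len(graph.findall("gexf:nodes/gexf:node", NS)),
        "edges": len(graph.findall("gexf:edges/gexf:edge", NS)),
        # Assumption: treat a missing attribute as an undirected graph.
        "type": graph.get("defaultedgetype", "undirected"),
    }


report = import_report(SAMPLE_GEXF)
```

Comparing such a summary with the report shown by Gephi is a quick way to spot nodes or edges that were dropped during import.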

Graph Visualization
o While the “Drag” mode is enabled, you can drag nodes by keeping the left mouse button pressed and moving the mouse.
  - Click on the area where “Dragging” is written
  - Configure the “Diameter” with the slider
o You can change the edge thickness with the edge-weight slider.
o If you lose your graph, reset the position using the “Center On Graph” button.
o Autoselect neighbors
  - An essential option to enhance the readability of the network. The neighbors of a selected node are automatically selected as well, which makes it easy to see who is connected to whom.
  - Expand the visualization settings (bottom right corner of the graph)
  - Check the “Autoselect neighbors” option

o Edge color
  - By default, edges have the same color as their source node. This can be configured, and a single color can be used instead.
  - Expand the visualization settings and go to the “Edges” tab
  - Uncheck “Source node color” and configure “Edge default color”

o Node shape and 3-D
  - Although Gephi uses a 3-D rendering engine, networks are usually shown in 2-D, and this is the default mode.
  - Expand the visualization settings and go to the “Nodes” tab
  - Select “Sphere 3d” instead of “Disk 2d”

o Display attributes
  - Besides a label, nodes and edges have attributes, like gender, age or relationship type in a social network. It is easy to display them instead of, or along with, the label.
  - Click on the “Attributes” button in the visualization settings
  - A dialog appears and lists all attributes, separated for nodes and edges
  - Check all the attributes you want to display, for instance “Code”
  - Click on OK to confirm

o Transform text color and size
  - The Ranking module is used for this.
  - Find the label color transformer and select which attribute to use for ranking. Here “Degree” is chosen.
  - Configure the ranking colors and click on “APPLY”
  - The text should be colored now. Also try using “Betweenness Centrality” instead of “Degree”.
  - Now select the label size transformer
  - Select sizes between 0 and 1, as this size value is multiplied by the default element size
  - Click on “APPLY” to see how the text size changes


o Antialiasing option
  - Antialiasing is a visualization option which makes edges look smoother. It is set at 4x by default and can be set up to 16x.
  - Go to the Gephi options in the “Tools” menu
  - Select the “Visualization” tab and then the “OpenGL” tab
  - Here you can change the antialiasing option. Restart Gephi to validate the changes.

Layout the graph
o Layout algorithms set the shape of the graph; applying one is the most essential action.
o Locate the Layout module on the left panel.
  - Choose “Force Atlas 2” (it handles large networks while keeping very good quality)
o “RUN” the layout after applying the following settings step by step:
  - LinLog mode = checked (linear attraction and logarithmic repulsion, lin-lin by default; makes clusters tighter)
  - Scaling = 100 (increase to make the graph sparser)
  - Edge weight influence = 0 (from 0 (no influence) to 1 (normal); set to 0 to calculate forces without edge weights)
o Now “STOP” the algorithm.

Layout Algorithms
o The purpose of Layout Properties is to let you control the algorithm in order to make an aesthetically pleasing representation.

There are several layout options available to the user, namely OpenOrd, ForceAtlas, Yifan Hu, Fruchterman-Reingold, Circular, Radial Axis and GeoLayout, each one being used for a specific purpose.

LAYOUT                                        EMPHASIS
OpenOrd                                       Divisions/Clustering
ForceAtlas, Yifan Hu, Fruchterman-Reingold    Complementarities
Circular, Radial Axis                         Ranking
GeoLayout                                     Geographic Repartition

Ranking (color)
o The Ranking module lets you configure the color and size of nodes.
o Locate the Ranking module, in the top left.
o Choose “Degree” as the rank parameter. The configuration panel appears.
o Configure the colors:
  - Move your mouse over the gradient component
  - Double-click on the triangles to configure the color
o Click on “APPLY” to see the result.
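The idea behind ranking by color can be sketched in a few lines of plain Python: each node's degree is rescaled to the range 0 to 1 and used to blend between the two ends of a gradient. This is an illustration of the principle under simple assumptions (a two-color linear gradient in RGB), not Gephi's own code; the names lerp_color and rank_colors and the sample degrees are hypothetical.

```python
def lerp_color(start, end, t):
    """Blend two RGB tuples; t=0 gives `start`, t=1 gives `end`."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def rank_colors(values, start, end):
    """Map each node's value to a color on a linear gradient from start to end."""
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    return {
        node: lerp_color(start, end, (value - low) / span)
        for node, value in values.items()
    }


# Hypothetical degrees for three nodes, colored from black to orange.
degrees = {"a": 1, "b": 3, "c": 5}
colors = rank_colors(degrees, (0, 0, 0), (200, 100, 0))
```

The least connected node receives the start color, the most connected node the end color, and every other node a proportional blend of the two.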


Ranking result table
o You can see the rank values by enabling the result table. ACARVIN has 252 links and is the most connected node in the network.
o Enable the table result view in the bottom toolbar.
o Click again on “APPLY”.

Metrics
o Calculate the average path length for the network. This metric computes the path length for all possible pairs of nodes and gives information about how close nodes are to each other.
o Click on “RUN” near “Average Path Length”. The settings panel appears immediately.
o Select “Directed” and click on OK to compute the metric.
o When finished, the metric displays its result in a report.
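What this metric measures can be sketched with a breadth-first search in plain Python. The sketch below averages the shortest path length over all ordered pairs of nodes that can reach each other; whether Gephi treats unreachable pairs in exactly this way is an assumption, and the function names and the toy edge list are hypothetical.

```python
from collections import deque


def shortest_path_lengths(successors, source):
    """Breadth-first search: number of hops from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for following in successors.get(current, ()):
            if following not in distances:
                distances[following] = distances[current] + 1
                queue.append(following)
    return distances


def average_path_length(edges, directed=True):
    """Mean shortest path length over all ordered pairs that are connected."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
        successors.setdefault(target, set())
        if not directed:
            successors[target].add(source)
    total = pairs = 0
    for node in successors:
        for other, distance in shortest_path_lengths(successors, node).items():
            if other != node:
                total += distance
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical chain a -> b -> c.
chain = [("a", "b"), ("b", "c")]
directed_average = average_path_length(chain, directed=True)
undirected_average = average_path_length(chain, directed=False)
```

Running one search per node keeps the sketch simple; for very large networks this becomes slow, which is one reason to leave the computation to Gephi.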


Ranking (size)
o Metrics generate general reports, but also results for each node. Thus three new values have been created by the “Average Path Length” algorithm we ran:
  - Betweenness Centrality
  - Closeness Centrality
  - Eccentricity
o Go back to Ranking.
o Select “Betweenness Centrality” in the list. For this metric, the highest values indicate the most influential nodes.
o The node size will be set now. Colors remain the “Degree” indicator.
o Select the diamond icon in the toolbar for size.
o Set a minimum size of 40 and a maximum size of 200.
o Click on “APPLY” to see the result.
  - Color: Degree
  - Size: Betweenness Centrality metric
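For the curious reader, the two ideas in this step can be sketched in plain Python: a betweenness score per node, followed by a linear mapping of that score onto sizes between 40 and 200. The betweenness part follows the well-known Brandes approach for unweighted graphs and, as a simplifying assumption, treats the network as undirected and leaves the scores unnormalized; it is not Gephi's own code, and the function names and toy network are hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []                                   # nodes in order of discovery
        predecessors = {node: [] for node in adjacency}
        paths = dict.fromkeys(adjacency, 0)          # number of shortest paths
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    paths[neighbour] += paths[current]
                    predecessors[neighbour].append(current)
        dependency = dict.fromkeys(adjacency, 0.0)
        for node in reversed(order):                 # farthest nodes first
            for previous in predecessors[node]:
                dependency[previous] += paths[previous] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each pair of endpoints was counted once in each direction.
    return {node: value / 2 for node, value in score.items()}


def scale_sizes(values, min_size=40.0, max_size=200.0):
    """Map values linearly onto the range min_size..max_size."""
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1.0  # avoid dividing by zero when all values are equal
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Hypothetical chain a - b - c: only b lies between two other nodes.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
centrality = betweenness(network)
sizes = scale_sizes(centrality)
```

In this toy chain, the middle node is the only one that lies on a shortest path between two others, so it receives the maximum size while the end nodes receive the minimum.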


Show labels
o Display node labels.
o Set label size proportional to node size.
o Set label size with the scale slider.
o Set label color
  - Locate the color chooser in the visualization settings
  - Press the left mouse button to display the palette and pick a color. This sets the node label color.
  - To configure the edge label color, expand the settings bar
o Label Adjust
  - Go to the Layout panel
  - Choose the “Label Adjust” layout in the list
  - Click on “RUN” to proceed


Community detection
o The ability to detect and study communities is central to network analysis. We would like to colorize the clusters in our example.
o Gephi implements the Louvain method, available from the Statistics panel.
o Click on “RUN” near the “Modularity” line.
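The Louvain method searches for a grouping of nodes that maximizes a score called modularity. The sketch below does not implement the Louvain search itself; it only computes the modularity score of a grouping that is already given, for an undirected and unweighted network, so that the quantity being optimized is easier to picture. The function name and the toy network are assumptions made for this example.

```python
def modularity(edges, community):
    """Modularity of a given partition of an undirected, unweighted graph.

    For each community: (share of edges inside it) minus
    (share of edge endpoints attached to it) squared.
    """
    edge_count = len(edges)
    if edge_count == 0:
        return 0.0
    internal = {}      # edges with both endpoints in the community
    endpoints = {}     # sum of the degrees of the community's nodes
    for source, target in edges:
        source_group, target_group = community[source], community[target]
        endpoints[source_group] = endpoints.get(source_group, 0) + 1
        endpoints[target_group] = endpoints.get(target_group, 0) + 1
        if source_group == target_group:
            internal[source_group] = internal.get(source_group, 0) + 1
    return sum(
        internal.get(group, 0) / edge_count - (count / (2 * edge_count)) ** 2
        for group, count in endpoints.items()
    )


# Hypothetical network: two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
score_two_groups = modularity(edges, two_groups)
score_one_group = modularity(edges, one_group)
```

Splitting the toy network into its two triangles scores higher than putting every node in one group, which matches the intuition that the triangles form separate communities.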


Partition
o The community detection algorithm created a “Modularity Class” value for each node. The Partition module can use this new data to colorize communities.
o Locate the Partition module on the left panel.
o Click on the “Refresh” button to populate the partition list.
o Select “Modularity Class” in the partition list.
o You can see that several communities were found, sorted in decreasing order of percentage; your results may differ. A random color has been set for each community identifier.
o Click on “APPLY” to colorize the nodes.


Filter
o The last manipulation step is filtering. You create filters that can hide nodes and edges on the network. We will create a filter to remove the least connected nodes, i.e. nodes with fewer than nine edges.
o Locate the Filters module on the right panel.
o Select “Degree Range” in the “Topology” category.
o Drag it to the Queries and drop it on “Drag filter here”.
o Click on “Degree Range” to activate the filter. The parameters panel appears. It shows a range slider and a chart that represents the data, the degree distribution.
o Move the slider to set its lower bound to 9.
o Enable filtering by pushing the button.
o Nodes with a degree lower than 9 are now hidden.
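The effect of a degree range filter can be sketched in plain Python as follows. The sketch computes each node's degree once on the full network, keeps the nodes whose degree falls within the range, and keeps only the edges between kept nodes. That the degree is measured on the unfiltered network is an assumption, and the function name and toy network are hypothetical.

```python
def degree_range_filter(edges, lower, upper=None):
    """Return (kept nodes, kept edges) for nodes whose degree is within the range."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    kept_nodes = {
        node for node, value in degree.items()
        if value >= lower and (upper is None or value <= upper)
    }
    # An edge survives only if both of its endpoints survive.
    kept_edges = [
        (source, target) for source, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


# Hypothetical network: a hub linked to a, b and c, plus one edge between a and b.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
nodes_left, edges_left = degree_range_filter(edges, lower=2)
```

With a lower bound of 2, node "c" (a single edge) is hidden together with its edge, while the hub and the two mutually linked nodes remain.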

Preview
o Before exporting your graph as an SVG or PDF file, go to the Preview to see exactly how the graph will look and to add the finishing touches.
o Select the “Preview” tab in the banner. Click on “Refresh” to see the preview.
o In the Node properties, find “Show Labels” and enable the option. Click on “REFRESH”.


Export as SVG
o From Preview, click on “SVG” near “Export”. SVG files are vector graphics, like PDF: images scale smoothly to different sizes and can therefore be printed or integrated into high-resolution presentations. SVG files can be transformed and manipulated in Inkscape or Adobe Illustrator.
o Save your project.

Installing plugins
Being a feature-rich, truly open source tool in its class, Gephi has attracted a lot of attention from developers and researchers all around the world. As a result, there is a plethora of plugins available to extend Gephi's functionality. These plugins can be found at https://gephi.org/plugins/ . A majority of these plugins are developed by the community, and quite a few are under active development. A few prominent ones are:
o Retweet Monitor: Used for monitoring live retweets. More details at https://gephi.org/plugins/retweet-monitor/
o Graphviz Layout: Used to make layouts suitable for the specialized Graphviz software. More details at https://gephi.org/plugins/graphviz-layout/
o Parallel Force Atlas: Used to speed up ForceAtlas, using multiple threads. More details at https://gephi.org/plugins/parallel-force-atlas/
o Social Network Analysis: This plugin allows computation of various metrics used in social network analysis and influencer analysis. More details at https://gephi.org/plugins/social-network-analysis/
o Layered Layout: This is a specialized layout with nodes in different orbits, specially used in Social Network Analysis. More details at https://gephi.org/plugins/layeredlayout/
o HTTP Graph: Generates data based on the web browsing activity on the machine. More details at https://gephi.org/plugins/http-graph/
o Circular Layout, OpenOrd Layout, GeoLayout: These are layout algorithms, as described previously under Layout Algorithms.

To install a plugin:
1. Download the .zip file from the respective webpage for the plugin.
2. Extract the file to a folder of your choice, to get a “.nbm” file.
3. Open Gephi.
4. Go to Tools > Plugins.
5. Click on the “Downloaded” tab.
6. Click “Add Plugins”.
7. Browse to the path where the file was extracted and select the “.nbm” file.
8. Click OK and then Install.
9. Follow the onscreen instructions.
