0% found this document useful (0 votes)

26 views27 pages

Lecture 4 - Analyzing Massive Graphs Part I

Uploaded by

Werd We

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views27 pages

Lecture 4 - Analyzing Massive Graphs Part I

Uploaded by

Werd We

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 27

ANALYZING MASSIVE GRAPHS - PART I

DR. MICHAEL FIRE

Graphs/Networks are Everywhere

Some examples:
• Online Social Networks
• Computer Networks
• Financial Transactions Networks
• Protein Networks
• Neuron Networks
• Animal Social Networks
Networks/Graphs are Well-Studied

Due to their cross-disciplinary usefulness today, there are:

• Numerous graph algorithms
• Many tools/frameworks to analyze graphs
• Many tools to visualize graphs

igraph

Networkx
Graphs - The Basics
A graph consists of a set of vertices (nodes), and a
set of links (edges). The links connect vertices.

There are direct graphs, like Twitter, and undirected

graphs, like Facebook.
Practical Network Analysis
There are countless interesting things that we can learn from graphs:
• Graph Algorithms and Their Complexity
• Network Evolution Models
• Dynamic Networks
• Spreading Phenomena
• Temporal Networks
• Motifs

In today’s lecture, we are going to learn how to use networks as

data structures to analyze and infer insights from data
Brief History of Network Models
• In 1959, Erdős and Rényi developed random graph generation
model
• In 1965, Price observed a network in which the degree distribution
followed a power law. Later, in 1976, Price provided an explanation
of the creation of these types of networks: “Success seems to
breed success”
• In 1998, Watts and Strogatz introduced model for generating small-
world networks
• In 1999, Baraba ́ si and Albert observed that degree distributions
that follow power laws exist in a variety of networks, including the
World Wide Web. Baraba ́ si and Albert coined the term “scale-free
networks” for describing such networks.
Network Analysis Tools
• Networkx - a Python Library for graph analysis
• iGraph - a library collection for creating and manipulating graphs
and analyzing networks. It is written in C and also exists as
Python and R packages.
• NodeXL - NodeXL Basic is an open-source template that makes
it easy to explore network graphs
• NetworKit - NetworKit is an open-source tool suite for high-
performance network analysis. Its aim is to provide tools for the
analysis of large networks.
• SGraph - A scalable graph data structure. The SGraph provides
flexible vertex and edge query functions, and seamless
transformation to and from an SFrame object
Networkx
Pros:
• Easy and fun to use
• Mature library
• Many implemented graph algorithms
• Vertices can be anything and edges can store
attributes
• Easy to visualize graphs and to save them in many
formats
Cons:
• Doesn’t work well with large graphs
• Slow
iGraph
Pros:
• Mature library
• Many implemented graph algorithms
• Pretty fast
Cons:
• Little tricky to use
• Doesn’t work well with very large graphs
SGraph
Pros:
• Can work with very large graphs (billions of links)
• Part of the TuriCreate Eco-system (works great with
SFrame)
Cons:
• Relatively little functionality
Network Visualization Tools
Cytoscape is an open source bioinformatics software
platform for visualizing molecular interaction networks.
Cytoscape can be used to visualize and analyze network
graphs of any kind.

Yeast Protein–protein/Protein–DNA interaction network visualized by Cytoscape.

Network Visualization Tools
Gephi is an open-source network
analysis and visualization software package written in Java
Network Visualization Tools
[Link] is a JavaScript library for producing dynamic,
interactive data visualizations in web browsers.

We can use [Link] to dynamically visualize networks

Some Terminology
There are many interesting vertex/links/networks properties that are
widely used in the context of analyzing networks. Here is a short list
of terms we will need for the following examples:
• Degree - the number of edges incident to a vertex
• Clustering Coeﬃcient - a measure of the degree to which
nodes in a graph tend to cluster together.
• Betweenness centrality - a centrality measure which quantifies
the number of times a vertex/edge acts as a bridge along the
shortest path between two other nodes/edges.
• Modularity - measures the strength of division of a graph into
modules (also called groups, clusters or communities)
• Strongly Connected Component - a maximal group of vertices
that are mutually reachable
• Weakly Connected Component - a maximal group of vertices
that are mutually reachable by violating the edge directions.
• Common Friends - Given two vertices u and v, the common
friends of u and v is the size of the common set of friends that
both u and v possess.
Some Useful Algorithms
• Community Detection
• Shortest Path
• Closeness Centrality
• PageRank
• k-Core Decomposition
Let’s move to running code using Jupyter Notebook
Example: Closeness
Centrality
Closeness centrality (or closeness) of a node is a
measure of centrality in a network, calculated as the
reciprocal of the sum of the length of the shortest
paths between the node and all other nodes in
the graph.

Thus, the more central a node is, the closer it is to all

other nodes.
The Networkx Implementation
The Networkx Implementation
Example: Grivan-Newman
Community Detection Algorithm
The idea behind the algorithm:
• Iteratively remove edges
• focuses on edges that are most likely “between"
communities, i.e. edges with the highest
betweenness
• Stop were there are no-more edges
• As the graph breaks down into pieces, the tightly
knit community structure is exposed and the result
can be depicted as a dendrogram.
The Networkx Implementation
"""
# If the graph is already empty, simply return its connected
# components.
if G.number_of_edges() == 0:
yield tuple(nx.connected_components(G))
return
# If no function is provided for computing the most valuable edge,
# use the edge betweenness centrality.
if most_valuable_edge is None:
def most_valuable_edge(G):
"""Returns the edge with the highest betweenness centrality
in the graph `G`.

"""
# We have guaranteed that the graph is non-empty, so this
# dictionary will never be empty.
betweenness = nx.edge_betweenness_centrality(G)
return max(betweenness, key=[Link])
# The copy of G here must include the edge weight data.
g = [Link]().to_undirected()
# Self-loops must be removed because their removal has no effect on
# the connected components of the graph.
g.remove_edges_from(nx.selfloop_edges(g))
while g.number_of_edges() > 0:
yield _without_most_central_edges(g, most_valuable_edge)
The Networkx Implementation
def _without_most_central_edges(G, most_valuable_edge):
"""Returns the connected components of the graph that results from
repeatedly removing the most "valuable" edge in the graph.

`G` must be a non-empty graph. This function modifies the graph `G`
in-place; that is, it removes edges on the graph `G`.

`most_valuable_edge` is a function that takes the graph `G` as input

(or a subgraph with one or more edges of `G` removed) and returns an
edge. That edge will be removed and this process will be repeated
until the number of connected components in the graph increases.

"""
original_num_components = nx.number_connected_components(G)
num_new_components = original_num_components
while num_new_components <= original_num_components:
edge = most_valuable_edge(G)
G.remove_edge(*edge)
new_components = tuple(nx.connected_components(G))
num_new_components = len(new_components)
return new_components

most_valuable_edge -> usually the default is the edge with the

highest betweenness centrality
Edge Betweenness Centrality
Using Girvan-Newman
Recommended Read:
• Michael Fire: The Science of networks: Big Data in
action
• Network Science Course by Carlos Castillo
• Top 30 Social Network Analysis and Visualization Tools
by Devendra Desale
• Lecture 24 — Community Detection in Graphs -
Motivation | Stanford University
• k-Core Decomposition: A tool for the visualization of
Large Scale Networks by Alvarez-Hamelin, José
Ignacio, et al.

Understanding Graph Theory with NetworkX
No ratings yet
Understanding Graph Theory with NetworkX
23 pages
Experion Document1
No ratings yet
Experion Document1
19 pages
Social Network Analysis Guide
No ratings yet
Social Network Analysis Guide
20 pages
Graph Analytics for Recommendation Engines
No ratings yet
Graph Analytics for Recommendation Engines
13 pages
Visualization Using Tools
No ratings yet
Visualization Using Tools
12 pages
Group 7 - SNA GA2
No ratings yet
Group 7 - SNA GA2
14 pages
Biol Sistemas 1 Redes
No ratings yet
Biol Sistemas 1 Redes
70 pages
Course 1-2
No ratings yet
Course 1-2
39 pages
03 SNA Network Measures Advanced 2
No ratings yet
03 SNA Network Measures Advanced 2
37 pages
Basics of Network Analysis
No ratings yet
Basics of Network Analysis
38 pages
C2 - Social Network Measurement
No ratings yet
C2 - Social Network Measurement
42 pages
Anurag Fulare Re Review 1 Sem 6
No ratings yet
Anurag Fulare Re Review 1 Sem 6
28 pages
NetworkX Python Tutorial for Network Analysis
No ratings yet
NetworkX Python Tutorial for Network Analysis
49 pages
Module1: Introduction: Prof. Punitha K, VIT Chennai
No ratings yet
Module1: Introduction: Prof. Punitha K, VIT Chennai
50 pages
BDA Experiment 8
No ratings yet
BDA Experiment 8
12 pages
05 Motifs PDF
No ratings yet
05 Motifs PDF
41 pages
Unit 6 Mining Social Network Graph
No ratings yet
Unit 6 Mining Social Network Graph
9 pages
Introduction To Network Analysis in Python Chapter1
No ratings yet
Introduction To Network Analysis in Python Chapter1
39 pages
Section 5
No ratings yet
Section 5
21 pages
Mathematics-2
No ratings yet
Mathematics-2
10 pages
Mod1 2
No ratings yet
Mod1 2
21 pages
SNA (Exp-3) B1
No ratings yet
SNA (Exp-3) B1
6 pages
6handbook of Graphs and Networks in People Analytics
No ratings yet
6handbook of Graphs and Networks in People Analytics
269 pages
Data Representation & Networks
No ratings yet
Data Representation & Networks
26 pages
Lec 32
No ratings yet
Lec 32
25 pages
Week 16
No ratings yet
Week 16
47 pages
L21 Mining Social Network Graphs
No ratings yet
L21 Mining Social Network Graphs
30 pages
Python Network Analysis Guide
No ratings yet
Python Network Analysis Guide
47 pages
Training 2024
No ratings yet
Training 2024
20 pages
Complex Networks Course Overview
No ratings yet
Complex Networks Course Overview
46 pages
Graph Theory in Cyber Security Analysis
No ratings yet
Graph Theory in Cyber Security Analysis
45 pages
Scalable Graph Analytics For Social Network Analysis Using Spark Final
No ratings yet
Scalable Graph Analytics For Social Network Analysis Using Spark Final
9 pages
Tutorial
No ratings yet
Tutorial
46 pages
Introduction To Datascience (R20DS501)
No ratings yet
Introduction To Datascience (R20DS501)
23 pages
DSP Unit 5
No ratings yet
DSP Unit 5
33 pages
Data Science 5th Assignment
No ratings yet
Data Science 5th Assignment
13 pages
Lecture 12 Graph
No ratings yet
Lecture 12 Graph
57 pages
05 Networks
No ratings yet
05 Networks
48 pages
Graph Theory for STEM Students
No ratings yet
Graph Theory for STEM Students
191 pages
Unit I Graph Theory and Concepts
No ratings yet
Unit I Graph Theory and Concepts
35 pages
Twitter Social Networking Analysis
No ratings yet
Twitter Social Networking Analysis
36 pages
Algorithms Unit 2
No ratings yet
Algorithms Unit 2
71 pages
Rust Finedoc
No ratings yet
Rust Finedoc
16 pages
Graph Theory Basics for Beginners
No ratings yet
Graph Theory Basics for Beginners
89 pages
Understanding Graph Databases - A Comprehensive Introduction
No ratings yet
Understanding Graph Databases - A Comprehensive Introduction
29 pages
Lec 18-Graph Analytics
No ratings yet
Lec 18-Graph Analytics
100 pages
SMA Exp 6
No ratings yet
SMA Exp 6
2 pages
Big Data Analysis With Networks (BDAWN) : Name of The Faculty: Affiliation
No ratings yet
Big Data Analysis With Networks (BDAWN) : Name of The Faculty: Affiliation
8 pages
Unit 2 Notes Social Media
No ratings yet
Unit 2 Notes Social Media
28 pages
Chapter 10
No ratings yet
Chapter 10
83 pages
Community Structure Identification in Graphs
No ratings yet
Community Structure Identification in Graphs
10 pages
ARS CH1 CN Basics
No ratings yet
ARS CH1 CN Basics
61 pages
Topic 1 - Graphs
No ratings yet
Topic 1 - Graphs
14 pages
Chapter 3
No ratings yet
Chapter 3
54 pages
Analyzing Co-Occurrence with GraphX
No ratings yet
Analyzing Co-Occurrence with GraphX
43 pages
10 - Network Robustness in IoT
No ratings yet
10 - Network Robustness in IoT
34 pages
A Graph 2
No ratings yet
A Graph 2
17 pages
Python Social Network Analysis Guide
No ratings yet
Python Social Network Analysis Guide
80 pages
Hamiltonian Path and Circuit Analysis
No ratings yet
Hamiltonian Path and Circuit Analysis
47 pages
CSE0541-1203 - DiscreteMath by Shakibul Ajam
No ratings yet
CSE0541-1203 - DiscreteMath by Shakibul Ajam
106 pages
Longest Path in Dag
No ratings yet
Longest Path in Dag
4 pages
DM&GT PRACTICE PROBLEMS Unit-4 & 5
No ratings yet
DM&GT PRACTICE PROBLEMS Unit-4 & 5
4 pages
2 Mark Questions Graph Theory-1
No ratings yet
2 Mark Questions Graph Theory-1
3 pages
Categorical and Numerical Data Analysis
No ratings yet
Categorical and Numerical Data Analysis
30 pages
Lecture Notes (CS503)
No ratings yet
Lecture Notes (CS503)
14 pages
Domination Number of Some Graphs
No ratings yet
Domination Number of Some Graphs
14 pages
Maths 1 Week 11 GA P1 ?
No ratings yet
Maths 1 Week 11 GA P1 ?
18 pages
Erdos 60 Random
No ratings yet
Erdos 60 Random
5 pages
VLSI Logic Circuits and Graph Theory-V3
No ratings yet
VLSI Logic Circuits and Graph Theory-V3
5 pages
Graphs With D No. of Distinct Eigenvalues - 1
No ratings yet
Graphs With D No. of Distinct Eigenvalues - 1
6 pages
Project Representation in Management
No ratings yet
Project Representation in Management
26 pages
Chapter 0.0 Introduction To Data Structures
No ratings yet
Chapter 0.0 Introduction To Data Structures
30 pages
Analysis Chapter 3
No ratings yet
Analysis Chapter 3
41 pages
IIT Kanpur Algorithm Practice Sheet
No ratings yet
IIT Kanpur Algorithm Practice Sheet
4 pages
DMGT
No ratings yet
DMGT
7 pages
DMGT SA Unit 1-5
No ratings yet
DMGT SA Unit 1-5
11 pages
Sptve - Icf 9 - Q1 - M18
No ratings yet
Sptve - Icf 9 - Q1 - M18
15 pages
Chapter 6
No ratings yet
Chapter 6
13 pages
Longest Chain in Graph via Genetic Algorithm
No ratings yet
Longest Chain in Graph via Genetic Algorithm
16 pages
Top-K Graph Mining Algorithm
No ratings yet
Top-K Graph Mining Algorithm
18 pages
Understanding Bloom's Taxonomy
100% (1)
Understanding Bloom's Taxonomy
20 pages
Algorithm Exam Prep Guide
No ratings yet
Algorithm Exam Prep Guide
19 pages
Operational Research TD
No ratings yet
Operational Research TD
11 pages
Graph Theory Basics & Exercises
No ratings yet
Graph Theory Basics & Exercises
10 pages
Prelim Examination
No ratings yet
Prelim Examination
4 pages
Graph and Tree Concepts Explained
No ratings yet
Graph and Tree Concepts Explained
31 pages
Algorithm Quiz
No ratings yet
Algorithm Quiz
40 pages
Persistence and Periodicity in A Dynamic Proximity Network
No ratings yet
Persistence and Periodicity in A Dynamic Proximity Network
5 pages

Lecture 4 - Analyzing Massive Graphs Part I

Uploaded by

Lecture 4 - Analyzing Massive Graphs Part I

Uploaded by

ANALYZING MASSIVE GRAPHS - PART I

DR. MICHAEL FIRE

Due to their cross-disciplinary usefulness today, there are:

There are direct graphs, like Twitter, and undirected

In today’s lecture, we are going to learn how to use networks as

Yeast Protein–protein/Protein–DNA interaction network visualized by Cytoscape.

We can use [Link] to dynamically visualize networks

Thus, the more central a node is, the closer it is to all

`most_valuable_edge` is a function that takes the graph `G` as input

most_valuable_edge -> usually the default is the edge with the

You might also like