
# BIO/CS 471 – Algorithms for bioinformatics

Graph Theoretic
Concepts and Algorithms
for Bioinformatics

Intro. to Graph Theory


What is a “graph”?
• Formally: A finite graph G(V, E) is a pair (V, E),
where V is a finite set and E is a binary relation on V.
– Recall: A relation R between two sets X and Y is a subset of X × Y.
– For each pair of distinct vertices in V, that pair is
either in set E or not in set E.

• The elements of the set V are called vertices (or
nodes) and those of set E are called edges.
• Undirected graph: The edges are unordered pairs of
V (i.e. the binary relation is symmetric).


– Ex: undirected G(V,E); V = {a,b,c}, E = {{a,b}, {b,c}}

• Directed graph (digraph): The edges are ordered pairs
of V (i.e. the binary relation is not necessarily
symmetric).
– Ex: digraph G(V,E); V = {a,b,c}, E = {(a,b), (b,c)}

Why graphs?
• Many problems can be stated in terms of a graph
• The properties of graphs are well-studied
– Many algorithms exist to solve problems posed as graphs
– Many problems are already known to be intractable

• By reducing an instance of a problem to a standard graph
problem, we may be able to use well-known graph algorithms
to provide an optimal solution
• Graphs are excellent structures for storing, searching, and
retrieving large amounts of data
– Graph theoretic techniques play an important role in increasing the
storage/search efficiency of computational techniques.

• Graphs are covered in section 2.2 of Setubal & Meidanis

Graphs in bioinformatics
• Sequences
– DNA, proteins, etc.
• Chemical compounds
• Metabolic pathways

Graphs in bioinformatics
• Phylogenetic trees

Basic definitions
[Figure: an undirected graph and a directed graph G=(V,E), labeled with a loop, an isolated vertex, multiple edges, and adjacent vertices]
• incidence: an edge (directed or undirected) is incident to a vertex that is one of its end points.
• degree of a vertex: number of edges incident to it
– Nodes of a digraph can also be said to have an indegree and an outdegree
• adjacency: two vertices connected by an edge are adjacent

“Travel” in graphs
• path: no vertex can be repeated (example path: a-b-c-d-e)
• trail: no edge can be repeated (example trail: a-b-c-d-e-b-d)
• walk: no restriction (example walk: a-b-d-a-b-c)
• closed: if starting vertex is also ending vertex
• length: number of edges in the path, trail, or walk
• circuit: a closed trail (ex: a-b-c-d-b-e-d-a)
• cycle: closed path (ex: a-b-c-d-a)
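The definitions of path, trail, walk, circuit, and cycle can be checked mechanically. The sketch below is only an illustration: the function name `classify` is hypothetical, the graph is assumed to be undirected, and the input is assumed to be a valid vertex sequence in some graph.

```python
def classify(seq):
    """Classify a vertex sequence using the slide's definitions.

    Returns (kind, length), where length is the number of edges.
    Edges are treated as unordered pairs (undirected graph assumed).
    """
    edges = [frozenset(pair) for pair in zip(seq, seq[1:])]
    closed = len(seq) > 1 and seq[0] == seq[-1]
    # For a closed sequence, the repeated start/end vertex is allowed.
    inner = seq[:-1] if closed else seq
    no_repeated_edge = len(set(edges)) == len(edges)
    no_repeated_vertex = len(set(inner)) == len(inner)
    if no_repeated_edge and no_repeated_vertex:
        kind = "cycle" if closed else "path"
    elif no_repeated_edge:
        kind = "circuit" if closed else "trail"
    else:
        kind = "closed walk" if closed else "walk"
    return kind, len(edges)


# The example sequences from the slide.
examples = ["abcde", "abcdebd", "abdabc", "abcdbeda", "abcda"]
results = {s: classify(s) for s in examples}
```

On the slide's own examples this labels a-b-c-d-e a path, a-b-c-d-e-b-d a trail, and a-b-d-a-b-c a walk.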

Types of graphs
• simple graph: an undirected graph with no loops or multiple edges between the same two vertices
• multi-graph: any graph that is not simple
• connected graph: all vertex pairs are joined by a path
• disconnected graph: at least one vertex pair is not joined by a path
• complete graph: all vertex pairs are adjacent
– Kn: the completely connected graph with n vertices
[Figure: a simple graph; the complete graph K5; a disconnected graph with two components]

Types of graphs
• acyclic graph (forest): a graph with no cycles
• tree: a connected, acyclic graph
• rooted tree: a tree with a “root” or “distinguished” vertex
– leaves: the terminal nodes of a rooted tree
• directed acyclic graph (DAG): a digraph with no cycles
• weighted graph: any graph with weights associated with the edges (edge-weighted) and/or the vertices (vertex-weighted)
[Figure: an example edge-weighted graph]

Digraph definitions
• for digraphs only…
• Every edge has a head (starting point) and a tail (ending point)
• Walks, trails, and paths can only use edges in the appropriate direction
• In a DAG, every path connects a predecessor/ancestor (the vertex at the head of the path) to its successors/descendants (nodes at the tail of any path).
• parent: direct ancestor (one hop)
• child: direct descendant (one hop)
• A descendant vertex is reachable from any of its ancestor vertices
[Figure: an example directed graph]

Computer representation
• undirected graphs: usually represented as digraphs with two directed edges per “actual” undirected edge.
• adjacency matrix: a |V| x |V| array where each cell i,j contains the weight of the edge between vi and vj (or 0 for no edge)
• adjacency list: a |V| array where each cell i contains a list of all vertices adjacent to vi
• incidence matrix: a |V| by |E| array where each cell i,j contains a weight (or a defined constant HEAD for unweighted graphs) if the vertex i is the head of edge j or a constant TAIL if vertex i is the tail of edge j
[Figure: an example weighted digraph on vertices a, b, c, d, shown as an adjacency matrix, an adjacency list, and an incidence matrix]
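As a rough illustration of the three representations, the sketch below builds all of them from one edge list. The function name `build_representations`, the small example digraph, the value `"t"` for the TAIL constant, and the use of 0 for "not incident" are assumptions made for this example. It follows the slides' convention that the head of an edge is its starting point.

```python
TAIL = "t"  # assumed marker for "this vertex is the tail (ending point) of the edge"


def build_representations(vertices, edges):
    """Build adjacency matrix, adjacency list, and incidence matrix.

    edges is a list of (head, tail, weight) triples, where head is the
    starting point and tail the ending point (the slides' convention).
    """
    index = {v: i for i, v in enumerate(vertices)}
    n, m = len(vertices), len(edges)
    adj_matrix = [[0] * n for _ in range(n)]      # 0 means "no edge"
    adj_list = {v: [] for v in vertices}
    incidence = [[0] * m for _ in range(n)]       # 0 means "not incident" (assumption)
    for j, (u, v, w) in enumerate(edges):
        adj_matrix[index[u]][index[v]] = w
        adj_list[u].append((v, w))
        incidence[index[u]][j] = w                # head row holds the weight
        incidence[index[v]][j] = TAIL             # tail row holds the TAIL constant
    return adj_matrix, adj_list, incidence


# Hypothetical example digraph (not the one from the slide's figure).
vertices = ["a", "b", "c", "d"]
edges = [("a", "c", 8), ("a", "d", 4), ("d", "c", 2)]
adj_matrix, adj_list, incidence = build_representations(vertices, edges)
```

The adjacency list is usually the most compact choice for sparse graphs, while the adjacency matrix gives constant-time edge lookup.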

Computer representation
• Linked list of nodes: Node is a defined data object with labels which include a list of pointers to its children and/or parents

    Graph = []  # list of nodes

    class Node:
        def __init__(self, label=None):
            self.label = label
            self.parents = []           # list of nodes coming into this node
            self.children = []          # list of nodes coming out of this node
            self.childEdgeWeights = []  # ordered list of edge weights

Subgraphs
• G’(V’,E’) is a subgraph of G(V,E) if V’ ⊆ V and E’ ⊆ E.
• induced subgraph: a subgraph that contains every edge in E whose end points are both in the selected V’
[Figure: G(V,E); the subgraph G’({a,c,d},{{c,d}}); the induced subgraph of G with V’ = {b,c,d,e}]

Complement of a graph
• The complement of a graph G(V,E) is a graph with the same vertex set, but with vertices adjacent only if they were not adjacent in G(V,E)
[Figure: a graph G on vertices a, b, c, d, e and its complement]
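For a simple undirected graph, the complement can be computed as "all possible vertex pairs minus the existing edges". This is a minimal sketch; the function name `complement` and the example graph are illustrative assumptions, not the graph from the figure.

```python
from itertools import combinations


def complement(vertices, edges):
    """Return the edge set of the complement of a simple undirected graph.

    edges is a set of frozensets, each holding the two end points of an edge.
    """
    all_pairs = {frozenset(pair) for pair in combinations(vertices, 2)}
    return all_pairs - set(edges)


# Hypothetical example: the path a-b-c plus an isolated vertex d.
vertices = ["a", "b", "c", "d"]
edges = {frozenset("ab"), frozenset("bc")}
comp = complement(vertices, edges)
```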

Famous problems: Shortest path
• Consider a weighted connected directed graph with a distinguished vertex
– source: a distinguished vertex with zero in-degree
• What is the path of total minimum weight from the source to any other vertex?
• Greedy strategy works for simple problems (no cycles, no negative weights)
• Longest path is a similar problem (complement weights)
• We will see this again soon for fragment assembly!
[Figure: an example weighted digraph on vertices a, b, c, d with edge weights 6, 2, 8, 4, 10]

Dijkstra’s Algorithm
• D(x) = distance from s to x (initially all ∞, with D(s) = 0)
1. Select the closest vertex to s that has not yet been processed, according to the current estimate (call it c)
2. Recompute the estimate for every other vertex, x, as the MINIMUM of:
   1. The current distance, or
   2. The distance from s to c, plus the distance from c to x, that is, D(c) + W(c, x)
• Repeat steps 1 and 2 until every vertex has been processed

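Dijkstra’s Algorithm Example

| Step | A | B | C | D | E |
|---|---|---|---|---|---|
| Initial | 0 | ∞ | ∞ | ∞ | ∞ |
| Process A | 0 | 10 | 3 | 20 | ∞ |
| Process C | 0 | 5 | 3 | 20 | 18 |
| Process B | 0 | 5 | 3 | 10 | 18 |
| Process D | 0 | 5 | 3 | 10 | 18 |
| Process E | 0 | 5 | 3 | 10 | 18 |

[Figure: example graph on vertices A, B, C, D, E with edge weights 10, 5, 20, 11, 3, 2, 15]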
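The worked example can be reproduced with a short priority-queue version of Dijkstra's algorithm. The edge list below is an assumption: the figure did not survive, so the edges and their directions were inferred from the table values and the seven weights listed for the figure. The function name `dijkstra` and the dictionary layout are likewise choices made for this sketch.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every vertex.

    graph maps each vertex to a dict {neighbor: non-negative edge weight}.
    """
    dist = {v: float("inf") for v in graph}   # D(x), initially infinity
    dist[source] = 0
    heap = [(0, source)]
    processed = set()
    while heap:
        d, c = heapq.heappop(heap)            # closest unprocessed vertex
        if c in processed:
            continue                          # stale queue entry, skip it
        processed.add(c)
        for x, w in graph[c].items():
            if d + w < dist[x]:               # MIN(current, D(c) + W(c, x))
                dist[x] = d + w
                heapq.heappush(heap, (dist[x], x))
    return dist


# Edges inferred from the example table (an assumption).
graph = {
    "A": {"B": 10, "C": 3, "D": 20},
    "B": {"D": 5},
    "C": {"B": 2, "E": 15},
    "D": {"E": 11},
    "E": {},
}
distances = dijkstra(graph, "A")
```

Under these assumed edges the final distances agree with the last row of the table: A = 0, B = 5, C = 3, D = 10, E = 18.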

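Famous problems: Isomorphism
• Two graphs are isomorphic if a 1-to-1 correspondence between their vertex sets exists that preserves adjacencies
• Determining whether two graphs are isomorphic is in NP, but it is not known to be NP-complete, and no polynomial-time algorithm is known
[Figure: two isomorphic graphs, one on vertices a, b, c, d, e and one on vertices 1, 2, 3, 4, 5]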

Famous problems: Maximal clique
• clique: a complete subgraph
• maximal clique: a clique not contained in any other clique; the maximum clique is the largest complete subgraph in the graph
• Vertex cover: a subset of vertices such that each edge in E has at least one end-point in the subset
• clique cover: vertex set divided into (not necessarily disjoint) subsets, each of which induces a clique
• clique partition: a disjoint clique cover
[Figure: a graph on vertices 1, 2, 3, 4]
Maximal cliques: {1,2,3}, {1,3,4}
Vertex cover: {1,3}
Clique cover: { {1,2,3}, {1,3,4} }
Clique partition: { {1,2,3}, {4} }
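For a graph this small, the maximal cliques and a vertex cover can be checked by brute force. The edge set below is an assumption inferred from the cliques listed on the slide (edges 1-2, 2-3, 1-3, 1-4, 3-4); the function names are hypothetical. Enumerating every subset takes exponential time, so this is only a sketch for tiny examples.

```python
from itertools import combinations


def is_clique(vertex_subset, edges):
    """True if every pair of vertices in the subset is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(vertex_subset, 2))


def maximal_cliques(vertices, edges):
    """Return all cliques that are not contained in a larger clique."""
    cliques = [
        set(c)
        for size in range(1, len(vertices) + 1)
        for c in combinations(vertices, size)
        if is_clique(c, edges)
    ]
    return [c for c in cliques if not any(c < other for other in cliques)]


def is_vertex_cover(cover, edges):
    """True if every edge has at least one end point in the cover."""
    return all(edge & set(cover) for edge in edges)


# Edge set inferred from the cliques listed on the slide (an assumption).
vertices = [1, 2, 3, 4]
edges = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (1, 4), (3, 4)]}
found = maximal_cliques(vertices, edges)
```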

Famous problems: Coloring
• vertex coloring: labeling the vertices such that no edge in E has two end-points with the same label
• chromatic number: the smallest number of labels for a coloring of a graph
• What is the chromatic number of this graph?
• Would you believe that this problem (in general) is intractable?
[Figure: a graph on vertices 1, 2, 3, 4]

Famous problems: Hamilton & TSP
• Hamiltonian path: a path through a graph which contains every vertex exactly once
• Finding a Hamiltonian path is another NP-complete problem…
• Traveling Salesman Problem (TSP): find a Hamiltonian path of minimum cost
[Figure: an unweighted example graph on vertices a through h, and a weighted example graph for the TSP]

Famous problems: Bipartite graphs
• Bipartite: any graph whose vertices can be partitioned into two distinct sets so that every edge has one endpoint in each set.
• How colorable is a bipartite graph?
• Can you come up with an algorithm to determine if a graph is bipartite or not?
• Is this problem tractable or intractable?
[Figure: the complete bipartite graph K4,4]
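One possible answer to the algorithm question is to try to 2-color the graph with a breadth-first search: a graph is bipartite exactly when it can be colored with two labels. The sketch below assumes an undirected graph stored as an adjacency dictionary; the function name `is_bipartite` and both example graphs are illustrative. It visits each vertex and edge a constant number of times, so the problem is tractable.

```python
from collections import deque


def is_bipartite(adj):
    """Return True if the undirected graph can be 2-colored.

    adj maps each vertex to an iterable of its neighbors.
    """
    color = {}
    for start in adj:                      # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]   # give the neighbor the other label
                    queue.append(w)
                elif color[w] == color[v]:
                    return False              # an edge inside one side
    return True


# Hypothetical examples: a 4-cycle is bipartite, a triangle is not.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
triangle = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
```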

Famous problems: Minimal cut set
• cut set: a subset of edges whose removal causes the number of graph components to increase
• vertex separation set: a subset of vertices whose removal causes the number of graph components to increase
• How would you determine the minimal cut set or vertex separation set?
[Figure: an example graph on vertices a through h]
cut-sets: {(a,b), (a,c)}, {(b,d), (c,d)}, {(d,f)}, ...

Famous problem: Conflict graphs
• Conflict graph: a graph where each vertex represents a concept or resource and an edge between two vertices represents a conflict between these two concepts
• When the vertices represent intervals on the real line (such as time) the conflict graph is sometimes called an interval graph
• A coloring of an interval graph produces a schedule that shows how to best resolve the conflicts… a minimal coloring is the “best” schedule
• This concept is used to solve problems in the physical mapping of DNA
[Figure: a table marking items A through F against slots 1 through 4 with an x, and the corresponding conflict graph on vertices a through f. Colors?]

Famous problems: Spanning tree
• spanning tree: A minimal subset of edges that is sufficient to keep a graph connected if all other edges are removed
• minimum spanning tree: A spanning tree where the sum of the edge weights is minimum
[Figure: a weighted graph on vertices a through h, and the same graph reduced to its minimum spanning tree]

Famous problems: Euler circuit
• G is said to have an Euler circuit if there is a circuit in G that traverses every edge in the graph exactly once
• The seven bridges of Königsberg: Find a way to walk about the city so as to cross each bridge exactly once and then return to the starting point.
• This one is in P!
[Figure: the Königsberg land areas a, b, c, d and the corresponding graph on vertices a, b, c, d]
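The reason this problem is in P is a classical result (Euler's theorem, not stated on the slide): a connected graph has an Euler circuit exactly when every vertex has even degree. The sketch below applies that test. The function name and the assignment of the seven bridges to areas a, b, c, d are assumptions, since the figure did not survive; only the well-known degree pattern (one area with five bridges, three areas with three) is relied on.

```python
from collections import defaultdict


def has_euler_circuit(edges):
    """Check the even-degree condition on a connected multigraph.

    edges is a list of (u, v) pairs; repeated pairs are parallel edges.
    Vertices with no edges are ignored.
    """
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    if not degree:
        return True
    # Depth-first search to confirm all edge-bearing vertices are connected.
    start = next(iter(degree))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    connected = len(seen) == len(degree)
    return connected and all(d % 2 == 0 for d in degree.values())


# Assumed bridge layout: area a touches five bridges, the others three each.
konigsberg = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "c"),
              ("a", "d"), ("b", "d"), ("c", "d")]
square = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
```

Every Königsberg area has odd degree, so no such walk exists, while a simple 4-cycle passes the test.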

Famous problems: Dictionary
• How can we organize a dictionary for fast lookup?
[Figure: a 26-ary “trie” in which every node has children a, b, c, …, y, z; the example lookup is “CAB”]
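A trie stores one character per edge, so a lookup costs one step per letter of the word regardless of how many words are stored. The sketch below is a simplification of the 26-ary trie in the figure: it uses a Python dict per node instead of a fixed 26-slot array, and the class name `Trie` and the end-of-word marker `"$"` are assumptions.

```python
class Trie:
    """A minimal trie; each node is a dict from character to child node."""

    END = "$"  # assumed marker meaning "a word ends at this node"

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})   # create the child if missing
        node[self.END] = True

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return self.END in node


trie = Trie()
for w in ["cab", "cat", "dog"]:
    trie.insert(w)
```

Looking up "cab" follows the edges c, a, b from the root, as in the figure's example.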

Graph traversal
• There are many strategies for solving graph problems… for many problems, the efficiency and accuracy of the solution boil down to how you “search” the graph.
• We will consider a “travel” problem for example:
• Given the graph below, find a path from vertex a to vertex d. Shorter paths (in terms of edge weight sums) are desirable.
[Figure: a weighted graph on vertices a, b, c, d, e, f with edge weights 1 through 7]

A greedy approach
• greedy traversal: Starting with the “root” node, take the edge with smallest weight. Mark the edge so that you never attempt to use it again. If you get to the end, great! If you get to a dead end, back up one decision and try the next best edge.
• Advantages: Fast! Drawbacks: Answer is usually non-optimal
• For some problems, greedy approaches are optimal; for others the answer may usually be close to the best answers; for yet other problems, the greedy strategy is a poor choice.
[Figure: the weighted example graph on vertices a, b, c, d, e, f]
Start node: a
End node: d
Traversal order: a, c, f, e, b, d
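The greedy traversal can be sketched as a recursive search that always tries the cheapest edge first. Two assumptions are made here. First, the edge weights are inferred from the path costs in the branch-and-bound table later in the deck (for example AE = 2 and AEB = 6), because the figure did not survive. Second, already-visited vertices are skipped, which is what reproduces the traversal order listed on the slide. The function name `greedy_path` is hypothetical.

```python
# Edge weights inferred from the branch-and-bound path costs (an assumption).
GRAPH = {
    "a": {"b": 3, "c": 1, "e": 2},
    "b": {"a": 3, "d": 5, "e": 4},
    "c": {"a": 1, "f": 6},
    "d": {"b": 5},
    "e": {"a": 2, "b": 4, "f": 7},
    "f": {"c": 6, "e": 7},
}


def greedy_path(graph, start, goal):
    """Follow the cheapest unused edge; back up one decision at a dead end."""
    path = [start]
    visited = {start}

    def step(v):
        if v == goal:
            return True
        # Try outgoing edges from cheapest to most expensive.
        for w, _weight in sorted(graph[v].items(), key=lambda kv: kv[1]):
            if w in visited:
                continue
            visited.add(w)
            path.append(w)
            if step(w):
                return True
            path.pop()          # dead end: back up and try the next best edge
        return False

    return path if step(start) else None


def path_cost(graph, path):
    """Sum the edge weights along a path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


route = greedy_path(GRAPH, "a", "d")
```

With these assumed weights the greedy route costs 23, far more than the direct route a, b, d at cost 8, which illustrates the "usually non-optimal" drawback.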

Exhaustive search: Breadth-first
• For the current node, do any necessary work
– In this case, calculate the cost to get to the node by the current path; if the cost is better than any previous path, update the “best path” and “lowest cost”.
• Place all adjacent unused edges in a queue (FIFO)
• Take an edge from the queue, mark it as used, and follow it to the new current node
[Figure: the weighted example graph on vertices a, b, c, d, e, f]
Traversal order: a, b, c, d, e, f

Exhaustive search: Depth-first
• For each current node
– do any necessary work
– Pick one unused edge out and follow it to a new current node
– If no unused edges exist, unmark all of your edges and go back from whence you came!

    DFS (G, v) {
        v.state = “visited”
        Process vertex v
        Foreach edge (v, w) {
            if w.state = “unseen” {
                DFS (G, w)
                process edge (v, w)
            }
        }
    }

[Figure: the weighted example graph on vertices a, b, c, d, e, f]
Traversal order: a, b, d, e, f, c
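The DFS pseudocode translates almost line for line into Python. In this sketch the adjacency structure is an assumption inferred from the path costs in the branch-and-bound table (the figure did not survive), and neighbors are visited in alphabetical order, which is also an assumption because the slide does not say how ties are broken.

```python
# Adjacency inferred from the branch-and-bound path costs (an assumption).
GRAPH = {
    "a": ["b", "c", "e"],
    "b": ["a", "d", "e"],
    "c": ["a", "f"],
    "d": ["b"],
    "e": ["a", "b", "f"],
    "f": ["c", "e"],
}


def dfs(graph, v, visited=None):
    """Return the vertices in the order a depth-first search first reaches them."""
    if visited is None:
        visited = []
    visited.append(v)                    # v.state = "visited"; process vertex v
    for w in sorted(graph[v]):           # foreach edge (v, w), alphabetically
        if w not in visited:             # w.state == "unseen"
            dfs(graph, w, visited)
    return visited


order = dfs(GRAPH, "a")
```

Under these assumptions the search dives from a to b to d, backs up to b, and continues through e, f, and c.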

Branch and Bound
• Begin a depth-first search (DFS)
• Once you achieve a successful result, note the result as our initial “best result”
• Continue the DFS; if you find a better result, update the “best result”
• At each step of the DFS compare your current “cost” to the cost of the current “best result”; if we already exceed the cost of the best result, stop the downward search! Mark all edges as used, and head back up.
[Figure: the weighted example graph on vertices a, b, c, d, e, f]

Traversal order:

| Path | Current | Best | Note |
|---|---|---|---|
| A | 0 | | |
| AE | 2 | | |
| AEB | 6 | | |
| AEBD | 11 | 11 | |
| AEF | 9 | 11 | |
| AEFC | 15 | 11 | |
| AC | 1 | 11 | |
| ACF | 7 | 11 | |
| ACFE | 15 | 11 | < prune |
| AB | 3 | 11 | |
| ABD | 8 | 8 | |
| ABE | 7 | 8 | |
| ABEF | 14 | 8 | |
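The pruning rule can be added to a depth-first search in a few lines. This is a sketch under stated assumptions: the edge weights are inferred from the path costs in the table above, the function name `branch_and_bound` is hypothetical, and the search prunes when the current cost equals or exceeds the best cost (the slide says "exceed"). The order in which branches are explored differs from the table, but the final best result is the same.

```python
# Edge weights inferred from the path costs in the table (an assumption).
GRAPH = {
    "a": {"b": 3, "c": 1, "e": 2},
    "b": {"a": 3, "d": 5, "e": 4},
    "c": {"a": 1, "f": 6},
    "d": {"b": 5},
    "e": {"a": 2, "b": 4, "f": 7},
    "f": {"c": 6, "e": 7},
}


def branch_and_bound(graph, start, goal):
    """Depth-first search that abandons any branch no better than the best so far."""
    best = {"cost": float("inf"), "path": None}

    def search(v, path, cost):
        if cost >= best["cost"]:
            return                               # bound: stop the downward search
        if v == goal:
            best["cost"], best["path"] = cost, path
            return
        for w, weight in graph[v].items():
            if w not in path:                    # never revisit a vertex on this path
                search(w, path + [w], cost + weight)

    search(start, [start], 0)
    return best["cost"], best["path"]


best_cost, best_path = branch_and_bound(GRAPH, "a", "d")
```

The result is the path a, b, d with cost 8, matching the final "Best" value in the table.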

Binary search trees
• Binary trees have at most two children per node (the child may be null)
• Binary search trees are organized so that each node has a label.
• When searching or inserting a value, compare the target value to each node: one out-going edge corresponds to “less than” and one out-going edge corresponds to “greater than”.
• On the average, you eliminate 50% of the search space per node… if the tree is balanced
[Figure: a binary search tree holding the values 1 through 10]
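Insertion and search follow the same compare-and-descend rule. In this minimal sketch the class and function names are hypothetical, and the insertion order of the values 1 through 10 is an assumption, since the exact shape of the tree in the figure is not recoverable. Duplicate values are ignored.

```python
class BSTNode:
    """One node of a binary search tree; either child may be None."""

    def __init__(self, value):
        self.value = value
        self.left = None     # subtree of values less than self.value
        self.right = None    # subtree of values greater than self.value


def insert(node, value):
    """Insert value into the subtree rooted at node and return the subtree root."""
    if node is None:
        return BSTNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node


def search(node, value):
    """Return (found, number of nodes compared)."""
    steps = 0
    while node is not None:
        steps += 1
        if value == node.value:
            return True, steps
        node = node.left if value < node.value else node.right
    return False, steps


# Assumed insertion order; it yields a tree with root 5.
root = None
for v in [5, 3, 8, 2, 4, 6, 9, 1, 7, 10]:
    root = insert(root, v)
```

In this tree, finding 7 compares against 5, 8, 6, and 7, so four of the ten nodes are inspected.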