In light of the recent discovery that Handout #33S Problem 8 had an incorrect solution for how to do cycle checking, I decided to slap together a handout about graphs and basic graph algorithms. This handout is intended as an aid for those who are still struggling with graphs and as prep for people continuing on with computer science.
A graph is a finite set of nodes and a set of links between these nodes. These links are called arcs or edges. Two nodes connected by an edge are said to be adjacent. The edges can be traversed to find paths between nodes. If there exists a path from node $v$ to node $w$, then $v$ is said to be able to reach $w$.
There are two basic kinds of graphs: directed and undirected. The set of nodes is denoted $V$ by convention and the set of edges is denoted $E$. Where most types of graphs differ is in the definition of $E$. In directed graphs, defining $E$ is rather simple: $E$ is a subset of the Cartesian product of $V$ with itself, $E \subseteq V \times V$. Undirected graphs take a little more care to define. Each element in $E$ is an unordered pair $\{x, y\}$ where $x, y \in V$.
In both cases, the convention is to denote the number of nodes $n$ and the number of edges $m$. The number of edges is $O(n^2)$. Because $m$ can be much larger than $n$, running times are stated in terms of both, and algorithms that are $O(n + m)$ are considered to be “linear in the size of the graph.”
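As a quick check of the $O(n^2)$ bound, here is the count worked out from the definitions of $E$ given earlier. The undirected line assumes self-loops are not allowed, which the handout does not state either way. Allowing them only adds at most $n$ more edges.

```latex
\begin{align*}
\text{directed:}   \quad & E \subseteq V \times V \implies m \le |V \times V| = n^2, \\
\text{undirected:} \quad & m \le \binom{n}{2} = \frac{n(n-1)}{2} \quad \text{(assuming no self-loops)}.
\end{align*}
```

Either way, $m = O(n^2)$.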
The primary way of representing a graph is using an adjacency list. With adjacency lists, each node is given a list of the nodes that it is adjacent to. In order to be able to access all of the nodes of the graph, a list of nodes is also kept. The amount of data needed to represent a graph in this manner is then $O(n + m)$.
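The adjacency list idea can be sketched in C++ as follows. This is only an illustration, not the handout's Fig. 1. The `Graph` struct, its member names, and the choice to number nodes 0 to n-1 are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: nodes are numbered 0..n-1, and adj[v]
// lists the nodes that v has an outgoing edge to.
struct Graph {
    std::vector<std::vector<std::size_t>> adj;  // one list per node

    explicit Graph(std::size_t n) : adj(n) {}

    std::size_t num_nodes() const { return adj.size(); }

    // Number of stored edges, found by summing the list lengths.
    // An undirected edge added below is stored (and counted) twice.
    std::size_t num_edges() const {
        std::size_t m = 0;
        for (const auto& neighbors : adj) m += neighbors.size();
        return m;
    }

    // Add the directed edge (from, to).
    void add_edge(std::size_t from, std::size_t to) {
        assert(from < adj.size() && to < adj.size());
        adj[from].push_back(to);
    }

    // Add an undirected edge by storing it in both directions.
    void add_undirected_edge(std::size_t a, std::size_t b) {
        add_edge(a, b);
        add_edge(b, a);
    }
};
```

The outer vector plays the role of the list of nodes, and the inner vectors hold the edges, so the storage is one slot per node plus one entry per stored edge.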
Oftentimes, there are many subtleties that arise directly from definitions and assumptions about the graphs being worked on. Can a node have an edge to itself? Is the input a connected graph? Can an edge have negative weight? Does there exist a cycle whose overall weight is negative? Can a path include a node or an edge more than once? Does the triangle inequality hold? Can the graph have 0 nodes? Does each node have at least 1 edge? These are all valid questions to ask before designing your own algorithms. Be very careful about what you have assumed, and make such assumptions and definitions explicit. (In the case of programming, this should be in the comments in the code and in the documentation.)
“Note that the graph must have directed arcs in order to be acyclic” (Handout #33). Now if one of you came up to me and asked whether this statement is true or false, I would have responded “FALSE!” But it is true under the handout’s definition of cyclic: “A graph is considered to be cyclic if we can find a path through a graph from any node back to the same node.” (There are still annoying edge cases that break this, such as graphs without any edges.) It is not true under what I would assume the definition of a cycle to be: “A cycle is a path through a graph from any node back to the same node without reusing an edge in the path.”
For the majority of the handout, the graph that is being used is a directed graph represented by an adjacency list that contains outgoing edges. Each node keeps a little additional information to help the algorithms presented later. This code as well as a few utility functions are in Fig. 1.
Depth First Search (DFS) is a graph searching algorithm. When DFS visits a node, if the node has already been visited, it stops. Otherwise it marks the node as visited, and then calls DFS on each of its children in no particular order. (A C++ version is in Fig. 2.)
DFS recurses on the children, and so it goes as “deep” as it can on each of them. DFS has the nifty property that it will visit all of the nodes reachable from the starting node $v$. For an undirected graph, this is the entire connected component (i.e., every node connected to $v$). Note: this is NOT guaranteed to visit all of the nodes in the graph, because not all of the nodes are necessarily reachable from $v$. To visit all of the nodes using DFS, iterate through all of the nodes and start a search from each one that is still unvisited.
Because a node is marked as visited the first time DFS reaches it, the adjacency list of each node is processed at most once. When a search is started from every unvisited node, each node is also visited at least once. Therefore the running time is $O(n + m)$.
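The DFS description above can be sketched as follows. This is not the handout's Fig. 2. The `Graph` alias, the function names `dfs` and `dfs_all`, and the `order` vector used to record visits are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed representation: g[v] lists the children of node v.
using Graph = std::vector<std::vector<std::size_t>>;

// Visit every node reachable from v, appending each node to `order`
// the first time it is visited.
void dfs(const Graph& g, std::size_t v, std::vector<bool>& visited,
         std::vector<std::size_t>& order) {
    assert(v < g.size());
    if (visited[v]) return;  // already visited: stop
    visited[v] = true;
    order.push_back(v);
    for (std::size_t child : g[v]) dfs(g, child, visited, order);
}

// Start a search from every unvisited node so the whole graph is covered.
std::vector<std::size_t> dfs_all(const Graph& g) {
    std::vector<bool> visited(g.size(), false);
    std::vector<std::size_t> order;
    for (std::size_t v = 0; v < g.size(); ++v) {
        if (!visited[v]) dfs(g, v, visited, order);
    }
    return order;
}
```

Calling `dfs` alone from one node covers only what that node can reach, while `dfs_all` adds the outer loop that the text describes. Each adjacency list is scanned once, which is where the $O(n + m)$ bound comes from.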
Breadth First Search (BFS) is another linear time graph traversal algorithm. In BFS, the starting node is enqueued into an empty queue. Then, while the queue is not empty, a node is dequeued, marked as visited, and all of its unvisited children are enqueued. (A C++ version is in Fig. 3.)
BFS has the nice property that every node is visited using the minimum number of hops necessary from the starting node. This enables other algorithms, such as checking whether or not a graph is bipartite. BFS is the skeleton upon which many more complex traversal algorithms are based, such as A* and Dijkstra’s algorithm.
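To make the minimum-hops property concrete, here is a sketch of BFS that records the hop count of every node. It is not the handout's Fig. 3. The `Graph` alias and the name `bfs_hops` are assumptions. It also marks a node when it is enqueued rather than when it is dequeued, a common variant that keeps each node from entering the queue more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Assumed representation: g[v] lists the children of node v.
using Graph = std::vector<std::vector<std::size_t>>;

// Returns dist, where dist[v] is the minimum number of edges on a path
// from start to v, or -1 if v is not reachable from start.
std::vector<int> bfs_hops(const Graph& g, std::size_t start) {
    assert(start < g.size());
    std::vector<int> dist(g.size(), -1);  // -1 doubles as "unvisited"
    std::queue<std::size_t> q;
    dist[start] = 0;
    q.push(start);
    while (!q.empty()) {
        std::size_t v = q.front();
        q.pop();
        for (std::size_t child : g[v]) {
            if (dist[child] == -1) {  // first time this node is seen
                dist[child] = dist[v] + 1;
                q.push(child);
            }
        }
    }
    return dist;
}
```

Nodes leave the queue in order of increasing hop count, so the first time a node is seen is also along a shortest path to it.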
Go back to the part discussing DFS. Notice that it says that the DFS recursion will be at most $n + 1$ calls deep. With only $n$ nodes, if the recursion does get that deep, the node in the last call must have been visited previously, and it still lies on the stack. Hmm... a cycle! In fact, if a cycle exists, DFS will eventually find a node that it has not only visited before, but that is still on the stack. This insight allows us to design a cycle checking algorithm based on DFS from Fig. 2. The full cycle checking algorithm is in Fig. 4.
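The “still on the stack” idea can be sketched for a directed graph as follows. This is not the handout's Fig. 4. The `State` enum and the names `finds_cycle` and `has_cycle` are assumptions made for this example. Each node is in one of three states: unvisited, on the current call stack, or completely finished.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed representation: g[v] lists the children of node v (directed).
using Graph = std::vector<std::vector<std::size_t>>;

enum class State { Unvisited, OnStack, Done };

// Returns true if the search from v reaches a node that is still on
// the call stack, which means a cycle exists.
bool finds_cycle(const Graph& g, std::size_t v, std::vector<State>& state) {
    assert(v < g.size());
    if (state[v] == State::OnStack) return true;  // back on our own path
    if (state[v] == State::Done) return false;    // explored earlier, no cycle found there
    state[v] = State::OnStack;
    for (std::size_t child : g[v]) {
        if (finds_cycle(g, child, state)) return true;
    }
    state[v] = State::Done;  // v leaves the stack
    return false;
}

// Checks the whole graph by starting a search from every unvisited node.
bool has_cycle(const Graph& g) {
    std::vector<State> state(g.size(), State::Unvisited);
    for (std::size_t v = 0; v < g.size(); ++v) {
        if (state[v] == State::Unvisited && finds_cycle(g, v, state)) return true;
    }
    return false;
}
```

A single visited flag is not enough here. In a graph where two paths lead to the same node, the second arrival finds the node already visited even though no cycle exists, which is why the sketch separates `OnStack` from `Done`.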