UNIT 5
SYLLABUS
• Graph Terminology - Traversal
• Topological Sorting, Minimum Spanning Tree, Prim's Algorithm
• Minimum Spanning Tree - Kruskal's Algorithm - Network Flow Problem
• Shortest Path Algorithm - Introduction - Dijkstra's Algorithm
• Hashing - Introduction - Hash Functions
• Hashing - Collision Avoidance - Separate Chaining
• Open Addressing - Linear Probing
• Quadratic Probing - Double Hashing
• Rehashing - Extensible Hashing
Introduction to ADT Graphs
What is a graph?
• A data structure that consists of a set of nodes (vertices) and a set of
edges that relate the nodes to each other
• The set of edges describes relationships among the vertices
Formal definition of graphs
• A graph G is defined as follows:
G=(V,E)
V(G): a finite, nonempty set of vertices
E(G): a set of edges (pairs of vertices)
Directed vs. undirected graphs
• When the edges in a graph have no direction, the graph is
called undirected
Directed vs. undirected graphs (cont.)
• When the edges in a graph have a direction, the graph is
called directed (or digraph)
Implementation of ADT Graphs
Graph implementation
• Array-based implementation
• A 1D array is used to represent the vertices
• A 2D array (adjacency matrix) is used to represent the edges
Array-based implementation
Graph implementation (cont.)
• Linked-list implementation
• A 1D array is used to represent the vertices
• A list is used for each vertex v which contains the vertices which
are adjacent from v (adjacency list)
Linked-list implementation
Adjacency matrix vs. adjacency list
representation
• Adjacency matrix
• Good for dense graphs: |E| ≈ O(|V|²)
• Memory requirements: O(|V| + |E|) = O(|V|²)
• Connectivity between two vertices can be tested quickly
• Adjacency list
• Good for sparse graphs: |E| ≈ O(|V|)
• Memory requirements: O(|V| + |E|) = O(|V|)
• Vertices adjacent to another vertex can be found quickly
Representation of Graph
• Graph can be represented by Adjacency Matrix and Adjacency List.
• Adjacency Matrix Representation
• The adjacency matrix A for a graph G = (V, E) with n vertices is an n x n matrix, such that
• Aij = 1, if there is an edge from Vi to Vj
• Aij = 0, if there is no edge
Adjacency Matrix-Pros and Cons
Pros:
• Simple to implement
• Easy and fast to tell if a pair (i,j) is an edge: simply check if A[i][j] is 1 or 0
Cons:
• No matter how few edges the graph has, the matrix takes O(n²) space in memory
Adjacency List Representation
• In this representation we store a graph as a linked structure: all the vertices are stored in a list, and
for each vertex we keep a linked list of its adjacent vertices.
[figure: a small example graph and its adjacency-list representation, one linked list of adjacent vertices per vertex]
Adjacency List-Pros and Cons
Pros:
• Saves on space (memory): the representation takes as many
memory words as there are nodes and edges.
Cons:
• It can take up to O(n) time to determine if a pair of nodes (i,j) is an
edge: one would have to search the linked list L[i], which takes time
proportional to the length of L[i].
Graph specification based on adjacency matrix
representation
const int NULL_EDGE = 0;

template<class VertexType>
class GraphType {
public:
    GraphType(int);
    ~GraphType();
    void MakeEmpty();
    bool IsEmpty() const;
    bool IsFull() const;
    void AddVertex(VertexType);
    void AddEdge(VertexType, VertexType, int);
    int WeightIs(VertexType, VertexType);
    void GetToVertices(VertexType, QueType<VertexType>&);
    void ClearMarks();
    void MarkVertex(VertexType);
    bool IsMarked(VertexType) const;
private:
    int numVertices;
    int maxVertices;
    VertexType* vertices;
    int** edges;
    bool* marks;
};
template<class VertexType>
GraphType<VertexType>::GraphType(int maxV)
{
    numVertices = 0;
    maxVertices = maxV;
    vertices = new VertexType[maxV];
    edges = new int*[maxV];
    for(int i = 0; i < maxV; i++)
        edges[i] = new int[maxV];
    marks = new bool[maxV];
}

template<class VertexType>
void GraphType<VertexType>::AddVertex(VertexType vertex)
{
    vertices[numVertices] = vertex;
    for(int index = 0; index < numVertices; index++) {
        edges[numVertices][index] = NULL_EDGE;
        edges[index][numVertices] = NULL_EDGE;
    }
    numVertices++;
}
template<class VertexType>
GraphType<VertexType>::~GraphType()
{
    delete [] vertices;
    for(int i = 0; i < maxVertices; i++)
        delete [] edges[i];
    delete [] edges;
    delete [] marks;
}

template<class VertexType>
void GraphType<VertexType>::AddEdge(VertexType fromVertex, VertexType toVertex, int weight)
{
    int row;
    int col;
    row = IndexIs(vertices, fromVertex);
    col = IndexIs(vertices, toVertex);
    edges[row][col] = weight;
}
template<class VertexType>
int GraphType<VertexType>::WeightIs(VertexType fromVertex, VertexType toVertex)
{
    int row;
    int col;
    row = IndexIs(vertices, fromVertex);
    col = IndexIs(vertices, toVertex);
    return edges[row][col];
}
• Then the vertex B is dequeued and its adjacent vertices C and D are
taken from the adjacency matrix for enqueuing. Since vertex C is already
in the queue, vertex D alone is enqueued.
Steps
• Choose any node in the graph. Designate it as the search node and mark it as visited.
• Using the adjacency matrix of the graph, find a node adjacent to the search node that has
not been visited yet. Designate this as the new search node and mark it as visited.
• Repeat step 2 using the new search node. If no node satisfying (2) can be found, return
to the previous search node and continue from there.
• When a return to the previous search node in (3) is impossible, the search from the
originally chosen search node is complete.
• If the graph still contains unvisited nodes, choose any node that has not been visited and
repeat steps 1 through 4.
Depth-first Search Algorithm:
• The depth-first search algorithm progresses by expanding the starting node of G
and then going deeper and deeper until the goal node is found, or until a node that
has no children is encountered. When a dead-end is reached, the algorithm
backtracks, returning to the most recent node that has not been completely
explored.
• In other words, depth-first search begins at a starting node A which becomes the
current node. Then, it examines each node N along a path P which begins at A. That
is, we process a neighbor of A, then a neighbor of neighbor of A, and so on.
Depth-first Search Algorithm:
• During the execution of the algorithm, if we reach a path that has a node N that has
already been processed, then we backtrack to the current node. Otherwise, the
unvisited (unprocessed) node becomes the current node.
• This technique is used for searching a vertex in a graph.
• It produces a spanning tree as a final result.
• A spanning tree is a subgraph that contains all the vertices and has no loops (cycles).
• Here we use Stack data structure with maximum size of total number of vertices in
the graph.
Depth-first Search Algorithm: EXAMPLE-1
Consider the graph and find the spanning tree.
DFS-iterative (G, s): //Where G is graph and s is source vertex
let S be stack
S.push( s ) //Inserting s in stack
mark s as visited.
while ( S is not empty):
//Pop a vertex from stack to visit next
v = S.top( )
S.pop( )
//Push all the neighbours of v in stack that are not visited
for all neighbours w of v in Graph G:
if w is not visited :
S.push( w )
mark w as visited
DFS-recursive(G, s):
mark s as visited
for all neighbours w of s in Graph G:
if w is not visited:
DFS-recursive(G, w)
Features of Depth-First Search Algorithm
Space complexity: The space complexity of a depth-first search is lower than that of a
breadth-first search.
Time complexity: The time complexity of a depth-first search is proportional to the number
of vertices plus the number of edges in the graphs that are traversed, i.e. O(|V| + |E|).
Completeness: Depth-first search is said to be a complete algorithm. If there is a solution,
depth-first search will find it regardless of the kind of graph. But in the case of an infinite
graph, where there is no possible solution, it will diverge.
Applications of DFS
• Detecting cycle in a graph
• Path Finding
• Solving puzzles with only one solution, such as mazes
Applications of Depth-First Search Algorithm
Depth-first search is useful for:
• Finding a path between two specified nodes, u and v, of an unweighted
graph.
• Finding a path between two specified nodes, u and v, of a weighted
graph.
• Finding whether a graph is connected or not.
• Computing the spanning tree of a connected graph.
Exercise
Find BFS and DFS Traversal ordering of nodes for the following
graph.
Topological sorting
Ordering a graph
• Suppose we have a directed acyclic graph (DAG) of courses, and we want to
find an order in which the courses can be taken.
• Must take all prereqs before you can take a given course. Example:
• [142, 143, 140, 154, 341, 374, 331, 403, 311, 332, 344,
312, 351, 333, 352, 373, 414, 410, 417, 413, 415]
• There might be more than one allowable ordering.
• How can we find a valid ordering of the vertices?
Topo sort example
• function topologicalSort():
• ordering := { }.
• Repeat until graph is empty:
• Find a vertex v with in-degree of 0 (no incoming edges).
• (If there is no such vertex, the graph cannot be sorted; stop.)
• Delete v and all of its outgoing edges from the graph.
• ordering += v.

Tracing this on the example graph (vertices A-F; the figure is not reproduced), the ordering grows by one vertex per pass:
• ordering = { B }
• ordering = { B, C }
• ordering = { B, C, A }
• ordering = { B, C, A, D }
• ordering = { B, C, A, D, F }
• ordering = { B, C, A, D, F, E }
Revised algorithm
• We don't want to literally delete vertices and edges from
the graph while trying to topological sort it; so let's revise
the algorithm:
• map := {each vertex → its in-degree}.
• queue := {all vertices with in-degree = 0}.
• ordering := { }.
• Repeat until queue is empty:
• Dequeue the first vertex v from the queue.
• ordering += v.
• Decrease the in-degree of all v's neighbors by 1 in the map.
• queue += {any neighbors whose in-degree is now 0}.
• If all vertices are processed, success.
Otherwise, there is a cycle.
Topo sort example 2
• function topologicalSort():
• map := {each vertex → its in-degree}.
• queue := {all vertices with in-degree = 0}.
• ordering := { }.
• Repeat until queue is empty:
• Dequeue the first vertex v from the queue.
• ordering += v.
• Decrease the in-degree of all v's neighbors by 1 in the map.
• queue += {any neighbors whose in-degree is now 0}.
(The example graph from the earlier slides is not reproduced.)
• function topologicalSort():
• map := {each vertex → its in-degree}. // O(V)
• queue := {all vertices with in-degree = 0}.
• ordering := { }.
• Repeat until queue is empty: // O(V)
• Dequeue the first vertex v from the queue. // O(1)
• ordering += v. // O(1)
• Decrease the in-degree of all v's neighbors by 1 in the map. // O(E) for all passes
• queue += {any neighbors whose in-degree is now 0}.
• Total running time: O(V + E).
2. Most Efficient Time Complexity of Topological Sorting is? (V – number of vertices, E – number of
edges)
a) O(V + E) b) O(V) c) O(E) d) O(V*E)
3. In most of the cases, topological sort starts from a node which has ___
a) Maximum Degree b) Minimum Degree c) Any degree d) Zero Degree
Applications of MST
1. Cluster Analysis
2. Handwriting recognition
3. Image segmentation
Growing a MST - Generic Approach
• Grow a set A of edges (initially empty)
• Incrementally add edges to A such that they would belong to a MST
- Is it safe for A initially?
[figure: example weighted graph used to grow the MST, not reproduced]
iv. After selecting the vertex v, the update rule is applied for each
unknown w adjacent to v. The rule is dw = min(dw, Cw,v); that is, if more
than one path exists between v and w, then dw is updated with the minimum
cost.
Prim’s Algorithm Example
1. v1 is selected as the initial node of the spanning tree, and the initial
configuration of the table is constructed.
v Known dv pv
V1 0 0 0
V2 0 ∞ 0
V3 0 ∞ 0
V4 0 ∞ 0
V5 0 ∞ 0
V6 0 ∞ 0
V7 0 ∞ 0
Prim’s Algorithm Example
2. v1 is declared as a known vertex. Then its adjacent vertices v2, v3, v4 are updated.
T[v2].dist = min(T[v2].dist, Cv1,v2) = min(∞, 2) = 2
T[v3].dist = min(T[v3].dist, Cv1,v3) = min(∞, 4) = 4
T[v4].dist = min(T[v4].dist, Cv1,v4) = min(∞, 1) = 1

v    Known  dv  pv
V1   1      0   0
V2   0      2   V1
V3   0      4   V1
V4   0      1   V1
V5   0      ∞   0
V6   0      ∞   0
V7   0      ∞   0
Prim’s Algorithm Example
3. Among all adjacent vertices v2, v3, v4, the distance v1 -> v4 is smallest, so v4 is selected
and declared as a known vertex. Its adjacent vertices' distances are updated.
• V1 is not examined because it is a known vertex.
• No change in V2, because it has dv = 2 and the edge cost from V4 -> V2 = 3.
T[v3].dist = min(T[v3].dist, Cv4,v3) = min(4, 2) = 2

v    Known  dv  pv
V1   1      0   0
V2   0      2   V1
V3   0      2   V4
V4   1      1   V1
V5   0      7   V4
V6   0      8   V4
V7   0      4   V4
Prim’s Algorithm Example
4. Either v2 or v3 can be selected next, since dv = 2 is the smallest among v5, v6 and v7.
• v2 is declared as a known vertex.
• Its adjacent vertices are v1, v4 and v5. v1 and v4 are known vertices, so there is no change
in their dv values.
T[v5].dist = min(T[v5].dist, Cv2,v5) = min(7, 10) = 7

v    Known  dv  pv
V1   1      0   0
V2   1      2   V1
V3   0      2   V4
V4   1      1   V1
V5   0      7   V4
V6   0      8   V4
V7   0      4   V4
Prim’s Algorithm Example
5. Among all unknown vertices, v3's dv value is lowest, so v3 is selected. v3's adjacent
vertices are v1, v4 and v6. No changes in v1 and v4.
T[v6].dist = min(T[v6].dist, Cv3,v6) = min(8, 5) = 5
Prim’s Algorithm Example
6. Among v5, v6, v7, v7's dv value is lowest, so v7 is selected. Its adjacent
vertices are v4, v5, and v6. No change in v4.
T[v5].dist = min(T[v5].dist, Cv7,v5) = min(7, 6) = 6
T[v6].dist = min(T[v6].dist, Cv7,v6) = min(5, 1) = 1

v    Known  dv  pv
V1   1      0   0
V2   1      2   V1
V3   1      2   V4
V4   1      1   V1
V5   0      6   V7
V6   0      1   V7
V7   1      4   V4
Prim’s Algorithm Example
7. Next, v6 (dv = 1) is declared as a known vertex; its adjacent vertices are already known,
so there is no change.
8. Finally v5 is declared as a known vertex. Its adjacent vertices are v2, v4, and v7, with
no change in their values.
The minimum cost of the spanning tree is 16.

v    Known  dv  pv
V1   1      0   0
V2   1      2   V1
V3   1      2   V4
V4   1      1   V1
V5   1      6   V7
V6   1      1   V7
V7   1      4   V4

Algorithm Analysis
The running time is O(|V|²) using an adjacency matrix.
void prims(Table T)
{
    vertex v, w;
    for( i = 1; i <= Numvertex; i++ ) {
        T[i].known = False;
        T[i].Dist = Infinity;
        T[i].path = 0;
    }
    T[start].Dist = 0;   /* the start vertex has distance 0 */
    for( ; ; ) {
        v = the unknown vertex with the smallest distance;
        if( v == NotAVertex )
            break;
        T[v].known = True;
        for each w adjacent to v
            if( !T[w].known ) {
                T[w].Dist = Min(T[w].Dist, Cv,w);
                T[w].path = v;
            }
    }
}
Exercise
Convert the graph given below to a minimum spanning
tree using Prim's algorithm.
Kruskal's Algorithm - Example
[figure: weighted graph with 9 vertices and 11 edges, not reproduced]
The graph contains 9 vertices and 11 edges. So, the minimum spanning tree
formed will be having (9 – 1) = 8 edges.
Sort the weights of the graph:

Weight  Source  Destination
1       8       5
2       3       5
3       2       3
4       1       2
4       3       4
5       4       6
7       7       8
8       1       7
9       5       6
10      8       9
12      9       6
Now pick all edges one by one from the sorted list of edges:

Weight  Source  Destination  Decision
1       8       5            added
2       3       5            added
3       2       3            added
4       1       2            added
4       3       4            added
5       4       6            added
7       7       8            added
8       1       7            cycle is formed, hence it is discarded
9       5       6            cycle is formed, hence it is discarded
10      8       9            added
12      9       6            cycle is formed, hence it is discarded
[figure: the resulting minimum spanning tree with 8 edges, not reproduced]
The total weight of the minimum spanning tree is 1 + 2 + 3 + 4 + 4 + 5 + 7 + 10 = 36.
DIJKSTRA'S ALGORITHM - Example
1. V1 is taken as the source vertex.

v    Known  dv  pv
V1   1      0   0
V2   0      ∞   0
V3   0      ∞   0
V4   0      ∞   0
V5   0      ∞   0
V6   0      ∞   0
V7   0      ∞   0
DIJKSTRA'S ALGORITHM - Example
2. Now V1 is a known vertex, marked as 1. Its adjacent vertices are v2 and v4; their pv and
dv values are updated.

v    Known  dv  pv
V1   1      0   0
V2   0      2   V1
V3   0      ∞   0
V4   0      1   V1
V5   0      ∞   0
V6   0      ∞   0
V7   0      ∞   0
• 3. The unknown vertex with the smallest distance, v4, is selected and marked as known.
Its adjacent vertices v3, v5, v6 and v7 are updated.
• 4. Select the vertex with the shortest distance from source v1: v2 is the smallest one. v2 is
marked as a known vertex. Its adjacent vertices are v4 and v5. The distance from v1 to v4
and v5 through v2 is more compared with the previous value of dv, so there is no change in
the dv and pv values.

v    Known  dv  pv
V1   1      0   0
V2   1      2   V1
V3   0      3   V4
V4   1      1   V1
V5   0      3   V4
V6   0      9   V4
V7   0      5   V4
DIJKSTRA’S ALGORITHM- Example
• 5. Select the unknown vertex with the smallest distance from the source: v3 and v5 (both 3) are the smallest.
The adjacent vertices of v3 are v1 and v6. v1 is the source, so there is no change in its dv and pv.
• T[v6].dist = Min(T[v6].dist, T[v3].dist + Cv3,v6) = Min(9, 3 + 5) = 8
• v6's dv and pv values are updated. The adjacent vertex of v5 is v7; no change in its dv and pv value.
DIJKSTRA’S ALGORITHM- Example
• 6. The next smallest vertex is v7. Its adjacent vertex is v6.
T[v6].dist = Min(T[v6].dist, T[v7].dist + Cv7,v6) = Min(8, 5 + 1) = 6
• dv and pv values are updated.
v Known dv pv
V1 1 0 0
V2 1 2 V1
V3 1 3 V4
V4 1 1 V1
V5 1 3 V4
V6 0 6 V7
V7 1 5 V4
DIJKSTRA’S ALGORITHM- Example
• 7. The last vertex v6 is declared as known. There are no unvisited adjacent vertices for v6,
so there is no update in the table.
• The shortest distances from source v1 to all vertices:
v1 -> v2 = 2, v1 -> v3 = 3, v1 -> v4 = 1, v1 -> v5 = 3, v1 -> v6 = 6, v1 -> v7 = 5
• Algorithm Analysis
• Time complexity of this algorithm: O(|E| + |V|²) = O(|V|²)
Routine for Dijkstra’s Algorithm
void Dijkstra(Table T)
{
    Vertex v, w;
    for( ; ; )
    {
        v = smallest unknown distance vertex;
        if( v == NotAVertex )
            break;
        T[v].known = True;
        for each w adjacent to v
            if( !T[w].known )
                if( T[v].Dist + Cvw < T[w].Dist )
                {   /* update w */
                    Decrease(T[w].Dist to T[v].Dist + Cvw);
                    T[w].path = v;
                }
    }
}
Exercise
Using Dijkstra’s Algorithm, find the shortest distance from source
vertex ‘S’ to remaining vertices in the following graph
Network Flow Problem
Any valid flow can be decomposed into flow paths and circulations
– s → a → b → t: 11
– s → c → a → b → t: 1
– s → c → d → b → t: 7
– s → c → d → t: 4
Flow Decomposition
[figure: flow network illustrating the decomposition, not reproduced]
Start flow at 0
“While there’s room for more flow, push more flow across the
network!”
While there’s some path from s to t, none of whose edges are saturated
Push more flow along the path until some edge is saturated
[figure: example network with flow/capacity labels on each edge, not reproduced]
Example 1
[figure: augmenting-path example showing flow/capacity and residual capacity on each edge, not reproduced]
Hash Functions
The total possible number of hash functions for n items assigned to m
positions in a table (n < m) is mⁿ.
The number of perfect hash functions (those mapping the n items to distinct
positions) is m!/(m−n)!.
With 50 elements and a 100-position array, we would have a total of
100⁵⁰ hash functions and about 10⁹⁴ perfect hash functions (about 1 in a
million).
Most of the perfect hashes are impractical and cannot be expressed in a
simple formula.
Hash Functions(Contd...)
Division
Hash functions must guarantee that the value they produce is a valid index to the table.
A fairly easy way to ensure this is to use modular division, and divide the keys by the size
of the table, so h(K) = K mod TSize.
This works best if the table size is a prime number, but if not, we can
use h(K) = (K mod p) mod TSize for a prime p > TSize.
However, nonprimes work well for the divisor provided they do not have any prime factors
less than 20.
The division method is frequently used when little is known about the keys
Hash Function Contd...
Folding
In folding, the keys are divided into parts which are then combined (or “folded”)
together and often transformed into the address.
Two types of folding are used, shift folding and boundary folding
In shift folding, the parts are placed underneath each other and then processed (for
example, by adding).
Using a Social Security number, say 123-45-6789, we can divide it into three parts -
123, 456, and 789 – and add them to get 1368.
This can then be divided modulo TSize to get the address
With boundary folding, the key is visualized as being written on a piece of paper
and folded on the boundaries between the parts.
Hash Functions Contd...
Folding (continued)
The result is that alternating parts of the key are reversed, so the Social Security
number part would be 123, 654, 789, totaling 1566.
As can be seen, in both versions, the key is divided into even length parts of
some fixed size, plus any leftover digits.
Then these are added together and the result is divided modulo the table size
Consequently this is very fast and efficient, especially if bit strings are used
instead of numbers.
With character strings, one approach is to exclusive-or the individual characters
together and use the result.
In this way, h("abcd") = "a" ⊕ "b" ⊕ "c" ⊕ "d"
Hash Function Contd...
Folding (continued)
However, this is limited, because it will only generate values between 0 and
127.
A better approach is to use chunks of characters, where each chunk has as
many characters as bytes in an integer.
On the IBM PC, integers are often 2 bytes long, so h(“abcd”) = “ab” ⋁
“cd”, which would then be divided modulo Tsize.
Hash Function Contd...
Mid-Square Function
In the mid-square approach, the numeric value of the key is squared and the
middle part is extracted to serve as the address.
If the key is non-numeric, some type of preprocessing needs to be done to create a
numeric value, such as folding.
Since the entire key participates in generating the address, there is a better chance
of generating different addresses for different keys.
So if the key is 3121, 3121² = 9,740,641, and if the table has 1000 locations,
h(3121) = 406, which is the middle part of 3121².
In application, powers of two are more efficient for the table size, and the middle
of the bit string of the square of the key is used.
Assuming a table size of 1024, 3121² is represented by the bit string
1001010 0101000010 1100001; the middle ten bits, 0101000010, give the address 322.
Hash Functions Contd...
Extraction
In the extraction approach, the address is derived by using a portion of the key.
Using the SSN 123-45-6789, we could use the first four digits, 1234, the last four
6789, or the first two combined with the last two 1289.
Other combinations are also possible, but each time only a portion of the key is
used.
With careful choice of digits, this may be sufficient for address generation.
For example, some universities give international students ID numbers beginning
with 999; ISBNs start with digits representing the publisher
So these could be excluded from the address generation if the nature
of the data is appropriately limited.
Hash Functions Contd...
Radix Transformation
[figure: a hash function h(K) mapping the key space (e.g., integers, strings) into a hash table with positions 0 … TableSize−1]
A good hash function should:
• be simple/fast to compute
• avoid collisions
• have keys distributed evenly among cells
Collision Resolution
Collision: when two keys map to the same location in the hash table.
Two ways to resolve collisions:
Separate Chaining
Open Addressing (linear probing, quadratic probing, double hashing)
Open Addressing:
Problem:
Using the hash function 'key mod 7', insert the following sequence of keys in the hash
table:
50, 700, 76, 85, 92, 73 and 101
Use the linear probing technique for collision resolution.
Solution:
The given sequence of keys will be inserted in the hash table as follows.
Contd...
Step-01:
● Draw an empty hash table.
● For the given hash function, the possible range of hash values is [0, 6].
● So, draw an empty hash table consisting of 7 buckets.
Step-02 through Step-08 insert the keys one by one; each collision is resolved by probing
forward to the next free slot (the intermediate tables are not reproduced here).
Book has examples on Cichelli’s Method and FHCD for minimal perfect
hash functions for small number of strings.
Hash Functions for Extendible Files
1. The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10
using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant
hash table?
2. Consider the table below which shows the cost of allocating 5 jobs to 5 machines.
Machine
A B C D E
Job 1 22 30 26 16 2
2 27 29 28 20 32
3 33 25 21 29 23
4 24 24 30 19 26
5 30 33 32 37 31
Which jobs should be allocated to which machines so as to minimise the total cost?
Collision
If two or more keys hash to the same index, the corresponding records cannot be
stored in the same location. This condition is known as a collision.
Characteristics of Good Hashing Function:
It should be simple to compute.
The number of collisions should be low while placing records in the hash table.
A hash function with no collisions is called a perfect hash function.
The hash function should produce keys which are distributed uniformly in the hash
table.
The hash function should depend upon every bit of the key. Thus a hash
function that simply extracts a portion of a key is not suitable.
Review Questions
What is the best definition of a collision in a hash table?
a) Two entries are identical except for their keys
b) Two entries with different data have the exact same key
c) Two entries with different keys have the same exact hash value
d) Two entries with the exact same key have different hash values
• From the original index H, if the slot is filled, try cells H+1², H+2², H+3², ..., H+i²,
with wrap-around.
Hi(X) = (Hash(X) + F(i)) mod TableSize, where F(i) = i²
Hi(X) = (Hash(X) + i²) mod TableSize
2.Quadratic Probing Example
2.Quadratic Probing –Another Example
h0(23) = (23 % 7) % 7 = 2
h0(13) = (13 % 7) % 7 = 6
h0(21) = (21 % 7) % 7 = 0
h0(14) = (14 % 7) % 7 = 0 collision
h1(14) = (0 + 1^2) % 7 = 1
h0(7) = (7 % 7) % 7 = 0 collision
h1(7) = (0 + 1^2) % 7 = 1 collision
h-1(7) = (0 – 1^2) % 7 = -1
NORMALIZE: (-1 + 7) % 7 = 6 collision
h2(7) = (0 + 2^2) % 7 = 4
h0(8) = (8 % 7)%7 = 1 collision
h1(8) = (1 + 1^2) % 7 = 2 collision
h-1(8) = (1 – 1^2) % 7 = 0 collision
h2(8) = (1 + 2^2) % 7 = 5
h0(15) = (15 % 7) % 7 = 1 collision
h1(15) = (1 + 1²) % 7 = 2 collision
h-1(15) = (1 − 1²) % 7 = 0 collision
h2(15) = (1 + 2²) % 7 = 5 collision
h-2(15) = (1 − 2²) % 7 = −3
NORMALIZE: (−3 + 7) % 7 = 4 collision

Quadratic probing is better than linear probing because it eliminates primary clustering.
2. Quadratic Probing
Limitation:
• at most half of the table can be used as alternative locations to resolve collisions.
• This means that once the table is more than half full, it's difficult to find an
empty spot. This new problem is known as secondary clustering because
elements that hash to the same hash key will always probe the same alternative
cells.
3.Double Hashing
• Double hashing uses the idea of applying a second hash function to the key
when a collision occurs. The result of the second hash function will be the
number of positions from the point of collision to insert.
There are a couple of requirements for the second function:
• It must never evaluate to 0, and it must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod TableSize
• A popular second hash function is:
Hash2(key) = R − (key % R), where R is a prime number that is smaller than
the size of the table.
3.Double Hashing -Example
Implementation

enum kind_of_entry { legitimate, empty, deleted };

struct hash_entry
{
    element_type element;
    enum kind_of_entry info;
};

typedef INDEX position;
typedef struct hash_entry cell;

struct hash_tbl
{
    unsigned int table_size;
    cell *the_cells;
};
typedef struct hash_tbl *HASH_TABLE;

HASH_TABLE
initialize_table( unsigned int table_size )
{
    HASH_TABLE H;
    int i;
    if( table_size < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* Allocate table */
    H = (HASH_TABLE) malloc( sizeof ( struct hash_tbl ) );
    if( H == NULL )
        fatal_error("Out of space!!!");
    H->table_size = next_prime( table_size );
    /* Allocate cells */
    H->the_cells = (cell *) malloc( sizeof ( cell ) * H->table_size );
    if( H->the_cells == NULL )
        fatal_error("Out of space!!!");
    for( i = 0; i < H->table_size; i++ )
        H->the_cells[i].info = empty;
    return H;
}
Open Addressing - Find() - Quadratic Probing (the increment 2i − 1 computes i² incrementally)
Position find( element_type key, HASH_TABLE H )
{
position i, current_pos;
i = 0;
current_pos = hash( key, H->table_size );
while( (H->the_cells[current_pos].element != key ) && (H->the_cells[current_pos].info != empty ) )
{
current_pos += 2*(++i) - 1;
if( current_pos >= H->table_size )
current_pos -= H->table_size;
}
return current_pos;
}
Open Addressing - Insert()
void insert( element_type key, HASH_TABLE H )
{
    position pos;
    pos = find( key, H );
    if( H->the_cells[pos].info != legitimate )
    {
        /* OK to insert here */
        H->the_cells[pos].info = legitimate;
        H->the_cells[pos].element = key;
    }
}
Review Questions
1. Double hashing is one of the best methods available for open
addressing.
a) True
b) False
2. What is the hash function used in Double Hashing?
a) (h1(k) – i*h2(k))mod m
b) h1(k) + h2(k)
c) (h1(k) + i*h2(k))mod m
d) (h1(k) + h2(k))mod m
Rehashing
The new size of the hash table:
• should be roughly double the old size
• should also be prime
• will be used to calculate the new insertion spot (hence the name rehashing)
• This is a very expensive operation! O(N), since there are N elements to rehash and
the new table size is roughly 2N. This is OK though, since it doesn't happen that often.
The question becomes when should the rehashing be applied?
• once the table becomes half full
• once an insertion fails
• once a specific load factor has been reached, where load factor is the ratio of the
number of elements in the hash table to the table size
Review Questions
A technique for direct search is------------
a) Binary Search b) Linear Search c) Tree Search d) Hashing
Consider a hash table of size seven, with starting index zero, and a hash function
(3x + 4)mod7. Assuming the hash table is initially empty, which of the following is
the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table
using closed hashing? Note that ‘_’ denotes an empty location in the table.
a) 8, _, _, _, _, _, 10
b) 1, 8, 10, _, _, _, 3
c) 1, _, _, _, _, _,3
d) 1, 10, 8, _, _, _, 3
Extendible Hashing
• Extendible Hashing is a mechanism for altering the size of the hash table to
accommodate new entries when buckets overflow.
• Common strategy in internal hashing is to double the hash table and rehash each
entry.
• However, this technique is slow, because writing all pages to disk is too
expensive. Therefore, instead of doubling the whole hash table, we use a directory
of pointers to buckets, and double the number of buckets by doubling the
directory, splitting just the bucket that overflows.
• Since the directory is much smaller than the file, doubling it is much cheaper.
Only one page of keys and pointers is split.
Extendible Hashing
Extendible hashing is a type of hash system which treats a hash as a
bit string and uses a trie for bucket lookup. Because of the
hierarchical nature of the system, re-hashing is an incremental operation
(done one bucket at a time, as needed).
Extendible hashing - Example
Review Questions