
Chapter 20

Hash Tables
The Limitations of Ordinal Indexing

• The main property of the array is that it
offers a homogeneous collection of items
indexed by an ordinal value.
• That is, the ith element of an array can be
accessed in constant time for reading or
writing (O(1)).
• A user rarely knows the ordinal position of
the data of interest, though.

Copyright © 2006 Pearson Addison-Wesley. All rights reserved. 20-2


The Limitations of Ordinal Indexing
(2)
Consider an employee database
• Employees might be uniquely identified by their
social security number (nine digits, each 0-9).
• With an array of all employees in random
order, finding employee 111-22-3333 would
require, potentially, searching through all of
the elements in the employee array, an O(n)
operation.



The Limitations of Ordinal Indexing
(3)
• A somewhat better approach would be to
sort the employees by their social security
numbers; a binary search would then reduce
the asymptotic search time to O(log n).
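The two approaches so far can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical, and storing each social security number as a plain int is an assumption made for brevity.

```java
import java.util.Arrays;

class SsnSearch {
    // O(n): examine every element until the key is found.
    static int linearSearch(int[] ssns, int key) {
        for (int i = 0; i < ssns.length; i++) {
            if (ssns[i] == key) return i;
        }
        return -1;
    }

    // O(log n): the array must already be sorted in ascending order.
    static int binarySearch(int[] sorted, int key) {
        int idx = Arrays.binarySearch(sorted, key);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        int[] ssns = {333445555, 111223333, 222334444};
        System.out.println(linearSearch(ssns, 111223333)); // index in the unsorted array

        int[] sorted = ssns.clone();
        Arrays.sort(sorted);
        System.out.println(binarySearch(sorted, 111223333)); // index in the sorted array
    }
}
```

Sorting costs O(n log n) up front, so it pays off only when the array is searched many times.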



The Limitations of Ordinal Indexing
(4)
• Ideally, we would like to access an employee's
records in O(1) time.
• One way to accomplish this would be to build
a huge array, with an entry for each possible
social security number value.
• That is, our array would start at element
000-00-0000 and go to element 999-99-9999.



The Limitations of Ordinal Indexing
(5)



The Limitations of Ordinal Indexing
(6)
• Each employee record contains information like
Name, Phone, Salary, and so on, and is indexed
by the employee's social security number.
• Any employee's information could be accessed in
constant time.
• The disadvantage of this approach is its extreme
waste: there are a total of 10^9 different social
security numbers.
• For a company with 1,000 employees, only
0.0001% of this array would be utilized.
The Limitations of Ordinal Indexing
(7)
• Creating a one billion element array to store
information about 1,000 employees is clearly
unacceptable in terms of space.
• One option would be to reduce the social
security number span by only using the last
four digits of an employee's social security
number.
• That is, rather than having an array spanning
from 000-00-0000 to 999-99-9999, the array
would only span from 0000 to 9999.
Trimmed Down Array



Hashing
• The mathematical transformation of the
nine-digit social security number to a four-
digit number is called hashing.
• An array that uses hashing to compress its
index space is referred to as a hash table.
• A hash function is a function that performs
this hashing.
H(x) = last four digits of x
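A minimal Java sketch of this hash function, assuming the social security number is stored as a nonnegative int (the class name is hypothetical):

```java
class LastFourHash {
    // H(x) = last four digits of x, for a nonnegative nine-digit number.
    static int hash(int ssn) {
        return ssn % 10_000;
    }

    public static void main(String[] args) {
        System.out.println(hash(111223333)); // 3333
        System.out.println(hash(999993333)); // 3333 as well: distinct keys can share an index
    }
}
```

The second call shows the price of compressing the index space: different keys may map to the same array position.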



Graphical Representation
of a Hash Function



Hashing Example
• The hash table supports the retrieval or
deletion of any named item.
• We want to be able to support the basic
operations in constant time, as for the stack
and queue.
• Assume that all the items are small
nonnegative integers, ranging from 0 to
65,535. A simple array can then implement
each operation as follows.
Hashing Example in C#

    using System;
    using System.Collections;

    namespace ConsoleApplication2 {
        public class HashtableDemo {
            // Maps a social security number (key) to an employee name (value).
            private static Hashtable employees = new Hashtable();

            public static void Main() {
                employees.Add("111-22-3333", "Scott");
                employees.Add("222-33-4444", "Sam");
                employees.Add("333-44-5555", "Jisun");

                // Check that the key exists before reading its value.
                if (employees.ContainsKey("111-22-3333")) {
                    string empName = (string)employees["111-22-3333"];
                    Console.WriteLine("Employee 111-22-3333's name is: " + empName);
                }
                else
                    Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
            }
        }
    }
Basic Ideas
• First, we initialize an array a[] that is indexed
from 0 to 65,535 with all 0s.
• To perform insert(i), we execute a[i]++.
Note that a[i] represents a number of times
that i has been inserted.
• To perform find(i), we need to verify that
a[i] is not 0.
• To perform remove(i), make sure that a[i] is
positive and then execute a[i]--.
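The three operations above could be sketched in Java like this. The class name is hypothetical, and the sketch assumes every item lies in the range 0 to 65,535.

```java
class CountingTable {
    // a[i] holds the number of times i has been inserted.
    // Java zero-fills a new int array, which covers the initialization step.
    private final int[] a = new int[65_536];

    void insert(int i) {
        a[i]++;
    }

    boolean find(int i) {
        return a[i] != 0;
    }

    void remove(int i) {
        if (a[i] > 0) a[i]--;
    }

    public static void main(String[] args) {
        CountingTable t = new CountingTable();
        t.insert(42);
        t.insert(42);
        t.remove(42);
        System.out.println(t.find(42)); // true: one copy remains
        t.remove(42);
        System.out.println(t.find(42)); // false
    }
}
```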
Basic Ideas (2)

• The time for each operation is clearly
constant; even the overhead of the array
initialization is a constant amount of work
(65,536 assignments).
• There are two major problems with this
solution.



Basic Ideas (3)

• First, suppose that we have 32-bit integers
instead of 16-bit integers. Then the array a
must hold 4 billion (2^32) items, which is
impractical.
• Second, if the items are not integers but
instead are strings, they cannot be used to
index an array.



Basic Ideas (4)

• Just as the number 1234 is a collection of
digits 1, 2, 3, and 4, the string "junk" is a
collection of characters 'j', 'u', 'n', and 'k'.
• The number 1234 is just 1 × 10^3 + 2 × 10^2 +
3 × 10^1 + 4 × 10^0.
• A string can be turned into a number the same
way, using each character's code from the
ASCII table as a digit.
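One way to make the analogy concrete is to treat each character code as a digit in some base. The Java sketch below assumes base 128, chosen here only because it covers the 7-bit ASCII range; the slides do not fix a base, and the class and method names are hypothetical.

```java
class StringAsNumber {
    // Interpret s as a number whose digits are its character codes, in base 128.
    // Just as 1234 = ((1 * 10 + 2) * 10 + 3) * 10 + 4, the loop multiplies and adds.
    static long toNumber(String s) {
        long value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = value * 128 + s.charAt(i);
        }
        return value;
    }

    public static void main(String[] args) {
        // 'j' = 106, 'u' = 117, 'n' = 110, 'k' = 107
        System.out.println(toNumber("junk")); // 224229227
    }
}
```

Numbers built this way grow very quickly with string length, which is one reason a practical hash function reduces the value to a small table index.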



Basic Ideas (5)

Store a string into the table: "Steve".
Sum the ASCII codes of its characters, then take
the remainder modulo the table size (here 12):

83 + 116 + 101 + 118 + 101 = 519
519 mod 12 = 3
Basic Ideas (6)



Basic Ideas (7)

Store a string into the table: "Steve":


83 + 116 + 101 + 118 + 101 = 519
519 mod 12 = 3



Hash Function

    public class defineHash {
        // Sums the character codes of s and reduces the sum modulo table_size.
        // Returns -1 for an empty array.
        static int hash(char s[], int table_size) {
            int sum = 0;
            if (s.length == 0)
                return -1;
            for (int i = 0; i < s.length; i++)
                sum = sum + s[i];
            return (sum % table_size);
        }

        public static void main(String[] args) {
            char s1[] = { 'S', 't', 'e', 'v', 'e' };
            // char s1[] = {};
            System.out.println(hash(s1, 12));
        }
    }

Output: 3
Hash Function (2)

    public class defineHash {
        // Same hash as before, taking a String instead of a char array.
        static int hash(String s, int table_size) {
            int sum = 0;
            // Test for emptiness by content; == on strings compares references.
            if (s.isEmpty())
                return -1;
            for (int i = 0; i < s.length(); i++)
                sum = sum + s.charAt(i);
            return (sum % table_size);
        }

        public static void main(String[] args) {
            String s1 = "Steve";
            System.out.println(hash(s1, 12));
        }
    }
Linear Probing
• Now that we have a hash function, we need
to decide what to do when a collision
occurs.
• Specifically, if X hashes out to a position
that is already occupied, where do we
place it?
• The simplest possible strategy is linear
probing, or searching in the array until we
find an empty cell.



Linear Probing (2)
• In linear probing, collisions are resolved by
sequentially scanning an array (with
wraparound) until an empty cell is found.

table_size=10
hash ( 89, 10 ) = 9
hash ( 18, 10 ) = 8
hash ( 49, 10 ) = 9
hash ( 58, 10 ) = 8
hash ( 9, 10 ) = 9



Linear Probing (3)



Linear Probing (4)

• The first collision occurs when 49 is inserted;
the 49 is put in the next available spot,
namely, spot 0, which is open.
• Then 58 collides with 18, 89, and 49 before
an empty spot is found three slots away in
position 1.
• The collision for element 9 is resolved
similarly.
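The insertions described above can be traced with a short Java sketch. It assumes nonnegative integer keys, hash(x) = x mod table size, and null as the marker for an empty cell; the class and method names are hypothetical.

```java
import java.util.Arrays;

class LinearProbingInsert {
    // Places key in the first empty cell at or after its hash position,
    // wrapping around at the end of the array.
    // Returns the slot used, or -1 if the table is full.
    static int insert(Integer[] table, int key) {
        int start = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 9}) {
            System.out.println(key + " -> slot " + insert(table, key));
        }
        // Slots used: 89 -> 9, 18 -> 8, 49 -> 0, 58 -> 1, 9 -> 2
        System.out.println(Arrays.toString(table));
    }
}
```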



Linear Probing (5)
• So long as the table is large enough, a free
cell can always be found.
• However, the time needed to find a free cell
can get to be quite long.
• For example, if there is only one free cell
left in the table, we may have to search the
entire table to find it.



Linear Probing (6)
• On average we would expect to have to
search half the table to find it, which is far
from the constant time per access that we
are hoping for.
• But, if the table is kept relatively empty,
insertions should not be so costly.



Linear Probing (7)

• The find algorithm merely follows the
same path as the insert algorithm.
• If it reaches an empty slot, the item we are
searching for is not found; otherwise, it
finds the match eventually.



Linear Probing (8)
• For example, to find 58, we start at slot 8
(as indicated by the hash function). We see
an item, but it is the wrong one, so we try
slot 9.
• Again, we have an item, but it is the wrong
one, so we try slot 0 and then slot 1 until
we find a match.



Linear Probing (9)
• A find for 19 would involve trying slots 9, 0,
1, and 2 before finding the empty cell in slot
3.
• Thus 19 is not found.
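The two searches just described can be sketched in Java. The table contents are the state reached after inserting 89, 18, 49, 58, and 9 with linear probing; null marks an empty cell, and the names are hypothetical.

```java
class LinearProbingFind {
    // Follows the same probe path as insertion and stops at the key
    // or at the first empty cell. Returns the slot, or -1 if not found.
    static int find(Integer[] table, int key) {
        int start = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == null) return -1; // empty cell: key is absent
            if (table[slot] == key) return slot; // match
        }
        return -1; // every cell probed without a match
    }

    public static void main(String[] args) {
        Integer[] table = {49, 58, 9, null, null, null, null, null, 18, 89};
        System.out.println(find(table, 58)); // 1, reached via slots 8, 9, 0, 1
        System.out.println(find(table, 19)); // -1, after reaching the empty slot 3
    }
}
```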



Linear Probing (10)
• Standard deletion cannot be performed
because, as with a binary search tree, an
item in the hash table not only represents
itself, but it also connects other items by
serving as a placeholder during collision
resolution.



Linear Probing (11)
• If we removed 89 from the hash table,
virtually all the remaining find operations
would fail.
• Consequently, we implement lazy deletion,
or marking items as deleted rather than
physically removing them from the table.
This information is recorded in an extra data
member. Each item is either active or deleted.
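A minimal Java sketch of lazy deletion follows, assuming nonnegative integer keys, linear probing, and a parallel boolean array as the extra data member. All names are hypothetical, and a production table would also need to resize as deleted cells accumulate.

```java
class LazyDeletionTable {
    private final Integer[] keys;   // null = never used
    private final boolean[] active; // true = active, false = deleted (or never used)

    LazyDeletionTable(int size) {
        keys = new Integer[size];
        active = new boolean[size];
    }

    // Returns the slot holding key, whether active or deleted,
    // or -1 if the probe reaches a never-used cell first.
    private int locate(int key) {
        int start = key % keys.length;
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (keys[slot] == null) return -1;
            if (keys[slot] == key) return slot;
        }
        return -1;
    }

    void insert(int key) {
        int start = key % keys.length;
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            // Use a never-used cell, or reactivate the same key if it was deleted.
            if (keys[slot] == null || keys[slot] == key) {
                keys[slot] = key;
                active[slot] = true;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    boolean contains(int key) {
        int slot = locate(key);
        return slot >= 0 && active[slot];
    }

    void remove(int key) {
        int slot = locate(key);
        if (slot >= 0) active[slot] = false; // mark only; the cell stays occupied
    }

    public static void main(String[] args) {
        LazyDeletionTable t = new LazyDeletionTable(10);
        for (int key : new int[] {89, 18, 49, 58, 9}) t.insert(key);
        t.remove(89);
        System.out.println(t.contains(89)); // false
        System.out.println(t.contains(49)); // true: the probe still passes through slot 9
    }
}
```

Because the cell for 89 remains occupied, searches for 49, 58, and 9 still walk past it and succeed.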
Linear Probing (12)

• The disadvantage of linear probing is the
primary clustering problem.
• The table develops clusters
of successive used slots.



Linear Probing (13)
Consider the following linear probing example.
• Here, the keys k1, k2, …, k6 were inserted in
that order.
• The indicators on the right show where each
key initially hashed to.



Homework: Problem #1

• Given input {4371, 1323, 6173, 4199, 4344,
9679, 1989}, a fixed table size 10 and a
hash function H(X) = X mod 10, show the
resulting linear probing hash table.
• What are the array indices for a hash table
of size 11?



Quadratic Probing
• Quadratic probing examines cells 1, 4, 9,
and so on, away from the original probe point.
• Its name is derived from the use of the
formula F(i) = i^2 to resolve collisions.



Quadratic Probing (2)



Quadratic Probing (3)
• When 49 collides with 89, the first alternative
attempted is one cell away. This cell is empty,
so 49 is placed there.
• Next, 58 collides at position 8. The cell at
position 9 (which is one away) is tried, but
another collision occurs.



Quadratic Probing (4)
• A vacant cell is found at the next cell tried,
which is 2^2 = 4 positions away from the
original hash position.
• Thus 58 is placed in cell 2. The same thing
happens for 9.
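The quadratic probing walkthrough can be reproduced with the Java sketch below. It assumes nonnegative integer keys, hash(x) = x mod table size, and null for an empty cell. Giving up after table.length probes is a simplification chosen for this sketch, and the names are hypothetical.

```java
import java.util.Arrays;

class QuadraticProbingInsert {
    // Probes slots (home + i*i) mod size for i = 0, 1, 2, ...
    // Returns the slot used, or -1 if no empty cell was found
    // within table.length probes.
    static int insert(Integer[] table, int key) {
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int slot = (home + i * i) % table.length;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 9}) {
            System.out.println(key + " -> slot " + insert(table, key));
        }
        // Slots used: 89 -> 9, 18 -> 8, 49 -> 0, 58 -> 2, 9 -> 3
        System.out.println(Arrays.toString(table));
    }
}
```

Unlike linear probing, this probe sequence does not visit every cell, so an insertion can fail even when the table still has empty cells.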



Quadratic Probing (5)
• Note that the alternative locations for items
that hash to position 8 and the alternative
locations for the items that hash to position
9 are not the same.
• The long probe sequence to insert 58 did
not affect the subsequent insertion of 9,
which contrasts with what happened with
linear probing.
Quadratic Probing (6)
• Linear probing is easily implemented.
• Quadratic probing appears to require
multiplication and mod operations.
– Does this apparent added complexity make
quadratic probing impractical?



Quadratic Probing (7)
• What happens (in both linear probing and
quadratic probing) if the load factor gets too
high?
– Can we dynamically expand the table, as is
typically done with other array-based data
structures?



Load Factor of Hash Table
• The load factor is the number of elements in
the hash table divided by the table size.
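As a small worked example in Java (the class and method names are hypothetical), the table from the probing examples holds 5 elements in 10 cells:

```java
class LoadFactor {
    // Load factor = number of stored elements / number of cells in the table.
    static double loadFactor(int elements, int tableSize) {
        return (double) elements / tableSize;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(5, 10)); // 0.5
    }
}
```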



Chained Hash Table
• A hash table.
– Each key in the set K is hashed to a slot
in the hash table.



Chained Hash Table (2)

• A chained hash table.
– Each slot points to a linked list; empty slots and
the ends of the lists are shown in grey.



Chained Hash Table (3)
• The dictionary operations on a hash table T
are easy to implement when collisions are
resolved by chaining.
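To illustrate how short these operations become, here is a minimal Java sketch of a chained hash table. It assumes nonnegative integer keys and hash(x) = x mod table size, and it uses java.util.LinkedList for the chains; the class and method names are hypothetical.

```java
import java.util.LinkedList;

class ChainedHashTable {
    private final LinkedList<Integer>[] slots;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        slots = new LinkedList[size];
        for (int i = 0; i < size; i++) slots[i] = new LinkedList<>();
    }

    // The chain that key hashes to.
    private LinkedList<Integer> chain(int key) {
        return slots[key % slots.length];
    }

    void insert(int key) {
        chain(key).addFirst(key); // colliding keys simply share a list
    }

    boolean search(int key) {
        return chain(key).contains(key);
    }

    void delete(int key) {
        chain(key).remove(Integer.valueOf(key)); // remove by value, not by index
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(10);
        for (int key : new int[] {89, 18, 49, 58, 9}) t.insert(key);
        System.out.println(t.search(49)); // true: 89, 49, and 9 share slot 9
        t.delete(89);
        System.out.println(t.search(89)); // false
        System.out.println(t.search(9));  // true: ordinary deletion is safe here
    }
}
```

Because each key lives in its own list node, deleting one key does not disturb the search path to the others, so no lazy deletion is needed.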



Chained Hash Table (4)

• One of the disadvantages of chained hashing
is that we need to have enough space to store
m pointers in the hash table, n elements, plus
a further n pointers for the linked lists from
each element.



Homework: Problem #2

• Given input {4371, 1323, 6173, 4199, 4344,
9679, 1989}, a fixed table size 10 and a
hash function H(X) = X mod 10, show the
resulting quadratic probing hash table.

