Heidi C. Ellis and Gerard C. Weatherby

Hashing
Hashing: Applies a transformation to keys to arrive at the address of an element.
* Performance is independent of table size.
Perfect hash function: Each key is transformed into a unique storage location.
* Element can be stored and retrieved using the same transformation.
Imperfect hash function: Maps more than one key to the same storage location.

Assume we have 10 elements to store with 4-digit keys between 0000 and 9999.
Approach 1: Set up an array of size 10,000 and use the key value as an index into the array.
* Wastes LOTS of space.
Approach 2: Divide the key by 13 and use the remainder as the index into an array of 13 elements with indices 0 to 12.

Key               1234  5021  7423  2000  9043  6296
Transformed Key     12     3     0    11     8     4

Problem: 7423 and a key such as 0013 both map to the same location (0).
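
Approach 2 can be sketched in a few lines of Java. This is a minimal illustration, not part of the original slides; the class and method names are made up for the example, and keys are assumed to be non-negative.

```java
class RemainderHashDemo {
    // Table size used in Approach 2.
    static final int TABLE_SIZE = 13;

    // Maps a non-negative key to an index in 0..12.
    static int hash(int key) {
        return key % TABLE_SIZE;
    }

    public static void main(String[] args) {
        int[] keys = {1234, 5021, 7423, 2000, 9043, 6296};
        for (int key : keys) {
            System.out.println(key + " -> " + hash(key));
        }
        // Key 0013 is written as 13 here: a leading zero would make
        // the Java literal octal.
        System.out.println("7423 and 13 collide: " + (hash(7423) == hash(13)));
    }
}
```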

Object Structures

Hashes and Dictionaries

Hashing
Hashing: Transformation of a key into a storage location.
* Key has been converted to a hash.
* Transformed key has no visible relation to the original key.
* Cannot be used to reconstruct the original key.
Collision: Two keys are hashed to the same storage location.
Hashing search method reduces to two problems:
1. Finding a hashing method that reduces collisions.
2. Resolving collisions when they occur.
Goal of a good hash function: Spread hash keys evenly over the entire table.
* Minimize clustering: the tendency for many keys to hash to the same or nearby locations.

Hash functions should be easy and quick to compute.
Hash function is represented by H(K), where K is the key.
* Use modulo the array length to stay within the array.
Mid-square method:
* Square K.
* Strip predetermined digits from front and rear.
* e.g., use thousands and ten thousands places.

K                 3205      7148      2345
K^2           10272025  51093904   5499025
Hashed Key          72        93        99
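
The mid-square steps can be sketched as follows. This is an illustrative example only; the names are hypothetical, and it hard-codes the "thousands and ten thousands places" choice from the slide. It can be checked against the squares 10272025, 51093904, and 5499025 in the table.

```java
class MidSquareDemo {
    // Squares the key, then keeps the thousands and ten-thousands digits.
    static int midSquare(int key) {
        long squared = (long) key * key;       // long avoids int overflow
        return (int) ((squared / 1000) % 100); // drop 3 low digits, keep next 2
    }

    public static void main(String[] args) {
        // 2345 is the key whose square is 5499025.
        int[] keys = {3205, 7148, 2345};
        for (int key : keys) {
            System.out.println(key + " -> " + midSquare(key));
        }
    }
}
```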

Hashing
Folding method:
* Partition K into a number of parts.
* Each part has the same number of digits as the required address.
* Add the parts together, ignoring the last carry.

K                 3205      7148      2345
H(K)           32 + 05   71 + 48   23 + 45
Hashed Key          37        19        68
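
A possible Java rendering of the folding method, assuming two-digit addresses as in the table. The names are invented for this sketch.

```java
class FoldingDemo {
    // Splits the key into two-digit parts (from the right), adds them,
    // and drops any carry out of the two-digit result.
    static int fold(int key) {
        int sum = 0;
        while (key > 0) {
            sum += key % 100; // take the lowest two digits as one part
            key /= 100;       // move on to the next part
        }
        return sum % 100;     // ignore the last carry
    }

    public static void main(String[] args) {
        System.out.println(fold(3205)); // 32 + 05
        System.out.println(fold(7148)); // 71 + 48, carry dropped
        System.out.println(fold(2345)); // 23 + 45
    }
}
```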

Remainder method:
* H(K) = K mod M.
* Some choices of M are better than others.
* M should not be even, since H(K) will be even when K is even and odd when K is odd.
* M should not be a power of the radix of the computer, since K mod M would be just the least significant digits of K.
  - e.g., with a decimal radix, 1024004, 6424004, and 9324004 would collide.
* Prime numbers work fairly well.
* Select a prime that is about twice as large as the largest possible number of table elements.
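
The effect of the choice of M can be seen with the three colliding keys above. In this sketch, M = 1000 (a power of the decimal radix) and M = 997 (a nearby prime) are values picked for illustration; neither appears in the slides.

```java
class ModulusChoiceDemo {
    // Remainder method: H(K) = K mod M, for non-negative K.
    static int hash(int key, int m) {
        return key % m;
    }

    public static void main(String[] args) {
        int[] keys = {1024004, 6424004, 9324004};
        for (int key : keys) {
            // With M = 1000 only the last three digits survive, so all
            // three keys collide; the prime 997 separates them.
            System.out.println(key + ": mod 1000 = " + hash(key, 1000)
                    + ", mod 997 = " + hash(key, 997));
        }
    }
}
```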

Hashing
How do we detect collisions?
* Initialize table to a value that cannot occur as a key.
* Inspect the location before inserting.
Two major classes of collision resolution:
1. Open addressing:
* When a collision occurs, use an organized method to find the next open space.
* Maximum number of elements is equal to the table size.
2. Chained addressing:
* Make a linked list of all elements that hash to the same location.
* Allows the number of elements to exceed the table size.

Linear Probing Hashing


Linear probing hashing: Upon collision, look at the next table location, modulo M.
* Simplest open addressing scheme.
* Probe sequentially through the table starting at the initial hash index.
* If we reach the end of the table, wrap around.
What guarantees termination?
* Nothing, unless we know that the table is not full.
* Overflow: Table has no open locations when we try to insert a new key.
* We could check for overflow, but tables that are close to being full have very poor performance.
* Mostly full tables have very long collision resolution paths.

Linear Probing Hashing


Use linear probing hashing to insert the following keys: 103 8 108 208 308 10 9
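
One way to work this exercise in code is sketched below. The exercise does not state a table size or hash function; M = 10 and H(K) = K mod M are assumptions made here, as are the class name and the -1 sentinel for empty slots (keys are assumed non-negative).

```java
import java.util.Arrays;

class LinearProbingDemo {
    static final int EMPTY = -1; // sentinel that cannot occur as a key
    final int[] table;

    LinearProbingDemo(int size) {
        table = new int[size];
        Arrays.fill(table, EMPTY);
    }

    // Inserts the key and returns the slot used.
    // Throws on overflow (no open location left).
    int insert(int key) {
        int m = table.length;
        int home = key % m;
        for (int i = 0; i < m; i++) {
            int index = (home + i) % m; // next location, wrapping around
            if (table[index] == EMPTY) {
                table[index] = key;
                return index;
            }
        }
        throw new IllegalStateException("overflow: table is full");
    }

    public static void main(String[] args) {
        LinearProbingDemo t = new LinearProbingDemo(10); // assumed M = 10
        for (int key : new int[] {103, 8, 108, 208, 308, 10, 9}) {
            System.out.println(key + " -> slot " + t.insert(key));
        }
        System.out.println(Arrays.toString(t.table));
    }
}
```

Under these assumptions the four keys ending in 8 pile up in slots 8, 9, 0, and 1, which then pushes 10 and 9 far from their home slots. This is the clustering effect in miniature.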

Deletion of an element takes more effort.
* Collision chain must be maintained if the element is in the middle of a chain.
* Could reorganize the whole chain.
* Could mark the entry as deleted.
  - Requires the table to be reorganized periodically.
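
The "mark the entry as deleted" option might look like the sketch below. It assumes linear probing, non-negative distinct keys, and two hypothetical sentinels (EMPTY and DELETED); periodic reorganization of the table is not shown.

```java
import java.util.Arrays;

class TombstoneDemo {
    static final int EMPTY = -1;   // slot never used
    static final int DELETED = -2; // slot used once, entry since deleted
    final int[] table;

    TombstoneDemo(int size) {
        table = new int[size];
        Arrays.fill(table, EMPTY);
    }

    // Inserts a key (assumed not already present); false means overflow.
    boolean insert(int key) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            int index = (key % m + i) % m;
            if (table[index] == EMPTY || table[index] == DELETED) {
                table[index] = key; // deleted slots can be reused
                return true;
            }
        }
        return false;
    }

    // Returns the slot holding the key, or -1 if it is absent.
    int find(int key) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            int index = (key % m + i) % m;
            if (table[index] == EMPTY) return -1;  // chain ends here
            if (table[index] == key) return index; // DELETED is skipped
        }
        return -1;
    }

    // Marks the slot as deleted so later keys in the chain stay reachable.
    boolean delete(int key) {
        int index = find(key);
        if (index < 0) return false;
        table[index] = DELETED;
        return true;
    }
}
```

Resetting the slot to EMPTY instead would cut the chain: a search for a key stored past that slot would stop early and report it missing.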

Quadratic Hashing
One way to avoid the clustering problem is to probe in some other way than successive table locations.
Quadratic hashing: Probes the table at increasing offsets, modulo the table size.
* H(K) + 1, H(K) + 4, H(K) + 9, H(K) + 16, . . .
* Increment by squares.
Use quadratic hashing to insert the following keys: 103 8 108 208 308 10 9
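
A sketch of the same exercise with quadratic probing follows. As before, the table size M = 10, the hash H(K) = K mod M, and all names are assumptions made for illustration. The probe loop is capped at M attempts because quadratic probing is not guaranteed to visit every slot.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    static final int EMPTY = -1; // sentinel that cannot occur as a key
    final int[] table;

    QuadraticProbingDemo(int size) {
        table = new int[size];
        Arrays.fill(table, EMPTY);
    }

    // Probes H(K), H(K)+1, H(K)+4, H(K)+9, ... modulo the table size.
    // Returns the slot used, or -1 if no open slot was found in M probes.
    int insert(int key) {
        int m = table.length;
        int home = key % m;
        for (int i = 0; i < m; i++) {
            int index = (home + i * i) % m;
            if (table[index] == EMPTY) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        QuadraticProbingDemo t = new QuadraticProbingDemo(10); // assumed M = 10
        for (int key : new int[] {103, 8, 108, 208, 308, 10, 9}) {
            System.out.println(key + " -> slot " + t.insert(key));
        }
        System.out.println(Arrays.toString(t.table));
    }
}
```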

Quadratic Hashing
Data is more widely distributed than with linear probing hashing.
Using open address hashing, the table size must be decided in advance.
* Not applicable to all applications.
* Best to guess high, but this wastes space.
One resolution is adaptive hashing.
* Hash table is automatically enlarged and reorganized when it gets too close to being full.
* Doesn't come to a full halt.
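
A rough sketch of the enlarge-and-reorganize idea is given below. The initial size of 4, the 0.75 threshold, the doubling policy, and the use of linear probing are all assumptions chosen for the example. It rehashes every key in one step, which is the simplest policy and not necessarily the one the slides have in mind.

```java
import java.util.Arrays;

class AdaptiveTableDemo {
    static final int EMPTY = -1;         // sentinel that cannot occur as a key
    static final double MAX_LOAD = 0.75; // hypothetical "too full" threshold
    int[] table = newTable(4);
    int count = 0;

    static int[] newTable(int size) {
        int[] t = new int[size];
        Arrays.fill(t, EMPTY);
        return t;
    }

    // Inserts a non-negative key, growing the table first if needed.
    void insert(int key) {
        if (count + 1 > MAX_LOAD * table.length) {
            grow();
        }
        place(table, key);
        count++;
    }

    boolean contains(int key) {
        int index = key % table.length;
        while (table[index] != EMPTY) { // terminates: table is never full
            if (table[index] == key) return true;
            index = (index + 1) % table.length;
        }
        return false;
    }

    // Linear probing; callers guarantee at least one open slot.
    private static void place(int[] t, int key) {
        int index = key % t.length;
        while (t[index] != EMPTY) {
            index = (index + 1) % t.length;
        }
        t[index] = key;
    }

    // Doubles the table and rehashes every key, because key % length
    // changes when the length changes.
    private void grow() {
        int[] old = table;
        table = newTable(old.length * 2);
        for (int key : old) {
            if (key != EMPTY) place(table, key);
        }
    }
}
```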

Chained Hashing
Chained hashing solves the problem of size/reorganization.
Index into the hash table is computed as before.
Table now holds a pointer to a linked list of all elements that hash to that location.
Chained hashing allows more elements to be stored than table locations.
Used in cases where the number of entries is difficult to predict.
Chained hashing requires fewer storage accesses.
* Don't have to follow a series of addresses to find an element.

Chained Hashing
Use chained hashing to insert the following keys: 103 8 108 208 308 10 9
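
The exercise can be sketched with one linked list per table location. The table size M = 10 and H(K) = K mod M are again assumptions, since the exercise does not state them; the names are invented, and new keys are appended to the end of each chain as one possible policy.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashDemo {
    // One chain (linked list) per table location.
    final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashDemo(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Appends a non-negative key to the chain at its hash location.
    void insert(int key) {
        buckets.get(key % buckets.size()).addLast(key);
    }

    // Searches only the chain the key hashes to.
    boolean contains(int key) {
        return buckets.get(key % buckets.size()).contains(key);
    }

    public static void main(String[] args) {
        ChainedHashDemo t = new ChainedHashDemo(10); // assumed M = 10
        for (int key : new int[] {103, 8, 108, 208, 308, 10, 9}) {
            t.insert(key);
        }
        for (int i = 0; i < t.buckets.size(); i++) {
            System.out.println(i + ": " + t.buckets.get(i));
        }
    }
}
```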

Hashing
Hashing obtains search times independent of the number of elements.
* Trade-off is wasted space.
High speed can be compromised by clustering.
* Also compromised by excessive collision recovery operations.
Chaining reduces the severity of collisions.
* Requires extra space for nodes.
Hashing is dynamic.
* Handles combinations of insertions, searches, and deletions.

Java Collection interface


Java provides the Map interface.
* Associates pairs of values.
* For each key, there exists a unique value.
* Keys are hashed for rapid lookup.
* Map replaces the obsolete Dictionary class.
Introduced with Java 1.2.

Method                                 Description
void clear()                           Removes all mappings from this map (optional operation).
boolean containsKey(Object key)        Returns true if this map contains a mapping for the specified key.
boolean containsValue(Object value)    Returns true if this map maps one or more keys to the specified value.
Set entrySet()                         Returns a set view of the mappings contained in this map.
boolean equals(Object o)               Compares the specified object with this map for equality.
Object get(Object key)                 Returns the value to which this map maps the specified key.
int hashCode()                         Returns the hash code value for this map.
boolean isEmpty()                      Returns true if this map contains no key-value mappings.
Set keySet()                           Returns a set view of the keys contained in this map.
Object put(Object key, Object value)   Associates the specified value with the specified key in this map (optional operation).
void putAll(Map t)                     Copies all of the mappings from the specified map to this map (optional operation).
Object remove(Object key)              Removes the mapping for this key from this map if it is present (optional operation).
int size()                             Returns the number of key-value mappings in this map.
Collection values()                    Returns a collection view of the values contained in this map.
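
A few of these methods in use are shown below. The example uses the generic form Map<K, V> available since Java 5, whereas the signatures listed here are the original pre-generics ones. The class name and the sample data are made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapUsageDemo {
    static Map<String, Integer> buildStock() {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apples", 3);
        stock.put("pears", 5);
        stock.put("apples", 7); // same key again: the old value is replaced
        return stock;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = buildStock();
        System.out.println("size: " + stock.size());
        System.out.println("apples: " + stock.get("apples"));
        System.out.println("has pears: " + stock.containsKey("pears"));
        for (String key : stock.keySet()) { // set view of the keys
            System.out.println(key + " = " + stock.get(key));
        }
        stock.remove("pears");
        System.out.println("has pears: " + stock.containsKey("pears"));
    }
}
```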

Available implementations
Hashtable - original implementation (JDK 1.0)
* nulls not allowed
* synchronized
HashMap - newer implementation (JDK 1.2)
* nulls allowed
* not synchronized
LinkedHashMap - subclass of HashMap
* maintains a linked list so iteration returns values in insertion order
Each implementation has initialCapacity and loadFactor attributes.
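
The differences listed above can be observed directly, as in this small sketch. The helper names are invented, and the capacity of 16 with load factor 0.75 passed to the constructor is just an example pair of values.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;

class MapImplementationsDemo {
    // Returns true if the given map accepts a null key.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false; // Hashtable rejects null keys and values
        }
    }

    // Builds a LinkedHashMap with an explicit initial capacity and load
    // factor, then reports the order in which its keys are iterated.
    static String insertionOrder() {
        Map<String, Integer> m = new LinkedHashMap<>(16, 0.75f);
        m.put("c", 1);
        m.put("a", 2);
        m.put("b", 3);
        return m.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println("HashMap null key:   " + acceptsNullKey(new HashMap<>()));
        System.out.println("Hashtable null key: " + acceptsNullKey(new Hashtable<>()));
        System.out.println("LinkedHashMap order: " + insertionOrder());
    }
}
```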