
Software Development 2

Bell College

17: HASH DATA STRUCTURES


Introduction
Hash Tables
EXERCISE: Looking at hash tables
Hash Maps
EXERCISE: Looking at a HashMap

Introduction
The linear search (sometimes called the sequential search) is the simplest algorithm for finding a specific target key in a data collection, for example a specific integer value in an array of integers. It is also the least efficient. It examines each element in turn, starting with the first, until it finds the target or reaches the end of the array. A linear search does not require the array to be sorted.

The binary search is the standard method for searching a sorted array. It is much more efficient than the linear search, but it does require the elements to be in order.

The time taken by each of these methods depends on the size of the collection. The time for a linear search is proportional to the size of the collection: on average it takes 10 times as long to find an element in an array of 100 elements as in an array of 10 elements. The time for a binary search is proportional to the logarithm of the collection size: on average it takes about twice as long to find an element in an array of 100 elements as in an array of 10 elements.

A further search method is hashing. Hash data structures allow data to be stored and retrieved in an average time which does not depend on the collection size, provided the table does not become too full.
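The growth rates quoted above can be checked by counting comparisons. The following is a minimal sketch and not part of the original notes: the class name SearchComparison and its methods are made up for illustration. It looks up every element of a sorted array once and reports the average number of elements examined by each method.

```java
// Illustrative sketch: all names here are assumptions, not from the notes.
class SearchComparison {

    /** Linear search: number of elements examined before target is found. */
    static int linearComparisons(int[] data, int target) {
        int comparisons = 0;
        for (int value : data) {
            comparisons++;
            if (value == target) break;
        }
        return comparisons;
    }

    /** Binary search on a sorted array: number of elements examined. */
    static int binaryComparisons(int[] sorted, int target) {
        int low = 0, high = sorted.length - 1, comparisons = 0;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            comparisons++;
            if (sorted[mid] == target) break;
            if (sorted[mid] < target) low = mid + 1;
            else high = mid - 1;
        }
        return comparisons;
    }

    /** Average cost of finding each element of the array 0..n-1 once. */
    static double average(int n, boolean binary) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i;
        long total = 0;
        for (int target = 0; target < n; target++) {
            total += binary ? binaryComparisons(data, target)
                            : linearComparisons(data, target);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100}) {
            System.out.println("n=" + n
                + " linear avg=" + average(n, false)
                + " binary avg=" + average(n, true));
        }
    }
}
```

For arrays of 10 and 100 elements this should report linear averages of 5.5 and 50.5 (a factor of about ten) and binary averages of 2.9 and 5.8 (a factor of two).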

Hash Tables
The simplest hash data structure is the hash table. The basis of a hash table is a hash function. A hash function is a function that takes an element of whatever data type you are storing (integer, string, etc.) and outputs an integer in a certain range. This integer is used to determine the location in the table for that element. The hash table itself consists of an array whose indices are the range of the hash function.


For example, a hash table for a car database might make use of the registration plate as a key. Let's assume the table has a size of 20. A possible hash function would be Value Mod 20, for example

(sum of numeric values of the characters) % 20

where % is the modulus operator, giving the remainder after dividing. This would give translations like this:

SD52DFG   28+13+5+2+13+15+16 = 92    Hashcode = 12
FG02GTH   15+16+0+2+16+29+17 = 95    Hashcode = 15
DR52GHY   13+27+5+2+16+17+34 = 114   Hashcode = 14
FT52RET   15+29+5+2+27+14+29 = 121   Hashcode = 1
NN52FRT   23+23+5+2+15+27+29 = 124   Hashcode = 4
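The calculation above can be sketched in Java as follows. This is an illustration only: the class name RegistrationHash and the constant TABLE_SIZE are assumptions, and the numeric value of each character is taken from Character.getNumericValue, which gives 0 to 9 for digits and 10 to 35 for the letters A to Z.

```java
// Illustrative sketch: names are assumptions, not from the notes.
class RegistrationHash {
    static final int TABLE_SIZE = 20;

    /** Sum of the numeric values of the characters, modulo the table size. */
    static int hash(String registration) {
        int sum = 0;
        for (char c : registration.toCharArray()) {
            // Assumes the plate contains only letters and digits
            sum += Character.getNumericValue(c);
        }
        return sum % TABLE_SIZE;
    }

    public static void main(String[] args) {
        String[] plates = {"SD52DFG", "FG02GTH", "DR52GHY", "FT52RET", "NN52FRT"};
        for (String plate : plates) {
            System.out.println(plate + " -> Hashcode = " + hash(plate));
        }
    }
}
```

Running main should reproduce the hash codes 12, 15, 14, 1 and 4 listed above.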

A hash table with size 20 which has had records with these keys added could look like this:

Hash   table contents
0:     empty
1:     occupied   Make:Honda    Reg:FT52RET
2:     empty
3:     empty
4:     occupied   Make:Jaguar   Reg:NN52FRT
5:     empty
6:     empty
7:     empty
8:     empty
9:     empty
10:    empty
11:    empty
12:    occupied   Make:Mazda    Reg:SD52DFG
13:    empty
14:    occupied   Make:Toyota   Reg:DR52GHY
15:    occupied   Make:Rover    Reg:FG02GTH
16:    empty
17:    empty
18:    empty
19:    empty

A good hash function is one that gives a fairly even distribution of the numbers in the output range, even if the input values are very poorly distributed (for example, English words are poorly distributed, since they only use 26 different letters, and some letters and some combinations of letters occur far more frequently than others).


Collisions
With hash tables, there always exists the possibility that two data elements will hash to the same integer value. When this happens, a collision results (two data members try to occupy the same place in the hash table array). For example:

SD52QWS   28+13+5+2+26+32+28 = 134   Hashcode = 14

The element cannot be stored, as location 14 is already occupied. Methods have been devised to deal with such situations.

The loading factor of a hash table is defined as the proportion of locations which are occupied. When the loading factor is high, a hash table becomes rather inefficient.

Closed hashing

Linear probing
Linear probing is one method for dealing with collisions. If a data element hashes to a location in the table that is already occupied, the table is searched consecutively from that location until an open location is found. In the example, the new key SD52QWS would be stored in location 16, as location 15 is also already occupied.

Rehashing
In rehashing, a second function is used to generate an alternative address. The function should hash to a location far from the original one.

Open hashing
In open hashing, further additions which have hashed to an occupied location are stored outwith the main table. These locations may be stored as part of an overflow table or they may be allocated dynamically:

12:    occupied   Make:Mazda    Reg:SD52DFG
13:    empty
14:    occupied   Make:Toyota   Reg:DR52GHY   -> overflow: Reg:SD52QWS
15:    occupied   Make:Rover    Reg:FG02GTH
16:    empty
17:    empty
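Open hashing with dynamically allocated storage can be sketched as a table in which each location holds a list of keys. This is a minimal illustration under stated assumptions, not the notes' own implementation: the class OpenHashTable, its methods and the choice of one LinkedList per location are all hypothetical, and only the registration is stored. The loadingFactor method follows the definition given above (proportion of locations occupied).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch: names and layout are assumptions, not from the notes.
class OpenHashTable {
    static final int SIZE = 20;
    // One chain of registrations per table location
    private final List<List<String>> chains = new ArrayList<List<String>>();

    OpenHashTable() {
        for (int i = 0; i < SIZE; i++) chains.add(new LinkedList<String>());
    }

    /** Same hash function as the car registration example. */
    static int hash(String registration) {
        int sum = 0;
        for (char c : registration.toCharArray()) {
            sum += Character.getNumericValue(c);
        }
        return sum % SIZE;
    }

    /** A colliding key simply joins the chain at its location. */
    void add(String registration) {
        chains.get(hash(registration)).add(registration);
    }

    /** Only the chain at the key's own location needs to be searched. */
    boolean contains(String registration) {
        return chains.get(hash(registration)).contains(registration);
    }

    List<String> chainAt(int location) {
        return chains.get(location);
    }

    /** Proportion of table locations that hold at least one key. */
    double loadingFactor() {
        int occupied = 0;
        for (List<String> chain : chains) {
            if (!chain.isEmpty()) occupied++;
        }
        return (double) occupied / SIZE;
    }

    public static void main(String[] args) {
        OpenHashTable table = new OpenHashTable();
        String[] plates = {"SD52DFG", "FG02GTH", "DR52GHY", "FT52RET", "NN52FRT", "SD52QWS"};
        for (String plate : plates) table.add(plate);
        System.out.println("Location 14: " + table.chainAt(14));
        System.out.println("Loading factor: " + table.loadingFactor());
    }
}
```

With the six example keys added, location 14 should hold both DR52GHY and SD52QWS, and locations 15 and 16 are not disturbed by the collision.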


Searching a hash table


Searching a hash table is easy and extremely fast: just find the hash value for the item you're looking for, then go to that index and search the array from there until you find what you're looking for or you hit an empty location. Note that these examples are based on the use of linear probing to handle collisions.

Example: search for car with registration NN52FRT (table uses linear probing)
Calculate hashcode: 23+23+5+2+15+27+29 = 124: Hashcode = 4
Go to location 4
Compare key in location 4 with target
Target was found, index = 4

Example: search for car with registration SD52QWS
Calculate hashcode: 28+13+5+2+26+32+28 = 134: Hashcode = 14
Go to location 14
Compare key in location 14 with target
Key not equal to target, go to next location
Compare key in location 15 with target
Key not equal to target, go to next location
Compare key in location 16 with target
Target was found, index = 16

Example: search for car with registration FD52HBC
Calculate hashcode: 15+13+5+2+17+11+12 = 75: Hashcode = 15
Go to location 15
Compare key in location 15 with target
Key not equal to target, go to next location
Compare key in location 16 with target
Key not equal to target, go to next location
Compare key in location 17 with target
Location 17 is empty; this is where the key would have been if it was in the table
Target not in table
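The three searches traced above can be sketched in code. This is a minimal illustration, not the notes' own implementation: the class ProbingTable is hypothetical, only the registration is stored, a null entry stands for an empty location, and wrapping round from the last location to location 0 is an assumption the notes do not spell out.

```java
// Illustrative sketch: names and the wrap-around rule are assumptions.
class ProbingTable {
    static final int SIZE = 20;
    final String[] keys = new String[SIZE];   // null marks an empty location

    /** Same hash function as the car registration example. */
    static int hash(String registration) {
        int sum = 0;
        for (char c : registration.toCharArray()) {
            sum += Character.getNumericValue(c);
        }
        return sum % SIZE;
    }

    /** Stores the key in the first empty location at or after its hash code.
     *  Returns the location used, or -1 if the table is full. */
    int add(String registration) {
        int index = hash(registration);
        for (int probes = 0; probes < SIZE; probes++) {
            if (keys[index] == null) {
                keys[index] = registration;
                return index;
            }
            index = (index + 1) % SIZE;
        }
        return -1;
    }

    /** Returns the location of the key, or -1 once an empty location
     *  is reached or every location has been examined. */
    int search(String registration) {
        int index = hash(registration);
        for (int probes = 0; probes < SIZE; probes++) {
            if (keys[index] == null) return -1;
            if (keys[index].equals(registration)) return index;
            index = (index + 1) % SIZE;
        }
        return -1;
    }

    public static void main(String[] args) {
        ProbingTable table = new ProbingTable();
        String[] plates = {"SD52DFG", "FG02GTH", "DR52GHY", "FT52RET", "NN52FRT", "SD52QWS"};
        for (String plate : plates) table.add(plate);
        for (String target : new String[] {"NN52FRT", "SD52QWS", "FD52HBC"}) {
            System.out.println(target + " -> " + table.search(target));
        }
    }
}
```

With the six example keys stored, the searches should return 4, 16 and -1 (not in table), matching the traces above.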

Deleting from a hash table


Deleting an element from a hash table involves searching for the key, then deleting the element when it is found. The location must be marked as deleted, rather than empty, as searching will not work correctly otherwise. Locations marked deleted can be reused.

The following segment of the table shows the result of adding SD52QWS using linear probing. The hash code of this key is 14, but the element is stored in the next empty location, 16.

12:    occupied   Make:Mazda    Reg:SD52DFG
13:    empty
14:    occupied   Make:Toyota   Reg:DR52GHY
15:    occupied   Make:Rover    Reg:FG02GTH
16:    occupied   Make:VW       Reg:SD52QWS
17:    empty

If we now delete the element in location 15, we get:

12:    occupied   Make:Mazda    Reg:SD52DFG
13:    empty
14:    occupied   Make:Toyota   Reg:DR52GHY
15:    deleted
16:    occupied   Make:VW       Reg:SD52QWS
17:    empty

We now search for key SD52QWS. The search starts at location 14, passes over the deleted location 15 and finds the key in location 16. If location 15 were marked empty instead, the search algorithm would stop there and report that the key is not in the table.
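The role of the deleted marker can be sketched as follows. This is an illustration under stated assumptions rather than the notes' own code: the class TombstoneTable, the Status enum and the wrap-around probing are all hypothetical, and only the registration is stored.

```java
import java.util.Arrays;

// Illustrative sketch: names and the wrap-around rule are assumptions.
class TombstoneTable {
    enum Status { EMPTY, OCCUPIED, DELETED }

    static final int SIZE = 20;
    final String[] keys = new String[SIZE];
    final Status[] status = new Status[SIZE];

    TombstoneTable() {
        Arrays.fill(status, Status.EMPTY);
    }

    /** Same hash function as the car registration example. */
    static int hash(String registration) {
        int sum = 0;
        for (char c : registration.toCharArray()) {
            sum += Character.getNumericValue(c);
        }
        return sum % SIZE;
    }

    /** Uses the first location that is not occupied, so locations
     *  marked DELETED are reused. Returns -1 if the table is full. */
    int add(String registration) {
        int index = hash(registration);
        for (int probes = 0; probes < SIZE; probes++) {
            if (status[index] != Status.OCCUPIED) {
                keys[index] = registration;
                status[index] = Status.OCCUPIED;
                return index;
            }
            index = (index + 1) % SIZE;
        }
        return -1;
    }

    /** Stops at an EMPTY location but passes over DELETED ones. */
    int search(String registration) {
        int index = hash(registration);
        for (int probes = 0; probes < SIZE; probes++) {
            if (status[index] == Status.EMPTY) return -1;
            if (status[index] == Status.OCCUPIED && keys[index].equals(registration)) {
                return index;
            }
            index = (index + 1) % SIZE;
        }
        return -1;
    }

    /** Marks the key's location as DELETED rather than EMPTY. */
    boolean delete(String registration) {
        int index = search(registration);
        if (index == -1) return false;
        keys[index] = null;
        status[index] = Status.DELETED;
        return true;
    }

    public static void main(String[] args) {
        TombstoneTable table = new TombstoneTable();
        table.add("DR52GHY");
        table.add("FG02GTH");
        table.add("SD52QWS");
        table.delete("FG02GTH");
        System.out.println("Location 15 is now " + table.status[15]);
        System.out.println("SD52QWS found at " + table.search("SD52QWS"));
    }
}
```

After FG02GTH is deleted, the search for SD52QWS should still succeed at location 16. Changing delete to set Status.EMPTY instead would make the same search stop at location 15 and fail.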


EXERCISE: Looking at hash tables


For the car registration table described in the notes, with size 20, describe the sequence of actions used to add the following elements:

Make      Registration
Audi      DF52TYU
Volvo     RT02GHT
Saab      FD52HBC
Mini      DR02TRG
Renault   TU02XYZ
Hyundai   YU51TRH

You can find a table of the numeric values of characters below.

Describe the sequence of actions used to delete the following key:

TU02XYZ

Describe the sequence of actions now used to search for the following keys:

YU51TRH
YU52TRH
Numeric values of characters (returned by the Character.getNumericValue(char) method in Java):

Character        0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H
Numeric Value    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

Character        I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Numeric Value   18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35


Hash Maps
A map data structure stores (key, value) pairs. Values are retrieved by searching for the appropriate key. Maps are also sometimes called dictionaries or associative arrays.

Maps can be implemented in a number of ways. One of the most common ways is to store the (key, value) pairs in a hash table. This is called a hash map.

The following code shows a simple Java implementation of a HashMap class. This class uses linear probing to handle collisions and includes methods to store and retrieve entries. Any kind of object can be used as a key, and any kind of object can be stored as a value.

We can't use exactly the same hash function as we used in the examples above, since we want to be able to use any kind of key, not just strings representing car registration numbers. The function used now makes use of the fact that every Java object inherits a method hashCode from Object.

/**
 * class HashMap
 *
 * @author Jim
 * @version 1.0
 */
public class HashMap {
    public int CAPACITY = 20;
    public Object[] keys = new Object[CAPACITY];
    public Object[] values = new Object[CAPACITY];
    public String[] status = new String[CAPACITY];

    /**
     * Initialize this HashMap object to be empty.
     */
    public HashMap() {
        for (int i = 0; i < CAPACITY; i++) {
            status[i] = "empty";
            keys[i] = "empty";   // placeholder only; status[] decides whether a position is in use
            values[i] = null;
        }
    }

    /**
     * Determines if this object contains no elements
     *
     * @return true - if this object contains no elements
     */
    public boolean isEmpty() {
        return size() == 0;
    }

    /**
     * Determines the number of elements
     *
     * @return the number of elements
     */
    public int size() {
        int count = 0;
        for (int i = 0; i < CAPACITY; i++) {
            // compare strings with equals, not ==
            if (status[i].equals("occup")) count++;
        }
        return count;
    }

    /**
     * Puts a new key/value pair in this HashMap
     * Calculates array position from hashcode, and rehashes
     * if that position is occupied
     *
     * @param key - the key to be stored
     * @param value - the value to be stored
     */
    public void put(Object key, Object value) {
        // remainder first, then abs: stays in 0..CAPACITY-1 even for Integer.MIN_VALUE
        int hashcode = Math.abs(key.hashCode() % CAPACITY);
        System.out.println("Hashcode calculated:" + hashcode);
        if (status[hashcode].equals("occup")) {
            hashcode = rehash(hashcode, 0);
            if (hashcode == -1) {
                throw new IllegalStateException("HashMap is full");
            }
            System.out.println("Hashcode recalculated:" + hashcode);
        }
        keys[hashcode] = key;
        values[hashcode] = value;
        status[hashcode] = "occup";
    }

    /**
     * Checks whether the specified key is present
     *
     * @param key - the key to be checked
     * @return true - if the key is present
     */
    public boolean containsKey(Object key) {
        boolean found = false;
        for (int i = 0; i < CAPACITY; i++) {
            // only occupied positions hold real keys
            if (status[i].equals("occup") && keys[i].equals(key)) found = true;
        }
        return found;
    }

    /**
     * Returns the value for a specified key
     * Uses hashcode to select starting point for search,
     * and uses linear probing to search from there
     *
     * @param key - the key to be found
     * @return the value associated with key, or null if the key is not present
     */
    public Object get(Object key) {
        int hashcode = Math.abs(key.hashCode() % CAPACITY);
        int attempts = 0;
        // Probe forward, wrapping round the end of the array as rehash does,
        // until the key is found, an empty position is reached, or every
        // position has been examined
        while (!status[hashcode].equals("empty") && attempts < CAPACITY) {
            if (status[hashcode].equals("occup") && keys[hashcode].equals(key))
                return values[hashcode];
            hashcode = (hashcode + 1) % CAPACITY;
            attempts++;
        }
        return null;
    }

    /**
     * Recursively recalculates an array position to find an
     * unoccupied position, based on a linear probing method
     * Checks whether allowed number of attempts has been
     * exceeded.
     *
     * @param hashcode - the hashcode of an object
     * @param attempts - the number of attempts made so far
     * @return the new array position, or -1 if no free position was found
     */
    private int rehash(int hashcode, int attempts) {
        if (hashcode < CAPACITY && attempts < CAPACITY) {
            hashcode++;
            if (hashcode == CAPACITY) hashcode = 0;
            //System.out.println("rehashing: " + hashcode);
            if (status[hashcode].equals("occup")) {
                hashcode = rehash(hashcode, attempts + 1);
            }
            return hashcode;
        }
        else return -1;
    }
}
The System.out.println commands in the put method are for the purpose of the following exercise only.
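The step that lets any kind of object act as a key is the reduction of key.hashCode() to an array position. The following minimal sketch isolates that step; the class HashIndexDemo and the method indexFor are hypothetical names, and the keys used are arbitrary examples.

```java
// Illustrative sketch: names are assumptions, not from the notes.
class HashIndexDemo {
    static final int CAPACITY = 20;

    /** Reduces any object's hashCode() to a position in 0..CAPACITY-1. */
    static int indexFor(Object key) {
        // Taking the remainder before Math.abs keeps the result in range
        // even when hashCode() returns Integer.MIN_VALUE
        return Math.abs(key.hashCode() % CAPACITY);
    }

    public static void main(String[] args) {
        Object[] keys = {"DF52TMU", Integer.valueOf(42), Double.valueOf(3.5)};
        for (Object key : keys) {
            System.out.println(key + " (" + key.getClass().getSimpleName()
                + ") -> position " + indexFor(key));
        }
    }
}
```

Because hashCode is defined for every object, the same method works for strings, numbers and user-defined classes, although two different keys can still map to the same position.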


Differences from the Java Collections Framework HashMap


This HashMap is much simpler and less robust than the HashMap class in the Java Collections Framework. The framework class:

- has a fuller set of operations
- can automatically increase its capacity if it gets too full
- uses open hashing with linked lists to handle collisions
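For comparison, here is a short sketch of the framework class in use. The class FrameworkMapDemo and the test data are made up for illustration; the calls put, get, containsKey, remove and size are standard java.util.HashMap methods. Note that the framework class has the same simple name as the class in these notes, so it is imported explicitly here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: class name and data are assumptions.
class FrameworkMapDemo {

    /** Builds a map from registration to make with well over 20 entries. */
    static Map<String, String> build() {
        Map<String, String> cars = new HashMap<String, String>();
        cars.put("DF52TMU", "Audi");
        cars.put("RT02GHT", "Volvo");
        cars.put("RT02GHT", "Volvo S50");   // same key: the value is replaced
        // No fixed capacity: the map grows as entries are added
        for (int i = 0; i < 100; i++) {
            cars.put("TEST" + i, "Make" + i);
        }
        return cars;
    }

    public static void main(String[] args) {
        Map<String, String> cars = build();
        System.out.println("Size: " + cars.size());
        System.out.println("RT02GHT -> " + cars.get("RT02GHT"));
        System.out.println("Removed: " + cars.remove("DF52TMU"));
        System.out.println("Still contains DF52TMU? " + cars.containsKey("DF52TMU"));
    }
}
```

Unlike the simple class above, putting an existing key replaces its value instead of adding a second entry, and more than 20 entries can be stored without any extra work by the caller.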

EXERCISE: Looking at a HashMap


Create a new BlueJ project called hashmaps, and add a new class HashMap using the above code. Add new classes Car and CarTest using the following code:

/**
 * class Car
 *
 * @author Jim
 * @version 1.0
 */
public class Car {
    public String registration;
    public String make;
    public String model;

    /**
     * Constructor for objects of class Car
     */
    public Car(String registration, String make, String model) {
        this.registration = registration;
        this.make = make;
        this.model = model;
    }
}


/**
 * class CarTest
 *
 * @author Jim
 * @version 1.0
 */
public class CarTest {
    public HashMap map;

    /**
     * Constructor for objects of class CarTest
     */
    public CarTest(HashMap map) {
        this.map = map;
    }

    /**
     * Constructs a Car object and stores in the map
     * using the registration number as the key
     *
     * @param reg the registration number
     * @param make the make
     * @param model the model
     */
    public void storeCar(String reg, String make, String model) {
        Car car = new Car(reg, make, model);
        map.put(car.registration, car);
    }

    /**
     * Retrieves a Car object from the map
     * using the registration number as the key
     *
     * @param reg the registration number
     * @return the Car object found
     */
    public Car findCar(String reg) {
        Car car = (Car) map.get(reg);
        return car;
    }
}


Create a new instance of HashMap called hashMap1. Create a new instance of CarTest called carTest1 and supply hashMap1 as the parameter for the constructor. You can now use carTest1 to store and find Car objects in hashMap1, and also inspect the map directly.

Storing data
Call the storeCar method of carTest1 repeatedly to add the following entries:

carTest1.storeCar("DF52TMU", "Audi", "TT");
carTest1.storeCar("RT02GHT", "Volvo", "S50");
carTest1.storeCar("FD52HBC", "Saab", "9-5");
carTest1.storeCar("DR02TRG", "Mini", "Cooper");
carTest1.storeCar("TU02XYZ", "Renault", "Megane");
carTest1.storeCar("YU51TRH", "Hyundai", "Coupe");

These are similar to the hash table entries you used in the previous exercise. They will not be stored in the same array positions, since the hash function is different here.

Look at the output in the terminal window. This shows the hashcodes calculated during execution of the put method of the HashMap. Were there any collisions? What steps were taken to handle collisions?

Looking at the HashMap


Inspect hashMap1. You should see keys, values and status arrays. Inspect the keys array.


Choose a position which is not empty. Note the position number and the registration number it contains. For example, in the figure, position 10 contains YU51TRH. Now inspect the values array. Inspect the object reference at the position number you noted above.

What kind of object does the values array contain? Does this object have the correct attributes? Repeat this for another non-empty position.

Finding data
Call the findCar method of carTest1 with TU02XYZ as a parameter.

What kind of object is returned? Does this object have the correct attributes? Check that the other keys find the correct value objects.

Updating data
Add a new method set(key, value) to your HashMap class to update the value object for a specified key. Modify your CarTest class so that you can test this method.
