Searching
Up to now the only way to find a key is to search through all or part of the data
linked list: O(n)
AVL tree: O(log n)
binary search of array: O(log n)
If there is a lot of data and/or the data is searched very often, these times can add up
given the key, we would like to get the data directly
Hashing
The solution to this problem is to put the key through a function that says exactly where the data is (or where it should be placed)
this function is called a hash function
h(key) = integer
the integer obtained from a hash function can be used as an index into an array
if the hash function is perfect (it always generates a unique integer for different keys), the time to place and access data is O(1)
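To make the "hash value as array index" idea concrete, here is a minimal sketch of a table that assumes a perfect hash function, so it has no collision handling at all. The class name `PerfectHashTable` and the letter-position hash used in `main` are hypothetical illustrations, not part of the notes; the keys A, M, and X echo the figure on the next slide.

```java
import java.util.function.ToIntFunction;

// Hypothetical direct-indexed table. It ASSUMES the hash function is perfect
// (distinct keys always give distinct indices), so collisions are not handled.
class PerfectHashTable<V> {
    private final Object[] slots;
    private final ToIntFunction<String> h;

    PerfectHashTable(int size, ToIntFunction<String> h) {
        this.slots = new Object[size];
        this.h = h;
    }

    // O(1): compute the index once and store the value there
    void put(String key, V value) {
        slots[h.applyAsInt(key)] = value;
    }

    // O(1): compute the same index and read the value back
    @SuppressWarnings("unchecked")
    V get(String key) {
        return (V) slots[h.applyAsInt(key)];
    }

    public static void main(String[] args) {
        // h(key) = position of a single uppercase letter in the alphabet,
        // which is perfect for the 26 keys "A".."Z"
        PerfectHashTable<String> table = new PerfectHashTable<>(26, key -> key.charAt(0) - 'A');
        table.put("A", "data for A");
        table.put("M", "data for M");
        table.put("X", "data for X");
        System.out.println(table.get("M")); // prints "data for M"
    }
}
```

Both `put` and `get` do the same constant amount of work regardless of how many items are stored, which is where the O(1) claim comes from.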
Hashing
[Figure: keys A, M, and X are passed through the hashing function, which maps each one to an index in an array with slots 0 through 11]
Hashing Functions
So what is the hashing function?
the simplest hashing function is to use the division remainder
assume the array is 1000 elements in size
translate the data into a number, n
h(n) = n % 1000
Hashing Functions
simple example
consider a small school
each student is tracked by a 4-digit ID number
each student's ID # begins with the year they started
2000 -> 0, 2001 -> 1, 2002 -> 2, etc.
Hashing Functions
Mary's ID #: 1000
Pete's ID #: 1004
John's ID #: 1009
Amy's ID #: 1011
To find John's record in the array: 1009 % 1000 = 9
Go to index number 9.
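The school example can be sketched as a 1000-slot array indexed by h(n) = n % 1000. The class name `StudentTable` and its `put`/`get` helpers are illustrative assumptions, not from the notes.

```java
// Sketch of the school example: records live in a 1000-slot array
// indexed by the division-remainder hash h(n) = n % 1000.
class StudentTable {
    static final int TABLE_SIZE = 1000;
    static final String[] records = new String[TABLE_SIZE];

    static int h(int id) {
        return id % TABLE_SIZE; // division-remainder hash
    }

    static void put(int id, String name) {
        records[h(id)] = name;
    }

    static String get(int id) {
        return records[h(id)];
    }

    public static void main(String[] args) {
        put(1000, "Mary");
        put(1004, "Pete");
        put(1009, "John");
        put(1011, "Amy");
        System.out.println(h(1009));   // prints 9
        System.out.println(get(1009)); // prints John
    }
}
```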
Generating n
The previous example is rather simplistic in that it is hashing already unique integers
seems kind of pointless; maybe not if the integers are large
consider the UW's 10-digit ID numbers
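One practical wrinkle with large IDs in Java: a 10-digit number can exceed the `int` range (`Integer.MAX_VALUE` is 2147483647), so this sketch keeps the ID in a `long` before taking the remainder. The sample ID 9876543210 and the table size of 1000 are made up for illustration.

```java
// Division hash for a 10-digit ID stored as a long, since such IDs
// may not fit in an int. Sample values are hypothetical.
class LongIdHash {
    static int h(long id, int tableSize) {
        // for a non-negative id the remainder is below tableSize, so the cast is safe
        return (int) (id % tableSize);
    }

    public static void main(String[] args) {
        long id = 9876543210L;           // hypothetical 10-digit ID
        System.out.println(h(id, 1000)); // prints 210
    }
}
```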
Generating n
How is a string converted into an integer?
the simplest method is to add all of the ASCII values for each character together
example
convert amy into an integer
a = 97; m = 109; y = 121
a + m + y = 327
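A minimal sketch of the character-sum conversion. Naming it `stringToNum` is an assumption: it is one possible body for the `stringToNum` helper that the mid-square code later relies on, which the notes do not define.

```java
// Convert a string to an integer by summing its character codes.
// For ASCII text, each char's numeric value is its ASCII code.
class AsciiSum {
    static int stringToNum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i); // char widens to int (its character code)
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(stringToNum("amy")); // prints 327
        System.out.println(stringToNum("may")); // also prints 327
    }
}
```

Because addition ignores order, any rearrangement of the same letters ("amy", "may", "yam") produces the same number, so this conversion alone will map all of them to the same index.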
Hashing Functions
There are millions of possible hashing functions
we will not be considering them all
basically, anything you can think of to generate an integer could be used as a hashing function
Mathematicians have spent lots of time and effort to come up with some basic methods that work pretty well
Division
We have already seen the division method
it involves taking the remainder of division
h(key) = key % tableSize
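A sketch of the division method in Java. One caveat worth knowing: Java's `%` keeps the sign of the left operand, so a negative key (for example, an `int` sum that overflowed) would give a negative, unusable index. `Math.floorMod` returns a value in [0, tableSize) for a positive tableSize; the class name is illustrative.

```java
// Division method h(key) = key % tableSize, made safe for negative keys
// by using Math.floorMod instead of the raw % operator.
class DivisionHash {
    static int h(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(h(1009, 1000)); // prints 9
        System.out.println(-7 % 10);       // prints -7 (raw % keeps the sign)
        System.out.println(h(-7, 10));     // prints 3
    }
}
```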
Folding
Separate the key into various equally sized parts and then recombine them
usually with addition
boundary folding
reverse the order of every other part and add them together
Folding
Consider an SSN as a key
break it into 3 parts
first 3, second 3, last 3
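Working the SSN example through, using 123-45-6789 (the sample SSN from the extraction slide): the parts are 123, 456, and 789. Plain folding (often called shift folding) adds them as-is; boundary folding reverses every other part, here the middle one, before adding. The method names and the final `% 1000` table size are assumptions for illustration.

```java
// Folding a 9-digit key into three 3-digit parts.
// Assumes the digit string's length is a multiple of 3.
class FoldingHash {
    // Shift folding: add the parts as they are
    static int additionFold(String digits) {
        int sum = 0;
        for (int i = 0; i < digits.length(); i += 3) {
            sum += Integer.parseInt(digits.substring(i, i + 3));
        }
        return sum;
    }

    // Boundary folding: reverse every other part (2nd, 4th, ...) before adding
    static int boundaryFold(String digits) {
        int sum = 0;
        for (int i = 0; i < digits.length(); i += 3) {
            String part = digits.substring(i, i + 3);
            if ((i / 3) % 2 == 1) {
                part = new StringBuilder(part).reverse().toString();
            }
            sum += Integer.parseInt(part);
        }
        return sum;
    }

    public static void main(String[] args) {
        String ssn = "123-45-6789".replace("-", "");  // "123456789"
        System.out.println(additionFold(ssn));         // 123 + 456 + 789 = 1368
        System.out.println(boundaryFold(ssn));         // 123 + 654 + 789 = 1566
        System.out.println(additionFold(ssn) % 1000);  // index 368
        System.out.println(boundaryFold(ssn) % 1000);  // index 566
    }
}
```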
Increasing Performance
Consider using shifting and exclusive ORing to generate the index
exclusive OR parts together to generate index
Example
consider the string abcdefgh
if each part is a letter, just exclusive OR them
a ^ b ^ c ^ d ^ e ^ f ^ g ^ h
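A direct sketch of XORing one letter per part (the class and method names are illustrative):

```java
// XOR every character of a string together, one letter per part.
class XorLetters {
    static int xorChars(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            result ^= s.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(xorChars("abcdefgh")); // prints 8
    }
}
```

The result is small: every lowercase ASCII letter has the same top bits (011xxxxx), so XORing an even number of them cancels those bits and leaves a value from 0 to 31. Results crowded into such a narrow range would use only a corner of a large table, which is one reason to shift bytes into different positions before XORing.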
Increasing Performance
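int shiftFold(String key, int tableSize) {
    int chunk = 0;
    int result = 0;
    byte[] st = key.getBytes();
    for (int i = 0; i < st.length; i += 4) {
        // pack up to 4 bytes into one int; shift first so no byte is pushed out of the int
        for (int j = 0; (j < 4) && (j + i < st.length); j++) {
            chunk = (chunk << 8) | (st[j + i] & 0xFF); // & 0xFF stops sign extension of negative bytes
        }
        result = result ^ chunk; // XOR this 4-byte chunk into the running result
        chunk = 0;
    }
    // result may have its top bit set (negative), so floorMod keeps the index non-negative
    return Math.floorMod(result, tableSize);
}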
Increasing Performance
The performance could be increased even more if the table size were a power of 2
can get rid of the modulo operation at the end
modulo is an expensive calculation
could just do a subtraction and an AND operation instead: index = result & (tableSize - 1)
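A sketch of the power-of-two trick: when tableSize is 2^k, the low k bits of x already equal x mod tableSize, so one subtraction and one AND replace the modulo. With two's-complement ints the masked value also matches `Math.floorMod` for negative x. The class and method names are illustrative, and the code assumes (without checking) that tableSize really is a power of two.

```java
// Index computation for a power-of-two table size: x & (tableSize - 1).
class PowerOfTwoIndex {
    static int indexFor(int x, int tableSize) {
        // ASSUMES tableSize is a power of two, e.g. 1024
        return x & (tableSize - 1);
    }

    public static void main(String[] args) {
        int tableSize = 1024;
        for (int x : new int[] {5, 1023, 1024, 123456789, -7}) {
            // the two columns agree for every x
            System.out.println(indexFor(x, tableSize) + " " + Math.floorMod(x, tableSize));
        }
    }
}
```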
Mid-Square Function
Square the number and take the middle part as the index
a string must first be converted to get the number to square
Mid-Square Function
Table size equals 1024 (2^10)
The key is 3121
3121^2 = 9740641 = (100101001010000101100001)_2
the middle 10 bits of this value (7 bits are dropped from each end) are 0101000010
Index in array is
(0101000010)_2 = 322
This is all very quick and easy to calculate using mask and shift operations
Mid-Square Function
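// table size must be a power of two
int tableSize = 1024;
int mask = tableSize - 1; // 1023: ten 1-bits, one per index bit
int shiftBits = 7;        // low-order bits to drop so the middle bits remain

int midSquare(String key, int tableSize) {
    long n = stringToNum(key);  // stringToNum converts the string to a number (not shown)
    long square = n * n;        // long avoids overflowing int when squaring
    return (int) ((square >>> shiftBits) & mask); // shift the middle bits down, then mask them
}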
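A self-contained version that checks the worked example, taking the numeric key directly so that no string conversion is needed. The fixed shift of 7 matches this example only: 3121^2 is 24 bits wide, and dropping the low 7 bits before keeping 10 leaves the middle 10. Other key sizes would need a different shift. The parameter names `indexBits` and `shiftBits` are illustrative.

```java
// Mid-square hash: square the key, shift off the low bits, mask the middle bits.
class MidSquare {
    static int midSquare(long key, int indexBits, int shiftBits) {
        long square = key * key;             // long avoids int overflow for larger keys
        long mask = (1L << indexBits) - 1;   // indexBits one-bits, e.g. 1023 for 10 bits
        return (int) ((square >>> shiftBits) & mask);
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(3121L * 3121L)); // 100101001010000101100001
        System.out.println(midSquare(3121, 10, 7));              // prints 322
    }
}
```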
Extraction
Simply pull out a certain part of the key and use it as the index
example
SSN = 123-45-6789
index = middle of key = 456
alternative index = first, middle, last = 159
Should try to choose a part of the key that is most likely unique
consider foreign students, whose SSNs start with 999
probably not a great idea to extract the first three numbers
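A minimal sketch of both extraction choices for the example SSN, assuming the key arrives as a 9-digit string with dashes. The method names are hypothetical.

```java
// Extraction: pull a fixed part of the key out and use it as the index.
class Extraction {
    // digits 4-6 of the SSN
    static int middleThree(String ssn) {
        String d = ssn.replace("-", "");            // "123456789"
        return Integer.parseInt(d.substring(3, 6));
    }

    // first, middle (5th), and last digits
    static int firstMiddleLast(String ssn) {
        String d = ssn.replace("-", "");
        String picked = "" + d.charAt(0) + d.charAt(4) + d.charAt(8);
        return Integer.parseInt(picked);
    }

    public static void main(String[] args) {
        System.out.println(middleThree("123-45-6789"));     // prints 456
        System.out.println(firstMiddleLast("123-45-6789")); // prints 159
    }
}
```

`middleThree` skips the first three digits entirely, so a shared 999 prefix does not affect it; `firstMiddleLast` still includes the first digit, which would be 9 for every such key.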