value is O(n). In an ordered list this search time can be improved, and there is also room for improvement in the modification operations.
In a Binary Search Tree the search time can improve to O(log n) on average. AVL trees guarantee the same O(log n) bound.
store keys and values. It can be implemented using Array or Linked List structures. For a Dictionary, each element could be addressed directly by using the value of the element as the index, if the Dictionary is of that size.
simultaneously used by many processes. Also, there could be frequent accesses to the Keys at runtime. So there is a need to reduce both the space used and the search time.
Example 1
A 4-digit number used as a Key may need up to 10,000 locations (0000 to 9999).
If the Key stands for the Employee ID of a company with 500 employees, then only 500 of those locations are used when all the Keys are arranged in memory.
Example 2
A Hospital might have a large number of patients, both inpatients and outpatients. The database system can be modeled to group the patients and then index them so that retrieval of the records is fast. Another way is not to group, but to assign only one number to each case.
the amortized time of O(1), or close to it, could be achieved if we know the location of the data or key we are looking for. This location could be obtained by mapping the key to a new hashed key using a suitable function.
Hashing
Hashing could provide unique locations, or a reference to a shorter list, for the keys, from where we can easily get the data pertaining to one key. This would also tend to use less space in memory: instead of a large array, we can use a short array or linked list.
Hash Table
A Hash Table is a Data Structure.
Hash tables provide O(1) expected time for search, insert and delete on any value in the set stored in the Hash Table.
Hash Table?
A Hash table is an array, say T[1..m], where m is a positive integer called the table size.
When we try to put an item into a spot in the hash table that is already occupied, the situation is called a collision. It is resolved using a collision resolution policy.
Hashing-Mathematical Definition
Hashing is a mapping operation.
Consider a set K of keys. Let H be a function that maps the keys to a new set L, such that
H: K → L
Hash Address
Let k be a Key in K, that is, k ∈ K. Then k has a mapped address in L given by H(k), known as the Hash Address.
The Hash Address d = H(k) is the location in L that the hashing operation assigns to the key k. This address d is also called the Hash Code for the key k.
The process of Hashing is also called Compression.
Notes
There is no meaningful relationship between the actual data value k and the hash key d. So there is no practical way to traverse a hash table, except a direct search using d. Hash table items are not in any order. There is no mapping function from d back to k, except the hash table itself. The purpose of hash tables is to provide fast lookups.
memory pages and buffers. High speed routing tables use hash tables. Database systems use hash tables.
Remove(k)
Sizeof
Isempty
Types of hashing
There are two types:
1. Open hashing, also called open chaining, closed addressing or separate chaining
2. Closed hashing, also called open addressing
bucket for the data with an index. Data within the bucket is best organized as a Linked List.
1 k1
2 k2
3 k3
L-1 kN-1
L kN
the earmarked space. If multiple keys are hashed to the same address (a collision), the tie must be resolved. A bucket may be small enough to hold only one value at a time.
Topics in Hashing
Basically there are two subareas under Hashing:
1. Hash Functions
2. Collision Resolutions
Hash Functions
1. The Hash Function H should be easy to compute
2. The Hash Function H should distribute the hash addresses uniformly throughout the set L so that there are a minimum number of collisions
Hash Functions
Given a key k, the hash function H obtains a value H(k) as an index into the hash table cell/bucket, so that we can easily locate the key k in the Hash Table for search/insert.
Hash Functions
Division Method
Mid Square Method
Multiplication Method
Division Method
Choose a prime number that is not close to a power of 2. Let m be the selected number. Then m also indicates the size of the Hash Table in the ideal case, with one cell in each bucket. The hash address/bucket address is given by
H(k) = k mod m
Example
Given keys are
4845, 5679, 6381, 3636, 7180, 8126, 1127
Use Table size m=7, that is, hash to a Table with 7 cells. Also use m=11 and m=8 to repeat the exercise.
Answer
HASH ADDRESS   KEY
0              1127
1              4845
2              5679
3              3636
4              6381
5              7180
6              8126
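The division method above can be sketched in a few lines of Python. This is only an illustration: the function name `division_hash` is made up here, and the table is a plain list with no collision handling, which is enough for this key set with m=7.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: H(k) = k mod m."""
    return key % m


keys = [4845, 5679, 6381, 3636, 7180, 8126, 1127]

# Table with m = 7 cells. No collision handling: for this key set
# every key lands in a different cell.
table = [None] * 7
for k in keys:
    table[division_hash(k, 7)] = k

for address, key in enumerate(table):
    print(address, key)
```

Repeating the loop with m=11 or m=8 would overwrite cells whenever two keys share an address, which is where a collision resolution policy becomes necessary.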
consideration must be given to the size of the table. The best choice for table size is usually a prime number not too close to a power of 2.
Division Method for Chaining
Here, the Hash Table will have many cells. A hash address may map multiple keys to a single location, where they are stored as a linked list.
Illustration
Take Table size m as 11 to map a set of keys
Keys: 122, 221, 661, 90, 69, 167, 57
[Figure: hash table showing the keys chained at their hash addresses]
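A minimal sketch of chaining with the division method, using only the seven keys that are listed in the illustration. Two assumptions: a Python list stands in for each linked list, and the helper name `build_chained_table` is made up for this sketch.

```python
def build_chained_table(keys, m):
    """Separate chaining: every slot holds a chain of the keys hashed to it."""
    table = [[] for _ in range(m)]
    for k in keys:
        # A colliding key simply extends the chain of its slot.
        table[k % m].append(k)
    return table


keys = [122, 221, 661, 90, 69, 167, 57]
table = build_chained_table(keys, 11)

for address, chain in enumerate(table):
    print(address, chain)
```

Keys that share a hash address end up in the same chain, in the order they were inserted.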
Load Factor
Let there be m slots in a Hash Table.
At the instant of observation the number of elements is n.
The load factor is α = n/m.
Solution
There are 11 slots and 11 elements, so α = 11/11 = 1.
On average there is one element per position. Also, we get α = 1 even if there are vacant slots, because α shows only the average.
Notes on α
The Load factor α can assume various values as the number of keys in the Hash Table changes. Accordingly, α could be less than, equal to, or greater than one in a Hash Table formed using Separate Chaining (Open Hashing). In a Hash Table formed using Open Addressing (Closed Hashing), α can never exceed one. α decides the complexity of operations on the Hash Table such as insert, search and delete.
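The load factor can be checked with a small sketch. It assumes a chained table stored as a list of Python lists, and the keys in `crowded` are hypothetical values chosen so that all of them hash to slot 0.

```python
def load_factor(table):
    """Load factor = n / m for a chained table given as a list of buckets."""
    n = sum(len(bucket) for bucket in table)  # number of stored elements
    m = len(table)                            # number of slots
    return n / m


m = 11
crowded = [[] for _ in range(m)]
crowded[0] = list(range(0, 11 * m, m))  # 11 hypothetical keys, all in slot 0

print(load_factor(crowded))  # an average: ten of the eleven slots are vacant
```

Appending more keys to any chain pushes the value above one, which is possible only with separate chaining.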
Exercise
Map the following keys in such a way that we have
end. Add the ASCII value of the last character to the ASCII value of the first character multiplied by 256. Apply mod m division to the resulting number.
Keys
A, BABU, CHOWHAN, SUMAN, DILIP
Symbols
AA, BU, CN, SN, DP
These 5 symbols are then converted to a numerical code using the rule given previously by employing the ASCII values of the characters in the symbols
ASCII Values
A-65 B-66 C-67 D-68 E-69 F-70 G-71 H-72 I-73
J-74 K-75 L-76 M-77 N-78 O-79 P-80 Q-81 R-82
S-83 T-84 U-85 V-86 W-87 X-88 Y-89 Z-90
Example- Answer
AA 256*65+65=16705
BU 256*66+85=16981
CN 256*67+78=17230
SN 256*83+78=21326
DP 256*68+80=17488
Solution
Take m=7
Obtain the Hash Addresses
AA 16705 mod 7 = 3
BU 16981 mod 7 = 6
CN 17230 mod 7 = 3
SN 21326 mod 7 = 4
DP 17488 mod 7 = 2
Solution
0
1
2 DILIP
3 A, CHOWHAN
4 SUMAN
5
6 BABU
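The string exercise can be sketched as two separate steps: convert the name to a numeric key, then reduce the key with mod m. The function names `symbol_key` and `table_address` are made up for this sketch; the rule itself (256 times the ASCII value of the first character plus the ASCII value of the last) is the one stated in the exercise.

```python
def symbol_key(name: str) -> int:
    """256 * ASCII(first character) + ASCII(last character)."""
    return 256 * ord(name[0]) + ord(name[-1])


def table_address(key: int, m: int) -> int:
    """Constrain the key to an address in the range 0 .. m-1."""
    return key % m


names = ["A", "BABU", "CHOWHAN", "SUMAN", "DILIP"]
m = 7
for name in names:
    key = symbol_key(name)
    print(name, key, table_address(key, m))
```

Keeping the two functions apart means only `table_address` is affected when the table size changes.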
Symbol Table
Compilers use a method similar to the previous one:
1. Convert the string to a key.
2. Constrain the key to a positive value less than the size of the table.
The best strategy is to keep the two functions separate so that there is only one part to change if the size of the table changes.
Notes-Chaining method
The chaining method gives unlimited space in the hash table in principle. But in practical applications, only limited space is allotted to one hash table in memory. In chaining, a collision never forces a key to be relocated, because colliding keys share the list of their bucket.
Collisions
Collision
In the case of closed hashing (open addressing), even though H ideally gives distinct addresses in L for each member of K, in a real situation two or more Keys may lead to a single Hash Address when a given Hash Function is used. This situation is called a collision. We need some method to resolve collisions; the method is called the Collision Resolution Policy.
Linear Probing
If a collision occurs during an insert operation, look for the next free location and use it for storage. During a search operation, if a key is not found at its home position, look for it in the following cells in a linear manner.
Example
Let H be mod 11. Let the keys 56, 78, 100 appear in this order for hashing. All of these have position 1 as home. The table is considered a circular array.
1 56
2 78
3 100
All other positions remain empty.
Exercise
Hash 45, 39, 66, 74 in that order with Table size m=7
3 45
4 39
5 66
6 74
Exercise
Let H be mod 11.
Let the keys 46, 122, 222, 441 appear in this order for hashing.
46 mod 11 = 2
122 mod 11 = 1
222 mod 11 = 2
441 mod 11 = 1
Solution
1 122
2 46
3 222
4 441
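Linear probing for insert and search can be sketched as below. Assumptions of this sketch: the table is a Python list with `None` marking a free cell, H(k) = k mod m, and deletion is not supported (a delete would need some marker so that searches do not stop early).

```python
def linear_probe_insert(table, key):
    """Insert key, stepping to the next cell on every collision."""
    m = len(table)
    home = key % m
    for i in range(m):
        pos = (home + i) % m  # circular array: wrap past the last cell
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


def linear_probe_search(table, key):
    """Return the position of key, or None if it is not in the table."""
    m = len(table)
    home = key % m
    for i in range(m):
        pos = (home + i) % m
        if table[pos] is None:
            return None  # a free cell ends the probe: the key is absent
        if table[pos] == key:
            return pos
    return None


table = [None] * 11
positions = [linear_probe_insert(table, k) for k in (46, 122, 222, 441)]
print(positions)
print(linear_probe_search(table, 441))
```

The same two functions handle the m=7 exercise if the table is created with seven cells.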
of k². 5. Once chosen, the same positions of k² must be used for all keys consistently.
Example
k:
the Division method. The function takes the form H(k) = ⌊m (kA mod 1)⌋ = floor(m * (kA mod 1)), where 0 < A < 1 and kA mod 1 refers to the fractional part of kA. Since 0 ≤ kA mod 1 < 1, the range of H(k) is from 0 to m - 1.
works equally well with any size m. A should be chosen carefully. Rational numbers should not be chosen for A. An example of a good choice for A is
A = (√5 - 1)/2 ≈ 0.618
2343: floor(11 * (2343 * 0.618 mod 1)) = 10
4345: floor(11 * (4345 * 0.618 mod 1)) = 2
6567: floor(11 * (6567 * 0.618 mod 1)) = 4
3476: floor(11 * (3476 * 0.618 mod 1)) = 1
1215: floor(11 * (1215 * 0.618 mod 1)) = 9
MATLAB command: floor(11*mod((k*0.618),1))
Solution
1 3476
2 4345
4 6567
9 1215
10 2343
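A Python counterpart of the MATLAB command, as a sketch. The function name `multiplication_hash` is made up here, and A is fixed at the rounded value 0.618 used in the worked example rather than the exact irrational constant.

```python
import math


def multiplication_hash(key: int, m: int, A: float = 0.618) -> int:
    """Multiplication method: H(k) = floor(m * (k*A mod 1)), with 0 < A < 1."""
    fractional = (key * A) % 1.0  # fractional part of k*A
    return math.floor(m * fractional)


for k in (2343, 4345, 6567, 3476, 1215):
    print(k, multiplication_hash(k, 11))
```

Because floating-point arithmetic is involved, a key whose product lies extremely close to an integer boundary could round differently; none of the five keys in the example is near such a boundary.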
H+3 and so on to find the space for the key whose primary hash value is H. This leads to clustering of hash codes near some cells, called primary clustering. The larger the cluster, the lower the search efficiency.
uniformly distributed manner with a larger table size, the process may avoid collisions. Even if collisions occur, we may use a pseudo-random sequence to probe the locations. But this approach reduces the locality of reference, since the next probe location becomes a random variable. So it is better to use a middle path between linear probing and random hashing.
Quadratic Probing
Instead of linearly traversing the hash table slots in the case of collisions, quadratic probing introduces more spacing between the slots we try. This reduces the clustering effect seen in linear probing. Clustering can still occur, because Quadratic Probing is not immune to it. Quadratic Probing preserves some locality of reference and hence gives good cache performance, though lower than that of Linear Probing.
Example
c1 = c2 = 1/2, so the probe sequence is H(k, i) = (H(k) + c1*i + c2*i²) mod m
Take m = 11
Let the keys 46, 122, 222, 441 appear in this order
Exercise
Apply Quadratic Probing for the following Hash
Addresses
78 mod 11 = 1
89 mod 11 = 1
111 mod 11 = 1
166 mod 11 = 1
Answer
78 mod 11 = 1: 1
89 mod 11 = 1: (1 + 0.5*1 + 0.5*1²) mod 11 = 2
111 mod 11 = 1: (1 + 0.5*2 + 0.5*2²) mod 11 = 4
166 mod 11 = 1: (1 + 0.5*3 + 0.5*3²) mod 11 = 7
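The quadratic probing exercise can be reproduced with a short sketch. It assumes c1 = c2 = 1/2 as in the answer above, H(k) = k mod m, and a Python list with `None` for a free cell; the function name is made up for illustration.

```python
def quadratic_probe_insert(table, key):
    """Probe H(k, i) = (H(k) + i/2 + i*i/2) mod m for i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        # i + i*i is always even, so the integer division is exact.
        pos = (home + (i + i * i) // 2) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("no free cell found along the probe sequence")


table = [None] * 11
positions = [quadratic_probe_insert(table, k) for k in (78, 89, 111, 166)]
print(positions)
```

Each key tries i = 0 first and increases i only while the probed cell is occupied, so later keys with the same home position travel further along the same sequence.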
Notes
If two keys have the same initial probe position, then their probe sequences are the same, since H(k1, 0) = H(k2, 0) implies H(k1, i) = H(k2, i). This property leads to a milder form of clustering called secondary clustering.
Clustering
hashed keys share substantial segments of a probe sequence, because more than one key hashed to the same home position has the same probe sequence. And the hash addresses that collide at the home address, say b, extend the cluster.
Primary Clustering
As we have seen, once a block of a few contiguous occupied positions emerges in the Hash Table, it becomes a target for subsequent collisions. As clusters grow, they also merge to form larger clusters. Primary clustering means that elements that hash to different cells probe the same alternative cells. Clustering is reduced only if the hash addresses home at different positions.
Example
Suppose we have 10 Hash Codes with value 1 and 5 Hash Codes with value 2. All these codes will cluster around positions 1 and 2.
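The example can be simulated under linear probing. The table size of 23 is an arbitrary assumption made for this sketch, and only the home addresses are tracked, not actual keys.

```python
def insert_at_home(occupied, home):
    """Linear probing from a given home address; returns the cell used."""
    m = len(occupied)
    for i in range(m):
        pos = (home + i) % m
        if not occupied[pos]:
            occupied[pos] = True
            return pos
    raise RuntimeError("hash table is full")


occupied = [False] * 23      # table size 23 is an assumption
homes = [1] * 10 + [2] * 5   # 10 hash codes of value 1, 5 of value 2
used = [insert_at_home(occupied, h) for h in homes]
print(used)
```

The cells used form one contiguous run starting at position 1: the codes homed at 2 have to walk through the run built by the codes homed at 1, which is the merging of clusters described above.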
same home hash address will lead to the same probe sequence. In Quadratic probing also, the probe sequence is a function of the home position and not of the original key value.
Double Hashing
To avoid secondary clustering, we need a probe sequence that makes use of the original key value in its decision process. This is achieved using Double Hashing, so called because the Hashing is done in two stages. We use a second hash function as well, so as to reduce collisions.
Double Hashing
Let H1(k) and H2(k) be two hash functions for the
Notes
The functions H1(k) and H2(k) are auxiliary hash
functions, which are selected like any hash function: so that the Keys are distributed in a uniform and random manner.
Example 1
We let H1(k) = k mod m and H2(k) = 1 + (k mod m'), where m' is slightly less than m, say, m - 1 or m - 2. For example m = 11 and m' = 9.
Example 2
First Use Mid Square Method and then use the
Modulo Division
Double hashing
Double hashing can be used to avoid both primary and secondary clustering. H2(k) must be chosen with care: m and H2(k) must be relatively prime, and this can be ensured by making m a prime number. If m is a power of two, then choose an H2(k) that is always odd.
Example
Generate Hash Codes using Double Hashing for the following: 2227, 3545, 4537, 8981, 7857, 3433, 6965
Use the Division Method with H1(k) = k mod m and H2(k) = 1 + (k mod m')
We have H(k, i) = {H1(k) + i * H2(k)} mod m
Use m = 11 and m' = 9
Steps
First generate Hash codes with H1(k) = k mod m using m = 11. Then apply the second hash function wherever collisions occur, taking m' = 9.
Step 1-Answer
2227 mod 11 = 5
3545 mod 11 = 3
4537 mod 11 = 5
8981 mod 11 = 5
7857 mod 11 = 3
3433 mod 11 = 1
6965 mod 11 = 2
Step 2
For resolving collisions, use the second Hash Function: two times for Hash Code 5 and once for Hash Code 3, and see how the mapping evolves.
Answer-Step 2
2227 mod 11 = 5
3545 mod 11 = 3
4537 mod 11 = 5    4537 mod 9 + 1 = 2
8981 mod 11 = 5    8981 mod 9 + 1 = 9
7857 mod 11 = 3    7857 mod 9 + 1 = 1
3433 mod 11 = 1    3433 mod 9 + 1 = 5
6965 mod 11 = 2    6965 mod 9 + 1 = 9
Step 3
2227   5
3545   3
4537   (5 + 1*2) mod 11 = 7
8981   (5 + 2*9) mod 11 = 1   (i = 1 gives (5 + 1*9) mod 11 = 3, which is already occupied by 3545)
7857   (3 + 1*1) mod 11 = 4
3433   (1 + 1*5) mod 11 = 6   (home position 1 is already occupied by 8981)
6965   2
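The three steps can be combined into one insert routine. This is a sketch under the example's assumptions: H1(k) = k mod m, H2(k) = 1 + (k mod m'), m = 11, m' = 9, keys inserted in the given order, and a Python list with `None` marking a free cell. The function name is made up for illustration.

```python
def double_hash_insert(table, key, m_prime):
    """Probe H(k, i) = (H1(k) + i * H2(k)) mod m for i = 0, 1, 2, ..."""
    m = len(table)
    h1 = key % m
    h2 = 1 + (key % m_prime)  # never zero, so the probe always moves
    for i in range(m):
        pos = (h1 + i * h2) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("no free cell found along the probe sequence")


table = [None] * 11
keys = (2227, 3545, 4537, 8981, 7857, 3433, 6965)
positions = [double_hash_insert(table, k, 9) for k in keys]
print(positions)
```

Since m = 11 is prime, every H2 value between 1 and 9 is relatively prime to m, so each probe sequence can reach all 11 cells.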
Sparse Matrices