
Search vs. Hashing
• Search tree methods: key comparisons
– Time complexity: O(n) or O(log n)
• Hashing methods: hash functions
– Expected time: O(1)
• An effective way to reduce the number of comparisons
• Types
– Static hashing – the hash function maps search-key values to a fixed set of locations.
– Dynamic hashing – the hash table can grow to handle more items; the associated hash function must change as the table grows.
Hashing
• Hashing is another approach to storing and searching for values.
• A search scheme that uses a function to compute positions directly.
• The function that converts a key into an array position is called the hash function.
• An effective way to reduce the number of comparisons.
• The idea is to compute the direct address of the location where the record is likely to be stored.
Basic terminologies
• Hash table
 A data structure used for storing & retrieving data very quickly.
 Insertion of data into the hash table is based on the key value.
 Every entry in the hash table is based on a key value.
• Hash function
 A function which is used to place the data in the hash table.
 e.g., h(key) = key % 1000
• Hash key
 The integer returned by the hash function is called the hash key.
Types of hash function
Used to place the record in the hash table
• Division method
 Depends upon the remainder of division.
 h(key) = key % table size
• Mid square
 The key is squared & the middle part of the result is used as the index.
• Multiplicative hash function
 The key is multiplied by some constant value. The formula for computing the hash key is h(key) = floor(p * (fractional part of key * A)), where p is an integer constant and A is a constant real number.
• Digit folding
 The key is divided into separate parts & these parts are combined with some simple operation to produce the hash key.
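The four families above can be sketched in Python. The particular constants (table size 10, p = 1000, the mid-square slice, the 2-digit folding groups) are illustrative assumptions, not values prescribed by the notes:

```python
def division_hash(key, table_size):
    # Division method: remainder of division
    return key % table_size

def mid_square_hash(key, table_size):
    # Mid-square: square the key, take the middle digits of the result
    sq = str(key * key)
    mid = len(sq) // 2
    return int(sq[max(0, mid - 1):mid + 1]) % table_size

def multiplicative_hash(key, p=1000, A=0.6180339887):
    # h(key) = floor(p * (fractional part of key * A))
    return int(p * ((key * A) % 1.0))

def digit_folding_hash(key, table_size):
    # Fold the key into 2-digit groups and sum them
    digits = str(key)
    total = sum(int(digits[i:i + 2]) for i in range(0, len(digits), 2))
    return total % table_size
```

For example, division_hash(123, 10) gives 3, and digit_folding_hash(123456, 100) folds 12 + 34 + 56 = 102 into 2.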
Collision
• A collision occurs when the hash function returns the same address (hash key) for more than one record.
• Frequent collisions indicate a poorly designed hash function.
• Choosing a hash function
A good hash function should satisfy two criteria:
1. It should be quick to compute
2. It should minimize the number of collisions
• The load factor of a hash table is the ratio of the number of keys in the table to the size of the hash table.
• If a collision occurs, it must be handled by applying some technique – a collision handling technique.
Collision handling techniques – Separate chaining
• Array-of-linked-lists implementation
• Keep a list of all elements that hash to the same value.
• Create an array of linked lists so that an item can be inserted into the corresponding linked list when a collision occurs.
• Disadvantages
Parts of the array might never be used.
Constructing new chain nodes is relatively expensive.
Separate Chaining (cont’d)
• Example: Load the keys 23, 13, 21, 14, 7, 8, and 15 , in this order, in a hash table of
size 7 using separate chaining with the hash function: h(key) = key % 7
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 collision
h(7) = 7 % 7 = 0 collision
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 collision
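The example above can be reproduced with a minimal separate-chaining sketch (the function name is mine; Python lists stand in for the linked lists):

```python
def build_chained_table(keys, size=7):
    # One chain (list) per slot; h(key) = key % size
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)  # on collision, append to the chain
    return table

table = build_chained_table([23, 13, 21, 14, 7, 8, 15])
# chain at index 0: [21, 14, 7]; at index 1: [8, 15]
```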

Separate Chaining with String Keys (cont’d)
• Use the hash function hash to load the following commodity items into a hash table of size 13 using separate chaining:
onion 1 10.0
tomato 1 8.50
cabbage 3 3.50
carrot 1 5.50
okra 1 6.50
mellon 2 10.0
potato 2 7.50
Banana 3 4.00
olive 2 15.0
salt 2 2.50
cucumber 3 4.50
mushroom 3 5.50
orange 2 3.00
• Solution:

hash(onion) = (111 + 110 + 105 + 111 + 110) % 13 = 547 % 13 = 1
hash(salt) = (115 + 97 + 108 + 116) % 13 = 436 % 13 = 7
hash(orange) = (111 + 114 + 97 + 110 + 103 + 101) % 13 = 636 % 13 = 12
(the remaining keys are hashed the same way)
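The worked values above follow from summing the ASCII codes of the characters and reducing modulo the table size, which can be checked directly (the function name is mine):

```python
def string_hash(key, size=13):
    # Sum the ASCII codes of the characters, then mod the table size
    return sum(ord(ch) for ch in key) % size

# string_hash("onion") → 1, string_hash("salt") → 7, string_hash("orange") → 12
```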
Separate Chaining with String Keys (cont’d)

Item      Qty  Price  h(key)
onion      1   10.0     1
tomato     1    8.50   10
cabbage    3    3.50    4
carrot     1    5.50    1
okra       1    6.50    0
mellon     2   10.0    10
potato     2    7.50    0
banana     3    4.00   11
olive      2   15.0    10
salt       2    2.50    7
cucumber   3    4.50    9
mushroom   3    5.50    6
orange     2    3.00   12

Resulting chains (index: chain):
0: okra → potato
1: onion → carrot
4: cabbage
6: mushroom
7: salt
9: cucumber
10: tomato → mellon → olive
11: banana
12: orange
• Open hashing has the disadvantage of requiring pointers.
• This tends to slow the algorithm down a bit because of the time required to allocate new cells, and it also essentially requires the implementation of a second data structure.
• Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists.
Contd…
• Three common collision resolution strategies
in open addressing are

– Linear Probing
– Quadratic Probing
– Double Hashing
Collision handling techniques – Open addressing
• Array-based implementation
• If a collision occurs, search the array in some systematic way for an empty cell and insert the new item there.
• Example: the result of inserting keys {89, 18, 49, 58, 69} into a closed table of size 10 using the same hash function as before and the collision resolution strategy f(i) = i.

Linear Probing
hash(89, 10) = 9
hash(18, 10) = 8
hash(49, 10) = 9  collision
hash(58, 10) = 8  collision
hash(69, 10) = 9  collision

Probe sequence on collision: H + 1, H + 2, H + 3, … , H + i (mod table size)

Table contents after each insertion:

index | insert 89 | insert 18 | insert 49 | insert 58 | insert 69
  0   |           |           |    49     |    49     |    49
  1   |           |           |           |    58     |    58
  2   |           |           |           |           |    69
  8   |           |    18     |    18     |    18     |    18
  9   |    89     |    89     |    89     |    89     |    89
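A minimal linear-probing sketch for the example above (keys {89, 18, 49, 58, 69} into a size-10 table; the function name is mine):

```python
def linear_probe_insert(table, key):
    size = len(table)
    h = key % size
    for i in range(size):            # probe H, H+1, H+2, ... (mod size)
        slot = (h + i) % size
        if table[slot] is None:      # first empty cell wins
            table[slot] = key
            return slot
    raise RuntimeError("table is full")

table = [None] * 10
for key in [89, 18, 49, 58, 69]:
    linear_probe_insert(table, key)
# 89 → 9, 18 → 8, 49 → 0, 58 → 1, 69 → 2
```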
Problem with Linear Probing
• When several different keys hash to the same location, the result is a small cluster of elements, one after another.
• As the table approaches its capacity, these clusters tend to merge into larger and larger clusters.
• Quadratic probing is the most common technique used to avoid this clustering.
Quadratic Probing
hash(89, 10) = 9
hash(18, 10) = 8
hash(49, 10) = 9  collision
hash(58, 10) = 8  collision
hash(69, 10) = 9  collision

Probe sequence on collision: H + 1*1, H + 2*2, H + 3*3, … , H + i*i (mod table size)

Table contents after each insertion:

index | insert 89 | insert 18 | insert 49 | insert 58 | insert 69
  0   |           |           |    49     |    49     |    49
  2   |           |           |           |    58     |    58
  3   |           |           |           |           |    69
  8   |           |    18     |    18     |    18     |    18
  9   |    89     |    89     |    89     |    89     |    89
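The same keys under quadratic probing can be sketched by changing only the probe step (the function name is mine):

```python
def quadratic_probe_insert(table, key):
    size = len(table)
    h = key % size
    for i in range(size):            # probe H, H+1*1, H+2*2, ... (mod size)
        slot = (h + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found")

table = [None] * 10
for key in [89, 18, 49, 58, 69]:
    quadratic_probe_insert(table, key)
# 89 → 9, 18 → 8, 49 → 0, 58 → 2, 69 → 3
```

Note the difference from linear probing: 58 lands in slot 2 (8 + 2*2 = 12 mod 10) and 69 in slot 3 (9 + 2*2 = 13 mod 10), so the cluster around slots 0–2 does not simply grow one cell at a time.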
Linear and Quadratic probing problems
• In linear probing and quadratic probing, a collision is handled by probing the array for an unused position.
• Each array component can hold just one entry. When the array is full, no more items can be added to the table.
• A better approach is to use a different collision resolution method called DOUBLE HASHING.
Double Hashing
• For double hashing, one popular choice is f(i) = i * hash2(x).
• This formula says that we apply a second hash function to x and probe at a distance hash2(x), 2*hash2(x), …, and so on.
• A poor choice of hash2(x) would be disastrous. For instance, the obvious choice hash2(x) = x mod 9 would not help if 99 were inserted into the input in the previous examples.
• Thus, the function must never evaluate to zero.

Purpose – to overcome the disadvantage of clustering.
A second hash function gives a fixed increment for the “probe” sequence:
hash2(x) = R - (x mod R)
R: a prime smaller than the table size.
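A sketch of double hashing for the running example, using hash2(x) = R - (x mod R) with R = 7 (the function name is mine):

```python
def double_hash_insert(table, key, R=7):
    size = len(table)
    h = key % size
    step = R - (key % R)             # hash2(x) = R - (x mod R); never zero
    for i in range(size):            # probe H, H+step, H+2*step, ...
        slot = (h + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found")

table = [None] * 10
for key in [89, 18, 49, 58, 69]:
    double_hash_insert(table, key)
# 89 → 9, 18 → 8, 49 → 6 (step 7), 58 → 3 (step 5), 69 → 0 (step 1)
```

Because each colliding key gets its own step size, keys that hash to the same slot follow different probe sequences, which is exactly what defeats the clustering seen with linear probing.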
Double Hashing
• f(i) = i*hash2(x)
• E.g.: hash2(x) = 7 – (x % 7)

What if hash2(x) == 0 for some x?


Do by yourself
Given the input {4371, 1323, 6173, 4199, 4344, 9679, 1989} and a hash function h(x) = x mod 10, show the resulting
a. open hash table
b. closed hash table using linear probing
c. closed hash table using quadratic probing
d. closed hash table with second hash function h2(x) = 7 - (x mod 7)
Priority Queues (Heap)
Motivation
• Development of a data structure which allows efficient inserts and efficient deletes of the minimum value (min-heap) or maximum value (max-heap)
Priority queue
• A stack is first in, last out
• A queue is first in, first out
• A priority queue is least-first-out
– The “smallest” element is the first one removed
• (You could also define a largest-first-out priority queue)
– The definition of “smallest” is up to the programmer
(for example, you might define it by implementing
Comparator or Comparable)
– If there are several “smallest” elements, the
implementer must decide which to remove first
• Remove any “smallest” element (don’t care which)
• Remove the first one added
Priority Queue ADT
1. PQueue data: collection of data with priority
2. PQueue operations
– insert
– deleteMin
3. PQueue property: for two elements in the queue, x and y, if x has a lower priority value than y, x will be deleted before y
Implementation
• A heap must be a complete tree
• all leaves are on the lowest two levels
• nodes are added on the lowest level, from left to right
• nodes are removed from the lowest level, from right to left
Inserting a Value
• Example: insert 3 into the heap below.
Heap, level by level: 4; 5, 12; 26, 25, 14, 15; 29, 45, 35, 31, 21
Array (indices 1..12): 4 5 12 26 25 14 15 29 45 35 31 21
The new value goes in the next free slot, index 13 (the leftmost open position on the lowest level), to keep the tree complete; currentsize becomes 13.
Inserting a Value (percolate up)
• Save the new value in a temporary location: tmp ← 3.
• The hole starts at index 13; its parent (index 6) holds 14. Since 14 > 3, copy 14 down into the hole.
• The hole moves to index 6; its parent (index 3) holds 12. Since 12 > 3, copy 12 down.
• The hole moves to index 3; its parent (the root) holds 4. Since 4 > 3, copy 4 down.
• The hole reaches the root: insert 3 there.
Resulting array (indices 1..13): 3 5 4 26 25 12 15 29 45 35 31 21 14
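This percolate-up walkthrough can be checked with Python's heapq, whose heappush performs exactly this operation (note heapq uses a 0-based array, so the indices shift by one):

```python
import heapq

# The heap before the insert, as a 0-based array
heap = [4, 5, 12, 26, 25, 14, 15, 29, 45, 35, 31, 21]
heapq.heappush(heap, 3)
# 3 bubbles up past 14, 12, and 4 to become the new root
```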
Binary Heap Properties
1. Structure Property
2. Ordering Property

Some Definitions:
A perfect binary tree – a binary tree with all leaf nodes at the same depth; all internal nodes have 2 children.
For height h: 2^(h+1) – 1 nodes, 2^h – 1 non-leaves, 2^h leaves.
Example (h = 3), level by level: 11; 5, 21; 2, 9, 16, 25; 1, 3, 7, 10, 13, 19, 22, 30
Heap Structure Property
• A binary heap is a complete binary tree.
Complete binary tree – binary tree that is
completely filled, with the possible exception of
the bottom level, which is filled left to right.
Representing Complete Binary Trees in an Array
Tree, level by level: A; B, C; D, E, F, G; H, I, J, K, L
From node i:
left child: 2i
right child: 2i + 1
parent: i / 2 (integer division)

Implicit (array) implementation, root at index 1:
index: 0 1 2 3 4 5 6 7 8 9 10 11 12
value: _ A B C D E F G H I J  K  L
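The index arithmetic above can be sketched directly (1-based, as on the slide, with index 0 left unused):

```python
# Implicit array layout for the example tree; index 0 is unused
tree = [None, 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']

def left(i):   return 2 * i        # left child of node i
def right(i):  return 2 * i + 1    # right child of node i
def parent(i): return i // 2       # parent of node i (integer division)

# Node B sits at index 2; its children are D and E, its own parent is A
```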
Heap Order Property
Heap order property: for every non-root node X, the value in the parent of X is less than (or equal to) the value in X.

A heap, level by level: 10; 20, 80; 40, 60, 85, 99; 50, 700
Not a heap: 10; 20, 80; 30, 15 – the node 15 is smaller than its parent 20.
Heap Operations
• findMin: return the root.
• insert(val): percolate up.
• deleteMin: percolate down.
Example heap, level by level: 10; 20, 80; 40, 60, 85, 99; 50, 700, 65
Heap – Insert(val)
Basic Idea:
1. Put val at the “next” leaf position
2. Percolate up by repeatedly exchanging the node with its parent until no longer needed
Insert: percolate up
Before inserting 15, level by level: 10; 20, 80; 40, 60, 85, 99; 50, 700, 65
After inserting 15: 10; 15, 80; 40, 20, 85, 99; 50, 700, 65, 60
15 percolates up past 60 and 20; 10 stays at the root.
Insert Code (optimized)
void insert(Object o) {
  assert(!isFull());
  size++;
  newPos = percolateUp(size, o);
  Heap[newPos] = o;
}

int percolateUp(int hole, Object val) {
  while (hole > 1 && val < Heap[hole/2]) {
    Heap[hole] = Heap[hole/2];
    hole /= 2;
  }
  return hole;
}
Heap – Deletemin

Basic Idea:
1. Remove root (that is always the min!)
2. Put “last” leaf node at root
3. Find smallest child of node
4. Swap node with its smallest child if needed.
5. Repeat steps 3 & 4 until no swaps needed.
DeleteMin: percolate down
Before, level by level: 10; 20, 15; 40, 60, 85, 99; 50, 700, 65
Remove the root 10, move the last leaf 65 to the root, and percolate down: 65 swaps with its smaller child 15, then stops (its children 85 and 99 are larger).
After: 15; 20, 65; 40, 60, 85, 99; 50, 700
DeleteMin Code (Optimized)
Object deleteMin() {
  assert(!isEmpty());
  returnVal = Heap[1];
  size--;
  newPos = percolateDown(1, Heap[size+1]);
  Heap[newPos] = Heap[size+1];
  return returnVal;
}

int percolateDown(int hole, Object val) {
  while (2*hole <= size) {
    left = 2*hole;
    right = left + 1;
    if (right <= size && Heap[right] < Heap[left])
      target = right;
    else
      target = left;
    if (Heap[target] < val) {
      Heap[hole] = Heap[target];
      hole = target;
    } else
      break;
  }
  return hole;
}
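A runnable Python rendering of the insert/deleteMin pseudocode above, keeping the 1-based array layout (the class and method names are mine):

```python
class MinHeap:
    def __init__(self):
        self.heap = [None]          # index 0 unused; root at index 1

    def insert(self, val):
        # Percolate up: open a hole at the next leaf, copy parents down
        self.heap.append(val)
        hole = len(self.heap) - 1
        while hole > 1 and val < self.heap[hole // 2]:
            self.heap[hole] = self.heap[hole // 2]
            hole //= 2
        self.heap[hole] = val

    def delete_min(self):
        minimum = self.heap[1]      # the root is always the min
        last = self.heap.pop()      # the "last" leaf fills the hole
        if len(self.heap) > 1:
            self._percolate_down(1, last)
        return minimum

    def _percolate_down(self, hole, val):
        size = len(self.heap) - 1
        while 2 * hole <= size:
            target = 2 * hole       # pick the smaller child
            if target + 1 <= size and self.heap[target + 1] < self.heap[target]:
                target += 1
            if self.heap[target] < val:
                self.heap[hole] = self.heap[target]
                hole = target
            else:
                break
        self.heap[hole] = val
```

Inserting the exercise keys 25, 57, 48, 38, 10, 91, 84, 33 and calling delete_min repeatedly returns them in ascending order, starting with 10.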
Exercise
25, 57, 48, 38, 10, 91, 84, 33
Rehashing
• If the table gets too full, the running time for the operations will start taking too long, and inserts might fail for closed hashing with quadratic resolution.
• This can happen if there are too many deletions intermixed with insertions.
Contd..
• A solution, then, is rehashing: build another table that is about twice as big (with an associated new hash function), scan down the entire original hash table, compute the new hash value for each (non-deleted) element, and insert it into the new table.
Closed hash table after rehashing
Contd…
• Rehashing can be implemented in several ways with quadratic probing.
 One alternative is to rehash as soon as the table is half full.
 The other extreme is to rehash only when an insertion fails.
 A third, middle-of-the-road strategy is to rehash when the table reaches a certain load factor.
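A sketch of the first strategy (rehash as soon as the table is more than half full) for a small open-addressing table. Linear probing and the doubling rule are simplifying assumptions; a real implementation would grow to a prime size near 2n:

```python
LOAD_LIMIT = 0.5                              # rehash past half full

def probe_insert(table, key):
    # Linear probing, kept minimal for the sketch
    size = len(table)
    for i in range(size):
        slot = (key % size + i) % size
        if table[slot] is None:
            table[slot] = key
            return

def insert(table, key):
    probe_insert(table, key)
    used = sum(slot is not None for slot in table)
    if used / len(table) > LOAD_LIMIT:        # table too full: rehash
        new_table = [None] * (2 * len(table))
        for item in table:                    # scan the old table and
            if item is not None:              # re-insert every element
                probe_insert(new_table, item)
        return new_table
    return table

table = [None] * 5
for key in [6, 11, 16]:
    table = insert(table, key)
# Inserting 16 pushes the load factor to 0.6, so the table doubles to size 10
```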
