Searching
OBJECTIVE
Introduces:
Basic searching concept
Types of searching
Hash function
Collision problems
CONTENTS
Introduction
Hashing
Hash method: modulo-division
Collision
Introduction
There are many techniques for searching, but none is the best in every situation, because the choice of approach depends on:
Speed and space: choosing a fast technique that wastes space is not a good idea (a slower technique with optimized space can be better!)
Static and dynamic tables: the complexity of the table must be taken into account.
Table size: the time (short or long) taken to search for data depends on the size of the table.
Introduction
Searching approaches that can be used:
Binary search: sorted arrays and binary search trees
AVL Trees
B Trees
Hashing
Introduction
After a search operation has been executed, four operations can be done:
Retrieve
Update
Delete
Insert
Hashing
At the beginning of this chapter, searching was done by comparing the data keys one by one.
Binary search gives good performance for searching data only if all the keys are in sorted sequence. But sorting the keys takes time!
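A minimal Python sketch of binary search (binary_search is an illustrative name, not from the slides), showing why the keys must be sorted first: each comparison discards half of the remaining range, which is only valid on sorted data.

```python
def binary_search(sorted_keys, target):
    """Return the index of target in sorted_keys, or -1 if it is absent.

    Correct only when sorted_keys is in ascending order.
    """
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return mid
        if sorted_keys[mid] < target:
            low = mid + 1    # target can only be in the right half
        else:
            high = mid - 1   # target can only be in the left half
    return -1


keys = sorted([72, 18, 43, 36, 6])   # sorting first is the extra cost
print(keys)                          # [6, 18, 36, 43, 72]
print(binary_search(keys, 43))       # 3
print(binary_search(keys, 50))       # -1
```

Hashing, described next, avoids the sorting step by computing the location directly from the key.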
Hashing is a search technique that does not require the keys to be in sorted sequence; it searches by using the address index of the key.
In this technique, storing data also uses the hashing concept, that is, the hash index or address index (the address of a particular piece of information). This requires a hash function.
Hashing
A hash search is a search in which the key, through an algorithmic function (the hash function), determines the location of the data in a defined table.
The table (hash table) is the place where the data is stored; each entry keeps a unique key value related to the data that has been entered into the table.
Therefore, search is an operation that uses a defined key to access data or information in the table.
Hash Table
[Figure: a key is passed to the hash function, which maps it to one index (0 to H-1) of the hash table, where the item is stored.]
Hashing Concept
Accessing any data or information in the hash table uses the hash function to get the hash index.
Hash Function: Modulo-division
The selection of the hash function is very important because it defines how keys are addressed into the hash table.
The keys should be spread across the hash table as evenly as possible, so that the use of the same address location (collision) is minimized.
Modulo-division
Modulo-division is a hashing technique that uses division to find the address: it divides the key by the array/table size and uses the remainder as the address.
Address = key MOD listsize
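The formula Address = key MOD listsize translates directly into code. A minimal Python sketch (mod_hash and TABLE_SIZE are illustrative names), using the keys of the 8-slot example:

```python
def mod_hash(key, table_size):
    """Modulo-division hash: the remainder of key / table_size is the address."""
    return key % table_size


TABLE_SIZE = 8
table = [None] * TABLE_SIZE              # 8 empty slots, indexed 0..7
for key in [36, 18, 72, 43, 6]:
    table[mod_hash(key, TABLE_SIZE)] = key   # these keys never collide

print(table)   # [72, None, 18, 43, 36, None, 6, None]
```

Because every address is a remainder of division by TABLE_SIZE, it always lies in the range 0 to TABLE_SIZE - 1.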
Hash Function
Assume a table with 8 slots, and hash key = key % table size:
4 = 36 % 8
2 = 18 % 8
0 = 72 % 8
3 = 43 % 8
6 = 6 % 8
Resulting table:
[0] 72
[1]
[2] 18
[3] 43
[4] 36
[5]
[6] 6
[7]
Example
hash key = id % 13
Hash key   ID       Name
6          985926   Ziyad
10         970876   Musab
8          980962   Radha
11         986074   Adel
5          970728   Adnan
2          994593   Yousuf
1          996321   Husain
Resulting 13-slot table (slots 0 to 12):
[0]
[1] Husain
[2] Yousuf
[3]
[4]
[5] Adnan
[6] Ziyad
[7]
[8] Radha
[9]
[10] Musab
[11] Adel
[12]
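Hash function for strings: the characters of the key are first converted into their numeric (ASCII) codes, key[i].
key = "ali", KeySize = 3
i:        0    1    2
key[i]:   a    l    i
code:     97   108  105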
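The slide's exact string formula is not recoverable, so the following Python sketch assumes the simplest choice: add up the character codes key[i] for i = 0 to KeySize - 1, then apply modulo-division. The function name string_to_number is hypothetical, and this is only one of many possible string hash functions.

```python
def string_to_number(key):
    """Convert a string key to a number by summing its character codes.

    Assumed formula (not taken from the slides). Anagrams such as
    "ali" and "lia" get the same number, so they would always collide.
    """
    total = 0
    for i in range(len(key)):      # i = 0 .. KeySize - 1
        total += ord(key[i])       # ord("a") == 97, ord("l") == 108, ...
    return total


value = string_to_number("ali")
print(value)        # 310  (97 + 108 + 105)
print(value % 13)   # 11   (modulo-division into a 13-slot table)
```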
Exercise
Identify the hash keys for an 11-item hash table resulting from hashing the keys 29, 93, 31, 181, 51, 193, 27, 83, 45, 37, 15 using modulo-division as the hash function.
Collision
Collision is a situation where two or more keys are mapped to the same address location (this normally happens when the user tries to enter new data into the table).
Assume that a number of keys must be inserted into table T (a hash table). The keys are 10, 02, 26, and 19. Table T has only 7 entries (0 to 6).
Hash function used: H(K) = K mod 7
Address for key “10” => 10 mod 7 = 3
Address for key “02” => 02 mod 7 = 2
Address for key “26” => 26 mod 7 = 5
Address for key “19” => 19 mod 7 = 5
T Table:
[0]
[1]
[2] 02
[3] 10
[4]
[5] 26, 19 (collision)
[6]
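A small Python sketch (find_collisions is an illustrative name) that groups the keys by home address H(K) = K mod 7 and reports every address that receives more than one key. Python 3 does not accept the literal 02, so the key is written as 2.

```python
def find_collisions(keys, table_size):
    """Group keys by home address H(K) = K mod table_size and report clashes."""
    by_address = {}
    for key in keys:
        by_address.setdefault(key % table_size, []).append(key)
    # An address that received more than one key is a collision.
    return {addr: ks for addr, ks in by_address.items() if len(ks) > 1}


print(find_collisions([10, 2, 26, 19], 7))   # {5: [26, 19]}
```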
Searching a table using the hashing concept should overcome these two problems:
Number of collisions: hashing should give a minimal number of collisions.
Collision problem: hashing should resolve collisions when they do occur.
Policies for overcoming the collision problem:
Open Hashing:
Chaining Policy
Open Addressing:
Linear Probing
Double Hashing (Rehash)
Example:
Given a hash table with 5 locations and the hash function H(i) = i % 5, show how this function works when the entries 10, 11, 18, 19, and 23 are inserted in sequence.
Address for key “10” => 10 mod 5 = 0
Address for key “11” => 11 mod 5 = 1
Address for key “18” => 18 mod 5 = 3
Address for key “19” => 19 mod 5 = 4
Address for key “23” => 23 mod 5 = 3 (collision with 18)
Chaining Policy
Put the collided key into the same address by extending the location with a linked list.
[0] 10
[1] 11
[2]
[3] 18 -> 23
[4] 19
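Insert 53 into an 11-slot chained table:
53 mod 11 = 9
Before inserting 53:
[0]
[1] 23 -> 1 -> 56
[2] 24
[3] 36 -> 14
[4]
[5] 16
[6] 17
[7] 7 -> 29
[8]
[9] 31 -> 20 -> 42
[10]
After inserting 53 (at the front of chain 9):
[9] 53 -> 31 -> 20 -> 42
(all other slots unchanged)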
Separate Chaining
The idea is to keep a list of all elements that hash
to the same value.
The array elements are pointers to the first nodes of the
lists.
A new item is inserted at the front of the list.
Advantages:
Better space utilization for large items.
Simple collision handling: search the linked list.
Overflow: we can store more items than the hash table size.
Deletion is quick and easy: delete the node from the linked list.
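A minimal Python sketch of separate chaining with front insertion (Node and ChainedHashTable are illustrative names). It rebuilds the "Insert 53" example; since the slides do not say in which order the existing keys arrived, the insertion order below is an assumption chosen so that the chains match the "before" table.

```python
class Node:
    """One node of a singly linked chain."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    """Separate chaining: each slot points to the first node of a linked list."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size          # None means an empty chain

    def insert(self, key):
        index = key % self.size             # modulo-division hash
        # New item goes to the front of the chain: no traversal needed.
        self.slots[index] = Node(key, self.slots[index])

    def search(self, key):
        node = self.slots[key % self.size]
        while node is not None:             # walk only this slot's chain
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, index):
        """Return the keys in slot `index` from front to back."""
        keys, node = [], self.slots[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(11)
# Assumed order: front insertion reverses each chain, so each chain's
# last key is inserted first.
for key in [56, 1, 23, 24, 14, 36, 16, 17, 29, 7, 42, 20, 31]:
    table.insert(key)
table.insert(53)                             # 53 mod 11 = 9
print(table.chain(9))                        # [53, 31, 20, 42]
print(table.chain(1))                        # [23, 1, 56]
print(table.search(20), table.search(9))     # True False
```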
Using “Linear-Probe” policy
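Before inserting 23:
[0] 10
[1] 11
[2]
[3] 18
[4] 19
Insert 23: 23 mod 5 = 3 collides with 18, so find another location by checking the following slots one by one, wrapping around to [0] after [4]: [4], [0], and [1] are occupied, and [2] is empty.
After:
[0] 10
[1] 11
[2] 23 (23 inserted into location [2])
[3] 18
[4] 19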
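A minimal Python sketch of linear probing insertion (linear_probe_insert is an illustrative name) that reproduces the 5-slot example: on a collision it tries the next slot, wrapping around from the last slot to [0].

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing and return the slot used.

    Start at key % size; on a collision try the next slot, wrapping
    around to 0 after the last slot, until an empty slot is found.
    """
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 5
for key in [10, 11, 18, 19]:
    linear_probe_insert(table, key)
print(linear_probe_insert(table, 23))   # 2   (probes 3, 4, 0, 1, then 2)
print(table)                            # [10, 11, 23, 18, 19]
```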
Exercise
Based on the previous exercise, collisions occur when hashing the keys 29, 93, 31, 181, 51, 193, 27, 83, 45, 37, 15. Show how you would handle the collisions using linear probing.
Double Hashing (Rehash)
Keys involved in a collision are hashed again, repeatedly, until an empty location is found.
Double hashing applies a second hash function to the key when a collision occurs.
The result of the second hash function is the number of positions to move from the point of collision.
There are some requirements for the second function:
it must never evaluate to zero
it must make sure that all cells can be probed (this holds when the step size and the table size have no common factor, e.g. when the table size is prime)
A popular second hash function is:
Hash2(key) = R – (key mod R)
Notes: R is a prime number smaller than the size of the table (e.g. R = 7 for a table of size 10).
Example – Double Hashing
Table size = 10 elements; [8] 18 and [9] 89 are already stored
Insert 49:
HashKey(49) = 49 % 10 = 9 a collision!
Hash2(49) = (7 – (49 % 7))
= (7 – (0))
= 7 positions from [9]
(9 + 7) % 10 = 6, so 49 is stored in [6]
Example – Double Hashing
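Insert Keys: 58, 69
HashKey(58) = 58 % 10 = 8 a collision!
Hash2(58) = (7 – (58 % 7))
= (7 – (2))
= 5 positions from [8], and (8 + 5) % 10 = 3, so 58 is stored in [3]
HashKey(69) = 69 % 10 = 9 a collision!
Hash2(69) = (7 – (69 % 7))
= (7 – (6))
= 1 position from [9], and (9 + 1) % 10 = 0, so 69 is stored in [0]
Resulting table:
[0] 69
[1]
[2]
[3] 58
[4]
[5]
[6] 49
[7]
[8] 18
[9] 89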
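A minimal Python sketch of double hashing with Hash2(key) = R – (key mod R) and R = 7, replaying the table-size-10 example (double_hash_insert, hash1, and hash2 are illustrative names). It starts from the table shown on the slide, with 18 in [8] and 89 in [9].

```python
R = 7   # prime smaller than the table size, as in the example

def hash1(key, size):
    return key % size

def hash2(key):
    """Step size R - (key mod R): always between 1 and R, never 0."""
    return R - (key % R)

def double_hash_insert(table, key):
    """Insert key with double hashing and return the slot used."""
    size = len(table)
    index = hash1(key, size)
    step = hash2(key)
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + step) % size   # jump `step` positions from the collision
    # If step and size share a factor, some slots are never visited.
    raise RuntimeError("no empty slot found on this key's probe sequence")

table = [None] * 10
for key in [89, 18]:          # already in the table: [9] 89, [8] 18
    double_hash_insert(table, key)
for key in [49, 58, 69]:
    print(key, "->", double_hash_insert(table, key))
# 49 -> 6
# 58 -> 3
# 69 -> 0
print(table)   # [69, None, None, 58, None, None, 49, None, 18, 89]
```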
Double Hashing : Example 1
Exercise
Based on the previous exercise, collisions occur when hashing the keys 29, 93, 31, 181, 51, 193, 27, 83, 45, 37, 15. Show how you would handle the collisions using double hashing.
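Partial table after inserting 29, 93, 31, 181, 51, 193 (R = 7):
[0]
[1] 51
[2]
[3]
[4] 193
[5] 93
[6] 181
[7] 29
[8]
[9] 31
[10]
Double Hashing : Example 2
hash(99) = t = 9 (slot 9 is active, i.e. already occupied)
hash2(99) = 11 – (99 % 11) = d = 11
Note: R = 11, N = 15. Attempt to store the key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N …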
Double Hashing : Example 3
Where do you store 127 ?
R=11; TableSize=15 hash2(x) = 11 − (x % 11)
hash(127)=7=t
hash2(127)=11-(127%11)= 5=d
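Which slot is finally used depends on which cells are already occupied, and the full table contents for these examples are not given here. The following Python sketch (probe_sequence is an illustrative name) therefore only lists the candidate addresses t, (t+d)%N, (t+2d)%N, … for R = 11 and N = 15.

```python
def probe_sequence(key, table_size, r, count):
    """First `count` addresses tried for key: t, (t+d)%N, (t+2d)%N, ...

    t = key % N is the home address and d = r - (key % r) is the step.
    """
    t = key % table_size
    d = r - (key % r)
    return [(t + i * d) % table_size for i in range(count)]


print(probe_sequence(99, 15, 11, 5))    # [9, 5, 1, 12, 8]
print(probe_sequence(127, 15, 11, 5))   # [7, 12, 2, 7, 12]
```

Note that for 127 the step d = 5 shares the factor 5 with N = 15, so its probe sequence cycles through only [7], [12], and [2]. This is why the requirement that all cells can be probed matters, and why a prime table size is usually preferred.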
Some Applications of Hash Tables
Database systems: Hash tables are an important part of efficient random access because they provide a way to locate data in constant time on average.
Symbol tables: The tables used by compilers to maintain information
about symbols from a program. Compilers access information about
symbols frequently. Therefore, it is important that symbol tables be
implemented very efficiently.
Data dictionaries: Data structures that support adding, deleting, and
searching for data. Using a hash table is particularly efficient.
Network processing algorithms: Hash tables are fundamental
components of several network processing algorithms and applications,
including route lookup, packet classification, and network monitoring.
Browser Caches: Hash tables are used to implement browser caches.
Conclusions
• Hashing is a search method, used when
• sorting is not needed
• access time is the primary concern
• Factors affecting efficiency in hash table
– Choice of hash function
– Collision resolution strategy
– Load factor (number of items stored / TableSize)
• Hashing offers excellent performance for
insertion and retrieval of data.
Conclusions
The ideal hash table structure is merely an array of some fixed
size, containing the items.
A stored item needs to have a data member, called key, that will
be used in computing the index value for the item.
The key could be an integer, a string, etc.
If the keys are strings, convert them into a numeric value first.
The key should be a unique value, e.g. a name or ID that is part of a large employee structure.
The size of the array is TableSize.
The items that are stored in the hash table are indexed by values
from 0 to TableSize – 1.
A hash function is used to map each key into some number in the range 0 to TableSize – 1.
Conclusions
Insertion:
1) Compute the hash index of the key.
2) If the position is unoccupied – store the key there.
3) If it is occupied (collision) – handle the collision to get the next position, then go to (2).
Searching:
a) If match – successful search
b) If empty position – unsuccessful search
c) If occupied and no match – continue searching at the next position.
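The searching rules a) to c) can be sketched in Python for a table filled by linear probing (linear_probe_search is an illustrative name). This sketch assumes keys are never deleted, since deletion in open addressing needs extra bookkeeping not covered here.

```python
def linear_probe_search(table, key):
    """Search an open-addressing table that was filled by linear probing.

    Returns the slot index on success, or -1 on failure.
    """
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size
        if table[index] is None:      # b) empty position: unsuccessful search
            return -1
        if table[index] == key:       # a) match: successful search
            return index
        # c) occupied but no match: continue with the next position
    return -1                         # every slot checked, key absent


table = [10, 11, 23, 18, 19]   # 5-slot table after inserting 23 by linear probing
print(linear_probe_search(table, 23))   # 2
print(linear_probe_search(table, 18))   # 3
print(linear_probe_search(table, 7))    # -1
```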