
Chapter 8

Searching

OBJECTIVE
Introduces:
• Basic searching concepts
• Types of searching
• Hash functions
• Collision problems

CONTENTS
• Introduction
• Hashing
• Hash method: modulo-division
• Collision

Introduction
There are many searching techniques, but no single one is best in every case. The right choice depends on:
• Speed and space – a fast technique that wastes a lot of space is not always a good choice; a slower technique that uses space efficiently may be preferable.
• Static and dynamic tables – whether the table is fixed or changes over time affects the complexity of the technique and must be taken into account.
• Table size – the time taken to search for an item depends on the size of the table.

Introduction
Searching approaches that can be used:
• Binary search: arrays and binary search trees
• AVL trees
• B-trees
• Hashing

Introduction
After a search operation has been executed, one of four operations can be performed on the result:
• Retrieve
• Update
• Delete
• Insert

Hashing
The searching techniques seen so far work by comparing data keys one after another.
Binary search performs well only if all the keys are in sorted order, and sorting takes time.
Hashing is a search technique that does not require the keys to be sorted; it locates a key directly through its address index.
In this technique, data is also stored using the hashing concept: each item is placed at its hash index, or address index (the address for that particular item). This requires a hash function.

Hashing
A hash search is a search in which the key, through an algorithmic function (the hash function), determines the location of the data in a defined table.
The table (hash table) is where the data is stored; each entry holds a unique key value.
Each table entry has its own unique key, which is related to the data stored in that entry.
A search is therefore an operation that uses a given key to access data or information in the table.

Hash Table
[Figure: an item's key is passed to the hash function, which produces an index into the hash table; the table has slots 0, 1, 2, 3, ..., H-1.]
Hashing Concept
Every access to data or information in the hash table uses the hash function to obtain the hash index.

Hash Function: Modulo-division
• The choice of hash function is very important because it defines how keys are mapped to addresses in the hash table.
• The keys should be spread evenly across the hash table so that as few keys as possible share the same address location (collision).

Modulo-division
Modulo-division is a hashing technique that uses division to find the address: it divides the key by the array/table size and uses the remainder as the address.
Address = key MOD listsize

Assume we have the function
H(K) = K mod M, where:
K – key value
H – hash function
M – size of list / array / table
The address generated by H(K) satisfies 0 ≤ H(K) < M.
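
The modulo-division method can be sketched in a few lines of Python. This is only an illustration: the name `hash_mod` is a hypothetical choice, and the keys are assumed to be non-negative integers, so the remainder always falls between 0 and M - 1.

```python
def hash_mod(key: int, table_size: int) -> int:
    """Modulo-division hash: H(K) = K mod M."""
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    return key % table_size


# Hash a few sample keys into a table with M = 8 slots.
M = 8
for key in (36, 18, 72, 43, 6):
    print(key, "->", hash_mod(key, M))
```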

Hash Function
The fixed process that converts a key to a hash key is known as a hash function. This function is used whenever access to the table is needed.

The most common method of determining a hash key is the division method. The formula used is:

hash key = key % table size

Hash Function
Assume a table with 8 slots:
hash key = key % table size
4 = 36 % 8
2 = 18 % 8
0 = 72 % 8
3 = 43 % 8
6 = 6 % 8

Resulting table:
[0] 72
[1]
[2] 18
[3] 43
[4] 36
[5]
[6] 6
[7]

Example
hash key = id % 13

hash key   ID       Name
6          985926   Ziyad
10         970876   Musab
8          980962   Radha
11         986074   Adel
5          970728   Adnan
2          994593   Yousuf
1          996321   Husain

Resulting table:
[0]
[1] Husain
[2] Yousuf
[3]
[4]
[5] Adnan
[6] Ziyad
[7]
[8] Radha
[9]
[10] Musab
[11] Adel
[12]
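
As a rough sketch, the student table can be built with a plain Python list of 13 slots. The variable names are illustrative assumptions. Because none of these seven IDs share a remainder, the sketch stores each record directly and does no collision handling.

```python
TABLE_SIZE = 13

students = [
    (985926, "Ziyad"),
    (970876, "Musab"),
    (980962, "Radha"),
    (986074, "Adel"),
    (970728, "Adnan"),
    (994593, "Yousuf"),
    (996321, "Husain"),
]

# One slot per possible hash key; None marks an empty slot.
table = [None] * TABLE_SIZE
for student_id, name in students:
    index = student_id % TABLE_SIZE  # hash key = id % 13
    table[index] = (student_id, name)

for index, entry in enumerate(table):
    print(index, entry[1] if entry else "")
```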

Hash function for strings:
KeySize = 3

i        0     1     2
key      a     l     i
key[i]   97    108   105

hash(“ali”) = (105 * 1 + 108 * 37 + 97 * 37^2) % 10,007 = 6803

[Figure: “ali” is passed to the hash function, which returns 6803; “ali” is stored at index 6803 of a table indexed 0 to 10,006 (TableSize = 10,007).]
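
A string hash of this polynomial form can be sketched with Horner's rule, which multiplies the running value by 37 and adds the next character code. The function name `hash_string` and its default arguments are assumptions for illustration. Python's `ord` returns the standard character code, so `ord("a")` is 97.

```python
def hash_string(key: str, table_size: int = 10_007, base: int = 37) -> int:
    """Polynomial string hash evaluated with Horner's rule.

    For a 3-character key this equals
    (key[2] * 1 + key[1] * base + key[0] * base**2) % table_size.
    """
    h = 0
    for ch in key:
        # Reducing at every step keeps h small and gives the same
        # final remainder as reducing once at the end.
        h = (h * base + ord(ch)) % table_size
    return h


print(hash_string("ali"))
```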

Exercise
Find the hash key of each of the keys 29, 93, 31, 181, 51, 193, 27, 83, 45, 37, 15 when they are hashed into an 11-item hash table using modulo-division as the hash function.

Collision
A collision is a situation in which two or more keys map to the same address location (this normally happens when the user tries to enter new data into the table).
Assume that a number of keys are to be inserted into table T (the hash table). The keys are 10, 02, 26, and 19. Table T has only 7 entries (0 – 6).

Hash function used is: H(K) = K mod 7
Address for key “10” => 10 mod 7 = 3
Address for key “02” => 02 mod 7 = 2
Address for key “26” => 26 mod 7 = 5
Address for key “19” => 19 mod 7 = 5

Table T:
[0]
[1]
[2] 02
[3] 10
[4]
[5] 26   <- 19 also hashes here: collision
[6]
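
One way to see which keys collide is to group them by address before storing anything. The helper below is a hypothetical sketch, not part of the slides; it assumes non-negative integer keys and the modulo-division hash.

```python
from collections import defaultdict


def find_collisions(keys, table_size):
    """Return {address: [keys]} for every address shared by two or more keys."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % table_size].append(key)
    return {addr: ks for addr, ks in buckets.items() if len(ks) > 1}


print(find_collisions([10, 2, 26, 19], 7))  # {5: [26, 19]}
```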

Searching a table using hashing must address two problems:
• Number of collisions: the hash function should produce as few collisions as possible.
• Collision handling: the hashing scheme must resolve the collisions that do occur.

Policies for overcoming the collision problem:
Open Hashing:
• Chaining Policy
Open Addressing:
• Linear Probing
• Double Hashing (Rehash)

Example:
Given a hash table with 5 locations and the hash function H(i) = i % 5, show how this function works when the keys 10, 11, 18, 19, and 23 are inserted in that order.
Address for key “10” => 10 mod 5 = 0
Address for key “11” => 11 mod 5 = 1
Address for key “18” => 18 mod 5 = 3
Address for key “19” => 19 mod 5 = 4
Address for key “23” => 23 mod 5 = 3 (collides with key “18”)

Chaining Policy
Put the collided key at the same address by extending the location with a linked list.

[0] -> 10
[1] -> 11
[2]
[3] -> 18 -> 23
[4] -> 19

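Insert 53
53 mod 11 = 9

Before:
[0]
[1] -> 23 -> 1 -> 56
[2] -> 24
[3] -> 36 -> 14
[4]
[5] -> 16
[6] -> 17
[7] -> 7 -> 29
[8]
[9] -> 31 -> 20 -> 42
[10]

After:
[0]
[1] -> 23 -> 1 -> 56
[2] -> 24
[3] -> 36 -> 14
[4]
[5] -> 16
[6] -> 17
[7] -> 7 -> 29
[8]
[9] -> 53 -> 31 -> 20 -> 42
[10]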

Separate Chaining
The idea is to keep a list of all elements that hash to the same value.
The array elements are pointers to the first nodes of the lists.
A new item is inserted at the front of the list.
Advantages:
• Better space utilization for large items.
• Simple collision handling: search the linked list.
• Overflow: we can store more items than the hash table size.
• Deletion is quick and easy: delete from the linked list.
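
A minimal sketch of separate chaining follows. The class name `ChainedHashTable` and its methods are hypothetical, and a Python list stands in for each linked list (an assumption made for brevity: inserting at the front of a Python list is not constant time, unlike a real linked list). New keys go to the front of their chain, as described above.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each slot holds a chain of keys."""

    def __init__(self, size: int):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size  # modulo-division hash

    def insert(self, key: int) -> None:
        chain = self.buckets[self._index(key)]
        if key not in chain:
            chain.insert(0, key)  # new item goes to the front of the chain

    def contains(self, key: int) -> bool:
        return key in self.buckets[self._index(key)]

    def delete(self, key: int) -> bool:
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False


table = ChainedHashTable(5)
for key in (10, 11, 18, 19, 23):
    table.insert(key)
print(table.buckets)  # [[10], [11], [], [23, 18], [19]]
```

With front insertion, 23 ends up ahead of 18 in chain 3; appending at the end instead would give the order 18, 23.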

CENG 213 Data Structures


Linear Probing
• Resolve the collision by adding 1 to the current address.
• For example, in the earlier collision example with H(K) = K mod 7, key “26” has already been inserted at address 005. Key “19” is supposed to go to the same address (005), but that address is occupied by key “26”, so we add 1 to the current address and try address 006. If address 006 is also occupied, we add 1 again; because 006 is the last address of the 7-entry table, the probe wraps around and continues from address 000.
• Linear probing: try the next available position.

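Using the “Linear-Probe” policy
Before inserting 23:
[0] 10
[1] 11
[2]
[3] 18   <- insert 23: collides with 18, so find another location
[4] 19

After: locations [4], [0], and [1] are also occupied, so 23 is inserted into location [2]
[0] 10
[1] 11
[2] 23
[3] 18
[4] 19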
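
The probing steps above can be sketched as follows. The function name `insert_linear` is an illustrative assumption; empty slots are represented by `None`, and the probe wraps to the start of the table with the `%` operator.

```python
def insert_linear(table, key):
    """Insert key using linear probing; return the index where it was stored."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size  # wrap around at the end of the table
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 5
for key in (10, 11, 18, 19, 23):
    insert_linear(table, key)
print(table)  # [10, 11, 23, 18, 19]
```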

Exercise
In the previous exercise, collisions occur when the keys 29, 93, 31, 181, 51, 193, 27, 83, 45, 37, 15 are hashed. Show how you would handle the collisions using linear probing.

Double Hashing (Rehash)
• A key involved in a collision is hashed again, repeatedly, until an empty location is found.
• Double hashing applies a second hash function to the key when a collision occurs.
• The result of the second hash function is the number of positions to move from the point of collision before inserting.
• There are some requirements for the second function:
  – it must never evaluate to zero
  – it must make sure that all cells can be probed
A popular second hash function is:
Hash2(key) = R – (key mod R)

Note: R is a prime number smaller than the size of the table.

Example – Double Hashing
Table size = 10 elements
Hash1(key) = key % 10
Hash2(key) = 7 – (key % 7)
Insert keys: 89, 18, 49, 58, 69

HashKey(89) = 89 % 10 = 9
HashKey(18) = 18 % 10 = 8
HashKey(49) = 49 % 10 = 9, a collision!
Hash2(49) = 7 – (49 % 7)
          = 7 – 0
          = 7 positions from [9], i.e. (9 + 7) % 10 = [6]

Table after inserting 89, 18, 49:
[0]
[1]
[2]
[3]
[4]
[5]
[6] 49
[7]
[8] 18
[9] 89

Example – Double Hashing
Insert keys: 58, 69
HashKey(58) = 58 % 10 = 8, a collision!
Hash2(58) = 7 – (58 % 7)
          = 7 – 2
          = 5 positions from [8], i.e. (8 + 5) % 10 = [3]
HashKey(69) = 69 % 10 = 9, a collision!
Hash2(69) = 7 – (69 % 7)
          = 7 – 6
          = 1 position from [9], i.e. (9 + 1) % 10 = [0]

Table after inserting 58, 69:
[0] 69
[1]
[2]
[3] 58
[4]
[5]
[6] 49
[7]
[8] 18
[9] 89
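
The worked example can be reproduced with a short sketch. The function name `insert_double` and the parameter `r` are assumptions for illustration. The i-th probe is taken at (home + i * step) % size, and the loop gives up after `size` probes because the sequence repeats from that point.

```python
def insert_double(table, key, r):
    """Insert key using double hashing with Hash2(key) = r - (key % r)."""
    size = len(table)
    home = key % size          # Hash1
    step = r - (key % r)       # Hash2, always between 1 and r
    for i in range(size):
        index = (home + i * step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError(f"no free slot found for key {key}")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    insert_double(table, key, r=7)
print(table)  # [69, None, None, 58, None, None, 49, None, 18, 89]
```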

Double Hashing: Example 1

Exercise
In the previous exercise, collisions occur when the keys 29, 93, 31, 181, 51, 193, 27, 83, 45, 37, 15 are hashed. Show how you would handle the collisions using double hashing.

Table after inserting 29, 93, 31, 181, 51, 193:
[0]
[1] 51
[2]
[3]
[4] 193
[5] 93
[6] 181
[7] 29
[8]
[9] 31
[10]

Hash key (181) = 181 % 11 = 5 (collision)
Hash key2 (181) = 7 – (181 % 7) = 1 -> 1 position from [5] -> [6]
Hash key (51) = 51 % 11 = 7 (collision)
Hash key2 (51) = 7 – (51 % 7) = 5 -> 5 positions from [7] -> [1]
Hash key (193) = 193 % 11 = 6 (collision)
Hash key2 (193) = 7 – (193 % 7) = 3 -> 3 positions from [6] -> [9], which is already occupied
Attempt to store key 193 at d, 2d, 3d, … positions from [6]: [9] (occupied), [1] (occupied), [4] (empty), so key 193 is stored at position [4].

Double Hashing: Example 2
Where do you store 99?
R = 11; TableSize = 15; hash2(x) = 11 − (x % 11)

hash(99) = 99 % 15 = 9 = t (active, i.e. already occupied)
hash2(99) = 11 – (99 % 11) = 11 = d

(hash(99) + 1 * hash2(99)) % 15 = (9 + 11) % 15 = 5 (active)
(hash(99) + 2 * hash2(99)) % 15 = (9 + 22) % 15 = 1 (active)
(hash(99) + 3 * hash2(99)) % 15 = (9 + 33) % 15 = 12 (insert 99)

Note: R = 11, N = 15. Attempt to store the key in array elements (t + d) % N, (t + 2d) % N, (t + 3d) % N, …

Double Hashing: Example 3
Where do you store 127?
R = 11; TableSize = 15; hash2(x) = 11 − (x % 11)

hash(127) = 127 % 15 = 7 = t
hash2(127) = 11 – (127 % 11) = 5 = d

(7 + 5) % 15, (7 + 2*5) % 15, (7 + 3*5) % 15, (7 + 4*5) % 15, (7 + 5*5) % 15, …
= 12, 2, 7, 12, 2, … -> infinite loop: the probe sequence only ever visits positions 12, 2, and 7.

CE 222 - Data Structures & Algorithms II, Izmir University of Economics
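
The loop in Example 3 arises because the step d = 5 and the table size N = 15 share the factor 5, so the probe sequence can reach only 15 / gcd(5, 15) = 3 distinct positions. The sketch below (the function name `probe_cycle` is hypothetical) lists the positions visited before the sequence repeats. Choosing a prime table size is a common way to avoid this, since every step from 1 to N - 1 is then coprime with N.

```python
from math import gcd


def probe_cycle(key, table_size, r):
    """Return (step, positions visited by double hashing before it repeats)."""
    home = key % table_size
    step = r - (key % r)
    visited = []
    index = (home + step) % table_size
    while index not in visited:
        visited.append(index)
        index = (index + step) % table_size
    return step, visited


step, visited = probe_cycle(127, 15, 11)
print(step, visited)          # 5 [12, 2, 7]
print(15 // gcd(step, 15))    # 3
```

For key 99 in the same table the step is 11, which shares no factor with 15, so all 15 positions are reachable.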

Rehashing
If the hash table gets too full:
• Operations start taking too long
• Insertions might fail
Solution: rehashing
• Not bad overall, because it occurs very infrequently
• Expensive operation: O(N)
Rehashing
• Build another table that is about twice as big
• Associate a new hash function with it
• Scan down the entire original hash table
• Compute the new hash value for each element
• Insert it in the new table
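
The rehashing steps above can be sketched as follows. Several details are assumptions made for illustration: the new size is taken as 2 * N + 1, the new hash function is key % new_size, and collisions in the new table are resolved with linear probing. In practice the new size is often chosen to be a prime near twice the old size.

```python
def rehash(old_table):
    """Build a table about twice as big and reinsert every stored key."""
    new_size = 2 * len(old_table) + 1      # assumed growth rule
    new_table = [None] * new_size
    for key in old_table:                  # scan the entire original table
        if key is None:
            continue
        index = key % new_size             # new hash value for this key
        while new_table[index] is not None:
            index = (index + 1) % new_size  # linear probing in the new table
        new_table[index] = key
    return new_table


old = [10, 11, 23, 18, 19]
new = rehash(old)
print(len(new), new)
# 11 [11, 23, None, None, None, None, None, 18, 19, None, 10]
```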

Exercise:
Using the modulo-division method and linear probing, store the keys shown below in an array with 19 elements. How many collisions occurred?

224562, 137456, 214562
140145, 214576, 162145
144467, 199645, 234534

Some Applications of Hash Tables
• Database systems: Hash tables are an important part of efficient random access because they provide a way to locate data in constant time on average.
• Symbol tables: The tables used by compilers to maintain information about symbols from a program. Compilers access information about symbols frequently, so it is important that symbol tables be implemented very efficiently.
• Data dictionaries: Data structures that support adding, deleting, and searching for data. Using a hash table is particularly efficient.
• Network processing algorithms: Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, and network monitoring.
• Browser caches: Hash tables are used to implement browser caches.

Conclusions
• Hashing is a search method, used when:
  – sorting is not needed
  – access time is the primary concern
• Factors affecting the efficiency of a hash table:
  – Choice of hash function
  – Collision resolution strategy
  – Load factor
• Hashing offers excellent performance for insertion and retrieval of data.

Conclusions
The ideal hash table structure is merely an array of some fixed size, containing the items.
A stored item needs to have a data member, called the key, that is used to compute the index value for the item.
• The key could be an integer, a string, etc.
• If the keys are strings, convert them into numeric values first.
• The key should be a unique value, e.g. a name or ID that is part of a large employee structure.
The size of the array is TableSize.
The items stored in the hash table are indexed by values from 0 to TableSize – 1.
A hash function is used to map each key to some number in the range 0 to TableSize – 1.

Conclusions
Insertion:
1) Compute the hash index of the key.
2) If the position is unoccupied – store the key there.
3) If the position is occupied (collision) – handle the collision to get the next position, then go to (2).

Searching:
a) If match – successful search
b) If empty position – unsuccessful search
c) If occupied and no match – continue searching.

If the end of the table is reached – continue from the beginning.
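
The search rules above can be sketched for a table filled by linear probing. The function name `search_linear` is an illustrative assumption, and the sketch assumes no keys have been deleted from the table, since a deleted slot would otherwise look like an empty position and end the search too early.

```python
def search_linear(table, key):
    """Return the index of key in a linear-probing table, or None if absent."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size  # continue from the beginning at the end
        if table[index] is None:
            return None               # empty position: unsuccessful search
        if table[index] == key:
            return index              # match: successful search
        # occupied and no match: continue searching
    return None                       # every position probed without a match


table = [10, 11, 23, 18, 19]
print(search_linear(table, 23))  # 2
print(search_linear(table, 28))  # None
```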
