Hashing

Part 2
Lec 13, Spring 2018
PRESENTED BY: DR. EMAD NABIL
Lecture Overview

• Introduction to Hashing
• Hash functions
• Distribution of records among addresses
• Synonyms and collisions
• Collision resolution by progressive overflow or linear probing
1. Open addressing
The first collision resolution method, open addressing, resolves collisions in the home area. When a collision occurs,
the home area addresses are searched for an open or unoccupied element where the new data can be placed.
Examples of Open Addressing Methods:
1.1. Linear probing
1.2. Quadratic probing
1.3. Double hashing

2. Bucket hashing (defers collision but does not prevent it)


3. Chained progressive overflow
4. Chaining with separate overflow area
5. Hashed index (scatter table)
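As a minimal sketch of open addressing with linear probing (method 1.1), the code below searches the home area for the next unoccupied slot after a collision. The table size and the hash function `home_address` are assumptions made for illustration; they do not come from the lecture.

```python
TABLE_SIZE = 11                 # assumed size, chosen only for illustration
table = [None] * TABLE_SIZE     # None marks an unoccupied slot

def home_address(key: str) -> int:
    """Hypothetical hash: sum of character codes modulo the table size."""
    return sum(ord(c) for c in key) % TABLE_SIZE

def insert(key: str) -> int:
    """Place key at its home address or the next free slot, wrapping around."""
    home = home_address(key)
    for step in range(TABLE_SIZE):
        addr = (home + step) % TABLE_SIZE
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("table is full")

def search(key: str) -> int:
    """Return the address of key, or -1 if an empty slot is reached first."""
    home = home_address(key)
    for step in range(TABLE_SIZE):
        addr = (home + step) % TABLE_SIZE
        if table[addr] is None:
            return -1
        if table[addr] == key:
            return addr
    return -1
```

With this assumed hash, "ab" and "ba" are synonyms (same home address), so the second one inserted lands in the slot right after the first.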

Hashing Example
Address space: 0 to 999
Offset of Lowell = 4 * Record_Size
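The offset rule in the Lowell example can be sketched as follows. The slide only states that Lowell maps to address 4 in an address space of 0 to 999; the hash used here (product of the first two character codes, last three digits kept) and the value of `RECORD_SIZE` are assumptions chosen so that the example reproduces that address.

```python
RECORD_SIZE = 64       # assumed fixed record length in bytes (not given in the slides)
ADDRESS_SPACE = 1000   # addresses 0..999, as in the example

def hash_address(key: str) -> int:
    """Assumed hash: product of the first two character codes, last three digits kept."""
    k = key.upper()
    return (ord(k[0]) * ord(k[1])) % ADDRESS_SPACE

def byte_offset(address: int) -> int:
    """Byte offset of a fixed-length record slot inside the file."""
    return address * RECORD_SIZE

addr = hash_address("Lowell")       # 76 * 79 = 6004, so the address is 4
print(addr, byte_offset(addr))      # offset is 4 * RECORD_SIZE
```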
2. Hashing with buckets

►This is a variation of hashed files in which more than one record/key is stored per hash address.

►Bucket = block of records corresponding to one address in the hash table

►The hash function gives the Bucket Address


2. Hashing with buckets

Collision resolution with buckets and the linear probing (progressive overflow) method.
Note: A2=B2=C2
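A minimal sketch of buckets combined with progressive overflow: the hash function returns a bucket address, and a record that finds its home bucket full moves on to the next bucket with room. `BUCKET_SIZE`, `NUM_BUCKETS`, and the modulo hash are illustrative assumptions, not values from the lecture.

```python
BUCKET_SIZE = 3    # assumed number of record slots per bucket
NUM_BUCKETS = 5    # assumed number of bucket addresses

# Each bucket is a list holding at most BUCKET_SIZE keys.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_address(key: int) -> int:
    """Hypothetical hash: the hash function yields a bucket address."""
    return key % NUM_BUCKETS

def insert(key: int) -> int:
    """Store key in its home bucket, or in the next bucket that has room."""
    home = bucket_address(key)
    for step in range(NUM_BUCKETS):     # at most one pass over all buckets
        addr = (home + step) % NUM_BUCKETS
        if len(buckets[addr]) < BUCKET_SIZE:
            buckets[addr].append(key)
            return addr
    raise RuntimeError("all buckets are full")
```

Here the keys 0, 5 and 10 all fit in bucket 0, so the collision is only deferred: a fourth synonym such as 15 overflows into bucket 1.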
Implementation issues
• Mark free slots using a special marker, for example: /// /// ///

• Mark deleted slots using a tombstone so they can be reused later, for example: ###

• Every bucket should keep a counter of the number of used slots.

• Beware of an infinite loop when the whole table is full (buckets with progressive overflow)
1. Shifting records that follow a tombstone to its home address
2. Complete reorganization (rehashing)
3. Use a different collision resolution technique
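The markers and the full-table guard above can be sketched as follows, for a table with one slot per address. The table size and hash function are assumptions, and the sketch assumes keys are unique and never equal to either marker.

```python
FREE = "///"        # marker for a never-used slot, as on the slide
TOMBSTONE = "###"   # marker for a deleted slot, as on the slide
SIZE = 7            # assumed table size

slots = [FREE] * SIZE

def home(key: str) -> int:
    """Hypothetical hash function, for illustration only."""
    return sum(ord(c) for c in key) % SIZE

def insert(key: str) -> int:
    """Reuse the first free or tombstoned slot; stop after one full pass."""
    for step in range(SIZE):            # bounded loop: no infinite probing
        addr = (home(key) + step) % SIZE
        if slots[addr] in (FREE, TOMBSTONE):
            slots[addr] = key
            return addr
    raise RuntimeError("table is full")

def find(key: str) -> int:
    """Skip tombstones; stop at a free slot or after one full pass."""
    for step in range(SIZE):
        addr = (home(key) + step) % SIZE
        if slots[addr] == FREE:
            return -1
        if slots[addr] == key:
            return addr
    return -1

def delete(key: str) -> bool:
    """Replace the record with a tombstone so later searches keep probing."""
    addr = find(key)
    if addr == -1:
        return False
    slots[addr] = TOMBSTONE
    return True
```

The tombstone matters for search: if the deleted slot were marked free instead, a search for a synonym stored after it would stop too early.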
3. Chained progressive overflow
[Figure: table with columns Data and Next, rows for addresses 20 to 25, shown on six successive slides]
The key idea behind the enhancement: group keys with the same home address (synonyms) into one cluster.

Cluster 1: (20) Adams Coles Flint
Cluster 2: (23) Bates Dean
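A minimal sketch of chained progressive overflow using the two clusters above. The home addresses come from the slide; the table size, the insertion order, and the use of -1 as the end-of-chain value are assumptions. The sketch also assumes that an occupied home address always holds a record of its own cluster.

```python
SIZE = 30                  # assumed table size
data = [None] * SIZE       # "Data" column
nxt = [-1] * SIZE          # "Next" column; -1 means end of chain

# Home addresses taken from the two clusters on the slide.
HOME = {"Adams": 20, "Coles": 20, "Flint": 20, "Bates": 23, "Dean": 23}

def insert(key: str) -> int:
    """Insert key; on collision, probe forward and link the new slot to the chain."""
    home = HOME[key]
    if data[home] is None:
        data[home] = key
        return home
    # Walk to the last record of the chain that starts at the home address.
    tail = home
    while nxt[tail] != -1:
        tail = nxt[tail]
    # Progressive overflow: first free slot after the tail, wrapping around.
    addr = (tail + 1) % SIZE
    while data[addr] is not None:
        addr = (addr + 1) % SIZE
        if addr == tail:
            raise RuntimeError("table is full")
    data[addr] = key
    nxt[tail] = addr
    return addr

def search(key: str) -> int:
    """Follow Next links from the home address instead of scanning every slot."""
    addr = HOME[key]
    while addr != -1 and data[addr] is not None:
        if data[addr] == key:
            return addr
        addr = nxt[addr]
    return -1
```

Inserting Adams, Bates, Coles, Dean, Flint in that order places them at 20, 23, 21, 24, 22, and a search for Flint visits only the records of cluster 1 (20, 21, 22).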
A problem in the previous technique
[Figure: table with columns Data and Next, rows for addresses 20 to 25, shown on five successive slides]

Did you notice the problem?

(20)Adams (20)Coles (22)Dean (20)Flint

Two overlapped clusters.
The problem is that the home address of Dean, which is the head of a cluster, is not free.
4. Chaining with separate overflow area
Bucket size = 1 here; a bucket may have more than one slot.
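A minimal sketch of chaining with a separate overflow area, assuming bucket size 1: the first record for an address stays in the prime area, and every later synonym is appended to the overflow area and linked to the chain. The area size, the modulo hash, and the `[key, next]` record layout are illustrative assumptions.

```python
PRIME_SIZE = 5                 # assumed number of home addresses (bucket size 1)
prime = [None] * PRIME_SIZE    # prime data area: one [key, next] entry per address
overflow = []                  # separate overflow area: list of [key, next] entries

def home(key: int) -> int:
    """Hypothetical hash function."""
    return key % PRIME_SIZE

def insert(key: int):
    """Keep the first record at its home address; chain synonyms in overflow."""
    h = home(key)
    if prime[h] is None:
        prime[h] = [key, -1]            # -1 means end of chain
        return ("prime", h)
    overflow.append([key, -1])
    new_index = len(overflow) - 1
    node = prime[h]
    while node[1] != -1:                # walk to the last record of the chain
        node = overflow[node[1]]
    node[1] = new_index
    return ("overflow", new_index)

def search(key: int) -> bool:
    """Start at the home address, then follow links into the overflow area."""
    node = prime[home(key)]
    while node is not None:
        if node[0] == key:
            return True
        node = overflow[node[1]] if node[1] != -1 else None
    return False
```

Because synonyms never occupy another key's home address, the overlapping-cluster problem of chained progressive overflow does not arise in this layout.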
5. Hashed index

Hash value | Record offset inside file

Applicable to variable-length records
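A minimal sketch of a hashed index: the hash value selects an index slot, and the slot stores the offset of the record inside the data file, so the records themselves can have variable length. The in-memory `io.BytesIO` file, the one-byte length prefix (records under 256 bytes), the `key|body` layout, and linear probing inside the index are all assumptions made for illustration.

```python
import io

INDEX_SIZE = 8                 # assumed number of index slots
index = [-1] * INDEX_SIZE      # each slot holds a record offset, -1 if unused
data_file = io.BytesIO()       # stands in for the data file on disk

def slot_of(key: str) -> int:
    """Hypothetical hash function over the key bytes."""
    return sum(key.encode()) % INDEX_SIZE

def read_record(offset: int) -> str:
    """Read one length-prefixed record starting at the given offset."""
    data_file.seek(offset)
    length = data_file.read(1)[0]
    return data_file.read(length).decode()

def add_record(key: str, body: str) -> int:
    """Append a variable-length record and store its offset in the index."""
    slot = slot_of(key)
    for _ in range(INDEX_SIZE):         # find a free index slot first
        if index[slot] == -1:
            break
        slot = (slot + 1) % INDEX_SIZE
    else:
        raise RuntimeError("index is full")
    record = f"{key}|{body}".encode()   # assumed layout: key|body
    offset = data_file.seek(0, io.SEEK_END)
    data_file.write(bytes([len(record)]) + record)
    index[slot] = offset
    return offset

def lookup(key: str):
    """Probe the index and compare keys read from the file; None if absent."""
    slot = slot_of(key)
    for _ in range(INDEX_SIZE):
        if index[slot] == -1:
            return None
        record = read_record(index[slot])
        if record.split("|", 1)[0] == key:
            return record
        slot = (slot + 1) % INDEX_SIZE
    return None
```

Only the small index is hashed; the data file is written sequentially, which is why record length does not have to be fixed.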
Patterns of record access
