Course7 Hashing

Chapter 7 – Trees
Algorithms and dynamic data structures
A. Chikhaoui 2022 - 2023

Introduction
For a dictionary with n (key, value) pairs:

find insert delete
Unsorted linked-list O(n) O(1) O(n)
Unsorted array O(n) O(1) O(n)
Sorted linked list O(n) O(n) O(n)
Sorted array O(log(n)) O(n) O(n)
Balanced tree O(log(n)) O(log(n)) O(log(n))
Is there a data structure? O(1) O(1) O(1)
It is interesting to have a data structure with constant time (complexity= o(1)) for
find,insert and delete operations → Hash tables
hashing is used in:
▶ Web searches
▶ Databases
▶ Compilers
▶ passwords
▶ , etc.
1 12
Definition
Hashing is a technique used for storing and retrieving information as quickly as possible.
It is used to perform optimal searches
9
8
7
x1 x4 6
hash function
x2 index=h(xi ) 5
x3 4
3
2
1
0
h is called: hash function such that h: S → [0...N − 1]. It maps a set of objects
S = {x1 , x2 , ...xm } into a table of size N.
It is a good idea to pick a prime as the table size to have a better distribution of values.
h(x) represents the primary address of x (it is an index of T)
Example of hash function is MOD function.
2 12
Collisions
Example: Let suppose that we have a hash table T and its size is 100. The hash function
is h(x) = x%100. If x1 = 22 and x2 = 122.
x1 and x2 will be mapped to the same location based our hash function.
This is called Collision
Definition: Collision is the condition where two items or more are stored in the same
location.
When collisions occur, we need to “handle” them using Collision Resolution
Techniques (see later).
Collisions can be reduced with a selection of a good hash function.
3 12
Characteristics of Good hash functions
A good hash function should have the following characteristics:

▶ Minimize collision
▶ Be easy and quick to compute
▶ Distribute key values evenly in the hash table
▶ Use all the information provided in the key
▶ Have a high load factor for a given set of keys
Load factor: The load factor of a hash table is the number of items stored in the table divided
by the size of the table. This is the decision parameter used when we want to rehash or expand
the existing hash table entries.
4 12
Collision Resolution Techniques
The process of finding an alternate location is called collision resolution.

the most popular collision resolution tchniques are :
▶ Direct Chaining: An array of linked lists
Separate chaining
▶ Open Addressing: Array-based implementation
Linear probing (linear search)
Quadratic probing (nonlinear search)
Double hashing (use two hash functions)
5 12
Separate Chaining
Collision resolution by chaining combines linked representation with hash table.

All keys that map to the same table location are kept in a singly-linked list.
Operations:
▶ f ind: The search for x begins in the cell
h(x). If x is not in that cell, we search in
the corresponding linked list.
▶ insert: the insertion begins with the
search for the key x. If x does not exist,
we add it in the cell h(x) if it is empty
otherwise we add it in the linked list
associated with this cell.
▶ delete: if x exists, delete it and
rearrange the table.
Complexity: In the worse case O(n), in the average case O(1).
6 12
Linear Probing
In linear probing, we search the hash table sequentially, starting from the original hash
location.
If a location is occupied, we check the next locations (h(x)-1, h(x)-2,...) Until finding an
empty location.
( We wrap around from the last table location to the first table location if necessary, it
means the next location of 0 is N-1: circular array).
There must always be at least one empty cell in the table (one position is sacrificed)
Example: insert the following items into an array of size=10, h(x)=x%size:
55, 21, 63, 103, 10, 82, 18.
7 12
Linear Probing
8 12
Linear Probing
8 12
Linear Probing
8 12
Linear Probing
8 12
Linear Probing
9 12
Linear Probing
9 12
Linear Probing
9 12
Linear Probing
9 12
Linear Probing
Operations:
▶ f ind: The search for x begins in the cell h(x), if the cell doesn’t match, look elsewhere. If if
x does not exist in the table, the find operation stops in the empty cell.
▶ insert: compute h(x), if the cell is full, find another by sequentially searching for the next
available cell: h(x)-1, h(x)-2, ...
▶ delete: Search x. Test all the data above (circularly) until you find an empty cell and
possibly move some values. Clear a cell.
Complexity: In the worse case O(n), in the average case O(1).
10 12
Quadratic Probing
Same idea as linear probing except that collisions are resolved by examining cells:
h(x) − 12 , h(x) − 22 , h(x) − 32 , ...
11 12
Double hashing
Like the Linear probing except that the cyclic sequence of cells to be tested uses a step
calculated by a 2nd hash function h′ (x).
Let h(x) be the function used to calculate the primary address and h′ (x) the function
which calculates the step of the sequence: h(x), h(x) − h′ (x) , h(x) − 2h′ (x) ,
h(x) − 3h′ (x) , ...
For the sequence to be circular, the subtractions are done modulo the size of the table.
As in linear probing, one cell is sacrificed.
12 / 12

Course7 Hashing

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Course7 Hashing

Uploaded by

Copyright:

Available Formats

Chapter 7 – Trees

Algorithms and dynamic data structures

A. Chikhaoui 2022 - 2023

For a dictionary with n (key, value) pairs:

A good hash function should have the following characteristics:

The process of finding an alternate location is called collision resolution.

Collision resolution by chaining combines linked representation with hash table.

Complexity: In the worse case O(n), in the average case O(1).

You might also like