
Searching and Hash structure
Data Structures and Algorithms

Dept. Computer Science
Faculty of Computer Science and Engineering
Ho Chi Minh University of Technology, VNU-HCM

Overview

1 Searching algorithms
    Sequential Search
    Interval Search
2 Hash structure
    Basic concepts
    Hash functions
        Direct Hashing
        Modulo division
        Digit extraction
        Mid-square
        Folding
        Rotation
        Pseudo-random
    Collision resolution
        Open addressing
        Bucket hashing
        Linked list resolution

SEARCHING ALGORITHMS

Searching algorithms

Definition
Searching algorithms are designed to check for an element, or to retrieve an element, from the data structure in which it is stored. Based on the type of search operation, these algorithms are generally classified into two categories:
1 Sequential Search: the list or array is traversed sequentially and every element is checked.
2 Interval Search: these algorithms are specifically designed for searching in sorted data structures.

Linear Search

Approach
1 Start from the leftmost element of the list and compare x with each element, one by one.
2 If x matches an element, return its index.
3 If x does not match any element, return -1.

The time complexity of the above algorithm is O(n).

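The three steps of linear search can be sketched in Python as follows. This is a minimal illustration; the function name `linear_search` and the sample list are assumptions, not part of the slides.

```python
def linear_search(items, x):
    """Return the index of the first element equal to x, or -1 if absent."""
    # Step 1: walk the list from the leftmost element.
    for index, value in enumerate(items):
        # Step 2: on a match, return the index.
        if value == x:
            return index
    # Step 3: no element matched.
    return -1


sample = [4, 8, 15, 16, 23, 42]
print(linear_search(sample, 16))  # index of 16 in the sample list
print(linear_search(sample, 7))   # 7 is absent, so -1
```

Every element may have to be inspected, which is where the O(n) bound comes from.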
Binary Search

Approach
Search a sorted array by repeatedly dividing the search interval in half. Begin with an interval covering the whole array.
1 If the value of the search key is less than the item in the middle of the interval, narrow the interval to the lower half; otherwise narrow it to the upper half.
2 Repeat until the value is found or the interval is empty.

Implementation:
• Recursive
• Iterative

Time complexity: O(log n)

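Both implementation styles named above can be sketched in Python. The function names and the convention of returning -1 for a missing key are assumptions chosen to match the linear search slide.

```python
def binary_search_iterative(arr, key):
    """Search the sorted list arr for key; return its index or -1."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:                 # the interval [lo, hi] is not empty
        mid = (lo + hi) // 2
        if arr[mid] == key:
            return mid
        if key < arr[mid]:
            hi = mid - 1            # narrow to the lower half
        else:
            lo = mid + 1            # narrow to the upper half
    return -1


def binary_search_recursive(arr, key, lo=0, hi=None):
    """Recursive variant of the same interval-halving idea."""
    if hi is None:
        hi = len(arr) - 1
    if lo > hi:                     # empty interval: key is absent
        return -1
    mid = (lo + hi) // 2
    if arr[mid] == key:
        return mid
    if key < arr[mid]:
        return binary_search_recursive(arr, key, lo, mid - 1)
    return binary_search_recursive(arr, key, mid + 1, hi)


data = [0, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(binary_search_iterative(data, 13))
print(binary_search_recursive(data, 4))
```

Each iteration or recursive call halves the interval, so at most about log2(n) + 1 comparisons of the middle element are needed.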
Jump Search

Approach
Jump Search is a searching algorithm for sorted arrays. The basic idea is to check fewer elements than linear search by jumping ahead in fixed steps, skipping some elements instead of examining all of them.

Suppose that an array arr of size n is divided into blocks of fixed size m.
1 Find the block k such that the first element of block k is less than or equal to key and the first element of block k + 1 is greater than key.
2 Perform a linear search on block k.

Time complexity: O(n/m + m), which is O(√n) when the block size is m = √n.

Consider the following array, in which we search for 55:

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

The length of the array is 16 and the block size is 4.
1 Jump from index 0 to index 4.
2 Jump from index 4 to index 8.
3 Jump from index 8 to index 12.
4 Since the element at index 12 (144) is greater than 55, jump back one step to index 8.
5 Do a linear search from index 8 to find the element 55 (at index 10).

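The jump-then-scan procedure can be sketched as below. The function name, the default block size of floor(√n), and the -1 return value for a missing key are assumptions made for this illustration.

```python
import math


def jump_search(arr, key, block_size=None):
    """Search the sorted list arr for key; return an index or -1."""
    n = len(arr)
    if n == 0:
        return -1
    # Assumed default: block size floor(sqrt(n)), at least 1.
    m = block_size or max(1, math.isqrt(n))
    start = 0
    # Jump ahead while the first element of the next block is still <= key.
    while start + m < n and arr[start + m] <= key:
        start += m
    # Linear search inside the block that may contain key.
    for i in range(start, min(start + m, n)):
        if arr[i] == key:
            return i
    return -1


fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
print(jump_search(fib, 55, block_size=4))
```

With the array and block size from the example, the loop stops at index 8 because the element at index 12 exceeds 55, and the linear scan then finds 55 at index 10.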
Interpolation Search

Approach
Interpolation Search is an improvement over Binary Search for instances where the values in a sorted array are uniformly distributed.
1 Calculate the value of pos using the probe position formula:

  \[ pos = lo + \frac{(x - arr[lo]) \times (hi - lo)}{arr[hi] - arr[lo]} \]

  where
  • arr: array in which the element is searched.
  • x: element to be searched.
  • lo: starting index in arr.
  • hi: ending index in arr.
2 If arr[pos] matches x, return the index of the item and exit.
3 If the item is less than arr[pos], calculate the probe position in the left sub-array. Otherwise calculate it in the right sub-array.
4 Repeat until a match is found or the sub-array is reduced to zero elements.

Time complexity:
• If elements are uniformly distributed: O(log log n).
• In the worst case, it can take up to O(n).

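A minimal Python sketch of the four steps follows. It assumes integer keys, uses integer division for the probe position, and returns -1 for a missing key; the guard against a zero denominator is an addition that the slides do not discuss.

```python
def interpolation_search(arr, x):
    """Search the sorted list arr for x; return its index or -1."""
    lo, hi = 0, len(arr) - 1
    # Keep probing only while x can still lie inside arr[lo..hi].
    while lo <= hi and arr[lo] <= x <= arr[hi]:
        if arr[hi] == arr[lo]:
            # All remaining values are equal, so x == arr[lo] here;
            # this also avoids dividing by zero in the formula.
            return lo
        # Step 1: probe position formula.
        pos = lo + (x - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        # Step 2: match found.
        if arr[pos] == x:
            return pos
        # Step 3: continue in the left or right sub-array.
        if x < arr[pos]:
            hi = pos - 1
        else:
            lo = pos + 1
    # Step 4: the sub-array shrank to nothing without a match.
    return -1


uniform = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(interpolation_search(uniform, 70))
```

On this evenly spaced list the first probe for 70 already lands on index 6, which illustrates why uniformly distributed data is the favourable case.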
Basic concepts

• Sequential search: O(n)
• Binary search: O(log₂ n)

→ Both require several key comparisons before the target is found.

Search complexity:

Size         Binary   Sequential (Average)   Sequential (Worst Case)
16                4                      8                        16
50                6                     25                        50
256               8                    128                       256
1,000            10                    500                     1,000
10,000           14                  5,000                    10,000
100,000          17                 50,000                   100,000
1,000,000        20                500,000                 1,000,000

Is there a search algorithm whose complexity is O(1)?

YES

Figure: Each key has only one address

• Home address: the address produced by a hash function.
• Prime area: the memory that contains all the home addresses.
• Synonyms: a set of keys that hash to the same location.
• Collision: the location of the data to be inserted is already occupied by synonym data.
• Ideal hashing:
    • No location collision
    • Compact address space

Hash functions

• Direct hashing
• Modulo division
• Digit extraction
• Mid-square
• Folding
• Rotation
• Pseudo-random

Direct Hashing

The address is the key itself:

hash(Key) = Key

• Advantage: there is no collision.
• Disadvantage: the address space (storage size) is as large as the key space.

Modulo division

Address = Key mod listSize

• Fewer collisions if listSize is a prime number.
• Example:
  Numbering system to handle 1,000,000 employees
  Data space to store up to 300 employees
  listSize = 307 (a prime number slightly larger than 300)

  hash(121267) = 121267 mod 307 = 2

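The modulo-division rule is a one-liner in Python. The function name `modulo_hash` and the default list size of 307 (taken from the example) are illustrative assumptions.

```python
def modulo_hash(key, list_size=307):
    """Address = Key mod listSize."""
    return key % list_size


print(modulo_hash(121267))  # the example key from the slide
```

Because 121267 = 395 × 307 + 2, the call reproduces the address 2 shown in the example.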
Digit extraction

Address = selected digits from Key

Example (selecting the first, third, and fourth digits):
379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051

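The examples are consistent with picking the first, third, and fourth digits of a six-digit key. A small sketch under that assumption is shown below; the function name, the zero-based `positions` tuple, and returning the address as a string (to keep leading zeros such as 051) are choices made for this illustration.

```python
def digit_extraction_hash(key, positions=(0, 2, 3), width=6):
    """Build the address from selected digit positions of the key."""
    # Pad with leading zeros so that 45128 is treated as 045128.
    digits = str(key).zfill(width)
    return "".join(digits[p] for p in positions)


for key in (379452, 121267, 378845, 160252, 45128):
    print(key, "->", digit_extraction_hash(key))
```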
Mid-square

Address = middle digits of Key²

Example:
9452 * 9452 = 89340304 → 3403

• Disadvantage: the size of Key² is too large.
• Variation: use only a portion of the key.
  Example (squaring the first three digits of the key):
  379452: 379 * 379 = 143641 → 364
  121267: 121 * 121 = 014641 → 464
  045128: 045 * 045 = 002025 → 202

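Both the plain mid-square method and the variation can be sketched in Python. The helper names, the zero-based slice positions, and the string return type are assumptions; the digit positions were chosen so that the slide examples are reproduced.

```python
def middle_digits_of_square(value, square_width, start, length):
    """Square value, pad the square to square_width digits, and
    return `length` digits beginning at zero-based position `start`."""
    squared = str(value * value).zfill(square_width)
    return squared[start:start + length]


def mid_square_variation(key, key_width=6, portion=3):
    """Variation: square only the first `portion` digits of the key."""
    leading = int(str(key).zfill(key_width)[:portion])
    # A 3-digit number squared has at most 6 digits; take digits 3 to 5.
    return middle_digits_of_square(leading, 2 * portion, 2, 3)


print(middle_digits_of_square(9452, 8, 2, 4))   # full-key example
for key in (379452, 121267, 45128):
    print(key, "->", mid_square_variation(key))
```

Squaring only a portion of the key keeps the intermediate product small, which addresses the disadvantage noted above.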
Folding

The key is divided into parts whose size matches the address size.

Example:
Key = 123|456|789

fold shift (the parts are added as they are; the carry digit is discarded)
123 + 456 + 789 = 1368
→ 368

fold boundary (the outer parts are reversed before adding)
321 + 456 + 987 = 1764
→ 764

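The two folding variants can be written as a short Python sketch. The function names and the use of a modulo to drop the carry digit are assumptions made for this illustration.

```python
def split_parts(key, part_size):
    """Split the decimal digits of key into chunks of part_size."""
    digits = str(key)
    return [digits[i:i + part_size] for i in range(0, len(digits), part_size)]


def fold_shift(key, part_size=3):
    """Add the parts as they are and keep the last part_size digits."""
    total = sum(int(part) for part in split_parts(key, part_size))
    return total % 10 ** part_size


def fold_boundary(key, part_size=3):
    """Reverse the first and last parts, then add all parts."""
    parts = split_parts(key, part_size)
    parts[0] = parts[0][::-1]
    if len(parts) > 1:
        parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** part_size


print(fold_shift(123456789))     # 123 + 456 + 789 = 1368, keep 368
print(fold_boundary(123456789))  # 321 + 456 + 987 = 1764, keep 764
```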
Rotation

• Hashing keys that are identical except for the last character may create synonyms.
• The key is rotated before hashing.

original key    rotated key
600101          160010
600102          260010
600103          360010
600104          460010
600105          560010

• Used in combination with fold shift.

original key    rotated key
600101 → 62     160010 → 26
600102 → 63     260010 → 36
600103 → 64     360010 → 46
600104 → 65     460010 → 56
600105 → 66     560010 → 66

Rotation spreads the data more evenly across the address space.

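The table above can be reproduced with a small sketch that rotates the last digit to the front and then applies a two-digit fold shift. Keys are handled as digit strings; the function names and the two-digit part size are assumptions inferred from the example values.

```python
def rotate_right(key_digits):
    """Move the last digit of the key to the front."""
    return key_digits[-1] + key_digits[:-1]


def fold_shift_two_digits(key_digits):
    """Fold shift with 2-digit parts, keeping a 2-digit address."""
    parts = [key_digits[i:i + 2] for i in range(0, len(key_digits), 2)]
    return sum(int(part) for part in parts) % 100


for key in ("600101", "600102", "600103", "600104", "600105"):
    rotated = rotate_right(key)
    print(key, "->", fold_shift_two_digits(key), "|",
          rotated, "->", fold_shift_two_digits(rotated))
```

Without rotation the five keys map to the consecutive addresses 62 to 66; after rotation they are ten apart, except for the last key, whose address stays 66.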
Pseudo-random

Address = (a × Key + c) mod listSize

For maximum efficiency, a and c should be prime numbers.

Example:
Key = 121267
a = 17
c = 7
listSize = 307

Address = (17 * 121267 + 7) mod 307
        = (2061539 + 7) mod 307
        = 2061546 mod 307
        = 41

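The calculation in the example corresponds to the following sketch. The function name and the default parameter values (copied from the example) are illustrative assumptions.

```python
def pseudo_random_hash(key, a=17, c=7, list_size=307):
    """Address = (a * Key + c) mod listSize."""
    return (a * key + c) % list_size


print(pseudo_random_hash(121267))  # the example key from the slide
```

The intermediate value is 17 × 121267 + 7 = 2061546, and 2061546 mod 307 gives the address 41.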
Collision resolution

• Except for direct hashing, none of the hash functions is a one-to-one mapping.
  → Collision resolution methods are required.
• Each collision resolution method can be used independently with each hash function.

• Closed Hashing
    • Open addressing
    • Bucket hashing
• Open Hashing
    • Linked list resolution

Open addressing

When a collision occurs, the table is searched for an unoccupied element in which to place the new element.

Hash function:

\[ h : U \to \{0, 1, 2, \ldots, m - 1\} \]

where U is the set of keys and {0, 1, 2, ..., m − 1} is the set of addresses.

Hash and probe function:

\[ hp : U \times \{0, 1, 2, \ldots, m - 1\} \to \{0, 1, 2, \ldots, m - 1\} \]

where U is the set of keys, the second argument ranges over the probe numbers, and the result is an address.

Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = nil then
            T[j] = k
            return j
        else
            i = i + 1
        end
    end
    return error: "hash table overflow"
End hashInsert

Algorithm hashSearch(val T <array>, val k <key>)
Searches for key k in table T.
    i = 0
    while i < m do
        j = hp(k, i)
        if T[j] = k then
            return j
        else if T[j] = nil then
            return nil
        else
            i = i + 1
        end
    end
    return nil
End hashSearch

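The hashInsert and hashSearch algorithms translate almost line for line into Python. In this sketch the table is a list with `None` standing for nil, and `hp` is assumed to be linear probing over h(k) = k mod m; any other probe function with the same signature could be substituted.

```python
def hp(k, i, m):
    """Assumed probe function: linear probing with h(k) = k mod m."""
    return (k % m + i) % m


def hash_insert(table, k):
    """Insert key k into table; return its address."""
    m = len(table)
    for i in range(m):              # at most m probes
        j = hp(k, i, m)
        if table[j] is None:        # unoccupied element found
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, k):
    """Return the address of key k, or None if it is not in the table."""
    m = len(table)
    for i in range(m):
        j = hp(k, i, m)
        if table[j] == k:
            return j
        if table[j] is None:        # an empty slot ends the probe sequence
            return None
    return None


table = [None] * 7
for key in (10, 17, 24):            # all three keys have home address 3
    print(key, "stored at", hash_insert(table, key))
print(hash_search(table, 17), hash_search(table, 31))
```

The three synonyms end up at addresses 3, 4, and 5; searching for 31 follows the same probe sequence and stops at the first empty slot.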
There are different methods:
• Linear probing
• Quadratic probing
• Double hashing
• Key offset

Linear Probing

• When a home address is occupied, go to the next address (the current address + 1):

\[ hp(k, i) = (h(k) + i) \bmod m \]

• Advantages:
    • quite simple to implement
    • data tend to remain near their home address (significant for disk addresses)
• Disadvantages:
    • produces primary clustering

Quadratic Probing

• The address increment is the collision probe number squared:

\[ hp(k, i) = (h(k) + i^2) \bmod m \]

• Advantages:
    • reduces the primary clustering that linear probing produces
• Disadvantages:
    • time required to square numbers
    • produces secondary clustering: keys with the same home address follow the same probe sequence

\[ h(k_1) = h(k_2) \Rightarrow hp(k_1, i) = hp(k_2, i) \]

Double Hashing

• Using two hash functions:

\[ hp(k, i) = (h_1(k) + i \, h_2(k)) \bmod m \]

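The three probe formulas can be compared side by side with a short sketch. The primary hash h(k) = k mod m and the second hash h2(k) = 1 + (k mod (m − 1)) are assumptions chosen for illustration; the slides do not prescribe specific functions.

```python
def linear_probe(k, i, m):
    """hp(k, i) = (h(k) + i) mod m"""
    return (k % m + i) % m


def quadratic_probe(k, i, m):
    """hp(k, i) = (h(k) + i^2) mod m"""
    return (k % m + i * i) % m


def double_hash_probe(k, i, m):
    """hp(k, i) = (h1(k) + i * h2(k)) mod m"""
    h1 = k % m
    h2 = 1 + (k % (m - 1))      # assumed second hash; never zero
    return (h1 + i * h2) % m


m = 11
for k in (5, 16):               # synonyms: both keys have home address 5
    print("key", k)
    print("  linear   ", [linear_probe(k, i, m) for i in range(4)])
    print("  quadratic", [quadratic_probe(k, i, m) for i in range(4)])
    print("  double   ", [double_hash_probe(k, i, m) for i in range(4)])
```

For the two synonyms the linear and quadratic sequences are identical, which is the clustering behaviour described above, while the double-hashing sequences diverge after the home address because the step size depends on the key.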
Key Offset

• The new address is a function of the collision address and the key.

\[ offset = \lfloor key / listSize \rfloor \]
\[ newAddress = (collisionAddress + offset) \bmod listSize \]

Written as a probe function:

\[ hp(k, i) = (hp(k, i - 1) + \lfloor k / m \rfloor) \bmod m \]

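A minimal sketch of the key-offset step is shown below. The function name and the sample key 166702 are assumptions; the list size 307 is reused from the earlier examples, and the home address is computed by modulo division.

```python
def key_offset_probe(key, collision_address, list_size):
    """newAddress = (collisionAddress + floor(key / listSize)) mod listSize"""
    offset = key // list_size
    return (collision_address + offset) % list_size


list_size = 307
key = 166702
home = key % list_size                  # home address by modulo division
print("home address:", home)
print("offset:", key // list_size)
print("next address:", key_offset_probe(key, home, list_size))
```

Here the home address is 1 and the offset is 543, so the next address tried is (1 + 543) mod 307 = 237. Two synonyms generally have different offsets, so they follow different paths through the table.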
Open addressing

Hash and probe function:

\[ hp : U \times \{0, 1, 2, \ldots, m - 1\} \to \{0, 1, 2, \ldots, m - 1\} \]

where U is the set of keys, the second argument ranges over the probe numbers, and the result is an address.

For every key k, {hp(k, 0), hp(k, 1), ..., hp(k, m − 1)} is a permutation of {0, 1, ..., m − 1}, so that every address is probed exactly once.

Bucket hashing

• Hashing data to buckets that can hold multiple pieces of data.
• Each bucket has an address, and collisions are postponed until the bucket is full.

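The idea of postponing collisions can be sketched with a small class. The class name, the modulo hash, and the choice to simply report a full bucket (rather than resolve the overflow) are assumptions made for this illustration.

```python
class BucketHashTable:
    """Each address holds a bucket with room for several keys."""

    def __init__(self, num_buckets=5, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]

    def _address(self, key):
        # Assumed hash function: modulo division.
        return key % len(self.buckets)

    def insert(self, key):
        """Store key in its bucket; return False if the bucket is full."""
        bucket = self.buckets[self._address(key)]
        if len(bucket) < self.capacity:
            bucket.append(key)
            return True
        # Only now does a real collision occur; it would have to be
        # resolved by another method, which this sketch leaves out.
        return False

    def contains(self, key):
        return key in self.buckets[self._address(key)]


table = BucketHashTable()
print(table.insert(5), table.insert(10), table.insert(15))
print(table.contains(10), table.contains(15))
```

The keys 5 and 10 are synonyms but share bucket 0 without any probing; only the third synonym, 15, finds the bucket full.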
Linked List Resolution

• Major disadvantage of open addressing: each collision resolution increases the probability of future collisions.
  → Use linked lists to store synonyms.

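A minimal sketch of linked list resolution follows. The class names, the modulo hash, and inserting new synonyms at the head of the list are assumptions made for this illustration.

```python
class Node:
    """One element of a synonym list."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    """Every address in the prime area heads a linked list of synonyms."""

    def __init__(self, size=7):
        self.heads = [None] * size

    def _address(self, key):
        # Assumed hash function: modulo division.
        return key % len(self.heads)

    def insert(self, key):
        j = self._address(key)
        # The new node becomes the head of the synonym list.
        self.heads[j] = Node(key, self.heads[j])

    def search(self, key):
        node = self.heads[self._address(key)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, address):
        """Return the keys stored at an address, head first."""
        keys, node = [], self.heads[address]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable()
for key in (10, 17, 24):        # all three keys hash to address 3
    table.insert(key)
print(table.chain(3), table.search(17), table.search(31))
```

Synonyms never occupy another key's home address here, so one collision does not make later collisions at other addresses more likely.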
THANK YOU.