## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

6.046J/18.401J/SMA5503

Lecture 7

Prof. Charles E. Leiserson

**Symbol-table problem
**

Symbol table T holding n records: x

record

key[x] key[x]

Other fields containing satellite data

Operations on T: • INSERT(T, x) • DELETE(T, x) • SEARCH(T, k)

**How should the data structure T be organized?
**

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.2

**Direct-access table
**

IDEA: Suppose that the set of keys is K ⊆ {0, 1, …, m–1}, and keys are distinct. Set up an array T[0 . . m–1]: x if k ∈ K and key[x] = k, T[k] = NIL otherwise. Then, operations take Θ(1) time. Problem: The range of keys can be large: • 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys), • character strings (even larger!).

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.3

Hash functions

Solution: Use a hash function h to map the universe U of all keys into T {0, 1, …, m–1}: 0

k1

K

k2

k4

k5 k3

h(k1) h(k4) h(k2) = h(k5) h(k3) m–1

U

When a record to be inserted maps to an already As each key is inserted, h maps it to a slot of T. occupied slot in T, a collision occurs.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.4

**Resolving collisions by chaining
**

• Records in the same slot are linked into a list. T

i

49 49

86 86

52 52

h(49) = h(86) = h(52) = i

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.5

Analysis of chaining

We make the assumption of simple uniform hashing: • Each key k ∈ K of keys is equally likely to be hashed to any slot of table T, independent of where other keys are hashed. Let n be the number of keys in the table, and let m be the number of slots. Define the load factor of T to be α = n/m = average number of keys per slot.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.6

Search cost

Expected time to search for a record with a given key = Θ(1 + α).

apply hash function and access slot search the list

Expected search time = Θ(1) if α = O(1), or equivalently, if n = O(m).

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.7

**Choosing a hash function
**

The assumption of simple uniform hashing is hard to guarantee, but several common techniques tend to work well in practice as long as their deficiencies can be avoided. Desirata: • A good hash function should distribute the keys uniformly into the slots of the table. • Regularity in the key distribution should not affect this uniformity.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.8

Division method

Assume all keys are integers, and define h(k) = k mod m. Deficiency: Don’t pick an m that has a small divisor d. A preponderance of keys that are congruent modulo d can adversely affect uniformity. Extreme deficiency: If m = 2r, then the hash doesn’t even depend on all the bits of k: • If k = 10110001110110102 and r = 6, then h(k) = 0110102 . h(k)

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.9

**Division method (continued)
**

h(k) = k mod m. Pick m to be a prime not too close to a power of 2 or 10 and not otherwise used prominently in the computing environment. Annoyance: • Sometimes, making the table size a prime is inconvenient. But, this method is popular, although the next method we’ll see is usually superior.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.10

Multiplication method

Assume that all keys are integers, m = 2r, and our computer has w-bit words. Define h(k) = (A·k mod 2w) rsh (w – r), where rsh is the “bit-wise right-shift” operator and A is an odd integer in the range 2w–1 < A < 2w. • Don’t pick A too close to 2w. • Multiplication modulo 2w is fast. • The rsh operator is fast.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.11

**Multiplication method example
**

h(k) = (A·k mod 2w) rsh (w – r) Suppose that m = 8 = 23 and that our computer has w = 7-bit words: 1011001 =A × 1101011 =k 10010100110011 A h(k) 0 1 7 5 4 3

.

3A

.

6

2

.

2A

L7.12

Modular wheel

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11

**Dot-product method
**

Randomized strategy: Let m be prime. Decompose key k into r + 1 digits, each with value in the set {0, 1, …, m–1}. That is, let k = 〈k0, k1, …, km–1〉, where 0 ≤ ki < m. Pick a = 〈a0, a1, …, am–1〉 where each ai is chosen randomly from {0, 1, …, m–1}.

Define ha (k ) =

i =0

∑ ai ki mod m.

Introduction to Algorithms Day 11 L7.13

r

**• Excellent in practice, but expensive to compute.
**

© 2001 by Charles E. Leiserson

**Resolving collisions by open addressing
**

No storage is used outside of the hash table itself. • Insertion systematically probes the table until an empty slot is found. • The hash function depends on both the key and probe number: h : U × {0, 1, …, m–1} → {0, 1, …, m–1}. • The probe sequence 〈h(k,0), h(k,1), …, h(k,m–1)〉 should be a permutation of {0, 1, …, m–1}. • The table may fill up, and deletion is difficult (but not impossible).

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.14

**Example of open addressing
**

Insert key k = 496: 0. Probe h(496,0) T

586 133 204 481 m–1 0

collision

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.15

**Example of open addressing
**

Insert key k = 496: 0. Probe h(496,0) 1. Probe h(496,1) T

586 133 204 481 m–1 0

collision

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.16

**Example of open addressing
**

Insert key k = 496: 0. Probe h(496,0) 1. Probe h(496,1) 2. Probe h(496,2) T

586 133 204 496 481 0

insertion

m–1

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.17

**Example of open addressing
**

Search for key k = 496: 0. Probe h(496,0) 1. Probe h(496,1) 2. Probe h(496,2) T

586 133 204 496 481 0

Search uses the same probe sequence, terminating sucm–1 cessfully if it finds the key and unsuccessfully if it encounters an empty slot.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.18

Probing strategies

Linear probing: Given an ordinary hash function h′(k), linear probing uses the hash function h(k,i) = (h′(k) + i) mod m. This method, though simple, suffers from primary clustering, where long runs of occupied slots build up, increasing the average search time. Moreover, the long runs of occupied slots tend to get longer.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.19

Probing strategies

Double hashing Given two ordinary hash functions h1(k) and h2(k), double hashing uses the hash function h(k,i) = (h1(k) + i⋅h2(k)) mod m. This method generally produces excellent results, but h2(k) must be relatively prime to m. One way is to make m a power of 2 and design h2(k) to produce only odd numbers.

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.20

**Analysis of open addressing
**

We make the assumption of uniform hashing: • Each key is equally likely to have any one of the m! permutations as its probe sequence.

Theorem. Given an open-addressed hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1–α).

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.21

**Proof of the theorem
**

Proof. • At least one probe is always necessary. • With probability n/m, the first probe hits an occupied slot, and a second probe is necessary. • With probability (n–1)/(m–1), the second probe hits an occupied slot, and a third probe is necessary. • With probability (n–2)/(m–2), the third probe hits an occupied slot, etc. n − i < n = α for i = 1, 2, …, n. Observe that m−i m

© 2001 by Charles E. Leiserson Introduction to Algorithms Day 11 L7.22

Proof (continued)

Therefore, the expected number of probes is L n 1 + n − 1 1 + n − 2 L 1 + 1 1+ m m − 1 m − 2 m − n + 1 ≤ 1 + α (1 + α (1 + α (L (1 + α )L))) ≤1+α +α 2 +α3 +L

=

i =0

αi ∑

∞

= 1 . 1−α

© 2001 by Charles E. Leiserson

**The textbook has a more rigorous proof.
**

Introduction to Algorithms Day 11 L7.23

**Implications of the theorem
**

• If α is constant, then accessing an openaddressed hash table takes constant time. • If the table is half full, then the expected number of probes is 1/(1–0.5) = 2. • If the table is 90% full, then the expected number of probes is 1/(1–0.9) = 10.

© 2001 by Charles E. Leiserson

Introduction to Algorithms

Day 11

L7.24

- Dictionariesuploaded byEd Z
- Datastage Interview Qauploaded byRam Sita
- Survey: The Solutions to Data Structuring Problem using Randomized Algorithmsuploaded byInternational Journal for Scientific Research and Development - IJSRD
- Neyer Paperuploaded byanishnair94
- 14786 Hashinguploaded byShubjit Kaur
- Hashinguploaded byNick Hartman
- Chapter 8 Data Structures and CAATs for Data Extractionuploaded byCamille Saranghae
- 21 Hashinguploaded byKun
- Hashinguploaded bysirapu
- ABOUT ASSEMBLERS_ ASSEMBLER ALGORITHM AND DATA STRUCTURES.pdfuploaded byrituraj69
- ABAP Coding in BIuploaded byVijay Chandra
- DBMS-Set1uploaded byTigist Awoke
- scimakelatex.25769.d.+person.e.+person.f.+personuploaded bymdp music
- lec7.pdfuploaded byAnonymous DbYtuqy123
- Chapter 3 Pages 86-110uploaded bymanny oluan
- Reliable Pipelinesuploaded bySavitha Chintakindi
- open end problems roleuploaded byandreiasalgueiro1923
- DS_EE Hash Stage Disk Cachinguploaded bycms_gcoles
- Analysis of 4-Parallel Radix-2^2 Feedforward FFT Architectureuploaded byrampraburohini
- C Problemsuploaded bypriyojit_hitk
- Discrete Freq DSP 1uploaded byhungpm2013
- Finaluploaded byIsha Garg
- file system implementation pptuploaded byNvn Saini
- Util Packuploaded byapi-25930603
- Strategies of the Defenderuploaded bycharlesriver
- Electronic Control System of Home Appliances Using Speech Command Wordsuploaded byIJSTR Research Publication
- Gradient Based Adaptive Beamforminguploaded byIRJET Journal
- Robust Hashing for Image Autheuploaded byRaghul Ramasamy
- Dartuploaded byPoornalalitha gadde
- Locked Data Structuresuploaded byanaatapa

- lecture11uploaded byvv
- TACTICAL COMMUNICATIONS PROTOCOL 2uploaded byvv
- DesignOfSWforEmbeddedSystemsuploaded byvv
- lecture13uploaded byvv
- Lecture 14uploaded byvv
- lecture09uploaded byvv
- Planning Algorithmuploaded byvv
- lecture12uploaded byvv
- lecture15uploaded byvv
- lecture06uploaded byvv
- lecture01uploaded byvv
- lecture04uploaded byvv
- Algorithmsuploaded byvv
- lecture03uploaded byvv
- lecture05uploaded byvv
- HSM Design principles v5 1uploaded byvv
- lecture08uploaded byvv
- lecture02uploaded byvv

Read Free for 30 Days

Cancel anytime.

Close Dialog## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Loading