1.4K views

Uploaded by mrbkiter

save

- Week 12
- DSA quiz
- File Organisation&Hasing
- Online Banking System
- Kirk Lafker
- ch05
- SAS Hash Objects Final
- Mysql Memcached En
- DS
- Indexing
- Data structures
- 10.1.1.16.3037
- H Dissertation EnginAktas
- Internet Banking System
- ds 2 marks
- Offshore Job Description_Sample
- RESUME BOOK IT.docx
- evaluation report
- 8Tools
- chapter5 home to school connection
- ic map
- Nix the Tricks
- 1)Assignment-I.pdf
- mte3106 kuliah 5.ppt
- 10 great ways to use literacy techniques
- alignment unit plan
- SAPABAP09
- formallessonplan2
- math-techology resources
- Arabesque 1
- yo_soy_la_locura_textos_0.pdf
- I.docx
- 365185558-Form-Rincian-Anggaran-Biaya.docx
- Lu Sin - Çığlık.pdf
- St. Cyprian – Patron Saint of Necromancers
- The Science of Mind - Ernest Holmes - SP.pdf
- Jurnal Herman
- Art Lili publicado.pdf
- unit 4.txt
- gg404-f18-syl.pdf
- Fr Gc 02 Formato Para Listado Maestro de Documentos
- Champion Motor grader Series 700 III
- API-ACR122U-2.04.pdf
- IMSLP17294-Baermann Clarinet Quartet Op18
- Pauta de Evaluación Tecnología_word_internet
- cat
- downloadfile.docx
- Distribuciones.doc
- IIM Rohtak Advertisement for WEBSITE
- ELEVACIONES DE UN PLANO
- UNDANGAN KEBAKTIAN.docx
- silabus (1).docx
- Legure - obrazac
- Introduction
- 94562957-CURS-VI.ppt
- ΦΙΛΟΣΟΦΙΑ ΖΗΤΗ
- 294 - Pr 26 - Transverse and Longitudinal Decomposition
- 1_SociedadMineraCerroVerde_ResumenEjecutivo.pdf
- Lectura Fichas de registro.pdf
- Chiến lược biến thể để trị
- Giải thuật quay lui Giải thuật nhánh-và-cận
- 01_Ghichep
- Approximation Algorithms
- Lex
- FuncProg
- ADT_OOP
- Vấn đề NP-đầy đủ
- 02_ToChucCongViec
- Chapter1 - Overview
- Qui hoạch động
- GrammarParser
- tÌm HiỂu vỀ Analytic Hierachy Process ThÔng
- Control Structures
- Introduction
- Heap Sort
- Hardware Basic
- definition about analysis and design Algorithm
- Multiway Tree
- divide and conquer
- Graph
- Introduction
- Search Tree
- Linked List
- Sorting
- Chiến lược giảm để trị
- Binary Tree
- Chapter 8
- Recursion

You are on page 1of 50

• Basic concepts • Hash functions • Collision resolution • Open addressing • Linked list resolution • Bucket hashing

1

Basic Concepts

• Sequential search: O(n) • Binary search: O(log2n)

found Requiring several key comparisons before the target is

2

Basic Concepts

• Search complexity:

Size

16 50 256 1,000 10,000 100,000 1,000,000

Binary

4 6 8 10 14 17 20

Sequential (Average)

8 25 128 500 5,000 50,000 500,000

**Sequential (Worst Case)
**

16 50 256 1,000 10,000 100,000 1,000,000

3

Basic Concepts

• Is there a search algorithm whose complexity is O(1)?

4

Basic Concepts

• Is there a search algorithm whose complexity is O(1)? YES.

5

Basic Concepts

keys memory addresses

hashing

Each key has only one address

6

Basic Concepts

Key

001 Harry Lee Sarah Trapp

Address

002

005

Vu Nguyen

Vu Nguyen 102002 John Adams 107095 Sarah Trapp 111060

005

Hash Function

100 002

007

Ray Black

100

John Adams

7

Basic Concepts

• Home address: address produced by a hash function. • Prime area: memory that contains all the home addresses.

8

Basic Concepts

• Synonyms: a set of keys that hash to the same location. • Collision: the location of the data to be inserted is already occupied by the synonym data.

9

Basic Concepts

• Ideal hashing:

– No location collision – Compact address space

10

Basic Concepts

Insert A, B, C hash(A) = 9 hash(B) = 9 hash(C) = 17 A

[1] [5] [9] [17]

11

Basic Concepts

Insert A, B, C hash(A) = 9 hash(B) = 9 hash(C) = 17 A

[1] [5] [9]

B and A collide at 9 B

[17]

Collision Resolution

12

Basic Concepts

Insert A, B, C hash(A) = 9 hash(B) = 9 hash(C) = 17 C

[1] [5]

B and A collide at 9 A

[9]

C and B collide at 17 B

[17]

Collision Resolution

13

Basic Concepts

Searh for B hash(A) = 9 hash(B) = 9 hash(C) = 17 C

[1] [5]

A

[9]

B

[17]

Probing

14

Hash Functions

• Direct hashing • Modulo division • Digit extraction • Mid-square • Folding • Rotation • Pseudo-random

15

Direct Hashing

• The address is the key itself: hash(Key) = Key

16

Direct Hashing

• Advantage: there is no collision. • Disadvantage: the address space (storage size) is as large as the key space

17

Modulo Division

Address = Key MOD listSize + 1 • Fewer collisions if listSize is a prime number • Example:

Numbering system to handle 1,000,000 employees Data space to store up to 300 employees hash(121267) = 121267 MOD 307 + 1 = 2 + 1 = 3

18

Digit Extraction

Address = selected digits from Key

• Example:

379452 121267 378845 160252 045128 → → → → → 394 112 388 102 051

19

Mid-square

Address = middle digits of Key2 • Example:

9452 * 9452 = 89340304 → 3403

20

Mid-square

• Disadvantage: the size of the Key2 is too large • Variations: use only a portion of the key

379452: 379 * 379 = 143641 → 364 121267: 121 * 121 = 014641 → 464 045128: 045 * 045 = 002025 → 202

21

Folding

• The key is divided into parts whose size matches the address size

Key = 123|456|789 fold shift 123 + 456 + 789 = 1368 ⇒ 368

22

Folding

• The key is divided into parts whose size matches the address size

Key = 123|456|789 fold shift 123 + 456 + 789 = 1368 ⇒ 368 fold boundary 321 + 456 + 987 = 1764 ⇒ 764

23

Rotation

• Hashing keys that are identical except for the last character may create synonyms. • The key is rotated before hashing.

original key 600101 600102 600103 600104 600105 rotated key 160010 260010 360010 460010 560010

24

Rotation

• Used in combination with fold shift

original key 600101 600102 600103 600104 600105 → → → → → 62 63 64 65 66 rotated key 160010 260010 360010 460010 560010 → → → → → 26 36 46 56 66

**Spreading the data more evenly across the address space
**

25

Pseudorandom

Key Pseudorandom Number Generator Rando m Number Modulo Division Address

y = ax + c For maximum efficiency, a and c should be prime numbers

26

Pseudorandom

• Example:

Key = 121267 a = 17 c=7 listSize = 307

Address = ((17*121267 + 7) MOD 307 + 1 = (2061539 + 7) MOD 307 + 1 = 2061546 MOD 307 + 1 = 41 + 1 = 42

27

Collision Resolution

• Except for the direct hashing, none of the others are one-to-one mapping

⇒ Requiring collision resolution methods

• Each collision resolution method can be used independently with each hash function

28

Collision Resolution

• A rule of thumb: a hashed list should not be allowed to become more than 75% full. Load factor: α = (k/n) x 100

n = list size k = number of filled elements

29

Collision Resolution

• As data are added and collisions are resolved, hashing tends to cause data to group within the list

⇒ Clustering: data are unevenly distributed across the list

**• High degree of clustering increases the number of probes to locate an element
**

⇒ Minimize clustering

30

Collision Resolution

• Primary clustering: data become clustered around a home address.

Insert A9, B9, C9, D11, E12

A B C D E

[1] [9] [10] [11] [12] [13]

31

Collision Resolution

• Secondary clustering: data become grouped along a collision path throughout a list.

Insert A9, B9, C9, D11, E12, F9

A B D E

[1]

C

F

[23]

[9] [10] [11] [12] [13] [14]

32

Open Addressing

• When a collision occurs, an unoccupied element is searched for placing the new element in. • There are different methods:

Linear probe Quadratic probe Double hashing Key offset

33

Linear Probe

• When a home address is occupied, go to the next address (the current address + 1). • Alternatives: add 1, subtract 2, add 3, subtract 4, ...

34

Linear Probe

001 002 003 Mary Dodd Sarah Trapp (379452) (070918) Bryan Devaux (121267)

Harry Eagle 166702

Hash

002

008 John Carver (378845)

306 307

Tuan Ngo

(160252)

Shouli Feldman (045128) 35

Linear Probe

001 002 003 004 Mary Dodd Sarah Trapp (379452) (070918) Bryan Devaux (121267) Harry Eagle (166702)

Harry Eagle 166702

Hash

002

008 John Carver (378845)

306 307

Tuan Ngo

(160252)

Shouli Feldman (045128) 36

Linear Probe

• Advantages:

quite simple to implement data tend to remain near their home address (significant for disk addresses)

• Disadvantages:

produces primary clustering

37

Quadratic Probe

• The address increment is the collision probe number squared (12, 22, 32, ...). • Modulo is used not to run off the end of the address list.

38

Quadratic Probe

• Disadvantage: time required to square the probe number. (n + 1)2 = n2 + 2n + 1

Increment Factor

39

Quadratic Probe

Probe Collision Probe2 and New Number Location Increment Address 1 2 3 4 5 6 7 1 2 6 15 31 56 92 12 = 1 22 = 4 32 = 9 42 = 16 52 = 25 62 = 36 72 = 49 02 06 15 31 56 92 41 Increment Factor 1 3 5 7 9 11 13 Next Increment 1 4 9 16 25 36 49

40

Quadratic Probe

• Limitation: it is not possible to generate a new address for every element in the list. (address + n2) MOD listSize + 1 ⇒ Use a list size that is a prime number (more elements reachable)

41

Pseudorandom Probe

• Generate a pseudorandom number as a new address from the collision address.

newAddress = (a x collisionAddress + c) MOD listSize + 1 a should be a large prime number

42

Pseudorandom Probe

• Significant limitation (of linear and quadratic probes as well): there is only one collision path for all keys. ⇒ produces secondary clustering

43

Secondary Clustering

hash(A) = hash(B) = hash(C) = 9 hash(D) = 11

A

[1] [9]

C

[11]

B

[13]

D

[15]

Linear probes

Search/find location for D

44

Key Offset

• Generate a new address as a function of the collision address and the key.

offset = [key / listSize] newAddress = (offset + collisionAddress) MOD listSize + 1

45

Key Offset

Key 166702 572556 067234 Home Address 2 2 2 Key Offset 543 1,865 219 Probe 1 239 026 222 Probe 2 169 050 135

offset = [166702 / 307 = 543] address = (543 + 02) MOD 307 + 1 = 239 address = (543 + 239) MOD 307 + 1 = 169

46

**Linked List Resolution
**

• Major disadvantage of Open Addressing: each collision resolution increases the probability for future collisions. ⇒ use linked lists to store synonyms

47

**Linked List Resolution
**

001 002 003 Mary Dodd Sarah Trapp (379452) (070918) Harry Eagle (166702) Bryan Devaux (121267) Chris Walljasper (572556)

008

John Carver

(378845)

overflow area prime area

306 307

Tuan Ngo

(160252)

Shouli Feldman (045128) 48

Bucket Hashing

• Hashing data to buckets that can hold multiple pieces of data. • Each bucket has an address and collisions are postponed until the bucket is full.

49

Bucket Hashing

Mary Dodd 001 (379452)

Sarah Trapp 002 Harry Eagle Ann Georgis 003

(070918) (166702) (367173)

Bryan Devaux (121267) Chris Walljasper(572556)

linear probe

Shouli Feldman (045128) 307

50

- Week 12Uploaded bySatish Gandham
- DSA quizUploaded byHenryHai Nguyen
- File Organisation&HasingUploaded byAmit Maheshwari
- Online Banking SystemUploaded byifeanyi_osi
- Kirk LafkerUploaded bySowajnya Sowji
- ch05Uploaded byfleebt
- SAS Hash Objects FinalUploaded byRamakant Sharma
- Mysql Memcached EnUploaded byMarko Drašković
- DSUploaded byadddata
- IndexingUploaded bySarita Sharma
- Data structuresUploaded byjegamca_jega_2749918
- 10.1.1.16.3037Uploaded bysahilsingh957
- H Dissertation EnginAktasUploaded bysakiraktas
- Internet Banking SystemUploaded bysyam praveen
- ds 2 marksUploaded bynofeelingrahul
- Offshore Job Description_SampleUploaded byAnimorphs
- RESUME BOOK IT.docxUploaded bysiti aminah
- evaluation reportUploaded byapi-242289043
- 8ToolsUploaded bygrannydebs
- chapter5 home to school connectionUploaded byapi-262315666
- ic mapUploaded byapi-324654624
- Nix the TricksUploaded bytina_cardone1
- 1)Assignment-I.pdfUploaded byKarteek Sirisetti
- mte3106 kuliah 5.pptUploaded byAsura Azmi
- 10 great ways to use literacy techniquesUploaded byapi-402254227
- alignment unit planUploaded byapi-349820411
- SAPABAP09Uploaded byVvenkata Ssubrahmanyam
- formallessonplan2Uploaded byapi-341227083
- math-techology resourcesUploaded byapi-210187851

- Chiến lược biến thể để trịUploaded bymrbkiter
- Giải thuật quay lui Giải thuật nhánh-và-cậnUploaded bymrbkiter
- 01_GhichepUploaded bymrbkiter
- Approximation AlgorithmsUploaded bymrbkiter
- LexUploaded bymrbkiter
- FuncProgUploaded bymrbkiter
- ADT_OOPUploaded bymrbkiter
- Vấn đề NP-đầy đủUploaded bymrbkiter
- 02_ToChucCongViecUploaded bymrbkiter
- Chapter1 - OverviewUploaded bymrbkiter
- Qui hoạch độngUploaded bymrbkiter
- GrammarParserUploaded bymrbkiter
- tÌm HiỂu vỀ Analytic Hierachy Process ThÔngUploaded bymrbkiter
- Control StructuresUploaded bymrbkiter
- IntroductionUploaded bymrbkiter
- Heap SortUploaded bymrbkiter
- Hardware BasicUploaded bymrbkiter
- definition about analysis and design AlgorithmUploaded bymrbkiter
- Multiway TreeUploaded bymrbkiter
- divide and conquerUploaded bymrbkiter
- GraphUploaded bymrbkiter
- IntroductionUploaded bymrbkiter
- Search TreeUploaded bymrbkiter
- Linked ListUploaded bymrbkiter
- SortingUploaded bymrbkiter
- Chiến lược giảm để trịUploaded bymrbkiter
- Binary TreeUploaded bymrbkiter
- Chapter 8Uploaded bymrbkiter
- RecursionUploaded bymrbkiter