

EXHIBIT 5

PART 3 OF 6


CHAPTER 6 Tables and Information Retrieval

[Figure 6.9: Implementation of a table (diagram not reproduced; labels include "access table" and "array access")]

A list is a sequence and therefore carries an order; a table need have no such order. If the index set of a table has some natural order, that order is sometimes reflected in the way the table is stored, but it is not a necessary aspect of using tables. Hence information retrieval from a table calls for different methods of access than retrieval from a list. Retrieval from a list naturally involves searching, like the methods studied in Chapter 5, whereas retrieval from a table goes directly to the desired entry.

The time needed to search a list generally depends on the number n of items in the list and, for methods that compare keys, is at least lg n. The time for accessing a table, by contrast, usually does not depend on the number of items in the table. For this reason, table access is significantly faster than list searching in many applications.

On the other hand, traversal is a natural operation for a list but not for a table. It is generally easy to move through a list performing some operation on every item. It may not be nearly so easy to perform an operation on every item of a table, particularly if the items must be processed in some order specified in advance.

Finally, we should clarify the distinction between the terms table and array. In general, we shall use table in the sense defined in this section, and we shall restrict array to mean the programming feature available in Pascal and other high-level languages, which is used for implementing both tables and contiguous lists.

6.5 HASHING

6.5.1 Sparse Tables

Index functions

We can continue to exploit table lookup even in situations where the key is no longer an index that can be used directly, as in array indexing. What we need is a one-to-one correspondence between the keys and the indices that we use to access an array. The index function we produce will be somewhat more complicated than those of previous sections, since it may need to convert the key from, say, alphabetic information into an integer, but in principle this can still be done.

The only difficulty arises when the number of possible keys exceeds the amount of space available for our table. If our keys are alphabetical words of eight letters, for example, then there are $26^8 \approx 2.1 \times 10^{11}$ possible keys, a number much greater than the number of positions that will be available in high-speed memory. In practice, however, only a small fraction of these keys will actually occur; that is, the table is sparse. Conceptually, we can regard it as indexed by a very large set, with relatively few positions actually occupied. In Pascal, for example, we might think in terms of a conceptual declaration such as

    type sparsetable = array [keytype] of item;

even though it may not be possible to implement such a declaration directly. In solving a problem it is often helpful to begin with a picture like this and only gradually pin down the details of how it is put into practice.

Hash tables

The idea of a hash table, such as the one shown in Figure 6.10, is to allow many of the different keys that might occur to be mapped to the same location in an array under the action of the index function. There is then a possibility that two records will want to be in the same place, but if the number of records that actually occur is small relative to the size of the array, this possibility will cost little time. Even when most entries in the array are occupied, hash methods can be an effective means of information retrieval.

[Figure 6.10: A hash table (an array with numbered locations; diagram not reproduced)]

We begin with a hash function that takes a key and maps it to some index in the array. This function will generally map several different keys to the same index. If the desired record is in the location given by the index, then our problem is solved; otherwise we must use some method to resolve the collision that may have occurred between two records wanting to go to the same location. There are thus two questions we must answer to use hashing. First, we must find good hash functions, and, second, we must determine how to resolve collisions.

Before approaching these questions, let us pause to outline informally the steps needed to implement hashing.

Algorithm outlines

First, an array must be declared to hold the hash table. With ordinary arrays the keys used to locate entries are usually the indices themselves, so there is no need to store them in the array. In a hash table, however, several possible keys correspond to the same index, so one field within each record of the array must be reserved for the key itself.

Next, all locations in the array must be initialized to show that they are empty. How this is done depends on the application; often it is accomplished by setting the key fields to some value that is guaranteed never to occur as an actual key. With alphanumeric keys, for example, a key consisting of all blanks might represent an empty position.

To insert a record into the hash table, the hash function for its key is first calculated. If the corresponding location is empty, the record can be inserted there. Otherwise, if the key already stored there equals the new key, insertion of the new record is not allowed. In the remaining case, where a record with a different key occupies the location, the collision must be resolved.

Retrieving the record with a given key is entirely similar. First, the hash function for the key is computed. If the desired record is in the corresponding location, the retrieval has succeeded. Otherwise, while the location is nonempty and not all locations have been examined, follow the same steps used for collision resolution. If an empty position is found, or all locations have been considered, then no record with the given key is in the table, and the search is unsuccessful.
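The insertion and retrieval steps just outlined can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not code from the text: the class name OpenAddressTable, the eight-blank sentinel BLANK, and the pluggable probe(h, i, size) rule (which must return h itself when i is 0) are our own choices, so that any of the collision-resolution methods discussed later could be plugged in. The demonstration assumes a sum-of-character-codes hash into 7 slots and simple linear probing.

```python
BLANK = " " * 8   # key value meaning "this location is empty" (assumed never a real key)


class OpenAddressTable:
    """Hash table following the insert/retrieve outline; the probe rule is pluggable."""

    def __init__(self, size, hash_fn, probe):
        self.size = size
        self.hash_fn = hash_fn          # maps a key to an index 0..size-1
        self.probe = probe              # probe(h, i, size): i-th location tried; probe(h, 0, size) == h
        self.keys = [BLANK] * size      # initialize every location to "empty"
        self.values = [None] * size

    def insert(self, key, value):
        h = self.hash_fn(key)
        for i in range(self.size):
            p = self.probe(h, i, self.size)
            if self.keys[p] == BLANK:   # empty location: the record goes here
                self.keys[p], self.values[p] = key, value
                return p
            if self.keys[p] == key:     # equal keys: insertion is not allowed
                raise KeyError(f"duplicate key {key!r}")
            # a different key is here: a collision, so try the next probe location
        raise OverflowError("no empty location found")

    def retrieve(self, key):
        h = self.hash_fn(key)
        for i in range(self.size):
            p = self.probe(h, i, self.size)
            if self.keys[p] == BLANK:   # an empty location ends an unsuccessful search
                return None
            if self.keys[p] == key:     # found the desired record
                return self.values[p]
        return None                     # all locations examined: unsuccessful


def sum_hash(key):
    return sum(map(ord, key)) % 7       # illustrative hash for a 7-slot table


def linear(h, i, size):
    return (h + i) % size               # linear probing, treating the array as circular


table = OpenAddressTable(7, sum_hash, linear)
table.insert("cat", 1)   # "cat" and "act" are anagrams, so sum_hash sends both to location 4
table.insert("act", 2)   # collision at location 4, resolved by moving on to location 5
print(table.retrieve("act"), table.retrieve("tac"))   # 2 None
```

Because retrieval retraces exactly the probe sequence used by insertion, any deterministic probe rule works in this skeleton.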

6.5.2 Choosing a Hash Function

The two principal criteria in selecting a hash function are that it should be easy and quick to compute, and that it should spread the keys that actually occur evenly across the range of indices. If we know in advance exactly which keys will occur, it is possible to construct very efficient hash functions, but generally we do not know this in advance. The usual approach, therefore, is for the hash function to take the key, chop it up, and mix the pieces together in various ways, thereby obtaining an index that (like the pseudorandom numbers generated by a computer) is uniformly distributed over the range of indices.

This process is where the word hash comes from, since it converts the key into something that bears little resemblance to it. At the same time, it is hoped that any patterns or regularities that may occur in the keys will be destroyed, so that the results will be randomly distributed.

Even though the term hash is very descriptive, some books use the more technical terms scatter-storage or key-transformation in its place.

We shall consider three methods that can be combined in various ways to build a hash function.

Truncation

Ignore part of the key, and use the remaining part directly as the index (treating non-numeric fields as their numerical codes). If, for example, the keys are eight-digit integers and the hash table has 1000 locations, then the fifth, second, and first digits from the right might be taken as the index, so that 62538194 maps to 394. Truncation is a very fast method, but it often fails to distribute the keys evenly through the table.
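A minimal Python sketch of the truncation example, assuming eight-digit integer keys and a 1000-slot table; the function name truncation_hash is ours. It also shows the weakness just noted: keys that differ only in the ignored digits always collide.

```python
def truncation_hash(key: int) -> int:
    """Index for a 1000-slot table built from the 5th, 2nd and 1st digits from the right."""
    fifth = key // 10_000 % 10   # fifth digit from the right
    second = key // 10 % 10      # second digit from the right
    first = key % 10             # rightmost digit
    return fifth * 100 + second * 10 + first


print(truncation_hash(62538194))   # 394
print(truncation_hash(99938194))   # 394: only the ignored digits differ, so the keys collide
```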

Folding

Partition the key into several parts and combine the parts in a convenient way (often using addition or multiplication) to obtain the index. For example, an eight-digit integer can be divided into groups of three, three, and two digits, the groups added together, and the sum truncated if necessary to lie in the proper range of indices. Hence 62538194 maps to 625 + 381 + 94 = 1100, which is truncated to 100. Since every part of the key can affect the value of the function, folding often achieves a better spread of indices than truncation by itself.
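The folding example can be sketched the same way, again assuming eight-digit integer keys and a 1000-slot table (folding_hash is an illustrative name). Unlike truncation, the key 99938194 no longer collides with 62538194, because every digit contributes to the sum.

```python
def folding_hash(key: int) -> int:
    """Fold an eight-digit key as 3 + 3 + 2 digits, then truncate to three digits."""
    digits = f"{key:08d}"                                  # assumes 0 <= key < 10**8
    total = int(digits[:3]) + int(digits[3:6]) + int(digits[6:])
    return total % 1000                                    # drop any leading digit: 1100 -> 100


print(folding_hash(62538194))   # 625 + 381 + 94 = 1100, truncated to 100
print(folding_hash(99938194))   # 999 + 381 + 94 = 1474, truncated to 474
```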

Modular arithmetic

Convert the key to an integer (using the above devices as desired), divide by the size of the index range, and take the remainder as the result. This amounts to using the Pascal operator mod. The spread achieved by taking a remainder depends very much on the modulus (here, the size of the hash array). If the modulus is a power of a small integer such as 2 or 10, then many keys tend to map to the same index while other indices remain unused. The best choice for the modulus is a prime number, which usually spreads the keys quite uniformly. (We shall see later that a prime modulus also improves an important method of collision resolution.) Hence, rather than choosing a hash table of size 1000, it is often better to choose either 997 or 1009; 1024 would usually be a poor choice.

Taking the remainder is usually the best way to finish calculating a hash function, since it can achieve a good spread and at the same time guarantees that the result lies in the proper range. About the only reservation is that on a tiny machine with no hardware division the calculation can be slow, in which case other methods should be considered.
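To see why the modulus matters, the following Python sketch counts how many distinct indices a patterned set of keys reaches under different moduli. The key set (500 multiples of 1000) is an assumption chosen to exaggerate a regularity that real keys might have.

```python
def buckets_used(keys, modulus):
    """Number of distinct indices produced by key mod modulus."""
    return len({k % modulus for k in keys})


# Keys with a regular pattern: every key is a multiple of 1000.
keys = [1000 * i for i in range(1, 501)]

for m in (1000, 1024, 997):
    print(m, buckets_used(keys, m))
# 1000 -> 1 (every key lands in location 0), 1024 -> 128, 997 -> 500 (all distinct)
```

A power-of-ten or power-of-two modulus keeps only the low-order digits or bits, so a pattern in those digits passes straight through; the prime 997 shares no factor with 1000 and scatters these keys.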

Example

As a simple example, let us write a hash function in Pascal that transforms a key consisting of eight alphanumeric characters into an integer in the range 0..hashsize - 1. We begin with the type

    type keytype = packed array [1..8] of char;

We can then write a simple hash function as follows:

    function Hash (x: keytype): integer;
    { Sample hash function: add the character codes, then reduce mod hashsize. }
    { hashsize is a global constant declared elsewhere (for example, 997). }
    var
      i, h: integer;
    begin
      h := 0;
      for i := 1 to 8 do
        h := h + ord(x[i]);      { add the code of each of the eight characters }
      Hash := h mod hashsize     { bring the result into the range 0..hashsize - 1 }
    end;

We have simply added the integer codes corresponding to each of the eight characters. There is no reason to believe, however, that this method will be better (or worse) than any number of others. We could, for example, subtract some of the codes, multiply them in pairs, or ignore every other character. Sometimes an application will suggest that one hash function is better than another; sometimes it takes experimentation to settle on a good one.
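For comparison, here is the sample hash function rendered in Python, with HASHSIZE = 997 assumed. The second function is not from the text; it is one hypothetical alternative of the kind the paragraph above invites experimenting with, weighting each character by its position so that anagrams no longer automatically collide.

```python
HASHSIZE = 997   # prime table size, matching the constant used later in the text


def sum_of_codes_hash(key: str) -> int:
    """Python rendering of the sample Hash: add the eight character codes, then mod."""
    padded = key.ljust(8)[:8]          # keytype holds exactly eight characters, blank-filled
    return sum(ord(c) for c in padded) % HASHSIZE


def positional_hash(key: str) -> int:
    """An alternative of our own: weight characters by position (Horner's rule, base 31)."""
    h = 0
    for c in key.ljust(8)[:8]:
        h = (h * 31 + ord(c)) % HASHSIZE
    return h


# Adding codes ignores character order, so anagrams always collide;
# the positional variant separates this particular pair.
print(sum_of_codes_hash("STOP") == sum_of_codes_hash("POTS"))   # True
print(positional_hash("STOP") == positional_hash("POTS"))       # False
```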

6.5.3 Collision Resolution with Open Addressing

Linear probing

The simplest method of resolving a collision is to start at the hash address (the location where the collision occurred) and search sequentially through the table for the desired key or an empty location. Because this method searches in a straight line, it is called linear probing. The array should be treated as circular, so that when the last location is reached, the search continues at the first location of the array.
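A Python sketch of linear-probing insertion, assuming None marks an empty location. The hash address is passed in directly rather than computed, an assumption made so that the wraparound and the growth of a cluster are easy to trace; the function name is ours.

```python
def linear_probe_insert(table, key, h):
    """Place key at hash address h or the next free location, wrapping around circularly."""
    n = len(table)
    for i in range(n):
        p = (h + i) % n              # after the last location, continue at location 0
        if table[p] is None:         # None marks an empty location
            table[p] = key
            return p
        if table[p] == key:          # already present: nothing to do
            return p
    raise OverflowError("hash table is full")


# Three of the four keys hash to location 8 and the fourth to location 9.
table = [None] * 10
for key, h in [("a", 8), ("b", 8), ("c", 9), ("d", 8)]:
    print(key, "->", linear_probe_insert(table, key, h))
# a -> 8, b -> 9, c -> 0 (wraps around), d -> 1: one cluster covering 8, 9, 0, 1
```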

Clustering

The major drawback of linear probing is that, as the table becomes about half full, there is a tendency toward clustering; that is, records start to appear in long strings of adjacent positions with gaps between the strings. The sequential searches needed to find an empty position then become longer and longer. For an example, consider Figure 6.11, where the occupied positions are shown in color. Suppose that there are n locations in the array and that the hash function chooses any of them with equal probability 1/n. Begin with a fairly uniform spread, as shown in the top diagram, in which some location a is occupied, the next location b is empty, the location c after that is occupied, and the locations d and e following c are empty. A new insertion that hashes to b will go there, but so will one that hashes to the full location a. Thus the probability that b will be filled has doubled to 2/n. Once b is filled, an insertion hashing to any of a, b, c, or d ends up in d, so the probability of filling d is 4/n. After that, e has probability 5/n of being filled. As further insertions are made, the most likely outcome is that the string of full positions beginning at a grows longer and longer, and the performance of the hash table starts to degenerate toward that of sequential search.

[Figure 6.11: Clustering in a hash table (diagram not reproduced)]

The problem of clustering is essentially one of instability: if a few keys happen by chance to lie near each other, it becomes more and more likely that other keys will join them, and the distribution becomes progressively more unbalanced.

Increment functions

To avoid the problem of clustering, we must use some more sophisticated way of choosing the sequence of locations to check when a collision occurs. There are many ways to do so. One, called rehashing, uses a second hash function to obtain the second position to consider. If this position is also filled, some other method is needed to obtain the third position, and so on. But if the first hash function already gives a fairly good spread, little is gained from an independent second hash function. We will do just as well to find a more sophisticated way of determining the distance to move from the first hash position, and to apply that method whatever the first hash location is. Hence we wish to design an increment function that can depend on the key or on the number of probes already made, and that will avoid clustering.

Quadratic probing

If there is a collision at hash address h, this method probes the table at locations h + 1, h + 4, h + 9, ..., that is, at locations $h + i^2 \pmod{\mathit{hashsize}}$ for i = 1, 2, .... In other words, the increment function is $i^2$.

This method substantially reduces clustering, but it is not obvious that it will probe all locations in the table, and in fact it does not. If hashsize is a power of 2, relatively few positions are probed. Suppose, then, that hashsize is a prime. If we reach the same location at probe i and at probe j, then

$$h + i^2 \equiv h + j^2 \pmod{\mathit{hashsize}},$$

so that

$$(i - j)(i + j) \equiv 0 \pmod{\mathit{hashsize}}.$$

Since hashsize is prime, it must divide one of the two factors. It divides i - j only when j differs from i by a multiple of hashsize, that is, only after at least hashsize probes have been made. It divides i + j, however, as soon as j = hashsize - i. Hence the probes for i = 0, 1, ..., (hashsize - 1) div 2 all reach different locations, and every later probe repeats one of them, so the total number of distinct positions probed (counting the hash address h itself) is exactly (hashsize + 1) div 2.

It is customary to regard overflow as having occurred when this number of positions has been probed, and the results are quite satisfactory.
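The counting argument can be checked numerically. This Python sketch, with an illustrative function name, collects every location that quadratic probing can reach from a given hash address; since i*i mod hashsize repeats with period hashsize, trying i = 0 through hashsize - 1 is enough.

```python
def quadratic_probe_positions(h, hashsize):
    """Set of distinct locations h + i*i (mod hashsize) reached for i = 0 .. hashsize-1."""
    return {(h + i * i) % hashsize for i in range(hashsize)}


for size in (13, 16, 997):
    reached = len(quadratic_probe_positions(0, size))
    print(size, reached, (size + 1) // 2)
# 13  -> 7 locations, equal to (13 + 1) div 2 (prime size)
# 16  -> only 4 of 16 locations (power of 2)
# 997 -> 499 locations, equal to (997 + 1) div 2 (prime size)
```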

Calculation

Note that quadratic probing can be carried out without any multiplications. After the first probe at position h, the increment is set to 1. At each successive probe, the increment is added to the previous location and then increased by 2. Since

$$1 + 3 + 5 + \cdots + (2i - 1) = i^2$$

for all i ≥ 1 (you can prove this fact by mathematical induction), probe i will look at position $h + i^2$, as desired.
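The same probe sequence computed with additions only, as just described; a short Python sketch with a name of our choosing. Each step adds the current odd increment and then raises it by 2.

```python
def quadratic_probe_sequence(h, hashsize, count):
    """First `count` quadratic-probe locations h + i*i (mod hashsize), using additions only."""
    positions = []
    p, increment = h, 1
    for _ in range(count):
        positions.append(p)
        p = (p + increment) % hashsize   # add the current odd increment 1, 3, 5, ...
        increment += 2                   # prepare the increment for the next probe
    return positions


print(quadratic_probe_sequence(3, 13, 5))   # [3, 4, 7, 12, 6]
```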

Key-dependent increments

Rather than having the increment depend on the number of probes already made, we can let it be some simple function of the key itself. For example, we could truncate the key to a single character and use its code as the increment. In Pascal, we might write

    increment := ord(k[1]);

A good approach, when the remainder after division is taken as the hash function, is to let the increment depend on the quotient of the same division. An optimizing compiler should perform that division only once, obtaining both quotient and remainder, so the calculation will be fast, and the results are generally satisfactory. In this method the increment, once determined, remains constant. If hashsize is a prime and the increment is not a multiple of hashsize (an increment of zero would probe the same location forever), it follows that the probes will step through all the entries of the array before any repetitions. Hence overflow will not be indicated until the array is completely full.
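A Python sketch of the quotient-based increment, assuming integer keys. One call to divmod supplies both the hash address (remainder) and the step (quotient), mirroring the single division mentioned above. The fallback to a step of 1 when the quotient is a multiple of hashsize is our own safeguard, not part of the text.

```python
def key_dependent_probes(key: int, hashsize: int):
    """All hashsize probe locations when the step comes from the quotient of key / hashsize."""
    quotient, h = divmod(key, hashsize)   # one division yields both remainder and quotient
    step = quotient % hashsize
    if step == 0:
        step = 1   # our own safeguard: a step that is a multiple of hashsize never moves
    return [(h + i * step) % hashsize for i in range(hashsize)]


print(key_dependent_probes(100, 13))             # starts at 100 mod 13 = 9 and steps by 7
print(len(set(key_dependent_probes(100, 13))))   # 13: every location is reached (prime size)
print(len(set(key_dependent_probes(100, 12))))   # 3: the step 8 shares a factor with 12
```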

Random probing

A final method is to use a pseudorandom number generator to obtain the increment. The generator should be one that always produces the same sequence when it starts from the same seed; the seed can then be specified as some function of the key. This method is excellent at avoiding clustering, but it is likely to be slower than the others.
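A Python sketch of random probing, using the standard library's random.Random as the generator. Deriving the seed from the key's bytes, and taking the hash address from the same number, are illustrative assumptions; the essential property is that the same key always reproduces the same sequence of increments.

```python
import random


def random_probe_sequence(key: str, hashsize: int, count: int):
    """First `count` locations probed when the increments come from a seeded PRNG."""
    seed = int.from_bytes(key.encode(), "big")   # seed chosen as a function of the key
    rng = random.Random(seed)                    # same seed -> same increment sequence
    p = seed % hashsize                          # illustrative hash address
    positions = [p]
    for _ in range(count - 1):
        p = (p + rng.randrange(1, hashsize)) % hashsize   # nonzero pseudorandom increment
        positions.append(p)
    return positions


first = random_probe_sequence("STOP", 997, 6)
again = random_probe_sequence("STOP", 997, 6)
print(first == again)   # True: retrieval can retrace exactly the probes made on insertion
```

The increments are drawn independently, so a location can be probed more than once; an implementation along these lines would still need a counter to detect overflow.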

Pascal algorithms

To conclude the discussion of open addressing, we continue with the Pascal example already introduced, which uses alphanumeric keys of the type

    type keytype = packed array [1..8] of char;

We set up the hash table with the declarations

    const
      hashsize = 997;   { a prime number of appropriate size }
      hashmax = 996;    { hashsize - 1 }
    type
      hashtable = array [0..hashmax] of item;
    var
      H: hashtable;

The hash table is initialized by setting the key field of every item in H to the special key blankword, which consists of eight blanks. We shall use the function Hash written in Section 6.5.2, together with quadratic probing for collision resolution. We have shown that quadratic probing can reach at most (hashsize + 1) div 2 distinct positions, and we keep a counter c to enforce this bound.

With these conventions, let us write a procedure to insert a record r, with key r.key, into the hash table H.

    procedure Insert (var H: hashtable; r: item);
    { Inserts r into H using quadratic probing. The type item (with field key),
      the constant blankword, and the procedure Error(message) are assumed
      to be declared elsewhere in the program. }
    var
      c,            { counter to be sure that the table is not full }
      i,            { increment used for quadratic probing }
      p: integer;   { position currently probed in H }
    begin
      c := 0;
      i := 1;
      p := Hash(r.key);
      { (hashsize + 1) div 2 distinct positions: p itself plus hashsize div 2 more probes }
      while (H[p].key <> blankword) and (H[p].key <> r.key)
            and (c < hashsize div 2) do begin
        c := c + 1;
        p := p + i;
        i := i + 2;                  { Prepare the increment for the next iteration. }
        if p > hashmax then
          p := p mod hashsize
      end;
      if H[p].key = blankword then
        H[p] := r                    { Insert the new item. }
      else if H[p].key = r.key then
        Error('The same key cannot appear twice.')
      else
        Error('Overflow: counter has reached its limit.')
    end;

The procedure to retrieve the record (if any) with a given key will have a similar form and is left as an exercise.

Deletions

Up to now we have said nothing about deleting items from a hash table. At first glance this may seem easy, requiring only that the deleted location be marked with the special key that indicates an empty position. This method will not work, because an empty location is used as the signal to stop the search for a target key. Suppose that, before the deletion, there had been a collision or two, and that some item whose hash address is the now-deleted position is actually stored elsewhere in the table. If we now try to retrieve that item, the now-empty position will stop the search, and the item cannot be found even though it is still in the table.

One remedy is to invent a second special key to be placed in any deleted position. This key would indicate that the position is free to receive an insertion when desired, but that it should not terminate the search for some other item in the table. Using this second special key will, however, make the algorithms somewhat more complicated and a bit slower. With the hash-table methods studied so far, deletions are indeed awkward and should be avoided as much as possible.
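A Python sketch of the second special key, here combined with linear probing, an illustrative sum-of-codes hash, and hypothetical names (LinearProbeTable, DELETED). Retrieval skips over deleted locations instead of stopping, while insertion remembers the first deleted or empty location on its probe path and reuses it.

```python
EMPTY = None
DELETED = object()   # the second special key: reusable by insertion, but not a stop signal


class LinearProbeTable:
    """Open addressing with linear probing and deletion by a 'deleted' marker."""

    def __init__(self, size):
        self.slots = [EMPTY] * size

    def _hash(self, key):
        return sum(map(ord, key)) % len(self.slots)   # illustrative hash function

    def _probes(self, key):
        h, n = self._hash(key), len(self.slots)
        return ((h + i) % n for i in range(n))        # circular linear probing

    def find(self, key):
        for p in self._probes(key):
            slot = self.slots[p]
            if slot is EMPTY:            # only a never-used location ends the search
                return None
            if slot is not DELETED and slot == key:
                return p
        return None

    def insert(self, key):
        free = None
        for p in self._probes(key):
            slot = self.slots[p]
            if slot is EMPTY or slot is DELETED:
                if free is None:
                    free = p             # first reusable location on the probe path
                if slot is EMPTY:
                    break                # the key cannot be stored any further along
            elif slot == key:
                return p                 # already present
        if free is None:
            raise OverflowError("hash table is full")
        self.slots[free] = key
        return free

    def delete(self, key):
        p = self.find(key)
        if p is not None:
            self.slots[p] = DELETED      # not EMPTY, so later searches keep going past it


t = LinearProbeTable(7)
t.insert("cat")          # sum of codes is 312, and 312 mod 7 = 4, so location 4
t.insert("act")          # same hash address; linear probing moves it to location 5
t.delete("cat")
print(t.find("act"))     # 5: the search passes the deleted location 4 instead of stopping
print(t.insert("tac"))   # 4: the deleted location is reused
```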

6.5.4 Collision Resolution by Chaining

Up to now we have implicitly assumed that we are using only contiguous storage for hash tables. Contiguous storage for the hash table itself is, in fact, the natural choice, since we want to refer quickly to arbitrary positions in the table, and linked storage is not suited to random access. There is, however, no reason why linked storage should not be used for the records themselves. We can take the hash table itself as an array of pointers to the records, that is, as an array of list headers. An example appears in Figure 6.12.

[Figure 6.12: A chained hash table (diagram not reproduced)]

It is traditional to call the linked lists hanging from the hash table chains, and to call this method collision resolution by chaining.

Advantages of linked storage

This approach has several advantages. The first, and the most important when the records themselves are quite large, is that considerable space may be saved. Since an array must be declared at compilation time, storing the records in it means that enough space must be set aside at compilation time to avoid overflow. If the records are large, the many empty positions that are desirable for avoiding collisions will consume considerable space that might be needed elsewhere. If, on the other hand, the hash table contains only pointers to the records, and each pointer needs only one word, then the hash table can be reduced in size by a large factor (essentially by a factor equal to the size of a record) and becomes small relative to the space available for the records or for other uses.

The second major advantage of keeping only pointers in the hash table is that it allows simple and efficient collision handling. We need only add a link field to each record and organize all the records with a single hash address as a linked list. With a good hash function, few keys will give the same hash address, so the linked lists will be short and can be searched quickly. Clustering is no problem at all, because keys with distinct hash addresses always go to distinct lists.

A third advantage is that the size of the hash table no longer needs to exceed the number of records. If there are more records than entries in the table, it means only that some of the linked lists must contain more than one record. Even if there are several times as many records as table entries, the average length of the linked lists will remain small, and sequential search on the appropriate list will remain efficient.

Finally, deletion becomes a quick and easy task in a chained hash table. It proceeds in exactly the same way as deletion from a simple linked list.

Disadvantage of linked storage

These advantages of chained hash tables are indeed powerful. Lest you believe that chaining is always superior to open addressing, however, we should point out one important disadvantage: all the links require space. If the records are large, this space is negligible compared with that needed for the records themselves; but if the records are small, it is not.

Suppose, for example, that the links take one word each and that each item takes only one word (the key alone). Such applications are quite common: the hash table is used only to answer some yes-or-no question about a key. Suppose that we use chaining and make the hash table itself quite small, with the same number n of entries as there are items. Then we use 3n words of storage altogether: n for the hash table, n for the keys, and n for the links to the next node (if any) on each chain. Since the hash table will be nearly full, there will be many collisions, and some of the chains will hold several items, so searching will be somewhat slow. Suppose, on the other hand, that we use open addressing. The same 3n words of storage, placed entirely in the hash table, leave it only one-third full, so there will be relatively few collisions and the search for any given item will be faster.

Pascal algorithms

A chained hash table in Pascal takes declarations such as

    type
      pointer = ^node;
      node = record
        info: item;
        next: pointer
      end;
      list = record
        head: pointer
      end;
      hashtable = array [0..hashmax] of list;

The record type node consists of an item, called info, and an additional field, called next, that points to the next node on a linked list.

The code needed to initialize the hash table is

    for i := 0 to hashmax do
      H[i].head := nil;

We can even use previously written procedures to access the hash table. The hash function itself is no different from the one used with open addressing, and for data retrieval we can simply use the linked version of the procedure SequentialSearch from Section 5.2, as follows:

    procedure Retrieve (var H: hashtable; target: keytype;
                        var found: Boolean; var location: pointer);
    { Looks for a node with key target in the chained hash table H. On return,
      found reports success and location points to the node if it was found.
      Relies on the linked SequentialSearch of Section 5.2. }
    begin
      SequentialSearch(H[Hash(target)], target, found, location)
    end;

Our procedure for inserting a new entry assumes that its key does not already appear in the table; otherwise, only the most recent insertion with a given key will be retrievable.

    procedure Insert (var H: hashtable; p: pointer);
    { Inserts the node p^ at the head of the chain for its hash address. }
    var
      i: integer;   { index in the hash table }
    begin
      i := Hash(p^.info.key);
      p^.next := H[i].head;
      H[i].head := p
    end;

As you can see, both of these procedures are significantly simpler than the versions for open addressing, since collision resolution is not a problem.
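The same ideas in a compact Python sketch, with hypothetical class names and an illustrative sum-of-codes hash. Insertion pushes a node onto the front of its chain, as the Pascal Insert does, retrieval is a sequential search of a single chain, and deletion (the subject of Exercise E6) unlinks a node exactly as in an ordinary linked list.

```python
class Node:
    """One entry on a chain: the item plus a link to the next node."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, hashsize=997):
        self.heads = [None] * hashsize               # every chain starts out empty (nil)

    def _hash(self, key):
        return sum(map(ord, key)) % len(self.heads)  # illustrative hash function

    def insert(self, key, value):
        """Push a node onto the front of its chain (assumes key is not already present)."""
        i = self._hash(key)
        self.heads[i] = Node(key, value, self.heads[i])

    def retrieve(self, key):
        """Sequential search of the single chain that could hold key."""
        node = self.heads[self._hash(key)]
        while node is not None and node.key != key:
            node = node.next
        return None if node is None else node.value

    def delete(self, key):
        """Unlink the node with this key, just as in an ordinary linked list."""
        i = self._hash(key)
        prev, node = None, self.heads[i]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False                             # key not in the table
        if prev is None:
            self.heads[i] = node.next                # node was first on its chain
        else:
            prev.next = node.next
        return True


table = ChainedHashTable(7)
for word in ["cat", "act", "dog", "emu", "gnu", "yak", "owl", "eel", "bee", "ant"]:
    table.insert(word, len(word))    # ten items in only seven chains is fine
print(table.retrieve("act"), table.delete("cat"), table.retrieve("cat"))   # 3 True None
```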

Exercises 6.5

E1. Write a Pascal procedure to insert an item into a hash table with open addressing and linear probing.

E2. Write a Pascal procedure to retrieve an item from a hash table with open addressing and (a) linear probing; (b) quadratic probing.

E3. Devise a simple, easy-to-calculate hash function for mapping three-letter words to integers between 0 and n - 1, inclusive. Find the values of your function on the words

    PAL  LAP  PAM  MAP  PAT  PET  SET  SAT  TAT  BAT

for n = 11, 13, 17, 19. Try for as few collisions as possible.

E4. Suppose that a hash table contains hashsize = 13 entries indexed from 0 through 12 and that the following keys are to be mapped into the table:

    10  100  32  45  58  126  29  200  400  ...

(a) Determine the hash addresses and find how many collisions occur when these keys are reduced mod hashsize.
(b) Determine the hash addresses and find how many collisions occur when these keys are first folded by adding their digits together (in ordinary decimal representation) and then reduced mod hashsize.
(c) Find a hash function that will produce no collisions for these keys. (A hash function that has no collisions for a fixed set of keys is called perfect.)
(d) Repeat the previous parts of this exercise for hashsize = 11. (A hash function that produces no collisions for a fixed set of keys that completely fills the hash table is called minimal perfect.)

E5. Another method for resolving collisions with open addressing is to keep a separate array, called the overflow table, into which all items that collide with an occupied location are put. They can either be inserted with another hash function or simply inserted in order, with sequential search used for retrieval. Discuss the advantages and disadvantages of this method.

E6. Write an algorithm for deleting a node from a chained hash table.

E7. Write a deletion algorithm for a hash table with open addressing, using a second special key to indicate a deleted item (see the discussion of deletions in Section 6.5.3). Change the retrieval and insertion algorithms accordingly.

E8. With linear probing, it is possible to delete an item without using a second special key, as follows. Mark the deleted entry empty. Search until another empty position is found. If the search finds a key whose hash address does not lie cyclically between the empty position and the key's current position, then move the key back to the empty position, make its previous position empty, and continue from the new empty position. Write an algorithm to implement this method. Do the retrieval and insertion algorithms need modification?

Programming Project 6.5

P1. Consider the set of all 35 Pascal reserved words listed in Appendix C.2.1. Treat these words as strings of nine characters, where words shorter than nine letters are filled with blanks on the right.
(a) Devise an integer-valued function that produces different values when applied to all 35 reserved words. [You may find it helpful to write a short program to assist. Your program could read the words from a file, apply the function you devise, and determine what collisions occur.]
(b) Find the smallest integer hashsize such that, when the values of your function are reduced mod hashsize, all 35 values remain distinct.
(c) Modify your function as necessary until you can achieve hashsize = 35 in the preceding part. (You will then have discovered a minimal perfect hash function for the 35 Pascal reserved words.)

6.6

The

ANALYSIS OF HASHING

Birthday Surprise

The likelihood of collisions in hashing relates to the well-known mathematical diversion: How many randomly chosen people need to be in a room before it becomes likely that two people will have the same birthday (month and day)? Since (apart from leap years) there are 365 possible birthdays, most people guess that the answer will be in the hundreds. In fact, the answer is only 23 people.

We can determine the probabilities for this question by answering its opposite: With $m$ randomly chosen people in a room, what is the probability that no two have the same birthday?

Start with any person, and check his birthday off on a calendar. The probability that a second person has a different birthday is $364/365$. Check it off. The probability that a third person has a different birthday is now $363/365$.

Continuing this way, we see that if the first $m-1$ people have different birthdays, then the probability that person $m$ has a different birthday is $(365-m+1)/365$. Since the birthdays of different people are independent, the probabilities multiply. The probability that $m$ people all have different birthdays is therefore

$$\frac{364}{365}\times\frac{363}{365}\times\frac{362}{365}\times\cdots\times\frac{365-m+1}{365}.$$

This expression becomes less than 0.5 whenever $m \ge 23$.

In regard to hashing, the birthday surprise tells us that with any problem of reasonable size, we are almost certain to have some collisions. Our approach, therefore, should be not only to try to minimize the number of collisions, but also to handle those that occur as expeditiously as possible.
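The probability product for the birthday question is easy to evaluate directly. The Python sketch below uses function names of our own choosing. It computes the probability that m people all have different birthdays and finds the smallest group for which that probability drops below one half.

Passing a table size as `days` gives the chance that m keys, hashed at random into that table, suffer no collision at all.

```python
def prob_all_distinct(m, days=365):
    """Probability that m people (or keys) all fall on different days (slots)."""
    p = 1.0
    for j in range(1, m):            # person j+1 must avoid the j days taken
        p *= (days - j) / days
    return p

def smallest_group(days=365, threshold=0.5):
    """Smallest m for which prob_all_distinct(m, days) drops below threshold."""
    m = 1
    while prob_all_distinct(m, days) >= threshold:
        m += 1
    return m
```

For example, `smallest_group()` returns 23, and `prob_all_distinct(23)` is about 0.4927.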

Counting Probes

As with other methods of information retrieval, we would like to know how many comparisons of keys occur on average during both successful and unsuccessful attempts to locate a given target key. We shall use the word probe for looking at one item and comparing its key with the target.


SECTION 6.6   Analysis of Hashing   211

The number of probes we need clearly depends on how full the table is. Therefore (as for searching methods), we let $n$ be the number of items in the table, and we let $t$ (which is the same as hashsize) be the number of positions in the array. The load factor of the table is $\lambda = n/t$.

Thus $\lambda = 0$ signifies an empty table, and $\lambda = 0.5$ a table that is half full. For open addressing, $\lambda$ can never exceed 1, but for chaining there is no limit on the size of $\lambda$. We consider chaining and open addressing separately.

Analysis of Chaining

With a chained hash table we go directly to one of the linked lists before doing any probes. Suppose that the chain that will contain the target (if it is present) has $k$ items.

If the search is unsuccessful, then the target will be compared with all $k$ of the corresponding keys. Since the items are distributed uniformly over all $t$ lists (equal probability of appearing on any one list), the expected number of items on the one being searched is $\lambda = n/t$. Hence the average number of probes for an unsuccessful search is $\lambda$.

Now suppose that the search is successful. From the analysis of sequential search, we know that the average number of comparisons is $\tfrac12(k+1)$, where $k$ is the length of the chain containing the target.

But the expected length of this chain is no longer $\lambda$, since we know in advance that it must contain at least one node (the target). The $n-1$ nodes other than the target are distributed uniformly over all $t$ chains. Hence the expected number on the chain with the target is $1 + (n-1)/t$. Except for tables of trivially small size, we may approximate $(n-1)/t$ by $n/t = \lambda$. Hence the average number of probes for a successful search is very nearly

$$\tfrac12(k+1) \approx \tfrac12\bigl((1+\lambda)+1\bigr) = 1 + \tfrac12\lambda.$$

Analysis of Open Addressing

For our analysis of the number of probes done in open addressing, let us first ignore the problem of clustering. We assume that not only are the first probes random, but that after a collision the next probe will be random over all remaining positions of the table. In fact, let us assume that the table is so large that all the probes can be regarded as independent events.

Let us first study an unsuccessful search. The probability that the first probe hits an occupied cell is $\lambda$, the load factor. The probability that a probe hits an empty cell is $1-\lambda$.

The probability that an unsuccessful search terminates in exactly two probes is therefore $\lambda(1-\lambda)$. Similarly, the probability that exactly $k$ probes are made in an unsuccessful search is $\lambda^{k-1}(1-\lambda)$. The expected number $U(\lambda)$ of probes in an unsuccessful search is therefore

$$U(\lambda) = \sum_{k=1}^{\infty} k\,\lambda^{k-1}(1-\lambda).$$

This sum is evaluated in Appendix A.1; we obtain thereby

$$U(\lambda) = \frac{1}{(1-\lambda)^{2}}\,(1-\lambda) = \frac{1}{1-\lambda}.$$



To count the probes needed for a successful search, note that the number needed will be exactly the same as the number of probes made in the unsuccessful search that preceded inserting the item.

Now let us consider the table as beginning empty, with each item inserted one at a time. As these items are inserted, the load factor grows slowly from 0 to its final value, $\lambda$. It is reasonable for us to approximate this step-by-step growth by continuous growth and to replace a sum with an integral. We conclude that the average number of probes in a successful search is approximately

$$S(\lambda) = \frac{1}{\lambda}\int_{0}^{\lambda} U(\mu)\,d\mu = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}.$$

Similar calculations may be done for open addressing with linear probing, where it is no longer reasonable to assume that successive probes are independent. The details, however, are rather more complicated, so we present only the results. For the derivation, consult the references at the end of the chapter.

For linear probing, the average number of probes for an unsuccessful search increases to

$$U(\lambda) = \frac12\Bigl(1 + \frac{1}{(1-\lambda)^{2}}\Bigr),$$

and for a successful search the number becomes

$$S(\lambda) = \frac12\Bigl(1 + \frac{1}{1-\lambda}\Bigr).$$
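The evaluation deferred to Appendix A.1 can be sketched in a few lines, along with the integral for $S(\lambda)$ under random probing. It uses only the derivative of the geometric series, valid for $0 \le \lambda < 1$:

```latex
\begin{align*}
\sum_{k=1}^{\infty} k\,x^{k-1}
  &= \frac{d}{dx}\sum_{k=0}^{\infty} x^{k}
   = \frac{d}{dx}\,\frac{1}{1-x}
   = \frac{1}{(1-x)^{2}}, \qquad |x| < 1,\\
U(\lambda) &= (1-\lambda)\sum_{k=1}^{\infty} k\,\lambda^{k-1}
   = \frac{1-\lambda}{(1-\lambda)^{2}}
   = \frac{1}{1-\lambda},\\
S(\lambda) &= \frac{1}{\lambda}\int_{0}^{\lambda}\frac{d\mu}{1-\mu}
   = \frac{1}{\lambda}\Bigl[-\ln(1-\mu)\Bigr]_{0}^{\lambda}
   = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}.
\end{align*}
```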

Theoretical Comparisons

Figure 6.13 gives the values of the foregoing expressions for different values of the load factor.

Load factor                  0.10    0.50    0.80    0.90    0.99    2.00

Successful search, expected number of probes:
Chaining                     1.05    1.25    1.40    1.45    1.50    2.00
Open, random probes          1.05    1.4     2.0     2.6     4.6      -
Open, linear probes          1.06    1.5     3.0     5.5    50.5      -

Unsuccessful search, expected number of probes:
Chaining                     0.10    0.50    0.80    0.90    0.99    2.00
Open, random probes          1.1     2.0     5.0    10.0   100.       -
Open, linear probes          1.12    2.5    13.     50.   5000.       -

Figure 6.13. Theoretical comparison of hashing methods
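The entries of Figure 6.13 follow directly from the formulas derived in this section. The short Python sketch below evaluates them, using function names of our own. It can reproduce the table or fill in other load factors. Open addressing is evaluated only for load factors below 1.

```python
import math

def chaining(lam):
    """(successful, unsuccessful) expected probes for chaining."""
    return 1 + lam / 2, lam

def random_probing(lam):
    """Open addressing with random probes; requires 0 < lam < 1."""
    return (1 / lam) * math.log(1 / (1 - lam)), 1 / (1 - lam)

def linear_probing(lam):
    """Open addressing with linear probes; requires 0 <= lam < 1."""
    return 0.5 * (1 + 1 / (1 - lam)), 0.5 * (1 + 1 / (1 - lam) ** 2)

if __name__ == "__main__":
    for lam in (0.10, 0.50, 0.80, 0.90, 0.99):
        rows = [chaining(lam), random_probing(lam), linear_probing(lam)]
        print(f"{lam:4.2f}", "  ".join(f"{s:7.2f}/{u:8.2f}" for s, u in rows))
    s, u = chaining(2.0)
    print(f"2.00 chaining only: {s:.2f}/{u:.2f}")
```

Each printed pair is successful/unsuccessful for chaining, random probing, and linear probing, in that order.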

We can draw several conclusions from this table. First, it is clear that chaining consistently requires fewer probes than does open addressing. On the other hand, traversal of the linked lists is usually slower than array access, which can reduce the advantage, especially if key comparisons can be done quickly.

Chaining comes into its own when the records are large and comparison of keys takes significant time. Chaining is also especially advantageous when unsuccessful searches are common. With chaining, an empty list or a very short list may be found, so that often no key comparisons at all need be done to show that a search is unsuccessful.

With open addressing and successful searches, the simpler method of linear probing is not significantly slower than more sophisticated methods, at least until the table is almost completely full. For unsuccessful searches, however, clustering quickly causes linear probing to degenerate into a long sequential search. We might conclude, therefore, that linear probing is quite satisfactory if searches are quite likely to be successful and the load factor is moderate. In other circumstances, another method should be used.

Empirical Comparisons

It is important to remember that the computations giving Figure 6.13 are only approximate. Also, in practice nothing is completely random, so we can always expect some differences between the theoretical results and actual computations. For the sake of comparison, therefore, Figure 6.14 gives the results of one empirical study, using 900 keys that are pseudorandom numbers.

Figure 6.14. Empirical comparison of hashing methods (average number of probes in successful and unsuccessful searches for chaining, open addressing with quadratic probes, and open addressing with linear probes, at load factors 0.1, 0.5, 0.8, 0.9, 0.99, and 2.0)

Conclusions

In comparison with other methods of information retrieval, the important thing to note about all these numbers is that they depend only on the load factor, not on the absolute number of items in the table. Retrieval from a hash table with 20,000 items in 40,000 possible positions is no slower, on average, than retrieval from a table with 20 items in 40 possible positions.

With sequential search, a list 1000 times the size will take 1000 times as long to search. With binary search, this ratio is reduced to 10 (more precisely, to $\lg 1000$). Still, the time needed increases with the size, which it does not with hashing.

Finally, we should emphasize the importance of devising a good hash function, one that executes quickly and maximizes the spread of keys. If the hash function is poor, the performance of hashing can degenerate to that of sequential search.



Exercises 6.6

E1. Suppose that each item (record) in a hash table occupies $s$ words of storage (exclusive of the pointer field needed if chaining is used), and suppose that there are $n$ items in the hash table.
(a) If the load factor is $\lambda$ and open addressing is used, determine how many words of storage will be required for the hash table.
(b) If chaining is used, then each node will require $s+1$ words, including the pointer field. How many words will be used altogether for the $n$ nodes?
(c) If the load factor is $\lambda$ and chaining is used, how many words will be used for the hash table itself? (Recall that with chaining the hash table itself contains only pointers requiring one word each.)
(d) Add your answers to the two previous parts to find the total storage requirement for load factor $\lambda$ and chaining.
(e) If $s$ is small, then open addressing requires less total memory for a given $\lambda$, but for large $s$, chaining requires less space altogether. Find the break-even value for $s$, at which both methods use the same total storage. Your answer will depend on the load factor $\lambda$.

E2. Figures 6.13 and 6.14 are somewhat distorted in favor of chaining, because no account is taken of the space needed for links (see Section 6.5.4). Produce tables like Figure 6.13 in which the load factors are calculated for the case of chaining. For open addressing, add the space required by the links to the hash table, thereby reducing the load factor.
(a) Given $n$ nodes in linked storage connected to a chained hash table, with $s$ words per item (plus 1 more for the link), and with load factor $\lambda$, find the total amount of storage that will be used, including links.
(b) If this same amount of storage is used in a hash table with open addressing and $n$ items of $s$ words each, find the resulting load factor. This is the revised load factor for open addressing.
(c) Produce a table like Figure 6.13 for a given value of $s$.
(d) Produce another table like Figure 6.13 for a different value of $s$.
(e) What will the table look like when each item takes 100 words?

E3. One reason why the answer to the birthday problem is surprising is that it differs from the answers to apparently related questions. For the following, suppose that there are $n$ people in the room, and disregard leap years.
(a) What is the probability that someone in the room will have a birthday on a random date drawn from a hat?
(b) What is the probability that at least two people in the room will have that same random birthday?
(c) If we choose one person and find his birthday, what is the probability that someone else in the room will share the birthday?

E4. In a chained hash table, suppose that it makes sense to speak of an order for the keys, and suppose that the nodes in each chain are kept in order by key. Then a search can be terminated as soon as it passes the place where the key should be, if present.
(a) How many fewer probes will be done, on average, in an unsuccessful search? In a successful search?
(b) How many probes are needed, on average, to insert a new node in the right place?
(c) Compare your answers with the corresponding numbers derived in the text for the case of unordered chains.

E5. In our discussion of chaining, the hash table itself contained only pointers, the list headers for each of the chains. One variant method is to place the first actual item of each chain in the hash table itself. (An empty position is indicated by an impossible key, as with open addressing.) With a given load factor, calculate the effect on space of this method, as a function of the number of words (except links) in each item. (A link takes one word.)

Programming Project 6.6

P1. Produce a table like Figure 6.14 for your computer, by writing and running test programs to implement the various kinds of hash tables and load factors.
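A test program of the kind this project asks for can be quite small. The Python sketch below is one possibility, not the program behind Figure 6.14. It makes three assumptions:

- The hash function is replaced by uniformly random home addresses from Python's `random` module.
- With chaining, new nodes are appended at the end of a chain.
- Successful searches are averaged over every item, and unsuccessful searches over every starting position.

```python
import random

def simulate_chaining(t, lam, rng):
    """Average probes (successful, unsuccessful) for a chained table of size t."""
    n = int(lam * t)
    lengths = [0] * t
    depth = []                        # position of each item within its chain
    for _ in range(n):
        h = rng.randrange(t)          # random home address stands in for h(key)
        lengths[h] += 1
        depth.append(lengths[h])      # node appended at the end of chain h
    successful = sum(depth) / n
    unsuccessful = sum(lengths) / t   # every key on the chain is compared
    return successful, unsuccessful

def simulate_linear(t, lam, rng):
    """Average probes (successful, unsuccessful) for linear probing; lam < 1."""
    n = int(lam * t)
    occupied = [False] * t
    insert_probes = []
    for _ in range(n):
        h = rng.randrange(t)
        probes = 1
        while occupied[h]:
            h = (h + 1) % t
            probes += 1
        occupied[h] = True
        insert_probes.append(probes)  # a later search repeats exactly these probes
    successful = sum(insert_probes) / n
    total = 0
    for start in range(t):            # unsuccessful search from every address
        h, probes = start, 1
        while occupied[h]:
            h = (h + 1) % t
            probes += 1
        total += probes
    return successful, total / t
```

For example, running `simulate_linear(10007, lam, random.Random(1))` for several load factors gives numbers that can be set beside the linear-probing rows of Figure 6.13.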

6.7 CONCLUSIONS: COMPARISON OF METHODS

This chapter and the previous one have together explored four quite different methods of information retrieval: sequential search, binary search, table lookup, and hashing. If we are to ask which of these is best, we must first select the criteria by which to answer.

These criteria will include both the requirements imposed by the application and other considerations that affect our choice of data structures, since the first two methods are applicable only to lists and the second two to tables. In many applications, however, we are free to choose either lists or tables for our data structures.

In regard both to speed and convenience, ordinary lookup in contiguous tables is certainly superior. There are, however, many applications to which it is inapplicable, such as when a list is preferred or the set of keys is sparse. It is also inappropriate whenever insertions or deletions are frequent, since such actions in contiguous storage may require moving large amounts of information.

Which of the other three methods is best depends on other criteria, such as the form of the data.

Sequential search is certainly the most flexible of our methods. The data may be stored in any order, with either contiguous or linked representation.

Binary search is much more demanding. The keys must be in order, and the data must be in random-access representation (contiguous storage).

Hashing requires even more peculiar ordering of the data, well suited to retrieval from the hash table, but generally useless for any other purpose. If the data are to be available immediately for human inspection, then some kind of order is essential, and a hash table is inappropriate.

Finally, there is the question of the unsuccessful search. Sequential search and hashing, by themselves, say nothing except that the search was unsuccessful. Binary search can determine which data have keys closest to the target, and perhaps thereby can provide useful information.


Copyright by Wadsworth, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of the publisher, Brooks/Cole Publishing Company, a division of Wadsworth, Inc. Printed in the United States of America. Library of Congress Cataloging in Publication Data. Apple is a trademark of Apple Computer, Inc.


310   Chapter 7   Sets

... the set operations union, intersection, and difference ... Should they be included? If so, how would the specifications ...

7.4 Hashed Implementations

We have studied several implementation methods, using linked lists, arrays, and trees, in which records are stored and later retrieved by searching. In each of these implementations, the search proceeds in the same general way: the target key is compared with the keys of records in the data structure until either the desired key is found or the data structure is exhausted.

The pattern in which records are probed depends on the structure. A sorted linear list implemented in an array can be searched by binary search, but a linked list can only be searched sequentially.

The same question can be asked of every one of these structures: is it possible to create a data structure that can be searched with the fewest possible probes? Suppose, for example, that given a key we could compute the address of the record directly, using a function of the key whose value is the memory address of the record identified by that key.

We shall see that the answer is a qualified yes. Such functions can be found if all of the keys in the data set are known in advance, but they are difficult to determine. They are called perfect hashing functions, and they are examined further in a later section.

In practice we use a hybrid scheme. The function does not necessarily calculate the exact address of the target record; it only gives a home address, and the desired record is then looked for in the table starting from that address. Functions of this kind are known as hashing functions, in contrast to perfect hashing functions. They are usually easy to determine and can give excellent performance.

If the record being searched for is not at its home address, other addresses are searched as required; this search is known as rehashing. In Section 7.4.1 we introduce a number of hashing functions, and in Section 7.4.2 several rehashing strategies. In Section 7.5 we examine the performance of the rehashing strategies, and Section 7.6 takes up further topics.

The fundamental idea underlying the hashing pattern is the antithesis of that underlying the binary search tree. The binary search tree arranges records in relation to one another so that a search can follow those relations. Hashing takes the diametrically opposite approach: it scatters records randomly throughout memory or storage space, in the so-called hash table.

The hashing function can be thought of as a pseudo-random-number generator that uses the key value as the seed and outputs the home address of the element containing the record.

One of the drawbacks of hashing is the random location of stored records. There is no notion of first, next, root, parent, or child, or of anything analogous. Thus hashing is not appropriate for implementing structures that involve relationships among constituent elements. It is, however, appropriate for implementing sets of keyed elements, and for that reason hashing is discussed in this chapter.

One of the virtues of hashing is that it allows us to find records with O(1) probes. The number of probes required in the find operation does not depend on the number of records in the structure. By contrast, the array and linked list implementations discussed so far require O(n) probes when searched sequentially, and a binary search tree requires O(log2 n). Since hashing requires the fewest probes, it is frequently considered a particularly effective search technique.

Hashing is sometimes also considered a storage technique, since it stores elements in a table. All of these views are correct. We choose to view hashing as an address-calculation technique; its advantages and disadvantages are not changed by the view we take.

It is convenient to consider the hash table to be an array of records and to let the hash function calculate the index value of the record's home address in the array, rather than its memory address. Once the appropriate index value is computed, the transformation into an actual memory address is done by the array mapping function. The complete hash table can then be represented as shown in Figure 7.12.

    const tablesize = ... ;                     { number of positions in the table }
    type  position  = 0..tablesize - 1;
    var   table     : array[position] of element;   { the hash table }

Figure 7.12  Array representation of hash table

Suppose that we have a hash table of seven records, table[0..6], each record consisting of an integer key and a field of type array[1..10] of char. Suppose also that the hash function is h(key) = key mod 7. Notice that the value produced by this function is always an integer between 0 and 6, which is within the range of indexes of the table.



Initially the table is empty, as shown in Figure 7.13. If the first record we store has key 374, then h(374) = 374 mod 7 = 3, and we store the record in table[3]. This is shown in Figure 7.14.

The next record is stored in the same way, at the position given by its hash value; the table then appears as shown in Figure 7.15. The third record, with key 911, gives h(911) = 911 mod 7 = 1, and the resulting table is shown in Figure 7.16.

Retrieval of any of the records already in the table is a simple matter. The target key is presented to the hash function, which reproduces the same address as it did when the record was stored.

If the key 740 were not in the table, the hashing function would produce h(740) = 740 mod 7 = 5. Interrogating table[5], we find that it is empty, and we conclude that no record with key 740 is in the table.

The example we have just seen was constructed to conceal a serious problem: the keys were carefully chosen so that each hashed to a different table position. Suppose now that insertion of a record with key 22 is attempted. Then h(22) = 22 mod 7 = 1, but table[1] is already occupied by the record stored last, the one with key 911. This is called a collision. A collision happens when two data with different key values hash to the same location.
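The key mod 7 example can be traced in a few lines. The Python sketch below makes some simplifying assumptions:

- The names are illustrative, and only the keys 374, 911, 740, and 22 from the text are used.
- A record is represented by its key alone, and `None` marks an empty position.
- Storing a key whose home address is already taken raises an error rather than resolving the collision.

```python
TABLE_SIZE = 7

def h(key):
    """The hash function of the example: key mod 7."""
    return key % TABLE_SIZE

table = [None] * TABLE_SIZE        # None marks an empty position

def store(key):
    """Store key at its home address; report a collision instead of overwriting."""
    i = h(key)
    if table[i] is not None and table[i] != key:
        raise KeyError(f"collision: {key} hashes to {i}, occupied by {table[i]}")
    table[i] = key
    return i

def retrieve(key):
    """Return the index holding key, or None if that position is empty or differs."""
    i = h(key)
    return i if table[i] == key else None
```

The calls behave as in the example:

- `store(374)` returns 3.
- `store(911)` returns 1.
- `retrieve(740)` returns None, because table[5] is empty.
- `store(22)` then reports a collision with 911 at position 1.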

Why are collisions important, and when do they happen? Suppose that a firm's employee records are to be hashed on Social Security number. To guarantee each employee key its own slot, the firm would have to reserve a hash table entry for every possible Social Security number, one billion of them, even though it has comparatively few employees. With a table of practical size, unless the hash function is perfect, the probability that there are no collisions is essentially zero.

Hash functions that produce no collisions for a given set of data are so rare that it is worth looking for them only in very special circumstances, which are discussed later in the chapter. In the meantime, we need to decide what to do when a collision does occur. With careful design, strategies for handling collisions, called rehashing or collision-resolution strategies, are simple. We will discuss them in Section 7.4.2.



7.4.1 Hashing Functions

We selected the hashing function h(key) = key mod 7 in the example we have just completed. We will now see why that was a reasonable thing to do, and we will also look at a number of other hashing functions.

There is a large and diverse group of hashing functions that have been proposed since the advent of the hashing technique. Some are simple and straightforward; others are exotic, and almost all of the latter are computationally complex. Since speed of computation is an important factor in the use of hashing functions, we will confine our attention to some of the more effective simple methods. Lum (1971) gives a good review of many others.

Good hashing functions have two desirable properties:

1. They compute rapidly.
2. They produce a nearly random distribution of index values.

We will now consider several hashing functions.

Digit selection. The first hashing function that we will discuss is digit selection. Suppose that the keys we are dealing with are strings of digits, such as the nine-digit Social Security numbers d1 d2 ... d9. If the population comprising the data set is randomly chosen, then choosing the last three digits d7 d8 d9 will give a good random distribution of values. One possible implementation is the following:

    var table : array[0..999] of person;

where person is the record type for the key and the information that we wish to store. Notice that the hashing function in this case is simply h(key) = key mod 1000, which strips off the last three digits of the key.

Deciding which digits to take depends on the population. If we are dealing with students at a state university, for example, the last three digits are probably a good choice, whereas the first three digits d1 d2 d3 are probably not.

Student bodies of state universities tend to draw from a single geographical region, and the first three digits of a Social Security number are based on the geographical region in which the number was first issued. Most California numbers, for example, begin with the digit 5, with the second and third digits indicating subregions; 567 is very common. If the first three digits were used at a California university, almost all of the records would map into positions in the 500s, and certain subgroups would map into the same positions, causing a high number of collisions.
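The contrast between the last three and the first three digits can be seen on made-up data. The Python sketch below generates hypothetical nine-digit keys that all begin with 567, standing in for the clustered student population. It then counts how many table positions each choice of digits actually uses.

```python
import random

def last_three(ssn):
    """d7 d8 d9, i.e. h(key) = key mod 1000."""
    return ssn % 1000

def first_three(ssn):
    """d1 d2 d3 of a nine-digit key."""
    return ssn // 1_000_000

def positions_used(keys, h):
    return len({h(k) for k in keys})

rng = random.Random(7)
# Hypothetical keys: a fixed 567 prefix followed by six random digits.
keys = [567_000_000 + rng.randrange(1_000_000) for _ in range(500)]
```

With these keys, `positions_used(keys, first_three)` is 1: every record collides. `positions_used(keys, last_three)` spreads the same 500 keys over several hundred positions.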

if ic

would

pi pci

not

at in

he

is

good

kin

twti

hashing

ti

function

it

21

is

St

key

in

advance

of the

possible digits

is

analyze participating

in

clist

rihctt

it

iii vat

ues taken

ate

hi each

digit

key

The

ttte

ltaslt

tclclrnss

to select

last

Such

digits the the

an

analysis

called

digit

the

analj

digits

six

tf

Instead

ii

elu

eie

tsitig

three

we would choose

most uniform

fcm

three

the

tttcl

key

wlti

digit

attalvses

showed ins

that distrihctthin

If

tlte

keys

if

tlti

gave

lie

hit test

clistribcttit

hashing

to

nctioti

might

in

strip

tile

out

kev

The ket

is St

ise

digits

from

key

and

put

them

together

form

number

range

999

fit

rf1d/ri

fsf//C44

advised

thee nat

it

tIc

tactthtti

is

sitice

although

the

digits

are

apparently

random and

For

iit

list

tinift

trio

in

value

ti

might have

ins of

is

dependencies and

mu

amotig

tend the to

thetnselves

tccct

exam

The

if

ple

certai

et

tmhi

ight

tgether

position

Then

rttitpped

rcsttlt

were

to

tltc-

alwtvs range

if

wlteti

rI38

would he

loweritig

intercligit

only

select table

hat

ott

ut

the

J3ttd39

ci

effectivelv fir

the

ctitrelati

table tns

size

and

itlereasing tleccssarv

example

Ii is

t-.j

cltattces ht-ing

tlhsion

itt

Antlvsis to light

might he

intl rigl

to

such

ing

the

tt

situtti

cotiies

only

tlttcst
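Digit analysis as described above can be sketched in a few lines. This is an illustrative sketch, not the book's procedure: it assumes nine-digit integer keys known in advance, and it measures how uniform each digit position is with a chi-square statistic against a uniform spread over 0..9 (the statistic and the names `digit_uniformity`, `select_positions` and `digit_hash` are our own choices).

```python
from collections import Counter

def digit_uniformity(keys, width=9):
    """Chi-square distance of each digit position from a uniform spread
    over 0..9 (smaller means more uniform)."""
    expected = len(keys) / 10
    scores = []
    for pos in range(width):
        counts = Counter(str(k).zfill(width)[pos] for k in keys)
        scores.append(sum((counts[str(d)] - expected) ** 2 / expected
                          for d in range(10)))
    return scores

def select_positions(keys, how_many=3, width=9):
    """Digit positions (0 = leftmost) whose values are most uniform."""
    scores = digit_uniformity(keys, width)
    return sorted(sorted(range(width), key=lambda p: scores[p])[:how_many])

def digit_hash(key, positions, width=9):
    """Strip the selected digits out of the key and put them together;
    three positions give a number in the range 0..999."""
    s = str(key).zfill(width)
    return int("".join(s[p] for p in positions))

# Keys that share their first six digits (say, all issued in one region).
keys = [567123000 + i for i in range(1000)]
positions = select_positions(keys)
print(positions)                          # [6, 7, 8]
print(digit_hash(567123456, positions))   # 456
```

Because the leading digits never vary in this sample, their scores are large and they are discarded automatically; only the last three positions, whose digits each occur equally often, are kept. Note that this per-position test cannot detect the interdigit correlations mentioned above.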

Division. Probably the most common hashing function is division, which works as follows. The key, regardless of its data type, is treated as an integer and divided by the table size, and the remainder is used as the table address:

    H(key) = key mod tablesize

The result always lies in the range 0 to tablesize - 1. Such a function is fast, since in most computer systems the hardware divide instruction leaves the remainder in a register; the remainder need only be copied into the variable that holds the table position.

In practice, hashing functions of this type have been found to give very good results (Lum et al. 1971). The choice of the divisor is important, however. If, for example, tablesize were 25, then keys divisible by 5 (such as 15 and 20) would map only into the subset of table positions that are multiples of 5, namely 0, 5, 10, 15 and 20. That is not what we want: we want the keys to map into all of the table positions. The problem underlying the choice of 25 as the table size is that it has factors in common with many keys. To make sure that the keys are spread over the whole table, the key and the table size should have no factors in common other than 1.


The easiest way to ensure this is to choose tablesize to be a prime number, since a prime has no factors other than 1 and itself. For this reason, the division hashing function is usually used with a table size that is a prime number; a divisor with no prime factors less than 20 is generally also satisfactory.
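The effect of common factors on division hashing can be seen directly. This sketch assumes integer keys; the sample keys (the multiples of 5 up to 500) and the prime table size 23 are arbitrary choices for illustration.

```python
def division_hash(key, tablesize):
    """Division hashing: the remainder of key / tablesize is the address,
    always in the range 0 .. tablesize - 1."""
    return key % tablesize

def positions_used(keys, tablesize):
    """The set of table positions that the given keys map to."""
    return {division_hash(k, tablesize) for k in keys}

multiples_of_5 = [5 * i for i in range(1, 101)]      # 5, 10, ..., 500
print(sorted(positions_used(multiples_of_5, 25)))    # [0, 5, 10, 15, 20]
print(len(positions_used(multiples_of_5, 23)))       # 23: every position
```

With 25 positions, 100 distinct keys pile into only 5 addresses. With the prime size 23, the common factor disappears and every position is used.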

Multiplication. A simple method that is based on multiplication is sometimes used. Suppose that the keys in question are five digits in length. The key is squared, and digits from the middle of the product are selected as the table address. If, say, two middle digits are taken, the address lies in the range 0 to 99.

It is important that the digits be chosen from the middle of the product. Consider, for example, what would happen if we chose the rightmost two digits instead. Those digits depend only on the rightmost two digits of the key. All keys ending in 21, for example, produce products ending in 41 (since 21 * 21 = 441), so they would all map to the same location. The middle digits of the product, by contrast, are likely to depend on every digit of the key, so that changing any digit of the key is likely to produce a change in the hash result.
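The squaring method can be sketched as follows. The sketch assumes five-digit keys, whose squares are padded to ten digits, and takes two digits for the address. These sizes and the helper names are illustrative assumptions.

```python
def midsquare_hash(key, key_digits=5, addr_digits=2):
    """Square the key and take addr_digits digits from the middle of the
    (2 * key_digits)-digit product, padded with leading zeros."""
    product = str(key * key).zfill(2 * key_digits)
    start = (len(product) - addr_digits) // 2
    return int(product[start:start + addr_digits])

def rightmost_hash(key, addr_digits=2):
    """The poor alternative: the rightmost digits of the square, which
    depend only on the rightmost digits of the key."""
    return (key * key) % 10 ** addr_digits

keys = [12321, 45621, 78921]                 # all end in 21
print([rightmost_hash(k) for k in keys])     # [41, 41, 41]
print([midsquare_hash(k) for k in keys])     # [80, 27, 52]
```

The three squares are 151807041, 2081275641 and 6228524241. All of them end in 41, but their middle digits differ.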

Folding. In folding, the key is partitioned into parts that are then combined to form the hash address. Suppose, for example, that we have a five-digit key and that our programs are running on a simple microcomputer system that has no hardware multiply or divide but does have an arithmetic add. The hash function could simply add the individual digits of the key. For the key d1d2d3d4d5,

    H(key) = d1 + d2 + d3 + d4 + d5

The result would be in the range 0 to 45 and could be used as the index into a hash table of 46 positions. If a larger table were needed, pairs of digits could be added instead, enlarging the range of the result.


In general, folding methods break the key into several parts and combine the parts to build a smaller result, usually with either arithmetic addition or exclusive or. Folding is often used in conjunction with other methods. For example, a key that is too large for the registers of the machine can first be folded, by breaking it into groups of digits and adding the groups, to an integer small enough to handle. A second hashing function, such as division, can then be applied to that result to produce an address within the hash table. The hash function is then the composite of folding and division.
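Both uses of folding described above are sketched below: adding the individual digits of a key, and folding a longer key in groups of digits before applying division hashing. Python's `%` and `//` are used here only to pull digits apart. On the hypothetical divide-less machine the digits would be extracted another way. The three-digit group size and the nine-digit example key are our own choices.

```python
def digit_sum_hash(key):
    """Fold a key by adding its decimal digits; the combining step needs
    only addition. A five-digit key gives a result in the range 0..45."""
    total = 0
    while key > 0:
        total += key % 10
        key //= 10
    return total

def fold_then_divide(key, tablesize, group_digits=3):
    """Split the key into groups of group_digits decimal digits (taken
    from the right), add the groups, then apply division hashing."""
    base = 10 ** group_digits
    folded = 0
    while key > 0:
        folded += key % base
        key //= base
    return folded % tablesize

print(digit_sum_hash(99999))              # 45, the largest possible value
print(fold_then_divide(987654321, 7))     # (987 + 654 + 321) mod 7 = 1962 mod 7 = 2
```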

Character-valued keys. All of the keys in the examples of our discussion of hashing functions were integers, and the hashing functions assumed integer keys. Keys are often character strings, however. Remember that all data are stored in the computer's memory as patterns of bits. In the ASCII code, for example, each character is represented by a 7-bit pattern, and in Pascal the function ord gives the integer code of a character. A character string can therefore be treated as an integer by concatenating the bit patterns of its characters, after which the hashing functions described above can be applied.


For the two-character string jy, the bit pattern would be 11010101111001 (base 2), since in ASCII ord(j) = 106 = 1101010 and ord(y) = 121 = 1111001. The corresponding integer is

    ord(j) * 128 + ord(y) = 13689

Since 128 = 2^7, the multiplication by 128 effectively shifts the bit pattern for j to the left 7 bits, and the addition effectively concatenates the two 7-bit strings. For the three-character string djy we get

    ord(d) * 16384 + ord(j) * 128 + ord(y) = 1652089

where multiplication by 16384 = 2^14 provides a left shift of 14 bits. Notice that the result is of order 2^21, which is beyond the capacity of the 16-bit registers available on most mini- and microcomputer systems. Algorithm 7.1 folds strings of 21 characters in groups of three characters.

    type string21 = array [1..21] of char;

    function fold (s: string21): integer;
    { Folds a string of 21 characters, three characters at a time.
      At least 24-bit integer arithmetic is required. }
    var i: 1..22;
        sum: integer;  { running total: in Pascal the function name on the
                         right-hand side of := would be a recursive call }
    begin
      sum := 0;
      i := 1;
      repeat
        sum := sum + ord(s[i]) * 16384 + ord(s[i+1]) * 128 + ord(s[i+2]);
        i := i + 3
      until i > 21;
      fold := sum
    end;

Algorithm 7.1 Folding a character string

Algorithm 7.1 could be written more generally, but doing so would obscure the simple process. Division hashing can then be applied to the result of fold.
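Algorithm 7.1 carries over directly to Python. Python integers do not overflow, so the 24-bit requirement appears below only as a check on the largest possible result. The sketch assumes 7-bit ASCII characters and a string length that is a multiple of three, as with the 21-character strings of the algorithm.

```python
def fold(s):
    """Fold a string three characters at a time. Each group contributes
    ord(c1)*16384 + ord(c2)*128 + ord(c3), its three 7-bit codes
    concatenated into one 21-bit number."""
    if len(s) % 3 != 0:
        raise ValueError("string length must be a multiple of 3")
    total = 0
    for i in range(0, len(s), 3):
        total += ord(s[i]) * 16384 + ord(s[i + 1]) * 128 + ord(s[i + 2])
    return total

print(ord("j") * 128 + ord("y"), bin(ord("j") * 128 + ord("y")))
# 13689 0b11010101111001
print(fold("djy"))                   # 1652089
# Seven groups of at most 2**21 - 1 each stay below 2**24.
print(fold("\x7f" * 21) < 2 ** 24)   # True
```

As the text notes, division hashing would then be applied to the result, for example `fold(s) % tablesize`.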

7.4.2 Collision-Resolution Strategies

A collision-resolution strategy, or rehashing strategy, determines what happens when two or more elements collide, that is, hash to the same address. We begin by defining some parameters that will be used in the discussion of strategies. The first is the number of different values that a key can assume. A nine-digit Social Security number, for example, can assume 1,000,000,000 different values.


The size of the hash table, tablesize, is a second important parameter. It must be large enough to hold the number of elements we wish to store. The number of elements actually stored in the table varies, and the fraction of the table that is full is called the load factor. For example, a table with tablesize = 7 that holds 3 records has a load factor of 3/7, and a completely full table has a load factor of 1 (100%).

A useful extension of the hash table is obtained by allowing each position of the table to hold more than a single record. Such a multirecord cell is called a bucket. A representation of a hash table of buckets is shown in Figure 7.18.

    const bucketsize = ...;   { User supplied }
          tablesize = ...;    { User supplied }
    type bucket = array [1..bucketsize] of stdelement;
    var table: array [0..tablesize-1] of bucket;

Figure 7.18 Hash table of buckets

The concept of buckets is useful for hash tables stored on direct access devices such as magnetic disks. For those devices, each bucket can be tied to a physical track or sector of the device. The hashing function produces the bucket number, which results in the transfer of a physically related block into RAM. Once in RAM, the bucket can be searched or modified at memory speed. Buckets of size greater than one are of limited use for tables stored in RAM, where they tend to slow the average access time. In this chapter we will discuss only buckets of size one.

The strategies for resolving collisions will be grouped into three approaches. The first approach, open address methods, attempts to store keys that collide at their home address in some other unoccupied location in the table. The second approach, external chaining, keeps a linked list for each table position, and each element is added to the list at its home address. The third approach we will discuss, coalesced chaining, uses pointers to link together different buckets of the hash table. We will see that coalesced chaining is one of the better strategies.

Open address methods. For all of the open address algorithms we will use the hash table representation of Figure 7.12. There are several open address methods, of increasing sophistication. Let us return to the example of Figure 7.16 and attempt to add the key 227 to the hash table shown in Figure 7.19. Recall that division hashing applied to 227 gives 227 mod 7 = 3. Since table[3] already holds 374, 227 collides with 374.

Figure 7.19 Hash table holding three records (tablesize = 7)
    Table address   Table contents
    0               empty
    1               911
    2               empty
    3               374
    4               empty
    5               empty
    6               1091


Linear rehashing. The simplest collision-resolution strategy is called linear rehashing. We start at the address at which the collision occurred and search sequentially through the hash table until an open position is found or the table is exhausted, and the new record is stored in the open position. The result of storing 227 is shown in Figure 7.20. A request to find the record with key 227 now requires two probes, at positions 3 and 4.

Figure 7.20 Linear rehashing: hash table after adding 227
    Table address   Table contents
    0               empty
    1               911
    2               empty
    3               374
    4               227
    5               empty
    6               1091

We are now in a position to implement the operations specified in Section 7.2. The operation findkey is implemented by Algorithms 7.2 and 7.3.

    function findkey (var table: hashtable; tkey: keytype): boolean;
    { Returns true if an element with key tkey is in the hash table }
    var h: position;
    begin
      h := H(tkey);   { Apply hash function }
      if (table[h].key <> tkey) and (table[h].key <> empty) then
        linearrehash(tkey, h);
      if table[h].key = tkey then findkey := true
      else findkey := false
    end;

Algorithm 7.2 Implementation of operation findkey using the hash table

    procedure linearrehash (tkey: keytype; var h: position);
    { Probe sequentially from h in the hash table variable table until
      tkey or an empty position is found, or the table has been searched }
    var start: position;
    begin
      start := h;
      repeat
        h := (h + 1) mod tablesize
      until (table[h].key = tkey)    { Found }
         or (table[h].key = empty)   { Open location }
         or (h = start)              { Entire table searched }
    end;

Algorithm 7.3 Linear rehashing

To insert an element, we begin at its home address and search until an empty position is found or the table is exhausted. For example, inserting 421, whose home address is 1, leads to Figure 7.21. In our illustrations of hash tables we have added a column showing the number of probes required to find each element stored therein. With linear rehashing, this number is easy to determine from an element's home address and the position in which it is stored.

Figure 7.21 Hash table after inserting 421
    Table address   Table contents   Probes
    0               empty
    1               911              1
    2               421              2
    3               374              1
    4               227              2
    5               empty
    6               1091             1

The insert operation can be implemented as shown in Algorithm 7.4. It uses two user-supplied values, empty and deleted, each of which must differ from the key of every element. Let us see why we need the value deleted.
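Algorithms 7.2 through 7.4 can be sketched together as a small Python class. Python `None` stands in for the user-supplied value empty and a unique object stands in for deleted; these sentinel choices and the class name are assumptions, not the book's declarations. Each operation also reports its probe count so the Probes column of Figure 7.21 can be checked.

```python
EMPTY = None          # marks a never-used position
DELETED = object()    # marks a position whose element was deleted

class LinearRehashTable:
    """Open-address hash table with division hashing and linear rehashing."""

    def __init__(self, tablesize):
        self.tablesize = tablesize
        self.table = [EMPTY] * tablesize

    def hash(self, key):
        return key % self.tablesize

    def _probe(self, key):
        """Positions h, h+1, h+2, ... (mod tablesize), each visited once."""
        h = self.hash(key)
        for i in range(self.tablesize):
            yield (h + i) % self.tablesize

    def findkey(self, key):
        """Return (found, number of probes made)."""
        probes = 0
        for pos in self._probe(key):
            probes += 1
            slot = self.table[pos]
            if slot is EMPTY:                 # open location: key is absent
                return False, probes
            if slot is not DELETED and slot == key:
                return True, probes           # found
        return False, probes                  # entire table searched

    def insert(self, key):
        """Store key in the first empty or deleted position; return probes."""
        probes = 0
        for pos in self._probe(key):
            probes += 1
            if self.table[pos] is EMPTY or self.table[pos] is DELETED:
                self.table[pos] = key
                return probes
        raise OverflowError("hash table is full")

t = LinearRehashTable(7)
for k in (911, 374, 1091, 227, 421):
    t.insert(k)
print(t.table)            # [None, 911, 421, 374, 227, None, 1091]
print(t.findkey(227))     # (True, 2)
```

Unlike Algorithm 7.4, `insert` raises an exception when the table is full instead of looping forever. Like the book's version, it does not check for a duplicate key.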


    procedure insert (var table: hashtable; e: stdelement);
    { Insert an element using linear rehashing; assumes the table is not full }
    var h: position;
    begin
      h := H(e.key);   { Apply hash function }
      while (table[h].key <> empty) and (table[h].key <> deleted) do
        h := (h + 1) mod tablesize;
      table[h] := e
    end;

Algorithm 7.4 Implementation of operation insert using linear rehashing

Figure 7.22 shows the result of adding 624, whose home address is 1. Five probes are needed to find an empty position for it, at position 5.

Figure 7.22 Hash table after inserting 624
    Table address   Table contents   Probes
    0               empty
    1               911              1
    2               421              2
    3               374              1
    4               227              2
    5               624              5
    6               1091             1

Now suppose that any of the values 421, 374 or 227 were deleted and its position were simply marked empty. Subsequent searches for 624 would stop on encountering that empty position and would report, incorrectly, that 624 is not in the table. The solution to this problem is to mark the positions of deleted elements with the special value deleted. A search continues past a deleted position, while an insertion may reuse it. The deletion operation is implemented as shown in Algorithm 7.5.

    procedure delete (var table: hashtable; tkey: keytype);
    { Delete the element with key tkey from the hash table }
    var h: position;
    begin
      h := H(tkey);   { Apply hash function }
      if (table[h].key <> tkey) and (table[h].key <> empty) then
        linearrehash(tkey, h);
      if table[h].key = tkey then
        table[h].key := deleted
    end;

Algorithm 7.5 Implementation of operation delete using the hash table

The drawback to the use of the value deleted is that deleted positions can clutter up the table, thereby increasing the number of probes required to find an element. A partial solution is to periodically reenter the legitimate elements and mark all of the remaining locations empty.

The performance of a hashing/rehashing strategy is measured by the number of probes required to find a target key. We will examine performance in detail in Section 7.5, but we can get a feel for it by looking at the probe sequence undertaken to find 624 in Figure 7.22. Since 624 mod 7 = 1, the search begins at position 1 in the table, and the subsequent probes are at positions 2, 3, 4 and 5. Five linear probes are required to find 624. There are two problems underlying this method.
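The need for the value deleted can be demonstrated on the table of Figure 7.22. In this self-contained sketch, the strings "empty" and "deleted" stand in for the user-supplied marker values (an assumption). The `mark` parameter lets us also perform the incorrect deletion that simply marks a position empty.

```python
EMPTY, DELETED = "empty", "deleted"   # assumed user-supplied marker values

def find(table, key):
    """Linear rehashing search: return the key's position, or None.
    DELETED positions are probed past; an EMPTY position ends the search."""
    n = len(table)
    h = key % n
    for i in range(n):
        pos = (h + i) % n
        if table[pos] == EMPTY:
            return None
        if table[pos] == key:
            return pos
    return None

def delete(table, key, mark=DELETED):
    """Overwrite the key's position with `mark` (DELETED unless told otherwise)."""
    pos = find(table, key)
    if pos is not None:
        table[pos] = mark

def insert(table, key):
    """Linear rehashing insertion; a DELETED position may be reused."""
    n = len(table)
    h = key % n
    for i in range(n):
        pos = (h + i) % n
        if table[pos] in (EMPTY, DELETED):
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")

figure_7_22 = [EMPTY, 911, 421, 374, 227, 624, 1091]

good = list(figure_7_22)
delete(good, 374)              # marked deleted
print(find(good, 624))         # 5: the search continues past position 3

bad = list(figure_7_22)
delete(bad, 374, mark=EMPTY)   # wrongly marked empty
print(find(bad, 624))          # None: the search stops at position 3
```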


Problem 1. Any key that hashes to a given position, say position 1, will follow the same probe pattern as all other keys that hash to that position. It will have to collide with every key that previously hashed to that position before an empty position is found. We will call this phenomenon primary clustering.

Problem 2. Note in Figure 7.22 that the probe pattern for a rehash from position 1 has merged with the probe pattern for a rehash from position 3. Once a probe sequence starting at position 1 reaches position 3, it continues through positions 4, 5, ... exactly as a sequence starting at position 3 does. Probe patterns from different home addresses that merge together in this way give rise to a phenomenon called secondary clustering.

Consider Figure 7.23, which is a copy of Figure 7.21. There is a substantial difference in the probabilities of the positions receiving the next new key. Keys hashing into positions 1, 2, 3, 4 and 5 would all, if necessary, eventually arrive at position 5, so position 5 receives the next new key with probability 5/7. Position 0 receives it with probability 2/7 (from home addresses 6 and 0), and no other position can receive it. The expected number of probes for any random key not yet in the table can be calculated as shown in Figure 7.24.

Figure 7.23 Hash table of Figure 7.21

Figure 7.24 Expected number of probes for an unsuccessful search in the hash table of Figure 7.23
    Original position (hash)   Number of probes until empty position found
    0                          1
    1                          5
    2                          4
    3                          3
    4                          2
    5                          1
    6                          2
    Total                      18
    Expected number of probes = 18/7 = 2.57
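The calculation of Figure 7.24 can be automated. This sketch assumes, as the figure does, that a key not in the table is equally likely to hash to each of the seven positions, and it represents empty positions by `None`.

```python
from fractions import Fraction

def probes_to_empty(table, home):
    """Probes linear rehashing makes from `home` until it reaches an empty
    position: the cost of an unsuccessful search for a key hashing to home."""
    n = len(table)
    for i in range(n):
        if table[(home + i) % n] is None:
            return i + 1
    return n                      # full table: every position is probed

def expected_unsuccessful(table):
    """Average of probes_to_empty over all home addresses, assuming every
    home address is equally likely for a key not in the table."""
    counts = [probes_to_empty(table, h) for h in range(len(table))]
    return counts, Fraction(sum(counts), len(table))

figure_7_23 = [None, 911, 421, 374, 227, None, 1091]
counts, expected = expected_unsuccessful(figure_7_23)
print(counts)                               # [1, 5, 4, 3, 2, 1, 2]
print(expected, round(float(expected), 2))  # 18/7 2.57
```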

The expected numbers of probes for both successful and unsuccessful searches will be our measures of the performance of rehashing strategies. We will examine them more closely in Section 7.5. We confine our attention here simply to noting that performance can be improved by eliminating the problems we noted, primary and secondary clustering.

For linear rehashing, one attempt to resolve the difficulties is to introduce a step size c other than 1, so that from position h the next position probed is

    (h + c) mod tablesize

If c and tablesize are relatively prime, that is, if they have no common factors other than 1, the search will cover the entire table, probing each position exactly once without repetition. If tablesize is prime, every c in the range 1 to tablesize - 1 has this property.


This kind of coverage is called complete nonrepetitious coverage. If the probe sequence did not cover the entire table, empty spaces not included in the pattern would never be discovered. If positions were probed repeatedly during the same rehashing sequence, those probes would be wasted and would degrade performance. Although using a value of c that is relatively prime to the table size does give complete nonrepetitious coverage, the technique does not solve, or even improve, the problems of primary and secondary clustering. Keys with the same home address still follow the same probe pattern, and patterns from different home addresses still merge. An approach that solves one of these problems is described next.
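The coverage property can be checked mechanically. This sketch follows a constant step c from a starting position until a position repeats. The table sizes 7 and 8 are arbitrary examples, and `math.gcd` is used only to compare against the relatively-prime condition.

```python
from math import gcd

def probe_sequence(h, c, tablesize):
    """Positions visited by rehashing with constant step c from h,
    stopping just before the first repeated position."""
    seen, seq = set(), []
    pos = h % tablesize
    while pos not in seen:
        seen.add(pos)
        seq.append(pos)
        pos = (pos + c) % tablesize
    return seq

def complete_coverage(c, tablesize):
    """True if step c probes every table position exactly once."""
    return len(probe_sequence(0, c, tablesize)) == tablesize

print(probe_sequence(1, 3, 7))   # [1, 4, 0, 3, 6, 2, 5]
print(probe_sequence(0, 2, 8))   # [0, 2, 4, 6]: half the table is never probed
print(all(complete_coverage(c, 8) == (gcd(c, 8) == 1) for c in range(1, 8)))  # True
```

Since every key uses the same c, two keys that reach the same position still continue identically from there. This is why the step size alone does nothing about clustering.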

Quadratic rehashing. One method of improving the performance of rehashing is quadratic rehashing. Suppose a key collided at its home address h = H(key). We define the rehash positions to be

    (h + i^2) mod tablesize

where i takes on the values 1, 2, 3, ... until either the target key or an empty position is found, or until the table is completely searched. This method solves the problem of secondary clustering: keys with different home addresses that meet at some position follow different patterns thereafter. It does not solve the problem of primary clustering, because all keys that hash to the same position still follow the same probe pattern. Details of this method are given in Radke (1970). That paper shows that a variant that probes alternately at h + i^2 and h - i^2 (mod tablesize), for i = 1, 2, ..., (tablesize - 1)/2, visits all table locations without repetition, provided that tablesize is a prime number of the form 4k + 3.
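Why the plus-and-minus variant is needed can be seen by listing the positions each form reaches. This sketch assumes a home address h and compares plain (h + i*i) mod tablesize with the alternating h + i*i, h - i*i form. The sizes 7 and 11 are primes of the form 4k + 3, and 13 is a prime of the form 4k + 1.

```python
def quadratic_positions(h, tablesize):
    """Distinct positions probed by (h + i*i) mod tablesize, i = 0, 1, 2, ..."""
    return sorted({(h + i * i) % tablesize for i in range(tablesize)})

def plus_minus_positions(h, tablesize):
    """Distinct positions probed by h, then h + i*i and h - i*i (mod
    tablesize) for i = 1 .. (tablesize - 1) // 2."""
    positions = {h % tablesize}
    for i in range(1, (tablesize - 1) // 2 + 1):
        positions.add((h + i * i) % tablesize)
        positions.add((h - i * i) % tablesize)
    return sorted(positions)

print(quadratic_positions(1, 7))        # [1, 2, 3, 5]: positions 0, 4, 6 never probed
print(plus_minus_positions(1, 7))       # [0, 1, 2, 3, 4, 5, 6]
print(len(plus_minus_positions(0, 13))) # 7 of 13 positions: 13 is not of the form 4k + 3
```

With plain squares, only about half of the positions of a prime-sized table can ever be reached. Adding the negated squares fills in the rest exactly when tablesize has the form 4k + 3.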

Random rehashing. Envision a rehashing strategy in which, when a collision occurs, a random jump is made to a new table position. This method is called random rehashing. The random jump distance can be considered to be a second hash applied to the key. If the new position also produces a collision, the process is repeated until the target key or an empty position is found, or until the table is determined to be full. Since subsequent accesses to a key must follow the same probe pattern as the one used when it was inserted, the random jumps must be determined by the key value itself, so each key would have its own probe pattern. Keys with the same home address would follow different patterns, and patterns that met would not merge, so there would be no primary or secondary clustering. Although this approach appears difficult to implement, there are schemes whose performance is almost as good.

Double hashing. Several methods exist that attempt to approximate the performance of random rehashing without the large computational overhead required by randomizing each step. One of these, double hashing, is computationally efficient and simple to apply, and we can expect its performance to be quite close to that of random rehashing.


We have seen that the general pattern for linear probing is to probe at positions h + c, h + 2c, h + 3c, ... (mod tablesize), where c is a constant. In our original discussion of linear rehashing, c was 1. The root of the inefficiency of linear rehashing is that a constant c causes fixed probe patterns, which lead to clustering. Ideally we would like c to be random, but, as we have noted, such an approach is possible only at the cost of considerable computational overhead. One solution is to compute c as a function of the key value, so that different keys hashing to the same starting position are given different step sizes. When a collision occurs, the step size c(key) is computed, and the table is probed at positions h + c(key), h + 2c(key), ... (mod tablesize) until the target key or an empty position is found, or until the table has been searched.

For example, suppose that 421 is to be stored in the table of Figure 7.25. Since 421 mod 7 = 1, 421 collides with 911 at position 1, and the rehash proceeds with the step size c(421). If 624 had been the key, it would also have collided with 911 at position 1. However, the rehash pattern for 624 would in general have been different, because its step size is computed from a different key value. It is possible to find pairs or groups of keys that have both the same home address and the same step size, but for tables of reasonable size and good hashing functions the probability of such an event is low. In terms of the expected number of probes for both successful and unsuccessful searches, the performance of double hashing is essentially the same as that of random rehashing.

Figure 7.25 Double hashing
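Double hashing is sketched below with an assumed step function, c(key) = 1 + key mod (tablesize - 2). This is a commonly used choice when tablesize and tablesize - 2 are both prime (7 and 5 here), but it is not necessarily the method the book's Algorithm 7.6 uses. The sketch shows that 421 and 624, which both hash to position 1, get different probe patterns.

```python
def step_size(key, tablesize):
    """Assumed step function: 1 + key mod (tablesize - 2),
    which lies in the range 1 .. tablesize - 2."""
    return 1 + key % (tablesize - 2)

def double_hash_sequence(key, tablesize):
    """Probe positions: start at H(key) = key mod tablesize and advance by
    the key's own step size. With tablesize prime, every step in
    1 .. tablesize - 1 is relatively prime to it, so all positions appear."""
    h = key % tablesize
    c = step_size(key, tablesize)
    return [(h + i * c) % tablesize for i in range(tablesize)]

def first_open(table, key):
    """Position where double hashing would store key (None if table full)."""
    for pos in double_hash_sequence(key, len(table)):
        if table[pos] is None:
            return pos
    return None

print(double_hash_sequence(421, 7))   # [1, 3, 5, 0, 2, 4, 6]  (step 2)
print(double_hash_sequence(624, 7))   # [1, 6, 4, 2, 0, 5, 3]  (step 5)
figure_7_19 = [None, 911, None, 374, None, None, 1091]
print(first_open(figure_7_19, 421), first_open(figure_7_19, 624))   # 5 4
```

Starting from the table of Figure 7.19, 421 passes over 911 and 374 and lands in position 5, while 624 passes over 911 and 1091 and lands in position 4. Their paths separate immediately after the shared home address.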


Since double hashing has the same performance as random rehashing in numbers of probes and requires much less computation per probe, it has greater overall efficiency. The double hashing algorithm, given as Algorithm 7.6, is comparable to Algorithm 7.3.

    procedure doublerehash (tkey: keytype; var h: position);
    { Probe from h with a step size computed from tkey until tkey or an
      empty position is found, or until the entire table has been searched }
    var start: position;
        c: integer;     { step size, in the range 1 to tablesize - 1 }
    begin
      start := h;
      c := ...;         { step size computed from tkey }
      repeat
        h := (h + c) mod tablesize
      until (table[h].key = tkey)    { Found }
         or (table[h].key = empty)   { Open location }
         or (h = start)              { Entire table searched }
    end;

Algorithm 7.6 Double hashing

Algorithm 7.6 uses only one method for computing the step size. Any method that produces a step size greater than 0 and less than tablesize can be used. If tablesize is prime, every such step size is relatively prime to tablesize, which assures an exhaustive search of the table without repetition. When division is also used to compute the step size, tablesize is often chosen so that tablesize and tablesize - 2 are both prime; two such primes are called twin primes.

External chaining. The second approach to the problem of collisions is called external chaining. In this approach we let each table position absorb all of the records whose keys hash to it. Since we do not usually know how many records will hash to a position, a linked list is a good data structure in which to collect them. A representation based on an array of pointers is shown in Figure 7.26.

    const tablesize = ...;   { User supplied }
    type pointer = ^node;
         node = record
                  el: stdelement;
                  next: pointer
                end;
    var table: array [0..tablesize-1] of pointer;

Figure 7.26 Representation of an external chaining hash table

As an example, let tablesize be 7, and suppose that operation create has initialized the hash table as shown in Figure 7.27, with every position nil. The division hash function H(key) = key mod 7 is chosen. After insertion of the keys 374, 1091 and 911, the table is as shown in Figure 7.28.

Figure 7.27 Initialized hash table: positions 0 through 6 are all nil

Figure 7.28 Hash table after insertion of keys 374, 1091 and 911
    Table address   Table contents
    0               nil
    1               911
    2               nil
    3               374
    4               nil
    5               nil
    6               1091

Insertion of 227 and 421 produces collisions, since 227 mod 7 = 3 and 421 mod 7 = 1.


Each collided key is simply added to the list at its home address, and the result is shown in Figure 7.29. Subsequent insertion of key 624 (624 mod 7 = 1) produces the table shown in Figure 7.30.

Figure 7.29 Hash table after insertion of keys 227 and 421
    Table address   Table contents
    0               nil
    1               911 -> 421
    2               nil
    3               374 -> 227
    4               nil
    5               nil
    6               1091

Figure 7.30 Hash table after insertion of key 624
    Table address   Table contents
    0               nil
    1               911 -> 421 -> 624
    2               nil
    3               374 -> 227
    4               nil
    5               nil
    6               1091

The designer has all of the choices of list organization open to him or her: single or double linkage, the method of list termination, and the ordering of the list. If the frequencies with which the various list elements are accessed are quite different, it may be effective to make each list self-organizing, as discussed in an earlier chapter. Observe that the operations are similar to those on lists discussed there. The only difference is that there are many lists, and the one in which we are interested is determined by the hash function.

External chaining has several advantages over open address methods. Deletions are possible with no resulting problems. The number of elements can be greater than the table size, since storage for list elements can be allocated dynamically as the lists grow; that is, the load factor can be greater than 1.0. We shall see in Section 7.5 that the performance of external chaining is better than that of open address methods and continues to be excellent as the load factor grows beyond 1.0.
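External chaining can be sketched with one Python list per table position standing in for the linked lists of nodes of Figure 7.26. That substitution, and the method names, are assumptions made for brevity. The usage reproduces Figure 7.30.

```python
class ChainedHashTable:
    """External chaining: position i heads a list of the elements whose home
    address is i. Python lists stand in for the linked lists of nodes."""

    def __init__(self, tablesize):
        self.tablesize = tablesize
        self.table = [[] for _ in range(tablesize)]   # create: every list empty (nil)

    def hash(self, key):
        return key % self.tablesize

    def insert(self, key):
        chain = self.table[self.hash(key)]
        if key not in chain:
            chain.append(key)          # new element goes at the end of its list

    def findkey(self, key):
        return key in self.table[self.hash(key)]

    def delete(self, key):
        chain = self.table[self.hash(key)]
        if key in chain:
            chain.remove(key)          # no special "deleted" marker is needed

    def load_factor(self):
        return sum(len(chain) for chain in self.table) / self.tablesize

t = ChainedHashTable(7)
for k in (374, 1091, 911, 227, 421, 624):
    t.insert(k)
print(t.table[1], t.table[3], t.table[6])   # [911, 421, 624] [374, 227] [1091]
```

Deleting 421 simply removes it from the list at position 1, and inserting 14 keys into this 7-position table gives a load factor of 2.0. Neither is possible with the open address methods.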

Coalesced To

illtitrate in

chaining

empty

coalesced

In

last

consider

is

the

hash

into

five

table

the

buckets

Ii 12

empty empty

addreys region

Il

shown

Figure 7.31

the

table

divided

the the

first

two

address

the

empty

ii

region and

address

cellar and

the

our two

example make up

each

that

addresses

make up

emptY

II

region

The hash

cellar

is

function

must

store

map

record

collided

address

region

at

The

their

empty

cellar

Ii

empts

only

used

to

records

with another

the division

record

iii

.1

home addresses

For our

example

we

will

use

hash

function

FIgure Hash

7.31 with

for

Hkey

assuming

After that

key each

mod

key

is

table

seven

buckets

initialized

coalesced

an

chaining

integer 27 and

is

II

inserting

key values 27

it

29

we have

Figure 7.32

If 32 is inserted next, it collides with 27 and is stored in the empty position with the largest address, 6. It is added to the list that begins at its home address, 2. The result is shown in Figure 7.33. To assist in visualizing the process, the empty position with the largest address is labeled epla in the figures. If the key value 34 is added, it collides with 29 and is placed in address 5, which is now the empty position with the largest address.


Figure 7.32 Results after inserting keys 27 and 29.
Figure 7.33 Results after inserting key 32.
Figure 7.34 Results after inserting key 34.
(Each figure lists table addresses 0 through 6 with their contents; empty positions are marked empty, and the empty position with the largest address is marked epla.)

Key 34 is added to the list beginning at its home address, 4. The result is shown in Figure 7.34.

Up to this point, coalesced chaining has behaved exactly like external chaining: each list begins at its home address, and each new record is added to the end of the list. The cellar positions are reserved for records that collide.

The next insertion illustrates how a collision is resolved after the cellar is full. If 37 is added to the table, it collides with 27 at address 2, so it is placed in location 3, the empty position with the largest address, and is added to the end of the list that begins at address 2. The result is shown in Figure 7.35.

Figure 7.35 Results after inserting key 37.

Once the cellar is full, a record being inserted is placed in an empty position of the address region, even though that position may be the home address of a key inserted later. Adding 47 produces the result shown in Figure 7.36: 47 collides with 27 at address 2, is placed in location 1, now the empty position with the largest address, and is added to the end of the list that begins at address 2.

Figure 7.36 Results after inserting key 47.

The example also shows why the technique is called coalesced chaining. If 53 were added to the table, it would hash to address 3. That position is already occupied by 37, a member of the list that begins at address 2, so 53 would be added to the end of that list: lists with different home addresses coalesce. Lists cannot coalesce, however, until the cellar is full (see also Knuth 1973b).
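The insertion rule traced above can be sketched in Python. This is a minimal sketch that assumes integer keys, no deletions, h(key) = key mod 5 over the address region 0..4, and a cellar at 5..6, as in the example; the class and method names are hypothetical. Inserting 27, 29, 32, 34, 37, 47 and 53 reproduces the placements described in the text, including 53 joining the list that begins at address 2.

```python
from typing import Optional


class CoalescedTable:
    """Coalesced chaining with a cellar (hypothetical sketch; no deletions)."""

    def __init__(self, tablesize: int = 7, address_region: int = 5) -> None:
        self.address_region = address_region                  # addresses 0 .. address_region-1
        self.keys: list[Optional[int]] = [None] * tablesize   # None marks an empty position
        self.links: list[Optional[int]] = [None] * tablesize  # next position on the list
        self.epla = tablesize - 1  # scan pointer for the empty position with largest address

    def home_address(self, key: int) -> int:
        # The hash function maps every key into the address region, never the cellar.
        return key % self.address_region

    def insert(self, key: int) -> int:
        """Store key and return its position."""
        i = self.home_address(key)
        if self.keys[i] is None:
            self.keys[i] = key
            return i
        while True:                      # walk to the end of the list
            if self.keys[i] == key:
                return i                 # already in the table
            nxt = self.links[i]
            if nxt is None:
                break
            i = nxt
        # With no deletions, every position above epla stays occupied,
        # so the scan for the next epla only moves downward.
        while self.epla >= 0 and self.keys[self.epla] is not None:
            self.epla -= 1
        if self.epla < 0:
            raise OverflowError("hash table is full")
        j = self.epla
        self.keys[j] = key
        self.links[i] = j                # append to the end of the list
        return j

    def chain(self, address: int) -> list[int]:
        """Keys on the list that starts at the given address."""
        result: list[int] = []
        i: Optional[int] = address
        while i is not None and self.keys[i] is not None:
            result.append(self.keys[i])
            i = self.links[i]
        return result

    def findkey(self, key: int) -> bool:
        # A search starts at the home address and follows the links.
        return key in self.chain(self.home_address(key))


table = CoalescedTable()
for k in (27, 29, 32, 34, 37, 47, 53):
    print(k, "->", table.insert(k))
# 27 -> 2, 29 -> 4, 32 -> 6, 34 -> 5, 37 -> 3, 47 -> 1, 53 -> 0
print(table.chain(2))  # [27, 32, 37, 47, 53]: the lists from addresses 2 and 3 have coalesced
```

Scanning epla downward from the top fills the cellar first, which is why no coalescing can happen until the cellar is full.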

The effectiveness of coalesced chaining depends on the choice of cellar size. Selection of the cellar size is discussed in Vitter (1982, 1983), where it is shown that a cellar containing about 14% of the table works well under a wide variety of conditions.

With coalesced chaining, the deletion problems of open addressing schemes can be solved without resorting to marking records deleted. Deletion is, however, more complicated than for external chaining, because lists coalesce. Vitter (1982) gives an approach that essentially relinks the elements of the list in which the element to be deleted resides.

This concludes our introduction to collision-resolution techniques. In Sections 7.5 and 7.6 we will discuss the performance of hashing functions. Before that, we introduce hashing functions that guarantee that collisions will not occur: perfect hashing functions.


7.4.3 Perfect Hashing Functions

A perfect hashing function is one that causes no collisions. A minimal perfect hashing function is a perfect hashing function that operates on a table having a load factor of 1.0, that is, a table with exactly as many positions as there are keys.

Since perfect hashing functions do not cause collisions, exactly one probe is needed to locate an element. This, of course, is very desirable. The problem is that such functions are not easy to construct. Perfect hashing functions can be found only under certain conditions. One such condition is that all key values are known in advance.

Certain applications have this quality. For example, there are 36 reserved words in the programming language Pascal (listed below). When a compiler is translating a program, it scans the program's statements and must determine whether each word it encounters is a reserved word. Suppose the reserved words are stored in a hash table accessible by a perfect hashing function. Determining whether a word is a reserved word then requires only one probe: the word is hashed, and the contents of the specified table location are compared with it. If they are the same, the word is a reserved word; if not, we can be certain that it is not.

Pascal reserved words: and, array, begin, case, const, div, do, downto, else, end, file, for, function, goto, if, in, label, mod, nil, not, of, or, otherwise, packed, procedure, program, record, repeat, set, then, to, type, until, var, while, with.

Another condition for perfect hashing functions concerns the amount of computation necessary to find one, which can be enormous. The number of functions that map n keys into a table of size m is m^n, and the number of these that map the keys to distinct positions is m!/(m - n)!. Knuth (1973b) gives an example: the number of functions that map the 31 most frequently occurring English words into a table of size 41 is approximately 10^50, whereas the number of perfect hashing functions is approximately 10^43. Thus only one function out of every 10 million is suitable. In practice, if the number of keys is greater than a few dozen, the amount of time needed to find a perfect hashing function is unacceptably long on most computers.
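The arithmetic behind Knuth's example can be checked directly. A function from n keys to m positions is perfect exactly when it is one-to-one, so there are m^n functions in all and m!/(m - n)! perfect ones. The sketch below uses Python's exact integers and `math.perm`; the function name is hypothetical.

```python
import math


def perfect_fraction(n_keys: int, tablesize: int) -> tuple[int, int]:
    """Return (all functions, perfect functions) from n_keys keys to tablesize positions."""
    all_functions = tablesize ** n_keys          # each key may go to any position
    perfect = math.perm(tablesize, n_keys)       # tablesize! / (tablesize - n_keys)!
    return all_functions, perfect


total, perfect = perfect_fraction(31, 41)        # Knuth's 31 words, table of size 41
print(f"all functions:     about 10^{math.log10(total):.1f}")
print(f"perfect functions: about 10^{math.log10(perfect):.1f}")
print(f"one in about {total / perfect:,.0f} functions is perfect")
```

The ratio comes out a little above ten million, which is the "one in 10 million" figure quoted in the text.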

There have been several proposals for perfect hashing functions. Sprugnoli (1977) has proposed minimal perfect hashing functions and has given examples and times to compute them. Cichelli (1980) has suggested some simple minimal perfect hashing functions, and Jaeschke (1981) has proposed other minimal perfect hashing functions that avoid some problems that might arise with Cichelli's method. Let us look briefly at Cichelli's method.

The functions he proposed are for keys that are character strings. Take, for example, the 36 reserved words of Pascal (see the list of reserved words above). The hashing function is

h(key) = L + g(key[1]) + g(key[L])

where L is the length of the key, key[1] is its first character, key[L] is its last character, and g(x) is a function that associates an integer with the character x. Thus g(key[1]) is the integer associated with the first letter of the key, and g(key[L]) is the integer associated with the last letter. Figure 7.37 shows Cichelli's association between letters and integers, found for Pascal's reserved words.

Figure 7.37 Cichelli's association between letters and integers, found for Pascal's reserved words.


As an example, suppose that the word begin were encountered. Its hash value would be computed as h(begin) = 5 + g(b) + g(n) = 5 + 15 + 13 = 33, and the contents of location 33 of the hash table would be compared with begin.
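A Python sketch of the one-probe reserved-word test, assuming lowercase words. The letter values below are Cichelli's as we recall them from his 1980 paper, not copied from Figure 7.37, so they should be checked against that figure; they do reproduce the begin example (5 + 15 + 13 = 33) and give the 36 words the distinct values 2 through 37, a minimal perfect hash. Letters that never begin or end a reserved word are omitted, and the names `G`, `TABLE`, `cichelli_hash` and `is_reserved` are hypothetical.

```python
# Letter values g(x), as recalled from Cichelli (1980); verify against Figure 7.37.
G = {
    "a": 11, "b": 15, "c": 1, "d": 0, "e": 0, "f": 15, "g": 3, "h": 15,
    "i": 13, "l": 15, "m": 15, "n": 13, "o": 0, "p": 15, "r": 14, "s": 6,
    "t": 6, "u": 14, "v": 10, "w": 6, "y": 13,
}

RESERVED = """and array begin case const div do downto else end file for
function goto if in label mod nil not of or otherwise packed procedure
program record repeat set then to type until var while with""".split()


def cichelli_hash(word: str) -> int:
    # h(key) = L + g(key[1]) + g(key[L])
    return len(word) + G[word[0]] + G[word[-1]]


# Build the table once: position h(w) holds reserved word w.
TABLE: list = [None] * (max(cichelli_hash(w) for w in RESERVED) + 1)
for w in RESERVED:
    TABLE[cichelli_hash(w)] = w


def is_reserved(word: str) -> bool:
    """One probe: hash the word and compare it with the table entry."""
    if not word or word[0] not in G or word[-1] not in G:
        return False               # no reserved word starts or ends with that letter
    h = cichelli_hash(word)
    return h < len(TABLE) and TABLE[h] == word


print(cichelli_hash("begin"))                   # 33
print(is_reserved("begin"), is_reserved("bin"))  # True False
```

The word bin hashes to 31, the slot holding nil, so the single comparison rejects it; this is exactly the "if not, we can be certain" case described above.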

There are several problems with Cichelli's method, however. The first is that it cannot be used if two or more keys have the same length and the same first and last letters, since such keys always receive the same hash value. The second, more serious, problem is determining the integer that should be associated with each character. The integers in Figure 7.37 were found by a backtracking algorithm, which tries values letter by letter and backs up whenever two keys receive the same hash value; such a search can take a great deal of time. Of course, the values need be determined only once for a given set of keys. The resulting hash table for Pascal's reserved words is shown in Figure 7.38.
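The backtracking idea can be sketched in Python. This is a simplified, hypothetical search in the spirit of Cichelli's method, not his exact algorithm (he orders the keys, not the letters, by frequency): letters are given values in order of how often they begin or end a key, and after each assignment every fully determined key is checked for a collision and for the minimality condition that all hash values fit in a range of as many slots as there are keys.

```python
from collections import Counter
from typing import Optional


def find_letter_values(keys: list[str], max_value: int) -> Optional[dict[str, int]]:
    """Search for g so that h(key) = len(key) + g[key[0]] + g[key[-1]]
    is a minimal perfect hash for keys; return None if no g exists."""
    freq: Counter = Counter()
    for k in keys:
        freq[k[0]] += 1
        freq[k[-1]] += 1
    letters = [c for c, _ in freq.most_common()]   # most constraining letters first
    g: dict[str, int] = {}

    def consistent() -> bool:
        seen = set()
        for k in keys:
            if k[0] in g and k[-1] in g:           # hash value fully determined
                h = len(k) + g[k[0]] + g[k[-1]]
                if h in seen:
                    return False                   # collision
                seen.add(h)
        # Minimality: values must fit in a range of len(keys) consecutive slots.
        return not seen or max(seen) - min(seen) < len(keys)

    def assign(i: int) -> bool:
        if i == len(letters):
            return True
        for v in range(max_value + 1):
            g[letters[i]] = v
            if consistent() and assign(i + 1):
                return True
        del g[letters[i]]                          # backtrack
        return False

    return dict(g) if assign(0) else None


keys = ["do", "end", "else", "case", "if", "of"]
g = find_letter_values(keys, max_value=6)
print(g)
print(sorted(len(k) + g[k[0]] + g[k[-1]] for k in keys))
```

Keys such as cat and cut, with the same length and the same first and last letters, make the search fail however large `max_value` is, which is the first problem noted above.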

Jaeschke (1981) gives a good summary of the backtracking algorithm. In summary, perfect hashing functions are feasible when the keys are known in advance and the number of records is small. In that case, the perfect hashing function is determined in advance of the use of the hash table. Although the determination may be costly, it need be done only once, and the resulting access requires only one probe.

Figure 7.38 The hash table for Pascal's reserved words, with their hash values.

Exercises 7.4

1. Explain the following terms:
   (a) reserved words
   (b) perfect hash function
   (c) home address in hashing
   (d) collision in hashing
   (e) double hashing
   (f) linear rehashing
   (g) external chaining
   (h) coalesced chaining
2. The division hash function h(key) = key mod m is usually a good hash function if m has no small divisors. Explain why.
3. Write a hash function for nine-digit integers, such as Social Security numbers. Test it by producing integers randomly, hashing them into the range … .. 999, and determining how many of the addresses receive exactly k keys. Compare your experimental results with those that would be obtained if the keys were hashed by a perfect randomizer, for which the number of addresses receiving exactly k keys is approximated by … .
4. Write a hash function for keys of type
      type keytype = array [1..15] of char;
   Implement your hash function and determine its execution time.


   Do the same for the hash function of Exercise …, and compare the execution times.
5. Implement the hashing function described in Section 7.4.3. Determine its execution time, and compare the results with those obtained in Exercise … .
6. Use the hash function … to store the following sequence of integers in the table: 27, 35, 32, 31, 23, … .
      var table: array [0..11] of integer;
   Use (a) linear rehashing, (b) double hashing, (c) external chaining, and (d) coalesced chaining with a cellar size of four and the hash function h(key) = key mod 8. For each of the collision-handling strategies, determine the following after all the values have been placed in the table:
   (i) the load factor
   (ii) the average number of probes needed to find a value that is in the table
   (iii) the average number of probes needed to find a value that is not in the table
7. Implement the table procedures according to the following forms of hashing: (a) linear rehashing, (b) double hashing, (c) external chaining, and (d) coalesced chaining with a cellar size of 70. Let the hash table be given by
      table: array [0..500] of integer;
   and, for coalesced chaining, use the hash function h(key) = key mod 431. Use a random number generator to produce a sequence of integers to store in the table. Determine the load factor and the average number of probes needed to find an integer in the table.

7.5 Hashing Performance

In this discussion, the operations included in Specification 7.2 are divided into two groups. The operations of the first group do not involve searching: create, clear, and traverse. The effort to execute these operations does not depend on which collision-resolution strategy is used, but on the table size. Operations create and clear require O(tablesize) effort, since each position of the table must be initialized to empty.


Operation traverse requires probing O(tablesize) table positions and processing each element in the table. Each operation of the second group requires searching the hash table for an element with a target key value: insert, delete, update, retrieve, and findkey. The searches are either successful (an element with the target key value is found) or unsuccessful. The performance of all these operations is therefore determined primarily by the performance of the associated search. We will discuss the performance of successful and unsuccessful searches, and we will single out the delete operation for discussion later.

7.5.1 Performance

Expressions that give the expected number of probes required for successful and unsuccessful searches have been developed for the different collision-resolution policies. Figure 7.39 shows the algebraic expressions, and Figure 7.40 shows their graphs. Knuth (1973b) gives the development of the results for linear rehashing, double hashing, and external chaining; the results given for double hashing are those for random rehashing, which double hashing closely approximates. Expressions for coalesced chaining are given in Vitter (1982). Note that if the cellar is not full, the search effort of coalesced chaining is the same as for external chaining. In general, the performance of coalesced chaining is approximately the same as that of external chaining. See Vitter (1982), in which the performance of coalesced chaining is compared with that of the other hashing techniques considered in this chapter.
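For reference, the standard approximations from Knuth's analysis are written out below as a companion to Figure 7.39, whose exact forms may differ in detail. Here α is the load factor (number of elements divided by tablesize), S is the expected number of probes in a successful search, and U is the expected number in an unsuccessful one. The open addressing formulas hold only for α < 1, and the coalesced chaining expressions (Vitter 1982) are more involved and are omitted.

```latex
\begin{aligned}
\text{Linear rehashing:}\quad & S \approx \tfrac{1}{2}\Bigl(1 + \frac{1}{1-\alpha}\Bigr), &
U &\approx \tfrac{1}{2}\Bigl(1 + \frac{1}{(1-\alpha)^{2}}\Bigr) \\
\text{Double (random) rehashing:}\quad & S \approx \frac{1}{\alpha}\ln\frac{1}{1-\alpha}, &
U &\approx \frac{1}{1-\alpha} \\
\text{External chaining:}\quad & S \approx 1 + \frac{\alpha}{2}, &
U &\approx \alpha + e^{-\alpha}
\end{aligned}
```

Only the external chaining expressions remain finite as α reaches and passes 1.0, which matches the earlier remark that external chaining keeps performing well beyond that load factor.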

Figure 7.39 Algebraic expressions for the expected number of probes in successful and unsuccessful searches, by collision-resolution strategy (linear rehashing, double hashing, external chaining, coalesced chaining).

Observe in Figures 7.39 and 7.40 that the performance curves for the hashing methods are monotonically increasing functions of the load factor. The performance curves for lists and trees are also monotonically increasing functions, but of the number of elements in the structure, which is not under the implementor's control. The load factor, however, may be made arbitrarily small by increasing the table size. Thus, for a given number of elements, we can reduce the load factor and improve the performance of hashing at the price of more memory.
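Evaluating the standard approximations numerically illustrates both points: every curve rises with the load factor, and for a fixed number of elements, doubling tablesize halves α and lowers the expected number of probes. The formulas are Knuth's standard ones, assumed here to match Figure 7.39; the function name and the example sizes are hypothetical.

```python
import math


def expected_probes(alpha: float) -> dict[str, tuple[float, float]]:
    """(successful, unsuccessful) expected probes at load factor alpha.

    Standard approximations; the open addressing entries need 0 < alpha < 1.
    """
    return {
        "linear rehashing": (0.5 * (1 + 1 / (1 - alpha)),
                             0.5 * (1 + 1 / (1 - alpha) ** 2)),
        "double hashing": (math.log(1 / (1 - alpha)) / alpha,
                           1 / (1 - alpha)),
        "external chaining": (1 + alpha / 2,
                              alpha + math.exp(-alpha)),
    }


n = 900                                  # number of elements to be stored
for tablesize in (1000, 2000):           # more memory gives a smaller load factor
    alpha = n / tablesize
    print(f"tablesize={tablesize}, load factor={alpha:.2f}")
    for method, (s, u) in expected_probes(alpha).items():
        print(f"  {method:18s} successful={s:5.2f} unsuccessful={u:6.2f}")
```

At load factor 0.9, unsuccessful linear rehashing needs about 50 probes; at 0.45 it needs fewer than 2.5, the memory-for-speed trade the paragraph describes.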

7.5.2 Memory Requirements

In addition to performance, it is important to compare the memory requirements of the various hashing techniques. Assume that a pointer occupies one word of memory and that each element occupies a fixed number of words.

Figure 7.40 Graphs of the expected number of probes in successful and unsuccessful searches for the different collision-resolution strategies, as functions of load factor.

The expressions for the memory requirements are based on the following assumptions. For open addressing, each position in the hash table contains room for one element. For coalesced chaining, each position contains room for one element and one pointer. For external chaining, each position of the hash table contains one pointer, and each element in a list requires room for the element and one pointer.

We consider two cases. In the first, we store in the table a pointer to the element rather than the element itself, so that an element occupies the same amount of memory as a pointer. The memory required by the techniques in this case is shown in Figure 7.41.

Figure 7.41 Memory requirements versus load factor when an element occupies the same amount of memory as a pointer.

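In this case open addressing always requires the least memory. When the table is nearly full, open addressing requires only about one-third as much memory as external chaining. Of course, open addressing with a nearly full table has poor performance. In this case coalesced chaining provides good performance with a substantial saving of memory over external chaining. In the second case, an element occupies 10 times as much memory as a pointer; the memory requirements are shown in Figure 7.42. Because open addressing and coalesced chaining reserve room for an element in every table position, occupied or not, external chaining now requires less memory over a wide range of load factors, and its memory penalty appears only for nearly full tables.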
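The memory expressions implied by these assumptions can be written out and checked against the two cases. This is a sketch under the stated assumptions (a pointer is one word, an element is e words, the table has tablesize positions including any cellar, and holds n elements); the function name and the symbols e and n are ours, not the text's.

```python
def memory_words(tablesize: int, n: int, e: int) -> dict[str, int]:
    """Words of memory needed by each technique under the stated assumptions."""
    return {
        # every position has room for one element
        "open addressing": tablesize * e,
        # every position has room for one element and one pointer
        "coalesced chaining": tablesize * (e + 1),
        # one pointer per position, plus element and pointer for each list node
        "external chaining": tablesize + n * (e + 1),
    }


# Case 1: the table stores pointers to elements (e = 1), table full.
print(memory_words(tablesize=1000, n=1000, e=1))   # open addressing uses one-third of external chaining

# Case 2: an element occupies 10 words, load factor 0.5.
print(memory_words(tablesize=1000, n=500, e=10))   # external chaining now uses the least
```

With e = 1 and a full table the three techniques need 1, 2 and 3 words per position, which is where the one-third figure comes from.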

This analysis leads to the following rules of thumb for constructing hash tables to be stored in RAM. For small elements and small load factors, use open addressing. For small elements and large load factors, use coalesced chaining; it provides good performance with minimum or nearly minimum memory. If the elements are large, use external chaining; it provides good performance with reasonable memory requirements. These rules are based on the assumption that the maximum number of elements to be stored in the table can be estimated. Often that is not the case.

For example, the symbol table used by a compiler stores the user-defined identifiers of the programs it processes. The compiler must be able to process both large and small programs, with widely varying numbers of identifiers, and it should continue to operate smoothly when the table overfills. Such situations are handled by the use of external chaining, which continues to function for load factors greater than 1.0.

Figure 7.42 Memory requirements versus load factor when an element occupies 10 times as much memory as a pointer.

7.5.3 Deletion

We will conclude this section with a few comments about deletion. As discussed earlier, hash tables that are constructed using open addressing techniques pose problems when subjected to deletion. The problem arises because the space previously occupied by a deleted record cannot be marked empty but must be marked deleted. Such records clutter the table and hurt performance. If deletion is frequent, external chaining should be used for collision resolution, since deletion is then handled essentially as it is for any linked list. For coalesced chaining, deletion can be handled in the same way as long as the cellar has never been full, since until then lists do not coalesce. After that, deletion must be handled carefully; an algorithm is given in Vitter (1982). It is slightly more complex and carries a small performance penalty. When designing a hash table, the predicted frequency of deletion must be considered along with performance and memory requirements.

In Section 7.6 we will see how the theoretical results for several hashing methods apply in a specific case: the frequency analysis of digraphs.
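The marking problem for open addressing shows up in a short Python sketch of linear rehashing with a deleted marker. It assumes integer keys and h(key) = key mod tablesize, and the names are hypothetical. A deleted position is skipped by searches but may be reused by insertions.

```python
EMPTY = object()    # never-used position: a search may stop here
DELETED = object()  # previously used position: a search must continue past it


class LinearRehashTable:
    def __init__(self, tablesize: int) -> None:
        self.slots: list = [EMPTY] * tablesize

    def probe_sequence(self, key: int):
        m = len(self.slots)
        start = key % m                                # home address
        return ((start + i) % m for i in range(m))     # linear rehashing

    def findkey(self, key: int) -> bool:
        for j in self.probe_sequence(key):
            s = self.slots[j]
            if s is EMPTY:
                return False               # the key cannot be further along
            if s is not DELETED and s == key:
                return True
        return False

    def insert(self, key: int) -> None:
        reuse = None
        for j in self.probe_sequence(key):
            s = self.slots[j]
            if s is EMPTY:
                if reuse is None:
                    reuse = j
                break
            if s is DELETED:
                if reuse is None:
                    reuse = j              # remember the first reusable position
            elif s == key:
                return                     # already present
        if reuse is None:
            raise OverflowError("hash table is full")
        self.slots[reuse] = key

    def delete(self, key: int) -> bool:
        for j in self.probe_sequence(key):
            s = self.slots[j]
            if s is EMPTY:
                return False
            if s is not DELETED and s == key:
                # Marking this position EMPTY would cut the probe sequence
                # of any key stored further along, such as 17 below.
                self.slots[j] = DELETED
                return True
        return False


t = LinearRehashTable(7)
for k in (3, 10, 17):   # all three have home address 3
    t.insert(k)         # stored at positions 3, 4 and 5
t.delete(10)
print(t.findkey(17))    # True: the search passes over the deleted marker
```

Every DELETED marker lengthens later searches until an insertion reuses the position, which is the clutter the text warns about.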

7.6 Frequency Analysis of Digraphs

We have discussed the frequency analysis of digraphs before, using lists and search trees. In this section we will use hash tables and compare their performance with that of those data structures. All four hashing strategies use the same hashing function; they differ only in the collision-resolution strategy: linear rehashing, double hashing, coalesced chaining, and external chaining. We will conclude with a summary of results involving all the data structures used.

The hash table is shown in Figure 7.43. The hashing function must map each digraph, that is, each pair of letters, into an address in the table. We accomplish this as follows. Let i and j be integers between 0 and 25 that correspond to the first and second letters of the digraph. Finally, let d be computed as d = 26i + j, so that d has values between 0 and 675; sample values of d are shown in Figure 7.44. The hash function for a digraph is then h(digraph) = d mod tablesize, where tablesize is to be selected so that … .

Figure 7.43 Hash table for the frequency analysis of digraphs.

The frequency analysis results shown in this section are based on the same list of digraphs used in Section 4.9. Figures 7.45 and 7.46 show the results for the four collision-resolution strategies, with binary search of a sorted array and search trees included for comparison. Recall (see Figure 4.…) that each distinct digraph must be entered into the hash table. The relationship between tablesize and the number of digraphs processed is shown in Figure 7.47.

Figure 7.48 shows the average time required to process a digraph for the four hashing techniques and, for comparison, for the binary search tree. Also included for comparison is the time required by direct addressing. Direct addressing is implemented just like hashing, but in this case no collisions are possible, because we can assign a distinct address to each of the 676 digraphs. This simplifies the algorithms and ensures that only one probe is needed for each digraph. The price for this is a larger memory requirement for the table than with hashing. Direct addressing should not be confused with hashing: hashing randomizes the placement of the elements stored in the hash table, whereas our direct addressing scheme places the digraphs in the table in alphabetical order.

Figure 7.47 Frequency analysis of digraphs, plotted against the number of digraphs processed.
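Both addressing schemes can be sketched in Python. The sketch assumes that digraphs are pairs of adjacent letters within a word, and it numbers letters a = 0 through z = 25 so that d = 26i + j, as in the text. The function names and the word-splitting rule are assumptions. Because d runs from 0 (aa) to 675 (zz) in alphabetical order, a 676-position array indexed by d gives direct addressing, while d mod tablesize gives a hash address that may collide.

```python
import re


def digraph_number(pair: str) -> int:
    """d = 26*i + j with letters numbered a=0 .. z=25; ranges over 0..675."""
    i = ord(pair[0]) - ord("a")
    j = ord(pair[1]) - ord("a")
    return 26 * i + j


def hash_address(pair: str, tablesize: int) -> int:
    """Hash address h = d mod tablesize; different digraphs may collide."""
    return digraph_number(pair) % tablesize


def digraphs(text: str):
    """Yield adjacent letter pairs inside each word (assumed rule)."""
    for word in re.findall(r"[a-z]+", text.lower()):
        for k in range(len(word) - 1):
            yield word[k:k + 2]


def direct_address_counts(text: str) -> list[int]:
    """Direct addressing: one counter per possible digraph, so no collisions."""
    counts = [0] * (26 * 26)
    for pair in digraphs(text):
        counts[digraph_number(pair)] += 1
    return counts


counts = direct_address_counts("The theme of the thesis")
print(counts[digraph_number("th")], counts[digraph_number("he")])  # 4 4
print(hash_address("th", 101), hash_address("dt", 101))            # 97 97: a collision
```

The direct-addressed table always has 676 counters, however few distinct digraphs occur, which is the memory price mentioned above.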
