You are on page 1of 56

HASHING

1
Hashing
• Mathematical concept
– To define any number as set of numbers in
given interval
– To cut down part of number
– Used in discreet maths, e.g graph theory, set
theory
– Used in Searching technique
– Used in encryption methods

2
Hash Functions and Hash
Tables
• Hashing has 2 major components
– Hash function h
– Hash Table Data Structure of size N
• A hash function h maps keys (a identifying element of record set) to hash value or
hash key which refers to specific location in Hash table
• Example:
h(x)  x mod N
is a hash function for integer keys
• The integer h(x) is called the hash value of key x

3
Hash Functions and Hash Tables
• A hash table data structure is an array or array
type ADTof some fixed size, containing the
keys.
• An array in which records are not stored
consecutively - their place of storage is
calculated using the key and a hash function

hash array
Key index
function

4
• Hashed key: the result of applying a hash function to a
key
• Keys and entries are scattered throughout the array
• Contains the main advantages of both Arrays and Trees
• Mainly the topic of hashing depends upon the two main
factors / parts
(a) Hash Function (b) Collision Resolution
• Table Size is also an factor (miner) in Hashing, which is
0 to tablesize-1.

5
Table Size
• Hash table size
– Should be appropriate for the hash function used

– Too big will waste memory; too small will


increase collisions and may eventually force
rehashing (copying into a larger table)

6
Example
• We design a hash table for a
dictionary storing items 0 
(SSN, Name), where SSN 1 025-612-0001
(social security number) is a 2 981-101-0002
nine-digit positive integer 3 

• The actual data is not stored 4 451-229-0004


in hash table
• Pin points the location of 9997 
actual data or set of data 9998 200-751-9998

• Our hash table uses an array


9999 

of size N10,000 and the


hash function
h(x)last four digits of x
7
Hash Function
• The mapping of keys into the table is called Hash
Function

• A hash function,
– Ideally, it should distribute keys and entries evenly
throughout the table
– It should be easy and quick to compute.
– It should minimize collisions, where the position
given by the hash function is already occupied
– It should be applicable to all objects
8
• Different types of hash functions are used for the
mapping of keys into tables.

(a) Division Method


(b) Folding Method
(c) Mid-square Method
(d) Subtraction Method
(e) Digit-Extraction Method
(f) Rotation Method

9
1. Division Method
• Choose a number m larger than the number n of keys
in k.
• The number m is usually chosen to be a prime no.
• The hash function H is defined as,
H(k) = k(mod m) or H(k) = k(mod m) + 1
• Denotes the remainder, when k is divided by m
• 2nd formula is used when range is from 1 to m.

10
• Example:
Elements are: 3205, 7148, 2345

Table size: 0 – 99 (prime)


m = 97 (prime)

H(3205)= 4, H(7148)=67, H(2345)=17

• For 2nd formula add 1 into the remainders.

11
2. Folding Method
• The key k is partitioned into no. of parts
• Then add these parts together and ignoring the
last carry.
• One can also reverse the first part before
adding (right or left justified. Mostly right)
H(k) = k1 + k2 + ………. + kn

12
• Example:

H(3205)=32+05=37 or H(3250)=32+50=82

H(7148)=71+48=19 or H(7184)=71+84=55

H(2345)=23+45=77 or H(2354)=23+54=68

13
3. Mid-Square Method
• The key k is squared. Then the hash function H is
defined as
H(k) = l
• The l is obtained by deleting the digits from both
ends of K2.

• The same position must be used for all the keys.

14
• Example:
k: 3205 7148 2345
k2: 10272025 51093904 5499025
H(k): 72 93 99

• 4th and 5th digits have been selected. From the


right side.

15
4. Subtraction Method

16
5. Digit-Extraction Method

1st, 3rd and 4th digits has been selected and used as an
address 17
6. Rotation Method

18
Example

19
Example

20
Collision Resolution Strategies
• If two keys map on the same hash table index then we
have a collision.
• As the number of elements in the table increases, the
likelihood of a collision increases - so make the table
as large as practical
• Collisions may still happen, so we need a collision
resolution strategy

21
• Two approaches are used to resolve collisions.
(a) Separate chaining: chain together several keys/entries
in each position.
(b) Open addressing: store the key/entry in a different
position.
• Probing: If the table position given by the hashed
key is already occupied, increase the position by
some amount, until an empty position is found

22
Separate Chaining
• The idea is to keep a list of all elements that hash
to the same value.
– The array elements are pointers to the first nodes of
the lists.
– A new item is inserted to the front of the list.
• Advantages:
– Better space utilization for large items.
– Simple collision handling: searching linked list.
– Overflow: we can store more items than the hash table
size.
– Deletion is quick and easy: deletion from the linked list.

23
Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10.
0 0

1 81 1
2

4 64 4
5 25
6 36 16
7

9 49 9

24
Open Addressing

• Types of open addressing are

1. Linear Probing
2. Quadratic Probing
3. Double Hashing.

25
1. Linear Probing
• Locations are checked from the hash location k to the end
of the table and the element is placed in the first empty
slot
• If the bottom of the table is reached, checking “wraps
around” to the start of the table. Modulus is used for this
purpose
• Thus, if linear probing is used, these routines must
continue down the table until a match or empty location
is found

26
• Linear probing is guaranteed to find a slot for the
insertion if there still an empty slot in the table.
• Even though the hash table size is a prime number is
probably not an appropriate size; the size should be at
least 30% larger than the maximum number of elements
ever to be stored in the table.

• If the load factor is greater than 50% - 70% then the


time to search or to add a record will increase.

27
H(k)=h, h+1, h+2, h+3,……, h+I

• However, linear probing also tends to promote


clustering within the table.

1 2 3 4 5 6 7 8

28
2. Quadratic Probing
• Quadratic probing is a solution to the clustering
problem
– Linear probing adds 1, 2, 3, etc. to the original
hashed key
– Quadratic probing adds 12, 22, 32 etc. to the original
hashed key
• However, whereas linear probing guarantees that all
empty positions will be examined if necessary,
quadratic probing does not

29
• If the table size is prime, this will try approximately
half the table slots.
• More generally, with quadratic probing, insertion may
be impossible if the table is more than half-full!

H(k) = h, h+1, h+4, h+9, h+16,……, h+i2

30
3. Double Hashing
• 2nd hash function H’ is used to resolve the collision.
• Here H’(k) = h’ ≠ m
• Therefore we can search the locations with addresses,
H’(k) = h, h+h’, h+2h’, h+3h’,…….
• If m is prime, then this sequence access all the
locations.

31
Double Hashing
• Double hashing uses a
secondary hash function • Common choice of
d(k) and handles compression map for the
collisions by placing an secondary hash function:
item in the first available d2(k)  k mod q
cell of the series
(h  jd(k)) mod N where
for j  0, 1, … , N  1 – qN
• The secondary hash – q is a prime
function d(k) cannot have • The possible values for
zero values d2(k) are
• The table size N must be 1, 2, … , q
a prime to allow probing
of all the cells
32
Example of Double Hashing
k h(k) d(k) Probes
18 5 4 5
• Consider a hash 41 2 6 2
table storing integer 22 9 1 9
keys that handles 44 5 2 5 7
collision with double 59 7 3 7 10
32 6 4 6
hashing
31 5 3 5 8
– N13 73 8 3 8 11
– h(k)  k mod 13
– d(k)  k mod 7
0 1 2 3 4 5 6 7 8 9 10 11 12
• Insert keys 18, 41,
22, 44, 59, 32, 31,
41 18 32 44 31 22 59 11
73, in this order
0 1 2 3 4 5 6 7 8 9 10 11 12
33
Building
Building aa Hash
Hash Table
Table
• The simplest kind of hash
table is an array of records.
• This example has 701
records.

[0] [1] [2] [3] [4] [5] [ 700]

...
Building
Building aa Hash
Hash Table
Table [ 4 ]
Number 506643548

• Each record has a special field,


called its key.
• In this example, the key is a
long integer field called Number.

[0] [1] [2] [3] [4] [5] [ 700]


• Typical way to create a hash ...
value:

(Number mod 701)


Building
Building aa Hash
Hash Table
Table[ 4 ]
Number 506643548

• The number might be a


person's identification
number, and the rest of the
record has information
about the person.
[0] [1] [2] [3] [4] [5] [ 700]

...
Building
Building aa Hash
Hash Table
Table
• When a hash table is in use,
some spots contain valid
records, and other spots are
"empty".

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 506643548 Number 155778322

...
Inserting
Inserting aa New
New Record
Record Number 580625685

• Typical way create a hash


value:
(Number mod 701)

What is (580625685 % 701) ?

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 506643548 Number 155778322

...
Number 580625685

• Typical way to create a hash


value:
(Number mod 701)
3
What is (580625685 % 701) ?

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 506643548 Number 155778322

...
Number 580625685

• The hash value is used for


the location of the new
record.

[3]

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 506643548 Number 155778322

...
Inserting
Inserting aa New
New Record
Record
• The hash value is used for
the location of the new
record.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 155778322

...
Collisions
Collisions Number 701466868

• Here is another new record


to insert, with a hash value
of 2.
My hash
value is [2].

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 155778322

...
Collisions
Collisions Number 701466868

• This is called a collision,


because there is already
another valid record at [2].

When
When aa collision
collision occurs,
occurs,
move
move forward
forward until
until you
you
find
find an
an empty
empty spot.
spot.
[0] [1] [2] [3] [4] [5] [ 700]
Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 155778322

...
Collisions
Collisions Number 701466868

• This is called a collision,


because there is already
another valid record at [2].

When
When aa collision
collision occurs,
occurs,
move
move forward
forward until
until you
you
find
find an
an empty
empty spot.
spot.
[0] [1] [2] [3] [4] [5] [ 700]
Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 155778322

...
Collisions
Collisions Number 701466868

• This is called a collision,


because there is already
another valid record at [2].

When
When aa collision
collision occurs,
occurs,
move
move forward
forward until
until you
you
find
find an
an empty
empty spot.
spot.
[0] [1] [2] [3] [4] [5] [ 700]
Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 155778322

...
Collisions
Collisions
• This is called a collision,
because there is already
another valid record at [2].

The
The newnew record
record goes
goes
in
in the
the empty
empty spot.
spot.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Searching
Searching for
for aa Key
Key Number 701466868

• The data that's attached to a


key can be found fairly
quickly.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Number 701466868

• Calculate the hash value.


• Check that location of the array
for the key.
My hash
value is [2].
Not me.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Number 701466868

• Keep moving forward until you


find the key, or you reach an
empty spot.
My hash
value is [2].
Not me.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Number 701466868

• Keep moving forward until you


find the key, or you reach an
empty spot.
My hash
value is [2].
Not me.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Number 701466868

• Keep moving forward until you


find the key, or you reach an
empty spot.
My hash
value is [2].
Yes!

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Number 701466868

• When the item is found, the


information can be copied to
the necessary location.
My hash
value is [2].
Yes!

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Deleting
Deleting aa Record
Record
• Records may also be deleted from a hash table.

Please
delete me.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 506643548 Number 701466868 Number 155778322

...
Deleting
Deleting aa Record
Record
• Records may also be deleted from a hash table.
• But the location must not be left as an ordinary
"empty spot" since that could interfere with searches.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 701466868 Number 155778322

...
Deleting
Deleting aa Record
Record
• Records may also be deleted from a hash table.
• But the location must not be left as an ordinary
"empty spot" since that could interfere with searches.
• The location must be marked in some special way so
that a search can tell that the spot used to have
something in it.

[0] [1] [2] [3] [4] [5] [ 700]


Number 281942902 Number 233667136 Number 580625685 Number 701466868 Number 155778322

...
Applications of Hashing
• Compilers use hash tables to keep track of declared
variables
• A hash table can be used for on-line spelling checkers
— if misspelling detection (rather than correction) is
important, an entire dictionary can be hashed and
words checked in constant time
• Game playing programs use hash tables to store seen
positions, thereby saving computation time if the
position is encountered again
• Hash functions can be used to quickly check for
inequality — if two elements hash to different values
they must be different

56

You might also like