
Unit-4

Dictionaries
-A dictionary is a collection of pairs of the form (k, v), where k is a key
and v is the value associated with the key.
-No two pairs in a dictionary have the same key.
The following operations are performed on a dictionary:
 - Determine whether or not the dictionary is empty.
 - Determine the dictionary size.
 - Find the pair with a specified key.
 - Insert a pair into the dictionary.
 - Delete or erase the pair with a specified key.
-A dictionary with duplicates is similar to a dictionary as defined above.
However, it permits two or more (key, value) pairs to have the same key.
-For example, a word dictionary is a collection of pairs; each pair
comprises a word and its value. The value of a word includes the
meaning of the word, the pronunciation, and so on.
-A telephone directory is another example of a dictionary with
duplicates.
-Another example is the symbol table of user-defined identifiers, a
dictionary with duplicates used by a compiler. When an identifier is
defined, a pair (key, value) is created for it and inserted into the symbol
table. The identifier is the key, and information such as the identifier
type (int, float, etc.) and the (relative) memory address for the value of
the identifier make up the value component of the pair.
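-The difference between the two kinds of dictionary can be sketched with
the standard containers std::map (no duplicate keys) and std::multimap
(duplicates allowed). This is only an illustration: the names Table,
TableWithDuplicates, define, and pairsFor are hypothetical and do not
appear elsewhere in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical symbol-table types: key = identifier name, value = its type.
using Table = std::map<std::string, std::string>;                     // no duplicate keys
using TableWithDuplicates = std::multimap<std::string, std::string>;  // duplicates allowed

// Insert or replace: at most one pair per key survives.
void define(Table& t, const std::string& id, const std::string& type) {
    assert(!id.empty());
    t[id] = type;
}

// Insert unconditionally: several pairs may share the same key.
void define(TableWithDuplicates& t, const std::string& id, const std::string& type) {
    assert(!id.empty());
    t.insert({id, type});
}

// Number of pairs stored under `id` (works for both container types).
template <class Dict>
std::size_t pairsFor(const Dict& t, const std::string& id) {
    return t.count(id);
}
```

Defining the identifier x twice leaves one pair in a Table (the later
value replaces the earlier one) and two pairs in a TableWithDuplicates.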
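Dictionary ADT
Node ADT
#include<iostream> //needed for cout in display() and erase()
using namespace std;
template<class K,class E> class dictionary; //forward declaration for the friend declaration
template<class K,class E>
class pairnode
{
    K key;
    E value;
    pairnode<K,E> *link;
    friend class dictionary<K,E>;
};
Dictionary class ADT
template<class K,class E>
class dictionary
{
    pairnode<K,E> *start; //points to the first pair
    int dsize; //number of pairs in the dictionary
public:
    dictionary();
    int empty();
    int size();
    E find(K);
    void insert(K,E);
    void erase(K);
    void display();
};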
-Representations of dictionary
 - Linear list representation (or linked list representation)
 - Skip list representation
 - Hash table representation
Linear list representation of Dictionary
-A dictionary may be maintained as an ordered linear list (p0, p1, …),
where the pi are the dictionary pairs in ascending order of key.
-Each node in the dictionary has the following fields:
Key | Value | Link
Here the link field contains the address of the next node in the dictionary.
-start is a pointer that points to the first pair in the dictionary.
Example
Each node is shown as address: [key | value | link].
start=NULL
Dictionary is empty

√insert (10, A) into a dictionary
start=2000
2000: [10 | A | NULL]

√insert (20, B) into a dictionary
start=2000
2000: [10 | A | 3000] -> 3000: [20 | B | NULL]

√insert (30, C) into a dictionary
start=2000
2000: [10 | A | 3000] -> 3000: [20 | B | 5000] -> 5000: [30 | C | NULL]

√insert (20, X) into a dictionary
start=2000
2000: [10 | A | 3000] -> 3000: [20 | X | 5000] -> 5000: [30 | C | NULL]

√Find node 30 in a dictionary
Value is C

√Find node 100 in a dictionary
Key is not found

√remove pair whose key is 30 in a dictionary
start=2000
2000: [10 | A | 3000] -> 3000: [20 | X | NULL]
Deleted pair is (30, C)

√remove pair whose key is 20 in a dictionary
start=2000
2000: [10 | A | NULL]
Deleted pair is (20, X)

√remove pair whose key is 10 in a dictionary
start=NULL
Dictionary is empty
Deleted pair is (10, A)

√remove pair whose key is 35 in a dictionary
Dictionary is empty
Implementation of Dictionary using linked list (using template class)
-The following constructor is used to initialize start pointer and dsize
variable
template<class K,class E>
dictionary<K,E>::dictionary()
{
start=NULL; dsize=0; //dsize is dictionary size
}

-The following function is used to return the dictionary size
template<class K,class E>
int dictionary<K,E>::size()
{
return dsize;
}
-The following function is used to check whether a dictionary is empty or not.
It returns 1 when the dictionary is empty and 0 otherwise.
template<class K,class E>
int dictionary<K,E>::empty()
{
    if(start==NULL) return 1;
    else return 0;
}
-The following function is used to insert a new pair into the dictionary.
template<class K,class E>
void dictionary<K,E>::insert(K kkey,E vvalue)
{
    pairnode<K,E> *tempptr=start,*ptempptr=NULL;
    while(tempptr!=NULL && tempptr->key<kkey)
    {
        ptempptr=tempptr;
        tempptr=tempptr->link;
    }
    if(tempptr!=NULL && tempptr->key==kkey)
    { //replace old value with new value
        tempptr->value=vvalue;
        return;
    }
    pairnode<K,E> *newnode=new pairnode<K,E>;
    newnode->key=kkey;
    newnode->value=vvalue;
    if(ptempptr==NULL) //inserting node at the beginning
    {
        newnode->link=start;
        start=newnode;
    }
    else if(tempptr==NULL) //inserting node at the end
    {
        newnode->link=NULL;
        ptempptr->link=newnode;
    }
    else //inserting node in the middle
    {
        newnode->link=tempptr;
        ptempptr->link=newnode;
    }
    dsize++;
}
-The following function is used to find the pair whose key matches the
given key in a dictionary.
template<class K,class E>
E dictionary<K,E>::find(K fkey)
{
    pairnode<K,E> *tempptr=start;
    while(tempptr!=NULL && tempptr->key!=fkey )
        tempptr=tempptr->link;
    if(tempptr!=NULL) //loop stopped on a matching key
        return tempptr->value;
    return E(); //key not found: return a default-constructed value
}
-The following function is used to display dictionary pairs
template<class K,class E>
void dictionary<K,E>::display()
{
    if(start==NULL) cout<<endl<<"Dictionary is empty";
    else
    {
        pairnode<K,E> *tempptr=start;
        cout<<endl<<"Dictionary pairs are...";
        while(tempptr!=NULL)
        {
            cout<<tempptr->key<<"-"<<tempptr->value<<" ";
            tempptr=tempptr->link;
        }
    }
}
-The following function is used to remove an existing pair whose key
matches the given key in a dictionary.
template<class K,class E>
void dictionary<K,E>::erase(K kkey)
{
    if(start==NULL) cout<<endl<<"Dictionary is empty";
    else
    {
        pairnode<K,E> *tempptr=start,*ptempptr=NULL;
        while(tempptr!=NULL && tempptr->key<kkey)
        {
            ptempptr=tempptr;
            tempptr=tempptr->link;
        }
        if(tempptr!=NULL && tempptr->key==kkey)
        {
            if(ptempptr==NULL) //deleting beginning node
                start=tempptr->link;
            else //deleting other nodes
                ptempptr->link=tempptr->link;
            delete tempptr;
            dsize--; //keep the size in step with the list
        }
        else
        {
            cout<<endl<<"Key is not found";
        }
    }
}
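-The walk-through example (insert 10, 20, 30, replace 20, find, erase) can
be replayed against a condensed variant of the sorted linked list. The
sketch below is not the class defined above: MiniDict and locate are
hypothetical names, and find and erase report success through a bool
instead of returning a value or printing a message, which is one possible
way to signal "key is not found".

```cpp
#include <cassert>
#include <cstddef>

// Condensed, hypothetical variant of the sorted linked-list dictionary.
template <class K, class E>
class MiniDict {
    struct Node { K key; E value; Node* link; };
    Node* start = nullptr;
    std::size_t count = 0;

    // Address of the link that points at the first node with key >= k.
    Node** locate(const K& k) {
        Node** p = &start;
        while (*p != nullptr && (*p)->key < k) p = &(*p)->link;
        return p;
    }

public:
    MiniDict() = default;
    MiniDict(const MiniDict&) = delete;
    MiniDict& operator=(const MiniDict&) = delete;
    ~MiniDict() { while (start) { Node* n = start; start = n->link; delete n; } }

    std::size_t size() const { return count; }
    bool empty() const { return start == nullptr; }

    // Inserts a new pair in key order, or replaces the value of an existing key.
    void insert(const K& k, const E& v) {
        Node** p = locate(k);
        if (*p != nullptr && (*p)->key == k) { (*p)->value = v; return; }
        *p = new Node{k, v, *p};
        ++count;
    }

    // Copies the value into `out` and returns true when the key exists.
    bool find(const K& k, E& out) {
        Node* n = *locate(k);
        if (n == nullptr || !(n->key == k)) return false;
        out = n->value;
        return true;
    }

    // Unlinks and frees the node with key k; returns false if there is none.
    bool erase(const K& k) {
        Node** p = locate(k);
        if (*p == nullptr || !((*p)->key == k)) return false;
        Node* victim = *p;
        *p = victim->link;
        delete victim;
        assert(count > 0);
        --count;
        return true;
    }
};
```

Working through a pointer to the link (Node**) removes the separate
beginning, middle, and end cases, because start and every link field are
updated the same way.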
Hash table representation of Dictionary
-Another possibility for the representation of a dictionary is to use
hashing.
-This method uses a hash function to map dictionary pairs into
positions in a table called the hash table.
-In the ideal situation, if pair p has the key k and f is the hash
function, then p is stored in position f(k) of the table. Assume for now
that each position of the table can store at most one pair.
-To search for pair with key k, we compute f(k) and see whether a
pair exists at position f(k) of the table. If so, we have found the
desired pair. If not, the dictionary contains no pair with the specified
key k. In the former case the pair may be deleted (if desired) by
making position f(k) of the table empty. In the latter case the pair may
be inserted by placing it in position f(k).
Hash functions
Division
-This function makes use of modulo arithmetic.
f(key)=key % tablesize
Where tablesize is the hash table size. It is best if tablesize is a prime
number.
-The positions in the hash table are indexed 0 through tablesize-1.
-For example, when tablesize is 11, the positions for the keys 3, 22,
27, 40, 80, and 96 are f(3)=3, f(22)=0, f(27)=5, f(40)=7, f(80)=3, and
f(96)=8, respectively.
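-The division function can be sketched in a few lines. The name
divisionHash is hypothetical, and keys are assumed to be non-negative
integers as in the example.

```cpp
#include <cassert>

// Division hash: f(key) = key % tableSize, giving positions 0 .. tableSize-1.
// Assumes non-negative keys, as in the examples of these notes.
unsigned divisionHash(unsigned key, unsigned tableSize) {
    assert(tableSize > 0);
    return key % tableSize;
}
```

With tableSize 11, divisionHash(3, 11) and divisionHash(80, 11) both give
3, so the two keys compete for the same position.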
Folding
-In this method, the key is divided into several parts and these parts
are combined using a simple operation such as addition. The resulting
number can be divided modulo tablesize to get the position in the
hash table.
-For example, with tablesize 11, key 123 is divided into the parts 12
and 3, whose sum is 15, so it can be stored at the position
f(123)=15%11=4 in the hash table.
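-A minimal sketch of folding, assuming the key is split into groups of
decimal digits starting from the left (so 123 becomes 12 and 3). The name
foldingHash and the parameter partDigits are hypothetical; other ways of
splitting the key are equally valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Folding hash (sketch): split the decimal digits of `key` into groups of
// `partDigits` digits starting from the left, add the groups, then reduce
// the sum modulo tableSize. Grouping from the left is an assumption chosen
// so that 123 splits into 12 and 3.
unsigned foldingHash(unsigned key, std::size_t partDigits, unsigned tableSize) {
    assert(partDigits > 0 && tableSize > 0);
    const std::string digits = std::to_string(key);
    unsigned sum = 0;
    for (std::size_t i = 0; i < digits.size(); i += partDigits) {
        // substr clamps the last group to the digits that remain
        sum += static_cast<unsigned>(std::stoul(digits.substr(i, partDigits)));
    }
    return sum % tableSize;
}
```

foldingHash(123, 2, 11) adds 12 and 3 and returns 15 % 11 = 4.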
Mid-square function
-In this method, the key is squared and the middle or mid part of the
result is used as the address.
-For example, key 25, whose square is 625, can be stored at the
position 2.
Extraction
-In this method, only a part of the key is used to compute the position.
-For example, suppose we want to store student records of the CSE
branch, whose roll numbers differ only in the last two digits
(07311A25XX). The last two digits are then used to place these records
in the hash table.
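-The mid-square and extraction functions can be sketched as follows. Both
function names are hypothetical. The mid-square sketch returns a single
middle digit, as in the example with key 25; which digit counts as the
middle of an even-length square is an assumption. The extraction sketch
assumes the roll number ends in two decimal digits.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Mid-square (sketch): square the key and return the middle decimal digit.
// For a square with an even number of digits, index size/2 is used;
// that tie-break is an assumption.
unsigned midSquareDigit(unsigned key) {
    const std::string sq =
        std::to_string(static_cast<unsigned long long>(key) * key);
    return static_cast<unsigned>(sq[sq.size() / 2] - '0');
}

// Extraction (sketch): use only the last two characters of a roll number,
// which are assumed to be decimal digits.
unsigned lastTwoDigits(const std::string& rollNumber) {
    assert(rollNumber.size() >= 2);
    const char tens = rollNumber[rollNumber.size() - 2];
    const char ones = rollNumber[rollNumber.size() - 1];
    assert(std::isdigit(static_cast<unsigned char>(tens)) &&
           std::isdigit(static_cast<unsigned char>(ones)));
    return static_cast<unsigned>((tens - '0') * 10 + (ones - '0'));
}
```

midSquareDigit(25) squares 25 to 625 and returns 2; a made-up roll number
such as 07311A2542 is placed at position 42.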
Collision
-When an element being inserted hashes to the same value as an
already inserted element, we have a collision and need to resolve it.
Collision resolution
-There are two methods
1. Separate chaining
2. Open addressing
Separate chaining
-The first strategy, commonly known as separate chaining, is to keep
a list of all elements that hash to the same value. We can use the array
of linked lists.
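-For example, insert keys 14, 43, 77, 58, 25, 66, 96, 13, and 88 into a
hash table of size 7 using f(key)=key%7. Each new key is placed at the
front of its list.
0 -> 77 -> 14 -> NULL
1 -> 43 -> NULL
2 -> 58 -> NULL
3 -> 66 -> NULL
4 -> 88 -> 25 -> NULL
5 -> 96 -> NULL
6 -> 13 -> NULL
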
-To perform a find, we use the hash function to determine which list
to traverse. We then perform a find in this list. To perform an insert,
we check the appropriate list to see whether the element is already in
place (if duplicates are expected, an extra data member is usually
kept, and this data member would be incremented in the event of a
match). If the element turns out to be new, it can be inserted at the
front of the list, since it is convenient and also because frequently it
happens that recently inserted elements are the most likely to be
accessed in the near future.
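-Separate chaining as described above can be sketched with an array of
singly linked lists. ChainedTable, position, and chainAt are hypothetical
names; the sketch stores non-negative int keys only, uses key % tableSize
as the hash function, and ignores duplicates instead of counting them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Separate chaining (sketch): one singly linked list per table position.
class ChainedTable {
    std::vector<std::forward_list<int>> lists;

    std::size_t position(int key) const {
        return static_cast<std::size_t>(key) % lists.size();  // assumes key >= 0
    }

public:
    explicit ChainedTable(std::size_t tableSize) : lists(tableSize) {
        assert(tableSize > 0);
    }

    // Hash to one list, then search only that list.
    bool find(int key) const {
        const std::forward_list<int>& list = lists[position(key)];
        return std::find(list.begin(), list.end(), key) != list.end();
    }

    // New keys go to the front of their list; keys already present are ignored.
    void insert(int key) {
        if (!find(key)) lists[position(key)].push_front(key);
    }

    const std::forward_list<int>& chainAt(std::size_t pos) const {
        return lists[pos];
    }
};
```

Inserting 14, 43, 77, 58, 25, 66, 96, 13, 88 into a ChainedTable of size
7 leaves 77 followed by 14 in list 0 and 88 followed by 25 in list 4.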
Open addressing
-Separate chaining hashing has the disadvantage of using linked lists.
This tends to slow the algorithm down a bit because of the time required
to allocate new cells.
-In open addressing, if a collision occurs, alternative cells are tried
until an empty cell is found. More formally, cells h0(x), h1(x), h2(x),
… are tried in succession, where hi(x)=(hash(x)+f(i)) mod tablesize,
with f(0)=0. The function f is the collision resolution strategy.
-There are three methods
1. Linear probing
2. Quadratic probing
3. Double hashing
Linear probing
-In linear probing, f is a linear function of i, typically f(i)=i. This
amounts to trying cells sequentially (with wraparound) in search of an
empty cell.
-The following figure shows the result of inserting keys 89, 18, 49,
58, 69 into a hash table of size 10 using the hash function
hash(x)=x%10 and the collision resolution strategy f(i)=i.
hash(89)=89%10=9, so 89 is inserted at the 9th position in the table.
hash(18)=18%10=8, so 18 is inserted at the 8th position in the table.
hash(49)=49%10=9, but 89 is already stored at the 9th position, so a
collision occurs. Check whether position (hash(49)+1)%10=0 is empty. It
is empty, so insert 49 at this position.
hash(58)=58%10=8, but 18 is already stored at the 8th position, so a
collision occurs.
check (hash(58)+1)%10=9, is not empty
check (hash(58)+2)%10=0, is not empty
check (hash(58)+3)%10=1, is empty, insert 58 at 1st position
Key 69 is inserted in a similar manner: positions 9, 0, and 1 are
occupied, so it goes to position 2.
Position  Empty table  After 89  After 18  After 49  After 58  After 69
0                                          49        49        49
1                                                    58        58
2                                                              69
3
4
5
6
7
8                                18        18        18        18
9                      89        89        89        89        89
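-The linear probing walk-through can be checked with a short sketch. The
names EMPTY and insertLinear are hypothetical; the table is a plain
vector of non-negative int keys in which -1 marks an unused cell, and the
hash function is key % tablesize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // hypothetical marker for an unused cell; keys are assumed >= 0

// Inserts key with linear probing, f(i) = i, and returns the cell used
// (or -1 if every cell is occupied).
int insertLinear(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t n = table.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t cell = (static_cast<std::size_t>(key) % n + i) % n;
        if (table[cell] == EMPTY) {
            table[cell] = key;
            return static_cast<int>(cell);
        }
    }
    return -1;
}
```

Starting from std::vector<int> table(10, EMPTY), inserting 89, 18, 49,
58, 69 in that order uses cells 9, 8, 0, 1, and 2.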
Quadratic probing
-In quadratic probing, f is a quadratic function of i, typically
f(i)=i^2. This amounts to trying cells at a distance of i^2 from the
original position in search of an empty cell.
-The following figure shows the result of inserting keys 89, 18, 49,
58, 69 into a hash table using the same hash function as before
(hash(x)=x%10) and the collision resolution strategy f(i)=i^2.
hash(89)=89%10=9, so 89 is inserted at the 9th position in the table.
hash(18)=18%10=8, so 18 is inserted at the 8th position in the table.
hash(49)=49%10=9, but 89 is already stored at the 9th position, so a
collision occurs. Check whether position (hash(49)+1^2)%10=0 is empty.
It is empty, so insert 49 at this position.
hash(58)=58%10=8, but 18 is already stored at the 8th position, so a
collision occurs.
check (hash(58)+1^2)%10=9, is not empty
check (hash(58)+2^2)%10=2, is empty, insert 58 at 2nd position
Key 69 is inserted in a similar manner: positions 9 and (9+1^2)%10=0
are occupied, so it goes to position (9+2^2)%10=3.

Position  Empty table  After 89  After 18  After 49  After 58  After 69
0                                          49        49        49
1
2                                                    58        58
3                                                              69
4
5
6
7
8                                18        18        18        18
9                      89        89        89        89        89
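-The quadratic probing walk-through can be checked the same way. The
names EMPTY and insertQuadratic are hypothetical; -1 marks an unused
cell, keys are non-negative, and the sketch gives up after tablesize
probes, which is one simple way to model a failed insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // hypothetical marker for an unused cell; keys are assumed >= 0

// Inserts key with quadratic probing, f(i) = i*i, and returns the cell
// used (or -1 if no empty cell is found within table.size() probes).
int insertQuadratic(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t n = table.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t cell = (static_cast<std::size_t>(key) % n + i * i) % n;
        if (table[cell] == EMPTY) {
            table[cell] = key;
            return static_cast<int>(cell);
        }
    }
    return -1;
}
```

On an empty table of size 10, inserting 89, 18, 49, 58, 69 in that order
uses cells 9, 8, 0, 2, and 3.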
Double hashing
-For double hashing, one popular choice is f(i)=i*hash2(x). This
formula says that we apply a second hash function to x and probe at
distances hash2(x), 2*hash2(x), …, and so on.
-A function such as hash2(x) = R - (x mod R), with R a prime smaller
than the table size, will work well.
-The following figure shows the result of inserting keys 89, 18, 49,
58, 69 into a hash table using the same hash function as before
(hash(x)=x%10) and the collision resolution strategy f(i)=i*hash2(x),
taking R as 7, which is a prime smaller than the table size.
-hash(89)=89%10=9, so 89 is inserted at the 9th position in the table.
-hash(18)=18%10=8, so 18 is inserted at the 8th position in the table.
-hash(49)=49%10=9, but 89 is already stored at the 9th position, so a
collision occurs. Apply the second hash function.
hash2(49)=7-(49%7)=7, so check whether the cell at a distance of 7,
position (9+7)%10=6, is empty. The 6th position is empty, so insert 49
at this position.
-hash(58)=58%10=8, but 18 is already stored at the 8th position, so a
collision occurs. Apply the second hash function.
hash2(58)=7-(58%7)=5, so check whether the cell at a distance of 5,
position (8+5)%10=3, is empty. The 3rd position is empty, so insert 58
at this position.
-hash(69)=69%10=9, but 89 is already stored at the 9th position, so a
collision occurs. Apply the second hash function.
hash2(69)=7-(69%7)=1, so check whether the cell at a distance of 1,
position (9+1)%10=0, is empty. The 0th position is empty, so insert 69
at this position.
-If we want to insert 60, hash(60)=60%10=0, but 69 is already stored at
the 0th position, so a collision occurs. Apply the second hash function.
hash2(60)=7-(60%7)=3. The cell at a distance of 3 (3rd position) is not
empty, the cell at a distance of 6 (6th position) is not empty, the cell
at a distance of 9 (9th position) is not empty, and the cell at a
distance of 12 (position 12%10=2) is empty, so insert 60 at the 2nd
position.

Position  Empty table  After 89  After 18  After 49  After 58  After 69
0                                                              69
1
2
3                                                    58        58
4
5
6                                          49        49        49
7
8                                18        18        18        18
9                      89        89        89        89        89
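-The double hashing walk-through, including the extra key 60, can be
checked with the sketch below. The names EMPTY and insertDouble are
hypothetical; -1 marks an unused cell, keys are non-negative, and R is
passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // hypothetical marker for an unused cell; keys are assumed >= 0

// Inserts key with double hashing, f(i) = i * hash2(key), where
// hash2(key) = R - (key % R). Returns the cell used, or -1 if no empty
// cell is found within table.size() probes.
int insertDouble(std::vector<int>& table, int key, int R) {
    assert(!table.empty() && key >= 0 && R > 0);
    const std::size_t n = table.size();
    const std::size_t step = static_cast<std::size_t>(R - key % R);  // 1 .. R
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t cell = (static_cast<std::size_t>(key) % n + i * step) % n;
        if (table[cell] == EMPTY) {
            table[cell] = key;
            return static_cast<int>(cell);
        }
    }
    return -1;
}
```

On an empty table of size 10 with R = 7, inserting 89, 18, 49, 58, 69,
and then 60 uses cells 9, 8, 6, 3, 0, and 2.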

Rehashing
-If the table gets too full, the operations start taking too long, and
insertions might fail for open addressing hashing with quadratic
resolution. This can happen if there are too many removals intermixed
with insertions. A solution, then, is to build another table that is
about twice as big (with an associated new hash function) and scan down
the entire original hash table, computing the new hash value for each
(nondeleted) element and inserting it in the new table.
-As an example, suppose the elements 13, 15, 24, and 6 are inserted
into an open addressing hash table of size 7. The hash function is h(x)
= x mod 7, and linear probing is used to resolve collisions. The
resulting hash table appears in Fig4.1.
-If 23 is inserted into the table, the resulting table in Fig4.2 will
be over 70 percent full.
Position  Fig4.1  Fig4.2
0         6       6
1         15      15
2                 23
3         24      24
4
5
6         13      13
-Because the table is so full, a new table is created. The size of this
table is 17, because this is the first prime greater than twice the old
table size. The new hash function is then h(x) = x mod 17. The old
table is scanned, and elements 6, 15, 23, 24, and 13 are inserted into
the new table. The resulting table appears in Fig4.3.
Position  Key
0
1
2
3
4
5
6         6
7         23
8         24
9
10
11
12
13        13
14
15        15
16
Fig4.3
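-The rehashing example can be sketched as follows, assuming linear
probing in both tables and a new size equal to the first prime greater
than twice the old size. EMPTY, insertLinear, isPrime, nextPrimeAbove,
and rehash are hypothetical names; -1 marks an unused cell and keys are
non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // hypothetical marker for an unused cell; keys are assumed >= 0

// Linear probing insert with h(x) = x mod table.size().
void insertLinear(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t n = table.size();
    std::size_t cell = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i, cell = (cell + 1) % n) {
        if (table[cell] == EMPTY) { table[cell] = key; return; }
    }
    assert(false && "table is full");
}

bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime strictly greater than n.
std::size_t nextPrimeAbove(std::size_t n) {
    std::size_t p = n + 1;
    while (!isPrime(p)) ++p;
    return p;
}

// Builds the larger table and reinserts every stored key, scanning the
// old table from position 0 upward.
std::vector<int> rehash(const std::vector<int>& old) {
    std::vector<int> bigger(nextPrimeAbove(2 * old.size()), EMPTY);
    for (int key : old)
        if (key != EMPTY) insertLinear(bigger, key);
    return bigger;
}
```

Rehashing the size-7 table that holds 6, 15, 23, 24, 13 produces a table
of size 17 with the keys at positions 6, 15, 7, 8, and 13 respectively.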
Extendible hashing
-If either open addressing hashing or separate chaining hashing is
used, the major problem is that collisions could cause several blocks
to be examined during a find, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive
rehashing step must be performed, which requires O(N) disk accesses.
-Another alternative is extendible hashing, which allows a find to be
performed in two disk accesses. Insertions also require few disk
accesses.
D = 1
Directory  dL  Bucket
0          1   2, 4, 6, 14
1          1   3, 7
Fig4.e1
D = 2
Directory  dL  Bucket
00         2   4, 12
10         2   2, 6, 14
01, 11     1   3, 7, 5, 13
Fig4.e2

D = 2
Directory  dL  Bucket
00         2   4, 12, 36, 44
10         2   2, 6, 14
01         2   5, 13
11         2   3, 7, 23
Fig4.e3

D = 3
Directory entries: 000 001 010 011 100 101 110 111
Fig4.e4


-Fig4.e1 represents an extendible hashing scheme for the data (keys)
2, 3, 4, 6, 7, and 14.
-The directory contains pointers to the buckets, selected by the least
significant (rightmost) bits of the keys. The number of entries in
the directory is 2^D, where D is the number of bits to be scanned in
the keys. Each directory entry is labelled with the least significant
bits of the keys it leads to.
-Buckets contain keys (data). dL is the number of bits to be scanned in
the keys that are stored in this bucket.
-For example, take D as one, dL as one, and the bucket size as 4
(a bucket can store up to 4 keys).
-To store key 2, whose binary representation is 10, check its least
significant bit because D is one. The bit is 0, so look up entry 0 in
the directory and store the key in the related bucket.
-Keys 3, 4, 6, 7, and 14 are stored similarly.
Key  Binary form  Goes to bucket
2    10           0
3    11           1
4    100          0
6    110          0
7    111          1
14   1110         0
-Key 12, whose binary representation is 1100, should be stored in
bucket 0, but that bucket is full.
-When a bucket is full, compare its dL value with D.
If dL < D, split the bucket; otherwise double the directory size first
and then split the bucket.
In this case the dL value of bucket 0 is 1 and D is 1, so the directory
size is increased to 4 (D becomes 2) and two bits of each key are
scanned.
-Fig4.e2 shows the resultant buckets with the directory. Bucket 1 is
not full, so it is not split.
-Now the keys 2, 6, and 14 go to bucket 10, and key 4 goes to
bucket 00, by considering the two least significant bits of each key;
12 also goes to bucket 00.
-In a similar manner, keys 5 and 13 go to bucket 1 (shared by directory
entries 01 and 11).
-Key 23 should go to bucket 1, but that bucket is full. Split this
bucket without increasing the directory size, because its dL value (1)
is less than D (2). Fig4.e3 shows the resultant buckets. Now key 23
goes to bucket 11.
-New keys 36 and 44 go to bucket 00; old keys 5 and 13 go to
bucket 01.
-Key 60 should go to bucket 00, but that bucket is full. Increase the
directory size and split this bucket, because its dL value (2) is not
less than D. Fig4.e4 shows the resultant buckets.
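-Two pieces of the scheme lend themselves to a short sketch: choosing the
directory entry from the D least significant bits of a key, and deciding
whether a full bucket can be split on its own. The function names
directoryIndex and directoryMustDouble are hypothetical, and this is not
a complete extendible hashing implementation.

```cpp
#include <cassert>

// Directory entry for a key: its D least significant bits
// (D = number of bits scanned by the directory).
unsigned directoryIndex(unsigned key, unsigned D) {
    assert(D < 32);
    return key & ((1u << D) - 1u);
}

// Overflow rule from the text: a full bucket with depth dL can be split
// on its own when dL < D; otherwise the directory must double first.
bool directoryMustDouble(unsigned dL, unsigned D) {
    assert(dL <= D);
    return dL >= D;
}
```

With D = 2, directoryIndex gives 0 (binary 00) for key 12, 2 (binary 10)
for key 14, 1 (binary 01) for key 5, and 3 (binary 11) for key 23.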
