You are on page 1of 41

Acknowledgment: Dr.

Chui Chun Kit

Chapter 7.

Indexing
Part 2: Hashing
COMP3278
Introduction to
Database Management Systems
Department of Computer Science, The University of Hong Kong
Slides prepared by - Dr. Reynold Cheng, http://www.cs.hku.hk/~ckcheng/ for students in COMP3278
For other uses, please email : ckcheng@cs.hku.hk
In this chapter…
Outcome 1. Information Modeling
Able to understand the modeling of real life information in a database
system.

Outcome 2. Query Languages


Able to understand and use the languages designed for data access.

Outcome 3. System Design


Able to understand the design of an efficient and reliable database
system.

Outcome 4. Application Development


Able to implement a practical application on a real database.

2
We are going to learn…
Basic concepts

B+ -tree

Hashing

Index definition in SQL

3
Section 3

Hashing

Slides prepared by - Dr. Reynold Cheng, http://www.cs.hku.hk/~ckcheng/ for students in COMP3278


For other uses, please email : ckcheng@cs.hku.hk
Hashing
A kind of un-ordered indices.
No order within the index entries.
The entries are divided into buckets.

A bucket is a unit of storage containing one or more


records (a bucket is typically a disk block).

In a hash file organization, we obtain the bucket of


a record directly from its search-key value using a
hash function.
5
Hashing
A hash function h() is a function that map the set of all
search-key values to the set of all bucket addresses.

E.g., h(“Peter”) = i: Peter’s record is mapped to bucket i.

Records with different search-key values may be


mapped to the same bucket (Collision).
I.e., the entire bucket has to be searched sequentially to
locate a record

6
Static hashing
Settings
Hash function : K mod 4.
Each bucket can hold 2 entries.
Bucket 0
4
Search key
Bucket 1
49
10101 49 C.S.
5 12121 4 Finance
15151 5 Music
Bucket 2 9 22222 9 Physics

Overflow bucket of bucket 1.


Bucket 3

7
Static hashing
Problems
Database grows with time. If the initial number of buckets
is too small, performance will degrade due to too much
overflow.

If we anticipate the database size and allocate space


accordingly, a significant amount of space is wasted initially.

Solution
Dynamic hashing (e.g., extendable hashing)

8
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011
Physics 1001 1000 0011 1111 1001 1100 0000 0001

10101 … C.S.
12121 … Finance
15151 … Music
22222 … Physics
32343 … History
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 9
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000 Assumption
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 Each bucket can store 2
Finance 1010 0011 1010 0000 1100 0110 1001 1111 entries.
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.

Hash prefix : The first 0 0


bits are used in hashing. 0

Bucket 1
10101 … C.S.
12121 … Finance Bucket address table
15151 … Music
22222 … Physics
32343 … History
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 10
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000 Assumption
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 Each bucket can store 2
Finance 1010 0011 1010 0000 1100 0110 1001 1111 entries.
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.

Hash prefix : The first 0 0


0
bits are used in hashing. . 10101 … C.S.

Insert Bucket 1
10101 … C.S.
12121 … Finance Bucket address table
15151 … Music
22222 … Physics
32343
33456


History
Physics
Since currently hash prefix = 0, which
45565
58583


C.S.
History
means the first 0 bits are used in hashing
76543
76766


Finance
Biology (no bits), so simply store the record in the
83821
98345


C.S.
E.E. only bucket. 11
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000 Assumption
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 Each bucket can store 2
Finance 1010 0011 1010 0000 1100 0110 1001 1111 entries.
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.

Hash prefix : The first 0 0


0
bits are used in hashing. . 10101
12121


C.S.
Finance

Bucket 1
10101 … C.S. Insert
12121 … Finance Bucket address table
15151 … Music
22222 … Physics
32343 … History
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 12
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.

Hash prefix : The first 0 0


0
bits are used in hashing. . 10101
12121


C.S.
Finance FULL
Bucket 1
10101 … C.S.
12121 … Finance Insert Bucket address table
15151 … Music
22222 … Physics In extendable hashing, when a bucket is full, we need to
32343 … History
33456 … Physics add one more bucket. There are two options:
45565 … C.S. 1. Add a bucket and increase the number of bits that
58583 … History
76543 … Finance we use from the hash value.
76766 … Biology 2. Add as overflow bucket (When 1. cannot solve the
83821 … C.S.
98345 … E.E. problem).
13
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 1
bit is used in hashing. 1 .
0 All records in this bucket share the
10101 … C.S. 1 same 1 bit(s) in their hash prefix.
12121 … Finance Insert
15151 … Music 1
22222 … Physics Bucket address table 10101 … C.S.
32343 … History
. 12121 … Finance
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 14
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 1
bit is used in hashing. 1 . 15151 … Music

0 All records in this bucket share the


10101 … C.S. 1 same 1 bit(s) in their hash prefix.
12121 … Finance Insert
15151 … Music 1
22222 … Physics Bucket address table 10101 … C.S.
32343 … History
. 12121 … Finance
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 15
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 1
bit is used in hashing. 1 . 15151 … Music

0 All records in this bucket share the


10101 … C.S. 1 same 1 bit(s) in their hash prefix.
12121 … Finance
15151 … Music 1
Insert Bucket address table
22222
32343


Physics
History
. 10101
12121


C.S.
Finance FULL
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 16
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 2
bits are used in hashing. 2 . 15151 … Music

All records in this bucket share the


00
same 2 bit(s) in their hash prefix.
10101 … C.S. 01
12121 … Finance 2
15151 … Music Insert 10 . 12121 … Finance
22222 … Physics
32343 … History 11
33456 … Physics
45565 … C.S. Bucket address table
58583 … History All records in this bucket share the
76543 … Finance same 2 bit(s) in their hash prefix.
76766 … Biology 2
83821 … C.S.
98345 … E.E. . 10101 … C.S. 17
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 2
bits are used in hashing. 2 . 15151 … Music

All records in this bucket share the


00
same 2 bit(s) in their hash prefix.
10101 … C.S. 01
12121 … Finance 2
15151 … Music Insert 10 . 12121 … Finance
22222 … Physics 22222 … Physics
32343 … History 11
33456 … Physics
45565 … C.S. Bucket address table
58583 … History All records in this bucket share the
76543 … Finance same 2 bit(s) in their hash prefix.
76766 … Biology 2
83821 … C.S.
98345 … E.E. . 10101 … C.S. 18
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 2
bits are used in hashing. 2 . 15151 … Music

All records in this bucket share the


00
same 2 bit(s) in their hash prefix.
10101 … C.S. 01
12121 … Finance 2
15151 … Music 10 . 12121 … Finance
22222 … Physics Insert 22222 … Physics
32343 … History 11
33456 … Physics
45565 … C.S. Bucket address table
58583 … History All records in this bucket share the
76543 … Finance same 2 bit(s) in their hash prefix.
76766 … Biology 2
83821 … C.S.
98345 … E.E. . 10101
32343


C.S.
History
19
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 2
bits are used in hashing. 2 . 15151 … Music

All records in this bucket share the


00
same 2 bit(s) in their hash prefix.
10101 … C.S. 01
12121 … Finance 2
10
15151
22222


Music
Physics
11
. 12121
22222


Finance
Physics FULL
32343 … History Insert
33456 … Physics
45565 … C.S. Bucket address table
58583 … History All records in this bucket share the
76543 … Finance same 2 bit(s) in their hash prefix.
76766 … Biology 2
83821 … C.S.
98345 … E.E. . 10101
32343


C.S.
History
20
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 3
bits are used in hashing. 3 . 15151 … Music

000 3
10101 … C.S. 001
12121 … Finance . 22222 … Physics
15151 … Music 010
22222 … Physics
32343 … History Insert
011 3
33456 … Physics 100
45565 … C.S. . 12121 … Finance
58583 … History 101
76543 … Finance
76766 … Biology 110 2
83821 … C.S. 111
98345 … E.E. . 10101
32343


C.S.
History
21
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 3
bits are used in hashing. 3 . 15151 … Music

000 3
10101 … C.S. 001
12121 … Finance . 22222
33456


Physics
Physics
15151 … Music 010
22222 … Physics
32343 … History Insert
011 3
33456 … Physics 100
45565 … C.S. . 12121 … Finance
58583 … History 101
76543 … Finance
76766 … Biology 110 2
83821 … C.S. 111
98345 … E.E. . 10101
32343


C.S.
History
22
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 3
bits are used in hashing. 3 . 15151 … Music

000 3
10101
12121


C.S.
Finance
001 . 22222 … Physics Since the hash
33456 … Physics
15151 … Music 010 prefix is 3 and 2
22222 … Physics
32343 … History 011 3
is 2, we can
33456
45565


Physics
C.S.
Insert 100 . 12121 … Finance simply add a
58583 … History 101 bucket without
76543 … Finance
110 increasing the
76766
83821


Biology
C.S. 111
2 FULL hash prefix.
98345 … E.E. . 10101

32343 …
C.S.
History
Bucket address table 23
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
3
10101 … C.S. 001 Since the hash
12121 … Finance . 12121 … Finance
15151 … Music 010 prefix is 3 and 2
22222 … Physics
32343 … History 011 3
is 2, we can
33456
45565


Physics
C.S.
Insert 100 . 32343 … History simply add a
58583 … History 101 bucket without
76543 … Finance
76766 … Biology 110 3
increasing the
83821
98345


C.S.
E.E.
111 . 10101 … C.S. hash prefix.
Bucket address table 24
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010
22222 … Physics
32343 … History 011 3
33456 … Physics Insert 100 32343 … History
45565 … C.S. .
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
25
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. Insert . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
26
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History Insert 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
27
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance Insert
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
28
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology
Physics 1001 1000 0011 1111 1001 1100 0000 0001
Note that all
3 entries in this
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics
bucket are of the
same hash value
000
10101 … C.S. 3 “C.S.”, increasing
001
12121 … Finance . 12121 … Finance hash prefix will
15151 … Music 010 76543 … Finance
22222 … Physics not solve the
32343 … History 011
33456 … Physics
3 collision
100 . 32343 … History
45565 … C.S. 58583 … History problem!
58583 … History 101
76543 … Finance Chaining is used.
110
76766
83821


Biology
C.S.
Insert
111
3 FULL
10101 … C.S.
98345 … E.E. . 45565 29
… C.S.
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology
Physics 1001 1000 0011 1111 1001 1100 0000 0001
Note that all
3 entries in this
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics
bucket are of the
same hash value
000
10101 … C.S. 3 “C.S.”, increasing
001
12121 … Finance . 12121 … Finance hash prefix will
15151 … Music 010 76543 … Finance
22222 … Physics not solve the
32343 … History 011
33456 … Physics
3 collision
100 . 32343 … History
45565 … C.S. 58583 … History problem!
58583 … History 101
76543 … Finance Chaining is used.
76766 … Biology Insert
110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
83821 … C.S.
30 Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 15151
76766


Music
Biology FULL
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. Insert 111
98345 … E.E. . 10101
45565


C.S.
C.S.
83821 … C.S.
31 Bucket address table
Extendable hashing
Department h(department) – binary representation All records in this bucket share the
same 2 bit(s) in their hash prefix.
Biology 0010 1101 1111 1011 0010 1100 0011 0000
2
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
15151 … Music
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 . 76766 … Biology
Finance 1010 0011 1010 0000 1100 0110 1001 1111
2
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
83821 … C.S.
32 Bucket address table
Extendable hashing
Department h(department) – binary representation All records in this bucket share the
same 2 bit(s) in their hash prefix.
Biology 0010 1101 1111 1011 0010 1100 0011 0000
2
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
15151 … Music
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 . 76766 … Biology
Finance 1010 0011 1010 0000 1100 0110 1001 1111
2
History 1100 0111 1110 1101 1011 1111 0011 1010
98345 … E.E.
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456


Physics
Physics

000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565


C.S.
C.S.
83821 … C.S.
33 Bucket address table
Extendable hashing
The bucket address table has a hash prefix value n.
The first-n bits of the hash values are used to locate
the address table entry for the pointer pointing to the
bucket containing records with that hash value.
3
Hash prefix : The first n bits are used in hashing.
000
001
010
2
011
100 00
1
101 01
0
110 10 0
111 11 1
34
Extendable hashing
All records in this bucket share the
same 2 bit(s) in their hash prefix.
Each bucket j contains:
2
The records
An integer ij

All records in the jth bucket share the same first-ij


bits in the hash function.

35
Splitting a bucket
If the hash prefix = ij
Only 1 pointer in address table is pointing to bucket j.

We increment hash prefix by 1, and double the size of the


bucket address table.
Reinsert the entries in the original bucket.

1 .
0 0
10101 … C.S. 0
12121 … Finance
1 1
Bucket 1 10101 … C.S.
Bucket address table . 12121 … Finance
Bucket address table
36
Splitting a bucket
If the hash prefix > ij (i.e., the indicator in the
bucket address table is larger than the indicator
in bucket)
Create a new bucket z, and set ij and iz to old ij + 1.
There must be a number of address table entries pointing
to bucket j originally, change the lower half of such
entries so that they point to bucket z.
Remove and re-insert each record in bucket j.
Let’s look at the running example: the insertion
of the entry “45565,…,C.S.” for an illustration.
37
Hash indexes v.s. B -tree
+

Hash indexes have different characteristics than B+-tree


They are used only for equality comparisons that use the =
operator (but they are very fast).

They are not used for comparison operators such as < that
find a range of values.

The optimizer cannot use a hash index to speed up ORDER


BY operations.

38
Section 4

Index &
SQL

Slides prepared by - Dr. Reynold Cheng, http://www.cs.hku.hk/~ckcheng/ for students in COMP3278


For other uses, please email : ckcheng@cs.hku.hk
Defining index in SQL
To create an index:
CREATE INDEX <index-name> ON
<relation-name> ( <attribute-list> )
[index_type]
Optional [index_type]: USING {BTREE | HASH}

Use CREATE UNIQUE INDEX to indirectly specify


and enforce the condition that the search-key is a
superkey.
To remove an index DROP INDEX <index-name>
40
Reference: MySQL 13.1.8 CREATE INDEX Syntax http://dev.mysql.com/doc/refman/5.0/en/create-index.html
Chapter 7.

END
COMP3278
Introduction to
Database Management Systems
Department of Computer Science, The University of Hong Kong
Slides prepared by - Dr. Reynold Cheng, http://www.cs.hku.hk/~ckcheng/ for students in COMP3278
For other uses, please email : ckcheng@cs.hku.hk

You might also like