Index structures
Access paths
Indices: basic principle
[Figure: index entries (ki, pi) associate each key value ki with a pointer pi to the corresponding record in the data file]
Index types
Indices: terminology
Clustered vs. unclustered
Primary vs. secondary
Single-level vs. multi-level
Dense vs. sparse
Clustered vs. unclustered indices
[Figure: an unclustered index on B vs. a clustered index on A; in the clustered case the data file is sorted on the indexed attribute, so the order of index entries matches the order of the records]
Primary vs. secondary indices
[Figure: a primary index on B and a secondary index on A over the same data file]
Secondary indices with pointer lists
[Figure: each key value in the secondary index is associated with a list of pointers (RIDs) to all matching records]
Dense vs. sparse indices
[Figure: a dense index has one entry per record, a sparse index one entry per page of the sorted data file]
Single-level vs. multi-level indices
[Figure: a multi-level index adds index levels on top of the first-level index over the data file]
Example (2)
secondary unclustered dense single-level index
[Figure: data pages holding records (student ID, fiscal code, surname, name), e.g. 160229323 BNCGRG78L21A944K Bianchi Giorgio; the index holds the sorted key values (Alessandri, Bianchi, Bianchini, …, Rossi, …, Verdi, Verdone), each with a RID list of record pointers]
N.B. the list can be implemented as record pointers stored with the key value, or as a pointer to a separate page of pointers
Index specification in SQL
Paginating trees (i)
Paginating trees (ii)
Cons:
Complication of the balancing algorithm (inefficient during
updates)
No guarantee on the minimum page utilization
We should find a specific solution!
B-tree (R. Bayer & E. McCreight, 1972)
B-tree: terminology
A node contains m sorted entries and m+1 child pointers:
(k1, r1) (k2, r2) … (ki, ri) (ki+1, ri+1) … (km, rm)
The leftmost subtree holds keys K < k1, the subtree between (ki, ri) and (ki+1, ri+1) holds keys ki < K < ki+1, and the rightmost subtree holds keys K > km
B-tree: format of leaf nodes
Leaf nodes have the following format, where: k1 < k2 < … < km
(k1, r1) (k2, r2) … (ki, ri) (ki+1, ri+1) … (km, rm)
Example of B-tree
[Figure: B-tree with root key 100; the left child holds key 30, the right child keys 150 and 179; the subtree between 30 and 100 holds keys 30 < K < 100]
B-tree: search
B-tree: search example
[Figure: search for a key K with 30 < K through the node with keys 30, 150, 179]
Maximum number of nodes in a B-tree with height h
Minimum number of nodes in a B-tree with height h
A minimal tree has a root with 1 entry (2 children) and all other nodes with d entries (d+1 children):
b_min = 1 + 2·Σ_{l=0..h-2} (d+1)^l = 1 + (2/d)·((d+1)^(h-1) - 1)
The minimum number of entries is therefore:
N_min = 1 + 2d·Σ_{l=0..h-2} (d+1)^l = 2·(d+1)^(h-1) - 1
Height of a B-tree
Now we know N, thus we can compute the tree height as:
N_min ≤ N ≤ N_max
2·(d+1)^(h-1) - 1 ≤ N ≤ (2d+1)^h - 1
log_{2d+1}(N+1) ≤ h ≤ log_{d+1}((N+1)/2) + 1
Height of a B-tree: example
Let us consider the following values:
Key length: 8 bytes
RID length: 4 bytes
PID length: 2 bytes
Page size: 4096 bytes
We obtain:
(8+4)·2d + 2·(2d+1) ≤ 4096
d = ⌊(4096 - 2)/(24 + 4)⌋ = 146
If N = 10^9, searching for a key value requires at most
log_147(10^9/2) + 1 ≈ 5 I/O operations!
A binary search would require 22 accesses,
supposing all pages are completely full
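These numbers are easy to reproduce; a minimal Python sketch, assuming the entry and pointer sizes above (function names are illustrative):

```python
import math

def btree_order(page_size, key_size, rid_size, pid_size):
    # A node stores up to 2d (key, RID) entries and 2d+1 child PIDs:
    # (key+rid)*2d + pid*(2d+1) <= page_size, solved for d
    return (page_size - pid_size) // (2 * (key_size + rid_size) + 2 * pid_size)

def max_search_cost(n_keys, d):
    # Worst-case height: h <= log_{d+1}((N+1)/2) + 1
    return math.floor(math.log((n_keys + 1) / 2, d + 1)) + 1

d = btree_order(4096, 8, 4, 2)     # -> 146
cost = max_search_cost(10**9, d)   # -> 5 page reads
```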
Height of a B-tree: considerations
Variability of h is, with N and d fixed, very limited (maximum - minimum = 1)
With d = 146 and h = 3 we can store up to (2·146+1)^3 - 1 ≈ 25 million keys
B-tree heights (hmin, hmax) for typical page sizes P and orders d:
P      d   | N=1000    | N=10^6    | N=10^9
512    18  | 3 3       | 5 5       | 7 8
1024   36  | 2 2       | 4 4       | 6 6
2048   73  | 2 2       | 4 4       | 5 5
4096   146 | 2 2       | 3 3       | 5 5
8192   292 | 2 2       | 3 3       | 4 4
16384  585 | 1 2       | 3 3       | 4 4
[Figure: Nmin and Nmax as functions of h, log scale]
B-tree: range search
B+-tree
The main features of a B+-tree are:
The entries (ki, ri) are all contained in leaf nodes
(as a consequence, the tree is higher with respect to a B-tree)
Leaves are linked as a list (possibly, double-linked) using pointers (PIDs) to simplify range search
Internal nodes contain only key values (not necessarily corresponding to values existing in data records)
(as a consequence, the order of internal nodes is higher and the tree is lower with respect to a B-tree)
B+-tree: format of internal nodes
Internal nodes contain m keys k1 < k2 < … < km and m+1 child pointers:
k1 k2 … ki ki+1 … km
The leftmost subtree holds keys K ≤ k1, the subtree between ki and ki+1 holds ki < K ≤ ki+1, and the rightmost subtree holds K > km
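This routing rule maps directly onto a binary search; a Python sketch of the node-level step (illustrative, not from the slides):

```python
from bisect import bisect_left

def route(keys, K):
    # Internal B+-tree node with keys k1 < ... < km and m+1 children:
    # child 0 covers K <= k1, child i covers k_i < K <= k_{i+1},
    # child m covers K > km.  bisect_left implements exactly this rule.
    return bisect_left(keys, K)

node = [30, 100, 150, 179]
child = route(node, 101)   # -> 2, the subtree with 100 < K <= 150
```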
Example of B+-tree
[Figure: B+-tree whose root contains key 100 (left subtree: K ≤ 100) and an internal node with keys 30, 150, 179; the subtree between 30 and 100 holds 30 < K ≤ 100]
B+-tree: search
B+-tree: search example
[Figure: search for a key K with 30 < K through the node with keys 30, 150, 179]
B+-tree: range search example
[Figure: a range search descends to the first qualifying leaf, then follows the leaf-level list]
B+-tree: insert
B+-tree: leaf split
B+-tree: split propagation
B+-tree: split example
Order 2 B+-tree with root separators 4 10 16 20 over leaves holding 1 2 3 4 5 7 8 10 11 14 15 17 19 21 23 24 25
Inserting 9 overflows a leaf, which splits; the new separator 8 overflows the root in turn, so the root splits as well: key 10 moves up into a new root over internal nodes 4 8 and 16 20
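The leaf split can be sketched in a few lines of Python; the convention of copying the first key of the new right leaf up as the separator matches the example (names are illustrative):

```python
def split_leaf(leaf, key):
    # Insert the key, then split the overflowing leaf in half; the first
    # key of the new right leaf becomes the separator posted to the parent.
    keys = sorted(leaf + [key])
    mid = len(keys) // 2
    return keys[:mid], keys[mid:], keys[mid]

left, right, sep = split_leaf([5, 7, 8, 10], 9)   # sep = 8, as in the example
```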
B+-tree: insert cost
B+-tree: alternative strategy for overflows
B+-tree: delete
B+-tree: underflow management
B+-tree: redistribution
B+-tree: concatenation example
Order 2 B+-tree: root 10 over internal nodes 4 8 and 16 20, leaves holding 1 2 3 4 5 7 8 9 10 11 14 15 17 19 21 23 24 25
Deleting 10 makes a leaf underflow; the leaf is concatenated with a sibling and the tree shrinks to a single internal level with separators 4 9 16 20 over leaves holding 1 2 3 4 5 7 8 9 11 14 15 17 19 21 23 24 25
B+-tree: redistribution example
Order 2 B+-tree, same starting tree: root 10 over internal nodes 4 8 and 16 20
Deleting 19 makes its leaf underflow; a key is borrowed from the adjacent leaf 21 23 24 25 and the separator 20 is updated to 21, giving internal separators 4 8 16 21 over leaves holding 1 2 3 4 5 7 8 9 10 11 14 15 17 21 23 24 25
B+-tree: delete cost
B+-tree: memory occupation
d = ⌊(pagesize - PIDsize) / (2·(keysize + PIDsize))⌋
For trees, we need to know whether the index is primary or secondary
In some cases, the leaf level can coincide with the data file
B+-tree: number of leaves (primary index)
NL = ⌈N / (d_leaf · u)⌉
where d_leaf is the leaf capacity (entries per leaf) and u the average fill factor, which usually equals ln 2 ≈ 0.69
Height of a B+-tree
Since we know NL, we can now compute the B+-tree height:
NL_min ≤ NL ≤ NL_max
2·(d+1)^(h-2) ≤ NL ≤ (2d+1)^(h-1)
log_{2d+1} NL + 1 ≤ h ≤ log_{d+1}(NL/2) + 2
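A Python sketch of both formulas, assuming d_leaf is the leaf capacity in entries (illustrative):

```python
import math

def bplus_leaves(n_entries, leaf_capacity, u=math.log(2)):
    # NL = ceil(N / (d_leaf * u)); average fill factor u ~ ln 2 = 0.69
    return math.ceil(n_entries / (leaf_capacity * u))

def bplus_height_bounds(nl, d):
    # From 2*(d+1)**(h-2) <= NL <= (2*d+1)**(h-1)
    hmin = math.ceil(math.log(nl, 2 * d + 1)) + 1
    hmax = math.floor(math.log(nl / 2, d + 1)) + 2
    return hmin, hmax

nl = bplus_leaves(10**6, 100)              # about 14427 leaves
bounds = bplus_height_bounds(10**6, 146)   # height range for 10**6 leaves
```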
B+-tree in practice
Secondary index
B+-tree as a data file
Variable-length keys
Key compression
Multi-attribute B+-tree
Bulk-loading
Implementing B+-tree: GiST
B+-tree as a secondary index
B+-tree: duplicating keys
This means that several entries exist with the same key value (but different RID)
We should slightly modify the search algorithm (in practice, a search becomes a range search)
Not every leaf is “addressed”
Inefficient when deleting
Alternative: we insert the RID as “part” of the key
The index is now “primary”
[Figure: leaf level with duplicated keys 1 2 5 5 5 5 5 6 8 9]
B+-tree: using PIDs
B+-tree: posting file
B+-tree as a data file
B+-tree: variable-length keys
B+-tree: key compression
“Ser” is enough…
B+-tree: multi-attribute searching
B+-tree: using multiple indices
Multi-attribute B+-tree
Multi-attribute B+-tree: example
(Bianchi, 1982) … (Rossi, 1978) (Rossi, 1985) (Rossi, 1992) … (Verdi, 1974)
B+-tree: bulk-loading
B+-tree: loading
Implementing B+-tree: GiST
Basic GiST concepts
Predicate monotonicity
GiST properties
Key methods: search
Consistent(E,q)
Input: Entry E=(p,ptr) and search predicate q
Output: false if p ∧ q is unsatisfiable, true otherwise
Consistent: comments
Consistent in B+-tree
Key methods: predicate creation
Union(P)
Input: Set of entries P = {(p1,ptr1),…,(pn,ptrn)}
Output: A predicate r holding for all tuples reachable
by way of one of the entry pointers
The goal of Union is providing the information needed
to characterize the predicate of a parent node,
starting from predicates of children nodes
In general, r can be any predicate such that (p1 ∨ … ∨ pn) ⇒ r
Union in B+-tree
Given P = {([v1,w1[,ptr1),…,([vn,wn[,ptrn)}
Returns [min{v1,…,vn}, max{w1,…,wn}[
Key methods: key compression
Compress(E)
Input: Entry E = (p,ptr)
Output: Entry E’= (p’,ptr), with p’ compressed
representation of p
The goal of Compress is providing a more efficient
representation of the p predicate
E.g.: separators in place of totally ordered ranges
E.g.: prefixes from strings (prefix-B+-tree)
Key methods: key decompression
Decompress(E)
Input: Entry E’ = (p’, ptr), with p’ = Compress(p)
Output: Entry E = (r, ptr), with p ⇒ r
Compression is, in general, lossy
E.g.: prefix-B+-tree
Condition p ⇒ r requires that, if information loss occurs, what we obtain with Decompress is a predicate that holds whenever p holds
The simplest case is when Decompress is the identity function
Key methods: insert
Penalty(E1,E2)
Input: Entries E1 = (p1,ptr1) and E2 = (p2,ptr2)
Output: A “penalty” value resulting from inserting E2
in the sub-tree rooted in E1
Penalty is used by Tree methods Insert and Split, and is needed
to compare different alternatives for updating the tree
Penalty in B+-tree
Given E1 = ([x1,y1[, ptr1) and E2 = ([x2,y2[, ptr2)
If E1 is the first entry in its node
Penalty returns max{y2 – y1, 0}
If E1 is the last entry in its node
Penalty returns max{x1 – x2, 0}
Otherwise
Penalty returns max{y2 – y1, 0} + max{x1 – x2, 0}
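For interval keys like these, Consistent, Union and Penalty can be sketched in a few lines of Python; entries are ((lo, hi), ptr) pairs with half-open intervals (an illustrative sketch, not an actual GiST API):

```python
def consistent(entry, q):
    # True unless the entry's interval [lo,hi[ and the query
    # interval [qlo,qhi[ are certainly disjoint
    (lo, hi), _ptr = entry
    qlo, qhi = q
    return lo < qhi and qlo < hi

def union(entries):
    # Smallest interval covering all children: [min lo, max hi[
    return (min(e[0][0] for e in entries), max(e[0][1] for e in entries))

def penalty(e1, e2):
    # Enlargement of e1's interval needed to absorb e2's (middle case above)
    (x1, y1), _ = e1
    (x2, y2), _ = e2
    return max(y2 - y1, 0) + max(x1 - x2, 0)
```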
Key methods: split
PickSplit(P)
Input: Set of M+1 entries
Output: Two sets of entries, P1 and P2, each with cardinality ≥ f·M, where f is the minimum fill factor
PickSplit implements the real split strategy,
which is not detailed at this level
Usually, we try to minimize some metric similar to Penalty
Picksplit in B+-tree
Tree methods
Tree methods: architecture
Search: calls Consistent
Insert: calls ChooseSubtree, Split and AdjustKeys
ChooseSubtree: calls Penalty
Split: calls PickSplit and Union
AdjustKeys: calls Union
Delete: calls Search and CondenseTree
CondenseTree: calls AdjustKeys and Insert
Search
Search in linear domains
Insert
Insert(R,E,l)
Input: Tree rooted at R, entry E, level l
Output: Tree with E inserted at level l
N = ChooseSubtree(R,E,l)
if E can be inserted in N
    insert E in N
else Split(R,N,E)
AdjustKeys(R,N)
ChooseSubtree
ChooseSubtree
Split
Estimating the number of result pages
[Figure: F(R,P)/P as a function of R/P (0-20)]
Yao model
[Figure: F/P as a function of R for the Cardenas and Yao models]
Comparing the models
Φ(R, N, C) = P·[1 - ∏_{i=1..R} (N - C - i + 1)/(N - i + 1)]
with N records in P pages of C = N/P records each; Φ is the expected number of pages touched when fetching R records
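Both models are easy to compare numerically; a Python sketch (Cardenas samples with replacement, Yao without; names are illustrative):

```python
def cardenas(R, N, P):
    # Cardenas approximation: each of the R records independently
    # hits one of the P pages
    return P * (1 - (1 - 1 / P) ** R)

def yao(R, N, P):
    # Yao's formula: exact expectation without replacement, C = N/P
    C = N // P
    prod = 1.0
    for i in range(1, R + 1):
        prod *= (N - C - i + 1) / (N - i + 1)
    return P * (1 - prod)
```

Yao's estimate is never below Cardenas's; fetching all N records touches all P pages exactly.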
Cost of index access
Example
[Figure: access cost as a function of EK (10 to 10000)]
Cost for range search A ∈ [x, y]
Hash indices
[Diagram: key k → hash function H → RID of records with key = k]
Hash indices: collisions
Hash indices: overflow
Static and dynamic hash indices
Static hashing
[Figure: primary area of 5 buckets (0-4) holding hashed keys; full buckets, e.g. bucket 2 with keys 32 77 17 12 2 22, chain extra records into the overflow area]
Static hashing: cost analysis
Hash functions
Hash functions: uniform distribution
If H spreads keys uniformly, the number X_j of records in bucket j follows a binomial distribution:
Pr{X_j = x_j} = C(N, x_j)·(1/P)^(x_j)·(1 - 1/P)^(N - x_j)
with average value μ and variance σ² given as:
μ = N/P    σ² = (N/P)·(1 - 1/P)
where neither μ nor σ² depend on the specific bucket
For P ≫ 1, the ratio σ²/μ is almost equal to 1
Quality of a hash function
Degeneracy
The degeneracy is the ratio σ²/μ, where
μ = (1/P)·Σ_{j=0..P-1} x_j = N/P    σ² = (1/P)·Σ_{j=0..P-1} (x_j - N/P)²
are computed over all the P buckets and x_j is the number of records within the j-th bucket
The lower the degeneracy, the better the behavior of the hash function
Hash functions: mid square
145142² = 21066200164
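A Python sketch of the mid-square method; the number of middle digits kept is a parameter (illustrative):

```python
def mid_square(k, n_digits=4):
    # Square the key and keep the middle n_digits digits as the address
    s = str(k * k)
    start = (len(s) - n_digits) // 2
    return int(s[start:start + n_digits])

# 145142**2 = 21066200164 -> middle four digits 6620
addr = mid_square(145142)
```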
Hash functions: shifting
Hash functions: folding
Hash functions: division
Hash functions: division (example)
P = 6997
k = 172146 → H(k) = 4218
172147 → 4219
172148 → 4220
172149 → 4221
… → …
174924 → 6996
174925 → 0
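A Python sketch of the division method with the values above:

```python
def h_division(k, P=6997):
    # Division method: P is best chosen prime, away from
    # powers of the key radix
    return k % P

# consecutive keys wrap around the table as in the example
addr = h_division(172146)   # -> 4218
```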
Alphanumeric keys
Alphanumeric keys: example
Choosing the “right” radix
Choosing the “right” radix (cont.)
In particular, the maximum degeneracy is obtained for P = 507 = 13 × 13 × 3
[Figure: degeneracy as a function of P (420-520)]
Load factor
Bucket capacity
Optimal bucket capacity
Computing the average number of overflows
OV_j(C) = Σ_{x_j=C+1..N} (x_j - C)·Pr{X_j = x_j}
that does not depend on the specific bucket j
For the sake of brevity, we will write Pr(x) in place of Pr{X_j = x_j}
Overflow distribution
OV(C) = Σ_{j=0..P-1} OV_j(C) = P·Σ_{x=C+1..N} (x - C)·Pr(x)
Approximating Pr(x) with a Poisson distribution of mean C·d:
Pr(x) ≈ (Cd)^x·e^(-Cd) / x!
By substituting the variable i = x - C:
OV(C) = P·((Cd)^(C+1)·e^(-Cd) / C!)·Σ_{i=1..N-C} i·(Cd)^(i-1) / ((C+1)(C+2)⋯(C+i))
OV(C) = N·((Cd)^C·e^(-Cd) / C!)·f(C, d)
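The overflow fraction OV(C)/N can be evaluated numerically from the Poisson approximation; a Python sketch (the truncation limit is an assumption):

```python
import math

def overflow_fraction(C, d, max_terms=500):
    # OV(C)/N: expected fraction of records in overflow when each
    # bucket receives Poisson(C*d) records; C = capacity, d = load factor
    m = C * d
    total = 0.0
    for x in range(C + 1, C + max_terms):
        log_p = -m + x * math.log(m) - math.lgamma(x + 1)  # log Pois(x; m)
        total += (x - C) * math.exp(log_p)
    return total / m
```

These reproduce the tabulated values, e.g. about 0.367879 for C = 1, d = 1 and 0.270669 for C = 2, d = 1.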
Example
OV(C)/N for bucket capacity C and load factor d:
C \ d    0.5       0.7       0.9       0.95      1
1        0.213061  0.280836  0.340633  0.354464  0.367879
2        0.103638  0.170307  0.237853  0.254377  0.270669
5        0.02478   0.071143  0.137768  0.156336  0.17531
10       0.004437  0.028736  0.085344  0.103778  0.123244
20       0.000278  0.008014  0.047641  0.063497  0.080492
50       2.51E-07  0.000496  0.016561  0.026191  0.03622
[Figure: OV(C)/N (%) as a function of C for the different values of d]
Managing overflows
Chaining in primary area
Separate lists:
If bucket j overflows, we insert the new record in the first non-full bucket following j
Overflow records are linked in a list
Non-overflow records must be linked, too
Coalesced chaining:
A single pointer is used for every bucket (not for every record)
Bucket lists can be merged (coalesced)
E.g.: j overflows into j+h; if j+h is full, both overflow into j+h+l
Examples
Open addressing
Open addressing: searching
Open addressing: deleting records
[State diagram: a position is free, occupied, or not occupied (previously deleted); insert moves free → occupied and not-occupied → occupied, delete moves occupied → not occupied]
Linear probing
Primary clustering
Quadratic probing
Double hashing
Double hashing: comments
Comparing different techniques
Search performance
Coalesced chaining
Successful search:
E ≈ 1 + (e^(2d) - 1 - 2d)/(8d) + d/4
Unsuccessful search:
A ≈ 1 + (e^(2d) - 1 - 2d)/4
Separate chaining with overflow area
Successful search: E ≈ 1 + d/2
Unsuccessful search: A ≈ e^(-d) + d
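The four cost formulas in Python (an illustrative sketch; d is the load factor):

```python
import math

def coalesced_successful(d):
    return 1 + (math.exp(2 * d) - 1 - 2 * d) / (8 * d) + d / 4

def coalesced_unsuccessful(d):
    return 1 + (math.exp(2 * d) - 1 - 2 * d) / 4

def separate_successful(d):
    return 1 + d / 2

def separate_unsuccessful(d):
    return math.exp(-d) + d
```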
Performance: open addressing
Performance: open addressing
E = (1/N)·Σ_{i=0..N-1} 1/(1 - i/P) = (P/N)·Σ_{i=0..N-1} 1/(P - i)
E = (P/N)·[Σ_{i=1..P} 1/i - Σ_{i=1..P-N} 1/i]
Performance: open addressing
We have:
Σ_{i=1..n} 1/i = ln n + O(1)
Therefore:
E ≈ (P/N)·ln(P/(P - N)) = (1/d)·ln(1/(1 - d))
Performance: open addressing
Linear probing
Successful search: E ≈ (1/2)·(1 + 1/(1 - d))
Unsuccessful search: A ≈ (1/2)·(1 + 1/(1 - d)²)
If the bucket capacity C increases, performance of open addressing techniques improves, since:
A(C) = 1 + (A(C=1) - 1)/C
E(C) = 1 + (E(C=1) - 1)/C
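A Python sketch of these costs, including the uniform-probing result from the previous slide (illustrative):

```python
import math

def uniform_successful(d):
    # random probing, successful search: (1/d) ln(1/(1-d))
    return math.log(1 / (1 - d)) / d

def linear_successful(d):
    return 0.5 * (1 + 1 / (1 - d))

def linear_unsuccessful(d):
    return 0.5 * (1 + 1 / (1 - d) ** 2)

def with_capacity(cost_c1, C):
    # cost(C) = 1 + (cost(C=1) - 1) / C
    return 1 + (cost_c1 - 1) / C
```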
Comparison
[Figure: access cost as a function of the load factor d (0-1) for the different techniques, successful and unsuccessful search]
Problems for static organizations
Dynamic Hashing
Virtual hashing
[Figure: on overflow the primary area is doubled; each bucket is paired with a “buddy” bucket in the new half]
Virtual hashing: using buddies
Virtual hashing: directory
Virtual hashing: initializing primary area
Virtual hashing: splitting bucket j
If l = 0, or l > 0 but the buddy of j is already used or does not exist
(l > 0, but V[j + 2^(l-1)·P0] = 1 or j ≥ 2^(l-1)·P0):
l++, the primary area and the vector V are doubled
New elements of V equal 0, except for V[j + 2^(l-1)·P0] = 1
The new hash function H_l is used, with values in [0, 2^l·P0 - 1]
Keys of bucket j are re-distributed using H_l
If the buddy of j exists and is not used (V[j + 2^(l-1)·P0] = 0):
V[j + 2^(l-1)·P0] = 1
Keys of bucket j are re-distributed using H_l
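A toy simulation of this split procedure; the class design, redistribution policy and names are illustrative assumptions (re-splitting during redistribution is omitted for brevity):

```python
class VirtualHash:
    # Toy virtual hashing: H_l(k) = k % (2**l * P0); the bit vector V
    # marks buckets in use.  A full bucket j splits into its buddy
    # j + 2**(l-1)*P0, doubling the primary area when necessary.
    def __init__(self, P0=7, C=3):
        self.P0, self.C, self.l = P0, C, 0
        self.V = [1] * P0
        self.buckets = [[] for _ in range(P0)]

    def _addr(self, k):
        # Use the highest-level hash function landing on a used bucket
        for l in range(self.l, 0, -1):
            j = k % (2 ** l * self.P0)
            if self.V[j]:
                return j
        return k % self.P0

    def insert(self, k):
        j = self._addr(k)
        self.buckets[j].append(k)
        if len(self.buckets[j]) > self.C:
            self._split(j)

    def _split(self, j):
        half = 2 ** (self.l - 1) * self.P0 if self.l else 0
        if self.l == 0 or j >= half or self.V[j + half]:
            # buddy missing or already used: double area and V
            self.l += 1
            grow = 2 ** (self.l - 1) * self.P0
            self.V += [0] * grow
            self.buckets += [[] for _ in range(grow)]
        self.V[j + 2 ** (self.l - 1) * self.P0] = 1   # activate the buddy
        keys, self.buckets[j] = self.buckets[j], []
        for k in keys:                                # redistribute with H_l
            self.buckets[self._addr(k)].append(k)

# Replay of the example on the next slides (P0 = 7, C = 3)
vh = VirtualHash()
for key in [112, 512, 723, 6851, 7830, 2840, 286, 1176,
            3270, 1075, 2665, 841, 6647, 2385, 3820, 3343]:
    vh.insert(key)
```

After the replay, the area has been doubled once: bucket 5 holds 2385, 2665, 6851, its buddy 12 holds 2840, 3820, and bucket 11 holds 1075, 3343, 6647, as in the figures.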
Virtual hashing: example (i)
P0 = 7, C = 3
[Figure: primary area of 7 buckets holding keys 112, 512, 723, 6851, 7830, 2840, 286, 1176, 3270, 1075, 2665, 841, 6647, 2385; V = 1111111]
We use the family of hash functions H_l(k) = k % (2^l·P0)
We insert the key k = 3820
H_0(3820) = 5
Bucket 5 overflows
Virtual hashing: example (ii)
P1 = 14
[Figure: doubled primary area; V = 11111110000010]
The primary area and vector V are doubled
Keys of bucket 5 are re-distributed with its buddy 12, by using the hash function H_1(k) = k % 14
Virtual hashing: example (iii)
P1 = 14
[Figure: after the split of bucket 4, its buddy 11 holds 1075, 6647, 3343]
If we have to insert 3343, we have H_1(3343) = 11
Since V[11] = 0, we apply H_0(3343) = 4
Bucket 4 overflows and is split
Dynamic hashing
Dynamic hashing: using the trie
[Figure: a trie over the leading bits of the pseudo-key selects the bucket in the primary area]
Dynamic hashing: overflow
[Figure: after bucket 2 overflows, the trie grows one level and a new bucket 3 is allocated]
Bucket 2 now contains all keys whose pseudo-key starts with 10…, and bucket 3 those with 11…
Dynamic hashing: performance
Dynamic hashing: variant
[Figure: variant storing the trie over the primary area]
Extendible hashing
Extendible hashing: bucket depth
[Figure: directory with global depth p = 3 (cells 000-111); the buckets have local depths p’ = 2, 3, 3, 2, 3, 3]
Extendible hashing: splitting a bucket
Extendible hashing: splitting a bucket with p’ < p
Example: splitting a bucket with p’ < p
[Figure: the bucket with p’ = 2 splits into two buckets with p’ = 3; only directory pointers change, and local depths go from 2 3 3 2 3 3 to 2 3 3 3 3 3 3]
Extendible hashing: splitting a bucket with p’ = p
Example: splitting a un bucket with p’=p
00 01 10 11 Directory (p=2)
p’ 1 2 2
000 001 010 011 100 101 110 111 Directory (p=3)
p’ 1 2 3 3
177
Extendible hashing: deletion
178
Example: directory halving
[Figure: after a deletion, two buddy buckets with p’ = 3 are merged (local depths 1 2 3 3 → 1 2 2); since now all p’ < p, the directory is halved from p = 3 (cells 000-111) to p = 2 (cells 00-11)]
Extendible hashing: comments
Linear hashing
Linear hashing: management of primary area
Linear hashing: doubling the primary area
Linear hashing: example (i)
[Figure: inserting keys 569 and 563]
Linear hashing: example (ii)
[Figure: placement of key 563 after the split]
Linear hashing: pros and cons
Linear hashing: storage utilization
[Figure: storage utilization as a function of the number of records R (0-10000), oscillating roughly between 0.55 and 0.85]
Linear hashing: performance
[Figure: access cost as a function of R]
Minima of access costs correspond to maxima of storage utilization
Linear hashing: overflow capacity
Increasing Cov reduces the length of overflow chains
Search cost reduces as well
Beyond a given point, we only waste storage
[Figure: successful and unsuccessful search costs and storage utilization as functions of the overflow bucket capacity Cov (0-20)]
Recursive linear hashing
Recursive linear hashing: operations
Insertion:
We compute the address H0(k)
If the bucket overflows, we move to level 1 using function H1(k)
If the bucket at level L also overflows, we add a new level
Split:
When the j-th bucket at level h is split, records to distribute
are those of j-th bucket and those in overflow,
which are contained in buckets at levels (h+1), …, L
Recursive linear hashing: search
h = 0;
while (h ≤ L) {
    if (H_h0(k) < SP_h) Address = H_h1(k);
    else Address = H_h0(k);
    if EXISTS(k, Address, h) return true;
    else if FULL(Address, h) h++;
    else return false;
}
return false;
Recursive linear hashing: addressing
Recursive linear hashing: example
[Figure: at level h = 2, one bucket holds keys 54, 97, 105 and another holds 100; SP_2 = 1]
Spiral hashing: intuition
Spiral hashing: logical addresses (i)
[Figure: logical address functions X(k2, z) and X(k3, z)]
Spiral hashing: logical addresses (ii)
[Figure: logical addresses m(k2) and m(k3) as functions of z (0-2)]
Spiral hashing: properties of addresses
Spiral hashing: split
Spiral hashing: new value of z
Spiral hashing: example
[Figure: with z = 0, the active logical addresses lie in [2^0, 2^(0+1) - 1]]
Spiral hashing: physical addresses
Spiral hashing: allocation example
[Figure: logical addresses mapped onto physical addresses in the order 9 6 4 5 7 8 10 11]
Spiral hashing: converting addresses
Spiral hashing: performance (i)
[Figure: successful (left) and unsuccessful (right) search costs as functions of the number of records R (0-40000)]
Spiral hashing: performance (ii)
[Figure: successful search cost as a function of the spiral growth factor w (1-4)]