
Clustered vs. Unclustered Index
[Figure: CLUSTERED and UNCLUSTERED index layouts. In each, index entries direct the search for data entries in the index file, and the data entries refer to data records in the data file.]

B+ Tree Indexes
[Figure: B+ tree with non-leaf pages above a level of leaf pages; the leaf pages are sorted by search key.]

Index leaf pages contain data entries, and are chained (prev & next)

Index non-leaf pages have index entries; only used to direct searches:
index entry: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
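As a minimal sketch of how a non-leaf page directs a search, the snippet below picks which pointer to follow for a given search key. It assumes the usual convention that P0 covers keys below K1 and that Pi covers keys from Ki up to (but not including) K(i+1); the function name `child_index` and the sample keys are hypothetical.

```python
from bisect import bisect_right

def child_index(keys, search_key):
    """Return i such that pointer P_i should be followed.

    keys holds K_1..K_m in sorted order; the page has pointers P_0..P_m.
    P_0 covers keys < K_1, and P_i covers K_i <= key < K_(i+1).
    """
    return bisect_right(keys, search_key)

keys = [5, 13]                  # K_1, K_2 (illustrative values)
pointers = ["P0", "P1", "P2"]   # one more pointer than keys
for k in (3, 5, 9, 13, 40):
    print(k, "->", pointers[child_index(keys, k)])
```

Because the keys on a page are sorted, the lookup is a binary search within the page, so the cost that matters for the analysis is the one page I/O per level rather than the in-page comparison work.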

Example B+ Tree
Note how the data entries in the leaf level are sorted.

Root: 17
  Entries < 17: non-leaf page with keys 5, 13
    Leaf pages: 2* 3* | 5* 7* 8* | 14* 16*
  Entries >= 17: non-leaf page with keys 27, 30
    Leaf pages: 22* 24* | 27* 29* | 33* 34* 38* 39*

Find: 29*? 28*? All entries > 15* and < 30*?


Insert/delete: Find data entry in leaf, then change it.
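The three lookups on this slide can be traced with a small sketch of the example tree. The representation is an assumption made for illustration: leaf pages are a Python list in chain order, and each non-leaf page is a `(keys, children)` tuple. It is not how a real system lays out pages.

```python
from bisect import bisect_right

# Leaf pages of the example tree, in chain order (each list is one page).
leaves = [[2, 3], [5, 7, 8], [14, 16], [22, 24], [27, 29], [33, 34, 38, 39]]

# Non-leaf pages as (keys, children); an int child is an index into `leaves`.
left = ([5, 13], [0, 1, 2])
right = ([27, 30], [3, 4, 5])
root = ([17], [left, right])

def find_leaf(node, key):
    """Descend from node to the index of the leaf page that may hold key."""
    while not isinstance(node, int):
        keys, children = node
        node = children[bisect_right(keys, key)]
    return node

def contains(key):
    """Equality search: one root-to-leaf descent, then look inside the leaf."""
    return key in leaves[find_leaf(root, key)]

def range_search(low, high):
    """Return data entries k with low < k < high, following the leaf chain."""
    result = []
    i = find_leaf(root, low)
    while i < len(leaves):
        for k in leaves[i]:
            if k >= high:
                return result
            if k > low:
                result.append(k)
        i += 1
    return result

print(contains(29))          # True
print(contains(28))          # False
print(range_search(15, 30))  # [16, 22, 24, 27, 29]
```

The range search descends once to the leaf where 15* would live and then walks the chained leaves to the right, which is why the sorted, chained leaf level matters.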

Cost Model for Our Analysis

Notes:

We ignore CPU costs, for simplicity.


Measuring the number of page I/Os ignores the gains of prefetching a sequence of pages; thus even the I/O cost is only approximated.
Average-case analysis; based on simplistic assumptions.

Good enough to show overall trends!



Variables:

B: The number of data pages


R: Number of records per page
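A rough illustration of how the two variables are used, assuming every page is full and that a full scan of a heap file reads each data page exactly once. The function names and the sample values of B and R are hypothetical.

```python
def total_records(B, R):
    """Number of records in a file of B data pages with R records per page."""
    return B * R

def heap_scan_page_ios(B):
    """A full scan of a heap file reads every data page once."""
    return B

B, R = 1000, 100  # hypothetical file: 1000 pages, 100 records per page
print(total_records(B, R))    # 100000
print(heap_scan_page_ios(B))  # 1000
```

The scan cost depends only on B: since the model counts page I/Os and ignores CPU cost, processing the R records on a page is treated as free once the page has been read.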

Comparing File Organizations

Heap files (random order; insert at eof)

Sorted files, sorted on <age, sal>

Clustered B+ tree file, search key <age, sal>

Heap file with unclustered B+ tree index on search key <age, sal>

Heap file with unclustered hash index on search key <age, sal>

Operations to Compare

Scan: Fetch all records from disk


Equality search
Range selection
Insert a record
Delete a record

Assumptions in Our Analysis

Heap Files:
Equality selection on key; exactly one match.

Sorted Files:
Files compacted after deletions.

Indexes:
Data entry/pointer size = 10% of the size of a data record.
Hash: No overflow buckets. 80% page occupancy => file size = 1.25 data size.
Tree: 67% occupancy (this is typical). Implies file size = 1.5 data size.

Scans:
Leaf levels of a tree-index are chained.
Index data-entries plus actual file scanned for unclustered indexes.

Range searches:
We use tree indexes to restrict set of data records fetched, but
ignore hash indexes.
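The occupancy figures above translate into file-size factors by taking the reciprocal: a file whose pages are 80% full needs 1/0.8 = 1.25 times as many pages as fully packed data. The sketch below works through that arithmetic. Applying the 10% data-entry assumption to estimate the number of data-entry pages is an illustrative extension, and the function names and the value B = 1000 are hypothetical.

```python
def file_size_factor(occupancy):
    """Pages needed per page of fully packed data at the given occupancy."""
    return 1.0 / occupancy

def data_entry_pages(B, occupancy, entry_fraction=0.10):
    """Approximate pages of data entries for an index over B data pages.

    entry_fraction is the data entry size relative to a data record.
    """
    return B * entry_fraction * file_size_factor(occupancy)

print(round(file_size_factor(0.80), 2))  # 1.25 (hash, 80% occupancy)
print(round(file_size_factor(0.67), 2))  # 1.49, i.e. roughly 1.5 (tree)

B = 1000  # hypothetical number of data pages
print(round(data_entry_pages(B, 0.80)))  # 125
print(round(data_entry_pages(B, 0.67)))  # 149
```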

Cost of Operations
Columns: (a) Scan, (b) Equality, (c) Range, (d) Insert, (e) Delete

Rows: (1) Heap, (2) Sorted, (3) Clustered, (4) Unclustered tree index, (5) Unclustered hash index

Several assumptions underlie these (rough) estimates!
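To make the comparison concrete, here is a hedged sketch of the first two rows (heap and sorted files), measured in page I/Os. The formulas are commonly used textbook-style estimates and are an assumption here, not values taken from this slide: an equality search on a heap file reads half the pages on average, a sorted file is searched by binary search, and an insert or delete in a sorted file shifts about half the file (read plus write back). The function names and `matching_pages` parameter are hypothetical.

```python
import math

def heap_costs(B):
    """Estimated page I/Os for a heap file with B data pages."""
    search = 0.5 * B             # exactly one match: half the file on average
    return {
        "scan": B,
        "equality": search,
        "range": B,              # unordered, so every page must be read
        "insert": 2,             # read the last page, write it back
        "delete": search + 1,    # find the record, write the page back
    }

def sorted_costs(B, matching_pages=1):
    """Estimated page I/Os for a sorted file with B data pages."""
    search = math.log2(B)        # binary search over the pages
    return {
        "scan": B,
        "equality": search,
        "range": search + matching_pages,
        "insert": search + B,    # shift about half the file: read + write
        "delete": search + B,    # compaction shifts about half the file
    }

B = 1024  # hypothetical file size in pages
for name, costs in (("heap", heap_costs(B)), ("sorted", sorted_costs(B))):
    print(name, costs)
```

Under these assumptions the trend is that sorted files make searches cheap but updates expensive, while heap files do the reverse.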

