
Chapter 2: Foundations of Database Storage Techniques
Learning Map: Foundations for a New Enterprise Application Development Era; Foundations of Database Storage Techniques; The Future of Enterprise Computing; Advanced Database Storage Techniques; In-Memory Database Operators
Dictionary Encoding


Dr.-Ing. Jürgen Müller
Motivation

- Main memory access is the new bottleneck
- Compression reduces the number of I/O operations to main memory
- Operations can work directly on compressed data
- Offsetting (direct positional access) with bit-encoded fixed-length data types
- Based on a limited value domain
Dictionary Encoding Example
- 8 billion humans
- Attributes:
  - first name
  - last name
  - gender
  - country
  - city
  - birthday
- 200 bytes per tuple
- Each attribute is dictionary encoded
Sample Data
recID  fname  lname    gender  city       country  birthday
...    ...    ...      ...     ...        ...      ...
39     John   Smith    m       Chicago    USA      12.03.1964
40     Mary   Brown    f       London     UK       12.03.1964
41     Jane   Doe      f       Palo Alto  USA      23.04.1976
42     John   Doe      m       Palo Alto  USA      17.06.1932
43     Peter  Schmidt  m       Potsdam    GER      11.11.1973
...    ...    ...      ...     ...        ...      ...
Table: world_population
Dictionary Encoding a Column
- A column is split into a dictionary and an attribute vector
- The dictionary stores all distinct values, each with an implicit valueID (its position in the dictionary)
- The attribute vector stores the valueID of every entry in the column
- The row position is stored implicitly (the index in the attribute vector)
- Enables offsetting with bit-encoded fixed-length data types
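A minimal Python sketch of what "offsetting" buys, assuming the attribute vector is bit-packed with a fixed width of `bits` bits per valueID (here 5 bits, enough for the valueIDs 23 to 26 used in the fname example). A Python integer stands in for the packed memory; `pack` and `get` are hypothetical helper names, not part of any real system. Because every entry has the same width, entry i starts at bit offset i * bits and can be read without touching the entries before it.

```python
def pack(value_ids, bits):
    """Store valueIDs in fixed-width slots of `bits` bits inside one integer."""
    packed = 0
    for i, vid in enumerate(value_ids):
        if not 0 <= vid < (1 << bits):
            raise ValueError(f"valueID {vid} does not fit in {bits} bits")
        packed |= vid << (i * bits)   # slot i occupies bits [i*bits, (i+1)*bits)
    return packed


def get(packed, i, bits):
    """Read slot i directly: shift by the fixed offset i * bits, then mask."""
    return (packed >> (i * bits)) & ((1 << bits) - 1)


BITS = 5                              # assumed width: 2**5 = 32 possible valueIDs
value_ids = [23, 24, 25, 23, 26]
packed = pack(value_ids, BITS)
print([get(packed, i, BITS) for i in range(len(value_ids))])  # [23, 24, 25, 23, 26]
print(len(value_ids) * BITS, "bits instead of", len(value_ids) * 32)  # 25 bits instead of 160
```

Real column stores pack into machine words rather than one arbitrary-precision integer, but the offset arithmetic is the same idea.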
recID  fname
...    ...
39     John
40     Mary
41     Jane
42     John
43     Peter
...    ...

Dictionary for "fname"
valueID  Value
...      ...
23       John
24       Mary
25       Jane
26       Peter
...      ...

Attribute Vector for "fname"
position  valueID
...       ...
39        23
40        24
41        25
42        23
43        26
...       ...
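The split into dictionary and attribute vector can be sketched in a few lines of Python. This is a minimal sketch assuming an unsorted dictionary that hands out valueIDs in order of first appearance; `dictionary_encode` and its `first_value_id` parameter are hypothetical names, and `first_value_id=23` only mirrors the excerpt above, whose earlier rows are elided.

```python
def dictionary_encode(column, first_value_id=0):
    """Split a column into an unsorted dictionary and an attribute vector."""
    dictionary = []          # dictionary[k] holds the value with valueID first_value_id + k
    value_ids = {}           # value -> valueID, only needed while encoding
    attribute_vector = []    # one valueID per row; the row position is implicit
    for value in column:
        if value not in value_ids:
            value_ids[value] = first_value_id + len(dictionary)
            dictionary.append(value)
        attribute_vector.append(value_ids[value])
    return dictionary, attribute_vector


fname = ["John", "Mary", "Jane", "John", "Peter"]   # rows 39..43 of world_population
dictionary, attribute_vector = dictionary_encode(fname, first_value_id=23)
print(dictionary)        # ['John', 'Mary', 'Jane', 'Peter']
print(attribute_vector)  # [23, 24, 25, 23, 26]

# Decoding: valueID - first_value_id is the value's position in the dictionary
decoded = [dictionary[vid - 23] for vid in attribute_vector]
assert decoded == fname
```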
Querying Data using Dictionaries
Search for an attribute value
(i.e., retrieve all persons with fname "Mary")

1. Search the dictionary for the requested value ("Mary") to obtain its valueID
2. Scan the attribute vector for that valueID ("24")
3. Replace the valueIDs in the result with the corresponding dictionary value
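The three query steps, sketched in Python on the "fname" example. This assumes the unsorted dictionary and attribute vector shown above; the constants `FIRST_ID` and `FIRST_POS` and the function `select_equal` are hypothetical and only account for the excerpt starting at valueID 23 and position 39.

```python
FIRST_ID = 23                                  # valueID of dictionary[0] in the excerpt
FIRST_POS = 39                                 # position of attribute_vector[0]
dictionary = ["John", "Mary", "Jane", "Peter"]  # unsorted dictionary for "fname"
attribute_vector = [23, 24, 25, 23, 26]         # positions 39..43


def select_equal(value):
    # 1. Search the dictionary for the requested value (linear: it is unsorted)
    try:
        value_id = FIRST_ID + dictionary.index(value)
    except ValueError:
        return []                              # value does not occur in the column
    # 2. Scan the attribute vector, comparing small integers instead of strings
    positions = [FIRST_POS + i for i, vid in enumerate(attribute_vector) if vid == value_id]
    # 3. Replace the valueIDs in the result with the dictionary value
    return [(pos, dictionary[value_id - FIRST_ID]) for pos in positions]


print(select_equal("Mary"))   # [(40, 'Mary')]
print(select_equal("John"))   # [(39, 'John'), (42, 'John')]
```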
Sorted Dictionary
- Dictionary entries are sorted either by their numeric value or lexicographically
- Dictionary lookup complexity: O(log(n)) with binary search instead of O(n) with a linear scan
- Dictionary entries can be compressed to reduce the amount of required storage
- Selection criteria with ranges are less expensive: a value range maps to one contiguous range of valueIDs
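A sketch of both benefits using Python's standard `bisect` module, assuming the valueID is simply the index in the sorted dictionary. The names `lookup` and `select_range` are hypothetical; positions here are 0-based (0 corresponds to recID 39).

```python
from bisect import bisect_left, bisect_right

# Sorted dictionary: valueID == index, so lookups can use binary search.
dictionary = sorted(["John", "Mary", "Jane", "Peter"])       # ['Jane', 'John', 'Mary', 'Peter']
id_of = {value: i for i, value in enumerate(dictionary)}
attribute_vector = [id_of[v] for v in ["John", "Mary", "Jane", "John", "Peter"]]  # [1, 2, 0, 1, 3]


def lookup(value):
    """Return the valueID of value in O(log n), or None if it is absent."""
    i = bisect_left(dictionary, value)
    return i if i < len(dictionary) and dictionary[i] == value else None


def select_range(low, high):
    """Positions whose value lies in [low, high].

    Sorting preserves order, so the value range becomes the valueID range
    [lo_id, hi_id) and the scan compares integers only.
    """
    lo_id = bisect_left(dictionary, low)
    hi_id = bisect_right(dictionary, high)   # exclusive upper bound
    return [pos for pos, vid in enumerate(attribute_vector) if lo_id <= vid < hi_id]


print(lookup("Mary"))            # 2
print(select_range("J", "K"))    # [0, 2, 3]  (John, Jane, John)
```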
Compression Rate
- Depends on cardinality / entropy
- Cardinality
  - Table cardinality: number of tuples in a relation
  - Column cardinality: number of distinct values in a column
- Entropy
  - Measure of information density
  - Entropy = column cardinality / table cardinality
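The definitions above, applied to two columns of the sample data (rows 39 to 43). Note that "entropy" here is the slide's simplified ratio of distinct values to tuples, not Shannon entropy; `cardinalities` is a hypothetical helper name. A lower ratio means fewer distinct values per tuple and therefore better dictionary compression.

```python
def cardinalities(column):
    """Return (table cardinality, column cardinality, entropy as defined above)."""
    table_cardinality = len(column)            # number of tuples
    column_cardinality = len(set(column))      # number of distinct values
    return table_cardinality, column_cardinality, column_cardinality / table_cardinality


print(cardinalities(["m", "f", "f", "m", "m"]))                  # (5, 2, 0.4)
print(cardinalities(["John", "Mary", "Jane", "John", "Peter"]))  # (5, 4, 0.8)
```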
Data Size Examples
Column       Cardinality  Bits needed  Item size  Plain size  Size with dictionary    Compression
                                                              (dictionary + column)   factor
First names  5 million    23 bit       50 Byte    373 GB      238.4 MB + 21.4 GB      ≈ 17
Last names   8 million    23 bit       50 Byte    373 GB      381.5 MB + 21.4 GB      ≈ 17
Gender       2            1 bit        1 Byte     7 GB        2.0 B + 953.7 MB        ≈ 8
City         1 million    20 bit       50 Byte    373 GB      47.7 MB + 18.6 GB       ≈ 20
Country      200          8 bit        47 Byte    350 GB      9.2 kB + 7.5 GB         ≈ 47
Birthday     40,000       16 bit       2 Byte     15 GB       78.1 kB + 14.9 GB       ≈ 1
Totals                                 200 Byte   ≈ 1.6 TB    ≈ 92 GB                 ≈ 17
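The table can be recomputed from cardinality and item size alone. This minimal sketch assumes an uncompressed dictionary and a bit-packed attribute vector; `column_sizes` is a hypothetical helper. Bits needed is ceil(log2(cardinality)), computed exactly with `int.bit_length`. The per-column GB/MB/kB figures match when read as binary units (2^30, 2^20, 2^10 bytes), while the totals row appears to use decimal units (1.6 × 10^12 and about 9.2 × 10^10 bytes).

```python
GiB, MiB = 2**30, 2**20
TUPLES = 8_000_000_000                # 8 billion humans


def column_sizes(cardinality, item_bytes, tuples=TUPLES):
    """Sizes in bytes for one dictionary-encoded column (dictionary not compressed)."""
    bits = max(1, (cardinality - 1).bit_length())   # == ceil(log2(cardinality))
    plain = tuples * item_bytes                      # every value stored in full
    dictionary = cardinality * item_bytes            # each distinct value stored once
    vector = tuples * bits / 8                       # bit-packed valueIDs
    return bits, plain, dictionary, vector, plain / (dictionary + vector)


columns = {                           # (cardinality, item size in bytes), from the table
    "First names": (5_000_000, 50),
    "Last names": (8_000_000, 50),
    "Gender": (2, 1),
    "City": (1_000_000, 50),
    "Country": (200, 47),
    "Birthday": (40_000, 2),
}
factors = {}
for name, (cardinality, item_bytes) in columns.items():
    bits, plain, dictionary, vector, factor = column_sizes(cardinality, item_bytes)
    factors[name] = round(factor)
    print(f"{name:<12} {bits:>2} bit  plain {plain / GiB:6.1f} GiB  "
          f"dict {dictionary / MiB:8.1f} MiB  vector {vector / GiB:5.1f} GiB  ~{factor:.0f}x")
```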
Data Layout in Main Memory


Basics
[Figure: main memory as one linear address range, from 0x0 to 0xFFFF FFFF FFFF FFFF]
The memory layout is only linear; every higher-dimensional access is mapped to this linear band.
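How a two-dimensional index is mapped onto the linear band, as a small Python sketch. The table shape (3 × 4) and the 4-byte element size are assumptions for illustration; a `bytearray` stands in for the address range and `address` is a hypothetical helper.

```python
ITEM_SIZE = 4                         # assumed bytes per element (e.g. a 32-bit integer)
N_ROWS, N_COLS = 3, 4


def address(row, col, base=0):
    """Byte address of element (row, col) when rows are laid out one after another."""
    return base + (row * N_COLS + col) * ITEM_SIZE


memory = bytearray(N_ROWS * N_COLS * ITEM_SIZE)    # the linear band, as raw bytes
start = address(2, 1)
memory[start:start + ITEM_SIZE] = (7).to_bytes(ITEM_SIZE, "little")

print(address(0, 0), address(1, 0), address(2, 1))  # 0 16 36
print(int.from_bytes(memory[36:40], "little"))       # 7
```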
Row Data Layout
- Data is stored tuple-wise
- Leverage co-location of attributes for a single tuple
- Low cost for tuple reconstruction, but higher cost for a sequential scan of a single attribute
[Figure: row layout in memory, attributes A, B, C of each row stored next to each other (one block per row), with a row operation and a column operation marked]
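A Python sketch of the row layout, assuming a hypothetical table with attributes A, B, C and four tuples; a flat list stands in for linear memory. Reading one tuple is a single contiguous slice, while scanning one attribute has to stride over the whole store. On real hardware each of those strided reads typically pulls a full cache line, including the neighbouring attributes the scan does not need.

```python
N_ATTRS = 3
rows = [(f"a{i}", f"b{i}", f"c{i}") for i in range(4)]   # hypothetical tuples

# Row layout: tuple after tuple -> a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3
row_store = [value for row in rows for value in row]


def get_tuple(i):
    """Row operation: tuple i is one contiguous slice, so reconstruction is cheap."""
    start = i * N_ATTRS
    return tuple(row_store[start:start + N_ATTRS])


def scan_attribute(j):
    """Column operation: attribute j sits at every N_ATTRS-th slot (strided access)."""
    return row_store[j::N_ATTRS]


print(get_tuple(2))        # ('a2', 'b2', 'c2')
print(scan_attribute(1))   # ['b0', 'b1', 'b2', 'b3']
```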
Columnar Data Layout
- Data is stored attribute-wise
- Leverage sequential scan speed in main memory
- Tuple reconstruction is expensive
[Figure: columnar layout in memory, all values of attribute A, then all of B, then all of C (one block per column), with a row operation and a column operation marked]
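The columnar counterpart as a Python sketch, again assuming a hypothetical table with attributes A, B, C and four tuples stored in one flat list. Scanning an attribute is now one contiguous slice, while rebuilding a tuple needs one separate access per attribute.

```python
N_ATTRS, N_TUPLES = 3, 4
rows = [(f"a{i}", f"b{i}", f"c{i}") for i in range(N_TUPLES)]   # hypothetical tuples

# Columnar layout: column after column -> a0 a1 a2 a3 b0 b1 b2 b3 c0 c1 c2 c3
column_store = [row[j] for j in range(N_ATTRS) for row in rows]


def scan_attribute(j):
    """Column operation: attribute j is one contiguous slice (sequential scan)."""
    start = j * N_TUPLES
    return column_store[start:start + N_TUPLES]


def get_tuple(i):
    """Row operation: gather one value from each column, one jump per attribute."""
    return tuple(column_store[j * N_TUPLES + i] for j in range(N_ATTRS))


print(scan_attribute(1))   # ['b0', 'b1', 'b2', 'b3']
print(get_tuple(2))        # ('a2', 'b2', 'c2')
```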
