A Blueprint of Sanssoucidb: 5.1 Data Storage in Main Memory

Chapter 5
A Blueprint of SanssouciDB
SanssouciDB is a prototypical database system for unified analytical and
transactional processing. The concepts of SanssouciDB build on prototypes
developed at the HPI and an existing SAP database system. SanssouciDB is
an SQL database and contains components similar to those of other databases,
such as a query builder, a plan executor, metadata, a transaction manager, etc.
5.1 Data Storage in Main Memory
In contrast to most other databases, data is kept permanently in main memory.
Main memory is the primary persistence for data, yet logging and recovery
still need the disk as non-volatile data storage. All operators, e.g., find,
join, or aggregation, can assume that data resides in main memory. Thus,
operators can be programmed differently, free of any hassles coming from
optimizing for disk access. Using main memory as the primary persistence
leads to a different organization of data that only works if data is always
available in memory.
5.2 Column-Orientation
Another concept used in SanssouciDB was invented more than two decades
ago: storing data column-wise [CK85] instead of row-wise. In column
orientation, complete columns are stored in adjacent blocks. In row-oriented
storage, by contrast, complete tuples (rows) are stored in adjacent blocks.
Column-oriented storage, in contrast to row-oriented storage, is well suited
for reading consecutive entries from a single column. This is useful for
aggregation and column scans. More details on column orientation and its
differences from row orientation can be found in Chapter 8.
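The difference between the two layouts can be illustrated with a minimal Python sketch. The table, its attribute names, and the helper functions are hypothetical and serve only as an illustration; plain Python lists stand in for adjacent memory blocks and do not reflect how SanssouciDB actually lays out data.

```python
# Hypothetical three-attribute table; names and values are illustrative only.
rows = [
    (1, "Alice", 120.0),
    (2, "Bob", 80.0),
    (3, "Carol", 200.0),
]

# Row-oriented layout: complete tuples are stored one after another.
row_store = [value for row in rows for value in row]

# Column-oriented layout: all values of one column are stored together.
column_store = {
    "id": [row[0] for row in rows],
    "name": [row[1] for row in rows],
    "amount": [row[2] for row in rows],
}


def sum_amount_row_store(store, width=3, amount_pos=2):
    """Aggregate in the row layout: stride over every tuple to reach one field."""
    return sum(store[i] for i in range(amount_pos, len(store), width))


def sum_amount_column_store(store):
    """Aggregate in the column layout: read one consecutive sequence of values."""
    return sum(store["amount"])
```

Both functions return the same total; the column variant touches only the values of the aggregated attribute, which is the property that makes column scans and aggregations attractive.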
To minimize the amount of data that needs to be transferred between
storage and processor, SanssouciDB uses several different data compression
techniques, which will be discussed in Chapter 7.
5.3 Implications of Column-Orientation
Column-oriented storage has become widespread in database systems
specifically developed for OLAP, as the advantage of column-oriented
storage is clear in the case of quasi-sequential scanning of single attributes
and set processing thereof. If not all fields of a table are queried, column
orientation can be exploited in transactional processing as well (avoiding
"SELECT *"). An analysis of enterprise applications showed that there is
actually no application that uses all fields of a given tuple. For example, in
dunning only 17 attributes are necessary out of a table that contains 300
attributes. If only the 17 needed attributes are queried instead of the full
tuple representation of all 300 attributes, an instant advantage of a factor
of 8 to 20 in the amount of data to be scanned can be achieved.
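As a rough plausibility check of the dunning example, the following sketch computes the reduction in scanned data under the simplifying assumption that all attributes are equally wide. That assumption and the function name are not from the text; the actual factor depends on attribute widths and compression.

```python
def scan_reduction_factor(total_attributes, needed_attributes):
    """Ratio of data scanned for a full tuple versus only the needed columns.

    Assumes every attribute occupies the same amount of memory, which is a
    simplification made for this illustration.
    """
    if needed_attributes <= 0 or needed_attributes > total_attributes:
        raise ValueError("needed_attributes must be in 1..total_attributes")
    return total_attributes / needed_attributes


# Dunning example from the text: 17 of 300 attributes are needed.
factor = scan_reduction_factor(300, 17)
print(f"reduction factor: {factor:.1f}")
```

Under the equal-width assumption the result is roughly 17.6, which lies inside the range of 8 to 20 quoted above.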
As disk is no longer the bottleneck, and access to main memory has to be
considered instead, an important aspect is to work on a minimal set of data.
So far, application programmers have been fond of "SELECT *" statements.
The difference in runtime between selecting specific fields and selecting all
fields in row-oriented storage is often insignificant, and if a change to an
application requires more fields, the data is already there (which is, besides,
a weak argument for using "SELECT *" and retrieving unnecessary data).
However, in the case of column orientation, the penalty for "SELECT *"
statements grows with table width. Especially if tables grow in width during
productive usage, actual runtimes of applications cannot be anticipated
during programming.
With the column-store approach, the number of indices can be significantly
reduced. In a column store, every attribute can be used as an index.
Because all data is available in memory and the data of a column is stored
consecutively, the scanning speed is high enough that a full sequential scan
of an attribute is sufficient in most cases. If this is not fast enough, dedicated
indices can still be used in addition for further speedup.
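The idea that every attribute can act as an index can be sketched as a full sequential scan over one column that yields matching row positions, which are then used to fetch values from other columns. The column names, data, and function names below are hypothetical; the sketch illustrates the principle and is not SanssouciDB's scan operator.

```python
def scan_column(column, predicate):
    """Full sequential scan: return the positions whose value matches."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


def materialize(columns, positions, attributes):
    """Fetch only the requested attributes at the matching positions."""
    return [tuple(columns[a][p] for a in attributes) for p in positions]


# Hypothetical table stored column by column.
columns = {
    "city": ["Berlin", "Potsdam", "Berlin"],
    "amount": [10, 20, 30],
}

# No dedicated index on "city" exists; the column itself is scanned.
positions = scan_column(columns["city"], lambda v: v == "Berlin")
result = materialize(columns, positions, ["amount"])
```

Here `positions` is `[0, 2]` and `result` is `[(10,), (30,)]`. A dedicated index would only replace the `scan_column` step when the scan is not fast enough.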
Storing data in columns instead of rows is more complicated for workloads
with write access, so the concept of a differential buffer was introduced.
New entries are written to a differential buffer first. In contrast to the main
store, the differential buffer is optimized for inserts. At a later point in time
and depending on thresholds, e.g., the frequency of changes and new entries,
the data in the differential buffer is merged into the main store. More details
about the differential buffer and the merge process will be provided later in
Chapter 25 and Chapter 27.
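The interplay of main store, differential buffer, and merge can be sketched as follows. The class name, the append-only list used as the buffer, and the size-based merge threshold are assumptions made for illustration; the text only states that the merge depends on thresholds, and the actual structures are covered in the later chapters.

```python
class ColumnTable:
    """Toy table with a read-optimized main store and a write-optimized buffer."""

    def __init__(self, attributes, merge_threshold=3):
        self.attributes = list(attributes)
        # Main store: one list per column.
        self.main = {a: [] for a in self.attributes}
        # Differential buffer: append-only list of new rows (assumption).
        self.delta = []
        # Merge trigger: buffer size (hypothetical threshold).
        self.merge_threshold = merge_threshold

    def insert(self, row):
        """New entries go to the differential buffer first."""
        self.delta.append(dict(row))
        if len(self.delta) >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Move all buffered rows into the column-wise main store."""
        for row in self.delta:
            for a in self.attributes:
                self.main[a].append(row[a])
        self.delta = []

    def scan(self, attribute):
        """Reads must consider both the main store and the buffer."""
        return self.main[attribute] + [row[attribute] for row in self.delta]


table = ColumnTable(["id", "amount"], merge_threshold=3)
table.insert({"id": 1, "amount": 10})
table.insert({"id": 2, "amount": 20})
before_merge = (len(table.delta), table.scan("amount"))
table.insert({"id": 3, "amount": 30})  # reaches the threshold, triggers merge
```

After the third insert the buffer is empty and all three values sit in the main store, while `scan` returns the same logical content before and after the merge.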
5.4 Active and Passive Data
The data in SanssouciDB is separated into active data (data of business
processes that are not yet completed) and passive data (data of business
processes that are closed/completed and will not be changed any more).
Active data is stored in main memory. Passive data can be moved to slower
storage as it is queried less frequently. Separating passive data from active
data reduces the amount of main memory needed to store the entire data set
of an enterprise.
Whenever new data is written to the database or existing data is changed,
logging to non-volatile storage is needed. During the merge of the differential
store to the main store, snapshots are taken and stored in non-volatile
memory as well. Logs and snapshots are necessary to restore the database
in case of failure.
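How logs and snapshots work together during recovery can be sketched with a small key-value store. The file names, the JSON format, and the class and function names are assumptions chosen for this illustration; the sketch omits details a real system needs (for example, atomic snapshot replacement) and is not SanssouciDB's recovery mechanism.

```python
import json
import os
import tempfile


class DurableStore:
    """Toy in-memory store with a redo log and snapshots on disk."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "redo.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.data = {}

    def put(self, key, value):
        # Log the change to non-volatile storage before applying it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def snapshot(self):
        # Persist the full state, then clear the log entries it already covers.
        with open(self.snapshot_path, "w", encoding="utf-8") as snap:
            json.dump(self.data, snap)
        open(self.log_path, "w", encoding="utf-8").close()

    @classmethod
    def recover(cls, directory):
        # Restore the latest snapshot, then replay the log written after it.
        store = cls(directory)
        if os.path.exists(store.snapshot_path):
            with open(store.snapshot_path, encoding="utf-8") as snap:
                store.data = json.load(snap)
        if os.path.exists(store.log_path):
            with open(store.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    store.data[entry["key"]] = entry["value"]
        return store


def demo():
    """Write, snapshot, write again, then rebuild the state from disk."""
    with tempfile.TemporaryDirectory() as directory:
        store = DurableStore(directory)
        store.put("a", 1)
        store.snapshot()
        store.put("b", 2)  # only in the log, not in the snapshot
        return DurableStore.recover(directory).data
```

Calling `demo()` returns a state that contains both the entry captured by the snapshot and the entry that existed only in the log, which is why both artifacts are needed for a complete restore.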
The largest advantage so far is that main memory access depends on
time-deterministic processes, in contrast to the seek times of HDDs, which
depend on mechanical parts. Thus, runtimes of in-memory processing can be
calculated (although this might be complicated). Observations from using
in-memory databases show that response times are smooth and do not vary
as they do with disks, whose response times fluctuate because of disk seeks.
5.5 Architecture Overview
The architecture shown in Figure 5.1 gives an overview of the components
of SanssouciDB.
SanssouciDB is split into three different logical layers fulfilling specific
tasks inside the database system. The "Distribution Layer" handles the
communication with applications, creates query execution plans, stores
metadata, and contains the logic for database transactions. The main working
set of SanssouciDB is located in the main memory of a specific machine. That
working set is accessed during query execution and is stored in a row-,
column-, or hybrid-oriented data layout, depending on the specific type of
queries sent to the database tables. The non-volatile memory is used for
logging and recovery purposes, as well as for data aging and time travel.
All those concepts will be described in the subsequent sections.
5.6 References
[CK85] George P. Copeland and Setrag N. Khoshafian. A Decomposition
Storage Model. SIGMOD Rec., 14(4):268–279, May 1985.
[Figure 5.1 shows three layers: the Distribution Layer at Server i (Interface
Services and Session Management, Query Execution, Metadata, TA Manager);
Main Memory at Server i holding the Active Data (Main Store, Differential
Store, Indexes, Merge); and Non-Volatile Memory (Logging, Recovery, Data
Aging, Time Travel, Log, Snapshots, Passive Data (History)).]
Fig. 5.1: Schematic Architecture of SanssouciDB
