
An Experimental Study Of

Tree-Based Index Data Structures

Alexandru Ionescu

4th Year Project Report


Computer Science and Mathematics
School of Informatics
University of Edinburgh
2021

Abstract
Databases play a major role in computer science today. Formally, a database is an
organized collection of data, generally stored and accessed electronically from a
computer system. The database management system (DBMS) is the software that
interacts with end users, applications and the database itself.
Some of the major advantages of using databases are accuracy, the ability to manage
large amounts of data, security of information and data integrity. We are therefore
interested in a performant system that provides all these benefits and can also be
adapted to future requirements. When we think about the performance of a database,
indexing is the first thing that comes to mind. An index is a data structure used to
quickly locate and access the data within the database.
In this project, I investigate the existing literature and critically evaluate previous
work done in this area. I also conduct experiments to understand better why some data
structures perform better than others in various scenarios, and I then propose an
implementation of a custom tree-based index that can be used in real-time environments.

Acknowledgements
Acknowledgements go here.
Table of Contents

1 Introduction
1.1 Project goals
1.2 Summary of contributions
1.2.1 Analysis of existing work
1.2.2 Practical work in DBToaster
1.3 Report Structure

2 Background
2.1 The concept of database indexing and why it is needed
2.1.1 Database Index
2.1.2 Why indexing is useful?
2.2 Introduction of B+Tree
2.3 Bw-Tree
2.4 Adaptive Radix Tree
2.5 MassTree

3 Experimental Analysis of Tree-Based Indices
3.1 Critical evaluation of previous work
3.2 Similar experiments
3.2.1 Observations
3.2.2 Remarks of the analysis

4 Tree-Based Data Structures in DBToaster
4.0.1 Data Model of the software
4.1 Implementation of MIN/MAX operations over Rings
4.2 Testing the new implementations
4.3 Challenges involved
4.3.1 A new approach

5 Future Work
5.1 Implementation part
5.2 Experimental part
5.2.1 Increase number of threads and relevant statistics
5.3 Integration part

6 Conclusions

Bibliography

A Code for queries operations using Rings


Chapter 1

Introduction

The concept of a database was made possible by the introduction of direct access
storage media such as magnetic disks, which gained popularity in the mid 1960s. Since
then, the size and performance of databases and their respective software have grown by
orders of magnitude. Support for larger and more complex types of information was
enabled by advances in computer memory, processors, storage and networks.
In this context, database indexing is increasingly useful, as it optimizes the
performance of the system by minimizing the number of disk accesses required when
a query is processed. To put it simply, an index is a data structure that improves the
speed of data retrieval operations on a database table at the cost of additional writes
and storage space to maintain the index data structure.
There is a range of operations that we can perform on databases, such as insert, read,
update or scan. Although some index data structures perform especially well on some of
these operations, none of them performs well on all of them. Thus, experimental studies
are performed in order to understand which index structure is best to use in a
particular situation.

1.1 Project goals


This project has three main goals:
1. Study the existing literature and how different index data structures are built to
support different types of database workloads; analyse previously published comparison
experiments, perform similar experimental studies on some tree-based index data
structures, and provide insights about their behaviour in various situations.
2. Use knowledge of programming and tree-based index data structures to add support for
a few operations found in queries and improve their performance, as part of my work on
the DBToaster software.
3. Discuss and present ideas for further improvement of the analysis performed, how
tree-based index data structures could be adapted for the future, and the work
remaining to develop DBToaster.

1.2 Summary of contributions


1.2.1 Analysis of existing work
I analysed previous work with the aim of understanding how different tree-based index
data structures are built, and I conducted experiments to observe their performance on
various types of workloads. I followed the model used previously by researchers from
Carnegie Mellon University. For testing, I used a set of Yahoo! Cloud Serving Benchmark
(YCSB) microbenchmarks with the default workloads A (Read/Update, 50/50), C (Read-only)
and E (Scan/Insert, 95/5), all with Zipfian distributions. The initialization phase of
each workload was measured and reported as the Insert-only workload. Two key types were
tested: 64-bit random integers (Rand-Int) and 64-bit monotonically increasing integers
(Mono-Int). The index was populated with 50 million keys and then 10 million operations
were executed for each workload. I performed a critical analysis of the results and
indicated how they relate to the choice of underlying data structures.

1.2.2 Practical work in DBToaster


The most practical and challenging part of this project has been to use programming
(in C++) and tree-based index data structures to help optimize queries in the DBToaster
software and observe their performance. I have used different libraries and tree
implementations to make operations such as insertions and deletions execute faster.

1.3 Report Structure


Chapter 2 introduces the concept of database indexing together with the tree-based
indexes that are investigated further, and gives an overview of the existing literature
on their implementation and their expected performance in various scenarios.
Chapter 3 presents previous work done in the area and my critical evaluation of it,
performance results of different data structures and the optimization techniques used,
as well as the statistics I obtained by running similar experiments and some
explanations of those results.
Chapter 4 presents the work done in the second part of the project, where I used
knowledge of programming and tree-based data structures to add new functionality to
DBToaster, while also aiming to improve its performance in some use cases.
Chapter 5 describes some of my ideas on what can be done to further improve the
performance of database indexing, how data structures can be extended to accommodate
future hardware and software designs, what I could have done in more detail in my
project, and the future work that can be done on the DBToaster software.
Chapter 2

Background

In this chapter I present the concept of indexing in databases and explain why it is
important. I then introduce the tree-based indexes that I have analysed as part of my
experiments in this project, with a short summary of the existing literature on their
implementation and expected performance in different situations.

2.1 The concept of database indexing and why it is needed

2.1.1 Database Index
A database index is a data structure that allows a query to efficiently retrieve data from
a database. Indexes are related to specific tables and consist of one or more keys, which
are the values that we want to look up in the index. A table can have more than one
index built from it. The keys are based on the tables’ columns.

2.1.2 Why indexing is useful?


I illustrate the usefulness of indexing with an analogy to a card game. In a deck of
52 cards shuffled into a random order, the obvious way to pick out the 8 of hearts is
to flip each card individually until the desired one is found. On average this means
going through half of the deck, which is about 26 cards. This is far from optimal.
If instead the cards are separated into four piles by suit, each pile randomly
shuffled, then to pick out the 8 of hearts we first select the hearts pile (about two
attempts on average) and then flip through its 13 cards (about seven flips on average),
thus about nine moves in total. This is seventeen flips (26 - 9) fewer than scanning
the whole deck.
That is why an index is so useful. By segregating and sorting our data on keys, we can
use a piling method to drastically reduce the number of flips it takes to find our data.
A B-Tree is a data structure that stores data in its nodes in sorted order and is
commonly used in database indexing. It is a generalized form of a binary search tree.
Instead of having a single entry per node, a B-Tree node holds an array of entries
together with references to its child nodes, which leads to a smaller tree height.
Another property is that all leaf nodes of a B-Tree are at the same level.
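
The node layout described above can be sketched in C++ as follows. This is a minimal illustration rather than the implementation of any particular database: the names BTreeNode, contains and makeExampleTree are hypothetical, keys are assumed to be 64-bit integers, and insertion and node splitting are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical in-memory B-Tree node: sorted keys plus one more child than keys.
struct BTreeNode {
    std::vector<std::int64_t> keys;                    // kept in ascending order
    std::vector<std::unique_ptr<BTreeNode>> children;  // empty for a leaf
    bool isLeaf() const { return children.empty(); }
};

// Returns true if `key` is stored anywhere in the subtree rooted at `node`.
bool contains(const BTreeNode* node, std::int64_t key) {
    while (node != nullptr) {
        // Binary search inside the node: first position whose key is >= `key`.
        auto it = std::lower_bound(node->keys.begin(), node->keys.end(), key);
        if (it != node->keys.end() && *it == key) return true;
        if (node->isLeaf()) return false;
        assert(node->children.size() == node->keys.size() + 1);
        // Descend into the child that covers the gap where `key` would sit.
        node = node->children[static_cast<std::size_t>(it - node->keys.begin())].get();
    }
    return false;
}

// Builds a small two-level tree by hand (no insertion logic is shown here).
std::unique_ptr<BTreeNode> makeExampleTree() {
    auto leaf = [](std::vector<std::int64_t> ks) {
        auto n = std::make_unique<BTreeNode>();
        n->keys = std::move(ks);
        return n;
    };
    auto root = std::make_unique<BTreeNode>();
    root->keys = {10, 20};
    root->children.push_back(leaf({1, 5}));
    root->children.push_back(leaf({12, 17}));
    root->children.push_back(leaf({25, 30}));
    return root;
}
```

With the example tree, looking up 17 inspects the root and a single leaf, which is why holding many entries per node keeps the number of visited nodes small.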

2.2 Introduction of B+Tree


The drawback of the B-Tree is that each node stores, along with every key value, the
corresponding data pointer (a pointer to the disk file block containing that key
value). This greatly reduces the number of entries that can be packed into a B-Tree
node and contributes to a greater number of levels in the tree, hence increasing the
search time for a record.

Figure 2.1

B+Trees are the variant generally implemented in databases, as they eliminate the above
drawback by storing data pointers only at the leaf nodes of the tree. As a result, the
structure of the leaf nodes of a B+Tree is quite different from the structure of its
internal nodes. Note also that, because data pointers are present only at the leaf
nodes, the leaf nodes must store all the key values along with their corresponding data
pointers to the disk file block. In addition, the leaf nodes are linked together to
provide ordered access to the records.

2.3 Bw-Tree
Although B+Trees are very useful data structures for databases, they need to be adapted
to the software and hardware that will have to handle the enormous amount of data
generated each day. Bw-Trees address this issue. They preserve the elementary ideas of
B-Trees but focus on improving performance on more recent software paradigms and
hardware. These trees were introduced by a team at Microsoft Research in 2013. Two
important concerns that Bw-Trees address are multi-core processing and disk latency.
Multi-core processing performance depends on two things: high concurrency and high CPU
cache hit ratios. As the amount of concurrency increases, designs involving mutex locks
(latches) limit the system’s scalability. Bw-Trees support this concurrency without
using latches. Also, to achieve a high CPU cache hit ratio, they do not update pages in
place in memory, but instead use so-called delta updates.
Disk latency is a major problem, and the low number of I/O operations per second that
disks deliver is not ideal for a system. Flash storage is a better alternative, as it
offers more I/O operations per second at a lower cost. Bw-Trees are aimed at flash
storage and perform log structuring themselves at the storage layer. This approach
ensures that write performance is as high as possible for both high-end and low-end
flash devices.

Figure 2.2

Mapping Table
A mapping table is maintained in the cache layer. It maps logical pages to physical
pages, with logical pages being identified by a logical “page identifier” (PID). The
mapping table translates a PID into either a flash offset (the address of the page on
stable storage) or a memory pointer (the address of the page in memory). The mapping
table thus serves as the connection between physical locations and inter-node links.
This enables the physical location of a Bw-Tree node to change on every update, and
every time a page is written to stable storage, without the location change being
propagated to the root of the tree.
Delta Updates
Page state changes are made by creating a delta record (describing the change) and
attaching it to the existing page state. The memory address of the new delta record is
installed into the page’s physical address slot in the mapping table using the atomic
compare-and-swap (CAS) instruction. Pages are consolidated (that is, a new page is
created that applies all delta changes) both to reduce the memory footprint and to
improve search performance.
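
To make the interplay between the mapping table and delta updates more concrete, here is a minimal C++ sketch. It is not the Bw-Tree implementation: the names DeltaRecord, MappingTable, installDelta and lookup are hypothetical, base pages are modelled as an empty chain, and consolidation, flash offsets and safe memory reclamation are deliberately omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical delta record describing one change to a logical page.
struct DeltaRecord {
    std::int64_t key;
    std::int64_t value;
    DeltaRecord* next;  // older page state this delta was attached to
};

// Mapping table: logical page identifier (PID) -> newest record of the page.
class MappingTable {
public:
    explicit MappingTable(std::size_t pages) : slots_(pages) {
        for (auto& s : slots_) s.store(nullptr);
    }
    ~MappingTable() {
        for (auto& s : slots_) {
            DeltaRecord* r = s.load();
            while (r != nullptr) { DeltaRecord* older = r->next; delete r; r = older; }
        }
    }
    DeltaRecord* head(std::size_t pid) const {
        assert(pid < slots_.size());
        return slots_[pid].load();
    }
    // Prepends a delta using a compare-and-swap loop; no latch is taken.
    void installDelta(std::size_t pid, std::int64_t key, std::int64_t value) {
        assert(pid < slots_.size());
        DeltaRecord* delta = new DeltaRecord{key, value, nullptr};
        DeltaRecord* expected = slots_[pid].load();
        do {
            delta->next = expected;  // attach to the state we believe is current
        } while (!slots_[pid].compare_exchange_weak(expected, delta));
    }
private:
    std::vector<std::atomic<DeltaRecord*>> slots_;
};

// Walks the chain from newest to oldest, so the latest write to a key wins.
bool lookup(const MappingTable& table, std::size_t pid, std::int64_t key,
            std::int64_t& out) {
    for (const DeltaRecord* r = table.head(pid); r != nullptr; r = r->next) {
        if (r->key == key) { out = r->value; return true; }
    }
    return false;
}
```

Because other nodes refer to a page only through its PID, swapping the pointer in one slot is enough to publish the new page state. The longer a chain grows, the more records a lookup has to walk, which is the motivation for consolidation.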

2.4 Adaptive Radix Tree


Due to advances in the field, main memory capacities have expanded to the point where
databases generally fit into RAM. In-memory data structures such as balanced binary
search trees are not efficient on modern hardware, as they do not make optimal use of
CPU caches; hash tables, also used for main-memory indexes, are quite fast but support
only point queries. The adaptive radix tree (ART) was designed to overcome these
shortcomings.
First, a radix tree (also known as a radix trie or compact prefix tree) is a data
structure that represents a space-optimized trie (prefix tree) in which each node that
is an only child is merged with its parent. The result is that the number of children
of every internal node is at most the radix r of the radix tree, where r = 2^x for some
integer x >= 1.

Figure 2.3

Unlike in regular trees, edges can be labeled with sequences of elements as well as
with single elements, which makes radix trees more efficient for small sets. In regular
trees, whole keys are compared en masse from their beginning up to the point of
inequality, whereas in a radix tree the key is compared chunk of bits by chunk of bits
at each node, where the number of bits in a chunk is x = log2(r) for a radix tree of
radix r.
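
As a small illustration of chunk-wise comparison, the following C++ sketch assumes 64-bit integer keys and 8-bit chunks (radix 256), so that a key is consumed one byte per tree level. The function names chunkAt and sharedPrefixChunks are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns the 8-bit chunk of `key` used at tree depth `depth`
// (depth 0 is the most significant byte).
std::uint8_t chunkAt(std::uint64_t key, unsigned depth) {
    assert(depth < 8);
    return static_cast<std::uint8_t>(key >> (8 * (7 - depth)));
}

// Number of leading chunks two keys have in common, i.e. how many levels
// they share on their way down the radix tree.
unsigned sharedPrefixChunks(std::uint64_t a, std::uint64_t b) {
    unsigned depth = 0;
    while (depth < 8 && chunkAt(a, depth) == chunkAt(b, depth)) ++depth;
    return depth;
}
```

Two keys that share their first four bytes follow the same path for four levels and diverge at the fifth, independently of how many other keys are stored.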
The adaptive radix tree (ART) is a variant of the radix tree that introduces adaptive
node sizes. One major drawback of ordinary radix trees is their use of space, because
they use a constant node size at every level. The major difference between the radix
tree and the ART is that each ART node has a variable size based on its number of
children, and grows as new entries are added. Thus, the adaptive radix tree makes
better use of space without sacrificing speed.
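
The idea of a node that grows with its number of children can be sketched as below. The capacity classes 4, 16, 48 and 256 follow the node types of the ART design; everything else is a simplification made for illustration: the class AdaptiveNode, the flat vector layout, the linear search and the use of int as a stand-in for a child pointer are all assumptions, and real ART nodes use a different layout for each class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Capacity of each inner node class (Node4, Node16, Node48, Node256).
inline std::size_t capacityOfClass(std::size_t cls) {
    static const std::size_t kCapacity[4] = {4, 16, 48, 256};
    assert(cls < 4);
    return kCapacity[cls];
}

// Simplified stand-in for an adaptive inner node.
class AdaptiveNode {
public:
    std::size_t capacity() const { return capacityOfClass(cls_); }
    std::size_t size() const { return keys_.size(); }

    // Adds (or overwrites) the child stored under `keyByte`.
    void addChild(std::uint8_t keyByte, int child) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (keys_[i] == keyByte) { children_[i] = child; return; }
        }
        if (keys_.size() == capacity()) ++cls_;  // node is full: grow to next class
        keys_.push_back(keyByte);
        children_.push_back(child);
    }

    // Returns the child stored under `keyByte`, or -1 if there is none.
    int findChild(std::uint8_t keyByte) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (keys_[i] == keyByte) return children_[i];
        }
        return -1;
    }

private:
    std::size_t cls_ = 0;  // index of the current capacity class
    std::vector<std::uint8_t> keys_;
    std::vector<int> children_;
};
```

A node with five children therefore occupies a 16-slot class rather than the 256 slots a fixed-size radix node would reserve.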
Looking at the performance of this data structure, we observe the following: in terms
of lookups, ART outperforms tuned read-only search trees, while also performing well
under insertions and deletions. This type of tree solves the problem of space
consumption, often seen in radix trees, by adapting the representation of its internal
nodes. Even though its performance is similar to that of hash tables, ART also
maintains the data in sorted order, so it can support range scans and prefix lookups
too.

2.5 MassTree
Consider the problem of storing millions of (key, value) pairs in memory. If we only
wanted to support point lookups, a hash table would be a natural choice. However, to
support range queries, a tree structure is desirable. One candidate is the B+Tree. The
number of levels is kept small because each node has a high fan-out. But this means
that a single node packs a large number of keys, which results in a large number of key
comparisons when searching through the tree. Also, the cost of key comparisons can be
quite high with variable-length keys. If the keys are really long, each of them can
occupy several cache lines, so comparing two of them can hurt cache locality.
MassTree is an efficient tree data structure that relies on splitting variable-length
keys into a number of fixed-length pieces called slices. When going down the tree, you
compare the first slice of each key, then the second, and so on, with each comparison
having a constant cost. This is far more efficient than comparing whole strings again
and again. The data structure exploits cache locality by doing these fixed-size
comparisons.
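
The slicing idea can be sketched in C++ as follows, assuming 8-byte slices packed big-endian into 64-bit integers so that one integer comparison orders a whole slice. The function names toSlices and compareBySlices are hypothetical, and short slices are simply zero-padded here; a real implementation also has to record the key length to tell a padded key apart from one that genuinely ends in zero bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Splits a key into 8-byte slices, each packed big-endian into a 64-bit
// integer so that integer order agrees with byte-wise lexicographic order.
std::vector<std::uint64_t> toSlices(const std::string& key) {
    std::vector<std::uint64_t> slices;
    for (std::size_t pos = 0; pos < key.size(); pos += 8) {
        std::uint64_t slice = 0;
        for (std::size_t i = 0; i < 8; ++i) {
            std::uint64_t byte = (pos + i < key.size())
                ? static_cast<unsigned char>(key[pos + i])
                : 0;  // zero padding for the last, shorter slice
            slice = (slice << 8) | byte;
        }
        slices.push_back(slice);
    }
    return slices;
}

// Compares two keys slice by slice; returns -1, 0 or 1.
int compareBySlices(const std::string& a, const std::string& b) {
    const std::vector<std::uint64_t> sa = toSlices(a);
    const std::vector<std::uint64_t> sb = toSlices(b);
    const std::size_t n = std::min(sa.size(), sb.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (sa[i] != sb[i]) return sa[i] < sb[i] ? -1 : 1;  // one integer compare
    }
    if (sa.size() != sb.size()) return sa.size() < sb.size() ? -1 : 1;
    return 0;
}
```

For clarity the sketch re-slices both keys on every call; in a tree, the slices of stored keys would be computed once at insertion time, and each level of the trie would only look at the slice for its own depth.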
A classic search structure for variable-length strings over a fixed alphabet is the
trie. Its disadvantage is that, when strings are split into single characters of their
natural alphabet, the trie becomes too deep if there are many long keys. The child
pointers in a trie over the English alphabet are generally stored in an array indexed
by the next character. With large alphabets this is no longer feasible. The idea is
then to use B+Trees, by making each node in the trie its own B+Tree. B+Trees are good
at representing the child pointers by using ranges, and thus take logarithmic time for
a search, but they cannot be indexed in constant time. So we can define MassTree as a
trie-like concatenation of B+Trees.
Chapter 3

Experimental Analysis of Tree-Based Indices

This chapter contains my critical evaluation of previous work in the area of tree-based
index data structures, together with their performance results and the optimization
techniques used. I have also included the statistics I obtained by running similar
experiments and some explanations of those results.

3.1 Critical evaluation of previous work


Previous work has compared the performance of different tree-based index data
structures on different workloads. It included running experiments using the Yahoo!
Cloud Serving Benchmark with the default workloads A (Read/Update 50/50), C (Read-Only)
and E (Scan/Insert 95/5) with Zipfian distributions. Before that, no clear experiments
had been conducted to compare the tree-based data structures studied in this project.
It was a considerable effort that revealed many insights. The following data structures
were considered in the comparison: Bw-Tree, B+Tree, ART, MassTree and SkipList. Next I
present some of the key results obtained on two types of keys: random integers
(Rand-Int) and monotonically increasing integers (Mono-Int).
Overall, the Bw-Tree is slower than its competitors except the SkipList. For example,
the ART is more than 4× faster than the Bw-Tree for point lookups (though the ART is
slower on the Scan/Insert workload). The Bw-Tree is also slower than the MassTree
and the B+Tree, often by a factor of 2×.
MassTree performance is generally comparable to the B+Tree for integer workloads
(except Insert-Only). The B+Tree’s Read-Only and Read/Update performance is comparable
to the MassTree, and much faster than the Bw-Tree. For the Mono-Int Insert-Only
workload, the B+Tree without any optimizations even outperforms the MassTree and
ART, and is 3.7× faster than the Bw-Tree. The B+Tree also achieves high throughput
for Scan/Insert workloads, and is usually 3–5× faster than all other indexes. We can
also see that ART outperforms the other indexes for all workloads except Scan/Insert.


In terms of memory, among all compared indexes the ART has the lowest usage for
Mono-Int keys, while the B+Tree has the lowest for Rand-Int keys due to its compact
internal structure and large node size. The SkipList consumes more memory than the
B+Tree and the ART due to its customized memory allocator and preallocation; its memory
usage is comparable to that of the Bw-Tree. The MassTree always has the highest memory
usage.

3.2 Similar experiments

3.2.1 Observations
I now present the results obtained by running similar experiments. I also included
SkipList in the analysis to give a bigger picture of the performance statistics, even
though this data structure is not a tree.
On workload A, ART performs best, being well adapted for lookups and updates, followed
by MassTree (slower by a factor of 1.5x). B+Tree is slower than MassTree, although on
Rand-Ints they have similar performance. Bw-Tree is the worst, slower than B+Tree by a
factor of 1.3x.

Figure 3.1: workload A - Mono-Int

Figure 3.2: workload A - Rand-Int



On workload C, ART performs best, as it is a data structure well adapted for point
lookups. MassTree and B+Tree have similar performance to each other on Rand-Ints
(slower than ART by a factor of 1.7x, with B+Tree ahead of MassTree) and on Mono-Ints
(slower than ART by a factor of 2.8x, with MassTree ahead of B+Tree). Bw-Tree performs
worst out of these 4 data structures on this workload (slower by a factor of 4.3x on
Mono-Ints and a factor of 2.1x on Rand-Ints).

Figure 3.3: workload C - Mono-Int

Figure 3.4: workload C - Rand-Int

On workload E, B+Tree performs best, as it efficiently supports range queries. MassTree
comes second on Mono-Ints (slower by a factor of 1.33x), while Bw-Tree is second on
Rand-Ints (slower by a factor of 2.8x). On Mono-Ints, Bw-Tree is slower than MassTree
(by a factor of 1.2x) but faster than ART (by a factor of 1.8x). On Rand-Ints, ART and
MassTree perform the worst, with similar performance (slower than B+Tree by a factor of
5.6x) and ART slightly ahead.

Figure 3.5: workload E - Mono-Int

Figure 3.6: workload E - Rand-Int

3.2.2 Remarks of the analysis

We can see that on workloads A and C, Bw-Tree is slower than all of its competitors
(B+Tree, MassTree, ART), and on workload E it remains well behind B+Tree. This comes
from the overhead that the Bw-Tree incurs because of its indirection layer and delta
records, which cause this lock-free index data structure to underperform the lock-based
indexes, in spite of some previous claims that lock-free indexes are superior to
lock-based ones.

ART is the leader on workloads dominated by read and update operations, but does not
perform as well on the workload based on insert and scan operations. The exceptional
running time of this data structure on reads and updates comes from the fact that its
performance depends on the length of the keys and not on the total number of elements
stored at a particular moment.

Bw-Tree has poor performance on workloads consisting of reads and updates, and good
performance on the insert/scan workload on Rand-Ints. MassTree performs well on the
read/update workload (second after ART), on the read-only workload on Mono-Ints (second
after ART) and on the insert/scan workload on Mono-Ints (second after B+Tree).
B+Tree is best on workload E and performs well on workload C on Rand-Ints (second after
ART) and on workload A on Rand-Ints (almost second). B+Tree is exceptionally good for
range queries. Consider the situation in which all the records within a certain range
should be retrieved. Once the first record in the range has been found, the rest of the
records can be found by sequentially processing the remaining records in the leaf node
and then following the linked list of leaf nodes as far as necessary. When a record is
found whose search key value is above the upper bound of the requested range, the
search completes.
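
The range scan just described can be sketched over a linked list of leaves. This is an illustration under simplifying assumptions: the names Leaf and rangeScan are hypothetical, keys and values are 64-bit integers, and the descent through the internal nodes that locates the starting leaf is not shown, so the starting leaf is passed in directly.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical B+Tree leaf: sorted (key, value) entries plus a sibling link.
struct Leaf {
    std::vector<std::pair<std::int64_t, std::int64_t>> entries;
    const Leaf* next = nullptr;  // right sibling in key order
};

// Collects the values of all keys in [lo, hi], starting at the leaf where
// the first candidate key lives and following sibling links from there.
std::vector<std::int64_t> rangeScan(const Leaf* start, std::int64_t lo,
                                    std::int64_t hi) {
    assert(lo <= hi);
    std::vector<std::int64_t> out;
    for (const Leaf* leaf = start; leaf != nullptr; leaf = leaf->next) {
        for (const auto& entry : leaf->entries) {
            if (entry.first < lo) continue;    // before the range: skip
            if (entry.first > hi) return out;  // past the upper bound: stop
            out.push_back(entry.second);
        }
    }
    return out;
}
```

After the starting leaf has been located, no further tree descents are needed: the scan touches only the leaves that overlap the range, which is consistent with the good Scan/Insert results of the B+Tree.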

Figure 3.7: workload A - Mono-Int Figure 3.8: workload A - Rand-Int

Figure 3.9: workload C - Mono-Int Figure 3.10: workload C - Rand-Int

Figure 3.11: workload E - Mono-Int Figure 3.12: workload E - Rand-Int

From the last set of graphs, it can be observed that on Rand-Ints MassTree generally
has the highest memory usage, followed by SkipList, Bw-Tree, ART and B+Tree, in this
order. On Mono-Ints, though, SkipList comes first, followed by MassTree, Bw-Tree,
B+Tree and ART, in descending order of memory consumption.
Chapter 4

Tree-Based Data Structures in DBToaster

In the second part of the project, my goal has been to use programming and tree-based
index data structures to add new functionality to the DBToaster software and, in some
cases, to optimize queries substantially.
I started the work on DBToaster with the idea of integrating tree-based data structures
into the code used to run queries that may contain various complicated aggregate
functions. After studying this kind of data structure in the first part of this
project, it made sense to put tree-based algorithms to work in order to obtain better
performance.
To put it simply, DBToaster is an SQL-to-native-code compiler: it generates specialized
query engines, and it is used in applications that deal with real-time, low-latency
data processing. It has several features that set it apart from other systems of this
kind, such as:
1) Compilation of database queries to low-level code
DBToaster does not need to interpret queries and data; instead it compiles SQL queries
to low-level code, eliminating the overheads resulting from interpretation.
2) Online query processing
DBToaster generates code that maintains a query result as an in-memory materialized
view, which is kept fresh as a stream of updates to the base data arrives.
3) Embedded query engines
Code that is generated by DBToaster can be linked into applications without a separate
runtime system.
4) Materialized views of nested queries
DBToaster is the only system that actually supports efficient materialized views of
nested SQL queries. Nested queries are vital to complex analytics.


4.0.1 Data Model of the software


DBToaster uses several abstractions to help compute queries more efficiently. One of
them is the ring. Formally, a ring (D, +, *, 0, 1) is a set D with closed binary
operations + and *, additive identity 0 and multiplicative identity 1 that satisfies
the following axioms (for any a, b, c in D):
1. a + b = b + a
2. (a + b) + c = a + (b + c)
3. 0 + a = a + 0 = a
4. there exists (-a) in D such that a + (-a) = (-a) + a = 0
5. (a * b) * c = a * (b * c)
6. a * 1 = 1 * a = a
7. a * (b + c) = a * b + a * c and (a + b) * c = a * c + b * c
A schema S is a set of variables (attributes). For a variable X in S, let Dom(X) denote
its domain. A tuple t over schema S has the domain Dom(S), the Cartesian product of
Dom(X) over all X in S. The empty tuple () is the tuple over the empty schema.
Let (D, +, *, 0, 1) be a ring. A relation R over schema S and the ring D is a function
R : Dom(S) -> D mapping tuples over schema S to values in D such that R[t] != 0 for
finitely many tuples t. The tuple t is called a key, while its mapping R[t] is the
payload of t in R. Note that payloads may have positive or negative values: a positive
payload indicates that the tuple is present in the database, while a negative one
indicates that deletions need to be performed within the schema.
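
As an illustration of this data model, the sketch below stores a relation over the ring of integers as an ordered map from tuples to payloads. It is not DBToaster code: the class Relation, the alias Tuple, its (name, age) schema and the method names are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical schema with two attributes: (name, age).
using Tuple = std::tuple<std::string, int>;

// Relation over the ring of integers: only tuples with a non-zero payload
// are stored, so the stored set is always finite.
class Relation {
public:
    // Adds `payload` to R[t]; +1 models an insertion and -1 a deletion.
    void update(const Tuple& t, long payload) {
        assert(payload != 0);  // a zero update carries no information
        long& p = data_[t];    // a missing tuple starts at the ring's 0
        p += payload;
        if (p == 0) data_.erase(t);  // payload 0 means "not in the relation"
    }
    // Returns R[t], which is 0 for every tuple that is not stored.
    long payloadOf(const Tuple& t) const {
        auto it = data_.find(t);
        return it == data_.end() ? 0 : it->second;
    }
    // Number of tuples with a non-zero payload.
    std::size_t support() const { return data_.size(); }

private:
    std::map<Tuple, long> data_;
};
```

Inserting the same tuple twice and then applying an update with payload -2 returns the relation to the empty state, which shows how insertions and deletions are both expressed with the ring's addition.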

4.1 Implementation of MIN/MAX operations over Rings


Thinking of cases where tree-based data structures would come in handy, I observed
that MIN and MAX operations within queries would make great use of them. Previously,
DBToaster supported only the COUNT, COUNT DISTINCT, SUM and AVG aggregates. This was
the first improvement I made to the software.
The MIN/MAX implementation contains a few methods, which are explained below.
The whole structure maintains a tree of values, where each value is a pair
(aggregate, count): ’aggregate’ is the search key and ’count’ is the associated
value. The following methods are used to construct the operations required to compute
queries containing MIN/MAX aggregate functions.
* The isZero method checks whether the current value held in the structure corresponds
to the zero element of the ring;
* The structure corresponds to the 0 element when the tree is empty;
* When another MinStruct is added via the += operator, the two given trees are merged
by iterating over the second tree and adding each pair (v, c) to the current tree
(*this). Suppose there are two trees named T1 and T2, and a pair (v, c2) needs to be
added from T2 to T1. If there is no element v in T1, the pair (v, c2) is added to T1.
Otherwise, if v is already in T1 with count c1, its count is increased, that is, the
pair (v, c1 + c2) appears in the final tree T1. If the final count of an element is 0,
that element is deleted from the tree;
* Multiplying a MinStruct by a scalar ’s’ means multiplying the count of every element
in the tree of that structure by ’s’ (the keys do not change).
I used the map data structure from the C++ Standard Template Library, which is
implemented with self-balancing trees, and successfully compiled queries that contained
MIN/MAX operations.
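
The methods listed above can be sketched as follows. This is a simplified reconstruction for illustration, not the code integrated into DBToaster: only the names MinStruct and isZero come from the text, while the member tree, the helpers add, min and max, and the use of long for values and counts are assumptions. Clearing the tree when multiplying by 0 is also an assumption, made so that no element with count 0 is ever stored.

```cpp
#include <cassert>
#include <map>

struct MinStruct {
    std::map<long, long> tree;  // aggregate value -> count, ordered by value

    // The zero element of the ring is the empty tree.
    bool isZero() const { return tree.empty(); }

    // Adds the pair (v, c); an element whose count reaches 0 is deleted.
    void add(long v, long c) {
        auto it = tree.find(v);
        if (it == tree.end()) {
            if (c != 0) tree.emplace(v, c);
            return;
        }
        it->second += c;
        if (it->second == 0) tree.erase(it);
    }

    // Merges another structure into this one, pair by pair.
    MinStruct& operator+=(const MinStruct& other) {
        for (const auto& vc : other.tree) add(vc.first, vc.second);
        return *this;
    }

    // Multiplies every count by the scalar s; the keys do not change.
    MinStruct& operator*=(long s) {
        if (s == 0) { tree.clear(); return *this; }
        for (auto& vc : tree) vc.second *= s;
        return *this;
    }

    // The ordered map keeps the smallest key first and the largest key last.
    long min() const { assert(!tree.empty()); return tree.begin()->first; }
    long max() const { assert(!tree.empty()); return tree.rbegin()->first; }
};
```

Since std::map keeps its keys ordered, reading the current minimum or maximum takes constant time, and each individual insertion or deletion costs time logarithmic in the number of distinct values.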

4.2 Testing the new implementations


I was expecting a major improvement in execution speed because of the underlying
tree-based data structures and the use of rings. Previously, queries containing
MIN/MAX aggregate functions were computed as nested queries, which resulted in more
complex intermediate generated code and thus in an increased runtime.
To better observe the impact of the new implementation using rings, I created some
use cases and performed a series of experiments. A few files were populated with data,
along with streams to read that information in the desired format. I started with very
simple examples containing one of the aggregate functions that I wanted to test (SUM,
AVG, MIN, MAX, etc.) and then continued with more complex ones.
First, I compiled the query into an intermediate file (the system supports CPP as well
as M3 generated code, but I worked with CPP), which I then ran to check the result.
Statistics such as the time spent processing tables, executing the system-ready trigger
and processing streams were measured, and they were under a few milliseconds on the
considered tests. All the expected results were obtained, which indicated a correct
implementation of the operations.

4.3 Challenges involved


One of the first things I needed to do was to make the DBToaster system work on my
machine. I followed the setup and extended setup instructions found in the readme file
in the software repository, and I had to deal with one version of a required piece of
software not working on my Ubuntu version.
The back-end of the system is written in Java and the front-end is written in OCaml,
and I tackled various errors related to both while configuring the setup locally on my
computer. I also tried to understand as much as possible about how DBToaster is
organised, in order to be able to resolve incompatibilities along the way and integrate
new code cleanly.

In addition, the use of the rings abstraction made code development more demanding, as
I was aiming to integrate the new implementations within the existing software
infrastructure. Last but not least, using non-trivial tree-based data structures to
compute queries efficiently was a challenge in itself.

4.3.1 A new approach


I decided to tackle a challenging extension and put my programming skills to work in
order to come up with a solution for determining the median of a real-time data stream.
After trying a few ideas, I settled on an elegant approach that uses two heaps, one
Max Heap and one Min Heap. The median is the number in the middle of the sorted array
of all the elements seen so far, so the Max Heap stores the smaller half of the
elements and the Min Heap stores the larger half. The invariant is that, at any time,
the two heaps either hold the same number of elements or the Max Heap holds exactly one
more. When the median is requested and the total number of elements n is odd, the Max
Heap holds the extra element, so we return its top element: it is the maximum of the
smallest ⌊n/2⌋ + 1 elements, which is the median. Otherwise, we return the sum of the
top element of the Max Heap and the top element of the Min Heap divided by 2, because
the median is the average of those two numbers.
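The two-heap scheme described above can be sketched in C++ as follows. This is a minimal illustration rather than the code integrated into DBToaster: the class name RunningMedian and its members are hypothetical, both heaps are built on std::priority_queue, and the sketch assumes an insert-only stream (deletions are not handled).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not part of DBToaster): running median of an
// insert-only stream, maintained with a max-heap and a min-heap.
class RunningMedian {
public:
    // Insert one value in O(log n).
    void insert(double v) {
        // Route the value to the half it belongs to.
        if (lower_.empty() || v <= lower_.top()) {
            lower_.push(v);
        } else {
            upper_.push(v);
        }
        // Restore the invariant: lower_ holds as many elements as upper_,
        // or exactly one more.
        if (lower_.size() > upper_.size() + 1) {
            upper_.push(lower_.top());
            lower_.pop();
        } else if (upper_.size() > lower_.size()) {
            lower_.push(upper_.top());
            upper_.pop();
        }
    }

    // Median of all values seen so far in O(1); requires at least one insert.
    double median() const {
        assert(!lower_.empty());
        if (lower_.size() > upper_.size()) {
            return lower_.top();                     // odd number of elements
        }
        return (lower_.top() + upper_.top()) / 2.0;  // even number of elements
    }

    std::size_t size() const { return lower_.size() + upper_.size(); }

private:
    // Max-heap holding the smaller half of the values.
    std::priority_queue<double> lower_;
    // Min-heap holding the larger half of the values.
    std::priority_queue<double, std::vector<double>, std::greater<double>> upper_;
};
```

Each insertion costs O(log n) and each median query O(1), which is what makes the approach attractive for a stream where the median is requested after every update.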
For the Top K elements problem, I also used a binary search tree and solved it in a
similar way to the MAX/MIN case, except that, when computing the final value, I
returned K values instead of just one.
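One possible shape of the Top K idea is sketched below. It is an assumption-laden illustration, not the DBToaster code: the class TopKTracker and its methods are hypothetical names, std::map stands in for the binary search tree, and values are stored with multiplicities in the same spirit as the MAX case. The sketch also assumes that a value is only deleted after it has been inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch (names are not from DBToaster): values are kept in a
// balanced search tree (std::map) as value -> multiplicity.
class TopKTracker {
public:
    // Apply an update: count > 0 inserts copies of v, count < 0 deletes them.
    void update(double v, long count) {
        if (count == 0) return;
        auto it = counts_.find(v);
        if (it == counts_.end()) {
            // Assumption of this sketch: deletions only follow insertions.
            assert(count > 0 && "deleting a value that was never inserted");
            counts_.emplace(v, count);
            return;
        }
        it->second += count;
        assert(it->second >= 0);
        if (it->second == 0) counts_.erase(it);  // drop cancelled entries
    }

    // Return up to k largest values, largest first, repeating duplicates.
    std::vector<double> topK(std::size_t k) const {
        std::vector<double> out;
        for (auto it = counts_.rbegin(); it != counts_.rend() && out.size() < k; ++it) {
            for (long i = 0; i < it->second && out.size() < k; ++i) {
                out.push_back(it->first);
            }
        }
        return out;
    }

private:
    std::map<double, long> counts_;
};
```

With K = 1 the query reduces to the MAX case: the answer is the last key of the map. For general K the tree is walked backwards from its largest key until K values have been collected.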
Chapter 5

Future Work

There are certainly aspects of this work that can be improved or explored further but
that could not be tackled within the time frame of this year's project. In this chapter
I present some modifications that would make the analysis more relevant, as well as
some ideas for improving the data structures discussed and for extending the
performance comparison in the future. I divide these ideas into three parts:
Implementation, Experimental and Integration.

5.1 Implementation part


We live in the world of big data, and the amount of data generated is huge. We
therefore need efficient ways to store this data and to compute queries over it. Until
a few years ago, database systems were based on assumptions such as that most of the
processing happens on a single processor and that large amounts of data can only be
stored on disk. With the advent of multi-core processors, flash storage and many other
such advances, the underlying design of database systems needs to change to fit this
new setting.

5.2 Experimental part

5.2.1 Increase number of threads and relevant statistics

The experiments that follow the YCSB framework were run with one and two threads.
Using 5, 10 or 20 threads would extend the analysis and help draw better conclusions
about how performance scales. Other relevant statistics that could have been included
in the analysis are cache misses and instruction counts.


5.3 Integration part


A challenging part of the project was using tree-based data structures for query
optimization. First, I started with an implementation based on the C++ Standard
Template Library for queries containing MIN/MAX operations and observed that its
performance was better than that of the previous implementation in the software.
Then, I tackled other challenging problems such as Median or Top K elements in a
stream.
Chapter 6

Conclusions

In this project, I studied existing tree-based index data structures as well as other
types of data structures, and experimentally evaluated their performance on different
types of workloads. The evaluation aimed to provide deeper insight into the behaviour
of these data structures in various situations, and I tried to explain the results
obtained to the best of my knowledge.
I then present my work within the DBToaster software, where I used my research on
index data structures and my programming experience to help optimize different types
of queries and add new functionality to the software. I provided implementations of
tree-based data structures that fit into the existing infrastructure of DBToaster and
that result in a significant improvement in the running time of queries.
In Chapter 2 I start by introducing the concept of database indexing and why it is
useful. I then introduce the main types of tree-based index data structures analysed
in this project and present their structure, their usage and why they are suitable in
particular situations.
Chapter 3 includes results from previous work in the area and the analysis that I
carried out in a similar environment. The aim was to understand which data structures
suit a range of situations and how well they actually perform, as well as to develop
better ways of thinking about the design of index data structures.
In Chapter 4 I present the work on optimizing queries by using tree-based index data
structures and integrating their implementation within DBToaster’s existing software.
I observed the difference in performance brought by these changes, and I succeeded in
providing new functionality for computing queries.
Finally, in Chapter 5 I propose some ideas for developing data structures that support
future workloads and software, describe what else I could have included in my
analysis, and outline ways to extend the analysis and continue this project, should
one want to do so.
To conclude, in this project I have studied different tree-based index data
structures, analysed scenarios in which they can provide good performance, and shown
results obtained by running experiments. I compared these with some previous work done
in the area. I also did a considerable amount of work on DBToaster, writing and using
tree-based data structures to improve the performance of running various queries.
Appendix A

Code for queries operations using Rings

#ifndef DBTOASTER_RINGS_MAX_STRUCT_HPP
#define DBTOASTER_RINGS_MAX_STRUCT_HPP

#include <map>
#include <utility>
#include "numeric_ring.hpp"

namespace dbtoaster {
namespace standard_rings {

// Ring element for MAX: maps each value to its multiplicity.
struct MaxRing : NumericRing {
    std::map<DoubleType, long> sorted_map;

    explicit MaxRing() { }
    explicit MaxRing(DoubleType v) : sorted_map { { v, 1 } } { }

    // Copy of `other` with every multiplicity scaled by `a`.
    explicit MaxRing(const MaxRing& other, long a) {
        if (a == 0L) return;
        for (const auto& kv : other.sorted_map) {
            sorted_map[kv.first] = kv.second * a;
        }
    }

    inline bool isZero() const { return sorted_map.empty(); }

    // Largest stored value, or 0.0 if the map is empty.
    inline DoubleType result() const {
        return (sorted_map.empty() ? 0.0 : sorted_map.rbegin()->first);
    }

    // Add multiplicities; entries that cancel out to zero are erased.
    inline MaxRing& operator+=(const MaxRing& other) {
        for (const auto& kv : other.sorted_map) {
            auto it = sorted_map.find(kv.first);
            if (it == sorted_map.end()) {
                sorted_map[kv.first] = kv.second;
            }
            else {
                it->second += kv.second;
                if (it->second == 0L) sorted_map.erase(it);
            }
        }
        return *this;
    }
};

// Scale the multiplicities of a temporary in place.
inline MaxRing multiply(MaxRing&& r, long a) {
    if (a == 0L) return MaxRing();
    if (a == 1L) return std::move(r);
    for (auto& kv : r.sorted_map) {
        kv.second *= a;
    }
    return std::move(r);
}

inline MaxRing operator*(long a, MaxRing&& r) {
    return multiply(std::forward<MaxRing>(r), a);
}

inline MaxRing operator*(MaxRing&& r, long a) {
    return multiply(std::forward<MaxRing>(r), a);
}

inline MaxRing operator*(long a, const MaxRing& r) {
    return MaxRing(r, a);
}

inline MaxRing operator*(const MaxRing& r, long a) {
    return MaxRing(r, a);
}

} // namespace standard_rings
} // namespace dbtoaster

#endif /* DBTOASTER_RINGS_MAX_STRUCT_HPP */
