The viability of implementing an in-memory database, Erlang ETS, using a relatively new data structure, called a Judy array, was studied by comparing the performance of ETS tables based on four data structures: AVL balanced binary trees, B-trees, resizable linear hash tables, and Judy arrays. The benchmarks used workloads of sequentially and randomly ordered keys at table populations from 700 keys to 54 million keys.
Benchmark results show that ETS table insertion, lookup, and update operations on Judy-based tables are significantly faster than all other table types for tables that exceed CPU data cache size (70,000 keys or more). The relative speed of Judy-based tables improves as table populations grow to 54 million keys and memory usage approaches 3GB. Term deletion and table traversal operations by Judy-based tables are slower than the linear hash table-based type, but the additional cost of the deletion operation is smaller than the combined savings of the other operations.
Resizing a hash table to 2^32 buckets, managed by a Judy array, creates the most consistent performance improvements and uses only about 6% more memory than a regular hash table. Other applications could benefit substantially from this application of Judy arrays.
D.3.3 [Programming Languages]: Language Constructs and Features—Data Types and Structures ; E.1 [Data]: Data Structures—Trees; H.3.4 [Information Storage and Re-
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Erlang Open Telecom Platform (Erlang/OTP) has an efficient in-memory database known as ETS tables (Erlang term storage) for storing and retrieving Erlang data terms. As RAM prices continue to fall and as the gap between CPU instruction execution latency and RAM access latency widens, it becomes increasingly worthwhile to examine the performance of large ETS tables populated with millions of terms. This research compares the efficiency of the current ETS table implementations with several new implementations: one based on an in-memory B-tree and three based on a relatively new data structure called a Judy array [3].
The Judy array turns out to be an excellent data structure for building in-memory datastores that exceed CPU cache size. A performance analysis of seven small benchmark programs shows that an ETS table implemented with a Judy array usually runs faster than tables implemented with a resizable linear hash table and much faster than tables implemented with AVL trees or B-trees. In cases where Judy-based tables are slower, the combined speed advantage of the faster operations outweighs the penalty of the slower ones.
The primary audience for this research is Erlang developers, most of whom use ETS tables in their applications directly or indirectly via Mnesia [11], an important distributed database application written in Erlang. ETS tables are the bedrock on which Mnesia is built.
A secondary audience is the much larger community of C and C++ developers. Many algorithms that they use daily rely on hash tables for in-memory datastores of one kind or another. Most of those developers assume that their only options for optimizing the datastore portion of their applications are either (a) to tune the hash function or (b) to adjust the size of the hash table. This paper proposes another option: (c) to use a Judy array to create a really big hash table, 2^32 hash buckets, to reduce the time spent searching and managing collision lists.
This paper is structured as follows. Newcomers to Erlang are given a brief introduction to Erlang in Section 2 to aid them in understanding some of the Erlang jargon and syntax that Erlang programmers take for granted. Section 3 presents a summary of existing ETS table types and their implementations. Then the focus of attention shifts to a new data structure, the Judy array. Section 4 introduces the reader to the Judy array and what it looks like from a user’s point of view. Section 5 describes how Judy arrays are used to implement three new ETS table types. The benchmark programs using all of the ETS table types are discussed next. Section 6 explains several of the design decisions made while creating the benchmark programs. Section 7 analyzes the results of seven ETS benchmark programs. Section 8 presents a small survey of related work. The paper ends with Section 9 naming areas for future research and Section 10 presenting the conclusion.
-module(test).
-export([square/1]).
% This is a comment.
square(X) -> mult(X, X).
mult(X, Y) -> X * Y.
This section provides a minimal Erlang primer so that Erlang neophytes can understand the syntax used in the later sections of this paper. For greater detail, see the original reference book on the Erlang language [1].
Erlang terms can be divided into two general categories, simple and complex. Three simple term types are used in this paper:
numbers. Integers may grow beyond native CPU word length to any size (i.e., “bignum” support). Syntax examples: -4, 6.02e23, 3141592653589.
friendly names. Erlang atoms must begin with a lowercase letter or must be enclosed in single quotes. Syntax examples: atom1, 'ATOM2', '$foo'.
The two complex term types used in this paper are lists and tuples. An Erlang list, like a LISP list, may be of arbitrary length, and its elements may be any data type, including other lists or tuples. Erlang tuples are similar to lists but are fixed length. A tuple’s constant length allows O(1) access to any element within the tuple. Examples of a three-element list and a three-element tuple are [1,2,3] and {1,2,3}, respectively. The syntax "scott" is syntactic sugar for a list containing the ASCII character codes for the string “scott” (i.e., [115,99,111,116,116]).
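The list and tuple behavior described above can be checked with a few pattern matches. This is a minimal sketch; the module name terms_demo and the function run/0 are illustrative names chosen here, not part of the paper.

```erlang
-module(terms_demo).
-export([run/0]).

%% Illustrative only: exercises the list and tuple terms described above.
run() ->
    List  = [1, 2, 3],                 % a three-element list
    Tuple = {1, 2, 3},                 % a three-element tuple
    3 = length(List),                  % length/1 must walk the list
    3 = tuple_size(Tuple),             % a tuple's size is fixed and stored
    2 = element(2, Tuple),             % constant-time access by position
    [Head | Tail] = List,              % lists are taken apart head-first
    1 = Head,
    [2, 3] = Tail,
    %% A double-quoted string is just a list of character codes.
    true = ("scott" =:= [115, 99, 111, 116, 116]),
    %% Elements may be of any type, including other lists and tuples.
    Nested = [{atom1, "abc"}, [4.5, 'ATOM2']],
    {Head, Tail, Nested}.
```

Each line of the form Pattern = Expression fails with a badmatch exception if the two sides differ, so a normal return from run/0 means every claim held.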
named in the -export attribute’s list are callable. Furthermore, they must be called by a fully-qualified name using the syntax module:func(A), where A represents zero or more arguments, for example, test:square(5). Specifying the module name erlang is optional when calling an Erlang built-in function (BIF), for example, date().
Erlang programs may be compiled into abstract byte code or native executable code. Both types of code are executed by a host operating system process that implements the Erlang virtual machine. Like the Java virtual machine, the Erlang virtual machine provides services such as consistent access to host operating system resources, memory management and garbage collection services, and exception handling facilities.
ETS is an acronym for Erlang Term Storage. ETS permits Erlang programs to store large amounts of data in memory with O(log N) or O(1) access time for tables with sorted and unsorted keys, respectively. This section presents an overview of what ETS tables are, then briefly explains the behavior and underlying data structures of existing ETS table types as well as the new ETS table types developed for comparison purposes.
Conceptually, an ETS table is a key-value database. In many ways, an ETS table is analogous to associative arrays found in many other languages, such as Perl’s hash and Python’s dictionary types. Familiar operations such as key insertion, query, and deletion are supported. Table traversal operations such as “get first item” and “get next item” are also available. However, ETS tables also support additional features such as pattern-matching queries, for example, match all tuples where the first element is an integer less than 5 and the third element is the atom louise.
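The operations named above map onto functions in the standard ets module. The following is a minimal sketch using ets:new/2, ets:insert/2, ets:lookup/2, ets:delete/2, ets:first/1, ets:next/2, and ets:select/2; the module name ets_demo, the table name, and the sample tuples are made up for illustration.

```erlang
-module(ets_demo).
-export([run/0]).

%% Illustrative only: table name and data are invented for this sketch.
run() ->
    T = ets:new(people, [set]),                  % unordered table, key is element 1
    true = ets:insert(T, {3, "Smith", louise}),
    true = ets:insert(T, {4, "Jones", mark}),
    true = ets:insert(T, {7, "Brown", louise}),
    [{4, "Jones", mark}] = ets:lookup(T, 4),     % query by key
    true = ets:delete(T, 4),                     % delete by key
    [] = ets:lookup(T, 4),
    %% Traversal with "get first" and "get next".
    Keys = walk(T, ets:first(T), []),
    %% Pattern-matching query: first element is an integer less than 5
    %% and third element is the atom louise.
    MatchSpec = [{{'$1', '_', louise},
                  [{is_integer, '$1'}, {'<', '$1', 5}],
                  ['$_']}],
    Hits = ets:select(T, MatchSpec),
    true = ets:delete(T),                        % destroy the whole table
    {lists:sort(Keys), Hits}.

%% ets:next/2 returns '$end_of_table' after the last key.
walk(_T, '$end_of_table', Acc) -> Acc;
walk(T, Key, Acc) -> walk(T, ets:next(T, Key), [Key | Acc]).
```

The keys are sorted before being returned because a set table makes no promise about traversal order.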
2. A term’s key position is defined at table creation time. The default is the first element, but the key may be configured to be any element within the tuple.
These two rules allow the programmer a great deal of flexibility. Any Erlang term, be it a simple number or a very large list or a deeply-nested tuple, may be used as the key for an ETS-stored tuple. Furthermore, there are no restrictions on how many elements the tuple may have (without violating rule #2) or on the data type of each tuple element.
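As a sketch of this flexibility, the key position can be chosen with the {keypos, N} option to ets:new/2, after which any term stored in that position serves as the lookup key. The module name keypos_demo and the sample tuples are illustrative assumptions.

```erlang
-module(keypos_demo).
-export([run/0]).

%% Illustrative only: the table uses the second tuple element as its key.
run() ->
    T = ets:new(by_second, [set, {keypos, 2}]),
    %% Tuples of different lengths and element types may share one table.
    true = ets:insert(T, {1001, {"first", "last"}, [erlang, c]}),
    true = ets:insert(T, {1002, [deeply, {nested, [term]}], 3.14, extra, fields}),
    %% The key may be a tuple ...
    [{1001, _, _}] = ets:lookup(T, {"first", "last"}),
    %% ... or a nested list.
    [{1002, _, _, _, _}] = ets:lookup(T, [deeply, {nested, [term]}]),
    %% The first element is not the key in this table.
    [] = ets:lookup(T, 1001),
    true = ets:delete(T),
    ok.
```

Every stored tuple must have at least as many elements as the key position, which is the constraint that rule #2 places on tuple length.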
active ETS tables of different types at one time, and the compile-time limit may be overridden by an environment variable when the virtual machine is started.
For performance reasons, ETS is primarily implemented as BIFs inside the Erlang virtual machine. All BIFs and their supporting functions, together with the rest of the virtual machine, are written in C.
The current release of Erlang, version R9B, provides one type of ordered ETS table: the ordered_set type. The semantics of an ordered_set ETS table require that traversal of the table using functions such as “get first key” and “get next key” must return keys in sorted order. Only one tuple with any key K may be stored in the table at any time. If a tuple with key K is inserted into an ordered_set table, and a tuple with that key is already present in the table, the old tuple will be replaced by the new tuple.
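Both properties, replacement on duplicate keys and sorted traversal, can be observed directly. This is a minimal sketch; the module name ordered_demo and the sample data are illustrative.

```erlang
-module(ordered_demo).
-export([run/0]).

%% Illustrative only: shows replacement and sorted traversal.
run() ->
    T = ets:new(scores, [ordered_set]),
    true = ets:insert(T, {30, c}),
    true = ets:insert(T, {10, a}),
    true = ets:insert(T, {20, old}),
    true = ets:insert(T, {20, new}),       % same key: replaces {20, old}
    [{20, new}] = ets:lookup(T, 20),
    3 = ets:info(T, size),                 % still only three tuples
    Keys = collect(T, ets:first(T)),       % keys come back in sorted order
    true = ets:delete(T),
    Keys.

collect(_T, '$end_of_table') -> [];
collect(T, Key) -> [Key | collect(T, ets:next(T, Key))].
```

The keys were inserted as 30, 10, 20, yet the traversal yields them in ascending order without any explicit sort.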
The sorting order of an ordered_set table’s keys is maintained by a standard AVL balanced binary tree. The AVL tree data structure dictates that operations on the tree take
Because an ETS table key may be an arbitrary Erlang term, a sorting order is defined in order to compare terms of different types. For example, the value of 42 >= "mark" is false. The sorting order is as follows: numbers < atoms < tuples < the empty list [] < non-empty lists < binaries. This rule is applied recursively to the elements of lists and tuples in order to break a tie.
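The cross-type ordering can be confirmed with ordinary comparison operators and lists:sort/1. This is a small sketch covering only the term types listed above; the module name order_demo is illustrative.

```erlang
-module(order_demo).
-export([run/0]).

%% Illustrative only: checks the ordering between the listed term types.
run() ->
    false = (42 >= "mark"),          % a number sorts before any list
    true  = (42 < mark),             % number < atom
    true  = (mark < {mark}),         % atom < tuple
    true  = ({mark} < []),           % tuple < empty list
    true  = ([] < "mark"),           % empty list < non-empty list
    true  = ("mark" < <<"mark">>),   % list < binary
    %% Sorting a mixed list applies the same ordering.
    lists:sort([<<"bin">>, "list", [], {tuple}, atom, 42]).
```

An ordered_set table compares keys with this same term order, so a table holding keys of mixed types is traversed in the order that run/0 returns.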
The unordered ETS table types are implemented using a linear hash bucket array. The hash function folds all of the parts of the key term into an unsigned long integer. The hash bucket array is automatically resized as hash bucket populations rise above or fall below a compile-time constant,
To supplement the built-in ETS table types, the author added four new table types to the Erlang virtual machine. The first type is the btree type. The btree ETS table type is implemented using an in-memory B-tree structure.
A B-tree is a 2M-ary tree where all paths from the root node to a leaf node are the same length. In addition, a B-tree limits the number of items stored in any node (except the root) to be between M and 2M items. The performance results in Section 7 demonstrate B-tree run-time behavior with M = 4.
B-trees share three characteristics with AVL trees. First, key sorting order is preserved by both tree types. Second, insertion and deletion operations may affect multiple nodes as the tree is rebalanced. Third, operations on the tree take
The other research ETS table types are based on Judy arrays: the judysl, judyesl, and judyeh types. Their implementation is discussed in detail in Section 5.
The Judy array was invented by Doug Baskins while working at Hewlett-Packard. This data structure is relatively new and has not been mentioned in any publications that Baskins or this author is aware of. The Judy array source code has been released to the Open Source community under the GNU Lesser General Public License and is available at this time at [3]. Documents describing the implementation of Judy arrays can be found online at [2] and [13].
The Judy array’s inventor claims that it can operate well on big- and little-endian CPUs, 64-bit and 32-bit CPUs, with small or exceptionally large data populations, and with sparse or dense populations without external tuning parameters in a memory-efficient manner that, in most cases, is as fast as or faster than traditional hashing techniques and much faster than tree-based algorithms. Fortunately, Judy array source code is available with an Open Source license [5] for independent testing.
From an application programming interface point of view, a Judy array is simply a dynamically-sized array. The two principal varieties of Judy arrays, called Judy1 and JudyL, use a processor-dependent word, 32 bits or 64 bits, as the array index. (To simplify descriptions, this paper will assume that the size of a C unsigned long integer is 32 bits.) Each value stored in a Judy1 array is a single bit. Each value stored in a JudyL array is a word. The sorting order of indices in the array is preserved.
Basic Judy array operations include the following: insert index I into the array, delete index I, find index I, find the first/last index in the array, and find the previous/next index in the array that precedes/follows index I. Other operations include finding the previous/next unused index preceding/following index I, counting the indices stored in the array between indices I1 and I2, and finding the Nth index stored in the array.
Judy arrays do not require explicit initialization. There are no tuning parameters: there is no need to specify how many indices will eventually be stored in the array, what the index range will eventually be, or if the index population will be dense or sparse. There is no need to specify a custom hashing function or index comparison function. Judy arrays use aggressive compression techniques to try to minimize memory consumption.