Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Save to My Library
Look up keyword
Like this
7Activity
0 of .
Results for:
No results containing your search query
P. 1
Column Oriented Database C-store

Column Oriented Database C-store

Ratings:

4.25

(4)
|Views: 1,033 |Likes:

More info:

Categories:Types, School Work
Published by: William Tanksley, Jr on Mar 07, 2007
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

05/08/2014

pdf

text

original

 
 
C-Store: A Column-oriented DBMS
Mike Stonebraker
, Daniel J. Abadi
, Adam Batkin
+
, Xuedong Chen
, Mitch Cherniack 
+
,Miguel Ferreira
, Edmond Lau
, Amerson Lin
, Sam Madden
, Elizabeth O’Neil
,Pat O’Neil
, Alex Rasin
, Nga Tran
+
, Stan Zdonik 
 
MIT CSAILCambridge, MA
+
Brandeis UniversityWaltham, MA
UMass BostonBoston, MA
Brown UniversityProvidence, RI
Abstract
This paper presents the design of a read-optimizedrelational DBMS that contrasts sharply with mostcurrent systems, which are write-optimized.Among the many differences in its design are:storage of data by column rather than by row,careful coding and packing of objects into storageincluding main memory during query processing,storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditionalimplementation of transactions which includes highavailability and snapshot isolation for read-onlytransactions, and the extensive use of bitmapindexes to complement B-tree structures.We present preliminary performance data on asubset of TPC-H and show that the system we arebuilding, C-Store, is substantially faster thanpopular commercial products. Hence, thearchitecture looks very encouraging.
1.
 
Introduction
Most major DBMS vendors implement record-orientedstorage systems, where the attributes of a record (or tuple)are placed contiguously in storage. With this
row store
 architecture, a single disk write suffices to push all of thefields of a single record out to disk. Hence, highperformance writes are achieved, and we call a DBMSwith a row store architecture a
write-optimized 
system.These are especially effective on OLTP-style applications.In contrast, systems oriented toward ad-hoc queryingof large amounts of data should be
read-optimized 
. Datawarehouses represent one class of read-optimized system,in which periodically a bulk load of new data isperformed, followed by a relatively long period of ad-hocqueries. Other read-mostly applications include customerrelationship management (CRM) systems, electroniclibrary card catalogs, and other ad-hoc inquiry systems. Insuch environments, a
column store
architecture, in whichthe values for each single column (or attribute) are storedcontiguously, should be more efficient. This efficiencyhas been demonstrated in the warehouse marketplace byproducts like Sybase IQ [FREN95, SYBA04], Addamark [ADDA04], and KDB [KDB04].
 
In this paper, we discussthe design of a column store called C-Store that includes anumber of novel features relative to existing systems.With a column store architecture, a DBMS need onlyread the values of columns required for processing a givenquery, and can avoid bringing into memory irrelevantattributes. In warehouse environments where typicalqueries involve aggregates performed over large numbersof data items, a column store has a sizeable performanceadvantage. However, there are several other majordistinctions that can be drawn between an architecture thatis read-optimized and one that is write-optimized.Current relational DBMSs were designed to padattributes to byte or word boundaries and to store values intheir native data format. It was thought that it was tooexpensive to shift data values onto byte or wordboundaries in main memory for processing. However,CPUs are getting faster at a much greater rate than disk bandwidth is increasing. Hence, it makes sense to tradeCPU cycles, which are abundant, for disk bandwidth,which is not. This tradeoff appears especially profitable ina read-mostly environment.There are two ways a column store can use CPU cyclesto save disk bandwidth. First, it can code data elementsinto a more compact form. For example, if one is storingan attribute that is a customer’s state of residence, then USstates can be coded into six bits, whereas the two-character abbreviation requires 16 bits and a variablelength character string for the name of the state requiresmany more. Second, one should
densepack 
 
values instorage. For example, in a column store it isstraightforward to pack N values, each K bits long, into N* K bits. The coding and compressibility advantages of a
Permission to copy without fee all or part of this material is granted  provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise,or to republish, requires a fee and/or special permission from the Endowment 
Proceedings of the 31
st
VLDB Conference,Trondheim, Norway, 2005
 
Writeable Store (WS)Read-optimized Store (RS)
Tuple Mover
column store over a row store have been previouslypointed out in [FREN95]. Of course, it is also desirable tohave the DBMS query executor operate on the compressedrepresentation whenever possible to avoid the cost of decompression, at least until values need to be presentedto an application.Commercial relational DBMSs store complete tuplesof tabular data along with auxiliary B-tree indexes onattributes in the table. Such indexes can be
 primary
,whereby the rows of the table are stored in as close tosorted order on the specified attribute as possible, or
secondary
, in which case no attempt is made to keep theunderlying records in order on the indexed attribute. Suchindexes are effective in an OLTP write-optimizedenvironment but do not perform well in a read-optimizedworld. In the latter case, other data structures areadvantageous, including bit map indexes [ONEI97], crosstable indexes [ORAC04], and materialized views[CERI91]. In a read-optimized DBMS one can explorestoring data using only these read-optimized structures,and not support write-optimized ones at all.Hence, C-Store physically stores a collection of columns, each sorted on some attribute(s). Groups of columns sorted on the same attribute are referred to as“projections”; the same column may exist in multipleprojections, possibly sorted on a different attribute in each.We expect that our aggressive compression techniqueswill allow us to support many column sort-orders withoutan explosion in space. The existence of multiple sort-orders opens opportunities for optimization.Clearly, collections of off-the-shelf “blade” or “grid”computers will be the cheapest hardware architecture forcomputing and storage intensive applications such asDBMSs [DEWI92]. Hence, any new DBMS architectureshould assume a grid environment in which there are Gnodes (computers), each with private disk and privatememory. We propose to horizontally partition data acrossthe disks of the various nodes in a “shared nothing”architecture [STON86]. Grid computers in the near futuremay have tens to hundreds of nodes, and any new systemshould be architected for grids of this size. Of course, thenodes of a grid computer may be physically co-located ordivided into clusters of co-located nodes. Since databaseadministrators are hard pressed to optimize a gridenvironment, it is essential to allocate data structures togrid nodes automatically. In addition, intra-queryparallelism is facilitated by horizontal partitioning of stored data structures, and we follow the lead of Gamma[DEWI90] in implementing this construct.Many warehouse systems (e.g. Walmart [WEST00])maintain two copies of their data because the cost of recovery via DBMS log processing on a very large(terabyte) data set is prohibitive. This option is renderedincreasingly attractive by the declining cost per byte of disks. A grid environment allows one to store suchreplicas on different processing nodes, thereby supportinga Tandem-style highly-available system [TAND89].However, there is no requirement that one store multiplecopies in the exact same way. C-Store allows redundantobjects to be stored in different sort orders providinghigher retrieval performance in addition to highavailability. In general, storing overlapping projectionsfurther improves performance, as long as redundancy iscrafted so that all data can be accessed even if one of theG sites fails. We call a system that tolerates K failures
K-safe
. C-Store will be configurable to support a range of values of K.It is clearly essential to perform transactional updates,even in a read-mostly environment. Warehouses have aneed to perform on-line updates to correct errors. As well,there is an increasing push toward real-time warehouses,where the delay to data visibility shrinks toward zero. Theultimate desire is on-line update to data warehouses.Obviously, in read-mostly worlds like CRM, one needs toperform general on-line updates.There is a tension between providing updates andoptimizing data structures for reading. For example, inKDB and Addamark, columns of data are maintained inentry sequence order. This allows efficient insertion of new data items, either in batch or transactionally, at theend of the column. However, the cost is a less-than-optimal retrieval structure, because most query workloadswill run faster with the data in some other order.However, storing columns in non-entry sequence willmake insertions very difficult and expensive.C-Store approaches this dilemma from a freshperspective. Specifically, we combine in a single piece of system software, both a read-optimized column store andan update/insert-oriented writeable store, connected by a
tuple mover 
,
as noted in Figure 1
.
At the top level, thereis a small Writeable Store (WS) component, which isarchitected to support high performance inserts andupdates. There is also a much larger component called theRead-optimized Store (RS), which is capable of supporting very large amounts of information. RS, as thename implies, is optimized for read and supports only avery restricted form of insert, namely the batch movementof records from WS to RS, a task that is performed by thetuple mover of Figure 1.
Figure 1. Architecture of C-Store
Of course, queries must access data in both storagesystems. Inserts are sent to WS, while deletes must be
 
marked in RS for later purging by the tuple mover.Updates are implemented as an insert and a delete. Inorder to support a high-speed tuple mover, we use avariant of the LSM-tree concept [ONEI96], whichsupports a
merge out 
process that moves tuples from WSto RS in bulk by an efficient method of merging orderedWS data objects with large RS blocks, resulting in a newcopy of RS that is installed when the operation completes.The architecture of Figure 1 must support transactionsin an environment of many large ad-hoc queries, smallerupdate transactions, and perhaps continuous inserts.Obviously, blindly supporting dynamic locking will resultin substantial read-write conflict and performancedegradation due to blocking and deadlocks.Instead, we expect read-only queries to be run in
historical
mode. In this mode, the query selects atimestamp, T, less than the one of the most recentlycommitted transactions, and the query is semanticallyguaranteed to produce the correct answer as of that pointin history. Providing such
snapshot isolation
[BERE95]requires C-Store to timestamp data elements as they areinserted and to have careful programming of the runtimesystem to ignore elements with timestamps later than T.Lastly, most commercial optimizers and executors arerow-oriented, obviously built for the prevalent row storesin the marketplace. Since both RS and WS are column-oriented, it makes sense to build a column-orientedoptimizer and executor. As will be seen, this softwarelooks nothing like the traditional designs prevalent today.In this paper, we sketch the design of our updatablecolumn store, C-Store, that can simultaneously achievevery high performance on warehouse-style queries andachieve reasonable speed on OLTP-style transactions. C-Store is a column-oriented DBMS that is architected toreduce the number of disk accesses per query. Theinnovative features of C-Store include:1.
 
A hybrid architecture with a WS component optimizedfor frequent insert and update and an RS componentoptimized for query performance.2.
 
Redundant storage of elements of a table in severaloverlapping projections in different orders, so that aquery can be solved using the most advantageousprojection.3.
 
Heavily compressed columns using one of severalcoding schemes.4.
 
A column-oriented optimizer and executor, withdifferent primitives than in a row-oriented system.5.
 
High availability and improved performance throughK-safety using a sufficient number of overlappingprojections.6.
 
The use of snapshot isolation to avoid 2PC and lockingfor queries.It should be emphasized that while many of these topicshave parallels with things that have been studied inisolation in the past, it is their combination in a realsystem that make C-Store interesting and unique.The rest of this paper is organized as follows. InSection 2 we present the data model implemented by C-Store. We explore in Section 3 the design of the RSportion of C-Store, followed in Section 4 by the WScomponent. In Section 5 we consider the allocation of C-Store data structures to nodes in a grid, followed by apresentation of C-Store updates and transactions inSection 6. Section 7 treats the tuple mover component of C-Store, and Section 8 presents the query optimizer andexecutor. In Section 9 we present a comparison of C-Store performance to that achieved by both a popularcommercial row store and a popular commercial columnstore. On TPC-H style queries, C-Store is significantlyfaster than either alternate system. However, it must benoted that the performance comparison is not fullycompleted; we have not fully integrated the WS and tuplemover, whose overhead may be significant. Finally,Sections 10 and 11 discuss related previous work and ourconclusions.
2.
 
Data Model
C-Store supports the standard relational
logical datamodel
, where a database consists of a collection of namedtables, each with a named collection of attributes(columns). As in most relational systems, attributes (orcollections of attributes) in C-Store tables can form aunique
 primary key
or be a
 foreign key
that references aprimary key in another table. The C-Store query languageis assumed to be SQL, with standard SQL semantics. Datain C-Store is not physically stored using this logical datamodel. Whereas most row stores implement physicaltables directly and then add various indexes to speedaccess, C-Store implements only
 projections
.Specifically, a C-Store projection is
anchored 
 
on a givenlogical table, T, and contains one or more attributes fromthis table. In addition, a projection can contain anynumber of other attributes from other tables, as long asthere is a sequence of n:1 (
i.e.
, foreign key) relationshipsfrom the anchor table to the table containing an attribute.To form a projection, we project the attributes of interest from T, retaining any duplicate rows, and performthe appropriate sequence of value-based foreign-key joinsto obtain the attributes from the non-anchor table(s).Hence, a projection has the same number of rows as itsanchor table. Of course, much more elaborate projectionscould be allowed, but we believe this simple scheme willmeet our needs while ensuring high performance. Wenote that we use the term projection slightly differentlythan is common practice, as we do not store the basetable(s) from which the projection is derived.
Table 1: Sample EMP data
Name Age Dept SalaryBob 25 Math 10KBill 27 EECS 50KJill 24 Biology 80K

Activity (7)

You've already reviewed this. Edit your review.
1 hundred reads
1 thousand reads
Xavier Araujo liked this
Xiaolong Jiang liked this
nupur_1989 liked this

You're Reading a Free Preview

Download
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->