Cassandra (non-relational column-store) vs. ParAccel (relational column-store)

Data Model
Cassandra: Non-relational data model. Typically, a row name, column name, and timestamp are sufficient to uniquely map to a value in the database.
ParAccel: Relational data model. It provides a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.
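
The two data models above can be contrasted with a minimal Python sketch. It is an illustration under stated assumptions, not how either system is implemented: the `cells` dictionary, `put`, and `get_latest` are hypothetical names, and SQLite (via Python's standard `sqlite3` module) is used only as a stand-in for a relational engine with a declarative SQL interface.

```python
import sqlite3

# Non-relational sketch: a (row name, column name, timestamp) triple maps to one value.
cells = {}  # hypothetical in-memory store, for illustration only

def put(row, column, timestamp, value):
    cells[(row, column, timestamp)] = value

def get_latest(row, column):
    """Return the value with the newest timestamp for (row, column), or None."""
    versions = [(ts, v) for (r, c, ts), v in cells.items() if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda pair: pair[0])[1]

put("alice", "email", 1, "alice@old.example")
put("alice", "email", 2, "alice@new.example")
latest_email = get_latest("alice", "email")

# Relational sketch (SQLite as a stand-in): declare the data, then state what is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@new.example"))
sql_email = conn.execute(
    "SELECT email FROM users WHERE name = ?", ("alice",)
).fetchone()[0]
```

In the first half the caller works directly with keys and versions; in the second half the query only states which information is wanted, and the engine decides how to store and retrieve it.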
Independence of columns
Cassandra: It stores parts of a data entity or row in separate column-families, and can access these column-families separately. This means that not all parts of a row are picked up in a single I/O operation from storage, which is an advantage when only a subset of a row is relevant to a particular query. However, column-families may consist of many columns, and the columns within a column-family are not independently accessible.
ParAccel: It stores the columns of a traditional relational database table separately so that they can be accessed independently. As with Cassandra, this is useful for queries that access only a subset of a table's attributes. The main difference is that every column is stored separately, rather than in families of columns as in Cassandra (this statement ignores fine-grained hybrid options within ParAccel).
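
The difference in access granularity described above can be sketched with a toy model. Everything here is an assumption made for illustration: the `FAMILIES` grouping, the sample rows, and the "values touched" counter are hypothetical and ignore real-world details such as compression, caching, and on-disk formats.

```python
rows = [
    {"name": "alice", "email": "a@example.com", "city": "Oslo", "clicks": 10},
    {"name": "bob", "email": "b@example.com", "city": "Rome", "clicks": 7},
]

# Hypothetical grouping of columns into column-families.
FAMILIES = {"profile": ["name", "email", "city"], "stats": ["clicks"]}

def build_family_store(rows, families):
    """One storage unit per family; each unit holds all of that family's columns."""
    return {fam: [{c: r[c] for c in cols} for r in rows] for fam, cols in families.items()}

def build_column_store(rows):
    """One storage unit per column."""
    return {c: [r[c] for r in rows] for c in rows[0]}

def read_from_families(store, families, column):
    """Read one column; the whole family unit containing it must be fetched."""
    fam = next(f for f, cols in families.items() if column in cols)
    unit = store[fam]
    values_touched = sum(len(entry) for entry in unit)
    return [entry[column] for entry in unit], values_touched

def read_from_columns(store, column):
    """Read one column; only that column's unit is fetched."""
    unit = store[column]
    return list(unit), len(unit)

family_store = build_family_store(rows, FAMILIES)
column_store = build_column_store(rows)

emails_f, touched_f = read_from_families(family_store, FAMILIES, "email")
emails_c, touched_c = read_from_columns(column_store, "email")
```

Both reads return the same email values, but in this toy model the family layout fetches every value in the `profile` family while the per-column layout fetches only the `email` values.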
Interface
Cassandra: It is distinguished by being part of the NoSQL movement and does not typically have a traditional SQL interface.
ParAccel: It supports standard SQL interfaces.
Optimized workload
Cassandra: It can handle a more diverse set of application requirements, such as a much higher rate of updates. It generally does better on individual row queries and does not perform well on aggregation-heavy workloads. It can put attributes that tend to be co-accessed in the same column-family; this saves the seek cost that results from column-stores needing to find different attributes from the same row in many different places.
ParAccel: It is optimized for read-mostly analytical workloads. These systems support reasonably fast load times, but high update rates tend to be problematic. Hence, data warehouses are an ideal market for ParAccel, since they are typically bulk-loaded, require many complex read queries, and are updated rarely. It tends to struggle on workloads that get or put individual rows in the data set, but thrives on big aggregations and summarizations that require scanning many rows as part of a single query.
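
The workload trade-off above can be illustrated with a small sketch that counts how many separate storage units each access pattern touches. This is a simplified model under assumed names (`row_layout`, `column_layout`, and the helper functions are hypothetical); it says nothing about the actual performance of Cassandra or ParAccel.

```python
# Row-grouped layout: co-accessed attributes of a row are stored together.
row_layout = {
    "alice": {"city": "Oslo", "clicks": 10, "spend": 4},
    "bob": {"city": "Rome", "clicks": 7, "spend": 2},
    "carol": {"city": "Lima", "clicks": 3, "spend": 9},
}

# Column layout: each attribute is stored in its own array.
column_layout = {
    "key": ["alice", "bob", "carol"],
    "city": ["Oslo", "Rome", "Lima"],
    "clicks": [10, 7, 3],
    "spend": [4, 2, 9],
}

def get_row_rowwise(layout, key):
    """Point lookup: a single record holds the whole row."""
    return layout[key], 1

def get_row_columnar(layout, key):
    """Point lookup: the row must be stitched together from every column array."""
    i = layout["key"].index(key)
    row = {c: values[i] for c, values in layout.items() if c != "key"}
    return row, len(layout)

def total_clicks_rowwise(layout):
    """Aggregation: every record is visited to pull out one attribute."""
    return sum(r["clicks"] for r in layout.values()), len(layout)

def total_clicks_columnar(layout):
    """Aggregation: only the one relevant column array is scanned."""
    return sum(layout["clicks"]), 1

row_a, units_a = get_row_rowwise(row_layout, "bob")
row_b, units_b = get_row_columnar(column_layout, "bob")
sum_a, units_c = total_clicks_rowwise(row_layout)
sum_b, units_d = total_clicks_columnar(column_layout)
```

In this model the point lookup touches one unit in the row-grouped layout but all four arrays in the column layout, while the aggregation touches every record in the row-grouped layout but only one array in the column layout.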