The Cassandra Data Model
For developers new to Cassandra and coming from a relational database background, the data model can be a bit confusing. The following section provides a comparison of the two.
The Cassandra data model is designed for distributed data on a very large scale. Although it is natural to want to compare the Cassandra data model to a relational database, the two are quite different. In a relational database, data is stored in tables, and the tables comprising an application are typically related to each other. Data is usually normalized to reduce redundant entries, and tables are joined on common keys to satisfy a given query.

For example, consider a simple application that allows users to create blog entries. In this application, blog entries are categorized by subject area (sports, fashion, and so on). Users can also choose to subscribe to the blogs of other users. In this example, the user id is the primary key in the users table and the foreign key in the blog and subscriber tables. Likewise, the category id is the primary key of the category table and the foreign key in the blog_entry table. Using this relational model, SQL queries can perform joins on the various tables to answer questions such as "what users subscribe to my blog", "show me all of the blog entries about fashion", or "show me the most recent entries for the blogs I subscribe to".
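To make the relational side of the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names come from the text above; the column names (user_id, blog_id, category_id, entry_id, and so on), the sample rows, and the helper functions are hypothetical. Note that both queries can only be answered by joining tables on their shared keys.

```python
import sqlite3

# Hypothetical schema for the blog example. Table names follow the text;
# column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE blog (
    blog_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id)      -- blog owner
);
CREATE TABLE subscriber (
    user_id INTEGER NOT NULL REFERENCES users(user_id),     -- subscribing user
    blog_id INTEGER NOT NULL REFERENCES blog(blog_id),
    PRIMARY KEY (user_id, blog_id)
);
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE blog_entry (
    entry_id    INTEGER PRIMARY KEY,
    blog_id     INTEGER NOT NULL REFERENCES blog(blog_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    title       TEXT NOT NULL
);
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES (foreign key) constraints when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    conn.executemany("INSERT INTO blog VALUES (?, ?)", [(10, 1), (20, 2)])
    # bob and carol subscribe to alice's blog; alice subscribes to bob's.
    conn.executemany("INSERT INTO subscriber VALUES (?, ?)",
                     [(2, 10), (3, 10), (1, 20)])
    conn.executemany("INSERT INTO category VALUES (?, ?)",
                     [(100, "sports"), (200, "fashion")])
    conn.executemany("INSERT INTO blog_entry VALUES (?, ?, ?, ?)",
                     [(1000, 10, 200, "Spring trends"),
                      (1001, 20, 100, "Match recap"),
                      (1002, 20, 200, "Sneaker styles")])
    return conn

def subscribers_of(conn, owner_id):
    """'What users subscribe to my blog?': joins on user id and blog id."""
    rows = conn.execute(
        """SELECT u.name
             FROM subscriber s
             JOIN users u ON u.user_id = s.user_id
             JOIN blog b  ON b.blog_id = s.blog_id
            WHERE b.user_id = ?
            ORDER BY u.name""",
        (owner_id,),
    )
    return [name for (name,) in rows]

def entries_in_category(conn, category_name):
    """'Show me all of the blog entries about fashion': join on category id."""
    rows = conn.execute(
        """SELECT e.title
             FROM blog_entry e
             JOIN category c ON c.category_id = e.category_id
            WHERE c.name = ?
            ORDER BY e.entry_id""",
        (category_name,),
    )
    return [title for (title,) in rows]

conn = build_db()
print(subscribers_of(conn, 1))                # ['bob', 'carol']
print(entries_in_category(conn, "fashion"))   # ['Spring trends', 'Sneaker styles']
```

Because the data is normalized, each fact (who owns a blog, which category an entry belongs to) is stored once, and the cost is paid at read time in the form of joins.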
In Cassandra, the keyspace is the container for your application data, similar to a database or schema in a relational database. Inside the keyspace are one or more column family objects, which are analogous to tables. Column families contain columns, and a set of related columns is identified by an application-supplied row key. Rows in a column family are not required to have the same set of columns.

Cassandra does not enforce relationships between column families the way that relational databases do between tables: there are no formal foreign keys in Cassandra, and joining column families at query time is not supported. Each column family has a self-contained set of columns that are intended to be accessed together to satisfy specific queries from your application.

In the blog application, for instance, you might have column families for user data and blog entries, similar to the relational model. Other column families (or secondary indexes) could then be added to support the queries your application needs to perform. For example, to answer the queries:
What users subscribe to my blog?
Show me all of the blog entries about fashion.
Show me the most recent entries for the blogs I subscribe to.

You need to design additional column families (or add secondary indexes) to support those queries. Keep in mind that some denormalization of data is usually required.
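The toy model below illustrates these ideas in plain Python. It is not a Cassandra client and says nothing about how Cassandra stores data on disk; it only mirrors the logical structure described above (keyspace, column family, row key, columns). The column family names (subscribers_by_blog, entries_by_category, timeline), the helper functions, and the use of increasing integer entry ids as a stand-in for time are all hypothetical. Notice that the two rows in users carry different columns, and that each query is answered from a single row of a single column family that was populated (denormalized) at write time instead of by a join.

```python
from collections import defaultdict

# Toy in-memory model (NOT a Cassandra client):
# keyspace -> column family -> row key -> {column name: column value}.
# Each row is an ordinary dict, so rows need not share a set of columns.
def new_keyspace():
    return defaultdict(lambda: defaultdict(dict))

ks = new_keyspace()

# "users" column family: two rows with different columns.
ks["users"]["alice"] = {"email": "alice@example.com", "city": "Paris"}
ks["users"]["bob"] = {"email": "bob@example.com", "twitter": "@bob"}

def subscribe(ks, follower, blog_owner):
    # One row per blog owner; each subscriber is a column whose name carries
    # the data, so the column value is left empty.
    ks["subscribers_by_blog"][blog_owner][follower] = ""

def post_entry(ks, author, entry_id, category, title):
    # Denormalized write: the same entry is copied into every column family
    # that serves a read query, so reads never need a join.
    ks["blog_entries"][entry_id] = {"author": author, "category": category, "title": title}
    ks["entries_by_category"][category][entry_id] = title
    # Fan out to the timeline row of each current subscriber.
    for follower in ks["subscribers_by_blog"].get(author, {}):
        ks["timeline"][follower][entry_id] = title

# Each query below reads one row of one column family.
def subscribers_of(ks, blog_owner):
    """What users subscribe to my blog?"""
    return sorted(ks["subscribers_by_blog"].get(blog_owner, {}))

def entries_about(ks, category):
    """Show me all of the blog entries about a category."""
    row = ks["entries_by_category"].get(category, {})
    return [title for _, title in sorted(row.items())]

def recent_entries_for(ks, user, limit=10):
    """Most recent entries for the blogs I subscribe to (higher id = newer)."""
    row = ks["timeline"].get(user, {})
    newest_first = sorted(row.items(), reverse=True)
    return [title for _, title in newest_first[:limit]]

subscribe(ks, "bob", "alice")
subscribe(ks, "carol", "alice")
subscribe(ks, "alice", "bob")
post_entry(ks, "alice", 1, "fashion", "Spring trends")
post_entry(ks, "bob", 2, "sports", "Match recap")
post_entry(ks, "bob", 3, "fashion", "Sneaker styles")

print(subscribers_of(ks, "alice"))      # ['bob', 'carol']
print(entries_about(ks, "fashion"))     # ['Spring trends', 'Sneaker styles']
print(recent_entries_for(ks, "alice"))  # ['Sneaker styles', 'Match recap']
```

The design choice in this sketch is to pay at write time: posting one entry writes it to several column families, including every subscriber's timeline row. As written, a subscription made after an entry was posted does not backfill that older entry into the new subscriber's timeline.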
In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application. Replication is controlled on a per-keyspace basis, so data that has different replication requirements should reside in different keyspaces. Keyspaces are not designed to be used as a significant map layer within the data model; they are intended only as a way to control data replication for a set of column families.
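As an illustration of per-keyspace replication, the sketch below renders CREATE KEYSPACE statements in the CQL 3 syntax of Cassandra 1.2 and later (earlier releases configured replication with different syntax). The keyspace names, replication factors, and column family assignments are hypothetical. Each column family belongs to exactly one keyspace and inherits that keyspace's replication settings, which is why data with different replication needs is split across keyspaces.

```python
# Hypothetical keyspaces: names, replication factors and column family
# assignments are illustrative assumptions, not recommendations.
KEYSPACES = {
    # Core application data: keep three replicas of every row.
    "blog_app": {"replication_factor": 3,
                 "column_families": ["users", "blog_entries", "entries_by_category"]},
    # Data that can be rebuilt, so a single replica is tolerable.
    "blog_scratch": {"replication_factor": 1,
                     "column_families": ["page_view_counts"]},
}

def create_keyspace_cql(name, replication_factor):
    """Render a CQL 3 CREATE KEYSPACE statement using SimpleStrategy."""
    return (f"CREATE KEYSPACE {name} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};")

def keyspace_for(column_family):
    """Return the single keyspace (and thus replication setting) owning a column family."""
    for name, spec in KEYSPACES.items():
        if column_family in spec["column_families"]:
            return name
    raise KeyError(column_family)

for name, spec in KEYSPACES.items():
    print(create_keyspace_cql(name, spec["replication_factor"]))
```

SimpleStrategy is used here only to keep the example short; multi-datacenter deployments typically use a topology-aware strategy with per-datacenter replica counts instead.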
