
What is NoSQL

Last Updated on Sunday, 27 May 2012 15:21


Written by DWBIConcepts

NoSQL is not the name of any particular database. Instead, it refers to a broad class of non-relational
databases that differ from classical relational database management systems (RDBMS) in some
significant aspects, most notably in that they do not use SQL as their primary query language and
instead provide access by means of Application Programming Interfaces (APIs).

NoSQL... can be considered "Internet age" databases that are being used by Amazon, Facebook, Google
and the like to address performance and scalability requirements that cannot be met by traditional
relational databases.
NoSQL databases and data-processing frameworks are primarily utilized because of their speed,
scalability and flexibility. Adoption of NoSQL at the enterprise level, however, is still emerging. Some
consider it the absolute apogee of achievement, while others maintain that it is at the peak of the
Inflated Expectations phase of Gartner's Hype Cycle, which characterizes the over-enthusiasm or hype
and subsequent disappointment that typically accompany the introduction of new technologies. Still
others relegate it to an inferior and inconspicuous position in favor of columnar relational databases
such as Sybase IQ or Oracle 11g.

Features of NoSQL databases


One major difference between traditional relational databases and NoSQL is that the latter do not
generally provide guarantees for atomicity, consistency, isolation and durability (commonly known as
the ACID properties), although some support is beginning to emerge. Instead of ACID, NoSQL databases
more or less follow something called "BASE". We will discuss this in more detail later in the article.
ACID is a set of properties that guarantee that database transactions are processed reliably. To know
more about ACID, read "What is a database? A question for both pro and newbie".
The other major difference is that NoSQL databases are generally schema-less; that is, records in
these databases are not required to conform to a pre-defined storage schema.
In a relational database, the schema is the structure of the database described in a formal language
supported by the DBMS. It defines how the database is constructed and divided into database
objects such as tables, fields, relationships, views, indexes, packages, procedures, functions, queues,
triggers and other elements.

In NoSQL databases, schema-free collections are used instead, so that documents of different types
and structures, such as {color: "blue"} and {price: 23.5}, can be stored within a single
collection.
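As a rough illustration of a schema-free collection, the sketch below uses a plain JavaScript array in place of a real database. The names collection and findWithField are hypothetical and do not belong to any NoSQL product.

```javascript
// Hypothetical in-memory stand-in for a schema-free collection.
const collection = [];

// Documents with entirely different fields can sit side by side.
collection.push({ color: "blue" });
collection.push({ price: 23.5 });
collection.push({ color: "red", price: 9.99, tags: ["sale"] });

// A query simply skips documents that lack the requested field.
function findWithField(docs, field) {
  return docs.filter((doc) =>
    Object.prototype.hasOwnProperty.call(doc, field)
  );
}

const priced = findWithField(collection, "price");
console.log(priced);
```

No document had to declare its fields in advance, and none carries empty placeholders for fields it does not use.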
The following table lists the major characteristic features of NoSQL databases [1]:

Feature
Schema-less
Shared nothing architecture
Elasticity
Sharding
Asynchronous replication
BASE instead of ACID

[1] Source: http://dbpedias.com/wiki/NoSQL:Survey_of_Distributed_Databases

Types of NoSQL databases


NoSQL database systems were created by some of the major internet players, such as Google,
Facebook, LinkedIn and others, which faced significantly different challenges in dealing with data than
those addressed by traditional RDBMS solutions. There was a need to provide information out of large
volumes of data that to a greater or lesser degree adhered to similar horizontal structures. These
companies realized that performance and real-time character were more important than consistency,
to which much of the processing time in a traditional RDBMS had been devoted.
As such, NoSQL databases are often highly optimized for retrieve and append operations and often
offer little functionality beyond record storage. The reduced run-time flexibility compared to full SQL
systems is counterbalanced by significant gains in scalability and performance for certain data models.
NoSQL databases demonstrate their strengths above all with regard to the flexible handling of variable
data by document-oriented databases, in the representation of relationships by graph databases and
in the reduction of a database to a container with key-value pairs provided by key-value databases.

Consequently, NoSQL databases are often categorized according to the way they store data and fall
under the following major categories:

Key-value stores

Columnar (or column-oriented) databases

Graph databases

Document databases

Key-value stores
Key-value stores allow the application to store its data as schema-less (key, value) pairs. These data
can be stored in a structure resembling the hash-table datatype of a programming language, so that
each value can be accessed by its key. Such storage might not be very efficient, since it provides only
a single way to access the values, but it eliminates the need for a fixed data model.
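The access pattern described above can be sketched with JavaScript's built-in Map. This is a minimal in-memory illustration only; the class KeyValueStore and the helper findKeysByCity are hypothetical names, not the API of any particular product.

```javascript
// Hypothetical key-value store backed by JavaScript's built-in Map.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
}

const store = new KeyValueStore();
store.put("user:42", { name: "Ada", city: "London" });
store.put("session:9f3", "logged-in");

// Lookup by key is the one direct access path.
const user = store.get("user:42");

// Anything else, such as searching by a field inside the value,
// requires scanning every entry.
function findKeysByCity(kv, city) {
  const keys = [];
  for (const [key, value] of kv.data) {
    if (value && value.city === city) keys.push(key);
  }
  return keys;
}

console.log(user, findKeysByCity(store, "London"));
```

The values are opaque to the store, which is why no fixed data model is needed and also why only key lookups are cheap.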

Columnar databases
A column-oriented DBMS stores its content by column rather than by row. It contains predefined
families of columns and is more accomplished at scaling and updating at relatively high speeds, which
offers advantages for data warehouses and library catalogues where aggregates are computed over
large numbers of similar data items.
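To make the difference between row and column storage concrete, the sketch below converts a few row records into one array per column and computes an aggregate over a single column. The data and the function toColumns are made up for illustration, and the sketch assumes every record has the same fields.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { id: 1, title: "Dune", pages: 412 },
  { id: 2, title: "Emma", pages: 474 },
  { id: 3, title: "Ulysses", pages: 730 },
];

// Column-oriented layout: one array per column.
// Assumes all records share the same set of fields.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      (columns[name] = columns[name] || []).push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregate touches only the one column it needs.
const totalPages = columns.pages.reduce((sum, n) => sum + n, 0);
console.log(totalPages);
```

Because the values of one column sit next to each other, an aggregate over many similar items does not need to read the other columns at all.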

Graph databases
Graph databases optimize the storage of networks, or graphs, of related nodal data as a single
logical unit. A graph database uses graph structures with nodes, edges and properties to represent and
store data and provides index-free adjacency, meaning that every element contains a direct pointer to
its adjacent element and no index lookups are necessary. This can be useful in cases such as finding
degrees of separation, where SQL would require extremely complex queries. A popular movie service,
for example, shows the logged-in user a "Best Guess for You" rating for each film based on how similar
people rated it, while other services such as LinkedIn, Facebook or Netflix show people in a network at
various degrees of separation. Although such queries become simple in graph databases, the
relevance of this technology in a financial enterprise is difficult to determine.
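A degrees-of-separation query can be sketched as a breadth-first search over adjacency lists, where each node holds direct references to its neighbours. The graph data and the function degreesOfSeparation below are hypothetical; a real graph database would expose this through its own query interface.

```javascript
// Each node points directly at its neighbours (adjacency lists).
const graph = new Map([
  ["alice", ["bob", "carol"]],
  ["bob", ["alice", "dave"]],
  ["carol", ["alice"]],
  ["dave", ["bob", "erin"]],
  ["erin", ["dave"]],
  ["frank", []],
]);

// Breadth-first search: returns the number of hops between two
// people, or -1 if they are not connected.
function degreesOfSeparation(g, start, target) {
  if (start === target) return 0;
  const visited = new Set([start]);
  let frontier = [start];
  let depth = 0;
  while (frontier.length > 0) {
    depth += 1;
    const next = [];
    for (const node of frontier) {
      for (const neighbor of g.get(node) || []) {
        if (neighbor === target) return depth;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return -1;
}

console.log(degreesOfSeparation(graph, "alice", "erin"));
```

The traversal follows neighbour references from node to node, whereas the relational equivalent would need one self-join per hop.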

Document databases
Document stores are used for large, unstructured or semistructured records. Data is organized in
documents that can contain any number of fields of any length. All document-oriented database
implementations assume that documents encapsulate and encode data in some sort of standard format,
known as an encoding, and they are well suited to storing MS Office or PDF documents. Document
databases should not be confused with Document Management Systems, however. The documents
referred to are not actual documents as such, although they can be.

Documents inside a document-oriented database are similar in some ways to records or rows in
relational databases, but they are less rigid because they are not required to adhere to a standard
schema. Unlike a relational database, where each record has the same set of fields and unused fields
might be kept empty, a document record has no empty fields. New information can therefore be added
to or removed from any record without wasting space by creating empty fields on all other records.

In contrast to key-value and columnar databases, which view each record as a list of attributes that
are updated one at a time, document stores allow insertion, updates and queries of entire records
using a JavaScript Object Notation (JSON) format. The concept of a join is less relevant in document
databases than in traditional RDBMS systems. As a result, records that might be joined in a traditional
RDBMS are generally denormalized into wide records. Denormalization refers to a process by which the
read performance of a database is optimized by the addition of redundant or grouped data. Some
NoSQL vendors, most notably MongoDB, do in fact offer add-on join capabilities as well.

Many of these database categories are beginning to blur, however. As all of them support the
association of values with keys, they are all fundamentally key-value stores; document databases,
moreover, can do everything that columnar databases can from a semantic point of view. As a result,
the distinguishing factors must be evaluated in terms of performance and ease of use for a particular
solution.
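The denormalization described above can be sketched in plain JavaScript: two normalized record sets are merged once, at write time, into wide documents that need no join at read time. The sample data and the function denormalize are hypothetical.

```javascript
// Normalized, relational-style data: orders refer to customers by id.
const customers = [
  { id: 1, name: "Acme Corp" },
  { id: 2, name: "Globex" },
];
const orders = [
  { orderId: "A-1", customerId: 1, total: 250 },
  { orderId: "A-2", customerId: 2, total: 75 },
  { orderId: "A-3", customerId: 1, total: 40 },
];

// Build wide documents by copying the customer into each order.
function denormalize(orderRows, customerRows) {
  const byId = new Map(customerRows.map((c) => [c.id, c]));
  return orderRows.map((o) => ({
    orderId: o.orderId,
    total: o.total,
    customer: { ...byId.get(o.customerId) }, // redundant copy
  }));
}

const wideOrders = denormalize(orders, customers);
console.log(JSON.stringify(wideOrders[0]));
```

Reads become a single lookup, at the cost of storing the customer data redundantly and having to update every copy if it changes.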

Popular incarnations of NoSQL databases


Most implemented solutions cannot be strictly assigned to a specific type and contain features from
two or more categories. We should also recognize that each NoSQL implementation has its own special
nuances. Popular offerings include the following:

Apache Cassandra
Apache Cassandra is an open-source, distributed database-management system designed to handle
very large amounts of data spread out across many commodity servers while providing a high degree
of service availability with no single point of failure. It is particularly fast at write operations as opposed
to reads and might therefore lend itself best to applications that require analysis of large sets of data
with write-backs.

HBase
HBase is also an open-source, distributed database, modeled after Google's BigTable. It runs on top
of the Hadoop Distributed File System (HDFS) and generally works closely with the Hadoop
data-processing framework to accomplish highly scalable analyses. HBase scales linearly with the
number of nodes and can quickly return queries on tables consisting of billions of rows and millions
of columns.

BigTable

BigTable can be defined as a sparse, distributed, multi-dimensional sorted map. BigTable is designed to
scale into the petabyte range - a petabyte is equivalent to 1 million gigabytes - across hundreds or
thousands of machines and to make it easy to add more machines to the system and start taking
advantage of those resources automatically without any reconfiguration.
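The terms "sparse" and "sorted map" can be loosely illustrated with a small in-memory table. This sketch is an assumption-laden simplification: the class SparseSortedTable is hypothetical, it runs on one machine, and it ignores distribution and any further dimensions of the real system.

```javascript
// Hypothetical sketch: a sparse table whose row keys can be
// scanned in sorted order.
class SparseSortedTable {
  constructor() {
    this.rows = new Map(); // row key -> Map(column name -> value)
  }

  // Only cells that are actually written take up space (sparse).
  put(rowKey, column, value) {
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    this.rows.get(rowKey).set(column, value);
  }

  get(rowKey, column) {
    const row = this.rows.get(rowKey);
    return row ? row.get(column) : undefined;
  }

  // Row keys in the half-open range [startKey, endKey), sorted.
  scan(startKey, endKey) {
    return [...this.rows.keys()]
      .filter((key) => key >= startKey && key < endKey)
      .sort();
  }
}

const table = new SparseSortedTable();
table.put("com.example/index", "contents", "index page");
table.put("com.example/about", "anchor:home", "About us");
table.put("org.sample/home", "contents", "home page");

const exampleRows = table.scan("com.example", "com.example0");
console.log(exampleRows);
```

Keeping keys sorted is what makes range scans over related rows possible; a production system would maintain the order on disk rather than sorting on every scan.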

Coherence and Ehcache


Coherence and Ehcache are equipped with in-memory caches. Coherence is in heavy use in financial
industries where network latency - defined as the time it takes to cross a network connection from
sender to receiver - is a factor.

Possible applications of NoSQL databases


NoSQL databases should generally be considered as potential options when any high-intensity
computation or analysis of large data sets is required, especially when performing real-time analysis.
This makes them relevant to many industry sectors, for example the electronic-trading applications of
financial institutions. Relational databases, especially the columnar variety, do not generally perform
well on updates. As a result, a NoSQL database might present itself as a viable alternative in cases
where massive updates are required. In situations involving variable-record templates or sparse data,
NoSQL document databases can offer a welcome alternative.

Let's do some coding in NoSQL


All of the popular NoSQL databases have drivers available in Java and most other popular
programming languages. In addition, each provides an interactive shell where commands can be
executed directly against the database using JSON or the native interface without using any
intermediate programming language. Here are a few sample queries.
Stock orders
In this example, we want to capture stock orders consisting of one or more buy or sell orders. We will
maintain this in the current database (referenced in the shell as db), in a Mongo collection called
orders. A Mongo collection corresponds roughly to a table in SQL. First, specify the record and assign
it to a variable named t, t2, and so on:
t = {
    order_date: new Date(),
    // one entry per buy or sell order
    orders: [
        {
            buy: {
                symbol: "IBM",
                price: 195.20
            },
            quantity: 1000
        },
        {
            sell: {
                symbol: "MSFT",
                price: 31.25
            },
            quantity: 5000
        }
    ]
};
Save the record to the desired collection; if it does not already exist, Mongo will create the database
and the collection.
db.orders.save(t);
Subsequently list all orders to the console. An unqualified find() operation will find and return all
records in the collection.
db.orders.find();
Notice how the records are denormalized, which is apparent because each order record contains
pricing information as well. This is in contrast to the relational strategy, where the pricing information
would be in a separate table. This does not, however, imply that joins are entirely forbidden in NoSQL.
MongoDB, for example, supports the concept of a DBRef, a reference from one record to a record in
another collection, which serves as a kind of join. To use it in this example, a separate collection
containing product-pricing information could be created and referenced from the order records.
// all pricing records go into the same products collection
p1 = {
    _id: "IBM",
    latest_price: 195.20
};
db.products.save(p1);
p2 = {_id: "MSFT", latest_price: 31.25};
db.products.save(p2);
p3 = {
    _id: "CSCO",
    latest_price: 21.00
};
db.products.save(p3);
p4 = {
    _id: "VMW",
    latest_price: 100.20
};
db.products.save(p4);
It is now possible to identify all products with a price less than or equal to USD 100:
db.products.find({latest_price: {$lte: 100}});
Finally, an order record can be created which joins the products and pricing information:
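t3 = {
    order_date: new Date(),
    buy: {
        // reference to the IBM record in the products collection
        product: new DBRef("products", p1._id),
        quantity: 1000
    },
    sell: {
        // reference to the MSFT record in the products collection
        product: new DBRef("products", p2._id),
        quantity: 5000
    }
};
db.orders.save(t3);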
If the pricing information subsequently changes in the products collection, a query that resolves
these references will reflect the updated prices for all order records that point to those products.
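The effect of storing a reference instead of a copy can be simulated without a running database. In the sketch below, $ref and $id are the field names MongoDB uses inside a DBRef; everything else (the in-memory store object and the resolveRef helper) is a hypothetical stand-in for what a driver or application would do.

```javascript
// Hypothetical in-memory stand-in for a database with one collection.
const store = {
  products: new Map([
    ["IBM", { _id: "IBM", latest_price: 195.2 }],
    ["MSFT", { _id: "MSFT", latest_price: 31.25 }],
  ]),
};

// The order holds a reference (collection name plus id), not a price.
const order = {
  buy: { product: { $ref: "products", $id: "IBM" }, quantity: 1000 },
};

// Resolving a reference is a second lookup made by the client.
function resolveRef(database, ref) {
  const coll = database[ref.$ref];
  return coll ? coll.get(ref.$id) : undefined;
}

const priceBefore = resolveRef(store, order.buy.product).latest_price;

// Update the price in one place only.
store.products.get("IBM").latest_price = 200;

const priceAfter = resolveRef(store, order.buy.product).latest_price;
console.log(priceBefore, priceAfter);
```

The order record itself never changes; only the referenced product does, which is the trade-off against the denormalized form shown earlier.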
Blog postings
In contrast to the previous example, columnar databases do not support the concept of a join at all.
Apache Cassandra is worth a closer examination in this context. Cassandra retains its data in a
key-value store; keys map to multiple values, which are grouped into column families. Both key-value
stores and column families are roughly equivalent to an RDBMS table. This example shows the
capture of blog postings in a key-value store named BlogPosts. While there is no mandatory schema
in NoSQL, in this example the records adhere to one of the following possible configurations:
First type:
{
    post: {
        title: "an interesting blog post",
        author: "Joe Blogger",
        body: "interesting content"
    },
    multimedia: {
        header: "header.png",
        body: "body.mpeg"
    }
}
Second type:
{
    post: {
        title: "yet another interesting blog post",
        author: "John Bloghead",
        body: "more interesting content"
    },
    multimedia: {
        "body-image": "body_image.png",
        "body-video": "body_video.mpeg"
    }
}
First, switch to the BlogPosts key-value store, creating it if necessary:
use BlogPosts;
Next, create the first posting:
set post['post1']['title'] = 'an interesting blog post';
set post['post1']['author'] = 'Joe Blogger';
set post['post1']['body'] = 'interesting content';
Note how each column is set independently:
set multimedia['post1']['header'] = 'header.png';
set multimedia['post1']['body'] = 'body.mpeg';
set post['post2']['title'] = 'yet another interesting blog post';
set post['post2']['author'] = 'John Bloghead';
set post['post2']['body'] = 'more interesting content';
set multimedia['post2']['body-image'] = 'body_image.png';
set multimedia['post2']['body-video'] = 'body_video.mpeg';
The entire post1 record can now be queried:
get post['post1'];
The body-video column associated with post2 can also be retrieved:
get multimedia['post2']['body-video'];
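The set and get commands above address a value by column family, row key and column name. That three-level addressing can be modelled with nested maps, as in the following sketch. The Keyspace class is a hypothetical in-memory model, not Cassandra's client API.

```javascript
// Hypothetical model: column family -> row key -> column -> value.
class Keyspace {
  constructor() {
    this.families = new Map();
  }

  // Each column is written independently of the others.
  set(family, rowKey, column, value) {
    if (!this.families.has(family)) this.families.set(family, new Map());
    const rows = this.families.get(family);
    if (!rows.has(rowKey)) rows.set(rowKey, new Map());
    rows.get(rowKey).set(column, value);
  }

  // With a column name, return one value; without, the whole row.
  get(family, rowKey, column) {
    const rows = this.families.get(family);
    const row = rows ? rows.get(rowKey) : undefined;
    if (row === undefined) return undefined;
    return column === undefined ? Object.fromEntries(row) : row.get(column);
  }
}

const blogPosts = new Keyspace();
blogPosts.set("post", "post1", "title", "an interesting blog post");
blogPosts.set("post", "post1", "author", "Joe Blogger");
blogPosts.set("multimedia", "post2", "body-video", "body_video.mpeg");

const post1 = blogPosts.get("post", "post1");
const video = blogPosts.get("multimedia", "post2", "body-video");
console.log(post1, video);
```

Rows in the same column family are free to hold different columns, which is how the two record types in this example can coexist.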

NoSQL versus relational columnar databases: Is NoSQL right for you?

Relational columnar databases such as Sybase IQ continue to use a relational model and are accessed
via traditional SQL. Their physical storage structure is very different from that of non-relational
NoSQL columnar stores, which store data as rows whose structure may vary and which are organized by
the developer into families of columns according to the application use case.
Relational columnar databases, on the other hand, require a fixed schema with each column physically
distinct from the others, which makes it impossible to declaratively optimize retrievals by organizing
logical units or families. Because a NoSQL database retrieval can specify one or more column families
while ignoring others, NoSQL databases can offer a significant advantage when performing individual
row queries. NoSQL databases cannot meet the performance characteristics of relational columnar
databases when it comes to retrieving aggregated results from groups of underlying records, however.
This distinction is a litmus test when deciding between NoSQL and traditional SQL databases. NoSQL
databases are not as flexible but are exceptional at speedily returning individual rows from a query.
Traditional SQL databases, on the other hand, forfeit some storage capacity and scalability but provide
extra flexibility with a standard, more familiar SQL interface.
Since relational databases must adhere to a schema, they typically need to reserve space even for
unused columns. NoSQL databases have a dense per-row schema and so tend to be better at
optimizing the storage of sparse data, although the relational databases often use sophisticated
storage-optimization techniques to mitigate this perceived shortcoming.
Most importantly, relational columnar databases are generally intended for the read-only access found
in conjunction with data warehouses, which provide data that was loaded collectively from
conventional data stores. This can be contrasted with NoSQL columnar tables, which can handle a
much higher rate of updates.

The CAP Theorem


Despite the high demand in recent years for massively distributed databases with high partition
fault-tolerance, the CAP theorem stipulates that it is impossible for a distributed system to provide
consistency, availability and partition fault-tolerance guarantees simultaneously; a distributed system
can satisfy at most two of these guarantees at the same time, but not all three. These guarantees
can be understood as follows:
Consistency: Concurrently executing queries see the same valid and consistent data at the same
time.
Availability: This is a guarantee that every request receives a response about whether it succeeded or
failed.
Partition-tolerance: Also known as fault-tolerance, this is a guarantee that the system continues to
operate despite arbitrary message loss.
Because no distributed system is capable of satisfying all three guarantees at the same time, a
tradeoff must be made. While traditional databases make that decision for us, NoSQL databases
provide these guarantees as tuning options. Database vendors must always decide which two to
prioritize. The options are as follows:
Availability is compromised in favor of consistency and partition-tolerance.
Partition-tolerance is forfeited in favor of consistency and availability.
Consistency is compromised but systems are always available and can work when parts are
partitioned.
Traditional SQL databases place a high priority on consistency and fault-tolerance and, as a result,
have generally chosen the first option above, forfeiting high availability. NoSQL databases
frequently leave that decision to the application operations team and provide configuration options so
that the preferred trade-off can be chosen based on the application use case.
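The trade-off can be made tangible with a toy store of two replicas and a configurable mode. This is only a sketch under simplifying assumptions: the class TwoReplicaStore and its "CP" and "AP" mode names are hypothetical, and a real system has far more nuance than a single flag.

```javascript
// Hypothetical two-replica store with a configurable trade-off.
class TwoReplicaStore {
  constructor(mode) {
    this.mode = mode; // "CP" favors consistency, "AP" availability
    this.replicas = [new Map(), new Map()];
    this.partitioned = false; // true when replicas cannot talk
  }

  write(replicaIndex, key, value) {
    if (!this.partitioned) {
      // Healthy network: update both replicas together.
      for (const replica of this.replicas) replica.set(key, value);
      return true;
    }
    // Partitioned: refuse the write (CP) or accept it locally (AP).
    if (this.mode === "CP") return false;
    this.replicas[replicaIndex].set(key, value);
    return true;
  }

  read(replicaIndex, key) {
    return this.replicas[replicaIndex].get(key);
  }
}

const cp = new TwoReplicaStore("CP");
const ap = new TwoReplicaStore("AP");
for (const s of [cp, ap]) {
  s.write(0, "price", 100);
  s.partitioned = true; // simulate a network partition
}

const cpAccepted = cp.write(0, "price", 101);
const apAccepted = ap.write(0, "price", 101);
console.log(cpAccepted, apAccepted, ap.read(0, "price"), ap.read(1, "price"));
```

In "CP" mode the replicas stay identical but the write is rejected; in "AP" mode the write succeeds but the two replicas now disagree until the partition heals.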

Concepts of BASE - Basically Available Soft-state Eventually consistent

Sometimes, however, perfect consistency is not a requirement and eventual consistency will suffice.
Consequently, many NoSQL databases use eventual consistency to provide both availability and
partition-tolerance guarantees with a maximum level of data consistency. In contrast to immediate
consistency, which guarantees that updates are immediately visible to all when an update operation
returns to the user with a successful result, eventual consistency means that given a sufficiently long
period of time over which no changes are sent, all updates can be expected to propagate eventually
through the system and all the replicas will be consistent.
In database terminology, such a system is known as Basically Available, Soft-state, Eventually
consistent (BASE), as opposed to the database concept of ACID. No doubt the juxtaposition of the
terms ACID and BASE was more than a mere coincidence.
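Eventual consistency can be simulated with a handful of replicas and a queue of updates that have not been delivered yet. The class EventuallyConsistentStore below is a hypothetical sketch that assumes updates are delivered in the order they were written.

```javascript
// Hypothetical sketch of replicas that converge over time.
class EventuallyConsistentStore {
  constructor(replicaCount) {
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.pending = []; // undelivered updates: { target, key, value }
  }

  // The write is acknowledged after reaching the first replica only.
  write(key, value) {
    this.replicas[0].set(key, value);
    for (let i = 1; i < this.replicas.length; i++) {
      this.pending.push({ target: i, key, value });
    }
  }

  read(replicaIndex, key) {
    return this.replicas[replicaIndex].get(key);
  }

  // Deliver queued updates in order until none remain.
  propagateAll() {
    while (this.pending.length > 0) {
      const update = this.pending.shift();
      this.replicas[update.target].set(update.key, update.value);
    }
  }
}

const ecStore = new EventuallyConsistentStore(3);
ecStore.write("x", 1);
const staleRead = ecStore.read(2, "x"); // replica 2 not updated yet

ecStore.propagateAll(); // no new writes arrive; updates drain
const convergedRead = ecStore.read(2, "x");
console.log(staleRead, convergedRead);
```

Between the write and the propagation a reader may see old data; once no new changes arrive and the queue drains, every replica returns the same value.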
Apache CouchDB, for example, uses a versioning system similar to software version control systems
such as Subversion (SVN). An update to a record does not overwrite the old value, but rather creates a
new version of that record. If two clients are operating on the same record and client A updates the
record before client B, then client B will be notified that the version being modified is out of date and
will have the option to requery the revised record and make the change there in a manner similar to an
update and merge operation in SVN.
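The update-and-conflict cycle described above can be sketched with a store that keeps every version and rejects writes based on a stale revision. This is a simplification with hypothetical names (VersionedStore, integer revision numbers); CouchDB's own revision identifiers and HTTP API look different.

```javascript
// Hypothetical versioned store: updates add versions, never overwrite.
class VersionedStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, versions: [...] }
  }

  get(id) {
    const doc = this.docs.get(id);
    if (!doc) return undefined;
    return { rev: doc.rev, value: doc.versions[doc.rev - 1] };
  }

  // baseRev is the revision the client last read (0 for a new record).
  put(id, value, baseRev = 0) {
    const doc = this.docs.get(id) || { rev: 0, versions: [] };
    if (baseRev !== doc.rev) {
      return { ok: false, currentRev: doc.rev }; // stale: conflict
    }
    doc.versions.push(value);
    doc.rev += 1;
    this.docs.set(id, doc);
    return { ok: true, rev: doc.rev };
  }
}

const versioned = new VersionedStore();
versioned.put("order-1", { status: "new" });

// Clients A and B both read revision 1.
const seenByA = versioned.get("order-1");
const seenByB = versioned.get("order-1");

const resultA = versioned.put("order-1", { status: "paid" }, seenByA.rev);
const resultB = versioned.put("order-1", { status: "void" }, seenByB.rev);

// B re-reads the current revision and retries on top of it.
const retryB = versioned.put(
  "order-1",
  { status: "void" },
  versioned.get("order-1").rev
);
console.log(resultA, resultB, retryB);
```

Client B's first attempt fails because it was based on an out-of-date revision, mirroring the notification and requery step described in the text.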

In order to use NoSQL databases at the present time, an understanding of the API language is required
and queries must be written in that language. This is, however, greatly facilitated by the fact that Java
is supported in every case. Work has also been done recently to create a unified NoSQL language
called Unstructured Query Language (UNQL), which is semantically a superset of SQL Data
Manipulation Language (DML). There is also an Apache incubator project called Thrift which involves an
interface-definition language particularly well-suited to NoSQL use cases. Thrift is reminiscent of
CORBA IDL and provides a means by which language-specific interfaces can be generated for most
popular languages. Originally developed at Facebook, it has been shared as an open-source project
since 2007.
