What Is NoSQL: Features of NoSQL Databases
NoSQL is not the name of any particular database; instead, it refers to a broad class of non-relational
databases that differ from classical relational database management systems (RDBMS) in some
significant aspects, most notably because they do not use SQL as their primary query language and
instead provide access by means of application programming interfaces (APIs).
Instead of tables with a fixed schema, NoSQL databases use schema-free collections, so that different types and document
structures such as {"color": "blue"} and {"price": 23.5} can be stored within a single
collection.
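The idea of a schema-free collection can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration, not how any particular NoSQL product stores data; the `collection` array and the `findByField` helper are hypothetical names introduced for the example.

```javascript
// Hypothetical in-memory stand-in for a schema-free collection.
const collection = [];

// Documents with different structures are stored side by side.
collection.push({ color: "blue" });
collection.push({ price: 23.5 });
collection.push({ color: "red", price: 9.99 });

// Return the documents that have `field` and whose value passes `predicate`.
// Documents lacking the field are skipped rather than treated as errors.
function findByField(docs, field, predicate) {
  return docs.filter((doc) => field in doc && predicate(doc[field]));
}

const cheap = findByField(collection, "price", (p) => p < 20);
console.log(cheap.length); // number of documents priced under 20
```

Because no schema is enforced, the query code has to tolerate missing fields itself, which is the price paid for the flexibility.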
The following table lists the major characteristic features of NoSQL databases:
Feature
Schema-less
Elasticity
Sharding
Asynchronous replication
Source: http://dbpedias.com/wiki/NoSQL:Survey_of_Distributed_Databases
NoSQL databases are often categorized according to the way they store data and fall
under the following major categories:
Key-value stores
Columnar databases
Graph databases
Document databases
Key-value stores
Key-value stores allow the application to store its data as schema-less (key, value) pairs. The data
can be stored in a structure similar to the hash-table datatype of a programming language, so that each value can be
accessed by its key. Such storage might not be very efficient for every workload, since it provides only a
single way to access the values, but it eliminates the need for a fixed data model.
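A minimal sketch of this access pattern, using the built-in JavaScript `Map` as the hash table, might look as follows. The `KeyValueStore` class, its method names, and the sample keys are hypothetical and chosen only for illustration.

```javascript
// Hypothetical minimal key-value store backed by the built-in Map.
class KeyValueStore {
  constructor() {
    this.entries = new Map();
  }

  // Store any value under a key; no schema is imposed on the value.
  put(key, value) {
    this.entries.set(key, value);
  }

  // The key is the only direct access path to a value.
  get(key) {
    return this.entries.get(key);
  }

  remove(key) {
    return this.entries.delete(key);
  }

  // Anything other than lookup by key requires scanning every entry.
  keysWhere(predicate) {
    const keys = [];
    for (const [key, value] of this.entries) {
      if (predicate(value)) keys.push(key);
    }
    return keys;
  }
}

const store = new KeyValueStore();
store.put("user:42", { name: "Ada", visits: 3 });
store.put("user:43", { name: "Grace", visits: 7 });
store.put("session:abc", "opaque-token");

console.log(store.get("user:42").name);
console.log(store.keysWhere((v) => typeof v === "object" && v.visits > 5));
```

The `keysWhere` scan illustrates the limitation mentioned above: with a single access path, any query by value has to touch every entry.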
Columnar databases
A column-oriented DBMS stores its content by column rather than by row. It contains predefined
families of columns and handles scaling and updates at relatively high speeds, which
offers advantages for data warehouses and library catalogues, where aggregates are computed over
large numbers of similar data items.
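The difference between the two layouts can be illustrated with a small sketch. It assumes, for simplicity, that every record has the same fields; the `rows` data and the `toColumns` helper are hypothetical and do not reflect the storage format of any specific product.

```javascript
// Row-oriented layout: one object per record.
const rows = [
  { symbol: "IBM", price: 195.2, quantity: 1000 },
  { symbol: "MSFT", price: 31.25, quantity: 5000 },
  { symbol: "CSCO", price: 21.0, quantity: 200 },
];

// Convert to a column-oriented layout: one array per field.
// Assumes all records share the same set of fields.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      if (!(name in columns)) columns[name] = [];
      columns[name].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregate over one attribute reads a single contiguous array
// and never touches the other attributes.
const totalQuantity = columns.quantity.reduce((sum, q) => sum + q, 0);
console.log(totalQuantity);
```

With the columnar layout, the aggregate reads only the `quantity` array, which is the property that makes this design attractive for warehouse-style workloads.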
Graph databases
Graph databases optimize the storage of networks, or graphs, of related nodal data as a single
logical unit. A graph database uses graph structures with nodes, edges and properties to represent and
store data and provides index-free adjacency, meaning that every element contains a direct pointer to
its adjacent elements and no index lookups are necessary. This can be useful for finding degrees
of separation, where SQL would require extremely complex queries. A popular movie service, for
example, shows the logged-in user a "Best Guess for You" rating for each film based on how similar
people rated it, while other services such as LinkedIn, Facebook or Netflix show people in a network at
various degrees of separation. Although such queries become simple in graph databases, the
relevance of this technology in a financial enterprise is difficult to determine.
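A degrees-of-separation query can be sketched as a breadth-first search over an adjacency list, where each node holds direct references to its neighbors. The sample `graph` and the `degreesOfSeparation` function are hypothetical; a real graph database would expose this through its own query interface.

```javascript
// Hypothetical social graph: each node lists its adjacent nodes directly,
// so following an edge needs no separate index lookup.
const graph = new Map([
  ["alice", ["bob", "carol"]],
  ["bob", ["alice", "dave"]],
  ["carol", ["alice"]],
  ["dave", ["bob", "erin"]],
  ["erin", ["dave"]],
  ["frank", []],
]);

// Breadth-first search: returns the number of hops between two nodes,
// or -1 when they are not connected.
function degreesOfSeparation(g, start, target) {
  if (start === target) return 0;
  const visited = new Set([start]);
  let frontier = [start];
  let depth = 0;
  while (frontier.length > 0) {
    depth += 1;
    const next = [];
    for (const node of frontier) {
      for (const neighbor of g.get(node) || []) {
        if (neighbor === target) return depth;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return -1;
}

console.log(degreesOfSeparation(graph, "alice", "erin"));  // 3
console.log(degreesOfSeparation(graph, "alice", "frank")); // -1
```

Expressing the same traversal in SQL typically requires a self-join per hop or a recursive query, which is the complexity the paragraph above refers to.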
Document databases
Document stores are used for large, unstructured or semi-structured records. Data is organized in
documents that can contain any number of fields of any length. All document-oriented database
implementations assume that documents encapsulate and encode data in some standard format,
known as an encoding, and are well suited to content such as MS Office or PDF documents.
Document databases should not be confused with document management systems, however. The documents
referred to are not necessarily actual documents as such, although they can be. Documents inside a
document-oriented database are similar in some ways to records or rows in relational databases, but
they are less rigid because they are not required to adhere to a standard schema. Unlike a relational
database, where each record has the same set of fields and unused fields might be kept empty, there
are no empty fields in document records. This allows information to be added to or removed from any
record without wasting space by creating empty fields on all other records.
In contrast to key-value and columnar databases, which view each record as a list of attributes that
are updated one at a time, document stores allow insertion, updates and queries of entire records
using a JavaScript Object Notation (JSON) format. The concept of a join is less relevant in document
databases than in a traditional RDBMS. As a result, records that might be joined in a traditional
RDBMS are generally denormalized into wide records. Denormalization refers to a process by which the
read performance of a database is optimized by the addition of redundant or grouped data. Some NoSQL
vendors, most notably MongoDB, do in fact offer add-on join capabilities as well.
Many of these database categories are beginning to blur, however. As all of them support the
association of values with keys, they are all fundamentally key-value stores; document databases,
moreover, can provide all of the capabilities of columnar databases from a semantic point of view. As
a result, the distinguishing factors must be evaluated in terms of performance and ease of use for a
particular solution.
Apache Cassandra
Apache Cassandra is an open-source, distributed database-management system designed to handle
very large amounts of data spread out across many commodity servers while providing a high degree
of service availability with no single point of failure. It is particularly fast at write operations as opposed
to reads and might therefore lend itself best to applications that require analysis of large sets of data
with write-backs.
HBase
HBase is also an open-source, distributed database, modeled after Google's BigTable. HBase
technologies are not strictly a data-store, but generally work closely with a NoSQL database to
accomplish highly scalable analyses. HBase scales linearly with the number of nodes and can quickly
return queries on tables consisting of billions of rows and millions of columns.
BigTable
BigTable can be defined as a sparse, distributed, multi-dimensional sorted map. BigTable is designed to
scale into the petabyte range (a petabyte is equivalent to 1 million gigabytes) across hundreds or
thousands of machines, and to make it easy to add more machines to the system and start taking
advantage of those resources automatically without any reconfiguration.
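The phrase "sparse, multi-dimensional sorted map" can be made concrete with a small single-machine sketch. It assumes the three dimensions are row key, column name and timestamp, as in Google's published description of BigTable; the `SparseSortedMap` class and its methods are hypothetical, and the distributed aspect is left out entirely.

```javascript
// Hypothetical single-machine sketch of a sparse, multi-dimensional map.
// Each cell is addressed by (row, column, timestamp); absent cells cost nothing.
class SparseSortedMap {
  constructor() {
    this.cells = new Map();
  }

  put(row, column, timestamp, value) {
    const key = JSON.stringify([row, column, timestamp]);
    this.cells.set(key, { row, column, timestamp, value });
  }

  // Return the most recent value stored for (row, column), if any.
  get(row, column) {
    let latest;
    for (const cell of this.cells.values()) {
      if (cell.row !== row || cell.column !== column) continue;
      if (!latest || cell.timestamp > latest.timestamp) latest = cell;
    }
    return latest ? latest.value : undefined;
  }

  // Return the distinct row keys in [startRow, endRow), in sorted order.
  scanRows(startRow, endRow) {
    const rows = new Set();
    for (const cell of this.cells.values()) {
      if (cell.row >= startRow && cell.row < endRow) rows.add(cell.row);
    }
    return [...rows].sort();
  }
}

const table = new SparseSortedMap();
table.put("com.example/b", "contents", 1, "old page");
table.put("com.example/b", "contents", 2, "new page");
table.put("com.example/a", "language", 1, "en");
table.put("org.sample/x", "contents", 1, "other");

console.log(table.get("com.example/b", "contents"));
console.log(table.scanRows("com.example/", "com.example0"));
```

A production system would keep the keys physically sorted and split row ranges across machines; the linear scans here only keep the sketch short.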
            shares: 1000
        },
        {
            sell: {
                symbol: "MSFT",
                price: 31.25
            },
            quantity: 5000
        },
    ]
};
Save the record to the desired collection; if they do not already exist, MongoDB will create the database
and the collection.
db.orders.save(t);
Subsequently list all orders to the console. An unqualified find() operation will find and return
all documents in the collection.
db.orders.find();
Notice how the records are denormalized, which is apparent because each order record contains
pricing information as well. This is in contrast to the relational strategy, where the pricing information
would be in a separate table. This does not, however, imply that joins are entirely forbidden in NoSQL.
MongoDB, for example, supports the concept of a DBRef, which is kind of a join operation. To use it in
this example, a separate collection containing product-pricing information could be created and joined
to the order records.
// All four records go into the products collection, which the DBRefs point to.
p1 = {
    _id: "IBM",
    latest_price: 195.20
};
db.products.save(p1);
p2 = {_id: "MSFT", latest_price: 31.25};
db.products.save(p2);
p3 = {
    _id: "CSCO",
    latest_price: 21.00
};
db.products.save(p3);
p4 = {
    _id: "VMW",
    latest_price: 100.20
};
db.products.save(p4);
It is now possible to identify all products with a price less than or equal to USD 100:
db.products.find({latest_price: {$lte: 100}});
Finally, an order record can be created which joins the products and pricing information:
t3 = {
    order_date: new Date(),
    buy: {
        product: new DBRef("products", p1._id),
        quantity: 1000
    },
    sell: {
        product: new DBRef("products", p2._id),
        quantity: 5000
    }
};
db.orders.save(t3);
If the pricing information should subsequently change in the products collection, a query that resolves
the references will reflect the updated prices for all records joined to those products.
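How a reference picks up a changed price can be sketched without a running database. The example below models a DBRef as a plain object with `$ref` (collection name) and `$id` (document key) fields, which is the shape MongoDB uses for DBRefs; the in-memory `db` object and the `resolveRef` and `buyValue` helpers are hypothetical stand-ins for the lookup that an application or driver would perform.

```javascript
// Hypothetical in-memory database: one Map per collection, keyed by _id.
const db = {
  products: new Map([
    ["IBM", { _id: "IBM", latest_price: 195.2 }],
    ["MSFT", { _id: "MSFT", latest_price: 31.25 }],
  ]),
};

// The order stores references instead of copies of the pricing data.
const order = {
  buy: { product: { $ref: "products", $id: "IBM" }, quantity: 1000 },
  sell: { product: { $ref: "products", $id: "MSFT" }, quantity: 5000 },
};

// Follow a reference to the document it points at.
function resolveRef(database, ref) {
  const target = database[ref.$ref];
  return target ? target.get(ref.$id) : undefined;
}

// Value of the buy leg at the price currently stored in products.
function buyValue(database, o) {
  return resolveRef(database, o.buy.product).latest_price * o.buy.quantity;
}

const before = buyValue(db, order);
db.products.get("IBM").latest_price = 200; // price changes in one place
const after = buyValue(db, order);

console.log(before, after);
```

The order itself is never modified; only the referenced product changes, which is the trade-off against the denormalized form, where every order would carry its own copy of the price.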
Blog postings
In contrast to the previous example, columnar databases do not support the concept of
a join at all. Apache Cassandra is worth a closer examination in this context. Cassandra retains its data
in a key-value store; keys map to multiple values, which are grouped into column families. Both
key-value stores and column families are roughly equivalent to an RDBMS table. This example shows the
capture of blog postings in a key-value store named BlogPosts. While there is no mandatory schema
in NoSQL, in this example the records adhere to the following possible configurations:
First type:
{
post: {
get post['post1'];
The body-video record associated with post2 can also be retrieved:
get multimedia['post2']['body-video'];
Availability: This is a guarantee that every request receives a response about whether it succeeded or
failed.
Partition-tolerance: Also known as fault-tolerance, this is a guarantee that the system continues to
operate despite arbitrary message loss.
Because no distributed system is capable of satisfying all three guarantees at the same time, a
tradeoff must be made. While traditional databases make that decision for us, NoSQL databases
provide these guarantees as tuning options. Database vendors must always decide which two to
prioritize. The options are as follows:
Availability is compromised in favor of consistency and partition-tolerance.
Partition-tolerance is forfeited in favor of consistency and availability.
Consistency is compromised but systems are always available and can work when parts are
partitioned.
Traditional SQL databases place a high priority on consistency and fault-tolerance and, as a result,
have generally chosen the first option above, forfeiting high availability. NoSQL databases
frequently leave that decision to the application operations team and provide configuration options so
that the preferred trade-off can be chosen based on the application use case.
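One common form such configuration takes, offered here as an assumption rather than something the text above specifies, is a quorum setting: with N replicas, a write waits for W acknowledgements and a read consults R replicas. When R + W > N, every read overlaps at least one replica holding the latest write. The `describeQuorum` function below is a hypothetical sketch of that arithmetic.

```javascript
// Hypothetical helper: summarize the trade-off implied by a quorum setting.
// n = number of replicas, w = write acknowledgements, r = replicas read.
function describeQuorum(n, w, r) {
  if (w < 1 || r < 1 || w > n || r > n) {
    throw new RangeError("w and r must be between 1 and n");
  }
  return {
    // Read and write sets must overlap for reads to see the latest write.
    readsSeeLatestWrite: r + w > n,
    // Replicas that may be unreachable while the operation still succeeds.
    writeFailuresTolerated: n - w,
    readFailuresTolerated: n - r,
  };
}

// Favoring consistency: majority reads and writes on three replicas.
console.log(describeQuorum(3, 2, 2));

// Favoring availability: any single replica can serve reads and writes.
console.log(describeQuorum(3, 1, 1));
```

Raising W and R strengthens consistency but lets fewer replicas be unreachable, while lowering them does the opposite, which is the kind of per-application choice described above.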
In order to use NoSQL databases at the present time, an understanding of the API language is required
and queries must be written in that language. This is, however, greatly facilitated by the fact that Java
is supported in every case. Work has also been done recently to create a unified NoSQL language
called Unstructured Query Language (UnQL), which is semantically a superset of SQL Data
Manipulation Language (DML). There is also an Apache project called Thrift, which provides an
interface-definition language particularly well suited to NoSQL use cases. Thrift is reminiscent of
CORBA IDL and provides a means by which language-specific interfaces can be generated for most
popular languages. Originally developed at Facebook, it has been shared as an open-source project
since 2007.