Neo Technology
A number of NOSQL databases have emerged over the past decade, not only to handle the
terabytes and petabytes of data generated by enterprises and consumers, but also to handle the
types of data created. NOSQL databases contain structured, semi-structured and unstructured data,
such as text, audio, video, social network feeds, Web logs and more, that cannot be managed in
traditional databases. However, the first NOSQL databases were designed for high-volume Web
properties, such as Amazon and Google, which needed to store a lot of data. These databases were
not necessarily designed for the high-transaction, business-critical needs of today’s enterprise
applications.
After meeting with hundreds of developers, architects and CIOs at Fortune 500 companies, we
identified a few key requirements that enterprises have for a NOSQL database. Not surprisingly,
many of them are characteristics that traditional enterprise-strength databases have proven for
years.
Today’s NOSQL databases include Key-Value stores, Column Family databases, Document
databases, and Graph databases. Each type stores data in a different way and is designed with a
specific purpose in mind. The type of NOSQL database you choose depends on what type of
data you need to store and how you want to access it. A graph database, for instance, models
real-world connections better than other NOSQL databases. It can support today’s complex and
connected data types, and scale to billions of nodes and relationships. It is ideally suited for any
application where knowledge is derived from relationships.
An enterprise delivering modern applications needs a database that can manage today’s
complex and connected data while still delivering the enterprise strength, transactions and
durability IT departments have counted on for years.
The relational database represents one of the most important developments in the history of
computer science. Upon its arrival over 40 years ago, it revolutionized the way the industry
views data management and today it is practically ubiquitous. However, in some cases relational
database technology shows its age.
The relational database model is built to handle structured data – data with a defined and
complete schema – in tables, and it does that very well. The problem is that much of the data
used today is not rigidly structured. Big Data has emerged as a term that refers not only to the
massive amounts of data generated by enterprises and consumers (terabytes and petabytes of
information), but also to the types of data created: structured, semi-structured and unstructured
data such as text, audio, video, social network feeds and Web logs that cannot be managed in
traditional databases.
Today’s data is highly interrelated, and the relationships or connections between data items are
important. Traditional databases were designed to perform set-based operations, that is, to consider
an entire data set when performing a query and then filter it down to reach a conclusion. They
were not optimized for today’s applications, which need to see how individual records relate to
one another. In a relational database, querying at that level is costly and carries a
performance penalty.
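To make the contrast concrete, here is a deliberately simplified Python sketch, assuming a relationship table with no index: the set-based style considers every row of the table and then filters, while a graph-style adjacency list stores each record’s connections with the record and follows them directly. The names `KNOWS`, `neighbors_set_based`, `build_adjacency` and `neighbors_adjacent` are hypothetical. Real relational databases use indexes, so the gap in practice is smaller than this sketch suggests, although each additional hop in a relational query still requires another join.

```python
from collections import defaultdict

# Hypothetical "knows" relationships, stored as rows of a join table: (from_id, to_id).
KNOWS = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "erin")]


def neighbors_set_based(rows, person):
    """Consider every row in the relationship set, then filter (no index assumed)."""
    examined = 0
    result = []
    for src, dst in rows:
        examined += 1
        if src == person:
            result.append(dst)
    return result, examined


def build_adjacency(rows):
    """Store each record's outgoing connections alongside the record itself."""
    adjacency = defaultdict(list)
    for src, dst in rows:
        adjacency[src].append(dst)
    return adjacency


def neighbors_adjacent(adjacency, person):
    """Follow the stored connections; work grows with this node's degree only."""
    found = adjacency.get(person, [])
    return list(found), len(found)


adjacency = build_adjacency(KNOWS)
print(neighbors_set_based(KNOWS, "alice"))     # (['bob', 'carol'], 4)
print(neighbors_adjacent(adjacency, "alice"))  # (['bob', 'carol'], 2)
```

For “alice”, both versions return the same neighbors, but the set-based version examines all four rows while the adjacency version touches only her two connections.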
Over the past few years, some of the best-known Web properties felt they had no option but to
build their own custom NOSQL databases to manage huge volumes of ever-changing data.
Amazon’s Dynamo and Google’s Bigtable are examples of homegrown databases that can store
lots of data, though at times they sacrifice consistency for availability.
Not every company can design its own custom NOSQL database, so a few categories of open
source and commercially available NOSQL databases have emerged. But NOSQL databases built to
manage these large Web properties are not necessarily designed for the majority of enterprise
applications. The enterprise has seen the same explosion in data complexity and volume as the
A NOSQL database for the enterprise should support ACID transactions, including XA-compliant
distributed two-phase commits. The connections between data should be stored on disk, in a
structure designed for high-performance retrieval of connected data sets, all while enforcing
strict transaction management. This design delivers significantly better performance for
connected data than relational database technologies offer.
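The two-phase commit protocol mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the XA interface or Neo4j’s implementation: the `Participant` class and `two_phase_commit` function are hypothetical, and a real coordinator would also write a durable log and handle timeouts and crashes between the two phases.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    """A hypothetical resource manager, e.g. one data store in a distributed transaction."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: the participant votes yes (ready to commit) or no.
        self.state = "prepared" if self.can_commit else "vote_no"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit on all participants only if every one of them votes yes."""
    votes = [p.prepare() for p in participants]  # Phase 1: collect every vote
    if all(votes):
        for p in participants:  # Phase 2: unanimous yes, so commit everywhere
            p.commit()
        return True
    for p in participants:  # Phase 2: at least one no, so roll back everywhere
        p.rollback()
    return False


stores = [Participant("graph_db"), Participant("relational_db")]
print(two_phase_commit(stores), [p.state for p in stores])
# True ['committed', 'committed']
```

The key property is atomicity across stores: a single “no” vote in the first phase causes every participant to roll back, so no store is left holding half of the transaction.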
Column Family Databases Store Large Amounts of Data, But Not Rich Data
A Column Family database can handle semi-structured data because, in theory, every row can
have its own schema. It has few mandatory attributes and few optional attributes. It’s a
powerful way to capture semi-structured data, but it often sacrifices consistency for availability.
Column Family databases can accommodate huge amounts of data, with basic organization to
help sift through the information. Writes are faster than reads, so one natural niche is real-time
data analysis. Logging real-time events is a perfect use case, as is any situation where you need
random, real-time read/write access to your Big Data. Google’s Bigtable was built on a Column Family
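The idea that every row can have its own schema can be sketched with nested Python dictionaries. This is a hypothetical, in-memory illustration of the data model only (row key, then column family, then columns); it is not the API of Bigtable or any other product, and it does not model the storage engine that makes writes cheap.

```python
from collections import defaultdict

# Hypothetical in-memory table: row key -> column family -> {column name: value}.
# Rows in the same column family are free to have different columns.
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Write one cell; a new column needs no schema change."""
    table[row_key][family][column] = value


def get_row(row_key, family):
    """Return only the columns this row actually has in the given family."""
    return dict(table.get(row_key, {}).get(family, {}))


# Real-time event logging, one row per event.
put("2011-06-01T12:00:00#web01", "event", "status", "200")
put("2011-06-01T12:00:00#web01", "event", "path", "/index.html")
put("2011-06-01T12:00:05#web02", "event", "error", "timeout")

print(get_row("2011-06-01T12:00:05#web02", "event"))  # {'error': 'timeout'}
```

The two event rows share a column family but not a set of columns: one has `status` and `path`, the other only `error`.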
Document Databases Store Multiple Data Types, But Lack Transaction Support
A document database contains a collection of key-value pairs stored in documents. While it is
good at storing documents, it was not designed with enterprise-strength transactions and
durability in mind. Document databases are the most flexible of the key-value style stores,
perfect for storing a large collection of unrelated, discrete documents. A good application would
be a product catalog, which can display individual items, but not related items. You can see
what’s available for purchase, but you cannot connect it to the other products that customers
bought after viewing it. MongoDB and CouchDB are examples of document databases.
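The product-catalog example can be sketched with plain Python dictionaries standing in for documents. The catalog contents and the `get_item` and `find` helpers are hypothetical; this is not the MongoDB or CouchDB API. Each document carries its own fields, and fetching one item or filtering on a field is straightforward, but nothing in the documents records which products customers bought after viewing an item.

```python
# Hypothetical product catalog: each document is a self-contained set of key-value pairs,
# and documents do not have to share the same fields.
catalog = {
    "p1": {"name": "Headphones", "price": 59.0, "tags": ["audio"]},
    "p2": {"name": "Amplifier", "price": 199.0, "tags": ["audio", "hifi"], "watts": 50},
    "p3": {"name": "Novel", "price": 12.5, "author": "A. Writer"},
}


def get_item(doc_id):
    """Look up one document by its key, e.g. to display a single product page."""
    return catalog.get(doc_id)


def find(predicate):
    """Return the keys of all documents matching a predicate on their fields."""
    return [doc_id for doc_id, doc in catalog.items() if predicate(doc)]


print(get_item("p3")["author"])                          # A. Writer
print(find(lambda doc: "audio" in doc.get("tags", [])))  # ['p1', 'p2']
```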
A graph database accesses data using traversals. A traversal is how you query a graph:
navigating from starting nodes to related nodes according to an algorithm, finding answers to
questions like “what music do my friends like that I don’t yet own?” or “if this power supply
goes down, what Web services are affected?” Using traversals, you can easily conduct end-to-end
transactions that represent real user actions.
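The power-supply question is a breadth-first traversal: start at the failed component and follow dependency relationships outward until nothing new is reached. The following Python sketch assumes a small, hypothetical in-memory graph (`DEPENDENTS`, `LABELS`); it illustrates the traversal idea, not Neo4j’s traversal API or query language.

```python
from collections import deque

# Hypothetical dependency graph: component -> components that depend on it.
DEPENDENTS = {
    "power_supply_1": ["server_a", "server_b"],
    "server_a": ["auth_service"],
    "server_b": ["search_service"],
    "auth_service": ["checkout_service", "search_service"],
}
# Hypothetical node labels, so the traversal can report only Web services.
LABELS = {
    "power_supply_1": "PowerSupply",
    "server_a": "Server",
    "server_b": "Server",
    "auth_service": "WebService",
    "search_service": "WebService",
    "checkout_service": "WebService",
}


def affected_services(failed):
    """Breadth-first traversal from the failed component along dependency edges."""
    seen = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in seen:  # visit each node once, even if reachable twice
                seen.add(dependent)
                queue.append(dependent)
    return sorted(n for n in seen if LABELS.get(n) == "WebService")


print(affected_services("power_supply_1"))
# ['auth_service', 'checkout_service', 'search_service']
```

Note that `search_service` is reachable by two paths but is visited once, which keeps the traversal linear in the number of nodes and relationships it actually reaches.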
To summarize, there are four categories of NOSQL databases: Key-Value stores, Column Family
databases, Document databases, and Graph databases. The type of NOSQL database you choose
depends on what type of data you need to store, and on its size and complexity. Each was designed
to handle data that traditional databases could not manage successfully. Key-Value
stores and Column Family databases handle size very well, but when it comes to complexity,
Document databases and Graph databases are better suited to represent rich data. Graph
databases are ideal for applications with complex and connected data.
Figure 2: NOSQL Databases are designed for the size and complexity of today’s data.
A NOSQL graph database can easily perform these queries without degrading performance and
without the prohibitive cost of running them on traditional databases. Graph databases were
designed to quickly and easily compare how individual records relate to one another.
Social Networks
Social networks are not just the Facebooks of today; they are now part of enterprise
applications in which a buyer would like to know what other companies used this product, or what
other products they also purchased. The relationships among entities are a natural fit
for a graph database. Every user wants their own view of the world, resulting in extremely
localized queries of the data. With a graph database, local queries remain efficient no
matter how many users are added to the entire data set.
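A minimal sketch of the “what else did buyers of this product purchase” query, assuming a hypothetical purchase graph kept as adjacency sets in both directions. Because the function starts at one product and follows only its relationships, the work depends on that product’s neighborhood rather than on the total number of companies; an unrelated buyer such as `initech` is never visited when querying `crm_suite`. The names and data are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical purchase graph, kept as adjacency sets in both directions.
purchases = defaultdict(set)  # company -> products it bought
buyers = defaultdict(set)     # product -> companies that bought it


def record_purchase(company, product):
    purchases[company].add(product)
    buyers[product].add(company)


def also_purchased(product):
    """Who bought `product`, and what else they bought, most common first.

    Only the product's buyers and their purchases are visited, so the work depends
    on this local neighborhood rather than on the total number of companies.
    """
    companies = buyers.get(product, set())
    counts = Counter(
        other
        for company in companies
        for other in purchases[company]
        if other != product
    )
    # Sort by descending count, then by name, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(companies), ranked


for company, product in [
    ("acme", "crm_suite"), ("acme", "analytics"),
    ("globex", "crm_suite"), ("globex", "analytics"), ("globex", "billing"),
    ("initech", "billing"),
]:
    record_purchase(company, product)

print(also_purchased("crm_suite"))
# (['acme', 'globex'], [('analytics', 2), ('billing', 1)])
```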
Recommendation Engines
Recommendations are increasingly prevalent in today’s enterprise applications. In one highly
collaborative, global application, for example, creative marketing teams wanted to share and query
similar assets. With a graph database, users can quickly and easily find out which assets their
peers used and get recommendations on what they want even before they ask for it.
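The peer-based recommendation described above can be sketched as a two-hop traversal: from a user to their peers, then to the assets those peers used, skipping assets the user already has and ranking the rest by how many peers used them. The `PEERS` and `USED` data and the `recommend_assets` function are hypothetical; counting peers is one simple scoring choice among many.

```python
from collections import Counter

# Hypothetical graph: who collaborates with whom, and which assets each person used.
PEERS = {
    "dana": {"eli", "fay"},
    "eli": {"dana"},
    "fay": {"dana", "gus"},
}
USED = {
    "dana": {"logo_v2"},
    "eli": {"logo_v2", "banner_q3", "font_pack"},
    "fay": {"banner_q3", "photo_set"},
    "gus": {"photo_set"},
}


def recommend_assets(user, limit=3):
    """Rank assets used by the user's peers that the user has not used yet."""
    already = USED.get(user, set())
    scores = Counter(
        asset
        for peer in PEERS.get(user, set())
        for asset in USED.get(peer, set())
        if asset not in already
    )
    # Sort by descending peer count, then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [asset for asset, _ in ranked[:limit]]


print(recommend_assets("dana"))  # ['banner_q3', 'font_pack', 'photo_set']
```

Here `banner_q3` ranks first because both of Dana’s peers used it, while `logo_v2` is excluded because Dana already uses it.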
A graph database models real-world connections better than other NOSQL databases. It can
support today’s complex and connected data types, and scale to billions of nodes and
relationships. It is ideally suited for any application where knowledge is derived from
relationships.
NOSQL databases often coexist with traditional relational databases. That’s why the term
“NOSQL” has evolved to mean “Not Only SQL”. Enterprises are too big and too complex for a
one-size-fits-all solution. Transactions span multiple data stores and need seamless
integration. But when evaluating a NOSQL database, it is critical to demand enterprise
readiness.
An enterprise delivering modern applications needs a NOSQL database that can manage today’s
complex and connected data while still delivering the enterprise strength, transactions and
durability IT departments have relied on for years.
Neo Technology is the NOSQL database company for the enterprise. Proven by eight years of
24/7 production use, Neo4j is a fully transactional database which enables customers, including
Adobe and Cisco, to tackle complex data problems. Neo Technology is a privately held company
funded by Fidelity Growth Partners Europe, Sunstone Capital and Conor Venture Partners, and is
headquartered in Menlo Park, CA. For more information, visit www.neotechnology.com.
World Headquarters
Neo Technology
1370 Willow Road
Menlo Park, CA 94025 USA
U.S. & Canada: 1 (855) 636-4532
European Lab
Neo Technology
Anckargripsgatan 3
211 19 Malmo, Sweden
Tel: 0808-189 0493
www.neotechnology.com
Copyright © 2011 Neo Technology. All rights reserved.