
A WHITE PAPER

Neo Technology

NOSQL for the Enterprise


November 2011
Summary
Businesses are struggling to cope with an explosion of data, growing at 40% per year (McKinsey
Global Institute). The need to leverage complex and connected data is driving businesses to
adopt scalable, high-performance NOSQL databases that expand and enhance their data
management strategies. Why NOSQL? What is missing from traditional relational databases that
would create a need for a new breed of database solutions? The answer lies in the types of data
being managed, the volume of information available today, and the relationships between
individual records. Ironically, “relational” databases were not designed to navigate the types of
relationships that bring enormous value to many of today’s enterprise applications.

A number of NOSQL databases have emerged over the past decade to handle not only the
terabytes and petabytes of data generated by enterprises and consumers, but also the types of
data created. NOSQL databases hold structured, semi-structured and unstructured data such
as text, audio, video, social network feeds, Web logs and more that cannot be managed in
traditional databases. However, the first NOSQL databases were designed for high-volume Web
properties, such as Amazon and Google, that needed to store a lot of data; they were not
necessarily designed for the high-transaction, business-critical needs of today’s enterprise
applications.

Meetings with hundreds of developers, architects and CIOs at Fortune 500 companies point to a
few key requirements that enterprises have for a NOSQL database. Not surprisingly, many
of them are characteristics that have been proven for years by traditional enterprise-strength
databases.

A NOSQL database should:

● Enable high-performance queries on complex, connected data
● Easily represent the complex, connected data stored in today’s applications
● Demonstrate mature support for end-to-end transactions
● Ensure enterprise-grade durability
● Provide native support for Java, the most widely used platform in today’s
enterprises

Today’s NOSQL databases include Key-Value stores, Column Family databases, Document
databases, and Graph databases. Each type stores data in a different way and is designed with a
specific purpose in mind. The type of NOSQL database you choose depends on what type of
data you need to store and how you want to access it. A graph database, for instance, models
real world connections better than other NOSQL databases. It can support today’s complex and
connected data types, and scale to billions of nodes and relationships. It is ideally suited for any
application where knowledge is obtained by relationships.

An enterprise delivering modern applications needs a database that can manage today’s
complex and connected data while still delivering the enterprise strength, transactions and
durability IT departments have counted on for years.

© 2011 Neo Technology Page 2


The Rise of NOSQL
Businesses are struggling to cope with an explosion of data, growing at 40% per year (McKinsey
Global Institute). The need to leverage complex and connected data is driving businesses to
adopt scalable, high-performance NOSQL databases that expand and enhance their data
management strategies. What is missing from traditional relational databases that would create
a need for a new breed of database solutions? The answer lies in the types of data being
managed, the volume of information available today, and the need to navigate the relationships
between individual records. Ironically, “relational” databases were not designed to manage the
types of relationships that are so essential in today’s applications.

The relational database represents one of the most important developments in the history of
computer science. Upon its arrival over 40 years ago, it revolutionized the way the industry
views data management and today it is practically ubiquitous. However, in some cases relational
database technology shows its age.

The relational database model is built to handle structured data – data with a defined and
complete schema – in tables, and it does that very well. The problem is that a lot of the data
used today is not rigidly structured. Big Data has emerged as a term that refers not only to the
massive amounts of data generated by enterprises and consumers (terabytes and petabytes of
information) but also to the types of data created: structured, semi-structured and unstructured
data such as text, audio, video, social network feeds and Web logs that cannot be managed in
traditional databases.

Many organizations have introduced complex (semi-structured and/or unstructured) data to
their applications, but because they are so heavily invested in their relational database they
believe that the only way to model this new data is by forcing it into a tabular structure and then
trying to work around it in upper layers. For complex data, the relational database offers poor
runtime characteristics.

Today’s data is highly inter-related, and the relationships or connections between records are
important. Traditional databases were designed to perform set-based operations: they consider
an entire data set when performing a query and then filter down to reach a conclusion. They
were not optimized for today’s applications, which want to see how individual records relate to
one another. In a relational database, querying at that level typically relies on joins, which
come with a performance penalty.

Over the past few years, some of the best-known Web properties felt they had no option but to
build their own custom NOSQL databases to manage huge volumes of ever-changing data.
Amazon’s Dynamo and Google’s Bigtable are examples of homegrown databases that can store
lots of data, though at times by sacrificing consistency for availability.

Not every company can design its own custom NOSQL database, and so a few categories of open
source and commercially available NOSQL databases have emerged. But NOSQL databases built to
manage these large Web properties are not necessarily designed for the majority of enterprise
applications. The enterprise has seen the same explosion in data complexity and volume as the
Web world, yet few of the NOSQL databases available today can meet the demands of the
enterprise.

What are the Enterprise Needs for NOSQL Databases?


As enterprises introduce new interactive applications – from banks offering self-service
applications to retailers suggesting additional products based on a customer’s business network
– they expect their database to perform much as it did before, even though their data is much
more complex and connected. Meetings with hundreds of developers, architects and CIOs
at Fortune 500 companies point to a few essentials that enterprises require of a NOSQL
database. Not surprisingly, many are characteristics that have been proven for years by
traditional enterprise-strength databases.

Ability to Handle Today’s Complex and Connected Data


The biggest difference between a relational database and a NOSQL database is the ability to
store not only huge volumes of data, but also data types that are complex and connected. In
other words, data such as audio, video, social network feeds, Web logs, email, documents, and
other text-centric information are very difficult, if not impossible, to squeeze into the confines
of a traditional relational database. A NOSQL database should enable high performance queries
on complex, connected data inherent in today's applications. Users should be able to ask
questions such as "Who are all my contacts in Europe?" and "Which of my contacts ordered
from this catalog?"

Simplify the Development of Applications Using Complex and Connected Data


A NOSQL database should be able to easily represent the complex and connected data that
makes up today’s enterprise applications. Unlike the fixed schema of a traditional database, a
flexible schema that allows for multiple data types enables developers to easily change
applications without disrupting live systems. Collaborative development practices such as Agile
have replaced waterfall processes, and databases must be flexible and adaptable to keep the
lights on amid constantly changing infrastructures.

Support for End-to-End Transactions


Surprisingly few of the NOSQL databases commercially available today are able to
conduct “all or nothing” transactions the way traditional databases do, even though this is a
must-have for relational databases. Enterprise developers want to be able to group operations
and have all of them succeed or none at all. An example would be moving $100 from one bank
account to another: the withdrawal should be committed to the database log only if the $100
deposit into the other account is committed with it. Twitter will probably survive if a single
Tweet is lost, but an enterprise application such as online banking cannot
afford such a mistake.
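
The “all or nothing” behavior can be sketched in a few lines. The example below is only an illustration, not a NOSQL product's API: it uses Python's built-in sqlite3 module as a stand-in transactional store, and the table name, account names and the helper functions open_bank, transfer and balance are hypothetical.

```python
import sqlite3


def open_bank(balances):
    """Create an in-memory database with one row per (hypothetical) account."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", balances.items())
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            withdrawn = conn.execute(
                "UPDATE account SET balance = balance - ? "
                "WHERE name = ? AND balance >= ?",
                (amount, source, amount),
            ).rowcount
            deposited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            ).rowcount
            if withdrawn != 1 or deposited != 1:
                # Raising inside the block undoes any update already made.
                raise ValueError("transfer could not be completed")
        return True
    except ValueError:
        return False


def balance(conn, name):
    """Return the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]
```

If the deposit cannot be made, for instance because the target account does not exist, the withdrawal that preceded it is rolled back and the source balance is left unchanged.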

A NOSQL database for the enterprise should support ACID transactions including XA-compliant
distributed two-phase commits. The connections between data should be stored on disk, in a
structure designed for high-performance retrieval of connected data sets, all while enforcing
strict transaction management. This design delivers significantly better performance for
connected data than relational database technologies offer.



Enterprise-grade Durability so that Data is Never Lost
A NOSQL database for the enterprise needs enterprise-grade durability that ensures
any transaction committed to the database will not be lost. In database systems, durability
is the ACID property that ensures that committed transactions will be there, no matter
what. In other words, if you book an airline ticket and the system goes down, that seat should
still be booked after the system is recovered. Durability is ensured through the use of database
backups and transaction logs that facilitate the restoration of committed transactions in spite of
any software or hardware failures. Some NOSQL databases tout single-machine durability, but
how can a business-critical application put all its eggs in one basket? Relational databases have
employed replication for years to guarantee enterprise-strength durability. NOSQL databases
should be able to ensure the same level of durability.
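
The role of the transaction log can be illustrated with a minimal sketch. It assumes a toy store whose every committed write is appended to a log file and forced to disk before it is acknowledged. The class LoggedStore, the function demo_recovery and the file name are hypothetical, and the sketch deliberately leaves out backups, replication and the handling of partially written log records, so it covers only the single-machine part of the story.

```python
import json
import os
import tempfile


class LoggedStore:
    """Toy store (hypothetical) that logs each committed write to a file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def commit(self, key, value):
        """Apply a write only after its log record has been forced to disk."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to write through its cache
        self.data[key] = value


def demo_recovery():
    """Book a seat, then simulate a restart and return the recovered state."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "bookings.log")
        LoggedStore(path).commit("seat-12A", "booked")
        # A fresh instance knows only what the log contains.
        return LoggedStore(path).data
```

In the airline example, the booking survives the simulated restart because it reached the log before the commit was acknowledged.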

Java Still Reigns for Enterprise Development


To be taken seriously for enterprise development, a NOSQL database must support Java. Java
remains the most prevalent programming language in today’s enterprises. Developers need a
Java-friendly way to handle complex, connected data using the transactional guarantees
necessary for critical business applications. While hooks to other languages such as Ruby,
Python and Groovy are convenient, a NOSQL database must first and foremost support
Java to be a serious contender in the enterprise arena.

Emerging Categories of NOSQL Databases


There are four emerging categories of NOSQL databases available today: Key-Value stores,
Column Family databases, Document databases and Graph databases. Each was designed to
accommodate the huge volumes of data stored today as well as the new data types that are not
easily stored within the confines of a traditional relational database. The type of NOSQL
database you choose should be based on the type of data you need to store, its size and
complexity.

Key-Value Stores are the Simplest of NOSQL Databases


A Key-Value data model is simple: it stores data in key and value pairs where every key maps to
a value. It can scale across many machines, but it does not support richer data types. A Key-Value
store is ideal for applications that require massive amounts of simple data, like sensor data, or
for rapidly changing data such as stock quotes. Key-Value stores support massive sets of very
primitive data (hence the term “store” and not “database”). They are ideal for capturing
time-series data, like every vital statistic from your morning run, and everyone else's morning run,
over the last decade. Amazon’s Dynamo was built as a Key-Value store.
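
A small sketch can show why this model spreads easily across machines: because every lookup is by key alone, a hash of the key is enough to decide where a value lives. The class ShardedKeyValueStore and the key formats below are hypothetical, the "machines" are plain in-memory dictionaries, and the simple modulo placement is an assumption made for brevity rather than a description of how Dynamo or any other product partitions data.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store (hypothetical) spread over several shards."""

    def __init__(self, shard_count):
        # Each dictionary stands in for one machine.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        """Pick a shard from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        """Store an opaque value; the store never looks inside it."""
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        """Fetch a value by its exact key, the only supported lookup."""
        return self._shard_for(key).get(key, default)
```

A reading from a morning run might be stored with store.put("run:alice:2011-11-01T06:30", "heart_rate=142"). The value can only be retrieved by that exact key; the store offers no way to ask which runs are related to one another.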

Column Family Databases Store Large Amounts of Data, But Not Rich Data
A Column Family database can handle semi-structured data, because in theory every row can
have its own schema. It has few mandatory attributes and few optional attributes. It’s a
powerful way to capture semi-structured data, but often sacrifices consistency for availability.
Column Family databases can accommodate huge amounts of data, with basic organization to
help sift through the information. Writes are faster than reads, so one natural niche is real-time
data analysis. Logging real-time events is a perfect use case, as is any situation where you need
random, real-time read/write access to your Big Data. Google’s Bigtable was built as a Column
Family database. Apache Cassandra, originally developed at Facebook, is another example and
can store billions of columns per row. However, it is unable to support unstructured data types
or end-to-end transactions.

Document Databases Store Multiple Data Types, But Lack Transaction Support
A document database contains a collection of key-value pairs stored in documents. While it is
good at storing documents, it was not designed with enterprise-strength transactions and
durability in mind. Document databases are the most flexible of the key-value style stores,
perfect for storing a large collection of unrelated, discrete documents. A good application would
be a product catalog, which can display individual items, but not related items. You can see
what’s available for purchase, but you cannot connect an item to the other products that similar
customers bought after they viewed it. MongoDB and CouchDB are examples of document databases.

Graph Databases Show the Connections Between Data


A graph database uses nodes, relationships between nodes and key-value properties instead of
tables to represent information. This model is typically substantially faster for associative data
sets, and its schema-less, bottom-up approach is ideal for capturing ad-hoc and rapidly
changing data. Much of today’s complex and connected data can easily be stored in a graph
database, where there is great value in the relationships among data sets.

Figure 1: In a graph database, everything is represented by nodes, relationships and properties.

A graph database accesses data using traversals. A traversal is how you query a graph,
navigating from starting nodes to related nodes according to an algorithm, finding answers to
questions like “what music do my friends like, that I don’t yet own?” or “if this power supply
goes down, what Web services are affected?” Using traversals, you can easily conduct end-to-
end transactions that represent real user actions.
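
The music question can be sketched as a two-step traversal over a toy property graph. This is an illustration of the idea only and is not the Neo4j API: the Graph class, the relationship types FRIEND, LIKES and OWNS, and the function music_friends_like are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class Graph:
    """Toy property graph (hypothetical): nodes plus typed relationships."""

    def __init__(self):
        self.properties = {}                # node id -> key-value properties
        self._outgoing = defaultdict(list)  # node id -> [(type, target id)]

    def add_node(self, node_id, **properties):
        self.properties[node_id] = properties

    def relate(self, source, rel_type, target):
        """Record a directed relationship of the given type."""
        self._outgoing[source].append((rel_type, target))

    def neighbors(self, node_id, rel_type):
        """Follow relationships of one type that start at a node."""
        return [
            target
            for found_type, target in self._outgoing.get(node_id, [])
            if found_type == rel_type
        ]


def music_friends_like(graph, person):
    """Traverse person -> friends -> liked albums, skipping albums owned."""
    owned = set(graph.neighbors(person, "OWNS"))
    suggestions = set()
    for friend in graph.neighbors(person, "FRIEND"):
        for album in graph.neighbors(friend, "LIKES"):
            if album not in owned:
                suggestions.add(album)
    return sorted(suggestions)
```

The traversal touches only the starting node, its friends and the albums they like, so the work depends on the size of that neighborhood rather than on the size of the whole graph.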



Neo4j is the leading graph database available today, and includes enterprise-ready support for
complex and connected data, transactions, durability, and Java.

To summarize, there are four categories of NOSQL databases: Key-Value stores, Column Family
databases, Document databases, and Graph databases. The type of NOSQL database you choose
depends on what type of data you need to store, its size and complexity. Each was designed to
handle today’s data that could not be successfully managed in traditional databases. Key-Value
stores and Column Family databases handle size very well, but when it comes to complexity,
Document databases and Graph databases are better suited to represent rich data. Graph
databases are ideal for applications with complex and connected data.

Figure 2: NOSQL Databases are designed for the size and complexity of today’s data.

In the Enterprise, There is Value in Relationships


A graph database models real world connections better than other NOSQL databases. It can
support today’s complex and connected data types, and scale to billions of nodes and
relationships. It is ideally suited for any application where knowledge is obtained by
relationships. For example, you may want to know which of your customers on the East Coast
have made a purchase in the last six months AND will be attending an upcoming conference.
The ability to cross-reference these data points gives you much more context to an individual
customer than just a single record. Take it a step further and you can find out more about an
individual customer – whether the two of you have worked in a similar industry or both play
soccer on the weekends – all of which you can reference when you meet in person at the show.

A NOSQL graph database can easily perform these queries without the performance impact or
prohibitive cost of traditional databases. It was designed to quickly and easily
compare how individual records relate to one another.



Enterprise Use Cases for NOSQL Graph Databases
A graph database is ideal for any enterprise application that has structured and unstructured
data and relies on the relationships between records, such as:

Master Data Management


Today’s Fortune 500 companies need a database that can easily overlay information to make
business-critical decisions. Imagine the ramifications if an enterprise found that its traditional
database could not handle the joins when mapping its sales force coverage to its customer base:
if the system could not recognize which sales representative was responsible for deals in the
Southeast region, he/she might not be compensated for a sale. A graph database can capture
complex inter-relationships directly in a graph and keep the lights on for such revenue-
impacting systems.

Network Data Management


While “the cloud” is a promising new way to utilize computing resources, adding yet more layers
to be managed presents a significant challenge. Managing the towering hierarchies of
applications, services, switches, servers and power requires a focus on how things are
connected. A graph database helps perform what-if analysis, and can respond in real time to a
changing topology of networked entities.
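
A what-if question such as which services are affected when a power supply fails amounts to following dependency relationships outward from the failed component. The sketch below assumes a hypothetical dependency map in which any failed requirement affects everything that depends on it; the function affected_by and the component names are illustrative only.

```python
from collections import deque


def affected_by(depends_on, failed):
    """Return everything that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(component)

    affected = set()
    queue = deque([failed])
    while queue:  # breadth-first walk over the dependents
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical topology: each component lists what it requires to run.
TOPOLOGY = {
    "server-1": ["psu-a"],
    "server-2": ["psu-b"],
    "switch-1": ["psu-a"],
    "checkout-service": ["server-1"],
    "search-service": ["server-2"],
    "catalog-service": ["server-1", "server-2"],
}
```

Calling affected_by(TOPOLOGY, "psu-a") walks from the power supply to the servers and switch it feeds, and from there to the services that run on them.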

Social Networks
Social networks are not just the Facebooks of today; they are now a part of enterprise
applications where a buyer would like to know what other companies used this product, or what
other products they also purchased. The relationships among entities are a natural application
for a graph database. Every user wants their own view on the world, resulting in extremely
localized queries of the data. With a graph database, local queries are always efficient, no
matter how many users are added to the entire set.

Recommendation Engines
Recommendations are increasingly prevalent in today’s enterprise applications. Consider a
highly collaborative, global application in which creative marketing teams want to share and
query similar assets. With a graph database, users can quickly and easily find out which assets
their peers used and get recommendations on what they want even before they ask for it.
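
One simple way to picture such a recommendation is to treat "user used asset" as a relationship and to count what a user's peers have used. The sketch below is an assumption-laden illustration, not a description of any product's recommendation engine: a peer is taken to be anyone who shares at least one asset with the user, and the function recommend_assets and its sample data are hypothetical.

```python
from collections import Counter


def recommend_assets(used_by, user, limit=3):
    """Rank assets used by peers who share at least one asset with `user`."""
    mine = used_by.get(user, set())
    scores = Counter()
    for peer, assets in used_by.items():
        if peer == user or not (mine & assets):
            continue  # skip the user and anyone with nothing in common
        for asset in assets - mine:
            scores[asset] += 1  # one vote per peer who used the asset
    # Most votes first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [asset for asset, _ in ranked[:limit]]


# Hypothetical usage data: which assets each team member has used.
USAGE = {
    "ana": {"logo", "banner"},
    "ben": {"logo", "video", "jingle"},
    "cy": {"banner", "video"},
    "di": {"poster"},
}
```

For "ana", both peers who share an asset with her have used "video", so it ranks ahead of "jingle", while "poster" is never suggested because its only user has nothing in common with her.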



NOSQL for the Enterprise
NOSQL has emerged to manage new data types, huge volumes of data and the relationships
within the complex and connected data inherent in modern applications. The type of NOSQL
database you choose depends on what type of data you need to store and how you want to
access it. Each of the NOSQL databases discussed serves a specific purpose.

A graph database models real world connections better than other NOSQL databases. It can
support today’s complex and connected data types, and scale to billions of nodes and
relationships. It is ideally suited for any application where knowledge is obtained by
relationships.

NOSQL databases often coexist with traditional relational databases. That’s why the term
“NOSQL” has evolved to mean “Not Only SQL”. Enterprises are too big and too complex for a
one-size-fits-all solution. Transactions span multiple data stores and need seamless
integration. But when evaluating a NOSQL database, it is critical to demand
enterprise-readiness.

An enterprise delivering modern applications needs a NOSQL database that can manage today’s
complex and connected data while still delivering the enterprise strength, transactions and
durability IT departments have relied on for years.



About Neo Technology

Neo Technology is the NOSQL database company for the enterprise. Proven by eight years of
24/7 production use, Neo4j is a fully transactional database which enables customers, including
Adobe and Cisco, to tackle complex data problems. Neo Technology is a privately held company
funded by Fidelity Growth Partners Europe, Sunstone Capital and Conor Venture Partners, and is
headquartered in Menlo Park, CA. For more information, visit www.neotechnology.com.

World Headquarters
Neo Technology
1370 Willow Road
Menlo Park, CA 94025 USA
U.S. & Canada: 1 (855) 636-4532

European Lab
Neo Technology
Anckargripsgatan 3
211 19 Malmo, Sweden
Tel: 0808-189 0493

www.neotechnology.com
Copyright © 2011 Neo Technology. All rights reserved.
