NoSQL Database White Paper

Dedication
Every challenging work needs self-effort as well as the guidance of elders, especially those who are very close to our hearts.
We dedicate our humble effort to our sweet and loving Father, Mother and Wife, whose affection, love, encouragement and prayers day and night enabled us to achieve such success and honor, along with all our hard-working and respected Teachers and Students.
Objectives
• NoSQL is a hot topic.
• Meet some famous data models.
• Compare NoSQL and SQL.
• Review common features of NoSQL.
• Identify some SQL challenges.
• Address the CAP theorem.
• BASE transactions.
• ACID properties.
• Meet some well-known NoSQL products.
• Challenges of NoSQL.
Contents
Abstract
1. Introduction
1.1. Introduction and Overview
1.2. Let's Agree on Some Definitions
1.3. Open source
1.4. Scalability
1.5. Sharding Technique
1.6. NoSQL Market Overview
2. Data models
2.1. Flat File database model
2.2. Hierarchical Model
2.3. Network Model
2.4. Relational Model
2.5. The CAP Theorem
2.6. Object/Relational Model
2.7. Object-Oriented Model
3. NoSQL
3.1. NOSQL DEFINITION
3.2. LIST OF NOSQL DATABASE
3.3. BASE Transaction
3.4. ACID vs BASE
4. NoSQL Matters
4.1. EXPLORING THE DIFFERENT TYPES OF NOSQL DATABASES
4.2. Key Value Store NoSQL Database
4.3. Document Store NoSQL Database
4.4. Column Store NoSQL Database
4.5. Graph Base NoSQL Database
4.6. Complexity
5. Closer Look on Famous Products
5.1. Redis
5.2. Cassandra
5.3. MongoDB
5.4. Neo4j
5.5. SQL vs NoSQL
5.6. Challenges of NoSQL
7. Conclusion
A. List of abbreviations
B. References
List of Figures
1.1 Vertical & Horizontal Scaling
1.2 Database Sharding
1.3 NoSQL Market
2.1 Flat File Model (configuration file)
2.2 Hierarchical Data Model (Windows Registry)
2.3 Network Model
2.4 Relational Model Example
2.5 ACID Properties
2.6 Scaling a Relational Database
2.7 CAP Theorem
2.8 Object/Relational Model
2.9 Object-Oriented Model
3.1 NoSQL Database List
4.1 NoSQL Databases
4.2 Key-Value Example
4.3 Consistent Hashing
4.4 Document Store NoSQL
4.5 Column Store NoSQL
4.6 Graph Base NoSQL
5.1 Super Column
List of Tables
1.1 Scale Up vs Scale Out
4.1 Key-Value Example
6.1 NoSQL vs SQL
Abstract
NoSQL databases offer a significant change to how enterprise applications are built, challenging the two-decade hegemony of relational databases. The question people face is whether NoSQL databases are an appropriate choice, either for new projects or for introduction into existing ones. We give a rapid introduction to NoSQL databases: where they came from, the nature of the data models they use, and the different way you have to think about consistency. From this we outline the circumstances in which you should consider using them, why they will not make relational databases obsolete, and the important consequence of polyglot persistence.
In recent years a new type of database (NoSQL) has emerged in the field of persistence. In June 2009, a global gathering of the NoSQL movement lit the fuse of a "database revolution". Non-relational databases have since become an extremely popular area. NoSQL aims to meet the needs of highly concurrent reads and writes, efficient storage of and access to massive data, database scalability, and high availability.
In these large-scale concurrent systems, cluster caching and data consistency become the focus of attention. This article proposes a fast and effective approach to the data consistency problem. We use node splitting and consistent hashing to reduce server load by analyzing the requests from clients. Each server node can be divided into a set of virtual nodes, which are located on a ring. The complexity caused by the size of the set is determined by the target system.
Through physical node splitting we can reduce the cache miss rate. To make the consistent hashing algorithm easier to follow, we will introduce a football game example.
1. Introduction
1.1. Introduction and Overview
Relational database management systems (RDBMSs) are today the predominant technology for storing structured data in web and business applications. Since Codd's 1970 paper "A relational model of data for large shared data banks" [Cod70], these datastores, which rely on the relational calculus and provide comprehensive ad hoc querying facilities through SQL, have been widely adopted and are often thought of as the only alternative for data storage accessible by multiple clients in a consistent way. Although there have been different approaches over the years, such as object databases or XML stores, these technologies have never gained the same adoption and market share as RDBMSs. Rather, these alternatives have either been absorbed by relational database management systems, which for example allow XML to be stored and used for purposes like text indexing, or they have become niche products, for example for OLAP or stream processing.
In the past few years, the "one size fits all" thinking about datastores has been questioned by both science and web-affine companies, which has led to the emergence of a great variety of alternative databases. The movement as well as the new datastores are commonly subsumed under the term NoSQL, "used to describe the increasing usage of non-relational databases among Web developers".
This research aims to give a systematic overview of data models (flat file model, hierarchical model, network model, relational model, CAP theorem, object/relational model, object-oriented model) in chapter 2, an overview of NoSQL and its characteristics in chapter 3, several classes of NoSQL databases (key-value stores, document databases, column-oriented databases) in chapter 4, and a closer look at individual well-known products (Redis, Cassandra, MongoDB and Neo4j) in chapter 5.
1.2. Let's Agree on Some Definitions
Database: A collection of persistent data that is used by the application systems of some given enterprise.
Database System: Basically just a computerized record-keeping system. The database itself can be regarded as a kind of electronic filing cabinet; that is, it is a repository or container for a collection of computerized data files.
Data Model: A model of the persistent data of some particular enterprise. It takes the facilities provided and applies them to some specific problem.
1.3. Open source
Free and Open-Source Software (FOSS) is computer software that can be classified as both free software and open-source software. That is, anyone is freely licensed to use, copy, study, and change the software in any way, and the source code is openly shared so that people are encouraged to voluntarily improve the design of the software. This is in contrast to proprietary software, where the software is under restrictive copyright and the source code is usually hidden from the users.
FOSS Characteristics
1. Free and open source: anyone can update or distribute the code without limitations.
2. Flat community management: there is no specific centralized management or hierarchy, and any group of people can build their own open-source products.
3. Everyone can contribute.
4. High variety: many products overlap with each other, which creates many opportunities.
1.4. Scalability
Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. For example, it can refer to the capability of a system to increase its total output under an increased load when resources (typically hardware) are added, that is, its ability to respond to an increasing number of requests.
1.4.1. Scalability Options
1. Vertical scaling: you scale by adding more resources, such as CPU, memory, or storage, to a single system.
Note
Vertical scaling is limited to the capacity of a single machine. Scaling beyond that capacity often involves downtime and comes with an upper limit.
2. Horizontal scaling: you scale by adding more machines. This model requires that the database or application is capable of running in a distributed computing environment.
Fig (1.1) Vertical & Horizontal Scaling

1.4.2. Vertical Scalability Challenges
• Shared Memory, Interlocking and Multiprocessor Concurrency.
• Non-Uniform Memory Access (NUMA).
• Hardware Threading Frequency Reduction.
1.4.3. Horizontal Scalability Challenges
• Parallelize the workload across the various machines.
• Maintain a consistent, shared view of the entire data set.
• Handle distributed concurrency and consistency.
• Minimize the amount of coordination.
• Good examples of horizontally scaling systems are Cassandra and MongoDB.
Scale Up                                | Scale Out
Make a single CPU as fast as possible.  | Make many CPUs work together.
Increase clock speed.                   | Learn how to divide your problems into independent threads.
Add RAM.                                |
Make disk I/O go faster.                |
Table (1.1) Scale up Vs Scale out

1.5. Sharding Technique
A database shard is a horizontal partition of data in a database or search engine. Each individual partition is referred to as a shard or database shard. Each shard is held on a separate database server instance to spread the load.
Horizontal partitioning is a database design principle whereby rows of a database table are held separately,
rather than being split into columns (which is what normalization and vertical partitioning do, to differing
extents). Each partition forms part of a shard, which may in turn be located on a separate database server or
physical location.
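As a rough illustration of horizontal partitioning, the following Python sketch routes whole rows to shards by hashing a key column. The names `shard_for` and `partition_rows` and the `customer_id` field are hypothetical, and hash-modulo routing is only one possible assignment strategy, assumed here for simplicity rather than taken from any specific product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition_rows(rows, key_field, num_shards):
    """Split rows horizontally: each row goes, whole, to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        index = shard_for(str(row[key_field]), num_shards)
        shards[index].append(row)
    return shards


# Hypothetical sample data: ten customer rows spread over three shards.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(10)]
shards = partition_rows(customers, "customer_id", 3)

for index, shard in enumerate(shards):
    print(index, [row["customer_id"] for row in shard])
```

Every row keeps all of its columns, which is what distinguishes this from vertical partitioning. Note that with plain modulo routing, changing the number of shards moves most keys to a different shard.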
Sharding a database table before it has been optimized locally causes premature complexity. Sharding should
be used only when all other options for optimization are inadequate. The introduced complexity of database
sharding causes the following potential problems:
• Increased complexity of SQL - Increased bugs because the developers have to write more complicated SQL to handle sharding logic.
• Sharding introduces complexity - The sharding software that partitions, balances, coordinates, and ensures integrity can fail.
• Single point of failure - Corruption of one shard due to network/hardware/systems problems causes failure of the entire table.
• Failover servers more complex - Failover servers must themselves have copies of the fleets of database shards.
• Backups more complex - Database backups of the individual shards must be coordinated with the backups of the other shards.
• Operational complexity added - Adding/removing indexes, adding/deleting columns, modifying the schema becomes much more difficult.
Fig (1.2) Database Sharding

1.5.1. Consequences
• The application must manage parallel access.
• Scales well for both reads and writes.
• Not transparent: the application needs to be partition-aware, so a user who knows about the database on one node may be able to expose the databases on other nodes to attackers.
1.6. NoSQL Market Overview
The NoSQL market is expected to garner $4.2 billion by 2020, registering a CAGR of 35.1% during the forecast period 2014-2020. NoSQL (Not Only SQL) is a database mechanism developed for the storage, analysis, and access of large volumes of unstructured data. NoSQL allows schema-less data storage, which is not possible with relational database storage. The benefits of using NoSQL databases include high scalability, simpler designs, and higher availability with more precise control. The ability to comfortably manage big data is another significant reason for the adoption of NoSQL databases. Difficulty in testing, managing, and supporting complex queries, compared with relational database approaches, is a major restraint on the wider adoption of NoSQL technology among enterprises. However, in the next few years awareness is expected to increase, and NoSQL databases are expected to see rapid adoption in order to support explosively growing business data, especially in social networks, retail, and e-commerce.
Not Only SQL (NoSQL) technology is emerging in the database market and is expected to grow rapidly over the next few years. This technology is not a substitute for conventional RDBMS products such as Oracle SQL and Microsoft Access. However, it helps to overcome limitations observed in conventional RDBMS technologies. Present RDBMS technologies are not capable of supporting unstructured data with optimum space requirements; the design becomes complex and is hence difficult for developers, so unstructured data management is difficult with conventional RDBMS solutions. Furthermore, an RDBMS proves to be a costly solution for developing agile web applications with moderate data analytics requirements. NoSQL is emerging as an efficient alternative in this scenario, helping to overcome the problems associated with RDBMS technology. The adoption of NoSQL is expected to rise significantly across its application segments.
Fig (1.3) NoSQL Market

Chapter (2) Data model
2. Data models
2.1. Flat File database model
A flat file database is a database that stores data in a plain text file. Each line of the text file holds one record,
with fields separated by delimiters, such as commas or tabs. While it uses a simple structure, a flat file
database cannot contain multiple tables like a relational database can. Fortunately, most database programs
such as Microsoft Access and FileMaker Pro can import flat file databases and use them in a larger relational
database.
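To make the record-per-line layout concrete, here is a minimal Python sketch that reads a comma-delimited flat file with the standard `csv` module. The file contents and the field names (`id`, `name`, `city`) are made-up sample data, and the text is held in a string rather than on disk so the example is self-contained.

```python
import csv
import io

# Hypothetical flat file: one record per line, fields separated by commas,
# with the first line naming the fields.
FLAT_FILE = "id,name,city\n1,Alice,Cairo\n2,Bob,Giza\n"


def read_flat_file(text):
    """Parse delimited text into a list of records keyed by field name."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


records = read_flat_file(FLAT_FILE)
for record in records:
    print(record["id"], record["name"], record["city"])
```

All values come back as strings, and there is only one "table": relating these records to a second file would have to be done by the application itself.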
Flat file is also a type of computer file system that stores all data in a single directory. There are no folders or
paths used to organize the data. While this is a simple way to store files, a flat file system becomes increasingly
inefficient as more data is added. The original Macintosh computer used this kind of file system, creatively
called the Macintosh File System (MFS). However, it was soon replaced by the more efficient Hierarchical File
System (HFS) that was based on a directory structure.
2.1.1. Flat File Characteristics
• Widely used.
• Example: configuration files.
• Example: comma-separated value (CSV) files.
Fig (2.1) Flat File Model (configuration file)

2.2. Hierarchical Model
A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows representing information using parent/child relationships: each parent can have many children, but each child has only one parent (also known as a 1:many relationship). All attributes of a specific record are listed under an entity type. In a database, an entity type is the equivalent of a table; each individual record is represented as a row and each attribute as a column. Entity types are related to each other using 1:N mapping, also known as one-to-many relationships. This model is recognized as the first database model, created by IBM in the 1960s. The most recognized and used hierarchical databases are IMS (developed by IBM), the Windows Registry, TDM, and SAS System 2K.
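The parent/child rule can be sketched in a few lines of Python. The `Node` class and the sample names below are hypothetical; the point is only that each node stores exactly one parent while a parent may hold many children, so every record is reached by a single path from the root, much like a registry key.

```python
class Node:
    """A record in a hierarchy: one parent at most, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the unique path from the root down to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical sample hierarchy.
root = Node("company")
department = Node("engineering", root)
Node("sales", root)
employee = Node("alice", department)

print(employee.path())
```

Because a node has a single `parent` attribute, a record that naturally belongs under two parents cannot be represented without duplication, which is the limitation the network model addresses.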
2.2.1. Hierarchical Data Model characteristics
• Represents data as hierarchical tree structures.
• Each hierarchy represents a number of related records.
• There is no standard language for the hierarchical model.
• A popular hierarchical DML is DL/1 of the IMS system.
• IMS dominated the DBMS market for over 20 years between 1965 and 1985 and is still a widely used DBMS worldwide.
Fig (2.2) Hierarchical Data Model (Windows Registry)

2.3. Network Model
The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child, so the network model permitted the modeling of many-to-many relationships in data. In 1971, CODASYL (see footnote 1) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multi-parent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set exactly one record type is the owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.
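The set construct can be illustrated with a small Python sketch. The `SetType` class and the supplier/part/supply record names are hypothetical and are not CODASYL syntax; the sketch only shows how a link record that is a member of two sets ends up with two owners, which yields a many-to-many relationship between suppliers and parts.

```python
from dataclasses import dataclass, field


@dataclass
class SetType:
    """A named set: one owner record type and one member record type."""
    name: str
    owner_type: str
    member_type: str
    occurrences: dict = field(default_factory=dict)  # owner -> list of members

    def connect(self, owner, member):
        """Attach a member record to an owner record within this set."""
        self.occurrences.setdefault(owner, []).append(member)

    def owner_of(self, member):
        """Return the single owner of a member in this set, or None."""
        for owner, members in self.occurrences.items():
            if member in members:
                return owner
        return None


# Two sets share the same member record type ("supply" link records).
by_supplier = SetType("SUPPLIER-SUPPLY", "supplier", "supply")
by_part = SetType("PART-SUPPLY", "part", "supply")

by_supplier.connect("acme", "supply-1")
by_part.connect("bolt", "supply-1")
by_supplier.connect("acme", "supply-2")
by_part.connect("nut", "supply-2")


def owners(member):
    """A link record has one owner per set it participates in."""
    return (by_supplier.owner_of(member), by_part.owner_of(member))


print(owners("supply-1"))
```

Each set on its own is still 1:M (one owner, many members); the many-to-many relationship only appears when the two sets are read together through the link records.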
Fig (2.3) Network Model
1 CODASYL DBTG stands for Conference on Data Systems Languages, Database Task Group.
2.4. Relational Model
A relational database management system (RDBMS) is a database system based on the relational model developed by E. F. Codd. A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database the data and the relations between them are organized in tables. A table is a collection of records, and each record in a table contains the same fields.
In this model, data is organized in two-dimensional tables called relations. The tables, or relations, are related to each other.
Certain fields may be designated as keys, which means that searches for specific values of that field will use
indexing to speed them up. Where fields in two different tables take values from the same set, a join operation
can be performed to select related records in the two tables by matching values in those fields. Often, but
not always, the fields will have the same name in both tables. For example, an "orders" table might contain
(customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs so to
calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining
on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields.
Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
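The orders/products example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table layout follows the example in the text, while the sample rows, the prices, and the helper name `customer_bill` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (customer_id TEXT, product_code TEXT);
    INSERT INTO products VALUES ('P1', 10.0), ('P2', 2.5);
    INSERT INTO orders VALUES ('C1', 'P1'), ('C1', 'P2'), ('C2', 'P2');
    """
)


def customer_bill(conn, customer_id):
    """Sum the prices of everything a customer ordered by joining on product_code."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


print(customer_bill(conn, "C1"))
print(customer_bill(conn, "C2"))
```

Nothing in the schema links the two tables in advance; the relationship exists only because the query matches values in the two `product_code` columns at retrieval time.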
2.4.1. Properties of Relational Tables:

• Values Are Atomic
• Each Row is Unique
• Column Values Are of the Same Kind
• The Sequence of Columns is Insignificant
• The Sequence of Rows is Insignificant
• Each Column Has a Unique Name
Fig (2.4) Relational Model Example

2.4.2. ACID properties (atomicity, consistency, isolation, and durability)

This describes a set of properties that apply to data transactions, defined as follows:
• Atomicity: Everything in a transaction must happen successfully or none of the changes are committed. This prevents a transaction that changes multiple pieces of data from failing halfway and making only some of its changes.
• Consistency: The data will only be committed if it passes all the rules in place in the database (e.g. data types, triggers, constraints).
• Isolation: Transactions won't affect other transactions by changing data that another operation is counting on, and other users won't see partial results of a transaction in progress (depending on isolation mode).
• Durability: Once data is committed, it is durably stored and safe against errors, crashes, or any other (software) malfunctions within the database.
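Atomicity and consistency can be observed with a small `sqlite3` sketch. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the example assumes a CHECK constraint as the consistency rule and relies on the connection's context manager to commit on success and roll back on error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
after_failed = balances(conn)
ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
print(failed, after_failed, ok, after_ok)
```

In the failed transfer the credit to the destination account is executed first and then undone by the rollback, so no partial change survives.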
The ACID concept is described in ISO/IEC 10026-1:1992. Each of these attributes can be measured against
a benchmark. In general, however, a transaction manager or monitor is designed to realize the ACID concept.
In a distributed system, one way to achieve ACID is to use a two-phase commit (2PC), which ensures that either all involved sites commit to completing the transaction or none do and the transaction is rolled back.
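The two phases can be sketched as a toy, in-process simulation. The `Participant` class and the `two_phase_commit` function are hypothetical names, and the sketch deliberately ignores timeouts, crashes, and logging, which a real coordinator would have to handle.

```python
class Participant:
    """One site taking part in a distributed transaction (toy model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes (prepared) or no (aborted)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return True
    for p in participants:  # phase 2: global rollback
        p.rollback()
    return False


healthy = [Participant("site-a"), Participant("site-b")]
mixed = [Participant("site-a"), Participant("site-b", can_commit=False)]

print(two_phase_commit(healthy), [p.state for p in healthy])
print(two_phase_commit(mixed), [p.state for p in mixed])
```

A single "no" vote is enough to roll back every site, which is how the all-or-nothing guarantee is extended across machines.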
Fig (2.5) ACID Properties

2.4.3. Relational Model is Great
• World economy is relational.
• SQL is a rich, declarative query language.
• Databases enforce referential integrity.
• ACID semantics.
• Well understood by developers.
• Well supported by frameworks and tools, e.g. Spring, JDBC, Hibernate, JPA.
• Known tuning, failure and recovery, and performance characteristics.
2.4.4. Relational Database Drawbacks
• Relational Databases Are Not Designed To Handle Change: Today, change occurs frequently, and data modeling is a huge challenge because of the time and resources that relational databases require. Unfortunately, when using a relational database, even a simple change like adding or replacing a column in a table might be a million dollar task.
• Relational Databases Are Not Designed For Scale: Relational databases are designed to run on a single server in order to maintain the integrity of the table mappings and avoid the problems of distributed computing.
• Scaling Relational Databases Is Hard: Achieving scalability and elasticity is a huge challenge for relational databases. Relational databases were designed in a period when data could be kept small, neat, and orderly. That's just not true anymore. Yes, all database vendors say they scale big. They have to in order to survive. But, when you take a closer look and see what's actually working and what's not, the fundamental problems with relational databases start to become more clear.
Fig ( 2.6 ) Scaling a Relational Database

• Relational Databases Are Not Designed For Heterogeneous Data: Relational databases have resulted in accidental complexity that keeps most organizations spinning in circles. Organizations simply cannot keep up with the many shapes, sizes, and types of data that are quickly growing in volume and changing.
• Problems with Structured Data, Data Silos, and ETL: In most organizations, it is the structured data that causes the bulk of the overhead, because relational databases have trouble managing data that isn't uniform. In other words, this data is also heterogeneous. Relational databases require pre-defined schemas before loading data, and any changes required to handle a new data source are cumbersome and increase schema complexity. To get around this, it is often faster and easier to just set up another database, which results in data silos and ETL.
• Relational Databases Are Not Designed for Mixed Workloads: Relational databases are designed for either OLTP or OLAP workloads. You can't use one database for both. Today, that limitation is no longer acceptable as IT struggles to keep pace with the speed of business.
• RDBMSs Are A Mismatch For Modern App Development: Relational databases require ORM, which abstracts the data away, tearing apart the data and adding more overhead in the process.
2.5. The CAP Theorem
Published by Eric Brewer in 2000, the CAP theorem describes three basic requirements of any distributed system and states that any distributed system can support only two of the following three characteristics at the same time.
Let's imagine a distributed database system with multiple servers. Here's how the CAP theorem applies:
• Consistency - All the servers in the system will have the same data, so anyone using the system will get the same copy regardless of which server answers their request.
• Availability - The system will always respond to a request (even if it's not the latest data or consistent across the system, or just a message saying the system isn't working).
• Partition Tolerance - The system continues to operate as a whole even if individual servers fail or can't be reached.
It's theoretically impossible to have all 3 requirements met, so a combination of 2 must be chosen and this is
usually the deciding factor in what technology is used.
When it comes to distributed databases, the two choices are really AP or CP since if it's not partition tolerant,
it's not really a reliable distributed database. So the choice is simpler: if a network split happens, do you want
the database to keep answering but with possibly old/bad data (AP)? Or should it just stop responding unless
you can get the absolute latest copy (CP)?
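The AP versus CP choice can be made concrete with a toy simulation. The `Replica` class, the `replicate` helper, and the `Unavailable` exception are hypothetical and model a single replica cut off from its primary; real systems make this decision with quorums and timeouts rather than a boolean flag.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode):
        self.mode = mode  # "AP" or "CP"
        self.value = None
        self.partitioned = False

    def read(self):
        # A CP replica cannot confirm it holds the latest value, so it refuses.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value during partition")
        # An AP replica answers with whatever it has, possibly stale.
        return self.value


def replicate(primary_value, replica):
    """Copy a write from the primary; the update is lost if the link is down."""
    if not replica.partitioned:
        replica.value = primary_value


def run(mode):
    replica = Replica(mode)
    replicate("v1", replica)   # replica is in sync
    replica.partitioned = True  # network split happens
    replicate("v2", replica)   # newer write never arrives
    try:
        return replica.read()
    except Unavailable:
        return "unavailable"


ap_result = run("AP")
cp_result = run("CP")
print(ap_result, cp_result)
```

The AP replica keeps answering with the old value `v1`, while the CP replica stops responding until the partition heals.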
Fig ( 2.7 ) CAP theorem
2.5.1. CAP Theorem Challenge
• Brewer's CAP theorem: for any system sharing data, it is impossible to guarantee all three of these properties simultaneously.
• Very large systems will partition at some point:
  • It is necessary to decide between C and A.
  • Traditional DBMSs prefer C over A and P.
  • Most Web applications choose A (except in specific applications such as order processing).
2.5.2. CAP Theorem and ACID

• Drop A or C of ACID.
• Relaxing C makes replication easy and facilitates fault tolerance.
• Relaxing A reduces (or eliminates) the need for distributed concurrency control.

2.6. Object/Relational Model
Object/relational database management systems (ORDBMSs) add new object storage capabilities to the
relational systems at the core of modern information systems. These new facilities integrate management of
traditional fielded data, complex objects such as time-series and geospatial data and diverse binary media
such as audio, video, images, and applets. By encapsulating methods with data structures, an ORDBMS server
can execute complex analytical and data manipulation operations to search and transform multimedia and
other complex objects.
As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and
performance-management features of its relational ancestor and the flexibility of its object-oriented cousin.
Database designers can work with familiar tabular structures and data definition languages (DDLs) while
assimilating new object-management possibilities. Query and procedural languages and call interfaces in
ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces
are all extensions of RDBMS languages and interfaces. And the leading vendors are, of course, quite well
known: IBM, Informix, and Oracle.
Fig ( 2.8 ) Object/Relational Model

2.7. Object-Oriented Model
Object DBMSs add database functionality to object programming languages. They bring much more than
persistent storage of programming language objects. Object DBMSs extend the semantics of the C++,
Smalltalk and Java object programming languages to provide full-featured database programming capability,
while retaining native language compatibility. A major benefit of this approach is the unification of the
application and database development into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code bases are easier to maintain. Object
developers can write complete database applications with a modest amount of additional effort.
According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-
oriented programming language (OOPL) systems and persistent systems. The power of the OODB comes from
the seamless treatment of both persistent data, as found in databases, and transient data, as found in
executing programs."
In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or
joined together from those tables to form the in-memory structure, object DBMSs have no performance
overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one mapping of object
programming language objects to database objects has two benefits over other storage approaches: it
provides higher performance management of objects, and it enables better management of the complex
interrelationships between objects. This makes object DBMSs better suited to support applications such as
financial portfolio risk analysis systems, telecommunications service applications, world wide web document
structures, design and manufacturing systems, and hospital patient record systems, which have complex
relationships between data.
Fig ( 2.9 ) Object-Oriented Model

Chapter (3) NoSQL
3. NoSQL
3.1. NOSQL DEFINITION
Definition from www.nosql-database.org: Next-generation databases mostly address some of the following points:
being non-relational, distributed, open source, and horizontally scalable. The original intention was
modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often more
characteristics apply, such as schema-free design, easy replication support, a simple API, eventual consistency / BASE (not
ACID), huge data volumes, and more.
2.8.1. Characteristics of NoSQL Databases

In order to guarantee the integrity of data, most classical database systems are based on transactions.
This ensures consistency of data in all situations of data management.
These transactional characteristics are also known as ACID (Atomicity, Consistency, Isolation, and Durability).
However, scaling out ACID-compliant systems has proven to be a problem. Conflicts arise between
the different aspects of high availability in distributed systems that are not fully solvable, known as the CAP
theorem:
 Strong Consistency: all clients see the same version of the data, even on updates to the dataset,
e.g. by means of the two-phase commit protocol (XA transactions) and ACID.
 High Availability: all clients can always find at least one copy of the requested data, even if
some of the machines in a cluster are down.
 Partition-tolerance: the total system keeps its characteristics even when it is deployed on
different servers that may temporarily be unable to reach one another, transparent to the client.
Typical characteristics of NoSQL databases include:
 Large data volumes
 Google “big data”
 Scalable replication and distribution
 Potentially thousands of machines
 Potentially distributed around the world
 Queries need to return answers quickly
 Mostly queries, few updates
 Asynchronous inserts and updates
 Schema-less
 ACID transaction properties are not needed (BASE)
 Open source development
The CAP theorem postulates that only two of the three different aspects of scaling out
can be achieved fully at the same time. See figure 2.7.
Many of the NoSQL databases have, above all, loosened the requirements on Consistency in order to
achieve better Availability and Partition tolerance. This resulted in systems known as BASE (Basically Available, Soft
state, Eventually consistent).
3.2. LIST OF NOSQL DATABASE
Discussing NoSQL databases is complicated because there are a variety of types. The site www.nosql-database.org
currently lists 150 databases in the following categories:
 Column Store – Each storage block contains data from only one column
 Document Store – stores documents made up of tagged elements
 Key-Value Store / TUPLE STORE – Hash table of keys
 XML Databases
 Graph Databases
 Codasyl Databases
 Object Databases
 Grid and Cloud Database Solutions
 Multidimensional Databases
 Multivalued Databases
 Event Sourcing
 Network Model
 Other NoSQL related databases
 Unresolved and uncategorized
Fig ( 3.1 ) NoSQL Database list

3.3. BASE Transaction
Luckily for the world of distributed computing systems, their engineers are clever. How do the vast data
systems of the world such as Google’s BigTable and Amazon’s Dynamo and Facebook’s Cassandra (to name
only three of many) deal with a loss of consistency and still maintain system reliability? The answer, while
certainly not simple, was actually a matter of chemistry or pH: BASE (Basically Available, Soft state, Eventual
consistency). In a system where BASE is the prime requirement for reliability, the activity/potential (p) of the
data (H) changes; it essentially slows down. On the pH scale, a BASE system is closer to soapy water or maybe
the Great Salt Lake . Such a statement is not claiming that billions of transactions are not happening rapidly,
they still are, but it is the constraints on those transactions that have changed; those constraints are
happening at different times with different rules. In an ACID system, the data fizzes and bubbles and is
perpetually active; in a BASE system, the bubbles are still there much like bath water, popping, gurgling, and
spinning, but not with the same vigor required from ACID. Here is why:
 Basically Available: This constraint states that the system does guarantee the availability of the data as
regards CAP Theorem; there will be a response to any request. But, that response could still be ‘failure’ to
obtain the requested data or the data may be in an inconsistent or changing state, much like waiting for a
check to clear in your bank account.
 Soft state: The state of the system could change over time, so even during times without input there may
be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’
 Eventual consistency: The system will eventually become consistent once it stops receiving input. The data
will propagate to everywhere it should sooner or later, but the system will continue to receive input and is
not checking the consistency of every transaction before it moves on to the next one. Werner Vogels’s
article “Eventually Consistent – Revisited” covers this topic in much greater detail.
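The three BASE properties above can be illustrated with a small in-process model. This is only a sketch under stated assumptions: the class names `Replica` and `EventuallyConsistentStore`, the three-replica setup, and the explicit `propagate()` step are hypothetical and stand in for the background replication a real system would perform on its own.

```python
from collections import deque


class Replica:
    """One copy of the data (hypothetical name, used only in this sketch)."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica first and reach the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to every replica

    def write(self, key, value):
        # Acknowledge as soon as a single replica has the new value.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, key, replica_index):
        # Always answers ("basically available"), possibly with stale data.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Stand-in for background replication: apply queued updates in order.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read("balance", replica_index=2)  # update has not arrived yet
store.propagate()                               # input stops, replicas catch up
fresh = store.read("balance", replica_index=2)
```

Between `write` and `propagate` the state is "soft": a read from the third replica still answers, but with the old (missing) value. Once the queue drains, every replica holds the same data.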
3.4. Acid Vs Base
Relational databases adhere strictly to the ACID (Atomicity, Consistency, Isolation, and Durability) properties,
while NoSQL databases follow the BASE (Basically Available, Soft state, Eventual consistency) principles. What
are the technical differences between them?
The difference is in the names: consistency versus eventual consistency. The technical mechanisms for commits and
transactions do not necessarily differ that much; it is the usage that differs.
You could say that ACID comes from a paradigm of one database with many users, in which transactions on
datasets are ordered so that only one at a time is able to change a value,
and that BASE comes from a paradigm in which the data is distributed, synchronization among the copies is not feasible, and
exact values are not strictly necessary.
There are many methods and technologies for enforcing ACID and BASE. A challenge for
BASE is how to perform distributed updates when the data can take many unforeseen routes. Should one use
a pessimistic approach (e.g. all transactions are conducted on all nodes) or an optimistic one (e.g. use of the best
value, a sum, or the best data source)? For ACID, deadlocks are a challenge.
SQL and NoSQL systems also differ in how they are queried: NoSQL systems often use key -> value lookups, whereas SQL uses more complex
relations. Regardless of the approach, to be able to make a query you need a taxonomy, a
formalism, and tags: a taxonomy in order to know what you are asking for (e.g. cats, dogs, cars, yellow cars that
are not dogs, bank robbery, most likes on the internet); a formalism to make a query language (some semantic
structure of FETCH or SELECT and a structure linking keys to values); and finally the ability to tag the
unstructured data with meta tags that follow the taxonomy you will use.
Consistency in ACID really has a different meaning than in 'eventual consistency' in BASE. Consistency, as one
of the ACID properties, ensures that a transaction brings the database from one valid state to another.
Consistency as used in the CAP theorem and BASE, relates to how data are seen among the server nodes after
update. Eventual consistency is related to data replication - some server nodes may contain different data
versions for a short period of time. Some NoSQL solutions are fully ACID and some can be configured to
behave as ACID.
ACID                            BASE
Strong consistency              Weak consistency – stale data OK
Isolation                       Availability first
Focus on “commit”               Best effort
Nested transactions             Approximate answers OK
Availability?                   Aggressive (optimistic)
Conservative (pessimistic)      Simpler!
                                Faster
Difficult evolution             Easier evolution
Table ( 3.1 ) ACID Vs BASE

Chapter (4) NoSQL Matters
4. NoSQL Matters
4.1. EXPLORING THE DIFFERENT TYPES OF NOSQL DATABASES
In our previous post titled ‘Just Say Yes to NoSQL’, we cited the CAP theorem, did a point-by-point comparison
between RDBMS and NoSQL and explored in-depth, the various characteristics of NoSQL which make it the
most reliable database solution available today.
In this second part of the 3-part series we will focus exclusively on the different types of NoSQL databases.
2.11.1. The primary objective of a NoSQL database is to have
 simplicity of design,
 horizontal scaling, and
 finer control over availability.
NoSQL databases use different data structures from relational databases, which makes some operations
faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.
2.11.2. Types of NoSQL databases
There are 4 basic types of NoSQL databases:
1. Key-Value Store – It has a big hash table of keys and values. {Example: Riak, Amazon's Dynamo}
2. Document-based Store – It stores documents made up of tagged elements. {Example: CouchDB}
3. Column-based Store – Each storage block contains data from only one column. {Example: HBase,
Cassandra}
4. Graph-based – A network database that uses edges and nodes to represent and store data. {Example:
Neo4J}
Fig (4.1) NoSQL Databases

4.2. Key Value Store NoSQL Database
The schema-less format of a key-value database like Riak suits many simple storage needs.
The key can be synthetic or auto-generated, while the value can be a string, JSON, a BLOB (binary large object), etc.
A key-value store basically uses a hash table in which there exists a unique key and a pointer to a particular
item of data. A bucket is a logical group of keys, but buckets do not physically group the data. There can be
identical keys in different buckets.
Performance is enhanced to a great degree because of the cache mechanisms that accompany the mappings.
To read a value you need to know both the key and the bucket, because the real key is a hash of (bucket + key).
The key-value model is simple and easy to implement. It is not
an ideal choice, however, if you only want to update part of a value or to query the database by value.
Reflecting back on the CAP theorem, key-value stores are generally strong
on the Availability and Partition-tolerance aspects but weaker on Consistency.
Example: Consider the data subset represented in the following table. Here the key is the name of a
country, while the value is a list of addresses of 3Pillar centers in that country.
Key          Value
“India”      {“B-25, Sector-58, Noida, India – 201301”}
“Romania”    {“IMPS Moara Business Center, Buftea No. 1, Cluj-Napoca, 400606”,
             “City Business Center, Coriolan Brediceanu No. 10, Building B, Timisoara, 300011”}
“US”         {“3975 Fair Ridge Drive. Suite 200 South, Fairfax, VA 22033”}
Table (4.1) Key-Value example
A key-value database allows clients to read and write values using a key, as follows:
 Get(key), returns the value associated with the provided key.

 Put(key, value), associates the value with the key.
 Multi-get(key1, key2, .., keyN), returns the list of values associated with the list of keys.
 Delete(key), removes the entry for the key from the data store.
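The four operations listed above can be sketched with an in-memory dictionary. This is a minimal illustration, not the API of any particular product: the class name `KeyValueStore` and the method names are assumptions chosen to mirror the list, and the sample data comes from Table (4.1).

```python
class KeyValueStore:
    """Minimal in-memory key-value store mirroring the four operations."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Returns the value associated with the key, or None if absent.
        return self._items.get(key)

    def put(self, key, value):
        # Associates the value with the key, replacing any previous value.
        self._items[key] = value

    def multi_get(self, *keys):
        # Returns the values for the given keys, in the same order.
        return [self._items.get(key) for key in keys]

    def delete(self, key):
        # Removes the entry for the key; a missing key is ignored.
        self._items.pop(key, None)


offices = KeyValueStore()
offices.put("India", ["B-25, Sector-58, Noida, India - 201301"])
offices.put("US", ["3975 Fair Ridge Drive. Suite 200 South, Fairfax, VA 22033"])
both = offices.multi_get("India", "US")
offices.delete("US")
```

Note that the store treats each value as opaque: there is no way to ask for "all offices in Noida" without reading the values in the application.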
While a key-value database is helpful in some cases, it has some weaknesses as well. One is that
the model does not provide traditional database capabilities (such as atomicity of transactions, or
consistency when multiple transactions are executed simultaneously). Such capabilities must be provided by
the application itself.
Second, as the volume of data increases, maintaining unique values as keys may become more difficult;
addressing this issue requires the introduction of some complexity in generating character strings that will
remain unique among an extremely large set of keys.
 Riak and Amazon’s Dynamo are among the best-known key-value store NoSQL databases.
Fig (4.2) Key-Value example
12.2.1. Consistent hashing

Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on
average, where K is the number of keys and n is the number of slots. In contrast, in most traditional hash
tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping
between the keys and the slots is defined by a modular operation.
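The cost of the modular mapping can be checked with a short experiment. The sketch below assumes MD5 as the hash function and made-up keys of the form `user:<i>`; it counts how many keys change slot when a table grows from 4 to 5 slots.

```python
import hashlib


def slot_for(key, slot_count):
    """Traditional placement: hash the key, then take it modulo the slot count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % slot_count


keys = [f"user:{i}" for i in range(10_000)]

# Count the keys whose slot changes when the table grows from 4 to 5 slots.
moved = sum(1 for key in keys if slot_for(key, 4) != slot_for(key, 5))
fraction_moved = moved / len(keys)
```

A key keeps its slot only when its hash gives the same remainder modulo 4 and modulo 5, which happens for 4 out of every 20 hash values, so roughly 80% of the keys move. Consistent hashing aims for about K/n, here 20%.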
Consistent hashing achieves some of the same goals as rendezvous hashing (also called HRW Hashing). The
two techniques use different algorithms, and were devised independently and contemporaneously.
12.2.2. Consistent hashing Technique

Consistent hashing is based on mapping each object to a point on the edge of a circle (or equivalently,
mapping each object to a real angle). The system maps each available machine (or other storage bucket) to
many pseudo-randomly distributed points on the edge of the same circle.
To find where an object should be placed, the system finds the location of that object's key on the edge of
the circle; then walks around the circle until falling into the first bucket it encounters (or equivalently, the first
available bucket with a higher angle). The result is that each bucket contains all the resources located between
each one of its points and the previous points that belong to other buckets.
If a bucket becomes unavailable (for example because the computer it resides on is not reachable), then the
points it maps to will be removed. Requests for resources that would have mapped to each of those points
now map to the next highest points. Since each bucket is associated with many pseudo-randomly distributed
points, the resources that were held by that bucket will now map to many different buckets. The items that
mapped to the lost bucket must be redistributed among the remaining ones, but values mapping to other
buckets will still do so and do not need to be moved.
A similar process occurs when a bucket is added. By adding new bucket points, we make any resources
between those and the points corresponding to the next smaller angles map to the new bucket. These
resources will no longer be associated with the previous buckets, and any value previously stored there will
not be found by the selection method described above.
The portion of the keys associated with each bucket can be altered by altering the number of angles that
bucket maps to.
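The technique described above can be sketched as a sorted list of points on the circle. This is an illustration under stated assumptions, not a production implementation: MD5 provides the positions, each bucket gets 100 pseudo-random points, and the names `HashRing`, `add_bucket`, `remove_bucket`, and `bucket_for` are hypothetical.

```python
import bisect
import hashlib


def point(label):
    """Map a string to a position on the circle (here a 128-bit integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, points_per_bucket=100):
        self.points_per_bucket = points_per_bucket
        self._points = []  # sorted positions on the circle
        self._owner = {}   # position -> bucket name

    def add_bucket(self, bucket):
        # Each bucket is mapped to many pseudo-randomly distributed points.
        for i in range(self.points_per_bucket):
            p = point(f"{bucket}#{i}")
            self._owner[p] = bucket
            bisect.insort(self._points, p)

    def remove_bucket(self, bucket):
        # Drop the bucket's points; its keys fall through to the next points.
        self._points = [p for p in self._points if self._owner[p] != bucket]
        self._owner = {p: b for p, b in self._owner.items() if b != bucket}

    def bucket_for(self, key):
        # Walk around the circle to the first bucket point at or after the key.
        i = bisect.bisect_left(self._points, point(key))
        return self._owner[self._points[i % len(self._points)]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c", "node-d"):
    ring.add_bucket(name)

keys = [f"user:{i}" for i in range(10_000)]
before = {key: ring.bucket_for(key) for key in keys}
ring.add_bucket("node-e")
after = {key: ring.bucket_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

After `node-e` joins, the only keys that change owner are those now claimed by `node-e`; everything else stays where it was. Raising `points_per_bucket` for one bucket would give it a larger share of the keys, as the last paragraph above describes.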
Fig (4.3) Consistent hashing
12.2.3. Key-Value Store characteristics
 A two-column table consisting of a key and a value associated with the key.
 The key acts as the index, and the value can be referenced as a lookup.
 Example: Project Voldemort
 http://www.project-voldemort.com/
 LinkedIn
 Eventually consistent key-value store, auto scaling.
 Example: MemcacheDB
 http://memcachedb.org/
 Backend storage is Berkeley DB
 Membase: Memcached with persistence and improved consistent hashing.
 AppFabric Cache: multi-region cache.
 Redis: data structure server.
 Riak: based on Amazon's Dynamo.
4.3. Document Store NoSQL Database
A document store holds data as a collection of key-value pairs and is quite similar to a key-value
store; the main difference is that the values stored (referred to as “documents”) provide some
structure and encoding of the managed data. XML, JSON (JavaScript Object Notation), and BSON (a binary
encoding of JSON objects) are some common standard encodings.
The following example shows data values collected as “documents” representing specific office
locations. Note that while the three examples all represent locations, their representative models are different.
Fig (4.4) Document Store NoSQL

{officeName: "3Pillar Noida",
 {Street: "B-25", City: "Noida", State: "UP", Pincode: "201301"}
}
{officeName: "3Pillar Timisoara",
 {Boulevard: "Coriolan Brediceanu No. 10", Block: "B, Ist Floor", City: "Timisoara", Pincode: "300011"}
}
{officeName: "3Pillar Cluj",
 {Latitude: "40.748328", Longitude: "-73.985560"}
}
One key difference between a key-value store and a document store is that the latter embeds attribute
metadata associated with stored content, which essentially provides a way to query the data based on the
contents. For example, in the above example, one could search for all documents in which “City” is “Noida”
that would deliver a result set containing all documents associated with any “3Pillar Office” that is in that
particular city.
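The content-based query described above can be sketched with plain Python dictionaries standing in for documents. Two assumptions apply: the location fields are flattened into the top level of each document for brevity, and the helper `find` is a hypothetical name, not the query API of any document database.

```python
# Each document carries its own field names; no shared schema is required.
documents = [
    {"officeName": "3Pillar Noida", "Street": "B-25", "City": "Noida",
     "State": "UP", "Pincode": "201301"},
    {"officeName": "3Pillar Timisoara", "Boulevard": "Coriolan Brediceanu No. 10",
     "Block": "B, Ist Floor", "City": "Timisoara", "Pincode": "300011"},
    {"officeName": "3Pillar Cluj", "Latitude": "40.748328",
     "Longitude": "-73.985560"},
]


def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


noida_offices = find(documents, City="Noida")
```

The third document has no `City` field at all, so it is simply skipped rather than causing an error; this is the practical effect of embedding attribute metadata in each document.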
Apache CouchDB is an example of a document store. CouchDB uses JSON to store data, JavaScript as its query
language using MapReduce, and HTTP for an API. Data and relationships are not stored in tables, as is the norm
with conventional relational databases, but as a collection of independent documents.
The fact that document-style databases are schema-less makes adding fields to JSON documents a simple task
that does not require defining changes first.
 Couchbase and MongoDB are among the most popular document-based databases.
4.4. Column Store NoSQL Database
In a column-oriented NoSQL database, data is stored in cells grouped in columns of data rather than as rows of
data. Columns are logically grouped into column families. Column families can contain a virtually unlimited
number of columns, which can be created at runtime or in the schema definition. Reads and writes are done
using columns rather than rows.
In comparison, most relational DBMSs store data in rows. The benefit of storing data in columns is fast search,
access, and data aggregation. Relational databases store a single row as a contiguous disk entry, and different
rows are stored in different places on disk, while columnar databases store all the cells corresponding to a
column as a contiguous disk entry, which makes search and access faster.
For example, querying the titles of a million articles is a painstaking task in a
relational database, because it must visit each row's location to get the item titles. In a columnar database, on the other
hand, the titles of all the items can be obtained with just one disk access.
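The difference between the two layouts can be sketched in memory. The article records below are made up for illustration, and the field names and the helper `to_columns` are assumptions; real column stores add on-disk layout and compression that this sketch does not model.

```python
FIELDS = ("id", "title", "author", "words")

# Row-oriented layout: all fields of one record are stored together.
rows = [
    (1, "CAP theorem", "alice", 1200),
    (2, "BASE transactions", "bob", 950),
    (3, "Consistent hashing", "carol", 1430),
]


def to_columns(records):
    """Column-oriented layout: all values of one field are stored together."""
    return {name: [record[i] for record in records]
            for i, name in enumerate(FIELDS)}


columns = to_columns(rows)

titles_from_rows = [row[1] for row in rows]  # has to touch every record
titles_from_columns = columns["title"]       # one contiguous list
total_words = sum(columns["words"])          # aggregate over a single column
```

Reading one field from the row layout walks every record, while the column layout hands back the whole column at once, which is also what makes aggregates such as `total_words` cheap.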
Fig (4.5) Column Store NoSQL
2.14.1 Column Oriented characteristics
 Store data in column order
 Allow key-value pairs to be stored (and retrieved on key) in a massively parallel system
 Data model: families of attributes defined in a schema; new attributes can be added
 Storing principle: big hashed distributed tables
 Properties: partitioning (horizontally and/or vertically), high availability, etc.
 Completely transparent to the application
 Enables compression over a column
Document-based characteristics
 Collections of documents
 Schema-less
 Based on the JSON format: a data model which supports lists, maps, dates, and Booleans, with nesting
 Indexed semi-structured documents
4.5. Graph Base NoSQL Database
In a graph-based NoSQL database, you will not find the rigid format of SQL or the tables-and-columns
representation; a flexible graph representation is used instead, which is meant to address scalability
concerns. Graph structures are used with edges, nodes, and properties, which provides index-free adjacency.
Data can be easily transformed from one model to the other using a graph-based NoSQL database.
 These databases use edges and nodes to represent and store data.
 The nodes are organized by their relationships with one another, which are represented by edges
between the nodes.
 Both the nodes and the relationships have some defined properties.
The following are some of the features of the graph based database, which are explained on the basis of the
example below:
Labeled, directed, attributed multi-graph: The graph contains nodes that are labeled with
some properties, and these nodes have relationships with one another that are shown by
directional edges. For example, in the following representation, “Alice knows Bob” is shown by an edge that
also has some properties.
While relational database models can replicate graph models, each edge would require a join, which is a
costly proposition.
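The “Alice knows Bob” example can be sketched as a small property graph. The class `PropertyGraph`, its method names, and the `since` property on the edge are hypothetical and chosen only for illustration; each node keeps its own list of outgoing edges, which is the idea behind index-free adjacency.

```python
class PropertyGraph:
    """Directed multi-graph whose nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}     # node id -> properties
        self.outgoing = {}  # node id -> list of (label, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, label, target, **properties):
        # The edge is stored with its source node, so no join is needed later.
        self.outgoing[source].append((label, target, properties))

    def neighbors(self, source, label):
        # Follow the node's own edge list instead of scanning a global table.
        return [target for edge_label, target, _ in self.outgoing[source]
                if edge_label == label]


graph = PropertyGraph()
graph.add_node("alice", name="Alice")
graph.add_node("bob", name="Bob")
graph.add_edge("alice", "knows", "bob", since=2001)  # made-up edge property

known_by_alice = graph.neighbors("alice", "knows")
```

Because the edge is directed, asking whom Bob knows returns an empty list unless a second edge is added in the other direction.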
Fig (4.6) Graph Base NoSQL

4.6. Complexity
The exact positions in the picture above are obviously debatable but I think it serves to illustrate my point: the
key value stores and BigTable clones of the world handle size really well. This is because they have data
models that can easily be partitioned horizontally, which is great for scaling out, for example, simple two-
column data like a whole bunch of username/password pairs.
The drawback however is that by constraining themselves to simpler data models, they’ve pushed complexity
up the stack. So if you have data with a non-trivial structure, then you have to compensate for a simple data
model by adding more complex functionality in the upper layers.
Document databases and graph databases, on the other hand, have opted for richer data models. This means
that they have more powerful abstractions that make it easy to model both simple and complex domains. But
these richer data models introduce more coupling of data and therefore it’s more challenging to get them to
scale to size.
Chapter (5) Closer Look on Famous Products
5. Closer Look on Famous Products

5.1. Redis
2.18.1. QUICK OVERVIEW

Redis (REmote DIctionary Server) is an in-memory, key-value database, commonly referred to as a data
structure server. One of the key differences between Redis and other key-value databases is Redis’s ability to
store and manipulate high-level data types. These data types are fundamental data structures (lists, maps,
sets, and sorted sets) that most developers are familiar with. Redis’s exceptional performance, simplicity, and
atomic manipulation of data structures lend themselves to solving problems that are difficult or perform poorly
when implemented with traditional relational databases.
2.18.2. COMMON USE CASES

 Caching – Due to its high performance, developers have turned to Redis when the volume of read
and write operations exceeds the capabilities of traditional databases. With Redis’s capability to easily
persist the data to disk, it is a superior alternative to the traditional memcached solution for caching.
 Publish and Subscribe – Since version 2.0, Redis provides the capability to distribute data utilizing
the Publish/Subscribe messaging paradigm. Some organizations have moved to Redis and away from
other message queuing systems (e.g., RabbitMQ, ZeroMQ) due to Redis’s simplicity and performance.
 Queues – Projects such as Resque use Redis as the backend for queueing background jobs.
 Counters – Atomic commands such as HINCRBY allow for a simple and thread-safe implementation
of counters. Creating a counter is as simple as determining a name for a key and issuing the HINCRBY
command. There is no need to read the data before incrementing, and there are no database schemas
to update. Since these are atomic operations, the counters will maintain consistency when accessed
from multiple application servers.
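The counter use case can be illustrated without a running server. The sketch below is an in-process stand-in, not a Redis client: the class `HashCounters` is hypothetical, and it imitates the HINCRBY behaviour of treating a missing field as 0 and returning the new value, using a lock to make each increment indivisible.

```python
import threading
from collections import defaultdict


class HashCounters:
    """In-process stand-in for hashes of counters; not a Redis client."""

    def __init__(self):
        self._hashes = defaultdict(dict)
        self._lock = threading.Lock()

    def hincrby(self, key, field, amount=1):
        # The lock makes read, add, and write a single indivisible step.
        with self._lock:
            new_value = self._hashes[key].get(field, 0) + amount
            self._hashes[key][field] = new_value
            return new_value

    def hget(self, key, field):
        with self._lock:
            return self._hashes[key].get(field)


counters = HashCounters()


def record_views(times):
    # Callers never read before incrementing; they only issue increments.
    for _ in range(times):
        counters.hincrby("page:home", "views")


workers = [threading.Thread(target=record_views, args=(1000,)) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

total = counters.hget("page:home", "views")
```

Eight concurrent writers issuing 1000 increments each leave the counter at exactly 8000, because no update can be lost between a read and a write.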
2.18.3. KEY FEATURES

 High-Level Data Structures – Provides five possible data types for values: strings, lists, sets, hashes,
and sorted sets. Operations that are unique to those data types are provided and come with
well documented time-complexity (Big O notation).
 High Performance – Due to its in-memory nature, the project maintainer’s commitment to keeping
complexity at a minimum, and an event-based programming model, Redis boasts
exceptional performance for read and write operations.
 Lightweight With No Dependencies – Written in ANSI C, and has no external dependencies. Works
well in all POSIX environments. Windows is not officially supported, but an experimental build is
provided by Microsoft.
 High Availability – Built-in support for asynchronous, non-blocking, master/slave replication to
ensure high availability of data. A high-availability solution called Redis Sentinel is
currently usable, but is still considered a work in progress.
2.18.4. COMPANIES USING REDIS

 Twitter – Deployed a massive Redis cluster to store the timelines for all users. Utilizing the list data
structure, Twitter stores the 800 most recent incoming tweets for a given user. View the presentation
given by Twitter on how they distribute timelines at scale.
 Pinterest – Stores the user follower graphs in a Redis cluster where data is sharded across hundreds
of instances. Pinterest turned to Redis after finding that their original solution of MySQL and
memcached was reaching its limits. More on how Pinterest is using Redis.
 GitHub – An early adopter of the Redis project, GitHub has developed and open-sourced the
library, Resque, to facilitate the execution of background jobs that have been placed on a queue.
GitHub developers took advantage of the fact that Redis had solved many of the difficult queueing
issues, so the developers could stay focused on the difficult worker scheduling issues. More on
how GitHub uses Redis for their job queueing needs.
2.18.5. TAKE AWAY
 The combination of high-level data structures, high performance, and overall intuitiveness allows Redis
to fill the role of the Swiss Army knife of data stores for a developer.
 Redis is well suited to solving challenges encountered when developing real-time systems, thanks to
the predictability of operations applied to the data in the database.
 Redis attributes much of its performance to the fact that the data always resides in memory. Data
can be persisted to disk, but incorrect configuration could lead to data loss when Redis is shut down
improperly.
2.18.6. Scaling Redis
 Master/slave replication
 Tree of Redis servers
 Non-persistent master can replicate to a persistent slave
 Use slaves for read-only queries
 Run multiple servers per physical host
 Server is single-threaded; multiple servers leverage multiple CPUs
 Optional “virtual memory”
 Ideally data should fit in RAM
 Values (not keys) written to disk
2.18.7. Redis Query Examples
var redis = require("redis"),
    client = redis.createClient();

// if you'd like to select database 3, instead of 0 (default), call
// client.select(3, function() { /* ... */ });

// Log connection and command errors.
client.on("error", function (err) {
    console.log("Error " + err);
});

// Store a plain string value; redis.print logs each reply.
client.set("string key", "string val", redis.print);

// Set two fields of one hash (positional and array argument forms).
client.hset("hash key", "hashtest 1", "some value", redis.print);
client.hset(["hash key", "hashtest 2", "some other value"], redis.print);

// List the field names of the hash, then close the connection.
client.hkeys("hash key", function (err, replies) {
    console.log(replies.length + " replies:");
    replies.forEach(function (reply, i) {
        console.log(" " + i + ": " + reply);
    });
    client.quit();
});
This will display:

mjr:~/work/node_redis (master)$ node example.js
Reply: OK
Reply: 0
Reply: 0
2 replies:
0: hashtest 1
1: hashtest 2
mjr:~/work/node_redis (master)$
5.2. Cassandra
Apache Cassandra is a highly scalable, high-performance distributed database designed to
handle large amounts of data across many commodity servers, providing high availability with no single point
of failure. It is a type of NoSQL database.
2.19.1. What is Apache Cassandra?

Apache Cassandra is an open source, distributed, and decentralized storage system (database) for
managing very large amounts of structured data spread out across the world. It provides a highly available
service with no single point of failure.
Listed below are some of the notable points of Apache Cassandra:
 It is scalable, fault-tolerant, and consistent.

 It is a column-oriented database.
 Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
 Created at Facebook, it differs sharply from relational database management systems.
 Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a
more powerful “column family” data model.
 Cassandra is being used by some of the biggest companies, such as Facebook, Twitter, Cisco,
Rackspace, eBay, Netflix, and more.
2.19.2. Features of Cassandra

Cassandra has become so popular because of its outstanding technical features. Given below are some of
the features of Cassandra:
 Elastic scalability - Cassandra is highly scalable; it allows you to add more hardware to accommodate
more customers and more data as required.
 Always on architecture - Cassandra has no single point of failure and it is continuously available for
business-critical applications that cannot afford a failure.
 Fast linear-scale performance - Cassandra is linearly scalable, i.e., it increases your throughput as you
increase the number of nodes in the cluster. Therefore it maintains a quick response time.
 Flexible data storage - Cassandra accommodates all possible data formats including: structured,
semi-structured, and unstructured. It can dynamically accommodate changes to your data structures
according to your need.
 Easy data distribution - Cassandra provides the flexibility to distribute data where you need by
replicating data across multiple data centers.
 Transaction support - Cassandra offers atomic, isolated, and durable writes with tunable consistency,
rather than full relational-style ACID (Atomicity, Consistency, Isolation, and Durability) transactions.
 Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast
writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.
2.19.3. History of Cassandra

 Cassandra was developed at Facebook for inbox search.
 It was open-sourced by Facebook in July 2008.
 Cassandra was accepted into Apache Incubator in March 2009.
 It became an Apache top-level project in February 2010.
2.19.4. Cassandra Data Model
 Columns – The column is the smallest increment of data in Cassandra. It is a tuple containing a name,
a value and a timestamp.
 A column must have a name, and the name can be a static label (such as "name" or "email") or it can
be dynamically set when the column is created by the application.
 It is not required for a column to have a value.
2.19.5. Column Family
 A column family is similar to a table in that it is a container for columns and rows.
 In Cassandra, we define column families. Column families can (and should) define metadata about
the columns, but the actual columns that make up a row are determined by the client application.
 Each row can have a different set of columns.
 A column family is not entirely schema-less. Each column family should be designed to contain a single
type of data.
2.19.6. Super Column

 A column family can contain either regular columns or super columns.
 A super column adds another level of nesting to the regular column family structure.
 A super column consists of a (super) column name and an ordered map of sub-columns.
 A comparator can be specified for the super column names as well as for the sub-column names.
Fig (5.1) Super Column
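The data model described above can be sketched in a few lines of Python. This is only an illustration of the nesting (column, column family, super column), not Cassandra's storage engine or client API. The names Column, put and put_super are hypothetical, and resolving a name clash by the newest timestamp is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Column:
    """Smallest unit of data: a (name, value, timestamp) tuple."""
    name: str
    value: Optional[str]  # a column is not required to have a value
    timestamp: int        # used in this sketch to pick the newest write


# Column family: row key -> {column name -> Column}
ColumnFamily = Dict[str, Dict[str, Column]]
# Super column family: row key -> super column name -> {sub-column name -> Column}
SuperColumnFamily = Dict[str, Dict[str, Dict[str, Column]]]


def put(cf: ColumnFamily, row_key: str, column: Column) -> None:
    """Store a column in a row; on a name clash the newer timestamp wins."""
    row = cf.setdefault(row_key, {})
    existing = row.get(column.name)
    if existing is None or column.timestamp >= existing.timestamp:
        row[column.name] = column


def put_super(scf: SuperColumnFamily, row_key: str,
              super_name: str, column: Column) -> None:
    """Store a sub-column under a super column of the given row."""
    # One row of a super column family has the same shape as a column family.
    put(scf.setdefault(row_key, {}), super_name, column)


# Each row can have a different set of columns.
users: ColumnFamily = {}
put(users, "alice", Column("email", "alice@example.com", timestamp=1))
put(users, "alice", Column("email", "alice@example.org", timestamp=2))
put(users, "bob", Column("newsletter", None, timestamp=1))

# Super columns add one more level of nesting.
addresses: SuperColumnFamily = {}
put_super(addresses, "alice", "home", Column("city", "Mansoura", timestamp=1))
put_super(addresses, "alice", "work", Column("city", "Cairo", timestamp=1))
```

Sorting the keys of a row, for example sorted(addresses["alice"]), stands in for the comparator that orders column names.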
2.19.7. Cassandra Advantages

 Tunable consistency.
 Decentralized.
 Writes are faster than reads.
 No Single point of failure.
 Incremental scalability.
 Uses consistent hashing (logical partitioning) when clustered.
 Peer to peer routing (ring).
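Consistent hashing, mentioned in the list above, can be illustrated with a small ring. The sketch below is a generic textbook version under stated assumptions, not Cassandra's partitioner: the class name HashRing, the use of MD5, and the number of virtual nodes per server are all choices made for this example.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._tokens: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place several virtual nodes for this server on the ring."""
        for i in range(self._vnodes):
            token = _token(f"{node}#{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def remove_node(self, node: str) -> None:
        """Drop all ring positions owned by this server."""
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next ring token."""
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in moved: when a server joins, only the keys that now belong to the new server change owner, which is what makes incremental scaling cheap.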
5.3. MongoDB
MongoDB is a document-oriented database and is currently the most popular NoSQL database on the
market. Let’s take a quick look at MongoDB, its main features, and how it’s being used.
2.20.1. WHAT’S MONGODB?
Just as relational databases are based on tables, document-oriented databases such as MongoDB
are based on collections of documents, with each document consisting of key/value attributes. A
single document can be thought of as the equivalent of a row in a table, with each key being similar to a
column name and each key’s value being similar to the respective row’s value. The main difference is
that a document is not constrained to a certain schema or set of columns. Two documents can
share some elements, such as an ID field, while otherwise having completely different elements. For instance, in an
inventory system all items will have a price, but automotive parts will have additional attributes that are
completely different from the additional attributes of shoes. Because MongoDB allows
dynamic schema changes, new fields can be supported without remodeling the existing
database, for example when adding a new inventory product with its own unique set
of attributes. Additionally, the hierarchy of documents maps easily to object hierarchies within application
code, simplifying create, read, update, and delete operations.
MongoDB aims to provide these capabilities without sacrificing performance, high availability, or
scalability. Its replication (mirroring) and auto-sharding features allow it to grow as needs and data
change over time, and for some workloads it can outperform traditional RDBMSs.

 Content Management and Delivery – Manage a diverse product catalog of content in a single data
store that allows for quick changes and rapid response times without additional complexity from content
retrieval systems.
 Mobile and Social Infrastructure – MongoDB provides a high-availability, low-latency, agile, and
scalable platform that allows for geospatial capabilities, real-time analytics, and global availability.
 Customer Data Management – Utilize rich querying capabilities for real-time analytics on massive
user bases with complex data models by using dynamic schemas and auto-sharding for horizontal scaling.

 Dynamic Schemas – MongoDB’s dynamic schema provides a simple way to incorporate changes as
application requirements change. These changes can be made to the database without affecting existing
data or application code and without incurring downtime.
 Operational Intelligence – MongoDB’s native map/reduce and aggregation framework provides
insights in real-time for applications, going beyond the capabilities of batch analytics technologies like
Hadoop and traditional BI tools.
 Deployment Flexibility – MongoDB was built to work with commodity hardware and cloud
architectures. Data is localized for queries to ensure performance is robust and predictable regardless of
deployment size.
 Simple Scale-Out – MongoDB is designed to be scaled across server clusters. As data volumes grow,
organizations can simply add more nodes to their clusters, and MongoDB will balance the data seamlessly
and automatically in the background.
 Rich Querying – MongoDB supports a full query language and primary and secondary indexing, as
well as full text search with Google-like syntax.
2.20.4. COMPANIES USING MONGODB
MongoDB already serves many Fortune 500 and Global 500 companies across financial, government,
healthcare, media and entertainment, retail, and telecommunications industries. Here are a few examples of
how some companies have incorporated MongoDB into their business:
 Forbes – Used MongoDB to aggregate and integrate dynamic content from their static and siloed
data stores in order to update and control content on their website. Because MongoDB is open
source, they were able to do so with minimal funding or additional staffing.
 MetLife – Used MongoDB as the data engine for “The Wall,” an innovative customer service
application similar to the Facebook user interface, which provides a consolidated view of MetLife
customers across all lines of business. A prototype was built in two weeks and was live in the U.S. in 90
days.
 CERN – Created a data aggregation system built on MongoDB to search and aggregate
information from a wide variety of sources into a consistent, data-agnostic form. This allows thousands
of users to quickly run free-form queries against terabytes of metadata, returning tens of thousands of
results.
 Under Armour – Leveraged MongoDB at the heart of the ecommerce platform to enable a flexible
data infrastructure for rapidly changing demands of the business, as well as meet their disaster recovery
and scalability needs through multi-data center replication and sharding.
2.20.5. TAKE AWAY

 MongoDB is the leading NoSQL database on the market and uses a document-oriented data model.
 MongoDB supports the rich querying of datasets large or small with fast response times.
 Using dynamic schemas, MongoDB enables agile development and flexibility for application or
business requirement changes.
 MongoDB emphasizes performance, scalability, and high availability and in many cases exceeds
traditional databases in these areas.
 MongoDB is a proven solution for a wide range of business requirements to companies across
numerous industries.
2.20.6. MongoDB Query Examples
Find a professor who is available at 6 pm on a Monday:

{
  company: "Faculty of Computers and Information Sciences, Mansoura University",
  officeHours: {
    $elemMatch: {
      "dayOfWeek": "Monday",
      "open": {$lte: 1800},
      "close": {$gte: 1800}
    }
  }
}

// Java driver: qbeObject holds the query document above
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
    DBObject o = cursor.next();
    // process the matching document here
}
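The query above depends on $elemMatch, which requires that a single array element satisfies all of the listed conditions at once. The following pure-Python sketch mimics that behaviour on in-memory data to make the semantics concrete. It does not use the MongoDB driver; the helper names and the sample professors are hypothetical.

```python
from typing import Any, Dict, List

# Only the two comparison operators used in the example query are modelled.
OPERATORS = {
    "$lte": lambda value, bound: value <= bound,
    "$gte": lambda value, bound: value >= bound,
}


def field_matches(value: Any, condition: Any) -> bool:
    """A condition is either an operator document or a literal to compare with."""
    if isinstance(condition, dict):
        return all(OPERATORS[op](value, bound) for op, bound in condition.items())
    return value == condition


def elem_match(array: List[Dict[str, Any]], conditions: Dict[str, Any]) -> bool:
    """True if at least one element satisfies every condition together."""
    return any(
        all(field in element and field_matches(element[field], cond)
            for field, cond in conditions.items())
        for element in array
    )


# Hypothetical sample data, times encoded as HHMM integers as in the query.
professors = [
    {"name": "Prof. A", "officeHours": [
        {"dayOfWeek": "Monday", "open": 1700, "close": 1900}]},
    {"name": "Prof. B", "officeHours": [
        {"dayOfWeek": "Monday", "open": 900, "close": 1200},
        {"dayOfWeek": "Tuesday", "open": 1700, "close": 1900}]},
]

conditions = {"dayOfWeek": "Monday", "open": {"$lte": 1800}, "close": {"$gte": 1800}}
available = [p["name"] for p in professors
             if elem_match(p["officeHours"], conditions)]
```

Prof. B is not selected even though one entry is on Monday and another is open at 18:00, because no single office-hours entry meets all three conditions.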
5.4. Neo4j
Neo4j is a database designed around relationships. Instead of fixed relations between tables, storage and
retrieval are based on the interconnections between data. Neo4j is the most used graph database, a category
of NoSQL databases. It is open source and backed by Neo Technology. In a graph database, data points are
arbitrarily connected to each other, and both the points (called nodes) and the directional relationships have
properties that describe them. This is different from traditional relational databases, where tables of data all
have the same form and fixed relationships to other tables of data, and the relationships themselves do not
have any properties.
Any time your data is semi-structured, large in volume, or highly associative, you can leverage graph search. The
standout use cases are:
 Social Networks – People and the relationships between them are perfectly suited to graph
databases. Queries that would cripple a relational system like second degree friends (friends of friends)
are straightforward in a graph database and Neo4j’s Cypher query language is easy to read and
understand.
 Recommendation Engines – With a graph database you can use your connected data to benefit your
customers. Neo4j offers the power and speed to keep up with the growth of your data.
 Geo Routing – Imagine modeling the classic traveling salesman problem of trying to visit each city on
a map exactly once with the shortest route. The data model is a map of cities and the routes between
them, which is already a graph. Applications like routing inventory, scheduling tours and geospatial analysis
are easier for developers and business owners to understand in Neo4j than in a relational database.
Performance can also go from “minutes to milliseconds” because retrieval does not rely on extra index
lookups.
 Reliable – Fully ACID compliant, scalable to billions of nodes, distributed options. ACID (Atomicity,
Consistency, Isolation, Durability) compliance is a guarantee of data reliability. Operational reliability is
attained through online backups and clustering.
 Framework support – While implemented in Java there are drivers for all major frameworks: Node.js,
Scala, Java, Spring, Ruby On Rails, Ruby, PHP, .NET, Python, and many others. There is also a REST API so
that anything capable of making HTTP requests can access a Neo4j database.
 Dual License – The dual license model gives a lot of flexibility for using Neo4j. There is a GPL for open
source projects so there’s a big community of support. Commercial licenses allow for closed source
projects, are free for personal use, add production support, and scale with enterprise size.
2.21.3. COMPANIES USING NEO4J

 Glassdoor – Uses Neo4j to power their Inside Connections system to search users’ Facebook network
to find out who they know within a company and to give them more relevant job recommendations.
 Cisco Systems – Models their entire sales organization in a versioned graph database. They use Neo4j
to run monitoring algorithms in real time to monitor performance and detect conflicts.
 Gamesys Ltd – This UK based company uses Neo4j to track player interactions and referral bonuses.
2.21.4. TAKE AWAY
Graph databases are well suited to model highly associative data such as relationships between people or
products. Graph queries using Neo4j’s Cypher language can be orders of magnitude faster than a relational
database modeling the same data.
Neo4j’s graph database protects data integrity and is fully ACID compliant.
With its extensive framework support and REST API, it is simple to integrate with existing applications and
architectures.
2.21.5. Cypher query language for Neo4j

START a = node(*)
MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name

a.name            m.title                 d.name
"Keanu Reeves"    "The Matrix"            "Andy Wachowski"
"Keanu Reeves"    "The Matrix Reloaded"   "Andy Wachowski"
"Noah Wyle"       "A Few Good Men"        "Rob Reiner"
"Tom Hanks"       "Cloud Atlas"           "Andy Wachowski"
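To show what the pattern in the query above asks for, the sketch below evaluates the same shape (an actor and a director pointing at the same movie) over a plain list of edges in Python. It is an illustration of the matching idea only, not how Neo4j executes Cypher; the function name and the small edge list are made up for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node, relationship type, target node)

# Hypothetical miniature graph using two rows from the result above.
edges: List[Edge] = [
    ("Keanu Reeves", "ACTED_IN", "The Matrix"),
    ("Andy Wachowski", "DIRECTED", "The Matrix"),
    ("Noah Wyle", "ACTED_IN", "A Few Good Men"),
    ("Rob Reiner", "DIRECTED", "A Few Good Men"),
]


def match_actor_movie_director(graph: List[Edge]) -> List[Tuple[str, str, str]]:
    """Return (actor, movie, director) for (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)."""
    # Index incoming DIRECTED relationships by movie.
    directors_of: Dict[str, List[str]] = {}
    for source, rel_type, target in graph:
        if rel_type == "DIRECTED":
            directors_of.setdefault(target, []).append(source)

    rows = []
    for source, rel_type, target in graph:
        if rel_type == "ACTED_IN":
            for director in directors_of.get(target, []):
                rows.append((source, target, director))
    return rows


rows = match_actor_movie_director(edges)
```

A graph database follows these relationships directly from node to node, whereas a relational schema would typically need joins through link tables to answer the same question.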

5.5. SQL vs NoSQL
SQL (relational) versus NoSQL scalability is a controversial topic. This paper argues against both
extremes. Here is some more background to support this position. The argument for relational over NoSQL
goes something like this:
• If new relational systems can do everything that a NoSQL system can, with analogous performance
and scalability, and with the convenience of transactions and SQL, why would you choose a NoSQL system?
• Relational DBMSs have taken and retained majority market share over other competitors in the past 30
years: network, object, and XML DBMSs.
• Successful relational DBMSs have been built to handle other specific application loads in the past:
read-only or read-mostly data warehousing, OLTP on multi-core multi-disk CPUs, in-memory databases,
distributed databases, and now horizontally scaled databases.
• While we don’t see “one size fits all” in the SQL products themselves, we do see a common
interface with SQL, transactions, and relational schema that give advantages in training,
continuity, and data interchange.
The counter-argument for NoSQL goes something like this:
• We haven’t yet seen good benchmarks showing that RDBMSs can achieve scaling comparable
with NoSQL systems like Google’s BigTable.
• If you only require a lookup of objects based on a single key, then a key-value store is adequate and
probably easier to understand than a relational DBMS. Likewise for a document store on a simple
application: you only pay the learning curve for the level of complexity you require.
• Some applications require a flexible schema, allowing each object in a collection to have
different attributes. While some RDBMSs allow efficient “packing” of tuples with missing attributes, and some
allow adding new attributes at runtime, this is uncommon.
• A relational DBMS makes “expensive” (multi-node, multi-table) operations “too easy”. NoSQL
systems make them impossible or obviously expensive for programmers.
• While RDBMSs have maintained majority market share over the years, other products have
established smaller but non-trivial markets in areas where there is a need for particular capabilities, e.g.
indexed objects with products like BerkeleyDB, or graph-following operations with object-oriented DBMSs.
Table (6.1) NoSQL Vs SQL
5.6. Challenges of NoSQL
1. Less mature
RDBMSs have been around far longer than NoSQL databases; the first commercial RDBMSs reached the
market in the late 1970s and early 1980s. While proponents of NoSQL may present this as a disadvantage,
citing age as an indicator of obsolescence, over the years RDBMSs have matured into richly
functional and stable systems.
In contrast, most of the NoSQL database alternatives have just barely made it out of the pre-production
stages, and there are many important features that have not yet been implemented. It’s an exciting
prospect for a developer to be teetering on the cutting edge of technology, but caution must be exercised
to avoid any disastrous consequences.
2. Less support
All enterprises need the reassurance that, should a key function within their data management
system fail, they will have access to competent support in a timely manner. All the RDBMS vendors have
made great efforts to ensure that such services are available, and enterprises can also enlist 24-hour support
from remote database administration services, which have the expertise to handle most RDBMSs.
Each NoSQL database in contrast tends to be open-source, with just one or two firms handling the support
angle. Many of them have been developed by smaller startups which lack the resources to fund support on
a global scale, and also the credibility that the established RDBMS vendors like Oracle, IBM and Microsoft
enjoy.
3. Business intelligence and analytics

NoSQL databases were created with the demands of modern Web 2.0 applications in mind. As
such, most features are directed at meeting these demands. Where the demands of a data application extend
beyond the characteristic ‘insert-read-update-delete’ cycle of a typical web app, these databases offer few
features for ad-hoc query and analysis.
Even simple queries require some programming knowledge, and the most common business intelligence tools
that many enterprises rely on do not offer connectivity to NoSQL databases. However, this may be solved
in time, as tools such as Pig and Hive have been created to offer ad-hoc query functionality for
NoSQL databases.
4. Administration
The end goal for NoSQL database design was to offer a solution that would require no administration, but
the reality on the ground is much different. NoSQL databases still demand a lot of technical skill with both
installation and maintenance.
5. No advanced expertise
Because NoSQL databases are still new, virtually every NoSQL developer is still learning the ropes,
unlike RDBMSs, which have millions of proficient developers throughout the market and in every
field of trade. Over time, this situation will resolve itself, but presently, it remains easier to find an RDBMS
expert than a NoSQL expert.
Any organization that wants to implement NoSQL solutions needs to proceed with caution, bearing in mind
the above limitations in addition to understanding the benefits that NoSQL databases offer over their relational
counterparts.
6. Dynamic Schemas
Relational databases require that schemas be defined before you can add data. For example, you might
want to store data about your customers such as phone numbers, first and last name, address, city and
state – a SQL database needs to know what you are storing in advance.
This fits poorly with agile development approaches, because each time you complete new features, the
schema of your database often needs to change. So if you decide, a few iterations into development, that
you'd like to store customers' favorite items in addition to their addresses and phone numbers, you'll need
to add that column to the database, and then migrate the entire database to the new schema.
If the database is large, this is a very slow process that involves significant downtime. If you are frequently
changing the data your application stores – because you are iterating rapidly – this downtime may also be
frequent. There's also no way, using a relational database, to effectively address data that's completely
unstructured or unknown in advance.
NoSQL databases are built to allow the insertion of data without a predefined schema. That makes it easy
to make significant application changes in real-time, without worrying about service interruptions – which
means development is faster, code integration is more reliable, and less database administrator time is
needed. Developers have typically had to add application-side code to enforce data quality controls, such
as mandating the presence of specific fields, data types or permissible values. More sophisticated NoSQL
databases allow validation rules to be applied within the database, allowing users to enforce governance
across data, while maintaining the agility benefits of a dynamic schema.
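A minimal sketch of the idea above: documents with different shapes are stored side by side, while a small application-side check enforces the fields that must always be present. The in-memory list stands in for a document collection, and the rule set REQUIRED_FIELDS and the function names are hypothetical, not the API of any particular NoSQL product.

```python
from typing import Any, Dict, List

# Hypothetical governance rule: every customer needs these fields and types.
REQUIRED_FIELDS = {"name": str, "phone": str}

customers: List[Dict[str, Any]] = []  # stands in for a document collection


def validate(document: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; extra fields are always allowed."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in document:
            errors.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def insert(document: Dict[str, Any]) -> None:
    """Insert a document, rejecting it only if a required field is invalid."""
    errors = validate(document)
    if errors:
        raise ValueError("; ".join(errors))
    customers.append(document)


# The original shape of a customer record.
insert({"name": "Ada", "phone": "555-0100"})
# A later iteration adds favorite items without migrating existing records.
insert({"name": "Grace", "phone": "555-0101", "favorite_items": ["keyboard"]})
```

Existing records are left untouched when the new attribute appears, which is the agility benefit described above; the validation step is what keeps that flexibility from eroding data quality.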
7. Auto-Sharding
Because of the way they are structured, relational databases usually scale vertically – a single server has to
host the entire database to ensure acceptable performance for cross- table joins and transactions. This gets
expensive quickly, places limits on scale, and creates a relatively small number of failure points for database
infrastructure. The solution to support rapidly growing applications is to scale horizontally, by adding servers
instead of concentrating more capacity in a single server.
'Sharding' a database across many server instances can be achieved with SQL databases, but usually is
accomplished through SANs and other complex arrangements for making hardware act as a single server.
Because the database does not provide this ability natively, development teams take on the work of
deploying multiple relational databases across a number of machines. Data is stored in each database
instance autonomously. Application code is developed to distribute the data, distribute queries, and
aggregate the results of data across all of the database instances. Additional code must be developed to
handle resource failures, to perform joins across the different databases, for data rebalancing, replication,
and other requirements. Furthermore, many benefits of the relational database, such as transactional
integrity, are compromised or eliminated when employing manual sharding.
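The application-side work described above can be sketched as follows. Each dictionary stands in for one independent database instance, and the application itself routes writes, routes key lookups, and gathers query results from every shard. The class name ShardedStore and the hash-modulo routing are assumptions of this sketch; rebalancing, replication and failure handling are deliberately left out.

```python
import hashlib
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class ShardedStore:
    """Hypothetical manual sharding layer living in application code."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for one autonomous database instance.
        self.shards: List[Dict[str, Record]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, Record]:
        """Route a key to a shard with a stable hash (hash modulo shard count)."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: str, record: Record) -> None:
        self._shard_for(key)[key] = record

    def get(self, key: str) -> Optional[Record]:
        return self._shard_for(key).get(key)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Scatter the query to every shard, then gather the partial results."""
        results: List[Record] = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if predicate(r))
        return results


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"customer:{i}", {"id": i, "city": "Cairo" if i % 2 else "Mansoura"})

cairo_customers = store.query(lambda record: record["city"] == "Cairo")
```

Note that changing shard_count would remap most keys under this simple routing, which is one reason data rebalancing becomes extra application work in a manually sharded setup.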
NoSQL databases, on the other hand, usually support auto-sharding, meaning that they natively and
automatically spread data across an arbitrary number of servers, without requiring the application to even
be aware of the composition of the server pool. Data and query load are automatically balanced across
servers, and when a server goes down, it can be quickly and transparently replaced with no application
disruption.
Cloud computing makes this significantly easier, with providers such as Amazon Web Services providing
virtually unlimited capacity on demand, and taking care of all the necessary infrastructure administration
tasks. Developers no longer need to construct complex, expensive platforms to support their applications,
and can concentrate on writing application code. Commodity servers can provide the same processing and
storage capabilities as a single high-end server for a fraction of the price.
8. Replication
Most NoSQL databases also support automatic database replication to maintain availability in the event of
outages or planned maintenance events. More sophisticated NoSQL databases are fully self-healing,
offering automated failover and recovery, as well as the ability to distribute the database across multiple
geographic regions to withstand regional failures and enable data localization. Unlike relational databases,
NoSQL databases generally have no requirement for separate applications or expensive add-ons to
implement replication.
9. Integrated Caching
A number of products provide a caching tier for SQL database systems. These systems can improve read
performance substantially, but they do not improve write performance, and they add operational
complexity to system deployments. If your application is dominated by reads then a distributed cache could
be considered, but if your application has just a modest write volume, then a distributed cache may not
improve the overall experience of your end users, and will add complexity in managing cache invalidation.
Many NoSQL database technologies have excellent integrated caching capabilities, keeping frequently used
data in system memory as much as possible and removing the need for a separate caching layer. Some
NoSQL databases also offer a fully managed, integrated in-memory database management layer for
workloads demanding the highest throughput and lowest latency.
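The separate caching tier discussed above is commonly wired in with a cache-aside pattern. The sketch below, with a plain dictionary standing in for the database and a hypothetical CacheAside class, shows why reads benefit while every write still reaches the database and additionally has to invalidate the cached copy.

```python
from typing import Any, Dict, Optional


class CacheAside:
    """Hypothetical cache-aside layer in front of a slower database."""

    def __init__(self, database: Dict[str, Any]) -> None:
        self.database = database          # stands in for the SQL database
        self.cache: Dict[str, Any] = {}   # stands in for the caching tier
        self.database_reads = 0           # counts how often the cache was missed

    def read(self, key: str) -> Optional[Any]:
        """Serve from the cache when possible, otherwise load and remember."""
        if key in self.cache:
            return self.cache[key]
        self.database_reads += 1
        value = self.database.get(key)
        self.cache[key] = value
        return value

    def write(self, key: str, value: Any) -> None:
        """Writes always hit the database; the cached copy must be invalidated."""
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAside({"price:42": 10})
first = store.read("price:42")    # cache miss, goes to the database
second = store.read("price:42")   # cache hit
store.write("price:42", 12)       # database write plus invalidation
third = store.read("price:42")    # miss again, reloads the new value
```

Forgetting the invalidation step in write would leave stale data in the cache, which is the operational complexity the text refers to.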
7. Conclusion
The aim of this research is to give a thorough overview of and introduction to the NoSQL database movement,
which has appeared in recent years to provide alternatives to the predominant relational database
management systems. Chapter 2 discussed database data models and the CAP theorem, which applies to any
distributed system. The motives behind the movement can be summarized as the need for high scalability,
the processing of large amounts of data, the ability to distribute data among many (often commodity) servers,
and consequently a distribution-aware design of DBMSs (instead of adding such facilities on top), as well as
smooth integration with programming languages and their data structures (instead of, e.g., costly
object-relational mapping). As shown in chapter 2, relational DBMSs have certain flaws and limitations
regarding these requirements, as they were designed at a time when hardware (especially main memory) was
expensive and full dynamic querying was expected to be the most important use case. The situation today is
very different, so a complete redesign of database management systems is suggested. Because of the
limitations of relational DBMSs and today’s needs, a wide range of non-relational datastores has emerged.
Chapter 3 introduced the definition of NoSQL, a list of NoSQL databases, BASE transactions, and the
comparison of ACID and BASE, in order to address consistency, partitioning, storage layout, querying, and
distributed data processing. The important points are the characteristics of NoSQL databases that bear on
data integrity, such as strong consistency, high availability, and partition tolerance, the variety of NoSQL
databases, BASE transactions, and how the world’s largest data systems, such as Google’s BigTable,
Amazon’s Dynamo, and those at Facebook, deal with them.
Chapter 4 examined key-value stores as a first class of NoSQL databases, followed by document stores,
column stores, and graph databases. Most of these datastores borrow heavily from Amazon’s Dynamo, a
proprietary, fully distributed, eventually consistent key-value store which has been discussed in detail in
this research. The chapter also looked at query complexity and database size as factors that help to
identify which NoSQL database is suitable for a given need.
Chapter 5 took a closer look at well-known products. Document stores were examined through MongoDB
as a major representative of this class of NoSQL databases. These document stores provide the
abstraction of documents, which are flat or nested namespaces for key-value pairs. Redis is a key-value store
with very fast in-memory operations (about 100K operations per second on entry-level hardware); typical
uses of Redis were also covered. Cassandra is a column store; we illustrated the super column as a
distinctive feature of Cassandra that adds a new concept to the Google BigTable model. Neo4j is a graph
database with a high-performance traversal API; we also presented its Cypher query language, which can be
used to find paths through the data.
B. List of abbreviations
2PC Two-phase commit
ACID Atomicity Consistency Isolation Durability
ACM Association for Computing Machinery
ADO ActiveX Data Objects
aka also known as
API Application Programming Interface
BASE Basically Available, Soft-State, Eventual Consistency
BBC British Broadcasting Corporation
BDB Berkeley Database
BLOB Binary Large Object
CAP Consistency, Availability, Partition Tolerance
CEO Chief Executive Officer
CPU Central Processing Unit
CS Computer Science
CTA Constrained tree application
CTO Chief Technology Officer
DNS Domain Name System
DOI Digital Object Identifier
DBA Database administrator
EAV store Entity-Attribute-Value store
EC2 (Amazon’s) Elastic Compute Cloud
EJB Enterprise JavaBeans
Erlang OTP Erlang Open Telecom Platform
E-R Model Entity-Relationship Model
FIFO First in, first out
GFS Google File System
GPL GNU General Public License
HA High Availability
HDFS Hadoop Distributed File System
HQL Hypertable Query Language
IBM International Business Machines
IEEE Institute of Electrical and Electronics Engineers
IETF Internet Engineering Task Force
IO Input Output
IP Internet Protocol
JDBC Java Database Connectivity
JPA Java Persistence API
JSON JavaScript Object Notation
JSR Java Specification Request
LINQ Language Integrated Query

LRU Least recently used
LSM(-Tree) Log Structured Merge (Tree)
LZW Lempel-Ziv-Welch
MIT Massachusetts Institute of Technology
MVCC Multi-version concurrency control
(Java) NIO (Java) New I/O
ODBC Open Database Connectivity
OLAP Online Analytical Processing
OLTP Online Transaction Processing
OSCON O’Reilly Open Source Convention
OSS Open Source Software
PCRE Perl-compatible Regular Expressions
PNUTS (Yahoo!’s) Platform for Nimble Universal Table Storage
PODC Principles of distributed computing (ACM symposium)
RAC (Oracle) Real Application Cluster
RAM Random Access Memory
RDF Resource Description Framework
RDS (Amazon) Relational Database Service
RDBMS Relational Database Management System
REST Representational State Transfer
RPC Remote Procedure Call
RoR Ruby on Rails
RYOW Read your own writes (consistency model)
(Amazon)S3 (Amazon) Simple Storage Service
SAN Storage Area Network
SLA Service Level Agreement
SMP Symmetric multiprocessing
SPOF Single point of failure
SSD Solid State Disk
SQL Structured Query Language
TCP Transmission Control Protocol
TPC Transaction Processing Performance Council
US United States
URI Uniform Resource Identifier
URL Uniform Resource Locator
XML Extensible Markup Language
