Unit 5

Need for NoSQL databases
Problems in RDBMS:
Relational databases have dominated the database industry for many years, but this way of
storing data is being challenged by new business requirements. With ever-increasing user
growth and the huge amounts of data being generated, relational databases struggle to provide
on-demand scalability. This is because they are primarily engineered to run on a single server
machine, so scaling up has traditionally been the main option.
1) Scaling-Up is Expensive
Scaling up means upgrading your servers by adding more processors, RAM and
hard disks to a machine so that it can handle more load. Whenever your business
grows enough to max out server capacity, you have to replace or upgrade your
database server machines to accommodate more throughput (reads and writes).
This results in downtime. Also, in "less-than-peak-load" conditions,
you are under-utilizing your server machines and incurring unnecessary initial costs.
2) Downtime
Downtime refers to periods when a system is unavailable. Replacing or upgrading a
database server machine takes the database offline while the work is done.
3) Rigid Data Models
A relational database stores data in a fixed and pre-defined structure. This leads to a rigid
data model where each schema change comes at a cost, because before you begin
development you have to define your data schema in terms of tables and columns.
Any change in the data model requires a change in the schema, which means creating
new columns, defining new relations, reflecting the changes in your application,
discussing them with your database administrators, etc. While possible, this process runs
against agile practice, where new business requirements need a fast implementation.
NoSQL
A NoSQL database provides considerable data flexibility, scales out as your application scales
out and is designed for very high database uptime. These advantages can be addressed under
three headings:
● Data Modeling
● Scalability
● High Availability
1. Data Modeling in a NoSQL Database
All businesses have a knack for adding new demands and features to their
applications after deployment. Such requirement changes are not just feature requests
but entire changes in business process as well. Innovation in any industry is not just a
buzzword anymore, it's a necessity, and competition demands rapid adaptability to
new requirements.
NoSQL is built to support exactly such fast-moving requirements, whereas a relational
database is inherently slow to adapt to such changes because of its rigid data structure.
This makes NoSQL databases a good fit for agile development.
Flexible and Fluid Data Models

A document-oriented NoSQL database stores data in key-value pairs where the "key" is just a
unique identifier and the "value" is a JSON document. The closest equivalent in a
relational database is a row. A row in an RDBMS is just a flat data structure where
the data is divided into columns. The JSON format, however, is much richer than a flat
data structure: values can themselves be nested objects and arrays. The following diagram
makes the comparison.
The diagram shows a single row of user data converted to a JSON document. In it, the column
names become keys inside the JSON document and the values are retained as is.
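As a minimal sketch of the conversion described above, the following plain JavaScript turns a relational-style row (column names plus values) into a JSON document. The column names, the sample values and the helper name rowToDocument are invented for illustration; they are not taken from the original diagram.

```javascript
// Hypothetical column names and one row of values, as a relational table might hold them.
const columns = ["id", "first_name", "last_name", "city"];
const row = [101, "Asha", "Rao", "Pune"];

// Convert the row into a document: each column name becomes a key,
// and each value is carried over unchanged.
function rowToDocument(columnNames, values) {
  const doc = {};
  columnNames.forEach((name, i) => {
    doc[name] = values[i];
  });
  return doc;
}

const userDoc = rowToDocument(columns, row);
console.log(JSON.stringify(userDoc, null, 2));
```

Unlike the row, the resulting document could later gain nested objects or arrays without any change to other documents.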
By contrast, adding new information to the user data in a relational database requires a schema
change that touches the whole table (i.e. all of its rows). This hurts performance on production
servers and, with it, every other application that depends on those servers.
NoSQL, on the other hand, is far more flexible. Instead of requiring all the column names to be
defined before inserting data, a NoSQL database imposes few or no schema restrictions. If your
business demands a change or addition to the data from now on, simply add the new fields to new
JSON documents.
With a NoSQL database, your data schema is flexible, which supports the speed that
agile development demands. Your application can accept new changes as the business requires
them, without defining them in advance.
JSON: A Rich Way to Store Data
Storing data in the JSON format not only provides schema flexibility, it is also a more natural
way to store data than in rows and columns. For example, a relational database stores data in
pieces. Suppose you need to store a USER object which has multiple Hobbies and also has a
few Shipping Addresses. The relational way to store such information would be as follows.
This data is naturally meant to be stored together since the values are accessed, used and created
together. So it's safe to say that storing information in rows and columns is highly structured and
therefore unnatural. The natural way would have been to store this information as one and group
them together. This would allow your applications to access this information without any
complex queries and JOINS.
Such flexibility with fluid data models allows you to mold your data according to your
application needs and also helps you adapt to new changes per requirements of the business.
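The contrast between storing a USER "in pieces" and storing it as one document can be sketched in plain JavaScript. All table names, field names and values below are hypothetical; the arrays merely stand in for relational tables and the object for a JSON document.

```javascript
// Relational-style storage: three separate "tables" linked by user_id (hypothetical data).
const users = [{ user_id: 1, name: "Asha" }];
const hobbies = [
  { user_id: 1, hobby: "chess" },
  { user_id: 1, hobby: "cycling" },
];
const addresses = [
  { user_id: 1, city: "Pune", zip: "411001" },
  { user_id: 1, city: "Mumbai", zip: "400001" },
];

// Reassembling one user needs a join-like lookup across all three tables.
function loadUserRelational(userId) {
  const user = users.find((u) => u.user_id === userId);
  return {
    name: user.name,
    hobbies: hobbies.filter((h) => h.user_id === userId).map((h) => h.hobby),
    shippingAddresses: addresses
      .filter((a) => a.user_id === userId)
      .map(({ city, zip }) => ({ city, zip })),
  };
}

// Document-style storage: the same information kept together in one document.
const userDocument = {
  name: "Asha",
  hobbies: ["chess", "cycling"],
  shippingAddresses: [
    { city: "Pune", zip: "411001" },
    { city: "Mumbai", zip: "400001" },
  ],
};

// Both paths describe the same user, but the document needs no join to read.
console.log(JSON.stringify(loadUserRelational(1)) === JSON.stringify(userDocument));
```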
2. Distributed Mindset: The Core of a NoSQL database
NoSQL databases break the traditional mindset of storing data at a single location. Instead,
NoSQL distributes and stores data over a set of multiple servers. This distribution of data helps
the NoSQL database server to distribute the load on the database tier. This distributed data
approach also means the system can scale out rather than just scale up.
Scaling out without limits
One of the main advantages of a NoSQL database over a relational database is the ability to scale
out. Scaling out means adding more servers to your system to distribute the load. That is, when
your current server machines (application or database) reach their maximum capacity, you do not
stop operations to upgrade your server machines; instead you add more nodes. Such scalability
also saves money by using less-costly commodity hardware rather than expensive multi-core
heavy duty machines.
As seen from the above figure, adding servers only when required helps to keep the operating
cost low. In principle the cluster can keep growing, and many NoSQL systems aim for near-linear
scalability: as you add NoSQL database servers, your transaction capacity increases roughly in
proportion. This helps you to handle huge amounts of data beyond the limits of a single machine.
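One common way to spread data over several servers is to hash each key and use the result to pick a node. The sketch below is only an illustration under that assumption: the node names, the key format and the simple modulo scheme are invented here, and real NoSQL systems typically use more elaborate schemes (for example consistent hashing) so that adding a node moves fewer keys.

```javascript
// Map a key to one of the available nodes with a simple (non-cryptographic) hash.
function nodeFor(key, nodes) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return nodes[hash % nodes.length];
}

// Count how many keys land on each node.
function distribute(keys, nodes) {
  const counts = Object.fromEntries(nodes.map((n) => [n, 0]));
  for (const key of keys) {
    counts[nodeFor(key, nodes)] += 1;
  }
  return counts;
}

// 1000 hypothetical keys, first on two nodes, then on four.
const keys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
console.log(distribute(keys, ["node-a", "node-b"]));
console.log(distribute(keys, ["node-a", "node-b", "node-c", "node-d"]));
```

With more nodes, each node is responsible for a smaller share of the keys, which is the effect that scaling out relies on.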
3. High Availability (100% uptime)
Other than providing scalability, NoSQL databases also provide high availability out of the box.
NoSQL architecture supports data replication that happens near real time. If a primary server
goes down, the replica server automatically takes over and your client applications work
seamlessly. There is practically no limit to the number of replicas a node can have.
Furthermore, NoSQL databases allow you to configure your database to span multiple data
centers. This way, if any data center completely crashes, other replica machines can take over.
These replica machines can be configured to serve read requests from client applications.
Typically, the closest replica is automatically selected by the client application based on a
"nearest-path" algorithm.
Features of NoSQL
Non-relational
● NoSQL databases do not follow the relational model
● Do not provide tables with flat fixed-column records
● Work with self-contained aggregates or BLOBs
● Do not require object-relational mapping or data normalization
● No complex features like query languages, query planners, referential integrity, joins or
ACID
Schema-free
● NoSQL databases are either schema-free or have relaxed schemas
● Do not require any sort of definition of the schema of the data
● Offer heterogeneous structures of data in the same domain
● Shared-nothing architecture: nodes share no memory or storage, which enables less
coordination and higher distribution.
Types, ACID and BASE
NoSQL is a non-relational data management system (DMS) that does not require a fixed schema,
avoids joins, and is easy to scale. NoSQL databases are used for distributed data stores, Big
Data and real-time web apps, for example by companies like Twitter, Facebook and Google that
collect terabytes of user data every single day.
NoSQL stands for "Not Only SQL" or "Not SQL." Carlo Strozzi introduced the term NoSQL
in 1998.
Traditional RDBMS uses SQL syntax to store and retrieve data for further insights. Instead, a
NoSQL database system encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data.
Types of NoSQL Databases
● Key-value pair based
● Column-oriented
● Graph-based
● Document-oriented
Each of these categories has its unique attributes and limitations. No specific database is better to
solve all problems. You should select a database based on your product needs.
Key Value Pair Based
Data is stored in key/value pairs. These stores are designed to handle large volumes of data
and heavy load.
Key-value stores keep data as a hash table where each key is unique, and the
value can be JSON, a BLOB (Binary Large Object), a string, etc.
This is one of the most basic types of NoSQL database. It is used to implement
collections, dictionaries, associative arrays, etc. Key-value stores help the developer to store
schema-less data. They work well for data such as shopping cart contents.
Redis, Dynamo and Riak are some examples of key-value stores. Dynamo is the system
described in Amazon's Dynamo paper and Riak is modeled on that paper; Redis is an independent
design.
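The key-value idea can be sketched with JavaScript's built-in Map, which also behaves like a hash table with unique keys. This is only an in-memory illustration, not the API of any particular key-value database; the key format, the cart contents and the addItem helper are hypothetical.

```javascript
// A Map behaves like a tiny in-memory key-value store: unique keys, opaque values.
const store = new Map();

// The key is a unique identifier; the value here is a JSON string (hypothetical cart data).
store.set("cart:user42", JSON.stringify({ items: [{ sku: "A100", qty: 2 }] }));

// Read-modify-write: the store itself knows nothing about the value's structure,
// so the application decodes the value, changes it, and writes it back.
function addItem(cartKey, item) {
  const cart = JSON.parse(store.get(cartKey) ?? '{"items":[]}');
  cart.items.push(item);
  store.set(cartKey, JSON.stringify(cart));
  return cart;
}

addItem("cart:user42", { sku: "B200", qty: 1 });
console.log(store.get("cart:user42"));
```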
Column-based
Column-oriented databases work on columns and are based on the BigTable paper by Google. Every
column is treated separately, and the values of a single column are stored contiguously.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as
the data is readily available in a column.
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM and library card catalogs.
HBase, Cassandra and Hypertable are examples of column-based databases.
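To see why aggregates suit a column layout, the sketch below stores the same hypothetical sales data row by row and column by column, then computes SUM, AVG and MIN by scanning a single column array. The data and names are invented, and the layout is a simplification of how real column stores organise data on disk.

```javascript
// Row-oriented layout: each record keeps all of its fields together (hypothetical sales data).
const rows = [
  { region: "north", units: 10, price: 4.0 },
  { region: "south", units: 25, price: 3.5 },
  { region: "east", units: 15, price: 5.0 },
];

// Column-oriented layout: all values of one column sit next to each other.
const columnStore = {
  region: rows.map((r) => r.region),
  units: rows.map((r) => r.units),
  price: rows.map((r) => r.price),
};

// Aggregates only need to scan the one column they touch.
const sum = (col) => col.reduce((acc, v) => acc + v, 0);
const unitsSum = sum(columnStore.units);
const unitsAvg = unitsSum / columnStore.units.length;
const unitsMin = Math.min(...columnStore.units);

console.log({ unitsSum, unitsAvg, unitsMin });
```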
Document-Oriented
A document-oriented NoSQL database stores and retrieves data as a key-value pair, but the value
part is stored as a document. The document is stored in JSON or XML format. The value is
understood by the database and can be queried.
[Figure: Relational vs. Document]
The document type is mostly used for CMS systems, blogging platforms, real-time analytics and
e-commerce applications. It should not be used for complex transactions that require multiple
operations, or for queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented
DBMS systems.
Graph-Based
A graph database stores entities as well as the relations among those entities. Each entity is
stored as a node and each relationship as an edge. An edge describes a relationship between
nodes. Every node and edge has a unique identifier.
Compared to a relational database, where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast because they are already stored in
the database and there is no need to calculate them.
Graph databases are mostly used for social networks, logistics and spatial data.
Neo4J, Infinite Graph, OrientDB and FlockDB are some popular graph-based databases.
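The node-and-edge model can be illustrated with a small in-memory sketch. The node identifiers, names and the FOLLOWS relationship type are hypothetical, and a real graph database would index its edges rather than filter an array as done here.

```javascript
// Nodes and edges each carry a unique identifier (all data here is hypothetical).
const nodes = new Map([
  ["n1", { name: "Alice" }],
  ["n2", { name: "Bob" }],
  ["n3", { name: "Carol" }],
]);
const edges = [
  { id: "e1", from: "n1", to: "n2", type: "FOLLOWS" },
  { id: "e2", from: "n2", to: "n3", type: "FOLLOWS" },
];

// Relationships are stored explicitly as edges, so following one is a lookup
// over stored data rather than a join computed at query time.
function neighbours(nodeId, type) {
  return edges
    .filter((e) => e.from === nodeId && e.type === type)
    .map((e) => e.to);
}

// Breadth-first traversal: every node reachable from the start node via edges of one type.
function reachable(startId, type) {
  const seen = new Set([startId]);
  const queue = [startId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of neighbours(current, type)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  seen.delete(startId);
  return [...seen].map((id) => nodes.get(id).name);
}

console.log(reachable("n1", "FOLLOWS"));
```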
CAP Theorem
The CAP theorem is also called Brewer's theorem. It states that it is impossible for a
distributed data store to offer more than two out of the following three guarantees:
● Consistency
● Availability
● Partition Tolerance
Consistency: The data should remain consistent even after the execution of an operation.
This means that once data is written, any future read request should return that data. For
example, after updating the order status, all the clients should be able to see the same data.
Availability: The database should always be available and responsive. It should not have
any downtime.
Partition Tolerance: The system should continue to function even if the communication
among the servers is not stable. For example, the servers can be partitioned into multiple
groups which may not communicate with each other. Here, if part of the database is
unavailable, the other parts remain unaffected.
BASE: Basically Available, Soft state, Eventual consistency
● Basically available means the database stays available most of the time, favoring
availability in the sense of the CAP theorem
● Soft state means that the system state may change even without (user) input, because
updates are still propagating between replicas on the way to eventual consistency
● Eventual consistency means that the system will become consistent over time
Eventual consistency arises from keeping copies of data on multiple machines to get
high availability and scalability: a change made to a data item on one machine has to
be propagated to the other replicas, and until that happens the replicas may disagree.
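A toy simulation can make soft state and eventual consistency concrete. It assumes three in-memory replicas and a queue of not-yet-propagated changes; the replica names, the key and the write and propagate helpers are hypothetical and do not correspond to any real database's replication protocol.

```javascript
// Each replica holds its own copy of the data; writes reach other replicas later.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = {};
  }
}

const replicas = [new Replica("r1"), new Replica("r2"), new Replica("r3")];
const pending = []; // changes accepted by one replica but not yet propagated

// Accept the write on one replica immediately and queue it for the others.
function write(replica, key, value) {
  replica.data[key] = value;
  pending.push({ origin: replica, key, value });
}

// Background propagation step: apply every queued change to the other replicas.
// State changes here without any new user input, which is the "soft state".
function propagate() {
  while (pending.length > 0) {
    const { origin, key, value } = pending.shift();
    for (const r of replicas) {
      if (r !== origin) r.data[key] = value;
    }
  }
}

write(replicas[0], "order:7", "shipped");
const before = replicas.map((r) => r.data["order:7"]); // replicas disagree for a while
propagate();
const after = replicas.map((r) => r.data["order:7"]); // all replicas now agree

console.log(before, after);
```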
Advantages of NoSQL
● Can be used as Primary or Analytic Data Source

● Big Data Capability
● No Single Point of Failure
● Easy Replication
● It provides fast performance and horizontal scalability.
● Can handle structured, semi-structured, and unstructured data equally well
● Fits object-oriented programming, which makes it easy to use and flexible
● NoSQL databases don't need a dedicated high-performance server
● Supports key developer languages and platforms
● Simpler to implement than an RDBMS
● It can serve as the primary data source for online applications.
● Handles big data, managing data velocity, variety, volume, and complexity
● Excels at distributed database and multi-data center operations
● Eliminates the need for a specific caching layer to store data
● Offers a flexible schema design which can easily be altered without downtime or service
disruption
Disadvantages of NoSQL
● No standardization rules
● Limited query capabilities
● It does not offer some traditional database capabilities, such as consistency when multiple
transactions are performed simultaneously.
● As the volume of data increases, it becomes difficult to maintain unique keys
● Doesn't work as well with relational data
● Many options are open source, which has made them less popular with enterprises.
Key term differences between MongoDB and RDBMS

RDBMS | MongoDB | Difference
Table | Collection | In RDBMS, the table contains the columns and rows which are used to store the data, whereas in MongoDB this same structure is known as a collection. The collection contains documents, which in turn contain fields, which in turn are key-value pairs.
Row | Document | In RDBMS, the row represents a single, implicitly structured data item in a table. In MongoDB, the data is stored in documents.
Column | Field | In RDBMS, the column denotes a set of data values. In MongoDB these are known as fields.
Joins | Embedded documents | In RDBMS, data is sometimes spread across various tables, and in order to show a complete view of all the data a join is formed across tables. In MongoDB, related data is normally stored in a single collection, nested by using embedded documents, so joins are usually not needed.
Basic Operations on Document Databases
● Inserting
● Deleting
● Updating
● Retrieving
By convention, the database container is referred to as ‘db’. To refer to a collection, you prefix the
collection name with ‘db’. For example, the collection customer is indicated by ‘db.customer.’
Creating a database
Use the "use" command. If the database does not exist, a new one will be created.
Creating a Collection/Table and adding a document
Use the insert() command:
    db.Employee.insert( { _id: 10, "EmployeeName": "Smith" } )
Adding multiple documents
Use the insert() command with an array, or insertMany(). The [ and ] in the parameter list
delimit an array of documents to insert:
    db.books.insert(        (or)  db.books.insertMany(
      [
        { "book_id": 1298747,
          "title": "Mother Night",
          "author": "Kurt Vonnegut, Jr." },
        { "book_id": 639397,
          "title": "Science and the Modern World",
          "author": "Alfred North Whitehead" },
        { "book_id": 1456701,
          "title": "Foundation and Empire",
          "author": "Isaac Asimov" }
      ]
    )
Sample document in the users collection:
    {
      _id: 6,
      name: "abc",
      age: 43,
      type: 1,
      status: "A",
      favorites: { food: "pizza", artist: "Picasso" },
      finished: [ 18, 12 ],
      badges: [ "black", "blue" ],
      points: [
        { points: 78, bonus: 8 },
        { points: 57, bonus: 7 }
      ]
    }
Retrieve the documents where the artist equals "Picasso":
    db.users.find( { "favorites.artist": "Picasso" } )
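The dotted key "favorites.artist" reaches into an embedded document. A rough idea of how such a path is resolved can be sketched in plain JavaScript; getPath is a hypothetical helper and the array stands in for the users collection. MongoDB's own dot notation also handles arrays, which this sketch does not attempt.

```javascript
// Resolve a dotted path such as "favorites.artist" against a nested document.
// Returns undefined as soon as any step along the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value != null ? value[key] : undefined),
    doc
  );
}

// Hypothetical in-memory stand-in for a users collection.
const users = [
  { _id: 6, name: "abc", favorites: { food: "pizza", artist: "Picasso" } },
  { _id: 7, name: "xyz", favorites: { food: "pasta", artist: "Monet" } },
  { _id: 8, name: "pqr" }, // no favorites field at all
];

// Equivalent in spirit to db.users.find( { "favorites.artist": "Picasso" } ).
const picassoFans = users.filter((u) => getPath(u, "favorites.artist") === "Picasso");
console.log(picassoFans.map((u) => u._id));
```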
Deleting Documents from a Collection
Use the remove() command.
Delete all documents in the books collection (an empty query document matches everything):
    db.books.remove( {} )
Remove the documents which have the Employee id 22:
    db.Employee.remove( { Employeeid: 22 } )
Remove all the documents from the products collection where qty is greater than 20:
    db.products.remove( { qty: { $gt: 20 } } )
Remove only the first document from the products collection where qty is greater than 20:
    db.products.remove( { qty: { $gt: 20 } }, true )
    (or)
    db.products.remove( { qty: { $gt: 20 } }, 1 )
Updating Documents in a Collection
The update() method requires two parameters:
• Document query
• Set of keys and values to update
Update the name of the employee whose Employee id is 22:
    db.Employee.update(
      { "Employeeid": 22 },
      { $set: { "EmployeeName": "NewMartin" } }
    );
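What the two parameters do can be sketched in plain JavaScript: the first selects documents, the second lists the fields to change while leaving everything else untouched. The updateMany function below is a hypothetical in-memory stand-in that only supports equality queries and $set, and it changes every matching document, whereas the shell's update() changes only the first match unless multi: true is passed.

```javascript
// Apply a { $set: {...} } style update to every document matching a simple equality query.
function updateMany(collection, query, update) {
  let modified = 0;
  for (const doc of collection) {
    const isMatch = Object.entries(query).every(([k, v]) => doc[k] === v);
    if (isMatch) {
      Object.assign(doc, update.$set); // only the listed fields change
      modified += 1;
    }
  }
  return modified;
}

// Hypothetical in-memory stand-in for the Employee collection.
const employees = [
  { Employeeid: 21, EmployeeName: "Smith", dept: "HR" },
  { Employeeid: 22, EmployeeName: "Martin", dept: "IT" },
];

const modifiedCount = updateMany(
  employees,
  { Employeeid: 22 },
  { $set: { EmployeeName: "NewMartin" } }
);
console.log(modifiedCount, employees[1]);
```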
Retrieving Documents from a Collection
The find() method is used to retrieve documents.
Select all documents in the collection:
    db.users.find( {} )   (or)   db.users.find()
Retrieve from the users collection all documents where the status field has the value "A":
    db.users.find( { status: "A" } )
Retrieve all documents in the users collection where the status equals "A" and age is less
than ($lt) 30:
    db.users.find( { status: "A", age: { $lt: 30 } } )
Retrieve all documents in the collection where the status equals "A" or age is less than
($lt) 30:
    db.users.find(
      {
        $or: [ { status: "A" }, { age: { $lt: 30 } } ]
      }
    )
Select all documents in the collection where the status equals "A" and either age is less
than ($lt) 30 or type equals 1:
    db.users.find(
      {
        status: "A",
        $or: [ { age: { $lt: 30 } }, { type: 1 } ]
      }
    )
Return all books by Kurt Vonnegut, Jr., with all key values:
    db.books.find( { "author": "Kurt Vonnegut, Jr." } )
Return specific key values. The second argument is a list of keys to return, each with a "1"
to indicate that the key should be returned:
    db.books.find(
      { "author": "Kurt Vonnegut, Jr." },
      { "title": 1 }
    )
Retrieve all books with a quantity greater than or equal to 10 and less than 50:
    db.books.find( { "quantity": { "$gte": 10, "$lt": 50 } } )
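How such query documents combine conditions can be sketched with a small matcher in plain JavaScript. It assumes only the pieces used above (equality, $lt, $gte and $or); the matches and matchesCondition functions and the sample data are hypothetical and far simpler than MongoDB's real query engine.

```javascript
// Evaluate one field condition: either a plain value (equality) or an operator object.
function matchesCondition(value, condition) {
  if (condition === null || typeof condition !== "object") {
    return value === condition;
  }
  // Several operators on one field, e.g. { $gte: 10, $lt: 50 }, must all hold.
  return Object.entries(condition).every(([op, operand]) => {
    switch (op) {
      case "$lt": return value < operand;
      case "$gte": return value >= operand;
      default: throw new Error(`unsupported operator: ${op}`);
    }
  });
}

// Top-level fields are combined with AND; $or needs at least one sub-query to match.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) =>
    key === "$or"
      ? condition.some((sub) => matches(doc, sub))
      : matchesCondition(doc[key], condition)
  );
}

// Hypothetical in-memory stand-in for the users collection.
const users = [
  { name: "a", status: "A", age: 25, type: 2 },
  { name: "b", status: "A", age: 40, type: 1 },
  { name: "c", status: "A", age: 40, type: 2 },
  { name: "d", status: "B", age: 20, type: 1 },
];

const query = { status: "A", $or: [{ age: { $lt: 30 } }, { type: 1 }] };
const selected = users.filter((u) => matches(u, query)).map((u) => u.name);
console.log(selected);
```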
$eq selects the documents where the field equals the given value:
    db.inventory.find( { qty: { $eq: 20 } } )   (or)   db.inventory.find( { qty: 20 } )
$ne selects the documents where the field is not equal to the given value:
    db.inventory.find( { qty: { $ne: 20 } } )
$in is similar to "or". This selects all documents in the inventory collection where the qty
field value is either 5 or 15:
    db.inventory.find( { qty: { $in: [ 5, 15 ] } } )
$nin selects the documents where:
● the field value is not in the specified array or
● the field does not exist.
This selects all documents in the inventory collection where the qty field value is neither 5
nor 15. The selected documents will include those documents that do not contain the qty field:
    db.inventory.find( { qty: { $nin: [ 5, 15 ] } } )
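The difference between $in and $nin, including the missing-field case, can be checked with a short in-memory sketch. The inventory items and the isIn and isNotIn helpers are hypothetical; they only mirror the two rules listed above.

```javascript
// Hypothetical in-memory stand-in for the inventory collection.
const inventory = [
  { item: "pen", qty: 5 },
  { item: "ink", qty: 15 },
  { item: "pad", qty: 20 },
  { item: "cap" }, // qty field is missing
];

// $in: the field value equals one of the listed values.
const isIn = (doc, field, values) => values.includes(doc[field]);

// $nin: the field value is not listed, or the field does not exist.
const isNotIn = (doc, field, values) =>
  !(field in doc) || !values.includes(doc[field]);

const names = (docs) => docs.map((d) => d.item);
const inResult = names(inventory.filter((d) => isIn(d, "qty", [5, 15])));
const ninResult = names(inventory.filter((d) => isNotIn(d, "qty", [5, 15])));

console.log(inResult, ninResult);
```

Every document lands in exactly one of the two results, with the document that lacks qty falling on the $nin side.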
Review Questions:
1. Define a document with respect to document databases.
2. Name two types of formats for storing data in a document database.
3. List at least three syntax rules for JSON objects.
4. Create a sample document for a small appliance with the following attributes:
appliance ID, name, description, height, width, length, and shipping weight. Use the
JSON format.
5. Using the db.books collection, write a command to insert a book to the collection.
6. Using the db.books collection, write a command to remove books by Isaac Asimov.
7. Using the db.books collection, write a command to retrieve all books with quantity greater
than or equal to 20.
8. Which query operator is used to search for values in a single key?
