Unit 3

NOSQL
NoSQL database technology often stores information in JSON documents instead of the columns and
rows used by relational databases. To be clear, NoSQL stands for “not only SQL” rather than
“no SQL at all.” A NoSQL JSON database can store and retrieve data using literally
“no SQL,” but many NoSQL systems also offer SQL-like query languages.
Consequently, NoSQL databases are built to be flexible, scalable, and capable of rapidly
responding to the data management demands of modern businesses.
The following defines the four most-popular types of NoSQL database:
• Document databases are primarily built for storing information as documents,
including, but not limited to, JSON documents. These systems can also be used for
storing XML documents, for example.
• Key-value stores group associated data in collections with records that are identified
with unique keys for easy retrieval. Key-value stores have just enough structure to
mirror the value of relational databases (as opposed to non-relational databases) while
still preserving the benefits of the NoSQL database structure.
• Wide-column databases use the tabular format of relational databases yet allow a wide
variance in how data is named and formatted in each row, even in the same table. Like
key-value stores, wide-column databases have some basic NoSQL structure while also
preserving a lot of flexibility.
• Graph databases use graph structures to define the relationships between stored data
points. Graph databases are useful for identifying patterns in unstructured and semi-
structured information.
Why use NoSQL?
Organizations turn to NoSQL when their applications must:
• Support large numbers of concurrent users (tens of thousands, perhaps millions)
• Deliver highly responsive experiences to a globally distributed base of users
• Be always available – no downtime
• Handle semi- and unstructured data
• Rapidly adapt to changing requirements with frequent updates and new features
The following example illustrates an application used for managing resumes. It interacts
with each resume as an object (i.e., the user object) that contains an array for skills and a collection
for positions. Writing a resume to a relational database, by contrast, requires the application
to “shred” the user object into rows.
Storing this resume would require the application to insert six rows into three tables, as
illustrated in Figure 3.
Reading this profile would likewise require the application to read six rows from three tables, as
illustrated in Figure 4.
By contrast, in a document-oriented NoSQL database, JSON is the de facto format
for storing data; helpfully, it’s also the de facto standard for consuming and producing data
for web, mobile, and IoT applications. JSON not only eliminates the object-relational
impedance mismatch, it also eliminates the overhead of ORM frameworks and simplifies
application development because objects are read and written without “shredding” them (i.e.,
a single object can be read or written as a single document), as illustrated in Figure 5.
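The difference between shredding and storing a single document can be sketched in a few lines of Python. This is only an illustration: the resume contents, the field names, and the helper functions `shred` and `to_document` are made up for this example and do not come from any particular database.

```python
import json

# Hypothetical resume held by the application as one user object.
resume = {
    "user_id": 1,
    "name": "Ada Example",
    "skills": ["python", "sql"],
    "positions": [
        {"title": "Engineer", "company": "Acme", "start_year": 2018},
        {"title": "Analyst", "company": "Initech", "start_year": 2015},
        {"title": "Intern", "company": "Globex", "start_year": 2014},
    ],
}


def shred(user):
    """Split one user object into rows for three relational tables."""
    users = [(user["user_id"], user["name"])]
    skills = [(user["user_id"], s) for s in user["skills"]]
    positions = [
        (user["user_id"], p["title"], p["company"], p["start_year"])
        for p in user["positions"]
    ]
    return {"users": users, "skills": skills, "positions": positions}


def to_document(user):
    """Serialize the whole user object as a single JSON document."""
    return json.dumps(user)


rows = shred(resume)
row_count = sum(len(table) for table in rows.values())
document = to_document(resume)
restored = json.loads(document)

print(row_count, "rows in", len(rows), "tables")
print("document round-trips unchanged:", restored == resume)
```

In the relational layout the application has to reassemble the object from several tables on every read, whereas the document form is written and read back as one unit.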
Key Features of NoSQL:
1. Dynamic schema: NoSQL databases do not have a fixed schema and can
accommodate changing data structures without the need for migrations or
schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by adding
more nodes to a database cluster, making them well-suited for handling large
amounts of data and high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a document-
based data model, where data is stored in semi-structured format, such as JSON
or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data
model, where data is stored as a collection of key-value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, use a column-
based data model, where data is organized into columns instead of rows.
6. Distributed and high availability: NoSQL databases are often designed to be
highly available and to automatically handle node failures and data replication
across multiple nodes in a database cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in a
flexible and dynamic manner, with support for multiple data types and changing
data structures.
8. Performance: NoSQL databases are optimized for high performance and can
handle a high volume of reads and writes, making them suitable for big data and
real-time applications.
Advantages of NoSQL: There are many advantages of working with NoSQL databases
such as MongoDB and Cassandra. The main advantages are high scalability and high
availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling.
Sharding means partitioning the data and placing the partitions on multiple
machines, in some schemes in a way that preserves the order of the data. Vertical
scaling means adding more resources to the existing machine, whereas horizontal
scaling means adding more machines to handle the data. Vertical scaling is
limited by the capacity of a single machine, while NoSQL databases are designed
to make horizontal scaling comparatively easy. Examples of horizontally scaling
databases are MongoDB, Cassandra, etc. Because of this scalability, NoSQL can
handle a huge amount of data: as the data grows, the database scales out to
handle it efficiently.
2. Flexibility: NoSQL databases are designed to handle unstructured or semi-
structured data, which means that they can accommodate dynamic changes to
the data model. This makes NoSQL databases a good fit for applications that
need to handle changing data requirements.
3. High availability: The automatic replication feature in NoSQL databases makes
them highly available, because in case of a failure the data can be restored from
a replica to the last consistent state.
4. Scalability: NoSQL databases are highly scalable, which means that they can
handle large amounts of data and traffic with ease. This makes them a good fit
for applications that need to handle large amounts of data or traffic.
5. Performance: NoSQL databases are designed to handle large amounts of data
and traffic, which means that they can offer improved performance compared to
traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective than
traditional relational databases, as they are typically less complex and do not
require expensive hardware or software.
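As a rough illustration of the sharding idea from item 1, the sketch below assigns keys to machines with a stable hash. Note that this is hash-based partitioning, which does not preserve key order (an order-preserving scheme would assign contiguous key ranges instead). The node names and the helpers `shard_for` and `distribute` are hypothetical.

```python
import hashlib

SHARDS = ["node-a", "node-b", "node-c"]  # hypothetical machine names


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribute(keys, shards=SHARDS):
    """Group keys by the shard that would store them."""
    placement = {name: [] for name in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
placement = distribute(keys)
for name, stored in placement.items():
    print(name, "holds", len(stored), "keys")
```

A plain modulo like this moves most keys when a machine is added or removed, which is why distributed databases typically use more careful schemes such as consistent hashing.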
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of standardization: There are many different types of NoSQL databases,
each with its own unique strengths and weaknesses. This lack of standardization
can make it difficult to choose the right database for a specific application.
2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant,
which means that they do not always guarantee the consistency, integrity, and durability
of data. This can be a drawback for applications that require strong data
consistency guarantees.
3. Narrow focus: NoSQL databases have a narrow focus: they are mainly
designed for storage and provide comparatively little other functionality. Relational databases
are a better choice than NoSQL in the field of transaction management.
4. Open-source: Many NoSQL databases are open-source projects, and there is no reliable
standard for NoSQL yet. In other words, two database systems are likely to differ from each other.
5. Lack of support for complex queries: NoSQL databases are generally not designed to
handle complex queries, which means that they are often not a good fit for
applications that require complex data analysis or reporting.
6. Lack of maturity: NoSQL databases are relatively new and lack the maturity
of traditional relational databases. This can make them less reliable and less
secure than traditional databases.
7. Management challenge: The purpose of big data tools is to make the
management of a large amount of data as simple as possible, but it is not so
easy. Data management in NoSQL is much more complex than in a relational
database. NoSQL, in particular, has a reputation for being challenging to install
and even more demanding to manage on a daily basis.
8. Limited GUI tools: GUI tools for accessing the database are not as widely
available in the market as they are for relational databases.
9. Backup: Backup is a weak point for some NoSQL databases. MongoDB, for
example, has been criticized for lacking a straightforward way to back up data
in a consistent manner.
10. Large document size: Some database systems like MongoDB and CouchDB
store data in JSON format. This means that documents can be quite large (which
affects storage, network bandwidth, and speed), and having descriptive key names actually hurts
since they increase the document size.
Types of NoSQL database: The types of NoSQL databases, and the names of the database
systems that fall into each category, are:
1. Graph databases: Examples – Amazon Neptune, Neo4j
2. Key-value store: Examples – Memcached, Redis, Coherence
3. Tabular: Examples – HBase, Bigtable, Accumulo
4. Document-based: Examples – MongoDB, CouchDB, Cloudant
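For example, a table in a tabular store such as Cassandra can be defined in CQL as follows:
CREATE TABLE users (
    username text PRIMARY KEY,       -- unique key identifying each row
    first_name text,
    last_name text,
    email text,
    age int,
    address map<text, text>,         -- collection column: field name to value
    phone_numbers set<text>,         -- collection column: unique phone numbers
    created_at timestamp,
    updated_at timestamp
);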
What is the CAP theorem?
The CAP theorem makes system designers aware of the trade-offs involved in
designing networked shared-data systems. It has influenced the design of
many distributed data systems. It is very important to understand the CAP theorem because it
forms the basis for choosing a NoSQL database based on the requirements.
CAP theorem states that in networked shared-data systems or distributed systems, we
can only achieve at most two out of three guarantees for a
database: Consistency, Availability and Partition Tolerance.
A distributed system is a network that stores data on more than one node (physical or
virtual machines) at the same time.
Consistency: means that all clients see the same data at the same time, no matter
which node they connect to in a distributed system. To achieve consistency, whenever
data is written to one node, it must be instantly forwarded or replicated to all the other
nodes in the system before the write is deemed successful.
Availability: means that every non-failing node returns a response for all read and
write requests in a reasonable amount of time, even if one or more nodes are down.
Another way to state this — all working nodes in the distributed system return a valid
response for any request, without failing or exception.
Partition Tolerance: means that the system continues to operate despite arbitrary
message loss or failure of part of the system. In other words, even if there is a network
outage in the data center and some of the computers are unreachable, still the system
continues to perform. Distributed systems guaranteeing partition tolerance can
gracefully recover from partitions once the partition heals.
The CAP theorem classifies systems into three categories:
1. CP (Consistent and Partition Tolerant) database: A CP database delivers
consistency and partition tolerance at the expense of availability. When a partition
occurs between any two nodes, the system has to shut down the non-consistent
node (i.e., make it unavailable) until the partition is resolved.
A partition is a communication break between nodes within a distributed
system: if a node cannot receive any messages from another node in the
system, there is a partition between the two nodes. A partition can be caused
by a network failure, a server crash, or any other reason.
2. AP (Available and Partition Tolerant) database: An AP database delivers
availability and partition tolerance at the expense of consistency. When a partition
occurs, all nodes remain available but those at the wrong end of a partition might
return an older version of data than others. When the partition is resolved, AP
databases typically resync the nodes to repair all inconsistencies in the system.
3. CA (Consistent and Available) database: A CA database delivers consistency and
availability in the absence of any network partition. Single-node database
servers are often categorized as CA systems, because they do not need to
deal with partition tolerance.
In any networked shared-data system or distributed system, partition tolerance
is a must. Network partitions and dropped messages are a fact of life and must be
handled appropriately. Consequently, system designers must choose between
consistency and availability.
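The choice between consistency and availability during a partition can be illustrated with a toy two-replica store. This is a deliberately simplified sketch, not how any real database is implemented: the class names, the `mode` flag, and the `heal` method are invented for illustration, and in "CP" mode the toy refuses all requests during a partition, whereas real CP systems usually keep one side available.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy two-replica store; mode is 'CP' or 'AP' (names are illustrative)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Cannot replicate, so refuse: consistency over availability.
                raise RuntimeError("unavailable during partition")
            replica.value = value  # AP: accept locally, replicas diverge
            return
        replica.value = value
        other.value = value  # replicate before acknowledging the write

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return replica.value

    def heal(self, winner):
        """End the partition and copy the chosen replica's value to both."""
        self.partitioned = False
        self.a.value = self.b.value = winner.value


ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")
print("AP read from b during partition:", ap.read(ap.b))  # stale value
ap.heal(winner=ap.a)
print("AP read from b after healing:", ap.read(ap.b))

cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
except RuntimeError as err:
    print("CP write during partition:", err)
```

The AP store stays responsive but briefly serves an older value from one replica; the CP store never serves stale data but rejects requests until the partition heals.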
The following diagram shows the classification of different databases based on
the CAP theorem.
NoSQL Database
A NoSQL database (sometimes called Not Only SQL) is a database that provides a
mechanism to store and retrieve data other than the tabular relations used in relational
databases. These databases are schema-free, support easy replication, have simple APIs,
are eventually consistent, and can handle huge amounts of data.
The primary objective of a NoSQL database is to have
• simplicity of design,
• horizontal scaling, and
• finer control over availability.
NoSQL databases use different data structures compared to relational databases, which makes some
operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem
it must solve.
Relational Database | NoSQL Database
Supports powerful query language. | Supports very simple query language.
It has a fixed schema. | No fixed schema.
Follows ACID (Atomicity, Consistency, Isolation, and Durability). | It is only “eventually consistent”.
Supports transactions. | Does not support transactions.
What is Apache Cassandra?
Apache Cassandra is an open source, distributed, and decentralized storage system
(database) for managing very large amounts of structured data spread out across the world. It
provides highly available service with no single point of failure.
Listed below are some of the notable points of Apache Cassandra −
• It is scalable, fault-tolerant, and consistent.
• It is a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data model on
Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database management
systems.
• Cassandra implements a Dynamo-style replication model with no single point
of failure, but adds a more powerful “column family” data model.
• Cassandra is used by some of the biggest companies, such as Facebook,
Twitter, Cisco, Rackspace, eBay, Netflix, and more.
Features of Cassandra
Cassandra has become so popular because of its outstanding technical features. Given below
are some of the features of Cassandra:
• Elastic scalability − Cassandra is highly scalable; it allows you to add more
hardware to accommodate more customers and more data as per requirement.
• Always on architecture − Cassandra has no single point of failure and it is
continuously available for business-critical applications that cannot afford a
failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases
your throughput as you increase the number of nodes in the cluster. Therefore it
maintains a quick response time.
• Flexible data storage − Cassandra accommodates all possible data formats
including: structured, semi-structured, and unstructured. It can dynamically
accommodate changes to your data structures according to your need.
• Easy data distribution − Cassandra provides the flexibility to distribute data
where you need by replicating data across multiple data centers.
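• Transaction support − Cassandra offers atomic, isolated, and durable writes at
the row level together with tunable consistency, rather than the full multi-row
Atomicity, Consistency, Isolation, and Durability (ACID) transactions of an RDBMS.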
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It
performs blazingly fast writes and can store hundreds of terabytes of data,
without sacrificing the read efficiency.
The design goal of Cassandra is to handle big data workloads across multiple nodes without
any single point of failure. Cassandra has a peer-to-peer distributed architecture across its nodes, and
data is distributed among all the nodes in a cluster.
• All the nodes in a cluster play the same role. Each node is independent and at
the same time interconnected to other nodes.
• Each node in a cluster can accept read and write requests, regardless of where
the data is actually located in the cluster.
• When a node goes down, read/write requests can be served from other nodes in
the network.
Data Replication in Cassandra
In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of
data. If it is detected that some of the nodes responded with an out-of-date value,
Cassandra will return the most recent value to the client. After returning the most recent
value, Cassandra performs a read repair in the background to update the stale values.
The following figure shows a schematic view of how Cassandra uses data replication
among the nodes in a cluster to ensure no single point of failure.
Note − Cassandra uses the Gossip Protocol in the background to allow the nodes to
communicate with each other and detect any faulty nodes in the cluster.
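The read repair behaviour described above can be sketched as follows. This is a minimal illustration, not Cassandra's actual code: each replica is modelled as a plain dictionary mapping a key to a `(timestamp, value)` pair, the function name `read_with_repair` is made up, and the repair runs inline rather than in the background.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite stale replica copies.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None
    # The cell with the highest timestamp is treated as the most recent.
    newest = max(present, key=lambda cell: cell[0])
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


replica_1 = {"user:1": (2, "new@example.com")}
replica_2 = {"user:1": (1, "old@example.com")}
replica_3 = {}

print(read_with_repair([replica_1, replica_2, replica_3], "user:1"))
print(replica_2["user:1"], replica_3["user:1"])
```

After the read, all three replicas hold the same most recent cell, so later reads no longer see the stale value.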
Components of Cassandra
The key components of Cassandra are as follows −

• Node − It is the place where data is stored.
• Data center − It is a collection of related nodes.
• Cluster − A cluster is a component that contains one or more data centers.
• Commit log − The commit log is a crash-recovery mechanism in Cassandra.
Every write operation is written to the commit log.
• Mem-table − A mem-table is a memory-resident data structure. After commit log, the
data will be written to the mem-table. Sometimes, for a single-column family, there will
be multiple mem-tables.
• SSTable − It is a disk file to which the data is flushed from the mem-table when its
contents reach a threshold value.
• Bloom filter − Bloom filters are quick, probabilistic algorithms for testing
whether an element is a member of a set. A Bloom filter acts as a special kind of cache and is
consulted on every read query.
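To make the Bloom filter component more concrete, here is a minimal sketch of the data structure in Python. The sizes, the use of salted SHA-256 hashes, and the method names are illustrative assumptions, not Cassandra's implementation. The key property is that a negative answer is always correct, while a positive answer only means "possibly present".

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


sstable_filter = BloomFilter()
sstable_filter.add("row-1")
print(sstable_filter.might_contain("row-1"))  # always True once added
```

A storage engine can keep one such filter per SSTable and skip reading any file whose filter reports that the requested key is definitely absent.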
Cassandra Query Language
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL
treats the database (Keyspace) as a container of tables. Programmers work with CQL either
through cqlsh, a command-line prompt, or through separate application language drivers.
Clients approach any of the nodes for their read-write operations. That node (the coordinator) acts as
a proxy between the client and the nodes holding the data.
Write Operations
Every write is first recorded in the commit log on the node. The
data is then stored in the mem-table. Whenever the mem-table is full, its data is
written to an SSTable data file. All writes are automatically partitioned and replicated
throughout the cluster. Cassandra periodically consolidates the SSTables, discarding
unnecessary data.
Read Operations
During read operations, Cassandra gets values from the mem-table and checks the bloom filter
to find the appropriate SSTable that holds the required data.
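The write and read paths described above can be sketched with a toy single-node model. Everything here is a simplification under stated assumptions: the class `ToyNode`, its `memtable_limit` threshold, and the in-memory lists standing in for the commit log and SSTable files are invented for illustration, and the Bloom filter check is only indicated by a comment.

```python
class ToyNode:
    """Single-node storage sketch; memtable_limit is a made-up threshold."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory table holding the latest writes
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log for crash recovery
        self.memtable[key] = value            # 2. update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when it is full

    def flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:  # a real node would consult a Bloom filter here
                return table[key]
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged] if merged else []


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # mem-table is full, so it is flushed to an SSTable
node.write("a", 3)   # newer value for "a" lives in the mem-table
print(node.read("a"), node.read("b"))
node.write("c", 4)   # second flush
node.compact()
print(len(node.sstables), node.read("a"))
```

Compaction here plays the role of the periodic consolidation mentioned in the text: the older value of "a" is discarded and only one SSTable remains.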
Cassandra - Data Model
The data model of Cassandra is significantly different from what we normally see in an
RDBMS. This chapter provides an overview of how Cassandra stores its data.
Cluster
Cassandra database is distributed over several machines that operate together. The outermost
container is known as the Cluster. For failure handling, every node contains a replica, and in
case of a failure, the replica takes charge. Cassandra arranges the nodes in a cluster, in a ring
format, and assigns data to them.
Keyspace
Keyspace is the outermost container for data in Cassandra. The basic attributes of a Keyspace
in Cassandra are −
• Replication factor − It is the number of machines in the cluster that will receive
copies of the same data.
• Replica placement strategy − It is the strategy used to place replicas in
the ring. We have strategies such as simple strategy (rack-unaware strategy), old
network topology strategy (rack-aware strategy), and network topology
strategy (datacenter-shard strategy).
• Column families − Keyspace is a container for a list of one or more column
families. A column family, in turn, is a container of a collection of rows. Each
row contains ordered columns. Column families represent the structure of your
data. Each keyspace has at least one and often many column families.
The syntax for creating a Keyspace is as follows −
CREATE KEYSPACE keyspace_name
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
The following illustration shows a schematic view of a Keyspace.
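To see what a replication factor means in practice, the sketch below places a key on a ring of nodes and then walks clockwise to pick further replicas, which is the general idea behind the simple strategy. It is an approximation under stated assumptions: MD5 is used as a stand-in hash, each node owns a single token, and the function names `token` and `replicas_for` and the node names are hypothetical.

```python
import hashlib


def token(value):
    """Map a string onto the ring as an integer token (MD5 is illustrative)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor):
    """Return the nodes that hold copies of key under a simple ring walk."""
    ring = sorted(nodes, key=token)
    key_token = token(key)
    # First node whose token is >= the key's token, wrapping around the ring.
    start = next(
        (i for i, node in enumerate(ring) if token(node) >= key_token), 0
    )
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-1", "node-2", "node-3", "node-4"]
print(replicas_for("user:42", nodes, replication_factor=3))
```

With a replication factor of 3, every key is stored on three distinct nodes, so the loss of any single node still leaves two copies available.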
Column Family
A column family is a container for an ordered collection of rows. Each row, in turn, is an
ordered collection of columns. The following table lists the points that differentiate a column
family from a table of relational databases.
Relational Table | Cassandra Column Family
A schema in a relational model is fixed. Once we define certain columns for a table, while inserting data, in every row all the columns must be filled at least with a null value. | In Cassandra, although the column families are defined, the columns are not. You can freely add any column to any column family at any time.
Relational tables define only columns and the user fills in the table with values. | In Cassandra, a table contains columns, or can be defined as a super column family.
A Cassandra column family has the following attributes −

• keys_cached − It represents the number of locations to keep cached per
SSTable.
• rows_cached − It represents the number of rows whose entire contents will be
cached in memory.
• preload_row_cache − It specifies whether you want to pre-populate the row
cache.
Note − Unlike relational tables, a column family’s schema is not fixed: Cassandra does
not force individual rows to have all the columns.
The following figure shows an example of a Cassandra column family.
Column
A column is the basic data structure of Cassandra with three values, namely key or column
name, value, and a time stamp. Given below is the structure of a column.
SuperColumn
A super column is a special column and is therefore also a key-value pair, but its value
is a map of sub-columns.
Generally, column families are stored on disk in individual files. Therefore, to optimize
performance, it is important to keep columns that you are likely to query together in the same
column family, and a super column can be helpful here. Given below is the structure of a super
column.
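In code, the column and super column structures described above could be modelled roughly as follows. This is a sketch for intuition only: the class names and the `put` method are made up, and the rule that the cell with the higher timestamp wins is shown as the reason each column carries a timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    name: str        # key or column name
    value: str
    timestamp: int   # used to decide which write is the most recent


@dataclass
class SuperColumn:
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)  # map of sub-columns

    def put(self, column):
        """Keep the incoming column only if it is newer than the stored one."""
        current = self.columns.get(column.name)
        if current is None or column.timestamp > current.timestamp:
            self.columns[column.name] = column


address = SuperColumn("address")
address.put(Column("city", "Pune", timestamp=1))
address.put(Column("city", "Mumbai", timestamp=2))
address.put(Column("zip", "400001", timestamp=2))
print({name: col.value for name, col in address.columns.items()})
```

A row can then be seen as a map from column names to such columns or super columns, which is the "nested key-value pairs" view of a Cassandra table.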
Data Models of Cassandra and RDBMS
The following table lists down the points that differentiate the data model of Cassandra from
that of an RDBMS.
RDBMS | Cassandra
RDBMS deals with structured data. | Cassandra deals with unstructured data.
It has a fixed schema. | Cassandra has a flexible schema.
In RDBMS, a table is an array of arrays. (ROW x COLUMN) | In Cassandra, a table is a list of “nested key-value pairs”. (ROW x COLUMN key x COLUMN value)
Database is the outermost container that contains data corresponding to an application. | Keyspace is the outermost container that contains data corresponding to an application.
Tables are the entities of a database. | Tables or column families are the entity of a keyspace.
Row is an individual record in RDBMS. | Row is a unit of replication in Cassandra.
Column represents the attributes of a relation. | Column is a unit of storage in Cassandra.
RDBMS supports the concepts of foreign keys, joins. | Relationships are represented using collections.
NOSQL IN CLOUD
NoSQL Cloud Database Services are cloud-based database services that provide scalable,
high-performance, and cost-effective solutions for storing and retrieving data. NoSQL (Not
Only SQL) databases are designed to handle large volumes of unstructured, semi-structured,
and structured data, and can easily scale horizontally to accommodate increased data
volumes.
Cloud-based NoSQL databases offer several advantages over traditional on-premise
databases. These include:
1. Scalability: Cloud-based NoSQL databases can easily scale horizontally by
adding more servers to the cluster. This allows for seamless scalability as data
volumes increase.
2. High availability: NoSQL cloud databases are designed to be highly available and
can provide reliable uptime and performance, which is critical for many
applications.
3. Reduced cost: Cloud-based NoSQL databases can be more cost-effective than
traditional on-premise databases because they eliminate the need for expensive
hardware and infrastructure. This can be particularly beneficial for small to
medium-sized businesses that do not have the resources to invest in expensive
hardware.
4. Improved performance: Cloud-based NoSQL databases can provide high
performance and low latency, making them well-suited for applications that
require fast and efficient data access.
5. Flexibility: Cloud-based NoSQL databases are designed to handle unstructured,
semi-structured, and structured data, making them a flexible solution for a wide
range of applications.
Some popular NoSQL Cloud Database Services include:
1. Amazon DynamoDB: A fully managed NoSQL database service offered by
Amazon Web Services (AWS) that provides fast and predictable performance with
seamless scalability.
2. Google Cloud Datastore: A NoSQL document database service that is fully
managed and offers automatic scaling, high availability, and low latency.
3. Microsoft Azure Cosmos DB: A globally distributed, multi-model database
service that provides high availability, low latency, and flexible data modeling.
4. MongoDB Atlas: A fully managed global cloud database service for MongoDB
that provides automated backups, advanced security, and easy scalability.
Overall, NoSQL Cloud Database Services provide a flexible, scalable, and
cost-effective solution for storing and retrieving data in the cloud. They offer several
advantages over traditional on-premise databases and can be an excellent choice
for businesses of all sizes that need to store and manage large volumes of data.
Types of NoSQL databases:
• Column-based
• Key-value store
• Graph databases
• Document-Based
Figure 1: Types of NoSQL databases
Why choose NoSQL?
Google, Facebook, Amazon, and LinkedIn are a few of the top internet companies that
adopted NoSQL databases early to overcome the drawbacks of the Relational Database
Management System (RDBMS). An RDBMS is not always
the best answer for every situation, since data handling requirements have grown dramatically.
NoSQL provides a more dynamic approach and a more cloud-friendly environment for
processing unstructured data.
Difference between NoSQL Databases and SQL Databases:
• NoSQL database schema is flexible while SQL database schema is rigid.
• In NoSQL databases, simple queries can often be answered faster than in SQL databases.
• NoSQL databases are non-relational while SQL databases are relational.
• NoSQL examples: HBase, Bigtable.
• SQL examples: Sybase, Access, Oracle, and MySQL.
Figure 2: Difference between SQL and NoSQL Databases
Oracle NoSQL Cloud Database Service:
Oracle NoSQL Database Cloud Service is a fully managed, serverless data store.
It can handle columnar, JSON document, and key-value data models. The
NoSQL Database Cloud Service provides developers with the following key features:
• Flexible: The service is developer-centric, as it can be designed according to the
needs of developers. Various data models are supported by the NoSQL
Database Cloud Service, such as key-value, document-based, etc.
• Easy and Simple: NoSQL Database Cloud service makes application
development easy as it supports different languages like Java, Python, etc.
• Platform independent: Both fixed-schema and document-based data models
are supported by the NoSQL Database Cloud service.
How to start with Oracle NoSQL Cloud Database Service:
• TABLE CREATION: There are two modes with which we can easily create
new tables of Oracle NoSQL cloud database service. The modes are as follows:
• The first mode helps in the creation of the table without writing the
Data Definition Language(DDL) statement, that is a table can be
created declaratively. This mode can be called simple input mode.
• The second mode is referred to as advanced DDL input mode. As the
name describes, the table is created with the help of the Data Definition
Language(DDL) statement.
• INSERTION OF DATA IN TABLES: There are two modes with which we can
easily insert data in the table of Oracle NoSQL cloud database service. The modes
are as follows:
• In the first mode, the value is provided declaratively for the rows in the
table. This mode is referred to as simple input mode.
• In the second mode, the value is provided in JavaScript Object
Notation (JSON) format for rows in the table, so this mode is called advanced
JSON input mode.
Common Use Cases of Oracle NoSQL Cloud Database Service:
• Personalized experience: According to the user, these services help to deliver
customized content and also provide personalized individual experiences.
• Mobile apps: Advanced applications and applications with faster response time
can be made with the help of cloud database services for mobile.
• Games: It helps by supporting gaming with millions of simultaneous
participants with real-time situations.
Oracle NoSQL Database Cloud Simulator:
Oracle NoSQL Database Cloud Simulator is a tool that is provided by Oracle in order to
accelerate Oracle NoSQL Database cloud application development and testing. Cloudism is
the other word for cloud simulators. It helps in creating, debugging, and testing the
application against a locally deployed database instance. This locally deployed database has
all functions equivalent to the actual Oracle NoSQL Database cloud service.
Oracle NoSQL Migrator:
Oracle NoSQL Data Migrator is a tool supporting the movement of Oracle NoSQL tables
from one data source to another data source. This utility supports multiple migration options,
such as:
• Oracle NoSQL Database on-premise to Oracle NoSQL Database Cloud Service
and vice-versa.
• Between two Oracle NoSQL on-premise Databases.
• Between two Oracle NoSQL Database Cloud Service Tables.
• JSON file to Oracle NoSQL Database on-premise and vice-versa.
NoSQL Do’s and Don’ts
Let’s begin with performance. Recall the CAP theorem, which I explained in my
article “NoSQL Categories Breakdown” a couple of weeks back, where it turned out that NoSQL
databases have enhanced availability over relational databases but at the cost of moderated
database consistency. That would sort of imply that NoSQL databases always perform better.
The reality is that they do perform better, but in very specific kinds of web-scale scenarios where the
load on the system is huge, and what must be retrieved and saved is actually
rather straightforward and simple. In those situations NoSQL databases work well, and will
perform faster than a lot of relational databases.
But in other contexts where we don’t have that kind of load, NoSQL databases may actually be
slower, quite a bit slower in fact. So, in general, for line-of-business application loads you’re
going to want to use relational databases. They’ll be just as fast and probably faster; plus,
because of the consistency guarantees, they’ll be much more reliable, and for a business
application that may be not a luxury but rather a necessity. On the other hand,
in those cases where you have simple storage and retrieval, but you have to do an awful lot
of it, then NoSQL may be best.
Suitability really translates into the idea of having examples of where one or the other
works well. Think about just the example of storing and retrieving configuration data,
personalization data, environmental data for the way an application has to come up and look
and be skinned and be branded, all kinds of settings data, the kinds of things that would
probably get stored in a flat file if it were just a desktop application that needed to save and
retrieve them. These are the kinds of things that tend to work pretty well for a NoSQL database.
Also, for log data or any event-driven data, although you will very often find relational databases
used for those kinds of workloads, NoSQL databases in many cases will be a more appropriate
choice.
On the other hand, for anything transactional, especially financial applications such as stock
trades, accounting, or credit card transactions, a NoSQL database would be a poor fit.
You can probably use relational for anything, and you can probably use NoSQL for anything.
A basic result of computability theory says that any computable problem can be solved in any
Turing-complete language, and by analogy any data requirement can probably be satisfied by any
database. But it wouldn’t be satisfied especially well. So realize that there is some overlap,
and there are gray areas.
For the most part, though, the areas where each type of database works well are mutually
exclusive. One sure way to reach a bad decision is to try to endorse one or the other for
every single case. That will almost always get you in trouble.
The real question is whether the scenarios where relational works best are ones that you see in
your business, or whether the scenarios for NoSQL are the ones that you see in your business,
or whether you see a mix. If you only see one set of scenarios, then yes, it will appear that one
database type is always the answer. But that’s because the questions that you’re asking are in
effect limited.
Relational Databases
Databases that use SQL, such as MySQL, Oracle Express Edition, and MS-SQL, are all Relational
Database Management Systems (RDBMS): they store data in relations (generally referred to as
tables).
In a relational database, data is related through common characteristics present in the
dataset, and the resulting structure is referred to as the schema of the RDBMS.
Limitations of SQL vs NoSQL:
• Relational Database Management Systems that use SQL are schema-oriented, i.e. the
structure of the data must be known in advance so that the data adheres to the schema.
• Examples of such predefined-schema applications that use SQL include Payroll
Management Systems, Order Processing, and Flight Reservations.
• SQL databases are poorly suited to unpredictable and unstructured information. Big Data
applications, however, demand an occurrence-oriented database that is highly flexible and
operates on a schema-less data model.
• SQL databases are vertically scalable: they are typically scaled by increasing the
horsepower of the hardware they run on, which makes processing large batches of data
costly.
• IT enterprises need to increase the RAM, SSD, CPU, etc., on a single server in order to
manage the increasing load on the RDBMS.
• As the database or the number of users grows, Relational Database Management Systems
using SQL suffer from serious performance bottlenecks, making real-time processing of
unstructured data difficult.
• With Relational Database Management Systems, built-in clustering is difficult due to
the ACID properties of transactions.
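The schema-oriented limitation above can be illustrated with a minimal sketch using Python's built-in sqlite3 module as a stand-in for any SQL database. The payroll table, the emp_twitter attribute, and the sample records are hypothetical and chosen only for illustration.

```python
import sqlite3

# Relational side: the table's columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_sal INTEGER)"
)
conn.execute("INSERT INTO payroll (emp_id, emp_name, emp_sal) VALUES (1, 'ram', 50000)")


def insert_with_unknown_column(connection):
    """Try to store an attribute the schema does not declare; return the error text."""
    try:
        connection.execute(
            "INSERT INTO payroll (emp_id, emp_name, emp_twitter) VALUES (2, 'robin', '@robin')"
        )
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


schema_error = insert_with_unknown_column(conn)

# Schema-less side: each record carries its own set of fields.
documents = [
    {"emp_id": 1, "emp_name": "ram", "emp_sal": 50000},
    {"emp_id": 2, "emp_name": "robin", "emp_twitter": "@robin"},
]
fields_per_document = [sorted(doc) for doc in documents]

print(schema_error)
print(fields_per_document)
```

The SQL insert is rejected until the table is altered, while the list of documents accepts the second record with a different set of fields.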
NoSQL Databases
NoSQL is a database technology driven by Cloud Computing, the Web, Big Data, and Big Users.
NoSQL now leads the way at popular internet companies such as LinkedIn, Google, Amazon, and
Facebook, helping them overcome the drawbacks of the 40-year-old RDBMS.
A NoSQL database, also known as “Not Only SQL”, is an alternative to the SQL database that,
unlike SQL, does not require any kind of fixed table schema.
NoSQL generally scales horizontally and avoids major join operations on the data. A NoSQL
database can be thought of as structured storage, of which the relational database is a
subset.
NoSQL covers a multitude of databases, each with a different data storage model. The most
popular types are Graph, Key-Value, Columnar, and Document.
NoSQL vs SQL – 4 Key Differences:
1. Nature of Data and Its Storage – Tables vs. Collections
The foremost criterion for choosing a database is the nature of the data that your enterprise
plans to control and leverage. If the enterprise plans to pull data similar to an accounting
Excel spreadsheet, i.e. basic tabular structured data, then the relational model would suffice
for your business requirements. Current trends, however, demand storing and processing
unstructured and unpredictable information.
By contrast, molecular modeling, geo-spatial, or engineering parts data is so complex that the
data model created for it is highly complicated, with several levels of nesting. Several
attempts were made to model this kind of data with a ‘2D (Row-Column) Database’, but it did
not fit.
To overcome this drawback, NoSQL databases were considered as an alternative. NoSQL
databases ease the representation of multi-level hierarchies and nesting using JSON, i.e. the
JavaScript Object Notation format.
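As a rough illustration of multi-level nesting in JSON, the sketch below models a hypothetical engineering-parts record (all part names and IDs are invented) as one nested document and round-trips it through Python's standard json module.

```python
import json

# Hypothetical engineering-parts record: an assembly containing nested sub-parts.
assembly = {
    "part_id": "A-100",
    "name": "gearbox",
    "components": [
        {"part_id": "B-210", "name": "housing", "components": []},
        {
            "part_id": "B-220",
            "name": "gear train",
            "components": [
                {"part_id": "C-301", "name": "input gear", "components": []},
                {"part_id": "C-302", "name": "output gear", "components": []},
            ],
        },
    ],
}


def nesting_depth(part):
    """Return the number of levels in the part hierarchy (a leaf part has depth 1)."""
    children = part["components"]
    return 1 + max((nesting_depth(child) for child in children), default=0)


# The whole hierarchy serializes to, and loads back from, a single JSON document.
restored = json.loads(json.dumps(assembly))

print(nesting_depth(assembly))
print(restored == assembly)
```

In a row-column layout the same hierarchy would need a self-referencing parts table and one lookup per level; here the whole hierarchy is read or written as a single document.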
In this world of dynamic schemas, where changes pour in every hour, it is not possible to
adhere to the “Get it Right First” strategy, which was a success with the outmoded static
schema.
Web-centric businesses like Amazon, eBay, etc. needed a NoSQL database that could keep up
with a changing data model, giving them greater flexibility in operations.
2. Speed – Normalization vs. Storage Cost
An RDBMS requires a higher degree of normalization, i.e. data needs to be broken down into
several small logical tables to avoid redundancy and duplication. Normalization helps manage
data efficiently, but the complexity of spanning several related tables hampers the
performance of data processing in relational databases using SQL.
On the other hand, in NoSQL databases such as Couchbase, Cassandra, and MongoDB, data is
stored in flat collections where it is duplicated repeatedly; a single piece of data is hardly
ever partitioned off but is stored as a whole entity. Hence, read and write operations on a
single entity are easier and faster.
Many NoSQL databases are also designed to store and process data in real time at a scale that
is hard to reach with a single-server SQL database.
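The trade-off between normalization and storage cost can be sketched with plain Python dictionaries. The three "tables" and the employee document below are hypothetical, not taken from any particular database.

```python
# Normalized layout: three small "tables" linked by keys, as an RDBMS would hold them.
employees = {1: {"emp_name": "ram", "city_id": 10}}
cities = {10: {"city_name": "Hyderabad"}}
skills = [{"emp_id": 1, "skill": "CQL"}, {"emp_id": 1, "skill": "Java"}]


def read_normalized(emp_id):
    """Assemble one employee by visiting every related table (the work a join does)."""
    emp = employees[emp_id]
    return {
        "emp_id": emp_id,
        "emp_name": emp["emp_name"],
        "emp_city": cities[emp["city_id"]]["city_name"],
        "skills": [row["skill"] for row in skills if row["emp_id"] == emp_id],
    }


# Denormalized layout: the same facts duplicated into one self-contained entity.
employee_documents = {
    1: {"emp_id": 1, "emp_name": "ram", "emp_city": "Hyderabad", "skills": ["CQL", "Java"]}
}


def read_denormalized(emp_id):
    """A single lookup returns the whole entity."""
    return employee_documents[emp_id]


print(read_normalized(1) == read_denormalized(1))
```

Both reads return the same entity. The normalized read touches three structures, while the denormalized read touches one, at the price of storing the city name again in every employee document.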
3. Horizontal Scalability vs. Vertical Scalability
The most beneficial aspect of NoSQL databases like HBase for Hadoop, MongoDB (from 10gen),
and Couchbase is the ease of scaling to handle huge volumes of data.
For instance, if you operate an eCommerce website similar to Amazon and you happen to be
an overnight success, you will have tons of customers visiting your website.
Under such circumstances, if you are using a relational database, i.e. SQL, you will have to
meticulously replicate and repartition the database to meet the increasing customer demand.
The manner in which NoSQL and SQL databases scale to meet business requirements determines
where the application’s performance bottleneck appears.
Generally, as demand increases, relational databases scale up vertically, which means adding
extra horsepower to the system to enable faster operations on the same dataset. By contrast,
NoSQL databases like HBase, Couchbase, and MongoDB scale horizontally by adding extra
nodes (commodity database servers) to the resource pool,
so that the load can be distributed easily.
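A minimal sketch of how load can be spread across nodes when scaling horizontally, assuming simple hash-modulo partitioning. The node names and customer IDs are hypothetical, and real systems use more refined placement schemes.

```python
import hashlib


def node_for(key, nodes):
    """Pick the node that owns `key` by hashing the key onto the node list."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return {node: [keys it owns]} for the given cluster membership."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


customer_ids = range(1, 1001)
three_nodes = distribute(customer_ids, ["node-a", "node-b", "node-c"])
four_nodes = distribute(customer_ids, ["node-a", "node-b", "node-c", "node-d"])

for name, placement in (("3 nodes", three_nodes), ("4 nodes", four_nodes)):
    print(name, {node: len(keys) for node, keys in placement.items()})
```

Adding a fourth commodity node gives every node a smaller share of the same 1000 keys. With plain modulo hashing most keys change owner when the node count changes, which is why production systems such as Cassandra use consistent hashing instead.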
4. NoSQL vs SQL / CAP vs. ACID
Relational databases using SQL have been legends in the database landscape for maintaining
integrity through the ACID properties (Atomicity, Consistency, Isolation, and Durability) of
transactions, and most storage vendors rely on these properties.
The main aim is to support isolated, indivisible transactions whose changes are permanent
and leave the data in a consistent state.
NoSQL databases work on the concept of the CAP priorities: at any one time you can choose
only two of the three priorities in the CAP theorem (Consistency, Availability, Partition
Tolerance), since a distributed system of changing nodes cannot guarantee all three at once.
NoSQL databases can be described as BASE, the opposite of ACID, meaning:
BA = Basically Available – availability is guaranteed.
S = Soft State – the state of the system can change at any time, even without a query being
executed, because node updates take place every now and then.
E = Eventually Consistent – NoSQL database systems become consistent in the long run.
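The BASE behaviour described above can be sketched as a toy in-memory store. This is an assumption-laden illustration, not how any particular NoSQL engine is implemented: the class names, the explicit sync() step, and the key format are all hypothetical.

```python
class Replica:
    """One node's copy of the data: a plain key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica at once and reach the others only on sync()."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []  # (key, value) updates not yet propagated

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Background propagation: replay pending updates on every other replica."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["node-a", "node-b", "node-c"])
store.write("emp:2:city", "Delhi")
stale_read = store.read("emp:2:city", replica_index=2)  # before propagation
store.sync()
fresh_read = store.read("emp:2:city", replica_index=2)  # after propagation

print(stale_read, fresh_read)
```

The write is accepted immediately (basically available), a replica's contents change later without any client query (soft state), and all replicas agree once propagation finishes (eventually consistent).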
Why should you choose a NoSQL database like HBase, Couchbase, or Cassandra over an
RDBMS?
1) Applications and databases need to work with Big Data.
2) Big Data needs a flexible data model with a better database architecture.
3) To process Big Data, these databases need continuous application availability with modern
transaction support.
NoSQL in Big Data Applications
1. HBase for Hadoop, a popular NoSQL database, is used extensively by Facebook for its
messaging infrastructure.
2. HBase is used by Twitter for generating data and for storing, logging, and monitoring
data around people’s search.
3. HBase is used by the discovery engine StumbleUpon for data analytics and storage.
4. MongoDB is another NoSQL database, used by CERN, the European Organization for
Nuclear Research, for collecting data from its huge particle collider, the Large Hadron
Collider.
5. LinkedIn, Orbitz, and Concur use the Couchbase NoSQL database for various data
processing and monitoring tasks.
CASSANDRA COMMANDS
CREATE KEYSPACE
CREATE KEYSPACE <identifier> WITH <properties>
CREATE KEYSPACE <keyspace name>
WITH replication = {'class': '<strategy name>', 'replication_factor': <number of replicas>}
AND durable_writes = <boolean value>;
The CREATE KEYSPACE statement has two properties: replication and durable_writes.
CREATE KEYSPACE tutorialspoint
WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};
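To make replication_factor concrete, here is a minimal sketch of replica placement under SimpleStrategy, which puts the first replica on the node that owns the row and the remaining replicas on the next nodes clockwise around the ring. The node names and the owner index are hypothetical. Real Cassandra derives the owner from a hash of the partition key.

```python
def replicas_for(token_owner_index, ring, replication_factor):
    """Return the nodes holding a row: the owner plus the next nodes clockwise."""
    count = min(replication_factor, len(ring))
    return [ring[(token_owner_index + step) % len(ring)] for step in range(count)]


# Hypothetical five-node ring, listed in clockwise order.
ring = ["node-a", "node-b", "node-c", "node-d", "node-e"]

# A row owned by node-d (index 3) with replication_factor 3 wraps around the ring.
placement = replicas_for(token_owner_index=3, ring=ring, replication_factor=3)
print(placement)
```

With a replication factor of 3, each row lives on three distinct nodes, so the keyspace still holds a copy of the row if one of them is down.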
VERIFY
DESCRIBE keyspaces;
tutorialspoint system system_traces
CREATE
CREATE KEYSPACE test
... WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }
... AND DURABLE_WRITES = false;
VERIFY
cqlsh> SELECT * FROM system.schema_keyspaces;
 keyspace_name  | durable_writes | strategy_class                                       | strategy_options
----------------+----------------+------------------------------------------------------+------------------------------
 test           | False          | org.apache.cassandra.locator.NetworkTopologyStrategy | {"datacenter1" : "3"}
 tutorialspoint | True           | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor" : "4"}
 system         | True           | org.apache.cassandra.locator.LocalStrategy           | { }
Using a Keyspace
Syntax: USE <identifier>
cqlsh> USE tutorialspoint;
cqlsh:tutorialspoint>
Altering a KeySpace
ALTER KEYSPACE <identifier> WITH <properties>
ALTER KEYSPACE <keyspace name>
WITH replication = {'class': '<strategy name>', 'replication_factor': <number of replicas>};
cqlsh> ALTER KEYSPACE tutorialspoint
WITH replication = {'class':'NetworkTopologyStrategy', 'replication_factor' : 3};
Altering Durable_writes
SELECT * FROM system.schema_keyspaces;
 keyspace_name  | durable_writes | strategy_class                                       | strategy_options
----------------+----------------+------------------------------------------------------+------------------------------
 test           | False          | org.apache.cassandra.locator.NetworkTopologyStrategy | {"datacenter1":"3"}
 tutorialspoint | True           | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor":"4"}
 system         | True           | org.apache.cassandra.locator.LocalStrategy           | { }
 system_traces  | True           | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor":"2"}
(4 rows)
ALTER KEYSPACE test
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3}
AND DURABLE_WRITES = true;
DROP KEYSPACE
DROP KEYSPACE <keyspace name>
cqlsh> DROP KEYSPACE tutorialspoint;
VERIFY
cqlsh> DESCRIBE keyspaces;
system system_traces
Creating a Table
CREATE TABLE <tablename> (
   <column1 name> <data type> PRIMARY KEY,
   <column2 name> <data type>,
   <column3 name> <data type>
)
cqlsh> USE tutorialspoint;
cqlsh:tutorialspoint> CREATE TABLE emp(
   emp_id int PRIMARY KEY,
   emp_name text,
   emp_city text,
   emp_sal varint,
   emp_phone varint
   );
Verification
cqlsh:tutorialspoint> select * from emp;
 emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
ALTER TABLE
ALTER TABLE <tablename>
ADD <new column> <data type>;
cqlsh:tutorialspoint> ALTER TABLE emp
... ADD emp_email text;
Verification
 emp_id | emp_city | emp_email | emp_name | emp_phone | emp_sal
--------+----------+-----------+----------+-----------+---------
ALTER TABLE <tablename>
DROP <column name>;
cqlsh:tutorialspoint> ALTER TABLE emp DROP emp_email;
 emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
DROP TABLE
DROP TABLE <tablename>

cqlsh:tutorialspoint> DROP TABLE emp;
VERIFICATION
cqlsh:tutorialspoint> DESCRIBE COLUMNFAMILIES;
employee
TRUNCATE
TRUNCATE <tablename>
cqlsh:tp> select * from student;
 s_id | s_aggregate | s_branch | s_name
------+-------------+----------+--------
    1 |          70 |       IT |    ram
    2 |          75 |      EEE | rahman
    3 |          72 |     MECH | robbin
(3 rows)
cqlsh:tp> TRUNCATE student;
cqlsh:tp> select * from student;
 s_id | s_aggregate | s_branch | s_name
------+-------------+----------+--------
(0 rows)
Creating an Index using Cqlsh
CREATE INDEX <identifier> ON <tablename> (<column name>)
cqlsh:tutorialspoint> CREATE INDEX name ON emp1 (emp_name);
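Conceptually, an index on a non-key column maps each column value back to the rows that hold it, so a lookup does not have to examine every row. The sketch below shows the idea with plain Python dictionaries. It is not how Cassandra stores its indexes, and the sample rows are illustrative.

```python
from collections import defaultdict

# Rows keyed by primary key (illustrative employee data).
emp = {
    1: {"emp_name": "ram", "emp_city": "Hyderabad"},
    2: {"emp_name": "robin", "emp_city": "Hyderabad"},
    3: {"emp_name": "rahman", "emp_city": "Chennai"},
}


def build_index(rows, column):
    """Map each value of `column` to the primary keys of the rows holding it."""
    index = defaultdict(list)
    for primary_key, row in rows.items():
        index[row[column]].append(primary_key)
    return index


def scan(rows, column, value):
    """Without an index: examine every row."""
    return [pk for pk, row in rows.items() if row[column] == value]


city_index = build_index(emp, "emp_city")
via_index = city_index["Hyderabad"]  # one dictionary lookup
via_scan = scan(emp, "emp_city", "Hyderabad")  # visits all rows

print(via_index, via_scan)
```

Both approaches return the same primary keys. The index answers the question with one lookup, at the cost of extra storage and of updating the index on every write.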
CREATE DATA
INSERT INTO <tablename>
(<column1 name>, <column2 name>....)
VALUES (<value1>, <value2>....)
USING <option>
cqlsh:tutorialspoint> INSERT INTO emp (emp_id, emp_name, emp_city,
   emp_phone, emp_sal) VALUES(1,'ram', 'Hyderabad', 9848022338, 50000);
cqlsh:tutorialspoint> INSERT INTO emp (emp_id, emp_name, emp_city,
   emp_phone, emp_sal) VALUES(2,'robin', 'Hyderabad', 9848022339, 40000);
cqlsh:tutorialspoint> INSERT INTO emp (emp_id, emp_name, emp_city,
   emp_phone, emp_sal) VALUES(3,'rahman', 'Chennai', 9848022330, 45000);
VERIFY
SELECT * FROM emp;
 emp_id | emp_city  | emp_name | emp_phone  | emp_sal
--------+-----------+----------+------------+---------
      1 | Hyderabad |      ram | 9848022338 |   50000
      2 | Hyderabad |    robin | 9848022339 |   40000
      3 |   Chennai |   rahman | 9848022330 |   45000
Updating Data in a Table
UPDATE <tablename>
SET <column name> = <new value>,
<column name> = <value>....
WHERE <condition>
 emp_id | emp_name | emp_city  | emp_phone  | emp_sal
--------+----------+-----------+------------+---------
      1 |      ram | Hyderabad | 9848022338 |   50000
      2 |    robin | Hyderabad | 9848022339 |   40000
      3 |   rahman |   Chennai | 9848022330 |   45000
cqlsh:tutorialspoint> UPDATE emp SET emp_city='Delhi',emp_sal=50000
   WHERE emp_id=2;
 emp_id | emp_city  | emp_name | emp_phone  | emp_sal
--------+-----------+----------+------------+---------
      1 | Hyderabad |      ram | 9848022338 |   50000
      2 |     Delhi |    robin | 9848022339 |   50000
      3 |   Chennai |   rahman | 9848022330 |   45000
Reading Data using Select Clause
SELECT <column list> FROM <tablename>
 emp_id | emp_name | emp_city  | emp_phone  | emp_sal
--------+----------+-----------+------------+---------
      1 |      ram | Hyderabad | 9848022338 |   50000
      2 |    robin |      null | 9848022339 |   50000
      3 |   rahman |   Chennai | 9848022330 |   50000
      4 |   rajeev |      Pune | 9848022331 |   30000
cqlsh:tutorialspoint> select * from emp;
 emp_id | emp_city  | emp_name | emp_phone  | emp_sal
--------+-----------+----------+------------+---------
      1 | Hyderabad |      ram | 9848022338 |   50000
      2 |      null |    robin | 9848022339 |   50000
      3 |   Chennai |   rahman | 9848022330 |   50000
      4 |      Pune |   rajeev | 9848022331 |   30000
(4 rows)
cqlsh:tutorialspoint> SELECT emp_name, emp_sal from emp;
 emp_name | emp_sal
----------+---------
      ram |   50000
    robin |   50000
   rajeev |   30000
   rahman |   50000
(4 rows)
SELECT <column list> FROM <tablename> WHERE <condition>;
cqlsh:tutorialspoint> CREATE INDEX ON emp(emp_sal);
cqlsh:tutorialspoint> SELECT * FROM emp WHERE emp_sal=50000;
 emp_id | emp_city  | emp_name | emp_phone  | emp_sal
--------+-----------+----------+------------+---------
      1 | Hyderabad |      ram | 9848022338 |   50000
      2 |      null |    robin | 9848022339 |   50000
      3 |   Chennai |   rahman | 9848022330 |   50000
DELETE FROM <identifier> WHERE <condition>;
 emp_id | emp_name | emp_city  | emp_phone  | emp_sal
--------+----------+-----------+------------+---------
      1 |      ram | Hyderabad | 9848022338 |   50000
      2 |    robin | Hyderabad | 9848022339 |   40000
      3 |   rahman |   Chennai | 9848022330 |   45000
cqlsh:tutorialspoint> DELETE emp_sal FROM emp WHERE emp_id=3;
Verification
 emp_id | emp_city  | emp_name | emp_phone  | emp_sal
--------+-----------+----------+------------+---------
      1 | Hyderabad |      ram | 9848022338 |   50000
      2 |     Delhi |    robin | 9848022339 |   50000
      3 |   Chennai |   rahman | 9848022330 |    null
(3 rows)
cqlsh:tutorialspoint> DELETE FROM emp WHERE emp_id=3;
 emp_id | emp_city  | emp_name | emp_phone  | emp_sal
--------+-----------+----------+------------+---------
      1 | Hyderabad |      ram | 9848022338 |   50000
      2 |     Delhi |    robin | 9848022339 |   50000
(2 rows)
