NoSQL database technology stores information in formats such as JSON documents instead of the
columns and rows used by relational databases. To be clear, NoSQL stands for “not only SQL”
rather than “no SQL” at all. A NoSQL JSON database can store and retrieve data using literally
“no SQL,” although many NoSQL systems also offer SQL-like query languages.
Consequently, NoSQL databases are built to be flexible, scalable, and capable of rapidly
responding to the data management demands of modern businesses.
The following defines the four most-popular types of NoSQL database:
• Document databases are primarily built for storing information as documents,
including, but not limited to, JSON documents. These systems can also be used for
storing XML documents, for example.
• Key-value stores group associated data in collections with records that are identified
with unique keys for easy retrieval. Key-value stores have just enough structure to
mirror the value of relational databases (as opposed to non-relational databases) while
still preserving the benefits of the NoSQL database structure.
• Wide-column databases use the tabular format of relational databases yet allow a wide
variance in how data is named and formatted in each row, even in the same table. Like
key-value stores, wide-column databases have some basic NoSQL structure while also
preserving a lot of flexibility.
• Graph databases use graph structures to define the relationships between stored data
points. Graph databases are useful for identifying patterns in unstructured and semi-
structured information.
Advantages of NoSQL: There are many advantages of working with NoSQL databases
such as MongoDB and Cassandra. The main advantages are high scalability and high
availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling.
Sharding means partitioning the data and placing the partitions on multiple
machines in such a way that the order of the data is preserved. Vertical scaling
means adding more resources to the existing machine, whereas horizontal scaling
means adding more machines to handle the data. Vertical scaling is limited by
the capacity of a single machine, while horizontal scaling can continue by
adding nodes. Examples of horizontally scaling databases are MongoDB,
Cassandra, etc. Because of this scalability, NoSQL can handle a huge amount of
data: as the data grows, the database scales out to handle it efficiently.
2. Flexibility: NoSQL databases are designed to handle unstructured or semi-
structured data, which means that they can accommodate dynamic changes to
the data model. This makes NoSQL databases a good fit for applications that
need to handle changing data requirements.
3. High availability: The auto-replication feature of NoSQL databases makes them
highly available, because in case of a failure the data can be restored from a
replica to the previous consistent state.
4. Scalability: NoSQL databases are highly scalable, which means that they can
handle large amounts of data and traffic with ease. This makes them a good fit
for applications that need to handle large amounts of data or traffic.
5. Performance: NoSQL databases are designed to handle large amounts of data
and traffic, which means that they can offer improved performance compared to
traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective than
traditional relational databases, as they are typically less complex and do not
require expensive hardware or software.
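The sharding idea from point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MongoDB or Cassandra implement it: the class name RangeShardedStore and the split points are hypothetical, and the sketch uses range-based sharding so that the order of the keys is preserved across shards.

```python
import bisect


class RangeShardedStore:
    """Toy range-based sharding: each shard owns a contiguous key range."""

    def __init__(self, split_points):
        # N sorted split points define N + 1 shards (hypothetical layout).
        self.split_points = sorted(split_points)
        self.shards = [dict() for _ in range(len(self.split_points) + 1)]

    def shard_for(self, key):
        # Index of the shard whose key range contains `key`.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)

    def scan(self):
        # Visiting the shards left to right yields keys in global order.
        for shard in self.shards:
            for key in sorted(shard):
                yield key, shard[key]


store = RangeShardedStore(split_points=["h", "q"])
for name in ["zoe", "adam", "mia", "ivan"]:
    store.put(name, len(name))
```

Scaling horizontally in this sketch would mean adding a split point and a new shard (machine); scaling vertically would mean giving one machine more memory or CPU.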
A distributed system is a network that stores data on more than one node (physical or
virtual machines) at the same time.
Consistency: means that all clients see the same data at the same time, no matter
which node they connect to in a distributed system. To achieve consistency, whenever
data is written to one node, it must be instantly forwarded or replicated to all the other
nodes in the system before the write is deemed successful.
Availability: means that every non-failing node returns a response for all read and
write requests in a reasonable amount of time, even if one or more nodes are down.
Another way to state this — all working nodes in the distributed system return a valid
response for any request, without failing or exception.
Partition Tolerance: means that the system continues to operate despite arbitrary
message loss or failure of part of the system. In other words, even if there is a network
outage in the data center and some of the computers are unreachable, still the system
continues to perform. Distributed systems guaranteeing partition tolerance can
gracefully recover from partitions once the partition heals.
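The trade-off between these three properties can be made concrete with a small Python sketch. It is a simplified model under stated assumptions: the function replicated_write, the WriteRejected exception, and the "prefer" switch are hypothetical names used only for illustration. When a partition makes one node unreachable, a system that prefers consistency refuses the write, while a system that prefers availability accepts it and lets the replicas diverge for a while.

```python
class WriteRejected(Exception):
    """Raised when a consistency-first write cannot reach every replica."""


def replicated_write(nodes, reachable, key, value, prefer):
    """Write key=value to the reachable nodes.

    nodes: list of dicts, one per replica.
    reachable: indexes of the replicas the coordinator can contact.
    prefer: "consistency" or "availability" (hypothetical switch).
    """
    if prefer == "consistency" and len(reachable) < len(nodes):
        raise WriteRejected("cannot replicate to every node")
    for i in reachable:
        nodes[i][key] = value


nodes = [{}, {}, {}]
replicated_write(nodes, [0, 1, 2], "x", 1, prefer="consistency")

# A network partition makes node 2 unreachable.
try:
    replicated_write(nodes, [0, 1], "x", 2, prefer="consistency")
    cp_outcome = "accepted"
except WriteRejected:
    cp_outcome = "rejected"

# The availability-first write succeeds, but node 2 now holds stale data.
replicated_write(nodes, [0, 1], "x", 2, prefer="availability")
```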
A NoSQL database (sometimes called Not Only SQL) is a database that provides a
mechanism to store and retrieve data other than the tabular relations used in relational
databases. These databases are schema-free, support easy replication, have simple APIs,
are eventually consistent, and can handle huge amounts of data.
The primary objective of a NoSQL database is to have
• simplicity of design,
• horizontal scaling, and
• finer control over availability.
NoSQL databases use different data structures compared to relational databases, which makes
some operations faster in NoSQL. The suitability of a given NoSQL database depends on the
problem it must solve.
Cassandra has become so popular because of its outstanding technical features. Given below
are some of the features of Cassandra:
• Elastic scalability − Cassandra is highly scalable; it allows adding more
hardware to accommodate more customers and more data as per requirement.
• Always on architecture − Cassandra has no single point of failure and it is
continuously available for business-critical applications that cannot afford a
failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases
your throughput as you increase the number of nodes in the cluster. Therefore it
maintains a quick response time.
• Flexible data storage − Cassandra accommodates all possible data formats
including: structured, semi-structured, and unstructured. It can dynamically
accommodate changes to your data structures according to your need.
• Easy data distribution − Cassandra provides the flexibility to distribute data
where you need by replicating data across multiple data centers.
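• Transaction support − Cassandra provides atomicity, isolation, and durability
for writes within a single partition, combined with tunable consistency, rather
than the full Atomicity, Consistency, Isolation, and Durability (ACID)
transactions of an RDBMS.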
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It
performs blazingly fast writes and can store hundreds of terabytes of data,
without sacrificing the read efficiency.
The design goal of Cassandra is to handle big data workloads across multiple nodes without
any single point of failure. Cassandra has a peer-to-peer distributed system across its nodes,
and data is distributed among all the nodes in a cluster.
• All the nodes in a cluster play the same role. Each node is independent and at
the same time interconnected to other nodes.
• Each node in a cluster can accept read and write requests, regardless of where
the data is actually located in the cluster.
• When a node goes down, read/write requests can be served from other nodes in
the network.
Data Replication in Cassandra
In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of
data. If it is detected that some of the nodes responded with an out-of-date value,
Cassandra will return the most recent value to the client. After returning the most recent
value, Cassandra performs a read repair in the background to update the stale values.
The following figure shows a schematic view of how Cassandra uses data replication
among the nodes in a cluster to ensure no single point of failure.
Note − Cassandra uses the Gossip Protocol in the background to allow the nodes to
communicate with each other and detect any faulty nodes in the cluster.
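The read-repair behaviour described above can be sketched as follows. This is a toy model, not Cassandra's implementation: the Versioned class and read_with_repair function are hypothetical, replicas are plain dictionaries, and the repair runs inline rather than in the background.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # a larger timestamp means a more recent write


def read_with_repair(replicas, key):
    """Return the newest value for `key` and overwrite stale replica copies."""
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    newest = max(answers, key=lambda v: v.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # the "repair" step
    return newest.value


replicas = [
    {"user:1": Versioned("alice@old.example", 1)},  # stale replica
    {"user:1": Versioned("alice@new.example", 2)},  # up-to-date replica
    {},                                             # replica that missed the write
]
result = read_with_repair(replicas, "user:1")
```

After the call, the client has received the most recent value and all three replicas hold the same version.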
Components of Cassandra
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL
treats the database (Keyspace) as a container of tables. Programmers work with CQL through
cqlsh, an interactive prompt, or through separate application language drivers.
Clients approach any of the nodes for their read-write operations. That node (the coordinator)
acts as a proxy between the client and the nodes holding the data.
Write Operations
Every write to a node is first captured in the commit log written on that node. The data is
then stored in the mem-table. Whenever the mem-table is full, its data is
written into an SSTable data file. All writes are automatically partitioned and replicated
throughout the cluster. Cassandra periodically consolidates the SSTables, discarding
unnecessary data.
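The write path above (commit log, then mem-table, then SSTable, then consolidation) can be sketched in Python. This is a minimal single-node model under stated assumptions: the ToyNode class, the memtable_limit parameter, and the use of dictionaries for SSTables are all hypothetical simplifications, and partitioning and replication are left out.

```python
class ToyNode:
    """Toy model of one node's write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory copy of recent writes
        self.sstables = []     # flushed, immutable tables (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write
        self.memtable[key] = value            # 2. store it in the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. mem-table full: flush

    def flush(self):
        # Freeze the mem-table as a sorted table and start a new one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        # Merge all SSTables; newer values overwrite older ones, so
        # superseded data is discarded.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            if key in table:
                return table[key]
        return None


node = ToyNode(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 10), ("c", 3), ("d", 4)]:
    node.write(key, value)
tables_before_compaction = len(node.sstables)
node.compact()
```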
Read Operations
During read operations, Cassandra gets values from the mem-table and checks the bloom filter
to find the appropriate SSTable that holds the required data.
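The role of the bloom filter in the read path can be illustrated with a small sketch. It assumes a simplified layout in which each SSTable is a dictionary paired with its own filter; the BloomFilter class, its sizes, and find_in_sstables are hypothetical and not Cassandra's actual data structures. The key property is that a bloom filter may report a false positive but never a false negative, so a "no" answer lets the read skip that SSTable entirely.

```python
import hashlib


class BloomFilter:
    """Small bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_sstable(data):
    bloom = BloomFilter()
    for key in data:
        bloom.add(key)
    return bloom, data


def find_in_sstables(sstables, key):
    """Search SSTables newest-first, skipping those the filter rules out."""
    for bloom, data in reversed(sstables):
        if not bloom.might_contain(key):
            continue  # definitely absent: no need to read this table
        if key in data:  # the filter can give false positives, so check
            return data[key]
    return None


sstables = [
    build_sstable({"ram": 50000, "robin": 50000}),
    build_sstable({"rajeev": 30000, "ram": 55000}),
]
```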
Cassandra - Data Model
The data model of Cassandra is significantly different from what we normally see in an
RDBMS. This chapter provides an overview of how Cassandra stores its data.
Cluster
The Cassandra database is distributed over several machines that operate together. The outermost
container is known as the Cluster. For failure handling, every node contains a replica, and in
case of a failure, the replica takes charge. Cassandra arranges the nodes in a cluster, in a ring
format, and assigns data to them.
Keyspace
Keyspace is the outermost container for data in Cassandra. The basic attributes of a Keyspace
in Cassandra are −
• Replication factor − It is the number of machines in the cluster that will receive
copies of the same data.
• Replica placement strategy − It is nothing but the strategy to place replicas in
the ring. We have strategies such as simple strategy (rack-unaware strategy), old
network topology strategy (rack-aware strategy), and network topology
strategy (datacenter-shard strategy).
• Column families − Keyspace is a container for a list of one or more column
families. A column family, in turn, is a container of a collection of rows. Each
row contains ordered columns. Column families represent the structure of your
data. Each keyspace has at least one and often many column families.
The syntax of creating a Keyspace is as follows −
CREATE KEYSPACE <keyspace name>
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
The following illustration shows a schematic view of a Keyspace.
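How a replication factor of 3 with a simple, rack-unaware placement could map a row to nodes on the ring is sketched below. This is an illustrative model only: the Ring class and the node names are hypothetical, MD5 stands in for a real partitioner, and each node owns a single token.

```python
import bisect
import hashlib


def token(value):
    # Stand-in for a partitioner: hash a string to a position on the ring.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Nodes arranged in a ring, each owning one token."""

    def __init__(self, node_names, replication_factor):
        self.replication_factor = replication_factor
        self.ring = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.ring]

    def replicas_for(self, partition_key):
        # First replica: the first node at or after the key's token,
        # wrapping around the ring. The remaining replicas are the next
        # nodes clockwise.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
replicas = ring.replicas_for("emp:42")
```

With four nodes and a replication factor of 3, every row lives on three distinct nodes, so any single node can fail without making the row unavailable.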
Column Family
A column family is a container for an ordered collection of rows. Each row, in turn, is an
ordered collection of columns. The following table lists the points that differentiate a column
family from a table of relational databases.
Relational table | Cassandra column family
Relational tables define only columns and the user fills in the table with values. | In Cassandra, a table contains columns, or can be defined as a super column family.
SuperColumn
A super column is a special column, therefore, it is also a key-value pair. But a super column
stores a map of sub-columns.
Generally, column families are stored on disk in individual files. Therefore, to optimize
performance, it is important to keep columns that you are likely to query together in the same
column family, and a super column can be helpful here. Given below is the structure of a super
column.
The following table lists down the points that differentiate the data model of Cassandra from
that of an RDBMS.
RDBMS | Cassandra
RDBMS deals with structured data. | Cassandra deals with unstructured data.
Database is the outermost container that contains data corresponding to an application. | Keyspace is the outermost container that contains data corresponding to an application.
Tables are the entities of a database. | Tables or column families are the entity of a keyspace.
NOSQL IN CLOUD
NoSQL Cloud Database Services are cloud-based database services that provide scalable,
high-performance, and cost-effective solutions for storing and retrieving data. NoSQL (Not
Only SQL) databases are designed to handle large volumes of unstructured, semi-structured,
and structured data, and can easily scale horizontally to accommodate increased data
volumes.
The main types of NoSQL database are:
• Column-based
• Key-value store
• Graph databases
• Document-based
Google, Facebook, Amazon, and LinkedIn are a few of the top internet companies that
adopted NoSQL databases early to overcome the downsides of the Relational Database
Management System (RDBMS). An RDBMS is not reliably the best answer for all circumstances,
since data handling requirements grow dramatically. NoSQL provides a more dynamic approach
and a more cloud-friendly environment for processing unstructured data.
Oracle NoSQL Database Cloud Service is a fully managed, serverless data store.
It can deal with columnar, JSON document, or key-value data models. The NoSQL Cloud
Database Service provides the following key features:
• Flexible: The service is developer-centric, as it can be designed around the
needs of developers. Various data models are supported by the NoSQL
Database Cloud service, such as key-value, document-based, etc.
• Easy and Simple: NoSQL Database Cloud service makes application
development easy as it supports different languages like Java, Python, etc.
• Platform independent: Both fixed-schema and document-based data models
are supported by the NoSQL Database Cloud service.
• TABLE CREATION: There are two modes with which we can easily create
new tables of Oracle NoSQL cloud database service. The modes are as follows:
• The first mode helps in the creation of the table without writing the
Data Definition Language(DDL) statement, that is a table can be
created declaratively. This mode can be called simple input mode.
• The second mode is referred to as advanced DDL input mode. As the
name describes, the table is created with the help of the Data Definition
Language(DDL) statement.
• INSERTION OF DATA IN TABLES: There are two modes with which we can
easily insert data in the table of Oracle NoSQL cloud database service. The modes
are as follows:
• In the first mode, the value is provided declaratively for the rows in the
table. This mode is referred to as simple input mode.
• In the second mode, the value is provided in JavaScript Object
Notation (JSON) format for rows in the table, so this mode is called advanced
JSON input mode.
Common Use Cases of Oracle NoSQL Cloud Database Service:
• Personalized experience: According to the user, these services help to deliver
customized content and also provide personalized individual experiences.
• Mobile apps: Advanced applications and applications with faster response time
can be made with the help of cloud database services for mobile.
• Games: It helps by supporting gaming with millions of simultaneous
participants with real-time situations.
Oracle NoSQL Database Cloud Simulator:
Oracle NoSQL Database Cloud Simulator is a tool provided by Oracle to
accelerate Oracle NoSQL Database cloud application development and testing. CloudSim is
another name for the Cloud Simulator. It helps in creating, debugging, and testing the
application against a locally deployed database instance. This locally deployed database has
functions equivalent to those of the actual Oracle NoSQL Database cloud service.
Oracle NoSQL Migrator:
Oracle NoSQL Data Migrator is a tool supporting the movement of Oracle NoSQL tables
from one data source to another data source. This utility supports multiple migration options,
such as:
• Oracle NoSQL Database on-premise to Oracle NoSQL Database Cloud Service
and vice-versa.
• Between two Oracle NoSQL on-premise Databases.
• Between two Oracle NoSQL Database Cloud Service Tables.
• JSON file to Oracle NoSQL Database on-premise and vice-versa.
Let’s begin with performance. Recall the CAP theorem, which I explained in my
article “NoSQL Categories Breakdown” a couple of weeks back, where it turned out that NoSQL
databases have enhanced availability over relational databases, but at the cost of weaker
database consistency. That would sort of imply that NoSQL databases always perform better.
The reality is they do perform better, but in very specific web-scale scenarios where the
load on the system is huge and what must be retrieved and saved is actually
rather straightforward and simple. In those situations NoSQL databases work well, and will
perform faster than a lot of relational databases.
But in other contexts where we don’t have that kind of load, NoSQL databases may actually be
slower, quite a bit slower in fact. So, in general, for line-of-business application loads you’re
going to want to use relational databases. They’ll be just as fast and probably faster, plus
because of the consistency guarantees they’ll be much more reliable, and for a business
application that may be something that’s not a luxury but rather a necessity. On the other hand,
in those cases where you have simple storage and retrieval, but you have to do an awful lot
of it, then NoSQL may be best.
Suitability really translates into the idea of having examples of where one or the other
works well. Think about the example of storing and retrieving configuration data,
personalization data, environmental data for the way an application has to come up and look
and be skinned and be branded, all kinds of settings data: the kinds of things that would
probably get stored in a flat file if it were just a desktop application that needed to save and
retrieve them. These are the kinds of things that tend to work pretty well for a NoSQL database.
The same goes for log data or any event-driven data: although you will very often find relational
databases used for those kinds of workloads, NoSQL databases in many cases will be a more
appropriate choice.
On the other hand, if you think about anything really that’s transactional, especially financial
transactional applications, anything to do with stock trades, accounting, credit card
transactions, these are things for which using a NoSQL database would really be a bit obtuse.
You can probably use relational for anything, and you can probably use NoSQL for anything.
There’s a law of computation that basically says any problem can be computed in any language,
and by extension, that likely means any data requirement can probably be satisfied by any
database. But it wouldn’t be satisfied especially well. So, realize that while there is overlap,
there are gray areas.
For the most part, things are rather mutually exclusive as to where each of these types of
databases works well. And one sure way to come up with a bad decision is to try and come up
with an endorsement for one or the other in every single case. That will almost always get you
in trouble.
The real question is whether the scenarios where relational works best are ones that you see in
your business, or whether the scenarios for NoSQL are the ones that you see in your business,
or whether you see a mix. If you only see one set of scenarios, then yes, it will appear that one
database type is always the answer. But that’s because the questions that you’re asking are in
effect limited.
Relational Databases –
The fundamental concept behind databases that use SQL, such as MySQL, Oracle Express Edition,
and MS-SQL, is that they are all Relational Database Management Systems that make
use of relations (generally referred to as tables) for storing data.
In a relational database, the data is correlated with the help of some common characteristics
that are present in the Dataset and the outcome of this is referred to as the Schema of the
RDBMS.
o Relational Database Management Systems that use SQL are schema-oriented,
i.e. the structure of the data should be known in advance, ensuring that the data
adheres to the schema.
o Examples of such predefined-schema applications that use SQL include
Payroll Management System, Order Processing, and Flight Reservations.
• SQL is poorly suited to processing unpredictable and unstructured information.
Big Data applications, however, demand an occurrence-oriented database that is
highly flexible and operates on a schema-less data model.
• SQL Databases are vertically scalable – this means that they can only be scaled by
enhancing the horse power of the implementation hardware, thereby making it a costly
deal for processing large batches of data.
• IT enterprises need to increase the RAM, SSD, CPU, etc., on a single server in order to
manage the increasing load on the RDBMS.
• With increasing size of the database or increasing number of users, Relational Database
Management Systems using SQL suffer from serious performance bottlenecks, making
real-time unstructured data processing a hard row to hoe.
• With Relational Database Management Systems, built-in clustering is difficult due to
the ACID properties of transactions.
NoSQL Databases
NoSQL is a database technology driven by Cloud Computing, the Web, Big Data and the Big
Users.
NoSQL now leads the way for popular internet companies such as LinkedIn, Google,
Amazon, and Facebook to overcome the drawbacks of the 40-year-old RDBMS.
A NoSQL database, also known as “Not Only SQL,” is an alternative to SQL databases that,
unlike SQL, does not require any kind of fixed table schema.
NoSQL generally scales horizontally and avoids major join operations on the data. A NoSQL
database can be referred to as structured storage, of which the relational database is a
subset.
NoSQL covers a multitude of databases, each having a different kind of data
storage model. The most popular types are Graph, Key-Value pairs, Columnar and Document.
In this world of dynamic schema where changes pour in every hour it is not possible to adhere
to the “Get it Right First” Strategy - which was a success with the outmoded static schema.
Web-centric businesses like Amazon, eBay, etc., were in need of a database like NoSQL
that can best match up with the changing data model, giving them greater levels of
flexibility in operations.
RDBMS requires a higher degree of Normalization i.e. data needs to be broken down into
several small logical tables to avoid data redundancy and duplication. Normalization helps
manage data in an efficient way, but the complexity of spanning several related tables involved
with normalization hampers the performance of data processing in relational databases using
SQL.
On the other hand, in NoSQL Databases such as Couchbase, Cassandra, and MongoDB, data
is stored in the form of flat collections where this data is duplicated repeatedly and a single
piece of data is hardly ever partitioned off but rather it is stored in the form of an entity. Hence,
reading or writing operations to a single entity have become easier and faster.
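The difference between the two layouts can be sketched with plain Python dictionaries standing in for tables and collections. The data, field names, and function names are hypothetical and chosen only for illustration: the normalized layout needs a join at read time, while the denormalized entity duplicates the customer fields and is read in one lookup.

```python
# Normalized, relational-style layout: data is split across tables and
# must be joined at read time.
customers = {1: {"name": "Asha", "city": "Pune"}}
orders = {100: {"customer_id": 1, "total": 250}}


def read_order_relational(order_id):
    order = orders[order_id]
    customer = customers[order["customer_id"]]  # the "join"
    return {
        "order_id": order_id,
        "total": order["total"],
        "customer_name": customer["name"],
        "customer_city": customer["city"],
    }


# Denormalized, NoSQL-style layout: the whole order is stored as a single
# entity, with the customer fields duplicated inside it.
order_entities = {
    100: {
        "order_id": 100,
        "total": 250,
        "customer_name": "Asha",
        "customer_city": "Pune",
    },
}


def read_order_entity(order_id):
    return order_entities[order_id]  # one lookup, no join
```

The cost of the second layout is that a change to the customer's city must be applied to every entity that duplicates it.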
NoSQL databases can also store and process data in real time, something that traditional SQL
databases struggle to do.
The most beneficial aspect of NoSQL databases like HBase for Hadoop, Couchbase, and 10gen’s
MongoDB is the ease of scalability to handle huge volumes of data.
For instance, if you operate an eCommerce website similar to Amazon and you happen to be
an overnight success - you will have tons of customers visiting your website.
Under such circumstances, if you are using a relational database, i.e., SQL, you will have to
meticulously replicate and repartition the database so as to fulfill the increasing demand of the
customers.
The manner in which NoSQL vs SQL databases scale up to meet the business requirements
affects the performance bottleneck of the application.
Generally, with an increase in demand, relational databases tend to scale up vertically, which
means that they add extra horsepower to the system to enable faster operations on the same
dataset. On the contrary, NoSQL databases like HBase, Couchbase, and MongoDB scale
horizontally with the addition of extra nodes (commodity database servers) to the resource pool,
so that the load can be distributed easily.
Relational databases using SQL have been legends in the database landscape for maintaining
integrity through the ACID properties (Atomicity, Consistency, Isolation, and Durability) of
transactions, and most storage vendors rely on these properties.
The main motive is to support isolated, indivisible transactions whose changes
are permanent, leaving the data in a consistent state.
NoSQL databases work on the concept of the CAP priorities: at any one time you can
choose any two of the three priorities in the CAP Theorem (Consistency, Availability, Partition
Tolerance), as it is highly difficult to attain all three in a changing distributed node system.
One can term NoSQL databases as BASE, the opposite of ACID, meaning:
BA = Basically Available – The system remains available for requests, in the sense of the CAP
Theorem.
S = Soft State – The state of the system can change at any time, even without a query being
executed, because node updates take place every now and then to fulfill the ever-changing
requirements.
E = Eventually Consistent – NoSQL database systems will become consistent in the long run.
Why should you choose a NoSQL Database like HBase, Couchbase or Cassandra over
RDBMS?
1) Applications and databases need to work with Big Data.
2) Big Data needs a flexible data model with a better database architecture.
3) To process Big Data, these databases need continuous application availability with modern
transaction support.
NoSQL in Big Data Applications
1. HBase for Hadoop, a popular NoSQL database is used extensively by Facebook for its
messaging infrastructure.
2. HBase is used by Twitter for generating data, storing, logging, and monitoring data
around people’s search.
3. HBase is used by the discovery engine StumbleUpon for data analytics and storage.
4. MongoDB is another NoSQL database, used by CERN, the European Organization for Nuclear
Research, for collecting data from the Large Hadron Collider.
5. LinkedIn, Orbitz, and Concur use the Couchbase NoSQL Database for various data
processing and monitoring tasks.
CASSANDRA COMMANDS
CREATE KEYSPACE
The CREATE KEYSPACE statement has two properties: replication and durable_writes.
VERIFY
DESCRIBE keyspaces;
tutorialspoint system system_traces
CREATE
VERIFY
Using a Keyspace
Syntax:USE <identifier>
cqlsh> USE tutorialspoint;
cqlsh:tutorialspoint>
Altering a KeySpace
Altering Durable_writes
SELECT * FROM system.schema_keyspaces;
DROP KEYSPACE
VERIFY
Creating a Table
Verification
cqlsh:tutorialspoint> select * from emp;
ALTER COLUMN
Verification
DROP TABLE
VERIFICATION
cqlsh:tutorialspoint> DESCRIBE COLUMNFAMILIES;
employee
TRUNCATE
TRUNCATE <tablename>
cqlsh:tp> select * from student;
(3 rows)
(0 rows)
CREATE DATA
VERIFY
UPDATE <tablename>
SET <column name> = <new value>,
<column name> = <value>....
WHERE <condition>
(4 rows)
emp_name | emp_sal
----------+---------
ram | 50000
robin | 50000
rajeev | 30000
rahman | 50000
(4 rows)
Verification
cqlsh:tutorialspoint> select * from emp;
(2 rows)