You are on page 1of 15

Unit-2 NOSQL Databases

Unit-2

 Difference between Relational database and NoSQL

1. Relational Database

RDBMS stands for Relational Database Management Systems. It is most popular database. In it,
data is store in the form of row that is in the form of tuple. It contains numbers of table and data can be
easily accessed because data is store in the table. This Model was proposed by E.F. Codd.
2. NoSQL

NoSQL Database stands for a non-SQL database. NoSQL database doesn’t use table to store
the data like relational database. It is used for storing and fetching the data in database and
generally used to store the large amount of data. It supports query language and provides better
performance.
Difference between Relational database and NoSQL :

Relational Database NoSQL

1. It is used to handle data coming in low 1. It is used to handle data coming in high
velocity. velocity.

2. It gives only read scalability. 2. It gives both read and write scalability.

3. It manages structured data. 3. It manages all type of data.

4. Data arrives from one or few locations. 4. Data arrives from many locations.

5. It supports complex transactions. 5. It supports simple transactions.

6. It has single point of failure. 6. No single point of failure.

7. It handles data in less volume. 7. It handles data in high volume.

8. Transactions written in one location. 8. Transactions written in many locations.

9. support ACID properties compliance 9. doesn’t support ACID properties

10. Its difficult to make changes in database once 10. Enables easy and frequent changes to
it is defined database

11. schema is mandatory to store the data 11. schema design is not required

12. Deployed in vertical fashion. 12. Deployed in Horizontal fashion.

Dept. of MCA 1
Unit-2 NOSQL Databases

 MongoDB, the most popular NoSQL database, is an open-source document-oriented database.


The term ‘NoSQL’ means ‘non-relational’. It means that MongoDB isn’t based on the table-like
relational database structure but provides an altogether different mechanism for storage and
retrieval of data. This format of storage is called BSON ( similar to JSON format).
A simple MongoDB document Structure:
{

title: 'Geeksforgeeks',

by: 'Harshit Gupta',

url: 'https://www.geeksforgeeks.org',

type: 'NoSQL'

SQL databases store data in tabular format. This data is stored in a predefined data model which is
not very much flexible for today’s real-world highly growing applications. Modern applications are
more networked, social and interactive than ever. Applications are storing more and more data
and are accessing it at higher rates.
Relational Database Management System(RDBMS) is not the correct choice when it comes to
handling big data by the virtue of their design since they are not horizontally scalable. If the
database runs on a single server, then it will reach a scaling limit. NoSQL databases are more
scalable and provide superior performance. MongoDB is such a NoSQL database that scales by
adding more and more servers and increases productivity with its flexible document model.
RDBMS vs MongoDB:
 RDBMS has a typical schema design that shows number of tables and the relationship between
these tables whereas MongoDB is document-oriented. There is no concept of schema or
relationship.
 Complex transactions are not supported in MongoDB because complex join operations are not
available.
 MongoDB allows a highly flexible and scalable document structure. For example, one data
document of a collection in MongoDB can have two fields whereas the other document in the same
collection can have four.
 MongoDB is faster as compared to RDBMS due to efficient indexing and storage techniques.
 There are a few terms that are related in both databases. What’s called Table in RDBMS is
called a Collection in MongoDB. Similarly, a Row is called a Document and a Column is called a
Field. MongoDB provides a default ‘_id’ (if not provided explicitly) which is a 12-byte
hexadecimal number that assures the uniqueness of every document. It is similar to the Primary
key in RDBMS.

Dept. of MCA 2
Unit-2 NOSQL Databases

Features of MongoDB:
 Document Oriented: MongoDB stores the main subject in the minimal number of documents
and not by breaking it up into multiple relational structures like RDBMS. For example, it stores
all the information of a computer in a single document called Computer and not in distinct
relational structures like CPU, RAM, Hard disk, etc.
 Indexing: Without indexing, a database would have to scan every document of a collection to
select those that match the query which would be inefficient. So, for efficient searching
Indexing is a must and MongoDB uses it to process huge volumes of data in very less time.
 Scalability: MongoDB scales horizontally using sharding (partitioning data across various
servers). Data is partitioned into data chunks using the shard key, and these data chunks are
evenly distributed across shards that reside across many physical servers. Also, new machines
can be added to a running database.
 Replication and High Availability: MongoDB increases the data availability with multiple
copies of data on different servers. By providing redundancy, it protects the database from
hardware failures. If one server goes down, the data can be retrieved easily from other
active servers which also had the data stored on them.
 Aggregation: Aggregation operations process data records and return the computed results. It
is similar to the GROUPBY clause in SQL. A few aggregation expressions are sum, avg, min,
max, etc
Where do we use MongoDB?
MongoDB is preferred over RDBMS in the following scenarios:

 Big Data: If you have huge amount of data to be stored in tables, think of MongoDB before
RDBMS databases. MongoDB has built-in solution for partitioning and sharding your database.
 Unstable Schema: Adding a new column in RDBMS is hard whereas MongoDB is schema-less.
Adding a new field does not effect old documents and will be very easy.
 Distributed data Since multiple copies of data are stored across different servers, recovery of
data is instant and safe even if there is a hardware failure.
Language Support by MongoDB:
MongoDB currently provides official driver support for all popular programming languages
like C, C++, Rust, C#, Java, Node.js, Perl, PHP, Python, Ruby, Scala, Go, and Erlang.
 Cassandra
Apache Cassandra is a highly scalable, high-performance distributed database designed to handle
large amounts of data across many commodity servers, providing high availability with no single point
of failure. It is a type of NoSQL database. Let us first understand what a NoSQL database does.
NoSQL Database
A NoSQL database (sometimes called as Not Only SQL) is a database that provides a
mechanism to store and retrieve data other than the tabular relations used in relational databases.

Dept. of MCA 3
Unit-2 NOSQL Databases

These databases are schema-free, support easy replication, have simple API, eventually consistent, and
can handle huge amounts of data.
The primary objective of a NoSQL database is to have
 simplicity of design,
 horizontal scaling, and
 finer control over availability.
NoSql databases use different data structures compared to relational databases. It makes some
operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must
solve.
NoSQL vs. Relational Database
The following table lists the points that differentiate a relational database from a NoSQL database.

Relational Database NoSql Database

Supports powerful query language. Supports very simple query language.

It has a fixed schema. No fixed schema.

Follows ACID (Atomicity, Consistency, Isolation, and


It is only “eventually consistent”.
Durability).

Supports transactions. Does not support transactions.

Besides Cassandra, we have the following NoSQL databases that are quite popular −
 Apache HBase − HBase is an open source, non-relational, distributed database modeled after
Google’s BigTable and is written in Java. It is developed as a part of Apache Hadoop project
and runs on top of HDFS, providing BigTable-like capabilities for Hadoop.
 MongoDB − MongoDB is a cross-platform document-oriented database system that avoids using
the traditional table-based relational database structure in favor of JSON-like documents with
dynamic schemas making the integration of data in certain types of applications easier and
faster.
What is Apache Cassandra?
Apache Cassandra is an open source, distributed and decentralized/distributed storage system
(database), for managing very large amounts of structured data spread out across the world. It
provides highly available service with no single point of failure.
Listed below are some of the notable points of Apache Cassandra −
 It is scalable, fault-tolerant, and consistent.
 It is a column-oriented database.
 Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
 Created at Facebook, it differs sharply from relational database management systems.

Dept. of MCA 4
Unit-2 NOSQL Databases

 Cassandra implements a Dynamo-style replication model with no single point of failure, but
adds a more powerful “column family” data model.
 Cassandra is being used by some of the biggest companies such as Facebook, Twitter, Cisco,
Rackspace, ebay, Twitter, Netflix, and more.
Features of Cassandra
Cassandra has become so popular because of its outstanding technical features. Given below are some
of the features of Cassandra:
 Elastic scalability − Cassandra is highly scalable; it allows to add more hardware to
accommodate more customers and more data as per requirement.
 Always on architecture − Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.
 Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput
as you increase the number of nodes in the cluster. Therefore it maintains a quick response time.
 Flexible data storage − Cassandra accommodates all possible data formats including:
structured, semi-structured, and unstructured. It can dynamically accommodate changes to your
data structures according to your need.
 Easy data distribution − Cassandra provides the flexibility to distribute data where you need
by replicating data across multiple data centers.
 Transaction support − Cassandra supports properties like Atomicity, Consistency, Isolation, and
Durability (ACID).
 Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs
blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read
efficiency.
History of Cassandra
 Cassandra was developed at Facebook for inbox search.
 It was open-sourced by Facebook in July 2008.
 Cassandra was accepted into Apache Incubator in March 2009.
 It was made an Apache top-level project since February 2010.
 Apache HBase
HBase is a data model that is similar to Google’s big table. It is an open source, distributed database
developed by Apache software foundation written in Java. HBase is an essential part of our Hadoop
ecosystem. HBase runs on top of HDFS (Hadoop Distributed File System). It can store massive amounts of
data from terabytes to petabytes. It is column oriented and horizontally scalable.

Dept. of MCA 5
Unit-2 NOSQL Databases

Figure – History of HBase


Applications of Apache HBase:
Real-time analytics: HBase is an excellent choice for real-time analytics applications that require low-
latency data access. It provides fast read and write performance and can handle large amounts of
data, making it suitable for real-time data analysis.
Social media applications: HBase is an ideal database for social media applications that require high
scalability and performance. It can handle the large volume of data generated by social media
platforms and provide real-time analytics capabilities.
IoT applications: HBase can be used for Internet of Things (IoT) applications that require storing and
processing large volumes of sensor data. HBase’s scalable architecture and fast write performance
make it a suitable choice for IoT applications that require low-latency data processing.
Online transaction processing: HBase can be used as an online transaction processing (OLTP)
database, providing high availability, consistency, and low-latency data access. HBase’s distributed
architecture and automatic failover capabilities make it a good fit for OLTP applications that require
high availability.
Ad serving and clickstream analysis: HBase can be used to store and process large volumes of
clickstream data for ad serving and clickstream analysis. HBase’s column-oriented data storage and
indexing capabilities make it a good fit for these types of applications.
Features of HBase –
1. It is linearly scalable across various nodes as well as modularly scalable, as it divided across
various nodes.
2. HBase provides consistent read and writes.
3. It provides atomic read and write means during one read or write process, all other processes
are prevented from performing any read or write operations.
4. It provides easy to use Java API for client access.
5. It supports Thrift and REST API for non-Java front ends which supports XML, Protobuf and binary
data encoding options.

Dept. of MCA 6
Unit-2 NOSQL Databases

6. It supports a Block Cache and Bloom Filters for real-time queries and for high volume query
optimization.
7. HBase provides automatic failure support between Region Servers.
8. It support for exporting metrics with the Hadoop metrics subsystem to files.
9. It doesn’t enforce relationship within your data.
10. It is a platform for storing and retrieving data with random access.
Facebook Messenger Platform was using Apache Cassandra but it shifted from Apache Cassandra to
HBase in November 2010. Facebook was trying to build a scalable and robust infrastructure to handle
set of services like messages, email, chat and SMS into a real time conversation so that’s why HBase is
best suited for that.
RDBMS Vs HBase –
1. RDBMS is mostly Row Oriented whereas HBase is Column Oriented.
2. RDBMS has fixed schema but in HBase we can scale or add columns in run time also.
3. RDBMS is good for structured data whereas HBase is good for semi-structured data.
4. RDBMS is optimized for joins but HBase is not optimized for joins.
Apache HBase is a NoSQL, column-oriented database that is built on top of the Hadoop ecosystem. It is
designed to provide low-latency, high-throughput access to large-scale, distributed datasets. Here are
some of the advantages and disadvantages of using HBase:
Advantages Of Apache HBase:
1. Scalability: HBase can handle extremely large datasets that can be distributed across a cluster
of machines. It is designed to scale horizontally by adding more nodes to the cluster, which
allows it to handle increasingly larger amounts of data.
2. High-performance: HBase is optimized for low-latency, high-throughput access to data. It uses a
distributed architecture that allows it to process large amounts of data in parallel, which can
result in faster query response times.
3. Flexible data model: HBase’s column-oriented data model allows for flexible schema design
and supports sparse datasets. This can make it easier to work with data that has a variable or
evolving schema.
4. Fault tolerance: HBase is designed to be fault-tolerant by replicating data across multiple
nodes in the cluster. This helps ensure that data is not lost in the event of a hardware or network
failure.
Disadvantages Of Apache HBase:
1. Complexity: HBase can be complex to set up and manage. It requires knowledge of the
Hadoop ecosystem and distributed systems concepts, which can be a steep learning curve for
some users.
2. Limited query language: HBase’s query language, HBase Shell, is not as feature-rich as SQL.
This can make it difficult to perform complex queries and analyses.

Dept. of MCA 7
Unit-2 NOSQL Databases

3. No support for transactions: HBase does not support transactions, which can make it difficult to
maintain data consistency in some use cases.
4. Not suitable for all use cases: HBase is best suited for use cases where high throughput and
low-latency access to large datasets is required. It may not be the best choice for applications
that require real-time processing or strong consistency guarantees.
 Neo4j Introduction
Neo4j: Neo4j is the most famous database management system and it is also a NoSQL database
system. Neo4j is different from Mysql or MongoDB it has its own features that’s makes it special
compared to other Database Management System.
Neo4j structure: Neo4j stores and present the data in the form of graph not in tabular format or not in
a Jason format. Here the whole data is represented by nodes and there you can create a relationship
between nodes. That means the whole database collection will look like a graph, that’s why it is making
it unique from other database management system. MS Access, SQL server all the relational database
management system use tables to store or present the data with the help of column and row but Neo4j
doesn’t use tables, row or columns like old school style to store or present the data.
Neo4j Usage: If your Database Management System has so many interconnecting relationships then
you can use Neo4j that will be the best choice. Neo4j is highly preferable to store data that contains
multiple connections between nodes. This is where the Neo4j(Graph Database) comes in it’s more
comfortable to use with relational data than the relational database. Because Neo4j doesn’t require a
predefined schema, you just need to load the data here the data is the main structure. It is schema
optional Database Management System.
There are some unique features that will make you choose Neo4j over any other Database
Management System. Neo4j is surrounded by relationships but there is no need to set up primary key
or foreign key constraints to any data. Here you can add any relation between any nodes you want.
That makes the Neo4j extremely suited for Networking data, below is the list of data areas where you
can use this Database Management System.
 Social network like in Facebook, Twitter or in Instagram
 Network Diagram
 Fraud Detection
 Graph based searched of digital assets
 Data Management
 Real-time product recommendation
Advantages:
1. Representation of connected data is very easy.
2. Retrieval or traversal or navigation of connected data is very fast.
3. It uses simple and powerful data model.
4. It can represent semi-structured data is easy.

Dept. of MCA 8
Unit-2 NOSQL Databases

Disadvantages:
1. OLAP support for these types of databases is not well executed.
2. In this area, still there are lots of research happening around.
 Difference between NoSQL and RDBMS
The following table highlights the major differences between NoSQL and RDBMS −

Basis of
NoSQL RDBMS
Comparison

Non-relational databases, often RDBMS, which stands for Relational Database


Definition known as distributed databases, are Management Systems, is the most common
another name for NoSQL databases. name for SQL databases.

Query No declarative query language SQL stands for Structured Query Language.

NoSQL databases are horizontally RDBMS databases are vertically scalable


Scalability
scalable

NoSQL combines multiple database Traditional RDBMS systems use SQL syntax
technologies. These databases were and queries to get insights from data.
Design
created in response to the Different OLAP systems use them.
application's requirements.

NoSQL databases use Relational database models contain data in


denormalization to optimise different tables; when running a query, you
themselves. One record stores all the must integrate the information and set table-
Speed
query data. This simplifies finding spanning restrictions. Because of so many
matched records, which speeds up tables, the database's query time is slow.
queries.
 Challenges NoSQL Approach
A NoSQL originally referring to non SQL or non relational is a database that provides a mechanism for
storage and retrieval of data. This data is modeled in means other than the tabular relations used in
relational databases. For a variety of reasons, businesses rely on relational database technology that
has been around for decades. Long-running, mission-critical applications are frequently supported by
relational databases, which have robust technology and expert support. However, as new Big Data use
cases and applications develop, businesses are turning to NoSQL database technology to meet their
needs.
Challenges of NoSQL:
 Back-ups in Application Consistent: Determining the sequence of changes across replicas,
which is critical in selecting which values should compose a snapshot, is a typical difficulty in
quorum-based replication systems. For example, determining a rigorous ordering between two
write requests to the same database object that arrived at two distinct nodes at the same time is
challenging. As a result of the absence of ordering, determining the most recent value of a
database item at any given time is difficult.

Dept. of MCA 9
Unit-2 NOSQL Databases

 Maturity: NoSQL proponents would agree that increasing demands will inevitably lead to
obsolescence, but the RDBMS maturity cycle appears to be more secure. RDBMS is more reliable
and functional since it has been around for decades. Many NoSQL alternatives are still in the
pre-production stage and do not yet have all of their essential features implemented.
 Database and node failures during backups and restores : Node failures are common in
NoSQL databases since they are designed to grow to hundreds of nodes. As a result, any
backup method must be able to account for data collection failures from down nodes and their
influence on quorum consistency. On the other hand, restorations must take into consideration
failed cluster nodes and modify data re population correspondingly.
 Analytics and business intelligence: NoSQL was created to fulfill the needs of Web 2.0
applications, and as a result, all of its characteristics are geared toward that goal. Other
commercial systems, on the other hand, necessitate moving beyond the insert-read-update-
delete cycle. Even the most basic queries need extensive programming knowledge, and
integrated BI tools are insufficient.
 Performance : Consider the same system as before in terms of performance (RPOs and RTOs).
Because it is impossible to parallelize database file processing beyond a certain point, it will be
impossible to achieve customer SLAs in terms of RTO if per row processing is sluggish. Due to the
same parallelization limitations, recovery procedures are also impacted. This means that backup
and recovery processing for 100 billion rows over 10 million files must be quick, and a solution
must dramatically minimize per-row and per-file database file processing.
 Data Integrity: Verifying data integrity at the block level is another issue with distributed
NoSQL databases. Checksum work for scale-up databases because the restored data is
physically identical to the backup data. The restored data in scale-out databases is semantically
comparable to the backup data, but it is not physically identical. In this situation, we’ll need to
come up with a unique method for identifying semantic equivalence between recovered and
backup data, which will allow us to spot data corruption issues that may arise throughout the
backup and restoration process.
 Expertise: Despite the fact that there are millions of developers throughout the world, the
majority of them are familiar with RDBMS architecture. All NoSQL developers should be
considered to be in the early stages of their careers and have yet to achieve expert status.
While this problem will eventually be resolved, selecting the correct developer is now difficult.
 De-duplication: In traditional database systems, de-duplication is simple—a block with identical
bitwise values. Data copies are not always identical in modern NoSQL database storage
systems because the sequencing and flushing of changes vary between nodes, creating a new
challenge for de-duplication.
Because of their increased capabilities/capacity and economies, NoSQL databases are becoming more
popular. However, developers should exercise caution and be mindful of the system’s real limits.

Dept. of MCA 10
Unit-2 NOSQL Databases

 Key-Value Data Model in NoSQL


A key-value data model or database is also referred to as a key-value store. It is a non-relational
type of database. In this, an associative array is used as a basic database in which an individual key is
linked with just one value in a collection. For the values, keys are special identifiers. Any kind of entity
can be valued. The collection of key-value pairs stored on separate records is called key-value
databases and they do not have an already defined structure.
How do key-value databases work?
A number of easy strings or even a complicated entity are referred to as a value that is associated with
a key by a key-value database, which is utilized to monitor the entity. Like in many programming
paradigms, a key-value database resembles a map object or array, or dictionary, however, which is
put away in a tenacious manner and controlled by a DBMS.
An efficient and compact structure of the index is used by the key-value store to have the option
to rapidly and dependably find value using its key. For example, Redis is a key-value store used to
tracklists, maps, heaps, and primitive types (which are simple data structures) in a constant database.
Redis can uncover a very basic point of interaction to query and manipulate value types, just by
supporting a predetermined number of value types, and when arranged, is prepared to do high
throughput.
When to use a key-value database:
Here are a few situations in which you can use a key-value database:-
 User session attributes in an online app like finance or gaming, which is referred to as real-time
random data access.
 Caching mechanism for repeatedly accessing data or key-based design.
 The application is developed on queries that are based on keys.
Features:
 One of the most un-complex kinds of NoSQL data models.
 For storing, getting, and removing data, key-value databases utilize simple functions.
 Querying language is not present in key-value databases.
 Built-in redundancy makes this database more reliable.
Advantages:
 It is very easy to use. Due to the simplicity of the database, data can accept any kind, or even
different kinds when required.
 Its response time is fast due to its simplicity, given that the remaining environment near it is very
much constructed and improved.
 Key-value store databases are scalable vertically as well as horizontally.
 Built-in redundancy makes this database more reliable.

Dept. of MCA 11
Unit-2 NOSQL Databases

Disadvantages:
 As querying language is not present in key-value databases, transportation of queries from
one database to a different database cannot be done.
 The key-value store database is not refined. You cannot query the database without a key.
Some examples of key-value databases:
Here are some popular key-value databases which are widely used:
 Couchbase: It permits SQL-style querying and searching for text.
 Amazon DynamoDB: The key-value database which is mostly used is Amazon DynamoDB
as it is a trusted database used by a large number of users. It can easily handle a large
number of requests every day and it also provides various security options.
 Riak: It is the database used to develop applications.
 Aerospike: It is an open-source and real-time database working with billions of exchanges.
 Berkeley DB: It is a high-performance and open-source database providing scalability.
 Columnar Data Model of NoSQL
The Columnar Data Model of NoSQL is important. NoSQL databases are different from SQL
databases. This is because it uses a data model that has a different structure than the previously
followed row-and-column table model used with relational database management systems databases
are a flexible schema model which is designed to scale horizontally across many servers and is used in
large volumes of data.
Columnar Data Model of NoSQL :
Basically, the relational database stores data in rows and also reads the data row by row, column
store is organized as a set of columns. So if someone wants to run analytics on a small number of
columns, one can read those columns directly without consuming memory with the unwanted data.
Columns are somehow are of the same type and gain from more efficient compression, which makes
reads faster than before. Examples of Columnar Data Model: Cassandra and Apache Hadoop Hbase.
Working of Columnar Data Model:
In Columnar Data Model instead of organizing information into rows, it does in columns. This makes them
function the same way that tables work in relational databases. This type of data model is much more
flexible obviously because it is a type of NoSQL database. The below example will help in
understanding the Columnar data model:
Row-Oriented Table:

S.No. Name Course Branch ID

01. Tanmay B-Tech Computer 2

02. Abhishek B-Tech Electronics 5

Dept. of MCA 12
Unit-2 NOSQL Databases

S.No. Name Course Branch ID

03. Samriddha B-Tech IT 7

04. Aditi B-Tech E & TC 8


Column – Oriented Table:

S.No. Name ID

01. Tanmay 2

02. Abhishek 5

03. Samriddha 7

04. Aditi 8

S.No. Course ID

01. B-Tech 2

02. B-Tech 5

03. B-Tech 7

04. B-Tech 8

S.No. Branch ID

01. Computer 2

02. Electronics 5

03. IT 7

04. E & TC 8

Columnar Data Model uses the concept of keyspace, which is like a schema in relational models.
Advantages of Columnar Data Model :
 Well structured: Since these data models are good at compression so these are very structured
or well organized in terms of storage.

Dept. of MCA 13
Unit-2 NOSQL Databases

 Flexibility: A large amount of flexibility as it is not necessary for the columns to look like each
other, which means one can add new and different columns without disrupting the whole
database
 Aggregation queries are fast: The most important thing is aggregation queries are quite fast
because a majority of the information is stored in a column. An example would be Adding up
the total number of students enrolled in one year.
 Scalability: It can be spread across large clusters of machines, even numbering in thousands.
 Load Times: Since one can easily load a row table in a few seconds so load times are nearly
excellent.
Disadvantages of Columnar Data Model:
 Designing indexing Schema: To design an effective and working schema is too difficult and
very time-consuming.
 Suboptimal data loading: incremental data loading is suboptimal and must be avoided, but this
might not be an issue for some users.
 Security vulnerabilities: If security is one of the priorities then it must be known that the
Columnar data model lacks inbuilt security features in this case, one must look into relational
databases.
 Online Transaction Processing (OLTP): Online Transaction Processing (OLTP) applications are
also not compatible with columnar data models because of the way data is stored.
Applications of Columnar Data Model:
 Columnar Data Model is very much used in various Blogging Platforms.
 It is used in Content management systems like WordPress, Joomla, etc.
 It is used in Systems that maintain counters.
 It is used in Systems that require heavy write requests.
 It is used in Services that have expiring usage.
 Aggregate-Oriented Databases in NoSQL
The aggregate-Oriented database is the NoSQL database which does not support ACID
transactions and they sacrifice one of the ACID properties. Aggregate orientation operations are
different compared to relational database operations. We can perform OLAP operations on
the Aggregate-Oriented database. The efficiency of the Aggregate-Oriented database is high if the
data transactions and interactions take place within the same aggregate. Several fields of data can be
put in the aggregates such that they can be commonly accessed together. We can manipulate only a
single aggregate at a time. We can not manipulate multiple aggregates at a time in an atomic way.
Aggregate – Oriented databases are classified into four major data models. They are as follows:
 Key-value
 Document
 Column family

Dept. of MCA 14
Unit-2 NOSQL Databases

 Graph-based
Each of the Data models above has its own query language.
 key-value Data Model: Key-value and document databases were strongly aggregate-oriented.
The key-value data model contains the key or Id which is used to access the data of the
aggregates. key-value Data Model is very secure as the aggregates are opaque to the
database. Aggregates are encrypted as the big blog of bits that can be decrypted with key or
id. In the key-value Data Model, we can place data of any structure and datatypes in it. The
advantage of the key-value Data Model is that we can store the sensitive information in the
aggregate. But the disadvantage of this model the database has some general size limits. We
can store only the limited data.
 Document Data Model: In Document Data Model we can access the parts of aggregates. The
data in this model can be accessed inflexible manner. we can submit queries to the database
based on the fields in the aggregate. There is a restriction on the structure and data types of
data to be paced in this data model. The structure of the aggregate can be accessed by
the Document Data Model.
 Column family Data Model: The Column family is also called a two-level map. But, however,
we think about the structure, it has been a model that influenced later databases such as HBase
and Cassandra. These databases with a big table-style data model are often referred to as
column stores. Column-family models divide the aggregate into column families. The Column-
family model is a two-level aggregate structure. The first level consists of keys that act as a row
identifier that selects the aggregate. The second-level values in the Column family Data Model
are referred to as columns.
 In the above example, the row key is 234 which selects the aggregate. Here the row key selects
the column families customer and orders. Each column family contains the columns of data. In the
orders column family, we have the orders placed by the customers.
 Graph Data Model: In a graph data model, the data is stored in nodes that are connected by
edges. This model is preferred to store a huge amount of complex aggregates and
multidimensional data with many interconnections between them. Graph Data Model has the
application like we can store the Facebook user accounts in the nodes and find out the friends of
the particular user by following the edges of the graph.
We can find the friends of a person by observing this graph data model. If there is an edge
between two nodes then we can say they are friends. Here we also consider the indirect links between
the nodes to determine the friend suggestions.

Dept. of MCA 15

You might also like