UNIT II First Half Notes

Introduction to NoSQL
NoSQL is a type of database management system (DBMS) that is designed
to handle and store large volumes of unstructured and semi-structured data.
Unlike traditional relational databases that use tables with pre-defined
schemas to store data, NoSQL databases use flexible data models that can
adapt to changes in data structures and are capable of scaling horizontally to
handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational”
databases, but the term has since evolved to mean “not only SQL,” as
NoSQL databases have expanded to include a wide range of different
database architectures and data models.
NoSQL databases are generally classified into four main categories:
1. Document databases: These databases store data as semi-structured
documents, such as JSON or XML, and can be queried using
document-oriented query languages.
2. Key-value stores: These databases store data as key-value pairs,
and are optimized for simple and fast read/write operations.
3. Column-family stores: These databases store data as column
families, which are sets of columns that are treated as a single entity.
They are optimized for fast and efficient querying of large amounts of
data.
4. Graph databases: These databases store data as nodes and edges,
and are designed to handle complex relationships between data.
NoSQL databases are often used in applications where there is a high
volume of data that needs to be processed and analyzed in real-time, such
as social media analytics, e-commerce, and gaming. They can also be used
for other applications, such as content management systems, document
management, and customer relationship management.
However, NoSQL databases may not be suitable for all applications, as they
may not provide the same level of data consistency and transactional
guarantees as traditional relational databases. It is important to carefully
evaluate the specific needs of an application when choosing a database
management system.
NoSQL, originally meaning “non-SQL” or “non-relational”, refers to databases
that provide a mechanism for storing and retrieving data that is modeled by
means other than the tabular relations used in relational databases. Such
databases have existed since the late 1960s, but did not obtain the
NoSQL moniker until a surge of popularity in the early twenty-first century.
NoSQL databases are used in real-time web applications and big data, and
their use is increasing over time.
 NoSQL systems are also sometimes called “Not only SQL” to
emphasize the fact that they may support SQL-like query languages. The
benefits commonly cited for a NoSQL database include simplicity of
design, simpler horizontal scaling to clusters of machines, and finer
control over availability. The data structures used by NoSQL databases
differ from those used by default in relational databases, which makes
some operations faster in NoSQL. The suitability of a given NoSQL
database depends on the problem it should solve.
 NoSQL databases, also known as “not only SQL” databases, are a
new type of database management system that has gained popularity in
recent years. Unlike traditional relational databases, NoSQL databases
are designed to handle large amounts of unstructured or semi-structured
data, and they can accommodate dynamic changes to the data model.
This makes NoSQL databases a good fit for modern web applications,
real-time analytics, and big data processing.
 Data structures used by NoSQL databases are sometimes also viewed
as more flexible than relational database tables. Many NoSQL stores
compromise consistency in favor of availability, speed and partition
tolerance. Barriers to the greater adoption of NoSQL stores include the
use of low-level query languages, lack of standardized interfaces, and
huge previous investments in existing relational databases.
 Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation,
Durability) transactions, but a few databases, such as MarkLogic,
Aerospike, FairCom c-treeACE, Google Spanner (though technically a
NewSQL database), Symas LMDB, and OrientDB, have made them
central to their designs.
 Most NoSQL databases offer a concept of eventual consistency, in
which database changes are propagated to all nodes over time. As a
result, a query might not return updated data immediately and may read
data that is out of date, a problem known as stale reads. Some NoSQL
systems may also exhibit lost writes and other forms of data loss.
Some NoSQL systems provide mechanisms such as write-ahead logging to
avoid data loss.
 One simple example of a NoSQL database is a document database. In
a document database, data is stored in documents rather than tables.
Each document can contain a different set of fields, making it easy to
accommodate changing data requirements.
 Take, for instance, a database that holds data about employees. In a
relational database, this information might be stored in tables, with
one table for employee information and another table for department
information. In a document database, each employee would be stored as
a separate document, with all of their information contained within
the document.
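The employee example above can be sketched in a few lines of Python. This is only an illustration using in-memory lists and dictionaries, not a real database; the employee names, department names, and field names are all hypothetical.

```python
# Relational shape: two "tables" linked by a department id (hypothetical data).
employees = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "dept_name": "Engineering"},
    {"dept_id": 20, "dept_name": "Sales"},
]


def relational_lookup(emp_id):
    """Combine an employee row with its department row (a manual join)."""
    emp = next(e for e in employees if e["emp_id"] == emp_id)
    dept = next(d for d in departments if d["dept_id"] == emp["dept_id"])
    return {"name": emp["name"], "department": dept["dept_name"]}


# Document shape: one self-contained document per employee.
# Documents need not share the same fields (employee 2 has no "skills").
employee_docs = {
    1: {"name": "Asha", "department": {"name": "Engineering"}, "skills": ["python"]},
    2: {"name": "Ravi", "department": {"name": "Sales"}},
}


def document_lookup(emp_id):
    """Read everything about an employee from a single document."""
    doc = employee_docs[emp_id]
    return {"name": doc["name"], "department": doc["department"]["name"]}
```

Both lookups return the same information, but the relational version has to combine two tables while the document version reads one document.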
Key Features of NoSQL :
1. Dynamic schema: NoSQL databases do not have a fixed schema and
can accommodate changing data structures without the need for
migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out
by adding more nodes to a database cluster, making them well-suited for
handling large amounts of data and high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use
a document-based data model, where data is stored in semi-structured
format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a
key-value data model, where data is stored as a collection of key-value
pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, use a
column-based (wide-column) data model, where data is organized into
column families rather than fixed relational rows.
6. Distributed and high availability: NoSQL databases are often
designed to be highly available and to automatically handle node failures
and data replication across multiple nodes in a database cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve
data in a flexible and dynamic manner, with support for multiple data
types and changing data structures.
8. Performance: NoSQL databases are optimized for high performance
and can handle a high volume of reads and writes, making them suitable
for big data and real-time applications.
Advantages of NoSQL: There are many advantages of working with NoSQL
databases such as MongoDB and Cassandra. The main advantages are
high scalability and high availability.
1. High scalability : NoSQL databases use sharding for horizontal
scaling. Sharding means partitioning the data and placing the partitions
on multiple machines, so that each machine holds only part of the data.
Vertical scaling means adding more resources to the existing machine,
whereas horizontal scaling means adding more machines to handle the data.
Vertical scaling is limited by the capacity of a single machine, while
horizontal scaling can continue by adding more machines.
Examples of horizontally scaling databases are MongoDB, Cassandra, etc.
Because of this scalability, NoSQL can handle a huge amount of data: as
the data grows, the database scales out to handle it efficiently.
2. Flexibility: NoSQL databases are designed to handle unstructured or
semi-structured data, which means that they can accommodate dynamic
changes to the data model. This makes NoSQL databases a good fit for
applications that need to handle changing data requirements.
3. High availability : The automatic replication feature in NoSQL
databases makes them highly available, because in case of a failure the
data can be restored from a replica to its last consistent state.
4. Scalability: NoSQL databases are highly scalable, which means that
they can handle large amounts of data and traffic with ease. This makes
them a good fit for applications that need to handle large amounts of data
or traffic.
5. Performance: NoSQL databases are designed to handle large
amounts of data and traffic, which means that they can offer improved
performance compared to traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective
than traditional relational databases, as they are typically less complex
and do not require expensive hardware or software.
7. Agility: Ideal for agile development.
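The idea of sharding mentioned under high scalability can be sketched as follows. This is a minimal illustration that assumes simple hash-based partitioning over in-memory dictionaries; real systems may instead use range-based partitioning or consistent hashing, and the names `shard_for` and `ShardedStore` are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy store in which each dict stands in for one machine (shard)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        # Only the shard that owns the key is touched.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Each key is stored on exactly one shard, so reads and writes for different keys can be served by different machines.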
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of standardization : There are many different types of NoSQL
databases, each with its own unique strengths and weaknesses. This lack
of standardization can make it difficult to choose the right database for a
specific application.
2. Lack of ACID compliance : Many NoSQL databases are not fully
ACID-compliant, which means that they may not guarantee the consistency,
integrity, and durability of data. This can be a drawback for applications
that require strong data consistency guarantees.
3. Narrow focus : NoSQL databases have a narrow focus: they are
mainly designed for storage and provide comparatively little other
functionality. Relational databases are a better choice than NoSQL in the
field of transaction management.
4. Open-source : Many NoSQL databases are open-source, and there is no
reliable standard for NoSQL yet. In other words, two NoSQL database
systems are likely to differ in their interfaces and behavior.
5. Lack of support for complex queries : NoSQL databases are not
designed to handle complex queries, which means that they are not a
good fit for applications that require complex data analysis or reporting.
6. Lack of maturity : NoSQL databases are relatively new and lack the
maturity of traditional relational databases. This can make them less
reliable and less secure than traditional databases.
7. Management challenge : The purpose of big data tools is to make the
management of a large amount of data as simple as possible. But it is not
so easy. Data management in NoSQL is much more complex than in a
relational database. NoSQL, in particular, has a reputation for being
challenging to install and even harder to manage on a daily basis.
8. GUI is not available : GUI tools to access the database are not
widely available in the market.
9. Backup : Backup is a weak point for some NoSQL databases, such as
MongoDB, where taking a backup that is consistent across a distributed
cluster can be difficult.
10. Large document size : Some database systems like MongoDB and
CouchDB store data in JSON or JSON-like formats. Because every document
carries its own field names, documents can be quite large, which matters
for big data, network bandwidth, and speed. Descriptive key names add to
this cost since they increase the document size.
Types of NoSQL database: The types of NoSQL databases, and examples of
database systems that fall into each category, are:
1. Graph Databases: Examples – Amazon Neptune, Neo4j
2. Key value store: Examples – Memcached, Redis, Coherence
3. Tabular: Examples – HBase, Bigtable, Accumulo
4. Document-based: Examples – MongoDB, CouchDB, Cloudant
When should NoSQL be used:
1. When a huge amount of data needs to be stored and retrieved.
2. The relationship between the data you store is not that important
3. The data changes over time and is not structured.
4. Support of Constraints and Joins is not required at the database level
5. The data is growing continuously and you need to scale the database
regularly to handle the data.
Types of NoSQL:
A database is a collection of structured data or information which is stored in
a computer system and can be accessed easily. A database is usually
managed by a Database Management System (DBMS). NoSQL is a
non-relational database that is used to store data in non-tabular form.
NoSQL stands for Not only SQL. The main types are document, key-value,
wide-column, and graph.
Types of NoSQL Database:

 Document-based databases
 Key-value stores
 Column-oriented databases
 Graph-based databases
Document-Based Database:
The document-based database is a nonrelational database. Instead of
storing the data in rows and columns (tables), it uses documents to store
the data in the database. A document database stores data in JSON, BSON,
or XML documents.
Documents can be stored and retrieved in a form that is much closer to the
data objects used in applications, which means less translation is required to
use the data in applications. In a document database, particular
elements can be indexed so that they can be queried faster.
Collections are groups of documents that have similar contents. The
documents in a collection are not required to share the same schema,
because document databases have a flexible schema.
Key features of documents database:
 Flexible schema: Documents in the database have a flexible schema. It
means the documents in the database need not have the same schema.
 Faster creation and maintenance: The creation of documents is easy
and minimal maintenance is required once we create the document.
 No foreign keys: There is no dynamic relationship between two
documents so documents can be independent of one another. So, there
is no requirement for a foreign key in a document database.
 Open formats: To build a document we use XML, JSON, and others.
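The flexible schema and field-based querying described above can be illustrated with a short Python sketch. It models a collection as a plain list of dictionaries; the documents, field names, and the helper functions `find` and `find_with_tag` are hypothetical and do not correspond to any particular database's API.

```python
# A "collection" whose documents do not all share the same fields.
collection = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["db", "nosql"]},
    {"_id": 2, "title": "Graph basics", "author": "K. Rao"},
    {"_id": 3, "title": "Sharding", "tags": ["scaling"], "pages": 12},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def find_with_tag(docs, tag):
    """Return documents whose optional 'tags' list contains the tag."""
    return [d for d in docs if tag in d.get("tags", [])]
```

Documents that lack a queried field are simply skipped, so no schema change is needed when a new field such as "pages" appears in only some documents.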
Key-Value Stores:
A key-value store is a nonrelational database and is the simplest form of a
NoSQL database. Every data element in the database is stored
as a key-value pair. The data can be retrieved by using a unique key allotted
to each element in the database. The values can be simple data types like
strings and numbers or complex objects.
A key-value store is like a relational database with only two columns: the
key and the value.
Key features of the key-value store:
 Simplicity.
 Scalability.
 Speed.
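A key-value store can be pictured as a dictionary with put, get, and delete operations. The class below is a minimal in-memory sketch under that assumption; the name `KeyValueStore` and its methods are hypothetical rather than the interface of any real product.

```python
class KeyValueStore:
    """Toy key-value store: every value is reached only through its key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values may be simple types or complex objects.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False
```

For example, `store.put("user:1", {"name": "Asha"})` followed by `store.get("user:1")` returns the stored object; there is no way to search by the contents of the value.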
Column Oriented Databases:
A column-oriented database is a non-relational database that stores the data
in columns instead of rows. That means when we want to run analytics on a
small number of columns, we can read those columns directly without
consuming memory with the unwanted data.
Columnar databases are designed to read data more efficiently and retrieve
the data with greater speed. A columnar database is used to store a large
amount of data.
Key features of column-oriented databases:
 Scalability.
 Compression.
 Very responsive.
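The difference between row and column layouts can be sketched in Python. This is only an in-memory illustration with hypothetical sales data; real columnar databases add compression and on-disk storage that are not shown here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 50},
]


def to_columns(records):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: each column is stored as its own list.
columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_amount = sum(columns["amount"])
```

Summing `columns["amount"]` never touches the "id" or "city" values, which is the benefit described above for analytics on a small number of columns.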
Graph-Based databases:
Graph-based databases focus on the relationships between the elements. They
store the data in the form of nodes in the database. The connections
between the nodes are called links or relationships.
Key features of graph database:
 In a graph-based database, it is easy to identify the relationship
between the data by using the links.
 Query results are returned in real time.
 The speed depends upon the number of relationships among the
database elements.
 Updating data is also easy, as adding a new node or edge to a graph
database is a straightforward task that does not require significant
schema changes.
Aggregate Data Models
The term aggregate means a collection of objects that we treat as a
unit. An aggregate is a collection of data that we interact with as a unit.
These units of data or aggregates form the boundaries for ACID operations.
In the diagram there are two aggregates, Customer and Order:
 Customer and Order are each an aggregate, and the link between them
represents the relationship between the two aggregates.
 The diamond shows how data fits into the aggregate structure.
 Customer contains a list of billing addresses.
 Payment also contains the billing address.
 The address appears three times and it is copied each time.
 This fits a domain where we don’t want the shipping and billing
addresses to change.
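The two aggregates described above can be sketched as nested Python dictionaries. The field names and values are hypothetical, chosen only to mirror the bullet points: the address is copied into each place it is used rather than shared by reference.

```python
import copy


def make_aggregates():
    """Build a customer aggregate and an order aggregate (hypothetical data)."""
    address = {"city": "Pune"}

    customer = {
        "id": 1,
        "name": "Asha",
        "billing_addresses": [copy.deepcopy(address)],
    }

    order = {
        "id": 99,
        "customer_id": 1,  # link from the order to the customer aggregate
        "order_items": [{"product_id": 27, "price": 250}],
        "shipping_address": copy.deepcopy(address),
        "order_payments": [
            {"card": "xxxx-1000", "billing_address": copy.deepcopy(address)}
        ],
    }
    return customer, order
```

Because each address is a separate copy, later changing the customer's billing address leaves the shipping and billing addresses recorded on an existing order unchanged.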
Consequences of Aggregate Orientation:
 Aggregation is not a logical data property. It is all about how the
data is being used by applications.
 An aggregate structure may help with some data interactions but be an
obstacle for others.
 It has an important consequence for transactions.
 Aggregate-oriented NoSQL databases generally don’t support ACID
transactions that span multiple aggregates, thus sacrificing some
consistency.
 Aggregate-oriented databases support the atomic manipulation of a
single aggregate at a time.
Advantage:
 It can be used as a primary data source for online applications.
 Easy Replication.
 No single point Failure.
 It provides fast performance and horizontal Scalability.
 It can handle structured, semi-structured, and unstructured data with
equal effort.
Disadvantage:
 No standard rules.
 Limited query capabilities.
 Doesn’t work well with relational data.
 Not so popular in the enterprise.
 When the volume of data increases, it is difficult to maintain unique
values.
Key-Value and Document Data Models
 In a key-value database, the aggregate is opaque to the database,
whereas a document database is able to see a structure in the
aggregate. The advantage of opacity is that we can store whatever we
like in the aggregate.
 A key-value database may impose some general size limit, but other
than that we have complete freedom.
 A document database imposes limits on what we can place in it,
defining allowable structures and types. In return, however, we get
more flexibility in access.
 With a key-value store, we can only access an aggregate by lookup
based on its key.
 With a document database, we can submit queries to the database
based on the fields in the aggregate, we can retrieve part of the
aggregate rather than the whole thing, and database can create
indexes based on the contents of the aggregate.
 In practice, the line between key-value and document gets a bit
blurry.
 People often put an ID field in a document database to do a key-
value style lookup.
 Databases classified as key-value databases may allow you
structures for data beyond just an opaque aggregate.
 For example, Riak allows you to add metadata to aggregates for
indexing and interaggregate links, Redis allows you to break down
the aggregate into lists or sets. You can support querying by
integrating search tools such as Solr.
 As an example, Riak includes a search facility that uses Solr-like
searching on any aggregates that are stored as JSON or XML
structures.
 Despite this blurriness, the general distinction still holds. With key-
value databases, we expect to mostly look up aggregates using a key.
 With document databases, we mostly expect to submit some form of
query based on the internal structure of the document.
 This might be a key, but it’s more likely to be something else.
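The distinction between the two access styles can be sketched in Python. Here the key-value store holds an opaque blob of bytes that only the application can interpret, while the document store can look inside its documents. All names and data are hypothetical, and the helpers do not represent the API of Riak, Redis, or any other product.

```python
import json

# Key-value style: the store sees only bytes; access is by key alone.
kv_store = {
    "order:99": json.dumps({"customer_id": 1, "total": 250}).encode("utf-8"),
}


def kv_get(key):
    """Return the opaque value; the application must decode it itself."""
    return kv_store[key]


# Document style: the store understands the structure of each document.
documents = [
    {"_id": 99, "customer_id": 1, "total": 250},
    {"_id": 100, "customer_id": 2, "total": 80},
]


def find_by_field(field, value):
    """Query on a field inside the aggregate rather than on its key."""
    return [d for d in documents if d.get(field) == value]


def project(doc, fields):
    """Retrieve part of the aggregate rather than the whole thing."""
    return {f: doc[f] for f in fields if f in doc}
```

With the key-value store the only question we can ask is "what is stored under this key?", whereas the document store can answer "which orders belong to customer 2?" and return just the fields we need.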
Relationships
 Aggregates are useful in that they put together data that is commonly
accessed together. But there are still lots of cases where data that’s
related is accessed differently.
 Consider the relationship between a customer and all of his orders.
Some applications will want to access the order history whenever
they access the customer;
 This fits in well with combining the customer with his order history
into a single aggregate.
 Other applications, however, want to process orders individually and
thus model orders as independent aggregates.
 In this case, you’ll want separate order and customer aggregates but
with some kind of relationship between them so that any work on an
order can look up customer data.
 The simplest way to provide such a link is to embed the ID of the
customer within the order’s aggregate data. That way, if you need
data from the customer record, you read the order, ferret out the
customer ID, and make another call to the database to read the
customer data.
 This will work, and will be just fine in many scenarios—but the
database will be ignorant of the relationship in the data.
 This can be important because there are times when it’s useful for
the database to know about these links.
 As a result, many databases—even key-value stores—provide ways
to make these relationships visible to the database.
 Document stores make the content of the aggregate available to the
database to form indexes and queries.
 Riak, a key-value store, allows you to put link information in
metadata, supporting partial retrieval and link-walking capability.
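The simplest form of link described above, embedding the customer ID in the order, can be sketched as follows. The two dictionaries stand in for separate aggregates in a store, and the `reads` list is a hypothetical device for showing that two database calls are needed.

```python
customers = {1: {"id": 1, "name": "Asha"}}
orders = {99: {"id": 99, "customer_id": 1, "total": 250}}

reads = []  # records each simulated database call


def read(store_name, store, key):
    """Simulate one call to the database and log it."""
    reads.append((store_name, key))
    return store[key]


def customer_for_order(order_id):
    """Read the order, extract the customer ID, then read the customer."""
    order = read("orders", orders, order_id)
    return read("customers", customers, order["customer_id"])
```

The database itself knows nothing about the link; the application follows it by making a second call, which is why some stores offer ways to make such relationships visible to the database.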
An important aspect of relationships between aggregates is how they
handle updates.
 Aggregate-oriented databases treat the aggregate as the unit of data
retrieval.
 Consequently, atomicity is only supported within the contents of a
single aggregate. If you update multiple aggregates at once, you
have to deal yourself with a failure partway through.
 Relational databases help you with this by allowing you to modify
multiple records in a single transaction, providing ACID guarantees
while altering many rows.
 All of this means that aggregate-oriented databases become more
awkward as you need to operate across multiple aggregates.
 There are various ways to deal with this, which we’ll explore later in
this chapter, but the fundamental awkwardness remains.
 This may imply that if you have data based on lots of relationships,
you should prefer a relational database over a NoSQL store.
 While that’s true for aggregate-oriented databases, it’s worth
remembering that relational databases aren’t all that stellar with
complex relationships either.
 While you can express queries involving joins in SQL, things
quickly get very hairy—both with SQL writing and with the
resulting performance—as the number of joins mounts up.
 This makes it a good moment to introduce another category of
databases that’s often lumped into the NoSQL pile.
Graph Databases
 A graph database is a type of NoSQL database that is designed to
handle data with complex relationships and interconnections.
 In a graph database, data is stored as nodes and edges, where nodes
represent entities and edges represent the relationships between
those entities.
1. Graph databases are particularly well-suited for applications that
require deep and complex queries, such as social networks,
recommendation engines, and fraud detection systems. They can also be
used for other types of applications, such as supply chain management,
network and infrastructure management, and bioinformatics.
2. One of the main advantages of graph databases is their ability to
handle and represent relationships between entities. This is because the
relationships between entities are as important as the entities themselves,
and often cannot be easily represented in a traditional relational
database.
3. Another advantage of graph databases is their flexibility. Graph
databases can handle data with changing structures and can be adapted
to new use cases without requiring significant changes to the database
schema. This makes them particularly useful for applications with rapidly
changing data structures or complex data requirements.
4. However, graph databases may not be suitable for all applications. For
example, they may not be the best choice for applications that require
simple queries or that deal primarily with data that can be easily
represented in a traditional relational database. Additionally, graph
databases may require more specialized knowledge and expertise to use
effectively.
Some popular graph databases include Neo4j, OrientDB, and ArangoDB.
These databases provide a range of features, including support for different
data models, scalability, and high availability, and can be used for a wide
variety of applications.
A graph database is a type of database used to represent the data in the
form of a graph. It has three components: nodes, relationships, and
properties. These components are used to model the data. They are useful
in the fields of social networking, fraud detection, AI knowledge graphs, etc.
The descriptions of components are as follows:
 Nodes: represent the objects or instances. They are equivalent to a
row in a relational database. A node basically acts as a vertex in a
graph. Nodes are grouped by applying a label to each member.
 Relationships: They are basically the edges in the graph. They have
a specific direction and type, and they form patterns in the data. They
establish relationships between nodes.
 Properties: They are the information associated with the nodes and
relationships.
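The three components can be modelled with small Python classes. This is a minimal sketch using dataclasses; the class names, labels, relationship types, and sample data are hypothetical and are not tied to any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str                                      # groups nodes, e.g. "Person"
    properties: dict = field(default_factory=dict)  # information about the node


@dataclass
class Relationship:
    start: int                                      # id of the source node
    end: int                                        # id of the target node
    rel_type: str                                   # e.g. "FRIEND_OF"
    properties: dict = field(default_factory=dict)


nodes = {
    1: Node(1, "Person", {"name": "Anay"}),
    2: Node(2, "Person", {"name": "Bhagya"}),
    3: Node(3, "Product", {"name": "Laptop"}),
}
relationships = [
    Relationship(1, 2, "FRIEND_OF", {"since": 2020}),
    Relationship(2, 3, "BOUGHT"),
]


def outgoing(node_id, rel_type):
    """Follow directed relationships of one type from a node."""
    return [
        nodes[r.end]
        for r in relationships
        if r.start == node_id and r.rel_type == rel_type
    ]
```

Here `outgoing(1, "FRIEND_OF")` follows the directed edge from Anay to Bhagya, showing how direction and type together select which relationships are traversed.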
Types of Graph Databases:

 Property Graphs: These graphs are used for querying and analyzing
data by modelling the relationships among the data. A property graph
consists of vertices that hold information about a particular subject and
edges that denote the relationships. The vertices and edges have
additional attributes called properties.
 RDF Graphs: RDF stands for Resource Description Framework. It
focuses more on data integration. RDF graphs are used to represent
complex data with well-defined semantics. Each statement is represented
by three elements: two vertices and an edge, which reflect the subject,
object, and predicate of a sentence. Every vertex and edge is identified
by a URI (Uniform Resource Identifier).
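The subject, predicate, object structure of RDF can be illustrated with plain Python tuples. This sketch does not use an RDF library; the namespace `http://example.org/` and the resource names are hypothetical, and `match` is an illustrative helper rather than a standard query function.

```python
EX = "http://example.org/"  # hypothetical namespace for the example URIs

# Each triple is (subject, predicate, object), all identified by URIs.
triples = {
    (EX + "anay", EX + "knows", EX + "bhagya"),
    (EX + "bhagya", EX + "knows", EX + "chaitanya"),
    (EX + "anay", EX + "worksFor", EX + "acme"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

Calling `match(p=EX + "knows")` returns every "knows" statement, which shows how a graph of such triples can be queried by fixing some elements and leaving others open.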
When to Use Graph Database?
 Graph databases should be used for heavily interconnected data.
 They should be used when the amount of data is large and relationships
are present.
 They can be used to represent a cohesive picture of the data.
How Graph and Graph Databases Work?
Graph databases provide graph models. They allow users to perform
traversal queries since data is connected. Graph algorithms are also applied
to find patterns, paths and other relationships, thus enabling more analysis
of the data. The algorithms help to explore neighboring nodes, cluster
vertices, and analyze relationships and patterns. Countless joins are not
required in this kind of database.
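A traversal query of the kind described above can be sketched with a breadth-first search over an adjacency list. The small friendship graph below is hypothetical and chosen only to make the traversal visible; the functions `within_hops` and `friends_of_friends` are illustrative names.

```python
from collections import deque

# Hypothetical undirected friendship graph stored as an adjacency list.
graph = {
    "Anay": ["Bhagya", "Chaitanya"],
    "Bhagya": ["Anay", "Dilip"],
    "Chaitanya": ["Anay", "Erica"],
    "Dilip": ["Bhagya"],
    "Erica": ["Chaitanya"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning {node: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not friends."""
    dist = within_hops(graph, person, 2)
    return sorted(name for name, d in dist.items() if d == 2)
```

The traversal only follows edges outward from the starting node, so no join over a separate relationship table is involved.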
Example of Graph Database:
 Recommendation engines in e-commerce use graph databases to
provide customers with accurate recommendations and updates about new
products, thus increasing sales and satisfying the customer’s desires.
 Social media companies use graph databases to find the “friends of
friends” or products that the user’s friends like, and send suggestions
accordingly to the user.
 Graph databases play a major role in detecting fraud. Users can create
a graph from the transactions between entities and store other important
information. Once the graph is created, running a query can help to
identify the fraud.
Advantages of Graph Database:
 A potential advantage of a graph database is establishing
relationships with external sources as well.
 No joins are required since relationships are already specified.
 Query performance depends on the concrete relationships and not on the
amount of data.
 It is flexible and agile.
 It is easy to manage the data in terms of a graph.
 Efficient data modeling: Graph databases allow for efficient data
modeling by representing data as nodes and edges. This allows for more
flexible and scalable data modeling than traditional relational databases.
 Flexible relationships: Graph databases are designed to handle
complex relationships and interconnections between data elements. This
makes them well-suited for applications that require deep and complex
queries, such as social networks, recommendation engines, and fraud
detection systems.
 High performance: Graph databases are optimized for handling large
and complex datasets, making them well-suited for applications that
require high levels of performance and scalability.
 Scalability: Graph databases can be easily scaled horizontally,
allowing additional servers to be added to the cluster to handle increased
data volume or traffic.
 Easy to use: Graph databases are typically easier to use than
traditional relational databases. They often have a simpler data model
and query language, and can be easier to maintain and scale.
Disadvantages of Graph Database:
 For complex relationships, searching often becomes slower.
 The query language is platform dependent.
 They are inappropriate for transactional data.
 They have a smaller user base.
 Limited use cases: Graph databases are not suitable for all
applications. They may not be the best choice for applications that require
simple queries or that deal primarily with data that can be easily
represented in a traditional relational database.
 Specialized knowledge: Graph databases may require specialized
knowledge and expertise to use effectively, including knowledge of graph
theory and algorithms.
 Immature technology: The technology for graph databases is relatively
new and still evolving, which means that it may not be as stable or well-
supported as traditional relational databases.
 Integration with other tools: Graph databases may not be as well-
integrated with other tools and systems as traditional relational
databases, which can make it more difficult to use them in conjunction
with other technologies.
 Overall, graph databases offer many advantages for
applications that require complex and deep relationships between data
elements. They are highly flexible, scalable, and performant, and can
handle large and complex datasets. However, they may not be suitable
for all applications, and may require specialized knowledge and expertise
to use effectively.
Future of Graph Database:
A graph database is an excellent tool for storing data, but it cannot
completely replace the traditional database. It deals with a particular
kind of data: sets of interconnected data. Although graph database
technology is still developing, it is becoming increasingly important as
businesses and organizations use big data, and graph databases help in
complex analysis. Thus these databases have become a must for today’s
needs and tomorrow’s success.
Example: We have a social network in which five friends are all connected.
These friends are Anay, Bhagya, Chaitanya, Dilip, and Erica. A users table
that stores their personal information may look something like this:
id  first name  last name  email                  phone
1   Anay        Agarwal    anay@example.net       555-111-5555
2   Bhagya      Kumar      bhagya@example.net     555-222-5555
3   Chaitanya   Nayak      chaitanya@example.net  555-333-5555
4   Dilip       Jain       dilip@example.net      555-444-5555
5   Erica       Emmanuel   erica@example.net      555-555-5555
Now, we will also need another table to capture the friendship/relationship
between users/friends. Our friendship table will look something like this:
user_id friend_id
1 2
1 3
1 4
1 5
2 1
2 3
2 4
2 5
3 1
3 2
3 4
3 5
4 1
4 2
4 3
4 5
5 1
5 2
5 3
5 4
We will avoid going deep into the Database(primary key & foreign key)
theory. Instead just assume that the friendship table uses id’s of both the
friends. Assume that our social network here has a feature that allows every
user to see the personal information of his/her friends. So, If Chaitanya were
requesting information then it would mean she needs information about
Anay, Bhagya, Dilip and Erica. We will approach this problem the traditional
way(Relational database). We must first identify Chaitanya’s id in the User’s
table:
id first name last name email phone

3 Chaitanya Nayak chaitanya@example.net555-333-5555
Now, we’d look for all tuples in friendship table where the user_id is 3.
Resulting relation would be something like this:
user_id friend_id
3 1
3 2
3 4
3 5
Now, let’s analyse the time taken by this relational database approach. Each
lookup takes approximately log(N) time, where N is the number of tuples in
the friendship table (the number of relationships), because the database
keeps the rows ordered by id and can search them like a sorted index. So, in
general, M such queries have a time complexity of O(M*log(N)). With a graph
database, the per-query search is not needed: once we have located Chaitanya
in the database, we take only a single step along each edge to find her
friends, so the same M lookups cost roughly O(M).
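The comparison above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the relational side is modelled as a
sorted list of (user_id, friend_id) rows searched with binary search (standing
in for an index), and the graph side as a dictionary of adjacency lists
(standing in for edges stored with each node). The names FRIENDSHIPS, GRAPH,
friends_relational and friends_graph are hypothetical and do not belong to any
database API.

```python
from bisect import bisect_left

# Friendship table from the example: each of the five users is friends
# with the other four, stored as sorted (user_id, friend_id) rows.
FRIENDSHIPS = sorted(
    (u, f) for u in range(1, 6) for f in range(1, 6) if u != f
)

def friends_relational(user_id):
    """Binary-search the sorted rows (a stand-in for an index lookup),
    then scan forward while the user_id still matches."""
    i = bisect_left(FRIENDSHIPS, (user_id, 0))  # O(log N) to reach the first row
    result = []
    while i < len(FRIENDSHIPS) and FRIENDSHIPS[i][0] == user_id:
        result.append(FRIENDSHIPS[i][1])
        i += 1
    return result

# Graph-style storage: each node keeps direct references to its neighbours.
GRAPH = {}
for u, f in FRIENDSHIPS:
    GRAPH.setdefault(u, []).append(f)

def friends_graph(user_id):
    """Once the start node is located, its friends are one step away."""
    return GRAPH[user_id]

print(friends_relational(3))  # Chaitanya's friends via the friendship table
print(friends_graph(3))       # the same friends via adjacency
```

Both functions return the same friends; the difference is that the first
repeats a search over the whole table for every query, while the second only
follows references already attached to the node.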
Limitations of Graph Databases:

• Graph databases may not offer a better choice than the other NoSQL
variations.
• If the application needs to scale horizontally, a graph database may
perform poorly.
• They are not very efficient when all nodes with a given parameter need to
be updated.
Key Highlights on SQL vs NoSQL

SQL                                             NoSQL
Relational Database Management System (RDBMS)   Non-relational or distributed database system
Fixed, static, or predefined schema             Dynamic schema
Not suited for hierarchical data storage        Best suited for hierarchical data storage
Best suited for complex queries                 Not so good for complex queries
Vertically scalable                             Horizontally scalable
Follows ACID properties                         Follows CAP (consistency, availability,
                                                partition tolerance)
Examples: MySQL, PostgreSQL, Oracle,            Examples: MongoDB, HBase, Neo4j, Cassandra, etc.
MS-SQL Server, etc.
Schemaless Databases
Traditional relational databases are well-defined, using a schema to describe every
functional element, including tables, rows, views, indexes, and relationships. By
exerting a high degree of control, the database administrator can improve
performance and prevent capture of low-quality, incomplete, or malformed data. In a
SQL database, the schema is enforced by the Relational Database Management
System (RDBMS) whenever data is written to disk.
But in order to work, data needs to be heavily formatted and shaped to fit into the
table structure. This means sacrificing any undefined details during the save, or
storing valuable information outside the database entirely.
A schemaless database, like MongoDB, does not have these up-front constraints,
mapping to a more ‘natural’ database. Even when sitting on top of a data lake, each
document is created with a partial schema to aid retrieval. Any formal schema is
applied in the code of your applications; this layer of abstraction protects the raw
data in the NoSQL database and allows for rapid transformation as your needs
change.
Any data, formatted or not, can be stored in a non-tabular NoSQL type of database.
At the same time, using the right tools in the form of a schemaless database can
unlock the value of all of your structured and unstructured data types.
How does a schemaless database work?

In schemaless databases, information is stored in JSON-style documents which can
have varying sets of fields with different data types for each field. So, a collection
could look like this:
{ name: "abc", age: 30, interest: "football" }
{ name: "xyz", age: 25 }
As you can see, the data itself normally has a fairly consistent structure. With the
schemaless MongoDB database, there is some additional structure — the system
namespace contains an explicit list of collections and indexes. Collections may be
implicitly or explicitly created — indexes must be explicitly declared.
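A minimal sketch of the idea, using plain Python dictionaries in place of a
real document store (no MongoDB driver is involved; the collection list, the
find helper, and the third document's values are hypothetical and only for
illustration). Documents in the same collection carry different fields, and
the application code decides how to handle a missing one.

```python
# A "collection" modelled as a list of documents (plain dicts).
# The first two documents match the example above: the second has no "interest".
collection = [
    {"name": "abc", "age": 30, "interest": "football"},
    {"name": "xyz", "age": 25},
]

def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion.
    A document that lacks a queried field simply does not match."""
    return [
        d for d in docs
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]

# Adding a document with a brand-new field needs no schema change.
# (Illustrative values, not taken from the notes.)
collection.append({"name": "pqr", "age": 41, "city": "Pune"})

# The schema lives in application code: supply a default when a field is absent.
interests = [d.get("interest", "unknown") for d in collection]

print(find(collection, interest="football"))
print(interests)
```

The trade-off is visible here: nothing stops a document from omitting a field,
so every reader of the data has to decide what a missing value means.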
What are the benefits of using a schemaless database?

• Greater flexibility over data types
By operating without a schema, schemaless databases can store, retrieve,
and query any data type, which suits big data analytics and similar
operations that are powered by unstructured data. Relational databases apply
rigid schema rules to data, limiting what can be stored.
• No pre-defined database schemas
The lack of schema means that your NoSQL database can accept any data
type, including those that you do not yet use. This future-proofs your
database, allowing it to grow and change as your data-driven operations
change and mature.
• No data truncation
A schemaless database makes almost no changes to your data; each item is
saved in its own document with a partial schema, leaving the raw information
untouched. This means that every detail is always available and nothing is
stripped to match the current schema. This is particularly valuable if your
analytics needs to change at some point in the future.
• Suitable for real-time analytics functions
With the ability to process unstructured data, applications built on NoSQL
databases are better able to process real-time data, such as readings and
measurements from IoT sensors. Schemaless databases are also ideal for
use with machine learning and artificial intelligence operations, helping to
accelerate automated actions in your business.
• Enhanced scalability and flexibility
With NoSQL, you can use whichever data model is best suited to the job.
Graph databases allow you to view relationships between data points, or you
can use traditional wide table views with an exceptionally large number of
columns. You can query, report, and model information however you choose.
And as your requirements grow, you can keep adding nodes to increase
capacity and power.
Materialized Views
Views:
A view is a virtual relation that acts as an actual relation. It is not part
of the logical relational model of the database system. The tuples of a view
are not stored in the database system; they are generated every time the
view is accessed. Only the query expression of the view is stored in the
database system.
Views can be used everywhere we can use an actual relation. Views can be
used to create custom virtual relations according to the needs of a specific
user. We can create as many views as we want in a database system.
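The behaviour described above can be observed with Python's built-in sqlite3
module. This is a small sketch, not part of the original notes: the student
table, the passed view, and the sample marks are hypothetical. It shows that
only the query expression is stored, so a change to the base relation is
visible the next time the view is accessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, marks INTEGER);
    INSERT INTO student VALUES (1, 72), (2, 35), (3, 58);
    -- Only this query expression is stored, not its result.
    CREATE VIEW passed AS SELECT id FROM student WHERE marks >= 40;
""")

def view_rows():
    """Query the view like an ordinary relation."""
    return [row[0] for row in conn.execute("SELECT id FROM passed ORDER BY id")]

before = view_rows()

# Change the base table; the view itself is not touched.
conn.execute("INSERT INTO student VALUES (4, 90)")

after = view_rows()  # tuples are generated again on this access

print(before)
print(after)
```

The second read includes the newly inserted student without any explicit
update of the view.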
Materialized Views:
When the results of a view expression are stored in a database system, they
are called materialized views. SQL does not provide any standard way of
defining a materialized view; however, some database management systems
provide custom extensions for materialized views. The process of keeping
materialized views updated is known as view maintenance. A database system
uses one of three ways to keep a materialized view updated:
• Update the materialized view as soon as the relation on which it is
defined is updated.
• Update the materialized view every time the view is accessed.
• Update the materialized view periodically.
A materialized view is useful when the view is accessed frequently, as it
saves computation time because the results are stored in the database
beforehand. A materialized view can also be helpful where the relation on
which the view is defined is very large and the resulting relation of the
view is very small. A materialized view has storage cost and update
overhead associated with it.
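View maintenance can be illustrated with a rough emulation in Python's sqlite3
module. SQLite has no materialized views, so this sketch assumes an ordinary
table (passed_mv) that holds the stored result and a hypothetical refresh()
function that recomputes it; the student table and its values are also made up
for illustration. When refresh() is called (right after each base-table
update, on each access, or on a timer) corresponds to the three maintenance
strategies listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, marks INTEGER);
    INSERT INTO student VALUES (1, 72), (2, 35), (3, 58);
    -- Stand-in for a materialized view: a table holding stored results.
    CREATE TABLE passed_mv (id INTEGER);
""")

VIEW_QUERY = "SELECT id FROM student WHERE marks >= 40"

def refresh():
    """View maintenance: recompute the stored tuples from the base table."""
    conn.execute("DELETE FROM passed_mv")
    conn.execute("INSERT INTO passed_mv " + VIEW_QUERY)

def read_mv():
    """Reading touches only the stored result, not the base table."""
    return [r[0] for r in conn.execute("SELECT id FROM passed_mv ORDER BY id")]

refresh()
fresh = read_mv()

conn.execute("INSERT INTO student VALUES (4, 90)")
stale = read_mv()    # base table changed, stored result did not

refresh()            # could run immediately, on access, or periodically
updated = read_mv()

print(fresh, stale, updated)
```

The stale read is the price of storing results: reads are cheap, but the
stored copy is only as current as the last refresh, and it occupies storage
of its own.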
Differences between Views and Materialized Views:

Views                                         Materialized Views

The query expression is stored in the         The resulting tuples of the query expression
database system, not the resulting tuples     are stored in the database system.
of the query expression.

A view need not be updated every time the     A materialized view must be kept updated,
relation on which it is defined is updated,   because its tuples are stored in the database
as the tuples of the view are computed        system. It can be updated in one of the three
every time the view is accessed.              ways mentioned above, depending on the system.

No storage cost is associated with it.        A storage cost is associated with it.

No update cost is associated with it.         An update cost is associated with it.

There is an SQL standard for defining a       There is no SQL standard for defining a
view.                                         materialized view; the functionality is
                                              provided by some database systems as an
                                              extension.

Views are useful when the view is accessed    Materialized views are efficient when the
infrequently.                                 view is accessed frequently, as they save
                                              computation time by storing the results
                                              beforehand.
A materialized view caches the result of a complex query (one that needs a
lot of computation and many operations) and supports refreshing the cached
data. In PostgreSQL, materialized views are defined by database queries,
much like ordinary views.
Creating Materialized Views:
The CREATE MATERIALIZED VIEW statement is used for creating a materialized
view in PostgreSQL.
Syntax:
CREATE MATERIALIZED VIEW your_view_name
AS
your_query
WITH [NO] DATA;
Let’s see what the above statement does:
• First, we specify the name of the view after the CREATE MATERIALIZED VIEW
keywords.
• Then we add the query for the data that we need to extract from the
tables after the AS keyword.
• Finally, to load the query results into the materialized view at creation
time, use the WITH DATA option; otherwise use the WITH NO DATA option. A
view created WITH NO DATA cannot be queried until it is populated with
REFRESH MATERIALIZED VIEW.
Example:
The dvdrental database has a table named film_category where all comedy
films have a category_id of 4. In this example, we will use a materialized
view to store the film_id of all comedy movies in the database.
CREATE MATERIALIZED VIEW comedy_movie_list
AS
SELECT film_id FROM film_category
WHERE category_id = 4
WITH DATA;
The view contains the film_id of every row in the film_category table with
a category_id of 4.
