
Course code : CSE3009

Course title : No SQL Data Bases


Module :1
Topic :1

Introduction to NoSQL Concepts

Dr. Karthika Natarajan 5/23/2022 1


Objectives

This session covers:


• Database revolutions
• First generation Database
• Second generation Database
• Third generation Database
• What is NoSQL?
• Comparison between SQL and NoSQL



History of Database

• Databases are a foundational element of the modern world. We interact
with them even without knowing it: any time we buy something online,
log in to a service, or access our bank accounts.

• The concept of a database existed long before computers. In those times,
data was stored in journals, in libraries, and in hundreds of filing cabinets.
Everything was recorded on paper, which meant it took up space, was
hard to find, and was difficult to back up.

• Then computers became available, and with them, the opportunity for
better data management.



What is Database?

A database is a collection of data, typically describing the activities of one or more
related entities and attributes.
A database is a collection of information that is organized so that it can be easily
accessed, managed and updated.

A database management system, or DBMS, is software designed to assist in
maintaining and utilizing large collections of data, and the need for such systems,
as well as their use, is growing rapidly.



Evolution of Database



First Database Revolution
• The emergence of electronic computers following the Second World War
represented the first revolution in databases.

• Early “databases” used paper tape initially and eventually magnetic tape to
store data sequentially.

• 1955: spinning magnetic disk. Data on a magnetic disk can be modified or
deleted easily, and the disk allows random access to data, i.e., to
individual records.

• 1961: ISAM (Indexed Sequential Access Method) made fast record-oriented access
feasible and consequently led to OLTP (On-line Transaction Processing)
computer systems.
ISAM
• ISAM is an advanced sequential file organization method. The records are sorted by
primary key.
• For each primary key, an index value is generated and mapped to the record. This index is
simply the address of the record in the file.
• To retrieve a record by its index value, the address of its data block is
fetched, and the record is retrieved from memory.



ISAM-Pros and Cons

Pros of ISAM:
• Because each record has the address of its data block, searching for a record in a
huge database is quick and easy.
• This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for a given range of values.
In the same way, a partial value can also be searched easily, e.g., student names
starting with 'JA'.
Cons of ISAM:
• This method requires extra disk space to store the index values.
• When new records are inserted, the files must be reconstructed to
maintain the sequence.
• When a record is deleted, the space it used must be released. Otherwise,
the performance of the database will degrade.
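
The lookup, range retrieval and partial retrieval described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory list stands in for the disk file, a record's position in that list stands in for its address, and the class name IsamFile, its methods and the sample student names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class IsamFile:
    """Toy ISAM-style file: records sorted by primary key, plus a key index."""

    def __init__(self, records):
        # Records are kept sorted by primary key, as in a sequential file.
        self.records = sorted(records, key=lambda rec: rec[0])
        # The index lists the keys in order; a key's position stands in
        # for the address of its record (an assumption of this sketch).
        self.keys = [rec[0] for rec in self.records]

    def get(self, key):
        """Exact lookup: binary-search the index, then fetch the record."""
        pos = bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.records[pos]
        return None

    def range(self, low, high):
        """Range retrieval: all records with low <= key <= high."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return self.records[start:end]

    def prefix(self, prefix):
        """Partial retrieval: all records whose key starts with prefix."""
        start = bisect_left(self.keys, prefix)
        matches = []
        for rec in self.records[start:]:
            if not rec[0].startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append(rec)
        return matches


# Hypothetical sample data: (student name used as primary key, grade).
students = IsamFile([("JOHN", 3.1), ("JACK", 3.4), ("ANNA", 3.9),
                     ("JANE", 3.7), ("MARK", 2.8)])
names_starting_with_ja = [rec[0] for rec in students.prefix("JA")]
```

Because the keys are sorted, each of the three operations starts with a binary search instead of scanning the whole file, which is the speed advantage listed among the pros. The sort in the constructor also hints at the insertion cost listed among the cons.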



First Database Revolution

By the early 1970s, two major models of DBMS were competing for
dominance.
• The network model was formalized by the CODASYL (Conference
on Data Systems Languages) standard and implemented in
databases such as IDMS (Integrated Database Management
System).
• The hierarchical model provided a somewhat simpler approach found
in IBM’s IMS (Information Management System).



A hierarchical database model is a data model in which the data are organized into a tree-like structure. The
data are stored as records which are connected to one another through links.
In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the
root node.
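
A minimal Python sketch of this idea, assuming a simple in-memory tree. The Node class, the find_path function and the gadget category names are hypothetical and serve only to show that every lookup starts at the root and follows parent-to-child links.

```python
class Node:
    """One record in the hierarchy; each record has exactly one parent."""

    def __init__(self, name):
        self.name = name
        self.children = []  # links to child records

    def add(self, child):
        self.children.append(child)
        return child


def find_path(root, target):
    """Depth-first search from the root; returns the path of names or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


def build_catalog():
    """Builds a small, made-up gadget hierarchy for illustration."""
    root = Node("Electronics")
    televisions = root.add(Node("Televisions"))
    televisions.add(Node("LED"))
    portable = root.add(Node("Portable"))
    portable.add(Node("Laptops"))
    return root


path_to_laptops = find_path(build_catalog(), "Laptops")
```

There is no way to jump straight to "Laptops": the search has to walk down from "Electronics", which is the access pattern the hierarchical model imposes.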



Hierarchical model for electronics gadgets



Network Model

• It allows a record to have more than one parent and child record.

• This model is capable of handling multiple types of relationships which
can help in modeling real-life applications, for example, 1:1, 1:M and
M:N relationships.



Hierarchical vs Network model
Second Database Revolution

In the late 1960s, Codd, who was working at an IBM laboratory, found the
following drawbacks in first-generation DBMSs:
• Existing databases were too hard to use.
• Existing databases lacked a theoretical foundation.
• Existing databases mixed logical and physical implementations.

To overcome these, he published the core ideas that defined the relational
database model, which became the most significant model for database
systems for a generation.
New in Second generation
Key concepts of the relational model include:
1. Attribute: Each column in a table. Attributes are the properties that define a
relation, e.g., Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format, along with
their entities. A table has two properties, rows and columns. Rows
represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Degree: The total number of attributes in the relation.
5. Cardinality: The total number of rows present in the table.
6. Column: The set of values for a specific attribute.
7. Relation instance: A finite set of tuples in the RDBMS.
Relation instances never have duplicate tuples.
8. Relation key: One or more attributes that uniquely identify each row of the
relation.
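
These terms can be made concrete with a small Python sketch. It assumes a relation is modeled as a tuple of attribute names plus a set of row tuples. The attribute DEPT, the sample rows and the helper functions column and is_key are hypothetical.

```python
# Attributes: the column names of the relation (DEPT is a made-up example).
attributes = ("Student_Rollno", "NAME", "DEPT")

# Relation instance: a finite set of tuples. Using a Python set means
# duplicate tuples cannot occur, matching the definition above.
students = {
    (1, "Asha", "CSE"),
    (2, "Ravi", "ECE"),
    (3, "Meena", "CSE"),
}

degree = len(attributes)     # number of attributes in the relation
cardinality = len(students)  # number of rows (tuples) in the relation


def column(attrs, tuples, name):
    """Returns the values of one attribute across all tuples."""
    position = attrs.index(name)
    return [row[position] for row in tuples]


def is_key(attrs, tuples, key_attrs):
    """True if the given attributes take a distinct value in every tuple."""
    positions = [attrs.index(name) for name in key_attrs]
    projected = [tuple(row[p] for p in positions) for row in tuples]
    return len(set(projected)) == len(projected)
```

Here Student_Rollno alone qualifies as a relation key, while DEPT does not, because two students share the value "CSE".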
Key concepts in relational model



Relational Model: Advantages & Disadvantages
Advantages:
• Simplicity: The relational data model is simpler than the hierarchical and network
models.
• Structural independence: The relational database is only concerned with data and not with
a structure. This can improve the performance of the model.
• Easy to use: The relational model is easy to use, as tables consisting of rows and columns
are quite natural and simple to understand.
• Query capability: It makes it possible for a high-level query language like SQL to avoid
complex database navigation.
• Scalable: In terms of the number of records (rows) and the number of fields, a database
can be enlarged to enhance its usability.

Disadvantages:
• A few relational databases have limits on field lengths that cannot be exceeded.



Benefits in Second generation

Database normalization is a process in which we restructure a
complex database into a simpler one with less redundant data.

New in Second generation
Other important key concepts of the relational model include:
• Constraints
• Operations
• Normal forms

Popular relational database management systems:
• DB2 and Informix Dynamic Server - IBM
• Oracle and RDB - Oracle
• SQL Server and Access - Microsoft



Transaction Models

Jim Gray defined the most widely accepted transaction model in the late 1970s. This soon
became popularized as ACID transactions:
• Atomic: The transaction is indivisible - either all the statements in the transaction are
applied to the database or none are.
• Consistent: The database remains in a consistent state before and after transaction
execution.
• Isolated: While multiple transactions can be executed by one or more users
simultaneously, one transaction should not see the effects of other in-progress
transactions.
• Durable: Once a transaction is saved to the database, its changes are expected to
persist even if there is a failure of operating system or hardware.
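
Atomicity can be demonstrated with Python's built-in sqlite3 module. The sketch below assumes a made-up accounts table with two accounts, A and B, and a hypothetical transfer function. The starting balances are illustrative values only.

```python
import sqlite3


def make_bank():
    """Creates an in-memory database with two illustrative accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    with conn:  # commits the inserts as one transaction
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("A", 300), ("B", 100)]
        )
    return conn


def balances(conn):
    """Returns the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Moves amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (remaining,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False
```

When the balance check fails, both UPDATE statements have already run inside the transaction, yet the rollback undoes them together. The debit never becomes visible without the matching credit.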



Atomicity



Consistent

If the value read by both B and C is $300, the data is inconsistent, because
once a debit operation has executed, that value no longer matches the actual balance.



Isolation

Account A makes transactions T1 and T2 to accounts B and C, but both execute independently without
affecting each other. This is known as isolation.



2000s: NoSQL

• In 1998, the term NoSQL (not only structured query language) was coined.
• It refers to databases that use a query language other than SQL to store and
retrieve data.
• NoSQL databases are useful for unstructured data.
• NoSQL allows faster processing of larger, more varied datasets.
• NoSQL databases are more flexible than the traditional relational databases.



Third Database Revolution

By 2005, Google was by far the biggest website in the world.


When Google began, the relational database was already well established, but
it was inadequate to deal with the volumes and velocity of the data confronting
Google.
• In 2003, Google revealed details of the distributed file system
GFS (Google File System).
• In 2004, it revealed details of the distributed parallel processing
algorithm “MapReduce”.
• In 2006, Google revealed details about its BigTable distributed structured
database.
• In 2007, the Hadoop project was developed.
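
The MapReduce idea mentioned above can be illustrated with a single-process Python sketch of the classic word-count example. This is not Google's or Hadoop's implementation: the function names are hypothetical, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Runs map, shuffle and reduce in sequence over the documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big", "data store"])
```

Each call to map_phase depends on one document only, and each call to reduce_phase depends on one key only. That independence is what lets the work be spread over many servers.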



Drawbacks in Second Database Revolution

• Even the most expensive commercial RDBMS, such as Oracle, could not
provide sufficient scalability to meet the demands of large web sites.

• To overcome this major issue, distributed databases were introduced.

• “Sharding” involves partitioning the data across multiple databases based
on a key attribute, such as the customer identifier.

• Sharding at sites like Facebook has allowed a MySQL-based system to
scale up to massive levels, but the downsides of doing this are immense.
Many relational operations and database-level ACID transactions are lost.
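
A minimal sketch of sharding by customer identifier, assuming each shard is just an in-memory Python dictionary standing in for a separate database server. The ShardedStore class and its methods are hypothetical. Hash-based routing is one common choice, not necessarily what any particular site uses.

```python
import hashlib


class ShardedStore:
    """Routes each customer's row to one of several shards by key hash."""

    def __init__(self, shard_count):
        # Each dict stands in for an independent database server.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, customer_id):
        """Returns the shard index; a stable hash keeps routing repeatable."""
        digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, customer_id, row):
        self.shards[self.shard_for(customer_id)][customer_id] = row

    def get(self, customer_id):
        # A single-key lookup touches exactly one shard.
        return self.shards[self.shard_for(customer_id)].get(customer_id)

    def scan_all(self):
        # A query that is not restricted to one key must visit every shard.
        for shard in self.shards:
            yield from shard.items()


store = ShardedStore(shard_count=4)
for customer_id in range(100):
    store.put(customer_id, {"customer_id": customer_id})
```

Lookups by customer identifier stay cheap, but anything spanning customers (joins, aggregates, multi-row transactions) has to be coordinated across shards by the application, which is the downside noted above.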



Cloud Computing
• Between 2006 and 2008, Amazon rolled out Elastic Compute Cloud (EC2).

• EC2 made available virtual machine images hosted on Amazon’s hardware
infrastructure and accessible via the Internet.

• Amazon added other services such as storage (S3, EBS), Virtual Private Cloud
(VPC), a MapReduce service (EMR), and so on.

• The entire platform was known as Amazon Web Services (AWS) and was the first
practical implementation of an Infrastructure as a Service (IaaS) cloud.

• AWS became the inspiration for cloud computing offerings from Google, Microsoft, and
others.



Document Databases

• The impedance mismatch between object-oriented and relational models led to
object-relational mapping (ORM) systems.
• Around the same time, the programming style known as AJAX (Asynchronous JavaScript
and XML) became common, in which JavaScript within the browser communicates directly
with a backend by transferring XML messages.
• XML was soon superseded by JavaScript Object Notation (JSON), which is a self-describing
format similar to XML but is more compact and tightly integrated into the JavaScript
language.
• Databases that support JSON can store and retrieve these documents directly, which
eliminates the role of the relational middleman. These later became known as
“document databases”.
• Couchbase and MongoDB are two popular JSON-oriented databases.
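
A small Python sketch of the document idea, using only the standard json module. The DocumentCollection class, its methods and the sample order are hypothetical. The sketch does not reproduce the API of MongoDB, Couchbase or any other product.

```python
import json

# One nested JSON-style document holds what a relational design would
# typically spread over several tables (orders, customers, order items).
order_document = {
    "order_id": 1001,
    "customer": {"name": "Asha", "city": "Chennai"},
    "items": [
        {"sku": "TV-42", "qty": 1},
        {"sku": "HDMI-2M", "qty": 2},
    ],
}


class DocumentCollection:
    """Toy document store: keeps each document as a JSON string by id."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # The document is stored as-is; no table schema is declared first.
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id):
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        """Returns all documents whose top-level field equals value."""
        docs = (json.loads(raw) for raw in self._docs.values())
        return [doc for doc in docs if doc.get(field) == value]


orders = DocumentCollection()
orders.insert(1001, order_document)
```

The application object goes in and comes back out in the same shape, with no mapping layer between objects and tables.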



NewSQL

In 2007, Michael Stonebraker and his team proposed a number of variants on the
existing RDBMS design.
• H-Store described a pure in-memory distributed database
• C-Store specified a design for a columnar database.

Both these designs were extremely influential in the years to come and are the first
examples of what came to be known as NewSQL database systems.

NewSQL databases retain key characteristics of the RDBMS but diverge from
the common architecture exhibited by traditional systems such as Oracle and SQL
Server.



The Nonrelational Explosion

In the end, dozens of new database systems such as MongoDB,
Cassandra, and HBase emerged in response to the drawbacks of relational databases.

These new breeds of database systems lacked a common name. One proposal was
“Distributed Non-Relational Database Management System” (DNRDBMS).

However, in late 2009, the term NoSQL quickly caught on as shorthand for any
database system that broke with the traditional SQL database.



The Database technologies



What is NoSQL?

• A NoSQL database, also called Not Only SQL, is an approach to data
management and database design that's useful for very large sets of
distributed data.

• NoSQL is not a relational database.

• A relational database model may not be the best solution for all situations.

• The easiest way to understand NoSQL is as a database that does
not adhere to the traditional relational database management system
(RDBMS) structure.



What is NoSQL?

• One of the most popular NoSQL databases is Apache Cassandra.

• Cassandra, which was once Facebook’s proprietary database, was
released as open source in 2008.

• Other NoSQL implementations include SimpleDB, Google BigTable,
Apache Hadoop, MapReduce, MemcacheDB, and Voldemort.

• Companies that use NoSQL include Netflix, LinkedIn and Twitter.



Why should we use NoSQL?
There are several reasons why people consider using a NoSQL database:

• Application development productivity.
• Large data.
• Analytics.
• Scalability.
• Massive write performance.
• Fast key-value access.
• Flexible data model and flexible datatypes.
• Schema migration.
• Write availability.
• Easier maintainability, administration and operations.
• Generally available parallel computing.
• Programmer ease of use.
• Distributed systems and cloud computing support.



SQL vs NoSQL
SQL | NoSQL
Relational databases (RDBMS) | Non-relational or distributed databases
Table-based databases | Document-based, key-value pairs, graph databases or wide-column stores
Have a predefined schema | Have a dynamic schema for unstructured data
Vertically scalable | Horizontally scalable
Scalability is managed by increasing the CPU, RAM, SSD, etc. | Scalability is managed by adding a few more servers to your NoSQL database
Uses SQL (Structured Query Language) | Uses UnQL (Unstructured Query Language). The syntax of UnQL varies from database to database
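
The difference between a predefined schema and a dynamic schema can be sketched with Python's built-in sqlite3 module on the SQL side and a plain list of dictionaries standing in for a NoSQL collection. The users table, the twitter field and the helper functions are hypothetical.

```python
import sqlite3


def make_sql_users():
    """Creates a table whose columns are fixed in advance."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_sql(conn, row):
    """Tries to insert a row; returns False if the schema rejects it."""
    # Column names are interpolated for this demo only; real code should
    # never build SQL from untrusted input.
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    try:
        with conn:
            conn.execute(
                f"INSERT INTO users ({columns}) VALUES ({placeholders})",
                tuple(row.values()),
            )
        return True
    except sqlite3.OperationalError:
        # Raised when the row names a column the table does not have.
        return False


def insert_document(collection, document):
    """A schemaless collection accepts documents with any set of fields."""
    collection.append(dict(document))
    return True


sql_users = make_sql_users()
document_users = []
```

Inserting {"id": 2, "name": "Ravi", "twitter": "@ravi"} fails on the SQL side until the table is altered to add the new column, while the document collection accepts it immediately.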
SQL vs NoSQL

SQL | NoSQL
MySQL, Oracle, SQLite, Postgres and MS-SQL | MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j and CouchDB
Good fit for complex queries | Not a good fit for complex queries (NoSQL databases don’t have standard interfaces)
Not the best fit for hierarchical data storage | Fits better for hierarchical data storage
Best fit for heavy-duty transactional applications | Not a fit for heavy transactional applications



SQL vs NoSQL

SQL | NoSQL
Excellent support is available for all SQL databases | Still has to rely on community support
Emphasizes ACID properties (Atomicity, Consistency, Isolation and Durability) | Follows Brewer's CAP theorem (Consistency, Availability and Partition tolerance)
Classified as either open-source or closed-source | Classified by the way data is stored: graph databases, key-value store databases, document store databases, column store databases and XML databases



Summary

This session covered:


• Database revolutions
• First generation Database
• Second generation Database
• Third generation Database
• What is NoSQL?
• Comparison between SQL and NoSQL
