ADBMS Tech-Neo

Syllabus
Mumbai University
B. E. (Computer Engineering)

Course Code: CSDOS01
Course Name: Advance Database Management System
Prerequisite: Database Management System

Course Objectives:
1. To provide insights into distributed database designing.
2. To specify the various approaches used for using XML and JSON technologies.
3. To apply the concepts behind the various types of NoSQL databases and utilize them for MongoDB.
4. To learn about the trends in advance databases.

Course Outcomes: After the successful completion of this course, the learner will be able to:
1. Design distributed databases using the various techniques for query processing.
2. Measure query cost and perform distributed transaction management.
3. Organize the data using XML and JSON databases for better interoperability.
4. Compare different types of NoSQL databases.
5. Formulate NoSQL queries using MongoDB.
6. Describe various trends in advance databases through temporal, graph based and spatial based databases.
Module 1: Distributed Databases (3 Hrs.)
1.1 Introduction, Distributed DBMS Architecture, Data Fragmentation, Replication and Allocation Techniques for Distributed Database Design. (Refer Chapter 1)

Module 2: Distributed Database Handling (8 Hrs.)
2.1 Distributed Transaction Management: Definition, properties, types, architecture. Distributed Query Processing: Characterization of Query Processors, Layers/phases of query processing.
2.2 Distributed Concurrency Control: Taxonomy, Locking based, Basic TO algorithm. Recovery in Distributed Databases: Failures in distributed database, 2PC and 3PC protocol. (Refer Chapter 2)

Module 3: Data interoperability - XML and JSON (6 Hrs.)
3.2 Basic JSON syntax (JavaScript Object Notation), JSON data types, Stringifying and parsing the JSON for sending & receiving, JSON Object retrieval using key-value pair and JQuery, XML Vs JSON. (Refer Chapter 3)

Module 4: NoSQL Distribution Model (10 Hrs.)
4.1 NoSQL database concepts: NoSQL data modeling, Benefits of NoSQL, comparison between SQL and NoSQL database system.
4.2 Replication and sharding, Distribution Models, Consistency in distributed data, CAP theorem, Notion of ACID Vs BASE, handling Transactions, consistency and eventual consistency.
4.3 Types of NoSQL databases: Key-value data store, Document database and Column Family Data store, Comparison of NoSQL databases w.r.t CAP theorem and ACID properties. (Refer Chapter 4)

Module 5: NoSQL using MongoDB
5.1 NoSQL using MongoDB: Introduction to MongoDB Shell, Running the MongoDB shell, MongoDB client, Basic operations with MongoDB shell, Basic Data Types, Arrays, Embedded Documents.
5.2 Querying MongoDB using find() functions, advanced queries using logical operators and sorting, simple aggregate functions, saving and updating document. MongoDB Distributed environment: Concepts of replication and horizontal scaling through sharding in MongoDB. (Refer Chapter 5)

Module 6: Trends in advance databases
6.1 Temporal database: Concepts, time representation, time dimension, incorporating time in relational databases.
6.2 Graph Database: Introduction, Features, Transactions, consistency, Availability, Querying, Case Study Neo4J.
6.3 Spatial database: Introduction, data types, models, operators and queries. (Refer Chapter 6)
Chapter 2 Distributed Database Handling
Chapter 3 Data Interoperability - XML and JSON
Chapter 4 NoSQL Distribution Model
Chapter 5 NoSQL using MongoDB
Chapter 6 Trends in Advance Databases
MODULE 1
Distributed Databases
CHAPTER 1

Syllabus
Introduction, Distributed DBMS Architecture, Data Fragmentation, Replication and Allocation Techniques for Distributed Database Design.
1.1 Introduction
1.1.1 Difference between Centralized and Distributed Database
1.1.2 Transparency in DDBMS
UQ. Explain different types of transparency in distributed database.
1.1.3 Types of Distributed System
1.2 Distributed DBMS Architecture
1.2.1 General Architecture of Distributed Databases System
1.2.2 Parallel Database Architecture
UQ. Explain Parallel database architectures.
1.2.3 Federated Database Schema Architecture
1.2.4 Three-Tier Client-Server Architecture
UQ. Write a note on client server architecture.
1.3 Data Fragmentation, Replication and Allocation Techniques for Distributed Database Design
1.3.1 Replication
1.3.2 Fragmentation
UQ. Give two examples of horizontal and vertical fragmentation each.
UQ. Give derived horizontal fragmentation for emp and pay. Write resultant fragments.
1.3.3 Syntax for Creating Fragments
1.3.4 Data Replication
1.4 Descriptive Questions
1.5 Multiple Choice Questions
Chapter Ends
Advance Database Management System (MU-Sem 5-Comp) (Distributed Databases)....Page no. (1-2)
1.1 INTRODUCTION
A Distributed Database (DDB) is a database that is not stored on one system; it is divided across different systems or sites, i.e., multiple computers that are connected through a computer network.

Definition
* A distributed database is defined as a logically related collection of shared data that is physically distributed over a computer network on different sites.
* A Distributed Database Management System (DDBMS) is the software that manages data stored on different computers connected through a network. It follows the principle that users do not know that the data is scattered across different sites or servers; users see a single system that provides the data they request through a query.

Example
* Consider that you want to fetch data related to a given task from different folders, and those folders are on different drives. The related data is thus distributed across folders. The data in these folders can be in the same format (for example, documents) or in different formats (for example, Excel sheets and documents, or files with any other extension).
1.1.1 Difference between Centralized and Distributed Database

Fig. 1.1.1: Centralized database system (clients connected through a communication channel to a single centralized database)
Fig. 1.1.2: Distributed database system

Parameters for comparison | Centralized Database | Distributed Database
Location of data | The database is located on a single machine. | The database is located at various sites.
Maintenance | It is easy to maintain. | It is difficult to maintain.
Design of data | It has a simple design of data that is easy to understand. | It has a complex design of data that is difficult to understand.
Response time | It takes more response time. | It takes less response time.
Efficiency | It is less efficient. | It is more efficient.
Processing of query | The query is processed by a single server, so the load is on the same system. | The query is processed by many servers, so the load is not on one system.
Reliability | It is less reliable. | It is more reliable.
Failure of system | If the centralized server fails, the entire system is halted. | If one system or server fails, the system continues to work with the other systems.
Data traffic | There is heavy data traffic, as data is stored on one server. | There is less data traffic, as data is divided or copied among a number of servers.
Advantages | All data is stored at a single location, so it is easier to access and communicate data; minimal data redundancy; less costly; the database is more secure. | The database can be easily expanded, as data is already spread across sites at different physical locations; the distributed database can easily be accessed from different networks.
Disadvantages | There is data traffic, as all data is stored at one location; if any kind of failure occurs at the centralized system, there is a risk that the entire data will be lost. | It is very costly and difficult to maintain because of its complexity; it is difficult to provide a uniform view to the user, since the data is spread across different physical locations.

(MU-New Syllabus w.e.f academic year 21-22)(M5-68) Tech-Neo Publications...A SACHIN SHAH Venture
1.1.2 Transparency in DDBMS
Transparency is one of the features of a DDBMS. It is the way internal implementation details are hidden from the user: how the data is distributed and where it is stored are both hidden from the user.
(1) Distribution transparency: It allows the distributed data to be treated as a single logical database. The user does not know which data is partitioned or where it is distributed.
(2) Transaction transparency: It allows a transaction to update data at more than one network site. It maintains database integrity, as each transaction is either completed or aborted.
(3) Failure transparency: It ensures that the system continues to operate in the event of a node or network failure.
(4) Performance transparency: It allows the system to perform as if it were a centralized DBMS.
(5) Heterogeneity transparency: It allows the integration of several different local DBMSs under a common global schema.
(6) Replication transparency: It hides from the user which data is replicated.
(7) Fragmentation transparency: The end user does not know the fragment names or fragment locations prior to data retrieval (i.e., which fragment's data is accessed by the query fired by the user).
Example of a distributed database system
Consider the application of an online examination system.
(1) Three servers, S1, S2 and S3, are used for this system. The databases are stored on these servers as per the design of the data model.
(2) Server S1 holds one fragment of the database containing the questions, say for two subjects, C and Java.
(3) The user fires a query by selecting the subject C. The user does not know that the data is being fetched from server S1, because the user is unaware that the data is divided among the servers for good performance. This implementation detail is hidden from the user, and a centralized view is shown to the user (distribution/fragmentation transparency).
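The online examination example can be sketched as a small routing layer. This is a minimal sketch under stated assumptions: each server is modeled as a separate in-memory SQLite database, and the catalog dictionary, the `get_questions` function, the DBMS subject on S2 and all rows are hypothetical. The point is that the caller passes only a subject and never names a site.

```python
import sqlite3

# Hypothetical servers S1, S2 and S3: each is modeled as a separate in-memory
# SQLite database. Subjects and rows are illustrative assumptions.
sites = {name: sqlite3.connect(":memory:") for name in ("S1", "S2", "S3")}

for name in ("S1", "S2"):
    sites[name].execute("CREATE TABLE questions (qid INTEGER, subject TEXT, question TEXT)")

# S1 holds the question fragment for subjects C and Java (as in the example);
# the DBMS subject on S2 is an assumed extra fragment.
sites["S1"].executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(1, "C", "What is a pointer?"), (2, "Java", "What is the JVM?")],
)
sites["S2"].executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(3, "DBMS", "Define normalization.")],
)

# Hypothetical global catalog: which site stores the fragment for each subject.
catalog = {"C": "S1", "Java": "S1", "DBMS": "S2"}


def get_questions(subject):
    """User-facing call: only the subject is supplied; the site stays hidden."""
    site = sites[catalog[subject]]
    return site.execute(
        "SELECT qid, question FROM questions WHERE subject = ?", (subject,)
    ).fetchall()


print(get_questions("C"))  # [(1, 'What is a pointer?')]
```

Moving the C fragment to another server would only change the catalog entry; callers of `get_questions` would be unaffected, which is what distribution transparency promises.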
1.1.3 Types of Distributed System
1. Homogeneous Database    2. Heterogeneous Database

1. Homogeneous Database
In a homogeneous database, all sites or servers use the same DBMS for managing data. All the sites have the same operating system, database management system and data structures. In Fig. 1.1.3, the two servers of the system use the same DBMS, Oracle, and the data is handled by the same DBMS on both servers.
Fig. 1.1.3: Homogeneous database (two sites, both running Oracle)

2. Heterogeneous Database
* In a heterogeneous distributed database, sites or servers can use different DBMSs, which can cause problems in query processing and transactions.
* Also, one site might be completely unaware of the other sites.
Different computers may use different operating systems and different database applications. They may even use different data models for the database, so translations or transformations are required to communicate between sites. In Fig. 1.1.4, the two servers of the system use different DBMSs, Oracle and MySQL, and the data is handled by a different DBMS on each server.
Fig. 1.1.4: Heterogeneous distributed database system (one site running Oracle, the other running MySQL)
1.2 DISTRIBUTED DBMS ARCHITECTURE
A distributed database system allows different applications to access data from local and remote databases as per the requirement of the query. The architecture defines the flow of data among the servers as per the design of the data model. To keep data in a consistent state, it is important to update all copies of the data whenever it is replicated or fragmented, and data stored on different servers must be kept in a consistent state.
1.2.1 General Architecture of Distributed Databases System
* This architecture can be described through two views: the logical (schema) architectural model and the component architectural model of a DDB.
Fig. 1.2.1: Logical architectural model (users, each with an external view, over the Global Conceptual Schema (GCS); below it, a Local Conceptual Schema (LCS) and a Local Internal Schema (LIS) at each of Site 1 to Site n)
* Fig. 1.2.1 shows the generic schema (logical) architecture of a DDB. The organization is presented with a consistent, unified view showing the logical structure of the underlying data across all nodes. This view is the integration of all the data stored at every site, divided as per the design of the database, and is represented by the Global Conceptual Schema (GCS), which provides network transparency.
* Each node has its own Local Internal Schema (LIS), based on the physical organization details at that particular site.
* The logical organization of the data that is local to each site is shown by the Local Conceptual Schema (LCS). The GCS, the LCSs and the mappings between them provide fragmentation and replication transparency as per the design of the database.
Component architecture of a distributed database system
Fig. 1.2.2: Component architecture model (user, interactive global query, global query compiler, global query optimizer and global transaction manager; below them, at each site, a local transaction manager, local query translation and execution, and a local system catalog)
* Fig. 1.2.2 shows the component architecture of a DDB. It is an extension of the centralized database architecture, adding the components responsible for executing queries whose data are available on different servers.
* The global query compiler references the Global Conceptual Schema (GCS) from the global system catalog to verify and impose already defined constraints.
* The global query optimizer references both the global and local conceptual schemas and generates optimized local queries from global queries.
* It evaluates all candidate strategies using a cost function that estimates cost based on response time and the estimated sizes of intermediate results.
* After computing the cost for each candidate (each site), the optimizer selects the candidate with the minimum cost for execution. Each local DBMS has its own local query optimizer, transaction manager and execution engine, as well as a local system catalog that holds its local schemas.
* The global transaction manager is responsible for coordinating execution across multiple sites in conjunction with the local transaction managers at individual sites.
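The optimizer's "evaluate every candidate, keep the cheapest" step can be illustrated with a toy cost function. This is a minimal sketch, not a real optimizer: the `Candidate` fields, the transfer weight `TRANSFER_COST_PER_ROW_MS` and all numbers are hypothetical. A real DDBMS would derive these estimates from catalog statistics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    site: str                # site (strategy) where the query would be executed
    response_time_ms: float  # estimated processing/response time
    intermediate_rows: int   # estimated size of intermediate results to ship


# Hypothetical weight: network cost charged per intermediate row shipped.
TRANSFER_COST_PER_ROW_MS = 0.05


def cost(c: Candidate) -> float:
    """Assumed cost model: response time + cost of moving intermediate results."""
    return c.response_time_ms + c.intermediate_rows * TRANSFER_COST_PER_ROW_MS


def choose_candidate(candidates):
    # Evaluate every candidate and keep the one with the minimum estimated cost.
    return min(candidates, key=cost)


candidates = [
    Candidate("Site1", response_time_ms=40, intermediate_rows=2000),  # cost 140
    Candidate("Site2", response_time_ms=90, intermediate_rows=200),   # cost 100
    Candidate("Site3", response_time_ms=60, intermediate_rows=1000),  # cost 110
]
best = choose_candidate(candidates)
print(best.site)  # Site2
```

Site1 has the fastest local response, but shipping its large intermediate result makes it the most expensive overall. This is why the cost function combines both factors instead of looking at response time alone.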
1.2.2 Parallel Database Architecture
UQ. Explain Parallel database architectures.
There are two main types of multiprocessor system architectures:
(A) Shared memory (tightly coupled) architecture    (B) Shared disk (loosely coupled) architecture
* (A) Shared memory (tightly coupled) architecture: In this architecture, multiple processors share secondary (disk) storage and also share primary memory. Data and code in a parallel program are stored in the main memory, which is accessible to all processors.
Fig. 1.2.3(a): Shared memory (processors 1 to n connected through an interconnection network to shared main memory modules 1 to m and I/O units)
* (B) Shared disk (loosely coupled) architecture: In this architecture, multiple processors share secondary (disk) storage, but each has its own primary memory.
* These architectures enable processors to communicate without the overhead of exchanging messages over a network.
Fig. 1.2.3(b): Shared disk
Fig. 1.2.3(c): Shared nothing
* Database management systems developed using the above types of architectures are termed parallel database management systems rather than DDBMSs, as they utilize parallel processor technology.
* Multiprocessor systems that have distributed memory are called loosely coupled systems. In such systems, it is possible to organize many inter-processor connections at the same time.
* Another type of multiprocessor architecture is called the shared nothing architecture. In this architecture, every processor has its own primary and secondary (disk) memory, no common memory exists, and the processors communicate over a high-speed interconnection network (bus or switch).
Fig. 1.2.3(d): Distributed database architecture
Although the shared nothing architecture resembles a distributed database computing environment, a major difference exists in the mode of operation. In shared nothing multiprocessor systems, there is symmetry and homogeneity of nodes; this is not true of the distributed database environment, where heterogeneity of hardware and operating system at each node is very common.
1.2.3 Federated Database Schema Architecture
* A federated database (a group of databases), or virtual database, is the fully integrated, logical composite of all constituent databases in a federated database system (FDBS). It is a type of meta-database management system (DBMS) that transparently integrates multiple autonomous database systems into a single federated database.
* Fig. 1.2.4 shows the five-level schema architecture that supports global applications in the FDBS environment. It consists of the following five schemas: Local schema, Component schema, Export schema, Federated schema and External schemas. Their functionality is described below.
Fig. 1.2.4: Federated database architecture (external schemas over federated schemas, over export schemas, over component schemas, over local schemas of the component DBSs)
(1) The local schema is the conceptual schema (full database definition) of a component database, and the component schema is derived by translating the local schema into a canonical data model or Common Data Model (CDM) for the FDBS.
Schema translation from the local schema to the component schema is done by generating mappings that transform commands on a component schema into commands on the corresponding local schema.
(2) The export schema represents the subset of a component schema that is available to the FDBS.
(3) The federated schema is the global schema or view. This schema is the result of integrating all the shareable export schemas.
(4) The external schemas define the schema for a user group or an application, and are designed for those users.
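The flow from local schemas up to external schemas can be traced with plain Python sets. This is a minimal sketch, assuming schemas are reduced to attribute names. The two component databases, their attribute names, the CDM mappings and the sharing decisions are all hypothetical. Schema integration is modeled as a simple union, whereas real integration must also resolve naming and type conflicts.

```python
# Hypothetical component databases with their own local schemas (local names/types).
local_schemas = {
    "hospital_db": {"pat_no": "INT", "pat_name": "VARCHAR", "diag_code": "CHAR"},
    "insurer_db": {"PolicyHolderId": "NUMBER", "HolderName": "VARCHAR2", "Premium": "NUMBER"},
}

# Assumed mappings from each local schema into a common data model (CDM).
to_cdm = {
    "hospital_db": {"pat_no": "person_id", "pat_name": "name", "diag_code": "diagnosis"},
    "insurer_db": {"PolicyHolderId": "person_id", "HolderName": "name", "Premium": "premium"},
}

# Assumed sharing decisions: which CDM attributes each component exports.
shared = {
    "hospital_db": {"person_id", "name"},  # diagnosis stays private
    "insurer_db": {"person_id", "name", "premium"},
}


def component_schema(db):
    """Local schema -> component schema: rename attributes into the CDM."""
    return {to_cdm[db][attr] for attr in local_schemas[db]}


def export_schema(db):
    """Export schema: the subset of the component schema offered to the FDBS."""
    return component_schema(db) & shared[db]


# Federated schema: integration of all export schemas (a plain union here).
federated_schema = set().union(*(export_schema(db) for db in local_schemas))

# External schema: a view of the federated schema for one user group.
billing_external_schema = federated_schema & {"person_id", "premium"}

print(sorted(federated_schema))         # ['name', 'person_id', 'premium']
print(sorted(billing_external_schema))  # ['person_id', 'premium']
```

The private `diagnosis` attribute reaches the component schema but never the federated schema, because it is not in the hospital's export schema.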
1.2.4 Three-Tier Client-Server Architecture
UQ. Write a note on client server architecture.
* Distributed database applications use the concept of client-server architecture. Most web applications use a client-server architecture.
* A client is a computer hardware device or software process that invokes a service available on a server.
* A server is a physical computer dedicated to running services that serve the needs of other computers. Depending on the type of service that is running, it could be a file server, database server, home media server, print server or web server.
Fig. 1.2.5: Three tier client server architecture (Client: user interface or presentation tier (Web browser, HTML, JavaScript, Visual Basic, ...), connected via the HTTP protocol to the Application server: application (business) logic tier (application program, Java, C/C++, ...), connected via ODBC, JDBC, SQL/CLI or SQLJ to the Database server: query and transaction processing, database access, SQL, PSM, XML)
* In the three-tier client-server architecture, the following three layers exist:
1. Presentation layer (client)
* This layer provides the user interface. The programs at this layer present Web interfaces or GUI forms to the client in order to interface with the application or system.
* Web browsers are often used, and the languages used for creating the interface include HTML, XHTML, CSS, Java and JavaScript.
* This layer handles user input, output and navigation by accepting user commands and displaying the information the user needs, usually in the form of static or dynamic Web pages. This layer typically communicates with the application layer via the HTTP protocol.
2. Application layer (business logic)
* This layer programs the application logic. For example, queries can be formulated based on user input from the client, or query results can be formatted and sent to the client for presentation in a form the user can understand.
* Additional application functionality can be handled at this layer, such as security checks, identity verification and other functions.
* The application layer can interact with the database using ODBC, JDBC, SQL/CLI or other database access techniques.
3. Database server
* This layer communicates with the application layer and is responsible for handling the user's query and update requests: it processes the requests and sends the results back.
* SQL is used to access the database if it is relational or object-relational, and stored database procedures may also be invoked. Query results (and queries) may be formatted and transmitted between the application server and the database server.
To process an SQL query, the application layer interacts with the database layer in the following way:
(1) The application server formulates the user query based on input from the client layer and decomposes it into a number of independent site queries. Each site query is sent to the appropriate database server site.
(2) Each database server processes its local query independently and sends the results to the application server. Increasingly, XML is being used as the standard for data exchange, so the database server may format the query result into XML before sending it to the application server.
(3) The application server combines the results of all subqueries to produce the result of the originally required query, formats it into HTML or some other form accepted by the client, and sends it to the client site for display.
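The three steps above can be mimicked in a single process. This is a minimal sketch, assuming two in-memory SQLite databases stand in for the database server sites and a plain function stands in for the application server. The `employee` table, its rows and the `application_server` function are hypothetical, and the sites are queried one after another here even though a real system could query them in parallel.

```python
import sqlite3


def make_site(rows):
    """Create one hypothetical database server site holding part of EMPLOYEE."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employee (eno INTEGER, ename TEXT, dept TEXT)")
    db.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return db


site_a = make_site([(1, "Asha", "Sales"), (2, "Ravi", "HR")])
site_b = make_site([(3, "Meena", "Sales"), (4, "Karan", "IT")])


def application_server(dept):
    # (1) Formulate an independent site query from the client's input.
    site_query = "SELECT eno, ename FROM employee WHERE dept = ?"
    # (2) Each database server processes its local query independently.
    partial_results = [site.execute(site_query, (dept,)).fetchall() for site in (site_a, site_b)]
    # (3) Combine the sub-results and format them for the client (HTML here).
    combined = sorted(row for part in partial_results for row in part)
    items = "".join(f"<li>{eno}: {name}</li>" for eno, name in combined)
    return combined, f"<ul>{items}</ul>"


rows, html = application_server("Sales")
print(rows)  # [(1, 'Asha'), (3, 'Meena')]
print(html)  # <ul><li>1: Asha</li><li>3: Meena</li></ul>
```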
1.3 DATA FRAGMENTATION, REPLICATION AND ALLOCATION TECHNIQUES FOR DISTRIBUTED DATABASE DESIGN
In a distributed database, data is not stored at one place; it is divided logically among the sites as per the requirements of the application. This distribution or division of data gives rise to the concepts of fragmentation and replication, so we first need to understand exactly what fragmentation and replication of data are.
There are two ways in which data can be stored on different sites: fragmentation and replication.
1.3.1 Replication
The process of storing copies of data on more than one server is called replication of data. The system maintains multiple copies of data to increase the availability of data and to reduce the response time of queries.
Purpose of Data Replication
* To increase the availability of data.
* To speed up query evaluation by copying the data to multiple sites.
Advantages
(1) High availability of data.
(2) Parallel execution, with data and work distributed among the servers (copies of data).
(3) Increased performance of the system.
(4) Avoids loss of data in case of any failure.
Disadvantages
(1) If a change is made at one site, it needs to be reflected at every site where a copy is stored; otherwise the copies of data become inconsistent.
(2) Concurrency control becomes complex, as concurrent execution needs to be managed over a number of sites.
(3) The data needs to be updated regularly.
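The trade-off between availability and keeping copies consistent can be seen in a toy model. This is a minimal sketch, assuming a simple read-one/write-all policy chosen only for illustration (real DDBMSs use a variety of replica-control protocols). The `ReplicatedTable` class, the site names and the account values are hypothetical.

```python
class ReplicatedTable:
    """A key-value table fully replicated at every site (illustrative model)."""

    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}  # one copy per site
        self.available = set(site_names)                  # sites currently up

    def fail(self, name):
        self.available.discard(name)

    def write(self, key, value):
        # Write-all policy (assumed): refuse the update unless every copy can be
        # changed, so no replica is left holding stale data.
        if self.available != set(self.copies):
            raise RuntimeError("write rejected: not all replicas reachable")
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key):
        # Read-one policy (assumed): any reachable replica can answer.
        if not self.available:
            raise RuntimeError("no replica available")
        return self.copies[min(self.available)].get(key)


accounts = ReplicatedTable(["S1", "S2", "S3"])
accounts.write("acc101", 5000)
accounts.fail("S1")             # S1 crashes
print(accounts.read("acc101"))  # 5000 (answered by S2)
try:
    accounts.write("acc101", 6000)  # S1 is down, so the copies cannot all change
except RuntimeError as err:
    print(err)  # write rejected: not all replicas reachable
```

Reads survive the failure of S1 (the availability advantage). Updates are blocked until every copy can be changed, which reflects disadvantage (1): a change must reach every site holding a copy.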
1.3.2 Fragmentation
* The process of dividing the database into smaller chunks or parts is called fragmentation. As per the requirements of the application, fragments may be stored at different locations or sites.
* It must be possible to reconstruct the original database from the fragments without loss of data.
* Every fragment is a subpart of the original table.
* Fragmentation does not create copies of data; it only divides the data, so data consistency is not a problem.
Types of data fragmentation
There are three types of data fragmentation: 1. Horizontal data fragmentation 2. Vertical fragmentation 3. Mixed (hybrid) fragmentation
1. Horizontal data fragmentation
* In this type of fragmentation, the data of a table is divided horizontally into groups of rows to create subsets of the table.
* It is done by applying a condition on an attribute (column) of the table; the rows of the table are separated into fragments.
* The rows are split horizontally by applying a condition on attributes depending on the query. The condition can be on one or more attributes.
Example
* Consider the table customer (Custid, Name, Address, City), where the values of City are Mumbai, Delhi and Pune.
* QUERY: Fragment the data according to the city values.
* There will be three horizontal fragments, HF1, HF2 and HF3:
The query for HF1 is: SELECT * FROM customer WHERE city = 'Mumbai';
The query for HF2 is: SELECT * FROM customer WHERE city = 'Pune';
The query for HF3 is: SELECT * FROM customer WHERE city = 'Delhi';
* Suppose the customer table has 1000 rows, and assume HF1 has 300 rows, HF2 has 400 rows and HF3 has 300 rows. If we combine these three horizontal fragments, we get back the original table containing 1000 rows, with no loss or addition of rows:
$HF_1 \cup HF_2 \cup HF_3 = \text{Customer}$, i.e., $300 + 400 + 300 = 1000$ rows.
In this type, relations (tables) are fragmented, i.e., tables are divided into sub-tables, and each fragment is stored at a different site depending on the requirements. All the fragments combined together must form the original table (there is no loss of data).
Fig. 1.3.1: Horizontal fragmentation (table CUSTOMER (CustID, name, address, city) with rows 1 to 1000, split into HF1: city = "MUMBAI" (CustID 1 to 300), HF2: city = "PUNE" (CustID 301 to 700) and HF3: city = "DELHI" (CustID 701 to 1000))
Note: Horizontal fragmentation divides data based on conditions on the attributes. The number of rows differs between fragments, depending on how many rows satisfy each fragment's condition.
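The customer example can be replayed in SQLite to confirm that the fragments rebuild the original table. This is a minimal sketch, assuming synthetic rows generated to match the 300/400/300 split; only the fragment names and the city conditions come from the example. `UNION ALL` is used on purpose: unlike `UNION`, it keeps duplicates, so a row appearing in two fragments would make the comparison fail.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (custid INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT)")


# Synthetic rows matching the example's sizes: 300 Mumbai, 400 Pune, 300 Delhi.
def city_for(i):
    if i <= 300:
        return "Mumbai"
    return "Pune" if i <= 700 else "Delhi"


db.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(i, f"cust{i}", f"addr{i}", city_for(i)) for i in range(1, 1001)],
)

# Each horizontal fragment is a selection on the city attribute.
fragments = {"HF1": "Mumbai", "HF2": "Pune", "HF3": "Delhi"}
for frag, city in fragments.items():
    db.execute(f"CREATE TABLE {frag} (custid INTEGER, name TEXT, address TEXT, city TEXT)")
    db.execute(f"INSERT INTO {frag} SELECT * FROM customer WHERE city = ?", (city,))

sizes = {frag: db.execute(f"SELECT COUNT(*) FROM {frag}").fetchone()[0] for frag in fragments}
print(sizes)  # {'HF1': 300, 'HF2': 400, 'HF3': 300}

# UNION ALL keeps duplicates, so equality with the original table shows both
# that no row is lost (completeness) and that no row appears twice (disjointness).
union_all = db.execute(
    "SELECT * FROM HF1 UNION ALL SELECT * FROM HF2 UNION ALL SELECT * FROM HF3"
).fetchall()
original = db.execute("SELECT * FROM customer").fetchall()
print(sorted(union_all) == sorted(original))  # True
```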
Types of horizontal fragmentation
1. Complete horizontal fragmentation
2. Disjoint horizontal fragmentation
3. Derived horizontal fragmentation
1. Complete horizontal fragmentation: Completeness is required for reconstruction of the relation, so that every tuple (row) of the table belongs to at least one of the fragments.
2. Disjoint horizontal fragmentation: Disjoint horizontal fragmentation generates a set of horizontal fragments in which no two fragments have common tuples. That means every tuple of the relation belongs to only one fragment.
3. Derived horizontal fragmentation
Owner relations and member relations participate in this type of fragmentation. The relation with the primary key is the owner relation, and the relation with the foreign key is the member relation.
Let S be the owner relation and R the member relation. The derived horizontal fragments of R are then defined as
$R_i = R \ltimes S_i, \quad 1 \le i \le n$
where $\ltimes$ denotes the semijoin, $S_i$ is a primary horizontal fragment of S and n is the number of fragments of R. Each primary horizontal fragment is obtained by a selection on the owner relation:
$S_i = \sigma_{F_i}(S)$
where $F_i$ is the selection condition of the i-th fragment.
Consider the following example, in which PAY is the owner relation and EMP is the member relation.
Fig. 1.3.2: Derived horizontal fragmentation (PAY(TITLE, SAL) is linked by L1 to EMP(ENO, ENAME, TITLE); EMP is linked by L2, and PROJ(PNO, PNAME, BUDGET, LOC) by L3, to ASG(ENO, PNO, RESP, DUR))
Consider PAY1 and PAY2 as primary horizontal fragments of PAY. Then the derived horizontal fragments of EMP can be defined as:
$DHF_1 = EMP \ltimes PAY_1$ (EMP semijoin PAY1)
$DHF_2 = EMP \ltimes PAY_2$ (EMP semijoin PAY2)
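The PAY/EMP example can be run in SQLite, with the semijoin written as an `EXISTS` subquery. This is a minimal sketch. The book does not give the predicate that defines PAY1 and PAY2, so `sal <= 30000` / `sal > 30000` is an assumption, and all rows are hypothetical sample data. A semijoin returns only the member relation's attributes, which is why `EXISTS` is used instead of a join that would also return PAY's columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pay (title TEXT PRIMARY KEY, sal INTEGER);          -- owner relation
CREATE TABLE emp (eno TEXT PRIMARY KEY, ename TEXT,
                  title TEXT REFERENCES pay(title));             -- member relation
INSERT INTO pay VALUES ('Elect. Eng.', 40000), ('Syst. Anal.', 34000),
                       ('Mech. Eng.', 27000), ('Programmer', 24000);
INSERT INTO emp VALUES ('E1', 'J. Doe', 'Elect. Eng.'), ('E2', 'M. Smith', 'Syst. Anal.'),
                       ('E3', 'A. Lee', 'Mech. Eng.'), ('E4', 'J. Miller', 'Programmer'),
                       ('E5', 'B. Casey', 'Syst. Anal.');
-- Primary horizontal fragments of the owner; the SAL predicate is an assumption.
CREATE TABLE pay1 AS SELECT * FROM pay WHERE sal <= 30000;
CREATE TABLE pay2 AS SELECT * FROM pay WHERE sal > 30000;
""")


def emp_semijoin(pay_fragment):
    """EMP semijoin PAYi: EMP tuples whose TITLE matches a tuple in PAYi.

    A semijoin returns only EMP's attributes, so EXISTS is used instead of JOIN.
    """
    query = (
        "SELECT e.eno, e.ename, e.title FROM emp e "
        f"WHERE EXISTS (SELECT 1 FROM {pay_fragment} p WHERE p.title = e.title) "
        "ORDER BY e.eno"
    )
    return db.execute(query).fetchall()


dhf1 = emp_semijoin("pay1")
dhf2 = emp_semijoin("pay2")
print([row[0] for row in dhf1])  # ['E3', 'E4']
print([row[0] for row in dhf2])  # ['E1', 'E2', 'E5']
```

Because PAY1 and PAY2 are disjoint and every EMP title appears in PAY, each employee lands in exactly one derived fragment.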
2. Vertical fragmentation
* Data is divided by selecting specific columns (attributes) of the table; no condition is required. Combining all the vertical fragments must give back the original table with the correct set of attributes.
* The schema of the table (relation) is divided into smaller schemas (sub-schemas). Each fragment must contain a common candidate key so as to ensure a lossless join. If the candidate key is not included, it will not be possible to reconstruct the original table from the vertical fragments, as there will be no relationship among the data in the fragments.
Example
* Consider the table customer (Custid, Name, Address, City, Phno).
* If we place (Name, Address, City) in one fragment and Phno in another, each together with the candidate key Custid, we get two vertical fragments that Custid relates to each other.
Fig. 1.3.3 : Vertical fragmentation (table CUSTOMER(ID, name, address, city) with IDs 1 to 1000, split into VF1 (custid, name, address) and VF2 (custid, city))
* Customer = VF1 join VF2 (by performing a join operation on the vertical fragments we get back the original table customer).
Note : Vertical fragmentation works on division of columns, so the number of tuples will be the same in all fragments, as no rows are divided.
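A minimal Python sketch of vertical fragmentation and its reconstruction, assuming the customer table is a list of dicts. The rows are made up; the point is that Custid is repeated in both fragments, which is what makes the join lossless.

```python
# Hypothetical CUSTOMER rows
customer = [
    {"Custid": 1, "Name": "Asha", "Address": "MG Road", "City": "Pune", "Phno": "111"},
    {"Custid": 2, "Name": "Ravi", "Address": "Link Road", "City": "Mumbai", "Phno": "222"},
]


def project(relation, attrs):
    """pi: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on_key(r, s, key):
    """Join two vertical fragments on their common key."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]


vf1 = project(customer, ["Custid", "Name", "Address", "City"])
vf2 = project(customer, ["Custid", "Phno"])
rebuilt = join_on_key(vf1, vf2, "Custid")
print(rebuilt == customer)  # True
```

If Custid were dropped from either projection, there would be no attribute to match tuples across fragments, and the original rows could not be recovered.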
Types of vertical fragmentation
(a) Complete vertical fragmentation
(b) Mixed (hybrid) Fragmentation
(a) Complete vertical fragmentation : If every attribute of the original table is present in some vertical fragment, and all fragments share one common key, then it is complete vertical fragmentation.
(b) Mixed (hybrid) Fragmentation : This type of fragmentation is the combination of horizontal and vertical
fragmentation.
Mixed or hybrid fragmentation can be done in two alternative ways :
(1) First, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.
(2) First, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.
Fig. 1.3.3(a) : Mixed fragmentation (relation R divided into fragments R1 ... Rn, which are further divided into R11 ... R1n, R21 ... R2n)
Either way may be used; both are considered mixed or hybrid fragmentation of the table.
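Option (1) (horizontal first, then vertical) can be sketched in Python. The cust rows and the column split are assumptions; only the age < 40 condition is taken from Example 1. Reconstruction reverses the steps: join the vertical pieces on the key, then take the union with the remaining horizontal fragment.

```python
# Hypothetical cust table
cust = [
    {"id": 1, "name": "Asha", "age": 35, "address": "Pune"},
    {"id": 2, "name": "Ravi", "age": 52, "address": "Mumbai"},
    {"id": 3, "name": "Meera", "age": 28, "address": "Mumbai"},
]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    return [{a: t[a] for a in attrs} for t in rel]


def join(r, s, key):
    idx = {t[key]: t for t in s}
    return [{**t, **idx[t[key]]} for t in r if t[key] in idx]


# Step 1: horizontal fragments
hf1 = select(cust, lambda t: t["age"] < 40)
hf2 = select(cust, lambda t: t["age"] >= 40)
# Step 2: vertical fragments of HF1 only (HF2 is kept whole)
r11 = project(hf1, ["id", "name"])
r12 = project(hf1, ["id", "age", "address"])

# Reconstruction: join the vertical pieces, then union with HF2
rebuilt = join(r11, r12, "id") + hf2
print(sorted(t["id"] for t in rebuilt))  # [1, 2, 3]
print(all(t in cust for t in rebuilt))   # True
```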
Fig. 1.3.4 : Options for creating mixed fragments (a table divided into Fragment 1 and Fragment 2)
Example 1
Fragmentation 1
SELECT * FROM cust WHERE age < 40;

Fragmentation 2
SELECT * FROM cust WHERE Address = 'Pune';

Example 2
Suppose you have a folder that contains some files. If we copy some of the files to another drive, then it is a replica of those files; if we copy all the contents, then it is a duplication of the folder or data.
1.3.3 Syntax for Creating Fragments
Consider the table student (st_id, name, address, ph_no, email_id).
Horizontal fragments :
CREATE TABLE HF1 AS SELECT * FROM student WHERE st_id <= 40;
CREATE TABLE HF2 AS SELECT * FROM student WHERE st_id > 40;
SELECT * FROM HF1;
SELECT * FROM HF2;
To get the original table, use UNION between the fragments :
SELECT * FROM HF1 UNION SELECT * FROM HF2;
Vertical fragments :
CREATE TABLE VF1 AS SELECT st_id, name FROM student;
CREATE TABLE VF2 AS SELECT st_id, address, ph_no, email_id FROM student;
SELECT * FROM VF1;
SELECT * FROM VF2;
To get the original table, use a join between the fragments :
SELECT * FROM VF1 NATURAL JOIN VF2;
Mixed fragments can be created in the same way.
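To check that these statements really rebuild the original table, they can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the column types and the 80 sample rows are invented, and SQLite is used only because it ships with Python. Other DBMSs differ in details; SQL Server, for example, uses SELECT ... INTO instead of CREATE TABLE ... AS SELECT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE student (st_id INTEGER PRIMARY KEY, name TEXT, "
    "address TEXT, ph_no TEXT, email_id TEXT)"
)
con.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [(i, f"student{i}", f"addr{i}", f"98{i:08d}", f"s{i}@example.com") for i in range(1, 81)],
)

# Horizontal fragments
con.execute("CREATE TABLE HF1 AS SELECT * FROM student WHERE st_id <= 40")
con.execute("CREATE TABLE HF2 AS SELECT * FROM student WHERE st_id > 40")
# Vertical fragments; st_id is repeated in both so they can be joined
con.execute("CREATE TABLE VF1 AS SELECT st_id, name FROM student")
con.execute("CREATE TABLE VF2 AS SELECT st_id, address, ph_no, email_id FROM student")

original = con.execute("SELECT * FROM student ORDER BY st_id").fetchall()
by_union = con.execute("SELECT * FROM HF1 UNION SELECT * FROM HF2 ORDER BY 1").fetchall()
by_join = con.execute("SELECT * FROM VF1 NATURAL JOIN VF2 ORDER BY st_id").fetchall()
print(by_union == original, by_join == original)  # True True
```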
1.3.4 Data Replication and Allocation
The process of copying fragments onto servers or sites that are connected in a network is called replication of data. Each fragment, or each copy of a fragment, must be assigned to a particular site in the distributed system. This process is called data distribution (or data allocation).
The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at each site.
For example, if high availability is required, transactions can be submitted at any site, and most transactions are retrieval only, a fully replicated database is a good choice. However, if certain transactions that access particular parts of the database are mostly submitted at a particular site, the corresponding set of fragments can be allocated at that site only. Data that is accessed at multiple sites can be replicated at those sites.
If many updates are performed, it may be useful to limit replication. Finding an optimal or even a good solution to distributed data allocation is a complex optimization problem.
Fragments can be allocated to a number of sites in the following ways :
1. Full replication 2. No replication 3. Partial replication
> 1. Full replication

In the full replication scheme, the database is available to almost every location or user in the communication network. If the fragments are copied to every site or server, then it is full replication of data.
Replication is the process used to improve data availability. Replicating the whole database at every site in the distributed system creates a fully replicated distributed database.
This gives high availability of data, because the system can continue to operate as long as at least one site is up, and good performance for global queries, because the results of such queries are available locally at any one site. A retrieval query can be processed at the local site where it is submitted, if that site includes a server module.
Fig. 1.3.5 : Full replication of fragment
Advantages of full replication
(i) High availability of data, as database is available to almost every location.

(ii) Faster execution of queries.
Disadvantages of full replication
(i) Full replication makes the concurrency control and recovery techniques more expensive.
(ii) Update operations are slower, since a single logical update must be performed on every copy of the database to keep the copies consistent. This is especially true if many copies of the database exist.
> 2. No replication
The opposite of full replication is having no replication, that is, each fragment is stored at exactly one site. In this case, all fragments must be disjoint, except for the repetition of primary keys among vertical (or mixed) fragments.
This type of allocation of data is also called nonredundant allocation.
Fig. 1.3.6 : No replication (the original database is held on a single server accessed by all users)
Advantages of no replication
(i) Concurrency control overhead is minimized, as each fragment has only one copy.
(ii) Easy recovery of data.
Disadvantages of no replication
(i) Availability of data will be limited.
(ii) Query execution slows down, as multiple clients access the same server.
> 3. Partial replication
* Partial replication means that only some fragments of the database are replicated on more than one server; that is, some fragments may be replicated whereas others may not.
* The number of copies of each fragment can range from one up to the total number of sites in the distributed system.
* E.g. sales forces, financial planners and claims adjustors carry partially replicated databases with them on laptops and PDAs and synchronize them periodically with the server database.
Fig. 1.3.7 : Partial replication (original database with a recovery location)

Advantages
The number of replicas created for a fragment depends upon the importance of the data in that fragment.
Disadvantage
The concurrency control mechanism will be complex and needs to be carried out properly.

Fragmentation and Allocation Schema
* A fragmentation schema of a database is a definition of a set of fragments that includes all attributes and tuples of the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations.
* An allocation schema describes the allocation of fragments to the sites of the distributed database environment; it is a mapping that specifies, for each fragment, the site(s) at which it is stored. If a fragment is stored at more than one site, it is said to be replicated.
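An allocation schema is just a mapping from fragments to sets of sites, so the three allocation schemes described earlier can be read off it directly. A minimal Python sketch; the fragment names, site names and the classification helper are hypothetical.

```python
# Hypothetical allocation schema: fragment -> set of sites storing it
SITES = {"S1", "S2", "S3"}
allocation = {
    "EMP1": {"S1"},
    "EMP2": {"S1", "S2", "S3"},
    "PROJ1": {"S2", "S3"},
}


def replicated(schema):
    """Fragments stored at more than one site."""
    return sorted(f for f, sites in schema.items() if len(sites) > 1)


def replication_scheme(schema, all_sites):
    """Classify the schema as full, no, or partial replication."""
    if all(sites == all_sites for sites in schema.values()):
        return "full replication"
    if all(len(sites) == 1 for sites in schema.values()):
        return "no replication"
    return "partial replication"


print(replicated(allocation))                 # ['EMP2', 'PROJ1']
print(replication_scheme(allocation, SITES))  # partial replication
```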
1.4 DESCRIPTIVE QUESTIONS
Q.1 Write the differences between centralized and distributed databases.
Q.2 Explain the federated architecture of distributed databases.
Q.3 Define replication. What are the advantages of replication of data?
Q.4 Explain the types of distributed database systems.
Q.5 Explain the three-tier architecture of a distributed database system.
Q.6 What is the difference between replication and duplication?
Q.7 Which operations are performed on horizontal and vertical fragments to get the original table?
Q.8 Explain how fragments are allocated to different servers.
Q.9 Give an example of derived horizontal fragments.
Q.10 Explain parallel versus distributed database architecture.
Q.11 Explain the types of transparency in distributed databases.
Q.12 Write SQL expressions for defining horizontal and vertical fragments.
Q.13 Explain parallel database architecture.
Q.14 Explain two examples each of horizontal and vertical fragments.
1.5 MULTIPLE CHOICE QUESTIONS
Q. 1.1 A homogenous distributed database is which of the following?
(a) The same DBMS is used at each location and data are not distributed across all nodes.
(b) The same DBMS is used at each location and data are distributed across all nodes.
(c) A different DBMS is used at each location and data are not distributed across all nodes.
(d) A different DBMS is used at each location and data are distributed across all nodes.
Ans.: (b)
Q. 1.2 Storing a separate copy of the database at multiple locations is which of the following?
(a) Data Replication (b) Horizontal Partitioning (c) Vertical Partitioning (d) None of the above
Ans.: (a)
Q. 1.3 A distributed database is a collection of data which belong ______ to the same system but are spread over the ______ of the network.
(a) Logically, sites (b) Physically, sites (c) Database, DBMS (d) None of the above
Ans.: (a)
Q. 1.4 Which of the following is/are the main goals of a distributed database?
(a) Interconnection of database (b) Incremental growth (c) Reduced communication overhead (d) All of the above
Ans.: (d)
Q. 1.5 Which of the following parallel database architectures is/are mainly used by distributed database systems?
(a) Shared Memory (b) Shared Disk (c) Shared Nothing (d) Hierarchical
Ans.: (c)
Q. 1.6 A heterogeneous distributed database is which of the following?
(a) The same DBMS is used at each location and data are not distributed across all nodes.
(b) The same DBMS is used at each location and data are distributed across all nodes.
(c) A different DBMS is used at each location and data are not distributed across all nodes.
(d) A different DBMS is used at each location and data are distributed across all nodes.
Ans.: (d)
Q. 1.7 Some of the columns of a relation are at different sites is which of the following?
(a) Horizontal Partitioning (b) Vertical Partitioning (c) Replication (d) Fragmentation
Ans.: (b)
Q. 1.8 Storing a separate copy of the database at multiple locations is which of the following?
(a) Replication (b) Vertical fragmentation (c) Horizontal Fragmentation (d) Replication, Vertical fragmentation
Ans.: (a)
Q. 1.9 The rows of the tables are distributed on different servers in :
(a) Vertical Fragmentation (b) Horizontal Fragmentation (c) Mixed Fragmentation (d) Allocation of data
Ans.: (b)
Q. 1.10 There is no problem of redundancy of data in :
(a) Vertical Fragmentation (b) Horizontal Fragmentation (c) Mixed Fragmentation (d) Allocation of data
Ans.: (a)
Chapter Ends...
CHAPTER 2 Distributed Database Handling
Syllabus
Distributed Transaction Management - Definition, properties, types, architecture. Distributed Query Processing - Characterization of Query Processors, Layers/phases of query processing.
Distributed Concurrency Control - Taxonomy, Locking based, Basic TO algorithm. Recovery in Distributed Databases : Failures in distributed database, 2PC and 3PC protocol.
2.1 Distributed Transaction Management
2.1.1 Transaction
2.1.2 Properties of Transaction
2.1.3 Examples of Transactions
2.1.4 Types of Transactions
2.2 Query Processing
UQ. Explain steps in query processing.
2.2.1 Transformation Rules
2.2.2 Data Localization
2.2.3 Global Query Optimization
2.2.4 Local Query Optimization
2.2.5 Data Transfer Cost in Query Processing (Example of SQL Query Executing at Different Sites)
2.2.6 The Possibilities to Execute the SQL Query in Distributed Database
2.2.7 Distributed Concurrency Control
2.2.8 Locking based Protocol
2.2.9 Two-Phase Locking Types
2.3 Distributed Two-phase Locking Algorithm
2.3.1 Thomas' Write Rule
2.4 Distributed Concurrency Control and Recovery
UQ. Explain how concurrency control is achieved in distributed database.
2.5 Recovery in Distributed Database
2.5.1 Distributed Two-phase Commit (2PC)
UQ. Explain 2PC protocol in detail.
2.5.2 Distributed Three-Phase Commit (3PC)
2.6 Descriptive Questions
Multiple Choice Questions
Chapter Ends
2.1 DISTRIBUTED TRANSACTION MANAGEMENT
2.1.1 Transaction
Definition
* A transaction is a sequence of steps, or a logical unit of work, on a database; it may also be an entire program operating on data.
* The series of steps necessary to accomplish a logical unit of work is referred to as one transaction.
* A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency.
* Managing transactions that access data at several sites while keeping the data consistent at every site is called transaction management.
* A transaction is considered as a sequence of read and write operations on the database, together with the computation performed.
Fig. 2.1.1 : Transaction (the database is in a consistent state before transaction T, may be temporarily in an inconsistent state during execution, and is in a consistent state after T)
2.1.2 Properties of Transaction
1. Atomicity 2. Consistency 3. Isolation 4. Durability
> 1. Atomicity
* All changes to data are performed as if they are a single operation. That is, either all the changes are performed or none of them are, which keeps the database in a consistent state.
* For example, in an application that transfers funds from one account to another, the atomicity property ensures that, if a debit is made successfully from one account, the corresponding credit is made to the other account.
> 2. Consistency
e Data is in a consistent state when a transaction starts and when it ends.

* For example, in an application that transfers funds from one account to another, the consistency property ensures that the total value of funds in both accounts is the same at the start and at the end of each transaction, which confirms that the changes made to both accounts are correct and consistent.
> 3. Isolation
* When one transaction is executing, other transactions do not interfere with it and it executes independently. The intermediate state of a transaction is invisible to other transactions, so transactions that run concurrently appear to be serialized.
* For example, in an application that transfers funds from one account to another, the isolation property ensures that another transaction sees the transferred funds in one account or the other, but not in both, nor in neither.
> 4. Durability
* When the transaction successfully completes, changes to data persist and will not be undone, even in the event of a system failure.
* For example, in an application that transfers funds from one account to another, the durability property ensures that the changes made to each account will remain in the database.
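The funds-transfer example can be tried out with SQLite from Python. This is a minimal sketch assuming a hypothetical account(id, balance) table and a hypothetical transfer() helper. It demonstrates atomicity (a failed transfer undoes its own debit) and consistency (the total balance stays the same); isolation and durability are provided by the DBMS and are not exercised here.

```python
import sqlite3


def transfer(con, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    with con:  # commits on success, rolls back if an exception escapes
        con.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = con.execute("SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # the debit above is rolled back
        con.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))


def total(con):
    return con.execute("SELECT SUM(balance) FROM account").fetchone()[0]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])
con.commit()

transfer(con, "A", "B", 200)       # succeeds: A = 300, B = 300
try:
    transfer(con, "A", "B", 1000)  # fails: A would become negative
except ValueError:
    pass

print(dict(con.execute("SELECT id, balance FROM account ORDER BY id")), total(con))
# {'A': 300, 'B': 300} 600
```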
2.1.3 Examples of Transactions
(1) A bank withdraws $100 from account A.

(2) Airlines check if two seats are available on flight #700.
(3) A company gives a bonus to its employees.
2.1.4 Types of Transactions
Transactions have been classified according to a number of criteria. One criterion is the duration of transactions. Accordingly, transactions may be classified as online or batch. These two classes are also called short-life and long-life transactions.
1. Online Transaction
* Online transactions are characterized by very short execution/response times and by access to a relatively small portion of the database.
* This class of transactions probably covers a large majority of the applications we use in daily life.
* Examples are banking transactions and airline reservation transactions.
2. Batch Transactions
* Batch transactions are transactions that take longer to execute (response time may be measured in minutes, hours, or even days) and access a larger portion of the database.
* Applications that might require batch transactions are statistical applications, report generation, complex queries and image processing.
* Another classification is based on the organization of the read and write operations :
3. Flat transactions
Flat transactions have a single start point (Begin transaction) and a single termination point (End transaction).
4. Nested Transactions
* A transaction model that includes other transactions, each with its own begin and commit points, is called a nested transaction.
* It looks as shown below :
Begin transaction abc
  Begin
    Begin transaction abc1
    ...
    end. {abc1}
    Begin transaction abc2
    ...
    end. {abc2}
  End.
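SQL has no BEGIN inside BEGIN, but savepoints give a close approximation of the nesting above: a sub-transaction can be undone without aborting its parent, and nothing becomes permanent until the outermost transaction commits. A sketch using SQLite savepoints from Python, with a hypothetical booking table; treat it as an analogy, since savepoints are not a full implementation of the nested transaction model.

```python
import sqlite3

# isolation_level=None: Python issues no implicit BEGIN, so we control it
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE booking (item TEXT)")

con.execute("BEGIN")              # Begin transaction abc (outer)
con.execute("SAVEPOINT abc1")     #   Begin transaction abc1
con.execute("INSERT INTO booking VALUES ('flight')")
con.execute("RELEASE abc1")       #   end. {abc1}: keep its work inside abc
con.execute("SAVEPOINT abc2")     #   Begin transaction abc2
con.execute("INSERT INTO booking VALUES ('hotel')")
con.execute("ROLLBACK TO abc2")   #   abc2 fails: undo only abc2's work
con.execute("RELEASE abc2")       #   end. {abc2}
con.execute("COMMIT")             # End of abc: commit what survived

print(con.execute("SELECT item FROM booking").fetchall())  # [('flight',)]
```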
2.2 QUERY PROCESSING
Query processing refers to the set of activities involved in extracting data from a database. In a distributed database, data is available on more than one server, so it is important to design the database in such a way that response time is reduced and the database remains in a consistent state.
Query processing describes how a query is executed internally and how the output is computed and delivered to the user's desktop. In a distributed system, the cost of data transmission over the network must also be considered. The stages of distributed query processing are given below and are common to all the servers or sites.
A. Query decomposition
* It uses the global conceptual schema.
* In a distributed database system, data is available on different servers, so the query has to be divided into subparts and the data fetched from the corresponding servers where it is stored.
* This layer decomposes the calculus query (SQL query) into an algebraic query on global relations.
* The information required for this transformation is available in the global conceptual schema (GCS), which describes the global relations.
Query decomposition can be done in four steps, as given below :
(1) Normalization (2) Analysis (3) Elimination of redundancy (4) Rewriting
B. Localization : This step maps the distributed query to separate queries on individual fragments.

C. Global query optimization : One strategy is selected from a list of candidate strategies.
D. Local query optimization : Finally, each subquery executes at its local site.
Fig. 2.2.1 : Distributed query processing steps (at the control site: calculus query on global relations → query decomposition, using the global schema → algebraic query on global relations → data localization, using the fragment schema → algebraic query on fragments → global optimization, using the allocation schema → distributed query execution plan; at the local sites: distributed execution)
E. Normalization
(1) The input query can be complex, depending on the facilities provided by the language.
(2) The goal of normalization is to transform the query into a normalized form to facilitate further processing.
(3) This process includes lexical and syntactic analysis and the treatment of the WHERE clause.
There are two possible normal forms :
F. Conjunctive NF
This is a conjunction (∧ predicates) of disjunctions (∨ predicates) as follows :
$(p_{11} \lor p_{12} \lor \dots \lor p_{1n}) \land \dots \land (p_{m1} \lor p_{m2} \lor \dots \lor p_{mn})$
G. Disjunctive NF
This is a disjunction (∨ predicates) of conjunctions (∧ predicates) as follows :
$(p_{11} \land p_{12} \land \dots \land p_{1n}) \lor \dots \lor (p_{m1} \land p_{m2} \land \dots \land p_{mn})$
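* The quantifier-free predicate is transformed using equivalence rules.
* Equivalence rules : Some of the equivalence rules are :
1. $p_1 \land p_2 \Leftrightarrow p_2 \land p_1$
2. $p_1 \lor p_2 \Leftrightarrow p_2 \lor p_1$
3. $p_1 \land (p_2 \land p_3) \Leftrightarrow (p_1 \land p_2) \land p_3$
4. $p_1 \lor (p_2 \lor p_3) \Leftrightarrow (p_1 \lor p_2) \lor p_3$
5. $\lnot(\lnot p_1) \Leftrightarrow p_1$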
Example of NF (Normal Form)
Consider the query below, expressed in SQL :
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = 'P1'
AND (DUR = 12 OR DUR = 24)
The qualification in conjunctive NF (the above query in CNF) is :
$EMP.ENO = ASG.ENO \land ASG.PNO = \text{'P1'} \land (DUR = 12 \lor DUR = 24)$
The qualification in disjunctive NF (the above query in DNF) is :
$(EMP.ENO = ASG.ENO \land ASG.PNO = \text{'P1'} \land DUR = 12) \lor (EMP.ENO = ASG.ENO \land ASG.PNO = \text{'P1'} \land DUR = 24)$
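Going from CNF to DNF is repeated application of the distributive law: each DNF term picks one predicate from every CNF clause. A minimal Python sketch, assuming predicates are kept as plain strings; the `cnf_to_dnf` helper is hypothetical.

```python
from itertools import product

# CNF: a list of clauses; predicates inside a clause are joined by OR
cnf = [
    ["EMP.ENO = ASG.ENO"],
    ["ASG.PNO = 'P1'"],
    ["DUR = 12", "DUR = 24"],
]


def cnf_to_dnf(clauses):
    """Distribute AND over OR: one predicate from every clause per DNF term."""
    return [list(choice) for choice in product(*clauses)]


for term in cnf_to_dnf(cnf):
    print("(" + " AND ".join(term) + ")")
# (EMP.ENO = ASG.ENO AND ASG.PNO = 'P1' AND DUR = 12)
# (EMP.ENO = ASG.ENO AND ASG.PNO = 'P1' AND DUR = 24)
```

The number of DNF terms is the product of the clause sizes, which is why DNF can grow much larger than CNF and repeats common predicates (here the join and the PNO test appear twice).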
H. Analysis
* Query analysis enables rejection of normalized queries for which further processing is either impossible or unnecessary.
* The main reasons for rejection are that the query is type incorrect or semantically incorrect.
Type incorrect :
* If any of its attribute or relation names are not defined in the global schema.
* If operations are applied to attributes of the wrong type.
Semantically incorrect :
* Its components do not contribute in any way to the generation of the result.
* Only a subset of relational calculus queries can be tested for semantic correctness : those that do not contain disjunction and negation.
* Semantic incorrectness is detected through the connection graph (query graph) and the join graph.
I. Query Graph
This graph is used for most queries involving select, project and join operations. In the graph, one node represents the result relation and every other node represents an operand relation. An edge between two nodes that are not results represents a join, whereas an edge whose destination node is the result represents a project.
Example of query graph
Consider the following query in SQL :
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = 'CAD/CAM'
AND DUR >= 36
AND TITLE = 'PROGRAMMER'
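Fig. 2.2.2 : Query graph (nodes EMP, ASG, PROJ and RESULT; join edges EMP.ENO = ASG.ENO and ASG.PNO = PROJ.PNO; selections TITLE = 'PROGRAMMER', DUR >= 36 and PNAME = 'CAD/CAM'; ENAME and RESP projected to RESULT)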
J. Join Graph
This is the graph in which only joins are considered.
Fig. 2.2.3 : Join graph (EMP and ASG joined by EMP.ENO = ASG.ENO; ASG and PROJ joined by ASG.PNO = PROJ.PNO)
If the query graph is not connected, the query is semantically incorrect.
Consider the SQL query :
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = 'CAD/CAM'
AND DUR >= 36
AND TITLE = 'PROGRAMMER'
The query graph of this query, shown in Fig. 2.2.4, is disconnected, because no predicate joins PROJ to EMP or ASG (the join ASG.PNO = PROJ.PNO is missing). This tells us that the query is semantically incorrect.
Fig. 2.2.4 : Disconnected query graph (EMP, with TITLE = 'Programmer', joined to ASG by EMP.ENO = ASG.ENO; PROJ not connected to either)
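The connectivity test is an ordinary graph search. A minimal Python sketch, assuming the join graph is given as relation names plus pairs of relations that a join predicate links (the RESULT node and the selection predicates are left out for simplicity); the `is_connected` helper is hypothetical.

```python
from collections import defaultdict, deque


def is_connected(relations, join_edges):
    """True if every relation is reachable from the first via join predicates."""
    graph = defaultdict(set)
    for a, b in join_edges:
        graph[a].add(b)
        graph[b].add(a)
    start = relations[0]
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search
        node = queue.popleft()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == set(relations)


# Join predicates taken from the two WHERE clauses above
correct_query = is_connected(["EMP", "ASG", "PROJ"], [("EMP", "ASG"), ("ASG", "PROJ")])
faulty_query = is_connected(["EMP", "ASG", "PROJ"], [("EMP", "ASG")])
print(correct_query, faulty_query)  # True False
```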
Elimination of redundancy
* A user query expressed on a view may be enriched with several predicates to achieve view-relation correspondence and to ensure semantic integrity and security.
* The enriched query qualification may then contain redundant predicates.
* Such redundancy may be eliminated by simplifying the qualification with the following well-known idempotency rules :
1. $p_1 \land \lnot p_1 \Leftrightarrow \text{false}$
2. $p_1 \land (p_1 \lor p_2) \Leftrightarrow p_1$
3. $p_1 \lor \text{false} \Leftrightarrow p_1$
Example
Consider the SQL query :
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = 'J. Doe'
OR (NOT (EMP.TITLE = 'Programmer')
AND (EMP.TITLE = 'Programmer'
OR EMP.TITLE = 'Elect. Eng.')
AND NOT (EMP.TITLE = 'Elect. Eng.'))
After simplification, the query becomes :
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = 'J. Doe'
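The simplification can be confirmed by brute force: with three atomic predicates there are only eight truth assignments, and the original and simplified qualifications must agree on all of them. A small Python check; the names p1, p2 and p3 for the three comparisons are our own labels.

```python
from itertools import product


def original(p1, p2, p3):
    # p1: EMP.ENAME = 'J. Doe'
    # p2: EMP.TITLE = 'Programmer'
    # p3: EMP.TITLE = 'Elect. Eng.'
    return p1 or ((not p2) and (p2 or p3) and (not p3))


def simplified(p1, p2, p3):
    return p1


equivalent = all(original(*v) == simplified(*v) for v in product([False, True], repeat=3))
print(equivalent)  # True
```

The bracketed part is always false: `not p2 and (p2 or p3)` reduces to `not p2 and p3`, which contradicts `not p3`. Rule 3 then removes the false disjunct.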
K. Rewriting
This process is divided into two steps:

1. Straightforward transformation of query from relational calculus into relational algebra
2. Restructuring of relational algebra to improve performance
Operator tree
* An operator tree is used to represent the algebraic query graphically. It is a tree in which a leaf node is a relation and a non-leaf node is an intermediate relation produced by a relational algebra operator.
* The transformation of a tuple relational calculus query into an operator tree can easily be achieved as follows.
* First, a different leaf is created for each different tuple variable (relation). In SQL, the leaves are immediately available in the FROM clause.
* Second, the root node is created as a project operation on the result attributes; these are found in the SELECT clause.
* Third, the SQL WHERE clause is translated into the appropriate sequence of relational operations (select, join, union, etc.), going from the leaves to the root.
Consider the example of an SQL query as shown below :
SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME = 'J. Doe'
AND PROJ.PNAME = 'CAD/CAM'
AND (DUR = 12 OR DUR = 24)
Fig. 2.2.5 : Example SQL query with operator tree (root: project on ENAME; below it, selections DUR = 12 OR DUR = 24, PNAME = 'CAD/CAM' and ENAME = 'J. Doe'; below them, joins on PNO and ENO; leaves PROJ, ASG and EMP)
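The three construction steps can be mirrored in a few lines of Python. This is a sketch under assumptions: the `Node` class, `build_tree` and `show` are hypothetical, predicates are kept as strings, and the joins are applied left to right in FROM-clause order. Real optimizers choose the order later, using the transformation rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str    # "relation", "join", "select" or "project"
    arg: str   # relation name, join attribute, predicate or attribute list
    children: list = field(default_factory=list)


def build_tree(from_rels, joins, selects, project_attrs):
    # Step 1: one leaf per relation in the FROM clause
    leaves = {r: Node("relation", r) for r in from_rels}
    # Step 3, bottom-up: join predicates from WHERE combine the leaves ...
    tree = leaves[from_rels[0]]
    for attr, rel in joins:
        tree = Node("join", attr, [tree, leaves[rel]])
    # ... and the remaining WHERE predicates become selections
    for pred in selects:
        tree = Node("select", pred, [tree])
    # Step 2: the root projects the SELECT-clause attributes
    return Node("project", project_attrs, [tree])


def show(node, depth=0):
    print("  " * depth + f"{node.op}: {node.arg}")
    for child in node.children:
        show(child, depth + 1)


root = build_tree(
    ["PROJ", "ASG", "EMP"],
    [("PNO", "ASG"), ("ENO", "EMP")],
    ["ENAME = 'J. Doe'", "PNAME = 'CAD/CAM'", "DUR = 12 OR DUR = 24"],
    "ENAME",
)
show(root)
```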
2.2.1 Transformation Rules
To convert an SQL query into tree form, there are some transformation rules to follow, as below.
* By applying transformation rules, many different trees can be generated.
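* Commutativity of binary operations
$R \times S \Leftrightarrow S \times R$
$R \bowtie S \Leftrightarrow S \bowtie R$
* Associativity of binary operations
$(R \times S) \times T \Leftrightarrow R \times (S \times T)$
$(R \bowtie S) \bowtie T \Leftrightarrow R \bowtie (S \bowtie T)$
* Idempotence of unary operations
$\Pi_{A'}(\Pi_{A''}(R)) \Leftrightarrow \Pi_{A'}(R)$, where $A' \subseteq A''$
$\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) \Leftrightarrow \sigma_{p_1(A_1) \land p_2(A_2)}(R)$
* Commuting selection with projection
$\Pi_{A_1,\dots,A_n}(\sigma_{p(A_p)}(R)) \Leftrightarrow \Pi_{A_1,\dots,A_n}(\sigma_{p(A_p)}(\Pi_{A_1,\dots,A_n,A_p}(R)))$
* Commuting selection with binary operations (where the selection attributes belong to R)
$\sigma_{p(A)}(R \times S) \Leftrightarrow (\sigma_{p(A)}(R)) \times S$
$\sigma_{p(A_i)}(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow (\sigma_{p(A_i)}(R)) \bowtie_{(A_j,B_k)} S$
* Commuting projection with binary operations (where $C = A' \cup B'$, $A'$ are attributes of R, $B'$ are attributes of S, and for the join $A_j \in A'$ and $B_k \in B'$)
$\Pi_C(R \times S) \Leftrightarrow \Pi_{A'}(R) \times \Pi_{B'}(S)$
$\Pi_C(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow \Pi_{A'}(R) \bowtie_{(A_j,B_k)} \Pi_{B'}(S)$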
Redundant predicates are likely to arise when a query is the result of system transformations applied to the user query; such transformations are used for performing semantic data control (views, semantic integrity control). The calculus query is restructured as an algebraic query; several algebraic queries can be generated from the same calculus query, and their performance is compared to find the better option.
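The rule "commuting selection with binary operations" is what lets an optimizer push a selection below a join. A small Python check on made-up EMP and ASG rows shows that both orders give the same result, while the pushed-down version joins fewer tuples. The helper names and data are hypothetical.

```python
# Made-up EMP and ASG rows
emp = [{"ENO": "E1", "TITLE": "Programmer"}, {"ENO": "E2", "TITLE": "Elect. Eng."}]
asg = [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P2"}, {"ENO": "E1", "PNO": "P3"}]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def join(r, s, attr):
    """Nested-loop equi-join on attr."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def is_programmer(t):
    return t["TITLE"] == "Programmer"  # uses only EMP attributes


late = select(join(emp, asg, "ENO"), is_programmer)   # sigma_p(EMP join ASG)
early = join(select(emp, is_programmer), asg, "ENO")  # sigma_p(EMP) join ASG
print(late == early, len(early))  # True 2
```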
2.2.2 Data Localization
* The input to the second layer is an algebraic query on global relations. The second layer localizes the query's data using the data distribution information in the fragment schema.
* This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.
* A global relation can be reconstructed by applying the fragmentation rules and then deriving a program of relational algebra operators, called a localization program, which acts on fragments.
* Generating a query on fragments is done in two steps. First, the query is mapped into a fragment query by substituting each relation by its reconstruction program (also called materialization program). Second, the fragment query is simplified and restructured to produce another "good" query.
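The simplification step often amounts to discarding fragments whose defining predicate contradicts the query predicate. A minimal Python sketch, assuming EMP is horizontally fragmented on an integer ENO (the three ranges and the `localize` helper are hypothetical) and the query selects a single ENO value.

```python
# Hypothetical horizontal fragments of EMP, each defined by a range of ENO
fragments = {
    "EMP1": lambda eno: eno <= 3,
    "EMP2": lambda eno: 3 < eno <= 6,
    "EMP3": lambda eno: eno > 6,
}


def localize(eno_value):
    """EMP = EMP1 UNION EMP2 UNION EMP3; for a query ENO = eno_value,
    keep only the fragments whose predicate can hold for that value."""
    return [name for name, pred in fragments.items() if pred(eno_value)]


print(localize(5))  # ['EMP2']
```

After pruning, the union over three fragments becomes a query on a single fragment, so only the site holding EMP2 needs to be contacted.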
2.2.3 Global Query Optimization
* The input to the third layer is an algebraic query on fragments.
* The goal of query optimization is to find an execution strategy for the query that is optimal or close to optimal (requires the least time to execute).
* The previous layers have already optimized the query, for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as fragment allocation and cardinalities.
* Query optimization consists of finding the "best" ordering of operators in the query, including communication operators, that minimizes a cost function (such as response time).
* The output of the query optimization layer is an optimized algebraic query on fragments with communication operators included. It is typically represented as a distributed query execution plan.
2.2.4 Local Query Optimization
The last layer is performed by all the sites that have fragments involved in the query. Each subquery executing at one site, called a local query, is optimized using the local schema of that site and then executed.
2.2.5 Data Transfer Cost in Query Processing (Example of SQL Query Executing at Different Sites)
Consider the example below of an SQL query that accesses data in fragments of the EMPLOYEE and DEPARTMENT tables stored on different servers. Because of this, data has to be transferred from one site to another to satisfy the needs of the query and to complete its execution.
In a DDB, all the tables in the user query may not be present in a single DB or at a single location. Hence, while processing the query, it may need to access tables in different DBs or at different locations.
Fig. 2.2.6 : Query processing (EMPLOYEE in DB1 at location 1; locations 2 and 4)
This requires a request and a transfer cost for the data over the network.
Take the example of the EMPLOYEE and DEPARTMENT tables.
Consider an EMPLOYEE table with 1000 records, each of 100 bytes, and a DEPARTMENT table with 10 records, each of 20 bytes. Suppose the EMPLOYEE table is in DB1 at location 1 and the DEPARTMENT table is in DB2 at location 2.
Consider the query to find the names of the employees and their department names. Suppose each resulting record has 20 bytes and all the employees and departments are selected, giving 1000 result records.
* Suppose this query is being executed at location 4.
2.2.6 The Possibilities to Execute the SQL Query in Distributed Database
Case 1
Since location 4 has neither of the tables, both tables need to be transferred to location 4. Hence the cost of data transfer is as below :
Cost of transferring EMPLOYEE data : 1000 records * 100 bytes = 1,00,000 bytes
Cost of transferring DEPARTMENT data : 10 records * 20 bytes = 200 bytes
Therefore, total cost = 1,00,000 bytes + 200 bytes = 1,00,200 bytes
Here the cost of transferring the result records does not arise, as the result is requested at this location itself.
Case 2
• Suppose we transfer the EMPLOYEE records to location 2 and process the query there, then transfer the result to location 4. This needs to consider the transfer cost of the EMPLOYEE records and the transfer cost of the result records.
• Cost of transferring EMPLOYEE data: 1000 records * 100 bytes = 1,00,000 bytes
• Cost of transferring the result: 1000 records * 20 bytes = 20,000 bytes
• Therefore, total cost = 1,00,000 bytes + 20,000 bytes = 1,20,000 bytes
Case 3
• Suppose we transfer the DEPARTMENT records to location 1 and process the query there, then transfer the result to location 4. This needs to consider the transfer cost of the DEPARTMENT records and the transfer cost of the result records. Hence:
• Cost of transferring DEPARTMENT data: 10 records * 20 bytes = 200 bytes
• Cost of transferring the result: 1000 records * 20 bytes = 20,000 bytes
• Therefore, total cost = 200 bytes + 20,000 bytes = 20,200 bytes
• Hence case 3 is the best approach, since it gives the minimal data transfer cost. The federated method therefore has to calculate these costs from the query, table sizes, result size, processing location, etc., and determine which method to use for query processing, just as normal DB query processing reduces the number of records, applies filter conditions first, and so on.
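The arithmetic of the three cases can be reproduced with a short Python sketch. It is a hedged illustration, not the book's algorithm: it counts only bytes shipped over the network, using the sizes assumed above (1000 EMPLOYEE records of 100 bytes, 10 DEPARTMENT records of 20 bytes, 1000 result records of 20 bytes, query issued at location 4), ignores local processing cost and message latency, and the names `plan_costs`, `case1` to `case3` are hypothetical.

```python
# Hypothetical sketch: network transfer cost (bytes) of each execution strategy.
# Local processing cost and message latency are deliberately ignored.

EMP_ROWS, EMP_ROW_BYTES = 1000, 100        # EMPLOYEE in DB1 at location 1
DEPT_ROWS, DEPT_ROW_BYTES = 10, 20         # DEPARTMENT in DB2 at location 2
RESULT_ROWS, RESULT_ROW_BYTES = 1000, 20   # result needed at location 4

emp_bytes = EMP_ROWS * EMP_ROW_BYTES
dept_bytes = DEPT_ROWS * DEPT_ROW_BYTES
result_bytes = RESULT_ROWS * RESULT_ROW_BYTES


def plan_costs() -> dict:
    """Return the transfer cost in bytes of each strategy."""
    return {
        # Case 1: ship both tables to location 4; the result is produced there.
        "case1": emp_bytes + dept_bytes,
        # Case 2: ship EMPLOYEE to location 2, join there, ship result to location 4.
        "case2": emp_bytes + result_bytes,
        # Case 3: ship DEPARTMENT to location 1, join there, ship result to location 4.
        "case3": dept_bytes + result_bytes,
    }


if __name__ == "__main__":
    costs = plan_costs()
    for case, cost in costs.items():
        print(f"{case}: {cost} bytes")
    print("cheapest:", min(costs, key=costs.get))
```

Picking the minimum of the computed costs mirrors the conclusion above: the small table is shipped to the large one, and only the narrow result travels to the query site.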
2.2.7 Distributed Concurrency Control
In a distributed system the database is spread over multiple locations, and multiple transactions are allowed to execute at the same time. The system must therefore control concurrent access across sites and must recover from site or communication failures.
The following problems arise when dealing with a distributed environment:
1. Multiple copies of the data items
2. Failure of individual sites
3. Failure of communication links
4. Distributed commit
5. Distributed deadlock
> 1. Dealing with multiple copies of the data items
The concurrency control method must keep the multiple copies of a data item consistent. The recovery method is responsible for making a copy consistent with the other copies if the site on which the copy is stored fails and later recovers.
> 2. Failure of individual sites
When one or more individual sites fail, the DDBMS should continue to operate with its running sites. When a site recovers, its local database must be brought up to date with the rest of the sites before it rejoins the system.
> 3. Failure of communication links
The system must be able to deal with the failure of one or more of the communication links that connect the sites. Such failures can cause network partitioning, in which the sites are split into groups that cannot communicate with each other.
> 4. Distributed commit
Committing a transaction that accesses databases stored on multiple servers becomes problematic if some sites fail during the commit process. The two-phase commit (2PC) protocol is used to deal with this problem.
> 5. Distributed deadlock
Deadlock may occur among several sites due to the distribution of data, so techniques are needed for dealing with this problem.
2.2.8 Locking-based Protocol
A lock is a variable associated with a data item that describes the status of the item with respect to the operations that can be applied to it.
In other words, a lock is a mechanism used to control access to a data item by multiple transactions: if the data item is requested by conflicting operations, only one of them is allowed to access the data at a time.
Database systems with a lock-based protocol use a mechanism by which no transaction can read or write data until it acquires an appropriate lock on it.
There are two types of locks:
1. Binary locks: A lock on a data item can be in one of two states: locked or unlocked.
2. Shared/exclusive locks: The locking mechanism distinguishes locks by how the data item will be used. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock. If a lock is required only to perform a read operation, it is a shared lock. Exclusive locks are needed because allowing more than one transaction to write the same data item would lead the database into an inconsistent state. Read locks can be shared because no data is updated on a data item that holds a read lock.
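The shared/exclusive rules above amount to a compatibility check between the lock already held and the lock requested. The following is a minimal Python sketch under stated assumptions: `LockTable`, `request` and `release` are hypothetical names, a request returning `False` is assumed to wait, and lock upgrades are simplified (an S lock is upgraded to X only when no other transaction holds the item).

```python
from __future__ import annotations

# Lock compatibility: (mode already held, mode requested) -> can both coexist?
COMPATIBLE = {
    ("S", "S"): True,   # several transactions may read the item together
    ("S", "X"): False,  # a writer must wait for the readers
    ("X", "S"): False,  # readers must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}


class LockTable:
    """Hypothetical lock table holding shared (S) and exclusive (X) locks."""

    def __init__(self) -> None:
        self.held: dict[str, dict[str, str]] = {}  # item -> {transaction: mode}

    def request(self, txn: str, item: str, mode: str) -> bool:
        """Grant the lock if it is compatible with every other holder's lock."""
        holders = self.held.setdefault(item, {})
        for other, other_mode in holders.items():
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # conflicting lock: the request must wait
        if holders.get(txn) != "X":  # keep an existing X lock, else record/upgrade
            holders[txn] = mode
        return True

    def release(self, txn: str, item: str) -> None:
        self.held.get(item, {}).pop(txn, None)
```

For instance, two transactions can both obtain `S` on item `A`, while a third transaction's `X` request on `A` is refused until both shared locks are released.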
There are two options for handling the locks on data items with a lock manager, as given below:
1. Single lock manager approach (Binary locks: locked or unlocked)
• In this approach, the system maintains a single lock manager that resides on a single chosen site, say Si.
• All lock and unlock requests are made at site Si.
• When a transaction needs to lock a data item, it sends a lock request to Si, and the lock manager determines whether the lock can be granted.
• If the lock can be granted, the lock manager sends a message to the site that initiated the request. If not, the request is delayed until it can be granted.
• The transaction can read the data item from any one of the sites at which a replica of the data item resides.
• In case of a write, all the sites where a replica of the data item resides must be involved in the writing.
Advantages
• Simple implementation: it requires two messages for handling lock requests and one message for handling unlock requests.
• Simple deadlock handling: since all lock and unlock requests are made at one site, the deadlock-handling algorithms of a centralized system can be applied directly.
Disadvantages
• Bottleneck: the lock manager site Si becomes a bottleneck, since every lock and unlock request is processed there.
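The grant-or-delay behaviour of the single lock manager can be sketched in Python. This is an illustration under assumptions, not a real DBMS interface: `SingleLockManager`, `lock` and `unlock` are hypothetical names, messages between sites are modelled as return values, and binary locks are used as in the heading above.

```python
from __future__ import annotations

from collections import deque


class SingleLockManager:
    """Hypothetical single lock manager running at site Si (binary locks)."""

    def __init__(self) -> None:
        self.owner: dict[str, str] = {}             # item -> transaction holding it
        self.waiting: dict[str, deque[str]] = {}    # item -> delayed lock requests

    def lock(self, txn: str, item: str) -> bool:
        """True means a grant message is sent; False means the request is delayed."""
        if item not in self.owner:
            self.owner[item] = txn
            return True
        self.waiting.setdefault(item, deque()).append(txn)
        return False

    def unlock(self, txn: str, item: str) -> str | None:
        """Release the lock and grant it to the oldest delayed request, if any."""
        if self.owner.get(item) != txn:
            raise ValueError(f"{txn} does not hold a lock on {item}")
        queue = self.waiting.get(item)
        if queue:
            self.owner[item] = queue.popleft()   # grant message to the next requester
            return self.owner[item]
        del self.owner[item]
        return None
```

Because every request passes through this one object, the sketch also makes the bottleneck visible: all sites contend for the same lock table at Si.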

2. Distributed lock manager (Shared/exclusive lock approach)
• In the distributed-lock-manager approach, the lock manager function is distributed over several sites.
• Each site maintains a local lock manager whose function is to monitor lock and unlock requests for those data items that are stored on that site.
• When a transaction wishes to lock a data item Q that is not replicated and resides at site Si, a message is sent to the lock manager at site Si requesting a lock.
Advantage
• The work is distributed, which makes the approach robust to failures.
Disadvantage
• Deadlock detection and recovery are more complicated.
2.2.9 Two-Phase Locking Types
A transaction is said to follow the Two-Phase Locking protocol (2PL) if locking and unlocking are done in two phases:
A. Growing Phase    B. Shrinking Phase
A. Growing Phase: New locks on data items may be acquired but none can be released.
B. Shrinking Phase: Existing locks may be released but no new locks can be acquired.
Fig. 2.2.7 : Two phase locking protocol (locks are obtained from BEGIN up to the LOCK POINT and released from the LOCK POINT to END, plotted against transaction duration)
Consider the following example of transactions implementing the two-phase locking concept.

Table 2.2.1 : Transactions implementing 2PL
Sr. No. | T1        | T2
1       | lock-S(A) |
2       |           | lock-S(A)
3       | lock-X(B) |
4       | ...       | ...
5       | Unlock(A) |
6       |           | Lock-X(C)
7       | Unlock(B) |
8       |           | Unlock(A)
9       |           | Unlock(C)
Growing and shrinking phases of each transaction (by step number):
1. Transaction T1
• The growing phase is from steps 1-3.
• The shrinking phase is from steps 5-7.
• Lock point at 3.
2. Transaction T2
• The growing phase is from steps 2-6.
• The shrinking phase is from steps 8-9.
• Lock point at 6.
Note: LOCK POINT is the point at which the growing phase ends, i.e., when a transaction takes the final lock it needs to carry on its work.
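The growing/shrinking rule and the lock point can be checked mechanically. The Python sketch below is a hedged illustration: `check_2pl` is a hypothetical helper that takes one transaction's operations as (step, operation) pairs written in the notation of Table 2.2.1, reports whether the sequence is two-phase, and returns the step of the last lock acquired (the lock point).

```python
def check_2pl(steps):
    """Return (follows_2pl, lock_point_step) for one transaction's operations."""
    lock_point = None
    shrinking = False
    for step, op in steps:
        if op.lower().startswith("lock"):
            if shrinking:
                # A new lock after an unlock violates the two-phase rule.
                return False, lock_point
            lock_point = step          # still growing: latest lock so far
        elif op.lower().startswith("unlock"):
            shrinking = True           # first unlock starts the shrinking phase
    return True, lock_point


# Operations of T1 and T2 from Table 2.2.1 (step 4 omitted).
T1 = [(1, "lock-S(A)"), (3, "lock-X(B)"), (5, "Unlock(A)"), (7, "Unlock(B)")]
T2 = [(2, "lock-S(A)"), (6, "Lock-X(C)"), (8, "Unlock(A)"), (9, "Unlock(C)")]

if __name__ == "__main__":
    print(check_2pl(T1))
    print(check_2pl(T2))
```

Running it on the table's data reproduces the lock points listed above (step 3 for T1 and step 6 for T2).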
Drawbacks of 2-PL
• Cascading rollback is possible under 2-PL.
• Deadlocks and starvation are possible.
The modifications to 2-PL fall into three categories, each adding extra restrictions on top of Basic 2-PL:
o Strict 2-PL
o Rigorous 2-PL
o Conservative 2-PL
Strict 2-PL
This requires that, in addition to the locking being two-phase, all Exclusive (X) locks held by the transaction not be released until after the transaction commits.
Following Strict 2-PL ensures that our schedule is:
o Recoverable
o Cascadeless
Hence, it gives freedom from cascading aborts, which were still possible in Basic 2-PL, and moreover guarantees strict schedules. However, deadlocks are still possible.
Rigorous 2-PL
This requires that, in addition to the locking being two-phase, all Exclusive (X) and Shared (S) locks held by the transaction not be released until after the transaction commits.
Following Rigorous 2-PL ensures that our schedule is:
o Recoverable
o Cascadeless
Conservative 2-PL
This protocol requires the transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set.
If any of the predeclared items cannot be locked, the transaction does not lock any of the items; instead, it waits until all the items are available for locking.
It is difficult to use in practice because of the need to predeclare the read-set and the write-set, which is not possible in many situations.
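The all-or-nothing acquisition of conservative 2-PL can be sketched in Python with a condition variable. This is a hedged sketch: `ConservativeLocker` and its methods are hypothetical names, only exclusive locks are modelled (an assumption for brevity, since the text does not fix the lock modes), and the read-set and write-set are passed together as one set of items.

```python
from __future__ import annotations

import threading


class ConservativeLocker:
    """Hypothetical lock table for conservative 2PL (exclusive locks only)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.locked: set[str] = set()   # items currently locked by some transaction

    def try_acquire_all(self, items: set[str]) -> bool:
        """Lock every predeclared item at once, or lock none and return False."""
        with self._cond:
            if self.locked & items:
                return False
            self.locked |= items
            return True

    def acquire_all(self, items: set[str]) -> None:
        """Block until all predeclared items are free, then lock them together."""
        with self._cond:
            while self.locked & items:
                self._cond.wait()
            self.locked |= items

    def release_all(self, items: set[str]) -> None:
        """Release the items and wake transactions waiting for them."""
        with self._cond:
            self.locked -= items
            self._cond.notify_all()
```

Since a transaction never holds some items while waiting for others, the waiting step cannot form a lock cycle in this sketch; the price is that the whole read-set and write-set must be known up front.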
2.3 DISTRIBUTED TWO-PHASE LOCKING ALGORITHM
In this type of two-phase locking mechanism, lock managers are distributed to all sites. They are responsible for managing locks for data at that site. If no data is replicated, it is equivalent to primary copy 2PL.
In this approach, there are a number of lock managers, where each lock manager controls locks of data items stored at its local site. The location of the lock manager is based upon data distribution and replication.
The basic principle of distributed two-phase locking is the same as that of the basic two-phase locking protocol. However, in a distributed system there are sites designated as lock managers. A lock manager controls lock acquisition requests from transaction monitors. In order to enforce coordination between the lock managers at various sites, at least one site is given the authority to see all transactions and detect lock conflicts.
• Depending upon the number of sites that can detect lock conflicts, distributed two-phase locking approaches can be of three types:
1. Centralized two-phase locking
2. Primary copy two-phase locking
3. Distributed two-phase locking (described above)
1. Centralized two-phase locking
In this approach, one site is designated as the central lock manager. All the sites in the environment know the location of the central lock manager and obtain locks from it during transactions.
2. Primary copy two-phase locking
o Select any one replica of a data item to be the primary copy.
o The site containing that replica is called the primary site for that data item.
o In the primary copy 2PL mechanism, many lock managers are distributed to different sites, and each lock manager is responsible for managing the locks for a particular set of data items. When the primary copy has been updated, the change is propagated to the slaves.
o Each of these sites has the responsibility of managing a defined set of locks. All the sites know which lock control center is responsible for managing the locks of which data table/fragment/item.
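The difference between the centralized and primary copy variants lies in where a lock request is sent. The Python sketch below assumes a hypothetical replica catalog (`REPLICAS`, with the first listed site holding the primary copy) and a hypothetical central site `S1`; none of these names come from the text.

```python
from __future__ import annotations

# Hypothetical replica catalog: data item -> sites holding a copy.
# By convention here, the first site listed holds the primary copy.
REPLICAS: dict[str, list[str]] = {
    "EMPLOYEE": ["S1", "S3"],
    "DEPARTMENT": ["S2", "S1", "S4"],
}
CENTRAL_SITE = "S1"  # assumed central lock manager for centralized 2PL


def lock_site_centralized(item: str) -> str:
    """Centralized 2PL: every lock request goes to the one central lock manager."""
    return CENTRAL_SITE


def lock_site_primary_copy(item: str) -> str:
    """Primary copy 2PL: the request goes to the primary site of that item."""
    return REPLICAS[item][0]


def sites_to_propagate(item: str) -> list[str]:
    """After the primary copy is updated, the change is sent to the other replicas."""
    return REPLICAS[item][1:]
```

With this catalog, a lock on DEPARTMENT is requested from S1 under centralized 2PL but from S2 under primary copy 2PL, and an update at S2 is then propagated to S1 and S4.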
Basic Timestamp Ordering Protocol
The DBMS creates a unique identifier, the timestamp, to identify each transaction by its relative starting time; timestamp values are assigned in the order in which the transactions are submitted to the system. Timestamping is thus a method of concurrency control in which each transaction is assigned a transaction timestamp.
> Timestamp (TS)
For every data item, two timestamps are maintained, as listed below:
• Read timestamp: the timestamp of the youngest transaction that has performed a read operation on the data item.
• Write timestamp: the timestamp of the youngest transaction that has performed a write operation on the data item.
The steps followed in the basic Timestamp Ordering Protocol are given below:
• The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The order of transactions is simply the ascending order of transaction creation.
• The older transaction has higher priority, which is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.
• Lock-based protocols manage the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on a data item.
The basic Timestamp Ordering protocol works as given below:
1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:
• If W_TS(X) > TS(Ti), the operation is rejected and Ti is rolled back.
• If W_TS(X) <= TS(Ti), the operation is executed.
• R_TS(X) is then updated to max(R_TS(X), TS(Ti)).
2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:
• If TS(Ti) < R_TS(X), the operation is rejected and Ti is rolled back.
• If TS(Ti) < W_TS(X), the operation is rejected and Ti is rolled back; otherwise the operation is executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the most recent read timestamp of data item X.
W_TS(X) denotes the most recent write timestamp of data item X.
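The two rules can be written as a pair of small Python functions. This is a sketch for illustration: `DataItem`, `Rollback`, `read` and `write` are hypothetical names, rolling back Ti is modelled as raising an exception, and the example reuses the timestamps 007 and 009 of T1 and T2 from above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataItem:
    r_ts: int = 0  # R_TS(X): timestamp of the youngest transaction that read X
    w_ts: int = 0  # W_TS(X): timestamp of the youngest transaction that wrote X


class Rollback(Exception):
    """The operation is rejected and the issuing transaction Ti is rolled back."""


def read(ts_ti: int, x: DataItem) -> None:
    # Reject if X was already written by a younger transaction.
    if x.w_ts > ts_ti:
        raise Rollback(f"Read(X) by TS={ts_ti} rejected, W_TS(X)={x.w_ts}")
    x.r_ts = max(x.r_ts, ts_ti)  # execute the read and update R_TS(X)


def write(ts_ti: int, x: DataItem) -> None:
    # Reject if a younger transaction has already read or written X.
    if ts_ti < x.r_ts or ts_ti < x.w_ts:
        raise Rollback(f"Write(X) by TS={ts_ti} rejected")
    x.w_ts = ts_ti  # execute the write and update W_TS(X)


if __name__ == "__main__":
    x = DataItem()
    read(7, x)        # T1 (TS=007) reads X
    write(9, x)       # T2 (TS=009) writes X
    try:
        write(7, x)   # T1 now tries to write X after the younger T2
    except Rollback as err:
        print("T1 rolled back:", err)
```

In the usage example the older T1 tries to write X after the younger T2 has written it, so the write rule rejects the operation and T1 is rolled back.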
2.3.1 Thomas' Write Rule
Under the basic protocol, if TS(Ti) < W_TS(X), the write operation is rejected and Ti is rolled back. Thomas' write rule modifies the timestamp-ordering rules to make the schedule view serializable: instead of rolling Ti back, the obsolete 'write' operation itself is ignored. A write with TS(Ti) < R_TS(X) is still rejected and Ti is rolled back.
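Only the write check changes under Thomas' write rule. The Python sketch below is hedged: `write_thomas`, `DataItem` and `Rollback` are hypothetical names, and the function returns a string saying whether the write was executed or ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataItem:
    r_ts: int = 0  # R_TS(X)
    w_ts: int = 0  # W_TS(X)


class Rollback(Exception):
    """The write is rejected and Ti is rolled back."""


def write_thomas(ts_ti: int, x: DataItem) -> str:
    """Write(X) by Ti under Thomas' write rule."""
    if ts_ti < x.r_ts:
        # A younger transaction already read X: reject and roll back, as before.
        raise Rollback(f"Write(X) by TS={ts_ti} rejected, R_TS(X)={x.r_ts}")
    if ts_ti < x.w_ts:
        # Obsolete write: a younger transaction already wrote X, so skip it.
        return "ignored"
    x.w_ts = ts_ti
    return "executed"
```

Ignoring the obsolete write is safe because no transaction can ever read the value it would have produced: a younger write has already replaced it.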
Types of failures in distributed database
1. Transaction Failure: A transaction reaches a state in which it cannot continue executing. This type of failure affects only a few tables or processes. The failure can be caused by logical errors in the code or by system errors such as deadlock or unavailability of the system resources needed to execute the transaction.
2. System Crash: This can be caused by hardware or software failure or by external factors such as power failure. In most cases, data in secondary memory is not affected by such a crash, because the database has many integrity checkpoints to prevent data loss from secondary memory.
3. Disk Failure: These are problems with hard disks, such as the formation of bad sectors, a disk head crash or unavailability of the disk.
2.4 DISTRIBUTED CONCURRENCY CONTROL AND RECOVERY
Distributed concurrency control and recovery techniques must deal with the problems described in Section 2.2.7 (multiple copies of data items, failure of individual sites and communication links, distributed commit and distributed deadlock) and with other problems. Some of the techniques used to deal with recovery and concurrency control in DDBMSs are given below.
1. Distributed Concurrency Control Based on a Distinguished Copy of a Data Item
To deal with replicated data items in a distributed database, a number of concurrency control methods have been proposed that extend the concurrency control techniques for centralized databases.
The idea is to designate a particular copy of each data item as a distinguished copy. The locks for this data item are associated with the distinguished copy, and all locking and unlocking requests are sent to the site that contains that copy.
A number of different methods are based on this idea, but they differ in their method of choosing the distinguished copies. In the primary site technique, all distinguished copies are kept at the same site. A modification of this approach is the primary site with a backup site. Another approach is the primary copy method, where the distinguished copies of the various data items can be stored at different sites. A site that includes a distinguished copy of a data item basically acts as the coordinator site for concurrency control on that item.
Different methods followed in distributed concurrency control:
A. Primary Site Technique
B. Primary Site with Backup Site
C. Primary Copy Technique
D. Choosing a New Coordinator Site in Case of Failure
> A. Primary Site Technique
In this method, a single primary site is designated to be the coordinator site for all database items; all locks are kept at that site, and all requests for locking or unlocking are sent there.
This method is an extension of the centralized locking approach.
For example, if all transactions follow the two-phase locking protocol, serializability is guaranteed. The advantage of this approach is that it is a simple extension of the centralized approach and thus is not overly complex. However, it has certain inherent disadvantages.
One is that all locking requests are sent to a single site, possibly overloading that site and causing a system bottleneck. A second disadvantage is that failure of the primary site paralyzes the system, since all locking information is kept at that site. This can limit system reliability and availability.
Although all locks are accessed at the primary site, the items themselves can be accessed at any site at which they reside. For example, once a transaction obtains a Read_lock on a data item from the primary site, it can access any copy of that data item. However, once a transaction obtains a Write_lock and updates a data item, the DDBMS is responsible for updating all copies of the data item before releasing the lock.
> B. Primary Site with Backup Site
This method overcomes the second disadvantage of the primary site method by designating a second site to be a backup site. All locking information is maintained at both the primary and the backup sites.
In case of primary site failure, the backup site takes over as the primary site, and a new backup site is chosen. This simplifies the process of recovery from failure of the primary site, since the backup site takes over and processing can resume after a new backup site is chosen and the lock status information is copied to that site.
It slows down the process of acquiring locks, however, because all lock requests and granting of locks must be recorded at both the primary and the backup sites before a response is sent to the requesting transaction.
The problem of the primary and backup sites becoming overloaded with requests and slowing down the system remains undiminished.
> C. Primary Copy Technique
This method distributes the load of lock coordination among various sites by having the distinguished copies of different data items stored at different sites.
Failure of one site affects any transactions that are accessing locks on items whose primary copies reside at that site, but other transactions are not affected.
This method can also use backup sites to enhance reliability and availability.
> D. Choosing a New Coordinator Site in Case of Failure
Whenever a coordinator site fails in any of the preceding techniques, the sites that are still running must choose a new coordinator (to continue with the work that was stopped).
In the case of the primary site approach with no backup site, all executing transactions must be aborted and restarted, and as part of the recovery process a new primary site must be selected, with a lock manager process created there and a record of all lock information kept at that site.
For methods that use backup sites, transaction processing is suspended while the backup site is designated as the new primary site and a new backup site is chosen and sent copies of all the locking information from the new primary site.
If a backup site X is about to become the new primary site, X can choose the new backup site from among the system's running sites. However, if no backup site existed, or if both the primary and the backup sites are down, a process called election can be used to choose the new coordinator site.
In this process, any site Y that repeatedly attempts to communicate with the coordinator site and fails to do so can assume that the coordinator is down and can start the election process by sending a message to all running sites proposing that Y become the new coordinator. As soon as Y receives a majority of yes votes, Y can declare that it is the new coordinator. The election algorithm itself is quite complex, but this is the main idea behind the election method. The algorithm also resolves any attempt by two or more sites to become coordinator at the same time.
2. Distributed Concurrency Control Based on Voting
The concurrency control methods for replicated items discussed earlier all use a distinguished copy that maintains the locks for that item.
In the voting method, there is no distinguished copy; instead, a lock request is sent to all sites that include a copy of the data item.
Each copy maintains its own lock and can grant or deny the request for it. If a transaction that requests a lock is granted that lock by a majority of the copies, it holds the lock and informs all copies that it has been granted the lock. If a transaction does not receive a majority of votes granting it a lock within a certain time-out period, it cancels its request and informs all sites of the cancellation.
The voting method is considered a truly distributed concurrency control method, since the responsibility for a decision resides with all the sites involved. The voting method has higher message traffic among sites than the distinguished copy methods, and it becomes extremely complex if site failures during the voting are taken into account.
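The majority test at the heart of the voting method can be sketched in Python. This is a hedged illustration: `request_lock_by_voting` is a hypothetical helper, votes are assumed to be collected already, and copies that did not answer before the time-out are simply absent from `votes` (so they count against the request, because the majority is taken over all copies).

```python
from __future__ import annotations


def request_lock_by_voting(votes: dict[str, bool], total_copies: int) -> str:
    """Decide a lock request under the voting method (hypothetical helper).

    votes maps each answering copy's site to True (grant) or False (deny);
    copies that did not reply before the time-out are absent from votes.
    """
    yes_votes = sum(1 for granted in votes.values() if granted)
    if yes_votes > total_copies // 2:
        return "granted"     # holder then informs all copies it has the lock
    return "cancelled"       # no majority: cancel and inform all sites
```

For example, with three copies, two yes votes grant the lock, whereas with four copies two yes votes are not a majority and the request is cancelled.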
3. Distributed Recovery
In some cases it is quite difficult even to determine whether a site is down without exchanging numerous messages with other sites. For example, suppose that site X sends a message to site Y and expects a response from Y but does not receive it. There are several possible explanations:
(a) The message was not delivered to Y because of communication failure.
(b) Site Y is down and could not respond.
(c) Site Y is running and sent a response, but the response was not delivered.
Without additional information or the sending of additional messages, it is difficult to determine what actually happened.
Another problem with distributed recovery is distributed commit. When a transaction is updating data at several sites, it cannot commit until it is sure that the effect of the transaction on every site cannot be lost. This means that every site must first have recorded the local effects of the transaction permanently in the local site log on disk.
The two-phase commit protocol is often used to ensure the correctness of distributed commit.
2.5.1 Distributed Two-phase Commit (2PC)
MU - May 14
Assume that there is a set of grocery stores where the head of all the stores wants to query the available rice inventory at the connected stores, in order to move inventory from store to store and balance the quantity of rice inventory across all stores.
• The task is performed by a single transaction T whose component Ti runs at the i-th store, and store S0 corresponds to T0, where the manager is located. The following sequence of activities is performed by T:
a) Component T0 of transaction T is created at the head site (head office).
b) T0 sends messages to all the stores ordering them to create components Ti.
c) Every Ti executes a query at store i to discover the quantity of available rice inventory and reports this number to T0.
d) Each store receives instructions, updates its inventory level and makes shipments to other stores where required.
But there are some problems that we can face during the execution of the above process:
1) The atomicity property of the transaction may be violated, because a store Si may be instructed twice to send inventory, which may leave the database in an inconsistent state.
To ensure atomicity, transaction T must either commit at all the sites or abort at all the sites.
2) The system at store i may crash, or the instructions from T0 may never be received by Ti because of a network issue or some other reason.
• The distributed two-phase commit protocol solves the above problems faced during the execution of a distributed transaction.
• There are two phases:
A. Phase 1: Prepare Phase    B. Phase 2: Commit/Abort Phase
Fig. 2.5.1 : Distributed two phase commit Protocol (transaction coordinator and participant; prepare phase: Prepare (vote request); commit phase: Decision; Begin to End)
> A. Phase 1: Prepare Phase

• After each participant has locally completed its transaction, it sends a “DONE” message to the controlling site. When the controlling site has received the “DONE” message from all participants, it sends a “Prepare” message to the participants.
• The participants vote on whether they still want to commit. If a participant wants to commit, it sends a “Ready” message.
• A participant that does not want to commit sends a “Not Ready” message. This may happen when the participant has conflicting concurrent transactions or when there is a timeout.
> B. Phase 2: Commit/Abort Phase
• After the controlling site has received the “Ready” message from all the participants:
o The controlling site sends a “Global Commit” message to the participants.
o The participants apply the transaction and send a “Commit ACK” message to the controlling site.
o When the controlling site receives the “Commit ACK” message from all the participants, it considers the transaction committed.
• After the controlling site has received the first “Not Ready” message from any participant:
o The controlling site sends a “Global Abort” message to the participants.
o The participants abort the transaction and send an “Abort ACK” message to the controlling site.
o When the controlling site receives the “Abort ACK” message from all the participants, it considers the transaction aborted.
For better understanding, consider the following scenario of the 2PC protocol.
Fig. 2.5.2 : Scenario of 2PC (Phase ONE and Phase TWO between the transaction coordinator and a participant)
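The coordinator's logic in the two phases can be sketched in Python. This is a simplified, hypothetical model (`coordinator_decision` and `run_2pc` are illustrative names): votes are assumed to have been collected already, a missing vote (`None`) models a time-out and counts as “Not Ready”, and logging, retransmission and coordinator recovery, which a real implementation needs, are left out.

```python
from __future__ import annotations


def coordinator_decision(votes: dict[str, str | None]) -> str:
    """Phase 1: commit only if every participant answered 'Ready'.
    None models a participant that timed out and is treated as 'Not Ready'."""
    if all(vote == "Ready" for vote in votes.values()):
        return "Global Commit"
    return "Global Abort"


def run_2pc(votes: dict[str, str | None]) -> tuple[str, dict[str, str]]:
    """Phase 2: send the decision to every participant and collect ACKs."""
    decision = coordinator_decision(votes)
    ack = "Commit ACK" if decision == "Global Commit" else "Abort ACK"
    # Every participant applies (or aborts) the transaction and acknowledges.
    acks = {site: ack for site in votes}
    return decision, acks


if __name__ == "__main__":
    print(run_2pc({"S1": "Ready", "S2": "Ready"}))
    print(run_2pc({"S1": "Ready", "S2": "Not Ready"}))
```

The design point the sketch makes is the asymmetry of the decision: a single “Not Ready” (or silence) is enough to abort, while commit requires unanimity.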
2.5.2 Distributed Three-Phase Commit (3PC)
The steps in distributed three-phase commit are as follows :

1. Voting phase (vote collection phase)
2. Dissemination Phase (pre-commit phase)
3. Decision Phase
Fig. 2.5.3 : Distributed three phase commit protocol (coordinator and participants; voting phase: request to prepare and vote; dissemination phase: prepare to commit and acknowledgement; decision phase: commit or rollback; Begin to End)
• 3PC is an extension of 2PC in which the commit phase is divided into two parts, by adding a prepare-to-commit phase, to improve fault tolerance.
The working of the protocol is as given below:
> 1. Voting phase (vote collection phase)
• In the first phase, the coordinator sends a sub-transaction to all participants, and each participant replies to the coordinator with yes or no to the commit.
• If all participants respond yes, the coordinator sends the participants a pre-commit message. If any participant responds no, the coordinator sends an ABORT message. This phase ensures that the coordinator asks participants to proceed with a commit only if there are no failures.
> 2. Dissemination Phase (pre-commit phase)
• In the second phase, which acts as the prepare-to-commit stage, the coordinator sends a prepare message to the participants from the first phase.
• In this phase, the coordinator essentially asks the participants whether they are prepared to commit; if they are not, the commit is aborted.
> 3. Decision Phase
• If the coordinator succeeds in the second phase, it moves on to the decision phase. Once the coordinator receives a yes from all participants stating that they are prepared to commit, it sends out a commit message. The participants then commit the specified transaction.
• If the coordinator receives a negative message while in a voting state, times out or fails, it automatically aborts the transaction. In this case, the coordinator sends an abort message to all participants, and all participants abort the transaction.
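The three phases, as described above, can be traced with a small Python sketch of the coordinator's side. It is a hedged simplification that follows this text's description rather than a complete 3PC specification: `three_phase_commit` is a hypothetical function, votes and acknowledgements are assumed to be collected already, a missing vote or acknowledgement stands for a time-out, and participant-side recovery and coordinator failure handling are omitted.

```python
from __future__ import annotations


def three_phase_commit(votes: dict[str, str | None],
                       acks: dict[str, bool]) -> list[str]:
    """Return the messages the coordinator sends, in order.

    votes: site -> "yes", "no" or None (no reply before the time-out).
    acks:  site -> True if the site acknowledged the prepare-to-commit message.
    """
    sent = ["vote-request"]
    # 1. Voting phase: any "no" or missing vote aborts the transaction.
    if any(vote != "yes" for vote in votes.values()):
        return sent + ["abort"]
    # 2. Dissemination (pre-commit) phase: every participant must confirm.
    sent.append("prepare-to-commit")
    if not all(acks.get(site, False) for site in votes):
        return sent + ["abort"]
    # 3. Decision phase: all participants are prepared, so commit.
    return sent + ["commit"]
```

Compared with the 2PC sketch's single decision step, the extra prepare-to-commit round means that no participant commits until all of them are known to be prepared.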
Q.1 Explain the steps in query processing in a distributed database.
Q.2 Explain concurrency control in a distributed database.
Q.3 Explain the types of failures in a distributed database.
Q.4 Explain the 2PC protocol.
Q.5 Explain the 3PC protocol.
Q.6 What are the types of transactions?
Q.7 What are the recovery techniques in a distributed database?
Q.8 How is a query processed in a distributed database?
Q.9 Explain the basic timestamp ordering protocol.
Q.10 How is a query processed in a distributed database? Explain with an example.
Q.11 Explain how replication and fragmentation are used in query processing.
2.7 MULTIPLE CHOICE QUESTIONS

Q. 2.1 Which of the following is NOT a step of the query decomposition layer in distributed query processing?
(a) Normalized query is analyzed semantically
(b) Semantically correct query is simplified
(c) Simplified calculus query is restructured as an algebraic query
(d) The algebraic query is executed by the local sites
Ans.: (d)

Q. 2.2 Assume that in the 2PC protocol a transaction coordinator fails after a decision (to abort/commit) has been taken and shared with the participating sites. What should the coordinator do during restart (recovery)?
(a) Abort the transaction in any case
(b) Commit the transaction in any case
(c) Commit/abort only if it has received all acknowledgements from the participating sites
(d) The coordinator cannot decide what is to be done
Ans.: (c)

Q. 2.3 Which of the following is NOT an advantage of data replication in a distributed database?
(a) Fast access to shared data
(b) High availability
(c) Reduced network traffic
(d) Easy updating of data items
Ans.: (d)

Q. 2.4 ______ helps solve the concurrency problem.
(a) Locking
(b) Transaction monitor
(c) Transaction serializability
(d) Two phase commit
Ans.: (a)

Q. 2.5 Which of the following is not a property of transactions?
(a) Atomicity
(b) Concurrency
(c) Isolation
(d) Durability
Ans.: (b)

Q. 2.6 ______ means that a transaction must execute exactly once, completely or not at all.
(a) Durability
(b) Consistency
(c) Atomicity
(d) Isolation
Ans.: (c)

Q. 2.7 Assume transaction A holds a shared lock on R. If transaction B also requests a shared lock on R,
(a) It will result in a deadlock situation
(b) It will immediately be rejected
(c) It will immediately be granted
(d) It will be granted as soon as it is released by A
Ans.: (c)

Q. 2.8 In a two-phase locking protocol, a transaction releases locks in the ______.
(a) Shrinking phase
(b) Growing phase
(c) Running phase
(d) Initial phase
Ans.: (a)

Q. 2.9 In two phase commit, ______ coordinates the synchronization of the commit or rollback operations.
(a) Database manager
(b) Central coordinator
(c) Participants
(d) Concurrency control manager
Ans.: (b)

Q. 2.10 A transaction that wants to edit a data item must lock it in ______.
(a) Exclusive Mode
(b) Shared Mode
(c) Inclusive Mode
(d) Unshared Mode
Ans.: (a)

Q. 2.11 If distributed transactions are well-formed and two-phase locked, then ______ is the correct locking mechanism in distributed transactions as well as in centralized databases.
(a) Two phase locking
(b) Three phase locking
(c) Transaction locking
(d) Well-formed locking
Ans.: (a)
Chapter Ends...
CHAPTER 3
Data Interoperability - XML and JSON
XML Databases: Document Type Definition, XML Schema, Querying and Transformation: XPath and XQuery.
Basic JSON syntax (JavaScript Object Notation), JSON data types, Stringifying and parsing the JSON for sending & receiving, JSON Object retrieval using key-value pair and JQuery, XML Vs JSON.
3.1 XML Databases
GQ. Explain XML-based databases. OR What is DTD? Explain with an example.
3.1.1 Building Blocks of XML File with respect to DTD
GQ. Explain XML Schema in detail.
3.1.2 Querying and Transformation: XPATH and XQuery
GQ. Explain the XPATH and XQuery.
GQ. Explain data retrieval from XML using XQuery.
GQ. What is XPATH and what are its uses?
3.2 Basic JSON (JavaScript Object Notation) Syntax
3.2.1 JSON Data Types
3.2.2 What is a JSON Object?
3.2.3 JSON Arrays
3.2.4 Parsing JSON Data in JavaScript
GQ. Explain the parsing and stringifying functions with respect to JSON retrieval.
3.2.5 Stringifying and Parsing the JSON for Sending and Receiving
3.2.6 Applications of JSON
3.3 XML Vs JSON
GQ. Differentiate between XML and JSON.
3.4 Multiple Choice Questions
Chapter Ends...
3.1 XML DATABASES
GQ. Explain XML-based databases.
XML is a software- and hardware-independent technique for storing and transporting data between applications.
XML uses self-descriptive tags. It looks similar to HTML, but the two differ in nature: XML has user-defined tags, whereas HTML has predefined tags. HTML is mainly used to publish web content, while XML is mainly a container of data used to transmit data between applications.
An XML database holds a large amount of information in XML format. As information exchange changes drastically day by day, XML has become a key player in data storage and easy data transportation. The rapid growth in Internet usage and technology transfer created the need to store large amounts of XML data and retrieve the required data easily.
XML databases came into focus because of these requirements, and they are basically classified into two types:
• XML Enabled Database
• Native XML Database
XML Enabled Database
An XML enabled database is a database that offers an extension for converting XML documents. It is a relational database, which means that data is organised into tables with rows and columns. The tables are made up of records, and the records are made up of fields.
Native XML Database
Native XML databases use containers rather than the table format. They can hold a large number of XML documents and data. A native XML database is queried with XPath expressions.
Document Type Definition (DTD)
DTD stands for Document Type Definition. It acts as a registry of XML tags and defines the structure and the elements used in preparing an XML document.
The tags used in an XML document must first be defined in the DTD file; this is required for a valid XML document. XML tags and elements are valid only when they are defined in this DTD file.
When the XML document opens in a web browser and its document tree is displayed, the document is well-formed. Browsers generally do not check the document against its DTD, so validity is confirmed with a validating XML parser.
(MU-New Syllabus w.e.f academic year 21-22)(M5-68) Tech-Neo Publications...A SACHIN SHAH Venture
Valid XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE email SYSTEM "Email.dtd">
<email>
  <receiver>Amit</receiver>
  <sender>Pramod</sender>
  <heading>Welcome to ADBMS</heading>
  <body>Hi, let's study ADBMS concepts.</body>
</email>
In the above example, the DOCTYPE declaration specifies the reference to the DTD file (Email.dtd). The name after DOCTYPE must match the root element, email.

DTD (Email.dtd) for the above XML document
<!ELEMENT email (receiver,sender,heading,body)>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
From the above DTD example, note the following:
• <!ELEMENT email ...> declares that the root element email must contain the elements receiver, sender, heading and body, in that order.
• #PCDATA stands for Parsed Character Data: text that will be parsed by the parser.
• CDATA (Character Data) is text that will not be parsed by the parser.
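A small JavaScript check can illustrate what the sequence content model (receiver,sender,heading,body) requires. This is a sketch only, not a real DTD validator. It assumes an XML parser has already extracted the child element names of <email>, and `matchesSequence` is a hypothetical helper:

```javascript
// Content model taken from Email.dtd: <!ELEMENT email (receiver,sender,heading,body)>
const emailModel = ["receiver", "sender", "heading", "body"];

// A comma-separated content model means: exactly these children,
// each once, in exactly this order.
function matchesSequence(model, childNames) {
  return (
    childNames.length === model.length &&
    model.every((name, i) => childNames[i] === name)
  );
}

console.log(matchesSequence(emailModel, ["receiver", "sender", "heading", "body"])); // true
console.log(matchesSequence(emailModel, ["sender", "receiver", "heading", "body"])); // false
```

An extra child element such as PhoneNo also fails this check, so the DTD has to be updated whenever the document structure changes.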
3.1.1 Building Blocks of XML File with respect to DTD
1. Elements 2. Attributes 3. XML Schema
1. Elements
• XML elements can be defined as the building blocks of an XML document.
• Elements can act as containers that hold text, other elements, attributes, media objects or a mix of all of these.
• Each XML document contains one or more elements, the boundaries of which are delimited either by start-tags and end-tags or, for empty elements, by an empty-element tag.
2. Attributes
• Attributes are part of XML elements.
• An element can have any number of unique attributes.
• Attributes give more information about an XML element; more precisely, an attribute defines a property of the element.
• An XML attribute is always a name-value pair.
• To check an XML file, open the .xml file in any recent web browser. If all tags are written and used properly, the browser generates and displays the XML tree (a well-formedness check rather than full validation).
3. XML Schema
XML Schema is an XML-based alternative to the DTD file. XML Schema is also known as XML Schema Definition (XSD). It is used to describe and validate the structure and content of XML data. An XML Schema defines the elements, attributes and data types, and the schema element supports namespaces. It is similar to a database schema, which describes how the data in a database is organized.
• The goal of an XML Schema is to describe the legal building blocks of an XML document:
1. The elements and attributes that can appear in a document
2. The number (and order) of child elements
3. Data types for elements and attributes
4. Default and fixed values for elements and attributes
• In short, an XML document is written to reference either a DTD or an XML Schema.
• Let's see an example.
<?xml version="1.0" encoding="UTF-8"?>
<email>
  <receiver>Amit</receiver>
  <sender>Pramod</sender>
  <heading>Welcome to ADBMS</heading>
  <body>Hi, let's study ADBMS concepts.</body>
  <PhoneNo>23456789</PhoneNo>
</email>
• For the above XML file, let us see how to write an XML Schema (XSD).
XML Schema Document
<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name = "email">
    <xs:complexType>
      <xs:sequence>
        <xs:element name = "receiver" type = "xs:string" />
        <xs:element name = "sender" type = "xs:string" />
        <xs:element name = "heading" type = "xs:string" />
        <xs:element name = "body" type = "xs:string" />
        <xs:element name = "PhoneNo" type = "xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
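The main gain over the DTD is the type="xs:int" declaration on PhoneNo. The following JavaScript sketch shows what it adds. It hard-codes the schema above and assumes the children of <email> arrive as [name, text] pairs from some XML parser. The helpers `isXsInt` and `validateEmail` are hypothetical; real documents should be checked with a schema-aware validator:

```javascript
// Simplified copy of the email schema: element name and declared type.
const emailSchema = [
  ["receiver", "xs:string"],
  ["sender", "xs:string"],
  ["heading", "xs:string"],
  ["body", "xs:string"],
  ["PhoneNo", "xs:int"],
];

// xs:int: an optional sign followed by digits, within the 32-bit signed range.
function isXsInt(text) {
  const s = text.trim(); // surrounding whitespace is ignored for xs:int
  if (!/^[+-]?\d+$/.test(s)) return false;
  const n = Number(s);
  return n >= -2147483648 && n <= 2147483647;
}

// children: array of [elementName, textContent] pairs in document order.
function validateEmail(children) {
  if (children.length !== emailSchema.length) return false;
  return emailSchema.every(([name, type], i) => {
    const [childName, text] = children[i];
    if (childName !== name) return false; // xs:sequence fixes the order
    return type === "xs:int" ? isXsInt(text) : true; // any text passes as xs:string here
  });
}

const doc = [
  ["receiver", "Amit"],
  ["sender", "Pramod"],
  ["heading", "Welcome to ADBMS"],
  ["body", "Hi, let's study ADBMS concepts."],
  ["PhoneNo", "23456789"],
];
console.log(validateEmail(doc)); // true
```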
3.1.2 Querying and Transformation: XPATH and XQuery
GQ. Explain the XPATH and XQuery.
GQ. Explain data retrieval from XML using XQuery.
• In recent years, the number of applications that use XML to exchange, mediate and store information has been increasing day by day, so querying tools for effective data management are becoming very important.
• Tools for querying and transforming XML data are especially important for extracting information from enormous amounts of XML data and for converting data across different XML schemas. Just as the output of a relational query is a relation, the output of an XML query can be an XML document, so querying and transformation can be merged into a single tool.
A. XPATH
• XPath is a query language used to navigate through an XML document. It is typically used to find elements or attributes that match specific patterns. It is an official recommendation of the W3C (World Wide Web Consortium).
• It is used to explore an XML document's elements and attributes. XPath includes a number of expressions that can be used to extract information from an XML document.
Basic components of XPATH
1. Definitions of Structure: XPath defines element, attribute, text, namespace, processing-instruction, comment and document nodes.
2. Path Expressions: XPath provides powerful path expressions that select nodes or lists of nodes in XML documents.
3. Standard Functions: XPath provides basic functions for string values, numeric values, date and time comparison, node and sequence manipulation, Boolean values and more.
4. XSLT: XPath is one of the major elements of the XSLT standard and is used to transform XML documents into various other types of documents.
• An XPath expression generally defines a pattern in order to select a set of nodes. These patterns are used by XSLT to perform transformations or by XPointer for addressing purposes.
• The XPath specification specifies seven types of nodes that can be the output of executing an XPath expression:
o Root
o Element
o Text
o Attribute
o Comment
o Processing Instruction
o Namespace
• XPath uses a path expression to select a node or a list of nodes from an XML document.
Useful paths and expressions for selecting a node or a list of nodes
1. nodename: Selects all nodes with the name "nodename".
2. /: Starts the selection from the root node.
3. //: Selects matching nodes anywhere below the current node, no matter where they are.
4. . : Selects the current node.
5. .. : Selects the parent of the current node.
6. @ : Selects the attributes of the current node.
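The path expressions above can be tried on a tiny in-memory tree. The sketch below is a hypothetical, heavily simplified evaluator written only for illustration. It supports absolute paths made of /name and //name steps plus an optional final /@attr step. It also uses a few fields from the students_info.xml example in this section, built by hand as JavaScript objects instead of being parsed from XML:

```javascript
// Hypothetical node model for this sketch: { name, attrs, children, text }.
function el(name, attrs = {}, children = [], text = "") {
  return { name, attrs, children, text };
}

// A few fields of students_info.xml, built by hand instead of parsed.
const doc = el("#document", {}, [
  el("class", {}, [
    el("student", { rollno: "101" }, [el("firstname", {}, [], "Rakesh"), el("marks", {}, [], "95")]),
    el("student", { rollno: "102" }, [el("firstname", {}, [], "Yogini"), el("marks", {}, [], "65")]),
    el("student", { rollno: "103" }, [el("firstname", {}, [], "Rushi"), el("marks", {}, [], "90")]),
  ]),
]);

// Every node below `node`, in document order (used for the // step).
function descendants(node) {
  return node.children.flatMap((child) => [child, ...descendants(child)]);
}

// Supports paths such as /class/student, //firstname and /class/student/@rollno.
function select(root, path) {
  const steps = path.match(/\/\/?[^/]+/g); // "/a//b/@c" -> ["/a", "//b", "/@c"]
  let current = [root];
  for (const step of steps) {
    const deep = step.startsWith("//");
    const name = step.slice(deep ? 2 : 1);
    if (name.startsWith("@")) {
      // Attribute step: return the attribute values of the current nodes.
      return current.map((n) => n.attrs[name.slice(1)]).filter((v) => v !== undefined);
    }
    const candidates = current.flatMap((n) => (deep ? descendants(n) : n.children));
    current = candidates.filter((n) => n.name === name);
  }
  return current;
}

console.log(select(doc, "/class/student/@rollno")); // [ '101', '102', '103' ]
console.log(select(doc, "//firstname").map((n) => n.text)); // [ 'Rakesh', 'Yogini', 'Rushi' ]
```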
• Below is an example with a sample XML document, students_info.xml, and its style sheet document, students_design.xsl, which uses XPath expressions in the select attribute of various XSL tags to get the values of roll no, first name, last name, nickname and marks of each student node.
students_info.xml
<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "students_design.xsl"?>
<class>
  <student rollno = "101">
    <firstname>Rakesh</firstname>
    <lastname>Sharma</lastname>
    <nickname>Rakesh</nickname>
    <marks>95</marks>
  </student>
  <student rollno = "102">
    <firstname>Yogini</firstname>
    <lastname>Verma</lastname>
    <nickname>Yogini</nickname>
    <marks>65</marks>
  </student>
  <student rollno = "103">
    <firstname>Rushi</firstname>
    <lastname>Sing</lastname>
    <nickname>Rushi</nickname>
    <marks>90</marks>
  </student>
</class>
students_design.xsl
<?xml version = "1.0" encoding = "UTF-8"?>
<xsl:stylesheet version = "1.0"
  xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
  <xsl:template match = "/">
    <html>
      <body>
        <h2>Student information</h2>
        <table border = "1">
          <tr bgcolor = "green">
            <th>Roll No</th>
            <th>First Name</th>
            <th>Last Name</th>
            <th>Nick Name</th>
            <th>Marks</th>
          </tr>
          <xsl:for-each select = "class/student">
            <tr>
              <td><xsl:value-of select = "@rollno"/></td>
              <td><xsl:value-of select = "firstname"/></td>
              <td><xsl:value-of select = "lastname"/></td>
              <td><xsl:value-of select = "nickname"/></td>
              <td><xsl:value-of select = "marks"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
In the first file, students_info.xml, we have added three student records in XML format. The root element is <class></class>, and the student elements are listed within it. In the second file, students_design.xsl, we fetch the data from the XML file using XPath expressions and add look and feel to it using the XSL style sheet.
In students_design.xsl, the @ symbol is used to fetch each student's roll number (the rollno attribute), and XSL styling is used to format the output data.
• The data fetched from the XML file using XPath expressions and formatted with the XSL style sheet looks as follows when students_info.xml is opened in a browser:
[Figure: Browser output of students_info.xml, a "Student information" table with columns Roll No, First Name, Last Name, Nick Name and Marks, listing students 101 (Rakesh Sharma), 102 (Yogini Verma) and 103 (Rushi Sing)]
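The work done by xsl:for-each and xsl:value-of can be mirrored in plain JavaScript. This sketch only illustrates the idea and is not how an XSLT processor works internally. It assumes the three <student> records have already been read into JavaScript objects, and `studentRows` and `escapeHtml` are hypothetical helpers:

```javascript
// Assumed, already-parsed copy of the <student> records in students_info.xml.
const students = [
  { rollno: "101", firstname: "Rakesh", lastname: "Sharma", nickname: "Rakesh", marks: "95" },
  { rollno: "102", firstname: "Yogini", lastname: "Verma", nickname: "Yogini", marks: "65" },
  { rollno: "103", firstname: "Rushi", lastname: "Sing", nickname: "Rushi", marks: "90" },
];
const fields = ["rollno", "firstname", "lastname", "nickname", "marks"];

// xsl:value-of outputs text, so markup characters must be escaped in the HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// One <tr> per student (like xsl:for-each select="class/student"),
// one <td> per field (like each xsl:value-of select="...").
function studentRows(records) {
  return records.map(
    (s) => "<tr>" + fields.map((f) => `<td>${escapeHtml(s[f])}</td>`).join("") + "</tr>"
  );
}

const table = ['<table border="1">', ...studentRows(students), "</table>"].join("\n");
console.log(table);
```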
B. XQuery
• XQuery is to XML what SQL is to databases. Just as SQL is designed to query a database as per the requirements, XQuery does the same for XML.
• XQuery is a functional query language that can be used to get data from XML files. It was created with the intention of querying XML data.
• Use XQuery to take data from multiple databases, from XML files, from remote web documents, even from CGI scripts, and to produce XML results that you can process with XSLT.
• Both hierarchical and tabular data can be obtained with XQuery, and tree and graph structures can be queried with it. XQuery can be used to query web pages directly, to create web pages directly and to transform XML files.
• For example, suppose we have a book database in the Books.xml file and we need to find the books whose price is above 40. We can write the XQuery in a file with the extension .xqy.
Books.xml
<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book category="JAVA">
    <title lang="english">Java Black book</title>
    <author>Robert</author>
  </book>
  <book category="XML">
    <title lang="english">Complete XML</title>
    <author>Robert</author>
    <author>Peter</author>
    <year>2013</year>
    <price>50.00</price>
  </book>
  <book category="XML">
    <title lang="english">Learn XPath in 24 hours</title>
    <author>Jay Ban</author>
    <year>2010</year>
    <price>16.50</price>
  </book>
</books>
Book.xqy
for $x in doc("Books.xml")/books/book
where $x/price > 40
return $x/title
• The above example shows the retrieval of XML data using XQuery; notice that it works the same way SQL works for databases.
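The three clauses of this FLWOR expression map naturally onto array operations. The JavaScript below is an illustrative sketch, not an XQuery engine. It assumes Books.xml has already been read into plain objects, and the function name `titlesPricedAbove` is hypothetical:

```javascript
// Assumed, already-parsed copy of Books.xml. The first book has no <price>
// element in the listing above, so it is stored without a price.
const books = [
  { category: "JAVA", title: "Java Black book", authors: ["Robert"] },
  { category: "XML", title: "Complete XML", authors: ["Robert", "Peter"], year: 2013, price: 50.0 },
  { category: "XML", title: "Learn XPath in 24 hours", authors: ["Jay Ban"], year: 2010, price: 16.5 },
];

// for $x in doc("Books.xml")/books/book   -> iterate over books
// where $x/price > 40                     -> filter
// return $x/title                         -> map to the title
function titlesPricedAbove(list, limit) {
  return list
    // In XQuery, comparing a missing <price> yields false, so such books are skipped.
    .filter((book) => book.price !== undefined && book.price > limit)
    .map((book) => book.title);
}

console.log(titlesPricedAbove(books, 40)); // [ 'Complete XML' ]
```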
• Let's see one more example, which retrieves XML data elements from the Product.xml file using XQuery.
Product.xml
<Prod_catalog>
  <product dept="WMN">
    <number>557</number>
    <name language="en">Fleece Pullover</name>
    <colorChoices>navy black</colorChoices>
  </product>
  <product dept="ACC">
    <name language="en">Floppy Sun Hat</name>
  </product>
  <product dept="ACC">
    <number>443</number>
    <name language="en">Deluxe Travel Bag</name>
  </product>
  <product dept="MEN">
    <name language="en">Cotton Dress Shirt</name>
    <colorChoices>white gray</colorChoices>
    <desc>Our <i>favorite</i> shirt</desc>
  </product>
</Prod_catalog>
Let's write an XQuery to retrieve the elements from the XML file.
Query
for $prod in doc("Product.xml")/Prod_catalog/product
where $prod/@dept = "ACC"
order by $prod/name
return $prod/name
Results
<name language="en">Deluxe Travel Bag</name>
<name language="en">Floppy Sun Hat</name>
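The where and order by clauses of this query can be sketched the same way in JavaScript. The sketch assumes Product.xml has already been read into plain objects that keep only the fields the query uses. `namesInDept` is a hypothetical name, and the comparator uses plain JavaScript string order:

```javascript
// Assumed, already-parsed copy of Product.xml (only the fields the query uses).
const products = [
  { dept: "WMN", number: 557, name: "Fleece Pullover" },
  { dept: "ACC", name: "Floppy Sun Hat" },
  { dept: "ACC", number: 443, name: "Deluxe Travel Bag" },
  { dept: "MEN", name: "Cotton Dress Shirt" },
];

// where $prod/@dept = "ACC"   -> filter
// order by $prod/name          -> sort by name
// return $prod/name            -> map to the name
function namesInDept(list, dept) {
  return list
    .filter((p) => p.dept === dept)
    .map((p) => p.name)
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0)); // plain string order
}

console.log(namesInDept(products, "ACC")); // [ 'Deluxe Travel Bag', 'Floppy Sun Hat' ]
```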
3.2 BASIC JSON (JAVASCRIPT OBJECT NOTATION) SYNTAX
• JSON, or JavaScript Object Notation, is a lightweight, text-based open standard designed for human-readable data interchange.
• Douglas Crockford created the JSON format, which was documented in RFC 4627 (since superseded by RFC 8259).
• JSON is a data representation format in which data is represented as key-value pairs. A JSON file has the extension .json.
Why do we use JSON?
1. It is supported by all browsers.
2. It is easy to read and write.
3. It has a straightforward syntax.
4. It can be parsed natively in JavaScript with JSON.parse() (older code used the eval() function, which is unsafe for untrusted data).
5. It is easy to create and manipulate.
6. It is supported by all major JavaScript frameworks.
7. It is supported by most backend technologies.
8. JSON is recognized natively by JavaScript.
9. It allows you to transmit and serialize structured data over a network connection.
10. You can use it with modern programming languages.
11. JSON is text, so any JavaScript object can be converted into JSON and sent to the server.
Key features of JSON
• Easy to use: JSON is simple to write and easy to use because JSON APIs offer a high-level facade that simplifies commonly used use-cases. JSON is used in the development of JavaScript-based applications, such as browser extensions and web pages.
• Performance: JSON is compact and fast to process, which makes it especially suitable for large object graphs or systems. JSON is a standard for serializing and transmitting structured data over a network connection.
• Free tool: JSON libraries are open source and free to use. JSON is mostly used to send data from a server to web apps, and web services and APIs use the JSON format to provide public data.
• Doesn't require creating a mapping: The Jackson API provides a default mapping for many objects to be serialized. JSON can easily be used and embedded in modern programming languages.
• Clean JSON: It creates a clean, compatible JSON result that is easy to read.
• Dependency: A JSON library does not require any other library for processing.
A simple piece of JSON data is shown below, where mobile store data is represented in JSON format.
{
  "Mobile Store": [
    {
      "Prod_id": "01",
      "Model": "MI 10",
      "Version": "5th",
      "Prize": "25000"
    },
    {
      "Prod_id": "02",
      "Model": "Samsung",
      "Version": "2nd",
      "Prize": "30000"
    }
  ]
}
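Reading this data in JavaScript shows two practical details: a key containing a space needs bracket notation, and values stored as strings must be converted before arithmetic. This is a minimal sketch that embeds the JSON text above as a string:

```javascript
// The mobile store JSON text from above, as it might arrive from a server.
const text = `{
  "Mobile Store": [
    { "Prod_id": "01", "Model": "MI 10", "Version": "5th", "Prize": "25000" },
    { "Prod_id": "02", "Model": "Samsung", "Version": "2nd", "Prize": "30000" }
  ]
}`;

const store = JSON.parse(text);

// A key that contains a space must be read with bracket notation.
const items = store["Mobile Store"];
const models = items.map((item) => item.Model);

// "Prize" holds a string, so convert it to a number before doing arithmetic.
const total = items.reduce((sum, item) => sum + Number(item.Prize), 0);

console.log(models); // [ 'MI 10', 'Samsung' ]
console.log(total); // 55000
```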
3.2.1 JSON Data Types
1. String: A string is double-quoted Unicode text, with the backslash (\) as the escape character.
2. Number: Numbers use the double-precision floating-point format of JavaScript. A number can be an integer, a fraction or an exponent.
3. Array: An ordered sequence of values, enclosed in square brackets.
4. Boolean: true or false.
5. Object: An unordered collection of key: value pairs.
6. Value: A value can be a string, number, object, array, true, false or null.
7. Whitespace: Whitespace can be inserted between any pair of tokens.
8. Null: Used to specify empty data.
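After JSON.parse(), each JSON type becomes a JavaScript value. Because `typeof` reports "object" for both arrays and null, a small helper is needed to tell the types apart. This is a minimal sketch, and the helper name `jsonType` is hypothetical:

```javascript
// One value of each JSON type; the backslash escapes a double quote inside a string.
const text =
  '{"s": "say \\"hi\\"", "n": -2.5e3, "a": [1, "two"], "b": true, "o": {"k": "v"}, "z": null}';

// typeof cannot tell arrays or null apart from objects, so check them first.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean" or "object"
}

const parsed = JSON.parse(text);
for (const [key, value] of Object.entries(parsed)) {
  console.log(key, jsonType(value), JSON.stringify(value));
}
// s string "say \"hi\""
// n number -2500
// a array [1,"two"]
// b boolean true
// o object {"k":"v"}
// z null null
```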
3.2.2 What is a JSON Object?
• A JSON object is a set of keys along with their values, in no specific order.
• The keys and their values are grouped using opening and closing curly braces "{ }". For example, when we create a JSON that describes a car through its attributes, we are actually creating a JSON car object. There are certain rules to follow while creating a JSON structure; we will learn about them while discussing key-value pairs.
• So, to create a JSON, the first thing we need is an attribute. Here, we are creating an "Employee" JSON object. Next, we need to specify the properties of the object. Let's assume our employee has a "First Name", "Last Name", "Employee ID" and "Designation". These properties of the employee are represented as "keys" in the JSON structure.
Employee:
{
  "Employee_id": "1001",
  "Firstname": "Raghav",
  "Lastname": "Shastry",
  "Designation": "Manager"
}
Everything within the curly braces is known as the JSON Employee object.
• A basic JSON object is represented by key-value pairs. In the previous example, we used JSON to represent employee data, with different properties for the employee: "Employee ID", "First Name", "Last Name" and "Designation". Each of these "keys" has a value in the JSON. For example, "First Name" is represented by the value "Raghav". Similarly, the other keys are represented by different values.
Rules to be followed while creating a JSON
1. JSON objects should start and end with braces "{ }".
2. Key fields are enclosed in double quotes.
3. Values are separated from their keys by a colon ":".
4. JSON key-value pairs are separated by a comma ",".
5. Values can be of any data type, such as String, Number, Boolean, etc.
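These rules do not have to be applied by hand in JavaScript. JSON.stringify() produces text that follows them, and JSON.parse() rejects text that breaks them. The sketch below uses the Employee object, and `isValidJson` is a hypothetical helper:

```javascript
// The Employee object as a JavaScript object literal.
const employee = {
  Employee_id: "1001",
  Firstname: "Raghav",
  Lastname: "Shastry",
  Designation: "Manager",
};

// JSON.stringify applies the rules automatically: braces around the object,
// double-quoted keys, ":" between key and value, "," between pairs.
const json = JSON.stringify(employee);
console.log(json);
// {"Employee_id":"1001","Firstname":"Raghav","Lastname":"Shastry","Designation":"Manager"}

// Breaking a rule (here: an unquoted key) makes JSON.parse reject the text.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}
console.log(isValidJson('{"Employee_id": "1001"}')); // true
console.log(isValidJson('{Employee_id: "1001"}')); // false
```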
3.2.3 JSON Arrays
• Arrays in JSON are similar to the arrays in any programming language: an array in JSON is also an ordered collection of data. The array starts with a left square bracket "[" and ends with a right square bracket "]". The values inside the array are separated by commas. There are some basic rules to follow if you use an array in JSON.
• Let's look at a sample JSON with an array. We will use the same Employee object as before and add another property, "Technical_Skills". An employee can have expertise in multiple programming languages, so an array offers a better way to record multiple language-expertise values.
• For example:
Employee:
{
  "Employee_id": "1001",
  "Firstname": "Raghav",
  "Lastname": "Shastry",
  "Designation": "Manager",
  "Technical_Skills": ["Java", "C", "C++", "net"]
}
Rules for using Arrays in JSON
• An array in JSON starts with a left square bracket and ends with a right square bracket.
• Values inside the array are separated by commas.
Objects, key-value pairs and arrays are the different components of JSON. They can be used together to record any data in JSON.
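Once parsed, a JSON array becomes an ordinary JavaScript array, so its order, length and index access work as usual. This is a minimal sketch using the Employee example with the Technical_Skills array:

```javascript
const text =
  '{"Employee_id": "1001", "Firstname": "Raghav", "Lastname": "Shastry",' +
  ' "Designation": "Manager", "Technical_Skills": ["Java", "C", "C++", "net"]}';

const employee = JSON.parse(text);
const skills = employee.Technical_Skills; // a JSON array becomes a JavaScript array

console.log(Array.isArray(skills)); // true
console.log(skills.length); // 4
console.log(skills[0]); // Java (arrays keep their order, indexes start at 0)
console.log(skills.includes("C++")); // true
```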
3.2.4 Parsing JSON Data in JavaScript
• The JSON.parse() method in JavaScript makes it simple to parse JSON data received from a web server. This method parses a JSON string and constructs the JavaScript value or object described by it. A SyntaxError is thrown if the provided string is not valid JSON.
• Let's see an example. Suppose we have received the following JSON-encoded string:
{"name": "Amol", "age": 22, "country": "India"}
• Let's convert this JSON-encoded string into a JavaScript object:
var json = '{"name": "Amol", "age": 22, "country": "India"}';
// Converting the JSON-encoded string to a JS object
var obj = JSON.parse(json);
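Because JSON.parse() throws on malformed input, data received from a server is usually parsed inside try/catch. A minimal sketch follows. The wrapper name `safeParse` and its return shape are hypothetical, chosen only for illustration:

```javascript
// Wraps JSON.parse so that malformed input is reported instead of crashing.
function safeParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // JSON.parse throws a SyntaxError when the text is not valid JSON.
    return { ok: false, error: err.name };
  }
}

const good = safeParse('{"name": "Amol", "age": 22, "country": "India"}');
console.log(good.value.name); // Amol

// Single-quoted keys and strings are not allowed in JSON.
const bad = safeParse("{'name': 'Amol'}");
console.log(bad); // { ok: false, error: 'SyntaxError' }
```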
3.2.5 Stringifying and Parsing the JSON for Sending and Receiving
• The most common use of JSON is to exchange information between client and server. At the sending and receiving ends the data is used as JavaScript objects, but when it is sent over the network it must be in string form.
• We can create a single JSON object or an array of objects using JavaScript and use it as required, so conversion from string to object and from object to string is needed at the respective ends.
• Two built-in JavaScript functions are useful for this conversion:
1. JSON.parse()
2. JSON.stringify()
• Parsing: The data that we receive from a web server is always a string. We use JSON.parse() to parse the data and convert it to a JavaScript object. Before it is parsed, it is only a string (some text), and you cannot access the data embedded in it. After parsing, it becomes a JavaScript object, and you can access the data.
• Suppose we have received the following data from the web server:
'{"name":"Amey", "age":35, "city":"Pune"}'
• It can then be parsed so that it is converted into a JavaScript object:
var obj = JSON.parse('{"name":"Amey", "age":35, "city":"Pune"}');
• Stringify: A JavaScript object is converted to a JSON string using JSON.stringify(). The data that is sent to a web server must be a string, so JSON.stringify() is used to convert a JavaScript object to a string.
• For example, we have the following JavaScript object:
var obj = {name: "Amey", age: 35, city: "Pune"};
• We can convert the above object into a string using the JSON.stringify() function:
var myJSON = JSON.stringify(obj);
• The result of this conversion is a JSON string that is ready to send to the server.
• Similarly, we can stringify an array:
var Stud_array = ["Akash", "Seema", "Samyak", "Sakshi"];
• We can convert the above array into a JSON string:
var Student = JSON.stringify(Stud_array);
• In the above example, Student is now a string and can be sent to the server side.
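The full send/receive round trip can be sketched in one place, together with two JSON.stringify() details worth knowing. The optional third argument indents the output, and properties whose values are undefined or functions are dropped from objects. The sketch reuses the Amey example:

```javascript
const obj = { name: "Amey", age: 35, city: "Pune" };

// Sender side: object -> string before it goes over the network.
const sent = JSON.stringify(obj);
console.log(sent); // {"name":"Amey","age":35,"city":"Pune"}

// Receiver side: string -> object again.
const received = JSON.parse(sent);
console.log(received.city); // Pune

// Optional third argument: indentation for human-readable output.
console.log(JSON.stringify(obj, null, 2));

// Values JSON cannot represent are dropped from objects: undefined and functions.
console.log(JSON.stringify({ a: 1, b: undefined, f() {} })); // {"a":1}

// Arrays work the same way.
const Student = JSON.stringify(["Akash", "Seema", "Samyak", "Sakshi"]);
console.log(typeof Student); // string
```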
JSON Object retrieval using key-value pair and JQuery
JSON stands for JavaScript Object Notation. We use JSON to transfer data from server to client and from client to server.
• Now that we have seen JSON and its data formats, we can define JSON data as key-value pairs:
var Stud_Name = {"name": "Amey", "age": 35, "city": "Pune"};
This is how we can define a JSON data object in terms of key-value pairs. In the above example, the keys name, age and city are unique, and each has an associated value. Let's discuss how we can retrieve the JSON data using jQuery.
• First, we can read each value of the JSON object using its key:
var Name_of_Student = Stud_Name.name;
var Age_of_Student = Stud_Name.age;
var City_of_Student = Stud_Name.city;
• In this way, we append a dot (.) and the unique key to the object name to retrieve the value associated with that key.
• Let's look at a JSON array of objects from which we need to fetch data. The JSON data we have is:
var Stud_data = [{ "Rno": "101", "Password": "Pass@123" },
{ "Rno": "102", "Password": "Stud@123" }];
• To retrieve data from the above JSON objects, we can write:
var firstStudent = Stud_data[0].Rno;
var secondStudent = Stud_data[1].Rno;
var firstPassword = Stud_data[0].Password;
var secondPassword = Stud_data[1].Password;
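Hard-coded indexes such as Stud_data[0] work only when the position is known in advance. The sketch below uses bracket notation and array methods to look records up by key instead. It is written in plain JavaScript so it runs without a browser. In a page that loads jQuery, `$.each(Stud_data, function (i, student) { ... })` iterates the same array. The helper names `column` and `findByRno` are hypothetical:

```javascript
const Stud_data = [
  { Rno: "101", Password: "Pass@123" },
  { Rno: "102", Password: "Stud@123" },
];

// Dot notation (Stud_data[0].Rno) needs the key written in the code;
// bracket notation (record[key]) accepts a key stored in a variable.
function column(records, key) {
  return records.map((record) => record[key]);
}

// Returns the first record whose Rno matches, or undefined if there is none.
function findByRno(records, rno) {
  return records.find((record) => record.Rno === rno);
}

console.log(column(Stud_data, "Rno")); // [ '101', '102' ]
console.log(findByRno(Stud_data, "102").Password); // Stud@123
console.log(findByRno(Stud_data, "999")); // undefined
```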
3.2.6 Applications of JSON
• It helps you transfer data between a server and web applications.
• The simple JSON file format helps to transmit and serialize all types of structured data.
• It allows you to perform asynchronous data calls without the need for a page refresh.
• It is widely used for JavaScript-based applications, including browser extensions (add-ons) and websites.
• We can use JSON with modern programming languages.
• Web services and RESTful APIs use the JSON format to provide public data.
3.3 XML VS JSON
XML and JSON are both data formats used to share information with the server side, and each has its own unique features. Let's discuss them:
Parameter | XML | JSON
Acronym | XML is Extensible Markup Language. | JSON is JavaScript Object Notation.
Encoding support | XML supports different encodings. | JSON supports UTF-8 encoding.
Tags | XML has user-defined tags. | JSON does not use start and end tags.
Security | XML is more secure than JSON. | JSON is less secure.
Arrays | XML does not use arrays. | JSON can have arrays of JSON objects.
Parsing | XML requires DOM parsing. | JSON is parsed into ready-to-use JavaScript objects.
Display | XML provides the capability to display data because it is a markup language. | JSON has no display capabilities.
Retrieving value | Retrieving values is difficult. | Retrieving values is easy.
Object | The object has to be expressed by conventions, mostly through the use of attributes and elements. | Native support for objects.
3.4 MULTIPLE CHOICE QUESTIONS

Q. 3.1 XML stands for ______.
(a) Extensible Markup Language (b) Excessive Markup Language
(c) Executive Markup Language (d) Extensible Managing Language
Ans. : (a)

Q. 3.2 The XML format has a simpler set of ______ than HTML.
(a) loader rules (b) parsing rules
(c) generator rules (d) logical rules
Ans. : (b)

Q. 3.3 All information in XML is ______.
(a) Unicode text (b) multi code
(c) multi text (d) simple text
Ans. : (a)

Q. 3.4 In XML the attribute value must always be quoted with ______.
(a) double quotes (b) single quotes
(c) both a and b (d) name of attributes
Ans. : (c)

Q. 3.5 Which of the following isn't a JSON type?
(a) String (b) Object
(c) Date (d) Array
Ans. : (c)

Q. 3.6 What is the purpose of the method JSON.parse()?
(a) Parses a string from JSON to JSON2 (b) Parses a string to integer
(c) Parses a string to JSON (d) Parses integer to string
Ans. : (c)

Q. 3.7 What do json.loads() and json.dumps() take in and return?
(a) json.loads() takes in a string and returns a json object. json.dumps() takes in a json object and returns a string.
(b) json.loads() takes in a json object and returns a json object. json.dumps() takes in a json object and returns a string.
(c) json.loads() takes in a string and returns a json object. json.dumps() takes in a string and returns a string.
(d) None of these
Ans. : (a)

Q. 3.8 Which is the correct format for writing a JSON name/value pair?
(a) "name" : "value" (b) name = 'value'
(c) name = "value" (d) name : 'value'
Ans. : (a)

Q. 3.9 What is a JSONStringer used for?
(a) It is used to quickly create JSON text.
(b) It is used to create number strings in JSON.
(c) It quickly converts JSON to Java strings.
(d) It is used to create JSON ordered pairs.
Ans. : (a)

Q. 3.10 ______ is a major element in the W3C's XSLT standard.
(a) XQuery (b) XPath
(c) XPointer (d) XLink
Ans. : (b)

Q. 3.11 XPath is used to navigate through ______.
(a) elements and attributes (b) files
(c) different pages (d) none of these
Ans. : (a)

Q. 3.12 XML stands for ______.
(a) Extensible Markup Language (b) Excessive Markup Language
(c) Executive Markup Language (d) Extensible Managing Language
Ans. : (a)

Q. 3.13 The XML format has a simpler set of ______ than HTML.
(a) loader rules (b) parsing rules
(c) generator rules (d) logical rules
Ans. : (b)

Q. 3.14 All information in XML is ______.
(a) Unicode text (b) multi code
(c) multi text (d) simple text
Ans. : (a)

Q. 3.15 In XML the attribute value must always be quoted with ______.
(a) double quotes (b) single quotes
(c) both a and b (d) name of attributes
Ans. : (c)
Q. 3.16 Which of the following isn't a JSON type?
(a) String (b) Object
(c) Date (d) Array
Ans. : (c)

Q. 3.17 What is the purpose of the method JSON.parse()?
(a) Parses a string from JSON to JSON2 (b) Parses a string to integer
(c) Parses a string to JSON (d) Parses integer to string
Ans. : (c)

Q. 3.18 What do json.loads() and json.dumps() take in and return?
(a) json.loads() takes in a string and returns a json object. json.dumps() takes in a json object and returns a string.
(b) json.loads() takes in a json object and returns a json object. json.dumps() takes in a json object and returns a string.
(c) json.loads() takes in a string and returns a json object. json.dumps() takes in a string and returns a string.
(d) None of these
Ans. : (a)

Q. 3.19 Which is the correct format for writing a JSON name/value pair?
(a) "name" : "value" (b) name = 'value'
(c) name = "value" (d) name : 'value'
Ans. : (a)

Q. 3.20 What is a JSONStringer used for?
(a) It is used to quickly create JSON text.
(b) It is used to create number strings in JSON.
(c) It quickly converts JSON to Java strings.
(d) It is used to create JSON ordered pairs.
Ans. : (a)

Q. 3.21 XPath is used to navigate through ______.
(a) elements and attributes (b) files
(c) different pages (d) none of these
Ans. : (a)

Q. 3.22 XPath is a major element in ______.
(a) XSLT (b) XSL
(c) XML (d) XHTML
Ans. : (a)

Q. 3.23 A JSON name/value pair is written as ______.
(a) 'name' : 'value' (b) name = 'value'
(c) name = "value" (d) "name" : "value"
Ans. : (d)

Q. 3.24 In the following notation, Employee is of type ______.
{ "Employee": [ "Amy", "Bob", "John" ] }
(a) Not a valid JSON string (b) Array
(c) Class (d) Object
Ans. : (b)

Q. 3.25 Which of the following is not a JSON type?
(a) Object (b) Date
(c) Array (d) String
Ans. : (b)

Q. 3.26 What is the value of obj in the following code?
var obj = JSON.parse('{"fruit": "Apple"}', function (k, v) { if (v == "Apple") return "Orange"; else return v; });
(a) { "fruit": "Apple" } (b) { "fruit": "Orange" }
(c) { "Orange" } (d) { "Apple" }
Ans. : (b)

Q. 3.27 What is the value of json in the following code?
var obj = { fruit: 'apple', toJSON: function () { return 'orange'; } }; var json = JSON.stringify({x: obj});
(a) {"x":"orange"} (b) {"fruit":"apple"}
(c) {"y":"apple"} (d) {"fruit":"orange"}
Ans. : (a)

Q. 3.28 What is used by the JSONObject and JSONArray constructors to parse JSON source strings?
(a) JSONTokener (b) JSONParser
(c) JParser (d) ParserJ
Ans. : (a)

Q. 3.29 Which statement about the space parameter in JSON.stringify() is false?
(a) It controls spacing in the resulting JSON string
(b) It removes whitespace
(c) It is an optional parameter
(d) All three statements are false
Ans. : (b)

Q. 3.30 What is a JSONStringer used for?
(a) It is used to quickly create JSON text.
(b) It quickly converts JSON to Java strings.
(c) It is used to create number strings in JSON.
(d) It is used to create JSON ordered pairs.
Ans. : (a)
Q. 3.31 What is the value of json in the following code?
var days = {}; days['Monday'] = true; days['Wednesday'] = true; days['Sunday'] = false; var json = JSON.stringify({x: days});
(a) {"day": {"Monday":"true","Wednesday":"true","Sunday":"false"}}
(b) {"x":{"Monday":true,"Wednesday":true,"Sunday":false}}
(c) {"day": {"Monday":true,"Wednesday":true,"Sunday":false}}
(d) {"x":["Monday":true,"Wednesday":true,"Sunday":false]}
Ans. : (b)

Q. 3.32 What error does JSON.parse() throw when the string to parse is not valid JSON?
(a) ReferenceError (b) EvalError
(c) SyntaxError (d) TypeError
Ans. : (c)

Q. 3.33 What two structures is JSON built on?
(a) A collection of name/value pairs, and an ordered list of values, or array.
(b) A collection of object/item pairs, and an ordered list of pairs, or array.
(c) A collection of name/value objects, and an ordered list of objects, or array.
(d) A collection of native-value pairs, and an ordered list of arrays, or values.
Ans. : (a)

Q. 3.34 Does whitespace matter in JSON?
(a) No, it will be stripped out.
(b) Yes, only within strings.
(c) Yes, only outside of strings.
(d) Yes, both inside and outside of strings.
Ans. : (b)

Q. 3.35 What does DOM stand for?
(a) Direct Object Model (b) Document Object Modeling
(c) Document Object Model (d) Document Output Model
Ans. : (c)

Q. 3.36 Which of the following XPath expressions selects the parent of the current node?
(a) . (b) ..
(c) / (d) //
Ans. : (b)

Q. 3.37 What is the basic use of XQuery?
(a) It works just like SQL works for databases
(b) Used to fetch the web objects
(c) Used to fetch the html elements
(d) None of these
Ans. : (a)
Chapter Ends...
CHAPTER 4 NoSQL Distribution Model

Syllabus
NoSQL database concepts: NoSQL data modeling, Benefits of NoSQL, comparison between SQL and NoSQL database system.
Replication and sharding, Distribution Models Consistency in distributed data, CAP theorem, Notion of ACID Vs BASE, handling Transactions, consistency and eventual consistency.
Types of NoSQL databases: Key-value data store, Document database and Column Family Data store, Comparison of NoSQL databases w.r.t CAP theorem and ACID properties.
4.1 NoSQL Database Concepts
4.1.1 What is meant by NoSQL Databases?
4.1.2 Why NoSQL is in existence?
4.1.3 Benefits of NoSQL databases over RDBMS
4.1.4 Challenges in using RDBMS
4.1.5 Types of NoSQL Databases
4.1.5(A) Performance Parameters of NoSQL Database Models
4.1.6 NoSQL Data Modelling
4.1.6(A) Document Oriented Databases
4.1.6(B) Graph Based Databases
4.1.6(C) Key Value Databases
4.1.6(D) Column Store Databases
4.1.7 Benefits of NoSQL
4.1.8 Comparison between SQL and NoSQL Database System
4.2 Replication and Sharding
4.2.1 What is Replication?
4.2.2 Master-Slave Replication
4.2.3 What is MongoDB Sharding?
4.2.4 How Data is Distributed Across Shards?
4.2.5 Distribution Models Consistency in Distributed Data
4.2.6 Update and Read Consistency
4.2.7 CAP Theorem
4.2.8 Notion of ACID Vs BASE
4.3 Types of NoSQL databases
4.3.1 Comparison of NoSQL Databases w.r.t CAP Theorem and ACID Properties
4.3.2 RDBMS To NoSQL Database w.r.t ACID and BASE
4.3.3 Features of NoSQL Database
4.4 Multiple Choice Questions
Chapter Ends
4.1 NOSQL DATABASE CONCEPTS
A NoSQL (originally referring to "non SQL" or "non relational") database is one that stores and retrieves data using methods other than the tabular relations employed in relational databases.
The term covers a wide range of database technologies that were created in response to the growth in the volume of data kept on users, things and goods, the frequency with which this data is accessed, and the resulting performance and processing requirements. NoSQL databases are typically organised as key-value pairs, graph databases, document-oriented databases or column-oriented databases.
The term NoSQL was first used in 1998 by Carlo Strozzi for a relational database that omitted the use of SQL. The term was picked up again in 2009 and used for conferences of advocates of non-relational databases, such as Last.fm developer Johan Oskarsson, who organized the NoSQL meetup in San Francisco.
4.1.1 What is meant by NoSQL Databases?
NoSQL is a non-relational database management system, and it differs from the relational database management system in many ways.
It is designed for distributed data, where applications are required to store large amounts of data.
Consider the data generated by social media applications: they usually generate large amounts of data, and this data is not in a fixed format, so one cannot predict its format as one can with a structured RDBMS.
The main advantage of a NoSQL database is that it is schema-less and does not require any fixed structure, whereas an RDBMS is schema oriented.
4.1.2 Why NoSQL is in existence?
NoSQL databases are capable of storing large amounts of data.

What is the right time to use NoSQL databases?
One can shift to NoSQL databases when:
1. There is a requirement to build an application that needs a higher degree of parallelism.
2. The application has a huge amount of data to handle.

3. There are fewer requirements for ACID properties.
4. The application does not need transactions and its schema is flexible.
4.1.3 Benefits of NoSQL databases over RDBMS
1. Increased performance
2. Higher scalability
3. Schema-less
4. Dynamic
4.1.4 Challenges in using RDBMS
Relational databases are best suited to limited volumes of data and simple, structured data storage, but given today's data trends the traditional RDBMS has some limitations.
An RDBMS assumes a well-defined structure of data and assumes that the data is largely uniform.
It needs the schema of your application and its properties (columns, types, etc.) to be defined up-front, before the application is built. This does not match well with agile development approaches for highly dynamic applications.
As the data grows larger, you have to scale the database vertically, i.e. by adding more capacity to the existing servers.
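The point about defining the schema up-front can be seen with Python's built-in sqlite3 module, used here only as a convenient stand-in for an RDBMS. A plain list of dictionaries stands in for a schema-less document store. The table and field names are hypothetical.

```python
import sqlite3

# Relational side: the table and its columns must be declared before use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student (roll_no, name) VALUES (1, 'Amy')")

try:
    # A field that was not declared up-front is rejected by the database.
    conn.execute("INSERT INTO student (roll_no, name, email) "
                 "VALUES (2, 'Bob', 'bob@example.com')")
    added_email = True
except sqlite3.OperationalError as err:
    print("RDBMS rejected the insert:", err)
    added_email = False

# To store the new field, the schema itself has to be changed first.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
conn.execute("INSERT INTO student (roll_no, name, email) "
             "VALUES (2, 'Bob', 'bob@example.com')")

# Schema-less side: a list of dicts standing in for a document collection.
# Each record simply carries the fields it needs, with no declaration step.
students = [
    {"roll_no": 1, "name": "Amy"},
    {"roll_no": 2, "name": "Bob", "email": "bob@example.com"},
]
```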
4.1.5 Types of NoSQL Databases
1. Document Oriented Databases : Document-oriented databases handle a document as a whole, not as a collection of name/value pairs.
2. Graph Based Databases : A graph database represents and stores data using graph structures containing nodes, edges and properties. A graph database, by definition, is any storage system that supports index-free adjacency.
3. Key Value Databases : A key-value pair's key is a single value in the set that may be quickly looked up to retrieve data.
4. Column Store Databases : Data can be stored efficiently with column-oriented storage. When storing nulls, it avoids wasting space by simply not saving a column when no value exists for that column.
4.1.5(A) Performance Parameters of NoSQL Database Models
The types of NoSQL database models described above can be differentiated by a few performance parameters:

Data model | Performance | Scalability | Flexibility | Functionality | Complexity
Key Value Store | High | High | High | Variable | None
Column Store | High | High | Moderate | Minimal | Low
Document Store | High | Variable | High | Variable | Low
Graph Database | Variable | Variable | High | Graph Theory | High
4.1.6 NoSQL Data Modelling
4.1.6(A) Document Oriented Databases
• Document-oriented databases handle a document as a whole, not as a collection of name/value pairs. This allows you to group a variety of documents into a single collection at the collection level. Document databases allow you to index documents based on their attributes as well as their primary identifier.
• Today a variety of open-source document databases are available, but MongoDB and CouchDB are the most popular. MongoDB has grown in popularity to become one of the most widely used NoSQL databases.
• Examples of databases in this category available in the market are MongoDB, CouchDB and Amazon SimpleDB.
• Document-oriented databases use key-value pairs to represent data, and they store records in the form of documents. For example, if you insert 4 records into an SQL database (MySQL, Oracle), it creates 4 new rows in the table you are inserting into. Similarly, when you add 4 new records to a document-oriented database such as MongoDB, it creates 4 new documents in the particular collection.
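The two ideas above (each record becomes one whole document, and documents can be indexed on their attributes as well as on their primary identifier) can be sketched with a toy in-memory collection in Python. The Collection class and its method names are hypothetical and are not MongoDB's API. A simple counter stands in for MongoDB's ObjectId, and the staff data is made up.

```python
import itertools

class Collection:
    """Toy in-memory document collection (hypothetical; not MongoDB's API)."""

    def __init__(self):
        self.docs = {}                      # _id -> whole document
        self._next_id = itertools.count(1)  # stand-in for a generated _id
        self.indexes = {}                   # field -> {value: set of _ids}

    def create_index(self, field):
        # Secondary index on a document attribute, built from existing documents.
        index = {}
        for _id, doc in self.docs.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(_id)
        self.indexes[field] = index

    def insert_many(self, records):
        # Each record becomes one new document with its own primary identifier.
        for record in records:
            _id = next(self._next_id)
            doc = dict(record, _id=_id)
            self.docs[_id] = doc
            for field, index in self.indexes.items():
                if field in doc:
                    index.setdefault(doc[field], set()).add(_id)

    def find(self, field, value):
        if field in self.indexes:           # indexed attribute: direct lookup
            ids = self.indexes[field].get(value, set())
        else:                               # no index: scan every document
            ids = {i for i, d in self.docs.items() if d.get(field) == value}
        return [self.docs[i] for i in sorted(ids)]

staff = Collection()
staff.create_index("designation")
staff.insert_many([                         # 4 records -> 4 documents
    {"name": "Asha", "designation": "principal"},
    {"name": "Ravi", "designation": "vice_principal"},
    {"name": "Meera", "designation": "teacher", "subjects": ["DBMS", "AI"]},
    {"name": "John", "designation": "teacher", "type": "permanent"},
])
print(len(staff.docs))                                            # 4
print([d["name"] for d in staff.find("designation", "teacher")])  # ['Meera', 'John']
```

Notice that the documents do not share a fixed set of fields. Only "designation" is indexed, so a lookup on any other field falls back to a full scan.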
Fig. 4.1.1 : Mongodb Collection
• From the earlier discussion of SQL-based and NoSQL databases, note that SQL databases are schema oriented and provide many features, such as different constraints, to prevent data duplication and preserve integrity. NoSQL databases are schema-less, so data duplication and varying data formats are not restricted. The schema describes how your relationships are logically associated; it defines the logical structure of the database.
• NoSQL databases such as MongoDB are schema-less and store data in documents. Every document is allocated an _id field holding an ObjectId, a 12-byte value usually displayed as a 24-character hexadecimal string.
• Fig. 4.1.1 shows the object id allocated to each document and the structure of a MongoDB database, where values are stored in key-value pairs.
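To make the 12-byte / 24-hex-character point concrete, the following Python sketch assembles an identifier using the layout MongoDB documents for ObjectId: a 4-byte timestamp, a 5-byte random value and a 3-byte incrementing counter. The function names are hypothetical, and this is not the bson library's implementation. In practice the driver (for example the bson package that ships with PyMongo) generates _id values for you.

```python
import os
import struct
import time

_PROCESS_RANDOM = os.urandom(5)                  # 5 random bytes, fixed per process
_counter = int.from_bytes(os.urandom(3), "big")  # 3-byte counter with a random start

def object_id_like() -> str:
    """Return an ObjectId-style identifier as 24 hex characters (12 bytes).

    Assumed layout: 4-byte timestamp | 5-byte random value | 3-byte counter.
    """
    global _counter
    _counter = (_counter + 1) % 0x1000000        # wrap around at 2**24
    raw = (struct.pack(">I", int(time.time()))   # big-endian seconds since the epoch
           + _PROCESS_RANDOM
           + _counter.to_bytes(3, "big"))
    return raw.hex()

def timestamp_of(oid: str) -> int:
    # The first 4 bytes (8 hex characters) record when the id was created.
    return int(oid[:8], 16)

oid = object_id_like()
print(len(bytes.fromhex(oid)), len(oid))   # 12 24
```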
4.1.6(B) Graph Based Databases
• A graph database represents and stores data using graph structures containing nodes, edges and properties. A graph database, by definition, is any storage system that supports index-free adjacency. This means that each element has a direct pointer to its adjacent elements, which eliminates the need for index lookups. Specialized graph databases, such as triple-stores and network databases, are distinct from general graph databases that can store any graph. The graph is traversed by following these direct pointers rather than by using indexes.
Fig. 4.1.2 : Graph Based Databases

• Examples of databases in this category available in the market are Neo4j, OrientDB, Facebook Open Graph and FlockDB.
• A graph-based database has no rigid table format or row-and-column representation. It uses a flexible graph representation, which gives better scalability. Graph structures are built from a set of edges, nodes and properties, which provides index-free adjacency. Data can easily be transformed from one model to another using a graph-based NoSQL database.
• A graph database is built on the basis of graph theory. It is made up of a collection of items, each of which can be a node or an edge.
o Nodes : Nodes represent entities or instances such as people, businesses, accounts or any other item to be tracked. In a relational database they are roughly comparable to a record, relation or row; in a document-store database they are roughly equivalent to a document.
o Edges : Edges, also called relationships, are the lines that connect nodes to other nodes and represent the interaction between them. Examining the links and interconnections of nodes, properties and edges reveals meaningful patterns. There are two types of edges: directed and undirected. In an undirected graph, an edge connecting two nodes has a single meaning. In a directed graph, the edges connecting two different nodes have different meanings depending on their direction. Edges are the most important concept in graph databases, as they provide an abstraction that can't easily be implemented in a relational or document-store paradigm.
o Properties : Properties are information associated with nodes. For example, if Wikipedia were one of the nodes, it might be tied to properties such as website, reference material, or words that start with the letter w, depending on which aspects of Wikipedia are relevant to a given database.
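Index-free adjacency can be sketched in a few lines of Python. Each node object keeps direct references to the nodes it points to, so a traversal follows those references instead of looking anything up in an index. The Node class, relationship labels and data are hypothetical and only illustrate nodes, directed edges and properties. Real graph databases such as Neo4j use their own storage formats.

```python
from collections import deque

class Node:
    """A graph node holding its properties and direct references to neighbours."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.out_edges = []          # (relationship label, Node) pairs: direct pointers

    def connect(self, label, other):
        # Directed edge: self -[label]-> other
        self.out_edges.append((label, other))

def reachable(start):
    """Breadth-first traversal that follows stored pointers; no index is consulted."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node.name)
        for _label, neighbour in node.out_edges:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Hypothetical nodes with properties.
amy = Node("Amy", role="student")
bob = Node("Bob", role="teacher")
acme = Node("Acme", type="business")
wiki = Node("Wikipedia", type="website", category="reference material")

amy.connect("FRIEND_OF", bob)
bob.connect("WORKS_AT", acme)
amy.connect("READS", wiki)

print(reachable(amy))    # ['Amy', 'Bob', 'Wikipedia', 'Acme']
print(reachable(acme))   # ['Acme']  (edges are directed)
```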
4.1.6(C) Key Value Databases
• A key-value pair's key is a single value in the set that may be quickly looked up to retrieve data. Key-value stores come in a variety of shapes and sizes: some keep data in memory and others allow it to be saved to disk. Oracle's Berkeley DB is a basic yet powerful key-value store.
• Key-value databases operate in a totally different way from relational databases.
• Relational databases specify the database's data structure as a set of tables with fields that have well-defined data types. Exposing data types to the database software allows it to perform a variety of optimizations. Key-value systems, on the other hand, treat data as a single opaque collection, which may have different fields for every record. This provides more flexibility and adheres more closely to modern notions such as object-oriented programming. Because optional values are not represented by placeholders or input parameters, as they are in most relational databases, key-value databases often use far less memory to store the same data.
• Examples of databases in this category available in the market are Membase, Redis and MemcacheDB.
Key | Value
K1 | AAA,BBB,CCC
K2 | AAA,BBB
K3 | AAA,DDD
K4 | AAA,2,01/01/2015
K5 | 3,ZZZ,5623
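A minimal key-value store can be sketched in Python around a dictionary, which is a hash table with average O(1) lookup by key. The values are modelled on the table above and stored as opaque bytes: the store never looks inside them, so each entry can have a different structure. The class and method names are hypothetical and do not belong to Redis, Berkeley DB or any other product.

```python
class KeyValueStore:
    """A minimal in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}                  # dict = hash table: average O(1) lookups

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value          # the value is opaque to the store

    def get(self, key: str):
        return self._data.get(key)       # None if the key is absent

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
rows = {"K1": "AAA,BBB,CCC", "K2": "AAA,BBB", "K3": "AAA,DDD",
        "K4": "AAA,2,01/01/2015", "K5": "3,ZZZ,5623"}
for key, value in rows.items():
    store.put(key, value.encode("utf-8"))

print(store.get("K4"))   # b'AAA,2,01/01/2015'
store.delete("K2")
print(store.get("K2"))   # None
```

Interpreting the bytes (for example, splitting on commas) is left to the application. This is what "opaque collection" means in the text above.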
4.1.6(D) Column Store Databases
• Data can be stored efficiently with column-oriented storage. When storing nulls, it avoids wasting space by simply not saving a column when no value exists for that column.
• Each data unit can be thought of as a collection of key/value pairs, with the unit itself identified by a primary identifier, sometimes known as the primary key. This primary key is commonly referred to as the row key in Bigtable and its clones.
• In a column-oriented NoSQL database, data is stored in cells grouped in columns of data rather than as rows of data. Columns are logically grouped into column families, which can contain a virtually unlimited number of columns, created either at runtime or when the schema is defined.
• Examples of column store databases are :
o Google's Bigtable, Cassandra,
o HBase, Hypertable.
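A rough sketch of the row key / column family / column layout, using nested Python dictionaries. It shows two points from the text: columns can be added per row at runtime, and a null is not stored at all. The table, family and column names are hypothetical. Real column stores such as HBase or Cassandra persist this structure very differently on disk.

```python
# row key -> column family -> column -> value  (simplified Bigtable-style layout)
table = {}

def put(row_key, family, column, value):
    if value is None:
        # A null is never stored: drop any existing cell instead of keeping a placeholder.
        table.get(row_key, {}).get(family, {}).pop(column, None)
        return
    # Columns are created on the fly; each row can have a different set of them.
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get(row_key, family, column):
    # A missing cell simply reads back as None.
    return table.get(row_key, {}).get(family, {}).get(column)

put("emp1", "personal", "name", "Amy")
put("emp1", "personal", "phone", None)      # no value: nothing is written
put("emp1", "work", "dept", "DBMS")
put("emp2", "personal", "name", "Bob")
put("emp2", "personal", "city", "Mumbai")   # a column that emp1 does not have

print(table["emp1"])   # {'personal': {'name': 'Amy'}, 'work': {'dept': 'DBMS'}}
```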
4.1.7 Benefits of NoSQL
Organizations are increasingly using NoSQL databases in response to the complexity and constraints of old, legacy relational databases. A NoSQL database is more scalable, can help you achieve higher performance, and offers a more cost-effective means of building, implementing and distributing software.
Having discussed the basics of NoSQL databases, we now look at some benefits of using this type of database.
1. Data Availability : NoSQL databases are built on distributed architectures, so there is no single point of failure, and replication of data is built in. The most important features of a distributed environment are replication and synchronization: if one or two nodes fail or become overloaded, there is still no data loss or request failure. Because of these features, a NoSQL database deployed appropriately can deliver high performance at massive scale, tolerate faults, and provide continuous availability of the database in a single location, across sites or in the cloud. As a result, there is little chance of the database being down whenever users try to use it. Uptime is high, and these features are a major reason this type of database is gaining popularity.
2. High Scalability : Traditional database services have a restriction: DBAs must rely on scaling up to meet development needs, so database users or companies have to buy larger servers to handle the growing data load. NoSQL databases offer a far more convenient scaling-out option, in which the database is dispersed across multiple pre-existing hosts. As the demand for data storage grows, scaling out in a virtual environment is more cost-effective than scaling up hardware. Scaling RDBMS data across commodity servers such as clusters is difficult, while NoSQL databases handle it simply because they are built to scale out onto new nodes. Hardware costs are very low, which makes this a viable option for data storage.
3. Best Suited for Big Data : Large amounts of data are generated by every business, application and service, and this data must be stored appropriately. This gives rise to the concept of "big data", which is concerned with the data industrial revolution. In most cases an RDBMS cannot store unstructured data, or many types and large amounts of data. To handle "big data" volumes, enterprises have turned to NoSQL platforms such as MongoDB or Hadoop.
4. Location Independence : While working with the database, users get an abstract view of the data. Users can submit queries from any location, and any available site can process them. Because the sites are replicated and synchronised, a user's query is processed and its results are generated at any time.
5. Flexible and Agile Data Model : Traditional database systems, especially big production databases, are notorious for causing enormous headaches when it comes to handling changes in storage and operating design. Even minor changes must be carefully monitored in such a system. NoSQL database systems, on the other hand, have no such limits in their data storage architecture. They adapt to changes in the type of data as well as in the data storage architecture, allowing comparative agility, such as adding new columns without significant adjustments or breakdowns.
6. Analytics and Business Intelligence : A key strategic reason businesses move from a relational database management system to a NoSQL database system is the more flexible data model found in most NoSQL databases. The relational data model is based on defined relationships between tables, which are themselves defined by a fixed column structure, all explicitly organized in a database schema. A NoSQL data model, often referred to as a schema-less data model, accepts all kinds of data (structured, semi-structured and unstructured) much more easily than a relational database, which relies on a predefined schema.
7. NoSQL databases are cheaper : NoSQL databases are designed to use inexpensive commodity hardware to build server clusters, which helps in managing huge data volumes and transactions. Traditional RDBMS systems, on the other hand, need expensive storage and servers, which means they have a higher cost per volume of data stored.
4.1.8 Comparison between SQL and NoSQL Database System
SQL and NoSQL databases can be compared on the types of data they store, the size of storage and many other parameters, as follows:

Parameters | SQL Databases | NoSQL Databases
Type | SQL databases are relational. | NoSQL databases are non-relational.
Storage | SQL databases store information in tables. | NoSQL databases use document-based, key-value, graph-based or column-based data storage.
Data | SQL databases are better for multi-row transactions. | NoSQL databases are better for unstructured data such as documents or JSON.
Schema | These databases have a fixed, static or predefined schema. | These databases have a dynamic schema for unstructured data.
Hierarchical data storage | These databases are not well suited for hierarchical data storage. | These databases are best suited for hierarchical data storage.
Complex queries | These databases are good for complex queries. | These databases are not so good for complex queries.
Scalability | SQL databases are vertically scalable. | NoSQL databases are horizontally scalable.
Follows | SQL databases follow ACID properties. | NoSQL databases are designed around the CAP theorem and usually follow BASE properties.
Operations performed or supported | SQL databases support JOIN operations. | NoSQL databases don't have a join operation; they may use embedded documents instead.
Examples | Oracle, Postgres and MySQL | MongoDB, Cassandra and HBase
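The "Operations performed or supported" row can be illustrated with a small Python sketch. The relational version splits customers and orders into two tables and needs a JOIN, while the document version embeds the orders inside the customer document. sqlite3 is used only as a convenient in-memory SQL engine, and the schema and data are hypothetical.

```python
import sqlite3

# Relational design: two tables linked by a foreign key, combined with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         item TEXT);
    INSERT INTO customer VALUES (1, 'Amy');
    INSERT INTO orders VALUES (10, 1, 'pen'), (11, 1, 'book');
""")
sql_items = [item for (item,) in conn.execute(
    "SELECT o.item FROM customer c JOIN orders o ON o.customer_id = c.id "
    "WHERE c.name = 'Amy' ORDER BY o.id")]

# Document design: the orders are embedded in the customer document,
# so one lookup returns everything and no join is needed.
customer_doc = {
    "_id": 1,
    "name": "Amy",
    "orders": [{"id": 10, "item": "pen"}, {"id": 11, "item": "book"}],
}
doc_items = [order["item"] for order in customer_doc["orders"]]

print(sql_items)   # ['pen', 'book']
print(doc_items)   # ['pen', 'book']
```

Embedding suits data that is mostly read together. If the same orders had to be shared by many parents, the duplication this introduces is the trade-off the SQL design avoids.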
4.2 REPLICATION AND SHARDING
MongoDB is a next-generation database that makes it possible to do things that were difficult with traditional relational systems. It is a significant member of the NoSQL movement and a leading non-relational database management system. Rather than using tables and fixed schemas like a relational database management system (RDBMS), MongoDB stores data as documents made up of key-value pairs.
In large production environments it also provides a variety of horizontal scalability options. MongoDB is a NoSQL document database system that scales horizontally.
4.2.1 What is Replication?
Scaling NoSQL databases to meet rising demand on your application is quite simple compared with traditional database servers: you add a new server, make a few configuration changes, and it joins your existing servers, enlarging the cluster. All existing databases and collections are replicated and synchronised with the other member nodes automatically. A replication cluster works well when the full data volume of your database(s) fits on a single server, because each server in the replication cluster stores a full copy of your databases.
Replica sets are a good way to duplicate MongoDB data across many servers while also having the database fail over automatically when a server is lost. Clients can connect directly to secondary instances to scale read workloads. Note, however, that MongoDB's older master/slave replication is not the same as a replica set: it lacks automatic failover.
4.2.2 Master-Slave Replication
With master-slave distribution, you replicate data across multiple nodes. One node is designated as the master, or primary. The master is the authoritative source for the data and is usually responsible for processing any updates to that data.
The other nodes are slaves, or secondaries. A replication process synchronizes the slaves with the master.
Fig. 4.2.1 : Master Slave Replication (all updates are saved at the master node; changes propagate to the slaves; reads can be done from the master or the slaves)
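The master-slave scheme can be simulated in a few lines of Python. All writes go to the master and are appended to an ordered replication log, and each slave catches up by replaying the log entries it has not yet applied. The class names and the log-replay mechanism are illustrative assumptions and do not describe how MongoDB implements replication. The stale read before synchronization hints at the consistency issues discussed later in this chapter.

```python
class Replica:
    """One node's copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

class MasterSlaveCluster:
    """Toy master-slave replication (hypothetical; not MongoDB's implementation)."""

    def __init__(self, n_slaves=2):
        self.master = Replica("master")
        self.slaves = [Replica(f"slave{i + 1}") for i in range(n_slaves)]
        self.log = []                                    # ordered (key, value) updates
        self.applied = {s.name: 0 for s in self.slaves}  # log position per slave

    def write(self, key, value):
        # The master is the authoritative source and processes every update.
        self.master.data[key] = value
        self.log.append((key, value))

    def replicate(self):
        # Changes propagate to the slaves in the order the master applied them.
        for slave in self.slaves:
            for key, value in self.log[self.applied[slave.name]:]:
                slave.data[key] = value
            self.applied[slave.name] = len(self.log)

    def read(self, key, replica):
        # Reads may be served by the master or by any slave.
        return replica.data.get(key)

cluster = MasterSlaveCluster()
cluster.write("x", 1)
stale = cluster.read("x", cluster.slaves[0])   # slave not synchronized yet
cluster.replicate()
fresh = cluster.read("x", cluster.slaves[0])
print(stale, fresh)                            # None 1
```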
4.2.3 What is MongoDB Sharding?
MongoDB scales by using a method known as "sharding": writing data across multiple servers in order to distribute the read and write load as well as the data storage needs.
Sharding, the technique of storing data records across numerous machines, is MongoDB's approach to handling data growth. As data grows in size, a single system may not be able to store it all or provide satisfactory read and write throughput.
Sharding solves the difficulty of horizontal scaling: you use sharding to increase the number of machines available to handle data growth and read and write operations.
Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of allocating data to shards and ensuring that data access goes to the right shard. This can make it much easier to use sharding in an application.
Sharding is particularly valuable for performance because it can improve both read and write performance. Using replication, particularly with caching, can greatly improve read performance but does little for applications that have a lot of writes. Sharding provides a way to horizontally scale writes.
4.2.4 How Data is Distributed Across Shards?
• A collection in MongoDB is similar to a table, and documents are similar to the individual rows in a table. In a typical database, data is partitioned using a unique key. MongoDB distributes (shards) data at the collection (table) level, with data partitioned using the shard key.
• The shard key is based on an indexed field that is present in each document in the collection. To divide data by shard key, MongoDB uses either range-based partitioning or hash-based partitioning.
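The two partitioning styles can be sketched in Python as two functions that map a shard key to a shard. The shard names, split points and use of MD5 are assumptions chosen for illustration. MongoDB itself splits data into chunks managed by the cluster and applies its own hashing for hashed shard keys, but the contrast is the same: range partitioning keeps neighbouring key values together, while hashing spreads them out.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

# Range-based partitioning on hypothetical split points:
# key < 1000 -> shard0, 1000 <= key < 2000 -> shard1, key >= 2000 -> shard2
RANGE_SPLITS = [1000, 2000]

def range_shard(shard_key: int) -> str:
    # bisect_right counts how many split points are <= the key.
    return SHARDS[bisect.bisect_right(RANGE_SPLITS, shard_key)]

def hash_shard(shard_key) -> str:
    # Hash-based partitioning: a stable hash of the key chooses the shard, so
    # neighbouring key values usually land on different shards.
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

for user_id in (5, 6, 1500, 2500):
    print(user_id, range_shard(user_id), hash_shard(user_id))
```

Python's built-in hash() is salted per process for strings, which is why the sketch uses a cryptographic digest: every router must compute the same shard for the same key. Range partitioning makes range queries cheap (keys 5 and 6 sit on one shard) but can concentrate a burst of increasing keys on a single shard. Hashing avoids that hotspot at the cost of scattering range queries.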
4.2.5 Distribution Models Consistency in Distributed Data
• One of the biggest changes in moving from a centralized relational database to a cluster-oriented NoSQL database is how you think about consistency. Relational databases try to exhibit strong consistency by avoiding all the various inconsistencies that we'll shortly be discussing.
• Once you start looking at the NoSQL world, phrases such as "CAP theorem" and "eventual consistency" appear, and as soon as you start building something you have to think about what sort of consistency your system needs.
4.2.6 Update and Read Consistency
* When an application allows concurrent access to data, it is prone to the threat of read and write instructions colliding with each other, and these types of problems give birth to the need for consistency preservation. When two transactions try to write the same data at the same time, the situation is called a write-write conflict. When the writes reach the server, the server will serialize them: it decides to apply one, then the other. Because neither transaction saw the other's change, the value written first is simply overwritten, so there is a possibility of a lost update.
* Approaches to ensuring consistency in the face of concurrency are described as pessimistic or optimistic. A pessimistic approach works by preventing conflicts from arising; an optimistic approach allows conflicts to arise but detects them and takes steps to resolve them. The most typical pessimistic approach for update conflicts is to use write locks, which require acquiring a lock in order to change a variable, and the system ensures that only one client can hold a lock at a time.
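The lost update and its optimistic remedy can be sketched in a few lines. This is a minimal in-memory model, assuming a single stored value with a version counter; the `Store` class and its method names are hypothetical. The optimistic write is a conditional update: it succeeds only if the version has not changed since the client read it, otherwise the client re-reads and retries.

```javascript
// Toy sketch: a lost update under naive concurrent writes, and how an optimistic
// version check (a conditional update) detects the conflict instead.
// The Store class and its methods are hypothetical, in-memory only.

class Store {
  constructor() {
    this.value = 100;
    this.version = 0;
  }
  read() {
    return { value: this.value, version: this.version };
  }
  // Naive write: blindly overwrites whatever is there.
  writeBlind(value) {
    this.value = value;
    this.version++;
  }
  // Optimistic write: succeeds only if nobody has written since we read.
  writeIfVersion(value, expectedVersion) {
    if (this.version !== expectedVersion) return false; // conflict detected
    this.value = value;
    this.version++;
    return true;
  }
}

// Two clients read the same value, then each adds to it.
function naive() {
  const s = new Store();
  const a = s.read();
  const b = s.read();
  s.writeBlind(a.value + 10); // client A
  s.writeBlind(b.value + 20); // client B overwrites A: the +10 is lost
  return s.value;
}

function optimistic() {
  const s = new Store();
  const a = s.read();
  const b = s.read();
  const okA = s.writeIfVersion(a.value + 10, a.version); // succeeds
  let okB = s.writeIfVersion(b.value + 20, b.version);   // fails: stale version
  if (!okB) {
    // Resolve the conflict by re-reading the current value and retrying.
    const fresh = s.read();
    okB = s.writeIfVersion(fresh.value + 20, fresh.version);
  }
  return { value: s.value, okA, okB };
}

console.log(naive());      // 120 (one update lost)
console.log(optimistic()); // { value: 130, okA: true, okB: true }
```

A pessimistic version would instead make client B wait for a write lock held by A, so the conflict never occurs; the optimistic version lets it occur and pays with a retry.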
* Having a data store that maintains update consistency is one thing, but it doesn't guarantee that readers of that data store will always get consistent responses to their requests.
4.2.7 CAP Theorem
* The CAP theorem is frequently cited in the NoSQL community as the reason why consistency may need to be relaxed. Eric Brewer proposed it in 2000 [Brewer], and Seth Gilbert and Nancy Lynch gave a formal proof a couple of years later [Lynch and Gilbert].
* In terms of handling consistency, the basic statement of the CAP theorem is: given the three properties of Consistency, Availability, and Partition tolerance, you can only get two. Obviously this depends very much on how you define these three properties, and differing opinions have led to several debates on what the real consequences of the CAP theorem are.
* A distributed system cannot be consistent, available and tolerant to network partitions at the same time; at most two of these properties can be satisfied at once. Every distributed system has to tolerate network partitions, because its communicating nodes are themselves distributed, so during a partition it has to choose between availability, where the system always remains available for accepting reads and writes, and consistency, where an update operation is synchronized with all other nodes at the same time.
Fig. 4.2.2 : Three main features of a distributed system (Consistency, Availability, Partition Tolerance)
* Consistency : Consistency means that all nodes see the same copy of a replicated data item. Each node in a distributed cluster must return the same, most recent, successful write, so every client has the same view of the data. Consistency models come in a variety of shapes and sizes; the one referred to in CAP is linearizability (atomic consistency), a particularly strong form of consistency.
* Availability : Each read or write request for a data item will either be processed successfully or will receive an error message indicating that the operation cannot be performed. Every non-failing node must respond to all read and write requests in a reasonable length of time.
* Partition Tolerance : Partition tolerance means that the system can keep running even if the network connecting the nodes fails, resulting in two or more partitions, each with its own set of nodes that can only communicate with one another. That is, despite network partitions, the system continues to function and maintains its consistency promises. Network partitions are an unavoidable reality, and distributed systems that ensure partition tolerance can gracefully recover once a partition heals.
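The trade-off during a partition can be illustrated with a toy model of two replicas holding one value. This is not a real replication protocol; the class, the `"CP"`/`"AP"` mode flag and the `partitioned` switch are assumptions made for illustration. In CP mode a write that cannot reach the other replica is refused, so the replicas stay identical; in AP mode it is accepted locally, so the system stays available but the replicas disagree.

```javascript
// Toy model (not a real protocol): two replicas of one value and a flag that
// simulates a network partition between them.
// "CP": refuse writes that cannot be replicated (consistency over availability).
// "AP": accept writes locally and let replicas diverge (availability over consistency).

class TwoReplicaSystem {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.replicas = [{ value: 0 }, { value: 0 }];
    this.partitioned = false;
  }
  write(replicaIndex, value) {
    const other = this.replicas[1 - replicaIndex];
    if (!this.partitioned) {
      // Normal operation: update both replicas together.
      this.replicas[replicaIndex].value = value;
      other.value = value;
      return "ok";
    }
    if (this.mode === "CP") {
      // Preserve consistency: reject the write, sacrificing availability.
      return "error: cannot reach other replica";
    }
    // AP: stay available and accept locally; the replicas now disagree.
    this.replicas[replicaIndex].value = value;
    return "ok (unreplicated)";
  }
  read(replicaIndex) {
    return this.replicas[replicaIndex].value;
  }
}

for (const mode of ["CP", "AP"]) {
  const sys = new TwoReplicaSystem(mode);
  sys.write(0, 1);        // replicated normally
  sys.partitioned = true; // the network link between replicas fails
  const result = sys.write(0, 2);
  console.log(mode, result, "reads:", sys.read(0), sys.read(1));
}
// CP error: cannot reach other replica reads: 1 1
// AP ok (unreplicated) reads: 2 1
```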
4.2.8 Notion of ACID Vs BASE
* ACID and BASE are two sets of properties that a database can possess so that the system remains dependable and available for its users.
* The CAP theorem states that it is impossible to achieve both consistency and availability in a partition-tolerant distributed system.
* The fundamental difference between the ACID and BASE database models is the way they deal with this limitation.
* ACID Properties : The ACID properties are Atomicity, Consistency, Isolation and Durability. A database management system that provides these four properties is able to handle transactions efficiently.
* An executed transaction is always consistent because of the ACID database transaction paradigm. Because of this, it's a fantastic fit for companies that deal with online transaction processing or online analytical processing.
* These businesses require database systems that can manage a large number of tiny transactions at the same time. Invalid states must be treated with zero tolerance.
ACID stands for :
* Atomic : Each transaction is either properly carried out, or the process halts and the database reverts back to the state before the transaction started. This ensures that all data in the database is valid.
* Consistent : A processed transaction will never damage the structural integrity of the database.
* Isolated : No two transactions will collide with each other; none of the instructions from concurrently executing transactions should interfere with one another.
* Durable : Once a transaction has committed, its changes must survive failures: if a failure occurs, then after the system recovers the committed data is still present and there should not be any data loss.
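Atomicity ("all or nothing") can be sketched with a transaction that works on a private copy of the data and publishes it only if every step succeeds. This is a toy model, assuming a flat in-memory state; real database systems use logs and locks rather than copying the whole state, and the account names and helpers below are hypothetical.

```javascript
// Toy sketch of atomicity (all-or-nothing). A "transaction" runs its operations
// on a private copy of the data and publishes the copy only if every step
// succeeds; if any step throws, the original state is left untouched (rollback).
// Accounts and helper names are hypothetical; real DBMSs use logs and locks.

function runTransaction(db, operations) {
  const working = { ...db.state }; // shallow copy is enough: state is flat
  try {
    for (const op of operations) op(working);
    db.state = working; // commit: all changes become visible together
    return "committed";
  } catch (err) {
    return "rolled back: " + err.message; // db.state was never modified
  }
}

function debit(account, amount) {
  return (state) => {
    if (state[account] < amount) throw new Error("insufficient funds in " + account);
    state[account] -= amount;
  };
}

function credit(account, amount) {
  return (state) => {
    state[account] += amount;
  };
}

const bank = { state: { A: 100, B: 50 } };
console.log(runTransaction(bank, [debit("A", 30), credit("B", 30)]), bank.state);
// committed { A: 70, B: 80 }
console.log(runTransaction(bank, [credit("B", 500), debit("A", 500)]), bank.state);
// rolled back: insufficient funds in A { A: 70, B: 80 }
```

In the second call the credit to B had already been applied to the working copy when the debit failed, yet B is unchanged afterwards: that partial work is exactly what atomicity discards.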
BASE Properties
* The rise of NoSQL databases provided a flexible and fluid way to manipulate data. As a result, a new database model was designed, reflecting these properties.
* The acronym BASE is not as straightforward as ACID. BASE stands for :
* Basically Available : Rather than enforcing immediate consistency, BASE-modelled NoSQL databases ensure availability of data by spreading and replicating it across the nodes of the database cluster.
* Soft State : Due to the lack of immediate consistency, data values may change over time. The BASE model breaks off with the concept of a database which enforces its own consistency, delegating that responsibility to developers.

* Eventually Consistent : The fact that BASE does not enforce immediate consistency does not mean that it never achieves it. However, until it does, data reads are still possible.
ACID vs. BASE : Which one is better ?
* It's impossible to provide a definitive response to the question of which database model is superior. As a result, all components of the project must be considered while making a decision.
* ACID-compliant databases will be a better option for individuals that seek consistency, predictability, and reliability due to their highly structured nature.
* Those who prioritise expansion will likely choose the BASE model, which allows for simpler scaling and more flexibility. BASE, on the other hand, requires developers who are familiar with the model's restrictions.
Handling Transactions
* NoSQL database transactions work similarly to transactions in other databases. To use a transaction, you start a MongoDB session through the driver, and then you use that session to execute your group of commands. You can then perform inserts, updates, and reads across multiple documents, multiple collections, and across globally sharded clusters within the transaction scope, knowing that they will be executed in ACID compliance.
* Transactions in NoSQL databases like MongoDB do have a few limitations :
o You can't read from any of the system collections
o You can't write to capped collections
o You can't write to collections that aren't created already
o You can't modify or drop collections or indexes
Consistency and Eventual Consistency : The term consistency refers to database consistency, and it relates to the retrieval of data from the database at any moment.
* Consistency simply means the data must be strongly consistent at all times. All the server nodes across the world should contain the same value for an entity at any point in time. One way to implement this behaviour is by locking down the nodes while they are being updated.
Eventual Consistency
* Eventual consistency is a consistency model that enables the data store to be highly available. It is also known as optimistic replication and is key to distributed systems.
* Suppose we use multiple replicas of a database to store data and a write request comes to one of the replicas. In such a situation, the database needs a strategy to make this write request reach the other replicas, so that they can also apply the write and become consistent.
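One simple strategy for propagating writes between replicas is sketched below. It is a toy model, assuming three replicas that each accept writes locally and tag values with a timestamp; a background "anti-entropy" round then copies the newest value everywhere (last-writer-wins). Timestamps are passed in explicitly to keep the example deterministic, and all names are hypothetical. Real systems use more careful conflict handling.

```javascript
// Toy sketch of eventual consistency with last-writer-wins (LWW) replication.
// Each replica accepts writes locally; an anti-entropy round exchanges state so
// every replica ends up holding the value with the newest timestamp.

class Replica {
  constructor(name) {
    this.name = name;
    this.value = null;
    this.ts = 0;
  }
  // Apply a write only if it is newer than what this replica already has.
  write(value, ts) {
    if (ts > this.ts) {
      this.value = value;
      this.ts = ts;
    }
  }
}

// One round of gossip: every replica offers its state to every other replica.
function antiEntropy(replicas) {
  for (const a of replicas) {
    for (const b of replicas) b.write(a.value, a.ts);
  }
}

const replicas = ["r1", "r2", "r3"].map((n) => new Replica(n));
replicas[0].write("Maharashtra", 1); // this write reaches r1 only
replicas[2].write("Goa", 2);         // a later write reaches r3 only

console.log(replicas.map((r) => r.value)); // [ 'Maharashtra', null, 'Goa' ]  (inconsistent)
antiEntropy(replicas);
console.log(replicas.map((r) => r.value)); // [ 'Goa', 'Goa', 'Goa' ]  (converged)
```

Between the writes and the anti-entropy round, reads from different replicas return different values; once writes stop and propagation completes, all reads agree, which is what "eventually consistent" promises.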
4.3 TYPES OF NOSQL DATABASES
NoSQL databases are all quite different from SQL databases.
They all use a data model that has a different structure than the traditional row and column table model used with relational database management systems (RDBMSs).
But NoSQL databases are all quite different from each other as well.
Let's discuss a few of them below.
Key-value data store : The simplest type of NoSQL database is a key-value store. Every data element in the database is stored as a key-value pair consisting of an attribute name (or "key") and a value.
In a sense, a key-value store is like a relational database with only two columns: the key or attribute name (such as State) and the value (such as Maharashtra), as below.
"State" : "Maharashtra"
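The "two-column" picture of a key-value store maps directly onto a hash map. The sketch below is a minimal in-memory illustration, assuming a JavaScript `Map` as the storage; the class and method names are hypothetical, not any product's API.

```javascript
// Minimal in-memory key-value store: every entry is a (key, value) pair and
// the only access path is by key. Names are illustrative only.

class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
  delete(key) {
    return this.data.delete(key);
  }
}

const kv = new KeyValueStore();
kv.put("State", "Maharashtra");
console.log(kv.get("State"));   // Maharashtra
console.log(kv.get("Country")); // null (no such key)
```

The store treats values as opaque: it can fetch "the value for State" quickly, but it cannot answer "which keys have the value Maharashtra" without scanning everything, which is why richer models such as document databases exist.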
Document database and Column Family Data store
A document database stores data in JSON, BSON, or XML documents (not Word documents or Google Docs, of course). In a document database, documents can be nested, and particular elements can be indexed for faster querying.
Document databases are popular with developers because they have the flexibility to rework their document structures as needed to suit their application, shaping their data structures as their application requirements change over time.
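A nested document, and a way of reaching inside it, can be shown in plain JavaScript. The sketch assumes a made-up book document (the publisher fields are invented for illustration) and a small hypothetical helper `getPath` that resolves a dot-separated path such as `publisher.address.city` or `tags.1`, similar in spirit to the dot notation MongoDB uses for embedded fields and array elements.

```javascript
// A nested document, as it might be stored in a document database, and a
// small helper that resolves a dot-separated path. The publisher fields are
// invented; getPath is a hypothetical helper, not a MongoDB API.

const book = {
  title: "Complete Guide to DBMS",
  authors: ["Desai"],
  publisher: {
    name: "Example Press",
    address: { city: "Mumbai", pin: "400001" },
  },
  tags: ["nosql", "mongodb"],
};

function getPath(doc, path) {
  let current = doc;
  for (const part of path.split(".")) {
    // Stop if we try to step inside something that is not an object or array.
    if (current === null || typeof current !== "object") return undefined;
    current = current[part]; // works for object keys and array indices
  }
  return current;
}

console.log(getPath(book, "publisher.address.city")); // Mumbai
console.log(getPath(book, "tags.1"));                 // mongodb
console.log(getPath(book, "publisher.phone"));        // undefined
```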
A column store is arranged as a group of columns, whereas a relational database stores data in rows and reads data row by row.
This means that if you just need to analyse a few columns, you can read those columns directly without wasting memory on irrelevant data. Because the values in a column are frequently of the same type, they benefit from more efficient compression, which speeds up reads. The values of a column in a columnar database can also be easily aggregated.
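The row versus column layouts can be compared on a tiny table. In this sketch the data values are invented and `toColumns` is a hypothetical helper: the same records are stored once as an array of rows and once as one array per column. Summing `sales` in the column layout only touches the `sales` array, whereas the row layout walks every complete record.

```javascript
// Sketch: the same table stored row-wise and column-wise. Data values are
// invented; toColumns is a hypothetical helper.

const rows = [
  { id: 1, city: "Mumbai", sales: 120 },
  { id: 2, city: "Pune", sales: 80 },
  { id: 3, city: "Mumbai", sales: 200 },
];

// Convert the row layout into a column layout: one array per column.
function toColumns(records) {
  const cols = {};
  for (const rec of records) {
    for (const [k, v] of Object.entries(rec)) {
      if (!cols[k]) cols[k] = [];
      cols[k].push(v);
    }
  }
  return cols;
}

const columns = toColumns(rows);
// { id: [1, 2, 3], city: ["Mumbai", "Pune", "Mumbai"], sales: [120, 80, 200] }

const rowSum = rows.reduce((acc, r) => acc + r.sales, 0);    // visits whole records
const colSum = columns.sales.reduce((acc, v) => acc + v, 0); // visits one array

console.log(rowSum, colSum); // 400 400
```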
4.3.1 Comparison of NoSQL Databases w.r.t CAP Theorem and ACID Properties
Due to a mismatch between the in-memory data structures of applications and the relational data structure, application developers faced many problems. By using NoSQL databases, developers do not need to convert in-memory structures to relational structures. Hence, they also use NoSQL databases as an integration point for the application.
Relational databases were not designed to run efficiently on clusters.
The storage requirement is growing day by day and the solution is moving towards distributed systems.
Organizations are shifting to NoSQL databases to achieve higher scalability, higher speed, and continuous availability.

4.3.2 RDBMS To NoSQL Database w.r.t ACID and BASE
* RDBMS systems are not designed to scale out: they handle things like foreign keys and maintain relations over the entire data set. The problem is handling this data, along with its foreign-key relationships, across a large set of machines.
* According to CAP, only two properties out of three can be achieved. If consistency is an absolute requirement, we have to give up availability whenever a network partition occurs. Because RDBMSs follow ACID (Atomicity, Consistency, Isolation, Durability), they are difficult to scale.
4.3.3 Features of NoSQL Database
* The need for Speed : Whenever a fast response time is required, the data should be placed in memory. In this case, when a very fast response time is required, we have to choose a database that stores the data in memory.
* The need of Scale : With the increased number of users and data volumes, organizations require databases which are easily scalable.
* Need for Continuous Availability : Slow performance can drive a customer away, and nothing is worse than downtime. There is a difference between the high availability that an RDBMS offers with a master-slave architecture and the continuous availability that NoSQL databases like Cassandra offer, with no downtime because redundant copies of data are spread throughout a cluster across multiple locations.
* Need for Location Independence : The ability to serve data quickly to multiple locations is critical. Because of its fundamental master-slave design, an RDBMS struggles to provide fast read access to many locations.
4.4 MULTIPLE CHOICE QUESTIONS
Q. 4.1 MongoDB can be used as a ________, taking advantage of load balancing and data replication features over multiple machines for storing files.
(a) AMS (b) EMS
(c) File system (d) None of the mentioned
Ans. : (a)
Q. 4.2 MongoDB has been adopted as ________ software by a number of major websites and services.
(a) frontend (b) backend
(c) proprietary (d) All of the mentioned
Ans. : (b)
Q. 4.3 Which of the following is a NoSQL Database Type?
(a) SQL (b) Document databases
(c) JSON (d) All of the mentioned
Ans. : (b)
Q. 4.4 Which of the following is a wide-column store?
(a) Cassandra (b) Riak
(c) MongoDB (d) Redis
Ans. : (a)
Q. 4.5 Why is MongoDB known as the best NoSQL database?
(a) Document Oriented (b) Rich Query language
(c) High Performance (d) All of the mentioned
Ans. : (d)
Q. 4.6 Explain the structure of ObjectID in MongoDB.
(a) ObjectID is a 10-byte BSON type
(b) ObjectID is a 12-byte BSON type
(c) ObjectID is a 20-byte BSON type
(d) None of the mentioned
Ans. : (b)
Q. 4.7 Which of the following languages is MongoDB written in?
(a) JavaScript (b) C
(c) C++ (d) All of the mentioned
Ans. : (d)
Q. 4.8 What is the aim of NoSQL?
(a) Not suitable for storing structured data.
(b) Allow storing non-structured data.
(c) New data format to store large datasets
(d) An alternative to SQL databases to store textual data
Ans. : (c)
Q. 4.9 The core principle of NoSQL is
(a) Low availability (b) High availability
(c) Both A and B (d) None of the above
Ans. : (b)
Q. 4.10 Which architecture does NoSQL follow?
(a) Shared Memory
(b) Shared Nothing
(c) Shared Disk
(d) Shared Nothing Architecture
Ans. : (d)
Q. 4.11 Which of the following is a NoSQL Database Type?
(a) SQL (b) JSON
(c) Document databases (d) All of the above
Ans. : (c)
Q. 4.12 Which of the following is a primary classification for NoSQL architectures?
(a) Document & Graph Database
(b) Key / value database
(c) column-oriented database
(d) All of the Above
Ans. : (c)
Q. 4.13 What are the disadvantages of NoSQL?
(a) NoSQL is not compatible with SQL.
(b) In order to support ACID, developers will have to implement their own code, making their systems more complex.
(c) NoSQL databases don't have the reliability functions which Relational Databases have
(d) All of the above
Ans. : (d)
Q. 4.14 Consider a collection posts which has fields: _id, post_text, post_author, post_timestamp, post_tags etc. Which of the following queries retrieves ONLY the key named post_text from the first document retrieved?
(a) db.posts.find({},{_id:0, post_text:1})
(b) db.posts.findOne({post_text:1})
(c) db.posts.findOne({},{post_text:1})
(d) db.posts.findOne({},{_id:0, post_text:1})
Ans. : (d)
Q. 4.15 What is true about Replication?
(a) Replication is the process of synchronizing data across multiple servers.
(b) Replication provides redundancy and increases data availability with multiple copies of data on different database servers.
(c) Replication protects a database from the loss of a single server.
(d) All of the above
Ans. : (d)
Q. 4.16 In MongoDB client, how to initiate a new replica set?
(a) rs.initiate() (b) rs.conf()
(c) rs.status() (d) rs.new()
Ans. : (a)
Q. 4.17 ________ is the process of storing data records across multiple machines and it is MongoDB's approach to meeting the demands of data growth.
(a) Sharding (b) Config Servers
(c) Query Routers (d) Projection
Ans. : (a)
Q. 4.18 A single replica set has a limitation of?
(a) 10 Nodes (b) 12 Nodes
(c) 8 Nodes (d) Infinite Nodes
Ans. : (b)
Q. 4.19 Which of the following is true about why to use Sharding?
(a) In replication, all writes go to master node
(b) Memory can't be large enough when active dataset is big
(c) Vertical scaling is too expensive
(d) All of the above
Ans. : (d)
Q. 4.20 What does the following aggregate query perform?
db.posts.aggregate([ { $match : { likes : { $gt : 100, $lte : 200 } } }, { $group : { _id : null, count : { $sum : 1 } } } ]);
(a) Calculates the number of posts with likes between 100 and 200
(b) Groups the posts by number of likes (101, 102, 103) by adding 1 every time
(c) Fetches the posts with likes between 100 and 200 and sets their _id as null
(d) Fetches the posts with likes between 100 and 200, sets the _id of the first document as null and then increments it 1 every time
Ans. : (a)
Q. 4.21 Which of the following aggregation commands in MongoDB does not support sharded collections?
(a) aggregate (b) mapReduce
(c) group (d) All of the above
Ans. : (c)
Q. 4.22 ________ is a binary serialization format used to store documents and make remote procedure calls in MongoDB.
(a) BSON (b) GridFS
(c) JSON (d) None of the mentioned
Ans. : (a)
Q. 4.23 Point out the correct statement.
(a) ObjectIds are small, likely unique, fast to generate, and ordered 12-byte hexadecimal numbers
(b) ObjectIds are large, likely unique, and ordered
(c) ObjectId values consist of 18 bytes
(d) ObjectId values consist of 8 bytes
Ans. : (a)
Q. 4.24 Which of the following data types is deprecated?
(a) Double (b) String
(c) Object (d) Undefined
Ans. : (d)
Q. 4.25 In the mongo shell, you can access the creation time of the ObjectId using the ________ method.
(a) getTime() (b) getTimestamp()
(c) Timestamp() (d) None of the mentioned
Ans. : (b)
Q. 4.26 What is eventual consistency?
(a) At any time, the system is linearizable
(b) At any time, concurrent reads from any node return the same values
(c) If writes stop, all reads will return the same value after a while
(d) If writes stop, a distributed system will become consistent
Ans. : (c)
Q. 4.27 ________ are operations that process data records and return computed results.
(a) ReplicaAgg (b) SumCalculation
(c) Aggregations (d) None of the mentioned
Ans. : (c)
Q. 4.28 Point out the wrong statement.
(a) Map-reduce cannot have a finalize stage to make final modifications to the result
(b) Map-reduce is less efficient and more complex than the aggregation pipeline
(c) Specifically, a user with the userAdmin role can grant itself any privilege in the database
(d) All of the mentioned
Ans. : (a)
Q. 4.29 The aggregation pipeline can use ________ to improve its performance during some of its stages.
(a) indexes (b) OptmData
(c) functions (d) All of the mentioned
Ans. : (a)
Q. 4.30 MongoDB uses the ________ notation to access the elements of an array and to access the fields of an embedded document.
(a) Dot (b) Array
(c) Nested Sets (d) None of the mentioned
Ans. : (a)
Q. 4.31 MongoDB indexes use a ________ data structure.
(a) Hash (b) Map
(c) B-tree (d) Red Black tree
Ans. : (c)
Q. 4.32 MongoDB uses ________ indexes to index the content stored in arrays.
(a) single key (b) multi key
(c) compkey (d) None of the mentioned
Ans. : (b)
Q. 4.33 A replica set can have only ________ primary.
(a) One (b) Two
(c) Three (d) Many
Ans. : (a)
Q. 4.34 MongoDB supports sharding through the configuration of a sharded ________.
(a) shapes (b) clusters
(c) clusters (d) Databases
Ans. : (b)
Chapter Ends...
MODULE 5
CHAPTER 5 NoSQL using MongoDB
NoSQL using MongoDB : Introduction to MongoDB Shell, Running the MongoDB shell, MongoDB client, Basic operations with MongoDB shell, Basic Data Types, Arrays, Embedded Documents.
Querying MongoDB using find() functions, advanced queries using logical operators and sorting, simple aggregate functions, saving and updating document.
MongoDB Distributed environment : Concepts of replication and horizontal scaling through sharding in MongoDB.
5.1 NoSQL using MongoDB ........ 5-2
5.1.1 MongoDB Client ........ 5-4
5.1.2 Comparative Analysis of SQL Database Objects and NoSQL Database Objects ........ 5-5
5.1.3 Basic Operations with MongoDB Shell ........ 5-5
5.1.4 Basic Data Types in MongoDB ........ 5-10
5.1.5 Arrays ........ 5-11
5.2 Querying MongoDB using find() functions ........ 5-12
5.2.1 Sorting in MongoDB ........ 5-19
5.2.2 MongoDB Distributed Environment ........ 5-21
5.2.2(A) Replication in MongoDB ........ 5-22
5.2.2(B) Sharding Components ........ 5-22
5.2.3 Benefits of Sharding over Replication
5.3 Descriptive Questions ........ 5-23
5.4 Multiple Choice Questions
Chapter Ends
>> 5.1 NOSQL USING MONGODB
* Like other database management systems such as MySQL and Oracle, MongoDB offers excellent performance, scalability, and availability for database management.
* MongoDB is a widely used NoSQL database that stores data as JSON-like documents (internally in a binary format called BSON). This document model is what gives MongoDB its scalability and flexibility.
Introduction to MongoDB Shell
* The mongo shell is an interactive JavaScript interface to MongoDB. You can use the mongo shell to
query and update data as well as perform administrative operations.
* The mongo shell is included as part of the MongoDB server installation. If you have already installed the
server, the mongo shell is installed to the same location as the server binary.
(Screenshot: the mongo shell running in a Windows command prompt)
* Here you can see in the above image that when we type 3 + 4, the MongoDB shell, which is JavaScript enabled, shows the result of the addition: 3 + 4 = 7.
Running the MongoDB shell
* Let's see how to start the shell and get connected with the MongoDB database.
* After a successful download you can get connected to the server, but before that it is necessary that the MongoDB server instance is running and has started successfully. You can verify that the MongoDB server instance "mongod" is running on the machine. Afterwards, open a command prompt, navigate to the MongoDB installation directory up to the bin folder, and type the "mongo" command; your client will get connected to the MongoDB server.
* Let's see how to start the MongoDB database from the binary distribution on a Windows machine.
> Step 1 : Open a command prompt and navigate to the MongoDB installation directory up to the bin folder as shown below.
C:\Users\admin>E:
E:\>cd E:\SKN DATA\DBMS Lab\mongodb-windows-64-3.4.9\mongodb_3.4\bin
E:\SKN DATA\DBMS Lab\mongodb-windows-64-3.4.9\mongodb_3.4\bin>
> Step 2 : It is necessary to start the MongoDB server first before running any client. The client and server instances are as follows :
(1) Server Instance : mongod
(2) Client Instance : mongo
* Let's start the server. Before that, create a folder on the hard drive and pass its path with the server start command so that future work will be stored in the same directory.
(Screenshot: cmd.exe window running mongod.exe --dbpath E:/student)
* You can see two red boxes in the above image; the upper red box shows the command we need to type for starting the server. The same command is as below :
mongod.exe --dbpath E:/student
* Here mongod.exe is the executable that starts the server instance mongod, and E:/student is the folder created on the hard drive, which is passed as the data directory while starting the server.
> Step 3 : Now that we have started the MongoDB server in step 2, keep the same command prompt running, open a new command prompt to run the client, and start the client with the following command : "mongo.exe student".
(Screenshot: mongo.exe student connecting to 127.0.0.1:27017/student, followed by startup warnings that access control is not enabled for the database and that Hotfix KB2731284 or later update is not installed)
* Here in the above image you can see that we have started the client with the command mongo.exe student. It connects because our server is running on port number 27017 on localhost, and the database named student is selected. Everything is fine.
* Hence we have successfully started the MongoDB server and the client (the mongo shell). Let's try various CRUD operations on the MongoDB database in the next section.
5.1.1 MongoDB Client
* As we have seen above, the server is started in one command prompt with the proper command and, if everything is fine, it is started on localhost on port no. 27017. Now it is time to run the client: it opens a connection with the MongoDB server running on that port and, if the connection is successful, we get access to the databases stored in the data directory selected at the time of starting the server.
* We can start the client instance mongo by just executing mongo.exe in a separate command prompt and, while calling it, naming the database we need to connect to.
* Command is : >mongo.exe Student
* In the above image, the command shown in the red box starts the MongoDB client instance, which connects to the MongoDB server already running on port 27017 on localhost.
5.1.2 Comparative Analysis of SQL Database Objects and NoSQL Database Objects
* The comparative analysis of the various database objects of SQL databases and the objects that replace them in NoSQL databases is as below :
SQL Databases | NoSQL Databases
Database | Database
Table | Collection
Rows/Records/Tuples | Documents
Primary key | ObjectId (_id) field for each document
SQL Join | Embedded Documents
* As per the above, the various objects are identified with respect to SQL and NoSQL databases.
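The last row of the table (SQL Join versus Embedded Documents) can be illustrated with plain JavaScript objects. The table and field names below are hypothetical: the "relational" version keeps books and authors in two arrays linked by `author_id` and combines them with a join-like function, while the document version embeds the author inside the book so that a single read returns everything.

```javascript
// Illustration with hypothetical table and field names: data that SQL keeps in
// two tables and combines with a JOIN can be held in MongoDB as one document
// with an embedded sub-document.

// Relational shape: two "tables" linked by author_id.
const authorsTable = [{ author_id: 1, name: "Korth" }];
const booksTable = [{ book_id: 1, title: "Introduction to DBMS", edition: 6, author_id: 1 }];

// Roughly: SELECT b.*, a.name FROM books b JOIN authors a ON b.author_id = a.author_id
function joinBooksWithAuthors(books, authors) {
  return books.map((b) => {
    const a = authors.find((x) => x.author_id === b.author_id);
    return { ...b, author: a ? a.name : null };
  });
}

// Document shape: the author is embedded, so reading a book needs no join.
const bookDocument = {
  _id: 1,
  title: "Introduction to DBMS",
  edition: 6,
  author: { name: "Korth" },
};

console.log(joinBooksWithAuthors(booksTable, authorsTable)[0].author); // Korth
console.log(bookDocument.author.name);                                 // Korth
```

Embedding suits data that is read together; if the same author appeared in many books and changed often, keeping a separate collection and a reference could still be the better choice.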
5.1.3 Basic Operations with MongoDB Shell
* After the client starts successfully, the server end shows that one connection to the server is open, and then we can start executing commands through the MongoDB client.
* A few basic administrative MongoDB database commands are as mentioned below :
(1) To display the version of the mongo shell we are using (db.version() returns the version of the server instead) :
MongoDB Enterprise > version();
3.4.9
(2) To display the help manual for MongoDB commands you can use :
db.help();
It also shows the help options for collection methods in the following way :
db.<Collection_name>.help();
(3) To display the list of databases in MongoDB :
show dbs;
or
show databases;
(4) To display the list of collections in the current database :
show collections;
(5) To display the list of users of the current database :
show users;
(6) To display the various roles of the users in the current database :
show roles;
(7) To create a new database in MongoDB : let's create the Books database.
use Books
This command will create the Books database in MongoDB and select it as the current database. Please note one thing here: until a collection is created in the empty database, it will not be displayed in the list shown by the show dbs command.
(8) To create a collection in the database we may use the command below :
db.createCollection("Collection Name");
MongoDB Enterprise > db.createCollection("DBMS_Books");
{ "ok" : 1 }
MongoDB Enterprise >
Here we have created a new collection in the Books database we created above. Now execute the show dbs as well as the show collections commands and notice the difference: the Books database is now displayed in the list.
(Screenshot: Commands to see new collection created in database. Running show dbs; and then show collections;, which lists the new collection DBMS_Books)
(1) So far we have created a database and seen different database operations; now let's try to insert data into the collection we created, named DBMS_Books. Note one thing, as discussed above: when we insert data in MongoDB it is inserted as a document, just like inserting rows in SQL databases. Let's see a few examples.
MongoDB Enterprise > db.DBMS_Books.insert({Rook_id : 2, Book_Name : "Complete Guide to DBMS", Author : "Desai", Edition : 4});
WriteResult({ "nInserted" : 1 })

As you can see above, we have inserted one document into the MongoDB database. An insert operation is written as db.Collection_name.insert({}). First comes the db object, which refers to the currently selected database. Next comes the name of the collection into which we want to insert the records. This is followed by insert() and the data in the form of key : value pairs. The key : value pairs are written inside curly brackets {}, and this document is passed inside the parentheses () of insert().
(2) Display the content of the collection.

To display the content of any collection we can execute the following command.
db.Collection_name.find()
MongoDB Enterprise >db.DBMS_Books.find();
{ "_id" : ObjectId("60e7a339632c042c266f6cb7"), "Rook_id" : 1, "Book_Name" : "Instoduction to DBMS", "Author" : "Korth", "Edition" : 6 }
{ "_id" : ObjectId("60e7a377632c042c266f6cb8"), "Rook_id" : 2, "Book_Name" : "Complete Guide to DBMS", "Author" : "Desai", "Edition" : 4 }
Here you can see that the data we added to the collection has been inserted as a document, and when we display it, it appears as shown above. The most important point is that every document has an _id field with an ObjectId value. An ObjectId is a 12-byte value, displayed as 24 hexadecimal characters. It is generated automatically for every document that is inserted without an _id of its own.
This 12-byte ObjectId is unique. It is a combination of different pieces of information, such as:
_id: ObjectId(4 bytes timestamp,
3 bytes machine id,
2 bytes process id,
3 bytes incrementer)
Newer MongoDB versions keep the 4-byte timestamp and the 3-byte counter but describe the middle 5 bytes as a single random value that is unique to the machine and process.
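In both layouts described above, the first 4 bytes of an ObjectId hold its creation time in seconds since the Unix epoch. This means the time a document was inserted can be read straight from its _id. The following is a minimal plain JavaScript (Node.js) sketch of that idea. objectIdTimestamp() is a hypothetical helper written for illustration. Inside the mongo shell, the built-in ObjectId(...).getTimestamp() serves the same purpose.

```javascript
// Sketch: read the creation time out of an ObjectId's 24-character hex form.
// The first 4 bytes (8 hex characters) are a big-endian count of seconds
// since the Unix epoch; the remaining 8 bytes are not decoded here.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("an ObjectId is 12 bytes = 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes
  return new Date(seconds * 1000);               // JS dates count milliseconds
}

// ObjectIds of the two DBMS_Books documents shown above
const first = objectIdTimestamp("60e7a339632c042c266f6cb7");
const second = objectIdTimestamp("60e7a377632c042c266f6cb8");
console.log(first.toISOString());     // 2021-07-09T01:15:37.000Z
console.log((second - first) / 1000); // 62 (seconds between the two inserts)
```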
(3) Display the documents in a formatted way
db.collection_name.find().pretty();
The pretty() function displays the content of the documents as key : value pairs in a formatted way, as shown below:
MongoDB Enterprise >db.DBMS_Books.find().pretty();

{
    "_id" : ObjectId("60e7a339632c042c266f6cb7"),
    "Rook_id" : 1,
    "Book_Name" : "Instoduction to DBMS",
    "Author" : "Korth",
    "Edition" : 6
}
{
    "_id" : ObjectId("60e7a377632c042c266f6cb8"),
    "Rook_id" : 2,
    "Book_Name" : "Complete Guide to DBMS",
    "Author" : "Desai",
    "Edition" : 4
}
As we can see, when the pretty() function is used along with find(), the data is displayed in a formatted way.
5.1.4 Basic Data Types in MongoDB
(1) String : String is the most commonly used data type to store data. Strings in MongoDB must be valid UTF-8.
(2) Integer : The Integer type is used to store a whole-number value. Integers can be stored as 32-bit or 64-bit values; in the mongo shell, NumberInt() and NumberLong() create them explicitly.
(3) Boolean : The Boolean type is used to store a Boolean (true/false) value.
(4) Double : The Double type is used to store floating-point values.
(5) Min/Max keys : The Min/Max key types are used to compare a value against the lowest and highest BSON elements.
(6) Arrays : The Array type is used to store an array (list) of multiple values under one key.
(7) Timestamp : The Timestamp type stores a timestamp. It can be handy for recording when a document has been modified or added.
(8) Object : This data type is used for embedded documents.
(9) Null : This type is used to store a Null value.
(10) Symbol : The Symbol data type is used identically to a string; however, it is generally reserved for languages that use a specific symbol type.
(11) Date : This data type is used to store the current date or time in UNIX time format (milliseconds since the Unix epoch). You can specify your own date and time by creating a Date object and passing the year, month and day into it.
(12) Object ID : This data type is used to store the document's ID.
(13) Binary data : This data type is used to store binary data.
(14) Code : This data type is used to store JavaScript code in the document.
(15) Regular expression : This data type is used to store regular expressions.
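The Date type is easiest to understand from JavaScript, on which the mongo shell is built. This plain Node.js sketch (not shell code) shows three things: the year, month, day argument order, the zero-based month, and the millisecond count since the Unix epoch that a date value reduces to. The date used is the AdmissionDate that appears in the Student documents later in this chapter.

```javascript
// Plain Node.js sketch, not the mongo shell: build a date from its parts.
// Arguments go year, month, day, hour, minute, second, millisecond, and the
// month is zero-based, so 8 means September.
const admission = new Date(Date.UTC(2017, 8, 13, 17, 37, 9, 22));

// The stored value is a count of milliseconds since 1970-01-01T00:00:00Z,
// which is also how BSON represents a Date (as a 64-bit integer).
console.log(admission.getTime());     // milliseconds since the Unix epoch
console.log(admission.toISOString()); // 2017-09-13T17:37:09.022Z

// The same instant parsed from the ISODate string shown in the Student documents
const parsed = new Date("2017-09-13T17:37:09.022Z");
console.log(parsed.getTime() === admission.getTime()); // true
```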
5.1.5 Arrays
In NoSQL databases like MongoDB, data is organized and stored in collections, and a collection contains documents. A document has fields and values (key-value pairs), like in JSON. The field types are either scalar data types (such as string, number and date) or composite data types (such as arrays and objects). We can use scalar values as the elements of an array-like data structure in MongoDB. Let's see one example of how we can add data to a MongoDB array.
Example
Consider the previous example: suppose we have the DBMS_Books collection with 2 documents in it.
[Screenshot: MongoDB Enterprise > db.DBMS_Books.find().pretty(); showing the two documents]
Now we will add one more document to the collection, containing details of a book that has more than one author. For example, to insert the details of a Java book that has 2 authors, we can store the author information in an array.
[Screenshot: inserting the Java book document with an Author array, followed by the find().pretty() output of the collection]
As shown above, we can use arrays in MongoDB databases to insert data where a field needs more than one value. In the example above, a single book has more than one author.
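In a query, an equality condition on an array field matches when the array contains the value, so a search for one author also finds the Java book. The sketch below models that rule in plain JavaScript over the three DBMS_Books documents. matchesScalar() is a hypothetical helper that covers only scalar query values. The shell equivalent would be db.DBMS_Books.find({Author: "Adam"}).

```javascript
const books = [
  { Rook_id: 1, Book_Name: "Instoduction to DBMS", Author: "Korth", Edition: 6 },
  { Rook_id: 2, Book_Name: "Complete Guide to DBMS", Author: "Desai", Edition: 4 },
  { Book_id: 3, Book_Name: "Introduction to Java", Author: ["Ghosling", "Adam"], Edition: 5 },
];

// For a scalar query value: the field must equal it, or, if the field
// holds an array, the array must contain it.
function matchesScalar(doc, field, value) {
  const v = doc[field];
  return Array.isArray(v) ? v.includes(value) : v === value;
}

const byAdam = books.filter((b) => matchesScalar(b, "Author", "Adam"));
const byKorth = books.filter((b) => matchesScalar(b, "Author", "Korth"));
console.log(byAdam.map((b) => b.Book_Name));  // [ 'Introduction to Java' ]
console.log(byKorth.map((b) => b.Book_Name)); // [ 'Instoduction to DBMS' ]
```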
Embedded Documents
MongoDB provides a useful feature known as embedded or nested documents.
Embedded or nested documents are documents that contain another document inside them.
In other words, when a collection has a document, this document contains another document, that document contains another sub-document, and so on; such documents are known as embedded/nested documents.
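To make the idea concrete, here is a plain JavaScript sketch of a book with an embedded Publisher document that itself embeds an Address document. The publisher values and the getPath() helper are hypothetical. The helper follows a dotted path. This is the same dot notation (for example "Publisher.Address.City") that MongoDB queries use to reach fields inside embedded documents.

```javascript
const book = {
  Book_Name: "Complete Guide to DBMS",
  Publisher: {                                  // embedded document
    Name: "Example Press",                      // hypothetical value
    Address: { City: "Mumbai", Pin: "400001" }, // a document inside a document
  },
};

// Follow a dotted path one key at a time; yield undefined if a level is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (cur, key) => (cur !== null && typeof cur === "object" ? cur[key] : undefined),
    doc
  );
}

console.log(getPath(book, "Publisher.Address.City")); // Mumbai
console.log(getPath(book, "Publisher.Phone"));        // undefined
// Shell form of such a query (not run here):
// db.DBMS_Books.find({ "Publisher.Address.City": "Mumbai" })
```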
5.2 QUERYING MONGODB USING FIND() FUNCTIONS
In MongoDB, the find() method is used to select documents in a collection and return a cursor to the selected documents. A cursor is a pointer to the documents: when we use the find() method, it returns a cursor on the selected documents and returns them one by one.
If we want a cursor on all documents, we call find() with empty parentheses (), which returns all documents one by one. find() takes only optional parameters. The first optional parameter is the selection criteria (query) for which we want a cursor; to return all documents in a collection, pass an empty document ({}) or omit it. The second optional parameter is the projection, which specifies the fields to return from the matching documents.
We have already gone through many SQL and NoSQL operations and discussed finding information in a table in a variety of ways. Now let's see the different ways to fetch data from MongoDB collections.
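The cursor idea can be sketched in plain JavaScript: find() filters the collection and returns an object that releases documents one at a time. MiniCursor and find() below are hypothetical stand-ins that mimic the hasNext()/next() pair of mongo shell cursors. They support only simple equality criteria and are not the MongoDB API itself.

```javascript
class MiniCursor {
  constructor(docs) {
    this.docs = docs;
    this.pos = 0; // index of the next document to hand out
  }
  hasNext() {
    return this.pos < this.docs.length;
  }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.docs[this.pos++];
  }
}

// An empty query document {} has no conditions, so every document is selected.
function find(collection, query = {}) {
  const selected = collection.filter((doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
  return new MiniCursor(selected);
}

const DBMS_Books = [
  { Rook_id: 1, Author: "Korth" },
  { Rook_id: 2, Author: "Desai" },
];
const cur = find(DBMS_Books, {});
while (cur.hasNext()) console.log(cur.next().Author); // Korth, then Desai
```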
(1) Selecting the information from the collection
Syntax : db.collection.find(query, projection)
MongoDB Enterprise >db.DBMS_Books.find().pretty();
{
    "_id" : ObjectId("60e7a339632c042c266f6cb7"),
    "Rook_id" : 1,
    "Book_Name" : "Instoduction to DBMS",
    "Author" : "Korth",
    "Edition" : 6
}
{
    "_id" : ObjectId("60e7a377632c042c266f6cb8"),
    "Rook_id" : 2,
    "Book_Name" : "Complete Guide to DBMS",
    "Author" : "Desai",
    "Edition" : 4
}
{
    "_id" : ObjectId("60ebec6bfd1a80d08c3cd489"),
    "Book_id" : 3,
    "Book_Name" : "Introduction to Java",
    "Author" : [
        "Ghosling",
        "Adam"
    ],
    "Edition" : 5
}
In the above query we have fetched all the documents from the DBMS_Books collection. As already discussed, the pretty() function displays the documents in a formatted way.
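The projection argument of find(query, projection) can be illustrated with a small plain JavaScript sketch. project() is a hypothetical helper that handles only the inclusion form (field: 1) plus _id: 0. As in MongoDB, _id is returned unless it is explicitly excluded. In the shell, the second call corresponds to db.DBMS_Books.find({}, {Book_Name: 1, _id: 0}).

```javascript
function project(doc, projection) {
  const out = {};
  if (projection._id !== 0 && "_id" in doc) out._id = doc._id; // _id kept by default
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const doc = {
  _id: "60e7a339632c042c266f6cb7",
  Rook_id: 1,
  Book_Name: "Instoduction to DBMS",
  Author: "Korth",
  Edition: 6,
};

console.log(project(doc, { Book_Name: 1, Author: 1 })); // _id, Book_Name and Author only
console.log(project(doc, { Book_Name: 1, _id: 0 }));    // { Book_Name: 'Instoduction to DBMS' }
```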
(2) To find specific documents in the collection that satisfy a condition, we can use the command below:
MongoDB Enterprise >db.Student.find({Marks: 56}).pretty();
{
    "_id" : ObjectId("59b96d9d3fca9f8e61527676"),
    "StudnetName" : "Pramod",
    "Section" : "c",
    "Marks" : 56,
    "AdmissionDate" : ISODate("2017-09-13T17:40:45.128Z")
}
In the above example we want to find the students who have secured 56 marks in the examination. That is why we have written the condition Marks : 56 in the find function. It is in the form of a key : value pair, and the result shows the student matching the specified criteria.
(2) Demonstrate the use of findOne() : Like the find() function, findOne() selects documents, but it returns only one document: the first document that matches. Here, with no condition, it returns the first document in the collection.
MongoDB Enterprise > db.Student.findOne();
{
    "_id" : ObjectId("59b2d7196f0568336449e0c9"),
    "StudentName" : "Tarun",
    "Section" : "A",
    "Marks" : 105,
    "Subject" : [ ],
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
(3) Sort the documents in ascending or descending order :
The sort() function is used for sorting the documents in ascending or descending order, as shown below.
Sorting documents in ascending order :
Syntax : db.collection.find().sort({key: 1})
Sorting documents in descending order :
Syntax : db.collection.find().sort({key: -1})
For example, let's display the documents in ascending order of Marks.
MongoDB Enterprise >db.Student.find().pretty().sort({Marks : 1});

fal Tech-Neo Publications._A SACHIN SHAH Venture
advance Database Management System (MU-Sem 5-Comp. NoSQL using MongoDB)....Page no. 5-15
{
    "_id" : ObjectId("59b96d9d3fca9f8e61527676"),
    "StudnetName" : "Pramod",
    "Section" : "c",
    "Marks" : 56,
    "AdmissionDate" : ISODate("2017-09-13T17:40:45.128Z")
}
{
    "_id" : ObjectId("59b96d863fca9f8e61527675"),
    "StudnetName" : "Atish",
    "Section" : "B",
    "Marks" : 78,
    "AdmissionDate" : ISODate("2017-09-13T17:40:22.257Z")
}
{
    "_id" : ObjectId("59b2d7266f0568336449e0ca"),
    "StudentName" : "Saurabh",
    "Section" : "A",
    "Marks" : 95,
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
{
    "_id" : ObjectId("59b2d7196f0568336449e0c9"),
    "StudentName" : "Tarun",
    "Section" : "A",
    "Marks" : 105,
    "Subject" : [ ],
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
(4) We can use the limit() function to restrict the output of the find() function: it displays only the number of documents specified in limit(). For example, if we want to display only 2 documents from the collection, we can use the limit function.
We have the Student collection, and the documents in the collection are as below:
MongoDB Enterprise > db.Student.find().pretty();
{
    "_id" : ObjectId("59b2d7196f0568336449e0c9"),
    "StudentName" : "Tarun",
    "Section" : "A",
    "Marks" : 105,
    "Subject" : [ ],
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
{
    "_id" : ObjectId("59b2d7266f0568336449e0ca"),
    "StudentName" : "Saurabh",
    "Section" : "A",
    "Marks" : 95,
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
{
    "_id" : ObjectId("59b96d863fca9f8e61527675"),
    "StudnetName" : "Atish",
    "Section" : "B",
    "Marks" : 78,
    "AdmissionDate" : ISODate("2017-09-13T17:40:22.257Z")
}
{
    "_id" : ObjectId("59b96d9d3fca9f8e61527676"),
    "StudnetName" : "Pramod",
    "Section" : "c",
    "Marks" : 56,
    "AdmissionDate" : ISODate("2017-09-13T17:40:45.128Z")
}
Let's use the limit(2) function to display the first 2 documents from the collection:
MongoDB Enterprise >db.Student.find().limit(2).pretty();
{
    "_id" : ObjectId("59b2d7196f0568336449e0c9"),
    "StudentName" : "Tarun",
    "Section" : "A",
    "Marks" : 105,
    "Subject" : [ ],
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
{
    "_id" : ObjectId("59b2d7266f0568336449e0ca"),
    "StudentName" : "Saurabh",
    "Section" : "A",
    "Marks" : 95,
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}

This is how we can use the limit() function to limit the output of the find() function.
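limit() and skip() simply cut the stream of documents that the cursor returns. The plain JavaScript sketch below uses arrays in the natural order of the Student collection shown above. The limit() and skip() functions are hypothetical helpers. limit(0) is treated as "no limit", which matches MongoDB's behaviour. On the server, skip is applied before limit, whichever order the two methods are chained in.

```javascript
const students = [
  { name: "Tarun", Marks: 105 },
  { name: "Saurabh", Marks: 95 },
  { name: "Atish", Marks: 78 },
  { name: "Pramod", Marks: 56 },
];

// limit(0) means "no limit" in MongoDB, so it is handled specially here.
const limit = (docs, n) => (n === 0 ? docs.slice() : docs.slice(0, n));
const skip = (docs, n) => docs.slice(n);

console.log(limit(students, 2).map((s) => s.name)); // [ 'Tarun', 'Saurabh' ]
console.log(skip(students, 2).map((s) => s.name));  // [ 'Atish', 'Pramod' ]
// Like find().skip(1).limit(2): skip first, then limit
console.log(limit(skip(students, 1), 2).map((s) => s.name)); // [ 'Saurabh', 'Atish' ]
```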
(5) Display the records whose marks are greater than 80.
MongoDB Enterprise > db.Student.find({Marks : {$gt : 80 }}).pretty();
{
    "_id" : ObjectId("59b2d7196f0568336449e0c9"),
    "StudentName" : "Tarun",
    "Section" : "A",
    "Marks" : 105,
    "Subject" : [ ],
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
{
    "_id" : ObjectId("59b2d7266f0568336449e0ca"),
    "StudentName" : "Saurabh",
    "Section" : "A",
    "Marks" : 95,
    "AdmissionDate" : ISODate("2017-09-13T17:37:09.022Z")
}
To display the records of students whose marks are below 80, just use the following:
db.Student.find({Marks : {$lt : 80 }}).pretty();
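Each comparison operator is a test applied to the field value, and when several operators appear in one condition document, all of them must hold. This plain JavaScript sketch evaluates such conditions over the Marks of the four Student documents. OPS and matches() are hypothetical helpers that cover only $eq, $gt, $gte, $lt and $lte on numbers. They ignore MongoDB's rules for comparing values of different types.

```javascript
const OPS = {
  $eq: (a, b) => a === b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// Every operator inside one condition document must hold (implicit AND).
function matches(doc, field, condition) {
  if (!(field in doc)) return false; // these numeric comparisons skip missing fields
  return Object.entries(condition).every(([op, value]) => OPS[op](doc[field], value));
}

const students = [
  { name: "Tarun", Marks: 105 },
  { name: "Saurabh", Marks: 95 },
  { name: "Atish", Marks: 78 },
  { name: "Pramod", Marks: 56 },
];
const names = (cond) => students.filter((s) => matches(s, "Marks", cond)).map((s) => s.name);

console.log(names({ $gt: 80 }));            // [ 'Tarun', 'Saurabh' ]
console.log(names({ $lt: 80 }));            // [ 'Atish', 'Pramod' ]
console.log(names({ $gte: 78, $lte: 95 })); // [ 'Saurabh', 'Atish' ]
```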
Advanced queries using logical operators and sorting
• We can use logical operators in MongoDB whenever a query needs a logical relation between conditions, such as AND and OR. The syntax for using the logical operators is shown below.
• Let's select the students whose marks are greater than 50 and less than 80.
$and operator in MongoDB :
MongoDB Enterprise >db.Student.find({Marks: { $gt: 50, $lt: 80 }}).pretty();
{
    "_id" : ObjectId("59b96d863fca9f8e61527675"),
    "StudnetName" : "Atish",
    "Section" : "B",
    "Marks" : 78,
    "AdmissionDate" : ISODate("2017-09-13T17:40:22.257Z")
}
{
    "_id" : ObjectId("59b96d9d3fca9f8e61527676"),
    "StudnetName" : "Pramod",
    "Section" : "c",
    "Marks" : 56,
    "AdmissionDate" : ISODate("2017-09-13T17:40:45.128Z")
}
Here the AND is implicit: conditions listed together in one query document must all be true. The same query written with the explicit $and operator is db.Student.find({$and: [{Marks: {$gt: 50}}, {Marks: {$lt: 80}}]}).pretty();
Similarly, we can use $gte for greater than or equal to and $lte for less than or equal to.
Logical Operator - $not
The MongoDB $not operator performs a logical NOT operation on the given operator expression. It selects the documents that do not match the expression, including documents that do not contain the field specified in the expression at all.
Syntax
{ field: { $not: { <operator-expression> } } }
If we want to select all documents from the Student collection in which the marks of the student are at least 50 (that is, not less than 50), the following MongoDB command can be used :
>db.Student.find( {"Marks": { $not: {$lt : 50}}}).pretty();
Note that this query also returns any document that has no Marks field.
$or operator in MongoDB :
Let's select the students whose marks are either 56 or 78 and display the content in a formatted way.
MongoDB Enterprise >db.Student.find({$or: [{Marks : 56}, {Marks : 78}]}).pretty();
{
    "_id" : ObjectId("59b96d863fca9f8e61527675"),
    "StudnetName" : "Atish",
    "Section" : "B",
    "Marks" : 78,
    "AdmissionDate" : ISODate("2017-09-13T17:40:22.257Z")
}
{
    "_id" : ObjectId("59b96d9d3fca9f8e61527676"),
    "StudnetName" : "Pramod",
    "Section" : "c",
    "Marks" : 56,
    "AdmissionDate" : ISODate("2017-09-13T17:40:45.128Z")
}
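The three logical operators can be modelled as functions that combine simpler conditions. In this plain JavaScript sketch, and(), or(), not() and field() are hypothetical helpers. An extra hypothetical document without Marks is included to show that $not also selects documents that lack the field, as described above.

```javascript
const cmp = { $gt: (a, b) => a > b, $lt: (a, b) => a < b, $eq: (a, b) => a === b };

// A basic field condition fails when the field is missing.
const field = (name, cond) => (doc) =>
  name in doc && Object.entries(cond).every(([op, v]) => cmp[op](doc[name], v));
const and = (...preds) => (doc) => preds.every((p) => p(doc)); // all clauses
const or = (...preds) => (doc) => preds.some((p) => p(doc));   // at least one clause
const not = (pred) => (doc) => !pred(doc);                     // inverts the clause

const students = [
  { name: "Tarun", Marks: 105 },
  { name: "Saurabh", Marks: 95 },
  { name: "Atish", Marks: 78 },
  { name: "Pramod", Marks: 56 },
  { name: "NoMarks" }, // hypothetical document without a Marks field
];
const names = (pred) => students.filter(pred).map((s) => s.name);

console.log(names(and(field("Marks", { $gt: 50 }), field("Marks", { $lt: 80 })))); // [ 'Atish', 'Pramod' ]
console.log(names(or(field("Marks", { $eq: 56 }), field("Marks", { $eq: 78 }))));   // [ 'Atish', 'Pramod' ]
console.log(names(not(field("Marks", { $lt: 50 })))); // includes 'NoMarks', which has no Marks field
```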
5.2.1 Sorting in MongoDB
By sorting, we can arrange the documents in MongoDB in ascending or descending order, depending upon the requirements.
db.Student.find().sort({Marks : 1}).pretty()
We have already executed the above statement in the previous point and have seen that Marks : 1 displays the documents in ascending order of marks.
db.Student.find().sort({Marks : -1}).pretty()
With Marks : -1, the above statement displays the documents in descending order of marks.
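Sorting by a key document such as { Marks: -1 } amounts to a comparator whose sign is flipped by the direction value. The plain JavaScript sketch below applies a hypothetical sortBy() helper to the four Student documents and then takes the first two results. This is the same idea as db.Student.find().sort({Marks : -1}).limit(2).

```javascript
const students = [
  { name: "Tarun", Marks: 105 },
  { name: "Saurabh", Marks: 95 },
  { name: "Atish", Marks: 78 },
  { name: "Pramod", Marks: 56 },
];

// Build a comparator from a sort document; 1 = ascending, -1 = descending.
function sortBy(docs, spec) {
  const keys = Object.entries(spec);
  return [...docs].sort((a, b) => {
    for (const [key, dir] of keys) {
      if (a[key] < b[key]) return -dir;
      if (a[key] > b[key]) return dir;
    }
    return 0; // equal on every sort key
  });
}

console.log(sortBy(students, { Marks: 1 }).map((s) => s.Marks)); // [ 56, 78, 95, 105 ]
const topTwo = sortBy(students, { Marks: -1 }).slice(0, 2);       // sort, then limit(2)
console.log(topTwo.map((s) => s.name));                           // [ 'Tarun', 'Saurabh' ]
```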
Simple aggregate functions
Aggregation operations process data records and return computed results. They combine values from several documents into a single result and can perform a number of operations on the gathered data. MongoDB performs aggregation using three approaches: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
Aggregation
MongoDB's aggregation pipeline framework is modelled on the concept of data processing pipelines. During aggregation, documents enter a multi-stage pipeline that transforms them into aggregated results.
[Screenshot: a simple aggregation query on the Student collection, run in the mongo shell]
In the above example we have executed a simple aggregation operation. In the first stage of the query, $match selects the records that match the given key, here {Section: "A"}, so only the records with Section A are passed on. In the second stage, these records are grouped together and the result is aggregated by taking the sum of the marks of the students in Section A.
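The two stages can be mimicked in plain JavaScript to check the arithmetic. The shell query in the comment is an assumption about what the screenshot showed: a $match stage followed by a $group stage with $sum. The code itself only filters and sums arrays and does not call MongoDB.

```javascript
// Assumed shell form of the pipeline (not run here):
//   db.Student.aggregate([
//     { $match: { Section: "A" } },
//     { $group: { _id: "$Section", totalMarks: { $sum: "$Marks" } } }
//   ])
const students = [
  { StudentName: "Tarun", Section: "A", Marks: 105 },
  { StudentName: "Saurabh", Section: "A", Marks: 95 },
  { StudnetName: "Atish", Section: "B", Marks: 78 },
  { StudnetName: "Pramod", Section: "c", Marks: 56 },
];

// Stage 1 ($match): keep only the documents of Section A.
const matched = students.filter((s) => s.Section === "A");

// Stage 2 ($group with $sum): one output document per Section value.
const totals = new Map();
for (const s of matched) {
  totals.set(s.Section, (totals.get(s.Section) || 0) + s.Marks);
}
const result = [...totals].map(([id, total]) => ({ _id: id, totalMarks: total }));
console.log(result); // [ { _id: 'A', totalMarks: 200 } ]
```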
Saving and updating documents
save() method
The db.collection.save() method is used to update an existing document or insert a new document, depending on its document parameter.
Syntax
db.collection.save(<document>, { writeConcern: <document> })
Parameters
Name | Description | Required / Optional
document | A document to save to the collection. | Required
writeConcern | A document expressing the write concern. Omit to use the default write concern. | Optional
Example: Save a New Document without Specifying an _id Field.
In the following example, the save() method performs an insert, since the document passed to the method does not contain the _id field :
>db.invoice.save( { inv_no: "1001", inv_date: "10/10/2020", ord_qty: 200 } );
WriteResult({ "nInserted" : 1 })
During the insert, the shell creates the _id field with a unique ObjectId value.
Example: Replace an Existing Document
The invoice collection contains the following document:
{ "_id" : 1001, "inv_no" : "1001", "inv_date" : "10/10/2020", "ord_qty" : 200 }
The save() method performs an update with upsert: true, since the document contains an _id field. The existing document with _id 1001 is replaced by the new one:
>db.invoice.save( { _id: 1001, inv_no: "00015", inv_date: "15/10/2020", ord_qty: 500 } );
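The decision that save() makes can be sketched with a JavaScript Map keyed by _id. The save() function here is hypothetical, and a simple counter stands in for real ObjectId generation. It models the rule described above: if there is no _id, save() inserts; if there is an _id, it replaces the whole document, inserting it when no document with that _id exists.

```javascript
let counter = 0; // stands in for ObjectId generation

function save(collection, doc) {
  if (doc._id === undefined) {
    const stored = { _id: `generated-${++counter}`, ...doc };
    collection.set(stored._id, stored);
    return "inserted";
  }
  const existed = collection.has(doc._id);
  collection.set(doc._id, { ...doc }); // the whole document is replaced, not merged
  return existed ? "replaced" : "upserted";
}

const invoice = new Map();
console.log(save(invoice, { inv_no: "1001", inv_date: "10/10/2020", ord_qty: 200 }));            // inserted
console.log(save(invoice, { _id: 1001, inv_no: "1001", inv_date: "10/10/2020", ord_qty: 200 })); // upserted
console.log(save(invoice, { _id: 1001, inv_no: "00015", inv_date: "15/10/2020", ord_qty: 500 })); // replaced
console.log(invoice.get(1001).ord_qty, invoice.size); // 500 2
```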
Update Operations
Update operations modify existing documents in a collection. MongoDB provides the following methods to update documents of a collection:
• db.collection.updateOne()
• db.collection.updateMany()
• db.collection.replaceOne()
In MongoDB, update operations target a single collection. All write operations in MongoDB are atomic at the level of a single document.
You can specify criteria, or filters, that identify the documents to update. These filters use the same syntax as read operations.
MongoDB Enterprise>db.Student.updateOne({StudentName :"Atish"}, {$set : {Marks : 70}});
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
With this update statement we update the marks of the student Atish; the $set operator sets the new value of the Marks field.
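updateOne() with $set touches only the first matching document and only the listed fields. The plain JavaScript sketch below is a hypothetical updateOne() helper that returns counts shaped like the acknowledgement above. Setting a field to the value it already has gives modifiedCount 0, which is also what MongoDB reports.

```javascript
const students = [
  { StudentName: "Tarun", Marks: 105 },
  { StudentName: "Atish", Marks: 78 },
];

function updateOne(docs, filter, update) {
  // Only the first document whose fields equal the filter values is touched.
  const target = docs.find((d) => Object.entries(filter).every(([k, v]) => d[k] === v));
  if (!target) return { acknowledged: true, matchedCount: 0, modifiedCount: 0 };
  let changed = false;
  for (const [k, v] of Object.entries(update.$set || {})) {
    if (target[k] !== v) {
      target[k] = v; // $set overwrites just this field; other fields are left alone
      changed = true;
    }
  }
  return { acknowledged: true, matchedCount: 1, modifiedCount: changed ? 1 : 0 };
}

console.log(updateOne(students, { StudentName: "Atish" }, { $set: { Marks: 70 } }));
// { acknowledged: true, matchedCount: 1, modifiedCount: 1 }
console.log(updateOne(students, { StudentName: "Atish" }, { $set: { Marks: 70 } }));
// { acknowledged: true, matchedCount: 1, modifiedCount: 0 }
```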
5.2.2 MongoDB Distributed Environment
MongoDB belongs to a new generation of databases designed for scalability. With a technique called "sharding", you can distribute data and grow your deployment over inexpensive hardware or in the cloud. One benefit of scaling with MongoDB is that sharding is automatic and built into the database. This relieves developers from having to build sharding logic into the application code to scale out the system. The concepts of replication and of horizontal scaling through sharding in MongoDB are explained below.
5.2.2(A) Replication in MongoDB
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets
provide redundancy and high availability, and are the basis for all production deployments. This section
introduces replication in MongoDB as well as the components and architecture of replica sets.
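MongoDB keeps replica set members in step by having the secondaries copy and apply the primary's operation log (oplog) asynchronously. The toy model below uses hypothetical Member and ReplicaSet classes, not MongoDB internals. The separate replicate() call stands for the lag between a write on the primary and its arrival on the secondaries.

```javascript
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map();
    this.applied = 0; // how many oplog entries this member has applied
  }
  apply(op) {
    this.data.set(op.key, op.value);
  }
}

class ReplicaSet {
  constructor(secondaryCount) {
    this.primary = new Member("primary");
    this.secondaries = Array.from({ length: secondaryCount }, (_, i) => new Member(`secondary${i + 1}`));
    this.oplog = []; // ordered record of the writes made on the primary
  }
  write(key, value) {
    const op = { key, value };
    this.primary.apply(op); // only the primary accepts writes
    this.primary.applied++;
    this.oplog.push(op);
  }
  replicate() {
    // Each secondary copies and applies the entries it has not seen yet.
    for (const s of this.secondaries) {
      while (s.applied < this.oplog.length) s.apply(this.oplog[s.applied++]);
    }
  }
}

const rs = new ReplicaSet(2);
rs.write("Atish", 78);
rs.write("Atish", 70);
console.log(rs.secondaries[0].data.get("Atish")); // undefined (not yet replicated)
rs.replicate();
console.log(rs.secondaries.map((s) => s.data.get("Atish"))); // [ 70, 70 ]
```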
• Sharding is the process of splitting a large data set into smaller chunks spread across multiple MongoDB instances in a distributed environment.
• MongoDB sharding provides a scalable solution for storing a large amount of data across a number of servers rather than on a single server.
• In practical terms, it is not feasible to store exponentially growing data on a single machine. Querying a huge amount of data stored on a single server could lead to high resource utilization and may not provide satisfactory read and write throughput.
• Basically, there are two types of scaling methods for handling growing data in a system:
(1) Vertical (2) Horizontal
• Vertical scaling works by enhancing the performance of a single server: adding more powerful processors, upgrading RAM, or adding more disk space to the system. However, vertical scaling has practical limits with existing technology and hardware configurations.
• Horizontal scaling works by adding more servers and distributing the load across them. Since each machine handles a subset of the whole data set, it provides better efficiency and is more cost-effective than deploying high-end hardware. However, it requires additional maintenance of a complex infrastructure with a large number of servers.
5.2.2(B) Sharding Components
• Shard is a Mongo instance that handles a subset of the original data. Shards are required to be deployed as replica sets.
• Mongos is a Mongo instance that acts as an interface between a client application and a sharded cluster. It works as a query router to the shards.
• Config Server is a Mongo instance that stores the metadata and configuration details of the cluster. MongoDB requires the config servers to be deployed as a replica set.
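The roles of the config servers and mongos can be illustrated with range-based routing. A metadata table maps ranges of shard-key values (chunks, each covering [min, max)) to shards, and the router consults that table. The ranges, the shard names and the choice of Marks as the shard key are hypothetical. This plain JavaScript sketch only shows how a query on the shard key can be sent to the relevant shards instead of to all of them.

```javascript
// Metadata kept by the config servers (hypothetical): chunk ranges -> shard.
const chunks = [
  { min: -Infinity, max: 60, shard: "shardA" },
  { min: 60, max: 90, shard: "shardB" },
  { min: 90, max: Infinity, shard: "shardC" },
];

// Router role (mongos): an equality match on the shard key goes to one shard.
function route(value) {
  return chunks.find((c) => value >= c.min && value < c.max).shard;
}

// A range condition [lo, hi] goes only to the shards owning overlapping chunks.
function shardsForRange(lo, hi) {
  const hits = chunks.filter((c) => c.min <= hi && lo < c.max).map((c) => c.shard);
  return [...new Set(hits)];
}

for (const marks of [56, 78, 95, 105]) console.log(marks, "->", route(marks));
// 56 -> shardA, 78 -> shardB, 95 -> shardC, 105 -> shardC
console.log(shardsForRange(50, 80)); // [ 'shardA', 'shardB' ]
```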
5.2.3 Benefits of Sharding over Replication
(1) In replication, the primary node handles all write operations, whereas the secondary servers maintain backup copies or serve read-only operations. In sharding with replica sets, the load is distributed among a number of servers.
(2) A single replica set is limited in size (12 members in older MongoDB versions; current versions allow up to 50 members, of which at most 7 can vote), but there is no such restriction on the number of shards.
(3) Replication requires high-end hardware or vertical scaling to handle large data sets, which is too expensive compared to adding servers in sharding.
(4) In replication, read performance can be enhanced by adding more slave/secondary servers, whereas in sharding, both read and write performance are enhanced by adding more shard nodes.
5.3 DESCRIPTIVE QUESTIONS
Q.1 Explain NoSQL databases. Explain the MongoDB database in detail.
Q.2 Write short notes on:
(a) Arrays in MongoDB (b) MongoDB shell (c) Embedded documents
Q.3 Explain the different MongoDB CRUD operations.
Q.4 Explain MongoDB aggregation.
Q.5 Explain the use of operators in the MongoDB database.
Q.6 Explain the difference between save and update in MongoDB.
Q.7 Explain why sharding is needed in a distributed database.
Q.8 Explain the concepts of replication and sharding.
Q.9 Explain the use of the sort function in MongoDB.
Q.10 Explain the use of comparison operators in MongoDB.
5.4 MULTIPLE CHOICE QUESTIONS
Q. 5.1 Which of the following is not a NoSQL database?
(a) SQL Server (b) MongoDB
(c) Cassandra (d) mariadb
Ans. : (a) (MariaDB, option (d), is also a relational database.)
Q. 5.2 "Sharding" a database across many server instances can be achieved with ____.
(a) LAN (b) SAN
(c) MAN (d) All of the mentioned
Ans. : (b)
Q. 5.3 In our posts collection, which command can be used to find all the posts whose author names lie between "A" and "C" in dictionary order?
(a) db.posts.find( { post_author: { $gte: "A", $lte: "C" } } );
(b) db.posts.find( { post_author: { $gte: "C", $lte: "A" } } );
(c) db.posts.find( { post_author: { $gt: "A", $lt: "C" } } );
(d) This type of search is not supported by MongoDB. $lt and $gt operators are for numbers only.
Ans. : (a)
Q. 5.4 What is the interactive shell for MongoDB called?
(a) mongo (b) mongodb
(c) dbmong (d) none of the mentioned
Ans. : (a)
Q. 5.5 ____ provides statistics on the per-collection level.
(a) mongosniff (b) mongotop
(c) mongooplog (d) mongofiles
Ans. : (b)
Q. 5.6 ____ is a command-line tool that displays a summary list of status statistics for a currently running MongoDB instance.
(a) mongostat (b) mongotop
(c) mongooplog (d) mongofiles
Ans. : (a)
Q. 5.7 Mongo looks for a database server listening on port 27017 on the ____ interface.
(a) web (b) localhost
(c) web host (d) all of the mentioned
Ans. : (b)
Q. 5.8 After starting the mongo shell, your session will use the ____ database by default.
(a) mongo (b) master
(c) test (d) primary
Ans. : (c)
Q. 5.9 ____ command displays the list of databases.
(a) show db (b) show dbs
(c) show data (d) display dbs
Ans. : (b)
Q. 5.10 Which of the following operations is used to switch to a new database mydb?
(a) use dbs (b) use db
(c) use mydb (d) use mydbs
Ans. : (c)
Q. 5.11 Which of the following also returns a list of databases?
(a) show databases (b) show database
(c) display dbs (d) all of the mentioned
Ans. : (a)
Q. 5.12 The command to check the list of collections is ____.
(a) show collection (b) show collections
(c) show collect (d) none of the mentioned
Ans. : (b)
Q. 5.13 When you query a collection, MongoDB returns a ____ object that contains the results of the query.
(a) row (b) cursor
(c) columns (d) none of the mentioned
Ans. : (b)
Q. 5.14 Which of the following methods returns true if the cursor has documents?
(a) hasMethod() (b) hasNext()
(c) hasDoc() (d) all of the mentioned
Ans. : (b)
Q. 5.15 ____ method renders the document in a JSON-like format.
(a) displayjson (b) print
(c) printjson (d) printdoc
Ans. : (c)
Q. 5.16 Which of the following methods is called while accessing documents using the array index notation?
(a) cur.toArray() (b) cursor.toArray()
(c) doc.toArray() (d) all of the mentioned
Ans. : (b)
Q. 5.17 The ____ method limits the number of documents in the result set.
(a) limit() (b) limitOf()
(c) limitBy() (d) none of the mentioned
Ans. : (a)
Q. 5.18 Which of the following lines skips the first 5 documents in the bios collection and returns all remaining documents?
(a) db.bios.find().limit( 5 )
(b) db.bios.find().skip( 1 )
(c) db.bios.find().skip( 5 )
(d) db.bios.find().sort( 5 )
Ans. : (c)
Q. 5.19 A query may include a ____ that specifies the fields from the matching documents to return.
(a) selection (b) projection
(c) union (d) none of the mentioned
Ans. : (b)
Q. 5.20 Point out the correct statement.
(a) Secondary indexes allow applications to store a view of a portion of the collection in an efficient data structure
(b) MongoDB has full support for secondary indexes
(c) Most indexes store an ordered representation of all values of a field or a group of fields
(d) All of the mentioned
Ans. : (b)
Q. 5.21 MongoDB stores all documents in ____.
(a) tables (b) collections
(c) rows (d) all of the mentioned
Ans. : (b)
Q. 5.22 Which of the following operations adds a new document to the users collection?
(a) add (b) insert
(c) truncate (d) drop
Ans. : (b)
Q. 5.23 Which of the following preferences determines how the client directs read operations to the set?
(a) read (b) write
(c) update (d) delete
Ans. : (a)
Q. 5.24 Applications can also control the behaviour of write operations using ____ concern.
(a) read (b) write
(c) truncate (d) all of the mentioned
Ans. : (b)
Q. 5.25 Which of the following pipelines is used for aggregation in MongoDB?
(a) data processing (b) information processing
(c) knowledge processing (d) none of the mentioned
Ans. : (a)
Q. 5.26 The order of documents returned by a query is not defined unless you specify a ____.
(a) sortfind() (b) sortelse()
(c) sort() (d) none of the mentioned
Ans. : (c)
Q. 5.27 Point out the correct statement.
(a) Queries specify criteria, or conditions, that identify the documents that MongoDB returns to the clients
(b) Write operations, or queries, retrieve data stored in the database
(c) The selection limits the amount of data that MongoDB returns to the client over the network
(d) All of the mentioned
Ans. : (a)
Q. 5.28 In the aggregation pipeline, the ____ pipeline stage provides access to MongoDB queries.
(a) $catch (b) $match
(c) $batch (d) All of the mentioned
Ans. : (b)
Q. 5.29 Which of the following methods returns one document?
(a) findOne() (b) findOne1()
(c) selectOne() (d) all of the mentioned
Ans. : (a)
Q. 5.30 Which of the following queries selects documents in the records collection that match the condition { "user_id": { $lt: 42 } }?
(a) db.records.findOne( { "user_id": { $lt: 42 } }, { "history": 0 } )
(b) db.records.find( { "user_id": { $lt: 42 } }, { "history": 0 } )
(c) db.records.findOne( { "user_id": { $lt: 42 } }, { "history": 1 } )
(d) db.records.select( { "user_id": { $lt: 42 } }, { "history": 0 } )
Ans. : (b)
Q. 5.31 Which of the following functionality is used for the aggregation framework?
(a) $match (b) $project
(c) $projectmatch (d) All of the mentioned
Ans. : (b)
Q. 5.32 To hide the _id field from the result set, specify ____ in the projection document.
(a) _id: 1 (b) _id: 0
(c) _id: it (d) None of the mentioned
Ans. : (b)
Q. 5.33 Which of the following is not a projection operator?
(a) $slice (b) $elemMatch
(c) $ (d) None of the mentioned
Ans. : (d)
Q. 5.34 The ____ method returns a document that includes a metrics field.
(a) db.serverStats() (b) db.serverStatus()
(c) db.status() (d) all of the mentioned
Ans. : (b)
Q. 5.35 An index cannot cover a query on a ____ collection when run against a mongos if the index does not contain the shard key.
(a) vertical (b) sharded
(c) horizontal (d) none of the mentioned
Ans. : (b)
Q. 5.36 A ____ set is a group of mongod instances that host the same data set.
(a) copy (b) sorted
(c) radii (d) replica
Ans. : (d)
Q. 5.37 A replica set can have only ____ primary.
(a) One (b) Two
(c) Three (d) Four
Ans. : (a)
Q. 5.38 How many types of sharding exist in MongoDB?
(a) 1 (b) 2
(c) 3 (d) 4
Ans. : (b)
Q. 5.39 ____ scaling adds more CPU and storage resources to increase capacity.
(a) Horizontal (b) Vertical
(c) Partition (d) All of the mentioned
Ans. : (b)
Chapter Ends...
MODULE 6
CHAPTER 6 : Trends in Advance Databases
Temporal database: Concepts, time representation, time dimension, incorporating time in relational databases.
Graph Database: Introduction, Features, Transactions, consistency, Availability, Querying, Case Study Neo4J
Spatial database: Introduction, data types, models, operators and queries.
6.1 Temporal Databases
6.1.1 Introduction
UQ. Write a short note on Temporal data models.
6.1.2 Time Representation and Time Dimensions
6.1.3 Valid Time and Transaction Time Dimensions
6.1.4 Bi-Temporal Relation (Data Using Both Valid and Transaction Time)
6.2 Graph Database
6.2.1 Introduction
6.2.2 Features of Graph Database
6.2.3 Simple Graph
6.2.4 The BASE Consistency Model
6.2.5
6.2.6
6.2.7
6.3 Spatial Database: Introduction, Data Types, Models, Operators and Queries
6.3.1 Spatial Data Types
6.3.2 Spatial Operators
6.3.3 Models of Spatial Information
UQ. Explain different types of spatial data models.
6.4 Descriptive Questions
6.5 Multiple Choice Questions
Chapter Ends
Advance Database Management System (MU-Sem 5-Comp.) (Trends in Advance Databases)....Page no. (6-2)
6.1 TEMPORAL DATABASES
A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases provide a uniform and systematic way of dealing with historical data, e.g., medical or judicial records.
Some examples of temporal databases are given below.
• Healthcare Systems: Doctors need the patients' health history for proper diagnosis, e.g., the time a vaccination was given or the exact time when a fever went high.
• Insurance Systems: Information about claims, accident history and the time when policies are in effect needs to be maintained.
Time in Temporal Databases
There are two different views of time in temporal databases.
• Valid Time: the time period during which a fact is true in the real world. It is supplied to the system by the user.
• Transaction Time: the time period during which a fact is stored in the database. It is based on the transaction serialization order, along with the timestamp generated automatically by the system.
6.1.2 Time Representation and Time Dimensions
A database that supports temporal data is used to store and retrieve information about past states, because in many applications and systems it is important to keep a record of past events. A temporal database is a database with built-in support for handling data involving time.
Normally, a database model holds only one state, the current state of the real world, and does not store information about past states. When the state of the real world changes, the database is updated and the information about the old state is lost.
Sometimes it is also important to store and retrieve information about both current and past states. In the examples below, time is stored along with the data for later analysis:
• A patient database must store information about the medical history of each patient.
• Judicial records.
• Various sensor readings.
So we define a temporal database as a "database that stores the states of the real world across time".
Tech-Neo Publications...A SACHIN SHAH Venture

• Temporal views in databases include:
◦ Valid time
◦ Transaction time
◦ Bi-temporal data

EMP_VALID
Name | NIN | Salary | Deptno | VST | VET
VST : Valid Start Time, VET : Valid End Time

EMP_TRANSAC
Name | NIN | Salary | Deptno | TST | TET
TST : Transaction Start Time, TET : Transaction End Time

EMP_BITEMP
Name | NIN | Salary | Deptno | VST | VET | TST | TET
Fig. 6.1.1 : Temporal views on emp table
• The temporal data types are DATE (specifying Year, Month and Day as YYYY-MM-DD), TIME (specifying Hour, Minute and Second as HH:MM:SS), TIMESTAMP (specifying a Date/Time combination, with options for including sub-second divisions if they are needed), INTERVAL (a relative time duration, such as 10 days or 250 minutes) and PERIOD (an anchored time duration with a fixed start point and end point).
• A temporal database stores information about when certain events occur, or when certain facts are true. The events or facts are typically associated in the database with a single time point in some granularity.
• For example, a bank deposit event may be associated with the timestamp when the deposit was made, or the total monthly sales of a product (a fact) may be associated with a particular month (say, February 1999). Even though such events or facts may have different granularities, each is still associated with a single time value in the database. Duration events or facts, on the other hand, are associated with a specific time period in the database.
• For example, an employee may have worked in a company from August 15, 1993 until November 20, 1998. A time period is represented by its start and end time points, [start-time, end-time], so the above period is represented as [1993-08-15, 1998-11-20]. Such a time period is often used to mean the set of all time points from start-time to end-time, inclusive, in the specified granularity. Hence, assuming day granularity, the period [1993-08-15, 1998-11-20] represents the set of all days from August 15, 1993 until November 20, 1998.
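The temporal types and the closed-period interpretation described above can be mirrored with Python's standard datetime module. This is a minimal sketch, not a DBMS API: the mapping of DATE, TIME, TIMESTAMP and INTERVAL onto `date`, `time`, `datetime` and `timedelta` is an illustrative analogy, and the `Period` class is a hypothetical helper that treats both end points as included, at day granularity.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta

# Illustrative analogues of the SQL temporal types
d = date(1999, 2, 1)                            # DATE: YYYY-MM-DD
t = time(14, 30, 5)                             # TIME: HH:MM:SS
ts = datetime(1999, 2, 1, 14, 30, 5, 250000)    # TIMESTAMP with a sub-second part
iv = timedelta(days=10)                         # INTERVAL: a relative duration


@dataclass(frozen=True)
class Period:
    """Hypothetical PERIOD: an anchored duration [start, end], both ends included (day granularity)."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end

    def days(self) -> int:
        # Inclusive count: the start day and the end day both belong to the period
        return (self.end - self.start).days + 1


employment = Period(date(1993, 8, 15), date(1998, 11, 20))
print(employment.days())  # 1924 days in [1993-08-15, 1998-11-20]
```

Because the period is closed at both ends, November 20, 1998 is still inside it, while November 21, 1998 is not.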
(MU-New Syllabus w.e.f academic year 21-22)(M5-68)
6.1.3 Valid Time and Transaction Time Dimensions
• Given a particular event or fact that is associated with a particular time point or time period in the database, the association may be interpreted to mean different things. The most natural interpretation is that the associated time is the time when the event occurred, or the period during which the fact was considered to be true in the real world. If this interpretation is used, the associated time is often referred to as the valid time, and a temporal database using this interpretation is called a valid time database. A different interpretation is also possible, in which the associated time refers to the time when the information was actually stored in the database; that is, it is the value of the system time clock when the information is valid in the system. In this case, the associated time is called the transaction time.
• A temporal database using this interpretation is called a transaction time database. Other interpretations can also be intended, but these two are considered the most common ones, and they are referred to as time dimensions. In some applications only one of the dimensions is needed; in other cases both time dimensions are required, and the temporal database is then called a bitemporal database. If other interpretations of time are intended, the user can define the semantics and program the applications appropriately; this is called user-defined time.
Valid Time Example
Consider the example of a person, John. John was born on April 3, 1992 in Chennai. His father registered his birth three days later, on April 6, 1992. He did his entire schooling and college in Chennai. He got a job in Mumbai and shifted to Mumbai on June 21, 2015, but he registered his change of address only on Jan 10, 2016.
Time specifications in SQL
SQL supports the following data types for integrating time with data:
• DATE: four digits for the year (1 to 9999), two digits for the month (1 to 12) and two digits for the day (1 to 31).
• TIME: two digits for the hour, two digits for the minute and two digits for the second, plus optional fractional digits.
• TIMESTAMP: the fields of date and time, with six fractional digits for the seconds field.
Incorporating Time in Relational Databases Using Tuple Versioning
Valid Time Relations
The contents of a valid time temporal database look as shown below, with the attributes Name, City, Valid From and Valid Till.

Name | City | Valid From | Valid Till
John | Chennai | April 3, 1992 | June 20, 2015
John | Mumbai | June 21, 2015 | ∞

Fig. 6.1.2 : Valid time temporal database
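As a sketch of how a valid time relation like Fig. 6.1.2 supports a "timeslice" query (which version was true on a given day), the table can be loaded into an in-memory SQLite database. The table name `person_vt`, its column names and the `'9999-12-31'` stand-in for ∞ are assumptions made for this example. Dates are stored as ISO strings, so plain text comparison orders them correctly.

```python
import sqlite3

# Valid time relation from Fig. 6.1.2 (hypothetical table and column names).
# '9999-12-31' is an assumed sentinel standing in for the open end (infinity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_vt (name TEXT, city TEXT, valid_from TEXT, valid_till TEXT)"
)
conn.executemany(
    "INSERT INTO person_vt VALUES (?, ?, ?, ?)",
    [
        ("John", "Chennai", "1992-04-03", "2015-06-20"),
        ("John", "Mumbai", "2015-06-21", "9999-12-31"),
    ],
)


def city_on(name, day):
    """Timeslice query: where did `name` live on `day`? (Both bounds are inclusive.)"""
    row = conn.execute(
        "SELECT city FROM person_vt "
        "WHERE name = ? AND valid_from <= ? AND ? <= valid_till",
        (name, day, day),
    ).fetchone()
    return row[0] if row else None


print(city_on("John", "2010-01-01"))  # Chennai
print(city_on("John", "2015-06-21"))  # Mumbai
```

Note that the query only looks at valid time. It cannot tell when the move was recorded; that requires transaction time.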

• Let us now see how the different types of temporal databases may be represented in the relational model. First, suppose that we would like to include the history of changes as they occur in the real world.

EMP_VT
Name | Ssn | Salary | Dno | Supervisor_ssn | Vst | Vet
DEPT_VT
Dname | Dno | Total_sal | Manager_ssn | Vst | Vet
Fig. 6.1.3 : Valid Time relations emp and dept

• Consider again the emp and dept database, and assume that the granularity level is day. We can convert the two relations EMPLOYEE and DEPARTMENT into valid time relations by adding the attributes Vst (Valid Start Time) and Vet (Valid End Time), whose data type is DATE in order to provide day granularity. The relations are renamed EMP_VT and DEPT_VT respectively, as shown in Fig. 6.1.3.
• If an update is applied to the database before it becomes effective in the real world, it is called a proactive update. If the update is applied to the database after it becomes effective in the real world, it is called a retroactive update. An update that is applied at the same time as it becomes effective is called a simultaneous update.
• The action that corresponds to deleting an employee in a nontemporal database is typically applied to a valid time database by closing the current version of the employee being deleted, i.e., by setting its Vet instead of removing the tuple.
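The update classification and the "close the current version" form of deletion described above can be sketched in a few lines. In this sketch, `classify_update`, `close_current_version` and the `Smith` row are hypothetical illustrations, and `date.max` is an assumed stand-in for an open-ended Vet.

```python
from datetime import date

OPEN_END = date.max  # hypothetical stand-in for an open-ended Vet


def classify_update(applied_on: date, effective_from: date) -> str:
    """Compare when an update reaches the database with when it takes effect in the real world."""
    if applied_on < effective_from:
        return "proactive"     # recorded before it happens
    if applied_on > effective_from:
        return "retroactive"   # recorded after it has already happened
    return "simultaneous"


def close_current_version(rows: list, name: str, last_valid_day: date) -> int:
    """Temporal 'delete': close the open version instead of removing the tuple.

    Returns the number of versions that were closed.
    """
    closed = 0
    for row in rows:
        if row["name"] == name and row["vet"] == OPEN_END:
            row["vet"] = last_valid_day
            closed += 1
    return closed


# Hypothetical EMP_VT tuple at day granularity
emp_vt = [{"name": "Smith", "salary": 25000, "vst": date(2020, 1, 1), "vet": OPEN_END}]
```

John's change of address, effective June 21, 2015 but recorded on January 10, 2016, is a retroactive update: `classify_update(date(2016, 1, 10), date(2015, 6, 21))` returns `"retroactive"`. After `close_current_version(emp_vt, "Smith", date(2023, 3, 31))`, the tuple is still in the relation, but its Vet marks where its validity ends.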
Transaction Time Relations
• In a transaction time database, whenever a change is applied to the database, the actual timestamp of the transaction that applied the change (insert, delete or update) is recorded.
• Such a database is most useful when, in the majority of cases, changes are applied at the same time as they occur in the real world, for example, in real-time stock trading or banking transactions.
• If we convert the nontemporal database into a transaction time database, the two relations EMPLOYEE and DEPARTMENT become transaction time relations by adding the attributes TST (Transaction Start Time) and TET (Transaction End Time), whose data type is typically TIMESTAMP.
• A transaction time database has also been called a rollback database, because a user can logically roll back to the actual database state at any past point in time T.
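The "roll back to time T" idea above amounts to selecting the tuples whose transaction time interval covers T. Here is a minimal sketch, assuming half-open intervals [TST, TET) and using `datetime.max` as a hypothetical marker for a version that has not been superseded. The `emp_tt` rows and the `as_of` function are illustrative only.

```python
from datetime import datetime

UC = datetime.max  # hypothetical "until changed" marker: the version is still current

# Hypothetical EMP_TT versions: (name, salary, tst, tet).
# When a transaction updates Smith's salary, the old version's TET is set to that transaction's time.
emp_tt = [
    ("Smith", 25000, datetime(2020, 1, 1, 9, 0), datetime(2021, 7, 1, 10, 30)),
    ("Smith", 30000, datetime(2021, 7, 1, 10, 30), UC),
]


def as_of(rows, t):
    """Logical rollback: the database state that was current at time t (tst <= t < tet)."""
    return [(name, salary) for name, salary, tst, tet in rows if tst <= t < tet]


print(as_of(emp_tt, datetime(2021, 1, 1)))  # [('Smith', 25000)]
print(as_of(emp_tt, datetime(2022, 1, 1)))  # [('Smith', 30000)]
```

Because TST and TET are assigned by the system when transactions commit, this query answers "what did the database say at time T", not "what was true in the real world at time T".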
Implementation Considerations
• There are various options for storing the tuples of a temporal relation.
• One is to store all the tuples in the same table. Another option is to create two tables: one for the currently valid information and the other for the rest of the tuples.
Incorporating Time in Object-Oriented Databases Using Attribute Versioning
• The tuple versioning approach to implementing temporal databases has been discussed above.
• In that approach, whenever one attribute value is changed, a whole new tuple version is created, even though all the other attribute values are identical to those of the previous tuple version. An alternative approach can be used in database systems that support complex structured objects, such as object databases or object-relational systems. This approach is called attribute versioning.
• In attribute versioning, a single complex object is used to store all the temporal changes of the object. Each attribute that changes over time is called a time-varying attribute, and its values are versioned over time by adding temporal periods to the attribute.
• The temporal periods may represent valid time, transaction time or bitemporal time, depending on the application requirements.
Uni-Temporal Relations: have one axis of time, either valid time or transaction time.
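The contrast between tuple versioning and attribute versioning can be sketched with one object whose time-varying attributes each keep their own list of (value, Vst, Vet) versions. This is a hypothetical illustration under stated assumptions: the class names, the `date.max` open-end marker, and the choice to close the old version on the day before the new one starts (day granularity) are all made up for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Any, List, Optional, Tuple

OPEN = date.max  # hypothetical marker for "valid until further notice"


@dataclass
class TimeVaryingAttribute:
    """Versions of a single attribute, each tagged with a valid-time period [vst, vet]."""
    versions: List[Tuple[Any, date, date]] = field(default_factory=list)

    def assign(self, value: Any, vst: date) -> None:
        # Close the currently open version on the day before the new value becomes valid
        if self.versions and self.versions[-1][2] == OPEN:
            old_value, old_vst, _ = self.versions[-1]
            self.versions[-1] = (old_value, old_vst, vst - timedelta(days=1))
        self.versions.append((value, vst, OPEN))

    def at(self, day: date) -> Optional[Any]:
        for value, vst, vet in self.versions:
            if vst <= day <= vet:
                return value
        return None


@dataclass
class Employee:
    """One complex object; only its time-varying attributes carry versions."""
    name: str  # not time-varying
    salary: TimeVaryingAttribute = field(default_factory=TimeVaryingAttribute)
    dept: TimeVaryingAttribute = field(default_factory=TimeVaryingAttribute)


e = Employee("Smith")
e.salary.assign(25000, date(2020, 1, 1))
e.dept.assign(5, date(2020, 1, 1))
e.salary.assign(30000, date(2021, 7, 1))  # only salary gets a new version; dept is untouched
```

Under tuple versioning, the salary change would have copied Name and Dept into a whole new tuple. Here, `e.dept` still holds a single version.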
6.1.4 Bi-Temporal Relation (Data Using Both Valid and Transaction Time)
A bi-temporal database includes both valid time and transaction time. Transaction time records the time period during which a database entry is held. So the database now has four additional attributes: Valid From, Valid Till, Entered and Superseded.
The database contents look as shown below:

Name | City | Valid From | Valid Till | Entered | Superseded
John | Chennai | April 3, 1992 | June 20, 2015 | April 6, 1992 | Jan 10, 2016
John | Mumbai | June 21, 2015 | ∞ | Jan 10, 2016 | ∞

Fig. 6.1.4 : Bi-Temporal Relation
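A bi-temporal relation answers a question with two time inputs: where John lived on a given day (valid time), according to what the database held on another day (transaction time). The sketch below uses the rows of Fig. 6.1.4, taking the Mumbai row as valid until ∞, entered on January 10, 2016 and never superseded. The inclusive valid-time bounds, the half-open transaction-time bounds and the `date.max` stand-in for ∞ are assumptions of this example.

```python
from datetime import date

INF = date.max  # stands in for the "∞" entries of Fig. 6.1.4

# (name, city, valid_from, valid_till, entered, superseded)
person_bt = [
    ("John", "Chennai", date(1992, 4, 3), date(2015, 6, 20), date(1992, 4, 6), date(2016, 1, 10)),
    ("John", "Mumbai", date(2015, 6, 21), INF, date(2016, 1, 10), INF),
]


def city(name, valid_on, known_on):
    """Where did `name` live on `valid_on`, according to what the database held on `known_on`?

    Valid time is inclusive [valid_from, valid_till]; transaction time is half-open
    [entered, superseded), so a row superseded on a given day is no longer visible that day.
    """
    for n, c, vf, vt, ent, sup in person_bt:
        if n == name and vf <= valid_on <= vt and ent <= known_on < sup:
            return c
    return None
```

For example, `city("John", date(1992, 4, 4), date(1992, 4, 4))` returns `None`, because the birth was only entered on April 6, 1992. Asking the same valid-time question as of 1993 returns `"Chennai"`.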
6.2 GRAPH DATABASE
6.2.1 Introduction
• A graph database is an online database management system with Create, Read, Update and Delete (CRUD) operations working on a graph data model. Data is represented as a graph, i.e., a collection of vertices (nodes) and edges, and it is possible to store data associated with both individual nodes and individual edges.
• For example, Twitter's data can easily be represented as a graph. Consider a small network of followers. The relationships are key here in establishing the semantic context: Simran follows John, and John, in turn, follows Simran. Ruth and John likewise follow each other. All of these connections are easy to show with the help of a graph database.
• A graph is composed of two elements: a node and a relationship. Each node represents an entity (a person, place or thing), and each relationship represents how two nodes are associated.
• This general-purpose structure allows you to model all kinds of scenarios, from a system of roads, to a network of devices, to a population's medical history, or anything else defined by relationships.
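The follower network from the example can be sketched as adjacency sets, where each node keeps the set of nodes it points to. This is a plain-Python analogy of "store the connections next to the data", not the storage format of any particular product. The `mutual_follows` and `followers` helpers are hypothetical.

```python
from collections import defaultdict

# Directed FOLLOWS edges from the example: node -> set of nodes it follows
follows = defaultdict(set)
for src, dst in [("simran", "john"), ("john", "simran"), ("ruth", "john"), ("john", "ruth")]:
    follows[src].add(dst)


def mutual_follows(person):
    """Neighbours that also follow `person` back: one hop out, one hop back."""
    return {other for other in follows.get(person, set()) if person in follows.get(other, set())}


def followers(person):
    """Incoming edges: this scans every node, because only outgoing edges are indexed here."""
    return {src for src, dsts in follows.items() if person in dsts}
```

`mutual_follows` only touches the neighbours of one node, which is the kind of local traversal graph databases are built for. The `followers` scan shows why systems such as Neo4j let a relationship be traversed in either direction rather than scanning for incoming edges.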
6.2.2 Features of Graph Database
1. Performance
Your data volume will definitely increase in the future, but what will increase at an even faster clip is the connections (or relationships) between your data. Big data will definitely get bigger, but connected data will grow exponentially.
In traditional databases, relationship queries come to a grinding halt as the number and depth of relationships increase. In contrast, the performance of a graph query that traverses relationships stays roughly constant even as your data grows year over year, because the query touches only the part of the graph it traverses.
2. Flexibility
With graph databases, your IT and data architecture teams move at the speed of business, because the structure and schema of a graph data model flex as your solutions and industry change. Your team does not have to exhaustively model your domain ahead of time (and then exhaustively remodel and migrate the DB after some executive asks for a change). Instead, you can add to the existing structure without endangering current functionality.
With the graph database model, you are the one dictating changes and taking charge, whereas the RDBMS data model ties you to its tabular way of seeing the world.
3. Agility
Developing with graph technology aligns perfectly with today's agile, test-driven development practices, allowing your graph-database-backed application to evolve with your changing business requirements. Your agile team now has a database that keeps up with your daily demands.
The main building blocks of the Graph DB Data Model are:
◦ Nodes
◦ Relationships
◦ Properties
Fig. 6.2.1 : Graph DB Data Model (nodes connected by relationships)
6.2.3 Simple Graph
• Nodes are represented using circles. Relationships are represented using arrows. Relationships are directional. We can represent a node's data in terms of properties (key-value pairs).
• Each node's Id property is shown within the node's circle.
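The simple graph conventions above (nodes with an Id and key-value properties, and directional relationships that can also carry properties) can be written down as two small data classes. This is a hypothetical sketch of the property graph model, not Neo4j's internal representation. The `since` property value is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    id: int                    # the Id drawn inside the node's circle
    labels: List[str]
    properties: Dict[str, object] = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    start: Node                # relationships are directional: start -> end (the arrow)
    end: Node
    type: str
    properties: Dict[str, object] = field(default_factory=dict)


simran = Node(1, ["Person"], {"name": "Simran"})
john = Node(2, ["Person"], {"name": "John"})
rels = [
    Relationship(simran, john, "FOLLOWS", {"since": 2019}),  # hypothetical property value
    Relationship(john, simran, "FOLLOWS"),
]


def outgoing(node: Node, rel_type: str) -> List[Node]:
    """Follow the arrows that start at `node` and have the given relationship type."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]
```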
Consistency
• ACID properties mean that once a transaction is complete, the data is in a consistent state. Among NoSQL technologies, graph databases (e.g. Neo4j) use an ACID consistency model to ensure that data is consistently stored.
6.2.4 The BASE Consistency Model
In the NoSQL database world, ACID transactions are less common, because some databases have loosened the requirements for immediate consistency, data freshness and accuracy in order to gain other benefits, such as scale and resilience. The BASE model scales very well and reacts well to rapid data changes.
BASE consists of three properties:
1. Basically Available
The system is guaranteed to be available in the event of failure. Rather than enforcing immediate consistency, BASE-modelled NoSQL databases ensure the availability of data by spreading and replicating it across the nodes of the database cluster.
2. Soft State
Due to the lack of immediate consistency, data values may change over time. The BASE model breaks with the concept of a database that enforces its own consistency, delegating that responsibility to developers. Because of eventual consistency, the state of the data can change even without application interactions.
3. Eventually Consistent
The fact that BASE does not enforce immediate consistency does not mean that it never achieves it. Until it does, however, data reads are still possible, even though they might not reflect reality. The system becomes consistent eventually, once the application stops sending new input: the data is replicated to the different nodes and eventually reaches a consistent state. However, consistency is not guaranteed at the transaction level.
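The three BASE properties can be illustrated with a toy replicated store. A write is accepted by one replica immediately (basically available). Other replicas may return old values for a while (soft state). Once the queued replication messages are delivered and no new writes arrive, all replicas agree (eventually consistent). The class and key names below are hypothetical, and the sketch deliberately omits the conflict resolution (e.g. timestamps or version vectors) that a real system needs when different replicas accept conflicting writes.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy BASE-style store: a write lands on one replica and is copied to the others later."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # replication messages not yet delivered

    def write(self, replica, key, value):
        self.replicas[replica][key] = value  # accepted at once, no global coordination
        for other in self.replicas:
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, replica, key):
        return self.replicas[replica].get(key)  # may be stale until replication catches up

    def propagate(self):
        """Deliver every queued message; with no new writes, all replicas converge."""
        while self.pending:
            target, key, value = self.pending.popleft()
            self.replicas[target][key] = value

    def is_consistent(self, key):
        return len({data.get(key) for data in self.replicas.values()}) == 1


store = EventuallyConsistentStore(["A", "B", "C"])
store.write("A", "post:42:likes", 10)
stale = store.read("B", "post:42:likes")  # None: B has not received the write yet
store.propagate()
fresh = store.read("B", "post:42:likes")  # 10
```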
Example of BASE consistency model
• Marketing and customer service companies that deal with sentiment analysis will prefer the elasticity of BASE when conducting their social network research.
• Social network feeds are not well structured, but they contain huge amounts of data, which a BASE-modelled database can easily store.
The BASE consistency model is used by column-family, key-value and document stores.
6.2.5 Neo4j
Neo4j is a popular graph database. Other graph databases include Oracle NoSQL Database, OrientDB, HyperGraphDB, GraphBase, InfiniteGraph and AllegroGraph.
Querying
Neo4j has a high-level query language, Cypher. It has declarative commands for creating nodes and relationships (see Figures 24.4(a) and (b)), as well as for finding nodes and relationships by specifying patterns.
Deletion and modification of data are also possible in Cypher. We introduced the CREATE command in the previous section, so we will now give a brief overview of some of the other features of Cypher. A Cypher query is made up of clauses. When a query has several clauses, the result of one clause can be the input to the next clause in the query.
Cypher Keywords
Just as most programming languages have keywords, Cypher has a few keywords reserved for specific actions in parts of a query. We need to be able to create, read, update or delete data in Neo4j, and keywords help us accomplish that functionality.
Let us look in detail at two common keywords: A. MATCH and B. RETURN.
A. MATCH
The MATCH keyword in Cypher is used to search for an existing node, relationship, label, property or pattern in the database. Compared with SQL, MATCH works like SELECT. Using MATCH, you can find all node labels in the database, search for a particular node, find all the nodes with a particular relationship, look for patterns of nodes and relationships, and much more.
B. RETURN
The RETURN keyword in Cypher specifies what values or results you want to return from a Cypher query. You can tell Cypher to return nodes, relationships, node and relationship properties, or patterns in your query results. RETURN is not required when doing write procedures, but it is needed for reads.
The node and relationship variables discussed earlier become important when using RETURN. To bring back nodes, relationships, properties or patterns, you need to specify variables in your MATCH clause for the data you want to return.
Cypher query examples
Let us look at some examples of the syntax we have learned so far using the MATCH and RETURN keywords. Each example starts with an explanation of what we are trying to achieve, followed by an image of the results of the query run in Neo4j Browser.
Example 1
• Find the labeled Person nodes in the graph. Note that we must use a variable like p for the Person node if we want to retrieve the node in the RETURN clause.
• The query can be written in Cypher as:
MATCH (p:Person)
RETURN p
(A LIMIT clause can be appended after RETURN to cap the number of rows returned.)
Example 2
• Find Person nodes in the graph that have a name of 'Tom Hanks'. Remember that we can name our variable anything we want, as long as we reference the same name later.
• The query can be written in Cypher as:
MATCH (tom:Person {name: 'Tom Hanks'})
RETURN tom
(Link for more queries: https://neo4j.com/developer/cypher/querying/)
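To make the meaning of the two examples concrete, here is a toy in-memory analogue of `MATCH (x:Label {props}) RETURN x [LIMIT n]`. It only illustrates the semantics (filter by label, then by property equality, then optionally cap the rows). It is not how Neo4j evaluates Cypher; real queries run against a Neo4j server, e.g. through Neo4j Browser. The node data is hypothetical.

```python
# Hypothetical in-memory nodes: each has a set of labels and a property map
nodes = [
    {"labels": {"Person"}, "props": {"name": "Tom Hanks"}},
    {"labels": {"Person"}, "props": {"name": "Meg Ryan"}},
    {"labels": {"Movie"}, "props": {"title": "Cast Away"}},
]


def match(label, props=None, limit=None):
    """Rough analogue of MATCH (x:label {props}) RETURN x, with an optional LIMIT."""
    props = props or {}
    result = [
        n for n in nodes
        if label in n["labels"] and all(n["props"].get(k) == v for k, v in props.items())
    ]
    return result if limit is None else result[:limit]


people = match("Person")                        # like Example 1
tom = match("Person", {"name": "Tom Hanks"})    # like Example 2
```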
6.2.6 Neo4j Database Server Setup with Windows exe File
Step 1: Visit the Neo4j official site at https://neo4j.com/. This link takes you to the homepage of the Neo4j website.
(Screenshot: Neo4j homepage)
Step 2: As highlighted in the above screenshot, this page has a Download button on the top right-hand side. Click it.
Step 3: This will redirect you to the downloads page, where you can download the community edition and the enterprise edition of Neo4j. Download the community edition of the software by clicking the respective button.
(Screenshot: Neo4j download page showing the For Business and For Individuals options)
Step 4: This will take you to the page where you can download the community version of the Neo4j software for different operating systems. Download the file for the desired operating system.
(Screenshot: Neo4j 3.1.1 Community Edition download page)
This will download a file named neo4j-community_windows-x64_3_1_1.exe to your system, as shown in the following screenshot.

(Screenshot: Downloads folder containing neo4j-community_windows-x64_3_1_1.exe)
Step 5: Double-click the exe file to install Neo4j Server.
(Screenshot: Neo4j Community Edition Setup, "Select Destination Directory": select the folder where you would like Neo4j Community Edition to be installed, then click Next.)
Step 6: Accept the license agreement and proceed with the installation. After the process completes, you can see that Neo4j is installed on your system.
(Link for reference: https://www.tutorialspoint.com/neo4j/neo4j_environment_setup.htm)
6.2.7 Case Study on Neo4j
Cisco Systems
"Real-Time Graph Analysis of Documents Saves Company Over 4 Million Employee Hours". The sales team at Cisco Systems relies on an extensive series of documents that help them close deals with potential customers. By using Neo4j, Cisco was able to create a metadata graph that made relevant sales content findable, saving the company millions of hours of otherwise-wasted staff time.
The Company
Cisco Systems is an IT leader that designs, manufactures and sells networking equipment to enterprises and service providers, small businesses and individuals. With more than 70,000 employees in over 165 countries, they are constantly working to create and patent new networking technologies. An integral part of their DNA is creating long-lasting customer partnerships, working with customers to identify their needs and provide solutions that support their success.
The Challenge
• Because of the scope of Cisco's sales pipeline, there is a huge amount of content, such as documents, files and presentations, in the internal database that Cisco's sales team relies on to sign potential customers. However, there was a major content findability problem: each salesperson spent up to one hour every day trying to find the content relevant to their prospects' needs.
• The company was relying on a typical index-driven search engine that their employees could query with a series of keywords. But because files did not have assigned metadata, it was a challenge to pull up relevant content. The problem was too much content and no deeper understanding of that content.
The Strategy
To address their findability issue, Cisco had a big job ahead of them. They would have to assign metadata to all of their content and find a way to make conventional document browsing smarter, so that their sales team would not have to go through long, complicated routes to get to the relevant content. They would also need to assign metadata tags to a huge library of historical files and tag new documents in real time.
The Solution
• Cisco turned to Neo4j to solve these challenges. To assign metadata to the large collection of Cisco's historical documents, the first step was to transform the file types, such as Microsoft Word and PDF, into a format in which the documents could be clustered by large data platforms using latent Dirichlet allocation (LDA). Once the documents were clustered, a collection of common keywords and phrases was fed into Neo4j, where they were combined to create an ontology.
• For real-time document processing, the document is sent from the content management system to a machine tagging service, which reprocesses the document, assigns tags and adds the keywords and phrases into the Neo4j database, while returning the document to the document repository. The ability to assign metadata to historical data, and in real time, solved Cisco's content findability problem.
• But Neo4j took it one step further. Based on keywords, content ratings and the number of times a document has been accessed, Neo4j was also able to provide content recommendations, giving sellers additional information they could leverage when closing deals with customers.
The Result
• Cisco now has a robust search engine that saves their staff time and increases their ability to focus on additional customers. Searches return fewer results, which are in turn more accurate and effective. With about 20 million documents, search is done in half the time.
• Cisco created their own global sales kit that brings related content together, so their salespeople can click on any grouping of subjects. The sales kit tracks views and how often a piece of content was downloaded, and all of that rich information comes back to their system.
• Cisco's sellers now have the ability to search their vast document database and quickly provide relevant content to their customers and prospects. The company now saves over four million hours a year, which are used to engage with more prospects and close more deals.

6.3 SPATIAL DATABASE: INTRODUCTION, DATA TYPES, MODELS, OPERATORS AND QUERIES
• Spatial data represents information about the physical location and shape of geometric objects.
• Spatial data support in databases is important for efficiently storing, indexing and querying data on the basis of spatial locations. Some examples of spatial and non-spatial data are listed below.
◦ Examples of non-spatial data: names, phone numbers and email addresses of people.
◦ Examples of spatial data: census data; NASA satellite images (terabytes of data per day); weather and climate data; rivers, farms and ecological impact.
• Example: Oracle Spatial, an extension that works with the Oracle 10g DBMS, supports spatial data types (e.g. polygon) and operations (e.g. overlap) callable from the SQL3 query language, and it has spatial indices, e.g. R-trees.
6.3.1 Spatial Data Types
Spatial data is data collected about physical, real-life locations such as towns, cities and islands. Spatial data is categorized into three types: map data, attribute data and image data. These types are widely used in commercial sectors.
1. Map data
Map data includes the different types of spatial features of objects in a map, e.g. an object's shape and the location of the object within the map. The three basic types of features are points, lines and polygons (or areas).
• Points: An object is represented only by its location in space, e.g. the center of a state. Points are used to represent the spatial characteristics of objects whose locations correspond to single 2-D coordinates (x, y, or longitude/latitude) at the scale of a particular application. Examples include buildings, cellular towers and stationary vehicles. Moving vehicles and other moving objects can be represented by a sequence of point locations that change over time.
• Lines: A line represents movement through, or connections in, space, and it is shown as a sequence of points. Lines represent objects that have length, such as roads or rivers, whose spatial characteristics can be approximated by a sequence of connected line segments.
• Polygons: Polygons are used to represent the characteristics of objects that have a boundary, such as states, lakes or countries.

Fig. 6.3.1 : Map data
◦ Points: individual x, y locations. Ex.: center points of plot locations, tower locations, sampling locations.
◦ Lines: composed of many (at least 2) vertices, or points, that are connected. Ex.: roads and streams.
◦ Polygons: three or more vertices that are connected. Ex.: building boundaries and lakes.
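The three map-data feature types, and the idea of a moving object as a sequence of timed point locations, can be sketched with a few data classes that enforce the minimum vertex counts from Fig. 6.3.1. The class names, the `Track` alias and `position_at` are hypothetical. Real spatial DBMSs provide their own geometry types for this purpose.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # (x, y), or (longitude, latitude)


@dataclass
class PointFeature:
    name: str
    location: Coord


@dataclass
class LineFeature:
    name: str
    vertices: List[Coord]

    def __post_init__(self):
        if len(self.vertices) < 2:
            raise ValueError("a line needs at least 2 connected vertices")


@dataclass
class PolygonFeature:
    name: str
    boundary: List[Coord]  # the last vertex is implicitly joined back to the first

    def __post_init__(self):
        if len(self.boundary) < 3:
            raise ValueError("a polygon needs at least 3 vertices")


# A moving object as a time-ordered sequence of (timestamp, location) samples
Track = List[Tuple[int, Coord]]


def position_at(track: Track, t: int) -> Coord:
    """Last recorded location at or before time t (no interpolation between samples)."""
    earlier = [loc for ts, loc in track if ts <= t]
    if not earlier:
        raise ValueError("no location recorded at or before t")
    return earlier[-1]


tower = PointFeature("cell tower", (2.0, 3.0))
road = LineFeature("road", [(0.0, 0.0), (5.0, 0.0), (5.0, 4.0)])
lake = PolygonFeature("lake", [(1.0, 1.0), (3.0, 1.0), (2.0, 2.5)])
bus: Track = [(0, (0.0, 0.0)), (60, (1.0, 0.5)), (120, (2.0, 1.0))]
```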
2. Attribute data
Geographic Information Systems (GIS) use descriptive data associated with the features in the map.
For example, in a map representing the regions of an Indian state (e.g. Delhi), the attributes could be population, largest city/town, area in square miles, water portion of the land and so on.
3. Image data
Image data includes camera data, such as satellite images and aerial photographs, on which objects of interest, such as buildings and roads, can be identified and overlaid.
Satellite images are typical examples of raster data.
6.3.2 Spatial Operators
Spatial operators are applied to the geometric properties of objects. They are used in physical space to capture objects and show the relationships among them, and they are also used to perform spatial analysis.
Spatial operators are grouped into three categories, as given below:
A. Topological operators B. Projective operators C. Metric operators
A. Topological operators
• Topological properties do not vary when topological transformations, like translation or rotation, are applied.
• Topological operators are hierarchically structured in many levels.
◦ The base level offers operators the ability to check for detailed topological relations between regions with a broad boundary.
◦ The higher levels offer more abstract operators that allow users to query uncertain spatial data independently of the geometric data model.
• Examples: open (region), close (region) and inside (point, loop).
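As an illustration of the inside (point, loop) operator, here is a standard ray-casting test. It counts how many polygon edges a horizontal ray from the point crosses, and an odd count means the point is inside. This is one common technique, chosen here for illustration and not necessarily what any particular spatial DBMS uses. Points lying exactly on the boundary are not handled specially. The test also shows the topological property mentioned above: translating both the point and the loop does not change the answer.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def inside(p: Coord, loop: List[Coord]) -> bool:
    """Ray casting: cast a ray from p towards +x and count edge crossings; odd means inside."""
    x, y = p
    crossings = 0
    n = len(loop)
    for i in range(n):
        (x1, y1), (x2, y2) = loop[i], loop[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                crossings += 1
    return crossings % 2 == 1


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```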
B. Projective operators
• Projective operators, like convex hull, are used to establish predicates regarding the concavity or convexity of objects.
Example: having inside the object's concavity.
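One standard way to compute the convex hull is Andrew's monotone chain algorithm, used below as an assumed illustration. The helper `is_convex` is a hypothetical projective predicate built on the hull: for a simple polygon without repeated or collinear vertices, the polygon is convex exactly when every one of its vertices is a hull vertex, so a missing vertex signals a concavity.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def is_convex(polygon):
    # Assumes a simple polygon without repeated or collinear vertices.
    return set(convex_hull(polygon)) == set(polygon)


l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(convex_hull(l_shape))  # [(0, 0), (4, 0), (4, 1), (1, 4), (0, 4)]
print(is_convex(l_shape))    # False: (1, 1) lies in a concavity
```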
C. Metric operators
• The task of metric operators is to provide a more accurate description of the geometry of an object.
• They are often used to measure the global properties of single objects, and to measure the relative position of different objects in terms of distance and direction.
Example: length (of an arc) and distance (of a point to a point).
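The example metric operators can be sketched as follows, assuming planar (Cartesian) coordinates; for latitude/longitude data a great-circle formula would be needed instead. The function names `distance`, `length` and `direction` are illustrative, and direction is expressed as a compass-style bearing, which is one assumed convention among several.

```python
import math


def distance(p, q):
    # Euclidean distance between two points in a planar coordinate system.
    return math.hypot(q[0] - p[0], q[1] - p[1])


def length(arc):
    # Length of an arc approximated as a polyline: sum of its segment lengths.
    return sum(distance(a, b) for a, b in zip(arc, arc[1:]))


def direction(p, q):
    # Bearing from p to q in degrees: 0 = north (+y), 90 = east (+x).
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360


print(distance((0, 0), (3, 4)))              # 5.0
print(length([(0, 0), (3, 4), (3, 10)]))     # 11.0
print(round(direction((0, 0), (1, 0)), 6))   # 90.0
```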
Dynamic spatial operators
• Dynamic operations change the objects upon which the operations are applied.
• Create, destroy, and update are the fundamental dynamic operations.
• Example: updating a spatial object via translate, rotate, scale up or scale down, reflect, and shear.
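A minimal sketch of the dynamic operations, assuming a hypothetical in-memory store that maps an object id to its list of vertices: `create` and `destroy` add and remove objects, while `update` applies one of the geometric transformations listed above to every vertex. All names, including the object id `plot-7`, are made up for illustration.

```python
import math

# Hypothetical in-memory store: object id -> list of (x, y) vertices.
store = {}


def create(oid, vertices):
    store[oid] = list(vertices)


def destroy(oid):
    del store[oid]


def update(oid, transform):
    # Update = replace every vertex with its transformed position.
    store[oid] = [transform(x, y) for x, y in store[oid]]


def translate(dx, dy):
    return lambda x, y: (x + dx, y + dy)


def rotate(deg):
    # Counter-clockwise rotation about the origin.
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return lambda x, y: (x * c - y * s, x * s + y * c)


def scale(sx, sy):
    # sx, sy > 1 scales up; 0 < sx, sy < 1 scales down.
    return lambda x, y: (x * sx, y * sy)


def reflect_x():
    # Mirror across the x-axis.
    return lambda x, y: (x, -y)


def shear(k):
    # Horizontal shear: x shifts in proportion to y.
    return lambda x, y: (x + k * y, y)


create("plot-7", [(0, 0), (2, 0), (2, 1)])
update("plot-7", translate(1, 1))
print(store["plot-7"])  # [(1, 1), (3, 1), (3, 2)]
update("plot-7", scale(2, 2))
print(store["plot-7"])  # [(2, 2), (6, 2), (6, 4)]
destroy("plot-7")
```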
6.3.3 Models of Spatial Information
1. Field  2. Object
• Field: These models are used to model spatial data that is continuous in nature, e.g. terrain elevation, air quality index, temperature data, and soil variation characteristics.
• Object: These models have been used for applications such as transportation networks, land parcels, buildings, and other objects that possess both spatial and non-spatial attributes.
• A spatial application is modeled using either a field-based or an object-based model, depending on the requirements and the traditional choice of model for the application. Examples: high-traffic analysis systems, etc.
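The contrast between the two models can be sketched in Python: the field model below stores a hypothetical elevation surface as a grid (raster), so a value can be looked up for any point in the covered area, while the object model stores a discrete land parcel with both a spatial attribute (its boundary) and non-spatial attributes. The grid values, cell size and the `LandParcel` class are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Field model: a continuous phenomenon sampled on a regular grid (raster).
# Hypothetical elevation values in metres; cell size 1 km, origin at (0, 0).
elevation = [
    [10.0, 12.0, 15.0],
    [11.0, 14.0, 18.0],
]


def field_value(grid, x_km, y_km, cell_km=1.0):
    # Every point in the study area maps to a value: that of its grid cell.
    row, col = int(y_km // cell_km), int(x_km // cell_km)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the study area")
    return grid[row][col]


# Object model: discrete entities with spatial and non-spatial attributes.
@dataclass
class LandParcel:
    parcel_id: str
    boundary: List[Tuple[float, float]]  # spatial attribute
    attributes: Dict[str, str] = field(default_factory=dict)  # non-spatial


parcel = LandParcel("P-101", [(0, 0), (1, 0), (1, 1), (0, 1)], {"owner": "unknown"})
print(field_value(elevation, 2.5, 1.2))  # 18.0
print(parcel.attributes["owner"])
```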
• Requests for spatial data that use spatial operations are called spatial queries.
• Spatial queries can be divided as shown below:
1. Range queries: These types of spatial queries find all objects of a particular type that are within a given spatial area.
Example: Find all hospitals within the Pimpri Chinchwad area. A variation of this query is, for a given location, to find all objects within a particular distance; for example, find all banks within a 5 km range.
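Both forms of range query can be sketched as a linear scan over a few hypothetical banks placed on a local grid measured in km (the names and coordinates are invented). `window_query` returns the objects inside a rectangular area, and `within_distance` implements the "within 5 km" variant. A real spatial DBMS would normally answer such queries through a spatial index such as an R-tree instead of scanning every object.

```python
import math

# Hypothetical sample data: name -> (x, y) position in km on a local grid.
banks = {"bank_A": (1.0, 1.0), "bank_B": (4.0, 3.0), "bank_C": (9.0, 2.0)}


def window_query(objects, xmin, ymin, xmax, ymax):
    # Range query over a rectangular area: all objects inside the window.
    return sorted(name for name, (x, y) in objects.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)


def within_distance(objects, centre, radius_km):
    # Distance variant: all objects within radius_km of a given location.
    cx, cy = centre
    return sorted(name for name, (x, y) in objects.items()
                  if math.hypot(x - cx, y - cy) <= radius_km)


print(window_query(banks, 0, 0, 5, 5))          # ['bank_A', 'bank_B']
print(within_distance(banks, (8.0, 2.0), 5.0))  # ['bank_B', 'bank_C']
```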
2. Nearest neighbour queries: These types of spatial queries find the object of a particular type that is nearest to a given location.
Example: Find the nearest police station from the location of an accident.
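A nearest-neighbour query can be sketched the same way, using hypothetical police-station coordinates on a local km grid and Euclidean distance. `nearest` returns the single closest object and `k_nearest` (via `heapq.nsmallest`) the k closest; both scan all objects, whereas a spatial DBMS would typically prune the search with an index.

```python
import heapq
import math

# Hypothetical police stations: name -> (x, y) in km on a local grid.
stations = {"station_1": (2.0, 8.0), "station_2": (6.0, 1.0), "station_3": (7.0, 7.0)}


def dist_to(objects, location):
    # Key function: Euclidean distance from an object to the query location.
    lx, ly = location
    return lambda name: math.hypot(objects[name][0] - lx, objects[name][1] - ly)


def nearest(objects, location):
    # Nearest-neighbour query by linear scan.
    return min(objects, key=dist_to(objects, location))


def k_nearest(objects, location, k):
    # The k closest objects, ordered from nearest to farthest.
    return heapq.nsmallest(k, objects, key=dist_to(objects, location))


accident = (5.0, 3.0)
print(nearest(stations, accident))       # station_2
print(k_nearest(stations, accident, 2))  # ['station_2', 'station_3']
```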
3. Spatial joins or overlays: These types of spatial queries join the objects of two types based on a spatial condition, such as the objects intersecting or overlapping spatially.
Example: Find all transport cafés or nearby food places on a National Highway between two cities. This spatially joins township objects and highway objects. Find all hotels that are within 5 kilometres of a railway station. This spatially joins railway station objects and hotel objects.
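The hotels-near-railway-station example can be sketched as a nested-loop spatial join over hypothetical data on a local km grid: every station is paired with every hotel, and only pairs satisfying the spatial predicate (distance of at most 5 km) are kept. The function names are illustrative; production systems usually filter candidate pairs with bounding boxes or an index before evaluating the exact predicate.

```python
import math

# Hypothetical data on a local km grid.
stations = {"station_X": (0.0, 0.0), "station_Y": (20.0, 0.0)}
hotels = {"hotel_1": (3.0, 4.0), "hotel_2": (10.0, 0.0), "hotel_3": (18.0, 1.0)}


def spatial_join(left, right, predicate):
    # Nested-loop join: keep every (left, right) pair satisfying the predicate.
    return [(l, r) for l, lp in left.items() for r, rp in right.items()
            if predicate(lp, rp)]


def within(km):
    # Spatial predicate: the two positions are at most `km` apart.
    return lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1]) <= km


print(spatial_join(stations, hotels, within(5.0)))
# [('station_X', 'hotel_1'), ('station_Y', 'hotel_3')]
```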
4. Spatial queries: List the names of all bookstores within ten miles of a particular region in the city. List all customers who live in Maharashtra and its adjoining states.
Important application domains with spatial data and queries are listed below:
1. Army field commander: Has there been any significant enemy troop movement since last night?
2. Insurance risk manager: Which homes are most likely to be affected in the next great flood on the Mississippi?
3. Medical doctor: Based on this patient's MRI, have we treated somebody with a similar condition?
4. Mobile phone user: Where is the nearest gas station? Where is the nearest Domino's pizza shop?
Two types of spatial data are particularly important to consider for evaluation or analysis, as given below:
1. Computer Aided Design (CAD) data: It includes spatial information about how objects like buildings, cars or aircraft are designed. Other examples of computer-aided-design databases are integrated-circuit and electronic-device layouts.
2. Geographic data: It consists of data such as road maps, land-usage maps, topographic elevation maps, political maps showing boundaries, land ownership maps, and so on. Geographic information systems are special-purpose databases tailored for storing geographic data.
Spatial join
• A spatial join is a join that compares two joined objects based on a predicate on their spatial attribute values. Example: "For each river that passes through Bavaria, find all cities within less than 50 km."
• It can be written as an SQL expression, as shown below:
SELECT r.rname, c.cname, length(intersection(r.route, c.area))
FROM rivers r, cities c
WHERE r.route intersects Bavaria.area AND dist(r.route, c.area) < 50 km
Link to execute Cypher queries: https://neo4j.com/developer/cypher/querying/
6.4 DESCRIPTIVE QUESTIONS

Q.1 Define temporal database and give an example.
Q.2 How is time incorporated in a temporal database?
Q.3 Explain valid time, transaction time and bitemporal time relations with a suitable example.
Q.4 Explain the difference between temporal and spatial databases.
Q.5 Explain the data types used for spatial databases.
Q.6 What are the spatial operators?
Q.7 What is a graph database? Explain how it is represented in a database.
Q.8 Which tools support graph databases?
Q.9 What are the building blocks of the graph database model?
Q.10 What are the ways to store data in a graph database?
Q.11 What is Neo4j? Write examples of queries.
Q.12 Write a short note on temporal data model. (MU - Dec. 19)
Q.13 Explain different spatial data models. (MU - Dec. 19)
6.5 MULTIPLE CHOICE QUESTIONS
Q. 6.1 Most ___ allow the representation of simple geometric objects such as points, lines and polygons.
(a) Active database
(b) Temporal database
(c) Spatial database
(d) Deductive databases
Ans. : (c)
Q. 6.2 GIS stands for
(a) Geographic Information System
(b) Generic Information System
(c) Geological Information System
(d) Geographic Information Sharing
Ans. : (a)
Q. 6.3 GIS deals with which kind of data?
(a) Numeric data
(b) Binary data
(c) Spatial data
(d) Complex data
Ans. : (c)
Q. 6.4 By 'spatial data' we mean data that has
(a) Complex values
(b) Positional values
(c) Graphic values
(d) Decimal values
Ans. : (b)
Q. 6.5 'Spatial databases' are also known as ___
(a) Geodatabases
(b) Monodatabases
(c) Concurrent databases
(d) None of the above
Ans. : (a)
Q. 6.6 A (geographic) field is a geographic phenomenon for which, for every point in the study area,
(a) A value can be determined
(b) A value cannot be determined
(c) A value is not relevant
(d) A value is missing
Ans. : (a)
Q. 6.7 The term that means the value of a data item at a particular time is ___
(a) Temporal data
(b) Spatial data
(c) Interval data
(d) Graphical data
Ans. : (a)
Q. 6.8 Neo4j is
(a) Graph database
(b) Relational database
(c) Query language
(d) Temporal database
Ans. : (a)
Q. 6.9 Cypher is used for querying in
(a) Graph database
(b) Relational database
(c) Query language
(d) Temporal database
Ans. : (a)
Q. 6.10 Events or facts are represented in ___
(a) Graph database
(b) Relational database
(c) Query language
(d) Temporal database
Ans. : (a)
Chapter Ends...