
Excel Engineering College AI & DS


20AI303 DATABASE MANAGEMENT SYSTEMS


IMPORTANT QUESTIONS
PART A
1. Define DBMS and Data model.
Database management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.
A data model is a collection of conceptual tools for describing data, data relationships, data
semantics and consistency constraints.
Types of data models used:
➢ Hierarchical model
➢ Network model
➢ Relational model
➢ Entity-relationship
➢ Object-relational model
➢ Object model

2. What are the advantages of using a DBMS?


The advantages of using a DBMS are
a) Controlling redundancy
b) Restricting unauthorized access
c) Providing multiple user interfaces
d) Enforcing integrity constraints.
e) Providing backup and recovery

3. Define the terms


1) Physical schema
2) logical schema.
Physical schema: The physical schema describes the database design at the physical level,
which is the lowest level of abstraction describing how the data are actually stored.
Logical schema: The logical schema describes the database design at the logical level, which
describes what data are stored in the database and what relationship exists among the data.

4. What is a data dictionary?


A data dictionary is a data structure which stores metadata about the structure of the database,
i.e., the schema of the database.

5. Define data independence and what are the two levels of data independence?
The ability to modify a schema definition at one level without affecting a schema definition at
the next higher level is called data independence.
1. Physical data independence: - Is the ability to modify the physical schema without causing
application program to be rewritten.
2. Logical data independence: - Is the ability to modify the logical schema without causing
application program to be re-written.

6. Define weak and strong entity sets?



Weak entity set: Entity sets that do not have a key attribute of their own are called weak entity
sets.
Strong entity set: An entity set that has a primary key is termed a strong entity set.

7. What are the types of attributes used in ER diagram?


Single valued attributes: attributes with a single value for a particular entity are called single
valued attributes.
Multivalued attributes: Attributes with a set of values for a particular entity are called
multivalued attributes.
Stored attributes: The attributes stored in a data base are called stored attributes.
Derived attributes: The attributes that are derived from the stored attributes are called derived
attributes.
Composite attributes: Attributes that can be divided into subparts are called composite attributes.
Key attribute: An entity type usually has an attribute whose values are distinct for each
individual entity in the collection. Such an attribute is called a key attribute.

8. What is embedded SQL? What are its advantages?


Embedded SQL is a method of combining the computing power of a programming language
and the database manipulation capabilities of SQL. Embedded SQL statements are SQL
statements written in line with the program source code of the host language. The embedded
SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language
calls to a code library. The output from the preprocessor is then compiled by the host compiler.
This allows programmers to embed SQL statements in programs written in any number of
languages such as: C/C++, COBOL and Fortran
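A call-level interface such as Python's DB-API serves the same purpose as embedded SQL: host-language values are bound into SQL statements at runtime. The following sketch (using Python's built-in sqlite3 module and an illustrative employee table that is not part of these notes) shows a host variable playing the role of the `:min_salary` host variable an embedded SQL preprocessor would generate code for.

```python
import sqlite3

# In-memory database standing in for a real server (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Anu', 6000), (2, 'Bala', 4000)")

# A host-language variable bound into the SQL statement, much like a
# host variable in embedded SQL after preprocessing.
min_salary = 5000
rows = conn.execute(
    "SELECT ename FROM employee WHERE salary > ?", (min_salary,)
).fetchall()
names = [r[0] for r in rows]
```

The `?` placeholder keeps the SQL text fixed while the host variable changes between calls, which is exactly the division of labour the preprocessor establishes.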

9. Define i) Primary key ii) Foreign key


Primary key:
Primary key is chosen by the database designer as the principal means of identifying an entity
in the entity set.
Foreign key:
A relation schema r1 derived from an ER schema may include among its attributes the primary
key of another relation schema r2. This attribute is called a foreign key from r1 referencing r2.
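A minimal sketch of both keys, using Python's sqlite3 module with hypothetical dept/emp tables (SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""CREATE TABLE emp (
    eno INTEGER PRIMARY KEY,
    dno INTEGER REFERENCES dept(dno))""")
conn.execute("INSERT INTO dept VALUES (10, 'AI & DS')")
conn.execute("INSERT INTO emp VALUES (1, 10)")   # valid: dept 10 exists

# Referencing a non-existent department violates the foreign key.
try:
    conn.execute("INSERT INTO emp VALUES (2, 99)")
    fk_violated = False
except sqlite3.IntegrityError:
    fk_violated = True
```

Here emp.dno is the foreign key referencing the primary key dept.dno, so a row pointing at a missing department is rejected.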

10. What is meant by lossless-join decomposition? How can we decide whether a
decomposition is lossless?
• Let R be a relation schema.
• Let F be a set of functional dependencies on R.
• Let R1and R2 form a decomposition of R.
• The decomposition is a lossless-join decomposition of R if at least one of the following
functional dependencies is in F+ (the closure of F):
a. R1 ∩ R2 → R1
b. R1 ∩ R2 → R2
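The test can be checked on a small instance. In this sketch (a made-up relation, not from the notes) R = (A, B, C) with the FD B → C, so for R1 = (A, B) and R2 = (B, C) we have R1 ∩ R2 = {B} and B → R2; the natural join of the projections recovers the original relation exactly:

```python
# Relation r over schema R = (A, B, C); the FD B -> C holds in r.
r = {(1, 'x', 10), (2, 'x', 10), (3, 'y', 20)}

r1 = {(a, b) for (a, b, c) in r}        # projection onto R1 = (A, B)
r2 = {(b, c) for (a, b, c) in r}        # projection onto R2 = (B, C)

# Natural join of r1 and r2 on the common attribute B.
joined = {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

lossless = (joined == r)                # no spurious tuples, nothing lost
```

If the common attribute did not determine either side, the join could produce spurious tuples and the decomposition would be lossy.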

11. What is meant by normalization of data?


It is a process of analyzing the given relation schemas based on their Functional Dependencies

(FDs) and primary key to achieve the properties


➢ Minimizing redundancy
➢ Minimizing insertion, deletion and updating anomalies
The process of storing the join of higher normal form relations as base relations in a
lower normal form is known as de-normalization.

12. What is meant by functional dependencies?


Consider a relation schema R, with α ⊆ R and β ⊆ R. The functional dependency α → β holds on
relation schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[α] = t2[α], it is also the case that t1[β] = t2[β].
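The definition translates directly into a check over a relation instance. This sketch (attribute positions and the sample relation are illustrative, not from the notes) verifies whether any two tuples agreeing on α also agree on β:

```python
# Check whether the FD alpha -> beta holds in a relation instance.
# alpha and beta are tuples of attribute positions.
def fd_holds(relation, alpha, beta):
    seen = {}
    for t in relation:
        key = tuple(t[i] for i in alpha)
        val = tuple(t[i] for i in beta)
        if key in seen and seen[key] != val:
            return False            # two tuples agree on alpha, differ on beta
        seen[key] = val
    return True

# Relation over (eno, dept, dept_location): dept -> dept_location holds.
r = [(1, 'AI', 'Block-A'), (2, 'AI', 'Block-A'), (3, 'DS', 'Block-B')]
dept_determines_location = fd_holds(r, alpha=(1,), beta=(2,))
dept_determines_eno = fd_holds(r, alpha=(1,), beta=(0,))   # False: two AI rows
```

Note that an instance can only refute an FD; holding on one instance does not prove it holds for the schema in general.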

13. What are the ACID properties?


Collections of operations that form a single logical unit of work are called transactions.
ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that
database transactions are processed reliably. In the context of databases, a single logical operation on
the data is called a transaction. For example, a transfer of funds from one bank account to
another, even though that might involve multiple changes (such as debiting one account and
crediting another), is a single transaction.
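The funds-transfer example above can be sketched with Python's sqlite3 module (the account table and the simulated failure are illustrative assumptions): if anything fails after the debit, rollback restores the database, so the two updates behave as one atomic unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 500)")
conn.commit()

# A transfer is one transaction: both updates commit, or neither does.
def transfer(conn, src, dst, amount):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc = ?",
                     (amount, src))
        if amount > 800:               # simulated mid-transaction failure
            raise RuntimeError("link failure")
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()                # atomicity: undo the partial debit

transfer(conn, 'A', 'B', 900)          # fails and rolls back
balances = dict(conn.execute("SELECT acc, balance FROM account"))
```

After the failed transfer, neither the debit nor the credit is visible, which is exactly the all-or-nothing behaviour atomicity demands.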

14. Give the reasons for allowing concurrency?


The reason for allowing concurrency is that if transactions run serially, a short transaction may
have to wait for a preceding long transaction to complete, which can lead to unpredictable
delays in running a transaction. Concurrent execution reduces these unpredictable delays in
running transactions.

15. Define deadlock?


Neither of the transactions can ever proceed with its normal execution; this situation is called
deadlock. To handle deadlock, two methods are used:
• deadlock prevention
• deadlock detection and recovery

16. Differentiate strict two phase locking protocol and rigorous two phase locking
protocol.
In the strict two-phase locking protocol, all exclusive-mode locks taken by a transaction are held
until that transaction commits.
The rigorous two-phase locking protocol requires that all locks be held until the transaction
commits.

17. What are the types of failures occurring in a system?


1. Transaction failure
   a) Logical error
   b) System error
2. System crash
3. Disk failure

18. Distinguish between fixed length records and variable length records?

Fixed length records


Every record has the same fields, and field lengths are fixed.
Variable length records
File records are of the same type, but one or more of the fields are of varying size.

19. Why serializability is used?


Serializability is a useful concept because it allows programmers to ignore issues
related to concurrency when they code transactions. However, the protocols required to ensure
serializability may allow too little concurrency for certain applications.

20. Define i)bit-level striping ii) block-level striping.


Bit-level striping:
Data striping consists of splitting the bits of each byte across multiple disks. This is
called bit-level striping.
Block-level striping:
Block level striping stripes blocks across multiple disks. It treats the array of disks as a large
disk, and gives blocks logical numbers.

21. What is a B+-Tree index?


A B+-tree index takes the form of a balanced tree in which every path from the root
of the tree to a leaf of the tree is of the same length.
Draw the structure of a B+ tree and explain briefly.
A node contains up to n-1 search key values and n pointers.
P1 K1 P2 K2 .................. Pn-1 Kn-1 Pn

22. What is an index? What are the types of indices?


An index is a structure that helps to locate desired records of a relation quickly,
without examining all records.
Types of indices:
a) Ordered indices
b) Hash indices

23. What is called query processing? What are the steps involved in query
processing?
Query processing refers to the range of activities involved in extracting data from a
database.
The basic steps are:
• Parsing and translation
• Optimization
• Evaluation

24. What is the difference between Information Retrieval and DBMS.

S.No Information Retrieval DBMS


1 Imprecise semantics                        Precise semantics
2 Keyword search                             SQL
3 Unstructured data format                   Structured data
4 Reads mostly; adds documents occasionally  Expects a reasonable number of updates
5 Displays pages through top-k results       Generates the full answer

25. What is the difference between Database and Data warehouse?

S.No Database Data warehouse


1. Online transaction & query processing is  Data analysis & decision making is done in
done in the database                         the data warehouse
2. Online Transaction Processing (OLTP) is   Online Analytical Processing (OLAP) is
carried out in the database.                 carried out in the data warehouse.

PART B

1. Analyze the purpose of a DBMS compared to a file processing system

Data redundancy and inconsistency:


The same information may be written in several files. This redundancy leads to higher storage
and access cost. It may lead to data inconsistency; that is, the various copies of the same data may
no longer agree. For example, a changed customer address may be reflected in one file but not
elsewhere in the system.

Difficulty in accessing data :


Conventional file processing systems do not allow data to be retrieved in a convenient and
efficient manner according to user choice.

Data isolation :
Because data are scattered in various files, and files may be in different formats, writing new
application programs to retrieve the appropriate data is difficult.

Integrity Problems:
Developers enforce data validation in the system by adding appropriate code in the various
application programs. However, when new constraints are added, it is difficult to change the
programs to enforce them.

Atomicity:
It is difficult to ensure atomicity in a file processing system when a transaction failure occurs due
to power failure, networking problems, etc. (Atomicity: either all operations of the transaction are
reflected properly in the database, or none are.)

Concurrent access:
In a file processing system it is not possible for multiple users to access the same file for transactions at the same time.
Security problems:
There is no security provided in a file processing system to protect the data from unauthorized
access.

2. Explain in detail about components of DBMS

File manager manages allocation of disk space and data structures used to represent information
on disk.
Database manager: The interface between low-level data and application programs and queries.
Query processor translates statements in a query language into low-level instructions the
database manager understands. (May also attempt to find an equivalent but more efficient form.)
DML precompiler converts DML statements embedded in an application program to normal
procedure calls in a host language. The precompiler interacts with the query processor.

DDL compiler converts DDL statements to a set of tables containing metadata stored in a data
dictionary. In addition, several data structures are required for physical system implementation:

Data files: store the database itself.


Data dictionary: stores information about the structure of the database. It is used heavily. Great
emphasis should be placed on developing a good design and efficient implementation of the
dictionary.
Indices: provide fast access to data items holding particular values

3. Explain data Models and its types.

4. Explain in detail about types of DBMS languages

1. Data Definition Language


o DDL stands for Data Definition Language. It is used to define database structure or
pattern.

o It is used to create schema, tables, indexes, constraints, etc. in the database.


o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of
tables and schemas, their names, indexes, columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.


o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to comment on the data dictionary.
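A short sketch of the main DDL statements, run through Python's sqlite3 module (the student table and index names are illustrative); note how every object the DDL creates is recorded in the catalog, which in SQLite is the sqlite_master table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE and ALTER define the schema; DROP removes objects again.
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN dept TEXT")
conn.execute("CREATE INDEX idx_dept ON student(dept)")

# The catalog records every object the DDL statements created.
objects = {row[0] for row in conn.execute("SELECT name FROM sqlite_master")}

conn.execute("DROP INDEX idx_dept")
```

This is the sense in which DDL "stores the information of metadata": the statements themselves populate the data dictionary.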

2. Data Manipulation Language

DML stands for Data Manipulation Language. It is used for accessing and manipulating data in
a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.


o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table.
o Merge: It performs UPSERT operation, i.e., insert or update operations.
o Call: It is used to call a structured query language or a Java subprogram.
o Explain Plan: It has the parameter of explaining data.
o Lock Table: It controls concurrency.
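The four core DML statements in sequence, again sketched with sqlite3 on a hypothetical student table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, mark INTEGER)")

# INSERT, UPDATE, DELETE, then SELECT on the same table.
conn.execute("INSERT INTO student VALUES (1, 'Kavi', 70), (2, 'Mani', 45)")
conn.execute("UPDATE student SET mark = mark - 10 WHERE regno = 2")  # 45 -> 35
conn.execute("DELETE FROM student WHERE mark < 50")                  # removes Mani
names = [row[0] for row in conn.execute("SELECT name FROM student")]
```
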

3. Data Control Language


o DCL stands for Data Control Language. It is used to control access rights and permissions
on the data stored in the database.
o The DCL execution is transactional; granted privileges can also be rolled back.

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.


o Revoke: It is used to take back permissions from the user.
Excel Engineering College AI & DS
COLLEGE

4. Transaction Control Language

TCL stands for Transaction Control Language. It is used to manage the changes made by DML
statements; these changes can be grouped into a logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.


o Rollback: It is used to restore the database to its state as of the last Commit.

5. State the rules that a database must obey in order to be regarded as a true relational
database
Rule 1: Information Rule
The data stored in a database, be it user data or metadata, must be a value of some table cell.
Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element value is guaranteed to be accessible logically with a combination of
table name, primary-key value (row), and attribute name (column). No other means, such as
pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very
important rule because a NULL can be interpreted as one of the following − data is missing, data is
not known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as
data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be used
directly or by means of some application. If the database allows access to data without any help
of this language, then it is considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, updation, and deletion. This must not be limited to
a single row, that is, it must also support union, intersection and minus operations to yield sets of
data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of the user's view (application). Any change in
logical data must not affect the applications using it. For example, if two tables are merged or
one is split into two different tables, there should be no impact or change on the user application.
This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not
be able to subvert the system and bypass security and integrity constraints.

6. Explain in detail about normalization


Problems without normalization
• Data redundancy
• Insert anomalies
• delete anomalies
• update anomalies
Types of normalization
First Normal Form (1NF) - For a table to be in the First Normal Form, it should follow the
following 4 rules:
• It should only have single(atomic) valued attributes/columns.
• Values stored in a column should be of the same domain
• All the columns in a table should have unique names.
• And the order in which data is stored, does not matter.
Second Normal Form (2NF) - For a table to be in the Second Normal Form,
• It should be in the First Normal form.
• And, it should not have Partial Dependency.
Third Normal Form (3NF) - A table is said to be in the Third Normal Form when,
• It is in the Second Normal form.
• And, it doesn't have Transitive Dependency.
Boyce and Codd Normal Form (BCNF) - Boyce-Codd Normal Form is a stricter version of
the Third Normal Form. It deals with certain types of anomalies that are not handled by 3NF.
A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For
a table to be in BCNF, following conditions must be satisfied:
• R must be in 3rd Normal Form
• and, for each functional dependency ( X → Y ), X should be a super Key.
Fourth Normal Form (4NF) - A table is said to be in the Fourth Normal Form when,
• It is in the Boyce-Codd Normal Form.
• And, it doesn't have Multi-Valued Dependency.
Fifth Normal Form (5NF) - A table is said to be in the Fifth Normal Form when,
• It is in the Fourth Normal Form.
• And, it doesn't have join Dependency
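A worked 2NF sketch (the enrolment relation and attribute names are made up for illustration): in enrol(regno, course, sname) the candidate key is (regno, course), but sname depends only on regno, a partial dependency. Decomposing removes the redundancy while the natural join stays lossless:

```python
# Un-normalised relation with a partial dependency: sname depends on
# regno alone, not on the whole key (regno, course).
enrol = {
    (1, 'DBMS', 'Kavi'),
    (1, 'OS',   'Kavi'),   # 'Kavi' stored redundantly
    (2, 'DBMS', 'Mani'),
}

# 2NF decomposition: student(regno, sname) and takes(regno, course).
student = {(r, s) for (r, c, s) in enrol}
takes = {(r, c) for (r, c, s) in enrol}

# The name is now stored once per student, and the natural join on
# regno recovers the original relation without loss.
rejoined = {(r, c, s) for (r, s) in student for (r2, c) in takes if r == r2}
```

Updating a student's name now touches one row instead of one row per enrolled course, which is the update anomaly normalization removes.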

7. Explain in detail about ER Notations

8. Explain in detail about steps involved in mapping ER to relational model.


• Tabular representation of strong entity sets
• Tabular representation of weak entity sets
• Tabular representation of relationship sets
• Tabular representation of specialization sets

• Tabular representation of generalization sets


• Tabular representation of aggregation sets

9. Explain in detail about transaction properties and states.


Collections of operations that form a single logical unit of work are called transactions.
Properties of transaction:
Atomicity. Either all operations of the transaction are reflected properly in the database, or none
are.
Consistency. Execution of a transaction in isolation (that is, with no other transaction executing
concurrently) preserves the consistency of the database.
Isolation. Even though multiple transactions may execute concurrently, the system guarantees
that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution
before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of
other transactions executing concurrently in the system.
Durability. After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.

States of transaction:
A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been restored to its
state prior to the start of the transaction
• Committed, after successful completion

10. Outline deadlock handling mechanisms

Neither of the transactions can ever proceed with its normal execution; this situation is called
deadlock.

Conditions for deadlock:


Mutual exclusion: only one process at a time can use a resource.
Hold and wait: a process holding at least one resource is waiting to acquire additional resources
held by other processes.
No preemption: a resource can be released only voluntarily by the process holding it, after that
process has completed its task.
Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for
a resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn–1 is waiting
for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0.

Deadlock handling mechanisms:

• Deadlock prevention
• Deadlock avoidance
• Deadlock detection and recovery

Deadlock prevention:

Mutual Exclusion – not required for sharable resources; must hold for nonsharable resources.
Hold and Wait – must guarantee that whenever a process requests a resource, it does not hold
any other resources.
No Preemption – If a process that is holding some resources requests another resource that
cannot be immediately allocated to it, then all resources currently being held are released.
• Wait-Die Scheme
If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then Ti is
allowed to wait until the data-item is available.
If TS(Ti) > TS(Tj) − that is Ti is younger than Tj − then Ti dies. Ti is restarted later with a
random delay but with the same timestamp.
• Wound-Wait Scheme
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later
with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
Circular Wait – impose a total ordering of all resource types, and require that each process
requests resources in an increasing order of enumeration. If a process that is holding some
resources requests another resource that cannot be immediately allocated to it, then all resources
currently being held are released.
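The two timestamp schemes reduce to simple decision rules, sketched here as plain Python functions (smaller timestamp = older transaction; the timestamps are illustrative):

```python
# Wait-Die: an older requester waits; a younger requester dies (rolls back).
def wait_die(ts_requester, ts_holder):
    return "wait" if ts_requester < ts_holder else "die"

# Wound-Wait: an older requester wounds (rolls back) the lock holder;
# a younger requester waits.
def wound_wait(ts_requester, ts_holder):
    return "wound holder" if ts_requester < ts_holder else "wait"

old, young = 5, 9   # timestamps of an older and a younger transaction
a = wait_die(old, young)      # older requests from younger
b = wait_die(young, old)      # younger requests from older
c = wound_wait(old, young)
d = wound_wait(young, old)
```

In both schemes only the younger transaction is ever rolled back, and it restarts with its original timestamp, so it eventually becomes the oldest and cannot starve.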

Deadlock avoidance:

• Simplest and most useful model requires that each process declare the maximum number
of resources of each type that it may need.
• The deadlock-avoidance algorithm dynamically examines the resource-allocation state to
ensure that there can never be a circular-wait condition.
• Resource-allocation state is defined by the number of available and allocated resources,
and the maximum demands of the processes.

Deadlock detection and recovery:

• Allow system to enter deadlock state


• When a process requests an available resource, system must decide if immediate
allocation leaves the system in a safe state.
• System is in safe state if there exists a safe sequence of all processes
• If a system is in safe state ⇒ no deadlocks.
• If a system is in unsafe state ⇒ possibility of deadlock.
• Avoidance ⇒ ensure that a system will never enter an unsafe state.
• Apply Detection algorithm – wait for graph results in cycle - possibility of deadlock.
• Apply Recovery scheme – select a victim - Total or partial recovery – No starvation

11. Explain about 2 phase locking & 2 phase commit



2 phase locking:
This protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.

Demerits:

• Deadlock occurs
• Cascade rollback occurs

Overcoming demerits:
In the strict two-phase locking protocol, all exclusive-mode locks taken by a transaction are held
until that transaction commits.
The rigorous two-phase locking protocol requires that all locks be held until the transaction
commits.

2 phase commit:

2-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is


a distributed algorithm that coordinates all the processes that participate in a distributed atomic
transaction on whether to commit or abort (roll back) the transaction

The commit-request phase (or voting phase), in which a coordinator process attempts to
prepare all the transaction's participating processes to take the necessary steps for either
committing or aborting the transaction and to vote, either "Yes": commit (if the transaction
participant's local portion execution has ended properly), or "No": abort (if a problem has been
detected with the local portion)

The commit phase, in which, based on voting of the participants, the coordinator decides
whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and
notifies the result to all the participants. The participants then follow with the needed actions
(commit or abort) with their local transactional resources (also called recoverable resources; e.g.,
database data) and their respective portions in the transaction's other output (if applicable).

12. Explain in detail about the concept of serializability

A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different


forms of schedule equivalence give rise to the notions of:
• conflict serializability
• view serializability

Need for serializability:

• Lost update - occurs when two different transactions are trying to update the same
column on the same row within a database at the same time


• Dirty read - occurs when a transaction is allowed to read data from a row that has been
modified by another running transaction and not yet committed.
• Unrepeatable read – occurs when a transaction reads different values for the same data item
during its execution due to changes made by a concurrently running transaction

Conflict serializability:

Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists
some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
• li = read(Q), lj = read(Q). li and lj don’t conflict.
• li = read(Q), lj = write(Q). They conflict.
• li = write(Q), lj = read(Q). They conflict.
• li = write(Q), lj = write(Q). They conflict.
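The four cases collapse to one rule, sketched below as a plain Python predicate over (operation, item) pairs (the representation is an illustrative assumption):

```python
# Two operations conflict iff they access the same item and at least
# one of them is a write -- the rule behind all four cases above.
def conflicts(op1, op2):
    (kind1, item1), (kind2, item2) = op1, op2
    return item1 == item2 and ("write" in (kind1, kind2))

r_q, w_q, r_p = ("read", "Q"), ("write", "Q"), ("read", "P")
case1 = conflicts(r_q, r_q)   # read/read on Q: no conflict
case2 = conflicts(r_q, w_q)   # read/write on Q: conflict
case3 = conflicts(w_q, w_q)   # write/write on Q: conflict
case4 = conflicts(w_q, r_p)   # different items: no conflict
```

A schedule is conflict serializable when swapping only non-conflicting adjacent operations can turn it into a serial schedule.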

View serializability:

Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if
the following three conditions are met:
• For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S´, also read the initial value of Q.
• For each data item Q if transaction Ti executes read(Q) in schedule S, and that value was
produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the
value of Q that was produced by transaction Tj .
• For each data item Q, the transaction (if any) that performs the final write(Q) operation in
schedule S must perform the final write(Q) operation in schedule S´

13. Explain the concept of RAID technology

RAID stands for Redundant Array of Independent Disks, a technology that connects multiple
secondary storage devices and uses them as a single storage medium. RAID improves the
performance and reliability of the DBMS.

• RAID 0 – striping
• RAID 1 – mirroring
• RAID 0+1 – Non redundant and mirroring
• RAID 2 – Memory style error correcting codes
• RAID 3 –Bit interleaved parity
• RAID 4 – Block interleaved parity
• RAID 5 - Block interleaved Distributed parity
• RAID 6 – P+Q redundancy

Bit-level striping: Data striping consists of splitting the bits of each byte across multiple
disks.

Block-level striping: Block level striping stripes blocks across multiple disks. It treats the
array of disks as a large disk, and gives blocks logical numbers.
Mirroring is the process of duplicating every disk to achieve redundancy, thereby increasing
reliability.

14. Explain the concept of Indexing

Indexing is a data structure technique to efficiently retrieve records from the database files based
on some attributes on which the indexing has been done.

Primary Index − Primary index is defined on an ordered data file. The data file is ordered on
a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate key and
has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered
on a non-key field.
Ordered Indexing is of two types −
• Dense Index
• Sparse Index

Dense Index - In dense index, there is an index record for every search key value in the database.

Sparse Index - In sparse index, index records are not created for every search key. An index record
here contains a search key and an actual pointer to the data on the disk.

Multilevel Index - Index records comprise search-key values and data pointers. Multilevel index is
stored on the disk along with the actual database files. As the size of the database grows, so does
the size of the indices
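The effect of a secondary index can be observed directly in SQLite's query planner (a sketch using Python's sqlite3 module; the employee table is illustrative, and the exact plan text varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(i, f"e{i}", 1000 + i) for i in range(1000)])

# Without a secondary index the planner must scan the whole table.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE salary = 1500").fetchall()

# A secondary index on salary lets the planner search the index instead.
conn.execute("CREATE INDEX idx_salary ON employee(salary)")
plan_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE salary = 1500").fetchall()

uses_index = any("idx_salary" in str(row) for row in plan_index)
```

The second plan mentions idx_salary, showing the planner locating matching records without examining all rows, which is exactly what an index is for.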

15. Explain the concept of Hashing


Hashing is an effective technique to calculate the direct location of a data record on the disk
without using index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data
record.
Hash Organization:
• Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage.
A bucket typically stores one complete disk block, which in turn can store one or more
records.

• Hash Function − A hash function, h, is a mapping function that maps all the set of
search-keys K to the address where actual records are placed. It is a function from search
keys to bucket addresses.
Static Hashing:
In static hashing, when a search-key value is provided, the hash function always computes the
same address. For example, if a mod-4 hash function is used, it generates only 4 bucket
addresses (0 to 3). The output address is always the same for a given key, and the number of
buckets provided remains unchanged at all times.
Bucket Overflow:
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash
result and is linked after the previous one. This mechanism is called Closed Hashing.
• Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on demand. Dynamic hashing is also known as extendible
hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a few
are used initially.
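A simplified in-memory sketch of this idea, assuming Python's built-in hash as the hash
function and an arbitrary bucket capacity of two (real systems split on-disk blocks, but the
directory-doubling logic is the same):

```python
# Extendible hashing sketch: a directory of pointers to buckets.
# When a bucket overflows it is split; if its local depth equals the
# global depth, the directory doubles first.

class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Use the low-order global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value):
        b = self.directory[self._index(key)]
        if key in b.items or len(b.items) < b.capacity:
            b.items[key] = value
            return
        if b.local_depth == self.global_depth:
            self.directory += self.directory   # double the directory
            self.global_depth += 1
        b.local_depth += 1
        new_b = Bucket(b.local_depth, b.capacity)
        bit = 1 << (b.local_depth - 1)
        # Redirect half of the pointers to the new bucket.
        for i in range(len(self.directory)):
            if self.directory[i] is b and (i & bit):
                self.directory[i] = new_b
        old = b.items
        b.items = {}
        for k, v in old.items():               # rehash the old records
            self.insert(k, v)
        self.insert(key, value)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

h = ExtendibleHash()
for k in range(8):
    h.insert(k, "record-%d" % k)
print(h.global_depth)      # the directory has grown on demand
print(h.get(5))            # record-5
```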

16. Explain in detail about query processing

The aims of query processing are to transform a query written in a high-level language, typically
SQL, into a correct and efficient execution strategy expressed in a low-level language
(implementing the relational algebra), and to execute the strategy to retrieve the required data

The steps involved in processing a query are


• Parsing and translation
• Optimization
• Evaluation

Example:
Given SQL query for displaying name of employee whose salary is greater than 5000
SELECT Ename FROM Employee WHERE Salary > 5000;

Translated into Relational Algebra Expression



π Ename (σ Salary > 5000 (Employee))

Note that the selection must be applied before the projection here: once π Ename removes the
Salary attribute, the condition Salary > 5000 can no longer be evaluated. The optimizer
chooses among such equivalent orderings when more than one is valid.

A sequence of primitive operations that can be used to evaluate a query is a Query Execution
Plan or Query Evaluation Plan
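The plan above can be mimicked over an in-memory relation to illustrate the two primitive
operations; the sample tuples and helper names are made up for illustration:

```python
# A toy evaluation of the plan: apply the selection (sigma) first,
# then the projection (pi), over a relation stored as a list of dicts.

Employee = [
    {"Ename": "Asha",  "Salary": 7000},
    {"Ename": "Ravi",  "Salary": 4500},
    {"Ename": "Meena", "Salary": 9000},
]

def select(relation, predicate):        # relational sigma
    return [t for t in relation if predicate(t)]

def project(relation, attrs):           # relational pi
    return [{a: t[a] for a in attrs} for t in relation]

# Plan: sigma Salary > 5000, then pi Ename.
plan = project(select(Employee, lambda t: t["Salary"] > 5000), ["Ename"])
print(plan)   # [{'Ename': 'Asha'}, {'Ename': 'Meena'}]
```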

Need for Query Optimization


• Query optimization provides faster query processing.
• It requires less cost per query.
• It gives less stress to the database.
• It provides high performance of the system.
• It consumes less memory.

17. Explain in detail about DDBMS

Distributed DBMS:
A distributed database is a collection of multiple interconnected databases, which are spread
physically across various locations that communicate via a computer network.
A Distributed Database Management System (DDBMS) manages the distributed database and
provides mechanisms that make the distribution transparent to the users.
In these systems, data is intentionally distributed among multiple nodes so that all computing
resources of the organization can be optimally used.
Factors Encouraging DDBMS:
• Distributed Nature of Organizational Units
• Need for Sharing of Data
• Support for Both OLTP and OLAP
• Database Recovery
• Support for Multiple Application Software

Types of DDBMS
• Homogeneous – Autonomous and Non-autonomous
• Heterogeneous - Federated and Un-federated

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems.
Its properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user
requests.
• The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database:


There are two types of homogeneous distributed database −
• Autonomous − Each database is independent and functions on its own. The databases are
integrated by a controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or
master DBMS co-ordinates data updates across the sites.
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are −
• Different sites use dissimilar schemas and software.

• The system may be composed of a variety of DBMSs like relational, network,
hierarchical or object oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in
processing user requests.

Types of Heterogeneous Distributed Databases
There are two types of heterogeneous distributed database −
• Federated − The heterogeneous database systems are independent in nature and
integrated together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through
which the databases are accessed.
Architectural Models:
Some of the common architectural models are −

• Client - Server Architecture for DDBMS - The server functions primarily encompass
data management, query processing, optimization and transaction management. Client
functions include mainly user interface.
• Peer - to - Peer Architecture for DDBMS - In these systems, each peer acts both as a
client and a server for providing database services. The peers share their resources with
other peers and co-ordinate their activities.
• Multi - DBMS Architecture for DDBMS - This is an integrated database system
formed by a collection of two or more autonomous database systems.

18. Explain in detail about OODBMS

• Object oriented database systems are an alternative to relational and other database
systems.
• In an object oriented database, information is represented in the form of objects, as in
object oriented programming languages.
• When the features of a database system (transactions, concurrency, recovery) are combined
with object oriented concepts, the resulting model is called the object oriented database
model.

Features of OODBMS

Complexity − An OODBMS can represent the complex internal structure of an object, with
multiple levels of nesting.

Inheritance − Creating a new object from an existing object in such a way that the new object
inherits all characteristics of the existing object.

Encapsulation − A data-hiding concept that binds together data and the functions that
manipulate it, keeping both invisible to the outside world.

Persistency − An OODBMS allows the creation of persistent objects (objects that outlive the
program execution that created them). This feature helps solve recovery and concurrency problems.

Challenges in ORDBMS implementation

a. Storage and accessibility of data


Challenge: Storage of large ADTs and structured objects.
Solution: Since large ADTs need special storage, they can be stored at disk locations different
from the tuples that contain them, e.g. BLOBs (Binary Large Objects such as images, audio or
any other multimedia object).
b. Query Processing
Challenge: Efficient query processing and optimization is a difficult task.
Solution: By registering user-defined aggregation functions, query processing becomes easier.
It requires three implementation steps − initialize, iterate and terminate.
c. Query Optimization
Challenge: New indexes and query processing techniques increase the options for query
optimization, but the optimizer must know how to handle and use this query processing
functionality properly.
Solution: While constructing a query plan, the optimizer must be aware of the newly added
index structures.
Object identity

• Every object has a unique identity. In an object oriented system, an OID is assigned to an
object when it is created.
• In an RDBMS, identity is value based: a primary key provides uniqueness only within one
relation, not across the entire system. Because the primary key is chosen from the attributes
of the relation, identity depends on the object's state.
• In an OODBMS, the OID is system generated (implemented, for example, as a variable name or
pointer) and is independent of the object's state.
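The contrast can be illustrated in Python, using the interpreter's id() as a stand-in for a
system-generated OID (an analogy only, not how an OODBMS actually stores identity; the
Employee class is invented for the example):

```python
# Value-based identity (primary key) vs. system-generated identity (OID).

class Employee:
    def __init__(self, emp_id, name):
        self.emp_id = emp_id   # primary-key style attribute (value based)
        self.name = name

e1 = Employee(7, "Asha")
e2 = Employee(7, "Asha")       # identical attribute values, new object

# Value-based identity: an RDBMS would treat these rows as the same.
print(e1.emp_id == e2.emp_id)  # True

# Object identity: each object gets its own OID, independent of state.
print(id(e1) != id(e2))        # True

oid = id(e1)
e1.name = "Asha K"             # the state changes...
print(id(e1) == oid)           # True: ...but the identity does not
```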

Type of Attributes

1. Simple attributes

Attributes can be of a primitive data type such as integer, string or real, which takes a
literal value.

Example: 'ID' is a simple attribute with value 07.

2. Complex attributes

Attributes that consist of collections of, or references to, multiple other objects are called
complex attributes.

Example: a collection of Employees consists of many employee names.

3. Reference attributes

Attributes that represent a relationship between objects and consist of a value or collection
of values are called reference attributes.

Example: Manager is a reference to a Staff object.

19. Explain in detail about XML databases

An XML database is used to store large amounts of information in XML format. As the use of
XML increases in every field, a secure place to store XML documents is required. The data
stored in the database can be queried using XQuery, serialized, and exported into a desired
format.

There are two major types of XML databases −

• XML- enabled
• Native XML (NXD)

XML - Enabled Database: An XML-enabled database is a relational database extended to convert
XML documents to and from its tables. Data is stored in tables consisting of rows and columns;
the tables contain sets of records, which in turn consist of fields.

Native XML Database: A native XML database is based on containers rather than the table
format. It can store large amounts of XML documents and data, and is queried using XPath
expressions.
Example

<?xml version = "1.0"?>


<contact-info>
<contact1>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact1>

<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>

XML parser is a software library or a package that provides interface for client applications to
work with XML documents. It checks for proper format of the XML document and may also
validate the XML documents. Modern day browsers have built-in XML parsers.
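The contact-info document above can be parsed and queried with an XPath-style expression
using Python's built-in xml.etree.ElementTree, in the spirit of how a native XML database
evaluates XPath:

```python
# Parse the contact-info XML into a tree and query it with XPath syntax.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<contact-info>
  <contact1>
    <name>Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </contact1>
  <contact2>
    <name>Manisha Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </contact2>
</contact-info>"""

root = ET.fromstring(doc)          # parse into a DOM-like tree of nodes
print(root.tag)                    # contact-info

# XPath-style query: every <name> element anywhere under the root.
names = [n.text for n in root.findall(".//name")]
print(names)                       # ['Tanmay Patil', 'Manisha Patil']
```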

XML Document Type Definition, commonly known as DTD, is a way to describe an XML language
precisely. DTDs check the vocabulary and the validity of the structure of XML documents
against the grammatical rules of the appropriate XML language.
An XML document is always descriptive. Its tree structure, often referred to as the XML tree,
plays an important role in describing any XML document easily. The tree structure contains
root (parent) elements, child elements and so on. Using the tree structure, you can reach all
succeeding branches and sub-branches starting from the root. Parsing starts at the root, moves
down the first branch to an element, takes the first branch from there, and so on down to the
leaf nodes.
Document Object Model (DOM) is the foundation of XML. XML documents have a hierarchy
of informational units called nodes; DOM is a way of describing those nodes and the
relationships between them.
A DOM document is a collection of nodes or pieces of information organized in a hierarchy.
This hierarchy allows a developer to navigate through the tree looking for specific information.
Because it is based on a hierarchy of information, the DOM is said to be tree based.
XQuery is a specification for a query language that allows a user or programmer to extract
information from an Extensible Markup Language (XML) file or any collection of data that can
be XML-like. The syntax is intended to be easy to understand and use.

An XML Namespace is a set of unique names. A namespace is a mechanism by which element and
attribute names can be assigned to a group. A namespace is identified by a URI (Uniform
Resource Identifier).
20. Explain in detail about Information Retrieval

• Relevance Ranking Using Terms


• Relevance Using Hyperlinks
• Synonyms, Homonyms, and Ontologies
• Indexing of Documents
• Measuring Retrieval Effectiveness
• Web Search Engines
• Information Retrieval and Structured Data
• Directories

• Information retrieval (IR) systems use a simpler data model than database systems
• Information organized as a collection of documents
• Documents are unstructured, no schema
• Information retrieval locates relevant documents, on the basis of user input such as
keywords or example documents
e.g., find documents containing the words “database systems”
• Can be used even on textual descriptions provided with nontextual data such as images
• Web search engines are the most familiar example of IR systems

• In full text retrieval, all the words in each document are considered to be keywords.
• We use the word term to refer to the words in a document
• Information retrieval systems typically allow query expressions formed using keywords
and the logical connectives and, or, and not
• Ands are implicit, even if not explicitly specified
• Ranking of documents on the basis of estimated relevance to a query is critical
• Relevance ranking is based on factors such as
• Term frequency – frequency of occurrence of a query keyword in a document
• Inverse document frequency – how many documents the query keyword occurs in; the
fewer the documents, the more importance is given to the keyword
• Hyperlinks to documents – the more links to a document, the more important the document
• Web crawlers are programs that locate and gather information on the Web
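A toy TF-IDF ranker illustrates the first two relevance factors; the three documents and the
query are invented for the example:

```python
# Rank documents for a keyword query by term frequency (TF) weighted
# by inverse document frequency (IDF): rarer terms count for more.
import math

docs = {
    "d1": "database systems store data in a database",
    "d2": "operating systems manage hardware",
    "d3": "a database stores structured data",
}

def tf(term, doc):
    """Fraction of the document's words that are this term."""
    words = doc.split()
    return words.count(term) / len(words)

def idf(term):
    """Fewer documents containing the term -> higher weight."""
    n = sum(1 for d in docs.values() if term in d.split())
    return math.log(len(docs) / n) if n else 0.0

def score(query, doc):
    return sum(tf(t, doc) * idf(t) for t in query.split())

ranked = sorted(docs, key=lambda d: score("database systems", docs[d]),
                reverse=True)
print(ranked[0])   # d1 ranks highest: it contains both query terms
```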
