5. Define data independence and what are the two levels of data independence?
The ability to modify a schema definition at one level without affecting the schema definition at
the next higher level is called data independence.
1. Physical data independence: the ability to modify the physical schema without causing
application programs to be rewritten.
2. Logical data independence: the ability to modify the logical schema without causing
application programs to be rewritten.
Weak entity set: Entity sets that do not have a key attribute of their own are called weak entity
sets. Strong entity set: An entity set that has a primary key is termed a strong entity set.
16. Differentiate strict two phase locking protocol and rigorous two phase locking
protocol.
In the strict two-phase locking protocol, all exclusive-mode locks taken by a transaction are held
until that transaction commits.
The rigorous two-phase locking protocol requires that all locks (both shared and exclusive) be
held until the transaction commits.
18. Distinguish between fixed length records and variable length records?
Excel Engineering College AI & DS
COLLEGE
23. What is called query processing? What are the steps involved in query
processing?
Query processing refers to the range of activities involved in extracting data from a
database.
The basic steps are:
• Parsing and translation
• Optimization
• Evaluation
PART B
Data isolation:
Because data are scattered in various files, and those files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
Integrity Problems:
Developers enforce data validation in the system by adding appropriate code in the various
application programs. However, when new constraints are added, it is difficult to change the
programs to enforce them.
Atomicity:
It is difficult to ensure atomicity in a file-processing system when a transaction fails due to
power failure, networking problems, etc. (Atomicity: either all operations of the transaction are
reflected properly in the database, or none are.)
Concurrent access:
A file-processing system has no mechanism to coordinate simultaneous access to the same file by
multiple transactions.
Security problems:
A file-processing system provides no security to protect the data from unauthorized user
access.
File manager manages allocation of disk space and data structures used to represent information
on disk.
Database manager: The interface between low-level data and application programs and queries.
Query processor translates statements in a query language into low-level instructions the
database manager understands. (May also attempt to find an equivalent but more efficient form.)
DML precompiler converts DML statements embedded in an application program to normal
procedure calls in a host language. The precompiler interacts with the query processor.
DDL compiler converts DDL statements to a set of tables containing metadata stored in a data
dictionary. In addition, several data structures are required for physical system implementation.
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in
a database. It handles user requests.
TCL (Transaction Control Language) is used to control the changes made by DML statements; a
group of DML statements can be treated as one logical transaction.
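As an illustrative sketch (table and figures invented), Python's built-in sqlite3 module shows DML statements grouped into one logical transaction and controlled with commit/rollback:

```python
import sqlite3

# Two DML statements form one logical transfer transaction; TCL-style
# commit makes both permanent together, rollback undoes both on failure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE name = 'B'")
    conn.commit()          # commit: make both changes permanent together
except sqlite3.Error:
    conn.rollback()        # rollback: undo both changes on failure

print(conn.execute("SELECT balance FROM account ORDER BY name").fetchall())
# [(70,), (80,)]
```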
5. State the rules that a database must obey in order to be regarded as a true relational
database
Rule 1: Information Rule
The data stored in a database, be it user data or metadata, must be a value of some table cell.
Everything in a database must be stored in table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be logically accessible with a combination of
table name, primary key (row value), and attribute name (column value). No other means, such as
pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very
important rule because a NULL can be interpreted as one of the following − data is missing, data
is not known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as
data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be used
directly or by means of some application. If the database allows access to data without the help
of this language, it is considered a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insert, update, and delete operations. These must not be
limited to a single row; the system must also support union, intersection and minus operations to
yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its users' views (applications). Any change
in logical data must not affect the applications using it. For example, if two tables are merged or
one is split into two different tables, there should be no impact on the user application. This is
one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not
be able to subvert the system and bypass security and integrity constraints.
States of transaction:
A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been restored to its
state prior to the start of the transaction
• Committed, after successful completion
When two transactions each wait for a lock held by the other, neither transaction can ever
proceed with its normal execution. This situation is called deadlock. Deadlock can be handled by:
• Deadlock prevention
• Deadlock avoidance
• Deadlock detection and recovery
Deadlock prevention:
Mutual Exclusion – not required for sharable resources; must hold for nonsharable resources.
Hold and Wait – must guarantee that whenever a process requests a resource, it does not hold
any other resources.
No Preemption – If a process that is holding some resources requests another resource that
cannot be immediately allocated to it, then all resources currently being held are released.
• Wait-Die Scheme
If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then Ti is
allowed to wait until the data-item is available.
If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a
random delay but with the same timestamp.
• Wound-Wait Scheme
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later
with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
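The two timestamp-based decisions above can be sketched as plain functions (function names are ours; a smaller timestamp means an older transaction):

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits; a younger requester dies."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds (rolls back) the holder;
    a younger requester waits."""
    return "wound holder" if ts_requester < ts_holder else "wait"

# Ti (ts=1) is older than Tj (ts=5):
print(wait_die(1, 5))     # wait
print(wound_wait(1, 5))   # wound holder
print(wait_die(5, 1))     # die
print(wound_wait(5, 1))   # wait
```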
Circular Wait – impose a total ordering of all resource types, and require that each process
requests resources in an increasing order of enumeration.
Deadlock avoidance:
• Simplest and most useful model requires that each process declare the maximum number
of resources of each type that it may need.
• The deadlock-avoidance algorithm dynamically examines the resource-allocation state to
ensure that there can never be a circular-wait condition.
• Resource-allocation state is defined by the number of available and allocated resources,
and the maximum demands of the processes.
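A minimal sketch of this safety check, Banker's-algorithm style (the allocation and maximum matrices below are invented for illustration):

```python
def is_safe(available, allocated, maximum):
    """The state is safe if every process can finish in some order,
    releasing its resources as it completes."""
    avail = list(available)
    need = [[m - a for m, a in zip(mx, al)]
            for mx, al in zip(maximum, allocated)]
    finished = [False] * len(allocated)
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and all(n <= v for n, v in zip(need[i], avail)):
                # Process i can run to completion and release what it holds.
                avail = [v + a for v, a in zip(avail, allocated[i])]
                finished[i] = True
                progress = True
    return all(finished)

# Three processes, two resource types (data invented):
print(is_safe([3, 3],
              [[0, 1], [2, 0], [3, 0]],
              [[7, 3], [3, 2], [5, 0]]))
# True: a safe completion order exists (e.g. P1, P2, P0)
```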
2 phase locking:
This protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Demerits:
• Deadlock occurs
• Cascade rollback occurs
Overcoming demerits:
In the strict two-phase locking protocol, all exclusive-mode locks taken by a transaction are held
until that transaction commits.
The rigorous two-phase locking protocol requires that all locks be held until the transaction
commits.
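The two-phase rule itself can be sketched as a toy class that rejects any lock request made after the first unlock (an illustration of the rule, not a real lock manager):

```python
class TwoPhaseTxn:
    """Enforces 2PL for one transaction: no lock after the first unlock."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after unlock")
        self.locks.add(item)     # growing phase: locks may be obtained

    def unlock(self, item):
        self.shrinking = True    # growing phase ends here
        self.locks.discard(item) # shrinking phase: locks may only be released

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")   # growing phase
t.unlock("A")              # shrinking phase begins
try:
    t.lock("C")            # violates the two-phase rule
except RuntimeError as e:
    print(e)               # 2PL violation: lock after unlock
```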
2 phase commit:
The commit-request phase (or voting phase), in which a coordinator process attempts to
prepare all the transaction's participating processes to take the necessary steps for either
committing or aborting the transaction and to vote, either "Yes": commit (if the transaction
participant's local portion execution has ended properly), or "No": abort (if a problem has been
detected with the local portion)
The commit phase, in which, based on voting of the participants, the coordinator decides
whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and
notifies the result to all the participants. The participants then follow with the needed actions
(commit or abort) with their local transactional resources (also called recoverable resources; e.g.,
database data) and their respective portions in the transaction's other output (if applicable).
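The coordinator's decision rule can be sketched as follows (participants reduced to bare vote values for illustration):

```python
def two_phase_commit(votes):
    """Phase 1 (voting): gather each participant's vote.
    Phase 2 (commit): one global decision sent to every participant,
    commit only if all voted yes."""
    all_yes = all(v == "yes" for v in votes)
    return "commit" if all_yes else "abort"

print(two_phase_commit(["yes", "yes", "yes"]))  # commit
print(two_phase_commit(["yes", "no", "yes"]))   # abort
```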
• Lost update − occurs when two different transactions try to update the same data item
concurrently, and one transaction's update overwrites the other's.
Conflict serializability:
Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists
some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
• li = read(Q), lj = read(Q). li and lj don't conflict.
• li = read(Q), lj = write(Q). They conflict.
• li = write(Q), lj = read(Q). They conflict.
• li = write(Q), lj = write(Q). They conflict.
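The conflict rule above can be encoded directly as a small predicate:

```python
def conflicts(op1, op2):
    """Two operations conflict iff they touch the same item
    and at least one of them is a write."""
    (action1, item1), (action2, item2) = op1, op2
    return item1 == item2 and (action1 == "write" or action2 == "write")

print(conflicts(("read", "Q"), ("read", "Q")))    # False: two reads
print(conflicts(("read", "Q"), ("write", "Q")))   # True
print(conflicts(("write", "Q"), ("write", "Q")))  # True
print(conflicts(("write", "Q"), ("read", "R")))   # False: different items
```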
View serializability:
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if
the following three conditions are met:
• For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S´, also read the initial value of Q.
• For each data item Q if transaction Ti executes read(Q) in schedule S, and that value was
produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the
value of Q that was produced by transaction Tj .
• For each data item Q, the transaction (if any) that performs the final write(Q) operation in
schedule S must perform the final write(Q) operation in schedule S´.
• RAID 0 – striping
• RAID 1 – mirroring
• RAID 0+1 – striping plus mirroring
• RAID 2 – Memory style error correcting codes
• RAID 3 – Bit interleaved parity
• RAID 4 – Block interleaved parity
• RAID 5 - Block interleaved Distributed parity
• RAID 6 – P+Q redundancy
Bit-level striping: Data striping consists of splitting the bits of each byte across multiple
disks.
Block-level striping: Block level striping stripes blocks across multiple disks. It treats the
array of disks as a large disk, and gives blocks logical numbers.
Mirroring is the process of duplicating every disk to achieve redundancy, increasing
reliability.
Indexing is a data structure technique to efficiently retrieve records from the database files based
on some attributes on which the indexing has been done.
Primary Index − Primary index is defined on an ordered data file. The data file is ordered on
a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate key and
has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered
on a non-key field.
Ordered Indexing is of two types −
• Dense Index
• Sparse Index
Dense Index - In dense index, there is an index record for every search key value in the database.
Sparse Index - In sparse index, index records are not created for every search key. An index record
here contains a search key and an actual pointer to the data on the disk.
Multilevel Index - Index records comprise search-key values and data pointers. Multilevel index is
stored on the disk along with the actual database files. As the size of the database grows, so does
the size of the indices
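As a sketch (data and names invented), dense and sparse index lookups over a sorted file can be contrasted like this:

```python
import bisect

# A sorted "file" of (search key, record) pairs.
data = [(10, "r10"), (20, "r20"), (30, "r30"), (40, "r40")]

# Dense index: one index entry for every search-key value.
dense = {key: pos for pos, (key, _) in enumerate(data)}

# Sparse index: one entry per block (here, every 2nd record).
sparse = [(data[i][0], i) for i in range(0, len(data), 2)]

def sparse_lookup(key):
    # Find the last index entry with key <= search key, then scan forward.
    keys = [k for k, _ in sparse]
    i = bisect.bisect_right(keys, key) - 1
    pos = sparse[i][1]
    while pos < len(data) and data[pos][0] <= key:
        if data[pos][0] == key:
            return data[pos][1]
        pos += 1
    return None

print(data[dense[30]][1])   # r30: direct hit via the dense index
print(sparse_lookup(20))    # r20: found by scanning from the entry for 10
```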
• Hash Function − A hash function, h, is a mapping function that maps all the set of
search-keys K to the address where actual records are placed. It is a function from search
keys to bucket addresses.
Static Hashing:
In static hashing, when a search-key value is provided, the hash function always computes the
same address. For example, if a mod-4 hash function is used, it generates only 4 values (0
through 3). The output address is always the same for a given key, and the number of buckets
remains unchanged at all times.
Bucket Overflow:
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash
result and is linked after the previous one. This mechanism is called Closed Hashing.
• Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
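A toy static hash table with mod-4 addressing and overflow chaining (bucket count and capacity invented for illustration) can be sketched as:

```python
NUM_BUCKETS = 4
CAPACITY = 2

primary = [[] for _ in range(NUM_BUCKETS)]   # fixed set of primary buckets
overflow = [[] for _ in range(NUM_BUCKETS)]  # overflow chain per bucket

def insert(key):
    b = key % NUM_BUCKETS        # static hash: always the same address
    if len(primary[b]) < CAPACITY:
        primary[b].append(key)
    else:
        overflow[b].append(key)  # collision: chain into the overflow bucket

for k in [4, 8, 12, 5]:
    insert(k)

print(primary[0], overflow[0])   # [4, 8] [12]  -> bucket 0 overflowed
print(primary[1])                # [5]
```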
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on-demand. Dynamic hashing is also known as extended
hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a few
are used initially.
The aims of query processing are to transform a query written in a high-level language, typically
SQL, into a correct and efficient execution strategy expressed in a low-level language
(implementing the relational algebra), and to execute the strategy to retrieve the required data.
Example:
The following SQL query displays the names of employees whose salary is greater than 5000:
SELECT Ename FROM Employee WHERE Salary > 5000;
A sequence of primitive operations that can be used to evaluate a query is a Query Execution
Plan or Query Evaluation Plan.
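As an illustrative sketch using SQLite (table contents invented), EXPLAIN QUERY PLAN exposes the evaluation plan chosen for the example query, and then executing the query performs the evaluation step:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Ename TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [("Asha", 6000), ("Ravi", 4000)])

# Parsing/translation + optimization: the plan SQLite chose for the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Ename FROM Employee WHERE Salary > 5000"
).fetchall()
print(plan)  # e.g. a full scan of Employee, since no index exists

# Evaluation: executing the chosen plan produces the result.
rows = conn.execute(
    "SELECT Ename FROM Employee WHERE Salary > 5000").fetchall()
print(rows)  # [('Asha',)]
```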
Distributed DBMS:
A distributed database is a collection of multiple interconnected databases, which are spread
physically across various locations that communicate via a computer network.
A Distributed Database Management System (DDBMS) manages the distributed database and
provides mechanisms so as to make the databases transparent to the users.
In these systems, data is intentionally distributed among multiple nodes so that all computing
resources of the organization can be optimally used.
Factors Encouraging DDBMS:
• Distributed Nature of Organizational Units
• Need for Sharing of Data
• Support for Both OLTP and OLAP
• Database Recovery
• Support for Multiple Application Software
Types of DDBMS
• Homogeneous – Autonomous and Non-autonomous
• Heterogeneous - Federated and Un-federated
• Client - Server Architecture for DDBMS - The server functions primarily encompass
data management, query processing, optimization and transaction management. Client
functions include mainly user interface.
• Peer - to - Peer Architecture for DDBMS - In these systems, each peer acts both as a
client and a server for imparting database services. The peers share their resource with
other peers and co-ordinate their activities.
• Multi - DBMS Architecture for DDBMS - This is an integrated database system
formed by a collection of two or more autonomous database systems.
• Object oriented database systems are an alternative to relational databases and other database
systems.
• In an object oriented database, information is represented in the form of objects.
• Object oriented databases use the same object model as object oriented programming
languages. If we combine the features of the relational model (transactions, concurrency,
recovery) with object oriented databases, the resultant model is called the object oriented
database model.
Features of OODBMS
Complexity: An OODBMS can represent the complex internal structure of objects, with multiple
levels of complexity.
Inheritance: Creating a new object from an existing object in such a way that the new object
inherits all characteristics of the existing object.
Encapsulation: A data-hiding concept from object oriented programming that binds together data
and the functions that manipulate it, keeping both hidden from the outside world.
Persistency: An OODBMS allows the creation of persistent objects (objects that outlive the
program execution that created them). This feature can automatically solve problems of recovery
and concurrency.
• Every object has a unique identity. In an object oriented system, an OID is assigned to each
object when it is created.
• In an RDBMS, identity is value based: the primary key provides uniqueness for each tuple in a
relation. The primary key is unique only within that relation, not for the entire system, and
because it is chosen from the attributes of the relation, object identity depends on the
object's state.
• In an OODBMS, OIDs are accessed through variable names or pointers and are independent of
the object's state.
Type of Attributes
1. Simple attributes
Attributes of a primitive data type, such as integer, string, or real, which take literal
values.
2. Complex attributes
Attributes that consist of collections of, or references to, multiple other objects are called
complex attributes.
3. Reference attributes
Attributes that represent a relationship between objects and consist of a value or a collection of
values are called reference attributes.
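The attribute kinds can be sketched as Python classes (class and attribute names invented for illustration):

```python
class Department:
    def __init__(self, name):
        self.name = name                 # simple attribute: a string literal

class Employee:
    def __init__(self, name, skills, dept):
        self.name = name                 # simple attribute
        self.skills = list(skills)       # complex attribute: a collection
        self.dept = dept                 # reference attribute: another object

d = Department("AI & DS")
e = Employee("Manisha", ["SQL", "XML"], d)
print(e.dept.name)   # AI & DS  (navigating the reference attribute)
```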
XML Database is used to store huge amount of information in the XML format. As the use of
XML is increasing in every field, it is required to have a secured place to store the XML
documents. The data stored in the database can be queried using XQuery, serialized, and
exported into a desired format.
• XML- enabled
• Native XML (NXD)
XML-Enabled Database: An XML-enabled database is an extension of a relational database that
provides for the conversion of XML documents. Data is stored in tables consisting of rows and
columns; the tables contain sets of records, which in turn consist of fields.
Native XML database is based on the container rather than table format. It can store large
amount of XML document and data. Native XML database is queried by the XPath-expressions.
Example
<contact-info>
<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>
XML parser is a software library or a package that provides interface for client applications to
work with XML documents. It checks for proper format of the XML document and may also
validate the XML documents. Modern day browsers have built-in XML parsers.
XML Document Type Declaration, commonly known as DTD, is a way to describe XML
language precisely. DTDs check vocabulary and validity of the structure of XML documents
against the grammatical rules of the appropriate XML language. An XML document is always
descriptive. Its tree structure, often referred to as the XML Tree, plays an important role in
describing any XML document easily. The tree structure contains the root (parent) element, child
elements, and so on. Using the tree structure, you can identify all succeeding branches and
sub-branches starting from the root. Parsing starts at the root, then moves down the first branch
to an element, takes the first branch from there, and so on to the leaf nodes.
Document Object Model (DOM) is the foundation of XML. XML documents have a hierarchy
of informational units called nodes; DOM is a way of describing those nodes and the
relationships between them.
A DOM document is a collection of nodes or pieces of information organized in a hierarchy.
This hierarchy allows a developer to navigate through the tree looking for specific information.
Because it is based on a hierarchy of information, the DOM is said to be tree based.
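As a sketch, Python's built-in xml.dom.minidom parses the contact document shown earlier into a node hierarchy that can be navigated:

```python
from xml.dom import minidom

xml_text = """<contact-info>
  <contact2>
    <name>Manisha Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </contact2>
</contact-info>"""

# Parse the document into a DOM tree, then navigate its nodes.
doc = minidom.parseString(xml_text)
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)          # Manisha Patil
print(doc.documentElement.tagName)   # contact-info (the root node)
```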
XQuery is a specification for a query language that allows a user or programmer to extract
information from an Extensible Markup Language (XML) file or any collection of data that can
be XML-like. The syntax is intended to be easy to understand and use
XML Namespace is a set of unique names. A namespace is a mechanism by which element and
attribute names can be assigned to a group. A namespace is identified by a URI (Uniform
Resource Identifier).
20. Explain in detail about Information Retrieval
• Information retrieval (IR) systems use a simpler data model than database systems
• Information organized as a collection of documents
• Documents are unstructured, no schema
• Information retrieval locates relevant documents, on the basis of user input such as
keywords or example documents
e.g., find documents containing the words “database systems”
• Can be used even on textual descriptions provided with nontextual data such as images
• Web search engines are the most familiar example of IR systems
• In full text retrieval, all the words in each document are considered to be keywords.
• We use the word term to refer to the words in a document
Information retrieval systems typically allow query expressions formed using keywords
and the logical connectives and, or, and not
• Ands are implicit, even if not explicitly specified
• Ranking of documents on the basis of estimated relevance to a query is critical
• Relevance ranking is based on factors such as
• Term frequency – Frequency of occurrence of query keyword in document
• Inverse document frequency – the number of documents in which the query keyword occurs;
the fewer the documents, the more importance given to the keyword
• Hyperlinks to documents – More links to a document -> document is more important
• Web crawlers are programs that locate and gather information on the Web
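A minimal TF-IDF relevance ranking over a toy collection (documents invented) illustrates how these factors combine:

```python
import math

docs = {
    "d1": "database systems store data in a database",
    "d2": "operating systems manage hardware",
    "d3": "the web has many documents",
}

def score(doc_words, term, all_docs):
    tf = doc_words.count(term)                       # term frequency
    df = sum(term in d.split() for d in all_docs)    # document frequency
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(all_docs) / df)               # fewer docs -> higher weight
    return tf * idf

# Rank documents by total TF-IDF score over the query keywords
# (keywords implicitly and-ed, as described above).
query = ["database", "systems"]
ranking = sorted(
    docs,
    key=lambda name: -sum(score(docs[name].split(), t, list(docs.values()))
                          for t in query),
)
print(ranking[0])   # d1 is ranked most relevant to "database systems"
```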