5-Unit DBMS
Part-A
2. List the types of information used for relevance ranking of documents in IR.
( Nov/Dec-2019)
Relevance ranking is based on factors such as:
o Term frequency – the frequency of occurrence of the query keyword in the document.
o Inverse document frequency – how many documents the query keyword occurs in; the fewer documents it occurs in, the more importance the keyword is given.
o Hyperlinks to documents – the more links point to a document, the more important the document is considered.
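The first two factors combine into the classic tf-idf weight. A minimal sketch in Python (the toy document collection is hypothetical):

```python
import math

def tf_idf(term, doc, docs):
    """Term frequency weighted by inverse document frequency."""
    tf = doc.count(term)                      # occurrences in this document
    df = sum(1 for d in docs if term in d)    # documents containing the term
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

docs = [["database", "index", "query"],
        ["database", "transaction"],
        ["query", "optimizer", "query"]]

# "query" appears twice in the third document and in 2 of 3 documents.
score = tf_idf("query", docs[2], docs)
```

A term occurring in every document gets idf = log(1) = 0 and so contributes nothing, matching the intuition above.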
CS 8492-DATABASE MANAGEMENT SYSTEMS
When a transaction is committed, this list is recursively traversed by following all references in each object.
So if we assign a reference to some newly created object N to some persistent object P, then object P will be included in the list of modified objects.
During transaction commit the OODBMS traverses object P and reaches object N. It detects that N is not yet persistent (it has not been assigned an OID) and makes it persistent by allocating storage space for it and assigning an OID.
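The commit-time traversal described above can be sketched in Python; the Obj class, the OID counter, and the notion of "storage" here are simplifications assumed for illustration:

```python
import itertools

_next_oid = itertools.count(1)   # stands in for OID allocation in storage

class Obj:
    def __init__(self, *refs):
        self.oid = None          # None means the object is not yet persistent
        self.refs = list(refs)   # references to other objects

def commit(modified):
    """Traverse every modified object; any reachable object
    without an OID is made persistent (persistence by reachability)."""
    stack = list(modified)
    while stack:
        obj = stack.pop()
        if obj.oid is None:
            obj.oid = next(_next_oid)   # allocate storage and assign an OID
        stack.extend(r for r in obj.refs if r.oid is None)

n = Obj()               # newly created, transient object N
p = Obj(n)              # persistent object P now references N
p.oid = next(_next_oid)
commit([p])             # N becomes persistent because it is reachable from P
```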
7. Compare sequential access devices versus random access devices with an example. (Apr/May-
2019)
Comparing random versus sequential operations is one way of assessing application
efficiency in terms of disk use. Accessing data sequentially is much faster than accessing it randomly
because of the way in which the disk hardware works. The seek operation, which occurs when the
disk head positions itself at the right disk cylinder to access data requested, takes more time than any
other part of the I/O process.
Because reading randomly involves a higher number of seek operations than does sequential
reading, random reads deliver a lower rate of throughput. The same is true for random writing.
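The two access patterns can be illustrated with ordinary file I/O; a sketch in Python (the block size and read order are arbitrary choices, and the real throughput gap comes from disk seeks, which this does not measure):

```python
import os
import tempfile

# Eight 512-byte blocks written once, then read back two ways. On a
# spinning disk the random pattern forces a head seek before every read,
# which is what makes it slower; the data returned is identical.
fd, path = tempfile.mkstemp()
os.close(fd)
blocks = [bytes([i]) * 512 for i in range(8)]
with open(path, "wb") as f:
    for b in blocks:
        f.write(b)

with open(path, "rb") as f:                 # sequential: one pass, no seeks
    sequential = [f.read(512) for _ in range(8)]

order = [5, 0, 7, 2, 6, 1, 4, 3]            # random: seek before every read
random_reads = {}
with open(path, "rb") as f:
    for i in order:
        f.seek(i * 512)
        random_reads[i] = f.read(512)

os.remove(path)
```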
Features:
DDBMS is used to create, retrieve, update and delete distributed databases.
It synchronizes the database periodically and provides access mechanisms by virtue of
which the distribution becomes transparent to the users.
It ensures that the data modified at any site is universally updated.
It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
It is designed for heterogeneous database platforms.
It maintains confidentiality and data integrity of the databases.
9. How does the concept of an object in the object-oriented model differ from the concept of an
entity in the Entity-Relationship (E-R) model? (Nov/Dec-16)
E-R Model:
The ER model represents real-life scenarios as entities. The properties of these entities are their attributes in the ER diagram, and their connections are shown in the form of relationships. An ER model is generally considered a top-down approach to data design.
Object-Oriented Model:
An object, unlike an entity, encapsulates behaviour as well as state: it combines attributes with the methods (operations) that act on them, and it has a unique object identifier (OID) that is independent of its attribute values. An entity in the E-R model is purely a description of data; it has attributes and relationships but no associated methods.
10. What is the difference between XML schema and XML DTD?
DTD, or Document Type Definition, and XML Schema, which is also known as XSD, are two
ways of describing the structure and content of an XML document. DTD is the older of the two, and as
such, it has limitations that XML Schema has tried to improve. The first difference between DTD and
XML Schema is namespace awareness; XML Schema is namespace-aware, while DTD is not. Namespace awareness
removes the ambiguity that can result from having certain elements and attributes from multiple XML
vocabularies, by giving them namespaces that put the element or attribute into context.
Part of the reason why XML Schema is namespace aware while DTD is not, is the fact that XML
Schema is written in XML, and DTD is not. Therefore, XML Schemas can be programmatically
processed just like any XML document. XML Schema also eliminates the need to learn another
language, as it is written in XML, unlike DTD.
Part-B
A distributed database is a system in which storage devices are not all attached to a common
processing unit.
The database is controlled by a Distributed Database Management System (DDBMS), and the data may be stored at
the same location or spread over an interconnected network. It is a loosely coupled system.
Shared nothing architecture is used in distributed databases.
The above diagram is a typical example of distributed database system, in which communication
channel is used to communicate with the different locations and every system has its own
memory and database.
Goals of Distributed Database system:
The concept of distributed database was built with a goal to improve:
Reliability: In a distributed database system, if one system fails or stops working for some time,
another system can complete the task.
Availability: In a distributed database system, availability can be achieved even if a server fails;
another system is available to serve the client request.
Performance: Performance can be improved by distributing the database over different locations, so the
databases are available at every location and are easy to maintain.
Types of distributed databases:
(i) Homogeneous distributed databases system:
Homogeneous distributed database system is a network of two or more databases (With same
type of DBMS software) which can be stored on one or more machines.
Example: Consider three departments using Oracle-9i as the DBMS. If changes are made in one
department, they are propagated to the other departments as well.
(ii) Heterogeneous distributed database system.
Heterogeneous distributed database system is a network of two or more databases with different
types of DBMS software, which can be stored on one or more machines.
In this system data can be accessible to several databases in the network with the help of generic
connectivity (ODBC and JDBC).
Example: In the following diagram, different DBMS software are accessible to each other using
ODBC and JDBC.
Fragmentation is advantageous as it does not create copies of data, so consistency is not a problem.
Fragmentation of relations can be done in two ways:
Horizontal fragmentation – Splitting by rows – The relation is fragmented into groups of
tuples so that each tuple is assigned to at least one fragment.
Vertical fragmentation – Splitting by columns – The schema of the relation is divided into
smaller schemas. Each fragment must contain a common candidate key so as to ensure
lossless join.
2. Give XML representation of bank management system and also explain about
Document Type Definition and XML schema. (Nov/Dec-2018)
Since the XML format is widely accepted, a wide variety of tools are available to assist in
its processing, including browser software and database tools. Just as SQL is the dominant language
for querying relational data, XML is becoming the dominant format for data exchange.
<bank>
<account>
<account-number> A-101 </account-number>
<branch-name> Downtown </branch-name>
<balance> 500 </balance>
</account>
<account>
<account-number> A-102 </account-number>
<branch-name> Perryridge </branch-name>
<balance> 400 </balance>
</account>
<account>
<account-number> A-201 </account-number>
<branch-name> Brighton </branch-name>
<balance> 900 </balance>
</account>
<customer>
<customer-name> Johnson </customer-name>
<customer-street> Alma </customer-street>
<customer-city> Palo Alto </customer-city>
</customer>
<customer>
<customer-name> Hayes </customer-name>
<customer-street> Main </customer-street>
<customer-city> Harrison </customer-city>
</customer>
<depositor>
<account-number> A-101 </account-number>
<customer-name> Johnson </customer-name>
</depositor>
<depositor>
<account-number> A-201 </account-number>
<customer-name> Johnson </customer-name>
</depositor>
<depositor>
<account-number> A-102 </account-number>
<customer-name> Hayes </customer-name>
</depositor>
(a) XML representation of bank information.
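A document in this format can be processed with any XML parser; for instance, a sketch using Python's standard xml.etree module on a shortened copy of the document above:

```python
import xml.etree.ElementTree as ET

doc = """<bank>
  <account>
    <account-number> A-101 </account-number>
    <branch-name> Downtown </branch-name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account-number> A-101 </account-number>
    <customer-name> Johnson </customer-name>
  </depositor>
</bank>"""

bank = ET.fromstring(doc)
# Collect each account's number and balance by tag name.
accounts = {a.findtext("account-number").strip(): int(a.findtext("balance"))
            for a in bank.findall("account")}
owner = bank.find("depositor").findtext("customer-name").strip()
```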
Structure of XML Data:
The fundamental construct in an XML document is the element. An element is simply a pair
of matching start- and end-tags, and all the text that appears between them. XML documents must
have a single root element that encompasses all other elements in the document. In the example in
Figure the <bank> element forms the root element. Further, elements in an XML document must
nest properly. For instance,
<account> . . . <balance> . . . </balance> . . . </account>
is properly nested, whereas
<account> . . . <balance> . . . </account> . . . </balance>
is not properly nested. While proper nesting is an intuitive property, we may define it more
formally. Text is said to appear in the context of an element if it appears between the start-tag and
end-tag of that element. Tags are properly nested if every start-tag has a unique matching end-tag
that is in the context of the same parent element.
The ability to nest elements within other elements provides an alternative way to represent
information. Figure shows a representation of the bank information from Figure, but with
account elements nested within customer elements.
The nested representation makes it easy to find all accounts of a customer, although it
would store account elements redundantly if they are owned by multiple customers.
Nested representations are widely used in XML data interchange applications to avoid
joins.
For instance, a shipping application would store the full address of sender and receiver
redundantly on a shipping document associated with each shipment, whereas a normalized
representation may require a join of shipping records with a company-address relation to
get address information.
In addition to elements, XML specifies the notion of an attribute. For instance, the type of
an account can be represented as an attribute, as in Figure (d). The attributes of an element
appear as name=value pairs before the closing “>” of a tag.
Attributes are strings, and do not contain markup. Furthermore, attributes can appear only
once in a given tag, unlike subelements, which may be repeated.
<account>
This account is seldom used any more.
<account-number> A-102 </account-number>
<branch-name> Perryridge </branch-name>
<balance> 400 </balance>
</account>
(b) Mixture of text with subelements
<bank-1>
<customer>
<customer-name> Johnson </customer-name>
<customer-street> Alma </customer-street>
<customer-city> Palo Alto </customer-city>
<account>
<account-number> A-101 </account-number>
<branch-name> Downtown </branch-name>
<balance> 500 </balance>
</account>
<account>
<account-number> A-201 </account-number>
<branch-name> Brighton </branch-name>
<balance> 900 </balance>
</account>
</customer>
<customer>
<customer-name> Hayes </customer-name>
<customer-street> Main </customer-street>
<customer-city> Harrison </customer-city>
<account>
<account-number> A-102 </account-number>
<branch-name> Perryridge </branch-name>
<balance> 400 </balance>
</account>
</customer>
</bank-1>
(c) Nested XML representation of bank information.
One final syntactic note is that an element of the form <element></element>, which
contains no subelements or text, can be abbreviated as <element/>; abbreviated elements
may, however, contain attributes.
Since XML documents are designed to be exchanged between applications, a namespace
mechanism has been introduced to allow organizations to specify globally unique names
to be used as element tags in documents.
The idea of a namespace is to prepend each tag or attribute with a universal resource
identifier (for example, a Web address). Thus, for example, if First Bank wanted to ensure
that the XML documents
...
<account acct-type= “checking”>
<account-number> A-102 </account-number>
<branch-name> Perryridge </branch-name>
<balance> 400 </balance>
</account>
...
(d)Use of attributes.
it created would not duplicate tags used by any business partner’s XML documents, it could
prepend a unique identifier followed by a colon to each tag name.
The bank may use a Web URL such as http://www.FirstBank.com as a unique identifier. Using long unique identifiers in
every tag would be rather inconvenient, so the namespace standard provides a way to
define an abbreviation for identifiers.
In Figure, the root element (bank) has an attribute xmlns:FB, which declares that FB is
defined as an abbreviation for the URL given above. The abbreviation can then be used in
various element tags, as illustrated in the figure.
A document can have more than one namespace, declared as part of the root element.
Different elements can then be associated with different namespaces. A default namespace
can be defined, by using the attribute xmlns instead of xmlns:FB in the root element.
Elements without an explicit namespace prefix would then belong to the default
namespace. Sometimes we need to store values containing tags without having the tags
interpreted as XML tags. So that we can do so, XML allows this construct:
<![CDATA[<account> · · ·</account>]]>
Because it is enclosed within CDATA, the text <account> is treated as normal text data, not as a
tag. The term CDATA stands for character data.
<bank xmlns:FB=“http://www.FirstBank.com”>
...
<FB:branch>
<FB:branchname> Downtown </FB:branchname>
<FB:branchcity> Brooklyn </FB:branchcity>
</FB:branch>
...
</bank>
(e) Unique tag names through the use of namespaces.
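Parsers resolve such prefixes to the full namespace URI; for example, Python's xml.etree addresses namespaced tags in Clark notation ({uri}localname), as this sketch on the figure's document shows:

```python
import xml.etree.ElementTree as ET

doc = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname> Downtown </FB:branchname>
    <FB:branchcity> Brooklyn </FB:branchcity>
  </FB:branch>
</bank>"""

root = ET.fromstring(doc)
# The FB: prefix is only an abbreviation; internally each tag carries
# the full namespace URI in Clark notation: {uri}localname.
branch = root.find("{http://www.FirstBank.com}branch")
city = branch.findtext("{http://www.FirstBank.com}branchcity").strip()
```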
XML Document Schema:
Databases have schemas, which are used to constrain what information can be stored in the
database and to constrain the data types of the stored information.
In contrast, by default, XML documents can be created without any associated schema: An
element may then have any subelement or attribute.
While such freedom may occasionally be acceptable given the self-describing nature of the
data format, it is not generally useful when XML documents must be processed
automatically as part of an application, or even when large amounts of related data are to be
formatted in XML.
Here, we describe the document-oriented schema mechanisms included as part of the XML
standard: the Document Type Definition (DTD), as well as the more recently defined XML Schema.
Document Type Definition:
The Document Type Definition (DTD) is an optional part of an XML document. The main
purpose of a DTD is much like that of a schema: to constrain and type the information
present in the document.
However, the DTD does not in fact constrain types in the sense of basic types like integer or
string. Instead, it only constrains the appearance of subelements and attributes within an
element.
The DTD is primarily a list of rules for what pattern of subelements appear within an
element. Figure shows a part of an example DTD for a bank information document;
<!DOCTYPE bank [
<!ELEMENT bank ( (account | customer | depositor)+ )>
<!ELEMENT account ( account-number branch-name balance )>
<!ELEMENT customer ( customer-name customer-street customer-city )>
<!ELEMENT depositor ( customer-name account-number )>
<!ELEMENT account-number ( #PCDATA )>
<!ELEMENT branch-name ( #PCDATA )>
<!ELEMENT balance( #PCDATA )>
<!ELEMENT customer-name( #PCDATA )>
<!ELEMENT customer-street( #PCDATA )>
<!ELEMENT customer-city( #PCDATA )>
(f) Example of a DTD
The account element is defined to contain subelements account-number, branch name and
balance (in that order). Similarly, customer and depositor have the attributes in their schema
defined as subelements.
Finally, the elements account-number, branch-name, balance, customer-name, customer-
street, and customer-city are all declared to be of type #PCDATA. The keyword #PCDATA
indicates text data; it derives its name, historically, from “parsed character data.”
XML Schema:
An effort to redress many of these DTD deficiencies resulted in a more sophisticated
schema language, XML Schema. Here is an example of XML Schema, along with some areas in
which it improves on DTDs, without giving full details of XML Schema’s syntax.
Figure (g) shows how the DTD in Figure (f) can be represented by XMLSchema. The first
element is the root element bank, whose type is declared later.
The example then defines the types of elements account, customer, and depositor. Observe
the use of types xsd:string and xsd:decimal to constrain the types of data elements.
Finally the example defines the type BankType as containing zero or more occurrences of
each of account, customer and depositor. XMLSchema can define the minimum and
maximum number of occurrences of subelements by using minOccurs and maxOccurs.
The default for both minimum and maximum occurrences is 1, so these have to be explicitly
specified to allow zero or more accounts, depositors, and customers. Among the benefits that
XMLSchema offers over DTDs are these:
It allows user-defined types to be created.
It allows the text that appears in elements to be constrained to specific types, such as
numeric types.
<xsd:schema xmlns:xsd=“http://www.w3.org/2001/XMLSchema”>
<xsd:element name=“bank” type=“BankType” />
<xsd:element name=“account”>
<xsd:complexType>
<xsd:sequence>
<xsd:element name=“account-number” type=“xsd:string”/>
<xsd:element name=“branch-name” type=“xsd:string”/>
<xsd:element name=“balance” type=“xsd:decimal”/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name=“customer”>
<xsd:complexType>
<xsd:sequence>
<xsd:element name=“customer-name” type=“xsd:string”/>
<xsd:element name=“customer-street” type=“xsd:string”/>
<xsd:element name=“customer-city” type=“xsd:string”/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name=“depositor”>
<xsd:complexType>
<xsd:sequence>
<xsd:element name=“customer-name” type=“xsd:string”/>
<xsd:element name=“account-number” type=“xsd:string”/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name=“BankType”>
<xsd:sequence>
<xsd:element ref=“account” minOccurs=“0” maxOccurs=“unbounded”/>
<xsd:element ref=“customer” minOccurs=“0” maxOccurs=“unbounded”/>
<xsd:element ref=“depositor” minOccurs=“0” maxOccurs=“unbounded”/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
(g) XML Schema version of the DTD in Figure (f).
1. Data replication
Data Replication is the process of storing data in more than one site or node. It is useful
in improving the availability of data. It is simply copying data from a database from one
server to another server so that all the users can share the same data without any
inconsistency.
The result is a distributed database in which users can access data relevant to their tasks
without interfering with the work of others.
Data replication encompasses duplication of transactions on an ongoing basis, so that
the replica is in a consistently updated state and synchronized with the source. In contrast
to fragmentation, where a particular relation (or fragment) resides at only one location,
replication makes the same data available at several locations.
There can be full replication, in which the whole database is stored at every site. There can
also be partial replication, in which some frequently used fragments of the database are
replicated and others are not.
Types of Data Replication:
Transactional Replication – In transactional replication users receive full initial copies of the
database and then receive updates as data changes. Data is copied in real time from the publisher
to the receiving database (subscriber) in the same order as the changes occur at the publisher;
therefore, in this type of replication, transactional consistency is guaranteed.
Transactional replication is typically used in server-to-server environments. It does not
simply copy the data changes, but rather consistently and accurately replicates each change.
Snapshot Replication – Snapshot replication distributes data exactly as it appears at a specific
moment in time and does not monitor for updates to the data. The entire snapshot is generated and
sent to the subscribers. Snapshot replication is generally used when data changes are infrequent.
It is a bit slower than transactional replication because on each attempt it moves multiple records
from one end to the other. Snapshot replication is a good way to perform initial synchronization
between the publisher and the subscriber.
Merge Replication – Data from two or more databases is combined into a single database. Merge
replication is the most complex type of replication because it allows both publisher and subscriber
to independently make changes to the database. Merge replication is typically used in server-to-
client environments. It allows changes to be sent from one publisher to multiple subscribers.
Advantages of Data Replication:
Reliability − In case of failure of any site, the database system continues to work since a copy is
available at another site(s).
Reduction in Network Load − Since local copies of data are available, query processing can be
done with reduced network usage, particularly during prime hours. Data updating can be done at
non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query processing and
consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at different
sites and minimal coordination across the network. Thus, they become simpler in nature.
Disadvantages of Data Replication:
Increased Storage Requirements − Maintaining multiple copies of data is associated with
increased storage costs. The storage space required is in multiples of the storage required for a
centralized system.
Increased Cost and Complexity of Data Updating − Each time a data item is updated, the
update needs to be reflected in all the copies of the data at the different sites. This requires
complex synchronization techniques and protocols.
Undesirable Application – Database coupling − If complex update mechanisms are not used,
removing data inconsistency requires complex co-ordination at application level. This results in
undesirable application – database coupling.
Fragmentation:
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the
table are called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid
(combination of horizontal and vertical). Horizontal fragmentation can further be classified into
two techniques: primary horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in a way that allows the original table to be reconstructed from
the fragments whenever required. This requirement is called “reconstructiveness.”
Advantages of Fragmentation:
Since data is stored close to the site of usage, efficiency of the database system is
increased.
Local query optimization techniques are sufficient for most queries since data is locally
available.
Since irrelevant data is not available at the sites, security and privacy of the database
system can be maintained.
Disadvantages of Fragmentation:
When data from different fragments is required, access times may be very high.
In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
Lack of back-up copies of data in different sites may render the database ineffective in
case of failure of a site.
(i)Vertical Fragmentation:
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In
order to maintain reconstructiveness, each fragment should contain the primary key field(s) of the
table. Vertical fragmentation can be used to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered
students in a Student table having the following schema.
Relation: STUDENT
Now, the fees details are maintained in the accounts section. In this case, the designer will
fragment the database as follows:
CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT;
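The fragmentation can be demonstrated in SQLite; a sketch assuming, for illustration, that STUDENT has the columns Regd_No, Name, and Fees (the fragment name STD_INFO is invented here for the remaining columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT "
            "(Regd_No INTEGER PRIMARY KEY, Name TEXT, Fees INTEGER)")
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                [(1, "Asha", 5000), (2, "Ravi", 6000)])

# Two vertical fragments; each keeps the primary key Regd_No.
con.execute("CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT")
con.execute("CREATE TABLE STD_INFO AS SELECT Regd_No, Name FROM STUDENT")

# Because every fragment carries the key, the join is lossless.
rebuilt = con.execute(
    "SELECT i.Regd_No, i.Name, f.Fees "
    "FROM STD_INFO i JOIN STD_FEES f ON i.Regd_No = f.Regd_No "
    "ORDER BY i.Regd_No").fetchall()
```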
(ii)Horizontal Fragmentation:
Horizontal fragmentation groups the tuples of a table according to the values of one or
more fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each
horizontal fragment must have all the columns of the original base table.
For example, in the student schema, if the details of all students of Computer Science
Course need to be maintained at the School of Computer Science, then the designer will
horizontally fragment the database as follows:
CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE Course = 'Computer Science';
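A runnable sketch of the same idea in SQLite (the Course column and the sample rows are assumed for illustration), showing that a UNION of the fragments reconstructs the original table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (Regd_No INTEGER, Name TEXT, Course TEXT)")
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                [(1, "Asha", "Computer Science"), (2, "Ravi", "Physics")])

# One horizontal fragment per site, selected by a predicate on Course.
con.execute("CREATE TABLE COMP_STD AS SELECT * FROM STUDENT "
            "WHERE Course = 'Computer Science'")
con.execute("CREATE TABLE OTHER_STD AS SELECT * FROM STUDENT "
            "WHERE Course <> 'Computer Science'")

# Reconstruction ("reconstructiveness") is a UNION of the fragments.
rebuilt = con.execute("SELECT * FROM COMP_STD UNION "
                      "SELECT * FROM OTHER_STD ORDER BY Regd_No").fetchall()
```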
(iii)Hybrid Fragmentation:
In hybrid fragmentation, a combination of horizontal and vertical fragmentation
techniques is used. This is the most flexible fragmentation technique since it generates
fragments with minimal extraneous information. However, reconstruction of the original table is
often an expensive task.
Hybrid fragmentation can be done in two alternative ways:
At first, generate a set of horizontal fragments; then generate vertical fragments from one
or more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one
or more of the vertical fragments.
One assumes that the user then starts at the top of the ranked list and works their way
down examining each document in turn for relevance. This, of course, is an estimation of how
users behave; in practice they are often far less predictable. There are also further complications
that must be considered.
(i)Set-based measures:
Two simple measures developed early on were precision and recall. These are set-based
measures: documents in the ranking are treated as unique and the ordering of results is
ignored. Precision measures the fraction of retrieved documents that are relevant; recall
measures the fraction of relevant documents that are retrieved.
Precision and recall hold an approximate inverse relationship: higher precision is often
coupled with lower recall. However, this is not always the case as it has been shown that
precision is affected by the retrieval of non-relevant documents; recall is not.
Compared to other evaluation measures, precision is simple to compute because one only
considers the set of retrieved documents (as long as relevance can be judged). However, to
compute recall requires comparing the set of retrieved documents with the entire
collection, which is impossible in many cases (e.g., for Web search). In this situation
techniques, such as pooling, are used.
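The two definitions can be written directly as a small Python function (the document identifiers below are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall: ranking order is ignored."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents retrieved, 3 of them relevant; 6 relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d2", "d4", "d7", "d8", "d9"])
```

Note that precision needs only the retrieved set, while recall needs the full relevant set, which is exactly why recall is hard to compute for Web-scale collections.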
Often preference is given to either precision or recall. For example, in Web search the
focus is typically on obtaining high precision by finding as many relevant documents in
the top n results. However, there are certain domains, such as patent search, where the
focus is on finding all relevant documents through an exhaustive search.
(ii)Rank-based measures:
More commonly used measures are based on evaluating ranked retrieval results, where
importance is placed not only on obtaining the maximum number of relevant documents,
but also on returning relevant documents higher in the ranked list.
A common way to evaluate ranked outputs is to compute precision at various levels of
recall (e.g., 0.0, 0.1, 0.2, ... 1.0), or at the rank positions of all the relevant documents and
the scores averaged (referred to as average precision).
A further common measure is precision at a fixed rank position, for example Precision at
rank 10 (P10 or P@10). Because the number of relevant documents can influence the
P@10 score, an alternative measure called R-precision can be used: precision is measured
at the rank position Rq, the total number of relevant documents for query q.
More recently, measures based on non-binary (or graded) relevance judgments have been
utilised, such as discounted cumulative gain. In such measures, each document is given a
score indicating relevance (e.g., relevant=2; partially-relevant=1; non-relevant=0).
Discounted Cumulative Gain (DCG) computes a value for the number of relevant
documents retrieved that includes a discount function to progressively reduce the
importance of relevant documents found further down the ranked results list. This
simulates the assumption that users prefer relevant documents higher in the ranked list.
The measure also makes the assumption that highly relevant documents are more useful
than partially relevant documents, which in turn are more useful than non-relevant
documents. The score can be normalised to provide a value in the range 0 to 1, known
as normalised DCG (nDCG).
The measure can be averaged across multiple topics similar to computing mean average
precision, and it has also been extended to compute the value of retrieved results across
multiple queries in a session, referred to as normalised session Discounted Cumulative
Gain or nsDCG.
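One common formulation of DCG uses a log2 position discount; a sketch in Python (the graded judgments are hypothetical, and other discount variants exist):

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """Normalise by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0

# Graded judgments: relevant = 2, partially relevant = 1, non-relevant = 0.
ranking = [2, 0, 1, 2]   # a relevant doc sits below a non-relevant one
score = ndcg(ranking)
```

The ideal ordering scores nDCG = 1.0; any ranking that places relevant documents lower scores strictly less.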
(iii)Other measures:
Additional measures have been developed to evaluate different information retrieval
problems. For example, to measure the success of search tasks where just one relevant
document is required (known-item search), measures, such as Mean Reciprocal
Rank (MRR), can be used.
In practice it is important to select an evaluation measure that is suitable for the given
task; for example, if the problem is known-item search then the mean reciprocal rank
would be appropriate, while for an ad hoc search task mean average precision or averaged
normalised discounted cumulative gain would be applicable.
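MRR can be computed as follows; a sketch in Python with hypothetical rankings:

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """Average, over queries, of 1/rank of the first relevant document."""
    total = 0.0
    for docs, relevant in zip(rankings, relevant_sets):
        for rank, d in enumerate(docs, start=1):
            if d in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Query 1 finds its item at rank 1, query 2 at rank 3: (1 + 1/3) / 2.
mrr = mean_reciprocal_rank([["a", "b"], ["x", "y", "z"]],
                           [{"a"}, {"z"}])
```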
The evaluation of Prolog programs is based on a technique called backward
chaining, which involves a top-down evaluation of goals. In the deductive databases
that use Datalog, attention has been devoted to handling large volumes of data stored in a
relational database.
Hence, evaluation techniques have been devised that resemble those for a bottom-up
evaluation. Prolog suffers from the limitation that the order of specification of facts and
rules is significant in evaluation.
(ii)The Spatial DB:
Spatial data is associated with geographic locations such as cities, towns, etc. A spatial
database is optimized to store and query data representing objects defined in a geometric space.
OQL is an SQL-like query language used to query the Java heap. OQL allows you to filter and select
the information wanted from the Java heap. While pre-defined queries such as "show all instances of
class X" are already supported by HAT, OQL adds more flexibility. OQL is based on the
JavaScript expression language.
An OQL query is of the form:
select <JavaScript expression to select>
[ from [instanceof] <class name> <identifier>
[ where <JavaScript boolean expression to filter> ] ]
where class name is a fully qualified Java class name (for example, java.net.URL) or an array class
name ([C is the name of a char array, [Ljava.io.File; is the name of a java.io.File array, and so on).
Note that a fully qualified class name does not always uniquely identify a Java class at runtime.
There may be more than one Java class with the same name but loaded by different loaders.
So, the class name is permitted to be the id string of the class object.
If the instanceof keyword is used, subtype objects are selected. If this keyword is not specified,
only the instances of the exact class specified are selected. Both the from and where clauses are
optional.
In the select and (optional) where clauses, the expression used is a JavaScript expression. Java
heap objects are wrapped as convenient script objects so that fields may be accessed in natural
syntax.
For example, Java fields can be accessed with obj.field_name syntax and array elements can
be accessed with array[index] syntax. Each Java object selected is bound to a JavaScript
variable of the identifier name specified in from clause.
OQL Examples:
To select all Strings of length 100 or more (the standard HAT example; count is the String's internal length field):
select s from java.lang.String s where s.count >= 100
Heap object:
The heap built-in object supports the following methods:
heap.forEachClass -- calls a callback function for each Java Class
heap.forEachClass(callback);
heap.forEachObject -- calls a callback function for each Java object of a given class
heap.forEachObject(callback, clazz, includeSubtypes);
(clazz is the class whose instances are selected. If not specified, it defaults to
java.lang.Object. includeSubtypes is a boolean flag that specifies whether to include
subtype instances or not. The default value of this flag is true.)
heap.findClass -- finds Java Class of given name
23
CS 8492-DATABASE MANAGEMENT SYSTEMS
heap.findClass(className);
where className is name of the class to find. The resulting Class object has following
properties:
o name - name of the class.
Literal Types:
Collection Literals: The ODMG Object Model supports collection literals of the following
types: set<t>, bag<t>, list<t>, array<t>, and dictionary<t, v>, where t is the type of the objects
or values in the collection.
Object Structure:
• The structure of an object can be either atomic, or the object can be composed of other
objects.
• An atomic object type is user-defined; there are no built-in atomic object types included
in the ODMG Object Model.
• In the ODMG Object Model, instances of collection objects are composed of distinct
elements, each of which can be an instance of an atomic type, another collection, or a
literal type.
Collection Objects:
The collections supported by the ODMG Object Model include:
‣ Set<t>: A Set object is an unordered collection of elements, with no duplicates allowed.
‣ Bag<t>: A Bag object is an unordered collection of elements that may contain
duplicates.
‣ List<t>: A List object is an ordered collection of elements.
‣ Array<t> : An Array object is a dynamically sized, ordered collection of elements that
can be located by position.
‣ Dictionary<t,v>: A Dictionary object is an unordered sequence of key-value pairs with
no duplicate keys.
Each of these is a type generator, parameterized by the type shown within the angle
brackets.
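These collection types can be illustrated with their closest counterparts in the Java standard library (an illustrative sketch, not ODMG's own language binding; Java has no built-in Bag type, so a List stands in for it):

```java
import java.util.*;

public class OdmgCollections {
    // Set<t>: unordered, no duplicates -- the duplicate "a" is dropped
    static int setSize() {
        return new HashSet<>(Arrays.asList("a", "b", "a")).size();
    }

    // Bag<t>: unordered, duplicates allowed -- a List is the closest stand-in
    static int bagSize() {
        return new ArrayList<>(Arrays.asList("a", "b", "a")).size();
    }

    // Dictionary<t, v>: key-value pairs, no duplicate keys --
    // a second put with the same key replaces the earlier value
    static int dictLookup() {
        Map<String, Integer> d = new HashMap<>();
        d.put("one", 1);
        d.put("one", 11);
        return d.get("one");
    }

    public static void main(String[] args) {
        System.out.println(setSize() + " " + bagSize() + " " + dictLookup());
    }
}
```

Running main prints "2 3 11", showing how each collection type treats duplicates.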
ODMG Interface:
• An interface is a specification of the abstract behavior of an object type.
• It is a signature for a persistent object. An interface tells the external world how to interact
with an object; that is, it describes the interface of a type of objects: their visible
attributes, relationships, and operations.
• Interfaces are non-instantiable, but they serve to define operations that can be inherited
by the user-defined objects of a particular application.
• The state properties of an interface (i.e., its attributes and relationships) cannot be inherited.
Interfaces and Behavior Inheritance:
• In ODMG, two types of inheritance relationships exist.
• An interface is a specification of the abstract behavior of an object type, which specifies
the operation signatures.
• Interfaces are noninstantiable – that is, one cannot create objects that correspond to an
interface definition.
• They are mainly used to specify abstract operations that can be inherited by classes or
by other interfaces.
• Subtyping pertains to the inheritance of behavior only, and it is specified by a colon (:).
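Behavior inheritance can be sketched in Java, whose interfaces likewise specify operation signatures and cannot be instantiated (the Shape, Circle, and Square names below are illustrative, not from the ODMG standard):

```java
// An interface specifies abstract behavior only: operation signatures,
// no state. It is non-instantiable.
interface Shape {
    double area();
}

// Classes inherit the behavior specification and supply state plus an
// implementation of each inherited operation.
class Circle implements Shape {
    double radius;
    Circle(double r) { radius = r; }
    public double area() { return Math.PI * radius * radius; }
}

class Square implements Shape {
    double side;
    Square(double s) { side = s; }
    public double area() { return side * side; }
}

public class BehaviorInheritance {
    public static void main(String[] args) {
        Shape s = new Square(3);   // a subtype object used via the interface
        System.out.println(s.area());
    }
}
```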
2) Native XML Data Management Systems:
These are systems, like Niagara and Timber, that support only XQuery. In this approach,
the XML document is broken into nodes, and the node information is stored in a B+-tree, with all
document nodes stored in document order at the leaf level. In Niagara, so-called inverted list
indexes are created to enable efficient structural join algorithms.
There are various ways to solve the problem of effective, automatic conversion of XML
data into and out of relational databases.
Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist
in converting XML documents into relational tables. The various solutions are as follows.
Oracle XML SQL Utility models XML document elements as a collection of nested tables.
Enclosed elements are modeled by employing the Oracle Object datatype. The "SQL-to-
XML" conversion constructs an XML document by using a one-to-one association between a
table, referenced by an Object datatype, and a nested element. "XML-to-SQL" might require
either amending the data model (converting it from relational to object-relational) or
restructuring the original XML document.
IBM DB2 XML Extender allows storing XML documents either as BLOB-like objects or
decomposed into a set of tables. The latter transformation, known as an XML collection, is
defined in XML 1.0 syntax.
Microsoft approaches the problem by extending SQL-92 and introducing the OPENXML
rowset.
Sybase Adaptive Server introduces the ResultSetXml Java class as a base for processing
XML documents in both directions.
Oracle translates the chain of object references from the database into the hierarchical
structure of XML elements. In an object-relational database, the field ACCOUNT in the table
FXTRADE is modeled as an object reference of type AccountType:
{
CURRENCY1 CHAR (3), CURRENCY2 CHAR (3),
A corresponding XML document generated from the given object-relational model (using
"SELECT * FROM FXTRADE") looks like
<?xml version="1.0"?>
<ROWSET>
<ROW num="1">
<CURRENCY1>GBP</CURRENCY1>
<CURRENCY2>JPY</CURRENCY2>
<AMOUNT>10000</AMOUNT>
<SETTLEMENT>20010325</SETTLEMENT>
<ACCOUNT>
<BANKCODE>812</BANKCODE>
<BANKACCT>00365888</BANKACCT>
</ACCOUNT>
</ROW>
</ROWSET>
A Java fragment that uses the XSU OracleXMLSave class to insert such an XML document
(given as a command-line argument) into the FXTRADE table looks like the following; the class
name and connection setup shown here are illustrative:
import java.sql.*;
import oracle.xml.sql.dml.OracleXMLSave;

public class InsertFXTrade {
    public static void main(String[] args) throws SQLException {
        // Connection details are elided; supply a JDBC URL, user, and password.
        Connection conn = DriverManager.getConnection("...");
        OracleXMLSave sav = new OracleXMLSave(conn, "FXTRADE");
        sav.insertXML(args[0]);
        sav.close();
    }
    ...
}
All of this works if the XML and the object-relational model in the database are synchronized.
But what if they aren't? We have two options in that case.
XSU does not permit storage of attribute values, so it is recommended that you transform
attributes into elements.
Object-Relational Database
Persistent data: data that continue to exist even after the program that created them has
terminated.
A persistent programming language is a programming language extended with constructs to
handle persistent data. It differs from embedded SQL in at least two ways:
1. In a persistent programming language, the query language is fully integrated with the host
language, and both share the same type system. Any format changes required in the
database are carried out transparently.
Compare this with embedded SQL, where (1) the host language and the DML have
different type systems, so code conversion operates outside the OO type system and
hence has a higher chance of undetected errors; and (2) format conversion takes a
substantial amount of code.
2. Using embedded SQL, a programmer is responsible for writing explicit code to fetch
data into memory or store data back to the database.
In a persistent programming language, a programmer can manipulate persistent data
without having to write such code explicitly.
Drawbacks:
(1) Powerful, but it is easy to make programming errors that damage the database;
(2) Harder to do automatic high-level optimization; and
(3) Declarative querying is not well supported.
a. Persistence by class. Declare a class to be persistent: all objects of the class are then
persistent objects. Simple, but not flexible, since it is often useful to have both transient
and persistent objects of a single class. In many OODB systems, declaring a class to be
persistent is interpreted as "persistable": objects of the class can potentially be made
persistent.
b. Persistence by creation. Introduce new syntax to create persistent objects.
c. Persistence by marking. Mark an object as persistent after it is created (and before the
program terminates).
d. Persistence by reference. One or more objects are explicitly declared as (root)
persistent objects. All other objects are persistent if and only if they are reachable,
directly or indirectly, from a root persistent object. It is easy to make an entire data
structure persistent by merely declaring its root as persistent, but it is expensive for a
database system to follow the chains of references during detection.
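Persistence by reference can be sketched as a recursive traversal from the declared root, assigning an OID to every reachable object that is not yet persistent (all names here are hypothetical; a real OODBMS would also allocate storage for each object):

```java
import java.util.*;

// A toy persistent-capable object: holds data, references to other
// objects, and an OID that stays null until the object is made persistent.
class PNode {
    String data;
    List<PNode> refs = new ArrayList<>();
    Integer oid;
    PNode(String d) { data = d; }
}

public class ReachabilityDemo {
    static int nextOid = 1;

    // At commit, recursively follow all references from the root;
    // any object without an OID is made persistent on the way.
    static void commit(PNode obj, Set<PNode> visited) {
        if (obj == null || !visited.add(obj)) return;  // guard against cycles
        if (obj.oid == null) obj.oid = nextOid++;
        for (PNode ref : obj.refs) commit(ref, visited);
    }

    public static void main(String[] args) {
        PNode root = new PNode("P");    // explicitly declared root object
        PNode child = new PNode("N");   // transient until referenced
        root.refs.add(child);
        commit(root, new HashSet<>());
        System.out.println(child.oid != null);  // child became persistent
    }
}
```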
(c)Storage and Access of Persistent Objects:
Objects storage in a database:
Code (that implements methods) should be stored in the database as part of the
schema, along with type definitions, but many implementations store it outside of the
database, to avoid having to integrate system software such as compilers with the database
system.
Data: stored individually for each object.
Finding the objects:
1. Give names to objects, as we give names to files: this works only for small sets of
objects.
2. Expose object identifiers or persistent pointers to the objects.
3. Store collections of objects and allow programs to iterate over the collections to
find required objects. The collections can be modelled as objects of a collection
type. A special case of a collection is a class extent, which is the collection of all
objects belonging to a class.
Most OODB systems support all three ways of accessing persistent objects. All objects
have object identifiers.
Names are typically given only to class extents and other collection objects, and
perhaps to other selected objects, but most objects are not given names.
Class extents are usually maintained for all classes that can have persistent objects, but
in many implementations they contain only the persistent objects of the class.
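The three access routes can be sketched with an in-memory registry (all names here are hypothetical; a real OODBMS maintains these structures internally and persistently):

```java
import java.util.*;

public class ObjectStore {
    // Route 1: names, given only to selected objects and class extents
    static Map<String, Object> names = new HashMap<>();
    // Route 2: object identifiers -- every persistent object gets one
    static Map<Long, Object> oids = new HashMap<>();
    // Route 3: class extents -- the collection of all objects of a class
    static Map<Class<?>, List<Object>> extents = new HashMap<>();
    static long nextOid = 1;

    // Register an object: assign an OID and add it to its class extent.
    static long persist(Object o) {
        long oid = nextOid++;
        oids.put(oid, o);
        extents.computeIfAbsent(o.getClass(), k -> new ArrayList<>()).add(o);
        return oid;
    }

    public static void main(String[] args) {
        long oid = persist("account-42");
        names.put("rootAccount", oids.get(oid));  // name one selected object
        // Iterate over the String class extent to find the object again:
        System.out.println(extents.get(String.class).contains("account-42"));
    }
}
```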
ODBMS, an abbreviation for object-oriented database management system, is a
data model in which data is stored in the form of objects, which are instances of classes. These
classes and objects together make up an object-oriented data model.
Components of Object Oriented Data Model:
The OODBMS is based on three major components, namely object structure, object classes, and
object identity. These are explained below.
1. Object Structure:
The structure of an object refers to the properties that the object is made up of. These
properties of an object are referred to as attributes. Thus, an object is a real-world entity with
certain attributes that make up the object structure. An object also encapsulates data and code into
a single unit, which in turn provides data abstraction by hiding the implementation details from the
user.
The object structure is further composed of three types of components: messages, methods, and
variables. These are explained below.
Messages:
A message provides an interface or acts as a communication medium between an object and
the outside world. A message can be of two types:
Read-only message: If the invoked method does not change the value of a variable,
then the invoking message is said to be a read-only message.
Update message: If the invoked method changes the value of a variable, then the
invoking message is said to be an update message.
Methods:
When a message is passed, the body of code that is executed is known as a method.
Every time a method is executed, it returns a value as output. A method can be of two
types:
Read-only method: When the value of a variable is not affected by a method, then it
is known as read-only method.
Update-method: When the value of a variable changes by a method, then it is known
as an update method.
Variables:
These store the data of an object. The data stored in the variables makes one object
distinguishable from another.
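The read-only/update distinction above can be shown with a minimal sketch:

```java
// A minimal object: one variable, one read-only method, one update method.
public class Counter {
    private int value;               // variable: the object's state

    int get() { return value; }      // read-only method: state unchanged
    void increment() { value++; }    // update method: state changes

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();               // sending an update message
        System.out.println(c.get()); // sending a read-only message
    }
}
```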
2. Object Classes:
An object, which is a real-world entity, is an instance of a class. Hence we first need to
define a class, and then objects are created that differ in the values they store but share the
same class definition. The objects in turn correspond to the various messages and variables stored
in them.
Example :
class CLERK
{
    // variables
    string name;
    string address;
    int id;
    int salary;

    // messages
    string get_name();
    string get_address();
    int annual_salary();
};
In the above example we can see that CLERK is a class that holds the object's variables and
messages.
An OODBMS also supports inheritance extensively, since a database may contain many classes
with similar methods, variables, and messages.
Thus, the concept of a class hierarchy is maintained to depict the similarities among the various
classes.
The concept of encapsulation, that is, data or information hiding, is also supported by the object-
oriented data model. The model also provides the facility of abstract data types, apart from the
built-in data types like char, int, and float.
ADTs are user-defined data types that hold values and can also have methods attached to them.
Thus, an OODBMS provides numerous facilities to its users, both built-in and user-defined. It
incorporates the properties of an object-oriented data model into a database management
system, and it supports programming concepts like classes and objects, along with other
concepts like encapsulation, inheritance, and user-defined ADTs (abstract data types).