Unit 4adtnotes

LOYOLA-ICAM
COLLEGE OF ENGINEERING AND TECHNOLOGY

LOYOLA COLLEGE CAMPUS, NUNGUMBAKKAM, CH – 34
CS2029 ADVANCED DATABASE TECHNOLOGY
UNIT IV
EMERGING SYSTEMS
Topic 1: ENHANCED DATA MODELS:

Need for enhanced data models:
● As the use of database systems has grown, users have demanded
additional functionality from software packages, with the purpose
of making it easier to implement more advanced and complex user
applications.
● Object-oriented databases and object-relational systems do provide
features that allow users to extend their systems by specifying
additional abstract data types for each application.
● It is useful to identify certain common features for the advanced
applications and to create models that can represent the common
features.
● Specialized storage structures and indexing methods can be
implemented to improve the performance of the common features.
● The common features can then be implemented as abstract data
type or class libraries and separately purchased with the basic
DBMS software package.
● Users can utilize these features directly if they are suitable for their
applications, without having to reinvent, reimplement, and
reprogram such common features.
● Examples of some enhanced data models:
1. Active database
2. Spatial database
3. Temporal database
4. Multimedia database
5. Deductive database
Active Databases:
● Active databases provide additional functionality for specifying
active rules.
● These rules can be automatically triggered by events that occur,
such as a database update or a certain time being reached, and can
initiate certain actions that have been specified in the rule
declaration if certain conditions are met.
● Many commercial packages already have some of the functionality
provided by active databases in the form of triggers.
● Triggers are now part of the SQL-99 standard.
Temporal Databases:
● Temporal databases permit the database system to store a history of
changes, and allow users to query both current and past states of
the database.
● Some temporal database models also allow users to store future
expected information, such as planned schedules.
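The event, condition, and action parts of an active rule (see Active Databases) can be illustrated with a trigger. This is only a minimal sketch using Python's built-in sqlite3 module. The tables account and audit_log, the trigger name log_overdraft, and the rule itself are hypothetical and chosen purely for illustration.

```python
import sqlite3

def run_trigger_demo():
    """Create a trigger (active rule) and show when it fires."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
        CREATE TABLE audit_log (account_id INTEGER,
                                old_balance REAL,
                                new_balance REAL);
        -- Event:     an UPDATE of account.balance
        -- Condition: the new balance is negative
        -- Action:    record the old and new values in audit_log
        CREATE TRIGGER log_overdraft
        AFTER UPDATE OF balance ON account
        WHEN NEW.balance < 0
        BEGIN
            INSERT INTO audit_log VALUES (NEW.id, OLD.balance, NEW.balance);
        END;
    """)
    conn.execute("INSERT INTO account VALUES (1, 100.0)")
    # Condition is false here, so the rule does not fire.
    conn.execute("UPDATE account SET balance = 50.0 WHERE id = 1")
    # Condition is true here, so the action runs automatically.
    conn.execute("UPDATE account SET balance = -20.0 WHERE id = 1")
    rows = conn.execute("SELECT * FROM audit_log").fetchall()
    conn.close()
    return rows

print(run_trigger_demo())  # [(1, 50.0, -20.0)]
```

The application never writes to audit_log directly; the DBMS performs the action when the event occurs and the condition holds.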
Spatial Databases:
● Spatial databases provide concepts for databases that keep track of
objects in a multidimensional space. For example, cartographic
databases that store maps include two-dimensional spatial
positions of their objects, which include countries, states, rivers,
cities, roads, seas, and so on.
● Other databases, such as meteorological databases for weather
information, are three-dimensional, since temperatures and other
meteorological information are related to three-dimensional spatial
points.
Multimedia Databases:
● Multimedia databases provide features that allow users to store and
query different types of multimedia information, which includes
images (such as pictures or drawings), video clips (such as movies,
news reels, or home videos), audio clips (such as songs, phone
messages, or speeches), and documents (such as books or articles).
Deductive Databases:
● This area lies at the intersection of databases, logic, and
artificial intelligence or knowledge bases. A deductive database
system is a database system that includes capabilities to define
(deductive) rules, which can deduce or infer additional information
from the facts that are stored in a database.
● Because part of the theoretical foundation for some deductive
database systems is mathematical logic, such systems are often
referred to as logic databases.
Expert database systems or knowledge-based systems:
● These systems incorporate reasoning and inferencing capabilities;
they use techniques that were developed in the field of artificial
intelligence, including semantic networks, frames, production
systems, or rules for capturing domain-specific knowledge.

TOPIC 2: CLIENT/SERVER MODEL

Client/Server Model:

● Networked computing model
● Processes distributed between clients and servers
● Client – Workstation (usually a PC) that requests and uses a
service
● Server – Computer (PC/mini/mainframe) that provides a service
● For DBMS, server is a database server
Basic Client/Server Architectures:
● The client/server architecture was developed to deal with
computing environments in which a large number of PCs,
workstations, file servers, printers, database servers, Web servers,
and other equipment are connected via a network.
● The idea is to define specialized servers with specific
functionalities.
● For example, it is possible to connect a number of PCs or small
workstations as clients to a file server that maintains the files of the
client machines. Another machine could be designated as a printer
server by being connected to various printers; thereafter, all print
requests by the clients are forwarded to this machine.
● Web servers or e-mail servers also fall into the specialized server
category.
● In this way, the resources provided by specialized servers can be
accessed by many client machines.
● The client machines provide the user with the appropriate
interfaces to utilize these servers, as well as with local processing
power to run local applications.
● This concept can be carried over to software, with specialized
software, such as a DBMS or a CAD (computer-aided design)
package, being stored on specific server machines and being made
accessible to multiple clients.
● Figure 1 illustrates client/server architecture at the logical level,
and Figure 2 is a simplified diagram that shows how the physical
architecture would look.
● The concept of client/server architecture assumes an underlying
framework that consists of many PCs and workstations as well as a
smaller number of mainframe machines, connected via local area
networks and other types of computer networks.
Client:
● A client in this framework is typically a user machine that
provides user interface capabilities and local processing.
● When a client requires access to additional functionality, such as
database access, that does not exist at that machine, it connects to a
server that provides the needed functionality.
Server:
● A server is a machine that can provide services to the client
machines, such as file access, printing, archiving, or database access.
● In the general case, some machines install only client software,
others only server software, and still others may include both client
and server software.
● However, it is more common for client and server software to run on
separate machines.
Fig 1. 2-tier client server architecture

Types of basic DBMS architecture:
1. Two-tier
2. Three-tier
Two-Tier Client/Server Architectures for DBMS:
● The client/server architecture is increasingly being incorporated
into commercial DBMS packages.
● In relational DBMSs (RDBMSs), many of which started as
centralized systems, the system components that were first moved
to the client side were the user interface and application programs.
● SQL provides a standard language for RDBMSs, which created a
logical dividing point between client and server.
● Hence, the query and transaction functionality remained on the
server side.
● In such an architecture, the server is often called a query server or
transaction server, because it provides these two functionalities.
● In RDBMSs, the server is also often called an SQL server, since
most RDBMS servers are based on the SQL language and
standard.
● In such client/server architecture, the user interface programs and
application programs can run on the client side.
● When DBMS access is required, the program establishes a
connection to the DBMS (which is on the server side); once the
connection is created, the client program can communicate with
the DBMS.
● A standard called Open Database Connectivity (ODBC) provides
an application programming interface (API), which allows
client-side programs to call the DBMS, as long as both client and
server machines have the necessary software installed.
Fig 2. Physical 2-tier Client server architecture
● Most DBMS vendors provide ODBC drivers for their systems.
● Hence, a client program can actually connect to several RDBMSs
and send query and transaction requests using the ODBC API,
which are then processed at the server sites.
● Any query results are sent back to the client program, which can
process or display the results as needed.
● A related standard for the Java programming language, called
JDBC, has also been defined.
● This allows Java client programs to access the DBMS through a
standard interface.
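The cycle of connecting, sending a request, and receiving results through a standard call interface can be sketched in Python. Python's DB-API 2.0 plays a role broadly comparable to ODBC or JDBC, in that client code is written against one interface and a driver talks to the DBMS. The sketch uses the built-in sqlite3 driver only so that it is self-contained; this is an assumption made for illustration, since SQLite is an embedded engine rather than a separate server. The employee table and its rows are made up.

```python
import sqlite3

def fetch_names(conn, min_salary):
    """Send a parameterized query through the standard interface."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name FROM employee WHERE salary >= ? ORDER BY name",
        (min_salary,),
    )
    rows = cur.fetchall()      # results come back to the client program
    cur.close()
    return [name for (name,) in rows]

def demo():
    # 1. The client program establishes a connection to the DBMS.
    conn = sqlite3.connect(":memory:")
    # Set up sample data (in a real system this already exists on the server).
    conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Asha", 52000), ("Ravi", 38000), ("Meena", 61000)],
    )
    conn.commit()
    # 2. Query requests are sent; 3. results are processed on the client side.
    result = fetch_names(conn, 50000)
    conn.close()
    return result

print(demo())  # ['Asha', 'Meena']
```

With a driver for a server-based RDBMS, only the connect call would change; the cursor, execute, and fetch calls follow the same pattern.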
● The second approach to client/server architecture was taken by
some object-oriented DBMSs.
● Because many of these systems were developed in the era of

client/server architecture, the approach taken was to divide the
software modules of the DBMS between client and server in a
more integrated way.
● For example, the server level may include the part of the DBMS
software responsible for handling data storage on disk pages, local
concurrency control and recovery, buffering and caching of disk
pages, and other such functions.
● Meanwhile, the client level may handle the user interface; data
dictionary functions; DBMS interactions with programming
language compilers; global query optimization, concurrency
control, and recovery across multiple servers; structuring of
complex objects from the data in the buffers; and other such
functions.
● In this approach, the client/server interaction is more tightly
coupled and is done internally by the DBMS modules, some of
which reside on the client and some on the server, rather than by
the users.
● The exact division of functionality varies from system to system.
● In such a client/server architecture, the server has been called a
data server, because it provides data in disk pages to the client.
● This data can then be structured into objects for the client
programs by the client-side DBMS software itself.
● The architectures described here are called two-tier architectures
because the software components are distributed over two systems:
client and server.
Advantage:
● The advantages of this architecture are its simplicity and seamless
compatibility with existing systems.
● The emergence of the World Wide Web changed the roles of
clients and server, leading to the three-tier architecture.
Three-Tier Client/Server Architectures for Web Applications:
Three layers:

Client              GUI interface (I/O processing)   Browser
Application server  Business rules                   Web server
Database server     Data storage                     DBMS

● Many Web applications use an architecture called the three-tier
architecture, which adds an intermediate layer between the client
and the database server.
● This intermediate layer or middle tier is sometimes called the
application server and sometimes the Web server, depending on the
application.
● This server plays an intermediary role by storing business rules
(procedures or constraints) that are used to access data from the
database server. It can also improve database security by
checking a client's credentials before forwarding a request to the
database server.
● Clients contain GUI interfaces and some additional
application-specific business rules.
● The intermediate server accepts requests from the client, processes
the request and sends database commands to the database server,
and then acts as a conduit for passing (partially) processed data
from the database server to the clients, where it may be processed
further and filtered to be presented to users in GUI format.
In the three-tier client-server architecture, the following three layers exist:
1. Presentation layer (client):
● This provides the user interface and interacts with the
user.
● The programs at this layer present Web interfaces or
forms to the client in order to interface with the
application.
● Web browsers are often utilized, and the languages used
include HTML, Java, JavaScript, Perl, Visual Basic,
and so on. This layer handles user input, output, and
navigation by accepting user commands and displaying
the needed information, usually in the form of static or
dynamic Web pages. The latter are employed when the
interaction involves database access.
● When a Web interface is used, this layer typically
communicates with the application layer via the HTTP
protocol.
2. Application layer (business logic):
● This layer programs the application logic. For example,
queries can be formulated based on user input from the
client, or query results can be formatted and sent to the
client for presentation.
● Additional application functionality can be handled at this
layer, such as security checks, identity verification, and
other functions.
● The application layer can interact with one or more
databases or data sources as needed by connecting to the
database using ODBC, JDBC, SQL/CLI or other database
access techniques.
3. Database server:
● This layer handles query and update requests from the
application layer, processes the requests, and sends the
results.
● Usually SQL is used to access the database if it is
relational or object-relational, and stored database
procedures may also be invoked.
● Query results (and queries) may be formatted into XML
when transmitted between the application server and the
database server.
Database Server Architectures
● 2-tiered approach
● Client is responsible for
o I/O processing logic
o Some business rules logic
● Server performs all data storage and access processing → DBMS is
only on server
Advantages:
● Clients do not have to be as powerful
● Greatly reduces data traffic on the network
● Improved data integrity since it is all processed centrally
● Stored procedures → some business rules done on server
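The division of work among the presentation, application, and database layers can be sketched in a single Python file. This is only an illustration under stated assumptions: the three tiers are plain functions rather than separate machines, an in-memory SQLite database stands in for the database server, and the orders table, the VALID_TOKENS credential store, and all function names are hypothetical.

```python
import sqlite3

# --- Database server tier (stand-in: in-memory SQLite) ---
def make_database():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 120.0), ("alice", 30.0), ("bob", 75.0)],
    )
    db.commit()
    return db

# --- Application tier: credential check plus business logic ---
VALID_TOKENS = {"alice": "s3cret", "bob": "hunter2"}  # hypothetical store

def handle_request(db, user, token):
    """Verify the client, then translate the request into SQL."""
    if VALID_TOKENS.get(user) != token:
        # The request is rejected before it ever reaches the database.
        return {"status": 403, "total": None}
    (total,) = db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (user,),
    ).fetchone()
    return {"status": 200, "total": total}

# --- Presentation tier: formats the response for the user ---
def render(response):
    if response["status"] != 200:
        return "Access denied"
    return f"Your order total is {response['total']:.2f}"

db = make_database()
print(render(handle_request(db, "alice", "s3cret")))  # Your order total is 150.00
print(render(handle_request(db, "bob", "wrong")))     # Access denied
```

The presentation function never touches SQL, and the database tier never sees an unauthenticated request, which is the separation the three-tier description relies on.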
Fig 3. Logical 3-tier Client server architecture.
Interaction between application server and database server during the
processing of an SQL query:
1. The application server formulates a user query based on input from
the client layer and decomposes it into a number of independent
site queries. Each site query is sent to the appropriate database
server site.
2. Each database server processes the local query and sends the
results to the application server site. Increasingly, XML is being
touted as the standard for data exchange, so the database server
may format the query result into XML before sending it to the
application server.
3. The application server combines the results of the subqueries to
produce the result of the originally required query, formats it into
HTML or some other form accepted by the client, and sends it to the
client site for display.
Functions of Application server:
● The application server is responsible for generating a distributed
execution plan for a multisite query or transaction and for
supervising distributed execution by sending commands to servers.
● These commands include local queries and transactions to be
executed, as well as commands to transmit data to other clients or
servers.
● Another function controlled by the application server (or
coordinator) is that of ensuring consistency of replicated copies of
a data item by employing distributed (or global) concurrency
control techniques.
● The application server must also ensure the atomicity of global
transactions by performing global recovery when certain sites fail.
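Steps 1 to 3 of the interaction between the application server and the database servers can be sketched as follows. This is a simplified illustration, not a distributed DBMS: two in-memory SQLite databases stand in for two database server sites, the sales table and site names are invented, and the "combine" step simply adds partial sums (which works for SUM but not for every aggregate).

```python
import sqlite3

def make_site(rows):
    """Create one database server site holding a fragment of the data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    db.commit()
    return db

def global_totals(sites):
    """Application-server role: decompose, dispatch, and combine."""
    # Step 1: the global query is decomposed into one site query per site.
    site_query = "SELECT item, SUM(qty) FROM sales GROUP BY item"
    totals = {}
    for db in sites:
        # Step 2: each site processes its local query and returns results.
        for item, qty in db.execute(site_query):
            # Step 3: partial results are combined into the global answer.
            totals[item] = totals.get(item, 0) + qty
    return dict(sorted(totals.items()))

def to_html(totals):
    """Format the combined result in a form a browser client accepts."""
    rows = "".join(
        f"<tr><td>{item}</td><td>{qty}</td></tr>" for item, qty in totals.items()
    )
    return f"<table>{rows}</table>"

chennai = make_site([("pen", 10), ("book", 4)])
madurai = make_site([("pen", 5), ("ink", 2)])
totals = global_totals([chennai, madurai])
print(totals)           # {'book': 4, 'ink': 2, 'pen': 15}
print(to_html(totals))
```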
● If the DDBMS has the capability to hide the details of data
distribution from the application server, then it enables the
application server to execute global queries and transactions as
though the database were centralized, without having to specify the
sites at which the data referenced in the query or transaction
resides.
● This property is called distribution transparency. Some DDBMSs
do not provide distribution transparency, instead requiring that
applications be aware of the details of data distribution.
● Advances in encryption and decryption technology make it safer to
transfer sensitive data from server to client in encrypted form,
where it will be decrypted. The latter can be done by the hardware
or by advanced software.
● This technology gives higher levels of data security, but the
network security issues remain a major concern.
● Various technologies for data compression are also helping in
transferring large amounts of data from servers to clients over
wired and wireless networks.

Advantages of Three-Tier Architectures
● Scalability
● Technological flexibility
● Long-term cost reduction
● Better match of systems to business needs
● Improved customer service
● Competitive advantage
● Reduced risk
Challenges of Three-tier Architectures
● High short-term costs
● Tools and training
● Experience
● Incompatible standards
● Lack of compatible end-user tools
Client/Server Security
● Network environment → complex security issues
Security levels:
● System-level password security
o for allowing access to the system
● Database-level password security
o for determining access privileges to tables;
o read/update/insert/delete privileges
● Secure client/server communication
o via encryption

TOPIC 3: DATA WAREHOUSING AND DATA MINING
Data Warehousing and Data Mining:

Need for Data warehousing:
● Large companies have presences in many places, each of which
may generate a large volume of data.
● For instance, large retail chains have hundreds or thousands of
stores, whereas insurance companies may have data from
thousands of local branches.
● Large organizations have a complex internal organization structure,
and therefore different data may be present in different locations,
or on different operational systems, or under different schemas.
● For instance, manufacturing-problem data and customer-complaint
data may be stored on different database systems.
● Corporate decision makers require access to information from all
such sources.
● Setting up queries on individual sources is both cumbersome and
inefficient.
● Moreover, the sources of data may store only current data, whereas
decision makers may need access to past data as well; for instance,
information about how purchase patterns have changed in the past
year could be of great importance.
● Data warehouses provide a solution to these problems.
Data Warehousing
● Data sources often store only current data, not historical data
● Corporate decision making requires a unified view of all
organizational data, including historical data
● A data warehouse is a repository (archive) of information
gathered from multiple sources, stored under a unified schema, at a
single site
o Greatly simplifies querying, permits study of historical
trends
o Shifts decision support query load away from transaction
processing systems
● Once gathered, the data are stored for a long time, permitting
access to historical data.
● Thus, data warehouses provide the user a single consolidated
interface to data, making decision-support queries easier to write.
● By accessing information for decision support from a data
warehouse, the decision maker ensures that online
transaction-processing systems are not affected by the
decision-support workload.

Components of a Data Warehouse
The issues to be addressed in building a warehouse are the
following:
When and how to gather data.
● In a source-driven architecture for gathering data, the data
sources transmit new information, either continually (as transaction
processing takes place), or periodically (nightly, for example).
● In a destination-driven architecture, the data warehouse
periodically sends requests for new data to the sources.
● Unless updates at the sources are replicated at the warehouse via
two-phase commit, the warehouse will never be quite up to date
with the sources.
● Two-phase commit is usually far too expensive to be an option, so
data warehouses typically have slightly out-of-date data.
● That, however, is usually not a problem for decision-support
systems.
What schema to use.
● Data sources that have been constructed independently are likely to
have different schemas. They may even use different data models.
● Part of the task of a warehouse is to perform schema integration,
and to convert data to the integrated schema before they are stored.
● As a result, the data stored in the warehouse are not just a copy of
the data at the sources.
● Instead, they can be thought of as a materialized view of the data at
the sources.
Data cleansing.
● The task of correcting and preprocessing data is called data
cleansing.
● Data sources often deliver data with numerous minor
inconsistencies that can be corrected.
● For example, names are often misspelled, and addresses may have
street/area/city names misspelled, or zip codes entered incorrectly.
● These can be corrected to a reasonable extent by consulting a
database of street names and zip codes in each city.
● Address lists collected from multiple sources may have duplicates
that need to be eliminated in a merge–purge operation.
● Records for multiple individuals in a house may be grouped
together so only one mailing is sent to each house; this operation is
called householding.
How to propagate updates.
● Updates on relations at the data sources must be propagated to the
data warehouse.
● If the relations at the data warehouse are exactly the same as those
at the data source, the propagation is straightforward.
● If they are not, the problem of propagating updates is basically the
view-maintenance problem.
What data to summarize.
● The raw data generated by a transaction-processing system
may be too large to store online.
● However, we can answer many queries by maintaining just
summary data obtained by aggregation on a relation, rather
than maintaining the entire relation.
● For example, instead of storing data about every sale of
clothing, we can store total sales of clothing by itemname and
category.
● Suppose that a relation r has been replaced by a summary relation
s. Users may still be permitted to pose queries as though the
relation r were available online.
● If the query requires only summary data, it may be possible to
transform it into an equivalent one using s instead.
Warehouse Schemas
● Data warehouses typically have schemas that are designed for data
analysis, using tools such as OLAP tools.
● Thus, the data are usually multidimensional data, with dimension
attributes and measure attributes.
● Tables containing multidimensional data are called fact tables and
are usually very large.
● A table recording sales information for a retail store, with one tuple
for each item that is sold, is a typical example of a fact table.
● The dimensions of the sales table would include what the item is
(usually an item identifier such as that used in bar codes), the date
when the item is sold, which location (store) the item was sold
from, which customer bought the item, and so on.
● The measure attributes may include the number of items sold and
the price of the items.
● To minimize storage requirements, dimension attributes are usually
short identifiers that are foreign keys into other tables called
dimension tables.
● For instance, fact table sales would have dimension attributes
item-id, store-id, customer-id, and date, and measure attributes
number and price.
● The attribute store-id is a foreign key into a dimension table store,
which has other attributes such as store location (city, state,
country).
● The item-id attribute of the sales table would be a foreign key into
a dimension table item-info, which would contain information such
as the name of the item, the category to which the item belongs,
and other item details such as color and size.
● The customer-id attribute would be a foreign key into a customer
table containing attributes such as name and address of the
customer.
● We can also view the date attribute as a foreign key into a date-info
table giving the month, quarter, and year of each date.
● The resultant schema appears in Figure.
● Such a schema, with a fact table, multiple dimension tables, and
foreign keys from the fact table to the dimension tables, is called a
star schema.
● More complex data warehouse designs may have multiple levels of
dimension tables; for instance, the item-info table may have an
attribute manufacturer-id that is a foreign key into another table
giving details of the manufacturer. Such schemas are called
snowflake schemas.
● Complex data warehouse designs may also have more than one
fact table.
The Evolution of Data Warehousing
● Since the 1970s, organizations have gained competitive advantage
through systems that automate business processes to offer
more efficient and cost-effective services to the customer.
● This resulted in accumulation of growing amounts of data in
operational databases.
● Organizations now focus on ways to use operational data to
support decision-making, as a means of gaining competitive
advantage.
● However, operational systems were never designed to support
such business activities.
● Businesses typically have numerous operational systems with
overlapping and sometimes contradictory definitions.
● Organizations need to turn their archives of data into a source
of knowledge, so that a single integrated / consolidated view
of the organization’s data is presented to the user.
● A data warehouse was deemed the solution to meet the
requirements of a system capable of supporting
decision-making, receiving data from multiple operational
data sources.
Data Warehousing Concepts
● A data warehouse is a subject-oriented, integrated, time-variant,
and non-volatile collection of data in support of management’s
decision-making process.
Subject-oriented Data
● The warehouse is organized around the major subjects of the
enterprise (e.g. customers, products, and sales) rather than the
major application areas (e.g. customer invoicing, stock control, and
product sales).
● This is reflected in the need to store decision-support data rather
than application-oriented data.
Integrated Data
● The data warehouse integrates corporate application-oriented
data from different source systems, which often includes data
that is inconsistent.
● The integrated data source must be made consistent to present
a unified view of the data to the users.
Time-variant Data
● Data in the warehouse is only accurate and valid at some point in
time or over some time interval.
● Time-variance is also shown in the extended time that the data is
held, the implicit or explicit association of time with all data, and
the fact that the data represents a series of snapshots.
Non-volatile Data
● Data in the warehouse is not updated in real-time but is
refreshed from operational systems on a regular basis.
● New data is always added as a supplement to the database,
rather than a replacement.
Data Webhouse
● The Web is an immense source of behavioral data as
individuals interact through their Web browsers with remote
Web sites. The data generated by this behavior is called
clickstream.
● A data webhouse is a distributed data warehouse with no
central data repository that is implemented over the Web to
harness clickstream data.
Benefits of Data Warehousing
● Potential high returns on investment
● Competitive advantage
● Increased productivity of corporate decision-makers
Comparison of OLTP Systems and Data Warehousing

Data Warehouse Queries
● The types of queries that a data warehouse is expected to
answer range from the relatively simple to the highly
complex and depend on the type of end-user access tools
used.
● End-user access tools include:
– Reporting, query, and application development tools
– Executive information systems (EIS)
– OLAP tools
– Data mining tools
Examples of Typical Data Warehouse Queries
● What was the total revenue for Scotland in the third quarter of
2004?
● What was the total revenue for property sales for each type of
property in Great Britain in 2003?
● What are the three most popular areas in each city for the
renting of property in 2004 and how does this compare with
the figures for the previous two years?
● What is the monthly revenue for property sales at each branch
office, compared with rolling 12-monthly prior figures?
● What would be the effect on property sales in the different
regions of Britain if legal costs went up by 3.5% and
Government taxes went down by 1.5% for properties over
£100,000?
● Which type of property sells for prices above the average
selling price for properties in the main cities of Great Britain
and how does this correlate to demographic data?
● What is the relationship between the total annual revenue

generated by each branch office and the total number of sales
staff assigned to each branch office?
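Queries of the kind listed above are typically answered by joining a fact table to its dimension tables and aggregating a measure attribute. The following is a minimal sketch of the first example query (total revenue for one region in one quarter) over a small star schema. The table names property_sale, branch, and date_info, their columns, and all rows are invented for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

def build_warehouse():
    """Create a tiny star schema: one fact table, two dimension tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE date_info (date_id INTEGER PRIMARY KEY,
                                year INTEGER, quarter INTEGER);
        -- Fact table: foreign keys into the dimensions plus one measure.
        CREATE TABLE property_sale (
            branch_id INTEGER REFERENCES branch(branch_id),
            date_id   INTEGER REFERENCES date_info(date_id),
            revenue   REAL
        );
        INSERT INTO branch VALUES (1, 'Scotland'), (2, 'Wales');
        INSERT INTO date_info VALUES (10, 2004, 3), (11, 2004, 4);
        INSERT INTO property_sale VALUES
            (1, 10, 1000.0), (1, 10, 2500.0), (1, 11, 400.0), (2, 10, 900.0);
    """)
    return db

def total_revenue(db, region, year, quarter):
    """Aggregate the measure, restricted by dimension attributes."""
    (total,) = db.execute(
        """
        SELECT COALESCE(SUM(f.revenue), 0)
        FROM property_sale AS f
        JOIN branch    AS b ON b.branch_id = f.branch_id
        JOIN date_info AS d ON d.date_id   = f.date_id
        WHERE b.region = ? AND d.year = ? AND d.quarter = ?
        """,
        (region, year, quarter),
    ).fetchone()
    return total

db = build_warehouse()
print(total_revenue(db, "Scotland", 2004, 3))  # 3500.0
```

The more complex example queries (rankings, rolling comparisons, what-if analysis) follow the same join pattern but need additional grouping or application logic.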
Problems of Data Warehousing
● Underestimation of resources for data loading
● Hidden problems with source systems
● Required data not captured
● Increased end-user demands
● Data homogenization
● High demand for resources
● Data ownership
● High maintenance
● Long duration projects
● Complexity of integration
Typical Architecture of a Data Warehouse

Operational Data
The source of data for the data warehouse is supplied from:
1. Mainframe operational data held in first generation hierarchical
and network databases.
2. Departmental data held in proprietary file systems such as VSAM,
RMS, and relational DBMSs such as Informix and Oracle.
3. Private data held on workstations and private servers.
4. External systems such as the Internet, commercially available
databases, or databases associated with an organization’s suppliers
or customers.
Operational Data Store (ODS)
● A repository of current and integrated operational data used for
analysis.
● Often structured and supplied with data in the same way as the data
warehouse.
● May act simply as a staging area for data to be moved into the
warehouse.
● Often created when legacy operational systems are found to be
incapable of achieving reporting requirements.
● Provides users with the ease-of-use of a relational database while
remaining distant from the decision support functions of the data
warehouse.
Load Manager
● Performs all the operations associated with the extraction and
loading of data into the warehouse.
● Size and complexity will vary between data warehouses and may
be constructed using a combination of vendor data loading tools
and custom-built programs.
Warehouse Manager
● Performs all the operations associated with the management of
the data in the warehouse.
● Constructed using vendor data management tools and
custom-built programs.
● Operations performed include
o Analysis of data to ensure consistency.
o Transformation and merging of source data from
temporary storage into data warehouse tables.
o Creation of indexes and views on base tables.
o Generation of denormalizations (if necessary).
o Generation of aggregations (if necessary).
o Backing-up and archiving data.
● In some cases, also generates query profiles to determine
which indexes and aggregations are appropriate.
● A query profile can be generated for each user, group of users,
or the data warehouse and is based on information that
describes the characteristics of the queries such as frequency,
target table(s), and size of results set.
Query Manager
● Performs all the operations associated with the management of
user queries.
● Typically constructed using vendor end-user data access tools, data
warehouse monitoring tools, database facilities, and custom-built
programs.
● Complexity determined by the facilities provided by the end-user
access tools and the database.
● The operations performed by this component include directing
queries to the appropriate tables and scheduling the execution of
queries.
● In some cases, the query manager also generates query profiles to
allow the warehouse manager to determine which indexes and
aggregations are appropriate.
Detailed Data
● Stores all the detailed data in the database schema.
● In most cases, the detailed data is not stored online but aggregated
to the next level of detail.
● On a regular basis, detailed data is added to the warehouse to
supplement the aggregated data.
Lightly and Highly Summarized Data
● Stores all the pre-defined lightly and highly aggregated data
generated by the warehouse manager.
● Transient as it will be subject to change on an on-going basis in
order to respond to changing query profiles.
● The purpose of summary information is to speed up the
performance of queries.
● Removes the requirement to continually perform summary
operations (such as sort or group by) in answering user queries.
● The summary data is updated continuously as new data is loaded
into the warehouse.
Archive / Backup Data
● Stores detailed and summarized data for the purposes of archiving
and backup.
● May be necessary to backup online summary data if this data is
kept beyond the retention period for detailed data.
● The data is transferred to storage archives such as magnetic tape or
optical disk.
Metadata
● This area of the warehouse stores all the metadata (data about data)
definitions used by all the processes in the warehouse.
● Used for a variety of purposes
o Extraction and loading processes - metadata is used to
map data sources to a common view of information
within the warehouse.
o Warehouse management process - metadata is used to
automate the production of summary tables.
o Query management process - metadata is used to direct a
query to the most appropriate data source.
● The structure of metadata will differ between each process,
because the purpose is different.
● This means that multiple copies of metadata describing the same
data item are held within the data warehouse.
● Most vendor tools for copy management and end-user data access
use their own versions of metadata.
● Copy management tools use metadata to understand the mapping
rules to apply in order to convert the source data into a common
form.
● End-user access tools use metadata to understand how to build a
query.
● The management of metadata within the data warehouse is a very
complex task that should not be underestimated.
End-user Access Tools
● The principal purpose of data warehousing is to provide
information to business users for strategic decision-making.
● These users interact with the warehouse using end-user access
tools.
● The data warehouse must efficiently support ad hoc and routine
analysis.
● High performance is achieved by pre-planning the requirements for
joins, summations, and periodic reports by end-users (where
possible).
● There are five main groups of access tools
o Data reporting and query tools
o Application development tools
o Executive information system (EIS) tools
o Online analytical processing (OLAP) tools
o Data mining tools
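The idea of pre-planning summations for end-users can be sketched as follows. This is a minimal illustration only: SQLite stands in for the warehouse DBMS, and the table names sale_detail and sale_summary, the function build_summary, and the sample rows are all hypothetical. The summation is computed once at load time, so later queries read the small summary table instead of re-aggregating the detailed rows.

```python
import sqlite3


def build_summary(conn):
    """Pre-compute a lightly summarized table from the detailed sales rows."""
    conn.execute("DROP TABLE IF EXISTS sale_summary")
    conn.execute(
        "CREATE TABLE sale_summary AS "
        "SELECT branch, COUNT(*) AS n_sales, SUM(amount) AS total_amount "
        "FROM sale_detail GROUP BY branch"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detail (branch TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sale_detail VALUES (?, ?, ?)",
    [
        ("B001", "2004-01-05", 100.0),
        ("B001", "2004-01-09", 250.0),
        ("B002", "2004-01-07", 75.0),
    ],
)

# Rebuild the summary after each load (hypothetical refresh policy).
build_summary(conn)

# An end-user query now reads the summary table only.
summary = dict(conn.execute("SELECT branch, total_amount FROM sale_summary").fetchall())
```

In a real warehouse the refresh would normally be driven by the warehouse manager and its metadata rather than by a hand-written function.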
Data Warehouse Information Flows
● Inflow - Processes associated with the extraction, cleansing, and
loading of the data from the source systems into the data
warehouse.
● Upflow - Processes associated with adding value to the data in the
warehouse through summarizing, packaging, and distribution of
the data.
● Downflow - Processes associated with archiving and
backing-up/recovery of data in the warehouse.
● Outflow - Processes associated with making the data available to
the end-users.
● Metaflow - Processes associated with the management of the
metadata.
Data Warehousing Tools and Technologies
● Building a data warehouse is a complex task because there is no
vendor that provides an ‘end-to-end’ set of tools.
● Necessitates that a data warehouse is built using multiple products
from different vendors.
● Ensuring that these products work well together and are fully
integrated is a major challenge.
Extraction, Cleansing, and Transformation Tools
● Tasks of capturing data from source systems, cleansing and
transforming it, and loading the results into a target system can be
carried out either by separate products, or by a single integrated
solution.
● Integrated solutions include
▪ Code Generators
▪ Database Data Replication Tools
▪ Dynamic Transformation Engines
Data Warehouse DBMS Requirements
● Load performance
● Load processing
● Data quality management
● Query performance
● Terabyte scalability
● Mass user scalability
● Networked data warehouse
● Warehouse administration
● Integrated dimensional analysis
● Advanced query functionality
Data Warehouse Parallel Database Technologies
● Aims to solve decision-support problems using multiple nodes
working on the same problem.
● Performs many database operations simultaneously, splitting
individual tasks into smaller parts so that tasks can be spread
across multiple processors.
● Parallel DBMSs must be capable of running parallel queries,
parallel data loading, table scanning, and data archiving and
backup.
● Two main parallel hardware architectures include
o Symmetric Multi-processing (SMP) - A set of tightly coupled
processors that share memory and disk storage.
o Massively Parallel Processing (MPP) - A set of loosely coupled
processors, each of which has its own memory and disk storage.
Data Warehouse Metadata
● Metadata is used for a variety of purposes and management of
metadata is a critical issue in achieving a fully integrated data
warehouse.
● The major purpose of metadata is to show the pathway back to
where the data began, so that the warehouse administrators
know the history of any item in the warehouse.
● Problem is that metadata has several functions in the data
warehouse.
– Data transformation and loading
– Data warehouse management
– Query generation
● Various tools of data warehouse generate and use their own
metadata. Major challenge is to synchronize the various types of
metadata.
● Two industry organizations, the Meta Data Coalition (MDC) and
the Object Management Group (OMG), have merged to propose a
single standard for metadata and modeling in data warehousing
called the Common Warehouse Metamodel (CWM).
● Allows users to exchange metadata between different products
from different vendors freely.
Administration and Management Tools
● Monitoring data loading from multiple sources.
● Data quality and integrity checks.
● Managing and updating metadata.
● Monitoring database performance to ensure efficient query
response times and resource utilization.
● Auditing data warehouse usage to provide user chargeback
information.
● Replicating, subsetting, and distributing data.
● Maintaining efficient data storage management.
● Purging data.
● Archiving and backing-up data.
● Implementing recovery following failure.
● Security management.
Typical Data Warehouse and Data Mart Architecture
Data Mart
● A subset of a data warehouse that supports the requirements of a
particular department or business function.
● Characteristics include
– Focuses on only the requirements of one department or
business function.
– Do not normally contain detailed operational data unlike
data warehouses.
– More easily understood and navigated.
Reasons for Creating a Data Mart
● To give users access to the data they need to analyze most often.
● To provide data in a form that matches the collective view of the
data by a group of users in a department or business function area.
● To improve end-user response time due to the reduction in the
volume of data to be accessed.
● To provide appropriately structured data as dictated by the
requirements of the end-user access tools.
● Building a data mart is simpler compared with establishing a
corporate data warehouse.
● The cost of implementing data marts is normally less than that
required to establish a data warehouse.
● The potential users of a data mart are more clearly defined and can
be more easily targeted to obtain support for a data mart project
rather than a corporate data warehouse project.
Data Marts Issues
● Data mart functionality
● Data mart size
● Data mart load performance
● User access to data in multiple data marts
● Data mart Internet / Intranet access
● Data mart administration
● Data mart installation
Designing Data Warehouses
● To begin a data warehouse project, we need to find answers for
questions such as:
– Which user requirements are most important and which
data should be considered first?
– Should the project be scaled down into something more
manageable?
– Should the infrastructure for a scaled down project be
capable of ultimately delivering a full-scale
enterprise-wide data warehouse?
● For many enterprises the way to avoid the complexities
associated with designing a data warehouse is to start by
building one or more data marts.
● Data marts allow designers to build something that is far
simpler and achievable for a specific group of users.
● Few designers are willing to commit to an enterprise-wide design
that must meet all user requirements at one time.
● Despite the interim solution of building data marts, the goal
remains the same: that is, the ultimate creation of a data warehouse
that supports the requirements of the enterprise.
● The requirements collection and analysis stage of a data warehouse
project involves interviewing appropriate members of staff (such
as marketing users, finance users, and sales users) to enable the
identification of a prioritized set of requirements that the data
warehouse must meet.
● At the same time, interviews are conducted with members of staff
responsible for operational systems to identify which data sources
can provide clean, valid, and consistent data that will remain
supported over the next few years.
● Interviews provide the necessary information for the top-down
view (user requirements) and the bottom-up view (which data
sources are available) of the data warehouse.
● The database component of a data warehouse is described using a
technique called dimensionality modeling.
Dimensionality modelling
● A logical design technique that aims to present the data in a
standard, intuitive form that allows for high-performance
access.
● Uses the concepts of Entity-Relationship modeling with some
important restrictions.
● Every dimensional model (DM) is composed of one table with
a composite primary key, called the fact table, and a set of
smaller tables called dimension tables.
● Each dimension table has a simple (non-composite) primary
key that corresponds exactly to one of the components of the
composite key in the fact table.
● Forms ‘star-like’ structure, which is called a star schema or
star join.
● All natural keys are replaced with surrogate keys. Means that
every join between fact and dimension tables is based on
surrogate keys, not natural keys.
● Surrogate keys allow the data in the warehouse to have some
independence from the data used and produced by the OLTP
systems.
Star schema for property sales of DreamHome
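A minimal sketch of a star join, assuming a much simplified and hypothetical schema (it is not the DreamHome schema from the figure). SQLite stands in for the warehouse DBMS, and the names fact_sale, dim_branch, and dim_time are invented for illustration. The fact table has a composite primary key, each dimension table has a simple surrogate key, and every join runs on surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: simple (non-composite) surrogate primary keys.
CREATE TABLE dim_branch (branch_key INTEGER PRIMARY KEY, branch_no TEXT, city TEXT);
CREATE TABLE dim_time   (time_key   INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

-- Fact table: composite primary key made of the dimension keys.
-- (A property dimension table is omitted here for brevity.)
CREATE TABLE fact_sale (
    time_key      INTEGER REFERENCES dim_time(time_key),
    branch_key    INTEGER REFERENCES dim_branch(branch_key),
    property_key  INTEGER,
    selling_price REAL,
    PRIMARY KEY (time_key, branch_key, property_key)
);
""")

conn.executemany("INSERT INTO dim_branch VALUES (?, ?, ?)",
                 [(1, "B003", "Glasgow"), (2, "B005", "London")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2004, 1), (2, 2004, 2)])
conn.executemany("INSERT INTO fact_sale VALUES (?, ?, ?, ?)",
                 [(1, 1, 101, 120000.0), (2, 1, 102, 95000.0), (2, 2, 103, 300000.0)])

# Star join: dimension attributes constrain and group the numeric facts.
rows = conn.execute("""
    SELECT b.city, t.year, SUM(f.selling_price)
    FROM fact_sale f
    JOIN dim_branch b ON b.branch_key = f.branch_key
    JOIN dim_time   t ON t.time_key   = f.time_key
    GROUP BY b.city, t.year
    ORDER BY b.city
""").fetchall()
```

The natural key branch_no is kept only as a descriptive attribute; the fact table never refers to it.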
Dimensionality modelling
● Star schema is a logical structure that has a fact table
containing factual data in the center, surrounded by dimension
tables containing reference data, which can be denormalized.
● Facts are generated by events that occurred in the past, and are
unlikely to change, regardless of how they are analyzed.
● Bulk of data in data warehouse is in fact tables, which can be
extremely large.
● Important to treat fact data as read-only reference data that
will not change over time.
● Most useful fact tables contain one or more numerical
measures, or ‘facts’, that occur for each record and are numeric
and additive.
● Dimension tables usually contain descriptive textual
information.
● Dimension attributes are used as the constraints in data
warehouse queries.
● Star schemas can be used to speed up query performance by
denormalizing reference information into a single dimension
table.
● Snowflake schema is a variant of the star schema where
dimension tables do not contain denormalized data.
● Starflake schema is a hybrid structure that contains a mixture
of star (denormalized) and snowflake (normalized) schemas.
Allows dimensions to be present in both forms to cater for
different query requirements.
Property sales with normalized version of Branch dimension table
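The difference between the star and snowflake forms of a dimension can be sketched as below. This is an illustration under assumptions: the tables branch_star, branch_snow, and city, and all sample values, are hypothetical, and SQLite is used only as a convenient stand-in. The denormalized branch dimension repeats city and region on every row; the snowflake form moves them into a separate table, and one extra join rebuilds the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: one denormalized dimension table.
CREATE TABLE branch_star (branch_key INTEGER PRIMARY KEY, branch_no TEXT,
                          city TEXT, region TEXT);
-- Snowflake form: city and region are normalized into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE branch_snow (branch_key INTEGER PRIMARY KEY, branch_no TEXT,
                          city_key INTEGER REFERENCES city(city_key));
""")

star_rows = [
    (1, "B003", "Glasgow", "Scotland"),
    (2, "B007", "Aberdeen", "Scotland"),
    (3, "B004", "Glasgow", "Scotland"),
]
conn.executemany("INSERT INTO branch_star VALUES (?, ?, ?, ?)", star_rows)

# Normalize: each distinct (city, region) pair is stored once.
conn.execute("INSERT INTO city (city, region) "
             "SELECT DISTINCT city, region FROM branch_star")
conn.execute("""
    INSERT INTO branch_snow
    SELECT s.branch_key, s.branch_no, c.city_key
    FROM branch_star s
    JOIN city c ON c.city = s.city AND c.region = s.region
""")

# One additional join recovers the denormalized view.
rebuilt = conn.execute("""
    SELECT b.branch_key, b.branch_no, c.city, c.region
    FROM branch_snow b JOIN city c ON c.city_key = b.city_key
    ORDER BY b.branch_key
""").fetchall()
n_cities = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
```

A starflake design would simply keep both branch_star and the normalized pair available, and let each query use whichever form suits it.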
Dimensionality modelling
● Predictable and standard form of the underlying dimensional
model offers important advantages:
– Efficiency
– Ability to handle changing requirements
– Extensibility
– Ability to model common business situations
– Predictable query processing
Comparison of DM and ER models
● A single ER model normally decomposes into multiple DMs.
● Multiple DMs are then associated through ‘shared’ dimension
tables.
Database Design Methodology for Data Warehouses
● ‘Nine-Step Methodology’ includes following steps:
1. Choosing the process
2. Choosing the grain
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing pre-calculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. Tracking slowly changing dimensions
9. Deciding the query priorities and the query modes
Step 1: Choosing the process
● The process (function) refers to the subject matter of a
particular data mart.
● First data mart built should be the one that is most likely to be
delivered on time, within budget, and to answer the most
commercially important business questions.
ER model of property sales business process of DreamHome
ER model of an extended version of DreamHome
Step 2: Choosing the grain
● Decide what a record of the fact table is to represent.
● Identify dimensions of the fact table. The grain decision for the
fact table also determines the grain of each dimension table.
● Also include time as a core dimension, which is always present in
star schemas.
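A time dimension is usually generated rather than loaded from a source system. The sketch below assumes a grain of one row per calendar day; the function name build_time_dimension and the column names are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day, each with a surrogate integer key."""
    rows = []
    day = start
    key = 1
    while day <= end:
        rows.append({
            "time_key": key,                      # surrogate key
            "date": day.isoformat(),              # natural value, kept as an attribute
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "weekday": day.isoweekday(),          # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
        key += 1
    return rows


time_dim = build_time_dimension(date(2004, 1, 1), date(2004, 12, 31))
```

If the fact table were kept at a coarser grain, for example one record per month, the time dimension would be generated at that grain instead.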
Step 3: Identifying and conforming the dimensions
● Dimensions set the context for asking questions about the facts in
the fact table.
● If any dimension occurs in two data marts, they must be exactly
the same dimension, or one must be a mathematical subset of the
other.
● A dimension used in more than one data mart is referred to as
being conformed.
Star schemas for property sales and property advertising
Step 4: Choosing the facts
● The grain of the fact table determines which facts can be used in
the data mart.
● Facts should be numeric and additive.
● Unusable facts include:
– non-numeric facts
– non-additive facts
– facts at a different granularity from other facts in the table
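Why non-additive facts are unusable can be shown with a small worked example. The figures and field names below are invented for illustration. Selling price and commission amount are additive, so they can be summed across fact records; a commission rate is a ratio, and averaging the per-record rates gives a different (misleading) answer from recomputing the rate from the additive totals.

```python
sales = [
    {"branch": "B003", "selling_price": 100000.0, "commission": 2000.0},
    {"branch": "B003", "selling_price": 300000.0, "commission": 3000.0},
]

# Additive facts: summing across records is meaningful.
total_price = sum(s["selling_price"] for s in sales)
total_commission = sum(s["commission"] for s in sales)

# Non-additive fact: a per-record rate cannot be summed or naively averaged.
per_sale_rates = [s["commission"] / s["selling_price"] for s in sales]
naive_average_rate = sum(per_sale_rates) / len(per_sale_rates)

# Correct overall rate, derived from the additive components.
overall_rate = total_commission / total_price
```

Here the naive average is about 1.5% while the true overall rate is 1.25%, which is why the additive components, not the ratio, belong in the fact table.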

Property rentals with a badly structured fact table
Property rentals with fact table corrected
Step 5: Storing pre-calculations in the fact table
● Once the facts have been selected, each should be re-examined
to determine whether there are opportunities to use
pre-calculations.
Step 6: Rounding out the dimension tables
● Text descriptions are added to the dimension tables.
● Text descriptions should be as intuitive and understandable to
the users as possible.
● Usefulness of a data mart is determined by the scope and
nature of the attributes of the dimension tables.
Step 7: Choosing the duration of the database
● Duration measures how far back in time the fact table goes.
● Very large fact tables raise at least two very significant data
warehouse design issues.
– Often difficult to source increasingly old data.
– It is mandatory that the old versions of the important
dimensions be used, not the most current versions. Known
as the ‘Slowly Changing Dimension’ problem.
Step 8: Tracking slowly changing dimensions
● Slowly changing dimension problem means that the proper
description of the old dimension data must be used with the old
fact data.
● Often, a generalized key must be assigned to important dimensions
in order to distinguish multiple snapshots of dimensions over a
period of time.
● There are three basic types of slowly changing dimensions:
o Type 1, where a changed dimension attribute is
overwritten
o Type 2, where a changed dimension attribute causes a
new dimension record to be created
o Type 3, where a changed dimension attribute causes an
alternate attribute to be created so that both the old and
new values of the attribute are simultaneously accessible
in the same dimension record
Step 9: Deciding the query priorities and the query modes
● Most critical physical design issues affecting the end-user’s
perception include:
– physical sort order of the fact table on disk
– presence of pre-stored summaries or aggregations
● Additional physical design issues include administration, backup,
indexing performance, and security.
Database Design Methodology for Data Warehouses
● Methodology designs a data mart that supports the requirements of
a particular business process and allows the easy integration with
other related data marts to form the enterprise-wide data
warehouse.
● A dimensional model, which contains more than one fact table
sharing one or more conformed dimension tables, is referred to as a
fact constellation.
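Returning to Step 8, the three types of slowly changing dimension can be sketched on an in-memory dimension table. This is an illustrative sketch, not a prescribed implementation: the function names, the is_current flag, and the previous_ prefix are assumptions introduced here, and a real warehouse would apply the same logic in SQL or in a load tool.

```python
def apply_type1(dim_rows, natural_key, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row[attr] = new_value


def apply_type2(dim_rows, natural_key, attr, new_value):
    """Type 2: create a new dimension record under a new surrogate key."""
    current = [r for r in dim_rows
               if r["natural_key"] == natural_key and r["is_current"]][0]
    current["is_current"] = False           # old snapshot stays for old facts
    new_row = dict(current)
    new_row[attr] = new_value
    new_row["surrogate_key"] = max(r["surrogate_key"] for r in dim_rows) + 1
    new_row["is_current"] = True
    dim_rows.append(new_row)


def apply_type3(dim_rows, natural_key, attr, new_value):
    """Type 3: keep the old value in an alternate attribute of the same record."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value


def fresh_dimension():
    return [{"surrogate_key": 1, "natural_key": "B003",
             "city": "Glasgow", "is_current": True}]


t1 = fresh_dimension(); apply_type1(t1, "B003", "city", "Edinburgh")
t2 = fresh_dimension(); apply_type2(t2, "B003", "city", "Edinburgh")
t3 = fresh_dimension(); apply_type3(t3, "B003", "city", "Edinburgh")
```

Only Type 2 lets old fact records keep pointing at the old description, because the old surrogate key still identifies the earlier snapshot.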
Fact and dimension tables for each business process of DreamHome
Dimensional model (fact constellation) for the DreamHome data
warehouse
Criteria for assessing the dimensionality of a data warehouse
● Criteria proposed by Ralph Kimball (2000) to measure the
extent to which a system supports the dimensional view of
data warehousing.
● Twenty criteria divided into three broad groups: architecture,
administration, and expression.
– Architectural criteria describe the way the entire system
is organized.
– Administration criteria are considered to be essential to
the ‘smooth running’ of a dimensionally-oriented data
warehouse.
– Expression criteria are mostly analytic capabilities that are
needed in real-life situations.
Online Analytical Processing (OLAP)
The dynamic synthesis, analysis, and consolidation of large volumes of
multi-dimensional data.
Data Mining:
Data Mining
● The process of extracting valid, previously unknown,
comprehensible, and actionable information from large databases
and using it to make crucial business decisions (Simoudis, 1996).
● Involves the analysis of data and the use of software techniques for
finding hidden and unexpected patterns and relationships in sets of
data.
● Reveals information that is hidden and unexpected, as there is little
value in finding patterns and relationships that are already intuitive.
● Patterns and relationships are identified by examining the
underlying rules and features in the data.
● Tends to work from the data up and most accurate results normally
require large volumes of data to deliver reliable conclusions.
● Starts by developing an optimal representation of structure of
sample data, during which time knowledge is acquired and
extended to larger sets of data.
● Data mining can provide huge paybacks for companies who have
made a significant investment in data warehousing.
● Relatively new technology, however already used in a number of
industries.
Examples of Applications of Data Mining
● Retail / Marketing
– Identifying buying patterns of customers
– Finding associations among customer demographic
characteristics
– Predicting response to mailing campaigns
– Market basket analysis
● Banking
– Detecting patterns of fraudulent credit card use
– Identifying loyal customers
– Predicting customers likely to change their credit card
affiliation
– Determining credit card spending by customer groups
● Insurance
– Claims analysis
– Predicting which customers will buy new policies
● Medicine
– Characterizing patient behavior to predict surgery visits
– Identifying successful medical therapies for different
illnesses
Data Mining Operations
● Four main operations include:
– Predictive modeling
– Database segmentation
– Link analysis
– Deviation detection
● There are recognized associations between the applications and the
corresponding operations.
o e.g. Direct marketing strategies use database
segmentation.
● Techniques are specific implementations of the data mining
operations.
● Each operation has its own strengths and weaknesses.
● Data mining tools sometimes offer a choice of operations to
implement a technique.
● Criteria for selection of tool include
o Suitability for certain input data types
o Transparency of the mining output
o Tolerance of missing variable values
o Level of accuracy possible
o Ability to handle large volumes of data
Data Mining Operations and Associated Techniques
Predictive Modeling
● Similar to the human learning experience
o uses observations to form a model of the important
characteristics of some phenomenon.
● Uses generalizations of ‘real world’ and ability to fit new data into
a general framework.
● Can analyze a database to determine essential characteristics
(model) about the data set.
● Model is developed using a supervised learning approach, which
has two phases: training and testing.
– Training builds a model using a large sample of historical
data called a training set.
– Testing involves trying out the model on new, previously
unseen data to determine its accuracy and physical
performance characteristics.
● Applications of predictive modeling include customer retention
management, credit approval, cross selling, and direct marketing.
● There are two techniques associated with predictive modeling:
classification and value prediction, which are distinguished by the
nature of the variable being predicted.
Predictive Modeling - Classification
● Used to establish a specific predetermined class for each record in
a database from a finite set of possible class values.
● Two specializations of classification: tree induction and neural
induction.
Example of Classification using Tree Induction
Example of Classification using Neural Induction
Predictive Modeling - Value Prediction
● Used to estimate a continuous numeric value that is associated with
a database record.
● Uses the traditional statistical techniques of linear regression and
nonlinear regression.
● Relatively easy-to-use and understand.
● Linear regression attempts to fit a straight line through a plot of the
data, such that the line is the best representation of the average of
all observations at that point in the plot.
● Problem is that the technique only works well with linear data and
is sensitive to the presence of outliers (that is, data values that
do not conform to the expected norm).
● Although nonlinear regression avoids the main problems of linear
regression, it is still not flexible enough to handle all possible
shapes of the data plot.
● Statistical measurements are fine for building linear models that
describe predictable data points; however, most data is not linear in
nature.
● Data mining requires statistical methods that can accommodate
non-linearity, outliers, and non-numeric data.
● Applications of value prediction include credit card fraud detection
or target mailing list identification.
Database Segmentation
● Aim is to partition a database into an unknown number of
segments, or clusters, of similar records.
● Uses unsupervised learning to discover homogeneous
sub-populations in a database to improve the accuracy of the
profiles.
● Less precise than other operations thus less sensitive to
redundant and irrelevant features.
● Sensitivity can be reduced by ignoring a subset of the
attributes that describe each instance or by assigning a
weighting factor to each variable.
● Applications of database segmentation include customer
profiling, direct marketing, and cross selling.
Example of Database Segmentation using a Scatterplot
● Associated with demographic or neural clustering techniques,
which are distinguished by
– Allowable data inputs
– Methods used to calculate the distance between records
– Presentation of the resulting segments for analysis
Link Analysis
● Aims to establish links (associations) between records, or sets of
records, in a database.
● There are three specializations
– Associations discovery
– Sequential pattern discovery
– Similar time sequence discovery
● Applications include product affinity analysis, direct
marketing, and stock price movement.
Link Analysis - Associations Discovery
● Finds items that imply the presence of other items in
the same event.
● Affinities between items are represented by
association rules.
o e.g. ‘When a customer rents property for
more than 2 years and is more than 25 years
old, in 40% of cases, the customer will buy a
property. This association happens in 35%
of all customers who rent properties’.
Link Analysis - Sequential Pattern Discovery

● Finds patterns between events such that the presence of one set of
items is followed by another set of items in a database of events
over a period of time.
– e.g. Used to understand long term customer buying
behavior.
Example of Database Segmentation using a Visualization
Link Analysis - Similar Time Sequence Discovery
● Finds links between two sets of data that are time-dependent, and
is based on the degree of similarity between the patterns that both
time series demonstrate.
– e.g. Within three months of buying property, new home
owners will purchase goods such as cookers, freezers, and
washing machines.
Deviation Detection
● Relatively new operation in terms of commercially available data
mining tools.
● Often a source of true discovery because it identifies outliers,
which express deviation from some previously known expectation
and norm.
● Can be performed using statistics and visualization techniques or
as a by-product of data mining.
● Applications include fraud detection in the use of credit cards and
insurance claims, quality control, and defects tracing.
The Data Mining Process
● Recognizing that a systematic approach is essential to successful
data mining, many vendor and consulting organizations have
specified a process model designed to guide the user through a
sequence of steps that will lead to good results.
● Developed a specification called the Cross Industry Standard
Process for Data Mining (CRISP-DM).
● CRISP-DM specifies a data mining process model that is not
specific to a particular industry or tool.
● CRISP-DM has evolved from the knowledge discovery processes
used widely in industry and in direct response to user
requirements.
● The major aims of CRISP-DM are to make large data mining
projects run more efficiently, be cheaper, more reliable, and more
manageable.
● CRISP-DM is a hierarchical process model. At the top level, the
process is divided into six different generic phases, ranging from
business understanding to deployment of project results.
● The next level elaborates each of these phases as comprising
several generic tasks. At this level, the description is generic
enough to cover all the data mining (DM) scenarios.
● The third level specializes these tasks for specific situations. For
instance, the generic task might be cleaning data, and the specialised
task could be cleaning of numeric values or categorical values.
● The fourth level is the process instance; that is, a record of actions,
decisions, and results of an actual execution of a DM project.
● The model also discusses relationships between different DM
tasks. It gives an idealised sequence of actions during a DM project.
Phases of the CRISP-DM Model
Data Mining Tools
● There are a growing number of commercial data mining tools on
the marketplace.
● Important characteristics of data mining tools include:

– Data preparation facilities
– Selection of data mining operations
– Product scalability and performance
– Facilities for understanding results
● Data preparation facilities
– Data preparation is the most time-consuming aspect of
data mining.
– Functions supported include: data preparation, data
cleansing, data describing, data transforming and data
sampling.
● Selection of data mining operations
– Important to understand the characteristics of the
operations (algorithms) to ensure that they meet the user’s
requirements.
– In particular, important to establish how the algorithms
treat the data types of the response and predictor
variables, how fast they train, and how fast they work on
new data.
● Product scalability and performance
– Capable of dealing with increasing amounts of data,
possibly with sophisticated validation controls.
– Maintaining satisfactory performance may require
investigations into whether a tool is capable of supporting
parallel processing using technologies such as SMP or
MPP.
● Facilities for understanding results
– By providing measures such as those describing accuracy
and significance in useful formats such as confusion
matrices, by allowing the user to perform sensitivity
analysis on the result, and by presenting the result in
alternative ways using for example visualization
techniques.
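As an illustration of the accuracy measures and confusion matrices mentioned above, the sketch below compares the actual classes of a small test set with the classes predicted by some classifier. The labels and data are invented, and the helper names confusion_matrix and accuracy are hypothetical; no particular data mining tool is assumed.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of test records whose predicted class matches the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Test phase: previously unseen records with known outcomes.
actual    = ["buy", "buy", "rent", "rent", "rent", "buy"]
predicted = ["buy", "rent", "rent", "rent", "buy", "buy"]
labels = ["buy", "rent"]

matrix = confusion_matrix(actual, predicted, labels)
acc = accuracy(actual, predicted)
```

The off-diagonal cells show which classes the model confuses, which a single accuracy figure hides.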
Decision Tree
TOPIC: DATAWAREHOUSING AND DATAMINING
One of the major challenges for organizations seeking to exploit data
mining is identifying suitable data to mine.
Data mining requires a single, separate, clean, integrated, and
self-consistent source of data.
A data warehouse is well equipped for providing data for mining for the
following reasons:
1. Data quality and consistency are prerequisites for mining to ensure
the accuracy of the predictive models. Data warehouses are
populated with clean, consistent data.
2. It is advantageous to mine data from multiple sources to discover
as many interrelationships as possible. Data warehouses contain
data from a number of sources.
3. Selecting the relevant subsets of records and fields for data mining
requires the query capabilities of the data warehouse.
4. The results of a data mining study are useful if there is some way
to further investigate the uncovered patterns. Data warehouses
provide the capability to go back to the data source.
Given the complementary nature of data mining and data warehousing,
many vendors are investigating ways of integrating data mining and data
warehouse technologies.
TOPIC: WEB DATABASES
Web Databases:
● The Web is the most popular and powerful networked information
system to date.
● As the architecture of the Web was designed to be platform-independent,
it can significantly lower deployment and training costs.
● Organizations are using the Web as a strategic platform for innovative
business solutions, in effect becoming Web-centric.
● Many Web sites today are file-based, where each Web document is
stored in a separate file.
● For large sites, this can lead to significant management problems.
● Also many Web sites now contain more dynamic information, such
as product and pricing data.
● Maintaining such data in both a database and in separate HTML
files is problematic.
● Accessing the database directly from the Web would be a better approach.

Internet
Worldwide collection of interconnected networks.
● Began in late 1960s in ARPANET, a US DOD project, investigating how to build networks that could withstand partial outages.
● Starting with a few nodes, Internet estimated to have over 945 million users by end of 2004.
● 2 billion users projected by 2010.
● About 3.5 billion documents on Internet (550 billion if intranets/extranets included).
Intranet and Extranet
● Intranet - Web site or group of sites belonging to an organization, accessible only by members of that organization.
● Extranet - An intranet that is partially accessible to authorized outsiders.
● Whereas intranet resides behind firewall and is accessible only to people who are members of same organization, extranet provides various levels of accessibility to outsiders.
eCommerce and eBusiness
● eCommerce - Customers can place and pay for orders via the business’s Web site.
● eBusiness - Complete integration of Internet technology into economic infrastructure of the business.
● Business-to-business transactions may reach $2.1 trillion in Europe and $7 trillion in US by 2006.
● eCommerce may account for $12.8 trillion in worldwide corporate revenue by 2006 and could represent 18% of sales in the global economy.
The Web
Hypermedia-based system that provides a simple ‘point and click’ means of browsing information on the Internet using hyperlinks.
● Information presented on Web pages, which can contain text, graphics, pictures, sound, and video.
● Can also contain hyperlinks to other Web pages, which allow users to navigate in a non-sequential way through information.
● Web documents written using HTML.

● Web consists of network of computers that can act in two roles:
o as servers, providing information;
o as clients (browsers), requesting information.
● Protocol that governs exchange of information between Web server and browser is HTTP, and locations within documents are identified as a URL.
● Much of Web’s success is due to its simplicity and platform-independence.
Figure: Basic Components of Web Environment
HyperText Transfer Protocol (HTTP)
● Protocol used to transfer Web pages through Internet.
● Based on request-response paradigm:
o Connection - Client establishes connection with Web server.
o Request - Client sends request to Web server.
o Response - Web server sends response (HTML document) to client.
o Close - Connection closed by Web server.
● HTTP/1.0 is stateless protocol - each connection is closed once server provides response.
● This makes it difficult to support concept of a session that is essential to basic DBMS transactions.
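The request and response steps of the HTTP/1.0 exchange described above can be illustrated without touching the network. The following is a minimal sketch rather than a full client: the class name `HttpExchangeSketch`, its two helper methods, and the sample host are hypothetical and chosen only for illustration.

```java
class HttpExchangeSketch {
    // Request step: a request line, optional headers, then an empty line.
    // (The Host header is optional in HTTP/1.0 but commonly sent.)
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // Response step: the first response line has the form
    // "HTTP-version status-code reason-phrase".
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        System.out.print(buildGetRequest("www.w3.org", "/"));
        System.out.println(parseStatusCode("HTTP/1.0 200 OK"));
    }
}
```

Nothing in the request identifies an earlier request from the same client, which is the statelessness that makes sessions hard to support.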
HyperText Markup Language (HTML)
● Document formatting language used to design most Web pages.
● A simple, yet powerful, platform-independent document language.
● HTML is an application of Standard Generalized Markup Language (SGML), a system for defining structured document types and markup languages to represent instances of those document types.
Uniform Resource Locators (URLs)
String of alphanumeric characters that represents location or address of a resource on Internet and how that resource should be accessed.
● Defines uniquely where documents (resources) can be found.
● Uniform Resource Identifiers (URIs) - generic set of all Internet resource names/addresses.
● Uniform Resource Names (URNs) - persistent, location-independent name. Relies on name lookup services.
● URL consists of three basic parts:
o protocol used for the connection,
o host name,
o path name on host where resource stored.
● Can optionally specify:
o port through which connection to host should be made,
o query string.
http://www.w3.org/Markup/MarkUp.html
Static and Dynamic Web Pages
● HTML document stored in file is static Web page.
● Content of dynamic Web page is generated each time it is accessed.
● Thus, dynamic Web page can:
– respond to user input from browser;
– be customized by and for each user.
● Requires hypertext to be generated by servers.
● Need scripts that perform conversions from different data formats into HTML ‘on-the-fly’.
Web Services
Collection of functions packaged as single entity and published to network for use by other programs.
● Web services are important paradigm in building applications and business processes for the integration of heterogeneous applications.
● Based on open standards and focus on communication and collaboration among people and applications.
● Unlike other Web-based applications, Web services have no user interface and are not targeted for browsers. Instead, consist of reusable software components designed to be consumed by other applications.
Web Services – Technologies & Standards
● eXtensible Markup Language (XML).
● SOAP (Simple Object Access Protocol) protocol, based on XML, used for communication over Internet.
● WSDL (Web Services Description Language) protocol, again based on XML, used to describe the Web service.
● UDDI (Universal Description, Discovery and Integration) protocol used to register the Web service for prospective users.
Web Services – Examples
● Common example is stock quote facility, which receives a request for current price of a specified stock and responds with requested price.
● Second example is Microsoft MapPoint Web service that allows high quality maps, driving directions, and other location information to be integrated into a user application, business process, or Web site.
Requirements for Web-DBMS Integration
● Ability to access valuable corporate data in a secure manner.
● Data- and vendor-independent connectivity to allow freedom of choice in DBMS selection.
● Ability to interface to database independent of any proprietary browser or Web server.
● Connectivity solution that takes advantage of all the features of an organization’s DBMS.
● Open architecture to allow interoperability with a variety of systems and technologies. For example:
o different Web servers;
o Microsoft's (Distributed) Component Object Model (DCOM/COM);
o CORBA/IIOP (Internet Inter-ORB protocol);
o Java/Remote Method Invocation (RMI);
o XML;
o Web services (SOAP, WSDL, UDDI).
● Cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce applications development costs.
● Support for transactions that span multiple HTTP requests.
● Support for session- and application-based authentication.
● Acceptable performance.
● Minimal administration overhead.
● Set of high-level productivity tools to allow applications to be developed, maintained, and deployed with relative ease and speed.
Advantages of Web-DBMS Approach
● DBMS advantages
● Simplicity
● Platform independence
● Graphical User Interface
● Standardization
● Cross-platform support
● Transparent network access
● Scalable deployment
● Innovation
Disadvantages of Web-DBMS Approach
● Reliability
● Security
● Cost
● Scalability
● Limited functionality of HTML
● Statelessness
● Bandwidth
● Performance
● Immaturity of development tools
Approaches to Integrating Web and DBMSs
● Scripting Languages.
● Common Gateway Interface (CGI).
● HTTP Cookies.
● Extending the Web Server.
● Java, J2EE, JDBC, SQLJ, JDO, Servlets, and JSP.
● Microsoft Web Solution Platform: .NET, ASP, and ADO.
● Oracle Internet Platform.
Scripting Languages (JavaScript and VBScript)
● Scripting languages can be used to extend browser and Web server with database functionality.
● As script code is embedded in HTML, it is downloaded every time page is accessed.
● Updating browser is simply a matter of changing Web document on server.
● Some popular scripting languages are: JavaScript, VBScript, Perl, and PHP.
● They are interpreted languages, not compiled, making it easy to create small applications.
Common Gateway Interface (CGI)
Specification for transferring information between a Web server and a CGI program.
● Server only intelligent enough to send documents and to tell browser what kind of document it is.
● But server also knows how to launch other programs.
● When server sees that URL points to a program (script), it executes script and sends back script’s output to browser as if it were a file.
Figure: CGI – Environment
CGI
● CGI defines how scripts communicate with Web servers.
● A CGI script is any script designed to accept and return data that conforms to the CGI specification.
● Before server launches script, prepares number of environment variables representing current state of the server, who is requesting the information, and so on.
● Script picks this up and reads STDIN.
● Then performs necessary processing and writes its output to STDOUT.
● Script responsible for sending MIME header, which allows browser to differentiate between components.
● CGI scripts can be written in almost any language, provided it supports reading and writing of an operating system’s environment variables.
● Four primary methods for passing information from browser to a CGI script:
o Passing parameters on the command line.
o Passing environment variables to CGI programs.
o Passing data to CGI programs via standard input.
o Using extra path information.
Figure: CGI - Passing Parameters on Command Line
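A CGI program, as described above, takes its input from environment variables (and STDIN), then writes a MIME header followed by the document to STDOUT. The following is a minimal sketch in Java, assuming the Web server passes the query string in the standard `QUERY_STRING` environment variable. The class name `CgiSketch`, its helper methods, and the `name` parameter are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class CgiSketch {
    // Split "a=1&b=two+words" into decoded name/value pairs.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Escape characters that are significant in HTML before echoing user input.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // The MIME header comes first, then a blank line, then the document body.
    static String respond(Map<String, String> params) {
        String name = params.getOrDefault("name", "world");
        return "Content-Type: text/html\r\n\r\n"
             + "<html><body><p>Hello, " + escapeHtml(name) + "</p></body></html>\n";
    }

    public static void main(String[] args) {
        // The Web server sets QUERY_STRING before launching the script.
        String query = System.getenv("QUERY_STRING");
        System.out.print(respond(parseQuery(query)));
    }
}
```

Because the server starts a fresh process for each request, a program like this keeps no state between calls.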

CGI – Advantages
● CGI is the de facto standard for interfacing Web servers with external applications.
● Possibly most commonly used method for interfacing Web applications to data sources.
● Advantages:
– simplicity,
– language independence,
– Web server independence,
– wide acceptance.
CGI – Disadvantages
● Communication between client and database server must always go through Web server.
● Lack of efficiency and transaction support, and difficulty validating user input inherited from statelessness of HTTP protocol.
● HTTP never intended for long exchanges or interactivity.
● Server has to generate a new process or thread for each CGI script.
● Security.
HTTP Cookies
● Cookies can make CGI scripts more interactive.
● Cookies are small text files stored on Web client.
● CGI script creates cookie and has Web server send it to client’s browser to store on hard disk.
● Later, when client revisits Web site and uses a CGI script that requests this cookie, client’s browser sends information stored in the cookie.
● Cookies can be used to store registration information or preferences (e.g. for virtual shopping cart).
● However, not all browsers support cookies.
Extending the Web Server
● To overcome limitations of CGI, many servers provide an API that adds functionality to server.
● Two of main APIs are Netscape’s NSAPI and Microsoft’s ISAPI.
● Scripts are loaded in as part of the server, giving back-end applications full access to all the I/O functions of server.
● One copy of application is loaded and shared between multiple requests to server.
● Approach more complex than CGI, possibly requiring specialized programmers.
● Can provide very flexible and powerful solution.
● API extensions can provide same functionality as a CGI program, but as API runs as part of the server, API approach can perform significantly better than CGI.
● Extending Web server is potentially dangerous, since server executable is being changed.
Comparison of CGI and API
● CGI and API both extend capabilities of server.
● CGI scripts run in environment created by Web server program.
● Scripts only execute once Web server interprets request from browser, then returns results back to the server.
● API approach not nearly so limited in its ability to communicate.
● API-based extensions are loaded into same address space as Web server.
Java
● Proprietary language developed by Sun.
● Originally intended to support environment of networked machines and embedded systems.
● Now, Java is rapidly becoming de facto language for Web computing.
● Interesting because of its potential for building Web applications (applets) and server applications (servlets).
‘A simple, object-oriented, distributed, interpreted, robust, secure, architecture neutral, portable, high-performance, multi-threaded and dynamic language’.
● Has a machine-independent target architecture, the Java Virtual Machine (JVM).
● Since almost every Web browser vendor has already licensed Java and implemented an embedded JVM, Java applications can currently be deployed on most end-user platforms.
● Before Java application can be executed, it must first be loaded into memory.
● Done by Class Loader, which takes ‘.class’ file(s) containing bytecodes and transfers them into memory.
● Class file can be loaded from local hard drive or downloaded from network.
● Finally, bytecodes must be verified to ensure that they are valid and do not violate Java’s security restrictions.
● Loosely speaking, Java is a ‘safe’ C++.
● Safety features include strong static type checking, automatic garbage collection, and absence of machine pointers at language level.
● Safety is central design goal: ability to safely transmit Java code across Internet.
● Security is also integral part of Java’s design - sandbox ensures untrusted application cannot gain access to system resources.
Java 2 Platform
● In mid-1999, Sun announced it would pursue a distinct and integrated Java enterprise platform:
– J2ME: aimed at embedded and consumer-electronics platforms.
– J2SE: aimed at typical desktop and workstation environments. Serves as foundation for J2EE and Web services.
– J2EE: aimed at robust, scalable, multiuser, and secure enterprise applications.
● J2EE was designed to simplify complex problems with development, deployment, and management of multi-tier enterprise applications.
● Cornerstone of J2EE is Enterprise JavaBeans (EJB), a standard for building server-side components in Java.
● Three types of EJB components:
– EJB Session Beans, components implementing business logic, business rules, and workflow.
– EJB Message-Driven Beans (MDBs), which process messages sent by clients, EJBs, or other J2EE components.
– EJB Entity Beans, components encapsulating some data contained by the enterprise. Entity Beans are persistent.
● Two types of entity beans:
– Bean-Managed Persistence (BMP), which requires developer to write code to make bean persist using an API such as JDBC or SQLJ.
– Container-Managed Persistence (CMP), where persistence is provided automatically by container.
● Discuss 5 ways to access a database: JDBC, SQLJ, CMP, JDO, and JSP.
JDBC
● Modeled after ODBC, JDBC API supports basic SQL functionality.
● With JDBC, Java can be used as host language for writing database applications.
● On top of JDBC, higher-level APIs can be built.
● Currently, two types of higher-level APIs:
– An embedded SQL for Java (e.g. SQLJ).
– A direct mapping of relational database tables to Java classes (e.g. TopLink from Oracle).
● JDBC API consists of two main interfaces: an API for application writers, and a lower-level driver API for driver writers.
● Applications and applets can access databases using:
– ODBC drivers and existing database client libraries;
– JDBC API with pure Java JDBC drivers.
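To make the idea of Java as a host language more concrete, the following is a minimal sketch of a JDBC query using the application-writer API. It assumes a `Staff` table with `name` and `salary` columns and a JDBC URL supplied on the command line. The class name, table, columns, and salary threshold are illustrative assumptions, and a suitable JDBC driver must be on the classpath for the query to actually run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcSketch {
    // The SQL text is an ordinary string, so the DBMS sees it only at run time.
    static final String STAFF_QUERY = "SELECT name FROM Staff WHERE salary > ?";

    // Runs the parameterized query and collects the names it returns.
    static List<String> staffEarningMoreThan(Connection conn, double minSalary)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(STAFF_QUERY)) {
            stmt.setDouble(1, minSalary);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java JdbcSketch <jdbc-url>");
            return;
        }
        // try-with-resources closes the connection even if the query fails.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            for (String name : staffEarningMoreThan(conn, 20000.0)) {
                System.out.println(name);
            }
        } catch (SQLException e) {
            System.err.println("Database access failed: " + e.getMessage());
        }
    }
}
```

A mistake in the SQL string here would surface only when the statement is prepared or executed, not when the Java code is compiled.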

JDBC - Advantages/Disadvantages
● Advantage of using ODBC drivers is that they are a de facto standard for PC database access, and are available for many DBMSs, for very low price.
● Disadvantages with this approach:
– Non-pure JDBC driver will not necessarily work with a Web browser.
– Currently downloaded applet can connect only to database located on host machine.
– Deployment costs increase.
SQLJ
● Another JDBC-based approach uses Java with static embedded SQL.
● SQLJ comprises a set of clauses that extend Java to include SQL constructs as statements and expressions.
● SQLJ translator transforms SQLJ clauses into standard Java code that accesses database through a CLI.
Comparison of JDBC and SQLJ
● SQLJ is based on static embedded SQL while JDBC is based on dynamic SQL.
● Thus, SQLJ facilitates static analysis for syntax checking, type checking, and schema checking, which may help produce more reliable programs at loss of some functionality.
● It also potentially allows DBMS to generate an execution strategy for the query, thereby improving performance of the query.
● JDBC is low-level middleware tool with features to interface Java application with RDBMS.
● Developers need to design relational schema to which they will map Java objects, and write code to map Java objects to rows of relations.
● Problems:
o need to be aware of two different paradigms (object and relational);
o need to design relational schema to map onto object design;
o need to write mapping code.
EJBs
● EJBs have 3 elements in common:

– an indirection mechanism;
– a bean implementation;
– a deployment descriptor.
● With indirection mechanism clients do not invoke EJB methods directly.
● Session and entity beans provide access to their operations via interfaces.
● Home interface defines methods that manage lifecycle of a bean. The corresponding server-side implementation classes are generated at deployment time.
● To provide access to other operations, bean can expose a local interface (if client and bean are colocated), a remote interface, or both.
● Local interfaces expose methods to clients running in same container or JVM.
● Remote interfaces make methods available to clients no matter where deployed.
● When a client invokes create() method (which returns an interface) on home interface, EJB container calls ejbCreate() to instantiate bean, at which point client can access bean through remote or local interface returned by create().
● Bean implementation is a Java class that implements business logic defined in remote interface.
● Transactional semantics are described declaratively and captured in the deployment descriptor.
● Deployment descriptor, written in XML, lists a bean’s properties and elements, which may include:
– home interface, remote interface, local interface;
– Web service endpoint interface,
– bean implementation class,
– JNDI name for bean, transaction attributes, security attributes, and per-method descriptors.
Container-Managed Persistence (CMP)
● Instead of writing Java code to implement BMP, CMP is defined declaratively in deployment descriptor.
● At runtime, container manages bean’s data by interacting with data source designated in deployment descriptor.
● Following steps need to be followed for CMP:
– Define CMP fields in local interface.
– Define CMP fields in entity bean class implementation.
– Define CMP fields in deployment descriptor.
– Define PK field and its type in deployment descriptor.
Container-Managed Relationships (CMR)
● EJB container can manage relationships between entity beans and session beans.
● Relationships have a multiplicity, which can be 1:1, 1:M, or M:M, and a direction, which can be unidirectional or bidirectional.
● Local interfaces provide foundation for CMR.
● With CMR, beans use local interfaces to maintain relationships with other beans.
● For example, a Staff bean can use collection of PropertyForRent local interfaces to maintain a 1:M relationship.
● Container can also manage referential integrity.
● CMR relationships are described declaratively in deployment descriptor file outside enterprise-beans element.
● Need to specify both beans involved in relationship.
● Relationship is defined in ejb-relation element, with each role defined in ejb-relationship-role element.
● When bean is deployed, the container provider’s tools parse deployment descriptor and generate code to implement underlying classes.
EJB Query Language (EJB-QL)
● Used to define queries for entity beans that operate with CMP. EJB-QL can express queries for two different styles of operations:
– finder methods, which allow results of an EJB-QL query to be used by clients of the entity bean. Finder methods are defined in home interface.
– select methods, which find objects or values related to state of an entity bean without exposing results to client. Select methods are defined in entity bean class.
EJB Query Language (EJB-QL)
● An object-based approach for defining queries against persistent store; conceptually similar to SQL.
● As with CMP and CMR fields, queries are defined in the deployment descriptor.
● EJB container is responsible for translating EJB-QL queries into query language of persistent store, resulting in query methods that are more flexible.
<query>
  <query-method>
    <method-name>findAll</method-name>
    <method-params></method-params>
  </query-method>
  <result-type-mapping>Local</result-type-mapping>
  <ejb-ql><![CDATA[SELECT OBJECT(s) FROM Staff s]]></ejb-ql>
</query>
<query>
  <query-method>
    <method-name>findByStaffName</method-name>
    <method-params>
      <method-param>java.lang.String</method-param>
    </method-params>
  </query-method>
  <result-type-mapping>Local</result-type-mapping>
  <ejb-ql><![CDATA[SELECT OBJECT(s) FROM Staff s WHERE s.name = ?1]]></ejb-ql>
</query>
Java Data Objects (JDO)
● ODMG submitted their Java binding to Java Community Process as basis of JDO. Development of JDO had two major aims:
– To provide standard interface between application objects and data sources, such as relational databases, XML databases, legacy databases, and file systems.
– To provide developers with a transparent Java-centric mechanism for working with persistent data to simplify application development. (Aim of JDO was to reduce need to explicitly code such things as SQL statements and transaction management into applications).
Java Data Objects (JDO) – Interfaces
● PersistenceCapable makes a Java class capable of being persisted by a persistence manager. Every class whose instances can be managed by a JDO PersistenceManager must implement this interface.
● Most JDO implementations provide an enhancer that transparently adds code to implement this interface to each persistent class.
● The interface defines methods that allow an application to examine runtime state of an instance and to get its associated PersistenceManager if it has one.
● PersistenceManagerFactory obtains PersistenceManager instances. PMF instances can be configured and serialized for later use.
● PersistenceManager contains methods to manage the lifecycle of PersistenceCapable instances and is also the factory for Query and Transaction instances.
● A PersistenceManager instance supports one transaction at a time and uses one connection to the underlying data source at a time.
● Query allows applications to obtain persistent instances from data source. Can be many Query instances associated with a PersistenceManager and multiple queries may be designated for simultaneous execution.
● This interface is implemented by each JDO vendor to translate expressions in JDOQL into native query language of data store.
Java Data Objects (JDO) – Interfaces and Classes
● Extent is a logical view of all objects of a particular class that exist in the data source.
● Extents are obtained from a PersistenceManager and can be configured to also include subclasses.
● Extent has two possible uses: (a) to iterate over all instances of a class; (b) to execute a query in the data source over all instances of a particular class.
● Transaction contains methods to mark start/end of transactions.
● JDOHelper class defines static methods that allow a JDO-aware application to examine runtime state of instances and to get its associated PersistenceManager if it has one.
JDO – Creating Persistent Classes
1. Ensure each class has a no-arg constructor. If class has no constructors defined, compiler automatically generates a no-arg constructor; otherwise, developer will need to specify one.
2. Create a JDO metadata file to identify the persistent classes. The JDO metadata file is expressed as an XML document.
3. Enhance classes so that they can be used in a JDO runtime environment. JDO specification describes a number of ways that classes can be enhanced; however, most common way is using an enhancer program that reads a set of .class files and JDO metadata file and creates new .class files that have been enhanced to run in a JDO environment.
JDO – Reachability-based Persistence
● JDO supports reachability-based persistence.
● Thus, any transient instance of a persistent class will become persistent at commit if it is reachable, directly or indirectly, by a persistent instance.
● Instances are reachable through either a reference or collection of references.
● Reachability algorithm is applied to all persistent instances transitively through all their references to instances in memory, causing the complete closure to become persistent.
● Allows developers to construct complex object graphs in memory and make them persistent simply by creating a reference to graph from a persistent instance.
● Instances have to be explicitly deleted.
JDO Query Language (JDOQL)
● Data source-neutral query language based on Java boolean expressions.
● Syntax is same as standard Java syntax, with a few exceptions.
● A Query object is used to find persistent objects matching certain criteria. A Query is obtained through one of newQuery() methods of PersistenceManager.
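The idea behind a JDOQL query (a boolean filter applied to a collection of candidate objects, yielding the matching subcollection) can be mimicked in plain Java. This sketch deliberately does not use the JDO API. The `PropertyForRent` class here is a hypothetical stand-in with only two fields, and the lambda plays the role of a filter string such as `this.rent < 400`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

class JdoqlAnalogy {
    // Hypothetical candidate class; a real JDO class would be enhanced
    // so that it implements PersistenceCapable.
    static class PropertyForRent {
        final String propertyNo;
        final int rent;

        PropertyForRent(String propertyNo, int rent) {
            this.propertyNo = propertyNo;
            this.rent = rent;
        }
    }

    // Returns the subcollection of candidates that satisfy the filter.
    static List<PropertyForRent> execute(Collection<PropertyForRent> candidates,
                                         Predicate<PropertyForRent> filter) {
        List<PropertyForRent> result = new ArrayList<>();
        for (PropertyForRent candidate : candidates) {
            if (filter.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the candidate collection.
        List<PropertyForRent> candidates = Arrays.asList(
                new PropertyForRent("P1", 350),
                new PropertyForRent("P2", 450),
                new PropertyForRent("P3", 375));
        List<PropertyForRent> cheap = execute(candidates, p -> p.rent < 400);
        for (PropertyForRent property : cheap) {
            System.out.println(property.propertyNo);
        }
    }
}
```

In real JDO the filter is passed as a string so that the vendor's Query implementation can translate it into the data store's native query language instead of evaluating it in memory.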
● Basic JDOQL query has following 3 components:
– a candidate class (usually a persistent class);
– a candidate collection containing persistent objects (usually an Extent);
– a filter, a boolean expression in a Java-like syntax.
● Query result is a subcollection of candidate collection containing only those instances of candidate class that satisfy filter.
● Queries can include optional parameter declarations that act as placeholders in filter string, variable declarations, imports, and ordering expressions.
Query query = pm.newQuery(PropertyForRent.class, "this.rent < 400");
Collection result = (Collection) query.execute();
Java Servlets
● Servlets are programs that run on Java-enabled Web server and build Web pages, analogous to CGI.
● Have a number of advantages over CGI:
– improved performance;
– portability;
– extensibility;
– simpler session management;
– improved security and reliability.
Java Server Pages (JSP)
● Java-based server-side scripting language that allows static HTML to be mixed with dynamically-generated HTML.
● Compiled into Java servlet and processed by a Java-enabled Web server (JSP works with most Web servers).
● Since servlet is compiled, performance is improved.
Java Web Services – Document-Oriented
● Deal directly with processing XML documents.
● Java API for XML Processing (JAXP), processes XML documents using various parsers and transformations. JAXP supports both SAX and DOM. Also supports XSLT.
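JAXP ships with the standard Java platform, so its DOM support can be sketched without extra libraries. The example below parses a small XML string into a DOM tree and counts elements by tag name. The class name `JaxpDomSketch` and the sample `<staff>` document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class JaxpDomSketch {
    // Parse the XML text into a DOM tree and count elements with the given tag name.
    static int countElements(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tagName).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<staff><member>Ann</member><member>Raj</member></staff>";
        System.out.println(countElements(xml, "member"));
    }
}
```

DOM builds the whole document in memory, which suits small documents; the SAX support mentioned above processes a document as a stream of events instead.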
● Java Architecture for XML Binding (JAXB), processes XML documents using schema-derived JavaBeans component classes. JAXB provides methods for unmarshalling an XML instance document into a tree of Java objects, and marshalling tree back into an XML document.
● SOAP with Attachments API for Java (SAAJ), provides standard way to send XML documents over Internet from Java platform. Based on SOAP 1.1 and SOAP with Attachments, which define a basic framework for exchanging XML messages.
Java Web Services – Procedure-Oriented
● Java API for XML-based RPC (JAX-RPC), sends SOAP method calls to remote clients over Internet and receives results.
● Client written in language other than Java can access a Web service developed and deployed on Java platform.
● Also, client written in Java can communicate with service developed and deployed using some other platform.
● Java API for XML Registries (JAXR), provides standard way to access business registries and share information.
● JAXR gives Java developers a uniform way to use business registries based on open standards (such as ebXML) or industry consortium-led specifications (such as UDDI).
Microsoft Web Platform - .NET
“Software is delivered as a service, accessible by any device, any time, any place, and is fully programmable and personalizable.”
● Contains various tools, services, and technologies, such as:
– Windows 2000,
– Exchange Server,
– Visual Studio,
– HTML/XML,
– scripting languages,
– components (Java, ActiveX).
Object Linking and Embedding for DataBases (OLE DB)
● Microsoft has defined set of data objects, collectively known as OLE DB.
● Allows OLE-oriented applications to share and manipulate sets of data as objects.
● OLE DB is an object-oriented specification based on C++ API.
● Components can be treated as data consumers and data providers. Consumers take data from OLE DB interfaces and providers expose OLE DB interfaces.
Figure: OLE DB Architecture
Active Server Pages (ASP)
● ASP is programming model that allows dynamic, interactive Web pages to be created on server.
● ASP provides flexibility of CGI, without performance overhead discussed previously.
● ASP runs in-process with the server, and is optimized to handle large volume of users.
● When an ‘.asp’ file is requested, Web server calls ASP, which reads requested file, executes any commands, and sends generated HTML page back to browser.
ActiveX Data Objects (ADO)
● Programming extension of ASP supported by Microsoft IIS for database connectivity.
● Supports following key features:
o Independently-created objects.
o Support for stored procedures.
o Support for different cursor types.
o Batch updating.
o Support for limits on number of returned rows.
o Support for multiple recordsets.
● Designed as an easy-to-use interface to OLE DB.
Figure: ADO Object Model
Remote Data Services (RDS)
● Microsoft technology for client-side database manipulation across Internet.
● Still uses ADO on server-side to execute query and return recordset to client, which can then execute other queries on recordset.
● RDS provides mechanism to send updated records back to server.
● A disconnected recordset model.
Comparison of ASP and JSP
● Both designed to enable developers to separate page design from programming logic through use of callable components.
● Differences:
– JSP is essentially platform and server independent whereas ASP primarily restricted to MS Windows-based platforms.
– JSP perhaps more extensible as JSP developers can extend the JSP tags available.
– JSP components are reusable across platforms.
– JSP benefits from in-built Java security model.
Microsoft .NET
● Number of limitations with Microsoft’s platform:
o a number of languages supported with different programming models (J2EE composed solely of Java);
o no automatic state management;
o relatively simple user interfaces for Web compared to traditional Windows user interfaces;
o need to abstract operating system (Windows API difficult to program).
● Next, and current, evolution in Microsoft’s Web solution strategy was development of .NET.
● Various tools, services, technologies in .NET:
o Windows Server,
o BizTalk Server (to build XML-based business processes across applications and organizations),
o Commerce Server (to build scalable e-Commerce solutions),
o Application Center (to deploy and manage scalable Web applications),
o Mobile Information Server (to support handheld devices),
o SQL Server,
o Microsoft Visual Studio .NET,
o Microsoft .NET Framework (CLR + Class Library).
Figure: .NET Framework
.NET – Common Language Runtime
● An execution engine that loads, executes, and manages code compiled into an intermediate bytecode format - Microsoft Intermediate Language (MSIL) - analogous to Java bytecodes.
● Not interpreted but compiled to native binary format before execution by a JIT compiler built into CLR.
● Allows one language to call another, and even inherit and modify objects from another language.
● Provides number of services such as memory management, code and thread execution, uniform error handling, and security.
● Enforces strict type-and-code-verification system called common type system (CTS), which contains range of pre-built data types representing both simple data types for objects such as numbers and text values, and more complex data types for developing user interfaces, data systems, file management, graphics, and Internet services.
● Also supports side-by-side execution allowing application to run on single computer that has multiple versions of .NET Framework installed, without application being affected.
.NET Framework Class Library
● Collection of reusable classes, interfaces, and types that integrate with CLR providing standard functionality such as:
– string management, input/output, security management,
– network communications, thread management,
– user interface design features,
– database access and manipulation.
● 3 main components:
– Windows Forms to support user interface development.
– ASP.NET to support development of Web applications and Web services. Reengineered version of ASP to improve performance and scalability.
– ADO.NET to help applications connect to databases.
ADO.NET
● Designed to address 3 main weaknesses with ADO:
o providing a disconnected data access model required for Web;
o providing compatibility with .NET Framework class library;
o providing extensive support for XML.
● Different from connected style of programming that existed in traditional 2-tier client-server architecture, where connection was held open for duration of program’s lifetime and no special handling of state was required.
● Also ADO data model is primarily relational and could not easily handle XML, whose data model is heterogeneous and hierarchical.
● Recognizing that ADO was a mature technology and widely used, ADO has been retained in the .NET Framework, accessible through the .NET COM interoperability services.
ADO.NET Object Model
● Two main layers:
o a connected layer (similar to ADO);
o a disconnected layer, the DataSet (providing a similar functionality to RDS).
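The difference between the connected and disconnected styles can be sketched outside .NET. The block below is only an analogy written with Python's standard `sqlite3` module, not ADO.NET code: the table layout, the sample rows, and all function names are assumptions made for illustration. The first reader streams rows while its connection is open; the second copies the rows into memory, closes the connection, and writes changes back in a later, separate connection.

```python
import os
import sqlite3
import tempfile


def create_source(path):
    """Create a small sample database (hypothetical data) at the given path."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE staff (staffNo TEXT PRIMARY KEY, salary INTEGER)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)",
                     [("SL21", 30000), ("SG37", 12000)])
    conn.commit()
    conn.close()


def total_salary_connected(path):
    """Connected style: rows are streamed forward-only while the connection is open."""
    conn = sqlite3.connect(path)
    try:
        total = 0
        for (salary,) in conn.execute("SELECT salary FROM staff"):
            total += salary
        return total
    finally:
        conn.close()


def load_snapshot(path):
    """Disconnected style: copy the rows into memory, then release the connection."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("SELECT staffNo, salary FROM staff").fetchall()
    finally:
        conn.close()
    return dict(rows)


def write_back(path, snapshot):
    """Reconnect briefly and push the locally edited copy back to the source."""
    conn = sqlite3.connect(path)
    try:
        conn.executemany("UPDATE staff SET salary = ? WHERE staffNo = ?",
                         [(salary, staff_no) for staff_no, salary in snapshot.items()])
        conn.commit()
    finally:
        conn.close()
```

A caller would edit the dictionary returned by `load_snapshot` while no connection exists and then call `write_back`. This sketch does no conflict detection, which a real disconnected layer has to handle.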
ADO.NET
● Main replacements for ADO Recordset are:
– DataAdapter, acts as bridge between vendor-dependent data source and vendor-neutral DataSet. While data source may be RDB, may also be an XML document.
– DataReader, provides connected, forward-only, read-only stream of data from data source. A DataReader can be used independently of a DataSet for increased performance.
– DataSet, provides disconnected copies of records from data source. DataSet stores records from one or more tables in memory without holding a connection to the data source but, unlike RDS, DataSet maintains information on relationships between tables and constraints.
● Several ways a DataSet can be used:
– user can create DataTable, DataRelation, and Constraint within DataSet and populate table with data programmatically.
– user can populate DataSet with data from existing relational data source using a DataAdapter.
– contents of DataSet can be loaded from an XML stream or document, which can be either data, XML schema information, or both.
● Also, a DataSet can be made persistent using XML (with or without a corresponding XML Schema).
Microsoft Web Services
● .NET Framework built on number of standards to promote interoperability with non-Microsoft solutions.
● For example, Visual Studio .NET automatically creates necessary XML and SOAP interfaces required to turn application into a Web service.
● In addition, .NET Framework provides set of classes that conform to all the underlying communication standards, such as SOAP, WSDL, and XML.
● Microsoft UDDI SDK enables developers to add UDDI functionality to development tools, installation programs, and any other software that needs to register or locate and bind remote Web services.
Microsoft Access and Web Page Generation
● Access provides wizards for automatically generating HTML/XML:
– Static pages: user can export data to HTML format.
– Dynamic pages using ASP: user can export data to an ‘asp’ file on Web server.
– Dynamic pages using HTX/IDC files: user can export data to HTX/IDC files on server.
– Dynamic pages using data access pages: data access pages are Web pages bound directly to data in the database. Can be used like Access forms, except pages are stored as external files.
– XML: data can be output as an XML document along with associated schema and an XSL file.
Oracle Internet Platform
● Comprises Oracle Application Server and Oracle DBMS.
● It is n-tier architecture based on industry standards such as:
– HTTP and HTML/XML for Web enablement.
– Java, J2EE, EJB, JDBC, and SQLJ for database connectivity, Java servlets, and JSP. Also supports JNDI and stored Java procedures.
– OMG’s CORBA technology.
– IIOP for object interoperability and RMI.
– Web services: SOAP, WSDL, UDDI, ebXML, WebDAV, LDAP.
Oracle Application Server (OracleAS)
● A reliable, scalable, secure, middle-tier application server designed to support eBusiness.
● Currently available in three versions:
– Java Edition: lightweight Web server with minimal application support;
– Standard Edition: for medium to large Web sites that handle large volume of transactions;
– Enterprise Edition: Standard Edition + extras.
Communication Services
● Handles all incoming requests received by OracleAS, some processed by Oracle HTTP Server and some by other areas of OracleAS.
● Oracle HTTP Server is extended version of Apache Server.
Oracle HTTP Server Modules (mods)
● Oracle has enhanced several of Apache mods, and has added Oracle-specific ones; e.g.:
– mod_oc4j, routes HTTP requests for J2EE to OracleAS Containers for J2EE (OC4J);
– mod_plsql, routes requests for stored procedures to database server;
– mod_fastcgi, enhanced version of CGI that runs programs in pre-spawned process;
– mod_oradav, provides support for WebDAV;
– mod_ossl, provides standard S-HTTP;
– mod_osso, enables transparent single sign-on.
OracleAS Containers for J2EE (OC4J)
● A fully compliant J2EE 1.3 server.
● Runs on J2SE and executes and manages J2EE application components such as:
– Servlets: Servlet container provided that manages execution of Web components and J2EE applications.
– JSPs: JSP translator provided to convert JSP files into Java source that container can then compile and execute as a servlet.
– EJBs: EJB container provided that manages execution of EJBs for J2EE applications. Container has configurable settings that customize the underlying support, such as security, transaction management, JNDI lookups, and remote connectivity. Container also manages EJB lifecycles, database connection resource pooling, data persistence, and access to J2EE APIs.
● OracleAS supports both JDBC and SQLJ database access mechanisms, and provides following drivers:
– Oracle JDBC drivers, for use with Oracle database. Have extensions to support Oracle-specific datatypes and to enhance their performance.
– J2EE Connectors, part of J2EE platform, provide a Java-based solution for connecting various application servers and EISs.
– DataDirect Connect Type 4 JDBC drivers, for connecting to non-Oracle databases.
Business Components for Java (BC4J)
● A Java and XML framework that enables development, deployment, and customization of multi-tier database applications from reusable business components.
● Application developers can use BC4J to author and test business logic in components that automatically integrate with databases, reuse business logic through SQL-based views, and access/update these views from servlets, JSP, and Java Swing clients.
● Applications can be deployed as either EJB Session Beans or CORBA objects on OracleAS.
Presentation Services
● These services deliver dynamic content to client browsers, supporting servlets, JSP, Perl/CGI scripts, PL/SQL pages, forms, and business intelligence.
– Oracle Forms Services, to run Oracle Forms over Internet;
– OracleJSP, an implementation of Sun’s JSP;
– Oracle PSP (PL/SQL Server Pages), analogous to JSP, but uses PL/SQL rather than Java for the server-side scripting.
– Perl Interpreter, a persistent Perl runtime embedded in Oracle HTTP Server.
Web Services and XML Support
● OracleAS provides facilities for developing, deploying, and managing Web services; e.g.:
– Web services can be developed using stateless and stateful Java classes, stateless session EJBs, and stateless PL/SQL stored procedures.
– Web Service HTML/XML Streams Processing Wizard assists developers in creating an EJB whose methods access and process HTML or XML streams.
– Web services can be integrated into both enterprise and wireless portals, other Web services, databases, legacy systems, and applications.
– OracleAS supports SOAP, WSDL, and UDDI.
TopLink
● A persistence framework that includes an object-relational mapping mechanism for storing Java objects and EJBs in a RDB.
● Provides solution to address complex differences between Java objects and RDBs and enables applications to store persistent Java objects in any RDB supported by a JDBC driver.
● Includes Mapping Workbench, a visual tool to map any object model to any relational schema.
Oracle Portal
● A portal is Web-based application that provides a common, integrated entry point for accessing dissimilar data types on a single Web page.
● A portal is divided into a number of portlets.
● Oracle Portal provides a number of tools to generate and customize portals and portlets.
Oracle Wireless
● Provides services and tools for delivering information and applications to mobile devices.
● Includes Multi-Channel Server (MCS) that supports development of applications that are accessible from multiple channels including wireless browsers, voice, and messaging.
● MCS automatically translates applications written in Oracle Wireless XML, XHTML Mobile Profile, or XHTML+XForms for any device and network.
● Also allows portal sites to be created that use Web pages, Java applications, and XML-based applications.
Business Intelligence
● Functions to track, extract, and analyze business intelligence to support strategic decision-making:
– Oracle Reports Services enable users to run Oracle Reports over Internet.
– Oracle Discoverer allows users to produce queries, reports, and analysis of information from databases, OLTP systems, and data warehouses using a Web browser.
– Oracle Clickstream provides services to capture and analyze aggregate information about Web site usage.
– Oracle Personalization enables users to track activity of specific user and personalize information for that user.
TOPIC: MOBILE DATABASES
Mobile Databases:
Database Replication:
The process of copying and maintaining database objects, such as relations, in multiple databases that make up a distributed database system.
Benefits of Database Replication
▪ Availability
▪ Reliability
▪ Performance
▪ Load reduction
▪ Disconnected computing
▪ Supports many users
▪ Supports advanced applications
Applications of Replication
▪ Replication supports a variety of applications that have very different requirements.
▪ Some applications are adequately supported with only limited synchronization between the copies of the database and the corporate database system, while other applications demand continuous synchronization between all copies of the database.
▪ Financial applications involving the management of shares require data on multiple servers to be synchronized in a continuous, nearly instantaneous manner to ensure that the service provided is available and equivalent at all times.
▪ An important application of database replication is mobile databases.
● Introduction to Mobile Databases
We are currently witnessing increasing demands on mobile computing to provide the types of support required by a growing number of mobile workers. Such individuals need to work as if in the office, but in reality they are working from remote locations including homes, clients’ premises, or simply while en route to remote locations. The ‘office’ may accompany a remote worker in the form of a laptop, PDA (personal digital assistant), or other Internet access device. With the rapid expansion of cellular,
wireless, and satellite communications, it will soon be possible for mobile users to access any data, anywhere, at any time. However, business etiquette, practicalities, security, and costs may still limit communication such that it is not possible to establish online connections for as long as users want, whenever they want. Mobile databases offer a solution for some of these restrictions.
A database that is portable and physically separate from the corporate database server, but is capable of communicating with that server from remote sites allowing the sharing of corporate data, is called a mobile database.
The components of a mobile database environment include:
● corporate database server and DBMS that manages and stores the corporate data and provides corporate applications;
● remote database and DBMS that manages and stores the mobile data and provides mobile applications;
● mobile database platform that includes laptop, PDA, or other Internet access devices;
● two-way communication links between the corporate and mobile DBMS.
Mobile DBMSs
● All the major DBMS vendors now offer a mobile DBMS.
● Most vendors promote their mobile DBMS as being capable of communicating with a range of major relational DBMSs and in providing database services that require limited computing resources to match those currently provided by mobile devices.
● The additional functionality required of mobile DBMSs includes the ability to:
o communicate with the centralized database server through modes such as wireless or Internet access;
o replicate data on the centralized database server and mobile device;
o synchronize data on the centralized database server and mobile device;
o capture data from various sources such as the Internet;
o manage data on the mobile device;
o analyze data on a mobile device;
o create customized mobile applications.
Applications:
● This feature is especially useful to geographically dispersed organizations. Typical examples might include electronic valets, news reporting, brokerage services, and automated salesforces.
Disadvantage:
● There are a number of hardware and software problems that must be resolved before the capabilities of mobile computing can be fully utilized.
● Some of the software problems (which may involve data management, transaction management, and database recovery) have their origins in distributed database systems.
● In mobile computing, however, these problems are more difficult, mainly because of the limited and intermittent connectivity afforded by wireless communications, the limited life of the power supply (battery) of mobile units, and the changing topology of the network.
● In addition, mobile computing introduces new architectural possibilities and challenges.
Mobile Computing Architecture
● The general architecture of a mobile platform is a distributed architecture where a number of computers, generally referred to as Fixed Hosts and Base Stations, are interconnected through a high-speed wired network. Fixed hosts are general purpose computers that are not typically equipped to manage mobile units but can be configured to do so.
● Base stations function as gateways to the fixed network for the Mobile Units.
● They are equipped with wireless interfaces and offer network access services of which mobile units are clients.
Wireless Communications.
● The wireless medium on which mobile units and base stations communicate has bandwidth significantly lower than that of a wired network.
● The current generation of wireless technology has data rates that range from the tens to hundreds of kilobits per second (2G cellular telephony) to tens of megabits per second (wireless Ethernet, popularly known as WiFi).
● Modern (wired) Ethernet, by comparison, provides data rates on the order of hundreds of megabits per second.
● Besides data rates, characteristics like range, interference, locality of access, and support for packet switching also distinguish wireless connectivity options.
● Some wireless access options allow seamless roaming throughout a geographical region (e.g., cellular networks), whereas WiFi networks are localized around a base station.
● Some wireless networks, such as WiFi and Bluetooth, use unlicensed areas of the frequency spectrum, which may cause interference with other appliances, such as cordless telephones.
● Finally, modern wireless networks can transfer data in units called packets, which are commonly used in wired networks in order to conserve bandwidth.
● Wireless applications must consider all these characteristics when choosing a communication option.
● For example, physical objects block infrared frequencies. While inconvenient for some applications, such blockage allows for secure wireless communications within a closed room.
Client/Network Relationships.
● Mobile units can move freely in a geographic mobility domain, an area that is circumscribed by wireless network coverage.
● To manage the mobility of units, the entire geographic mobility domain is divided into one or more smaller domains, called cells, each of which is supported by at least one base station.
● The mobile discipline requires that the movement of mobile units be unrestricted throughout the cells of a geographic mobility domain, while maintaining information access contiguity; that is, movement, especially intercell movement, does not negatively affect the data retrieval process.
Characteristics of Mobile Environments
● The characteristics of mobile computing include high communication latency, intermittent wireless connectivity, limited battery life, and changing client location. Latency is caused by the processes unique to the wireless medium, such as coding data for wireless transfer, and tracking and filtering wireless signals at the receiver.
● Battery life is directly related to battery size, and indirectly related to the mobile device's capabilities.
● Intermittent connectivity can be intentional or unintentional.
● Unintentional disconnections happen in areas wireless signals cannot reach, e.g., elevator shafts or subway tunnels.
● Intentional disconnections occur by user intent, e.g., during an airplane takeoff, or when the mobile device is powered down.
● Clients are expected to move, which alters the network topology and may cause their data requirements to change.
● All of these characteristics impact data management, and robust mobile applications must consider them in their design.
Data Management Issues
From a data management standpoint, mobile computing may be considered a variation of distributed computing.
Mobile databases can be distributed under two possible scenarios:
1. The entire database is distributed mainly among the wired components, possibly with full or partial replication. A base station or fixed host manages its own database with a DBMS-like functionality, with additional functionality for locating mobile units and additional query and transaction management features to meet the requirements of mobile environments.
2. The database is distributed among wired and wireless components. Data management responsibility is shared among base stations or fixed hosts and mobile units. Hence, the distributed data management issues can be applied to mobile databases with the following additional considerations and variations:
• Data distribution and replication: Data is unevenly distributed among the base stations and mobile units. The consistency constraints compound the problem of cache management. Caches attempt to provide the most frequently accessed and updated data to mobile units that process their own transactions and may be disconnected over long periods.
• Transaction models: Issues of fault tolerance and correctness of transactions are aggravated in the mobile environment. A mobile transaction is executed sequentially through several base stations and possibly on multiple data sets depending upon the movement of the mobile unit. Central coordination of transaction execution is lacking. Moreover, a mobile transaction is expected to be long-lived because of disconnection in mobile
units. Hence, traditional ACID properties of transactions may need to be modified and new transaction models must be defined.
• Query processing: Awareness of where data is located is important and affects the cost benefit analysis of query processing. Query optimization is more complicated because of mobility and rapid resource changes of mobile units. The query response needs to be returned to mobile units that may be in transit or may cross cell boundaries yet must receive complete and correct query results.
• Recovery and fault tolerance: The mobile database environment must deal with site, media, transaction, and communication failures. Site failure of a mobile unit is frequent due to limited battery power. A voluntary shutdown of a mobile unit should not be treated as a failure. Transaction failures are routine during handoff when a mobile unit crosses cells. The transaction manager should be able to deal with such frequent failures.
• Mobile database design: The global name resolution problem for handling queries is compounded because of mobility and frequent shutdown. Mobile database design must consider many issues of metadata management, for example, the constant updating of location information.
• Location-based service: As clients move, location-dependent cache information may become stale. Eviction techniques are important in this case. Furthermore, frequently updating location dependent queries, then applying these (spatial) queries in order to refresh the cache poses a problem.
• Division of labor: Certain characteristics of the mobile environment force a change in the division of labor in query processing. In some cases, the client must function independent of the server.
• Security: Mobile data is less secure than that which is left at the fixed location. Proper techniques for managing and authorizing access to critical data become more important in this environment. Data is also more volatile, and techniques must be able to compensate for its loss.
Application: Intermittently Synchronized Databases
● One mobile computing scenario is becoming increasingly commonplace as people conduct their work away from their offices and homes and perform a wide range of activities and functions: all kinds of sales, particularly in pharmaceuticals, consumer goods, and industrial parts; law enforcement; insurance and financial consulting and planning; real estate or property management activities; courier and transportation services, and so on.
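The notes do not say how an intermittently synchronized client reconciles its work with the corporate server, so the block below is only a minimal sketch under stated assumptions. The client queues updates while disconnected and pushes them when a connection is available. The server accepts an update only if the client last saw the current row version, a simple optimistic check chosen here for illustration. The `Server` and `MobileClient` classes and all method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Corporate database reduced to a map: key -> (value, version)."""
    rows: dict = field(default_factory=dict)

    def apply(self, key, value, base_version):
        """Accept the update only if the client based it on the current version."""
        _, current = self.rows.get(key, (None, 0))
        if base_version != current:
            return False  # conflict: the row changed since the client last synchronized
        self.rows[key] = (value, current + 1)
        return True


@dataclass
class MobileClient:
    cache: dict = field(default_factory=dict)    # local copy of server rows
    pending: dict = field(default_factory=dict)  # key -> (value, base version), queued offline

    def update_offline(self, key, value):
        """Change the local copy and remember the update for the next connection."""
        _, version = self.cache.get(key, (None, 0))
        self.cache[key] = (value, version)
        self.pending[key] = (value, version)

    def synchronize(self, server):
        """Push queued updates, refresh the cache, and return the keys that were rejected."""
        conflicts = [key for key, (value, version) in self.pending.items()
                     if not server.apply(key, value, version)]
        self.pending.clear()
        self.cache = dict(server.rows)
        return conflicts
```

If two clients change the same row while disconnected, the first to synchronize wins. The second receives the key in its conflict list and has to decide how to resolve it. Real products offer richer conflict-resolution policies than this sketch.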
TOPIC: XML and Web Databases
Semistructured data: data that may be irregular or incomplete and have a structure that may change rapidly or unpredictably.
o Semistructured data is data that has some structure, but structure may not be rigid, regular, or complete.
o Generally, data does not conform to fixed schema (sometimes use terms schema-less or self-describing).
● Information normally associated with schema is contained within data itself.
● Some forms of semistructured data have no separate schema, in others it exists but only places loose constraints on data.
● Unfortunately, relational, object-oriented, and object-relational DBMSs do not handle data of this nature particularly well.
● Has gained importance recently for various reasons:
o may be desirable to treat Web sources like a database, but cannot constrain these sources with a schema;
o may be desirable to have a flexible format for data exchange between disparate databases;
o emergence of XML as standard for data representation and exchange on the Web, and similarity between XML documents and semistructured data.
XML (eXtensible Markup Language)
A meta-language (a language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.
● Most documents on Web currently stored and transmitted in HTML.
● One strength of HTML is its simplicity. Simplicity may also be one of its weaknesses, with users wanting tags to simplify some tasks and make HTML documents more attractive and dynamic.
● To satisfy this demand, vendors introduced some browser-specific HTML tags, making it difficult to develop sophisticated, widely viewable Web documents.
● W3C has produced XML, which could preserve general application independence that makes HTML portable and powerful.
● XML is a restricted version of SGML, designed especially for Web documents.
● SGML allows document to be logically separated into two: one that defines the structure of the document (DTD), other containing the text itself.
● By giving documents a separately defined structure, and by giving authors ability to define custom structures, SGML provides extremely powerful document management system.
● However, SGML has not been widely adopted due to its inherent complexity.
● XML attempts to provide a similar function to SGML, but is less complex and, at same time, network-aware.
● XML retains key SGML advantages of extensibility, structure, and validation.
● Since XML is a restricted form of SGML, any fully compliant SGML system will be able to read XML documents (although the opposite is not true).
● XML is not intended as a replacement for SGML or HTML.
Advantages of XML
● Simplicity
● Open standard and platform/vendor-independent
● Extensibility
● Reuse
● Separation of content and presentation
● Improved load balancing
● Support for integration of data from multiple sources
● Ability to describe data from a wide variety of applications
● More advanced search engines
● New opportunities.
OVERVIEW OF XML
XML declaration
● XML documents begin with an optional XML declaration, which states the XML version, the encoding system used (UTF-8 for Unicode), and whether or not there are external markup declarations referenced.
Elements
● Elements, or tags, are the most common form of markup.
● The first element must be a root element, which can contain other (sub)elements.
● An XML document must have one root element.
● An element begins with a start-tag (for example, <STAFF>) and ends with an end-tag (for example, </STAFF>).
● XML elements are case sensitive, so an element <STAFF> would be different from an element <staff>.
● An element can be empty, in which case it can be abbreviated to <EMPTYELEMENT/>.
● Elements must be properly nested, as in the following fragment.
Example
<STAFF>
  <NAME>
    <FNAME>John</FNAME>
    <LNAME>White</LNAME>
  </NAME>
</STAFF>
The element NAME is completely nested within the element STAFF and the elements FNAME and LNAME are nested within element NAME.
Attributes
● Attributes are name–value pairs that contain descriptive information about an element.
● The attribute is placed inside the start-tag after the corresponding element name with the attribute value enclosed in quotes.
Example
<STAFF branchNo = "B005">
● We could equally well have represented the branch as a subelement of STAFF.
● If we had to represent the member of staff’s sex, we could use an attribute of an empty element, for example:
<SEX gender = "M"/>
● A given attribute may only occur once within a tag, while subelements with the same tag may be repeated.
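The element and attribute rules can be tried out with a small script. The sketch below uses Python's standard `xml.etree.ElementTree` and assembles the STAFF fragments from the notes into one document. Placing the SEX element inside STAFF is an assumption made for illustration, and `describe_staff` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# One document combining the fragments above (the layout is assumed for illustration).
STAFF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<STAFF branchNo="B005">
  <NAME>
    <FNAME>John</FNAME>
    <LNAME>White</LNAME>
  </NAME>
  <SEX gender="M"/>
</STAFF>"""


def describe_staff(xml_text):
    """Parse the document and pull out nested elements and attributes."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "root": root.tag,                                  # the single root element
        "branchNo": root.get("branchNo"),                  # attribute on the start-tag
        "first": root.findtext("NAME/FNAME"),              # FNAME nested inside NAME
        "last": root.findtext("NAME/LNAME"),
        "gender": root.find("SEX").get("gender"),          # attribute of an empty element
        "children_in_order": [child.tag for child in root],
    }
```

Because element names are case sensitive, looking up `name` instead of `NAME` on the parsed root finds nothing.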
● XML declaration: optional at start of XML document.
● Entity references: serve various purposes, such as shortcuts to often repeated text or to distinguish reserved characters from content.
● Comments: enclosed in <!-- and --> tags.
● CDATA sections: instruct XML processor to ignore markup characters and pass enclosed text directly to application.
● Processing instructions: can also be used to provide information to application.
XML – Ordering
● Semistructured data model described assumes collections are unordered.
● In XML, elements are ordered.
● In contrast, in XML attributes are unordered.
Document Type Definitions (DTDs)
● Defines the valid syntax of an XML document.
● Lists element names that can occur in document, which elements can appear in combination with which other ones, how elements can be nested, what attributes are available for each element type, and so on.
● Term vocabulary sometimes used to refer to the elements used in a particular application.
● Grammar specified using EBNF, not XML.
● Although optional, DTD is recommended for document conformity.
DTDs – Element Type Declarations
● Identify the rules for elements that can occur in the XML document. Options for repetition are:
– * indicates zero or more occurrences for an element;
– + indicates one or more occurrences for an element;
– ? indicates either zero occurrences or exactly one occurrence for an element.
– Name with no qualifying punctuation must occur exactly once.
● Commas between element names indicate they must occur in succession; a vertical bar (|) between names indicates that any one of them may occur.
DTDs – Attribute List Declarations
● Identify which elements may have attributes, what attributes they may have, what values attributes may hold, plus optional defaults. Some types:
● CDATA: character data, containing any text.
● ID: used to identify individual elements in document (an ID value must be a valid XML name).
● IDREF/IDREFS: must correspond to value of ID attribute(s) for some element in document.
● List of names: values that attribute can hold (enumerated type).
DTDs – Element Identity, IDs, IDREFs
● ID allows unique key to be associated with an element.
● IDREF allows an element to refer to another element with the designated key, and attribute type IDREFS allows an element to refer to multiple elements.
● To loosely model relationship Branch Has Staff:
– <!ATTLIST STAFF staffNo ID #REQUIRED>
– <!ATTLIST BRANCH staff IDREFS #IMPLIED>
DTDs – Document Validity
● Two levels of document processing: well-formed and valid.
● Non-validating processor ensures an XML document is well-formed before passing information on to application.
● XML document that conforms to structural and notational rules of XML is considered well-formed; e.g.:
– if present, the XML declaration (for example, <?xml version="1.0"?>) must come first in the document;
– all elements must be within one root element;
– elements must be nested in a tree structure without any overlap;
● Validating processor will not only check that an XML document is well-formed but that it also conforms to a DTD, in which case XML document is considered valid.
DOM and SAX
● XML APIs generally fall into two categories: tree-based and event-based.
● DOM (Document Object Model) is tree-based API that provides object-oriented view of data.
● API was created by W3C and describes a set of platform- and language-neutral interfaces that can represent any well-formed XML/HTML document.
● Builds in-memory representation of document and provides classes and methods to allow an application to navigate and process the tree.
Representation of Document as Tree-Structure
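As a rough illustration of well-formedness checking and the tree-based DOM view, the sketch below uses Python's standard `xml.dom.minidom`. This is a non-validating parser, so it checks only that a document is well-formed, never that it conforms to a DTD. The helper names `outline` and `is_well_formed` are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def outline(node, depth=0):
    """Walk the DOM tree depth-first and return one indented line per element."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:  # text nodes have no children, so recursion stops there
        lines.extend(outline(child, depth))
    return lines


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False
```

Calling `outline` on a parsed STAFF document lists STAFF, NAME, FNAME, and LNAME with increasing indentation, which mirrors the tree the DOM builds in memory. Overlapping tags or a second root element make `is_well_formed` return False.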
SAX (Simple API for XML)
● An event-based, serial-access API that uses callbacks to report
parsing events to application.
● For example, there are events for start and end elements.
Application handles these events through customized event
handlers.
● Unlike tree-based APIs, event-based APIs do not build an
in-memory tree representation of the XML document.
● API product of collaboration on XML-DEV mailing list, rather
than product of W3C.
Namespaces
● Allows element names and relationships in XML documents to be
qualified to avoid name collisions for elements that have same
name but defined in different vocabularies.
● Allows tags from multiple namespaces to be mixed - essential if
data comes from multiple sources.
● For uniqueness, elements and attributes given globally unique
names using URI reference.
<STAFFLIST xmlns="http://www.dreamhome.co.uk/branch5/"
xmlns:hq="http://www.dreamhome.co.uk/HQ/">
<STAFF branchNo="B005">
<STAFFNO>SL21</STAFFNO>
…
<hq:SALARY>30000</hq:SALARY>
</STAFF>
</STAFFLIST>
XSL (eXtensible Stylesheet Language)
● In HTML, default styling is built into browsers as tag set for
HTML is predefined and fixed.
● Cascading Stylesheet Specification (CSS) provides alternative
rendering for tags. Can also be used to render XML in a browser
but cannot make structural alterations to a document.
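To make the tree-based versus event-based distinction in the DOM and SAX notes concrete, here is a minimal sketch using Python's standard library (xml.dom.minidom and xml.sax). The sample document is modelled on the STAFFLIST namespace fragment in these notes with the elided part left out; the helper names salaries_with_dom, salaries_with_sax and SalaryHandler are assumptions made for illustration only.

```python
import io
import xml.dom.minidom
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Sample document modelled on the STAFFLIST fragment in these notes.
XML_TEXT = """<STAFFLIST xmlns="http://www.dreamhome.co.uk/branch5/"
           xmlns:hq="http://www.dreamhome.co.uk/HQ/">
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <hq:SALARY>30000</hq:SALARY>
  </STAFF>
</STAFFLIST>"""

HQ_NS = "http://www.dreamhome.co.uk/HQ/"


def salaries_with_dom(text):
    """Tree-based: build the whole document in memory, then navigate it."""
    doc = xml.dom.minidom.parseString(text)
    nodes = doc.getElementsByTagNameNS(HQ_NS, "SALARY")
    return [node.firstChild.data for node in nodes]


class SalaryHandler(ContentHandler):
    """Event-based: react to start/characters/end callbacks as they arrive."""

    def __init__(self):
        super().__init__()
        self.salaries = []
        self._inside = False
        self._buffer = []

    def startElementNS(self, name, qname, attrs):
        # In namespace mode, name is a (namespace URI, local name) pair.
        if name == (HQ_NS, "SALARY"):
            self._inside = True
            self._buffer = []

    def characters(self, content):
        if self._inside:
            self._buffer.append(content)

    def endElementNS(self, name, qname):
        if name == (HQ_NS, "SALARY"):
            self.salaries.append("".join(self._buffer))
            self._inside = False


def salaries_with_sax(text):
    """No tree is built; only the handler's own state is kept."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = SalaryHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return handler.salaries


print(salaries_with_dom(XML_TEXT))
print(salaries_with_sax(XML_TEXT))
```

Both functions select SALARY by its namespace URI rather than by the hq prefix, which is the point of qualifying names: the prefix is only a local abbreviation for the URI.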
● XSL created to define how XML data is rendered and to define
how one XML document can be transformed into another
document.
XSLT (XSL Transformations)
● A subset of XSL, XSLT is a language in both markup and
programming sense, providing a mechanism to transform XML
structure into either another XML structure, HTML, or any number
of other text-based formats (such as SQL).
● XSLT's main ability is to change the underlying structures rather
than simply the media representations of those structures, as with
CSS.
● XSLT is important because it provides a mechanism for
dynamically changing the view of a document and for filtering
data.
● Also robust enough to encode business rules and it can generate
graphics (not just documents) from data.
● Can even handle communicating with servers (scripting modules
can be integrated into XSLT) and can generate the appropriate
messages within body of XSLT itself.
XPath
● Declarative query language for XML that provides simple syntax
for addressing parts of an XML document.
● Designed for use with XSLT (for pattern matching) and XPointer
(for addressing).
● With XPath, collections of elements can be retrieved by specifying
a directory-like path, with zero or more conditions placed on the
path.
● Uses a compact, string-based syntax, rather than a structural
XML-element based syntax, allowing XPath expressions to be
used both in XML attributes and in URIs.
XPointer
● Provides access to values of attributes or content of elements
anywhere within an XML document.
● Basically an XPath expression occurring within a URI.
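As an illustration of XPath's directory-like paths with conditions, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree (not a full XPath engine). The sample data is made up for the example, apart from SL21 and John White, which appear elsewhere in these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of the STAFFLIST examples.
STAFF_XML = """<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
    <SALARY>30000</SALARY>
  </STAFF>
  <STAFF branchNo="B003">
    <STAFFNO>SG37</STAFFNO>
    <NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME>
    <SALARY>12000</SALARY>
  </STAFF>
</STAFFLIST>"""

root = ET.fromstring(STAFF_XML)  # root is the STAFFLIST element

# Plain path: every STAFFNO that is a child of a STAFF child of the root.
all_staff_numbers = [e.text for e in root.findall("./STAFF/STAFFNO")]

# Path with a condition on an attribute, then a descendant step (//).
b005_surnames = [
    e.text for e in root.findall("./STAFF[@branchNo='B005']//LNAME")
]

# Positional condition: the first STAFF element.
first_staff_number = root.find("./STAFF[1]/STAFFNO").text

print(all_staff_numbers, b005_surnames, first_staff_number)
```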
● Among other things, with XPointer can link to sections of text,
select particular elements or attributes, and navigate through
elements.
● Can also select data contained within more than one set of nodes,
which cannot be done with XPath.
XLink
● Allows elements to be inserted into XML documents to create and
describe links between resources.
● Uses XML syntax to create structures that can describe links
similar to simple unidirectional hyperlinks of HTML as well as
more sophisticated links.
● Two types of XLink: simple and extended.
● Simple link connects a source to a destination resource; an
extended link connects any number of resources.
XHTML (eXtensible HTML) 1.0
● Reformulation of HTML 4.01 in XML 1.0 and is intended to be
next generation of HTML.
● Basically a stricter and cleaner version of HTML; e.g.:
– tags and attributes must be in lowercase;
– all XHTML elements must have an end-tag;
– attribute values must be quoted and minimization is not
allowed;
– ID attribute replaces the name attribute;
– documents must conform to XML rules.
Simple Object Access Protocol (SOAP)
● An XML-based messaging protocol that defines a set of rules for
structuring messages.
● Protocol can be used for simple one-way messaging but also useful
for performing RPC-style request-response dialogues.
● Not tied to any particular operating system or programming
language nor any particular transport protocol, although HTTP is
popular.
● Important advantage of SOAP is that most firewalls allow HTTP to
pass right through, facilitating point-to-point SOAP data
exchanges.
● SOAP message is an XML document containing:
o A required Envelope element that identifies the
XML document as a SOAP message.
o An optional Header element that contains
application specific information such as
authentication or payment information.
o A required Body element that contains
call and response information.
o An optional Fault element that provides
information about errors that occurred while
processing message.
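The Envelope / Header / Body structure of a SOAP message can be sketched with Python's xml.etree.ElementTree. This is a minimal sketch, assuming the SOAP 1.1 envelope namespace; the application namespace, the GetSalary operation, the AuthToken header and the function name build_request are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "http://www.example.org/staff"  # hypothetical application namespace


def build_request(staff_no, token=None):
    """Build a request message: Envelope, optional Header, required Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # Optional Header: application-specific information (here, a token).
    if token is not None:
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        ET.SubElement(header, f"{{{APP_NS}}}AuthToken").text = token

    # Required Body: the call information (a hypothetical GetSalary call).
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetSalary")
    ET.SubElement(call, f"{{{APP_NS}}}staffNo").text = staff_no

    return ET.tostring(envelope, encoding="unicode")


message = build_request("SL21", token="secret")
print(message)
```

The resulting string would typically be sent as the body of an HTTP POST, but any transport could carry it.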
[Figure: Example SOAP Message]
Web Services Description Language (WSDL)
● XML-based protocol for defining a Web service.
● Specifies location of a service, operations service exposes, SOAP
messages involved, and communications protocol used to talk to
service.
● Notation that a WSDL file uses to describe message formats is
typically based on XML Schema.
● Published WSDL descriptions can be used to obtain information
about available Web services.
● WSDL 2.0 describes a Web service in two parts: an abstract part
and a concrete part.
● At abstract level, WSDL describes a Web service in terms of the
messages it sends and receives; messages are described
independent of a specific wire format using a type system,
typically XML Schema.
● At concrete level, a binding specifies transport and wire format
details for one or more interfaces. An endpoint associates a
network address with a binding and a service groups endpoints that
implement a common interface.
[Figure: WSDL Concepts]
Universal Discovery, Description and Integration (UDDI)
● Defines SOAP-based Web service for locating WSDL-formatted
protocol descriptions of Web services.
● Essentially describes online electronic registry that serves as
electronic Yellow Pages, providing information structure where
various businesses register themselves and services they offer
through their WSDL definitions.
● Based on industry standards including HTTP, XML, XML
Schema, SOAP, and WSDL.
● Two types of UDDI registries: public and private.
[Figure: WSDL and UDDI]
XML Schema
● DTDs have number of limitations:
– it is written in a different (non-XML) syntax;
– it has no support for namespaces;
– it only offers extremely limited data typing.
● XML Schema is more comprehensive method of defining content
model of an XML document.
● Additional expressiveness will allow Web applications to exchange
XML data more robustly without relying on ad hoc validation
tools.
● XML schema is the definition (both in terms of its organization
and its data types) of a specific XML structure.
● XML Schema language specifies how each type of element in
schema is defined and the element's data type.
● Schema is an XML document, and so can be edited and processed
by same tools that read the XML it describes.
XML Schema – Simple Types
● Elements that do not contain other elements or attributes are of
type simpleType.
<xsd:element name="STAFFNO" type="xsd:string"/>
<xsd:element name="DOB" type="xsd:date"/>
<xsd:element name="SALARY" type="xsd:decimal"/>
● Attributes must be defined last:
<xsd:attribute name="branchNo" type="xsd:string"/>
XML Schema – Complex Types
● Elements that contain other elements are of type complexType.
● List of children of complex type are described by sequence
element.
<xsd:element name="STAFFLIST">
<xsd:complexType>
<xsd:sequence>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Cardinality
● Cardinality of an element can be represented using attributes
minOccurs and maxOccurs.
● To represent an optional element, set minOccurs to 0; to indicate
there is no maximum number of occurrences, set maxOccurs to
"unbounded".
<xsd:element name="DOB" type="xsd:date"
minOccurs="0"/>
<xsd:element name="NOK" type="xsd:string"
minOccurs="0" maxOccurs="3"/>
References
● Can use references to elements and attribute definitions.
<xsd:element name="STAFFNO" type="xsd:string"/>
….
<xsd:element ref="STAFFNO"/>
● If there are many references to STAFFNO, use of references will
place definition in one place and improve the maintainability of the
schema.
Defining New Types
● Can also define new data types to create elements and attributes.
<xsd:simpleType name="STAFFNOTYPE">
<xsd:restriction base="xsd:string">
<xsd:maxLength value="5"/>
</xsd:restriction>
</xsd:simpleType>
● New type has been defined as a restriction of string (to have
maximum length of 5 characters).
Groups
● Can define both groups of elements and groups of attributes.
Group is not a data type but acts as a container holding a set of
elements or attributes.
<xsd:group name="StaffType">
<xsd:sequence>
<xsd:element name="StaffNo" type="StaffNoType"/>
<xsd:element name="Position" type="PositionType"/>
<xsd:element name="DOB" type="xsd:date"/>
<xsd:element name="Salary" type="xsd:decimal"/>
</xsd:sequence>
</xsd:group>
Constraints
● XML Schema provides XPath-based features for specifying
uniqueness constraints and corresponding reference constraints that
will hold within a certain scope.
<xsd:unique name="NAMEDOBUNIQUE">
<xsd:selector xpath="STAFF"/>
<xsd:field xpath="NAME/LNAME"/>
<xsd:field xpath="DOB"/>
</xsd:unique>
Key Constraints
● Similar to uniqueness constraint except the value has to be
non-null. Also allows the key to be referenced.
<xsd:key name="STAFFNOISKEY">
<xsd:selector xpath="STAFF"/>
<xsd:field xpath="STAFFNO"/>
</xsd:key>
Resource Description Framework (RDF)
● Even XML Schema does not provide the support for semantic
interoperability required.
● For example, when two applications exchange information using
XML, both agree on use and intended meaning of the document
structure.
● Must first build a model of the domain of interest, to clarify what
kind of data is to be sent from first application to second.
● However, as XML Schema just describes a grammar, there are
many different ways to encode a specific domain model into an
XML Schema, thereby losing the direct connection from the
domain model to the Schema.
● Problem compounded if third application wishes to exchange
information with other two.
● Not sufficient to map one XML Schema to another, since the task
is not to map one grammar to another grammar, but to map objects
and relations from one domain of interest to another.
● Three steps required:
o reengineer original domain models from
XML Schema;
o define mappings between the objects in
the domain models;
o define translation mechanisms for the
XML documents, for example using
XSLT.
● RDF is infrastructure that enables encoding, exchange, and reuse
of structured meta-data.
● This infrastructure enables meta-data interoperability through
design of mechanisms that support common conventions of
semantics, syntax, and structure.
● RDF does not stipulate semantics for each domain of interest, but
instead provides ability for these domains to define meta-data
elements as required.
● RDF uses XML as a common syntax for exchange and processing
of meta-data.
RDF Data Model
● Basic RDF data model consists of three objects:
o Resource: anything that can have a URI; e.g., a Web page,
a number of Web pages, or a part of a Web page, such as
an XML element.
o Property: a specific attribute used to describe a resource;
e.g., attribute Author may be used to describe who
produced a particular XML document.
o Statement: consists of combination of a resource, a
property, and a value.
● Components known as "subject", "predicate", and "object" of an
RDF statement.
Example statement:
"Author of http://www.dh.co.uk/staff_list.xml is John White"
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:s="http://www.dh.co.uk/schema/">
<rdf:Description about="http://www.dh.co.uk/staff_list.xml">
<s:Author>John White</s:Author>
</rdf:Description>
</rdf:RDF>
● To store descriptive information about the author, model author as
a resource.
RDF Schema
● Specifies information about classes in a schema including
properties (attributes) and relationships between resources
(classes).
● RDF Schema mechanism provides a basic type system for use in
RDF models, analogous to XML Schema.
● Defines resources and properties such as rdfs:Class and
rdfs:subClassOf that are used in specifying application-specific
schemas.
● Also provides a facility for specifying a small number of
constraints such as cardinality.
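The RDF example statement ("Author of http://www.dh.co.uk/staff_list.xml is John White") can be read as a (subject, predicate, object) triple. The sketch below extracts such triples from the RDF/XML fragment in these notes using Python's xml.etree.ElementTree. It is a simplified illustration rather than a general RDF parser: it assumes every statement is written as an rdf:Description element with an unqualified about attribute, as in the fragment, and the function name triples is made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://www.dh.co.uk/schema/">
  <rdf:Description about="http://www.dh.co.uk/staff_list.xml">
    <s:Author>John White</s:Author>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_text):
    """Return (subject, predicate, object) tuples, one per property element."""
    root = ET.fromstring(rdf_text)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get("about")  # the resource being described
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts to obtain the property's full URI.
            namespace, _, local = prop.tag[1:].partition("}")
            result.append((subject, namespace + local, prop.text))
    return result


for statement in triples(RDF_XML):
    print(statement)
```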
XML Query Languages
● Data extraction, transformation, and integration are
well-understood database issues that rely on a query language.
● SQL and OQL do not apply directly to XML because of the
irregularity of XML data.
● However, XML data is similar to semistructured data. There are
many semistructured query languages that can query XML
documents, including XML-QL, UnQL, and XQL.
● All have notion of a path expression for navigating nested structure
of XML.
Example XML-QL
Find surnames of staff who earn more than £30,000.
WHERE <STAFF>
<SALARY> $S </SALARY>
<NAME><FNAME> $F </FNAME> <LNAME> $L
</LNAME></NAME>
</STAFF> IN "http://www.dh.co.uk/staff.xml",
$S > 30000
CONSTRUCT <LNAME> $L </LNAME>
XML Query Working Group
● W3C formed an XML Query Working Group in 1999 to produce a
data model for XML documents, set of query operators on this
model, and query language based on query operators.
● Queries operate on single documents or fixed collections of
documents, and can select entire documents or subtrees of
documents that match conditions based on document
content/structure.
● Queries can also construct new documents based on what has been
selected.
● Ultimately, collections of XML documents will be accessed like
databases.
● Working Group has produced several documents:
o XML Query (XQuery) Requirements;
o XQuery 1.0 and XPath 2.0 Data Model;
o XQuery 1.0 and XPath 2.0 Formal Semantics;
o XQuery 1.0 – A Query Language for XML;
o XQuery 1.0 and XPath 2.0 Functions and
Operators;
o XSLT 2.0 and XQuery 1.0 Serialization.
XML Query Requirements
● Specifies goals, usage scenarios, and requirements for XQuery
Data Model and query language. For example:
– language must be declarative and must be defined
independently of any protocols with which it is used;
– queries should be possible whether or not a schema exists;
– language must support both universal and existential
quantifiers on collections and it must support aggregation,
sorting, nulls, and be able to traverse inter- and
intra-document references.
XQuery
● XQuery derived from XML query language called Quilt, which has
borrowed features from XPath, XML-QL, SQL, OQL, Lorel,
XQL, and YATL.
● Like OQL, XQuery is a functional language in which a query is
represented as an expression.
● XQuery supports several kinds of expression, which can be nested
(supporting notion of a subquery).
XQuery – Path Expressions
● Uses syntax of XPath.
● In XQuery, result of a path expression is ordered list of nodes,
including their descendant nodes, ordered according to their
position in original hierarchy, top-down, left-to-right order.
● Result of path expression may contain duplicate values.
● Each step in path expression represents movement through
document in particular direction, and each step can eliminate nodes
by applying one or more predicates.
● Result of each step is list of nodes that serves as starting point for
next step.
● Path expression can begin with an expression that identifies a
specific node, such as function doc(string), which returns root node
of named document.
● Query can also contain path expression beginning with "/" or "//",
which represents an implicit root node determined by the
environment in which query is executed.
Example – XQuery Path Expressions
Find staff number of first member of staff in our XML document.
doc("staff_list.xml")/STAFFLIST/STAFF[1]//STAFFNO
● Four steps:
– first opens staff_list.xml and returns its document node;
– second uses /STAFFLIST to select STAFFLIST element
at top;
– third locates first STAFF element that is child of root
element;
– fourth finds STAFFNO elements occurring anywhere
within this STAFF element.
● Knowing structure of document, could also express this as:
doc("staff_list.xml")//STAFF[1]/STAFFNO
doc("staff_list.xml")/STAFFLIST/STAFF[1]/STAFFNO
Find staff numbers of first two members of staff.
doc("staff_list.xml")/STAFFLIST/STAFF[1 TO 2]/STAFFNO
Example – XQuery Path Expressions
Find surnames of staff at branch B005.
doc("staff_list.xml")/STAFFLIST/STAFF[@branchNo = "B005"]//LNAME
● Five steps:
– first two as before;
– third uses /STAFF to select STAFF elements within
STAFFLIST element;
– fourth consists of predicate that restricts STAFF elements
to those with branchNo attribute = B005;
– fifth selects LNAME element(s) occurring anywhere
within these elements.
XQuery – FLWOR Expressions
● FLWOR ("flower") expression is constructed from FOR, LET,
WHERE, ORDER BY, RETURN clauses.
● FLWOR expression starts with one or more FOR or LET
clauses in any order, followed by optional WHERE clause,
optional ORDER BY clause, and required RETURN clause.
● FOR and LET clauses serve to bind values to one or more
variables using expressions (e.g., path expressions).
● FOR used for iteration, associating each specified variable
with expression that returns list of nodes.
● FOR clause can be thought of as iterating over nodes returned
by its respective expression.
● LET clause also binds one or more variables to one or more
expressions but without iteration, resulting in single binding
for each variable.
● Optional WHERE clause specifies one or more conditions to
restrict tuples generated by FOR and LET.
● RETURN clause evaluated once for each tuple in tuple stream
and results concatenated to form result.
● ORDER BY clause, if specified, determines order of the tuple
stream which, in turn, determines order in which RETURN
clause is evaluated using variable bindings in the respective
tuples.
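The roles of the FLWOR clauses can be mirrored in ordinary Python as a rough analogy: this is not XQuery and ignores XQuery's typing rules. The sketch finds staff at branch B005 earning more than 15000, ordered by staff number. The sample data and the function name staff_at_b005_over_15000 are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical sample data in the shape of the STAFFLIST examples.
STAFF_XML = """<STAFFLIST>
  <STAFF branchNo="B005"><STAFFNO>SL31</STAFFNO><SALARY>18000</SALARY></STAFF>
  <STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY></STAFF>
  <STAFF branchNo="B005"><STAFFNO>SL41</STAFFNO><SALARY>9000</SALARY></STAFF>
  <STAFF branchNo="B003"><STAFFNO>SG5</STAFFNO><SALARY>24000</SALARY></STAFF>
</STAFFLIST>"""


def staff_at_b005_over_15000(xml_text):
    root = ET.fromstring(xml_text)
    tuples = []
    for s in root.iter("STAFF"):                    # FOR: iterate over nodes
        salary = Decimal(s.findtext("SALARY"))      # LET: one binding, no loop
        if salary > 15000 and s.get("branchNo") == "B005":  # WHERE: filter
            tuples.append(s)
    tuples.sort(key=lambda s: s.findtext("STAFFNO"))  # ORDER BY: order tuples
    return [s.findtext("STAFFNO") for s in tuples]    # RETURN: once per tuple


print(staff_at_b005_over_15000(STAFF_XML))
```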
Example – XQuery FLWOR Expressions
List staff with salary = £30,000.
LET $SAL := 30000
RETURN doc("staff_list.xml")//STAFF[SALARY = $SAL]
● Note, predicate seems to compare an element (SALARY) with
a value (30000). In fact, '=' operator extracts typed value of
element resulting in a decimal value in this case, which is then
compared with 30000.
● '=' operator is a general comparison operator. XQuery also
defines value comparison operators ('eq', 'ne', 'lt', 'le', 'gt',
'ge'), which are used to compare two atomic values.
● If either operand is a node, atomization is used to convert it to
an atomic value.
● If we try to compare an atomic value to an expression that
returns multiple nodes, then a general comparison operator
returns true if any value satisfies predicate; however, value
comparison operator would raise an error.
EXAMPLE:
List staff at branch B005 with salary > £15,000.
FOR $S IN doc("staff_list.xml")//STAFF
WHERE $S/SALARY > 15000 AND
$S/@branchNo = "B005"
RETURN $S/STAFFNO
● Effective boolean value (EBV) of empty sequence is false; EBV
also false if expression evaluates to: xsd:boolean value false, a
numeric or binary zero, a zero-length string, or special float value
NaN (not a number); EBV of any other sequence evaluates to true.
EXAMPLE
List all staff in descending order of staff number.
FOR $S IN doc("staff_list.xml")//STAFF
ORDER BY $S/STAFFNO DESCENDING
RETURN $S/STAFFNO
EXAMPLE
List each branch office and average salary at branch.
FOR $B IN
distinct-values(doc("staff_list.xml")//@branchNo)
LET $avgSalary := avg(doc("staff_list.xml")//
STAFF[@branchNo = $B]/SALARY)
RETURN
<BRANCH>
<BRANCHNO>{ $B/text() }</BRANCHNO>
<AVGSALARY>{ $avgSalary }</AVGSALARY>
</BRANCH>
EXAMPLE
List branches that have more than 20 staff.
<LARGEBRANCHES> {
FOR $B IN
distinct-values(doc("staff_list.xml")//@branchNo)
LET $S := doc("staff_list.xml")//STAFF[@branchNo = $B]
WHERE count($S) > 20
RETURN
<BRANCHNO>{ $B/text() }</BRANCHNO>
} </LARGEBRANCHES>
EXAMPLE
List branches with at least one member of staff with salary > £15,000.
<BRANCHESWITHLARGESALARIES> {
FOR $B IN
distinct-values(doc("staff_list.xml")//@branchNo)
LET $S := doc("staff_list.xml")//STAFF[@branchNo = $B]
WHERE SOME $sal IN $S/SALARY
SATISFIES ($sal > 15000)
RETURN
<BRANCHNO>{ $B/text() }</BRANCHNO>
} </BRANCHESWITHLARGESALARIES>
Example – Joining Two Documents
List staff along with details of their next of kin.
FOR $S IN doc("staff_list.xml")//STAFF,
$NOK IN doc("nok.xml")//NOK
WHERE $S/STAFFNO = $NOK/STAFFNO
RETURN
<STAFFNO>{ $S, $NOK/NAME }</STAFFNO>
Example – Joining Two Documents
Example – Joining Two Documents
List each branch office and staff who work there.
<BRANCHLIST> {
FOR $B IN
distinct-values(doc("staff_list.xml")//@branchNo)
ORDER BY $B
RETURN
<BRANCHNO> { $B/text() } {
FOR $S IN doc("staff_list.xml")//STAFF
WHERE $S/@branchNo = $B
ORDER BY $S/STAFFNO
RETURN ($S/STAFFNO, $S/NAME, $S/POSITION, $S/SALARY)
}
</BRANCHNO>
} </BRANCHLIST>
Example – User-Defined Function
Function to return staff at a given branch.
DEFINE FUNCTION staffAtBranch($bNo) AS element()* {
FOR $S IN doc("staff_list.xml")//STAFF
WHERE $S/@branchNo = $bNo
ORDER BY $S/STAFFNO
RETURN ($S/STAFFNO, $S/NAME,
$S/POSITION, $S/SALARY)
}
staffAtBranch($B)
XML Information Set (Infoset)
● Abstract description of information available in well-formed XML
document that meets certain XML namespace constraints.
● XML Infoset is attempt to define set of terms that other XML
specifications can use to refer to the information items in a
well-formed (although not necessarily valid) XML document.
● Does not attempt to define complete set of information, nor does it
represent minimal information that an XML processor should
return to an application.
● It also does not mandate a specific interface or class of interfaces
(although Infoset presents information as tree).
● XML document's information set consists of two or more
information items.
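The "list each branch office and staff who work there" query and the staffAtBranch user-defined function can be approximated in Python as follows. This is an analogy to show the grouping logic, not an XQuery implementation; the function names staff_at_branch and branch_list and the sample data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of the STAFFLIST examples.
STAFF_XML = """<STAFFLIST>
  <STAFF branchNo="B005"><STAFFNO>SL31</STAFFNO><SALARY>18000</SALARY></STAFF>
  <STAFF branchNo="B003"><STAFFNO>SG5</STAFFNO><SALARY>24000</SALARY></STAFF>
  <STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY></STAFF>
</STAFFLIST>"""


def staff_at_branch(root, branch_no):
    """Counterpart of staffAtBranch($bNo): STAFF elements ordered by STAFFNO."""
    staff = [s for s in root.iter("STAFF") if s.get("branchNo") == branch_no]
    staff.sort(key=lambda s: s.findtext("STAFFNO"))
    return staff


def branch_list(root):
    """Build a BRANCHLIST element with one BRANCHNO child per distinct branch."""
    # distinct-values(...//@branchNo) followed by ORDER BY $B
    branch_numbers = sorted({s.get("branchNo") for s in root.iter("STAFF")})
    result = ET.Element("BRANCHLIST")
    for branch_no in branch_numbers:
        branch = ET.SubElement(result, "BRANCHNO")
        branch.text = branch_no
        for s in staff_at_branch(root, branch_no):
            # Copy over whichever of the returned elements are present.
            for tag in ("STAFFNO", "NAME", "POSITION", "SALARY"):
                child = s.find(tag)
                if child is not None:
                    branch.append(child)
    return result


root = ET.fromstring(STAFF_XML)
print(ET.tostring(branch_list(root), encoding="unicode"))
```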
● An information item is an abstract representation of a component
of an XML document such as an element, attribute, or processing
instruction.
● Each information item has a set of associated properties. e.g.,
document information item properties include:
o [document element];
o [children];
o [notations]; [unparsed entities];
o [base URI], [character encoding scheme],
[version], and [standalone].
Post-Schema Validation Infoset (PSVI)
● XML Infoset contains no type information.
● To overcome this, XML Schema specifies an extended form of
XML Infoset called Post-Schema Validation Infoset (PSVI).
● In PSVI, information items representing elements and attributes
have type annotations and normalized values that are returned by
an XML Schema processor.
● PSVI contains all information about an XML document that a
query processor requires.
XQuery 1.0 and XPath 2.0 Data Model
● Defines the information contained in the input to an XSLT or
XQuery Processor.
● Also defines all permissible values of expressions in XSLT,
XQuery, and XPath.
● Data Model is based on XML Infoset, with following new features:
– support for XML Schema types;
– representation of collections of documents and of simple
and complex values.
● Decided to make XPath subset of XQuery.
● XPath spec shows how to represent information in XML Infoset as
a tree structure containing seven kinds of nodes (document,
element, attribute, text, comment, namespace, or processing
instruction), with XPath operators defined in terms of these seven
nodes.
● To retain these operators while using richer type system provided
by XML Schema, XQuery extended XPath data model with
additional information contained in PSVI.
● Data Model is node-labeled, tree-constructor, with notion of node
identity to simplify representation of reference values (such as
IDREF, XPointer, and URI values).
● An instance of data model represents one or more complete
documents or document parts, each represented by its own tree of
nodes.
● Every value is ordered sequence of zero or more items, where an
item can be an atomic value or a node.
● An atomic value has a type, either one of atomic types defined in
XML Schema or restriction of one of these types.
● When a node is added to a sequence its identity remains same.
Thus, a node may occur in more than one sequence and a sequence
may contain duplicate items.
● Root node representing XML document is a document node and
each element in document is represented by an element node.
● Attributes represented by attribute nodes and content by text nodes
and nested element nodes.
● Primitive data in document is represented by text nodes, forming
the leaves of the node tree.
● Element node may be connected to attribute nodes and text
nodes/nested element nodes.
● Every node belongs to exactly one tree, and every tree has exactly
one root node.
● Tree whose root node is document node is referred to as a
document and a tree whose root node is some other kind of node is
referred to as a fragment.
● Information about nodes obtained via accessor functions that can
operate on any node.
● Accessor functions are analogous to an information item's named
properties.
● These functions are illustrative and intended to serve as concise
description of information that must be exposed by Data Model.
● Data Model also specifies a number of constructor functions whose
purpose is to illustrate how nodes are constructed.
[Figure: ER Diagram Representing Main Components]
Example - XML Query Data Model
Example - XML Query Data Model

XQuery Formal Semantics
'goal is to complement XPath/XQuery spec, by defining meaning
of expressions with mathematical rigor. A rigorous formal semantics
clarifies intended meaning of the English specification, ensures that no
corner cases are left out, and provides reference for implementation'.
● Provides implementors with a processing model and a complete
description of the language's static and dynamic semantics.
XQuery Formal Semantics – Main Phases
● Parsing, ensures input expression is instance of language defined
by the grammar rules and then builds an internal parse tree.
● Normalization, converts expression into an XQuery Core
expression.
● Static type analysis (optional), checks whether each (core)
expression is type safe and, if so, determines its static type. If
expression is not type-safe, type error is raised; otherwise, parse
tree built with each subexpression annotated with its static type.
● Dynamic evaluation, computes value of the expression from parse
tree. May result in a dynamic error, either a type error (if static
type analysis has not been done) or a non-type error.
XQuery Formal Semantics – Normalization
● Takes full XQuery expression and transforms it into an equivalent
expression in the core XQuery.
● Written as follows:
[Expr]_Expr
==
CoreExpr
● States that Expr is normalized to CoreExpr (Expr subscript
indicates an expression; other values possible; e.g. Axis).
XQuery Formal Semantics – Normalization
● FLWOR expression covered by two sets of rules; first splits
expression at clause level then applies further normalization to
each clause:
[(ForClause | LetClause | WhereClause | OrderByClause) FLWORExpr]_Expr
==
[(ForClause | LetClause | WhereClause | OrderByClause)]_FLWOR([FLWORExpr]_Expr)

[(ForClause | LetClause | WhereClause | OrderByClause) RETURN Expr]_Expr
==
[(ForClause | LetClause | WhereClause | OrderByClause)]_FLWOR([Expr]_Expr)
XQuery Formal Semantics – Normalization
● Second set applies to FOR and LET clauses and transforms each
into series of nested clauses, each of which binds one variable. For
example, for the FOR clause we have:
[FOR varRef1 TypeDec1? PositionalVar1? IN Expr1, …,
varRefn TypeDecn? PositionalVarn? IN Exprn]_FLWOR(Expr)
==
FOR varRef1 TypeDec1? PositionalVar1? IN [Expr1]_Expr RETURN …
FOR varRefn TypeDecn? PositionalVarn? IN [Exprn]_Expr RETURN Expr
● WHERE clause normalized to IF expression that returns an empty
sequence if condition is false and normalizes result:
[WHERE Expr1]_FLWOR(Expr)
==
IF ([Expr1]_Expr) THEN Expr ELSE ( )
Normalization – Example
FOR $i IN $I, $j IN $J
LET $k := $i + $j
WHERE $k > 2
RETURN ($i, $j)
is normalized to:
FOR $i IN $I RETURN
FOR $j IN $J RETURN
LET $k := $i + $j RETURN
IF ($k > 2) THEN ($i, $j)
ELSE ( )
Static Type Analysis
● XQuery is strongly typed so types of values and expressions must
be compatible with context in which they are used.
● After normalization static type analysis may optionally be
performed.
● Static type of an expression is defined as 'most specific type that
can be deduced for that expression by examining the query only,
independent of the input data'.
● Useful for detecting certain types of error early in development.
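The normalization example (FOR $i IN $I, $j IN $J / LET $k / WHERE $k > 2 / RETURN ($i, $j)) can be checked informally with a Python analogy: the first function follows the original multi-variable FLWOR shape, the second follows the normalized shape of nested single-variable loops with an IF in place of WHERE. The function names are made up for the example, and Python lists stand in for XQuery's flat sequences.

```python
def flwor(I, J):
    """Original shape: two FOR bindings, one LET, a WHERE filter, a RETURN."""
    return [
        item
        for i in I
        for j in J
        for k in [i + j]      # LET $k := $i + $j (a single binding)
        if k > 2              # WHERE $k > 2
        for item in (i, j)    # RETURN ($i, $j), flattened into the result
    ]


def normalized(I, J):
    """Normalized shape: nested clauses, each binding exactly one variable."""
    result = []
    for i in I:                          # FOR $i IN $I RETURN
        for j in J:                      #   FOR $j IN $J RETURN
            k = i + j                    #     LET $k := $i + $j RETURN
            if k > 2:                    #       IF ($k > 2) THEN ($i, $j)
                result.extend((i, j))
            else:                        #       ELSE ( )
                result.extend(())
    return result


print(flwor([1, 2], [1, 2]))
print(normalized([1, 2], [1, 2]))
```

Both calls produce the same flat sequence, which is what "equivalent expression in the core" means for this example.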
● Also useful for optimizing query execution; e.g. may be able to
conclude that result of query is an empty sequence.
● Based on set of inference rules used to infer static type of each
expression, based on static types of its operands.
● Bottom-up process, starting at leaves of expression tree containing
simple constants and input data whose type can be inferred from
schema of input document.
● Inference rules used to infer static types of more complex
expressions at next level of tree until entire tree processed.
● Type error raised if static type of some expression is inappropriate.
Static Type Analysis – Inference Rules
Static typing takes a static environment and an expression and infers a
type. Written as:
statEnv |- Expr : Type
– States that "in environment statEnv, expression Expr has type
Type".
– This is called a typing judgment (a judgment expresses whether a
property holds or not).
– Inference rule written as a collection of premises and a conclusion;
for example:
statEnv |- Expr1 : xsd:boolean   statEnv |- Expr2 : Type2   statEnv |- Expr3 : Type3
------------------------------------------------------------------------------------
statEnv |- IF Expr1 THEN Expr2 ELSE Expr3 : (Type2 | Type3)
Dynamic Evaluation
● All implementations of XQuery must support dynamic typing,
which checks during dynamic evaluation that type of a value is
compatible with context in which it is used.
● Type error raised if an incompatibility is detected.
● Based on judgments, called evaluation judgments:
▪ dynEnv |- Expr ⇒ Value
● States that "in dynamic environment dynEnv, the evaluation of
expression Expr yields value Value".
● Inference rule is written as collection of hypotheses (judgments)
and a conclusion, written respectively above and below a dividing
line.
● Consider logical expressions:
dynEnv |- Expr_i ⇒ false   (1 <= i <= 2)
----------------------------------------
dynEnv |- Expr1 AND Expr2 ⇒ false

dynEnv |- Expr_i ⇒ RAISES Error   (1 <= i <= 2)
-----------------------------------------------
dynEnv |- Expr1 AND Expr2 ⇒ RAISES Error
● Consider following expression:
o (1 IDIV 0 = 1) AND (2 = 3)
● If left-hand expression evaluated first it will raise an error (divide
by zero) and overall expression will raise an error (no need to
evaluate the right-hand expression).
● Conversely, if right-hand expression evaluated first, overall
expression will evaluate to false (no need to evaluate the left-hand
expression).
TOPIC: XML AND DATABASE
● Need to handle XML that:
– may be strongly typed governed by XML Schema;
– may be strongly typed governed by another schema
language, such as a DTD or RELAX NG;
– may be governed by multiple schemas or one schema may
be subject to frequent change;
– may be schema-less;
– may contain marked-up text with logical units of text
(such as sentences) that span multiple elements;
– has structure, ordering, and whitespace that may be
significant;
– may be subject to update as well as queries based on
context and relevancy.
● Four general approaches to storing an XML document in RDB:
– store the XML as the value of some attribute within a
tuple;
– store the XML in a shredded form across a number of
attributes and relations;
– store the XML in a schema independent form;
– store the XML in a parsed form; i.e., convert the XML to
internal format, such as an Infoset or PSVI representation,
and store this representation.
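As a rough illustration of the first two approaches, the sketch below uses the JDK's built-in DOM parser to keep a document whole as a single value and, alternatively, to shred it into one row per child element. The class name XmlStorageSketch, the row layouts, and the in-memory arrays standing in for tuples are assumptions made for illustration only; a real system would write these values to relations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlStorageSketch {

    // Approach 1: the whole document is one attribute value, here {docNo, xmlText},
    // much as it would be held in a CLOB or XML-typed column.
    static String[] storeAsAttribute(String docNo, String xml) {
        return new String[] {docNo, xml};
    }

    // Approach 2: the document is shredded into one row per child element of the root.
    // Each row is {docNo, position, elementName, textValue} (hypothetical layout).
    static List<String[]> shred(String docNo, String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String[]> rows = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            int position = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between elements
                }
                position++;
                rows.add(new String[] {
                    docNo,
                    String.valueOf(position),
                    node.getNodeName(),
                    node.getTextContent().trim()
                });
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<STAFF branchNo=\"B005\">"
                + "<STAFFNO>SL21</STAFFNO><POSITION>Manager</POSITION>"
                + "<SALARY>30000</SALARY></STAFF>";
        System.out.println("whole: " + storeAsAttribute("D001", xml)[1]);
        for (String[] row : shred("D001", xml)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The whole-document form is cheap to insert and return unchanged, while the shredded rows can be indexed and queried by element name and value.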
Storing XML in an Attribute
● In the past the XML would have been stored in an attribute whose
data type was CLOB.
● More recently, some systems have a new native XML data type
(e.g. XML or XMLType).
● Raw XML stored in serialized form, which makes it efficient to
insert documents into database and retrieve them in their original
form.
● Relatively easy to apply full-text indexing to documents for
contextual and relevance retrieval. However, question about
performance of general queries and indexing, which may require
parsing on-the-fly.
● Also, updates usually require entire XML document to be replaced
with a new document.
Storing XML in Shredded Form
● XML decomposed (shredded) into its constituent elements and
data distributed over number of attributes in one or more relations.
● Storing shredded documents may make it easier to index values of
some elements, provided these elements are placed into their own
attributes.
● Also possible to add some additional data relating to hierarchical
nature of the XML, making it possible to recompose original
structure and ordering, and to allow the XML to be updated.
● With this approach also have to create an appropriate database
structure.
Schema-Independent Representation
● Could use DOM to represent structure of XML data.
● Since XML is a tree structure, each node may have only one
parent. The rootID attribute allows a query on a particular node to
be linked back to its document node.
● While this is schema independent, recursive nature of structure can
cause performance problems when searching for specific paths.
● To overcome this, create denormalized index containing
combinations of path expressions and a link to node and parent
node.
XML and SQL
● SQL:2003 has extensions to enable publication of XML
(commonly referred to as SQL/XML):
– new native XML data type, XML, which allows XML
documents to be treated as relational values in columns of
tables, attributes in user-defined types, variables, and
parameters to functions;
– set of operators for the type;
– implicit set of mappings from relational data to XML.
● Standard does not define any rules for the inverse process; i.e.,
shredding XML data into an SQL form, with some minor
exceptions.
Example – Creating Table using XML Type
CREATE TABLE XMLStaff (
    docNo CHAR(4), docDate DATE, staffData XML,
    PRIMARY KEY (docNo));
INSERT INTO XMLStaff VALUES ('D001', DATE '2004-12-01',
    XML('<STAFF branchNo = "B005">
         <STAFFNO>SL21</STAFFNO>
         <POSITION>Manager</POSITION>
         <DOB>1945-10-01</DOB>
         <SALARY>30000</SALARY> </STAFF>') );
SQL/XML Operators
● XMLELEMENT, to generate an XML value with a single element
as a child of its root item. Element can have attributes specified via
XMLATTRIBUTES subclause.
● XMLFOREST, to generate an XML value with a list of elements
as children of a root item.
● XMLCONCAT, to concatenate a list of XML values.
● XMLPARSE, to perform a non-validating parse of a character
string to produce an XML value.
● XMLROOT, to create an XML value by modifying the properties
of the root item of another XML value.
● XMLCOMMENT, to generate an XML comment.
● XMLPI, to generate an XML processing instruction.
● XMLSERIALIZE, to generate a character or binary string from an
XML value.
● XMLAGG, an aggregate function, to generate a forest of elements
from a collection of elements.
Example – Using XML Operators
List all staff with salary > £20,000, as an XML element containing name
and branch number as an attribute.
SELECT staffNo, XMLELEMENT (NAME "STAFF",
           XMLATTRIBUTES (branchNo AS "branchNumber"),
           fName || ' ' || lName) AS "staffXMLCol"
FROM Staff
WHERE salary > 20000;
Example – Using XML Operators
For each branch, list names of all staff with each one represented as an
XML element.
SELECT XMLELEMENT (NAME "BRANCH",
    XMLATTRIBUTES (branchNo AS "branchNumber"),
    XMLAGG ( XMLELEMENT (NAME "STAFF",
                 fName || ' ' || lName)
             ORDER BY fName || ' ' || lName ))
    AS "branchXMLCol"
FROM Staff GROUP BY branchNo;
SQL/XML Mapping Functions
● SQL/XML also defines mapping from tables to XML documents.
● Mapping may take as its source an individual table, all tables in a
schema, or all tables in a catalog.
● Standard does not specify syntax for the mapping; instead it is
provided for use by applications and as a reference for other
standards.
● Mapping produces two XML documents: one that contains mapped
table data and other that contains an XML Schema describing the
first.
Mapping SQL Identifiers to XML Names
● Number of issues had to be addressed to map SQL identifiers to
XML Names:
– range of characters that can be used within an SQL
identifier larger than range for an XML Name;
– SQL delimited identifiers (identifiers within
double-quotes), permit arbitrary characters to be used at
any point in identifier;
– XML Names that begin with ‘XML’ are reserved;
– XML namespaces use ‘:’ to separate namespace prefix
from local component.
● Resolved using escape notation that changes unacceptable
characters in XML Names into sequence of allowable characters
based on Unicode values (“_xHHHH_”).
Mapping SQL Data Types to XML Schema
● SQL/XML maps each SQL data type to closest match in XML
Schema, in some cases using facets to restrict acceptable XML
values to achieve closest match.
● For example:
– SMALLINT mapped to a restriction of xsd:integer with
minInclusive and maxInclusive facets set.
– CHAR mapped to restriction of xsd:string with facet
length set.
– DECIMAL mapped to xsd:decimal with precision and
scale set.
Mapping Tables to XML Documents
● Create root element named after table with <row> element for each
row.
● Each row contains a sequence of column elements, each named
after corresponding column.
● Each column element contains a data value.
● Names of table and column elements are generated using fully
escaped mapping from SQL identifiers to XML Names.
● Must also specify how nulls are to be mapped, using ‘absent’
(column with null would be omitted) or ‘nil’ (column element kept
and marked with xsi:nil="true").
Generating an XML Schema
● Generated by creating globally-named XML Schema data types for
every type required to describe table(s) being mapped.
● Naming convention appends a suffix containing length or
precision/scale to the name of the base type (e.g. CHAR(10) would
be CHAR_10).
● Next, named XML Schema type is created for types of the rows in
table (name used is ‘RowType’ concatenated with catalog, schema,
and table name).
● Named XML Schema type is created for type of the table itself
(name used is ‘TableType’ concatenated with catalog, schema, and
table name).
● Finally, an element is created for table based on this new table
type.
Native XML Databases
● Defines (logical) data model for an XML document (as opposed to
data in that document) and stores/retrieves documents according to
that model.
● At a minimum, model must include elements, attributes, PCDATA,
and document order.
● XML document must be unit of (logical) storage although not
restricted by any underlying physical storage model (so traditional
DBMSs not ruled out nor proprietary storage formats such as
indexed, compressed files).
● Two types:
o text-based, which stores XML as text, e.g. as a
file in file system or as a CLOB in an RDBMS;
o model-based, which stores XML in some internal
tree representation, e.g., an Infoset or PSVI
representation, possibly with tags tokenized.
NOTES FROM INTERNET
● Mapping XML into relational data
● Generating XML using Java and JDBC
● Storing XML
● XML on the Web
● XML support in Oracle
● XML API for databases
Mapping XML into relational data
The database
● We can model the database with a document node and its
associated element node:
<?xml version="1.0" ?>
<myDatabase>
    table1
    table2
    ...
    tablen
</myDatabase>
● Order of tables is immaterial
The table
● Each table of the database is represented by an element node with
the records as its children:
<customer>
    record1
    record2
    ...
    recordm
</customer>
● Again, order of the records is immaterial, since the relational data
model defines no ordering on them.
The record
● A record is also represented by an element node, with its fields as
children:
<custRec>
    field1
    field2
    ...
    fieldm
</custRec>
● The name is arbitrary, since the relational data model doesn't
define a name for a record type
The field
● A field is represented as an element node with a data node as its
only child:
<custName type="t">
    d
</custName>
● If d is omitted, it means the value of the field is the empty string.
● The value of t indicates the type of the value
Example
<?xml version="1.0" ?>
<myDatabase>
    <customers>
        <custRec>
            <custName type="String">Robert Roberts</custName>
            <custAge type="Integer">25</custAge>
        </custRec>
        <custRec>
            <custName type="String">John Doe</custName>
            <custAge type="Integer">32</custAge>
        </custRec>
    </customers>
</myDatabase>
Generating XML from relational data
Step 1 : Set up the database connection
// Create an instance of the JDBC driver so that it has
// a chance to register itself (the class name must be a string)
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
// Create a new database connection.
Connection con =
    DriverManager.getConnection("jdbc:odbc:myData", "", "");
// Create a statement object that we can execute queries with
Statement stmt = con.createStatement();
Step 2 : Execute the JDBC query
String query = "Select Name, Age from Customers";
ResultSet rs = stmt.executeQuery(query);
Step 3 : Create the XML!
// A StringBuffer must be constructed; a string literal cannot be assigned to it
StringBuffer xml = new StringBuffer(
    "<?xml version='1.0'?><myDatabase><customers>");
while (rs.next()) {
    xml.append("<custRec><custName>");
    xml.append(rs.getString("Name"));
    xml.append("</custName><custAge>");
    xml.append(rs.getInt("Age"));
    xml.append("</custAge></custRec>");
}
xml.append("</customers></myDatabase>");
Storing XML in relational tables
Step 1 : Set up the parser
StringReader stringReader = new StringReader(xmlString);
InputSource inputSource = new InputSource(stringReader);
DOMParser domParser = new DOMParser();
domParser.parse(inputSource);
Document document = domParser.getDocument();
Step 2 : Read values from parsed XML document
NodeList nameList = document.getElementsByTagName("custName");
NodeList ageList = document.getElementsByTagName("custAge");
Step 3 : Set up database connection
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
Connection con =
    DriverManager.getConnection("jdbc:odbc:myDataBase", "", "");
Statement stmt = con.createStatement();
Step 4 : Insert data using appropriate JDBC update query
String sql = "INSERT INTO Customers (Name, Age) VALUES (?,?)";
PreparedStatement pstmt = con.prepareStatement(sql);
int size = nameList.getLength();
for (int i = 0; i < size; i++) {
    pstmt.setString(1, nameList.item(i).getFirstChild().getNodeValue());
    // setInt needs an int, so the text of the age node is parsed first
    pstmt.setInt(2, Integer.parseInt(
        ageList.item(i).getFirstChild().getNodeValue()));
    pstmt.executeUpdate();  // run the prepared INSERT with the bound values
}
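The parsing steps (Steps 1 and 2) can be tried on their own, without a database or the DOMParser class. The following is a minimal sketch that assumes the JDK's built-in javax.xml.parsers API in place of DOMParser; the class CustomerXmlReader, its nested Customer type, and the method readCustomers are hypothetical names introduced only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CustomerXmlReader {

    /** Value holder for one custRec element (hypothetical helper type). */
    static final class Customer {
        final String name;
        final int age;

        Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Parses the customer document and pairs each custName with its custAge. */
    static List<Customer> readCustomers(String xmlString) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document =
                    builder.parse(new InputSource(new StringReader(xmlString)));
            NodeList nameList = document.getElementsByTagName("custName");
            NodeList ageList = document.getElementsByTagName("custAge");
            List<Customer> customers = new ArrayList<>();
            for (int i = 0; i < nameList.getLength(); i++) {
                String name = nameList.item(i).getTextContent();
                // The age arrives as text and must be converted before use as an int
                int age = Integer.parseInt(ageList.item(i).getTextContent().trim());
                customers.add(new Customer(name, age));
            }
            return customers;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse customer XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0'?><myDatabase><customers>"
                + "<custRec><custName>Robert Roberts</custName><custAge>25</custAge></custRec>"
                + "<custRec><custName>John Doe</custName><custAge>32</custAge></custRec>"
                + "</customers></myDatabase>";
        for (Customer c : readCustomers(xml)) {
            System.out.println(c.name + " is " + c.age);
        }
    }
}
```

Each Customer value could then be bound to the prepared INSERT statement in the same way as in Step 4.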
XML on the Web (Servlets)
public void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws IOException
{
    resp.setContentType("text/xml");
    PrintWriter out = new PrintWriter(resp.getOutputStream());
    // … generate XML here, as before…
    out.println(xmlGenerated);
    out.flush();
    out.close();
}
● Appropriate XSL can be inserted for display
[Figure: XML in IE 5.0]
Let’s insert the XSL…
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="http://myServer/Customer.xsl"?>
<myDatabase>
    <customers>
        <custRec>
            <custName type="String">Robert Roberts</custName>
            <custAge type="Integer">25</custAge>
        </custRec>
        … other records here …
    </customers>
</myDatabase>
[Figure: XML with XSL in IE 5.0]
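The notes reference Customer.xsl without showing it. As a hedged sketch of what such a stylesheet might do, the example below applies a small, made-up stylesheet to the customer document on the server side with the JDK's javax.xml.transform API, rather than relying on the browser. The stylesheet text, the class name CustomerXslSketch, and the table layout are all assumptions, not the contents of the real Customer.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CustomerXslSketch {

    // Hypothetical stylesheet: one table row per custRec, name and age as cells.
    static final String CUSTOMER_XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/myDatabase'>"
        + "<table><xsl:apply-templates select='customers/custRec'/></table>"
        + "</xsl:template>"
        + "<xsl:template match='custRec'>"
        + "<tr><td><xsl:value-of select='custName'/></td>"
        + "<td><xsl:value-of select='custAge'/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms the customer XML into a table using the stylesheet above. */
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(CUSTOMER_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<myDatabase><customers>"
                + "<custRec><custName type='String'>Robert Roberts</custName>"
                + "<custAge type='Integer'>25</custAge></custRec>"
                + "</customers></myDatabase>";
        System.out.println(render(xml));
    }
}
```

A servlet could write the result of render with content type text/html for clients that do not apply stylesheets themselves.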
XML support in Oracle
● Oracle8i and interMedia
● XML Parsers and XSL Processors (Java, C, C++, and PL/SQL)
● XML Class Generators (Java and C++)
● XML SQL Utility for Java
● XSQL Servlet
Oracle8i and interMedia
● run Oracle XML components and applications inside the database
using JServer - Oracle8i's built-in JVM
● interMedia Text allows queries such as find "Oracle WITHIN title"
where "title" is a section of the XML document
[Figure: XML Parsers in Oracle]
[Figure: XML Class Generator for Java]
[Figure: XML SQL Utility for Java (Generating XML)]
[Figure: XSQL Servlet]
XML API for databases
● blend the power of a database with the features of XML
● most XML tools work with the SAX or DOM API

● implement the same APIs directly over a database, enabling XML
tools to treat databases as if they were XML documents. That way,
the database does not have to be converted into XML documents
first.
[Figure: XML API with database]
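One way to picture this idea is a class that fires SAX events straight from table rows, so that any ordinary SAX handler sees the data as if it had parsed an XML document. The sketch below is illustrative only: TableSaxSource and trace are made-up names, and an in-memory list of rows stands in for a JDBC ResultSet.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class TableSaxSource {
    private final String tableName;
    private final String[] columns;
    private final List<String[]> rows; // stand-in for rows read from a database

    TableSaxSource(String tableName, String[] columns, List<String[]> rows) {
        this.tableName = tableName;
        this.columns = columns;
        this.rows = rows;
    }

    /** Fires the SAX events a parser would produce for the equivalent XML. */
    void parse(ContentHandler handler) throws SAXException {
        Attributes none = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", tableName, tableName, none);
        for (String[] row : rows) {
            handler.startElement("", "row", "row", none);
            for (int c = 0; c < columns.length; c++) {
                handler.startElement("", columns[c], columns[c], none);
                char[] text = row[c].toCharArray();
                handler.characters(text, 0, text.length);
                handler.endElement("", columns[c], columns[c]);
            }
            handler.endElement("", "row", "row");
        }
        handler.endElement("", tableName, tableName);
        handler.endDocument();
    }

    /** Example consumer: a plain SAX handler that records the events it receives. */
    static List<String> trace(TableSaxSource source) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                events.add("<" + qName + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add(new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("</" + qName + ">");
            }
        };
        try {
            source.parse(handler);
        } catch (SAXException e) {
            throw new IllegalStateException("handler rejected an event", e);
        }
        return events;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Robert Roberts", "25"});
        rows.add(new String[] {"John Doe", "32"});
        TableSaxSource source = new TableSaxSource(
                "customers", new String[] {"custName", "custAge"}, rows);
        System.out.println(String.join("", trace(source)));
    }
}
```

Because the handler only sees SAX events, it cannot tell whether they came from a parsed file or from table rows, which is the point of placing the API over the database.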
