LOYOLA-ICAM
COLLEGE OF ENGINEERING AND TECHNOLOGY
LOYOLA COLLEGE CAMPUS, NUNGUMBAKKAM, CH – 34
CS2029 ADVANCED DATABASE TECHNOLOGY
UNIT IV
EMERGING SYSTEMS
Spatial Databases:
● Spatial databases provide concepts for databases that keep track of objects in a multidimensional space. For example, cartographic databases that store maps include two-dimensional spatial positions of their objects, which include countries, states, rivers, cities, roads, seas, and so on.
● Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points.
Expert database systems or knowledge-based systems:
● These incorporate reasoning and inferencing capabilities; such systems use techniques that were developed in the field of artificial intelligence, including semantic networks, frames, production systems, or rules for capturing domain-specific knowledge.
Multimedia Databases:
● Multimedia databases provide features that allow users to store and
query different types of multimedia information, which includes
images (such as pictures or drawings), video clips (such as movies,
news reels, or home videos), audio clips (such as songs, phone
messages, or speeches), and documents (such as books or articles).
Deductive databases:
● This is an area at the intersection of databases, logic, and artificial intelligence or knowledge bases. A deductive database system is a database system that includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database.
● Part of the theoretical foundation for some deductive database systems is mathematical logic.
TOPIC 2: CLIENT/SERVER MODEL
Client/Server Model:
● Networked computing model
● Processes distributed between clients and servers
● Client – Workstation (usually a PC) that requests and uses a service
● Server – Computer (PC/mini/mainframe) that provides a service
● For a DBMS, the server is a database server
Basic Client/Server Architectures:
● The client/server architecture was developed to deal with computing environments in which a large number of PCs, workstations, file servers, printers, database servers, Web servers, and other equipment are connected via a network.
● The idea is to define specialized servers with specific functionalities.
● For example, it is possible to connect a number of PCs or small workstations as clients to a file server that maintains the files of the client machines. Another machine could be designated as a printer server by being connected to various printers; thereafter, all print requests by the clients are forwarded to this machine.
● Web servers or e-mail servers also fall into the specialized server category.
● In this way, the resources provided by specialized servers can be accessed by many client machines.
● The client machines provide the user with the appropriate interfaces to utilize these servers, as well as with local processing power to run local applications.
● This concept can be carried over to software, with specialized software, such as a DBMS or a CAD (computer-aided design) package, being stored on specific server machines and being made accessible to multiple clients.
● Figure 1 illustrates client/server architecture at the logical level, and Figure 2 is a simplified diagram that shows how the physical architecture would look.
● The concept of client/server architecture assumes an underlying framework that consists of many PCs and workstations as well as a smaller number of mainframe machines, connected via local area networks and other types of computer networks.
Client:
● A client in this framework is typically a user machine that provides user interface capabilities and local processing.
● When a client requires access to additional functionality (such as database access) that does not exist at that machine, it connects to a server that provides the needed functionality.
Server:
● A server is a machine that can provide services to the client machines, such as file access, printing, archiving, or database access.
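The request/response pattern between a client and a specialized server can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "file server" is a thread in the same process, the two ends are joined with `socket.socketpair()` instead of a real network, and the helper names (`serve_one_request`, `request_file`, `fetch`) and the one-message protocol are hypothetical, not part of the notes.

```python
import socket
import threading


def serve_one_request(conn, files):
    """Server side: read one file name and reply with that file's contents."""
    with conn:
        name = conn.recv(1024).decode("utf-8")
        reply = files.get(name, "ERROR: no such file")
        conn.sendall(reply.encode("utf-8"))


def request_file(conn, name):
    """Client side: send a file name and collect the server's full reply."""
    with conn:
        conn.sendall(name.encode("utf-8"))
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # the server closed its end, so the reply is complete
                break
            chunks.append(data)
        return b"".join(chunks).decode("utf-8")


def fetch(name, files):
    """Connect one client to one server over a socket pair and run a request."""
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_end, files))
    server.start()
    try:
        return request_file(client_end, name)
    finally:
        server.join()


if __name__ == "__main__":
    shared_files = {"notes.txt": "Unit IV: emerging systems"}
    print(fetch("notes.txt", shared_files))
    print(fetch("missing.txt", shared_files))
```

The client only knows how to ask for a service; the files themselves live with the server, which is the division of labour the bullets above describe.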
Fig 2. Physical 2-tier client/server architecture
● In RDBMSs, the server is also often called an SQL server, since most RDBMS servers are based on the SQL language and standard.
● In such a client/server architecture, the user interface programs and application programs can run on the client side.
● When DBMS access is required, the program establishes a connection to the DBMS (which is on the server side); once the connection is created, the client program can communicate with the DBMS.
● A standard called Open Database Connectivity (ODBC) provides an application programming interface (API), which allows client-side programs to call the DBMS, as long as both client and server machines have the necessary software installed.
● Most DBMS vendors provide ODBC drivers for their systems.
● Hence, a client program can actually connect to several RDBMSs and send query and transaction requests using the ODBC API, which are then processed at the server sites.
● Any query results are sent back to the client program, which can process or display the results as needed.
● A related standard for the Java programming language, called JDBC, has also been defined.
● This allows Java client programs to access the DBMS through a standard interface.
● The second approach to client/server architecture was taken by some object-oriented DBMSs.
● For example, the server level may include the part of the DBMS software responsible for handling data storage on disk pages, local concurrency control and recovery, buffering and caching of disk pages, and other such functions.
● Meanwhile, the client level may handle the user interface; data dictionary functions; DBMS interactions with programming language compilers; global query optimization, concurrency control, and recovery across multiple servers; structuring of complex objects from the data in the buffers; and other such functions.
● In this approach, the client/server interaction is more tightly coupled and is done internally by the DBMS modules, some of which reside on the client and some on the server, rather than by the users.
● This data can then be structured into objects for the client programs by the client-side DBMS software itself.
● The architectures described here are called two-tier architectures because the software components are distributed over two systems: client and server.
Advantage:
● The advantages of this architecture are its simplicity and seamless compatibility with existing systems.
● The emergence of the World Wide Web changed the roles of clients and servers, leading to the three-tier architecture.
Three-Tier Client/Server Architectures for Web Applications:
● Many Web applications use an architecture called the three-tier architecture, which adds an intermediate layer between the client and the database server.
● This intermediate layer or middle tier is sometimes called the application server and sometimes the Web server.
● This server plays an intermediary role by storing business rules (procedures or constraints) that are used to access data from the database server. It can also improve database security.
1. Presentation layer (client):
● This provides the user interface.
● The programs at this layer present Web interfaces or forms to the client.
● Web browsers are often utilized, and the languages used include HTML, Java, JavaScript, Perl, Visual Basic, and so on. This layer handles user input and output and displays the needed information, usually in the form of static or dynamic Web pages. The latter are employed when the interaction involves database access.
● When a Web interface is used, this layer typically communicates with the application layer via the HTTP protocol.
2. Application layer (business logic):
● This layer programs the application logic. For example, queries can be formulated based on user input from the client, or query results can be formatted and sent to the client for presentation.
● Additional application functionality can be handled at this layer, such as security checks, identity verification, and other functions.
● The application layer can interact with one or more databases or data sources as needed by connecting to the database using ODBC, JDBC, SQL/CLI or other database access techniques.
3. Database server:
● This layer handles query and update requests from the application layer, processes the requests, and sends the results.
● Usually SQL is used to access the database if it is relational or object-relational, and stored database procedures may also be invoked.
● Query results (and queries) may be formatted into XML when transmitted between the application server and the database server.
Database Server Architectures
● 2-tiered approach
● Client is responsible for
o I/O processing logic
o Some business rules logic
● Server performs all data storage and access processing, so the DBMS is only on the server
Advantages:
● Clients do not have to be as powerful
● Greatly reduces data traffic on the network
● Improved data integrity since it is all processed centrally
● Stored procedures: some business rules are done on the server
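The division of work between the three layers described above can be sketched as three small Python functions. This is a minimal illustration, not a real deployment: an in-memory SQLite database (through Python's standard `sqlite3` module) stands in for the database server, the Python DB-API plays the role that ODBC or JDBC would play, and the `account` table, the sample rows, and the "own account only" business rule are all hypothetical.

```python
import sqlite3


def make_database():
    """Database server tier: an in-memory SQLite database stands in for it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance REAL)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("asha", 120.0), ("ravi", 40.5)],
    )
    conn.commit()
    return conn


def balance_for(conn, requesting_user, owner):
    """Application tier: apply a business rule, then query the database tier."""
    # Business rule (hypothetical): users may only read their own account.
    if requesting_user != owner:
        raise PermissionError("users may only view their own balance")
    # Parameterized query: user input is passed as data, not spliced into SQL.
    row = conn.execute(
        "SELECT balance FROM account WHERE owner = ?", (owner,)
    ).fetchone()
    return None if row is None else row[0]


def render(owner, balance):
    """Presentation tier: format the result for display to the user."""
    if balance is None:
        return f"No account found for {owner}."
    return f"Balance for {owner}: {balance:.2f}"


if __name__ == "__main__":
    db = make_database()
    print(render("asha", balance_for(db, "asha", "asha")))
```

Keeping the security check in the middle function mirrors the point that the application layer, not the client, enforces business rules before any request reaches the database.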
2. Each database server processes the local query and sends the results to the application server site. Increasingly, XML is being touted as the standard for data exchange, so the database server may format the query result into XML before sending it to the application server.
● If the DDBMS has the capability to hide the details of data distribution from the application server, then it enables the application server to execute global queries and transactions as though the database were centralized, without having to specify the sites at which the data referenced in the query or transaction resides.
● This property is called distribution transparency. Some DDBMSs do not provide distribution transparency, instead requiring that applications be aware of the details of data distribution.
● Advances in encryption and decryption technology make it safer to transfer sensitive data from server to client in encrypted form, where it will be decrypted. The latter can be done by the hardware or by advanced software.
● This technology gives higher levels of data security, but the network security issues remain a major concern.
● Various technologies for data compression are also helping in transferring large amounts of data from servers to clients over wired and wireless networks.
Advantages of Three-Tier Architectures
● Scalability
● Technological flexibility
● Long-term cost reduction
● Better match of systems to business needs
● Improved customer service
● Competitive advantage
● Reduced risk
Challenges of Three-tier Architectures
● High short-term costs
● Tools and training
● Experience
● Incompatible standards
● Lack of compatible end-user tools
Client/Server Security
● Network environment: complex security issues
Security levels:
● System-level password security
o for allowing access to the system
● Database-level password security
o for determining access privileges to tables
o read/update/insert/delete privileges
● Secure client/server communication
o via encryption
TOPIC 3: DATA WAREHOUSING AND DATA MINING
Data Warehousing and Data Mining:
Need for Data warehousing:
● Large companies have presences in many places, each of which may generate a large volume of data.
● For instance, large retail chains have hundreds or thousands of local branches.
● Corporate decision making requires a unified view of all organizational data, including historical data.
● A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site.
o Greatly simplifies querying and permits study of historical trends
o Shifts decision-support query load away from transaction-processing systems
● By accessing information for decision support from a data warehouse, the decision maker ensures that online transaction-processing systems are not affected by the decision-support workload.
Components of a Data Warehouse
The issues to be addressed in building a warehouse are the following:
When and how to gather data.
● In a source-driven architecture for gathering data, the data sources transmit new information, either continually (as transaction processing takes place), or periodically (nightly, for example).
● In a destination-driven architecture, the data warehouse periodically sends requests for new data to the sources.
● Unless updates at the sources are replicated at the warehouse via two-phase commit, the warehouse will never be quite up to date with the sources.
● Two-phase commit is usually far too expensive to be an option, so data warehouses typically have slightly out-of-date data.
● That, however, is usually not a problem for decision-support systems.
What schema to use.
● Data sources that have been constructed independently are likely to have different schemas. They may even use different data models.
● Part of the task of a warehouse is to perform schema integration, and to convert data to the integrated schema before they are stored.
● As a result, the data stored in the warehouse are not just a copy of the data at the sources.
● Instead, they can be thought of as a materialized view of the data at the sources.
Data cleansing.
● The task of correcting and preprocessing data is called data cleansing.
● Data sources often deliver data with numerous minor inconsistencies that can be corrected.
● For example, names are often misspelled, and addresses may have street/area/city names misspelled, or zip codes entered incorrectly.
● These can be corrected to a reasonable extent by consulting a database of street names and zip codes in each city.
● Address lists collected from multiple sources may have duplicates that need to be eliminated in a merge–purge operation.
● Records for multiple individuals in a house may be grouped together so only one mailing is sent to each house; this operation is called householding.
How to propagate updates.
● Updates on relations at the data sources must be propagated to the data warehouse.
● If the relations at the data warehouse are exactly the same as those at the data source, the propagation is straightforward.
● If they are not, the problem of propagating updates is basically the view-maintenance problem.
What data to summarize.
● The raw data generated by a transaction-processing system may be too large to store online.
● However, we can answer many queries by maintaining just summary data obtained by aggregation on a relation, rather than maintaining the entire relation.
● For example, instead of storing data about every sale of clothing, we can store total sales of clothing by itemname and category.
● Suppose that a relation r has been replaced by a summary relation s. Users may still be permitted to pose queries as though the relation r were available online.
● If the query requires only summary data, it may be possible to transform it into an equivalent one using s instead.
Warehouse Schemas
● Data warehouses typically have schemas that are designed for data analysis, using tools such as OLAP tools.
● Thus, the data are usually multidimensional data, with dimension attributes and measure attributes.
● Tables containing multidimensional data are called fact tables and are usually very large.
● A table recording sales information for a retail store, with one tuple for each item that is sold, is a typical example of a fact table.
● The dimensions of the sales table would include what the item is (usually an item identifier such as that used in bar codes), the date when the item is sold, which location (store) the item was sold from, which customer bought the item, and so on.
● The measure attributes may include the number of items sold and the price of the items.
● To minimize storage requirements, dimension attributes are usually short identifiers that are foreign keys into other tables called dimension tables.
● For instance, fact table sales would have dimension attributes item-id, store-id, customer-id, and date, and measure attributes number and price.
● The attribute store-id is a foreign key into a dimension table store, which has other attributes such as store location (city, state, country).
● The item-id attribute of the sales table would be a foreign key into a dimension table item-info, which would contain information such as the name of the item, the category to which the item belongs, and other item details such as color and size.
● The customer-id attribute would be a foreign key into a customer table containing attributes such as name and address of the customer.
● We can also view the date attribute as a foreign key into a date-info table giving the month, quarter, and year of each date.
● The resultant schema appears in Figure.
● Complex data warehouse designs may also have more than one fact table.
The Evolution of Data Warehousing
● Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.
● This resulted in the accumulation of growing amounts of data in operational databases.
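The sales star schema described under Warehouse Schemas, and the idea of answering a query from a summary relation s instead of the full relation r, can be tried out with Python's standard `sqlite3` module. This is a small sketch under assumptions: only two of the dimension tables are created, hyphenated names become underscores (`item_id`, `item_info`), `price` is taken to be a unit price, and the sample rows and helper function names are invented for illustration.

```python
import sqlite3


def build_warehouse():
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item_info (item_id INTEGER PRIMARY KEY, itemname TEXT, category TEXT);
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT);
        -- Fact table: dimension attributes are foreign keys; measures are number and price.
        CREATE TABLE sales (
            item_id  INTEGER REFERENCES item_info(item_id),
            store_id INTEGER REFERENCES store(store_id),
            date     TEXT,
            number   INTEGER,
            price    REAL
        );
        INSERT INTO item_info VALUES (1, 'shirt', 'clothing'), (2, 'skirt', 'clothing'), (3, 'lamp', 'home');
        INSERT INTO store VALUES (10, 'Chennai'), (20, 'Madurai');
        INSERT INTO sales VALUES
            (1, 10, '2024-01-05', 2, 15.0),
            (1, 20, '2024-01-06', 1, 15.0),
            (2, 10, '2024-01-06', 3, 20.0),
            (3, 20, '2024-01-07', 1, 40.0);
        """
    )
    return conn


def build_summary(conn):
    """Materialize summary relation s: total sales by itemname and category."""
    conn.execute(
        """
        CREATE TABLE sales_summary AS
        SELECT i.itemname AS itemname, i.category AS category,
               SUM(s.number * s.price) AS total_sales
        FROM sales AS s JOIN item_info AS i ON s.item_id = i.item_id
        GROUP BY i.itemname, i.category
        """
    )


def category_total_from_facts(conn, category):
    """Answer the query against the full fact table (relation r)."""
    row = conn.execute(
        """
        SELECT SUM(s.number * s.price)
        FROM sales AS s JOIN item_info AS i ON s.item_id = i.item_id
        WHERE i.category = ?
        """,
        (category,),
    ).fetchone()
    return row[0]


def category_total_from_summary(conn, category):
    """Answer the same query using only the summary relation s."""
    row = conn.execute(
        "SELECT SUM(total_sales) FROM sales_summary WHERE category = ?",
        (category,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    wh = build_warehouse()
    build_summary(wh)
    print(category_total_from_facts(wh, "clothing"))
    print(category_total_from_summary(wh, "clothing"))
```

Both functions return the same total for a category, which is why a query that needs only summary data can be rewritten to use s once the detailed rows are no longer kept online.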
decision-making, receiving data from multiple operational data sources.
Data Warehousing Concepts
● A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.
Subject-oriented Data
● The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales).
● This is reflected in the need to store decision-support data rather than application-oriented data.
Integrated Data
● The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent.
● The integrated data source must be made consistent to present a unified view of the data to the users.
Time-variant Data
● Data in the warehouse is only accurate and valid at some point in time or over some time interval.
● Time-variance is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
Non-volatile Data
● Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis.
● New data is always added as a supplement to the database, rather than a replacement.
Data Webhouse
● The Web is an immense source of behavioral data as individuals interact through their Web browsers with remote Web sites. The data generated by this behavior is called clickstream.
● A data webhouse is a distributed data warehouse with no central data repository that is implemented over the Web to harness clickstream data.
Benefits of Data Warehousing
● Data ownership
● High maintenance
Operational Data
Query Manager
● Typically constructed using vendor end-user data access tools, data warehouse monitoring tools, database facilities, and custom-built programs.
● Complexity is determined by the facilities provided by the end-user access tools and the database.
● The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
● In some cases, the query manager also generates query profiles to allow the warehouse manager to determine which indexes and aggregations are appropriate.
Detailed Data
● Stores all the detailed data in the database schema.
● In most cases, the detailed data is not stored online but aggregated.
Summary Data
● Removes the requirement to continually perform summary operations (such as sort or group by) in answering user queries.
● The summary data is updated continuously as new data is loaded into the warehouse.
Archive / Backup Data
● Stores detailed and summarized data for the purposes of archiving and backup.
● It may be necessary to back up online summary data if this data is kept beyond the retention period for detailed data.
● The data is transferred to storage archives such as magnetic tape or optical disk.
Metadata
● Query management process - metadata is used to direct a query to the most appropriate data source.
● The structure of metadata will differ between each process, because the purpose is different.
● This means that multiple copies of metadata describing the same data item are held within the data warehouse.
● Most vendor tools for copy management and end-user data access use their own versions of metadata.
● Copy management tools use metadata to understand the mapping rules to apply in order to convert the source data into a common form.
● End-user access tools use metadata to understand how to build a query.
● The management of metadata within the data warehouse is a very complex task that should not be underestimated.
● Metadata is a critical issue in achieving a fully integrated data warehouse.
● The major purpose of metadata is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse.
● The problem is that metadata has several functions in the data warehouse:
– Data transformation and loading
– Data warehouse management
– Query generation
● Various tools of the data warehouse generate and use their own metadata. The major challenge is to synchronize the various types of metadata.
● Two industry organizations, the Meta Data Coalition (MDC) and the Object Management Group (OMG), have merged to propose a single standard for metadata and modeling in data warehousing called the Common Warehouse Metamodel (CWM).
● This allows users to exchange metadata between different products from different vendors freely.
End-User Access Tools
● These users interact with the warehouse using end-user access tools.
● The data warehouse must efficiently support ad hoc and routine analysis.
● High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users (where possible).
● There are five main groups of access tools
o Data reporting and query tools
o Application development tools
o Executive information system (EIS) tools
o Online analytical processing (OLAP) tools
o Data mining tools
● Advanced query functionality
Data Warehouse Parallel Database Technologies
● Aims to solve decision-support problems using multiple nodes working on the same problem.
● Performs many database operations simultaneously, splitting individual tasks into smaller parts so that tasks can be spread across multiple processors.
● Parallel DBMSs must be capable of running parallel queries, parallel data loading, table scanning, and data archiving and backup.
● Two main parallel hardware architectures include
o Symmetric Multi-processing (SMP)
● Replicating, subsetting, and distributing data.
● Archiving and backing-up data.
● Implementing recovery following failure.
● Security management.
Typical Data Warehouse and Data Mart Architecture
Data Mart
● A subset of a data warehouse that supports the requirements of a particular department or business function.
● Characteristics include a focus on one department or business function.
● To give users access to the data they need to analyze most often.
● To provide data in a form that matches the collective view of the data by a group of users in a department or business function area.
● To improve end-user response time due to the reduction in the volume of data to be accessed.
● To provide appropriately structured data as dictated by the requirements of the end-user access tools.
● Building a data mart is simpler compared with establishing a corporate data warehouse.
● The cost of implementing data marts is normally less than that required to establish a data warehouse.
● The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.
● Data mart functionality
● Data mart size
● Data mart load performance
● Users' access to data in multiple data marts
● Data mart Internet / Intranet access
● Data mart administration
● Data mart installation
Designing Data Warehouses
● To begin a data warehouse project, we need to find answers for questions such as:
– Which user requirements are most important and which data should be considered first?
– Should the project be scaled down into something more manageable?
– Should the infrastructure for a scaled down project be capable of ultimately delivering a full-scale enterprise-wide data warehouse?
● For many enterprises the way to avoid the complexities associated with designing a data warehouse is to start by building one or more data marts.
● Data marts allow designers to build something that is far simpler and achievable for a specific group of users.
● Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time.
● Despite the interim solution of building data marts, the goal remains the same: that is, the ultimate creation of a data warehouse that supports the requirements of the enterprise.
● The requirements collection and analysis stage of a data warehouse project involves interviewing appropriate members of staff (such as marketing users, finance users, and sales users) to enable the identification of a prioritized set of requirements that the data warehouse must meet.
● At the same time, interviews are conducted with members of staff responsible for operational systems to identify which data sources can provide clean, valid, and consistent data that will remain supported over the next few years.
● Interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse.
● The database component of a data warehouse is described using a technique called dimensionality modeling.
Dimensionality modelling
● All natural keys are replaced with surrogate keys. This means that every join between fact and dimension tables is based on surrogate keys, not natural keys.
● Predictable and standard form of the underlying dimensional model.
Star schema for property sales of DreamHome
● Dimensions set the context for asking questions about the facts in the fact table.
● If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other.
● A dimension used in more than one data mart is referred to as being conformed.
● Choosing the facts
● The grain of the fact table determines which facts can be used in the data mart.
– non-numeric facts
– non-additive facts
Star schemas for property sales and property advertising
● Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
Step 7: Choosing the duration of the database
● Duration measures how far back in time the fact table goes.
● Very large fact tables raise at least two very significant data warehouse design issues.
– It is often difficult to source increasingly old data.
– It is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘Slowly Changing Dimension’ problem.
Step 8: Tracking slowly changing dimensions
● The slowly changing dimension problem means that the proper description of the old dimension data must be used with the old fact data.
● Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
● There are three basic types of slowly changing dimensions:
o Type 1, where a changed dimension attribute is overwritten
o Type 2, where a changed dimension attribute causes a new dimension record to be created
o Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record
Step 9: Deciding the query priorities and the query modes
● The most critical physical design issues affecting the end-user’s perception include:
– physical sort order of the fact table on disk
– presence of pre-stored summaries or aggregations
● Additional physical design issues include administration, backup, indexing performance, and security.
Database Design Methodology for Data Warehouses
● The methodology designs a data mart that supports the requirements of a particular business process and allows the easy integration with other related data marts to form the enterprise-wide data warehouse.
● A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.
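The three types of slowly changing dimension listed under Step 8 differ only in what happens to the old attribute value, which a short Python sketch can make concrete. Everything here is an assumption made for illustration: the dimension is a plain list of dictionaries, and the column names (`surrogate_key`, `natural_key`, `current`, `previous_city`) and the sample customer are hypothetical.

```python
def new_customer_dimension():
    """A one-row customer dimension used as the starting point for each demo."""
    return [
        {"surrogate_key": 1, "natural_key": "C42", "city": "Chennai", "current": True}
    ]


def apply_type1(dimension, natural_key, attribute, new_value):
    """Type 1: overwrite the attribute in place, so the old value is lost."""
    for row in dimension:
        if row["natural_key"] == natural_key:
            row[attribute] = new_value


def apply_type2(dimension, natural_key, attribute, new_value):
    """Type 2: keep the old record and add a new one with a fresh surrogate key."""
    old_row = next(
        r for r in dimension if r["natural_key"] == natural_key and r["current"]
    )
    old_row["current"] = False
    new_row = dict(old_row)
    new_row[attribute] = new_value
    new_row["current"] = True
    new_row["surrogate_key"] = max(r["surrogate_key"] for r in dimension) + 1
    dimension.append(new_row)


def apply_type3(dimension, natural_key, attribute, new_value):
    """Type 3: move the old value to an alternate attribute in the same record."""
    for row in dimension:
        if row["natural_key"] == natural_key:
            row["previous_" + attribute] = row[attribute]
            row[attribute] = new_value


if __name__ == "__main__":
    for change in (apply_type1, apply_type2, apply_type3):
        dim = new_customer_dimension()
        change(dim, "C42", "city", "Madurai")
        print(change.__name__, dim)
```

Under Type 2, old fact rows keep pointing at surrogate key 1 (the Chennai record) while new facts use key 2, which is how the old dimension description stays attached to the old fact data.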
Data Mining:
● Starts by developing an optimal representation of the structure of sample data, during which time knowledge is acquired and extended to larger sets of data.
● Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing.
● Relatively new technology, however already used in a number of industries.
● Retail and marketing
– Identifying buying patterns of customers
– Finding associations among customer demographic characteristics
– Predicting response to mailing campaigns
– Market basket analysis
● Banking
– Detecting patterns of fraudulent credit card use
– Identifying loyal customers
– Predicting customers likely to change their credit card affiliation
– Determining credit card spending by customer groups
● Insurance
– Claims analysis
– Predicting which customers will buy new policies
● Medicine
– Identifying successful medical therapies for different illnesses
Data Mining Operations
● Four main operations include:
– Predictive modeling
– Database segmentation
– Link analysis
● Criteria for selection of tool includes
o Suitability for certain input data types
o Transparency of the mining output
o Tolerance of missing variable values
o Level of accuracy possible
● Uses observations to form a model of the important
characteristics of some phenomenon.
● Uses generalizations of ‘real world’ and ability to fit new data into
a general framework.
● Can analyze a database to determine essential characteristics
(model) about the data set.
● Linear regression attempts to fit a straight line through a plot of the
data, such that the line is the best representation of the average of
all observations at that point in the plot.
● Problem is that the technique only works well with linear data and
is sensitive to the presence of outliers (that is, data values, which
do not conform to the expected norm).
● Although nonlinear regression avoids the main problems of linear
regression, it is still not flexible enough to handle all possible
shapes of the data plot.
● Statistical measurements are fine for building linear models that
describe predictable data points, however, most data is not linear in
nature.
● Aim is to partition a database into an unknown number of
segments, or clusters, of similar records.
● Uses unsupervised learning to discover homogeneous
sub-populations in a database to improve the accuracy of the
profiles.
● Less precise than other operations thus less sensitive to
redundant and irrelevant features.
● Sensitivity can be reduced by ignoring a subset of the
attributes that describe each instance or by assigning a
weighting factor to each variable.
● Applications of database segmentation include customer
profiling, direct marketing, and cross selling.
● Associated with demographic or neural clustering techniques,
which are distinguished by
– Allowable data inputs
– Methods used to calculate the distance between records
– Presentation of the resulting segments for analysis
Link Analysis - Associations Discovery
● Finds items that imply the presence of other items in
the same event.
● Affinities between items are represented by
association rules.
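Association rules of the kind used in associations discovery are usually ranked by support and confidence. Those two measures are standard in the data mining literature but are an addition here, since these notes do not define them. A minimal Python sketch, using made-up shopping baskets:

```python
def support(transactions, items):
    """Fraction of transactions (events) that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical events: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

# Rule under test: bread implies butter.
rule_support = support(baskets, {"bread", "butter"})         # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # 2 of 3 bread baskets
```

A rule is normally kept only if both values exceed thresholds chosen by the analyst.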
● Finds patterns between events such that the presence of one set of
items is followed by another set of items in a database of events
over a period of time.
– e.g. Used to understand long term customer buying
behavior.
● Finds links between two sets of data that are time-dependent, and
is based on the degree of similarity between the patterns that both
time series demonstrate.
● Relatively new operation in terms of commercially available data
mining tools.
● Often a source of true discovery because it identifies outliers,
which express deviation from some previously known expectation
and norm.
● Can be performed using statistics and visualization techniques or
as a by-product of data mining.
● Applications include fraud detection in the use of credit cards and
insurance claims, quality control, and defects tracing.
● Recognizing that a systematic approach is essential to successful
data mining, many vendor and consulting organizations have
specified a process model designed to guide the user through a
sequence of steps that will lead to good results.
● Developed a specification called the Cross Industry Standard
Process for Data Mining (CRISP-DM).
● CRISP-DM specifies a data mining process model that is not
tied to a particular industry or tool.
● The third level specializes these tasks for specific situations. For
instance, the generic task might be cleaning data, and specialised
task could be cleaning of numeric values or categorical values.
● The model also discusses relationships between different DM
tasks. It gives an idealised sequence of actions during a DM project.
Data Mining Tools
– Selection of data mining operations
– Product scalability and performance
– Facilities for understanding results
● Data preparation facilities
– Data preparation is the most time-consuming aspect of
data mining.
– Functions supported include: data preparation, data
cleansing, data describing, data transforming and data
sampling.
● Selection of data mining operations
– Important to understand the characteristics of the
operations (algorithms) to ensure that they meet the user’s
requirements.
● Product scalability and performance
– Capable of dealing with increasing amounts of data,
possibly with sophisticated validation controls.
– Maintaining satisfactory performance may require
investigations into whether a tool is capable of supporting
parallel processing using technologies such as SMP or
MPP.
● Facilities for understanding results
– By providing measures such as those describing accuracy
and significance in useful formats such as confusion
matrices, by allowing the user to perform sensitivity
analysis on the result, and by presenting the result in
alternative ways using for example visualization
techniques.
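As an illustration of the confusion matrices mentioned under facilities for understanding results, the Python sketch below counts how often each actual class was given each predicted class and derives accuracy from those counts. The labels and predictions are invented for the example, and a real tool would present the same counts as a table.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


# Hypothetical fraud-detection results for six transactions.
actual = ["fraud", "ok", "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "ok"]

matrix = confusion_matrix(actual, predicted)

# Accuracy is the share of cases on the diagonal (actual == predicted).
correct = sum(n for (a, p), n in matrix.items() if a == p)
accuracy = correct / len(actual)
```

The off-diagonal cells show the kind of error made: here one fraud case was missed and one normal transaction was flagged.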
Decision Tree
Web Databases:
● Many Web sites today are file-based where each Web document is
stored in separate file.
● Also many Web sites now contain more dynamic information, such
as product and pricing data.
Worldwide collection of interconnected networks.
● Began in late ‘60s in ARPANET, a US DOD project,
investigating how to build networks that could withstand
partial outages.
● Starting with a few nodes, Internet estimated to have over 945
million users by end of 2004.
● 2 billion users projected by 2010.
● About 3.5 billion documents on Internet (550 billion if
intranets/extranets included).
● eCommerce - Customers can place and pay for orders via the
business’s Web site.
● eBusiness - Complete integration of Internet technology into
economic infrastructure of the business.
● Business-to-business transactions may reach $2.1 trillion in Europe
and $7 trillion in US by 2006.
● eCommerce may account for $12.8 trillion in worldwide corporate
revenue by 2006 and could represent 18% of sales in the global
economy.
Intranet and Extranet
● Intranet - Web site or group of sites belonging to an organization,
accessible only by members of that organization.
● Extranet - An intranet that is partially accessible to authorized
outsiders.
● Whereas intranet resides behind firewall and is accessible only to
people who are members of same organization, extranet provides
various levels of accessibility to outsiders.
The Web
Hypermedia-based system that provides a simple ‘point and click’ means of
browsing information on the Internet using hyperlinks.
● Information presented on Web pages, which can contain text,
graphics, pictures, sound, and video.
● Can also contain hyperlinks to other Web pages, which allow users
to navigate in a non-sequential way through information.
● Web consists of network of computers that can act in two roles:
o as servers, providing information;
o as clients (browsers), requesting information.
● Protocol that governs exchange of information between Web server
and browser is HTTP and locations within documents identified as
http://www.w3.org/Markup/MarkUp.html
HyperText Transfer Protocol (HTTP)
● Protocol used to transfer Web pages through Internet.
● Based on request-response paradigm:
o Connection - Client establishes connection with Web
server.
o Request - Client sends request to Web server.
● Content of dynamic Web page is generated each time it is
String of alphanumeric characters that represents location or address of a
● host name,
● query string.
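To make the parts of a URL concrete, the sketch below splits the example address from these notes with Python's standard `urllib.parse` module. The second URL, with a port and a query string, is hypothetical and only there to show those two optional parts.

```python
from urllib.parse import urlsplit, parse_qs

# URL quoted in the notes: protocol, host name, and path.
parts = urlsplit("http://www.w3.org/Markup/MarkUp.html")
protocol = parts.scheme      # "http"
host_name = parts.hostname   # "www.w3.org"
path = parts.path            # "/Markup/MarkUp.html"

# Hypothetical URL for a dynamic page, with a port and a query string.
dynamic = urlsplit("http://www.example.com:8080/staff/search?name=Smith&branch=B003")
port = dynamic.port                # 8080
query = parse_qs(dynamic.query)    # {"name": ["Smith"], "branch": ["B003"]}
```

The query string is what a server-side script reads when it generates a dynamic page.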
Web Services
Collection of functions packaged as single entity and published to
network for use by other programs.
● Web services are important paradigm in building applications and
business processes for the integration of heterogeneous
applications.
● Based on open standards and focus on communication and
collaboration among people and applications.
● Unlike other Web-based applications, Web services have no user
interface and are not targeted for browsers. Instead, consist of
reusable software components designed to be consumed by other
applications.
● Common example is stock quote facility, which receives a request
for current price of a specified stock and responds with requested
price.
● Second example is Microsoft MapPoint Web service that allows
high quality maps, driving directions, and other location
information to be integrated into a user application, business
process, or Web site.
Web Services – Technologies & Standards
● eXtensible Markup Language (XML).
● SOAP (Simple Object Access Protocol) protocol, based on XML,
used for communication over Internet.
● WSDL (Web Services Description Language) protocol, again
based on XML, used to describe the Web service.
● UDDI (Universal Description, Discovery and Integration) protocol
used to register the Web service for prospective users.
Requirements for Web-DBMS Integration
● Ability to access valuable corporate data in a secure manner.
● Data- and vendor-independent connectivity to allow freedom of
choice in DBMS selection.
● Ability to interface to database independent of any proprietary
browser or Web server.
● Connectivity solution that takes advantage of all the features of an
organization’s DBMS.
● Open architecture to allow interoperability with a variety of
systems and technologies. For example:
o different Web servers;
● Before server launches script, prepares number of environment
variables representing current state of the server, who is requesting
the information, and so on.
CGI - Passing Parameters on Command Line
CGI – Advantages
● CGI is the de facto standard for interfacing Web servers with
external applications.
● Possibly most commonly used method for interfacing Web
applications to data sources.
● Advantages:
– simplicity,
– language independence,
– Web server independence,
– wide acceptance.
CGI – Disadvantages
● Server has to generate a new process or thread for each CGI script.
● Security.
HTTP Cookies
● Cookies can make CGI scripts more interactive.
● Cookies are small text files stored on Web client.
● CGI script creates cookie and has Web server send it to client’s
browser to store on hard disk.
● Later, when client revisits Web site and uses a CGI script that
requests this cookie, client’s browser sends information stored in
the cookie.
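The interplay between CGI environment variables and cookies can be sketched in Python as follows. `QUERY_STRING` and `HTTP_COOKIE` are the standard CGI variable names; the `handle_cgi_request` function, the `visitor` cookie, and the sample values are assumptions made for illustration. The function takes the environment as a dictionary so that it can be run without a Web server.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def handle_cgi_request(environ):
    """Return the response text (headers, blank line, body) that a CGI
    script would write to standard output for this request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    headers = []
    if "visitor" in cookies:
        # Returning client: the browser sent the stored cookie back.
        body = "Welcome back, " + cookies["visitor"].value
    else:
        # First visit: ask the browser to store a cookie.
        name = params.get("name", ["guest"])[0]
        new_cookie = SimpleCookie()
        new_cookie["visitor"] = name
        headers.append(new_cookie.output())   # a "Set-Cookie: ..." header line
        body = "Hello, " + name
    headers.append("Content-Type: text/plain")
    return "\r\n".join(headers) + "\r\n\r\n" + body


first_visit = handle_cgi_request({"QUERY_STRING": "name=alice"})
second_visit = handle_cgi_request({"HTTP_COOKIE": "visitor=alice"})
```

In a real deployment the Web server would populate these variables before launching the script, and the dictionary passed in would be `os.environ`.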
● Cookies can be used to store registration information or
preferences (e.g. for virtual shopping cart).
● However, not all browsers support cookies.
Extending the Web Server
● To overcome limitations of CGI, many servers provide an API that
adds functionality to server.
● Two of main APIs are Netscape’s NSAPI and Microsoft’s ISAPI.
● Scripts are loaded in as part of the server, giving back-end
applications full access to all the I/O functions of server.
● One copy of application is loaded and shared between multiple
requests to server.
● Approach more complex than CGI, possibly requiring specialized
programmers.
● Extending Web server is potentially dangerous, since server
executable is being changed.
Comparison of CGI and API
● CGI and API both extend capabilities of server.
● CGI scripts run in environment created by Web server program.
● Scripts only execute once Web server interprets request from
browser, then returns results back to the server.
● API approach not nearly so limited in its ability to communicate.
● API-based extensions are loaded into same address space as Web
server.
Java
● Proprietary language developed by Sun.
● Interesting because of its potential for building Web applications
(applets) and server applications (servlets).
‘A simple, object-oriented, distributed, interpreted, robust, secure,
architecture neutral, portable, high-performance, multi-threaded and
dynamic language’.
● Has a machine-independent target architecture, the Java Virtual
Machine (JVM).
● Since almost every Web browser vendor has already licensed Java
and implemented an embedded JVM, Java applications can
currently be deployed on most end-user platforms.
● Before Java application can be executed, it must first be loaded
into memory.
● Done by Class Loader, which takes ‘.class’ file(s) containing
bytecodes and transfers it into memory.
● Class file can be loaded from local hard drive or downloaded from
network.
● Finally, bytecodes must be verified to ensure that they are valid
and do not violate Java’s security restrictions.
● Loosely speaking, Java is a ‘safe’ C++.
Java 2 Platform
– EJB Entity Beans, components encapsulating some data
contained by the enterprise. Entity Beans are persistent.
● Two types of entity beans:
– Bean-Managed Persistence (BMP), which requires
developer to write code to make bean persist using an
API such as JDBC or SQLJ.
– Container-Managed Persistence (CMP), where
persistence is provided automatically by container.
– A direct mapping of relational database tables to Java
classes (e.g. TopLink from Oracle).
JDBC
● With JDBC, Java can be used as host language for writing database
applications.
● JDBC API consists of two main interfaces: an API for application
writers, and a lower-level driver API for driver writers.
● Applications and applets can access databases using:
– ODBC drivers and existing database client libraries;
– JDBC API with pure Java JDBC drivers.
JDBC - Advantages/Disadvantages
● Disadvantages with this approach:
– Non-pure JDBC driver will not necessarily work with a
Web browser.
– Currently downloaded applet can connect only to database
located on host machine.
– Deployment costs increase.
SQLJ
● Another JDBC-based approach uses Java with static embedded
SQL.
● SQLJ comprises a set of clauses that extend Java to include SQL
constructs as statements and expressions.
● SQLJ translator transforms SQLJ clauses into standard Java code
that accesses database through a CLI.
Comparison of JDBC and SQLJ
● SQLJ is based on static embedded SQL while JDBC is based on
dynamic SQL.
● Thus, SQLJ facilitates static analysis for syntax checking, type
checking, and schema checking, which may help produce more
reliable programs at loss of some functionality.
● It also potentially allows DBMS to generate an execution strategy
for the query, thereby improving performance of the query.
● JDBC is low-level middleware tool with features to interface Java
application with RDBMS.
● Developers need to design relational schema to which they will
map Java objects, and write code to map Java objects to rows of
relations.
● Problems:
o need to be aware of two different paradigms (object and
relational);
o need to design relational schema to map onto object
design;
o need to write mapping code.
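The mapping burden described for low-level database APIs can be seen in a small sketch. It is written in Python with the standard `sqlite3` module standing in for JDBC, purely as an assumption to keep the example self-contained; the `Staff` class, the `staff` table, and the column names are hypothetical. The developer has to design the table and write the code that converts between objects and rows in both directions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Staff:
    staff_no: str
    name: str
    salary: float


def create_schema(conn):
    # The relational schema has to be designed by hand to match the class.
    conn.execute(
        "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, salary REAL)"
    )


def save(conn, staff):
    # Object-to-row mapping code, written by the developer.
    conn.execute(
        "INSERT INTO staff (staff_no, name, salary) VALUES (?, ?, ?)",
        (staff.staff_no, staff.name, staff.salary),
    )


def find_by_name(conn, name):
    # Row-to-object mapping code; the SQL is assembled and checked at run time.
    rows = conn.execute(
        "SELECT staff_no, name, salary FROM staff WHERE name = ?", (name,)
    )
    return [Staff(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
save(conn, Staff("SL21", "John White", 30000.0))
found = find_by_name(conn, "John White")
```

Container-managed approaches aim to generate this mapping code from a declarative description instead.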
EJBs
– an indirection mechanism;
– a bean implementation;
– a deployment description.
● home interface defines methods that manage lifecycle of a bean.
The corresponding server-side implementation classes are
generated at deployment time.
● To provide access to other operations, bean can expose a local
interface (if client and bean are colocated), a remote interface, or
both.
● Local interfaces expose methods to clients running in same
container or JVM.
● Remote interfaces make methods available to clients no matter
where deployed.
● When a client invokes create() method (which returns an interface)
on home interface, EJB container calls ejbCreate() to instantiate
bean, at which point client can access bean through remote or local
interface returned by create().
● Bean implementation is a Java class that implements business logic
defined in remote interface.
● Transactional semantics are described declaratively and captured in
the deployment descriptor.
● Deployment descriptor, written in XML, lists a bean’s properties
and elements, which may include:
– home interface, remote interface, local interface;
– Web service endpoint interface,
– bean implementation class,
– JNDI name for bean, transaction attributes, security
attributes, and per-method descriptors.
Container-Managed Persistence (CMP)
● Instead of writing Java code to implement BMP, CMP is defined
declaratively in deployment descriptor.
● At runtime, container manages bean’s data by interacting with data
source designated in deployment descriptor.
● Following steps need to be followed for CMP:
– Define CMP fields in local interface.
– Define CMP fields in entity bean class implementation.
– Define CMP fields in deployment descriptor.
– Define PK field and its type in deployment descriptor.
Container-Managed Relationships (CMR)
● EJB container can manage relationships between entity beans and
session beans.
● Relationships have a multiplicity, which can be 1:1, 1:M, or M:M,
and a direction, which can be unidirectional or bidirectional.
● Local interfaces provide foundation for CMR.
● With CMR, beans use local interfaces to maintain relationships
with other beans.
● For example, a Staff bean can use collection of PropertyForRent
local interfaces to maintain a 1:M relationship.
● Container can also manage referential integrity.
● CMR relationships are described declaratively in deployment
descriptor file outside enterprise-beans element.
● Need to specify both beans involved in relationship.
● Relationship is defined in ejb-relation element, with each role
defined in ejb-relationship-role element.
● When bean is deployed, the container provider’s tools parse
deployment descriptor and generate code to implement underlying
classes.
EJB Query Language (EJB-QL)
● Used to define queries for entity beans that operate with CMP.
EJB-QL can express queries for two different styles of operations:
– finder methods, which allow results of an EJB-QL query
to be used by clients of the entity bean. Finder methods
are defined in home interface.
● As with CMP and CMR fields, queries are defined in the
deployment descriptor.
● EJB container is responsible for translating EJB-QL queries into
query language of persistent store, resulting in query methods that
<query>
  <query-method>
    <method-name>findAll</method-name>
    <method-params></method-params>
  </query-method>
  <result-type-mapping>Local</result-type-mapping>
  <query-method>
    <method-name>findByStaffName</method-name>
    <method-params>java.lang.String</method-params>
  <result-type-mapping>Local</result-type-mapping>
  <ejb-ql><![CDATA[SELECT OBJECT(s)
    FROM Staff s WHERE s.name = ?1]]>
  </ejb-ql>
  </query-method>
</query>
Java Data Objects (JDO)
– To provide standard interface between application objects
and data sources, such as relational databases, XML
databases, legacy databases, and file systems.
– To provide developers with a transparent Java-centric
mechanism for working with persistent data to simplify
application development. (Aim of JDO was to reduce
need to explicitly code such things as SQL statements and
transaction management into applications).
Java Data Objects (JDO) – Interfaces
● PersistenceCapable makes a Java class capable of being persisted
by a persistence manager. Every class whose instances can be
managed by a JDO PersistenceManager must implement this
interface.
● Most JDO implementations provide an enhancer that transparently
adds code to implement this interface to each persistent class.
● The interface defines methods that allow an application to examine
runtime state of an instance and to get its associated
PersistenceManager if it has one.
● PersistenceManager contains methods to manage the lifecycle of
PersistenceCapable instances and is also the factory for Query and
Transaction instances.
● A PersistenceManager instance supports one transaction at a time
and uses one connection to the underlying data source at a time.
● Query allows applications to obtain persistent instances from data
source. Can be many Query instances associated with a
PersistenceManager and multiple queries may be designated for
simultaneous execution.
● This interface is implemented by each JDO vendor to translate
expressions in JDOQL into native query language of data store.
Java Data Objects (JDO) – Interfaces and Classes
● Extent is a logical view of all objects of a particular class that exist
in the data source.
● Extents are obtained from a PersistenceManager and can be
● Transaction contains methods to mark start/end of transactions.
● JDOHelper class defines static methods that allow a JDO-aware
application to examine runtime state of instances and to get its
associated PersistenceManager if it has one.
JDO – Creating Persistent Classes
1. Ensure each class has a no-arg constructor. If class has no
constructors defined, compiler automatically generates a no-arg
constructor; otherwise, developer will need to specify one.
2. Create a JDO metadata file to identify the persistent classes. The
JDO metadata file is expressed as an XML document.
3. Enhance classes so that they can be used in a JDO runtime
environment. JDO specification describes a number of ways that
classes can be enhanced; however, most common way is using an
enhancer program that reads a set of .class files and JDO metadata
file and creates new .class files that have been enhanced to run in a
JDO environment.
● Thus, any transient instance of a persistent class will become
persistent at commit if it is reachable, directly or indirectly, by a
persistent instance.
● Instances are reachable through either a reference or collection of
references.
● Reachability algorithm is applied to all persistent instances
transitively through all their references to instances in memory,
causing the complete closure to become persistent.
● Allows developers to construct complex object graphs in memory
and make them persistent simply by creating a reference to graph
from a persistent instance.
● Instances have to be explicitly deleted.
JDO Query Language (JDOQL)
● Data source-neutral query language based on Java boolean
expressions.
● Basic JDOQL query has following 3 components:
Collection result = (Collection) query.execute();
Java Servlets
● Servlets are programs that run on Java-enabled Web server and
build Web pages, analogous to CGI.
● Have a number of advantages over CGI:
● Deal directly with processing XML documents.
● Java API for XML Processing (JAXP), processes XML documents
using various parsers and transformations. JAXP supports both
SAX and DOM. Also supports XSLT.
● Java Architecture for XML Binding (JAXB), processes XML
documents using schema-derived JavaBeans component classes.
JAXB provides methods for unmarshalling an XML instance
document into a tree of Java objects, and marshalling tree back
into an XML document.
● SOAP with Attachments API for Java (SAAJ), provides standard
way to send XML documents over Internet from Java platform.
Based on SOAP 1.1 and SOAP with Attachments, which define a
basic framework for exchanging XML messages.
Java Web Services – Procedure-Oriented
● Java API for XML-based RPC (JAX-RPC), sends SOAP method
calls to remote clients over Internet and receives results.
● Client written in language other than Java can access a Web service
● JAXR gives Java developers a uniform way to use business
registries based on open standards (such as ebXML) or industry
consortium-led specifications (such as UDDI).
Microsoft Web Platform - .NET
“Software is delivered as a service, accessible by any device, any
time, any place, and is fully programmable and personalizable.”
● Contains various tools, services, and technologies, such as:
– Windows 2000,
– Exchange Server,
– Visual Studio,
– HTML/XML,
● Allows OLE-oriented applications to share and manipulate sets of
data as objects.
● OLE DB is an object-oriented specification based on C++ API.
● Components can be treated as data consumers and data providers.
Consumers take data from OLE DB interfaces and providers
expose OLE DB interfaces.
OLE DB Architecture
● ASP runs in-process with the server, and is optimized to handle
large volume of users.
● When an ‘.asp’ file is requested, Web server calls ASP, which
reads requested file, executes any commands, and sends generated
HTML page back to browser.
o Support for different cursor types.
o Batch updating.
o Support for limits on number of returned rows.
o Support for multiple recordsets.
● Designed as an easy-to-use interface to OLE DB.
Remote Data Services (RDS)
● Microsoft technology for client-side database manipulation across
Internet.
● Still uses ADO on server-side to execute query and return
recordset to client, which can then execute other queries on
recordset.
● RDS provides mechanism to send updated records back to server.
● Differences:
– JSP benefits from in-built Java security model.
Microsoft .NET
● Number of limitations with Microsoft’s platform:
o a number of languages supported with different
programming models (J2EE composed solely of Java);
o relatively simple user interfaces for Web compared to
traditional Windows user interfaces;
o need to abstract operating system (Windows API difficult
to program).
o Windows Server,
o Application Center (to deploy and manage scalable Web
applications),
o Mobile Information Server (to support handheld devices),
o SQL Server,
o Microsoft .NET Framework (CLR + Class Library).
.NET Framework
● An execution engine that loads, executes, and manages code
compiled into an intermediate bytecode format - Microsoft
Intermediate Language (MSIL) - analogous to Java bytecodes.
● Not interpreted but compiled to native binary format before
execution by a JIT compiler built into CLR.
● Allows one language to call another, and even inherit and modify
objects from another language.
● Collection of reusable classes, interfaces, and types that integrate
with CLR providing standard functionality such as:
– string management, input/output, security management,
– network communications, thread management,
– user interface design features,
– database access and manipulation.
representing both simple data types for objects such as numbers
and text values, and more complex data types for developing user
and Web services. Reengineered version of ASP to
improve performance and scalability.
– user can populate DataSet with data from existing
relational data source using a DataAdapter.
Microsoft Access and Web Page Generation
● Access provides wizards for automatically generating
HTML/XML:
– Static pages: user can export data to HTML format.
– Dynamic pages using ASP: user can export data to an
‘asp’ file on Web server.
– Dynamic pages using HTX/IDC files: user can export data
to HTX/IDC files on server.
– Dynamic pages using data access pages: data access
pages are Web pages bound directly to data in the
database. Can be used like Access forms, except pages are
stored as external files.
– Java, J2EE, EJB, JDBC, and SQLJ for database
connectivity, Java servlets, and JSP. Also supports JNDI
and stored Java procedures.
– OMG’s CORBA technology.
– IIOP for object interoperability and RMI.
– Web services: SOAP, WSDL, UDDI, ebXML, WebDAV,
LDAP.
Oracle Internet Platform
Oracle Application Server (OracleAS)
● A reliable, scalable, secure, middle-tier application server designed
to support eBusiness.
● Currently available in three versions:
– Java Edition: lightweight Web server with minimal
application support;
– Standard Edition: for medium to large Web sites that
handle large volume of transactions;
– Enterprise Edition: Standard Edition + extras.
● Handles all incoming requests received by OracleAS, some
processed by Oracle HTTP Server and some by other areas of
OracleAS.
● Oracle HTTP Server is extended version of Apache Server.
Oracle HTTP Server Modules (mods)
● Oracle has enhanced several of Apache mods, and has added
Oracle-specific ones; e.g.:
– mod_oc4j, routes HTTP requests for J2EE to OracleAS
Containers for J2EE (OC4J);
– mod_plsql, routes requests for stored procedures to
database server;
– mod_fastcgi, enhanced version of CGI that runs programs
in pre-spawned process;
– mod_oradav, provides support for WebDAV;
– mod_ossl, provides standard S-HTTP;
– mod_osso, enables transparent single sign-on.
● A fully compliant J2EE 1.3 server.
● Runs on J2SE and executes and manages J2EE application
components such as:
– Servlets: Servlet container provided that manages
execution of Web components and J2EE applications.
– JSPs: JSP translator provided to convert JSP files into
Java source that container can then compile and execute
as a servlet.
– EJBs: EJB container provided that manages execution of
EJBs for J2EE applications. Container has configurable
settings that customize the underlying support, such as
security, transaction management, JNDI lookups, and
remote connectivity. Container also manages EJB
lifecycles, database connection resource pooling, data
persistence, and access to J2EE APIs.
● OracleAS supports both JDBC and SQLJ database access
mechanisms, and provides following drivers:
– J2EE Connectors, part of J2EE platform, provide a
Java-based solution for connecting various application
servers and EISs.
– DataDirect Connect Type 4 JDBC drivers, for connecting
to non-Oracle databases.
● OracleAS provides facilities for developing, deploying, and
managing Web services; e.g.:
– Web services can be developed using stateless and stateful
Java classes, stateless session EJBs, and stateless PL/SQL
stored procedures.
– Web Service HTML/XML Streams Processing Wizard
assists developers in creating an EJB whose methods
access and process HTML or XML streams.
– OracleAS supports SOAP, WSDL, and UDDI.
Business Components for Java (BC4J)
● A Java and XML framework that enables development,
deployment, and customization of multi-tier database applications
from reusable business components.
● Application developers can use BC4J to author and test business
logic in components that automatically integrate with databases,
reuse business logic through SQL-based views, and access/update
these views from servlets, JSP, and Java Swing clients.
● These services deliver dynamic content to client browsers,
supporting servlets, JSP, Perl/CGI scripts, PL/SQL pages,
forms, and business intelligence.
– Oracle Forms Services, to run Oracle Forms over
Internet;
● Includes Mapping Workbench, a visual tool to map any object
model to any relational schema.
Oracle Portal
● A portal is Web-based application that provides a common,
integrated entry point for accessing dissimilar data types on a
single Web page.
● A portal is divided into a number of portlets.
● Provides services and tools for delivering information and
applications to mobile devices.
● Includes Multi-Channel Server (MCS) that supports development
of applications that are accessible from multiple channels including
wireless browsers, voice, and messaging.
● Also allows portal sites to be created that use Web pages, Java
applications, and XML-based applications.
Business Intelligence
– Oracle Personalization enables users to track the activity of a
specific user and personalize information for that user.
wireless, and satellite communications, it will soon be possible for
mobile users to access any data, anywhere, at any time. However,
business etiquette, practicalities, security, and costs may still limit
communication such that it is not possible to establish online
connections for as long as users want, whenever they want. Mobile
databases offer a solution for some of these restrictions.
A database that is portable and physically separate from the
corporate database server, but is capable of communicating with
that server from remote sites and so allows corporate data to be shared,
is called a mobile database.
The components of a mobile database environment include:
● corporate database server and DBMS that manages and stores the
corporate data and provides corporate applications;
● remote database and DBMS that manages and stores the mobile
data and provides mobile applications;
● mobile database platform that includes laptop, PDA, or other
Internet access devices;
● two-way communication links between the corporate and mobile
DBMS.
Mobile DBMSs
● All the major DBMS vendors now offer a mobile DBMS.
● Most vendors promote their mobile DBMS as being capable of
communicating with a range of major relational DBMSs and in
providing database services that require limited computing
resources to match those currently provided by mobile devices.
● The additional functionality required of mobile DBMSs includes
the ability to:
o communicate with the centralized database server
through modes such as wireless or Internet access;
o replicate data on the centralized database server and
mobile device;
o synchronize data on the centralized database server and
mobile device;
o capture data from various sources such as the Internet;
o manage data on the mobile device;
o analyze data on a mobile device;
o create customized mobile applications.
Applications:
● This feature is especially useful to geographically dispersed
organizations. Typical examples might include electronic valets,
news reporting, brokerage services, and automated salesforces.
Disadvantage:
● There are a number of hardware and software problems that must
be resolved before the capabilities of mobile computing can be
fully utilized.
● Some of the software problems, which may involve data
management, transaction management, and database recovery, have
their origins in distributed database systems.
● In mobile computing, however, these problems are more difficult,
mainly because of the limited and intermittent connectivity
afforded by wireless communications, the limited life of the power
supply (battery) of mobile units, and the changing topology of the
network.
● In addition, mobile computing introduces new architectural
possibilities and challenges.
Mobile Computing Architecture
● The general architecture of a mobile platform is a distributed
architecture where a number of computers, generally referred to as
Fixed Hosts and Base Stations, are interconnected through a
high-speed wired network. Fixed hosts are general purpose
computers that are not typically equipped to manage mobile units
but can be configured to do so.
● Base stations function as gateways to the fixed network for the
Mobile Units.
● They are equipped with wireless interfaces and offer network
access services of which mobile units are clients.
Wireless Communications
● The wireless medium over which mobile units and base stations
communicate has bandwidth significantly lower than that of a
wired network.
● The current generation of wireless technology has data rates that
range from the tens to hundreds of kilobits per second (2G cellular
telephony) to tens of megabits per second (wireless Ethernet,
popularly known as WiFi).
● Modern (wired) Ethernet, by comparison, provides data rates on
the order of hundreds of megabits per second.
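As a rough illustration of the wireless and wired data rates quoted in this section, the sketch below estimates how long a 1 MB query result would take to transfer over each kind of link. The specific rates are assumptions picked from within the quoted ranges, and the calculation ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes: int, rate_bits_per_s: float) -> float:
    """Idealized transfer time: payload bits divided by link rate.

    Ignores latency, retransmission, and protocol overhead.
    """
    return size_bytes * 8 / rate_bits_per_s


# Illustrative rates only (assumed values within the ranges quoted in the notes).
LINKS = {
    "2G cellular (~50 kbit/s)": 50_000,
    "WiFi (~20 Mbit/s)": 20_000_000,
    "Wired Ethernet (~100 Mbit/s)": 100_000_000,
}

if __name__ == "__main__":
    payload = 1_000_000  # a 1 MB result set
    for name, rate in LINKS.items():
        print(f"{name}: {transfer_seconds(payload, rate):.2f} s")
```

Under these assumptions the same result set takes minutes on a 2G link and a fraction of a second on WiFi or wired Ethernet, which is one reason mobile clients tend to cache data locally.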
● The characteristics of mobile computing include high
communication latency, intermittent wireless connectivity, limited
battery life, and changing client location. Latency is caused by the
processes unique to the wireless medium, such as coding data for
wireless transfer, and tracking and filtering wireless signals at the
receiver.
● Battery life is directly related to battery size, and indirectly related
to the mobile device's capabilities.
● Intermittent connectivity can be intentional or unintentional.
● Unintentional disconnections happen in areas wireless signals
cannot reach, e.g., elevator shafts or subway tunnels.
● Intentional disconnections occur by user intent, e.g., during an
airplane takeoff, or when the mobile device is powered down.
● Clients are expected to move, which alters the network topology
and may cause their data requirements to change.
● All of these characteristics impact data management, and robust
mobile applications must consider them in their design.
Data Management Issues
From a data management standpoint, mobile computing may be considered
a variation of distributed computing.
Mobile databases can be distributed under two possible scenarios:
1. The entire database is distributed mainly among the wired components,
possibly with full or partial replication. A base station or fixed host
manages its own database with a DBMS-like functionality, with additional
functionality for locating mobile units and additional query and transaction
management features to meet the requirements of mobile environments.
2. The database is distributed among wired and wireless components. Data
management responsibility is shared among base stations or fixed hosts and
mobile units. Hence, the distributed data management issues can be applied
to mobile databases with the following additional considerations and
variations:
• Data distribution and replication: Data is unevenly distributed among the
base stations and mobile units. The consistency constraints compound the
problem of cache management. Caches attempt to provide the most
frequently accessed and updated data to mobile units that process their own
transactions and may be disconnected over long periods.
• Transaction models: Issues of fault tolerance and correctness of
transactions are aggravated in the mobile environment. A mobile transaction
is executed sequentially through several base stations and possibly on
multiple data sets depending upon the movement of the mobile unit. Central
coordination of transaction execution is lacking. Moreover, a mobile
transaction is expected to be long-lived because of disconnection in mobile
units. Hence, traditional ACID properties of transactions may need to be
modified and new transaction models must be defined.
• Query processing: Awareness of where data is located is important and
affects the cost benefit analysis of query processing. Query optimization is
more complicated because of mobility and rapid resource changes of mobile
units. The query response needs to be returned to mobile units that may be
in transit or may cross cell boundaries yet must receive complete and
correct query results.
• Recovery and fault tolerance: The mobile database environment must
deal with site, media, transaction, and communication failures. Site failure
of a mobile unit is frequent due to limited battery power. A voluntary
shutdown of a mobile unit should not be treated as a failure. Transaction
failures are routine during handoff when a mobile unit crosses cells. The
transaction manager should be able to deal with such frequent failures.
• Mobile database design: The global name resolution problem for handling
queries is compounded because of mobility and frequent shutdown. Mobile
database design must consider many issues of metadata management, for
example, the constant updating of location information.
applying these (spatial) queries in order to refresh the cache poses a
problem.
• Division of labor: Certain characteristics of the mobile environment force
a change in the division of labor in query processing. In some cases, the
client must function independent of the server.
• Security: Mobile data is less secure than that which is left at the fixed
location. Proper techniques for managing and authorizing access to critical
data become more important in this environment. Data is also more volatile,
and techniques must be able to compensate for its loss.
Application: Intermittently Synchronized Databases
● One mobile computing scenario is becoming increasingly
commonplace as people conduct their work away from their offices
and homes and perform a wide range of activities and functions: all
kinds of sales, particularly in pharmaceuticals, consumer goods,
and industrial parts; law enforcement; insurance and financial
consulting and planning; real estate or property management
activities; courier and transportation services, and so on.
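The idea of a mobile client that works while disconnected and synchronizes with the corporate server when a link is available can be sketched in a few lines of Python. The class names Server and MobileClient, and the last-writer-wins rule used during synchronization, are assumptions made for this illustration; a real mobile DBMS would need conflict detection and recovery as discussed under the data management issues.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stands in for the corporate database server (hypothetical)."""
    data: dict = field(default_factory=dict)


@dataclass
class MobileClient:
    """Hypothetical mobile unit holding a local replica and a log of offline updates."""
    server: Server
    connected: bool = False
    local: dict = field(default_factory=dict)    # local replica of the data
    pending: list = field(default_factory=list)  # updates not yet sent to the server

    def write(self, key, value):
        # Always apply the update locally so work can continue offline.
        self.local[key] = value
        self.pending.append((key, value))
        if self.connected:
            self.synchronize()

    def synchronize(self):
        """Push queued updates to the server, then refresh the local replica."""
        if not self.connected:
            raise ConnectionError("no link to the corporate server")
        for key, value in self.pending:
            # Last-writer-wins: a simplifying assumption, not a recommendation.
            self.server.data[key] = value
        self.pending.clear()
        self.local = dict(self.server.data)
```

A salesperson's device, for example, would call write() for each order taken in the field and synchronize() once it is back within network coverage.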
XML and Web Databases:
Data that may be irregular or incomplete and have a structure that may
change rapidly or unpredictably.
o Semistructured data is data that has some structure, but structure
may not be rigid, regular, or complete.
o Generally, data does not conform to fixed schema (sometimes use
terms schema-less or self-describing).
● Information normally associated with schema is contained within
data itself.
● Some forms of semistructured data have no separate schema, in
others it exists but only places loose constraints on data.
● Has gained importance recently for various reasons:
o may be desirable to treat Web sources like a database, but
cannot constrain these sources with a schema;
o may be desirable to have a flexible format for data
exchange between disparate databases;
o emergence of XML as standard for data representation
and exchange on the Web, and similarity between XML
documents and semistructured data.
XML (eXtensible Markup Language)
A meta-language (a language for describing other languages) that enables
designers to create their own customized tags to provide functionality not
available with HTML.
● Most documents on Web currently stored and transmitted in
HTML.
● One strength of HTML is its simplicity. Simplicity may also be
one of its weaknesses, with users wanting tags to simplify some
tasks and make HTML documents more attractive and dynamic.
viewable Web documents.
● W3C has produced XML, which could preserve general
application independence that makes HTML portable and
powerful.
● XML is a restricted version of SGML, designed especially for Web
documents.
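To make the notion of self-describing, semistructured data concrete, the Python sketch below parses two STAFF records whose structure differs and recovers the structure of each record from the data itself. The sample records and the helper name describe are assumptions made for illustration; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the two STAFF records do not share one fixed schema.
DOC = """
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
    <SALARY>30000</SALARY>
  </STAFF>
  <STAFF>
    <STAFFNO>SG37</STAFFNO>
    <NAME>Ann Beech</NAME>
  </STAFF>
</STAFFLIST>
"""


def describe(elem):
    """Return the structure of one record as found in the data itself."""
    return {
        "attributes": sorted(elem.attrib),
        "children": [child.tag for child in elem],
    }


root = ET.fromstring(DOC)
shapes = [describe(staff) for staff in root.findall("STAFF")]

if __name__ == "__main__":
    for shape in shapes:
        print(shape)
```

The second record has no branchNo attribute, no SALARY element, and an unstructured NAME, yet both records are processed by the same code because the tags carry the schema information.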
● XML attempts to provide a similar function to SGML, but is less
Advantages of XML
● More advanced search engines
● XML declaration: optional at start of XML document.
● Entity references: serve various purposes, such as shortcuts to often
repeated text or to distinguish reserved characters from content.
● Comments: enclosed in <!-- and --> tags.
● CDATA sections: instructs XML processor to ignore markup
characters and pass enclosed text directly to application.
● Processing instructions: can also be used to provide information to
application.
XML – Ordering
● Semistructured data model described assumes collections are
unordered.
● In XML, elements are ordered.
● In contrast, in XML attributes are unordered.
Document Type Definitions (DTDs)
Defines the valid syntax of an XML document.
● Lists element names that can occur in document, which
elements can appear in combination with which other ones,
how elements can be nested, what attributes are available for
each element type, and so on.
● Term vocabulary sometimes used to refer to the elements used
in a particular application.
● Grammar specified using EBNF, not XML.
● Although optional, DTD is recommended for document
conformity.
DTDs – Element Type Declarations
● Identify the rules for elements that can occur in the XML
document. Options for repetition are:
– * indicates zero or more occurrences for an element;
– + indicates one or more occurrences for an element;
– ? indicates either zero occurrences or exactly one
occurrence for an element.
– Name with no qualifying punctuation must occur exactly
once.
● Commas between element names indicate they must occur in
succession; if commas omitted, elements can occur in any order.
DTDs – Attribute List Declarations
● Identify which elements may have attributes, what attributes they
may have, what values attributes may hold, plus optional defaults.
Some types:
● CDATA: character data, containing any text.
● ID: used to identify individual elements in document (ID is an
element name).
● IDREF/IDREFS: must correspond to value of ID attribute(s) for
some element in document.
● List of names: values that attribute can hold (enumerated type).
DTDs – Element Identity, IDs, IDREFs
● ID allows unique key to be associated with an element.
● IDREF allows an element to refer to another element with the
designated key, and attribute type IDREFS allows an element to
refer to multiple elements.
● To loosely model relationship Branch Has Staff:
– <!ATTLIST STAFF staffNo ID #REQUIRED>
– <!ATTLIST BRANCH staff IDREFS #IMPLIED>
DTDs – Document Validity
● Two levels of document processing: well-formed and valid.
● Non-validating processor ensures an XML document is
well-formed before passing information on to application.
● XML document that conforms to structural and notational rules of
XML is considered well-formed; e.g.:
– the XML declaration, if present, must come first, e.g.
<?xml version="1.0"?>;
– all elements must be within one root element;
– elements must be nested in a tree structure without any
overlap;
● Validating processor will not only check that an XML document is
well-formed but that it also conforms to a DTD, in which case
XML document is considered valid.
DOM and SAX
● XML APIs generally fall into two categories: tree-based and
event-based.
● DOM (Document Object Model) is tree-based API that provides
object-oriented view of data.
● API was created by W3C and describes a set of platform- and
language-neutral interfaces that can represent any well-formed
XML/HTML document.
● Builds in-memory representation of document and provides classes
and methods to allow an application to navigate and process the
tree.
Representation of Document as Tree-Structure
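The well-formedness rules and the tree-based DOM view can be tried out with Python's standard xml.dom.minidom module, as in the sketch below. The sample documents and the helper name is_well_formed are assumptions for illustration. Note that minidom is a non-validating processor: it checks well-formedness only and does not check a document against a DTD.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (no DTD validation)."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# Hypothetical sample documents.
GOOD = ('<?xml version="1.0"?>'
        '<STAFFLIST><STAFF staffNo="SL21"><LNAME>White</LNAME></STAFF></STAFFLIST>')
OVERLAP = "<STAFFLIST><STAFF><LNAME>White</STAFF></LNAME></STAFFLIST>"  # tags overlap
TWO_ROOTS = "<STAFF/><STAFF/>"                                          # no single root

# DOM: the whole document is held in memory as a tree that can be navigated.
dom = minidom.parseString(GOOD)
root = dom.documentElement
staff = root.getElementsByTagName("STAFF")[0]
surname = staff.getElementsByTagName("LNAME")[0].firstChild.data

if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(OVERLAP), is_well_formed(TWO_ROOTS))
    print(root.tagName, staff.getAttribute("staffNo"), surname)
```

An event-based (SAX) parser would instead report start-tag, text, and end-tag events as it reads the document, without building the tree.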
<STAFFLIST xmlns=“http://www.dreamhome.co.uk/branch5/”
xmlns:hq = “http://www.dreamhome.co.uk/HQ/”>
● XSL created to define how XML data is rendered and to define
how one XML document can be transformed into another
document.
XSLT (XSL Transformations)
● A subset of XSL, XSLT is a language in both markup and
programming sense, providing a mechanism to transform XML
structure into either another XML structure, HTML, or any number
of other text-based formats (such as SQL).
● XSLT’s main ability is to change the underlying structures rather
than simply the media representations of those structures, as with
CSS.
● XSLT is important because it provides a mechanism for
dynamically changing the view of a document and for filtering
data.
● Also robust enough to encode business rules and it can generate
graphics (not just documents) from data.
● Can even handle communicating with servers (scripting modules
can be integrated into XSLT) and can generate the appropriate
messages within body of XSLT itself.
XPath
Declarative query language for XML that provides simple syntax
for addressing parts of an XML document.
● Designed for use with XSLT (for pattern matching) and XPointer
(for addressing).
● With XPath, collections of elements can be retrieved by specifying
a directory-like path, with zero or more conditions placed on the
path.
● Uses a compact, string-based syntax, rather than a structural
XML-element based syntax, allowing XPath expressions to be
used both in XML attributes and in URIs.
XPointer
Provides access to values of attributes or content of elements
anywhere within an XML document.
● Basically an XPath expression occurring within a URI.
● Among other things, with XPointer can link to sections of text,
select particular elements or attributes, and navigate through
elements.
● Can also select data contained within more than one set of nodes,
which cannot do with XPath.
XLink
Allows elements to be inserted into XML documents to create and
describe links between resources.
● Uses XML syntax to create structures that can describe links
similar to simple unidirectional hyperlinks of HTML as well as
more sophisticated links.
● Two types of XLink: simple and extended.
● Simple link connects a source to a destination resource; an
extended link connects any number of resources.
XHTML (eXtensible HTML) 1.0
● Reformulation of HTML 4.01 in XML 1.0 and is intended to be
next generation of HTML.
● Basically a stricter and cleaner version of HTML; e.g.:
– tags and attributes must be in lowercase;
– all XHTML elements must have an end-tag;
– attribute values must be quoted and minimization is not
allowed;
– ID attribute replaces the name attribute;
– documents must conform to XML rules.
Simple Object Access Protocol (SOAP)
● An XML-based messaging protocol that defines a set of rules for
structuring messages.
● Protocol can be used for simple one-way messaging but also useful
for performing RPC-style request-response dialogues.
● Not tied to any particular operating system or programming
language nor any particular transport protocol, although HTTP is
popular.
● Important advantage of SOAP is that most firewalls allow HTTP to
pass right through, facilitating point-to-point SOAP data
exchanges.
● SOAP message is an XML document containing:
o A required Envelope element that identifies the
XML document as a SOAP message.
o An optional Header element that contains
application specific information such as
authentication or payment information.
o A required Body element that contains
call and response information.
o An optional Fault element that provides
information about errors that occurred while
processing message.
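The structure of a SOAP message (Envelope, optional Header, required Body) can be illustrated by building one with Python's standard ElementTree module. This is a minimal sketch: the namespace is the SOAP 1.1 envelope namespace, while the operation name getStaffSalary, the authToken header entry, and the helper build_envelope are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(body_child, header_child=None):
    """Wrap a payload element in a SOAP Envelope with an optional Header."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")      # required
    if header_child is not None:
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # optional
        header.append(header_child)
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")  # required
    body.append(body_child)
    return envelope


# Hypothetical RPC-style request: ask for the salary of one member of staff.
request = ET.Element("getStaffSalary")
ET.SubElement(request, "staffNo").text = "SL21"

# Hypothetical application-specific header entry (e.g. authentication).
token = ET.Element("authToken")
token.text = "example-token"

message = build_envelope(request, token)
xml_text = ET.tostring(message, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

The resulting text would typically be sent as the body of an HTTP POST; a Fault element, when present, appears inside the Body of the response.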
● Based on industry standards including HTTP, XML, XML
Schema, SOAP, and WSDL.
● Two types of UDDI registries: public and private.
WSDL and UDDI
● XML schema is the definition (both in terms of its organization
and its data types) of a specific XML structure.
● XML Schema language specifies how each type of element in
schema is defined and the element’s data type.
● Schema is an XML document, and so can be edited and processed
by same tools that read the XML it describes.
XML Schema – Simple Types
● Elements that do not contain other elements or attributes are of
type simpleType.
</xsd:group>
Constraints
● XML Schema provides XPath-based features for specifying
uniqueness constraints and corresponding reference constraints that
will hold within a certain scope.
<xsd:unique name="NAMEDOBUNIQUE">
<xsd:selector xpath="STAFF"/>
<xsd:field xpath="NAME/LNAME"/>
<xsd:field xpath="DOB"/>
</xsd:unique>
Key Constraints
● Similar to uniqueness constraint except the value has to be
non-null. Also allows the key to be referenced.
<xsd:key name="STAFFNOISKEY">
<xsd:selector xpath="STAFF"/>
<xsd:field xpath="STAFFNO"/>
</xsd:key>
Resource Description Framework (RDF)
● Even XML Schema does not provide the support for semantic
interoperability required.
● For example, when two applications exchange information using
XML, both agree on use and intended meaning of the document
structure.
● Must first build a model of the domain of interest, to clarify what
kind of data is to be sent from first application to second.
● However, as XML Schema just describes a grammar, there are
many different ways to encode a specific domain model into an
XML Schema, thereby losing the direct connection from the
domain model to the Schema.
● Problem compounded if third application wishes to exchange
information with other two.
● Not sufficient to map one XML Schema to another, since the task
is not to map one grammar to another grammar, but to map objects
and relations from one domain of interest to another.
● Three steps required:
o reengineer original domain models from
XML Schema;
o define mappings between the objects in
the domain models;
o define translation mechanisms for the
XML documents, for example using
XSLT.
● RDF is infrastructure that enables encoding, exchange, and reuse
of structured meta-data.
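What the xsd:unique and xsd:key constraints given earlier in this section require of a document can be sketched by checking them by hand in Python. This is not an XML Schema processor: the helper names duplicates and key_violations and the sample data are assumptions, and the selector and field strings simply reuse the XPath expressions from the two constraint definitions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data containing one uniqueness violation and two key violations.
DOC = """
<STAFFLIST>
  <STAFF><STAFFNO>SL21</STAFFNO><NAME><LNAME>White</LNAME></NAME><DOB>1945-10-01</DOB></STAFF>
  <STAFF><STAFFNO>SG37</STAFFNO><NAME><LNAME>Beech</LNAME></NAME><DOB>1960-11-10</DOB></STAFF>
  <STAFF><STAFFNO>SG37</STAFFNO><NAME><LNAME>White</LNAME></NAME><DOB>1945-10-01</DOB></STAFF>
  <STAFF><NAME><LNAME>Ford</LNAME></NAME></STAFF>
</STAFFLIST>
"""


def duplicates(root, selector, fields):
    """Uniqueness check: field-value combinations that occur more than once.

    Elements missing any of the fields are skipped, since a uniqueness
    constraint only applies where all fields are present.
    """
    rows = [tuple(e.findtext(f) for f in fields) for e in root.findall(selector)]
    rows = [row for row in rows if None not in row]
    return [row for row, count in Counter(rows).items() if count > 1]


def key_violations(root, selector, field_path):
    """Key check: the field must be present (non-null) and unique."""
    values = [e.findtext(field_path) for e in root.findall(selector)]
    missing = sum(1 for v in values if v is None)
    present = Counter(v for v in values if v is not None)
    repeated = [v for v, count in present.items() if count > 1]
    return missing, repeated


root = ET.fromstring(DOC)
name_dob_duplicates = duplicates(root, "STAFF", ["NAME/LNAME", "DOB"])
staffno_problems = key_violations(root, "STAFF", "STAFFNO")
```

The difference between the two constraints shows up in the last STAFF record: it has no DOB, so the uniqueness check ignores it, but its missing STAFFNO is reported by the key check.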
XML Query Languages
● Data extraction, transformation, and integration are
well-understood database issues that rely on a query language.
● SQL and OQL do not apply directly to XML because of the
irregularity of XML data.
● However, XML data is similar to semistructured data. There are
many semistructured query languages that can query XML
documents, including XML-QL, UnQL, and XQL.
● All have notion of a path expression for navigating nested structure
of XML.
Example XML-QL
Find surnames of staff who earn more than £30,000.
WHERE <STAFF>
<SALARY> $S </SALARY>
<NAME><FNAME> $F </FNAME> <LNAME> $L
</LNAME></NAME>
</STAFF> IN "http://www.dh.co.uk/staff.xml",
$S > 30000
CONSTRUCT <LNAME> $L </LNAME>
XML Query Working Group
● W3C formed an XML Query Working Group in 1999 to produce a
data model for XML documents, set of query operators on this
model, and query language based on query operators.
● Queries operate on single documents or fixed collections of
documents, and can select entire documents or subtrees of
documents that match conditions based on document
content/structure.
● Queries can also construct new documents based on what has been
selected.
● Ultimately, collections of XML documents will be accessed like
databases.
● Working Group has produced several documents, including:
o XML Query (XQuery) Requirements;
o XQuery 1.0 and XPath 2.0 Data Model;
o XQuery 1.0 and XPath 2.0 Formal Semantics;
o XQuery 1.0 – A Query Language for XML;
o XQuery 1.0 and XPath 2.0 Functions and
Operators;
o XSLT 2.0 and XQuery 1.0 Serialization.
XML Query Requirements
● Specifies goals, usage scenarios, and requirements for XQuery
Data Model and query language. For example:
– language must be declarative and must be defined
independently of any protocols with which it is used;
– queries should be possible whether or not a schema exists;
– language must support both universal and existential
quantifiers on collections and it must support aggregation,
sorting, nulls, and be able to traverse inter- and
intra-document references.
XQuery
● XQuery derived from XML query language called Quilt, which has
borrowed features from XPath, XML-QL, SQL, OQL, Lorel,
XQL, and YATL.
● Like OQL, XQuery is a functional language in which a query is
represented as an expression.
● XQuery supports several kinds of expression, which can be nested
(supporting notion of a subquery).
XQuery – Path Expressions
● Uses syntax of XPath.
● In XQuery, result of a path expression is ordered list of nodes,
including their descendant nodes, ordered according to their
position in original hierarchy, top-down, left-to-right order.
● Result of path expression may contain duplicate values.
● Each step in path expression represents movement through
document in particular direction, and each step can eliminate nodes
by applying one or more predicates.
● Result of each step is list of nodes that serves as starting point for
next step.
● Path expression can begin with an expression that identifies a
specific node, such as function doc(string), which returns root node
of named document.
● Query can also contain path expression beginning with "/" or "//",
which represents an implicit root node determined by the
environment in which query is executed.
Example – XQuery Path Expressions
Find staff number of first member of staff in our XML document.
doc("staff_list.xml")/STAFFLIST/STAFF[1]//STAFFNO
● Four steps:
– first opens staff_list.xml and returns its document node;
– second uses /STAFFLIST to select STAFFLIST element
at top;
– third locates first STAFF element that is child of root
element;
– fourth finds STAFFNO elements occurring anywhere
within this STAFF element.
● Knowing structure of document, could also express this as:
doc("staff_list.xml")//STAFF[1]/STAFFNO
doc("staff_list.xml")/STAFFLIST/STAFF[1]/STAFFNO
Find staff numbers of first two members of staff.
doc("staff_list.xml")/STAFFLIST/STAFF[1 TO 2]/STAFFNO
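The step-by-step evaluation of path expressions can be tried with the limited XPath subset in Python's standard ElementTree module. This sketch is not an XQuery processor: the sample document is hypothetical, the paths are relative to the STAFFLIST root element rather than to a document node, and the "first two" case uses list slicing because the XPath subset in ElementTree has no range predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for staff_list.xml.
STAFF_LIST = """
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
  </STAFF>
  <STAFF branchNo="B003">
    <STAFFNO>SG37</STAFFNO>
    <NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME>
  </STAFF>
  <STAFF branchNo="B005">
    <STAFFNO>SL41</STAFFNO>
    <NAME><FNAME>Julie</FNAME><LNAME>Lee</LNAME></NAME>
  </STAFF>
</STAFFLIST>
"""

root = ET.fromstring(STAFF_LIST)  # root is the STAFFLIST element

# Staff number of the first member of staff: position predicate [1].
first_staffno = [e.text for e in root.findall("STAFF[1]/STAFFNO")]

# Surnames of staff at branch B005: attribute predicate, then descendant step.
surnames_b005 = [e.text for e in root.findall("STAFF[@branchNo='B005']//LNAME")]

# Staff numbers of the first two members of staff: slice the ordered result list.
first_two = [s.findtext("STAFFNO") for s in root.findall("STAFF")[:2]]

if __name__ == "__main__":
    print(first_staffno, surnames_b005, first_two)
```

Each findall call returns nodes in document order, which mirrors the ordered node list that a path expression produces.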
Example – XQuery Path Expressions
Find surnames of staff at branch B005.
doc("staff_list.xml")/STAFFLIST/STAFF[@branchNo = "B005"]//LNAME
● Five steps:
– first two as before;
– third uses /STAFF to select STAFF elements within
STAFFLIST element;
– fourth consists of predicate that restricts STAFF elements
to those with branchNo attribute = B005;
– fifth selects LNAME element(s) occurring anywhere
within these elements.
XQuery – FLWOR Expressions
● FLWOR ("flower") expression is constructed from FOR, LET,
WHERE, ORDER BY, RETURN clauses.
● FLWOR expression starts with one or more FOR or LET
clauses in any order, followed by optional WHERE clause,
optional ORDER BY clause, and required RETURN clause.
● FOR and LET clauses serve to bind values to one or more
variables using expressions (e.g., path expressions).
● FOR used for iteration, associating each specified variable
with expression that returns list of nodes.
● FOR clause can be thought of as iterating over nodes returned
by its respective expression.
● LET clause also binds one or more variables to one or more
expressions but without iteration, resulting in single binding
for each variable.
● Optional WHERE clause specifies one or more conditions to
restrict tuples generated by FOR and LET.
● RETURN clause evaluated once for each tuple in tuple stream
and results concatenated to form result.
● ORDER BY clause, if specified, determines order of the tuple
stream which, in turn, determines order in which RETURN
clause is evaluated using variable bindings in the respective
tuples.
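The roles of the FOR, LET, WHERE, ORDER BY, and RETURN clauses can be mirrored in ordinary Python, which may help in reading FLWOR expressions. The sketch below is an analogy rather than XQuery: the sample data, the function name staff_earning_over, and the salary threshold are assumptions, and each line is annotated with the clause it corresponds to.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for staff_list.xml.
STAFF_LIST = """
<STAFFLIST>
  <STAFF><STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY></STAFF>
  <STAFF><STAFFNO>SG37</STAFFNO><SALARY>12000</SALARY></STAFF>
  <STAFF><STAFFNO>SA9</STAFFNO><SALARY>9000</SALARY></STAFF>
  <STAFF><STAFFNO>SG5</STAFFNO><SALARY>24000</SALARY></STAFF>
</STAFFLIST>
"""


def staff_earning_over(root, threshold):
    """Staff numbers of staff earning more than threshold, ordered by staff number."""
    tuples = []
    for s in root.findall("STAFF"):            # FOR: iterate over the STAFF nodes
        salary = int(s.findtext("SALARY"))     # LET: one binding per iteration
        if salary > threshold:                 # WHERE: filter the tuple stream
            tuples.append((s.findtext("STAFFNO"), salary))
    tuples.sort(key=lambda pair: pair[0])      # ORDER BY: order the tuple stream
    return [staffno for staffno, _ in tuples]  # RETURN: evaluated once per tuple


root = ET.fromstring(STAFF_LIST)
well_paid = staff_earning_over(root, 20000)

if __name__ == "__main__":
    print(well_paid)
```

As in a FLWOR expression, the ordering is applied to the stream of bindings before the result for each binding is produced.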
FOR $S IN doc("staff_list.xml")//STAFF
Example – Joining Two Documents
List each branch office and staff who work there.
<BRANCHLIST>
FOR $B IN
distinct-values(doc("staff_list.xml")//@branchNo)
ORDER BY $B
RETURN
<BRANCHNO> { $B/text() } {
FOR $S IN doc("staff_list.xml")//STAFF
WHERE $S/@branchNo = $B
ORDER BY $S/STAFFNO
RETURN $S/STAFFNO, $S/NAME, $S/POSITION, $S/SALARY
}
</BRANCHNO>
</BRANCHLIST>
Example – User-Defined Function
Function to return staff at a given branch.
DEFINE FUNCTION staffAtBranch($bNo) AS element()* {
FOR $S IN doc("staff_list.xml")//STAFF
WHERE $S/@branchNo = $bNo
ORDER BY $S/STAFFNO
RETURN $S/STAFFNO, $S/NAME,
$S/POSITION, $S/SALARY
}
staffAtBranch($B)
XML Information Set (Infoset)
● Abstract description of information available in well-formed XML
document that meets certain XML namespace constraints.
● XML Infoset is attempt to define set of terms that other XML
specifications can use to refer to the information items in a
well-formed (although not necessarily valid) XML document.
● Does not attempt to define complete set of information, nor does it
represent minimal information that an XML processor should
return to an application.
● It also does not mandate a specific interface or class of interfaces
(although Infoset presents information as tree).
● XML document’s information set consists of two or more
information items.
● An information item is an abstract representation of a component
of an XML document such as an element, attribute, or processing
instruction.
● Each information item has a set of associated properties; e.g.,
document information item properties include:
o [document element];
o [children];
o [notations]; [unparsed entities];
o [base URI], [character encoding scheme],
[version], and [standalone].
Post-Schema Validation Infoset (PSVI)
● XML Infoset contains no type information.
● To overcome this, XML Schema specifies an extended form of
XML Infoset called Post-Schema Validation Infoset (PSVI).
● In PSVI, information items representing elements and attributes
have type annotations and normalized values that are returned by
an XML Schema processor.
● PSVI contains all information about an XML document that a
query processor requires.
XQuery 1.0 and XPath 2.0 Data Model
● Defines the information contained in the input to an XSLT or
XQuery Processor.
● Also defines all permissible values of expressions in XSLT,
XQuery, and XPath.
● Data Model is based on XML Infoset, with following new features:
– support for XML Schema types;
– representation of collections of documents and of simple
and complex values.
● Decided to make XPath subset of XQuery.
● XPath spec shows how to represent information in XML Infoset as
a tree structure containing seven kinds of nodes (document,
element, attribute, text, comment, namespace, or processing
instruction), with XPath operators defined in terms of these seven
nodes.
● To retain these operators while using richer type system provided
by XML Schema, XQuery extended XPath data model with
additional information contained in PSVI.
● Data Model is node-labeled, tree-constructor, with notion of node
identity to simplify representation of reference values (such as
IDREF, XPointer, and URI values).
● An instance of data model represents one or more complete
documents or document parts, each represented by its own tree of
nodes.
● Every value is ordered sequence of zero or more items, where an
item can be an atomic value or a node.
● An atomic value has a type, either one of the atomic types defined in XML Schema or a restriction of one of these types.
● When a node is added to a sequence its identity remains the same. Thus, a node may occur in more than one sequence and a sequence may contain duplicate items.
● Root node representing an XML document is a document node and each element in the document is represented by an element node.
● Attributes are represented by attribute nodes and content by text nodes and nested element nodes.
● Primitive data in the document is represented by text nodes, forming the leaves of the node tree.
● An element node may be connected to attribute nodes and text nodes/nested element nodes.
● Every node belongs to exactly one tree, and every tree has exactly one root node.
● A tree whose root node is a document node is referred to as a document and a tree whose root node is some other kind of node is referred to as a fragment.
[Figure: Example - XML Query Data Model]
● Information about nodes is obtained via accessor functions that can operate on any node.
● Accessor functions are analogous to an information item’s named properties.
● These functions are illustrative and intended to serve as a concise description of the information that must be exposed by the Data Model.
● Data Model also specifies a number of constructor functions whose purpose is to illustrate how nodes are constructed.
[Figure: ER Diagram Representing Main Components]
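The notions of node identity, sequences, documents, and fragments described above can be illustrated with a minimal Java sketch. This is not the W3C data model itself: the class DataModelSketch, its Node type, and the root() and isDocument() helpers are hypothetical names chosen for illustration, and only four node kinds are modelled.

```java
import java.util.ArrayList;
import java.util.List;

class DataModelSketch {
    /** A node in a tree. equals() is not overridden, so nodes compare by identity. */
    static final class Node {
        final String kind;   // "document", "element", "attribute" or "text"
        final String name;   // element or attribute name, otherwise null
        final String value;  // attribute or text value, otherwise null
        Node parent;         // null for the root node of a tree
        final List<Node> attributes = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String kind, String name, String value) {
            this.kind = kind;
            this.name = name;
            this.value = value;
        }

        /** Attaches a child node (or an attribute node) and returns it. */
        Node add(Node n) {
            n.parent = this;
            if ("attribute".equals(n.kind)) attributes.add(n); else children.add(n);
            return n;
        }

        /** Hypothetical accessor: the single root node of the tree this node belongs to. */
        Node root() {
            Node n = this;
            while (n.parent != null) n = n.parent;
            return n;
        }
    }

    /** A tree rooted at a document node is a document; any other tree is a fragment. */
    static boolean isDocument(Node n) {
        return "document".equals(n.root().kind);
    }

    public static void main(String[] args) {
        Node doc = new Node("document", null, null);
        Node staff = doc.add(new Node("element", "STAFF", null));
        staff.add(new Node("attribute", "branchNo", "B005"));
        staff.add(new Node("element", "STAFFNO", null)).add(new Node("text", null, "SL21"));

        // A sequence holds items (atomic values or nodes); Object stands in for "item" here.
        List<Object> sequence = new ArrayList<>();
        sequence.add(staff);
        sequence.add(42);      // an atomic value
        sequence.add(staff);   // the same node again: duplicates are allowed

        System.out.println(sequence.get(0) == sequence.get(2));               // true: identity is preserved
        System.out.println(isDocument(staff));                                // true: root is a document node
        System.out.println(isDocument(new Node("element", "X", null)));       // false: a fragment
    }
}
```

Keeping attributes in a separate list from children reflects the statement that an element node is connected to attribute nodes as well as to text and nested element nodes.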
XQuery Formal Semantics – Normalization
● FLWOR expression covered by two sets of rules; first splits expression at clause level then applies further normalization to each clause (the subscript written after a closing bracket, e.g. _Expr or _FLWOR, names the normalization rule being applied):
[(ForClause | LetClause | WhereClause | OrderByClause) FLWORExpr]_Expr
==
[(ForClause | LetClause | WhereClause | OrderByClause)]_FLWOR ([FLWORExpr]_Expr)

[(ForClause | LetClause | WhereClause | OrderByClause) RETURN Expr]_Expr
==
[(ForClause | LetClause | WhereClause | OrderByClause)]_FLWOR ([Expr]_Expr)
● WHERE clause normalized to IF expression that returns an empty sequence if condition is false and normalizes result:
[WHERE Expr1]_FLWOR(Expr)
==
IF ([Expr1]_Expr) THEN Expr ELSE ( )
Normalization – Example
FOR $i IN $I, $j IN $J
LET $k := $i + $j
WHERE $k > 2
RETURN ($i, $j)
is normalized to:
FOR $i IN $I RETURN
FOR $j IN $J RETURN
LET $k := $i + $j RETURN
IF ($k > 2) THEN ($i, $j)
ELSE ( )
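The normalized form of the example reads naturally as nested loops. The following Java sketch is only an analogy, not an XQuery engine: each nested FOR becomes a loop, the LET becomes a local variable, and the WHERE becomes an if whose else branch contributes the empty sequence. The class and method names (FlworNormalizationSketch, evaluate) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FlworNormalizationSketch {
    /** Evaluates the example query over two integer sequences and returns the flattened result sequence. */
    static List<Integer> evaluate(List<Integer> seqI, List<Integer> seqJ) {
        List<Integer> result = new ArrayList<>();
        for (int i : seqI) {            // FOR $i IN $I RETURN
            for (int j : seqJ) {        //   FOR $j IN $J RETURN
                int k = i + j;          //     LET $k := $i + $j RETURN
                if (k > 2) {            //       IF ($k > 2) THEN ($i, $j)
                    result.add(i);
                    result.add(j);
                }                       //       ELSE ( ), i.e. nothing is added
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // With $I = (1, 2) and $J = (1, 2) the pairs with $i + $j > 2 are (1,2), (2,1), (2,2).
        System.out.println(evaluate(List.of(1, 2), List.of(1, 2)));  // prints [1, 2, 2, 1, 2, 2]
    }
}
```

The result is a single flat list because XQuery sequences do not nest: each ($i, $j) is concatenated onto the overall result.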
● Second set of normalization rules applies to FOR and LET clauses and transforms each into a series of nested clauses, each of which binds one variable. For example, for the FOR clause we have:
[FOR varRef1 TypeDec1? PositionalVar1? IN Expr1, …, varRefn TypeDecn? PositionalVarn? IN Exprn]_FLWOR(Expr)
==
FOR varRef1 TypeDec1? PositionalVar1? IN [Expr1]_Expr RETURN …
FOR varRefn TypeDecn? PositionalVarn? IN [Exprn]_Expr RETURN Expr
Static Type Analysis
● XQuery is strongly typed so types of values and expressions must be compatible with context in which they are used.
● After normalization static type analysis may optionally be performed.
● Static type of an expression is defined as ‘most specific type that can be deduced for that expression by examining the query only, independent of the input data’.
● Useful for detecting certain types of error early in development.
● Also useful for optimizing query execution; e.g. may be able to conclude that result of query is an empty sequence.
● Based on set of inference rules used to infer static type of each expression, based on static types of its operands.
● Bottom-up process, starting at leaves of expression tree containing simple constants and input data whose type can be inferred from schema of input document.
● Inference rules used to infer static types of more complex expressions at next level of tree until entire tree processed.
● Type error raised if static type of some expression is inappropriate.
Static Type Analysis – Inference Rules
Static typing takes a static environment and an expression and infers a type. Written as:
statEnv |- Expr : Type
– States that “in environment statEnv, expression Expr has type Type”.
– This is called a typing judgment (a judgment expresses whether a property holds or not).
– Inference rule written as a collection of premises and a conclusion; for example:
statEnv |- Expr1 : xsd:boolean    statEnv |- Expr2 : Type2    statEnv |- Expr3 : Type3
------------------------------------------------------------
statEnv |- IF Expr1 THEN Expr2 ELSE Expr3 : (Type2 | Type3)
Dynamic Evaluation
● All implementations of XQuery must support dynamic typing, which checks during dynamic evaluation that type of a value is compatible with context in which it is used.
● Type error raised if an incompatibility is detected.
● Based on judgments, called evaluation judgments:
▪ dynEnv |- Expr ⇒ Value
● States that “in dynamic environment dynEnv, the evaluation of expression Expr yields value Value”.
▪ Inference rule is written as collection of hypotheses (judgments) and a conclusion, written respectively above and below a dividing line.
▪ Consider logical expressions:
dynEnv |- Expri ⇒ false    1 <= i <= 2
------------------------------------------------------------
dynEnv |- Expr1 AND Expr2 ⇒ false

dynEnv |- Expri ⇒ RAISES Error    1 <= i <= 2
------------------------------------------------------------
dynEnv |- Expr1 AND Expr2 ⇒ RAISES Error
● Consider following expression:
o (1 IDIV 0 = 1) AND (2 = 3)
● If left-hand expression evaluated first it will raise an error (divide by zero) and overall expression will raise an error (no need to evaluate the right-hand expression).
● Conversely, if right-hand expression evaluated first, overall expression will evaluate to false (no need to evaluate the left-hand expression).
TOPIC: XML AND DATABASE
● Need to handle XML that:
– may be strongly typed governed by XML Schema;
– may be strongly typed governed by another schema language, such as a DTD or RELAX NG;
– may be governed by multiple schemas or one schema may
be subject to frequent change;
– may be schema-less;
– may contain marked-up text with logical units of text
(such as sentences) that span multiple elements;
– has structure, ordering, and whitespace that may be
significant;
– may be subject to update as well as queries based on
context and relevancy.
● Four general approaches to storing an XML document in RDB:
– store the XML as the value of some attribute within a
tuple;
– store the XML in a shredded form across a number of
attributes and relations;
– store the XML in a schema independent form;
– store the XML in a parsed form; i.e., convert the XML to
internal format, such as an Infoset or PSVI representation,
and store this representation.
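To make the shredded and schema-independent approaches more concrete, the sketch below flattens a small XML document into rows of a hypothetical node table (nodeId, parentId, kind, name, value). It uses the JDK's built-in JAXP DOM parser; the class NodeTableSketch, the Row type, and the column names are assumptions for illustration, and whitespace-only text is dropped for brevity even though whitespace may be significant in real documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTableSketch {
    /** One row of the hypothetical node table; parentId 0 marks the root element. */
    static final class Row {
        final int nodeId;
        final int parentId;
        final String kind;
        final String name;
        final String value;

        Row(int nodeId, int parentId, String kind, String name, String value) {
            this.nodeId = nodeId;
            this.parentId = parentId;
            this.kind = kind;
            this.name = name;
            this.value = value;
        }
    }

    /** Parses the XML text and returns one row per element, attribute, and non-blank text node. */
    static List<Row> shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Row> rows = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, rows);
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void walk(Node node, int parentId, List<Row> rows) {
        int id = rows.size() + 1;  // ids are assigned in document order
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            rows.add(new Row(id, parentId, "element", node.getNodeName(), null));
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                rows.add(new Row(rows.size() + 1, id, "attribute", a.getNodeName(), a.getNodeValue()));
            }
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                walk(c, id, rows);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE && !node.getNodeValue().trim().isEmpty()) {
            rows.add(new Row(id, parentId, "text", null, node.getNodeValue()));
        }
        // Other node kinds (comments, processing instructions) are ignored in this sketch.
    }

    public static void main(String[] args) {
        String xml = "<STAFF branchNo=\"B005\"><STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY></STAFF>";
        for (Row r : shred(xml)) {
            System.out.println(r.nodeId + " " + r.parentId + " " + r.kind + " " + r.name + " " + r.value);
        }
    }
}
```

Each row could then be inserted into a single relation regardless of the document's schema; the parentId link is what lets the original structure be recomposed.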
Storing XML in an Attribute
● In the past the XML would have been stored in an attribute whose data type was CLOB.
● More recently, some systems have a new native XML data type (e.g. XML or XMLType).
● Raw XML stored in serialized form, which makes it efficient to insert documents into database and retrieve them in their original form.
● Relatively easy to apply full-text indexing to documents for contextual and relevance retrieval. However, there is a question about performance of general queries and indexing, which may require parsing on-the-fly.
● Also, updates usually require entire XML document to be replaced with a new document.
Storing XML in Shredded Form
● XML decomposed (shredded) into its constituent elements and data distributed over number of attributes in one or more relations.
● Storing shredded documents may make it easier to index values of some elements, provided these elements are placed into their own attributes.
● Also possible to add some additional data relating to hierarchical nature of the XML, making it possible to recompose original structure and ordering, and to allow the XML to be updated.
● With this approach also have to create an appropriate database structure.
Schema-Independent Representation
● Could use DOM to represent structure of XML data.
● Since XML is a tree structure, each node may have only one parent. The rootID attribute allows a query on a particular node to be linked back to its document node.
● While this is schema independent, recursive nature of structure can cause performance problems when searching for specific paths.
● To overcome this, create denormalized index containing combinations of path expressions and a link to node and parent node.
XML and SQL
● SQL:2003 has extensions to enable publication of XML (commonly referred to as SQL/XML):
– new native XML data type, XML, which allows XML documents to be treated as relational values in columns of
tables, attributes in user-defined types, variables, and parameters to functions;
– set of operators for the type;
– implicit set of mappings from relational data to XML.
● Standard does not define any rules for the inverse process; i.e., shredding XML data into an SQL form, with some minor exceptions.
Example – Creating Table using XML Type
CREATE TABLE XMLStaff (docNo CHAR(4), docDate DATE, staffData XML,
    PRIMARY KEY (docNo));
INSERT INTO XMLStaff VALUES ('D001', DATE '2004-12-01',
    XML('<STAFF branchNo = "B005">
    <STAFFNO>SL21</STAFFNO>
    <POSITION>Manager</POSITION>
    <DOB>1945-10-01</DOB>
    <SALARY>30000</SALARY> </STAFF>') );
SQL/XML Operators
● XMLELEMENT, to generate an XML value with a single element as a child of its root item. Element can have attributes specified via XMLATTRIBUTES subclause.
● XMLFOREST, to generate an XML value with a list of elements as children of a root item.
● XMLCONCAT, to concatenate a list of XML values.
● XMLPARSE, to perform a non-validating parse of a character string to produce an XML value.
● XMLROOT, to create an XML value by modifying the properties of the root item of another XML value.
● XMLCOMMENT, to generate an XML comment.
● XMLPI, to generate an XML processing instruction.
● XMLSERIALIZE, to generate a character or binary string from an XML value.
● XMLAGG, an aggregate function, to generate a forest of elements from a collection of elements.
Example – Using XML Operators
List all staff with salary > £20,000, as an XML element containing name and branch number as an attribute.
SELECT staffNo, XMLELEMENT (NAME "STAFF",
    XMLATTRIBUTES (branchNo AS "branchNumber"),
    fName || ' ' || lName) AS "staffXMLCol"
FROM Staff
WHERE salary > 20000;
Example – Using XML Operators
For each branch, list names of all staff with each one represented as an XML element.
SELECT XMLELEMENT (NAME "BRANCH",
    XMLATTRIBUTES (branchNo AS "branchNumber"),
    XMLAGG ( XMLELEMENT (NAME "STAFF", fName || ' ' || lName)
        ORDER BY fName || ' ' || lName ))
    AS "branchXMLCol"
FROM Staff GROUP BY branchNo;
SQL/XML Mapping Functions
● SQL/XML also defines mapping from tables to XML documents.
● Mapping may take as its source an individual table, all tables in a schema, or all tables in a catalog.
● Standard does not specify syntax for the mapping; instead it is provided for use by applications and as a reference for other standards.
● Mapping produces two XML documents: one that contains mapped table data and other that contains an XML Schema describing the first.
Mapping SQL Identifiers to XML Names
● Number of issues had to be addressed to map SQL identifiers to XML Names:
– range of characters that can be used within an SQL identifier is larger than range for an XML Name;
– SQL delimited identifiers (identifiers within double-quotes) permit arbitrary characters to be used at any point in identifier;
– XML Names that begin with ‘XML’ are reserved;
– XML namespaces use ‘:’ to separate namespace prefix from local component.
● Resolved using escape notation that changes unacceptable characters in XML Names into sequence of allowable characters based on Unicode values (“_xHHHH_”).
Mapping SQL Data Types to XML Schema
● SQL/XML maps each SQL data type to closest match in XML Schema, in some cases using facets to restrict acceptable XML values to achieve closest match.
● For example:
– SMALLINT mapped to a restriction of xsd:integer with minInclusive and maxInclusive facets set.
– CHAR mapped to restriction of xsd:string with facet length set.
– DECIMAL mapped to xsd:decimal with precision and scale set.
Mapping Tables to XML Documents
● Create root element named after table with <row> element for each row.
● Each row contains a sequence of column elements, each named after corresponding column.
● Each column element contains a data value.
● Names of table and column elements are generated using fully escaped mapping from SQL identifiers to XML Names.
● Must also specify how nulls are to be mapped, using ‘absent’ (column with null would be omitted) or ‘nil’.
Generating an XML Schema
● Generated by creating globally-named XML Schema data types for every type required to describe table(s) being mapped.
● Naming convention appends a suffix containing length or precision/scale to name of the base type (e.g. CHAR(10) would be CHAR_10).
● Next, named XML Schema type is created for types of the rows in table (name used is ‘RowType’ concatenated with catalog, schema, and table name).
● Named XML Schema type is created for type of the table itself (name used is ‘TableType’ concatenated with catalog, schema, and table name).
● Finally, an element is created for table based on this new table type.
Native XML Databases
● Defines (logical) data model for an XML document (as opposed to data in that document) and stores/retrieves documents according to that model.
● At a minimum, model must include elements, attributes, PCDATA, and document order.
● XML document must be unit of (logical) storage although not restricted by any underlying physical storage model (so traditional DBMSs not ruled out nor proprietary storage formats such as indexed, compressed files).
● Two types:
o text-based, which stores XML as text, e.g. as a file in file system or as a CLOB in an RDBMS;
o model-based, which stores XML in some internal tree representation, e.g., an Infoset or PSVI representation, possibly with tags tokenized.
NOTES FROM INTERNET
● Mapping XML into relational data
● Generating XML using Java and JDBC
● Storing XML
● XML on the Web
● XML support in Oracle
● XML API for databases
Mapping XML into relational data
The database
● We can model the database with a document node and its associated element node:
<?xml version="1.0" ?>
<myDatabase>
    table1
    table2
    ...
    tablen
</myDatabase>
● Order of tables is immaterial
The table
● Each table of the database is represented by an element node with the records as its children:
<customer>
    record1
    record2
    ...
    recordm
</customer>
● Again, order of the records is immaterial, since the relational data model defines no ordering on them.
The record
● A record is also represented by an element node, with its fields as children:
<custRec>
    field1
    field2
    ...
    fieldm
</custRec>
● The name is arbitrary, since the relational data model doesn't define a name for a record type
The field
● A field is represented as an element node with a data node as its only child:
<custName type="t">
    d
</custName>
● If d is omitted, it means the value of the field is the empty string.
● The value of t indicates the type of the value
Example
<?xml version="1.0" ?>
<myDatabase>
<customers>
<custRec>
<custName type="String">Robert Roberts</custName>
<custAge type="Integer">25</custAge>
</custRec>
<custRec>
<custName type="String">John Doe</custName>
<custAge type="Integer">32</custAge>
</custRec>
</customers>
</myDatabase>
Generating XML from relational data
Step 1 : Set up the database connection
// Create an instance of the JDBC driver so that it has
// a chance to register itself
// (note: the JDBC-ODBC bridge driver named here was removed in Java 8)
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
// Create a new database connection.
Connection con = DriverManager.getConnection("jdbc:odbc:myData", "", "");
// Create a statement object that we can execute queries with
Statement stmt = con.createStatement();
Step 2 : Execute the JDBC query
String query = "Select Name, Age from Customers";
ResultSet rs = stmt.executeQuery(query);
Step 3 : Create the XML!
StringBuffer xml = new StringBuffer("<?xml version='1.0'?><myDatabase><customers>");
while (rs.next()) {
    xml.append("<custRec><custName>");
    xml.append(rs.getString("Name"));
    xml.append("</custName><custAge>");
    xml.append(rs.getInt("Age"));
    xml.append("</custAge></custRec>");
}
xml.append("</customers></myDatabase>");
Storing XML in relational tables
Step 1 : Set up the parser
// DOMParser is the Xerces parser class org.apache.xerces.parsers.DOMParser
StringReader stringReader = new StringReader(xmlString);
InputSource inputSource = new InputSource(stringReader);
DOMParser domParser = new DOMParser();
domParser.parse(inputSource);
Document document = domParser.getDocument();
Step 2 : Read values from parsed XML document
NodeList nameList = document.getElementsByTagName("custName");
NodeList ageList = document.getElementsByTagName("custAge");
Step 3 : Set up database connection
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
Connection con = DriverManager.getConnection("jdbc:odbc:myDataBase", "", "");
Step 4 : Insert data using appropriate JDBC update query
String sql = "INSERT INTO Customers (Name, Age) VALUES (?,?)";
PreparedStatement pstmt = con.prepareStatement(sql);
int size = nameList.getLength();
for (int i = 0; i < size; i++) {
    pstmt.setString(1, nameList.item(i).getFirstChild().getNodeValue());
    // getNodeValue() returns a String, so convert it before calling setInt
    pstmt.setInt(2, Integer.parseInt(ageList.item(i).getFirstChild().getNodeValue()));
    // the SQL text was already supplied to prepareStatement
    pstmt.executeUpdate();
}
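The parsing and value-reading steps above can also be tried without a database or the Xerces DOMParser class. The sketch below swaps in the JDK's built-in JAXP parser (DocumentBuilderFactory) and collects the values that Step 4 would bind to the INSERT statement. The class CustomerXmlReaderSketch and its Customer type are hypothetical names, and the code assumes every custRec contains exactly one custName and one custAge.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CustomerXmlReaderSketch {
    /** Values for one row of the Customers table. */
    static final class Customer {
        final String name;
        final int age;

        Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Parses the XML text and returns one Customer per custName/custAge pair. */
    static List<Customer> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList names = doc.getElementsByTagName("custName");
            NodeList ages = doc.getElementsByTagName("custAge");
            List<Customer> customers = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                String name = names.item(i).getTextContent();
                // Element text is always a string, so the age must be converted explicitly.
                int age = Integer.parseInt(ages.item(i).getTextContent().trim());
                customers.add(new Customer(name, age));
            }
            return customers;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read customer XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><myDatabase><customers>"
                + "<custRec><custName type=\"String\">Robert Roberts</custName>"
                + "<custAge type=\"Integer\">25</custAge></custRec>"
                + "<custRec><custName type=\"String\">John Doe</custName>"
                + "<custAge type=\"Integer\">32</custAge></custRec>"
                + "</customers></myDatabase>";
        for (Customer c : read(xml)) {
            System.out.println(c.name + ", " + c.age);
        }
    }
}
```

Each Customer could then be passed to pstmt.setString(1, ...) and pstmt.setInt(2, ...) as in Step 4.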
XML on the Web (Servlets)
public void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException
{
    resp.setContentType("text/xml");
    PrintWriter out = new PrintWriter(resp.getOutputStream());
    … generate XML here, as before…
    out.println(xmlGenerated);
    out.flush();
    out.close();
}
● Appropriate XSL can be inserted for display
[Figure: XML in IE 5.0]
Let’s insert the XSL…
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="http://myServer/Customer.xsl"?>
<myDatabase>
<customers>
<custRec>
<custName type="String">Robert Roberts</custName>
<custAge type="Integer">25</custAge>
</custRec>
… other records here …
</customers>
</myDatabase>
[Figure: XML with XSL in IE 5.0]
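When XML is built by string concatenation, as in the generation code and the servlet, any database value containing characters such as & or < would make the output ill-formed. A minimal sketch of an escaping helper follows; the names XmlEscapeSketch, escape, and custRec are hypothetical and are not part of JDBC or the Servlet API.

```java
class XmlEscapeSketch {
    /** Replaces the five characters that have predefined entities in XML. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds one custRec element in the layout used above, escaping the name value. */
    static String custRec(String name, int age) {
        return "<custRec><custName>" + escape(name) + "</custName><custAge>"
                + age + "</custAge></custRec>";
    }

    public static void main(String[] args) {
        System.out.println(custRec("Smith & Sons <Ltd>", 40));
    }
}
```

In the generation loop, xml.append(rs.getString("Name")) would then become xml.append(escape(rs.getString("Name"))).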