
Week 1 part 2: Database Applications and Technologies

Based on a Colloquium by Philip A. Bernstein, presented at the University of Toronto on October 19, 1999

…Data everywhere…
SQL Databases, Packaged applications
Data warehouses, Groupware
Internet databases, Data mining
Object-relational databases, Scientific databases
©2005 John Mylopoulos CSC343 Introduction to Databases — University of Toronto Applications & Technologies — 1 ©2005 John Mylopoulos CSC343 Introduction to Databases — University of Toronto Applications & Technologies — 2

The Data Scalability Problem

• Very large data sets are everywhere ...
• Terabyte data warehouses
• Scientific databases (GB's/day)
• Thousands of databases per enterprise
• Thousands of tables per database
• The Internet … all of the world's documents at your fingertips!
• 1B machines with 8GB = 8EB (exabytes, that is)
• What's limiting our ability to fully exploit on-line data is not just size but complexity.

Where do these data come from?
What technologies do they use?
Whatever they use, they need models (schemas, metadata,…)
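
As a quick check of the back-of-the-envelope storage estimate on the scalability slide, the total can simply be multiplied out. This is a minimal sketch that assumes decimal units (1 GB = 10^9 bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate, assuming decimal (SI) units.
GB = 10**9
PB = 10**15
EB = 10**18

machines = 10**9            # "1B machines"
disk_per_machine = 8 * GB   # "8GB" on each machine

total_bytes = machines * disk_per_machine

print(total_bytes // PB, "PB")  # total expressed in petabytes
print(total_bytes // EB, "EB")  # total expressed in exabytes
```

Under these assumptions the total comes to 8,000 PB, which is 8 EB.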

SQL Database Systems

• ~ $5.4B in new SQL DB licenses in 1998
• Plus another ~ $1B in DB tools
• 2/3 transaction processing, 1/3 decision support
• Transaction processing
• 135K txns/min (tpmC), $97/tpmC, $13.1M system cost
• 20K txns/min (tpmC), $15/tpmC, $305K system cost
• 7x txn rate, for 43x the price! (Scale out vs. scale up)
• TP monitors are becoming Internet application servers
• They're at the core of all big e-commerce sites.
• Query optimization is excellent, but there's still much room for improvement

Information Resource Management

• Managing descriptions of DB's and applications
• Reverse engineer & import code and DB schema
• Catalogue management and browser tools
• Applications
• Metadata reporting - find relevant databases
• Year 2000 - find date fields
• Database translation - move to a different DBMS
• Technology - entity-relationship model on SQL DBMS
• Platinum (CA), Viasoft (R&O)
• No automatic integration or categorization

[Figure: examples of managed descriptions: tables (emp, dept, cust), an ERD (Customer, Order, Product, Salesperson, Scheduled Delivery), an architecture diagram (Order Entry, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing, Inventory), a forms spec, and VB and VC++ code]
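
The transaction-processing figures on the SQL Database Systems slide can be cross-checked with a few lines of arithmetic. The sketch below only multiplies and divides the numbers quoted there; the dictionary keys and the function name are illustrative, not part of any benchmark tooling.

```python
# Price/performance check for the two TPC-C results quoted on the slide.
# Throughput is in tpmC (transactions per minute); prices are in US dollars.
big = {"tpmC": 135_000, "usd_per_tpmC": 97, "system_cost": 13_100_000}
small = {"tpmC": 20_000, "usd_per_tpmC": 15, "system_cost": 305_000}


def implied_cost(system):
    """System cost implied by throughput times price per tpmC."""
    return system["tpmC"] * system["usd_per_tpmC"]


throughput_ratio = big["tpmC"] / small["tpmC"]
cost_ratio = big["system_cost"] / small["system_cost"]

for name, system in (("large", big), ("small", small)):
    print(name, "implied cost:", implied_cost(system))
print("throughput ratio: %.2f" % throughput_ratio)
print("cost ratio: %.1f" % cost_ratio)
```

The implied costs land close to the quoted system costs, and the two ratios are roughly the "7x" and "43x" on the slide, which is the scale-up versus scale-out point: the large system buys about seven times the throughput at about forty-three times the price.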


Example of DB Reengineering

The relation
Employee(emp#,cname,dept,hiredt,sin#,sal)
might be mapped into

[Figure: ER diagram with entities Employee and Company, many-to-one (M:1) relationships "hires" and "employs", and attributes Name, Sin#, Emp#, Dept, Sal, and Date]

Packaged Applications

• Packaged applications (Enterprise Resource Planning systems) drive a lot of DB business -- there is little custom application development at large companies
• Commercial applications - finance, manufacturing, distribution, human resources, order processing ...
• Vendors include SAP ($5.1B), PeopleSoft ($1.3B), Baan ($700M), Oracle apps ($700M), growing at 20%
• They all have big built-in repositories of models to manage DB mapping, customization, & application upgrades.
• Models drive package integration
• Load models from legacy apps; then write adapters
• Interfaces on which to build custom front ends
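
One way to picture the Employee reengineering example is as a schema migration. The sketch below is only one possible reading of the diagram: it assumes that cname is the name of a Company entity, that hiredt becomes the Date attribute of "hires", and that "employs" is a many-to-one link from Employee to Company. The table names, the renamed columns (emp_no, sin) and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The flat legacy relation. '#' is awkward in SQL identifiers, so emp# and
# sin# are renamed emp_no and sin here (an assumption for this sketch).
cur.execute("""CREATE TABLE employee_flat (
    emp_no INTEGER PRIMARY KEY, cname TEXT, dept TEXT,
    hiredt TEXT, sin TEXT, sal REAL)""")
cur.executemany(
    "INSERT INTO employee_flat VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Acme", "Sales", "1998-03-01", "111-111-111", 50000.0),
     (2, "Acme", "R&D", "1999-06-15", "222-222-222", 65000.0)])  # made-up rows

# Reengineered schema: entities plus a relationship table for "hires".
cur.executescript("""
CREATE TABLE company (name TEXT PRIMARY KEY);
CREATE TABLE employee (
    emp_no INTEGER PRIMARY KEY, sin TEXT, dept TEXT, sal REAL,
    employer TEXT REFERENCES company(name));   -- "employs" (M:1)
CREATE TABLE hires (
    emp_no INTEGER REFERENCES employee(emp_no),
    company TEXT REFERENCES company(name),
    hire_date TEXT,                            -- "Date" attribute
    PRIMARY KEY (emp_no, company));
INSERT INTO company SELECT DISTINCT cname FROM employee_flat;
INSERT INTO employee SELECT emp_no, sin, dept, sal, cname FROM employee_flat;
INSERT INTO hires SELECT emp_no, cname, hiredt FROM employee_flat;
""")

# Sanity check: joining the new tables reproduces the original relation.
rebuilt = cur.execute("""
    SELECT e.emp_no, h.company, e.dept, h.hire_date, e.sin, e.sal
    FROM employee e JOIN hires h ON h.emp_no = e.emp_no
    ORDER BY e.emp_no""").fetchall()
original = cur.execute(
    "SELECT * FROM employee_flat ORDER BY emp_no").fetchall()
print(rebuilt == original)
```

The final join is the useful part of the design: a mapping of this kind should be lossless, so the original relation can be rebuilt from the entity-relationship form.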


Data Warehousing

• Business goal - decision support for everyone
• Problem - Ad hoc query on production DBs doesn't work
• Approach - Create snapshot DB for decision support = a data warehouse
• One of main growth areas of the DB business
• $2B revenue in 1995, grew to $8B in 1998 (h/w + s/w)

[Figure: Production Databases -> (Scrub & Transform) -> Data Warehouse -> (Filter & Roll Up) -> Data Marts -> Users]

The Killer Model Mgmt App

• Creating and maintaining a Data Warehouse is hard. You need tools, which require lots of models.
• Inconsistent data formats
• Missing or invalid data
• Semantic inconsistencies
• Data quality & timeliness
• Relate technical and business models
• Tracing data lineage (instance-level)
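
The warehouse pipeline in the figure (scrub and transform, then filter and roll up) can be sketched in a few lines. This is an illustrative toy, not how any particular warehouse product works; the row layout, the two date formats and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Made-up rows from a production order system. Note the inconsistent
# region spelling, the two date formats, and the missing amount.
production_rows = [
    {"region": "east", "date": "1998-01-15", "amount": "100.50"},
    {"region": "EAST", "date": "15/02/1998", "amount": "200"},
    {"region": "West", "date": "1998-02-20", "amount": None},
    {"region": "west", "date": "1998-03-05", "amount": "50.25"},
]


def scrub(row):
    """Scrub & transform: normalize formats, reject invalid rows."""
    if row["amount"] is None:
        return None                      # missing data: drop the row
    date = row["date"]
    if "/" in date:                      # DD/MM/YYYY -> YYYY-MM-DD
        day, month, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return {"region": row["region"].lower(), "date": date,
            "amount": float(row["amount"])}


# The "warehouse" is the cleaned snapshot of production data.
warehouse = [r for r in map(scrub, production_rows) if r is not None]


def roll_up(rows, region):
    """Filter & roll up: monthly totals for one region (a data mart)."""
    totals = defaultdict(float)
    for r in rows:
        if r["region"] == region:
            totals[r["date"][:7]] += r["amount"]
    return dict(totals)


east_mart = roll_up(warehouse, "east")
print(east_mart)
```

Even this toy shows why models matter: the scrub step encodes knowledge about source formats, and the roll-up encodes what a given group of users wants to see.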


Model-driven data transformation tools

[Figure: examples of managed descriptions: tables (emp, dept, cust), an ERD (Customer, Order, Product, Salesperson, Scheduled Delivery), an architecture diagram (Order Entry, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing, Inventory), a forms spec, and VB and VC++ code]

• Generate code for loading a data warehouse
• Version schemas and transformations for lineage
• Use tool-specific repository engines on SQL DBMS
• More powerful and general-purpose tools are needed

Groupware Applications

• E-Mail/BBoards aren't yet on relational DBMSs
• They're huge… 100MB/person x 10K persons = 1TB
• Heavily used for document management
• $2B/yr + 15%/yr, a major and growing part of IT.
• Web-based portals are essential to core business, combine data warehouse and document DB (finance, marketing)
• Very weak model management today
• Ad hoc extensibility of models in today's mail systems
• Yahoo!-like categorization and its data sources e.g., cost centers, organization structure, sales docs…
• Web site management (dependencies and configurations)
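
To make "generate code for loading a data warehouse" from the model-driven transformation slide concrete, here is a minimal sketch of a generator that turns a small mapping model into a SQL load statement. The mapping format, table names and column expressions are hypothetical; real tools of the period used their own repository formats.

```python
# A tiny "model": how warehouse columns are derived from a source table.
# All table and column names here are hypothetical.
mapping = {
    "source": "orders",
    "target": "sales_fact",
    "columns": {                       # target column -> source expression
        "customer_id": "cust_no",
        "order_date": "DATE(order_ts)",
        "revenue": "qty * unit_price",
    },
    "filter": "status = 'shipped'",
    "version": 3,                      # versioned so lineage can cite it
}


def generate_load_sql(m):
    """Generate an INSERT ... SELECT statement from the mapping model."""
    targets = ", ".join(m["columns"])
    sources = ", ".join(m["columns"].values())
    sql = (f"INSERT INTO {m['target']} ({targets})\n"
           f"SELECT {sources}\n"
           f"FROM {m['source']}")
    if m.get("filter"):
        sql += f"\nWHERE {m['filter']}"
    return f"-- generated from mapping version {m['version']}\n{sql};"


print(generate_load_sql(mapping))
```

Stamping the generated statement with the mapping version is one simple way to support the lineage point on the same slide: each loaded row can be traced back to the transformation that produced it.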

Internet Databases

• The other big growth area of today's DB business
• Internet content-oriented meta-data is big business -- indexing, categorization hierarchies, comparison shopping, …
• Structural metadata is just starting to get attention
• XML interoperation requires libraries of DTD's
• Need tools to map to related DB schemas, forms, …
• E-commerce application deployment requires configuration definition and requirements -- TP monitor and ORB technology is weak.
• Vision - Plug in a DB or application and it's accessible and manageable; Improve hot links, search engines, and categorizations

Data Mining

• Extract patterns and models from the data
• Automatically, from large data sets
• Use statistics, pattern recognition, machine learning, visualization
• Applications - finance, fraud detection, astronomy, marketing analysis, medical diagnosis, biology …
• A small market, with much projected growth
• Must define and manage data subsets to mine
• And retain measures, derived data, and lineage
• Like scientific experiments, with irregular analysis steps
• Versioning is important.
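
The Internet Databases slide calls for tools that map XML structures to related DB schemas. A minimal sketch of such a mapping, using only the Python standard library, might look like the following. The XML vocabulary, the table layout and the `column_map` structure are invented for the example and stand in for what a DTD-aware tool would derive.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up XML catalogue; element and attribute names are hypothetical.
doc = """<catalog>
  <product sku="A1"><name>Widget</name><price>9.99</price></product>
  <product sku="B2"><name>Gadget</name><price>24.50</price></product>
</catalog>"""

# The "mapping model": which XML item feeds which column, and how to
# convert it to the column's type.
column_map = {
    "sku": lambda el: el.get("sku"),
    "name": lambda el: el.findtext("name"),
    "price": lambda el: float(el.findtext("price")),
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")

root = ET.fromstring(doc)
for el in root.findall("product"):
    # Apply each extractor in column order to build one relational row.
    row = tuple(extract(el) for extract in column_map.values())
    conn.execute(
        "INSERT INTO product (sku, name, price) VALUES (?, ?, ?)", row)

rows = conn.execute(
    "SELECT sku, name, price FROM product ORDER BY sku").fetchall()
print(rows)
```

Keeping the mapping in a data structure rather than in hand-written loading code is the point: the same loader works for any document type once its mapping is supplied.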

Object-Relational Databases

• Goal - Capture more of the world's data, not just scalars - text, video, audio, time series, graphs, digital photos, medical records, insurance claims, etc.
• Richer operators - specialized to data types -- content-based retrieval, image similarity, path search
• Richer constraints & triggers, e.g., to support workflow
• Adding a data type affects every DBMS component; components must be designed for extensibility
• All DB vendors are working on it.
• Today's systems are incomplete and hard to extend
• 3rd parties will supply data blades / extenders / cartridges

Managing Object-Relational Models

• OR DB requires richer models in its catalogues
• Connections between types, such as inheritance, nesting, and relationships
• Abstract data types, including operations (code becomes a more intrinsic part of DBMS)
• That is, DBMS catalog's information model is extended and becomes more extensible
• OR DB could be used for extended types that are especially suitable for models
• Labeled directed graphs, with closure functionality
• Versions and configurations
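
"Labeled directed graphs, with closure functionality" can be illustrated with a small reachability routine. This is a plain breadth-first sketch in Python, not the API of any object-relational product; the edge list and the `closure` function are hypothetical.

```python
from collections import defaultdict, deque

# Labeled edges (source, label, target); all names are hypothetical.
edges = [
    ("Manager", "inherits", "Employee"),
    ("Employee", "inherits", "Person"),
    ("Employee", "references", "Department"),
    ("Department", "references", "Company"),
]


def closure(edges, start, label):
    """All nodes reachable from start by following edges with one label."""
    adjacency = defaultdict(list)
    for src, lbl, dst in edges:
        if lbl == label:
            adjacency[src].append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:          # visit each node once, so cycles end
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(closure(edges, "Manager", "inherits")))
```

A catalogue that stores types and their connections this way can answer questions such as "which types does Manager inherit from, directly or indirectly?" with a single closure query.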


Scientific Databases

• CERN laboratory planning a new generation of experiments in particle physics which will generate 1-10PB per year starting in 2005.
• These data will be distributed world-wide to be analyzed by scientists (public.web.cern.ch/Public/
• To deal with the data, there is an international Data Grid project e.g., www.cacr.caltech.edu/ppdg/, grid.web.cern.ch/grid/

What Was Left Out

• Digital libraries
• Document management
• Directory services
• Advanced file systems
• Software databases and repositories
• Configuration management systems
• ...

Asilomar Report Grand Challenge

The Information Utility


“...Make it easy for everyone to store,
organize, access, and analyze the
majority of human information online…”

