The Data Warehouse eBusiness DBA
Handbook
By Donald K. Burleson, Joseph Hudicka, William H. Inmon,
Craig Mullins, Fabian Pascal
Copyright © 2003 by BMC Software and DBAzine. Used with permission.
Oracle, Oracle7, Oracle8, Oracle8i and Oracle9i are trademarks of Oracle Corporation.
Many of the designations used by computer vendors to distinguish their products are
claimed as Trademarks. All names known to Rampant TechPress to be trademark names
appear in this text as initial caps.
The information provided by the authors of this work is believed to be accurate and
reliable, but because of the possibility of human error by our authors and staff, BMC
Software, DBAZine and Rampant TechPress cannot guarantee the accuracy or
completeness of any information included in this work and are not responsible for any
errors, omissions or inaccurate results obtained from the use of information or scripts in
this work.
Links to external sites are subject to change; DBAZine.com, BMC Software and
Rampant TechPress do not control or endorse the content of these external web sites,
and are not responsible for their content.
ISBN 0-9740716-2-5
This book provides you with insight into how to build the
foundation of your eBusiness application. You'll learn the
importance of the data warehouse in your daily operations,
and how to properly design and build an information
architecture that can handle the rapid growth today's
eCommerce businesses see. Once your system is up and
running, it must be maintained: this text covers how to
maintain online data systems to reduce downtime, and how to
keep your online data secure. To wrap things up, you'll find
links to some of the best online resources on data
warehousing.
Making the Most of E-business
Everywhere you look today, you see e-business. In the trade
journals. On TV. In the Wall Street Journal. Everywhere. And
the message is that if your business is not e-business enabled,
you will be behind the curve.
So what is all the fuss about? Behind the corporate push to get
into e-business is a Web site. Or multiple Web sites. The Web
site allows your corporation to have a reach into the
marketplace that is direct and far-reaching. Businesses that
would never have entertained entry to foreign marketplaces and
other marketplaces that are hard to access suddenly have easy
and cheap presence. In a word, e-business opens up
possibilities that previously were impractical or even
impossible.
Figure 2: Sitting behind the web site is the infrastructure called the
"corporate information factory"
The Data Warehouse Foundation
The Web-based e-business environment has tremendous
potential. The Web is a tremendously powerful medium for
delivery of information. But there is nothing intrinsically
powerful about the Web other than its ability to deliver
information. In order for the Web-based e-business
environment to deliver its full potential, the Web-based
environment requires an infrastructure in support of its
information processing needs. The infrastructure that best
supports the Web is called the corporate information factory.
At the center of the corporate information factory is a data
warehouse.
The costs that were described for Figure 2 are now multiplied by
seven (or however many units of data are required). As
the analyst develops the procedures for getting the unit of
information required, no thought is given to getting
information for other units of information. Therefore, each
time a new piece of information is required, the process
described in Figure 2 begins all over again. As a result, the cost of
information spikes dramatically.
Figure 7 shows that when the long-term needs for information are
considered, the data warehouse is far and away less
expensive than the series of short-term efforts. And the length
of time for access to information is an intangible whose worth
is difficult to measure. No one disputes that information available
today, right now, is much more valuable than information six months
from now. In fact, six months from now I will have forgotten
why I wanted the information in the first place. You simply
cannot beat a data warehouse for speed and ease of access to
information.
The Foundations of E-Business
The basis for a long-term, sound e-business competitive
advantage is the data warehouse.
Intelligent Messages
To give your messages sent via the Internet some punch, you
need intelligence behind them. And the basis of that
intelligence is the information that is buried in a data
warehouse.
Integrated Data
Integrated data has a related but different effect. Suppose you
are a salesperson wanting to sell something (it really doesn't
matter what). Your boss gives you a list and says go to it. Here's
your list:
acct 123
acct 234
acct 345
acct 456
acct 567
acct 678
You start by making a few contacts, but you find that you're
not having much success. Hardly anyone on your list is
interested in what you're selling.
Here's how your list looks with even more integrated data:
acct 123 - John Smith - male - 25 years old - single
acct 234 - Mary Jones - female - 58 years old - widow
acct 345 - Tom Watson - male - 52 years old - married
acct 456 - Chris Ng - female - 18 years old - single
acct 567 - Pat Wilson - male - 68 years old - married
acct 678 - Sam Freed - female - 45 years old - married
acct 456 - Chris Ng - female - 18 years old - single
    profession: hairdresser; income: 18,000; no family; rents
    net worth: 0; drives: Honda; school: none; degree: none
    hobbies: hiking, tennis
acct 567 - Pat Wilson - male - 68 years old - married
    profession: retired; income: 25,000; two sons; rents
    net worth: 25,000; drives: nothing; school: U Texas; degree: PhD
    hobbies: watching football
acct 678 - Sam Freed - female - 45 years old - married
    profession: pilot; income: 150,000; son and daughter; owns home
    net worth: 750,000; drives: Toyota; school: UCLA; degree: BS
    hobbies: thimble collecting
Looking Smarter
Stated differently, with integrated data you can be a great deal
more accurate and efficient in your sales efforts. Integrated data
saves huge amounts of time that would otherwise be wasted.
With integrated customer data, your Internet messages start to
make you look smart.
But making sales isn't the only use for integrated information.
Marketing can also make great use of this information. It
probably doesn't make sense, for example, to market tennis
equipment to Sam Freed. Chris Ng is a much better bet for
that. And it probably doesn't make sense to market football
jerseys to Tom Watson. Instead, marketing those things to Pat
Wilson makes the most sense. Integrated information is worth
its weight in gold when it comes to not wasting marketing
dollars and opportunities.
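The targeting described above can be sketched in a few lines. This is a hypothetical illustration (the record layout and function are not from the book, and the data mirrors the sample accounts above): with integrated attributes, a campaign filter becomes a trivial lookup.

```python
# Hypothetical illustration: integrated customer records make targeting trivial.
customers = [
    {"acct": 456, "name": "Chris Ng", "age": 18, "hobbies": ["hiking", "tennis"]},
    {"acct": 567, "name": "Pat Wilson", "age": 68, "hobbies": ["watching football"]},
    {"acct": 678, "name": "Sam Freed", "age": 45, "hobbies": ["thimble collecting"]},
]

def prospects_for(interest, records):
    """Return the accounts whose integrated profile matches the campaign."""
    return [r["acct"] for r in records if interest in r["hobbies"]]

print(prospects_for("tennis", customers))  # -> [456]
```

With only the bare account numbers of the first list, no such filter is possible; every prospect costs a wasted contact.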
CHAPTER 4
The Role of the eDBA
And now the role of the DBA expands even further with the
introduction of database procedural logic.
Information Architecture
How to Select the Optimal Information Exchange
Architecture
Introduction
Over 80 percent of Information Technology (IT) projects fail.
Startling? Maybe. Surprising? Not at all. In almost every IT
project that fails, weakly documented requirements are to
blame. And nowhere is this more obvious than in data
migration.
Data Volume
Understanding how much data must be moved from point to
point will give you metrics against which you can compare your
network bandwidth. If your network is nearly saturated already,
40 The Data Warehousing eBusiness DBA Handbook
adding the burden of information exchange may be more than
it can handle.
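A back-of-envelope calculation makes this concrete. The sketch below is illustrative only; the 50 percent utilization figure is an assumption standing in for whatever share of the link your network team will actually allow an information exchange job to consume.

```python
def transfer_hours(gigabytes, link_mbps, utilization=0.5):
    """Estimate hours to move a nightly extract, assuming only a fraction
    of the link is available for the information exchange."""
    bits = gigabytes * 8 * 1024**3           # data volume in bits
    effective_bps = link_mbps * 1_000_000 * utilization
    return bits / effective_bps / 3600

# e.g. a 50 GB extract over a 100 Mb/s link at 50% utilization
print(round(transfer_hours(50, 100), 1))  # -> 2.4
```

If the batch window is two hours and the estimate comes back above it, you know before the project starts that the architecture, the frequency, or the network must change.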
Transformation Requirements
Before all else, be sure to conduct a Data Quality Assessment
(DQA) to evaluate the health of your data resources. Probably
the highest-profile element of an information architecture is
its error management and resolution capability. A DQA will
identify existing problems that lurk in the data, and highlight
enhancements that should be made to the systems that generate
this data, to prevent such data quality concerns from happening
in the future. Of course, there will be some issues that simply
are not preventable, and others that have not yet materialized.
In this case, it will be beneficial to implement monitors that
periodically sample your data in search of non-compliant data.
Frequency
Determine how often data must be transmitted from point to
point. Will data flow in one direction, or will it be bi-
directional? Will it be flowing between two systems, or more
than two? Can the information be exchanged weekly, nightly,
or must it be as near to real-time as technically feasible?
Optimal Architecture Components
The optimal information exchange architecture will include as
many of the following components as warranted by the
project's objectives:
1. Data profiling
2. Data cleansing
3. System/network bandwidth resources
4. ETL (Extraction, Transformation & Loading)
5. Data monitoring
Naturally, there are commercial products available for each of
these components, but you can just as easily build utilities to
address your specific objectives.
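As a minimal sketch of the first component, data profiling, consider the utility below. The function and sample data are hypothetical, not drawn from any commercial product; the point is that a useful profile (row count, null rate, dominant values) takes very little code to build yourself.

```python
from collections import Counter

def profile_column(values):
    """Minimal data-profiling sketch: null rate and most common values."""
    total = len(values)
    nulls = sum(1 for v in values if v in (None, ""))
    return {
        "rows": total,
        "null_pct": round(100.0 * nulls / total, 1) if total else 0.0,
        "top_values": Counter(v for v in values if v not in (None, "")).most_common(3),
    }

# A state column with inconsistent codings -- exactly what profiling exposes.
states = ["TX", "tx", "Texas", None, "TX"]
print(profile_column(states))
```

Even this toy profile surfaces the two classic findings: missing values, and the same fact recorded under multiple spellings.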
Conclusion
While there is no single architecture that is ideal for all
information exchange projects, the components laid out in this
paper are the key criteria that successful information exchange
projects address. Perhaps you can apply this five-tier
architecture to a new information exchange project, or evaluate
existing information exchange architectures in comparison to it,
and see if there is room for improvement. It is never too late to
improve the foundation of such a critical business tool.
Database Design and the eDBA
Welcome to another installment in the ongoing saga of the
eDBA. So far in this series of articles, we have discussed eDBA
issues ranging from availability and database recovery to new
technologies such as Java and XML, and even sources of online
DBA information.
But when did you ever throw away data? Oh, sure, you may
redesign a database or move from one DBMS to another. But
what did you do? Chances are, you saved the data and migrated
it from the old database to the new one. Some changes had to
be made, maybe some external data was purchased to combine
with the existing data, and most likely some parts of the
database were not completely populated. But data lives forever.
That is, novices do not have experience with databases and data
requirements gathering, so they attempt to design databases like
the flat files they are accustomed to using. This is a mistake
because problems inevitably occur after the databases and
applications become operational in a production environment.
Once the logical data model has been created, the DBA uses
his knowledge of the DBMS that will be used to transform
logical entities and data elements into physical database tables
and columns. To successfully create a physical database design,
you will need to have a good working knowledge of the
features of the DBMS, including:
- In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects
- Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects
- Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used
- Knowledge of the DBMS configuration parameters that are in place
- Data definition language (DDL) skills to translate the physical design into actual database objects
Armed with the correct information, the DBA can create an
effective and efficient database from a logical data model. The
first step in transforming a logical data model into a physical
database design is to map each entity to a table and each
attribute to a column.
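The translation from logical model to DDL can be sketched as follows. This is a toy illustration, not a real design tool: the entity, its attribute names, and the use of SQLite as the target DBMS are all assumptions made for the example.

```python
import sqlite3

# Hypothetical logical entity; the names are illustrative, not from the book.
entity = {
    "name": "customer",
    "attributes": [("acct_no", "INTEGER PRIMARY KEY"),
                   ("name", "TEXT NOT NULL"),
                   ("income", "REAL")],
}

def to_ddl(e):
    """Translate a logical entity into physical DDL for the target DBMS."""
    cols = ", ".join(f"{col} {typ}" for col, typ in e["attributes"])
    return f"CREATE TABLE {e['name']} ({cols})"

ddl = to_ddl(entity)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)      # the logical design becomes a real database object
print(ddl)
```

In practice, of course, the physical types, indexes, and storage parameters come from the DBA's knowledge of the specific DBMS listed above, not from a one-line mapping.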
E-Business and Infrastructure
Pick up any computer trade journal today and you can't help
but read about e-business. It's everywhere.
But along the way some things you never read about start to
happen. The Web application that was originally designed
needs to be changed because the visitors to the website are
responding in a manner totally unanticipated. They are looking
at things you never intended them to look at. And they are
ignoring things that they should be paying attention to. The
changes need to be made immediately.
The reality of the Web environment has hit. Creating the Web
is one thing. Making it operate successfully on a day-to-day
basis is something else.
Then the volumes of data grow so large the Web can't swallow
them. Performance gets worse and files are lost. One day the
head of systems decides a new and larger (and much more
expensive) computer is needed for the server. But the cost and
complexity of upgrading the computer is only the beginning
headache. All of today's Web applications and data have to be
converted into tomorrow's Web systems. And the conversion
must be done with no outages that are apparent to the Web
visitors.
One capability of the website is the ability to create and send
transactions to the standard corporate systems environment.
When too much data starts to collect in the Web environment
it passes out of the Web environment into a granularity
manager that in turn passes the now refined and condensed
data into a data warehouse. And the website has the ability to
access data directly from the corporate environment by means
of an ODS.
Corporate Structure
Integrating Data in the Web-Based E-Business
Environment
In order to be called industrial strength, the Web-based e-
business environment needs to be supported by an
infrastructure called the corporate information factory. The
corporate information factory is able to manage large volumes
of data, provide good response time in the face of many
transactions, allow data to be examined at both a detailed level
and a summarized level, and so forth.
Figure 3: The content, structure, and keys of the corporate systems need to
be used in the creation of the Web environment.
The Issues of the E-Business Infrastructure
In order to have long-term success, the systems of the e-
business environment need to exist in an infrastructure that
supports the full set of needs. The e-business Web
environment needs to operate in conjunction with the
corporate information factory. The corporate information
factory then is the supporting infrastructure for the Web-based
e-business environment that allows for the different issues of
operation to be fulfilled.
What then are the issues that face the Web designer/
administrator in the successful operation of the Web e-business
environment? There are three pressing issues at the forefront
of success. They are:
- managing the volumes of data that are collected as a by-product of e-business processing
- establishing and achieving good website performance so that Internet interaction is adequate for the user
- integrating e-business processing with other already established corporate processing
These three issues are at the heart of success of the operation
of the e-business environment. These issues are not addressed
directly inside the Web environment but by a combination of
the Web environment interfacing with the corporate
information factory.
As data passes from the website to the data warehouse, the data
passes through a granularity manager. The granularity manager
performs the function of editing and condensing the Web-generated
data. Data that is not needed is deleted. Data that
needs to be combined is aggregated. Data that is too granular is
summarized. The granularity manager has many ways of
refining the data before it reaches the warehouse.
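The editing and condensing just described can be sketched in a few lines. This is a hypothetical illustration: the clickstream field names and the aggregation key are assumptions, not the design of any real granularity manager.

```python
from collections import defaultdict

# Raw clickstream detail, far too granular for the warehouse.
clicks = [
    {"visitor": "c1", "page": "catalog", "ms": 130},
    {"visitor": "c1", "page": "catalog", "ms": 90},
    {"visitor": "c2", "page": "checkout", "ms": 200},
]

def condense(records):
    """Aggregate detail that is too granular; unneeded fields simply
    never make it into the summary that flows to the warehouse."""
    summary = defaultdict(lambda: {"hits": 0, "ms": 0})
    for r in records:
        key = (r["visitor"], r["page"])
        summary[key]["hits"] += 1
        summary[key]["ms"] += r["ms"]
    return dict(summary)

print(condense(clicks))
```

Three raw events become two summary rows; scaled to millions of clicks a day, this condensation is what keeps the warehouse from being swamped.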
Performance
Performance in the Web e-business environment is a funny
thing. Performance is vital to the success of the Web-based e-
business environment because in the Web-based e-business
environment the Web IS the store.
corporate information factory is through the corporate ODS.
Figure 3 shows this interface.
Figure 3: The interface from the data warehouse environment to the Web
environment is by way of the corporate ODS.
Figure 4: Corporate data is integrated with the Web data when they meet
inside the data warehouse.
Figure 4 shows that data from the Web passes into the data
warehouse. If the data coming from the Web has used
common key structures and definitions of data, then the
granularity manager has a simple job to do. But if the Web
designer has used unique conventions and structures for the
Web environment, then it is the job of the granularity manager
to convert them to the corporate standard.
ALTERNATE SPELLINGS OF THE
LIBYAN LEADER'S SURNAME
1 Qadhafi
2 Qaddafi
3 Qatafi
4 Quathafi
5 Kadafi
6 Kaddafi
7 Khadaffi
8 Gadhafi
9 Gaddafi
10 Gadafy
This approach does not alleviate the need for a data warehouse,
and there will still be integration rules to support, but
improving the quality of data at the point it is collected
considerably increases the likelihood that this data will be used
more effectively over a longer period of time.
There are many examples to follow. Take the EPA, which has
installed monitors of various shapes and sizes across the
continental U.S. and beyond. The monitors take periodic
samples of air and water quality and compare the sample results
to previously agreed-upon benchmarks.
We too must identify the data elements that contain the most
critical data sources we manage and develop data quality
monitors that periodically sample the data and track quality
levels.
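A data quality monitor in the spirit of the EPA analogy can be sketched as below. Everything here is illustrative: the compliance rule, the sample size, and the seeded sampling are assumptions chosen so the example runs deterministically, not features of any particular monitoring product.

```python
import random

def non_compliant_rate(records, rule, sample_size=100, seed=7):
    """Periodically sample the data and measure how much of it fails
    the agreed-upon benchmark, EPA-monitor style."""
    sample = random.Random(seed).sample(records, min(sample_size, len(records)))
    bad = sum(1 for r in sample if not rule(r))
    return bad / len(sample)

# Hypothetical ZIP-code column with two non-compliant values.
zips = ["75201", "10001", "ABCDE", "94105", ""]
rate = non_compliant_rate(zips, lambda z: z.isdigit() and len(z) == 5)
print(rate)  # fraction of sampled ZIP codes failing the benchmark
```

Tracked over time, such a rate gives exactly the quality trend line the paragraph above calls for: rising non-compliance flags a source system that needs attention.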
Data Modeling for the Data Warehouse
In order to be effective, data warehouse developers need to
show tangible results quickly. At the same time, in order to
build a data warehouse properly, you need a data model. And
everyone knows that data models take huge amounts of time to
build. How then can you say in the same breath that a data
model is needed in order to build a data warehouse and that a
data warehouse should be built quickly? Aren't those two
statements completely contradictory?
The answer is -- not at all. Both statements are true, and
neither contradicts the other once you know the truth and
understand the dynamics that are at work.
Fact 1 -- when you build a data model for a data warehouse you
build a data model for only the primitive data of the
corporation. Figure 1 suggests a data model for the primitive
data of the data warehouse.
Stated differently, the data model for the data warehouse tries
to include as many classifications of data as possible and does
not exclude any reasonable classification. In doing so, the data
modeler sets the stage for all sorts of requirements to be
satisfied by the data warehouse.
Once defined this way in the data model, the data warehouse is
prepared to handle many requirements, some known, some
unknown.
For these reasons then, creating the data model for the data
warehouse is not a horribly laborious task given the parameters
of modeling only atomic data and putting in attributes that
allow the atomic data to be stretched any way desired.
Interacting with the Internet Viewer
The Web-based e-business environment is supported by an
infrastructure called the corporate information factory. The
corporate information factory provides many capabilities for
the Web, such as the ability to handle large volumes of data,
have good and consistent performance, see both detail and
summary information, and so forth.
The ability to remember who has been to a site and what they
have done is at the heart of the opportunity for cross selling,
extensions to existing sales, and many other marketing
opportunities. In order to see how this extended feedback loop
works, it makes sense to follow a customer through the process
for a few transactions.
The Web analyst reads the detailed data for each cookie. In the
case of the Internet viewer who has had one entry into the
website, there will be only one set of detailed data reflecting the
dialogue that has occurred. But if there have been multiple
entries by the Internet viewer, the Web analyst would consider
each of them.
In addition if the Web analyst has available other data about the
customer, that information is taken into consideration as well.
This analysis of detailed historical data is shown in Figure 3,
Step 1.
The viewer enters the system through the firewall. Figure 4,
Step 1 shows this entry.
Control then passes to the Web manager and the first thing the
Web manager does is to determine if the cookie is known to
the system. Since this is the second (or later) dialogue for the
viewer there is a cookie record for the viewer. The Web
manager goes to the ODS and finds that indeed the cookie is
known to the system. This interaction is shown in Figure 4,
Step 2.
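The Web manager's cookie check against the ODS can be sketched as below. This is a toy simulation, not the mechanics of any real ODS: the dictionary stands in for the operational data store, and the cookie identifier and record fields are hypothetical.

```python
# The dict stands in for the ODS; keys and fields are hypothetical.
ods = {"cookie-81f2": {"visits": 3, "last_page": "checkout"}}

def handle_entry(cookie_id):
    """Return the viewer's history if the cookie is known, else start fresh."""
    record = ods.get(cookie_id)
    if record is None:
        ods[cookie_id] = {"visits": 1, "last_page": None}   # first-time viewer
        return "new viewer"
    record["visits"] += 1                                    # returning viewer
    return f"returning viewer, visit {record['visits']}"

print(handle_entry("cookie-81f2"))  # -> returning viewer, visit 4
print(handle_entry("cookie-0000"))  # -> new viewer
```

The branch taken here is precisely the Figure 4, Step 2 decision: known cookies load their dialogue history; unknown cookies create a new record.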
IN SUMMARY
The feedback loop that has been described fulfills the needs
of dialogue management in the Internet environment. The
feedback loop allows:
- each customer's records at the detailed level to be analyzed
- access to summary and aggregate information to be made in subsecond time
- records to be created each time new information is available, and so forth.
CHAPTER 14
Getting Smart
And how exactly are products priced just right? The genesis of
pricing products just right is the integrated historical data that
resides in the data warehouse. The data warehouse contains a
huge amount of useful sales data. Each sales transaction is
recorded in the data warehouse.
Historically Speaking
By looking at the past sales history of an item, the analyst can
start to get a feel for the price elasticity of the item. Price
elasticity refers to the sensitivity of the sale to the price of the
product. Some products sell well regardless of their price and
other products are very sensitive to pricing. Some products sell
well when the price is low but sell poorly when the price is
high.
Consider the different price elasticities of some common
products - milk, bicycles, and washing machines.
MILK PRICE
$2.25/gallon 560 units sold
$2.15/gallon 585 units sold
$1.95/gallon 565 units sold
$1.85/gallon 590 units sold
$1.75/gallon 575 units sold
$1.65/gallon 590 units sold
BICYCLES
$400 16 units sold
$390 15 units sold
$380 19 units sold
$370 21 units sold
$360 20 units sold
$350 23 units sold
$340 24 units sold
$330 26 units sold
$320 38 units sold
$310 47 units sold
$300 59 units sold
$290 78 units sold
WASHING MACHINES
$500 20 units sold
$475 22 units sold
$450 23 units sold
$425 20 units sold
$400 175 units sold
$375 180 units sold
$350 195 units sold
$325 200 units sold
$300 210 units sold
$275 224 units sold
And exactly where does the merchant get the numbers for
elasticity analysis? The answer, of course, is a data warehouse.
The data warehouse contains detailed, integrated, historical data,
which is of course exactly what the business analyst needs to
effect these analyses.
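Using the highest- and lowest-price rows of the milk and bicycle tables above, the analyst's elasticity calculation can be sketched with the standard midpoint (arc-elasticity) formula. The function is a textbook formula, not from the book; the numbers are copied from the tables.

```python
def arc_elasticity(p1, q1, p2, q2):
    """Percent change in quantity over percent change in price,
    using the midpoint method so direction doesn't matter."""
    dq = (q2 - q1) / ((q1 + q2) / 2)
    dp = (p2 - p1) / ((p1 + p2) / 2)
    return dq / dp

milk = arc_elasticity(2.25, 560, 1.65, 590)     # from the milk table
bikes = arc_elasticity(400, 16, 290, 78)        # from the bicycle table
print(round(milk, 2), round(bikes, 2))
```

The contrast is exactly the one the tables show: milk's elasticity is near zero (demand barely responds to price), while the bicycles' magnitude is large (a price cut swings sales dramatically).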
Conclusion
Once the price elasticity of items is known, the merchant
knows just how to price the item. And once the merchant
knows exactly how to price an item, the merchant is positioned
to make money. The Web and eBusiness now are positioned to
absolutely maximize sales and revenue. However, note that if
the products are not priced properly, the Web accelerates the
rate at which the merchant loses money. This is what is meant
by being smart about the message you put out on the Web. The
Web accelerates everything. It either allows you to make
money faster than ever before or lose money faster than ever
before. Whether you make or lose money depends entirely on
how smart you are about what goes out over the Web.
Java
The eDBA and Java
Welcome to another installment of our eDBA column where
we explore and investigate the skills required of DBAs as their
companies move from traditional business models to become
e-businesses. Many new technologies will be encountered by
organizations as they morph into e-businesses. Some of these
technologies are obvious such as connectivity, networking, and
basic web skills. But some are brand new and will impact the
way in which an eDBA performs her job. In this column and
next month's column, I will discuss two of these new
technologies and their impact on the eDBA. This month we
discuss Java; next time, XML. Neither of these
columns will provide an in-depth tutorial on the subject.
Instead, I will provide an introduction to the subject for those
new to the topic, and then describe why an eDBA will need to
know about the topic and how it will impact their job.
What is Java?
Java is an object-oriented programming language. Originally
developed by Sun Microsystems, Java was modeled after, and
most closely resembles, C++. But it requires a smaller footprint
and eliminates some of the more complex features of C++ (e.g.
pointer management). The predominant benefit of the Java
programming language is portability. It enables developers to
write a program once and run it on any platform, regardless of
hardware or operating system.
However, you cannot use SQLJ to write dynamic SQL. This
can be a drawback if you desire the flexibility of dynamic SQL.
However, you can use both SQLJ and JDBC calls inside of a
single program. Additionally, if your shop uses ODBC for
developing programs that access Oracle, for example, then
JDBC will be more familiar to your developers than SQLJ.
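The static-versus-dynamic distinction can be illustrated without Java at all. In the sketch below, Python's DB-API with SQLite is used as a stand-in for JDBC (an assumption made purely so the example is self-contained); the table and column names are hypothetical. SQLJ-style static SQL is fixed at precompile time, whereas here even the WHERE column is chosen at runtime, which is exactly the flexibility dynamic SQL provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Smith", "SALES"), ("Jones", "IT")])

def find_emps(filter_col, value):
    """Dynamic SQL: the WHERE column isn't known until runtime."""
    assert filter_col in ("name", "dept")    # whitelist guards against injection
    sql = f"SELECT name FROM emp WHERE {filter_col} = ?"
    return [row[0] for row in conn.execute(sql, (value,))]

print(find_emps("dept", "SALES"))  # -> ['Smith']
```

The trade-off is the familiar one: static SQL buys compile-time checking and plan stability, dynamic SQL buys this kind of runtime flexibility.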
One final issue for eDBAs confronted with Java at their shop:
you will need to have at least a rudimentary understanding of
how to read Java code. Most DBAs, at some point in their
career, get involved in application tuning, debugging, or
designing. Some wise organizations make sure that all
application code is submitted to a DBA Design Review process
before it is promoted to production status. The design review is
performed to make sure that the code is efficient, effective, and
properly coded. We all know that application code and SQL are the
single biggest cause of poor relational performance. In fact,
most experts agree that 70% to 80% of poor "relational"
performance is caused by poorly written SQL and application
logic. So reviewing programs before they are moved to
production status is a smart thing to do.
Resistance is Futile
Well, you might argue that portability is not important. I can
hear you saying "I've never written a program for DB2 on the
mainframe and then decided, oh, I think I'd rather run this over
on our RS/6000 using Informix on AIX." Well, you have a
point. Portability is a nice-to-have feature for most
organizations, not a mandatory one. The portability of Java
code helps software vendors more than IT shops. But if
software vendors can reduce cost, perhaps your software
budget will decrease. Well, you can dream, can't you?
Conclusion
Since Java is clearly a part of the future of e-business, eDBAs
will need to understand the benefits of Java. But, clearly, that
will not be enough for success. You also will need a
technological understanding of how Java works and how
relational data can be accessed efficiently and effectively using
Java.
Beginning to learn Java today is a smart move, one that will pay
off in the long term, and perhaps even the near term!
XML
New Technologies of the eDBA: XML
This is the third installment of my regular eDBA column, in
which we explore and investigate the skills required of DBAs to
support the data management needs of an e-business. As
organizations move from a traditional business model to an e-
business model, they will also introduce many new
technologies. Some of these technologies, such as connectivity,
networking, and basic Web skills, are obvious. But some are
brand new and will impact the way in which eDBAs perform
their jobs.
What is XML?
XML is getting a lot of publicity these days. If you believe
everything you read, then XML is going to solve all of our
interoperability problems, completely replace SQL, and
possibly even deliver world peace. In reality, all of the previous
assertions about XML are untrue.
Some Skepticism
There are, however, some problems with XML. Browser support
for the language, for example, is only partial, even in the
most popular Web browsers. As more XML capabilities gain
support and come to market, this will become less of a
problem.
Integrating XML with the DBMS
More and more of the popular DBMS products are providing
support for XML. Take, for example,
the XML Extender provided with DB2 UDB Version 7. The
XML Extender enables XML documents to be integrated with
DB2 databases. By integrating XML into DB2, you can more
directly and quickly access the XML documents as well as
search and store entire XML documents using SQL. You also
have the option of combining XML documents with traditional
data stored in relational tables.
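The idea behind this kind of integration can be sketched without DB2 itself. The code below is not the XML Extender's actual API; it uses Python's standard library with SQLite as a stand-in to show the pattern: store the whole document in a column, shred the searchable elements into ordinary columns, and retrieve either via plain SQL. The document and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = "<order><custno>123</custno><total>49.95</total></order>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (custno TEXT, total REAL, xml_doc TEXT)")

root = ET.fromstring(doc)                 # shred the searchable elements
conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
             (root.findtext("custno"), float(root.findtext("total")), doc))

row = conn.execute("SELECT xml_doc FROM orders WHERE custno = '123'").fetchone()
print(row[0])  # the intact document, retrieved via plain SQL
```

This is the "combine XML documents with traditional relational data" option in miniature: the relational columns serve SQL search, while the intact document survives for applications that need it whole.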
Comments:
It is at best misleading, and at worst disingenuous, to claim
the MV data structure is an "improvement" on the
relational structure. First, the hierarchic structure underlying
MV precedes the relational model. And second, the relational
model was invented to replace the hierarchic model (which it
did), the exact opposite of the claim!
Note: In fact, if I recall correctly, the Pick operating system
preceded even the first generation hierarchic DBMSs and
was only later extended to database management.
The logical-physical confusion raises its ugly head right in
the first sentence of the first paragraph. Unlike a MV file,
which is physical, relational tables are logical. There is nothing
in the relational model — and intentionally so — to dictate
how the data in tables should be physically stored and,
therefore, nothing to prevent RDBMSs from storing data from
multiple logical tables in one physical file. And, in fact, even
SQL products — which are far from true implementations
of the relational model — support such features. The
important difference is that while true RDBMSs (TRDBMS)
insulate applications and users from the physical details,
MVDBMSs do not.
Paper representations of R-tables are two-dimensional
because they are pictures of R-tables, not the real thing. An R-
table with N columns is an N-dimensional representation of
the real world.
The term “post-relational” — which has yet to be precisely
defined — is used in marketing contexts to obscure the non-
relational nature of MV products. Neither it, nor the term
“three-dimensional” have anything to do with “variable
field” and “variable record length,” implementation features
that can be supported by TRDBMSs. That current SQL
DBMSs lack such support is not a relational flaw, but a
product flaw.
It’s the “Values that are specific to each state [that] are
grouped logically” that give MV technology its name and
throw into serious question whether MV technology
adheres to the relational principle of single-valued columns.
The purpose of this principle is practical: it avoids serious
complications, and takes advantage of the sound
foundations of logic and math. This should not be
interpreted to mean that "single-valued" means no lists,
arrays, and so on. A value can be anything and of arbitrary
complexity, but it must be defined as such at the data type (domain)
level, and MV products do not do that. In fact, MV files are
not relational databases for a variety of reasons, so even if they
adhered to the SVC principle, it wouldn’t have made a
difference (for an explanation why, see the first two papers
in the new commercial DATABASE FOUNDATIONS
SERIES launched at DATABASE DEBUNKINGS -
http://www.dbdebunk.com/.)
The second quote is from a response by Steve VanArsdale to
my two-part article, "The Dangerous Illusion: Normalization,
Performance and Integrity," in DM Review.
Note: This is, in fact, exactly what Oracle did when it added
the special CONNECT BY clause to its version of SQL, for
explode operations on tree structures. Aside from violating
relational closure by producing results with duplicates and
meaningful ordering, it works only for very simple trees.
References
"On Multivalue Technology"
(http://www.dbdebunk.com/multivalue.htm)
CHAPTER 18
Securing your Data
While this hierarchical model for roles may appear simple, there
are some important caveats that must be considered.
Let's take a look at how VPD works. When users access a table
(or view) that has a security policy:
1. The Oracle server calls the policy function, which returns a
"predicate." A predicate is a WHERE clause that qualifies a
particular set of rows within the table. The heart of VPD
security is the policy transformation of SQL statements. At
runtime, Oracle produces a transient view with the text:
SELECT * FROM scott.emp WHERE P1
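The effect of that rewrite can be simulated outside Oracle. The sketch below uses an in-memory SQLite database to mimic a policy function returning a predicate that the "server" appends to every query; the table, the policy rule, and the user names are all hypothetical, and real VPD does this transparently inside the Oracle server rather than by string rewriting in the client.

```python
import sqlite3

# Simulation of the VPD query rewrite: the policy function returns a
# predicate, and every statement is run against a transient view of
# the form "SELECT * FROM (<query>) WHERE <predicate>".
def policy_function(user):
    # Hypothetical policy: users see only rows in their own department.
    return "deptno = (SELECT deptno FROM emp WHERE ename = '%s')" % user

def vpd_query(conn, user, sql):
    predicate = policy_function(user)            # server calls the policy
    rewritten = "SELECT * FROM (%s) WHERE %s" % (sql, predicate)
    return conn.execute(rewritten).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("SMITH", 10), ("JONES", 20), ("ADAMS", 20)])

# The same SELECT returns different rows for different users.
print(vpd_query(conn, "SMITH", "SELECT * FROM emp"))
print(vpd_query(conn, "JONES", "SELECT * FROM emp"))
```

String interpolation is used here purely for readability of the sketch; a production policy would bind values safely.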
The grant-execute security model fits very nicely with the
logic-consolidation trend of the past decade. By moving all of the
business logic into the database management system, it can be
tightly coupled to the database while gaining the benefit of
additional security. The Oracle9i database is now the
repository not only for the data itself, but also for all of the SQL,
stored procedures, and functions that transform the data.
By consolidating both the data and the procedures in one central
repository, the Oracle security manager has much tighter
control over the entire database enterprise.
Conclusion
By themselves, each Oracle security mechanism does an
excellent job of controlling access to data. However, it can be
quite dangerous (especially from an auditing perspective) to
mix and manage between the three security modes. For
example, an Oracle shop using role-based security that also
decided to use virtual private databases would have a hard time
160 The Data Warehousing eBusiness DBA Handbook
reconciling what users had specific access to what data tables
and rows.
CHAPTER 19
Maintaining Efficiency
For the second change, let's update a row in the first table to
change a variable character column; for example, let's change
the LASTNAME column from "DOE" to "BEAUCHAMP."
This update results in an expanded row size because the value
for LASTNAME is longer in the new row: "BEAUCHAMP"
consists of 9 characters, whereas "DOE" consists of only 3.
Let's make a third change, this time to table three. In this case
we are modifying the value of every clustering column such
that the DBMS cannot maintain the data in clustering
sequence.
Reorganizing Tablespaces
To minimize fragmentation and row chaining, as well as to re-
establish clustering, database objects need to be restructured on
a regular basis. This process is known as reorganization. The
primary benefit is the resulting speed and efficiency of database
functions because the data is organized in a more optimal
fashion on disk. The net result of reorganization is to make
Figure 2 look like Figure 1 again. In short, reorganization is
useful for any database because data inevitably becomes
disorganized as it is used and modified.
Online Reorganization
Modern reorganization tools enable database structures to be
reorganized while the data is up and available. To accomplish
an online reorganization, the database structures to be
reorganized must be copied. Then this "shadow" copy is
reorganized. When the shadow reorganization is complete, the
reorg tool "catches up" by reading the log to apply any changes
that were made during the online reorganization process. Some
vendors offer leading-edge technology that enables the reorg to
catch up without having to read the log. This is accomplished
by caching data modifications as they are made. The reorg can
read the cached information much quicker than trying to catch
up by reading the log.
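The shadow-copy approach described above can be sketched in a few lines. This is a toy simulation to show the mechanics (copy, reorganize, catch up from the log, swap); the data structures, the sort-based "reorganization," and the log format are all illustrative, not any vendor's implementation.

```python
# Sketch of an online reorganization: reorganize a shadow copy while
# the live table keeps taking changes, then catch up from the change
# log and swap the shadow in.
def online_reorg(live, log):
    shadow = dict(live)                    # 1. copy the live structure
    shadow = dict(sorted(shadow.items()))  # 2. reorganize the shadow
                                           #    (here: restore clustering order)
    for key, value in log:                 # 3. catch up: apply the changes
        if value is None:                  #    made during the reorg
            shadow.pop(key, None)          #    (None marks a delete)
        else:
            shadow[key] = value
    return shadow                          # 4. swap shadow in for the original

live = {3: "DOE", 1: "SMITH", 2: "JONES"}      # disorganized live data
log = [(2, "BEAUCHAMP"), (4, "NEW ROW")]       # changes made mid-reorg
print(online_reorg(live, log))
```

The caching technique mentioned above simply replaces step 3's log read with a faster in-memory structure; the catch-up logic is the same.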
Synopsis
Reorganizations can be costly in terms of downtime and
computing resources. And it can be difficult to determine when
reorganization will actually create performance gains. However,
the performance gains that can be accrued are tremendous
when fragmentation and disorganization exist. The wise DBA
will plan for regular database reorganization based on an
examination of the data to determine if the above types of
disorganization exist within their corporate databases.
CHAPTER 20
The Highly Available Database
The eDBA and Data Availability
Greetings and welcome to a new monthly column that explores
the skills required of DBAs as their companies move from
traditional business models to become e-businesses. This, of
course, begs the question: what is meant by the term e-
business? There is a lot of marketing noise surrounding e-
business and sometimes the messages are confusing and
disorienting. Basically, e-business can be thought of as the
transformation of key business processes through the use of
Internet technologies.
What does all of this mean for the eDBA? Well, the first thing
to take away from this discussion is: "Although it is important
to plan for recovery from unplanned outages, it is even more
important to minimize downtime resulting from planned
outages. This is true because planned outages occur more
frequently and therefore can have a greater impact on e-
vailability than unplanned outages."
How can an eDBA reduce downtime associated with planned
outages? The best way to reduce downtime is to avoid it.
Consider the following technology and software to avoid the
downtime traditionally associated with planned outages.
From December 1998 to June 1999, the eBay web site was
inaccessible for at least 57 hours, due to the following outages:
December 7: Storage software fails (14 hours)
December 18: Database server fails (3 hours)
March 15: Power outage shuts down ISP
May 20: CGI server fails (7 hours)
May 30: Database server fails (3 hours)
June 9: New UI goes live; database server fails (6 hours)
June 10: Database server fails (22 hours)
June 12: New UI and personalization killed
June 13-15: Site taken offline for maintenance (2 hours)
These problems resulted in negative publicity and lost business.
Some of the problems required data to be restored. eBay
customers could not reliably access the site for several days.
Auction timeframes had to be extended, and bids that might have
been placed during that time were lost. eBay agreed to
refund all fees for all auctions running on its site while
its systems were down. Recovering from this series of outages
cost eBay an estimated $5 million in
refunds and auction extensions. This, in turn, caused the stock
to drop from a high of $234 in April to the $130 range in mid-
July. Don't misunderstand and judge eBay too harshly, though.
eBay is a great site, a good business model, and a fine example
of an e-business. But better planning and preparation for "e-
database administration" could have reduced the number of
problems they encountered.
Conclusion
These are just a few techniques eDBAs can use to maintain
high e-vailability for their web-enabled applications. Read this
column every month for more tips, tricks, and techniques on
achieving e-vailability, and migrating your DBA skills to the
web.
Strategy
The eDBA and Recovery
As I have discussed in this column before, availability is the
most important issue faced by eDBAs in managing the database
environment for an e-business. An e-business, by definition, is
an online business, and an online business should never close.
Customers expect Web applications to deliver full functionality
regardless of the day of the week or the time of day. And never
forget that the Web is worldwide: when it is midnight in New
York, it is still prime time in Singapore. Simply put, an
e-business must be available and operational 24 hours a day,
365 days a year.
An e-business must be prepared to engage with customers at any
time or risk losing business to a company whose website is more
accessible. Studies show that if a
RAID Levels
There are several levels of RAID that can be implemented.
RAID Level 0 (or RAID-0) is also commonly referred to as disk
striping. With RAID-0, data is split across multiple drives,
which delivers higher data throughput. But there is no
redundancy (which really doesn't fit the definition of the
RAID acronym). Because there is no redundant data being stored,
performance is usually very good, but a failure of any disk in
the array will result in data loss.
A disaster that takes out your data center is the worst of all
possible situations and will definitely result in an outage of
some considerable length. The length of the outage will depend
greatly on the processes in place to send database copies and
database logs to an off-site location.
Point-in-Time Recovery
Another type of database recovery is a Point-in-Time (PIT)
recovery. PIT recovery usually is performed to deal with
application level problems. Conventional techniques to
perform a point-in-time recovery will remove the effects of all
transactions performed since a specified point in time. The
traditional approach will involve an outage. Steps for PIT
recovery include:
1. Identify the point in time to which the database should
be recovered. Depending on the DBMS being used, this can
be an actual time, an offset on the database log, or a
specific image copy backup (or set of backups). Care must
be taken to ensure that the PIT selected for recovery will
provide data integrity, not just for the database object
impacted, but for all related database objects as well.
2. Take the database objects off-line while the
recovery process applies the image copy backups.
3. If the recovery is to a PIT later than the time the backup
was taken, roll forward through the database logs, applying
the changes to the database objects.
4. When complete, bring the database objects back
online.
The outage will last as long as it takes to complete steps 2
through 4. Depending on the circumstances, you might want to
make the database objects unavailable for update immediately
upon discovering data integrity problems so that subsequent
activities do not make the situation worse. In that case, the
outage will encompass Steps 1 through 4.
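The roll-forward portion of a PIT recovery can be sketched as follows. This is a simplified simulation of the steps above under an invented log format (timestamp, key, value); a real DBMS reads its own proprietary log and image copy formats.

```python
# Sketch of point-in-time recovery: restore the image copy, then roll
# forward through logged changes up to the chosen point in time,
# discarding everything after it. Log format is hypothetical.
def pit_recover(image_copy, log, recover_to):
    table = dict(image_copy)              # step 2: apply the image copy
    for timestamp, key, value in log:     # step 3: roll forward through
        if timestamp > recover_to:        #   the log, stopping at the PIT
            break
        table[key] = value
    return table                          # step 4: bring the object online

image_copy = {1: "SMITH", 2: "DOE"}       # backup taken at time 100
log = [(110, 2, "BEAUCHAMP"),             # good change
       (120, 3, "JONES"),                 # good change
       (130, 2, "CORRUPT")]               # bad transaction to remove
print(pit_recover(image_copy, log, recover_to=120))
```

Choosing `recover_to=120` keeps the two good changes and removes the effects of the bad transaction at time 130, along with any later good work, which is exactly the shortcoming Transaction Recovery addresses.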
Transaction Recovery
A third type of database recovery exists for e-businesses willing
to invest in sophisticated third-party recovery solutions.
Transaction Recovery addresses the shortcomings of traditional
recoveries by reducing or eliminating downtime and avoiding
the loss of good data.
For UNDO recovery, the database log is read to find the data
modifications that were applied during a given timeframe and:
INSERTs are turned into DELETEs
DELETEs are turned into INSERTs
UPDATEs are reversed to UPDATE back to the old value
In effect, an UNDO recovery reverses database modifications
using SQL. The traditional DBMS products do not provide
native support for this. To generate UNDO recovery SQL, you
will need a third-party solution that understands the database
log format and can create the SQL needed to undo the data
modifications.
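The inversion logic such a tool applies can be sketched as follows. The log-entry tuple format, table, and column names here are invented for illustration; a real product decodes the DBMS's proprietary log format and generates properly quoted, bound SQL.

```python
# Sketch of UNDO SQL generation from a hypothetical log format
# (operation, table, key, old value, new value): INSERTs become
# DELETEs, DELETEs become INSERTs, and UPDATEs restore the old value.
def undo_sql(log_entry):
    op, table, key, old, new = log_entry
    if op == "INSERT":
        return "DELETE FROM %s WHERE id = %s" % (table, key)
    if op == "DELETE":
        return "INSERT INTO %s (id, val) VALUES (%s, '%s')" % (table, key, old)
    if op == "UPDATE":
        return "UPDATE %s SET val = '%s' WHERE id = %s" % (table, old, key)
    raise ValueError("unknown operation: " + op)

log = [("INSERT", "emp", 7, None, "DOE"),
       ("UPDATE", "emp", 5, "SMITH", "JONES")]

# Undo in reverse order, newest change first, so dependent changes
# are backed out before the changes they depended on.
for entry in reversed(log):
    print(undo_sql(entry))
```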
A complementary approach is REDO recovery, in which SQL is
generated from the log to re-apply desired changes. Since the
REDO process does not generate SQL for the problem
transactions, performing a point-in-time recovery and then
executing the REDO SQL can restore the data to a current
state that does not include the problem transactions.
Database Design
In some cases you can minimize the impact of future database
problems by properly designing the database for the e-business
application that will use the database. For example, you might
be able to segment or partition the database by type of
customer, location, or some other business criterion whereby
only a portion of the database can be taken off-line while the
rest remains operational.
In this way, only certain clients will be affected, not the entire
universe of users. Of course, this approach is not always
workable, but sometimes "up front" planning and due diligence
during database design can mitigate the impact of future
problems.
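The segmentation idea above can be sketched as a table partitioned by a business criterion, where one partition can be taken off-line while the others keep serving users. The class, region names, and rows below are illustrative only, not any DBMS's partitioning syntax.

```python
# Sketch of partitioning by a business criterion (region, here): a
# maintenance outage on one partition affects only that region's
# users, while the other partitions stay fully operational.
class PartitionedTable:
    def __init__(self, regions):
        self.partitions = {r: {} for r in regions}
        self.online = {r: True for r in regions}

    def insert(self, region, key, row):
        if not self.online[region]:
            raise RuntimeError("partition %s is off-line" % region)
        self.partitions[region][key] = row

    def take_offline(self, region):
        self.online[region] = False   # only this region's users affected

table = PartitionedTable(["US", "EU", "APAC"])
table.insert("US", 1, "order-1001")
table.take_offline("EU")              # EU down for maintenance...
table.insert("APAC", 2, "order-1002") # ...but APAC still takes writes
```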
Other solutions exist that back out transactions from the log to
perform a database recovery. For eDBAs, a backout recovery
may be desired in instances where a problem is identified
quickly. You may be able to decrease the time required to
recover by backing out the effects of a bad transaction instead
of going back to an image copy and rolling forward through the
log.
Tasks
Intelligent Automation of DBA Tasks
It is hard to get good help these days. There are more job
openings for qualified, skilled IT professionals than there are
individuals to fill the jobs. And one of the most difficult IT
positions to fill is the DBA. DBAs are especially hard to recruit
because the skills required to be a good DBA span multiple
disciplines. These skills are difficult to acquire, and to make
matters more difficult, the required skill set of a DBA is
constantly changing.
The DBA must understand the business purpose for the data
to ensure that it is used appropriately and is accessible when the
business requires it to be available. Appropriate usage involves
data security rules, user authorization, and ensuring data
integrity. Availability involves database tuning, efficient
application design, and performance monitoring and tuning.
These are difficult and complicated topics. Indeed, entire books
have been dedicated to each of these topics.
After a physical database has been created from the data model,
the DBA must be able to manage that database once it has
been implemented. One major aspect of this management
involves performance management. A proactive database
monitoring approach is essential to ensure efficient database
access. The DBA must be able to utilize the monitoring
environment, interpret its statistics, and make changes to data
structures, SQL, application logic, and the DBMS subsystem to
optimize performance. And systems are not static; they can
change quite dramatically over time. So the DBA must be able
to predict growth based on application and data usage patterns
and implement the necessary database changes to
accommodate the growth. And performance management is
not just managing the DBMS and the system. The DBA must
understand SQL, the standard relational database access
language. Furthermore, the DBA must be able to review SQL
and host language programs and to recommend changes for
optimization. As databases are implemented with triggers,
stored procedures, and user-defined functions, the DBA must
be able to design, debug, implement, and maintain the code-
based database objects as well.
A Lot of Effort
Implementing, managing, and maintaining complex database
applications spread throughout the world is a difficult task. To
support modern applications a vast IT infrastructure is required
that encompasses all of the physical things needed to support
your applications. This includes your databases, desktops,
networks, and servers, as well as any networks and servers
outside of your environment that you rely on for e-business.
These things, operating together, create your IT infrastructure.
These disparate elements are required to function together
efficiently for your applications to deliver service to their users.
Intelligent Automation
One of the ways to reduce these problems is through intelligent
automation. As IT professionals we have helped to deliver
systems that automate multiple jobs throughout our
organizations. That is what computer applications do: they
automate someone's job to make that job easier. But we have
yet to intelligently automate our DBA jobs. By automating
some of the tedious day-to-day tasks of database
administration, we can free up some time to learn about new
RDBMS features and to implement them appropriately.
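One concrete example of such automation is having software, rather than a DBA, examine database statistics and decide which objects need a reorganization. The statistics, thresholds, and table names below are hypothetical, chosen only to show the decision logic.

```python
# Sketch of intelligent automation of a routine DBA task: inspect
# (hypothetical) table statistics and flag tables whose chained-row
# percentage or clustering ratio has crossed a threshold, instead of
# a DBA eyeballing the numbers every night.
def needs_reorg(stats, max_chained_pct=10.0, min_cluster_ratio=0.80):
    """Flag a table when chaining or lost clustering passes a threshold."""
    return (stats["chained_row_pct"] > max_chained_pct
            or stats["cluster_ratio"] < min_cluster_ratio)

tables = {
    "orders":    {"chained_row_pct": 22.5, "cluster_ratio": 0.91},
    "customers": {"chained_row_pct":  1.2, "cluster_ratio": 0.97},
    "lineitem":  {"chained_row_pct":  3.0, "cluster_ratio": 0.55},
}

to_reorg = sorted(t for t, s in tables.items() if needs_reorg(s))
print(to_reorg)   # the automation would schedule reorgs for these
```

A smarter version would tune the thresholds from history and schedule the reorgs in maintenance windows automatically; the point is that the routine decision is delegated to software.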
Synopsis
As IT tasks get more complex and IT professionals are harder
to employ and retain, more and more IT duties should be
automated using intelligent management software. This is
especially true for very complex jobs, such as DBA. Using
intelligent automation will help to reduce the amount of time,
effort, and human error associated with managing databases
and complex applications.
Help
Online Resources of the eDBA
As DBAs augment their expertise and skills to better prepare to
support Web-enabled databases and applications, they must
adopt new techniques and skills. We have talked about some of
those skills in previous eDBA columns. But eDBAs have
additional resources at their disposal, too.
Usenet Newsgroups
When discussing the Internet, many folks limit themselves to
the World Wide Web. However, there are many components
that make up the Internet. One often-overlooked component is
the Usenet Newsgroup. Usenet Newsgroups can be a very
fertile source of expert information. Usenet, an abbreviation
for User Network, is a large collection of discussion groups
called newsgroups. Each newsgroup is a collection of articles
pertaining to a single, pre-determined topic. Newsgroup names
usually reflect their focus. For example, comp.databases.ibm-
db2 contains discussions about the DB2 Family of products.
Mailing Lists
Another useful Internet resource for eDBAs is the mailing list.
Mailing Lists are a sort of community bulletin board. You can
think of mailing lists as somewhat equivalent to a mass mailing.
But mailing lists are not spam because users must specifically
request to participate before they will receive any mail. This is
known as "opting in."
No eDBA Is an Island
The bottom line is that eDBAs are not alone in the Internet-
connected world. It is true that the eDBA is expected to
perform more complex administrative tasks in less time and
with minimal outages. But fortunately the eDBA has a wealth
of help and support that is just a mouse click away. As an
eDBA you are doing yourself a disservice if you do not take
advantage of the Internet resources at your disposal.