The Data Warehouse and Technology - Building The Data Warehouse
http://it-slideshares.blogspot.com/
5.0 Overview
This chapter outlines some of the technological requirements for the data warehouse.
1. Manage volumes
2. Manage multiple media technologies
3. Index and monitor data
4. Interface to retrieve and pass data
*Not fast to find first record sought; very fast to find all other records in the block.
- Place data at block/page level
- Manage data in parallel
- Solid metadata control
- Rich language interface
Language Interface
Typically, the language interface to the data warehouse should do the following:
- Be able to access data a set at a time
- Be able to access data a record at a time
- Specifically ensure that one or more indexes will be used in the satisfaction of a query
- Have an SQL interface
- Be able to insert, delete, or update data
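As a rough illustration of these capabilities, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse DBMS. The `sales` table, the `idx_sales_region` index, and the sample rows are hypothetical. SQLite's `INDEXED BY` clause is shown as one way to insist that a particular index is used; other DBMSs offer different mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse DBMS
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")

# Insert data.
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("west", 20.0), ("east", 30.0)],
)

# Set-at-a-time access: one statement returns the whole result set.
all_east = conn.execute(
    "SELECT id, amount FROM sales WHERE region = ? ORDER BY id", ("east",)
).fetchall()

# Record-at-a-time access: step through the cursor one row at a time.
cursor = conn.execute("SELECT id, amount FROM sales ORDER BY id")
first_record = cursor.fetchone()
second_record = cursor.fetchone()
cursor.close()

# Insist that a specific index is used: SQLite rejects the statement
# if idx_sales_region cannot be used to satisfy this query.
east_total = conn.execute(
    "SELECT SUM(amount) FROM sales INDEXED BY idx_sales_region WHERE region = ?",
    ("east",),
).fetchone()[0]

# Update and delete data.
conn.execute("UPDATE sales SET amount = 25.0 WHERE id = 2")
conn.execute("DELETE FROM sales WHERE id = 3")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

All five capabilities go through the same SQL interface here; a real warehouse DBMS may expose record-at-a-time access through a separate call-level API instead.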
- Load efficiently
- Use indexes efficiently
- Store data in a compact way
- Support compound keys
Compaction of Data
The warehouse must manage large amounts of data. The programmer gets the most out of a given I/O when data is stored compactly.
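A back-of-the-envelope sketch of why compaction matters for I/O: the more records fit in one block, the more records each physical read returns. The 4096-byte block size and both record layouts are assumptions chosen for illustration, not figures from the source.

```python
import struct

BLOCK_SIZE = 4096  # assumed block/page size in bytes

# Hypothetical record: customer id, date as YYYYMMDD, amount in cents, region code.
# Loose layout: 8-byte integers everywhere and a region code padded to 16 bytes.
LOOSE = struct.Struct("<qqq16s")
# Compact layout: 4-byte integers and a 2-byte region code.
COMPACT = struct.Struct("<iii2s")


def records_per_block(layout: struct.Struct, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole records that fit in one block, i.e. records fetched per I/O."""
    return block_size // layout.size


loose_count = records_per_block(LOOSE)      # 40-byte records
compact_count = records_per_block(COMPACT)  # 14-byte records

# The compact layout still round-trips the same values.
packed = COMPACT.pack(42, 20240131, 1999, b"EA")
unpacked = COMPACT.unpack(packed)
```

Under these assumptions a single block holds 102 loosely stored records but 292 compact ones, so a scan of the compact table needs roughly a third of the I/Os.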
Compound Keys
Compound keys are needed for two reasons: the time variancy of data warehouse data, and the key-foreign key relationships that are quite common in the atomic data.
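A minimal sketch of a compound key, again using `sqlite3` as a stand-in. The tables `customer` and `account_balance` and their columns are hypothetical. The key combines a foreign key to the customer with a snapshot date, so the same customer can appear once per moment in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE account_balance (
        customer_id   INTEGER NOT NULL REFERENCES customer (customer_id),
        snapshot_date TEXT    NOT NULL,
        balance       REAL    NOT NULL,
        PRIMARY KEY (customer_id, snapshot_date)  -- compound key: who + when
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO account_balance VALUES (?, ?, ?)",
    [(1, "2024-01-31", 100.0), (1, "2024-02-29", 150.0)],
)

# A second row for the same customer on the same date violates the compound key.
try:
    conn.execute("INSERT INTO account_balance VALUES (1, '2024-01-31', 999.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The full history for one customer is a range over the second key component.
history = conn.execute(
    "SELECT snapshot_date, balance FROM account_balance "
    "WHERE customer_id = 1 ORDER BY snapshot_date"
).fetchall()
```

Neither column alone identifies a row; only the pair does, which is what a DBMS without compound key support cannot express directly.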
Variable-Length Data
- Handle variable-length data efficiently
- Lock manager, with explicit control at the programmer level
- Able to do index-only processing
- Restore data in bulk efficiently
Lock Management
The lock manager ensures that two or more people are not updating the same record at the same time. The ability to turn the lock manager off and on selectively is necessary.
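The idea of a switchable lock manager can be sketched with a toy in-memory table. The `Table` class, its `locking_enabled` flag, and the `run_updates` helper are all hypothetical, and the single table-wide lock is a simplification; a real DBMS lock manager works at a finer grain and is far more involved.

```python
import threading
from contextlib import nullcontext


class Table:
    """Toy in-memory table whose lock manager can be switched on or off."""

    def __init__(self, locking_enabled: bool = True):
        self.rows = {}
        self.locking_enabled = locking_enabled
        self._lock = threading.Lock()

    def _guard(self):
        # With locking on, updates are serialized; with locking off, no lock is taken.
        return self._lock if self.locking_enabled else nullcontext()

    def add_to(self, key, delta):
        with self._guard():
            self.rows[key] = self.rows.get(key, 0) + delta


def run_updates(table, workers=4, updates_per_worker=1000):
    """Have several threads update the same record concurrently."""

    def work():
        for _ in range(updates_per_worker):
            table.add_to("record-1", 1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.rows["record-1"]


# Concurrent updates to one record: keep the lock manager on.
locked_total = run_updates(Table(locking_enabled=True))

# Single-writer bulk load with no competing updates: the lock manager can be off.
bulk = Table(locking_enabled=False)
for i in range(1000):
    bulk.add_to(i, 1)
```

The point of the switch is that locking is only worth paying for when concurrent updates can actually occur; a load-and-read workload with one writer does not need it.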
Index-Only Processing
Servicing a request by looking in an index (or indexes) without going to the primary source of data.
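One way to see index-only processing in practice is SQLite's covering index, shown here through Python's `sqlite3` module. The `sales` table, the `idx_region_amount` index, and the `plan` helper are hypothetical names for this sketch. When every column a query needs is in the index, the query plan reports a covering index and the table itself is not read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL, note TEXT)"
)
conn.execute("CREATE INDEX idx_region_amount ON sales (region, amount)")
conn.executemany(
    "INSERT INTO sales (region, amount, note) VALUES (?, ?, ?)",
    [("east", 10.0, "a"), ("west", 20.0, "b"), ("east", 30.0, "c")],
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# region and amount are both in the index, so the index alone answers the query.
covered_plan = plan("SELECT SUM(amount) FROM sales WHERE region = ?", ("east",))

# note is not in the index, so the DBMS must also visit the table rows.
uncovered_plan = plan("SELECT note FROM sales WHERE region = ?", ("east",))
```

Comparing the two plan strings shows the difference: only the first mentions a covering index.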
Fast Restore
The capability to quickly restore a data warehouse table from non-DASD storage
Because record-level, transaction-based updates are a regular feature of the general-purpose DBMS, it must offer facilities for the following:
- Locking
- COMMITs
- Checkpoints
- Log tape processing
- Deadlock
- Backout
Should the decision be made to go to a new DBMS technology, what are the considerations?
- Will the new DBMS technology meet the foreseeable requirements?
- How will the conversion from the older DBMS technology to the newer DBMS technology be done?
The multidimensional DBMS (data mart):
- Holds at least an order of magnitude less data
- Is geared for very heavy and unpredictable access and analysis of data
- Holds a much shorter time horizon of data
- Allows unfettered access
The data warehouse:
- Holds massive amounts of data
- Is geared for a limited amount of flexible access
- Contains data with a very lengthy time horizon (from 5 to 10 years)
- Allows analysts to access its data in a constrained fashion
Weaknesses:
- Has performance that is less than optimal
- Cannot be purely optimized for access
Weaknesses:
- Cannot handle nearly as much data as a standard relational format
- Does not support general-purpose update processing
- May take a long time to load
- If access is desired on a path not supported by the
A large amount of data is spread across more than one storage medium.
One processing environment is the DASD environment where online, interactive processing is done. The other processing environment is often a tape or mass store environment
Simple contextual information relates to the basic structure of data itself, and includes such things as these:
- The structure of data
- The encoding of data
- The naming conventions used for data
- The metrics describing the data, such as:
  - How much data there is
  - How fast the data is growing
  - What sectors of the data are growing
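The metrics in this list can be derived from two snapshots of per-table row counts. The sketch below is only illustrative: the `growth_report` function, the table names, and the counts are made up, and a real monitor would read such figures from the DBMS catalog or from metadata.

```python
def growth_report(previous: dict, current: dict) -> dict:
    """Per-table row counts, growth, and growth rate between two snapshots."""
    report = {}
    for table, rows_now in current.items():
        rows_before = previous.get(table, 0)
        growth = rows_now - rows_before
        # A table that did not exist before has no meaningful growth rate.
        rate = growth / rows_before if rows_before else None
        report[table] = {"rows": rows_now, "growth": growth, "rate": rate}
    return report


# Hypothetical row counts taken one month apart.
previous = {"sales": 1_000_000, "customer": 50_000}
current = {"sales": 1_250_000, "customer": 50_500}

report = growth_report(previous, current)

total_rows = sum(entry["rows"] for entry in report.values())        # how much data there is
fastest = max(report, key=lambda t: report[t]["rate"] or 0.0)       # which sector is growing fastest
```

With these sample counts, `sales` grows by 25 percent and `customer` by 1 percent, so `sales` is the sector to watch.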
Complex contextual information addresses such aspects of data as these:
- Product definitions
- Marketing territories
- Pricing
- Packaging
- Organization structure
- Distribution
Some examples of external contextual information include the following:
- Economic forecasts:
  - Inflation
  - Financial trends
  - Taxation
  - Economic growth
Complex and external contextual types of information are hard to capture and quantify because they are so unstructured.
Testing
It is very unusual to find a similar test environment in the world of the data warehouse, for the following reasons:
- Data warehouses are so large that a corporation has a hard time justifying one of them, much less two of them.
- The nature of the development life cycle for the data warehouse is iterative. For the most part, programs are run in
Summary
Some technological features are required:
- Robust language interface
- Compound keys
- Variable-length data
- The abilities to do the following:
  - Manage large amounts of data
  - Manage data on a diverse media
  - Easily index and monitor data
  - Interface with a wide number of technologies
  - Allow the programmer to place the data directly on the physical device
  - Store and access data in parallel
  - Have metadata control of the warehouse
  - Efficiently load the warehouse
  - Efficiently use indexes
  - Store data in a compact way
  - Support compound keys
  - Selectively turn off the lock manager
  - Do index-only processing
  - Quickly restore from bulk storage
Summary cont
The data architect must recognize the differences between a transaction-based DBMS and a data warehouse-based DBMS.
Summary cont
Multidimensional OLAP technology is suited for data mart processing and not data warehouse processing. When the data mart approach is used, many problems become evident:
- The number of extract programs grows large.
- Each new multidimensional database must return to the legacy operational environment for its own data.
- There is no basis for reconciliation of differences in analysis.
- A tremendous amount of redundant data among different multidimensional DBMS environments exists.
Summary cont
Metadata in the data warehouse environment plays a very different role than metadata in the operational legacy environment.