Data Warehouse Basics

Operational systems, or OLTP (Online Transaction Processing) systems, record transactions and events and are used to insert, update, or delete data in tables. They follow the ACID properties of a transaction.

Drawbacks of OLTP for analysis:
1) It is not a DSS (Decision Support System) and does not support analysis, decision making, or forecasting.
2) OLTP retrieves and manipulates a single record at a time, while a DSS retrieves (read-only, no updates) large numbers of records for analysis and decision making.
3) Tables are fully normalized, and transactions are processed one at a time (with locking).
4) It is unsuitable for historical analysis, because data is updated often and reflects only the current state.
5) Data is spread over heterogeneous systems.

Data Extract Processing: collecting data from different sources and moving it to a single, centralized warehouse. Its drawbacks fall into three areas:
1) Management
2) Productivity
3) Data quality

Benefits of a Data Warehouse:
1) Support for decision making.
2) Cleansing routines.
3) Consistent and valid transformation rules.
4) Documented pre-summarization of data values.
5) A single standard query and reporting system, no matter where the data is retrieved from.
6) No disparity between data and its definition: a single source of accurate and reliable information.
7) No conflicts in the time periods or algorithms used.
8) Fewer restrictions on drilling up or drilling down.
9) Cost-effective, sophisticated, user-friendly, and intuitive tools.
10) Open, parallelism-based technology.
11) Inherent robustness, fault tolerance, and easy management.
12) Shorter project cycles.
13) Subject oriented, integrated, time variant, and non-volatile.
14) Detailed as well as summary data.
15) No redundancy, as data is physically selected and moved.

What is a data warehouse? It is an architecture, not a product, that supports strategic decision making (DSS) across the enterprise.
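The contrast in points 2) and 3) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a real OLTP system: the `accounts` table, its columns, and both helper functions are hypothetical. The OLTP-style function changes one row inside an atomic transaction; the DSS-style function only reads, scanning and aggregating every row.

```python
import sqlite3

# Hypothetical "accounts" table standing in for an operational (OLTP) store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (id, region, balance) VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 75.0)],
)
conn.commit()

def oltp_withdraw(conn, account_id, amount):
    """OLTP pattern: modify a single record inside an atomic transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown account")

def dss_balance_by_region(conn):
    """DSS pattern: read-only scan and aggregation over many records."""
    rows = conn.execute(
        "SELECT region, SUM(balance) FROM accounts GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())

oltp_withdraw(conn, 1, 40.0)
print(dss_balance_by_region(conn))  # {'East': 135.0, 'West': 250.0}
```

The OLTP side relies on the transaction for consistency; the DSS side never writes, which is why it can afford to touch every row.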
Subject Oriented
Traditional data stores have been designed to support an application. Data warehouses are designed to provide information about subjects, regardless of the application. A good example of the difference would be the databases at a bank. In an application sense, there might be a database for savings account / checking account customers, a separate database for customers with car loans, and maybe another database for customers with mortgages. To get a composite view of a particular customer, you need to pull together information from each of the databases. A data warehouse does this by maintaining customer data: a subject orientation.
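A minimal sketch of the bank example, assuming each application database can be read as a list of records sharing a common `customer_id` key (an assumption; real sources often need key matching first). The source names, fields, and the `build_customer_subject` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical application-oriented sources, one per product line.
savings_checking = [{"customer_id": "C1", "name": "Ann Lee", "balance": 1200.0}]
car_loans = [{"customer_id": "C1", "loan_amount": 15000.0},
             {"customer_id": "C2", "loan_amount": 8000.0}]
mortgages = [{"customer_id": "C2", "principal": 210000.0}]

def build_customer_subject(**sources):
    """Regroup per-application records under the customer subject."""
    customers = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            fields = {k: v for k, v in record.items() if k != "customer_id"}
            # A list per source, so a customer with two car loans keeps both.
            customers[record["customer_id"]].setdefault(source_name, []).append(fields)
    return dict(customers)

view = build_customer_subject(
    deposits=savings_checking, car_loans=car_loans, mortgages=mortgages
)
print(view["C1"])
```

The composite view is keyed by customer rather than by application, which is the point of subject orientation.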
Integrated
This is a byproduct of subject orientation. To be subject oriented, a data warehouse has to integrate data from potentially multiple sources. But this integration goes beyond single subjects, like a customer, and defines the relationships between independent subjects. An example might be integrating information about customers who place orders that are filled with products. The end result of integration is that you are able to perform cross-functional analysis.
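The customer, order, and product example can be sketched as follows, assuming the three subjects have already been integrated so that they share common keys. All records, segments, categories, and the helper name are hypothetical. Following the customer to order to product relationships answers a question that no single source could.

```python
# Hypothetical, already-integrated subjects sharing common keys.
customers = {"C1": {"segment": "Retail"}, "C2": {"segment": "Wholesale"}}
products = {"P1": {"category": "Bearings", "unit_price": 12.5},
            "P2": {"category": "Steel", "unit_price": 40.0}}
orders = [
    {"order_id": 1, "customer_id": "C1", "product_id": "P1", "qty": 4},
    {"order_id": 2, "customer_id": "C2", "product_id": "P1", "qty": 10},
    {"order_id": 3, "customer_id": "C2", "product_id": "P2", "qty": 2},
]

def revenue_by_segment_and_category(customers, orders, products):
    """Cross-functional question: revenue per customer segment and product category."""
    totals = {}
    for order in orders:
        segment = customers[order["customer_id"]]["segment"]
        product = products[order["product_id"]]
        key = (segment, product["category"])
        totals[key] = totals.get(key, 0.0) + order["qty"] * product["unit_price"]
    return totals

print(revenue_by_segment_and_category(customers, orders, products))
```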
Non-volatile
This simply means that the data warehouse is not a database in the traditional sense, where information is constantly being updated, new information is being added, and some information may be deleted. These activities are characteristic of application-focused data stores, but should not occur in a data warehouse environment. To achieve this stability, the data in the warehouse should be stored as historical snapshots, which leads to the final characteristic.
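One way to picture non-volatility is an append-only store: each periodic load adds a snapshot, and a change in the source shows up as a new snapshot rather than an in-place update. This is a minimal, hypothetical sketch (the class and method names are invented for illustration), not how any particular warehouse product enforces it.

```python
from datetime import date
from types import MappingProxyType

class AppendOnlyWarehouse:
    """Hypothetical non-volatile store: loads append snapshots; there is no update or delete method."""

    def __init__(self):
        # Each entry is (as_of_date, subject_key, read-only attribute mapping).
        self._snapshots = []

    def load(self, as_of, subject_key, attributes):
        # Copy, then wrap read-only, so callers cannot alter a stored snapshot afterwards.
        frozen = MappingProxyType(dict(attributes))
        self._snapshots.append((as_of, subject_key, frozen))

    def history(self, subject_key):
        """Return every snapshot stored for one subject, in load order."""
        return [s for s in self._snapshots if s[1] == subject_key]

wh = AppendOnlyWarehouse()
wh.load(date(2023, 3, 31), "C1", {"balance": 1200.0})
wh.load(date(2023, 6, 30), "C1", {"balance": 900.0})  # a change becomes a new snapshot
for as_of, _, attrs in wh.history("C1"):
    print(as_of.isoformat(), dict(attrs))
```

Because nothing is overwritten, both the March and June balances remain available for historical analysis.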
Time-variant
This refers to the fact that you can get a "picture" of your business at any one point in time. These "pictures" can be compared from one time period to the next, as when comparing Q1 performance this year to Q1 performance last year. Furthermore, you can store as many "pictures" as you want. One such picture might be of the entire company. Another picture might be of just a business unit, or even a single market segment. To properly capture, track, and store these pictures, the data warehouse essentially takes a series of regular snapshots. It builds and maintains a "photo album" that shows you what your business looks like, and how it changes, over time.
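The Q1-versus-Q1 comparison can be sketched over a small set of snapshot rows. The business units, dates, and sales figures below are hypothetical; a "picture" here is simply a filter on period and, optionally, business unit.

```python
from datetime import date

# Hypothetical snapshot rows: (snapshot date, business unit, quarterly sales).
snapshots = [
    (date(2023, 3, 31), "Bearings", 410.0),
    (date(2023, 3, 31), "Steel", 220.0),
    (date(2024, 3, 31), "Bearings", 452.0),
    (date(2024, 3, 31), "Steel", 198.0),
]

def quarter_of(d):
    """Calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1

def picture(snapshots, year, quarter, unit=None):
    """Total sales for one 'picture': a period, optionally narrowed to one business unit."""
    return sum(
        sales for d, u, sales in snapshots
        if d.year == year and quarter_of(d) == quarter and (unit is None or u == unit)
    )

this_q1 = picture(snapshots, 2024, 1)
last_q1 = picture(snapshots, 2023, 1)
change_pct = (this_q1 - last_q1) / last_q1 * 100
print(this_q1, last_q1, round(change_pct, 1))  # 650.0 630.0 3.2
```

Passing `unit="Steel"` narrows the same comparison to a single business unit.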
Depending upon how many "pictures" you want to take and how often you take them, a data warehouse can grow to multiple terabytes. So what is a terabyte?
If your average PC has 1 gigabyte of hard disk space, then one terabyte is the equivalent of 1,000 PCs. Or, to put it another way, it's roughly 700,000 standard 1.44 MB diskettes' worth of information: all in one central store, integrated across the enterprise, and delivered to end users for a variety of applications. See the figure below for a picture of the data warehouse architecture.
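The terabyte arithmetic can be checked directly. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a standard 3.5-inch "1.44 MB" diskette, which actually holds 1,474,560 bytes. With binary units the diskette count comes out somewhat higher (about 746,000), but still well under a million.

```python
GB = 10**9             # decimal (SI) gigabyte
TB = 10**12            # decimal (SI) terabyte
DISKETTE = 1_474_560   # bytes on a formatted "1.44 MB" 3.5-inch diskette

print(TB // GB)              # 1000 PCs with 1 GB disks
print(round(TB / DISKETTE))  # 678168 diskettes
```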
Picking the Right Engine
At the heart of the data warehouse is the database itself. It is the engine that runs the whole system. Some of the leading players in this arena are:

Oracle
Informix
IBM
NCR / Teradata
Red Brick
Powering these databases so that they deliver satisfactory performance requires specialized hardware in most circumstances. The largest data warehouses run on MPP (Massively Parallel Processing) hardware with hundreds or even thousands of processors (CPUs). Medium-size data warehouses (like Timken's) usually require an SMP (Symmetric Multi-Processing) hardware platform, with anywhere from several to tens of processors. Only the smallest data warehouses (or data marts) can run effectively on traditional single-processor systems. The volume of data stored within the warehouse is the primary driver when selecting a hardware platform. No matter how powerful the "engine" is, though, it is not worth much if the design of the warehouse database is bad. A poor design will hurt performance no matter how powerful the database and hardware are. Thus, the first consideration is developing a sound data model to serve as the foundation of the data warehouse. The primary tool that Timken is considering is
ERWin from LogicWorks
This tool will allow us to create and manage the data warehouse design over time.

Extraction & Transformation Tools
Once the database design is in place, the next step in building the architecture is mapping data from your legacy systems into the warehouse database design and then loading the data. Experts in the field of data warehousing estimate that 70-80% of the cost of building a data warehouse falls into this phase, typically called "Extraction & Transformation". As a result, several companies offer tools that help you write the programs to extract, transform, clean up, and load the data into the warehouse. The leaders include:

PassPort by Carleton
Warehouse Executive by Prism
ETI Extract by Evolutionary Technologies
PowerMart by Informatica
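The extract, transform/clean, and load steps can be sketched in plain Python with the standard csv and sqlite3 modules. This is a hypothetical, minimal illustration, not how any of the tools listed here work: the legacy CSV layout, the region-code rule, and the `sales` table are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical legacy extract with typical problems: padded keys,
# inconsistent region spellings, and a row with a missing amount.
LEGACY_CSV = """cust_no,region,amount
 001 ,east,100.50
002,EAST,
003,West,75
"""

# Hypothetical transformation rule: map free-text regions to standard codes.
REGION_CODES = {"east": "E", "west": "W"}

def extract(text):
    """Extract: read raw rows from the legacy source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform/clean: standardize values and set aside rows that fail validation."""
    clean, rejected = [], []
    for row in rows:
        region = REGION_CODES.get((row["region"] or "").strip().lower())
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):  # missing, empty, or non-numeric amount
            amount = None
        if region is None or amount is None:
            rejected.append(row)
        else:
            clean.append((row["cust_no"].strip(), region, amount))
    return clean, rejected

def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table in one transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (cust_no TEXT, region TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(LEGACY_CSV))
load(conn, clean)
print(conn.execute("SELECT * FROM sales ORDER BY cust_no").fetchall())
print(len(rejected), "row(s) rejected")
```

Keeping rejected rows rather than silently dropping them is one design choice; it lets the cleansing rules be reviewed later.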
Using these types of tools will greatly reduce the time needed to get the warehouse up and running, and will also help manage the warehouse environment once it is in production.

End User Access Tools
The final piece of the architecture is the end user access tools. Here you get into much of the jargon of data warehousing:

OLAP
This stands for On-line Analytical Processing, but really just means all the decision support type activities like drill-down, what-if scenario analysis, etc. It has several flavors, such as ROLAP (relational OLAP), MOLAP (multidimensional OLAP), and even MROLAP. Each has some distinguishing characteristics, but suffice it to say that the differences are disappearing as technology continues to advance. Some of the tools in this area include:

PowerPlay by Cognos
BusinessObjects by Business Objects
Essbase by Arbor Software
DSS Server by MicroStrategy
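Drill-down and its opposite, roll-up, can be sketched as re-aggregating the same facts at different levels of a hierarchy. The fact rows, the year/quarter/region hierarchy, and the `aggregate` helper are hypothetical; real OLAP servers precompute or optimize these aggregations rather than rescanning in a loop.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales).
facts = [
    (2024, 1, "East", 120.0), (2024, 1, "West", 80.0),
    (2024, 2, "East", 60.0), (2024, 2, "West", 95.0),
    (2024, 3, "East", 40.0), (2024, 3, "West", 70.0),
]

LEVELS = ("year", "quarter", "region")  # hierarchy from coarse to fine

def aggregate(facts, depth, **filters):
    """Sum sales grouped by the first `depth` levels, after slicing on any filters."""
    totals = defaultdict(float)
    for row in facts:
        record = dict(zip(LEVELS, row[:3]))
        if all(record[k] == v for k, v in filters.items()):
            totals[tuple(record[level] for level in LEVELS[:depth])] += row[3]
    return dict(totals)

print(aggregate(facts, 1))                        # roll-up to the year
print(aggregate(facts, 2))                        # drill down to quarters
print(aggregate(facts, 3, year=2024, quarter=3))  # drill into Q3 by region
```

Each step keeps the same facts and only changes the grouping depth or the slice, which is the "slicing & dicing" idea in miniature.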
In addition, each of the major database vendors offers its own OLAP tools.

Query & Reporting
This is the realm of standard reporting and ad-hoc queries. There are a myriad of tools in this area, including:

Impromptu by Cognos
Brio Query by Brio Technology
BusinessObjects by Business Objects
IQ/Vision by IQ Software
Crystal Reports
The database vendors also offer their own query and reporting tools.

Data Mining
This is essentially the inverse of OLAP. With OLAP, you typically have a question that you are trying to answer, such as: "Why are sales in the 3rd quarter so far below forecast?" Then, using OLAP techniques, you keep "slicing & dicing" the information until you find the answer. With data mining, you turn a program loose on the data in the warehouse and let it suggest good questions to ask. Data mining programs attempt to find relationships between different subjects that are not readily discernible. This area of data warehousing is growing at a rapid pace, and the tools here change almost weekly.
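As one example of "letting the program find relationships", here is a minimal sketch of association-rule mining, a common data mining technique chosen for illustration (the text does not name a specific method). It counts which product lines appear together in orders and reports rules "A -> B" that meet support and confidence thresholds, the same idea behind algorithms such as Apriori but limited to pairs. The baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets: the product lines bought together on each order.
baskets = [
    {"bearings", "seals", "lubricant"},
    {"bearings", "seals"},
    {"bearings", "lubricant"},
    {"steel", "tooling"},
    {"bearings", "seals", "tooling"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Find rules lhs -> rhs for item pairs with enough support and confidence."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)

for rule in pair_rules(baskets):
    print(rule)
```

Nobody asked "do lubricant buyers also buy bearings?"; the rule surfaces from the data, which is the contrast with OLAP drawn above.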
