An Introduction to Data Warehousing

Presented by

Joseph M. Wilson
EPA

Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this proposal or quotation.


In the Beginning, life was simple...


But...


Our information needs...

Kept growing. (The Spider web)

SOURCE: William H. Inmon

Purpose

To explore and discuss the purpose and principles of data warehousing.

Briefing Contents

- Data Warehouse Concepts
- Building a Data Warehouse
- STORET Warehouse Example

So What Is a Data Warehouse?

Definition: A data warehouse is the data repository of an enterprise. It is generally used for research and decision support.

By comparison: an OLTP (on-line transaction processing) or operational system is used to deal with the everyday running of one aspect of an enterprise. OLTP systems are usually designed independently of each other, and it is difficult for them to share information.

Why Do We Need Data Warehouses?

- Consolidation of information resources
- Improved query performance
- Separate research and decision support functions from the operational systems
- Foundation for data mining, data visualization, advanced reporting and OLAP tools

What Is a Data Warehouse Used for?

- Knowledge discovery
  - Making consolidated reports
  - Finding relationships and correlations
  - Data mining
- Examples
  - Banks identifying credit risks
  - Insurance companies searching for fraud
  - Medical research

How Do Data Warehouses Differ From Operational Systems?

- Goals
- Structure
- Size
- Performance optimization
- Technologies used

Comparison Chart of Database Types

Data warehouse | Operational system
Subject oriented | Transaction oriented
Large (hundreds of GB up to several TB) | Small (MB up to several GB)
Historic data | Current data
De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table)
Batch updates | Continuous updates
Usually very complex queries | Simple to complex queries

Design Differences

- Operational System: ER Diagram
- Data Warehouse: Star Schema

Supporting a Complete Solution

- Operational System: Data Entry
- Data Warehouse: Data Retrieval

Data Warehouses, Data Marts, and Operational Data Stores

- Data Warehouse: The queryable source of data in the enterprise. It is comprised of the union of all of its constituent data marts.
- Data Mart: A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
- Operational Data Store (ODS): A point of integration for operational systems that developed independently of each other. Since an ODS supports day-to-day operations, it needs to be continually updated.

SOURCE: Ralph Kimball


Building a Data Warehouse

Data Warehouse Lifecycle

- Analysis
- Design
- Import data
- Install front-end tools
- Test and deploy

Stage 1: Analysis

- Identify:
  - Target questions
  - Data needs
  - Timeliness of data
  - Granularity
- Create an enterprise-level data dictionary
- Dimensional analysis
  - Identify facts and dimensions

Stage 2: Design

- Dimensional modeling
- Star schema
- Data transformation
- Aggregates
- Pre-calculated values
- HW/SW architecture
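
The "Aggregates" and "Pre-calculated values" items above can be illustrated with a minimal sketch. It assumes a hypothetical detail table named detail and builds a summary table named agg_monthly with Python's built-in sqlite3 module. The table names, column names, and sample rows are invented for illustration and do not come from any system described in this briefing.

```python
import sqlite3

def build_monthly_aggregate(conn):
    """Pre-calculate per-station, per-month counts and averages."""
    conn.executescript("""
        DROP TABLE IF EXISTS agg_monthly;
        CREATE TABLE agg_monthly AS
        SELECT station, month,
               COUNT(*)   AS n_results,
               AVG(value) AS avg_value
        FROM detail
        GROUP BY station, month;
    """)

# Hypothetical detail rows: (station, month, measured value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (station TEXT, month TEXT, value REAL)")
conn.executemany(
    "INSERT INTO detail VALUES (?, ?, ?)",
    [("A", "2004-01", 2.0), ("A", "2004-01", 4.0), ("B", "2004-01", 10.0)],
)
build_monthly_aggregate(conn)

# Reports can now read the small summary table instead of the detail rows.
summary = conn.execute(
    "SELECT station, month, n_results, avg_value "
    "FROM agg_monthly ORDER BY station"
).fetchall()
print(summary)
```

The trade-off is that the aggregate must be rebuilt (or incrementally maintained) after each batch load, which fits the batch-update style of a warehouse.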

Dimensional Modeling

- Fact Table: The primary table in a dimensional model that is meant to contain measurements of the business.
- Dimension Table: One of a set of companion tables to a fact table. Most dimension tables contain many textual attributes that are the basis for constraining and grouping within data warehouse queries.

SOURCE: Ralph Kimball
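
As a rough illustration of these two definitions, the sketch below builds a tiny star schema with Python's built-in sqlite3 module: one fact table holding measurements and two dimension tables holding descriptive attributes. All table names, column names, and rows (dim_station, dim_date, fact_result) are hypothetical and chosen only for the example.

```python
import sqlite3

def build_star_schema(conn):
    """Create two dimension tables and one fact table, then load sample rows."""
    conn.executescript("""
        CREATE TABLE dim_station (
            station_key  INTEGER PRIMARY KEY,
            station_name TEXT,
            state        TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER,
            month    INTEGER
        );
        -- The fact table stores measurements plus keys into each dimension.
        CREATE TABLE fact_result (
            station_key  INTEGER REFERENCES dim_station(station_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            result_value REAL
        );
    """)
    conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?)",
                     [(1, "Site A", "VA"), (2, "Site B", "MD")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20040101, 2004, 1), (20040201, 2004, 2)])
    conn.executemany("INSERT INTO fact_result VALUES (?, ?, ?)",
                     [(1, 20040101, 7.0), (1, 20040201, 9.0), (2, 20040101, 4.0)])

def average_by_state(conn):
    """Group fact measurements by a textual dimension attribute."""
    return conn.execute("""
        SELECT s.state, AVG(f.result_value)
        FROM fact_result AS f
        JOIN dim_station AS s ON s.station_key = f.station_key
        GROUP BY s.state
        ORDER BY s.state
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(average_by_state(conn))  # [('MD', 4.0), ('VA', 8.0)]
```

The query constrains and groups on a dimension attribute (state) while aggregating a fact measurement (result_value), which is the usage pattern the definitions describe.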

Stage 3: Import Data

- Identify data sources
- Extract the needed data from existing systems to a data staging area
- Transform and clean the data
  - Resolve data type conflicts
  - Resolve naming and key conflicts
  - Remove, correct, or flag bad data
  - Conform dimensions
- Load the data into the warehouse
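
The extract, transform, and load steps above might be sketched as follows. This is a simplified, in-memory illustration only: the source names (oltp_1, oltp_2), field names, and records are assumptions made up for the example, and a real import would read from actual source systems and write to a database.

```python
def transform(record, field_map):
    """Resolve naming and type conflicts; flag bad data instead of dropping it."""
    # Naming conflicts: rename source-specific fields to warehouse names.
    row = {field_map.get(key, key): val for key, val in record.items()}
    # Type conflicts: sources may deliver numbers as text.
    try:
        row["value"] = float(row["value"])
        row["is_bad"] = False
    except (KeyError, TypeError, ValueError):
        row["value"] = None
        row["is_bad"] = True
    return row

def run_etl(sources):
    """Extract to a staging list, transform each record, return the loaded rows."""
    staging = []
    for source_name, (records, field_map) in sources.items():
        for record in records:
            staged = dict(record)            # extract: copy into staging
            staged["source"] = source_name   # remember where it came from
            staging.append((staged, field_map))
    # Load: in this sketch the "warehouse" is simply a list of clean rows.
    return [transform(staged, field_map) for staged, field_map in staging]

# Two hypothetical source systems that name the same fields differently.
sources = {
    "oltp_1": ([{"site": "A", "val": "7.5"}],
               {"site": "station", "val": "value"}),
    "oltp_2": ([{"station_id": "B", "value": "n/a"}],
               {"station_id": "station"}),
}
warehouse = run_etl(sources)
print(warehouse)
```

Flagging rather than deleting the unparseable value keeps the record available for later correction, which is one of the three options (remove, correct, or flag) listed above.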

Importing Data Into the Warehouse

[Diagram: the operational (source) systems OLTP 1, OLTP 2, and OLTP 3 feed a Data Staging Area, which feeds the Data Warehouse.]

Stage 4: Install Front-end Tools

- Reporting tools
- Data mining tools
- GIS
- Etc.

Stage 5: Test and Deploy

- Usability tests
- Software installation
- User training
- Performance tweaking based on usage

Special Concerns

- Time and expense
- Managing the complexity
- Update procedures and maintenance
- Changes to source systems over time
- Changes to data needs over time


Goals of the STORET Central Warehouse

- Improved performance and faster data retrieval
- Ability to produce larger reports
- Ability to provide more data query options
- Streamlined application navigation

Old Web Application Flow

[Figure: flow diagram of the old web application.]

Central Warehouse Application Flow

1. Search Criteria Selection
2. Report Size Feedback / Report Customization
3. Report Generation
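
The three steps of this flow can be sketched in a few lines. This is not the STORET application's actual code: the function names, the max_rows limit, and the sample rows are all hypothetical, and the sketch only shows the idea of reporting the result size before generating the report.

```python
def matching_rows(rows, criteria):
    """Step 1: apply the user's search criteria (exact-match filters)."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

def report_size(rows, criteria):
    """Step 2: tell the user how many rows the report would contain."""
    return len(matching_rows(rows, criteria))

def generate_report(rows, criteria, columns, max_rows=1000):
    """Step 3: build the report using only the columns the user kept."""
    selected = matching_rows(rows, criteria)
    if len(selected) > max_rows:
        raise ValueError(f"report has {len(selected)} rows; limit is {max_rows}")
    return [[r[c] for c in columns] for r in selected]

# Hypothetical warehouse rows.
sample = [
    {"state": "VA", "station": "A", "value": 7.0},
    {"state": "VA", "station": "B", "value": 9.0},
    {"state": "MD", "station": "C", "value": 4.0},
]
print(report_size(sample, {"state": "VA"}))                             # 2
print(generate_report(sample, {"state": "VA"}, ["station", "value"]))  # [['A', 7.0], ['B', 9.0]]
```

Giving size feedback before generation lets the user narrow the criteria or drop columns instead of waiting on an oversized report.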

Web Application Demo

STORET Central Warehouse: http://epa.gov/storet/dw_home.html

STORET Central Warehouse: Potential Future Enhancements

- More query functionality
- Additional report types
- Web Services
- Additional source systems?

[Diagram: STORET, State System A, State System B.]

Data Warehouse Components

[Figure: data warehouse components.]

SOURCE: Ralph Kimball

Data Warehouse Components: Detailed

[Figure: detailed view of the data warehouse components.]

SOURCE: Ralph Kimball
