Data Modeling for the Data Warehouse
Tom Haughey
April 27-29, 2005

Registration: $1595
Early Registration Rate: $1495 if registered by March 27, 2005
DAMA, MPO or IDMA Rate: $1395 if registered by March 27, 2005

This workshop focuses on the specialized techniques used to model data warehouse data and provides concrete guidelines and rules for modeling this data. It stresses that data warehouse modeling is not just an intuitive process based solely on the spontaneous judgment of a skilled analyst; it can be based on sound rules and guidelines, just as production data models are. The workshop emphasizes three fundamental concepts:
♦ Finding the right level of atomic data in the warehouse
♦ Recognizing that the data warehouse is not a database but an integrated environment consisting of different levels of data
♦ Optimizing the data warehouse design through dimensional modeling

Workshop Agenda

Introduction
• Scope and levels of modeling
• Kinds of data
• The framework for data modeling
• Challenges in data management
• Five major characteristics of data warehouse
• Data Models
• Types and technologies of data warehousing

Data Warehouse Methodology
• Explanation of methodology steps
• Iterative nature of development

Introduction To Data Modeling
• Definition and components
• Levels of data models
• Rules for each level
• The high level data model
• Identifying and defining subject areas
• The detailed level data model
• Normalization
    Functional dependency
    Mathematical normalization
    Natural normalization
    Application to the data warehouse

Building the Data Warehouse Model
• Comparison of operational and informational data
• Case 1: direct access to operational data
• Case 2: using informational data bases
• A data controlled environment
• Progression of data in a data controlled environment
• Two types of data changes in the data warehouse:
    Transformations
    Optimizations

Levels of Data In the Enterprise
• Four types of data and systems
• The warehouse and decision support data model
• Sources of warehouse and decision data
• Definition and rules
• The corporate model
• The business area model

Derived Data
• Types and different rules for handling during analysis and design

Modeling Time And History
• Short Term And Long Term View
• Four ways of handling time and date
• Time-series data
• Capturing business changes
• Importance of representing the business time dimension

Information Gathering
• Facilitated sessions
• Interviews
• Information gathering techniques

Analyzing Current Systems Data
• Define key data elements
• Data stewards for each data element
• Key data element business rules
• Define domains and valid values
• Define valid ranges for error
• Document key data elements on repository
• Validate data mappings
• Identify key data elements in source systems
• Map relationships for repository

Data Transformations
• Remove pure production data
• Add time and history to the data identifier
• Add data derivations
• Find the right atomicity of data
• Determine the functional dependencies in summary data
• Create data arrays and fact tables
• Accommodate varying levels of summarization
• Add summary data
• Merge like data from different tables
• Create arrays of data
• Separate data based on its stability
• Embed relationships in the data
• Add external data
• Techniques for derived data

• Different levels of summarization

Critical Warehouse Components
• Definition of fact tables and dimensions
• Creating multidimensional arrays
• Developing fact tables and arrays
• Corporate reference tables
• The star database schema
• The snowflake database schema
• Meta-data repository and components

Optimizing the Data Warehouse Design
• Data design compromises
• Safe compromises to data
    Merge like tables
    Create arrays of data (violate first normal form)
    Split data based on stability and usage
    Add indices
    Encode-decode data
• Aggressive compromises to data
    Store derived data
    Summarize data
    Add redundant data
    Embed relationship data
    Add redundant relationships
    Add partial dependencies (violate second normal form)
    Add transitive dependencies (violate third normal form)
• Critical factors in data design
    Number of occurrences of each table
    The ratio of one table to another
    The queries that use the data
    The data accesses made by each query
    The load factor for each query
• The steps of optimization

Data Warehouse Technology
• Categories of warehouse tools
• Review of major products

Important Considerations And Issues
• Denormalization and performance
• Archiving and purging
• Data distribution and replication
• Change control
• Copy management
• Alternative Models For Copied Data

Tom Haughey is considered one of the four founding fathers of Information Engineering in America. He is currently President of InfoModel, Inc., a training and consulting company specializing in practical and rapid development methods. His courses on data management, data warehousing, Information Engineering and software development have been delivered to Fortune 1000 companies around the world. He was formerly Chief Technology Officer for the Pepsi Bottling Group and Enterprise Director of Data Warehousing for Pepsico. He was also formerly Vice President of Technology for Computer Systems Advisers, who market the CASE tools called POSE and SILVERRUN. He has worked on the development of seven different CASE tools. He wrote his own CASE tool in 1984, over 40,000 copies of which have been sold to date. He formerly worked for IBM for 17 years as a Senior project manager. He is the author of many articles on Data Management, Information Engineering and Data Warehousing.

What you will learn
Among the most important factors you will learn at this seminar will be how to design a dimensional data warehouse. For example, OLTP (on-line transaction processing) data models are normalized so as to reduce update problems. In addition, the focus is on current data, which is two-dimensional. By contrast, to support trend analysis, it is necessary to show the values of data changing over time (such as monthly, quarterly, annually). Data warehouse design usually introduces a third dimension, which is that of time. Usually, this requires a degree of denormalization when creating the data model. The seminar will teach how to effectively accomplish this aspect of data modeling, while ensuring a quality data warehouse design. Concepts learned are reinforced by individual and group exercises.
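
The contrast between current-data OLTP models and time-oriented warehouse models can be illustrated with a small star schema. The following is only a minimal sketch, not material from the seminar: the table names (dim_time, dim_product, fact_sales), the products and the sales figures are all hypothetical, and SQLite is used simply because it ships with Python. The time dimension deliberately stores quarter and year alongside month, a small denormalization that makes trend queries at each level a simple GROUP BY.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed by a time and a product dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (
    time_key INTEGER PRIMARY KEY,   -- e.g. 200403 for March 2004
    month    INTEGER NOT NULL,
    quarter  INTEGER NOT NULL,      -- redundant with month (denormalized on purpose)
    year     INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE fact_sales (
    time_key    INTEGER NOT NULL REFERENCES dim_time(time_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount      INTEGER NOT NULL
);
""")

# Populate the time dimension for one illustrative year.
YEAR = 2004
conn.executemany(
    "INSERT INTO dim_time VALUES (?, ?, ?, ?)",
    [(YEAR * 100 + m, m, (m - 1) // 3 + 1, YEAR) for m in range(1, 13)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "Cola"), (2, "Water")],  # made-up products
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [  # made-up atomic sales rows: (time_key, product_key, amount)
        (200401, 1, 100),
        (200402, 1, 150),
        (200404, 1, 200),
        (200405, 2, 50),
        (200407, 1, 300),
    ],
)

# Grouping columns allowed for each level of summarization.
LEVELS = {
    "month": "t.year, t.month",
    "quarter": "t.year, t.quarter",
    "year": "t.year",
}

def sales_by(connection, level):
    """Return total sales summarized at the given time level."""
    cols = LEVELS[level]  # raises KeyError for an unsupported level
    query = (
        f"SELECT {cols}, SUM(f.amount) "
        "FROM fact_sales f JOIN dim_time t ON t.time_key = f.time_key "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return connection.execute(query).fetchall()

if __name__ == "__main__":
    for level in ("month", "quarter", "year"):
        print(level, sales_by(conn, level))
```

Because the fact rows are kept at the atomic (monthly) level, the same table answers monthly, quarterly and annual questions; storing pre-summarized rows instead would be one of the more aggressive design compromises listed in the agenda.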
