David Walker Page 1
A TECHNICAL ARCHITECTURE FOR THE DATA WAREHOUSE
David Walker, Consultant, Data Management & Warehousing, Wokingham, United Kingdom
Summary
The paper aims to set out a brief description of the process of implementing a Data Warehouse and then to look in more detail at one particular aspect of the process, namely the Technical Architecture. Whilst it cannot be exhaustive, it does set out a model that can be used as a guide in the implementation of a system.
Introduction
When an organisation decides to build a Data Warehouse there are four key elements required of the process. These elements are:
 
The Business Analysis
A process of gathering together users of information (this is not the IT department but other departments such as marketing, sales, finance, customer service, etc.) and asking them what their goals are and how they measure them. The measurements are the Key Performance Indicators (KPI) and will eventually become the answers required from the Data Warehouse. The constraints that they place on these KPI, such as time, location, etc., are the dimensions of the Data Warehouse. This basic design allows users to formulate natural language questions against a relational database (for example: What are the total sales (KPI) and average sales (KPI) by region (Dimension) over the past two years (Dimension)?).

Obtaining a common understanding between departments of the KPI and dimensions will in itself prove a challenge and is normally approached as a series of workshops. Users should understand that the Data Warehouse is an iterative design and should therefore not expect answers to all their questions on the first pass. It is also likely that the information required from the system will change over time, as decisions based on information that has become available change the course of the business and require different KPI to support further decisions.
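The example question above can be expressed as a query that aggregates the KPI and constrains it by the two dimensions. The following is a minimal sketch only, using Python's built-in sqlite3 module; the table names (sales_fact, region_dim, time_dim), the column names and the sample figures are all hypothetical and are not taken from any system described in this paper.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory schema with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE time_dim (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE sales_fact (
            region_id INTEGER REFERENCES region_dim(region_id),
            date_id INTEGER REFERENCES time_dim(date_id),
            sale_amount REAL
        );
    """)
    conn.executemany("INSERT INTO region_dim VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                     [(1, 1994), (2, 1995), (3, 1996)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
        (1, 1, 500.0),  # North, 1994: falls outside the two-year window
        (1, 2, 100.0),  # North, 1995
        (1, 3, 300.0),  # North, 1996
        (2, 2, 50.0),   # South, 1995
        (2, 3, 150.0),  # South, 1996
        (2, 3, 100.0),  # South, 1996
    ])
    return conn


def sales_by_region(conn, first_year, last_year):
    """Total and average sales (the KPI) by region and year range (the dimensions)."""
    query = """
        SELECT r.region_name, SUM(f.sale_amount), AVG(f.sale_amount)
        FROM sales_fact f
        JOIN region_dim r ON r.region_id = f.region_id
        JOIN time_dim t ON t.date_id = f.date_id
        WHERE t.year BETWEEN ? AND ?
        GROUP BY r.region_name
        ORDER BY r.region_name
    """
    return conn.execute(query, (first_year, last_year)).fetchall()


if __name__ == "__main__":
    for name, total, average in sales_by_region(build_demo_warehouse(), 1995, 1996):
        print(name, total, average)
```

Each KPI appears as an aggregate in the SELECT list, and each dimension appears either in the GROUP BY clause (region) or as a constraint in the WHERE clause (time), which is why agreeing the KPI and dimensions in the workshops translates so directly into the queries the system must support.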
 
The Database Schema Design
The next stage of the process is to take the requirements identified by the users and design a schema that will support the queries. By taking the questions identified by users and looking at the data available on the existing systems, a mapping can be produced. This will inevitably show a mismatch between the requirement and the information available. Decisions then have to be made either to supplement this from external sources, to modify internal sources to provide such information, or to omit the information until a later iteration.

The inevitable consequence of this approach is that one or more ‘star schemas’ will be derived. These are schemas that have a large central (or Fact) table containing the KPI and foreign keys to a number of smaller dimension tables. Each dimension table will have a relatively low number of rows but a large number of (highly indexed) columns that are to some extent de-normalised. A user may see a table called TIME that has columns ‘Day of Year’, ‘Day of Week’, ‘Day of Month’, ‘Month’, ‘Week Number’, ‘Year’ and ‘Julian Date’. For five years of information this will only contain 1825 (5*365) rows, ignoring leap days. The primary key will be ‘Julian Date’.

This dimension will be one of perhaps eight that relate to the large central table. If each of these dimensions contains 2000 rows, then the central table could have up to 2000^8, or 2.56 × 10^26, rows. In practice, there is a high level of sparsity (no sales on Saturday or Sunday would alone remove almost 30% of the TIME dimension values from the central table) and therefore it is practical to build systems to support the design.

Each ‘star schema’ can be considered a Data Mart for one or more departments. These star schemas may also contain pre-calculated summaries, averages, etc. The schema design must also include Meta Data, or data about data. This will allow users to interrogate the database to discover where a particular field is and, if appropriate, how it is derived (for example how was the gross profit calculated, which system provided the stock levels, etc.). The Meta Data can also be used to enforce security on the system by determining what a user can or cannot see.
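The sizing arithmetic for the TIME dimension and the central table can be checked with a short sketch. This is an illustration under stated assumptions rather than a prescribed design: the start date of 1 January 1995 is arbitrary, the function and column names are hypothetical, and Python's proleptic ordinal day number is used as a stand-in for the ‘Julian Date’ key.

```python
from datetime import date, timedelta


def build_time_dimension(start, days):
    """Return one row per calendar day, with the de-normalised columns of TIME."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "julian_date": d.toordinal(),       # assumed stand-in for the primary key
            "day_of_year": d.timetuple().tm_yday,
            "day_of_week": d.isoweekday(),      # 1 = Monday ... 7 = Sunday
            "day_of_month": d.day,
            "month": d.month,
            "week_number": d.isocalendar()[1],  # ISO 8601 week number
            "year": d.year,
        })
    return rows


def max_fact_rows(rows_per_dimension, dimension_count):
    """Upper bound on fact rows: every combination of dimension values occurs."""
    return rows_per_dimension ** dimension_count


def weekend_fraction(time_rows):
    """Share of TIME rows that fall on a Saturday or Sunday."""
    weekend = sum(1 for row in time_rows if row["day_of_week"] >= 6)
    return weekend / len(time_rows)


if __name__ == "__main__":
    time_dim = build_time_dimension(date(1995, 1, 1), 5 * 365)
    print(len(time_dim), "rows in the TIME dimension")
    print(f"{max_fact_rows(2000, 8):.2e} fact rows at most")
    print(f"{weekend_fraction(time_dim):.1%} of days are at weekends")
```

The upper bound is simply the product of the dimension sizes, so eight dimensions of 2000 rows give 2000^8 combinations. Weekends account for roughly two days in seven, which shows how quickly a single business rule thins out the combinations that can actually occur.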
 
The Technical Architecture
This is the substantive part of the paper and is covered in detail below. At this stage it is sufficient to say that the technical architecture is the design of a practical method to build, maintain and secure the Data Warehouse. As the system is rolled out it will grow in importance and will become mission critical. Whilst not being required on a seven by twenty-four basis, it will need to be available to run very large, and possibly long-running, queries. Other issues will include backup and recovery, response, accuracy and presentation.
 
Project Management
As has already been stated, the building of a Data Warehouse will be both an inter-departmental and an iterative process. It is necessary for the process to have the complete buy-in of management and to have strong project management to maintain the momentum over the long term. Not only does the project manager have to deal with the initial implementation, but they must also build the necessary infrastructure to keep the project rolling over the longer term. A failure to achieve this is more often the cause of the collapse of a Data Warehouse project than the technical hurdles encountered along the way.
The Technical Architecture
 