Data Management & Warehousing

PROCESS NEUTRAL DATA MODELLING CONCEPTS
DAVID M WALKER
ETIS COMMUNITY GATHERING 13-14 NOVEMBER 2008 - BRUSSELS

© 2008 Data Management & Warehousing David M Walker

ETIS Community Gathering, Brussels

Page 1 14 November 2008

Agenda
•  The Issues With Conventional Data Warehouse Data Models
•  Assumptions About The Data Model To Be Constructed
•  Requirements Of A Data Warehouse Data Model
•  Constructing The Data Warehouse Data Model



THE ISSUES WITH CONVENTIONAL DATA WAREHOUSE DATA MODELS


Issues
•  Data models take a long time to develop
•  Data models are expensive to change
–  Affects Source -> Data Warehouse ETL
–  Affects Data Warehouse -> Data Mart ETL

•  The design often reflects the first or largest source system
–  This makes it difficult to add other systems

•  They often reflect current working practice
–  Making it difficult to change when the business does

Issues
•  It is a struggle to keep up with rapidly changing source system data models
•  Reference data is often not stored in a time variant way
•  History is lost with data model changes
•  Queries directly on the data warehouse are complex
•  Different rules apply when querying each table
•  Different database platforms have different needs


ASSUMPTIONS ABOUT THE DATA MODEL TO BE CONSTRUCTED


Assumptions
•  Used in the data warehouse
–  Not in the operational systems or the data marts
–  A different style of modelling is required

•  Users are not going to query the data model
–  Users will query separate dependent data marts

•  Data will be extracted from the model to populate the data marts by ETL tools
•  Data will be loaded into the model from the source systems by ETL tools

Assumptions
•  Direct updates will be prohibited
–  A separate application or applications will exist as a surrogate source and ETL will be used to load the data

•  Not a ‘mixed mode’ database
–  That is, one with some parts using one data modelling convention and other parts using another
–  This is bad practice with any modelling technique!



REQUIREMENTS OF A DATA WAREHOUSE DATA MODEL


Requirements
•  Uses A Design Pattern
–  General reusable approaches and solutions to commonly occurring problems that can be used in many different situations

•  Convention Over Configuration
–  Decrease the number of decisions that designers / developers need to make, gaining simplicity without losing flexibility
–  Achieved by ensuring that tables and columns use standard structures, naming conventions, etc. and are populated and queried in a consistent fashion

Requirements
•  DRY (Don’t Repeat Yourself)
–  Reduce duplication because it:
•  Increases the difficulty of changing the model
•  Decreases the clarity of the model
•  Leads to opportunities for inconsistency

•  Static over a long period of time
–  No need to add or modify tables on a regular basis
–  Note: There is a difference between designed and implemented. It is possible to design a table but not implement it until it is actually required

Requirements
•  The data model should store data at the lowest possible level
–  Information stored at the transaction level
–  Avoid the storage of aggregates

•  Supports the best use of platform specific features without compromising the design
–  Where available supports:
•  Partitioning
•  Column Storage
•  Many Insert/Few Update strategies

Requirements
•  Completely time-variant
–  It should be possible to reconstruct all information at any point in time

•  Communication tool
–  Aids the refinement of requirements
–  Aids the explanation of possibilities
–  Develops confidence from the user


Requirements
•  Uses Standard BI Relational Databases
–  Ensure that the solution can be deployed on any current platform and, if necessary, re-deployed on a future platform

•  Process Neutral
–  It will not reflect past, current or planned business processes, practices or dependencies
–  Stores the data items and relationships as defined by their use at the point in time when the information is created and acquired


CONSTRUCTING THE DATA WAREHOUSE DATA MODEL


Who is the customer?
•  Everyone has a different definition
•  Everyone needs different information
•  Users have conflicting definitions
•  Customers can be individuals or businesses

More problems …
•  Some of the customers are suppliers as well
•  Some businesses have separate divisions that have to be handled separately
•  Some customers interact with different divisions within our organisation
•  Some individuals or organisations also perform other roles
–  e.g. legal, re-sellers, partners, etc.


The Party
•  These problems arise because the data is being looked at in terms of current business processes
•  In fact there is no customer entity, just different types of party
–  Individuals, Organisations, Organisational Units
–  Concept of Party identical to that in contract law

•  The role of customer is defined not by the table definition but by the usage of party data with other information held (e.g. the purchase transaction relating to a product)

Attributes of Party
•  The attributes of ‘Party’ will be those that remain static over the life of the record
–  State ID Number, Name, Start Date, End Date
–  These attributes have ‘lifetime value’

•  Attributes that change need to be stored elsewhere
•  The Party table needs to be categorised or typed
–  Individual, Organisation, Organisation Unit


PARTIES Data Model
PARTIES
•  PARTY_DWK
•  PARTY_ID
•  PARTY_NAME
•  PARTY_START_DATE
•  PARTY_END_DATE
•  PARTY_TYPE_DWK

PARTY_TYPES
•  PARTY_TYPE_DWK
•  PARTY_TYPE
•  PARTY_TYPE_DESC
•  PARTY_TYPE_GROUP
•  PARTY_TYPE_START_DATE
•  PARTY_TYPE_END_DATE
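
The two tables can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column names come from the slide, while the data types, the constraints, the reading of _DWK as a warehouse-generated surrogate key and all sample rows are assumptions.

```python
import sqlite3

# Column names come from the slide. The data types, the constraints and the
# sample rows are assumptions made for this illustration.
SCHEMA = """
CREATE TABLE PARTY_TYPES (
    PARTY_TYPE_DWK        INTEGER PRIMARY KEY,  -- assumed surrogate key
    PARTY_TYPE            TEXT NOT NULL,
    PARTY_TYPE_DESC       TEXT,
    PARTY_TYPE_GROUP      TEXT,
    PARTY_TYPE_START_DATE TEXT,
    PARTY_TYPE_END_DATE   TEXT
);
CREATE TABLE PARTIES (
    PARTY_DWK        INTEGER PRIMARY KEY,       -- assumed surrogate key
    PARTY_ID         TEXT NOT NULL,             -- assumed source identifier
    PARTY_NAME       TEXT NOT NULL,
    PARTY_START_DATE TEXT,
    PARTY_END_DATE   TEXT,
    PARTY_TYPE_DWK   INTEGER NOT NULL REFERENCES PARTY_TYPES (PARTY_TYPE_DWK)
);
"""


def build_party_tables():
    """Create both tables in memory and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO PARTY_TYPES (PARTY_TYPE_DWK, PARTY_TYPE) VALUES (?, ?)",
        [(1, "INDIVIDUAL"), (2, "ORGANISATION"), (3, "ORGANISATION UNIT")],
    )
    conn.executemany(
        "INSERT INTO PARTIES (PARTY_DWK, PARTY_ID, PARTY_NAME, PARTY_TYPE_DWK)"
        " VALUES (?, ?, ?, ?)",
        [
            (10, "SRC-1", "Example Person", 1),
            (11, "SRC-2", "Example Ltd", 2),
            (12, "SRC-3", "Example Ltd Sales", 3),
        ],
    )
    return conn


def parties_of_type(conn, party_type):
    """List party names for one category by joining to the type table."""
    rows = conn.execute(
        "SELECT p.PARTY_NAME FROM PARTIES p"
        " JOIN PARTY_TYPES t ON t.PARTY_TYPE_DWK = p.PARTY_TYPE_DWK"
        " WHERE t.PARTY_TYPE = ? ORDER BY p.PARTY_NAME",
        (party_type,),
    )
    return [name for (name,) in rows]
```

Adding a new category of party then means inserting a row into PARTY_TYPES rather than changing a table definition.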


Supporting Non-Lifetime Attributes
•  Need to add data for different Party Types
–  Marital Status for Individuals
–  Number of Children for Individuals
–  Number of Employees for Organisations
–  Turnover for Organisations

•  Need to add data that changes over the lifetime of the party
–  Usually the same attributes that are needed for different Party Types

PARTY_PROPERTIES Data Model
PARTIES
•  PARTY_DWK
•  PARTY_ID
•  PARTY_NAME
•  PARTY_START_DATE
•  PARTY_END_DATE
•  PARTY_TYPE_DWK

PARTY_PROPERTY_TYPES
•  PARTY_PROPERTY_TYPE_DWK
•  PARTY_PROPERTY_TYPE
•  PARTY_PROPERTY_TYPE_DESC
•  PARTY_PROPERTY_TYPE_GROUP
•  PARTY_PROPERTY_TYPE_START_DATE
•  PARTY_PROPERTY_TYPE_END_DATE

PARTY_PROPERTIES
•  PARTY_DWK
•  PARTY_PROPERTY_TYPE_DWK
•  PARTY_START_DATE
•  PARTY_END_DATE
•  PARTY_PROPERTY_VALUE
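
One way to read a time-variant property "as of" a date is sketched below. The tables are reduced to the columns the query needs, and several things are assumptions rather than part of the slide: dates are ISO strings, the start date is inclusive, the end date is exclusive, a NULL end date marks the current row, and the sample history is invented.

```python
import sqlite3


def build_properties():
    """Create reduced property tables and load a hypothetical history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PARTY_PROPERTY_TYPES (
            PARTY_PROPERTY_TYPE_DWK INTEGER PRIMARY KEY,
            PARTY_PROPERTY_TYPE     TEXT NOT NULL
        );
        CREATE TABLE PARTY_PROPERTIES (
            PARTY_DWK               INTEGER NOT NULL,
            PARTY_PROPERTY_TYPE_DWK INTEGER NOT NULL,
            PARTY_START_DATE        TEXT NOT NULL,  -- assumed inclusive
            PARTY_END_DATE          TEXT,           -- assumed exclusive, NULL = current
            PARTY_PROPERTY_VALUE    TEXT
        );
        """
    )
    conn.execute("INSERT INTO PARTY_PROPERTY_TYPES VALUES (1, 'MARITAL STATUS')")
    conn.executemany(
        "INSERT INTO PARTY_PROPERTIES VALUES (?, ?, ?, ?, ?)",
        [
            (10, 1, "2000-01-01", "2005-06-01", "SINGLE"),
            (10, 1, "2005-06-01", None, "MARRIED"),
        ],
    )
    return conn


def property_as_of(conn, party_dwk, property_type, as_of):
    """Return the property value in force for a party on an ISO date."""
    row = conn.execute(
        """
        SELECT pp.PARTY_PROPERTY_VALUE
        FROM PARTY_PROPERTIES pp
        JOIN PARTY_PROPERTY_TYPES ppt
          ON ppt.PARTY_PROPERTY_TYPE_DWK = pp.PARTY_PROPERTY_TYPE_DWK
        WHERE pp.PARTY_DWK = ?
          AND ppt.PARTY_PROPERTY_TYPE = ?
          AND pp.PARTY_START_DATE <= ?
          AND (pp.PARTY_END_DATE IS NULL OR pp.PARTY_END_DATE > ?)
        """,
        (party_dwk, property_type, as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

Because a change of value is recorded as a new row rather than an overwrite, the same query answers both "what is it now?" and "what was it then?".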


Relationships between Parties
•  Parties have relationships
–  David Walker works in Professional Services
–  David Walker is employed by Data Management & Warehousing
–  David Walker is married to Helen Walker

•  This is known as a Peer-To-Peer relationship
•  This is the first place that we see a role defined by a relationship


PARTY_LINKS Data Model
PARTIES
•  PARTY_DWK
•  PARTY_ID
•  PARTY_NAME
•  PARTY_START_DATE
•  PARTY_END_DATE
•  PARTY_TYPE_DWK

PARTY_LINK_TYPES
•  PARTY_LINK_TYPE_DWK
•  PARTY_LINK_TYPE
•  PARTY_LINK_TYPE_DESC
•  PARTY_LINK_TYPE_GROUP
•  PARTY_LINK_TYPE_START_DATE
•  PARTY_LINK_TYPE_END_DATE

PARTY_LINKS
•  PARTY_DWK
•  LINKED_PARTY_DWK
•  PARTY_LINK_TYPE_DWK
•  PARTY_START_DATE
•  PARTY_END_DATE
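
The relationships from the previous slide can be loaded into these tables as a rough sketch. The PARTIES and PARTY_LINK_TYPES tables are cut down to the columns used here, and the link type codes (WORKS IN, EMPLOYED BY, MARRIED TO), key values and data types are assumptions.

```python
import sqlite3


def build_links():
    """Load the example relationships into reduced versions of the tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PARTIES (PARTY_DWK INTEGER PRIMARY KEY, PARTY_NAME TEXT);
        CREATE TABLE PARTY_LINK_TYPES (
            PARTY_LINK_TYPE_DWK INTEGER PRIMARY KEY,
            PARTY_LINK_TYPE     TEXT
        );
        CREATE TABLE PARTY_LINKS (
            PARTY_DWK           INTEGER,
            LINKED_PARTY_DWK    INTEGER,
            PARTY_LINK_TYPE_DWK INTEGER,
            PARTY_START_DATE    TEXT,
            PARTY_END_DATE      TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO PARTIES VALUES (?, ?)",
        [
            (1, "David Walker"),
            (2, "Helen Walker"),
            (3, "Data Management & Warehousing"),
            (4, "Professional Services"),
        ],
    )
    conn.executemany(
        "INSERT INTO PARTY_LINK_TYPES VALUES (?, ?)",
        [(1, "WORKS IN"), (2, "EMPLOYED BY"), (3, "MARRIED TO")],
    )
    conn.executemany(
        "INSERT INTO PARTY_LINKS (PARTY_DWK, LINKED_PARTY_DWK, PARTY_LINK_TYPE_DWK)"
        " VALUES (?, ?, ?)",
        [(1, 4, 1), (1, 3, 2), (1, 2, 3)],
    )
    return conn


LINK_QUERY = """
    SELECT {wanted}.PARTY_NAME
    FROM PARTY_LINKS l
    JOIN PARTIES source ON source.PARTY_DWK = l.PARTY_DWK
    JOIN PARTIES target ON target.PARTY_DWK = l.LINKED_PARTY_DWK
    JOIN PARTY_LINK_TYPES lt ON lt.PARTY_LINK_TYPE_DWK = l.PARTY_LINK_TYPE_DWK
    WHERE {given}.PARTY_NAME = ? AND lt.PARTY_LINK_TYPE = ?
    ORDER BY {wanted}.PARTY_NAME
"""


def linked_parties(conn, party_name, link_type):
    """Follow a link forwards, e.g. who is this party employed by?"""
    sql = LINK_QUERY.format(wanted="target", given="source")
    return [name for (name,) in conn.execute(sql, (party_name, link_type))]


def parties_linked_to(conn, linked_name, link_type):
    """Follow a link backwards, e.g. who is employed by this party?"""
    sql = LINK_QUERY.format(wanted="source", given="target")
    return [name for (name,) in conn.execute(sql, (linked_name, link_type))]
```

Reading the EMPLOYED BY link backwards yields the "employee" role without any employee table existing in the model.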


Segments of Parties
•  Grouping Parties together because at some point in time they shared characteristics
•  This is known as a Peer Group Relationship
•  Examples
–  Married people with two or more children
–  IT companies with fewer than 100 employees

•  Usually generated by analysis and the results stored
•  Most commonly seen in market segmentation type applications

PARTY_SEGMENTS Data Model
PARTIES
•  PARTY_DWK
•  PARTY_ID
•  PARTY_NAME
•  PARTY_START_DATE
•  PARTY_END_DATE
•  PARTY_TYPE_DWK

PARTY_SEGMENT_TYPES
•  PARTY_SEGMENT_TYPE_DWK
•  PARTY_SEGMENT_TYPE
•  PARTY_SEGMENT_DESC
•  PARTY_SEGMENT_GROUP
•  PARTY_SEGMENT_START_DATE
•  PARTY_SEGMENT_END_DATE

PARTY_SEGMENTS
•  PARTY_DWK
•  PARTY_SEGMENT_TYPE_DWK
•  PARTY_START_DATE
•  PARTY_END_DATE
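
As a sketch of "generated by analysis and the results stored", the function below derives the "married people with two or more children" segment from PARTY_PROPERTIES and writes the membership into PARTY_SEGMENTS. The key values, the sample rows and the rule that only current rows (NULL end date) are analysed are assumptions for this illustration.

```python
import sqlite3

# Hypothetical surrogate key values, invented for this sketch.
MARITAL_STATUS = 1
NUMBER_OF_CHILDREN = 2
MARRIED_2PLUS_CHILDREN = 50


def build_segment_source():
    """Create the two tables involved and load hypothetical properties."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PARTY_PROPERTIES (
            PARTY_DWK INTEGER, PARTY_PROPERTY_TYPE_DWK INTEGER,
            PARTY_START_DATE TEXT, PARTY_END_DATE TEXT,
            PARTY_PROPERTY_VALUE TEXT
        );
        CREATE TABLE PARTY_SEGMENTS (
            PARTY_DWK INTEGER, PARTY_SEGMENT_TYPE_DWK INTEGER,
            PARTY_START_DATE TEXT, PARTY_END_DATE TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO PARTY_PROPERTIES VALUES (?, ?, '2008-01-01', NULL, ?)",
        [
            (1, MARITAL_STATUS, "MARRIED"), (1, NUMBER_OF_CHILDREN, "3"),
            (2, MARITAL_STATUS, "MARRIED"), (2, NUMBER_OF_CHILDREN, "1"),
            (3, MARITAL_STATUS, "SINGLE"), (3, NUMBER_OF_CHILDREN, "2"),
        ],
    )
    return conn


def store_segment(conn, as_of):
    """Insert one PARTY_SEGMENTS row per party that currently qualifies."""
    conn.execute(
        """
        INSERT INTO PARTY_SEGMENTS
        SELECT m.PARTY_DWK, ?, ?, NULL
        FROM PARTY_PROPERTIES m
        JOIN PARTY_PROPERTIES c ON c.PARTY_DWK = m.PARTY_DWK
        WHERE m.PARTY_PROPERTY_TYPE_DWK = ?
          AND m.PARTY_PROPERTY_VALUE = 'MARRIED'
          AND c.PARTY_PROPERTY_TYPE_DWK = ?
          AND CAST(c.PARTY_PROPERTY_VALUE AS INTEGER) >= 2
          AND m.PARTY_END_DATE IS NULL
          AND c.PARTY_END_DATE IS NULL
        """,
        (MARRIED_2PLUS_CHILDREN, as_of, MARITAL_STATUS, NUMBER_OF_CHILDREN),
    )


def segment_members(conn, segment_type_dwk):
    """Return the party keys stored for one segment type."""
    rows = conn.execute(
        "SELECT PARTY_DWK FROM PARTY_SEGMENTS"
        " WHERE PARTY_SEGMENT_TYPE_DWK = ? ORDER BY PARTY_DWK",
        (segment_type_dwk,),
    )
    return [dwk for (dwk,) in rows]
```

Storing the result means the segment membership stays available even after the underlying properties change.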


Understanding The Conventions
•  All Type tables have the same format
–  Categorisation

•  All Property tables have the same format
–  Time Variant Attributes

•  All Link tables have the same format
–  Peer-To-Peer Relationships

•  All Segment tables have the same format
–  Peer Group Relationships

•  There are no other significant clusters of data about a single entity such as Party

Introducing Major Entities
•  Party is a Major Entity
–  These are entities that exist regardless of the business process
–  It is the relationships between major entities that are defined by business processes
–  Major Entity attributes differ from one another

•  All Organisations only need a finite number of major entities including:
–  Campaign
–  Asset
–  Account
–  Channel
–  Electronic Address
–  Contract
–  Geography
–  Product/Service
–  Calendar

Data Models For Other Major Entities
•  Geography
–  Geography Types
•  Postal Addresses, GPS Co-ordinates, ELR

–  Geographic Property Types
–  Geographic Properties
–  Geographic Link Types
–  Geographic Links
–  Geographic Segment Types
–  Geographic Segments

•  and so on for every major entity

Major Entity Sub-Model
[Diagram: MAJOR ENTITY together with its supporting tables MAJOR ENTITY TYPES, MAJOR ENTITY PROPERTIES, MAJOR ENTITY PROPERTY TYPES, MAJOR ENTITY LINKS, MAJOR ENTITY LINK TYPES, MAJOR ENTITY SEGMENTS and MAJOR ENTITY SEGMENT TYPES]
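
Because every sub-model has the same shape, its table definitions can in principle be generated from the entity name. The sketch below assumes the naming pattern seen on the Party slides, applies the PARTY_TYPES column pattern uniformly to every Type table, and uses TEXT/INTEGER data types; the function names and the ASSET example columns are hypothetical.

```python
import sqlite3
from typing import List


def type_table(prefix: str) -> str:
    """DDL for a <prefix>_TYPES table; every Type table shares this shape."""
    t = prefix + "_TYPE"
    return (
        f"CREATE TABLE {t}S ({t}_DWK INTEGER PRIMARY KEY, {t} TEXT, "
        f"{t}_DESC TEXT, {t}_GROUP TEXT, "
        f"{t}_START_DATE TEXT, {t}_END_DATE TEXT)"
    )


def sub_model_ddl(entity: str, plural: str, lifetime_columns: List[str]) -> List[str]:
    """Return the eight CREATE TABLE statements for one major entity.

    Only the lifetime attribute columns differ between major entities; the
    Type, Property, Link and Segment tables follow from the entity name.
    """
    e = entity
    lifetime = "".join(f"{column} TEXT, " for column in lifetime_columns)
    dates = f"{e}_START_DATE TEXT, {e}_END_DATE TEXT"
    return [
        type_table(e),
        f"CREATE TABLE {plural} ({e}_DWK INTEGER PRIMARY KEY, {lifetime}"
        f"{dates}, {e}_TYPE_DWK INTEGER)",
        type_table(f"{e}_PROPERTY"),
        f"CREATE TABLE {e}_PROPERTIES ({e}_DWK INTEGER, "
        f"{e}_PROPERTY_TYPE_DWK INTEGER, {dates}, {e}_PROPERTY_VALUE TEXT)",
        type_table(f"{e}_LINK"),
        f"CREATE TABLE {e}_LINKS ({e}_DWK INTEGER, LINKED_{e}_DWK INTEGER, "
        f"{e}_LINK_TYPE_DWK INTEGER, {dates})",
        type_table(f"{e}_SEGMENT"),
        f"CREATE TABLE {e}_SEGMENTS ({e}_DWK INTEGER, "
        f"{e}_SEGMENT_TYPE_DWK INTEGER, {dates})",
    ]


def create_sub_model(conn, entity, plural, lifetime_columns):
    """Run the generated statements against an open connection."""
    for statement in sub_model_ddl(entity, plural, lifetime_columns):
        conn.execute(statement)


def table_names(conn):
    """Return the set of table names currently defined."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}
```

For example, `create_sub_model(conn, "PARTY", "PARTIES", ["PARTY_ID", "PARTY_NAME"])` produces a PARTIES table with the same column order as the slide, plus its seven supporting tables.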


Relationships Between Major Entities
•  Storing names with multiple addresses and multiple electronic addresses (e-mail, telephone numbers, etc.)
–  Billing, Contact, Home, Work, etc.

•  Usage
–  Party -> Contract -> Account -> Electronic Address -> A Number -> Usage
–  Party -> Contract -> Account -> Electronic Address -> B Number -> Usage
–  Product/Service -> Tariff -> Usage

Party -> (Electronic) Addresses
[Diagram: PARTY is related to ADDRESS through PARTY_ADDRESS_HISTORY (typed by PARTY_ADDRESS_HISTORY_TYPES) and to ELECTRONIC ADDRESS through PARTY_ELECTRONIC_ADDRESS_HISTORY (typed by PARTY_ELECTRONIC_ADDRESS_HISTORY_TYPES)]

Party -> Usage (Simplified)
[Diagram. Boxes as labelled on the slide: PARTY; CONTRACT; ACCOUNT; PRODUCT SERVICE; PRODUCT SERVICE TARIFF HISTORY & TYPE; TARIFF; ACCOUNT ELECTRONIC ADDRESS HISTORY & TYPE; ELECTRONIC ADDRESS (A Number, B Number); USAGE HISTORY]


Extending the Data Model
•  Identify as many Major Entities as possible
–  But remember there are only a finite number so don’t invent things for the sake of it

•  Define the standard sub-model around them
•  Put appropriate data in the sub-model
•  Create the relationships to _HISTORY tables for the transactions the business wants to analyse
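
A _HISTORY table relating two major entities might look like the sketch below, which also shows a role such as "customer" falling out of usage rather than from a customer table. The table PARTY_PRODUCT_HISTORY, its columns EVENT_DATE and AMOUNT, the type codes PURCHASE and SUPPLY, and all rows are hypothetical names and data chosen to follow the deck's naming style.

```python
import sqlite3


def build_history():
    """Create a hypothetical Party/Product history with its type table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PARTIES (PARTY_DWK INTEGER PRIMARY KEY, PARTY_NAME TEXT);
        CREATE TABLE PARTY_PRODUCT_HISTORY_TYPES (
            PARTY_PRODUCT_HISTORY_TYPE_DWK INTEGER PRIMARY KEY,
            PARTY_PRODUCT_HISTORY_TYPE     TEXT
        );
        CREATE TABLE PARTY_PRODUCT_HISTORY (
            PARTY_DWK                      INTEGER,
            PRODUCT_DWK                    INTEGER,
            PARTY_PRODUCT_HISTORY_TYPE_DWK INTEGER,
            EVENT_DATE                     TEXT,   -- hypothetical column
            AMOUNT                         REAL    -- hypothetical measure
        );
        """
    )
    conn.executemany(
        "INSERT INTO PARTIES VALUES (?, ?)",
        [(1, "Example Ltd"), (2, "Example Person"), (3, "Example Supplies")],
    )
    conn.executemany(
        "INSERT INTO PARTY_PRODUCT_HISTORY_TYPES VALUES (?, ?)",
        [(1, "PURCHASE"), (2, "SUPPLY")],
    )
    conn.executemany(
        "INSERT INTO PARTY_PRODUCT_HISTORY VALUES (?, ?, ?, ?, ?)",
        [
            (1, 100, 1, "2008-10-01", 250.0),  # Example Ltd buys from us
            (1, 101, 2, "2008-10-05", 90.0),   # Example Ltd also supplies us
            (2, 100, 1, "2008-10-07", 40.0),
            (3, 101, 2, "2008-10-09", 75.0),
        ],
    )
    return conn


def parties_in_role(conn, history_type):
    """Derive a role from the kind of transaction a party appears in."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.PARTY_NAME
        FROM PARTY_PRODUCT_HISTORY h
        JOIN PARTIES p ON p.PARTY_DWK = h.PARTY_DWK
        JOIN PARTY_PRODUCT_HISTORY_TYPES t
          ON t.PARTY_PRODUCT_HISTORY_TYPE_DWK = h.PARTY_PRODUCT_HISTORY_TYPE_DWK
        WHERE t.PARTY_PRODUCT_HISTORY_TYPE = ?
        ORDER BY p.PARTY_NAME
        """,
        (history_type,),
    )
    return [name for (name,) in rows]
```

In this sample data the same party appears as both customer and supplier, which is the situation the earlier "More problems" slide describes.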


Does this help meet requirements?
•  Uses A Design Pattern
•  Convention Over Configuration
•  DRY (Don’t Repeat Yourself)
•  Static over a long period of time
•  The data model should store data at the lowest possible level
•  Supports the best use of platform specific features without compromising the design
•  Completely time-variant
•  Communication tool

Some Key Elements
•  Self Similar modelling
–  All _TYPE tables have the same structure, etc.
–  Naming conventions are consistent everywhere
–  Easy to create standard algorithms for load and extraction
–  Easy to partition on type and/or date

•  Insert ‘heavy’ / Update ‘light’
–  Most ETL will result in an insert, there will be very few updates

•  Manages ‘Slowly Changing Dimensions’
–  Inherent in the Major Entity Sub-Model design
–  Significantly reduces overhead in the Data Mart build

•  Data Driven
–  Types provide extensible metadata
–  Prevents unnecessary updating of the data model itself

•  Natural Star Schemas
–  Histories will map to FACTS
–  Major Entity Collections will collapse into DIMENSIONS
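
How a Major Entity collection might collapse into a dimension can be sketched by pivoting the properties in force on a chosen date into one record per party. This is an illustration only: the tables are reduced to the columns used, dates are assumed to be ISO strings with an exclusive end date and NULL for the current row, and the sample data is invented.

```python
import sqlite3


def build_collection():
    """Create a reduced Party collection with a hypothetical history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PARTIES (PARTY_DWK INTEGER PRIMARY KEY, PARTY_NAME TEXT);
        CREATE TABLE PARTY_PROPERTY_TYPES (
            PARTY_PROPERTY_TYPE_DWK INTEGER PRIMARY KEY,
            PARTY_PROPERTY_TYPE     TEXT
        );
        CREATE TABLE PARTY_PROPERTIES (
            PARTY_DWK INTEGER, PARTY_PROPERTY_TYPE_DWK INTEGER,
            PARTY_START_DATE TEXT, PARTY_END_DATE TEXT,
            PARTY_PROPERTY_VALUE TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO PARTIES VALUES (?, ?)",
        [(10, "Example Person"), (11, "Example Ltd")],
    )
    conn.executemany(
        "INSERT INTO PARTY_PROPERTY_TYPES VALUES (?, ?)",
        [(1, "MARITAL STATUS"), (2, "NUMBER OF CHILDREN")],
    )
    conn.executemany(
        "INSERT INTO PARTY_PROPERTIES VALUES (?, ?, ?, ?, ?)",
        [
            (10, 1, "2000-01-01", "2005-06-01", "SINGLE"),
            (10, 1, "2005-06-01", None, "MARRIED"),
            (10, 2, "2007-01-01", None, "2"),
        ],
    )
    return conn


def party_dimension(conn, as_of):
    """Return one flat record per party with the properties valid on as_of."""
    rows = conn.execute(
        """
        SELECT p.PARTY_DWK, p.PARTY_NAME,
               t.PARTY_PROPERTY_TYPE, pp.PARTY_PROPERTY_VALUE
        FROM PARTIES p
        LEFT JOIN PARTY_PROPERTIES pp
               ON pp.PARTY_DWK = p.PARTY_DWK
              AND pp.PARTY_START_DATE <= ?
              AND (pp.PARTY_END_DATE IS NULL OR pp.PARTY_END_DATE > ?)
        LEFT JOIN PARTY_PROPERTY_TYPES t
               ON t.PARTY_PROPERTY_TYPE_DWK = pp.PARTY_PROPERTY_TYPE_DWK
        ORDER BY p.PARTY_DWK
        """,
        (as_of, as_of),
    )
    dimension = {}
    for dwk, name, property_type, value in rows:
        record = dimension.setdefault(dwk, {"PARTY_DWK": dwk, "PARTY_NAME": name})
        if property_type is not None:
            record[property_type] = value  # each property type becomes a column
    return list(dimension.values())
```

Running the same function for different dates gives the dimension as it stood at each point, which is one way to read the claim that slowly changing dimensions are inherent in the sub-model.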

Is this all there is to it?
•  At a high level – YES
•  BUT:
–  There are methods for dealing with data quality
–  Special case methods for some lifetime attributes
•  e.g. Handling women changing their names at marriage

–  Insert/Update methods for performance
–  Design Patterns for implementation
–  Other detailed techniques

•  This talk could only ever be:

“An introduction to Process Neutral Data Modelling”

Further Reading
•  Available From http://www.datamgmt.com
•  White Papers
–  Overview Architecture for Enterprise Data Warehouses
•  March 2006 - 32 pages

–  Data Warehouse Documentation Roadmap
•  April 2007 – 28 pages

–  How Data Works
•  June 2007 – 32 Pages

–  Data Warehouse Governance
•  April 2007 – 24 Pages

–  Data Warehouse Project Management
•  October 2008 – 32 Pages

–  Process Neutral Data Modelling


Thank you !!
Website: http://www.datamgmt.com
Phone: +44 7050 028 911
E-mail: davidw@datamgmt.com
Skype/MSN: datamgmt