You are on page 1of 39

Data Management & Warehousing

PROCESS NEUTRAL DATA MODELLING
CONCEPTS

DAVID M WALKER
ETIS COMMUNITY GATHERING
13-14 NOVEMBER 2008 - BRUSSELS

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 1
David M Walker 14 November 2008
Agenda

•  The Issues With Conventional Data Warehouse
Data Models
•  Assumptions About The Data Model To Be
Constructed
•  Requirements Of A Data Warehouse Data Model
•  Constructing The Data Warehouse Data Model

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 2
David M Walker 14 November 2009
Data Management & Warehousing

THE ISSUES WITH
CONVENTIONAL DATA WAREHOUSE
DATA MODELS

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 3
David M Walker 14 November 2009
Issues

•  Data models take a long time to develop
•  Data models are expensive to change
–  Affects Source -> Data Warehouse ETL
–  Affects Data Warehouse -> Data Mart ETL
•  The design often reflects the first or largest
source system
–  This makes it difficult to add other systems
•  They often reflect current working practice
–  Making it difficult to change when the business does

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 4
David M Walker 14 November 2009
Issues

•  A struggle to keep up with rapidly changing
source system data models
•  Reference data is often not stored in a time
variant way
•  History is lost with data model changes
•  Queries directly on the data warehouse are
complex
•  Different rules apply to query each table
•  Different database platforms have different
needs
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 5
David M Walker 14 November 2009
Data Management & Warehousing

ASSUMPTIONS ABOUT
THE DATA MODEL
TO BE CONSTRUCTED

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 6
David M Walker 14 November 2009
Assumptions

•  Used in data warehouse
–  Not in the operational systems or the data marts
–  Different style of modelling required
•  Users not going to query the data model
–  Users will query separate dependent data marts
•  Data will be extracted from the model to
populate the data marts by ETL tools
•  Data will be loaded into the model from the
source systems by ETL tools
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 7
David M Walker 14 November 2009
Assumptions

•  Direct updates will be prohibited
–  A separate application or applications will exist as a
surrogate source and ETL used to load the data
•  Not a ‘mixed mode’ database
–  Some parts using one data modelling convention and
other parts using another
–  This is bad practice with any modelling technique!

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 8
David M Walker 14 November 2009
Data Management & Warehousing

REQUIREMENTS OF A
DATA WAREHOUSE
DATA MODEL

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 9
David M Walker 14 November 2009
Requirements

•  Uses A Design Pattern
–  General reusable approaches and solutions to
commonly occurring problems that can be used in
many different situations
•  Convention Over Configuration
–  Decrease the number of decisions that designers /
developers need to make, gaining simplicity, without
losing flexibility
–  Achieved by ensuring that tables and columns use a
standard structures, naming convention, etc. and are
populated and queried in a consistent fashion
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 10
David M Walker 14 November 2008
Requirements

•  DRY (Don’t Repeat Yourself)
–  Reduce duplication because it:
•  Increases the difficulty of changing the model
•  Decreases the clarity of the model
•  Leads to opportunities for inconsistency
•  Static over a long period of time
–  No need to add or modify tables on a regular basis
–  Note: There is a difference between designed and
implemented, it is possible to have designed a table
but not to implement it until it is actually required

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 11
David M Walker 14 November 2008
Requirements

•  The data model should store data at the lowest
possible level
–  Information stored at the transaction level
–  Avoid the storage of aggregates
•  Supports the best use of platform specific
features without compromising the design
–  Where available supports:
•  Partitioning
•  Column Storage
•  Many Insert/Few Update strategies

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 12
David M Walker 14 November 2009
Requirements

•  Completely time-variant
–  It should be possible to reconstruct all information at
any point in time
•  Communication tool
–  Aids the refinement of requirements
–  Aids the explanation of possibilities
–  Develops confidence from the user

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 13
David M Walker 14 November 2009
Requirements

•  Uses Standard BI Relational Databases
–  Ensure that the solution can be deployed on any
current platform and, if necessary, re-deployed on a
future platform
•  Process Neutral
–  It will not reflect past, current or planned business
processes, practices or dependencies
–  Stores the data items and relationships as defined by
their use at the point in time when the information is
created and acquired

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 14
David M Walker 14 November 2008
Data Management & Warehousing

CONSTRUCTING THE
DATA WAREHOUSE
DATA MODEL

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 15
David M Walker 14 November 2009
Who is the customer?

•  Everyone has a
different definition
•  Everyone needs a
different information
•  Users have conflicting
definitions
•  Customer can be
individuals or
businesses

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 16
David M Walker 14 November 2009
More problems …

•  Some of the customers are suppliers as well
•  Some businesses have separate divisions that
have to be handled separately
•  Some customers interact with different divisions
within our organisation
•  Some individuals or organisations also perform
other roles
–  e.g. legal, re-sellers, partners, etc.

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 17
David M Walker 14 November 2009
The Party

•  These problem arises because the data is being
looked at in terms of current business process
•  In fact there is no customer entity, just different
types of party
–  Individuals, Organisations, Organisational Units
–  Concept of Party identical to that in contract law
•  The role of customer is defined not by the table
definition but by the usage of party data with
other information held (e.g. the purchase
transaction relating to a product)
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 18
David M Walker 14 November 2009
Attributes of Party

•  The attributes of ‘Party’ will be those that remain
static over the life of the record
–  State ID Number, Name, Start Date, End Date
–  These attributes have ‘lifetime value’
•  Attributes that change need to be stored
elsewhere
•  The Party table needs to be categorised or typed
–  Individual, Organisation, Organisation Unit

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 19
David M Walker 14 November 2009
PARTIES Data Model
PARTIES
• PARTY_DWK
• PARTY_ID
• PARTY_NAME
• PARTY_START_DATE
PARTY_TYPES • PARTY_END_DATE
• PARTY_TYPE_DWK • PARTY_TYPE_DWK
• PARTY_TYPE
• PARTY_TYPE_DESC
• PARTY_TYPE_GROUP
• PARTY_TYPE_START_DATE
• PARTY_TYPE_END_DATE

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 20
David M Walker 14 November 2009
Supporting Non-Lifetime Attributes

•  Need to add data for different Party Types
–  Marital Status for Individuals
–  Number of Children for Individuals
–  Number of Employees for Organisations
–  Turnover for Organisations
•  Need to add data that changes over the lifetime
of the party
–  Usually the same attributes that are needed for
different Party Types

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 21
David M Walker 14 November 2009
PARTY_PROPERTIES Data Model

PARTIES
• PARTY_DWK
• PARTY_ID
• PARTY_NAME
• PARTY_START_DATE PARTY_PROPERTIES
• PARTY_END_DATE • PARTY_DWK
• PARTY_TYPE_DWK • PARTY_PROPERTY_TYPE_DWK
• PARTY_START_DATE
• PARTY_END_DATE
PARTY_PROPERTY_TYPES • PARTY_PROPERTY_VALUE
• PARTY_PROPERTY_TYPE_DWK
• PARTY_PROPERTY_TYPE
• PARTY_PROPERTY_TYPE_DESC
• PARTY_PROPERTY_TYPE_GROUP
• PARTY_PROPERTY_TYPE_START_DATE
• PARTY_PROPERTY_TYPE_END_DATE

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 22
David M Walker 14 November 2009
Relationships between Parties

•  Parties have relationships
–  David Walker works in Professional Services
–  David Walker is employed by Data Management &
Warehousing
–  David Walker is married to Helen walker
•  This is known as a Peer-To-Peer relationship
•  This is the first place that we see a role defined
by a relationship

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 23
David M Walker 14 November 2009
PARTY_LINKS Data Model

PARTIES
• PARTY_DWK
• PARTY_ID
• PARTY_NAME
• PARTY_START_DATE
• PARTY_END_DATE PARTY_LINKS
• PARTY_TYPE_DWK • PARTY_DWK
• LINKED_PARTY_DWK
• PARTY_LINK_TYPE_DWK
PARTY_LINK_TYPES • PARTY_START_DATE
• PARTY_LINK_TYPE_DWK • PARTY_END_DATE
• PARTY_LINK_TYPE
• PARTY_LINK_TYPE_DESC
• PARTY_LINK_TYPE_GROUP
• PARTY_LINK_TYPE_START_DATE
• PARTY_LINK_TYPE_END_DATE

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 24
David M Walker 14 November 2009
Segments of Parties

•  Grouping Parties together because at some
point in time they shared characteristics
•  This is known as a Peer Group Relationship
•  Examples
–  Married people with two or more children
–  IT companies with less than <100 employees
•  Usually generated by analysis and the results
stored
•  Most commonly seen in market segmentation
type applications
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 25
David M Walker 14 November 2009
PARTY_SEGMENTS Data Model

PARTIES
• PARTY_DWK
• PARTY_ID
• PARTY_NAME
• PARTY_START_DATE
• PARTY_END_DATE PARTY_SEGMENTS
• PARTY_TYPE_DWK • PARTY_DWK
• PARTY_SEGMENT_TYPE_DWK
• PARTY_START_DATE
PARTY_SEGMENT_TYPES • PARTY_END_DATE
• PARTY_SEGMENT_TYPE_DWK
• PARTY_SEGMENT_TYPE
• PARTY_SEGMENT_DESC
• PARTY_SEGMENT_GROUP
• PARTY_SEGMENT_START_DATE
• PARTY_SEGMENT_END_DATE

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 26
David M Walker 14 November 2009
Understanding The Conventions

•  All Type tables have the same format
–  Categorisation
•  All Property tables have the same format
–  Time Variant Attributes
•  All Link tables have the same format
–  Peer-To-Peer Relationships
•  All Segment tables have the same format
–  Peer Group Relationships
•  There are no other significant clusters of data about a
single entity such as Party

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 27
David M Walker 14 November 2009
Introducing Major Entities

•  Party is a Major Entity
–  These are entities that exist regardless of the
business process
–  It is the relationships between major entities that are
defined by business processes
–  Major Entity attributes differ from one another
•  All Organisations only need a finite number of
major entities including:
–  Campaign –  Asset –  Geography
–  Account –  Channel –  Product/Service
–  Electronic Address –  Contract –  Calendar
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 28
David M Walker 14 November 2009
Data Models For Other Major Entities

•  Geography
–  Geography Types
•  Postal Addresses, GPS Co-ordinates, ELR
–  Geographic Property Types
–  Geographic Properties
–  Geographic Link Types
–  Geographic Links
–  Geographic Segment Types
–  Geographic Segments
•  and so on for every major entity
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 29
David M Walker 14 November 2009
Major Entity Sub Model
Major Entity Sub-Model
MAJOR ENTITY MAJOR ENTITY
PROPERTIES PROPERTY TYPES

MAJOR ENTITY

MAJOR ENTITY MAJOR ENTITY
LINKS LINK TYPES

MAJOR ENTITY MAJOR ENTITY MAJOR ENTITY
TYPES SEGMENTS SEGMENT TYPES

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 30
David M Walker 14 November 2009
Relationships Between Major Entities

•  Storing names with multiple addresses and
multiple electronic addresses (e-mail, telephone
numbers, etc.)
–  Billing, Contact, Home, Work, etc
•  Usage
–  Party -> Contract -> Account -> Electronic Address ->
A Number -> Usage
–  Party -> Contract -> Account -> Electronic Address ->
B Number -> Usage
–  Product/Service -> Tariff - Usage
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 31
David M Walker 14 November 2009
Party -> (Electronic) Addresses
PARTY_
ADDRESS_
HISTORY_
TYPES

PARTY_ ADDRESS
ADDRESS_
HISTORY

PARTY

PARTY_ ELECTRONIC
ELECTRONIC_ADDRESS_ ADDRESS
HISTORY

PARTY_
ELECTRONIC_ADDRESS_
HISTORY_
TYPES
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 32
David M Walker 14 November 2009
Party -> Usage (Simplified)

PARTY CONTRACT ACCOUNT

PRODUCT SERVICE PRODUCT ACCOUNT
TARIFF SERVICE ELECTRONIC ADDRESS
HISTORY & TYPE HISTORY & TYPE

A Number
TARIFF ELECTRONIC
USAGE HISTORY ADDRESS
B Number

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 33
David M Walker 14 November 2009
Extending the Data Model

•  Identify as many Major Entities as possible
–  But remember there are only a finite number so don’t
invent things for the sake of it
•  Define the standard sub-model around them
•  Put appropriate data in the sub-model
•  Create the relationships to _HISTORY tables for
the transaction the business wants to analyse

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 34
David M Walker 14 November 2009
Does this help meet requirements?

  Uses A Design Pattern
  Convention Over Configuration
  DRY (Don’t Repeat Yourself)
  Static over a long period of time
  The data model should store data at the lowest possible
level
  Supports the best use of platform specific features
without compromising the design
  Completely time-variant
  Communication tool

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 35
David M Walker 14 November 2009
Some Key Elements

•  Self Similar modelling
–  All _TYPE tables have the same structure, etc.
–  Naming conventions are consistent everywhere
–  Easy to create standard algorithms for load and extraction
–  Easy to partition on type and/or date
•  Insert ‘heavy’ / Update ‘light’
–  Most ETL will result in an insert, there will be very few updates
•  Manages ‘Slowly Changing Dimensions’
–  Inherent in the Major Entity Sub-Model design
–  Significantly reduces overhead in the Data Mart build
•  Data Driven
–  Types provide extensible metadata
–  Prevents un-necessary updating of the data model itself
•  Natural Star Schemas
–  Histories will map to FACTS,
–  Major Entity Collections will collapse into DIMENSIONS
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 36
David M Walker 14 November 2008
Is this all there is to it ?

•  At a high level – YES
•  BUT:
–  There are methods for dealing with data quality
–  Special case methods for some lifetime attributes
•  e.g. Handling women changing their names at marriage
–  Insert/Update methods for performance
–  Design Patterns for implementation
–  Other detailed techniques
•  This talk could only ever be:
“An introduction to
Process Neutral Data Modelling”
© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 37
David M Walker 14 November 2008
Further Reading

•  Available From http://www.datamgmt.com
•  White Papers
–  Overview Architecture for Enterprise Data Warehouses
•  March 2006 - 32 pages
–  Data Warehouse Documentation Roadmap
•  April 2007 – 28 pages
–  How Data Works
•  June 2007 – 32 Pages
–  Data Warehouse Governance
•  April 2007 – 24 Pages
–  Data Warehouse Project Management
•  October 2008 – 32 Pages
–  Process Neutral Data Modelling

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 38
David M Walker 14 November 2009
Data Management & Warehousing

Thank you !!

Website: http://www.datamgmt.com
Phone: +44 7050 028 911
E-mail: davidw@datamgmt.com
Skype/MSN: datamgmt

© 2008 Data Management & Warehousing ETIS Community Gathering, Brussels Page 39
David M Walker 14 November 2008