
Data Warehousing

Data Model Overview

Modeling for the Enterprise while Serving the Individual

Debbie Smith
Data Warehouse Consultant
Teradata Global Sales Support

Table of Contents

Executive Summary
Introduction
Revisit Dr. E. F. Codd's 12 Rules
So What is an EDW?
Data Infrastructure
Data Modeling Theory
From Logical to Physical Modeling
Physical Models
Impact of Information Delivery Tools
Impact of ETL Tools
Surrogate Keys
Changing Dimensions
The Teradata Direction
Impact of the Data Warehousing Evolution
Summary
Appendix A: Codd's 12 Rules
Appendix B: A Comparison
Appendix C: Glossary
Appendix D: Endnotes
Appendix E: References

Executive Summary

The data model choice for the data warehouse is often a matter of great controversy. The desire is to offer a self-service type of environment that allows business users easy access with acceptable response times. Response time also includes the time required between the conception of a new application and the delivery of that application. What data model do you employ to provide ease of use for the business user while still being able to address current and future needs of the data warehouse in terms of updating, expansion, availability, and management? This paper provides an overview of popular data modeling approaches and Teradata Corporation's position regarding data modeling.

Introduction

Data model choices are often a matter of great controversy when discussing building a data warehouse. When building the data warehouse, how do you build it? Do you attempt to build it for the business entity, current and future, or do you build it in a manner that satisfies current business users' needs? This paper will discuss the idea of building the centralized enterprise data warehouse to sustain longevity, and using a view layer to form fit individual business user needs and requirements, combining the benefits of a normalized model with the benefits of a dimensional model.

Revisit Dr. E. F. Codd's 12 Rules

In 1985, Dr. Codd published a list of 12 principles that have become the design guidelines for relational database systems (see Appendix A).1 Dr. Codd is credited as the creator of the relational model, and within these guidelines, he outlines the structuring of information into tables, that null values are properly identified, and that a high level of insert, update, and delete support is maintained. Note guidelines 6, 8, and 9:

> Number 6, the View Updating Rule, where logical views support full-range data manipulation

> Number 8, Physical Data Independence, where the user is isolated from how the data is physically stored

> Number 9, Logical Data Independence, where the user is not impacted should the physical data structure change

These three guidelines specifically provide a methodology to isolate users from the impact of IT activities, and they lay the foundation for being able to build a logical, user-friendly data structure that is independent of the physical data structure. This foundation provisions for form fitting individual business users' needs while maintaining a physical model that facilitates the needs of ETL, update frequencies, and data management requirements, enabling the enterprise data warehouse with user friendliness.

So What is an EDW?

The foundation of this paper is built on the concept of an enterprise data warehouse (EDW). Many people use the term data warehouse to represent similar concepts; however, there can be some variances in what is meant. It is generally accepted that the data warehouse is the environment that provides for decision support within the organization. By appending the word enterprise, that environment is now thought of, or expected, to become reflective of the entire enterprise. Before continuing, let's discuss the environment that Teradata refers to when we say enterprise data warehouse.

We define an EDW as an area where the data of the business (the enterprise) is centrally integrated, centrally stored, and accessed through common business-defined methods. We believe, and our customers have shown us, that the value of centralizing the data is the synergy gained by storing the data once, managing it once, and accessing it for and in many varied ways, times, methods, and reasons. (By gathering and integrating the data of the business, business users are able to have a 360-degree view of the data.) Of course, the accumulation of business data is an ongoing process and ever evolving. One of the goals is to develop the long-term strategy that provides a methodology of adding and refining business data. The higher the degree of customer success, the greater the degree to which the infrastructure is able to answer any question, from any user (internal or external), on any data, at any time, including the yet unknown queries or unknown future application needs, without the need or intervention of additional resources. This synergy allows the business to use the information of the business to quickly respond to and create changes in the marketplace.
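Guidelines 8 and 9 can be illustrated with a small sketch. This is not from the paper; the table, column, and view names are invented, and SQLite stands in for the warehouse database. Users query a view, so the physical layout can change underneath without their query text changing:

```python
# Sketch of physical/logical data independence via a view layer.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
con.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "Acme", "Boston"), (2, "Globex", "New York")])
# Business users are given the view, never the base table.
con.execute("CREATE VIEW v_customer AS SELECT cust_id, name, city FROM customer")
before = con.execute("SELECT name FROM v_customer WHERE city = 'Boston'").fetchall()

# IT later splits the physical table (a physical-design change) ...
con.execute("CREATE TABLE customer_base AS SELECT cust_id, name FROM customer")
con.execute("CREATE TABLE customer_addr AS SELECT cust_id, city FROM customer")
con.execute("DROP VIEW v_customer")
con.execute("DROP TABLE customer")
# ... and redefines the view; the user's query text is unchanged.
con.execute("""CREATE VIEW v_customer AS
               SELECT b.cust_id, b.name, a.city
               FROM customer_base b JOIN customer_addr a USING (cust_id)""")
after = con.execute("SELECT name FROM v_customer WHERE city = 'Boston'").fetchall()
print(before, after)  # [('Acme',)] [('Acme',)]
```

The same SELECT returns the same answer before and after the physical restructuring, which is the isolation these guidelines describe.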

Data Infrastructure

The data model is the core of the data warehouse. The decisions made when defining the data model determine the data infrastructure. This data infrastructure can impact performance, time to market for new applications, responses to changes in the marketplace, business users' ability to retrieve information, data latency, and the optimization of the data warehouse over the long term. The focal point of the data warehouse data modeling discussion typically covers normalization theory and dimensional theory.

As data warehouses evolve and take on greater responsibility and active roles in how an enterprise conducts business, data model requirements and expectations are changing at a faster rate to support this increased responsibility. The data warehouse is expected to reflect changes within the enterprise with increasing speed. It is also driving changes in how the enterprise conducts business by providing insight into current business practices. These increasing requirements of the data warehouse impact data modeling choices.

The goal of the EDW is to address the needs of the enterprise. This requires a data model that provides an enterprise solution rather than a localized solution that only addresses specific application needs for individuals within the organization. The charge of the EDW is to enable the business at an enterprise level. To do that, hard decisions concerning data modeling must be addressed. Questions such as: Should the data model reflect the needs of the enterprise or the needs of the individual? What is the impact on the data model as the needs of the individual and/or the needs of the enterprise change? Do you have to sacrifice one over the other? What if you could minimize those sacrifices? Experience has shown us that with Teradata Database, it is possible to take full advantage of Codd's guidelines of physical data independence, logical data independence, and view updating to create a data model that provides the flexibility to satisfy both current and future needs of the enterprise, and then, through the use of views, to create a view model that addresses the needs of the individual. This provides a methodology to use the normalized model and augment it with views that use dimensional modeling techniques, taking the best of both worlds.

With more than 35 years of experience in database technology and data warehouse design, Bill Inmon is recognized as an authority in the data warehousing industry. His lectures and papers promote a normalized model as the model of choice for the data warehouse, reserving the use of star schema models for data marts if and when necessary. He states, "It is not possible to build an effective DSS environment without an enterprise data warehouse. The design of the enterprise data warehouse is typically normalized. The classical entity relationship structuring of data accompanied by the ensuing normalization of the data structures fits very conveniently with the requirements of the usage of the enterprise data warehouse."2

Teradata Database, as an RDBMS, is and always has been agnostic about which well-defined data model is chosen. There are many customers who execute a normalized model, others using a snowflake model, others using a star schema, and many others using some variation or derivation of each/any and all. With that said, it is important to note that Teradata as a company, with its vast experience in data warehousing, does have preferences to ensure the data model employed provides the highest flexibility and responsiveness to the business for not only current needs but also future needs.
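The "normalized base augmented with dimensional views" idea can be sketched with a toy example. All table, column, and view names here are invented, and SQLite stands in for the warehouse: the physical table stays normalized (one row per sale), while a view presents a dimensional, report-friendly shape.

```python
# Sketch: a view "form fits" normalized rows into a monthly, spreadsheet-like layout.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (item_id INTEGER, sale_date TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (100012, "2001-01-10", 10.00),
    (100012, "2001-03-04", 15.00),
    (200012, "2001-01-15", 2.50),
])
con.execute("""CREATE VIEW v_monthly_sales AS
    SELECT item_id,
           SUM(CASE WHEN strftime('%m', sale_date) = '01' THEN amount ELSE 0 END) AS jan_sales,
           SUM(CASE WHEN strftime('%m', sale_date) = '03' THEN amount ELSE 0 END) AS mar_sales
    FROM sales
    GROUP BY item_id""")
rows = con.execute("SELECT * FROM v_monthly_sales ORDER BY item_id").fetchall()
print(rows)
```

The base table can still answer new questions (sales per weekday, fiscal periods, and so on), while the view gives individual users the pivoted presentation they asked for.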

Data Modeling Theory

Entity-Relationship Modeling

Before designing and establishing the physical data model, there is a logical data modeling process. This process includes extended discussions with the business community. During these discussions, the business requirements are identified, the data entities and elements required to meet those business requirements are established, and the relationships between the data entities are captured. These insights are later diagrammed into a representation of the business, referred to as an Entity-Relationship (ER) model. An ER model or diagram represents the data entities of the business and the relationships between those entities.

Figure 1. Example ER Diagram: the Customer entity (with attributes CustomerName and CustomerAddr) Places (relationship) the TravelRequest entity (e.g., Boston to New York). Business rules: a Customer places zero or more Travel-Requests; a Travel-Request is placed by only one Customer.
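The Figure 1 entities and business rule can be illustrated with plain Python classes. This is a loose sketch, not formal ER notation; the class and attribute names simply follow the figure.

```python
# A Customer places zero or more TravelRequests; each TravelRequest
# belongs to exactly one Customer (the Figure 1 business rule).
from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str
    addr: str
    requests: list = field(default_factory=list)  # zero or more TravelRequests

    def place(self, itinerary: str) -> "TravelRequest":
        req = TravelRequest(itinerary=itinerary, customer=self)  # exactly one Customer
        self.requests.append(req)
        return req

@dataclass
class TravelRequest:
    itinerary: str
    customer: Customer

c = Customer(name="A. Smith", addr="12 Main St")
r = c.place("Boston to New York")
print(len(c.requests), r.customer.name)  # 1 A. Smith
```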

At this stage, the specific functions or queries the model will be used to support are not included. This is a logical representation of the enterprise and is later used to develop a physical data model or schema. There are a number of tools available to map the ER diagram, ERwin, now known as AllFusion ERwin Data Modeler, being the most recognized. ER modeling is guided by the rules of normalization. These are strict rules meant to ensure the essence of business relationships is captured. Examples of the components (entities and relationships) of an ER diagram are shown in Figure 1.

Dimensional Modeling

Dimensional modeling is another logical design method used to organize data for functional groups of users or business functions. It identifies the dimensions and levels within the business, separating out the facts or measures of the business. The dimensional model enforces additional rules that eliminate many-to-many relationships between entities, allowing only many-to-one relationships. It fuses multiple entities together into a new entity. This new entity does not directly reflect the entities and relationships that occur in the business, but is established for the convenience of storing a data point or metric that is important to the targeted group of business users. The goal of the dimensional model is to provide a presentation layer that facilitates easy navigation of the data for business users and quick access to reports.

The dimensional model is often thought of or represented as a cube with dimensions such as time, product, and geography. The business metric is at the intersection of these dimensions. Visualizing the dimensional model as a cube, sometimes referred to as a star schema (see Figure 2), makes it easy to imagine being able to slice and dice that segment of data. Creating these cubes for reporting requires understanding of what questions will be asked before designing. Each cube is designed to facilitate quick and easy access to a specific business application.

Figure 2. A Cube (Star Schema): dimension hierarchies for Advertising (Ad Year, Ad Period, Ad Week), Fiscal (Year, Quarter, Period, Week), Calendar (Year, Quarter, Month, Week, Day), Product (Dept, Minor Dept, Category, Sub Category, SKU, UPC), and Geography (Region, District, Store) intersect at the Sales metric.
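Slicing such a cube amounts to joining a fact table to its dimension tables and filtering on a dimension value. The sketch below is hypothetical (invented tables loosely echoing Figure 2, with SQLite standing in for the warehouse):

```python
# A toy star-schema query: a Sales fact joined to Product and Geography
# dimensions, then "sliced" to one region and grouped by category.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_dim (sku INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE geography_dim (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact (sku INTEGER, store_id INTEGER, day TEXT, amount REAL);
INSERT INTO product_dim VALUES (1, 'Snacks'), (2, 'Dairy');
INSERT INTO geography_dim VALUES (10, 'East'), (20, 'West');
INSERT INTO sales_fact VALUES
  (1, 10, '2001-01-10', 5.0), (2, 10, '2001-01-10', 3.0), (1, 20, '2001-01-11', 4.0);
""")
rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON f.sku = p.sku
    JOIN geography_dim g ON f.store_id = g.store_id
    WHERE g.region = 'East'
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(rows)  # [('Dairy', 3.0), ('Snacks', 5.0)]
```

Note that the question (sales by category within a region) had to be anticipated when the fact and dimension tables were designed, which is the point the text makes next.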

The dimensional model flattens the dimensions via denormalization. Business rules are used to validate and ensure additional restrictions (as in Figure 2). For example, many companies have a fiscal year that differs from the calendar year; they may even have another time dimension, such as advertising, which differs from both calendar and fiscal. If/when the advertising week does not fall within the fiscal week, business rules will be used to determine and assign the ad week to a fiscal week, providing a methodology for drilling through to lower levels. In effect, the functional rules and processes are accommodated within the model. This, in turn, means that much more needs to be understood about the queries and outputs that the model is expected to support. A dimensional model can be created from an ER model; however, an ER model could not be created from a dimensional model. Once entities are fused together, separating those fused entities is difficult, if not impossible.

With a dimensional model, the central fact table is forced to a single grain, causing the initial fact table design to become brittle or inflexible. You can't incorporate new data sources without breaking the original star schema design or creating separate fact tables or data marts. To get an enterprise view, you would then have to drill across these different stars or data marts.


From Logical to Physical Modeling

Why would you create a physical model that is different from the logical model? The physical model often differs from the logical model to yield the best overall performance for specific database technologies, and not all database technologies are equal. Technologies differ so much that, in an article written for Intelligent Enterprise magazine, Neil Raden states, "As you know, the primary schema for a data warehouse is either a star schema or a normalized schema. The latter is a term so loosely defined that it's hard to describe, but a normalized schema typically resembles a third (or higher) normal form (3NF) schema that's not dimensional. These 3NF designs don't support query and analysis. Their sole purpose is to act as a staging area, an upstream data repository for a series of star schemas, online analytic processing (OLAP) cubes, and other structures that are directly queried by analysts. The only routine exception to this is Teradata implementations: Because of the unique characteristics of the massively parallel architecture and database optimizer, Teradata can process analytical SQL against a 3NF schema with acceptable performance."

Figure 3. The Modeling Spectrum: from Normalized through Snowflake and Star to Flattened, with less data redundancy toward the normalized end and simpler data access toward the denormalized end.

This implies the physical model is often changed from the logical model not necessarily due to business requirements but to facilitate the technology being used and ensure query speed. This transition from the logical model to the physical model is where the greatest variation, deviation, and difference of opinion takes shape, which, of course, forms the basis for the confusion surrounding data modeling. On one end of the spectrum is the flattened denormalized model, while on the other end is the normalized model. Of course, the reality is that most customers develop a data model that falls somewhere between the opposing ends, or some combination of both. This is often the result of the data warehouse evolving to meet business demands.

Physical Models

In his book Data Modeler's Workbench, Steve Hoberman discusses what he refers to as the normalization hike and the denormalization survival guide. Hoberman's principles state that the logical model is completely normalized, up to fifth normal form. His analogy states that once you've accomplished this, you can see the business functions as if looking down from a skyscraper. The denormalization survival guide then provides guidelines on how to transform the normalized logical model into a physical model. This survival guide provides some of the principles for denormalization, the basics of which are to minimize denormalization so as not to compromise the business's current and future needs.

Normalized Physical Model

As you move along the modeling spectrum (see Figure 3), the further to the left your data model is, the greater the degree of normalization. A normalized model separates data into tables based on very strict rules that relate an attribute to the primary key. One of the fundamental rules of normalization is the elimination of data redundancy within the model, keeping each attribute within each table functionally dependent upon the primary key. By following this rule, eliminating data redundancy within the model, the number of tables, along with the complexity of the data model, increases. As new data subjects for the data warehouse are identified, the model is extended following the same normalization rules to include the new data subjects.

Customer experience has shown that the normalized model can answer new business questions or previously unknown questions without making changes to the structure of the database, because the relationships and entities have been represented physically and provide the greatest flexibility for the business. It eliminates data duplication and, therefore, the complexity of maintaining duplicated data. Even though there are typically more tables in a normalized model, the greater the degree of normalization, the greater the degree of storage space conservation, because data is not duplicated. However, denormalized structures, such as summary tables, will and often do exist in a data warehouse that has been developed under the constructs of normalization. When the business value warrants the cost to build and manage summary table structures, they are a viable augmentation of the normalized data model.

[While data warehouse modelers are often embroiled in controversy over how to model the data warehouse, the world of transactional databases is in agreement that a normalized model is the optimal choice. As data warehouses evolve towards transactional decision making using very current data, the need for a more normalized physical design becomes evident.] A normalized data model provides a methodology for capturing and storing the lowest level of data, eliminates multiple updates to various tables with the same transaction, and is the model of choice for the OLTP types of transactions that are targeted and specific.

Denormalized Model

Moving right along the modeling spectrum (see Figure 3), the degree of denormalization increases. A flattened model essentially resembles an Excel spreadsheet. Denormalized data models flourished out of a need to provide business users with quick and easy access to data without the requirement of understanding the underlying data model. Business users, whose job does not include knowing SQL or the logic of data models and, therefore, how to join tables together, needed easy access to information. However, those same business users did understand spreadsheet formats and, in fact, often requested information in a report format. Therefore, when data was presented in a two-dimensional form (similar to a spreadsheet), as opposed to a series of connected tables, business users were protected from data model complexity. Because the data model was generated to address specific business constituencies, new user constituencies (often called power users) were created within the organization to code and generate reports for business users. These power users became so busy that requests were often queued for weeks at a time. Business users wanted direct access to information, and they wanted it instantaneously. Denormalized models provided for simple, easy navigation of the data. The data are typically aggregated along one of the lines of dimension. The types of queries requested are well understood, and the model is established to answer those questions. These pre-defined models ensure the known query will return an answer set quickly. When new queries are identified, the model is extended to include a duplication of existing data in a form that allows the business user to navigate and quickly retrieve the answer set for the new set of queries.

For example, when a business user wants to know the monthly sales of products, the report produced would have headings such as Item, January Sales, February Sales, March Sales, etc. The normalized Sales table would consist of columns of Item, Date, and Sales. This table would provide the greatest flexibility, allowing for the next questions. These questions might include what are sales per week for specific items, what were fiscal month sales, what items sell on Mondays, or what are weekend sales. However, because it is often difficult for users or even database software to make the conversion from the table structure to the report layout, in the dimensional model, DBAs simply store the table in the denormalized report format (see Figure 4).

Figure 4. Example table in denormalized format. The normalized Sales table holds one row per transaction with columns Item, Date, and Sales (e.g., item 100012, 01102001, 10.00), while the denormalized version holds one row per item with a column per month: Item, Jan Sales, Feb Sales, Mar Sales, Apr Sales (e.g., 100012: 345.00, 450.00, 326.50, 245.90).

The star schema model is typically only considered for use in data marts or the data warehouse and is built to address a specific business need, report, or application. Each star schema is built as an individual data mart. When the business needs a new report, another star schema, or data mart, is built. If the needed dimensions are the same as what has already been built, they are duplicated to this new data mart. Each of these data marts caters to the needs of the individual. There may be a temptation to think of a group of these data marts as constituting a data warehouse. However, they are distinct structures that have duplicated data in order to store it either in a different format or on a different platform, either a different database or just different hardware. The star schema model doesn't have the goal of storing data once for multiple uses. Its goal is to facilitate end-user usage. When the business community has a new request, a new star schema is typically built to facilitate the request.

In an effort to maintain conformity, the star schema is built with constrained dimensions. Ralph Kimball explains that data are placed in a staging area for transformation. These data, using what he refers to as the data warehouse bus architecture, are then propagated out to all the different data marts that require it, forcing all dimensions to conform to what is established in the staging area. This bus architecture is the roadwork that connects all the different data marts and servers being used. Kimball states, "It is acceptable to create a normalized database to support the staging processes; however, this is not the end goal. The normalized structures must be off-limits to user queries because they defeat understandability and performance. As soon as a database supports query and presentation services, it must be considered part of the data warehouse presentation area. By default, normalized databases are excluded from the presentation area, which should be strictly dimensionally structured."

Contrasting Ralph Kimball's belief that data warehouses are best suited for a star schema, Bill Inmon firmly believes the EDW has at its heart a normalized model, reserving the use of star schema models for data marts if and when necessary. Inmon states, "In short, simply doing dimensional modeling as a basis for data warehouse design leads down a dark path when multiple star joins are considered. It is never apparent that there is a problem with star joins when you are looking at just one star join. But when you look at multiple star joins, the limitations of dimensional modeling become apparent. Does this mean that dimensional modeling is invalid as a database design technique for data warehousing? The answer is not at all. Dimensional modeling and star joins fit the bill very nicely for data marts. In fact, if I had to design a data mart tomorrow, I would not consider using any other approach. But, when it comes to the foundation data, it's another story. The foundation data of the data warehouse requires a different treatment than dimensional modeling altogether. The data warehouse, which is the proper foundation for all DSS activity, including star joins, requires very granular, very flexible data. The ideal structure for the data warehouse is normalized data. The normalized data can be bent and shaped any old way."

"The fact table is connected to the dimension tables by means of foreign key relationships. The keys to the dimension tables sit in the fact table. There may be many relationships that find their way into the fact table. In addition, the fact table carries with it other non-key data if needed. In other words, the dimension table may share with the fact table data other than just the key. When the fact table and the dimension tables are connected, they form a star, hence the name star join. The net effect of the fact table is a structure which is highly denormalized and which is very efficient to access. Once the fact table is created, it can be efficiently analysed. But there is a tradeoff. The extreme efficiency of the star join also makes it inflexible. If there is a desire to see the data in the star join in a manner other than that for which the structure is built, then the star join is very inflexible."8

Impact of Information Delivery Tools

Denormalized models, specifically star schemas, gained general acceptance on the basis of the ease provided for business users to directly access data. After gathering the detailed business requirements, the developers often determined that to mitigate risk and provide the business users what they requested, a denormalized model would be the easiest to implement. Since the data warehouse was updated or refreshed on a weekly or even monthly basis, the update schedule mitigated issues surrounding the complexity of updates needed to maintain simple data models. Business users were expected to know, understand, and code simple SQL for retrieving information, allowing IT to turn the data warehouse over to the business users and only maintain responsibility for keeping it updated. When the business began coming up with additional questions that the data model couldn't address, and joins between star schemas became complex, information delivery tools were developed to help business users retrieve the information they needed through an easy interface.

While many of these information delivery tools have grown up expecting this star data model, many are growing in sophistication and technologically developing to be able to recognize and use more normalized data models, making it easier for end users to navigate a more complex schema.

Impact of Extraction, Transformation, and Loading (ETL) Tools

While information delivery tools were developed to assist the business user in retrieving information from the data warehouse, ETL tools were developed to assist in extracting data from transactional systems. Since the ETL tools are meant to extract data from the transactional systems, and those are in normalized format, ETL tools work better with the normalized data model.

If the data warehouse is built with a denormalized data model, then most ETL tools require additional steps to land the data in a separate area for ETL. While the ETL tools work well with a normalized model, moving those data into a denormalized model requires manipulation or transformation. The suggested way of handling this manipulation or transformation is the creation of a data staging area. This staging area is used to land the extracted data, then manipulate or transform them to the format needed before loading. This staging area also provides the platform from where data, in particular dimensions, can be duplicated out to the different star schema data marts.

When loading to a normalized model, the need to transform the data for modeling purposes is significantly less, and can actually happen during the load process, eliminating the need for a staging area. In a normalized model, those using a staging area are typically using it to correct data quality issues that exist in the transactional systems. For example, ensuring character fields follow the same set of rules: if an address contains the word street, then all formats of street are transformed to be the same; for example, ST becomes Street and St. becomes Street.
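The street-name standardization just described can be sketched as a small transform applied during the load. The rule table and function below are invented for illustration; a real ETL tool would carry many such rules.

```python
# Hypothetical data-quality rule: normalize trailing street-type
# abbreviations so character fields follow one set of rules.
ABBREVS = {"ST": "Street", "ST.": "Street", "AVE": "Avenue", "AVE.": "Avenue"}

def standardize_address(addr: str) -> str:
    words = addr.split()
    # Replace a trailing abbreviation with its full form, case-insensitively.
    if words and words[-1].upper() in ABBREVS:
        words[-1] = ABBREVS[words[-1].upper()]
    return " ".join(words)

print(standardize_address("12 Main ST"))     # 12 Main Street
print(standardize_address("12 Main St."))    # 12 Main Street
print(standardize_address("12 Main Street")) # 12 Main Street
```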

Surrogate Keys

In general, a logical key comprises the data fields whose value ensures uniqueness for each row, and whose value will not change during the life of the row. Implemented physically, it is a natural key. Often, the terms key and index are used interchangeably, but they are not the same. An index is a mechanism used to optimize performance. The physical implementation differs depending on the database management system employed.

A surrogate key is an artificial key, or generated key, that is used as a primary key in substitution for natural keys. Using a surrogate key is a choice rather than a requirement. Typically, they are numerical and randomly generated to uniquely identify a row. The use of a surrogate key is often promoted on the basis of adding flexibility and data independence to the data model. In business, change is an ongoing fact of being in business. Businesses are involved in mergers and acquisitions, new products and channels are added, they are reorganized, and they enter into new markets. There are a couple of arguments for surrogate keys. One is that as business changes, so do the values, and potentially the format, of natural keys. When the natural key is used as the primary key, changing either the value or the format of that key is an intrusive effort, which could result in loss of historical information and data relationships. When a surrogate key is used as the primary key, the impact of changing the value of the natural key is the same as changing the value of any column within the table: minimal. Another argument is that when the natural key is large or consists of multiple columns, a surrogate key would be smaller and require less space as it is populated throughout the data model. To facilitate access, a process is developed that will lead to the surrogate key for joins and retrieval of information.

In a data warehouse, which brings together data, often disparate data, from the transactional environment, surrogate keys may provide a viable methodology to do so. The data warehouse is subject oriented, which contrasts with the application or function orientation of transactional environments. Integrating and reorganizing multiple systems that represent the same subject, such as customer, can require the integration of data that does not occur naturally in the operations environment. A surrogate key will uniquely identify each instance of that integrated subject, each customer, without worry of duplication in the future as the business continues to change. This provides a stable data model that will withstand the test of time. However, beware not to use surrogates as replacements for good data modeling practices.

While surrogate keys are frequently used with star schemas, their use is not reserved for specific data models. The decision to use or not use a surrogate key should be part of the data model design considerations. That decision should be predicated on what the data modeler believes to be best for the organization. Will there be requirements for the natural key to change? Is there a need to ensure that every row is unique? Will there be a resulting impact from adding secondary indexes?

Before compromising to use surrogate keys, you must consider some things. To generate the surrogate key, a process must be developed. Because users will not know the generated surrogate key, a process to link the values the user does know with the surrogate value must be established to retrieve information. Typically, this requires additional secondary indexes. Using surrogate keys tends to add both a column and a unique index to each table, sometimes multiple secondary indexes, and in some cases, entirely new tables. Surrogate key values also must be propagated to all the dependent tables as foreign keys.

For example, if the natural key for an order table is order number, and a business user wanted to know how many units of product X were ordered by company ABC, chances are he wouldn't know the order number. But would the business user know the product number and/or the company

name? When a surrogate key is used as the

be put in place to ensure uniqueness and

primary key, order number, product, and

consistency. Updating and inserting new

ordering company would probably each be

rows require first the creation of the

made secondary indexes. When the natural

surrogate key and also a lookup process

key of order number is used as the primary

to ensure the newly generated key hasnt

key, product and ordering company would

been used. Since the user doesnt know the

probably each be made secondary indexes.
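The trade-off described above, a compact surrogate key plus the extra secondary index needed to reach it from the natural key, can be sketched with any SQL engine. The sketch below uses SQLite via Python; all table and column names are illustrative, and SQLite's autoincrementing key stands in for a generated surrogate such as Teradata's Identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        cust_name   TEXT NOT NULL,
        cust_tax_id TEXT NOT NULL                       -- natural key
    );
    -- The extra secondary index the text warns about: users query by the
    -- natural key they know, so it must still be indexed to reach the
    -- surrogate key efficiently.
    CREATE UNIQUE INDEX cust_tax_id_idx ON customer (cust_tax_id);

    -- The surrogate key must be propagated to dependent tables.
    CREATE TABLE orders (
        order_num   INTEGER PRIMARY KEY,
        customer_sk INTEGER NOT NULL REFERENCES customer (customer_sk)
    );
""")

conn.execute("INSERT INTO customer (cust_name, cust_tax_id) VALUES (?, ?)",
             ("Company ABC", "98-7654321"))

# Retrieval starts from the natural key the user knows...
sk = conn.execute("SELECT customer_sk FROM customer WHERE cust_tax_id = ?",
                  ("98-7654321",)).fetchone()[0]
conn.execute("INSERT INTO orders VALUES (1, ?)", (sk,))

# ...even though the join itself runs on the compact surrogate key.
row = conn.execute("""
    SELECT c.cust_name, o.order_num
    FROM orders o JOIN customer c ON c.customer_sk = o.customer_sk
""").fetchone()
print(row)  # ('Company ABC', 1)
```

Note that inserting an order required a lookup through the natural-key index first, which is exactly the extra work the paragraph above attributes to surrogate keys.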


SCD Type One
SKU   Catg
1001  01
1002  02
1003  03

SCD Type Two
SKU   Catg  Updated
1001  03    20021029
1001  01    20010101
1002  02    20010101
1003  01    20020915
1003  03    20010101

SCD Type Three
SKU   Catg  Prev Catg
1001  03    01
1001  01    15
1001  15    20
1002  02    02
1003  01    03
1003  03    12
1003  12    05
1003  05    03

Figure 5. Methods to address SCD

Outside of the maintenance issues of the surrogate key, the data modeler must determine when it is wise to use either a surrogate key or a natural key. For example, while order number may make sense to use as the primary key of an order table, in the case of a customer table it may make sense to use a surrogate key for customer id. The decision to use a surrogate key should be evaluated on a case-by-case basis and not entered into as a standard.

Changing Dimensions
A changing dimension is often referred to as a Slowly Changing Dimension (SCD). The impact of time on the data warehouse can be significant. Time has a tendency to generate many changes. Customers move and their addresses change; people change their names. Area demographics change, and populations shift. Products move from one category to another. These are changing dimensions. Time impacts all data models, and dimensions will change over time. However, a changing dimension is only a problem when the model is built on dimensions. Typically, there are three methods used to address SCDs (see Figure 5).

> Type one updates the dimension value in place as it changes. This provides a dimension that is always current, eliminating the capture of history because the association of the data with the old dimension value has been overlaid.

> Type two inserts a new row into the table as the dimension changes. This method provides both current and history information for the changing dimension, and usually involves carrying an effective date to indicate which row is current.

> Type three updates the dimension in place after moving the changed dimension value to an "old" column. This method provides for the current and most recent changes.

A normalized model is typically developed as type two, developing a history as changed rows are inserted into the table. If the business application requires only the current value, a view could be used to present current values only to the application. Those applications that need history, or the ability to identify values during a specific time frame, would then have access to all the history data.

While the normalized model provides for the type two SCD, the construct of the dimensional model increases the challenge of providing for changing dimensions. Included in this challenge is the need to determine how to reflect the changing dimension. Which type satisfies the business need for the dimension? Does it satisfy all the business needs? Which method satisfies the needs of all the data marts in the dimensional model?
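A minimal sketch of the type two method, together with the current-values view described above, using SQLite via Python. The table mirrors the SKU/category example of Figure 5, but the exact names and dates here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Type two keeps history: every change is a new row carrying an effective
# date. A view serves applications that need only the current value.
conn.executescript("""
    CREATE TABLE product_catg (
        sku     INTEGER NOT NULL,
        catg    TEXT    NOT NULL,
        updated TEXT    NOT NULL   -- effective date of this version
    );
    CREATE VIEW product_catg_current AS
        SELECT sku, catg FROM product_catg AS p
        WHERE updated = (SELECT MAX(updated)
                         FROM product_catg WHERE sku = p.sku);
""")

history = [(1001, "01", "2001-01-01"),   # original category
           (1001, "03", "2002-10-29"),   # SKU 1001 moved to category 03
           (1002, "02", "2001-01-01")]
conn.executemany("INSERT INTO product_catg VALUES (?, ?, ?)", history)

# The view hides history; the base table retains all of it.
print(conn.execute(
    "SELECT catg FROM product_catg_current WHERE sku = 1001").fetchone())
# ('03',)

# By contrast, type one would UPDATE in place (history lost), and type
# three would copy the old value to a prev_catg column before updating
# (keeping only the most recent change).
```

The same base table satisfies both the current-only application (through the view) and any application that needs the full history.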


The Teradata Direction
Identifying the most appropriate choice of data model for a Teradata solution is a two-step process. First, you must distinguish between what Teradata Database lends itself to and what Teradata professionals advocate. Teradata Database is impartial to the data model. The database has the technical ability and the power to perform with any well-designed data model. Teradata's ability to parallelize every aspect of query processing eliminates the technical issues and the complexity of decision support processing against large volumes of data, including the multi-table joins and joins within joins that are often the catalyst for denormalized data models on some technologies. Teradata engineering is always reviewing and identifying areas within the optimizer to improve performance, whether the data model is normalized or a denormalized star schema. Join efficiency improvements, such as the large table/small table joins of the early 1990s, were targeted at star schemas. Advanced indexing features, the join index introduced in Teradata Database V2R3 and the aggregate join index introduced in Teradata Database V2R4, provide a methodology for users of Teradata Database to keep a normalized model and use these features to create star schemas if and where processing merits it. Teradata Database V2R5 made generating surrogate keys easier with the introduction of the Identity column. Optimizer learning and intelligence occur without regard to the data model.

Years of field experience have shown Teradata professionals that a normalized model of atomic data provides the greatest flexibility and, therefore, often the greatest long-term business benefit. A well-integrated data warehouse is a valuable mechanism in business operations. As businesses use the data warehouse to uncover patterns in their manufacturing processes, their ordering processes, or their customers' purchasing habits, business processes change. The business expects the data warehouse to enable and reflect those business changes. If business changes require an additional column (dimension) on one of the fact tables, or a product in one category is transferred to another category, the business expects the data warehouse to reflect these changes on a business timeline. If that timeline is negatively impacted by the time required to address data model changes that enable and reflect the business changes, the data warehouse becomes an obstacle to the business's success. Customer and field experience repeatedly shows that a normalized model provides the most flexibility and ease in delivering new products to market and in facilitating rapid business changes. So Teradata professionals often suggest starting with a normalized model because of its flexibility and the ease of adding new data elements. However, this does not mean that the data warehouse follows all the rules of third normal form in the strictest sense. Based on business requirements, summary tables are often found useful to augment the normalized model. The goal is to provide a model that supports the business through all of the changes that occur over time.

One of the key factors that sets the Teradata Database apart from all others is the ability to use a normalized data model that facilitates ease of management and ease of expansion, simplifies load strategies, and allows for full integration of enterprise data. This facilitates building for the enterprise. Addressing the needs of the individual becomes as easy as applying a view to the underlying data structure. The view then allows IT and the data warehouse to address the needs of the individual.

Views are a transparent layer on top of the underlying data structure and provide a methodology for creating a presentation layer that eases business user access. In Teradata Database, views don't require space and will transition the underlying data model into nearly any presentation with such efficiency that the cost of using views is reduced to simply managing them.

Impact of the Data Warehousing Evolution
As data warehousing evolves to an active environment, where frequent and/or continual data updates are required, where processing becomes event driven rather than report driven, and where the need to address changes in the marketplace is expected with increasing speed, the data model becomes even more critical. The mixed workload of shorter tactical queries and longer strategic queries against the same data creates additional denormalization challenges.
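The pattern this section advocates, a normalized base with views as the presentation layer, can be sketched in standard SQL. The sketch below uses SQLite via Python as a stand-in for any relational engine; all table, column, and view names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized base tables, modeled once for the enterprise.
    CREATE TABLE product (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        prod_id INTEGER REFERENCES product (prod_id),
        qty     INTEGER
    );
    -- The view is the presentation layer: it stores no data, yet shows a
    -- business user the denormalized, report-friendly shape they asked for.
    CREATE VIEW sales_by_product AS
        SELECT p.prod_name, SUM(s.qty) AS units
        FROM sale s JOIN product p ON p.prod_id = s.prod_id
        GROUP BY p.prod_name;
""")

conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?)",
                 [(10, 1, 5), (11, 1, 7), (12, 2, 3)])

report = conn.execute(
    "SELECT * FROM sales_by_product ORDER BY prod_name").fetchall()
print(report)  # [('Gadget', 3), ('Widget', 12)]
```

A new business question means defining a new view over the same normalized base, not restructuring or reloading the underlying tables.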



Evolving to an active data warehouse eliminates many of the known variables, such as what queries will be submitted and who will submit those queries, which were used in defining the data models of early data warehouses. To accommodate this evolution, some of Teradata's competitors' technologies are beginning to add functions and features that enable a more normalized model. Tools such as information delivery tools are also preparing for this evolution. As the data warehouse evolves to an active environment, the requirement for fresh data to be available and accessible erodes the time available to maintain denormalized models.

The data model chosen for the data warehouse should be chosen with a clear understanding of the benefits of each. The early days of data warehousing tended to depend on a denormalized model. Much of that decision was based on addressing specific business needs rather than the breadth of the business, or on existing technology that limited choices. During those early days, when Teradata professionals recommended a normalized model, it was considered unusual. As data warehousing evolves to being active, and as the need and requirement for businesses to change increase, the idea of a normalized data model for the data warehouse no longer seems so radical. In fact, many of the leading analysts and data warehouse professionals understand the value of a normalized model and are becoming more critical of denormalized models.

Summary
The goal of a data warehouse is to provide the business with a tool that facilitates and enables better business decisions and timely action on those decisions. The robustness of the data infrastructure determines the long-term success of the data warehouse and the ability of the business to harness the information for its own success. The robustness of the Teradata Database provides the ability to combine the best of all worlds. The DBA can generate a DBA-friendly normalized data model that facilitates the full integration of data representing the enterprise, following the teachings of Codd, Date, and Inmon. The use of views, a presentation layer that employs the user friendliness of dimensional modeling, enables catering to the needs of the individual, interfacing with the data warehouse for business user ease of access, capitalizing on Codd's guidelines for relational databases, and following the teachings of Kimball. This allows data to be centrally stored; eliminates redundant data, easing data management; and provides a methodology to support future business needs by capitalizing on existing data structures and shortening time to market.

For example, one Teradata customer had their business users request a new application from IT. The applications group said that it would take 6 months just to find the data, and once it was found, they would still have to determine the time required to generate the application. The business users left the meeting believing they would be unable to resolve their issues in a timely manner, time that had a direct impact on the success of the business. After determining the data was available in the data warehouse, in a normalized format, the data warehouse group was able to generate a presentation layer that resolved the business needs within a couple of hours. The ability to do this was valuable to the business bottom line.

The data model for the data warehouse should reflect the requirements of the business, and it should enable the business. Taking a hard line with any specific data model type won't satisfy the business needs. The impact of time generates change within the business, and the data warehouse requires the flexibility to address and yield to those changes. A normalized data model is the correct place to begin.

Debbie Smith is a Data Warehouse Consultant in Global Sales Support. She has 14 years of experience with Teradata systems at a retail customer. Her responsibilities while with the retail customer included application development/support, database administration, business power user, and responsibility for developing, implementing, and managing an information delivery strategy for business end users. Since joining Teradata, Debbie has worked extensively with prospects as well as new and mature data warehouse customers to implement effective tactical and strategic data warehousing initiatives.



Appendix A Codd's 12 Rules

Rule 1: The Information Rule
All data should be presented to the user in table form.

Rule 2: Guaranteed Access Rule
All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.

Rule 3: Systematic Treatment of Null Values
A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. In addition, most database implementations support the concept of a non-null field constraint that prevents null values in a specific table column.

Rule 4: Dynamic On-Line Catalog Based on the Relational Model
A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.

Rule 5: Comprehensive Data Sublanguage Rule
The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of standard SQL (Structured Query Language) as their supported comprehensive language.

Rule 6: View Updating Rule
Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct access to a table has available. In practice, providing update and delete access through logical views is difficult and is not fully supported by any current database.

Rule 7: High-level Insert, Update, and Delete
Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture (hardware, disk storage methods) without affecting how the user accesses it.

Rule 9: Logical Data Independence
How a user views data should not change when the logical structure (table structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.

Rule 10: Integrity Independence
The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL: no component of a primary key can have a null value (see Rule 3), and if a foreign key is defined in one table, any value in it must exist as a primary key in another table.

Rule 11: Distribution Independence
A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). A variety of reasons make this rule difficult to implement.

Rule 12: Nonsubversion Rule
There should be no way to modify the database structure other than through the multiple-row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the data structure.

(From: ITWorld.com, Codd's 12 Rules Data Management Strategies, 05-07-2001.)


Appendix B A Comparison

Active Data Warehousing
Normalized Model: Supports active data warehousing initiatives.
Denormalized (Star) Model: Challenges the ability to be active.

Any Question
Normalized Model: The flexibility of the normalized model provides for any question to be answered.
Denormalized (Star) Model: The simplicity of the denormalized model limits what questions can be answered without changes.

Complex Analysis
Normalized Model: A normalized model supports complex analysis.
Denormalized (Star) Model: The enforced simplicity of a denormalized model is unable to support complex analysis.

Data Duplication
Normalized Model: The rules of normalization require a single version of data.
Denormalized (Star) Model: Data is often duplicated multiple times to satisfy different groups of questions.

Data Granularity
Normalized Model: The normalized model easily stores the lowest level of data.
Denormalized (Star) Model: To maintain the ease of navigation, the denormalized model is typically aggregated along one of the dimension lines.

Data Navigation
Normalized Model: The normalized model is more difficult to navigate. Business users have a difficult time at best in understanding the model, the necessary joins, and sometimes even the relationships between data elements.
Denormalized (Star) Model: The denormalized model is specifically designed for easy navigation by the business user.

Flexibility
Normalized Model: The greater the degree of normalization in the data model, the greater the flexibility.
Denormalized (Star) Model: The denormalized model is meant to be simplistic and is designed for specific requirements. As requirements change, changes to the model are needed.

Maintenance
Normalized Model: Updates to the normalized model are easier because a piece of information is in one place and requires one update. The normalized model is the standard for the transactional environment. Continuing that model throughout the data warehouse provides a closer link to, and reflection of, transactional processes.
Denormalized (Star) Model: Updating the denormalized model requires preprocessing, aggregation, and longer time frames.


Appendix C Glossary

Attribute
Data that is attributed to or describes an entity.

Cube
A way to visualize how a dimensional model is organized. It is also a type of dimensional model design where the measures and dimensions are stored in a format that is expected by an OLAP tool (many times a proprietary format). One cube may represent one fact table with its contributing dimensions.

Data Mart (physical)
A segment of data that is aggregated or preprocessed to enable specific end-user queries and requirements.

Data Mart (logical)
The data is stored in one place, but views are used to present to the end user a portion or derivation of the underlying data for their purposes. Data is not replicated.

Data Warehouse
A reflection of integrated data that is organized by subject areas and is stored to address cross-functional and multiple end-user requirements and queries.

Dimension
The corporate line along which facts are measured (time, product, geography).

Dimensional Model
A design method that models the data based on a predefined set of business questions to facilitate ease of access and response time speed.

Drill (Across)
Movement of an end-user request across different dimensions.

Drill (Up or Down)
Movement of an end-user request through data along a dimension.

Entity
An identifiable object that is distinguishable from all other objects.

Entity Relationship Model
Shows the enterprise entities (or objects) and the relationships between the entities in that enterprise, without duplication.

ETL
Terminology given to a group of tools that extract data, transform it, and then load it. Sometimes the sequence of processing changes to reflect ELT, which is extract, load, and then transform.

Fact
A measurement or value (quantity).

Flattened Model
Made up of rows that contain the data elements of all the column headings of a business report so that nothing has to be calculated to generate portions of that report.

Foreign Key
Columns in a table that reflect primary keys of other entities. Determines the relationship between entities.

Hierarchy
The arrangement or ranking of entities into a graded series.

Intelligent Key
A key made up of values that reflect a business requirement or use.

Natural Key
The data fields whose value ensures uniqueness for each row, and whose value will not change during the life of the row.

Normalized Model
A model of entities designed to remove data redundancy; the higher the form (2NF, 3NF, 4NF, etc.), the less redundancy.

OLTP
Online transaction processing. Consists of a set of instructions that serially, mechanically, and repetitively processes many thousands of transactions through a predefined access path that ultimately updates the underlying data store. OLTP transactions are used by a transactional system that is focused on the acquisition of data. OLTP transactions normally have strict service level agreements (SLAs) attached to them requiring sub-second response time. OLTP concentrates on data updates and/or inserts.

Primary Key
Unique identifier for rows in a table.

Schema
Structured framework that, for purposes of this paper, provides and represents relationships between entities.

Service Level Agreement (SLA)
An agreement between IT and the business that pertains to a performance aspect of a process. The agreement could be based on availability, enforcing that data is loaded by a specific timeframe, or it could be based on retrieval processing, enforcing a processing time for specific queries or types of queries.

Slowly Changing Dimensions (SCD)
The impact of time on a dimension and that dimension's changes (i.e., customer addresses).

Snowflake Schema
Made up of multiple star schemas, without duplication of the dimension tables, that can be joined together.

Star Schema
An individual fact table surrounded by the dimension tables to which it will join.

Surrogate Key
Artificial key used as a substitute for natural data keys (i.e., customer id).

Appendix D Endnotes

1. Dr. E. F. Codd, Codd's 12 Rules Data Management Strategies, ITWorld.com, 05-07-2001.
2. William Inmon, Enterprise Data Warehouse, http://www.billinmon.com/library/articles/introedw.asp.
3. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, Wiley Computer Publishing, New York, 2002, pp. 11-12.
4. Neil Raden, edited by Ralph Kimball, Real Time: Get Real, Part II, Intelligent Enterprise, Data Warehouse Designer, www.intelligententerprise.com, June 30, 2003.
5. Steve Hoberman, Data Modeler's Workbench, Tools and Techniques for Analysis and Design, John Wiley & Sons, New York, 2002.
6. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, Wiley Computer Publishing, 2002, page 9.
7. William Inmon, The Problem with Dimensional Modeling, http://www.billinmon.com/library/articles/artdimmd.asp.
8. William Inmon, Star Joins, http://www.billinmon.com/library/articles/starjoin.asp.

Appendix E References

Ralph Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, Solutions from the Expert, John Wiley & Sons, New York, 1996.

Daniel L. Moody (Department of Information Systems, University of Melbourne) and Mark Kortink, From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design, Simsion Bowles & Associates, 2000.

William Inmon and Peter Meers, The Dilemma of Change: Managing Changes over Time in the Data Warehouse/DSS Environment, white paper, http://www.billinmon.com/library/whiteprs/Dilemma of Change.pdf, March 2001.

AllFusion is a trademark and ERwin is a registered trademark of Computer Associates International, Inc. Teradata continually improves products as new technologies and components become available. Teradata, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations
described herein may not be marketed in all parts of the world. Consult your Teradata representative or Teradata.com for more information.
Copyright 2004-2007 by Teradata Corporation. All Rights Reserved. Produced in U.S.A.
