

Oracle Business Intelligence Enterprise Edition
Some Guidelines for Data Modeling
Kurt Wolff
March 14, 2008

Normalized versus denormalized data modeling

General approaches to modeling operational vs. star
and snowflake schemas.
How to create logical table sources when you have
metrics from tables that don't join to dimensions.
How, when and where to set outer joins.
General OBIEE Metadata Modeling Best Practices.

Normalized vs. Denormalized: Definition

A schema is said to be normalized when it minimizes data storage redundancy.
All values depend on the key and only the key.

Denormalized product table

Notice, for example, that the value BigG is stored 10 times.

Normalized Model

BigG stored only once
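The redundancy difference can be made concrete with a small sketch. The table and column names below (a product table with a repeated vendor name "BigG", as in the slides) are illustrative, not taken from any actual repository:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: the vendor name is repeated on every product row.
cur.execute("CREATE TABLE product_denorm (product_id INTEGER, product_name TEXT, vendor_name TEXT)")
cur.executemany("INSERT INTO product_denorm VALUES (?, ?, ?)",
                [(i, f"Product {i}", "BigG") for i in range(10)])

# Normalized: vendor_name depends only on the vendor key, so it is stored once,
# and products carry only the foreign key.
cur.execute("CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT)")
cur.execute("CREATE TABLE product (product_id INTEGER, product_name TEXT, vendor_id INTEGER)")
cur.execute("INSERT INTO vendor VALUES (1, 'BigG')")
cur.executemany("INSERT INTO product VALUES (?, ?, 1)",
                [(i, f"Product {i}") for i in range(10)])

denorm_copies = cur.execute(
    "SELECT COUNT(*) FROM product_denorm WHERE vendor_name = 'BigG'").fetchone()[0]
norm_copies = cur.execute(
    "SELECT COUNT(*) FROM vendor WHERE vendor_name = 'BigG'").fetchone()[0]
print(denorm_copies, norm_copies)  # 10 1
```

The string is stored 10 times in the denormalized table and exactly once after normalization; an update to the vendor name touches one row instead of ten.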

Normalized: Most Efficient For Data Inserts

Therefore, popular in transaction (i.e. ERP) systems
where the key measure is transactions/second
DBAs are trained to normalize; it's in their genes
Quasi-debates about whether data warehouses should have normalized table structures. Debates feature multiple gurus, analysts, etc.
Therefore you may encounter normalized data warehouses, too.

Normalized Schemas and BI

More tables in FROM clause
More tables to join
Optimizer able to pick best join strategy?

Common DWH Schemas



3NF (bring out your ERWIN diagram)

Commonly Encountered Views About Star Schemas

Business intelligence schemas should be built as a single star,
i.e. all facts in a single fact table
Drilling across (facts from multiple stars) is technically difficult
"Wouldn't it be easier if users just pointed their query tool at a single fact
table? If the metrics are frequently compared to one another, it makes
more sense to physically combine the data into a single fact table."
(Margy Ross)

Star schemas are limiting

"Star schema designs in traditional [relational] databases require that
business users declare all queries they are likely to run so that the
appropriate dimensions and facts may be brought together. Each query
run must fit within a single star schema, thus eliminating the ability to ask
ad hoc or unplanned queries." (Claudia Imhoff)

OBI EE Provides Flexibility

Multiple stars mapped within the business model
Drilling across (from a measure in one fact table to a measure in another fact table) is easy
No worries about chasm traps
No worries about fan traps
Add additional measures or stars
Stars can be at different grains

Another Advantage of Star Schemas

In Oracle, star join transformations provide high-performance joins
Not talked about much within Oracle (??)
Requires bit-mapped indexes on fact table foreign keys
(among other things)
Has been used in analytic applications

Myth: Logical Schema Has to Be Star

Importing a snowflake schema works quite nicely


Snowflake Logical Schemas

Create Dimension creates all levels and level keys
Estimate levels works better
Get levels for aggregate tables works better
More complex business model -- more tables, joins, columns.
Logical dimension columns are mapped to a single physical column
Logical joins don't cover as many physical joins

Importing Full 3NF Database As Is

Likely to produce an inconsistent business model that the BI Server cannot navigate

Bridge tables
Table self-joins
Single table that has multiple roles

Modeling is needed to dimensionalize the business model.

All 3NF models can be dimensionalized.

Separate aggregatable from non-aggregatable

Logical dimension tables are collections of non-aggregatable
columns whose values are functionally dependent on the logical table key
If X is the logical table key and Y is another attribute,
each X value is associated with precisely one Y value

Logical fact tables are collections of aggregatable columns
(or columns defined by formulas that include aggregatable columns)
Logical dimension tables have a 1:N relationship to
fact tables (expressed in business model joins)
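The functional-dependency test above ("each X value is associated with precisely one Y value") can be checked mechanically when deciding which columns belong in a dimension. This is a sketch; the helper name and the sample rows are invented for illustration:

```python
def functionally_dependent(rows, key, attr):
    """Return True if each distinct `key` value maps to exactly one `attr` value."""
    seen = {}
    for row in rows:
        k, v = row[key], row[attr]
        if k in seen and seen[k] != v:
            return False  # same key, two different attribute values
        seen[k] = v
    return True

# Hypothetical source rows at transaction grain.
products = [
    {"product_id": 1, "vendor": "BigG", "amount": 100},
    {"product_id": 2, "vendor": "BigG", "amount": 250},
    {"product_id": 1, "vendor": "BigG", "amount": 300},
]

# "vendor" depends only on product_id -> candidate dimension attribute.
print(functionally_dependent(products, "product_id", "vendor"))   # True
# "amount" varies per transaction -> aggregatable, belongs in the fact table.
print(functionally_dependent(products, "product_id", "amount"))   # False
```

Columns that fail the test for every candidate key are the aggregatable ones, which the next slide says must stay out of logical dimension tables.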

Dimensionalization Corollaries
No non-aggregatable columns in logical fact tables
No logical fact table keys
Model non-aggregatable columns as separate dimension tables
SQL generated will reflect physical joins
Logical joins will determine join type (inner, outer)

SOP Modeling Sequence

Begin with logical fact table (usually only one unless multi-user development expected)
Build base measures mapped to sources at lowest grain
Add logical dimension tables and logical joins
Create dimensions and hierarchies
Add additional base measures (from higher grain sources); set
aggregate levels of sources
Add aggregate sources (fact and dimension)
Create compound measures
Presentation layer (folders, column names) finalized last
Security (groups, authentication, permissions, initialization
blocks, filters)

Operational Data Sources

Common question: can we use the BI Server to query operational data? For example, SAP?

Operational system schema (3NF, 4NF, BCNF) not the only issue
Operational system logic has to be duplicated
Can result in very complex SQL; use SELECT objects in the physical layer

Sometimes table structure itself is an issue

Multiple Table Types in SAP

Transparent: Can be read from outside SAP using SQL.
Store transaction data. Query performance an issue
unless you know indexes and access methods.
Pooled: Logical tables that can be combined in a
table pool (i.e. 10 to 1,000 small tables stored in a
single physical table). Data combined in one field.
Store control data. Cannot be read from outside SAP.
Cluster: Logical tables that are assigned to a table
cluster (1-10 very large tables combined). Data
combined in one field. Primarily used to store control
data or temporary data. Cannot be read from outside SAP.

Complex SQL
See sample from Siebel forecasting
Metadata development will take a lot of time

OLTP/DWH Fragmentation on Time

OLTP: Forecasting.Forecasts."Forecast Date" >

DWH: Forecasting.Forecasts."Forecast Date" <=
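Fragmenting on time means the BI Server routes (or unions) a query across the OLTP and DWH sources depending on how the date filter overlaps the fragmentation predicates. A minimal sketch of that routing decision, assuming a hypothetical cutoff date (the actual cutoff values are elided in the slides):

```python
from datetime import date

CUTOFF = date(2008, 1, 1)  # hypothetical fragmentation boundary

def sources_for_range(start, end, cutoff=CUTOFF):
    """Pick which fragments a date-range query must hit:
    the DWH fragment holds rows with "Forecast Date" <= cutoff,
    the OLTP fragment holds rows with "Forecast Date" > cutoff."""
    sources = []
    if start <= cutoff:
        sources.append("DWH")
    if end > cutoff:
        sources.append("OLTP")
    return sources

print(sources_for_range(date(2007, 1, 1), date(2007, 6, 30)))   # ['DWH']
print(sources_for_range(date(2008, 3, 1), date(2008, 3, 31)))   # ['OLTP']
print(sources_for_range(date(2007, 12, 1), date(2008, 2, 1)))   # ['DWH', 'OLTP']
```

When the range straddles the cutoff, both sources are queried and the result sets are combined, which is exactly what makes non-overlapping fragmentation predicates important.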


Join Elimination Rules

Inner Joins in LTS

Complex Joins Not Eliminated


select sum(T18915."Amount") as c1,
       T18912."Employee" as c2
from "EmpDept" T18909,
     "Employees" T18912,
     "Facts" T18915

Inner Joins in LTS

K/FK Joins Eliminated Depending on Cardinality


select sum(T18915."Amount") as c1,
       T18912."Employee" as c2
from "Employees" T18912,
     "Facts" T18915

Inner Joins in LTS

K/FK Joins Eliminated Depending on Cardinality
Key/FK, reversed



select sum(T18915."Amount") as c1,
       T18912."Employee" as c2
from "Employees" T18912,
     "EmpDept" T19037,
     "Facts" T18915

Outer Joins in Logical Table Sources

Never Eliminated

SQL generated so that the BI Server can do the OJ;
OJ not supported in the DB

Outer Joins Between Logical Tables

Are Eliminated

select sum(T18915."Amount") as c1,
       T18912."Employee" as c2
from "Employees" T18912,
     "Facts" T18915

Joins Across LTSs

Joins Can Occur Across Dimension Table Sources
But Not in Time Dimension

No need to have this in the logical table

If time dimension, use aliases to avoid joins across sources

Outer Joins To Preserve Dimensions: Two Options
Outer joins in the business model
Result in OJs in SQL
Joins, if performed, will always be OJs
OJ syntax can be ambiguous; you may not get what you want
OJs can be expensive and SLOW
Outer joins in result sets
Create pseudo-measure that will always return all dimension values
Include pseudo-measure in logical query (can be in a filter)
Let BI Server do the outer join of result sets
Lets users control when OJs occur
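The result-set approach is easy to picture: one query returns every dimension combination (via the pseudo-measure), the other returns only the combinations with facts, and the BI Server stitches them together. A sketch with invented months and products, showing the IfNull-style 0 substitution from the next slide:

```python
# Result of the dimension-preserving query: every (month, product) combination.
dim_rows = [("Jan", "A"), ("Jan", "B"), ("Feb", "A"), ("Feb", "B")]

# Result of the real fact query: only combinations that have sales.
fact_rows = {("Jan", "A"): 100, ("Feb", "B"): 250}

# Outer join of the two result sets; missing facts become 0 instead of Null.
merged = [(month, product, fact_rows.get((month, product), 0))
          for (month, product) in dim_rows]
print(merged)
```

Every month/product pair survives, with 0 where no fact row existed, and no outer-join SQL was ever sent to the database.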

How to Preserve Dimensions





Desired Output: All Months, All Products with Amount > 0, Show 0 Instead of Null

Metadata Setup to Preserve Dimensions

Strategy: Use fact-based partitioning
Fact exists for all Item/Month combinations. Set filter for
BI Server will full outer join result sets
Use IfNull function to convert Nulls to 0s
Filter Items via subquery
One-row fact table; join to it where 1=1

If Months is being used as a formal time dimension, the complex join is not allowed. Create K/FK join where K=1 for all rows, FK=1.

Query Setup

Subquery PreserveDim Sub

Prefer over setting up expensive outer joins in the
metadata. Gives users control over when outer joins occur.
Metrics That Don't Join to Dimensions

Avoiding Errors Using an Empty Table

Add Month

Add Sales

None of the fact tables are compatible with the query request

Avoiding Errors Using an Empty Table


Add Month

Additional Thoughts re Best Practices

Advice: don't pay much attention to Admin Tool consistency checker Best Practices.

Analytic Apps repository is a good model (perhaps
overly complex in number of logical fact tables)
Only uses aliases in business model mapping
Consistent naming conventions for aliases so they group
together in a convenient way in the admin tool
Average aggregation rarely used. Use Sum/Count instead.
Focus on usability in the presentation layer: not too many
things (<=12 objects in a container), in a logical order
Create descriptions for presentation layer objects (best done
in business model layer). Create metadata dictionary.
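The Sum/Count advice matters because pre-aggregated averages do not re-aggregate correctly across levels, while Sum and Count do. A worked example with invented per-store amounts:

```python
# Hypothetical per-store transaction amounts with unequal row counts.
store_a = [10, 20, 30]   # per-store average: 20
store_b = [100]          # per-store average: 100

# Averaging the pre-aggregated averages weights each store equally,
# which is wrong at the grand-total level:
avg_of_avgs = (sum(store_a) / len(store_a) + sum(store_b) / len(store_b)) / 2

# Defining the measure as Sum/Count re-aggregates correctly at any level:
grand_avg = (sum(store_a) + sum(store_b)) / (len(store_a) + len(store_b))

print(avg_of_avgs, grand_avg)  # 60.0 40.0
```

The true average over all four transactions is 40, not 60; a measure built as Sum(amount)/Count(amount) gets this right whether the query groups by store, region, or nothing at all.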

Alias Naming Conventions

Just because everything could be in a single business model doesn't mean it has to be!

One More Thing

Don't use BI Server time series functions (Ago, ToDate) unless the BI Server can function-ship the Rank function to the database(s)!