Oracle Business Intelligence Enterprise Edition
Some Guidelines for Data Modeling
Kurt Wolff
March 14, 2008

Topics
Normalized versus denormalized data modeling strategies.
General approaches to modeling operational vs. star
and snowflake schemas.
How to create logical table sources when you have
metrics from tables that don't join to a logical
dimension.
How, when and where to set outer joins.
General OBIEE Metadata Modeling Best Practices.

Normalized vs. Denormalized: Definition


A schema is said to be normalized when it minimizes data storage redundancy.


All values depend on the key and only the key.

Denormalized product table


Notice, for example, that the value BigG is stored 10 times.

Normalized Model

BigG stored only once
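
The contrast between the two product tables can be sketched with an in-memory SQLite database. The table and column names below (product_denorm, product, brand) are hypothetical, since the original slides show the tables only as pictures; the one detail taken from the slides is that BigG appears 10 times in the denormalized table and once in the normalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized (hypothetical layout): the brand name is repeated on every product row.
cur.execute("CREATE TABLE product_denorm (product_id INTEGER PRIMARY KEY, product TEXT, brand TEXT)")
cur.executemany(
    "INSERT INTO product_denorm VALUES (?, ?, ?)",
    [(i, f"Product {i}", "BigG") for i in range(1, 11)],
)

# Normalized (hypothetical layout): the brand name lives in its own table,
# and each product row carries only a foreign key.
cur.execute("CREATE TABLE brand (brand_id INTEGER PRIMARY KEY, brand TEXT)")
cur.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, product TEXT, "
    "brand_id INTEGER REFERENCES brand(brand_id))"
)
cur.execute("INSERT INTO brand VALUES (1, 'BigG')")
cur.executemany(
    "INSERT INTO product VALUES (?, ?, 1)",
    [(i, f"Product {i}") for i in range(1, 11)],
)

# How many times is the string 'BigG' stored in each design?
denorm_copies = cur.execute("SELECT COUNT(*) FROM product_denorm WHERE brand = 'BigG'").fetchone()[0]
norm_copies = cur.execute("SELECT COUNT(*) FROM brand WHERE brand = 'BigG'").fetchone()[0]

# Renaming the brand rewrites ten rows in one design and one row in the other.
rows_denorm = cur.execute("UPDATE product_denorm SET brand = 'BigG Foods' WHERE brand = 'BigG'").rowcount
rows_norm = cur.execute("UPDATE brand SET brand = 'BigG Foods' WHERE brand = 'BigG'").rowcount
```

The update counts illustrate why normalized structures tend to suit insert- and update-heavy systems, while the extra join needed to read a product's brand is the cost paid at query time.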

Normalized: Most Efficient For Data Inserts, Updates

Therefore popular in transaction (e.g. ERP) systems,
where the key measure is transactions/second
DBAs are trained to normalize; it's in their genes
Quasi-debates about whether data warehouses should
have normalized table structures. Debates feature
multiple gurus, analysts, etc.
Therefore you may encounter normalized data
warehouses, too.

Normalized Schemas and BI


More tables in FROM clause
More tables to join
Optimizer able to pick best join strategy?

Common DWH Schemas

Snowflake

Star

3NF (bring out your ERWIN diagram)
Constellation

Commonly Encountered Views About Star Schemas

Business intelligence schemas should be built as a single star,
i.e. all facts in a single fact table
Drilling across (facts from multiple stars) is technically
difficult.
"Wouldn't it be easier if users just pointed their query tool at a single fact
table? If the metrics are frequently compared to one another, it makes
more sense to physically combine the data into a single fact table."
Margy Ross

Star schemas are limiting
"Star schema designs in traditional [relational] databases require that
business users declare all queries they are likely to run so that the
appropriate dimensions and facts may be brought together. Each query
run must fit within a single star schema, thus eliminating the ability to ask
ad hoc or unplanned queries." Claudia Imhoff

OBI EE Provides Flexibility


Multiple stars mapped within the business model
Drilling across (from measure in one fact table to
measure in another fact table) is easy


No worries about chasm traps
No worries about fan traps
Add additional measures or stars
Stars can be at different grain
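
To make the fan-trap point concrete, here is a minimal sketch of the general drill-across pattern: aggregate each fact table to the shared dimension level on its own, then join the two small result sets. The tables (sales at month x product grain, quotas at month grain) and their data are hypothetical, and the SQL is only an illustration of the idea, not the SQL the BI Server actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales  (month TEXT, product TEXT, amount REAL);  -- grain: month x product
CREATE TABLE quotas (month TEXT, quota REAL);                 -- grain: month
INSERT INTO sales  VALUES ('2008-01', 'A', 100), ('2008-01', 'B', 50), ('2008-02', 'A', 70);
INSERT INTO quotas VALUES ('2008-01', 120), ('2008-02', 80);
""")

# Naive approach: join the two fact tables directly, then aggregate.
# Each quota row is repeated once per matching sales row (a fan trap).
naive = conn.execute("""
    SELECT s.month, SUM(s.amount), SUM(q.quota)
    FROM sales s JOIN quotas q ON q.month = s.month
    GROUP BY s.month
    ORDER BY s.month
""").fetchall()

# Drill-across pattern: aggregate each star separately to the common grain,
# then join the already-aggregated result sets.
drill_across = conn.execute("""
    SELECT s.month, s.amount, q.quota
    FROM (SELECT month, SUM(amount) AS amount FROM sales  GROUP BY month) s
    JOIN (SELECT month, SUM(quota)  AS quota  FROM quotas GROUP BY month) q
      ON q.month = s.month
    ORDER BY s.month
""").fetchall()
```

In the naive result the January quota is doubled because two sales rows match it; the drill-across result keeps each measure at its own grain until the final join.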

Another Advantage of Star Schemas


In Oracle, Star Join Transformations: high-performance joins
Not talked about much within Oracle (??)
Requires bitmap indexes on fact table foreign keys
(among other things)
Has been used in analytic applications

Myth: Logical Schema Has to Be Star


Importing a snowflake schema works quite nicely

Demo

Snowflake Logical Schemas


Benefits
Create Dimension creates all levels and level keys
Estimate levels works better
Get levels for aggregate tables works better
Drawbacks
More complex business model: more
tables, joins, columns.
Logical dimension columns are mapped to a
single physical column
Logical joins don't cover as many physical
joins

Importing Full 3NF Database As Is


Likely to produce inconsistent business model that BI
Server cannot navigate


Bridge tables
Table self-joins
Single table that has multiple roles

Modeling is needed to dimensionalize the business model.


All 3NF models can be dimensionalized.

Dimensionalize
Separate aggregatable from non-aggregatable columns
Logical dimension tables are collections of non-aggregatable columns whose values are functionally
dependent on the logical table key
X is logical table key, Y is another attribute
each X value is associated with precisely one Y value

Logical fact tables are collections of aggregatable
columns (or columns defined by formulas that include
aggregatable columns)
Logical dimension tables have a 1:N relationship to
fact tables (expressed in business model joins)
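
The functional-dependency test for dimension columns (each X value is associated with precisely one Y value) can be sketched as a small check over sample rows. The function name and the sample data are hypothetical and only illustrate the rule.

```python
def is_functionally_dependent(rows, key, attr):
    """Return True if every value of `key` maps to exactly one value of `attr`."""
    seen = {}
    for row in rows:
        k, v = row[key], row[attr]
        # setdefault stores the first value seen for this key and returns it;
        # any later, different value breaks the dependency.
        if seen.setdefault(k, v) != v:
            return False
    return True


# Hypothetical rows from a source that mixes attributes and measures.
rows = [
    {"product_id": 1, "brand": "BigG", "amount": 10},
    {"product_id": 1, "brand": "BigG", "amount": 25},
    {"product_id": 2, "brand": "BigG", "amount": 5},
]

brand_is_dimension_attr = is_functionally_dependent(rows, "product_id", "brand")
amount_is_dimension_attr = is_functionally_dependent(rows, "product_id", "amount")
```

Here brand depends on product_id and so belongs in the logical dimension table, while amount takes several values per key and is a candidate for the logical fact table.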

Dimensionalization Corollaries
No non-aggregatable columns in logical fact tables
No logical fact table keys
Model non-aggregatable columns as a separate
dimension table
SQL generated will reflect physical joins
Logical joins will determine join type (inner, outer)

SOP Modeling Sequence


Begin with logical fact table (usually only one unless multi-user
development expected)
Build base measures mapped to sources at lowest grain
Add logical dimension tables and logical joins
Create dimensions and hierarchies
Add additional base measures (from higher grain sources); set
aggregate levels of sources
Add aggregate sources (fact and dimension)
Create compound measures
Test
Rename
Presentation layer (folders, column names) finalized but
extendable
Security (groups, authentication, permissions, initialization
blocks, filters)

Operational Data Sources


Common question: can we use BI Server to query
operational data? For example, SAP?
Operational system schema (3NF, 4NF, BCNF) not
the only issue
Operational system logic has to be duplicated
Can result in very complex SQL; use SELECT objects in the
physical layer

Sometimes table structure itself is an issue

Multiple Table Types in SAP


Transparent: Can be read from outside SAP using
SQL. Store transaction data. Query performance an
issue unless you know indexes and access methods.
Pooled: Logical tables that can be combined in a
table pool (i.e. 10-1000 small tables stored in a
single physical table). Data combined in one field.
Store control data. Cannot be read from outside SAP.
Cluster: Logical tables that are assigned to a table
cluster (1-10 very large tables combined). Data
combined in one field. Primarily used to store control
data or temporary data. Cannot be read from
outside SAP.

Complex SQL
See sample from Siebel forecasting
Metadata development will take a lot of time

OLTP/DWH Fragmentation on Time


OLTP: Forecasting.Forecasts."Forecast Date" > VALUEOF("ETLRunDateMinusInterval")
DWH: Forecasting.Forecasts."Forecast Date" <= VALUEOF("ETLRunDateMinusInterval")
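
The two fragmentation conditions split the time line into complementary ranges, so every row is read from exactly one source. The sketch below imitates that idea in plain Python. The cutoff value, the sample rows, and the function name are hypothetical, and the pruning logic is an illustration rather than a description of how the BI Server evaluates fragmentation content.

```python
from datetime import date

# Hypothetical value for the ETLRunDateMinusInterval repository variable.
CUTOFF = date(2008, 3, 1)

# Hypothetical (forecast_date, amount) rows; the 2008-02-20 row exists in both systems.
OLTP_ROWS = [(date(2008, 2, 20), 10), (date(2008, 3, 5), 30)]
DWH_ROWS = [(date(2008, 1, 15), 20), (date(2008, 2, 20), 10)]


def query_forecasts(start, end):
    """Return rows with start <= forecast_date <= end, each taken from one fragment only."""
    rows = []
    if end > CUTOFF:
        # OLTP fragment: "Forecast Date" > cutoff
        rows += [r for r in OLTP_ROWS if r[0] > CUTOFF and start <= r[0] <= end]
    if start <= CUTOFF:
        # DWH fragment: "Forecast Date" <= cutoff
        rows += [r for r in DWH_ROWS if r[0] <= CUTOFF and start <= r[0] <= end]
    return sorted(rows)


all_rows = query_forecasts(date(2008, 1, 1), date(2008, 3, 31))
total = sum(amount for _, amount in all_rows)
recent_only = query_forecasts(date(2008, 3, 2), date(2008, 3, 31))
```

Because the conditions use > on one side and <= on the other, the row that exists in both systems is counted once, and a query confined to recent dates never touches the warehouse fragment.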


Join Elimination Rules

Inner Joins in LTS


Complex Joins Not Eliminated
[Diagram: join type Complex; columns Employee, Amount]

select
sum(T18915."Amount") as c1,
T18912."Employee" as c2
from
"EmpDept" T18909,
"Employees" T18912,
"Facts" T18915
where

Inner Joins in LTS


K/FK Joins Eliminated Depending on Cardinality
[Diagram: join type Key/FK; columns Employee, Amount]

select
sum(T18915."Amount") as c1,
T18912."Employee" as c2
from
"Employees" T18912,
"Facts" T18915
where

Inner Joins in LTS


K/FK Joins Eliminated Depending on Cardinality
[Diagram: join type Key/FK, reversed; columns Employee, Amount]

select
sum(T18915."Amount") as c1,
T18912."Employee" as c2
from
"Employees" T18912,
"EmpDept" T19037,
"Facts" T18915
where

Outer Joins in Logical Table Sources


Never Eliminated

SQL generated so that BI Server can do OJ;
OJ not supported in DB

Outer Joins Between Logical Tables


Are Eliminated

select
sum(T18915."Amount") as c1,
T18912."Employee" as c2
from
"Employees" T18912,
"Facts" T18915
where

Joins Across LTSs


Joins Can Occur Across Dimension Table Sources
But Not in Time Dimension

No need to have this in the CUSTOMERS logical table source

If time dimension, use aliases to avoid joins across sources

Outer Joins To Preserve Dimensions: Two Options

Outer joins in the business model
Result in OJs in SQL
Joins, if performed, will always be OJs
OJ syntax can be ambiguous; you may not get what you want
OJs can be expensive and SLOW
Outer joins in result sets
Create pseudo-measure that will always return all dimension
rows
Include pseudo-measure in logical query (can be in a filter)
Let BI Server do the outer join of result sets
Lets users control when OJs occur

How to Preserve Dimensions


[Diagram: Facts, Months, and Items tables with sample data]

Desired output: all Months, all Products with Amount > 0; show 0 instead of Null

Metadata Setup to Preserve Dimensions


Strategy: Use fact-based partitioning
Fact exists for all Item/Month combinations. Set filter for
Fact.
BI Server will full outer join result sets
Use IfNull function to convert Nulls to 0s
Filter Items via subquery
One-row fact table: complex join to Dummy where 1=1.

If Months is being used as a formal time dimension, the complex join is
not allowed. Create K/FK join where K=1 for all rows, FK=1.
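
The setup above can be imitated outside the BI Server to see what the stitched result looks like. In this sketch one query returns a pseudo-measure for every Month/Item combination (via a one-row dummy table joined on 1=1), a second returns the real Amount, and the two result sets are joined with nulls converted to 0. All table contents are hypothetical. A left join stands in for the BI Server's full outer join, which is equivalent here only because the pseudo-measure result set already contains every combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE months (month TEXT);
CREATE TABLE items  (item TEXT);
CREATE TABLE facts  (month TEXT, item TEXT, amount REAL);
CREATE TABLE dummy  (one INTEGER);              -- the one-row fact table
INSERT INTO months VALUES ('Jan'), ('Feb'), ('Mar');
INSERT INTO items  VALUES ('Pen'), ('Ink'), ('Cap');
INSERT INTO facts  VALUES ('Jan', 'Pen', 5), ('Mar', 'Pen', 7), ('Feb', 'Ink', 2);
INSERT INTO dummy  VALUES (1);
""")

rows = conn.execute("""
    SELECT p.month, p.item, IFNULL(f.amount, 0) AS amount
    FROM (
        -- Result set 1: pseudo-measure that exists for every Month/Item pair.
        -- Items are restricted by subquery to those with Amount > 0 somewhere.
        SELECT m.month, i.item, SUM(d.one) AS pseudo
        FROM months m, items i, dummy d
        WHERE 1 = 1
          AND i.item IN (SELECT item FROM facts WHERE amount > 0)
        GROUP BY m.month, i.item
    ) p
    LEFT JOIN (
        -- Result set 2: the real measure, present only where facts exist.
        SELECT month, item, SUM(amount) AS amount
        FROM facts
        GROUP BY month, item
    ) f ON f.month = p.month AND f.item = p.item
    ORDER BY p.item, p.month
""").fetchall()

amounts = {(month, item): amount for month, item, amount in rows}
```

Every month appears for Pen and Ink, with 0 where no fact row exists, while Cap is dropped because it never has Amount > 0.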

Query Setup

Subquery PreserveDim Sub

Prefer over setting up expensive outer joins in the
metadata. Gives users control.

Metrics That Don't Join to Dimensions

Avoiding Errors Using an Empty Table

Add Month

OK
Add Sales

None of the fact tables are compatible with the query request

Avoiding Errors Using an Empty Table

OK

Add Month

Additional Thoughts re Best Practices


Advice: don't pay much attention to Admin Tool
consistency checker "Best Practices".


Analytic Apps repository is a good model (perhaps
overly complex in number of logical fact tables)
Only uses aliases in business model mapping
Consistent naming conventions for aliases so they group
together in a convenient way in the admin tool
Average aggregation rarely used. Use Sum/Count instead.
Focus on usability in the presentation layer: not too many
things (<=12 objects in a container), in a logical order
Create descriptions for presentation layer objects (best done
in business model layer). Create metadata dictionary.
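
On the Sum/Count advice: the slides do not give the reason, but one plausible motivation is that a stored average cannot be re-aggregated to a higher level, whereas a sum and a count can. The sketch below uses hypothetical order data to show the difference.

```python
# Hypothetical detail rows: (region, order_amount).
orders = [("East", 10), ("East", 20), ("East", 30), ("West", 100)]

# An aggregate source that stores sum and count per region instead of an average.
by_region = {}
for region, amount in orders:
    total, count = by_region.get(region, (0, 0))
    by_region[region] = (total + amount, count + 1)

# Averaging the per-region averages weights each region equally,
# regardless of how many orders it has.
region_avgs = {region: total / count for region, (total, count) in by_region.items()}
avg_of_avgs = sum(region_avgs.values()) / len(region_avgs)

# Dividing the rolled-up sum by the rolled-up count gives the true overall average.
grand_total = sum(total for total, _ in by_region.values())
grand_count = sum(count for _, count in by_region.values())
true_avg = grand_total / grand_count
```

With these rows the average of averages is 60, while the true average order amount is 40. Defining the measure as Sum divided by Count keeps it correct at any level of the hierarchy.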

Alias Naming Conventions

Just because everything could be in a single
business model doesn't mean it has to be!

One More Thing


Don't use BI Server time series functions (Ago,
ToDate) unless the BI Server can function-ship the
Rank function to the database(s)!