
Successful Dimensional Modeling of Very Large Data Warehouses

By Bert Scalzo, Ph.D.
Bert.Scalzo@Quest.com

About the Author
• Oracle DBA from version 4 through 8i
• Worked for Oracle Education
• Worked for Oracle Consulting
• Holds several Oracle Masters
• BS, MS and PhD in Computer Science
• MBA and insurance industry designations
• Articles in Oracle Magazine, Oracle Informant and PC Week (now E-Magazine)

About Quest Software

Know Your Application

What type of application are you building:
• On Line Transaction Processing (OLTP)
• Operational Data Store (ODS)
• On Line Analytical Processing (OLAP)
• Data Mart / Data Warehouse (DM/DW)

                OLTP                ODS                    OLAP           DM/DW
Business Focus  Operational         Operational, Tactical  Tactical       Tactical, Strategic
End User Tools  Client Server, Web  Client Server, Web     Client Server  Client Server, Web
DB Technology   Relational          Relational             Cubic          Relational
Trans Count     Large               Medium                 Small          Small
Trans Size      Small               Medium                 Medium         Large
Trans Time      Short               Medium                 Long           Long
Size in Gigs    10 – 200            50 – 400               50 – 400       400 – 4000
Normalization   3NF                 3NF                    N/A            0NF
Data Modeling   Traditional ER      Traditional ER         N/A            Dimensional

Embrace New Concepts

• “Teach Old Dog New Tricks”
• Throw out any OLTP baggage
• Forget OLTP “Golden Rules”

Star Schema Design

The “star schema” approach to dimensional data modeling was pioneered by Ralph Kimball.

Dimensions: smaller, de-normalized tables containing business descriptive columns that end-users query on.

Facts: very large tables with primary keys formed from the concatenation of related dimension table foreign key columns, and possessing numerically additive, non-key columns used for calculations during end-user queries.
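The fact and dimension definitions can be sketched as a tiny runnable schema. This is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Oracle, dw_period and dw_product follow the table names used elsewhere in this deck, and dw_location, dw_sales_fact and all column names are hypothetical. The fact key is enforced with a unique index rather than a primary key constraint, in line with the fact table advice later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: small, de-normalized, descriptive columns that users query on.
CREATE TABLE dw_period   (period_id   INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dw_product  (product_id  INTEGER PRIMARY KEY, item TEXT, category TEXT);
CREATE TABLE dw_location (location_id INTEGER PRIMARY KEY, city TEXT);

-- Fact: keyed by the concatenation of the dimension foreign keys,
-- plus a numerically additive, non-key measure column.
CREATE TABLE dw_sales_fact (
    period_id   INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    location_id INTEGER NOT NULL,
    units_sold  INTEGER NOT NULL
);
CREATE UNIQUE INDEX dw_sales_fact_uk
    ON dw_sales_fact (period_id, product_id, location_id);
""")

conn.executemany(
    "INSERT INTO dw_sales_fact VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 12), (1, 2, 1, 5), (2, 1, 1, 7)],
)

# Additive measures can be summed across any combination of dimensions.
total_units = conn.execute(
    "SELECT SUM(units_sold) FROM dw_sales_fact"
).fetchone()[0]

# The unique index rejects a second row for the same key combination.
try:
    conn.execute("INSERT INTO dw_sales_fact VALUES (1, 1, 1, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(total_units, duplicate_rejected)
```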

[Figure: relative table sizes. Facts: 10^8 to 10^10 rows. Dimensions: 10^3 to 10^5 rows.]

Transform OLTP Model

Fold OLTP model into itself to form a Star:
• De-Normalize parent/child relationships
• De-Normalize lookup relationships
• Use surrogate or meaningless keys
• Create and populate a time dimension
• Create hierarchies of data in dimensions
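As one illustration of the "create and populate a time dimension" step, the sketch below generates one row per calendar day with a surrogate key and the hierarchy attributes a period dimension typically carries. The function name, the column names and the label formats are assumptions made for this example, not the deck's actual dw_period layout.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one dict per day from start to end (inclusive).

    period_id is a surrogate key: a plain sequence number that
    carries no business meaning of its own.
    """
    rows = []
    current = start
    key = 1
    while current <= end:
        iso_year, iso_week, _ = current.isocalendar()
        rows.append({
            "period_id": key,
            "day": current.isoformat(),
            "week": f"{iso_year}-W{iso_week:02d}",
            "month": current.strftime("%Y-%m"),
            "quarter": f"{current.year}-Q{(current.month - 1) // 3 + 1}",
            "year": current.year,
        })
        key += 1
        current += timedelta(days=1)
    return rows


rows = build_time_dimension(date(1998, 11, 1), date(1998, 11, 30))
print(len(rows), rows[0]["quarter"], rows[-1]["day"])
```

Because every level of the hierarchy (day, week, month, quarter, year) is stored on the same de-normalized row, queries can roll up by any level without extra joins.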

[Figure: OLTP Model]

[Figure: Dimensional Model]

Dimension Hierarchies

SQL> select distinct levelx from dw_period;

LEVELX
-------------------
DAY
MONTH
QUARTER
WEEK
YEAR

SQL> select distinct levelx from dw_product;

LEVELX
-------------------
ALL PRODUCTS
CATEGORY
ITEM
PSA
SUB_CATEGORY

Avoid Snowflakes

Avoid natural desire to normalize model:
• Complicates end-user query construction
• Adds additional level of “JOIN” complexity
• Database optimizers do not handle very well
• Saves some space at the cost of longer queries

[Figure: Snowflake Model]

Common Aggregations

Build end-user driven aggregate tables:
• By time (e.g. week, month, quarter, year)
• By geographic regions (e.g. time zones)
• By end-user reporting interests (e.g. beer)
• By dimension hierarchy (e.g. product category)
• Aggregates should be 5 to 10 times smaller
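A time-based aggregate can be sketched as follows. This is a minimal example under assumptions: SQLite stands in for Oracle, the tables are toy-sized, and dw_sales_fact, dw_sales_by_week and the week labels are hypothetical. It rolls daily detail rows up to weekly rows and checks the size reduction against the 5 to 10 times guideline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_period (period_id INTEGER PRIMARY KEY, day TEXT, week TEXT);
CREATE TABLE dw_sales_fact (period_id INTEGER, product_id INTEGER, units_sold INTEGER);
""")

# Detail data: 30 days x 4 products, 10 units per row.
for d in range(1, 31):
    week_label = f"1998-11-W{(d - 1) // 7 + 1}"  # simple 7-day buckets
    conn.execute(
        "INSERT INTO dw_period VALUES (?, ?, ?)",
        (d, f"1998-11-{d:02d}", week_label),
    )
    for p in range(1, 5):
        conn.execute("INSERT INTO dw_sales_fact VALUES (?, ?, ?)", (d, p, 10))

# Aggregate table: one row per week and product instead of per day and product.
conn.execute("""
CREATE TABLE dw_sales_by_week AS
SELECT p.week AS week, f.product_id AS product_id, SUM(f.units_sold) AS units_sold
FROM dw_sales_fact f
JOIN dw_period p ON p.period_id = f.period_id
GROUP BY p.week, f.product_id
""")

detail_rows = conn.execute("SELECT COUNT(*) FROM dw_sales_fact").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM dw_sales_by_week").fetchone()[0]
agg_units = conn.execute("SELECT SUM(units_sold) FROM dw_sales_by_week").fetchone()[0]
ratio = detail_rows / agg_rows

print(detail_rows, agg_rows, ratio)
```

The aggregate keeps the same totals as the detail table while holding a fraction of the rows, which is the property that makes it worth building.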

[Figure: Time Aggregates]

[Figure: Non-Time Aggregates]

Index Design

• All fact table foreign key columns must have individual bitmap indexes on them
• All dimension table non-key columns should have individual bitmap indexes
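Since the rule is "one bitmap index per column", the DDL is repetitive and easy to generate. The helper below is a hypothetical sketch that emits one Oracle CREATE BITMAP INDEX statement per column; the table name, column names and the _bx naming suffix are assumptions for illustration, and NOLOGGING is included to match the fact table advice elsewhere in this deck. The generated names stay under Oracle 8i's 30-character identifier limit for these inputs, but longer names would need shortening.

```python
def bitmap_index_ddl(table, columns):
    """Return one CREATE BITMAP INDEX statement per column (hypothetical helper)."""
    return [
        f"CREATE BITMAP INDEX {table}_{col}_bx ON {table} ({col}) NOLOGGING"
        for col in columns
    ]


# Hypothetical fact table with three dimension foreign keys.
fact_fk_columns = ["period_id", "product_id", "location_id"]
statements = bitmap_index_ddl("dw_sales_fact", fact_fk_columns)

for stmt in statements:
    print(stmt + ";")
```

The same helper could be pointed at the non-key columns of each dimension table.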

[Figure: 10 B-Tree Indexes]

[Figure: 48 Bitmap Indexes!!!]

Key Fact Table Issues

Fact tables should:
• NOT create or enable foreign key constraints
• NOT create or enable table check constraints
• NOT create or enable primary/unique constraints (use unique indexes, which offer parallel creation)
• NOT create or enable column check constraints (other than simple NOT NULL check constraints)
• NOT create or enable “row” level triggers
• NOT enable logging on tables or their indexes

[Figure: No PK/UK/FK Constraints]

Key Oracle Issues

Trust me: there is no way to build a large DW in Oracle 7.X.

Very brief overview in the next few slides of:
• Partitioning options
• Indexing options
• Comparative timings
• Tuning ad-hoc Star queries
• Serial versus Parallel queries
• Materialized Views …

Oracle Partitioning

• Way beyond the scope of dimensional modeling, but:
• Use Range or List Partitioning using your time dimension
• Fact unique index = local, prefixed b-tree index
• Fact time index = local, prefixed bitmap index
• Fact non-time index = local, non-prefixed bitmap index
• If any non-time dimension provides a good locality of reference for typical user queries, then sub-partition on that dimension (i.e. use 8i’s new composite partitioning)
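Range partitioning on the time dimension means one partition per period, so the partition clause is another candidate for generation. The sketch below builds an Oracle-style PARTITION BY RANGE clause from a list of partition names and upper bounds. It assumes, purely for illustration, that period_id is a numeric key that sorts chronologically (shown here as YYYYMMDD values); the helper name and partition names are hypothetical.

```python
def range_partition_clause(column, bounds):
    """Build a PARTITION BY RANGE clause (hypothetical helper).

    bounds is a list of (partition_name, upper_bound) pairs in ascending
    order; each partition holds rows whose key is below its upper bound.
    """
    parts = [
        f"  PARTITION {name} VALUES LESS THAN ({upper})"
        for name, upper in bounds
    ]
    return f"PARTITION BY RANGE ({column}) (\n" + ",\n".join(parts) + "\n)"


clause = range_partition_clause(
    "period_id",
    [
        ("p1998_10", 19981101),    # October 1998
        ("p1998_11", 19981201),    # November 1998
        ("p_max", "MAXVALUE"),     # catch-all for later periods
    ],
)
print(clause)
```

The resulting text would be appended to the fact table's CREATE TABLE statement; indexes created with the LOCAL keyword are then partitioned the same way as the table.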

Indexing Options!!!

[Figure: decision tree of table and index choices. Table branches: OBJECT or RELATIONAL; TABLE IN CLUSTER or TABLE IN TABLESPACE; ORG INDEX or ORG HEAP; TABLE NONPARTITION or TABLE PARTITION. Index branches: CLUSTER INDEX, NONCLUSTER INDEX or TABLE-IZED INDEX; INDEX NONPARTITION or INDEX PARTITION; GLOBAL or LOCAL. The tree ends in 12 numbered options: 1. BTREE, 2. BTREE, 3. BTREE, 4. BITMAP, 5. BTREE, 6. BTREE, 7. BITMAP, 8. BTREE, 9. BTREE, 10. BITMAP, 11. BTREE, 12. BITMAP]

Oracle 8i Table Option Timings

Fact implementations timed:
• Regular “Heap” Table
• Single Column Partition
• Multi Column Partition
• Composite Partition
• Index Organized Table
• Partition Index Organized

[Table: Timing per fact implementation. Values as extracted, pairing with rows not recoverable: 9.508, 14.293, 4.902, .747, 4.987, 6.319, 12.]

NOTE: specific to my data and user queries

Tuning Star Queries

• Way beyond the scope of dimensional modeling, but:
• Use Oracle 8.X’s Range Partitioning based upon your time dimension (do not try to use hash or composite partitioning)
• Fact unique index uses local, prefixed b-tree index
• Fact time index uses local, prefixed bitmap index
• Fact non-time index uses local, non-prefixed bitmap index

Typical User Query

Query: beer and coffee sales for November of 98 in Dallas
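A query of this shape can be sketched against a toy star schema. Everything below is an assumption made for illustration: SQLite stands in for Oracle, and the table layouts, column names and sample rows are hypothetical. The point is the pattern: filter on descriptive dimension columns, join to the fact table on the keys, and sum an additive measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_period   (period_id   INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dw_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dw_location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dw_sales_fact (
    period_id INTEGER, product_id INTEGER, location_id INTEGER, sales_amt INTEGER
);

INSERT INTO dw_period   VALUES (1, 'NOV', 1998), (2, 'DEC', 1998);
INSERT INTO dw_product  VALUES (1, 'BEER'), (2, 'COFFEE'), (3, 'TEA');
INSERT INTO dw_location VALUES (1, 'DALLAS'), (2, 'AUSTIN');

INSERT INTO dw_sales_fact VALUES
    (1, 1, 1, 100),   -- beer, Nov 98, Dallas: included
    (1, 2, 1, 40),    -- coffee, Nov 98, Dallas: included
    (1, 3, 1, 999),   -- tea: wrong product
    (2, 1, 1, 999),   -- December: wrong period
    (1, 1, 2, 999);   -- Austin: wrong city
""")

query = """
SELECT pr.category, SUM(f.sales_amt) AS sales
FROM dw_sales_fact f
JOIN dw_period   pe ON pe.period_id   = f.period_id
JOIN dw_product  pr ON pr.product_id  = f.product_id
JOIN dw_location lo ON lo.location_id = f.location_id
WHERE pe.month = 'NOV' AND pe.year = 1998
  AND pr.category IN ('BEER', 'COFFEE')
  AND lo.city = 'DALLAS'
GROUP BY pr.category
ORDER BY pr.category
"""
result = conn.execute(query).fetchall()
print(result)
```

In Oracle, this is the query shape that the star transformation targets, using the bitmap indexes on the fact table's foreign key columns.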

[Figure: Best Explain Plan, Star Transformation]

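Oracle 8i Query Options

Configurations compared (columns: Explain Plan, UNIX, NT):
• Serial, No Partition
• Serial, with Partition
• Parallel, No Partition
• Parallel, with Partition

[Table: values as extracted, pairing with rows and columns not recoverable: .34 4 5. .62 5 ORA600 ORA600 11.578 11. .14 0 25. 9.45 4 .688 22.]

NOTE: specific to my data and user queries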

Oracle 8i Materialized Views

• Way beyond the scope of dimensional modeling, but:
• Special form of snapshots (i.e. replication)
• End-users direct all queries against detail table
• Optimizer rewrites queries to use best aggregate
• Optimizer suggests new aggregates based on load
• Eliminates need for numerous aggregation programs

Other DW Presentations

Optimizing Data Warehouse Ad-Hoc Queries against "Star Schemas"

Attendees will learn optimal techniques for designing, monitoring and tuning "Star Schema" Data Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000 foot overview focusing on hardware and software architectures, with some database design. This presentation provides the ground level, detailed recipe for successfully querying tables whose sizes exceed 500 million rows. Issues covered will include table and index designs, partitioning options, statistics and histograms, Oracle initialization parameters and star transformation explain plans. Attendees should be DBAs familiar with "Star Schema" database designs, have at least one year's experience with Oracle 8.X, and some exposure to Oracle 8i.

Optimizing Data Warehouse Loading via Parallelized Pro-C and SQL

Attendees will learn optimal techniques for coding, monitoring and tuning parallel loading of Data Warehouses in Oracle 8.0 and 8i. While there are numerous books and papers on Data Warehousing with Oracle, they generally provide a 50,000 foot overview focusing on hardware and software architectures, with some database design. This presentation provides the ground level, detailed recipe for high speed loading of tables whose sizes exceed 500 million rows. Issues covered will include database instance options, optimizer choices, partitioning options, table and index designs, plus Oracle initialization parameters. Attendees should be DBAs or senior developers familiar with Oracle 8.0, ProC and SMP or MPP UNIX environments.

THANK YOU FOR LISTENING