Dimensional Modelling

Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. Dimensional modeling has two basic concepts:
• Facts
• Dimensions
Other related concepts:
• Aggregates
• Meta-data

Definition
• A fact is a collection of related data items, consisting of measures.
• A fact is a focus of interest for the decision-making process.
• Measures are continuously valued attributes that describe facts (Golfarelli et al.).
• A fact is a business measure (Kimball and Ross).

What exactly is being analysed? What numbers are being analysed?

Examples of Facts
A university provides education services to its students. What are its facts and measures?

Fact                  Measures
Applications          number, revenue from prospectus sales
Enrollment            number, revenue
Student Performance   grades, marks, percentage marks, division
Student Placement     designation, nature of job, salary
Student awards        title, amount

Fact
Each fact typically represents one of the following, and can be used in analysing the business or business process:
• a business item: an order
• a business transaction: order processing
• an event: arrival of an order

Some Aspects of Facts
• A fact is continuously valued: it takes a value from a broad range of values, such as the set of integers or the real numbers.
• The most useful facts are numeric and additive, because we almost never work with a single fact.
• Textual facts occur very rarely: their free format and unpredictable contents make them practically impossible to analyse. Recent interest in unstructured DW looks at these.

Types of Facts
• Additive: facts that can be summed up through all of the dimensions in the fact table, e.g. Sales_Amount along date, product.
• Semi-additive: facts that can be summed up for some of the dimensions in the fact table but not the others, e.g. current_balance along account, not along date.
• Non-additive: facts that cannot be summed up for any of the dimensions present in the fact table, e.g. percentage or profit margin.
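The three types can be sketched on a tiny fact table. The rows, the column layout and the helper names below are hypothetical. The point is how each kind of measure is combined: additive measures are summed freely, a semi-additive balance is summed across accounts but snapshotted along date, and a non-additive margin is recomputed from its additive parts.

```python
# Hypothetical fact rows with made-up figures (not taken from the notes):
# (date, account, product, sales_amount, profit, current_balance)
FACTS = [
    ("2024-01-01", "A1", "P1", 100.0, 10.0, 500.0),
    ("2024-01-01", "A2", "P2", 200.0, 60.0, 300.0),
    ("2024-01-02", "A1", "P1", 150.0, 30.0, 450.0),
    ("2024-01-02", "A2", "P2", 50.0, 20.0, 350.0),
]


def total_sales(rows=FACTS):
    """Additive: sales_amount can be summed along every dimension."""
    return sum(r[3] for r in rows)


def total_balance_on(day, rows=FACTS):
    """Semi-additive: balances can be summed across accounts for a single date."""
    return sum(r[5] for r in rows if r[0] == day)


def closing_balance(account, rows=FACTS):
    """Along date, take the latest snapshot instead of adding balances up."""
    latest = max((r for r in rows if r[1] == account), key=lambda r: r[0])
    return latest[5]


def profit_margin(rows=FACTS):
    """Non-additive: recompute the ratio from additive parts; never add margins."""
    return sum(r[4] for r in rows) / sum(r[3] for r in rows)


print(total_sales())                   # 500.0
print(total_balance_on("2024-01-02"))  # 800.0
print(closing_balance("A1"))           # 450.0
print(profit_margin())                 # 0.24
```

Adding the four per-row margins (0.1 + 0.3 + 0.2 + 0.4) would give 1.0. That figure has no business meaning, which is why the sketch keeps profit and sales as stored additive measures.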

Dimension Definition
• The parameter over which we want to perform analysis of facts
• The parameter that gives meaning to a measure
  – number of customers is a fact; perform analysis over time
  – sales is a fact; perform analysis over region, product, time
• Discretely valued description that is more or less constant and participates in constraints
• Qualifying characteristics that provide additional perspective to a given fact

Examples of Dimensions
A university provides education services to its students. What are its facts and dimensions?
Facts: Applications, Enrollment, Student Performance, Student Placement, Student awards
Dimensions used across these facts: Age, Year, Discipline, Region, Grades

Dimensions and their Values
Dimension     Dimension Values
Age           10, 11, 12, …
Year          1999, 2000, …
Region        North, South, …
Discipline    CSE, ECE, IT, …
Grades        A+, A, …
Student       name of student

Aspects of Dimensions
• The values of dimensions are expected to change little over time, but there are exceptions:
  – slowly changing dimensions
  – rapidly changing dimensions
  Such changes need to be handled.
• Dimensions are the primary source of query constraints, groupings and report headings.
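These notes do not say how changes should be handled. One widely used option for slowly changing dimensions is "Type 2" versioning: close the current dimension row and add a new row with a new surrogate key, so older facts keep pointing at the old version. The sketch below assumes that technique. The class, field names and sample customer are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    surrogate_key: int          # key that fact rows point to
    customer_id: str            # natural (business) key, hypothetical
    city: str                   # attribute that changes slowly
    valid_from: date
    valid_to: Optional[date]    # None marks the current version


def apply_change(rows: List[CustomerVersion], customer_id: str,
                 new_city: str, change_date: date) -> List[CustomerVersion]:
    """Type 2 style change: close the current version and append a new one."""
    current = [r for r in rows if r.customer_id == customer_id and r.valid_to is None]
    if current and current[0].city == new_city:
        return list(rows)  # nothing actually changed
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    out = [replace(r, valid_to=change_date) if r in current else r for r in rows]
    out.append(CustomerVersion(next_key, customer_id, new_city, change_date, None))
    return out


rows = [CustomerVersion(1, "C42", "Pune", date(2020, 1, 1), None)]
rows = apply_change(rows, "C42", "Delhi", date(2023, 6, 1))
for r in rows:
    print(r.surrogate_key, r.city, r.valid_from, r.valid_to)
# 1 Pune 2020-01-01 2023-06-01
# 2 Delhi 2023-06-01 None
```

A rapidly changing attribute would multiply such versions quickly, so it is usually handled differently. That choice is outside this sketch.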

Dimension Hierarchies/Categories
Dimensions are composed of smaller units called categories or members:
• simpler components forming a hierarchy, e.g. country, zone, branch, unit. Hierarchies are a basis for drill down and roll-up.
• special, notable units, e.g. holidays, for special queries such as sales performance on holidays.
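Roll-up along a hierarchy such as unit → branch → zone → country amounts to repeatedly adding each member's total into its parent. Drill down is the reverse: asking for a finer level again. The sketch below uses an invented hierarchy and made-up sales figures. It stores the parent of each member per level.

```python
from collections import defaultdict

# Hypothetical location hierarchy, invented for illustration:
# unit -> branch -> zone -> country
PARENT = {
    "unit": {"U1": "B1", "U2": "B1", "U3": "B2"},
    "branch": {"B1": "Z1", "B2": "Z2"},
    "zone": {"Z1": "India", "Z2": "India"},
}
LEVELS = ["unit", "branch", "zone", "country"]

# Made-up sales at the finest level of the hierarchy.
SALES_BY_UNIT = {"U1": 10, "U2": 5, "U3": 7}


def roll_up(totals, level):
    """Move totals one level up by adding each member into its parent."""
    out = defaultdict(int)
    for member, amount in totals.items():
        out[PARENT[level][member]] += amount
    return dict(out)


def totals_at(target):
    """Roll up from 'unit' to the target level (a finer target is a drill down)."""
    totals, level = SALES_BY_UNIT, "unit"
    while level != target:
        totals = roll_up(totals, level)
        level = LEVELS[LEVELS.index(level) + 1]
    return totals


print(totals_at("branch"))   # {'B1': 15, 'B2': 7}
print(totals_at("country"))  # {'India': 22}
```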

Organising Facts and Dimensions
The model should:
• provide drill down/roll up along dimension hierarchies
• provide good data access
• be query centric
• be optimised for queries and analysis
• allow each dimension to interact fully with the fact

The Star Schema
[Figure: a central Fact table joined to five surrounding Dimension tables]
A DW is a collection of star schemata.

Example: Facts and Dimensions
[Figure: a Sales fact measured in Rupees, with dimension hierarchies Product name → Product type, City → Region, and Month → Season → Year]

Computing Fact Sizes
Let there be 5000 products, 60 months and 50 cities, and assume one sale fact per product, per month, per city.
Number of sales facts = 5000*60*50 = 15000000

Sparse Facts
Not all 5000 products may be sold each month in each city. Assume that 3000 products are sold each month in each city.
Number of sales facts = 3000*60*50 = 9000000
Approximately 60% of the cube is occupied and 40% is empty.
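The fact-size arithmetic for the dense and sparse cases can be checked in a few lines. This sketch only restates the numbers given in the notes (5000 products, 60 months, 50 cities, 3000 products sold per city per month).

```python
products, months, cities = 5000, 60, 50

# Dense case: one sale fact per product, per month, per city.
dense_facts = products * months * cities

# Sparse case from the notes: only 3000 products sell in each city each month.
sold_per_city_month = 3000
sparse_facts = sold_per_city_month * months * cities

occupancy = sparse_facts / dense_facts
print(dense_facts)          # 15000000
print(sparse_facts)         # 9000000
print(f"{occupancy:.0%}")   # 60%
```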

Aggregation
Aggregates are pre-calculated summaries along dimension hierarchies, derived from basic facts. Aggregation is performed in order to speed up common queries.
Example: we need the total sales for each region, product-wise and month-wise.
Number of products = 5000
Number of regions = 5
Number of months = 60
Total number of facts = 5000*5*60 = 1500000
Space-time tradeoff: if the frequency of use is high, then pay the storage expense.
Aggregation guideline: if the number of facts summarised is more than 10, then do aggregation.
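Building the region aggregate means grouping the city-level facts by (product, month, region) once and storing the result, so later queries read the smaller table. In the sketch below, the city-to-region mapping, the product codes and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical city -> region mapping (the notes assume 50 cities in 5 regions).
CITY_REGION = {"Pune": "West", "Mumbai": "West", "Chennai": "South"}

# Base facts keyed by (product, month, city), sales in Rupees; figures are made up.
BASE = {
    ("P1", "2024-01", "Pune"): 100,
    ("P1", "2024-01", "Mumbai"): 250,
    ("P1", "2024-01", "Chennai"): 80,
    ("P2", "2024-01", "Pune"): 40,
}


def aggregate_by_region(base):
    """Pre-compute total sales per (product, month, region) from city-level facts."""
    agg = defaultdict(int)
    for (product, month, city), rupees in base.items():
        agg[(product, month, CITY_REGION[city])] += rupees
    return dict(agg)


REGION_AGG = aggregate_by_region(BASE)  # stored once, reused by many queries
print(REGION_AGG)
# {('P1', '2024-01', 'West'): 350, ('P1', '2024-01', 'South'): 80, ('P2', '2024-01', 'West'): 40}
```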

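Aggregation
[Figure: aggregation levels along the Product (product name → product type), Location (city → region) and Time (month → season → year) hierarchies; the base facts involve no aggregation, and rising along one, two or three of these hierarchies gives one-way, two-way and three-way aggregation]
When aggregation is done by rising along n dimensions, n-way aggregation is said to be performed.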
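The n-way naming can be made concrete by enumerating every combination of hierarchy levels and counting how many dimensions are raised above their finest level. The sketch also gives the maximum number of facts for each combination. Products, cities, regions and months come from the notes. Twenty seasons (four a year over five years) is an assumption that matches the 0.5M figure for two-way aggregation. Fifty product types is purely hypothetical.

```python
from itertools import product as cartesian

# Levels from finest to coarsest, with member counts (see assumptions above).
HIERARCHIES = {
    "product": [("product name", 5000), ("product type", 50)],
    "location": [("city", 50), ("region", 5)],
    "time": [("month", 60), ("season", 20), ("year", 5)],
}


def aggregation_levels():
    """Yield (n, levels, max_facts) for every combination of hierarchy levels."""
    dims = list(HIERARCHIES)
    for choice in cartesian(*(HIERARCHIES[d] for d in dims)):
        # n counts the dimensions raised above their finest level
        n = sum(1 for d, level in zip(dims, choice) if level != HIERARCHIES[d][0])
        max_facts = 1
        for _, members in choice:
            max_facts *= members
        yield n, tuple(name for name, _ in choice), max_facts


for n, levels, max_facts in aggregation_levels():
    label = "no aggregation" if n == 0 else f"{n}-way"
    print(f"{label:15} {levels} {max_facts}")
```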

Sparsity and Aggregation
As the amount of aggregation increases, sparsity decreases.
• One-way aggregation on regions results in 1.5M facts. The probability of all 5000 products being sold in a month in a region is higher than of all 5000 being sold in a month in a city.
• Two-way aggregation on regions and season results in 0.5M facts. The probability of all 5000 products being sold in a season in a region is higher than of all 5000 being sold in a month in a region.
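A small simulation shows why sparsity falls as aggregation rises. A coarser cell is filled as soon as any one of the finer cells it covers is filled. The cube size, the sell probability and the seeded random data below are all assumptions chosen only to keep the run fast. The 10 cities per region matches the notes' 50 cities in 5 regions.

```python
import random

# A small hypothetical cube so the simulation runs quickly (not the notes' sizes).
PRODUCTS, CITIES, MONTHS = 200, 50, 12
CITIES_PER_REGION = 10   # 50 cities -> 5 regions, as in the notes
MONTHS_PER_SEASON = 3    # assumed: 4 seasons a year
SELL_PROB = 0.2          # assumed chance a product sells in a city in a month

rng = random.Random(42)
base = {
    (p, c, m)
    for p in range(PRODUCTS)
    for c in range(CITIES)
    for m in range(MONTHS)
    if rng.random() < SELL_PROB
}

# A coarser cell is filled as soon as any of the finer cells it covers is filled.
by_region = {(p, c // CITIES_PER_REGION, m) for p, c, m in base}
by_region_season = {(p, r, m // MONTHS_PER_SEASON) for p, r, m in by_region}

regions = CITIES // CITIES_PER_REGION
seasons = MONTHS // MONTHS_PER_SEASON
density = {
    "no aggregation": len(base) / (PRODUCTS * CITIES * MONTHS),
    "one-way (region)": len(by_region) / (PRODUCTS * regions * MONTHS),
    "two-way (region, season)": len(by_region_season) / (PRODUCTS * regions * seasons),
}
for level, d in density.items():
    print(f"{level:25} {d:.1%} filled")
```

The density can never drop when moving to a coarser level. Each coarse cell covers k fine cells, so the filled coarse cells number at least (filled fine cells)/k, while the total number of cells shrinks by exactly k.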

Aggregation and the Star Schema
• Each aggregate is a fact with its own derived dimensions.
• Derived dimensions may be defined ‘on the fly’, e.g. a sales summary by quarter, even though quarter was not in the original dimension hierarchy.
• Each aggregate has its own star schema.
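A derived dimension such as quarter can be computed from the month at the moment the summary is built, without storing it in the time hierarchy. In this sketch, the "YYYY-MM" month keys and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly sales facts: ("YYYY-MM", amount).
MONTHLY_SALES = [("2024-01", 100), ("2024-02", 120), ("2024-04", 90), ("2024-07", 60)]


def quarter_of(month_key):
    """Derive the quarter on the fly; it is not stored in any dimension table."""
    year, month = month_key.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"


def sales_by_quarter(facts):
    summary = defaultdict(int)
    for month_key, amount in facts:
        summary[quarter_of(month_key)] += amount
    return dict(summary)


print(sales_by_quarter(MONTHLY_SALES))
# {'2024-Q1': 220, '2024-Q2': 90, '2024-Q3': 60}
```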

Metadata
Different definitions:
• Data about the data
• Tables of contents for the data
• Catalog for the data
• Data warehouse atlas
• Data warehouse roadmap
Metadata contains the answers to questions about the data in the Data Warehouse.

Central Role of Metadata

Metadata for End Users

Example
Entity Name: Customer
Aliases: Account, Client
Definition: Anyone who purchases hotel rooms
Source Systems: Reservations, Accounts, Housekeeping
Create Date: 1 January 2000
Last Update Date: 13 September 2003
Update Cycle: Weekly
Full Refresh Cycle: Six months
Data Quality Review: 15 September 2003
Planned Archival: Every six months
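An end-user metadata entry like the hotel Customer example can be held as a simple record, and a list of such records acts as a small "catalog for the data". The class, its field names and the `lookup` helper are hypothetical and not part of any metadata tool. The values are taken from the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EntityMetadata:
    """One end-user metadata entry; the fields follow the Customer example."""
    entity_name: str
    aliases: List[str]
    definition: str
    source_systems: List[str]
    create_date: date
    last_update_date: date
    update_cycle: str
    full_refresh_cycle: str
    data_quality_review: date
    planned_archival: str


customer = EntityMetadata(
    entity_name="Customer",
    aliases=["Account", "Client"],
    definition="Anyone who purchases hotel rooms",
    source_systems=["Reservations", "Accounts", "Housekeeping"],
    create_date=date(2000, 1, 1),
    last_update_date=date(2003, 9, 13),
    update_cycle="weekly",
    full_refresh_cycle="six months",
    data_quality_review=date(2003, 9, 15),
    planned_archival="every six months",
)

# A tiny "catalog for the data": find an entry by its name or any alias.
CATALOG = [customer]


def lookup(name: str) -> Optional[EntityMetadata]:
    for entry in CATALOG:
        if name == entry.entity_name or name in entry.aliases:
            return entry
    return None


print(lookup("Client").definition)  # Anyone who purchases hotel rooms
```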

Metadata for IT Professionals

Metadata Driven Data Warehouse Process

Data Acquisition Metadata Types

Information Delivery
• Functions:
  – Report generation
  – Query processing
  – Complex analysis
• Metadata recorded in the information delivery functional area
  – relates to predefined queries, predefined reports, and input parameter definitions for queries and reports
  – also includes information for OLAP

Information Delivery Metadata Types

Challenges for Metadata Management
• Reconciling the metadata formats of several tools
• No industry-wide accepted standards
• Centralized metadata repository vs. a collection of fragmented metadata stores
• No easy and accepted methods of passing metadata
• Preserving version control of metadata
• Unifying the metadata relating to the data sources can be an enormous task

Common Warehouse Metamodel
Foundation Metadata
• Business information about model elements
• Data types
• Keys and indexes
• Expressions
• Software Deployment: software deployed in the DW
• Type Mapping: mapping of data types between different systems

Common Warehouse Metamodel
Resource Metadata
• Relational data sources
• Record data sources
• Multidimensional resources
• XML data sources
Analysis Metadata
• Data transformation tools
• OLAP processing tools
• Data mining tools
• Information visualisation tools
• Business taxonomy and glossary

Common Warehouse Metamodel
Management Metadata
• Warehouse processes
• Results of warehouse operations

The Star Schema Revisited
• The star contains ‘detailed’ facts and dimensions.
• Aggregates are facts and have their own dimensions.
• Meta-data support is built around the star schema.

Star Schema: Benefits
• Depicts a fuller description of each dimension
• Explicitly shows multiple levels of aggregation on each dimension
• Depicts multiple facts at the intersection of all dimensions
• Directly implementable in a Relational DBMS
• Can utilize new, accelerated approaches to indexing and joining: STARindex, STARjoin
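To illustrate "directly implementable in a Relational DBMS", here is a minimal star schema in SQLite (via Python's standard `sqlite3` module), following the sales example with product, location and time dimensions. Table names, column names and rows are hypothetical. The query uses ordinary joins. It does not use STARindex or STARjoin, which are specialised vendor techniques.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, product_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT, season TEXT, year INTEGER);
-- The fact table: one foreign key per dimension plus the measure (Rupees).
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    rupees REAL,
    PRIMARY KEY (product_key, location_key, time_key)
);
INSERT INTO product_dim VALUES (1, 'Soap', 'Toiletries'), (2, 'Tea', 'Beverages');
INSERT INTO location_dim VALUES (1, 'Pune', 'West'), (2, 'Chennai', 'South');
INSERT INTO time_dim VALUES (1, 'Jan', 'Winter', 2024), (2, 'May', 'Summer', 2024);
INSERT INTO sales_fact VALUES (1, 1, 1, 100), (2, 1, 1, 50), (1, 2, 2, 70), (2, 2, 1, 30);
""")

# Star join: constrain and group on dimension attributes, sum the fact measure.
rows = conn.execute("""
SELECT l.region, p.product_type, SUM(f.rupees)
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN location_dim l ON l.location_key = f.location_key
JOIN time_dim t ON t.time_key = f.time_key
WHERE t.year = 2024
GROUP BY l.region, p.product_type
ORDER BY l.region, p.product_type
""").fetchall()
print(rows)
# [('South', 'Beverages', 30.0), ('South', 'Toiletries', 70.0), ('West', 'Beverages', 50.0), ('West', 'Toiletries', 100.0)]
```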

Dimensional Modelling vs. Spread Sheet
Annual product sales by region
[Spreadsheet: one row per product (Stibes, Farkles, Teglers, Qwerts) plus a TOTALS row, and one column per region (Southern, Western, Northern, Eastern) plus a TOTAL column]
Is this a Relational Table? What is the Entity? What is the Identifier? What are the Attributes? How to make it a Relational Table? How many Fact types? How many Dimensions?

Dimensional Modelling vs. Relations
How many Facts? How many Dimensions? What type of Table? What is the Identifier?
[Table with columns REGION, PRODUCT and SALES: one row for each region/product pair (Southern, Western, Northern, Eastern × Stibes, Farkles, Teglers, Qwerts), followed by summary rows (all)/Stibes, (all)/Qwerts, Southern/(all) and (all)/(all)]
Where are the Dimension Tables?
REGION:
NAME        LEVEL
(all)       1
Southern    2
Western     2
Northern    2
Eastern     2
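Turning the spreadsheet into a relation is essentially an unpivot. Each cell becomes a (REGION, PRODUCT, SALES) row, and the spreadsheet's margin totals become rows that use "(all)" as the member. In this sketch, the pair (REGION, PRODUCT) identifies each row. The figures are made up and only two regions are used, to keep it short.

```python
# Hypothetical figures, not the ones on the slide; every region lists every product.
SHEET = {
    "Southern": {"Stibes": 10, "Farkles": 20, "Teglers": 30, "Qwerts": 40},
    "Western": {"Stibes": 5, "Farkles": 15, "Teglers": 25, "Qwerts": 35},
}


def to_relation(sheet):
    """Unpivot a region x product grid into (REGION, PRODUCT, SALES) rows and
    add '(all)' rows for the totals that a spreadsheet keeps in its margins."""
    rows = [(region, product, sales)
            for region, by_product in sheet.items()
            for product, sales in by_product.items()]
    products = list(next(iter(sheet.values())))  # product order of the first region
    for product in products:
        rows.append(("(all)", product, sum(r.get(product, 0) for r in sheet.values())))
    for region, by_product in sheet.items():
        rows.append((region, "(all)", sum(by_product.values())))
    rows.append(("(all)", "(all)", sum(sum(r.values()) for r in sheet.values())))
    return rows


RELATION = to_relation(SHEET)
for row in RELATION:
    print(row)
```

The "(all)" member is what the REGION dimension table records at LEVEL 1, above the individual regions at LEVEL 2.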

ER Diagram
REGION: Region ID, Region Name
STORE: Store ID, Store Name, Address, City, State, ZipCode, Region ID (fk)
SALES: Sales Date, Store ID (fk), Product ID (fk), Sale Amount, Sale Units
INVENTORY: Week, Store ID (fk), Product ID (fk), Quantity
DEPARTMENT: Department ID, Department Name
PRODUCT GROUP: Product Group ID, Product Group Desc., Department ID (fk)
PRODUCT: Product ID, Product Desc., Product Group ID (fk)

ER Diagram
• Good for OLTP:
  – update in exactly one place
  – no redundancy
  – oriented towards insertion, deletion and modification of data
• Weak entities/relationships create normalised structures
What are the facts and dimensions?

Transformation of ER to Star
[Figure: SALES FACTS at the centre, linked to a Product Dimension (DEPARTMENT → PRODUCT GROUP → PRODUCT ITEM), a Location Dimension (REGION → STORE) and a Time Dimension (YEAR → MONTH → WEEK → DATE)]
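One step of this transformation is denormalising the normalised DEPARTMENT → PRODUCT GROUP → PRODUCT chain into a single wide product dimension. The sketch below does this in SQLite with a join. Table and column names follow the ER diagram but are assumptions, and the sample rows are hypothetical. STORE and REGION could be collapsed into a location dimension the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised ER tables (subset of the ER diagram; sample rows are hypothetical).
CREATE TABLE department (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE product_group (product_group_id INTEGER PRIMARY KEY, product_group_desc TEXT,
                            department_id INTEGER REFERENCES department(department_id));
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_desc TEXT,
                      product_group_id INTEGER REFERENCES product_group(product_group_id));
INSERT INTO department VALUES (1, 'Grocery');
INSERT INTO product_group VALUES (10, 'Beverages', 1), (11, 'Snacks', 1);
INSERT INTO product VALUES (100, 'Tea', 10), (101, 'Coffee', 10), (102, 'Chips', 11);

-- Star schema side: one wide product dimension holding the whole hierarchy.
CREATE TABLE product_dim AS
SELECT p.product_id AS product_key,
       p.product_desc AS product_item,
       g.product_group_desc AS product_group,
       d.department_name AS department
FROM product p
JOIN product_group g ON g.product_group_id = p.product_group_id
JOIN department d ON d.department_id = g.department_id;
""")

product_dim_rows = conn.execute("SELECT * FROM product_dim ORDER BY product_key").fetchall()
print(product_dim_rows)
# [(100, 'Tea', 'Beverages', 'Grocery'), (101, 'Coffee', 'Beverages', 'Grocery'), (102, 'Chips', 'Snacks', 'Grocery')]
```

The design trade-off is the reverse of the ER model's: department and group names are now repeated on every product row. In exchange, queries need one join to the product dimension instead of three.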