
Intelligent Data Strategies

AN INTRO TO DATA MARTS AND DATA WAREHOUSES
MARKUS BEAMER
BDPA-CHARLOTTE
WWW.BDPA-CHARLOTTE.ORG

ALSO AVAILABLE ON MOBEAMER.BLOGGER.COM

Data Warehouse or Data Mart
• There are many definitions of a data warehouse and a data mart, but no single standard definition. For our purposes we will define them as follows:

Data Warehouse
• Extreme Volume: Contains years of daily information at the lowest grain possible.
• Corporate wide: The grouping of data elements is dictated by the corporate structure.
• Facts and Dimensions: This system is typically made up of facts.
• Serves data to Datamarts: Will have data that is shared across groups.

Data Mart
• Specific Volume Sets: May only contain month-to-date information.
• Specific: Data is grouped by the needs of the team or group building the solution.
• Many Metrics: Has many metric tables and rollup grains.
• Serves data to Reports: Will have data specific to only the implementation group.

Traditional Reporting Solutions

Analyst(s)

Intelligence

Opens the door for:
• Conflicting numbers
• Human error
• Misunderstood data
• Inefficiency

Many systems performing many different business functions

Many reports from the multiple sources

Human intervention is needed to "make sense" of the different reports.

The Disorganized Closet

Like a disorganized closet: the data is there, but do you know that you have that special shirt you really need on Friday?

Organize Your Data Closet

• It takes time
• It takes discipline
• It takes a planned approach

Simple Data Warehousing

Intelligence

Automated reporting that understands all sources.

Many systems performing many different business functions

A centralized shared location for all the data.

Reports specific to each system can still be delivered.

Adding Data Marts

Data Mart

Intelligence

Data Mart

Reports specific to each system can still be delivered.

Automated reporting that understands all sources can be delivered.

Loading the Warehouse

The Website: A normal website where customers can come and order items from your company.

Website DB: This is your standard relational database system. It tracks a lot of information.

Data: A single data file containing all orders made by customers for that day.

Warehouse: All this data is stored in a single table called "Orders".

Extract, Transform, Load (ETL) is a process used to get data into your warehouse. The typical chain of events is as follows:
• Your front-end system, usually a transactional system, sends data and information to its relational database.
• At certain intervals (nightly, hourly or otherwise), a single data file is extracted and delivered to a specified location.
• The warehouse, on detecting a new file, transforms this information and loads it into its standard model.
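The chain of events above can be sketched in a few lines of Python using the standard-library sqlite3 and csv modules. This is only an illustration under assumptions: the CSV layout, the in-memory database, and the names DAILY_FILE, transform and load_orders are hypothetical, and the OrderDate column stands in for the deck's Date field.

```python
import csv
import io
import sqlite3

# Hypothetical contents of the nightly extract file. The columns follow the
# deck's sample source record; the CSV layout itself is an assumption.
DAILY_FILE = """Name,CustomerID,Product,Price,Date
Markus,1001,Airplane,19.00,06/25/2010
John,1010,Car,10.00,06/28/2010
"""

def transform(row):
    """Convert one raw CSV row into typed values with an ISO-formatted date."""
    month, day, year = row["Date"].split("/")
    return (
        row["Name"],
        int(row["CustomerID"]),
        row["Product"],
        float(row["Price"]),
        f"{year}-{month}-{day}",
    )

def load_orders(conn, file_obj):
    """Extract rows from the file, transform them, and load them into Orders."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Orders ("
        "Name TEXT, CustomerID INTEGER, Product TEXT, Price REAL, OrderDate TEXT)"
    )
    rows = [transform(r) for r in csv.DictReader(file_obj)]
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse database
loaded = load_orders(conn, io.StringIO(DAILY_FILE))
print(loaded, conn.execute("SELECT * FROM Orders").fetchall())
```

In a real setup the file would arrive on disk on a schedule and the load would be triggered when a new file is detected; the in-memory file here just keeps the sketch self-contained.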

Components of a Datawarehouse

External Sources

Stage

Data Marts

Reports

Facts
• Source: This record is a one-to-one match of the data delivered from the data file in the ETL process.

Name    CustomerID  Product   Price  Date
Markus  1001        Airplane  19.00  06/25/2010
John    1010        Car       10.00  06/28/2010

• Fact: A fact is a single measurable piece of data. Fact tables will not typically contain text fields. They will also always have a date associated with them, which represents when that fact was recorded.

Date        CustomerDimID  ProductDimID  Price
06/25/2010  1              101           19.00
06/28/2010  2              102           10.00

Dimensions
• Dimension: A dimension is an attribute found within the source data. In a perfect warehouse all text elements would be turned into dimensions. You may even do this with numeric values. Dimensions will typically speed up your reporting processes.

CustDimID  CustID  Name
1          1001    Markus
2          1010    John

ProductDimID  Name
101           Airplane
102           Car

• A Conformed Dimension table is a dimension table that is shared throughout all of your data marts within your warehouse. For example, a customer, product or employee dimension might be considered a core dimension.
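To make the split between source, dimension and fact tables concrete, here is a minimal sketch in Python with sqlite3, using the sample rows from the slides. The table names Source and Fact, the ISO date format, and the key assignment are assumptions for illustration; in particular, SQLite hands out surrogate keys 1, 2, ... here, so the product keys will not match the 101 and 102 shown in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source: one-to-one copy of the delivered data file (sample rows from the slides).
CREATE TABLE Source (Name TEXT, CustomerID INTEGER, Product TEXT, Price REAL, Date TEXT);
INSERT INTO Source VALUES
    ('Markus', 1001, 'Airplane', 19.00, '2010-06-25'),
    ('John',   1010, 'Car',      10.00, '2010-06-28');

-- Dimensions: one row per distinct attribute value, keyed by a surrogate ID.
CREATE TABLE Customer (CustDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
CREATE TABLE Product  (ProductDimID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Customer (CustID, Name)
    SELECT DISTINCT CustomerID, Name FROM Source ORDER BY CustomerID;
INSERT INTO Product (Name)
    SELECT DISTINCT Product FROM Source ORDER BY Product;

-- Fact: text fields are replaced by references to the dimension rows.
CREATE TABLE Fact (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
INSERT INTO Fact
    SELECT s.Date, c.CustDimID, p.ProductDimID, s.Price
    FROM Source s
    JOIN Customer c ON c.CustID = s.CustomerID
    JOIN Product  p ON p.Name = s.Product;
""")

# Joining the fact back to its dimensions recovers the original source record.
rebuilt = conn.execute("""
    SELECT f.Date, c.Name, p.Name, f.Price
    FROM Fact f
    JOIN Customer c ON c.CustDimID = f.CustomerDimID
    JOIN Product  p ON p.ProductDimID = f.ProductDimID
    ORDER BY f.Date
""").fetchall()
print(rebuilt)
```

The fact table ends up holding only a date, keys and a number, which is what lets the same dimension rows be shared across several fact or metric tables.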

Time Sensitive Dimensions
• Slowly Changing Dimensions allow for time-sensitive data tracking.
• Simply add start and end dates to each dimension table.
• This will impact your loading and transformation processes.

CustDimID  CustID  Name    State  Start       End
1          1001    Markus  NC     01/01/2010  01/01/2070
2          1010    John    SC     01/01/2010  01/01/2070

Two customers: one in NC, one in SC.

CustDimID  CustID  Name    State  Start       End
1          1001    Markus  NC     01/01/2010  01/31/2010
2          1010    John    SC     01/01/2010  01/01/2070
3          1001    Markus  SC     02/01/2010  01/01/2070

Markus moves to SC in February. You can still report accurate NC sales for January because of the start and end dates.
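The impact on the loading process can be sketched as follows: when an attribute changes, the load closes the current dimension row and opens a new one. This is a hedged illustration in Python with sqlite3, not a prescribed method. The helper names change_state and state_on are hypothetical, dates are stored as ISO strings, and the columns are named StartDate and EndDate because END is a reserved word in SQLite.

```python
import sqlite3
from datetime import date, timedelta

FAR_FUTURE = "2070-01-01"  # open-ended end date, as used in the slides

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT, State TEXT,
    StartDate TEXT, EndDate TEXT);
INSERT INTO Customer VALUES
    (1, 1001, 'Markus', 'NC', '2010-01-01', '2070-01-01'),
    (2, 1010, 'John',   'SC', '2010-01-01', '2070-01-01');
""")

def change_state(conn, cust_id, new_state, effective):
    """Close the customer's current row and open a new row for the new state."""
    previous_day = (date.fromisoformat(effective) - timedelta(days=1)).isoformat()
    name = conn.execute(
        "SELECT Name FROM Customer WHERE CustID = ? AND EndDate = ?",
        (cust_id, FAR_FUTURE),
    ).fetchone()[0]
    conn.execute(
        "UPDATE Customer SET EndDate = ? WHERE CustID = ? AND EndDate = ?",
        (previous_day, cust_id, FAR_FUTURE),
    )
    conn.execute(
        "INSERT INTO Customer (CustID, Name, State, StartDate, EndDate) "
        "VALUES (?, ?, ?, ?, ?)",
        (cust_id, name, new_state, effective, FAR_FUTURE),
    )

def state_on(conn, cust_id, day):
    """Return the state that was valid for the customer on the given day."""
    return conn.execute(
        "SELECT State FROM Customer "
        "WHERE CustID = ? AND ? BETWEEN StartDate AND EndDate",
        (cust_id, day),
    ).fetchone()[0]

change_state(conn, 1001, "SC", "2010-02-01")
print(state_on(conn, 1001, "2010-01-15"), state_on(conn, 1001, "2010-02-15"))  # NC SC
```

Facts loaded in January keep pointing at the NC row and facts loaded from February onward point at the new SC row, so reports for either period stay accurate.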

Hierarchies
• Hierarchies are dimension tables that reflect parent-to-child relationships. Typically a hierarchy table will be used to "roll up" metrics to different levels.
• We can turn the product dimension table into a hierarchy by adding a parent product code.

ProductDimID  Name      ProductCode  ParentCode
100           Toys      T
101           Airplane  A1           T
102           Car       C1           T

For example: the Airplane and Car both belong to the Toys product line. This hierarchy could be used to roll up and produce all Toy sales.
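A rollup over this hierarchy can be sketched as a self-join of the product dimension: each fact is matched to its product, and the product to its parent. The example below uses Python with sqlite3 and the sample rows from the slides; the Fact table name and the ISO dates are assumptions, and the sketch only handles a single parent level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductDimID INTEGER PRIMARY KEY, Name TEXT, ProductCode TEXT, ParentCode TEXT);
INSERT INTO Product VALUES
    (100, 'Toys',     'T',  NULL),   -- top of the hierarchy, no parent
    (101, 'Airplane', 'A1', 'T'),
    (102, 'Car',      'C1', 'T');

CREATE TABLE Fact (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
INSERT INTO Fact VALUES
    ('2010-06-25', 1, 101, 19.00),
    ('2010-06-28', 2, 102, 10.00);
""")

# Roll each sale up from the child product to its parent product line.
rollup = conn.execute("""
    SELECT parent.Name, SUM(f.Price) AS TotalSales, COUNT(*) AS NumOrders
    FROM Fact f
    JOIN Product child  ON child.ProductDimID = f.ProductDimID
    JOIN Product parent ON parent.ProductCode = child.ParentCode
    GROUP BY parent.Name
""").fetchall()
print(rollup)  # [('Toys', 29.0, 2)]
```

Deeper hierarchies would need either one join per level or a recursive query; that is left out here to keep the sketch short.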

Facts and Dimensions
• Notice how this fact table has relations to the dimension tables. This allows us to "pivot" the facts around each dimension in an efficient manner.

Date        CustomerDimID  ProductDimID  Price
06/25/2010  1              101           19.00
06/28/2010  2              102           10.00

CustDimID  CustID  Name
1          1001    Markus
2          1010    John

ProductDimID  Name
101           Airplane
102           Car

Metrics
• Metric: A metric is an aggregation of fact information, usually around a particular set of dimensions. In a typical environment a metric table becomes the source for a single report. But because of the dimensions, metrics can be combined across multiple systems.
• These metric tables are combined with their dimensions to produce the actual output of the reports.

Product Metric Table

Date        ProductDimID  NumOrders
06/25/2010  101           20
06/28/2010  102           30

For example: the Product Metric table might be used to show that "Airplanes" (101) were sold 20 times on the 25th.

Customer Metric Table

Date        CustDimID  NumOrders
06/25/2010  1          1
06/28/2010  2          5

In this example you can see that Markus (1) placed 1 order, while John (2) placed 5 orders.

Star Schema
• Star Schema: The diagram used to depict a traditional data mart is called a star schema. Typically a fact or metric table is placed in the center. All dimension tables are then laid out around it, giving the diagram a star-like appearance.

Orders (center): Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Date: Date, Day of Week, Month Name, isHoliday
Customer: CustomerDimID, CustID, Name, Age, Start, End
Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End

Using the Data Mart
• You can then take these fact and dimension tables and place them in front of a reporting engine.
• The user can then drill through the metrics.
• The dimension tables allow the user to "pivot" the metrics through any attribute. They can go from viewing Customers by State to viewing Sales by Employees by switching dimensions.
• If users consistently want to use one view of the data, you may decide to turn these into Metric tables.
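What "switching dimensions" means in practice can be sketched as the same aggregation over the fact table, joined to a different dimension each time. The Python and sqlite3 example below is an illustration under assumptions: the PIVOTS lookup and the sales_by function are hypothetical, the sample rows come from the earlier slides, and State and Product are used as the two views because the deck has no sample employee data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT, State TEXT);
INSERT INTO Customer VALUES (1, 1001, 'Markus', 'NC'), (2, 1010, 'John', 'SC');

CREATE TABLE Product (ProductDimID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Product VALUES (101, 'Airplane'), (102, 'Car');

CREATE TABLE Fact (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
INSERT INTO Fact VALUES ('2010-06-25', 1, 101, 19.00), ('2010-06-28', 2, 102, 10.00);
""")

# Each pivot names: dimension table, its key, the matching fact key, and the
# attribute to group by. Only these fixed entries are ever placed in the SQL.
PIVOTS = {
    "state":   ("Customer", "CustDimID",    "CustomerDimID", "State"),
    "product": ("Product",  "ProductDimID", "ProductDimID",  "Name"),
}

def sales_by(conn, dimension):
    """Total sales grouped by the chosen dimension attribute."""
    table, dim_key, fact_key, attr = PIVOTS[dimension]
    sql = (
        f"SELECT d.{attr}, SUM(f.Price) FROM Fact f "
        f"JOIN {table} d ON d.{dim_key} = f.{fact_key} "
        f"GROUP BY d.{attr} ORDER BY d.{attr}"
    )
    return conn.execute(sql).fetchall()

print(sales_by(conn, "state"))
print(sales_by(conn, "product"))
```

The fact table is never touched when the view changes; only the join and the GROUP BY column differ, which is roughly what a reporting engine does when the user pivots.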

Creating A Metric Table
SQL

Select Orders.date
      ,Orders.CustomerDimID
      ,count(*) as numOrders
From Orders
Group by Orders.date, Orders.CustomerDimID

Resulting metric table:

Date        CustDimID  NumOrders
06/25/2010  1          1
06/28/2010  2          5

Schema
Orders: Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Date: Date, Day of Week, Month Name, isHoliday
Customer: CustomerDimID, CustID, Name, Age, Start, End
Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End

Reporting SQL
Select date, numOrders, Customer.Name
from Metric_NumOrders
inner join Customer on Customer.CustomerDimID = Metric_NumOrders.CustomerDimID
where date between '01/01/2010' and '01/31/2010'
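The metric table and the report can be tried end to end with the short Python and sqlite3 sketch below. It is an illustration under assumptions: the order rows are made up to reproduce the counts from the slides (1 order for Markus, 5 for John), dates are stored as ISO strings so BETWEEN compares them correctly, and the report is run for June 2010 because that is where the sample data falls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
INSERT INTO Customer VALUES (1, 1001, 'Markus'), (2, 1010, 'John');
CREATE TABLE Orders (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
""")

# Illustrative fact rows: one order for customer 1, five for customer 2.
orders = [("2010-06-25", 1, 101, 19.00)] + [("2010-06-28", 2, 102, 10.00)] * 5
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", orders)

# Build the metric table: one row per date and customer.
conn.execute("""
    CREATE TABLE Metric_NumOrders AS
    SELECT Date, CustomerDimID, COUNT(*) AS NumOrders
    FROM Orders
    GROUP BY Date, CustomerDimID
""")

# Reporting query: join the metric table to its dimension for display names.
report = conn.execute("""
    SELECT m.Date, m.NumOrders, c.Name
    FROM Metric_NumOrders m
    INNER JOIN Customer c ON c.CustomerDimID = m.CustomerDimID
    WHERE m.Date BETWEEN ? AND ?
    ORDER BY m.Date
""", ("2010-06-01", "2010-06-30")).fetchall()
print(report)
```

Because the counting is done once when the metric table is built, the report itself only has to read a handful of pre-aggregated rows.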

References
y A data mart is not a data warehouse 

http://www.information-management.com/infodirect/19991120/1675-1.html

y General Data Warehousing Articles 

http://www.ralphkimball.com/html/articles.html