Data Warehouse

Agenda

- What is a Data Warehouse?
- Transaction System vs. Data Warehouse
- Data Warehouse Architecture
- Metadata
- Data Flows
- Issues for Building a Data Warehouse
- Warehouse Schema
- Tools & Technologies
- Advantages of Data Warehouse
- Problems
- Data Mart
- Data Mining

What is a Data Warehouse?

A collection of integrated, subject-oriented, time-variant, and non-volatile data in support of management's decision-making process. It has been described as the "single point of truth", the "corporate memory", and the sole historical register of virtually all transactions that occur in the life of an organization.


Transaction System vs. Data Warehouse
| Transaction System | Data Warehouse |
|---|---|
| Supports day-to-day operational processes | Supports management analysis and decision-making processes |
| Contains raw, detailed data that has not been refined or cleansed | Contains summarized, refined, and cleansed information |
| Volatile: data changes from day to day, with frequent updates | Non-volatile: provides a data "snapshot"; adjustments are not permitted, or are limited |
| Technical issues drive the data structure and system design | Business analysis requirements drive the data structure and system design |
| Disparate data structures, physical locations, query types, etc. | Integrated, consistent information on a single technology platform |
| Users rely on technical analysts for reporting needs | Users have direct, fast access via online analytical processing (OLAP) tools |
| Operational processes are impacted by queries run against the system | Minimal impact on operational processes |
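The volatile versus non-volatile contrast in the table can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not how any particular product works. The hypothetical `TransactionSystem` overwrites a balance in place. The hypothetical `Warehouse` only appends dated snapshots, so earlier values stay queryable (time-variant) and are never changed (non-volatile).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TransactionSystem:
    """Hypothetical operational store: holds current values only."""
    balances: dict = field(default_factory=dict)

    def update(self, account_id: str, balance: float) -> None:
        # Volatile: the previous value is overwritten in place.
        self.balances[account_id] = balance


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a value as it stood on a given load date."""
    snapshot_date: date
    account_id: str
    balance: float


@dataclass
class Warehouse:
    """Hypothetical warehouse: append-only history of snapshots."""
    rows: list = field(default_factory=list)

    def load_snapshot(self, source: TransactionSystem, as_of: date) -> None:
        # Non-volatile: existing rows are never modified, only appended to.
        for account_id, balance in sorted(source.balances.items()):
            self.rows.append(Snapshot(as_of, account_id, balance))

    def history(self, account_id: str) -> list:
        # Time-variant: every earlier value remains available.
        return [(r.snapshot_date, r.balance) for r in self.rows if r.account_id == account_id]


oltp = TransactionSystem()
dw = Warehouse()

oltp.update("100000", 500.0)
dw.load_snapshot(oltp, date(1996, 1, 31))
oltp.update("100000", 750.0)  # the old balance is gone from the transaction system
dw.load_snapshot(oltp, date(1996, 2, 29))

print(oltp.balances["100000"])  # 750.0
print(dw.history("100000"))     # both the January and February values
```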


Data Warehouse Architecture
[Figure: Data warehouse architecture. Labels: ODS 1, ODS 2, ODS 3, operational data store (ODS), load manager, warehouse manager, query manager, DBMS, detailed data, lightly summarized data, highly summarized data, meta-data, archive/backup data, and end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining).]

- Operational data store (ODS): a repository of current and integrated operational data used for analysis.
- Load manager: performs all the operations associated with the extraction and loading of data into the warehouse.
- Warehouse manager: performs all the operations associated with the management of the data in the warehouse.
- Query manager (also called the backend component): performs all the operations associated with the management of user queries.


End-user access toolscan be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools Summarized data-> Stores all th aggregations generated by warehouse manager.Exists to speed up performance of queries and do not require backup Archive/backup data-> Backup ensures recovery of Data Warehouse from any data loss or any failure. In archiving, older data is removed from the system in a format that allows it to be qickly restored if required. Meta-data


Importance of Meta Data

Meta-data is data about data.

- Its purpose is to show the pathway back to where the data began, so that warehouse administrators know the history of any item in the warehouse.
- The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data.
- The meta-data associated with data management describes the data as it is stored in the warehouse.
- The meta-data is also required by the query manager to generate appropriate queries, and is associated with the users of queries.
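To illustrate the "pathway back to where the data began", here is a hypothetical metadata record for one warehouse column. It logs each transformation applied during loading. The class `ColumnMetadata` and the source names `order_entry` and `order_amt` are invented for illustration and do not come from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical metadata record describing where one warehouse column came from."""
    warehouse_column: str
    source_system: str
    source_column: str
    transformations: list = field(default_factory=list)

    def record(self, step: str) -> None:
        # Log each change made to the data during transformation and loading.
        self.transformations.append(step)

    def lineage(self) -> str:
        # Describe the pathway from the warehouse column back to its source.
        steps = " -> ".join(self.transformations) or "loaded as-is"
        return f"{self.warehouse_column} <- {self.source_system}.{self.source_column} ({steps})"


meta = ColumnMetadata("net_sales", "order_entry", "order_amt")
meta.record("converted cents to dollars")
meta.record("summed by month")
print(meta.lineage())
# net_sales <- order_entry.order_amt (converted cents to dollars -> summed by month)
```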


Data flows

- Inflow: the processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.
- Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
- Downflow: the processes associated with archiving and backing up the data in the warehouse.
- Outflow: the processes associated with making the data available to end users.
- Meta-flow: the processes associated with the management of the meta-data.
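The inflow, downflow, and outflow processes can be sketched as three small Python functions over in-memory lists. Everything here is a hypothetical illustration: the source extracts, the cleansing rules, and the archive cutoff date. The cleansing rules trim and uppercase region codes and reject rows with no region. Upflow and meta-flow are left out to keep the sketch short.

```python
from datetime import date

# Hypothetical source extracts; the two systems format region codes differently.
source_a = [{"sale_date": date(1995, 11, 3), "region": "sw", "amount": 400}]
source_b = [
    {"sale_date": date(1996, 1, 9), "region": " NE ", "amount": 250},
    {"sale_date": date(1996, 1, 9), "region": None, "amount": 90},
]

warehouse = []  # cleansed rows currently in the warehouse
archive = []    # rows moved out by downflow


def inflow(*sources):
    """Extract rows from each source, cleanse them, and load them."""
    for source in sources:
        for row in source:
            if row["region"] is None:  # cleansing: reject incomplete rows
                continue
            warehouse.append({**row, "region": row["region"].strip().upper()})


def downflow(cutoff: date):
    """Move rows older than the cutoff into the archive."""
    archive.extend(r for r in warehouse if r["sale_date"] < cutoff)
    warehouse[:] = [r for r in warehouse if r["sale_date"] >= cutoff]


def outflow(region: str) -> int:
    """Answer an end-user query from the data still in the warehouse."""
    return sum(r["amount"] for r in warehouse if r["region"] == region)


inflow(source_a, source_b)
downflow(cutoff=date(1996, 1, 1))
print(len(warehouse), len(archive))  # 1 1
print(outflow("NE"))                 # 250
```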


[Figure: Information flows of a data warehouse. Labels: operational data sources 1 to n, operational data store (ODS), load manager, warehouse manager, query manager, DBMS, detailed data, lightly summarized data, highly summarized data, meta-data, archive/backup data, and end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining tools). Flows labeled: inflow, upflow, downflow, outflow, and meta-flow.]

Issues to Be Addressed in Building a Data Warehouse

- When and how to gather data?
- What schema to use?
- Data cleansing
- How to propagate updates?
- What data to summarize?


Warehouse Schema

Fact Table:
Stores the business data. The data in a fact table are called facts; they are multidimensional.

Dimension Table:
To minimize storage requirements, the dimension attributes in a fact table are usually short identifiers that are foreign keys into other tables, called dimension tables.


Schema with Fact & Dimension Tables
[Figure: Dimension tables PRODUCT (product number, name of the product, description of product), AREA (Area 1, Area 2, Area 3), and DURATION (year, beginning date, completion date).]


Star Schema

A fact table sits in the center, and all the dimension tables are attached to the central fact table. Example: sales processing.
[Figure: Star schema for sales processing. Fact table SALES in the center, with dimension tables PRODUCT, AREA, TIME, and CUSTOMER.]

Dimension Tables
Region_Dimension_Table

| region_id | region_doc |
|---|---|
| NE | Northeast |
| NW | Northwest |
| SE | Southeast |
| SW | Southwest |

Product_Dimension_Table

| prod_grp_id | prod_id | prod_grp_desc | prod_desc |
|---|---|---|---|
| 10 | 100 | Fewer devices | Power supply |
| 20 | 140 | Circuit boards | Motherboard |
| 30 | 220 | Components | Co-processor |

Account_Dimension_Table

| account_id | account_doc |
|---|---|
| 100000 | ABC Electronics |
| 110000 | Midway Electric |
| 120000 | Victor Components |
| 130000 | Washburn, Inc. |
| 140000 | Zerox |

Vendor_Dimension_Table

| vend_id | vendor_desc |
|---|---|
| 100 | PowerAge, Inc. |
| 200 | Advanced Micro Devices |
| 300 | Farad Incorporated |

Time_Dimension_Table

| month | mo_in_fiscal_yr | month_name |
|---|---|---|
| 01-1996 | 4 | January |
| 02-1996 | 5 | February |
| 03-1996 | 6 | March |

Fact Table: Monthly_Sales_Summary_Table

| month | prod_id | region_id | account_id | vend_id | net-sales | gross_sales |
|---|---|---|---|---|---|---|
| 01-1996 | 100 | SW | 100000 | 100 | 30,000 | 50,000 |
| 02-1996 | 140 | NE | 110000 | 200 | 23,000 | 42,000 |
| 03-1996 | 220 | SW | 100000 | 300 | 32,000 | 49,000 |
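The example tables can be loaded into SQLite to show a typical star join, where the fact table is joined to its dimension tables through their short keys. This sketch uses Python's built-in `sqlite3` module and makes a few assumptions. The lowercase table names are shortened versions of the slide's names. The `net-sales` column is renamed `net_sales`, because a hyphen would need quoting in SQL. The account and vendor dimensions are omitted for brevity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE region_dim  (region_id TEXT PRIMARY KEY, region_doc TEXT);
CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_desc TEXT);
CREATE TABLE time_dim    (month TEXT PRIMARY KEY, mo_in_fiscal_yr INTEGER, month_name TEXT);

-- Fact table: short foreign keys plus the numeric measures.
CREATE TABLE monthly_sales_summary (
    month       TEXT    REFERENCES time_dim(month),
    prod_id     INTEGER REFERENCES product_dim(prod_id),
    region_id   TEXT    REFERENCES region_dim(region_id),
    account_id  INTEGER,
    vend_id     INTEGER,
    net_sales   INTEGER,
    gross_sales INTEGER
);

INSERT INTO region_dim VALUES
    ('NE', 'Northeast'), ('NW', 'Northwest'), ('SE', 'Southeast'), ('SW', 'Southwest');
INSERT INTO product_dim VALUES
    (100, 'Power supply'), (140, 'Motherboard'), (220, 'Co-processor');
INSERT INTO time_dim VALUES
    ('01-1996', 4, 'January'), ('02-1996', 5, 'February'), ('03-1996', 6, 'March');
INSERT INTO monthly_sales_summary VALUES
    ('01-1996', 100, 'SW', 100000, 100, 30000, 50000),
    ('02-1996', 140, 'NE', 110000, 200, 23000, 42000),
    ('03-1996', 220, 'SW', 100000, 300, 32000, 49000);
""")

# Star join: each dimension is one join away from the central fact table.
rows = con.execute("""
    SELECT t.month_name, p.prod_desc, f.net_sales
    FROM monthly_sales_summary AS f
    JOIN time_dim    AS t ON f.month = t.month
    JOIN product_dim AS p ON f.prod_id = p.prod_id
    JOIN region_dim  AS r ON f.region_id = r.region_id
    WHERE r.region_doc = 'Southwest'
    ORDER BY t.mo_in_fiscal_yr
""").fetchall()
print(rows)  # [('January', 'Power supply', 30000), ('March', 'Co-processor', 32000)]

# Aggregate the measures by a dimension attribute.
totals = con.execute("""
    SELECT r.region_doc, SUM(f.net_sales), SUM(f.gross_sales)
    FROM monthly_sales_summary AS f
    JOIN region_dim AS r ON f.region_id = r.region_id
    GROUP BY r.region_doc
    ORDER BY r.region_doc
""").fetchall()
print(totals)  # [('Northeast', 23000, 42000), ('Southwest', 62000, 99000)]
```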


Snowflake Schema

Consists of a fact table and normalized dimension tables.
Disadvantages:

- Data can become unmanageable
- Data is more difficult to retrieve
- Metadata becomes complex


[Figure: Snowflake schema. Fact table SALES with dimension tables PRODUCT, AREA, TIME, and CUSTOMER; the PRODUCT dimension is further normalized into Product Category and Product Manufacturer.]
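The extra joins behind "difficult to retrieve data" show up in a small SQLite sketch of a snowflaked PRODUCT dimension. Here category and manufacturer live in their own normalized tables. All table names and rows are hypothetical. In a star schema the category and manufacturer names would sit directly in the product dimension, so the same report would need only one join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Snowflaked PRODUCT dimension: category and manufacturer moved to their own tables.
CREATE TABLE category     (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE product (
    prod_id         INTEGER PRIMARY KEY,
    prod_desc       TEXT,
    category_id     INTEGER REFERENCES category(category_id),
    manufacturer_id INTEGER REFERENCES manufacturer(manufacturer_id)
);
CREATE TABLE sales (prod_id INTEGER REFERENCES product(prod_id), amount INTEGER);

INSERT INTO category VALUES (1, 'Boards'), (2, 'Chips');
INSERT INTO manufacturer VALUES (1, 'Maker A'), (2, 'Maker B');
INSERT INTO product VALUES
    (100, 'Motherboard', 1, 1), (200, 'Co-processor', 2, 2), (300, 'Controller', 2, 1);
INSERT INTO sales VALUES (100, 500), (200, 300), (300, 200), (100, 100);
""")

# Sales by category and manufacturer: the descriptive attributes are no longer
# in the product table itself, so two extra joins are needed to reach them.
rows = con.execute("""
    SELECT c.category_name, m.manufacturer_name, SUM(s.amount)
    FROM sales AS s
    JOIN product      AS p ON s.prod_id = p.prod_id
    JOIN category     AS c ON p.category_id = c.category_id
    JOIN manufacturer AS m ON p.manufacturer_id = m.manufacturer_id
    GROUP BY c.category_name, m.manufacturer_name
    ORDER BY c.category_name, m.manufacturer_name
""").fetchall()
print(rows)  # [('Boards', 'Maker A', 600), ('Chips', 'Maker A', 200), ('Chips', 'Maker B', 300)]
```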

Starflake Schema

A combination of the star schema and the snowflake schema. It consists of a fact table, star dimensions, and snowflake dimensions.


[Figure: Starflake schema. Fact table SALES with star dimensions Product and Location; Product also appears as a snowflake dimension (Price, Weight), and Location is shown with Location 1 and Location 2.]

Tools and Technologies
Tools & Technologies used in the construction of a data warehouse:

- Data extraction: SAS
- Data cleansing: Apertus, Trillium
- Data storage: ORACLE, SYBASE


Advantages of Using a Data Warehouse

- End users can access a wide variety of data
- Supports business decision making for the future
- Increases data consistency
- Increases productivity
- Decreases computing costs
- Combines data from disparate sources


Problems

- Increased end-user demands
- High demand for resources
- High maintenance
- Extracting, cleansing, and loading data can be time-consuming
- Data warehousing increases project scope
- Compatibility problems with systems already in place, e.g. the transaction processing system
- Training end users who may end up not using the data warehouse
- Security can become a serious issue, especially if the data warehouse is web-accessible


Data mart

It is a subset of a data warehouse that supports the requirements of a particular department or business function. The characteristics that differentiate data marts from data warehouses include:

- A data mart focuses only on the requirements of users associated with one department or business function.
- Data marts do not normally contain detailed operational data, unlike data warehouses.
- As data marts contain less data than data warehouses, they are more easily understood and navigated.
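Here is a minimal sketch of deriving a data mart from warehouse data, assuming a hypothetical Southwest sales office as the business unit. It keeps only that unit's rows and replaces account-level detail with monthly totals per product. This reflects the two characteristics above: a departmental focus and no detailed operational data. The row layout and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows: (month, region_id, prod_id, account_id, net_sales).
warehouse = [
    ("01-1996", "SW", 100, 100000, 18_000),
    ("01-1996", "SW", 100, 120000, 12_000),
    ("02-1996", "NE", 140, 110000, 23_000),
    ("03-1996", "SW", 220, 100000, 32_000),
]


def build_data_mart(rows, region_id):
    """Derive a small, summarized data mart for one business unit.

    Keeps only the rows for one region and drops account-level detail
    by totalling net sales per (month, product).
    """
    mart = defaultdict(int)
    for month, region, prod_id, _account_id, net_sales in rows:
        if region == region_id:
            mart[(month, prod_id)] += net_sales
    return dict(mart)


southwest_mart = build_data_mart(warehouse, "SW")
print(southwest_mart)  # {('01-1996', 100): 30000, ('03-1996', 220): 32000}
```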

[Figure: Two-tier data warehouse architecture with data marts. The first tier repeats the data warehouse architecture components (operational data sources, ODS 1 to ODS 3, operational data store, load manager, warehouse manager, query manager, DBMS, detailed, lightly summarized, and highly summarized data, meta-data, archive/backup data, end-user access tools); the second tier holds summarized data in a data mart (relational database) and summarized data in a multi-dimension database.]

Reasons for Creating a Data Mart

- To give users access to the data they need to analyze most often
- To provide data in a form that matches the collective view of the data held by a group of users in a department or business function
- To improve end-user response time, because less data has to be accessed
- To provide appropriately structured data, as dictated by the requirements of end-user access tools
- Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse

Data Mining

The process of extracting previously unknown, valid, and actionable information from large databases and then using that information to make crucial business decisions.

Applications: early warning systems, fraud detection, market research, direct mail.

Data mining provides techniques to:

- Detect trends or patterns and find correlations
- Perform data analysis
- Support forecasting and business modeling
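Two of the simplest techniques behind "detect trends" and "find correlations" are a least-squares trend line and the Pearson correlation coefficient. The sketch below implements both in plain Python on hypothetical monthly figures. It then extends the trend one period ahead as a naive forecast. Real data mining tools offer many more techniques, such as clustering and classification, so treat this only as an illustration of the idea.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trend(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    xs = list(range(len(ys)))
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical monthly figures for one product line.
ad_spend = [10, 11, 13, 13, 15, 16]
sales = [100, 110, 125, 130, 150, 160]

slope, intercept = trend(sales)
next_month = intercept + slope * len(sales)  # naive one-step forecast along the trend line

print(round(pearson(ad_spend, sales), 3))  # 0.996
print(round(slope, 2))                     # 12.14
print(round(next_month, 1))                # 171.7
```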

