
Data Warehousing

L. Ramanathan, Asst. Prof., SCSE, VIT University, Room No. MB E 407

The Architectural Components

Understanding Data Warehouse Architecture
Architecture – Definition
"The structure that brings all the components of a data warehouse together."
Architecture includes the following factors:
* Integrated data
* Things necessary to prepare and store data
* Means for delivering information from the data warehouse

Understanding Data Warehouse Architecture – Contd.
* Composed of rules, procedures and functions that enable the data warehouse to work and fulfill the business requirements.
* Made up of technology that empowers the data warehouse.
* It defines the standards, measurements, general design and support techniques.

[Figure: Typical data warehouse and data mart architecture. Source data (operational data sources 1 to n and the operational data store (ODS)) feeds data acquisition and data staging. Management and control is handled by the warehouse manager. Data storage holds meta-data, detailed data, lightly summarized data, highly summarized data and archive/backup data in the DBMS. Data marts (second tier) hold summarized data in a relational database and in a multi-dimensional database. Information delivery (third tier) is through end-user access tools: reporting, query, application development and EIS (executive information system) tools, OLAP (online analytical processing) tools, and data mining.]

Distinguishing Characteristics
Different Objectives and Scope
* The DW architecture must have components that work to provide data to users in large volumes in a single session.

Distinguishing Characteristics
Different Objectives and Scope – Contd.
* Defining the scope of the DW is also difficult. Factors to be considered are:
  - Number and extent of data sources
  - Data granularity and data volumes
  - Impact of the DW on existing operational systems
* Scope is measured in terms of data transformation and integration functions.

Distinguishing Characteristics
Data Content
* Read-only data in the DW is the primary component in the architecture.
* The architecture should support business subjects as well as high data volumes.

Distinguishing Characteristics
Complex Analysis and Quick Response
* The architecture should support complex analysis of strategic information, since information retrieval is complex.
* Users must be able to:
  - Drill down, roll up, slice and dice the data
  - Review results in different output options
  - See results in tabular as well as graphical format
* Finally, the architecture should provide a platform for making rapid decisions and for dealing with situations quickly.
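The analysis operations named on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the fact rows, the dimension names and the helper functions (`aggregate`, `slice_rows`, `dice_rows`) are assumptions made up for the example, not part of any particular OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales amount).
SALES = [
    (2023, "Q1", "East", "Widget", 100),
    (2023, "Q1", "West", "Widget", 150),
    (2023, "Q2", "East", "Gadget", 200),
    (2023, "Q2", "West", "Widget", 50),
    (2024, "Q1", "East", "Gadget", 300),
]
# Position of each dimension inside a fact row.
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(rows, dims):
    """Sum the sales measure grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[DIMS[dim]] == value]


def dice_rows(rows, conditions):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [
        r for r in rows
        if all(r[DIMS[d]] in allowed for d, allowed in conditions.items())
    ]


by_year = aggregate(SALES, ["year"])                # roll up to the year level
by_quarter = aggregate(SALES, ["year", "quarter"])  # drill down to quarters
east_by_year = aggregate(slice_rows(SALES, "region", "East"), ["year"])
east_gadgets = dice_rows(SALES, {"region": {"East"}, "product": {"Gadget"}})
```

Rolling up and drilling down differ only in how many dimensions are kept in the grouping key, which is why one `aggregate` helper serves both.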

Distinguishing Characteristics
Flexible and Dynamic
* The DW architecture should be flexible enough to accommodate additional requirements as and when they arise.
* Ex: Items missed in the business requirements, or those that arise because of changes in business conditions.

Distinguishing Characteristics
Meta Data-Driven
* Meta data holds data about every phase of the data movement.
* It interleaves with and connects other components.
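One way to picture meta data connecting the components is a shared registry that every phase writes to. The sketch below is a toy, in-memory illustration; the class name `MetadataRepository`, its methods and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """Hypothetical in-memory meta data store shared by all components."""
    entries: list = field(default_factory=list)

    def record(self, phase, dataset, detail):
        """Note what happened to a data set during one phase."""
        self.entries.append({"phase": phase, "dataset": dataset, "detail": detail})

    def lineage(self, dataset):
        """Return the phases a data set has passed through, in recorded order."""
        return [e["phase"] for e in self.entries if e["dataset"] == dataset]


meta = MetadataRepository()
meta.record("data acquisition", "orders", "extracted from operational source 1")
meta.record("data storage", "orders", "loaded into the DW repository")
meta.record("information delivery", "orders", "used by a monthly sales report")
```

Because each component records into the same store, a query such as `meta.lineage("orders")` can trace a data set across acquisition, storage and delivery.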

Questions
1. What are the three major architectural components?
2. State any three characteristics of DW architecture.
3. What is the use of Meta data?

Answers
1. Data Acquisition, Data Storage, Data Delivery.
2. Architecture should support different objectives and scope, read-only data, and complex analysis and quick response; it should be flexible and dynamic, and Meta data driven.
3. Meta data details every phase of data movement, and it interleaves and connects with other components.

TECHNICAL ARCHITECTURE
* The technical architecture of a DW is the complete set of functions and services provided within its components.
* Includes the procedures and rules that are required to perform the functions and provide the services.
* Includes the data stores needed for each component to provide the services.
* Tools are the means to implement the architecture.

Components – Functions & Services
* Components
  - Data Acquisition
    * Data Flow
    * Data Extraction, Data Transformation, Data Staging
  - Data Storage
    * Data Flow: Flow, Data Groups, Data Repository
  - Information Delivery
    * Data Flow: Flow, Service Locations, Data Stores

[Figure: Data Acquisition – Technical Architecture. Labels: Source Data; Management & Control; Metadata; Data Extraction; Intermediary Flat Files; Data Transformation; Data Staging (Relational DB or Flat Files).]

DATA ACQUISITION
1. Extract data from data sources
2. Move to the staging area
3. Prepare the data for loading into the DW
Components – source data and data staging
* Data Flow
  - Flow – Data flow begins at the data sources and pauses at the staging area. After transformation and integration, the data is ready for loading into the DW repository.
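The three steps above can be sketched as three small functions. This is a minimal sketch under stated assumptions: the sources are plain lists of dictionaries, the staging area is a dictionary, and the function names and field names (`customer`, `amount`, `status`) are invented for the example.

```python
def extract(source_rows, wanted):
    """Step 1: pull only the rows that pass a filter from one source."""
    return [dict(r) for r in source_rows if wanted(r)]


def move_to_staging(staging_area, name, rows):
    """Step 2: park the extract in the staging area under a name."""
    staging_area[name] = rows


def prepare_for_load(staging_area):
    """Step 3: transform and integrate staged extracts into load-ready rows."""
    ready = []
    for name in sorted(staging_area):
        for row in staging_area[name]:
            ready.append({
                "source": name,
                "customer": row["customer"].strip().title(),
                "amount": float(row["amount"]),
            })
    return ready


# Two hypothetical operational sources with slightly different conventions.
source1 = [
    {"customer": " alice ", "amount": "10.5", "status": "posted"},
    {"customer": "bob", "amount": "3", "status": "void"},
]
source2 = [{"customer": "CAROL", "amount": "7", "status": "posted"}]


def is_posted(row):
    return row["status"] == "posted"


staging = {}
move_to_staging(staging, "source1", extract(source1, is_posted))
move_to_staging(staging, "source2", extract(source2, is_posted))
load_ready = prepare_for_load(staging)
```

The flow pauses at `staging`: nothing reaches `load_ready` until the staged extracts have been cleaned and brought to a common shape.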

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Data Flow – Contd.
  - Data Sources: the primary data source consists of
    * The enterprise's operational systems
      (+) Consolidated data, ready to use
      (-) Proprietary tools required to extract data
    * Legacy data that resides on hierarchical or network databases

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Data Flow – Contd.
  - Intermediate Data Stores
    * Data from the data sources is moved to temporary files
    * Homogeneous data from several sources is merged with other temporary files before moving into the staging area
    * Flat files are used to extract data from operational systems

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Data Flow – Contd.
  - Staging Area – Each extracted file is examined and reviewed for business rules. Various transformation functions are performed, the data is sorted and merged, inconsistencies are resolved and the data is cleansed. This data temporarily resides in the staging area before being loaded into the DW repository.
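Sorting and merging extracted files in the staging area can be illustrated with the standard library's `heapq.merge`, which combines already-sorted inputs into one sorted stream. The two "files" below are hypothetical lists of records, and `sort_and_merge` is a helper invented for this sketch.

```python
import heapq

# Hypothetical extracts from two sources, each in arbitrary order.
file_a = [{"cust_id": 3, "name": "Cy"}, {"cust_id": 1, "name": "Al"}]
file_b = [{"cust_id": 4, "name": "Di"}, {"cust_id": 2, "name": "Bo"}]


def sort_and_merge(*extracts, key):
    """Sort each extract on `key`, then merge the sorted runs into one list."""
    sorted_runs = [sorted(rows, key=key) for rows in extracts]
    return list(heapq.merge(*sorted_runs, key=key))


merged = sort_and_merge(file_a, file_b, key=lambda r: r["cust_id"])
merged_ids = [r["cust_id"] for r in merged]
```

For real flat files the same pattern applies to file iterators, since `heapq.merge` reads its inputs lazily rather than loading them all at once.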

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Functions and Services
  - List of functions and services: Data Extraction
    * Select data sources and determine the types of filters to be applied to individual sources
    * Generate automatic extract files from operational systems using replication
    * Create intermediate files to store selected data to be merged later
    * Transport extracted data from multiple platforms

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Functions and Services
  - List of functions and services: Data Extraction – Contd.
    * Provide automated job control services for creating extract files
    * Reformat input from outside sources, departmental data files, databases, etc.
    * Generate common application code for data extraction
    * Resolve inconsistencies for common data elements from multiple sources
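As a rough illustration of the extraction functions listed above (applying a filter to a source and creating an intermediate flat file), the sketch below writes selected columns of filtered rows as CSV text. The function name `extract_to_flat_file`, the sample `orders` rows and the column names are assumptions for the example; an in-memory buffer stands in for a real file.

```python
import csv
import io


def extract_to_flat_file(rows, columns, row_filter):
    """Write the rows that pass `row_filter`, restricted to `columns`, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop columns that are not selected
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        if row_filter(row):
            writer.writerow(row)
    return buffer.getvalue()


# Hypothetical operational rows; `internal_flag` is not wanted in the extract.
orders = [
    {"order_id": 1, "region": "East", "amount": 120, "internal_flag": "x"},
    {"order_id": 2, "region": "West", "amount": 80, "internal_flag": "y"},
    {"order_id": 3, "region": "East", "amount": 40, "internal_flag": "z"},
]

flat_file = extract_to_flat_file(
    orders,
    ["order_id", "region", "amount"],
    lambda r: r["region"] == "East",
)
```

Keeping the filter and the column list as parameters is one way to reuse the same extraction code across several sources.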

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Functions and Services
  - List of functions and services: Data Transformation
    * Map input data to data for the DW repository
    * Clean data, de-duplicate and merge/purge
    * Denormalize extracted data structures
    * Convert data types
    * Calculate and derive attribute values
    * Check for referential integrity
    * Aggregate data as needed
    * Resolve missing values
    * Consolidate and integrate data
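Several of the transformation functions listed above (cleaning, de-duplication, type conversion, derived attributes and missing values) can be shown on a handful of records. This is a minimal sketch: the field names, the `UNKNOWN` default and the rule used to detect duplicates are assumptions chosen for the example.

```python
from datetime import date

# Hypothetical raw extract: everything arrives as text, with a duplicate row
# and a missing region.
raw = [
    {"cust": " Alice ", "qty": "2", "unit_price": "5.00",
     "order_date": "2024-01-15", "region": "east"},
    {"cust": "alice", "qty": "2", "unit_price": "5.00",
     "order_date": "2024-01-15", "region": "east"},
    {"cust": "Bob", "qty": "1", "unit_price": "9.50",
     "order_date": "2024-02-01", "region": ""},
]


def transform(rows, default_region="UNKNOWN"):
    """Clean, de-duplicate, convert and enrich raw rows for the DW."""
    seen = set()
    out = []
    for r in rows:
        customer = r["cust"].strip().title()              # clean the name
        quantity = int(r["qty"])                          # convert data types
        unit_price = float(r["unit_price"])
        order_date = date.fromisoformat(r["order_date"])
        region = r["region"].strip().upper() or default_region  # missing value
        key = (customer, order_date, quantity, unit_price)
        if key in seen:                                   # de-duplicate
            continue
        seen.add(key)
        out.append({
            "customer": customer,
            "order_date": order_date,
            "region": region,
            "quantity": quantity,
            "unit_price": unit_price,
            "amount": quantity * unit_price,              # derived attribute
        })
    return out


clean = transform(raw)
```

Note that the two "Alice" rows only match after cleaning, which is why de-duplication runs on the cleaned values rather than on the raw text.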

TECHNICAL ARCHITECTURE – DATA ACQUISITION
* Functions and Services
  - List of functions and services: Data Staging
    * Provide backup and recovery for staging area repositories
    * Sort and merge files
    * Create files as input to make changes to dimension tables
    * Create and populate the database if it is relational
    * Preserve an audit trail to relate each data item in the DW to its input source
    * Resolve and create primary and foreign keys for load tables
    * Consolidate data sets and create flat files
    * If storage is relational, extract load files
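Two of the staging functions above, creating keys for load tables and preserving an audit trail, can be sketched together. The example assumes generated integer keys for the dimension rows and an `audit_source` field naming the input file and line; the function `assign_keys` and all sample values are hypothetical.

```python
from itertools import count


def assign_keys(customers, orders):
    """Create primary keys for dimension rows and matching foreign keys for facts."""
    next_key = count(1)
    key_by_natural_id = {}
    dim_rows, fact_rows = [], []
    for c in customers:
        customer_key = next(next_key)                 # generated primary key
        key_by_natural_id[c["cust_id"]] = customer_key
        dim_rows.append({
            "customer_key": customer_key,
            "cust_id": c["cust_id"],
            "name": c["name"],
            "audit_source": c["source"],              # audit trail to the input
        })
    for o in orders:
        fact_rows.append({
            "customer_key": key_by_natural_id[o["cust_id"]],  # foreign key
            "amount": o["amount"],
            "audit_source": o["source"],
        })
    return dim_rows, fact_rows


customers = [
    {"cust_id": "C-17", "name": "Alice", "source": "crm_extract.csv:2"},
    {"cust_id": "C-42", "name": "Bob", "source": "crm_extract.csv:3"},
]
orders = [
    {"cust_id": "C-42", "amount": 9.5, "source": "orders_extract.csv:2"},
    {"cust_id": "C-17", "amount": 10.0, "source": "orders_extract.csv:3"},
]

dim_rows, fact_rows = assign_keys(customers, orders)
```

An order whose customer is missing from the dimension would raise a `KeyError` here, which doubles as a crude referential integrity check before loading.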

[Figure: Data Storage – Technical Architecture. Labels: Management & Control; Data Storage; Relational DB (E-R Model); Data Marts (Dimensional Model).]

TECHNICAL ARCHITECTURE – DATA STORAGE
Deals with the entire process of loading the data from the staging area into the DW repository.
* Data Flow
  - Flow – From the staging area to the DW repository
    * Top-down: enterprise repository to dependent data marts
    * Bottom-up: independent data marts to DW

TECHNICAL ARCHITECTURE – DATA STORAGE
* Data Flow – Contd.
  - Data Groups
    * First group: set of files or tables containing data for a full refresh, meant for the initial loading of the DW
    * Second group: set of files or tables containing ongoing incremental loads
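The difference between the two data groups can be sketched with Python's built-in `sqlite3` module standing in for the DW repository. The `sales` table, its columns and the two helper functions are assumptions for illustration; a production warehouse would use its own DBMS and load utilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")


def full_refresh(conn, rows):
    """First group: replace the table contents completely (initial load)."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM sales")
        conn.executemany(
            "INSERT INTO sales (sale_id, amount) VALUES (?, ?)", rows
        )


def incremental_load(conn, rows):
    """Second group: apply only new or changed rows on top of existing data."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO sales (sale_id, amount) VALUES (?, ?)", rows
        )


full_refresh(conn, [(1, 10.0), (2, 20.0)])
incremental_load(conn, [(2, 25.0), (3, 30.0)])  # one changed row, one new row
loaded = conn.execute(
    "SELECT sale_id, amount FROM sales ORDER BY sale_id"
).fetchall()
```

After the incremental load, row 1 is untouched, row 2 carries the changed amount and row 3 is new, whereas a second full refresh would have discarded everything first.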

TECHNICAL ARCHITECTURE – DATA STORAGE
* Data Flow – Contd.
  - Data Repository
    * All DW databases are relational
    * The capabilities of the RDBMS are available for processing the data

TECHNICAL ARCHITECTURE – DATA STORAGE
* Functions and Services
  - List of functions and services
    * Load data for full refreshes of DW tables
    * Perform incremental loads at specified intervals
    * Support loading into multiple tables
    * Optimize the loading process
    * Provide automated job control services for loading data into the DW
    * Provide backup and recovery
    * Provide security
    * Monitor and fine-tune the database
    * Periodically archive data from the database
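Periodic archiving, the last function in the list above, might look like the following sketch, again using the built-in `sqlite3` module as a stand-in repository. The table names, the sample rows and the cutoff date are hypothetical, and dates are assumed to be stored as ISO-format text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
    CREATE TABLE sales_archive (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        (1, '2019-03-01', 10.0),
        (2, '2023-06-15', 20.0),
        (3, '2024-01-10', 30.0);
""")


def archive_before(conn, cutoff):
    """Move rows older than `cutoff` (ISO date text) into the archive table."""
    with conn:  # copy and delete happen in one transaction
        conn.execute(
            "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_date < ?",
            (cutoff,),
        )
        cursor = conn.execute(
            "DELETE FROM sales WHERE sale_date < ?", (cutoff,)
        )
    return cursor.rowcount


moved = archive_before(conn, "2023-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM sales_archive").fetchone()[0]
```

Running the copy and the delete in a single transaction means a failure part-way through leaves the rows in place rather than losing or duplicating them.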

[Figure: Information Delivery – Technical Architecture. Labels: Management & Control; Metadata; Information Delivery; Multidimensional Database; OLAP; Temporary Result Sets; Standard Reporting Data Stores; DM; Report / Query.]

TECHNICAL ARCHITECTURE – INFORMATION DELIVERY
Deals with the entire process of providing information to the user in a flexible manner.
* Data Flow
  - Flow
    * May be top-down or bottom-up
    * Data requested by user queries is transformed into information in the form of either regular or ad hoc reports

TECHNICAL ARCHITECTURE – INFORMATION DELIVERY
* Data Flow
  - Service Location
    * The query service may run on the user desktop, an application server or the database
    * A comprehensive reporting service is needed for producing reports at regular intervals

TECHNICAL ARCHITECTURE – INFORMATION DELIVERY
* Data Flow
  - Data Stores – The following intermediary data stores are used for information delivery:
    * Proprietary temporary stores to hold results of individual queries and reports for repeated use
    * Data stores for standard reporting
    * Proprietary multidimensional databases

TECHNICAL ARCHITECTURE – INFORMATION DELIVERY
* Functions and Services
  - List of functions and services
    * Provide security to control information access
    * Monitor user access to improve service and for future enhancements
    * Allow users to browse DW content
    * Simplify access by hiding internal complexities of data storage from users
    * Automatically reformat queries for optimal execution
    * Govern queries and control runaway queries
    * Store result sets of queries and reports for future use
    * Provide multiple levels of granularity
    * Provide event triggers to monitor data loading
    * Make provision to perform complex analysis through OLAP
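Two of the services listed above, governing runaway queries and storing result sets for future use, can be sketched as a thin wrapper around a database connection. This is a simplified illustration: the `QueryService` class is hypothetical, the governor is a plain row limit rather than a time or cost limit, and `sqlite3` stands in for the warehouse DBMS.

```python
import sqlite3


class QueryService:
    """Hypothetical query layer with a row-limit governor and a result cache."""

    def __init__(self, conn, max_rows=1000):
        self.conn = conn
        self.max_rows = max_rows
        self.cache = {}      # (sql, params) -> stored result set
        self.cache_hits = 0

    def run(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.cache:                 # reuse a stored result set
            self.cache_hits += 1
            return self.cache[key]
        cursor = self.conn.execute(sql, params)
        # Fetch one row more than allowed to detect an oversized result.
        rows = cursor.fetchmany(self.max_rows + 1)
        if len(rows) > self.max_rows:
            raise RuntimeError("query exceeded the row limit")
        self.cache[key] = rows
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 10), ("East", 20), ("West", 5), ("West", 15), ("West", 1)],
)

service = QueryService(conn, max_rows=3)
summary_sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
first = service.run(summary_sql)
second = service.run(summary_sql)   # answered from the stored result set
```

With `max_rows=3`, the summary query passes, while an unrestricted `SELECT * FROM sales` would be rejected because it returns five rows.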