Data warehouse design

Practical approach for developing a data warehouse




- Plan and define requirements at the overall corporate level.
- Create a surrounding architecture for a complete data warehouse.
- Conform and standardize the data content.
- Implement the data warehouse as a series of supermarts, one at a time.

Overview of the components

SOURCE DATA COMPONENT

Source data is divided into four broad categories:
- Production data
- Internal data
- Archived data
- External data

[Figure: Data warehouse basic architecture]

Data Staging Component

- Data extraction
- Data transformation
- Data loading

Data staging is an area with a set of functions to clean, change, convert, combine, merge and prepare source data for storage and use in the data warehouse.
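The three staging functions can be sketched as a small pipeline. This is only an illustration, not a prescribed implementation: the sample rows, the `sales` table, and the cleaning rules (trim whitespace, lowercase city names, drop duplicates and rows with a missing amount) are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical raw rows from a source system (invented for illustration).
raw_rows = [
    {"order_id": " o1 ", "city": "NYC", "amount": "12.50"},
    {"order_id": "o2", "city": "sfo ", "amount": "11"},
    {"order_id": "o2", "city": "sfo ", "amount": "11"},  # duplicate record
    {"order_id": "o3", "city": "la", "amount": ""},       # missing amount
]

def extract():
    """Extraction: pull rows from the source (here, an in-memory list)."""
    return list(raw_rows)

def transform(rows):
    """Transformation: clean, convert and de-duplicate the extracted rows."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = row["order_id"].strip()
        if not row["amount"] or order_id in seen:
            continue  # drop incomplete or duplicate records
        seen.add(order_id)
        cleaned.append((order_id, row["city"].strip().lower(), float(row["amount"])))
    return cleaned

def load(conn, rows):
    """Loading: write the prepared rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id TEXT PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, city, amount FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

In a real staging area each step would read from and write to persistent storage; the in-memory database here only keeps the sketch self-contained.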

[Figure: Basic architecture (with staging area)]

Data Storage Component

[Figure: Production data, internal data and archived data move into the data warehouse through a base data load and through daily, monthly, quarterly and yearly refreshes.]

[Figure: Architecture (with data marts)]

Metadata components

- The metadata component is the data about the data in the data warehouse.
- It is similar to the data dictionary in a DBMS, since in the data dictionary we keep information about the logical data structures, files, addresses and indexes.

Types of Metadata

- Operational metadata
- Extraction and transformation metadata
- End-user metadata

Importance of Metadata

- It connects all parts of the data warehouse.
- It provides information about the contents and structure to the developers.
- It opens the door to the end users and makes the contents recognizable in their own terms.

TYPES OF DATA IN A DATA WAREHOUSE

- Older detail data
- Current detail data
- Lightly summarized data
- Highly summarized data
- Metadata

STAR SCHEMAS

- The star schema is the simplest data warehouse schema.
- It is called a star schema because the diagram resembles a star, with points radiating from a center.
- The center of the star consists of one or more fact tables and the points of the star are the dimension tables.


Example of Star Schemas

product (dimension table)
prodId  name  price
p1      bolt  10
p2      nut   5

store (dimension table)
storeId  city
c1       nyc
c2       sfo
c3       la

customer (dimension table)
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

sale (fact table)
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
105      3/8/97  111     p1      c3       5    50

sale is the fact table. The primary keys of the various dimension tables are all indexed.
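The example tables can be loaded into an in-memory SQLite database to show how a star schema is queried: the fact table is joined to each dimension it needs, and the numeric measure is summed. This is a sketch only. The column types, the foreign key declarations, and the choice of query are assumptions; the row values are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
-- Fact table: measures (qty, amt) plus foreign keys to the dimension tables.
CREATE TABLE sale (
    orderId TEXT PRIMARY KEY,
    date    TEXT,
    custId  INTEGER REFERENCES customer(custId),
    prodId  TEXT REFERENCES product(prodId),
    storeId TEXT REFERENCES store(storeId),
    qty     INTEGER,
    amt     INTEGER
);
INSERT INTO product VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
INSERT INTO store VALUES ('c1', 'nyc'), ('c2', 'sfo'), ('c3', 'la');
INSERT INTO customer VALUES
    (53, 'joe', '10 main', 'sfo'),
    (81, 'fred', '12 main', 'sfo'),
    (111, 'sally', '80 willow', 'la');
INSERT INTO sale VALUES
    ('o100', '1/7/97', 53, 'p1', 'c1', 1, 12),
    ('o102', '2/7/97', 53, 'p2', 'c1', 2, 11),
    ('105', '3/8/97', 111, 'p1', 'c3', 5, 50);
""")

# Total sales amount per product name and store city.
star_rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.amt)
    FROM sale f
    JOIN product p ON p.prodId = f.prodId
    JOIN store s ON s.storeId = f.storeId
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
print(star_rows)
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries simple.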

SNOWFLAKE SCHEMAS

- The snowflake model is the result of decomposing one or more of the dimensions.
- The relationships among sets of attributes of a dimension can be separated into new dimension tables.


Example of Snowflake Schemas

store
storeId  cityId  tId  mgr
s5       sfo     t1   joe
s7       sfo     t2   fred
s9       la      t1   nancy

sType
tId  size   location
t1   small  downtown
t2   large  suburbs

city
cityId  pop  regId
sfo     1M   north
la      5M   south

region
regId  name
north  cold region
south  warm region

Labels in the original figure: Dimension table, Fact table, Dimension table.
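A short sketch of what the decomposition means for queries: reaching the region of a store now takes a chain of joins (store to city to region) instead of a single lookup. The SQLite column types and the query itself are assumptions for illustration; the rows are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (regId TEXT PRIMARY KEY, name TEXT);
CREATE TABLE city   (cityId TEXT PRIMARY KEY, pop TEXT,
                     regId TEXT REFERENCES region(regId));
CREATE TABLE sType  (tId TEXT PRIMARY KEY, size TEXT, location TEXT);
CREATE TABLE store  (storeId TEXT PRIMARY KEY,
                     cityId TEXT REFERENCES city(cityId),
                     tId TEXT REFERENCES sType(tId),
                     mgr TEXT);
INSERT INTO region VALUES ('north', 'cold region'), ('south', 'warm region');
INSERT INTO city VALUES ('sfo', '1M', 'north'), ('la', '5M', 'south');
INSERT INTO sType VALUES ('t1', 'small', 'downtown'), ('t2', 'large', 'suburbs');
INSERT INTO store VALUES
    ('s5', 'sfo', 't1', 'joe'),
    ('s7', 'sfo', 't2', 'fred'),
    ('s9', 'la', 't1', 'nancy');
""")

# Number of stores per region: store -> city -> region.
stores_per_region = conn.execute("""
    SELECT r.name, COUNT(*)
    FROM store s
    JOIN city c ON c.cityId = s.cityId
    JOIN region r ON r.regId = c.regId
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
print(stores_per_region)
```

The trade-off is less redundancy in the dimension data in exchange for more joins per query.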

DATA WAREHOUSING OBJECTS

Objects in data warehousing are of two types:
- Fact tables
- Dimension tables

FACT TABLES

- Fact tables are the large tables in a data warehouse schema.
- They store business measurements that can be analyzed and examined.
- The measurements are usually numeric and additive.
- A fact table typically consists of facts and foreign keys to the dimension tables.

DIMENSION TABLES

- Dimension tables are also known as lookup or reference tables.
- They contain the static data in the data warehouse.
- They store the information normally used to constrain queries.
- These tables are generally textual and descriptive, and can be used as the row headers of the result set.

Cube

Fact table: sale
prodId  storeId  amt
p1      c1       12
p2      c1       11
p1      c3       50
p2      c2       8

Multi-dimensional cube (two-dimensional view)
      p1  p2
c1    12  11
c2        8
c3    50
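The step from fact table to two-dimensional view can be sketched in a few lines: each fact row becomes one cell addressed by its (store, product) coordinates. The dictionary representation is just one convenient choice for illustration, not the way an OLAP engine stores a cube.

```python
from collections import defaultdict

# Rows of the sale fact table: (prodId, storeId, amt).
sale = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]

# The cube maps (storeId, prodId) coordinates to the summed amount.
cube = defaultdict(int)
for prod, store, amt in sale:
    cube[(store, prod)] += amt

products = sorted({prod for prod, _, _ in sale})
stores = sorted({store for _, store, _ in sale})

# Print one row per store; None marks an empty cell (no sale recorded).
print("   ", products)
for store in stores:
    print(store, [cube.get((store, prod)) for prod in products])
```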

3-D Cube

Fact table: sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube (three-dimensional view)

day 1
      p1  p2
c1    12  11
c2        8
c3    50

day 2
      p1  p2
c1    44
c2    4
c3
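Adding the date makes each cell a (product, store, day) coordinate. The sketch below, again using plain dictionaries as an assumed representation, builds the three-dimensional cube and then derives two smaller views from it: the slice for day 1, and the totals per product and store obtained by summing over the date dimension.

```python
from collections import defaultdict

# Rows of the sale fact table: (prodId, storeId, date, amt).
sale = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

# Three-dimensional cube: one cell per (product, store, day).
cube3 = {(prod, store, day): amt for prod, store, day, amt in sale}

# Slice: keep only day 1, which gives a two-dimensional view.
day1 = {(prod, store): amt for (prod, store, day), amt in cube3.items() if day == 1}

# Sum over the date dimension to get totals per (product, store).
by_prod_store = defaultdict(int)
for (prod, store, day), amt in cube3.items():
    by_prod_store[(prod, store)] += amt

print(day1)
print(dict(by_prod_store))
```

Because the amounts are additive, summing over any dimension gives a meaningful total; here p1 at store c1 adds up to 12 + 44 = 56 over the two days.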

Difference between OLTP and OLAP

            OLTP                                   OLAP
users       Clerks, IT professional                Knowledge worker
functions   Day-to-day operations                  Decision support
DB design   Application oriented                   Subject oriented
data        Current, up-to-date, detailed,         Historical, summarized, multidimensional,
            relational, isolated                   integrated, consolidated
usage       Repetitive                             Ad hoc
access      Read/write/update                      Read only
