
BIDW Roadmap

Author : Dave Goyal


BIDW Process Roadmap



Overall Process
 Program / Project Planning and
Management
 Business Process Definition
 Technical Architecture Design
 Product Selection and Installation
 Dimensional Modeling



Overall Process…Contd.
 Physical Design
 ETL Design and Development
 BI Application Design
 BI Application Development
 Deployment
 Change Management and Maintenance



Program / Project
Planning and Management
 Define the Project
 Build the Business Case and Justification
 Plan the Project
 Manage the Project
 Manage the Program



Business Process
 Define Business Process
 Define Requirements using Interviews
 Define Requirements using Facilitated
Sessions



Technical Architecture Design
 Back Room Architecture (Source , ETL)
 Presentation Server Architecture
(Dimensional Architecture)
 Front Room Architecture (BI)
 Additional Architecture Features
(Infrastructure, Metadata, Security)



Product Selection and Installation
 Architecture Plan (DW Architecture
Diagram and Application Architecture
Document)
 Product Selection (Hardware/OS, DBMS,
ETL, BI, Data Profiling, Data Cleansing
etc.)



Dimensional Modeling Process
 Value Chain Business Process
 Choose the Business Process
 Declare the Grain
 Identify the Dimensions
 Identify the Facts
 Enterprise Bus Matrix
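
As a rough illustration of the bus matrix idea, the sketch below lists business processes against the dimensions each one uses and reports the dimensions shared by more than one process. The process and dimension names are hypothetical examples, not taken from the slides, and `shared_dimensions` is an illustrative helper.

```python
# Hypothetical bus matrix: business process -> dimensions it uses.
# All names below are invented for illustration only.
BUS_MATRIX = {
    "Policy Sales": {"Date", "Agent", "PolicyHolder"},
    "Claims Processing": {"Date", "PolicyHolder"},
}


def shared_dimensions(matrix):
    """Return, sorted, the dimensions used by more than one business process."""
    counts = {}
    for dimensions in matrix.values():
        for dimension in dimensions:
            counts[dimension] = counts.get(dimension, 0) + 1
    return sorted(name for name, uses in counts.items() if uses > 1)


shared = shared_dimensions(BUS_MATRIX)
```

Dimensions that appear in several rows of the matrix are the natural candidates to design once and reuse across fact tables.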



Physical Design
 High Level Physical Design
 Develop Standards
 Develop the Physical Data Model
 Develop Initial Indexing Plan
 Design OLAP Database
 Design Aggregations



ETL Table Naming Convention
 D_ : Dimension Table
 F_ : Fact Table
 S_ : Source Table - Contains all data copied
directly from a source file
 X_ : Extract Table – Contains changed source data only. Changes may come from an incremental extract or be derived from a full extract



ETL Table Naming Convention 2
 C_ : Clean Table – contains source rows that
have been cleaned
 E_ : Error Table - contains error rows found in
source data
 M_ : Master table – maintains history of all clean
rows
 T_ : Transform Table – contains the data
resulting from a transformation of source data



ETL Table Naming Convention 3
 I_ : Insert Table – contains new data to be
inserted in dimension table
 U_ : Update Table – contains changed data to be applied to the dimension table (updated in place or inserted as a new row, depending on the SCD type)
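
The prefixes above can be captured in a small lookup so that scripts can check table names against the convention. This is only a sketch: the prefix meanings come from the slides, while the `ETL_PREFIXES` name and the `classify_table` helper are hypothetical.

```python
# Prefix meanings are taken from the naming-convention slides.
ETL_PREFIXES = {
    "D_": "Dimension table",
    "F_": "Fact table",
    "S_": "Source table (all data copied directly from a source file)",
    "X_": "Extract table (changed source data only)",
    "C_": "Clean table (source rows that have been cleaned)",
    "E_": "Error table (error rows found in source data)",
    "M_": "Master table (history of all clean rows)",
    "T_": "Transform table (result of transforming source data)",
    "I_": "Insert table (new data for the dimension table)",
    "U_": "Update table (changed data for the dimension table)",
}


def classify_table(table_name):
    """Return the role implied by a table's two-character prefix (hypothetical helper)."""
    prefix = table_name[:2].upper()
    if prefix not in ETL_PREFIXES:
        raise ValueError(f"{table_name!r} does not follow the ETL naming convention")
    return ETL_PREFIXES[prefix]
```

For example, `classify_table("S_Agents")` returns the source-table description, and a name without a known prefix raises `ValueError`.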



Data Quality
 Avoid NULL strings in dimension tables
 Specify a default value for NOT NULL columns: ‘N/A’, ‘Not Known’, ‘Invalid’
 Dimension primary keys should be auto-generated surrogate keys. Allow data quality rows with the keys 0, -1 and -2
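
One way to apply the default-value rule before loading a dimension is sketched below. It assumes rows are plain dictionaries and that empty or whitespace-only strings count as missing; `fill_defaults` and the sample column names are hypothetical.

```python
DEFAULT_TEXT = "N/A"  # one of the default values listed on the slide


def fill_defaults(row, text_columns, default=DEFAULT_TEXT):
    """Return a copy of row with missing text values replaced by a default.

    Assumption: None, empty strings and whitespace-only strings all count
    as missing.
    """
    cleaned = dict(row)
    for column in text_columns:
        value = cleaned.get(column)
        if value is None or str(value).strip() == "":
            cleaned[column] = default
    return cleaned


# Hypothetical dimension row with two missing attributes.
raw = {"agent_name": None, "region": "  ", "city": "Pune"}
clean = fill_defaults(raw, ["agent_name", "region", "city"])
```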



Surrogate Keys
 Always use auto-generated surrogate keys as dimension keys
 Use the SET IDENTITY_INSERT <table> ON and SET IDENTITY_INSERT <table> OFF SQL statements (SQL Server syntax) to create the 0, -1 and -2 rows for each dimension when it is created
 0 : INVALID
 -1 : UNKNOWN
 -2 : NOT APPLICABLE
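
The reserved rows can be seeded at the moment the dimension is created. The sketch below uses Python's built-in `sqlite3` module as a stand-in database, which is an assumption made so the example runs anywhere: SQLite accepts explicit key values without any switch, whereas SQL Server needs `SET IDENTITY_INSERT` turned on around the inserts. The table and column names are hypothetical.

```python
import sqlite3

# Reserved keys and their meanings, as listed on the slide.
RESERVED_ROWS = [(0, "INVALID"), (-1, "UNKNOWN"), (-2, "NOT APPLICABLE")]


def create_dimension(conn, table, name_column):
    """Create a dimension with an auto-generated key and seed the reserved rows."""
    conn.execute(
        f"CREATE TABLE {table} ("
        f"{table}_key INTEGER PRIMARY KEY, "  # SQLite assigns this automatically
        f"{name_column} TEXT NOT NULL)"
    )
    # Explicit key values are allowed here; on SQL Server this insert would be
    # wrapped in SET IDENTITY_INSERT <table> ON / OFF.
    conn.executemany(
        f"INSERT INTO {table} ({table}_key, {name_column}) VALUES (?, ?)",
        RESERVED_ROWS,
    )


conn = sqlite3.connect(":memory:")
create_dimension(conn, "D_Agent", "agent_name")

# A regular row gets the next auto-generated key, above the reserved ones.
cursor = conn.execute("INSERT INTO D_Agent (agent_name) VALUES ('Alice')")
alice_key = cursor.lastrowid
```

Fact rows whose dimension lookup fails can then point at one of the reserved keys instead of carrying a NULL foreign key.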



ETL Design and Development
 Round Up the Requirements
 Extract Data from source (3 Steps)
 Clean and Conform Data (5 Steps)
 Delivering Data (13 Steps)
 Managing the ETL Environment (13 Steps)



ETL Roadmap



ETL Implementation Process
 Analyze data quality thoroughly and have options available to resolve the issues found
 Define Data source definitions
 Create High Level S2T Map
 Create Detail Level S2T Map
 Create Fact Worksheet



ETL Process…Extract
 Extract data to the S_ table (full load)
 Compare the S_ table to the M_ table and load the differences into the X_ table
 De-duplicate the X_ table by removing duplicate rows
 Move duplicate rows to the E_ table
 Move non-duplicate clean rows to the C_ table
 Compare C_ to M_: insert new rows into M_ and update M_ with changed rows
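
The extract steps can be sketched with in-memory rows to show how data moves between the S_, X_, E_, C_ and M_ tables. This is a simplified illustration, not the author's implementation: rows are dictionaries, M_ is keyed by a natural key field named `key`, and the first occurrence of a key is treated as the clean row while later repeats go to the error table. All of these choices and the function names are assumptions.

```python
def extract_changes(s_rows, m_table):
    """S_ vs M_: keep rows that are new or differ from the master copy (X_)."""
    return [row for row in s_rows if m_table.get(row["key"]) != row]


def deduplicate(x_rows):
    """Split X_ into clean rows (C_) and duplicate rows (E_).

    Assumption: the first row seen for a key is kept, later repeats are errors.
    """
    seen, c_rows, e_rows = set(), [], []
    for row in x_rows:
        if row["key"] in seen:
            e_rows.append(row)
        else:
            seen.add(row["key"])
            c_rows.append(row)
    return c_rows, e_rows


def merge_into_master(c_rows, m_table):
    """C_ vs M_: insert new keys and overwrite changed ones."""
    for row in c_rows:
        m_table[row["key"]] = row
    return m_table


# Hypothetical data: A1 is unchanged, A2 arrives twice, A3 is new.
m_table = {"A1": {"key": "A1", "name": "Alice"}}
s_rows = [
    {"key": "A1", "name": "Alice"},
    {"key": "A2", "name": "Bob"},
    {"key": "A2", "name": "Bob"},
    {"key": "A3", "name": "Cara"},
]

x_rows = extract_changes(s_rows, m_table)
c_rows, e_rows = deduplicate(x_rows)
merge_into_master(c_rows, m_table)
```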



ETL Process…Transform
 Select and Transform from C_ to T_
 Compare T_ with D_ for new and changed
rows
 Insert new rows into I_ and changed rows into U_
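
A minimal sketch of the transform stage follows, assuming the dimension is looked up by a natural key and that a row counts as changed when its name attribute differs. The transformation itself (trimming and title-casing a name) and all field names are hypothetical.

```python
def transform(c_row):
    """Hypothetical C_ to T_ transformation: rename fields and tidy the name."""
    return {"agent_id": c_row["key"], "agent_name": c_row["name"].strip().title()}


def split_new_and_changed(t_rows, d_by_natural_key):
    """Compare T_ with D_ and return (I_ rows, U_ rows)."""
    i_rows, u_rows = [], []
    for row in t_rows:
        current = d_by_natural_key.get(row["agent_id"])
        if current is None:
            i_rows.append(row)  # not in the dimension yet
        elif current["agent_name"] != row["agent_name"]:
            u_rows.append(row)  # present, but an attribute changed
    return i_rows, u_rows


# Hypothetical clean rows and current dimension contents.
c_rows = [
    {"key": "A1", "name": " alice smith "},
    {"key": "A2", "name": "bob"},
    {"key": "A3", "name": "cara"},
]
d_by_natural_key = {
    "A1": {"agent_key": 1, "agent_id": "A1", "agent_name": "Alice Jones"},
    "A3": {"agent_key": 2, "agent_id": "A3", "agent_name": "Cara"},
}

t_rows = [transform(row) for row in c_rows]
i_rows, u_rows = split_new_and_changed(t_rows, d_by_natural_key)
```

Unchanged rows (A3 here) land in neither table, so the load stage only touches rows that need work.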



ETL Process…Load (I_)
 Insert rows directly into D_ table from I_
 Update rows from U_ to D_ when the dimension is SCD Type 1 or 3
 Insert rows from U_ to D_ when the dimension is SCD Type 2
 Please note: dimension (surrogate) keys are generated during the Load stage
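
The load rules can be sketched as follows for SCD Type 1 and Type 2. The sketch assumes the dimension is a list of dictionaries, uses a `current` flag to mark the active version of a row (the flag is an assumption, not something the slides specify), and draws surrogate keys from a simple counter. Type 3 is left out; it is also an in-place update but additionally keeps the prior value in a separate column.

```python
from itertools import count


def load_dimension(d_rows, i_rows, u_rows, scd_type, next_key):
    """Apply I_ and U_ rows to D_; surrogate keys are generated here, at load time."""
    for row in i_rows:
        d_rows.append({"agent_key": next(next_key), "current": True, **row})
    for row in u_rows:
        # Find the active version of the row being changed.
        current = next(
            r for r in d_rows if r["agent_id"] == row["agent_id"] and r["current"]
        )
        if scd_type == 1:
            current.update(row)  # overwrite in place, key unchanged
        elif scd_type == 2:
            current["current"] = False  # retire the old version
            d_rows.append({"agent_key": next(next_key), "current": True, **row})
        else:
            raise ValueError("this sketch covers SCD Type 1 and 2 only")
    return d_rows


# Hypothetical dimension with one existing agent; keys continue from 2.
d_rows = [
    {"agent_key": 1, "agent_id": "A1", "agent_name": "Alice Jones", "current": True}
]
load_dimension(
    d_rows,
    i_rows=[{"agent_id": "A2", "agent_name": "Bob"}],
    u_rows=[{"agent_id": "A1", "agent_name": "Alice Smith"}],
    scd_type=2,
    next_key=count(2),
)
```

With Type 2, the changed agent ends up with two rows: the retired version under its original key and the new version under a freshly generated key.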



ETL Process… To remember
 S_, X_, M_, C_ and E_ tables should be named after their source tables, such as S_Agents
 T_, I_, U_ and D_ tables should be named after their target tables, such as T_Agent, T_PolicyHolder etc.
 Source table columns should follow the source data formats and sizes, except that natural keys should be varchar to accommodate data quality issues



High Level BIDW System
Architecture Model



BI Application Design
 Define the structure of the portal and its
webpages
 Define High Level Reporting requirements
(Dashboards, Scorecards)
 Define Analytical reporting requirements
(Cubes, Interactive reports, Ad hoc queries)
 Define Detailed reporting requirements
(Filter-based reports, Ad hoc queries)



BI Application Layers



BI Application Development
 Set up the development environment
 Set up the issue management system
 Develop all reports
 Test and Balance each report against the
source system
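
Balancing a report against the source system usually comes down to comparing totals group by group. The sketch below assumes both sides can be reduced to a mapping of group to total; the `balance` function, the tolerance parameter and the sample figures are hypothetical.

```python
def balance(source_totals, report_totals, tolerance=0):
    """Return {group: (source, report)} for every group that does not balance."""
    mismatches = {}
    for group in sorted(set(source_totals) | set(report_totals)):
        source_value = source_totals.get(group, 0)
        report_value = report_totals.get(group, 0)
        if abs(source_value - report_value) > tolerance:
            mismatches[group] = (source_value, report_value)
    return mismatches


# Hypothetical premium totals per region from the source system and a report.
source_totals = {"North": 1200, "South": 800, "West": 450}
report_totals = {"North": 1200, "South": 790}

mismatches = balance(source_totals, report_totals)
```

An empty result means the report balances. Groups missing from one side show up with a total of 0 on that side.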



Deployment / Maintenance
 Design Version control system
 Define the change management process
 Define the documents required to deploy changes from Dev, Test and QA to Production
 Manage and maintain environments.

