
Day 1

Data Warehousing Concepts
P. Srinivas

Agenda:
To understand the concepts of data warehousing
Train the Trainer
Class timing
Morning Session I: 09:00 AM to 10:30 AM
Short Intermission: 10:30 AM to 10:45 AM
Morning Session II: 10:46 AM to 01:00 PM
Long Intermission for Lunch: 01:00 PM to 02:00 PM
Afternoon Session I: 02:00 PM to 03:00 PM
Short Intermission: 03:00 PM to 03:15 PM
Afternoon Session II: 03:16 PM to 06:00 PM

What is a data warehouse?


A data warehouse is an integrated system in which data from
different organizations is stored and shared to support business
decisions and operational excellence.
Bill Inmon, the father of data warehousing, says that
a data warehouse is a subject-oriented, integrated, time-variant,
and non-volatile collection of data in support of management's
decisions.
http://www.inmoncif.com
Ralph Kimball is a well-known data warehousing specialist in the USA
who has authored many books and has practised for a long time.
http://www.kimballgroup.com

What does subject-oriented mean?


Data for any subject area may be loaded into the database server
and analyzed.
For example, an enterprise is like a factory with several departments
that work together to achieve quality of service.
OLTP application data          Data warehouse storage (by subject area)
Human Resource System          HR
Customer Information System    Customer
Finance                        Finance
Sales and Distribution         SD
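The pairing of source systems with subject areas can be sketched in code. This is only an illustration: the dictionary name, the function name, and the sample records are hypothetical, and a real warehouse would do this routing in its ETL tool.

```python
# Hypothetical lookup built from the source-system / subject-area pairs above.
SUBJECT_AREA_BY_SOURCE = {
    "Human Resource System": "HR",
    "Customer Information System": "Customer",
    "Finance": "Finance",
    "Sales and Distribution": "SD",
}


def group_by_subject_area(records):
    """Group (source_system, payload) pairs under their subject area."""
    grouped = {}
    for source_system, payload in records:
        area = SUBJECT_AREA_BY_SOURCE[source_system]
        grouped.setdefault(area, []).append(payload)
    return grouped


# Made-up sample records, for illustration only.
sample = [
    ("Finance", "invoice 1"),
    ("Human Resource System", "new hire"),
    ("Finance", "invoice 2"),
]
grouped_sample = group_by_subject_area(sample)
```

Records from the same source end up under one subject area, which is what lets each subject area be analyzed on its own.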

What is Integrated?
Legal business data from different countries of the world is
gathered and stored in one warehouse.
For example, a corporation has international companies
in different locations.
Human Resource data from the companies located in the
UK, USA, UAE, and India may be collected on one server and used to
create KPI analytical reports that help business management,
from the tactical SBU manager up to strategic management.

What is time-variant?
Time-variant means that the corporate OLTP data is loaded into
the warehouse database on a schedule, for example once a day or
eight times a day (every three hours).
For example, the load frequency is tailored to each subject area:
Once a year
Once every half year
Once a quarter
Once a month
Once a week
Once a day
Once an hour
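One way to picture a scheduled load is to stamp every extract with its load date, so that earlier loads stay queryable. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table `balance_history`, its columns, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_history (account_id TEXT, balance REAL, load_date TEXT)"
)


def load_snapshot(conn, load_date, rows):
    """Append one periodic extract, stamping each row with its load date."""
    conn.executemany(
        "INSERT INTO balance_history VALUES (?, ?, ?)",
        [(account_id, balance, load_date) for account_id, balance in rows],
    )


def balance_as_of(conn, account_id, as_of_date):
    """Return the most recent balance loaded on or before as_of_date."""
    row = conn.execute(
        "SELECT balance FROM balance_history "
        "WHERE account_id = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1",
        (account_id, as_of_date),
    ).fetchone()
    return row[0] if row else None


# Two daily loads of the same account (made-up values).
load_snapshot(conn, "2024-01-01", [("A1", 100.0)])
load_snapshot(conn, "2024-01-02", [("A1", 130.0)])
```

Calling `balance_as_of(conn, "A1", "2024-01-01")` returns the value from the first load, because the second load was appended rather than written over it.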

What is Non-Volatile Business data?


Because the customer data from the source systems/applications is
legal data, it must not be updated in the warehouse; only a front-end
application user may change it, at the source.
Warehouse users work with SELECT only.
Digital laws of the land are strict, and companies come to India
for development.

For example,
the USA, a superpower, has given permission for its citizens' data
to be accessed in India. People work as production support
personnel/analysts and view this valuable data according to the
privileges they are given.
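Read-only access can be sketched with Python's built-in sqlite3 module. SQLite's `PRAGMA query_only` makes a connection reject writes, which stands in here for the SELECT-only privilege a warehouse user would hold. The `customer` table and its single row are made up for illustration; a production warehouse would use its own database's grant mechanism instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.commit()

# From here on the connection behaves like a read-only warehouse user.
conn.execute("PRAGMA query_only = ON")

# Reading is still allowed.
rows = conn.execute("SELECT customer_id, name FROM customer").fetchall()

# Any attempt to change the data is rejected by the database engine.
try:
    conn.execute("UPDATE customer SET name = 'changed' WHERE customer_id = 1")
    update_rejected = False
except sqlite3.Error:
    update_rejected = True
```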

BASIC COMPONENTS OF A DATA WAREHOUSE

REQUIREMENT GATHERING: Materialization as a Project

Requirement gathering is key to a DW project.


Interviews with front-office personnel for inputs
Interviews for access privileges on the source system(s)
Review the interviews and prepare the Business Requirement Document (BRD)

Project Planning

After the estimates and the tendering process, the manager sends a mail
to the DM to provide the infrastructure required to start the work.
There will be a brainstorming session with the client about the BRD.

What are Data Marts?


Physical
Logical

Physical Data Mart


The physical tables are designed for one particular subject
area, or for a manager to see his or her everyday business reports.
Indexes are created for easy retrieval, irrespective of how fast
the database engine retrieves data.
A denormalized model is designed and loaded with suitably
transformed data for quick results.
E. F. Codd's third normal form (3NF) is not used.
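A minimal sketch of a physical data mart, using Python's built-in sqlite3 module: one wide, denormalized table plus an index on the column the reports filter by. The table `sales_mart`, its columns, the index name, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide, denormalized table: customer and product attributes are repeated
# on every sales row instead of living in separate 3NF tables.
conn.execute(
    """
    CREATE TABLE sales_mart (
        sale_date     TEXT,
        region        TEXT,
        customer_name TEXT,
        product_name  TEXT,
        amount        REAL
    )
    """
)
# Index on the column used to filter the manager's reports.
conn.execute("CREATE INDEX idx_sales_mart_region ON sales_mart (region)")
conn.executemany(
    "INSERT INTO sales_mart VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01", "North", "Asha", "Widget", 120.0),
        ("2024-01-01", "South", "Ravi", "Widget", 80.0),
        ("2024-01-02", "North", "Asha", "Gadget", 50.0),
    ],
)


def daily_sales_for_region(conn, region):
    """Daily report for one region; no joins are needed."""
    return conn.execute(
        "SELECT sale_date, SUM(amount) FROM sales_mart "
        "WHERE region = ? GROUP BY sale_date ORDER BY sale_date",
        (region,),
    ).fetchall()
```

The report query touches a single table, which is the trade made by denormalizing: repeated attribute values in exchange for simpler, quicker reads.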

Logical Data Marts

The data mart is created on top of the third normal form (3NF) database tables.
Entity-relationship diagrams and full details will be
available.
The presentation layer is created using views.
The three-dimensional data model is created in the
front-end applications.
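A logical data mart can be sketched as a view over normalized tables, so no data is copied. The example uses Python's built-in sqlite3 module; the tables `customer` and `sales_order`, the view `customer_sales_mart`, and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        amount      REAL
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO sales_order VALUES (10, 1, 120.0), (11, 1, 50.0), (12, 2, 80.0);

    -- The "data mart" is only a view over the normalized tables.
    CREATE VIEW customer_sales_mart AS
    SELECT c.name AS customer_name, SUM(o.amount) AS total_amount
    FROM customer AS c
    JOIN sales_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.name;
    """
)
mart_rows = conn.execute(
    "SELECT customer_name, total_amount FROM customer_sales_mart "
    "ORDER BY customer_name"
).fetchall()
```

Report users query `customer_sales_mart` like a table, while the underlying tables stay normalized.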

Data Design

What is Entity Relationship modelling?


Conceptual Data Modelling (CDM)
Logical Data Modelling (LDM)
Physical Data Modelling (PDM)

Entity = Table
Attributes = columns
Rows = Tuples
Cardinality = No. of Records

What is a fact table?


A fact is a real observation in the marketplace.
Today's latest data and future-dated data act as indicators,
but in practice they are treated as objects for data mining.
A fact table is a big table designed to hold the measures.
Measures are attributes (columns) of the table that have
numeric data types most of the time.
Given a functional scenario, the modeller creates a model
and helps the design team.
A fact table may have a large primary key to ensure that its
records are unique.
Generally, it is considered a "cooked" table, because the attribute
data is prepared using database utilities or UNIX procedures
and the modified data is then loaded.
The measures are additive or semi-additive numbers with
meaningful units of measurement.
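The points about measures and a large primary key can be sketched with Python's built-in sqlite3 module. The table `sales_fact`, its key columns, and the sample rows are hypothetical; the composite primary key here is one possible way to keep records unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales_fact (
        date_key     INTEGER NOT NULL,
        product_key  INTEGER NOT NULL,
        store_key    INTEGER NOT NULL,
        quantity     INTEGER,   -- additive measure (units)
        sales_amount REAL,      -- additive measure (currency)
        PRIMARY KEY (date_key, product_key, store_key)
    )
    """
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        (20240101, 1, 1, 2, 40.0),
        (20240101, 2, 1, 1, 15.0),
        (20240102, 1, 1, 3, 60.0),
    ],
)

# Additive measures can be summed across every key column.
total_quantity, total_amount = conn.execute(
    "SELECT SUM(quantity), SUM(sales_amount) FROM sales_fact"
).fetchone()

# The composite primary key rejects a second row for the same key values.
try:
    conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 1, 9, 9.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```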

What is a fact-less fact table?

When the designer finds no measures in a fact table that can be
added, subtracted, multiplied, or divided, it is called a
fact-less fact table (FLFT).
E.g., logs, events, and coverage tables; student number,
employee number, etc.
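A fact-less fact table can be sketched as a table that holds only keys, where the useful number comes from counting rows. The example uses Python's built-in sqlite3 module; the table `attendance_fact` and its rows are made up, with `student_number` borrowed from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only keys are stored; there is no numeric measure column.
conn.execute(
    """
    CREATE TABLE attendance_fact (
        date_key       INTEGER NOT NULL,
        student_number INTEGER NOT NULL,
        class_key      INTEGER NOT NULL,
        PRIMARY KEY (date_key, student_number, class_key)
    )
    """
)
conn.executemany(
    "INSERT INTO attendance_fact VALUES (?, ?, ?)",
    [(20240101, 501, 7), (20240101, 502, 7), (20240102, 501, 7)],
)

# Each row records that an event happened; reports count the rows.
attendance_by_day = conn.execute(
    "SELECT date_key, COUNT(*) FROM attendance_fact "
    "GROUP BY date_key ORDER BY date_key"
).fetchall()
```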

What is a conformed fact table?


The data of one or more fact tables may be used
for loading into another fact table.
To generate reports, the presentation layer may be
designed by joining two or more fact tables.

What are Additive, Semi-additive and Non-additive facts?

What is a dimensional table?

What is a Slowly changing Dimension?


Type 1
Type 2
Type 3
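The slide only names the three types. As they are commonly described, Type 1 overwrites the old value, Type 2 adds a new row and closes the old one, and Type 3 keeps the previous value in an extra column. The sketch below illustrates that under those definitions; the `CustomerRow` class, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    city: str
    valid_from: str
    valid_to: Optional[str] = None       # None marks the current row
    previous_city: Optional[str] = None  # used only by Type 3


def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute; no history is kept."""
    return [
        replace(r, city=new_city) if r.customer_id == customer_id else r
        for r in rows
    ]


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new current row."""
    result = []
    for r in rows:
        if r.customer_id == customer_id and r.valid_to is None:
            result.append(replace(r, valid_to=change_date))
            result.append(CustomerRow(customer_id, new_city, change_date))
        else:
            result.append(r)
    return result


def apply_type3(rows, customer_id, new_city):
    """Type 3: keep only the previous value, in an extra column."""
    return [
        replace(r, previous_city=r.city, city=new_city)
        if r.customer_id == customer_id
        else r
        for r in rows
    ]


# Made-up starting dimension with one customer.
start = [CustomerRow(1, "Chennai", "2023-01-01")]
```

Applying each function to `start` with a move to another city shows the difference: one row with the new city, two rows with validity dates, or one row that also remembers the previous city.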

Sample Time dimension

What is a Degenerate Dimension?

What is a Junk Dimension?

What are Surrogate Keys?

What is a Quarterly Snapshot?

Data from different source systems

ETL Extract types


Full Load/refresh
Incremental Load
Transactional events
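The first two extract types can be contrasted in a short sketch, assuming each source row carries a last-updated timestamp. The `updated_at` field, the function names, and the sample rows are hypothetical, and the third type (transactional events) is not covered here.

```python
def full_extract(source_rows):
    """Full load/refresh: take every source row, regardless of age."""
    return list(source_rows)


def incremental_extract(source_rows, last_loaded_at):
    """Incremental load: take only rows changed after the previous run."""
    return [row for row in source_rows if row["updated_at"] > last_loaded_at]


# Made-up source rows; ISO timestamps compare correctly as strings.
source = [
    {"id": 1, "updated_at": "2024-01-01T09:00:00"},
    {"id": 2, "updated_at": "2024-01-02T09:00:00"},
    {"id": 3, "updated_at": "2024-01-03T09:00:00"},
]
changed = incremental_extract(source, "2024-01-01T23:59:59")
```

The incremental extract depends on remembering when the previous load ran; the full extract does not, at the cost of re-reading everything.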

What is purging of data?

Cleansing, De-duping and Merge

Transformation rules
Integration
History maintenance of a person's events
De-normalization
Referential Integrity checks
Data type conversions
Calculations and derivations
Aggregation for quick retrieval
How to handle NULL values

Staged data load job control services


Job Definition
Run decks
Proc Libs
Job scheduling
Monitoring
Log files

What is soft delete of Corporate data?

What are upstream applications?

What are downstream applications?

Metadata Repository design and storage of data

What is presentation layer?


Logical Design
Physical Design

What is Drill-down (detailed) and Drill-up (high-level summary) of data?

What is the ANSI standard? American National Standards Institute
What is the BSA standard? Business Software Alliance
And what is EAR? Export Administration Regulations

Roles in a typical Data Warehousing Project: The Back Office

Business Analyst(s)
Subject Matter Expert(s) (SME)
ETL Developer(s)
Reports Developer(s)
System Analyst(s)
Project Manager(s)
Designer
Data Modeller
Data Steward(s)
Project Leader(s)
ETL Pre-prod Support Person(s)
Report Tool Pre-prod Support Person(s)
Prod Server Support Person(s)
Database Administrator(s)
System Administrator(s)
System Engineer(s)

THANK YOU!
Any questions?

Clarifications required?
