By
Mrs. Chhaya S. Pawar
Be-a / te-B
2018-19
• Batch and near real-time loads to integrate data from multiple sources
(internal and external)
• Basic reporting with no drill-down/drill-across
• Online analytical processing (OLAP)
• Predictive analytics
• Operational business intelligence
Tangible Benefits
• Better decisions in terms of cost and quality
• Enhanced asset liability management
• Lower cost of product introduction through targeted marketing
With 200 million dollars in annual sales, even a 1% improvement in sales will
bring 2 million dollars in additional revenue
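The revenue figure above can be checked with a short calculation. This is only a sketch; the function name `additional_revenue` is made up for illustration.

```python
def additional_revenue(annual_sales, improvement_pct):
    """Extra revenue gained from a percentage improvement in annual sales."""
    return annual_sales * improvement_pct / 100

# 1% improvement on 200 million dollars of annual sales
extra = additional_revenue(200_000_000, 1)
print(f"${extra:,.0f}")
```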
Intangible Benefits
• Improved productivity by keeping data at one place
• Enhanced customer relations by knowing individual customer
• CRM improved with customization
• Enable the reengineering of business processes
• Customer experience
• Risk mitigation
• Finance transformation
• Product innovation
• Asset optimization
• Operational excellence
………………words by Teradata
Data warehouses are a special type of database, built specifically for
getting information OUT rather than putting data IN
Definition: According to W. H. Inmon (1992), considered to be the father of data
warehousing:
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant
collection of data in support of management’s decisions.
• Subject-oriented
• Integrated
• Non-volatile
• Time-variant
How does the huge amount of data in the source systems get delivered to the data
warehouse users as useful pieces of information?
CHPT 1:INTRODUCTION TO DATA WAREHOUSING
ETL PROCESS
• Main source
• A lot of variation, as data is collected from different platforms, operating systems, DBMSs, etc.
• Disparity in the data is the biggest challenge
• It’s a place where all the extracted data is temporarily stored and prepared
for loading into data warehouse
• It isolates the raw data
• DW users cannot access the staging area, which protects security and process quality
• It eases the development of a central metadata repository, which maintains
documentation of the operational systems, ETL process, DW, tools and predefined
reports
• Operational metadata
• Extraction and transformational metadata
• End user metadata
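The flow from disparate sources through a staging area into the warehouse can be sketched as follows. This is a minimal illustration, not a production ETL design: the two source layouts, the field names and the `sales` table are all invented, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical extracts: two source systems describe the same facts differently.
source_a = [{"cust": "  Asha ", "amount": "120.50", "date": "2006-01-02"}]
source_b = [{"customer_name": "RAVI", "amt": 80, "sale_date": "02/01/2006"}]

def extract():
    """Copy raw rows into the staging area unchanged, tagged with their origin."""
    return [("A", row) for row in source_a] + [("B", row) for row in source_b]

def transform(staging):
    """Resolve the disparity between sources: one layout, one date format."""
    clean = []
    for origin, row in staging:
        if origin == "A":
            name, amount, day = row["cust"], row["amount"], row["date"]
        else:
            name, amount = row["customer_name"], row["amt"]
            # Source B is assumed to use DD/MM/YYYY dates.
            day = datetime.strptime(row["sale_date"], "%d/%m/%Y").date().isoformat()
        clean.append((name.strip().title(), float(amount), day))
    return clean

def load(rows):
    """Load the prepared rows into the (stand-in) data warehouse."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE sales (customer TEXT, amount REAL, sale_date TEXT)")
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    dw.commit()
    return dw

warehouse = load(transform(extract()))
loaded = warehouse.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Warehouse users only ever query the `sales` table; the raw rows stay isolated in the staging list.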
DIMENSIONAL MODELLING
• Preparing logical design of data warehouse
• Data tables are designed, physically created and linked with each other
Data Warehouse Modelling vs Operational Database Modelling
• Star Schema
• Snowflake Schema
• Fact Constellation Schema
Example:
How much profit in dollars did the salesperson David make on 2 January 2006
by selling trousers to Jennhy at the New Delhi store?
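A question like this maps naturally onto a star schema: one fact table holding the measure (profit), joined to one dimension table per descriptive attribute. The sketch below uses SQLite; the table and column names and the sample rows, including the profit figures, are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables: one per descriptive attribute in the question
CREATE TABLE dim_salesperson (salesperson_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date        (date_id INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_product     (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer    (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store       (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: foreign keys to every dimension plus the numeric measure
CREATE TABLE fact_sales (
    salesperson_id INTEGER REFERENCES dim_salesperson(salesperson_id),
    date_id        INTEGER REFERENCES dim_date(date_id),
    product_id     INTEGER REFERENCES dim_product(product_id),
    customer_id    INTEGER REFERENCES dim_customer(customer_id),
    store_id       INTEGER REFERENCES dim_store(store_id),
    profit         REAL
);

-- Invented sample data
INSERT INTO dim_salesperson VALUES (1, 'David');
INSERT INTO dim_date        VALUES (1, '2006-01-02');
INSERT INTO dim_product     VALUES (1, 'trousers'), (2, 'shirts');
INSERT INTO dim_customer    VALUES (1, 'Jennhy');
INSERT INTO dim_store       VALUES (1, 'New Delhi');
INSERT INTO fact_sales      VALUES (1, 1, 1, 1, 1, 35.0), (1, 1, 2, 1, 1, 12.0);
""")

query = """
SELECT SUM(f.profit)
FROM fact_sales f
JOIN dim_salesperson s ON s.salesperson_id = f.salesperson_id
JOIN dim_date d        ON d.date_id = f.date_id
JOIN dim_product p     ON p.product_id = f.product_id
JOIN dim_customer c    ON c.customer_id = f.customer_id
JOIN dim_store st      ON st.store_id = f.store_id
WHERE s.name = 'David' AND d.full_date = '2006-01-02'
  AND p.name = 'trousers' AND c.name = 'Jennhy' AND st.city = 'New Delhi'
"""
profit = db.execute(query).fetchone()[0]
print(profit)
```

Each condition in the question becomes a filter on one dimension table, while the answer comes from the single measure in the fact table.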
• System conversion
• Data Aging
• Heterogeneous system integration
• Incomplete information at data entry
• Fraud
• Lack of policies for preventing corrupt or incorrect data
ISSUES IN DATA CLEANSING
• Which data to cleanse: decided by the project team and users, weighing the effort of
cleansing against the consequences of leaving the dirty data as it is
• Where to cleanse: in the data staging area
• How to cleanse: find appropriate tools
• To minimize on the fly processing needed when the user is navigating the data
• Preprocessing and storing all the possible combinations of measures,
dimensions and hierarchies before the user starts the analysis
• This makes the data available to the user instantaneously
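The idea of preprocessing every combination before the analysis starts can be sketched as below. This is a simplified illustration under stated assumptions: the sample facts are invented, the helper `precompute_cube` is hypothetical, and only flat dimensions are aggregated (hierarchies within a dimension are left out).

```python
from collections import defaultdict
from itertools import combinations

# Invented sample facts with three dimensions and one measure
facts = [
    {"product": "trousers", "store": "New Delhi", "year": 2006, "profit": 35.0},
    {"product": "shirts", "store": "New Delhi", "year": 2006, "profit": 12.0},
    {"product": "trousers", "store": "Mumbai", "year": 2006, "profit": 20.0},
]
dimensions = ("product", "store", "year")

def precompute_cube(rows, dims, measure):
    """Total the measure for every subset of dimensions, ahead of any query."""
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube

cube = precompute_cube(facts, dimensions, "profit")

# At analysis time each answer is a dictionary lookup, not a calculation.
trousers_profit = cube[("product",)][("trousers",)]
grand_total = cube[()][()]
print(trousers_profit, grand_total)
```

The trade-off is storage and load time: with n dimensions there are 2^n groupings to compute and keep.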
• Definition:
Online analytical processing (OLAP) is a category of software technology that
enables analysts, managers and executives to gain insights into the data through
fast, consistent, interactive access in a wide variety of possible views of
information that has been transformed from raw data to reflect the real
dimensionality of the enterprise as understood by the user.
• Whenever the ROLAP engine in the analytical server issues a complex query, it fetches data from
the main warehouse and dynamically creates a multidimensional view of the data for the user.
• This is where it differs from MOLAP: MOLAP already has a static multidimensional view of
the data stored in proprietary multidimensional databases (MDDBs).
• The cells, or data cubes, of these multidimensional databases carry precalculated and
prefabricated data. Proprietary software systems create this precalculated and
prefabricated data while the data is loaded into the MDDBs from the main databases.
• It is then the job of the MOLAP engine, which resides in the application layer,
to provide the multidimensional view of the data from the MDDBs to the user.
• Thus, when a user requests data, no time is wasted in calculating it and
the system responds quickly.
• It is able to download a relatively small hypercube from a central point, usually from
data mart or data warehouse, and perform multidimensional analyses while
disconnected from the source.
• Data sets are limited to the boundaries defined by the user with no access to granular
data.