DWM1
INTRODUCTION TO DATA WAREHOUSING
Data Mart
A data warehouse that is limited in scope
Chapter 9 Copyright © 2014 Pearson Education, Inc.
COMPARISON CHART OF DATABASE TYPES
Single-tier architecture
Two-tier architecture
From the architecture point of view, there are three warehouse models-
Enterprise Warehouse:-
•An enterprise warehouse collects all of the information about subjects spanning the
entire organization.
•It provides corporate-wide data integration, typically from one or several operational
systems or external information providers, and is cross-functional in scope.
•It usually contains detailed data as well as summarized data and can range in size
from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.
•An enterprise data warehouse may be implemented on traditional mainframes,
computer superservers, or parallel architecture platforms. It requires extensive
business modeling and may take years to design and build.
Data Mart:-
•A data mart contains a subset of corporate-wide data that is important to
a specific group of users.
•The scope is limited to specific selected subjects.
•For example, a marketing data mart may limit its topics to customers,
goods, and sales.
•The data contained in data marts tends to be summarized. Data marts are
typically implemented on low-cost departmental servers that are Unix/Linux or
Windows-based.
•The implementation cycle of a data mart is more likely to be measured
in weeks rather than months or years. However, it may involve complex
integration in the long run if its design and planning were not
enterprise-wide.
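As a rough illustration of the points above, a data mart can be sketched as a summarized subset of a larger warehouse table. The table names `warehouse_sales` and `mart_marketing_sales`, the column names, and the use of SQLite are all assumptions made for this sketch; a real departmental mart would normally live on its own server.

```python
import sqlite3


def build_marketing_mart(warehouse):
    """Create a summarized marketing mart from a (hypothetical) warehouse table.

    The mart keeps only the subjects marketing cares about
    (customers, items, sales) and stores them pre-aggregated.
    """
    warehouse.execute("DROP TABLE IF EXISTS mart_marketing_sales")
    warehouse.execute(
        """
        CREATE TABLE mart_marketing_sales AS
        SELECT customer, item, SUM(amount) AS total_sales
        FROM warehouse_sales
        GROUP BY customer, item
        """
    )
    return warehouse.execute(
        "SELECT customer, item, total_sales "
        "FROM mart_marketing_sales ORDER BY customer, item"
    ).fetchall()
```

Columns of `warehouse_sales` that are not named in the query (for example, storage or staffing details) never reach the mart, which is what keeps its scope limited.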
Virtual Warehouse:-
•A virtual warehouse is a group of views on an operational database.
•For efficient query processing, only some of the possible summary views may be
materialized (physically stored).
•Creating a virtual warehouse is easy, but requires excess capacity on operational
database servers.
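The virtual warehouse idea can be sketched with an ordinary SQL view. The example below is a minimal sketch using SQLite from Python; the `orders` table, the `sales_by_region` view, and the helper names are hypothetical and chosen only for illustration. SQLite has no built-in materialized views, so materialization is imitated here by copying the view's result into a table.

```python
import sqlite3


def build_operational_db():
    """Create a toy operational database with one summary view on top of it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            region   TEXT,
            amount   REAL
        );
        INSERT INTO orders (region, amount) VALUES
            ('North', 120.0), ('North', 80.0), ('South', 50.0);

        -- The "virtual warehouse": a summary view; no data is copied.
        CREATE VIEW sales_by_region AS
            SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
            FROM orders
            GROUP BY region;
        """
    )
    return conn


def query_view(conn):
    """Query the view; the aggregation runs against the operational table each time."""
    return conn.execute(
        "SELECT region, order_count, total_amount "
        "FROM sales_by_region ORDER BY region"
    ).fetchall()


def materialize_view(conn):
    """Imitate a materialized view by storing the view's current result as a table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region_mat")
    conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")
```

Every call to `query_view` adds work to the operational server, which is why a virtual warehouse needs spare capacity there. The table produced by `materialize_view` is cheap to read but goes stale until it is rebuilt.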
•A data warehouse architecture defines how data is arranged across different
databases. Because data must be organized and cleansed to be valuable, a modern
data warehouse architecture focuses on finding the most effective way to extract
information from raw data in the staging area and convert it, using a dimensional
model, into a simple, consumable structure that delivers valuable business
intelligence.
Extraction, Transformation And
Loading
1. Extraction:
•The first step of the ETL process is extraction.
•In this step, data in various formats, such as relational databases, NoSQL, XML,
and flat files, is extracted from the source systems into the staging area.
•The data should be stored in the staging area first, not loaded directly into the
data warehouse, because the extracted data arrives in various formats and may
also be corrupted.
•Loading it directly into the data warehouse could damage the warehouse, and
rollback would be much more difficult. This makes extraction one of the most
important steps of the ETL process.
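The extraction step above can be sketched as follows. This is a minimal sketch, not a production pipeline: the table names `staging_sales` and `warehouse_sales`, the two source formats (CSV and JSON text), and the validation rule are all assumptions for illustration. Raw rows land in the staging table untouched, and only rows that pass validation are copied into the warehouse table.

```python
import csv
import io
import json
import sqlite3


def extract_csv(text):
    """Extract rows from a flat-file source given as CSV text."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def extract_json(text):
    """Extract rows from a JSON source given as a list of objects."""
    return json.loads(text)


def stage(conn, rows, source):
    """Store extracted rows as raw text in the staging area, tagged with their source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_sales (source TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO staging_sales VALUES (?, ?, ?)",
        [
            (source, r.get("customer"),
             None if r.get("amount") is None else str(r.get("amount")))
            for r in rows
        ],
    )


def is_valid(customer, amount):
    """Hypothetical validation rule: non-empty customer and a numeric amount."""
    if not customer:
        return False
    try:
        float(amount)
    except (TypeError, ValueError):
        return False
    return True


def load_valid_rows(conn):
    """Copy only valid staged rows into the warehouse; return (loaded, rejected)."""
    conn.execute("CREATE TABLE IF NOT EXISTS warehouse_sales (customer TEXT, amount REAL)")
    staged = conn.execute("SELECT customer, amount FROM staging_sales").fetchall()
    good = [(c, float(a)) for c, a in staged if is_valid(c, a)]
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", good)
    return len(good), len(staged) - len(good)
```

Because corrupted rows are caught in staging, the warehouse table never has to be repaired or rolled back on their account; the rejected rows remain in `staging_sales` for inspection.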
Categories of Metadata
Metadata can be broadly categorized into three categories −
•Business Metadata − It has the data ownership information, business definition, and
changing policies.
•Technical Metadata − It includes database system names, table and column names
and sizes, data types and allowed values. Technical metadata also includes structural
information such as primary and foreign key attributes and indices.
•Operational Metadata − It includes currency of data and data lineage. Currency of
data means whether the data is active, archived, or purged. Lineage of data means the
history of the data migrated and the transformations applied to it.
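The three categories can be made concrete with a small sketch. Technical metadata is read here from SQLite's catalog through its `PRAGMA table_info` and `PRAGMA foreign_key_list` statements; the `BusinessMetadata` and `OperationalMetadata` classes and their field names are hypothetical containers invented for this example, not a standard schema.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    owner: str        # who owns the data
    definition: str   # business definition of the table


@dataclass
class OperationalMetadata:
    currency: str                                 # "active", "archived", or "purged"
    lineage: list = field(default_factory=list)   # ordered migration/transformation history


def technical_metadata(conn, table):
    """Collect column names, types, and key information for one table.

    `table` must be a trusted identifier, since PRAGMA arguments
    cannot be bound as query parameters.
    """
    columns = [
        {"name": name, "type": col_type, "primary_key": bool(pk)}
        for _cid, name, col_type, _notnull, _default, pk
        in conn.execute(f"PRAGMA table_info({table})")
    ]
    foreign_keys = [
        # row layout: id, seq, referenced table, from column, to column, ...
        {"column": row[3], "references": f"{row[2]}.{row[4]}"}
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"table": table, "columns": columns, "foreign_keys": foreign_keys}
```

A metadata repository would typically store one record of each kind per table, for example `OperationalMetadata(currency="active", lineage=["extracted from CRM", "deduplicated"])`, where the lineage entries are made-up values.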
BENEFITS OF A DATA WAREHOUSE