You are on page 1of 69

Chapter 2

DATA
WAREHOUSE:

Group 6
Table of contents
Defining features
-Integrated Data
01
-Time-Variant Data
-Nonvolatile Data
Data Warehouses and Data
Marts
-How Are They Different? 02
-Centralized Data Warehouse
-Independent Data Marts
-Data-Mart Bus

Overview of the components


-Source Data Component 03
-Data Staging Component
-Data Storage Component
Table of contents
-Information Delivery Component
-Management and Control Component

Metadata in the Data Warehouse


04
-Types of Metadata
Chapter Summary
05
Chapter Objectives
• Review formal definitions of a data warehouse
• Discuss the defining features
• Distinguish between data warehouses and data marts
• Study each component or building block that makes
up a data warehouse
• Introduce metadata and highlight its significance
Data is an information
Warehouse delivery system.
Bill Inmon

“A Data Warehouse is a subject oriented,


integrated, nonvolatile, and time variant
collection of data in support of
management’s decisions.”
Sean Kelly

The data in the data warehouse is:


Separate
Available
Integrated
Time stamped
Subject oriented
Nonvolatile
Accessible
Defining
features
Subject-Oriented Data
• In operational systems - store data by
individual applications.
• In the data sets for an order processing
application - keep the data for that
particular application.
Subject-Oriented Data
• In every industry, data sets are
organized around individual
applications to support those
particular operational systems.
Subject-Oriented Data
• In striking contrast, in the data
warehouse, data is stored by
subjects, not by applications.
• Business subjects differ from
enterprise to enterprise.
Integrated Data
• A data warehouse is developed
by integrating data from varied
sources into a consistent
format.
Data from Applications
TIME-
VARIANT
Data
TIME-VARIANT Dat

In data warehousing,
what is the term
“time variant”?
TIME-VARIANT Dat

The term “time variant”

data warehouse
specific time period.
TIME-VARIANT Dat

• The data warehouse is consistent


within a period, data warehouse is
loaded daily, hourly, or on a regular
basis

• Does not change during that period.


TIME-VARIANT Dat

What is a time variant data example?

Time-Variant: A data warehouse stores


historical data. Data from a data
warehouse, for example, can be
retrieved from three months, six
months, twelve months, or even older
data.
Non-Volatile Data

Data extracted from the various


operational systems and pertinent data
obtained from outside sources are
transformed, integrated, and stored in
the data warehouse. The data in the
data warehouse is not intended to run
in day-to-day business.
Non-Volatile Data

Data from the operational systems are


moved into the data warehouse at
specific intervals. Depending on the
requirements of the business, these data
movements take place twice a day,
once a day, once a week, or once in two
weeks.
Data Warehouse
& Data Marts
Data Warehouse and Data Marts

Data Warehouse - store and process large


amounts of data from various sources within a
business.

Data Marts - A data mart is a data storage


system that contains information specific to an
organization's business unit.
How Are They Different?
Data Warehouse Data Marts
 Corporate/Enterprise-wide  Departmental
 Union of all data marts  A single business process
 Data received from staging  STARjoin (facts &
area dimensions)
 Queries on presentation  Technology optimal for data
resource access and analysis
 Structure for corporate view  Structure to suit the
of data departmental view of data
 Organized on E-R model
Centralized Data Warehouse

This architectural type takes into account the


enterprise-level information requirements. An overall
infrastructure is established. Atomic level normalized data
at the lowest level of granularity is stored in the third
normal form.
Independent Data Marts

This architectural type evolves in companies


where the organizational units develop their own data marts
for their own specific purposes. Although each data mart
serves the particular organizational unit, these separate data
marts do not provide “a single version of the truth.”
Data-Mart Bus
This is the Kimbal conformed supermarts
approach. You begin with analyzing requirements for a
specific business subject such as orders, shipments,
billings, insurance claims, car rentals, and so on. You build
the first data mart (supermart) using business dimensions
and metrics. These business dimensions will be shared in
the future data marts.
Overview of
the
Components
Erikamae Manzo
Components of a system

Operational System

Order entry Claims processing Savings account


Front-end component
Graphical User Interface (GUI)
Data storage component
Microsoft SQL
Oracle Informix
Server
Display component
Graphical User Interface (GUI)
Connectivity component

Data interface Network software


Architecture
Architecture in the proper arrangement of components
Architecture
Architecture in the proper arrangement of components

The basic
components
are the same
Source Data Component

PRODUCTION

INTERNAL

ARCHIVED

EXTERNAL
Production
Data
This comes from the different operating
systems. Basis: Data requirements in
data warehouse → segments of data
from various operational modes
PRODUCTION DATA
This is data from many vertical applications.

Operational System: info queries → NARROW


The queries are all predictable.

“You do not expect a particular query to run across different


operational systems.” – There is no conformance of data
among the various operational systems.
PRODUCTION DATA

Significant and Disturbing Characteristic:

DISPARITY
Noun
dis·​par·​i·​ty | \ di-ˈsper-ə-tē , -ˈspa-rə- \

Definition
: a noticeable and usually significant difference or dissimilarity
Internal
Data
In each organization, the client keeps
their “private” spreadsheets, reports,
customer profiles, and sometimes even
department databases.
INTERNAL DATA
You cannot ignore the internal data held in private files in
your organization. It is a collective judgment call on how
much of the internal data should be included in the data
warehouse.

Internal data ADDS additional complexity to the process of


transforming and integrating the data before it can be stored
in the data warehouse.
Archived
Data
Operational systems: run the current
business. In every operational system,
we periodically take the old data and
store it in archived files.
ARCHIVED DATA
The circumstances in your organization dictate how often
and which portions of the operational databases are
archived for storage.

Stage:
1st: recent data → separate archival database
2nd: older data → flat files on disk storage
3rd: oldest data → tape cartridges or microfilm and kept
off-site
ARCHIVED DATA

Data warehouse keeps historical snapshots of data →


ANALYSIS

This type of data is useful for discerning patterns and


analyzing trends.
External
Data
Most depend on information from
external sources for a large percentage
of the information they use. They use
statistics associating to their industry
produced by the external department.
EXTERNAL DATA
The purposes served by such external data sources cannot
be fulfilled by the data available within organization itself.

Insights & archived data: LIMITED

In order to spot industry trends and compare performance


against other organizations, you need data from
EXTERNAL SOURCES.
EXTERNAL DATA

Outside sources’ data do not conform to your formats.

→ organize the data transmissions from the external


sources

You need to accommodate the variations.


Data Staging Component

Data staging provides a place and an area with a set of functions to clean,
change, combine, convert, deduplicate, and prepare source data for storage
and use in the data warehouse.
Three major functions need to be performed for getting the
data ready

Data
Data Extraction. Data Loading.
Transformation.
Data Extraction.
This function has to deal with numerous data sources. You have to
employ the appropriate technique for each data source. Source
data may be from differ- ent source machines in diverse data
formats. Part of the source data may be in relation- al database
systems. Some data may be on other legacy network and
hierarchical data models. Many data sources may still be in flat
files. You may want to include data from spreadsheets and local
departmental data sets. Data extraction may become quite
complex.
Data Transformation
Data transformation involves many forms of combining pieces of
data from the differ- ent sources. You combine data from a single
source record or related data elements from many source records.
On the other hand, data transformation also involves purging
source data that is not useful and separating out source records
into new combinations. Sorting and merging of data takes place
on a large scale in the data staging area.
Data Loading
Two distinct groups of tasks form the data loading function. When
you complete the design and construction of the data warehouse
and go live for the first time, you do the initial loading of the data
into the data warehouse storage. The initial load moves large
volumes of data using up substantial amounts of time. As the data
warehouse starts functioning, you continue to extract the changes
to the source data, transform the data revisions, and feed the
incremental data revisions on an ongoing basis.
Information Delivery Component
USERS OF INFORMATION FROM THE DATA WAREHOUSE

NOVICE NOVICE NOVICE


USER USER USER

Comes to the data Wants to be able to navigate


warehouse with no throughout the data warehouse, pick
Needs prepackaged
training therefore, up interesting data, format queries,
information but not
needs fabricated drill through the data layers and
regularly
reports and present create custom reports and ad hoc
queries queries
INFORMATION DELIVERY COMPONENT
Metadata Component

Metadata component is the


data about the data in the data
warehouse.
Management and Control Component
Metadata in the
Data Warehouse
Contain all of the
information about the
OPERATIONAL
EXTRACTION
END-USER AND
TRANSFORMATION
METADATA METADATA
operational data
METADATA
sources
Contain data about the
extraction of data from the
EXTRACTION AND
END-USER
source systems and all the
OPERATIONAL
METADATATRANSFORMATION
METADATA
METADATA
data transformations that
take place in the data staging
area.
The navigational map
END-USER
EXTRACTION
END-USER AND
TRANSFORMATION
METADATA
METADATA
of the data warehouse.
METADATA
Special Significance
 It acts as the glue that connects all parts of
the data warehouse.

 it provides information about the contents


and structures to the developers.

 It opens the door to the end-users and


makes the contents recognizable in their
own terms
Chapter
Summary
Defining features of the data
warehouse are: separate, subject-
oriented, integrated, time-
variant, and nonvolatile.
 You may use a top-down approach and
build a large, comprehensive, enterprise
data warehouse; or, you may use a
bottom-up approach and build small,
independent, departmental data marts. In
spite of some advantages, both
approaches have serious shortcomings.
A viable practical approach is
to build conformed data marts,
which together form the
corporate data warehouse.
Data warehouse building blocks
or components are: source data,
data staging, data storage,
information delivery, metadata,
and management and control.
In a data warehouse, metadata is
especially significant because it
acts as the glue holding all the
components together and serves
as a roadmap for the end-users.
Thank you!

You might also like