
Data Warehouse Design

Data Warehouse Architecture


What is a Data Warehouse Architecture?

• Conceptualization of how the data warehouse
is built
• The conceptual framework/organization based
on the business processes of a particular
enterprise
The architecture varies with the needs/processes
of an enterprise
Data Warehouse Architecture
Data Sources
• Databases that serve daily operations of the
enterprise e.g. production, sales (cash
register), accounting
• Operational Data Store (ODS) is a point of
integration for operational systems that were
developed independently of each other.
The ODS needs to be continually updated since it
supports day-to-day operations
Staging Area
• This is a place where data copied from the data
sources is prepared for the data warehouse
It holds a copy of data retrieved from different data
sources, which is later loaded into the DW
• The data in this area is accessible to DW
practitioners only
DB operators and analysts do not have access
Data Warehouse Area
This area stores
• Cleaned raw data
• Derived (aggregated) data
– Usual aggregates of the raw data e.g. quarterly sales
per region
• Metadata
– Data about data, which defines the DW
– Describes the meaning, properties and origins of the data
in the DW
– It is used for building, maintaining and managing
the data warehouse
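The "quarterly sales per region" aggregate mentioned above can be sketched in a few lines. This is only an illustration: the sample rows, the `quarter_of` helper and the `quarterly_sales_per_region` function are assumptions made for the example, not part of any particular DW product.

```python
from collections import defaultdict

# Hypothetical cleaned raw sales rows: (region, month, amount).
raw_sales = [
    ("North", 1, 100.0),
    ("North", 2, 150.0),
    ("North", 4, 200.0),
    ("South", 3, 80.0),
    ("South", 5, 120.0),
]


def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1


def quarterly_sales_per_region(rows):
    """Sum the sales amounts for each (region, quarter) pair."""
    totals = defaultdict(float)
    for region, month, amount in rows:
        totals[(region, quarter_of(month))] += amount
    return dict(totals)


# Derived data that would be stored next to the cleaned raw data.
derived = quarterly_sales_per_region(raw_sales)
```

Storing such aggregates means frequent analytical queries do not have to rescan the detailed rows each time.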
Why we need Metadata
• Metadata helps to answer the following
questions
Where did the data come from?
How many times has the data been reloaded?
What transformations were applied during
cleansing?
What tables, attributes, and keys does the Data
Warehouse contain?
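As a rough sketch, the questions above could each map to one field of a metadata record kept per DW table. The `TableMetadata` class, its field names and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical metadata record for one DW table."""
    table: str
    source: str                                   # where the data came from
    attributes: list = field(default_factory=list)
    keys: list = field(default_factory=list)
    reload_count: int = 0                         # how often it was reloaded
    transformations: list = field(default_factory=list)

    def record_reload(self, applied):
        """Note one reload and any cleansing steps not yet recorded."""
        self.reload_count += 1
        for step in applied:
            if step not in self.transformations:
                self.transformations.append(step)


meta = TableMetadata(
    "sales",
    source="cash register system",
    attributes=["region", "amount"],
    keys=["sale_id"],
)
meta.record_reload(["trim whitespace"])
meta.record_reload(["trim whitespace", "drop duplicates"])
```

After the two reloads, `meta` answers all four questions for the `sales` table: origin, reload count, applied transformations, and the attributes and keys it contains.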
Presentation Area
• Referred to as the warehouse in business
communities
• This area comprises
Data marts
Users or analytical processes
• Analytical processes involve the use of tools to
gather insights from the data
Data Marts
• A logical subset of the complete data
warehouse.
• Often viewed as a restriction of the data
warehouse to a single business process or to a
group of related business processes targeted
toward a particular business group
Extracted and designed to support the management
of a department or section
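One way to picture a data mart as a logical subset is a database view that restricts the warehouse to one department. The sketch below uses Python's built-in `sqlite3` module; the table `warehouse_sales`, the view `marketing_mart` and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "North", 100.0),
        ("finance", "North", 300.0),
        ("marketing", "South", 50.0),
    ],
)

# The data mart: a view limited to one department's rows and columns.
conn.execute(
    """
    CREATE VIEW marketing_mart AS
    SELECT region, amount
    FROM warehouse_sales
    WHERE department = 'marketing'
    """
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
```

In practice a data mart is often materialized as its own tables rather than a view; the view is used here only because it makes the "subset" relationship explicit.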
From the Data Warehouse to Data Marts
[Figure: progression from data to information across three levels: organizationally structured (the data warehouse), departmentally structured, and individually structured. Other labels in the figure: More, Less, History, Normalized, Detailed.]
Data Warehouse vs Data Mart
DW ARCHITECTURE IN PRACTICE
Vertical-tier and Horizontal-tier architecture
Vertical Tier
• Popular ones are
Generic Two-Tier Architecture
Three-Tier Architecture
Two-Tier Architecture
Thin Client
• Operations are executed on the server side
• The client is just used to display the results
• This architecture fits well for Internet DW
access
Fat Client
• The server just delivers the data e.g., the
corresponding data mart
• Operations are executed on the client
• Communication between client and server
must be stable to sustain large data transfers
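The difference between the two client types can be sketched by looking at where the operation runs and what crosses the network. The functions below are hypothetical stand-ins for a server and its clients, not a real client/server API.

```python
# Hypothetical data mart held on the server: (region, amount) rows.
DATA_MART = [("North", 100.0), ("North", 150.0), ("South", 80.0)]


def total_by_region(rows):
    """The analytical operation: total amount per region."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def thin_client_request():
    """Thin client: the server runs the operation and returns the result."""
    return total_by_region(DATA_MART)


def fat_client_request():
    """Fat client: the server only delivers the data mart rows."""
    return list(DATA_MART)


# Thin client just displays what the server computed.
thin_result = thin_client_request()

# Fat client receives all rows and computes locally.
fat_result = total_by_region(fat_client_request())
```

Both paths give the same totals; what differs is the transfer size (the small result versus every row of the mart), which is why the fat client needs a stable connection for large transfers.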
Three-Tier Architecture
Tier 3: reporting and analysis
Tier 2: derived data that has been aggregated for DSS
Tier 1: raw and detailed data intended to be the
single source for all decision support
Horizontal Tiers
• Popular practices
Independent Data Mart
Dependent Data Mart
Hybrid Data Mart
Independent Data Mart
• Mini warehouses
Limited in scope
Faster and cheaper to build than DWs
• Separate ETL for each independent Data Mart
Redundant processing for each mart
Dependent Data Mart
• Single ETL for the DW
No redundancy in the ETL process
• Data Marts are loaded from the DW
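The ETL redundancy that separates independent from dependent data marts can be made visible by counting ETL runs. This is a simplified sketch: `extract_transform` is a stand-in for a real ETL job, and the source rows and department names are invented for the example.

```python
def extract_transform(source_rows):
    """Stand-in for an ETL job: drop rows with a missing amount."""
    return [row for row in source_rows if row["amount"] is not None]


# Hypothetical operational source data.
source = [
    {"dept": "sales", "amount": 10.0},
    {"dept": "sales", "amount": None},
    {"dept": "hr", "amount": 5.0},
]


def build_independent_marts(source_rows, depts):
    """Each mart runs its own ETL against the sources."""
    etl_runs = 0
    marts = {}
    for dept in depts:
        cleaned = extract_transform(source_rows)  # repeated per mart
        etl_runs += 1
        marts[dept] = [r for r in cleaned if r["dept"] == dept]
    return marts, etl_runs


def build_dependent_marts(source_rows, depts):
    """One ETL run loads the DW; the marts are then loaded from the DW."""
    warehouse = extract_transform(source_rows)  # single ETL run
    marts = {d: [r for r in warehouse if r["dept"] == d] for d in depts}
    return marts, 1


independent, independent_runs = build_independent_marts(source, ["sales", "hr"])
dependent, dependent_runs = build_dependent_marts(source, ["sales", "hr"])
```

With two marts the independent approach runs the ETL twice and the dependent approach once, while both end up with the same mart contents.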
Hybrid Data Mart
• A mixture of dependent and independent data
marts
• Combines data from several operational
source systems in addition to a data
warehouse
• Integrated view of the enterprise
Learning Activity: Discussion Forum
• Compare the performance of vertical tier
architectures and horizontal tier architectures
OTHER TYPES OF DATA WAREHOUSE
Centralized vs Distributed
Centralized DW
• Analytical queries are run only at the main
enterprise location
No need to transport data over the network
• High costs for large dedicated hardware
Distributed DW
• A more natural form, since corporations are
active all over the world and use different
types of hardware and software
– Example: large companies such as Uber, eBay, etc.
• A distributed DW works well
when much processing occurs at the local level
when, even though local branches report to the same
balance sheet, the local organizations are
somewhat autonomous
Types of Distributed DW
1. Geographically distributed
• In the case of corporations spread around the world
– Information is needed both locally and globally (e.g., KFC)

2. Technologically distributed DW
• Placing the DW on the distributed technology of
some vendor
– Entry costs are low, whereas large centralized hardware is expensive
– No theoretical limit on how much data can be placed in the DW
– New servers can be added to the network on demand
3. Independently evolving distributed DW
• Developed concurrently in organizations
• One DW is built and operationalized
– Once it is successful, other parts of the organization follow
independently

• ETL, which stands for "extract, transform, load," refers to the
three processes that, in combination, move data from
one database, multiple databases, or other sources to a
unified repository, typically a data warehouse.
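A minimal sketch of the three ETL steps, using Python's built-in `sqlite3` module as the unified repository. The source lists (`pos_system`, `accounting`), the `sales` table and the cleansing rules are assumptions made for the example; real ETL jobs are considerably more involved.

```python
import sqlite3


def extract(sources):
    """Extract: pull rows from every source in turn."""
    for rows in sources:
        yield from rows


def transform(rows):
    """Transform: normalize region names and drop rows with no amount."""
    for name, amount in rows:
        name = name.strip().title()
        if name and amount is not None:
            yield (name, float(amount))


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


# Hypothetical operational sources with inconsistent formatting.
pos_system = [(" north ", "100"), ("south", None)]
accounting = [("SOUTH", 80)]

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract([pos_system, accounting])))

loaded = warehouse.execute(
    "SELECT region, amount FROM sales ORDER BY region"
).fetchall()
```

The row with the missing amount is dropped during the transform step, so only two cleaned rows reach the warehouse.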
ETL Process
