Professional Documents
Culture Documents
1
LEARNING OBJECTIVES
3.1 Understand the basic definitions and concepts of data
warehousing
3.2 Understand data warehousing architectures
3.3 Describe the processes used in developing and managing data
warehouses
3.4 Explain data warehousing operations
3.5 Explain the role of data warehouses in decision support
3.6 Explain data integration and the extraction, transformation, and
load (ETL) processes
3.7 Understand the essence of business performance management
(BPM)
3.8 Learn balanced scorecard and Six Sigma as performance
measurement systems
2
Data Warehouse
1. Subject oriented
• Organized around major subjects, such as sales progress
• Containing only information relevant for decision support
• Focusing on the modeling and analysis of data for decision makers,
not on daily operations or transaction processing
• For example, to learn more about your company's sales, you can
build a warehouse that concentrates on sales.
Using this warehouse, you can answer questions like "Who was our
best customer for this item last year?" This ability to define a data
warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented (http://docs.oracle.com/)
4
4 main characteristics of data warehousing
2. Integrated
• Constructed by integrating multiple, various
data sources
• Must place data from different sources into
a consistent format, to do so they must deal
with naming conflict and discrepancies
• Data cleaning and data integration
techniques are applied
• Ensure consistency in naming conventions
among different data sources
• When data is moved to the warehouse, it is
converted 5
4 main characteristics of data warehousing
4. Non-volatile
After data are entered into a data warehouse, users cannot
change or update the data.
Operational update of data does not occur in the data
warehouse environment
Does not require transaction processing, recovery, and
concurrency control mechanisms
Requires only two operations in data accessing:
• Initial loading of data and
• Access of data 7
Other DWs Characteristics
Summarized
Not normalized
Metadata
Web based, relational/multi-dimensional
Client/server, real-time/right-time/active...
Summary of Data Warehouse
9
Data Mart
A Generic DW Framework
Data Applications
Sources No data marts option (Visualization)
Data
Marts Routine
ERP Business
ETL
Reporting
Process
Data mart
Select (Marketing)
Legacy Metadata Data/text
Extract mining
Data mart
Transform Enterprise (Operations)
POS Data warehouse
OLAP,
Integrate
Data mart Dashboard,
(Finance) Web
Other Load
OLTP/Web
Replication Data mart
(...) Custom built
External
applications
Data
14
DW Architecture
• Three-tier architecture
3-tier
architecture
2-tier 1-tier
architecture Architecture?
Tier 1: Tier 2:
Client workstation Application & database server
Data Warehousing Architectures
A Web-based DW Architecture
18
Alternative DW Architectures (1 of 2)
(a) Independent Data Marts Architecture
ETL
End user
Source Staging Independent data marts
access and
Systems Area (atomic/summarized data)
applications
ETL
Dimensionalized data marts End user
Source Staging
linked by conformed dimentions access and
Systems Area
(atomic/summarized data) applications
19
Alternative DW Architectures (2 of 2)
(d) Centralized Data Warehouse Architecture
ETL
Normalized relational End user
Source Staging
warehouse (atomic/some access and
Systems Area
summarized data) applications
Dimensional Modeling
A retrieval-based system that supports high-volume query
access
Star schema
The most commonly used and the simplest style of
dimensional modeling
Contain a fact table surrounded by and connected to several
dimension tables
Snowflakes schema
An extension of star schema where the diagram resembles a
snowflake in shape
Multidimensionality
29
OLAP
30
OLTP vs OLAP
OLTP OLAP
Users Clerk, IT professional Knowledge worker
Function Day to day operations Decision support
To suit typical database function Designed for reporting on
DB Design
of update, edit, delete, relational Subjects, data warehouse
Historical, summarized,
Current, up-to-date
Data integrated, multidimensional,,
detailed,
consolidated
Usage Repetitive, structured Ad-hoc, un-structured
Access Read/write Read. Lots of scans
Type of Work Short, simple transaction Complex query
# Records Accessed Tens Millions
# Users Thousands Hundreds, Tens
31
DB Size 100MB-GB 100GB-TB
OLAP Operations
OLAP
• Slicing Operations on a Simple Tree-Dimensional Data Cube
A 3-dimensional
OLAP cube with Sales volumes of
slicing a specific Product
operations on variable Time
and Region
e
m
Ti
Product
Geography
Sales volumes of
a specific Time on
variable Region
and Products
Successful DW Implementation Things to Avoid
Scalability
The main issues pertaining to scalability:
The amount of data in the warehouse
How quickly the warehouse is expected to grow
The number of concurrent users
The complexity of user queries
Good scalability means that queries and other data-
access functions will grow linearly with the size of the
warehouse
How the database looks like for the two types
36
How the database looks like for the two types
37
DW Administration and Security
The Future of DW
• Sourcing…
– Web, social media, and Big Data
– Open source software
– SaaS (software as a service)
– Cloud computing
– Data lakes
• Infrastructure…
– Columnar
– Real-time DW
– Data warehouse appliances
– Data management practices/technologies
– In-database & In-memory processing New DBMS
– New DBMS, Advanced analytics, …
40
Data Lakes
• Unstructured data storage technology for Big Data
• Data Lake versus Data Warehouse
Data lakes store all types of raw data, which data scientists may then use for a variety
of projects. Data warehouses store cleaned and processed data, which can then be
used to source analytic or operational reporting, as well as specific BI use cases.
Table 3.6 A Simple Comparison between a Data Warehouse and a Data Lake
Dimension Data Warehouse Data Lake
The nature of data Structured, processed Any data in raw/native format
Processing Schema-on-write (SQL) Schema-on-read (NoSQL)
Retrieval speed Very fast Slow
Cost Expensive for large data volumes Designed for low-cost storage
Agility Less agile, fixed configuration Highly agile, flexible configuration
Novelty/newness Not new/matured Very new/maturing
Security Security Not yet well-secured
Users Business professionals Data scientists
Business Performance Management (1 of 2)
1. Strategize
2. Plan
3. Monitor/analyze
4. Act/adjust
Each with its own sub-
process steps
44
• Strategic planning
– Common tasks for the strategic planning process:
1. Conduct a current situation analysis
2. Determine the planning horizon
3. Conduct an environment scan
4. Identify critical success factors
5. Complete a gap analysis
6. Create a strategic vision
7. Develop a business strategy
8. Identify strategic objectives and goals
2 - Plan: How Do We Get There?
Operational planning
Operational plan: plan that translates an organization’s
strategic objectives and goals into a set of well-defined
tactics and initiatives, resource requirements, and
expected results for some future time period (usually a
year).
Operational planning can be
Tactic-centric (operationally focused)
Budget-centric plan (financially focused)
3 - Monitor/Analyze: How Are We Doing?
Data mart
Smaller and focuses on a particular subject or department.
It is a subset of data warehouse/departmental data warehouse
A data mart is a smaller DW designed around one problem, organizational
function, topic, or other focus area.
Can be Dependent data mart
A subset that is created directly from a data warehouse
Ensures that the end user is viewing the same version of the data that are
accessed by all other data warehouse users
Or Independent data mart
A small data warehouse designed for a strategic business unit or a
department
49
Data Warehousing - Concept
—or—
3b.Data are loaded into data marts
4b.The data marts are consolidated into the EDW
52
Data Warehousing - Process Overview
53
Data Warehousing Architectures
There are several basic architectures for data warehousing
To distinguished the architectures data warehouse is divided
into three parts:
The data warehouse itself
Data acquisition (back-end) software, which extracts data from legacy
systems and external sources, consolidates and loads into the data
warehouse
Client (front-end) software, which allows users access and analyze data
from the warehouse
54
Data Warehousing Architectures
55
Data Warehousing Architectures
57
Data Integration : The Extraction,
Transformation, and Load (ETL) Process
59
Data Transformation Tools – To purchase or to build?
60
Data Transformation Tools –
To purchase or to build? A Consideration
Should organization purchase data transformation tools or build
the transformation process itself ?
Data transformation tools are expensive
Data transformation tools may have a long learning curve
It is difficult to measure how the IT organization is doing until it
has learned to use the data transformation tools
Not an easy decision !
However many believe purchasing the tools should be able to
kick start the project faster and simplify the maintenance of the
data warehouse . 61
Data Warehouse Development
Pros: Pros:
Easy to build organizationally Business Enterprise View
Easy to build technologically Design consistency
Cons: Data reusability
Enterprise wide view unavailable Cons:
Redundant data costs Require corporate leadership and vision
High ETL costs
High DBA costs 64
Eventually it can be this …
65
Data Warehouse Development
68
Data Warehouse
Administration and Security Issues
Due to its huge size and complicated nature, DW requires strong
monitoring and administrating
Needs more than a DBA
Needs data warehouse administrator (DWA)
A person responsible for the administration and management of
a data warehouse
69
Data Warehouse
Administration and Security Issues
What skills should a DWA possess? Why?
Familiarity with high-performance hardware, software and
networking technologies, since a data warehouse is based on
those.
Solid business insight, to understand the purpose of the DW
and its business justification, familiarity with business
decision, making processes to understand how the DW for
business strategic purpose will be used easy.
Excellent communication skills, to communicate with the
rest of the organization 70
Data Warehouse
Administration and Security Issues
Effective security in a data warehouse should focus on four main areas:
1. Establishing effective corporate and security policies and procedures. An
effective security policy should start at the top and be communicated to
everyone in the organization.
2. Implementing logical security procedures and techniques to restrict access. This
includes user authentication, access controls, and encryption.
3. Limiting physical access to the data center environment.
4. Establishing an effective internal control review process for security and
privacy
71
Performance Measurement
Performance Measurement 2
(HBR, 1992)
76
Balanced Scorecard
The meaning of “balance”?
Financial
Perspective
Internal
Customer VISION & Business
Perspective STRATEGY Process
Perspective
Learning and
Growth
Perspective
Six Sigma as a Performance Measurement System
Six Sigma
A performance management methodology aimed at reducing
the number of defects in a business process to as close to
zero defects per million opportunities (DPMO) as possible
The DMAIC performance model
A closed-loop business improvement model that
encompasses the steps of defining, measuring, analyzing,
improving, and controlling a process
Lean Six Sigma
Lean manufacturing / lean production
Lean production versus six sigma?
Comparison of BSC and Six Sigma (1 of 2)