CS 408
Concepts and Architectures
Database vs Data warehouse
• A database is any collection of data organized for storage,
accessibility, and retrieval.
• A database is a collection of related data that represents some aspect of the real world. It is designed, built, and populated with data for a specific task. It is also a building block of your data solution.
• A data warehouse is an information system that stores historical and cumulative data from single or multiple sources. It is designed to analyze, report on, and integrate transaction data from different sources.
• A data warehouse eases the analysis and reporting processes of an organization. It also serves as a single version of the truth for the organization's decision-making and forecasting processes.
Business intelligence
• Business intelligence is the delivery of accurate, useful information to the appropriate decision makers within the necessary timeframe to support effective decision-making.
Data warehouse
• A data warehouse is a system that periodically retrieves and consolidates data from the source systems into a dimensional or normalized data store. It usually keeps years of history and is queried for business intelligence or other analytical activities. It is typically updated in batches, not every time a transaction happens in the source system.
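As a rough illustration of batch updating, the sketch below assumes each source row carries a last-modified timestamp and keeps a "watermark" (the latest timestamp already loaded) so that each batch picks up only rows changed since the previous load. This watermark approach is one common technique, not necessarily the one used in this course; the row layout and the names `extract_since` and `batch_refresh` are hypothetical, and real loads also have to handle updates and deletes.

```python
from datetime import datetime

# Hypothetical source rows: (transaction id, amount, last-modified timestamp).
SOURCE = [
    (1, 120.0, datetime(2024, 1, 1, 9, 0)),
    (2, 75.5, datetime(2024, 1, 1, 17, 30)),
    (3, 42.0, datetime(2024, 1, 2, 8, 15)),
]

def extract_since(rows, watermark):
    """Return source rows changed after the watermark (the previous load point)."""
    return [r for r in rows if r[2] > watermark]

def batch_refresh(warehouse, rows, watermark):
    """Append one batch of changed rows and return the new watermark."""
    batch = extract_since(rows, watermark)
    warehouse.extend(batch)  # history is appended, not overwritten
    return max((r[2] for r in batch), default=watermark)

warehouse = []
wm = datetime.min
wm = batch_refresh(warehouse, SOURCE[:2], wm)  # first batch: rows 1 and 2 exist so far
wm = batch_refresh(warehouse, SOURCE, wm)      # next batch picks up only row 3
print(len(warehouse), wm)  # 3 2024-01-02 08:15:00
```

Loading in batches like this keeps the warehouse decoupled from individual source transactions, which matches the periodic refresh described above.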
Data Mart
• A data mart is a subset of the data warehouse and is defined as a body of historical data in an electronic repository that does not participate in the daily operations of the organization. Instead, this data is used to create business intelligence. The data in a data mart usually applies to a specific area of the organization.
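A minimal sketch of deriving a data mart as a subset of warehouse data, using Python's built-in `sqlite3` module as a stand-in DBMS. The table names (`dw_sales`, `mart_west_sales`), the columns, and the choice of one sales region as the "specific area" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse DBMS
conn.execute("CREATE TABLE dw_sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("West", "Laptop", 2022, 1200.0),
        ("East", "Laptop", 2022, 1100.0),
        ("West", "Phone", 2023, 650.0),
    ],
)

# Data mart for one area of the organization (here: the West region),
# populated from the warehouse rather than from operational systems.
conn.execute(
    "CREATE TABLE mart_west_sales AS "
    "SELECT product, year, amount FROM dw_sales WHERE region = 'West'"
)
rows = conn.execute("SELECT product, year, amount FROM mart_west_sales ORDER BY year").fetchall()
print(rows)  # [('Laptop', 2022, 1200.0), ('Phone', 2023, 650.0)]
```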
Fact Table
• A fact table is the primary table in a dimensional model, where the numerical performance measurements of the business are stored. We try to store the measurement data resulting from a business process in a single data mart.
Dimension Table
• A dimension table is an integral companion to a fact table. Dimension tables contain the textual descriptors of the business. In a well-designed dimensional model, dimension tables have many columns or attributes, which describe the rows in the dimension table. Dimension tables tend to be relatively shallow in terms of the number of rows (often far fewer than 1 million rows) but are wide, with many large columns. Dimension tables are the entry points into the fact table, and the dimensions implement the user interface to the data warehouse.
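To make the fact/dimension split concrete, here is a small star schema in `sqlite3`, loosely modeled on names from the Schema Comparison slide (a Sales fact with Item and TimeDim dimensions). The column names, types, and rows are assumptions for illustration. The query shows dimensions acting as entry points: filtering and grouping use descriptive attributes from the dimension tables, while `SUM` aggregates a numeric measure from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive (textual / calendar) attributes.
CREATE TABLE item_dim (item_id INTEGER PRIMARY KEY, item_name TEXT, item_brand TEXT, item_category TEXT);
CREATE TABLE time_dim (time_no INTEGER PRIMARY KEY, time_year INTEGER, time_quarter INTEGER);
-- Fact table: foreign keys to the dimensions plus numeric measures.
CREATE TABLE sales_fact (
    item_id INTEGER REFERENCES item_dim(item_id),
    time_no INTEGER REFERENCES time_dim(time_no),
    sales_units INTEGER,
    sales_dollar REAL
);
INSERT INTO item_dim VALUES (1, 'Widget', 'Acme', 'Hardware'), (2, 'Gadget', 'Acme', 'Electronics');
INSERT INTO time_dim VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO sales_fact VALUES (1, 10, 5, 50.0), (2, 10, 2, 80.0), (1, 11, 3, 30.0);
""")

# Star join: constrain and group on dimension attributes, aggregate fact measures.
query = """
SELECT d.item_category, t.time_quarter, SUM(f.sales_dollar)
FROM sales_fact f
JOIN item_dim d ON f.item_id = d.item_id
JOIN time_dim t ON f.time_no = t.time_no
WHERE t.time_year = 2023
GROUP BY d.item_category, t.time_quarter
ORDER BY d.item_category, t.time_quarter
"""
result = conn.execute(query).fetchall()
print(result)  # [('Electronics', 1, 80.0), ('Hardware', 1, 50.0), ('Hardware', 2, 30.0)]
```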
OLAP DB
• An online analytical processing (OLAP) database is a technology for storing, managing, and querying data that is specifically designed to support business intelligence uses.
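A typical OLAP capability is summarizing a measure over every combination of dimensions, similar to SQL's `GROUP BY CUBE`. The pure-Python sketch below precomputes all 2^n groupings for a toy fact list; the dimension names, data, and the `cube` helper are hypothetical, and real OLAP engines use far more efficient storage and aggregation strategies.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "year")
FACTS = [
    {"region": "West", "product": "Laptop", "year": 2023, "amount": 100},
    {"region": "West", "product": "Phone", "year": 2023, "amount": 40},
    {"region": "East", "product": "Laptop", "year": 2023, "amount": 70},
]

def cube(facts, dims, measure):
    """Sum the measure for every subset of dims (like SQL GROUP BY CUBE)."""
    result = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for row in facts:
                key = tuple(row[d] for d in group)  # empty tuple = grand total
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result

c = cube(FACTS, DIMENSIONS, "amount")
print(c[()])           # {(): 210.0}  grand total
print(c[("region",)])  # {('West',): 140.0, ('East',): 70.0}  roll-up by region
```

Precomputing every grouping trades storage for fast answers to summary queries, which is the kind of support for precomputed results that OLAP databases provide.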
ETL
• An Extract, Transform, and Load (ETL) system is a set of processes that clean, transform, combine, de-duplicate, archive, conform, and structure data for use in the data warehouse.
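A minimal ETL sketch covering several of the listed steps: combining two sources, cleaning, conforming codes, de-duplicating, and loading. The source records, the `STATE_CODES` lookup, and the `transform`/`etl` helpers are hypothetical; archiving is omitted, and `sqlite3` stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent conventions.
crm_rows = [{"cust": " Alice Smith ", "state": "ca"}, {"cust": "Bob Jones", "state": "NY"}]
billing_rows = [{"name": "alice smith", "st": "California"}, {"name": "Carol Lee", "st": "TX"}]

# Assumed conforming lookup: map every source spelling to one standard code.
STATE_CODES = {"california": "CA", "ca": "CA", "ny": "NY", "tx": "TX"}

def transform(name, state):
    """Clean and conform one record: trim, normalize case, map state to a code."""
    return name.strip().title(), STATE_CODES.get(state.strip().lower(), "UNKNOWN")

def etl(conn):
    # Extract: combine both sources into one uniform stream of (name, state).
    raw = [(r["cust"], r["state"]) for r in crm_rows] + [(r["name"], r["st"]) for r in billing_rows]
    # Transform: clean/conform, then de-duplicate while keeping first-seen order.
    seen, clean = set(), []
    for rec in (transform(n, s) for n, s in raw):
        if rec not in seen:
            seen.add(rec)
            clean.append(rec)
    # Load: write the structured rows into a warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS customer_dim (name TEXT, state TEXT)")
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)", clean)
    conn.commit()

conn = sqlite3.connect(":memory:")
etl(conn)
loaded = conn.execute("SELECT name, state FROM customer_dim ORDER BY name").fetchall()
print(loaded)  # [('Alice Smith', 'CA'), ('Bob Jones', 'NY'), ('Carol Lee', 'TX')]
```

Note that "Alice Smith" appears in both sources but only once in the output, because the two records become identical after cleaning and conforming.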
PivotTable
• A PivotTable is a powerful tool to calculate, summarize,
and analyze data that lets you see comparisons, patterns,
and trends in your data.
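The core calculation behind a PivotTable, summing a value into a grid keyed by a row field and a column field, can be sketched in plain Python. The field names and the `pivot` helper are hypothetical; a spreadsheet PivotTable also offers other aggregates (count, average), filters, and subtotals.

```python
from collections import defaultdict

def pivot(rows, row_key, col_key, value_key):
    """Sum value_key into a grid indexed by row_key x col_key, like a PivotTable."""
    grid = defaultdict(lambda: defaultdict(float))
    for r in rows:
        grid[r[row_key]][r[col_key]] += r[value_key]
    cols = sorted({r[col_key] for r in rows})
    # Missing row/column combinations are shown as 0.0.
    return cols, {rk: [grid[rk].get(c, 0.0) for c in cols] for rk in sorted(grid)}

sales = [
    {"region": "East", "quarter": "Q1", "amount": 10},
    {"region": "East", "quarter": "Q2", "amount": 15},
    {"region": "West", "quarter": "Q1", "amount": 20},
    {"region": "East", "quarter": "Q1", "amount": 5},
]
cols, table = pivot(sales, "region", "quarter", "amount")
print(cols)   # ['Q1', 'Q2']
print(table)  # {'East': [15.0, 15.0], 'West': [20.0, 0.0]}
```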
Data Warehousing for Business Intelligence
[Figure: course sequence from Database management essentials through business intelligence implementation]
Targeted Learners
Broad Course Objectives
• Establish an initial foundation in data warehousing for business intelligence careers
• Gain conceptual background about business architectures,
management practices, and data warehouse development
methodologies
• Create data warehouse designs, data integration workflows, and
pivot table operations
• Reflect on business architecture selection, data warehouse design
methodologies, and data integration goals and constraints
Prerequisites
• Introductory database course
• Background in relational databases, query formulation, data modeling, and normalization
• Basic knowledge of algorithms
• Basic data structure concepts
Course Topics
[Figure: data warehouse architecture. Operational databases and external data sources feed extraction and transformation processes in a staging area; the data warehouse server holds detailed and summarized data with an enterprise data model (EDM); a data mart tier serves user departments.]
Decision Making Hierarchy
[Figure: performance limitations, lack of integration, and missing DBMS features motivating data warehouse technology and deployments]
Technology and Deployment Limitations
* Performance limitation
- Performance problems when a single database serves both transaction processing and business intelligence decision making
- Never solved within one database; the remedy is a separate database for business intelligence
* Lack of integration
- Lack of integration with transaction databases and external data
sources
- Add value: integrate, standardize, clean, and summarize both internal
and external data sources
* Missing features for summary data
- Storage and optimization techniques for summary queries
- Data modeling approaches
- Support for precomputed query results
- Support for different business analyst query tools
Data Warehouse Characteristics
• Essential part of infrastructure for business intelligence
• Logically centralized repository for decision making
– Populated from operational databases and external data sources
– Integrated and transformed data
– Optimized for reporting and periodic integration
Comparison of Processing Environments
Transaction processing
• Primary data from transactions
• Daily operations and short-term decisions
Business intelligence processing
• Transformed secondary data
• Medium- and long-term decisions
Data Comparison
Characteristic | Operational Database | Data Warehouse
Currency | Current | Historical
Detail level | Individual transactions | Individual and summary
Orientation | Process | Subject
Records per request | Few | Thousands
Normalization level | Mostly normalized | Normalization relaxed (not important)
Update level | Highly volatile | Mostly refreshed / fetched (nonvolatile)
Data model | Relational | Relational (star schemas) and multidimensional (data cubes)
* A star schema is a convention for organizing the data into dimension tables, fact tables, and materialized views. All data is stored in columns, and metadata is needed to identify the columns that function as multidimensional objects.
Schema Comparison
[Figure: side-by-side schemas.
Operational database: entities Employee (EmpNo, EmpFirstName, EmpLastName, ...), Customer (CustNo, CustFirstName, CustLastName, ...), Order (OrdNo, OrdDate, ...), and Product (ProdNo, ProdName, ProdQOH, ...), with relationships Manages, Takes, Places, and Contains (Qty).
Data warehouse: a star schema with central entity Sales (SalesNo, SalesUnits, SalesDollar, SalesCost) linked to Item (ItemId, ItemName, ItemUnitPrice, ItemBrand, ItemCategory) via ItemSales, Store (StoreId, StoreManager, StoreStreet, StoreCity, StoreState, StoreZip, StoreNation, DivId, DivName, DivManager) via StoreSales, Customer (CustId, CustName, CustPhone, CustStreet, CustCity, CustState, CustZip, CustNation, ...) via CustSales, and TimeDim (TimeNo, TimeDay, TimeMonth, TimeQuarter, TimeYear, TimeDayOfWeek, TimeFiscalYear, ...) via TimeSales.]
Challenges in Data Warehouse Projects
Intangible Benefits
• Includes:
– Brand recognition
– Employee expertise
– Management skills
• Not easily quantified but important for an organization’s success
• May also include increased data quality:
– Fewer missing values
– More matched entities
– More data availability
– Higher levels of compliance with data standards
• Intangible benefits may become tangible over time, for example:
– Increased revenue and reduced expenses
– A data warehouse may enable reduced losses due to improved fraud detection.
– Improved customer attention through targeted marketing
– Reduction of inventory carrying costs through
improved demand forecasting
Learning Curve for Skills
Learning Curve for Production
[Figure: learning curve plotted against units produced (0 to 11)]
Maturity Relationships
[Figure: two panels, Business Value Learning Curve and Data Transformation Learning Curve (transformation cost), both plotted against time after a data warehouse is deployed]
Project Relationships
[Figure: two panels plotting business value and risk against project scope]
Architecture Choices
Top Down
• Enterprise data warehouse
• Higher integration levels
• Logically centralized
• Larger project scope
Bottom Up
• Independent data marts
• Lower integration levels
• Logically decentralized
• Smaller project scope
Top-Down Architecture
[Figure: top-down architecture. Operational databases and external data sources feed extraction and transformation processes in a staging area; a data warehouse server holds detailed and summarized data with an enterprise data model (EDM); a data mart tier serves user departments.]
Bottom-up Architecture
[Figure: bottom-up architecture. Operational databases and an external data source feed transformation processes that load data marts in the data mart tier, which serves user departments.]
Maturity Model Stages
© Eckerson 2007
Maturity Model Insights
Advantages of Business Intelligence
• To gain competitive advantage
• To shift from product focus to customer focus
• To identify new markets
• To focus more on profitable customers
• To improve retention of customers
• To reduce inventory costs
Traditional Applications
Data Mining
• Discover significant, implicit patterns
– Target promotions
– Change mix and collocation of items
• Requires large volumes of transaction data including
sensor data and social media interactions
• An important tool for business intelligence
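To show what "changing the mix and collocation of items" can rest on, here is a minimal sketch of support counting for item pairs, the first step of association-rule (market basket) mining in the style of Apriori. It is not a full mining algorithm; the `frequent_pairs` helper, the baskets, and the 0.5 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support of all baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicate items and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]
pairs = frequent_pairs(baskets, 0.5)
print(pairs)  # {('bread', 'milk'): 0.5, ('bread', 'butter'): 0.5}
```

Pairs with high support are candidates for joint promotions or for placing items near each other; real data mining tools run this over far larger transaction volumes.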
Market Shares and Trends
• Major vendors: Teradata, Oracle, IBM, Microsoft, SAP
• Large projected market growth
• Trends
– Real-time loading and analysis
– Increased storage and analysis of social interactions
– Increased usage of cloud services and appliances
Cloud Influence
[Figure: cloud service models. A cloud vendor hosts servers and databases and provides infrastructure (IaaS), a development platform (PaaS), or an application (SaaS) to a user organization.]
Employment Opportunities
Skill-Position Mapping
Competency | Position: DW Manager, DW Analyst, BI Analyst
Communication ▄ █ █
Data cube tools ▄ █ █
Dashboards ▄ █
Data mining ▄ █
Data integration tools █ █
DW schema design █ ▄
Performance analysis █
Quantitative modeling █
SQL extensions █ █ ▄
Salary Trends (USA)