
Lecture 8

Data Warehouses

Data Warehouses - 2021/22


Lecture Goals
• Goal:
 High-level approach
 What is a Data Warehouse?
 Basic characteristics, definition
 Basic architecture
 Processes
 Data



High-level approaches



Approach
• Situation
 multiple heterogeneous databases
 operational
 query requirements
 analytic



Data
• Available data:
 operational data:
 Legacy systems,
 Dedicated applications
 Enterprise applications – ERP, MRP II, CRM (OLTP)
 databases:
 relational, non-relational
 data dumps
 general data:
 external sources, e.g., web services
 archives:
 external data storages,
 removable media
Data
• Available data:
 different locations
 different formats and structures
 different regulations and policies
 different availability
 different semantics
 different quality standards



Approach
• How to handle queries?

• How to handle responses?



Query Driven Approach
• Query Driven Approach
 build wrappers and integrators
 to integrate heterogeneous databases
 wrapper – a tool to access known resources and translate their objects, focused on making an interface compatible
 mediator – unifies several interfaces, notably their way of communicating, like an actual mediator
 lazy integration
[Figure: Mediator above three Convert components over the Data Sources]
Query Driven Approach
• Query Driven Approach
 a query is issued on the client side
 a metadata dictionary translates the query into inner queries
 queries are mapped and sent to the local query processor
 inner queries are appropriate for the individual heterogeneous sites
 results from heterogeneous sites are integrated into a global answer set
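The query flow on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real federation engine: the `Wrapper` and `Mediator` classes, the global field names (`surname`, `city`) and the in-memory lists standing in for remote databases are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


class Wrapper:
    """Translates a global query into a source-specific one and converts results back."""

    def __init__(self, rows: List[Row], field_map: Dict[str, str]):
        self.rows = rows            # stands in for a remote heterogeneous source
        self.field_map = field_map  # global field name -> local field name

    def query(self, field: str, value: object) -> List[Row]:
        local = self.field_map[field]                       # translate the query
        hits = [r for r in self.rows if r[local] == value]  # evaluate it at the source
        inverse = {v: k for k, v in self.field_map.items()}
        # translate local field names back to the global vocabulary
        return [{inverse[k]: v for k, v in r.items()} for r in hits]


class Mediator:
    """Sends the query to every wrapper and merges the partial answers."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, field: str, value: object) -> List[Row]:
        answer: List[Row] = []
        for wrapper in self.wrappers:  # lazy integration: sources are hit at query time
            answer.extend(wrapper.query(field, value))
        return answer


db1 = Wrapper([{"LastName": "Kowalski", "City": "Lodz"}],
              {"surname": "LastName", "city": "City"})
db2 = Wrapper([{"FamilyName": "Kowalski", "Town": "Warsaw"}],
              {"surname": "FamilyName", "city": "Town"})
mediator = Mediator([db1, db2])
result = mediator.query("surname", "Kowalski")
```

Every call to `mediator.query` touches all sources again, which is why frequent or aggregating queries become expensive in this approach.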
Query Driven Approach
• Query Driven Approach
 Disadvantages
 needs complex integration and filtering processes
 is very inefficient and very expensive for frequent queries
 very expensive for queries that require aggregations
Query Driven Approach
• Query Driven Approach
 Advantages
 access to current data
 no data redundancy

DW Incentives
• Update Driven Approach
 the information is integrated
 from multiple heterogeneous sources
 in advance and stored
 information is made available
 for direct querying and analysis
[Figure: Integrator above three Convert components over the Data Sources]
DW Incentives
• Other possibilities?
 Advantages / Disadvantages?
[Figure: Integrator with integration rules, change detection and metadata, above three Convert components over the Data Sources]
Update Driven Approach
• Update Driven Approach
 Advantages
 provides high performance
 data are copied, processed, integrated, annotated, summarized and restructured
 in a semantic data store in advance
 query processing does not require an interface with the processing at local sources
Update Driven Approach
• Update Driven Approach
 Disadvantages
 High data redundancy
 Problems with update

BI Process
• Data from the operational systems are
 Extracted
 Cleansed
 Transformed
 Aggregated
 Loaded into the DW

• Typically a good DW is a prerequisite for successful BI
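The five steps listed above can be sketched as a chain of small functions. This is only an illustration: the sample rows, the `branch` and `amount` fields, and the dictionary standing in for the warehouse are assumptions made for the example, not part of the lecture.

```python
from collections import defaultdict


def extract():
    # Hypothetical rows pulled from an operational system
    return [
        {"branch": " north ", "amount": "100.0"},
        {"branch": "North", "amount": "50.5"},
        {"branch": "south", "amount": None},  # missing value
        {"branch": "South", "amount": "20"},
    ]


def cleanse(rows):
    # Drop rows that cannot be used (here: missing amount)
    return [r for r in rows if r["amount"] is not None]


def transform(rows):
    # Bring names and types to one uniform format
    return [{"branch": r["branch"].strip().lower(), "amount": float(r["amount"])}
            for r in rows]


def aggregate(rows):
    # Summarise per branch
    totals = defaultdict(float)
    for r in rows:
        totals[r["branch"]] += r["amount"]
    return dict(totals)


warehouse = {}  # stands in for the DW storage


def load(totals):
    warehouse.update(totals)


load(aggregate(transform(cleanse(extract()))))
```

After the run, `warehouse` holds one summarised value per branch, ready for querying without touching the operational source again.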



Data in DW
• Data in DW:
 replicated, cleansed, transformed
 Aggregated
 loaded to the data warehouse system
 refreshed

Data Warehouse
What is it? Why is it important?
DW Incentives
• Incentives for a Data Warehouse
 Businesses have a lot of data, operational data and facts.
 data is usually in different databases and in different physical places
 distributed
 data is available (or archived), but in different formats
 heterogeneous
 Decision makers need
 to access information
 (data that has been summarized) virtually on one single site.
 this access needs to be fast
 regardless of the size of the data, and how old the data is.
DW Incentives
• Database developers
 understood that their software was required for both
 transactional and analytic processing
 however, their principal developments were directed to ever-larger transactional databases
 This process occurred even though operational and analytic data are separate, with
 different requirements
 different user communities



Data Warehouse?
• How can we define a Data Warehouse?
 Essence
 Purpose
 Characteristics
 Data



Data Warehouse
• Data Warehouse
 A dedicated database system for decision making
 separate from the production database(s) used operationally
 Update driven approach
 Differs from production system in that:
 it covers a much longer time horizon than transaction systems
 it includes multiple databases that have been processed so that the warehouse’s data are defined uniformly (i.e., ‘clean’ data)
 it is optimized for answering complex queries from managers and analysts
Data Warehouse
• Data Warehouse
 Repository of an organization’s electronically stored data
 designed to facilitate reporting and analysis
 Provides architectures and tools
 to organize, understand, and use data to make decisions
 Support information processing
 by providing a solid platform of consolidated, historical data for
analysis



Data Warehouse
• Data warehousing:
 The entire process involved in constructing and using data
warehouses
 It consists of ?
 It requires ?



Definitions
• Definitions:
 “A read-only analytical database that is used as the
foundation of a decision support process”
 (Poe and Reeves, 1995)
 “An integrated and consistent store of subject-oriented
data that is obtained from a variety of sources and formatted
into a meaningful context to support decision-making in
organization”
 (McFadden, 1999)
 “Managed data situated after and outside the operational system”
 (Gupta 1997)
The Definition
• The definition of Data Warehouse
 A data warehouse is a subject-oriented, integrated, time-
variant, and non-volatile collection of data in support of
management’s decision-making process
 W.H. Inmon, 1992



Data Warehouse
• Data warehouses typically provide a concise and
straightforward view around a particular subject
(customer, product, sales, etc.), instead of the global
organization's ongoing operations (order processing,
warehouse management, etc.)



Data Warehouse
• Subject oriented:
 Analytical database focuses on analysis of particular subject
 E.g. Profitability of a certain service or branch
 Operational database focuses on support of a particular
business process
 E.g. Handling appointments and bookings



Data Warehouse
• Subject oriented:
 provide a simple and concise view
 around particular subject issues
 excluding data that are not useful in the decision process



Data Warehouse
• Integrated:
 create a conformed and uniform view of the subject data



Data Warehouse
• Integrated:
 constructed by integrating multiple, heterogeneous data
sources
 relational databases, flat files, ...
 data cleaning and data integration techniques are applied
 there is no consistency among different data sources
 heterogeneous data sources
 when data is moved to the warehouse, it is processed
 to ensure consistency
Data Warehouse
• Integrated:
 Data cleaning and data integration techniques are applied.
 in naming conventions
 e.g., LastName and FamilyName in DB1 and DB2 have the same meaning
 in encoding structures
 e.g., attribute User_Id is a long int in DB1 and a string in DB2
 in attribute scales
 e.g., cm vs inch
 missing values
 etc.
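A small sketch of how these conflicts might be resolved during integration. The record layouts of DB1 and DB2, the target field names (`family_name`, `user_id`, `height_cm`) and the choice of string identifiers and centimetres as the warehouse standard are assumptions for illustration only.

```python
CM_PER_INCH = 2.54


def from_db1(rec):
    # Assumed DB1 layout: LastName, integer User_Id, height already in cm
    return {
        "family_name": rec["LastName"],    # naming convention aligned
        "user_id": str(rec["User_Id"]),    # encoding aligned: int -> string
        "height_cm": rec.get("height_cm"),
    }


def from_db2(rec):
    # Assumed DB2 layout: FamilyName, string User_Id, height in inches
    inches = rec.get("height_in")
    return {
        "family_name": rec["FamilyName"],
        "user_id": rec["User_Id"],
        # scale aligned: inch -> cm; a missing value stays an explicit None
        "height_cm": None if inches is None else round(inches * CM_PER_INCH, 1),
    }


unified = [
    from_db1({"LastName": "Nowak", "User_Id": 42, "height_cm": 180.0}),
    from_db2({"FamilyName": "Nowak", "User_Id": "43", "height_in": 70}),
    from_db2({"FamilyName": "Lis", "User_Id": "44"}),
]
```

All three records now share one set of names, types and units, regardless of which source they came from.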


Data Warehouse
• Integrated:
 Logical organisation
 Semantic organisation
 Dictionaries are aligned
 Provide a single, detailed, and consistent view
 Resolve conflicting data
 Single version of truth
 Physical organisation
 Format, data types, etc.

http://blog.cybyte.com/etl-business-intelligence/
Data Warehouse
• Having a DW
 It is the first time the company has an integrated view of its
information



Data Warehouse
• Non-volatile:
 a physically separate store
 warehouse data is loaded and accessed
 typical database updates of data do not occur in the data warehouse environment
 does not require transaction processing, recovery, and concurrency control mechanisms
 requires only two operations in data accessing:
 initial loading of data and querying (read)
 data is frozen
Data Warehouse
• Time-variant:
 data in the data warehouse is accurate and correct only in
reference to a particular point of time
 data is stored as a series of snapshots
 each snapshot can represent a point/period of time
 historical data is gathered
 every data element in the data warehouse contains an
element of time – explicitly or implicitly
 as opposed to an operational database, which typically maintains no historical data
 maybe just a specific type, for different purposes
Data Warehouse
• SNAPSHOTS
 Data in the data warehouse is stored in units of "snapshots".
 The records in the data warehouse are
 created as of some moment in time
 in effect, a snapshot taken as of that moment in time
Data Warehouse
• SNAPSHOTS
 In this regard the data in the data warehouse is fundamentally different from the data in an operational database environment
 Data in an operational database environment can be updated.
 Since data in the data warehouse environment is snapshot data, it cannot be updated
Data Warehouse
• There are many different forms of taking snapshots
 The most basic consideration of a snapshot is that the
snapshot has been taken as a result of an event.
 The event may be triggered by a wide variety of occurrences:
 an occurrence of a transaction,
 the periodic passage of time,
 a threshold having been reached,
 an audit,
 a special request, etc.



Data Warehouse
• The triggered snapshot has four basic components:
 A key
 identifies the record and primary data
 A unit of time
 usually refers to the moment when the event occurred
 Primary data that relates directly to the key
 Secondary data captured as part of the snapshot process that has no direct relationship to the primary data or key
 incidental data that might be later used to support decisions
 extraneous information captured at the moment of snapshot
Data Warehouse
• Example:
 A key
 identifies a sale of a product
 A unit of time
 identifies when this sale happened
 Primary data that relates directly to the key
 identifies product, price, location, etc.
 Secondary data
 how much product is in stock, interest rate
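The four components of this example can be written down as a record type. The sketch below is one possible representation, not a prescribed layout: the `Snapshot` class, its field names and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)  # frozen: the attributes of a snapshot cannot be reassigned
class Snapshot:
    key: str                 # identifies the sale
    taken_at: datetime       # unit of time: when the sale happened
    primary: dict            # data directly related to the key
    secondary: dict = field(default_factory=dict)  # incidental data captured alongside


sale = Snapshot(
    key="sale-0001",
    taken_at=datetime(2021, 11, 5, 14, 30),
    primary={"product": "P-17", "price": 19.99, "location": "Lodz"},
    secondary={"units_in_stock": 120, "interest_rate": 0.0125},
)
```

The secondary data plays no part in identifying the sale, but it is kept because it may support a later decision.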



Data Warehouse
• Time-variant:
 incremental data loads
 changes invoke data addition and not data updates
 the time horizon for the data warehouse is significantly
longer than that of operational systems
 provide information from a historical perspective
 e.g., past 5-10 years
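"Changes invoke data addition and not data updates" can be illustrated with an append-only history that is queried as of a point in time. This is a minimal sketch: the `record_change` and `as_of` helpers, the `product-7` key and the price values are assumptions for the example.

```python
from datetime import date

history = []  # append-only: rows are added, never modified or deleted


def record_change(key, valid_from, attributes):
    # A change in the source becomes a new row stamped with its time
    history.append((key, valid_from, dict(attributes)))


def as_of(key, when):
    # Return the version that was valid at the given moment, if any
    versions = [v for v in history if v[0] == key and v[1] <= when]
    return max(versions, key=lambda v: v[1])[2] if versions else None


record_change("product-7", date(2019, 1, 1), {"price": 10.0})
record_change("product-7", date(2021, 6, 1), {"price": 12.5})

price_2020 = as_of("product-7", date(2020, 3, 1))
price_2022 = as_of("product-7", date(2022, 1, 1))
```

An operational system would typically overwrite the old price; here both versions remain available, which is what makes historical analysis possible.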



Data Warehouse
• Non-volatile and time-variant?
 Consider that a particular health clinic changed location (address)
 How is this handled in an operational database?
 How should this be handled in a data warehouse environment?



Data Warehouse
• Having a DW
 First opportunity to store history – a robust amount of history
 History is invaluable in understanding your company’s
 customers, trends, products, etc.



Data warehouse
• A data mart (DM)
 is a simple form of a data warehouse that is focused on a
single subject (or functional area),
 such as Location, Sales, Finance, or Marketing
 single LOB – line of business
 given their single-subject focus, data marts usually draw
data from only a few sources
 the sources could be internal operational systems, a central data
warehouse, or external data
 A data warehouse, unlike a data mart, deals with more complex and general data
Data warehouse
• An operational data store (ODS)
 a central database that provides a snapshot of the latest data
from multiple transactional systems for operational reporting
 enables organizations to combine data in its original format from
various sources into a single destination to make it available for
business reporting.
 ODS is not a data warehouse or data mart.

• The purpose of an ODS is to integrate corporate data from different heterogeneous data sources in order to facilitate operational reporting in real-time or near real-time.
 Usually data in the ODS will be structured similarly to the source systems, although during integration the data can be cleaned, denormalized, and business rules applied to ensure data integrity.
 This integration will happen at the lowest granular level and occur quite frequently throughout the day.
 An ODS is targeted at the lowest granular queries
 Normally an ODS will not be optimized for historical and trend analysis


http://randygrenier.blogspot.com/ https://www.iri.com/


https://www.iri.com/
A bit of History
• 1979:
 First Spreadsheet Program—VisiCalc.

• 1985
 Procter and Gamble utilises first commercial system focused
on business analytics
 Excel 1.0

• 1988
 B. Devlin and P. Murphy publish “An architecture for a business and information system”
 First time the term business data warehouse was used

• 1989
 SQL
A bit of History
• 1990
 Inmon publishes “Building the Data Warehouse”
 Cognos PowerPlay

• 1992:
 Essbase (Extended Spreadsheet Database) published by Hyperion Solution; this became a major OLAP server product in the market in 1997.

• 1993:
 Introduction of the term OLAP

• 1996
 Kimball publishes “The Data Warehouse Toolkit”

• 1999:
 Microsoft OLAP Services published, which became Microsoft Analysis Services in 2000

• 2002
 Inmon updates book and defines architecture for collection of disparate sources into detailed, time variant data store.
 Kimball updates book and defines multiple databases called data marts that are organized by business processes, but use enterprise standard data bus.
Inmon / Kimball
• Data warehouse:
 A data warehouse is a subject-oriented, integrated, time-
variant, and non-volatile collection of data in support of
management’s decision-making process
 W.H. Inmon, 1992
 A data warehouse is a database with these particular features

 “A copy of transaction data specifically structured for query and analysis. (..) A data warehouse is a system that extracts data from source systems, transforms and loads to multidimensional structures, and further supports queries and reporting for decision support.”
 Kimball
 Data warehouse is an architecture, the entire process focused on these particular tasks
 Requires multiple tools and separate approaches
Inmon / Kimball
• Definition for a data warehouse:
 These are not opposing definitions
 Kimball’s data warehouse should follow all characteristics of
Inmon’s definition
 As such Inmon defines a data warehouse as a subset of Kimball’s
data warehouses
 None of the definitions tackles the shape of the database
 Focusing on features and functional aspects
 As such there are many different designs and models of data
warehouses



Inmon / Kimball
• Inmon’s Top-Down Approach
 A centralized data warehouse acts as an enterprise-wide data warehouse from which data marts are built as per the requirements of the specific departments
 Persistent dimensional views of data across data marts can be viewed since all data marts are loaded from a data warehouse
 This data warehouse design is resilient to business changes.
 Creation of a data mart from a data warehouse is very simple
 The analytic systems can access data in a data warehouse via the data marts
Inmon / Kimball
• Kimball’s Bottom-Up Approach
 A business process is built using data marts which are joined together using common dimensions
 A dimensional data model with facts and dimensions is implemented here
 The reports can be generated quickly since the data marts are built first
 The data warehouse can be easily expanded to accommodate new units.
 It involves the creation of new data marts and then integrating with the other data marts
 The analytic systems access data via data marts


Inmon / Kimball
• Inmon
 Data warehouse begins with the corporate data model
 key subject areas and key entities the business operates with
 detailed logical model
 Normalized conceptual model
 data redundancy is avoided as much as possible
 leads to clear identification of business concepts and avoids data update anomalies
 Physical model
 implementation of the data warehouse is also normalized
 single version of truth for the enterprise is managed here
Inmon / Kimball
• Inmon
 Normalized model makes loading of the data less complex
 but using this structure for querying is hard as it involves many tables and joins
 Suggested building data marts specific for departments.
 The data marts will be designed specifically for LOB
 The data marts can have de-normalized data to help with reporting.
 Any data that comes into the data warehouse is integrated, and the data warehouse is the only source of data for the different data marts.
 This ensures that the integrity and consistency of data is kept intact across the organization.
Inmon / Kimball
• Kimball
 Building the data warehouse starts with identifying
 the key business processes
 the key business questions
 The model
 the dimensional model
 not normalized
 Multiple star schemas are built to satisfy different reporting requirements
Inmon / Kimball
• Kimball
 Kimball proposes the concept of ‘conformed dimensions’
 Data integration
 key dimensions, like customer and product, that are shared across the different facts will be built once and be used by all the facts
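A conformed dimension can be illustrated with two small star schemas that share one dimension table. The sketch uses Python's built-in `sqlite3` module; the table names (`dim_product`, `fact_sales`, `fact_inventory`) and the sample rows are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- One product dimension, built once and shared by both fact tables
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_id INTEGER REFERENCES dim_product, quantity INTEGER);
CREATE TABLE fact_inventory (product_id INTEGER REFERENCES dim_product, on_hand INTEGER);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 3), (1, 2), (2, 7);
INSERT INTO fact_inventory VALUES (1, 40), (2, 15);
""")

# Each fact table is aggregated separately, then the results are lined up
# through the shared dimension.
report = con.execute("""
SELECT p.name, s.sold, i.on_hand
FROM dim_product AS p
JOIN (SELECT product_id, SUM(quantity) AS sold
      FROM fact_sales GROUP BY product_id) AS s ON s.product_id = p.product_id
JOIN (SELECT product_id, SUM(on_hand) AS on_hand
      FROM fact_inventory GROUP BY product_id) AS i ON i.product_id = p.product_id
ORDER BY p.name
""").fetchall()
```

Because both fact tables reference the same `dim_product` rows, figures from the two star schemas can be combined in one report without any extra reconciliation step.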



Inmon / Kimball
• Key advantages of the Inmon approach:
 The data warehouse truly serves as the single source of truth for the enterprise,
 Is the only source for the data marts and all the data in the data warehouse is integrated.
 Data update anomalies are avoided because of very low redundancy.
 This makes ETL process easier and less prone to failure.
 The business processes can be understood easily, as the logical model represents the detailed business entities.
 More flexible
 As the business requirements change or source data changes, it is easy to update the data warehouse as one thing is in only one place.
 Can handle varied reporting needs across the enterprise.

• The disadvantages of Inmon method:
 The model and implementation can become complex over time as it involves more tables and joins.
 Need resources who are experts in data modeling and of the business itself.
 These type of resources can be hard to find and are often expensive.
 A fairly large team of specialists need to be around to successfully manage the environment
 The initial set-up and delivery will take more time, and management needs to be aware of this.
 More ETL work is needed as the data marts are built from the data warehouse.
Inmon / Kimball
• Key advantages of the Kimball method:
 Quick to set-up and build, and the first phase of the data warehousing project will be delivered quickly.
 The star schema can be easily understood by the business users and is easy to use for reporting.
 Most BI tools work well with star schema.
 The footprint of the data warehousing environment is small
 occupies less space in the database and it makes the management of the system fairly easy.
 The performance of the star schema model is very good.
 The database engine will perform a ‘star join’ where a Cartesian product will be created using all of the dimension values and the fact table will be queried finally for the selective rows.
 This is known to be a very effective database operation.
 A small team of developers and architects is enough to keep the data warehouse performing effectively
 Works really well for department-wise metrics and KPI tracking
 the data marts are geared towards department-wise or business process-wise reporting.
 Drill-across
 Using multiple star schemas to generate a report using conformed dimensions.

• Disadvantages of the Kimball method:
 The essence of the ‘one source of truth’ is lost
 data is not fully integrated before serving reporting needs
 Redundant data can cause data update anomalies over time.
 Adding columns to the fact table can cause performance issues.
 This is because the fact tables are designed to be very deep.
 If new columns are to be added, the size of the fact table becomes much larger and will not perform well.
 This makes the dimensional model hard to change as the business requirements change.
 Cannot handle all the enterprise reporting needs because the model is oriented towards business processes rather than the enterprise as a whole.
 Integration of legacy data into the data warehouse can be a complex process.
A bit of history
• Two prevailing views on data warehousing:
 Kimball, in 1997, stated that
 "...the data warehouse is nothing more than the union of all the
data marts",
 Kimball indicates a bottom-up data warehousing methodology
 in which individual data marts providing thin views into the
organizational data could be created and later combined into a
larger all-encompassing data warehouse.
 Inmon responded in 1998 by saying,
 "You can catch all the minnows in the ocean and stack them together and they still do not make a whale,"
 This indicates the opposing view that the data warehouse should
be designed from the top-down to include all corporate data.
 In this methodology, data marts are created only after the
complete data warehouse has been created.
A bit of history
• Kimball’s approach of data marts seems to be more
popular
 since companies prefer to start with something small that
works rather than spec endlessly only to create a monster.
 Sometimes there is a data warehouse in place.
 It’s usually implemented by a relational database which is queried
directly and used for online analytical processing (OLAP).

• Although Inmon argues that a data warehouse is just an architecture
 people use the term on a day-to-day basis to refer to an actual technology
A bit of history
• Till today
 In 2014 Inmon criticized Cloudera for associating Big Data
with the data warehouse, two totally unrelated terms
according to him
 “Turbocharge Your Porsche - Buy An Elephant”, Bill Inmon
 Whereas Kimball took the opposing view by presenting a
webinar with Cloudera about building a data warehouse with
Hadoop



Data Warehousing
The process
Processes

[Figure: three stages: ETL process, data maintenance, data access]


Data Warehousing
The data


Data for DW
• Available data:
 operational data:
 Legacy systems, Dedicated applications
 Enterprise applications – ERP, MRP II, CRM (OLTP)
 general data:
 External sources
 archives:
 External data storages, removable media



• Specifically in DW:
 Data is replicated, cleansed, transformed
 Aggregated
 Data loaded to the data warehouse system
Logical Components

[Figure: three logical components: Data Staging Area, Data Organisation Area, Data Access Area]
Data for DW
• Data Staging Area
 is both a storage and process area (the ETL process)
 it represents everything that happens between the operational
source system and the data organisation area
 input
 Unintegrated, operational environment (‘legacy systems’)
 the key architectural requirement for the data staging area is that
 it is off-limits to business users
 does not provide query and presentation services
 it is acceptable to create a normalized database in the staging processes
 it is not the end goal
 normalization defeats understandability and performance
Data for DW
• Data Staging Area
 Data Extraction
 gathering the data from multiple heterogeneous sources
 Data Cleaning
 finding and correcting the errors in data
 Data Transformation
 converting data from legacy format to warehouse format
 Data Loading
 sorting, summarizing, consolidating, checking integrity
 building indices, partitions
 Refreshing
 updating - from data sources to data warehouse
• Data staging
 Baseline – isolation from source system
 Snapshots
 History
 Auditability



Data in DW
• Data
 Summarised
 operational data are mapped into decision usable form
 the higher the level of summarisation, the more the data is used
 the more summarized the data, the quicker it is to retrieve
 Larger
 much more data is retained
 Denormalised
 data can be redundant
 Metadata
 data about data
Data in DW
• Data Organisation Area:
 is the data warehouse
 as far as the business community is concerned
 data is organized, stored, and made available for direct
querying by users, report writers, and other analytical
applications
 holds both aggregated and detailed data for management
 separate from the databases used for OLTP
 data is based on a multidimensional data model
 views data in the form of a data cube
Data in DW
• Storage structure
 After extraction from the operational data, in DW
information is stored in databases
 operated by a DBMS
 Different database structures can be used for a DW:
 Relational model (RDB) operated by a RDBMS
 Multi-dimensional model (MDB) operated by a MDBMS



Data Access
• Data Access Area:
 variety of capabilities can be provided to business users to
leverage the presentation area for analytic decision making
 pivot tables/charts
 tabular view
 data mining structure
 all data access tools in the data warehouse’s presentation area query the data in the DBMS (organisation area)
 data access tools can be
 as simple as an ad hoc query tool
 as complex as a sophisticated data mining or modelling application
Data Access
• On-Line Analytical Processing (OLAP)
 Interactive analysis
 Explorative discovery
 Fast response times required

• OLAP operations/queries
 Aggregation, e.g., SUM
 Change level, e.g. (Year, City) -> (Year, Month, City)
 Roll Up: Less detail
 Drill Down: More detail
 Filtering, e.g. Year=2000
 Slice/Dice: Selection,
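The operations above can be tried on a tiny fact list. The sketch is only illustrative: the `facts` rows, the `sales` measure and the `aggregate` helper are assumptions, and a real OLAP server would run these operations over a data cube rather than a Python list.

```python
from collections import defaultdict

# Hypothetical facts with three dimensions (year, month, city) and one measure (sales)
facts = [
    {"year": 2000, "month": 1, "city": "Lodz", "sales": 10},
    {"year": 2000, "month": 2, "city": "Lodz", "sales": 5},
    {"year": 2000, "month": 1, "city": "Warsaw", "sales": 7},
    {"year": 2001, "month": 1, "city": "Lodz", "sales": 4},
]


def aggregate(rows, dims):
    # SUM of sales grouped by the chosen dimensions
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


# Drill down: (Year, City) -> (Year, Month, City), more detail
by_year_month_city = aggregate(facts, ("year", "month", "city"))

# Roll up: back to (Year, City), less detail
by_year_city = aggregate(facts, ("year", "city"))

# Filtering / slice: Year = 2000
year_2000 = [r for r in facts if r["year"] == 2000]
sales_2000_by_city = aggregate(year_2000, ("city",))
```

Roll up and drill down differ only in the list of grouping dimensions passed to `aggregate`; filtering restricts the rows before any grouping happens.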
DW Environment
• Performance Optimization
 The data warehouse contains GBytes or even TBytes of data!
 OLAP users require fast query response time
 They don’t want to wait for the result for 1 hour!
 Acceptable: answer within seconds, maximum minutes

 Idea:
 ?



DW Environment
• Performance Optimization
 Idea:
 Precompute
 some partial result in advance and store it
 At query time,
 use partial result to derive the final result very fast
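The precomputation idea can be sketched as follows. The `facts` tuples, the monthly granularity of the stored partial result and the `yearly_total` helper are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical detailed facts: (year, month, sales)
facts = [(2000, 1, 10), (2000, 2, 5), (2001, 1, 4), (2001, 3, 6)]

# Precompute: monthly totals are calculated once, in advance, and stored
monthly = defaultdict(int)
for year, month, sales in facts:
    monthly[(year, month)] += sales


def yearly_total(year):
    # At query time only the small stored partial result is scanned,
    # not the detailed facts
    return sum(total for (y, _), total in monthly.items() if y == year)


total_2000 = yearly_total(2000)
total_2001 = yearly_total(2001)
```

The stored partial result has at most one entry per month, however many detailed facts were loaded, which is what keeps the query-time work small.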



Data Warehouse
• “Through measurement comes knowledge.”
 Heike Kamerlingh Onnes

• A data warehouse is an example of the journey that data takes, when combined with context, to become information.
 Prior to application of context it is just a collection of numbers and letters, bits and bytes.
 Yet information is still not enough to enable an organization to learn from and act based on what they have collected.
 The ability to use accurate data and timely information to objectively measure and, therefore, proactively manage outcomes and business processes demonstrates the value of a data warehouse.
• DW as part of DSS (Decision Support Systems)
 Ad hoc queries (SQL-style )
 Optimized for large, complex data
 On-Line Analytic Processing (OLAP) queries
 Optimized for extensive group by and aggregation operations
 Data is viewed as multidimensional array
 Influenced by end-user tools such as spreadsheets
 Exploratory data analysis or data mining
 Looking for interesting patterns in the data
 Note this is hard to express a priori in a query
To Recapitulate
• Goals
 A data warehouse is a
 subject-oriented, integrated, time-variant, and non-volatile
collection of data
 Provide technologies for analytical processing – OLAP
 Allow for advanced analytical querying
 Allow for data mining
 Discover hidden patterns and trends
 Ad-hoc analysis and reports
 Ensure an integrated view over data
 Entire set of data available in the enterprise
 Clean Data
 Multi-dimensional model (MDB) operated by a MDBMS
Bibliography
SOURCES
• Kimball R., Ross M., Thornthwaite W., Mundy J., Becker B.,
 The data warehouse lifecycle toolkit, 2nd edition
 Wiley Publishing, Inc., 2008

• Jensen C.S., Pedersen T.B., Thomsen C.,
 Multidimensional Databases and Data Warehousing,
 Morgan & Claypool Publishers series SYNTHESIS LECTURES ON DATA MANAGEMENT, 2010

• Inmon W.,
 Building the Data Warehouse,
 John Wiley & Sons, New York 2002

• Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger,
 Mastering Data Warehouse Design - Relational and Dimensional Techniques,
 Wiley Publishing, Inc., 2003

• https://www.youtube.com/watch?v=rvURMymCpJM
