
Joni, S.Kom, M.TI
D5784

Connolly, T., & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management. 6th ed. Pearson. ISBN: 978-1-292-06118-4.

Data Warehousing Concepts

Data Warehousing Design

Online Analytical Processing (OLAP)

Data Mining

Summary
The latest and most successful advocate for data warehousing is Bill Inmon, who has earned
the title of ‘father of data warehousing’ due to his active promotion of the concept.
The Great Debates about Data Warehouse

“The data warehouse is nothing more than the union of all the data marts.”
(Ralph Kimball)

“You can catch all the minnows in the ocean and stack them together and they still do not make a whale.”
(Bill Inmon)

Independent Data Mart (Ralph Kimball) versus Dependent Data Mart (Bill Inmon)
Dimensional Modeling
- Dimensional modeling is the design concept used by many data warehouse designers to build their data warehouse.
- The dimensional data model provides a method for making databases simple and understandable.
- The major purpose of creating a data warehouse from transactional systems is to create intelligence out of day-to-day activity. It is not intended for extracting operational reports and should not be treated merely as a report store.

Figure: OLTP, ETL, Dimensional Modeling, Data Warehouse

- Fact table: A central table in a data warehouse schema that contains numerical measures and keys relating facts to dimension tables.
- Dimension table: Represents a business entity of the source system. Multiple normalized tables in the source system may together represent a single business entity.
- All fact tables have two or more foreign keys that connect to the dimension tables' primary keys.
- The fact table itself generally has its own primary key made up of a subset of the foreign keys.
Star schema is a logical structure that has a fact table containing factual data in the
center, surrounded by dimension tables containing reference data (which can be
denormalized).
Star schema example (figure):
Sales Fact: ProductKey, CustomerKey, StoreKey, DateKey
Product Dimension: ProductKey, Product Name, StartDate, ProductManufacturer
Store Dimension: StoreKey, StoreName, StartDate, StoreLocation
Customer Dimension: CustomerKey, CustomerName, StartDate, CustomerLocation
Date Dimension: DateKey, CalendarDate, Month, Day
- The star schema has a center, represented by a fact table, and the points of the star, represented by the dimension tables.
- From a technical perspective, the advantages of a star schema are simple joins between the dimension tables and the fact table, good query performance, the ability to slice the data, and data that is easy to understand.
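The simple joins of a star schema can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names loosely follow the star schema figure, while the SalesAmount measure, the reduced set of dimensions, and the sample rows are assumptions added only for illustration.

```python
import sqlite3

# Hypothetical, reduced star schema: one fact table and two dimensions.
# SalesAmount and all sample rows are assumptions, not from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductDim (
    ProductKey INTEGER PRIMARY KEY,
    ProductName TEXT,
    ProductManufacturer TEXT
);
CREATE TABLE StoreDim (
    StoreKey INTEGER PRIMARY KEY,
    StoreName TEXT,
    StoreLocation TEXT          -- denormalized: location lives in the dimension
);
CREATE TABLE SalesFact (
    ProductKey INTEGER REFERENCES ProductDim(ProductKey),
    StoreKey INTEGER REFERENCES StoreDim(StoreKey),
    SalesAmount REAL,           -- numerical measure
    PRIMARY KEY (ProductKey, StoreKey)  -- key built from the foreign keys
);
INSERT INTO ProductDim VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Acme');
INSERT INTO StoreDim VALUES (10, 'North Store', 'Jakarta'), (20, 'South Store', 'Bandung');
INSERT INTO SalesFact VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0);
""")

# Slicing by store location needs only one join from the fact table.
star_totals = conn.execute("""
SELECT s.StoreLocation, SUM(f.SalesAmount)
FROM SalesFact f
JOIN StoreDim s ON s.StoreKey = f.StoreKey
GROUP BY s.StoreLocation
ORDER BY s.StoreLocation
""").fetchall()
print(star_totals)
```

Each dimension is reached from the fact table in a single join, which is the property the slide refers to.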
Snowflake schema is a variant of the star schema where dimension tables do not
contain denormalized data.
Snowflake schema example (figure):
Sales Fact: ProductKey, CustomerKey, StoreKey, DateKey
Prod Dimension: ProductKey, Product Name, StartDate, ProductMfr
Store Dimension: StoreKey, StoreName, StartDate, LocKey
Loc Dim: LocKey, LocName, StartDate
Cust Dimension: CustomerKey, CustomerName, StartDate, CustomerLocKey
CustLoc Dim: CustomerLocKey, Location, StartDate
Date Dimension: DateKey, CalendarDate, Month, Day


In a snowflake, the dimension tables are normalized. From a performance perspective, the snowflake
may result in slower queries because of the additional joins required.
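The additional joins can be seen in a small sketch, again assuming Python's built-in sqlite3 module. Here the store location is normalized into its own Loc Dim table, as in the snowflake figure. The SalesAmount measure and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical snowflake fragment: StoreDim points to a separate LocDim.
# SalesAmount and all sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LocDim (LocKey INTEGER PRIMARY KEY, LocName TEXT);
CREATE TABLE StoreDim (
    StoreKey INTEGER PRIMARY KEY,
    StoreName TEXT,
    LocKey INTEGER REFERENCES LocDim(LocKey)   -- normalized location
);
CREATE TABLE SalesFact (
    StoreKey INTEGER REFERENCES StoreDim(StoreKey),
    SalesAmount REAL
);
INSERT INTO LocDim VALUES (1, 'Jakarta'), (2, 'Bandung');
INSERT INTO StoreDim VALUES (10, 'North Store', 1), (20, 'South Store', 2), (30, 'East Store', 1);
INSERT INTO SalesFact VALUES (10, 100.0), (20, 50.0), (30, 25.0);
""")

# Totals by location now need two joins: fact -> store -> location.
snowflake_totals = conn.execute("""
SELECT l.LocName, SUM(f.SalesAmount)
FROM SalesFact f
JOIN StoreDim s ON s.StoreKey = f.StoreKey
JOIN LocDim l ON l.LocKey = s.LocKey
GROUP BY l.LocName
ORDER BY l.LocName
""").fetchall()
print(snowflake_totals)
```

In a star schema the location name would sit directly in the store dimension, so the second join would not be needed.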
OLAP key features as described in the OLAP Council White Paper (2001):
1. multi-dimensional views of data
2. support for complex calculations
3. time intelligence
- OLAP database servers use multi-dimensional structures to store data and relationships between data.
- Multidimensional structures can be visualized as cubes of data, and cubes within cubes of data. Each side of the cube is considered a dimension.
Multidimensional Data as 3-field Table versus 2-D Cube
Multidimensional Data as 4-field Table versus 3-D Cube
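The relationship between a three-field table and a two-dimensional cube can be sketched in plain Python. The field names (product, city, sales) and the figures are hypothetical and chosen only to show the reshaping.

```python
from collections import defaultdict

# Hypothetical three-field table: (product, city, sales).
rows = [
    ("Widget", "Jakarta", 100),
    ("Widget", "Bandung", 50),
    ("Gadget", "Jakarta", 25),
    ("Gadget", "Bandung", 75),
]

# The same data as a 2-D structure: one axis per dimension,
# with the measure stored in each cell.
cube = defaultdict(dict)
for product, city, sales in rows:
    cube[product][city] = cube[product].get(city, 0) + sales

cities = sorted({city for _, city, _ in rows})
for product in sorted(cube):
    print(product, [cube[product].get(c, 0) for c in cities])

# Aggregating along the product dimension leaves totals per city.
sales_by_city = {c: sum(cube[p].get(c, 0) for p in cube) for c in cities}
print(sales_by_city)
```

Adding a fourth field, such as time, would add a third axis, giving the three-dimensional cube named in the second caption.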
OLAP tools are categorized according to the architecture of the database providing the
data for the purposes of analytical processing.
There are 4 main categories of OLAP Tools:
- Multi-dimensional OLAP (MOLAP)
- Relational OLAP (ROLAP)
- Hybrid OLAP (HOLAP)
- Desktop OLAP (DOLAP)
Data mining is not:

- Data warehouse: A data warehouse, whether relational or OLAP, can be used as a source for the mining process, but data mining itself is not a data warehouse store for holding warehouse objects such as facts and dimensions.

- Reporting store: Data mining is not a report store. It provides a method for analyzing data and making decisions. It does not provide any reports other than the analyzed data.

- OLAP: Online analytical processing stores data warehouse data in a multidimensional store and computes aggregates accordingly. Data mining does not require the data to be multidimensional or aggregated. It cannot be treated as a replacement for an OLAP store.
OLAP versus Data Mining

OLAP: Typically focuses on historical facts.
Data mining: Typically focuses on future outcomes or trends.

OLAP: Aggregates data using pre-defined groupings.
Data mining: Requires detail data.

OLAP: Verification driven, with factual results.
Data mining: Discovery driven.

OLAP: Ad hoc queries and reports.
Data mining: Statistical and machine learning techniques.

OLAP: Limited ability to include reliability estimates with predictions.
Data mining: Data models available for predicting, discovering patterns, estimating, and producing accurate results for trend analysis and forecasting.

OLAP: Can be used as a data source for data mining models.
Data mining: Results can also be used in OLAP applications by incorporating new predictive variables or scores as dimensions or attributes in your OLAP tool.
There are 4 main operations associated with data mining techniques:
1. Predictive modeling
2. Database segmentation
3. Link analysis
4. Deviation detection
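As one illustration of the fourth operation, deviation detection can be sketched as flagging values that lie far from the mean. The daily sales figures and the threshold of two standard deviations are assumptions made for this example, and this is only one simple technique among many.

```python
from statistics import mean, pstdev

# Hypothetical daily sales figures; the last value is unusually high.
daily_sales = [100, 102, 98, 101, 99, 100, 160]

mu = mean(daily_sales)
sigma = pstdev(daily_sales)  # population standard deviation

# Assumed rule: flag values more than 2 standard deviations from the mean.
outliers = [x for x in daily_sales if abs(x - mu) > 2 * sigma]
print(outliers)
```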
Data Analysis Cycle (figure): operational analysis, trend analysis, ad hoc analysis, predictive analysis

- Operational analysis consists of business transaction reports (closing bank balances, who was admitted into the hospital today, how many support calls were closed today, etc.).
- Trend analysis examines how historical data has grown over a period of time.
- Ad hoc analysis is business context analysis (product sales by region). It can also be used for finding a root cause, such as a sudden decrease in sales of a product due to floods or another natural calamity.
- Predictive analysis is predicting the patterns for the future (also called forecasting).
Factors that would encourage considering data mining include the following:
- Data availability in source systems: Detailed data is available from source systems, preferably on a near real-time basis. Detailed data makes accurate and predictable results more likely.
- Huge data volume: Large data sets that can be difficult to analyze effectively using other tools lend themselves to data mining solutions. Also, the statistical functions in data mining require a large sample set in order to produce meaningful results.
- Complexity to identify trends: Having multiple factors enter into forecasting or discovery analysis lends itself to data mining, particularly when the appropriate grouping structures are not known in advance.
- Automating with minimum user interaction: Because data mining is driven by data values, the same solution can be implemented at different customer locations, achieving customized behavior with no changes to the application.
