Professional Documents
Culture Documents
• Late 1980s-present
– Advanced Data Analysis
• Data warehouse and OLAP
• Data mining and knowledge discovery
• Advanced data mining appliations
• Data mining and socity
• 1990s-present:
– XML-based database systems
– Integration with information retrieval
– Data and information integreation
• Present – future:
– New generation of integrated data and
information system.
Pattern Evaluation
– Data mining: the core of
knowledge discovery
process. Data Mining
Task-relevant Data
Data Selection
Warehouse Transformation
Data Cleaning
Data Integration
Pattern evaluation
Data
Databases Warehouse
12/11/20 P.PramodKumar, Sr.Asst.Prof.,
Data Mining and Business Intelligence
Increasing potential
to support
business decisions End User
Making
Decisions
Data Exploration
Statistical Analysis, Querying and Reporting
• Relational databases
• Data warehouses
• Transactional databases
• Cluster analysis
– Analyze class-labeled data objects, clustering
analyze data objects without consulting a known
class label.
– Clustering based on the principle: maximizing the
intra-class similarity and minimizing the interclass
similarity
Information
Science Data Mining MachineLearning
Visualization Other
Disciplines
• General functionality
– Descriptive data mining
– Predictive data mining
• Techniques utilized
– Database-oriented, data warehouse (OLAP),
machine learning, statistics, visualization,
neural network, etc.
• Applications adopted
– Retail, telecommunication, banking, fraud
analysis, DNA mining, stock market
• Data warehousing:
– The process of constructing and using data warehouses
all
0-D(apex) cuboid
time,location,supplier
time,item,location 3-D cuboids
time,item,supplier item,location,supplier
4-D(base) cuboid
12/11/20 time, item, location, supplier Sr.Asst.Prof.,
P.PramodKumar,
• What is Data Cube?
• What is Dimension Table?
• What is Fact Table?
• What is Base Cuboid?
• What is Apex Cuboid?
branch_key
location
branch location_key
location_key
branch_key
units_sold street
branch_name
city_key
branch_type city
dollars_sold
city_key
avg_sales city
Measures province_or_street
country
12/11/20 P.PramodKumar, Sr.Asst.Prof.,
Example of Fact Constellation
time
time_key item Shipping Fact Table
day item_key
day_of_the_week Sales Fact Table item_name time_key
month brand
quarter time_key type item_key
year supplier_type shipper_key
item_key
branch_key from_location
all all
Office Day
Month
12/11/20 P.PramodKumar, Sr.Asst.Prof.,
A Sample Data Cube
Date Total annual sales
1Qtr 2Qtr 3Qtr 4Qtr sum of TV in U.S.A.
t
uc
TV
od
PC U.S.A
Pr
VCR
Country
sum
Canada
Mexico
sum
all
0-D(apex) cuboid
product date country
1-D cuboids
3-D(base) cuboid
product, date, country
Monitor
Metadata & OLAP Server
other
source Integrator
s Analysis
Operational Extract Query
DBs Transform Data Serve Reports
Load
Refresh
Warehouse Data mining
Data Marts
Data Sources
12/11/20 Data Storage OLAP Engine Front-End Tools
P.PramodKumar, Sr.Asst.Prof.,
Metadata Repository
• Meta data is the data defining warehouse objects. It has the
following kinds
– Description of the structure of the warehouse
• schema, view, dimensions, hierarchies, derived data defn, data mart
locations and contents
– Operational meta-data
• data lineage (history of migrated data and transformation path), currency of
data (active, archived, or purged), monitoring information (warehouse usage
statistics, error reports, audit trails)
– The algorithms used for summarization
– The mapping from operational environment to the data warehouse
– Data related to system performance
• warehouse schema, view and derived data definitions
– Business data
• business terms and definitions, ownership of data, charging policies
12/11/20 P.PramodKumar, Sr.Asst.Prof.,
Data Warehouse Back-End Tools and Utilities
• Data extraction:
– get data from multiple, heterogeneous, and external
sources
• Data cleaning:
– detect errors in the data and rectify them when possible
• Data transformation:
– convert data from legacy or host format to warehouse
format
• Load:
– sort, summarize, consolidate, compute views, check
integrity, and build indices and partitions
• Refresh
– propagate the updates from the data sources to the
warehouse
Multi-Tier Data
Warehouse
Distributed
Data Marts
12/11/20
Define a high-level corporate data model
P.PramodKumar, Sr.Asst.Prof.,
Types of OLAP Servers
• Relational OLAP (ROLAP)
– Use relational or extended-relational DBMS to store and
manage warehouse data and OLAP middle ware to support
missing pieces
– Include optimization of DBMS backend, implementation of
aggregation navigation logic, and additional tools and
services
– greater scalability
• Multidimensional OLAP (MOLAP)
– Array-based multidimensional storage engine (sparse
matrix techniques)
– fast indexing to pre-computed summarized data
12/11/20 P.PramodKumar, Sr.Asst.Prof.,
Types of OLAP Servers
• Hybrid OLAP (HOLAP)
– User flexibility, e.g., low level: relational, high-
level: array
• Specialized SQL servers
– specialized support for SQL queries over
star/snowflake schemas
n
T ( Li 1)
• Materialization
i of
1 data cube
– Materialize every (cuboid) (full materialization), none (no
materialization), or some (partial materialization)
– Selection of which cuboids to materialize
• Based on size, sharing, access frequency, etc.
C c3 61
c2 45
62 63 64
46 47 48
c1 29 30 31 32
c0
B13 14 15 16 60
b3 44
B 28 56
b2 9
40
24 52
b1 5
36
20
b0 1 2 3 4
a0 a1 a2 a3
A
mining
Layer2
MDDB
MDDB
Meta
Data
Filtering&Integration Database API Filtering
Layer1
Data cleaning Data
Databases Data
DataP.PramodKumar, Warehouse
integrationSr.Asst.Prof.,
12/11/20 Repository