Dr. Sachin Chaudhary, B.Tech. (CSE), MS (S.E.), PhD (I.P.), PDF (CBIR). Director, Murala College of Engg. & Tech., Machilipatnam, AP
Course Overview
The course: what and how
0. Introduction
I. Data Warehousing
II. Decision Support and OLAP
III. Data Mining
IV. Looking Ahead
Demos and Labs
What product promotions have the biggest impact on revenue?
What impact will new products/services have on revenue and margins?
Which customers are most likely to go to the competition?
Data, data everywhere, yet ...
I can't find the data I need
data is scattered over the network
many versions, subtle differences
Evolution
60s: Batch reports
hard to find and analyze information
inflexible and expensive, reprogram every new request
70s: Terminal-based DSS (decision support systems) and EIS (executive information systems)
still inflexible, not integrated with desktop tools
Definition of DSS
A decision support system (DSS) is a system that helps decision makers at various levels to take decisions. It uses data, analytical models, and user-friendly software to support decision making.
Definition of EIS
An executive information system (EIS) is a system that helps high-level executives to take policy decisions. It uses higher-level data, analytical models, and user-friendly software to support these decisions.
Evolution
80s: Desktop data access and analysis tools
query tools, spreadsheets, GUIs
easier to use, but only access operational databases
90s: Data warehousing with integrated OLAP (online analytical processing) engines and tools
Subject-Oriented
A data warehouse is organized around the major subjects of the organization, such as customer, supplier, product, and sales.
It provides a simple and concise view of a particular subject by excluding data that are not useful to the decision support process.
Integrated
A data warehouse is constructed by integrating multiple sources of data, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attributes, and so on.
Time Variant
A data warehouse maintains records of both historical and current data, so it can provide information from a historical perspective.
Non Volatile
Once data is loaded into the data warehouse, the stored data is not modified.
Farmers: Harvest information from known access paths
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
[Figure: Operational Database and Data Warehouse; labels: Loans, Credit Card, Trust, Savings, Customer, Data Source, Cleaning, New Update]
Data Collection
Data warehousing collects data from various data sources, such as relational databases, flat files, and on-line records. The collected data are stored in a database inside the warehouse. The type of data collection used depends on the architecture of the warehouse.
Integration
Each data source uses a different schema. The data warehouse gets data from sources with different schemas and converts it into a common integrated schema.
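The conversion into a common schema can be sketched as a per-source field mapping. This is only a minimal illustration, not the method of any particular warehouse tool; the source names (`loans_db`, `cards_file`), the field names, and the `to_common` helper are all hypothetical.

```python
# Hypothetical mappings: source field name -> common warehouse field name.
SOURCE_MAPPINGS = {
    "loans_db": {"cust_id": "custno", "cust_name": "name", "town": "cityname"},
    "cards_file": {"CustomerNo": "custno", "FullName": "name", "City": "cityname"},
}


def to_common(source, record):
    """Rename a source record's fields to the common integrated schema.

    Fields that have no mapping for this source are dropped.
    """
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


# Two records describing the same customer, each in its own source schema.
loan_row = {"cust_id": 7, "cust_name": "A. Rao", "town": "Machilipatnam"}
card_row = {"CustomerNo": 7, "FullName": "A. Rao", "City": "Machilipatnam", "Limit": 5000}

integrated = [to_common("loans_db", loan_row), to_common("cards_file", card_row)]
```

After the mapping, both records carry the same field names, so they can be loaded into one warehouse table.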
Star Schema
A single fact table and, for each dimension, one dimension table
Does not capture hierarchies directly
[Figure: star schema; a central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, cust, prod, and city]
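A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The key columns (date, custno, prodno, cityname) come from the slide; the table names, the non-key columns, the `amount` measure, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension. The hierarchy (city -> region) is flattened
-- into a column of city_dim rather than stored as a separate table.
CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE cust_dim (custno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE prod_dim (prodno INTEGER PRIMARY KEY, prodname TEXT, category TEXT);
CREATE TABLE city_dim (cityname TEXT PRIMARY KEY, region TEXT);
-- A single fact table; each key column points at one dimension table.
CREATE TABLE sales_fact (
    date     TEXT    REFERENCES time_dim(date),
    custno   INTEGER REFERENCES cust_dim(custno),
    prodno   INTEGER REFERENCES prod_dim(prodno),
    cityname TEXT    REFERENCES city_dim(cityname),
    amount   REAL    -- hypothetical measure
);
""")

conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [("2024-01-05", "2024-01", 2024), ("2024-01-06", "2024-01", 2024)])
conn.executemany("INSERT INTO cust_dim VALUES (?, ?)", [(10, "Asha"), (11, "Ravi")])
conn.executemany("INSERT INTO prod_dim VALUES (?, ?, ?)",
                 [(1, "Cola", "Drinks"), (2, "Chips", "Snacks")])
conn.executemany("INSERT INTO city_dim VALUES (?, ?)", [("Pune", "West"), ("Delhi", "North")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("2024-01-05", 10, 1, "Pune", 100.0),
    ("2024-01-06", 11, 1, "Delhi", 50.0),
    ("2024-01-06", 10, 2, "Pune", 30.0),
])

# A typical star join: the fact table joined to one dimension, then aggregated.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM sales_fact AS f
    JOIN prod_dim AS p ON p.prodno = f.prodno
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Every analytical query follows the same pattern: join the fact table to the dimensions it needs, filter, and aggregate.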
Snowflake schema
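Represents the dimensional hierarchy directly by normalizing the dimension tables
Easy to maintain and saves storage
[Figure: snowflake schema; a central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, cust, prod, and city, with city linked to region]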
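As a rough sketch of the normalization, the region attributes can be moved out of the city dimension into their own table, so the city-to-region hierarchy is stored explicitly. The example uses `sqlite3`; the table names, the `country` column, and the sample rows are assumptions, and only the city branch of the schema is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The hierarchy city -> region is represented by a separate, normalized table.
CREATE TABLE region_dim (region TEXT PRIMARY KEY, country TEXT);
CREATE TABLE city_dim (
    cityname TEXT PRIMARY KEY,
    region   TEXT REFERENCES region_dim(region)
);
CREATE TABLE sales_fact (
    cityname TEXT REFERENCES city_dim(cityname),
    amount   REAL  -- hypothetical measure
);
""")

conn.executemany("INSERT INTO region_dim VALUES (?, ?)",
                 [("West", "India"), ("North", "India")])
conn.executemany("INSERT INTO city_dim VALUES (?, ?)",
                 [("Pune", "West"), ("Mumbai", "West"), ("Delhi", "North")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [("Pune", 100.0), ("Mumbai", 40.0), ("Delhi", 50.0)])

# Summarizing by region now needs one extra join along the hierarchy.
by_region = conn.execute("""
    SELECT r.region, SUM(f.amount)
    FROM sales_fact AS f
    JOIN city_dim   AS c ON c.cityname = f.cityname
    JOIN region_dim AS r ON r.region = c.region
    GROUP BY r.region
    ORDER BY r.region
""").fetchall()
```

Region-level attributes such as `country` are stored once per region instead of once per city, which is where the storage saving comes from; the cost is the additional join.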
Decision Support
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and can be ad hoc
Used by managers and end-users to understand the business and make judgments
OLAP
[Figure: the user's front-end tool sends a request to the OLAP server, which queries the data warehouse with SQL; the result set is returned to the front-end tool]
TYPES OF OLAP
Multi-dimensional Data
Hey, I sold $100M worth of goods
Dimensions: Product, Region, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Region: Country > Region > City > Office
Time: Year > Quarter > Month or Week > Day
[Figure: labels ERP Systems, Metadata Repository, Middleware, Management]
Architecture of
Design Component
The data warehouse designer designs the database of the data warehouse, and the warehouse administrator manages it. Both use the design component to design and store data.
Types of design
Bottom-up design: Business value can be returned as quickly as the first data marts can be created.
Top-down design: Atomic data, that is, data at the lowest level of detail, are stored in the data warehouse.
Hybrid design
Data Manager Component
The database in the data warehouse uses the data manager component for managing and accessing the stored data.
RDBMS
MDBMS
Management Component
Administering data acquisition operations
Managing backup copies of the data
Recovering lost data
Providing security for the data stored in the data warehouse
Authorizing access to the data stored in the data warehouse
The data acquisition component acquires data from various sources by using data acquisition applications. These applications are based on rules that are defined by the data warehouse developers.
Restructuring the records and fields of the database tables
Removing irrelevant and redundant data
Obtaining and adding missing data
Verifying the integrity and consistency of the data
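These cleaning steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `clean` helper, the field names, the default values, and the rule that amounts must be non-negative are all hypothetical, not taken from any specific tool.

```python
def clean(records, defaults):
    """Restructure, de-duplicate, fill, and check a list of raw records.

    `defaults` names the relevant fields (anything else is dropped) and
    gives the value used when a field is missing or None.
    """
    cleaned, seen = [], set()
    for rec in records:
        # Restructure: keep only the relevant fields, filling missing values.
        row = {}
        for field, default in defaults.items():
            value = rec.get(field)
            row[field] = default if value is None else value
        # Verify integrity (hypothetical rule: amounts cannot be negative).
        if row["amount"] < 0:
            raise ValueError(f"negative amount in record: {rec}")
        # Remove redundant data: skip rows already seen.
        key = tuple(row[field] for field in defaults)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"custno": 1, "amount": 20.0, "city": "Pune", "fax": "n/a"},
    {"custno": 1, "amount": 20.0, "city": "Pune", "fax": "n/a"},  # duplicate
    {"custno": 2, "amount": 5.0, "city": None, "fax": ""},        # missing city
]
defaults = {"custno": None, "amount": 0.0, "city": "UNKNOWN"}
cleaned = clean(raw, defaults)
```

The irrelevant `fax` field is dropped, the duplicate record is removed, and the missing city is filled with the default.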
The operations performed on the data for enhancement are:
Decoding and translating the values in fields
Summarizing data
Calculating derived values
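The three enhancement operations can be illustrated with a short Python sketch. The code table, the field names, and the `enhance` and `summarize` helpers are hypothetical and chosen only to make each operation visible.

```python
GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical code table


def enhance(rows):
    """Decode coded fields and calculate a derived value for each row."""
    enhanced = []
    for row in rows:
        enhanced.append({
            # Decoding and translating the value in a field.
            "gender": GENDER_CODES.get(row["gender"], "Unknown"),
            # Calculating a derived value from two stored fields.
            "revenue": row["qty"] * row["unit_price"],
        })
    return enhanced


def summarize(rows):
    """Summarize revenue per decoded gender."""
    totals = {}
    for row in rows:
        totals[row["gender"]] = totals.get(row["gender"], 0) + row["revenue"]
    return totals


rows = [
    {"gender": "M", "qty": 2, "unit_price": 10},
    {"gender": "F", "qty": 1, "unit_price": 30},
    {"gender": "M", "qty": 3, "unit_price": 5},
]
enhanced = enhance(rows)
summary = summarize(enhanced)
```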
Middleware Component
This component connects to the local databases.
An analytical server is used to analyze multidimensional data.
Intelligent data warehousing middleware controls access to the warehouse database.
Data Mart
A data mart is a database that contains the data needed by a small group of users for their own department's needs.
Dependent data mart
Independent data mart
Data warehouse: supports the entire information requirement of an organization. It has a large data model, a wider implementation, a large amount of data, and a larger number of users.
Data mart: supports the information requirement of a department in an organization. It has a small data model, a shorter implementation, less data, and fewer users.
Data warehouse: permanent storage of data. Views: created from warehouse data when needed and not permanent.
Data warehouse: multidimensional. Views: relational.
Data warehouse: can be indexed to maximize performance. Views: cannot be indexed.
Data warehouse: provides specific support to a functionality. Views: cannot give specific support to a functionality.
Data warehouse: provides a large amount of data. Views: created by extracting minimum data from the data warehouse.
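The difference between stored data and a view can be seen in a small `sqlite3` sketch. Here a stored summary table stands in for warehouse storage, which is a simplification; the table, view, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('West', 100), ('North', 50);

-- A view stores only the query; its rows are computed each time it is read.
CREATE VIEW sales_by_region_v AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

-- A stored table keeps its own copy of the rows and can carry an index.
CREATE TABLE sales_by_region_t AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
CREATE INDEX idx_sales_by_region ON sales_by_region_t(region);

-- New operational data arrives after the stored copy was built.
INSERT INTO sales VALUES ('West', 25);
""")

view_total = conn.execute(
    "SELECT total FROM sales_by_region_v WHERE region = 'West'").fetchone()[0]
table_total = conn.execute(
    "SELECT total FROM sales_by_region_t WHERE region = 'West'").fetchone()[0]
```

The view reflects the new row because it is recomputed on every read, while the stored table keeps the value it was loaded with until it is refreshed.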
Data Mining
Data mining is sorting through data to identify patterns and establish relationships.
Application Areas
Industry: Application
Finance: Credit card analysis
Insurance: Claims, fraud analysis
Telecommunication: Call record analysis
Consumer goods: Promotion analysis
Data service providers: Value-added data
Utilities: Power usage analysis
[Figure: labels Data Warehouse, Databases, Flat Files, data, information, field, unit]
Structuring/Modeling Issues
[Figure: Data Marts, Data Warehouse]
True Warehouse
[Figure: Data Sources, Data Warehouse, Data Marts]
What Is OLAP?
Online Analytical Processing: term coined by E. F. Codd in a 1994 paper contracted by Arbor Software
Generally synonymous with earlier terms such as Decision Support, Business Intelligence, Executive Information System
OLAP = Multidimensional Database
MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express)
ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent)
Result: OLAP shifted from small vertical niche to mainstream DBMS category
Strengths of OLAP
It is a powerful visualization paradigm
It provides fast, interactive response times
It is good for analyzing time series
It can be useful to find some clusters and outliers
Many vendors offer OLAP tools
OLAP Is FASMI
Fast Analysis of Shared Multidimensional Information
[Figure: data cube with dimensions Region, Product, and Date; sample labels: SF, Cream; sample values: 10, 12, 30, 47]
[Figure: data cube with Regions (Europe, Far East, India) and Sales Channel (Retail, Direct, Special)]
[Figure: Roll Up and Drill-Down; drill-down leads to Low-level Details]
Multidimensional Spreadsheets
Analysts need spreadsheets that support
pivot tables (cross-tabs)
drill-down and roll-up
slice and dice
sort
selections
derived attributes
OLAP Operations
[Figure: OLAP operations on a cube: Single Cell, Multiple Cells, Slice, Dice, Roll Up, Drill Down. Figure credit: Prentice Hall]
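The four operations can be sketched on a tiny cube held in a Python dictionary. This is an illustration only: the cell values, the city-to-region mapping, and the function names are hypothetical, and a real OLAP server would not store a cube this way.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, city, month) -> sales.
cube = {
    ("Cola", "Pune", "2024-01"): 10,
    ("Cola", "Mumbai", "2024-01"): 4,
    ("Cola", "Delhi", "2024-01"): 20,
    ("Chips", "Pune", "2024-02"): 5,
    ("Chips", "Delhi", "2024-02"): 7,
}
CITY_TO_REGION = {"Pune": "West", "Mumbai": "West", "Delhi": "North"}


def slice_(cells, month):
    """Slice: fix one dimension (here, time) to a single value."""
    return {k: v for k, v in cells.items() if k[2] == month}


def dice(cells, products, cities):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return {k: v for k, v in cells.items() if k[0] in products and k[1] in cities}


def roll_up(cells):
    """Roll up: aggregate the city level to the region level."""
    totals = defaultdict(int)
    for (product, city, month), value in cells.items():
        totals[(product, CITY_TO_REGION[city], month)] += value
    return dict(totals)


def drill_down(cells, product, region, month):
    """Drill down: list the city-level cells behind one region-level cell."""
    return {
        k: v for k, v in cells.items()
        if k[0] == product and CITY_TO_REGION[k[1]] == region and k[2] == month
    }


west_cola = roll_up(cube)[("Cola", "West", "2024-01")]
west_cola_detail = drill_down(cube, "Cola", "West", "2024-01")
```

Rolling up and drilling down are inverse moves along the same hierarchy: the region total for Cola in the West is the sum of the city-level cells that drill-down returns.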
ROLAP (Database Layer, Presentation Layer)
Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.
MOLAP (Database Layer, Presentation Layer)
Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, and obtain OLAP functionality via proprietary algorithms running against this data.
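The contrast between the two approaches can be sketched in Python. The ROLAP half builds SQL at request time against a relational table (using `sqlite3`); the MOLAP half pre-calculates every group-by total into a dictionary, which here merely stands in for a proprietary MDDB. The table name, dimension names, sample rows, and both helper functions are assumptions.

```python
import sqlite3
from itertools import combinations

DIMS = ("product", "region")
rows = [("Cola", "West", 10), ("Cola", "North", 20), ("Chips", "West", 5)]

# ROLAP style: each request is translated into SQL over relational tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def rolap_query(group_by):
    """Generate and run a GROUP BY query for the requested dimensions."""
    if not group_by or any(col not in DIMS for col in group_by):
        raise ValueError("group_by must be a non-empty subset of DIMS")
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


# MOLAP style: pre-calculate the total for every combination of dimensions.
def build_cube(facts):
    """Return {(dimension indexes, dimension values): total} for all group-bys."""
    cube = {}
    for size in range(len(DIMS) + 1):
        for dims in combinations(range(len(DIMS)), size):
            for fact in facts:
                key = (dims, tuple(fact[i] for i in dims))
                cube[key] = cube.get(key, 0) + fact[-1]
    return cube


cube = build_cube(rows)
by_product = rolap_query(("product",))   # computed when asked
cola_total = cube[((0,), ("Cola",))]     # looked up from pre-calculated totals
```

The ROLAP path does its aggregation at query time, while the MOLAP path pays the cost up front and answers by lookup; the number of pre-calculated combinations grows quickly as dimensions are added.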