Data Warehousing: Relation to OLTP (TPS), OLAP, Data mining
A producer wants to know...
  Which are our lowest/highest margin customers?
  What is the most effective distribution channel?
  Who are my customers and what products are they buying?
  What product promotions have the biggest impact on revenue?
  What impact will new products/services have on revenue and margins?
  Which customers are most likely to go to the competition?
What is a Data Warehouse?
  A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context. [Barry Devlin]
Very Large Data Bases
  Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
  Petabytes -- 10^15 bytes: Geographic Information Systems
  Exabytes -- 10^18 bytes: National Medical Records
  Zettabytes -- 10^21 bytes: Weather images
  Yottabytes -- 10^24 bytes: Intelligence Agency Videos
Data Warehouse
  A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
  -- Bill Inmon, Building the Data Warehouse, 1996
Application Areas
  Industry                    Application
  Finance                     Credit Card Analysis
  Insurance                   Claims, Fraud Analysis
  Telecommunication           Call record analysis
  Transport                   Logistics management
  Consumer goods              Promotion analysis
  Data Service providers      Value added data
  Utilities                   Power usage analysis
Application-Orientation vs. Subject-Orientation
  Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
  Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity
OLTP vs Data Warehouse
  OLTP                                    Data Warehouse (DSS)
  Application oriented                    Subject oriented
  Used to run business                    Used to analyze business
  Detailed data                           Summarized & refined
  Current, up to date                     Snapshot data
  Isolated data                           Integrated data
  Repetitive access                       Ad-hoc access
  Clerical user                           Knowledge user
  Performance sensitive                   Performance relaxed
  Few records accessed at a time (tens)   Large volumes accessed at a time (millions)
  Read/update access                      Mostly read (and batch update)
  No data redundancy                      Redundancy present
  Database size 100 MB - 100 GB           Database size 100 GB - terabytes
  Metric: transaction throughput          Metric: query throughput
  Thousands of users                      Hundreds of users
  Managed in entirety                     Managed by subsets
To summarize...
  OLTP systems are used to "run" a business
  The data warehouse helps to "optimize" the business
Data Warehouse Architecture
  [Figure: sources (Relational Databases, ERP Systems, Purchased Data, Legacy Data) feed Extraction and Cleansing, then the Optimized Loader, then the Data Warehouse Engine with its Metadata Repository, which serves Analyze and Query]
Components of the Warehouse
  Data Extraction and Loading
  The Warehouse
  Analyze and Query -- OLAP Tools
  Metadata
  Data Mining tools
Loading the Warehouse
  Cleaning the data before it is loaded
Data Quality -- The Reality
  Warehouse data comes from disparate, questionable sources
Data Transformation Terms
  Extracting
  Conditioning
  Scrubbing
  Merging
  Householding
  Enrichment
  Scoring
  Loading
  Validating
  Delta Updating
Data Transformation Terms
  Extracting
    Capture of data from operational source in "as is" status
    Sources for data are generally legacy mainframes in VSAM, IMS, IDMS, DB2; more data today in relational databases on Unix
  Conditioning
    The conversion of data types from the source to the target data store (warehouse) -- always a relational database
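Conditioning can be illustrated with a small sketch. It assumes a legacy record whose fields all arrive as text and converts them to typed values for the target store. The field names and formats (`custno`, `balance`, `opened`, `YYYYMMDD`) are hypothetical, not taken from any particular source system.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical legacy record: every field arrives as fixed-width text.
legacy_row = {"custno": "000123", "balance": "0001250.75", "opened": "19960314"}

def condition(row):
    """Convert source text fields into the types of the target store."""
    return {
        "custno": int(row["custno"]),          # zero-padded text -> integer
        "balance": Decimal(row["balance"]),    # text -> exact decimal
        "opened": datetime.strptime(row["opened"], "%Y%m%d").date(),  # text -> date
    }

conditioned = condition(legacy_row)
```

Decimal is used here rather than float so that currency amounts survive the conversion exactly; that is a design choice of this sketch, not something the slide prescribes.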
Data Transformation Terms
  Householding
    Identifying all members of a household (living at the same address)
    Ensures only one mail is sent to a household
    Can result in substantial savings: 1 lakh catalogues at Rs. 50 each costs Rs. 50 lakhs. A 2% savings would save Rs. 1 lakh.
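A minimal householding sketch, assuming that two customers belong to the same household when their addresses match after normalizing case and whitespace. Real tools use far more elaborate address matching; the customer records and the `address_key` helper are made up for illustration. The last lines work through the savings arithmetic for 1 lakh catalogues at Rs. 50 each with a 2% saving.

```python
from collections import defaultdict

# Hypothetical customer records; names and addresses are made up.
customers = [
    ("A. Rao", "12 MG Road, Pune"),
    ("S. Rao", "12  mg road,  pune"),
    ("K. Iyer", "7 Lake View, Chennai"),
]

def address_key(address):
    """Normalize case and whitespace so equal addresses compare equal."""
    return " ".join(address.lower().split())

households = defaultdict(list)
for name, address in customers:
    households[address_key(address)].append(name)

mailings_needed = len(households)   # one catalogue per household

# Savings arithmetic (1 lakh = 100,000).
catalogues = 100_000
cost_each_rs = 50
total_cost_rs = catalogues * cost_each_rs   # 5,000,000 = Rs. 50 lakhs
savings_rs = total_cost_rs * 2 // 100       # 2% of the total = Rs. 1 lakh
```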
Data Transformation Terms
  Enrichment
    Bring data from external sources to augment/enrich operational data.
    Data sources include Dun and Bradstreet, A. C. Nielsen, CMIE, IMRA etc.
  Scoring
    Computation of a probability of an event, e.g. chance that a customer will defect to AT&T from MCI, chance that a customer is likely to buy a new product
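Scoring can be sketched as a function that maps customer attributes to a probability. The example below assumes a logistic model with hand-picked weights. The feature names, weights and bias are hypothetical; a real score would come from a model fitted to historical data, and the slide does not say which model is used.

```python
import math

# Hypothetical weights; a real model would be fitted to historical data.
WEIGHTS = {"complaints_last_year": 0.8, "years_as_customer": -0.3}
BIAS = -1.0

def defection_score(customer):
    """Return a probability in (0, 1) from a logistic function of the features."""
    z = BIAS + sum(WEIGHTS[k] * customer[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

loyal = {"complaints_last_year": 0, "years_as_customer": 10}
unhappy = {"complaints_last_year": 4, "years_as_customer": 1}

loyal_score = defection_score(loyal)
unhappy_score = defection_score(unhappy)
```

With these made-up weights the long-standing customer with no complaints scores lower than the recent customer with several complaints, which is the ordering a defection score is meant to produce.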
Scrubbing Data
  Sophisticated transformation tools.
  Used for improving the quality of data
  Clean data is vital for the success of the warehouse
  Example: Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri, etc. are the same person
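A rough sketch of how the name variants in the example might be matched, using the standard library's `difflib.SequenceMatcher`. The reference spelling, the 0.8 threshold and the token-by-token comparison are assumptions of this sketch. Commercial scrubbing tools use richer rules, and string similarity alone does not prove that two records describe the same person.

```python
from difflib import SequenceMatcher

CANONICAL = "seshadri"   # hypothetical reference spelling
THRESHOLD = 0.8          # assumed cut-off; tune against real data

def best_similarity(name, reference=CANONICAL):
    """Highest similarity between the reference and any token of the name."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return max(SequenceMatcher(None, t, reference).ratio() for t in tokens if t)

def is_candidate_match(name):
    """True when some token of the name is close to the reference spelling."""
    return best_similarity(name) >= THRESHOLD

variants = ["Seshadri", "Sheshadri", "Sesadri", "Seshadri S.", "Srinivasan Seshadri"]
matches = [v for v in variants if is_candidate_match(v)]
```

All five variants clear the threshold, while an unrelated name such as "Krishnan" does not.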
Loads
  After extracting, scrubbing, validating etc., load the data into the warehouse
  Issues
    huge volumes of data to be loaded
    Incremental versus Full loads
    Online versus Offline loads
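The difference between a full and an incremental load can be sketched as follows. The sketch assumes each source row carries an `updated_at` counter and that the warehouse is a dictionary keyed by `id`. Both are simplifications for illustration, and deletions at the source are not handled.

```python
def full_load(source_rows):
    """Rebuild the warehouse table from scratch."""
    return {row["id"]: row for row in source_rows}

def incremental_load(warehouse, source_rows, last_loaded_at):
    """Apply only rows changed since the previous load; return rows touched."""
    changed = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    for row in changed:
        warehouse[row["id"]] = row
    return len(changed)

source = [
    {"id": 1, "amount": 100, "updated_at": 1},
    {"id": 2, "amount": 250, "updated_at": 2},
]
warehouse = full_load(source)

# Later, one row changes at the source and one new row arrives.
source[1] = {"id": 2, "amount": 300, "updated_at": 5}
source.append({"id": 3, "amount": 75, "updated_at": 6})
rows_touched = incremental_load(warehouse, source, last_loaded_at=2)
```

The incremental load touches two rows instead of rewriting all three, which is the point of the trade-off when volumes are huge.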
Structuring/Modeling Issues
Data Warehouse Structure
  Subject Orientation -- customer, product, policy, account etc.
  A subject may be implemented as a set of related tables. E.g., customer may be five tables
Derived Data
  Introduction of derived (calculated) data may often help
  Have seen this in the context of dual levels of granularity
  Can keep auxiliary views and indexes to speed up query processing
Schema Design
  Schema Types
    Star Schema
    Fact Constellation Schema
    Snowflake schema
Fact Table
  Central table
  mostly raw numeric items
  narrow rows, a few columns at most
  large number of rows (millions to a billion)
  Access via dimensions
Star Schema
  A single fact table and for each dimension one dimension table
  Does not capture hierarchies directly
  [Figure: fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, cust, prod, city]
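A star schema along the lines of the figure can be sketched with the standard library's `sqlite3` module. The table names follow the figure (with `time_dim` standing in for the time table). All other columns and all data values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE cust (custno INTEGER PRIMARY KEY, custname TEXT);
CREATE TABLE prod (prodno INTEGER PRIMARY KEY, prodname TEXT);
CREATE TABLE city (cityname TEXT PRIMARY KEY, state TEXT, region TEXT);
-- One narrow fact row per sale; the four leading columns reference the dimensions.
CREATE TABLE fact (
    date TEXT REFERENCES time_dim(date),
    custno INTEGER REFERENCES cust(custno),
    prodno INTEGER REFERENCES prod(prodno),
    cityname TEXT REFERENCES city(cityname),
    sales REAL
);
INSERT INTO time_dim VALUES ('1996-01-05', '1996-01', 1996), ('1996-02-10', '1996-02', 1996);
INSERT INTO cust VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO prod VALUES (10, 'Audio'), (20, 'Video');
INSERT INTO city VALUES ('Pune', 'Maharashtra', 'West'), ('Chennai', 'Tamil Nadu', 'South');
INSERT INTO fact VALUES
    ('1996-01-05', 1, 10, 'Pune', 100.0),
    ('1996-01-05', 2, 20, 'Chennai', 250.0),
    ('1996-02-10', 1, 20, 'Pune', 50.0);
""")

# Access via a dimension: join the fact table to city and group by region.
sales_by_region = conn.execute("""
    SELECT c.region, SUM(f.sales)
    FROM fact AS f JOIN city AS c ON f.cityname = c.cityname
    GROUP BY c.region ORDER BY c.region
""").fetchall()
```

Note that `state` and `region` sit as plain columns inside `city`: the city-to-region hierarchy is flattened into one dimension table rather than represented directly.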
Snowflake schema
  Represent dimensional hierarchy directly by normalizing tables.
  Easy to maintain and saves storage
  [Figure: fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, cust, prod, city, with city further linked to region]
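The normalization step can be sketched by moving region attributes out of the city dimension into their own table, again using `sqlite3`. Only the city-to-region branch is shown. The `manager` column and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Region attributes are stored once, in their own table.
CREATE TABLE region (regionname TEXT PRIMARY KEY, manager TEXT);
CREATE TABLE city (
    cityname TEXT PRIMARY KEY,
    regionname TEXT REFERENCES region(regionname)
);
CREATE TABLE fact (cityname TEXT REFERENCES city(cityname), sales REAL);
INSERT INTO region VALUES ('West', 'M1'), ('South', 'M2');
INSERT INTO city VALUES ('Pune', 'West'), ('Mumbai', 'West'), ('Chennai', 'South');
INSERT INTO fact VALUES ('Pune', 100.0), ('Mumbai', 40.0), ('Chennai', 250.0);
""")

# Reaching a region attribute now takes one extra join through city.
sales_by_manager = conn.execute("""
    SELECT r.manager, SUM(f.sales)
    FROM fact AS f
    JOIN city AS c ON f.cityname = c.cityname
    JOIN region AS r ON c.regionname = r.regionname
    GROUP BY r.manager ORDER BY r.manager
""").fetchall()
```

Changing a region's manager is a single-row update here, which reflects the maintenance and storage benefit; the cost is the additional join at query time.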
Fact Constellation
  Multiple fact tables that share many dimension tables
  Booking and Checkout may share many dimension tables in the hotel industry
  [Figure: fact tables Booking and Checkout sharing the dimension tables Hotels, Promotion, Room Type, Customer, Travel Agents]
Data Partitioning
  Typically partitioned by
    date
    line of business
    geography
    organizational unit
    any combination of above
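Partitioning by a combination of date and geography can be sketched as a function that assigns each row to a partition key. The row layout and the choice of year-month granularity are assumptions made for this example.

```python
from collections import defaultdict

def partition_key(row):
    """Combine date (year-month) and geography into one partition key."""
    return (row["date"][:7], row["region"])

rows = [
    {"date": "1996-01-05", "region": "India", "sales": 100},
    {"date": "1996-01-20", "region": "Europe", "sales": 70},
    {"date": "1996-02-10", "region": "India", "sales": 50},
    {"date": "1996-01-28", "region": "India", "sales": 30},
]

partitions = defaultdict(list)
for row in rows:
    partitions[partition_key(row)].append(row)

# A query restricted to one month and region only scans that partition.
jan_india_sales = sum(r["sales"] for r in partitions[("1996-01", "India")])
```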
True Warehouse
  [Figure: Data Sources feed the Data Warehouse, which feeds the Data Marts]
"Slicing and Dicing"
  [Figure: cube with the dimensions Product (Household, Telecomm, Video, Audio), geography (Europe, Far East, India) and Sales Channel (Retail, Direct, Special); the Telecomm slice is highlighted]
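Slicing and dicing can be sketched on a tiny cube stored as a dictionary keyed by (product, region, channel). The dimension values come from the figure; the sales numbers and the helper names `slice_` and `dice` are made up for this illustration.

```python
# Hypothetical cube cells: (product, region, channel) -> sales
cube = {
    ("Telecomm", "Europe", "Retail"): 10,
    ("Telecomm", "India", "Direct"): 20,
    ("Video", "Europe", "Retail"): 5,
    ("Audio", "Far East", "Special"): 7,
    ("Telecomm", "Far East", "Retail"): 3,
}
DIMS = ("product", "region", "channel")

def dice(cells, **allowed):
    """Keep cells whose coordinates fall in the allowed values per dimension."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cells.items() if keep(k)}

def slice_(cells, dim, value):
    """A slice fixes a single dimension to one value."""
    return dice(cells, **{dim: {value}})

telecomm_slice = slice_(cube, "product", "Telecomm")
retail_telecomm = dice(
    cube, product={"Telecomm"}, channel={"Retail"}, region={"Europe", "Far East"}
)
```

Here `telecomm_slice` corresponds to the Telecomm slice in the figure, and `retail_telecomm` restricts several dimensions at once.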
Roll-up and Drill Down
  Higher Level of Aggregation
    Sales Channel
    Region
    Country
    State
    Location Address
    Sales Representative
  Low-level Details
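Roll-up and drill-down amount to aggregating the same detail rows at different levels of a hierarchy. The sketch below assumes a country, state, city hierarchy with invented sales figures; `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical detail rows at the lowest level of a location hierarchy.
sales = [
    {"country": "India", "state": "Maharashtra", "city": "Pune", "amount": 100},
    {"country": "India", "state": "Maharashtra", "city": "Mumbai", "amount": 40},
    {"country": "India", "state": "Tamil Nadu", "city": "Chennai", "amount": 250},
    {"country": "France", "state": "Ile-de-France", "city": "Paris", "amount": 60},
]

def aggregate(rows, levels):
    """Sum amounts grouped by the given hierarchy levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["amount"]
    return dict(totals)

by_country = aggregate(sales, ["country"])           # roll-up to country
by_state = aggregate(sales, ["country", "state"])    # drill down one level
```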
Nature of OLAP Analysis
  Aggregation -- (total sales, percent-to-total)
  Comparison -- Budget vs. Expenses
  Ranking -- Top 10, quartile analysis
  Access to detailed and aggregate data
  Complex criteria specification
  Visualization
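Aggregation, percent-to-total and ranking can be sketched in a few lines. The product names and sales figures are invented, and `top_n` is a hypothetical helper rather than a function of any OLAP product.

```python
# Hypothetical sales per product.
sales_by_product = {"Audio": 120.0, "Video": 300.0, "Telecomm": 480.0, "Household": 100.0}

# Aggregation: total sales and each product's percent-to-total.
total_sales = sum(sales_by_product.values())
percent_to_total = {p: 100.0 * s / total_sales for p, s in sales_by_product.items()}

def top_n(values, n):
    """Rank keys by value, highest first, and keep the first n."""
    return sorted(values, key=values.get, reverse=True)[:n]

top_two = top_n(sales_by_product, 2)
```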
Data Mining
  Data mining (Knowledge Discovery in Databases):
    Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
  What is not data mining?
    (Deductive) query processing.
    Expert systems or small ML/statistical programs
Data Mining: Confluence of Multiple Disciplines
  [Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, Other Disciplines]
Data Mining: Knowledge Discovery in Databases
  Data mining: the core of knowledge discovery process.
  [Figure: Databases, then Data Cleaning and Data Integration, then Data Warehouse, then Selection, then Task-relevant Data, then Data Mining, then Pattern Evaluation]
Data Mining Techniques
  Concept Description: Generalize, summarize, Average
  Association (Correlation):
    Originally proposed on market basket data
    A rule such as 70% of transactions that purchase bread also purchase butter
  Classification:
    Classify countries based on climate, classify cars based on gas mileage.
    Attempts to predict the value of a discrete dependent variable from various known attributes
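The bread-and-butter rule can be made concrete by computing support and confidence over a handful of baskets. The transactions below are invented, so the resulting confidence differs from the 70% quoted above.

```python
# Hypothetical market baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the fraction that also has the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

bread_butter_support = support({"bread", "butter"}, transactions)
bread_to_butter = confidence({"bread"}, {"butter"}, transactions)
```

In this data three of the four bread baskets also contain butter, so the rule "bread implies butter" has a confidence of about 75%.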
Data Mining Techniques
  Prediction: Predict some unknown value (Stock Market Analysis)
  Cluster Analysis:
    Maximizing the intra-class similarity and minimizing the inter-class similarity
  Outlier Analysis:
    A data object that does not comply with the general behavior of the data.
    It is useful in fraud detection, rare events analysis
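One simple way to flag an object that does not comply with the general behavior of the data is a z-score test, sketched below with the standard library's `statistics` module. The choice of z-scores, the threshold of 2.0 and the transaction amounts are assumptions of this sketch; the slide does not name a specific method.

```python
from statistics import mean, pstdev

def outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []   # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [120, 95, 110, 130, 105, 98, 115, 5000]
suspicious = outliers(amounts)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a modest threshold is needed; robust alternatives based on the median are common in practice.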
Products, References, Useful Links
Reporting Tools
  Andyne Computing -- GQL
  Brio -- BrioQuery
  Business Objects -- Business Objects
  Cognos -- Impromptu
  Information Builders Inc. -- Focus for Windows
  Oracle -- Discoverer2000
  Platinum Technology -- SQL*Assist, ProReports
  PowerSoft -- InfoMaker
  SAS Institute -- SAS/Assist
  Software AG -- Esperant
  Sterling Software -- VISION:Data
OLAP and Executive Information Systems
  Andyne Computing -- Pablo
  Arbor Software -- Essbase
  Cognos -- PowerPlay
  Comshare -- Commander OLAP
  Holistic Systems -- Holos
  Information Advantage -- AXSYS, WebOLAP
  Informix -- Metacube
  Microstrategies -- DSS/Agent
  Microsoft -- Plato
  Oracle -- Express
  Pilot -- LightShip
  Planning Sciences -- Gentium
  Platinum Technology -- ProdeaBeacon, Forest & Trees
  SAS Institute -- SAS/EIS, OLAP++
  Speedware -- Media
Other Warehouse Related Products
  Data extract, clean, transform, refresh
    CA-Ingres replicator
    Carleton Passport
    Prism Warehouse Manager
    SAS Access
    Sybase Replication Server
    Platinum Inforefiner, Infopump
Extraction and Transformation Tools
  Carleton Corporation -- Passport
  Evolutionary Technologies Inc. -- Extract
  Informatica -- OpenBridge
  Information Builders Inc. -- EDA Copy Manager
  Platinum Technology -- InfoRefiner
  Prism Solutions -- Prism Warehouse Manager
  Red Brick Systems -- DecisionScape Formation
Scrubbing Tools
  Apertus -- Enterprise/Integrator
  Vality -- IPE
  Postal Soft
Warehouse Products
  Computer Associates -- CA-Ingres
  Hewlett-Packard -- Allbase/SQL
  Informix -- Informix, Informix XPS
  Microsoft -- SQL Server
  Oracle -- Oracle7, Oracle Parallel Server
  Red Brick -- Red Brick Warehouse
  SAS Institute -- SAS
  Software AG -- ADABAS
  Sybase -- SQL Server, IQ, MPP
Warehouse Server Products
  Oracle 8
  Informix
    Online Dynamic Server
    XPS -- Extended Parallel Server
    Universal Server for object relational applications
  Sybase
    Adaptive Server 11.5
    Sybase MPP
    Sybase IQ
  Red Brick Warehouse
  Tandem Nonstop
  IBM
    DB2 MVS
    Universal Server
    DB2 400
  Teradata
Other Warehouse Related Products
  Connectivity to Sources
    Apertus
    Information Builders EDA/SQL
    Platinum Infohub
    SAS Connect
    IBM Data Joiner
    Oracle Open Connect
    Informix Express Gateway
Other Warehouse Related Products
  Query/Reporting Environments
    Brio/Query
    Cognos Impromptu
    Informix Viewpoint
    CA Visual Express
    Business Objects
    Platinum Forest and Trees
4GL's, GUI Builders, and PC Databases
  Information Builders -- Focus
  Lotus -- Approach
  Microsoft -- Access, Visual Basic
  MITI -- SQR/Workbench
  PowerSoft -- PowerBuilder
  SAS Institute -- SAS/AF
Data Mining Products
  DataMind -- neurOagent
  Information Discovery -- IDIS
  SAS Institute -- SAS/Neuronets