Unit 6: Data Warehousing1
Data Warehousing
Organizational Informatics
Unit 6: Data Warehousing & Data Mining
Thomas Haigh
Unit 6: Data Warehousing2
Structure of Presentation
Foundational idea: Data Base Management System
Data Warehouse Concept
Data Mining Concept
Some examples
Unit 6: Data Warehousing3
 A File Based System, 1962
 
Unit 6: Data Warehousing4
Data Base Management System Today
Single most important kind of corporate IT
Foundation of almost every
 Advanced web site
 Administrative application
Increasing use in science
Used on personal computers
Even on Pocket PCs
Unit 6: Data Warehousing5
Data Base Management System
Standard piece of system software
Oracle, SQL Server, DB2, Access are most widely used
Supports multiple data bases
Creation, modification of data structures (i.e. tables)
 Via Data Definition Language (DDL)
Retrieval, insertion, deletion, updating of data (i.e. rows)
 Via Data Manipulation Language (DML)
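To make the DDL/DML split concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than any of the products named on the slide. The `customer` table and its contents are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database; the `customer` table is a made-up example.
conn = sqlite3.connect(":memory:")

# DDL: create a data structure (a table), then modify it.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")

# DML: insert, update and delete rows.
conn.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Ann", "Milwaukee"))
conn.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Bob", "Chicago"))
conn.execute("UPDATE customer SET city = ? WHERE name = ?", ("Madison", "Ann"))
conn.execute("DELETE FROM customer WHERE name = ?", ("Bob",))

# DML: retrieve the remaining rows.
rows = conn.execute("SELECT id, name, city FROM customer").fetchall()
print(rows)  # [(1, 'Ann', 'Madison')]
```

The same statements would look much the same in other relational products, though data types and ALTER TABLE support vary between them.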
Unit 6: Data Warehousing6
Components of a DBMS
 
Unit 6: Data Warehousing7
Gives Different Views on Data
Different users have different permissions
 View, change, delete
On various parts of the data base
 “Views” are used to present
Data joined, grouped or filtered in particular ways
Can include results of calculations or functions
This allows
 Avoidance of duplicated data
Store it once; present in many different ways
Called “normalization” 
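As a rough illustration of "store it once; present in many different ways", the sketch below defines a view that joins, groups and calculates over two tables. It uses Python's sqlite3 module; the `product` and `sale` tables and the `revenue_by_product` view are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each fact is stored once: prices live only in product.
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE sale (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        quantity INTEGER
    );
    INSERT INTO product (id, name, price) VALUES (1, 'Beer', 8.0), (2, 'Diapers', 12.5);
    INSERT INTO sale (product_id, quantity) VALUES (1, 3), (1, 2), (2, 4);

    -- The view presents joined, grouped data with a calculated column.
    CREATE VIEW revenue_by_product AS
    SELECT p.name AS name, SUM(s.quantity * p.price) AS revenue
    FROM sale AS s
    JOIN product AS p ON p.id = s.product_id
    GROUP BY p.name;
""")

# Users query the view like a table; no revenue figures are stored anywhere.
revenue = conn.execute(
    "SELECT name, revenue FROM revenue_by_product ORDER BY name"
).fetchall()
print(revenue)  # [('Beer', 40.0), ('Diapers', 50.0)]
```

Changing a price in `product` changes what the view reports immediately, since the view holds no copy of the data.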
Unit 6: Data Warehousing8
Current DBMS Systems
Dominant model is “relational” (e.g. Oracle)
Good for updating
Flexible
Can be slow & complex to extract data for reports
Use SQL (Structured Query Language) to define and manipulate data
Support multiple simultaneous users
Can be ad-hoc individuals
Can be batch jobs or programs
Unit 6: Data Warehousing9
Not One Big Database
Big central database doesn’t work 
Finish up with dozens/hundreds of little data bases
Physically separate
 All incomplete
Different data formats
Different concepts of data
Unit 6: Data Warehousing10
From Recent DB Textbook 
[Figure: Management Hierarchy. Levels: Top (strategic), Middle (tactical), Lower (operational). Data base labels: Individual operational databases; Summarized, integrated operational databases; External data sources and summarized, tactical databases; Operational databases.]
 
Unit 6: Data Warehousing11
Data Warehouse Concept
Emerges early 1990s
One big DB for everything has failed, so
Leave “transactional” systems spread out (physically, organizationally), BUT
Make a second, read-only copy of everything in a centralized “data warehouse”. Update regularly.
Unit 6: Data Warehousing12
Ex. Warehouse Architecture
 
Unit 6: Data Warehousing13
Important Issues
What data to include
Does it need to cover all parts of business?
Does it need to cover all kinds of data?
How often to update it
Difference between daily and weekly updates can be enormous in terms of cost
Where to keep data
Single centralized repository, or replicated local copies?
Unit 6: Data Warehousing14
Challenge 1: Loading Data
Huge job: extraction, cleaning, loading
Data comes from many sources
Need to standardize formats
Need to standardize codes, meanings
Duplicate, inconsistent data common
Not a one-time process (unlike ERP)
Need to automate, run on regular basis
Big effort to maintain processes
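The extract, clean and load steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source extracts, the field names, the `STATE_CODES` mapping and the rule that name plus state identifies a duplicate are all hypothetical, not taken from any real system.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with different formats and codes.
SOURCE_A = [{"cust": "Ann Lee", "state": "WI", "joined": "2004-03-01"}]
SOURCE_B = [
    {"cust": "ann lee ", "state": "Wisconsin", "joined": "03/01/2004"},
    {"cust": "Bob Ray", "state": "Illinois", "joined": "11/15/2003"},
]

STATE_CODES = {"wisconsin": "WI", "illinois": "IL"}  # standardize codes
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")              # standardize formats


def clean(record):
    """Return one record in the warehouse's standard format."""
    name = " ".join(record["cust"].split()).title()
    state = record["state"].strip()
    state = STATE_CODES.get(state.lower(), state.upper())
    for fmt in DATE_FORMATS:
        try:
            joined = datetime.strptime(record["joined"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {record['joined']!r}")
    return {"name": name, "state": state, "joined": joined}


def load(*sources):
    """Clean every record and keep the first copy of each duplicate."""
    warehouse = {}
    for source in sources:
        for record in source:
            row = clean(record)
            warehouse.setdefault((row["name"], row["state"]), row)
    return list(warehouse.values())


rows = load(SOURCE_A, SOURCE_B)
print(rows)
```

Here the two differently formatted "Ann Lee" records collapse into one row. In practice a job like this would be scheduled to run on a regular basis, and the matching rules are where most of the maintenance effort tends to go.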
Unit 6: Data Warehousing15
Challenge 2: Structuring Warehouse
Data will usually be stored in relational database, e.g. Oracle
But not in normalized format
Remember, normalization is optimized for updating
But it means lots of joins between tables when retrieving records
Warehouse data optimized for queries
May duplicate many times for efficient retrieval
May store totals and statistics to save counting
 “Snowflake schema” or “Star schema”
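A star schema can be sketched as one central fact table surrounded by denormalized dimension tables. The example below uses Python's sqlite3 module; the table names (`fact_sales`, `dim_store`, `dim_product`) and the sample rows are hypothetical and follow a common naming habit rather than anything prescribed by the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes kept together in one wide
    -- table each (region is repeated per store instead of being split out).
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

    -- Fact table: one row per sale, pointing at the dimensions.
    CREATE TABLE fact_sales (
        store_id INTEGER REFERENCES dim_store(store_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_date TEXT,
        amount REAL
    );

    INSERT INTO dim_store VALUES (1, 'Downtown', 'Midwest'), (2, 'Harbor', 'East');
    INSERT INTO dim_product VALUES (1, 'Beer', 'Beverages'), (2, 'Diapers', 'Baby');
    INSERT INTO fact_sales VALUES
        (1, 1, '2004-03-01', 40.0),
        (1, 2, '2004-03-01', 25.0),
        (2, 1, '2004-03-02', 16.0);
""")

# A typical warehouse query needs only one join per dimension it uses.
sales_by_region = conn.execute("""
    SELECT st.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS st ON st.store_id = f.store_id
    GROUP BY st.region
    ORDER BY st.region
""").fetchall()
print(sales_by_region)  # [('East', 16.0), ('Midwest', 65.0)]
```

A snowflake schema would split the dimensions further (for example, a separate region table), trading some query simplicity for less duplication.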
Unit 6: Data Warehousing16
One Example: The “Cube” 
Term “cube” used to describe indexing of data by dimension
Store a matrix of pre-calculated totals, assemble as needed for queries
e.g. A set of sales totals for each combination of product, day and store.
 Add up these pre-calculated totals to figure out sales by region, over a month, or for a class of products
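The cube idea can be illustrated with a small Python sketch: totals are calculated once per (product, day, store) cell, and later queries only add up cells. The sales figures, the store-to-region mapping and the function names `build_cube` and `roll_up` are assumptions made for this example; real OLAP products use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical raw sales: (product, day, store, amount).
SALES = [
    ("beer", "2004-03-01", "store_1", 40.0),
    ("beer", "2004-03-01", "store_1", 8.0),
    ("beer", "2004-03-02", "store_2", 16.0),
    ("diapers", "2004-03-01", "store_2", 25.0),
]
STORE_REGION = {"store_1": "north", "store_2": "south"}


def build_cube(sales):
    """Pre-calculate one total per (product, day, store) combination."""
    cube = defaultdict(float)
    for product, day, store, amount in sales:
        cube[(product, day, store)] += amount
    return dict(cube)


def roll_up(cube, keep):
    """Add up pre-calculated cells, grouped by whatever `keep` returns."""
    totals = defaultdict(float)
    for cell, total in cube.items():
        totals[keep(*cell)] += total
    return dict(totals)


cube = build_cube(SALES)
by_region = roll_up(cube, lambda product, day, store: STORE_REGION[store])
by_month = roll_up(cube, lambda product, day, store: day[:7])
by_product = roll_up(cube, lambda product, day, store: product)

print(by_region)   # {'north': 48.0, 'south': 41.0}
print(by_month)    # {'2004-03': 89.0}
print(by_product)  # {'beer': 64.0, 'diapers': 25.0}
```

The raw sales are scanned only once, in `build_cube`; every roll-up after that touches the much smaller set of cells.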
Unit 6: Data Warehousing17
Challenge 3: Querying the Warehouse
OLAP – “On Line Analytical Processing”
General term for techniques
Distinguished from OLTP: for transactions
Means tools to work with pre-analyzed data
Needed because analysis can’t be done in real time from a regular data base
But caching results is not practical
 All tools allow querying
Some include advanced statistical & modeling
Unit 6: Data Warehousing18
Data Mining
Different kinds of analysis
 Ad-hoc queries
Statistics, data mining, visualization
Delivery of regular reports and alerts
Take large volume of data
Use advanced techniques to reveal hidden knowledge in mass of data
Find hidden patterns, correlations
E.g. in reading: catalogs & recent movers
Cliché version: beer & diapers (5pm to 7pm)
 Automated at Amazon – product suggestions
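One very simple way to look for patterns of the beer-and-diapers kind is to count how often pairs of items appear in the same purchase. The sketch below does only that; the baskets are invented, `pair_support` is a hypothetical helper, and real data mining tools use far more sophisticated techniques than this co-occurrence count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(BASKETS)
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, share)
```

With this made-up data, beer and diapers share a basket in two of five purchases. On a real warehouse, the same count would run over millions of transactions and be followed by a check of whether the pattern is stronger than chance.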