Data Marts
A data mart is a smaller, more focused data warehouse that reflects the business rules of a specific business unit. It does not need to cleanse its data, because cleansing was done when the data entered the warehouse. It is a set of tables, designed for aggregation, that users access directly. It is typically not a source for traditional statistical analysis.
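As a rough illustration of the idea above, the sketch below derives a small departmental summary from data that is assumed to be already cleansed in the warehouse. The row layout, the department names, and the `build_mart` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical, already-cleansed warehouse rows: (department, month, amount).
warehouse_sales = [
    ("retail", "2024-01", 120.0),
    ("retail", "2024-01", 80.0),
    ("retail", "2024-02", 50.0),
    ("wholesale", "2024-01", 900.0),
]


def build_mart(rows, department):
    """Return monthly totals for one department, sorted by month."""
    totals = defaultdict(float)
    for dept, month, amount in rows:
        if dept == department:       # keep only this business unit's data
            totals[month] += amount  # aggregate; no cleansing happens here
    return sorted(totals.items())


retail_mart = build_mart(warehouse_sales, "retail")
print(retail_mart)
```

The mart holds only the aggregated rows one business unit needs, which is why it is smaller than the warehouse it is fed from.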
[Diagram labels: Data Delivery, Data Mart]
One-Time Costs
Hardware
- Disk
- CPU
- Network
- Terminal

Software
- DBMS
- Terminal analysis
- Network analysis
- Middleware
- Log utility
- Processing
- Metadata infrastructure
- Operational

- Ongoing refreshment
- Integration/transformation
- Data model maintenance
- Record identification maintenance
- Metadata infrastructure maintenance
- Archival of data
- Data aging within the DW

- Integration/transformation processing specification
- Metadata infrastructure population
- System of record definition
- Data dictionary language definition
- Network transfer definition
- CASE/Repository interface
- Initial data warehouse population
- Data model definition
- Database design definition
[Diagram labels: Source System B, Source System C, Source System D]
Data Marts
- Departmentalized
- Summarized, aggregated data
- Star join design
- Limited historical data
- Limited data volume
- Requirements driven data
- Focused on departmental needs
- Multi-dimensional DBMS technologies
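The star join design named in the list above can be sketched with one fact table joined to its dimension tables and then summarized. This is a minimal example using Python's built-in `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_month`), columns, and rows are all assumptions made up for illustration.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_month   (month_id   INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, month_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
INSERT INTO dim_month   VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 1, 15.0), (2, 1, 7.0), (1, 2, 20.0);
""")

# Star join: the central fact table joins to each dimension on its key,
# and the measure is aggregated by the dimension attributes.
star_join_rows = conn.execute("""
SELECT p.category, m.label, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_month   AS m ON m.month_id   = f.month_id
GROUP BY p.category, m.label
ORDER BY p.category, m.label
""").fetchall()

for row in star_join_rows:
    print(row)
```

Each dimension joins only to the fact table and never to another dimension, which gives the schema its star shape.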
A General Approach
Although all data mining endeavors are unique, they possess a common set of process steps:
1. Infrastructure preparation: choice of hardware platform, the database system, and one or more mining tools
2. Exploration: looking at summary data, sampling, and applying intuition
3. Analysis: each discovered pattern is analyzed for significance and trends
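The exploration step can be illustrated with a small sketch that computes summary figures and draws a random sample for closer inspection. It uses only the Python standard library; the `explore` helper and the `order_totals` values are hypothetical, and a real project would more likely rely on its chosen mining tool.

```python
import random
import statistics


def explore(values, sample_size, seed=0):
    """Summarize a numeric column and draw a reproducible random sample."""
    rng = random.Random(seed)  # fixed seed so the sample can be repeated
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "sample": rng.sample(values, sample_size),
    }


# Hypothetical order totals; one value is far larger than the rest.
order_totals = [12.0, 15.5, 9.0, 110.0, 14.0, 13.5, 11.0, 16.0]
summary = explore(order_totals, sample_size=3)
print(summary)
```

Here the mean sits well above the median, the kind of hint that would send an analyst back to the data to find the unusually large order before moving on to the analysis step.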
In general, a correlation coefficient is a number whose magnitude lies between 0 and 1 and shows the strength of a relationship. Some types of correlation are signed (±), ranging from -1 to +1, to also show the direction of the relationship. Even a weak correlation can be interesting, however, if it shows a trend over time.
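One common signed coefficient is Pearson's r. The slide does not name a specific coefficient, so choosing Pearson here is an assumption, and the `pearson_r` function below is only a minimal sketch of it.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and each sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly increasing
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing
print(rising, falling)
```

The sign gives the direction of the relationship, and the absolute value gives the strength on the 0 to 1 scale.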
Dr. John Snow used a map to show that the source of a cholera outbreak was a water pump, thus proving the disease was waterborne.
One of today's more useful types of visualization is the simulator, used both in games and in professional practice. This is the only way most of us will ever fly a Boeing 747.
It is now both cheaper and safer to train commercial pilots on simulators. With good software, pilots can be placed in situations they might otherwise not encounter until it is too late in a real cockpit.