Data warehousing concepts

Hanu

Agenda




OLTP Vs OLAP
Modeling Techniques
User Profile
Top down approach
Bottom up approach

Traditional OLTP systems
• OLTP systems are highly structured sets of
information that support the ongoing and day-to-day
operation of an organization

• These databases usually hold information about
small subsets of the organization split on the basis of
– Business functions, e.g. sales, purchase, travel
– Geographical locations, e.g. Northern region, Eastern region
– Logical units, e.g. REUD, BCMD, IHLD, EISA

OLTP (Contd…)
• Transactional databases require a highly
normalized database design to achieve
performance goals and to optimize storage
space
• These databases need to record, on a real-time
basis, every transaction that the organization
enters into

What is OLAP?
• An organization's success also depends on its ability to analyze data (through views and reports) and make intelligent decisions that potentially affect its future. Systems that facilitate such analyses are called On Line Analytical Processing (OLAP) systems.

Why not OLTP for OLAP?
• OLTP databases do not contain historical data
• OLTP databases contain small subsets of organizational data
• OLTP databases are heterogeneous in nature and geographically distributed systems

In other words
• OLTP systems are
– Fragmented
– Not integrated
– Disparate sources
– Disparate platforms
– Difficult to access
– Difficult to understand
– Redundant data
– Poor data quality

Data warehouse
• A Data Warehouse is a copy of the enterprise operational data, suitably modified to support the needs of analytical processes and stored outside the operational database.
• According to Bill Inmon, known as the father of Data Warehousing, a data warehouse is a subject oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.

OLAP Vs OLTP
• Data warehouse database
– Designed for analysis of business measures by categories and attributes
– Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table
– Loaded with consistent, valid data; requires no real time validation
– Supports few concurrent users relative to OLTP
• OLTP database
– Designed for real-time business operations
– Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table
– Optimized for validation of incoming data during transactions; uses validation data tables
– Supports thousands of concurrent users
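
The contrast above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `sales` table, its columns and its rows are hypothetical, and a real warehouse would keep the analytical data in a separate database.

```python
import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "North", "Phone", 120.0),
        (2, "North", "Modem", 80.0),
        (3, "East", "Phone", 200.0),
        (4, "East", "Phone", 50.0),
    ],
)


def oltp_lookup(sale_id):
    """OLTP-style access: retrieve a single row by its key."""
    return conn.execute(
        "SELECT region, product, amount FROM sales WHERE sale_id = ?",
        (sale_id,),
    ).fetchone()


def olap_summary():
    """OLAP-style access: read many rows and aggregate a measure by category."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
```

`oltp_lookup` touches one row per call, while `olap_summary` scans the whole table, which is the access pattern a warehouse is optimized for.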

Data warehouse architecture
[Figure: three-tier architecture. Operational DBs and semistructured sources are extracted, transformed, loaded and refreshed into the Data Warehouse and Data Marts on the Data Warehouse Server (Tier 1). These serve the OLAP Servers (Tier 2, e.g. ROLAP, MOLAP), which in turn serve the Clients (Tier 3): Analysis, Query/Reporting, Data Mining.]

D/W Architecture Goals
• Deliver a great user experience: user acceptance is the measure of success
• Function without interfering with OLTP systems
• Provide a central repository of consistent data
• Answer complex queries quickly
• Provide a variety of powerful analytical tools, such as OLAP and data mining

Characteristics of D/W
• Are based on a dimensional model
• Contain historical data
• Include both detailed and summarized data
• Consolidate disparate data from multiple sources while retaining consistency
• Focus on a single subject, such as sales, inventory, or finance

User Profile
• Statisticians (2%)
• Knowledge workers (15%)
• Information Consumers (83%)

Steps in implementing D/W
• Identify and gather requirements
• Design the dimensional model
• Develop the architecture, including the Operational Data Store (ODS)
• Design the relational database and OLAP cubes
• Develop the data maintenance applications
• Develop analysis applications
• Test and deploy the system

Identify and gather requirements
• Identify the Sponsor
• Meet the Business Users
• Meet Data experts
• Communicate with users often and thoroughly

Identify The Business Areas
• For Telecom D/W
– Customer Behavior
– Corporate Customer
– Customer Service
– Accounts
– Settlements
– Partner
– Supplier
– Competitor
– Marketing

Sources and Targets
• Sources
– Telephone call detail recording
– Customer Service such as ordering service and disconnecting lines
– Customer payment processing
• Targets
– Studies of minutes of call use by customer group
– Segmentation of customers by minutes of call use
– Product bundling analysis
– Customer Payment analysis

Design the dimensional model
• Identify the dimensions
• Should match with Business needs
• Identify the grain of the detail
• Decide on
– Star Schema
– Snow-flake Schema
– Star-flake Schema

Star Schema
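
A minimal sketch of a star schema, assuming a telecom-style model: one fact table (`fact_calls`) surrounded by denormalized dimension tables (`dim_customer`, `dim_date`). All table names, column names and rows are hypothetical and chosen only to mirror the "minutes of call use by customer group" target mentioned earlier.

```python
import sqlite3

star_conn = sqlite3.connect(":memory:")
star_conn.executescript("""
-- Dimension tables: one flat (denormalized) table per dimension.
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,
    customer_name  TEXT,
    customer_group TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    month    TEXT
);
-- Fact table: foreign keys to each dimension plus the measure.
CREATE TABLE fact_calls (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    minutes      REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'Corporate'), (2, 'Carol', 'Residential');
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO fact_calls VALUES (1, 1, 30.0), (1, 2, 45.0), (2, 1, 10.0);
""")


def star_minutes_by_group():
    """Total call minutes per customer group: one join from fact to dimension."""
    return star_conn.execute(
        "SELECT c.customer_group, SUM(f.minutes) "
        "FROM fact_calls f JOIN dim_customer c ON c.customer_key = f.customer_key "
        "GROUP BY c.customer_group ORDER BY c.customer_group"
    ).fetchall()
```

Each dimension is reached from the fact table with a single join, which is what gives the schema its star shape.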

Snowflake Schema
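
A minimal sketch of a snowflake schema, assuming the customer dimension is normalized so that the customer group lives in its own table. The names `dim_customer_group`, `dim_customer` and `fact_calls`, and all rows, are hypothetical.

```python
import sqlite3

snow_conn = sqlite3.connect(":memory:")
snow_conn.executescript("""
-- The group attribute is split out of the customer dimension (normalized).
CREATE TABLE dim_customer_group (
    group_key  INTEGER PRIMARY KEY,
    group_name TEXT
);
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    group_key     INTEGER REFERENCES dim_customer_group(group_key)
);
CREATE TABLE fact_calls (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    minutes      REAL
);
INSERT INTO dim_customer_group VALUES (1, 'Corporate'), (2, 'Residential');
INSERT INTO dim_customer VALUES (10, 'Acme', 1), (11, 'Beta', 1), (12, 'Carol', 2);
INSERT INTO fact_calls VALUES (10, 30.0), (11, 45.0), (12, 10.0);
""")


def snowflake_minutes_by_group():
    """Total call minutes per group: needs two joins to reach the group name."""
    return snow_conn.execute(
        "SELECT g.group_name, SUM(f.minutes) "
        "FROM fact_calls f "
        "JOIN dim_customer c ON c.customer_key = f.customer_key "
        "JOIN dim_customer_group g ON g.group_key = c.group_key "
        "GROUP BY g.group_name ORDER BY g.group_name"
    ).fetchall()
```

The normalized dimension stores each group name once, at the cost of an extra join in every query that needs it.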

Design consideration of Dimension Table
• Star or Snowflake
• Level of hierarchies
• Surrogate Key
• Date and Time
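
Three of these considerations (surrogate key, hierarchy levels, date and time) can be illustrated with a small date dimension builder. This is a sketch under assumptions: the function name `build_date_dimension` and the column names are hypothetical, and a real date dimension would usually carry many more attributes.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, days):
    """Return one dimension row per calendar day, starting at `start`."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            # Surrogate key: a plain sequence number with no business meaning.
            "date_key": offset + 1,
            "full_date": d.isoformat(),
            # Hierarchy levels: year > month > day.
            "year": d.year,
            "month": d.month,
            "day": d.day,
            "day_name": DAY_NAMES[d.weekday()],
            "is_weekend": d.weekday() >= 5,
        })
    return rows
```

For example, `build_date_dimension(date(2024, 1, 1), 7)` yields seven rows keyed 1 to 7, and fact rows would store `date_key` rather than the date itself.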

Slowly changing Dimension
• Type 1: Overwrite the dimension record.
• Type 2: Add a new dimension record.
• Type 3: Create new fields in the dimension record.
– Tracking bands can reduce the updates to some extent
– Nightmare if source and report not in sync
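
The three types can be sketched on an in-memory dimension held as a list of dictionaries. The column names (`customer_key`, `customer_id`, `valid_from`, `valid_to`, `is_current`, `previous_<attribute>`) are assumptions made for this illustration, not a fixed convention.

```python
from copy import deepcopy

# Hypothetical customer dimension with a single row.
customers = [{
    "customer_key": 1,        # surrogate key
    "customer_id": "C100",    # natural (source system) key
    "city": "Pune",
    "valid_from": "2023-01-01",
    "valid_to": None,
    "is_current": True,
}]


def scd_type1(dim, customer_id, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    out = deepcopy(dim)
    for row in out:
        if row["customer_id"] == customer_id:
            row[attr] = new_value
    return out


def scd_type2(dim, customer_id, attr, new_value, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    out = deepcopy(dim)
    next_key = max(row["customer_key"] for row in out) + 1
    for row in out:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
            new_row = dict(row, customer_key=next_key, is_current=True,
                           valid_from=change_date, valid_to=None)
            new_row[attr] = new_value
            out.append(new_row)
            break
    return out


def scd_type3(dim, customer_id, attr, new_value):
    """Type 3: keep the prior value in a new field on the same row."""
    out = deepcopy(dim)
    for row in out:
        if row["customer_id"] == customer_id:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return out
```

Type 1 keeps one row and no history, Type 2 keeps full history as extra rows, and Type 3 keeps only the immediately preceding value.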

Rapidly changing Dimensions
• Breaking off the offending dimension attributes
• Factless facts!
• Conformed Dimensions

Fact tables
• Multiple Fact tables
• Additive measures
• Non-additive/Semi additive measures
• Calculated Measures
• Granularity
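
The difference between additive and semi-additive measures can be shown with a small, made-up account fact table. Here `deposits` is treated as additive (it can be summed across any dimension) and `balance` as semi-additive (it can be summed across accounts but not across dates). The rows and function names are hypothetical.

```python
# Each row: (date, account, deposits, balance). All values are made up.
fact_rows = [
    ("2024-01-01", "A", 100.0, 100.0),
    ("2024-01-01", "B", 50.0, 50.0),
    ("2024-01-02", "A", 20.0, 120.0),
    ("2024-01-02", "B", 0.0, 30.0),
]


def total_deposits(rows):
    """Additive measure: summing over all dates and accounts is meaningful."""
    return sum(r[2] for r in rows)


def closing_balance(rows):
    """Semi-additive measure: sum across accounts, but only for the latest date."""
    last_date = max(r[0] for r in rows)
    return sum(r[3] for r in rows if r[0] == last_date)


def average_deposit(rows):
    """Calculated measure: derived from additive parts at query time."""
    return total_deposits(rows) / len(rows)
```

Summing `balance` over every row would give 300.0, which double counts money across days; `closing_balance` avoids that by restricting the sum to one point in time.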

ETL
• Extract, Transform and Load process may be described as the process of selecting, transforming, migrating, cleansing and converting mapped data from the legacy environment to data warehouse environment.

Extraction
• Push strategy
• Pull strategy

Transformation
• Transformation involves applying complex filters, conditional transforms, removing the inconsistency between data from different sources, complex calculations to create derived data etc. Cleansing of data could be an important part of the transformation process
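
A minimal sketch of such a transformation step, assuming extracted records arrive as dictionaries with the hypothetical fields `customer`, `region`, `quantity` and `unit_price`. The region codes, the size threshold of 100 and the function name `transform` are all illustrative assumptions.

```python
# Different sources spell the same region differently (assumed codes).
REGION_MAP = {"N": "North", "NORTH": "North", "E": "East", "EAST": "East"}


def transform(records):
    """Filter, cleanse and enrich extracted records before loading."""
    out = []
    for rec in records:
        # Filter: skip records that carry no sale.
        if rec["quantity"] <= 0:
            continue
        # Remove inconsistency between sources: map region codes to one value.
        region = REGION_MAP.get(rec["region"].strip().upper())
        if region is None:
            continue  # unknown code; a real process might route this to a reject file
        # Cleansing: trim whitespace and normalize capitalization.
        customer = rec["customer"].strip().title()
        # Derived data: amount is calculated, not present in the source.
        amount = rec["quantity"] * rec["unit_price"]
        # Conditional transform: classify the sale by size.
        size = "high" if amount >= 100 else "low"
        out.append({"customer": customer, "region": region,
                    "amount": amount, "size": size})
    return out
```

For instance, a record such as `{"customer": "  acme corp ", "region": "n", "quantity": 3, "unit_price": 50.0}` comes out with customer `Acme Corp`, region `North`, amount `150.0` and size `high`.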

Loading
• Loading involves the insertion of data into the target system, that is, the data warehouse. Loading is the last step before the users see the data. It involves populating the fact and dimension tables as well as aggregation tables that are part of the physical data model
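
A minimal sketch of a load step using Python's built-in sqlite3 module. It populates a dimension table, a fact table and an aggregation table from already transformed rows. The table names (`dim_region`, `fact_sales`, `agg_sales_by_region`) and the helper names are hypothetical, and rebuilding the aggregate on every load is a simplification.

```python
import sqlite3


def create_warehouse():
    """Create an in-memory target with dimension, fact and aggregate tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_region (
        region_key INTEGER PRIMARY KEY AUTOINCREMENT,
        region     TEXT UNIQUE
    );
    CREATE TABLE fact_sales (
        region_key INTEGER REFERENCES dim_region(region_key),
        amount     REAL
    );
    CREATE TABLE agg_sales_by_region (
        region_key   INTEGER PRIMARY KEY,
        total_amount REAL
    );
    """)
    return conn


def load(conn, staged_rows):
    """Load (region, amount) rows; the whole batch is one transaction."""
    with conn:
        for region, amount in staged_rows:
            # Dimension: add the member only if it is not there yet.
            conn.execute(
                "INSERT OR IGNORE INTO dim_region (region) VALUES (?)", (region,)
            )
            (region_key,) = conn.execute(
                "SELECT region_key FROM dim_region WHERE region = ?", (region,)
            ).fetchone()
            # Fact: store the surrogate key, not the region text.
            conn.execute(
                "INSERT INTO fact_sales (region_key, amount) VALUES (?, ?)",
                (region_key, amount),
            )
        # Aggregation table: rebuilt from the fact table after each load.
        conn.execute("DELETE FROM agg_sales_by_region")
        conn.execute(
            "INSERT INTO agg_sales_by_region "
            "SELECT region_key, SUM(amount) FROM fact_sales GROUP BY region_key"
        )
```

Wrapping the batch in one transaction means users never see a half-loaded fact table alongside a stale aggregate.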

Loading approach
• Transform and Load
• Load and Transform
• Transform while Loading

Issues in Loading
• Volume and frequency of loading
• Disk space
• Scheduling

Data Marts
• A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use

OLAP
• ROLAP
• MOLAP
• HOLAP

Few Popular tools
• ETL
– Data Junction
– Informatica PowerCenter
– IBM Data Warehouse Manager
– Microsoft DTS (Available with SQL Server 7.0 and above)
– Oracle Warehouse Builder
– Ab Initio

Few Popular tools
• OLAP
– Cognos
– Business Objects
– Power Analyzer
– Microsoft Analysis Services
– MicroStrategy
– DB2 OLAP Server
– Hyperion OLAP Server

Few Popular tools
• Data Mining
– Intelligent Miner
– DARWIN
– SAS

Thank You
Hanu