
Data Warehousing and Data Mining

IS 665

Why Business Intelligence?


Which are our lowest/highest margin customers? What is the most effective distribution channel?

Who are my customers and what products are they buying? Which customers are most likely to go to the competition?

What product promotions have the biggest impact on revenue?

What impact will new products/services have on revenue and margins?

http://www.cse.iitb.ernet.in/~sudarsha
Some material SAP

Transactional Processing
IT allows organizations to get a handle on the integration and modernization of daily transactions that run the operations of the business. Very basic transactional data is captured as a by-product of doing business. But once the implementation of transaction processing is completed, there arises a demand for information from the basic data that has been captured as transactions have been executed.
There are several reasons why basic transaction data is not enough to run the business:
Not very enlightening: Basic transaction data needs to be summarized, analyzed, aggregated, and so forth in order for management to be able to see beyond the detail.
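As a minimal sketch of what "summarized, analyzed, aggregated" can mean in practice, the Python below rolls detailed transaction rows up into a margin per customer, which addresses the lowest/highest margin question from the opening slide. The records, field layout, and the function name margin_by_customer are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer, product, revenue, cost).
transactions = [
    ("Acme", "widget", 120.0, 90.0),
    ("Acme", "gadget", 80.0, 50.0),
    ("Birch", "widget", 200.0, 190.0),
    ("Birch", "widget", 100.0, 95.0),
]


def margin_by_customer(rows):
    """Aggregate revenue and cost per customer and return margin ratios."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for customer, _product, rev, cst in rows:
        revenue[customer] += rev
        cost[customer] += cst
    # Margin ratio = (revenue - cost) / revenue for each customer.
    return {c: (revenue[c] - cost[c]) / revenue[c] for c in revenue}


margins = margin_by_customer(transactions)
lowest = min(margins, key=margins.get)
highest = max(margins, key=margins.get)
print("lowest margin:", lowest, "highest margin:", highest)
```

No single transaction row reveals which customer is least profitable; the answer appears only after aggregation.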

Information growth
The world produces 1 - 2 exabytes of unique information per year

So what is an exabyte? 10^18 bytes:

1 000 000 000 000 000 000 bytes, or 1 000 000 000 gigabytes (where a gigabyte is a billion bytes), or 50,000 times the volume of information in the Library of Congress

Greg James

Information growth
The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth.

http://www.sims.berkeley.edu/research/projects/how-much-info/
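As a quick arithmetic check of the figures on these slides, the sketch below converts 1.5 billion gigabytes to exabytes and derives the population implied by 250 megabytes per person. It assumes decimal units (a gigabyte is 10^9 bytes), as the exabyte slide does; the variable names are illustrative.

```python
# Decimal storage units, as used on the slides.
MEGABYTE = 10**6
GIGABYTE = 10**9
EXABYTE = 10**18

yearly_bytes = 1_500_000_000 * GIGABYTE  # 1.5 billion gigabytes per year
per_person_bytes = 250 * MEGABYTE        # 250 megabytes per person

# Express the yearly total in exabytes and derive the implied head count.
yearly_exabytes = yearly_bytes / EXABYTE
implied_population = yearly_bytes // per_person_bytes

print(yearly_exabytes, implied_population)
```

The total comes to 1.5 exabytes, inside the 1-2 exabyte range quoted earlier, and the implied population is 6 billion people.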

Information growth

Figure: hard drive cost per gigabyte.

Source: 1999 Winchester Disk Drive Market Forecast and Review, International Data Corporation report (some years are forecasts).


The Value Chain


From Data to Decisions


Historically, BI systems have evolved through three main phases.
Phase 1: Enterprises started to put in place structured data stores holding data relevant to their needs. A store was typically a snapshot of data from a single operational source; sometimes analysis ran directly against the operational source. In the 1980s, decision support tasks were performed centrally, with highly skilled individuals analyzing mainframe-resident data. The results were delivered to management as hard-copy reports and graphs.

From Data to Decisions Contd


Phase 2.
Competitive factors pushed enterprises toward better leverage of their data, and they adopted augmented decision support systems (DSSs). Using data analysis techniques such as statistics, they created focused stores of data with subject matter drawn from two or three operational sources. Along with the storage, tools that take advantage of its structure were set up so that enterprises could query these data stores.


From Data to Decisions Contd


Phase 3.
Finally, to cope with the multiplicity and diversity of their data stores, enterprises are starting to unify and rationalize them through data warehouse frameworks. In addition, under increased competitive pressure, they are implementing advanced data analysis techniques to further leverage the resulting, ever-growing amount of data and gain business advantages.


Business Intelligence
BI is the user-centered process of exploring data, data relationships, and trends, thereby helping to improve overall decision making.

This involves an iterative process of accessing data (ideally stored in the data warehouse) and analyzing it, thereby deriving insights, drawing conclusions, and communicating findings to effect positive change within the enterprise.
BI is an application of a data warehouse but does not require one. BI comprises four major product segments: interactive query tools, reporting tools, advanced decision support systems (DSSs), and executive information systems (EISs).

Meta Group - View


BI User Categories
Authors and analysts
Need advanced analysis functionality and ad-hoc data exploration capabilities
Require useful, manageable tools

Executives and knowledge workers
Require personalized information in context via an intuitive user interface
Want predefined analysis paths and the option of in-depth analysis of summary data

Information consumers
Need a snapshot of a particular data set to perform their operational tasks
Do not interact extensively with the data

BI User Categories
Figure: BI user categories, ordered from high to low on two axes: analytical functionality and flexibility, and required training investment and cost.
Authoring and ad-hoc query: 10 %
OLAP analysis and power reporting: 30 %
Information consumption, portal-based deployment, and executive reporting: 60 %


BI The Solution at a Glance


Support the decision-making requirements of the entire enterprise, regardless of data sources or access methods.
Convert data into information, and ensure information is delivered at the right time to the right person in the right format to support business decision making.
An end-to-end BI solution incorporates: data acquisition, data warehousing, online analytical processing (OLAP), managed reporting and analysis, query design, and data mining.


BI Architecture


Data Warehouse
A data warehouse is a process and architecture that requires robust planning to implement. It consists of the selection, conversion, transformation, consolidation, integration, cleansing, and mapping of data (both recent and historical) from multiple operational data sources (e.g., IBM's IMS) to a target DBMS (e.g., IBM's DB2) that supports an enterprise's decision-making processes and BI systems.
Data warehousing key components:
Extraction, transformation, and loading
Data warehouse management
Business modeling
Metadata repository
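The extraction, transformation, and loading component can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two in-memory lists stand in for operational sources, an in-memory SQLite database stands in for the target DBMS, and the table name sales_fact and the functions transform and load are hypothetical.

```python
import sqlite3

# Extract: hypothetical rows from two operational sources with
# inconsistent customer spellings and amounts stored as text.
source_orders = [("  acme ", "2003-01-15", "120.50"), ("Birch", "2003-01-16", "80")]
source_returns = [("ACME", "2003-01-20", "-20.50")]


def transform(row):
    """Cleanse and convert one source row into the warehouse format."""
    customer, day, amount = row
    return (customer.strip().title(), day, float(amount))


def load(conn, rows):
    """Load transformed rows into the target fact table."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


# Target: an in-memory SQLite database standing in for the warehouse DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer TEXT, day TEXT, amount REAL)")
load(conn, [transform(r) for r in source_orders + source_returns])

# Consolidated view: one total per cleansed customer name.
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM sales_fact GROUP BY customer"))
print(totals)
```

Because the cleansing step maps "  acme " and "ACME" to the same name, the order and the return land on one customer in the target table.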


Corporate Governance:
Establishing a Single Point of Truth
Figure: areas of central governance. Labels: tactical/strategic analytics; integrated operational analytics; roll up; ad hoc; DW - Enterprise Layer: single point of truth.


OLTP - Online Transaction Processing


Online: work done while directly connected to the computer or network, as opposed to work that is batched for later processing.
Transaction processing: concerned with gathering the information needed for daily data processing, that is, gathering input information, processing it, and updating existing information.
Example: an operational database.
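A minimal sketch of one online transaction, assuming a hypothetical stock/orders schema in an in-memory SQLite database: the order is recorded and the stock level updated as a single unit, so a rejected order leaves no partial update behind. The function name place_order is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('widget', 10)")
conn.commit()


def place_order(conn, item, qty):
    """Record an order and decrement stock as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE item = ? AND qty >= ?",
                (qty, item, qty))
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False


ok = place_order(conn, "widget", 3)         # accepted
rejected = place_order(conn, "widget", 50)  # rolled back, nothing recorded
remaining = conn.execute(
    "SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, rejected, remaining, order_count)
```

Each call touches very little data and either fully succeeds or fully fails, which is the usual shape of OLTP work.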


Sockel

OLAP - Online Analytical Processing

OLAP (online analytical processing) is the manipulation of information to support decision making.
OLAP answers questions such as:
How many senior-level marketing majors have not taken statistics?

OLAP supports data warehouses.
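The example question above can be posed as an analytical query. The sketch below assumes a hypothetical two-table schema (students and enrollments) with made-up rows, held in an in-memory SQLite database; a real OLAP tool would typically run against warehouse data instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows, for illustration only.
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, major TEXT, level TEXT);
CREATE TABLE enrollments (student_id INTEGER, course TEXT);
INSERT INTO students VALUES
    (1, 'Marketing', 'Senior'), (2, 'Marketing', 'Senior'),
    (3, 'Marketing', 'Junior'), (4, 'Finance', 'Senior');
INSERT INTO enrollments VALUES
    (1, 'Statistics'), (2, 'Accounting'), (4, 'Statistics');
""")

# Count senior marketing majors with no Statistics enrollment.
query = """
SELECT COUNT(*) FROM students s
WHERE s.major = 'Marketing' AND s.level = 'Senior'
  AND NOT EXISTS (
      SELECT 1 FROM enrollments e
      WHERE e.student_id = s.id AND e.course = 'Statistics')
"""
count = conn.execute(query).fetchone()[0]
print(count)
```

Unlike a transaction, the query reads across many rows and returns a summary figure rather than changing any data.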


OLAP Business Scenarios


Creation of complex formulas, down to cell level
Usage of conditions and exceptions in reports
Multi-currency handling
Flexible hierarchy analysis
Non-cumulative key figures to analyze inventory management data
Elimination of internal business volume
Market share analysis
Slow-moving items
...
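One of the scenarios listed above, hierarchy analysis, can be illustrated with a small roll-up. This is a sketch only: the city-to-region hierarchy, the sales figures, and the function name roll_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: each city belongs to one region.
region_of = {"Boston": "East", "New York": "East", "Denver": "West"}
sales_by_city = {"Boston": 100, "New York": 250, "Denver": 75}


def roll_up(values, parent_of):
    """Sum child-level values into their parent level of the hierarchy."""
    totals = defaultdict(int)
    for child, value in values.items():
        totals[parent_of[child]] += value
    return dict(totals)


by_region = roll_up(sales_by_city, region_of)
grand_total = sum(by_region.values())
print(by_region, grand_total)
```

Swapping in a different parent_of mapping (for example, city to sales territory) regroups the same detail data without touching it, which is the sense in which the hierarchy is flexible here.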

Transactional vs. Analytical DB


Characteristics                     OLTP            OLAP
Primary Operation                   Update          Analyze
Level of Analysis                   Low             High
Amount of Data per Transaction      Very small      Very large
Type of Data                        Detailed        Summary
Relevance of Data                   Current         Current and historical
Data Updates                        Often           Less frequent, only new data
Database Concept                    Complex         Simple
Number of Transactions/Users        Many            Few
Time Frame                          Point in time   Time period
Database Data                       Normalized      Denormalized
Number of Tables per Transaction    Several         Few
Type of Processing                  Well-defined    Ad hoc

Data Mining
Data mining is the process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories. The key data mining concepts are multi-step processes, discovery and techniques.

Whether or not this discovery or exploration is performed by human analysts, software agents or machine learning techniques, it is important that the results provide enterprises with insights not available through traditional techniques or predefined relationships (e.g., relational tables).
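As a toy illustration of discovering correlations, the sketch below scans every pair of columns in a small data set and reports the strongly correlated ones. The data, the 0.8 threshold, and the function names pearson and strong_pairs are assumptions made for this example; pairwise Pearson correlation is only one simple technique among the many used in data mining.

```python
from itertools import combinations
from math import sqrt

# Hypothetical weekly figures, for illustration only.
data = {
    "promo_spend": [1, 2, 3, 4, 5],
    "revenue": [12, 14, 16, 18, 20],
    "returns": [5, 3, 6, 2, 4],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strong_pairs(columns, threshold=0.8):
    """Return column pairs whose absolute correlation meets the threshold."""
    found = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


pairs = strong_pairs(data)
print(pairs)
```

Nobody specifies in advance which columns to compare; the scan surfaces the promo_spend/revenue relationship on its own, which is the "discovery" aspect in miniature.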

Data Warehousing is about Questions, not Answers!


Ken Orr

Data Warehousing is about What If, not What Is!


Merv Adrian, Giga Group
Source: Sockel


SAP BI Architecture


Business Intelligence Evolution


History: IT Service Reports. Hand coded; single system data; summary metrics; extreme latency.
Legacy: Decision Support Systems. Report writers; joined operating data; statistical metrics; extreme cost.
Current: Business Intelligence. OLAP; DW; predictive metrics; extreme infoglut.
Future: Business Performance Management (BPM). Dashboard/mining; enterprise portals; recommendations; extreme integration.

Moving beyond one-way info delivery to true BPM


Source: META Group Inc. and Sockel


Conclusion

A fundamental prerequisite of success of an enterprise BW strategy is the support of corporate management (sponsorship):

If there is no organizational momentum toward a common goal, then the best architecture, the best framework in the world is bound to fail.
W.H. Inmon
