
Informatica Data Quality Products

Chris Phillips

Agenda
1. Product Platform
2. Data Quality Positioning
3. Informatica Data Quality
4. Data Quality Assistant
5. Informatica Data Explorer


The Informatica Product Platform


Automating Entire Data Integration Lifecycle
Audit, Monitor, Report
Ensure data consistency, perform impact analysis and continuously monitor quality
Data Explorer, Data Quality

Access
Any system in batch or real-time

Discover
Search and profile any data from any source

Cleanse
Validate, correct and standardize all data types

Integrate
Transform and reconcile all data types

Deliver
Provide right data, at the right time, in the right format

PowerExchange

PowerCenter

Develop & Manage


Develop and collaborate with common repository and shared metadata

Data Quality Functions


Data Profiling and Discovery
Column profiling: frequency and pattern analysis, number of unique values, number of records, etc.
Dependency profiling
Redundancy profiling
Orphan analysis

Data Quality Profiling
Data Enrichment and Cleansing
Data Standardisation
Data Matching and Deduplication / Consolidation
Data Quality Monitoring
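Column profiling, as listed above (frequency and pattern analysis, unique values, record counts), can be illustrated with a minimal Python sketch. This is not Informatica's implementation: the function names and the pattern convention (letters become "A", digits become "9") are assumptions chosen for illustration.

```python
import re
from collections import Counter

def value_pattern(value: str) -> str:
    """Pattern analysis: letters -> 'A', digits -> '9', other characters kept."""
    pattern = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", pattern)

def profile_column(values):
    """Basic column profile for a list of string values (None or blank = null)."""
    non_null = [v for v in values if v is not None and v.strip() != ""]
    return {
        "records": len(values),
        "nulls": len(values) - len(non_null),
        "uniques": len(set(non_null)),
        "frequencies": Counter(non_null).most_common(),
        "patterns": Counter(value_pattern(v) for v in non_null).most_common(),
    }

if __name__ == "__main__":
    # Hypothetical ZIP/postcode column with a typo ("O" instead of "0") and a UK code.
    zip_codes = ["10001", "10001", "9021O", "SW1A 1AA", None, "94105"]
    for key, val in profile_column(zip_codes).items():
        print(f"{key}: {val}")
```

The pattern counts are what make the "9021O" typo stand out: it is the only value with the pattern "9999A".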

2007 Magic Quadrant

Key Strengths and Customer Benefits

Business-Focused Data Quality

"Business leadership needs to take responsibility for identification of data quality issues, establishing minimum acceptable levels for data quality and facilitating data quality improvement initiatives" (Gartner, 2006)

All Master Data Types: the same infrastructure can be deployed to support customer and product data

Data Quality Metrics and Reports: easy access to metrics to identify, categorize and quantify low-quality data

Enterprise Data Quality Deployment: scalable infrastructure for high-performance installations (reusable for DI and DQ)

Business Demand Drivers


Why Enterprise Data Quality?

Business Imperatives

Improve Sales and Customer Service

Business Intelligence & Regulatory Compliance

Global Operational Efficiency

Mergers & Acquisitions

IT Initiative

CRM, SVoC, CDI

Regulatory Reporting

MDM Product Hubs

Systems Consolidation

Data Quality Project

Name & Address Cleansing

DQ Reporting & Monitoring

Data Standardization

Data Matching & Consolidation

Data Quality Services

Profile

Cleanse

Enrich

Match

Scorecard

Business Case and Benefit Statement - Improved Sales and Customer Service
Business Imperatives
Improve Sales and Customer Service

Global Operational Efficiency

Regulatory Compliance

Mergers & Acquisitions

Benefit Statement

Increase Revenues

Reduce Costs


Data Quality: Where Does It Fit?


(Diagram: where data quality fits in the data integration architecture)
Data Sources: CRM, Finance, Production, External Systems, etc.
Data Exploration: Analyse & Align
Data Quality Firewall
Data Integration: Extraction, Transformation, Loading Tool, Messaging Bus, FTP
Data Quality: Source Reconciliation, Fuzzy Matching, Scorecarding, Cleansing, Enrichment
Data Storage: Operational Data Store, Data Warehouse, Data Mart, DB
Intelligence: DQ Server, DQ Reporting, Business Application Reporting (Basel II, FIAS, SOX), Single View of Customer/Product, Front End Applications X and Y



Informatica Data Quality


(Architecture diagram)
Provider: XML, Messaging, and Web Services; Packaged Applications; Relational and Flat Files; Mainframe and Midrange
Workbench
Data Quality Engine: Cleanse, Match, Enrich, Scorecard/Monitor
Repository (Global and Local): Rules, Reference Data, Scorecards
Consumer: Portals, Dashboards, and Reports; XML, Messaging, and Web Services; Packaged Applications; Relational and Flat Files; Mainframe and Midrange
Connectors, High Availability, Web Services, Grid


Design & Build Data Quality Rules


The data quality profiling process is to:
Build analysis rules
Identify, categorize and quantify data quality issues
Build or access reference dictionaries for consistency
Present results initially via a desktop drill-down reporting tool
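The steps above (build an analysis rule, use a reference dictionary, then identify, categorize and quantify) can be sketched in a few lines of Python. This is an illustrative assumption, not IDQ rule syntax: the country dictionary, the category names and the functions are all hypothetical.

```python
from collections import Counter

# Hypothetical reference dictionary: known variants mapped to one standard value.
_COUNTRY_VARIANTS = {
    "United States": ["US", "USA", "U.S.A.", "United States"],
    "United Kingdom": ["UK", "GB", "Great Britain", "United Kingdom"],
}
# Upper-case lookup table: variant -> standard value.
REFERENCE = {v.upper(): std for std, variants in _COUNTRY_VARIANTS.items() for v in variants}

def categorize(value):
    """Analysis rule: classify a raw value as missing, valid, standardizable or unknown."""
    if value is None or not value.strip():
        return "missing"
    standard = REFERENCE.get(value.strip().upper())
    if standard is None:
        return "unknown"  # not in the dictionary: needs review
    return "valid" if value.strip() == standard else "standardizable"

def quantify(values):
    """Count how many values fall into each category."""
    return Counter(categorize(v) for v in values)

def standardize(value):
    """Return the dictionary's standard form, or the value unchanged if unknown."""
    if value is None:
        return None
    return REFERENCE.get(value.strip().upper(), value)

if __name__ == "__main__":
    sample = ["United States", "usa", "UK", "", None, "Atlantis"]
    print(quantify(sample))
    print([standardize(v) for v in sample])
```

Keeping the dictionary separate from the rule means the same rule can be reused as the reference data grows, which is the consistency point the slide makes.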


Data Quality Management Process


Cycle phases: Analyze, Enhance
1. Identify & Measure Data Quality
2. Define Data Quality Rules & Targets
3. Design Quality Improvement Processes
4. Implement Quality Improvement Processes
5. Monitor Data Quality Versus Targets
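Steps 2 and 5 of this cycle (define rules and targets, then monitor against them) come down to comparing each rule's pass rate with a threshold. A minimal sketch, assuming pass/total counts are already available per rule; the `RuleResult` type, rule names and target percentages are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RuleResult:
    """Outcome of one data quality rule run, compared with its target."""
    rule: str
    dimension: str
    passed: int
    total: int
    target_pct: float  # minimum acceptable pass rate in percent (assumed threshold)

    @property
    def score_pct(self) -> float:
        # Treat an empty input as fully passing; a design choice, not a standard.
        return 100.0 * self.passed / self.total if self.total else 100.0

    @property
    def status(self) -> str:
        return "OK" if self.score_pct >= self.target_pct else "BELOW TARGET"

def print_scorecard(results):
    """Print one scorecard line per rule."""
    for r in results:
        print(f"{r.rule:<20}{r.dimension:<14}{r.score_pct:6.1f}% "
              f"(target {r.target_pct:.1f}%)  {r.status}")

if __name__ == "__main__":
    print_scorecard([
        RuleResult("Email present", "Completeness", 950, 1000, 98.0),
        RuleResult("Postcode format", "Conformity", 990, 1000, 95.0),
    ])
```

Rerunning the same rules on a schedule and keeping each run's scores gives the trend that step 5 monitors.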


Six Types of Data Quality Dimensions


Completeness: What data is missing or unusable?
Conformity: What data is stored in a non-standard format?
Consistency: What data values give conflicting information?
Accuracy: What data is incorrect or out of date?
Duplicates: What data records or attributes are repeated?
Integrity: What data is missing or not referenced?
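Each of the six dimensions can be turned into a concrete check. The sketch below runs one simple, hypothetical check per dimension over a few made-up customer records; the sample data, reference tables and the state/ZIP rule are assumptions (the ZIP rule is deliberately simplified), and duplicates are detected by exact key only, not fuzzy matching.

```python
import re

# Hypothetical sample data and reference tables (not from the slides).
COUNTRIES = {1: "US"}                                # parent table for integrity checks
VALID_STATES = {"CA", "NY", "TX"}                    # reference list for accuracy (truncated)
ZIP_FIRST_DIGIT = {"CA": "9", "NY": "1", "TX": "7"}  # simplified state/ZIP consistency rule

CUSTOMERS = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "state": "CA", "zip": "94105", "country_id": 1},
    {"id": 2, "name": "Bob Ray", "email": "", "state": "NY", "zip": "1000", "country_id": 1},
    {"id": 3, "name": "Ann Lee", "email": "ann@example.com", "state": "CA", "zip": "94105", "country_id": 1},
    {"id": 4, "name": "Cy Diaz", "email": "cy@example.com", "state": "ZZ", "zip": "73301", "country_id": 9},
    {"id": 5, "name": "Dee Ng", "email": "dee@example.com", "state": "TX", "zip": "94105", "country_id": 1},
]

def check_dimensions(records):
    """Return the ids of records failing each dimension check."""
    issues = {d: [] for d in ("completeness", "conformity", "consistency",
                              "accuracy", "duplicates", "integrity")}
    seen_keys = set()
    for r in records:
        zip_code = r.get("zip") or ""
        if not r.get("email"):                          # completeness: missing value
            issues["completeness"].append(r["id"])
        if not re.fullmatch(r"\d{5}", zip_code):        # conformity: non-standard format
            issues["conformity"].append(r["id"])
        expected = ZIP_FIRST_DIGIT.get(r["state"])
        if expected and zip_code and not zip_code.startswith(expected):
            issues["consistency"].append(r["id"])       # consistency: values conflict
        if r["state"] not in VALID_STATES:              # accuracy: not in reference data
            issues["accuracy"].append(r["id"])
        key = (r["name"].lower(), (r.get("email") or "").lower())
        if key in seen_keys:                            # duplicates: exact repeated key
            issues["duplicates"].append(r["id"])
        seen_keys.add(key)
        if r["country_id"] not in COUNTRIES:            # integrity: orphaned foreign key
            issues["integrity"].append(r["id"])
    return issues

if __name__ == "__main__":
    for dimension, ids in check_dimensions(CUSTOMERS).items():
        print(f"{dimension:<13} {ids}")
```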


Data Quality Dimensions


(Diagram: profiling questions grouped into Data Exploration and Data Quality)

Data Exploration
Column Profiling: What are the data's physical characteristics?
Relationship: Across multiple tables, what relationships exist in the data set?
Redundancy: Across multiple tables, what data is redundant? (orphan analysis)

Data Quality
Completeness: What data is missing or unusable?
Conformity: What data is stored in a non-standard format?
Consistency: What data gives conflicting information?
Accuracy: What data is incorrect or out of date?
Duplication: What data records are duplicated?
Integrity: What data is missing important relationship linkages?
Range: What scores, values or calculations are outside of range?


US Customer Master Data - Examples


Completeness: missing key values
Conformity: incorrect format
Consistency: data is in the correct format and complete, but breaks a business rule
Duplication: fuzzy matching
Integrity: relationship identification
Accuracy: using reference data to validate
Range: identify outliers
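For the Range example ("identify outliers"), one common approach, swapped in here for illustration since the slides do not say how Informatica flags outliers, combines an explicit business range with Tukey's interquartile-range fences. A minimal sketch; the age data and the 0 to 120 bounds are hypothetical.

```python
import statistics

def range_violations(values, low, high):
    """Values outside an explicit business range [low, high]."""
    return [v for v in values if not low <= v <= high]

def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least two values
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

if __name__ == "__main__":
    ages = [34, 29, 41, 38, 45, 31, 36, 230, 27]  # hypothetical customer ages
    print(range_violations(ages, 0, 120))
    print(iqr_outliers(ages))
```

The hard range catches values that are impossible by definition; the statistical fences also catch values that are merely unusual for this particular column.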


Parsing and Cleansing of Person Name

(Screenshots: person names before, and after being parsed and cleansed)
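Name parsing splits one free-text field into title, first, middle, last and suffix, then standardizes case. The sketch below is a simplified, hypothetical illustration (small title/suffix dictionaries, "Last, First" reordering); it is not IDQ's parser and does not handle cases such as "McDonald" or suffixes after a comma-reordered name.

```python
import re

TITLES = {"MR": "Mr", "MRS": "Mrs", "MS": "Ms", "DR": "Dr"}        # hypothetical dictionary
SUFFIXES = {"JR": "Jr", "SR": "Sr", "II": "II", "III": "III"}     # hypothetical dictionary

def _proper(token):
    """Capitalize each alphabetic run: "o'neil" -> "O'Neil", "mary-ann" -> "Mary-Ann"."""
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), token)

def parse_person_name(raw):
    """Parse a raw name string into title, first, middle, last and suffix."""
    text = " ".join(raw.replace(".", " ").split())
    if "," in text:  # "Last, First Middle" -> "First Middle Last"
        last, rest = [p.strip() for p in text.split(",", 1)]
        text = f"{rest} {last}"
    tokens = text.split()
    result = {"title": "", "first": "", "middle": "", "last": "", "suffix": ""}
    if tokens and tokens[0].upper() in TITLES:
        result["title"] = TITLES[tokens.pop(0).upper()]
    if tokens and tokens[-1].upper() in SUFFIXES:
        result["suffix"] = SUFFIXES[tokens.pop().upper()]
    if tokens:
        result["first"] = _proper(tokens[0])
    if len(tokens) > 1:
        result["last"] = _proper(tokens[-1])
        result["middle"] = " ".join(_proper(t) for t in tokens[1:-1])
    return result

if __name__ == "__main__":
    for raw in ["MR. JOHN A SMITH JR", "smith, jane", "dr mary-ann o'neil"]:
        print(raw, "->", parse_person_name(raw))
```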



Match and Consolidate


Identifies records that refer to the same location or individual
Consolidates multiple instances into a master record
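Matching and consolidation can be sketched as three steps: a fuzzy similarity score, grouping of records that score above a threshold, and a survivorship rule that builds the master record. This sketch swaps in Python's `difflib.SequenceMatcher` ratio as the similarity measure, a greedy grouping strategy, and a "keep the longest non-empty value" survivorship rule; all three, plus the 0.85 threshold, are assumptions rather than Informatica's matching algorithms.

```python
from difflib import SequenceMatcher

def normalize(s):
    """Lower-case, drop periods, turn commas into spaces, collapse whitespace."""
    return " ".join(s.lower().replace(".", "").replace(",", " ").split())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_match(r1, r2, threshold=0.85):
    return similarity(f"{r1['name']} {r1['address']}",
                      f"{r2['name']} {r2['address']}") >= threshold

def cluster(records, threshold=0.85):
    """Greedy grouping: each record joins the first group whose first member it matches."""
    groups = []
    for rec in records:
        for group in groups:
            if is_match(group[0], rec, threshold):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups

def consolidate(group):
    """Assumed survivorship rule: for each attribute keep the longest non-empty value."""
    fields = dict.fromkeys(k for r in group for k in r)
    return {f: max((r.get(f) or "" for r in group), key=len) for f in fields}

if __name__ == "__main__":
    records = [
        {"name": "John Smith", "address": "12 High St.", "phone": ""},
        {"name": "Jon Smith", "address": "12 High Street", "phone": "555-0100"},
        {"name": "Mary Jones", "address": "4 Elm Road", "phone": "555-0199"},
    ]
    for group in cluster(records):
        print(consolidate(group))
```

Greedy grouping compares each record only with a group's first member, which keeps the sketch short; production matching usually adds blocking keys and compares fields separately.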


Identify Key Fields in Materials Master (MARA)


Completeness and Conformity checks:
Base Unit of Measure: Completeness
Material Type: Completeness
Gross Weight: Completeness and Conformity
Net Weight: Completeness
EAN or GTIN: Completeness and Conformity
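The key-field checks above can be sketched as a completeness test per field plus two conformity tests: a GS1 check-digit validation for the EAN/GTIN, and a "positive number" test for gross weight. The MARA field names used (MEINS, MTART, BRGEW, NTGEW, EAN11) are an assumption to verify against your SAP system, and treating 0 as "missing" is a hypothetical rule.

```python
# Assumed SAP MARA field names for the key fields on the slide:
# MEINS = base unit of measure, MTART = material type, BRGEW = gross weight,
# NTGEW = net weight, EAN11 = EAN/GTIN.
REQUIRED_FIELDS = ["MEINS", "MTART", "BRGEW", "NTGEW", "EAN11"]

def gtin_is_valid(code: str) -> bool:
    """GS1 check-digit test for GTIN-8, -12, -13 (EAN-13) and -14."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    *body, check = (int(c) for c in code)
    # Weights alternate 3, 1, 3, ... starting from the digit left of the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

def check_material(record: dict) -> list:
    """Completeness and conformity issues for one material master record."""
    # Hypothetical rule: None, empty string and 0 all count as "not maintained".
    issues = [f"{f} missing" for f in REQUIRED_FIELDS if record.get(f) in (None, "", 0)]
    ean = record.get("EAN11")
    if ean and not gtin_is_valid(str(ean)):
        issues.append("EAN11 fails GTIN check digit")
    gross = record.get("BRGEW")
    if gross not in (None, "", 0) and not (isinstance(gross, (int, float)) and gross > 0):
        issues.append("BRGEW not a positive number")
    return issues

if __name__ == "__main__":
    print(check_material({"MEINS": "EA", "MTART": "FERT", "BRGEW": 1.2,
                          "NTGEW": 1.0, "EAN11": "4006381333931"}))
    print(check_material({"MEINS": "", "MTART": "FERT", "BRGEW": "1,2",
                          "NTGEW": 1.0, "EAN11": "4006381333932"}))
```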


Data Quality Scorecard


Data Quality Process Summary


Business Data Analyst: Business Rule Definition

Data Integration Developer: Deployment


Deployment Scenario 1: Data Quality Projects

Business application data team: data quality project environment
Compliance, audit, monitoring and scorecarding projects
Data quality standards implementation
Data cleansing projects
Data quality as a critical task within a data migration project

Target application databases

Business and IT


Deployment Scenario 2: Data Quality with PowerCenter for DI

(Diagram)
Data Sources
Data Integration plus Data Quality (PowerCenter): a Data Quality custom transformation to Analyse, Cleanse, and Match / Consolidate
Data Storage: Business Application Database, Enterprise Data Warehouse
Data Quality Reporting: Report / Monitor
Point-of-entry data quality delivered via PowerCenter Web Services


Benefits of Integrated Data Quality & Data Integration Platform


Universal Data Access
Enterprise-wide Data Quality Deployments
Common Repository & Shared Metadata: re-use & standardisation, higher productivity, common reporting
Common Management Tools & Services: faster enablement, lower support overhead
Optimum Performance & Scalability: no data quality bottlenecks
Single vendor, strategic relationship


Data Consolidation and Interactive Cleansing


Benefits
Supports a key part of the data quality process, i.e. the distributed manual process of review, edit and correct
Supports the business user who understands the changes to be made
Provides a portal for business users to collaborate with IT to address data quality issues


Data Quality Assistant Features


Exceptions Management
Edit exception records directly
Accept: send records directly to the match process (see process flow)
Reject: eliminate records from future match processing
Reprocess: send records through the automated data quality rules for standardization, validation and matching

Interactive Consolidation
Select the final record
Select attributes from other records to populate the final record

Audit Trail
The status of each record is stored here:
Updated: for changed/final master records
Merged: for records associated with the final record
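The exception workflow above (edit, then Accept, Reject or Reprocess, with every change written to an audit trail using the Updated and Merged statuses) can be modelled as a small state-recording object. This is a hypothetical sketch of the concepts, not the Data Quality Assistant's data model: the classes, queue routing and audit tuple layout are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class Action(Enum):
    ACCEPT = "accept"        # send straight to the match process
    REJECT = "reject"        # exclude from future match processing
    REPROCESS = "reprocess"  # rerun automated standardization/validation/matching rules

@dataclass
class ExceptionRecord:
    record_id: str
    data: dict
    audit: list = field(default_factory=list)  # entries: (timestamp, user, status, detail)

    def log(self, user, status, detail=""):
        self.audit.append((datetime.now(timezone.utc).isoformat(), user, status, detail))

    def edit(self, user, **changes):
        """Edit the exception record directly and write an 'Updated' audit entry."""
        self.data.update(changes)
        self.log(user, "Updated", "changed: " + ", ".join(sorted(changes)))

def route(decisions):
    """Apply reviewer decisions (record, action, user) and group record ids by action."""
    queues = {action: [] for action in Action}
    for record, action, user in decisions:
        record.log(user, action.name.title())
        queues[action].append(record.record_id)
    return queues

def merge_into(master, others, user):
    """Interactive consolidation: mark associated records 'Merged', the master 'Updated'."""
    for other in others:
        other.log(user, "Merged", f"into {master.record_id}")
    master.log(user, "Updated", "final master record")

if __name__ == "__main__":
    a = ExceptionRecord("C-1", {"name": "Jon Smth"})
    b = ExceptionRecord("C-2", {"name": "John Smith"})
    a.edit("analyst", name="Jon Smith")
    print(route([(a, Action.ACCEPT, "analyst")]))
    merge_into(b, [a], "analyst")
    for entry in a.audit + b.audit:
        print(entry)
```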


Data Quality Assistant as part of DQ process


(Process diagram)
Data Sources
Data Quality Assessment Framework: DQ Reporting, DQ Scorecard, IDQ Rules
Prioritisation of issues
IDQ cleansing and standardising
IDQ exception file generation
Data Quality Assistant
Root cause analysis
Corrective actions / change requests
Data fixing


Managing Exceptions


Interactive Consolidation


Audit Trail


Informatica Data Explorer 5.0


(Architecture diagram)
Provider: XML, Messaging, and Web Services; Packaged Applications; Relational and Flat Files; Mainframe and Midrange
Graphical User Interface
Data Profiling Engine
Repository and Metadata Management
Consumer: Data Quality Reporting; Data Quality Process (Cleanse, Match, Enrich, Scorecard/Monitor); Data Integration


Data Profiling: The Challenge

(Diagram)
Sources (heterogeneous legacy systems): ERP, SAP, VSAM, Flat Files, Oracle, Sybase, RDBMS, DB/2
Analyzing / Cleansing / ETL: code, load, explode
Targets: CRM, BI, SCM, DW, M&A

Informatica Data Explorer: The Principle

(Diagram)
Source Systems (heterogeneous legacy systems): SAP, VSAM, Oracle, Flat Files, Sybase, RDBMS, DB/2
Profiling (column profiling, single table analysis, cross table analysis), then Cleansing, then ETL
Target Systems: ERP, CRM, M&A, BI, SCM, DW

Informatica Data Explorer - Benefits


Rapid automatic investigation of undocumented data sources
Pre-built business rules are applied to all data types to automatically identify low-quality data. A process-driven GUI enables a data analyst to rapidly walk through a series of tests on the data.

Execute a broad range of profiling functions for all data types


Data Explorer allows the user to perform column, table and cross-table analysis, thus implementing dependency and redundancy profiling

Deliver a catalog of issues / actions / comments


Results of data profiling are logged in an open repository for downstream processes, e.g. data integration processes, to make use of

Integration with data integration tools, including PowerCenter


Increases accuracy of mapping design by providing all profiling information to the PowerCenter developer
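The dependency and redundancy profiling mentioned above can be illustrated with two small checks: a single-table test of whether one column functionally determines another, and a cross-table comparison of two columns' value sets, whose one-sided leftovers are the orphans. This is an illustrative sketch, not Data Explorer's algorithm; the function names, the Jaccard-style overlap measure and the sample data are assumptions.

```python
from collections import defaultdict

def dependency_holds(rows, determinant, dependent):
    """Single table: does `determinant` functionally determine `dependent`?
    Returns (holds, sorted determinant values that map to more than one value)."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    violations = sorted(k for k, v in seen.items() if len(v) > 1)
    return not violations, violations

def redundancy(values_a, values_b):
    """Cross table: overlap of two columns' distinct values, plus values found on one side only."""
    a, b = set(values_a), set(values_b)
    union = a | b
    overlap = len(a & b) / len(union) if union else 0.0
    return {"overlap": overlap, "only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}

if __name__ == "__main__":
    addresses = [
        {"zip": "94105", "city": "San Francisco"},
        {"zip": "94105", "city": "San Francisco"},
        {"zip": "10001", "city": "New York"},
        {"zip": "10001", "city": "NYC"},
    ]
    print(dependency_holds(addresses, "zip", "city"))
    # Customer ids in a customer table vs. in an orders table: "C9" is an orphan order key.
    print(redundancy(["C1", "C2", "C3"], ["C1", "C2", "C2", "C9"]))
```

A high overlap suggests two columns hold the same domain (a candidate key relationship); values present only on the referencing side are the orphans that integrity checks look for.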


Informatica Data Explorer


Rapid analysis of data in multiple source systems
Catalog details of each data source in the repository:
Tables, columns, domains
Data structures (inferred & documented)
Data completeness & redundancy
High-level DQ status & issues
Tag data and document instructions for follow-on processes
Target mapping information
Generate high-level graphical reports
View summary information

Investigate and profile data in all source systems to assess the actual state of the data and identify issues

Connect to data sources and step through analyses
View results and drill-downs
Build tags, notes and action steps
Email information
Set up reports

Analyst Statements
"A core component to creating master data is the ability to first perform data quality profiling and then apply standardization, matching, merging and enrichment logic" (Forrester, Rob Karel, Mar. 2007)


What our customers say


"Detailed investigations like this would never be possible without this tool" (customer quote)
"Informatica Data Explorer tells me everything I have never asked for" (customer quote)
". . . I can't misuse fields any more . . ." (customer quote)
"With this tool I am 4 times faster as I am today when analysing data" (customer quote)
"Using Data Explorer is like surfing data" (large Fortune 500 customer with 50 trained IDE users)


Questions?
