Data Integration Strategy

Mark Mitchell
Senior Product Specialist - EMEA

The development, release and timing of any Informatica product described herein remains at the sole
discretion of Informatica. This information should not be relied upon in making a purchasing decision.

Data Integration Product Strategy

Project Specific Solutions


Targeted at Broader DI Use Cases
Includes Some Industry Verticals

Role Specific Tools


Encompassing Full DI Lifecycle
Integrated Workflow

Common Services & Frameworks


Single Integration Engine for ETL, EII
Comprehensive Orchestration Support
Ajax and Eclipse UIs
Model-based Repository

Informatica confidential. For discussion only. Do not distribute.

Data Integration Product Strategy


Project Specific Solutions

Business
Imperatives

IT
Initiatives

Improve
Decisions &
Regulatory
Compliance

Modernize
Business &
Reduce IT
Costs

Business
Intelligence

Legacy
Retirement

Merge &
Acquire

Increase
Business
Profitability

Outsource
Non-core
Functions

Application
Consolidation

Customer,
Supplier,
Product Hubs

Data
Consolidation

Data
Synchronization

Master Data
Management

BPO
SaaS

Data
Integration
Projects
Data
Warehouse

Data
Migration

Data
Quality

Data Integration Product Strategy


Role Specific Tools

Discover & Define
Need: Define business terms, domain values and high level semantics. Profile source data systems. Create integration specs. Define business rules.
User: Business Analyst (Analyst Focused Product)
% Lifecycle: High

Architect & Design
Need: Build (or import) model of key business entities. Document relationships between data objects and source data. Model service interfaces.
User: Data Specialist/Architect (Architect Focused Product)
% Lifecycle: Medium

Develop & Process
Need: Build ETL mappings/workflows + views for end pointing
User: Developer/Programmer (Informatica 8.x - 7.x)
% Lifecycle: Low

Admin & Operate
Need: Deploy and manage changes in large scale environments. High availability, grid, pushdown. Unified administration, security.
User: Administrator (Informatica 8.x)
% Lifecycle: Low

Data Integration Product Strategy


Common Services & Frameworks
Project Specific
Solutions
(Ajax)

Role Specific Tools


(Ajax, Eclipse)

= Major New Capabilities

Migration

Replication/
Bulk Sync

Cross-Enterprise

Other INFA BU
Products

Analyst

Architect

Developer

Administrator

Informatica Services Framework


Shared Services,
Single Processing
Engine

Transformation

Cleansing

Profiling

Catalog

Rules Mgmt.

Orchestration

WSH

Reporting

Common Integration Engine


(Batch, Real-time, On Demand, Caching)
Foundation Services
(Repository, Grid, HA, Security, Admin., Logging, Licensing, etc.)
Connectivity Services

Major Release Timeline

Galileo Release
1H 2009
Da Vinci Release
2H 2008

Modeling, Federation
(Architect)

1H 2008
2H 2007
XE Platform
2H 2007
PC/PWX 8.5
Q3 2007

Governance, Orchestration
(Analyst)
Replication/Bulk Sync Solution

Migration Solution,
Data Masking Solution
Cross-Enterprise
Solution
Mission-Critical Deployments
(Admin, Operator)

Informatica 8.5 Release


Release Summary
CHARTER

Delivering enterprise grade data integration


Supporting Integration Competency Centers (ICCs)

Focus: Administrators and Operators


Delivery timeframe: GA Released 12 Oct 2007

KEY CAPABILITIES

Simplified web services deployment


Dynamic workflow concurrency
Improved real-time, synchronization support
Unified administration, enterprise grade security
More productive metadata discovery and analysis
200+ incremental enhancements to existing capabilities

Integrated Platform for Discovery & Definition


Glossary
Management
Profile, Cleanse
& Integrate

Analyst
Specify
Rules

Developer
Collaborate

Active
Scorecard

PowerCenter Data Masking

Data Masking Feature Summary


Protects sensitive information by transforming it into
de-identified, realistic-looking data while retaining
original data properties

Data remains relevant and meaningful


Preserves the shape and form of individual fields
Preserves intra-record relationships
Preserves join / foreign key relationships

John Smith
654-65-8945
4739-1146-8075-5716
100 Cardinal way
Redwood City

Glen Carter
654-45-2643
4739-1102-3517-8842
342 54th Street
New York

Business Drivers & Requirements

Per Incident Cost of Data Breach

Manage Risk
Minimize risk of a data security
breach

Regulatory Environment

Improve compliance with data


privacy laws & regulations

$197 per record ($239 in Fin Serv)


$6.3 million average
$225k - $35m range
Organizations Admitting Non Compliance
Sarbanes-Oxley, 28%

Globalization

Reduce costs through outsourcing


& offshoring

Gramm-Leach-Bliley Act, 14%


California database breach notification act, 15%
HIPAA, 40%
EU Data Privacy Directive , 45%

2007 Annual Study: Cost of a Data Breach, Ponemon Institute


Global State of Information Security 2006, CIO & PricewaterhouseCoopers

The challenge in data privacy is sharing data while protecting personal information

Protecting Sensitive Data


Restrict Access, Mask Private Data

Development and Testing

Training

Support

Data Analysis

Outsourcing and Offshoring

John Smith
654-65-8945
4739-1146-8075-5716
100 Cardinal way
Redwood City

Glen Carter
654-45-2643
4739-1102-3517-8842
342 54th Street
New York

Business Use Cases


A financial services organization needs to set up an offshore development center to lower IT costs
Challenges

Offshore environment needs production-like data for reliable development and testing of applications
200+ applications contain sensitive data in a variety of databases (e.g. Oracle, DB2) and files (e.g. VSAM) with many inter-dependent tables
Access to all sensitive data must be restricted to users who have a need to know
Sensitive fields include Name, Address, SSN, Credit Card Number, Account Number, etc.

Solution

The PowerCenter Data Masking Option can preserve referential integrity and intelligently mask SSN, Credit Card Number, etc. while providing realistic data to development and test environments

Business Use Cases


A health care provider needs to outsource the analysis of health-related data to a third-party marketing research firm
Challenges

It needs to mask all sensitive health-related information to comply with privacy laws like HIPAA
Sensitive fields include Name, Address, Age, Date of Birth, etc.
Masked data must remain as close as possible to the original data to ensure proper data analysis
For example, the date of birth must be masked, but the masked date must still yield the same age

Solution

The PowerCenter Data Masking Option provides features like Blurring, Mask Format, and Name and Address substitution to de-identify sensitive data while maintaining the original data characteristics.

The Informatica Product Platform


Automating Entire Data Integration Lifecycle
Audit, Monitor, Report
Source live data
from any system
in batch or
real-time

Ensure data consistency, perform impact analysis and


continuously monitor quality

Access
Unstructured
or structured
in batch or
real-time

PowerExchange

Data Explorer

Data Quality

Discover

Cleanse

Search and profile


any data from any
source

Discover and
profile sensitive
data from any
system

Deliver de-identified data to
other environments

Monitoring &
reporting of
adherence to
security policies

Integrate

Validate, correct and


standardize
all data types

Define data
masking rules
and apply
transformations

Transform and
reconcile all
data types and
Industry
formats

Deliver
Exchange data at
the right time, in
the right format,
across any
platform

PowerCenter
+ Data Masking option

Develop & Manage


Develop and collaborate with common repository and shared metadata

Masking Production Data for Test Environment


Production Environment

Test Environment

Mainframe and
Mid-Range

Mainframe and
Mid-Range

Packaged
Applications

Packaged
Applications

Relational and
Flat Files

PowerCenter +
Data Masking Option

Standards and
Messaging
Remote Data

Data Masking Option is licensed per PowerCenter repository

Relational and
Flat Files

Standards and
Messaging
Remote Data

Use data masking transformations in PowerCenter mappings

PowerCenter Data Masking Option


Key Features

Multiple techniques and algorithms


Random Masking
Blurring
Key Masking for preserving referential
integrity
Substitution

Specialized, built-in rules and content


Name and Address content
Special fields like SSN, Credit Card,
Phone Number, etc
Pre-packaged sample mappings

Component of data integration platform


Universal data access
Rich transformation capabilities
Auditing and Reporting

Random Masking

Replace sensitive field with a randomly generated value subject to various constraints

Range: Minimum and maximum boundaries

Blurring: Fixed or percent variance to the original value

Mask Format: Format specification for retaining the data structure
Character

Description

Alphabetical characters

Digits

Alphanumeric characters

Any character

No character masking.

Random Masking - Example


Customer

Customer

CUSTID

FULLNAME

CREATEDDATE

CUSTID

FULLNAME

CREATEDDATE

117

Andrew Davies

4/16/1996

448

Kan Crone

3/2/1976

638

Elizabeth Murphy

1/14/1998

259

Ludie Dowden

9/5/1982

890

Richard Block

4/6/2000

913

Jarad Bayne

11/19/2004

Customer Accounts

Customer Accounts

ACCTID

CUSTID

BALANCE

STARTDATE

ACCTID

CUSTID

BALANCE

STARTDATE

AS-09615

117

5197

11/12/2004

RW-07778

448

5268

11/12/2004

SJ-04108

117

8047

3/2/2007

VB-55856

448

7555

3/2/2007

FX-56312

638

162

7/27/2005

SX-00685

259

170

7/27/2005

Production Database

Test Database

ACCTID is masked using Mask Format to preserve the structure, two alphabetic characters
followed by a hyphen followed by five numeric characters
CREATEDDATE is masked using Range masking, to generate a random date between 01/01/1950
and 01/01/2010
BALANCE needs to be blurred plus or minus 10% in order to preserve the distribution of balances
across all accounts
Random Masking - Example


BALANCE number datatype

Blurring: Mask BALANCE with a value that is within + or - 10% of the original value
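
The blurring rule above can be illustrated with a minimal Python sketch. It is not the PowerCenter implementation; the function name `blur` and the fixed random seed are assumptions made only for illustration. It picks a uniformly random value within plus or minus a given percentage of the original.

```python
import random


def blur(value: float, percent: float, rng: random.Random) -> float:
    """Return a random value within +/- `percent` percent of `value`."""
    delta = abs(value) * percent / 100.0
    return rng.uniform(value - delta, value + delta)


# Fixed seed only so the sketch is reproducible; hypothetical choice.
rng = random.Random(42)
original_balance = 5197
masked_balance = blur(original_balance, 10, rng)
print(round(masked_balance, 2))
```

Because each balance moves only a bounded distance from its original value, the overall distribution of balances stays roughly the same, which is the property the slide asks for.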

Random Masking - Example


CREATEDDATE date datatype

Range: Generate a random date between 01/01/1950 and 01/01/2010
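
As a rough illustration of range masking for a date column, the following Python sketch draws a random date between two boundaries. The function name `random_date` and the seed are hypothetical; the slide does not describe how PowerCenter generates the value internally.

```python
import random
from datetime import date, timedelta


def random_date(start: date, end: date, rng: random.Random) -> date:
    """Return a random date in the inclusive range [start, end]."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


rng = random.Random(7)  # fixed seed only for reproducibility of the sketch
masked_created = random_date(date(1950, 1, 1), date(2010, 1, 1), rng)
print(masked_created.isoformat())
```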

Random Masking - Example


ACCTID string datatype

Mask Format: Mask ACCTID while preserving the structure: two alphabetic characters, retain the third character, followed by five numeric characters

Result string replacement characters can be used to specify characters to mask and replace. For example, use only uppercase alphabetic characters
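
The following Python sketch shows one way a format-driven mask could work for ACCTID. The format codes used here (`A` for a random uppercase letter, `D` for a random digit, `+` for "keep the original character") are assumptions chosen for this example and are not taken from the slides; the function name `mask_with_format` is likewise hypothetical.

```python
import random
import string


def mask_with_format(value: str, fmt: str, rng: random.Random) -> str:
    """Mask `value` character by character according to `fmt`.

    Assumed codes: 'A' = random uppercase letter, 'D' = random digit,
    '+' = keep the original character unchanged.
    """
    if len(value) != len(fmt):
        raise ValueError("format and value must have the same length")
    out = []
    for ch, code in zip(value, fmt):
        if code == "A":
            out.append(rng.choice(string.ascii_uppercase))
        elif code == "D":
            out.append(rng.choice(string.digits))
        elif code == "+":
            out.append(ch)
        else:
            raise ValueError(f"unknown format code: {code!r}")
    return "".join(out)


rng = random.Random(1)  # fixed seed only for reproducibility of the sketch
masked_acct = mask_with_format("AS-09615", "AA+DDDDD", rng)
print(masked_acct)
```

The masked value keeps the "two letters, hyphen, five digits" shape, so downstream validation that checks the account ID pattern should continue to pass.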

Key Masking
Generate repeatable values to preserve referential
integrity
Seed based algorithm returns the same data each
time the source value and seed value are the same
Configure the same seed value for masking the
primary key and foreign key columns
Change seed value to produce a different set of
repeatable data
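
The seed-based behaviour described above can be sketched in Python with a keyed hash. This is only an illustration of the idea of repeatable masking: the use of HMAC-SHA256, the function name `key_mask`, the seed value 190, and the three-digit output range are all assumptions, not the algorithm the product uses.

```python
import hashlib
import hmac


def key_mask(value: int, seed: int, digits: int = 3) -> int:
    """Deterministically map `value` to a number with at most `digits` digits.

    The same (value, seed) pair always yields the same result.
    """
    key = str(seed).encode("utf-8")
    message = str(value).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (10 ** digits)


SEED = 190  # hypothetical seed shared by primary and foreign key columns

customers = [117, 638, 890]                        # CUSTID (primary key)
accounts = [("AS-09615", 117), ("SJ-04108", 117),  # (ACCTID, CUSTID foreign key)
            ("FX-56312", 638)]

masked_customers = [key_mask(c, SEED) for c in customers]
masked_accounts = [(acct, key_mask(c, SEED)) for acct, c in accounts]
print(masked_customers, masked_accounts)
```

Because both columns are masked with the same seed, every foreign key still points at a masked primary key. Note that this simple modulo mapping can map two different inputs to the same output; a real implementation would need to guarantee uniqueness for key columns.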

Key Masking - Example


Customer

Customer

CUSTID

FULLNAME

CREATEDDATE

CUSTID

FULLNAME

CREATEDDATE

117

Andrew Davies

4/16/1996

448

Kan Crone

3/2/1976

638

Elizabeth Murphy

1/14/1998

259

Ludie Dowden

9/5/1982

890

Richard Block

4/6/2000

913

Jarad Bayne

11/19/2004

Customer Accounts

Customer Accounts

ACCTID

CUSTID

BALANCE

STARTDATE

ACCTID

CUSTID

BALANCE

STARTDATE

AS-09615

117

5197

11/12/2004

RW-07778

448

5268

11/12/2004

SJ-04108

117

8047

3/2/2007

VB-55856

448

7555

3/2/2007

FX-56312

638

162

7/27/2005

SX-00685

259

170

7/27/2005

Production Database

Test Database

Customer and Customer Accounts tables have to be masked consistently to preserve referential
integrity

Maintain repeatability. For example, mask 117 to 448 again and again

Change repeatable value for different runs. For example, mask 117 to 448 for test environment
but to 772 for development environment

Key Masking - Example

Seed: Same seed value is used while masking the primary key and foreign key fields to preserve referential integrity

Special built-in masking rules


Built-in rules for commonly known sensitive fields
Credit Card Number
Generate a random but valid credit card number using Luhn
algorithm
Preserve Issuer Identifier (Visa, Discover, etc), the first 6 digits of
the CC Number

Social Security Number


Generate a random Social Security Number that has not been
generated yet
Uses High Group file provided by the Social Security Administration
Download latest high group file for keeping up-to-date
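
For the credit card rule above, a minimal Python sketch of "random but Luhn-valid, issuer digits preserved" might look as follows. The function names and the seed are hypothetical, and the sketch makes no claim about how the built-in rule is implemented; it only applies the standard Luhn check-digit calculation.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a digit string lacking its last digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card(number: str, rng: random.Random) -> str:
    """Keep the first 6 digits, randomize the rest, and fix the check digit.

    Separators such as '-' are kept in their original positions.
    """
    digits = [c for c in number if c.isdigit()]
    issuer = digits[:6]
    body = [str(rng.randint(0, 9)) for _ in range(len(digits) - 7)]
    partial = "".join(issuer + body)
    new_digits = iter(partial + luhn_check_digit(partial))
    return "".join(next(new_digits) if c.isdigit() else c for c in number)


rng = random.Random(7)  # fixed seed only for reproducibility of the sketch
masked_card = mask_card("4552-7473-4192-6624", rng)
print(masked_card)
```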

Special built-in masking rules


Phone Number
Generate a random phone number but preserve the incoming
phone format

Email Address
Generate a random email address of the correct format with @, .,
etc

URL
Generate a random URL value with the correct format

IP
Generate a random IP address within the same network range
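
Two of the rules above, phone number and IP address, can be sketched in a few lines of Python. The function names are hypothetical, and the `prefix_len` parameter is an assumption: the slide says "same network range" without defining how the range is determined, so the sketch takes the prefix length as an explicit input.

```python
import ipaddress
import random


def mask_phone(phone: str, rng: random.Random) -> str:
    """Replace every digit with a random digit; keep all other characters."""
    return "".join(str(rng.randint(0, 9)) if c.isdigit() else c for c in phone)


def mask_ip(address: str, prefix_len: int, rng: random.Random) -> str:
    """Return a random address from the network that contains `address`."""
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    offset = rng.randrange(network.num_addresses)
    return str(network[offset])


rng = random.Random(3)  # fixed seed only for reproducibility of the sketch
masked_phone = mask_phone("(206) 923-3477", rng)
masked_ip = mask_ip("192.168.10.57", 24, rng)
print(masked_phone, masked_ip)
```

The phone sketch preserves punctuation and spacing exactly, so an input such as `6682848046` stays a plain ten-digit string while `(206) 923-3477` keeps its parentheses and hyphen.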

Special built-in masking rules - Example


Customer
PHONE

EMAIL

SSN

CREDITCARD

(206) 923-3477

bmurphy@illuminetss7.com

275-85-8158

4552-7473-4192-6624

6682848046

deborahashea@lmco.com

271-85-8451

4465-8580-5809-1951

Customer
PHONE

EMAIL

SSN

CREDITCARD

(988) 676-4900

ir6NKRi@JuBlAlgI07WR.AEb

275-53-0840

4552-7464-3620-2545

8056642448

78dgrJMg9gU1@laoQ.fGf

271-43-3410

4465-8564-7382-9054

Mask Phone while retaining the same format


Mask Email while retaining the correct email format
Generate an SSN with the correct format but that has not been issued so far
Generate a valid credit card Number while preserving the issuer identifier number
Special built-in masking rules - Example

Substitution Name and Address


Generate random but realistic looking values for Names
and Addresses
Packaged substitution datasets
First Names (Male and Female)
Last Names
Address

PowerCenter Lookup transformation is used for performing random lookup against the provided datasets
Pre-packaged sample mappings that demonstrate substitution mechanism
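
The random lookup idea can be sketched in Python as follows. The small in-memory lists stand in for the packaged substitution datasets and are assumptions made for illustration (the names are taken from the example rows elsewhere in this deck); the function name `substitute_name` is hypothetical.

```python
import random

# Stand-ins for the packaged first-name and last-name datasets (assumed content).
FIRST_NAMES = ["Glen", "Kan", "Ludie", "Jarad"]
LAST_NAMES = ["Harrison", "Crone", "Dowden", "Bayne"]


def substitute_name(rng: random.Random) -> str:
    """Build a realistic-looking full name from randomly chosen dataset entries."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


rng = random.Random(11)  # fixed seed only for reproducibility of the sketch
masked_name = substitute_name(rng)
print(masked_name)
```

With a purely random pick, the same source name can map to different substitutes on different runs; if repeatability is needed, the random index could instead be derived from a seeded key mask of the source value.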

Substitution - Example
Customer
FULLNAME

STREET

CITY

STATE

John Smith

100 Cardinal way

Redwood City

CA

Andrew Davies

5400 Carillon Pt

Kirkland

WA

Customer
FULLNAME

STREET

CITY

STATE

Glen Harrison

6 Meadows Pkwy

Olympia

WA

Kan Crone

9001 Stockdale Hwy

Bakersfield

CA

Randomly substitute values from included content


Name Masking. For example, mask John Smith to Glen Harrison
Address Masking. For example, mask 100 Cardinal way to 6 Meadows Pkwy

Substitution Mapping
First Name Lookup: Use firstnames.dic file

Data Masking Transformation: Generate random numbers for lookup

Surname Lookup: Use surnames.dic file

Address Lookup: Use Address.dic file

Orchestration

Informatica BPM Functionality


Data Service Orchestration and Human Workflow

Informatica
Orchestration
Designer

Informatica
Orchestration
Server

Informatica
Human
Workflow

Orchestration Designer
Eclipse based
Visual and Source editors for BPMN, XFORM,
WSDL, XSD etc.
Drag and drop interface eliminates coding (and
errors!)
Import and Export of standard artifacts (WSDL, XSD
etc.)
Single-click Deploy

Orchestration Server
BPEL engine
Executes BPEL code generated by Orchestration
Designer or by third party
Interaction with external participants is exclusively
based on Web Services technology (WSDL)
Supports long running processes
Newer versions of processes can be deployed without
terminating existing versions

Human Workflow
Designed as XFORMS (Web 2.0) using
Orchestration Designer
Deployed to Orchestration Server along with
generated BPEL code
Rendered by Orchestration Server and delivered to
browser

BPMN - Highlights
A standardized means of illustrating a business process
Useful for documenting business process
Useful in IT for documenting technical process

BPMN Diagram - Sample

Process Monitoring
Where are we on the ABC project / deal / claim / account?

Zoom and
Timeline Control

Shows
Event
Information

Shows
Process Path

Uses for Orchestration

Sequencing

Start a process after the completion of another process or after a specific time has been reached

Synchronization of master data

Synchronize master data between multiple independent data sources

Conditional Logic

Take differentiated action depending on the outcome of another process activity

Different handlers for System and Business exceptions

Human Workflow

Complex decisions requiring human intervention

Looping

Iteratively execute a process activity based on standard looping criteria (for, while, repeat-until)
Thank You
