Informatica User Group

PowerCenter: Differences Between v7 & v8
Mark Murray, Senior Sales Consultant
October 19th, 2006

Informatica confidential. For discussion purposes only.

Goals for New Architecture
• Enterprise Deployment
  • Improved Service Orientation
  • High Availability
  • Grid Deployments
• Centralized Services
  • Administration
  • Logging & Auditing
• Single Point of Administration
  • Traditional Configuration
  • HA Configuration
  • Grid Configuration

What do customers want?
• High Availability and Failover was a top 10 request in the 2004 User Group surveys
• Database Pushdown Optimization ranked 10th out of 66 features in the 2005 surveys
• Improved logging capabilities ranked 2nd out of more than 60 feature requests in the 2004 surveys
• Looping support within the Designer

Informatica Data Integration Platform
Continually Raising the Bar
Timeline: PowerCenter 7 Advanced Edition; PowerCenter 8.1.1 (now); Hercules (2007)
Themes: On-Demand Platform for the Enterprise; One Product, Single Install; Mission-Critical Enterprise Deployment

Informatica Delivers Continuous Innovation
Chart: 1 TB Transform and Load Test (HR:Min), V4.x through V8.x; data labels shown: 6:35, 3:36, 0:37 and <18 min

“With PowerCenter continually leapfrogging on performance and scalability, we are never concerned about our ability to handle increasingly large data volumes in our data integration environment.”
- Kevin Smith, CRM Strategies Manager, AAA Carolina

Capabilities added across releases (cumulative, newest first): Session On Grid, Adaptive Load Balancing, High Availability, Dynamic Partitioning, Pushdown Optimization, Unstructured Data, Data Federation, SOA Web Services, Grid, 64-bit, Team development, Enterprise security, Mainframe Data Server and CDC, Impact analysis, Realtime, Workflow, Data quality, 3-tier architecture, Enterprise metadata, Partitioning, Debugger, XML, Metadata connectivity, Pipelining, ERP Connectivity, UNICODE

What else is in the Informatica product family?
PowerCenter 8 Standard Edition
PowerCenter 8 Advanced Edition: Metadata Manager, Data Analyzer, Team Based Development
PowerCenter Options (shown as New, Updated and Broader):
• Data Cleanse and Match
• Data Federation (EII)
• Enterprise Grid
• High Availability
• Pushdown Optimization
• Unstructured Data
• Mapping Generation
• Data Profiling
• Partitioning
• Real-Time
• PowerCenter Connects
• Metadata Exchange
PowerCenter 8 Base Improvements
Delivering Value for Installed Base Customers
Reduce Time To Results
• Java transformation support
• User defined functions
• Extended expression library
• Mapping generation and templates
• Improved Data Profiling

Cost Effectively Scale
• Centralized administration web-based console
• Extended recovery options
• Connection resilience (RDBMS, Network, PC)
• Flat File Performance Optimization
• Enhanced, centralized logging
• Enhanced Team-Based Development
• Unicode repository option

PowerCenter Advanced Edition: Metadata Manager, Data Analyzer, Team Based Development
PowerCenter Standard Edition

PowerCenter 8 Release Themes

• Service Oriented Architecture
• 24x7 Availability of PowerCenter services
• Order of magnitude performance improvements
• Unlimited scalability
• Improved developer productivity

PowerCenter 8.x Update: Setting the Standard for Data Integration across the Enterprise
• Infrastructure and Server Enhancements
  • Services based Architecture
  • High Availability
  • Grid Enhancements
  • Easy Grid Configuration
  • Centralized administration web-based console
  • Centralized configuration
• Developer Enhancements
  • Functions and Expressions
  • User Defined Functions
  • Java Transformation
  • Dynamic Target Creation
  • Visio Template: mapping generation and templates
  • Upgrade Wizard
• Performance Enhancements
  • Pushdown Optimization
  • Flat Files
  • Partitioning
  • Auto Cache
  • Connection resilience (RDBMS, Network, PC)
• Expand the definition of universal data access
  • Data Federation Option
  • Unstructured Data Option
  • Data Quality Option
  • Extended PowerExchange

PowerCenter 8 Architecture

PowerCenter 6 and 7 Architecture
Components shown:
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor, Repository Server Admin Console
• Repository Server, backed by the Repository Database
• Data Servers (pmserver)
• Web Services Hub
• PowerCenter Connects
• PowerExchange
PowerCenter 8 Architecture
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor, Administration Console
• Application Services: Integration Service, Web Services Hub, Repository Service, SAP BW Service
  • The Repository Service manages the Repository Database
• Core Services
  • Domain/Gateway Services: Administration & Authorization, Configuration, Domain, Licensing
  • Log Service
• PowerCenter Connects
• PowerExchange
• All services run on nodes within a domain (Node & Domain)

PowerCenter 8 Terminology
• Services
  • A service is a resource that provides specialized functions.
  • PowerCenter has two types of services: Application Services and Core Services.
    • PowerCenter Application Services represent server-based functions such as the Repository, Integration, SAP BW and Web Services Hub services.
    • PowerCenter Core Services represent functions that manage and maintain the environment in which PowerCenter operates.

Introducing PowerCenter 8 Terminology
• Node
  • A node is a logical representation of a physical machine. It has physical attributes such as a hostname and port number.
  • Each node runs a Service Manager, which is responsible for the application and core services on that node.
  • The Service Manager is started when you start the “Informatica Services”.
• Domain
  • A domain is the fundamental unit of PowerCenter Services administration.
  • A domain is a logical collection of nodes and services that you can group in a “folder-like” deployment.

PowerCenter 8 Terminology
• Service Manager
  • On the gateway node, the Service Manager is responsible for:
    • Controlling the domain
    • Managing services running in the domain
    • Providing service lookup
  • On all nodes, the Service Manager:
    • Controls the core services and application services
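The gateway and per-node roles above can be modelled in a few lines. The following is a minimal Python sketch under simplifying assumptions: the classes `Node` and `Domain`, the `lookup` method, the host names and the port are hypothetical and do not correspond to any PowerCenter API. It only illustrates what "service lookup" means: resolving a service name to the node that runs it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Logical representation of a physical machine (hypothetical model)."""
    name: str
    host: str
    port: int
    services: list[str] = field(default_factory=list)  # services this node's Service Manager runs


@dataclass
class Domain:
    """A collection of nodes and the services they run (hypothetical model)."""
    name: str
    gateway: str  # name of the gateway node; lookup() stands in for its lookup role
    nodes: dict[str, Node] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def lookup(self, service: str) -> tuple[str, int]:
        """Resolve a service name to the (host, port) of a node running it."""
        for node in self.nodes.values():
            if service in node.services:
                return node.host, node.port
        raise LookupError(f"service {service!r} not found in domain {self.name!r}")


domain = Domain("Domain_Dev", gateway="node01")
domain.add_node(Node("node01", "etl-host-1", 6100, ["Repo_Svc"]))
domain.add_node(Node("node02", "etl-host-2", 6100, ["Int_Svc01"]))
print(domain.lookup("Int_Svc01"))  # ('etl-host-2', 6100)
```

Clients only need to know the gateway. Where a service actually runs stays a detail of the domain, which is what later makes failover transparent.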

PowerCenter Services Framework
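Diagram: the Client Tools (Designer, Repository Manager, Workflow Manager, Workflow Monitor, Administration Console) connect to the PowerCenter Domain. The domain contains the Master Gateway (Domain Controller), the Integration Service and the Repository Service, and the Repository Service uses the Repository Database. Also shown in the domain: checkpoints, logs and domain metadata.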

High Availability (HA)

High Availability in PC8
• Failover
  • Restart for data integration, repository and other services
  • Primary and backup servers
• Recovery
  • Workflows and sessions are recovered on running servers in the grid when a server fails
  • Checkpoint recovery
  • Repository recovery
• Resilience
  • PowerCenter jobs survive transient failures, such as:
    • Network errors
    • DB connection failures
Resilience
• DB Connection Resilience
  • Applies when connecting to or disconnecting from a database
  • Supported for Oracle, DB2, Sybase, SQL Server and Teradata
  • Retry interval based on the timeout setting
• FTP Resilience
  • For connections to an FTP server
  • Reads and writes recover from a lost connection, based on the timeout parameter
• Internal Resilience
  • PowerCenter components (Integration Service, clients, etc.) are resilient to Repository Service failure
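Connection resilience boils down to retrying a failed connect until a timeout expires. Here is a minimal Python sketch, assuming a caller-supplied `connect` callable and a `retry_period` in seconds. These names are hypothetical, and the code is not how PowerCenter implements resilience internally.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_period: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `retry_period` seconds have elapsed.

    Transient failures are modelled as ConnectionError. Any other exception
    propagates immediately, because retrying will not help.
    """
    deadline = clock() + retry_period
    while True:
        try:
            return connect()
        except ConnectionError:
            if clock() + interval > deadline:
                raise  # timeout exhausted: surface the last failure
            sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. A transient network blip is absorbed, but a database that stays down still fails the job once the timeout is used up.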

Simple High Availability/Failover Scenario
• Simple environment
  • 1 Domain, which consists of:
    • 2 nodes for Integration Services
      • node01 (Int_Svc01) - Primary
      • node02 (Int_Svc02) - Backup
    • 1 server for the repository (Repository DB)

Simple High Availability/Failover Scenario
• The node01 Integration Service goes down after a component failure (HW/SW)
• The node01 Integration Service “fails over” to node02
Diagram: node01 (Int_Svc01) and node02 (Int_Svc02) share the Repository DB; failover is automatic, followed by restart and recovery on node02.
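The primary/backup behaviour in this scenario can be sketched as a dispatcher. It prefers the primary node and moves to the backup when the primary fails. Here is a minimal Python sketch under stated assumptions: `is_alive` is a hypothetical health check standing in for the domain's real heartbeat mechanism, and `NodeFailure` is a hypothetical signal for a node dying mid-run. The sketch restarts the task from the beginning; it does not model checkpoint recovery.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class NodeFailure(Exception):
    """Raised by a task when the node it is running on fails (hypothetical)."""


def choose_node(nodes: Sequence[str], is_alive: Callable[[str], bool]) -> str:
    """Return the first live node in priority order: primary first, then backups."""
    for node in nodes:
        if is_alive(node):
            return node
    raise RuntimeError("no live node available")


def run_with_failover(
    nodes: Sequence[str],
    is_alive: Callable[[str], bool],
    run_task: Callable[[str], T],
) -> Tuple[str, T]:
    """Run a task on the preferred live node. If that node fails, mark it as
    down and restart the task on the next live node."""
    down: set[str] = set()
    while True:
        node = choose_node(nodes, lambda n: n not in down and is_alive(n))
        try:
            return node, run_task(node)
        except NodeFailure:
            down.add(node)
```

For example, `run_with_failover(["node01", "node02"], health_check, run_workflow)` returns `("node02", ...)` when the task fails on node01. Here `health_check` and `run_workflow` are placeholders.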

Grid Enhancements

Domain Overview Dashboard
Simplified, Web-based Administration
Screenshot callouts:
• Domain, Nodes and Services
• Services Configuration (remember the pmserver config file?)
• Example: Primary & Backup Repository Service

Mission-critical Enterprise Deployment
Cost-effective Scalability with PowerCenter on a Grid
Diagram: a PowerCenter Domain on a server grid, coordinated by the PowerCenter Domain Controller, distributes the processing of sessions across servers. When a hardware server fails, work is automatically recovered and restarted on a live server.

Grid Enhancements
Grid Object
• Configured from the Administration Console
• Services can be assigned to a grid
• Workflows are assigned to be run by services

Workflow distributed on Grid (WOnG)
• Same as version 7
• Distributes the sessions of a workflow across multiple nodes

Session distributed on Grid (SOnG)
• New in version 8
• Sessions can be partitioned to run on multiple nodes

Dynamic Partitioning
• Number of partitions is determined dynamically at runtime
• Less configuration for users

Resource Maps
• Configure available resources on the nodes in a grid through the Administration Console
• The load balancer dispatches jobs based on resource availability on the nodes

Grid – PC 7 vs. PC 8
PowerCenter 7
• ServerGrid is a collection of pmservers
• Work is directed to individual pmservers
• Work is distributed across the grid in a round-robin manner
• Session/task is the lowest unit of work

Grid Capabilities in 7.x vs. 8.x
7.x
• ServerGrid object
  • Collection of pmservers
• Workflows are explicitly assigned to pmservers
• pmservers belonging to a ServerGrid dispatch work to other pmservers
• A pmserver failure could cause its workflows to fail
• Sessions cannot be split across multiple nodes
• Load balancer is round-robin only

8.x
• Grid object
  • Collection of nodes
• Workflows are assigned to an Integration Service
• The Integration Service is assigned to a grid (it can run on any node in the grid)
• If one node fails, an Integration Service process on another node in the grid takes over running the workflow
• A session can be partitioned across nodes
• The load balancer dispatches based on resource availability on the nodes and the resource requirements of sessions
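The contrast between the 7.x round-robin load balancer and the 8.x resource-aware one can be sketched as two dispatch policies. Here is a minimal Python sketch, assuming each node advertises a set of resource labels and a count of free slots, and each session declares the resources it needs. The data model and names are hypothetical and far simpler than the real load balancer.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class GridNode:
    name: str
    resources: frozenset[str]  # resources available on the node (hypothetical labels)
    free_slots: int            # remaining capacity for concurrent sessions


@dataclass
class Session:
    name: str
    needs: frozenset[str]      # resources the session requires


def round_robin(nodes: list[GridNode], sessions: list[Session]) -> dict[str, str]:
    """7.x style: hand sessions to nodes in turn, ignoring resources and load."""
    ring = cycle(nodes)
    return {s.name: next(ring).name for s in sessions}


def resource_aware(nodes: list[GridNode], sessions: list[Session]) -> dict[str, str]:
    """8.x style (simplified): among nodes that provide every resource the session
    needs and still have capacity, pick the one with the most free slots."""
    plan: dict[str, str] = {}
    for s in sessions:
        candidates = [n for n in nodes if s.needs <= n.resources and n.free_slots > 0]
        if not candidates:
            raise RuntimeError(f"no node can run session {s.name}")
        best = max(candidates, key=lambda n: n.free_slots)
        best.free_slots -= 1  # reserve a slot on the chosen node
        plan[s.name] = best.name
    return plan
```

With one node that lacks an SAP client, round robin can still send an SAP session there. The resource-aware policy never does.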

Performance Improvements

Pushdown Optimization

Introduction
• What is pushdown optimization?
  • Push transformation processing to the source and target databases without moving data out of them
• Benefits
  • Reduce data movement when the source and target are in the same database instance
  • Use database-specific processing that may be more efficient
  • Maintain metadata and lineage in PowerCenter

Pushdown Optimization
• Full Pushdown:
  • Source and target are in the same RDBMS
  • All transformations can be processed in the database
• Partial Source:
  • One or more transformations can be processed in the source database
• Partial Target:
  • One or more transformations can be processed in the target database
• Generated SQL (example):
  • INSERT INTO t (…) VALUES (?+1, SOUNDEX(?))

Diagram: Extract (Source DB) → Transform → Load (Target DB)
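As an illustration, here is a toy Python generator that turns column expressions into the kind of SQL a pushdown optimizer might emit. For full pushdown it builds a single INSERT ... SELECT; for partial target pushdown it builds a parameterised INSERT ... VALUES, in the spirit of the generated SQL example. Table and column names are hypothetical. The code does no identifier quoting, and PowerCenter's real translation rules are far more involved.

```python
def _split(columns: list[tuple[str, str]]) -> tuple[str, str]:
    """Join (target_column, expression) pairs into a column list and an expression list."""
    return ", ".join(c for c, _ in columns), ", ".join(e for _, e in columns)


def full_pushdown_sql(target: str, source: str, columns: list[tuple[str, str]]) -> str:
    """Full pushdown: source and target share a database, so one INSERT ... SELECT
    does all the work and no rows leave the database."""
    cols, exprs = _split(columns)
    return f"INSERT INTO {target} ({cols}) SELECT {exprs} FROM {source}"


def partial_target_sql(target: str, columns: list[tuple[str, str]]) -> str:
    """Partial target pushdown: rows are still supplied by PowerCenter as bind
    parameters (?), but the expressions are evaluated by the target database."""
    cols, exprs = _split(columns)
    return f"INSERT INTO {target} ({cols}) VALUES ({exprs})"


print(partial_target_sql("t", [("id", "?+1"), ("name_code", "SOUNDEX(?)")]))
# INSERT INTO t (id, name_code) VALUES (?+1, SOUNDEX(?))
```

In both cases the business logic stays in the mapping, and only its evaluation moves into the database.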

Example – Full Pushdown
SQL & Business Logic Maintained in Repository

Flat File Performance & Parameter and Variable Enhancements

Flat file enhancements
• The flat file reader and writer have been rewritten to optimize for performance
  • Delimited files with lots of decimal data will see the most significant improvements
  • Out-of-the-box performance improvements should be between 30% and 300%
• Append to flat file targets
  • Session output can be appended to an existing flat file
• Flat file source/target command support
  • Sources: use a command to generate source data, or to generate a file list that references multiple source files
  • Targets: use a command to process the target data, or to process the data for all partitioned targets in a session
Parameters and Variables Enhancements
• Parameter Enhancements
  • Table owner name for relational sources/targets
  • E-mail address
  • FTP remote file name
• Global section in parameter files, for values used across different workflows and sessions
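PowerCenter parameter files group `$$name=value` lines under bracketed headings, such as `[Global]` and `[Folder.WF:workflow.ST:session]`. Here is a minimal Python sketch of how a global section combines with more specific sections. It assumes the most specific heading wins. That precedence, and the function names, are assumptions for illustration, so check the Workflow Administration Guide for the exact rules.

```python
def parse_param_file(text: str) -> dict[str, dict[str, str]]:
    """Group 'name=value' lines under the most recent '[heading]'."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


def resolve(sections: dict[str, dict[str, str]], scope: str) -> dict[str, str]:
    """Values visible to one session scope such as 'Folder.WF:wf.ST:s'.
    Assumed precedence: [Global] first, overlaid by the workflow heading,
    then by the session heading, so the most specific value wins."""
    parts = scope.split(".")
    merged = dict(sections.get("Global", {}))
    for i in range(2, len(parts) + 1):
        merged.update(sections.get(".".join(parts[:i]), {}))
    return merged
```

Values defined once under `[Global]`, such as an environment name, reach every workflow and session without being repeated in each section.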

Partitioning Enhancements

Partitioning Enhancements
• Flat File Partitioning
  • Flat file targets can now be partitioned
  • All partitions can write to a single merge file, or a file list can be created that contains the names of the individual files written
• Database Partitioning
  • Partitioned Oracle and DB2 sources can be read in parallel
  • No changes for targets; DB2 targets can be written in parallel
• Dynamic Partitioning
  • Number of partitions based on the number of partitions in the database
  • Number of partitions based on the number of nodes in a grid
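Dynamic partitioning means the partition count is decided at runtime rather than fixed in the session. Here is a minimal Python sketch: it picks the count from either the grid's node count or the database's partition count, hash-distributes rows, and merges the results as a merge file would. The mode names and functions are hypothetical, and hash partitioning is only one of several partition types.

```python
import zlib


def partition_count(mode: str, grid_nodes: int = 0, db_partitions: int = 0) -> int:
    """Choose the number of partitions at runtime (hypothetical modes)."""
    if mode == "nodes":
        return max(1, grid_nodes)
    if mode == "database":
        return max(1, db_partitions)
    raise ValueError(f"unknown mode {mode!r}")


def hash_partition(rows: list[dict], key: str, n: int) -> list[list[dict]]:
    """Distribute rows into n partitions by a stable CRC32 hash of the key column."""
    parts: list[list[dict]] = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts


def merge(parts: list[list[dict]]) -> list[dict]:
    """Concatenate partition outputs into one target, like a merge file."""
    return [row for part in parts for row in part]
```

CRC32 is used instead of Python's built-in `hash()` because it is stable across runs, so the same key always lands in the same partition.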

Auto Cache

© Informatica Corporation, 2006. All rights reserved.

AutoCache Overview
• Cache in PowerCenter v7
  • Default cache settings are not adequate for all situations
  • Default settings can underestimate the capabilities of newer chip technologies
  • It is sometimes necessary to hand-tune individual transformations
  • Settings tuned in development did not always scale when deployed to different production machines
• Auto Cache in PowerCenter v8.x
  • Automatically distributes session memory to transformations
  • Automatically scales memory usage based on the resources available
  • Automatically scales memory usage based on mapping complexity

Memory Attributes
• PowerCenter has two types of memory attributes:
  • Transformation Memory Attributes
  • Session Memory Attributes

Transformation Memory Attributes are for individual transformations:
• Lookup, Aggregator, Rank, Joiner
  • Index and Data Cache Size
• Sorter Cache Size
• XML Target Cache Size

Session Memory Attributes are for the session:
• Default Buffer Block Size
• DTM Buffer Size

New Memory Attribute Specification
• Previously, only integer byte values were allowed for memory attributes, e.g. 1000000 or 2000000
• Now the shortcuts “KB”, “MB” and “GB” are also allowed, e.g. 100MB
• The value “Auto” is also allowed
  • This indicates that the user wants PowerCenter to automatically find a good value for that memory attribute
  • “Auto” is supported for both session memory attributes (e.g. DTM buffer size, buffer block size) and transformation memory attributes (e.g. lookup caches)

AutoCache
• Allows the user to leave the calculations to PowerCenter
• The user specifies the total amount of memory AutoCache is allowed to use
• AutoCache automatically computes a value for ALL memory attributes that have the value “Auto”
• It does NOT affect any memory attribute whose value is not “Auto”

Cache Calculator
• Click the drop-down to open the calculator
• The calculation is based on the number of rows and the ports going into the object
• The value is propagated into the cache size attribute

Developer Improvements

Functions and Expressions

Function Enhancements
• Over 20 new functions added in the 8.x release
  • Financial functions, regular expression parsing/matching, IN(), compression, encryption, CRC, MD5 and more
• Custom Functions
  • Extend the functionality of the Expression transformation via a C API
  • All 20+ functions above were added via this API
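To show what some of these function categories compute, here are Python standard-library analogues for MD5, CRC, regular-expression matching, an IN()-style test and a compression round trip. They are analogues only. The function names are hypothetical, and the PowerCenter functions' exact names, argument orders and NULL handling are not reproduced here.

```python
import hashlib
import re
import zlib


def md5_hex(text: str) -> str:
    """Hex digest of the MD5 checksum of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crc32(text: str) -> int:
    """32-bit cyclic redundancy check of a UTF-8 encoded string."""
    return zlib.crc32(text.encode("utf-8"))


def reg_match(text: str, pattern: str) -> bool:
    """True if the whole string matches the regular expression."""
    return re.fullmatch(pattern, text) is not None


def in_list(value, *candidates) -> bool:
    """True if value equals any of the candidates, like an IN() test."""
    return value in candidates


def compress_roundtrip(text: str) -> str:
    """Compress then decompress a string; the round trip is lossless."""
    return zlib.decompress(zlib.compress(text.encode("utf-8"))).decode("utf-8")
```

Checksums like MD5 and CRC are typically used to detect changed rows cheaply, by comparing one hash column instead of every field.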

Function Enhancements
• User Defined Functions (UDF)
  • Designer users can create reusable functions entirely within the expression language
  • UDFs are folder-level objects
  • A UDF can use any valid function (except aggregate functions) as well as other UDFs in the same folder

Java & SQL Transformations

Java Transformation Use Cases
• Looping over data
• Walking data hierarchies
• Calling third-party APIs (Java based)
  • Calling RMI/EJB etc.
  • Other Java packages
• Calling an expression, UDF or unconnected widget (like a lookup) from a Custom Transformation
• Simple “Custom Transformation”
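The first two use cases, looping and walking hierarchies, both mean that one input row can produce zero or many output rows. The row logic can be sketched in Python. The Java transformation itself is coded in Java, its API is not shown here, and the field names below are hypothetical.

```python
from typing import Iterator


def explode_months(row: dict) -> Iterator[dict]:
    """Looping: emit one output row per month from start_month to end_month inclusive."""
    for month in range(row["start_month"], row["end_month"] + 1):
        yield {"contract": row["contract"], "month": month}


def walk_hierarchy(parent_of: dict[str, str], node: str) -> Iterator[tuple[str, str, int]]:
    """Walking a hierarchy: emit (node, ancestor, depth) for every ancestor of node,
    stopping with an error if the parent links form a cycle."""
    seen = {node}
    current, depth = node, 0
    while current in parent_of:
        current = parent_of[current]
        depth += 1
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        yield node, current, depth
```

For example, a contract row covering months 3 to 5 yields three output rows, and an employee two levels below a director yields one row per ancestor.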

Improved Developer Productivity
Java Inline Coding Sample

SQL Transformation Use Cases
• New SQL Transformation
  • Allows PowerCenter developers to execute SQL statements midstream in a mapping
  • You can insert, delete, update and retrieve rows in a database, and the transformation returns database errors
  • The executed SQL can be static, or dynamic, where the SQL statement itself is built on a row-by-row basis
  • The SQL transformation can also execute SQL scripts from within a mapping, e.g. to leverage SQL scripts that already exist
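The static/dynamic distinction and the "return database errors" behaviour can be sketched with Python's built-in `sqlite3` module. The database is an in-memory stand-in for whatever database the mapping uses. The table, columns and function names are hypothetical, and this is not the SQL transformation's actual interface.

```python
import sqlite3


def run_static(conn: sqlite3.Connection, rows: list[dict]) -> list[dict]:
    """Static SQL: the statement is fixed; only the bound parameter changes per row."""
    out = []
    for row in rows:
        found = conn.execute("SELECT name FROM customers WHERE id = ?", (row["id"],)).fetchone()
        out.append({"id": row["id"], "name": found[0] if found else None})
    return out


def run_dynamic(conn: sqlite3.Connection, rows: list[dict]) -> list[dict]:
    """Dynamic SQL: each input row carries its own statement. Database errors
    are returned as an output field instead of stopping the run."""
    out = []
    for row in rows:
        try:
            conn.execute(row["sql"])
            out.append({"sql": row["sql"], "error": None})
        except sqlite3.Error as exc:
            out.append({"sql": row["sql"], "error": str(exc)})
    return out
```

Dynamic SQL built from row data should only come from trusted sources, since each row is executed as-is.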

XML

XML Enhancements
• Filter data with a query predicate
• Create a default namespace
• Import part of an XML schema
• Use anySimpleType

Metadata Enhancements

Metadata Exchange Enhancements
• New Data Model Support
  • Sybase PowerDesigner: bi-directional
  • Oracle Designer: bi-directional
  • ER/Studio design tool: uni-directional (same as before)
  • CA ERwin: bi-directional
• Business Intelligence Support
  • Business Objects (bi-directional): added 6.5, XI and XI R2 XConnects
  • Cognos ReportNet Framework Manager (bi-directional): added 2.0
  • MicroStrategy (bi-directional): added 8.0

Dynamic Target Creation

Dynamic Target Creation
• Ability to dynamically create a target based on a transformation in the workspace or Navigator
  • Right-click the transformation in the workspace and select Create and Add Target, or
  • Drag the transformation and drop it in the Target folder
• The target has the same port definitions as the transformation from which it was created
• The target type is the same as the repository you are using
• The target definition can be edited to change the type or ports
• A creation dialog will be added in an upcoming release
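Designer builds the target definition itself. This Python sketch only shows how a list of transformation ports maps onto a relational target with the same columns, in the same order. The `Port` model, the type names and the table name are hypothetical. A real implementation would also have to map PowerCenter transformation datatypes to the target database's types.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    datatype: str       # database type name, e.g. "VARCHAR" or "DECIMAL"
    precision: int = 0  # length/precision; 0 means no size clause
    scale: int = 0


def target_ddl(table: str, ports: list[Port]) -> str:
    """Build CREATE TABLE text with one column per port, in port order."""
    cols = []
    for p in ports:
        size = ""
        if p.precision:
            size = f"({p.precision},{p.scale})" if p.scale else f"({p.precision})"
        cols.append(f"  {p.name} {p.datatype}{size}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n)"


print(target_ddl("T_CUSTOMER", [Port("CUST_ID", "INTEGER"), Port("CUST_NAME", "VARCHAR", 50)]))
```

Because the columns come straight from the ports, autolinking the transformation to the new target matches every port by name.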

Improved Developer Productivity
Target Generation

Simply Right-Click on an object…

…Target is created! All you need to do is autolink and you are ready to go

Mapping Generation Option
Visio Client for PowerCenter

Mapping Generation Option
• Bi-directional “engine” for automatically generating mappings from Visio templates, or reverse-engineering PowerCenter mappings into Visio templates
• Leverages the Informatica Data Stencil and Velocity templates for Visio
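Template-driven generation stamps out one mapping per row of template inputs. Here is a minimal Python sketch of that idea, using `string.Template` as a stand-in for the Velocity templates the option actually uses. The template text, the placeholder names and the one-line "mapping" description are hypothetical simplifications of a real mapping definition.

```python
from string import Template

# Hypothetical template: a one-line mapping description with placeholders,
# standing in for a Visio mapping template drawn with the Informatica stencil.
MAPPING_TEMPLATE = Template(
    "m_load_${table}: SQ_${table} -> EXP_audit -> ${target_prefix}${table}"
)


def generate_mappings(template: Template, inputs: list[dict[str, str]]) -> list[str]:
    """Generate one mapping description per row of template inputs.
    substitute() raises KeyError if a row lacks a placeholder value."""
    return [template.substitute(row) for row in inputs]
```

One template plus a table of inputs (for example, one row per staging table) replaces many hand-built mappings that differ only in names.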

Visio Client for PowerCenter

Mapping Template

Template Inputs

Upgrade Wizard

PowerCenter Upgrade to 8.1
• A new Upgrade Wizard in the Administration Console
  • Integrated UI that takes the user through the various steps of the upgrade
  • Provides a detailed upgrade summary report at the end
  • Allows the user to switch in and out of the upgrade UI to perform other administrative activities
  • Can handle multiple repositories (global/local) and multiple PowerCenter Servers in one pass
  • Live feedback as the user goes through the repository upgrade
• A new post-upgrade reference guide

Summary

Summary - PC 7 vs. PC 8
PC 7.x
• 3 Tier Architecture
• Basic Grid Deployment
• Introduction to Profiling
• Added Transformations
  • Union
  • XML

PC 8.x
• Services Oriented Architecture
• Enhanced Grid Deployment
  • High Availability
  • Session on Grid
  • Resilience
• Enhanced Profiling
• Added Transformations
  • Java
  • SQL
• Web Services
• Team Based Development
• Enhanced Productivity
  • Mapping Generation
  • User Defined Functions
Thank You
Questions at the break
