
Informatica User Group

PowerCenter: Differences Between v7 & v8


Mark Murray - Senior Sales Consultant
October 19, 2006

Informatica confidential. For discussion purposes only.


Goals for New Architecture
• Enterprise Deployment
• Improved Service Orientation
• High Availability
• Grid Deployments

• Centralized Services
• Administration
• Logging & Auditing

• Single Point of Administration


• Traditional Configuration
• HA Configuration
• Grid Configuration



What do customers want?

• High Availability and Failover was a top-10 request in the 2004 User Group surveys
• Database Pushdown Optimization ranked 10th out of 66 features in the 2005 surveys
• Improved logging capabilities ranked 2nd out of over 60 feature requests in the 2004 surveys
• Looping support within the Designer



Informatica Data Integration Platform
Continually Raising the Bar

[Figure: product roadmap. Labels: PowerCenter 7 Advanced Edition; PowerCenter 8.1.1; Hercules (2007); "One Product, Single Install"; "Now for the Enterprise"; "Mission-Critical Enterprise Deployment"; "On-Demand Platform"]



Informatica Delivers Continuous Innovation

“With PowerCenter continually leapfrogging on performance and scalability, we are never concerned about our ability to handle increasingly large data volumes in our data integration environment.”
--- Kevin Smith, CRM Strategies Manager, AAA Carolina

[Chart: 1 TB Transform and Load Test (HR:Min). Values shown: 6:35, 3:36, 0:37, <18 min]

Features by release (each release keeps the features of earlier releases):
• V4.x: Pipelining, ERP Connectivity, UNICODE
• V5.x adds: Partitioning, Debugger, XML, Metadata connectivity
• V6.x adds: Realtime, Workflow, Data quality, 3-tier architecture, Enterprise metadata
• V7.x adds: SOA, Web services, Grid, 64-bit, Team development, Enterprise security, Mainframe Data Server and CDC, Impact analysis
• V8.x adds: Session On Grid, Adaptive Load Balancing, High Availability, Dynamic Partitioning, Pushdown Optimization, Unstructured Data, Data Federation



What else is in the Informatica product family?

PowerCenter 8 Advanced Edition
• Metadata Manager
• Data Analyzer
• Team Based Development

PowerCenter 8 Standard Edition

PowerCenter Options
• Data Cleanse and Match
• Data Federation (EII)
• Enterprise Grid
• High Availability
• Pushdown Optimization
• Unstructured Data
• Mapping Generation
• Data Profiling
• Partitioning
• Real-Time

PowerCenter Connects
Metadata Exchange

[Figure callouts on groups of items: New, Updated, Broader]



PowerCenter 8 Base Improvements
Delivering Value for Installed Base Customers
Reduce Time To Results
• Java transformation support
• User defined functions
• Extended expression library
• Mapping generation and templates
• Improved Data Profiling

Cost Effectively Scale
• Centralized administration web-based console
• Extended recovery options
• Connection resilience (RDBMS, Network, PC)
• Flat File Performance Optimization
• Enhanced, centralized logging
• Enhanced Team-Based Development
• Unicode repository option

[Figure: PowerCenter Advanced Edition (Metadata Manager, Data Analyzer, Team Based Development) shown alongside PowerCenter Standard Edition]



PowerCenter 8 Release Themes

• Service Oriented Architecture
• 24x7 Availability of PowerCenter services
• Order of magnitude performance improvements
• Unlimited scalability
• Improved developer productivity



PowerCenter 8.x Update –
Setting the Standard for Data Integration across the Enterprise

• Infrastructure and Server Enhancements
  • Services based Architecture
  • High Availability
  • Grid Enhancements
  • Easy Grid Configuration
  • Centralized administration web-based console
  • Centralized configuration

• Developer Enhancements
  • Functions and Expressions
  • User Defined Functions
  • Java Transformation
  • Dynamic Target Creation
  • Visio Template – mapping generation and templates
  • Upgrade Wizard

• Performance Enhancements
  • Pushdown Optimization
  • Flat Files
  • Partitioning
  • Auto Cache
  • Connection resilience (RDBMS, Network, PC)

• Expand the definition of universal data access
  • Data Federation Option
  • Unstructured Data Option
  • Data Quality Option
  • Extended PowerExchange



PowerCenter 8 Architecture

PowerCenter 6 and 7 Architecture

[Architecture diagram; deployment unit labeled "Machine"]
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor
• Repository Server Admin Console
• Repository Server and Repository Database
• Web Services Hub
• Data Servers (pmserver)
• PowerCenter Connects
• PowerExchange



PowerCenter 8 Architecture

[Architecture diagram; deployment unit labeled "Node & Domain"]
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor
• Administration Console
• Application Services: Integration Service, Repository Service, Web Services Hub *, SAP BW Service
• Repository Database
• Core Services: Repository Service, Log Service, Domain/Gateway Services (Administration & Authorization, Configuration, Domain, Licensing)
• PowerCenter Connects
• PowerExchange



PowerCenter 8 Terminology

• Services
• A service is a resource that provides specialized functions.
• PowerCenter has two types of services: Application Services and Core Services.
• PowerCenter Application Services represent server-based functions such as the Repository, Integration, SAP BW, and Web Services Hub services.
• PowerCenter Core Services represent functions that manage and maintain the environment in which PowerCenter operates.



Introducing PowerCenter 8 Terminology

• Node
• A node is a logical representation of a physical machine. It has
physical attributes such as a hostname and port number.
• Each node runs a Service Manager, which is responsible for the application and core services.
• The Service Manager is started when you start “Informatica Services”.

• Domain
• A domain is the fundamental unit of PowerCenter Services administration.
• A domain is a logical collection of nodes and services that you can group in a “folder-like” deployment.



PowerCenter 8 Terminology

• Service Manager
• On the gateway node, the Service Manager is responsible for:
• Controlling the domain
• Managing services running on the domain
• Providing service lookup
• On all nodes, the Service Manager controls the core services and application services



PowerCenter Services Framework

[Diagram]
• Client Tools: Designer, Repository Manager, Workflow Manager, Monitor, Administration Console
• PowerCenter Domain: Master Gateway (Domain Controller), Repository Service, Integration Service
• Stores shown: Repository Database, Checkpoint, Logs, Domain Metadata



High Availability (HA)

High Availability in PC8

• Failover
• Restart for data integration, repository and other services
• Primary and backup servers

• Recovery
• Workflows and sessions are recovered on running servers in the grid after a server failure
• Checkpoint recovery
• Repository recovery

• Resilience
• PowerCenter jobs survive transient failures
• Network errors
• DB connection failures



Resilience

• DB Connection Resilience
• When connecting/disconnecting from a DB
• Oracle, DB2, Sybase, SQL Server and Teradata
• Retry interval based on timeout setting

• FTP Resilience
• For connections to FTP server
• Read/write will recover if connection lost based on timeout
parameter

• Internal Resilience
• PowerCenter components (integration service, clients etc.)
resilient to Repository service failure
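
The retry-until-timeout behavior described for database and FTP connections can be sketched roughly as follows. This is a minimal illustration, not PowerCenter's implementation: the names `call_with_resilience`, `TransientError`, and `make_flaky_connect`, and the two parameters `timeout_s` and `retry_interval_s`, are all hypothetical.

```python
import time


class TransientError(Exception):
    """Stands in for a dropped network or database connection (hypothetical)."""


def call_with_resilience(operation, timeout_s, retry_interval_s,
                         clock=time.monotonic, sleep=time.sleep):
    """Retry `operation` until it succeeds or the resilience timeout expires."""
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except TransientError:
            # Give up if the next retry would fall outside the timeout window.
            if clock() + retry_interval_s > deadline:
                raise
            sleep(retry_interval_s)


def make_flaky_connect(failures):
    """Build a fake connect() that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def connect():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("connection lost")
        return "connected"

    return connect
```

A call such as `call_with_resilience(make_flaky_connect(2), timeout_s=5, retry_interval_s=1)` would succeed on the third attempt; with a timeout shorter than the outage, the original error is re-raised and the job fails as it would without resilience.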



Simple High Availability/Failover Scenario

• Simple environment
• 1 domain, which consists of:
• 2 nodes for Integration Services
• node01 - Primary
• node02 - Backup
• 1 server for the repository

[Diagram: Node01 (Int_Svc01) and Node02 (Int_Svc02), with the Repository DB]



Simple High Availability/Failover Scenario
• The node01 Integration Service goes down (component failure, HW/SW)
• The node01 Integration Service (Int_Svc01) “fails over” to node02 (Int_Svc02)

[Diagram: node01 and node02, with the Repository DB. Callouts: Automatic Failover, Restart, Recovery]



Grid Enhancements

Domain Overview Dashboard
Simplified, Web-based Administration

[Screenshot of the Administration Console. Callouts: Domain; Services; Services Configuration ("Remember pmserver config file?"); Example Primary & Backup Nodes; Repository Service]



Mission-critical Enterprise Deployment
Cost-effective Scalability with PowerCenter on a Grid

[Diagram: PowerCenter Domain Controller and PowerCenter Domain on Server Grid. Callouts: "Distributed processing of sessions"; "Failed Hardware Server"; "Automatically recover, restart on live server"]



Grid Enhancements
• Grid Object
  • Configured from admin console
  • Services can be assigned to grid
  • Workflows are assigned to be run by services
  • Workflow distributed on Grid (WOnG)
    • Same as version 7
    • Distribute Sessions of a Workflow across multiple nodes
  • Session distributed on Grid (SOnG)
    • New in version 8
    • Can partition sessions to run on multiple nodes
• Dynamic Partitioning
  • # of partitions dynamically determined at runtime
  • Less configuration for users
• Resource Maps
  • Configure available resources on nodes in grid through admin console
  • Load balancer dispatches jobs based on resource availability on nodes



Grid – PC 7 vs. PC 8

PowerCenter 7
• ServerGrid is a collection of pmservers
• Work is directed to individual pmservers
• Work is distributed across the Grid in a round-robin manner
• Session/task is the lowest unit of work



Grid Capabilities in 7.x vs. 8.x
7.x
• ServerGrid object
• Collection of pmservers
• Workflows explicitly assigned to pmservers
• Pmservers belonging to a ServerGrid will dispatch to other pmservers
• Pmservers could fail, causing workflows to fail
• Can’t split sessions across multiple nodes
• Load balancer is round robin only

8.x
• Grid object
• Collection of nodes
• Workflows assigned to Integration Service
• Integration Service assigned to Grid (can run on any node in grid)
• If one node fails, another Integration Service process on another node in grid takes over running the workflow
• A session can be partitioned across nodes
• Load balancer takes into account resource availability on nodes and resource requirements of sessions for dispatch.
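
The difference between the two load-balancing styles can be illustrated with a small sketch. It is a toy model, not the actual PowerCenter dispatcher: the node and session names, the `resources`/`requires` sets, and the `cost` weight are all hypothetical, and "least loaded eligible node" is just one assumed way of honoring resource availability.

```python
from itertools import cycle


def round_robin_dispatch(tasks, nodes):
    """7.x-style: hand tasks to nodes in turn, ignoring load and resources."""
    node_cycle = cycle(nodes)
    return {task["name"]: next(node_cycle)["name"] for task in tasks}


def resource_aware_dispatch(tasks, nodes):
    """8.x-style idea: least-loaded node that offers every required resource."""
    load = {node["name"]: 0 for node in nodes}
    assignment = {}
    for task in tasks:
        # A node is eligible only if it provides all resources the task needs.
        eligible = [n for n in nodes if task["requires"] <= n["resources"]]
        if not eligible:
            raise RuntimeError(f"no node provides {sorted(task['requires'])}")
        chosen = min(eligible, key=lambda n: load[n["name"]])
        load[chosen["name"]] += task["cost"]
        assignment[task["name"]] = chosen["name"]
    return assignment


# Hypothetical grid: only node02 has the SAP connector installed.
nodes = [
    {"name": "node01", "resources": {"oracle_client"}},
    {"name": "node02", "resources": {"oracle_client", "sap_connector"}},
]
tasks = [
    {"name": "s_load_sap", "requires": {"sap_connector"}, "cost": 2},
    {"name": "s_load_orders", "requires": {"oracle_client"}, "cost": 3},
    {"name": "s_load_items", "requires": {"oracle_client"}, "cost": 1},
]
```

With this data, round robin sends `s_load_sap` to node01, which lacks the connector it needs, while the resource-aware version places it on node02 and balances the remaining sessions by accumulated cost.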



Performance Improvements

Pushdown Optimization

Introduction

• What is pushdown optimization?
• Push transformation processing to data sources and targets without moving data out of the database

• Benefits
• Reduce movement of data when source and target are the
same database instance
• Utilize database-specific processing that may be more
optimal
• Maintain metadata and lineage in PowerCenter



Pushdown Optimization
• Full Pushdown:
• Source and target are in the same RDBMS
• All transformations can be processed in database

• Partial Source:
• One or more transformations can be processed in source database

• Partial Target:
• One or more transformations can be processed in target database

• Generated SQL:
• INSERT INTO t (…) VALUES (?+1, SOUNDEX(?))

[Diagram: Extract, Transform, and Load stages between the Source DB and the Target DB]
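
To make the idea of full pushdown concrete, the sketch below runs the same transformation two ways against an in-memory SQLite database: once by pulling rows out, transforming them in Python, and loading them back, and once as a single statement evaluated inside the database. The table names and the transformation (`id + 1`, `UPPER(name)`) are illustrative assumptions; `UPPER` stands in for `SOUNDEX`, which a default SQLite build does not provide. The SQL that PowerCenter actually generates is not shown in the source beyond the fragment above.

```python
import sqlite3


def load_without_pushdown(conn):
    """Extract rows, transform them in Python, then load them row by row."""
    rows = conn.execute("SELECT id, name FROM src ORDER BY id").fetchall()
    transformed = [(row_id + 1, name.upper()) for row_id, name in rows]
    conn.executemany("INSERT INTO tgt_engine (id, name) VALUES (?, ?)", transformed)


def load_with_full_pushdown(conn):
    """Let the database evaluate the same logic in one statement."""
    conn.execute(
        "INSERT INTO tgt_pushdown (id, name) SELECT id + 1, UPPER(name) FROM src"
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt_engine (id INTEGER, name TEXT);
    CREATE TABLE tgt_pushdown (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'smith'), (2, 'jones');
    """
)
load_without_pushdown(conn)
load_with_full_pushdown(conn)
engine_rows = conn.execute("SELECT id, name FROM tgt_engine ORDER BY id").fetchall()
pushdown_rows = conn.execute("SELECT id, name FROM tgt_pushdown ORDER BY id").fetchall()
```

Both targets end up with identical rows; the pushdown version simply never moves the data out of the database, which is the benefit the slides describe when source and target share an instance.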



Example – Full Pushdown
SQL & Business Logic Maintained in Repository



Flat File Performance & Parameter and
Variable Enhancements

Flat file enhancements

• FF Reader and Writer have been rewritten to optimize for performance
• Delimited files with lots of decimal data will see the most significant performance improvements
• Out-of-the-box performance improvements should be between 30% and 300%

• Append to flat file targets
• Session output can be appended to an existing flat file

• Flat file source/target command support
• Sources: use a command to generate source data or a file list that references multiple source files.
• Targets: use a command to process the target data or process data for all partitioned targets in a session.



Parameters and Variables Enhancements

• Parameter Enhancements
• Table owner name for relational sources/targets
• E-mail address
• FTP remote file name

• Global section specification in parameter files for use across different workflows / sessions
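
One way to picture a global section is as a fallback scope that every workflow and session can read from. The sketch below assumes an INI-like file with bracketed headings, a `[Global]` heading, and `name=value` lines; the heading text, the parameter names, and the "specific scope wins over global" precedence are illustrative assumptions rather than a statement of the exact PowerCenter file syntax.

```python
def parse_parameter_file(text):
    """Parse bracketed headings and name=value lines into {section: {name: value}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


def resolve(sections, scope, name):
    """Prefer the workflow/session scope, then fall back to the Global section."""
    for section in (scope, "Global"):
        if name in sections.get(section, {}):
            return sections[section][name]
    raise KeyError(name)


# Hypothetical file contents: one shared default plus one workflow override.
SAMPLE = """
[Global]
$$TableOwner=STAGE
$$NotifyEmail=etl-team@example.com

[Sales.WF:wf_load_orders]
$$TableOwner=SALES
"""

params = parse_parameter_file(SAMPLE)
```

Here `resolve(params, "Sales.WF:wf_load_orders", "$$TableOwner")` picks up the workflow's own value, while any other workflow asking for `$$TableOwner` or `$$NotifyEmail` gets the shared global one.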



Partitioning Enhancements

Partitioning Enhancements

• Flat File Partitioning
• FF targets can now be partitioned
• All partitions can write to a single file; alternatively, a merge file can be created, or a file list that contains the names of the individual files that were written

• Database Partitioning
• Partitioned Oracle and DB2 sources can be read in parallel
• No changes to targets. DB2 can be written to in parallel.

• Dynamic Partitioning
• Based on # of partitions in database
• Based on the # of nodes in a Grid



Auto Cache

© Informatica Corporation, 2006. All rights reserved.
AutoCache Overview

• Cache in PowerCenter v7
• Default cache settings not adequate for all situations.
• Default settings can underestimate new chip technologies.
• Sometimes necessary to hand tune individual transformations.
• Development did not always scale when deployed to different
production machines.

• Auto Cache in PowerCenter v8.x
• Automatically distribute session memory to transformations.
• Automatically scale memory usage based on available resources.
• Automatically scale memory usage based on mapping complexity.



Memory Attributes
• PowerCenter has two types of memory attributes:
• Transformation Memory Attributes
• Session Memory Attributes

• Transformation Memory Attributes are for individual transformations:
• Lookup, Aggregator, Rank, Joiner
• Index and Data Cache Size
• Sorter Cache Size
• XML Target Cache Size

• Session Memory Attributes are for the session:
• Default Buffer Block Size
• DTM Buffer Size



New Memory Attribute Specification

• Previously, only integer byte values were allowed for memory attributes, e.g. 1000000 or 2000000.
• The shortcuts “KB”, “MB”, and “GB” are now also allowed, e.g. 100MB.
• The value “Auto” is also allowed
• This indicates that the user wants PowerCenter to automatically find a good value for that memory attribute
• “Auto” is supported for both session memory attributes (e.g. DTM buffers/buffer block size) and transformation memory attributes (e.g. lookup caches)
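
The three accepted forms (plain byte counts, unit shortcuts, and “Auto”) could be interpreted along these lines. This is a sketch only: `parse_memory_attribute` is a hypothetical helper, and treating 1 KB as 1024 bytes is an assumption, since the slides do not say whether the shortcuts are binary or decimal multiples.

```python
import re

# Assumption: binary multiples (1 KB = 1024 bytes).
_UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
_PATTERN = re.compile(r"^(\d+)\s*(KB|MB|GB)?$", re.IGNORECASE)


def parse_memory_attribute(value):
    """Return a byte count, or None when the value is "Auto"."""
    text = value.strip()
    if text.lower() == "auto":
        return None  # caller decides the size later
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized memory attribute: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[(unit or "").upper()]
```

Returning `None` for “Auto” keeps the "let the engine decide" case distinct from any explicit size, so a later allocation step can tell the two apart.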



AutoCache
• Allows the user to leave the calculations to PowerCenter
• User specifies total amount of memory AutoCache is allowed to use
• Automatically computes a value for ALL memory attributes that have the value “Auto”
• Will NOT affect any memory attributes where the value is not “Auto”
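
The rule that only “Auto” attributes are computed, while explicit values are left alone, can be sketched as below. The equal split of the memory budget is purely an assumption for illustration; the slides do not describe how PowerCenter actually weighs transformations against each other. The function and attribute names are hypothetical.

```python
def distribute_auto_memory(attributes, total_auto_bytes):
    """Fill in every "Auto" attribute; leave explicit values untouched."""
    auto_names = [name for name, value in attributes.items() if value == "Auto"]
    resolved = dict(attributes)  # copy, so the caller's settings are not mutated
    if not auto_names:
        return resolved
    share = total_auto_bytes // len(auto_names)  # assumption: equal shares
    for name in auto_names:
        resolved[name] = share
    return resolved


# Hypothetical session: two caches left to AutoCache, one sized by hand.
session = {
    "lookup_index_cache": "Auto",
    "lookup_data_cache": "Auto",
    "sorter_cache": 8_000_000,
}
resolved = distribute_auto_memory(session, total_auto_bytes=512 * 1024 * 1024)
```

In this example the two lookup caches each receive half of the 512 MB budget, and the hand-tuned sorter cache keeps its 8,000,000 bytes.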



Cache Calculator

• Click the drop-down

• Calculate based on the number of rows and the ports going into the object

• Value is propagated into the Cache value



Developer Improvements

Functions and Expressions

Function Enhancements

• Over 20 new functions added in the 8.x release
• Financial Functions, Regular Expression parsing/match, IN(), Compression, Encryption, CRC, MD5 and more

• Custom Functions
• Extend the functionality of the Expression Transformation
via a C API
• All 20+ functions above were added via this API



Function Enhancements
• User Defined Functions (UDF)
• Ability for Designer users to create reusable functions
entirely within the Expression Language
• UDFs are folder level objects
• UDFs can use any valid function (except aggregation functions) as well as other UDFs (in the same folder)



Java & SQL Transformations

Java Transformation Use Cases

• Looping over data
• Walking data hierarchies
• Calling third-party APIs (Java based)
• Calling RMI/EJB etc.
• Other Java Packages

• Calling expression/UDF/unconnected widget (like lookup) from Custom Transformation
• Simple “Custom Transformation”



Improved Developer Productivity
Java Inline Coding Sample



SQL Transformation Use Cases

• New SQL Transformation
• Allows PowerCenter developers to execute SQL statements midstream in a mapping.
• It can insert, delete, update, and retrieve rows from a database, and it returns database errors.
• The SQL that is executed can be static, or it can be dynamic, where the SQL statement is itself created on a row-by-row basis.
• The SQL transformation can also be used to execute SQL scripts from within a mapping, e.g. to leverage SQL scripts that already exist
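
The static versus dynamic distinction can be illustrated outside PowerCenter with a small SQLite sketch. In the static case the statement text is fixed and each input row only supplies bind values; in the dynamic case each input row carries the statement itself, and any database error is returned as an output value instead of stopping the run. Table, column, and function names here are hypothetical.

```python
import sqlite3


def run_static_query(conn, rows):
    """Static SQL: one fixed statement, each input row bound as parameters."""
    out = []
    for (cust_id,) in rows:
        found = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (cust_id,)
        ).fetchone()
        out.append(found[0] if found else None)
    return out


def run_dynamic_query(conn, rows):
    """Dynamic SQL: the statement text itself arrives with each input row."""
    out = []
    for (statement,) in rows:
        try:
            conn.execute(statement)
            out.append(None)  # no error for this row
        except sqlite3.Error as exc:
            out.append(str(exc))  # surface the database error as an output value
    return out


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

names = run_static_query(conn, [(1,), (3,)])
errors = run_dynamic_query(
    conn,
    [
        ("UPDATE customers SET name = 'Initech' WHERE id = 2",),
        ("DELETE FROM no_such_table",),
    ],
)
```

Because dynamic statements are executed as received, they are only appropriate when the statement text comes from a trusted source; the static, parameter-bound form is the safer default.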



XML

XML Enhancements

• Filter data with query predicate
• Create a default namespace
• Import part of an XML schema
• Use anySimpleType



Metadata Enhancements

Metadata Exchange Enhancements

• New Data Model Support
• Sybase PowerDesigner – bi-directional
• Oracle Designer – bi-directional
• ER Studio Design Tool – uni-directional (same as before)
• CA ERwin – bi-directional

• Business Intelligence Support
• Business Objects (bi-directional) – added 6.5, XI, and XI R2 XConnects
• Cognos ReportNet Framework Manager (bi-directional) – added 2.0
• MicroStrategy (bi-directional) – added 8.0



Dynamic Target Creation

Dynamic Target creation
• Ability to dynamically create a target based on a
transformation in the workspace or navigator
• Right-click on a transformation in the workspace and select Create and Add Target
• Drag a transformation and drop it in the Target folder
• Has same port definitions as transformation from which it
was created
• Target type is same as repository you are using
• Can edit the target definition to change type or ports
• Creation dialog will be added in an upcoming release



Improved Developer Productivity
Target Generation

Simply right-click on an object…

…Target is created! All you need to do is Auto link and you are ready to go



Mapping Generation Option
Visio Client for PowerCenter

Mapping Generation Option

• Bi-directional “engine” for automatically generating mappings from Visio templates or reverse engineering PowerCenter mappings into Visio templates
• Leverages the Informatica Data Stencil and Velocity templates for Visio



Visio Client for PowerCenter

Mapping Template

Template Inputs



Upgrade Wizard

PowerCenter Upgrade to 8.1

• A new Upgrade wizard in Admin Console
• Integrated UI that takes the user through the various steps in the upgrade
• Provides a detailed upgrade summary report at the end
• Allows the user to switch in and out of the Upgrade UI to perform any other administrative activities
• Can handle multiple repositories (global/local) and multiple PowerCenter Servers in one pass
• Live feedback during repository upgrade as the user goes through the upgrade process

• A new post-upgrade reference guide



Summary

Summary - PC 7 vs. PC 8

PC 7.x
• 3 Tier Architecture
• Basic Grid Deployment
• Introduction to Profiling
• Added Transformations
  • Union
  • XML
  • Web Services
• Team Based Development

PC 8.x
• Services Oriented Architecture
• Enhanced Grid Deployment
• High Availability
• Session on Grid
• Resilience
• Enhanced Profiling
• Added Transformations
  • Java
  • SQL
• Enhanced Productivity
  • Mapping Generation
  • User Defined Functions



Thank You
Questions at the break
