Informatica Best Practices

Challenge

Using Informatica’s suite of metadata tools effectively in the design of the end-user
analysis application.

Description

The levels of metadata available in the Informatica tool suite are quite extensive,
and the amount of metadata entered depends on the business requirements.
Description information can be entered for all repository objects: sources, targets,
transformations, and so on. You can also drill down to the column level and describe
the individual columns in a table if necessary, and all information about column
size and scale, data types, and primary keys is stored in the repository. The
decision on how much metadata to create is often driven by project timelines: while
it may be beneficial for a developer to enter detailed descriptions of each column,
expression, variable, etc., doing so requires a substantial amount of time.
Therefore, this decision should be made on the basis of how much metadata will
actually be required by the systems that consume it.

Informatica offers two recommended ways for accessing the repository metadata.

• Effective with the release of version 5.0, Informatica PowerCenter contains a
Metadata Reporter. The Metadata Reporter is a web-based application that
allows you to run reports against the repository metadata.

• Because Informatica does not support or recommend direct reporting access
to the repository, even for select-only queries, the second way to report on
repository metadata is through views written using Metadata Exchange
(MX). These views can be found in the Informatica Metadata Exchange (MX)
Cookbook; a query sketch follows this list.
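
For illustration only, the sketch below reports on mappings per folder by querying
one such view over JDBC. The view and column names (REP_ALL_MAPPINGS,
SUBJECT_AREA, MAPPING_NAME), the connection URL, and the credentials are
assumptions, not guaranteed definitions; confirm the actual views for your
repository version in the MX Cookbook and use the JDBC driver for your
repository's database platform.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Minimal sketch: report mappings per folder through an MX view rather
    // than the underlying repository tables. View/column names and the
    // connection URL are illustrative assumptions -- see the MX Cookbook.
    public class MxViewReport {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@repo-host:1521:REPDB"; // placeholder URL
            try (Connection con = DriverManager.getConnection(url, "repo_user", "repo_pwd");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT SUBJECT_AREA, MAPPING_NAME FROM REP_ALL_MAPPINGS "
                         + "ORDER BY SUBJECT_AREA, MAPPING_NAME")) {
                while (rs.next()) {
                    System.out.println(rs.getString("SUBJECT_AREA") + " : "
                        + rs.getString("MAPPING_NAME"));
                }
            }
        }
    }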

Metadata Reporter

The need for the Informatica Metadata Reporter arose from the number of clients
requesting custom and complete metadata reports from their repositories. The
Metadata Reporter allows report access to every Informatica object stored in the
repository. The architecture of the Metadata Reporter is web-based, with an
Internet browser front end. You can install the Metadata Reporter on a server running either
UNIX or Windows that contains a supported web server. The Metadata Reporter
contains servlets that must be installed on a web server that runs the Java Virtual
Machine and supports the Java Servlet API. The currently supported web servers
are:

• iPlanet 4.1 or higher
• Apache 1.3 with JServ 1.1
• JRun 2.3.3

(Note: The Metadata Reporter will not run directly on Microsoft IIS because IIS does
not directly support servlets.)

The Metadata Reporter is accessible from any computer with a browser that has
access to the web server where the Metadata Reporter is installed, even without the
other Informatica Client tools being installed on that computer. The Metadata
Reporter connects to your Informatica repository using JDBC drivers. Make sure the
proper JDBC drivers are installed for your database platform.

(Note: You can also use the JDBC-to-ODBC bridge to connect to the repository; the
URL takes the form jdbc:odbc:<DSN>, as sketched below.)
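
As a minimal sketch of that note, the following connects through the bridge.
"RepositoryDSN" and the credentials are placeholders for your own ODBC DSN and
repository login. The sun.jdbc.odbc bridge driver shipped with JREs of this era
but was removed in Java 8, so current installations should use a native JDBC
driver for the database platform instead.

    import java.sql.Connection;
    import java.sql.DriverManager;

    // Sketch of a repository connection via the JDBC-to-ODBC bridge.
    // "RepositoryDSN" is a placeholder for an ODBC DSN that points at the
    // repository database; the user and password are placeholders as well.
    public class RepositoryBridgeConnect {
        public static void main(String[] args) throws Exception {
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); // register the bridge driver
            Connection con = DriverManager.getConnection(
                    "jdbc:odbc:RepositoryDSN", "repo_user", "repo_pwd");
            System.out.println("Connected: " + !con.isClosed());
            con.close();
        }
    }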

Although the Repository Manager provides a number of Crystal Reports, the
Metadata Reporter has several benefits:

• The Metadata Reporter is comprehensive. You can run reports on any
repository. The reports provide information about all types of metadata
objects.

• The Metadata Reporter is easily accessible. Because the Metadata Reporter is
web-based, you can generate reports from any machine that has access to
the web server where the Metadata Reporter is installed. You do not need
direct access to the repository database, your sources or targets, or the
PowerMart or PowerCenter client tools.

• The reports in the Metadata Reporter are customizable. The Metadata
Reporter allows you to set parameters for the metadata objects to include in
the report.

• The Metadata Reporter allows you to go easily from one report to another.
The name of any metadata object that displays on a report links to an
associated report. As you view a report, you can generate reports for objects
on which you need more information.

The Metadata Reporter provides 15 standard reports that can be customized with the
use of parameters and wildcards. The reports are as follows:

• Batch Report
• Executed Session Report
• Executed Session Report by Date
• Invalid Mappings Report
• Job Report
• Lookup Table Dependency Report
• Mapping Report
• Mapplet Report
• Object to Mapping/Mapplet Dependency Report
• Session Report
• Shortcut Report
• Source Schema Report
• Source to Target Dependency Report
• Target Schema Report
• Transformation Report

For a detailed description of how to run these reports, consult the Metadata Reporter
Guide included in your PowerCenter Documentation.

Metadata Exchange: The Second Generation (MX2)

The MX architecture was intended primarily for Business Intelligence (BI) vendors
who wanted to create a PowerCenter-based data warehouse and then display the
warehouse metadata through their own products. The result was a set of relational
views that encapsulated the underlying repository tables while exposing the
metadata in several categories that were more suitable for external parties. Today,
Informatica and several key vendors, including Brio, Business Objects, Cognos, and
MicroStrategy, are effectively using the MX views to report and query the Informatica
metadata.

Informatica currently supports the second generation of Metadata Exchange called
MX2. Although the overall motivation for creating the second generation of MX
remains consistent with the original intent, the requirements and objectives of MX2
supersede those of MX.

The primary requirements and features of MX2 are:

Incorporation of object technology in a COM-based API. Although SQL
provides a powerful mechanism for accessing and manipulating records of data in a
relational paradigm, it’s not suitable for procedural programming tasks that can be
achieved by C, C++, Java, or Visual Basic. Furthermore, the increasing popularity
and use of object-oriented software tools require interfaces that can fully take
advantage of the object technology. MX2 is implemented in C++ and offers an
advanced object-based API for accessing and manipulating the PowerCenter
Repository from various programming languages.

Self-contained Software Development Kit (SDK). One of the key advantages of
the MX views is that they are part of the repository database and thus can be used
independently of any of Informatica's software products. The same requirement
also holds for MX2, leading to the development of a self-contained API Software
Development Kit that can be used independently of the client or server products.

Extensive metadata content, especially multidimensional models for OLAP. A
number of BI tools and upstream data warehouse modeling tools require complex
multidimensional metadata, such as hierarchies, levels, and various relationships.


This type of metadata was specifically designed and implemented in the repository to
accommodate the needs of our partners by means of the new MX2 interfaces.

Ability to write (push) metadata into the repository. Because of the limitations
associated with relational views, MX could not be used for writing or updating
metadata in the Informatica repository. As a result, such tasks could only be
accomplished by directly manipulating the repository’s relational tables. The MX2
interfaces provide metadata write capabilities along with the appropriate verification
and validation features to ensure the integrity of the metadata in the repository.

Complete encapsulation of the underlying repository organization by means
of an API. One of the main challenges with MX views and the interfaces that access
the repository tables is that they are directly exposed to any schema changes of the
underlying repository database. As a result, maintenance of the MX views and direct
interfaces becomes a major undertaking with every major upgrade of the repository.
MX2 alleviates this problem by offering a set of object-based APIs that are
abstracted away from the details of the underlying relational tables, thus providing
an easier mechanism for managing schema evolution.

Integration with third-party tools. MX2 offers the object-based interfaces needed
to develop more sophisticated procedural programs that can tightly integrate the
repository with the third-party data warehouse modeling and query/reporting tools.

Synchronization of metadata based on changes from upstream and downstream
tools. Given that metadata will reside in different databases and files in a
distributed software environment, synchronizing changes and updates ensures the
validity and integrity of the metadata. The object-based technology used in MX2
provides the infrastructure needed to implement automatic metadata synchronization
and change propagation across different tools that access the Informatica
Repository.

Interoperability with other COM-based programs and repository interfaces.
MX2 interfaces comply with Microsoft's Component Object Model (COM)
interoperability protocol. Therefore, any existing or future program that is
COM-compliant can seamlessly interface with the Informatica Repository by means
of MX2.

Support for Microsoft’s UML-based Open Information Model (OIM). The
Microsoft Repository and its OIM schema, based on the standard Unified Modeling
Language (UML), could become a de facto general-purpose repository standard.
Informatica has worked in close cooperation with Microsoft to ensure that the logical
object model of MX2 remains consistent with the data warehousing components of
the Microsoft Repository. This also facilitates robust metadata exchange with the
Microsoft Repository and other software that support this repository.

Framework to support a component-based repository in a multi-tier
architecture. With the advent of the Internet and distributed computing, multi-tier
architectures are becoming more widely accepted for accessing and managing
metadata and data. The object-based technology of MX2 supports a multi-tier
architecture so that a future Informatica Repository Server could be accessed from a
variety of thin client programs running on different operating systems.


MX2 Architecture

MX2 provides a set of COM-based programming interfaces on top of the C++ object
model used by the client tools to access and manipulate the underlying repository.
This architecture not only encapsulates the physical repository structure, but also
leverages the existing C++ object model to provide an open, extensible API based
on the standard COM protocol. MX2 can be automatically installed on Windows 95,
98, or Windows NT using the install program provided with its SDK. After the
successful installation of MX2, its interfaces are automatically registered and
available to any software through standard COM programming techniques. The MX2
COM APIs support the PowerCenter XML Import/Export feature and provide a
COM-based programming interface with which to import and export repository objects.


Naming Conventions

Challenge

Choosing a good naming standard for the repository and adhering to it.

Description

Repository Naming Conventions

Although naming conventions are important for all repository and database objects,
the suggestions in this document focus on the former. Choosing a convention and
sticking with it is the key point, and sometimes the most difficult one. It is
important to note that a good naming convention helps facilitate a smooth
migration and improves readability for anyone reviewing the processes.

FAQs

The following paragraphs present some of the questions that typically arise in
naming repositories and suggest answers:

Q: What are the implications of numerous repositories or numerous folders within a
repository, given that multiple development groups need to use the PowerCenter
server, and each group works independently?

• One consideration for naming conventions is how to segregate different
projects and data mart objects from one another. Whenever an object is
shared between projects, the object should be stored in a shared work area
so that each of the individual projects can use a shortcut to the object.
Also keep in mind that mappings are listed in alphabetical order, so
consistent naming keeps related objects grouped together.

Q: What naming convention is recommended for Repository Folders?

• Something specific (e.g., Company_Department_Project-Name_Prod) is
appropriate if multiple repositories are expected for various projects and/or
departments.


Note that incorporating functions in the object name makes the name more
descriptive at a higher level. The drawback is that when an object needs to be
modified to incorporate some other business logic, the name no longer accurately
describes the object. Use descriptive names cautiously and at a high enough level. It
is not advisable to rename an object that is currently being used in a production
environment.

The following tables illustrate some naming conventions for transformation objects
(e.g., sources, targets, joiners, lookups, etc.) and repository objects (e.g.,
mappings, sessions, etc.).

Transformation Objects Naming Convention

Advanced External Procedure Transform: aep_ProcedureName
Aggregator Transform: agg_TargetTableName(s) that leverage the expression, and/or a name that describes the processing being done.
Expression Transform: exp_TargetTableName(s) that leverage the expression, and/or a name that describes the processing being done.
External Procedure Transform: ext_ProcedureName
Filter Transform: fil_TargetTableName(s) that leverage the expression, and/or a name that describes the processing being done.
Joiner Transform: jnr_SourceTable/FileName1_SourceTable/FileName2
Lookup Transform: lkp_LookupTableName
Mapplet: mplt_Description
Mapping Variable: $$Function or process being performed
Mapping Parameter: $$Function or process being performed
Normalizer Transform: nrm_TargetTableName(s) that leverage the expression, and/or a name that describes the processing being done.
Rank Transform: rnk_TargetTableName(s) that leverage the expression, and/or a name that describes the processing being done.
Router: rtr_TARGETTABLE that leverages the expression, and/or a name that describes the processing being done. Group Name: Function_TargetTableName(s) (e.g., INSERT_EMPLOYEE or UPDATE_EMPLOYEE)
Sequence Generator: seq_Function
Source Qualifier Transform: sq_SourceTable1_SourceTable2
Stored Procedure Transform: sp_StoredProcedureName
Update Strategy Transform: upd_TargetTableName(s) that leverage the expression, and/or a name that describes the processing being done.

Repository Objects Naming Convention

Mapping Name: m_TargetTable1_TargetTable2
Session Name: s_MappingName
Batch Names: bs_BatchName for a sequential batch and bc_BatchName for a concurrent batch.
Folder Name: Folder names should logically group sessions and mappings. The grouping can be based on project, subject area, promotion group, or some combination of these.
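
Conventions like these are easiest to keep when they are checked mechanically.
The sketch below is not part of PowerCenter; it merely illustrates validating
object names (however you obtain them, e.g., from an XML export or an MX view
query) against a prefix table like the ones above. The rule set shown is a
partial, assumed sample.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Pattern;

    // Illustrative naming-convention check against the prefix tables above.
    // The object names would normally come from an XML export or an MX view
    // query; here two hard-coded examples stand in for them.
    public class NamingConventionCheck {
        private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
        static {
            RULES.put("Mapping", Pattern.compile("m_\\w+"));
            RULES.put("Session", Pattern.compile("s_\\w+"));
            RULES.put("Expression", Pattern.compile("exp_\\w+"));
            RULES.put("Lookup", Pattern.compile("lkp_\\w+"));
            RULES.put("Aggregator", Pattern.compile("agg_\\w+"));
        }

        static boolean isValid(String objectType, String name) {
            Pattern p = RULES.get(objectType);
            return p != null && p.matcher(name).matches();
        }

        public static void main(String[] args) {
            System.out.println(isValid("Mapping", "m_CUSTOMER_DIM")); // true
            System.out.println(isValid("Lookup", "LookupCustomer"));  // false
        }
    }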

Target Table Names

There are often several instances of the same target, usually because of different
actions. When looking at a session run, each instance appears with its own
successful rows, failed rows, etc. To make observing a session run easier, targets
should be named according to the action being executed on that target.

For example, if a mapping has four instances of the CUSTOMER_DIM table, one for
each update strategy action (Update, Insert, Reject, Delete), the instances should
be named as follows:

• CUSTOMER_DIM_UPD
• CUSTOMER_DIM_INS
• CUSTOMER_DIM_DEL
• CUSTOMER_DIM_REJ

Port Names

Port names should remain the same as in the source unless some other action is
performed on the port. In that case, the port should be prefixed with the
appropriate name.

When you bring a source port into a lookup or expression, the port should be
prefixed with "IN_". This helps the user immediately identify input ports without
having to line up the ports with the input checkbox. It is also a good idea to
prefix generated output ports; this helps trace the port value throughout the
mapping as it may travel through many other transformations. For variables inside
a transformation, use the prefix 'var_' plus a meaningful name.

Batch Names

Batch names follow basically the same rules as session names. A prefix such as
'b_' should be used, and a suffix should indicate whether the batch is serial or
concurrent.

Batch / Session Postfixes

init_load: Initial load; indicates that this session should be used only once, to load initial data to the targets.
incr_load: Incremental load; an update of the target, normally run periodically.
wkly: Indicates a weekly run of this session/batch.
mtly: Indicates a monthly run of this session/batch.

Shared Objects


Any object within a folder can be shared. These objects are sources, targets,
mappings, transformations, and mapplets. To share objects in a folder, the folder
must be designated as shared. Once the folder is shared, the users are allowed to
create shortcuts to objects in the folder.

If you have an object that you want to use in several mappings or across multiple
folders, like an Expression transformation that calculates sales tax, you can place
the object in a shared folder. You can then use the object in other folders by
creating a shortcut to it. In this case the naming convention is the prefix 'SC_',
for instance SC_mltCREATION_SESSION or SC_DUAL.

ODBC Data Source Names

Set up all Open Database Connectivity (ODBC) data source names (DSNs) the same
way on all client machines. PowerCenter uniquely identifies a source by its Database
Data Source (DBDS) and its name. The DBDS is the same name as the ODBC DSN
since the PowerCenter Client talks to all databases through ODBC.

If ODBC DSNs differ across machines, there is a risk of analyzing the same table
under different names. For example, machine 1 has ODBC DSN Name0 pointing to
database1; TableA is analyzed on machine 1 and is uniquely identified as
Name0.TableA in the repository. Machine 2 has ODBC DSN Name1 pointing to the same
database1; TableA is analyzed on machine 2 and is identified as Name1.TableA. The
result is that the repository may refer to the same object by multiple names,
creating confusion for developers, testers, and potentially end users.

Also, refrain from using environment tokens in the ODBC DSN. For example, do not
call it dev_db01. As you migrate objects from dev, to test, to prod, you are likely to
wind up with source objects called dev_db01 in the production repository. ODBC
database names should clearly describe the database they reference to ensure that
users do not incorrectly point sessions to the wrong databases.

Database Connection Information

A good convention for database connection information is
UserName_ConnectString. Be careful not to include machine names or
environment tokens in the Database Connection Name. Database Connection names
must be very generic to be understandable and enable a smooth migration.

Using a convention like User1_DW allows you to know who the session is logging in
as and to what database. You should know which DW database, based on which
repository environment, you are working in. For example, if you are creating a
session in your QA repository using connection User1_DW, the session will write to
the QA DW database because you are in the QA repository.

Using this convention will allow for easier migration if you choose to use the Copy
Folder method. When you use Copy Folder, session information is also copied. If the
Database Connection information does not already exist in the folder you are copying
to, it is also copied. So, if you use connections with names like Dev_DW in your
development repository, they will eventually wind up in your QA, and even in your
Production repository as you migrate folders. Manual intervention would then be
necessary to change connection names, user names, passwords, and possibly even
connect strings. Instead, if you have a User1_DW connection in each of your three
environments, when you copy a folder from Dev to QA, your sessions will
automatically hook up to the connection that already exists in the QA repository.
Now, your sessions are ready to go into the QA repository with no manual
intervention required.
