Velocity v8

Data Warehousing Methodology

Data Warehousing
Executive Summary
Data Warehousing, once dedicated to business intelligence and reporting, and usually at the departmental or business unit level, is today becoming a strategic corporate initiative supporting an entire enterprise across a multitude of business applications. This brisk pace of change, coupled with industry consolidation and regulatory requirements, demands that data warehouses step into a mission-critical, operational role. Information Technology (IT) plays a crucial role in delivering the data foundation for key performance indicators such as revenue growth, margin improvement and asset efficiency at the corporate, business unit and departmental levels. And IT now has the tools and methods to succeed at any of these levels. An enterprise-wide, integrated hub is the most effective approach to track and improve fundamental business measures. It is not only desirable, it is necessary and feasible. Here are the reasons why:

● The traditional approach of managing information across divisions, geographies, and segments through manual consolidation and reconciliation is error-prone and cannot keep pace with the rapid changes and stricter mandates in the business.
● The data must be trustworthy. Executive officers are responsible for the accuracy of the data used to make management decisions, as well as for financial and regulatory reporting.
● Technologies have matured to the point where industry leaders are reaping the benefits of enterprise-wide data solutions, increasing their understanding of the market, and improving their agility.

Organizations may choose to implement different levels of Data Warehouses, from line-of-business implementations to Enterprise Data Warehouses. As the size and scope of a warehouse increase, so do the complexity, risk and effort. For those that achieve an Enterprise Data Warehouse, the benefits are often the greatest. However, an organization must be committed to delivering an Enterprise Data Warehouse and must ensure that the resources, budget and timeline are sufficient to overcome the organizational hurdles to having a single repository of corporate data assets.


Business Drivers
The primary business drivers behind a data warehouse project vary and can be very organization specific. However, a few generalities can be evidenced as trends across most organizations. Below are some of the key drivers typically responsible for driving data warehouse projects.

Desire for a ‘360 degree view’ around customers, products, or other subject areas
In order to make effective business decisions and have meaningful interactions with customers, suppliers and other partners, it is important to gather information from a variety of systems to provide a ‘360 degree view’ of the entity. For example, consider a software company looking to provide a 360 degree view of its customers. Providing this view may require gathering and relating sales orders, prospective sales interactions, maintenance payments, support calls and services engagements. Merged together, these items paint a more complete picture of a particular customer’s value and interaction with the organization. The challenge is that in any organization this data might reside in numerous systems with different customer codes and structures across different technologies, making the creation of a single report programmatically nearly impossible. Thus a need arises for a centralized location to merge and rationalize this data for easy reporting - such as a Data Warehouse.

Desire to provide intensive analytics reporting without impacting operational systems
Operational systems are built and tuned for the best operational performance possible. A slowdown in an order entry system may cost a business lost sales and decreased customer satisfaction. Given that analytic reporting often requires summarizing and gathering large amounts of information, queries against operational systems for analytic purposes are usually discouraged and even outright prohibited for fear of impacting system performance. One key value of a data warehouse is the ability to access large data sets for analytic purposes while remaining physically separated from operational systems. This ensures that operational system performance is not adversely affected by analytic work and that business users are free to crunch large data sets and metrics without impacting daily operations.

Maintaining or generating historical records
In most cases, operational systems only store current state information on orders, transactions, customers, products and other data. Historical information has little use in the operational world.


Point of sale transactions, for example, may be purged from operational systems after 30 days when the return policy expires. When organizations have a need for historical reporting, it is often difficult or impossible to gather historical values from operational systems due to their very nature. By implementing a Data Warehouse where data is pulled in on a specified interval, historical values and information can be retained in the warehouse for any length of time an organization determines necessary. Data can also be stored and organized more efficiently for easy retrieval for analytical purposes.

Standardizing on common definitions of corporate metrics across organizational boundaries
As organizations grow, different areas of an organization may develop their own interpretations of business definitions and objects. To one group, a customer might mean anyone who purchased something from the website, while another group believes any business or individual that received services is a customer. In order to standardize reporting and consolidation across these areas, organizations will embark on a data warehouse project to define and calculate these metrics in a common fashion across the organization.

There are many other specific business drivers that can spur the need for a Data Warehouse. However, these are some of the most common seen across most organizations.
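As a purely illustrative aside, the 'common definitions' driver can be sketched in a few lines of Python: a single, agreed definition of an active customer is applied to records coming from different systems, so every group reports the same number. The field names and the 12-month rule below are invented for this sketch and are not taken from the Velocity methodology.

    # Illustrative only: one shared definition of "active customer" applied to
    # records from every feeding system, so all groups report the same figure.
    def is_active_customer(record):
        # A customer counts as active if they bought or received services in the last 12 months.
        return (record.get("months_since_purchase", 999) <= 12
                or record.get("months_since_service", 999) <= 12)

    web_orders = [{"customer_id": "C1", "months_since_purchase": 3}]
    field_sales = [{"customer_id": "C2", "months_since_service": 8},
                   {"customer_id": "C3", "months_since_service": 40}]

    active = {r["customer_id"] for r in web_orders + field_sales if is_active_customer(r)}
    print("Active customers:", sorted(active))  # both groups arrive at the same answer: C1, C2

In a real warehouse the same principle is applied in the data integration layer, where the shared rule is implemented once and reused by every downstream data mart and report.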

Key Success Factors
To ensure success for a Data Warehouse implementation, there are key success factors that must be kept in mind throughout the project. Many times data warehouses are built by IT staff that have been pulled or moved from other implementation efforts such as system implementations and upgrades. In these cases, the process for implementing a Data Warehouse can be quite a change from past IT work. These Key Success Factors point out important topics to consider as you begin project planning.

Understanding Key Characteristics of a Data Warehouse
When embarking on a Data Warehouse project it is important to recognize and keep in mind the key differentiators between a Data Warehouse project and a typical system implementation. Some of these differences are:

● Data Sources are from many disparate systems, internal and external to the organization
● Data models are used to understand relationships and business rules within the data
● Data volumes for both Data Integration and analytic reporting are high
● Historical data is maintained, often for periods of years
● Data is often stored as both detailed level data and summarized or aggregated data
● The underlying database system is tuned for querying large volumes of data rather than for inserting single transaction data
● Data Warehouse data supports tactical and strategic decision making, rather than operational processing
● A successful Data Warehouse is business driven. The goal of any Data Warehouse is to identify the business information needs
● Data Warehouse applications can have enterprise-wide impact and visibility and often enable reporting at the highest levels within an organization
● As an enterprise level application, executive level support is vital to success

Typically, data must be modeled, structured and populated in a relational database before it is available to a Data Warehouse reporting project. Data Integration processes are designed against the available Operational Application sources to pull, cleanse, transform and populate an Enterprise Subject Area Database. Once the data is present in the Subject Area Database, projects can fulfill their requirements to provide Business Intelligence reporting to the Data Warehouse end users. This is done by identifying detailed reporting requirements and designing corresponding Business Intelligence Data Marts that capture the facts and dimensions, at the proper grain, needed for reporting. These Data Marts are then populated through a Data Integration process and coupled to the reporting components developed in the Business Intelligence tool.
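The flow described above, from operational sources into a subject area database and on into dimensional data marts, can be illustrated with a small, hypothetical Python sketch. The table names, columns and the use of an in-memory SQLite database are invented for illustration only; in a Velocity project this work is performed with PowerCenter mappings against the enterprise database platform.

    # Illustrative only: a tiny subject-area load into one dimension and one fact
    # table, using an in-memory SQLite database. All names are invented.
    import sqlite3

    source_orders = [  # rows as they might arrive from an operational extract
        {"order_id": 1, "cust_code": "A-100", "cust_name": "Acme Corp", "amount": 1200.0, "order_date": "2008-01-15"},
        {"order_id": 2, "cust_code": "A-100", "cust_name": "Acme Corp", "amount": 450.0, "order_date": "2008-02-03"},
        {"order_id": 3, "cust_code": "B-200", "cust_name": "Bolt Ltd", "amount": 980.0, "order_date": "2008-02-10"},
    ]

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, cust_code TEXT UNIQUE, cust_name TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, customer_key INTEGER, order_date TEXT, amount REAL);
    """)

    for row in source_orders:
        # Populate the dimension once per natural key, then reuse its surrogate key.
        conn.execute("INSERT OR IGNORE INTO dim_customer (cust_code, cust_name) VALUES (?, ?)",
                     (row["cust_code"], row["cust_name"]))
        (customer_key,) = conn.execute("SELECT customer_key FROM dim_customer WHERE cust_code = ?",
                                       (row["cust_code"],)).fetchone()
        # Facts are kept at the detail grain (one row per order) and carry the surrogate key.
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     (row["order_id"], customer_key, row["order_date"], row["amount"]))

    # A report can now roll the detail up by customer.
    for name, total in conn.execute(
            "SELECT d.cust_name, SUM(f.amount) FROM fact_sales f "
            "JOIN dim_customer d USING (customer_key) GROUP BY d.cust_name"):
        print(name, total)

The point of the sketch is the separation of grain: the dimension holds one row per customer, the fact table holds detail-level transactions, and reporting rolls the detail up as needed.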

Understanding Common Data Warehouse Project Types
Not every Data Warehouse project is a brand new implementation of a Data Warehouse. Often warehouses are deployed in phases, where subsequent implementations simply add new subject areas, new data sources or enhanced reporting to the existing solution. General categories of Data Warehouse projects are defined below, along with key considerations for each.

New Business Data Project
This type of project addresses the need to gather data from an area of the enterprise where no prior familiarity with the business data or requirements exists. All project components are required.


Logical data modeling is a crucial step in this type of project, as there is a need to thoroughly understand and model the data requirements from a business perspective. The logical data model serves as the fundamental foundation and blueprint upon which all follow-on project work will be based. If it is not properly and sufficiently addressed, ultimate success for this type of project will be difficult to achieve, and re-work after the fact will be costly from both a time and money perspective. Physical data modeling and data discovery components will drive the identification and design of the new database requirements and the new data sources. New Data Integration processes must be created to bring new data into new Data Warehouse database structures. A set of history loads may be required to backload the data and bring it up to the current timeline. New Dimensional Data Mart and BI Reporting offerings must be modeled, designed and implemented to satisfy the user information and access needs.

Enhanced Data Source Project
This type of project addresses the need to add a new data source or to alter an existing data source, but always within the context of already established logical data structures and definitions. No logical data modeling is needed because no new business data requirements are being entertained. Minor adjustments to the physical model and database may be needed to accommodate changes in volume due to the new source, or new or altered views may be needed to report on the new data instances that may now be available to the users. Data discovery analysis comprises a key portion of this type of project, as do the corresponding new or altered data integration processes that move the data to the database. Business intelligence reports and queries may need to change to incorporate new views or expanded drill-downs and data value relationships. Back loading historical data may also be required. When enhancing existing data, metadata management efforts to track data from the physical data sources through the data integration process to business intelligence data marts and reports can assist with impact analysis and scoping efforts.

Enhanced Business Intelligence Requirements-Only Project
This type of project is focused solely on the expansion or alteration of the business intelligence reporting and query capability using existing subject area data. This type of project does not entertain the introduction of any new or altered data (in structure or content) to the warehouse subject area database. New or altered dimensional data mart tables/views may be required to support the business intelligence enhancements; otherwise the majority, if not all, of the work is within the business intelligence development component.



Executive Support and Organizational Buy-In
Successful Data Warehouse projects are usually characterized by strong organizational commitment to the delivery of enterprise analytics. The value and return on investment (ROI) must be clearly articulated early in the project, and an acknowledgement of the cost and time to achieve results needs to be fully explored and understood. Typically, Data Warehouse project efforts involve a steering committee of executives and business leads that drives the priority and overall vision for the organization.

Executive commitment is needed not only for assigning appropriate resources and budget, but also to assist the data warehouse team in breaking down organizational barriers. It is not uncommon for a data warehouse team to encounter challenges in getting access to the data and systems necessary to build the data warehouse. Operational system owners are focused on their system and its primary function and may have little interest in making their data available for warehousing efforts. At these times, the Data Warehouse steering committee can step in or rally executive support to break down these barriers in the organization.

It is important to assess the business case and executive sponsorship early on in the Data Warehouse project. The project is at risk if the business value of the warehouse cannot be articulated at the executive level and on down through the organization. If executives do not have a clear picture of how the data warehouse will impact their business and the value it will provide, it won't be long before a decision is made to reduce or stop funding the effort.

Enterprise Vision and Milestone Delivery
The Data Warehouse team should always keep the end goal in mind for an enterprise-wide data warehouse. Often an enterprise data warehouse will strive to achieve a ‘single source of truth’ across the entire enterprise and across all data stores. Delivering this in a ‘big bang’ approach nearly always fails. By the time all of the enterprise modeling, data rationalization and data integration have taken place across all facets of the organization, the value of the project is called into question, and the project is either delayed or cancelled. In order to keep an Enterprise Data Warehouse on track, phases of deployment should be scheduled to provide value quickly and continuously throughout the lifecycle of the project.


It is important for project teams to find areas of high business value that can be delivered quickly and then build upon that success as the enterprise vision is realized. The string of regular success milestones and business value keeps the executive sponsorship engaged and proves the value of the Data Warehouse to the organization early and often. The key is that while these short-term milestones are delivered, the Data Warehouse team should not lose sight of the end goal of the enterprise vision. For example, when implementing customer retention metrics for two key systems as an early ‘win’, be sure to consider the five other systems in the organization and try to ensure that the model and process are flexible enough that the current work will not need to be re-architected when their data is added in a later phase. Keep the final goal in mind when designing and building the incremental milestones.

Flexible and Adaptable Reporting
End-user reporting must provide flexibility: straightforward reports for basic users and, for analytic users, drill-downs and roll-ups, views of both summary and detailed data, and ad-hoc reporting. Report designs that are too rigid lead to clutter in the Business Intelligence application and in the Data Integration and Data Warehouse database contents, as multiple structures are developed for reports that are very similar to each other. Providing flexible structures and reports allows data to be queried from the same reports and database structures without redundancy or the time required to develop new objects and Data Integration processes. Users can create reports from the common structures, removing the bottleneck of IT activities and the need to wait for development. Data modeling and physical database structures that reflect the business model (rather than the requirements for a single report) enable this flexibility as a by-product.

Summary and Detail Data
Often reporting requirements are defined for summary data. While summary data may be available from transaction and operational systems, it is best to bring the detailed data into the Data Warehouse and summarize based on that detail. This avoids potential problems due to different calculation methods, aggregating on different criteria and other ways in which the summary data brought in as a source might differ from rollups that begin with the raw detailed records. Because the summary offers smaller database table sizes, it may be tempting to bring this data in first, and then bring in the detailed data at a later stage in order to drill down to the details. Having standard sources of the raw data and using the same source for various summaries increases the quality of the data and avoids ending up with multiple versions of the truth.
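As a hedged illustration of this point, the short Python sketch below derives two different summaries from the same detailed records; because both rollups start from one detail source, they reconcile by construction. The record layout is invented for the sketch.

    # Illustrative only: every summary is derived from the same detailed records.
    from collections import defaultdict

    detail_sales = [
        {"region": "East", "month": "2008-01", "amount": 100.0},
        {"region": "East", "month": "2008-01", "amount": 250.0},
        {"region": "West", "month": "2008-01", "amount": 300.0},
        {"region": "East", "month": "2008-02", "amount": 125.0},
    ]

    def summarize(detail, keys):
        # Roll the detail up to any grain; all rollups share one detail source.
        totals = defaultdict(float)
        for row in detail:
            totals[tuple(row[k] for k in keys)] += row["amount"]
        return dict(totals)

    by_region_month = summarize(detail_sales, ("region", "month"))
    by_region = summarize(detail_sales, ("region",))

    # Because both rollups come from the same detail, their grand totals agree.
    assert sum(by_region.values()) == sum(by_region_month.values())
    print(by_region_month)
    print(by_region)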


Engage Business Users Early
Business users must be engaged throughout the entire development process. The reports and supporting data delivered by the Data Warehouse project should answer business questions. If the users are not involved, it is probable that the end result will not meet their needs for business data, and the overall success of the project is diminished. The more the business users feel that the solution is focused on solving their analytic needs, the more likely they are to adopt it.

Thorough Data Validation and Monitoring
Once lost, trust is difficult to regain. As a Data Warehouse is rolled out (and throughout its existence), it is important to thoroughly validate the data it contains in order to maintain the end users' trust in the data warehouse analytics. If a key metric is incorrect (e.g., the gross sales amount for a region in a particular month), end users may lose confidence in the system and all of its reports and metrics. If users lose faith in the analytics, this can hamper enterprise adoption and even spell the end of a data warehouse.

Not only is thorough testing and validation required to ensure that data is loaded completely and accurately into the warehouse, but organizations will often create ongoing balancing and auditing procedures. These procedures are run on a regular basis to ensure metrics are accurate and that they ‘tie out’ with source systems. Sometimes these procedures are manual and sometimes they are automated. If the warehouse is suspected to be inaccurate, or a daily load fails to run, communications are initiated with end users to alert them to the problem. It is better to limit user reporting for a morning until the issues are addressed than to risk an executive making a critical business decision with incorrect data.
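A minimal sketch of such a balancing procedure, assuming the source and warehouse totals have already been collected, might look like the following Python. The metric names and tolerance are invented for illustration; a real implementation would follow the organization's own audit rules.

    # Illustrative only: a simple daily "tie-out" between a source system and the warehouse.
    source_totals = {"row_count": 10482, "gross_sales": 1254300.25}
    warehouse_totals = {"row_count": 10482, "gross_sales": 1254300.25}

    def tie_out(source, warehouse, tolerance=0.01):
        # Return a list of human-readable discrepancies; an empty list means the load balances.
        issues = []
        for metric, expected in source.items():
            actual = warehouse.get(metric)
            if actual is None or abs(actual - expected) > tolerance:
                issues.append("%s: source=%s warehouse=%s" % (metric, expected, actual))
        return issues

    problems = tie_out(source_totals, warehouse_totals)
    if problems:
        # In practice this would raise an alert and restrict reporting until resolved.
        print("Load did NOT balance: " + "; ".join(problems))
    else:
        print("Load balanced with source system.")

Checks like this are typically scheduled immediately after each load so that users can be alerted before the business day begins.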

Last updated: 27-May-08 23:05


Roles

Velocity Roles and Responsibilities
● Application Specialist
● Business Analyst
● Business Project Manager
● Data Architect
● Data Integration Developer
● Data Quality Developer
● Data Steward/Data Quality Steward
● Data Warehouse Administrator
● Database Administrator (DBA)
● End User
● Metadata Manager
● PowerCenter Domain Administrator
● Presentation Layer Developer
● Production Supervisor
● Project Sponsor
● Quality Assurance Manager
● Repository Administrator
● Technical Architect
● Technical Project Manager
● Test Engineer
● Test Manager
● Training Coordinator
● User Acceptance Test Lead


Velocity Roles and Responsibilities
The following pages describe the roles used throughout this Guide, along with the responsibilities typically associated with each. Please note that the concept of a role is distinct from that of an employee or full-time equivalent (FTE). A role encapsulates a set of responsibilities that may be fulfilled by a single person in a part-time or full-time capacity, or may be accomplished by a number of people working together.

The Velocity Guide refers to roles with an implicit assumption that there is a corresponding person in that role. For example, a task description may discuss the involvement of "the DBA" on a particular project; in practice, there may be one or more DBAs, or a person whose part-time responsibility is database administration. In addition, note that there is no assumption of staffing level for each role -- that is, a small project may have one individual filling the role of Data Integration Developer, Data Architect, and Database Administrator, while large projects may have multiple individuals assigned to each role.

In cases where multiple people represent a given role, the singular role name is used, and project planners can specify the actual allocation of work among all relevant parties. For example, the methodology always refers to the Technical Architect, when in fact there may be a team of two or more people developing the Technical Architecture for a very large development effort.

Data Integration Project - Sample Organization Chart

Last updated: 20-May-08 18:51


Application Specialist
Successful data integration projects are built on a foundation of thorough understanding of the source and target applications. The Application Specialist is responsible for providing detailed information on data models, metadata, audit controls and processing controls to Business Analysts, Technical Architects and others regarding the source and/or target system. This role is normally filled by someone from a technical background who is able to query/analyze the data ‘hands-on’. The person filling this role should have a good business understanding of how the data is generated and maintained and good relationships with the Data Steward and the users of the data.

Reports to:

Technical Project Manager

Responsibilities:
● Authority on application system data and process models
● Advises on known and anticipated data quality issues
● Supports the construction of representative test data sets

Qualifications/Certifications
● Possesses excellent communication skills, both written and verbal
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision

Recommended Training

Informatica Data Explorer

Last updated: 09-Apr-07 15:38


Business Analyst
The primary role of the Business Analyst (sometimes known as the Functional Analyst) is to represent the interests of the business in the development of the data integration solution. The secondary role is to function as an interpreter for business and technical staff, translating concepts and terminology and generally bridging gaps in understanding. Under normal circumstances, someone from the business community fills this role, since deep knowledge of the business requirement is indispensable. Ideally, familiarity with the technology and the development life-cycle allows the individual to function as the communications channel between technical and business users.

Reports to:

Business Project Manager

Responsibilities:

● Ensures that the delivered solution fulfills the needs of the business (should be involved in decisions related to the business requirements)
● Assists in determining the data integration system project scope, time and required resources
● Provides support and analysis of data collection, mapping, aggregation and balancing functions
● Performs requirements analysis, documentation, testing, ad-hoc reporting, user support and project leadership
● Produces detailed business process flows, functional requirements specifications and data models and communicates these requirements to the design and build teams
● Conducts cost/benefit assessments of the functionality requested by end-users
● Prioritizes and balances competing priorities
● Plans and authors the user documentation set

Qualifications/Certifications

● Possesses excellent communication skills, both written and verbal
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision
● Has knowledge of the tools and technologies used in the data integration solution
● Holds certification in industry vertical knowledge (if applicable)

Recommended Training
● Interview/workshop techniques
● Project Management
● Data Analysis
● Structured analysis
● UML or other business design methodology
● Data Warehouse Development

Last updated: 09-Apr-07 15:20


Business Project Manager
The Business Project Manager has overall responsibility for the delivery of the data integration solution. As such, the Business Project Manager works with the project sponsor, technical project manager, user community, and development team to strike an appropriate balance of business needs, resource availability, project scope, schedule, and budget to deliver specified requirements and meet customer satisfaction.

Reports to:

Project Sponsor

Responsibilities:
● Develops and manages the project work plan
● Manages project scope, time-line and budget
● Resolves budget issues
● Works with the Technical Project Manager to procure and assign the appropriate resources for the project
● Communicates project progress to Project Sponsor(s)
● Is responsible for ensuring delivery on commitments and ensuring that the delivered solution fulfills the needs of the business
● Performs requirements analysis, documentation, ad-hoc reporting and project leadership

Qualifications/Certifications
● Translates strategies into deliverables
● Prioritizes and balances competing priorities
● Possesses excellent communication skills, both written and verbal
● Results oriented team player
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision
● Has knowledge of the tools and technologies used in the data integration solution
● Holds certification in industry vertical knowledge (if applicable)

Recommended Training

Project Management

Last updated: 06-Apr-07 17:55


Data Architect
The Data Architect is responsible for the delivery of a robust scalable data architecture that meets the business goals of the organization. The Data Architect develops the logical data models, and documents the models in Entity-Relationship Diagrams (ERD). The Data Architect must work with the Business Analysts and Data Integration Developers to translate the business requirements into a logical model. The logical model is captured in the ERD, which then feeds the work of the Database Administrator, who designs and implements the physical database. Depending on the specific structure of the development organization, the Data Architect may also be considered a Data Warehouse Architect, in cooperation with the Technical Architect. This role involves developing the overall Data Warehouse logical architecture, specifically the configuration of the data warehouse, data marts, and an operational data store or staging area if necessary. The physical implementation of the architecture is the responsibility of the Database Administrator.

Reports to:

Technical Project Manager

Responsibilities:

● Designs an information strategy that maximizes the value of data as an enterprise asset
● Maintains logical/physical data models
● Coordinates the metadata associated with the application
● Develops technical design documents
● Develops and communicates data standards
● Maintains Data Quality metrics
● Plans architectures and infrastructures in support of data management processes and procedures
● Supports the build out of the Data Warehouse, Data Marts and operational data store
● Effectively communicates with other technology and product team members

Qualifications/Certifications


● Strong understanding of data integration concepts
● Understanding of multiple data architectures that can support a Data Warehouse
● Ability to translate functional requirements into technical design specifications
● Ability to develop technical design documents and test case documents
● Experience in optimizing data loads and data transformations
● Industry vertical experience is essential
● Project Solution experience is desired
● Has had some exposure to Project Management
● Has worked with Modeling Packages
● Has experience with at least one RDBMS
● Strong Business Analysis and problem solving skills
● Familiarity with Enterprise Architecture Structures (Zachman/TOGAF)

Recommended Training
● Modeling Packages
● Data Warehouse Development

Last updated: 01-Feb-07 18:51


Data Integration Developer
The Data Integration Developer is responsible for the design, build, and deployment of the project's data integration component. A typical data integration effort usually involves multiple Data Integration Developers developing the Informatica mappings, executing sessions, and validating the results.

Reports to:

Technical Project Manager

Responsibilities:

● Uses the Informatica Data Integration platform to extract, transform, and load data
● Develops Informatica mapping designs
● Develops Data Integration Workflows and load processes
● Ensures adherence to locally defined standards for all developed components
● Performs data analysis for both Source and Target tables/columns
● Provides technical documentation of Source and Target mappings
● Supports the development and design of the internal data integration framework
● Participates in design and development reviews
● Works with System owners to resolve source data issues and refine transformation rules
● Ensures performance metrics are met and tracked
● Writes and maintains unit tests
● Conducts QA reviews
● Performs production migrations

Qualifications/Certifications
● Understands data integration processes and how to tune for performance
● Has SQL experience
● Possesses excellent communications skills
● Has the ability to develop work plans and follow through on assignments with minimal guidance
● Has Informatica Data Integration Platform experience
● Is an Informatica Certified Designer
● Has RDBMS experience
● Has the ability to work with business and system owners to obtain requirements and manage expectations

Recommended Training
● Data Modeling
● PowerCenter - Level I & II Developer
● PowerCenter - Performance Tuning
● PowerCenter - Team Based Development
● PowerCenter - Advanced Mapping Techniques
● PowerCenter - Advanced Workflow Techniques
● PowerCenter - XML Support
● PowerCenter - Data Profiling
● PowerExchange

Last updated: 01-Feb-07 18:51


Data Quality Developer
The Data Quality Developer (DQ Developer) is responsible for designing, testing, deploying, and documenting the project's data quality procedures and their outputs. The DQ Developer provides the Data Integration Developer with all relevant outputs and results from the data quality procedures, including any ongoing procedures that will run in the Operate phase or after project-end. The DQ Developer must provide the Business Analyst with the summary results of data quality analysis as needed during the project. The DQ Developer must also document at a functional level how the procedures work within the data quality applications. The primary tasks associated with this role are to use Informatica Data Quality and Informatica Data Explorer to profile the project source data, define or confirm the definition of the metadata, cleanse and accuracy-check the project data, check for duplicate or redundant records, and provide the Data Integration Developer with concrete proposals on how to proceed with the ETL processes.

Reports to:

Technical Project Manager

Responsibilities:
● Profile source data and determine all source data and metadata characteristics
● Design and execute Data Quality Audit
● Present profiling/audit results, in summary and in detail, to the business analyst, the project manager, and the data steward
● Assist the business analyst/project manager/data steward in defining or modifying the project plan based on these results
● Assist the Data Integration Developer in designing source-to-target mappings
● Design and execute the data quality plans that will cleanse, de-duplicate, and otherwise prepare the project data for the Build phase
● Test Data Quality plans for accuracy and completeness
● Assist in deploying plans that will run in a scheduled or batch environment
● Document all plans in detail and hand over documentation to the customer
● Assist in any other areas relating to the use of data quality processes, such as unit testing

Qualifications/Certifications


● Has knowledge of the tools and technologies used in the data quality solution
● Results oriented team player
● Possesses excellent communication skills, both written and verbal
● Must be able to work effectively with both business and technical stakeholders

Recommended Training
● Data Quality Workbench I & II
● Data Explorer Level I
● PowerCenter Level I Developer
● Basic RDBMS Training
● Data Warehouse Development

Last updated: 15-Feb-07 17:34


Data Steward/Data Quality Steward
The Data Steward owns the data and associated business and technical rules on behalf of the Project Sponsor. This role has responsibility for defining and maintaining business and technical rules, liaising with the business and technical communities, and resolving issues relating to the data, its use, processing and quality. In essence, this role formalizes the accountability for the management of organizational data. The Data Steward will be the primary contact for all questions relating to the data. There is often an arbitration element to the role where data is put to different uses by separate groups of users whose requirements have to be reconciled. Typically the Data Steward is a key member of a Data Stewardship Committee put into place by the Project Sponsor. This committee will include business users and technical staff such as Application Experts.

Reports to:
● Business Project Manager

Responsibilities:
● Records the business use for defined data
● Identifies opportunities to share and re-use data
● Decides upon the target data quality metrics
● Monitors the progress towards, and tuning of, data quality target metrics
● Oversees data quality strategy and remedial measures
● Participates in the enforcement of data quality standards
● Enters, maintains and verifies data changes
● Ensures the quality, completeness and accuracy of data definitions
● Communicates concerns, issues and problems with data to the individuals that can influence change
● Researches and resolves data issues

Qualifications/Certifications
● Possesses strong analytical and problem solving skills
● Has experience in managing data standardization in a large organization, including setting and executing strategy
● Previous industry vertical experience is essential
● Possesses excellent communication skills, both written and verbal
● Exhibits effective negotiating skills
● Displays meticulous attention to detail
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision
● Project solution experience is desirable

Recommended Training
● Data Quality Workbench Level I
● Data Explorer Level I

Last updated: 15-Feb-07 17:34

Data Warehouse Administrator
The scope of the Data Warehouse Administrator role is similar to that of the DBA, including operational considerations of the data warehouse, security, job scheduling and submission, and resolution of production failures. A typical data integration solution, however, involves more than a single target database, and the Data Warehouse Administrator is responsible for coordinating the many facets of the solution.

Reports to:
● Technical Project Manager

Responsibilities:
● Monitors and supports the Enterprise Data Warehouse environment
● Manages the data extraction, movement, transformation, loading, cleansing and updating processes into the DW environment
● Maintains the DW repository
● Implements database security
● Sets standards and procedures for the DW environment
● Implements technology improvements
● Works to resolve technical issues
● Contributes to technical and system architectural planning
● Tests and implements new technical solutions

Qualifications/Certifications
● Experience in supporting Data Warehouse environments
● Familiarity with database, loading, integration and presentation technology
● Experience in developing and supporting real-time and batch-driven data movements
● Solid understanding of relational database models and dimensional data models
● Strategic planning and system analysis
● Able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision

Recommended Training
● DBMS Administration
● Data Warehouse Development
● PowerCenter Administrator Level I & II
● PowerCenter Security and Migration
● PowerCenter Metadata Manager

Last updated: 01-Feb-07 18:51

Database Administrator (DBA)
The Database Administrator (DBA) in a Data Integration Solution is typically responsible for translating the logical model (i.e., the ERD) into a physical model for implementation in the chosen DBMS, implementing the model, developing volume and capacity estimates, performance tuning, and general administration of the DBMS. In most cases, a DBA's skills are tied to a particular DBMS, such as Oracle or Sybase. As a result, an analytic solution with heterogeneous sources/targets may require the involvement of several DBAs. In many cases, the project DBA also has useful knowledge of existing source database systems. The Project Manager and Data Warehouse Administrator are responsible for ensuring that the DBAs are working in concert toward a common solution.

Reports to:
● Technical Project Manager

Responsibilities:
● Plans, implements and supports enterprise databases
● Establishes and maintains database security and integrity controls
● Delivers database services while managing to policies, procedures and standards
● Tests and implements new technical solutions
● Monitors and supports the database infrastructure (including clients)
● Develops volume and capacity estimates
● Proposes and implements enhancements to improve performance and reliability
● Provides operational support of databases, including backup and recovery
● Develops programs to migrate data between systems
● Works to resolve technical issues
● Contributes to technical and system architectural planning
● Supports data integration developers in troubleshooting performance issues
● Collaborates with other Departments (i.e., Network Administrators) to identify and resolve performance issues

Qualifications/Certifications
● Experience in database administration, backup and recovery
● Expertise in database configuration and tuning
● Appreciation of DI tool-set and associated tools
● Experience in developing and supporting ETL real-time and batch processes
● Strategic planning and system analysis
● Strong analytical and communication skills
● Able to work effectively with both business and technical stakeholders
● Ability to work independently with minimal supervision

Recommended Training
● DBMS Administration

Last updated: 01-Feb-07 18:51

End User
The End User is the ultimate "consumer" of the data in the data warehouse and/or data marts. As such, the end user represents a key customer constituent (management is another), and must therefore be heavily involved in the development of a data integration solution. Specifically, a representative of the End User community must be involved in gathering and clarifying the business requirements, developing the solution and User Acceptance Testing (if applicable).

Reports to:
● Business Project Manager

Responsibilities:
● Gathers and clarifies business requirements
● Reviews technical design proposals
● Participates in User Acceptance testing
● Provides feedback on the user experience

Qualifications/Certifications
● Strong understanding of the business' processes
● Good communication skills

Recommended Training
● Data Analyzer - Quickstart
● Data Analyzer - Report Development

Last updated: 01-Feb-07 18:51

Metadata Manager
The Metadata Manager's primary role is to serve as the central point of contact for all corporate metadata management. This role involves setting the company's metadata strategy, developing standards with the data administration group, determining metadata points of integration between disparate systems, and ensuring the ability to deliver metadata to business and technical users. The Metadata Manager is required to work across business and technical groups to ensure that consistent metadata standards are followed in all existing applications as well as in new development. The Metadata Manager also monitors PowerCenter repositories for accuracy and metadata consistency.

Reports to:
● Business Project Manager

Responsibilities:
● Formulates and implements the metadata strategy
● Captures and integrates metadata from heterogeneous metadata sources
● Implements and governs best practices relating to enterprise metadata management standards
● Determines metadata points of integration between disparate systems
● Ensures the ability to deliver metadata to business and technical users
● Monitors development repositories for accuracy and metadata consistency
● Identifies and profiles data sources to populate the metadata repository
● Designs metadata repository models

Qualifications/Certifications
● Business sector experience is essential
● Experience in implementing and managing a repository environment
● Experience in data modeling (relational and dimensional)
● Experience in using repository tools
● Solid knowledge of general data architecture concepts, standards and best practices
● Strong analytical skills
● Excellent communication skills, both written and verbal
● Proven ability to work effectively with both business users and technical stakeholders

Recommended Training
● DBMS Basics
● Data Modeling
● PowerCenter - Metadata Manager

Last updated: 01-Feb-07 18:51

PowerCenter Domain Administrator
The PowerCenter Domain Administrator is responsible for administering the Informatica Data Integration environment. This involves the management and administration of all components in the PowerCenter domain. The PowerCenter Domain Administrator is responsible for the domain security configuration, licensing and the physical install and location of the services and nodes that compose the domain. The PowerCenter Domain Administrator works closely with the Technical Architect and other project personnel during the Architect, Build and Deploy phases to plan, configure, support and maintain the desired PowerCenter configuration.

Reports to:
● Technical Project Manager

Responsibilities:
● Manages the PowerCenter Domain, Nodes, Service Manager and Application Services
● Develops disaster recovery and failover strategies for the Data Integration environment
● Responsible for High Availability and PowerCenter Grid configuration
● Creates new services and nodes as needed
● Ensures proper configuration of the PowerCenter Domain components
● Ensures proper application of the licensing files to nodes and services
● Manages user and user group access to the domain components
● Manages backup and recovery of the domain metadata and appropriate shared file directories
● Monitors domain services and troubleshoots any errors
● Applies software updates as required
● Tests and implements new technical solutions

Qualifications/Certifications
● Informatica Certified Administrator
● Experience in supporting Data Warehouse environments
● Experience in developing and supporting ETL real-time and batch processes
● Solid understanding of relational database models and dimensional data models

Recommended Training
● PowerCenter Administrator Level I and Level II

Last updated: 01-Feb-07 18:51

Presentation Layer Developer
The Presentation Layer Developer is responsible for the design, build, and deployment of the presentation layer component of the data integration solution. This component provides the user interface to the data warehouses, data marts and other products of the data integration effort. In most cases, the developer works with front-end Business Intelligence tools, such as Cognos, Business Objects and others. The Presentation Layer Developer designs the application, ensuring that the end-user requirements gathered during the requirements definition phase are accurately met by the final build of the application. As the interface is highly visible to the enterprise, a person in this role must work closely with end users to gain a full understanding of their needs. To be most effective, the Presentation Layer Developer should be familiar with metadata concepts and the Data Warehouse/Data Mart data model.

Reports to:
● Technical Project Manager

Responsibilities:
● Collaborates with end users and other stakeholders to define detailed requirements
● Designs business intelligence solutions that meet user requirements for accessing and analyzing data
● Works with front-end business intelligence tools to design the reporting environment
● Works with the DBA and Data Architect to optimize reporting performance
● Develops supporting documentation for the application
● Participates in the full testing cycle

Qualifications/Certifications
● Solid understanding of metadata concepts and the Data Warehouse/Data Mart model
● Aptitude with front-end business intelligence tools (i.e., Cognos, Business Objects, Informatica Data Analyzer)
● Excellent problem solving and trouble-shooting skills
● Solid interpersonal skills and ability to work with business and system owners to obtain requirements and manage expectations
● Capable of expressing technical concepts in business terms

Recommended Training
● Informatica Data Analyzer
● Data Warehouse Development

Last updated: 01-Feb-07 18:51

Production Supervisor
The Production Supervisor has operational oversight for the production environment and the daily execution of workflows, sessions and other data integration processes. Responsibilities include, but are not limited to: training and supervision of system operators, review of execution statistics, and managing the scheduling for upgrades to the system and application software as well as the release of data integration processes.

Reports to:
● Information Technology Lead

Responsibilities:
● Manages the daily execution of workflows and sessions in the production environment
● Trains and supervises the work of system operators
● Reviews and audits execution logs and statistics and escalates issues appropriately
● Schedules the release of new sessions or workflows
● Schedules upgrades to the system and application software
● Ensures that work instructions are followed
● Monitors data integration processes for performance
● Monitors data integration components to ensure appropriate storage and capacity for daily volumes

Qualifications/Certifications
● Production supervisory experience
● Effective leadership skills
● Strong problem solving skills
● Excellent organizational and follow-up skills

Recommended Training
● PowerCenter Level I Developer
● PowerCenter Team Based Development
● PowerCenter Advanced Workflow Techniques
● PowerCenter Security and Migration

Last updated: 01-Feb-07 18:51

Project Sponsor
The Project Sponsor is typically a member of the business community rather than an IT/IS resource. This is important because the lack of business sponsorship is often a contributing cause of systems implementation failure. The Project Sponsor often initiates the effort, serves as project champion, guides the Project Managers in understanding business priorities, and reports status of the implementation to executive leadership. Once an implementation is complete, the Project Sponsor may also serve as "chief evangelist", bringing word of the successful implementation to other areas within the organization.

Reports to:
● Executive Leadership

Responsibilities:
● Provides the business sponsorship for the project
● Champions the project within the business
● Initiates the project effort
● Guides the Project Managers in understanding business requirements and priorities
● Assists in determining the data integration system project scope, time, budget and required resources
● Reports status of the implementation to executive leadership

Qualifications/Certifications
● Has industry vertical knowledge

Recommended Training
● N/A

Last updated: 01-Feb-07 18:51

Quality Assurance Manager
The Quality Assurance (QA) Manager ensures that the original intent of the business case is achieved in the actual implementation of the analytic solution. This involves leading the efforts to validate the integrity of the data throughout the data integration processes, and ensuring that the ultimate data target has been accurately derived from the source data. In situations where issues arise with regard to the quality of the solution, the QA Manager works with project management and the development team to resolve them. The QA Manager can be a member of the IT organization, but serve as a liaison to the business community (i.e., the Business Analysts and End Users). Depending upon the test approach taken by the project team, the QA Manager may also serve as the Test Manager.

Reports to:
● Technical Project Manager

Responsibilities:
● Leads the effort to validate the integrity of the data through the data integration processes
● Ensures that the data contained in the data integration solution has been accurately derived from the source data
● Develops and maintains quality assurance plans and test requirements documentation
● Verifies compliance to commitments contained in quality plans
● Works with the project management and development teams to resolve issues
● Participates in the enforcement of data quality standards
● Communicates concerns, issues and problems with data
● Participates in the testing and post-production verification
● Together with the Technical Lead and the Repository Administrator, articulates the development standards
● Advises on the development methods to ensure that quality is built in
● Designs the QA and standards enforcement strategy
● Together with the Test Manager, coordinates the QA and Test strategies
● Manages the implementation of the QA strategy

Qualifications/Certifications
● Industry vertical knowledge
● Solid understanding of the Software Development Life Cycle
● Experience in quality assurance performance, auditing processes, best practices and procedures
● Experience with automated testing tools
● Knowledge of Data Warehouse and Data Integration enterprise environments
● Able to work effectively with both business and technical stakeholders

Recommended Training
● PowerCenter Level I Developer
● Informatica Data Explorer
● Informatica Data Quality Workbench
● Project Management

Last updated: 01-Feb-07 18:51

Repository Administrator
The Repository Administrator is responsible for administering a PowerCenter or Data Analyzer Repository. This requires maintaining the organization and security of the objects contained in the repository. It entails developing and maintaining the folder and schema structures, managing users, groups, and roles, maintaining database connections, and managing PowerCenter global/local repository relationships and backup and recovery. During the development effort, the Repository Administrator is responsible for coordinating migrations, establishing and promoting naming conventions and development standards, and developing back-up and restore procedures for the repositories. The Repository Administrator works closely with the Technical Architect and other project personnel during the Architect, Build and Deploy phases to plan, configure, support and maintain the desired PowerCenter and Data Analyzer configuration.

Reports to:
● Technical Project Manager

Responsibilities:
● Develops and maintains the repository folder structure
● Manages user and user group access to objects in the repository
● Manages PowerCenter global/local repository relationships and security levels
● Coordinates migration of data during the development effort
● Establishes and promotes naming conventions and development standards
● Develops back-up and restore procedures for the repository
● Works to resolve technical issues
● Contributes to technical and system architectural planning
● Tests and implements new technical solutions

Qualifications/Certifications
● Informatica Certified Administrator
● Experience in supporting Data Warehouse environments
● Experience in developing and supporting ETL real-time and batch processes
● Solid understanding of relational database models and dimensional data models

Recommended Training
● PowerCenter Administrator Level I and Level II
● Data Analyzer Introduction

Last updated: 01-Feb-07 18:51

Technical Architect

The Technical Architect is responsible for the conceptualization, design, and implementation of a sound technical architecture, which includes both hardware and software components. The Architect interacts with the Project Management and design teams early in the development effort in order to understand the scope of the business problem and its solution. The Technical Architect must always consider both current (stated) requirements and future (unstated) directions. Having this perspective helps to ensure that the architecture can expand to correspond with the growth of the data integration solution. This is particularly critical given the highly iterative nature of data integration solution development.

Reports to:
● Technical Project Manager

Responsibilities:
● Develops the architectural design for a highly scalable, large volume enterprise solution
● Performs high-level architectural planning, proof-of-concept and software design
● Defines and implements standards, shared components and approaches
● Functions as the Design Authority in technical design reviews
● Contributes to development project estimates, scheduling and development reviews
● Approves code reviews and technical deliverables
● Assures architectural integrity
● Maintains compliance with change control, SDLC and development standards
● Develops and reviews implementation plans and contingency plans

Qualifications/Certifications
● Software development expertise (previous development experience of the application type)
● Deep understanding of all technical components of the application solution
● Understanding of industry standard data integration architectures
● Ability to translate functional requirements into technical design specifications
● Ability to develop technical design documents
● Strong Business Analysis and problem solving skills
● Familiarity with Enterprise Architecture Structures (Zachman/TOGAF) or equivalent
● Experience and/or training in appropriate platforms for the project
● Familiarity with appropriate modeling techniques such as UML and ER modeling as appropriate

Recommended Training
● Operating Systems
● DBMS
● PowerCenter Developer and Administrator - Level I
● PowerCenter New Features
● Basic and advanced XML

Last updated: 25-May-08 16:19

Technical Project Manager

The Technical Project Manager has overall responsibility for managing the technical resources within a project. As such, he/she works with the project sponsor, business project manager and development team to assign the appropriate resources for a project within the scope, schedule, and budget and to ensure that project deliverables are met.

Reports to:
● Project Sponsor or Business Project Manager

Responsibilities:
● Defines and implements the methodology adopted for the project
● Liaises with the Project Sponsor and Business Project Manager
● Manages project resources within the project scope, time-line and budget
● Ensures all business requirements are accurate
● Communicates project progress to Project Sponsor(s)
● Is responsible for ensuring delivery on commitments and ensuring that the delivered solution fulfills the needs of the business
● Performs requirements analysis, documentation, ad-hoc reporting and resource leadership

Qualifications/Certifications
● Translates strategies into deliverables
● Prioritizes and balances competing priorities
● Must be able to work effectively with both business and technical stakeholders
● Has knowledge of the tools and technologies used in the data integration solution
● Holds certification in industry vertical knowledge (if applicable)

Recommended Training
● Project Management Techniques
● PowerCenter Developer Level I
● PowerCenter Administrator Level I
● Data Analyzer Introduction

Last updated: 01-Feb-07 18:51

Test Engineer

The Test Engineer is responsible for completion of test plans and their execution. The Test Engineer is also responsible for complete execution, including design and implementation of test scripts, test suites of test cases, test requirements documentation, and test data. During test planning, the Test Engineer works with the Testing Manager/Quality Assurance Manager to finalize the test plans and to ensure that the requirements are testable. He/she uses the procedures defined in the test strategy to execute tests, report results and progress of test execution, and to escalate testing issues as appropriate. The Test Engineer should be able to demonstrate knowledge of testing techniques and to provide feedback to developers.

Reports to:
● Test Manager (or Quality Assurance Manager)

Responsibilities:
● Provides input to the test plan and executes it
● Carries out requested procedures to ensure that Data Integration systems and services meet organization standards and business requirements
● Develops and maintains test plans, test requirements documentation, test cases and test scripts
● Verifies compliance to commitments contained in the test plans
● Escalates issues and works to resolve them
● Participates in testing and post-production verification efforts
● Executes test scripts and documents and provides the results to the test manager
● Provides feedback to developers
● Investigates and resolves test failures

Qualifications/Certifications
● Solid understanding of the Software Development Life Cycle
● Experience with automated testing tools
● Strong knowledge of Data Warehouse and Data Integration enterprise environments
● Experience in a quality assurance and testing environment
● Experience in developing and executing test cases and in setting up complex test environments
● Industry vertical knowledge

Recommended Training
● PowerCenter Developer Level I & II
● Data Analyzer Introduction
● SQL Basics
● Data Quality Workbench

Last updated: 01-Feb-07 18:51

Test Manager

The Test Manager is responsible for coordinating all aspects of test planning and execution. During test planning, the Test Manager becomes familiar with the business requirements in order to develop sufficient test coverage for all planned functionality. He/she also develops a test schedule that fits into the overall project plan. The Test Manager is also responsible for the creation of the test data set. Typically, separate functional and volume test data sets will be required. In most cases, these should be derived from the production environment. It may also be necessary to manufacture a data set which triggers all the business rules and transformations specified for the application. An integrated test data set is a valuable project resource in its own right; apart from its obvious role in testing, the test data set is very useful to the developers of integration and presentation components. In general, the Test Manager works with a development counterpart during test execution; the development manager schedules and oversees the completion of fixes for bugs found during testing. Finally, the Test Manager must continually advocate adherence to the Test Plans. Projects at risk of delayed completion often sacrifice testing, at the expense of a high-quality end result.

Reports to:
● Technical Project Manager (or Quality Assurance Manager)

Responsibilities:
● Coordinates all aspects of test planning and execution
● Carries out procedures to ensure that Data Integration systems and services meet organization standards and business requirements
● Develops and maintains test plans, test requirements documentation, test cases and test scripts
● Develops and maintains test data sets
● Verifies compliance to commitments contained in the test plans
● Works with the project management and development teams to resolve issues
● Communicates concerns, issues and problems with data
● Leads testing and post-production verification efforts
● Executes test scripts and documents and publishes the results
● Investigates and resolves test failures

Qualifications/Certifications
● Solid understanding of the Software Development Life Cycle
● Experience with automated testing tools
● Strong knowledge of Data Warehouse and Data Integration enterprise environments
● Experience in a quality assurance and testing environment
● Experience in developing and executing test cases and in setting up complex test environments
● Experience in classifying, tracking and verifying bug fixes
● Industry vertical knowledge
● Able to work effectively with both business and technical stakeholders
● Project management

Recommended Training
● PowerCenter Developer Level I
● Data Analyzer Introduction
● Data Explorer

Last updated: 01-Feb-07 18:51

Training Coordinator

The Training Coordinator is responsible for the design, development, and delivery of all requisite training materials. The deployment of a data integration solution can only be successful if the End Users fully understand the purpose of the solution, the data and metadata available to them, and the types of analysis they can perform using the application. The Training Coordinator will work with the Project Management Team, the development team, and the End Users to ensure that he/she fully understands the training needs, and develops the appropriate training material and delivery approach. The Training Coordinator will also schedule and manage the delivery of the actual training material to the End Users.

Reports to:
● Business Project Manager

Responsibilities:
● Designs, develops and delivers training materials
● Schedules and manages logistical aspects of training for end users
● Performs training needs analysis in conjunction with the Project Manager, development team and end users
● Interviews subject matter experts
● Ensures delivery on training commitments

Qualifications/Certifications
● Experience in the training field
● Ability to create training materials in multiple formats (i.e., written, computer-based, instructor-led, etc.)
● Possesses excellent communication skills, both written and verbal
● Results-oriented team player
● Must be able to work effectively with both business and technical stakeholders
● Has knowledge of the tools and technologies used in the data integration solution

Recommended Training
● Training Needs Analysis
● Data Analyzer Introduction
● Data Analyzer Report Creation

Last updated: 01-Feb-07 18:51

User Acceptance Test Lead

The User Acceptance Test Lead is responsible for leading the final testing and gaining final approval from the business users. The User Acceptance Test Lead interacts with the End Users and the design team during the development effort to ensure the inclusion of all the user requirements within the original defined scope. He/she then validates that the deployed solution meets the final user requirements.

Reports to:
● Business Project Manager

Responsibilities:
● Gathers and clarifies business requirements
● Interacts with the design team and end users during the development efforts to ensure inclusion of user requirements within the defined scope
● Reviews technical design proposals
● Schedules and leads the user acceptance test effort
● Provides test script/case training to the user acceptance test team
● Reports on test activities and results
● Validates that the deployed solution meets the final user requirements

Qualifications/Certifications
● Experience planning and executing user acceptance testing
● Strong understanding of the business' processes
● Knowledge of the project solution
● Excellent communication skills

Recommended Training
● N/A

Last updated: 12-Jun-07 16:06

Phase 1: Manage

1 Manage
● 1.1 Define Project
    ❍ 1.1.1 Establish Business Project Scope
    ❍ 1.1.2 Build Business Case
    ❍ 1.1.3 Assess Centralized Resources
● 1.2 Plan and Manage Project
    ❍ 1.2.1 Establish Project Roles
    ❍ 1.2.2 Develop Project Estimate
    ❍ 1.2.3 Develop Project Plan
    ❍ 1.2.4 Manage Project
● 1.3 Perform Project Close

Phase 1: Manage

Description
Managing the development of a data integration solution requires extensive planning. A thorough, comprehensive plan provides the foundation from which to build a project solution. The goal of this phase is to address the key elements required for a solid project foundation. These elements include:
● Scope - Clearly defined business objectives. The measurable, business-relevant outcomes expected from the project should be established early in the development effort. The business objectives should also spell out a complete inventory of business processes to facilitate a collective understanding of these processes among project team members. A well-defined, comprehensive scope can be used to develop a work breakdown structure (WBS) and establish project roles for summary task assignments. Additionally, an estimate of the expected Return on Investment (ROI) can be developed to gauge the level of investment and anticipated return.
● Planning/Managing - The project plan should detail the project scope as well as its objectives, required work efforts, risks, and assumptions. The plan should also spell out the change and control process that will be used for the project.
● Project Close/Wrap-Up - At the end of each project, the final step is to obtain project closure. Part of this closure is to ensure the completeness of the effort and obtain sign-off for the project. Then, a project evaluation will help in retaining lessons learned and assessing the success of the overall effort.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Data Transformation Developer (Secondary)
Presentation Layer Developer (Secondary)
Production Supervisor (Approve)
Project Sponsor (Primary)
Quality Assurance Manager (Approve)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 18:53

Phase 1: Manage

Task 1.1 Define Project

Description
This task entails constructing the business context for the project, defining in business terms the purpose and scope of the project as well as the value to the business (i.e., the business case). The focus here is on defining the project deliverable in business terms with no regard for technical feasibility; in fact, any discussion of implementation specifics should be avoided at this time. Any discussion of technologies is likely to sidetrack the strategic thinking needed to develop the project objectives.

Prerequisites
None

Roles
Business Analyst (Primary)
Business Project Manager (Primary)
Project Sponsor (Primary)

Considerations
There are no technical considerations during this task.

Best Practices
None

Sample Deliverables
Project Definition

Last updated: 01-Feb-07 18:43

Phase 1: Manage

Subtask 1.1.1 Establish Business Project Scope

Description
In many ways, the potential for success of the development effort for a data integration solution correlates directly to the clarity and focus of its business scope. If the business purpose is unclear or the boundaries of the business objectives are poorly defined, there is a much higher risk of failure or, at least, of a less-than-direct path to limited success.

Prerequisites
None

Roles
Business Analyst (Primary)
Business Project Manager (Review Only)
Project Sponsor (Primary)

Considerations
The primary consideration in developing the Business Project Scope is balancing the high-priority needs of the key beneficiaries with the need to provide results within the near-term, typically within a 60 to 120 day time-frame. The Project Manager and Business Analysts need to determine the key business needs and determine the feasibility of meeting those needs to establish a scope that provides value.

Tip
As a general rule, involve as many project beneficiaries as possible in the needs assessment and goal definition. However, it is often difficult to gather all of the project beneficiaries and the project sponsor together for any single meeting, so you may have to arrange multiple meetings and summarize the input for the various participants. A "forum" type of meeting may be the most efficient way to gather the necessary information since it minimizes the amount of time involved in individual interviews and often encourages useful dialog among the participants.

Best Practices
None

Sample Deliverables
Project Charter

Last updated: 01-Feb-07 18:43

Phase 1: Manage

Subtask 1.1.2 Build Business Case

Description
Building support and funding for a data integration solution nearly always requires convincing executive IT management of its value to the business. The best way to do this, if possible, is to actually calculate the project's estimated return on investment (ROI) through a business case.

ROI modeling is valuable because it:
● Supplies a fundamental cost-justification framework for evaluating a data integration project.
● Helps organizations clarify and agree on the benefits they expect, and in that process, helps them set realistic expectations for the data integration solution or the data quality initiative.
● Mandates advance planning among all appropriate parties, including IT team members, business users, and executive management.

The business beneficiaries are primarily responsible for assessing the project benefits, while technical considerations drive the cost assessments. These two assessments - benefits and costs - form the basis for determining overall ROI to the business. For more details on how to quantify business value and associated data integration project cost, please see Assessing the Business Case.

In addition to traditional ROI modeling on data integration initiatives, quantitative and qualitative ROI assessments should also include assessments of data quality. Often an organization does not realize it has data quality issues until it is too late. Consider a data integration project that is planned and resourced meticulously but that is undertaken on a dataset where the data is of a poorer quality than anyone realized. This can lead to the classic "code-load-explode" scenario, wherein the data breaks down in the target system due to a poor understanding of the data and metadata. It is vital to acknowledge data quality issues at an early stage in the project. A data integration project can succeed from an IT perspective but deliver little if any business value if the data within the system is faulty. For example, a CRM system containing a dataset with a large quantity of redundant or inaccurate records is likely to be of little value to the business. Poor data quality costs organizations vast sums in lost revenues; defective data leads to breakdowns in the supply chain, poor business decisions, and inferior customer relationship management. What is worse, poor quality data can lead to failures in compliance with industry regulations and even to outright project failure at the IT level. For this reason, data quality should be a consideration in ROI modeling for all data integration projects – from the beginning.

Prerequisites
1.1.1 Establish Business Project Scope

Roles
Business Project Manager (Secondary)

Considerations
The Business Case must focus on business value and, as much as possible, quantify that value.

Building the Business Case

Step 1 - Business Benefits
When creating your ROI model, it is best to start by looking at the expected business benefit of implementing the data integration solution. Common business imperatives include:
● Improving decision-making and ensuring regulatory compliance.
● Modernizing the business to reduce costs.
● Merging and acquiring other organizations.
● Increasing business profitability.
● Outsourcing non-core business functions to be able to focus on your company's core value proposition.

Each of these business imperatives requires support via substantial IT initiatives. Common IT initiatives include:
● Business intelligence initiatives.
● Establishment of data hubs for customer, supplier, and/or product data.
● Application consolidation initiatives.
● Business process outsourcing (BPO) and/or Software as a Service (SaaS).
● Retirement of legacy systems.

For these IT initiatives to be successful, you must be able to integrate data from a variety of disparate systems. The form of those data integration projects may vary. You may have a:
● Data Warehousing project, which enables new business insight usually through business intelligence.
● Data Migration project, where data sources are moved to enable a new application or system, or where certain data sources or applications are retired in favor of another.
● Data Consolidation project, where multiple data sources come together to form a more complex, master view of the data.
● Master Data Management project.
● Data Synchronization project, where data between two source systems needs to stay perfectly consistent to enable different applications or systems.
● Data Quality project, where the goals are to cleanse data and to correct errors such as duplicates, missing information, mistyped information and other data deficiencies.
● B2B Data Transformation project, where data from external partners is transformed to internal formats for processing by internal systems and responses are transformed back to partner-appropriate formats.

Once you have established the heritage of your data integration project back to its origins in the business imperatives, it is important to estimate the value derived from the data integration project. You can estimate the value by asking questions such as:
● What is the business goal of this project? Is this relevant?
● What are the business metrics or key performance indicators associated with this goal? How will the business measure the success of this initiative?
● How does data accessibility affect the business initiative? Does having access to all of your data improve the business initiative?
● How does data availability affect the business initiative? Does having data available when it's needed improve the business initiative?
● How does data quality affect the business initiative? Does having good data quality improve the business initiative? Conversely, what is the potential negative impact of having poor data quality on the business initiative?
● How does data auditability affect the business? Does having an audit trail of your data improve the business initiative from a compliance perspective?
● How does data security affect the business? Does ensuring secure data improve the business initiative?

After asking the questions above, you'll start to be able to equate business value, in a monetary number, with the data integration project. Remember to not only estimate the business value over the first year after implementation, but also over the course of time. Most business cases and associated ROI models factor in expected business value for at least three years. If you are still struggling with estimating business value for the data integration initiative, see the table below that outlines common business value categories and how they relate to various data integration initiatives:
Common IT initiatives include: ● ● ● ● ● Business intelligence initiatives. Data Quality project. You may have a: ● ● ● ● Data Warehousing project.

contract manufacturers.Financial reconciliation .Single view of customer across all products.com.revenue per transaction New Product / Service Delivery Accelerate new product/service .Promotions effectiveness analysis Sales and Channel Increase sales productivity.sales per rep or per employee .quote-to-cash cycle time . etc.new product/service adoption rate Pricing / Promotions Set pricing and promotions to stimulate demand while improving margins .% fraudulent transactions .distribution costs per unit .New Customer Acquisition Lower the costs of acquiring new customers .new product/service launch time rate" of new offerings .cross-supplier purchasing history .# new products launched/year introductions.Data Warehousing 64 of 1017 .Asset management/tracking INFORMATICA CONFIDENTIAL Velocity v8 Methodology .demand forecast accuracy .Cross-geography/cross-channel pricing visibility . and improve inventory management .cost-per-impression.Sales/agent productivity dashboard .cost per lead .End-of-quarter days to close .production cycle times products and/or deliver services .cost per new customer acquisition . directory services.margins . g.delivery date reliability Logistics & Distribution . cost-per-action LOWER COSTS Supply Chain Management Lower procurement costs.straight-through-processing rate Lower distribution costs and improve visibility into distribution chain .Demand chain synchronization .# new customers acquired/month per sales rep or per office/store .Marketing analytics & customer segmentation .invoicing/collections reconciliation . Collections and Fraud Prevention Improve invoicing and collections efficiency.Customer lifetime value analysis .Data sharing across design.cross-enterprise inventory rollup . increase supply chain visibility. and improve "hit .% uncollectible .fraud detection Financial Management Streamline financial management and reporting .integration with third party logistics management and distribution partners Invoicing.Sales & demand analytics .scheduling and production synchronization Production & Service Delivery Lower the costs to manufacture . development.Marketing analytics .% share of wallet .profitability per segment .product master data integration . salesforce.Integration of third party data (from credit bureaus.Customer master data integration . and detect/prevent fraud . channels .inventory turns .average delivery times . Management and improve visibility into demand .Data sharing with third parties e.customer lifetime value . marketing agencies .# invoicing errors .demand analysis .Financial data warehouse/ reporting .# products/customer .) Cross-Sell / Up-Sell Increase penetration and sales .cost per unit (product) .Differential pricing analysis and tracking .purchasing discounts .Financial reporting efficiency .close rate .cost per transaction (service) .Asset utilization rates .DSO (days sales outstanding) . channels. production and marketing/sales teams .% cross-sell rate within existing customers .

Prevent compliance outages to -# negative audit/inspection findings g. In most cases.data loss) Step 2 – Calculating the Costs Now that you have estimated the monetary business value from the data integration project in Step 1.Corporate performance management . SEC/SOX/Basel avoid investigations.Reference data integration . 2005 • The top-performing third of Integration Competency Centers (ICCs) will save an average of: • 30% in data interface development time and costs • 20% in maintenance costs INFORMATICA CONFIDENTIAL Velocity v8 Methodology .recover point objective (RPO -. recovery costs.recovery time objective (RTO) .probability of loss . energy or capital assets .cost of compliance lapses (fines. and lower recovery costs Risk .Resiliency and automatic failover/recovery for all data integration processes Business Reduce downtime and lost Continuity/ business. One scenario would be implementing that data integration with tools from Informatica.Financial reporting . lost business) . commodity.audit/oversight costs .mean time between failure (MTBF) . while the other scenario would be implementing the data integration project without Informatica’s toolset. "Integration Competency Center: Where Are Companies Today?". hand coding: • 31% in development costs • 32% in operations costs • 32% in maintenance costs • 35% in overall project life-cycle costs Gartner.expected loss . .probability of compliance lapse II/PCI) and negative impact on brand . you will need to calculate the associated costs with that project in Step 2.Risk management data warehouse . prevent loss of key Disaster Recovery data. the data integration project is inevitable – one way or another the business initiative is going to be accomplished – so it is best to compare two alternative cost scenarios.safeguard and control costs .Scenario analysis . "The Total Economic Impact of Deploying Informatica PowerCenter".errors & omissions . Some examples of benchmarks to support the case for Informatica lowering the total cost of ownership (TCO) on data integration and data quality projects are outlined below: Benchmarks from Industry Analysts.Data Warehousing 65 of 1017 . Consultants.MANAGE RISK Compliance Risk(e. and Authors Forrester Research.Compliance monitoring & reporting Financial/Asset Risk Management Improve risk management of key assets.mean time to recover (MTTR) . 2004 The average savings of using a data integration/ETL tool vs. including financial. penalties.

• Overall customer loss averaged 2. Wiley Computer Publishing. unbudgeted spending averaged $5 million per company. and lost and missed revenue may be as high as 10 to 25 percent of revenue or total budget of an organization. or $15 per customer record • Opportunity costs covering loss of existing customers and increased difficulty in recruiting new customers averaged $7.5 million per company.• The top-performing third of ICCs will achieve 25% reuse of integration components Larry English.6 percent of all customers and ranged as high as 11 percent In addition to lowering cost of implementing a data integration solution. workarounds. In order to quantify the value of risk mitigation. out-of-pocket. 1999.000 consumer records • Total costs to recover from a breach averaged $14 million per company.Data Warehousing 66 of 1017 . even though the values may be valid. • "The business costs of non-quality data. calls to individual customers. increased call center costs and discounted product offers • Indirect costs for lost employee productivity averaged $1. Improving Data Warehouse and Business Information Quality. or $50 per lost customer for outside legal counsel.500 to 900. including irrecoverable costs. when you don’t use Informatica for your data integration project. you should consider the cost of project overrun and the associated likelihood of overrun when using Informatica vs. mail notification letters." Ponemon Institute-.5 million per company. An example analysis of risk mitigation value is below: INFORMATICA CONFIDENTIAL Velocity v8 Methodology . or $75 per lost customer record. rework of products and services. Informatica adds value to the ROI model by mitigating risk in the data integration project. may be 25 to 30 percent or more in those same databases. or $140 per lost customer record • Direct costs for incremental." • "Large organizations often have data redundantly stored 10 times or more.Study of costs incurred by 14 companies that had security breaches affecting between 1." • "Invalid data values in the typical customer database averages around 15 to 20 percent… Actual data errors.

Data Warehousing 67 of 1017 . The following isa sample summary of an ROI model: INFORMATICA CONFIDENTIAL Velocity v8 Methodology . put all of this information into a format that is easy-to-read for IT and line of business executive management.Step 3 – Putting it all Together Once you have calculated the three year business/IT benefits and the three year costs of using PowerCenter vs. not using PowerCenter.

three areas should be considered: 1. 3. less development effort.Data Warehousing 68 of 1017 . To prove the value. 2.For data migration projects it is frequently necessary to prove that using Informatica technology for the data migration efforts has benefits over traditional means. Best Practices None Sample Deliverables None Last updated: 20-May-08 19:09 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Informatica delivered migrations will have lower risk due to ease of maintenance. Informatica Software can reduce the overall project timeline by accelerating migration development efforts. Availability of lineage reports as to how the data was manipulated by the data migration process and by whom. higher quality of data. and increased project management tools with the metadata driven solution.
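The original document presents the ROI summary as a graphic, which is not reproduced here. Purely as an illustration, the minimal Python sketch below shows how the three steps described above (estimated benefits, risk-adjusted costs, and the resulting ROI) might be rolled up into a three-year summary. Every figure, rate, and variable name in it is a hypothetical placeholder, not an Informatica benchmark or a Velocity deliverable.

```python
# Illustrative only: a minimal three-year ROI summary for a data integration
# business case. All figures are hypothetical placeholders.

def npv(cash_flows, rate):
    """Net present value of yearly cash flows (year 1..n) at a discount rate."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows, start=1))

# Step 1: estimated yearly business benefits (e.g., revenue uplift + cost savings).
benefits = [400_000, 650_000, 800_000]          # years 1-3

# Step 2: estimated yearly costs for the chosen scenario, adjusted for the risk
# of project overrun (expected overrun cost = probability x overrun cost).
base_costs = [350_000, 150_000, 150_000]        # license, services, operations
overrun_probability = 0.15
overrun_cost = 200_000
risk_adjusted_costs = [c + overrun_probability * overrun_cost for c in base_costs]

# Step 3: put it all together as a simple summary.
discount_rate = 0.10
benefit_npv = npv(benefits, discount_rate)
cost_npv = npv(risk_adjusted_costs, discount_rate)
roi_pct = (benefit_npv - cost_npv) / cost_npv * 100

print(f"3-year benefits (NPV): {benefit_npv:,.0f}")
print(f"3-year risk-adjusted costs (NPV): {cost_npv:,.0f}")
print(f"3-year ROI: {roi_pct:.0f}%")
```

Running the same calculation for both cost scenarios (with and without the toolset) makes the TCO and risk-mitigation comparison from Step 2 explicit in the summary.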

and it is even possible that some new tasks will be created.Data Warehousing 69 of 1017 . If an ICC does not already exist. some may no longer be required. there are points in the development cycle where the availability of some degree of centralized resources has a material effect on the Velocity Work Breakdown Structure (WBS). Typically. the ICC acquires responsibility for some or all of the data integration infrastructure (essentially the Non-Functional Requirements) and the project teams are liberated to focus on the functional requirements.informatica.3 Assess Centralized Resources Description The pre-existence of any centralized resources such as an Integration Competency Center (ICC) has an obvious impact on the tasks to be undertaken in a data integration project. an ICC section is included under the Considerations heading where alternative or supplementary activity is required if an ICC is in place.1. However. it is necessary to assess the extent and nature of the resources available in order to demarcate the responsibilities between the ICC and project teams. The precise division of labor is obviously dependent on the degree of centralization and the associated ICC model that has been adopted. If an ICC does exist. The objective in Velocity is not to replicate the material that is available elsewhere on the set-up and operation of an ICC(http://www. Prerequisites None Roles Business Project Manager (Primary) Considerations INFORMATICA CONFIDENTIAL Velocity v8 Methodology . In the task descriptions that follow.com/solutions/ icc/default.Phase 1: Manage Subtask 1.htm ). some tasks are altered. this subtask is finished since there are no centralized resources to assess and all the tasks in the Velocity WBS are the responsibility of the development team.

It is the responsiblity of the project manager to review the Velocity WBS in the light of the services provided by the ICC.Data Warehousing 70 of 1017 . Best Practices Selecting the Right ICC Model Planning the ICC Implementation Sample Deliverables None Last updated: 01-Feb-07 18:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The responsibility for each subtask should be established.

Phase 1: Manage Task 1. project management activities involve reconciling trade-offs between business requests as to functionality and timing with technical feasibility and budget considerations. and the continuing management of expectations through status reporting. This often means balancing between sensitivity to project goals and INFORMATICA CONFIDENTIAL Velocity v8 Methodology . issue tracking and change management.Data Warehousing 71 of 1017 .2 Plan and Manage Project Description This task incorporates the initial project planning and management activities as well as project management activities that occur throughout the project lifecycle. Prerequisites None Roles Business Project Manager (Primary) Data Integration Developer (Secondary) Data Quality Developer (Secondary) Presentation Layer Developer (Secondary) Project Sponsor (Approve) Technical Architect (Primary) Technical Project Manager (Primary) Considerations In general. It includes the initial structure of the project team and the project work steps based on the business objectives and the project scope.

are detailed documentation and frequent review of the status of the project effort against plan. For B2B projects. high profile projects such as implementing a new ERP system that will often cost in the millions of dollars. so the resource requirements for the Data Migration must be understood and guaranteed as part of the larger effort overseen by the PMO. these roles will have responsibility beyond the data migration. and maintaining a firm grasp of what is feasible ("telling the truth") on the other. Best Practices None Sample Deliverables None Last updated: 20-May-08 19:13 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 72 of 1017 . apart from strong people skills (especially. Successful project management is predicated on regular communication of these project aspects with the project manager. and of the risks regarding enlargement of scope ("change management"). It is important to identify the roles and gain the understanding of the PMO as to how these roles are needed and will intersect with the broader system implementation. The tools of the trade. and with other management and project personnel. of the unresolved issues. Informatica recommends having the Technical Architect directly involved throughout the process. interpersonal communication skills). More specifically. For data migration projects there is often a project management office (PMO) in place The PMO is typically found in high dollar. technical considerations typically play an important role.concerns ("being a good listener") on the one hand. The format of data received from partners (and replies sent to partners) forms a key consideration in overall business operations and has a direct impact on the planning and scoping of changes.

. This is a precursor to building the project team and making resource assignments to specific tasks.1 Establish Business Project Scope provides a primary indication of the required roles and skill sets.1.... What responsibilities will fall to the company resources and which are offloaded to a consultant? Who (i.2...for deployment/training/support? ● ● INFORMATICA CONFIDENTIAL Velocity v8 Methodology .for data architecture? .Data Warehousing 73 of 1017 .e...for documentation? .1 Establish Project Roles Description This subtask involves defining the roles/skill sets that will be required to complete the project. The following types of questions are useful discussion topics and help to validate the initial indicators: ● What are the main tasks/activities of the project and what skills/roles are needed to accomplish them? How complex or broad in scope are these tasks? This can indicate the level of skills needed.Phase 1: Manage Subtask 1. company resource or consultant) will provide the project management? Who will have primary responsibility for infrastructure requirements? . Prerequisites None Roles Business Project Manager (Primary) Project Sponsor (Approve) Technical Project Manager (Primary) Considerations The Business Project Scope established in 1. for testing? .

resulting ‘shortcuts’ often lead to quality problems in production systems. or Production Supervisor. These responsibilities directly conflict with the developer’s need to meet a tight development schedule. as well as the level of involvement expected from company personnel and consultant personnel. Before defining any roles. DBA. there is often pressure to combine roles due to limited funding or availability of resources. a QA Manager or Test Manager or Lead should not be the same person as a Project Manager or one of the development team. Network Administrator. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . After the project scope and required roles have been defined. In defining the necessary roles. If this agreement has not been explicitly accomplished. project quality may suffer. be sure that the Project Sponsor is in agreement as to the project scope and major activities. Repository Administrator. and if one person fills both of these roles. be sure to provide the Sponsor with a full description of all roles. Those roles require operational diligence and adherence to procedure as opposed to ad hoc development. indicating which will rely on company personnel and which will use consultant personnel. When development roles are mixed with operational roles. These roles should be defined as generally as possible rather than attempting to match a requirement with a resource at hand.● How much development and testing will be involved? This is a definitional activity and very distinct from the later assignment of resources. The classic conflict is between development roles and highly procedural or operational roles. The QA Manager is responsible for determining the criteria for acceptance of project quality and managing quality-related procedures. For similar reasons. Tip Involve the Project Sponsor. The Role Descriptions in Roles provides typical role definitions. There are some roles that inherently provide a healthy balance with one another. For example. The Project Role Matrix can serve as a starting point for completing the project-specific roles matrix.Data Warehousing 74 of 1017 . development personnel are not ideal choices for filling such operational roles as Metadata Manager. review the project scope with the Project Sponsor to resolve any remaining questions. This sets clear expectations for company involvement and indicates if there is a need to fill additional roles with consultant personnel if the company does not have personnel available in accordance with the project timing.

Best Practices None Sample Deliverables Project Definition Project Role Matrix Work Breakdown Structure Last updated: 01-Feb-07 18:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 75 of 1017 .

time of day that the travel will occur. details on project execution must be developed. which can be viewed as a list of tasks that must be completed to achieve the desired project results. team skills. For example. and external dependencies always have an impact on the actual effort required. However. an experienced traveller who frequently travels the route between his/her home or office and the airport can easily provide an accurate estimate of the time required for the trip. requiring consideration of numerous factors such as distance to the airport. and often becomes more difficult as project visibility increases and there is an increasing demand for an "exact estimate". then summing the whole. Two important documents required for project execution are the: ● Work Breakdown Structure (WBS).2. The same holds true for estimating INFORMATICA CONFIDENTIAL Velocity v8 Methodology . at this time. a solid project estimate. These details should answer the questions of what must be done. subsequently. The resulting estimate however. speed of available transportation. and how much will it cost. how long it will take. The traveller can arrive at a valid overall estimate by assigning time estimates to each factor. (See Developing a Work Breakdown Structure (WBS) for more details) Project Estimate. The accuracy of an estimate largely depends on the experience of the estimator (or estimators). the estimation process becomes much more complex. which. estimates are useful for providing a close approximation of the level of effort required by the project.Phase 1: Manage Subtask 1. who will do it. The objective of this subtask is to develop a complete WBS and.2 Develop Project Estimate Description Once the overall project scope and roles have been defined.Data Warehousing 76 of 1017 . is not likely to be nearly as accurate as the one based on knowledge gained through experience. It is important to understand that estimates are never exact. Factors such as project complexity. ● Estimating a project is never an easy task. focuses solely on development costs without consideration for hardware and software liabilities. means of transportation. and so on. When the same traveller is asked to estimate travel time to or from an unfamiliar airport however. expected weather conditions.

and specifications for data formats. not included in the initial estimates. Having the entire project team review the WBS when it is near completion helps to ensure that it includes all necessary project tasks. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 77 of 1017 . Prerequisites None Roles Business Project Manager (Primary) Data Integration Developer (Secondary) Data Quality Developer (Secondary) Data Transformation Developer (Secondary) Presentation Layer Developer (Secondary) Project Sponsor (Approve) Technical Architect (Secondary) Technical Project Manager (Secondary) Considerations An accurate estimate depends greatly on a complete and accurate Work Breakdown Structure. Project deadlines often slip because some tasks are overlooked and. therefore.the time and resources required to complete development on a data integration solution project. Sample Data Requirements for B2B Projects For B2B projects (and non B2B projects that have significant unstructured or semistructured data transformation requirements) the actual creation and subsequent QA of transformations relies on having sufficient samples of input and output data.

any cleansing of sample data required (for example to conform to HIPAA or financial privacy regulations). estimates should include sufficient time to allow for the collection and assembly of sample data. Best Practices None Sample Deliverables None Last updated: 20-May-08 19:17 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 78 of 1017 . and for any data analysis or metadata discovery to be performed on the sample data. By their nature. the full authoring of B2B data transformations cannot be completed (or in some cases proceed) without the availability of adequate sample data both for input to transformations and for comparison purposes during the quality assurance process.When estimating for projects that use Informatica’s B2B Data Transformation.
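To make the bottom-up estimating approach described in this subtask concrete, the sketch below sums task-level estimates from a simple work breakdown structure and applies a contingency allowance for complexity, team skills, and external dependencies. It is illustrative only; the phase names, task names, hours, and contingency percentage are hypothetical and are not part of the Velocity methodology.

```python
# Illustrative only: bottom-up project estimate from a simple WBS.
# Phase/task names, hours, and the contingency factor are hypothetical.

wbs = {
    "Analyze": {"Define requirements": 80, "Source system analysis": 120},
    "Design":  {"Data model": 100, "Mapping specifications": 160},
    "Build":   {"Develop mappings": 400, "Unit test": 200},
    "Test":    {"System test": 240, "UAT support": 80},
}

contingency = 0.20  # allowance for complexity, team skills, external dependencies

for phase, tasks in wbs.items():
    phase_hours = sum(tasks.values())
    print(f"{phase:<8} {phase_hours:>5} hours")

base_total = sum(sum(tasks.values()) for tasks in wbs.values())
estimate = base_total * (1 + contingency)
print(f"Base total: {base_total} hours; with {contingency:.0%} contingency: {estimate:.0f} hours")
```

As the traveller analogy suggests, the value of such a roll-up depends on the experience behind each task-level number; the arithmetic only aggregates the estimators' judgment.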

roles. Deployment. Prerequisites None Roles Business Project Manager (Primary) Project Sponsor (Approve) Technical Project Manager (Secondary) Considerations The initial project plan is based on agreements-to-date with the Project Sponsor regarding project scope. Beta Test and Deployment. At that time. like System Test (or "alpha"). System Test. Major activities (e. are represented in the initial plan as a single set of activities. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .) typically involve their own full-fledged planning processes once the technical design is completed. In some cases.3 Develop Project Plan Description In this subtask. estimation of effort. project timelines and any understanding of requirements.Phase 1: Manage Subtask 1. priorities. or simply on more precise determinations of effort and of start and/or completion dates as the project unfolds. additional activities may be added to the project plan to allow for more detailed tracking of those project activities. etc.. later phases of the project.Data Warehousing 79 of 1017 .2. the Project Manager develops a schedule for the project using the agreed-upon business project scope to determine the major tasks that need to be accomplished and estimates of the amount of effort and resources required. approach. and will be more fully defined as the project progresses.g. Updates to the plan (as described in Developing and Maintaining the Project Plan) are typically based on changes to scope.

The sooner the plan is updated and changes communicated to the Project Sponsor and/or company management.Perhaps the most significant message here is that an up-to-date plan is critical for satisfactory management of the project and for timely completion of its tasks. Keeping the plan updated as events occur and client understanding or needs and expectations change requires an on-going effort. Best Practices Data Migration Velocity Approach Sample Deliverables Project Roadmap Work Breakdown Structure Last updated: 01-Feb-07 18:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 80 of 1017 . the less likely that expectations will be frustrated to a problematic level.

or personal. activities and schedule Managing all project issues as they arise.Phase 1: Manage Subtask 1. procedural. expectations and involvement Managing the project team. and making sure that someone accepts responsibility for such occurrences and delivers in a timely fashion. including business requirements reviews and technical reviews Change Management as scope changes are proposed. ● In a more specific sense. project organization.2. project management involves being constantly aware of. its make-up. or preparing for. The management effort includes: ● ● Managing the project beneficiary relationship(s). project management begins before the project starts and continues until its completion and perhaps beyond. anything that needs to be accomplished or dealt with to further the project objectives. including changes to staffing or priorities Issues Management Project Acceptance and Close ● ● ● ● ● Prerequisites None INFORMATICA CONFIDENTIAL Velocity v8 Methodology . involvement. including the initial project scope.4 Manage Project Description In the broadest sense. and project plan Project Status and reviews of the plan and scope Project Content Reviews. priorities. whether technical. Project management begins with pre-engagement preparation and includes: ● Project Kick-off. logistical.Data Warehousing 81 of 1017 .

Best Practices None Sample Deliverables Issues Tracking Project Review Meeting Agenda Project Status Report Scope Change Assessment INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Roles Business Project Manager (Primary) Project Sponsor (Review Only) Technical Project Manager (Primary) Considerations In all management activities and actions. the Project Manager must balance the needs and expectations of the Project Sponsor and project beneficiaries with the needs. Limitations and specific needs of the team must be communicated clearly and early to the Project Sponsor and/or company management to mitigate unwarranted expectations and avoid an escalation of expectation-frustration that can have a dire effect on the project outcome. In addition to "expectation management". Issues that affect the ability to deliver in any sense. knowledge-transfer and testing procedures. and potential changes to scope. This involves soliciting specific requirements with subsequent review of deliverables that include in addition to the data integration solution documentation. user interfaces. limitations and morale of the project team. must be brought to the Project Sponsor's attention as soon as possible and managed to satisfactory resolution. project management includes Quality Assurance for the project deliverables.Data Warehousing 82 of 1017 .

Data Warehousing 83 of 1017 .Last updated: 01-Feb-07 18:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Prerequisites None Roles Business Project Manager (Primary) Production Supervisor (Approve) Project Sponsor (Approve) Quality Assurance Manager (Approve) INFORMATICA CONFIDENTIAL Velocity v8 Methodology . justifications for tasks expected but not completed. along with a final status report. lessons learned. This task should also generate a reconciliation document.3 Perform Project Close Description This is a summary task that entails closing out the project and creating project wrap-up documentation. and any recommendations for future work on the end product. Each project should end with an explicit closure procedure. experience is an important tool for succeeding in future efforts. A Project Close Report should be completed at the conclusion of the effort. reconciling project time/budget estimates with actual time and cost expenditures. The project close documentation should highlight project accomplishments. This process should include Sponsor acknowledgement that the project is complete and the end product meets expectations.Data Warehousing 84 of 1017 .Phase 1: Manage Task 1. Building upon the experience of a project team and publishing this information will help future teams succeed in similar efforts. As mentioned earlier in this chapter.

Data Warehousing 85 of 1017 .Technical Project Manager (Approve) Considerations None Best Practices None Sample Deliverables Project Close Report Last updated: 01-Feb-07 18:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

1 Define Business Drivers.2.Phase 2: Analyze 2 Analyze ● 2.2 Review Metadata Sourcing Requirements 2. Objectives and Goals 2.2 Determine Sourcing Feasibility 2.3.7 Determine Regulatory Requirements 2.5 Define Metadata Requirements r 2.1 Perform Data Quality Analysis of Source Data 2.6 Determine Technical Readiness 2.1 Establish Inventory of Technical Metadata 2.2 Report Analysis Results to the Business r INFORMATICA CONFIDENTIAL Velocity v8 Methodology .8.3.4 Define Functional Requirements 2.8 Perform Data Quality Audit r 2.5.Data Warehousing 86 of 1017 .2 Define Business Requirements r ● 2.3 Define Business Scope r 2.2 Establish Data Stewardship r ● 2.5.5 Build Roadmap for Incremental Delivery r r r ● ● 2.3.3 Determine Target Requirements 2.3 Assess Technical Strategies and Policies r r ● ● ● 2.8.1 Identify Source Data Systems 2.1 Define Business Rules and Definitions 2.2.3.5.

Many development failures and project cancellations can be traced to an absence of adequate upfront planning and scope definition. The purpose of the Analyze Phase is to build a solid foundation for project scope through a deliberate determination of the business drivers. Inadequately defined or prioritized objectives and project requirements foster scenarios where project scope becomes a moving target as requirements may change late in the game. requiring repeated rework of design or even development tasks. Once the business case for a data integration or business intelligence solution is accepted and key stakeholders are identified. requirements. and priorities that will form the basis of the project design and development. organizations demand faster. the process of detailing and prioritizing objectives and requirements can begin .Phase 2: Analyze Description Increasingly. and cheaper delivery of data integration and business intelligence solutions. better.Data Warehousing 87 of 1017 . a roadmap for major project stages. Prerequisites None Roles Application Specialist (Primary) Business Analyst (Primary) Business Project Manager (Primary) Data Architect (Primary) Data Integration Developer (Primary) Data Quality Developer (Primary) INFORMATICA CONFIDENTIAL Velocity v8 Methodology .with the ultimate goal of defining project scope and. if appropriate.

and must be based on commonly agreed-upon definitions of business information.Data Warehousing 88 of 1017 .Data Steward/Data Quality Steward (Primary) Database Administrator (DBA) (Primary) Legal Expert (Primary) Metadata Manager (Primary) Project Sponsor (Secondary) Security Manager (Primary) System Administrator (Primary) Technical Architect (Primary) Technical Project Manager (Primary) Considerations Functional and technical requirements must focus on the business goals and objectives of the stakeholders. thereby providing value to the business even though there may be a much longer timeline to complete the entire project. The initial business requirements are then compared to feasibility studies of the source systems to help the prioritization process that will result in a project roadmap and rough timeline. In addition. Best Practices None Sample Deliverables None INFORMATICA CONFIDENTIAL Velocity v8 Methodology . during this phase it can be valuable to identify the available technical metadata as a way to accelerate the design and improve its quality. This sets the stage for incremental delivery of the requirements so that some important needs are met as soon as possible. A successful Analyze Phase can serve as a foundation for a successful project.

Data Warehousing 89 of 1017 .Last updated: 01-Feb-07 18:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

or increased business value that the project is likely to resolve or deliver.Data Warehousing 90 of 1017 . issues.Phase 2: Analyze Task 2. Objectives should be explicitly defined so that they can be evaluated at the conclusion of a project to determine if they were achieved. of a less-thandirect path to likely limited success. The specific deliverables of an IT project. Objectives written for a goal statement are nothing more than a deconstruction of the goal statement into a set of necessary and sufficient objective statements. That is. Business Drivers The business drivers explain why the solution is needed and is being recommended at a particular time by identifying the specific business problems. the business objectives should be written so they are understandable by all of the project stakeholders. However. there is a much higher risk of failure or. for instance. at least. Business drivers may include background information necessary to understand the problems and/or needs. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . There should be clear links between the project’s business drivers and the company’s underlying business strategies. Objectives are important because they establish a consensus between the project sponsor and the project beneficiaries regarding the project outcome.1 Define Business Drivers. every objective must be accomplished to reach the goal. Business Objectives Objectives are concrete statements describing what the project is trying to achieve. If the business objectives are vague. may or may not make sense to the project sponsor. and no objective is superfluous. Objectives and Goals Description In many ways. the potential for success of any data integration/business intelligence solution correlates directly to the clarity and focus of its business scope.

Business Goals
Goal statements provide the overall context for what the project is trying to accomplish. It is the agreement between the company and the project sponsor about what is going to be accomplished by the project. Every project should have at least one goal. The goal provides focus and serves as the compass for determining if the project outcomes are appropriate. Project context is established in a goal statement by stating the project's object of study, its purpose, its quality focus, and its viewpoint. Characteristics of a well-defined goal should reference the project's business benefits in terms of cost, time, and/or quality. If the goal's achievement can be measured, it is probably defined at too low a level and may actually be an objective. If the goal is not achievable through any combination of projects, it is probably too abstract and may be a vision statement. Because goals are high-level statements, it may take more than one project to achieve a stated goal.

Taken as a pair, the goal and objectives statements define the project. In the project management life cycle, the goal is bound by a number of objective statements; these objective statements clarify the fuzzy boundary of the goal statement.

Prerequisites
None

Roles
Business Project Manager (Review Only)
Project Sponsor (Review Only)

Considerations
Business Drivers
The business drivers must be defined using business language. They should align with the company's stated business goals and strategies. Key components when identifying business drivers include:
● Describe facts, figures, and other pertinent background information to support the existence of a problem.
● Identify how the project is going to resolve or address specific business problems.

● Explain how the project resolves or helps to resolve the problem in terms familiar to the business.
● Show any links to business goals, strategies, and principles.
● Consider explaining the origins of the significant requirements as a way of explaining why the project is needed. Large projects often have significant business and technical requirements that drive the project's development.

Business Objectives
Before the project starts, define and agree on the project objectives and the business goals they define. A meeting between all major stakeholders is the best way to create the objectives and gain a consensus on them at the same time. This type of meeting encourages discussion among participants and minimizes the amount of time involved in defining business objectives and goals. It may not be possible to gather all the project beneficiaries and the project sponsor together at the same time, so multiple meetings may have to be arranged with the results summarized. The deliverables of the project are created based on the objectives, not the other way around.

While goal statements are designed to be vague, a well-worded objective is Specific, Measurable, Attainable/Achievable, Realistic and Time-bound (SMART):
● Specific: An objective should address a specific target or accomplishment.
● Measurable: Establish a metric that indicates that an objective has been met.
● Attainable: If an objective cannot be achieved, then it's probably a goal.
● Realistic: Limit objectives to what can realistically be done with available resources.
● Time-bound: Achieve objectives within a specified time frame.

At a minimum, make sure each objective contains four parts, as follows:
● An outcome - describe what the project will accomplish.
● An action - how to meet the objective.
● A measure - metric(s) that will measure the success of the project.
● A time frame - the expected completion date of the project.

The business objectives should take into account the results of any data quality investigations carried out before or during the project.

If the project source data quality is low, then the project's ability to achieve its objectives may be compromised. For this reason, data quality investigations (such as a Data Quality Audit) should be carried out as early as is feasible in the project life-cycle. See 2.8 Perform Data Quality Audit. If the project has specific data-related objectives, such as regulatory compliance objectives, then a high degree of data quality may be an objective in its own right.

There is considerable discretion in how granular a project manager may get in defining objectives. Generally speaking, the number of objectives comes down to how much business investment is going to be made in pursuit of the project's goals. High investment projects generally have many objectives; low investment projects must be more modest in the objectives they pursue. High-level objectives generally need a more detailed explanation and often lead to more definition in the project's deliverables to obtain the objective. Lower level, detailed objectives tend to require less descriptive narrative and deconstruct into fewer deliverables. Regardless of the number of objectives identified, the priority should be established by ranking the objectives with their respective impacts, costs, and risks, which should also be prioritized.

Business Goals
The goal statement must also be written in business language so that anyone who reads it can understand it without further explanation. The goal statement should:
● Be short and to the point.
● Provide overall context for what the project is trying to accomplish.
● Be aligned to business goals in terms of cost, time, and quality.

Smaller projects generally have a single goal; larger projects may have more than one goal. Since the goal statement is meant to be succinct, it should always be brief and to the point, regardless of the number of goals a project has.

Best Practices
None

Sample Deliverables
None

Last updated: 18-May-08 17:36

Phase 2: Analyze

Task 2.2 Define Business Requirements

Description
A data integration/business intelligence solution development project typically originates from a company's need to provide management and/or customers with business analytics or to provide business application integration. As with any technical engagement, the first task is to determine clear and focused business requirements to drive the technology implementation. This requires determining what information is critical to support the project objectives and its relation to important strategic and operational business processes. The goal of this task is to ensure the participation and consensus of the project sponsor and key beneficiaries during the discovery and prioritization of these information requirements. Project success will be based on clearly identifying and accurately resolving these informational needs with the proper timing.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Quality Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Legal Expert (Approve)
Metadata Manager (Primary)
Project Sponsor (Approve)

Considerations
In a data warehouse/business intelligence project, there can be strategic or tactical requirements.

Strategic Requirements
Customer management is typically interested in strategic questions that often include a significant timeframe. They would want to answer questions such as, ‘How has the turnover of product ‘x’ increased over the last year?’ or, ‘What is the revenue of area ‘a’ in January of this year as compared to last year?’ Answers to strategic questions provide company executives with the information required to build on the company strengths and/or to eliminate weaknesses. Strategic requirements are typically implemented through a data warehouse type project with appropriate visualization tools.

Tactical Requirements
The tactical requirements serve the ‘day to day’ business. Operational level employees want solutions to enable them to manage their on-going work and solve immediate problems. For instance, a distributor running a fleet of trucks has an unavailable driver on a particular day and needs to answer, 'How can the delivery schedule be altered in order to meet the delivery time of the highest priority customer?' Answers to these questions are valid and pertinent for only a short period of time in comparison to the strategic requirements. Tactical requirements are often implemented via operational data integration.

Best Practices
None

Sample Deliverables
None

Last updated: 02-May-08 12:05

Phase 2: Analyze

Subtask 2.2.1 Define Business Rules and Definitions

Description
A business rule is a compact and simple statement that represents some important aspect of a business process or policy. For example, a new bank account cannot be created unless the customer has provided an adequate proof of identification and address. By capturing the rules of the business—the logic that governs its operation—systems can be created that are fully aligned with the needs of the organization. From a technical perspective, a business rule expresses specific constraints on the creation, updating, and removal of persistent data in an information system.

Prerequisites
None

Roles
Data Quality Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Legal Expert (Approve)
Metadata Manager (Primary)
Security Manager (Approve)

Considerations
Formulating business rules is an iterative process. Business rules stem from the knowledge of business personnel and constrain some aspect of the business, often stemming from statements of policy in an organization.

Rules are expressed in natural language. The aim is to define atomic business rules, that is, rules that cannot be decomposed further. Each atomic business rule is a specific, formal statement of a single term, fact, derivation, or constraint on the business. The components of business rules, once formulated, provide direct inputs to a subsequent conceptual data modeling and analysis phase. In this approach, definitions and connections can eventually be mapped onto a data model, and constraints and derivations can be mapped onto a set of rules that are enforced in the data model.

The following set of guidelines follow best practices and provide practical instructions on how to formulate business rules (a brief illustrative sketch follows at the end of this subtask):
● Start with a well-defined and agreed-upon set of unambiguous definitions captured in a definitions repository. Re-use existing definitions if available.
● Use meaningful and precise verbs to connect the definitions captured above.
● Use standard expressions for derivation business rules, such as "x is calculated from", "summed from", etc. For example, "the departmental commission paid is calculated as the total commission multiplied by the departmental rollup rate."
● Use standard expressions to constrain business rules, such as must, must not, no more than, only if, etc. For example, the total commission paid to broker ABC can be no more than xy% of the total revenue received for the sale of widgets.

Best Practices
None

Sample Deliverables
Business Requirements Specification

Last updated: 01-Feb-07 18:43
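The derivation and constraint rule examples above can also be expressed as executable checks. The following minimal sketch is illustrative only and is not part of the methodology; the function names, the 10% threshold, and the example values are assumptions.

    # Illustrative sketch of atomic business rules expressed as executable checks.
    # All names and values here are assumptions, not prescribed by the methodology.

    def departmental_commission(total_commission: float, dept_rollup_rate: float) -> float:
        """Derivation rule: departmental commission paid is calculated as the
        total commission multiplied by the departmental rollup rate."""
        return total_commission * dept_rollup_rate

    def broker_commission_within_limit(commission_paid: float,
                                       widget_revenue: float,
                                       max_pct: float = 0.10) -> bool:
        """Constraint rule: the total commission paid to a broker can be no more
        than a given percentage (assumed 10% here) of widget revenue."""
        return commission_paid <= widget_revenue * max_pct

    def can_create_bank_account(has_proof_of_identification: bool,
                                has_proof_of_address: bool) -> bool:
        """Constraint rule from the Description: a new bank account cannot be
        created unless adequate proof of identification and address is provided."""
        return has_proof_of_identification and has_proof_of_address

    if __name__ == "__main__":
        print(departmental_commission(total_commission=50_000, dept_rollup_rate=0.25))
        print(broker_commission_within_limit(commission_paid=9_000, widget_revenue=100_000))
        print(can_create_bank_account(True, False))

Expressing rules this way keeps each rule atomic, which makes it straightforward to map them later onto data model constraints.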

Phase 2: Analyze

Subtask 2.2.2 Establish Data Stewardship

Description
Data stewardship is about keeping the business community involved and focused on the goals of the project being undertaken. This participation should be regarded as ongoing because stewardship activities need to be performed at all stages of a project lifecycle and continue through the operational phase. This subtask outlines the roles and responsibilities that key personnel can assume within the framework of an overall stewardship program.

Prerequisites
None

Roles
Business Analyst (Secondary)
Business Project Manager (Primary)
Data Steward/Data Quality Steward (Secondary)
Project Sponsor (Approve)

Considerations
A useful mix of personnel to staff a stewardship committee may include:
● An executive sponsor
● A business steward
● A technical steward
● A data steward

Executive Sponsor

● Chair of the data stewardship committee
● Ultimate point of arbitration
● Liaison to management for setting and reporting objectives
● Should be recruited from project sponsors or management

Technical Steward
● Member of the data stewardship committee
● Liaison with the technical community
● Reference point for technical-related issues and arbitration
● Should be recruited from the technical community with a good knowledge of the business and operational processes

Business Steward
● Member of the data stewardship committee
● Liaison with business users
● Reference point for business-related issues and arbitration
● Should be recruited from the business community

Data Steward
● Member of the data stewardship committee
● Balances data and quality targets set by the business with IT/project parameters
● Responsible for all issues relating to the data, including defining and maintaining business and technical rules and liaising with the business and technical communities
● Reference point for arbitration where data is put to different uses by separate groups of users whose requirements have to be reconciled

The mix of personnel for a particular activity should be adequate to provide expertise in each of the major business areas that will be undertaken in the project. The success of the stewardship function relies on the early establishment and distribution of standardized documentation and procedures. These should be distributed to all of the team members working on stewardship activities.

The data stewardship committee should be involved in the following activities:
● Arbitration
● Sanity checking
● Preparation of metadata
● Support

Arbitration
Arbitration means resolving data contention issues, for example, deciding which is the best data to use, and determining how this data should best be transformed and interpreted so that it remains meaningful and consistent. This is particularly important during the phases where ambiguity needs to be resolved, for example, when conformed dimensions and standardized facts are being formulated by the analysis teams.

Sanity Checking
There is a role for the data stewardship committee to check the results and ensure that the transformation rules and processes have been applied correctly. This is a key verification task and is particularly important in evaluating prototypes developed in the Analyze Phase, during testing, and after the project goes live.

Preparation of Metadata
The data stewardship committee should be actively involved in the preparation and verification of technical and business metadata. Specific tasks are:
● Determining the structure and contents of the metadata
● Determining how the metadata is to be collected
● Determining where the metadata is to reside
● Determining who is likely to use the metadata
● Determining what business benefits are provided
● Determining how the metadata is to be acquired

Depending on the tools used to determine the metadata (for example, PowerCenter Profiling option, Informatica Data Explorer), the Data Steward may take a lead role in this activity.

● Business metadata - The purpose of maintaining this type of information is to clarify context, aid understanding, and provide business users with the ability to perform high-level searches for information. Business metadata is used to answer questions such as: "How does this division of the enterprise calculate revenue?"
● Technical metadata - The purpose of maintaining this type of information is for impact analysis, auditing, and source-target analysis. Technical metadata is used to perform analysis such as: "What would be the impact of changing the length of a field from 20 to 30 characters, and what systems would be affected?"

Support
The data stewardship committee should be involved in the inception and preparation of training of the user community by answering questions about data and the tools available to perform analytics. During the Analyze Phase the team would provide inputs to induction training programs prepared for system users when the project goes live. Such programs should include, for example, technical information about how to query the system and semantic information about the data that is retrieved.

New Functionality
The data stewardship committee needs to assess any major additions to functionality. There may be a need to perform this activity during the Analyze Phase if functionality that was initially overlooked is to be included in the scope of the project. After the project has gone live, this activity is of key importance because new functionality needs to be assessed for ongoing development. The assessment should consider return on investment, priority, and scalability in terms of new hardware/software requirements.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 17:55

Phase 2: Analyze

Task 2.3 Define Business Scope

Description
The business scope forms the boundary that defines where the project begins and ends. Throughout the project discussions about the business requirements and objectives, it may appear that everyone views the project scope in the same way. However, there is commonly confusion about what falls inside the boundary of a specific project and what does not. Developing a detailed project scope and socializing it with your project team, sponsors, and key stakeholders is critical.

Prerequisites
None

Roles
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)
Metadata Manager (Primary)
Project Sponsor (Secondary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
The primary consideration in developing the business scope is balancing the high-priority needs of the key beneficiaries with the need to provide results within the near-term. The Project Manager and Business Analysts need to determine the key business needs and determine the feasibility of meeting those needs to establish a scope that provides value. Quick WINS are accomplishments in a relatively short time, typically within a 60 to 120 day time-frame, without great expense and with a positive outcome - they can be included in the business scope. WINS stands for Ways to Implement New Solutions.

Tip
As a general rule, involve as many project beneficiaries as possible in the needs assessment and goal definition. However, it is often difficult to gather all of the project beneficiaries and the project sponsor together for any single meeting, so you may have to arrange multiple meetings and summarize the input for the various participants. A "forum" type of meeting may be the most efficient way to gather the necessary information since it minimizes the amount of time involved in individual interviews and often encourages useful dialog among the participants.

A common mistake made by project teams is to define the project scope only in general terms. This lack of definition causes managers and key beneficiaries throughout the company to make assumptions related to their own processes or systems falling inside or outside of the scope of the project. Then later, after significant work has been completed by the project team, some managers are surprised to learn that their assumptions were not correct, resulting in problems for the project team. Other project teams report problems with "scope creep" as their project gradually takes on more and more work. The safest rule is "the more detail, the better", along with details regarding what related elements are not within scope or will be delayed to a later effort.

Best Practices
None

Sample Deliverables
None

Last updated: 18-May-08 17:35

Phase 2: Analyze

Subtask 2.3.1 Identify Source Data Systems

Description
Before beginning any work with the data, it is necessary to determine precisely what data is required to support the data integration solution. In addition, the developers must also determine what source systems house the data, where the data resides in the source systems, and how the data is accessed. In this subtask, the development project team needs to validate the initial list of source systems and source formats and obtain documentation from the source system owners describing the source system schemas. The development team needs to carefully review the source system documentation to ensure that it is complete (i.e., specifies data owners and dependencies) and current. For relational systems, the documentation should include Entity-Relationship diagrams (E-R diagrams) and data dictionaries, if available. For file-based data sources (e.g., unstructured, semi-structured and complex XML), documentation may also include data format specifications, both internal and public (in the case of open data format standards), and any deviations from public standards. The team also needs to ensure that the data is fully accessible to the developers and analysts that are building the data integration solution.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Transformation Developer (Primary)

Considerations
In determining the source systems for data elements, it is important to request copies of the source system data to serve as samples for further analysis. This is a requirement in 2.8.1 Perform Data Quality Analysis of Source Data, but is also important at this stage of development. As data volumes in the production environment are often large, it is advisable to request a subset of the data for evaluation purposes. However, requesting too small of a subset can be dangerous in that it fails to provide a complete picture of the data and may hide any quality issues that truly exist.

Another important element of the source system analysis is to determine the life expectancy of the source system itself. Try to determine if the source system is likely to be replaced or phased out in the foreseeable future. As companies merge, or technologies and processes improve, many companies upgrade or replace their systems. This can present challenges to the team as the primary knowledge of those systems may be replaced as well. Understanding the life expectancy of the source system will play a crucial part in the design process.

For example, assume you are building a customer data warehouse for a small bank, and you will be building a staging area in the warehouse to act as a landing area for all of the source data. The primary source of customer data is a system called Shucks. After your project starts, you discover that the bank is being bought out by a larger bank and that Shucks will be replaced within three months by the larger bank's source of customer data: a system called Grins. Instead of having to redesign your entire data warehouse to handle the new source system, you can minimize the impact of replacing source systems by designing a generic staging area that would essentially allow you to plug in the new source system. Assuming that the bulk of your processing occurs after the data has landed in the staging area, it may be possible to design a generic staging area that could fit any customer source system instead of building a staging area based on one specific source system. Designing this type of staging area, however, takes a large amount of planning and adds time to the schedule, but will be well worth the effort because the warehouse is now able to handle source system changes.

For Data Migration, the source systems that are in scope should be understood at the start of the project. During the Analyze Phase these systems should be confirmed and communicated to all key stakeholders. Make a point to over-communicate what systems are in scope. If there is a disconnect between which systems are in and out of scope, it is important to document and analyze the impact. Identifying new source systems may exponentially increase the amount of resources needed on the project and require re-planning.
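One way to picture the generic staging area described above is as a source-agnostic record in which source attributes arrive as name/value pairs. The sketch below is illustrative only; the class and field names are assumptions, not a prescribed design.

    # Illustrative sketch of a generic (source-agnostic) staging record.
    # Attributes arrive as key/value pairs, so a replacement source system can be
    # mapped in without changing the staging structure. Names are assumptions.
    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Dict

    @dataclass
    class GenericStagingRecord:
        source_system: str              # e.g., "Shucks" today, "Grins" after the buyout
        source_entity: str              # logical entity, e.g., "customer"
        source_key: str                 # natural key in the source system
        attributes: Dict[str, str] = field(default_factory=dict)  # raw name/value pairs
        load_timestamp: datetime = field(default_factory=datetime.utcnow)

    # A record from the current source and one from its replacement share the
    # same staging structure; only the attribute mapping differs downstream.
    rec_old = GenericStagingRecord("Shucks", "customer", "C-1001",
                                   {"cust_name": "Acme Ltd", "branch_cd": "07"})
    rec_new = GenericStagingRecord("Grins", "customer", "9001-A",
                                   {"customer_full_name": "Acme Ltd", "branch": "Elm St"})
    print(rec_old, rec_new, sep="\n")

The trade-off noted above still applies: the flexibility of a generic structure is paid for with additional planning and more complex downstream mapping logic.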

Data Warehousing 107 of 1017 .Best Practices None Sample Deliverables None Last updated: 20-May-08 19:28 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 2: Analyze

Subtask 2.3.2 Determine Sourcing Feasibility

Description
Before beginning to work with the data, it is necessary to determine precisely what data is required to support the data integration solution. Take care to focus only on data that is within the scope of the requirements. Involvement of the business community is important in order to prioritize the business data needs based upon how effectively the data supports the users' top priority business problems. In addition, the developers must determine:
● what source systems house the data,
● where the data resides in the source systems,
● how the data is accessed.

Determining sourcing feasibility is a two-stage process, requiring:
● A thorough and high-level understanding of the candidate source systems.
● A detailed analysis of the data sources within these source systems.

Prerequisites
None

Roles
Application Specialist (Primary)
Business Analyst (Primary)
Data Architect (Primary)

Data Quality Developer (Primary)
Metadata Manager (Primary)

Considerations
In determining the source systems for data elements, it is important to request copies of the source system data to serve as samples for further analysis. Because data volumes in the production environment are often large, it is advisable to request a subset of the data for evaluation purposes. However, requesting too small a subset can be dangerous in that it fails to provide a complete picture of the data and may hide any quality issues that exist. Particular care needs to be taken when archived historical data (e.g., data archived on tapes) or syndicated data sets (i.e., externally provided data such as market research) is required as a source to the data integration application. Additional resources and procedures may be required to sample and analyze these data sources.

Candidate Source System Analysis
A list of business data sources should have been prepared during the business requirements phase. This list typically identifies 20 or more types of data that are required to support the data integration solution and may include, for example, customer demographic data, product information (e.g., categories and classifiers), sales forecasts, and financial information (e.g., revenues, commissions, and budgets). The candidate source systems (i.e., where the required data can be found) can be identified based on this list. There may be a single source or multiple sources for the required data.

Types of source include:
● Operational sources - The systems an organization uses to run its business. It may be any combination of the ERP and legacy operational systems.
● Strategic sources - The data may be sourced from existing strategic decision support systems or executive information systems.
● External sources - Any information source provided to the organization by an external entity, such as Nielsen marketing data or Dun & Bradstreet.

The following checklist can help to evaluate the suitability of data sources:
● Appropriateness of the data source with respect to the underlying business functions.
● The timeliness of the data from the data source.
● The availability of the data from the data source.
● The accuracy of the data from the data source.
● Source system platform.
● Current and future deployment of the source data system.
● Source system boundaries with respect to the scope of the project being undertaken.
● Access licensing requirements and limitations.
● Unique characteristics of the data source system, which can be particularly important for resolving contention amongst the various sources.

Note that projects typically under-estimate problems in these areas. Many projects run into difficulty because poor data quality, both at high (metadata) and low (record) levels, impacts the ability to perform transform and load operations. Timeliness, accuracy of data, and a single source for reference data may be key factors influencing the selection of the source systems.

Consider, for example, a low-latency data integration application that requires credit checks to be performed on customers seeking a loan. In this case, the relevant source systems may be:
● A call center that captures the initial transactional request and passes this information in real time to a data integration application.
● An external system against which a credibility check needs to be performed by the data integration application (i.e., to determine a credit rating).
● An internal data warehouse accessed by the data integration application to validate and complement the information.

An appreciation of the underlying technical feasibility may also impact the choice of data sources and should be within the scope of the high-level analysis being undertaken. This activity is about compiling information about the "as is and as will be" technological landscape that affects the characteristics of the data source systems and their impact on the data integration solution. Factors to consider in this survey are:
● Current and future organizational standards

● Infrastructure
● Hardware and software
● Networks
● Services
● Security criteria
● External data sources
● Migration strategies

Completion of this high-level analysis should reveal a number of feasible source systems, as well as the points of contact and owners of the source systems. A high-level analysis should also allow for the early identification of risks associated with the planned development, for example:
● If a source system is likely to be replaced or phased out in the foreseeable future. Understanding the life expectancy of the source system plays a crucial part in the design process. As companies merge, or technologies and processes improve, many companies upgrade or replace their systems. This can present challenges to the team, as the primary knowledge of those systems may be replaced as well.
● If the scope of the development is larger than anticipated, timelines should be increased and risk items should be added to the project.

Further, for B2B solutions, solutions with significant file-based data sources (and other solutions with complex data transformation requirements), it is necessary to also assess data sizes, volumes and the frequency of data updates with respect to the ability to parse and transform the data, and the implications that will have on hardware and software requirements.

Data Quality
The next step in determining source feasibility is to perform a detailed analysis of the data sources, both in structure and in content. If the data quality is determined to be inadequate in one or more respects, the analysis should produce a checklist identifying any issues about inaccuracies of the data content. Understanding data sources requires the participation of a data source expert/Data Quality Developer and a business analyst to clarify the relevance, technical content, and business meaning of the source data, and to create an accurate model of the source data systems.
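Before moving into detailed profiling, it can be useful to capture the high-level findings for each candidate source in a consistent structure. The sketch below is illustrative only; the field names and example values are assumptions rather than a Velocity deliverable.

    # Illustrative capture of the source-suitability checklist for one candidate
    # system. Field names and example values are assumptions for demonstration.
    candidate_source = {
        "name": "Order entry (ERP)",
        "appropriateness_to_business_functions": "high",
        "timeliness": "daily batch, 06:00 extract",
        "availability": "Mon-Sat, unavailable during month-end close",
        "accuracy": "known issues with customer address fields",
        "platform": "Oracle on AIX",
        "access_licensing_constraints": "read-only replica only",
        "in_scope_for_project": True,
        "identified_risks": ["system scheduled for replacement in 18 months"],
    }

    # A quick way to flag sources needing follow-up before the roadmap is set.
    if candidate_source["identified_risks"]:
        print(f"Follow up on {candidate_source['name']}: {candidate_source['identified_risks']}")

Recording the checklist consistently per source makes it easier to compare candidates and to feed risks into the project roadmap.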

INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 112 of 1017 . since this documentation may be out of date and inaccurate. In-depth structural and metadata profiling of the data sources can be conducted through Informatica Data Explorer. The output of the data profiling effort is a survey. either the complete dataset or a representative subset. Invalid data. corrected or flagged for correction at this stage in the project. and other metrics. and analyzing the low-level quality of the data in terms of record accuracy. The data profiling process involves analyzing the source data. where necessary. Re-engineering requirements to correct content errors. whose recipients include the data stewardship committee. Missing documentation. and business meaning of the source data. taking an inventory of available data elements. Using sample data derived from the actual source systems is essential for identifying data quality issues and for determining if the data meets the business requirements of the organization. depending on the data volume. It is important to work with the actual source data. and checking the format of those data elements.8 Perform Data Quality Audit for more information on required data quality and data analysis steps. duplication. An assessment whether the source data is in a suitable condition for extraction. Documentation should include Entity-Relationship diagrams (E-R diagrams) for the source systems.technical content. ● Bear in mind that the issue of data quality can cleave in two directions: discovering the structure and metadata characteristics of the source data. Gaps in data. It is important not to rely solely on the technical documentation to obtain accurate descriptions of the source data. Low-level/per-record data quality issues also must be uncovered and. A complete set of technical documentation and application source code should be available for this step. See 2. which documents: ● ● ● ● ● ● ● ● Inconsistencies of data structures with respect to the documentation. Missing data. Data profiling is a useful technique to determine the structure and integrity of the data sources. these diagrams then serve as the blueprints for extracting data from the source systems. Inconsistencies of data with respect to the business rules. particularly when used in conjunction with the technical documentation. Inconsistencies in standards and style.

Determine Source Availability
The next step is to determine when all source systems are likely to be available for data extraction. This is necessary in order to determine realistic start and end times for the load window. The Source Availability Matrix lists all the sources that are being used for data extraction and specifies the systems' downtimes during a 24-hour period. This matrix should contain details of the availability of the systems on different days of the week, including weekends and holidays. The developers need to work closely with the source system administrators during this step because the administrators can provide specific information about the hours of operation for their systems.

For Data Migration projects, access to data is not normally a problem given the premise of the solution. Historically, data migration projects have high-level sponsorship and whatever is needed is provided. However, for smaller-impact projects it is important that direct access is provided to all systems that are in scope. If direct access is not available, timelines should be increased and risk items should be added to the project. Typically, most projects without direct access go over time due to lack of availability of key resources to provide extracted data. If this can be avoided by providing direct access, it should be.

Determine File Transformation Constraints
For solutions with complex data transformation requirements, the final step is to determine the feasibility of transforming the data to target formats and any implications that will have on the eventual system design. Very large flat file formats often require splitting processes to be introduced into the design in order to split the data into manageable-sized chunks for subsequent processing. This will require identification of appropriate boundaries for splitting and may require additional steps to convert the data into formats that are suitable for splitting. For example, large PDF-based data sources may require conversion into some other format, such as XML, before the data can be split.
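Where very large flat files must be split into manageable chunks, the split has to respect record boundaries. The sketch below assumes simple newline-delimited records; the chunk size and file names are assumptions, and real designs may need to split on logical boundaries (for example, keeping all records for one customer in the same chunk).

    # Illustrative splitter for a large newline-delimited flat file.
    # Chunk size and naming are assumptions; real splits may need to respect
    # logical boundaries rather than a fixed record count.
    def split_flat_file(path: str, records_per_chunk: int = 1_000_000) -> int:
        chunk_no, count, out = 0, 0, None
        with open(path, "r", encoding="utf-8") as src:
            for line in src:
                if out is None or count >= records_per_chunk:
                    if out:
                        out.close()
                    chunk_no += 1
                    count = 0
                    out = open(f"{path}.part{chunk_no:04d}", "w", encoding="utf-8")
                out.write(line)
                count += 1
        if out:
            out.close()
        return chunk_no

    if __name__ == "__main__":
        print(split_flat_file("daily_orders.dat"))

Estimating how many chunks a typical load produces also feeds the hardware and software sizing considerations mentioned earlier in this subtask.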

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:37

Phase 2: Analyze

Subtask 2.3.3 Determine Target Requirements

Description
This subtask provides detailed business requirements that lead to the design of the target data structures for a data integration project. For Data Warehousing / Business Intelligence projects, this typically involves putting some structure to the informational requirements. For Operational Data Integration projects, this may involve identifying a subject area or transaction set within an existing operational schema or a new data store. The preceding business requirements tasks (see Prerequisites) provide a high-level assessment of the organization's business initiative and provide business definitions for the information desired. Note that if the project involves enterprise-wide data integration, it is important that the requirements process involve representatives from all interested departments and that those parties reach a semantic consensus early in the process.

Prerequisites
None

Roles
Application Specialist (Secondary)
Business Analyst (Primary)
Data Architect (Primary)
Data Steward/Data Quality Steward (Secondary)
Data Transformation Developer (Secondary)
Metadata Manager (Primary)

Technical Architect (Primary)

Considerations
Operational Data Integration
For an operational data integration project, requirements should be based on existing or defined business processes. However, for data warehousing projects, strategic information needs must be explored to determine the metrics and dimensions desired. This may involve an iterative process of interaction with the business community during requirements gathering sessions.

Metrics
Metrics should indicate an actionable business measurement. An example for a consultancy might be: "Compare the utilization rate of consultants for period x, segmented by industry, for each of the major geographies as compared to the prior period." Often a mix of financial (e.g., budget targets) and operational (e.g., trends in customer satisfaction) key performance metrics is required to achieve a balanced measure of the organizational performance. The key performance metrics may be directly sourced from an existing operational system or may require integration of data from various systems. The key performance metrics should be agreed upon through a consensus of the business users to provide common and meaningful definitions. This facilitates the design of processes to treat source metrics that may arrive in a variety of formats from various source systems. Market analytics may indicate a requirement for metrics to be compared to external industry performance criteria.

Dimensions
The key to determining dimension requirements is to formulate a business-oriented description of the segmentation requirements for each of the desired metrics, paying attention to words such as "by" and "where".

For example, a Pay-TV operator may be interested in monitoring the effectiveness of a new campaign geared at enrolling new subscribers. In a simple case, the number of new subscribers would be an essential metric. However, it may be important to the business community to perform an analysis based on the dimensions (e.g., by demography, by group, or by time). It is also important to clarify the lowest level of detail that is required for reporting. It is also important at this stage to determine as many likely summarization levels of a dimension as possible, since this can affect the structure of an eventual data model built from this analysis. For example, time may have a hierarchical structure comprising year, quarter, month, and day, while geography may be broken down into Major Region, Area, Subregion, etc. (a brief illustrative sketch appears at the end of this subtask). A technical consideration at this stage is to understand whether the dimensions are likely to be rapidly changing or slowly changing. Rapidly-changing dimensions are those whose values may change frequently over their lifecycle (e.g., a customer attribute that changes many times a year) as opposed to a slowly-changing dimension, such as an organization that may only change when a reorganization occurs. The metric and dimension requirements should be prioritized according to perceived business value to aid in the discussion of project scope in case there are choices to make regarding what to include or exclude.

Data Migration Projects
Data migration projects should be driven exclusively by the target system needs, not by what is available in the source systems. Therefore, it is recommended to identify the target system needs early in the Analyze Phase and focus the analysis activities on those objects.

B2B Projects
For B2B and non-B2B projects that have significant flat file based data targets, consideration needs to be given to the target data to be generated. Considerations include:
● What are the target file and data formats?
● What sizes of target files need to be supported? Will they require recombination of multiple intermediate data formats?
● Are there applicable intermediate or target canonical formats that can be created or leveraged?

● What XML schemas are needed to support the generation of the target formats?
● Do target formats conform to well-known proprietary or open data format standards?
● Does target data generation need to be accomplished within specific time or other performance-related thresholds?
● How are errors, both in data received and in overall B2B operation, communicated back to the internal operations staff and to external trading partners?
● What mechanisms are used to send data back to external partners?
● What applicable middleware, communications and enterprise application software is used in the overall B2B operation? What data transformation implications does the choice of middleware and infrastructure software impose?
● How is the overall B2B interaction governed? What process flows are involved in the system and how are they managed (for example, via B2B Data Exchange, external BPM software, etc.)?
● Are there machine-readable specifications that can be leveraged, directly or on modification, to support "specification driven transformation" based creation of data transformation scripts?
● Is sample data available for testing and verification of any data transformation scripts created?

At a higher level, the number and complexity of data sources, the number and complexity of data targets, and the number and complexity of intermediate data formats and schemas determine the overall scope of the data transformation and integration aspects of B2B data integration projects as a whole.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:46
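The dimension hierarchy and slowly- versus rapidly-changing-dimension concepts referenced earlier in this subtask can be captured in a simple structure such as the sketch below. This is illustrative only; the dimension names, levels, and change-rate labels are assumptions for discussion, not a required deliverable.

    # Illustrative capture of candidate dimensions, their summarization levels,
    # and an expected rate of change. Labels and levels are assumptions.
    dimension_requirements = {
        "time": {
            "levels": ["year", "quarter", "month", "day"],
            "change_rate": "static",            # the calendar rarely changes
        },
        "geography": {
            "levels": ["major_region", "area", "subregion"],
            "change_rate": "slowly_changing",   # e.g., reorganizations
        },
        "customer": {
            "levels": ["segment", "customer"],
            "change_rate": "rapidly_changing",  # attributes may change many times a year
        },
    }

    for name, spec in dimension_requirements.items():
        print(f"{name}: levels={' > '.join(spec['levels'])} ({spec['change_rate']})")

Recording the likely summarization levels and change rates at this stage gives the data modelers an early indication of how each dimension will need to be handled in the eventual design.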

Phase 2: Analyze

Subtask 2.3.5 Build Roadmap for Incremental Delivery

Description
Data Integration projects, whether data warehousing or operational data integration, are often large-scale, long-term projects. This can also be the case with analytics visualization projects or metadata reporting/management projects. Any complex project should be considered a candidate for incremental delivery. Under this strategy the entirety of the comprehensive objectives of the project is broken up into prioritized deliverables, each of which can be completed within approximately three months. Each increment builds on the completion of the prior increment, but each delivers clear value in itself. This gives near-term deliverables that provide early value to the business (which can be helpful in funding discussions) and, conversely, is an important avenue for early end-user feedback that may enable the development team to avoid major problems. This feedback may point out misconceptions or other design flaws which, if undetected, could cause costly rework later on.

The business requirements are reviewed for logical sub-projects (increments), source analysis is reviewed to provide feasibility, and business priorities are used to set the sequence of the increments. This roadmap, then, not only provides the project stakeholders with a rough timeline for completion of their entire objective, but also communicates the timing of these incremental sub-projects based on their prioritization. Below is an example of a timeline for a Sales and Finance data warehouse with the increments roughly spaced each quarter:

Q1 Yr 1 - Implement Data Warehouse Architecture
Q2 Yr 1 - Revenue Analytics
Q3 Yr 1 - Complete Bookings, Billings, Backlog
Q4 Yr 1 - GL Analytics
Q1 Yr 2 - COGS Analysis

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Architect (Primary)
Data Steward/Data Quality Steward (Secondary)
Project Sponsor (Secondary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
The roadmap is the culmination of business requirements analysis and prioritization, factoring in feasibility and the interoperability or dependencies of the increments.

It is critical to gain the buy-in of the main project stakeholders regarding priorities and agreement on the roadmap sequence. The objective is to start with increments that are highly feasible, have no dependencies, and provide significant value to the business. One or two of these "quick hit" increments are important to build end-user confidence and patience as the later, more complex increments may be harder to deliver.

Advantages of incremental delivery include:
● Customer value is delivered earlier – the business sees an early start to its ROI.
● There is less risk of costly rework effort due to misunderstood (or changing) requirements because of early feedback from end-users.
● Because the early increments reflect high-priority business needs, they may attract more visibility and have greater perceived value than the project as a whole.
● Early increments elicit feedback and sometimes clearer requirements that will be valuable in designing the later increments.
● Much lower risk of overall project failure because of the plan for early, attainable successes.
● Highly likely that even if all of the long-term objectives are not achieved (they may prove infeasible or lose favor with the business), the project still provides the value of the increments that are completed.

Disadvantages can be:
● There is always some extra effort involved in managing the release of multiple increments.
● There may be schema redesign or other rework necessary after initial increments because of unforeseen requirements or interdependencies.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 17:20

Phase 2: Analyze

Task 2.4 Define Functional Requirements

Description
For any project to be ultimately successful, it must resolve the business objectives in a way that the end users find easy to use and satisfactory in addressing their needs. The business drivers and goals provide a high-level view of these needs and serve as the starting point for the detailed functional requirements document. A functional requirements document is necessary to ensure that the project team understands these needs in detail and is capable of proceeding with a system design based upon the end-user needs. Business rules and data definitions further clarify specific business requirements and are very important in developing detailed functional requirements and ultimately the design itself.

Prerequisites
None

Roles
Business Project Manager (Review Only)

Considerations
Different types of projects require different functional requirements analysis processes. For example, an understanding of how key business users will use analytics reporting should drive the functional requirements for a business analytics project, while the requirements for data migration or operational data integration projects should be based on an analysis of the target transactions they are expected to support and what the receiving system needs in order to process the incoming data. Requirements for metadata management projects involve reviewing IT requirements for reporting and managing project metadata, surveying the corporate information technology landscape to determine potential sources of metadata, and interviewing potential users to determine reporting needs and preferences.

Business Analytics Projects

Developers need to understand the end-users' expectations and preferences in terms of analytic reporting in order to determine the functional requirements for data warehousing or analytics projects. This understanding helps to determine the details regarding what data to provide, with what frequency and periodicity, at what level of summarization, with what special calculations, and so forth. The analysis may include studying existing reporting, interviewing current information providers (i.e., those currently developing reports and analyses for Finance and other departments), and even reviewing mock-ups and usage scenarios with key end-users.

Data Integration Projects
For all data integration projects (i.e., all of the above), developers also need to review the source analysis with the DBAs to determine the functional requirements of the source extraction processes.

Data Migration
Functional requirements analysis for data migration projects involves a thorough understanding of the target transactions within the receiving system(s) and how the systems will process the incoming data for those transactions. The processing may involve multiple load steps, each with a different purpose, some operational and perhaps some for reporting. The business requirements should indicate the frequency of load for migration systems that will be run in parallel for a period of time (i.e., repeatedly).

Operational Data Integration
These projects are similar to data migration projects in terms of the need to understand the target transactions and how the data will be processed to accommodate them. There may also be real-time requirements for some, and there may be a need for interfaces with queue-based messaging systems in situations where EAI-type integration between operational databases is involved or there are master data management requirements.

B2B Projects
For B2B projects and flat file/XML-based data integration projects, the data formats that are required for trading partners to interact with the system, the mechanisms for trading partners and operators to determine the success and failure of transformations, and the internal interactions with legacy systems and other applications all form part of the requirements of the system.

While these are technical rather than business requirements, technical considerations often form a core component of the business operation. Often B2B systems may have real-time requirements and involve the use of interfaces with queue-based messaging systems, web services and other application integration technologies. These, in turn, may impose additional user interface and/or EAI-type integration requirements. For large B2B projects, for Business Process Outsourcing, and other types of B2B interaction, overall business process management will typically form part of the overall system, which may impose requirements around the use of partner management software such as B2B Data Exchange and/or business process management software.

Building the Specifications
For each distinct set of functional requirements, the Functional Requirements Specifications template can provide a valuable guide for determining the system constraints, inputs, outputs, and dependencies.

For projects using a phased approach, priorities need to be assigned to functions based on the business needs, dependencies for those functions, and general development efficiency. Prioritization will determine in what phase certain functionality is delivered.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:51

Phase 2: Analyze

Task 2.5 Define Metadata Requirements

Description
Metadata is often articulated as 'data about data'. That is, it is the collection of information that further describes the data used in the data integration project. Examples of metadata include (a simple illustrative structure for capturing these appears under Considerations below):
● Definition of the data element
● Business names of the element
● System abbreviations for that element
● The data type (string, decimal, date, etc.)
● Size of the element
● Source location

In terms of flat file and XML sources, metadata can include open and proprietary data standards and an organization's interpretations of those standards. In addition to the previous examples, flat file metadata can include:
● Standards documents governing the layout and semantics of data formats and interchanges.
● Companion or interpretation guides governing an organization's interpretation of data in a particular standard.
● Specifications of transformations between data formats.
● COBOL copybook definitions for flat-file data to be passed to legacy or backend systems.

All of these pieces of metadata are of interest to various members of the metadata community. Some are of interest only to certain technical staff members, while other pieces may be very useful for business people attempting to navigate through the enterprise data warehouse or across and through various business/subject area-orientated data marts. For example, metadata can provide answers to such typical business questions as:

● What does a particular piece of data mean (i.e., its definition in business terminology)?
● What is the time scale for some number (i.e., historical, absolute, or relative)?
● How is some particular metric calculated?
● Who is the data owner?

Metadata also provides answers to technical questions:
● What does this mapping do (i.e., source to target dependency)?
● How will a change over here affect things over there (i.e., impact analysis)?
● Where are the bottlenecks (i.e., in reports or mappings)?
● How current is my information?
● What is the load history for a particular object?
● Which reports are being accessed most frequently and by whom?

The components of a metadata requirements document include:
● Decision on how metadata will be used in the organization
● Assign data ownership
● Decision on who should use what metadata, why, and how
● Determine business and source system definitions and names
● Determine metadata sources (i.e., databases, modeling tools, CASE dictionary, ETL, BI, OLAP, XML Schemas, warehouse manager, etc.)
● Determine methods to consolidate metadata from multiple sources
● Identify where metadata will be stored (e.g., repository-based, central, distributed, or both)
● Evaluate the metadata products and their capabilities
● Determine training requirements
● Determine the quality of the metadata sources
● Determine responsibility for:
  ● Capturing
  ● Establishing standards and procedures
  ● Maintaining and securing the metadata
  ● Proper use, quality control, and update procedures

Prerequisites None Roles Application Specialist (Review Only) Business Analyst (Primary) Business Project Manager (Primary) Data Architect (Primary) Data Steward/Data Quality Steward (Primary) Database Administrator (DBA) (Primary) Metadata Manager (Primary) System Administrator (Primary) Considerations One of the primary objectives of this subtask is to attain broad consensus among all key business beneficiaries regarding metadata business requirements priorities. class words. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .. code values. it is critical to obtain as much participation as possible in this process.) Create a Metadata committee Determine if the metadata storage will be active or passive Determine the physical requirements of the metadata storage Determine and monitor measures to establish the use and effectiveness of the metadata. etc. abbreviations.● ● ● ● ● Establish metadata standards and procedures Define naming standards (i.e.Data Warehousing 126 of 1017 .
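To make the data-element examples and the requirements checklist above more concrete, the sketch below shows one way a single entry in a metadata dictionary might be structured, covering definition, business name, abbreviation, data type, size, source location, and ownership. The field names and values are illustrative assumptions for discussion only, not a prescribed Informatica or Velocity format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataElementMetadata:
    """Illustrative metadata dictionary entry for one data element.
    Field names are assumptions for the example, not a mandated schema."""
    element_name: str                 # physical column or field name
    business_name: str                # name the business community uses
    definition: str                   # definition in business terminology
    system_abbreviation: str          # abbreviation used in source systems
    data_type: str                    # e.g. string, date, decimal
    size: Optional[int] = None        # length or precision, where applicable
    source_location: str = ""         # system.table.column (or file/record)
    data_owner: str = ""              # accountable business owner
    data_steward: str = ""            # steward responsible for quality and updates
    naming_standard: str = ""         # class word / abbreviation standard applied

# Hypothetical example entry
cust_dob = DataElementMetadata(
    element_name="CUST_BIRTH_DT",
    business_name="Customer Date of Birth",
    definition="The customer's date of birth as captured at account opening.",
    system_abbreviation="DOB",
    data_type="date",
    source_location="CRM.CUSTOMER.BIRTH_DATE",
    data_owner="Retail Banking Operations",
    data_steward="Customer Data Steward",
    naming_standard="DT class word for date elements",
)
print(cust_dob.business_name, "->", cust_dob.source_location)
```

An entry like this can be captured in a spreadsheet, a database table, or a metadata repository; what matters for this subtask is agreeing on which of these attributes the organization will require and who is responsible for maintaining them.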

B2B Projects
For B2B and flat file oriented data integration projects, metadata is often defined in less structured forms than for data dictionaries or other traditional means of managing metadata. In some cases, applicable metadata may need to be mined from sample operational data or from unstructured and semi-structured system documentation. The process of designing the system may include the need to determine and document the metadata consumed and produced by legacy and third-party systems. For B2B projects, getting adequate sample source and target data can become a critical part of defining the metadata requirements.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 20:05

Phase 2: Analyze

Subtask 2.5.1 Establish Inventory of Technical Metadata

Description
Organizations undertaking new initiatives require access to consistent and reliable data resources. As organizations grow through mergers and consolidations, systems that generate data become isolated resources unless they are properly integrated. Integrating these data assets and turning them into key components of the decision-making process requires significant effort. Confidence in the underlying information assets, and an understanding of how those assets relate to one another, can provide valuable leverage in the strategic decision-making process.

Metadata is required for a number of purposes:
● Provide a data dictionary
● Assist with change management and impact analysis
● Provide a ‘system of record’ (lineage)
● Facilitate data auditing to comply with regulatory requirements
● Provide a basis on which formal data cleansing can be conducted
● Identify potential choices of canonical data formats
● Facilitate definition of data mappings

An inventory of sources (i.e., repositories) is necessary in order to understand the availability and coverage of metadata, the ease of accessing and collating what is available, and any potential gaps in metadata provisioning. In particular, if Metadata Manager is used, there may be a need to develop custom resources to access certain metadata repositories, which can require significant effort. A metadata inventory will provide the basis on which informed estimates and project plans can be prepared. The inventory is also the basis on which the development of metadata collation and reporting can be planned.

Prerequisites
None

Roles
Application Specialist (Review Only)
Business Analyst (Primary)
Data Steward/Data Quality Steward (Primary)
Metadata Manager (Primary)
Technical Architect (Primary)

Considerations
The first part of the process is to establish a Metadata Inventory that lists all metadata sources. This investigation will establish:
● The (generally recognized) name of each source
● The type of metadata (usually the product maintaining it) and the format in which it is kept (e.g., database type and version)
● The type of reporting expected from the metadata
● The availability of an XConnect (assuming Metadata Manager is used) to access the repository and collate the metadata
● The priority assigned to investigation
● Cross-references to other documents (e.g., design or modeling documents)
● Existence of a metadata model (one will need to be developed, usually by the Business Analysts and System Specialist, if it does not exist)

The second part of the process is to investigate in detail those metadata repositories or sources that will be required to meet the next phase of requirements. This investigation will establish:
● Ownership and stewardship of the metadata (responsibilities of the owners and stewards are usually pre-defined by an organization and are not part of preparing the metadata inventory)
● System and business definition of the metadata items
● Those elements required for reporting/analysis purposes
● Extent of any update history
● Frequency and methods of updating the repository
● The quality of the metadata sources (ideally, ‘quality’ can be measured qualitatively by a questionnaire issued to users, but it may be better measured against metrics that either exist within the organization or are proposed as part of developing this inventory)
● The development effort involved in developing a method of accessing/extracting metadata (for Metadata Manager, a custom XConnect) if none already exists (ideally, the estimates should be in man-days by skill and include a list of prerequisites and dependencies)

B2B Projects
For B2B and flat file oriented data integration projects, metadata is often defined and maintained in the form of non-database oriented metadata such as XML schemas or data format specifications (and specifications as to how standards should be interpreted). Metadata repositories may take the form of document repositories using document management or source control technologies. Metadata may need to be mined from sample data, legacy systems, and/or mapping specifications. In B2B systems, the responsibility for tracking metadata may shift to members of the technical architecture team, as traditional database design, planning, and maintenance may play a lesser role in these systems.

Best Practices
None

Sample Deliverables
Metadata Inventory

Last updated: 20-May-08 20:12

Phase 2: Analyze

Subtask 2.5.2 Review Metadata Sourcing Requirements

Description
Collect business requirements about the metadata that is expected to be stored and analyzed, details of the metadata sources, and the ability to extract and load this information into the respective metadata repository or metadata warehouse. Having a thorough understanding of these requirements is a must for a smooth and timely implementation of any metadata analysis solution that meets business requirements.

Prerequisites
None

Roles
Business Project Manager (Review Only)
Metadata Manager (Primary)
System Administrator (Review Only)

Considerations
Determine Metadata Reporting Requirements
Metadata reporting requirements should drive the specific metadata to be collected, as well as the implementation of tools to collect, store, and display it. These requirements are determined by the reporting needs for metadata. The need to expose metadata to developers is quite different from the need to expose metadata to operations personnel, and different again from the need to expose metadata to business users. Each of these pieces of the metadata picture requires different information and can be stored in, and handled by, different metadata repositories.

Developers typically require metadata that helps determine how source information maps to a target, as well as information that can help with impact analysis in the case of a change to a source or target structure or transformation logic. This includes information that is held in data modeling tools, data integration platforms, database management systems, and business intelligence tools. If there are data quality routines to be implemented, source metadata can also help to determine the best method for such implementation, as well as specific expectations regarding the quality of the source data.

Operations personnel generally require metadata regarding the data integration processes, business intelligence reporting, or both. This metadata allows operations to address issues as they arise and is helpful in determining issues or problems with delivering information to the final end-user, with regard to items such as the expected source data sizes versus the actual sizes processed, the time to run specific processes, whether load windows are being met, the number of end users running specific reports, the time of day reports are being run, and when the load on the system is highest.

Business users want to know how the data was generated (and related) and what manipulation, if any, was performed to produce it. Information looked at ranges from specific reference metadata (i.e., ontologies and taxonomies) to the transformations and/or calculations that were used to create the final report values.

Sources of Metadata
After initial reporting requirements are developed, the location and accessibility of the metadata must be considered. Some sources of metadata only exist in documentation, exist as knowledge gained through the course of working with the data, or can be considered “home grown” by the systems that are used to perform specific tasks on the data. When reviewing such metadata, it is important to note that there is not likely to be an automated method of extracting and loading it. If it is important to include this information in a metadata repository or warehouse, then in the best case a custom process can be created to load this metadata and, in the worst case, this information needs to be entered manually. Various other, more formalized sources of metadata usually have automated methods for loading to a metadata repository or warehouse.
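As a rough illustration of the kind of custom loading process described above, the sketch below merges an automated export (for example, a CSV extracted from a modeling tool) with manually maintained business descriptions kept in a spreadsheet-style CSV. The file names, column names, and merge logic are assumptions for illustration and are not part of any Informatica product interface.

```python
import csv

def load_csv(path):
    """Read a CSV export into a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def consolidate(model_export_rows, manual_rows):
    """Merge tool-exported structural metadata with manually captured
    business descriptions, keyed on the physical element name."""
    descriptions = {row["element_name"]: row.get("business_description", "")
                    for row in manual_rows}
    consolidated = []
    for row in model_export_rows:
        name = row["element_name"]
        consolidated.append({
            "element_name": name,
            "data_type": row.get("data_type", ""),
            "source_system": row.get("source_system", ""),
            # Manually captured documentation fills in only where the export has no description
            "description": row.get("description") or descriptions.get(name, ""),
        })
    return consolidated

if __name__ == "__main__":
    # Hypothetical file names; in practice these would be the actual exports
    merged = consolidate(load_csv("modeling_tool_export.csv"),
                         load_csv("manual_descriptions.csv"))
    for entry in merged[:5]:
        print(entry)
```

Even a small utility like this makes the manual-versus-automated trade-off visible early: anything that cannot be exported from a tool has to be captured and maintained by hand.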

It is important to note that most sources of metadata that can be loaded in an automated fashion also contain mechanisms for holding custom or unstructured metadata. Specifically, there are various locations where description fields can be used to include unstructured, more descriptive metadata, and mechanisms such as metadata extensions allow for user-defined fields of metadata.

Metadata Storage and Loading
For each of the various types of metadata reporting requirements mentioned, as well as the various types of metadata sources, different methods of storage may fit better than others and affect how the various metadata can be sourced.

In the case of metadata for developers and operations personnel, this type of metadata can generally be found and stored in the repositories of the software used to accomplish the tasks, such as the PowerCenter repository or the business intelligence software repository. In general, these software packages include sufficient reporting capability to meet the needs of this type of reporting. At the same time, most of these metadata repositories include locations for manually entering metadata, such as description fields.

In the case of metadata requirements for a business user, or reporting across multiple software metadata repositories, this usually requires a platform that can integrate the metadata from various metadata sources as well as provide a relatively robust reporting function, which specific software metadata repositories usually lack. Thus, if robust reporting is required, a metadata warehouse platform such as Informatica Metadata Manager may be more appropriate to handle such functions; in these cases, a platform like Metadata Manager is optimal.

In terms of automated loading of metadata, when using the PowerCenter repository as a metadata hub, sources, targets, and other objects can be imported natively from the connections the PowerCenter software can make to these systems, including items such as database management systems, ERP systems via PowerConnects, and XML schema definitions. Also, PowerCenter can import definitions from data modeling tools using Metadata Exchange. This methodology may obviate the need for creating custom methods of loading metadata, or manually entering the same metadata in various locations. When using Metadata Manager, however, custom XConnects need to be created to accommodate any metadata source that does not already have a pre-built loading interface, or any source where the pre-built interface does not extract all the required metadata. (For details about developing a custom XConnect, refer to the Informatica Metadata Manager 8.1 Custom Metadata Integration Guide.)

Metadata Analysis and Reports
The specific types of analysis and reports must also be considered with regard to exactly what metadata needs to be sourced. For metadata repositories like PowerCenter, the available analysis is very specific, and little information beyond what is normally sourced into the repository is available for reporting. In the case of a metadata warehouse platform such as Metadata Manager, more comprehensive reporting can be created. Metadata Manager contains various XConnect interfaces for data modeling tools, database management systems, business intelligence tools, and the PowerCenter data integration platform, to name a few. (For a specific list, refer to the Metadata Manager Administrator Guide.)

From a high level, the following analysis is possible with Metadata Manager:
● Metadata browsing
● Metadata searching
● Where-used analysis
● Lineage analysis
● Packaged reports

Metadata Manager also provides more specific metadata analysis to help analyze source repository metadata, including:
● Business intelligence reports – to analyze a business intelligence system, such as report information, user activity, and how long it takes to run reports
● Data integration reports – to analyze data integration operations and processes, such as reports that identify data integration problems
● Database management reports – to explore database objects, such as schemas, structures, triggers, methods, and indexes
● Metamodel reports – to analyze how metadata is classified for each repository and the relationships among them (for more information about metamodels, refer to the Metadata Manager Administrator Guide)
● ODS reports – to analyze data in particular metadata repositories
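To illustrate what the where-used and lineage analyses listed above mean in practice, the sketch below walks a small, hand-built dependency graph from a source column to the reports that ultimately consume it. This is a generic illustration of the concept only; it is not how Metadata Manager is implemented, and the object names are invented for the example.

```python
from collections import defaultdict, deque

# Illustrative metadata links: each edge means "feeds into"
# (source column -> staging column -> warehouse column -> report)
edges = [
    ("CRM.CUSTOMER.BIRTH_DATE", "STG.CUSTOMER.BIRTH_DT"),
    ("STG.CUSTOMER.BIRTH_DT", "EDW.DIM_CUSTOMER.BIRTH_DT"),
    ("EDW.DIM_CUSTOMER.BIRTH_DT", "Report: Customer Age Profile"),
    ("EDW.DIM_CUSTOMER.BIRTH_DT", "Report: Marketing Segmentation"),
]

downstream = defaultdict(list)
upstream = defaultdict(list)
for src, tgt in edges:
    downstream[src].append(tgt)   # direction used for where-used / impact analysis
    upstream[tgt].append(src)     # direction used for lineage back to the source

def walk(graph, start):
    """Breadth-first traversal returning every object reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Where-used: everything affected if the CRM birth date column changes
print(walk(downstream, "CRM.CUSTOMER.BIRTH_DATE"))
# Lineage: everything that contributes to the Customer Age Profile report
print(walk(upstream, "Report: Customer Age Profile"))
```

The value of a metadata warehouse is that these links are collected automatically across tools, rather than being maintained by hand as in this toy example.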

It may be possible that, even with a metadata warehouse platform like Metadata Manager, some analysis requirements cannot be fulfilled by the above-mentioned features and out-of-the-box reports. Analysis should be performed to identify any gaps and to determine whether any customization or design can be done within Metadata Manager to resolve them. Bear in mind that Informatica Data Explorer (IDE) also provides a range of source data and metadata profiling and source-to-target mapping capabilities.

Best Practices
None

Sample Deliverables
None

Last updated: 09-May-08 13:52

Phase 2: Analyze

Subtask 2.5.3 Assess Technical Strategies and Policies

Description
Every IT organization operates using an established set of corporate strategies and related development policies. Understanding and detailing these approaches may require discussions ranging from Sarbanes-Oxley compliance to specific supported hardware and software considerations. The goal of this subtask is to detail and assess the impact of these policies as they relate to the current project effort.

Prerequisites
None

Roles
Application Specialist (Review Only)
Business Project Manager (Primary)
Data Architect (Primary)
Database Administrator (DBA) (Primary)
System Administrator (Primary)

Considerations
Assessing the impact of an enterprise’s IT policies may incorporate a wide range of discussions covering an equally wide range of business and development areas. The following types of questions should be considered in beginning this effort.

Overall
● Is there an overall IT Mission Statement? If so, what specific directives might affect the approach to this project effort?

Environment
● What are the current hardware or software standards? For example: UNIX vs. NT vs. Linux? Oracle vs. SQL Server? SAP vs. PeopleSoft?
● What, if any, data extraction and integration standards currently exist?
● What source systems are currently utilized? For example: mainframe? flat file? relational database?
● What, if any, regulatory requirements exist regarding access to and historical maintenance of the source data?
● What, if any, load window restrictions exist regarding system and/or source data availability?
● How many environments are used in a standard deployment? For example: 1) Development, 2) Test, 3) QA, 4) Pre-Production, 5) Production.
● What is, or will be, the end-user presentation layer?

Project Team
● What is a standard project team structure? For example: Project Sponsor, Project Manager, Business Analyst, Developer, etc.
● Is all development performed by full-time employees? Are contractors and/or offshore resources employed?
● Are dedicated support resources assigned, or are they often shared among initiatives (e.g., DBAs)?

Project Lifecycle
● What is a typical development lifecycle? What are the standard milestones?
● What criteria are typically applied to establish production readiness?
● What change control mechanisms/procedures are in place? Are these controls strictly policy-based, or is specific change-control software in use?
● What, if any, promotion/release standards are used?
● What is the standard for production support?

Metadata and Supporting Documentation
● What types of supporting documentation are typically required?
● What, if any, is the current metadata strategy within the enterprise?

Resolving the answers to questions such as these enables greater accuracy in project planning, scoping, and staffing efforts. Additionally, the understanding gained from this assessment ensures that any new project effort will better marry its approach to the established practices of the organization.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:11

Phase 2: Analyze

Task 2.6 Determine Technical Readiness

Description
The goal of this task is to determine the readiness of an IT organization with respect to its technical architecture, the implementation of said architecture, and the associated staffing required to support the technical solution. Conducting this analysis, through interviews with the existing IT team members (such as those noted in the Roles section), provides evidence as to whether or not the critical technologies and associated support system are sufficiently mature as to not present significant risk to the endeavor.

Prerequisites
None

Roles
Business Project Manager (Primary)
Database Administrator (DBA) (Primary)
System Administrator (Primary)
Technical Architect (Primary)

Considerations
Carefully consider the following questions when evaluating the technical readiness of a given enterprise:
● Has the architecture team been staffed and trained in the assessment of critical technologies?
● Have all of the decisions been made regarding the various components of the infrastructure, including network, servers, and software?
● Has a schedule been established regarding the ordering, installing, and deployment of the servers and network?
● If in place, what are the availability, scalability, capacity, and reliability of the infrastructure?
● Has the project team been fully staffed and trained, including but not limited to a Project Manager, Technical Architect, System Administrator, Developer(s), and DBA(s)? (See 1.2.1 Establish Project Roles.)
● Are proven implementation practices and approaches in place to ensure a successful project? (See 2.5.3 Assess Technical Strategies and Policies.)
● Has the Technical Architect evaluated and verified the Informatica PowerCenter Quickstart configuration requirements?
● Has the repository database been installed and configured?

By gaining a better understanding of questions such as these, developers can achieve a clearer picture of whether or not the organization is sufficiently ready to move forward with the project effort. This information also helps to develop a more accurate and reliable project plan.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:44

Phase 2: Analyze

Task 2.7 Determine Regulatory Requirements

Description
Many organizations must now comply with a range of regulatory requirements such as financial services regulation, data protection, Sarbanes-Oxley, etc. These requirements differ from the “normal” business requirements in that they are imposed by legislation and/or external bodies. This can mean prescribed reporting, retention of data for potential criminal investigations, detailed auditing of data, interchange of data between organizations, and specific controls over actions and processing of the data. Some industries may also be required to complete specialized reports for government regulatory bodies. The penalties for not precisely meeting the requirements can be severe.

However, there is a “carrot and stick” element to regulatory compliance. Successful compliance — for example, in the banking sector, with the Basel II Accord — brings the potential for more productive and profitable uses of data. Regulatory requirements and industry standards can also present the business with an opportunity to improve its data processes and update the quality of its data in key areas.

As data is prepared for the later stages in a project, the project personnel must establish what government or industry standards the project data must adhere to and devise a plan to meet these standards. These steps include establishing a catalog of all reporting and auditing required, including any prescribed content, formats, timescales, and controls. The definitions of content (e.g., inclusion/exclusion rules, units, etc.) and any metrics or calculations are likely to be particularly important.

Prerequisites
None

Roles
Business Analyst (Primary)
Business Project Manager (Review Only)
Legal Expert (Primary)

Considerations
Areas where requirements arise include the following:
● Sarbanes-Oxley regulations in the U.S. mean a proliferation of controls on processes and data. There may be implications for how environments are set up and for controls on migration between environments (e.g., between Development, Test, and Production), as well as for sign-offs. Developers need to work closely with an organization’s Finance Department to ascertain exactly how Sarbanes-Oxley affects the project.
● Another regulatory system applicable to financial companies is the Basel II Accord. While Basel II does not have the force of law, it is a de facto requirement within the international financial community.
● Other industries are demanding adherence to new data standards, as enterprises realize the benefits of synchronizing their data storage conventions with suppliers and customers, both communally and individually, by coming together around common data models such as bar codes and RFID (radio frequency identification). Such initiatives are sometimes gathered under the umbrella of Global Data Synchronization (GDS). Remember, the key benefit of GDS is that it is not a compliance chore but a positive and profitable initiative for a business.

If your project must comply with a government or industry regulation, or if the business simply insists on high standards for its data (for example, to establish a “single version of the truth” for items in the business chain), then you must increase your focus on data quality in the project. For example, compliance with a request for data under Section 314 of the USA PATRIOT Act is likely to be difficult for a business that finds it has large numbers of duplicate records, records that contain empty fields, or fields populated with default values. Such problems should be identified and addressed before the data is moved downstream in the project. Task 2.8 Perform Data Quality Audit is dedicated to performing a Data Quality Audit that can provide the project stakeholders with a detailed picture of the strengths and weaknesses of the project data in key compliance areas such as accuracy, completeness, specified verification, and duplication.

Regulatory requirements often require the ability to clearly audit the processes affecting the data. This may require a metadata reporting system that can provide viewing and reporting of data lineage and ‘where-used.’ Such a system can produce spin-off benefits for IT in terms of automated project documentation and impact analysis.

Potentially, there are now two areas to investigate in more detail: data and metadata.
● Map the requirements back to the data and/or metadata required, using a standard modeling approach, along with the inventory of metadata.
● Use data models and the metadata catalog to assess the availability and quality of the required data and metadata. Use the data models of the systems and data sources involved.
● Verify that the target data models meet the regulatory requirements. Industry and regulatory standards for data interchange may also affect data model and ETL designs. HIPAA and HL7 compliance may dictate transaction definitions that affect healthcare-related projects, as may SWIFT or Basel II for finance-related data.

Processes and Auditing Controls
It is important that data can be audited at every stage of processing where it is necessary. To this end, review any proposed processes and audit controls to verify that the regulatory requirements can be met and that any gaps are filled. Also, ensure that reporting requirements can be met, again filling any gaps. It is important to check that the format, content, and delivery mechanisms for all reports comply with the regulatory requirements.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:13

Phase 2: Analyze

Task 2.8 Perform Data Quality Audit

Description
Data quality is a key factor for several tasks and subtasks in the Analyze Phase. The quality of the proposed project source data, in terms of both its structure and content, is a key determinant of the specifics of the business scope and of the success of the project in general. There is little point in performing a data warehouse, migration, or integration project if the underlying data is in bad shape. Poor data quality can impede the proper execution of later steps in the project, such as data transformation and load operations, and can also compromise the business’ ability to generate a return on the project investment. This is compounded by the fact that most businesses underestimate the extent of their data quality problems.

The Data Quality Audit is designed to analyze representative samples of the source data and discover their data quality characteristics so that these can be articulated to all relevant project personnel. Problems with the data content must be communicated to senior project personnel as soon as they are discovered. The project leaders can then decide what actions, if any, are necessary to correct data quality issues and ensure that the successful completion of the project is not in jeopardy.

For information on issues relating primarily to data structure, see subtask 2.3.2 Determine Sourcing Feasibility. This task focuses on the quality of the data content.

Prerequisites
None

Roles
Business Project Manager (Secondary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Primary)
Technical Project Manager (Secondary)

Considerations
The Data Quality Audit can typically be conducted very quickly, but the actual time required is determined by the starting condition of the data and the success criteria defined at the beginning of the audit. The main steps are as follows:
● Representative samples of source data from all main areas are provided to the Data Quality Developer.
● The Data Quality Developer uses a data analysis tool to determine the quality of the data according to several criteria.
● The Data Quality Developer generates summary reports on the data and distributes these to the relevant roles for discussion and next steps.

Two important aspects of the audit are (1) the data quality criteria used, and (2) the type of report generated.

Data Quality Criteria
You can define any number and type of criteria for your data quality. However, there are six standard criteria:
● Accuracy is concerned with the general accuracy of the data in a dataset. It is often determined by comparing the dataset with a reliable reference source, for example, a dictionary file containing product reference data.
● Completeness is concerned with missing data, that is, fields in the dataset that have been left empty or whose default values have been left unchanged. For example, many data input fields have a default date setting of 01/01/1900. If a record includes 01/01/1900 as a date of birth, it is highly likely that the field was never populated.
● Conformity is concerned with data values of a similar type that have been entered in a confusing or unusable manner, for example, telephone numbers that include/omit area codes.
● Consistency is concerned with the occurrence of disparate types of data records in a dataset created for a single data type (e.g., the combination of personal and business information in a dataset intended for business data only).
● Integrity is concerned with the recognition of meaningful associations between records in a dataset. For example, a dataset may contain records for two or more family members in a household but without any means for the organization to recognize or use this information. A dataset may contain several records with common surnames and street addresses, indicating that the records refer to a single household; this type of information is relevant to marketing personnel, for example.
● Duplication is concerned with data records that duplicate one another’s information, that is, with identifying redundant records in the dataset or records with meaningful information in common. For example, a dataset may contain user-entered records for “Batch No. 12345” and “Batch 12345”, where both records describe the same batch.

This list is not absolute. Every organization’s data needs are different, and the prevalence and relative priority of data quality issues differ from one organization and one project to the next. The characteristics above are sometimes described with other terminology, such as redundancy or timeliness.

Note that the accuracy factor differs from the other five factors in the following respect: whereas a pair of duplicate records may be visible to the naked eye, it can be difficult to tell simply by “eye-balling” the data whether a given record is inaccurate. Accuracy can be determined by applying fuzzy logic to the data or by validating the records against a verified reference data set.

Best Practices
Developing the Data Quality Business Case

Sample Deliverables
None

Last updated: 21-Aug-07 14:06

Phase 2: Analyze

Subtask 2.8.1 Perform Data Quality Analysis of Source Data

Description
The data quality audit is a business rules-based approach that aims to help define project expectations through the use of data quality processes (or plans) and data quality scorecards. A scorecard is a graphical representation of the levels of data quality in the dataset. This subtask focuses on data quality analysis. It involves conducting a data analysis on the project data, or on a representative sample of the data, and producing an accurate and qualified summary of the data’s quality. The results are processed and presented to the business users in the next subtask, 2.8.2 Report Analysis Results to the Business.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)

Considerations
There are three key steps in the process:

1. Select Target Data
The main objective of this step is to meet with the data steward and business owners to identify the data sources to be analyzed, as well as to gather input on known data quality issues. These define the initial scope of the audit. The result of this step is a list of the sources of data to be analyzed, along with the identification of all known issues. For each data source, the Data Quality Developer will need all available information on the data format, content, and structure. The following figure illustrates selecting target data from multiple sources.

2. Run Data Quality Analysis
This step identifies and quantifies data quality issues in the source data. Data quality analysis plans are configured in Informatica Data Quality (IDQ) Workbench. (The plans should be configured in a manner that enables the production of scorecards in the next subtask.) The plans designed at this stage identify cases of incomplete or absent data values. Data analysis provides detailed metrics to guide the next steps of the audit. For example:
● For character data, analysis identifies all distinct values (such as code values) and their frequency distribution.
● For numeric data, analysis provides statistics on the highest, lowest, average, and total values, as well as the number of positive values, negative values, zero/null values, and any non-numeric values.
● For dates, analysis identifies the highest and lowest dates, as well as any invalid date values.

For consumer packaging data, for example, analysis can detect issues such as bar codes with correct or incorrect numbers of digits. Using IDQ, the Data Quality Developer can identify all such data content issues.

3. Define Business Rules
The key objectives of this step are to identify issues in the areas of completeness, conformity, and consistency, to prioritize data quality issues, and to define customized data quality rules. These objectives involve:
● Discussions of data quality analyses with business users to define completeness, conformity, and consistency rules for each data element.
● Establishing a set of base rules to test the conformity of the attributes’ data values against basic rule definitions. For example, if an attribute has a date type, then that attribute should only have date information stored. For each data set, all the necessary fields must be tested against the base rule sets.
● Tuning and re-running the analysis plans with these business rules.

The following figure illustrates business rule evaluation. The figure below shows sample IDQ report output.
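As a rough illustration of the kind of base-rule profiling described in steps 2 and 3 above, the sketch below computes distinct-value frequencies, null counts, numeric ranges, and date-validity checks over a small in-memory sample. It is a generic illustration only, not IDQ Workbench functionality; the column names, sample values, and rules are assumptions for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records standing in for profiled source data
records = [
    {"cust_type": "B", "balance": "120.50", "birth_date": "1972-03-14"},
    {"cust_type": "P", "balance": "-15.00", "birth_date": "1900-01-01"},
    {"cust_type": "B", "balance": "",       "birth_date": "13/45/2001"},
]

def profile(rows):
    """Produce simple completeness, conformity, and frequency metrics."""
    freq = Counter(r["cust_type"] for r in rows)          # distinct values and counts
    nulls = sum(1 for r in rows if not r["balance"])      # completeness: empty fields
    numerics = [float(r["balance"]) for r in rows if r["balance"]]
    invalid_dates = 0
    defaulted_dates = 0
    for r in rows:
        try:
            d = datetime.strptime(r["birth_date"], "%Y-%m-%d")
            if d == datetime(1900, 1, 1):                 # default value left unchanged
                defaulted_dates += 1
        except ValueError:
            invalid_dates += 1                            # conformity: not a valid date
    return {
        "cust_type_frequencies": dict(freq),
        "balance_nulls": nulls,
        "balance_min": min(numerics),
        "balance_max": max(numerics),
        "invalid_birth_dates": invalid_dates,
        "defaulted_birth_dates": defaulted_dates,
    }

print(profile(records))
```

Metrics of this kind are exactly what the audit feeds into the scorecards discussed in the next subtask.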

Data Warehousing 149 of 1017 .Best Practices None Sample Deliverables None Last updated: 15-Feb-07 18:17 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 2: Analyze

Subtask 2.8.2 Report Analysis Results to the Business

Description
The steps outlined in subtask 2.8.1 lead to the preparation of the Data Quality Audit Report, which is delivered in this subtask. The Data Quality Audit Report highlights the state of the data analyzed in an easy-to-read, high-impact fashion. The report can include the following types of file:
● Data quality scorecards – charts and graphs of data quality that can be pre-set to present and compare data quality across key fields and data types
● Drill-down reports that permit reviewers to access the raw data underlying the summary information
● Exception files

In this subtask, potential risk areas are identified and alternative solutions are evaluated. The Data Quality Audit concludes with a presentation of these findings to the business and project stakeholders and agreement on recommended next steps.

Prerequisites
None

Roles
Business Analyst (Secondary)
Business Project Manager (Secondary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Primary)
Technical Project Manager (Secondary)

Considerations
There are two key activities in this subtask: delivering the report, and framing a discussion for the business about what actions to take based on the report conclusions.

Delivering the report involves formatting the analysis results from subtask 2.8.1 into a framework that can be easily understood by the business. This includes building data quality scorecards, preparing the data sources for the scorecards, and possibly creating audit summary documentation such as a Microsoft Word document or a PowerPoint slideshow. IDQ reports information in several formats, including database tables, CSV files, HTML files, and graphically. (Graphical displays, or scorecards, are linked to the underlying data so that viewers can move from high-level to low-level views of the data.) Part of the report creation process is the agreement of pass/fail scores for the data and the assignment of weights to the data performance for different criteria.

Creating Scorecards
Informatica Data Quality (IDQ) is used to identify, measure, and categorize data quality issues according to business criteria. The data quality issues can then be evaluated. For example, the business may state that at least 98 percent of values in address data fields must be accurate and weight the zip+four field as most important. The data quality scorecard can also be presented through a dashboard framework, which adds value to the scorecard by grouping graphical information in business-intelligent ways. Once the scorecards are defined, the data quality plans can be re-used to track data quality progress over time and throughout the organization.
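The arithmetic behind such pass/fail scoring and weighting can be illustrated with a short sketch. The criteria, weights, and thresholds below are invented for the example and do not represent IDQ's internal implementation; they simply show how weighted scores and a traffic-light status might be derived from measured data quality percentages.

```python
# Measured quality (percent of records passing) per criterion for one field group
measured = {"accuracy": 96.5, "completeness": 99.2, "conformity": 92.0}

# Business-agreed weights and pass thresholds (illustrative values only)
weights = {"accuracy": 0.5, "completeness": 0.3, "conformity": 0.2}
thresholds = {"accuracy": 98.0, "completeness": 97.0, "conformity": 95.0}

def scorecard(measured, weights, thresholds):
    """Return a weighted overall score plus a per-criterion pass/fail flag."""
    overall = sum(measured[c] * weights[c] for c in measured)
    detail = {c: ("pass" if measured[c] >= thresholds[c] else "fail") for c in measured}
    # Simple traffic-light rule: green if everything passes, amber for small gaps, red otherwise
    worst_gap = max(thresholds[c] - measured[c] for c in measured)
    status = "green" if worst_gap <= 0 else ("amber" if worst_gap <= 3 else "red")
    return overall, detail, status

overall, detail, status = scorecard(measured, weights, thresholds)
print(f"Weighted score: {overall:.1f}  Status: {status}  Detail: {detail}")
```

Whatever tool produces the scorecard, the important point is that the weights and thresholds are agreed with the business before the results are presented, so the pass/fail outcome is not open to renegotiation after the fact.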

A dashboard can present measurements in a “traffic light” manner (color-coded green/amber/red) to provide quick visual cues as to the quality of, and actions needed for, the data.

Reviewing the Audit Results and Deciding the Next Step
By integrating various data analysis results within the dashboard application, the stakeholders can review the current state of data quality and decide on appropriate actions within the project. The set of stakeholders should include one or more members of the data stewardship committee, a Data Quality Developer, the project manager, data experts, and representatives of the business. Together, these stakeholders can review the data quality audit conclusions and conduct a cost-benefit comparison of the desired data quality levels versus the impact on the project of the steps required to achieve those levels.

In some projects — for example, when the data must comply with government or industry regulations — the data quality levels are non-negotiable, and the project stakeholders must work to those regulations. In other cases, the business objectives may be achieved by data quality levels that are less than 100 percent. In all cases, the project data must attain minimum quality levels in order to pass through the project processes and be accepted by the target data source. For these reasons, it is necessary to discuss data quality as early as possible in project planning.

Ongoing Audits and Data Quality Monitoring
Conducting a data quality audit one time provides insight into the then-current state of the data, but does not reflect how project activity can change data quality over time. Tracking levels of data quality over time, as part of an ongoing monitoring process, provides a historical view of when and how much the quality of data has improved. Historical statistical tracking and charting capabilities are available within a data quality scorecard; once configured, the scorecard typically does not need to be re-created for successive data quality analyses, and scorecards can be easily updated.

As part of a statistical control process, data quality levels can be tracked on a periodic basis and charted to show whether the measured levels of data quality reach and remain in an acceptable range, or whether some event has caused the measured level to fall below what is acceptable. Statistical control charts can help in notifying data stewards when an exception event impacts data quality and can help to identify the offending information process. The following figure illustrates how ongoing audits can chart progress in data quality.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 17:29

Phase 3: Architect

3 Architect
● 3.1 Develop Solution Architecture
  - 3.1.1 Define Technical Requirements
  - 3.1.2 Develop Architecture Logical View
  - 3.1.3 Develop Configuration Recommendations
  - 3.1.4 Develop Architecture Physical View
  - 3.1.5 Estimate Volume Requirements
● 3.2 Design Development Architecture
  - 3.2.1 Develop Quality Assurance Strategy
  - 3.2.2 Define Development Environments
  - 3.2.3 Develop Change Control Procedures
  - 3.2.4 Determine Metadata Strategy
  - 3.2.5 Develop Change Management Process
● 3.3 Implement Technical Architecture
  - 3.3.1 Procure Hardware and Software
  - 3.3.2 Install/Configure Software

Phase 3: Architect

Description
During this phase of the project, the project infrastructure is developed and the development standards and strategies are defined. The technical requirements are defined, and the conceptual architecture is designed, which forms the basis for determining capacity requirements and configuration recommendations. The environments and strategies for the entire development process are defined; the strategies include development standards, quality assurance, change control processes, and metadata strategy. This phase should culminate in the implementation of the hardware and software that will allow the Design Phase and the Build Phase of the project to begin.

It is critical that the architecture decisions made during this phase are guided by an understanding of the business needs. As data integration architectures become more real-time and mission-critical, good architecture decisions will ensure the success of the overall effort.

Proper execution during the Architect Phase is especially important for Data Migration projects. In the Architect Phase, a series of key tasks are undertaken to accelerate development, ensure consistency, and expedite completion of the data migration.

Prerequisites
None

Roles
Business Analyst (Primary)
Business Project Manager (Primary)
Data Architect (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Primary)
Data Warehouse Administrator (Review Only)
Database Administrator (DBA) (Primary)
Metadata Manager (Primary)
Presentation Layer Developer (Secondary)
Project Sponsor (Approve)
Quality Assurance Manager (Primary)
Repository Administrator (Primary)
Security Manager (Secondary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 25-May-08 16:13

Phase 3: Architect

Task 3.1 Develop Solution Architecture

Description
The scope of solution architecture in a data integration or enterprise data warehouse project is quite broad and involves careful consideration of many disparate factors. Data integration solutions have grown in scope as well as in the amount of data they process. This necessitates careful consideration of architectural issues across a number of architectural domains.

Well-designed solution architecture is crucial to any data integration effort, and can be the most influential, visible part of the whole effort. A robust solution architecture not only meets the business requirements but also exceeds the expectations of the business community: it is reliable (with minimal or no downtime), easily extendable, and vastly scalable. Given the continuous state of change that has become a trademark of information technology, it is prudent to have an architecture that is not only easy to implement and manage, but also flexible enough to accommodate changes in the future.

This task approaches the development of the architecture as a series of stepwise refinements:
● First, reviewing the requirements;
● Then developing a logical model of the architecture for consideration;
● Refining the logical model into a physical model; and
● Validating the physical model.

In addition, because the architecture must consider anticipated data volumes, it is necessary to develop a thorough set of estimates. The Technical Architect is responsible for ensuring that the proposed architecture can support the estimated volumes.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Architect (Primary)
Data Quality Developer (Primary)
Data Warehouse Administrator (Review Only)
Database Administrator (DBA) (Primary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Review Only)

Considerations
A holistic view of architecture encompasses three realms: the development architecture, the execution architecture, and the operations architecture. Although there may be some argument about whether an integration solution is a "system," it is clear that it has all the elements of a software system, including databases, executable programs, end users, and so forth. These three areas of concern provide a framework for considering how any system is built, how it runs, and how it is operated. Of course, all of these elements must be considered in the design and development of the enterprise solution.

Each of these architectural areas involves specific responsibilities and concerns:
● Development Architecture, which incorporates technology standards, tools, and the techniques and services required in the development of the enterprise solution. This may include many of the services described in the execution architecture, but also involves services that are unique to development environments, such as security mechanisms for controlling access to development objects, change control tools and procedures, maintenance releases, and migration capabilities.
● Execution Architecture, which is a unified collection of technology services required to run an application or set of applications. In the context of an enterprise-wide integration solution, this includes client and server hardware, operating systems, network infrastructure, database management systems, and any other technology services employed in the runtime delivery of the solution.
● Operations Architecture, which includes the entire supporting infrastructure, tools, standards, and controls required to keep a business application production or development environment operating at the designed service level. This differs from the execution architecture in that its primary users are system administrators and production support personnel.

The specific activities that comprise this task focus primarily on the Execution Architecture. 3.2 Design Development Architecture focuses on the development architecture, and the Operate Phase discusses the important aspects of operating a data integration solution. Refer to the Operate Phase for more information on the operations architecture.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:44

Phase 3: Architect

Subtask 3.1.1 Define Technical Requirements

Description
In anticipation of architectural design and subsequent detailed technical design steps, the business requirements and functional requirements must be reviewed and a high-level specification of the technical requirements developed. The technical requirements will drive these design steps by clarifying what technologies will be employed and, at least at a conceptual level, how they will satisfy the business and functional requirements; these are strategic decisions.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Quality Developer (Secondary)
Technical Architect (Primary)
Technical Project Manager (Review Only)

Considerations
The technical requirements should address, from a high level, implementation specifications based on the findings to date (regarding data rules, source analysis, etc.), such as:
● Technical definitions of business rule derivations (including levels of summarization, etc.)
● Definitions of source and target schema – at least at a logical/conceptual level
● Data acquisition and data flow requirements
● Data quality requirements (at least at a high level)
● Data consolidation/integration requirements (at least at a high level)
● Security requirements and structures (access, domain, administration, etc.)
● Connectivity specifications and constraints (especially limits of access to operational systems)
● Performance requirements (both “back-end” and presentation performance)
● Report delivery and access specifications
● Specific technologies required (if requirements clearly indicate such)

For Data Migration projects, the technical requirements are fairly consistent and known. They will require processes to:
● Populate the reference data structures
● Acquire the data from source systems
● Convert to target definitions
● Load to the target application
● Meet the necessary audit functionalities

The details of these will be covered in a data migration strategy.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:44

The logical architecture helps people to visualize the solution and show how all the components work together. ETL. indicating. The logical view must take into consideration all of the source systems required to support the solution.e. IP addresses. etc.Data Warehousing 163 of 1017 . Prerequisites None INFORMATICA CONFIDENTIAL Velocity v8 Methodology .2 Develop Architecture Logical View Description Much like a logical data model. hardware specifications. how local repositories relate to the global repository (if applicable). The logical view does not contain detailed physical information such as server names.1. To communicate the conceptual architecture to project participants to validate the architecture. to be refined as you implement or grow the solution. databases. reporting. for example. ● ● The logical diagram provides a road map of the enterprise initiative and an opportunity for the architects and project planners to define and describe.. in some detail. the repositories that will contain the runtime metadata. This is a “living” architectural diagram. and all known data marts and reports.Phase 3: Architect Subtask 3. the individual components. To serve as a blueprint for developing the more detailed physical view. These details will be fleshed out in the development of the physical view. The logical view should show relationships in the data flow and among the functional components. a logical view of the architecture provides a high-level depiction of the various entities and relationships as an architectural blueprint of the entire data integration solution. The major purposes of the logical view are: ● To describe how the various solution elements work together (i. and metadata).

Roles Data Architect (Secondary) Technical Architect (Primary) Considerations The logical architecture should address reliability. and QA. availability. scalability. This will likely include legacy staging. performance.. ODS Web Application Servers ROLAP engines. Data Mining For Data Migration projects a key component is the documentation of the various utility database schemas. pre-load staging. including but not limited to: ● ● ● ● ● ● ● ● ● All relevant source systems ETL repositories.Data Warehousing 164 of 1017 . extensibility. Data Modeling tools PowerCenter Servers. XML Server Data Quality tools. BI repositories Metadata Management. data warehouse. security. data marts. Additionally. It should incorporate all of the high-level components of the information architecture.g. Web Services. interoperability. Repository Server Target data structures. e. MOLAP cubes. database schemas for Informatica Data Quality and Informatica Data Explorer will also be included. usability. reference data. Best Practices Designing Data Integration Architectures PowerCenter Enterprise Grid Option Sample Deliverables None INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Portals. Metadata Reporting Real-time Messaging. and audit database schemas.

Last updated: 06-Dec-07 15:36 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 165 of 1017 .

and potentially the user community. including project management. Discussions with interested constituents should focus on the recommended architecture. the recommendations of the Data Architect and Technical Architect should be very well formed. system administrators. The recommendations will be formally documented in the next subtask 3.3 Develop Configuration Recommendations Description Using the Architecture Logical View as a guide.and agreed upon . It is critical that the scope of the project be set . At this point. not on protracted debate over the business requirements. (Refer back to the Manage Phase for a discussion of scope setting and control issues).4 Develop Architecture Physical View but are not documented at this stage since they are still considered open to debate.Data Warehousing 166 of 1017 . the Project Sponsor. These recommendations will serve as the basis for discussion with the appropriate parties. develop a set of recommendations for how to technically configure the analytic solution. Prerequisites None Roles Data Architect (Secondary) Technical Architect (Primary) INFORMATICA CONFIDENTIAL Velocity v8 Methodology .prior to developing and documenting the technical configuration recommendations. Changes in the requirements at this point can have a definite impact on the project delivery date.Phase 3: Architect Subtask 3. based on their understanding of the business requirements and the current and planned technical standards. and considering any corporate standards or preferences.1.1.

The technical architectures should also provide a recommendation of a 32-bit architecture or a 64 bit architecture based on the cost/benefit of each.informatica. incremental costs can be reduced by leveraging existing available hardware resources and leveraging PowerCenter’s server grid technology. Linux.The incremental cost of the solution must fit within whatever budgetary parameters have been established by project management.the choice of server hardware and operating system to fit into the corporate standards. Bear in mind that not all INFORMATICA CONFIDENTIAL Velocity v8 Methodology . ● ● The primary areas to consider in developing the recommendations include. Also make sure the RAM size is determined in accordance with the systems to be built. Consult the Platform Availability Matrix at my. This is also likely to support the handling of larger numbers in data. In many cases RAM disks can be used in place of RAM when increased RAM availability is an issue. the server may be either UNIX. It is also important to ensure the hardware is built for OLAP applications. it must consider data capacity and volume throughput requirements. of course.Data Warehousing 167 of 1017 . Depending on the size and throughput requirements.Many IT organizations mandate – or strongly encourage .com for specifics on the applications under consideration for the project. This determination is important for ensuring improved performance. or NT-based. Cost . In particular. but are not necessarily limited to: ● Server Hardware and Operating System . which typically tend to be computational intensive as compared to OLTP systems which require hyper threading. solve the technical challenges posed by the analytic solution. In many cases.The recommended solution should work well within the context of the organization's existing infrastructure and conform to the organization's future infrastructure direction. Conformity .The recommended configuration must.Considerations The configuration recommendations must balance a number of factors in order to be adopted: ● Technical solution . This is especially important when the PowerCenter application creates huge cache files. It is advisable to consider the vast advantages of 64-bit OS and PowerCenter as this is likely to provide increased resources and enable faster processing speeds.

● PowerCenter Server - The PowerCenter server should, of course, be considered when developing the architecture recommendations. Considerations should include network traffic (between the repository server, PowerCenter server, database server, and client machines), the location of the PowerCenter repository database, and the physical storage that will contain the PowerCenter executables as well as source, target, and cache files. This is also true for database connectivity (see Database Management System below).
● Disk Storage Systems - The architecture of the disk storage system should also be included in the architecture configuration. Some organizations leverage a Storage Area Network (SAN) to store all data, while other organizations opt for local storage. In any case, careful consideration should be given to disk array and striping configuration in order to optimize performance for the related systems (i.e., OS, database, ETL, and BI).
● Database Management System - Similar to organizational standards that mandate hardware or operating system choices, many organizations also mandate the choice of a database management system. In instances where a choice of the DBMS is available, it is important to remember that PowerCenter and Data Analyzer support a vast array of DBMSs on a variety of platforms (refer to the PowerCenter Installation Guide and Data Analyzer Installation Guide for specifics). A DBMS that is supported by all components in the technical infrastructure should be chosen.
● Data Analyzer or other Business Intelligence Data Integration Platforms - Whether using Data Analyzer or a different BI tool for analytics, the goal is to develop configuration recommendations that result in a high-performance application passing data efficiently between source system, database tables, ETL, and BI end-user reports. For Web-based analytic tools such as Data Analyzer, one should also consider user requirements that may dictate that a secure Web-server infrastructure be utilized to provide reporting access outside of the corporate firewall, for example to enable reporting access from a mobile device. Typically, a secure Web-server infrastructure that utilizes a demilitarized zone (DMZ) will result in a different technical architecture configuration than an infrastructure that simply supports reporting from within the corporate firewall.

TIP
Use the Architecture Logical View as a starting point for discussing the technical configuration recommendations. As drafts of the physical view are developed, they will be helpful for explaining the planned architecture.

Best Practices
PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 06-Dec-07 14:55

Phase 3: Architect

Subtask 3.1.4 Develop Architecture Physical View

Description
The physical view of the architecture is a refinement of the logical view, but takes into account the actual hardware and software resources necessary to build the architecture. Much like a physical data model, this view of the architecture depicts physical entities (i.e., workstations, servers, and networks) and their attributes (i.e., hardware model, operating system, server name, IP address). In addition, each entity should show the elements of the logical model supported by it. For example, a UNIX server may be serving as a PowerCenter server engine and Data Analyzer server engine, and may also be running Oracle to store the associated PowerCenter repositories.

The physical view is unlikely to explicitly show all of the technical information necessary to configure the system, but should provide enough information for domain experts to proceed with their specific responsibilities. In essence, this view is a common blueprint that the system's general contractor (i.e., the Technical Architect) can use to communicate to each of the subcontractors (i.e., DBAs, UNIX Administrator, Mainframe Administrator, Application Server Administrator, Network Administrator, etc.). The physical view is the summarized planning document for the architecture implementation.

Prerequisites
None

Roles
Data Warehouse Administrator (Approve)
Database Administrator (DBA) (Primary)
System Administrator (Primary)
Technical Architect (Review Only)

Considerations
None

Best Practices
PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 06-Dec-07 15:35

Phase 3: Architect

Subtask 3.1.5 Estimate Volume Requirements

Description
Estimating the data volume and physical storage requirements of a data integration project is a critical step in the architecture planning process. Due to the varying complexity and data volumes associated with data integration solutions, it is crucial to review each technical area of the proposed solution with the appropriate experts (i.e., Server System Administrators, DBAs, Network Administrators, etc.). This subtask represents a starting point for analyzing data volumes, but does not include a definitive discussion of capacity planning.

Prerequisites
None

Roles
Data Architect (Primary)
Data Quality Developer (Primary)
Database Administrator (DBA) (Primary)
System Administrator (Primary)
Technical Architect (Secondary)

Considerations
Capacity planning and volume estimation should focus on several key areas that are likely to become system bottlenecks or to strain system capacity, specifically:

Disk Space Considerations

Database size is the most likely factor to affect disk space usage in the data integration solution. As the typical data integration solution does not alter the source systems, there is usually no need to consider their size. However, the target databases, and any ODS or staging areas, demand disk storage over and above the existing operational systems. The basic techniques for database sizing are well understood by experienced DBAs. Estimates of database size must factor in:
● Row width. Determine the upper bound of the width of each table row. After the physical data model has been developed, the row width can be calculated. This can obviously be affected by certain DBMS data types, so be sure to take into account each physical byte consumed. The documentation for the DBMS should specify storage requirements for all supported data types.
● Row counts. The estimated number of rows may be vastly different for a "young" warehouse than for one at "maturity". For example, if the database is designed to store three years of historical sales data, and there is an average daily volume of 5,000 sales, the table will contain 150,000 rows after the first month, but will have swelled to nearly 5.5 million rows at full maturity. Beyond the third year, there should be a process in place for archiving data off the table, thus limiting the size to 5.5 million rows.
● Indexing. Indexing can add a significant disk usage penalty to a database. Depending on the type of table, the overall size of the indexed table, and the size of the keys used in the index, an index may require 30 to 80 percent additional disk space. Again, the DBMS documentation should contain specifics about calculating index size.
● Physical storage characteristics, such as block parameter sizes. Be sure to consult the DBAs to understand how to factor in the physical storage characteristics relative to the DBMS being used, and to discuss the most intelligent structuring of the database partitions. Partitioning the physical target can greatly increase the efficiency and organization of the load process; however, it does increase the number of physical units to be maintained.

Using these basic factors, it is possible to construct a database sizing model (typically in spreadsheet form) that lists all database tables and indexes, their row widths, and estimated number of rows. A Database Sizing Model workbook is one effective means for estimating these sizes. Note that the model provides an estimate of raw data size. During the Architect Phase, only a rough volume estimate is required. Once the row number estimates have been validated, the estimating model should produce a fairly accurate estimate of database size. After the Design Phase is completed, the database sizing model should be updated to reflect the data model and any changes to the known business requirements.
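The arithmetic behind such a sizing model is straightforward. The sketch below works through the sales example above (5,000 rows per day, three years of retained history, 30 to 80 percent index overhead); the row width and block-overhead factors are illustrative assumptions to be replaced with figures from the physical data model and the DBA, not DBMS-specific values.

```python
# Minimal database-sizing sketch using the sales-fact example from the text.
# Row width and overhead factors are assumptions, not DBMS-specific figures.

rows_per_day   = 5_000
retention_days = 3 * 365            # three years of history, then archived
row_bytes      = 120                # estimated physical row width (assumed)
index_overhead = (0.30, 0.80)       # 30-80% additional space for indexes
block_overhead = 1.15               # allowance for block/extent overhead (assumed)

rows_at_maturity = rows_per_day * retention_days          # ~5.5 million rows
raw_bytes        = rows_at_maturity * row_bytes * block_overhead

low_estimate  = raw_bytes * (1 + index_overhead[0])
high_estimate = raw_bytes * (1 + index_overhead[1])

GiB = 1024 ** 3
print(f"Rows at maturity : {rows_at_maturity:,}")
print(f"Raw data estimate: {raw_bytes / GiB:.2f} GiB")
print(f"With indexes     : {low_estimate / GiB:.2f} - {high_estimate / GiB:.2f} GiB")
```

Repeating this calculation per table and index in a workbook yields the rough, phase-appropriate estimate described above.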

The estimating process also provides a good opportunity to validate the star schema data model. In most data integration implementations, fact tables should contain only composite keys and discrete facts. The width of the fact table is very important, since a warehouse can contain millions, tens of millions, or even hundreds of millions of fact records. If a fact table is wider than 32-64 bytes, it may be wise to re-evaluate what is being stored. The dimension tables, on the other hand, will typically be wider than the fact tables and may contain redundant data (e.g., names, addresses, etc.), but will have far fewer rows. As a result, the size of the dimension tables is rarely a major contributor to the overall target database size.

TIP
If you have determined that the star schema is the right model to use for the data integration solution, be sure that the DBAs who are responsible for the target data model understand its advantages. A DBA who is unfamiliar with the star schema may seek to normalize the data model in order to save space. Firmly resist this tendency to normalize.

It is important to remember that Business Intelligence (BI) tools may consume significant storage space, depending on the extent to which they pre-aggregate data and how that data is stored. Because this may be an important factor in the overall disk space requirements, be sure to consider storage techniques carefully during the BI platform selection process.

Data Processing Volume
Data processing volume refers to the amount of data being processed by a given PowerCenter server within a specified timeframe. In most data integration implementations, a load window is allotted representing clock time, and the PowerCenter server engine must be able to perform its operations on all data, from source to target, within that time period. This window is determined by the availability of the source systems for extracts and the end-user requirements for access to the target data sources. Maintenance jobs that run on a regular basis may further limit the length of the load window. Since there is the possibility of unstructured data being sourced, it is important to factor in any conversion in data size, either up or down, as this can have an effect on the total volume of data being processed. As a result of the limited load window, the ability to process all data is constrained by three factors:
● Time it takes to extract the data (potentially including network transfer time, if the data is on a remote server)
● Transformation time within PowerCenter
● Load time (which is also potentially impacted by network latency)

The biggest factors affecting extract and load times are, however, related to database tuning. Refer to Performance Tuning Databases (Oracle) for suggestions on improving database performance. From an estimating standpoint, it is impossible to accurately project the throughput (in terms of rows per second) of a mapping due to the high variability in mapping complexity, the quantity and complexity of transformations, and the nature of the data being transformed. It is more accurate to estimate using clock time to ensure processing within the given load window. The throughput of the PowerCenter Server engine is typically the last option for improved performance; refer to the Velocity Best Practice Tuning Sessions for Better Performance, which includes suggestions on tuning mappings and sessions, and apply the appropriate tuning techniques.

If the project includes steps dedicated to improving data quality (for example, as described in Task 4.6), then a related performance factor is the time taken to perform data matching (that is, record de-duplication) operations. Depending on the size of the dataset concerned, data matching operations in Informatica Data Quality can take several hours of processor time to complete. Data matching processes can be tuned and executed on remote machines on the network to significantly reduce record processing time. Refer to the Best Practice Effective Data Matching Techniques for more information.

Network Throughput
Once the physical data row sizes and volumes have been estimated, it is possible to estimate the required network capacity. The Technical Architect should work closely with a Network Administrator to examine network capacity between the different components involved in the solution. It is important to remember the network overhead associated with packet headers, as this can have an effect on the total volume of data being transmitted. The initial estimate is likely to be rough, but should provide a sense of whether the existing capacity is sufficient and whether the solution should be architected differently (i.e., re-locate server engine(s), move source or target data prior to session execution, etc.). The Network Administrator can thoroughly analyze network throughput during system and/or performance testing. It is important to involve the network specialists early in the Architect Phase so that they are not surprised by additional network requirements when the system goes into production.
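As a rough feasibility check under stated assumptions, the sketch below estimates whether a nightly volume can cross the network and still leave room in the clock-time load window. The nightly row count, row width, usable bandwidth, and window length are placeholders, not project figures, and the check is no substitute for the Network Administrator's analysis described above.

```python
# Rough load-window and network feasibility check. All inputs are assumptions
# to be replaced with the project's own volume estimates and measured bandwidth.

GiB = 1024 ** 3

nightly_rows      = 5_000_000            # rows moved per nightly run (assumed)
row_bytes         = 120                  # physical row width (assumed)
packet_overhead   = 1.10                 # ~10% allowance for packet headers (assumed)
usable_bandwidth  = 0.6 * (1 * GiB / 8)  # 60% of a 1 Gbit/s link, in bytes/sec (assumed)
load_window_hours = 4                    # clock time available for extract, transform, load

bytes_on_wire  = nightly_rows * row_bytes * packet_overhead
transfer_hours = bytes_on_wire / usable_bandwidth / 3600

print(f"Data on the wire : {bytes_on_wire / GiB:.1f} GiB")
print(f"Transfer time    : {transfer_hours:.2f} h of the {load_window_hours} h window")
if transfer_hours > 0.5 * load_window_hours:
    print("Network transfer alone consumes most of the window; revisit the architecture"
          " (e.g., co-locate servers or move data before the session runs).")
```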

TIP
Informatica generally recommends having either the source or target database co-located with the PowerCenter Server engine because this can significantly reduce network traffic. If such co-location is not possible, it may be advisable to FTP data from a remote source machine to the PowerCenter Server, as this is a very efficient way of transporting the data across the network.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:19

Phase 3: Architect

Task 3.2 Design Development Architecture

Description
The Development Architecture is the collection of technology standards, techniques, tools, and services required to develop a solution. This task involves developing a testing approach, defining the development environments, and determining the metadata strategy. Although the various subtasks that compose this task are described here in linear fashion, all of these subtasks relate to the others, so it is important to approach the overall body of work in this task as a whole and consider the development architecture in its entirety. The benefits of defining the development architecture are achieved later in the project, and include good communication and change controls as well as controllable migration procedures. Ignoring proper controls is likely to lead to issues later on in the project.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Architect (Secondary)
Data Integration Developer (Secondary)
Database Administrator (DBA) (Primary)
Metadata Manager (Primary)
Presentation Layer Developer (Secondary)
Project Sponsor (Review Only)
Quality Assurance Manager (Primary)
Repository Administrator (Primary)
Security Manager (Secondary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
The Development Architecture should be designed prior to the actual start of development, because many of the decisions made at the beginning of the project may have unforeseen implications once the development team has reached its full size. The design of the Development Architecture must consider numerous factors, including the development environment(s), developer security, naming standards, change control procedures, and more, as well as the requisite planning associated with the development environment.

The scope of a typical PowerCenter implementation, possibly covering more than one project, is much broader than a departmentally-scoped solution. It is important to consider this statement fully, because it has implications for the planned deployment of a solution. The main difference is that a departmental data mart type project can be created with only two or three developers in a very short time period. By contrast, a full integration solution involving the creation of an ICC (Integration Competency Center), or an analytic solution that approaches enterprise scale, requires more of a "big team" approach. This is because many more organizational groups are involved, adherence to standards is much more important since the results will be visible to a larger audience, and testing must be more rigorous.

The following paragraphs outline some of the key differences between a departmental development effort and an enterprise effort:

With a small development team, the environment may be simplistic:
● Communication between developers is easy; it may literally consist of shouting over a cubicle partition.
● Developer security is ignored; typically, all developers use similar, often highly privileged, user ids.
● Naming standards are not rigidly enforced.
● Migration procedures are loose; development objects are moved into production without undue emphasis on impact analysis and change control procedures.
● Only one or two repository folders may be necessary, since there is little risk of the developers "stepping on" each other's work.

However, as the development team grows and the project becomes more complex, this simplified environment leads to serious development issues:
● Developers accustomed to informal communication may not thoroughly inform the entire development team of important changes to shared objects. Failure to understand the dependencies of shared objects leads to unknown impacts on the dependent objects.
● Sharing a single developer ID among multiple developers makes it impossible to determine which developer locked a development object, or who made the last change to an object. More importantly, failure to define secured development groups allows all developers to access all folders.
● Developers maintaining others' mappings are likely to spend unnecessary time and effort trying to decipher unfamiliar names.
● The lack of rigor in testing and migrating objects into production leads to runtime bugs and errors in the warehouse loading process, and to the possibility of untested changes being made in test environments.
● Repository folders originally named to correspond to individual developers will not adequately support subject area- or release-based development groups.

These factors represent only a subset of the issues that may occur when the development architecture is haphazardly constructed, or "organically" grown. As is the case with the execution environment, a departmental data mart development effort can "get away with" minimal architectural planning. But any serious effort to develop an enterprise-scale analytic solution must be based on well-planned architecture, including both the development and execution environments.

In Data Migration projects, it is common to build out a set of reference data tables to support the effort. These often include tables to hold configuration details (valid values), default values, cross-reference specifics, table-driven parameters, and data control structures. These structures will be a key component in the development of reusable objects.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45

Phase 3: Architect

Subtask 3.2.1 Develop Quality Assurance Strategy

Description
Although actual testing starts with unit testing during the Build Phase, followed by the project's Test Phase, there is far more involved in producing a high-quality project. The QA Strategy includes definition of key QA roles, key verification processes, and key QA assignments involved in detailing all of the validation procedures for the project.

Prerequisites
None

Roles
Quality Assurance Manager (Primary)
Security Manager (Secondary)
Test Manager (Primary)

Considerations
In determining which project steps will require verification, the QA Manager or "owner" of the project's QA processes should consider the business requirements and the project methodology. The trade-offs of cost vs. value will likely affect the scope of QA. Although it may take a "sales" effort to win over management to a QA process that is highly involved throughout the project, the benefits can be proven historically in the success rates of projects and their ongoing maintenance costs.

Potential areas of verification to be considered for QA processes:
● Formal business requirements reviews with key business stakeholders and sign-off
● Formal technical requirements reviews with IT stakeholders and sign-off
● Formal review of environments and architectures with key technical personnel

● Peer reviews of logic designs
● Peer walkthroughs of data integration logic (mappings, code, etc.)
● Unit Testing: definition of procedures, review of test plans, formal sign-off for unit tests
● Gatekeeping for migration out of the Development environment (into QA and/or Production)
● Regression testing: definition of procedures, review of test plans, formal sign-off
● System Tests: review of Test Plans, formal acceptance process
● Defect Management: review of procedures, validation of resolution
● User Acceptance Test: review of Test Plans, formal acceptance process
● Documentation review
● Training materials review
● Review of Deployment Plan, sign-off for deployment completion

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:45

Phase 3: Architect

Subtask 3.2.2 Define Development Environments

Description
Although the development environment was relatively simple in the early days of computer system development, when a mainframe-based development project typically involved one or more isolated regions connected to one or more database instances, distributed systems such as federated data warehouses involve much more complex development environments. Relative to a centralized development effort, there are many more technical issues, hardware platforms, database instances, specialized personnel to deal with, and many more "moving parts." The basic concept of isolating developers from testers, and both from the production system, is still critical to development success. The task of defining the development environment is, therefore, extremely important and very difficult.

Because of the wide variance in corporate technical environments, standards, and objectives, there is no "optimal" development environment. Rather, there are key areas of consideration and decisions that must be made with respect to them. An important component of any development environment is to configure it as close to the test and production environments as possible, given time and budget. This can significantly ease the development and integration efforts downstream and will ultimately save time and cost during the testing phases.

After the development environment has been defined, it is important to document its configuration, including (most importantly) the information the developers need to use the environments. Developers need to understand what systems they are logging into, what repository (or repositories) they are accessing, what databases they are accessing, and where sources and targets reside.

Prerequisites
3.1.1 Define Technical Requirements

Roles
Database Administrator (DBA) (Primary)

Repository Administrator (Primary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Review Only)

Considerations
The development environment for any data integration solution must consider many of the same issues as a "traditional" development project. The major differences are that the development approach is "repository-centric" (as opposed to code-based), there are multiple sources and targets (unlike a typical system development project, which deals with a single database), and there are few (if any) hand-coded objects to build and maintain.

To be effective, the development environment must consider all of the following key areas:
● Repository Configuration. This involves critical decisions, such as whether to use local repositories, a global repository, or both, as well as determining an overall metadata strategy (see 3.2.4 Determine Metadata Strategy).
● Folder structure. Within each repository, folders are used to group and organize work units or report objects. Because of the repository-based development approach, the folder structure must consider the organization of the development team(s), as well as the change control/migration approach.
● Developer security. Both PowerCenter and Data Analyzer have built-in security features that allow an administrative user (i.e., the Repository Administrator) to define the access rights of all other users to objects in the repository. The organization of security groups should be carefully planned and implemented prior to the start of development. As an additional option, LDAP can be used to assist in simplifying the organization of users and permissions.

Repository Configuration
Informatica's data integration platform, PowerCenter, provides capabilities for integrating multiple heterogeneous sources and targets. The requirements of the development team should dictate to what extent the PowerCenter capabilities are exploited. In a simple data integration development effort, source data may be extracted, transformed, and loaded into a single target database.

More complex data integration development efforts involve multiple source and target systems, target multiple data marts, and include the participation of developers from multiple areas within the corporate organization. Most data integration solutions currently being developed involve data from multiple sources; some of these may include mainframe legacy systems as well as third-party ERP providers. In order to develop a cohesive analytic solution, with shared concepts of the business entities, transformation rules, and end results, a PowerCenter-based development environment is required.

There are basically three ways to configure an Informatica-based data integration solution, although variations on these three options are certainly possible. From a development environment standpoint, the following three configurations serve as the basis for determining how to best configure the environment for developers:

● Standalone PowerCenter. In this configuration, there is a single repository that cannot be shared with any others within the enterprise, and this repository cannot access objects in any other repositories. This type of repository is referred to as a local repository and is typically used for small, independent, departmental data marts. Many of the capabilities within PowerCenter are available, including developer security, folder structures, and shareable objects, particularly with the addition of PowerExchange products (i.e., PowerExchange for SAPNetweaver, PowerExchange for PeopleSoft Enterprise) and Data Analyzer for front-end reporting. The primary development restriction is that the objects in the repository can't be shared with other repositories. This option can dramatically affect the definition of the development environment, as there would be an instance of the repository for development, testing, and production. Some companies can manage co-locating development and testing on one repository by segregating code through folder strategies. In the production environment, folders can be configured to restrict access to specified developers (or groups), and a repository administrator with SuperUser authority can control production objects.

● PowerCenter Data Integration Hub with Networked Local Repositories. This configuration combines a centralized, shared global repository with one or more distributed local repositories. The strength of this solution is that multiple development groups can work semi-autonomously, working on multiple projects, while sharing common development objects. In the production environment, distributing the server load across the PowerCenter server engines can leverage this same configuration.

● PowerCenter as a Data Integration Hub with a Data Analyzer Front-End to the Reporting Warehouse. This configuration provides an end-to-end suite of products that allows developers to build the entire data integration solution, from data loads to end-user reporting.

Of course, there is no single "correct" solution, only general guidelines for consideration. In most cases, the Technical Architect must pay careful attention to the sharing of development objects and the use of multiple repositories.

PowerCenter Data Integration Hub with Networked Local Repositories
In this advanced repository configuration, the PowerCenter Global Repository becomes a development focal point. The layout of this repository, and its contents, must be thoroughly planned and executed. The Global Repository may include shareable folders containing:
● Source definitions. Because many source systems may be shared, departmental developers wishing to access enterprise definitions of sources, targets, and shareable objects connect to the Global Repository to do so.
● Target definitions. Apply the same logic regarding source definitions: if the definitions are shared across projects, the Global Repository is the place to maintain them.
● Shareable objects. Shared objects should be created and maintained in a single place, from which they can be distributed as needed across the enterprise.

TIP
It is very important to house all globally-shared database schemas in the Global Repository. Because most IT organizations prefer to maintain their database schemas in a CASE/data modeling tool, it is important to have a single "true" version of those schemas resident in the Global Repository for the life of the project, and the procedures for updating the PowerCenter definitions of source/target schemas must include importing these schemas from tools such as ERwin. It is far easier to develop these procedures for a single (global) repository than for each of the (independent) local repositories that may be using the schemas.

Of course, even if the overall development environment includes a PowerCenter Data Integration Hub, there may still be non-shared sources, targets, and development objects. In these cases, it is perfectly acceptable to house the definitions within a local repository. If necessary, these objects may eventually be migrated into the shared Global Repository. Again, where appropriate, it may still make sense to do local development and unit testing in a local repository - even for shared objects, since they shouldn't be shared until they have been fully tested.

In addition to the sharing advantages provided by the PowerCenter Data Integration Hub approach, the global repository also serves as a centralized entry point for viewing all repositories linked to it via networked local repositories. This mechanism allows a global repository administrator to oversee multiple development projects without having to separately log in to each of the individual local repositories. This capability is useful for ensuring that individual project teams are adhering to enterprise standards and may also be used by centralized QA teams.

As indicated above, any data quality steps taken with Informatica Data Quality (IDQ) applications (such as those implemented in 2.8 Perform Data Quality Audit or 5.3 Design and Build Data Quality Process) are performed using processes saved to a discrete IDQ repository. These processes (called plans in IDQ parlance) can be added to PowerCenter transformations and subsequently saved with those transformations in the PowerCenter repository. Alternatively, depending on their purpose, plans may remain in an IDQ server repository. For example, data quality plans can be designed and tested within an IDQ repository before deployment in PowerCenter.

Folder Architecture Options and Alternatives
Repository folders provide development teams with a simple method for grouping and organizing work units. The process for creating and administering folders is quite simple, and thoroughly explained in Informatica's product documentation. The main area for consideration is the determination of an appropriate folder structure within one or more repositories; after the initial configuration is determined, the Technical Architect can limit his/her involvement in this area. During the Architect Phase and the Design Phase, the Technical Architect should work closely with Project Management and the development lead(s) to determine the appropriate repository placement of development objects in a PowerCenter-based environment.

The most commonly employed general approaches to folder structure are:
● Folders by Subject (Target) Area. Folder names may be SALES, DISTRIBUTION, BILLING, etc. The Subject Area Division method provides a solid infrastructure for large data warehouse or data mart developments by organizing work by key business area. This strategy is particularly suitable for large projects populating numerous target tables.
● Folder Division by Source Area. Folder names may be ERP, etc. The Source Area Division method is attractive to some development teams, particularly if development is centralized around the source systems. This method is easier to establish and maintain than Folders by Subject Area, but is suitable only for small development teams working with a minimal number of mappings. Eventually, the number of mappings in a single folder may become too large to easily maintain.
● Folder Division by Environment. Folder names may be DEV1, DEV2, DEV3, TEST, QA, etc. Migration to production is significantly simplified, with the maximum number of required folder copies limited to the number of environments.

In addition to these basic approaches, many PowerCenter development environments also include developer folders that are used as "sandboxes," allowing for unrestricted freedom in development and testing. Once a developer has completed the initial development and unit testing within his/her own sandbox folder, he/she can migrate the results to the appropriate folder. As each developer completes unit tests in his/her individual work folders, the mappings or objects are consolidated as they are migrated to test or QA. In these situations, however, the promotion and deployment process can be quite complex depending on the load strategy. Data Analyzer creates Personal Folders for each user name, which can be used as a sandbox area for report development and test.

TIP
If the migration approach adopted by the Technical Architect involves migrating from a development repository to another repository (test or production), it may make sense for the "target" repository to mirror the folder structure within the development repository. This simplifies the repository-to-repository migration procedures. Another possible approach is to assign the same names to corresponding database connections in both the "source" and "target" repositories. This is particularly useful when performing folder copies from one environment to another because it eliminates the need to change database connection settings after the folder copy has been completed.

A common technique for logically grouping folders is to use standardized naming conventions, typically prefixing folder names with a brief, unique identifier. For example, suppose three developers are working on the development of a Marketing department data mart; concurrently, another group of developers is working on a Sales data mart in the same repository. In order to allow each developer to work in his/her own folder, while logically grouping the folders together, the folders may be named SALES_DEV1, SALES_DEV2, SALES_DEV3, MRKT_DEV1, etc. Because the folders are arranged alphabetically, all of the SALES-related folders will sort together, as will the MRKT folders.

TIP
PowerCenter does not support nested folder hierarchies, which creates a challenge to logically grouping development objects in different folders.

Finally, it is also important to consider the migration process in the design of the folder structures. This involves grouping mappings meaningfully within a folder. In earlier versions of PowerCenter, the most efficient method to migrate an object was to perform a complete folder copy, since all mappings within the folder migrate together. However, if individual objects need to be migrated, the migration process can become very cumbersome, since each object needs to be "manually" migrated. The migration process depends largely on the folder structure that is established, and the type of repository environment.

PowerCenter 7.x introduced the concept of team-based development and object versioning, which integrated a true version-control tool within PowerCenter. Objects can be treated as individual elements and can be checked out for development and checked in for testing. Objects can also be linked together to facilitate their deployment to downstream repositories. Data Analyzer 4.x uses the export and import of repository objects for the migration process among environments; objects are exported and imported as individual pieces and cannot be linked together in a deployment group, or migrated as a complete folder, as they can in PowerCenter 7.x.

Developer Security
The security features built into PowerCenter and Data Analyzer allow the development team to be grouped according to the functions and responsibilities of each member. One common, but risky, approach is to give all developers access to the default Administrator ID provided upon installation of the PowerCenter or Data Analyzer software.

Many projects use this approach because it allows developers to begin developing mappings and sessions as soon as the software is installed. INFORMATICA STRONGLY DISCOURAGES THIS PRACTICE. The following paragraphs offer some recommendations for configuring security profiles for a development team, including suggestions for configuring user privileges and folder-level privileges.

PowerCenter's and Data Analyzer's security approach is similar to database security environments. The internal security enables multi-user development through management of users, groups, privileges, and folders. Despite the similarities, PowerCenter UserIDs are distinct from database userids, and they are created, managed, and maintained via administrative functions provided by the PowerCenter Repository Manager or Data Analyzer Administrator. PowerCenter's security management is performed through the Repository Manager, and Data Analyzer's security is performed through tasks on the Administrator tab.

Every user must be assigned to at least one group, and any user can belong to more than one group. Although privileges can be assigned to users or groups, it is more common to assign privileges to groups only, and then add users to each group. This approach is simpler than assigning privileges on a user-by-user basis, since there are generally a few groups and many users. For companies that have the capabilities to do so, LDAP integration is an available option that can minimize the separate administration of usernames and passwords. If you use LDAP authentication for repository users, the repository maintains an association between repository user names and external login names; when you create a user, you can select the login name from the external directory.

As development objects migrate closer to the production environment, security privileges should be tightened. For example, the testing group is typically granted Execute permissions in order to run mappings, but should not be given Write access to the mappings. When the testing team identifies necessary changes, it can communicate those changes (via a Change Request or bug report) to the development group, which fixes the error and re-migrates the result to the test area. The tightest security of all is reserved for promoting development objects into production. In some environments, no member of the development team is permitted to move anything into production; a System Owner or other system representative outside the development group must be given the appropriate repository privileges to complete the migration process. The Technical Architect and Repository Administrator must understand these conditions while designing an appropriate security solution. For additional information on PowerCenter and Data Analyzer security, see Configuring Security.

Best Practices
Configuring Security

Sample Deliverables
None

Last updated: 19-Dec-07 16:54

Phase 3: Architect

Subtask 3.2.3 Develop Change Control Procedures

Description
Changes are inevitable during the initial development and maintenance stages of any project. Change control procedures include formal procedures to be followed when requesting a change to the developed system (such as sources, targets, mappings, mapplets, shared transformations, sessions, global variables, or batches for PowerCenter, and schemas, reports, deployment plans, or shared objects for Data Analyzer). Wherever and whenever the changes occur - in the logical and physical data models, business rules, extract programs, repositories, and databases - they must be controlled. The primary purpose of a change control process is to facilitate the coordination among the various organizations involved with effecting this change (i.e., development, test, deployment, and operations). This change control process controls the timing, impact, and method by which development changes are migrated through the promotion hierarchy.

The procedures should be thorough and rigid, without imposing undue restrictions on the development team's goal of getting its solution into production in a timely manner; the change control process must not be so cumbersome as to hinder speed of deployment. The procedures themselves should be a well-documented series of steps, describing what happens to a development object once it has been modified (or created) and unit tested by the developer. The change control procedures document should also provide background contextual information, including the configuration of the environment. This subtask addresses many of the factors influencing the design of the change control procedures.

Prerequisites
None

Roles
Data Integration Developer (Secondary)

Database Administrator (DBA) (Secondary)
Presentation Layer Developer (Secondary)
Quality Assurance Manager (Approve)
Repository Administrator (Secondary)
System Administrator (Secondary)
Technical Project Manager (Primary)

Considerations
It is important to recognize that the change control procedures and the organization of the development environment are heavily dependent upon each other; it is impossible to thoroughly design one without considering the other. The following development environment factors influence the approach taken to change control.

Repository Configuration
Subtask 3.2.2 Define Development Environments discusses the basic approaches to repository configuration. The first one, Stand-Alone PowerCenter, is the simplest configuration in that it involves a single repository. If that single repository supports both development and production (although this is not generally advisable), then the change control process is fairly straightforward. However, Informatica recommends physically separating repositories whenever technically and fiscally feasible, because of the many advantages gained by isolating development from production environments. This decision complicates the change control procedures somewhat, but provides a more stable solution.

The general approach for migration is similar regardless of whether the environment is a single repository or a multiple repository approach. In either case, logical groupings of development objects have been created, representing the various promotion levels within the promotion hierarchy (e.g., DEV, TEST, QA, PROD). In the single repository approach, the logical grouping is accomplished through the use of folders named accordingly; migrations involve copying the relevant object from a development folder to a production folder, or performing a complete folder copy. In the multiple repository approach, an entire repository may be used for one (or more) promotion levels. A typical configuration would be a shared repository supporting both DEV and TEST, and a separate PROD repository. Whenever possible, the production repository should be independent of the others.

Tip
With a PowerCenter Data Integration Hub implementation, global repositories can register local repositories. This provides access to both repositories through one "console," simplifying the administrative tasks for completing change requests. In this case, the global Repository Administrator can perform all repository migration tasks.

In this configuration, where a change is applied depends on where the object is stored:
● If the object is a global object (reusable or not reusable), the change must be applied to the global repository. If the object is shared, the shortcuts referencing this object automatically reflect the change from any location in the global or local architecture; therefore, only the "original" object must be migrated.
● If the object is stored in both repositories (i.e., global and local), the change must be made in both repositories.
● Finally, if the object is only stored locally, the change is only implemented in the local repository.

Regardless of the repository configuration, however, the following questions must be considered in the change control procedures:
● What PowerCenter or Data Analyzer objects does this change affect? What other system objects are affected by the change?
● What processes (migration/promotion, load) does this change impact?
● What processes does the client have in place to handle and track changes?
● Who else uses the data affected by the change, and are they involved in the change request?
● How will this change be promoted to other environments in a timely manner?
● What is the effort involved in making this change? Is there time in the project schedule for this change? Is there sufficient time to fully test the change?

Change Request Tracking Method
The change procedures must include a means for tracking change requests and their migration schedules, as well as a procedure for backing out changes, if necessary. The Change Request Form should include information about the nature of the change, the developer making the change, the timing of the request for migration, and enough technical information about the change that it can be reversed if necessary.
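Where no formal tracking tool is in place, even a lightweight, structured record per request helps enforce the fields listed above. The sketch below is a hypothetical illustration of such a record in Python; the field names mirror the items named in this subtask and are not an Informatica artifact or a prescribed form layout.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class ChangeRequest:
    """Minimal change-request record mirroring the fields described above."""
    request_id: str                      # tracking identifier
    description: str                     # nature of the change
    developer: str                       # developer making the change
    requested_migration: date            # timing of the request for migration
    affected_objects: List[str] = field(default_factory=list)  # mappings, sessions, reports, ...
    backout_notes: str = ""              # technical detail needed to reverse the change
    status: str = "submitted"            # submitted / approved / migrated / backed out

# Example usage with hypothetical values.
cr = ChangeRequest(
    request_id="CR-0042",
    description="Widen CUSTOMER_NAME to 120 characters in the billing mapping",
    developer="j.doe",
    requested_migration=date(2007, 12, 20),
    affected_objects=["m_load_billing", "s_m_load_billing"],
    backout_notes="Restore previous object version and re-run folder copy from DEV",
)
print(cr.status, "-", cr.request_id)
```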

Team Based Development, Tracking and Reverting to Previous Version
The team-based development option provides functionality in two areas: versioning and deployment. The following sections describe this functionality at a general level; for a more detailed explanation of any of the capabilities of the team-based development features of PowerCenter, refer to the appropriate sections of the PowerCenter documentation.

There are a number of ways to back out a changed development object. Backing out a change in PowerCenter 7.x is as simple as reverting to a previous version of the object(s). It is important to note, however, that prior to PowerCenter 7.x, reversing a change to a single object in the repository is very tedious and error-prone, and should be considered as a last resort. The time to plan for this occurrence, however, is during the implementation of the development environment, not after an incorrect change has been migrated into Production.

While the functionality provided via team-based development is quite powerful, the activities of coordinating development in a team environment, tracking finished work that needs to be reviewed or migrated, managing migrations, and ensuring minimal errors can be quite complex. The process requires a combination of PowerCenter functionality and user process to implement effectively, and other features, such as repository queries and labeling, are necessary to ensure optimal use of versioning and deployment. With experience, it becomes clear that there are better and worse ways of using these features to achieve the expected goals.

Data Migration Projects
For Data Migration projects, change control is critical for success. It is common that the target system has continual changes during the life of the data migration project. These cause changes to specifications, which in turn cause a need to change the mappings, sessions, workflows, and scripts that make up the data migration project. Change control is important to allow project management to understand the scope of change and to limit the impact that process changes cause to related processes. For data migration, the key to change control is in the communication of changes, to ensure that testing activities are integrated.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 18:48

A proper metadata strategy provides Data Integration Developers. It can provide a map for managing the expanding requirements for reporting information that the business places upon the IT environment.Data Warehousing 197 of 1017 .Phase 3: Architect Subtask 3. The metadata strategy highlights the importance of a central data administration department for organizations that are concerned about data quality. As such. and End Users with the ability to create a common understanding of the data. This solution allows for the following capabilities: ● ● ● ● ● Consolidation and cataloging of metadata from various source systems Reporting on cataloged metadata Lineage and where-used Analysis Operational reporting Extensibility The Business Intelligence Metadata strategy can also assist in achieving the goals of data orientation by providing a focus for sharing the data assets of an organization. The metadata strategy should describe where metadata will be obtained. and how it will be accessed. End-User Application Developers. where it will be stored. implementing and maintaining a solid metadata strategy is a key enabler of high-quality solutions.4 Determine Metadata Strategy Description Designing.2. the metadata may be as important as the data itself. where it came from. The federated architecture model of a PowerCenter-based global metadata repository provides the ability to share metadata that crosses departmental boundaries while allowing non-shared metadata to be maintained independently. the Metadata Manager is responsible for documenting and distributing it to the development team and end-user community. and reuse. The components of a metadata strategy for Data Analyzer include: ● ● ● ● ● ● ● Determine how metadata will be used in the organization Data stewardship Data ownership Determine who will use what metadata and why Business definitions and names Systems definitions and names Determine training requirements for the power user as well as regular users Prerequisites None Roles Metadata Manager (Primary) Repository Administrator (Primary) Technical Project Manager (Approve) INFORMATICA CONFIDENTIAL Velocity v8 Methodology . because it provides context and credibility to the data being analyzed. integrity. and what business rules have been applied to it. After the strategy is developed.

Data migration resources Data migration resources. Where. This consolidation is beneficial when a production system based on clean.. DW Architects. maintained. PowerPlugs. The following table expands the concept of the Who. PowerCenter Repository Captured Manager and loaded once. reliable metadata is unveiled to the company. maintained as necessary Source to target mappings Simplifies PowerCenter Informatica documentation Repository process.Data Warehousing 198 of 1017 . The Metadata Manager should analyze each point of integration in order to answer the following questions: ● ● ● ● ● What metadata needs to be captured? Who are the users that will benefit from this metadata? Why is it necessary to capture this metadata (i. and How approach to managing metadata: Metadata Definition (What?) Source Structures Users (Who?) Source system users & owners. an area where managing metadata provides benefit to IT and/or business users. Informatica Data Explorer Captured Repository and loaded Manager once. its source) and where will it ultimately reside? How will the repository be populated initially. What. and accessed? It is important to centralize metadata management functions despite the potential "metadata bottleneck" that may be created during development..Considerations The metadata captured while building and deploying analytic solution architecture should pertain to each of the system's points of integration. Why. business analysts Benefits (Why?) Source of Metadata (Where?) Metadata Store (Where?) Population Maintenance (How?) (How?) Access (How?) Allows users Source operational to see all structures and system associated elements in the source system Allows users Data warehouse. what are the actual benefits of capturing this metadata)? Where is the metadata currently stored (i.e. maintained as necessary Target Warehouse Structures Informatica Repository Warehouse Designer.e. Changes Manager Informatica Data Explorer INFORMATICA CONFIDENTIAL Velocity v8 Methodology . to see all structures and data marts associated elements in the target system PowerCenter Source Repository Analyzer. PowerPlugs. Data Migration Resources Target system users/ analysts. allows for quicker more efficient rework of mappings PowerCenter Capture Data Repository Designer. PowerCenter.

.Data Warehousing 199 of 1017 . always considering the following points: ● Source structures. Analytic applications. Transform. This can simplify the development and maintenance of the analytic solution. in stored procedures or external procedures) will not have metadata associated with it.e. Are source data structures captured or stored already in a CASE/data modeling tool? Are they maintained consistently? Target structures. Reporting tools. This metadata may be useful to operators and end users. Also. PowerCenter automatically captures rich operational data when batches and sessions are executed. remember that any ETL code developed outside of a PowerCenter mapping (i. End users working with Data Analyzer may need access to the PowerCenter metadata in order to understand the business context of the data in the target database(s). Several front-end analytic tools have the ability to import PowerCenter metadata. the metadata will be created and maintained automatically within the PowerCenter repository. The Metadata Manager and Repository Manager need to work together to determine how best to capture the metadata. Are target data structures captured or stored already in a CASE/data modeling tool? Is PowerCenter being used to create target data structure? Where will the models be maintained? Extract.Reporting Tool Business analysts Allows users Data to see Analyzer business names and definitions for query-building Informatica Repository Reporting Tool Note that the Informatica Data Explorer (IDE) application suite possesses a wide range of functional capabilities for data and metadata profiling and for source-to-target mapping. Operational metadata. Assuming PowerCenter is being used for the ETL processing. and should be considered an important part of the analytic solution. ● ● ● ● ● Best Practices None Sample Deliverables None Last updated: 18-Oct-07 15:09 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . and Load process.

Phase 3: Architect
Subtask 3.2.5 Develop Change Management Process
Description
Change Management is the process for managing the implementation of changes to a project (i.e., data warehouse or data integration) including hardware, software, services, scheduling, or related documentation. Its purpose is to minimize the disruption to services caused by change and to ensure that records of hardware, software, services and documentation are kept up to date. The Change Management process enables the actual change to take place. Elements of the process include identify change, create request for change, impact assessment, approval, and implementation.
Prerequisites
None
Roles
Business Project Manager (Primary)
Project Sponsor (Review Only)
Technical Project Manager (Primary)
Considerations
Identify Change
Change Management is necessary in any of the following situations:
● A problem arises that requires a change that will affect more than one business user or a user group such as sales, marketing, etc.

● A new requirement is identified as a result of advances in technology (e.g., a software upgrade) or a change in needs (for new functionality).
● A change is required to fulfill a change in business strategy as identified by a business leader or developer.
Request for Change
A request for change should be completed for each proposed change, with a checklist of items to be considered and approved before implementing the change. The Change Request Form should include information about the nature of the change, the developer making the change, the timing of the request for migration, and enough technical information about the change that it can be reversed if necessary. The change procedures must include a means for tracking change requests and their migration schedules, as well as a procedure for backing out changes.
Before implementing a change request in the PowerCenter environment, be sure to:
● Track changes manually (electronic or paper change request form). Create one to 'x' number of version folders, where 'x' is the number of versions back that repository information is maintained. The number of 'versions' to maintain is at the discretion of the PowerCenter Administrator. If a change needs to be reversed, the original object can be retrieved via object copy; the object simply needs to be copied to the original development folder from this versioning folder. Note, however, that this approach has the disadvantage of being very time consuming and may also greatly increase the size of the repository databases. In addition, it is advisable to create an additional back-up repository. Using this back-up, the repository can be restored to a 'spare' repository database. After a successful restore, change the object back to its original form by referring to the change request form.
● PowerCenter Versions 7.X and 8.X – The team-based development option provides functionality in two areas: versioning and deployment. But other features, such as repository queries and labeling, are required to ensure optimal use of versioning and deployment. The following sections describe this functionality at a general level. For a more detailed explanation of any of the capabilities of the Team-based Development features of PowerCenter, please refer to the appropriate sections of the PowerCenter documentation.

● For clients using Data Analyzer for front-end reporting, certain considerations need to be addressed with the migration of objects: Data Analyzer's repository database contains user profiles in addition to reporting objects. If users are synchronized from outside sources (like an LDAP directory or via Data Analyzer's API), then a repository restore from one environment to another may delete user profiles (once the repository is linked to LDAP). When reports containing references to dashboards are migrated, the dashboards also need to be migrated to reflect the link to the report. In a clustered Data Analyzer configuration, certain objects that are migrated via XML imports may only be reflected on the node that the import operation was performed on. It may be necessary to stop and re-start the other nodes to refresh these nodes with these changes.
The change request must be tracked through all stages of the change request process, with thorough documentation regarding approval or rejection and resubmission.
Approval to Proceed
An initial review of the Change Request form should assess the cost and value of proceeding with the change. If sufficient information is not provided on the request form to enable the initial reviewer to thoroughly assess the change, he or she should return the request form to the originator for further details. The originator can then resubmit the change request with the requested information.
Plan and Prepare Change
Once approval to proceed has been granted, the originator may plan and prepare the change in earnest. The following sections on the request for change must be completed at this stage:
● Full details of change – Inform the Administrator.
● Impact on services and users – Inform business users in advance about any anticipated outage.
● Assessment of risk of the change failing.
● Fallback plan in case of failure – Includes reverting to the old version using TBD (team-based development), the backup repository, and the backup database.
● Date and time of change – Migration/Promotion plan, Test-Dev and Dev-Prod.
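The request-for-change sections listed above lend themselves to a simple, structured record so that no section can be skipped before review. The sketch below is a minimal illustration only; the field names, the sample values, and the "ready for review" rule are assumptions, not a prescribed Velocity form.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ChangeRequest:
    """Fields mirroring the request-for-change sections listed above."""
    full_details: str           # full details of the change
    developer: str              # who is making the change
    impact_on_services: str     # anticipated outages to communicate
    risk_assessment: str        # risk of the change failing
    fallback_plan: str          # how the change will be backed out
    scheduled_at: Optional[datetime] = None  # migration/promotion date and time
    status: str = "submitted"

def ready_for_review(req: ChangeRequest) -> bool:
    """A request goes to the initial reviewer only when every section is filled in."""
    required = [req.full_details, req.developer, req.impact_on_services,
                req.risk_assessment, req.fallback_plan]
    return all(text.strip() for text in required) and req.scheduled_at is not None

# Hypothetical example request.
req = ChangeRequest(
    full_details="Change target of session s_orders from flat file to table",
    developer="jdoe",
    impact_on_services="No user-visible outage expected",
    risk_assessment="Low; change is isolated to one session",
    fallback_plan="Revert via team-based development version and backup repository",
    scheduled_at=datetime(2007, 3, 1, 22, 0),
)
print(ready_for_review(req))  # True only when the form is complete
```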

Impact Analysis
The Change Control Process must include a formalized approach to completing impact analysis. An assessment must be completed to determine how a change request affects other objects in the analytic solution architecture. The initial analysis is performed and then communicated to all affected parties (e.g., DBAs, Repository Administrator, etc.) at a regularly scheduled meeting. This ensures that everyone who needs to be notified is, and that all approve the change request.
Any implemented change has some planned downstream impact (e.g., additional data will be included, the values on a report will change, a new target file will be populated, etc.). The importance of the impact analysis process is in recognizing unforeseen downstream effects prior to implementing the change. In many cases, the impact is easy to define. For example, if a requested change is limited to changing the target of a particular session from a flat file to a table, the impact is obvious. However, if a business rule change is made, how will the end results of the mapping be affected? If a target table schema needs to be modified within the repository, the corresponding target database must also be changed, and it must be done in sync with the migration of the repository change. In many development projects, most changes occur within mappings or within databases, and the hidden impacts can be worrisome. For PowerCenter, the Repository Manager can be used to identify object interdependencies; a simple dependency-traversal illustration appears below.
An impact analysis must answer the following questions:
● What PowerCenter or Data Analyzer objects does this change affect?
● What other system objects are affected by the change?
● What processes (i.e., migration/promotion, load) does this change impact? What processes does the client have in place to handle and track changes?
● Who else uses the data affected by the change and are they involved in the change request?
● How will this change be promoted to other environments in a timely manner?
● What is the effort involved in making this change? Is there time in the project schedule for this change? Is there sufficient time to fully test the change?
Implementation
Following final approval and after relevant and timely communications have been issued, the change may be implemented in accordance with the plan and the scheduled date and time.
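As a concrete illustration of the impact analysis step described above, the sketch below walks a toy dependency graph (source definition, mapping, session, workflow, report, dashboard) and returns everything downstream of a changed object. The object names and the graph itself are hypothetical; in practice the Repository Manager supplies the real interdependencies.

```python
from collections import deque

# Hypothetical downstream dependencies: object -> objects that consume it.
dependencies = {
    "src_customer":        ["m_load_customer"],
    "m_load_customer":     ["s_load_customer"],
    "s_load_customer":     ["wf_nightly_load"],
    "wf_nightly_load":     ["rpt_customer_summary"],
    "rpt_customer_summary": ["dash_sales"],
}

def downstream(changed_object: str) -> list:
    """Breadth-first walk returning everything affected by a change to one object."""
    seen, queue, affected = {changed_object}, deque([changed_object]), []
    while queue:
        for consumer in dependencies.get(queue.popleft(), []):
            if consumer not in seen:
                seen.add(consumer)
                affected.append(consumer)
                queue.append(consumer)
    return affected

print(downstream("m_load_customer"))
# ['s_load_customer', 'wf_nightly_load', 'rpt_customer_summary', 'dash_sales']
```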

After implementation, the change request form should indicate whether the change was successful or unsuccessful so as to maintain a clear record of the outcome of the request.
Change Control and Migration/Promotion Process
Identifying the most efficient method for applying change to all environments, including development objects and databases, is essential. Within the PowerCenter and Data Analyzer environments, the types of objects to manage are:
● Source definitions
● Target definitions
● Mappings and mapplets
● Reusable transformations
● Sessions
● Batches
● Reports
● Schemas
● Global variables
● Dashboards
● Schedules
In addition, there are objects outside of the Informatica architecture that are directly linked to these objects, so the appropriate procedures need to be established to ensure that all items are synchronized.
When a change request is submitted, the following steps should occur:
1. Perform impact analysis on the request.
2. List all objects affected by the change.
3. Approve or reject the change or migration request. The Project Manager has authority to approve/reject change requests. If approved, pass the request to the PowerCenter Administrator for processing.
4. Migrate the change to the test environment.
5. Test the requested change. If the change does not pass testing, the process will need to start over for this object.
6. Submit the promotion request for migration to QA and/or production environments.
7. If appropriate, the Project Manager approves the request.
8. The Repository Administrator promotes the object to the appropriate environments.
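A minimal way to enforce this request lifecycle is a small state machine that only permits the transitions implied by the steps above (including the loop back to the start when testing fails). The state names and transition table below are an illustrative sketch, not a prescribed implementation.

```python
# Allowed transitions for a change request, mirroring the eight steps above.
TRANSITIONS = {
    "submitted":           ["impact_analysis"],
    "impact_analysis":     ["objects_listed"],
    "objects_listed":      ["approved", "rejected"],
    "approved":            ["migrated_to_test"],
    "migrated_to_test":    ["testing"],
    "testing":             ["promotion_requested", "submitted"],  # failed tests restart the process
    "promotion_requested": ["promotion_approved"],
    "promotion_approved":  ["promoted"],
}

def advance(current: str, target: str) -> str:
    """Move a request to its next state, rejecting any skipped or out-of-order step."""
    if target not in TRANSITIONS.get(current, []):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target

state = "submitted"
for step in ["impact_analysis", "objects_listed", "approved", "migrated_to_test",
             "testing", "promotion_requested", "promotion_approved", "promoted"]:
    state = advance(state, step)
print(state)  # promoted
```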

Best Practices
None
Sample Deliverables
None
Last updated: 15-Feb-07 18:51

Phase 3: Architect
Task 3.3 Implement Technical Architecture
Description
While it is crucial to design and implement a technical architecture as part of the data integration project development effort, most of the implementation work is beyond the scope of this document. Specifically, the acquisition and installation of hardware and system software is generally handled by internal resources, and is accomplished by following pre-established procedures. This section touches on these topics, but is not meant to be a step-by-step guide to the acquisition and implementation process.
After determining an appropriate technical architecture for the solution (3.1 Develop Solution Architecture), the next step is to physically implement that architecture. This includes procuring and installing the hardware and software required to support the data integration processes.
Prerequisites
3.2 Design Development Architecture
Roles
Database Administrator (DBA) (Secondary)
Project Sponsor (Approve)
Repository Administrator (Primary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
The project schedule should be the focus of the hardware and software implementation process. Delays in this step can cause serious delays to the project as a whole. The entire procurement process, which may require a significant amount of time, must begin as soon as possible to keep the project moving forward. There are, however, a number of proven methods for expediting the procurement and installation processes, as described in the related subtasks.
Best Practices
None
Sample Deliverables
None
Last updated: 01-Feb-07 18:45

Phase 3: Architect
Subtask 3.3.1 Procure Hardware and Software
Description
This is the first step in implementing the technical architecture. It is critical to begin the procurement process well in advance of the start of development. The procurement process varies widely among organizations, but is often based on a purchase request (i.e., Request for Purchase or RFP) generated by the Project Manager after the project architecture is planned and configuration recommendations are approved by IT management. An RFP is usually mandatory for procuring any new hardware or software. Although the forms vary widely among companies, an RFP typically lists what products need to be purchased, when they will be needed, and why they are necessary for the project. The document is then reviewed and approved by appropriate management and the organization's "buyer".
Prerequisites
3.2 Design Development Architecture
Roles
Database Administrator (DBA) (Secondary)
Project Sponsor (Approve)
Repository Administrator (Secondary)
System Administrator (Secondary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
Frequently, the Project Manager does not control purchasing new hardware and software. Approval must be received from another group or individual within the organization, often referred to as a "buyer". Even before product purchase decisions are finalized, it is a good idea to notify the buyer of necessary impending purchases, providing a brief overview of the types of products that are likely to be required and for what reasons.
It may also be possible to begin the procurement process before all of the prerequisite steps are complete (see 2.2 Define Business Requirements, 3.1.2 Develop Architecture Logical View, and 3.1.3 Develop Configuration Recommendations). The Technical Architect should have a good idea of at least some of the software and hardware choices before a physical architecture and configuration recommendations are solidified. Finally, if development is ready to begin and the hardware procurement process is not yet complete, it may be worthwhile to get started on a temporary server with the intention of moving the work to the new server when it is available.
Best Practices
None
Sample Deliverables
None
Last updated: 01-Feb-07 18:45

Phase 3: Architect
Subtask 3.3.2 Install/Configure Software
Description
Installing, configuring, and deploying new hardware and software should not affect the progress of a data integration project. The entire development team depends on a properly configured technical environment. Incorrect installation or delays can have serious negative effects on the project schedule. Establishing and following a detailed installation plan can help avoid unnecessary delays in development.
Prerequisites
3.2 Design Development Architecture
Roles
Database Administrator (DBA) (Primary)
Repository Administrator (Primary)
System Administrator (Primary)
Technical Architect (Review Only)
Technical Project Manager (Review Only)
Considerations
When installing and configuring hardware and software for a typical data warehousing project, the following Informatica software components should be considered:
● PowerCenter Services – The PowerCenter services, including the repository, integration, log, and domain services, should be installed and configured on a server machine. (See 3.1.2 Develop Architecture Logical View.)

● PowerCenter Client – The client tools for the PowerCenter engine must be installed and configured on the client machines for developers. The PowerCenter client tools allow a developer to interact with the repository through an easy-to-use GUI interface. The DataDirect ODBC drivers should also be installed on the client machines.
● PowerCenter Reports – PowerCenter Reports (PCR) is a reporting tool that enables users to browse and analyze PowerCenter metadata, allowing users to view PowerCenter operational load statistics and perform impact analysis. PCR is based on Informatica Data Analyzer, running on an included JBOSS application server.
● PowerCenter Reports Client – The PCR client is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. Additional client tool installation for the PCR is usually not necessary, although the proper version of Internet Explorer should be verified on client workstations.
● Data Analyzer Server – The analytics server engine for Data Analyzer should be installed and configured on a server to manage and distribute reports via an internet browser interface.
● Data Analyzer Client – Data Analyzer is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. Additional client tool installation for Data Analyzer is usually not necessary, although the proper version of Internet Explorer should be verified on the client machines of business users to ensure that minimum requirements are met.
● PowerExchange – PowerExchange has components that must be installed on the source system, PowerCenter server, and client.
In addition to considering the Informatica software components that should be installed, the preferred database for the data integration project should be selected and installed, keeping these important database size considerations in mind:
● PowerCenter Metadata Repository – Although you can create a PowerCenter metadata repository with a minimum of 100MB of database space, Informatica recommends allocating up to 150MB for PowerCenter repositories. Additional space should be added for versioned repositories. The database user should have privileges to create tables, views, and indexes.
● Data Analyzer Metadata Repository – Although you can create a Data Analyzer repository with a minimum of 60MB of database space, Informatica recommends allocating up to 150MB for Data Analyzer repositories. The database user should have privileges to create tables, views, and indexes.
● Metadata Manager Repository – Although you can create a Metadata Manager repository with a minimum of 550MB of database space, you may choose to allocate more space in order to plan for future growth. The database user should have privileges to create tables, views, and indexes.
● Data Warehouse Database – Allow for ample space, with growth at a rapid pace.
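The sizing figures above can be turned into a quick planning check before database requests are submitted. The sketch below is illustrative only; the recommended numbers are taken from the component list above, and the planned allocations are hypothetical.

```python
# Recommended repository allocations taken from the sizing notes above (MB).
RECOMMENDED_MB = {
    "powercenter_repository": 150,    # 500 if versioning is enabled (see later note)
    "data_analyzer_repository": 150,
    "metadata_manager_repository": 550,
}

def check_allocations(planned_mb: dict) -> list:
    """Return warnings for any planned allocation below the recommendation."""
    warnings = []
    for name, recommended in RECOMMENDED_MB.items():
        planned = planned_mb.get(name, 0)
        if planned < recommended:
            warnings.append(f"{name}: planned {planned}MB < recommended {recommended}MB")
    return warnings

planned = {"powercenter_repository": 100,
           "data_analyzer_repository": 200,
           "metadata_manager_repository": 550}
for warning in check_allocations(planned):
    print(warning)
```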

PowerCenter Server Installation
The PowerCenter services need to be installed and configured, along with any necessary database connectivity drivers. Connectivity needs to be established among all the platforms before the Informatica applications can be used. Native Database Drivers (or ODBC in some instances) are used by the Server to connect to the source, target, and repository databases. For step-by-step instructions for installing the PowerCenter services, refer to the Informatica PowerCenter Installation Guide.
The recommended configuration for the PowerCenter environment is to install the PowerCenter services and the repository and target databases on the same multiprocessor machine. Use this approach when available CPU and memory resources on the multiprocessor machine allow all software processes to operate efficiently without "pegging" the server. This approach minimizes network interference when the server is writing to the target database. If available hardware dictates that the PowerCenter Server is separated physically from the target database server, Informatica recommends placing a high-speed network connection between the two servers.
Some organizations house the repository database on a separate database server if they are running OLAP servers and want to consolidate metadata repositories. Because the repository tables are typically very small in comparison to the data mart tables, and storage parameters are set at the database level, it may be advisable to keep the repository in a separate database. To optimize performance, consider installing the Repository service on a machine with a fast network connection. To improve repository performance, do not install the Repository service on a Primary Domain Controller (PDC) or a Backup Domain Controller (BDC).
The following list is intended to complement the installation guide when installing PowerCenter:
● Network Protocol – TCP/IP and IPX/SPX are the supported protocols for communication between the PowerCenter services and PowerCenter client tools.

● Data Movement Mode – The DataMovementMode option is set in the PowerCenter Integration Service configuration and can be set to ASCII or Unicode. ASCII is a single-byte code page that encodes character data with 7 bits. Unicode is an international character set standard that supports all major languages (including US, European, and Asian), as well as common technical symbols, and uses a fixed-width encoding of 16 bits for every character. Although actual performance results depend on the nature of the application, if international code page support (i.e., Unicode) is not required, set the DataMovementMode to ASCII because the 7-bit storage of character data results in smaller cache sizes for string data, resulting in more efficient data movement.
● Operating System Patches – Prior to installing PowerCenter, please refer to the PowerCenter Release Notes documentation to ensure that all required patches have been applied to the operating system. This step is often overlooked and can result in operating system errors and/or failures when running the PowerCenter Server.
● Lightweight Directory Access Protocol (LDAP) – If you use PowerCenter default authentication, you create users and maintain passwords in the PowerCenter metadata repository using Repository Manager. If you use Lightweight Directory Access Protocol (LDAP), the Repository service passes a user login to the external directory for authentication, allowing synchronization of PowerCenter user names and passwords with network/corporate user names and passwords. The repository maintains an association between repository user names and external login names. You must create the user name-login associations, but you do not maintain user passwords in the repository; the Repository service verifies users against these user names and passwords. Informatica provides a PowerCenter plug-in that you can use to interface between PowerCenter and an LDAP server. To install the plug-in, perform the following steps: 1. Register the package with each repository that you want to use it with. 2. Configure the LDAP module connection information from the Administration Console. 3. Set up users in each repository. For more information on configuring LDAP authentication, refer to the Informatica PowerCenter Repository Guide.
● Versioning – If Versioning is enabled for a PowerCenter Repository, developers can save multiple copies of any PowerCenter object to the repository. Although this feature provides developers with a seamless way to manage changes during the course of a project, it also results in larger metadata repositories. If Versioning is enabled for a repository, Informatica recommends allocating a minimum of 500MB of space in the database for the PowerCenter repository.
Ensure that all appropriate database drivers (and most recent patch levels) are installed on the PowerCenter server to access source, target, and repository databases.
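As a rough illustration of the Data Movement Mode point above, the sketch below estimates the string cache footprint under the simplifying assumption of one byte per character in ASCII mode and two bytes per character in Unicode mode; real cache sizing depends on many more factors and is an assumption here, not a formula from the product documentation.

```python
def string_cache_mb(rows: int, avg_chars_per_row: int, data_movement_mode: str) -> float:
    """Very rough string-cache estimate: ~1 byte/char in ASCII mode, ~2 bytes/char in Unicode mode."""
    bytes_per_char = 1 if data_movement_mode.upper() == "ASCII" else 2
    return rows * avg_chars_per_row * bytes_per_char / (1024 * 1024)

rows, chars = 1_000_000, 200
print(f"ASCII:   {string_cache_mb(rows, chars, 'ASCII'):.1f} MB")
print(f"Unicode: {string_cache_mb(rows, chars, 'Unicode'):.1f} MB")
```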

verify that you have enough disk space for the PowerCenter Client.For more information on configuring LDAP authentication. along with the application server foundation software. The Repository Manager saves repository connection information in the registry. this is important when importing and exporting repository registries. PowerCenter Client Installation The PowerCenter Client needs to be installed on all developer workstations. The registry references the data source names used in the exporting machine. Also. If a registry is imported containing a DSN that does not exist on the client system. When installing PowerCenter Client tools via a standard installation. Aside from eliminating the potential for confusion on individual developer machines. or if you want to standardize the installation across all machines in the environment. The reports are built on the Data Analyzer infrastructure. Currently. along with any necessary drivers. and then import it for a new client.Data Warehousing 214 of 1017 . You must have 300MB of disk space to install the PowerCenter 8 Client tools. Data Analyzer must be installed and configured. it is possible to export that information. When adding an ODBC data source name (DSN) to client workstations. choose to install the “Client tools” and “ODBC” components. You may want to perform a silent installation if you need to install the PowerCenter Client on several machines on the network. When you perform a silent installation. the connection will fail at runtime. refer to the Informatica PowerCenter Repository Guide. the installation program uses information in a response file to locate the installation directory. make sure you have 30MB of temporary file space available for the PowerCenter Setup. You can also perform a silent installation for remote machines on the network. including database connectivity drivers such as ODBC. it is a good idea to keep the DSN consistent among all workstations. PCR is shipped with the PowerCenter installation (both Standard and Advanced Editions). INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Before you begin the installation. PowerCenter Reports Installation PowerCenter Reports (PCR) replaces the PowerCenter Metadata Reporter. To simplify the process of setting up client systems. TIP You can install the PowerCenter Client tools in standard mode or silent mode.

For step-by-step instructions for installing the PowerCenter Reports, refer to the Informatica PowerCenter Installation Guide. The following list of considerations is intended to complement the installation guide when installing PCR:
● Operating System Patch Levels – Prior to installing PCR, be sure to refer to the Data Analyzer Release Notes documentation to ensure that all required patches have been applied to the operating system. This step is often overlooked and can result in operating system errors and/or failures if the correct patches are not applied.
● Lightweight Directory Access Protocol (LDAP) – If you use default authentication, you create users and maintain passwords in the Data Analyzer metadata repository. However, if you use Lightweight Directory Access Protocol (LDAP), Data Analyzer passes a user login to the external directory for authentication, allowing synchronization of Data Analyzer user names and passwords with network/corporate user names and passwords, as well as PowerCenter user names and passwords. The repository maintains an association between repository user names and external login names. Data Analyzer verifies users against these user names and passwords. You must create the user name-login associations, but you do not have to maintain user passwords in the repository. In order to enable LDAP, you must configure the IAS.properties and ldaprealm.properties files. For more information on configuring LDAP authentication, see the Data Analyzer Administration Guide.
The recommended configuration for the PCR environment is to place the PCR/Data Analyzer server, application server, and repository databases on the same multiprocessor machine. Use this approach when available CPU and memory resources on the multiprocessor machine allow all software processes to operate efficiently without "pegging" the server. This approach minimizes network input/output as the PCR server reads from the PowerCenter repository database. If available hardware dictates that the PCR server be physically separated from the PowerCenter repository database server, Informatica recommends placing a high-speed network connection between the two servers.
PowerCenter Reports Client Installation
The PCR client is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. The proper version of Internet Explorer should be verified on client machines, ensuring that Internet Explorer 6 is the default web browser, and the minimum system requirements should be validated. In order to use PCR, the client workstation should have at least a 300MHz processor and 128MB of RAM. Please note that these are the minimum requirements for the PCR

client. In most situations, users are likely to be multi-tasking using multiple applications; if other applications are running on the client workstation, additional CPU and memory is required, so this should be taken into consideration.
Certain interactive features in the PCR require third-party plug-in software to work correctly. Users must download and install the plug-in software on their workstation before they can use these features. PCR uses the following third-party plug-in software:
● Microsoft SOAP Toolkit – In PCR, you can export a report to an Excel file and refresh the data in Excel directly from the cached data in PCR or from data in the data warehouse through PCR. To use the data refresh feature, you must first install the Microsoft SOAP Toolkit. For information on downloading the Microsoft SOAP Toolkit, see "Managing Account Information" in the Data Analyzer User Guide.
● Adobe SVG Viewer – In PCR, you can display interactive report charts and chart indicators. You can click on an interactive chart to drill into the report data and view details and select sections of the chart. To view interactive charts, you must install Adobe SVG Viewer. For more information on downloading Adobe SVG Viewer, see "Working with Reports" in the Data Analyzer User Guide.
Lastly, for PCR to display its application windows correctly, Informatica recommends disabling any pop-up blocking utility on your browser. If a pop-up blocker is running while you are working with PCR, the PCR windows may not display properly.
Data Analyzer Server Installation
The Data Analyzer Server needs to be installed and configured along with the application server foundation software. Currently, Data Analyzer is certified on the following application servers:
● BEA WebLogic
● IBM WebSphere
● JBoss Application Server
Refer to the PowerCenter Installation Guide for the current list of supported application servers and exact version numbers.

TIP
When installing IBM WebSphere Application Server, avoid using spaces in the installation directory path name for the application server, http server, or messaging server.
For step-by-step instructions for installing the Data Analyzer Server components, refer to the Informatica Data Analyzer Installation Guide. The following list of considerations is intended to complement the installation guide when installing Data Analyzer:
● Operating System Patch Levels – Prior to installing Data Analyzer, refer to the Data Analyzer Release Notes documentation to ensure that all required patches have been applied to the operating system. This step is often overlooked and can result in operating system errors and/or failures if the correct patches are not applied.
● Lightweight Directory Access Protocol (LDAP) – If you use Data Analyzer default authentication, you create users and maintain passwords in the Data Analyzer metadata repository. However, if you use Lightweight Directory Access Protocol (LDAP), Data Analyzer passes a user login to the external directory for authentication, allowing synchronization of Data Analyzer user names and passwords with network/corporate user names and passwords, as well as PowerCenter user names and passwords. The repository maintains an association between repository user names and external login names. Data Analyzer verifies users against these user names and passwords. You must create the user name-login associations, but you do not maintain user passwords in the repository. In order to enable LDAP, you must configure the IAS.properties and ldaprealm.properties files. For more information on configuring LDAP authentication, refer to the Informatica Data Analyzer Administrator Guide.
The recommended configuration for the Data Analyzer environment is to put the Data Analyzer Server, application server, repository, and data warehouse databases on the same multiprocessor machine. Use this approach when available CPU and memory resources on the multiprocessor machine allow all software processes to operate efficiently without "pegging" the server. This approach minimizes network input/output as the Data Analyzer Server reads from the data warehouse database. If available hardware dictates that the Data Analyzer Server is separated physically from the data warehouse database server, Informatica recommends placing a high-speed network connection between the two servers.

TIP
After installing Data Analyzer on the JBoss application server, set the minimum pool size to 0 in the file <JBOSS_HOME>/server/informatica/deploy/hsqldb-ds.xml. This ensures that the managed connections in JBOSS will be configured properly. Without this setting it is possible that email alert messages will not be sent properly.
TIP
Repository Preparation: Before you install Data Analyzer, be sure to clear the database transaction log for the repository database. If the transaction log is full or runs out of space when the Data Analyzer installation program creates the Data Analyzer repository, the installation program will fail.
Data Analyzer Client Installation
The Data Analyzer Client is a web-based, thin-client tool that uses Microsoft Internet Explorer 6 as the client. The proper version of Internet Explorer should be verified on client machines, ensuring that Internet Explorer 6 is the default web browser, and the minimum system requirements should be validated. In order to use the Data Analyzer Client, the client workstation should have at least a 300MHz processor and 128MB of RAM. Please note that these are the minimum requirements for the Data Analyzer Client. In most situations, users are likely to be multi-tasking using multiple applications; if other applications are running on the client workstation, additional CPU and memory is required, so this should be taken into consideration.
Certain interactive features in Data Analyzer require third-party plug-in software to work correctly. Users must download and install the plug-in software on their workstation before they can use these features. Data Analyzer uses the following third-party plug-in software:
● Microsoft SOAP Toolkit – In Data Analyzer, you can export a report to an Excel file and refresh the data in Excel directly from the cached data in Data Analyzer or from data in the data warehouse through Data Analyzer. To use the data refresh feature, you must first install the Microsoft SOAP Toolkit. For information on downloading the Microsoft SOAP Toolkit, see "Working with Reports" in the Data Analyzer User Guide.
● Adobe SVG Viewer – In Data Analyzer, you can display interactive report charts and chart indicators. You can click on an interactive chart to drill into the report data and view details and select sections of the chart. To view interactive charts, you must install Adobe SVG Viewer. For more information

on downloading Adobe SVG Viewer, see "Managing Account Information" in the Data Analyzer User Guide.
Lastly, for Data Analyzer to display its application windows correctly, Informatica recommends disabling any pop-up blocking utility on your browser. If a pop-up blocker is running while you are working with Data Analyzer, the Data Analyzer windows may not display properly.
Metadata Manager Installation
Metadata Manager software can be installed after the development environment configuration has been completed and approved. Metadata Manager includes the following installation components:
● Metadata Manager
● Limited edition of PowerCenter
● Metadata Manager documentation in PDF format
● Metadata Manager and Data Analyzer integrated online help
● Configuration Console online help
Metadata Manager requires a web server and a Java 2 Enterprise Edition (J2EE)-compliant application server. Metadata Manager works with BEA WebLogic Server, IBM WebSphere Application Server, and JBoss Application Server. You can install Metadata Manager on a machine with a Windows or UNIX operating system. You must install the application server and other required software before you install Metadata Manager. The JBoss Application Server can be installed from the Metadata Manager installation process; if you choose to use BEA WebLogic or IBM WebSphere, they must be installed prior to the Metadata Manager installation. Informatica recommends that a system administrator, who is familiar with application and web servers, LDAP servers, and the J2EE platform, install the required software. For complete information on the Metadata Manager installation process, refer to the PowerCenter Installation Guide.
The following high-level steps are involved in the Metadata Manager installation process. To install Metadata Manager for the first time, complete each of the following tasks in the order listed below:
1. Install the application server. Install BEA WebLogic Server or IBM WebSphere Application Server on the machine where you plan to install Metadata Manager.

2. Apply the application server license.
3. Create database user accounts. Create one database user account for the Metadata Manager Warehouse and Metadata Manager Server repository and another for the Integration repository.
4. Install PowerCenter 8 to manage metadata extract and load tasks.
5. Apply the product license, as well as the PowerCenter and Metadata Manager licenses.
6. Install Metadata Manager. When installing Metadata Manager, provide the connection information for the database user accounts for the Integration repository and the Metadata Manager Warehouse and Metadata Manager Server repository. The Metadata Manager installation creates both repositories and installs other Metadata Manager components, such as the Configuration Console, documentation, and XConnects.
7. Optionally, run the pre-compile utility (for BEA WebLogic Server and IBM WebSphere). If you are using the BEA WebLogic Server as your application server, optionally pre-compile the JSP scripts to display the Metadata Manager web pages faster when they are accessed for the first time.
Note: For more information about installing Metadata Manager, see the "Installing Metadata Manager" chapter of the PowerCenter Installation Guide.
After the software has been installed and tested, the Metadata Manager Administrator can begin creating security groups and users, configure an XConnect for each source repository to extract metadata, and then load metadata from the source repositories into the Metadata Manager Warehouse. The workflow for each XConnect extracts metadata from the metadata source repository and loads it into the Metadata Manager Warehouse. For more information on any of these steps, refer to the Metadata Manager Administration Guide.
Following are some of the initial steps for the Metadata Manager Administrator once Metadata Manager is installed:
1. Configure the PowerCenter Server. Assign the Integration repository to the PowerCenter Server to enable running of the prepackaged XConnect workflows.

2. Set up the Configuration Console. Verify the Integration repository, PowerCenter Server, and PowerCenter Repository Server connections in the Configuration Console, and specify the PowerCenter source files directory in the Configuration Console.
3. Repository registration/creation in Metadata Manager. Add each source repository to Metadata Manager. This action adds the corresponding XConnect for this repository in the Configuration Console.
4. Set up and run the XConnect for each source repository using the Configuration Console.
5. To limit the tasks that users can perform and the type of source repository metadata objects that users can view and modify, set user privileges and object access permissions.
PowerExchange Installation
Before beginning the installation, take time to read the PowerExchange Installation Guide as well as the documentation for the specific PowerExchange products you have licensed and plan to install. Take care to read through the installation documentation prior to attempting the installation. Also, take time to identify and notify the resources you are going to need to complete the installation. Depending on the specific product, you could need any or all of the following:
● Database Administrator
● PowerCenter Administrator
● MVS Systems Administrator
● UNIX Systems Administrator
● Security Administrator
● Network Administrator
● Desktop (PC) Support
Installing the PowerExchange Listener on Source Systems
The process for installing PowerExchange on the source system varies greatly depending on the source system. The PowerExchange Installation Guide has step-by-step instructions for installing PowerExchange on all supported platforms.

It is recommended that a separate user account be created to run the required processes. Informatica recommends that the installation be performed in one environment and tested from end-to-end (from data map creation to running workflows) before attempting to install the product in other environments.
Installing the PowerExchange Navigator on the PC
The Navigator allows you to create and edit data maps and tables. Administrator access may be required to install the software. To install PowerExchange on the desktop (PC) for the first time, complete each of the following tasks in the order listed below:
1. Install the PowerExchange Navigator.
2. Modify the dbmover.cfg file. Depending on your installation, modifications may not be required. Refer to the PowerExchange Reference Manual for information on the parameters in dbmover.cfg.
Installing the PowerExchange Client for the PowerCenter Server
The PowerExchange client for the PowerCenter server allows PowerCenter to read data from PowerExchange data sources. The PowerCenter Administrator should perform the installation with the assistance of a server administrator. A PowerCenter Administrator needs to register the PowerExchange plug-in with the PowerExchange repository.
Best Practices
None
Sample Deliverables
None
Last updated: 15-Feb-07 18:58

Phase 4: Design
4 Design
● 4.1 Develop Data Model(s)
    4.1.1 Develop Enterprise Data Warehouse Model
    4.1.2 Develop Data Mart Model(s)
● 4.2 Analyze Data Sources
    4.2.1 Develop Source to Target Relationships
    4.2.2 Determine Source Availability
● 4.3 Design Physical Database
    4.3.1 Develop Physical Database Design
● 4.4 Design Presentation Layer
    4.4.1 Design Presentation Layer Prototype
    4.4.2 Present Prototype to Business Analysts
    4.4.3 Develop Presentation Layout Design

Phase 4: Design
Description
The Design Phase lays the foundation for the upcoming Build Phase. Each task in the Design Phase provides the functional architecture for the development process using PowerCenter. The Design Phase requires that several preparatory tasks are completed before beginning the development work of building and testing mappings, sessions, and workflows within PowerCenter.
In the Design Phase, all data models are developed, source systems are analyzed and physical databases are designed. The design of the target data store may include star schemas, data warehouses and data marts, web services, message queues or custom databases to drive specific applications or effect a data migration. The presentation layer is designed and a prototype constructed. Each task, if done thoroughly, enables the data integration solution to perform properly and provides an infrastructure that allows for growth and change.
Prerequisites
3 Architect
Roles
Application Specialist (Primary)
Business Analyst (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Database Administrator (DBA) (Primary)

Data Warehousing 225 of 1017 .Presentation Layer Developer (Primary) System Administrator (Primary) Technical Project Manager (Review Only) Considerations None Best Practices None Sample Deliverables None Last updated: 01-Feb-07 18:45 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 4: Design
Task 4.1 Develop Data Model(s)
Description
A data integration/business intelligence project requires logical data models in order to begin the process of designing the target database structures that are going to support the solution architecture. Depending on the structure and approach to data storage supporting the data integration solution, the data architecture may include an Enterprise Data Warehouse (EDW) and one or more data marts. In addition, many implementations also include an Operational Data Store (ODS), which may also be referred to as a dynamic data store (DDS) or staging area. Each of these data stores may exist independently of the others, and may reside on completely different database management systems (DBMSs) and hardware platforms.
An ODS may be needed when there are operational or reporting uses for the consolidated detail data, or to provide a staging area, for example, when there is a short time span to pull data from the source systems. It can act as a buffer between the EDW and the source applications. The ODS typically receives the data after some cleansing and integration, but with little or no summarization from the source systems. The data model for the ODS is typically in third-normal form and may be a virtual duplicate of the source systems' models. The ODS can then become the source for the EDW.
While this task and its subtasks focus on the data models for Enterprise Data Warehouses and Enterprise Data Marts, many types of data integration projects do not involve a Data Warehouse. Data migration or synchronization projects typically have existing transactional databases as sources and targets; in these cases, the data models may be reverse engineered directly from these databases. The same may be true of data consolidation projects if the target is the same structure as an existing operational database. Operational data integration projects, including data consolidation into new data structures, require a data model design, but typically one that is dictated by the functional processes. Regardless of the architecture chosen for the data integration solution, the data models for the target databases or data structures need to be developed in a logical and consistent fashion prior to development. In any case, each of the database schemas comprising the overall solution will require a corresponding logical model. The logical data model will, in turn, lead to the initial physical database design that will support the business requirements and be populated through data integration logic.

Major business intelligence projects require an EDW to house the data imported from many different source systems. The EDW represents an integrated, subject-oriented view of the corporate data comprised of relevant source system data. The EDW typically has a somewhat de-normalized structure to support reporting and analysis. It is typically slightly summarized so that its information is relevant to management, as opposed to providing all the transaction details, and it typically has derived calculations and subtotals. The EDW is not generally intended for direct access by end users for reporting purposes—for that we have the "data marts".
Data marts (DMs) are effectively subsets of the EDW. Data marts are fed directly from the enterprise data warehouse, ensuring synchronization of business rules and snapshot times. The structures of the data marts are driven by the requirements of particular business users and reporting tools; detailed requirements drive content. There may be additions and reductions to the logical data mart design depending on the requirements for the particular data mart. A subject-oriented data mart may be able to provide for more historical analysis, or alternatively may require none; historical data capture requirements may differ from those on the enterprise data warehouse.
Two generic assumptions about business users also affect data mart design:
● Business users prefer systems they easily understand.
● Business users prefer systems that deliver results quickly.
These assumptions encourage the use of star and snowflake schemas in the solution design. The logical design structures are typically dimensional star or snowflake schemas; depending on size and usage, a variant of a star schema may be used. These types of schemas represent business activities as a series of discrete, time-stamped events (or facts) with business-oriented names, such as orders or shipments, as opposed to business transactions. In addition to its numerical details (i.e., atomic-level facts), these facts contain foreign key "pointers" to one or more dimensions that place the fact into a business context, such as the fiscal quarter in which the shipment occurred, or the sales region responsible for the order. The use of business terminology throughout the star or snowflake schema is much more meaningful to the end user than the typical normalized, technology-centric data model.
During the modeling phase of a data integration project, it is important to consider all possible methods of obtaining a data model. Analyzing the cost benefits of build vs. buy may well reveal that it is more economical to buy a pre-built subject area model than to invest the time and money in building your own.
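To make the star-schema structure described above concrete, the sketch below defines a small orders fact table with foreign-key pointers to date and customer dimensions using SQLAlchemy Core. The table and column names are hypothetical examples, not part of any Velocity deliverable, and SQLite is used here only so the sketch runs self-contained.

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String, Date,
                        Numeric, ForeignKey, create_engine)

metadata = MetaData()

# Dimension tables carry the business context referenced by the fact.
dim_date = Table("dim_date", metadata,
                 Column("date_id", Integer, primary_key=True),
                 Column("calendar_date", Date),
                 Column("fiscal_quarter", String(6)))

dim_customer = Table("dim_customer", metadata,
                     Column("customer_id", Integer, primary_key=True),
                     Column("customer_name", String(100)),
                     Column("sales_region", String(50)))

# The fact table holds atomic-level measures plus foreign-key pointers
# to the dimensions (fiscal quarter of the shipment, responsible region, ...).
fact_orders = Table("fact_orders", metadata,
                    Column("order_id", Integer, primary_key=True),
                    Column("date_id", Integer, ForeignKey("dim_date.date_id")),
                    Column("customer_id", Integer, ForeignKey("dim_customer.customer_id")),
                    Column("order_amount", Numeric(12, 2)),
                    Column("quantity", Integer))

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)   # generates the physical star schema
print(sorted(metadata.tables))
```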

Prerequisites
None
Roles
Business Analyst (Secondary)
Data Architect (Primary)
Data Quality Developer (Primary)
Technical Project Manager (Review Only)
Considerations
Requirements
The question that should be asked before modeling begins is: are the requirements sufficiently defined in at least one subject area that the data modeling tasks can begin? This question is particularly critical for designing logical data warehouse and data mart schemas; the EDW logical model is largely dependent on source system structures. If the data modeling requires too much guesswork, at best, time will be wasted or, at worst, the Data Architect will design models that fail to support the business requirements.
Conventions for Names and Data Types
Some internal standards need to be set at the beginning of the modeling process to define data types and names. Conventions should be chosen for the prefix and suffix names of certain types of fields. (See Naming Conventions for additional information.) Data modeling tools refer to common data types as domains. For example, address can be of a string data type. Domains are also hierarchical: residential and business addresses are children of address. Establishing these data types at the beginning of the model development process is beneficial for consistency and timeliness in implementing the subsequent physical database design.
It is extremely important for project team members to adhere to whatever conventions are chosen. If project team members deviate from the chosen conventions, the entire purpose is defeated. For example, numeric surrogate keys in the data warehouse might use either seq or id as a suffix to easily identify the type of field to the developers.
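Naming conventions like the surrogate-key suffix rule above are easiest to keep when they are checked mechanically. The sketch below is a minimal illustration; the column names and the assumption that suffixes are separated by an underscore are examples, not a Velocity standard.

```python
# Assumed convention: surrogate keys end in "_seq" or "_id".
APPROVED_KEY_SUFFIXES = ("_seq", "_id")

def convention_violations(surrogate_keys: list) -> list:
    """Return surrogate-key column names that do not carry an approved suffix."""
    return [col for col in surrogate_keys
            if not col.endswith(APPROVED_KEY_SUFFIXES)]

surrogate_keys = ["customer_id", "order_date_key", "product_seq"]
print(convention_violations(surrogate_keys))  # ['order_date_key'] breaks the convention
```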

Maintaining the Data Models
Data models are valuable documentation, both to the project and the business users. They should be stored in a repository in order to take advantage of PowerCenter's integrated metadata approach. Additionally, they should be regularly backed up to file after major changes. Versioning should take place regularly within the repository so that it is possible to roll back several versions of a data model, if necessary. Once the backbone of a data model is in place, a change control procedure should be implemented to monitor any changes requested and record implementation of those changes. Adhering to rigorous change control procedures will help to ensure that all impacts of a change are recognized prior to their implementation.
Metadata
A logical data model produces a significant amount of metadata and is likely to be a major focal point for metadata during the project. As metadata has to be delivered to numerous applications used in various stages of a data integration project, an integrated approach to metadata management is required. Metadata integration is a major up-front consideration if metadata is to be managed consistently and competently throughout the project.
Logical data models can be delivered to PowerCenter ready for data integration development. Informatica's Metadata Services products can be used to deliver metadata from other application repositories to the PowerCenter Repository and from PowerCenter to various business intelligence (BI) tools. Additionally, metadata originating from these models can be delivered to end users through business intelligence tools. Many business intelligence vendors have tools that can access the PowerCenter Repository through the Metadata Services and Metadata Manager Benefits architectures. To facilitate metadata analysis and to keep your documentation up-to-date, you may want to consider the metadata reporting capabilities in Metadata Manager to provide automatically updated lineage and impact analysis.

TIP
To link logical model design to the requirements specifications, use either of these methods:
● Option 1: Allocate one of the many entity or attribute description fields that data modeling tools provide to be the link between the elements of the logical design and the requirements documentation. Then, establish (and adhere to) a naming convention for the population of this field to identify the requirements that are met by the presence of a particular entity or attribute.
● Option 2: Record the name of the entity and associated attribute in a spreadsheet or database with the requirements that they support.
Both options 1 and 2 allow for metadata integration. Option 1 is generally preferable because the links can be imported into the PowerCenter Repository through Metadata Exchange.
Best Practices
None
Sample Deliverables
None
Last updated: 15-Feb-07 19:01

Phase 4: Design
Subtask 4.1.1 Develop Enterprise Data Warehouse Model
Description
If the aim of the data integration project is to produce an Enterprise Data Warehouse (EDW), then the logical EDW model should encompass all of the sources that feed the warehouse. In summary, the developed EDW logical model should be the sum of all the parts but should exclude detailed attribute information. This model will be a slightly de-normalized structure to replicate source data from operational systems; it should be neither a full star nor snowflake schema, nor a highly normalized structure of the source systems.
Prerequisites
None
Roles
Business Analyst (Secondary)
Data Architect (Primary)
Technical Project Manager (Review Only)
Considerations
Analyzing Sources
Designing an Enterprise Data Warehouse (EDW) is particularly difficult because it is an accumulation of multiple sources. The Data Architect needs to identify and replicate all of the relevant source structures in the EDW data model. Some of the source structures are redesigned in the model to migrate non-relational sources to relational structures. In some cases, it may be appropriate to provide limited consolidation where common fields are present in various incoming data sources. The PowerCenter Designer client includes the Source Analyzer and Warehouse Designer tools, which can be

These tools can be used to analyze sources. In PowerCenter Designer, incoming non-relational structures can be normalized by use of the Normalizer transformation object. Normalized targets defined using PowerCenter can then be created in a database and reverse-engineered into the data model, if desired. Alternatively, dedicated modeling tools can be used.

Universal Tables
Universal tables provide some consolidation and commonality among sources. A universal table brings together the fields that cover the same business subject or business rule. For example, a customer table in one source system may have only standard contact details, while a second system may supply fields for mobile phones and email addresses but not include a field for a fax number. A universal table should hold all of the contact fields from both systems (i.e., standard contact details plus fields for fax, mobile phones and email). Universal tables are also intended to be the sum of all parts. Additionally, universal tables should ensure syntactic consistency, such that fields from different source tables represent the same data items and possess the same data types. For example, different systems may use different codes for the gender of a customer.

Relationship Modeling
Logical modeling tools allow different types of relationships to be identified among various entities and attributes. Many-to-many, one-to-one, and many-to-one relationships can all be defined in logical models, and the relationships are reflected in the physical design that a modeling tool produces from the logical design. There are two types of relationships: identifying and non-identifying.
● An identifying relationship is one in which a child attribute relies on the parent for its full identity. For example, in a bank, an account must have an account type for it to be fully understood. The tool attempts to enforce identifying relationships through database constraints.
● Non-identifying relationships are relationships in which the parent object is not required for its identity. A data modeling tool does not enforce non-identifying relationships through constraints when the logical model is used to generate a physical database.
The modeling tools hide the underlying complexities and show those objects as part of a physical database design; for many-to-many relationships, for example, the modeling tools automatically create the lookup tables if the tool is used to generate the database schema.
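As a rough illustration of how these relationship types typically surface in a generated schema, the sketch below prints hypothetical DDL for an identifying relationship (the parent key participates in the child's primary key), a non-identifying relationship (a plain, optional foreign key), and an associative (lookup) table that resolves a many-to-many relationship. All table and column names are invented for the example and are not prescribed by the methodology.

```python
# Hypothetical DDL fragments for the three relationship patterns discussed above.
# Names and types are examples only; adjust to your RDBMS and modeling-tool output.

IDENTIFYING = """
CREATE TABLE account (
    account_type_cd  VARCHAR(10) NOT NULL,   -- parent key is part of the child's identity
    account_nbr      VARCHAR(20) NOT NULL,
    open_date        DATE,
    PRIMARY KEY (account_type_cd, account_nbr),
    FOREIGN KEY (account_type_cd) REFERENCES account_type (account_type_cd)
);
"""

NON_IDENTIFYING = """
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    region_cd        VARCHAR(10),            -- optional reference; not part of the key
    FOREIGN KEY (region_cd) REFERENCES region (region_cd)
);
"""

MANY_TO_MANY_LOOKUP = """
CREATE TABLE customer_account (               -- associative (lookup) table generated by
    customer_id      INTEGER NOT NULL,        -- the modeling tool to resolve the
    account_type_cd  VARCHAR(10) NOT NULL,    -- many-to-many relationship
    account_nbr      VARCHAR(20) NOT NULL,
    PRIMARY KEY (customer_id, account_type_cd, account_nbr)
);
"""

if __name__ == "__main__":
    for ddl in (IDENTIFYING, NON_IDENTIFYING, MANY_TO_MANY_LOOKUP):
        print(ddl)
```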

Historical Considerations
Business requirements and refresh schedules should determine the amount and type of history that an EDW should hold. It is also necessary to decide how far back the history should go. Capturing historical data usually involves taking snapshots of the database on a regular basis and adding the data to the existing content with time stamps. Alternatively, individual updates can be recorded and the previously current records can be time-period stamped or versioned. The logical history maintenance architecture should be common to all tables within the EDW.

The Data Architect should focus on the common factors in the business requirements as early as possible. More importantly, focusing on the commonalities early in the process also allows other tasks in the project cycle to proceed earlier. A project to develop an integrated solution architecture is likely to encounter such common business dimensions as organizational hierarchy, regional definitions, a number of calendars, and product dimensions, among others. Variations and dimensions specific to certain parts of the organization can be dealt with later in the design.

Data Quality
Data can be verified for validity and accuracy as it comes into the EDW. The EDW can reasonably be expected to answer such questions as:
● Is the post code or currency code valid?
● Has a valid date been entered (e.g., meeting the minimum age requirement for a driver's license)?
● Does the data conform to standard formatting rules?
Additionally, data values can be evaluated against expected ranges. For example, dates of birth should be in a reasonable range (not after the current date, and not before 1st Jan 1900). Values can also be validated against reference datasets. As well as using industry-standard references (e.g., ISO Currency Codes, ISO Units of Measure), it may be necessary to obtain or generate new reference data to perform all relevant data quality checks.
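To make these kinds of checks concrete, the sketch below runs a minimal validation pass over a handful of incoming records. The reference sets, field names, and formatting rule are illustrative assumptions only; in practice these rules would be implemented in the data quality or ETL tooling and driven by the project's own reference data.

```python
from datetime import date

# Illustrative reference data (assumed for this sketch).
VALID_CURRENCIES = {"USD", "EUR", "GBP"}
MIN_BIRTH_DATE = date(1900, 1, 1)

def check_record(rec):
    """Return a list of data quality issues found in one incoming record."""
    issues = []
    if rec.get("currency_cd") not in VALID_CURRENCIES:
        issues.append("invalid currency code: %r" % rec.get("currency_cd"))
    dob = rec.get("birth_date")
    if dob is not None and not (MIN_BIRTH_DATE <= dob <= date.today()):
        issues.append("birth date out of expected range: %s" % dob)
    post_code = rec.get("post_code", "")
    if not (post_code.isdigit() and len(post_code) == 5):   # assumed formatting rule
        issues.append("post code fails formatting rule: %r" % post_code)
    return issues

if __name__ == "__main__":
    sample = [
        {"currency_cd": "USD", "birth_date": date(1975, 6, 1), "post_code": "10001"},
        {"currency_cd": "XXX", "birth_date": date(1850, 1, 1), "post_code": "1A001"},
    ]
    for rec in sample:
        print(rec.get("post_code"), "->", check_record(rec) or "clean")
```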

In addition to common dimensions, subject areas also incorporate metrics. Metrics are measures that businesses use to quantify their performance; performance measures include productivity, efficiency, turnover, profit and gross margin, commission payments, client satisfaction, and customer values. Metrics should be modeled in the fact tables at the center of the star schema. Commonality ensures continuity between the measures a business currently takes from an operational system and the new ones that will be available in the data integration solution. There are two reasons for this:
● Common semantics enable business users to know if they are using the same organizational terminology as their colleagues.
● Common business rules determine the formulae for the calculation of the metrics.
Various departments may use different rules to calculate their profit, and the Data Architect may determine at this point that subject areas thought to be common are, in fact, not common across the entire organization. These facts need to be identified and labeled in the logical model according to the part of the organization using the differing methods. One or two central tables should hold the facts; variations in facts can be included in these tables along with common organizational facts, and variations in dimension may require additional dimensional tables. The Data Architect can use a star or snowflake structure to denormalize the data structures. Objectives such as trading ease of maintenance and minimal disk space storage against speed and usability determine whether a simple star or snowflake structure is preferable.

Tip: Determining Levels of Aggregation
In the EDW, there may be limited value in holding multiple levels of aggregation (a brief sketch follows this tip):
● When the combination of dimensions and hierarchies is understood, specific levels of aggregation can be modeled if they are required.
● If the data warehouse is feeding dependent data marts, it may be better to aggregate using the PowerCenter server to load the appropriate aggregate data to the data mart.
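As a minimal illustration of the aggregation idea referenced in the tip above, the sketch below rolls detailed sales facts up to a monthly, per-product aggregate of the kind that might be pushed to a dependent data mart. The record layout and grain are assumptions for the example; in a real project the aggregation would normally be performed by the PowerCenter server (for instance with an Aggregator transformation) rather than by hand-written code.

```python
from collections import defaultdict

# Detail-level facts (illustrative records; the grain and fields are assumed).
detail_facts = [
    {"product": "A", "sale_date": "2007-01-15", "revenue": 120.0, "units": 3},
    {"product": "A", "sale_date": "2007-01-28", "revenue": 80.0,  "units": 2},
    {"product": "B", "sale_date": "2007-02-03", "revenue": 200.0, "units": 5},
]

def aggregate_monthly(facts):
    """Roll detail facts up to (product, year-month) grain."""
    totals = defaultdict(lambda: {"revenue": 0.0, "units": 0})
    for f in facts:
        key = (f["product"], f["sale_date"][:7])   # YYYY-MM
        totals[key]["revenue"] += f["revenue"]
        totals[key]["units"] += f["units"]
    return totals

if __name__ == "__main__":
    for (product, month), measures in sorted(aggregate_monthly(detail_facts).items()):
        print(product, month, measures)
```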

Syndicated Data Sets
Syndicated data sets, such as weather records, should be held in the data warehouse. It should be assumed that the data set will be updated periodically and that the history will be kept for reference, unless the business determines it is not necessary. If the historical data is needed, the syndicated data sets will need to be date-stamped. These external dimensions will then be available as a subset of the data warehouse.

Code Lookup Tables
Use of a single code lookup table does not provide the same benefits in a data warehouse as it does on an OLTP system. The function of a single code lookup table is to provide central maintenance of codes and descriptions. This is not a benefit that can be achieved when populating a data warehouse, since data warehouses are potentially loaded from more than one source several times; separate codes would have to be loaded from their various sources and checked for existing records and updates. A single code lookup table also implies the use of a single surrogate key, and it simply increases the amount of work mapping developers need to carry out to qualify the parts of the table they are concerned with for a particular mapping. If problems occur in the load, they affect all code lookups, not just one, and having a single database structure is likely to complicate matters in the future. Individual lookup tables remove the single point of failure for code lookups and improve development time for mappings, although they also involve more work for the Data Architect. The Data Architect may prefer to show a single object for codes on the diagrams; he/she should, however, ensure that regardless of how the code tables are modeled, they will be physically separable when the physical database implementation takes place.

Surrogate Keys
The use of surrogate keys in most dimensional models presents an additional obstacle that must be overcome in the solution design. It is important to determine a strategy to create, distribute, and maintain these keys as you plan your design. Any of the following strategies may be appropriate (a brief sketch follows the list):
● Informatica Generated Keys. The sequence generator transformation allows the creation of surrogate keys natively in Informatica mappings. There are options for reusability, setting key-ranges, and continuous numbering between loads. The limitation to this strategy is that it cannot generate a number higher than 2^32; however, two billion is generally big enough for most dimensions.
● External Code Generated. Informatica can access a stored procedure or external .dll that contains a programmatic solution to generate surrogate keys. This is done using either the stored procedure transformation or the external procedure transformation.
● External Table Based. PowerCenter can access an external code table during loads, using the look-up transformation to obtain surrogate keys.
● Triggers/Database Sequence. Create a trigger on the target table to perform the insert into the key field, or call a database sequence, either from the source qualifier transformation or the stored procedure transformation.
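To make the first and last strategies more concrete, the sketch below shows (a) a trivial high-water-mark key generator of the kind a sequence generator provides natively within a mapping, and (b) the sort of DDL a DBA might create for the database sequence and trigger approach. The Oracle-style syntax and all object names are illustrative assumptions, not prescribed by the methodology.

```python
# (a) High-water-mark surrogate key assignment, conceptually similar to what a
#     sequence generator transformation does inside a mapping.
def assign_surrogate_keys(rows, start_value=1):
    next_key = start_value
    for row in rows:
        row["customer_sk"] = next_key        # illustrative key column name
        next_key += 1
    return rows, next_key                    # persist next_key to continue numbering

# (b) Illustrative Oracle-style DDL for the database sequence / trigger strategy.
SEQUENCE_TRIGGER_DDL = """
CREATE SEQUENCE dim_customer_seq START WITH 1 INCREMENT BY 1;

CREATE OR REPLACE TRIGGER dim_customer_bi
BEFORE INSERT ON dim_customer
FOR EACH ROW
BEGIN
    SELECT dim_customer_seq.NEXTVAL INTO :NEW.customer_sk FROM dual;
END;
"""

if __name__ == "__main__":
    rows, high_water_mark = assign_surrogate_keys([{"name": "Acme"}, {"name": "Globex"}])
    print(rows, "next key:", high_water_mark)
    print(SEQUENCE_TRIGGER_DDL)
```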

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:03

Phase 4: Design

Subtask 4.1.2 Develop Data Mart Model(s)

Description
The data mart's logical data model supports the final step in the integrated enterprise decision support architecture. If the data integration project was initiated for the right reasons, the aim of the data mart, as a subset of the data warehouse, is to solve a specific business issue for its business sponsors. The logical design must incorporate transformations supplying appropriate metrics and levels of aggregation for the business users, and the metrics and aggregations must incorporate the dimensions that the data mart business users can use to study their metrics. The structure of the dimensions must be sufficiently simple to enable those users to quickly produce their own reports. These models should be easily identified with their source in the data warehouse and will provide the foundation for the physical design. In most modeling tools, the logical model can be used to automatically resolve and generate some of the physical design, if desired, such as lookups used to resolve many-to-many relationships.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Architect (Primary)
Technical Project Manager (Review Only)

Considerations
The subject area of the data mart should be the first consideration, because it determines the facts that must be drawn from the Enterprise Data Warehouse into the business-oriented data mart.

The data mart will then have dimensions that the business wants to model the facts against.

Tip: Keep it Simple!
If, as is generally the case, the data mart is going to be used primarily as a presentation layer by business users extracting data for analytic purposes, the mart should use as simple a design as possible. The data mart may also drive an application; if so, the application has certain requirements that must also be considered. If any additional metrics are required, they should be placed in the data warehouse, but the need should not arise if sufficient analysis was completed in earlier development steps.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 4: Design

Task 4.2 Analyze Data Sources

Description
The goal of this task is to understand the various data sources that will be feeding the solution. It is important to understand all of the data elements from a business perspective, including the data values and dependencies on other data elements. It is also important to understand where the data comes from, how the data is related, and how much data there is to deal with (i.e., volume estimates). Completing this task successfully increases the understanding needed to efficiently map data using PowerCenter.

Prerequisites
None

Roles
Application Specialist (Primary)
Business Analyst (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Database Administrator (DBA) (Secondary)
System Administrator (Primary)
Technical Project Manager (Review Only)

Considerations
None

Best Practices
Using Data Explorer for Data Discovery and Analysis

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 4: Design

Subtask 4.2.1 Develop Source to Target Relationships

Description
The third step in analyzing data sources is to determine the relationship between the sources and targets and to identify any rework or target redesign that may be required if specific data elements are not available. This step defines the relationships between the data elements and clearly illuminates possible data issues, such as incompatible data types or unavailable data elements.

Prerequisites
None

Roles
Application Specialist (Secondary)
Business Analyst (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Technical Project Manager (Review Only)

Considerations
Creating the relationships between the sources and targets is a critical task in the design process. It is important to map all of the data elements from the source data to an appropriate counterpart in the target schema. Taking the necessary care in this effort should result in the following:
● Identification of any data elements in the target schema that are not currently available from the source.

● Identification of any data elements that can be removed from source records because they are not needed in the target.
● Determination of the quality of the data in the source.
● Determination of the data flow required for moving the data from the source to the target.

When the source data is not available, the Data Architect may need to re-evaluate and redesign the target schema or determine where the necessary data can be acquired. In many cases, unnecessary data is moved through the extraction process. Regardless of whether the data is coming from flat files or relational sources, it is best to eliminate as much unnecessary data as possible, as early in the process as possible.

All source data should be analyzed in a data quality application to assess its current data quality levels. During the Design Phase, data quality processes can be introduced to fix identified issues and/or enrich data using reference information. Data quality should also be incorporated as an on-going process to be leveraged by the target data source. This ensures that data in the target is of high quality and serves its purpose.

The first step determines what data is not currently available from the source. This step also eliminates any data elements that are not required in the target. Any data modifications or translations should be noted during this determination process as the source-to-target relationships are established.

The next step in this subtask produces a Target-Source Matrix, which provides a framework for matching the business requirements to the essential data elements and defining how the source and target elements are paired. This can serve as a preliminary design specification for work to be performed during the Build Phase. The matrix lists each of the target tables from the data mart in the rows and lists descriptions of the source systems in the columns (a sketch of such a matrix follows the list), providing the following data:
● Operational (transactional) system in the organization
● Operational data store
● External data provider
● Operating system
● DBMS
● Data fields
● Data descriptions
● Data profiling/analysis results
● Data quality operations, where applicable
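The sketch below shows one very simple way such a matrix could be captured programmatically and exported as a spreadsheet-friendly CSV. The target tables, source systems, and attributes shown are invented examples; the real matrix would come from the project's own source analysis.

```python
import csv

# Illustrative Target-Source Matrix: target tables as rows, source-system
# descriptions as columns. All names and values are example placeholders.
matrix = {
    "DIM_CUSTOMER": {"source_system": "CRM (operational)", "dbms": "Oracle",
                     "data_fields": "cust_id, name, region", "dq_operations": "standardize address"},
    "FACT_SALES":   {"source_system": "Order entry (operational)", "dbms": "DB2",
                     "data_fields": "order_id, cust_id, amount", "dq_operations": "none"},
}

columns = ["target_table", "source_system", "dbms", "data_fields", "dq_operations"]

with open("target_source_matrix.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=columns)
    writer.writeheader()
    for target, details in matrix.items():
        writer.writerow({"target_table": target, **details})

print("wrote target_source_matrix.csv with", len(matrix), "target tables")
```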

One objective of the data integration solution is to provide an integrated view of key business data. Therefore, for each target table one or more source systems must exist, and the matrix should show all of the possible sources for this particular initiative. Enlisting the services of the System Administrator or another knowledgeable source system resource may be helpful. After this matrix is completed, the data elements must be checked for correctness and validated with both the Business Analyst(s) and the user community. Prior to any mapping development work, the Project Manager should obtain sign-off from the Business Analysts and the user community. The Project Manager is responsible for ensuring that these parties agree that the data relationships defined in the Target-Source Matrix are correct and meet the needs of the data integration solution.

Undefined Data
In some cases, the Data Architect cannot locate or access the data required to establish a rule defined by the Business Analyst. When this occurs, the Business Analyst may need to revalidate the particular rule or requirement to ensure that it meets the end-users' needs. If it does not, the Business Analyst and Data Architect must determine if there is another way to use the available data elements to enforce the rule. If no solution is found, the Project Manager should communicate with the end-user community and propose an alternative business rule. The Project Manager should meet with the Business Analyst and the Data Architect to determine what rules or requirements can be changed and which must remain as originally defined. The Data Architect can propose data elements that can be safely dropped or changed without compromising the integrity of the user requirements. The Project Manager must then identify any risks inherent in eliminating or changing the data elements and decide which are acceptable to the project. Some of the potential risks involved in eliminating or changing data elements are:
● Losing a critical piece of data required for a business rule that was not originally defined but is likely to be needed in the future. Such data loss may require a substantial amount of rework and can potentially affect project timelines.
● Choosing to eliminate data too early in the process due to inaccessibility, which may cause problems further down the road. Any change in data that needs to be incorporated in the Source or Target data models requires substantial time to rework and could significantly delay development. Such a change would also push back all tasks defined and require a change in the Project Plan.
● Changes in the Source system model may drop secondary relationships that were not initially visible.

Source Changes after Initial Assessment
When a source changes after the initial assessment, the corresponding Target-Source Matrix must also change. The Data Architect needs to outline everything that has changed, including the data types, names, and definitions. Then, the various risks involved in changing or eliminating data elements must be re-evaluated, and the Data Architect should decide which risks are acceptable. Once again, the System Administrator may provide useful information about the reasons for any changes to the source system and their effect on data relationships.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:05

Phase 4: Design

Subtask 4.2.2 Determine Source Availability

Description
The final step in the 4.2 Analyze Data Sources task is to determine when all source systems are likely to be available for data extraction. This is necessary in order to determine realistic start and end times for the load window. The final deliverable in this subtask, the Source Availability Matrix, lists all the sources that are being used for data extraction and specifies the systems' downtimes during a 24-hour period. This matrix should contain details of the availability of the systems on different days of the week, including weekends and holidays. The developers need to work closely with the source system administrators during this step, because the administrators can provide specific information about the hours of operation for their systems.

Prerequisites
None

Roles
Application Specialist (Primary)
Data Integration Developer (Secondary)
Database Administrator (DBA) (Secondary)
System Administrator (Primary)
Technical Project Manager (Review Only)

Considerations
The information generated in this step will be crucial later in the development process for determining load windows and the availability of source data.
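As a small illustration of how a Source Availability Matrix can be used, the sketch below intersects the availability windows of several hypothetical sources to find the common extraction window in a 24-hour day. The systems and hours are invented for the example; real availability data would come from the source system administrators.

```python
# Availability windows per source system, in 24-hour clock hours (assumed values).
# Each tuple is (first available hour, last available hour).
availability = {
    "CRM":         (20, 24),   # available 20:00-24:00
    "Order entry": (21, 24),   # available 21:00-24:00
    "Billing":     (19, 23),   # available 19:00-23:00
}

def common_window(windows):
    """Return the (start, end) hours during which all sources are available."""
    start = max(w[0] for w in windows.values())
    end = min(w[1] for w in windows.values())
    return (start, end) if start < end else None

if __name__ == "__main__":
    window = common_window(availability)
    if window:
        print("Common extraction window: %02d:00-%02d:00" % window)
    else:
        print("No common window; consider an ODS or staging area.")
```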

This information is also helpful for determining whether an Operational Data Store (ODS) is needed. Sometimes, the extraction times can be so varied among the necessary source systems that an ODS or staging area is required purely for logistical reasons. Determining the source availability can go a long way toward determining when the load window for a regularly scheduled extraction can run. In many multi-national companies, source systems are distributed globally and therefore may not be available for extraction concurrently. This can pose problems when trying to extract data with minimal (or no) disruption of users' day-to-day activities.

Best Practices
None

Sample Deliverables
Source Availability Matrix
Target-Source Matrix

Last updated: 01-Feb-07 18:46

Phase 4: Design

Task 4.3 Design Physical Database

Description
The physical database design is derived from the logical models created in Task 4.1. Where the logical design details the relationships between logical entities in the system, the physical design considers the following physical aspects of the database:
● How the tables are arranged, stored (i.e., on which devices), partitioned, and indexed
● The detailed attributes of all database columns
● The likely growth over the life of the database
● How each schema will be created and archived
● Hardware availability and configuration (e.g., number of devices, availability of disk storage space, and physical location of storage)

The physical design must reflect the end-user reporting requirements, organizing the data entities to allow a fast response to the expected business queries. Physical target schemas typically range from fully normalized (essentially OLTP structures) to snowflake and star schemas, and may contain both detail and aggregate information. The relevant end-user reporting tools, and the underlying RDBMS, may dictate following a particular database structure (e.g., multi-dimensional tools may arrange the data into data "cubes").

Prerequisites
None

Roles

Business Analyst (Secondary)
Data Architect (Primary)
Database Administrator (DBA) (Primary)
System Administrator (Review Only)
Technical Project Manager (Review Only)

Considerations
Although many factors influence the physical design of the data marts, end-user reporting needs are the primary driver. These needs determine the likely selection criteria, filters, selection sets and measures that will be used for reporting. These elements may, in turn, suggest indexing or partitioning policies (i.e., to support the most frequent cross-references between data objects or tables and to identify the most common table joins) and appropriate access rights, as well as indicate which elements are likely to grow or change most quickly.

In all cases, the physical database design is tempered by system-imposed limits such as the available disk sizes and numbers, the functionality of the operating system or RDBMS, the human resources available for the design and creation of procedures, scripts and DBA duties, and the volume, frequency and speed of delivery of source data. Long-term strategies regarding growth of a data warehouse, enhancements to its usability and functionality, or additional data marts may all point toward specific design decisions to support future load and/or reporting requirements. These factors all help to determine the best-fit physical structure for the specific project.

A final consideration is how to implement the schema. Database design tools may generate and execute the necessary processes to create the physical tables, and the PowerCenter Metadata Exchange can interact with many common tools to pull target table definitions into the repository. However, automated scripts may still be necessary for dropping, truncating, and creating tables.

For Data Migration, the tables that are designed and created are normally either stage tables or reference tables. These tables are generated to simplify the migration process. The table definitions for the target application are almost always provided to the data migration team; these are typically delivered with a packaged application or already exist for the broader project implementation.

Best Practices
None

Sample Deliverables
Physical Data Model Review Agenda

Last updated: 01-Feb-07 18:46

Phase 4: Design

Subtask 4.3.1 Develop Physical Database Design

Description
As with all design tasks, there are both enterprise and workgroup considerations in developing the physical database design. Optimally, the final design should balance the following factors:
● Ease of end-user reporting from the target
● Ensuring the maximum throughput and potential for parallel processing
● Effective use of available system resources, disk space and devices
● Minimizing DBA and systems administration overhead
● Effective use of existing tools and procedures

Physical designs are required for target data marts, as well as for any ODS/DDS schemas or other staging tables.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Architect (Primary)
Database Administrator (DBA) (Primary)
System Administrator (Review Only)

Technical Project Manager (Review Only)

Considerations
This task involves a number of major activities:
● Configuring the RDBMS, which involves determining what database systems are available and identifying their strengths and weaknesses
● Resolving hardware issues such as the size, location, and number of storage devices, networking links and required interfaces
● Determining distribution and accessibility requirements, such as 24x7 access and local or global access
● Determining if existing tools are sufficient or, if not, selecting new ones
● Determining back-up, recovery, and maintenance requirements (i.e., will the physical database design exceed the capabilities of the existing systems or make upgrades difficult?)

The logical target data models provide the basic structure of the physical design. The design must also reflect the end-user reporting requirements, organizing the data entities to provide answers to the expected business queries. The physical design provides a structure that enables the source data to be quickly extracted and loaded in the transformation process, is optimized for end-user reporting, and allows a fast response to the end-user queries. Physical target schemas typically range from:
● Fully normalized (essentially OLTP structures)
● Denormalized relational structures (as above, but with certain entities split or merged to simplify loading into them or extracting from them to feed other databases)
● Classic snowflake and star schemas, ordered as fact and dimension tables in standard RDBMS systems
● Aggregate versions of the above, allowing very fast (but potentially less flexible and detailed) queries
● Proprietary multi-dimensional structures, optimized for end-user reporting

Tip: Preferred Strategy
The tiers of a multi-tier strategy each have a specific purpose, which strongly suggests the likely physical structure. A typical multi-tier strategy uses a mixture of physical structures:
● Operational Data Store (ODS) design. Staging from the source should be designed to quickly move data from the operational system, so the ODS is usually closely related to the individual sources and is, therefore, relationally organized (like the source OLTP), or simply holds relational copies of source flat files. The ODS structure should be very similar to the source, since no transformations are performed, and it should be optimized for fast loading (to keep the connection to the source system as short as possible) with few or no indexes or constraints (which slow down loading). At the same time, the data warehouse and ODS structures should be as physically close as possible so as to avoid network traffic.
● Data Warehouse design. The data warehouse design should be biased toward feeding subsequent data marts, since the data warehouse functions as the enterprise-wide central point of reference. Tied to subject areas, it may be based on a star schema (where significant end-user reporting may occur) or a more normalized relational structure (where the data warehouse acts purely as a feeder to several dependent data marts) to speed up extracts to the subsequent data marts. Because data volumes are high, physical partitioning of larger tables allows them to be quickly loaded via parallel processes, and the warehouse should be indexed to allow rapid feeds to the marts.
● Data Mart design. Data marts should be strongly biased toward reporting. As the usual source for complex business queries, a data mart typically uses a star or snowflake schema, optimized for set-based reporting and cross-referenced against many, varied combinations of dimensional attributes, most likely as star schemas or multi-dimensional cubes. The volumes will be smaller than the parent data warehouse, so the impact of indexes on loading is not as significant. A data mart may also use multi-dimensional structures if a specific set of end-user reporting requirements can be identified.
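To illustrate the data mart tier, the sketch below emits hypothetical DDL for a very small star schema: one fact table keyed by surrogate keys and two dimension tables. The table and column names are examples only and are not prescribed by the methodology; a real design would come out of the logical data mart model.

```python
# Hypothetical star-schema DDL for a small sales data mart (illustrative only).
STAR_SCHEMA_DDL = [
    """CREATE TABLE dim_customer (
        customer_sk   INTEGER PRIMARY KEY,
        customer_name VARCHAR(100),
        region        VARCHAR(50)
    )""",
    """CREATE TABLE dim_date (
        date_sk       INTEGER PRIMARY KEY,
        calendar_date DATE,
        fiscal_month  VARCHAR(7)
    )""",
    """CREATE TABLE fact_sales (
        customer_sk   INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
        date_sk       INTEGER NOT NULL REFERENCES dim_date (date_sk),
        revenue       DECIMAL(12, 2),
        units_sold    INTEGER
    )""",
]

if __name__ == "__main__":
    for statement in STAR_SCHEMA_DDL:
        print(statement + ";\n")
```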

RDBMS Configuration
The physical database design is tempered by the functionality of the operating system and RDBMS. In an ideal world, all RDBMS systems might provide the same set of functions. This is not the case, however: different vendors include different features in their systems, and new features are included with each new release. Ideally, at least for the data mart and data warehouse designs, this may affect:
● Physical partitioning. If target tables are physically partitioned, the separate partitions can be stored on separate physical devices, allowing a further order of parallel loading. This is not available with all systems. When it is available, partitioning allows faster parallel loading to a single table, as well as greater flexibility in table reorganizations, backup, and recovery. The downside is that extra initial and ongoing DBA and systems administration overhead is required to fully manage the partitions. A lack of physical partitioning may affect performance when loading data into growing tables, and may require amending an initial physical design to split up larger tables.
● Limits to individual tables. Older systems may not allow tables to physically grow past a certain size.
● Physical device management. Using multiple, separate devices may result in added administrative overhead and/or work for the DBA (i.e., to define additional pointers and create more complex backup instructions).

Tip
Using multiple physical devices to store whole tables allows faster parallel updates to them, and using many physical devices to store individual targets or partitions can speed loading, because several tables on a single device must use the same read-write heads when being updated in parallel.
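Where the RDBMS supports it, physical partitioning is usually declared in the table DDL. The sketch below prints an illustrative Oracle-style range-partitioned fact table with each partition assigned to its own tablespace (and, potentially, its own device); the names, dates, and tablespaces are assumptions for the example, and the syntax varies by vendor.

```python
# Illustrative Oracle-style range partitioning; adjust syntax for your RDBMS.
PARTITIONED_FACT_DDL = """
CREATE TABLE fact_sales (
    date_sk      INTEGER      NOT NULL,
    customer_sk  INTEGER      NOT NULL,
    revenue      NUMBER(12,2)
)
PARTITION BY RANGE (date_sk) (
    PARTITION p2006 VALUES LESS THAN (20070101) TABLESPACE ts_fact_2006,
    PARTITION p2007 VALUES LESS THAN (20080101) TABLESPACE ts_fact_2007,
    PARTITION pmax  VALUES LESS THAN (MAXVALUE) TABLESPACE ts_fact_curr
);
"""

if __name__ == "__main__":
    print(PARTITIONED_FACT_DDL)
```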

Tools
The relevant end-user reporting tools may dictate following a particular database structure. Although many popular business intelligence tools (e.g., Business Objects, MicroStrategy and others) can access a wide range of relational and denormalized structures, each generally works best with a particular type of structure (e.g., long/thin vs. short/fat star schema designs), level of configuration, and scalability.
● Multi-dimensional (MOLAP) tools often require specific (i.e., proprietary) structures to be used. These tools arrange the data logically into data "cubes", but physically use complex, proprietary systems for storage, indexing, loading, and organization.
● Database design tools (ERwin, Designer 2000, PowerDesigner) may generate and execute the necessary processes to create the physical tables.

Hardware Issues
Physical designs should be able to be implemented on the existing system (which can help to identify weaknesses in the physical infrastructure). The areas to consider are:
● The size of storage available
● The number of physical devices
● The physical location of the devices (e.g., on the same box, on a closely connected box, via a fast network, or via telephone lines)

Distribution and Accessibility
For a large system, the likely demands on the data mart should affect the physical design. Factors to consider include:
● Will end-users require continuous (i.e., 24x7) access, or will a batch window be available to load new data? Each involves some issues: continuous access may require complex partitioning schemes and/or holding multiple copies of the data, while a batch window would allow indexes/constraints to be dropped before loading, resulting in significantly decreased load times.
● Will different users require access to the same data, but in different forms (e.g., different levels of aggregation, or different sub-sets of the data)?
● Will all end-users access the same physical data, or local copies of it (which need to be distributed in some way)? This issue affects the potential size of any data mart, and the peaks in demand.

Tip
If the end-users require 24x7 access and incoming volumes of source data are very large, it is possible with later releases of the major RDBMS tools to load table-space and index partitions entirely separately, only swapping them into the reporting target at the end. This is not true for all databases, however.

Back Up, Recovery and Maintenance
Finally, the physical structures must be designed with an eye on any existing limits to the general data management processes. Because the physical designs lead to real volumes of data, it is important to determine:
● Will the designs fit into existing back-up processes? Will they execute within the available timeframes and limits?
● Will recovery processes allow end-users to quickly re-gain access to their reporting system? Unanticipated downtime is likely to affect an organization's ability to plan, forecast, or even operate effectively.
● Will the structures be easy to maintain (i.e., to change, reorganize, rebuild, or upgrade)?

Tip
Indexing frequently-used selection fields/columns can substantially speed up the response for end-user reporting, because the database engine optimizes its search pattern rather than simply scanning all rows of the table when appropriately indexed fields are used in a request. The more indexes that exist on the target, however, the slower the speed of data loading into the target, since maintaining the indexes becomes an additional load on the database engine. Where an appropriate batch window is available for performing the data load, the indexes can be dropped before loading and then re-generated after the load; the batch window, if available, needs to be incorporated into the actual load mechanisms. If no window is available, the strategy should be one of balancing the load and reporting needs by careful selection of which fields to index.
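The sketch below illustrates the drop-load-rebuild pattern described in the tip above, generating the statements a load script might wrap around a bulk load. The index and table names are placeholders; in practice this logic usually lives in pre- and post-load scripts or pre/post-session commands rather than in standalone code.

```python
# Placeholder index definitions for a target fact table (illustrative only).
INDEXES = {
    "ix_fact_sales_cust": "CREATE INDEX ix_fact_sales_cust ON fact_sales (customer_sk)",
    "ix_fact_sales_date": "CREATE INDEX ix_fact_sales_date ON fact_sales (date_sk)",
}

def pre_load_statements():
    """Drop indexes before the batch load to speed up inserts."""
    return ["DROP INDEX %s" % name for name in INDEXES]

def post_load_statements():
    """Re-create the indexes once the load has completed."""
    return list(INDEXES.values())

if __name__ == "__main__":
    print("-- before load")
    print(";\n".join(pre_load_statements()) + ";")
    print("-- bulk load runs here")
    print("-- after load")
    print(";\n".join(post_load_statements()) + ";")
```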

For Data Migration projects, it is rare that any tables will be designed for the source or target application. If tables are needed, they will most likely be staging tables or be used to assist in transformation. It is common for the staging tables to mirror either the source system or the target system; it is encouraged to create two levels of staging, where a Legacy Stage mirrors the source system and a Pre-Load Stage mirrors the target system. Developers often take advantage of PowerCenter's table generation functionality in Designer for this purpose, to quickly generate the needed tables and subsequently, after the fact, to reverse-engineer the table definitions with a modeling tool.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:08

Phase 4: Design

Task 4.4 Design Presentation Layer

Description
The objective of this task is to design a presentation layer for the end-user community. The purpose of any presentation layer is to design an application that can transform operational data into relevant business information. An analytic solution helps end users to formulate and support business decisions by providing this information in the form of context and focus. This task includes activities to develop a prototype, demonstrate it to users and get their feedback, and document the overall presentation layer. The developers will use the design that results from this task and its associated subtasks in the Build Phase to build the presentation layer (5.6 Build Presentation Layer). This step may actually take place earlier in this phase, or occur in parallel with the data integration tasks.

Note: Readers are reminded that this guide is intentionally analysis-neutral. This section describes some general considerations and deliverables for determining how to deliver information to the end user.

The presentation layer application should be capable of handling a variety of analytical approaches, including the following:
● Ad hoc reporting. Used in situations where users need extensive direct, interactive exploration of the data. The tool should enable users to formulate their own queries by directly manipulating relational tables and complex joins. Such tools must support:
  - Query formulation that includes multipass SQL, semi-additive summations, picklists, and user-changeable variables.
  - Analysis and presentation capabilities such as complex formatting, pivoting, charting and graphs, highlighting (alerts), summarization, and direct SQL entry.
  - Thin client web access with ease of use, metadata access, and seamless integration with other applications.
  - Strong technical features.
This approach is suitable when users want to answer questions such as, "What were Product X revenues in the past quarter?"

● Online Analytical Processing (OLAP). Arguably the most common approach, and the one most often associated with analytic solution architectures. OLAP technologies provide multidimensional access to business information, allowing users to drill down, drill through, and drill across data. OLAP access is more discovery-oriented than ad hoc reporting. There are several types of OLAP (e.g., MOLAP, ROLAP, HOLAP, and DOLAP are all variants), each with their own characteristics. The tool selection process should highlight these distinguishing characteristics in the event that OLAP is deemed the appropriate approach for the organization.
● Dashboard reporting (Push-Button). Dashboard reporting emphasizes the summarization and presentation of information to the end user in a user-friendly and extremely graphical interface. Graphical presentation of the information attempts to highlight business trends or exceptional conditions. Dashboard reporting from the data warehouse effectively replaced the concept of EIS (executive information systems), largely because EIS could not contain sufficient data for true analysis. Nevertheless, the need for an executive-style front end still exists, and dashboard reporting (sometimes referred to as Push-Button access) largely fills that need.
● Data Mining. An artificial intelligence-based technology that integrates large databases and proposes possible patterns or trends in the data. The key distinction is data mining's ability to deliver trend analysis without specific requests by the end users. A commonly cited example is the telecommunications company that uses data mining to highlight potential fraud by comparing activity to the customer's previous calling patterns.

Prerequisites
None

Roles
Business Analyst (Primary)
Presentation Layer Developer (Primary)

Considerations
The presentation layer tool must:

" Meeting the requirements of all end users may require mixing different approaches to end-user analysis. the end-user analysis solution should include several tools. For example.2 Define Business Requirements. The needs of the various users should be determined by the user requirements defined in 2.● ● Comply with established standards across the organization Be compatible with the current and future technology infrastructures The analysis tool does not necessarily have to be "one size fits all.Data Warehousing 259 of 1017 . if most users are likely to be satisfied with an OLAP tool while a group focusing on fraud detection requires data mining capabilities. each satisfying the needs of the various user groups. Best Practices None Sample Deliverables Information Requirements Specification Last updated: 01-Feb-07 18:46 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 4: Design

Subtask 4.4.1 Design Presentation Layer Prototype

Description
The purpose of this subtask is to develop a prototype of the end-user presentation layer "application" for review by the business community (or its representatives). In this way, end users can see an initial interpretation of their needs and validate or expand upon certain requirements. The result of this subtask is a working prototype for end-user review and investigation.

Prerequisites
None

Roles
Business Analyst (Primary)
Presentation Layer Developer (Primary)

Considerations
It is important to use actual source data in the prototype; the closer the prototype is to what the end user will actually see upon final release, the more relevant the feedback. PowerCenter can deliver a rough cut of the data to the target schema, and Data Analyzer (or other business intelligence tools) can build reports relatively quickly, thereby allowing the end-user capability to evolve through multiple iterations of the design.

Also consider the benefits of baselining the user requirements through a sign-off process, complemented by a formal change control request process. This makes it easier for the development team to focus on deliverables. Baselining user requirements also allows accurate tracking of progress against the project plan and provides transparency to changes in the user requirements. This approach helps to ensure that the project plan remains close to schedule.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 4: Design

Subtask 4.4.2 Present Prototype to Business Analysts

Description
The purpose of this subtask is to present the presentation layer prototype to the business analysts and the end users. The demonstration of the prototype is also an opportunity to further refine the business requirements discovered in the requirements gathering subtask. The end users themselves can offer feedback and ensure that the method of data presentation, and the actual data itself, are correct.

This subtask also serves to educate the users about the capabilities of their new analysis tool. Technologies such as OLAP, EIS and Data Mining often bring a new data analysis capability and approach to end users. In an ad hoc reporting paradigm, end users must precisely specify their queries; multidimensional analysis, which follows a different paradigm, allows for much more discovery and research. A thorough understanding of what the tool can provide enables the end users to refine their requirements to maximize the benefit of the new tool. A prototype that uses familiar data to demonstrate these abilities helps to launch the education process while also improving the design. The prototype demonstration should focus on the capabilities of the end-user analysis tool and highlight the differences between typical reporting environments and decision support architectures.

The result of this subtask will be a deliverable, the Prototype Feedback document, containing detailed results from the prototype presentation meeting or meetings. The Prototype Feedback document should contain such administrative information as the date and time of the meeting, a list of participants, and a summary of what was presented. The bulk of the document should contain the participants' approval or rejection of various aspects of the prototype. The feedback should cover such issues as pre-defined reports, dimensional hierarchies, review of formulas for any derived attributes, presentation of data, and so forth.

Prerequisites
4.4.1 Design Presentation Layer Prototype

Roles

Business Analyst (Primary)
Presentation Layer Developer (Primary)

Considerations
The Data Integration Developer needs to be an active participant in this subtask to ensure that the presentation layer is developed with a full understanding of the needs of the end users, and vice versa. Having all parties participate in this activity facilitates the process of working through any data issues that may be identified. Using actual source data in the development of the prototype gives the Data Integration Developer a knowledge base of what data is or is not available in the source systems and in what format that data is stored.

As with the tool selection process, it is important here to assemble a group that represents the spectrum of end users across the organization, from business analysts to high-level managers. A cross section of end users at various levels ensures an accurate representation of needs across the organization. Different job functions require different information and may also require various data access methods (i.e., ad hoc, OLAP, EIS, Data Mining). For example, information that is important to business analysts, such as metadata, may not be important to a high-level manager. User involvement also helps build support for the presentation layer tool throughout the organization; involving the end users is vital to getting "buy-in" and ensuring that the system will meet their requirements.

The demonstration of the presentation layer tool prototype should not be a one-time activity; instead, it should be conducted at several points throughout design and development to facilitate and elicit end-user feedback.

Best Practices
None

Sample Deliverables
Prototype Feedback


Phase 4: Design

Subtask 4.4.3 Develop Presentation Layout Design

Description
The goal of any data integration, warehousing or business intelligence project is to collect and transform data into meaningful information for use by the decision makers of a business. A well-designed interface effectively communicates this information to the end user; if an interface is not designed intuitively, the end users may not be able to successfully leverage the information to their benefit. The next step after prototyping the presentation layer and gaining approval from the Business Analysts is to improve and finalize its design for use by the end users. The principles are the same regardless of the type of application (e.g., a customer relationship management reporting or metadata reporting solution).

Prerequisites
4.4.1 Design Presentation Layer Prototype

Roles
Business Analyst (Secondary)
Presentation Layer Developer (Primary)

Considerations

Types of Layouts
Each piece of information presented to the end user has its own level of importance. The significance and required level of detail in the information to be delivered determines whether to present the information on a dashboard or in a report. For example, information that needs to be concise and answers the question "Has this measurement fallen below the critical threshold number?" qualifies to be an Indicator on a dashboard. The more critical information in this category, needing to reach the end user without having to wait for the user to log onto the system, needs to be implemented as an Alert.

Data can be provided via Alerts, Indicators, or links to Favorite Reports and Shared Documents. Generally, most information delivery requirements constitute detailed reports; Detailed Reports can be put as links on the dashboards so that users can easily navigate to more detailed information.

Dashboards
Data Analyzer dashboards contain all the critical information users need in one single interface. Data Analyzer facilitates the design of an appealing presentation layout for the information by providing predefined dashboard layouts. A clear understanding of what needs to be displayed, as well as how many different types of indicators and alerts are going to be put on the dashboard, is important in the selection of an appropriate dashboard layout. Each subset of data should be placed in a separate container.

Report Table Layout
Each report that you are going to build should have suitable design features for the data to be displayed, so as to ensure that the report communicates its message effectively. Be sure to understand the type of data that each report is going to display before choosing a report table layout. For example, a tabular layout would be appropriate for a sales revenue report that shows the dollar amounts against only one dimension (e.g., product category), but a sectional layout would be more appropriate if the end users are interested in seeing the dollar amounts for each category of the product in each district.

When developing either a dashboard or a report, be sure to consider the following points:
● Who is your audience? You have to know who is the intended recipient of the information that you are going to provide. Often there will be multiple audiences for the information you have to share, and the audience's requirements and preferences should drive your presentation style. On many occasions, you will find that the same information will best serve its purpose if presented in two different styles to two different users. For example, you may have to create multiple dashboards in a single project and personalize each dashboard to a specific group of end users' needs.
● What type of information do the users need and what are their expectations? Always remember that the users are looking for very specific pieces of information in the presentation layout. Generally, the business users are not highly technically skilled personnel; they do not always have the time or required skills to navigate to various places and search for the specific metric or value that matters to them.

If they want granular information, such as monthly or daily sales, then they are likely to want a detailed report; if they just need quick glimpses of the data, indicators on a dashboard or emailed alerts are likely to be more appropriate. Additionally, the users' expectations will affect the way your information is presented to them. The more thoroughly you understand the user expectations, the better you can design the presentation layout.
● Why do they need it? Understanding this can help you to choose the right layout for each piece of information that you have to present. Try to place yourself in the users' shoes and ask yourself questions such as what would be the most helpful way to display the information, or what could be the possible uses of the information that you are providing. This can also help to determine what type of indicators to develop.
● When does the data need to be displayed? It is critical to know when important business processes occur. This can help drive the development and scheduling of reports: daily, weekly, monthly, etc.
● How should the data be displayed? A well-designed chart, graph or indicator can convey critical information to the concerned users quickly and accurately. It becomes important to choose the right colors and backgrounds to catch the user's attention where it is needed the most. A good example of this would be using a bright red color for all your alerts, green for all the 'good' values, and so on.

Tip
It is also important to determine whether there are any enterprise standards set for the layout designs of the reports and dashboards, especially the color codes as given in the example above.
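As a small illustration of the threshold-and-color idea above, the sketch below maps a metric value to an indicator color using assumed thresholds. The metric, thresholds, and color names are placeholders; in Data Analyzer this logic would normally be configured in the indicator or alert definition rather than written as code.

```python
# Assumed thresholds for an illustrative "monthly revenue" indicator.
CRITICAL_THRESHOLD = 100000   # below this: red alert
WARNING_THRESHOLD = 150000    # below this: amber; otherwise green

def indicator_color(metric_value):
    """Return the dashboard color for a metric value based on the thresholds."""
    if metric_value < CRITICAL_THRESHOLD:
        return "red"      # critical: surface as an alert
    if metric_value < WARNING_THRESHOLD:
        return "amber"    # worth watching
    return "green"        # 'good' value

if __name__ == "__main__":
    for value in (90000, 120000, 200000):
        print(value, "->", indicator_color(value))
```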

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:10

4.Data Warehousing 269 of 1017 .3.4.4 Design and Develop Data Integration Processes r 5.3 Design and Build Data Quality Process r 5.2 Perform Integrated ETL Testing r INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Phase 5: Build 5 Build ● 5.2 Review Physical Model 5.8 Conduct Peer Reviews r r r r r r r ● 5.3.7 Perform Unit Test 5.5.5 Develop Inventory of Data Quality Processes 5.2 Develop Error Handling Strategy 5.1 Build Load Process 5.2 Determine Dictionary and Reference Data Requirements 5.1 Design Data Quality Technical Rules 5.3 Design and Execute Data Enhancement Processes 5.4.3.1 Design High Level Load Process 5.3 Plan Restartability Process 5.3 Define Defect Tracking Process r r ● ● 5.5 Design Individual Mappings & Reusable Objects 5.3.3.5 Populate and Validate Database r 5.4 Design Run-time and Real-time Processes for Operate Phase Execution 5.4.1.1 Launch Build Phase r 5.1 Review Project Scope and Plan 5.1.4.4.1.6 Review and Package Data Transformation Specification Processes and Documents r r r r r ● 5.4.3.2 Implement Physical Database 5.5.4 Develop Inventory of Mappings & Reusable Objects 5.6 Build Mappings & Reusable Objects 5.4.

    5.6.1 Develop Presentation Layer
    5.6.2 Demonstrate Presentation Layer to Business Analysts

Phase 5: Build

Description
The Build Phase uses the design work completed in the Architect Phase and the Design Phase as inputs to physically create the data integration solution, including data quality and data transformation development efforts. At this point, the project scope, plan, and business requirements defined in the Manage Phase should be re-evaluated to ensure that the project can deliver the appropriate value at an appropriate time.

Prerequisites
None

Roles
Business Analyst (Primary)
Business Project Manager (Secondary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Primary)
Presentation Layer Developer (Primary)

Project Sponsor (Approve)
Quality Assurance Manager (Primary)
Repository Administrator (Secondary)
System Administrator (Secondary)
Technical Project Manager (Primary)
Test Manager (Primary)

Considerations
PowerCenter serves as a complete data integration platform to move data from source to target databases, cleanse the data, perform data transformations, and automate the extract, transform, and load (ETL) processes. As a project progresses from the Design Phase to the Build Phase, it is helpful to review the activities involved in each of these processes.

● Extract - PowerCenter extracts data from a broad array of heterogeneous sources. Data can be accessed from sources including IBM mainframe and AS400 systems, relational databases, flat files, ERP systems from SAP, Peoplesoft, Oracle, and Siebel, MQ Series and TIBCO, HIPAA sources, web log sources, and direct parsing of XML data files through DTDs or XML schemas. PowerCenter interfaces mask the complexities of the underlying DBMS for the developer, enabling the build process to focus on implementing the business logic of the solution.

● Transform - The majority of the work in the Build Phase focuses on developing and testing data transformations. These transformations apply the business rules and enforce data consistency from disparate sources as data is moved from source to target.

● Load - PowerCenter automates much of the load process. To increase performance and throughput, loads can be multi-threaded, pipelined, streamed (concurrent execution of the extract, transform, and load steps), or serviced by more than one server. Data can be delivered to EAI queues for enterprise applications. In addition, DB2, Sybase IQ, and Teradata external loaders can be used to increase performance. Data loads can also take advantage of Open

Database Connectivity (ODBC) or use native database drivers to optimize performance. Pushdown optimization can even allow some or all of the transformation work to occur in the target database itself.
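To make the pushdown idea concrete, the following minimal Python sketch uses sqlite3 and hypothetical table and column names; it is a conceptual illustration only, not PowerCenter's actual pushdown mechanism. The same rule is applied once by pulling rows into the engine and once by handing the equivalent SQL to the database.

    import sqlite3

    # Hypothetical staging table; all names and values are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_orders (order_id INTEGER, qty INTEGER, unit_price REAL, total_cost REAL)")
    conn.executemany("INSERT INTO stg_orders (order_id, qty, unit_price) VALUES (?, ?, ?)",
                     [(1, 3, 9.99), (2, 5, 4.50)])

    # Engine-side transformation: rows are pulled out of the database, computed on, and written back.
    rows = conn.execute("SELECT order_id, qty, unit_price FROM stg_orders").fetchall()
    for order_id, qty, unit_price in rows:
        conn.execute("UPDATE stg_orders SET total_cost = ? WHERE order_id = ?",
                     (qty * unit_price, order_id))

    # 'Pushed down' equivalent: the same rule expressed as SQL, so the database does the work
    # and no rows travel to the transformation engine.
    conn.execute("UPDATE stg_orders SET total_cost = qty * unit_price")
    conn.commit()

The second form avoids moving rows out of the database, which is the benefit that pushdown optimization aims for.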

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 5: Build

Task 5.1 Launch Build Phase

Description
In order to begin the Build phase, all analysis performed in previous phases of the project needs to be compiled, reviewed, and disseminated to the members of the Build team, ensuring that the appropriate staff is provided with relevant information. The team should be provided with:

● Project background
● Business objectives for the overall solution effort
● Project schedule, complete with key milestones, important deliverables, dependencies, and critical risk factors
● Overview of the technical design, including external dependencies
● Mechanism for tracking scope changes, problem resolution, and other business issues

A series of meetings may be required to transfer the knowledge from the Design team to the Build team. Some or all of the following types of meetings may be required to get development under way:

● Kick-off meeting to introduce all parties and staff involved in the Build phase
● Functional design review to discuss the purpose of the project and the benefits expected, and to review the project plan
● Technical design review to discuss the source to target mappings, architecture design, and any other technical documentation

Information provided in these meetings should enable members of the data integration team to immediately begin development. As a result of these meetings, the integration team should have a clear understanding of the environment in which they are to work, including databases, operating systems, file systems within the repository and file structures within the organization relating to the project, database/SQL tools available in the environment, and all necessary user logons and passwords. Attention should be given to project schedule, scope, and risk factors.

The team should be provided with points of contact for all facets of the environment (e.g., DBA, UNIX/NT Administrator, PowerCenter Administrator, etc.). The team should also be aware of the appropriate problem escalation plan. When team members encounter design problems or technical problems, there must be an appropriate path for problem escalation. The Project Manager should establish a specific mechanism for problem escalation, along with a problem tracking report.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Architect (Primary)
Data Integration Developer (Secondary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Secondary)
Presentation Layer Developer (Primary)
Quality Assurance Manager (Primary)
Repository Administrator (Review Only)
System Administrator (Review Only)
Technical Project Manager (Primary)
Test Manager (Primary)

Considerations
It is important to include all relevant parties in the launch activities. If all points of discussion cannot be resolved during the kick-off meeting, the key personnel in each area should be present to reschedule quickly, so as not to affect the overall schedule.

Because of the nature of the development process, there are often bottlenecks in the development flow. The Project Manager should be aware of the risk factors and should be able to anticipate where bottlenecks are likely to occur. The Project Manager also needs to be aware of external factors that create project dependencies, which emanate from outside the project, and should avoid holding meetings prematurely when external dependencies have not been resolved. Having meetings prior to resolving these issues can result in significant down time for the developers while they wait to have their sources in place and finalized.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 5: Build

Subtask 5.1.1 Review Project Scope and Plan

Description
The Build team needs to understand the project's objectives, scope, and plan in order to prepare themselves for the Build Phase. The team should be provided with:

● Detailed descriptions of deliverables and timetables
● Dependencies that affect deliverables
● Critical success factors
● Risk assessments made by the design team

With this information, the Build team should be able to enhance the project plan to navigate through the risk areas, dependencies, and tasks to reach its goal of developing an effective solution. There is often a tendency to waste time developing non-critical features or functions. The team should review the project plan and identify the critical success factors and key deliverables to avoid focusing on relatively unimportant tasks. This helps to ensure that the project stays on its original track and avoids much unnecessary effort.

Prerequisites
None

Roles
Business Analyst (Review Only)
Data Architect (Review Only)
Data Integration Developer (Review Only)
Data Warehouse Administrator (Review Only)

Database Administrator (DBA) (Review Only)
Presentation Layer Developer (Review Only)
Quality Assurance Manager (Review Only)
Technical Project Manager (Primary)

Considerations
With the Design Phase complete, this is the first opportunity for the team to review what it has learned during the Architect Phase and the Design Phase about the sources of data for the solution. For example, the team may have learned that the source of data for marketing campaign programs is a spreadsheet that is not easily accessible by the network on which the data integration platform resides. In this case, the team may need to plan additional tasks and time to build a method for accessing the data. It is also a good time to review and update the project plan, which was created before these findings, to incorporate the knowledge gained during the earlier phases. This is also an appropriate time to review data profiling and analysis results to ensure all data quality requirements have been taken into consideration.

During the project scope and plan review, significant effort should be made to identify upcoming Build Phase risks and assess their potential impact on project schedule and/or cost. Because the design is complete, risk management at this point tends to be more tactical than strategic; however, the team leadership must be fully aware of key risk factors that remain. Team members are responsible for identifying the risk factors in their respective areas and notifying project management during the review process.

Best Practices
None

Sample Deliverables
Project Review Meeting Agenda

Last updated: 01-Feb-07 18:46

Phase 5: Build

Subtask 5.1.2 Review Physical Model

Description
The data integration team needs the physical model of the target database in order to begin analyzing the source to target mappings and to develop the end-user interface known as the presentation layer. The Data Integration Developer(s) needs to understand the entire physical model of both the source and target systems, as well as all the dimensions, aggregations, and transformations that will be needed to migrate the data from the source to the target. The Data Architect can provide database specifics such as: what are the indexed columns, what partitions are available and how they are defined, and what type of data is stored in each table. The Data Warehouse Administrator can provide metadata information and other source data information.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Architect (Primary)
Data Integration Developer (Secondary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Secondary)
Presentation Layer Developer (Primary)

Quality Assurance Manager (Review Only)
Repository Administrator (Review Only)
Technical Project Manager (Review Only)

Considerations
Depending on how much up-front analysis was performed prior to the Build phase, the project team may find that the model for the target database does not correspond well with the source tables or files. This can lead to extremely complex and/or poorly performing mappings. For this reason, it is advisable to allow some flexibility in the design of the physical model to permit modifications to accommodate the sources. In addition, some end-user products may not support some datatypes specific to a database. For example, Teradata's BYTEINT datatype is not supported by some end-user reporting tools.

As a result of the various kick-off and review meetings, the data integration team should have sufficient understanding of the database schemas to begin work on the Build-related tasks.

Best Practices
None

Sample Deliverables
Physical Data Model Review Agenda

Last updated: 01-Feb-07 18:46

Phase 5: Build

Subtask 5.1.3 Define Defect Tracking Process

Description
Since testing is designed to uncover defects, it is crucial to properly record the defects as they are identified, along with their resolution process. This requires a 'defect tracking system' that may be entirely manual, based on shared documents such as spreadsheets, or automated using, say, a database with a web browser front-end. Whatever tool is chosen, sufficient details of the problem must be recorded to allow proper investigation of the root cause and then the tracking of the resolution process. The success of a defect tracking system depends on:

● Formal test plans and schedules being in place, to ensure that defects are discovered and that their resolutions can be retested.
● Sufficient details being recorded to ensure that any problems reported are repeatable and can be properly investigated.

Prerequisites
None

Roles
Data Integration Developer (Review Only)
Data Warehouse Administrator (Review Only)
Database Administrator (DBA) (Review Only)
Presentation Layer Developer (Review Only)
Quality Assurance Manager (Primary)

Repository Administrator (Review Only)
System Administrator (Review Only)
Technical Project Manager (Primary)
Test Manager (Primary)

Considerations
The defect tracking process should encompass these steps:

● Testers prepare Problem Reports to describe the defects identified.
● The Test Manager reviews these reports and assigns priorities on an Urgent/High/Medium/Low basis ('Urgent' should only be used for problems that will prevent or severely delay further testing).
● Urgent problems are immediately passed to the Project Manager for review/action.
● Non-urgent problems are reviewed by the Test Manager and Project Manager on a regular basis (this can be daily at a critical development time, but is usually less frequent) to agree priorities for all outstanding problems.
● The Project Manager assigns problems for investigation according to the agreed-upon priorities.
● The 'investigator' attempts to determine the root cause of the defect and to define the changes needed to rectify the defect.
● The Project Manager reviews the results of investigations and assigns rectification work to 'fixers' according to priorities and effective use of resources. The Project Manager may decide to group a number of fixes together to make effective use of resources.
● The 'fixer' makes the required changes and conducts unit testing. Regression testing is also typically conducted, if appropriate.
● The Project Manager and Test Manager review the test results at their next meeting and agree on closure.
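As an illustration only (not a prescribed tool), the following Python sketch shows the kind of record such a defect tracking system might hold; the fields and priority levels mirror the steps above, and all names are hypothetical.

    from dataclasses import dataclass
    from datetime import date
    from enum import Enum

    class Priority(Enum):
        URGENT = 1   # prevents or severely delays further testing
        HIGH = 2
        MEDIUM = 3
        LOW = 4

    @dataclass
    class ProblemReport:
        defect_id: int
        summary: str
        steps_to_reproduce: str       # enough detail to make the defect repeatable
        reported_by: str
        reported_on: date
        priority: Priority = Priority.MEDIUM
        status: str = "Open"          # Open -> Investigating -> Fixing -> Retest -> Closed

    # Hypothetical example entry; the session name is illustrative only.
    report = ProblemReport(101, "Duplicate customer rows after load",
                           "Run session s_m_load_dim_customer twice against the same source file",
                           "tester1", date.today(), Priority.HIGH)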

Best Practices
None

Sample Deliverables
Issues Tracking

Last updated: 01-Feb-07 18:46

Data Warehousing 284 of 1017 .2 Implement Physical Database Description Implementing the physical database is a critical task that must be performed efficiently to ensure a successful project. Database Administrators. and support of the database(s) used in the solution. performance. The information in this section is intended as an aid for individuals responsible for the long-term maintenance. Conversely. poor physical implementation generally has the greatest negative performance impact on a system. One example INFORMATICA CONFIDENTIAL Velocity v8 Methodology . In many cases.Phase 5: Build Task 5. as well as the operating system and network hardware. Prerequisites None Roles Data Architect (Secondary) Data Integration Developer (Review Only) Database Administrator (DBA) (Primary) Repository Administrator (Secondary) System Administrator (Secondary) Considerations Nearly everything is a trade-off in the physical database implementation. correct database implementation can double or triple the performance of the data integration processes and presentation layer applications. and System Administrators with an in-depth understanding of their database engine and Informatica product suite. It should be particularly useful for programmers.

The DBA is responsible for determining which of the many available alternatives is the best implementation choice for the particular database. The DBA should be thoroughly familiar with the design of star-schemas for Data Warehousing and Data Integration solutions, as well as standard 3rd Normal Form implementations for operational systems. Therefore, it is critical for this individual to have a thorough understanding of the data, the database, and the desired use of the database by the end-user community prior to beginning the physical design and implementation processes.

For data migration projects, this task often refers exclusively to the development of new tables in either a reference data schema or staging schemas. There should be little creation of tables in the source or target system due to the nature of the project, so most of the table development will be in the developer space rather than in the applications that are part of the data migration. Additionally, tables will get created in staging schemas. Developers are encouraged to leverage a reference data database which will hold reference data such as valid values, cross-reference tables, default values, exception handling details, and other tables necessary for successful completion of the data migration.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 5: Build

Task 5.3 Design and Build Data Quality Process

Description
Follow the steps in this task to design and build the data quality enhancement processes that can ensure that the project data meets the standards of data quality required for progress through the rest of the project. The processes designed in this task are based on the results of 2.8 Perform Data Quality Audit. Both the design and build components are captured in the Build Phase because much of this work is iterative: as intermediate builds of the data quality process are reviewed, the design is further expanded and enhanced.

Here again (as in subtask 2.1 Identify Source Data Systems) it is important to work as far as is practicable with the actual source data. Using data derived from the actual source systems - either the complete dataset or a subset - was essential in identifying quality issues during the Data Quality Audit and determining if the data meets the business requirements (i.e., if it answers the business questions identified in the Manage Phase). The data quality enhancement processes designed in the subtasks of this task must operate on as much of the project dataset(s) as deemed necessary, and possibly the entire dataset.

Data quality checks can be of two types: one can cover the metadata characteristics of the data, and the other covers the quality of the data contents from a business perspective. In the case of complex ERP systems like SAP, where implementation has a high degree of variation from the base product, a thorough data quality check should be performed to consider the customizations.

Note: If the results of the Data Quality Audit indicate that the project data already meets all required levels of data quality, then you can skip this task.

Prerequisites
None

Roles

Business Analyst (Primary)
Business Project Manager (Secondary)
Data Integration Developer (Secondary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)
Technical Project Manager (Approve)

Considerations
Because the quality of the source system data has a major effect on the correctness of all downstream data, it is imperative to resolve as many of the data issues as possible, as early as possible. If the data is flawed, the development initiative faces a very real danger of failing. Making the necessary corrections at this stage eliminates many of the questions that may otherwise arise later during testing and validation. In addition, eliminating errors in the source data makes it far easier to determine the nature of any problems that may arise in the final data outputs.

If data comes from different sources, it is mandatory to correct data for each source as well as for the integrated data. If data comes from a mainframe, it is necessary to use the proper access method to interpret the data correctly. Note, however, that Informatica Data Quality (IDQ) applications do not read data directly from mainframe.

As indicated above, the issue of data quality covers far more than simply whether the source and target data definitions are compatible. From the business perspective, data quality processes seek to answer the following questions: what standard has the data achieved in areas that are important to the business, and what standards are required in these areas? There are six main areas of data quality performance: Accuracy, Completeness, Conformity, Consistency, Integrity, and Duplication. These are fully explained in task 2.8 Perform Data Quality Audit. The Data Quality Developer uses the results of the Data Quality Audit as the benchmark for the data quality enhancement steps to apply in the current task.
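For illustration, the following Python sketch shows simplified checks for two of the six areas, Completeness and Conformity, over a handful of hypothetical records. It is not IDQ functionality; it is only a way to make the measures concrete.

    import re

    # Hypothetical sample of source records; in practice these checks run on the audited dataset.
    records = [
        {"customer_id": "C001", "zip": "10001", "email": "a@example.com"},
        {"customer_id": "C002", "zip": "1000",  "email": None},
        {"customer_id": None,   "zip": "94105", "email": "b@example"},
    ]

    def completeness(rows, field):
        """Share of records where the field is populated (one view of Completeness)."""
        return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)

    def conformity(rows, field, pattern):
        """Share of populated values matching an agreed format (one view of Conformity)."""
        values = [r[field] for r in rows if r.get(field)]
        return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values) if values else 0.0

    print(completeness(records, "customer_id"))   # ~0.67
    print(conformity(records, "zip", r"\d{5}"))   # ~0.67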

Before beginning to design the data quality processes, the Data Quality Developer, Business Analyst, Project Sponsor, and other interested parties must meet to review the outcome of the Data Quality Audit and agree the extent of remedial action needed for the project data. The first step is to agree on the business rules to be applied to the data. (See Subtask 5.3.1 Design Data Quality Technical Rules.)

The tasks that follow are written from the perspective of Informatica Data Quality, Informatica's dedicated data quality application suite.

Best Practices
Data Cleansing

Sample Deliverables
None

Last updated: 01-Feb-07 18:46

Phase 5: Build

Subtask 5.3.1 Design Data Quality Technical Rules

Description
Business rules are a key driver of data enhancement processes. In many cases, poor data quality is directly related to the data's failure concerning a business rule. A business rule is a condition of the data that must be true if the data is to be valid and, in a larger sense, for a specific business objective to succeed. In this subtask the Data Quality Developer and the Business Analyst, and optionally other personnel representing the business, establish the business rules to be applied to the data.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)

Considerations
All areas of data quality can be affected by business rules, and business rules can be defined at high and low levels and at varying levels of complexity. Some business rules can be tested mathematically using simple processes, whereas others may require complex processes or reference data assistance. An important factor in completing this task is proper documentation of the business rules.

For example, consider a financial institution that must store several types of information for account holders in order to comply with the Sarbanes-Oxley or the USA PATRIOT Act. It defines several business rules for its database data, including:

● Field 1-Field n must not be null or populated with default values.
● The Date of Birth field must contain dates within certain ranges (e.g., to indicate that the account holder is between 18 and 100 years old).
● All account holder addresses are validated as postally correct.

These three rules are equally easy to express, but they are implemented in different ways. All three rules can be checked in a straightforward manner using Informatica Data Quality (IDQ), although the third rule, concerning address validation, requires reference data verification.
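As a purely illustrative sketch (Python, with hypothetical field names; this is not an IDQ plan), the first two rules can be expressed as simple checks, while the third would require a postal reference source:

    from datetime import date

    # Hypothetical account-holder record; field names are illustrative only.
    record = {"first_name": "Ada", "last_name": "Smith", "date_of_birth": date(1970, 4, 2),
              "address_line1": "100 Main St", "zip": "10001"}

    def rule_not_null(rec, fields):
        """Rule 1: the listed fields must not be null or empty."""
        return all(rec.get(f) not in (None, "") for f in fields)

    def rule_age_in_range(rec, low=18, high=100, today=None):
        """Rule 2: date of birth implies an age between low and high."""
        today = today or date.today()
        dob = rec["date_of_birth"]
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        return low <= age <= high

    # Rule 3 (postal validity) is not shown: it needs reference data, e.g. a postal directory.
    print(rule_not_null(record, ["first_name", "last_name", "date_of_birth"]),
          rule_age_in_range(record))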

When defining business rules, the Data Quality Developer must consider the following questions:

● How to document the rules.
● How to build the data quality processes to validate the rules.

Documenting Business Rules
Documenting rules is essential as a means of tracking the implementation of the business requirements. When documenting business rules, the following information must be provided:

● A unique ID for each rule. This can be as simple as an incremented number, or a project code assigned to each rule.
● A text description of the rule. This should be as complete as possible – however, if the description becomes too lengthy or complex, it may be advisable to break it down into multiple rules.
● The name of the data source containing the records affected by the rule.
● The data headers or field names containing the values affected by the rule. The Data Quality Developer and the Business Analyst can refer back to the results of the Data Quality Audit to identify this information.
● Columns for the plan name and the results of implementing the rule. The Data Quality Developer can provide this information later.

The decision to use external reference data is covered in subtask 5.3.2 Determine Dictionary and Reference Data Requirements.

Note: In IDQ, a discrete data quality process is called a plan. A plan has inputs, outputs, and analysis or enhancement algorithms, and is analogous to a PowerCenter mapping. (The Data Quality Developer need not create the plans at this stage.)

Assigning Business Rules to Data Quality Plans
When the Data Quality Developer and Business Analyst have agreed on the business rules to apply to the data, the Data Quality Developer must decide how to convert the rules into data quality plans. The Data Quality Developer may create a plan for each rule, wherein each plan contains a single rule, or may incorporate several rules into a single plan. This decision is taken on a rule-by-rule basis. Typically a plan handles more than one rule: there is a trade-off between simplicity in plan design, with each plan covering a small increment of data quality progress, and efficiency in plan design, wherein a single plan addresses several rules. One advantage of incorporating several rules into one plan is that the Data Quality Developer does not need to define and maintain multiple instances of input and output data, where a single set of inputs and outputs can do the same job in a more sophisticated plan.

It is also worth considering whether the plan will be run from within IDQ or added to a PowerCenter mapping for execution in a workflow. A data quality plan can be added to a PowerCenter custom transformation and run within a PowerCenter mapping. Bear in mind that the Data Quality Integration transformation in PowerCenter accepts information from one plan; to add several plans to a mapping, you must add the same number of transformations.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:12

Phase 5: Build

Subtask 5.3.2 Determine Dictionary and Reference Data Requirements

Description
Many data quality plans make use of reference data files to validate and improve the quality of the input data. The main purposes of reference data are:

● To validate the accuracy of the data in question, in cases where input data is verified against tables of known-correct data.
● To enrich data records with new data or enhance partially-correct data values. For example, in cases of address records that contain usable but incomplete postal information, Plus-4 information can be added to zip codes. (Typos can also be identified and fixed.)

When preparing to build data quality plans, the Data Quality Developer must determine the types of dictionary and reference files that may be used in the data quality plans, obtain approval to use third-party data if necessary, and define a strategy for maintaining and distributing reference files. An important factor in completing this task is the proper documentation of the required dictionary or reference files.

Prerequisites
None

Roles
Business Analyst (Secondary)
Business Project Manager (Secondary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)

Considerations
Data quality plans can make use of three types of reference data:

● Standard dictionary files. These files are installed with Informatica Data Quality (IDQ) and can be used by several types of components in Workbench. They are plain-text files saved in .DIC file format and can be created and edited manually. IDQ installs with a set of dictionary files in generic business information areas including forenames, city and town names, units of measurement, and gender identification. All dictionaries installed with IDQ are text dictionaries.

● Database dictionaries. Users with database expertise can create and specify dictionaries that are linked to database tables. IDQ does not install any database dictionaries. Database dictionaries are stored as SELECT statements that query the database at the time of plan execution and can, therefore, be updated dynamically when the underlying data is updated. Database dictionaries are useful when the reference data has been originated for other purposes and is likely to change independently of IDQ. By making use of a dynamic connection, data quality plans can always point to the current version of the reference data.

● Third-party reference data. These data files originate from third-party vendors and are provided by Informatica as premium product options. Informatica also provides and supports reference data of external origin, such as postal address data endorsed by national postal carriers. The reference data provided by third-party vendors is typically in database format. If the Data Quality Developer feels that externally-derived reference data files are necessary, he or she must inform the Project Manager or other business personnel as soon as possible, as this is likely to affect (1) the project budget and (2) the software architecture implementation.

Managing and Distributing Reference Files
Managing standard-installed dictionaries is straightforward, as long as the Data Quality Developer does not move the designed plans to non-standard locations. What is a non-standard location? One where the plans cannot see the dictionary files.

IDQ recognizes set locations for dictionary and reference data files. A Standard (i.e., client-only) install of IDQ looks for its dictionary files in the \Dictionaries folder of the installation. An Enterprise (i.e., client-server) install looks in this location and also looks in the logged-on user's \Dictionaries folder on the server if the plan is executed on the server. These locations are specified in IDQ's config.xml file. If the relevant dictionary files are moved out of these locations, the plan cannot run unless the config.xml file has been edited. However, if the user has created new or modified dictionaries within the standard dictionary format, and wishes to copy (publish) plans to a server or another IDQ installation, the user must copy the new dictionary files to a recognized location for the server or the other IDQ also. (Again, these locations are defined in the config.xml file.)

Third-party reference data adds another set of actions, because the installation of these files is less simple and because the files are licensed and delivered separately from IDQ. The third-party data currently available from Informatica is packaged in a manner that installs to locations recognized by IDQ; conversely, copying these files to other locations is not as simple, and the system administrator must understand that the reference data will be installed in the required locations. Note: Informatica customers license third-party data on a subscription basis. Informatica provides regular updates to the reference data, and the customer (possibly the system administrator) must perform the updates. The business must agree to license these files before the Data Quality Developer can assume he or she can develop plans using third-party files.

Whenever you add a dictionary or reference data file to a plan, you must document exactly how you have done so: record the plan name, the reference file name, and the component instance that uses the reference file. Make sure you pass the inventory of reference data to all other personnel who are going to use the plan.

Data migration projects have additional reference data requirements, which include a need to determine the valid values for key code fields and to ensure that all input data aligns with these codes. It is recommended to build valid value processes to perform this validation. Additionally, a large number of basic cross-references are also required for data migration projects. These needs can be met with a variety of Informatica products, but to expedite development it is recommended to use a table-driven approach to populate hard-coded values, which then allows for easy changes if the specific hard-coded values change over time. These data types are examples of reference data that should be planned for, using a specific approach to populate and maintain them with input from the business community, and they must be addressed prior to building data integration processes.
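A minimal Python/sqlite3 sketch of the table-driven approach, with hypothetical valid-value and cross-reference tables (illustrative only):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Hypothetical reference tables: agreed valid values and a legacy-to-target cross-reference.
    conn.execute("CREATE TABLE valid_values (field TEXT, code TEXT, PRIMARY KEY (field, code))")
    conn.executemany("INSERT INTO valid_values VALUES ('order_status', ?)",
                     [("OPEN",), ("SHIPPED",), ("CANCELLED",)])
    conn.execute("CREATE TABLE xref_status (legacy_code TEXT PRIMARY KEY, target_code TEXT)")
    conn.executemany("INSERT INTO xref_status VALUES (?, ?)",
                     [("O", "OPEN"), ("S", "SHIPPED"), ("X", "CANCELLED")])

    def translate_status(legacy_code):
        """Table-driven lookup: changing a mapping means updating a row, not editing code."""
        row = conn.execute("SELECT target_code FROM xref_status WHERE legacy_code = ?",
                           (legacy_code,)).fetchone()
        return row[0] if row else None

    def is_valid(field, code):
        return conn.execute("SELECT 1 FROM valid_values WHERE field = ? AND code = ?",
                            (field, code)).fetchone() is not None

    print(translate_status("O"), is_valid("order_status", translate_status("O")))   # OPEN True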

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:14

Phase 5: Build

Subtask 5.3.3 Design and Execute Data Enhancement Processes

Description
This subtask, along with subtask 5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution, concerns the design and execution of the data quality plans that will prepare the project data for the Data Integration Design and Development in the Build Phase. While this subtask describes the creation and execution of plans through Informatica Data Quality (IDQ) Workbench, there are several aspects to creating plans primarily for runtime use, and these are covered in 5.3.4. Users who are creating plans should read both subtasks.

Note: IDQ provides a user interface, the Data Quality Workbench, within which plans can be designed, tested, and deployed to other Data Quality engines across the network. All plans are created in Workbench. Workbench is an intuitive user interface; however, like all software applications, it requires user training. These subtasks are not a substitute for that training. Instead, they describe the rudiments of plan construction, the elements required for various types of plans, and the next steps to plan deployment.

Prerequisites
None

Roles
Data Quality Developer (Primary)
Technical Project Manager (Approve)

Considerations
A data quality plan is a discrete set of data analysis and/or enhancement operations with a data source and a data target (or sink). At a high level, the design of a plan is not dissimilar to the design of a PowerCenter mapping. Sources, sinks, and analysis/enhancement components are represented on-screen by icons, much like the sources, targets, and transformations in a mapping, and can be configured through a tabbed dialog box in the same way as PowerCenter transformations. However, the plans that users construct in Workbench can grow in size and complexity. One difference between PowerCenter and Workbench is that users cannot define workflows that contain serial data quality plans, although this functionality is available in a runtime/batch scenario.

A plan can have any number of operational components, and plans can be designed to fulfill several data quality requirements, including data analysis, parsing, validation, cleansing and standardization, matching, enrichment, and consolidation. These are described in detail in the Best Practice Data Cleansing. Data quality plans can read source data from, and write data to, file and database. Most delimited, flat, or fixed-width file types are usable, as are DB2, Oracle, and SQL Server databases and any database legible via ODBC. Informatica Data Quality (IDQ) stores plan data in its own MySQL data repository. A simple data quality plan consists of a data source reading from a SQL database, an operational component analyzing the data, and a data sink component that receives the data available as plan output.

When designing data quality plans, the questions to consider include:

● What types of plan are necessary to meet the needs of the project? The business should have already signed off on specific data quality goals as part of agreeing the overall project objectives, and the Data Quality Audit should have indicated the areas where the project data requires improvement. For example, the audit may indicate that the project data contains a high percentage of duplicate records, and therefore matching and pre-match grouping plans may be necessary (a generic sketch of this grouping-and-matching idea appears at the end of these considerations).

● What test cycles are appropriate for the plans? Testing and tuning plans in Workbench is a normal part of plan development and need not be part of a formal test scenario; testing a plan in Workbench is akin to validating a mapping in PowerCenter. In many cases, however, the Data Quality Developer must be able to sign off on each plan as valid and executable.

● What source data will be used for the plans? This is related to the testing issue mentioned above. The final plans that operate on the project data are likely to operate on the complete project dataset.

Ideally, a complete ‘clone’ of the project dataset should be available to the Data Quality Developer, so that the plans can be designed and tested on a fully faithful version of the project data; in every case, the plans will effect changes in the customer data. At the minimum, a meaningful sample of the dataset should be replicated and made available for plan design and test purposes.

● Where will the plans be deployed? IDQ can be installed in a client-server configuration, with multiple Workbench installations acting as clients to the IDQ server. The server employs service domain architecture, so that a connected Workbench user can run a plan from a local or domain repository to any Execution Service on the service domain. Likewise, the Data Quality Developer may publish plans from Workbench to a remote repository on the IDQ service domain for execution by other Data Quality Developers. Bear in mind that a plan that is published to a service domain repository will translate the data source locations set at design time into new locations local to the new computer on which it resides. See subtask 5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution for details.

An important consideration here is: will the plans be deployed as runtime plans? A plan is considered a runtime plan if it is deployed in a scheduled or batch operation with other plans. In such cases, the plan is run using a command line instruction. Note that plans with realtime capabilities are also suitable for use in a request-response environment, such as a point of data entry environment. These realtime plans can be called by a third-party application to analyze keyboard data inputs and correct human error.

Bear in mind also that it is possible to add a plan to a mapping if the Data Quality Integration plugin has been installed client-side and server-side to PowerCenter. The Integration enables the following types of interaction:

● It enables you to browse the Data Quality repository and add a data quality plan to the Data Quality Integration transformation. The functional details of the plan are saved as XML in the PowerCenter repository.
● It enables the PowerCenter Integration Service to send data quality plan XML to the Data Quality engine when a session containing a Data Quality Integration transformation is run.

A plan designed for use in a PowerCenter mapping must set its data source and data sink components to process data in realtime. A subset of the source and sink components can be configured in this way (six out of twenty-one components). See subtask 5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution and the Informatica Data Quality User Guide for more information.
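The grouping-and-matching idea referenced earlier can be sketched generically in Python (hypothetical records and a simple similarity score; this is not IDQ's matching algorithm):

    from collections import defaultdict
    from difflib import SequenceMatcher

    # Hypothetical customer records with possible duplicates.
    customers = [
        {"id": 1, "name": "John Smith", "zip": "10001"},
        {"id": 2, "name": "Jon Smith",  "zip": "10001"},
        {"id": 3, "name": "Mary Jones", "zip": "94105"},
    ]

    # Pre-match grouping: only records sharing a group key are compared with each other.
    groups = defaultdict(list)
    for c in customers:
        key = (c["name"].split()[-1].upper(), c["zip"])   # surname + zip as a simple group key
        groups[key].append(c)

    # Matching: score pairs within each group; high scores are candidate duplicates.
    for members in groups.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                score = SequenceMatcher(None, members[i]["name"], members[j]["name"]).ratio()
                if score > 0.8:
                    print("possible duplicates:", members[i]["id"], members[j]["id"], round(score, 2))

Grouping keeps the number of comparisons manageable on large datasets, which is why matching plans are usually preceded by a pre-match grouping step.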

Best Practices
None

Sample Deliverables
None

Last updated: 12-Feb-07 15:05

Phase 5: Build

Subtask 5.3.4 Design Run-time and Real-time Processes for Operate Phase Execution

Description
This subtask, along with subtask 5.3.3 Design and Execute Data Enhancement Processes, concerns the design and execution of the data quality plans that prepare the project data for the Data Integration component of the Build Phase and possibly later phases. While subtask 5.3.3 describes the creation and execution of plans through Data Quality Workbench, this subtask focuses on the steps to deploy plans in a runtime or scheduled environment. All data quality plans are created in Workbench; however, there are several aspects to creating plans primarily for runtime, which are described in this subtask. Users who are creating plans should read both subtasks.

Runtime plans present two opportunities for the Data Quality Developer and the data project as a whole:

● A plan that may take several hours to run — such as a large-scale data matching plan — can be scheduled to run overnight as a runtime plan. Because such plans can be scheduled and run in a batch, they are commonly published or moved to a computer where higher performance is available.
● Because runtime plans need not be run from a user interface, such plans can outlive the project in which they are designed and provide a method for ongoing data monitoring in the enterprise. A runtime plan can be scheduled to run at regular intervals on the dataset to analyze dataset quality.

When publishing or moving a runtime plan, consider the issues discussed in this subtask.

Prerequisites
None

Roles

Business Analyst (Review Only)
Data Quality Developer (Primary)
Technical Project Manager (Review Only)

Considerations
The two main factors to consider when planning to use runtime plans are:

● What data sources will the plan use?
● What reference files will the plan use?

In both cases, the source data and reference files must reside in locations that are visible to Informatica Data Quality (IDQ). This is pertinent because the runtime plan will typically be moved from its design-time computer to another computer for execution. Data source locations are set in the plan at design time.

Will the plan be run in an IDQ service domain? A plan moved to another machine may be run through Data Quality Server (specifically, by a machine hosting a Data Quality Execution Service). In this case, you can publish the plan to a repository from the Workbench client, and the Data Quality engine can run the plan from the repository. (The Data Quality Developer must be working on a Workbench machine that has a client connection to the Data Quality Server.) The Data Quality Developer can set the run command to distinguish between plans stored in the Data Quality repository and plans saved on the file system.

If the source data is stored in a database, then the data locations can remain static — so long as the data source details do not change. However, if the plan is moved to another machine, the same database connection must be available on the machine to which the plans are moved.

If the plan connects to a file, the name and path to the file(s) are set in the data source component. When you publish a plan, bear in mind that IDQ recognizes a specific set of folders as valid source file locations. If a Data Quality Developer defines a plan with a source file stored in the following location on the Workbench computer:

C:\Myfiles\File.txt

a Data Quality Server on Windows will look for the file here:

C:\Program Files\Data Quality\users\user.name\Files\Myfiles

and a Data Quality Server on UNIX installed at /home/Informatica/dataquality/ will look for the file here:

/home/informatica/dataquality/users/user.name/Files/Myfiles

where user.name is the logged-on Data Quality Developer name. Path translations are platform-independent; that is, a Windows path will be mapped to a UNIX path.

Are the source files in a non-standard location on the runtime computer? If a Data Quality Developer publishes a plan to a service domain repository for runtime execution, and the plan source file is located in a non-standard location on the executing computer, the Data Quality Developer can add a parameter file to the run command.

Will the plan be deployed to IDQ machines outside the service domain? If so, the plans must be saved as a .xml file for runtime deployment. (Plans can also be saved as .pln files for use in another instance of Workbench.)

Do the plans use non-standard dictionary files, or dictionary/reference files in non-standard locations? The Data Quality Developer must check that any dictionary or reference files added to a plan at design time are also available at the runtime location. If a plan uses standard dictionary files (i.e., the files that installed with the product), then IDQ takes care of this automatically, translating the location set in the plan into the required file location, as long as the plan resides on a service domain. If a plan is published or copied to a network location and uses non-standard reference files, these files must be copied to a location that is recognizable to the IDQ installation that will run the deployed plans. For more information on valid dictionary and reference data files, see the Informatica Data Quality User Guide.

Implications for Plan Design
The above settings can have a significant bearing on plan design. When the Data Quality Developer designs a plan in Workbench, he or she should ensure that the folders created for file resources can map efficiently to the server folder structure.

For example, let's say the Developer creates a data source file folder on a Workbench installation at the following location:

C:\Program Files\Data Quality\Sources

When the plan runs on the server side, the Data Quality Server looks for the source file in the following location:

C:\Program Files\Data Quality\users\user.name\Files\Program Files\Data Quality\Sources

Note that the folder path Program Files\Data Quality is repeated here; in this case, good plan design suggests the creation of folders under C:\ that can be recreated efficiently on the server.
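The path translation described above can be illustrated with a small Python sketch. The mapping rule mirrors the examples in this subtask, but the function itself is only a conceptual illustration, not part of the product:

    from pathlib import PurePosixPath, PureWindowsPath

    def server_side_path(client_path, server_root, user_name):
        """Illustrative mapping of a client-side source path to the per-user server location."""
        win = PureWindowsPath(client_path)
        relative = win.relative_to(win.anchor)   # drop the drive letter
        return PurePosixPath(server_root) / "users" / user_name / "Files" / PurePosixPath(*relative.parts)

    print(server_side_path(r"C:\Program Files\Data Quality\Sources\File.txt",
                           "/home/informatica/dataquality", "dq_dev"))
    # /home/informatica/dataquality/users/dq_dev/Files/Program Files/Data Quality/Sources/File.txt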

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:16

Phase 5: Build

Subtask 5.3.5 Develop Inventory of Data Quality Processes

Description
When the Data Quality Developer has designed and tested the plans to be used later in the project, he or she must then create an inventory of the plans. This inventory should be as exhaustive as possible. Data quality plans, once they achieve any size, can be hard for personnel other than the Data Quality Developer to read. Moreover, other project personnel and business users are likely to rely on the inventory to identify where the plan functioned in the project.

Prerequisites
None

Roles
Data Quality Developer (Primary)

Considerations
For each plan created for use in the project (or for use in the Operate Phase and post-project scenarios), the inventory document should answer the following questions. The questions below are a subset of those included in the sample deliverable document Data Quality Plan Documentation and Handover. They can be divided into two sections: one relating to the plan's place and function relative to the project and its objectives, and the other relating to the plan design itself.

Project-related Questions
● What is the name of the plan? What project is the plan part of? Where does the plan fit in the overall project?

● What particular aspect of the project does the plan address? What are the objectives of the plan?
● What issues, if any, apply to the plan or its data?
● What department or group uses the plan output?
● What are the predicted ‘before and after’ states of the plan data?
● Where is the plan located (include machine details and folder location) and when was it executed?
● Is the plan version-controlled? What are the creation/metadata details for the plan?
● What steps were taken or should be taken following plan execution?

Plan Design-related Questions
● What are the specific data or business objectives of the plan?
● Who ran (or should run) the plan, and when?
● In what version of IDQ was the plan designed?
● What Informatica application will run the plan, and on which applications will the plan run? Provide a screengrab of the plan layout in the Workbench user interface.
● What data source(s) are used? Where is the source located? What are the format and origin of the database table?

● Is the source data an output from another IDQ plan, and if so, which one?
● Describe the activity of each component in the plan. Component functionality can be described at a high level or low level, as appropriate.
● What business rules are defined? Provide the logical statements, if appropriate. This question can refer to the documented business rules from subtask 5.3.1 Design Data Quality Technical Rules.
● What reference files or dictionaries are applied?
● What are the outputs for the instance, and how are they named? Where is the output written: report, database table, or file?
● Are there exception files? If so, where are they written?
● What is the next step in the project? Will the plan(s) be re-used (e.g., in a runtime environment)?
● Who receives the plan output data, and what actions are they likely to take?
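For illustration only, an inventory entry could be captured as a simple structured record; the following Python sketch uses entirely hypothetical values and mirrors the questions above:

    # Hypothetical inventory entry; every value shown here is illustrative.
    plan_inventory_entry = {
        "plan_name": "dq_customer_standardize_v1",
        "project": "EDW Customer Hub",
        "objective": "Standardize and validate customer name and address fields",
        "business_rules": ["BR-001 not-null key fields", "BR-002 DOB between 18 and 100"],
        "data_sources": [{"name": "stg_customers", "type": "database table"}],
        "reference_files": ["forenames dictionary", "postal reference data (licensed)"],
        "outputs": {"clean": "dq_customers_clean", "exceptions": "dq_customers_exceptions"},
        "executed_by": "Data Quality Developer",
        "next_step": "hand over to Data Integration Developers for mapping design",
    }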

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:19

Phase 5: Build

Subtask 5.3.6 Review and Package Data Transformation Specification Processes and Documents

Description
In this subtask the Data Quality Developer collates all the documentation produced for the data quality operations thus far in the project and makes them available to the Project Manager, Project Sponsor, and Data Integration Developers — in short, to all personnel who need them. The Data Quality Developer must also ensure that the data quality plans themselves are stored in locations known to and usable by the Data Integration Developers.

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Data Quality Developer (Primary)
Technical Project Manager (Review Only)

Considerations
After the Data Quality Developer verifies that all data quality-related materials produced in the project are complete, he or she should hand them all over to other interested parties in the project. The Data Quality Developer should either arrange a handover meeting with all relevant project roles or ask the Data Steward to arrange such a meeting.

The materials that the Data Quality Developer must assemble include:

● Inventory of data quality plans (prepared in subtask 5.3.5 Develop Inventory of Data Quality Processes).
● Data Quality Audit results (prepared in task 2.8 Perform Data Quality Audit).
● Inventory of business rules used in the plans (prepared in subtask 5.3.1 Design Data Quality Technical Rules).
● Inventory of dictionary and reference files used in the plans (prepared in subtask 5.3.2 Determine Dictionary and Reference Data Requirements).
● Data Quality plan files (.pln or .xml files), or locations of the Data Quality repositories containing the plans.
● Details of backup data quality plans. (All Data Quality repositories containing final plans should be backed up.)
● Summary of task 5.3 Design and Build Data Quality Process.

The Data Quality Developer should consider making a formal presentation at the meeting and should prepare for a Q&A session before the meeting ends. The presentation may constitute a PowerPoint slide show and may include dashboard reports from data quality plans. The presentation should cover the following areas:

● Progress in treating the quality of the project data (‘before and after’ states of the data in the key data quality areas)
● Success stories, lessons learned
● Data quality targets: met or missed?
● Recommended next steps for project data

Regarding data quality targets met or missed, the Data Quality Developer must be able to say whether the data operated on is now in a position to proceed through the rest of the project. If the Data Quality Developer believes that there are “show stopper” issues in the data quality, he or she must inform the business managers and provide an estimate of the work necessary to remedy the data issues. The business managers can then decide if the data can pass to the next stage of the project or if remedial action is appropriate.

Best Practices
Build Data Audit/Balancing Processes

Data Warehousing 309 of 1017 .Sample Deliverables Data Quality Plan Design Last updated: 01-Feb-07 18:47 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 5: Build

Task 5.4 Design and Develop Data Integration Processes

Description
A properly designed data integration process performs better and makes more efficient use of machine resources than a poorly designed process. Many development delays and oversights are attributable to an incomplete or incorrect data integration process design, thus underscoring the importance of this task. This task includes the necessary steps for developing a comprehensive design plan for the data integration process, which incorporates high-level standards such as error-handling strategies and overall load-processing strategies, as well as specific details and benefits of individual mappings.

When complete, this task should provide the development team with all of the detailed information necessary to construct the data integration processes with minimal interaction with the design team. This goal is somewhat unrealistic, however, because requirements are likely to change, design elements need further clarification, and some items are likely to be missed during the design process. Nevertheless, the goal of this task should be to capture and document as much detail as possible about the data integration processes prior to development.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Integration Developer (Primary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Primary)

Quality Assurance Manager (Primary)
Technical Project Manager (Review Only)

Considerations
The PowerCenter platform provides facilities for developing and executing mappings for extraction, transformation, and load operations. These mappings determine the flow of data between sources and targets, including the business rules applied to the data before it reaches a target. Depending on the complexity of the transformations, moving data can be a simple matter of passing data straight from a data source through an expression transformation to a target, or may involve a series of detailed transformations that use complicated expressions to manipulate the data before it reaches the target. The data may also undergo data quality operations inside or outside PowerCenter mappings. (Pre-emptive steps to define business rules and to avoid data errors may have been performed already as part of task 5.3 Design and Build Data Quality Process; note also that some business rules may be closely aligned with data quality issues.)

This is the stage where business rules are transformed into actual physical specifications. It is important to capture design details at the physical level, avoiding the use of vague terms and moving any business terminology to a separate "business description" area. For example, a field that stores "Total Cost" should not have a formula that reads 'Calculate total customer cost.' Instead, the formula for 'Total Cost' should be documented as:

Orders.Order_Qty * Item.Item_Price - Item.Item_Discount
where Orders.Item_Num = Item.Item_Num and Orders.Customer_Num = Customer.Customer_Num

Mapping specifications should address field sizes, transformation rules, methods for handling errors or unexpected results in the data, and so forth.
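To show how such a physical specification translates directly into executable logic, here is a small Python/sqlite3 sketch of the documented 'Total Cost' rule, using hypothetical sample rows:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Customer (Customer_Num INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Item     (Item_Num INTEGER PRIMARY KEY, Item_Price REAL, Item_Discount REAL);
    CREATE TABLE Orders   (Order_Num INTEGER PRIMARY KEY, Customer_Num INTEGER, Item_Num INTEGER, Order_Qty INTEGER);
    INSERT INTO Customer VALUES (1, 'Acme');        -- hypothetical sample data
    INSERT INTO Item     VALUES (10, 25.00, 2.50);
    INSERT INTO Orders   VALUES (100, 1, 10, 4);
    """)

    # The documented physical rule for "Total Cost", expressed as an executable query.
    total_cost = conn.execute("""
        SELECT Orders.Order_Qty * Item.Item_Price - Item.Item_Discount
        FROM Orders, Item, Customer
        WHERE Orders.Item_Num = Item.Item_Num
          AND Orders.Customer_Num = Customer.Customer_Num
    """).fetchone()[0]
    print(total_cost)   # 4 * 25.00 - 2.50 = 97.5

Because the rule is stated at the physical level, a developer (or a reviewer) can verify it mechanically rather than interpreting a vague business phrase.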

Data Warehousing 312 of 1017 .● Develop Audit Processes Best Practices Real-Time Integration with PowerCenter Sample Deliverables None Last updated: 27-May-08 16:19 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 5: Build
Subtask 5.4.1 Design High Level Load Process

Description
Designing the high-level load process involves the factors that must be considered outside of the mapping itself in order to complete the loads successfully. Creating a solid load process is an important part of developing a sound data integration solution. Determining load windows, session scheduling, load dependencies and session-level error handling are all examples of issues that developers should deal with in this task.

This subtask incorporates three steps, all of which involve specific activities, considerations, and deliverables. The steps are:

1. Identify load requirements - In this step, members of the development team work together to determine the load window. The load window is the amount of time it will take to load an individual table or an entire data warehouse or data mart. To begin this step, the team must have a thorough understanding of the business requirements developed in task 1.1 Define Project. The team should also consider the differences between the requirements for initial and subsequent loading. In addition, the developers should consider other environmental factors, such as database availability, network availability, availability of sources and targets, and other processes that may be executing concurrently with the data integration processes.

2. Determine dependencies - In this step, the Database Administrator works with the Data Warehouse Administrator and Data Integration Developer to identify and document the relationships and dependencies that exist between tables within the physical database. These relationships affect the way in which a warehouse is loaded. For example, tables may be loaded differently in the initial load than they will be subsequently.

3. Create initial and ongoing load plan - In this step, the Data Integration Developer and Business Analyst use information created in the two earlier steps to develop a load plan document. The load document generated in this step describes the rules that should be applied to each session or mapping and lists the estimated run times for the batches and sessions required to populate the data warehouse and/or data marts (a simple run-time estimation sketch follows this list).
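The following minimal sketch assumes per-table row counts and an observed rows-per-second throughput are available from test runs; the table names, figures and the four-hour window are placeholders, not Velocity-prescribed values.

```python
# A minimal run-time estimation sketch. All figures are placeholders; real
# estimates would come from test-run statistics for each table.
from datetime import timedelta

tables = {              # table name -> (estimated rows, observed rows/sec)
    "CUSTOMER_DIM": (2_000_000, 25_000),
    "PRODUCT_DIM": (300_000, 25_000),
    "SALES_FACT": (50_000_000, 40_000),
}

LOAD_WINDOW = timedelta(hours=4)   # assumed window agreed with operations

def estimated_runtime(rows: int, rows_per_sec: int) -> timedelta:
    """Rough sequential run-time estimate for one table."""
    return timedelta(seconds=rows / rows_per_sec)

total = sum((estimated_runtime(r, t) for r, t in tables.values()), timedelta())
for name, (rows, tput) in tables.items():
    print(f"{name:14s} ~{estimated_runtime(rows, tput)}")
print(f"Total (sequential): {total}; fits window: {total <= LOAD_WINDOW}")
```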

Prerequisites
None

Roles
Business Analyst (Review Only)
Data Integration Developer (Primary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Primary)
Quality Assurance Manager (Approve)
Technical Project Manager (Review Only)

Considerations
Determining Load Requirements
The load window determined in step 1 of this subtask can be used by the Data Integration Developers as a performance target. Mappings should be tailored to ensure that their sessions run to successful completion within the constraints set by the load window requirements document. The Database Administrator, Data Warehouse Administrator and Technical Architect are responsible for ensuring that their respective environments are tuned properly to allow for maximum throughput, to assist with this goal.

Subsequent loads of a table are often performed differently than the initial load. The initial load of a table may involve the execution of only a subset of the database operations used by subsequent loads. For example, suppose the primary focus of a mapping is an update of a dimension; before the first load of the warehouse, the dimension table is empty. Consequently, the first load will perform a large number of inserts, while subsequent loads may perform a smaller number of both insert and update operations. The development team should consider and document such situations and convey the different load requirements to the developer creating the mappings and to the operations personnel configuring the sessions.

Identifying Dependencies
Foreign key (i.e., parent/child) relationships are the most common variable that should be considered in this step. The parent table must always be loaded before the child table, or integrity constraints (if applied) will be broken and the data load will fail. The Data Integration Developer is responsible for documenting these dependencies at a mapping level so that loads can be planned to coordinate with the existence of dependent relationships.

TIP
Load parent/child tables in the same mapping to speed development and reduce the number of sessions that must be managed. To load tables with parent/child relationships in the same mapping, use the constraint-based loading option at the session level. Use the target load plan option in PowerCenter Designer to ensure that the parent table is marked to be loaded first, so that the parent table keys are loaded before an associated child foreign key is loaded into its table.

The Developer should also consider and document other variables such as source and target database availability, network up/down time, and local server processes unrelated to PowerCenter when designing the load schedule. The load plans should be designed around the known availability of both source and target databases. It is particularly important to consider the availability of source systems, as these systems are typically beyond the operational control of the development team. Similarly, if sources or targets are located across a network, the development team should consult with the Network Administrator to discuss network capacity and availability in order to avoid poorly performing batches and sessions. Finally, although unrelated local processes executing on the server are not likely to cause a session to fail, they can severely decrease performance by keeping available processors and memory away from the PowerCenter server engine, thereby slowing throughput and possibly causing a load window to be missed.
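To make the parent-before-child ordering described under Identifying Dependencies concrete, the sketch below derives a load order from documented foreign-key dependencies. The table names and the dependency map are invented for the example; within PowerCenter itself, constraint-based loading and the target load plan option enforce this ordering.

```python
# Illustrative sketch: derive a parent-before-child load order from documented
# foreign-key dependencies. Table names and the dependency map are assumptions.
from graphlib import TopologicalSorter  # Python 3.9+

# child table -> set of parent tables whose keys it references
fk_dependencies = {
    "ORDER_FACT": {"CUSTOMER_DIM", "ITEM_DIM"},
    "ORDER_LINE_FACT": {"ORDER_FACT", "ITEM_DIM"},
    "CUSTOMER_DIM": set(),
    "ITEM_DIM": set(),
}

load_order = list(TopologicalSorter(fk_dependencies).static_order())
print("Load order:", " -> ".join(load_order))
# Parents (CUSTOMER_DIM, ITEM_DIM) appear before the facts that reference them.
```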

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:21

Phase 5: Build
Subtask 5.4.2 Develop Error Handling Strategy

Description
After the high-level load process is outlined and source files and tables are identified, a decision needs to be made regarding how the load process will account for data errors. It is unreasonable to expect any source system to contain perfect data, and it is also unreasonable to expect any automated load process to execute correctly 100 percent of the time. Errors can be triggered by any number of events or scenarios, including session failure, bad data, mismatched control totals, dependencies, or server availability. The identification of a data error within a load process is driven by the standards of acceptable data quality; the identification of a process error is driven by the stability of the process itself. The error handling development effort should include all the work that needs to be performed to correct errors in a reliable, timely, and automated manner.

The degree of complexity of the error handling strategy varies from project to project, depending on such variables as source data, business requirements, load volumes, load windows, time constraints, platform constraints, platform stability, target system, end-user environments, and reporting tools. The challenge in implementing an error handling strategy is to design mappings and load routines robust enough to handle any or all possible scenarios or events that may trigger an error during the course of the load process.

Several types of tasks within the Workflow Manager are designed to assist in error handling. The following is a subset of these tasks:

● Command Task allows the user to specify one or more shell commands to run during the workflow.
● Control Task allows the user to stop, abort, or fail the top-level workflow or the parent workflow based on an input-link condition.
● Decision Task allows the user to enter a condition that determines the execution of the workflow. This task determines how the PowerCenter Integration Service executes the workflow.
● Event Task specifies the sequence of task execution in a workflow. The event is triggered based on the completion of the sequence of tasks.
● Timer Task allows the user to specify the period of time to wait before the Integration Service executes the next task in the workflow. The user can choose either to set a specific time and date to start the next task or to wait a period of time after the start time of another task.
● Email Task allows the user to configure an email to be sent to an administrator or business owner in the event that an error is encountered by a workflow task.
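The Workflow Manager tasks above are configured in the PowerCenter client rather than written as code. Purely as a conceptual illustration, the sketch below mimics the same pattern (run a command, decide on its result, fail the flow, notify an owner) so the control flow is easy to see; the shell command and the notification step are placeholders, not part of any PowerCenter API.

```python
# Conceptual sketch only: mimics the Command / Decision / Control / Email task
# pattern so the control flow is visible. The command and "notify" step are
# placeholders; real workflows configure these tasks in Workflow Manager.
import subprocess

def command_task(cmd: list[str]) -> int:
    """Run a shell command (Command task) and return its exit code."""
    return subprocess.run(cmd).returncode

def email_task(message: str) -> None:
    """Stand-in for an Email task; a real workflow would notify an owner."""
    print(f"NOTIFY: {message}")

def workflow() -> str:
    rc = command_task(["ls", "/landing/zone"])   # placeholder pre-load check
    # Decision task: choose a path based on the link condition (exit code).
    if rc != 0:
        # Control task: fail the top-level workflow and notify the owner.
        email_task(f"Pre-load check failed with exit code {rc}; workflow failed")
        return "FAILED"
    return "SUCCEEDED"

if __name__ == "__main__":
    print("Workflow status:", workflow())
```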

The Data Integration Developer is responsible for determining:

● What data gets rejected,
● Why the data is rejected,
● How the mappings handle rejected data,
● Where the rejected data is written, and
● When the rejected rows are discovered and processed.

The Data Integration Developer should consult closely with the Data Quality Developer in making these determinations, and include in the discussion the outputs from tasks 2.8 Perform Data Quality Audit and 5.3 Design and Build Data Quality Process. Data integration developers should find an acceptable balance between the end users' needs for accurate and complete information and the cost of additional time and resources required to repair errors.

Prerequisites
None

Roles
Data Integration Developer (Primary)
Database Administrator (DBA) (Secondary)
Quality Assurance Manager (Approve)
Technical Project Manager (Review Only)

Considerations
Data Integration Developers should address the errors that commonly occur during the load process in order to develop an effective error handling strategy. These errors include:

● Session Failure. If a PowerCenter session fails during the load process, the failure of the session itself needs to be recognized as an error in the load process. The error handling strategy commonly includes a mechanism for notifying the process owner that the session failed, whether it is in the form of a message to a pager from operations or a post-session email from a PowerCenter Integration Service. There are several approaches to handling session failures within the Workflow Manager. These include custom-written recovery routines with pre- and post-session scripts, workflow variables such as the pre-defined task-specific variables or user-defined variables, and event tasks (e.g., the event-raise task and the event-wait task) that can be used to start specific tasks in reaction to a failed task.

● Data Rejected by Platform Constraints. A load process may reject certain data if the data itself does not comply with database and data type constraints. For instance:
  - The database server will reject a row if the primary key field(s) of that row already exists in the target.
  - A PowerCenter Integration Service will reject a row if a date/time field is sent to a character field without implicitly converting the data.
In both of these scenarios, the data will be rejected regardless of whether or not it was accounted for in the code. In the first scenario, the data will end up in a reject file on the PowerCenter server. In the second scenario, the row of data is simply skipped by the Data Transformation Manager (DTM) and is not written to the target or to any reject file. Although the data is rejected without developer intervention, accounting for it remains a challenge. Both scenarios require post-load reconciliation of the rejected data, either by parsing reject files or balancing control totals. An error handling strategy should account for data that is rejected in this manner.

● "Bad" Data. Bad data can be defined as data that enters the load process from one or more source systems, but is prevented from entering the target systems, which are typically staging areas, end-user environments, or reporting environments. This data can be rejected by the load process itself or designated as "bad" by the mapping logic created by developers.

Some of the reasons that bad data may be encountered between the time it is extracted from the source systems and the time it is loaded to the target include:
  - The data is simply incorrect.
  - The data violates business rules.
  - The data is converted improperly in a transformation.
  - The data fails on foreign key validation.
The strategy that is implemented to handle these types of errors determines what data is available to the business as well as the accuracy of that data. This strategy can be developed with PowerCenter mappings, which flag records within the data flow for success or failure, based on the data itself and the logic applied to that data. The records flagged for success are written to the target while the records flagged for failure are written to a reject file or table for reconciliation.

● Data Rejected by Time Constraints. Load windows are typically pre-defined before data is moved to the target system. A load window is the time that is allocated for a load process to complete (i.e., start to finish) based on data volumes, business hours, and user requirements. If a load process does not complete within the load window, notification can take place via operations, email, or page. Data that has not been loaded within the window can be written to staging areas or processed in recovery mode at a later time. To a degree, notification and the handling of data that has not been committed to the target system must be incorporated in the error handling strategy.

● Irreconcilable Control Totals. One way to ensure that all data is being loaded properly is to compare control totals captured on each session (a simple reconciliation sketch follows this list of error types). Control totals can be defined as detailed information about the data that is being loaded in a session: for example, how many records entered the job stream? How many records were written to target X? How many records were written to target Y? A post-session script can be launched to reconcile the total records read into the job stream with the total numbers written to the target(s). If the number in does not match the number out, there may have been an error somewhere in the load process. Depending on the level of detail desired to capture control totals, the PowerCenter session logs and repository tables store this type of information. Some organizations run post-session reports against the repository tables and parse the log files. Others, wishing to capture more in-depth information about their loads, incorporate control totals in their mapping logic, spinning off check sums, row counts, and other calculations during the load process. These totals are then compared to figures generated by the source systems, triggering notification when numbers do not match up.

● Server Availability. A thorough error handling strategy should assess and account for the probability of services not being available 100 percent of the time. Database servers do occasionally go down, network interrupts do happen, and log/file space can inadvertently fill up. Problems such as this are usually directly related to the stability of the server platform. For example, if a PowerCenter Integration Service goes down during a load process, the sessions and workflows currently running on it will fail if it is the only service configured in the domain. Similarly, if a node is unavailable at runtime, any sessions and workflows scheduled on it will not be run if it is the only resource configured within a domain. This strategy may vary considerably depending on the PowerCenter configuration employed. PowerCenter's High Availability options can be harnessed to eliminate many single points of failure within a domain and can help to ensure minimal service interruption.

● Job Dependencies. Sessions and workflows can be configured to run based on dependencies. For example, the start of a session can be dependent on the availability of a source file, or a batch may have sessions embedded that are dependent on each other's completion. If a session or batch fails at any point in the load process because of a dependency violation, the error handling strategy should catch the problem. The use of events allows the sequence of execution within a workflow to be specified. There are two event tasks that can be included in a workflow: event-raise and event-wait tasks. An event-wait task instructs the Integration Service to wait for a specific event to be raised before continuing with the workflow, while an event-raise task triggers an event at a particular point in a workflow. Events themselves can either be defined by the user or pre-defined (i.e., a file watch event) by PowerCenter; for example, an event raised on completion of one set of tasks can trigger the initiation of another.

The pmcmd command gettaskdetails provides information to assist in analyzing the loads. Issuing this command for a session task returns various data regarding a workflow, including the mapping name, session log file name, first error code and message, number of successful and failed rows from the source and target, and the number of transformation errors.
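The sketch below makes the control-total idea concrete by reconciling rows read against rows written per target plus rows rejected, and logging a notification when they do not balance. The counts are invented; in practice they would come from the session log, the repository views, the output of gettaskdetails, or counters maintained in the mappings themselves.

```python
# Minimal sketch of a control-total reconciliation. The counts are placeholders;
# in practice they would come from session logs, repository views, or counters
# captured inside the mappings.
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

def reconcile(session: str, rows_read: int, rows_written: dict, rows_rejected: int) -> bool:
    """Return True when rows read equals rows written plus rows rejected."""
    written = sum(rows_written.values())
    balanced = rows_read == written + rows_rejected
    if balanced:
        logging.info("%s balanced: %d read = %d written + %d rejected",
                     session, rows_read, written, rows_rejected)
    else:
        # In a real load this is where operations would be paged or emailed.
        logging.error("%s OUT OF BALANCE: %d read vs %d written + %d rejected",
                      session, rows_read, written, rows_rejected)
    return balanced

reconcile("s_m_load_customer_dim", 10_000,
          {"CUSTOMER_DIM": 9_950, "CUSTOMER_ERR": 30}, rows_rejected=20)
```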

Ensure Data Accuracy and Integrity
In addition to anticipating the common load problems, developers need to investigate potential data problems and the integrity of source data. One of the main goals of the load process is to ensure the accuracy of the data that is committed to the target systems. Because end users typically build reports from target systems and managers make decisions based on their content, the data in these systems must be sufficiently accurate to provide users with a level of confidence that the information they are viewing is correct.

The accuracy of the data, before any logic is applied to it, is dependent on the source systems from which it is extracted. It is important, therefore, for developers to identify the source systems and thoroughly examine the data in them. Specifically, task 2.8 Perform Data Quality Audit is designed to establish such knowledge about project data quality, and task 5.3 Design and Build Data Quality Process is designed to eliminate data quality problems as far as possible before data enters the Build Phase of the project. In the absence of dedicated data quality steps such as these, one approach is to estimate, along with source owners and data stewards, how much of the data is still bad (vs. good) on a column-by-column basis, and then to determine which data can be fixed in either the source or the mappings, and which does not need to be fixed before it enters the target. However, the former approach is preferable as it (1) provides metrics to business and project personnel and (2) provides an effective means of addressing data quality problems.

Data integrity deals with the internal relationships of the data in the system and how those relationships are maintained (i.e., data in one table must match corresponding data in another table). Ideally, data integrity issues will not arise since the data has already been processed in the steps described in these tasks. When relationships cannot be maintained because of incorrect information entered from the source systems, the load process needs to determine if processing can continue or if the data should be rejected. Including lookups in a mapping is a good way of checking for data integrity: lookup tables are used to match and validate data based upon key fields. The error handling process should account for the data that does not pass validation.

Determine Responsibility For Data Integrity/Business Data Errors
Since it is unrealistic to expect any source system to contain data that is 100 percent accurate, it is essential to assign the responsibilities of correcting data errors. Taking ownership of these responsibilities throughout the project is vital to correcting errors during the load process. Specifically, individuals should be held accountable for:

● Providing business information
● Understanding the data layout
● Data stewardship (understanding the meaning and content of data elements)
● Delivering accurate data

Part of the load process validates that the data conforms to known rules from the business. When these rules are not met by the source system data, the process should handle these exceptions in an appropriate manner. End users should either accept the consequences of permitting invalid data to enter the target system or they should choose to reject the invalid data. Both options involve complex issues for the business organization. The primary purpose for developing an error handling strategy is to prevent data that inaccurately portrays the state of the business from entering the target system.

The individuals responsible for providing business information to the developers must be knowledgeable and experienced in both the internal operations of the organization and the common practices of the relevant industry. It is important to understand the data and functionality of the source systems as well as the goals of the target environment. If developers are not familiar with the business practices of the organization, it is practically impossible to make valid judgments about which data should be allowed in the target system and which data should be flagged for error handling. Providers of business information play a key role in distinguishing good data from bad.

After understanding the business requirements, developers must gather data content information from the individuals that have first-hand knowledge of how the data is laid out in the source systems and how it is to be presented in the target systems. These individuals should be thoroughly familiar with the format, layout, and structure of the data. This knowledge helps to determine which data should be allowed in the target system based on the physical nature of the data as opposed to the business purpose of the data. The individuals responsible for maintaining the physical data structures play an equally crucial role in designing the error handling strategy.

Data stewards, or their equivalent, are responsible for the integrity of the data in and around the load process. Their presence is not always required, depending on the scope of the project, but if a data steward is designated, he or she will be relied upon to provide developers with insight into such things as valid values, codes, and accurate descriptions. They are also responsible for maintaining translation tables, standard codes, and consistent descriptions across source systems.
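As a small illustration of the translation-table responsibility just described (the review flow itself is covered in the following paragraphs), the sketch below flags source codes that have no entry in a translation table so they can be routed to the data steward. The codes and table contents are invented for the example.

```python
# Illustrative only: flag source codes with no translation-table entry so they
# can be routed to the data steward for review. Codes and table contents are
# invented for the example.
translation_table = {"A": "ACTIVE", "I": "INACTIVE", "P": "PENDING"}

incoming_codes = ["A", "I", "X", "P", "Z"]   # "X" and "Z" are new, unmapped codes

unknown = sorted({code for code in incoming_codes if code not in translation_table})
if unknown:
    # In a real load these rows would land in an exception table with a reason code.
    print("Flag for data steward review:", ", ".join(unknown))
```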

This type of information, along with robust business knowledge and a degree of familiarity with the data architecture, will give the Build team the necessary level of confidence to implement an error handling strategy that can ensure the delivery of accurate data to the target system. The goal is to have the developers design the error handling process according to the information provided by the experts.

Data stewards are also responsible for correcting the errors that occur during the load process in their field of expertise. The error handling process should recognize the errors and report them to the owners with the relevant expertise to fix them. If, for example, a new code is introduced from the source system that has no equivalent in a translation table, it should be flagged and presented to the data steward for review. The data steward can determine if the code should be in the translation table, and if it should have been flagged for error.

For Data Migration projects, it is important to develop a standard method to track data exceptions. Normally this tracking data is stored in a relational database with a corresponding set of exception reports. By developing this important standardized strategy, all data cleansing and data correction development will be expedited due to having a predefined method of determining what exceptions have been raised and which data caused the exception.

Best Practices
Disaster Recovery Planning with PowerCenter HA Option

Sample Deliverables
None

Last updated: 04-Dec-07 18:18

Phase 5: Build
Subtask 5.4.3 Plan Restartability Process

Description
The process of updating a data warehouse with new data is sometimes described as "conducting a fire drill". This is because it often involves performing data updates within a tight timeframe, taking all or part of the data warehouse off-line while new data is loaded. While the update process is usually very predictable, it is possible for disruptions to occur, stopping the data load in mid-stream. To minimize the amount of time required for data updates and further ensure the quality of data loaded into the warehouse, the development team must anticipate and plan for potential disruptions to the loading process. The team must design the data integration platform so that the processes for loading data into the warehouse can be restarted efficiently in the event that they are stopped or disrupted.

Prerequisites
None

Roles
Data Integration Developer (Primary)
Database Administrator (DBA) (Secondary)
Quality Assurance Manager (Approve)
Technical Project Manager (Review Only)

Considerations
Providing backup schemas for sources and staging areas for targets is one step toward improving the efficiency with which a stopped or failed data loading process can be restarted. Source data should not be changed prior to restarting a failed process, as this may cause the PowerCenter server to return missing or repeat values.

A backup source schema allows the warehouse team to store a snapshot of source data, so that the failed process can be restarted using its original source. Similarly, providing a staging area for target data gives the team the flexibility of truncating target tables prior to restarting a failed process, if necessary. If flat file sources are being used, all sources should be date-stamped and stored until the loading processes using those sources have successfully completed. A script can be incorporated into the data update process to delete or move flat file sources only upon successful completion of the update.
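The sketch below illustrates the kind of post-load housekeeping script mentioned above, moving date-stamped flat file sources to an archive directory only after the load has completed successfully. The directory names, file pattern and success flag are assumptions for the example.

```python
# Illustrative post-load housekeeping: archive flat file sources only after the
# load completed successfully. Directory names, file pattern and the success
# flag are assumptions for the example.
import shutil
from datetime import date
from pathlib import Path

SOURCE_DIR = Path("/landing/zone")
ARCHIVE_DIR = Path("/landing/archive") / date.today().isoformat()

def archive_sources(load_succeeded: bool) -> None:
    """Move source files to the dated archive folder, but only on success."""
    if not load_succeeded:
        print("Load failed; leaving source files in place for restart.")
        return
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    for src in SOURCE_DIR.glob("*.dat"):
        shutil.move(str(src), ARCHIVE_DIR / src.name)
        print(f"Archived {src.name}")

# Example: the success flag would normally come from the workflow's final status.
archive_sources(load_succeeded=True)
```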

A second step in planning for efficient restartability is to configure PowerCenter sessions so that they can be easily recovered. Sessions in workflows manage the process of loading data into a data warehouse. When configuring sessions, if multiple sessions are to be run, arrange the sessions in a sequential manner within a workflow. This is particularly important if mappings in later sessions are dependent on data created by mappings in earlier sessions.

PowerCenter versions 6 and above have the ability to configure a workflow to Suspend on Error. This places the workflow in a state of suspension so that the environmental problem can be assessed and fixed, while the workflow can be resumed from the point of suspension.

Follow these steps to identify and create points of recovery within a workflow:

1. Identify the major recovery points in the workflow. All tasks that may impact data integrity or subsequent runs should be recovery points. For example, suppose a workflow has tasks A, B, C, D, E that run in sequence. In this workflow, if a failure occurs at task A, you can restart the task and the workflow will automatically recover. Since task A is able to recover by merely being restarted, it is not a major recovery point. On the other hand, if task B fails, it will impact data integrity or subsequent runs. This means that task B is a major recovery point.

2. Identify the strategy for recovery:
  - Build restartability into the mapping. If data extraction from the source is datetime-driven, create a delete path within the mapping and run the workflow in suspend mode. In some cases a special mapping that has a filter on the source may be required; this filter should be based on the recovery date or other relevant criteria (a small example of such a filter appears at the end of this subtask's considerations).
  - Include transaction controls in mappings.
  - Create a copy of the workflow and create a session-level override and start-from date where recovery is required.
  - Use the high availability feature in PowerCenter.

3. On the session property screen, configure the session to stop if errors occur in pre-session scripts. If the session stops, review and revise the scripts as necessary. If a session stops because it has reached a designated number of non-fatal errors (such as Reader, Writer, or DTM errors), consider increasing the possible number of non-fatal errors allowed, or de-selecting the "Stop On" option in the session property screen. Always be sure to examine log files when a session stops, and research and resolve potential reasons for the stop.

Determine whether or not a session really needs to be run in bulk mode. While running a session in bulk load can increase session performance, successful recovery of a bulk-load session is not guaranteed, as bulk loading bypasses the database log. In many cases a full refresh is the best course of action; one option is to delete records from the target and restart the process. However, if large amounts of data need to be loaded, it may be easier to recover a large, normal loading session rather than truncating targets and re-running a bulk-loaded session.

Other Ways to Design Restartability
PowerCenter Workflow Manager provides the ability to use post-session emails or to create email tasks to send notification to designated recipients informing them about a session run. Configure sessions so that an email is sent to the Workflow Operator when a session or workflow fails. This allows the operator to respond to the failed session as soon as possible.

Data Migration projects often have a need to migrate significant volumes of data. Due to this fact, re-start processing should be considered in the Architect Phase and throughout the Design Phase and Build Phase. If large volumes are involved, the final load processes should include a re-start processing design, which should be prototyped during the Architect Phase. This will limit the amount of time lost if any large-volume load fails.
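To illustrate the datetime-driven restart filter mentioned in the recovery steps above, the sketch below selects only the source rows at or after a recovery date. The rows and the recovery timestamp are invented; in PowerCenter the equivalent filter would normally be driven by a mapping parameter or a session-level start-from date.

```python
# Illustrative restart filter: reprocess only rows at or after the recovery date.
# The rows and the recovery timestamp are invented for the example.
from datetime import datetime

recovery_date = datetime(2024, 6, 1, 2, 0)   # assumed last successful load point

source_rows = [
    {"ORDER_ID": 1, "UPDATED_AT": datetime(2024, 5, 31, 23, 45)},
    {"ORDER_ID": 2, "UPDATED_AT": datetime(2024, 6, 1, 2, 15)},
    {"ORDER_ID": 3, "UPDATED_AT": datetime(2024, 6, 1, 4, 30)},
]

to_reprocess = [r for r in source_rows if r["UPDATED_AT"] >= recovery_date]
print([r["ORDER_ID"] for r in to_reprocess])   # [2, 3]
```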

Best Practices
Disaster Recovery Planning with PowerCenter HA Option

Sample Deliverables
None

Last updated: 04-Dec-07 18:16

Phase 5: Build
Subtask 5.4.4 Develop Inventory of Mappings & Reusable Objects

Description
The next step in designing the data integration processes is breaking the development work into an inventory of components. Each of these components helps further refine the project plan by adding the next layer of detail for the tasks related to the development of the solution. These components then become the work tasks that are divided among developers and subsequently unit tested.

The Inventory of Reusable Objects and Inventory of Mappings created during this subtask are valuable high-level lists of development objects that need to be created for the project. Naturally, the lists will not be completely accurate at this point; they will be added to and subtracted from over the course of the project and should be continually updated as the project moves forward. Despite the ongoing changes, however, these lists are valuable tools, particularly from the perspective of the lead developer and project manager, because the objects on these lists can be assigned to individual developers and their progress tracked over the course of the project.

Prerequisites
None

Roles
Data Integration Developer (Primary)

Considerations
The smallest divisions of assignable work in PowerCenter are typically mappings and reusable objects. A common mistake is to assume that a source-to-target mapping document equates to a single mapping. This is often not the case: to load any one target table, it might easily take more than one mapping to perform all of the needed tasks to correctly populate the table.

Assume the case of loading a data warehouse dimension table for which you have one source-to-target matrix document. You might then generate:

● A Source to Staging Area Mapping (Incremental)
● A Data Cleansing and Rationalization Mapping
● A Staging to Warehouse Update/Insert Mapping
● A Primary Key Extract Mapping (full extract of primary keys used in the delete mapping)
● A Logical Delete Mapping (mark dimension records as deleted if they no longer appear in the source)

It is important to break down the work into this level of detail because, from the list above, you can see how a single source-to-target matrix may generate five separate mappings that could each be developed by different developers. From a project planning perspective, it is then useful to track each of these five mappings separately for status and completion.

Also included in your mapping inventory are the special-purpose mappings that are involved in the end-to-end process but not specifically defined by the business requirements and source-to-target matrixes. These would include audit mappings, aggregate mappings, mapping generation mappings, templates and other objects that will need to be developed during the Build Phase.

For reusable objects, it is important to keep a holistic view of the project in mind when determining which objects are reusable and which ones are custom built. Sometimes an object that would seem sharable across any mapping making use of it may need different versions depending on purpose. This is another area where Metadata Manager can become very useful for developers who want to do where-used analysis for objects.

Having a list of the common objects that are being developed across the project allows individual developers to better plan their mapping-level development efforts. By knowing that a particular mapping is going to utilize four reusable objects, developers can focus on the work unique to that particular mapping and not duplicate the functionality of the four reusable objects. As a result of the processes and tools implemented during the project, developers can achieve communication and coordination to improve productivity.

Best Practices
Working with Pre-Built Plans in Data Cleanse and Match

Sample Deliverables
Mapping Inventory

Last updated: 01-Feb-07 18:47

Phase 5: Build
Subtask 5.4.5 Design Individual Mappings & Reusable Objects

Description
After the Inventory of Mappings and Inventory of Reusable Objects is created, the next step is to provide a detailed design for each object on each list. Developers use the documents created in subtask 5.4.4 Develop Inventory of Mappings & Reusable Objects to construct the mappings and reusable objects, as well as any other required processes. The detailed design should incorporate sufficient detail to enable developers to complete the task of developing and unit testing the reusable objects and mappings.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Integration Developer (Primary)

Considerations
A detailed design must be completed for each of the items identified in the Inventory of Mappings and Inventory of Reusable Objects. These details include specific physical information, down to the table, field, and datatype level, as well as error processing and any other information requirements identified.

Reusable Objects
Three key items should be documented for the design of reusable objects: inputs, outputs, and the transformations or expressions in between. Developers who have a clear understanding of what reusable objects are available are likely to create better mappings that are easy to maintain. For the project, consider

creating a shared folder for common objects like sources, targets, and transformations. When you want to use these objects, you can create shortcuts that point to the object. In a multi-developer environment, assign a developer the task of keeping the objects organized in the folder and updating sources and targets when appropriate. Document the process and the available objects in the shared folder.

It is crucial to document reusable objects, particularly in a multi-developer environment. For example, if one developer creates a mapplet that calculates tax rate, the other developers must understand the mapplet in order to use it properly. Without documentation, developers have to browse through the mapplet objects to try to determine what the mapplet is doing. This is time-consuming and often overlooks vital components of the mapplet. Documenting reusable objects provides a comprehensive overview of the workings of relevant objects and helps developers determine if an object is applicable in a specific situation.

Mappings
Before designing a mapping, it is important to have a clear picture of the end-to-end processes that the data will flow through. The data being extracted from the source system dictates how the developer implements the mapping. Then, design a high-level view of the mapping and document a picture of the process within the mapping, using a textual description to explain exactly what the mapping is supposed to accomplish and the methods or steps it follows to accomplish its goal.

Next, document the details at the field level, listing each of the target fields and the source field(s) that are used to create the target field. Document any expression that may take place in order to generate the target field (e.g., a multiplication of two fields, a sum of a field, a comparison of two fields, etc.). The designer may have to do some investigation at this point for business rules as well. For example, the business rules may say, "For active customers, calculate a late fee rate". The designer of the mapping must determine that, on a physical level, that translates to 'for customers with an ACTIVE_FLAG of "1", multiply the DAYS_LATE field by the LATE_DAY_RATE field'. Whatever the rules, be sure to document them and remember to keep it at a physical level.

Document any other information about the mapping that is likely to be helpful in developing the mapping. Helpful information may include source and target database connection information, pre- or post-mapping processing requirements, any known issues with particular fields, potential data issues at a field level, data cleansing needed at a field level, and any information about specific error handling for the mapping. Special joins for the source, filters, lookups and how to match data in the lookup tables, or conditional logic should be made clear upfront.
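The sketch below simply restates the documented physical rule from the late-fee example in Python so the intent is unambiguous; in PowerCenter this logic would live in an Expression transformation. The field names come from the example above, and the sample rows are invented.

```python
# Illustrative restatement of the documented physical rule; in PowerCenter this
# logic would live in an Expression transformation. Sample rows are invented.
def late_fee(row: dict) -> float:
    """For customers with an ACTIVE_FLAG of "1", multiply DAYS_LATE by LATE_DAY_RATE."""
    if row["ACTIVE_FLAG"] == "1":
        return row["DAYS_LATE"] * row["LATE_DAY_RATE"]
    return 0.0

rows = [
    {"ACTIVE_FLAG": "1", "DAYS_LATE": 12, "LATE_DAY_RATE": 1.50},
    {"ACTIVE_FLAG": "0", "DAYS_LATE": 30, "LATE_DAY_RATE": 1.50},
]
for r in rows:
    print(late_fee(r))   # 18.0 for the active customer, 0.0 otherwise
```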

The completed mapping design should then be reviewed with one or more team members for completeness and adherence to the business requirements. The design document should be updated if the business rules change or if more information is gathered during the build process. The mapping and reusable object detailed designs are a crucial input for building the data integration processes, and can also be useful for system and unit testing. The specific details used to build an object are useful for developing the expected results to be used in system testing.

For Data Migrations, the mappings are often very similar for some of the stages, such as populating the reference data structures, acquiring data from the source, loading the target and auditing the loading process. In these cases, it is likely that a detailed 'template' is documented for these mapping types. For mapping-specific alterations, such as converting data from source to target format, individual mapping designs may be created. This strategy reduces the sheer documentation required for the project, while still providing sufficient detail to develop the solution.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:26

Phase 5: Build
Subtask 5.4.6 Build Mappings & Reusable Objects

Description
With the analysis and design steps complete, the next priority is to put everything together and build the data integration processes, including the mappings and reusable objects. By this point, most reusable objects should have been identified, although the need for additional objects may become apparent during the development work. Commonly-used objects should be put into a shared folder to allow for code reuse via shortcuts.

The mapping building process also requires adherence to naming standards, which should be defined prior to beginning this step. Developing, and consistently using, naming standards helps to ensure clarity and readability for the original developer and reviewers, as well as for the maintenance team that inherits the mappings after development is complete. Accurate, thorough documentation helps to ensure good knowledge transfer and is critical to project success.

Once the mapping is completed, a session must be made for the mapping in Workflow Manager. A unit testing session can be created initially to test that the mapping logic is executing as designed. To identify and troubleshoot problems in more detail, the debug feature may be leveraged; this feature is useful for looking at the data as it flows through each transformation. Once the initial session testing proves satisfactory, pre- and post-session processes and session parameters should be incorporated and tested (if needed), so that the session and all of its processes are ready for unit testing.

In addition to building the mappings, this subtask involves updating the design documents to reflect any changes or additions found necessary to the original design.

Prerequisites
None

Roles
Data Integration Developer (Primary)
Database Administrator (DBA) (Secondary)

Considerations
Although documentation for building the mapping already exists in the design document, it is extremely important to document the sources, targets, and transformations in the mapping at this point to help end users understand the flow of the mapping and ensure effective knowledge transfer.

Importing the sources and targets is the first step in building a mapping. The design documents may specify that data can be obtained from numerous sources, including DB/2, Oracle, Sybase, SQL Server, Informix, ASCII/EBCDIC flat files (including OCCURS and REDEFINES), Enterprise Resource Planning (ERP) applications, and mainframes via PowerExchange data access products. Although the targets and sources are determined during the Design Phase, the keys, fields, and definitions should be verified in this subtask to ensure that they correspond with the design documents.

TIP
When data modeling or database design tools (e.g., CA ERwin, Oracle Designer/2000, or Sybase PowerDesigner) are used in the design phase, Informatica PowerPlugs can be helpful for extracting the data structure definitions of sources and targets. Metadata Exchange for Data Models extracts table, column, index and relationship definitions, as well as descriptions from a data model. This can save significant time because the PowerPlugs also import documentation and help users to understand the source and target structures in the mapping. For more information about Metadata Exchange for Data Models, refer to Informatica's web site (www.informatica.com) or the Metadata Exchange for Data Models manuals.

The design documents may also define the use of target schema and specify numerous ways of creating the target schema. Specifically, target schema may be created:

● From scratch.

● From a default schema that is then modified as desired.
● By reverse-engineering the target from the database.
● With the help of the Cubes and Dimensions wizard (for multidimensional data models).
● With Metadata Exchange for Data Models.

TIP
When creating sources and targets in PowerCenter Designer, be sure to include a description of the source/target in the object's comment section, and follow the appropriate naming standards identified in the design documentation (for additional information on source and target objects, refer to the PowerCenter User Guide).

Other types of reusable objects, such as reusable transformations, can also be very useful in mapping. Reusable objects are useful when standardized logic is going to be used in multiple mappings. Changes to a reusable transformation are reflected immediately in all mappings that employ the transformation. Reusable transformations can be built in either of two ways:

● If the design specifies that a transformation should be reusable, it can be created in the Transformation Developer, which automatically creates reusable transformations.
● If shared logic is not identified until it is needed in more than one mapping, transformations created in the Mapping Designer can be designated as reusable in the Edit Transformation dialog box. Informatica recommends using this method with care, however, because after a transformation is changed to reusable, the change cannot be undone.

A single reusable object is referred to as a mapplet. Mapplets represent a set of transformations and are constructed in the Mapplet Designer, much like creating a "normal" mapping. When mapplets are used in a mapping, they encapsulate logic into a single transformation object, making the flow of a mapping easier to understand. When reusable transformations are used with mapplets, they facilitate the overall mapping maintenance. Because the mapplets hide their underlying logic, it is particularly important to carefully document their purpose and function.

When all the transformations are complete, everything must be linked together (as specified in the design documentation) and arrangements made to begin unit testing.

Best Practices
Data Connectivity using PowerCenter Connect for Web Services
Development FAQs
Using Parameters, Variables and Parameter Files
Using Shortcut Keys in PowerCenter Designer

Sample Deliverables
None

Last updated: 17-Oct-07 17:23

Phase 5: Build
Subtask 5.4.7 Perform Unit Test

Description
The success of the solution rests largely on the integrity of the data available for analysis. If the data proves to be flawed, the solution initiative is in danger of failure. Complete and thorough unit testing is, therefore, essential to the success of this type of project. Experienced developers are, however, quick to point out that data integration solutions and the presentation layers should be subject to more rigorous testing than transactional systems. To underscore this point, consider which poses a greater threat to an organization: sending a supplier an erroneous purchase order, or providing a corporate vice president with flawed information about that supplier's ranking relative to other strategic suppliers?

Prerequisites
None

Roles
Business Analyst (Review Only)
Data Integration Developer (Primary)

Considerations
Successful unit testing examines any inconsistencies in the transformation logic and ensures correct implementation of the error handling strategy. Within the presentation layer, there is always a risk of performing less than adequate unit testing. This is due primarily to the iterative nature of development and the ease with which a prototype can be deployed.

The first step in unit testing is to build a test plan (see Unit Test Plan). The test plan should briefly discuss the coding inherent in each transformation of a mapping and elaborate on the tests that are to be conducted. These tests should be based upon the business rules defined in the design specifications rather than on the specific code being tested.

If unit tests are based only upon the code logic, they run the risk of missing inconsistencies between the actual code and the business rules defined during the Design Phase.

If the transformation types include data quality transformations (that is, transformations designed on the Data Quality Integration transformation that links to Informatica Data Quality (IDQ) software), then the data quality processes (or plans) defined in IDQ are also candidates for unit testing. Good practice holds that all data quality plans that are going to be used on project data, whether as part of a PowerCenter transformation or a discrete process, should be tested before formal use on such data. Consider establishing a discrete unit test stage for data quality plans.

A detailed test script is essential for unit testing. The test scripts indicate the transformation logic being tested by each test record and should contain an expected result for each record. Test data should be available from the initial loads of the system. Depending on volumes, a sample of the initial load may be appropriate for development and unit testing purposes. It is important to use actual data in testing since test data does not necessarily cover all of the anomalies that are possible with true data. However, depending upon the quality of the actual data used, it may be necessary to create test data in order to test any exception, error, and/or value threshold logic that may not be triggered by actual data. Creating test data can be very time consuming. While it is possible to analyze test data without tools, there are many good tools available for creating and manipulating test data; some are useful in editing data in a flat file, and most offer some improvements in productivity.
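As a simple illustration of executing such a test script, the sketch below compares actual row-level results against the expected results recorded for each test record and reports any mismatches for the Defect Log. The record keys and values are invented for the example.

```python
# Minimal sketch: compare actual unit-test output against the expected results
# recorded in the test script. Keys and values are invented for the example.
expected = {101: {"TOTAL_COST": 250.00}, 102: {"TOTAL_COST": 0.00}}
actual   = {101: {"TOTAL_COST": 250.00}, 102: {"TOTAL_COST": 12.50}}

defects = []
for key, exp_row in expected.items():
    act_row = actual.get(key)
    if act_row != exp_row:
        defects.append((key, exp_row, act_row))

if defects:
    for key, exp_row, act_row in defects:
        print(f"DEFECT record {key}: expected {exp_row}, got {act_row}")
else:
    print("All test records matched the expected results.")
```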

TIP
Session log tracing can be set at a mapping's transformation level, in a session's "Mapping" tab, or in a session's "Config Object" tab. If you change the tracing level in the mapping itself, you will have to go back and modify the mapping after the testing has been completed. If you override tracing in a session's "Config Object" tab properties, this will affect all transformation objects in the mapping and potentially create a significantly larger session log to parse. For testing, it is generally good practice to override logging in a session's "Mapping" tab transformation properties. This focuses the log file on the unit test at hand. For instance, if you are testing the logic performed in a Lookup transformation, create a test session and only activate verbose data logging on the appropriate Lookup.

It is also advisable to activate the test load option in the PowerCenter session properties and indicate the number of test records that are to be sourced. This ensures that the session does not write data to the target tables. Running the mapping in the Debugger also allows you to view the target data without the session writing data to the target tables. The ability to change the data running through the mapping while in debug mode is an extremely valuable tool because it allows you to test all conditions and logic as you step through the mapping.

The first session should load test data into empty targets. After running a test session, analyze and document the actual results compared to the expected results outlined in the test script. After checking for errors from the initial load, a second run of test data should occur if the business requirements demand periodic updates to the target database. You can then document the actual results as compared to the expected results outlined in the test script. A thorough unit test should uncover any transformation flaws and document the adjustments needed to meet the data integration solution's business requirements.

Best Practices
None

Sample Deliverables
Defect Log
Defect Report
Test Condition Results

Last updated: 01-Feb-07 18:47

Phase 5: Build
Subtask 5.4.8 Conduct Peer Reviews

Description
Peer review is a powerful technique for uncovering and resolving issues that otherwise would be discovered much later in the development process (i.e., during testing), when the cost of fixing is likely to be much higher. The main types of object that can be subject to formal peer review are documents, code, and configurations.

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Quality Assurance Manager (Primary)

Considerations
The peer review process encompasses several steps, which vary depending on the object (i.e., document, code, etc.) being reviewed. In general, the process should include these steps:

● When an author confirms that an object has reached a suitable stage for review, he or she communicates this to the Quality Assurance Manager, who then schedules a review meeting.
● The number of reviewers at the meeting depends on the type of review, but should be limited to the minimally acceptable number. For example, the review meeting for a design document may include the business analyst who specified the requirements, a design authority, and one or two technical experts in the particular design aspects. It is a good practice to select reviewers with a direct interest in the deliverable; for example, the DBA should be involved in reviewing the logical data model to ensure that he/she has sufficient information to conduct the physical design.

reviewers should look at the object point-by-point and note any defects found in the Defect Log. For example. Although this can be estimated when the defect is originally noted. The ‘benefit’ of such reviews is the potential time saved. it should be rated as 'high impact'. ● The author or Quality Assurance Manager should lead the meeting to ensure that it is structured and stays on point. the DBA should be involved in reviewing the logical data model to ensure that he/she has sufficient information to conduct the physical design. Metrics can be used to help in tracking the value of the review meetings. appropriate documents. the Quality Assurance Manager may decide to conduct an informal mini-review after the defects are corrected to ensure that all problems have been appropriately rectified. code. which can be implemented across the project or.It is a good practice to select reviewers with a direct interest in the deliverable. If no net benefit is obtained from the peer reviews.Data Warehousing 343 of 1017 . one day for a medium-impact defect and two days for a high-impact defect. ● ● ● There are two main factors to consider when rating the ‘impact’ of defects discovered during peer review. in specific areas of the project. The meeting should not be allowed to become bogged down in resolving defects. and review checklist should be distributed prior to the review meeting to allow preparation. the Quality Assurance Manager should schedule another review meeting with the same review team to ensure that all defects are corrected. ● If possible. more likely. and the subsequent re-work. Trivial items such as spelling or formatting errors should not be recorded in the log (to avoid ‘clutter’). the effect on functionality and the saving in rework time. If the initial review meeting identifies a significant amount of required rework. Best Practices None INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The ‘cost’ of formal peer reviews is the man-time spent on meeting preparation. but should reach consensus on rating the object using a High/Medium/Low scale. Adding up the benefit in man-days allows a direct comparison with ‘cost’. This can be recorded in man-days. It may be better to assign a notional ‘benefit’ – say two hours for a low-impact defect. the review meeting itself. the Quality Assurance Manager should investigate a less intensive review regime. During the meeting. If a defect would result in a significant functional deficiency. If the number and impact of defects is small. such estimates are unlikely to be reliable. or large amount of rework later in the project.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:47

Phase 5: Build

Task 5.5 Populate and Validate Database

Description
This task bridges the gap between unit testing and system testing. The objective here is to eliminate any possible errors in the system test that relate directly to the load process. After unit testing is complete, the sessions for each mapping must be ordered so as to properly execute the complete data migration from source to target. Creating workflows containing sessions and other tasks with the proper execution order does this. The tasks within the workflows should be organized so as to achieve an optimum load in terms of data quality and efficiency. By incorporating link conditions and/or decision tasks into workflows, the execution order of each session or task is very flexible. Additionally, event raises and event waits can be incorporated to further develop dependencies.

When this task is completed, the development team should have a completely organized loading model that it can use to perform a system test. The final product of this task - the completed workflow(s) - is not static, however. Since the volume of data used in production may differ significantly from the volume used for testing, it may be necessary to move sessions and workflows around to improve performance.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Integration Developer (Primary)
Technical Project Manager (Review Only)
Test Manager (Approve)

Considerations
At a minimum, this task requires a single instance of the target database(s). Also, while data may not be required for initial testing, the structure of the tables must be identical to those in the operational database(s). Additionally, consider putting all mappings to be tested in a single folder. This will allow them to be executed in the same workflows and reordered to assess optimum performance.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:47

Phase 5: Build

Subtask 5.5.1 Build Load Process

Description
Proper organization of the load process is essential for achieving two primary load goals:

● Maintaining dependencies among sessions and workflows, and
● Minimizing the load window

Maintaining dependencies between sessions, worklets and workflows is critical for correct data loading; lack of dependency control results in incorrect or missing data. Minimizing the load window is not always as important; this is dependent primarily on load volumes, hardware, and available load time.

Prerequisites
None

Roles
Business Analyst (Review Only)
Data Integration Developer (Primary)
Technical Project Manager (Review Only)

Considerations
The load development process involves the following five steps:

1. Clearly define and document all dependencies
2. Analyze and document the load volume
3. Analyze the processing resources available
4. Develop operational requirements such as notifications, external processes and timing

5. Develop tasks, worklets and workflows based on the results

If the volume of data is sufficiently low for the available hardware to handle, you may consider volume analysis optional, developing the load process solely on the dependency analysis. Also, if the hardware is not adequate to run the sessions concurrently, you will need to prioritize them. The highest priority within a group is usually assigned to sessions with the most child dependencies.

Another possible component to add into the load process is sending e-mail. Three e-mail options are available for notification during the load process:

● Post-session e-mails can be sent after a session completes successfully or when it fails
● E-mail tasks can be placed in workflows before or after an event or series of events
● E-mails can be sent when workflows are suspended

When the integrated load process is complete, it should be subject to unit test. This is true even if all of the individual components have already been subjected to unit test. With unit test data, the staff members who perform unit testing should be able to easily identify major errors when the system is placed in operation. The larger volumes associated with an actual operational run would be likely to hamper validation of the overall process.

The Load Dependency Analysis should list all sessions, in order of their dependency, together with any other events (Informatica or other) on which the sessions depend. The analysis must clearly document the dependency relationships between each session and/or event, the algorithm or logic needed to test the dependency conditions during execution, and the impact of any possible dependency test results (e.g., do not run a session, fail a session, fail a parent or worklet, etc.). The load dependency documentation would, for example, follow this format:

● The first set of sessions or events listed in the analysis (Group A) would be those with no dependencies.

● The second set listed (Group B) would be those with a dependency on one or more sessions or events in the first set (Group A). Against each session in this list, the following information would be included:
  - Dependency relationships (e.g., Completed by (time), Succeed, Fail, etc.)
  - Action (e.g., Fail, do not run, fail parent)
  - Notification (e.g., e-mail)
● The third set (Group C) would be those with a dependency on one or more sessions or events in the second set (Group B). Against each session in this list, similar dependency information as above would be included.
● The listing would be continued in the document until all sessions are included.

Analyzing Load Volumes
The Load Volume Analysis should list all the sources, in addition to the extract sources. This should include the sources for all lookup transformations, as the amount of data that is read to initialize a lookup cache can materially affect the initialization and total execution time of a session. The Load Volume Analysis should also list sessions in descending order of processing time, estimated based on these factors (i.e., source row counts and row widths, number and volume of lookups in the mappings). Against each session in this list, the number of rows extracted and the number of rows loaded expected for each session would be included.

For Data Migration projects, the final load processes are the set of load scripts, scheduling objects, or master workflows that will be executed for the data migration. It is recommended to keep all load scripts/schedules/master workflows to a minimum, as the execution of each will become a given line item on the migration punchlist. It is important that developers develop with a load plan in mind so that these load procedures can be developed quickly, as they are often developed late in the project development cycle when time is in short supply.
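The Group A/B/C structure described earlier in this subtask is essentially a level-by-level ordering of the session dependency graph. The sketch below is an illustrative aid only, not part of the methodology; the session names and the Python representation are assumptions. It shows one way to derive those groups automatically from a documented dependency list.

# Illustrative: derive Group A, B, C, ... from a session dependency list.
# Keys are sessions; values are the sessions/events they depend on (assumed names).
dependencies = {
    "s_load_customers": [],
    "s_load_products": [],
    "s_load_orders": ["s_load_customers", "s_load_products"],
    "s_agg_sales": ["s_load_orders"],
}

def dependency_groups(deps):
    """Return groups: Group A has no dependencies, Group B depends only on A, etc."""
    remaining = dict(deps)
    resolved, groups = set(), []
    while remaining:
        group = sorted(s for s, d in remaining.items() if set(d) <= resolved)
        if not group:
            raise ValueError("Circular dependency detected: " + ", ".join(remaining))
        groups.append(group)
        resolved.update(group)
        for s in group:
            del remaining[s]
    return groups

for label, group in zip("ABCDEFG", dependency_groups(dependencies)):
    print(f"Group {label}: {', '.join(group)}")

Such a check can also flag circular dependencies in the documentation before any workflows are built.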

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:28

Phase 5: Build

Subtask 5.5.2 Perform Integrated ETL Testing

Description
The task of integration testing is to check that components in a software system or, one step up, software applications at the company level, interact without error. There are a number of strategies that can be employed for integration testing; two examples are as follows:

● Integration testing based on business processes. In this strategy, tests examine all the system components affected by a particular business process. For example, one set of tests might cover the processing of a customer order, from acquisition and registration through to delivery and payment. Additional business processes are incorporated into the tests until all system components or applications have been sufficiently tested.
● Integration testing based on test objectives. For instance, a test objective might be the integration of system components that use a common interface. In this strategy, tests would be defined based on the interface.

These two strategies illustrate that the ETL process is merely part of the equation rather than the focus of it. It is still important to take note of the ETL load so as to ensure that such aspects as performance and data quality are not adversely affected.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Integration Developer (Primary)
Technical Project Manager (Review Only)
Test Manager (Approve)

Considerations
Although this is a minor test from an ETL perspective, it is crucial for the ultimate goal of a successful process implementation. It is a good practice to keep the Load Dependency Analysis and Load Volume Analysis in mind during this testing, particularly if the process identifies a problem in the load order. Any deviations from those analyses are likely to cause errors in the loaded data. Primary proofing of the testing method involves matching the number of rows loaded to each individual table.

The final product of this subtask, the Final Load Process document, is the layout of workflows, worklets, and session tasks that will achieve an optimal load process. This document will differ from that generated in the previous subtask, 5.5.1 Build Load Process, to represent the current actual result. The Final Load Process document orders workflows, worklets, and session tasks in such a way as to maintain the required dependencies while minimizing the overall load window. However, this layout is still dynamic and may change as a result of ongoing performance testing.

Tip
The Integration Test Percentage (ITP) is a useful tool that indicates the percentage of the project's source code that has been unit and integration tested. The formula for ITP is:

ITP = 100% * Transformation Objects Unit Tested / Total Objects

As an example, this table shows the number of transformation objects for four mappings:

Mapping    Trans. Objects
M_ABC      15
M_DEF      3
M_GHI      24
M_JKL      7

If mapping M_ABC is the only one unit tested, the ITP is:

ITP = 100% * 15 / 49 = 30.61%

If mapping M_DEF is the only one unit tested, the ITP is:

ITP = 100% * 3 / 49 = 6.12%

If mappings M_GHI and M_JKL are unit tested, the ITP is:

ITP = 100% * (24 + 7) / 49 = 100% * 31 / 49 = 63.27%

And if all modules are unit tested, the ITP is:

ITP = 100% * 49 / 49 = 100%
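A minimal sketch of the ITP calculation follows. It is illustrative only; the mapping names and counts come from the example table above, and the Python representation is an assumption rather than a project deliverable.

# Illustrative ITP (Integration Test Percentage) calculation.
# Transformation object counts per mapping, from the example table above.
objects_per_mapping = {"M_ABC": 15, "M_DEF": 3, "M_GHI": 24, "M_JKL": 7}

def itp(tested_mappings, objects_per_mapping):
    """ITP = 100% * transformation objects unit tested / total objects."""
    total = sum(objects_per_mapping.values())
    tested = sum(objects_per_mapping[m] for m in tested_mappings)
    return 100.0 * tested / total

print(round(itp(["M_ABC"], objects_per_mapping), 2))                  # 30.61
print(round(itp(["M_GHI", "M_JKL"], objects_per_mapping), 2))         # 63.27
print(round(itp(list(objects_per_mapping), objects_per_mapping), 2))  # 100.0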

The ITP metric provides a precise measurement as to how much unit and integration testing has been done. A unit may be defined as an individual function, a group of functions, or an entire Computer Software Unit (which can be several thousand lines of code); on actual projects, the definition of a unit can vary. The ITP metric is not based on the definition of a unit. Instead, the ITP metric is based on the actual number of transformation objects tested with respect to the total number of transformation objects defined in the project.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 5: Build

Task 5.6 Build Presentation Layer

Description
The objective of this task is to develop the end-user analysis, using the results from 4.4 Design Presentation Layer, the Presentation Layer Design document, which is the final result of the Design Phase and incorporates all efforts completed during that phase. This document provides the necessary specifications for building the front-end application for the user community.

The Build Presentation Layer task consists of two subtasks which may need to be performed iteratively several times:

1. Developing the end-user presentation layer
2. Presenting the end-user presentation layer to business analysts to elicit and incorporate their feedback

This task incorporates both development and unit testing. Throughout the Build Phase, the developers should refer to the deliverables produced during the Design Phase. These deliverables include a working prototype, metadata design framework and, most importantly, end user feedback. The result of this task should be a final presentation layer application that satisfies the needs of the organization.

While this task may run in parallel with the building of the data integration processes, data is needed to validate the results of any presentation layer queries. This task cannot, therefore, be completed before tasks 5.4 Design and Develop Data Integration Processes and 5.5 Populate and Validate Database. Test data will be available from the initial loads of the target system, without the added effort of fabricating test data. Depending on volumes, a sample of the initial load may be appropriate for development and unit testing purposes. This sample data set can be used to assist in building the presentation layer and validating reporting results.

Prerequisites
None

Roles

Business Analyst (Primary)
Presentation Layer Developer (Primary)
Project Sponsor (Approve)
Technical Project Manager (Review Only)

Considerations
The development of the presentation layer includes developing interfaces and predefined reports to provide end users with access to the data. It is important that data be available to validate the accuracy of the development effort. Having end users available to review the work-in-progress is an advantage, enabling developers to incorporate changes or additions early in the review cycle.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 5: Build

Subtask 5.6.1 Develop Presentation Layer

Description
By the time you get to this subtask, all the design work should be complete, making this subtask relatively simple. Now is the time to put everything together and build the actual objects such as reports, alerts and indicators. Keep in mind that you have to create a report no matter what the final form of the information delivery is. In other words, the indicators and alerts are derived off a report and hence your first task is to create a report.

Prerequisites
None

Roles
Presentation Layer Developer (Primary)

Considerations
During the Build task, it is good practice to verify and review all the design options and to be sure to have a clear picture of what the goal is. During the build, it is important to follow any naming standards that may have been defined during the design stage, in addition to the standards set on layouts, formats, etc. Also, keep detailed documentation of these objects during the build activity. This will ensure proper knowledge transfer and ease of maintenance, in addition to improving the readability for everyone. After an object is built, thorough testing should be performed to ensure that the data presented by the object is accurate and the object is meeting the performance that is expected.

The principles for this subtask also apply to metadata solutions providing metadata to end users.

The following considerations should be taken into account while building any piece of information delivery:

Step 1: What measurements do I want to display?
The measurements, which are called metrics in the BI terminology, are perhaps the most important part of the report. Begin the build task by selecting your metrics, unless you are creating an Attribute-only Report. Add all the metrics that you want to see on the report and arrange them in the required order. Optionally, you can choose a Time Key that you want to use as well for each metric. It is important to thoroughly analyze the end user's requirements and expectations prior to adding the Time Settings to reports. Time setting preferences can vastly differ from one user's requirement to that of another. One group of users may be interested just in the current data while another group may want to compare the trends and patterns over a period of time.

Step 2: What parameters should I include?
The metrics are always measured against a set of predefined parameters, which are called Attributes in the BI terminology. Select these parameters, just like the metrics, and add them to the Report (unless you are creating a Metric-only Report). You can add Prompts and Time Keys for the attributes too.

Tip
Create a query for metrics and attributes. This will help in searching for the specific metrics or attributes much faster than manually searching in a pool of hundreds of metrics and attributes.

Step 3: What are my data limiting criteria for this report?
Now that you have selected all the data elements that you need in the report, it is time to make sure that you are delivering only the relevant data set to the end users. Make sure to use the right Filters and Ranking criteria to accomplish this in the report. Consider using Filtersets instead of just Filters so that important criteria limiting the data sets can be standardized over a project or department. You can add a prompt to the report if you want to make it more generic over, for example, time periods or product categories.
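The metric, attribute and filter choices above can be captured as a simple, tool-neutral report specification before they are entered into the BI tool. The sketch below is purely illustrative; the field names and example values are assumptions and do not correspond to any particular product's API.

# Illustrative, tool-neutral report specification (not a real BI-tool API).
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReportSpec:
    name: str
    metrics: List[str]                                   # Step 1: measurements to display
    attributes: List[str]                                # Step 2: parameters measured against
    filters: List[str] = field(default_factory=list)     # Step 3: data limiting criteria
    prompts: List[str] = field(default_factory=list)     # optional run-time prompts
    time_key: str = "calendar_month"

revenue_report = ReportSpec(
    name="Monthly Revenue by Region",
    metrics=["revenue", "margin"],
    attributes=["region", "product_category"],
    filters=["fiscal_year = current"],
    prompts=["region"],
)
print(revenue_report)

Writing the specification down in this way makes it easier for the developer and the business analyst to agree on the report's content before any formatting or delivery decisions are made.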

Step 4: How should I format the report?
Presenting the information to the end user in an appealing format is as important as presenting the right data. A good portion of the formatting should be decided during the Design phase. However, you can consider the following points while formatting the reports:

Table report type: The data in the report can be arranged in one of the following three table types: tabular, cross tabular, or sectional. Select the one that suits the report the best.

Chart or graph: A picture is worth a thousand words. A chart or graph can be very useful when you are trying to make a comparison between two or more time periods, regions, or product categories.

Data sort order: Arrange the data such that the pattern makes it easy to find any part of the information that one is interested in.

Step 5: How do I deliver the information?
Once the report is ready, you should think about how the report should be delivered. In doing so, be sure to address the following points:

Where should the report reside? – Select a folder that is most suited for the data that the report contains. If the report is shared by more than one group of users, you may want to save it in a shared folder.

Who should get the report, and how should they get it? – Make sure that proper security options are implemented for each report. There may be sensitive and confidential data that you want to ensure is not accessible by unauthorized users.

When should the report be refreshed? – You can choose to run the report on-demand or schedule it to be automatically refreshed at regular intervals. Ad-hoc reports that are of interest to a smaller set of individuals are usually run on-demand. However, the bulk of the reports that are viewed regularly by different business users need to be scheduled to refresh periodically. The refresh interval should typically consider the period for which the business users are likely to consider the data 'current' as well as the frequency of data change in the data warehouse. Occasionally, there will be a requirement to see the data in the report as soon as the data changes in the data warehouse (and data in the warehouse may change very frequently). You can handle situations like this by having the report refresh at 'real-time'.

Special requirements – You should consider any special requirements a report may have at this time, such as whether the report needs to be broadcast to users, whether there is a need to export the data in the report to an external format, etc. Based on these requirements, you can make minor changes in the report as necessary.

Packing More Power into the Information
Adding certain features to the report can make it more useful for everybody. Consider the following for each report that you build:

Title of the report: The title of the report should reflect what the report contents are meant to convey. Rarely, it may become a tough task to name a report very accurately if the same report is viewed in two different perspectives by two different sets of users. You may consider making a copy of the report and naming the two instances to suit each set of users.

Comments and description: Comments and Descriptions make the reports more easily readable as well as helping when searching for a report.

Keywords: It is not uncommon to have numerous reports pertaining to the same business area residing in the same location. Including keywords in the report setup will assist users in searching for a report more easily.

Highlighters: It may also be a good idea to use highlighters to make critical pieces of information more conspicuous in the report.

Drill paths: Check to make sure that the required drill paths are set up. If you don't find a drill path that you think is useful for this report, you may have to contact the Administrator and have it set up for you.

Analytic workflows: Analytic workflows make the information analysis process as a whole more robust. Add the report to one or more analytic workflows so that the user can get additional questions answered in the context of a particular report's data.

Indicator Considerations
After the base report is complete, you can build indicators on top of that report. You can use chart, table or gauge indicators. First, you will need to determine and select the type of indicator that best suits the primary purpose. Remember that there are several types of chart indicators as well as several different gauge indicators to choose from.

When creating indicators, consider the following to help decide what types of indicators to use:

Do you want to display information on one specific metric? Gauge indicators allow you to monitor a single metric and display whether or not that indicator is within an acceptable range. For example, you can create a gauge indicator to monitor the revenue metric value for each division of your company. When you create a gauge indicator, you have to determine and specify three ranges (low, medium, and high) for the metric value. Additionally, you have to decide how the gauge should be displayed: circular, flat, or digital.

Do you want to display information on multiple metrics? If you want to display information for one or more attributes or multiple metrics, you can create either chart or table indicators. If you choose chart indicators, you have more than a dozen different types of charts to choose from (standard bar, pie, stacked line, etc.). Furthermore, if you'd like to see a subset of an actual report in a table view, including sum calculations, choose a table indicator.

Alert Considerations
Alerts are created when something important is occurring, such as falling revenue or record-breaking sales. An alert may go on a business unit's dashboard or a personal dashboard. Once you find out what is important to the users, you can define the Alert rules. Consider the following:

What are the important business occurrences? These answers will come from discussions with your users.

Who should receive the alert? It is important that the alert is delivered to the appropriate audience.

How should the alert be delivered? Once the appropriate Alert receiver is identified, you must determine the proper delivery device. If the user doesn't log into Power Analyzer on a daily basis, maybe an e-mail should be sent. If the alert is critical, a page could be sent. Make sure that the required delivery device (i.e., e-mail, phone, fax, or pager) has been registered in the BI tool.
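As a purely illustrative sketch of the gauge-range and alert-delivery decisions described above (the thresholds, metric values and delivery rules are assumptions, not product behaviour):

# Illustrative gauge/alert logic; thresholds and delivery rules are assumed examples.
GAUGE_RANGES = {"low": (0, 1_000_000), "medium": (1_000_000, 5_000_000),
                "high": (5_000_000, float("inf"))}

def gauge_range(metric_value):
    """Classify a single metric value into the low/medium/high gauge range."""
    for name, (lower, upper) in GAUGE_RANGES.items():
        if lower <= metric_value < upper:
            return name
    return "unknown"

def alert_delivery(is_critical, logs_in_daily):
    """Pick a delivery device: critical alerts page, infrequent users get e-mail."""
    if is_critical:
        return "pager"
    return "dashboard" if logs_in_daily else "e-mail"

print(gauge_range(750_000))                                     # low
print(alert_delivery(is_critical=True, logs_in_daily=False))    # pager

Agreeing thresholds and delivery rules with the business in this explicit form makes the indicator and alert definitions easier to review before they are built in the tool.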

Testing and Performance
Thorough testing needs to be performed on the report/indicator/alert after it is built to ensure that you are presenting accurate and desired information. Try to make sure that the individual rows, as well as aggregate values, have accurate numbers and are reported against the correct attributes.

Always keep performance of the reports in mind. If a report takes too long to generate data, then you need to identify what is causing the bottleneck and eliminate or reduce it. The following points are worth remembering:

● Complex queries, especially against dozens of tables, can make a well-designed data warehouse look inefficient.
● Multi-pass SQL is supported by Data Analyzer.
● Indexing is important, even in simple star schemas.

Tip
You can view the query in your report if your report is taking a long time to get the data from the source system. Copy the query and evaluate it, to make sure that it is optimized, by running utilities such as Explain Plan on the query in Oracle.
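One possible way to script such a check is sketched below. It is only a sketch: it assumes the cx_Oracle driver, placeholder connection details, and that the report's SQL has been pasted into report_sql; none of these details come from the methodology itself.

# Illustrative sketch: run Explain Plan for a copied report query in Oracle.
# Assumes the cx_Oracle driver and placeholder connection details.
import cx_Oracle

report_sql = "SELECT region, SUM(revenue) FROM sales_fact GROUP BY region"  # copied from the report

conn = cx_Oracle.connect(user="dw_user", password="secret", dsn="dwhost/dwsvc")
cur = conn.cursor()
cur.execute("EXPLAIN PLAN FOR " + report_sql)                     # populate PLAN_TABLE
cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
for (line,) in cur:                                               # print the execution plan
    print(line)
conn.close()

Reviewing the resulting plan (full table scans, missing index usage, and so on) usually points directly at the bottleneck to be addressed.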

Best Practices
None

Sample Deliverables
None

Last updated: 18-Oct-07 15:05

Phase 5: Build

Subtask 5.6.2 Demonstrate Presentation Layer to Business Analysts

Description
After the initial development effort, the development team should present the presentation layer to the Business Analysts to elicit and incorporate their feedback. This approach helps the developers to gather and incorporate valuable user feedback and enables the end users to validate or clarify the interpretation of their requirements prior to the release of the end product, thereby ensuring that the end result meets the business requirements.

Prerequisites
None

Roles
Business Analyst (Primary)
Presentation Layer Developer (Primary)
Project Sponsor (Approve)
Technical Project Manager (Review Only)

Considerations
Demonstrating the presentation layer to the business analysts should be an iterative process that continues throughout the Build Phase. When educating the end users about the front end tool, whether a business intelligence tool or an application, it is important to focus on the capabilities of the tool and the differences between typical reporting environments and solution architectures. When end users thoroughly understand the capabilities of the front end that they will use, they can offer more relevant feedback.

The Project Manager must play an active role in the process of accepting and prioritizing end user requests. The Project Manager needs to work closely with the developers and analysts to prioritize the requests based upon the availability of source data to support the end users' requests and the level of effort necessary to incorporate the changes into the initial (or current) release. While the initial release of the presentation layer should satisfy user requirements, in an iterative approach, some of the additional requests may be implemented in future releases to avoid delaying the initial release.

In addition, the Project Manager must communicate regularly with the end users to set realistic expectations and establish a process for evaluating and prioritizing feedback. The Project Manager also needs to clearly communicate release schedules and future development plans, including specifics about the availability of new features or capabilities, to the end-user community. This type of communication helps to avoid end-user dissatisfaction, particularly when some requests are not included in the initial release.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

6 Test
  6.1 Define Overall Test Strategy
    6.1.1 Define Test Data Strategy
    6.1.2 Define Unit Test Plan
    6.1.3 Define System Test Plan
    6.1.4 Define User Acceptance Test Plan
    6.1.5 Define Test Scenarios
    6.1.6 Build/Maintain Test Source Data Set
  6.2 Prepare for Testing Process
    6.2.1 Prepare Environments
    6.2.2 Prepare Defect Management Processes
  6.3 Execute System Test
    6.3.1 Prepare for System Test
    6.3.2 Execute Complete System Test
    6.3.3 Perform Data Validation
    6.3.4 Conduct Disaster Recovery Testing
    6.3.5 Conduct Volume Testing
  6.4 Conduct User Acceptance Testing
  6.5 Tune System Performance
    6.5.1 Benchmark
    6.5.2 Identify Areas for Improvement
    6.5.3 Tune Data Integration Performance
    6.5.4 Tune Reporting Performance

Phase 6: Test

Description
The diligence with which you pursue the Test Phase of your project will inevitably determine its acceptance by its end-users, and therefore, its success against its business objectives. During the Test Phase you must essentially validate that your system accomplishes everything that the project objectives and requirements specified and that all the resulting data and reports are accurate. Test is also a critical preparation against any eventuality that could impact your project, whether that be radical changes to data volumes, spikes in concurrent usage, or disasters that disrupt service for the system in some way.

The Test phase includes the full design of your testing plans and infrastructure as well as two categories of comprehensive system-wide verification procedures: the System Test and the User Acceptance Test (UAT). The System Test is conducted after all elements of the system have been integrated into the test environment. It includes a number of detailed technically-oriented verifications that are managed as processes by the technical team with primarily technical criteria for acceptance. UAT is a detailed user-oriented set of verifications with User Acceptance as the objective. It is typically managed by end-users with participation from the technical team.

Any test cannot be considered complete until there is verification that it has accomplished the agreed-upon Acceptance Criteria. Because of the natural tension that exists between completion of the preset project timeline and completion of Acceptance Criteria (which may take longer than expected), the Test Phase schedule is often owned by a QA Manager or Project Sponsor rather than the Project Manager.

Satisfactory performance and system responsiveness can be a critical element of user acceptance. Velocity includes as a final step in the Test Phase activities related to tuning system performance.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Integration Developer (Primary)

Data Warehouse Administrator (Primary)
Database Administrator (DBA) (Primary)
End User (Primary)
Network Administrator (Primary)
Presentation Layer Developer (Primary)
Project Sponsor (Review Only)
Quality Assurance Manager (Primary)
Repository Administrator (Primary)
System Administrator (Primary)
System Operator (Primary)
Technical Project Manager (Secondary)
Test Manager (Primary)
User Acceptance Test Lead (Primary)

Considerations
To ensure the Test Phase is successful, it must be preceded by diligent planning and preparation. Velocity recommends that this planning process begins, at the latest, during the Design Phase, and that it includes descriptions of timelines, participation, test tools, guidelines and scenarios, as well as detailed Acceptance Criteria. Early on, project leadership and project sponsors should establish test strategies and begin building plans for System Test and UAT.

The Test Phase includes the development of test plans and procedures. It is intended to overlap with the Build Phase, which includes the individual design reviews and unit test procedures.

It is difficult to determine your final testing strategy until detailed design and build decisions have been made in the Build Phase. Thus it is expected that, from a planning perspective, some tasks and subtasks in the Test Phase will overlap with those in the Build Phase and possibly the Design Phase.

By its nature, software development is not always perfect, so some repair and retest should be expected. Any defects or deficiencies discovered must be categorized (severity, criticality, priority), recorded, and weighed against the Acceptance Criteria (AC). The technical team should repair them within the guidelines of the AC, and the results must be retested with the inclusion of satisfactory regression testing. This process has as a prerequisite the development of some type of Defect Tracking System; Velocity recommends that this be developed during the Build Phase. Although formal user acceptance signals the completion of the Test Phase, some of its activities will be revisited, perhaps many times, throughout the operation of the system. The Defect Tracking System must be maintained to record defects and enhancements for as long as the system is supported and used. Test scenarios, regression test procedures, and other testing aids must also be retained for this purpose.

The Test Phase includes other important activities in addition to testing. Performance tuning is recommended as a recurrent process. As data volume grows and the profile of the data changes, performance and responsiveness may degrade. You may want to plan for regular periods of benchmarking and tuning, rather than waiting to be reactive to end-user complaints.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Task 6.1 Define Overall Test Strategy

Description
The purpose of testing is to verify that the software has been developed according to the requirements and design specifications. Planning should include the following components:

● resource requirements and schedule
● construction and maintenance of the test data
● preparation of test materials
● preparation of test environments
● preparation of the methods and control procedures for each of the major tests

Although the major testing actually occurs at the end of the Build Phase, while all of the testing related activities have been consolidated in the Testing phase, determining the amount and types of testing to be performed should occur early in the development lifecycle. This enables project management to allocate adequate time and resources to this activity. This also enables the project to build the appropriate testing infrastructure prior to the beginning of the testing phase. Thus, the beginning of these activities often begins as early as the Design Phase. The detailed object level testing plans are continually updated and modified as the development process continues, since any change to development work is likely to create a new scenario to test.

Typically, there are three levels of testing:

● Unit – Testing of each individual function. For example, with data integration this includes testing individual mappings, stored procedures, UNIX scripts, or other external programs. Ideally, the developer tests all error conditions and logic branches within the code. Performed by: Developer.
● System – System or Integration Testing performed to review the system as a whole as well as its points of integration. Testing may include, but is not limited to, data integrity, reliability, and performance. Performed by: System Test Team.
● User Acceptance – As most data integration solutions do not directly touch end users, User Acceptance Testing should focus on the front-end applications and reports, rather than the load processes themselves. Performed by: User Acceptance Testing Team.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Integration Developer (Primary)
End User (Primary)
Presentation Layer Developer (Primary)
Quality Assurance Manager (Approve)
Technical Project Manager (Approve)

Considerations

None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.1.1 Define Test Data Strategy

Description
Ideally, actual data from the production environment will be available for testing so that tests can cover the full range of possible values and states in the data. However, the full set of production data is often not available. There is also the chicken-and-egg problem of requiring the load of production source data in order to test the load of production source data. Additionally, there is sometimes a risk of sensitive information migrating from production to less-controlled environments (i.e., test); in some circumstances, this may even be illegal.

Theoretically, generated data can be made to be representative and engineered to test all of the project functionality. If generated data is used, the main challenge is to ensure that it accurately reflects the production environment. While the actual record counts in generated tables are likely to differ from production environments, the ratios between tables should be maintained. For example, if there is a one-to-ten ratio between products and customers in the live environment, care should be taken to retain this same ratio in the test environment. Therefore, it is important to understand that with any set of data used for testing, there is no guarantee that all possible exception cases and value ranges will occur in the sub-set of the data used.

Adequate test data can be important for proper unit testing and is critical for satisfactory system and user acceptance tests. The deliverable from this subtask is a description and schedule for how test data will be derived, stored, and migrated to testing environments.

Prerequisites
None

Roles
Business Analyst (Primary)

Data Integration Developer (Secondary)
End User (Primary)
Presentation Layer Developer (Primary)
Quality Assurance Manager (Approve)
Technical Project Manager (Approve)
Test Manager (Primary)

Considerations
In stable environments, there is less of a premium on flexible maintenance of test data structures, and the overhead of developing software to load test data may not be justified. Usually, data for testing purposes is stored in the same structure as the source in the data flow. However, the availability of a data movement tool such as PowerCenter greatly expands the range of options for test data storage and movement. In dynamic environments (i.e., where source and/or target data structures are not finalized), it is also possible to store test data in a format that is geared toward ease of maintenance and to use PowerCenter to transfer the data to the source system format. So if the source is a database with a constantly changing structure, it may be easier to store test data in XML or CSV formats where it can easily be maintained with a text editor. The PowerCenter mappings that load the test data from this source can make use of techniques to insulate (to some degree) the logic from schema changes by including pass-through transformations after source qualifiers and before targets.

It is strongly recommended that the data used for testing is real production data, but most likely of less volume than the production system. By using real production data, the final testing will be more meaningful and increase the level of confidence from the business community, thus making 'go/no-go' decisions easier. For Data Migration, the test data strategy should be focused on how much source data to use rather than how to manufacture test data.
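As a purely illustrative sketch of the "maintain the ratios" guidance for generated test data (the entity names, the 1:10 ratio, the record volumes and the CSV layout are all assumptions):

# Illustrative: generate a small CSV test data set that preserves a 1:10
# product-to-customer ratio from the live environment (assumed figures).
import csv, random

N_PRODUCTS = 20
N_CUSTOMERS = N_PRODUCTS * 10   # preserve the 1:10 ratio, not production volumes

with open("test_products.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["product_id", "product_name"])
    for i in range(1, N_PRODUCTS + 1):
        w.writerow([i, f"Product {i}"])

with open("test_customers.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["customer_id", "customer_name", "favourite_product_id"])
    for i in range(1, N_CUSTOMERS + 1):
        w.writerow([i, f"Customer {i}", random.randint(1, N_PRODUCTS)])

print(f"Generated {N_PRODUCTS} products and {N_CUSTOMERS} customers (1:10 ratio).")

Because the files are plain CSV, they can be maintained with a text editor and loaded through the same PowerCenter mappings described above.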

Best Practices
None

Sample Deliverables
Critical Test Parameters

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.1.2 Define Unit Test Plan

Description
Any distinct unit of development must be adequately tested by the developer before it is designated ready for system test and for integration with the rest of the project elements. This includes any element of the project that can, in any way, be tested on its own. The unit test is the best opportunity to discover any misinterpretation of the design as well as errors of development logic. Rather than conducting unit testing in a haphazard fashion with no means of certifying satisfactory completion, all unit testing should be measured against a specified unit test plan and its completion criteria, and must be validated by the designer as meeting the business and functional requirements and design criteria.

Unit test plans are based on the individual business and functional requirements and detailed design for mappings, reports, or components for the mapping or report. The unit test plans should include specification of inputs, system requirements, and expected outputs and results. The creation of the unit test plan should be a collaborative effort by the designer and the developer. The designer should begin with a test scenario or test data descriptions and include checklists for the required functionality; the developer may add technical tests and make sure all logic paths are covered.

The unit test plan consists of:

● Identification section: unit name, version number, date of build or change, developer, and other identification information.
● References to all applicable requirements and design documents.
● References to all applicable data quality processes (e.g., data analysis, standardization, cleansing, enrichment).
● Specification of test environment (e.g., database/schema to be used).
● Short description of test scenarios and/or types of test runs.
● Per test run:
  - Purpose (what features/functionality are being verified, tests to verify).
  - Prerequisites.

  - Definition of test inputs.
  - References to test data or load-files to be used.
  - Specification (checklist) of the expected outputs: data output, messages, error handling results, etc.
  - Test script (step-by-step guide to executing the test).
● Comments and findings.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Integration Developer (Primary)
Presentation Layer Developer (Primary)
Quality Assurance Manager (Review Only)

Considerations
Reference to design documents should contain the name and location of any related requirements documents, high-level and detailed design, mock-ups, and other applicable documents.

The description of test runs should include the functional coverage and any dependencies between test runs.

Specification of the test environment should include such details as which reference or conversion tables must be used to translate the source data for the appropriate target (e.g., for conversion of postal codes, for key translation, other code translations). It should also include specification of any infrastructure elements or tools to be used in conjunction with the tests.

Prerequisites should include whatever is needed to create the correct environment for the test to take place: any dependencies the test has on completion of other logic or test runs, availability of reference data, adequate space in database or file system, and so forth.

For example. It specifies in detail any output records and fields. one run may be an initial load against an empty target. Test data must contain a mix of correct and incorrect data. and so forth.Data Warehousing 376 of 1017 . ● ● INFORMATICA CONFIDENTIAL Velocity v8 Methodology . availability of reference data. In addition. to complete the actual test run itself. One or more test runs can be specified in a single unit test plan. as well as information about who is assigned to resolve the problem. and any functional or operational results through each step of the test run. incorrect postal code format. The Comments and Findings section is where all errors and unexpected results found in the test run should be logged. Comparing the produced output from the test run with this specification provides the verification that the build satisfies the design. but can not exist in the same record. ClearCase) where errors can be logged along with an indication of their severity and impact. incorrect data may have results according to the defined error-handling strategy such as creating error records or aborting the process. The test script specifies all the steps needed to create the correct environment for the test. Analysis can be done by hand or by using compare scripts. It is up to the QA Management and/or QA Strategy to determine whether to use a more advanced error tracking system for unit testing or to wait until system test. Examples of incorrect data are: ● Value errors: value is not in acceptable domain or an empty value for mandatory fields. Correct data can be expected to result in the specified output. (e. The script should cover all of the potential logic paths and include all code translations and other transformations that are part of the unit. or nonnumeric data in numeric fields. Semantic errors: two values are correct. These data must be maintained in a secure place to make repeatable tests possible. adequate space in database or file system. and the steps to analyze the results. Syntax errors: incorrect date format. errors in the test plan itself can be logged here as well..test runs. with subsequent runs covering incremental loads against existing data or tests with empty input or with duplicate input records or files and empty reports. Some sites demand a more advanced error logging system. The input files or tables must be specified with their locations.g. Specifying the expected output is the main part of the test plan.

Every difference between the output expectation and the test output itself should be logged in the Comments and Findings section, along with information about the severity and impact on the test process. The unit test can proceed after analysis and error correction. The unit test is complete when all test runs are successfully completed and the findings are resolved and retested. At that point, the unit can be handed over to the next test phase.

Note that the error handling strategy should account for any Data Quality operations built into the project. Note also that some PowerCenter transformations can make use of data quality processes, or plans, developed in Informatica Data Quality (IDQ) applications. Data quality plan instructions can be loaded into a Data Quality Integration transformation (the transformation is added to PowerCenter via a plug-in). Data quality plans should be tested using IDQ applications before they are added to PowerCenter transformations. The tests for data quality processes should follow the same guidelines as outlined in this document. The results of these tests will feed as prerequisites into the main unit test plan. A PowerCenter mapping should be validated once the Data Quality Integration transformation has been added to it and configured with a data quality plan.

Best Practices
Testing Data Quality Plans

Sample Deliverables
Test Case List
Unit Test Plan

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.1.3 Define System Test Plan

Description
System Test (sometimes known as Integration Test) is crucial for ensuring that the system operates reliably as a fully integrated system and functions according to the business requirements and technical design. Success rests largely on business users' confidence in the integrity of the data. If the system has flaws that impede its functions, the data may also be flawed, or users may perceive it as flawed, which results in a loss of confidence in the system. If the system does not provide adequate performance and responsiveness, the users may abandon it (especially if it is a reporting system) because it does not meet their perceived needs.

All individuals participating in executing the test plan must agree on the relevant performance indicators that are required to determine if project goals and objectives are being met. The performance indicators must be documented, reviewed, and signed-off on by all participating team members.

Prerequisites
None

Roles
Quality Assurance Manager (Review Only)
Test Manager (Primary)

Considerations
Since the system test addresses multiple areas and test types, creation of the test plan should involve several specialists. The System Test Manager is then responsible for compiling their inputs into one consistent system test plan. As with the other testing processes, it is very important to begin planning for System Test early in the project to make sure that all necessary resources are scheduled and prepared ahead of time.

Performance indicators are placed in the context of Test Cases, Test Levels, and Test Types.

Test Cases
The test case (i.e., unit of work to be tested) must be sufficiently specific to track and improve data quality and performance, so that the test team can easily measure and monitor their evaluation criteria.

Test Levels
Each test case is categorized as occurring on a specific level or levels. This helps to clearly define the actual extent of testing expected within a given test case. Test levels may include one or more of the following:

● System Level. Covers all "end to end" integration testing, and involves the complete validation of total system functionality and reliability through all system entry points and exit points. Typically, this test level is the highest, and the last level of testing to be completed.
● External Interface Level. Covers all testing that involves external data sources. For example, this level of testing may collect data from diverse business systems into a data warehouse.
● Internal Interface Level. Covers all testing that involves internal system data flow. For example, this level of testing may validate the ability of PowerCenter to successfully connect to a particular data target and load data.
● Data Unit Level. Covers all testing that involves verifying the function and reliability of specific data items and structures. This typically occurs during the development cycle in which data types and structures are defined and tested.
● Software Process Level. Covers all testing that involves verifying the function and reliability of specific software applications. This level of testing typically occurs during the development cycle.
● Hardware Component Level. Covers all testing that involves verifying the function and reliability of specific hardware components. For example, this level of testing may validate a back-up power system by removing the primary power source.
● Support System Level. Involves verifying the ability of existing support systems and infrastructure to accommodate new systems or the proposed expansion of existing systems. For example, this level of testing may determine the effect of a potential increase in network traffic, due to an expanded system user base, on overall business operations.

Test Types
The Data Integration Developer generates a list of the required test types based on the desired level of testing and on the application design constraints and requirements. The defined test types determine what kind of tests must be performed to satisfy a given test case. Test types that may be required include:

● Critical Technical Parameters (CTPs). A worksheet of specific CTPs is established. Each CTP defines specific functional units that are tested.
● Test Condition Requirements (TCRs). Test Condition Requirement scripts are developed to satisfy all identified CTPs. These TCRs are assigned a numeric designation and include the test objective, test steps, expected results, actual results, tester ID, the current date, and the current iteration of the test. All TCRs are included with each Test Case Description (TCD).
● Test Schedule. A specific test schedule that is defined within each TCD, which identifies the testing start and end dates for each TCD. The overall Test Schedule for the project is available in the TCD Test Schedule Summary, based upon the project plan, and maintained using MS Project or a comparable tool.
● Test Execution and Progression. A detailed description of general control procedures for executing a test, such as special conditions and processes for returning a TCR to a developer in the event that it fails. This description is typically provided with each TCD.

As part of 6.3.2 Execute System Test, other specific tests should be planned for:

● 6.3.3 Perform Data Validation
● 6.3.4 Conduct Disaster Recovery Testing
● 6.3.5 Conduct Volume Testing

The system test plans should include:

● System name, component, or functional parts, version number, list of any prerequisites, and list of components
● Reference to design document(s) such as high-level designs, workflow designs, database model and reference, hardware descriptions, etc. This should include any specific data items.

etc.) Prerequisites (e. results. each of which must be described in detail. systems engineers and database administrators. After each run. error records expected. etc. The interaction between the test runs must also be specified. depending on the defect count and severity. interdependencies) Per test run: r r Type and purpose of the test run (coverage. the specialists can take the necessary actions to resolve the problems. expected runtime. participants review the progress of the system test. accurate results from other test runs. etc. the system test can proceed. whether the system test can proceed with subsequent test runs or that errors must be corrected and the previous run repeated. the System Test Manager can decide. results recording.. developers. any problems identified. availability of monitoring tools. After the solution is approved and implemented. space in database or file system.g.) Specification of expected and maximum acceptable runtime Step-by-step guide to execute the test (including environment preparation.● ● ● Specification of test environment Overview of the test runs (coverage.) r r r r r ● ● Defect tracking process and tools Description of structure for meetings to discuss progress. and analysis steps. and assignments to resolve or avoid them. These errors and the general progress of the system test should be discussed in a weekly or bi-weekly progress meeting. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .) Definition of test input References to test data or load-files to be used (note: data must be stored in a secure place to permit repeatable tests) Specification of the expected output and system behaviour (including record counts. Every difference between the expected output and the test output itself should be recorded and entered into the defect tracking system with a description of the severity and impact on the test process. etc. At this meeting.Data Warehousing 381 of 1017 . issues and defect management during the test The system test plan consists of one or more test runs. availability of reference data. The meeting should be directed by the System Test Manager and attended by the testers and other necessary specialists like designers. After assignment of the findings.

When all tests have been run successfully and all defects are resolved and retested, the system test plan will have been completed.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:38

Phase 6: Test

Subtask 6.1.4 Define User Acceptance Test Plan

Description
User Acceptance Testing (often known as UAT) is essential for gaining approval, acceptance and project sign off. It is the end user community that needs to carry out the testing and identify relevant issues for fixing. The user acceptance criteria will need to be distilled from the requirements and existing gold standard reports. These criteria need to be documented and agreed by all parties so as to avoid delays through scope creep. As with system testing, planning for User Acceptance Testing should begin early in the project so as to ensure the necessary resources are scheduled and ready. Resources for the testing will include physical environment setup as well as allocation of staff to testing from the user community.

Prerequisites
None

Roles
Business Analyst (Secondary)
End User (Primary)
Quality Assurance Manager (Approve)
Test Manager (Primary)

Considerations
The plan should be a construction of the acceptance criteria, with test scripts of actions that users will need to carry out to achieve certain results. For example, instructions to run particular workflows and run reports within which the users can then examine the data. The author of the plan needs to bear in mind that the testers from the user community may not be technically minded. Indeed, one possible benefit of having non-technical users involved is that they will provide an insight into the time and effort required for adoption and training when the completed data integration project is deployed.

During these two phases, business users work through the system, executing their normal daily routine and driving out issues and inconsistencies.

In addition to test scripts for execution, additional criteria for acceptance need to be defined:

● Performance, required response time and usability
● Data quality tolerances
● Validation procedures for verifying data quality
● Tolerable bugs, based on the defect management processes

In Data Migration projects, user acceptance testing is even more user-focused than other data integration efforts. This UAT activity is the best way to find out if the data is correct and if the data migration was completed successfully. It is very important that the data migration team works closely with the business testers, both to provide appropriate data for these tests and to capture feedback to improve the data as soon as possible.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.1.5 Define Test Scenarios

Description
Test scenarios provide the context (the "story line", the source data definitions, and other details of specific test runs) for much of the test procedures, whether Unit Test, System Test or UAT, enabling testers to simulate the related business activity and then measure the results against expectations. How can you know that the software solution you're developing will work within its ultimate business usage? A scenario provides the business case for testing specific functionality.

Prerequisites
None

Roles
Business Analyst (Secondary)
End User (Primary)
Quality Assurance Manager (Approve)
Test Manager (Primary)

Considerations
Test scenarios must be based on the functional and technical requirements, dividing them into specific functions that can be treated in a single test process. The test scenario forms the basis for development of test scripts and checklists. For this reason, design of the scenarios is a critical activity and one that may involve significant effort in order to provide coverage for all the functionality that needs testing.

Test scenarios may include:

● The purpose/objective of the test (functionality being tested), described in end-user terms.
● Description of the business, functional, or technical context for the test.
● Description of the type of technologies, development objects, and/or data that should be included.
● Any known dependencies on other elements of the existing or new systems.

Typical attributes of test scenarios:

● Should be designed to represent both typical and unusual situations.
● Should include use of valid data as well as invalid or missing data.
● Business cases and test scenarios for System and Integration Tests are developed by the test team with assistance of developers and end-users.
● Test engineers may define their own unit test cases.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.1.6 Build/Maintain Test Source Data Set

Description
This subtask deals with the procedures and considerations for actually creating, storing, and maintaining the test data. The procedures for any given project are, of course, specific to its requirements and environments, but are also opportunistic. For some projects there will exist a comprehensive set of data, or at least a good start in that direction, while for other projects the test data may need to be created from scratch. In addition to test data that allows full functional testing (i.e., functional test data), there is also a need for adequate data for volume tests (i.e., volume test data). The following paragraphs discuss each of these data types.

Functional Test Data
Creating a source data set to test the functionality of the transformation software should be the responsibility of a specialized team largely consisting of business-aware application experts. Business application skills are necessary to ensure that the test data not only reflects the eventual production environment but is also engineered to trigger all the functionality specified for the application. Technical skills in whatever storage format is selected are also required to facilitate data entry and/or movement.

In a data integration project, while functional test data for the application sources is indispensable, the case for a predefined data set for the targets should also be considered. Such a data set makes it possible to develop an automated test procedure to compare the actual result set to a predicted result set (making the necessary adjustments to generated data, such as surrogate keys, timestamps, etc.). This has additional value in that the definition of a target data set in itself serves as a sort of design audit.

Volume is not a requirement of the functional test data set; indeed, too much data is undesirable since the time taken to load it needlessly delays the functional test.
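The automated comparison of a predicted target data set against the actual result set can also be scripted outside the tool. The following is a minimal sketch only, assuming a predicted ("expected") table has been loaded alongside the actual target; the table, column, and connection names are hypothetical, and any Python DB-API connection (pyodbc, cx_Oracle, etc.) can be passed in.

    # Sketch only: compare an actual target table with a predicted (expected)
    # result set by row count and row content. All object names are hypothetical;
    # pass any open Python DB-API connection.

    def fetch_sorted(conn, table, key_columns, compare_columns):
        """Return the chosen columns of a table, sorted by the key columns."""
        cols = ", ".join(compare_columns)
        order_by = ", ".join(key_columns)
        cursor = conn.cursor()
        cursor.execute(f"SELECT {cols} FROM {table} ORDER BY {order_by}")
        rows = cursor.fetchall()
        cursor.close()
        return rows

    def compare_result_sets(conn, actual_table, predicted_table,
                            key_columns, compare_columns):
        """Report row-count differences and the first few mismatched rows."""
        actual = fetch_sorted(conn, actual_table, key_columns, compare_columns)
        predicted = fetch_sorted(conn, predicted_table, key_columns, compare_columns)
        differences = []
        if len(actual) != len(predicted):
            differences.append(("row count", len(predicted), len(actual)))
        for expected_row, actual_row in zip(predicted, actual):
            if tuple(expected_row) != tuple(actual_row):
                differences.append(("row mismatch", expected_row, actual_row))
        for diff in differences[:20]:      # log only the first 20 differences
            print("DIFFERENCE:", diff)
        return not differences

    # Example usage (names are placeholders):
    # ok = compare_result_sets(conn, "DW.CUSTOMER_DIM", "QA.CUSTOMER_DIM_EXPECTED",
    #                          ["CUSTOMER_NK"], ["CUSTOMER_NK", "NAME", "STATUS"])

Excluding surrogate keys and load timestamps from the compared columns is exactly the kind of adjustment to generated data described above.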

Volume Test Data
The main objective for the volume test data set is to ensure that the project satisfies any Service Level Agreements that are in place and generally meets performance expectations in the live environment. PowerCenter can be used to generate volumes of data and to modify sensitive live information in order to preserve confidentiality. There are a number of techniques to generate multiple output rows from a single source row, such as:

● Cartesian join in the source qualifier
● Normalizer transformation
● Union transformation
● Java transformation

If possible, the volume test data set should also be available to developers for unit testing in order to identify problems as soon as possible.

Maintenance
In addition to the initial acquisition or generation of test data, you are likely to need procedures that will enable you to rebuild or rework the test data as required. Once again, you will need a protected location for its storage and procedures for migrating it to test environments in such a fashion that the original data set is preserved (for the next test sequence).

Prerequisites
None

Roles
Business Analyst (Primary)
Data Integration Developer (Primary)

Considerations
Creating the source and target data sets and conducting automated testing are non-trivial tasks, and are therefore often dismissed as impractical. This is partly the result of a failure to appreciate the role that PowerCenter can play in the execution of the test strategy.
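The row-multiplication techniques listed above are implemented inside PowerCenter itself; purely as an illustration of the same idea, the hedged sketch below expands a small functional seed file into a larger volume-test file outside the tool. The file names, key column, and expansion factor are all hypothetical.

    # Sketch only: expand a small functional seed file into a volume-test data
    # set by replicating each row with a distinct key value. Names and the
    # expansion factor are hypothetical.

    import csv

    def expand_seed_file(seed_path, output_path, copies, key_column):
        """Write 'copies' variants of every seed row, keeping the key column unique."""
        with open(seed_path, newline="") as seed, open(output_path, "w", newline="") as out:
            reader = csv.DictReader(seed)
            writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                for i in range(copies):
                    new_row = dict(row)
                    # Suffix the business key so replicated rows remain distinct.
                    new_row[key_column] = f"{row[key_column]}_{i:06d}"
                    writer.writerow(new_row)

    if __name__ == "__main__":
        # e.g. turn 10,000 functional rows into 1,000,000 volume-test rows
        expand_seed_file("customer_seed.csv", "customer_volume.csv",
                         copies=100, key_column="CUSTOMER_ID")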

At some point in the test process, it is going to be necessary to compile a schedule of expected results from a given starting point. Using PowerCenter to make this information available and to compare the actual results from the execution of the workflows can greatly facilitate the process.

Data Migration projects should have little need for generating test data. It is strongly recommended that all data migration integration and system tests use actual production data. Therefore, effort spent generating test data on a data migration project should be very limited.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:40

Phase 6: Test

Task 6.2 Prepare for Testing Process

Description
This is the first major task of the Test Phase: general preparations for System Test and UAT. This includes preparing environments, ramping up defect management procedures, and generally making sure the test plans and all their elements are prepared and that all participants have been notified of the upcoming testing processes.

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Database Administrator (DBA) (Primary)
Presentation Layer Developer (Secondary)
Quality Assurance Manager (Primary)
Repository Administrator (Primary)
System Administrator (Primary)
Test Manager (Primary)

Considerations
Prior to beginning this subtask, you will need to collect and review the documentation generated by the previous tasks and subtasks, including the test strategy, system test plan, and UAT plan. Verify that all required test data has been prepared and that the defect tracking system is operational. Ensure that all unit test certification procedures are being followed.

Based on the system test plan and UAT plan:

● Collect all relevant requirements, functional and internal design specifications, end-user documentation, and any other related documents. Develop the test procedures and documents for testers to follow from these.
● Review the test environment requirements (e.g., hardware, software, communications, etc.) to ensure that everything is in place and ready.
● Review testware requirements (e.g., coverage analyzers, test tracking, problem/bug tracking, etc.) to ensure that everything is ready for the upcoming tests.
● Verify that all expected participants have been notified of the applicable test schedule.

Review the upcoming test processes with the Project Sponsor to ensure that they are consistent with the organization's existing QA culture (i.e., in terms of testing scope, approaches, and methods).

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.2.1 Prepare Environments

Description
It is important to prepare the test environments in advance of System Test, with the following objectives:

● To emulate, to the extent possible, the Production environment
● To provide test environments that enable full integration of the system
● To provide secure environments that support the test procedures and appropriate access, and isolation from development
● To allow System Tests and UAT to proceed without delays and without system disruptions

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Database Administrator (DBA) (Primary)
Presentation Layer Developer (Secondary)
Repository Administrator (Primary)
System Administrator (Primary)
Test Manager (Primary)

Considerations

The Test Manager is responsible for preparing these items, but is likely to delegate a large part of the work.

Plans
A formal test plan needs to be prepared by the Project Manager in conjunction with the Test Manager. This plan should cover responsibilities, tasks, procedures, time-scales, resources, training, security and access rights, data movement, and timing for any migrations, with sufficient controls to enforce them. Review the test plans and scenarios to determine the technical requirements for the test environments. A formal definition of the required environment also needs to be prepared, including all necessary hardware components (i.e., server and client), software components (i.e., operating system, database, application tools, testing tools, custom application components, etc., including versions), and networking. Test scripts need to be prepared, together with a definition of the data required to execute the scripts. The test environment administrator(s) must have specific verifications and success criteria.

The System Test environment may evolve into the UAT environment, depending on requirements and stability. It is vital that all resources, including off-project support staff, are made available for the entire testing period. Volume tests and disaster/recovery tests may require special system preparations. Establishing security and isolation is critical for preventing any unauthorized or unplanned migration of development objects into the test environments.

Processes
Where possible, all processes should be supported by the use of appropriate tools. Some of the key terminology related to the preparation of the environments and the associated processes includes:

● Training testers – a series of briefings and/or training sessions should be made available. This may be any combination of formal presentations, formal training courses, computer-based tutorials or self-study sessions.
● Recording test results – the results of each test must be recorded and cross-referenced to the defect reporting process.

● Reporting and resolution of defects (see 5.1.3 Define Defect Tracking Process) – a process for recording defects, prioritizing their resolution, and tracking the resolution process.
● Overall test management – a process for tracking the effectiveness of UAT and the likely effort and timescale remaining.

Data
The data required for testing can be derived from the test cases defined in the scripts. This should enable a full dataset to be defined, ensuring that all possible cases are tested. 'Live data' is usually not sufficient because it does not cover all the cases the system should handle, and may require some sampling to keep the data volumes at realistic levels. It is, of course, possible to use modified live data, adding the additional cases or modifying the live data to create the required cases. Some automated approach to creating all or the majority of the data is best. The process of creating the test data needs to be defined.

Where multiple data repositories are involved, it is important to define how these datasets relate. It is also important that the data is consistent across all the repositories and that it can be restored to a known state (or states) as and when required. There is often a need to process data through a system where some form of OLTP is involved; in this case, it must be possible to roll back to a base state of data to allow reapplication of the 'transaction' data, as would be achieved by restoring from back-up.

Environment
A properly set-up environment is critical to the success of UAT. This covers:

● Server(s) – must be available for the required duration and have sufficient disk space and processing power for the anticipated workload.
● Client workstations – must be available and sufficiently powerful to run the required client tools.
● Networking – all required LAN and WAN connectivity must be set up and firewalls configured to allow appropriate access. Bandwidth must be available for any particularly large data transmissions.
● Server and client software – all necessary software (OS, database, ETL, data quality tools, test tools, connectivity, etc.) should be installed at the version used in development (normally), with databases created as required.
● Databases – all necessary schemas must be created and populated, with an appropriate backup/restore strategy in place and access rights defined and implemented.
● Application software – correct versions should be migrated from development.
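Where a full database restore is too heavy for every test iteration, the roll-back to a base state of data described above can be approximated for a handful of staging tables. The sketch below is illustrative only: table names, seed file locations, the '?' parameter marker, and the assumption that the test account may issue TRUNCATE are all hypothetical, and a DBA-managed restore from backup remains the more robust approach.

    # Sketch only: restore selected test tables to an agreed base state before a
    # test run by truncating them and reloading preserved seed files.
    # Table names, file paths, and the parameter marker style are hypothetical.

    import csv

    BASE_STATE = {
        # target table     seed file holding the agreed base-state rows
        "STG_CUSTOMER":    "/testdata/base/stg_customer.csv",
        "STG_ORDERS":      "/testdata/base/stg_orders.csv",
    }

    def reset_to_base_state(conn):
        cursor = conn.cursor()
        for table, seed_file in BASE_STATE.items():
            cursor.execute(f"TRUNCATE TABLE {table}")
            with open(seed_file, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                placeholders = ", ".join(["?"] * len(header))  # adjust to the driver's paramstyle
                insert = (f"INSERT INTO {table} ({', '.join(header)}) "
                          f"VALUES ({placeholders})")
                cursor.executemany(insert, list(reader))
        conn.commit()
        cursor.close()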

Best Practices None Sample Deliverables None Last updated: 15-Feb-07 19:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . but should also include all source systems. target systems. The system tests will be a simulation of production systems so the entire process should execute like a production environment. and file systems. reference data and staging databases.appropriate backup/restore strategy in place. ● Application software – correct versions should be migrated from development. For Data Migration.Data Warehousing 395 of 1017 . and access rights defined and implemented. the system test environment should not be limited to the Informatica environment.

Phase 6: Test

Subtask 6.2.2 Prepare Defect Management Processes

Description
The key measure of software quality is, of course, the number of defects (a defect is anything that produces results other than the expected results based on the software design specification). Worse yet, change requests and trouble reports are evidence of defects that have made their way to the end users. Therefore, it is essential for software projects to have a systematic approach to detecting and resolving defects early in the development life cycle.

There are two major components of successful defect management: defect prevention and defect detection. A good defect management process should enable developers both to lower the number of defects that are introduced and to remove defects early in the life cycle, prior to testing. Defect management begins with the design of the initial QA strategy and a good, detailed test strategy. These should clearly define methods for reviewing system requirements and design, and spell out guidelines for testing processes.

Prerequisites
None

Roles
Quality Assurance Manager (Primary)
Test Manager (Primary)

Considerations
Personal and peer reviews are primary sources of early defect detection. Unit testing, system testing and UAT are other key sources; however, in these later project stages, defect detection is a much more resource-intensive activity. It is therefore essential to have a systematic approach to tracking defects, especially during unit and system testing, and managing each type of test. In addition, many QA strategies include specific checklists that act as gatekeepers to authorize satisfactory completion of tests. To support early defect resolution, you must have a defect tracking system that is readily accessible to developers and includes the following:

● Ability to identify and type the defect, with details of its behaviour
● Means for recording the timing of the defect discovery, resolution, and retest
● Complete description of the resolution
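As a simple illustration of the attributes just listed, a defect record might be modelled as below. This is a sketch only; the field names and status values are illustrative rather than a prescribed schema, and most teams will hold this information in their defect tracking tool rather than in code.

    # Sketch only: a minimal defect record capturing the attributes listed above.
    # Field names and status values are illustrative.

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import List, Optional

    @dataclass
    class Defect:
        defect_id: str
        defect_type: str                    # e.g. data, mapping, performance, environment
        severity: str                       # e.g. critical / high / medium / low
        behaviour: str                      # observed behaviour vs. the expected result
        discovered_at: datetime
        resolved_at: Optional[datetime] = None
        retested_at: Optional[datetime] = None
        resolution: str = ""                # complete description of the fix
        status: str = "open"                # open -> fixed -> retested -> closed
        history: List[str] = field(default_factory=list)

        def log(self, note: str) -> None:
            self.history.append(f"{datetime.now().isoformat()} {note}")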

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Task 6.3 Execute System Test

Description
System Test (sometimes known as Integration Test) is crucial for ensuring that the system operates reliably and according to the business requirements and technical design. Success rests largely on business users' confidence in the integrity of the data. If the system has flaws that impede its function, the data may also be flawed, or users may perceive it as flawed, which results in a loss of confidence in the system. If the system does not provide adequate performance and responsiveness, the users may abandon it (especially if it is a reporting system) because it does not meet their perceived needs.

System testing follows unit testing, providing the first tests of the fully integrated system. It is crucial to also involve end-users in the planning and review of system tests, which offers an opportunity to clarify users' performance expectations and establish realistic goals that can be used to measure actual operation after the system is placed in production. It also offers a good opportunity to refine the data volume estimates that were originally generated in the Architect Phase. This is useful for determining if existing or planned hardware will be sufficient to meet the demands on the system.

This task incorporates five steps:

1. 6.3.1 Prepare for System Test - in which the test team determines how to test the system from end to end to ensure a successful load, as well as planning for the environments, participants, tools and timelines for the test.
2. 6.3.2 Execute Complete System Test - in which the data integration team works with the Database Administrator to run the system tests planned in the prior subtask.
3. 6.3.3 Perform Data Validation - in which the QA Manager and QA team ensure that the system is capable of delivering complete, valid data to the business users.
4. 6.3.4 Conduct Disaster Recovery Testing - in which the system's robustness and recovery in case of disasters such as network or server failure is tested.

5. 6.3.5 Conduct Volume Testing - in which the system's capability to handle large volumes is tested.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Integration Developer (Primary)
Database Administrator (DBA) (Primary)
End User (Primary)
Network Administrator (Secondary)
Presentation Layer Developer (Secondary)
Project Sponsor (Review Only)
Quality Assurance Manager (Review Only)
Repository Administrator (Secondary)
System Administrator (Primary)
Technical Project Manager (Review Only)
Test Manager (Primary)

Considerations

All involved individuals and departments should review and approve the test plans, test procedures, scripts, and test results prior to beginning this subtask.

It is important to thoroughly document the system testing procedure, describing the testing strategy, acceptance criteria, test procedures, scripts, and results. This information can be invaluable later on, when the system is in operation and may not be meeting performance expectations or delivering the results that users want or expect.

For Data Migration projects, system tests are important because these are essentially 'dress rehearsals' for the final migration. In data migration projects these system tests are often referred to as 'mock runs' or 'trial cutovers'. These tests should be executed with production-level controls and be tracked and improved upon from system test cycle to system test cycle.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:48

Phase 6: Test

Subtask 6.3.1 Prepare for System Test

Description
System test preparation consists primarily of creating the environment(s) required for testing the application and staging the system integration.

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Database Administrator (DBA) (Secondary)
System Administrator (Secondary)
Test Manager (Primary)

Considerations
The preparations for System Test often take much more effort than expected, so they should be preceded by a detailed integration plan that describes how all of the system elements will be physically integrated within the System Test environment. System Test is the first opportunity, following comprehensive unit testing, to fully integrate all the elements of the system and to test the system by emulating how it will be used in production. For this reason, the environment should be as similar as possible to the production environment in its hardware, software, communications, and any support tools. The integration plan should be specific to your environment, but some of the general steps are likely to be the same. The following are some general steps that are common in most integration plans:

● Migration of Informatica development folders to the system test environment. These folders may also include shared folders and/or shortcut folders that may have been added or modified during the development process. In versioned repositories, deployment groups may be used for this purpose.

● Often, flat files or parameter files reside on the development environment's server and need to be copied to the appropriate directories on the system test environment server.
● Synchronization of incremental logic is key when doing system testing. In order to emulate the production environment, the variables or parameters used for incremental logic need to match the values in the system test environment database(s). If the variables or parameters don't match, they can cause missing data or unusual amounts of data being sourced (see the sketch at the end of this subtask).
● Data consistency in the system test environment is crucial. In order to emulate the production environment, the data being sourced and targeted should be as close as possible to production data in terms of data quality and size.
● The data model of the system test environment should be very similar to the model that is going to be implemented in production. Columns, constraints, or indices often change throughout development, so it is important to system test the data model before going into production.

For Data Migration projects, the system test should not just involve running Informatica workflows; it should also include data set-up, migrating code, executing data and process validation, and post-process auditing. The system test set-up should be part of the system test, not a pre-system test step.

Best Practices
None

Sample Deliverables
System Test Plan

Last updated: 01-Feb-07 18:48
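Tying back to the incremental-logic point above, the values held in a PowerCenter parameter file can be listed before a run and compared with the system test database. The sketch below is hedged: the [Folder.WF:workflow.ST:session] section header and the $$ prefix follow common PowerCenter parameter file conventions, but the path, folder, workflow, and variable names shown are hypothetical and should be adapted to the project's own files.

    # Sketch only: list incremental-load parameters from a PowerCenter-style
    # parameter file so testers can confirm they match the system test database.
    # All names and paths are hypothetical.

    def read_parameter_file(path):
        """Return {section: {parameter: value}} for a simple parameter file."""
        sections = {}
        current = None
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if not line:
                    continue
                if line.startswith("[") and line.endswith("]"):
                    current = line[1:-1]
                    sections[current] = {}
                elif "=" in line and current is not None:
                    name, value = line.split("=", 1)
                    sections[current][name.strip()] = value.strip()
        return sections

    if __name__ == "__main__":
        params = read_parameter_file("/infa/parmfiles/wf_daily_incremental_load.par")
        for section, values in params.items():
            for name, value in values.items():
                if name.startswith("$$"):          # mapping variables/parameters
                    print(f"{section}: {name} = {value}")
        # Compare e.g. $$LAST_EXTRACT_DATE with MAX(UPDATE_TS) in the system
        # test source tables before starting the run.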

Phase 6: Test

Subtask 6.3.2 Execute Complete System Test

Description
System testing offers an opportunity to establish performance expectations and verify that the system works as designed, as well as to refine the data volume estimates generated in the Architect Phase. This subtask involves a number of guidelines for running the complete system test and resolving or escalating any issues that may arise during testing.

Prerequisites
None

Roles
Business Analyst (Secondary)
Data Integration Developer (Secondary)
Database Administrator (DBA) (Review Only)
Network Administrator (Review Only)
Presentation Layer Developer (Secondary)
Quality Assurance Manager (Review Only)
Repository Administrator (Review Only)
System Administrator (Review Only)
Technical Project Manager (Review Only)

Test Manager (Primary)

Considerations

System Test Plan
A system test plan needs to include prerequisites to enter the system test phase, criteria to successfully exit the system test phase, and defect classifications. In addition, all test conditions, expected results, and test data need to be available prior to system test. System testing is a cyclical process; the project team should plan to execute multiple iterations of the most common load routines within the timeframe allowed for system testing. This helps to identify issues early and manage the system test timeframe effectively.

Load Routines
Ensure that the system test plan includes all types of load that may be encountered during the normal operation of the system. For example, a new data warehouse (or a new instance of a data warehouse) may include a one-off initial load step. There may also be weekly, monthly, or ad-hoc processes beyond the normal incremental load routines.

Scheduling
An understanding of dependent predecessors is crucial for the execution of end-to-end testing. Applications should be run in the order specified in the test plan, as is the schedule for the testing run. Scheduling, which is the responsibility of the testing team, is generally facilitated through an application such as the PowerCenter Workflow Manager module and/or a third-party scheduling tool. Third-party scheduling tools can create dependencies between PowerCenter tasks and jobs that may not be possible to run on PowerCenter. Use the pmcmd command line syntax when running PowerCenter tasks and workflows with a third-party scheduler (a hedged example follows below). The tools in PowerCenter and/or a third-party scheduling tool can also be used to detect long-running sessions/tasks and alert the system test team via email.
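As referenced in the Scheduling discussion above, a third-party scheduler usually launches workflows through pmcmd. The sketch below shows one way such a call could be wrapped; pmcmd option names can vary between PowerCenter releases, so confirm them against the pmcmd reference for the installed version. The service, domain, folder, credential, and workflow names are hypothetical.

    # Sketch only: start a PowerCenter workflow from a scheduler wrapper and
    # return its exit status. Verify the pmcmd options against the installed
    # release; all object names here are hypothetical.

    import subprocess
    import sys

    def start_workflow(service, domain, user, password, folder, workflow):
        cmd = [
            "pmcmd", "startworkflow",
            "-sv", service,      # Integration Service name
            "-d", domain,        # domain name
            "-u", user,
            "-p", password,      # in practice, prefer a password environment variable
            "-f", folder,
            "-wait",             # block until the workflow completes
            workflow,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        return result.returncode   # non-zero indicates the run did not complete successfully

    if __name__ == "__main__":
        rc = start_workflow("IS_SYSTEST", "Domain_SysTest", "qa_user", "qa_password",
                            "DW_LOADS", "wf_daily_incremental_load")
        sys.exit(rc)

Returning a non-zero exit code lets the scheduling tool treat a failed workflow like any other failed job and trigger its own alerting.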

System Test Results
The team executing the system test plan is responsible for tracking the expected and actual results of each session and task run. Commercial software tools are available for logging test cases and storing test results. In addition, the project team is responsible for ensuring that the tests adhere to the system test plan and the test cases within it (developed in Subtask 6.3.1 Prepare for System Test).

The details of each PowerCenter session run can be found in the Workflow Monitor. To see the results:

● Right-click the session in the Workflow Monitor and choose 'Properties'.
● Click the Transformation Statistics tab in the Properties dialog box.

Session statistics are also available in the PowerCenter repository view REP_SESS_LOG, or through Metadata Reporter.

Resolution of Coding Defects
The testing team must document the specific statistical results of each run and communicate those results back to the project development team. If the results do not meet the criteria listed in the test case, or if any process fails during testing, the test team should seek the advice of the appropriate developer and business analyst before continuing with any other dependent tests. In the case of a PowerCenter session failure, the test team should immediately generate a change request. The change request is assigned to the developer(s) responsible for completing system modifications.

If the system test plan contains successful system test completion criteria, those criteria must be fulfilled. Defect levels must meet established criteria for completion of the system test cycle. Defects should be judged by their number and by their impact. Ideally, all defects will be captured, fixed, and successfully retested within the system testing timeframe. In reality, this is unlikely to happen. If outstanding defects are still apparent at the end of the system testing period, the project team needs to decide how to proceed. Ultimately, the project team must review and sign off on the results of the tests.

For Data Migration projects, because they are usually part of a larger implementation, the system test should be integrated with the larger project system test. It is common for these types of projects to have three or four full system tests, otherwise known as 'mock runs' or 'trial cutovers'. The results of each test should be reviewed, improved upon, and communicated to the project manager or project management office (PMO).
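Because the text above points to the REP_SESS_LOG repository view for session statistics, a test team can pull expected-versus-actual row counts for a run directly from the repository database. This is a sketch only: the column list reflects commonly documented names for that view but should be verified against the repository views installed with the PowerCenter release in use, the parameter marker depends on the database driver, and the workflow name is hypothetical.

    # Sketch only: report session run statistics for one workflow from the
    # PowerCenter repository view REP_SESS_LOG. Verify column names against the
    # installed repository views; the workflow name is hypothetical.

    QUERY = """
        SELECT SUBJECT_AREA,
               WORKFLOW_NAME,
               SESSION_NAME,
               SUCCESSFUL_ROWS,
               FAILED_ROWS,
               ACTUAL_START,
               SESSION_TIMESTAMP
          FROM REP_SESS_LOG
         WHERE WORKFLOW_NAME = ?
         ORDER BY ACTUAL_START
    """

    def report_session_statistics(conn, workflow_name):
        cursor = conn.cursor()
        cursor.execute(QUERY, (workflow_name,))
        for row in cursor.fetchall():
            # Compare SUCCESSFUL_ROWS / FAILED_ROWS with the expected counts in the test case.
            print(row)
        cursor.close()

    # Example usage with any DB-API connection to the repository database:
    # report_session_statistics(conn, "wf_daily_incremental_load")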

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:46

Phase 6: Test

Subtask 6.3.3 Perform Data Validation

Description
The purpose of data validation is to ensure that the data is populated as per specification. Test team members should review and analyze the test results to determine if project and business expectations are being met. The team responsible for completing the end-to-end test plan should be in a position to utilize the results detailed in the testing documentation (e.g., CTPs, TCDs, and TCRs). The analysis should also include data from initial runs in production.

● If the team concludes that the expectations are being met, it can sign off on the end-to-end testing process.
● If expectations are not met, the testing team should perform a gap analysis on the differences between the test results and the project and business expectations. The gap analysis should list the errors and requirements not met so that a Data Integration Developer can be assigned to investigate the issue. The Data Integration Developer should assess the resources and time required to modify the data integration environment to achieve the required test results. The Project Sponsor and Project Manager should then finalize the approach for incorporating the modifications, which may include obtaining additional funding or resources, limiting the scope of the modifications, or redefining the business requirements to minimize modifications.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Integration Developer (Review Only)
Presentation Layer Developer (Secondary)

Project Sponsor (Review Only)
Quality Assurance Manager (Review Only)
Technical Project Manager (Review Only)
Test Manager (Primary)

Considerations
Before performing data validation, it is important to consider these issues:

● Job Run Validation. The session logs and the Workflow Monitor can be used to check if the job has completed successfully. If relational database error logging is chosen, then the error tables can be checked for any transformation errors and session errors. The Integration Service generates the following tables to help you track row errors:
  - PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.
  - PMERR_MSG. Stores metadata about an error and the error message.
  - PMERR_SESS. Stores metadata about the session.
  - PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
  The Data Integration Developer needs to resolve the errors identified in the error tables.
● Involvement. The test team, the QA team, and, ultimately, the end-user community are all jointly responsible for ensuring the accuracy of the data. At the conclusion of system testing, all must sign off to indicate their acceptance of the data quality.
● Access to Front-End for Reviewing Results. A very high-level testing validation can be performed using dashboards or custom reports in Informatica Data Explorer. The test team should have access to reports and/or a front-end tool to help review the results of each testing run. The test team should also have access to current business reports produced in legacy and current operational systems; the current reports can be compared to those produced from data in the new system to determine if requirements are satisfied and that the new reports are accurate.
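To illustrate the Job Run Validation step above, the relational error-logging tables can be summarized after each run so that anything unexpected is fed into the defect tracking system. This is a sketch only: the PMERR_* tables exist only when the session is configured for relational error logging, and the column names used here are illustrative and should be checked against the actual table definitions created in the error-log schema.

    # Sketch only: summarize row errors captured in the relational error logging
    # tables after a test run. Column names are illustrative; check the PMERR_*
    # definitions in the error-log schema.

    SUMMARY_SQL = """
        SELECT ERROR_TYPE, ERROR_MSG, COUNT(*) AS ERROR_COUNT
          FROM PMERR_MSG
         GROUP BY ERROR_TYPE, ERROR_MSG
         ORDER BY ERROR_COUNT DESC
    """

    def summarize_row_errors(conn, limit=25):
        cursor = conn.cursor()
        cursor.execute(SUMMARY_SQL)
        rows = cursor.fetchall()
        for row in rows[:limit]:
            print(row)        # feed unexpected errors into the defect tracking system
        cursor.close()
        return rows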

Before testing begins, the team should determine just how results are to be reviewed and reported, what tool(s) are to be used, and how the results are to be validated.

The Data Validation task has enormous scope and is a significant phase in any project cycle. Full data validation can be one of the most time-consuming elements of the testing process. Data validation can be either manual or automated:

● Manual. This technique involves manually validating target data against the source and ensuring that all the transformations have been correctly applied. Manual validation may be appropriate for a limited set of data or for master data.
● Automated. This technique involves using various techniques and/or tools to validate data and ensure that all the requirements are met.

The following tools are very useful for data validation:

● File Diff. This utility is generally available with any testing tool and is very useful if the source(s) and target(s) are files. Otherwise, the result sets from the source and/or target systems can be saved as flat files and compared using file diff utilities.
● Using Data Profiler in Data Validation. During the System Test phase of the data integration project, you can use data profiling technology to validate the data loaded to the target database. Data profiling allows the project team to test the requirements and assumptions that were the basis for the Design Phase and Build Phase of the project.
● Data Analysis Using IDQ. The testing team can use Informatica Data Quality (IDQ) Data Analysis plans to assess the level of data quality. Once the data is analyzed, scorecards can be used to generate a high-level view of the data quality at the end of each cycle. Using the results from data analysis and scorecards, new test cases can be added and new test data can be created for the testing cycle. Plans can be built to identify problems with data conformity and consistency, facilitating such tests as:
  - Business rule validations

  - Domain validations
  - Row counts and distinct value counts
  - Aggregation accuracy

Throughout testing, it is advisable to re-profile the source data. This provides information on any source data changes that may have taken place since the Design Phase. This is particularly relevant in environments where production source data was not available during design: when development data is used to develop the business rules for the mappings, surprises commonly occur when production data finally becomes available. Additionally, re-profiling can be used to verify the makeup and diversity of any data sets extracted or created for the purposes of testing.

Defect Management: The defects encountered during data validation should be organized using either a simple tool, like an Excel (or comparable) spreadsheet, or a more advanced tool. Advanced tools may have facilities for defect assignment, defect status changes, and/or a section for defect explanation. The Data Integration Developer and the testing team must ensure that all defects are identified and corrected before changing the defect status.

For Data Migration projects, it is important to identify a set of processes and procedures to be executed to simplify the validation process. These processes and procedures should be built into the Punch List and should focus on reliability and efficiency. A set of tools must be developed to enable the business validation personnel to quickly and accurately validate that the data migration was complete. For large-scale data migration projects, it is important to realize the scale of validation. Additionally, it is important that the run book includes steps to verify that all technical steps were completed successfully. PowerCenter Metadata Reporter should be leveraged and documented in the punch list steps, and detailed records of all interaction points should be included in operational procedures.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 19:48

Phase 6: Test

Subtask 6.3.4 Conduct Disaster Recovery Testing

Description
Disaster testing is crucial for proving the resilience of the system to the business sponsors and IT support teams, and for ensuring that staff roles and responsibilities are understood if a disaster occurs.

Prerequisites
None

Roles
Database Administrator (DBA) (Primary)
End User (Primary)
Network Administrator (Secondary)
Quality Assurance Manager (Review Only)
Repository Administrator (Secondary)
System Administrator (Primary)
Test Manager (Primary)

Considerations
Prior to disaster testing, disaster tolerance and system architecture need to be considered. These factors should already have been assessed during earlier phases of the project.

The first step is to try to quantify the risk factors that could cause a system to fail and evaluate how long the business could cope without the system should it fail. These determinations should allow you to judge the disaster tolerance capabilities of the system. Secondly, consider the system architecture.

Disaster Tolerance
Disaster tolerance is the ability to successfully recover applications and data after a disaster within an acceptable time period. A disaster is an event that unexpectedly disrupts service availability, corrupts data, or destroys data. Disasters may be triggered by natural phenomena, malicious acts of sabotage against the organization, or terrorist activity against society in general. The need for a disaster-tolerant system depends on the risk of disaster and how long the business can afford applications to be out of action. The vulnerability of the business to disaster depends upon the importance of the system to the business as a whole and the nature of the system. The location and geographical proximity of data centers, plus the nature of the business, affect risk. Service level agreements (SLAs) for the availability of a system dictate the need for disaster testing. For example, a real-time message-based transaction processing application that has to be operational 24/7 needs to be recovered faster than a management information system with a less stringent SLA.

System Architecture
Disaster testing is strongly influenced by the system architecture. A well-designed system will minimize the risk of disaster. If a disaster occurs, the system should allow a smooth and timely recovery. A system can be designed with a clustered architecture to reduce the impact of disaster. For example, a user acceptance system and a production system can run in a clustered environment; if the production server fails, the user acceptance machine can take over. As an extra precaution, replication technology can be used to protect critical data.

PowerCenter server grid technology is beneficial when designing and implementing a disaster-tolerant system. Normally, server grids are used to balance loads and improve performance on resource-intensive tasks, but they can help reduce disaster recovery time too. Sessions in a workflow can be configured to run on any available server that is registered to the grid. The servers in the grid must be able to create and maintain a connection to each other across the network. If a server unexpectedly shuts down while it is running a session, then the workflow can be set to fail. This depends on the session settings specified and whether the server is configured as a master or worker server.

Although a failed workflow has to be manually recovered if one of the servers unexpectedly shuts down, other servers in the grid should be available to rerun it, unless a catastrophic network failure occurs. The guideline is to aim to avoid single points of failure in a system where possible. Be aware that single physical points of failure are often hardware and network related; clustering and server grid solutions alleviate them. Be sure to have backup facilities and spare components available, for example auxiliary generators, spare network cards, cooling systems, even a torch in case the lights go out! Perhaps the greatest risk to a system is human error; remember that a single mis-typed command or clumsy action can bring down a whole system. Businesses need to provide proper training for all staff involved in maintaining and supporting the system. Also be sure to provide documentation and procedures to cope with common support issues.

Disaster Test Planning
After disaster tolerance and system architecture have been considered, you can begin to prepare the disaster test plan. Disaster testing requires a significant commitment in terms of staff and financial resources, so consider what the test goals are and whether they are worthwhile for the allocated time and resources. Allow sufficient time to prepare the plan. The test plan identifies the overall test objectives, explains the test scope, includes test scripts, establishes the criteria for measuring success, specifies any prerequisites and logistical requirements (e.g., the test environment), and clarifies roles and responsibilities. The test plan and activities should be precise, relevant, and achievable.

Test Scope
Test scope identifies the exact systems and functions to be tested. There may not be time to test for every possible disaster scenario; if so, the scope should list and explain why certain functions or scenarios cannot be tested. Focus on the stress points for each particular application when deciding on the test scope. For example, in a typical data warehouse it is quite easy to recover data during the extract phase (i.e., when data is being extracted from a legacy system based on date/time criteria).

It may be more difficult to recover from a downstream data warehouse or data mart load process. Be sure to enlist the help of application developers and system architects to identify the stress points in the overall system.

It is important to regularly test for disaster tolerance, particularly if new hardware and/or software components are introduced to the system being tested. As new applications are created and improved, they should be tested in the isolated disaster-testing environment.

Test Scripts
The disaster test plan should include test scripts, detailing the actions and activities required to actually conduct the technical tests. These scripts can be simple or complex, and can be used to provide instructions to test participants. The test scripts should be prepared by the business analysts and application developers.

Establish Success Criteria
In theory, success criteria can be measured in several ways. Success can mean identifying a weakness in the system highlighted in the test cycle, or successfully executing a series of scripts to recover critical processes that were impacted by the disaster test case. Use SLAs to help establish quantifiable measures of success; SLAs should already exist specifically for disaster recovery criteria. In general, if the disaster testing results meet or beat the SLA standards, then the exercise can be considered a success.

Environment and Logistical Requirements
Logistical requirements include schedules, resources, materials, and premises, as well as hardware and software needs. The test schedule is important because it explains what will happen and when. For example, if the electricity supply is going to be turned off or the plug pulled on a particular server, it must be scheduled and communicated to all concerned parties. Try to prepare a dedicated environment for disaster testing, and make sure that the testing environment is kept up to date with code and infrastructure changes that are being applied in the normal system testing environment(s).

Staff Roles and Responsibilities
Encourage the organization's IT security team to participate in a disaster testing exercise. They can assist in simulating an attack on the database, identifying vulnerable access points on a network, and fine-tuning the test plan. It is advisable to involve other business and IT departmental staff in the testing where possible, not just the department members who planned the test. If other staff can understand the plan and successfully recover the system by following it, then the impact of a real disaster is reduced. Ensure that the test plan is approved by the appropriate staff members and business groups. Any deficiencies in this area need to be addressed, because a good test plan forms the basis of an overall disaster recovery strategy for the system.

Executing Disaster Tests
Disaster test execution should expose any flaws in the system architecture or in the test plan itself. The testing team should be able to run the tests based on the information within the test plan and the instructions in the test scripts. The test team is responsible for capturing and logging test results, and needs to communicate any issues in a timely manner to the application developers, business analysts, end-users, and system architects. Involve business representatives as well as IT testing staff in the disaster testing exercise: IT testing staff can focus on technical recovery of the system, while business users can identify the key areas for recovery and prepare backup strategies and procedures in case system downtime exceeds normal expectations.

Data Migration Projects
While data migration projects don't require a full-blown disaster recovery solution, it is recommended to establish a disaster recovery plan. Typically this is a simple document identifying emergency procedures to follow if something were to happen to any of the major pieces of infrastructure. Additionally, a back-out plan should be present in the event the migration must stop mid-stream during the final implementation weekend.

Conclusion and Postscript
Disaster testing is a critical aspect of the overall system testing strategy, even if disaster tolerance is not considered a high priority by the business. If conducted properly, disaster testing provides valuable feedback and lessons that will prove important if a real disaster strikes.

Postscript: Backing Up PowerCenter Components
Apply safeguards to protect important PowerCenter components. Be sure to back up the production repository every day. The backup takes two forms: a database backup of the repository schema organized by the DBA, and a backup using the pmrep syntax that can be called from a script. It is also advisable to back up the pmserver.cfg, pmrepserver.cfg, and odbc.ini files. (A hedged example of a scripted pmrep backup follows at the end of this subtask.)

Best Practices
Disaster Recovery Planning with PowerCenter HA Option
PowerCenter Enterprise Grid Option

Sample Deliverables
None

Last updated: 06-Dec-07 14:56
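As an addendum to the Postscript above, the scripted repository backup can be as simple as the hedged sketch below. The pmrep option names vary between PowerCenter releases, and the repository, domain, credential, and file names shown are hypothetical, so confirm the exact syntax against the pmrep reference before relying on it.

    # Sketch only: nightly scripted backup of a PowerCenter repository using
    # pmrep. Option names and all object names are hypothetical; verify them
    # against the pmrep reference for the installed release.

    import datetime
    import subprocess
    import sys

    def backup_repository(repository, domain, user, password, backup_dir):
        stamp = datetime.date.today().strftime("%Y%m%d")
        output_file = f"{backup_dir}/{repository}_{stamp}.rep"

        connect = ["pmrep", "connect", "-r", repository, "-d", domain,
                   "-n", user, "-x", password]
        backup = ["pmrep", "backup", "-o", output_file]

        for cmd in (connect, backup):
            result = subprocess.run(cmd, capture_output=True, text=True)
            print(result.stdout)
            if result.returncode != 0:
                return result.returncode    # stop and surface the failure
        return 0

    if __name__ == "__main__":
        sys.exit(backup_repository("REP_PROD", "Domain_Prod", "admin_user",
                                   "admin_password", "/backups/powercenter"))

This complements, rather than replaces, the DBA's database-level backup of the repository schema described above.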

Phase 6: Test

Subtask 6.3.5 Conduct Volume Testing

Description
Basic volume testing seeks to verify that the system can cope with anticipated production data levels. Taken to extremes, volume testing seeks to find the physical and logical limits of a system; this is also known as stress testing. Stress and volume testing seek to determine when and if system behavior changes as the load increases. The test scenarios encountered may never happen in the production environment. However, a well-planned and conducted test exercise provides invaluable reassurance to the business and IT communities regarding the stability and resilience of the system. A volume testing exercise is similar to a disaster testing exercise.

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Database Administrator (DBA) (Primary)
Network Administrator (Secondary)
System Administrator (Secondary)
Test Manager (Primary)

Considerations

Understand Service Level Agreements
Before starting the volume test exercise, consider the Service Level Agreements (SLAs) for the particular system.

The SLA should set measures for system availability and projected temporal growth in the amount of data being stored by the system. The SLAs are the benchmark against which to measure the volume test results. Volume testing exercises should aim to simulate throughput at peak periods as well as normal periods. Stress testing goes beyond the peak-period data volumes in order to find the limits of the system.

Estimate Projected Data Volumes Over Time and Consider Peak Load Periods
Enlist the help of the DBAs and Business Analysts to estimate the growth in projected data volume across the lifetime of the system, and use the projected data volumes to provide benchmarks for testing. Bear in mind that the net volume of data will increase over time. Organizations also often experience higher-than-normal periods of activity at predictable times: for example, a retailer or credit card supplier may experience peak activity during weekends or holiday periods, and a bank may have month-end or year-end processes and statements to produce. Remember to make allowances for any data archiving strategy that exists in the system; data archiving helps to reduce the volume of data in the actual core production system.

If the project includes data quality operations, consult with a Data Quality Developer when estimating data volumes over time and peak load periods. A task such as duplicate record identification (known as data matching in Informatica Data Quality parlance) can place significant demands on system resources. Informatica Data Quality (IDQ) can perform millions or billions of comparison operations in a matching process. Data matching is also a processor-intensive activity: the speed of the processor has a significant impact on how fast a matching process completes, and the time available for the completion of a matching process can have a big impact on the perception that the plan is running correctly. For these reasons, data matching operations are often scheduled for off-peak periods.

Volume Test Planning
Volume test planning is similar in many ways to disaster test planning. See 6.3.4 Conduct Disaster Recovery Testing for details on disaster test planning guidelines. However, there are some volume-test specific issues to consider during the planning stage:

Obtaining Volume Test Data and Data Scrambling
The test team responsible for completing the end-to-end test plan should ensure that the volume(s) of test data accurately reflect the production business environment. Obtaining adequate volumes of data for testing in a non-production environment can be time-consuming and logistically difficult. Some organizations choose to copy data from the production environment into the test system. For new applications, production data probably does not exist, so remember to make allowances in the test plan for this. Some commercially-available software products can generate large volumes of data; alternatively, one of the developers may be able to build a customized suite of programs to artificially generate data.

Security protocol needs to be maintained if data is copied from a production environment, since the data is likely to need to be scrambled. Some of the popular RDBMS products contain built-in scrambling packages; third-party scrambling solutions are also available. Contact the DBA and the IT security manager for guidance on the data scrambling protocol of the department or organization.

Increasing Data Volumes
Volume testing cycles need to include normal expected volumes of data and some exceptionally high volumes of data. Incorporate peak-period loads into the volume testing schedules. If stress tests are being carried out, data volumes need to be increased even further. Additional pressure can be applied to the system, for example by adding a high number of database users or temporarily bringing down a server.

Hardware and Network Requirements and Test Timing
Remember to consider the hardware and network characteristics when conducting volume testing. Do they match the production environment? Be sure to make allowances for the test results if there is a shortfall in processing capacity or network limitations in the test environment. Volume testing may involve ensuring that testing occurs at an appropriate time of day and day of week, taking into account any other applications that may negatively affect the database and/or network resources.
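Where production data is copied down and scrambled as described under Obtaining Volume Test Data and Data Scrambling above, the database's own masking package or a third-party tool is the usual choice. Purely as an illustration of the idea, the hedged sketch below irreversibly masks sensitive columns in a delimited extract; the column names and file names are hypothetical.

    # Sketch only: scramble sensitive columns of a delimited extract before it
    # is loaded into a non-production environment. Column and file names are
    # hypothetical; production-grade masking normally uses the RDBMS or a
    # dedicated masking tool.

    import csv
    import hashlib

    SENSITIVE_COLUMNS = ["CUSTOMER_NAME", "NATIONAL_ID", "CREDIT_CARD_NO"]

    def mask(value, salt="systest"):
        """Return a repeatable but irreversible token for a sensitive value."""
        digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
        return digest[:12].upper()

    def scramble_file(in_path, out_path):
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                for col in SENSITIVE_COLUMNS:
                    if row.get(col):
                        row[col] = mask(row[col])
                writer.writerow(row)

    if __name__ == "__main__":
        scramble_file("customer_extract.csv", "customer_extract_masked.csv")

Because the same input always produces the same token, joins and duplicate-matching behaviour are preserved even though the real values are no longer recoverable.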

exceeding the maximum file size supported on an INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Any particular stress test cases need to be logged in the test plan and the test schedules. writing to disk etc.Data Warehousing 420 of 1017 . The results can be displayed in Data Analyzer dashboards or exported to other media (e. server performance and network efficiency. Alternatively. PowerCenter Metadata Reporter provides an excellent method of logging PowerCenter session performance over time. Physical or user-defined limits may be reached on particular parameters. data transfer efficiency. Eventually however. Be sure to capture performance statistics for PowerCenter tasks. and Bottlenecks If the system has been well-designed and built. such as those related to CPU usage.g. The views in the PowerCenter Repository can also be queried directly with SQL statements.. Scalability. the applications are more likely to perform in a predictable manner as data volumes increase. Run the Metadata Reporter for each test cycle to capture session and workflow lapse time. use the features within the scheduling tool to capture lapse time data. collaboration should occur with the network and server administrators regarding the option to capture additional statistics. the limits of the system are likely to be exposed as data volumes reach a critical mass and other stresses are introduced into the system. If jobs and tasks are being run through a scheduling tool. Volume and Stress Test Execution ● Volume Test Results Logging The volume testing team is responsible for capturing volume test results. In addition. use shell scripts or batch file scripts to retrieve time and process data from the operating system. database throughput. PDF files). The type of statistics to capture depend on the operating system in use. This is known as scalability and is a very desirable trait in any software system. ● System Limits. For example.

However, breaching sort space parameters by running a database SQL query probably constitutes a limit that has been defined by the DBA. For example, a SQL query called in a PowerCenter session may experience a sudden drop in performance when data volumes reach a threshold figure. The DBA and application developer need to investigate any sudden drop in the performance of a particular query. Bottlenecks are likely to appear in the load processes before such limits are exceeded.

Conclusion
Volume and stress testing are important aspects of the overall system testing strategy. Volume and stress testing is intended to gradually increase the data load in order to expose weaknesses in the system as a whole. A sound system architecture and well-built software applications can help prevent sudden catastrophic errors; however, be aware that it is not possible to test all scenarios that may cause the system to crash. The test results provide important information that can be used to resolve issues before they occur in the live system.

Best Practices
None

Sample Deliverables
None

Last updated: 18-Oct-07 15:11

Phase 6: Test

Task 6.4 Conduct User Acceptance Testing

Description

User Acceptance Testing (UAT) is arguably the most important step in the project and is crucial to verifying that the system meets the users’ requirements. The function of the user acceptance testing is to obtain final functional approval from the user community for the solution to be deployed into production, primarily through the presentation layer. As such, UAT is considered black box testing (i.e., without knowledge of all the underlying logic) that focuses on the deliverables to the end user. Being business usage-focused, it relates to the business requirements rather than to testing all the details of the technical specification. As such, every effort must be made to replicate the production conditions.

Prerequisites
None

Roles

End User (Primary)
Test Manager (Primary)
User Acceptance Test Lead (Primary)

Considerations

Plans

By this time, User Acceptance Criteria should have been precisely defined by the user community, as well as the specific business objectives and requirements for the project. UAT is the responsibility of the user community in terms of organization, staffing, and final acceptance, but much of the preparation will have been undertaken by IT staff working to a plan agreed with the users. UAT Acceptance Criteria should include:

● tolerable bug levels, based on the defect management procedures
● report validation procedures (data audit, etc.), including “gold standard” reports to use for validation
● data quality tolerances that must be met
● validation procedures that will be used for comparison to existing systems (especially for validation of data migration/synchronization projects or operational integration)
● required performance tolerances, including response time and usability

A minimal report-comparison sketch follows this list.
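The methodology does not mandate a particular validation harness for the “gold standard” reports. As a hedged sketch, the Python snippet below compares a gold-standard report extract with the corresponding extract from the new system and reports row-count differences and mismatched rows. The file names and the key column are assumptions for illustration.

    import csv

    KEY_COLUMN = "account_id"   # assumption: the business key used to align report rows

    def load_report(path: str) -> dict:
        with open(path, newline="") as handle:
            return {row[KEY_COLUMN]: row for row in csv.DictReader(handle)}

    def compare_reports(gold_path: str, candidate_path: str) -> None:
        gold = load_report(gold_path)
        candidate = load_report(candidate_path)
        missing = sorted(set(gold) - set(candidate))
        extra = sorted(set(candidate) - set(gold))
        mismatched = [key for key in set(gold) & set(candidate) if gold[key] != candidate[key]]
        print(f"gold rows: {len(gold)}, candidate rows: {len(candidate)}")
        print(f"missing from candidate: {len(missing)}, unexpected in candidate: {len(extra)}")
        print(f"rows with differing values: {len(mismatched)}")

    if __name__ == "__main__":
        compare_reports("gold_standard_revenue_report.csv", "uat_revenue_report.csv")

Results of a comparison like this feed directly into the data quality tolerances agreed in the acceptance criteria.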

As the testers may not have a technical background, the plan should include detailed procedures for testers to follow. The success of UAT depends on having certain critical items in place:

● Formal testing plan supported by detailed test scripts
● Properly configured environment, including the required test data (ideally a copy of the real, production environment and data)
● Adequately experienced test team members from the end user community
● Technical support personnel to support the testing team and to evaluate and remedy problems and defects discovered

Staffing the User Acceptance Testing

It is important that the user acceptance testers and their management are thoroughly committed to the new system and to ensuring its success. There needs to be communication with the user community so that they are informed of the project’s progress and able to identify appropriate members of staff to make available to carry out the testing. These participants will become the users most equipped to adopt the new system and so should be considered “super-users” who may participate in user training thereafter.

Best Practices
None

Sample Deliverables
None

Last updated: 16-Feb-07 14:07

Phase 6: Test

Task 6.5 Tune System Performance

Description

Tuning a system can, in some cases, provide orders of magnitude performance gains. However, tuning is not something that should just be performed after the system is in production. More importantly, tuning is a philosophy; it is a concept of continual analysis and optimization. The concept of performance must permeate all stages of development, testing, and deployment. Decisions made during the development process can seriously impact performance, and no level of production tuning can compensate for an inefficient design that must be redeveloped. The information in this section is intended for use by Data Integration Developers, Data Quality Developers, Database Administrators, and System Administrators, but should be useful for anyone responsible for the long-term maintenance, performance, and support of PowerCenter Sessions, Data Quality Plans, PowerExchange Connectivity, and Data Analyzer Reports.

Prerequisites
None

Roles

Data Integration Developer (Primary)
Data Warehouse Administrator (Primary)
Database Administrator (DBA) (Primary)
Network Administrator (Primary)
Presentation Layer Developer (Primary)
Quality Assurance Manager (Review Only)

Repository Administrator (Primary)
System Administrator (Primary)
System Operator (Primary)
Technical Project Manager (Review Only)
Test Manager (Primary)

Considerations

Performance and tuning the Data Integration environment is more than just simply tuning PowerCenter or any other Informatica product; the entire end-to-end process must be considered and measured. The unit of work being baselined may be a single PowerCenter session, for example, but it is always necessary to consider the end-to-end process of that session in the tuning efforts. Often, tuning efforts mistakenly focus on PowerCenter as the only point of concern when there may be other areas causing the bottleneck and needing attention. If you are sourcing data from a relational database, for example, your data integration loads can never be faster than the source database can provide data; if the source database is poorly indexed, poorly implemented, or underpowered, no amount of downstream tuning in PowerCenter can fix the problem of slow source data access. True system performance analysis requires looking at all areas of the environment to determine opportunities for better performance: relational database systems, file systems, network, and even hardware. While it is certainly important to focus on a specific area, the tuning effort requires benchmarking, followed by small incremental tuning changes to the environment, then re-executing the benchmarked data integration processes to determine the effect of the tuning changes.

Another important consideration of system tuning is the availability of an on-going means to monitor system performance. A good monitoring system may involve a variety of technologies (database, file systems, network bandwidth, hardware, etc.) to provide a full view of the environment. Throughout the tuning process, continuously monitoring the performance of the system may reveal areas that show degradation over time, and sometimes even immediate, extreme degradation for one reason or another. Quick identification of these areas allows proactive tuning and adjustments before the problems become catastrophic. A minimal monitoring sketch follows this paragraph.
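No specific monitoring tool is prescribed here. The following Python sketch is one hedged way to flag degradation by comparing the latest run duration of each process against its recent average; the results-file layout matches the illustrative log used earlier in this chapter and is an assumption.

    import csv
    from collections import defaultdict

    RESULTS_FILE = "volume_test_results.csv"    # assumption: cycle, session, start, end, rows, seconds, rows/sec
    DEGRADATION_THRESHOLD = 1.25                # flag runs more than 25% slower than the recent average

    def detect_degradation() -> None:
        history = defaultdict(list)
        with open(RESULTS_FILE, newline="") as handle:
            for cycle, session, start, end, rows, seconds, rps in csv.reader(handle):
                history[session].append(float(seconds))
        for session, durations in history.items():
            if len(durations) < 3:
                continue  # not enough history to establish a trend
            baseline = sum(durations[:-1]) / len(durations[:-1])
            latest = durations[-1]
            if latest > baseline * DEGRADATION_THRESHOLD:
                print(f"WARNING: {session} ran in {latest:.0f}s vs. baseline {baseline:.0f}s")

    if __name__ == "__main__":
        detect_degradation()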

For Data Migration projects, performance is often an important consideration. If a data migration project is the result of the implementation of a new package application or operational system, a down-time is usually required. Because this down-time may prevent the business from operating, the scheduled outage window must be as short as possible. Therefore, performance tuning is often addressed between system tests.

Note: The PowerCenter Administrator's Guide provides extensive information on performance tuning and is an excellent reference source on this topic.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49

Phase 6: Test

Subtask 6.5.1 Benchmark

Description

Benchmarking involves the process of running sessions or reports and collecting run statistics to set a baseline for comparison. The benchmark can be used as the standard for comparison after the session or report is tuned for performance. When determining a benchmark, the two key statistics to record are:

● session duration from start to finish, and
● rows per second throughput.

Prerequisites
None

Roles

Data Integration Developer (Primary)
Data Warehouse Administrator (Primary)
Database Administrator (DBA) (Primary)
Network Administrator (Primary)
Presentation Layer Developer (Primary)
Repository Administrator (Primary)
System Administrator (Primary)
Test Manager (Primary)

Considerations

Since the goal of this task is to improve the performance of the entire system, it is important to choose a variety of mappings to benchmark. Having a variety of mappings ensures that optimizing one session does not adversely affect the performance of another session. After choosing a set of mappings, create a set of new sessions that use the default settings. Run these sessions when no other processes are running in the background. If it is not possible to run the session without background processes, schedule the session to run daily at a time when there are not many processes running on the server. Be sure that the session runs at the same time each day or night for benchmarking, and that it runs at the same time for future tests. It is important to work with the same exact data set each time you run a session for benchmarking and performance tuning. For example, if you run 1,000 rows for the benchmark, it is important to run the exact same rows for future performance tuning tests.

Track two values for rows per second throughput: rows per second as calculated by PowerCenter (from transformation statistics in the session properties), and the average rows processed per second (based on total time duration divided by the number of rows loaded). After the statistics are gathered, identify the sessions that have the lowest throughput or that miss their load window. These sessions are the first candidates for performance tuning. When the benchmark is complete, the sessions should be tuned for performance. It should be possible to identify potential areas for improvement by considering the machine, network, database, and PowerCenter session and server process.

Data Analyzer benchmarking should focus on the time taken to run the source query, generate the report, and display it in the user’s browser.

Tip: Tracking Results

One way to track benchmarking results is to create a reference spreadsheet. This should define the number of rows processed for each source and target, the session start time, end time, time to complete, and rows per second throughput. Track the performance results in the spreadsheet over a period of days or for several runs, then compile the average of the results in a new spreadsheet. A minimal tracking sketch follows this tip.
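A spreadsheet is the approach suggested above. As a hedged alternative, the short Python sketch below aggregates the same statistics from a run log and prints average duration and throughput per session; the column layout is an illustrative assumption.

    import csv
    from collections import defaultdict

    def summarize(benchmark_log: str) -> None:
        # Expected columns (assumption): session, run_date, rows_loaded, seconds_elapsed
        totals = defaultdict(lambda: {"runs": 0, "rows": 0, "seconds": 0.0})
        with open(benchmark_log, newline="") as handle:
            for row in csv.DictReader(handle):
                entry = totals[row["session"]]
                entry["runs"] += 1
                entry["rows"] += int(row["rows_loaded"])
                entry["seconds"] += float(row["seconds_elapsed"])
        for session, entry in sorted(totals.items()):
            avg_seconds = entry["seconds"] / entry["runs"]
            avg_rps = entry["rows"] / entry["seconds"] if entry["seconds"] else 0.0
            print(f"{session}: {entry['runs']} runs, avg {avg_seconds:.0f}s, avg {avg_rps:.0f} rows/sec")

    if __name__ == "__main__":
        summarize("benchmark_runs.csv")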

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49

Phase 6: Test

Subtask 6.5.2 Identify Areas for Improvement

Description

The goal of this subtask is to identify areas for improvement, based on the performance benchmarks established in Subtask 6.5.1 Benchmark.

Prerequisites
None

Roles

Data Integration Developer (Primary)
Data Warehouse Administrator (Primary)
Database Administrator (DBA) (Primary)
Network Administrator (Primary)
Presentation Layer Developer (Primary)
Repository Administrator (Primary)
System Administrator (Primary)
Test Manager (Primary)

Considerations

After performance benchmarks are established (in 6.5.1 Benchmark), careful analysis of the results can reveal areas that may be improved through tuning. It is important to consider all possible areas for improvement, including:

● Machine: Consider the resources available on the server (memory, disk I/O, and processing) and competing activities such as session processing and backup.
● Database: Database tuning is, in itself, an art form and is largely dependent on the DBA's skill, finesse, and in-depth understanding of the database engine. A major consideration in tuning databases is in defining throughput versus response time. It is important to understand that analytic solutions define their performance in response time, while many OLTP systems measure their performance in throughput, and most DBAs are schooled in OLTP performance tuning rather than response time tuning. Each of the three functional areas of database tuning (i.e., memory, disk I/O, and processing) must be addressed for optimal performance, or one of the other areas will suffer.
● Network: An often-overlooked facet of system performance, network optimization can have a major effect on overall system performance. For example, if the process of moving or FTPing files from a remote server takes four hours and the PowerCenter session takes four minutes, then optimizing and tuning the network may help to shorten the overall process of data movement. Key considerations for network performance include the network card and its settings, available bandwidth, network protocol employed, and packet size settings.
● PowerCenter: Most systems need to tune the PowerCenter session and server process in order to achieve an acceptable level of performance. Regardless of whether the system is UNIX- or NT-based, tuning the server daemon process and individual sessions can increase performance by a factor of 2 or 3, or more. These goals can be achieved by decreasing the number of network hops between the server and the databases, and by eliminating paging of memory on the server running the PowerCenter sessions.
● Data Analyzer: It is possible that tuning may be required for source queries and the reports themselves if the time taken to generate the report on screen takes too long.

The actual tuning process can begin after the areas for improvement have been identified and documented, as illustrated in the timing sketch at the end of this subsection.

For data migration projects, other considerations must be included in the performance tuning activities. Many ERP applications have two-step processes where the data is loaded through simulated on-line processes; more specifically, an API is executed that replicates in a batch scenario the way that the on-line entry works, executing all edits. In such a case, performance will not be the same as in a scenario where a relational database is being populated. The best approach to performance tuning is to set the expectation that all data errors should be identified and corrected in the ETL layer prior to the load to the target application. This approach can improve performance by as much as 80%.
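To make the end-to-end comparison above concrete (for example, file transfer time versus session run time), the hedged Python sketch below times each stage of a load pipeline so the dominant stage is obvious. The stage functions are placeholders for the real steps, not Informatica calls.

    import time

    def timed(stage_name: str, stage_function) -> float:
        started = time.perf_counter()
        stage_function()
        elapsed = time.perf_counter() - started
        print(f"{stage_name}: {elapsed:.1f}s")
        return elapsed

    def transfer_source_files() -> None:
        # Placeholder for the FTP/copy step from the remote server.
        time.sleep(0.1)

    def run_load_session() -> None:
        # Placeholder for launching the data integration session via the scheduler.
        time.sleep(0.1)

    def rebuild_indexes() -> None:
        # Placeholder for post-load database work.
        time.sleep(0.1)

    if __name__ == "__main__":
        stages = [("file transfer", transfer_source_files),
                  ("load session", run_load_session),
                  ("post-load database work", rebuild_indexes)]
        durations = {name: timed(name, func) for name, func in stages}
        bottleneck = max(durations, key=durations.get)
        print(f"Largest share of the end-to-end window: {bottleneck}")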

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49

Phase 6: Test

Subtask 6.5.3 Tune Data Integration Performance

Description

The goal of this subtask is to implement system changes to improve overall system performance, based on the areas for improvement that were identified and documented in Subtask 6.5.2 Identify Areas for Improvement.

Prerequisites
None

Roles

Data Integration Developer (Primary)
Database Administrator (DBA) (Primary)
Network Administrator (Primary)
Quality Assurance Manager (Review Only)
Repository Administrator (Primary)
System Operator (Primary)
Technical Project Manager (Review Only)
Test Manager (Primary)

Considerations

Performance tuning should include the following steps (a minimal monitoring sketch for step 1 follows this list):

1. Run a session and monitor the server to determine if the system is paging memory or if the CPU load is too high for the number of available processors. If the system is paging, correcting the system to prevent paging (e.g., increasing the physical memory available on the machine) can greatly improve performance.
2. Re-run the session and monitor the performance details, watching the buffer input and outputs for the sources and targets.
3. Tune the source system and target system based on the performance details. Once the source and target are optimized, re-run the PowerCenter session or Data Analyzer report to determine the impact of the changes.
4. Only after the server, source, and target have been tuned to their peak performance should the mapping and session be analyzed for tuning. In most cases, the DTM (data transformation manager) process should be the slowest portion of the session details; this is the optimal desired performance, because it indicates that the source data is arriving quickly, the target is inserting the data quickly, and the actual application of the business rules is the slowest portion.
5. Because the purpose of most mappings is to enforce the business rules, and the business rules are usually dictated by the business unit in concert with the end-user community, it is rare that the mapping itself can be greatly tuned. Points to look for in tuning mappings are: filtering unwanted data early, cached lookups, aggregators that can be eliminated by programming finesse, and using sorted input on certain active transformations. Only minor tuning of the session can be conducted at this point and usually has only a minimal effect.
6. Finally, after the tuning achieves a desired level of performance, re-run the benchmark sessions, comparing the new performance with the old performance. In some cases, optimizing one or two sessions to run quickly can have a disastrous effect on another mapping, and care should be taken to ensure that this does not occur.

For more details on tuning mappings and sessions, refer to the Best Practices.
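For step 1, any OS-level monitor will do. The hedged Python sketch below uses the third-party psutil package to sample CPU and memory pressure while a session runs; the thresholds and sampling window are illustrative assumptions.

    import time
    import psutil  # third-party package: pip install psutil

    SAMPLE_SECONDS = 60      # assumption: sample for one minute during the session run
    CPU_ALERT = 90.0         # percent
    SWAP_ALERT = 10.0        # percent of swap in use suggests the server may be paging

    def monitor_server() -> None:
        for _ in range(SAMPLE_SECONDS):
            cpu = psutil.cpu_percent(interval=1)
            memory = psutil.virtual_memory().percent
            swap = psutil.swap_memory().percent
            flags = []
            if cpu >= CPU_ALERT:
                flags.append("CPU saturated")
            if swap >= SWAP_ALERT:
                flags.append("possible paging")
            status = ", ".join(flags) if flags else "ok"
            print(f"cpu={cpu:5.1f}%  mem={memory:5.1f}%  swap={swap:5.1f}%  -> {status}")

    if __name__ == "__main__":
        monitor_server()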

Best Practices
Session and Data Partitioning

Sample Deliverables
None

Last updated: 18-Oct-07 15:14

Phase 6: Test

Subtask 6.5.4 Tune Reporting Performance

Description

The goal of this subtask is to identify areas where changes can be made to improve the performance of Data Analyzer reports.

Prerequisites
None

Roles

Database Administrator (DBA) (Primary)
Network Administrator (Secondary)
Presentation Layer Developer (Primary)
Quality Assurance Manager (Review Only)
Repository Administrator (Primary)
System Administrator (Primary)
Technical Project Manager (Review Only)
Test Manager (Primary)

Considerations

Database Performance

1. Generate SQL for each report and explain this SQL in the database to determine if the most efficient access paths are being used.

2. Tune the database hosting the data warehouse and add indexes on the key tables. This should significantly enhance Data Analyzer's reporting performance. Take care in adding indexes, since indexes affect ETL load times.
3. Analyze SQL requests made against the database to identify common patterns with user queries.
4. If you find that many users are running aggregations against detail tables, consider creating an aggregate table in the database and performing the aggregations via ETL processing.

Data Analyzer Performance

1. Data Analyzer report rendering performance is directly related to the number of rows returned from the database. Try to restrict as much data as possible: within Data Analyzer, use filters within reports as much as possible. Also try to architect reports to start out with a high-level query, then provide analytic workflows to drill down to more detail.
2. If the data within the report does not get updated frequently, make the report a cached report. This will save time when the user runs the report, as the data will already be aggregated. If the data is being updated frequently, make the report a dynamic report.
3. Try to avoid sectional reports as much as possible, since they take more time in rendering.
4. Reports run in batches can use considerable resources. Therefore, such reports should be run at the time when there is least use on the system, subject to other dependencies. Schedule reports to run during off-peak hours.

Application Server Performance

1. Fine-tune the application server Java Virtual Machine (JVM) to correspond with the recommendations in the Best Practice on Data Analyzer Configuration and Performance Tuning.
2. Ensure that the application server has sufficient CPU and memory to handle the expected user load. Strawman estimates for CPU and memory are as follows:

   ● 1 CPU per 50 users
   ● 1-2 GB RAM per CPU

3. You may need additional CPUs if a large number of reports are on-demand, and additional memory if a large number of reports are cached.

A small sizing sketch based on these estimates follows this list.
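The strawman ratios above translate directly into a rough sizing calculation; the Python sketch below is a hedged illustration of that arithmetic, not a formal capacity-planning model.

    import math

    USERS_PER_CPU = 50        # from the strawman estimate above
    MIN_GB_PER_CPU = 1
    MAX_GB_PER_CPU = 2

    def strawman_sizing(expected_users: int) -> None:
        cpus = max(1, math.ceil(expected_users / USERS_PER_CPU))
        print(f"{expected_users} concurrent users -> about {cpus} CPU(s), "
              f"{cpus * MIN_GB_PER_CPU}-{cpus * MAX_GB_PER_CPU} GB RAM")

    if __name__ == "__main__":
        for users in (50, 180, 400):
            strawman_sizing(users)

Adjust the result upward where reports are heavily cached (memory) or largely on-demand (CPU), as noted above.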

Best Practices
None

Sample Deliverables
None

Last updated: 16-Feb-07 14:09

Phase 7: Deploy

7 Deploy

● 7.1 Plan Deployment
   7.1.1 Plan User Training
   7.1.2 Plan Metadata Documentation and Rollout
   7.1.3 Plan User Documentation Rollout
   7.1.5 Develop Communication Plan
   7.1.6 Develop Run Book
● 7.2 Deploy Solution
   7.2.1 Train Users
   7.2.2 Migrate Development to Production
   7.2.3 Package Documentation

Phase 7: Deploy

Description

Upon completion of the Build Phase (when both development and testing are finished), the data integration solution is ready to be installed in a production environment and submitted to the ultimate test as a viable solution that meets the users' requirements. Up to this point, developers have been developing data cleansing, data transformations, load processes, reports, and dashboards in one or more development environments. To the end user, deploying a data integration solution is the final step in the development process; this is where the fruits of the project are exposed and the end user acceptance begins.

The deployment strategy developed during the Architect Phase is now put into action. During the Build Phase, components are created that may require special initialization steps and procedures. For the production deployment, checklists and procedures are developed to ensure that crucial steps are not missed in the production cut over.

Metadata, which is the cornerstone of any data integration solution, should play an integral role in the documentation and training rollout to users. Not only is metadata critical to the current data integration effort, but it will be integral to planned metadata management projects down the road.

After the solution is actually deployed, it must be maintained to ensure stability and scalability. As data volumes grow and user interest increases, organizations face many hurdles such as software upgrades, additional functionality requests, and regular maintenance. All data integration solutions must be designed to support change as user requirements and the needs of the business change. Whether a project team is developing the back-end processes for a legacy migration project or the front-end presentation layer for a metadata management system, use the Deploy Phase as a guide to deploying an on-time, scalable, and maintainable data integration solution that provides business value to the user community.

Prerequisites
None

Roles

Business Analyst (Primary)
Business Project Manager (Primary)
Data Architect (Secondary)
Data Quality Developer (Primary)
Data Warehouse Administrator (Primary)
Database Administrator (DBA) (Primary)
End User (Secondary)
Metadata Manager (Primary)
Presentation Layer Developer (Primary)
Project Sponsor (Approve)
Quality Assurance Manager (Approve)
Repository Administrator (Primary)
System Administrator (Primary)
Technical Architect (Secondary)
Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 7: Deploy

Task 7.1 Plan Deployment

Description

The success or failure associated with deployment often determines how users and management perceive the completed data integration solution. The steps involved in planning and implementing deployment are, therefore, critical to project success. This task addresses three key areas of deployment planning:

● Training
● Metadata documentation
● User documentation

Prerequisites
None

Roles

Application Specialist (Secondary)
Business Analyst (Review Only)
Data Integration Developer (Secondary)
Database Administrator (DBA) (Primary)
End User (Secondary)
Metadata Manager (Primary)
Project Sponsor (Primary)
Quality Assurance Manager (Review Only)

System Administrator (Secondary)
Technical Project Manager (Secondary)

Considerations

Although training and documentation are considered part of the Deploy Phase, both activities need to start early in the development effort and continue throughout the project lifecycle. Neither can be planned nor implemented effectively without the following:

● Thorough understanding of the business requirements that the data integration is intended to address
● In-depth knowledge of the system features and functions and its ability to meet business users' needs
● Understanding of the target users, including how, when, and why they will be using the system

Companies that have training and documentation groups in place should include representatives of these groups in the project development team, ensuring effective knowledge transfer throughout the development effort. Companies that do not have groups in place need to assign resources on the project team to these tasks. If this is the case, the determination to create a training program must be made as early in the project lifecycle as possible, and the project plan must specify the necessary resources and development time. Although most companies have training programs and facilities in place, it is sometimes necessary to create these facilities to provide training on the data integration solution. Creating a new training program is a double-edged sword: it can be quite time-consuming and costly, especially if additional personnel and/or physical facilities are required, but it also gives project management the opportunity to tailor a training program specifically for users of the solution rather than "fitting" the training needs into an existing program.

And everyone involved in the system design and build should understand the need for good documentation and make it a part of his or her everyday activities. This "in-process" documentation then serves as the foundation for the training curriculum and user documentation that is generated during the Deploy Phase. Project management also needs to determine policies and procedures for documenting and automating metadata reporting early in the deployment process, rather than making reporting decisions on-the-fly.

Finally, it is important to recognize the need to revise the end-user documentation and training curriculum over the course of the project lifecycle as the system and user requirements change. Documentation and training should both be developed with an eye toward flexibility and future change.

For Data Migration projects, it is very important that the operations team has the tools and processes to allow for a mass deployment of large amounts of code at one time, in a consistent manner. Capabilities should include:

● The ability to migrate code efficiently with little effort
● The ability to report what was deployed
● The ability to roll back changes if necessary

This is why team-based development is normally a part of any data migration project.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49


Phase 7: Deploy

Subtask 7.1.1 Plan User Training

Description

Companies often misjudge the level of effort and resources required to plan, create, and successfully implement a user training program. A well-designed and well-planned training program is a "must have" for a data integration solution to be considered successfully deployed. The first step in planning user training is identifying the system users and understanding both their needs and their existing level of expertise. In some cases, it may only be necessary to train administrative users. For example, when deploying a metadata management system, it may be necessary to train administrative users, presentation layer users, and business users separately, while when deploying a data conversion project, such as a legacy migration initiative, it may be that very little training is required on the data integration component of the project. Note also that users of data quality applications such as Informatica Data Quality or Informatica Data Explorer will require training, and that these products may be of interest to personnel at several layers of the organization.

Planning user training also entails ensuring the availability of appropriate facilities. Ideally, training should take place on a system that is separate from the development and production environments and that mirrors the production environment, but is populated with only a small subset of data. If a separate system is not available, training can use either a development or production platform, but this arrangement raises the possibility of affecting either the development efforts or the production data. In any case, if sensitive production data is used in a training database, ensure appropriate security measures are in place to prevent unauthorized users in training from accessing confidential data.

The project plan should include sufficient time and resources for implementing the training program - from defining the system users and their needs, to developing class schedules geared toward training as many users as possible, efficiently and effectively, with minimal disruption of everyday activities.

Prerequisites
None

Roles

End User (Secondary)

Considerations

Successful training begins with careful planning. Training content and duration must correspond with end-user requirements. In most cases, multiple training programs are required in order to address a wide assortment of user types and needs; in developing a training curriculum, it is important to understand that there is seldom a "one size fits all" solution. It is generally best to focus the curriculum on the needs of "average" users who will be trained prior to system deployment, then consider the specialized needs of high-end (i.e., expert) users and novice users who may be completely unfamiliar with decision-support capabilities. The needs of these specialized users can be addressed most effectively in follow-up classes.

If users do not gain confidence using the system during training, they are unlikely to use the data integration solution on a regular basis in their everyday activities. Many companies overlook the importance of training users on the data content and application, providing only data access tool training. In this case, users often fail to understand the full capabilities of the data integration system, and the company is unlikely to achieve optimal value from the system.

Training for business users usually focuses on three areas:

● The presentation layer
● Data content
● Application

While the presentation layer is often the primary focus of training, data content and application training are also important to business users. Business users often do not need to understand the back-end processes and mechanisms inherent in a data integration solution, but they do need to understand the access tools, the presentation layer, and the underlying data content to use it effectively. Thus, training should focus on these aspects, simplifying the necessary information as much as possible and organizing it to match the users' requirements.

Careful curriculum preparation includes developing clear, attractive training materials, including good graphics and well-documented exercise materials that encourage users to practice using the system features and functions. Laboratory materials can make or break a training program by encouraging users to try using the system on their own. Training materials that contain obvious errors or poorly documented procedures actually discourage users from trying to use the system.

The training curriculum should include a post-training evaluation process that provides users with an opportunity to critique the training program, identifying both its strengths and weaknesses and making recommendations for future or follow-up training classes. The evaluation should address the effectiveness of both the course and the trainer, because both are crucial to the success of a training program.

As an example, the curriculum for a two-day training class on a data integration solution might look something like this:

Data Warehousing 448 of 1017 .Introduction to the presentation layer Day 2 Introduction to the application Introduction to metadata Lunch Integrated application & presentation layer laboratory 1 hour 1 hour 2 hours 1 hour 3 hours Best Practices None Sample Deliverables None Last updated: 01-Feb-07 18:49 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Phase 7: Deploy

Subtask 7.1.2 Plan Metadata Documentation and Rollout

Description

Whether a data integration project is being implemented as a "single-use effort" such as a legacy migration project, or as a longer-term initiative such as data synchronization (e.g., "Single View of Customer"), metadata documentation is critical to the overall success of the project. Metadata is the information map for any data integration effort, and every effort must be expended to satisfy this component of business documentation requirements.

This subtask uses the example of a PowerCenter development environment to discuss the importance of documenting metadata. However, it is important to remember that metadata documentation is just as important for metadata management and presentation-layer development efforts. On the front-end, for example, the PowerCenter development environment is graphical, easy-to-understand, and intuitive. On the back-end, it is possible to capture each step of the data integration process in the metadata, using manual and automatic entries into the metadata repository. Manual entries may include descriptions and business names, while automatic entries are produced while importing a source or saving a mapping. Because every aspect of design can potentially be captured in the PowerCenter repository, careful planning is required early in the development process to properly capture the desired metadata.

When metadata management systems are built, thorough metadata documentation provides end users with an even clearer picture of the potentially vast impact of seemingly minor changes in data structures, and business users have the ability to learn exactly how their data is migrated, transformed, and stored throughout various systems. Proper use and enforcement of metadata standards will, for example, help ensure that future audit requirements are met.

Prerequisites
None

Roles

Database Administrator (DBA) (Primary)
Metadata Manager (Primary)
System Administrator (Secondary)
Technical Project Manager (Review Only)

Considerations

During this subtask, it is important to decide what metadata to capture, how to access it, and when to place change control check points in the process to maintain all the changes in the metadata. The decision about which kinds of metadata to capture is driven by business requirements and project timelines. While it may be beneficial for a developer to enter detailed descriptions of each column, it would also be very time-consuming. The decision, therefore, should be based on how much metadata is actually required by the systems that use metadata.

PowerCenter provides the ability to enter descriptive information for all repository objects, including sources, targets, and transformations; information about column size and scale, datatypes, and primary keys is also stored in the repository. This enables business users to maintain information on the actual business name and description of a field on a particular table, as well as column-level descriptions of the columns in a table. This ability helps users in a number of ways: for example, it eliminates confusion about which columns should be used for a calculation. 'C_Year' and 'F_Year' might be column names on a table, but 'Calendar Year' and 'Fiscal Year' are more useful to business users trying to calculate market share for the company's fiscal year. From the developer's perspective, descriptive information can also be entered for objects such as transformations, variables, expressions, and so forth. A minimal business-name completeness sketch follows the list of access options below.

Because the repository structure can change with any product release, Informatica does not recommend accessing the repository tables directly, even for select access. Informatica provides several methods of gaining access to this data:

● The PowerCenter Metadata Reporter (PCMR) provides Web-based access to the PowerCenter repository. With PCMR, developers and administrators can perform both operational and impact analysis on their data integration projects.
● Informatica continues to provide the "MX Views", a set of views that are installed with the PowerCenter repository. The MX Views are meant to provide query-level access to repository metadata.
● MX2 is a set of encapsulated objects that can communicate with the metadata repository through a standard interface. These MX2 objects offer developers an advanced object-based API for accessing and manipulating the PowerCenter repository from a variety of programming languages.
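Business names and descriptions are only useful if they are maintained consistently. As a hedged illustration (independent of any Informatica API), the Python sketch below checks a simple project glossary for physical columns that still lack a business name or description; the glossary file and its layout are assumptions.

    import csv

    def check_glossary(glossary_path: str) -> None:
        # Expected columns (assumption): table_name, column_name, business_name, description
        incomplete = []
        with open(glossary_path, newline="") as handle:
            for row in csv.DictReader(handle):
                if not row["business_name"].strip() or not row["description"].strip():
                    incomplete.append(f"{row['table_name']}.{row['column_name']}")
        if incomplete:
            print("Columns missing business metadata:")
            for entry in incomplete:
                print(f"  {entry}")
        else:
            print("All columns in the glossary carry a business name and description.")

    if __name__ == "__main__":
        check_glossary("metadata_glossary.csv")

A check like this can be run before each metadata rollout so that, for example, 'C_Year' never reaches business users without its 'Calendar Year' business name.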

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49

Phase 7: Deploy

Subtask 7.1.3 Plan User Documentation Rollout

Description

Good system and user documentation is invaluable for a number of data integration system users, such as:

● New data integration or presentation layer developers, providing details about the data integration architecture and configuration
● Management users who are learning to navigate reports and dashboards
● Enterprise architects trying to develop a clear picture of how systems, data, and metadata are connected throughout an organization
● Business users trying to pull together analytical information for an executive report, focusing on understanding the data and providing details on how and where they can find information within the system

A well-documented project can save development and production team members both time and effort in getting the new system into production and new employees up to speed. User documentation usually consists of two sets: one geared toward ad-hoc users, and another geared toward "push button" users. This increasingly includes documentation on how to use and/or access metadata.

Prerequisites
None

Roles

Business Analyst (Review Only)
Quality Assurance Manager (Review Only)

Considerations

Good documentation cannot be implemented in a haphazard manner. It requires careful planning and frequent review to ensure that it meets users' needs and is easily accessible to everyone that needs it. In addition, it should incorporate a feedback mechanism that encourages users to evaluate it and recommend changes or additions.

To improve users' ability to effectively access information in, and increase their understanding of, the content, many companies create resource groups within the business organization. Group members attend detailed training sessions and work with the documentation and training specialists to develop materials that are geared toward the needs of typical, or frequent, system users like themselves. Such groups have two benefits: they help to ensure that training and documentation materials are on-target for the needs of the users, and they serve as in-house experts on the data integration architecture, reducing users' reliance on the central support organization.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:49

Phase 7: Deploy

Subtask 7.1.5 Develop Communication Plan

Description

A communication plan should be developed that discusses the details of communications and coordination for the production rollout of the data integration solution. A comprehensive communication plan can ensure that all required people in the organization are ready for the production deployment. For example, you may need to communicate with DBAs, IT infrastructure, web support teams, and other system owners that may have assigned tasks and monitoring activities during the first production run. Since many of them can be outside of the immediate data integration project team, it cannot be assumed that everyone is always up to date on the production go-live planning and timing. The communication plan will ensure proper and timely communication across the organization so there are no surprises when the production run is initiated. The plan should discuss where key communication information will be stored, who will be communicated to, and how much communication will be provided. This information will initially be in a stand-alone document, but upon project management approval it will be added to the run book.

Prerequisites

7.1.4 Develop Punch List

Roles

Application Specialist (Secondary)
Data Integration Developer (Secondary)
Database Administrator (DBA) (Secondary)
Production Supervisor (Primary)

Project Sponsor (Review Only)
System Administrator (Secondary)
Technical Project Manager (Secondary)

Considerations

The communication plan should provide details about communication, escalation procedures, and emergency communication protocols (i.e., how the entire core project team would communicate in a dire emergency). It must include steps to take if a specific person on the plan is unresponsive. Since many go-live events occur over weekends, it is also important to retain not only business contact information but also weekend contact information, such as cell phones or pagers, in the event a key contact needs to be reached on a non-business day.

Best Practices
None

Sample Deliverables
Data Migration Communication Plan

Last updated: 01-Feb-07 18:49

Phase 7: Deploy

Subtask 7.1.6 Develop Run Book

Description

The Run Book contains detailed descriptions of the tasks from the punch list that was used for the first production run. It details the tasks more explicitly for the individual mock-run and final go-live production run. Typically, the punch list will be created for the first trial cutover or mock-run, and the run book will be developed during the first and second trial cutovers and completed by the start of the final production go-live.

Prerequisites

7.1.4 Develop Punch List
7.1.5 Develop Communication Plan

Roles

Application Specialist (Secondary)
Data Integration Developer (Secondary)
Database Administrator (DBA) (Secondary)
Production Supervisor (Primary)
Project Sponsor (Review Only)
System Administrator (Secondary)
Technical Project Manager (Secondary)

Considerations

One of the biggest challenges in completing a run book (like completing an operations manual) is to provide an adequate level of detail. It is important to find a balance between providing too much information, making it unwieldy and unlikely to be used, and providing too little detail, which could jeopardize the successful execution of the tasks. The run book is developed and leveraged on trial cutovers and should have all the necessary information to ensure a successful migration. Go/No-Go procedure information will also be included in the run book.

The run book for a data migration project eliminates the need for the operations manual that is present for most other data integration solutions. For Data Migration projects this is even more imperative, since you normally have only one critical go-live event. This is the one chance to have a successful production go-live without negatively impacting operational systems that depend on the migrated data.

Best Practices
None

Sample Deliverables
Data Migration Run Book

Last updated: 01-Feb-07 18:50


Phase 7: Deploy

Task 7.2 Deploy Solution

Description

The challenges involved in successfully deploying a data integration solution involve managing the migration from development through production, training end-users, and providing clear and consistent documentation. These are all critical factors in determining the success (or failure) of an implementation effort. Before the deployment tasks are undertaken, however, it is necessary to determine the organization's level of preparedness for the deployment and thoroughly plan end-user training materials and documentation. If all prerequisites are not satisfactorily completed, it may be advisable to delay the migration, training, and delivery of finalized documentation rather than hurrying through these tasks solely to meet a predetermined target delivery date.

For data migration projects, it is important to understand that some packaged applications, such as SAP, have their own deployment strategies. The deployment strategies for Informatica processes should take this into account and, when applicable, match up with those deployment strategies.

Prerequisites
None

Roles

Business Analyst (Primary)
Business Project Manager (Primary)
Data Architect (Secondary)
Data Integration Developer (Primary)
Data Warehouse Administrator (Primary)

Database Administrator (DBA) (Primary)
Presentation Layer Developer (Primary)
Production Supervisor (Approve)
Quality Assurance Manager (Approve)
Repository Administrator (Primary)
System Administrator (Primary)
Technical Architect (Secondary)
Technical Project Manager (Approve)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 7: Deploy

Subtask 7.2.1 Train Users

Description

Before training can begin, company management must work with the development team to review the training curricula to ensure that it meets the needs of the various application users. First, however, management and the development team need to understand just who the users are and how they are likely to use the application. Application users may include individuals who have reporting needs and need to understand the presentation layer, production operations personnel responsible for day-to-day operations and maintenance, operational users who need to review the content being delivered by a data conversion system, administrative users managing the sourcing and delivery of metadata across the enterprise, and more.

After the training curricula is planned and users are scheduled to attend classes appropriate to their needs, a training environment must be prepared for the training sessions. This involves ensuring that a "laboratory environment" is set up properly for multiple concurrent users, and that data is clean and available to that environment. If the presentation layer is not ready or the data appears incomplete or inaccurate, users may lose interest in the application and choose not to use it for their regular business tasks. This lack of interest can result in an underutilized resource critical to business success. It is also important to prevent untrained users from accessing the system; otherwise the support staff is likely to be overburdened and spend a significant amount of time providing on-the-job training to uneducated users.

Prerequisites
None

Roles

Business Analyst (Primary)
Business Project Manager (Primary)
Data Integration Developer (Primary)

Data Warehouse Administrator (Secondary)
Presentation Layer Developer (Primary)
Technical Project Manager (Review Only)

Considerations

It is important to consider the many and varied roles of all application users when planning user training. For example, in addition to training obvious users such as the operational staff, it may be important to consider users such as DBAs, data modelers, and metadata managers, and ensure that they receive appropriate training. The user roles should be defined up-front to ensure that everyone who needs training receives it. If the roles are not defined up-front, some key users may not be properly trained, resulting in a less-than-optimal hand-off to the user departments.

The training curricula should educate users about the data content as well as the effective use of the data integration system. While correct and effective use of the system is important, a thorough understanding of the data content helps to ensure that training moves along smoothly without interruption for ad-hoc questions about the meaning or significance of the data itself. This type of training can be held in informal "question and answer" sessions rather than formal classes.

If the training needs of the various user groups vary widely, it may be necessary to obtain additional training staff or services from a vendor or consulting firm. It is important to remember that no one training curriculum can address all needs of all users. The basic training class should be geared toward the average user, with follow-up classes scheduled for those users needing training on the application's advanced features. It is also wise to schedule follow-up training for data and tool issues that are likely to arise after the deployment is complete and the end-users have had time to work with the tools and data. Finally, be sure that training objectives are clearly communicated between company management and the development team to ensure complete satisfaction with the training deliverable.

Best Practices
None

Sample Deliverables
Training Evaluation

Last updated: 01-Feb-07 18:50


Phase 7: Deploy

Subtask 7.2.2 Migrate Development to Production

Description

To successfully migrate PowerCenter or Data Analyzer from one environment to another (from development to production, for example), some tasks must be completed. While there are multiple tasks to perform in the deployment process, the actual migration phase consists of moving objects from one environment to another. A migration can include the following objects:

● PowerCenter: mappings, sessions, workflows, parameter files, scripts, stored procedures, schedules, etc.
● Data Analyzer: schemas, reports, dashboards, global variables, etc.
● PowerExchange/CDC: datamaps and registrations.
● Data Quality: plans and dictionaries.

These tasks are dispatched within three phases:

● Pre-deployment phase
● Deployment phase
● Post-deployment phase

Each phase is detailed in the Considerations section.

Prerequisites
None

Roles

Data Warehouse Administrator (Primary)
Database Administrator (DBA) (Primary)

Production Supervisor (Approve)
Quality Assurance Manager (Approve)
Repository Administrator (Primary)
System Administrator (Primary)
Technical Project Manager (Approve)

Considerations

The tasks below should be completed before, during, and after the migration to ensure a successful deployment. Failure to complete one or more of these tasks can result in an incomplete or incorrect deployment.

Pre-deployment tasks:

● Ensure all objects have been successfully migrated and tested in the Quality Assurance environment.
● Ensure the Production environment is compliant with specifications and is ready to receive the deployment.
● Determine the method of migration (i.e., folder copy or deployment group) to use. If you are going to use the folder copy method, make sure the shared folders are copied before the non-shared folders. If you are going to use the deployment group method, make sure all the objects to be migrated are checked in, and refresh the deployment group as this is done.
● Obtain sign-off from the deployment team and project teams to deploy to the Production environment.
● Obtain sign-off from the business units to migrate to the Production environment.

Deployment tasks (a minimal consistency-check sketch follows this list):

● Verify the consistency of the connection object names across environments to ensure that the connections are being made to the production sources/targets. If not, manually change the connections for each incorrect session to source and target the production environment.
● Data Analyzer objects that reference new tables require that schemas be migrated before the reports. Make sure the new tables are associated with the proper data source and that the data connectors are plugged into the new schemas.
● Synchronize the deployment window with the maintenance window to minimize the impact on end-users. If the deployment window is longer than the regular maintenance window, it may be necessary to coordinate with the business unit to minimize the impact on the end-users.
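Connection-name consistency can be verified by hand, but a small script makes the check repeatable across trial cutovers. The hedged Python sketch below compares two exported lists of connection names; the export files are an assumed team convention, not a PowerCenter-generated format.

    def load_names(path: str) -> set:
        with open(path) as handle:
            return {line.strip() for line in handle if line.strip()}

    def compare_connections(dev_list: str, prod_list: str) -> None:
        dev = load_names(dev_list)
        prod = load_names(prod_list)
        only_in_dev = sorted(dev - prod)
        only_in_prod = sorted(prod - dev)
        if not only_in_dev and not only_in_prod:
            print("Connection object names are consistent across environments.")
            return
        for name in only_in_dev:
            print(f"Missing in production: {name}")
        for name in only_in_prod:
            print(f"Present only in production: {name}")

    if __name__ == "__main__":
        compare_connections("connections_dev.txt", "connections_prod.txt")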

Post-deployment tasks:

● Communicate with the management team members on all aspects of the migration (i.e., successes, problems encountered, solutions, etc.).
● Finalize and deliver the documentation.
● Obtain final user and project sponsor acceptance.

Finally, when deployment is complete, develop a project close document to evaluate the overall effectiveness of the project (i.e., lessons learned, recommended improvements, tips and tricks, etc.).

Best Practices
Deployment Groups
Migration Procedures - PowerCenter
Using PowerCenter Labels
Migration Procedures - PowerExchange
Deploying Data Analyzer Objects

Sample Deliverables
Project Close Report

Last updated: 01-Feb-07 18:50

Phase 7: Deploy

Subtask 7.2.3 Package Documentation

Description

The final tasks in deploying the new application are:

● Gathering all of the various documents that have been created during the life of the project
● Updating and/or revising them as necessary
● Distributing them to the departments and individuals that will need them to use or supervise use of the application

Documentation types and content vary widely among projects, depending on the type of engagement, scope of project, expectations, and so forth. Some typical deliverables include all of those listed in the Sample Deliverables section. By this point, management should have reviewed and approved all of the documentation.

Prerequisites
None

Roles

Business Analyst (Approve)
Business Project Manager (Primary)
Data Architect (Secondary)
Data Integration Developer (Primary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Secondary)

Presentation Layer Developer (Primary)
Production Supervisor (Approve)
Technical Architect (Primary)
Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

8 Operate

● 8.1 Define Production Support Procedures
   8.1.1 Develop Operations Manual
● 8.2 Operate Solution
   8.2.1 Execute First Production Run
   8.2.2 Monitor Load Volume
   8.2.3 Monitor Load Processes
   8.2.4 Track Change Control Requests
   8.2.5 Monitor Usage
   8.2.6 Monitor Data Quality
● 8.3 Maintain and Upgrade Environment
   8.3.1 Maintain Repository
   8.3.2 Upgrade Software

Phase 8: Operate

Description
The Operate Phase is the final step in the development of a data integration solution. This phase is sometimes referred to as production support. During its day-to-day operations the system continually faces new challenges such as increased data volumes, hardware and software upgrades, and network or other physical constraints. The goal of this phase is to keep the system operating smoothly by anticipating these challenges before they occur and planning for their resolution.

Planning is probably the most important task in the Operate Phase. Often, the project team plans the system's development and deployment, but does not allow adequate time to plan and execute the turnover to day-to-day operations. Many companies have dedicated production support staff with both the necessary tools for system monitoring and a standard escalation process. This team requires only the appropriate system documentation and lead time to be ready to provide support. Thus, it is imperative for the project team to acknowledge this support capability by providing ample time to create, test, and turn over the deliverables discussed throughout this phase.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Primary)
Presentation Layer Developer (Secondary)
Repository Administrator (Primary)
System Administrator (Primary)
System Operator (Primary)
Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Task 8.1 Define Production Support Procedures

Description
In this task, the project team produces an Operations Manual, which tells system operators how to run the system on a day-to-day basis. The manual should include information on how to restart failed processes and who to contact in the event of a failure. In addition, this task should produce guidelines for performing system upgrades and other necessary changes to the system throughout the project's lifetime. Note that this task must occur prior to the system actually going live.

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Production Supervisor (Primary)
System Operator (Review Only)

Considerations
The watchword here is: Plan Ahead. The production support procedures should be clear to system operators even before the system is in production, because any production issues that are going to arise will probably do so very shortly after the system goes live.

Most organizations have well-established and documented system support procedures in place. The support procedures for the solution should fit into these existing procedures, deviating only where absolutely necessary - and then, only with the prior knowledge and approval of the Project Manager and Production Supervisor. Any such deviations should be determined and documented as early as possible in the development effort, preferably before the system actually goes live. Be sure to thoroughly document specific procedures and contact information for problem escalation, especially if the procedures or contacts differ from the existing problem escalation plan.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.1.1 Develop Operations Manual

Description
After the system is deployed, the Operations Manual is likely to be the most frequently-used document in the operations environment. The system operators - the individuals who monitor the system on a day-to-day basis - use this manual to determine how to run the various pieces of the implemented solution. The Operations Manual should contain a high-level overview of the system in order to familiarize the operations staff with new concepts, along with the specific details necessary to successfully execute day-to-day operations. For data visualization, the Operations Manual should contain high-level explanations of reports, dashboards, and shared objects in order to familiarize the operations staff with those concepts. In addition, the manual provides the operators with error processing information, as well as reprocessing steps in the event of a system failure.

For a data integration/migration/consolidation solution, the manual should provide operators with the necessary information to perform the following tasks (a scripted illustration of several of these operator tasks follows the lists below):
● Run workflows, worklets, tasks and any external code
● Recover and restart workflows
● Notify the appropriate second-tier support personnel in the event of a serious system malfunction
● Record the appropriate monitoring data during and after workflow execution (i.e., load times, data volumes, etc.)

For a data visualization or metadata reporting solution, the manual should include the details on the following:
● Run reports, dashboards, schedules
● Rerun scheduled reports
● Source, target, database, web server and application server information
● Notify the appropriate second-tier support personnel in the event of a serious system malfunction
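The exact commands belong in the Operations Manual itself. Purely as an illustration, the sketch below shows how the routine operator tasks above might be wrapped in a single script. It assumes a UNIX host with pmcmd on the path; the domain, service, folder, workflow, and credential names are placeholders, the exact pmcmd options should be confirmed against the command line reference for your PowerCenter release, and sites that operate on Windows can express the same calls in a batch file.

#!/bin/sh
# Operator helper (sketch only - names are placeholders, not Velocity standards).
DOMAIN=Domain_PROD
IS_SERVICE=IS_PROD
FOLDER=DW_LOADS
WORKFLOW=wf_daily_load
PMUSER=ops_user       # credentials shown inline for clarity only;
PMPASS=ops_password   # use your site's standard secure mechanism

# Health check: is the Integration Service responding?
pmcmd pingservice -sv $IS_SERVICE -d $DOMAIN || {
    echo "Integration Service not responding - escalate to second-tier support"
    exit 1
}

# Run the workflow and wait for completion; a non-zero return code
# indicates a failed or aborted run.
pmcmd startworkflow -sv $IS_SERVICE -d $DOMAIN -u $PMUSER -p $PMPASS \
      -f $FOLDER -wait $WORKFLOW
RC=$?

# Record monitoring data for the run (status, run times) in a simple log,
# as called for in the task list above.
pmcmd getworkflowdetails -sv $IS_SERVICE -d $DOMAIN -u $PMUSER -p $PMPASS \
      -f $FOLDER $WORKFLOW >> /var/log/dw_operations.log

if [ $RC -ne 0 ]; then
    echo "Workflow $WORKFLOW failed - follow the recovery steps in the Operations Manual"
    exit $RC
fi

Recovery itself (restarting from the failed task) is usually performed from the Workflow Monitor or with the recovery commands documented for your PowerCenter release, so a script like this only detects and reports the failure.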

Operations manuals for all projects should provide information for performing the following tasks:
● Start servers
● Stop servers
● Notify the appropriate second-tier support personnel in the event of a serious system malfunction
● Test the health of the reporting and/or data integration environment (i.e., check DB connections to the repositories, source and target databases / files and real-time feeds, and check CPU and memory usage on the PowerCenter and Data Analyzer servers)
● Record the appropriate monitoring data (i.e., report run times, data volumes, frequency, etc.)

Prerequisites
None

Roles
Data Integration Developer (Secondary)
Production Supervisor (Primary)
System Operator (Review Only)

Considerations
A draft version of the Operations Manual can be started during the Build Phase as the developers document the individual components. Documents such as mapping specifications, report specifications, and unit and integration testing plans contain a great deal of information that can be transferred into the Operations Manual. Bear in mind that data quality processes are executed earlier, during the Design Phase, although the Data Quality Developer and Data Integration Developer will be available during the Build Phase to agree on any data quality measures (such as ongoing runtime data quality process deployment) that need to be added to the Operations Manual.

The Operations Manual serves as the handbook for the production support team. Although it is important, the Operations Manual is not meant to replace user manuals and other support documentation. Rather, it is intended to provide system operators with a consolidated source of documentation to help them support the system. The Operations Manual also does not replace proper training on PowerCenter, Data Analyzer, and supporting products.

Therefore, it is imperative that it be accurate and kept up-to-date. For example, an Operations Manual typically contains names and phone numbers for on-call support personnel, along with step-by-step instructions for implementing the procedures. Keeping this information consolidated in a central place in the document makes it easier to maintain. In addition, the manual should include information on any manual procedures that may be required. Restart and recovery procedures should be thoroughly tested and documented, and the processing window should be calculated and published. Escalation procedures should be thoroughly discussed and distributed so that members of the development and operations staff are fully familiar with them. This attention to detail helps to ensure a smooth transition into the Operate Phase.

Best Practices
None

Sample Deliverables
Operations Manual

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Task 8.2 Operate Solution

Description
After the data integration solution has been built and deployed, the job of running it begins. For a data migration or consolidation solution, the system must be monitored to ensure that data is being loaded into the database. A data visualization or metadata reporting solution should be monitored to ensure that the system is accessible to the end users. The goal of this task is to ensure that the necessary processes are in place to facilitate the monitoring of and the reporting on the system's daily processes.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Steward/Data Quality Steward (Primary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Primary)
Presentation Layer Developer (Secondary)
Project Sponsor (Primary)
Repository Administrator (Review Only)
System Administrator (Primary)
System Operator (Primary)
Technical Project Manager (Review Only)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.2.1 Execute First Production Run

Description
Once a Data Integration solution is fully developed, tested and signed off for production, it is time to execute the first run in the production environment. In most cases the first production run is a high-profile set of activities that must be executed, controlled, documented and communicated. During the implementation, the first run is key to a successful deployment. While the first run is often similar to the on-going load process, it can be distinctively different. There are often specific one-time setup tasks that need to be executed on the first run that will not be part of the regular daily data integration process.

It is important that the first run is executed successfully with limited manual interactions. This run should leverage a Punch List and should execute a set of tested workflows or scripts (not manual steps such as executing a specific SQL statement for set-up). Any manual steps should be closely monitored, documented, and improved for all future production runs. This first run should be executed following the Punch List and should be revisited upon completion of the execution.

Prerequisites
6.2 Execute Complete System Test
7.3.2 Migrate Development to Production

Roles
Database Administrator (DBA) (Primary)
Production Supervisor (Primary)
System Administrator (Primary)
System Operator (Primary)
Technical Project Manager (Review Only)

Considerations
For some projects (such as a data migration effort) the first production run is the production system. It will not go on beyond the first production run since a data migration by its nature requires a single movement of the production data. Any future runs will be a part of the execution that addresses a specific data problem, not the entire batch. Further, in most cases, the set of tasks that make up the production run may not be executed again.

For data warehouses, the first production run may often include loading historical data as well as initial loads of code tables and dimension tables. The load process may execute much longer than a typical on-going load due to the extra amount of data and the different criteria it is run against to pick up the historical data. There may be extra data validation and verification at the end of the first production run to ensure that the system is properly initialized and ready for on-going loads. It is important to appropriately plan and execute the first load properly, as the subsequent periodic refreshes of the data warehouse (daily, hourly, real time) depend on the setup and success of the first production run.

Best Practices
None

Sample Deliverables
Data Migration Run Book
Operations Manual
Punch List

Last updated: 01-Feb-07 18:50
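To make the first-run discussion above concrete, a thin driver script can sequence the one-time setup workflows ahead of the regular load when the punch list is executed as scripts rather than manual steps. The sketch below is illustrative only: the folder and workflow names are invented placeholders, it assumes pmcmd is available on the Integration Service host, and the actual first-run sequence must come from the project's own Punch List.

#!/bin/sh
# First production run driver (sketch). All names are placeholders taken from
# an assumed punch list, not Velocity-defined objects.
DOMAIN=Domain_PROD
IS_SERVICE=IS_PROD
FOLDER=DW_LOADS
PMUSER=ops_user
PMPASS=ops_password

run_wf() {
    # Run one workflow, wait for it, and stop the punch list on failure so
    # the issue can be documented in the run book before continuing.
    pmcmd startworkflow -sv $IS_SERVICE -d $DOMAIN -u $PMUSER -p $PMPASS \
          -f $FOLDER -wait "$1"
    RC=$?
    if [ $RC -ne 0 ]; then
        echo "First-run step $1 failed (rc=$RC) - halt and record in the run book"
        exit $RC
    fi
}

# One-time setup: reference/code tables and initial dimension loads.
run_wf wf_load_code_tables
run_wf wf_initial_dimension_load
# Historical load, typically much longer than the daily run.
run_wf wf_historical_fact_load
# Finally, the regular on-going load that will be scheduled going forward.
run_wf wf_daily_load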

Phase 8: Operate

Subtask 8.2.2 Monitor Load Volume

Description
Increasing data volume is a challenge throughout the life of a data integration solution. As the data migration or consolidation system matures and new data sources are introduced, the amount of data processed and loaded into the database continues to grow. Similarly, as a data visualization or metadata management system matures, the amount of data processed and presented increases. One of the operations team's greatest tasks is to monitor the data volume processed by the system to determine any trends that are developing. If generated correctly, the data volume estimates used by the Technical Architect and the development team in building the architecture should ensure that it is capable of growing to meet ever-changing business requirements. By continuously monitoring volumes, however, the development and operations teams can act proactively as data volumes increase. Monitoring affords team members the time necessary to determine how best to accommodate the increased volumes.

Prerequisites
None

Roles
Production Supervisor (Secondary)
System Operator (Primary)

Considerations
Installing PowerCenter Reporting using Data Analyzer with Repository and Administrative reports can help monitor load volumes. The Session Run Details report can be configured to provide the following:
● Successful rows sourced
● Successful rows written
● Failed rows sourced
● Failed rows written
● Session duration

The Session Run Details report can also be configured to display data over ranges of time for trending. This information provides the project team with both a measure of the increased volume over time and an understanding of the increased volume's impact on the data load window. Dashboards and alerts can be set to monitor loads on an on-going basis, alerting data integration administrators if load times exceed specified thresholds. By customizing the standard reports, Data Integration support staff can create any variety of monitoring levels - from individual projects to full daily load processing statistics - across all projects.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.2.3 Monitor Load Processes

Description
After the data integration solution is deployed, the system operators begin the task of monitoring the daily processes. For data migration and consolidation solutions, this includes monitoring the processes that load the database. For data visualization and metadata management reporting solutions, this includes monitoring the processes that create the end-user reports. This monitoring is necessary to ensure that the system is operating at peak efficiency. It is important to ensure that any processes that stop, are delayed, or simply fail to run are noticed and appropriate steps are taken. If the processes are not monitored, they may cause problems as the daily load processing begins to overlap the system's user availability.

It is important to recognize in data migration and consolidation solutions that the processing time may increase as the system matures, new data sources are used, and existing sources mature. For presentation layers and metadata management reporting solutions, it is important to note that processing time can increase as the system matures, more users access the system and reports are run more frequently. Therefore, the system operator needs to monitor and report on processing times as well as data volumes.

Prerequisites
None

Roles
Presentation Layer Developer (Secondary)
System Operator (Primary)

Considerations
Data Analyzer with Repository and Administration Reports installed can provide information about session run details, average loading times, and server load trends by day. Administrative and operational dashboards can display all vital metrics needing to be monitored. They can also provide the project management team with a high-level understanding of the health of the analytic support system.

Large installations may already have monitoring software in place that can be adapted to monitor the load processes of the analytic solution. This software typically includes both visual monitors for the client desktop of the System Operator as well as electronic alerts that can be programmed to contact various project team members.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.2.4 Track Change Control Requests

Description
The process of tracking change control requests is integral to the Operate Phase. It is here that any production issues are documented and resolved. The change control process allows the project team to prioritize the problems and create schedules for their resolution and eventual promotion into the production environment. Many companies rely on a Configuration Control Board to prioritize and approve work for the various maintenance releases.

Prerequisites
None

Roles
Business Project Manager (Primary)
Project Sponsor (Primary)

Considerations
Ideally, a change control process was implemented during the Architect Phase, enabling the developers to follow a well-established process during the Operate Phase. The Change Control Procedure document, created in conjunction with the Change Control Procedures in the Architect Phase, should describe precisely how the project team is going to identify and resolve problems that come to light during system development or operation.

Most companies use a Change Request Form to kick-off the Change Control procedure. These forms should include the following:
● Identify the individual or department requesting the change.
● A clear description of the change requested.
● Define the problem or issue that the requested change addresses.
● The priority level of the change requested.
● The impact of the change requested to project(s) in development, if any.
● An estimation of the development time.
● The expected release date.
● Include a Resolutions section to be filled in after the Change Request is resolved, specifying whether the change was implemented, in what release, and by whom.

This type of change control documentation can be invaluable if questions subsequently arise as to why a system operates the way that it does, or why it doesn't function like an earlier version.

Best Practices
None

Sample Deliverables
Change Request Form

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.2.5 Monitor Usage

Description
One of the most important aspects of the Operate Phase is monitoring how and when the organization's end users use the data integration solution. This subtask enables the project team to gauge what information is the most useful, how often it is retrieved, and what type of user generally requests it. All of this information can then be used to gauge the system's return on investment and to plan future enhancements. For example, new requirements may be determined by the number of users requesting a particular report or by requests for more or different information in the report. These requirements may trigger changes in hardware capabilities and/or network bandwidth.

Monitoring the use of the presentation layer during User Acceptance Testing can indicate bottlenecks. The monitoring results can be used to plan for changes in hardware and/or network facilities to support increased requests to the presentation layer. When the project is complete, Operations continues to monitor the tasks to maintain system performance.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Warehouse Administrator (Secondary)
Database Administrator (DBA) (Primary)
Production Supervisor (Primary)
Project Sponsor (Review Only)
Repository Administrator (Review Only)
System Administrator (Approve)
System Operator (Review Only)

Considerations
Most business organizations have tools in place to monitor the use of their production systems. Informatica provides tools and sources of metadata that meet the need for monitoring information from the presentation layer, as well as the metadata on processes used to provide the presentation layer with data. This information can be extracted using Informatica tools to provide a complete view of information presentation usage. Some end-user reporting tools have built-in reports for such purposes. The project team should review the available tools, as well as software that may be bundled with the RDBMS, and determine which tools best suit the project's monitoring needs.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.2.6 Monitor Data Quality

Description
This subtask is concerned with data quality processes that may have been scoped into the project for late-project or post-project use. There are three types of data quality process relevant in this context:
● Processes that can be scheduled to monitor data quality on an ongoing basis
● Processes that can address or repair any data quality issues discovered
● Processes that can run at the point of data entry to prevent bad data from entering the system

This subtask is concerned with agreeing to a strategy to use any or all such processes to validate the continuing quality of the business' data and to safeguard against lapses in data quality in the future. Such processes are an optional deliverable for most projects. However, there is a strong argument for building into the project plan data quality initiatives that will outlast the project. This argument is based upon the concept that the decision to incorporate ongoing monitoring should be considered a key deliverable, as it provides a means to monitor the existing data to ensure that previously identified data quality issues do not reoccur. For new data entering the system, monitoring provides a means to ensure that any new feeds do not compromise the integrity of the existing data. Moreover, the processes created for the Data Quality Audit task in the Analyze Phase may still be suitable for application to the data in the Operate Phase, or may be suitable with a reasonable amount of tuning.

Prerequisites
None

Roles
Data Steward/Data Quality Steward (Primary)
Production Supervisor (Secondary)

Considerations
Ongoing data quality initiatives bring the data quality process full-circle. This subtask is the logical conclusion to a process that began with the performance of a Data Quality Audit in the Analyze Phase and the creation of data quality processes (called plans in Informatica Data Quality terminology) in the Design Phase.

The plans created during and after the Operate Phase are likely to be runtime or real-time plans. A runtime plan is one that can be scheduled for automated, regular execution (e.g., nightly or weekly). Runtime plans can be used to monitor the data stored to the system. For example, the Data Quality Developer may design a plan to identify duplicate records in the system, and the Developer or the system administrator can schedule the plan to run overnight. Any duplication issues found in the system can be addressed manually or by other data quality plans. To minimize the impact on the business systems, these plans can be run during periods of relative inactivity (e.g., weekends).

A real-time plan is one that can accept a live data feed, for example, from a third-party application, and write output data back to a live application. Real-time plans are useful in data entry scenarios; they can be used to capture data problems at the point of keyboard entry and thus before they are saved to the data system. The real-time plan can be used to check data entries, pass them if accurate, cleanse them of error, or reject them as unusable.

The Data Quality Developer must also consider the impact that ongoing data quality initiatives are likely to have on the business systems. Should the data quality plans be deployed to several locations or centralized? Will the reference data be updated at regular intervals and by whom? Can plan resource files be moved easily across the enterprise? Once the project resources are unwound, these matters require a committed strategy from the business. The Data Quality Developer must discuss the importance of ongoing data quality management with the business early in the project, so that the business can decide what data quality management steps to take within the project or outside of it. However, the results - clean, complete, compliant data - are well worth it.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Task 8.3 Maintain and Upgrade Environment

Description
The goal in this task is to develop and implement an upgrade procedure to facilitate upgrading the hardware, software, and/or network hardware that supports the overall analytic solution. This plan should enable both the development and operations staff to plan for and execute system upgrades in an efficient, timely manner, with as little impact on the system's end users as possible.

Prerequisites
None

Roles
Database Administrator (DBA) (Primary)
Repository Administrator (Primary)
System Administrator (Secondary)

Considerations
Once the Build Phase has been completed, the development and operations staff should begin determining how upgrades should be carried out. Ideally, upgrading system components should be treated as a system change and, as such, should use many of the techniques discussed in 8.2.4 Track Change Control Requests, when appropriate. After these changes are prioritized and authorized by the Project Manager, an upgrade plan should be developed and executed. This plan should include the tasks necessary to perform the upgrades as well as the tasks necessary to update system documentation and the Operations Manual.

The deployed system incorporates multiple components, many of which are likely to undergo upgrades during the system's lifetime. The team should consider all aspects of the system's architecture, including any software and hardware being used. Special attention should be paid to software release schedules, hardware limitations, network limitations, and vendor release support schedules. This information will give the team an idea of how often and when various upgrades are likely to be required. When combined with knowledge of the data load windows, this will allow the operations team to schedule upgrades without adversely affecting the end users.

Upgrading the Informatica software has some special implications. Many times, the software upgrade requires a repository upgrade as well. Thus, the operations team should factor in the time required to backup the repository, along with the time to perform the upgrade itself. In addition, the development staff should be involved in order to ensure that all current sessions are running as designed after the upgrade occurs.

Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:50

Phase 8: Operate

Subtask 8.3.1 Maintain Repository

Description
A key operational aspect of maintaining PowerCenter repositories involves creating and implementing backup policies. These backups become invaluable if some catastrophic event occurs that requires the repository to be restored. Another key operational aspect is monitoring the size and growth of these repository databases, since daily use of these applications adds metadata to the repositories.

The Administration Console manages Repository Services and repository content, including backup and restoration. The following repository-related functions can be performed through the Administration Console:
● Enable or disable a Repository Service or service process.
● Alter the operating mode of a Repository Service.
● Create and delete repository content.
● Backup, copy, restore, or delete a repository.
● Upgrade a repository and upgrade a Repository Server to a Repository Service.
● Register and unregister a local repository.
● Promote a local repository to a global repository.
● Manage user connections and locks.
● Send repository notification messages.
● Manage repository plug-ins.

Additional information about upgrades is available in the "Upgrading PowerCenter" chapter of the PowerCenter Installation and Configuration Guide.

Prerequisites
None

Roles

Database Administrator (DBA) (Secondary)
Repository Administrator (Primary)
System Administrator (Secondary)

Considerations

Enabling and Disabling the Repository Service
A service process starts on a designated node when a Repository Service is enabled. Administrative duties can be performed through the Administration Console only when the Repository Service is enabled. PowerCenter's High Availability (HA) feature enables a service to fail-over to another node if the original node becomes unavailable.

Exclusive Mode
The Repository Service executes in normal or exclusive mode. Running the Repository Service in exclusive mode allows only one user to access the repository through the Administration Console or the pmrep command line program. Running in exclusive mode requires full privileges and permissions on a Repository Service. It is advisable to set the Repository Service mode to exclusive when performing administrative tasks that require configuration updates involving deleting repository content, enabling version control, repository promotion, plug-in registration, or repository upgrades. Precautions to take before switching to exclusive mode include user intent notification and disconnect verification. The Repository Service must be stopped and restarted to complete the mode switch.

Repository Backup
Although PowerCenter database tables may be included in Database Administration backup procedures, PowerCenter repository backup procedures and schedules are established to prevent data loss due to hardware, software, or user mishaps. The Repository Service provides backup processing for repositories through the Administration Console or the pmrep command line program.

The Repository Service backup function saves repository objects, connection information, and code page information in a file stored on the server in the backup location. Commands can be packaged and scheduled so that backups occur on a desired schedule without manual intervention. A backup file enables technical support staff to validate repository integrity to, for example, eliminate the repository as a source of user problems.

TIP
A simple approach to automating PowerCenter repository backups is to use the pmrep command line program (a sample script is sketched below). The backup file name should minimally include the repository name and backup date (yyyymmdd). Preserve the repository dates as part of the backup file name and, as new repositories are added, delete the older ones. PowerCenter backup scheduling should account for repository change frequency. Because development repositories typically change more frequently than production repositories, it may be desirable to backup the development repository nightly during heavy development efforts. Production repositories, on the other hand, may only need backup processing after development promotions are registered.

TIP
Keep in mind that you cannot restore a single folder or mapping from a repository backup. A repository backup file is invaluable for reference when, for example, questions arise as to the integrity of the repository or users encounter problems using it. If, as occasionally happens, a single important mapping is deleted by accident, you need to obtain a temporary database space from the DBA in order to restore the backup to a temporary repository DB. With the PowerCenter client tools, copy the lost metadata, and then remove the temporary repository from the database and the cache. If the developers need this service often, it may be prudent to keep the temporary database around all the time and copy over the development repository to the backup repository on a daily basis, in addition to backing up to a file. In addition, if the development or production repository is corrupted, the backup repository can be used to recover quickly. Only the DBA should have access to the backup repository, and requests should be made through him/her.
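As one illustration of the first TIP above, the script below sketches a nightly pmrep backup. It is a minimal example, not a standard deliverable: the repository, domain, user, and path names are placeholders, the exact pmrep options should be confirmed against the command line reference for your PowerCenter release, and passwords would normally come from a secured mechanism rather than being typed into a script.

#!/bin/sh
# Nightly repository backup (sketch only - names and paths are placeholders).
REPO=DEV_REPO
DOMAIN=Domain_DEV
BACKUP_DIR=/powercenter/backups
STAMP=`date +%Y%m%d`            # yyyymmdd, as recommended in the TIP above

# Connect to the repository, write the backup file, then clean up the
# connection information cached by pmrep.
pmrep connect -r $REPO -d $DOMAIN -n backup_user -x backup_password || exit 1
pmrep backup -o $BACKUP_DIR/${REPO}_${STAMP}.rep || exit 1
pmrep cleanup

# Keep only recent backups; here anything older than 14 days is removed
# (adjust the retention period to site policy).
find $BACKUP_DIR -name "${REPO}_*.rep" -mtime +14 -exec rm {} \;

Scheduled from cron (or from the Windows Task Scheduler with an equivalent batch file), this satisfies the naming and retention suggestions in the TIP without manual intervention.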

Repository Performance
Repositories may grow in size due to the execution of workflows, especially in large projects. As the repository grows, response may become slower. Much like any other database, repository databases should undergo periodic "housecleaning" through statistics and defragmentation. Consider these techniques to maintain a repository for better performance:
● Delete Old Session/Workflow Log Information. Write a simple SQL script to delete old log information. Assuming that repository backups are taken on a consistent basis, you can always get old log information from the repository backup, if necessary.
● Perform Defragmentation. Work with the DBAs to schedule this as a regular job.

Audit Trail
The SecurityAuditTrail configuration option in the Repository Service properties in the Administration Console allows tracking changes to repository users, groups, privileges, and permissions. Enabling the audit trail causes the Repository Service to record security changes to the Repository Service log. Security audit changes logged include owner, owner's group or folder permissions, global object permissions, user maintenance, group maintenance, password changes of another user, and privileges.

Best Practices
Disaster Recovery Planning with PowerCenter HA Option

Sample Deliverables
None

Last updated: 04-Dec-07 18:21

Phase 8: Operate

Subtask 8.3.2 Upgrade Software

Description
Upgrading the application software of a data integration solution to a new release is a continuous operations task, as new releases are offered periodically by every software vendor. New software releases offer expanded functionality, new capabilities, and fixes to existing functionality that can benefit the data integration environment and future integration work. However, an upgrade can be a disruptive event, since project work may halt while the upgrade process is in progress. System architects and administrators must continually evaluate the new software offerings across the various products in their data integration environment and balance the desire to upgrade with the impact of an upgrade.

Software upgrades require a continuous assessment and planning process. A regular schedule should be defined where new releases are evaluated on functionality and need in the environment. Once approved, upgrades must be coordinated with on-going development work and on-going production data integration. Given that data integration environments often contain a host of different applications - including Informatica software, database systems, operating systems, BI tools, EAI tools, and other related technologies - an upgrade in any one of these technologies may require an upgrade in any number of other software programs for the full system to function properly. Appropriate planning and coordination of software upgrades allow a data integration environment to stay current on its technology stack with minimal disruptions to production data integration efforts and development projects.

Prerequisites
None

Roles
Database Administrator (DBA) (Secondary)
Repository Administrator (Primary)
System Administrator (Secondary)

Considerations
When faced with a new software release, the first consideration is to decide whether the upgrade is appropriate for the data integration environment. The pros and cons of every upgrade decision typically include the following:

Pro:
● New functionality and features
● Bug fixes and refinements of existing functionality
● Often provides enhanced performance
● Support for older releases of software is dropped, forcing an upgrade to maintain support
● May be required to support newer releases of other software in the environment

Con:
● Disruptive to development environment
● Disruptive to production environment
● May require new training and adversely affect productivity
● May require other pieces of software to be upgraded to function properly

The upgrade decision can be to:
● Upgrade to the latest software release immediately.
● Upgrade at some time in the future.
● Do not upgrade to this software version at all.

It is not uncommon for data integration teams to skip minor releases (and sometimes even major releases) if they aren't appropriate for their environment or when the upgrade effort outweighs the benefits. Architects sometimes decide to forgo a particular software version and skip ahead to future releases if the current release does not provide enough benefit to warrant the disruption to the environment. Whether you are in a production environment or still in development mode, an upgrade requires careful planning to ensure a successful transition and minimal disruption. The following issues need to be factored into the overall upgrade plan:

● Training - New releases of software often include new and expanded features that are likely to require some level of training for administrators and developers. Proper planning of the necessary training can ensure that employees are trained ahead of the upgrade so that productivity does not suffer once the new software is in place. Because it is impossible to properly estimate and plan the upgrade effort if you do not have knowledge of the new features and potential environment changes, best practice dictates training a core set of architects and system administrators early in the upgrade process so they can assist in the upgrade planning.
● New Features - A new software release likely includes new and expanded features that may create a need to alter the current data integration processes. Reviewing the new features and assessing the impact on the upgrade process is a key preplanning step. During the upgrade process, existing processes may be altered to incorporate and implement the new features. Time is required to make and test these changes as well.
● Environment Assessment - A future release of software may range from minimal architectural changes to major changes in the overall data integration architecture. In PowerCenter, for example, the underlying physical setup and location of software components has changed from release to release, as the architecture has moved to a Service-Oriented Architecture with high availability and failover. Often these changes provide an opportunity to redesign and improve the existing architecture in coordination with the software upgrade. Investigation and strategy around potential architecture changes should occur early. Planning for these architecture changes allows users to take full advantage of the new features when the software upgrade is deployed.
● Sandbox Upgrade - In environments with production systems, it is advisable to copy the production environment to a 'sandbox' instance. The 'sandbox' environment should be as close to an exact copy of production as possible, including production data. A software upgrade is then performed on the 'sandbox' instance, and data integration processes run on both the current production and the sandbox instance for a period of time. In this way, results can be compared over time to ensure that no unforeseen differences occur in the new software version. If differences do occur, they can be investigated, resolved, and accounted for in the final upgrade plan.
● Testing - Often more than 60 percent of the total upgrade time is devoted to testing the data integration environment with the new software release. Ensuring that data continues to flow correctly, software versions are compatible, and new features do not cause unexpected results requires detailed testing. Developing a well thought-out test plan is crucial to a successful upgrade.

Once a comprehensive plan for the upgrade is in place, the time comes to perform the actual upgrade on the development, test, and production environments. The Installation Guides for each of the Informatica products and online help provide instructions on upgrading and the step-by-step process for applying the new version of the software. During the upgrade process, there are a few important steps to emphasize:
Reviewing the new features and assessing the impact on the upgrade process is a key preplanning step. Ensuring that data continues to flow correctly. Architects sometimes decide to forgo a particular software version and skip ahead to the future releases if the current release does not provide enough benefit to warrant the disruption to the environment. However. The ‘sandbox’ environment should be as close to an exact copy of production as possible. In this way. including production data.New releases of software often include new features and functionality that are likely to require some level of training for administrators and developers.

but provide a failsafe insurance policy. with minimal disruption to the development and production environments. Restoring from backups may be slower than restoring from the copy.Data Warehousing 500 of 1017 . Best Practices None Sample Deliverables None Last updated: 01-Feb-07 18:51 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Upgrades have been known to fail in production environments. The backups created using the Repository Manager are reliable and can be used to successfully restore the original repository. Always remove all repository locks through Repository Manager before attempting an upgrade. In addition to the copy. ALWAYS make multiple backups of the current version of the repository before attempting the upgrade.● ● Make a copy of the current database instance housing the repository prior to any upgrade. Carefully monitor the upgraded systems for a period of time after the upgrade to ensure the success of the upgrade. The only recourse at that point is to restore from the backup. making the partially upgraded repositories unusable. A smooth upgrade process enables data integration teams to take advantage of the latest technologies and advances in data integration. ● ● A well-planned upgrade process is key to ensuring success during the transition from the current version to a new version.

Data Warehousing 501 of 1017 .Best Practices ● Configuration Management and Security r Data Analyzer Security Database Sizing Deployment Groups Migration Procedures .PowerExchange Running Sessions in Recovery Mode Using PowerCenter Labels Build Data Audit/Balancing Processes Data Cleansing Data Profiling Data Quality Mapping Rules Effective Data Matching Techniques Effective Data Standardizing Techniques Integrating Data Quality Plans with PowerCenter Managing Internal and External Reference Data Real-Time Matching Using PowerCenter Testing Data Quality Plans Tuning Data Quality Plans Using Data Explorer for Data Discovery and Analysis Working with Pre-Built Plans in Data Cleanse and Match Designing Data Integration Architectures Development FAQs Event Based Scheduling r r r r r r ● Data Quality and Profiling r r r r r r r r r r r r r ● Development Techniques r r r INFORMATICA CONFIDENTIAL Velocity v8 Methodology .PowerCenter Migration Procedures .

PowerCenter Workflows and Data Analyzer Creating Inventories of Reusable Objects & Mappings Metadata Reporting and Sharing Repository Tables & Metadata Management Using Metadata Extensions Daily Operations Third Party Scheduler Determining Bottlenecks Performance Tuning Databases (Oracle) Performance Tuning Databases (SQL Server) Performance Tuning Databases (Teradata) Performance Tuning in a Real-Time Environment r r r r r r r r r ● Error Handling r r r r r ● Metadata and Object Management r r r r ● Operations r r ● Performance and Tuning r r r r r INFORMATICA CONFIDENTIAL Velocity v8 Methodology .PowerCenter Mappings Error Handling Techniques .Data Warehousing 502 of 1017 .r Key Management in Data Warehousing Solutions Mapping Auto-Generation Mapping Design Mapping Templates Naming Conventions Naming Conventions .Data Warehousing Error Handling Strategies .General Error Handling Techniques . Variables and Parameter Files Error Handling Process Error Handling Strategies .Data Quality Performing Incremental Loads Real-Time Integration with PowerCenter Session and Data Partitioning Using Parameters.

Data Warehousing 503 of 1017 .r Performance Tuning UNIX Systems Performance Tuning Windows 2000/2003 Systems Recommended Performance Tuning Procedures Tuning and Configuring Data Analyzer and Data Analyzer Reports Tuning Mappings for Better Performance Tuning Sessions for Better Performance Tuning SQL Overrides and Environment for Better Performance Advanced Client Configuration Options Advanced Server Configuration Options Organizing and Maintaining Parameter Files & Variables Platform Sizing PowerExchange for Oracle CDC PowerExchange for SQL Server CDC PowerExchange Installation (for Mainframe) Assessing the Business Case Defining and Prioritizing Requirements Developing a Work Breakdown Structure (WBS) Developing and Maintaining the Project Plan Developing the Business Case Managing the Project Lifecycle Using Interviews to Determine Corporate Data Integration Requirements r r r r r r ● PowerCenter Configuration r r r r ● PowerExchange Configuration r r r ● Project Management r r r r r r r INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Data Analyzer integrates seamlessly with the following LDAP-compliant directory servers: SunOne/iPlanet Directory Server 4.Data Warehousing 504 of 1017 . application layer and data layer. Ensuring that Data Analyzer security provides appropriate mechanisms to support and augment the security infrastructure of a Business Intelligence environment at every level. Description Four main architectural layers must be completely secure: user layer.Data Analyzer Security Challenge Using Data Analyzer's sophisticated security architecture to establish a robust security system to safeguard valuable business information against a range of technologies and security models.1 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Users must be authenticated and authorized to access data. transmission layer.

folders.2 Novell eDirectory Server 8. Data Analyzer provides three basic types of application-level security: ● Report. Restricts access for users or groups to specific reports. Folder and Dashboard Security. Restricts users to specific attribute values within an attribute column of a table. You can use system roles or create custom roles. Data Analyzer supports Netegrity SiteMinder for centralizing authentication and access control for the various web applications in the organization. Column-level Security. Restricts users and groups to particular metric and attribute columns. Data Analyzer supports the standard security protocol Secure Sockets Layer (SSL) to provide a secure environment. all INFORMATICA CONFIDENTIAL Velocity v8 Methodology . ● ● Components for Managing Application Layer Security Data Analyzer users can perform a variety of tasks based on the privileges that you grant them. Transmission Layer The data transmission must be secure and hacker-proof.2 4. and/or dashboards.Sun Java System Directory Server 5. You can grant roles to groups and/or individual users.1 5.2 Microsoft Active Directory 2000 Microsoft Active Directory 2003 In addition to the directory server.Data Warehousing 505 of 1017 . A role can consist of one or more privileges.7 IBM SecureWay Directory IBM SecureWay Directory IBM Tivoli Directory Server 3. Row-level Security. Application Layer Only appropriate application functionality should be provided to users with associated privileges. When you edit a custom role. Data Analyzer provides the following components for managing application layer security: ● Roles.

all subgroups contained within it inherit the changes. If a user belongs to more than one group. With hierarchical groups. you may create a Lead group and assign it the Advanced Consumer role. Each person accessing Data Analyzer must have a unique user name. you can create group hierarchies. ● Managing Groups Groups allow you to classify users according to a particular function. For example. Users. Because the Manager group is a subgroup of the Lead group. each subgroup automatically receives the roles assigned to the group it belongs to. the user has the privileges from all groups. You can assign one or more roles to a group. you grant the same privileges to all members of the group. After you create groups. Custom roles . When you assign roles to a group.The end user can create and assign privileges to these roles. You can also assign groups to other groups to organize privileges for related users. ● Types of Roles ● System roles . if group 1 has access to something but group 2 is excluded from that object. A user has a user name and password. A group can consist of users and/or groups.groups and users with the role automatically inherit the change. all users and groups within the edited group inherit the change. When you edit a group.Data Warehousing 506 of 1017 . To organize related users into related groups. you can assign users to the groups. you can assign roles to the user or assign the user to a group with predefined roles. all users in the group inherit the changes. You may organize users into groups based on their departments or management level. it has both the Manage Data Analyzer and Advanced Consumer role privileges. Belonging to multiple groups has an inclusive effect. For example.Data Analyzer provides a set of roles when the repository is created. a user belonging to both groups 1 and 2 will have access to the object. you create a Manager group with a custom role Manage Data Analyzer. Each role has sets of privileges assigned to it. To set the tasks a user can perform. ● Groups. Groups are created to organize logical sets of users and roles. Within the Lead group. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . When you edit a group. When you change the roles assigned to a group.

To access the files in the Data Analyzer EAR file. Open the web. some organizations keep only user accounts in the Windows Domain or LDAP directory service. not groups.xml file before you modify it. but set up groups in Data Analyzer to organize the Data Analyzer users. you can set a property in the web. locate the web. use the EAR Repackager utility provided with Data Analyzer. when Data Analyzer synchronizes the repository with the Windows Domain or LDAP directory service.xml file so that Data Analyzer updates only user accounts.Data Warehousing 507 of 1017 . it updates the users and groups in the repository and deletes users and groups that are not found in the Windows Domain or LDAP directory service.Preventing Data Analyzer from Updating Group Information If you use Windows Domain or LDAP authentication. Data Analyzer provides a way for you to keep user accounts in the authentication server and still keep the groups in Data Analyzer. To prevent Data Analyzer from deleting or updating groups in the repository.xml file in the following directory: /custom/properties 2. Note: Be sure to back-up the web. To prevent Data Analyzer from updating group information in the repository: 1.xml file with a text editor and locate the line containing the following property: enableGroupSynchronization The enableGroupSynchronization property determines whether Data Analyzer updates the groups in the repository. you typically modify the users or groups in Data Analyzer. However. In the directory where you extracted the Data Analyzer EAR file. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The web. You can then create and manage groups in Data Analyzer for users in the Windows Domain or LDAP directory service.xml file is in stored in the Data Analyzer EAR file. Ordinarily.

Data Analyzer creates a System Administrator user account when you create the repository. Save the web. scheduler.xml file in the Data Analyzer folder.xml file and add it back to the Data Analyzer EAR file. You can change the password for a system daemon. Access LDAP Directory Contacts INFORMATICA CONFIDENTIAL Velocity v8 Methodology .com. runs the updates for all time-based schedules. Managing Users Each user must have a unique user name to access Data Analyzer. The system daemon. 5.Data Warehousing 508 of 1017 . Data Analyzer does not synchronize the groups in the repository with the groups in the Windows Domain or LDAP directory service.informatica. Restart Data Analyzer. You cannot assign new roles to system daemons or assign them to groups. You must create and manage groups. Change the password in the Administration tab in Data Analyzer 2.xml file is set to false. 4. To perform Data Analyzer tasks. 3. complete the following steps: 1. When the enableGroupSynchronization property in the web. To prevent Data Analyzer from updating group information in the Data Analyzer repository. a user must have the appropriate privileges. change the value of the enableGroupSynchronization property to false: <init-param> <param-name> InfSchedulerStartup. Change the password in the web. ias_scheduler/ padaemon. Data Analyzer permanently assigns the daemon role to system daemons. but you cannot change the system daemon user name via the GUI. The default user name for the System Administrator user account is admin.3. and assign users to groups in Data Analyzer. To change the password for a system daemon. System daemons must have a unique user name and password in order to perform Data Analyzer system functions and tasks. Data Analyzer updates only the user accounts in Data Analyzer the next time it synchronizes with the Windows Domain or LDAP authentication server.ias.enableGroupSynchronization </param-name> <param-value>false</param-value> </init-param> When the value of enableGroupSynchronization property is false. You can assign privileges to a user with roles or groups. Restart Data Analyzer.

template dimensions. enter the Base DN entries for your LDAP directory. you determine which users and groups can view particular attribute values.Data Warehousing 509 of 1017 . Use access permissions to restrict access to a particular folder or object in the repository. Restrict user and/or group access to folders. Data Analyzer grants read and write access permissions to every user in the repository. Types of Access Permissions Access permissions determine the tasks that you can perform for a specific repository object. Use password restrictions when you do not want users to alter their passwords. Delete. When you add an LDAP server. dashboards. you determine which users and groups have access to the folders and repository objects. delete. users can email reports and shared documents to LDAP directory contacts. reports. You can assign the following types of access permissions to repository objects: ● ● Read. Password restrictions. ● ● When you create an object in the repository. ● ● By default. contact your LDAP system administrator. every user has default read and write permissions for that object. Allows you to view a folder or object. Restrict users from changing their passwords. After you set up the connection to the LDAP directory service. Allows you to edit an object. In the BaseDN property. you can add the LDAP server on the LDAP Settings page. When you set access permissions.To access contacts in the LDAP directory service. Customizing User Access You can customize Data Analyzer user access with the following security options: ● Access permissions. or change access permissions for that object. Change permission. Allows you to delete a folder or an object from the repository. Write. When you set data restrictions. metrics. Data restrictions. Allows you to change the access permissions on a folder or object. Restrict user and/or group access to information in fact and dimension tables and operational schemas. If a user with a data restriction runs a report. Use data restrictions to prevent certain users or groups from accessing specific values when they create reports. Also allows you to create and edit folders and objects within a folder. attributes. or schedules. you determine which users and/or groups can read. Data Analyzer does not display the restricted data to that user. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . You can use the General Permissions area to modify default access permissions for an object. or turn off default access permissions. write. The Base distinguished name entries define the type of information that is stored in the LDAP directory. If you do not know the value for BaseDN. By customizing access permissions for an object. you must provide a value for the BaseDN (distinguished name) property.

Data Restrictions

You can restrict access to data based on the values of related attributes. Data restrictions are set to keep sensitive data from appearing in reports. For example, you may want to restrict data related to the performance of a new store from outside vendors. You can set a data restriction that excludes the store ID from their reports.

You can set data restrictions using one of the following methods:

● Set data restrictions by object. Restrict access to attribute values in a fact table, operational schema, real-time connector, and real-time message stream. You can apply the data restriction to users and groups in the repository. Use this method to apply the same data restrictions to more than one user or group.
● Set data restrictions for one user at a time. Edit a user account or group to restrict user or group access to specified data. Use this method to set custom data restrictions for different users or groups.

Types of Data Restrictions

You can set two kinds of data restrictions:

● Inclusive. Use the IN option to allow users to access data related to the attributes you select. For example, to allow users to view only data from the year 2001, create an "IN 2001" rule.
● Exclusive. Use the NOT IN option to restrict users from accessing data related to the attributes you select. For example, to allow users to view all data except from the year 2001, create a "NOT IN 2001" rule.

Restricting Data Access by User or Group

You can edit a user or group profile to restrict the data the user or group can access in reports. To set data restrictions for a user or group, you need the following role or privilege:

● System Administrator role
● Access Management privilege

You can set one or more data restrictions for each user or group. When you edit a user profile, you can set data restrictions for any schema in the repository, including operational schemas and fact tables. You can set a data restriction to limit user or group access to data in a single schema based on the attributes you select. If the attributes apply to more than one schema in the repository, you can also restrict the user or group access from related data across all schemas in the repository. For example, you may have a Sales fact table and a Salary fact table. Both tables use the Region attribute. You can set one data restriction that applies to both the Sales and Salary fact tables based on the region you select.

When Data Analyzer runs scheduled reports that have provider-based security, it runs reports against the data restrictions for the report owner. However, if the reports have consumer-based security, the Data Analyzer Server creates a separate report for each unique security profile.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

510 of 1017

The following information applies to the required steps for changing the admin user for WebLogic only.

To change the Data Analyzer default users (admin, ias_scheduler/padaemon):

● Repository authentication. You must use the Update System Accounts utility to change the system administrator account name in the repository.
● LDAP or Windows Domain authentication. Set up the new system administrator account in the Windows Domain or LDAP directory service. Then use the Update System Accounts utility to change the system administrator account name in the repository.

To change the Data Analyzer system administrator username on WebLogic 8.1 (DA 8.1):

1. Back up the repository.
2. Go to the WebLogic library directory: .\bea\wlserver6.1\lib
3. Open the file ias.jar and locate the file entry called InfChangeSystemUserNames.class.
4. Extract the file "InfChangeSystemUserNames.class" into a temporary directory (example: d:\temp). This extracts the file as 'd:\temp\repositoryutils\Refresh\InfChangeSystemUserNames.class'.
5. Create a batch file (change_sys_user.bat) in the directory D:\Temp\Repository Utils\Refresh\.
6. Add the following commands to the batch file:

REM To change the system user name and password
REM *******************************************
REM Change the BEA home here
REM ************************
set JAVA_HOME=E:\bea\wlserver6.1\jdk131_06
set WL_HOME=E:\bea\wlserver6.1
set CLASSPATH=%WL_HOME%\sql
set CLASSPATH=%CLASSPATH%;%WL_HOME%\lib\ias.jar
set CLASSPATH=%CLASSPATH%;%WL_HOME%\lib\ias_securityadapter.jar
set CLASSPATH=%CLASSPATH%;%WL_HOME%\lib\classes12.zip
set CLASSPATH=%CLASSPATH%;%WL_HOME%\lib\jconn2.jar
set CLASSPATH=%CLASSPATH%;%WL_HOME%\lib\weblogic.jar
set CLASSPATH=%CLASSPATH%;%WL_HOME%\infalicense
REM Change the DB information here and also
REM the user -Dias_scheduler and -Dadmin to values of your choice
REM *************************************************************
%JAVA_HOME%\bin\java -Ddriver=com.informatica.jdbc.sqlserver.SQLServerDriver -Durl=jdbc:informatica:sqlserver://host_name:port;SelectMethod=cursor;DatabaseName=database_name -Duser=userName -Dpassword=userPassword -Dias_scheduler=pa_scheduler -Dadmin=paadmin repositoryutil.refresh.InfChangeSystemUserNames
REM END OF BATCH FILE

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

511 of 1017

7. Make changes in the batch file as directed in the remarks [REM lines].
8. Save the file, open a command prompt window, and navigate to D:\Temp\Repository Utils\Refresh\.
9. At the prompt, type change_sys_user.bat and press Enter. The users "ias_scheduler" and "admin" will be changed to "pa_scheduler" and "paadmin", respectively.
10. Replace ias_scheduler with pa_scheduler in the xml file weblogic-ejb-jar.xml. This file is in the iasEjb.jar file located in the directory .\bea\wlserver6.1\config\informatica\applications\. To edit the file, make a copy of the iasEjb.jar and then:

   ● mkdir \tmp
   ● cd \tmp
   ● jar xvf \bea\wlserver6.1\config\informatica\applications\iasEjb.jar META-INF
   ● cd META-INF
   ● Update META-INF/weblogic-ejb-jar.xml: replace ias_scheduler with pa_scheduler
   ● cd \
   ● jar uvf \bea\wlserver6.1\config\informatica\applications\iasEjb.jar -C \tmp .

   Note: There is a trailing period at the end of the jar uvf command above.

11. Modify web.xml and weblogic.xml (located at .\bea\wlserver6.1\config\informatica\applications\ias\WEB-INF) by replacing ias_scheduler with 'pa_scheduler'.
12. Restart the server.

Last updated: 04-Jun-08 15:51

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

512 of 1017

Database Sizing

Challenge

Database sizing involves estimating the types and sizes of the components of a data architecture. This is important for determining the optimal configuration for the database servers in order to support the operational workloads. Individuals involved in a sizing exercise may be data architects, database administrators, and/or business analysts.

Description

The first step in database sizing is to review system requirements to define such things as:

● Expected data architecture elements (will there be staging areas? operational data stores? centralized data warehouse and/or master data? data marts?) Each additional database element requires more space. This is even more true in situations where data is being replicated across multiple systems, such as a data warehouse maintaining an operational data store as well. The same data in the ODS will be present in the warehouse as well, albeit in a different format.

● Expected source data volume. It is useful to analyze how each row in the source system translates into the target system. In most situations the row count in the target system can be calculated by following the data flows from the source to the target. For example, say a sales order table is being built by denormalizing a source table. The source table holds sales data for 12 months in a single row (one column for each month). Each row in the source translates to 12 rows in the target, so a source table with one million rows ends up as a 12 million row table (see the worked example after this list).

● Data granularity and periodicity. Granularity refers to the lowest level of information that is going to be stored in a fact table. Granularity affects the size of a database to a great extent, especially for aggregate tables. The granularity of fact tables is determined by the dimensions linked to that table: the number of dimensions that are connected to the fact tables affects the granularity of the table and hence the size of the table. The level at which a table has been aggregated increases or decreases a table's row count. For example, a sales order fact table's size is likely to be greatly affected by whether the table is being aggregated at a monthly level or at a quarterly level.

● Load frequency and method (full refresh? incremental updates?) Load frequency affects the space requirements for the staging areas. A load plan that updates a target less frequently is likely to load more data at one go. Therefore, more space is required by the staging areas. A full refresh requires more space for the same reason.

● Estimated growth rates over time and retained history.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

513 of 1017
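As a concrete illustration of the source-to-target expansion described in the 'Expected source data volume' bullet above, the arithmetic below uses an assumed average row size of 200 bytes; the numbers are purely illustrative and should be replaced with figures from your own physical data model.

   1,000,000 source rows x 12 monthly columns     = 12,000,000 target rows
   12,000,000 rows x ~200 bytes per row           = ~2.4 GB of raw table data
   plus indexes (assume roughly the table size)   = ~4.8 GB before temporary and sort space

A back-of-the-envelope calculation like this is only a starting point; the growth projections and baseline volumetric techniques described in the following sections refine it.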

Determining Growth Projections

One way to estimate projections of data growth over time is to use scenario analysis. For example, for scenario analysis of a sales tracking data mart you can use the number of sales transactions to be stored as the basis for the sizing estimate. In the first year, 10 million sales transactions are expected; that is, this equates to 10 million fact-table records. Next, use the sales growth forecasts for the upcoming years for database growth calculations. For example, an annual sales growth rate of 10 percent translates into 11 million fact table records for the next year. At the end of five years, the fact table is likely to contain about 60 million records. You may want to calculate other estimates based on five-percent annual sales growth (case 1) and 20-percent annual sales growth (case 2). Multiple projections for best and worst case scenarios can be very helpful.

Oracle Table Space Prediction Model

Oracle (10g and onwards) provides a mechanism to predict the growth of a database. Oracle incorporates a table space prediction model in the database engine that provides projected statistics for space used by a table. This feature can be useful in predicting table space requirements. The following Oracle 10g query returns projected space usage statistics:

SELECT * FROM TABLE(DBMS_SPACE.object_growth_trend ('schema','tablename','TABLE')) ORDER BY timepoint;

The results of this query are shown below:

TIMEPOINT                    SPACE_USAGE SPACE_ALLOC QUALITY
---------------------------- ----------- ----------- ------------
11-APR-04 02.14.55.116000 PM        6372       65536 INTERPOLATED
12-APR-04 02.14.55.116000 PM        6372       65536 INTERPOLATED
13-APR-04 02.14.55.116000 PM        6372       65536 INTERPOLATED
13-MAY-04 02.14.55.116000 PM        6372       65536 PROJECTED
14-MAY-04 02.14.55.116000 PM        6372       65536 PROJECTED
15-MAY-04 02.14.55.116000 PM        6372       65536 PROJECTED
16-MAY-04 02.14.55.116000 PM        6372       65536 PROJECTED

The QUALITY column indicates the quality of the output as follows:

● GOOD - The data for the timepoint relates to data within the AWR repository with a timestamp within 10 percent of the interval.
● INTERPOLATED - The data for this timepoint did not meet the GOOD criteria but was based on data gathered before and after the timepoint.
● PROJECTED - The timepoint is in the future, so the data is estimated based on previous growth statistics.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

514 of 1017
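The same DBMS_SPACE function used above can be pointed at other segment types. For example, the variation below (the index name is a placeholder) returns projected growth for an index, which can be useful because index space is often estimated separately from table space:

SELECT * FROM TABLE(DBMS_SPACE.object_growth_trend ('schema','index_name','INDEX')) ORDER BY timepoint;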

Baseline Volumetric

Next, use the physical data models for the sources and the target architecture to develop a baseline sizing estimate. Develop a detailed sizing using a worksheet inventory of the tables and indexes from the physical data model, along with field data types and field sizes. Add up the field sizes to determine row size. Then use the data volume projections to determine the number of rows to multiply by the table size. Various database products use different storage methods for data types, so be sure to use the database manuals to determine the size of each data type. The administration guides for most DBMSs contain sizing guidelines for the various database structures such as tables, indexes, log files, data files, sort space, and database cache.

You then need to apply growth projections to these statistics. Based on your understanding of transaction volume over time, determine your growth metrics for each type of data and calculate out your source data volume (SDV) from table size and growth metrics. Don't forget to add indexes and summary tables to the calculations. The default estimate for index size is to assume the same size as the table size. Also estimate the temporary space for sort operations. The temporary space can be as much as 1.5 times larger than the largest table in the database. For data warehouse applications where summarizations are common, plan on large temporary spaces.

Another approach that is sometimes useful is to load the data architecture with representative data and determine the resulting database sizes. This test load can be a fraction of the actual data and is used only to gather basic sizing statistics. For example, after loading ten thousand sample records to the fact table, you determine the size to be 10MB. Based on the scenario analysis, you can expect this fact table to contain 60 million records after five years. So, the estimated size for the fact table is about 60GB [i.e., 10 MB * (60,000,000/10,000)].

Guesstimating

When there is not enough information to calculate an estimate as described above, use educated guesses and "rules of thumb" to develop as reasonable an estimate as possible:

● If you don't have the source data model, use what you do know of the source data to estimate average field size and average number of fields in a row to determine table size.
● If your target data architecture is not completed so that you can determine table sizes, base your estimates on multiples of the SDV:
   r If it includes staging areas: add another SDV for any source subject area that you will stage, multiplied by the number of loads you'll retain in staging.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

515 of 1017

   r If you intend to consolidate data into an operational data store, add the SDV multiplied by the number of loads to be retained in the ODS for historical purposes (e.g., keeping one year's worth of monthly loads = 12 x SDV).
   r Data warehouse architectures are based on the periodicity and granularity of the warehouse; this may be another SDV + (.3n x SDV, where n = number of time periods loaded in the warehouse over time).
   r If your data architecture includes aggregates, add a percentage of the warehouse volumetrics based on how much of the warehouse data will be aggregated and to what level (e.g., if the rollup level represents 10 percent of the dimensions at the details level, use 10 percent). Similarly, for data marts add a percentage of the data warehouse based on how much of the warehouse data is moved into the data mart.

And finally, remember that there is always much more data than you expect, so you may want to add a reasonable fudge-factor to the calculations for a margin of safety. Be sure to consider the growth projections over time and the history to be retained in all of your calculations.

Last updated: 19-Jul-07 14:14

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

516 of 1017

Deployment Groups

Challenge

In selectively migrating objects from one repository folder to another, there is a need for a versatile and flexible mechanism that can overcome such limitations as confinement to a single source folder.

Description

Regulations such as Sarbanes-Oxley (SOX) and HIPAA require tracking, monitoring, and reporting of changes in information technology systems. Automation of change control processes using deployment groups and pmrep commands provides organizations with a means to comply with regulations for configuration management of software artifacts in a PowerCenter repository.

Deployment Groups are containers that hold references to objects that need to be migrated. This includes objects such as mappings, mapplets, reusable transformations, sources, targets, sessions and tasks, and workflows, as well as the object holders (i.e., the repository folders). Migrating a deployment group involves moving objects in a single copy operation from across multiple folders in the source repository into multiple folders in the target repository. In addition, individual objects to be copied can be selected as opposed to the entire contents of a folder. Deployment groups are faster and more flexible than folder moves for incremental changes. Also, they allow for migration "rollbacks" if necessary.

There are two types of deployment groups - static and dynamic. If the set of deployment objects is not expected to change between deployments, static deployment groups can be created. If the set of deployment objects is expected to change frequently between deployments, dynamic deployment groups should be used.

● Static deployment groups contain direct references to versions of objects that need to be moved. Users explicitly add the version of the object to be migrated to the deployment group.
● Dynamic deployment groups contain a query that is executed at the time of deployment. The results of the query (i.e., object versions in the repository) are then selected and copied to the deployment group.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

517 of 1017
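To make the dynamic case concrete, a dynamic deployment group is normally tied to an object query built from standard query parameters such as labels and Latest Status (both are discussed later in this chapter). The conditions below are an illustrative sketch only; the label name and condition values are invented and should be replaced by your own labeling and check-in conventions:

   Label             Is Equal To   DEPLOY_TEST
   AND Latest Status Is Equal To   Checked-in

When mappings or mapplets in the group contain non-reusable objects, also include the Is Reusable condition with the Reusable and Non-Reusable qualifier, as described in the query guidance below. Because the query is re-executed at deployment time, newly labeled object versions are picked up automatically without editing the group itself.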

A deployment group exists in a specific repository. It can be used to move items to any other accessible repository/folder. It is important to note that the deployment group only migrates the objects it contains to the target repository/folder. It does not, itself, move to the target repository. It still resides in the source repository.

A deployment group maintains a history of all migrations it has performed. It tracks what versions of objects were moved from which folders in which source repositories, and into which folders in which target repositories those versions were copied (i.e., it provides a complete audit trail of all migrations performed). Given that the deployment group knows what it moved and to where, an administrator can have the deployment group "undo" the most recent deployment, reverting the target repository to its pre-deployment state.

Deploying via the GUI

Migrations can be performed via the GUI or the command line (pmrep). In order to migrate objects via the GUI, simply drag a deployment group from the repository it resides in onto the target repository where the referenced objects are to be moved. The Deployment Wizard appears and steps the user through the deployment process. Once the wizard is complete, the migration occurs, and the deployment history is created.

Dynamic deployment groups are generated from a query. While any available criteria can be used, it is advisable to have developers use labels to simplify the query. Using labels (as described in the Using PowerCenter Labels Best Practice) allows objects in the subsequent repository to be tracked back to a specific deployment. For more information, refer to the "Strategies for Labels" section of Using PowerCenter Labels. When generating a query for deployment groups with mappings and mapplets that contain non-reusable objects, a query condition should be used. The query must include a condition for Is Reusable and use a qualifier of either Reusable and Non-Reusable. Without this qualifier, the deployment may encounter errors if there are non-reusable objects held within the mapping or mapplet.

Deploying via the Command Line

Alternatively, the PowerCenter pmrep command can be used to automate both folder-level deployments (e.g., in a non-versioned repository) and deployments using Deployment Groups. The commands DeployFolder and DeployDeploymentGroup in pmrep are used respectively for these purposes. Whereas deployment via the GUI requires stepping through a wizard and answering a series of questions to deploy, the command-line deployment requires an XML control file that contains the same information that the wizard requests.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

518 of 1017

The following steps can be used to create a script to wrap pmrep commands and automate PowerCenter deployments:

1. Use pmrep CreateDeploymentGroup to create a dynamic or static deployment group.
2. Use pmrep ExecuteQuery to output the results to a persistent input file. This input file can also be used for the AddToDeploymentGroup command.
3. Use pmrep ListObjects to return the object metadata to be parsed in another pmrep command.
4. Use DeployDeploymentGroup to copy a deployment group to a different repository. A control file with all the specifications is required for this command. This file must be present before the deployment is executed.

Additionally, a web interface can be built for entering/approving/rejecting code migration requests. This can provide additional traceability and reporting capabilities to the automation of PowerCenter code migrations.

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

519 of 1017
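As an illustration of how these commands might be chained in a script, the sketch below connects to a source repository, runs an object query, and deploys the resulting group. It is only a sketch: the repository, query, group, and file names are invented, the ExecuteQuery options follow the syntax shown later in this chapter, and the option flags for DeployDeploymentGroup should be confirmed against the Repository Manager Guide for your PowerCenter version.

   REM Illustrative only - names and some option flags are assumptions
   pmrep connect -r INFADEV -n Administrator -x AdminPwd -h infarepserver -o 7001
   REM Run the object query and write the matching object versions to a persistent input file
   pmrep executequery -q DEPLOY_TEST_QUERY -t shared -u deploy_objects.txt
   REM Deploy the group to the target repository using a pre-built XML control file
   pmrep deploydeploymentgroup -p DG_MARKETING -c deploy_control.xml -r INFATEST

A scheduler or a simple wrapper script can call this once the deployment group and control file have been prepared, which keeps migrations repeatable and auditable.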

Rolling Back a Deployment

Deployment groups help to ensure that there is a back-out methodology and that the latest version of a deployment can be rolled back. To do this, in the target repository (where the objects were migrated to), go to: Versioning >> Deployment >> History >> View History >> Rollback.

Initiate a rollback on a deployment in order to roll back only the latest versions of the objects. The rollback purges all objects (of the latest version) that were in the deployment group. The rollback ensures that the check-in time for the repository objects is the same as the deploy time. Remember that you cannot rollback part of the deployment; you will have to rollback all the objects in a deployment group. The pmrep command RollBackDeployment can be used for automating rollbacks.

Considerations for Deployment and Deployment Groups

Simultaneous Multi-Phase Projects

If multiple phases of a project are being developed simultaneously in separate folders, it is possible to consolidate them by mapping folders appropriately through the deployment group migration wizard. When migrating with deployment groups in this way, the override buttons in the migration wizard are used to select specific folder mappings.

Code Migration from Versioned Repository to a Non-Versioned Repository

In some instances, it may be desirable to migrate objects to a non-versioned repository from a versioned repository. Note that when migrating in this manner, this changes the wizards used, and that the export from the versioned repository must take place using XML export.

Off-Shore, On-Shore Migration

In an off-shore development environment to an on-shore migration situation, other aspects of the computing environment may make it desirable to generate a dynamic deployment group. Instead of migrating the group itself to the next repository, a query can be used to select the objects for migration and save them to a single XML file which can then be transmitted to the on-shore environment through alternative methods. If the on-shore repository is versioned, it activates the import wizard as if a deployment group was being received.

Managing Repository Size

As objects are checked in and objects are deployed to target repositories, the number of object versions in those repositories increases, as does the size of the repositories. In order to manage repository size, use a combination of Check-in Date and Latest Status (both are query parameters) to purge the desired versions from the repository and retain only the very latest version. If it is necessary to keep more than the latest version, labels can be included in the query. These labels are ones that have been applied to the repository for the specific purpose of identifying objects for purging. Also, all the deleted versions of the objects should be purged to reduce the size of the repository.

Last updated: 27-May-08 13:20

INFORMATICA CONFIDENTIAL

Velocity v8 Methodology - Data Warehousing

520 of 1017

and XML import/export. holds all of the common objects. which provides the capability to migrate any combination of objects within the repository with a single command. and from QA to production.PowerCenter Challenge Develop a migration strategy that ensures clean migration between development. In versioned PowerCenter repositories. Eventually. such as sources. QA. The following example shows a typical architecture. Also. Standalone Repository Environment In a standalone environment. and production workspaces and segregate work. the company has chosen to create separate development folders for each of the individual developers for development and unit test purposes. and reusable mapplets. In this example. In addition. The distributed environment section touches on several migration architectures. all work is performed in a single PowerCenter repository that serves as the metadata store. based on the environment and architecture selected. Repository Environments The following section outlines the migration procedures for standalone and distributed repository environments. Each section describes the major advantages of its use. and production or are there just one or two environments that share one or all of these phases. two test folders are created for QA purposes. outlining the pros and cons of each. The first contains all of the unit-tested mappings from the development folder. please note that any methods described in the Standalone section may also be used in a Distributed environment. two production folders will also be built. PowerCenter migration options include repository migration. The second is a common or shared folder that contains all of the tested shared objects. users can also use static or dynamic deployment groups for migration. targets. Deciding which migration strategy works best for a project depends on two primary factors. as well as its disadvantages. folder migration. SHARED_MARKETING_DEV.Migration Procedures . and production environments. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . as the following paragraphs explain. and production environments is essential for the deployment of an application. How has the folder architecture been defined? ● Each of these factors plays a role in determining the migration procedure that is most beneficial to the project. PowerCenter offers flexible migration options that can be adapted to fit the need of each application. object migration. Description Ensuring that an application has a smooth migration process between development.Data Warehousing 521 of 1017 . The following sections discuss various options that are available. This type of architecture within a single repository ensures seamless migration from development to QA. Separate folders are used to represent the development. test. thereby protecting the integrity of each of these environments as the system evolves. quality assurance (QA). QA. This Best Practice is intended to help the development team decide which technique is most appropriate for the project. ● How is the PowerCenter repository environment designed? Are there individual repositories for development. QA. A single shared or common development folder.

is object migration via an object copy. although the XML import/export method is the most intuitive method for resolving shared object conflicts. In this case. the migration method is slightly different here when you're copying the mappings because you must ensure that the shortcuts in the mapping are associated with the SHARED_MARKETING_TEST folder. A user can export each of the objects in the SHARED_MARKETING_DEV folder to XML. let's discuss how it will migrate mappings to test. Warehouse Designer. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . However. Source Analyzer. etc. you can use either of the two object-level migration methods described above to copy the mappings to the folder. The first step in this process is to copy all of the shared or common objects from the SHARED_MARKETING_DEV folder to the SHARED_MARKETING_TEST folder. the next step is to copy the individual mappings from each development folder into the MARKETING_TEST folder. Otherwise. You can then continue the migration process until all mappings have been successfully migrated.Proposed Migration Process – Single Repository DEV to TEST – Object Level Migration Now that we've described the repository architecture for this organization. With the XML import/export. Again. and then eventually to production.Data Warehousing 522 of 1017 . Migrations with versioned PowerCenter repositories is covered later in this document. In PowerCenter 7 and later versions.. ● After you've copied all common or shared objects. After all mappings have completed their unit testing. the process for migration to test can begin. and most common method. the XML files can be uploaded to a third-party versioning tool. and then re-import each object into the SHARED_MARKETING_TEST via XML import. Designer prompts the user to choose the correct shortcut folder that you created in the previous example. which point to the SHARED_MARKETING_TEST (see image below). if the organization has standardized on such a tool. This is similar to dragging a file from one folder to another using Windows Explorer. versioning can be enabled in PowerCenter. you can export multiple objects into a single XML file. and then import them at the same time. a user opens the SHARED_MARKETING_TEST folder and drags the object from the SHARED_MARKETING_DEV into the appropriate workspace (i.).e. The second approach is object migration via object XML import/export. This can be done using one of two methods: ● The first.

Then click “Next. Next. then the default name is used (see below). the object-level migration can be completed either through drag-and-drop or by using XML import/export. If no such workflow exists. The following steps outline the full process for successfully copying a workflow and all of its associated tasks. 1. a default name is used. 2.The final step in the process is to migrate the workflows that use those mappings. the Wizard prompts you to rename it or replace it. If a workflow with the same name exists in the destination folder. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The Wizard prompts for the name of the new workflow. Again. In either case. If it does not exist. The next step for each task is to see if it exists (as shown below).” 3. you can rename or replace the current one. but differs in that the Workflow Manager provides a Workflow Copy Wizard to guide you through the process. Then click “Next” to continue the copy process. Select the mapping and continue by clicking “Next". the Wizard prompts you to select the mapping associated with each session task in the workflow. this process is very similar to the steps described above for migrating mappings. If the task is present.Data Warehousing 523 of 1017 .

highlight the SHARED_MARKETING_TEST folder. If no connections exist. If connections exist in the target repository.Data Warehousing 524 of 1017 . Open the PowerCenter Repository Manager client tool and log into the repository. Initial Migration – New Folders Created The move to production is very different for the initial move than for subsequent changes to mappings and workflows. The Copy Folder Wizard appears to guide you through the copying process.4. click "Finish" and save the work. To make a shared folder for the production environment. drag it. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . we need to create two new folders to house the production-ready objects. and drop it on the repository name. address the initial test to production migration. 2. Create these folders after testing of the objects in SHARED_MARKETING_TEST and MARKETING_TEST has been approved. When this step is completed. 1. the default settings are used. The following steps outline the creation of the production folders and. 3. Since the repository only contains folders for development and test. at the same time. the Wizard prompts you to select the connection to use for the source and target.

In this case.4.Data Warehousing 525 of 1017 . the folder name that appears on this screen is the folder name followed by the date.” INFORMATICA CONFIDENTIAL Velocity v8 Methodology . By default. enter the name as “SHARED_MARKETING_PROD. 5. In this example. The second Wizard screen prompts you to enter a folder name. we'll use the advanced options. The first Wizard screen asks if you want to use the typical folder copy options or the advanced options.

The third Wizard screen prompts you to select a folder to override. 7. Because this is the first time you are transporting the folder. The final screen begins the actual copy process. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Click "Finish" when the process is complete. you won’t need to select anything.6.Data Warehousing 526 of 1017 .

At the end of the migration. Incremental Migration – Object Copy Example Now that the initial production migration is complete. owner’s group. the Copy Wizard copies the permissions for the folder owner to the target folder. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Before you can actually run the workflow in these production folders. Previously. you need to modify the session source and target connections to point to the production environment. When you copy or replace a PowerCenter repository folder. or all others in the repository to the target folder. and all users in the repository to the target folder. you should have two additional folders in the repository environment for production: SHARED_MARKETING_PROD and MARKETING_ PROD (as shown below). Use the MARKETING_TEST folder as the original to copy and associate the shared objects with the SHARED_MARKETING_PROD folder that you just created. The wizard does not copy permissions for users. the Copy Wizard copied the permissions for the folder owner. let's take a look at how future changes will be migrated into the folder.Data Warehousing 527 of 1017 . These folders contain the initially migrated objects.Repeat this process to create the MARKETING_PROD folder. groups.

By comparing the objects. Open the destination folder and expand the source folder. you can ensure that the changes that you are making are what you intend.Data Warehousing 528 of 1017 . Designer prompts you to choose whether to Rename or Replace the object (as shown below). it must be re-tested and migrated into production for the actual change to occur. you can choose to compare conflicts whenever migrating any object in Designer or Workflow Manager. Because this is a modification to an object that already exists in the destination folder. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The following steps outline the process of moving these objects individually. Log into PowerCenter Designer. 3. See below for an example of the mapping compare window. These types of changes in production take place on a case-by-case or periodically-scheduled basis.Any time an object is modified. 2. In PowerCenter 7 and later versions. Choose the option to Replace the object. Click on the object to copy and drag-and-drop it into the appropriate workspace window. 1.

Log into Workflow Manager and make the appropriate changes to the session or workflow so it can update itself with the changes. Rename. r When copying each mapping in PowerCenter. we look at moving development work to QA and then from QA to production. worklets. Copy the mapping from Development into Test. The newly copied mapping is now tied to any sessions that the replaced mapping was tied to. open the MARKETING_TEST folder. If using shortcuts. 6. or Skip for each reusable object. using multiple development folders for each developer. if not using shortcuts. r Drag all of the newly copied objects from the SHARED_MARKETING_TEST folder to MARKETING_TEST. For this example. r 2. 5. Standalone Repository Example In this example. save the folder so the changes can take place. After the object has been successfully copied. or Reuse the object. first follow these steps. r In the PowerCenter Designer. first explaining how to move objects and mappings from each individual folder to the test folder and then how to move tasks. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . we focus solely on the MARKETING_DEV data mart.4. skip to step 2 r Copy the tested objects from the SHARED_MARKETING_DEV folder to the SHARED_MARKETING_TEST folder. Follow these steps to copy a mapping from Development to QA: 1.Data Warehousing 529 of 1017 . Choose to Reuse the object for all shared objects in the mappings copied into the MARKETING_TEST folder. Save your changes. Designer prompts you to either Replace. such as source and target definitions. and workflows to the new area. with the test and production folders divided into the data mart they represent. and drag and drop the mapping from each development folder into the MARKETING_TEST folder.

follow these steps.r Save your changes. If a reusable session task is being used. r r r In Test.5 The folder or global object owner or a user assigned the Administrator role for the Repository Service can grant folder and global object permissions. Save your changes. Save your changes. change the owner of the test folder to a user(s) in the test group. and all others in the repository. The folder or global object owner and a user assigned the Administrator role for the Repository Service have all permissions which you cannot change. Click the Target tab. groups. r In Development. change the owner of the folders to a user in the production group. in PowerCenter 7 and later versions. Rules to Configure Folder and Global Object Permissions Rules in 8. open the MARKETING_TEST folder and drag and drop each reusable session from the developers’ folders into the MARKETING_TEST folder. the owner of the folders should be a user(s) in the development group. r 5. r As mentioned earlier. When migrating objects from Dev to Test to Prod you can’t use the same database connection as those that will be pointing to dev or test environment. Follow the same steps listed above to copy the workflow to the new folder. r Drag each workflow from the development folders into the MARKETING_TEST folder. The Copy Workflow Wizard appears. copy each workflow from Development to Test. r Open each newly copied session and click on the Source tab. Revoke all rights to Public other than Read for the production folders. In Production. Disadvantages of a Single Repository Environment The biggest disadvantage or challenge with a single repository environment is migration of repository objects with respect to database connections. Otherwise. Distributed Repository Environment INFORMATICA CONFIDENTIAL Velocity v8 Methodology . the repository. Change each connection to point to the target database for the Test environment. r r 4. owner’s group. the Copy Wizard allows you to compare conflicts from within Workflow Manager to ensure that the correct migrations are being made. A Copy Session Wizard guides you through the copying process. Change the source to point to the source database for the Test environment. Rules in Previous Versions Users with the appropriate repository privileges could grant folder and global object permissions. 3. Permissions can be granted to users. r In the PowerCenter Workflow Manager. and all others in Permissions could be granted to the owner. A single repository structure can also create confusion as the same users and groups exist in all environments and the number of folders can increase exponentially.Data Warehousing 530 of 1017 . You could change the permissions for the folder or global object owner. skip to step 4. Implement the appropriate security. While the MARKETING_TEST folder is still open. Be sure to double-check the workspace from within the Target tab to ensure that the load options are correct.

INFATEST. test. With a fully distributed approach. Each repository has a similar name.A distributed repository environment maintains separate. The ability to automate this process using pmrep commands. mappings. like the folders in the standalone environment. with each involving some advantages and disadvantages. workflows.e. independent repositories.Data Warehousing 531 of 1017 . Separating repository environments is preferable for handling development to production migrations. For instance. The ability to move everything without breaking or corrupting any of the objects. thereby eliminating many of the manual processes that users typically perform. This section discusses migrations in a distributed repository environment through repository copies. and production environments. in our Marketing example we would have three repositories. mapplets. work performed in development cannot impact QA or production. INFADEV. ● ● This approach also involves a few disadvantages. ● ● ● ● Repository Copy Folder Copy Object Copy Deployment Groups Repository Copy So far. Because the environments are segregated from one another. this document has covered object-level migrations and folder migrations through drag-and-drop object copying and object XML import/export. reusable transformation.. separate repositories function much like the separate folders in a standalone environment. There are four techniques for migrating from development to production in a distributed repository architecture. and INFAPROD. we discuss a distributed repository architecture. In the following example. hardware. The main advantages of this approach are: ● The ability to copy all objects (i. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . and software for development.) at once from one environment to another. etc.

database connections. 1. etc.Data Warehousing 532 of 1017 . sequences. we'll look at three ways to accomplish the Repository Copy method: ● ● ● Copying the Repository Repository Backup and Restore PMREP Copying the Repository Copying the Test repository to Production through the GUI client tools is the easiest of all the migration methods. High-performance organizations leverage the value of operational metadata to track trends over time related to load success/failure and duration. ensure that all users are logged out of the destination repository and then connect to the PowerCenter Repository Administration Console (as shown below). INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The problem with this is that everything is moved -. Everything must be set up correctly before the actual production runs can take place. First. Significant maintenance is required to remove any unwanted or excess objects.● The first is that everything is moved at once (which is also an advantage). you must run the repository in the ‘exclusive mode’. which leads to the second disadvantage. Click on the “INFA_PROD Repository on the left pane to select it and change the running mode to “exclusive mode’ by clicking on the edit button on the right pane under the properties tab. Before you can delete the repository. the repository copy process requires that the existing Production repository be deleted. session run times.ready or not. There is also a need to adjust server variables. you must delete the repository before you can copy the Test repository. This metadata can be a competitive advantage for organizations that use this information to plan for future growth. parameters/variables. If the Production repository already exists. we may have 50 mappings in QA. This results in a loss of production environment operational metadata such as load statuses. The 10 untested mappings are moved into production along with the 40 production-ready mappings. and then the Test repository can be copied. but only 40 of them are production-ready. ● ● ● Now that we've discussed the advantages and disadvantages. For example. etc. Lastly.

2. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Delete the Production repository by selecting it and choosing “Delete” from the context menu.Data Warehousing 533 of 1017 .

3.Data Warehousing 534 of 1017 . Click on the Action drop-down list and choose Copy contents from INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Data Warehousing 535 of 1017 .4. exit from the PowerCenter Administration INFORMATICA CONFIDENTIAL Velocity v8 Methodology . 5. Enter the username and password of the Test repository. When you've successfully copied the repository to the new location. repository service “INFA_TEST” from the drop-down menu. Click OK to begin the copy process. 6. choose the domain name. In the new window.

and the SHARED_MARKETING_TEST to SHARED_MARKETING_PROD. 7. The Service Manager periodically synchronizes the list of users and groups in the repository with the users and groups in the domain configuration database. You can use infacmd to export users and groups from the source domain and import them into the target domain.5 onwards.Data Warehousing 536 of 1017 . Use infacmd ExportUsersAndGroups to export the users and groups to a file. double-click on the newly copied repository and log-in with a valid username and password.Console. Use infacmd ImportUsersAndGroups to import the users and groups from the file to a different PowerCenter domain The following steps outline the process of backing up and restoring the repository for migration. This method is preferable to the repository copy process because if any type of error occurs. During synchronization. Modify the server information and all connections so they are updated to point to the new Production locations for all existing tasks and workflows. 9. the file is backed up to the binary file on the repository server. Select Action -> Backup Contents from the drop-down menu. users and groups that do not exist in the target domain are deleted from the repository. then highlight each folder individually and rename them. Repository Backup and Restore Backup and Restore Repository is another simple method of copying an entire repository. This process backs up the repository to a binary file that can be restored to any new location. For example. Before you back up a repository and restore it in a different domain. Launch the PowerCenter Administration Console. From 8. 1. and highlight the INFA_TEST repository service. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . 10. Verify connectivity. security information is maintained at the domain level. When this cleanup is finished. 8. verify that users and groups with privileges for the source Repository Service exist in the target domain. you can log into the repository through the Workflow Manager. In the Repository Manager. Be sure to remove all objects that are not pertinent to the Production environment from the folders before beginning the actual testing process. rename the MARKETING_TEST folder to MARKETING_PROD.

INFORMATICA CONFIDENTIAL Velocity v8 Methodology .e. The file is saved to the Backup directory within the repository server’s home directory.. click OK to begin the backup process.rep file containing all repository information. After you've selected the location and file name.Data Warehousing 537 of 1017 . A screen appears and prompts you to supply a name for the backup file as well as the Administrator username and password. 3.2. 4. When the backup is complete. The backup process creates a . select the repository connection to which the backup will be restored to (i. the Production repository). Stay logged into the Manage Repositories screen.

PMREP backup backs up the repository to the file specified with the -o option. you can write scripts to be run on a daily basis to perform functions INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Enter the appropriate information and click OK. You must be connected to a repository to use this command. You must provide the backup file name. you must repeat the steps listed in the copy repository option in order to delete all of the unused objects and renaming of the folders. pmrep is installed in the PowerCenter Client and PowerCenter Services bin directories. The system will prompt you to supply a username. Using this code example as a model. Use this command when the repository is running.5. and the name of the file to be restored. Refer to the Repository Manager Guide for a list of PMREP commands. PMREP utilities can be used from the Informatica Server or from any client machine connected to the server. The BackUp command uses the following syntax: backup -o <output_file_name> [-d <description>] [-f (overwrite existing output file)] [-b (skip workflow and session logs)] [-j (skip deploy group history)] [-q (skip MX data)] [-v (skip task statistics)] The following is a sample of the command syntax used within a Windows batch file to connect to and backup a repository.Data Warehousing 538 of 1017 . When the restoration process is complete. password. PMREP Using the PMREP commands is essentially the same as the Backup and Restore Repository method except that it is run from the command line rather than through the GUI client tools.

rep Alternatively. you will need to modify the connect strings appropriately. Delete the tasks not being used in the Workflow Manager and the mappings in the Designer r 2.. restore. select Relational connections from the Connections menu. and workflows. backup. such as: INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Use pmrep delete command to delete the content of target repository (if contect already exists in the target repository) 4. r In the Workflow Manager. tasks. etc: backupproduction. r Disable the workflows not being used in the Workflow Manager by opening the workflow properties. the following steps can be used: 1.Data Warehousing 539 of 1017 . follow these steps to convert the repository to Production: 1..and post-session scripts. Disable workflows that are not ready for Production or simply delete the mappings. then checking the Disabled checkbox under the General tab. 4. Modify the pre.bat REM This batch file uses pmrep to connect to and back up the repository Production on the server Central @echo off echo Connecting to Production repository. Use pmrep restore command to restore the backup file into target repostiory Post-Repository Migration Cleanup After you have used one of the repository migration procedures to migrate into Production. r 3. Use infacmd commands to run repository service in ‘Exclusive’ mode 2. “<Informatica Installation Directory>\Server\bin\pmrep” connect -r INFAPROD -n Administrator -x Adminpwd – h infarepserver –o 7001 echo Backing up Production repository.. Edit each relational connection by changing the connect string to point to the production sources and targets.and post-session commands and SQL as necessary.such as connect. and from the Components tab make the required changes to the pre. Use pmrep backup command to backup the source repository 3. If you are using lookup transformations in the mappings and the connect string is anything other than $SOURCE or $TARGET.. “<Informatica Installation Directory>\Server\bin\pmrep” backup -o c:\backup\Production_backup. r r In the Workflow Manager. Implement appropriate security. open the session task properties. Modify the database connection strings to point to the production sources and targets.

follow these sub steps. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . the folder copy method has historically been the most popular way to migrate in a distributed environment. 1. Folder Copy Although deployment groups are becoming a very popular migration method. Copying an entire folder allows you to quickly promote all of the objects located within that folder. If some mappings or workflows are not valid. This can be a serious consideration in real-time or near real-time environments. Remember that a locked repository means than no jobs can be launched during this process. In Production.Data Warehousing 540 of 1017 . then all shortcut relationships are automatically converted to point to this newly copied common or shared folder. and production. and workflow variables are copied automatically. 2.r r r r In Development. worklets and workflows are promoted at once. If the project uses a common or shared folder and this folder is copied first. When the folder copy process is complete. Therefore. test. mappings. change the owner of the test folders to a user in the test group. The Copy Folder Wizard appears to step you through the copy process. Copy the Development folder to Test. tasks. All connections. change the owner of the folders to a user in the production group. The three advantages of using the folder copy method are: ● The Repository Managers Folder Copy Wizard makes it almost seamless to copy an entire folder and all the objects located within it. All source and target objects. Revoke all rights to Public other than Read for the Production folders. mapplets. sequences. everything in the folder must be ready to migrate forward. Because of this. In Test. however. follow these sub-steps: ● ● ● Open the Repository Manager client tool. Highlight the folder to copy and drag it to the Test repository. If using shortcuts. The first example uses three separate repositories for development. Highlight the folder to copy and drag it to the Test repository. The following example steps through the process of copying folders from each of the different environments. reusable transformations. it is necessary to schedule this migration task during a time when the repository is least utilized. Connect to both the Development and Test repositories. If you skipped step 1. ● ● The primary disadvantage of the folder copy method is that the repository is locked while the folder copy is being performed. The Copy Folder Wizard will appear. Connect to both the Development and Test repositories. otherwise skip to step 2: ● ● ● ● ● Open the Repository Manager client tool. open the newly copied folder in both the Repository Manager and Designer to ensure that the objects were copied properly. then developers (or the Repository Administrator) must manually delete these mappings or workflows from the new folder after the folder is copied. mapping variables. ensure that the owner of the folders is a user in the development group.

Follow these steps to ensure that all shortcuts are reconnected.3. choose to replace the folder. ● ● Use the advanced options when copying the folder across.Data Warehousing 541 of 1017 . INFORMATICA CONFIDENTIAL Velocity v8 Methodology . If the folder already exists in the destination repository. Select Next to use the default name of the folder 4. The following screen appears to prompt you to select the folder where the new shortcuts are located.

When the folder copy process is complete. repeat the steps above to migrate to the Production repository. see the earlier description of Object Copy for the standalone environment. Ensure that all tasks updated correctly and that folder and repository security is modified for test and production. For additional information. One advantage of Object Copy in a distributed environment is that it provides more granular control over objects. a folder compare will take place. Object Copy Copying mappings into the next stage in a networked environment involves many of the same advantages and disadvantages as in the standalone environment. log onto the Workflow Manager and change the connections to point to the appropriate target location. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . 5.Data Warehousing 542 of 1017 . create a common folder with the exact same name and case.In a situation where the folder names do not match. When testing is complete. If using shortcuts. Two distinct disadvantages of Object Copy in a distributed environment are: ● ● Much more work to deploy an entire group of objects Shortcuts must exist prior to importing/copying mappings Below are the steps to complete an object copy in a distributed repository environment: 1. otherwise skip to step 2: ● ● In each of the distributed repositories. follow these sub-steps. 2. Rename the folder as appropriate and implement the security. making sure the shortcut has the exact same name. Copy the mapping from the Test environment into Production. but the process of handling shortcuts is simplified in the networked environment. The Copy Folder Wizard then completes the folder copy process. Copy the shortcuts into the common folder in Production.

you can migrate individual objects as you would in an object copy migration. you can set up a dynamic deployment group that allows the objects in the deployment group to be defined by a repository query. Copying a Folder replaces the previous copy. and tasks. Advantages of Using Deployment Groups ● ● ● ● Backup and restore of the Repository needs to be performed only once. but can also have the convenience of a repository. Faster and more flexible than folder moves for incremental changes. for additional convenience. Lastly. change the owner of the test folders to a user in the test group. Drag-and-drop the mapping from Test into Production. the use of Deployment Groups for migrations between distributed environments allows the most flexibility and convenience. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Uses for Deployment Groups r r r r r Deployment Groups are containers that hold references to objects that need to be migrated. ensure the owner of the folders is a user in the development group. Additionally. sessions. but is available for all repository objects including workflows. The objects included in a deployment group have no restrictions and can come from one or multiple folders. Deployment Groups For versioned repositories.● In the Designer. ● ● 3. Types of Deployment Groups ● Static r Contain direct references to versions of objects that need to be moved. rather than the entire contents of a folder. In Production. Allows for migration “rollbacks” Allows specifying individual objects to copy. Note that the ability to compare objects is not limited to mappings. rather than being added to the deployment group manually. because deployment groups are available on versioned repositories. 4. Implement appropriate security.Data Warehousing 543 of 1017 . change the owner of the folders to a user in the production group. If creating the workflow. In Test. add a session task that points to the mapping and enter all the appropriate information. they also have the ability to be rolled back. follow the Copy Wizard. when necessary. With Deployment Groups. Revoke all rights to Public other than Read for the Production folders. Copying a Mapping allows for different names to be used for the same object. Create or copy a workflow with the corresponding session task in the Workflow Manager to run the mapping (first ensure that the mapping exists in the current repository). ● ● If copying the workflow.or folder-level migration as all objects are deployed at once. During the mapping copy process. ● ● ● ● In Development. PowerCenter 7 and later versions allow a comparison of this mapping to an existing copy of the mapping already in Production. reverting to the previous versions of the objects. connect to both the Test and Production repositories and open the appropriate folders in each. Allows for version-based object migration.

● Advantages r r r r Tracks versioned objects during development. Note: By default. go to edit mode and lock them. ● Create label r r r r r Create labels through the Repository Manager. object versions in the repository) are then selected and copied to the target repository Pre-Requisites Create required folders in the Target Repository Creating Labels A label is a versioning object that you can associate with any versioned object or group of versioned objects in a repository. ● Dynamic r r Contain a query that is executed at the time of deployment. After creating the labels. The "Lock" option is used to prevent other users from editing or applying the label. Associates groups of objects for import and export. Improves query results. ● Advantages r Tracks objects during development INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Queries A query is an object used to search for versioned objects in the repository that meet specific conditions. The results of the query (i. Run the query and apply the labels.r Users explicitly add the version of the object to be migrated to the deployment group.e. Associates groups of objects for deployment. the latest version of the object gets labeled. Some Standard Label examples are: ■ ■ ■ ■ ■ Development Deploy_Test Test Deploy_Production Production ● Apply Label r r Create a query to identify the objects that are needed to be queried.Data Warehousing 544 of 1017 . This option can be enabled only when the label is edited.

In the dialog window. Expand the repository. In this example.r r r Associates a query with a deployment group Finds deleted objects you want to recover Finds groups of invalidated objects you want to validate ● Create a query r The Query Browser allows you to create. edit. and choose whether it should be static or dynamic. we are creating a static deployment group. right-click on “Deployment Groups” and choose “New Group. Launch the Repository Manager client tool and log in to the source repository. or delete object queries ● Execute a query r r Execute through Query Browser EXECUTE QUERY: ExecuteQuery -q query_name -t query_type -u persistent_output_file_name -a append -c column_separator -r end-of-record_separator -l end-oflisting_indicator -b verbose Creating a Deployment Group Follow these steps to create a deployment group: 1. give the deployment group a name.Data Warehousing 545 of 1017 .” 3. 2. run. Click OK. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

or Repository Manger.” INFORMATICA CONFIDENTIAL Velocity v8 Methodology . In Designer.Data Warehousing 546 of 1017 .” The “View History” window appears. Workflow Manager.Adding Objects to a Static Deployment Group Follow these steps to add objects to a static deployment group: 1. In the “View History” window. right-click the object and choose “Add to Deployment Group. right-click an object that you want to add to the deployment group and choose “Versioning” -> “View History. 2.

4. In the Deployment Group dialog window. In most cases. choose whether you want to add dependent objects. In the final dialog window. and click OK.Data Warehousing 547 of 1017 . Click OK. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . you will want to add dependent objects to the deployment group so that they will be migrated as well. choose the deployment group that you want to add the object to.3.

just as you did for a static deployment group.Data Warehousing 548 of 1017 . the task of adding each object to the deployment group is similar to the effort required for an object copy migration. PowerCenter tries to re-insert or replace the shortcuts. it is quite simple and aided by the PowerCenter GUI interface. and causes the deployment to fail. this option can cause issues when moving existing code forward because “All Dependencies” also flags shortcuts. Also. However. select the “Queries” button. This does not work. Adding Objects to a Dynamic Deployment Group Dynamic Deployment groups are similar in function to static deployment groups. but in this case.NOTE: The “All Dependencies” option should be used for any new code that is migrating forward. To make deployment groups easier to use. Follow these steps to add objects to a dynamic deployment group: 1. the contents of the deployment group are defined by a repository query. choose the dynamic option. First. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . but differ in the way that objects are added. The object will be added to the deployment group at this time. PowerCenter allows the capability to create dynamic deployment groups. In a static deployment group. Don’t worry about the complexity of writing a repository query. During the deployment. objects are manually added one by one. Although the deployment group allows the most flexibility. In a dynamic deployment group. create a deployment group.

In the Query Editor window. provide a name and query type (Shared). The drop-down list of parameters lets you choose from 23 predefined metadata categories.Data Warehousing 549 of 1017 . The “Query Browser” window appears. the developers have assigned the “RELEASE_20050130” label to all objects that need to be migrated. The creation and application of labels are discussed in Using PowerCenter Labels. Define criteria for the objects that should be migrated. 3. so the query is defined as “Label Is Equal To ‘RELEASE_20050130’”.2. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . In this case. Choose “New” to create a query for the dynamic deployment group.

e. which guides you through the stepby-step options for executing the deployment group. or through the pmrep command line utility. and close the Deployment Group editor window. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .. Click OK on the Query Browser window. you simply drag the deployment group from the source repository and drop it on the destination repository. This is ideal since the deployment group allows ultimate flexibility and convenience as the script can be scheduled to run overnight. You can also use the pmrep utility to automate importing objects via XML.Data Warehousing 550 of 1017 . Deployments -> History -> View History -> Rollback). you must first locate the Deployment via the TARGET Repositories menu bar (i. This opens the Copy Deployment Group Wizard. you can set up a UNIX shell or Windows batch script that calls the pmrep DeployDeploymentGroup command. Executing a Deployment Group Migration A Deployment Group migration can be executed through the Repository Manager client tool. test.4. With the client tool. thereby causing minimal impact on developers and the PowerCenter administrator. which can execute a deployment group migration without human intevention. Recommendations Informatica recommends using the following process when running in a three-tiered environment with development. Save the Query and exit the Query Editor. and production servers. Automated Deployments For the optimal migration method. Rolling Back a Deployment To roll back a deployment.

The following steps outline the process of exporting the objects from source repository and importing them into the destination repository: Exporting 1. the export/import functionality allows the export/import of multiple objects to a single XML file. by using labels. reusable transformations. Select Repository -> Export Objects INFORMATICA CONFIDENTIAL Velocity v8 Methodology . This method provides the greatest flexibility in that you can promote any object from within a development repository (even across folders) into any destination repository. This method gives you total granular control over the objects that are being moved. The XML Object Copy Process allows you to copy nearly all repository objects. workflows. Beginning with PowerCenter 7 and later versions. From Designer or Workflow Manager. 2. Versioned Repositories For versioned repositories. and the enhanced pmrep command line utility. login to the source repository. worklets.Data Warehousing 551 of 1017 . see the steps listed in the Object Copy section. mapplets. For recommendations on performing this copy procedure correctly. targets. Third-Party Versioning Some organizations have standardized on third-party version control software. dynamic deployment groups. mappings. Also. including sources.Non-Versioned Repositories For migrating from development into test. and tasks. It also ensures that the latest development mappings can be moved over manually as they are completed. Informatica recommends using the Object Copy method. Open the folder and highlight the object to be exported. Informatica recommends using the Deployment Groups method for repository migration in a distributed repository environment. the use of the deployment group migration method results in automated migrations that can be executed without manual intervention. PowerCenter’s XML import/export functionality offers integration with such software and provides a means to migrate objects. This can significantly cut down on the work associated with object level XML import/export. This method is most useful in a distributed environment because objects can be exported into an XML file from one repository and imported into the destination repository.

4. (This may vary depending on where you installed the client tools.Data Warehousing 552 of 1017 .dtd file.3. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Select Repository -> Import Objects. 1. The system prompts you to select a directory location on the local workstation. 6.) 5. The system prompts you to select a directory location and file to import into the repository. Together. Open Windows Explorer and go to the C:\Program Files\Informatica PowerCenter 7 and later versions x \Client directory. and paste the copy into the directory where you saved the XML file. Choose the directory to save the file. The following screen appears with the steps for importing the object. 4. Using the default name for the XML file is generally recommended. these files are now ready to be added to the version control software Importing Log into Designer or the Workflow Manager client tool and login to the destination repository. make a copy of it. 3. Open the folder where the object is to be imported. Select the mapping and add it to the Objects to Import list. Find the powrmart. 2.

6.Data Warehousing 553 of 1017 . it has to be deleted prior to restoring).” Remember. and then click "Import". Since the shortcuts have been added to the folder. Click "Next". allowing the activities associated with XML import/export to be automated through pmrep. It is important to note that the pmrep command line utility was greatly enhanced in PowerCenter 7 and later versions. the mapping will now point to the new shortcuts and their parent folder.5. if the destination repository has content. Last updated: 04-Jun-08 16:18 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . 7. Click on the destination repository service on the left pane and choose the “Action drop-down list box “ -> “Restore.

use the PowerExchange Copy Utility DTLURDMO. Step 2: Run DTLURDMO to copy PowerExchange objects. ● ● Using the DTLURDMO utility Using the Power Exchange Client tool (Detail Navigator) DTLURDMO Utility Step 1: Validate connectivity between the client and listeners ● Test communication between clients and all listeners in the production environment with: dtlrexeprog=ping <loc>=<nodename>.Migration Procedures . DTLURDMO does have the ability to copy selectively.PowerExchange Challenge To facilitate the migration of PowerExchange definitions from one environment to another. At this stage. you need to copy the datamaps. The following section assumes that the entire datamap set is to be copied. and the full functionality of the utility is documented in the PowerExchange Utilities Guide. The types of definitions that can be managed with this utility are: ● PowerExchange data maps INFORMATICA CONFIDENTIAL Velocity v8 Methodology . however. Description There are two approaches to perform a migration. ● Run selected jobs to exercise data access through PowerExchange data maps.Data Warehousing 554 of 1017 . To do this. if PowerExchange is to run against new versions of the PowerExchange objects rather than existing libraries.

This file is used to specify how the DTLURDMO utility operates. it looks for a file dtlurdmo. For example: CALL PGM(dtllib/ DTLURDMO) parm ('datalib/deffile(dtlurdmo)') Running DTLURDMO The utility should be run extracting information from the files locally. and then the extract maps if this is a capture environment.ini in the current path. the definition is in the member CFG/DTLURDMO in the current datalib library. The utility runs on all capture platforms. By default. DTLURDMO must be run once for the datamaps. you must give the library and filename of the definition file as a parameter. If no definition file is specified.Data Warehousing 555 of 1017 .This file is used to specify how the DTLURDMO utility operates and is read from the SYSIN card. MVS DTLURDMO job utility Run the utility by submitting the DTLURDMO job. If you want to create a separate DTLURDMO definition file rather than use the default location. This causes the datamaps to be written out in the format required for the upgraded PowerExchange. and extract maps cannot be run together.x Listener. then writing out the datamaps through the new PowerExchange V8. it looks for a file dtlurdmo. On non-MVS platforms.x.ini in the current path.ini ● DTLURDMO Definition file specification . If no input argument is provided. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . AS/400 utility Syntax: CALL PGM(<location and name of DTLURDMO executable file>) For example: CALL PGM(dtllib/DTLURDMO) ● DTLURDMO Definition file specification .● ● PowerExchange capture registrations PowerExchange capture extraction data maps On MVS.This file is used to specify how the DTLURDMO utility operates. Commands for mixed datamaps. registrations. the input statements for this utility are taken from SYSIN. ● DTLURDMO Definition file specification . then again for the registrations. the input argument point to a file containing the input definition. Windows and UNIX Command Line Syntax: DTLURDMO <dtlurdmo definition file> For example: DTLURDMO e:\powerexchange\bin\dtlurdmo. which can be found in the RUNLIB library.

ENCRYPT PASSWORD option from the PowerExchange Navigator. DM_COPY. Details of performing selective copies are documented fully in the PowerExchange Utilities Guide. then selective copies can be carried out. registrations.x. Note: The encrypted password (EPWD) is generated from the FILE.Data Warehousing 556 of 1017 . REPLACE.x. SOURCE LOCAL. Power Exchange Client tool (Detail Navigator) Step 1: Validate connectivity between the client and listeners ● Test communication between clients and all listeners in the production environment with: dtlrexeprog=ping loc=<nodename>. TARGET NODE1. DETAIL. Definition File Example The following example shows a definition file to copy all datamaps from the existing local datamaps (the local datamaps are defined in the DATAMAP DD card in the MVS JCL or by the path on Windows or UNIX) to the V8.If only a subset of the PowerExchange datamaps. EPWD A3156A3623298FDC. SELECT schema=*.x format. This document assumes that everything is going to be migrated from the existing environment to the new V8.x listener (defined by the TARGET location node1): USER DTLUSR. and extract maps are required. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Step 2: Start the Power Exchange Navigator ● ● Select the datamap that is going to be promoted to production. On the menu bar.Data Warehousing 557 of 1017 . choose the appropriate location ( in this case mvs_prod). ● Supply the user name and password and click OK. ● On the drop-down list box. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . select a file to send to the remote node. ● A confirmation message for successful migration is displayed.● Run selected jobs to exercise data access through PowerExchange data maps.

Data Warehousing 558 of 1017 .Last updated: 06-Feb-07 11:39 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

Description When a task in the workflow fails at any point. This option. the mapping needs to produce the same result. the Integration Service uses the saved recovery information to recover it. Enable the session for recovery by selecting one of the following three Recovery Strategies: ● Resume from the last checkpoint r The Integration Service saves the session recovery information and updates recovery tables for a target database. r INFORMATICA CONFIDENTIAL Velocity v8 Methodology . as if the session completed successfully with one run. There are also recovery options available for workflows and tasks that can be used to handle different failure scenarios. Additionally. "Suspend on Error". As an alternative. in the recovery execution as in the failed execution. the workflow can be suspended and the error can be fixed. results in accurate and complete target data. rather than re-processing the portion of the workflow with no errors.Data Warehousing 559 of 1017 . Configure Session for Recovery The recovery strategy can be configured on the Properties page of the Session task. and in the same order. Configure Mapping for Recovery For consistent recovery. ensure that all the targets received data from transformations that produce repeatable data. This can be achieved by sorting the input data using either the sorted ports option in Source Qualifier (or Application Source Qualifier) or by using a sorter transformation with distinct rows option immediately after source qualifier transformation. one option is to truncate the target and run the workflow again from the beginning. If a session interrupts.Running Sessions in Recovery Mode Challenge Understanding the recovery options that are available for PowerCenter when errors are encountered during the load.

Configure Workflow for Recovery The Suspend on Error option directs the Integration Service to suspend the workflow while the error is being fixed and then it resumes the workflow. it does not recover the session." If one or more tasks are still running in the workflow when a task fails.r The Integration Service recovers a stopped. If no other task is running in the workflow. the Integration Service restarts the failed tasks and continues evaluating the rest of the tasks in the workflow. The workflow is suspended when any of the following tasks fail: ● ● ● ● Session Command Worklet Email When a task fails in the workflow. The Integration Service does not evaluate the output link of the failed task. The Workflow Monitor displays the status of the workflow as "Suspending. The session status becomes failed and the Integration Service continues running the workflow. the Integration Service stops running tasks in the path.Data Warehousing 560 of 1017 . the Workflow Monitor displays the status of the workflow as "Suspended. and recover the workflow in the Workflow Monitor. ● Fail task and continue workflow r The Integration Service recovers a workflow. ● Restart task r r The Integration Service does not save session recovery information. such as a target database error. aborted or terminated session from the last checkpoint. the Integration Service reruns the session during recovery. When you recover a workflow." When the status of the workflow is "Suspended" or "Suspending. the Integration Service stops running the failed task and continues running tasks in other paths. If a session interrupts. Truncate Target Table INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The Integration Service does not run any task that already completed successfully." you can fix the error.

the status of the worklet is "Suspending". the Integration Service uses the existing session log when it resumes the workflow from the point of suspension. the target table is not truncated during recovery process. If no other task is running in the worklet. Suspension Email The workflow can be configured to send an email when the Integration Service suspends the workflow. However. Session Logs In a suspended workflow scenario. The Integration Service only sends out another suspension email if another task fails after the workflow resumes. When a task in the worklet fails. Check the "Browse Emails" button on the General tab of the Workflow Designer Edit sheet to configure the suspension email. another suspension email is not sent. The error can be fixed and the workflow can be resumed subsequently.If the truncate table option is enabled in a recovery-enabled session. the recovery process can be started by using pmcmd in command line mode or by using a script. it INFORMATICA CONFIDENTIAL Velocity v8 Methodology . When a task fails.Data Warehousing 561 of 1017 . Recovery Tables and Recovery Process When the Integration Service runs a session that has a resume recovery strategy. the Integration Service also suspends the worklet if a task within the worklet fails. the status of the worklet is "Suspended". Starting Recovery The recovery process can be started using Workflow Manager or Workflow Monitor . If another task fails while the Integration Service is suspending the workflow. If other tasks are still running in the worklet. Suspending Worklets When the "Suspend On Error" option is enabled for the parent workflow. the workflow is suspended and suspension email is sent. The parent workflow is also suspended when the worklet is "Suspended" or "Suspending". Alternately. the Integration Service stops executing the failed task and other tasks in its path. the earlier runs that caused the suspension are recorded in the historical run information in the repository.

it creates a recovery table. If you disable recovery. If you manually create this table. you must create a row and enter a value other than zero for LAST_TGT_RUN_ID to ensure that the session recovers successfully. Output is repeatable. the Integration Service cannot recover the session. create the recovery tables manually. The table contains information that the Integration Service uses to determine if it needs to write messages to the target table during recovery for a real-time session. You can set this property for Custom transformations.Data Warehousing 562 of 1017 .writes to recovery tables on the target database system. When the Integration Service recovers the session. If you do not want the Integration Service to create the recovery tables. If you want the Integration Service to create the recovery tables. If you edit or drop the recovery tables before you recover a session.When the Integration Service runs a real-time session that uses the recovery table and that has recovery enabled. The Integration Service removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.Contains information that the Integration Service uses to identify each target on the database. grant table creation privilege to the database user name that is configured in the target database connection. it uses information in the recovery tables to determine if it needs to write the message to the target table. A Lookup transformation property that determines if the lookup source is the same between the session and recovery. PM_REC_STATE . The ● ● INFORMATICA CONFIDENTIAL Velocity v8 Methodology . on the target database to store message IDs and commit numbers.Contains target load information for the session run. The Integration Service creates the following recovery tables in the target database: PM_RECOVERY . the Integration Service does not remove the recovery tables from the target database and you must manually remove them Session Recovery Considerations The following options affect whether the session is incrementally recoverable: ● Output is deterministic. The information remains in the table between session runs. PM_REC_STATE. A property that determines if the transformation generates the data in the same order for each session run. it uses information in the recovery tables to determine where to begin loading data to target tables. PM_TGT_RUN_ID . When the Integration Service recovers the session. Lookup source is static. A property that determines if the transformation generates the same set of data for each session run.

Any change after initial failure (in mapping. Session configurations are not supported by PowerCenter for session recovery. Data movement mode change after initial session failure. ● ● ● ● ● ● ● ● ● HA Recovery Highly-available recovery allows the workflow to resume automatically in the case of Integration Service failover. Inconsistent Data During Recovery Process For recovery to be effective. Mapping uses sequence generator transformation. when server is running in Unicode mode. Session sort order changes. Source and/or target changes after initial session failure. Maximum automatic recovery attempts When you automatically recover terminated tasks. after initial session failure. Mapping uses a lookup table and the data in the lookup table changes between session runs. session and/or in the Integration Service) that changes the ability to produce repeatable data. Mapping changes in a way that causes server to distribute or filter or aggregate rows differently. The following situations may produce inconsistent data during a recovery session: ● Session performs incremental aggregation and the Integration Service stops unexpectedly. Mapping uses a normalizer transformation.Integration Service uses this property to determine if the output is deterministic. and in the same order. Automatically recover terminated tasks Recover terminated Session or Command tasks without user intervention. results in inconsistent data during the recovery process. the recovery session must produce the same set of rows. The following options are available in the properties tab of the workflow: ● Enable HA recovery Allows the workflow to be configured for Highly Availability. you can choose the number of times the Integration Service ● ● INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 563 of 1017 . source or target) changes. Code page (server.

The default setting is 5.attempts to recover the task. Last updated: 26-May-08 11:28 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 564 of 1017 .

Description A label is a versioning object that can be associated with any versioned object or group of versioned objects in a repository. a label is a named object in the repository. then only version 9 has that label. multiple labels can point to the same object for greater flexibility. To create a label. However. Labels provide a way to tag a number of object versions with a name for later identification. and not objects as a whole. choose Versioning-Labels from the Repository Manager. Note that labels apply to individual object versions.Data Warehousing 565 of 1017 . Associate groups of objects for import and export. whose purpose is to be a “pointer” or reference to a group of versioned objects. Labels can be used for many purposes: ● ● ● ● Track versioned objects during development Improve object query results. So if a mapping has ten versions checked in. a label called “Project X version X” can be applied to all object versions that are part of that project and release. The other versions of that mapping do not automatically inherit that label. and a label is applied to version 9. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The “Use Repository Manager” privilege is required in order to create or edit labels. Create logical groups of objects for future deployment. For example. Therefore.Using PowerCenter Labels Challenge Using labels effectively in a data warehouse or data integration project to assist with administration and migration.

Include comments for further meaningful description.Data Warehousing 566 of 1017 . like other global objects such as Queries and Deployment Groups. Labels. can have user and group privileges attached to them. This prevents anyone from accidentally associating additional objects with the label or removing object references for the label. This allows an administrator to create a label that can only be used by specific individuals or groups. Locking the label is also advisable. For example. a suggested naming convention for labels is: Project_Version_Action. choose a name that is as descriptive as possible.When creating a new label. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Only those people working on a specific project should be given read/write/execute permissions for labels that are assigned to that project.

To apply the label to objects. objects associated with that label can be used in the deployment. invoke the “Apply Label” wizard from the Versioning >> Apply Label menu option from the menu bar in the Repository Manager (as shown in the following figure). Using Labels in Deployment An object query can be created using the existing labels (as shown below). it should be applied to related objects.Once a label is created. For example. and tasks associated with the workflow. The Repository Server applies labels to sources. After the label has been applied to related objects. Note: Labels can be applied to any object version in the repository except checked-out versions. Labels can also be used to manage the size of the repository (i.e. to group dependencies for a workflow. it can be used in queries and deployment groups (see the Best Practice on Deployment Groups ). INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Based on the object query. Use the “Move label” property to point the label to the latest version of the object(s).Data Warehousing 567 of 1017 . targets. Applying Labels Labels can be applied to any object and cascaded upwards and downwards to parent and/or child objects. to purge object versions). Execute permission is required for applying labels. apply a label to all children objects. mappings. Labels can be associated only with a dynamic deployment group.

choose three labels for the development and subsequent repositories: ● ● ● The first is to identify the objects that developers can mark as ready for migration. thus developing a migration audit trail.Data Warehousing 568 of 1017 . Be sure that developers are aware of the uses of these labels and when they should apply labels. The second should apply to migrated objects. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . The third is to apply to objects as they are migrated into the receiving repository. completing the migration audit trail. For each planned migration between repositories.Strategies for Labels Repository Administrators and other individuals in charge of migrations should develop their own label strategies and naming conventions in the early stages of a data integration project.

For example. Using labels in this fashion along with the query feature allows complete or incomplete objects to be identified quickly and easily. Additional labels can be created with developers to allow the progress of mappings to be tracked if desired. Last updated: 04-Jun-08 13:47 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Developers can also label the object with a migration label at a later time if necessary. when an object is successfully unit-tested by the developer. use the first label to construct a query to build a dynamic deployment group. it can be marked as such. thereby providing an object-based view of progress.When preparing for the migration.Data Warehousing 569 of 1017 . Developers and administrators do not need to apply the second and third labels manually. The second and third labels in the process are optionally applied by the migration wizard when copying folders between versioned repositories.

4. Accounts Receivable – (Number of Accounts Receivable Shipments or Total Accounts Receivable Outstanding) 2. To drive out this type of solution execute the following tasks: 1. An example audit/balance table definition looks like this : Audit/Balancing Details INFORMATICA CONFIDENTIAL Velocity v8 Methodology . and many others. 5. Develop a data integration process that will read from the source system and populate the detail audit/balancing table with the control totals. Develop a data integration process that will read from the target system and populate the detail audit/balancing table with the control totals. Customers – (Number of Customers or Number of Customers by Country) b. Each control measure that is being tracked will require development of a corresponding PowerCenter process to load the metrics to the Audit/ Balancing Detail table. Develop a reporting mechanism that will query the audit/balancing table and identify the the source and target entries match or if there is a discrepancy. Deliveries – (Number of shipments or Qty of units shipped of Value of all shipments) d. Work with business users to identify what audit/balancing processes are needed. More specifically. Define for each process defined in #1 which columns should be used for tracking purposes for both the source and target system. business intelligence reports provide insight at a glance to verify that the correct data has been pulled from the source and completely loaded to the target. Patriot Act. Description The common practice for audit and balancing solutions is to produce a set of common tables that can hold various control metrics regarding the data integration process.Build Data Audit/Balancing Processes Challenge Data Migration and Data Integration projects are often challenged to verify that the data in an application is complete. 3. This best practice illustrates how to do this in an efficient and a repeatable fashion for increased productivity and reliability. Some examples of this may be: a. This is particularly important in businesses that are either highly regulated internally and externally or that have to comply with a host of government compliance regulations such as Sarbanes-Oxley. BASEL II. HIPAA. Ultimately.Data Warehousing 570 of 1017 . Orders – (Qty of Units Sold or Net Sales Amount) c. to identify that all the appropriate data was extracted from a source system and propagated to its final target.

s) 10.Data Warehousing 571 of 1017 .2 UPDATE_TIMESTAMP TIMESTAMP UPDATE_PROCESS VARCHAR2 50 Control Column Definition by Control Area/Control Sub Area Column Name CONTROL_AREA Data Type Size VARCHAR2 50 CONTROL_SUB_AREA VARCHAR2 50 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .s) 10.s) 10.Column Name AUDIT_KEY CONTROL_AREA Data Type NUMBER VARCHAR2 Size 10 50 50 10 10 10 10 10 CONTROL_SUB_AREA VARCHAR2 CONTROL_COUNT_1 CONTROL_COUNT_2 CONTROL_COUNT_3 CONTROL_COUNT_4 CONTROL_COUNT_5 CONTROL_SUM_1 CONTROL_SUM_2 CONTROL_SUM_3 CONTROL_SUM_4 CONTROL_SUM_5 NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER (p.2 NUMBER (p.2 NUMBER (p.s) 10.2 NUMBER (p.s) 10.2 NUMBER (p.

CONTROL_COUNT_1 CONTROL_COUNT_2 CONTROL_COUNT_3 CONTROL_COUNT_4 CONTROL_COUNT_5 CONTROL_SUM_1 CONTROL_SUM_2 CONTROL_SUM_3 CONTROL_SUM_4 CONTROL_SUM_5 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 VARCHAR2 50 UPDATE_TIMESTAMP TIMESTAMP UPDATE_PROCESS VARCHAR2 50 The following is a screenshot of a single mapping that will populate both the source and target values in a single mapping: INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 572 of 1017 .

The following two screenshots show how two mappings could be used to provide the same results: INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Data Warehousing 573 of 1017 .

21 0 21294.21 283. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .Note: One key challenge is how to capture the appropriate control values from the source system if it is continually being updated. Identifying what the control totals should be 2.22 21011.21 11230. projects can lower the time needed to develop these solutions and still provide risk reductions by having this type of solution in place.01 Deliveries 1298 In summary. By building a common model for meeting audit/balancing needs. Building processes that will collect the correct information at the correct granularity There are also a set of basic tasks that can be leveraged and shared across any audit/balancing needs. In those cases you may want to take advantage of an aggregator transformation to collect the appropriate control totals as illustrated in this screenshot: The following are two Straw-man Examples of an Audit/Balancing Report which is the end-result of this type of process: Data Area Leg count TT count Diff Leg amt TT amt Customer 11000 Orders 9827 10099 9827 1288 1 0 0 0 11230. The first example with one mapping will not work due to the changes that occur in the time between the extraction of the data from the source and the completion of the load to the target application. there are two big challenges in building audit/balancing processes: 1.Data Warehousing 574 of 1017 .

Data Warehousing 575 of 1017 .Last updated: 04-Jun-08 18:17 INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

IDE focuses on data profiling. and de-duplication tool. the challenge is twofold: to cleanse project data. Defective data leads to breakdowns in the supply chain. correction. IDQ has been developed as a data analysis. In a production environment. IDQ can discover data quality issues at a record and field level. Description A significant portion of time in the project development process should be dedicated to data quality. which can improve both efficiency and effectiveness. Profiling and Analysis . Note: The remaining items in this document will therefore. including the implementation of data cleansing processes. once in the system. one that provides a complete solution for identifying and resolving all types of data quality problems and preparing data for the consolidation and load processes. Concepts Following are some key concepts in the field of data quality. Informatica offers two application suites for tackling data quality issues: Informatica Data Explorer (IDE) and Informatica Data Quality (IDQ). The list of concepts can be read as a process. and to ensure that all data entering the organizational data stores provides for consistent and reliable decision-making. A 2005 study by the Gartner Group stated that the majority of currently planned data warehouse projects will suffer limited acceptance or fail outright. and IDE is ideally suited to these tasks. However. poor business decisions. its unique strength is its metadata profiling and discovery capability.Data Cleansing Challenge Poor data quality is one of the biggest obstacles to the success of many data integration projects. in Informatica terminology these tasks are assigned to IDE and IDQ respectively. and Velocity best practices recommends the use of IDQ for such purposes. so that the project succeeds. Therefore. focus in the context of IDQ usage.Data Warehousing 576 of 1017 . INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Moreover. profiling is primarily concerned with metadata discovery and definition. and its results can feed into the data integration process. data quality reports should be generated after each data warehouse implementation or when new source systems are integrated into the environment. There should also be provision for rolling back if data quality testing indicates that the data is unacceptable. poor data quality can cost organizations vast sums in lost revenues. leading from profiling and analysis to consolidation. Gartner declared that the main cause of project problems was a lack of attention to data quality. These data quality concepts provide a foundation that helps to develop a clear picture of the subject data. Thus.whereas data profiling and data analysis are often synonymous terms. cleansing. It is essential that data quality issues are tackled during any large-scale data project to enable project success and future organizational success. and inferior customer relationship management.

INFORMATICA CONFIDENTIAL Velocity v8 Methodology . company name. including data cleansing.refers to arranging information in a consistent manner or preferred format.Data Warehousing 577 of 1017 . but optional.using the data sets defined during the matching process to combine all cleansed or approved data into a single. Consolidation . Matching and de-duplication .the process of extracting individual elements within the records. For more information. files. see the Best Practice Effective Data Matching Techniques. Informatica Applications The Informatica Data Quality software suite has been developed to resolve a wide range of data quality issues. see the Best Practice Effective Data Standardizing Techniques. and SSN. and zip+4 codes. For more information. Examples may include: sales volume. Example: validating addresses with postal directories. or data entry forms in order to check the structure and content of each field and to create discrete fields devoted to specific information types.refers to removing. Use matching components and business rules to identify records that may refer.Parsing . redundant or poor-quality records where high-quality records of the same information exist. Cleansing and Standardization . The suite comprises the following elements: ● IDQ Workbench . Examples may include: name.the process of correcting data using algorithmic components and secondary reference data sources. master record. phone number. information to existing data or complete data. to check and validate information. to the same customer. number of employees for a given business. or flagging for removal.refers to adding useful. for example. Validation . Examples include the removal of dashes from phone numbers or SSNs. title. or house-holding. Enhancement . consolidated view.a stand-alone desktop tool that provides a complete set of data quality functionality on a single computer (Windows only). Examples are building best record.

Plans are saved into projects that can provide a structure and sequence to your data quality endeavors. Workbench enables you to build discrete procedures (called plans in Workbench) which contain data input components.Data Warehousing 578 of 1017 . parsing.a plug-in component that integrates Workbench with PowerCenter. IDQ Server enables the creation and management of multiple repositories. ● ● Using IDQ in Data Projects IDQ can be used effectively alongside PowerCenter in data projects. validation. and consolidation operations on the specified data. enhancement. That is. standardization. IDQ Integration . The following figure illustrates how data quality processes can function in a project setting: INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Through its Workbench user-interface tool. Plans can perform analysis.● IDQ Server.a set of processes that enables the deployment and management of data quality procedures and resources across a network of any size through TCP/IP. matching. IDQ stores all its processes as XML in the Data Quality Repository (MySQL). to run data quality procedures in its own applications or to provide them for addition to PowerCenter transformations. IDQ tackles data quality in a modular fashion. enabling PowerCenter users to embed data quality procedures defined in IDQ in their mappings. and operational components. output components.

Using the IDQ Integration INFORMATICA CONFIDENTIAL Velocity v8 Methodology . which enables the creation of versatile and easy to use dashboards to communicate data quality metrics to all interested parties. you’ll test and measure the results of the plans and compare them to the initial data quality assessment to verify that targets have been met. In stage 3. Stages 3 and 4 typically occur during the Design Phase of Velocity. This stage is performed in Workbench. If you are using IDQ Workbench and Server. In stage 5. you can deploy plans and resources to remote repositories and file systems through the user interface. At a high level. Stage 4 is the phase in which data cleansing and other data quality tasks are performed on the project data. at a point defined as the Manage Phase within Velocity. Capturing business rules and testing the plans are also covered in this stage. this information feeds into another iteration of data quality operations in which the plans are tuned and optimized. you deploy the data quality plans. In stage 2. you verify the target levels of quality for the business according to the data quality measurements taken in stage 1. you use Workbench to design the data quality plans and projects to achieve the targets. Stage 5 can occur during the Design and/or Build Phase of Velocity. If you are running Workbench alone on remote computers. In a large data project. you can export your plans as XML. in consultation with the business or project sponsor. and in accordance with project resourcing and scheduling. stages 1 and 2 ideally occur very early in the project.In stage 1.Data Warehousing 579 of 1017 . If targets have not been met. you may find that data quality processes of varying sizes and impact are necessary at many points in the project plan. In stage 4. you analyze the quality of the project data according to several metrics. depending on the level of unit testing required.

which processes the data (in conjunction with any reference data files used by the plan) and returns the results to PowerCenter. Informatica provides a set of plans dedicated to cleansing and de-duplicating North American name and postal address records. With the Integration component. The plan information is saved with the transformation as XML. it enables the PowerCenter Server (or Integration service) to send data quality plan XML to the Data Quality engine for execution. The PowerCenter Designer user opens a Data Quality Integration transformation and configures it to read from the Data Quality repository. On the PowerCenter server side. The relevant source data and plan information will be sent to the Data Quality engine. Server side: PowerCenter needs an instance of the Data Quality engine to execute the plan instructions.Data Warehousing 580 of 1017 .Data Quality Integration is a plug-in component that enables PowerCenter to connect to the Data Quality repository and import data quality plans to a PowerCenter transformation. An IDQ-trained consultant can build the data quality plans. The data quality plans’ functional details are saved as XML in the PowerCenter repository. The Integration component enables the following process: ● Data quality plans are built in Data Quality Workbench and saved from there to the Data Quality repository. The PowerCenter Designer user saves the transformation and the mapping containing it to the PowerCenter repository. Next. or you can use the pre-built plans provided by Informatica. you can apply IDQ plans to your data without necessarily interacting with or being aware of IDQ Workbench or Server. Last updated: 06-Feb-07 12:43 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . it enables you to browse the Data Quality repository and add data quality plans to custom transformations. The Integration interacts with PowerCenter in two ways: ● On the PowerCenter client side. ● ● The PowerCenter Integration service can then run a workflow containing the saved mapping. the users selects a plan from the Data Quality repository and adds it to the transformation. ● The Integration requires that at least the following IDQ components are available to PowerCenter: ● ● Client side: PowerCenter needs to access a Data Quality repository from which to import plans. Currently.

● Generating and Viewing Profile Reports Use Profile Manager to view profile reports.) ● Running and Monitoring Profiles Profiles are run in one of two modes: interactive or batch. Bear in mind that Informatica’s Data Quality (IDQ) applications also provide data profiling capabilities. A custom data profile is useful when there is a specific question about a source. This information can help to improve the quality of the source data. or if you want to test whether data matches a particular pattern. This Best Practice is intended to provide an introduction on usage for new users. An auto profile is particularly valuable when you are data profiling a source for the first time. It provides a row count.0 and later that leverages existing PowerCenter functionality and a data profiling GUI front-end to provide a wizard-driven approach to creating data profiling mappings. The sessions are created with default configuration parameters. candidate key evaluation. Choose the appropriate mode by checking or unchecking “Configure Session” on the "Function-Level Operations” tab of the wizard.Data Warehousing 581 of 1017 . Custom profiling is useful for validating business rules and/or verifying that data matches a particular pattern. and average (if numeric) at the column level. create the sessions manually in Workflow Manager and configure and schedule them appropriately. choose “Always run profile interactively” since most of your dataprofiling tasks will be interactive.Data Profiling Challenge Data profiling is an option in PowerCenter version 7. (In later phases of the project. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . check the “Use source owner name during profile mapping generation” option. Consult the following Velocity Best Practice documents for more information: ● ● Data Cleansing Using Data Explorer for Data Discovery and Analysis Description Creating a Custom or Auto Profile The data profiling option provides visibility into the data contained in source systems and enables users to measure changes in the source data over time. For data-profiling tasks that are likely to be reused on a regular basis. uncheck this option because more permanent data profiles are useful in these phases. Right-click on a profile and choose View Report. If you are profiling data using a database user that is not the owner of the tables to be sourced. and domain inference. and redundancy evaluation at the source level. If you are in the analysis phase of your project. use custom profiling if you have a business rule that you want to validate. For example. distinct value and null value count. max. Setting Up the Profile Wizard To customize the profile wizard for your preferences: ● ● Open the Profile Manager and choose Tools > Options. and workflows. sessions. ● Use Interactive to create quick. and min. Creating and running an auto profile is quick and helps to gain a reasonably thorough understanding of a source in a short amount of time. since auto profiling offers a good overall perspective of a source. single-use data profiles.

then capture the script that is generated and run it. You can create additional metrics.' from user_tables where table_name like 'PMDP%'. select 'analyze index ' || index_name || ' compute statistics.' from user_tables where index_name like 'DP%'. first 200 rows) Profile Warehouse Administration Updating Data Profiling Repository Statistics The Data Profiling repository contains nearly 30 tables with more than 80 indexes. analysis Manual random sampling PowerCenter samples random rows of the source data based on a userspecified percentage. ORACLE select 'analyze table ' || table_name || ' compute statistics.For greater flexibility.Data Warehousing 582 of 1017 . you can also use Data Analyzer to view reports. be sure to keep database statistics up to date.g. Sample first N rows Samples the number of user-selected rows Provides a quick readout of a source (e. Run the query below as appropriate for your database type. The xml files are located in the \Extensions\DataProfile\IPAReports subdirectory of the client installation. To ensure that queries run optimally. Samples more or fewer rows than the automatic option chooses. Sampling Techniques Four types of sampling techniques are available with the PowerCenter data profiling option: Technique No sampling Description Uses all source data Usage Relatively small data sources Automatic random sampling PowerCenter determines the Larger data sources where you appropriate percentage to sample.. attributes. You can also schedule Data Analyzer reports and alerts to send notifications in cases where data does not meet preset quality limits. then want a statistically significant data samples random rows. and reports in Data Analyzer to meet specific business requirements. Microsoft SQL Server select 'update statistics ' + name from sysobjects where name like 'PMDP%' SYBASE select 'update statistics ' + name from sysobjects where name like 'PMDP%' INFORMIX INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Each PowerCenter client includes a Data Analyzer schema and reports xml file.

Choose Target Warehouse>Purge to open the purging tool. indexname from dbc.select 'update statistics low for table '. ' || tabname || ' and indexes all.tables where tabname like 'PMDP %' TERADATA select 'collect statistics on '.indices where tablename like 'PMDP%' and databasename = 'database_name' where database_name is the name of the repository database. ' index '. tabname. ' . Purging Old Data Profiles Use the Profile Manager to purge old profile data from the Profile Warehouse. ' from syscat. Last updated: 01-Feb-07 18:52 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . Choose Target Warehouse>Connect and connect to the profiling warehouse. tablename.Data Warehousing 583 of 1017 . ' from systables where table_name like 'PMDP%' IBM DB2 select 'runstats on table ' || rtrim(tabschema) || '.

Bear in mind that you can augment or supplant the data quality handling capabilities of PowerCenter with Informatica Data Quality (IDQ). defined in IDQ can deliver significant data quality improvements to your project data. Common Questions to Consider Data integration/warehousing projects often encounter general data problems that may not merit a full-blown data quality project. For a summary of Informatica’s data quality methodology.Data Warehousing 584 of 1017 . whether it be data warehousing. This Best Practice focuses on techniques for use with PowerCenter and third-party or add-on software. If you have added these data quality steps to your project. the Informatica application suite dedicated to data quality issues. Description The issue of poor data quality is one that frequently hinders the success of data integration projects. or plans. The quality of data is important in all types of projects. Comments that are specific to the use of PowerCenter are enclosed in brackets. but which nonetheless must be addressed. Data analysis and data enhancement processes.Data Quality Mapping Rules Challenge Use PowerCenter to create data quality mapping rules to enhance the usability of the data in your system. much of the content discusses specific strategies to use with PowerCenter. A description of the range of IDQ capabilities is beyond the scope of this document. consult the Best Practice Data Cleansing. you are likely to avoid the issues described below. It can produce inconsistent or faulty results and ruin the credibility of the system with the business users. such as those described in the Analyze and Design phases of Velocity. This document discusses some methods to ensure a base level of data quality. enjoys a significant advantage over a project that has not audited and resolved issues of poor data quality. A data project that has built-in data quality steps. as embodied in IDQ. INFORMATICA CONFIDENTIAL Velocity v8 Methodology .

users can still see it in this format. Since the “raw” data is stored in the table. One solution to this issue is to create additional fields that act as a unique key to a given table. or formatting applied to it. trimming. These blanks need to be trimmed before matching can occur. The project team must understand how spaces are handled from the source systems to determine the amount of coding required to correct this.Data Warehousing 585 of 1017 . but the additional columns mitigate the risk of duplication. Another possibility is to explain to the users that “raw” data in unique. In other words. This issue can be particularly troublesome in data migration projects where matching the source data is a high priority.data synchronization. the options provided while configuring the File Properties may be sufficient.). these questions should be addressed during the Design and Analyze Phases of the project because they can require a significant amount of re-coding if identified later. which then stores the data with trailing blanks. users want to see data in its “raw” format without any capitalization. (When using PowerCenter and sourcing flat files. identifying fields is not as clean and consistent as data in a common format. Remember that certain RDBMS products use the data type CHAR. with the answers driven by the project’s requirements and the business users that are being serviced. Some of the areas to consider are: Text Formatting The most common hurdle here is capitalization and trimming of spaces. It is usually only advisable to use CHAR for 1-character flag fields. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . but there is danger in taking this requirement literally since it can lead to duplicate records when some of these fields are used to identify uniqueness and the system is combining data from various source systems. push back on this requirement. Certain questions need to be considered for all of these projects. or data migration. Ideally. This is easily achievable as it is the default behavior. Often. but which are formatted in a standard way. Failing to trim leading/trailing spaces from data can often lead to mismatched results since the spaces are stored as part of the data value.

Note that many fixed-width files do not use a null as the space character. [In PowerCenter, developers must put one space beside the text radio button, and also tell the product that the space repeats to fill out the rest of the precision of the column. The strip trailing blanks facility then strips off any remaining spaces from the end of the data value.]

[Embedding database text manipulation functions in lookup transformations is not recommended, because the presence of a SQL override means the developer must then cache the lookup table. On very large tables, caching is not always realistic or feasible. Therefore, avoid embedding database text manipulation functions in lookup transformations.]

Datatype Conversions

It is advisable to use explicit tool functions when converting the data type of a particular data value. [In PowerCenter, if an explicit function such as TO_CHAR is not used, an implicit conversion is performed, even when conversions are not needed or desired. PowerCenter can handle some conversions without function calls (these are detailed in the product documentation), but relying on this may cause subsequent support or maintenance headaches.]
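As an illustration of explicit conversion (the port names and the decimal scale are hypothetical), the following expressions convert values deliberately rather than relying on implicit conversion:

-- Convert a text amount to a decimal with an explicit scale, and a numeric ID to a string
TO_DECIMAL( in_ORDER_AMOUNT_TXT, 2 )
TO_CHAR( in_ORDER_ID )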

Dates

Dates can cause many problems when moving and transforming data from one place to another, because an assumption must be made that all data values are in a designated format. [Informatica recommends first checking a piece of data to ensure it is in the proper format before trying to convert it to a Date data type. If the check is not performed first, the developer increases the risk of transformation errors.] If the majority of the dates coming from a source system arrive in the same format, it is often wise to create a reusable expression that handles dates, so that the proper checks are made. An example piece of code would be:

IIF(IS_DATE(in_RECORD_CREATE_DT, 'YYYYMMDD'), TO_DATE(in_RECORD_CREATE_DT, 'YYYYMMDD'), NULL)

It is also advisable to determine whether any default dates, such as a low date or a high date, should be defined. These should then be used throughout the system for consistency; the NULL in the example above could be changed to one of these standard default dates. However, do not fall into the trap of always using default dates, as some dates are meant to remain NULL until the appropriate time (e.g., a birth date or death date).

Decimal Precision

With numeric data columns, developers must determine the expected or required precisions of the columns. (By default, to increase performance, PowerCenter treats all numeric columns as 15-digit floating-point decimals, regardless of how they are defined in the transformations, and only 15 digits are carried forward, which can cause data to be lost.) If it is determined that a column realistically needs a higher precision, then the Enable Decimal Arithmetic option in the Session Properties needs to be checked; the maximum numeric precision in PowerCenter is 28 digits. This option must also be enabled when comparing two numbers for equality. However, be aware that enabling it can slow performance by as much as 15 percent.
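Building on the date-handling example above, a reusable expression could substitute an agreed low default date instead of NULL where the requirements call for one (the 1900-01-01 default shown here is purely illustrative):

-- Validate the incoming string first; fall back to the standard low default date
IIF( IS_DATE( in_RECORD_CREATE_DT, 'YYYYMMDD' ),
     TO_DATE( in_RECORD_CREATE_DT, 'YYYYMMDD' ),
     TO_DATE( '19000101', 'YYYYMMDD' ) )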

Trapping Poor Data Quality Techniques

The most important technique for ensuring good data quality is to prevent incorrect, inconsistent, or incomplete data from ever reaching the target system. This goal may be difficult to achieve in a data synchronization or data migration project, but it is very relevant when discussing data warehousing or an ODS. This section discusses techniques that you can use to prevent bad data from reaching the system.

Checking Data for Completeness Before Loading

When requesting a data feed from an upstream system, be sure to request an audit file or report that contains a summary of what to expect within the feed. Common requests here are record counts or summaries of numeric data fields. If you have performed a data quality audit, as specified in the Analyze Phase, these metrics and others should be readily available. Assuming that the metrics can be obtained from the source system, it is advisable to create a pre-process step that ensures your input source matches the audit file. If the values do not match, stop the overall process from loading into your target system; the source system can then be alerted to verify where the problem exists in its feed.

Enforcing Rules During Mapping

Another method of filtering bad data is to have a set of clearly defined data rules built into the load job. The records are evaluated against these rules and routed to an Error or Bad table for further re-processing accordingly. An example of this is to check all incoming Country Codes against a Valid Values table: if the code is not found, the record is flagged as an Error record and written to the Error table.

A pitfall of this method is that you must determine what happens to the record once it has been loaded to the Error table. If the record is pushed back to the source system to be fixed, a delay may occur until the record can be successfully loaded to the target system. Often, if the proper governance is not in place, the source system may refuse to fix the record at all. In this case, a decision must be made to either: 1) fix the data manually and risk not matching the source system, or 2) relax the business rule to allow the record to be loaded.

In fact, in the absence of an enterprise data steward, it is a good idea to assign a team member the role of data steward. It is this person's responsibility to patrol these tables and push back to the appropriate systems as necessary. A data steward should have a good command of the metadata, help to make decisions about fixing or filtering bad data, and understand the consequences to the user community of data decisions.
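As a sketch of the Country Code rule described above (the lookup and port names are hypothetical, and the same logic could equally be built with a connected Lookup), an unconnected lookup against the Valid Values table can drive an error flag:

-- Flag the record as an Error ('E') when the incoming code is not in the Valid Values table
IIF( ISNULL( :LKP.LKP_VALID_COUNTRY( in_COUNTRY_CODE ) ), 'E', 'V' )

A Router transformation can then send rows flagged 'E' to the Error table and pass the remainder to the target.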

Another solution, applicable in cases with a small number of code values, is to try to anticipate any mistyped codes and translate them back to the correct codes. The cross-reference translation data can be accumulated over time: each time an error is corrected, both the incorrect and the correct values should be put into the table and used to correct future errors automatically.

Dimension Not Found While Loading Fact

The majority of current data warehouses are built using a dimensional model. A dimensional model relies on the presence of dimension records before the fact tables are loaded, because the foreign keys on the fact rows must exist in the dimension tables to satisfy referential integrity. This can usually be accomplished by loading the dimension tables before loading the fact tables. However, there are cases where a corresponding dimension record is not present at the time of the fact load. When this occurs, consistent rules are needed to handle the situation so that data is not improperly exposed to, or hidden from, the users.

One solution is to continue to load the data to the fact table, but assign the foreign key a value that represents Not Found or Not Available in the dimension. The team will most likely want to flag the row through the use of either error tables or process codes; these provide a clear and easy way to identify records that may need to be reprocessed at a later date. Another solution is to filter the record from processing, since it may no longer be relevant to the fact table. A third solution is to use dynamic caches and load the dimensions when a record is not found there, even while loading the fact table. This should be done very carefully, since it may add unwanted or junk values to the dimension table. One occasion when this may be advisable is where a dimension is simply made up of the distinct combination values in a data set; this dimension may require a new record if a new combination occurs.

It is imperative that all of these solutions be discussed with the users before making any decisions, since they will eventually be the ones making decisions based on the reports.

Last updated: 01-Feb-07 18:52

Effective Data Matching Techniques

Challenge

Identifying and eliminating duplicates is a cornerstone of effective marketing efforts and customer resource management initiatives. It enables the creation of a single view of customers, and it is an increasingly important driver of cost-efficient compliance with regulatory initiatives such as KYC (Know Your Customer).

This Best Practice is targeted toward Informatica Data Quality (IDQ) users familiar with Informatica's matching approach. It has two high-level objectives:

● To identify the key performance variables that affect the design and execution of IDQ matching plans.
● To describe plan design and plan execution actions that will optimize plan performance and results.

A user's ability to design and execute a matching plan that meets the key requirements of performance and match quality depends on understanding the best-practice approaches described in this document. To optimize your data matching operations in IDQ, you must be aware of the factors discussed below.

Description

All too often, an organization's datasets contain duplicate data in spite of numerous attempts to cleanse the data or prevent duplicates from occurring. IDQ's matching capabilities can help to resolve dataset duplications and deliver business results.

Identifying and eliminating duplicates in datasets can serve several purposes. It can help control costs associated with mailing lists by preventing multiple pieces of mail from being sent to the same person or household, and it can assist marketing efforts by identifying households or individuals who are heavy users of a product or service. Business intelligence operations can be improved by identifying links between two or more systems, providing a more complete picture of how customers interact with a business and better recognizing key relationships among data records (such as customer records from a common household). Data can be enriched by matching across production data and reference data sources, and you can also match records or values against reference data to ensure data accuracy and validity. In other scenarios, the datasets may lack common keys (such as customer numbers or product ID fields) that, if present, would allow clear joins between the datasets and improve business knowledge. Once duplicate records are identified, you can remove them from your dataset.

An integrated approach to data matching involves several steps that prepare the data for matching and improve the overall quality of the matches. The steps are outlined below.

● Profiling: typically the first stage of the data quality process, profiling generates a picture of the data and indicates the data elements that can comprise effective group keys. It also highlights the data elements that require standardizing to improve match scores.

● Standardization: removes noise, variant spellings, excess punctuation, and other extraneous data elements. Standardization reduces the likelihood that match quality will be affected by data elements that are not relevant to match determination.
● Grouping: a post-standardization function in which the group key fields identified in the profiling stage are used to segment data into logical groups that facilitate matching plan performance.
● Matching: the process whereby the data values in the created groups are compared against one another and record matches are identified according to user-defined criteria.
● Consolidation: the process whereby duplicate records are cleansed. It identifies the master record in a duplicate cluster and permits the creation of a new dataset or the elimination of subordinate records. Any child data associated with subordinate records is linked to the master record.

(This document does not make any recommendations on profiling, standardization, or consolidation strategies. Its focus is grouping and matching.)

The sections below identify the key factors that affect the performance (or speed) of a matching plan and the quality of the matches identified. They also outline the best practices that ensure that each matching plan is implemented with the highest probability of success. The key variables affecting matching plan performance and match quality are:

● Group size (plan performance): the number and size of groups have a significant impact on plan execution speed.
● Group keys (quality of matches): the proper selection of group keys ensures that the maximum number of possible matches is identified in the plan.
● Hardware resources (plan performance): processors, disk performance, and memory require consideration.
● Size of dataset(s) (plan performance): this is not a high-priority issue, but it should be considered when designing the plan.
● Informatica Data Quality components (plan performance): the plan designer must weigh file-based versus database matching approaches when considering plan requirements.

● Time window and frequency of execution (plan performance): the time taken for a matching plan to complete execution depends on its scale; timing requirements must be understood up-front.
● Match identification (quality of matches): the plan designer must weigh deterministic versus probabilistic approaches.

Group Size

Grouping breaks large datasets down into smaller ones to reduce the number of record-to-record comparisons performed in the plan. When matching on grouped data, a matching plan compares the records within each group with one another. The most important determinant of plan execution speed is therefore the size of the groups to be processed, that is, the number of data records in each group. Large groups perform more record comparisons, which directly impacts the speed of plan execution; the reverse is true for small groups.

For example, consider a dataset of 1,000,000 records for which a grouping strategy generates 10,000 groups. If 9,999 of these groups have an average of 50 records each, the remaining group will contain more than 500,000 records. Based on this one large group alone, the matching plan would require about 87 days to complete when processing 1,000,000 comparisons a minute. In comparison, the remaining 9,999 groups could be matched in about 12 minutes, because their sizes are evenly distributed. (Each group of n records requires n*(n-1)/2 comparisons, so the single 500,000-record group accounts for roughly 125 billion comparisons, while the 9,999 groups of 50 records account for only about 12 million.)

Group size can also have an impact on the quality of the matches returned by the plan. Large groups perform more comparisons, so more likely matches are potentially identified. As groups get smaller, fewer comparisons are possible, and the potential for missing good matches increases. The goal of grouping is to optimize performance while minimizing the possibility that valid matches are overlooked because like records are assigned to different groups. When grouping is implemented properly, plan execution speed increases significantly, with no meaningful effect on match quality. Therefore, groups must be defined intelligently through the use of group keys.

Group Keys

Group keys determine which records are assigned to which groups; their selection therefore has a significant effect on the success of matching operations. The selection of group keys, based on key data fields, is critical to ensuring that relevant records are compared against one another. When selecting a group key, two main criteria apply:

● Candidate group keys should represent a logical separation of the data into distinct units, where there is a low probability that matches exist between records in different units. For example, geography is a logical separation criterion when comparing name and address data.
● Candidate group keys should also have high scores in three key areas of data quality: completeness, conformity, and accuracy. This can be determined by profiling the data and uncovering the structure and quality of the content prior to grouping. Problems in these data areas can be improved by standardizing the data before grouping.

A record for a person living in Canada is unlikely to match someone living in Ireland, so the country-identifier field can provide a useful group key. However, if you are working with national data (e.g., Swiss data), duplicate data may exist for an individual living in Geneva who may also be recorded as living in Genf or Geneve. If the group key in this case is based on city name, records for Geneva, Genf, and Geneve will be written to different groups and never compared, unless variant city names are standardized.

Time Window

IDQ can perform millions or billions of comparison operations in a single matching plan. The time available for the completion of a matching plan can have a significant impact on the perception that the plan is running correctly. Knowing the time window for plan completion helps to determine the hardware configuration choices, the grouping strategy, and the IDQ components to employ. Matching plans may need to be tuned to fit within the cycle in which they are run.

Frequency of Execution

The frequency with which plans are executed is linked to the time window available. The more frequently a matching plan is run, the more its execution time has to be considered.

Size of Dataset

In general terms, the size of the dataset does not have as significant an impact on matching performance as the definition of the groups within the plan. However, the larger the dataset, the more time is required to produce a matching plan, both in terms of preparing the data and executing the plan.

IDQ Components

All IDQ components serve specific purposes, and very little functionality is duplicated across the components. However, there are performance implications for certain component types. Several tests have been conducted on IDQ (version 2.11) to test source/sink combinations and various operational components. In tests comparing file-based matching against database matching, file-based matching outperformed database matching in UNIX and Windows environments for plans containing up to 100,000 groups. Matching plans that wrote output to a CSV Sink outperformed plans with a DB Sink or Match Key Sink, and plans with a Mixed Field Matcher component performed more slowly than plans without one. Raw performance should not be the only consideration when selecting components, however; different components serve different needs and may offer advantages in a given scenario.

Match Identification

The method used by IDQ to identify good matches has a significant effect on the success of the plan. Two key methods for assessing matches are:

● deterministic matching
● probabilistic matching

Deterministic matching applies a series of checks to determine whether a match can be found between two records; IDQ's fuzzy matching algorithms can be combined with this method.

For example, a deterministic check may first test whether the last name comparison score is greater than 85 percent. If this is true, it then checks the first name; if a 90 percent match is found on the first name, it next checks the address; and if an 80 percent match is found there, the entire record is considered successfully matched. The advantages of deterministic matching are that: (1) it follows a logical path that can be easily communicated to others, and (2) it is similar to the methods employed when manually checking for matches. The disadvantages are its rigidity and its requirement that each dependency be true. This can result in matches being missed, or can require several different rule checks to cover all likely combinations.

Probabilistic matching takes the match scores from fuzzy matching components and assigns weights to them in order to calculate a weighted average that indicates the degree of similarity between two pieces of information. Weights assigned to individual components can place emphasis on different fields or areas in a record. There are no dependencies on certain data elements matching in order for a full match to be found: even if a heavily-weighted score falls below a defined threshold, scores from less heavily-weighted components may still produce a match. The advantage of probabilistic matching is that it is less rigid than deterministic matching. The disadvantages are a higher degree of tweaking required on the user's part to get the right balance of weights, and the difficulty of assessing the cut-off mark between good and bad matches. This can be difficult for users to understand and communicate to one another.

Close analysis of the match results is required, because of the relationship between match quality and the threshold scores assigned: there may not be a one-to-one mapping between the plan's weighted score and the number of records that can be considered genuine matches. For example, a matching plan score of 95 to 100 percent may correspond to all good matches, but scores between 90 and 94 percent may map to only 85 percent genuine matches, scores between 85 and 89 percent may correspond to only 65 percent genuine matches, and so on.
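The two approaches can be summarized in pseudologic. The thresholds below are taken from the example above, while the weights in the probabilistic version are purely illustrative; in IDQ this logic is configured in the matching components and the Weight Based Analyzer rather than hand-coded.

-- Deterministic: every dependency must hold for the records to match
IIF( LASTNAME_SCORE > 0.85 AND FIRSTNAME_SCORE > 0.90 AND ADDRESS_SCORE > 0.80, 'MATCH', 'NO MATCH' )

-- Probabilistic: a weighted average of the same scores is compared against a single cut-off
IIF( (0.5 * LASTNAME_SCORE) + (0.3 * FIRSTNAME_SCORE) + (0.2 * ADDRESS_SCORE) > 0.85, 'MATCH', 'NO MATCH' )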

Best Practice Operations

The following section outlines best practices for matching with IDQ.

Capturing Client Requirements

Capturing client requirements is key to understanding how successful and relevant your matching plans are likely to be. As a best practice, be sure to answer the following questions, as a minimum, before designing and implementing a matching plan:

● How large is the dataset to be matched?
● How often will the matching plans be executed?
● When will the match process need to be completed?
● Are there any other dependent processes?
● What are the rules for determining a match?
● What process is required to sign off on the quality of match results?
● What processes exist for merging records?

Test Results

Performance tests demonstrate the following:

● IDQ has near-linear scalability in a multi-processor environment.
● Scalability in standard installations, as achieved by allocating matching plans to multiple processors, will eventually level off. If IDQ is integrated with PowerCenter, matching scalability can be achieved using PowerCenter's partitioning capabilities.

IDQ's architecture supports massive scalability by allowing large jobs to be subdivided and executed across several processors. This scalability greatly enhances IDQ's ability to meet the service levels required by users without sacrificing quality or requiring an overly complex solution. Performance is the key to success in high-volume matching solutions.

Managing Group Sizes

As stated earlier, group sizes have a significant effect on the speed of matching plan execution. Also, the quantity of small groups should be minimized to ensure that the greatest number of comparisons is captured. Keep the following parameters in mind when designing a grouping plan:

● Maximum group size: 5,000 records; minimize the number of groups containing more than 5,000 records. (Exception: large datasets of over 2 million records with uniform data.)
● Minimize the number of single-record groups.
● Optimum number of comparisons: approximately 1,500,000 comparisons, plus or minus 20 percent, per 1 million records.

Group Key Identification

Identifying appropriate group keys is essential to the success of a matching plan. Group key selection depends on the type of data in the dataset, for example whether it contains name and address data or other data types such as product codes. Group keys act as a first pass, or high-level summary, of the shape of the dataset(s). In cases where the datasets are large, multiple group keys may be required to segment the data so that the best practice guidelines are followed. Ideally, any dataset that is about to be matched has been profiled and standardized to identify candidate keys. Remember that only data records within a given group are compared with one another; as a best practice, it is vital to select group keys that have high data quality scores for completeness, consistency, conformity, and accuracy. Informatica Corporation can provide sample grouping plans that automate these requirements as far as is practicable.

Hardware Specifications

Matching is a resource-intensive operation. Three key variables determine the effect of hardware on a matching plan: processor speed, disk performance, and memory.

The majority of the activity required in matching is tied to the processor, so processor speed has a significant effect on how fast a matching plan completes. Although the average computational speed for IDQ is one million comparisons per minute, the speed can range from as low as 250,000 to 6.5 million comparisons per minute, depending on the hardware specification. Higher-specification processors (e.g., 2.5 GHz minimum) should be used for high-volume matching plans.

Hard disk capacity and available memory can also determine how fast a plan completes. The hard disk reads and writes the data required by IDQ sources and sinks. Information that cannot be stored in memory during plan execution must be temporarily written to the hard disk; this increases the time required to retrieve information that otherwise could be held in memory, and also increases the load on the disk. The speed of the disk and its level of defragmentation affect how quickly data can be read from, and written to, the hard disk. A RAID drive may be appropriate for datasets of 3 to 4 million records, and a minimum of 512MB of memory should be available.

The following is a rough guide for hardware estimates based on IDQ Runtime on Windows platforms; specifications for UNIX-based systems vary.

● Fewer than 1,500,000 records: 1.5 GHz computer, 512MB RAM
● 1,500,000 to 3 million records: multi-processor server, 1GB RAM
● More than 3 million records: multi-processor server, 2GB RAM, RAID 5 hard disk

Single Processor vs. Multi-Processor

With IDQ Runtime, it is possible to run multiple processes in parallel. Matching plans, whether file-based or database-based, can be split into multiple plans to take advantage of multiple processors on a server. Be aware, however, that this requires additional effort to create the groups and consolidate the match output. As a result, matching plans split across four processors do not run four times faster than a single-processor matching plan, and multi-processor matching may not significantly improve performance in every case; results also depend on factors such as background processes running and the components used.

Using IDQ with PowerCenter and taking advantage of PowerCenter's partitioning capabilities may also improve throughput. This approach has the advantage that splitting plans into multiple independent plans is not typically required.

The following estimates can help in comparing the execution time of single- and multi-processor match plans:

● Standardization/grouping: single processor time depends on the operations and the size of the data set (time = Y); multi-processor time is the single-processor time plus 20 percent (time = Y * 1.20).
● Matching: single processor time assumes an estimated 1 million comparisons a minute (time = X); multi-processor time is the single-processor matching time divided by the number of processors (NP), plus 25 percent (time = (X / NP) * 1.25).

For example, if a single-processor plan takes one hour to group and standardize the data and eight hours to match, a four-processor match plan should require approximately one hour and 20 minutes to group and standardize and two and a half hours to match. The time difference between the single- and multi-processor plans in this case would be more than five hours (i.e., nine hours for the single-processor plan versus three hours and 50 minutes for the quad-processor plan).

Deterministic vs. Probabilistic Comparisons

No best-practice research has yet been completed on which type of comparison is most effective at determining a match; each method has strengths and weaknesses, and with regard to selecting one method or the other there are no blanket recommendations, since the choice is largely defined by requirements. Bear in mind that IDQ supports deterministic matching operations only; IDQ's Weight Based Analyzer component lets plan designers calculate weighted match scores for matched fields. A 2006 article by Forrester Research stated a preference for deterministic comparisons, since they remove the burden of identifying a universal match threshold from the user.

Database vs. File-Based Matching

File-based matching and database matching perform essentially the same operations. The major differences between the two methods revolve around how data is stored and how the outputs can be manipulated after matching is complete. Each method has strengths and weaknesses:

● Ease of implementation: the file-based method is easy to implement; the database method requires SQL knowledge.
● Performance: the file-based method is the fastest; the database method is slower.
● Space utilization: the file-based method requires more hard-disk space; the database method has a lower hard-disk space requirement.
● Operating system restrictions: the file-based method has a possible limit to the number of groups that can be created; the database method has none.
● Ability to control and manipulate output: low for the file-based method; high for the database method.

High-Volume Data Matching Techniques

This section discusses the challenges facing IDQ matching plan designers in optimizing their plans for speed of execution and quality of results. It highlights the key factors affecting matching performance and discusses the results of IDQ performance testing in single- and multi-processor environments.

Checking for duplicate records where no clear connection exists among data elements is a resource-intensive activity. In order to detect matching information, a record must be compared against every other record in a dataset. Therefore, the quantity of comparisons required to check an entire dataset increases geometrically as the volume of data increases. A similar situation arises when matching between two datasets, where the number of comparisons required is a multiple of the volumes of data in each dataset. When the volume of data increases into the tens of millions, the number of comparisons required to identify matches, and consequently the amount of time required to check for matches, reaches impractical levels.

Approaches to High-Volume Matching

Two key factors control the time it takes to match a dataset:

● The number of comparisons required to check the data.
● The number of comparisons that can be performed per minute.

The first factor can be controlled in IDQ through grouping, which involves logically segmenting the dataset into distinct elements, or groups, so that there is a high probability that records within a group are not duplicates of records outside of the group. Grouping data greatly reduces the total number of required comparisons without affecting match accuracy.

IDQ affects the number of comparisons per minute in two ways:

● Its matching components maximize the comparison activities assigned to the computer processor. This reduces the amount of disk I/O communication in the system and increases the number of comparisons per minute.
● Its architecture allows matching tasks to be broken into smaller tasks and shared across multiple processors.

The use of multiple processors to handle matching operations greatly enhances IDQ scalability with regard to high-volume matching problems. Also, hardware with higher processor speeds delivers higher match throughputs. The following section outlines how a multi-processor matching solution can be implemented and illustrates the results obtained in Informatica Corporation testing.

Multi-Processor Matching: Solution Overview

IDQ does not automatically distribute its load across multiple processors. To scale a matching plan to take advantage of a multi-processor environment, the plan designer must develop multiple plans for execution in parallel. To develop this solution, the plan designer first groups the data to prevent the plan from running low-probability comparisons. Groups are then subdivided into one or more subgroups (the number of subgroups depends on the plan being run and the number of processors in use), and each subgroup is assigned to a discrete matching plan.

The plans are then executed in parallel. The following diagram outlines how multi-processor matching can be implemented in a database model. Source data is first grouped and then subgrouped according to the number of processors available to the job. Each subgroup of data is loaded into a separate staging area, and the discrete match plans are run in parallel against each table. Results from each plan are consolidated to generate a single match result for the original source data.

Informatica Corporation Match Plan Tests

Informatica Corporation performed match plan tests on a 2GHz Intel Xeon dual-processor server running Windows 2003 (Server edition). Two gigabytes of RAM were available, and the hyper-threading ability of the Xeon processors effectively provided four CPUs on which to run the tests. The tests were performed on one million rows of data; grouping of the data limited the total number of comparisons to approximately 500,000,000. Several tests were performed using file-based and database-based matching methods and single- and multiple-processor methods.

Test results using both file-based and database-based methods showed near-linear scalability as the number of available processors increased. However, as the number of processors increased, so too did the demand on disk I/O resources; as processor capacity scaled upward, disk I/O in this configuration eventually limited the benefits of adding further processor capacity. This is demonstrated in the graph below.
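A minimal sketch of the subgrouping step described above, assuming the grouped records have been staged in a relational table with a numeric GROUP_ID column (all object names here are hypothetical, and modulo syntax varies by database). Keeping whole groups together is one straightforward way to split the work without separating records that share a group:

-- One staging table per processor (four in this example); every record in a group lands in the same table
INSERT INTO MATCH_STAGE_0 SELECT * FROM GROUPED_SOURCE WHERE MOD(GROUP_ID, 4) = 0;
INSERT INTO MATCH_STAGE_1 SELECT * FROM GROUPED_SOURCE WHERE MOD(GROUP_ID, 4) = 1;
INSERT INTO MATCH_STAGE_2 SELECT * FROM GROUPED_SOURCE WHERE MOD(GROUP_ID, 4) = 2;
INSERT INTO MATCH_STAGE_3 SELECT * FROM GROUPED_SOURCE WHERE MOD(GROUP_ID, 4) = 3;

Each staging table is then matched by its own plan, and the result sets are consolidated into a single match result.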

Execution times for multiple processors were based on the longest execution time of the jobs run in parallel. Therefore, having an even distribution of records across all processors was important to maintaining scalability. When the data was not evenly distributed, some match plans ran longer than others, and the benefits of scaling over multiple processors were not as evident.

Last updated: 26-May-08 17:52

Effective Data Standardizing Techniques

Challenge

To enable users to streamline their data cleansing and standardization processes (or plans) with Informatica Data Quality (IDQ). The intent is to shorten development timelines and ensure a consistent and methodological approach to cleansing and standardizing project data.

Description

Data cleansing refers to operations that remove non-relevant information and "noise" from the content of the data. Examples of cleansing operations include the removal of person names, "care of" information, excess character spaces, or punctuation from postal addresses. Data standardization refers to operations that modify the appearance of the data, so that it takes on a more uniform structure, and that enrich the data by deriving additional details from existing content.

Cleansing and Standardization Operations

Data can be transformed into a "standard" format appropriate for its business type. This is typically performed on complex data types such as name and address or product data. A data standardization operation typically profiles data by type (e.g., word, number, code) and parses data strings into discrete components. This reveals the content of the elements within the data as well as standardizing the data itself.

Within IDQ, the Profile Standardizer is a powerful tool for parsing unsorted data into the correct fields. However, when using the Profile Standardizer, be aware that there is a finite number of profiles (500) that can be contained within a cleansing plan. Users can extend the number of profiles by using the first 500 profiles within one component and then feeding the data overflow into a second Profile Standardizer via the Token Parser component.

For best results, the Data Quality Developer should carry out these steps in consultation with a member of the business, the person who best understands the nature of the data within the business scenario. Often, this individual is the data steward.

After the data is parsed and labeled, it should be evident whether reference dictionaries will be needed to further standardize the data. It may take several iterations of dictionary construction and review before the data is standardized to an acceptable level. Once acceptable standardization has been achieved, data quality scorecard or dashboard reporting can be introduced. For information on dashboard reporting, see the Report Viewer chapter of the Informatica Data Quality 3.1 User Guide.

Discovering Business Rules

At this point, the business user may discover and define business rules applicable to the data. These rules should be documented and converted to logic that can be contained within a data quality plan. If there are rules that do not lend themselves easily to regular IDQ components (for example, when standardizing product data information), it may be necessary to perform some custom scripting using IDQ's Scripting component. This requirement may arise when a string, or an element within a string, needs to be treated as an array. When building a data quality plan, be sure to group related business rules together in a single rules component whenever possible; otherwise the plan may become very difficult to read.

Common Issues when Cleansing and Standardizing Data

If the customer has expectations of a bureau-style service, it may be advisable to re-emphasize the score-carding and graded-data approach to cleansing and standardizing. This helps to ensure that the customer develops reasonable expectations of what can be achieved with the data set within an agreed-upon timeframe.

Standard and Third-Party Reference Data

Reference data can be a useful tool when standardizing data. IDQ installs with several reference dictionary files that cover common name and address and business terms. Terms with variant formats or spellings can be standardized to a single form. The illustration below shows part of a dictionary of street address suffixes.
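The original illustration is a screenshot; as a hypothetical stand-in (the entries shown are illustrative examples, not the shipped dictionary content), each line of such a dictionary carries the standard form followed by the variants that should be standardized to it:

STREET,ST,STR
AVENUE,AVE,AV
BOULEVARD,BLVD,BOUL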

IDQ also provides several components that focus on verifying and correcting the accuracy of name and postal address data. These components leverage address reference data that originates from national postal carriers such as the United States Postal Service. Several types of reference data, with differing levels of address granularity, are available from Informatica; such datasets enable IDQ to validate an address to premise level. Remember that IDQ installs with a large set of reference dictionaries, and additional dictionaries are available from Informatica. Please note that these reference datasets are licensed and installed as discrete Informatica products, and thus it is important to discuss their inclusion in the project with the business in advance so as to avoid budget and installation issues. Pricing for the licensing of these components may vary and should be discussed with the Informatica Account Manager.

Standardizing Ambiguous Data

Data values can often appear ambiguous, particularly in name and address data where name, address, and premise values can be interchangeable. For example, "ST" can be a suffix for Street or a prefix for Saint, and sometimes both occur in the same string; Hill, Park, and Church are all common surnames as well as address terms. The address string "St Patrick's Church, Main St" can reasonably be interpreted as "Saint Patrick's Church, Main Street." In this case, the position of the value is important. You may need to write business rules using the IDQ Scripting component, as you are treating the string as an array. If the delimiter is a space (thus ignoring any commas and periods), the string has five tokens: St at position 1 within the string would be standardized to one meaning, whereas St at position 5 would be standardized to another. Each data value can then be compared to a discrete prefix and suffix dictionary.

When data arrives in multiple languages, it is worth creating similar IDQ plans for each country and applying the same rules across these plans. The data would typically be staged in a database, and the plans developed using a SQL statement as input, for example with a "where country_code = 'DE'" clause. Country dictionaries are identifiable by country code to facilitate such statements.

Conclusion

Using the data cleansing and standardization techniques described in this Best Practice can help an organization to recognize the value of incorporating IDQ into its development methodology. Because data quality is an iterative process, the business rules initially developed may require ongoing modification, as the results produced by IDQ will be affected by the starting condition of the data and the requirements of the business users.

Last updated: 01-Feb-07 18:52

Integrating Data Quality Plans with PowerCenter

Challenge

This Best Practice outlines the steps to integrate an Informatica Data Quality (IDQ) plan into a PowerCenter mapping. This document assumes that the appropriate setup and configuration of IDQ and PowerCenter have been completed as part of the software installation process; those steps are not included here.

Description

Preparing IDQ Plans for PowerCenter Integration

IDQ plans are typically developed and tested by executing them from Workbench. Plans running locally from Workbench can use any of the available IDQ Source and Sink components. This is not true for plans that are integrated into PowerCenter: they can only use Source and Sink components that contain the "Enable Real-time processing" check box. Specifically, those components are CSV Source, CSV Sink, CSV Match Source, and CSV Match Sink. In addition, the Real-time Source and Sink can be used; however, they require additional setup, as each field name and length must be defined.

Consider the following points when developing a plan for integration in PowerCenter:

● Use only the necessary fields as input to each mapping plan. PowerCenter only sees the input and output ports of the IDQ plan from within the PowerCenter mapping; this is driven by the input file used for the Workbench plan and the fields selected as output in the sink. If you are working with an input file that has 50 fields and you only need 10 fields for the IDQ plan, create a file that contains only the necessary field names, save it as a comma-delimited file, and point to that newly created file from the source of the IDQ plan. This changes the input field reference to only those fields that must be visible in the PowerCenter integration.
● For reusability of IDQ plans, use generic naming conventions for the input and output ports. For example, rather than naming fields Customer address1, customer address2, customer city, and so on, name them address1, address2, city, etc. Thus, if the same standardization and cleansing is needed by multiple sources, you can integrate the same IDQ plan, which reduces development time as well as ongoing maintenance.
● The delimiter of the Source and Sink must be a comma for integrated IDQ plans. Other delimiters, such as Pipe, will cause an error within the PowerCenter Designer. If you encounter this error, go back to Workbench, change the delimiter to comma, save the plan, and then return to PowerCenter Designer and perform the import of the plan again.
● If the IDQ plan was developed using a database source and/or sink, you must replace them with either CSV Sink/Source or CSV Match Sink/Source, depending on the functionality you are replacing. Database sources and sinks are not allowed in PowerCenter integration.
● If the IDQ plan was developed using a group sink/source (or dual group sink), you must replace them with CSV Sink/Source or CSV Match Sink/Source. When replacing a group sink, you must also add functionality to the PowerCenter mapping to replicate the grouping. This is done by placing a join and sort prior to the IDQ plan containing the match.
● PowerCenter integration does not allow input ports to be selected as output if the IDQ transformation is defined as a passive transformation. If you do not see a field after the plan is integrated in PowerCenter, it means the field is not in the input file or was not selected as output. If the IDQ transformation is configured as active, this is not an issue, as you must select all fields needed as output from the IDQ transformation within the sink transformation of the IDQ plan.
● To enable integration, the source and sink need to be enabled by setting the Enable Real-time processing option on them.
Passive and active IDQ transformations follow the general restrictions and rules for active and passive transformations in PowerCenter.

● Once the source and sink are converted to real time, you cannot run the plan within Workbench. However, you may change the check box at any time to revert to standalone processing. Be careful not to refresh the IDQ plan in the mapping within PowerCenter while real time is not enabled; if you do so, the PowerCenter mapping will display an error message and will not allow that mapping to be integrated until the Runtime enable is active again.

Integrating IDQ Plans into PowerCenter Mappings

After the IDQ plans are converted to real-time enabled, they are ready to integrate into a PowerCenter mapping. Integrating into PowerCenter requires proper installation and configuration of the IDQ/PowerCenter integration, including:

● Making appropriate changes to environment variables (to .profile for UNIX)
● Installing IDQ on the PowerCenter server
● Running the IDQ Integration and Content install on the server
● Registering the IDQ plug-in via the PowerCenter Admin console (Note: the plug-in must be registered in each repository from which an IDQ transformation is to be developed)
● Installing IDQ Workbench on the workstation
● Installing IDQ Integration and Content on the workstation using the PowerCenter Designer

When all of the above steps are executed correctly, the IDQ transformation icon, shown below, is visible in the PowerCenter repository. To integrate an IDQ plan, open the mapping and click on the IDQ icon, then click in the mapping workspace to insert the transformation into the mapping. The following dialog box appears:

Select Active or Passive, as appropriate. Typically, an active transformation is necessary only for a matching plan. If selecting Active, the IDQ plan input needs to have all input fields passed through, as typical PowerCenter

rules apply to Active and Passive transformation processing.

Double-click on the title bar for the IDQ transformation to open it for editing. As the following figure illustrates, the IDQ transformation is "empty" in its initial, un-configured state. Notice that all ports are currently blank; they will be populated upon import/integration of the IDQ plan.

Then select the far right tab, "Configuration". Click the Connect button to establish a connection to the appropriate IDQ repository. When first integrating an IDQ plan, the connection and repository displays are blank. In the Host Name box, specify the name of the computer on which the IDQ repository is installed; this is usually the PowerCenter server. If the default Port Number (3306) was changed during installation, specify the correct value. Next, click Test Connection.

Note: In some cases, if the User Name has not been granted privileges on the Host server, you will not be allowed to connect. The procedure for granting privileges to the IDQ (MySQL) repository is explained at the end of this document.

When the connection is established, click the down arrow to the right of the Plan Name box, and the following dialog is displayed:

Browse to the plan you want to import, then click on the Validate button. If the plan is valid for PowerCenter integration, the following dialog box appears. If there is an error in the plan, for example if the Source and Sink have not been configured correctly, the following dialog is displayed instead.

After a valid plan has been configured, the PowerCenter ports (equivalent to the IDQ Source and Sink fields) are visible and can be connected just like those of any other PowerCenter transformation.

Refreshing IDQ Plans for PowerCenter Integration

When you save an IDQ plan, it is saved in the MySQL repository. When you integrate that plan into PowerCenter, a copy of the plan is then integrated into the PowerCenter metadata. After Data Quality plans are integrated in PowerCenter, the MySQL repository and the PowerCenter repository do not communicate updates automatically; changes made to the IDQ plan in Workbench are not reflected in the PowerCenter mapping until the plan is manually refreshed in the PowerCenter mapping. The following steps detail the process for refreshing integrated IDQ plans when necessary to reflect changes made in Workbench:

● Double-click on the IDQ transformation in the PowerCenter mapping.
● Select the Configurations tab.
● Select Refresh. This reads the current version of the plan and refreshes it within PowerCenter. If any PowerCenter-specific errors were created when the plan was modified, an error dialog is displayed.
● Select Apply.

● Update input, output, and pass-through ports as necessary, test the changes, and then save the mapping in PowerCenter.

Saving IDQ Plans to the Appropriate Repository: MySQL Permissions

Plans that are to be integrated into PowerCenter mappings must be saved to an IDQ repository that is visible to the PowerCenter Designer prior to integration. The usual practice is to save the plan to the IDQ repository located on the PowerCenter server. In order for a Workbench client to save a plan to that repository, the client machine must be granted permissions to the MySQL repository on the server. If the client machine has not been granted access, the client receives an error message when attempting to access the server repository. The person at your organization who has login rights to the server on which IDQ is installed needs to perform this task for all users who will need to save or retrieve plans from the IDQ Server. The procedure is detailed below.

● Identify the IP address of any client machine that needs to be granted access.
● Log in to the server on which the MySQL repository is located and log in to MySQL:

mysql -u root

● For a user to connect to the IDQ server and save and retrieve plans, enter the following command:

grant all privileges on *.* to 'admin'@'<idq_client_ip>'

● For a user to integrate an IDQ plan into PowerCenter, grant the following privilege:

grant all privileges on *.* to 'root'@'<powercenter_client_ip>'

Last updated: 20-May-08 23:18

Managing Internal and External Reference Data

Challenge

To provide guidelines for the development and management of the reference data sources that can be used with data quality plans in Informatica Data Quality (IDQ). The goal is to ensure the smooth transition from development to production for reference data files and the plans with which they are associated.

Description

Reference data files can be used by a plan to verify or enhance the accuracy of the data inputs to the plan. A reference data file is a list of verified-correct terms and, where appropriate, acceptable variants on those terms. It may be a list of employees, package measurements, or valid postal addresses: any data set that provides an objective reference against which project data sources can be checked or corrected. Reference files are essential to some, but not all, data quality processes.

Reference data can be stored in a file format recognizable to Informatica Data Quality or in a format that requires intermediary (third-party) software in order to be read by Informatica applications. Reference data can be internal or external in origin.

Internal data is specific to a particular project or client and is typically generated from internal company information. These forms of data may or may not be part of the project source data, and they may be stored in different parts of the organization. Internal data may be custom-built for the project; internal data files are typically saved in the dictionary file format or as delimited text files, as they are often created specifically for data quality projects. Databases can also be used as a source for internal data.

External data has been sourced or purchased from outside the organization. It is used when authoritative, independently-verified data is needed to provide the desired level of data quality to a particular aspect of the source data. Examples include the dictionary files that install with IDQ, postal address data sets that have been verified as current and complete by a national postal carrier such as the United States Postal Service, or company registration and identification information from an industry-standard source such as Dun & Bradstreet. External files are more likely to remain in their original format.

Working with Internal Data

Obtaining Reference Data

Most organizations already possess much information that can be used as reference data, for example employee tax numbers or customer names.

The question arises: are internal data sources sufficiently reliable for use as reference? Bear in mind that in some cases the reference data does not need to be 100 percent accurate. It can be good enough to compare project data against reference data and to flag inconsistencies between them, particularly in cases where both sets of data are highly unlikely to share common errors.

Saving the Data in .DIC File Format

IDQ installs with a set of reference dictionaries that have been created to handle many types of business data. These dictionaries use a proprietary .DIC file name extension (DIC is abbreviated from dictionary), and dictionary files are essentially comma-delimited text files. You can create a new dictionary in three ways:

● You can save an appropriately formatted delimited file as a .DIC file into the Dictionaries folders of your IDQ (client or server) installation.
● You can use the Dictionary Manager within Data Quality Workbench. This method allows you to create text and database dictionaries.
● You can write from plan files directly to a dictionary using the IDQ Report Viewer (see below).

The figure below shows a dictionary file open in IDQ Workbench and its underlying .DIC file open in a text editor. Note that the dictionary file has at least two columns of data. The Label column contains the correct or standardized form of each datum from the dictionary's perspective. The Item columns contain versions of each datum that the dictionary recognizes as identical to or coterminous with the Label entry. A dictionary can have multiple Item columns; therefore, each datum in the dictionary must have at least two entries in the DIC file (see the text editor illustration below).

To edit a dictionary value, open the DIC file and make your changes. You can make changes either through a text editor or by opening the dictionary in the Dictionary Manager.
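A hypothetical .DIC fragment, following the Label-plus-Item layout described above (the values are illustrative only), might look like this in a text editor:

MR,MR.,MISTER
MRS,MRS.,MISSUS
DR,DR.,DOCTOR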

To add a value to a dictionary, open the DIC file in Dictionary Manager, place the cursor in an empty row, and add a Label string and at least one Item string. You can also add values in a text editor by placing the cursor on a new line and typing Label and Item values separated by commas. Once saved, the dictionary is ready for use in IDQ.

Note: IDQ users with database expertise can create and specify dictionaries that are linked to database tables, and that thus can be updated dynamically when the underlying data is updated. Database dictionaries are useful when the reference data has been originated for other purposes and is likely to change independently of data quality work. By making use of a dynamic connection, data quality plans can always point to the current version of the reference data.

If modifications are to be made to the versions of dictionary files installed by the software, it is recommended that these modifications be made to a copy of the original file, renamed or relocated as desired. This approach avoids the risk that a subsequent installation might overwrite the changes.

Sharing Reference Data Across the Organization

Just as you can publish or export plans from a local Data Quality repository to server repositories, so you can copy dictionaries across the network. The File Manager within IDQ Workbench provides an Explorer-like mechanism for moving files to other machines across the network. This is most relevant when you publish or export a plan to another machine on the network. You must ensure that copies of any dictionary files used in the local plan are available in a suitable location on the service domain - in the user space on the server, or at a location in the server's Dictionaries folders that corresponds to the dictionaries' location on Workbench - when the plan is copied to the server-side repository.

Bear in mind that Data Quality looks for .DIC files in pre-set locations within the IDQ installation when running a plan. By default, Data Quality relies on dictionaries being located in the following locations:
● The Dictionaries folders installed with Workbench and Server.
● The user's file space in the Data Quality service domain.

Data Quality looks only in the locations set in the config.xml file. IDQ does not recognize a dictionary file that is not in such a location, even if you can browse to the file when designing the data quality plan. Thus, any plan that uses a dictionary in a non-standard location will fail.

Note: You can change the locations in which IDQ looks for plan dictionaries by editing the config.xml file. However, this is the master configuration file for the product and you should not edit it without consulting Informatica Support.

Version Controlling Updates and Managing Rollout from Development to Production

Plans can be version-controlled during development in Workbench and when published to a domain repository. You can create and annotate multiple versions of a plan, and review or roll back to earlier versions when necessary. Dictionary files, however, are not version controlled by IDQ. You should define a process to log changes and back up your dictionaries, using version control software if possible or a manual method otherwise. Database reference data can also be version controlled, although this presents difficulties if the database is very large in size.

Working with External Data

Formatting Data into Dictionary Format

External data may or may not permit the copying of data into text format. For example, external data may be contained in a database or in library files, and IDQ leverages software from the vendor to read these files. (The third-party software has a very small footprint.) However, some software files can be amenable to data extraction to file. Currently, third-party postal address validation data is provided to Informatica users in this manner.

Bear in mind that third-party reference data, such as postal address data, should not ordinarily be changed, and so the need for a versioning strategy for these files is debatable.

Obtaining Updates for External Reference Data

External data vendors produce regular data updates. If you obtained third-party data through Informatica, you will be kept up to date with the latest data as it becomes available for as long as your data subscription warrants. You can check that you possess the latest versions of third-party data by contacting your Informatica Account Manager. The key advantage of external data - its reliability - is lost if you do not apply the latest files from the vendor. Bear in mind that postal address data vendors update their offerings every two or three months, that a significant percentage of postal addresses can change in such time periods, and that it is therefore vital to refresh your external reference data when updates become available.

Managing Reference Updates and Rolling Out Across the Organization

If your organization has a reference data subscription, you will receive either regular data files on compact disc or regular information on how to download data from Informatica or vendor web sites. You must develop a strategy for distributing these updates to all parties who run plans with the external data. This may involve installing the data on machines in a service domain. Depending on the number of IDQ installations that must be updated, updating your organization with third-party reference data can be a sizable task. You should plan for the task of obtaining and distributing updates in your organization at frequent intervals.

Strategies for Managing Internal and External Reference Data

Experience working with reference data leads to a series of best practice tips for creating and managing reference data files.

Using Workbench to Build Dictionaries

With IDQ Workbench, you can select data fields or columns from a dataset and save them in a dictionary-compatible format.

For example, let's say you have designed a data quality plan that identifies invalid or anomalous records in a customer database. Using IDQ, you can create an exception file of these bad records, and subsequently use this file to create a dictionary-compatible file. Say the exception file contains suspect or invalid customer account records. Using a very simple data quality plan, you can quickly parse the account numbers from this file to create a new text file containing the account serial numbers only. This file effectively constitutes the Labels column of your dictionary. By opening this file in Microsoft Excel or a comparable program, copying the contents of Column A into Column B, and then saving the spreadsheet as a CSV file, you create a file with Label and Item1 columns. Rename the file with a .DIC suffix and add it to the Dictionaries folder of your IDQ installation: the dictionary is now visible to the IDQ Dictionary Manager. You now have a dictionary file of bad account numbers that you can use in any plans checking the validity of the organization's account records.

Using Report Viewer to Build Dictionaries

The IDQ Report Viewer allows you to create exception files and dictionaries on-the-fly from report data. The figure below illustrates how you can drill down into report data, right-click on a column, and save the column data as a dictionary file. This file will be populated with Label and Item1 entries corresponding to the column data. You can also append data to an existing dictionary file in this manner. In this case, the dictionary created is a list of serial numbers from invalid customer records (specifically, records containing bad zip codes). The plan designer can now create plans to check customer databases against these serial numbers.

As a general rule, it is a best practice to follow the dictionary organization structure installed by the application, adding to that structure as necessary to accommodate specialized and supplemental dictionaries. Subsequent users are then relieved of the need to examine the config.xml file for possible modifications, thereby lowering the risk of accidental errors during migration. When following the original dictionary organization structure is not practical or contravenes other requirements, take care to document the customizations.
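As an alternative to the spreadsheet steps in the bad-account-number example above, the same two-column Label/Item structure can be produced directly with SQL when the exception records are staged in a database table. The query below is an illustrative sketch only; the table and column names are assumptions, not part of IDQ or this methodology.

    -- Emit each suspect account number twice, as Label and as Item1,
    -- matching the two-column .DIC layout described earlier.
    SELECT DISTINCT account_number AS label,
                    account_number AS item1
    FROM   account_exceptions
    WHERE  exception_type = 'INVALID_ACCOUNT';

Saving the result set as a comma-delimited file and renaming it with a .DIC suffix yields the same dictionary of bad account numbers.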

Since external data may be obtained from third parties and may not be in file format, the most efficient way to share its content across the organization is to locate it on the Data Quality Server machine. (Specifically, this is the machine that hosts the Execution Service.)

Moving Dictionary Files After IDQ Plans are Built

This is a similar issue to that of sharing reference data across the organization. If you must move or relocate your reference data files after plan development, you have three options:
● You can reset the location to which IDQ looks by default for dictionary files.
● You can reconfigure the plan components that employ the dictionaries to point to the new location. Depending on the complexity of the plan concerned, this can be very labor-intensive.
● If deploying plans in a batch or scheduled task, you can append the new location to the plan execution command. You can do this by appending a parameter file to the plan execution instructions on the command line. The parameter file is an XML file that can contain a simple command to use one file path instead of another.

Last updated: 08-Feb-07 17:09

Real-Time Matching Using PowerCenter

Challenge

This Best Practice describes the rationale for matching in real-time, along with the concepts and strategies used in planning for and developing a real-time matching solution. It also provides step-by-step instructions on how to build this process using Informatica's PowerCenter and Data Quality.

Description

The cheapest and most effective way to eliminate duplicate records from a system is to prevent them from ever being entered in the first place. Benefits of preventing duplicate records include:
● Better ability to service customers, prospects, and suppliers, with the most accurate and complete information readily available
● Reduced risk of fraud or over-exposure
● Trusted information at the source
● Less effort in BI, data warehouse, and/or migration projects

Performing effective real-time matching involves multiple puzzle pieces:
1. There is a master data set (or possibly multiple master data sets) that contains clean and unique customers, products, and/or many other types of data.
2. To interact with the master data set, there is an incoming transaction, typically thought to be a new item; this is anything that is assumed to be new and intended to be added to master. This transaction can be anything from a new customer signing up on the web to a list of new products, changes captured from a database, EDI feeds, messages on a queue, an application entry, or other common data feeds.
3. There must be a process to determine if a "new" item really is new or if it already exists within the master data set.

In a perfect world of consistent IDs, spellings, and representations of data across all companies and systems, checking for duplicates would simply be some sort of exact lookup into the master to see if the item already exists. Unfortunately, this is not the case, and even being creative and using %LIKE% syntax does not provide thorough results; comparing Bob to Robert or GRN to Green requires a more sophisticated approach.

Standardizing Data in Advance of Matching

The first prerequisite for successful matching is to cleanse and standardize the master data set. This process requires well-defined rules for important attributes. Applying these rules to the data should result in complete, conformant, consistent, valid data - which really means trusted data. These rules should also be reusable so they can be applied to the incoming transaction data prior to matching. Standardizing incoming records in the same way and then matching them against the master data that already exists allows only the new, unique records to be added. The more compromises made in the quality of master data by failing to cleanse and standardize, the more effort will need to be put into the matching logic, the more chances there will be of missed matches allowing duplicates to enter the system, and the less value the organization will derive from it.

Once the master data is cleansed, the next step is to develop criteria for candidate selection. For efficient matching, there is no need to compare records that are so dissimilar that they cannot meet the business rules for matching. For example, when matching consumer data on name and address, it may be sensible to limit candidate pull records to those having the same zip code and the same first letter of the last name, because we can reason that if those elements are different between two records, those two records will not match.

On the other hand, the set of candidates must be sufficiently broad to minimize the chance that similar records will not be compared.

The full real-time match process flow includes:
1. The input record coming into the server.
2. The server then standardizes the incoming record and retrieves candidate records from the master data source that could match the incoming record. The returned result set consists of the candidates that are potential matches to the incoming record.
3. Match pairs are then generated, consisting of the incoming record and the candidate, one for each candidate.
4. The match pairs then go through the matching logic, resulting in a match score.
5. Records with a match score below a given threshold are discarded.
6. The remaining potential matches are output or displayed.

Once the candidate selection process is resolved, the matching logic can be developed. This can consist of matching one to many elements of the input record to each candidate pulled from the master. Once the data is compared, each pair of records, one input and one candidate, will have a match score or a series of match scores.

Developing an Effective Candidate Selection Strategy

Determining which records from the master should be compared with the incoming record is a critical decision in an effective real-time matching system. For most organizations it is not realistic to match an incoming record to all master records. Consider even a modest customer master data set with one million records: the amount of processing, and thus the wait in real time, would be unacceptable. The goal of candidate selection is to select only that subset of the records from the master that are definitively related by a field, part of a field, or combination of multiple parts/fields. The selection is done using a candidate key or group key. Candidate selection for real-time matching is synonymous with grouping or blocking for batch matching.

What specific data elements the candidate key should consist of very much depends on the scenario and the match rules. Ideally this key would be constructed and stored in an indexed field within the master table(s), allowing for the quickest retrieval. The one common theme with candidate keys is that the data elements used should have the highest levels of completeness and validity possible. It is also best to use elements that can be verified as valid, such as a postal code or a National ID.

There also may be cases where multiple candidate sets are needed. This would be the case if there are multiple sets of match rules that the two records will be compared against. Adding to the previous example, think of matching on name and address for one set of match rules and name and phone for a second. This second rule would require selecting records from the master that have the same phone number and first letter of the last name. There are many instances where multiple keys are used to allow for one key to be missing or different.
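To make the group key idea concrete, the sketch below shows how the zip-code-plus-last-initial strategy mentioned earlier could be issued against a relational customer master. It is an illustrative example only; the table, column, and key names are assumptions rather than part of the Velocity scenario.

    -- group_key is assumed to be an indexed column populated at load time with the
    -- first 3 characters of the zip code plus the first letter of the standardized
    -- last name. The incoming record's key is computed the same way and bound as
    -- the single parameter.
    SELECT cust_id, first_name, last_name, address1, city, state, zip, phone
    FROM   customer_master
    WHERE  group_key = ?

Because the lookup is a single probe on an indexed column, the candidate pull remains fast even against a master table of millions of rows, which is what makes the response-time targets discussed below achievable.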

For acceptable two- to three-second response times, candidate record counts should be kept under 5,000 records; for sub-second response times, the ideal size of the candidate record sets should be under 300 records.

The table below lists multiple common matching elements and how group keys could be used around the data. [Table 1: common matching elements and candidate key strategies - not reproduced here.]

Step by Step Development

The following instructions further explain the steps for building a solution to real-time matching using the Informatica suite. They involve the following applications:
● Informatica PowerCenter 8.1.1 - utilizing Web Services Hub
● Informatica Data Quality 8.5 SP1 - utilizing North American Country Pack
● Informatica Data Explorer 5.0 SP4
● SQL Server 2000

Scenario:
● A customer master file is provided with the following structure. [Figure: customer master file structure - not reproduced.]
● In this scenario, we are performing a name and address match. Because address is part of the match, we will use the recommended address grouping strategy for our candidate key (see Table 1).
● The desire is that different applications from the business will be able to make a web service call to determine if the data entry represents a new customer or an existing customer.

Solution:

1. The first step is to analyze the customer master file. Assume that this analysis shows the postcode field is complete for all records and the majority of it is of high accuracy. Assume also that neither the first name nor the last name field is completely populated; thus the match rules must account for blank names. Although the data structure indicates names are already parsed into first, middle, and last, assume there are examples where the names are not properly fielded.

2. The next step is to load the customer master file into the database. Below is a list of tasks that should be implemented in the mapping that loads the customer master data into the database:
● Standardize and validate the address, outputting the discrete address components such as house number, street name, directional, street type, and suite number. (Pre-built mapplet to do this, country pack)
● Parse the name field into individual fields. Also remember to output a value to handle nicknames. (Pre-built mapplet to do this, country pack)
● Standardize the phone number. (Pre-built mapplet to do this, country pack)
● Generate the candidate key field, populate it with the selected strategy (assume it is the first 3 characters of the zip, the house number, and the first character of the street name), and generate an index on that field. (Expression, output of previous mapplet; hint: substr(in_ZIPCODE, 0, 3) || in_HOUSE_NUMBER || substr(in_STREET_NAME, 0, 1))

Once complete, your customer master table should look something like this:
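The original illustration of the loaded table is not reproduced here; the DDL below is a hedged sketch of one plausible shape for the standardized customer master, written in the SQL Server 2000 syntax assumed by this scenario. All table and column names are illustrative assumptions rather than the actual structure.

    -- Standardized customer master with an indexed candidate (group) key.
    CREATE TABLE customer_master (
        cust_id      INT         NOT NULL PRIMARY KEY,
        first_name   VARCHAR(50),
        middle_name  VARCHAR(50),
        last_name    VARCHAR(50),
        nickname     VARCHAR(50),   -- alternate first name output by the name mapplet
        house_num    VARCHAR(10),
        directional  VARCHAR(10),
        street_name  VARCHAR(60),
        street_type  VARCHAR(20),
        suite_num    VARCHAR(10),
        city         VARCHAR(50),
        state        CHAR(2),
        zip          VARCHAR(10),
        phone        VARCHAR(20),   -- standardized phone number
        group_key    VARCHAR(20)    -- zip3 + house number + first letter of street name
    );

    -- Index the group key so candidate retrieval stays fast at run time.
    CREATE INDEX ix_customer_master_group_key ON customer_master (group_key);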

3. Now that the customer master has been loaded, a Web Service mapping must be created to handle real-time matching. For this project, assume that the incoming record will include a full name field, address, city, state, zip, and a phone number. All fields will be free-form text. Since we are providing the Service, we will be using a Web Service Provider source and target. Follow these steps to build the source and target definitions:
● Within PowerCenter Designer, go to the Source Analyzer and select the source menu. From there select Web Service Provider and then Create Web Service Definition.
● You will see a screen like the one below where the Service can be named and input and output ports can be created. Since this is a matching scenario, the potential that multiple records will be returned must be taken into account: select the Multiple Occurring Elements checkbox for the output ports section. Also add a match score output field to return the percentage at which the input record matches the different potential matching records from the master.
● Both the source and target should now be present in the project folder.

4. An IDQ match plan must be built to use within the mapping. In developing a plan for real-time use, the use of a CSV source and CSV sink, both enabled for real-time, is the most significant difference from a similar match plan designed for use in IDQ standalone. Another difference from batch matching in PowerCenter is that the DQ transformation can be set to passive. The following steps illustrate converting the North America Country Pack's Individual Name and Address Match plan from a plan built for use in a batch mapping to a plan built for use in a real-time mapping.
● Open the DCM_NorthAmerica project and, from within the Match folder, make a copy of the "Individual Name and Address Match" plan. Rename it to "RT Individual Name and Address Match".
● Create a new stub CSV file with only the header row. This will be used to generate a new CSV Source within the plan. This header must use all of the input fields used by the plan before modification. The source will have the _1 and _2 fields that a Group Source would supply built into it: the header for the stub file duplicates all of the fields, with one set having a suffix of _1 and the other _2 (e.g., Firstname_1 & Firstname_2). For convenience, a sample stub header is listed below:

    IN_FIRSTNAME_1,IN_FIRSTNAME_ALT_1,IN_MIDNAME_1,IN_LASTNAME_1,IN_POSTNAME_1,IN_HOUSE_NUM_1,IN_DIRECTIONAL_1,IN_STREET_NAME_1,IN_ADDRESS2_1,IN_SUITE_NUM_1,IN_CITY_1,IN_STATE_1,IN_POSTAL_CODE_1,IN_GROUP_KEY_1,IN_FIRSTNAME_2,IN_FIRSTNAME_ALT_2,IN_MIDNAME_2,IN_LASTNAME_2,IN_POSTNAME_2,IN_HOUSE_NUM_2,IN_DIRECTIONAL_2,IN_STREET_NAME_2,IN_ADDRESS2_2,IN_SUITE_NUM_2,IN_CITY_2,IN_STATE_2,IN_POSTAL_CODE_2,IN_GROUP_KEY_2

● Now delete the CSV Match Source from the plan, add a new CSV Source, and point it at the new stub file. Because the components were originally mapped to the CSV Match Source and that source was deleted, the fields within your plan need to be reselected. As you open the different match components and RBAs, you can see the different instances that need to be reselected, as they appear with a red diamond.
● Also delete the CSV Match Sink and replace it with a CSV Sink. Only the match score field(s) must be selected for output. With this implementation you can output multiple match scores, so it is possible to see why two records matched or didn't match on a field-by-field basis.
● Select the check box for Enable Real-time Processing in both the source and the sink, and the plan is ready to be imported into PowerCenter. This plan will be imported into a passive transformation; consequently, data can be passed around it and does not need to be carried through the transformation.

5. The mapping will consist of:
a. The source and target previously generated
b. An IDQ transformation importing the plan just built
c. The same IDQ cleansing and standardization transformations used to load the master data (refer to step 2 for specifics)
d. An Expression transformation to generate the group key and build a single directional field
e. A SQL transformation to get the candidate records from the master table
f. A Filter transformation to filter out those records whose match score falls below a certain threshold
g. A Sequence transformation to build a unique key for each matching record returned in the SOAP response

● Within PowerCenter Designer, create a new mapping and drag the web service source and target previously created into the mapping. Add the following country pack mapplets to standardize and validate the incoming record from the web service:
    mplt_dq_p_Personal_Name_Standardization_FML
    mplt_dq_p_USA_Address_Validation
    mplt_dq_p_USA_Phone_Standardization_Validation
● Add an Expression transformation and build the candidate key from the Address Validation mapplet output fields. Remember to use the same logic as in the mapping that loaded the customer master. Also within the expression, concatenate the pre and post directional fields into a single directional field for matching purposes.
● Add a SQL transformation to the mapping. When this transformation is created, the SQL transform will present a dialog box with a few questions related to the SQL transformation. For this example select Query mode, MS SQL Server (change as desired), and a Static connection. For details on the other options refer to the PowerCenter help.
● Connect all necessary fields from the source qualifier, DQ mapplets, and Expression transformation to the SQL transformation. These fields should include:
    XPK_n4_Envelope (this is the Web Service message key)
    Parsed name elements
    Standardized and parsed address elements, which will be used for matching
    Standardized phone number
● The next step is to build the query from within the SQL transformation to select the candidate records. Make sure that the output fields agree with the query in number, name, and type. The output of the SQL transform will be the incoming customer record along with the candidate record. These will be stacked records, where the Input/Output fields represent the input record and the Output-only fields represent the candidate record. A simple example of this is shown in the table below, where a single incoming record is paired with two candidate records. [Table: one incoming record paired with two candidate records - not reproduced.]

Comparing the new record to the candidates is done by embedding the IDQ plan converted in step 4 into the mapping through the use of the Data Quality transformation. When this transformation is created, select passive as the transformation type. The output of the Data Quality transformation will be a match score. This match score will be in a float-type format between 0.0 and 1.0. Using a Filter transformation, all records that have a match score below a certain threshold will be filtered off. For this scenario, the cut-off will be 80%. (Hint: TO_FLOAT(out_match_score) >= .80)

Any record coming out of the Filter transformation is a potential match that exceeds the specified threshold, and the record will be included in the response. Each of these records needs a new unique ID, so the Sequence Generator transformation is used. To complete the mapping, the outputs of the Filter and Sequence Generator transformations need to be mapped to the target. Map the output of the Sequence Generator to the primary key field of the response element group. Make sure to map the input primary key field (XPK_n4_Envelope_output) to the primary key field of the envelope group in the target (XPK_n4_Envelope) and to the foreign key of the response element group in the target (FK_n4_Envelope). The mapping should look like this:

6. Using the Workflow Manager, create a workflow. For all the specific details of this please refer to the PowerCenter documentation, but for the purpose of this scenario, generate a new workflow and session for this mapping using all the defaults. Once created, and before testing the mapping, edit the session task. On the Mapping tab select the SQL transformation and make sure the connection type is relational. Also make sure to select the proper connection.

The final step is to expose this workflow as a Web Service. This is done by editing the workflow: the workflow needs to be Web Services enabled, which is done by selecting the enabled checkbox for Web Services. Once the Web Service is enabled, it should be configured. For more advanced tweaking and web service settings see the PowerCenter documentation, but for the purpose of this scenario:
a. Give the service the name you would like to see exposed to the outside world
b. Set the timeout to 30 seconds
c. Allow 2 concurrent runs
d. Set the workflow to be visible and runnable

7. The web service is ready for testing.

Last updated: 26-May-08 12:57

Testing Data Quality Plans

Challenge

To provide a guide for testing data quality processes or plans created using Informatica Data Quality (IDQ) and to manage some of the unique complexities associated with data quality plans.

Description

Testing data quality plans is an iterative process that occurs as part of the Design Phase of Velocity. The development of data quality plans typically follows a prototyping methodology of create, execute, analyze. Testing is performed as part of the third step, in order to determine that the plans are being developed in accordance with design and project requirements. This method of iterative testing helps support rapid identification and resolution of bugs. Plan testing often precedes the project's main testing activities, as the tested plan outputs will be used as inputs in the Build Phase. It is not necessary to formally test the plans used in the Analyze Phase of Velocity.

Bear in mind that data quality plans are designed to analyze and resolve data content issues. These are not typically cut-and-dry problems, but more often represent a continuum of data improvement issues where it is possible that every data instance is unique and there is a target level of data quality rather than a "right or wrong answer". Data quality plans tend to resolve problems in terms of percentages and probabilities that a problem is fixed. For example, the project may set a target of 95 percent accuracy in its customer addresses, based upon the importance of a given data field to the underlying business process. The level of acceptable inaccuracy is also likely to change over time: accuracy should continuously improve as the data quality rules are applied and the existing data sets adhere to a higher standard of quality.

Common Questions in Data Quality Plan Testing

● What dataset will you use to test the plans? While the ideal situation is to use a data set that exactly mimics the project production data, you may not gain access to this data. If you obtain a full cloned set of the project data for testing purposes, bear in mind that some plans (specifically some data matching plans) can take several hours to complete. Consider testing data matching plans overnight.

● Are the plans using reference dictionaries? Reference dictionary management is an important factor, since it is possible to make changes to a reference dictionary independently of IDQ and without making any changes to the plan itself. When you pass an IDQ plan as tested, you must ensure that no additional work is carried out on any dictionaries referenced in the plan.
● Will the plans be integrated into a PowerCenter transformation? If so, the plans must have real-time enabled data source and sink components.
● How will the plans be executed? Will they be executed on a remote IDQ Server and/or via a scheduler? In cases like these, it's vital to ensure that your plan resources, including source data files and reference data files, are in valid locations for use by the Data Quality engine. Moreover, you must ensure that the dictionary files reside in locations that are valid for IDQ. For details on the local and remote locations to which IDQ looks for source and reference data files, refer to the Informatica Data Quality 8.5 User Guide.

Strategies for Testing Data Quality Plans

The best practice steps for testing plans can be grouped under two headings.

Testing to Validate Rules

1. Identify a small, representative sample of source data.
2. To determine the results to expect when the plans are run, manually process the data based on the rules for profiling, standardization or matching that the plans will apply.
3. Execute the plans on the test dataset and validate the plan results against the manually-derived results.

Testing to Validate Plan Effectiveness

This process is concerned with establishing that a data enhancement plan has been properly designed; that is, that the plan delivers the required improvements in data quality. This is largely a matter of comparing the business and project requirements for data quality and establishing if the plans are on course to deliver these. If not, the plans may need a thorough redesign - or the business and project targets may need to be revised. In either case, discussions should be held with the key business stakeholders to review the results of the IDQ plan and determine the appropriate course of action. In addition, once the entire data set is processed against the business rules, there may be other data anomalies that were unaccounted for that may require additional modifications to the underlying business rules and IDQ plans.
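Where the sample data, the manually-derived expected results, and the plan output are all staged in database tables, the comparison in step 3 of Testing to Validate Rules above can be partly automated. The SQL below is an illustrative sketch only; the table and column names are assumptions and are not part of IDQ.

    -- Rows where the plan's standardized output differs from the manually
    -- derived expectation for the sample records.
    SELECT e.record_id,
           e.expected_value,
           p.output_value
    FROM   expected_results e
    JOIN   plan_results     p ON p.record_id = e.record_id
    WHERE  COALESCE(p.output_value, '') <> COALESCE(e.expected_value, '');

An empty result set indicates the plan reproduces the manual results for the sample; any rows returned point to rules that need review.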

Last updated: 05-Dec-07 16:02

Tuning Data Quality Plans

Challenge

This document gives an insight into the type of considerations and issues a user needs to be aware of when making changes to data quality processes defined in Informatica Data Quality (IDQ). In IDQ, data quality processes are called plans. This best practice is not intended to replace training materials, but to serve as a guide for decision making in the areas of adding, removing or changing the operational components that comprise a data quality plan. The principal focus of this best practice is to know how to tune your plans without adversely affecting the plan logic.

Description

You should consider the following questions prior to making changes to a data quality plan:

● What is the purpose of changing the plan? You should consider changing a plan if you believe the plan is not optimally configured, the plan is not functioning properly and there is a problem at execution time, or the plan is not delivering expected results as per the plan design principles.
● Are you trained to change the plan? Data quality plans can be complex. You should not alter a plan unless you have been trained or are highly experienced with IDQ methodology.
● Have you backed up the plan before editing? If you are using IDQ in a client-server environment, you can create a baseline version of the plan using IDQ version control functionality. In addition, when editing a plan, you should copy the plan to a new project folder (viz. Work_Folder) in the Workbench for changing and testing, and leave the original plan untouched during testing. You can later migrate the plan to the production environment after complete and thorough testing.
● Is the plan operating directly on production data? This applies especially to standardization plans. In IDQ, always work on staged data (database or flat-file).
● Is the plan properly documented? You should ensure all plan documentation on the data flow and the data components is up-to-date. For guidelines on documenting IDQ plans, see the Sample Deliverable Data Quality Plan Design.

You should have a clear goal whenever you plan to change an existing plan. An event may prompt the change: for example, input data changing (in format or content), or changes in business rules or business/project targets. You should take into account all current change-management procedures, and the updated plans should be thoroughly tested before production processes are updated. This includes integration and regression testing too. (See also Testing Data Quality Plans.)

Bear in mind that at a high level there are two types of data quality plans: data analysis and data enhancement plans.

● Data analysis plans produce reports on data patterns and data quality across the input data. The key objective in data analysis is to determine the levels of completeness, conformity, and consistency in the dataset. In pursuing these objectives, data analysis plans can also identify cases of missing, inaccurate or "noisy" data. Your goal in a data analysis plan is to discover the quality and usability of your data. It is not necessarily your goal to obtain the best scores for your data: improved plan statistics do not always mean that the plan is performing better, particularly in data analysis plans, and it is possible to configure a plan that moves "beyond the point of truth" by focusing on certain data elements and excluding others.
● Data enhancement plans correct completeness, conformity and consistency problems; they can also identify duplicate data entries and fix accuracy issues through the use of reference data. Your goal in a data enhancement plan is to resolve the data quality issues discovered in the data analysis.

Adding Components

In general, simply adding a component to a plan is not likely to directly affect results if no further changes are made to the plan, although the presentation of the output data will change. However, once the outputs from the new component are integrated into existing components, the data process flow is changed and the plan must be re-tested and its results reviewed in detail before migrating the plan into production. Some components have a larger impact than others. For example, adding a "To Upper" component to convert text into upper case may not cause the plan results to change meaningfully. However, adding and integrating a Rule Based Analyzer component (designed to apply business rules) may cause a severe impact, as the rules are likely to change the plan logic.

It is often a good practice to split tasks into multiple plans where a large number of data quality measures need to be checked. To avoid overloading a plan with too many components, you can create plans for specific function areas (e.g., product, address, or name) as opposed to adding all standardization tasks to a single large plan. This makes plans and business rules easier to maintain and provides a good framework for future development. In an environment where a large number of attributes must be evaluated against the six standard data quality criteria (i.e., completeness, conformity, consistency, accuracy, duplication and consolidation), using one plan per data quality criterion may be a good way to move forward. Similarly, during standardization, splitting plans up by data entity may be advantageous. For more information on the six standard data quality criteria, see Data Cleansing.

As well as adding a new component - that is, a new icon - to the plan, you can add a new instance to an existing component. This can have the same effect as adding and integrating a new component icon. To avoid making plans over-complicated, it is a good practice, within reason, to add multiple instances to a single component. Good plan design suggests that instances within a single component should be logically similar and work on the selected inputs in similar ways. If you add a new instance to a component and that instance behaves very differently to the other instances in that component - for example, if it acts on an unrelated set of outputs or performs an unrelated type of action on the data - you should probably add a new component for this instance. In either case, the overall name for the component should also be changed to reflect the logic of the instances contained in the component. This will also help you keep track of your changes onscreen.

Removing Components

Removing a component from a plan is likely to have a major impact since, in most cases, configuration changes will be required to all components that use the outputs from the removed component. If you remove an integrated component, data flow in the plan will be broken, and the plan cannot run without these configuration changes being completed. The only exceptions are when the output(s) of the removed component are solely used by a CSV Sink component or by a frequency component; in these cases, you must note that the plan output changes, since the column(s) no longer appear in the result set.

Editing Component Configurations

Changing the configuration of a component can have a comparable impact on the overall plan as adding or removing a component: the plan's logic changes, and therefore so do the results that it produces. However, although adding or removing a component may make a plan non-executable, changing the configuration of a component can impact the results in more subtle ways. For example, changing the reference dictionary used by a parsing component does not "break" a plan, but may have a major impact on the resulting output. Similarly, changing the name of a component instance output does not break a plan. By default, component output names "cascade" through the other components in the plan, so when you change an output name, all subsequent components automatically update with the new output name; it is not necessary to change the configuration of dependent components.

Last updated: 26-May-08 11:12

Using Data Explorer for Data Discovery and Analysis

Challenge

To understand and make full use of Informatica Data Explorer's (IDE) potential to profile and define mappings for your project data. This Best Practice describes how to use Informatica Data Explorer in data profiling and mapping scenarios.

Description

Data profiling and data mapping involve a combination of automated and human analyses to reveal the quality, content and structure of project data sources. The key to success for data-related projects is to fully understand the data as it actually is, before attempting to cleanse, transform, mine, integrate, or otherwise operate on it. However, the data's actual form rarely coincides with its documented or supposed form.

Data profiling and mapping provide a firm foundation for virtually any project involving data movement, migration, consolidation or integration, from data warehouse/data mart development, ERP migrations, and enterprise application integration to CRM initiatives and B2B integration. These types of projects rely on an accurate understanding of the true structure of the source data in order to correctly transform the data for a given target database design. Informatica Data Explorer is a key tool for this purpose.

Data Profiling

Data profiling involves the explicit analysis of source data and the comparison of observed data characteristics against data quality standards. Data profiling analyzes several aspects of data structure and content, including characteristics of each column or field, the relationships between fields, and the commonality of data values between fields, which is often an indicator of redundant data. Data quality and integrity issues include invalid values, multiple formats within a field, cryptic field names, non-atomic fields (such as long address strings), duplicate entities, and others. Quality standards may either be the native rules expressed in the source data's metadata, or an external standard (e.g., industry, corporate, or government) to which the source data must be mapped in order to be assessed.

Data profiling in IDE is based on two main processes:
● Inference of characteristics from the data
● Comparison of those characteristics with specified standards, as an assessment of data quality

By performing these processes early in a data project, IT organizations can preempt the "code/load/explode" syndrome, wherein a project fails at the load stage because the data is not in the anticipated form.

Data mapping involves establishing relationships among data elements in various data structures or sources, in terms of how the same information is expressed or stored in different ways in different sources. Data profiling and mapping are fundamental techniques applicable to virtually any project.

The following figure summarizes and abstracts these scenarios into a single depiction of the IDE solution. The overall process flow for the IDE Solution is as follows:

1. Data and metadata are prepared and imported into IDE.
2. IDE profiles the data, generates accurate metadata (including a normalized schema), and documents cleansing and transformation requirements based on the source and normalized schemas.
3. The resultant metadata are exported to and managed in the IDE Repository.
4. In a derived-target scenario, the project team designs the target database by modeling the existing data sources and then modifying the model as required to meet current business and performance requirements. IDE is used to develop the normalized schema into a target database. The normalized and target schemas are then exported to IDE's FTM/XML tool, which documents transformation requirements between fields in the source, normalized, and target schemas. FTM is used for SQL-based metadata structures, and FTM/XML is used to map SQL and/or XML-based metadata structures.
OR
5. In a fixed-target scenario, the design of the target database is a given (i.e., because another organization is responsible for developing it, or because an off-the-shelf package or industry standard is to be used). In this scenario, the schema development process is bypassed. Instead, FTM/XML is used to map the source data fields to the corresponding fields in an externally-specified target schema, and to document transformation requirements between fields in the normalized and target schemas. Externally specified targets are typical for ERP package migrations, business-to-business integration projects, or situations where a data modeling team is independently designing the target schema.
6. The IDE Repository is used to export or generate reports documenting the cleansing, transformation, and loading or formatting specs developed with IDE applications.

IDE's Methods of Data Profiling

IDE employs three methods of data profiling:
● Column profiling infers metadata from the data for a column or set of columns. IDE infers both the most likely metadata and alternate metadata that is consistent with the data.
● Table Structural profiling uses the sample data to infer relationships among the columns in a table. This process can discover primary and foreign keys, functional dependencies, and sub-tables.
● Cross-Table profiling determines the overlap of values across a set of columns, which may come from multiple tables.
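The kind of per-column statistics that column profiling gathers can be illustrated with plain SQL. The query below is a hedged sketch of such measures (row count, null count, distinct values, value range and length range), written in SQL Server syntax for concreteness; it is not IDE's implementation, and the table and column names are hypothetical.

    -- Illustrative column profile for one column of a source table.
    SELECT COUNT(*)                                           AS row_count,
           SUM(CASE WHEN cust_name IS NULL THEN 1 ELSE 0 END) AS null_count,
           COUNT(DISTINCT cust_name)                          AS distinct_values,
           MIN(cust_name)                                     AS min_value,
           MAX(cust_name)                                     AS max_value,
           MIN(LEN(cust_name))                                AS min_length,
           MAX(LEN(cust_name))                                AS max_length
    FROM   source_customer;

IDE automates and extends this type of analysis across every column, and adds the structural and cross-table inference described above, which hand-written queries do not readily provide.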

Profiling against external standards requires that the data source be mapped to the standard before being assessed (as shown in the following figure). IDE is used to profile the data and develop a normalized schema representing INFORMATICA CONFIDENTIAL Velocity v8 Methodology . IDE and Fixed-Target Migration Fixed-target migration projects involve the conversion and migration of data from one or more sources to an externally defined or fixed-target. making them relevant to existing systems as well as to new systems.Data Warehousing 641 of 1017 . IDE can also be used in the development and application of corporate standards. Note that the mapping is performed by IDE’s Fixed Target Mapping tool (FTM). Data profiling projects may involve iterative profiling and cleansing as well since data cleansing may improve the quality of the results obtained through dependency and redundancy profiling. Note that Informatica Data Quality should be considered as an alternative tool for data cleansing.

2. and documents cleansing and transformation requirements based on the source and normalized schemas. The general sequence of activities for a fixed-target migration project. as shown in the figure below. and loading or formatting specs developed with IDE and FTM. Data is prepared for IDE. transformation. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . and formatting specs can be used by the application development or Data Quality team to cleanse the data. implement any required edits and integrity management functions. generates accurate metadata (including a normalized schema). The resultant metadata are exported to and managed by the IDE Repository. The cleansing requirements can be reviewed and modified by the Data Quality team. The following screen shot shows how IDE can be used to generate a suggested normalized schema. transformation.the data source(s). 6. which may discover ‘hidden’ tables within tables. 3. The IDE Repository is used to export or generate reports documenting the cleansing. 4.Data Warehousing 642 of 1017 . is as follows: 1. Metadata is imported into IDE. while IDE’s Fixed Target Mapping tool (FTM) is used to map from the normalized schema to the fixed target. and documents transformation requirements between fields in the normalized and target schemas. 5. The cleansing. and develop the transforms or configure an ETL product to perform the data conversion and migration. IDE profiles the data. FTM maps the source data fields to the corresponding fields in an externally specified target schema. Externallyspecified targets are typical for ERP migrations or projects where a data modeling team is independently designing the target schema.

as shown below: Derived-Target Migration Derived-target migration projects involve the conversion and migration of data from one or more sources to a target database defined by the migration team. The figure below shows that the general sequence of activities for a derived-target migration project is as follows: INFORMATICA CONFIDENTIAL Velocity v8 Methodology . and from the normalized to target schemas.Depending on the staging architecture used. IDE is used to profile the data and develop a normalized schema representing the data source(s). IDE can generate the data definition language (DDL) needed to establish several of the staging databases between the sources and target. changing the relational structure.Data Warehousing 643 of 1017 . and to further develop the normalized schema into a target schema by adding tables and/or fields. When the target schema is developed from the normalized schema within IDE. and/or denormalizing the schema to enhance performance. eliminating unused tables and/or fields. the product automatically maintains the mappings from the source to normalized schema.

FTM is used to develop and document transformation requirements between the normalized and target schemas. 7. The cleansing requirements can be reviewed and modified by the Data Quality team. The resultant metadata are exported to and managed by the IDE Repository. Last updated: 09-Feb-07 12:55 INFORMATICA CONFIDENTIAL Velocity v8 Methodology . 3. IDE is used to modify and develop the normalized schema into a target schema.1. transformation. Data is prepared for IDE. The mappings between the data elements are automatically carried over from the IDE-based schema development process. 6. and document cleansing and transformation requirements based on the source and normalized schemas. The IDE Repository is used to export an XSLT document containing the transformation and the formatting specs developed with IDE and FTM/XML. and denormalizing to enhance performance. 4. adapting to corporate data standards. 5. implement any required edits and integrity management functions. incorporating new business requirements and data elements. and formatting specs are used by the application development or Data Quality team to cleanse the data.Data Warehousing 644 of 1017 . IDE is used to profile the data. and develop the transforms of configure an ETL product to perform the data conversion and migration. Metadata is imported into IDE. 2. generate accurate metadata (including a normalized schema). This generally involves removing obsolete or spurious data elements. The cleansing.

how best to manage exception data. The processing logic for data matching is split between PowerCenter and Informatica Data Quality (IDQ) applications. and Validation These plans provide modular solutions for name and address data.Data Warehousing 645 of 1017 . and executed. standardization.or plans . Informatica Data Cleanse and Match is a cross-application data quality solution that installs two components to the PowerCenter system: ● Data Cleanse and Match Workbench. Plans 01-04: Parsing. standardize. The level of structure contained in a given data set determines the plan to be used. Description The North America Content Pack installs several plans to the Data Quality Repository: ● ● Plans 01-04 are designed to parse. The following diagram demonstrates how the level of structure in address data maps to the plans required to standardize and validate an address. ● Informatica Data Cleanse and Match has been developed to work with Content Packs developed by Informatica. INFORMATICA CONFIDENTIAL Velocity v8 Methodology . tested. another for data cleansing. the desktop application in which data quality processes . and de-duplication functionality to United States and Canadian name and address data through a series of pre-built data quality plans and address reference data files. where plans are stored until needed. Cleansing. called the Data Quality Integration transformation. This document focuses on the plans that install with the North America Content Pack.plans can be designed. what behavior to expected from the plans. and validate United States name and address data. which was developed in conjunction with the components of Data Cleanse and Match.Working with Pre-Built Plans in Data Cleanse and Match Challenge To provide a set of best practices for users of the pre-built data quality processes designed for use with the Informatica Data Cleanse and Match (DC&M) product offering. Plans 05-07 are designed to enable single-source matching operations (identifying duplicates within a data set) or dual source matching operations (identifying matching records between two datasets). PowerCenter Designer users can connect to the Data Quality repository and read data quality plan information into this transformation. The plug-in adds a transformation to PowerCenter. cleansing. This document focuses on the following areas: ● ● ● when to use one plan vs. The plans can operate on highly unstructured and wellstructured data sources. Data Quality Integration. The North America Content Pack delivers data parsing. a plug-in component that integrates Informatica Data Quality and PowerCenter. Workbench installs with its own Data Quality repository.

a combination of the address standardization and validation plans is required. For example. a combination of the general parser. and email addresses. and zip are mapped to address fields. only the address validation plan may be required. depending on the profile of the content. and zip) are mapped to specific fields. state. the address validation plan (plan 03) can be run successfully to validate input addresses discretely from the other plans. names. where the data is not mapped to any address columns. dates. and dates are scattered throughout the data.In cases where the address is well structured and specific data elements (i. email addresses. The purpose of making the plans modular is twofold: ● It is possible to apply these plans on an individual basis to the data. As a result. but not specifically labeled as such (e. it can and does happen. city. Social Security Numbers. Plans 01 and 02 are not designed to operate in sequence. In fact.e..g.Data Warehousing 646 of 1017 . For example. the General Parser plan sorts such data into typespecific fields of address. telephone numbers. company names. In extreme cases. Street addresses.com While it is unusual to see data fragmented and spread across a number of fields in this way. the Data Quality Developer will not run all seven plans consecutively on the same dataset. Where the city. Using a combination of dictionaries and pattern recognition. Modular plans facilitate faster performance.com CA 94063 Field5 Redwood City info@informatica. even if it were desirable from a functional point of view. would result in significant performance degradation and extremely complex plan logic that would be difficult to modify and maintain. nor are plans 06 and 07. as Address1 through Address5).. Designing a single plan to perform all the processing tasks contained in the seven plans. company names. ● 01 General Parser The General Parser plan was developed to handle highly unstructured data and to parse it into type-specific fields. consider data stored in the following format: Field1 100 Cardinal Way Redwood City Field2 Informatica Corp 38725 Field3 CA 94063 100 Cardinal Way Field4 info@informatica. and validation plans may be required to obtain meaning from the data. data is not stored in any specific fields. address standardization. state. In cases such as this. the above data will be parsed into the following format: INFORMATICA CONFIDENTIAL Velocity v8 Methodology . There is no requirement that the plans be run in sequence with each other.

01 General Parser

The General Parser plan was developed to handle highly unstructured data and to parse it into type-specific fields. For example, consider data stored in the following format:

Field1            Field2            Field3            Field4                Field5
100 Cardinal Way  Informatica Corp  CA 94063          info@informatica.com  Redwood City
Redwood City      38725             100 Cardinal Way  CA 94063              info@informatica.com

While it is unusual to see data fragmented and spread across a number of fields in this way, it can and does happen. Street addresses, email addresses, company names, and dates are scattered throughout the data. Using a combination of dictionaries and pattern recognition, the General Parser plan sorts such data into type-specific fields of address, names, dates, telephone numbers, and Social Security Numbers. In cases such as this, the above data will be parsed into the following format:

Address1          Address2          Address3      E-mail                Date        Company
100 Cardinal Way  CA 94063          Redwood City  info@informatica.com              Informatica Corp
Redwood City      100 Cardinal Way  CA 94063      info@informatica.com  08/01/2006

The General Parser does not attempt to apply any structure or meaning to the data. Its purpose is to identify and sort data by information type. As demonstrated with the address fields in the above example, the address fields are labeled as addresses, but the contents are not arranged in a standard address format; they are flagged as addresses in the order in which they were processed in the file.

The General Parser does not attempt to validate the correctness of a field. For example, the dates are accepted as valid because they have a structure of symbols and numbers that represents a date; a value of 99/99/9999 would also be parsed as a date.

While the General Parser does not make any assumption about the data prior to parsing, it parses based on the elements of data that it can make sense of first. The General Parser does not attempt to handle multiple information types in a single field (e.g., telephone and email in the same contact field). For example, if a person name and address element are contained in the same field, the General Parser would label the entire field either a name or an address - or leave it unparsed - depending on the elements in the field it can identify first (if any). In cases where no elements of information can be labeled, the field is left in a pipe-delimited form containing unparsed data.

The effectiveness of the General Parser in recognizing various information types is a function of the dictionaries used to identify that data and the rules used to sort them. Adding or deleting dictionary entries can greatly affect the effectiveness of this plan.

Overall, the General Parser is likely only to be used in limited cases, such as when several files of differing structures have been merged into a single file, where certain types of information may be mixed together, or in cases where the data has been badly managed.
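As a rough illustration of the dictionary-and-pattern approach described above, the sketch below sorts field values into type-specific buckets. It is a simplified stand-in, not the General Parser itself: the regular expressions and the one-line company-suffix "dictionary" are assumptions made for the example.

# Illustrative sketch only: a simplified imitation of dictionary- and
# pattern-based type classification. Not the product's actual dictionaries.
import re

COMPANY_SUFFIXES = {"corp", "inc", "llc", "ltd"}                  # stand-in dictionary
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE  = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")                    # checks structure, not validity

def classify(value):
    """Label a field value with the first information type it matches."""
    v = value.strip()
    if EMAIL.match(v):
        return "email"
    if DATE.match(v):
        return "date"        # 99/99/9999 would also pass: structure, not validity
    if v and v.lower().rstrip(".").split()[-1] in COMPANY_SUFFIXES:
        return "company"
    return "address"         # everything else is treated here as address data

row = ["100 Cardinal Way", "Informatica Corp", "CA 94063",
       "info@informatica.com", "Redwood City"]
print([classify(v) for v in row])
# ['address', 'company', 'address', 'email', 'address']

In the actual plan, checks of this kind are driven by the installed dictionaries, which is why adding or deleting dictionary entries has such a direct effect on results.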
02 Name Standardization

The Name Standardization plan is designed to take in person name or company name information and apply parsing and standardization logic to it. Name Standardization follows two different tracks: one for person names and one for company names.

The plan input fields include two inputs for company names. Data that is entered in these fields is assumed to be valid company names, and no additional tests are performed to validate that the data is an existing company name. Any combination of letters, numbers, and symbols can represent a company; therefore, in the absence of an external reference data source, further tests to validate a company name are not likely to yield usable results. Any data entered into the company name fields is subjected to two processes. First, the company name is standardized using the Word Manager component, standardizing any company suffixes included in the field. Second, the standardized company name is matched against the company_names.dic dictionary, which returns the standardized Dun & Bradstreet company name.

The second track for name standardization is person name standardization. While this track is dedicated to standardizing person names, it does not necessarily assume that all data entered here is a person name. Values entered in this field that contain a company suffix or a company name are taken out of the person name track and moved to the company name track. If the company name track inputs are already fully populated for the record in question, then any company name detected in a person name column is moved to a field for unparsed company name output.

Person names in North America tend to follow a set structure and typically do not contain company suffixes or digits. Name parsing algorithms have been built using this assumption. If the name is not recognized as a company name (e.g., by the presence of a company suffix) but contains digits, the data is parsed into the non-name data output field. Additional logic is applied to identify people whose last name is similar (or equal) to a valid company name (for example, John Sears); inputs that contain an identified first name and a company name are treated as a person name. Any remaining data is accepted as being a valid person name and parsed as such.

Name parsing occurs in two passes. North American person names are typically entered in one of two different styles: either in a “firstname middlename surname” format or a “surname, firstname middlename” format. The first pass applies a series of dictionaries to the name fields, attempting to parse out name prefixes, firstnames, name suffixes, and any extraneous data (“noise”) present. Any remaining details are assumed to be middle name or surname details.
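A minimal sketch of this first parsing pass, assuming tiny stand-in dictionaries for prefixes, first names, and suffixes (the plan's real reference data is far richer), might look like the following.

# Illustrative sketch only: dictionary-driven first-pass name parsing.
# The dictionaries below are placeholders, not the plan's reference data.
PREFIXES   = {"mr", "mrs", "ms", "dr"}
SUFFIXES   = {"jr", "sr", "ii", "iii"}
FIRSTNAMES = {"steven", "chris", "roy", "eddie", "martin", "thomas"}

def parse_person_name(raw):
    tokens = [t.strip(".,") for t in raw.split()]
    out = {"prefix": "", "first": "", "middle": "", "last": "", "suffix": ""}

    if tokens and tokens[0].lower() in PREFIXES:      # pass 1: prefix
        out["prefix"] = tokens.pop(0)
    if tokens and tokens[-1].lower() in SUFFIXES:     # pass 1: suffix
        out["suffix"] = tokens.pop()
    if tokens and tokens[0].lower() in FIRSTNAMES:    # pass 1: known first name
        out["first"] = tokens.pop(0)

    # Remaining tokens are assumed to be middle name / surname details.
    if tokens:
        out["last"] = tokens.pop()
        out["middle"] = " ".join(tokens)
    return out

print(parse_person_name("Chris Pope Jr."))
# {'prefix': '', 'first': 'Chris', 'middle': '', 'last': 'Pope', 'suffix': 'Jr'}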

A rule is applied to the parsed details to check if the name has been parsed correctly. If not, “best guess” parsing is applied to the field based on the possible assumed formats.

When name details have been parsed into first, last, and middle name formats, the first name is used to derive additional details, including gender and the name prefix. In cases where no clear gender can be generated from the first name, the gender field is typically left blank or indeterminate. Finally, salutations are generated using all parsed and derived name elements. The salutation field is generated according to the derived gender information; this can be easily replicated outside the data quality plan if the salutation is not immediately needed as an output from the process (assuming the gender field is an output).

Depending on the data entered in the person name fields, some person names may be identified as companies and standardized according to company name processing logic. Likewise, certain companies may be treated as person names and parsed according to person name processing rules, depending on whether or not the field contains a recognizable company suffix in the text. Any text data that is entered in a person name field is always treated as a person or company. Non-name data encountered in the name standardization plan may be standardized as names, depending on the contents of the fields. For example, an address datum such as “Corporate Parkway” may be standardized as a business name, as “Corporate” is also a business suffix. This is typically a result of the dictionary content. If this is a significant problem when working with name data, some adjustments to the dictionaries and the rule logic for the plan may be required.

To ensure that the name standardization plan is delivering adequate results, Informatica strongly recommends pre- and post-execution analysis of the data. Based on the following input:

ROW ID   IN NAME1
1        Steven King
2        Chris Pope Jr.
3        Shannon C. Sr
4        Eddie Martin III
5        Martin Luther King, Jr.
6        Prince
7        Dean Jones
8        Mike Judge
9        Thomas Staples
10       Eugene F. Sears
11       Roy Jones Jr.
12       Staples Corner
13       Sears Chicago
14       Robert Tyre
15       Chris News

The following outputs are produced by the Name Standardization plan:

The last entry (Chris News) is identified as a company in the current plan configuration – such results can be refined by changing the underlying dictionary entries used to identify company and person names.

03 US Canada Standardization

This plan is designed to apply basic standardization processes to city, state/province, and zip/postal code information for United States and Canadian postal address data. The plan accepts up to six generic address fields and attempts to parse out city, state/province, and zip/postal code information. All remaining information is assumed to be address information and is absorbed into the address line 1-3 fields. The purpose of the plan is to deliver basic standardization to address elements where processing time is critical and one hundred percent validation is not possible due to time constraints. The plan also organizes key search elements into discrete fields, thereby speeding up the validation process.

The plan assumes that all data entered into it are valid address elements. Therefore, once city, state, and zip details have been parsed out, the plan assumes all remaining elements are street address lines and parses them in the order they occurred as address lines 1-3. Any information that cannot be parsed into the remaining fields is merged into the non-address data field. The plan makes a number of assumptions that may or may not suit your data:

● When parsing city, state, and zip details, the address standardization dictionaries assume that these data elements are spelled correctly. Only very limited variation in town/city names is handled; in cases where punctuation differences exist or where town names are commonly misspelled, the standardization plan may not correctly parse the information.
● City names are also commonly found in street names and other address elements. Bear in mind that the dictionary parsing operates from right to left across the data, so that country name and zip code fields are analyzed before city names and street addresses. For example, “United” is part of a country name (United States of America) and is also a town name in the U.S.; therefore, the word “United” may be parsed and written as the town name for a given address before the actual town name datum is reached.
● Zip codes are all assumed to be five-digit. In some files, zip codes that begin with “0” may lack this first number and so appear as four-digit codes, and these may be missed during parsing. Adding four-digit zips to the dictionary is not recommended, as these will conflict with the “Plus 4” element of a zip code. Zip codes may also be confused with other five-digit numbers in an address line, such as street numbers.
● The plan appends a country code to the end of a parsed address if it can identify it as U.S. or Canadian. Therefore, there is no need to include any country code field in the address inputs when configuring the plan.

Most of these issues can be dealt with, if necessary, by minor adjustments to the plan logic or to the dictionaries, or by adding some pre-processing logic to a workflow prior to passing the data into the plan.
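As one example of the pre-processing logic mentioned above, the sketch below restores the leading zero that spreadsheet exports commonly drop from five-digit zip codes before the data reaches the standardization plan. The function name and input handling are assumptions made for illustration only.

# Illustrative sketch only: restore dropped leading zeros in ZIP codes
# as a pre-processing step, before the standardization plan runs.
def normalize_zip(raw):
    """Left-pad a bare numeric ZIP shorter than five digits; preserve ZIP+4 values."""
    z = str(raw).strip()
    if "-" in z:                        # already ZIP+4 (e.g. 02134-1001)
        base, plus4 = z.split("-", 1)
        return f"{base.zfill(5)}-{plus4}"
    if z.isdigit() and len(z) < 5:
        return z.zfill(5)               # 2134 -> 02134
    return z

for value in ["2134", "94063", "2134-1001", 7008]:
    print(value, "->", normalize_zip(value))
# 2134 -> 02134
# 94063 -> 94063
# 2134-1001 -> 02134-1001
# 7008 -> 07008

The same padding could equally be applied in a PowerCenter Expression transformation before the plan is invoked.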

04 NA Address Validation

The purposes of the North America Address Validation plan are:

● To match input addresses against known valid addresses in an address database, and
● To parse, standardize, and enrich the input addresses.

Using the US Canada Standardization plan before the NA Address Validation plan helps to improve validation plan results in cases where city, state, and zip details are not already held in discrete fields. City, state, and zip are key search criteria for the address validation engine, and they need to be mapped into discrete fields. The address validation APIs store specific area information in memory and continue to use that information from one record to the next.

Plans 05-07: Standardization, Grouping, and Matching

These plans take advantage of PowerCenter and IDQ capabilities and are commonly used in pairs: users will use either plans 05 and 06 (Single Source Matching) or plans 05 and 07 (Dual Source Matching). Single source matching seeks to identify duplicate records within a single data set.

● Note that the matching plans are designed for use within a PowerCenter mapping and do not deliver optimal results when executed directly from IDQ Workbench.
● Note also that the Standardization and Matching plans are geared towards North American English data. Although they work with datasets in other languages, the results may be sub-optimal.

Matching Concepts

To ensure the best possible matching results and performance, the data is grouped before matching. Grouping performs two functions. The main function of grouping in a PowerCenter context is to create candidate group keys; it also creates new data columns to provide group key options for the matching plan. A well-designed grouping plan can dramatically cut plan processing time while minimizing the likelihood of missed matches in the dataset. For example, when looking for duplicates in a customer list, there is little value in comparing the record for John Smith with the record for Angela Murphy, as they are obviously not going to be considered as duplicate entries. When a matching plan is run on grouped data, the Sorter transformation can organize the data to facilitate matching performance. Productive fields for grouping name and address data are location-based (e.g., city name). For more information on grouping strategies for the best result/performance relationship, see the Best Practice Effective Data Matching Techniques.

Plan 05 (Match Standardization and Grouping) performs cleansing and standardization operations on the data before group keys are generated. The aim of standardization here is different from a classic standardization plan – the intent is to ensure that different spellings, abbreviations, etc. are as similar to each other as possible, to return a better match set. The plan generates the following group keys: