
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers

Student Guide

Version: DQ10.1.1_MAN_DEV_201703


Data Quality: Data Quality Management for Developers

Version: DQ10.1.1_MAN_DEV_201703
March 2017
Copyright (c) 1998–2017 Informatica LLC. All rights reserved.
This educational service, materials, documentation and related software contain proprietary
information of Informatica LLC and are provided under a license agreement containing restrictions
on use and disclosure and are also protected by copyright law. Reverse engineering of the software
is prohibited. No part of the materials and documentation may be reproduced or transmitted in any
form, by any means (electronic, photocopying, recording or otherwise) without prior consent of
Informatica LLC. The related software is protected by U.S. and/or international Patents and other
Patents Pending.
Use, duplication, or disclosure of the related software by the U.S. Government is subject to the
restrictions set forth in the applicable software license agreement and as provided in DFARS
227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR
12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
The information in this educational service, materials, and documentation is subject to change
without notice. If you find any problems in this educational service, materials or documentation,
please report them to us in writing.
Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT,
PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata
Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data
Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity
Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event
Processing, Ultra Messaging and Informatica Master Data Management are trademarks or
registered trademarks of Informatica LLC in the United States and in jurisdictions throughout the
world. All other company and product names may be trade names or trademarks of their respective
owners.
Portions of this educational service, materials and/or documentation are subject to copyright held
by third parties, including without limitation: Copyright © Adobe Systems Incorporated. All rights
reserved. Copyright © Microsoft. All rights reserved. Copyright © Oracle. All rights reserved.
Copyright © the CentOS Project.
This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178;
6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096;
6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,243,110, 7,254,590;
7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422, 7,720,842; 7,721,270; and 7,774,791,
international Patents and other Patents Pending.
DISCLAIMER: Informatica LLC provides this educational service, materials, and documentation
“as is” without warranty of any kind, either express or implied, including, but not limited to, the
implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica
LLC does not warrant that this educational service, materials, documentation or related software is
error free. The information provided in this educational service, materials, documentation and
related software may include technical inaccuracies or typographical errors. The information in this
educational service, materials, documentation and related software is subject to change at any time
without notice.


Document Conventions
This guide uses the following formatting conventions:

If you see: >
It means: Indicates a submenu to navigate to.
Example: Click Repository > Connect. In this example, you should click the Repository menu or button and choose Connect.

If you see: boldfaced text
It means: Indicates text you need to type or enter.
Example: Click the Rename button and name the new source definition S_EMPLOYEE.

If you see: UPPERCASE
It means: Database tables and column names are shown in all UPPERCASE.
Example: T_ITEM_SUMMARY

If you see: italicized text
It means: Indicates a variable you must replace with specific information.
Example: Connect to the Repository using the assigned login_id.

If you see: Note:
It means: The following paragraph provides additional facts.
Example: Note: You can select multiple objects to import by using the Ctrl key.

If you see: Tip:
It means: The following paragraph provides suggested uses or a Velocity best practice.
Example: Tip: The m_ prefix for a mapping name is…


Other Informatica Resources


In addition to the student and lab guides, Informatica provides these other resources:
• Documentation and Knowledge Base
• Global Customer Support
• Professional Certification

Accessing Documentation and Knowledge Base


To get the latest documentation and Knowledge Base for your product, go to
https://network.informatica.com

Contacting Global Customer Support


You can contact a Customer Support Center by telephone or through Online Support. Online
Support requires a username and password. You can request a username and password at
https://www.informatica.com/services-and-training/support-services/contact-us.html

Obtaining Informatica Professional Certification


To obtain Informatica Professional Certification, take and pass the exams provided by
Informatica. For more information, go to
https://www.informatica.com/services-and-training/certification.html



Module 1: Course Introduction and Agenda

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Module 2: Data Quality Projects and Solution Architecture


DQ Architecture
The ISP (Informatica Service Platform) contains multiple services. The Domain Service bundles the
remaining services: the Admin, Data Integration, Analyst, Reporting, Model Repository, and Content
Management Services. In the architecture diagram, the boxes inside the Data Integration Service are
plug-ins that perform specific tasks such as mapping execution, profiling execution, and web
service execution.
Service Descriptions:
• Admin Service: Allows administration of Informatica domains. This includes setup of the remaining services, license setup, security, connections, monitoring, and logs.
• Model Repository Service: Manages the metadata repository that stores settings, logic, rules, mappings, statistics, etc.
• Data Integration Service: Performs the work, such as data movement, profiling, mapping execution, data previews, etc.
• Content Management Service: Used by Data Quality transformations to manage AV data files, IMO populations, and other content-related items.
• Analyst Service: Manages the Analyst browser UI, which controls steward activities, RTM, bad record review, duplicate record review, and scorecarding.
• Reporting Service: An embedded Jaspersoft server used for reporting and dashboarding.
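
As a rough mental model of the service descriptions above, the sketch below records each service with the responsibilities named in this guide and looks up which service handles a given task. The dictionary layout and the `services_for` helper are hypothetical study aids. They are not part of any Informatica API.

```python
# Hypothetical lookup table built only from the service descriptions above.
SERVICES = {
    "Admin Service": ["service setup", "license setup", "security",
                      "connections", "monitoring", "logs"],
    "Model Repository Service": ["settings", "logic", "rules",
                                 "mappings", "statistics"],
    "Data Integration Service": ["data movement", "profiling",
                                 "mapping execution", "data previews"],
    "Content Management Service": ["AV data files", "IMO populations"],
    "Analyst Service": ["steward activities", "RTM", "bad record review",
                        "duplicate record review", "scorecarding"],
    "Reporting Service": ["reporting", "dashboarding"],
}


def services_for(task):
    """Return the names of services whose listed responsibilities include task."""
    return [name for name, tasks in SERVICES.items() if task in tasks]


print(services_for("profiling"))
```

Keeping the responsibilities as plain strings makes it easy to quiz yourself: pick a task and check which service you expect before running the lookup.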

Module 3: Data Quality Process Overview


What is data quality management?

How well does your organization’s data match the real-world phenomena that it is intended to
represent?
How usable is the data that you have?
• Data quality issues relate not only to poor-quality data, but also to good-quality data whose
value is not being maximized.
• Data quality issues reach far beyond “name and address” data and can affect any industry.


Examples of data quality in practice

Completeness — In the Last Invoice Date column, there are no values for CHISLAIN DRION,
GEORGE LOUGHRAN, and GERARD EGAN.
Conformity — The sys_address_1 column contains house name/number and street information,
except in the cases of JANE DUNNE and JOHN O CONNOR.
Consistency — The Business Type entries for BK ENGINEERING and MR MARTIN THATCHER
are inconsistent: the former is referred to as a PERSON, and the latter is referred to as a
BUSINESS.
Integrity — The household connection between JANE POLLARD and WILLIAM POLLARD is not
made in the dataset.
Duplication — Duplicate entries are present for CHRIS MOLONEY and MR CHRISTOPHER
MOLONEY.
Accuracy — The accuracy of the data cannot always be determined from the dataset alone. You
may need to confirm the accuracy of your data by comparing it with a reliable reference source.
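
Three of the dimensions above (completeness, conformity, and duplication) can be measured with simple checks. The following is a minimal sketch, not how Informatica Data Quality computes them. The sample rows are invented, and the address pattern (a house number followed by text) is an assumed simplification of "house name/number and street".

```python
import re
from collections import Counter

# Invented sample rows; only the column names come from the examples above.
ROWS = [
    {"name": "ALPHA TRADING", "sys_address_1": "12 MAIN STREET", "last_invoice_date": "2016-11-02"},
    {"name": "BETA SUPPLIES", "sys_address_1": "UNKNOWN", "last_invoice_date": ""},
    {"name": "Alpha Trading ", "sys_address_1": "12 MAIN STREET", "last_invoice_date": "2017-01-15"},
    {"name": "GAMMA LTD", "sys_address_1": "7 HIGH ROAD", "last_invoice_date": None},
]

# Assumed conformity rule: a house number, whitespace, then street text.
ADDRESS_PATTERN = re.compile(r"^\d+\s+\S+")


def is_filled(value):
    """True when the value is neither None nor blank."""
    return value is not None and str(value).strip() != ""


def completeness(rows, column):
    """Share of rows that have a value in the column."""
    return sum(is_filled(r.get(column)) for r in rows) / len(rows)


def conformity(rows, column, pattern):
    """Share of filled values in the column that match the expected pattern."""
    values = [r[column] for r in rows if is_filled(r.get(column))]
    if not values:
        return 0.0
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def duplicate_keys(rows, column):
    """Values that occur more than once after trimming and uppercasing."""
    counts = Counter(r[column].strip().upper() for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


print(completeness(ROWS, "last_invoice_date"))
print(conformity(ROWS, "sys_address_1", ADDRESS_PATTERN))
print(duplicate_keys(ROWS, "name"))
```

An exact comparison like `duplicate_keys` would not link CHRIS MOLONEY with MR CHRISTOPHER MOLONEY. Finding that kind of duplicate requires the fuzzy matching covered later in this module.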


The data quality life cycle

Informatica Data Quality supports a “continuous improvement” process that can be cyclical or
iterative in nature.


In stage 1, work with the business to define metrics and analyze the quality of the project data
according to agreed measures. This stage is performed in Informatica Analyst, which enables the
creation of versatile and easy-to-use scorecards to communicate data quality metrics to all
interested parties.
In stage 2, verify the target levels of data quality for the business using the data quality
measurements taken in stage 1.
In stage 3, use Developer to design the DQ rules, mappings, and projects to achieve these targets.
Capturing business rules and testing the mapplets and mappings are also covered in this stage.
In stage 4, deploy the DQ rules and mappings. Stage 4 is the phase in which data cleansing and
other tasks are performed on the project data. This requires collaboration between the
Analyst and Developer users.
In stage 5, test and measure the results of the rules and compare them to the initial DQ
assessment to verify that targets have been met. If targets have not been met, this information
feeds into another iteration of DQ operations in which plans are tuned and optimized.
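
The five stages can be pictured as a measure, improve, and re-measure loop. The sketch below is only an illustration of that loop. The metric (share of values found in a reference set), the reference values, the two rules, and the target of 0.8 are all assumptions made for this example.

```python
VALID_CODES = {"US", "IE"}                  # assumed reference values
SYNONYMS = {"USA": "US", "IRELAND": "IE"}   # assumed synonym table


def measure(values):
    """Stages 1 and 5: share of values that appear in the reference set."""
    return sum(v in VALID_CODES for v in values) / len(values)


def trim_and_uppercase(value):
    """Example rule: remove surrounding spaces and uppercase the value."""
    return value.strip().upper()


def apply_synonyms(value):
    """Example rule: replace known variants with the standard code."""
    return SYNONYMS.get(value, value)


def run_cycle(values, rules, target):
    """Apply rules one at a time until the measured score reaches the target."""
    history = [measure(values)]             # initial assessment
    for rule in rules:
        if history[-1] >= target:           # target met, stop iterating
            break
        values = [rule(v) for v in values]  # deploy the next rule
        history.append(measure(values))     # re-measure and compare
    return values, history


cleaned, history = run_cycle(
    [" us", "US", "usa", "Ireland", "IE", "??"],
    [trim_and_uppercase, apply_synonyms],
    target=0.8,
)
print(cleaned)
print(history)
```

The `history` list plays the role of a scorecard: each entry is the score after one iteration, so you can see whether a rule moved the data closer to the agreed target.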


Data Profiling
This process is important in every type of project, whether for reporting, cleansing, or gating.
It is typically performed by the analyst in the Analyst tool. Analysts can profile and review the
data, create comments, build rules, develop reference tables, and review rules created by the
developer.
Rules and reference tables built here can be applied in processes created by the developer to
report on, cleanse, and standardize the data in later modules.

Profiling analyzes the contents of all specified fields in a dataset, identifying low-quality data
according to the six data quality criteria. Profiling is used to look for specific quality problems and
provides specialized analysis functionality for several data types.
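
To make the idea of column profiling concrete, the sketch below computes a few typical statistics for one column: row count, null count, distinct count, and value patterns. It is a simplified illustration and not the profiling engine used by the Analyst tool. The pattern notation (X for a letter, 9 for a digit) and the sample values are assumptions.

```python
import re
from collections import Counter


def value_pattern(value):
    """Replace every letter with 'X' and every digit with '9'."""
    letters_masked = re.sub(r"[A-Za-z]", "X", value)
    return re.sub(r"[0-9]", "9", letters_masked)


def profile_column(values):
    """Return basic profile statistics for one column of string values."""
    non_null = [v for v in values if v is not None and v.strip()]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Most frequent patterns first; rare patterns often point to bad data.
        "patterns": Counter(value_pattern(v) for v in non_null).most_common(),
    }


profile = profile_column(["D04 X2Y1", "D02 AB12", None, "", "12345", "D04 X2Y1"])
print(profile)
```

Reading the pattern list from the bottom up is a quick way to spot conformity problems, because values that do not follow the dominant format end up in the low-frequency patterns.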


Data Standardization

This process is performed by developers in the Developer tool.

The processes you implement in the standardization module will correct the completeness,
conformity, and consistency problems identified through Profiling.
Standardization transforms and intelligently parses data from a single multi-domain field to several
fields. It standardizes field formats and extracts important data from free text fields.
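
As an illustration of parsing one multi-domain field into several fields, the sketch below splits a free-text contact value into an address and a phone number, then standardizes both. It does not use Developer transformations. The reference table `STREET_TYPES`, the phone pattern, and the sample input are hypothetical.

```python
import re

# Hypothetical reference table: variant -> standard form.
STREET_TYPES = {"ST": "STREET", "ST.": "STREET", "RD": "ROAD",
                "RD.": "ROAD", "AVE": "AVENUE"}

# Assumed phone shape: a digit, at least six digits or separators, then a digit.
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def standardize_contact(raw):
    """Split a free-text contact value into standardized address and phone fields."""
    match = PHONE.search(raw)
    # Keep digits only so every phone number has the same format.
    phone = re.sub(r"\D", "", match.group()) if match else ""
    # Remove the phone text, then uppercase tokens and expand abbreviations.
    remainder = PHONE.sub("", raw).replace(",", " ")
    tokens = [STREET_TYPES.get(t.upper(), t.upper()) for t in remainder.split()]
    return {"address": " ".join(tokens), "phone": phone}


print(standardize_contact("12 main st., 555-010-1234"))
```

The reference table is what makes the output consistent: every variant of a street type maps to one standard form, which is the same role reference tables play in the processes described above.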


Data Matching

This process is performed by developers in the Developer tool.


Data matching identifies equivalent/duplicate and related data records within a dataset or between
datasets.
Correspondingly, it can identify inaccurate data by comparing the current dataset with a reference
dataset.
Identity Matching can be used for identity data and is particularly effective on non-standardized
and dirty data.
It uses an extremely powerful and flexible matching engine to achieve highly accurate matching
results, in minimum processing time, based on a multi-stage matching process.
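
The idea of a multi-stage match can be sketched in two steps: first group records by a key, then score pairs within each group. The example below uses Python's standard `difflib.SequenceMatcher` as a stand-in scorer. It is not the Informatica matching engine. The grouping key (the last token of the name) and the 0.7 threshold are assumptions chosen for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Uppercase, drop periods, and collapse repeated spaces."""
    return " ".join(name.upper().replace(".", "").split())


def group_key(name):
    """Stage 1: group by the last token, assumed here to be the surname."""
    return normalize(name).split()[-1]


def similarity(a, b):
    """Stage 2: character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(names, threshold=0.7):
    """Return (name, name, score) for pairs in the same group at or above threshold."""
    groups = {}
    for name in names:
        groups.setdefault(group_key(name), []).append(name)
    pairs = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


matches = find_matches([
    "CHRIS MOLONEY", "MR CHRISTOPHER MOLONEY",
    "JANE POLLARD", "WILLIAM POLLARD",
])
print(matches)
```

Grouping first keeps the number of comparisons small, because only records that share a key are scored against each other. With this threshold the two POLLARD records are compared but not reported as duplicates, which fits the earlier point that a household link is a different relationship from a duplicate.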


Data Consolidation

Consolidation is the fourth and final module in the data quality process, although its output can
form the basis for a subsequent iteration of the data quality life cycle.
Data consolidation manages the process of merging or linking duplicate or related records. It
facilitates the consolidation of records in a single database or multiple databases. It can also
append data from a reference dataset or overwrite inaccurate data.
The consolidation module operates on the rows of one or more datasets; it uses the results from
the matching module to guide users through the process of consolidating within a dataset or
between datasets.
The new Informatica Data Director can be used for manual consolidation of records through a web
browser.
Further integration with PowerCenter and the addition of two new PowerCenter transformations have
further increased Informatica Data Quality’s consolidation ability.
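
Merging a group of matched records into one surviving record can be sketched as follows. The survivorship rule used here (prefer the most recently updated record, and fill blank fields from older records) is one assumed strategy among many. The field names and sample values are invented for the example.

```python
def consolidate(cluster, priority_field="updated"):
    """Merge one cluster of matched records into a single surviving record.

    Records are visited from newest to oldest. For each field, the first
    non-blank value wins. Fields that are blank in every record are left out.
    """
    ordered = sorted(cluster, key=lambda r: r[priority_field], reverse=True)
    survivor = {}
    for record in ordered:
        for field, value in record.items():
            if value not in (None, "") and survivor.get(field) in (None, ""):
                survivor[field] = value
    return survivor


# Invented cluster, as might be produced by a matching step.
cluster = [
    {"name": "CHRIS MOLONEY", "phone": "", "email": "c@example.com", "updated": "2016-05-01"},
    {"name": "MR CHRISTOPHER MOLONEY", "phone": "5550100", "email": "", "updated": "2017-02-10"},
]
golden = consolidate(cluster)
print(golden)
```

The dates are ISO-formatted strings, so sorting them as text also sorts them chronologically. A different rule, such as preferring the longest value or a trusted source system, would only change the ordering step.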

Module 4: Analyst Collaboration and Reference Table Management


Informatica Developer GUI


Informatica Developer is an easy-to-use user interface featuring sophisticated data integration,
data cleansing, and data matching transformations. It is built on the Eclipse 3.3 Rich Client
Platform (RCP). With the Developer tool, you can build mappings and processes using configurable
transformations. Mappings are stored in the internal database (the repository) and can be
executed on the fly at the click of a button.
To auto-arrange the transformations in a mapping, right-click and select Arrange All Iconic.
You can also adjust the workbench layout:
• Maximize or minimize any view or editor (double-click)
• Move views
• Undock views
• Use fast views
• Reset the perspective
Views available include:
• Connection Explorer. Shows connections to relational databases.
• Data Viewer. Shows the results of a mapping, data preview, or an SQL query.
• Object Explorer. Shows projects, folders, and the objects they contain.
• Outline. Shows dependent objects in an object.
• Progress. Shows the progress of operations in the Developer tool, such as a mapping run.
• Properties. Shows object properties.
• Search. Shows search options.
• Validation Log. Shows object validation errors.


The Model Repository


The Model repository is a relational database that stores the metadata for projects and folders. It
stores reference data and rules, and this repository is available to users of the Developer tool and
Analyst tool. Each time you open the Developer tool, you connect to the Model repository to access
projects and folders.

Projects
A project is the highest object in the Navigator hierarchy. It is the top-level container that you
use to store folders and objects in the Developer tool, and it can be shared or non-shared. Use projects to
organize and manage the objects that you want to use for data services and data quality solutions.
You manage and view projects in the Object Explorer view. When you create a project, it is stored
in the Model repository. Each project that you create also appears in the Analyst tool.

Folders
Use folders to organize objects in a project. Create folders to group objects based on business
needs. For example, you can create a folder to group objects for a particular task in a project.
Folders appear within projects in the Object Explorer view. A folder can contain other folders, data
objects, and object types.
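
The containment rules described above (a project holds folders and objects, and a folder can hold other folders and objects) form a simple tree. The sketch below models that tree to show how nested folders produce object paths. The class names, the `paths` method, and all project, folder, and object names are hypothetical. This is not how the Model repository stores metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """A container that can hold objects and other folders."""
    name: str
    folders: list = field(default_factory=list)
    objects: list = field(default_factory=list)

    def paths(self, prefix=""):
        """Yield a path for every object in this container and its subfolders."""
        here = f"{prefix}/{self.name}"
        for obj in self.objects:
            yield f"{here}/{obj}"
        for sub in self.folders:
            yield from sub.paths(here)


@dataclass
class Project(Folder):
    """The top-level container; it can be shared or non-shared."""
    shared: bool = False


project = Project(
    "Customer_DQ",
    folders=[
        Folder("Profiling", objects=["prf_customers"]),
        Folder("Cleansing",
               folders=[Folder("Rules", objects=["rule_phone"])],
               objects=["m_standardize"]),
    ],
)
object_paths = list(project.paths())
print(object_paths)
```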

Column Profiling
A profile is the analysis of data quality based on the content and structure of data. It is a set of
metadata describing the content and structure of a dataset. Data profiling is often the first step in a
project. You can run a profile to evaluate the structure of data and verify that data columns are
populated with the types of information you expect. If a profile reveals problems in data, you can
define steps in your project to fix those problems.
A profile provides the following facts about data:
• The number of unique and null values in each column expressed as a number and a
percentage.
• The patterns of data in each column, and the frequencies with which these values occur.
• Statistics about the column values, such as the maximum and minimum lengths of values and
the first and last values in each column.
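
The facts listed above can be illustrated outside the product with a short Python sketch. It is not how the Informatica tools compute a profile. The function names `char_pattern` and `profile_column`, the pattern notation (X for a letter, 9 for a digit), the use of sorted order for the first and last values, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def char_pattern(value):
    """Map letters to 'X' and digits to '9'; keep other characters as-is."""
    return "".join(
        "X" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value
    )


def profile_column(values):
    """Return simple profile statistics for one column (a list of str or None)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    unique_count = len(set(non_null))  # distinct non-null values
    lengths = [len(v) for v in non_null]
    return {
        "null_count": null_count,
        "null_pct": 100.0 * null_count / total if total else 0.0,
        "unique_count": unique_count,
        "unique_pct": 100.0 * unique_count / total if total else 0.0,
        "patterns": Counter(char_pattern(v) for v in non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        # Assumption: "first" and "last" mean first and last in sorted order.
        "first_value": min(non_null) if non_null else None,
        "last_value": max(non_null) if non_null else None,
    }


# Hypothetical sample data: a code-like column with one null.
sample = ["AB12", "CD34", None, "AB12", "9999"]
profile = profile_column(sample)
```

Inspecting `profile["patterns"]` shows how often each pattern occurs, which is the kind of evidence a profile gives you before you decide which problems to fix.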

Developer Profiling

Multiple-object profiling, column profiling, join analysis, and mid-stream profiling are available in the
Developer tool.
As in the Analyst tool, you can add comments, change the sampling policies, and apply rules.

Collaboration - Comments

Comments exist in the context of a profile. Comments added in the Analyst tool are visible in the
Developer tool, and vice versa.

Collaboration - Tags

Tags can be applied only to columns within profiles.

Generate Mapping from a Profile

In the Analyst tool, an analyst can build profiles and apply rules. When the profile is complete, a developer can
generate a mapping from it and assign a target so that the rules are actually applied.
This leaves less room for error and saves developer time, because the developer does not need to recreate the
mapping and assign each rule individually.

Scorecard

Use scorecards to measure data quality progress. You can create a scorecard from a profile and
monitor the progress of data quality over time. Scorecards display the value frequency for columns
in a profile as scores.
You can create and view a scorecard in the Developer tool. You can run and edit the scorecard in
the Analyst tool.
Scorecarding demonstrates tangible progress in improving quality over one or several data quality
life cycles, and helps you refocus your data quality processes on problem areas.

Scorecarding and the data quality life cycle

Throughout the life cycle of a project, you can run scorecards to measure the level of
improvement in the data and to identify areas that need further development in the
mappings. For example, you can create a scorecard to measure data quality before you apply
rules. After you apply rules, you can create another scorecard to compare the effect of the rules on
the quality. This provides a higher-level view of the quality of the data and enables scores to be
maintained across the life cycle of the project.
Scorecarding provides a statistical determination of your data quality according to the six key
quality indicators and tracks data quality improvements over time.
1. It acts as a diagnostic tool at the start of the quality process.
2. When conducted again following completion of the process, it enables you to validate
your process methodology and identify areas in which data can be further improved.
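
As a rough illustration of measuring quality before and after rules are applied, the sketch below computes a score as the percentage of values that count as valid. The `score` function, the set of valid values, and the `standardize` step are hypothetical stand-ins. The scoring performed by the Analyst tool is configured in the product and is not reproduced here.

```python
def score(values, is_valid):
    """Return the percentage of values for which is_valid(value) is true."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if is_valid(v))
    return 100.0 * valid / len(values)


# Hypothetical set of acceptable values for a country column.
VALID_COUNTRIES = {"US", "IE", "DE"}


def is_valid_country(value):
    return value in VALID_COUNTRIES


def standardize(value):
    """Stand-in for a rule: trim whitespace and upper-case the value."""
    return value.strip().upper()


raw = ["US", "us ", "IE", "Germany"]
before = score(raw, is_valid_country)
after = score([standardize(v) for v in raw], is_valid_country)
```

Comparing `before` with `after` mirrors the diagnostic and validation roles described above: the first score shows where the data starts, and the second shows what the rule achieved and what remains to be fixed (here, the value "Germany").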

Data Quality Scorecard

Use the following rules and guidelines when you work with scorecards:
• You cannot add a column with the same name to an existing scorecard.
• You cannot add the same column twice to a scorecard even if you change the column name.

What are Reference Tables?

In IDQ, reference tables enable validation, parsing, enrichment, and enhancement of data. A
reference table contains data that you can use to standardize source data. Reference data can
include valid and standard values, and is used by analysts and developers in data quality
standardization and validation rules.
For example, during a data quality project, you create a reference table that contains the list of
valid values for an address column in source data. A developer can use the reference data in the
Developer tool to standardize the invalid values for the address.
When a new reference table is created:
• A corresponding database table is created in the staging (or specified reference table) area.
• Entries are made in the audit tables.

Reference table used for data validation and reporting:
• Contains ‘correct’ data only.
• Data can be validated against this reference set and flagged as valid or invalid in the output. The flags can be
rolled up to get a count of valid values that can then be used in a scorecard or report.
Reference table used for data standardization:
• Contains the ‘correct’ value in the valid column, and variants in the other columns.
• A variant that is found in the table is changed to the value in the ‘valid’ column.
Reference table used for data enrichment:
• Contains the data value in Column2 and the derived value in the valid column.
• The lookup finds the data value and returns the derived value from the valid column.
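
The three uses above can be sketched with plain Python lookups. This is only an illustration of the idea, not the behavior of any Informatica transformation. The table contents and the names `validate`, `standardize`, and `enrich` are hypothetical.

```python
# Hypothetical reference rows: the first item is the 'valid' value,
# the remaining items are known variants of it.
REFERENCE_ROWS = [
    ("Street", "St", "Str"),
    ("Avenue", "Ave", "Av"),
]

# Validation: only the 'correct' values matter.
VALID_VALUES = {row[0] for row in REFERENCE_ROWS}

# Standardization: every variant (and the valid value itself) maps to the valid value.
VARIANT_TO_VALID = {item: row[0] for row in REFERENCE_ROWS for item in row}

# Enrichment: a data value maps to a derived value (hypothetical example data).
CITY_TO_COUNTRY = {"Dublin": "Ireland", "Berlin": "Germany"}


def validate(value):
    """Flag a value as valid or invalid against the reference set."""
    return "valid" if value in VALID_VALUES else "invalid"


def standardize(value):
    """Replace a known variant with its valid value; leave unknown values unchanged."""
    return VARIANT_TO_VALID.get(value, value)


def enrich(value):
    """Return the derived value, or None when the value is not in the table."""
    return CITY_TO_COUNTRY.get(value)


inputs = ["Street", "Ave", "Road"]
flags = [validate(v) for v in inputs]
valid_count = flags.count("valid")  # roll-up that a scorecard or report could use
standardized = [standardize(v) for v in inputs]
```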

Reference Tables

Reference tables can be created in the following ways:

Text based:
• Create, edit, and import data quality dictionary files as reference tables.
Database sources:
• Managed: Import data directly from current reference data sources.
Once imported, the reference table and its data can be edited as normal.
• Unmanaged: Link to existing reference data sources.
Because the data is not imported and held in Informatica, neither the reference table nor the
data can be edited. Make any changes in the external source.
• Unmanaged Editable: Link to existing reference data sources.
The data is still maintained in the database and not in DQ, but the user can
edit the data. The metadata cannot be edited.
You can view the audit trail events on the Audit Trail view to see the changes made to a reference table.
You can view properties for the reference table in the Properties view.

Import Flat File as Reference Table

If you are planning on using the reference table as part of a process to standardize data (for data
quality processes) you need to select a “valid” or “correct” column. This is the valid value that the
data will be changed to in a lookup or DQ transformation.
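
To show why the choice of valid column matters, the following sketch parses delimited text and maps every value in a row to the value in the column chosen as valid. It is a simplified illustration, not the import wizard's logic. The function `load_reference_table`, its `valid_column` parameter, and the file content are hypothetical.

```python
import csv
import io


def load_reference_table(text, valid_column):
    """Parse delimited text and map every value in a row to that row's valid value."""
    lookup = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        valid = row[valid_column].strip()
        for cell in row:
            cell = cell.strip()
            if cell:
                lookup[cell] = valid
    return lookup


# Hypothetical file content: column 0 is chosen as the valid column.
FILE_TEXT = "Limited,Ltd,Ltd.\nIncorporated,Inc,Inc.\n"
lookup = load_reference_table(FILE_TEXT, valid_column=0)
```

With column 0 as the valid column, `lookup["Ltd."]` yields "Limited". Choosing a different column would change which value every variant is replaced with.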

Connect to a relational table

Select the “Connect to a relational table” option to use an external database as a source.

RTM data for an unmanaged reference table is exported to a .dic file.
For PowerCenter usage, it is treated the same as a standard reference table.
Data synchronization for staged RTM with an external database as source:
• No data synchronization takes place. Changes to the source database are not reflected in RTM; only
changes made through the Developer tool are reflected in RTM.
Data editing for unmanaged RTM:
• All data edits must be performed on the source database.
Database (structural) changes for unmanaged RTM:
• If columns are added to or removed from the source database, the unmanaged RTM must be
re-created for the changes to appear.
External tables for managed and unmanaged RTM must conform to the below types:

Module 5: Working in the Developer Tool

Project Permissions

Users assigned the Administrator role for a Model Repository Service inherit all permissions on all
projects in the Model Repository Service. Users assigned to a group inherit the group permissions.

Data Objects

In DQ, a physical data object (PDO) can be of two types: relational data object and customized data object. A relational
data object represents only the native metadata of the resources. A customized data object
represents both the native metadata and the configuration rules for read/write, such as rules
for filtering, joining data, and sort order.

With a logical view of data, you can achieve the following goals:
• Use common data models across an enterprise so that you do not have to redefine data to meet
different business needs.
• Apply a change in data attributes one time, and use one mapping to make this change in all
databases that use this data.
• Find relevant sources of data and present the data in a single view. Data resides in various
places in an enterprise, such as relational databases and flat files. You can access all data
sources and present the data in one view.
• Expose logical data as relational tables to promote reuse.

Physical data objects

Relational data object: A physical data object that uses a relational table, view, or synonym as a
source. For example, you can create a relational data object from a DB2 i5/OS table or an Oracle
view.
Customized data object: Create a customized data object if you want to perform operations such
as joining data, filtering rows, sorting ports, or running custom queries in a reusable data object.
For example, you can create a customized data object from two Microsoft SQL Server tables that
have a primary key-foreign key relationship.
Nonrelational data object: A physical data object that uses a nonrelational database resource as
a source. For example, you can create a nonrelational data object from a VSAM source.
Flat file data object: A physical data object that uses a flat file as a source. You can create a flat
file data object from a delimited or fixed-width flat file.
SAP data object: A physical data object that uses an SAP source.
WSDL data object: A physical data object that uses a WSDL file as a source.

Configuring physical data objects

Overview Properties
Physical data objects contain general properties, such as the data object name and description as
well as column names, datatype and precision. You can edit general properties in the Overview
view in the editor.
Read/Write Properties
Some of the property tabs available for the Read properties are: General, Columns, Format and
Runtime. The property tabs for the Write properties are General, Columns, and Runtime. These
tabs allow you to configure properties for flat file data objects including the input and output file
names and locations. The Data Integration Service uses this information when it reads data from a
file or writes data to a file. They also include format or query properties, depending on the type of
physical data object.
Parameters
Parameters are covered later in the course.
Advanced
The advanced tab contains properties such as code page, format, delimiters, text qualifiers and
date formats.

Physical data objects


A physical data object is the representation of data based on a flat file or relational table. Create a
physical data object in a project or folder. If the source of a data object changes, you can
synchronize the physical data object. When you synchronize a physical data object, the Developer
tool imports the object metadata from the file or table you select.
Use a relational PDO if you want to reuse the native metadata and customize read/write at the
mapping level. For example, you can provide different filters or join conditions at the mapping level.
A relational PDO represents all the native metadata, including datatypes, keys, and relationships.
Keys and relationships can be created or augmented within the PDO, even if they are not defined at the
database level. This type of PDO would probably be used in the majority of cases.
When you import relational data objects, the Developer tool retains the primary key information
defined in the database. When you import related relational data objects at the same time, the
Developer tool also retains foreign keys and key relationships. However, if you import related
relational data objects separately, you must re-create the key relationships after you import the
objects.

Customized Physical Data Objects

A customized data object represents both the native metadata and the configuration rules for
read/write. It is mainly used where the user wants to customize properties such as
filter, join condition, sort order, or SQL query, and then reuse them in multiple mappings. These
properties cannot be overridden again at the mapping level.

Logical Data Objects

We can see here that the Read mapping consists of two data objects that are joined on
OrderNo in a Joiner transformation.
The data for this object appears to come from a single object, although it is
sourced from two objects.

You can build a logical data object using another logical data object as a source. Read mappings can
contain transformations and mapplets.
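
The idea of two sources presented as a single object can be sketched in Python as a join on the shared key. Only the OrderNo key comes from the text above. The source rows, the other column names, and the function `read_logical_object` are hypothetical, and the sketch performs a simple inner join rather than reproducing the Joiner transformation.

```python
# Hypothetical source rows; only the OrderNo join key comes from the course text.
orders = [
    {"OrderNo": 1, "Customer": "Acme"},
    {"OrderNo": 2, "Customer": "Globex"},
]
order_lines = [
    {"OrderNo": 1, "Item": "Widget"},
    {"OrderNo": 1, "Item": "Gadget"},
    {"OrderNo": 2, "Item": "Sprocket"},
]


def read_logical_object(master, detail, key):
    """Inner-join two row sets on key and return one combined row set."""
    by_key = {}
    for row in master:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for d in detail:
        for m in by_key.get(d[key], []):
            joined.append({**m, **d})
    return joined


rows = read_logical_object(orders, order_lines, "OrderNo")
```

A consumer of `rows` sees one row set with the OrderNo, Customer, and Item columns and does not need to know that two sources were involved.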

Data Quality Mappings Defined

A mapping is a set of inputs and outputs that represent the data flow between sources and targets.
The inputs and outputs can be linked by transformation objects that define the rules for data transformation.
The Data Integration Service uses the instructions configured in the mapping to read, transform,
and write data.
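
As a conceptual sketch only, the read, transform, and write flow of a mapping can be pictured as a chain of functions applied to each source row. The names `run_mapping`, `trim_name`, and `upper_country` and the sample row are hypothetical. The Data Integration Service does not expose such an API.

```python
def run_mapping(source_rows, transformations):
    """Read each source row, apply the transformations in order, collect target rows."""
    target_rows = []
    for row in source_rows:
        for transform in transformations:
            row = transform(row)
        target_rows.append(row)
    return target_rows


def trim_name(row):
    """Hypothetical transformation: remove surrounding whitespace from the name."""
    return {**row, "name": row["name"].strip()}


def upper_country(row):
    """Hypothetical transformation: upper-case the country code."""
    return {**row, "country": row["country"].upper()}


source = [{"name": " Ann ", "country": "ie"}]
target = run_mapping(source, [trim_name, upper_country])
```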

Mapplets and Rules

A mapplet or rule is a reusable object containing a set of transformations that you can use in
multiple mappings.
Use a mapplet in a mapping, or validate the mapplet as a rule and use it in Informatica Analyst.
Validating the mapplet as a rule means that the rule is visible in the Analyst tool. Validation checks
that the mapplet contains only passive transformations.
When you use a mapplet in a mapping, you use an instance of the mapplet.
Any change made to the mapplet is inherited by all instances of the mapplet.
A mapplet is the implementation of a business rule. A business rule is a condition of the data that
must be true if the data is to be valid. In many cases, poor data quality is directly related to the
data's failure to satisfy a business rule.
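
A business rule packaged for reuse can be pictured as a function that returns exactly one output row for each input row, which is the sense in which a transformation is passive. The rule below (a postcode must be exactly five digits), the column names, and the function `postcode_rule` are hypothetical examples, not objects from the course.

```python
import re

# Hypothetical business rule: a postcode must be exactly five digits.
FIVE_DIGITS = re.compile(r"\d{5}")


def postcode_rule(row):
    """Return the row plus a validity flag; never adds or drops rows."""
    is_valid = bool(FIVE_DIGITS.fullmatch(row.get("postcode") or ""))
    return {**row, "postcode_valid": is_valid}


# The same rule reused on two different row sets, like two instances of one mapplet.
customers = [{"postcode": "12345"}, {"postcode": "12A45"}]
suppliers = [{"postcode": None}]
checked_customers = [postcode_rule(r) for r in customers]
checked_suppliers = [postcode_rule(r) for r in suppliers]
```

Because both row sets call the same function, a change to `postcode_rule` affects every place it is used, in the same way that a change to a mapplet is inherited by all of its instances.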

Applied within the labeler to identify and label strings.


Applied in the Parser to parse strings out.
Can choose System or Custom Content Sets.

Transformations

Transformations are objects that define the rules for data transformation. Use different
transformation objects to perform different functions. Transformation configurations can be unique
or reusable.
Transformations in a mapping represent the operations that are performed on the data.
To create a reusable transformation:
Click File->New->Transformation.
To create a non-reusable transformation:
In the context menu, click Add Transformation, or use the Transformation Palette.
Transformation Palette:
Contains the DQ and DI transformations.
Change the layout or show/hide the palette.
Double-click a transformation to bring it to the canvas.
Use Properties to configure transformations.

Autolink

Use Autolink to automatically link the ports from one transformation to another. It is possible to
Autolink by Name or Position. This removes the need to manually drag ports from one
transformation to another.
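
The difference between the two Autolink modes can be sketched in plain Python. This is only an illustration of the idea, not the Developer tool's implementation; the function names, the port names, and the case-insensitive name comparison are assumptions.

```python
# Hypothetical sketch of the two Autolink strategies, using plain lists of port names.

def autolink_by_name(source_ports, target_ports):
    """Link ports whose names match (case-insensitive comparison is an assumption)."""
    targets = {name.lower(): name for name in target_ports}
    return [(s, targets[s.lower()]) for s in source_ports if s.lower() in targets]


def autolink_by_position(source_ports, target_ports):
    """Link the first source port to the first target port, and so on."""
    return list(zip(source_ports, target_ports))


source = ["CustomerID", "Name", "State"]
target = ["Name", "State", "Zip"]

by_name = autolink_by_name(source, target)
by_position = autolink_by_position(source, target)
print(by_name)
print(by_position)
```

In this sample, linking by name connects only `Name` and `State`, while linking by position pairs ports purely by order, so it suits cases where the port order matches but the names differ.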

Propagating Port Attributes

Propagate port attributes to pass changed attributes to a port throughout a mapping. This removes
the need to manually update all following (or preceding) transformations with the changes made.
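
A minimal sketch of the idea, assuming a mapping can be modeled as an ordered list of transformations that each hold port attributes. The data structure, the `propagate` function, and the transformation names are hypothetical and only illustrate pushing one change to the following or preceding transformations.

```python
# Hypothetical sketch: a mapping as an ordered list of transformations,
# each holding port attributes keyed by port name.

def propagate(mapping, start_index, port, attrs, direction="forward"):
    """Copy changed attributes of `port` to following (or preceding) transformations."""
    if direction == "forward":
        indexes = range(start_index + 1, len(mapping))
    else:
        indexes = range(start_index - 1, -1, -1)
    for i in indexes:
        ports = mapping[i]["ports"]
        if port in ports:  # only transformations that carry the port are updated
            ports[port].update(attrs)
    return mapping


mapping = [
    {"name": "Read_Customers", "ports": {"Name": {"type": "string", "precision": 20}}},
    {"name": "Expression", "ports": {"Name": {"type": "string", "precision": 20}}},
    {"name": "Write_Customers", "ports": {"Name": {"type": "string", "precision": 20}}},
]

# Change the precision on the source, then push the change downstream.
mapping[0]["ports"]["Name"]["precision"] = 50
propagate(mapping, 0, "Name", {"precision": 50})
```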

Data Preview

Data preview runs on the server, not in the client, so the database or file must be accessible
from the server.
It provides immediate information on the output of a transformation or mapplet in a mapping.
Long-running data viewer jobs can be run in the background.

Introduction and Overview

Before the data can be cleansed, standardized, and de-duplicated, the specific problems must be
identified. This will typically be done by the Data Analyst in the Analyst tool. Analysts can
communicate the results to the Developer through shared projects, shared data objects, rules,
reference tables, comments, and tags.
The Developer can use the results of this analysis to define the standardization process while
maintaining this open channel of communication with the Analyst.
When the Developer logs in, a project will exist that was created by the Analyst in the Analyst tool.
The Developer can use the information that exists in this project to create the standardization
and matching (and consolidation) required.

Module 6: Profiling, Creating Mapplets and Rules

Why Profile data?


Data profiling allows users to discover the physical characteristics of each column in a file. It is
performed against the entire source unless constrained by the Analyst.
IDQ reviews the contents of each occurrence of a column in the source data to establish:
• Its minimum and maximum values.
• The character format of its data (pattern inferences).
• The distribution of distinct values it contains.
• Whether it ever takes on a null value.
• The most likely data types that could be used to store the data.
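
To make these statistics concrete, here is a small plain-Python sketch of a column profile. It is not how IDQ computes profiles; the `profile_column` function, the sample values, and the choice to treat `None` and the empty string as null are assumptions. It covers the minimum, maximum, distinct-value distribution, and null check only.

```python
from collections import Counter


def profile_column(values):
    """Compute basic column statistics; None or '' is treated as null (an assumption)."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "distinct": Counter(non_null),          # distribution of distinct values
        "has_nulls": len(non_null) < len(values),
    }


states = ["CA", "NY", "CA", None, "TX", "99"]
stats = profile_column(states)
print(stats)
```

Note that the values are compared as strings here, so `"99"` sorts before `"CA"`; an unexpected minimum like that is the kind of finding a profile is meant to surface.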
Profiling will be an iterative process, especially in the very early stages of the lifecycle, to
understand the content and structure of your data. As you profile a potential data source, you may
need to refine what data and/or systems really do support the Business Process. Expected data
may not be in the files or systems you start with. You may find that there is an earlier or more
appropriate source of information. You may also discover that you only need a subset of the data to
support the business flow and/or project goals.
Data governance is a set of processes that ensures important data assets are formally managed
throughout the enterprise. It ensures that data can be trusted. Data governance is a quality control
discipline for assessing, managing, using, improving, monitoring, maintaining and protecting
organizational information.

Developer Profiling

Multiple-object profiling, column profiling, join analysis, and mid-stream profiling are available
in the Developer tool.
As in the Analyst tool, it is possible to add comments, change the sampling policies, and apply rules.

(Only Join Analysis profiling is available with IDQ. Profile Models is available as part of
Advanced Profiling, and a license is required to use the additional functionality it provides.)

Developer Profiling

Configure sampling options to select the sample rows in the flat file.
Enable row drilldown to drill down to data records in the profile results.
Select Preview Columns: Only show columns that you select in the drilldown results.
Drilldown on live data: Access the row data that you drill down to on the source (default).
Drilldown on staged data: The Analyst tool stages the row data in the profiling warehouse.

*Primary Key, Functional Dependency Profiling and Data Domain Discovery are restricted to
Advanced Profiling users.

Values

Use frequency values to view a list of the distinct values that exist in a column. Reviewing the
values will give an indication whether the data needs to be standardized and cleansed. Reference
tables can be created/updated using these values.
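
As an illustration of what a value frequency list looks like, the following plain-Python sketch counts distinct values and their share of the column. The function name and the sample data are hypothetical.

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, count, percent) tuples, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]


genders = ["M", "F", "Male", "M", "f", "F", "M"]
freq = value_frequencies(genders)
for value, count, percent in freq:
    print(value, count, percent)
```

In this sample, the variants `M` and `Male`, and `F` and `f`, show up as separate values, which suggests the column needs standardizing and gives a starting list for a reference table.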

Patterns

Reviewing the patterns can give you an indication of potentially bad values in the data. For
example, if 99% of the data in a column (say, State) is in a certain format (XX: 2 uppercase letters)
and 1% of the data is in a completely different format (99, for example), the patterns highlight
anomalies in the format of the data in the field.
Pattern Values:
X Any uppercase or lowercase alphabetic character.
9 Any numeric character.
b Blank.
p Left parenthesis - `(`.
q Right parenthesis - `)`.
( and ) Used to indicate repetition. Up to 3 of either a letter or number are displayed in full
(like 999); over 3, they are displayed in the form 9(4), which is 9999.
Symbols are displayed as themselves.
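
The pattern notation above can be sketched in plain Python. This is only an approximation written from the rules listed here, not the profiling engine's algorithm; the function names and sample values are hypothetical.

```python
from itertools import groupby


def char_symbol(ch):
    """Map one character to its pattern symbol."""
    if ch.isalpha():
        return "X"
    if ch.isdigit():
        return "9"
    if ch == " ":
        return "b"
    if ch == "(":
        return "p"
    if ch == ")":
        return "q"
    return ch  # other symbols are displayed as themselves


def infer_pattern(value):
    """Build the pattern string, collapsing runs of more than 3 letters or digits."""
    parts = []
    for symbol, group in groupby(char_symbol(c) for c in value):
        n = len(list(group))
        if symbol in ("X", "9") and n > 3:
            parts.append(f"{symbol}({n})")   # e.g. 9999 becomes 9(4)
        else:
            parts.append(symbol * n)
    return "".join(parts)


for sample in ["CA", "99", "(555) 1234", "AB-12345"]:
    print(sample, "->", infer_pattern(sample))
```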

An "inferred" data type is a data type derived from the data, as opposed to the "documented" data
type, which is the data type that comes from the metadata.
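
A minimal sketch of data type inference, assuming each value arrives as a string and trying a few parsers in order. The type names, the date format `%Y-%m-%d`, and the sample column are assumptions made for illustration; the product's inference rules may differ.

```python
from collections import Counter
from datetime import datetime


def infer_value_type(text):
    """Guess a type for one string value; the date format is an assumed example."""
    for type_name, parse in (
        ("integer", int),
        ("decimal", float),
        ("date", lambda s: datetime.strptime(s, "%Y-%m-%d")),
    ):
        try:
            parse(text)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_column_types(values):
    """Return the share of values matching each inferred type, most common first."""
    counts = Counter(infer_value_type(v) for v in values)
    return [(t, n / len(values)) for t, n in counts.most_common()]


# The documented type (from metadata) might be string, but the data is mostly integers.
zip_codes = ["90210", "10001", "73301", "N/A"]
inferred = infer_column_types(zip_codes)
print(inferred)
```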

Mid Stream Profiling

With mid-stream profiling, Developers can rapidly profile the output of any transformation at any
stage of any mapping to instantly test and debug their logic.
For example, the Developer can profile the data outputs of a transformation and see a single
record that contains a value different from the rest, which could invalidate the transformations.
This is one of many new capabilities that deliver faster and more accurate development results.
Profiling is no longer a separate process. It is now unified with cleansing, and Developers can use
profiling anywhere in any mapping.
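
The idea can be sketched in plain Python by treating a mapping as a list of steps and recording a small profile after each one. This is a conceptual illustration only; the step functions, `run_with_profiles`, and the sample rows are hypothetical.

```python
from collections import Counter


def trim_names(rows):
    """Step 1: remove leading and trailing spaces from the name."""
    return [{**r, "name": r["name"].strip()} for r in rows]


def uppercase_state(rows):
    """Step 2: convert the state to upper case."""
    return [{**r, "state": r["state"].upper()} for r in rows]


def run_with_profiles(rows, steps, column):
    """Run each step in order and record the value counts of `column` after it."""
    profiles = {}
    for step in steps:
        rows = step(rows)
        profiles[step.__name__] = Counter(r[column] for r in rows)
    return rows, profiles


rows = [
    {"name": " Ann ", "state": "ca"},
    {"name": "Bob", "state": "CA"},
    {"name": "Cy", "state": "Calif"},
]
output, profiles = run_with_profiles(rows, [trim_names, uppercase_state], "state")
for step_name, profile in profiles.items():
    print(step_name, dict(profile))
```

The profile taken after `uppercase_state` shows a single `CALIF` record that differs from the rest, the kind of outlier that mid-stream profiling is meant to expose before the data moves further along the mapping.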

Use the web-based interface of the Analyst tool to collaborate on business projects.

The Analyst tool interface has headers and workspaces. A workspace is a web page where you perform
tasks based on licensed functionality that you access through tabs in the Analyst tool. You must also
have privileges to perform tasks in a workspace.

When you log in to the Analyst tool, the Start workspace appears. You can open multiple workspaces in
the Analyst tool interface.
For example, use the Discovery workspace to analyze the quality of data and metadata in source
systems. You can access a workspace through workspace tabs or through menus in the Analyst tool
header.
You can use assets in some workspaces to perform tasks such as running profiles, creating business
rules, or creating mapping specifications. An asset is a type of object in the Analyst tool that supports
business operations within an organization.

The Analyst tool header appears at the top of the Analyst tool user interface.

Glossary - Create and manage business terms, categories, glossaries, and policies.

Discovery - Create and manage data object profiles, flat file data objects, and table data objects.
View and manage Developer tool objects such as SAP and mainframe objects that are stored in
projects in the Model repository.

Design - Create and manage mapping specifications, reference tables, and rule definitions.

Note that My Checked Out Assets is visible only in a versioned repository.

Module 7: Cleansing, Standardizing and Enhancing Data

What is data standardization?

Data standardization focuses on improving data completeness, conformity, and consistency.

In the standardization module, you can:
• convert data to standardized formats
• parse data, i.e. convert multi-domain fields to single domain fields
• remove “noise”
• remove or replace bad or inconsistent data
• make use of a range of table-based and routine-based transformation techniques

Standardization Functions

Standardization Mappings can clean, transform, parse, and enrich data.


Common transformations used in standardization Mappings include Case Converter, Merge,
Standardizer and Parser.

Standardization Mappings

This illustration shows a standardization mapping under construction in the Developer workspace.
Many standardization processes are repeated across different mappings, such as removing noise,
deriving a nameprefix from a firstname, or deriving a currency from a country code.
Mapplets can be generated so that these processes are created once and then reused in several
mappings, or validated as rules and applied in the Analyst tool.

Case Converter

The Case Converter transformation can be used to standardize the case of your data.
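
As a rough illustration of what case standardization does, the following Python sketch converts a value to one of several case styles. It is not the Case Converter transformation itself; the function name `convert_case` and the mode names are assumptions made for this example.

```python
def convert_case(value: str, mode: str) -> str:
    """Return `value` converted to the requested case style.

    The mode names are hypothetical labels for this sketch, not product options.
    """
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "title":
        # Capitalize the first letter of every word.
        return value.title()
    if mode == "sentence":
        # Capitalize only the first letter of the whole string.
        return value.capitalize()
    raise ValueError(f"unknown case mode: {mode}")


for mode in ("upper", "lower", "title", "sentence"):
    print(mode, "->", convert_case("jOHN smith", mode))
```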

Merge Transformation

The Merge transformation can be used to concatenate two or more ports.
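
The idea of merging ports can be sketched in Python as a simple concatenation. The helper `merge_ports`, its separator parameter, and the choice to skip empty values are assumptions for this example and do not describe the transformation's actual options.

```python
def merge_ports(*ports: str, separator: str = " ") -> str:
    """Concatenate port values in order.

    Skipping empty values is an assumption of this sketch.
    """
    return separator.join(p.strip() for p in ports if p and p.strip())


print(merge_ports("John", "", "Smith"))
print(merge_ports("2017", "03", "01", separator="-"))
```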

Labeler Transformation

This transformation can be used to apply descriptions/labels to characters and strings.

Labeler – Reference Table

Case Sensitive – only items in the same case as defined in the reference table are labeled as
belonging to the reference table.

Replace Matches with Valid Values – a matched token is replaced with the valid value from the
reference table. For example:
Input: Mister John Smith
Reference table (valid value in the first column):
Mr    Mr.    Mister
Output:
Labeled output: Nameprefix John Smith
Tokenized output: <Mr><John><Smith>

Set Priority – if a token is identified as belonging to the reference table, the whole string is
labeled. For example:
Input: Informatica Inc
Label: Company
Reference table (company suffix):
Ltd    Ltd    Limited
Inc    Inc    Incorporated
Output:
Labeled output: Company
Tokenized output: <Informatica><Inc>
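
The reference-table options above can be sketched in Python. This is a simplified illustration, not the Labeler's implementation: the function name, its parameters, and the dictionary layout of the reference table (valid value mapped to its variants) are all assumptions.

```python
def label_with_reference_table(text, table, label,
                               case_sensitive=False, set_priority=False):
    """Return (labeled_output, tokenized_output) for `text`.

    `table` maps each valid value to a list of variants (hypothetical layout).
    """
    # Build a lookup from every accepted spelling to its valid value.
    lookup = {}
    for valid, variants in table.items():
        for spelling in (valid, *variants):
            lookup[spelling if case_sensitive else spelling.lower()] = valid

    labeled, tokens, matched = [], [], False
    for token in text.split():
        key = token if case_sensitive else token.lower()
        if key in lookup:
            matched = True
            labeled.append(label)
            tokens.append(lookup[key])  # replace the match with the valid value
        else:
            labeled.append(token)
            tokens.append(token)

    # With set_priority, one matching token labels the whole string.
    if set_priority and matched:
        labeled = [label]
    return " ".join(labeled), "".join(f"<{t}>" for t in tokens)


prefixes = {"Mr": ["Mr.", "Mister"]}
suffixes = {"Ltd": ["Limited"], "Inc": ["Incorporated"]}
print(label_with_reference_table("Mister John Smith", prefixes, "Nameprefix"))
print(label_with_reference_table("Informatica Inc", suffixes, "Company",
                                 set_priority=True))
```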

CMS Settings:

File / Disk Location for probabilistic models
• The service must have read and write permissions on the location.
• A network location is not recommended, for performance and synchronisation reasons.
• The default is ./classifier (INFA_HOME/tomcat/bin/classifier).

Content:
• A pre-trained classifier is included in the Core Accelerator / OOTB package.
• It covers language classification for the following languages:
  ar, de, en, es, fr, it, nl, pt, ru, tr
• New rule: rule_Classify_Language in the General_Data_Cleansing folder.
• Compiled using the Maximum Entropy algorithm.

Tips

• Use the OOTB / Core Accelerator rules to get started.
• There are no increased memory requirements for the DIS or CMS.
• If the algorithm used to compile the model does not match the algorithm configured in the
  Classifier transformation, execution causes errors.
• Japanese and Chinese data: tokens must first be separated by spaces.

Standardizer Transformation

This transformation can be used to remove noise, standardize common terms using reference
tables, replace custom text, and remove reference table matches (which makes it easier to
update reference tables).
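
A minimal Python sketch of these operations follows. It only illustrates the ideas of noise removal, term replacement, and removal of matched terms; the function name `standardize`, its parameters, and the sample reference data are hypothetical.

```python
def standardize(text, noise_chars=".,;:", replacements=None, remove_terms=None):
    """Remove noise characters, replace known terms, and drop unwanted terms.

    `replacements` maps a lower-case term to its standard form; `remove_terms`
    lists terms to delete. Both stand in for reference tables in this sketch.
    """
    replacements = replacements or {}
    remove_terms = {term.lower() for term in (remove_terms or [])}

    # 1. Remove noise characters.
    cleaned = text.translate(str.maketrans("", "", noise_chars))

    # 2. Replace or remove tokens found in the reference data.
    result = []
    for token in cleaned.split():
        key = token.lower()
        if key in remove_terms:
            continue
        result.append(replacements.get(key, token))
    return " ".join(result)


street_terms = {"st": "Street", "apt": "Apartment"}
print(standardize("12 Main St., Apt. 4", replacements=street_terms))
print(standardize("Informatica Inc.", remove_terms=["inc"]))
```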

Decision Transformation

The Decision transformation provides a flexible and user-friendly means to define and maintain
conditional constructs. It offers most of the features available in the Expression transformation
(assisted construction of decision statements, syntax highlighting, search, cut and paste,
drag and drop, etc.), as well as an intuitive “if-then-else” syntax and the ability to
specify multiple outputs in a single statement. The Decision transformation supports multi-strategy
configuration, which means that multiple conditional statements can be configured in a single
transformation interface.
Output ports can be added and edited in the Ports tab of the Decision transformation.
The Decision transformation does not validate that the datatype returned by a function is
compatible with the datatype of the associated output port.
Some Expression functions (e.g. functions that handle Binary and BigInt datatypes) are not
supported but will be added in a following release.
ABORT, AES_DECRYPT, AES_ENCRYPT, COMPRESS, DEC_BASE64, DECODE, DECOMPRESS, ERROR, IIF, TO_BIGINT
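
The idea of one if-then-else statement that sets several outputs can be sketched in Python. This is not Decision transformation syntax; the rule, the function name `decide`, and the two outputs are invented for illustration, and the five-digit check is a deliberately simplified postcode rule.

```python
def decide(country_code: str, postcode: str):
    """Return (postcode_status, review_flag) from a single conditional construct."""
    if not postcode.strip():
        status, review = "MISSING", "Y"
    elif country_code == "US" and not (postcode.isdigit() and len(postcode) == 5):
        # Simplified assumption: a US postcode is exactly five digits.
        status, review = "INVALID", "Y"
    else:
        status, review = "OK", "N"
    return status, review


print(decide("US", "12345"))
print(decide("US", "1234"))
print(decide("GB", ""))
```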

Data Standardization — Summary

Standardization:
• removes noise – e.g. white space, punctuation
• standardizes elements using reference tables, and standardizes elements for matching
• standardizes the case of data characters (lower/upper/sentence)
• enriches data – e.g. gender from first name, county and town from postcode
• stores output in new database columns or uses it to replace existing fields

Module 8: Parsing Data

Sample Standardization Output

This slide shows a step in the standardization process.

The Contact field has been parsed out into nameprefix, firstname, initial and surname outputs.

Parsing

• Output multiple tokens to a single field
• Configure delimiters and reverse enabling
• Parse using:
  - Token/Content sets
  - Regular expressions
    Users can select from Informatica-built RegEx or create their own.
    Each RegEx can have more than one output.
    Users must add the necessary output fields for each required RegEx output.
  - Reference tables
    Standardize as you parse
• Define outputs
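
To illustrate how one regular expression can feed more than one output field, here is a short Python sketch. The date pattern and the output field names are hypothetical and are not taken from the Informatica-built expressions.

```python
import re

# One regular expression with three capture groups, one per output field.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
OUTPUT_FIELDS = ("year_out", "month_out", "day_out")  # hypothetical port names


def parse_with_regex(value: str) -> dict:
    """Map each capture group to its own output field; empty if no match."""
    match = DATE_PATTERN.search(value)
    if match is None:
        return dict.fromkeys(OUTPUT_FIELDS, "")
    return dict(zip(OUTPUT_FIELDS, match.groups()))


print(parse_with_regex("Shipped 2017-03-01"))
print(parse_with_regex("n/a"))
```

Each capture group needs a matching output field, which mirrors the requirement to add an output for every required RegEx output.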

Parsing
Users can select from an Informatica-built RegEx or create their own. Each RegEx can have more
than one output, and an output field must be added for each required RegEx output. Users can
validate a RegEx in the UI.
Unparsed contains the remaining tokens that were not matched to any token definition created in
the Parser transformation.
For example, if the input is “White 15” and the only parser token defined is WORD, the output
is as follows:
WORD_Out = White
UnparsedField = 15
Overflow contains all the tokens that were parsed by one of the provided token
definitions/reference tables but could not be sent to an output port.
Informatica 9 introduced “Detailed Overflow” output ports. In addition to the Overflow port, this
option creates additional Overflow output ports based on individual (and unique) token
definitions.
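
The difference between parsed, overflow, and unparsed tokens can be sketched in Python. This is an illustrative model only: the function `parse_tokens`, the `ports_per_definition` limit, and the WORD pattern are assumptions, and the real Parser transformation may behave differently in detail.

```python
import re


def parse_tokens(text, token_definitions, ports_per_definition=1):
    """Split `text` on spaces and route each token to an output bucket."""
    outputs = {name: [] for name in token_definitions}
    detailed_overflow = {name: [] for name in token_definitions}
    unparsed = []
    for token in text.split():
        for name, pattern in token_definitions.items():
            if pattern.fullmatch(token):
                if len(outputs[name]) < ports_per_definition:
                    outputs[name].append(token)            # sent to an output port
                else:
                    detailed_overflow[name].append(token)  # matched, but no port left
                break
        else:
            unparsed.append(token)                         # matched no definition
    overflow = [t for tokens in detailed_overflow.values() for t in tokens]
    return {
        "outputs": outputs,
        "overflow": overflow,
        "detailed_overflow": detailed_overflow,
        "unparsed": unparsed,
    }


DEFINITIONS = {"WORD": re.compile(r"[A-Za-z]+")}
print(parse_tokens("White 15", DEFINITIONS))
print(parse_tokens("White Black 15", DEFINITIONS))
```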

Detailed Overflow

Overflow lists the value that has overflowed. Firstname_Overflow tells us that the item exists in
the firstname reference table but that no output port was available to receive it.

NER and Probabilistic Approaches

Models can be overcooked if too many exceptions are added without including representative
selections of other patterns.

Probabilistic Models

• Misspellings (abbreviations, truncated words) do not need to be included, as inferences can be
  made probabilistically.
• Conflicting labels do not need additional rules to arbitrate which is correct.
• Able to correctly label ambiguous terms that can have more than one meaning.

Named Entity Recognition

NER can be applied to various types of text, such as:
• Address line 1, to extract person, organization and location
• Long text descriptions
• News articles and ‘Big data’ – Facebook / Twitter feeds

Every data set contains patterns. A few patterns represent a large portion of the data, while smaller
and smaller portions of the data are represented in larger numbers of patterns.

NER uses a statistical approach to analyzing data patterns. It takes a sample of data and
makes inferences as to what a particular pattern represents. The broader and more representative
the sample, the better the predictive capability of the model. Models offer more reusability. Even if
the prevalence of patterns changes, NER can still recognize specific patterns. As new patterns are
encountered, they can be added to the model to broaden its capability.

All text follows repetitive patterns:
• The first 2,000 Address line 1 elements from a given database would have similar patterns to
  lines 2,001 - 1,000,000.
• Patterns can be analyzed and statistics gathered.
• Based on the patterns identified, the probability of a type of entity can be calculated and the
  text labeled appropriately.
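
The claim that a few patterns cover most of the data can be checked with a simple pattern profile. The Python sketch below is not an NER model; it only reduces each value to a coarse token pattern and counts how often each pattern occurs. The token classes (N, W, X), function names, and sample addresses are assumptions for this example.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Map each token to a coarse class: N = number, W = word, X = mixed/other."""
    classes = []
    for token in value.split():
        if token.isdigit():
            classes.append("N")
        elif token.isalpha():
            classes.append("W")
        else:
            classes.append("X")
    return " ".join(classes)


def pattern_coverage(values):
    """Return (pattern, count, share) tuples, most frequent pattern first."""
    counts = Counter(to_pattern(v) for v in values)
    total = sum(counts.values())
    return [(pattern, count, count / total) for pattern, count in counts.most_common()]


sample = ["12 Main Street", "7 Oak Avenue", "PO Box 44",
          "Flat 2b High Road", "99 Elm Road"]
for pattern, count, share in pattern_coverage(sample):
    print(f"{pattern:10} {count} {share:.0%}")
```

Statistics of this kind, gathered on a representative sample, are the raw material from which entity probabilities could then be estimated.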

Pattern Based Parsing

Reference tables containing patterns should be kept up to date with any new patterns that emerge.
New patterns then appear in the list of patterns in the parser and can be assigned a parsing
strategy.
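
Pattern-based parsing can be sketched as a lookup from a token pattern to a parsing strategy. The following Python example is a simplified illustration: the token classes, the strategy table, and the port names are all hypothetical, and an unknown pattern is returned without a strategy so that it can be added to the table.

```python
def classify(token: str) -> str:
    """Coarse token class: N = number, W = word, X = mixed/other."""
    if token.isdigit():
        return "N"
    if token.isalpha():
        return "W"
    return "X"


# Hypothetical pattern table: pattern -> output port for each token position.
PATTERN_STRATEGIES = {
    "N W W": ("house_number", "street_name", "street_type"),
    "W W N": ("box_label", "box_label", "box_number"),
}


def parse_by_pattern(value, strategies):
    """Return (pattern, parsed_ports); parsed_ports is None for unknown patterns."""
    tokens = value.split()
    pattern = " ".join(classify(t) for t in tokens)
    ports = strategies.get(pattern)
    if ports is None:
        return pattern, None  # new pattern: candidate for the reference table
    parsed = {}
    for port, token in zip(ports, tokens):
        # Tokens assigned to the same port are joined with a space.
        parsed[port] = f"{parsed[port]} {token}" if port in parsed else token
    return pattern, parsed


print(parse_by_pattern("12 Main Street", PATTERN_STRATEGIES))
print(parse_by_pattern("PO Box 44", PATTERN_STRATEGIES))
print(parse_by_pattern("Flat 2b High Road", PATTERN_STRATEGIES))
```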

Module 9: Matching Data

Matching Process

Step 1: Key Definition (Optional) - Define a Group Key

Step 2: Matching (Classic or Identity Matching)
• Pair Generation - Generate pairs of records for each set
• Scoring / Comparison - Compare records in each pair and assign a similarity score
• Processing / Clustering - Use candidate pairs and scores to arrange records into unique
  clusters of related records

Step 3: Consolidation (Optional)
• Association (Optional) - Associate created clusters to create “super clusters” based on
  multiple match criteria
• Consolidation (Optional) - Consolidate a cluster of duplicates to create a master survivor
  record


Matching Theory – Pair Generation

In this example, each record in the dataset is compared with all the others. The first record is
compared against the other four. The second record is then compared with the three below it (it
does not need to be compared with the first record again, because that pair has already been
generated). The third record is compared against two others, and finally the fourth record is
compared with the last record. This gives a total of 4 + 3 + 2 + 1 = 10 pairs.
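
The pair count above can be checked with a short Python sketch. The record labels R1 to R5 are
placeholders rather than values from the course data, and the code only illustrates the
counting; it says nothing about how IDQ generates pairs internally.

```python
from itertools import combinations

# Five placeholder records, standing in for the five rows of the example.
records = ["R1", "R2", "R3", "R4", "R5"]

# combinations() yields each unordered pair exactly once, so (R1, R2) is
# produced but the redundant (R2, R1) is not.
pairs = list(combinations(records, 2))

for left, right in pairs:
    print(left, right)

print(len(pairs))  # 10
```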


Matching Theory – Scoring

The next phase assigns a score to each pair, which indicates how similar the two records are. For
this example, we will assign a score from 0 to 1, with 1 indicating identical records. Looking at
the example, we can see that all but two of our pairs are totally unrelated.


Matching Theory – Processing

We now need to present this information to the user. We are going to use a clustered output. This
means that we output the same number of rows that we originally received, and we add an
identifier to each row. Rows that are similar will have the same identifier. We also need a way to
determine if two rows are related. To do this, we specify a threshold value. Any pairs with a score
equal to or above the threshold are deemed to match. In our example, we will pick a threshold of
0.8. We can see that only one pair meets the threshold. So the records that make up that pair will
go into the same cluster. Any record that doesn’t match with another record will be assigned to a
cluster of size 1. So, when we look at the output, if there is only one record in a particular cluster it
means that that record didn’t match with anything.
There are other output formats, but the one step common to all of them in this phase is
determining which records match by comparing their scores to the threshold.
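
As a rough illustration of the clustered output, the sketch below assigns a cluster identifier
to every input row using a simple union-find structure. The record labels, the scores, and the
function name `cluster` are all hypothetical; the scores were chosen so that, as in the
example, only one pair reaches the 0.8 threshold. This is not a description of IDQ's internal
algorithm.

```python
def cluster(record_ids, pair_scores, threshold):
    """Return {record_id: cluster_id}; pairs scoring >= threshold share a cluster."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the representative record of r's cluster.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(b)] = find(a)  # merge the two clusters

    cluster_ids = {}
    result = {}
    for r in record_ids:  # one output row per input row
        root = find(r)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        result[r] = cluster_ids[root]
    return result


# Hypothetical scores: pairs that are not listed are treated as unrelated.
records = ["R1", "R2", "R3", "R4", "R5"]
scores = {("R1", "R4"): 0.9, ("R2", "R5"): 0.6}

clusters = cluster(records, scores, threshold=0.8)
print(clusters)  # {'R1': 1, 'R2': 2, 'R3': 3, 'R4': 1, 'R5': 4}
```

R1 and R4 share cluster 1; every other record sits in a cluster of size 1, which means it did
not match anything.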


What is pre-processing and grouping?


In our example it is easy to generate the pairs we need, and the number of pairs is not large.
However, the number of matching operations grows quadratically with the number of records in your
dataset, because each record in a file (or database table) is compared against every other record
in the file, regardless of whether the records are remotely similar. As a result, the time taken
to run a matching mapping also grows quadratically. Grouping reduces the number of matching
operations by separating data records into several groups so that matching is performed on
records within each group only.
Simple example:
The record for “John Smith” need not be compared with the record for “Jane Adams”, as this is
obviously not a duplicate record. By grouping on the surname field, these two records are placed
into separate groups and therefore no comparison is performed. Through grouping (on an
appropriate key) we ensure that only related records are matched against each other, thus
minimizing the number of matching operations to be performed.
If there are n records in the database, then the number of matching operations = (n² - n)/2:
• for n = 1,000, the approximate number of operations = 500,000
• for n = 1,000,000, the approximate number of operations = 500,000,000,000
Therefore, matching mappings run on ungrouped data (even in small datasets of, e.g., 50,000
records, which require about 1.25 billion comparisons) can take a long time to run.
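
The effect of grouping on the comparison count can be sketched in a few lines of Python. The
function names and the sample surname keys are illustrative assumptions; the formula is the
(n² - n)/2 expression given above, applied once to the whole dataset and once per group.

```python
from collections import Counter


def comparisons(n):
    """Number of record pairs among n records: (n^2 - n) / 2."""
    return (n * n - n) // 2


def grouped_comparisons(group_keys):
    """Total pairs when records are compared only within their own group."""
    group_sizes = Counter(group_keys).values()
    return sum(comparisons(size) for size in group_sizes)


print(comparisons(1_000))      # 499500
print(comparisons(50_000))     # 1249975000
print(comparisons(1_000_000))  # 499999500000

# Hypothetical group keys for five records, grouped on surname.
keys = ["SMITH", "SMITH", "ADAMS", "SMITH", "ADAMS"]
print(comparisons(len(keys)))     # 10 pairs without grouping
print(grouped_comparisons(keys))  # 4 pairs with grouping (3 + 1)
```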


Grouping

Steps involved:

To improve matching success rates:
Pre-processing
• Remove noise symbols, punctuation
• Standardize terms (abbreviations, nicknames, languages)

To increase matching speed:
Grouping
• Use the Key Generator transformation to generate keys


IDQ Grouping and matching

When matching is performed, the records within each group file will be compared against each
other.
This means that records that are not likely to be duplicates will not be compared against each
other, thus saving processing time.


Key Generator

The Key Generator transformation is used to generate keys prior to matching.

The Key Generator transformation has three purposes:

1. Assign a unique identifier to each record in a dataset if one does not exist.
2. Apply an operation to a field so that it is more suitable for grouping. The Key Creation
Strategy is the algorithm that is applied to the key field.
3. Sort the outgoing data so that rows with the same group key value are contiguous. This is only
required for classic matching.


Key Creation Strategy

The original Soundex algorithm was developed by Robert C. Russell and Margaret King Odell and
first patented in 1918. The method is based on the six phonetic classifications of human speech
sounds (bilabial, labiodental, dental, alveolar, velar, and glottal), which in turn are based on
where you put your lips and tongue to make the sounds.

Soundex generates an alphanumeric code that represents the characters at the start of a string.
It is designed to generate the same value for syllables that sound similar but have different
spellings, and it is typically used prior to matching. It performs best with English-language
vocabulary. The algorithm allows slight differences or typos in name spellings to be recognized,
e.g. Smith and Smyth.
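
To make the idea concrete, here is a minimal sketch of the widely published American Soundex
rules (first letter plus three digits). It is an assumption that this matches the exact variant
used by the Key Generator transformation, and the function name `soundex` is chosen for
illustration only.

```python
def soundex(name, length=4):
    """Classic American Soundex code, e.g. 'Smith' -> 'S530'."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]               # the first letter is kept as-is
    prev = codes.get(letters[0], "")  # code of the previous relevant letter
    for ch in letters[1:]:
        digit = codes.get(ch, "")     # vowels, H, W and Y have no code
        if digit and digit != prev:
            result += digit
        if ch not in "HW":            # H and W do not separate equal codes
            prev = digit
    return (result + "0" * length)[:length]


print(soundex("Smith"))  # S530
print(soundex("Smyth"))  # S530
```

Because both spellings produce the same code, a group key built this way would place the two
records in the same group.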


Key Creation Strategy

If the String key creation strategy is used, the start position and the number of characters to
take from the input field are required. The user can also define which side of the string to
start from. For example, if the value in the input field is Publix and the length defined is 3
(starting from the left), the generated key is Pub. All records with the same key are grouped and
matched against each other.
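
A small sketch of the String strategy follows. The function name, the 1-based `start`
parameter, and the way positions are counted from the right are assumptions made for
illustration; only the Publix example with a length of 3 from the left comes from the text.

```python
def string_key(value, start=1, length=3, from_left=True):
    """Take `length` characters beginning at the 1-based position `start`.

    When from_left is False, positions are counted from the end of the
    string (an assumed interpretation of the "side" setting).
    """
    if from_left:
        begin = start - 1
        return value[begin:begin + length]
    end = max(0, len(value) - (start - 1))
    return value[max(0, end - length):end]


print(string_key("Publix"))                   # Pub
print(string_key("Publix", from_left=False))  # lix
```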


Midstream Profiling for Groups:

• Number of records per group
• NULL keys
• Single record groups

Changes can be made in the transformation and re-profiled to ensure the most appropriate settings
are implemented.
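
The three checks listed above can be approximated outside the tool with a few lines of Python.
The function name, the returned dictionary layout, and the sample keys are hypothetical, and
treating empty strings as NULL keys is an assumption of this sketch.

```python
from collections import Counter


def profile_groups(group_keys):
    """Summarize group keys: sizes, NULL keys, and single-record groups."""
    null_keys = sum(1 for k in group_keys if k is None or k == "")
    sizes = Counter(k for k in group_keys if k not in (None, ""))
    return {
        "records_per_group": dict(sizes),
        "null_keys": null_keys,
        "single_record_groups": sorted(k for k, n in sizes.items() if n == 1),
        "largest_group_size": max(sizes.values(), default=0),
    }


# Hypothetical group keys, e.g. Soundex codes of a surname column.
keys = ["S530", "S530", "A352", None, "", "J525", "S530"]
profile = profile_groups(keys)
print(profile)
```

Many single-record groups or NULL keys suggest that the key is too specific or that the source
field needs cleansing, while a very large group suggests that the key is too coarse.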


Key Gen Match Performance Analysis

• Number of groups
• Editable throughput
• Estimated matching time
• Number of comparisons
• Editable minimum Group size and groups below threshold
• Maximum group size and groups above threshold


Match Transformation

The Match transformation reads values in selected input columns and calculates match scores
representing the degree of similarity between pairs of values. The Match transformation performs
three operations:
• Match Type (Pair Generation)
• Strategies (Scoring)
• Match Output (Processing)


Edit Distance

It is defined for strings of arbitrary length.

The Edit Distance strategy derives a match score for two data values by calculating the minimum
“cost” of transforming one string into another by the insertion, deletion, and replacement of
characters. The result of this calculation is the edit distance: the lower the edit distance, the
greater the similarity between the two strings.
To turn the distance into a match score, divide the number of edits by the length of the longer
string. The result is the normalized distance, or difference, between the two strings. Subtract
this value from 1 to obtain the similarity: the higher the match score, the greater the
similarity.
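
The calculation can be sketched with the standard Levenshtein dynamic-programming algorithm,
where every insertion, deletion, or replacement costs 1. The function names are illustrative,
and it is an assumption that IDQ weights all three operations equally.

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes and replacements."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace (or keep) ca
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """1 - (number of edits / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - edit_distance(a, b) / longest


print(edit_distance("Smith", "Smyth"))    # 1
print(edit_similarity("Smith", "Smyth"))  # 0.8
```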


Jaro Distance

The Jaro Distance strategy is similar to the Edit Distance strategy in that it calculates the
general similarity between two data values; however, the Jaro Distance algorithm reduces the
match score for the pair of values if the first four characters are not the same.
The default penalty is 0.2. This value can be edited on the strategy's Parameters tab, but the
number of characters it applies to cannot be changed. If the first four characters are the same
and the difference is at the end of the field, the score will be higher.
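
For orientation, the sketch below implements the textbook Jaro similarity and the common
Jaro-Winkler adjustment, which rewards agreement in up to the first four characters. It is
not IDQ's formula: the exact way the product applies its 0.2 penalty is not described here, so
the `prefix_scale` of 0.1 (the usual textbook value) and both function names are assumptions.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity in the range 0..1."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro score boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961 (shared prefix MAR)
print(round(jaro_winkler("DWAYNE", "DUANE"), 3))   # 0.84  (shared prefix D)
```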


Bigram

This strategy is useful for comparing long, multi-token text strings (e.g. free-format address
lines or lines of user comment).
The Bigram matching calculation is based on the occurrence of consecutive characters in both data
strings in a matching pair: the algorithm looks for pairs of consecutive characters that are
common to both strings. The greater the number of identical pairs shared by the strings, the
higher the match score.
In the slide example comparing Dublin Post Office and Post Office Dublin, both the Edit Distance
and Jaro Distance strategies score low; however, the Bigram score is considerably higher because
of its ability to identify the matched pairs. For longer, multi-token text fields, the Bigram
score is the most accurate.
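
One common way to turn shared bigrams into a score is the Dice coefficient, sketched below. It
is an assumption that this is close to IDQ's calculation; the function names are illustrative,
and the strings are the ones from the example.

```python
from collections import Counter


def bigrams(text):
    """Count every pair of consecutive characters in the string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_similarity(a, b):
    """Dice coefficient: 2 * shared bigrams / total bigrams in both strings."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


score = bigram_similarity("Dublin Post Office", "Post Office Dublin")
print(round(score, 3))  # 0.882
```

Each string has 17 bigrams and 15 of them are shared, so the score stays high even though the
words appear in a different order.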


Hamming Distance

Hamming distance is a very effective algorithm when the position of the data characters is a key
factor.
The Hamming Distance strategy derives a match score for a pair of data strings by counting the
number of positions at which the characters differ, that is, the number of symbols that disagree.
Strictly speaking, this algorithm should only be applied to strings of the same length.
In the slide examples, where the strings were identical or had characters in the same position
and order, the strategies produced the same scores.
However, where the string lengths were different and the differences between characters were
closer to the start of the string, the Hamming Distance match scores were much lower.
When determining which strategy to use in matching, remember that the Hamming Distance places
much more emphasis on the position of key characters.
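
A minimal sketch of a position-by-position comparison is shown below. Counting the extra
characters of the longer string as mismatches, and normalizing by the longer length, are
assumptions made so that the function also accepts strings of unequal length; the sample values
are invented.

```python
from itertools import zip_longest


def hamming_distance(a, b):
    """Number of positions at which the two strings disagree.

    Positions beyond the end of the shorter string count as mismatches.
    """
    return sum(1 for ca, cb in zip_longest(a, b) if ca != cb)


def hamming_similarity(a, b):
    """1 - (mismatched positions / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - hamming_distance(a, b) / longest


print(hamming_similarity("20170301", "20170307"))  # 0.875, one digit differs
print(hamming_similarity("12345", "01234"))        # 0.0, every position is shifted
```

The second pair shares four characters, but because each one sits in a different position the
score drops to zero, which is why the strategy suits fixed-format values such as dates and
codes.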


Configuring Pair Generation


After applying the Key Generator transformation, we now have a “group key” field. We pass the
unique identifier and group key, along with the data, to the Match transformation so that the
pairs of records can be generated. IDQ looks for changes in the GroupKey. When it sees a change
in the GroupKey, it knows that it has received a discrete chunk of work (nothing in this group
interacts with any other data), and so it can organize the records in that chunk into pairs for
matching.
Pair Generation is configured using the “Match Type” tab. There are 4 options:
• Field Match (Single) – This is basic single source with groups.
• Field Match (Dual) – Dual source pair generation with groups.
• Identity Match (Single)
• Identity Match (Dual)
Selecting either of the Dual source options adds an extra input group to the transformation.
Note that for Identity Dual Source, both input groups must have the same structure (i.e. they
must have the same number of ports, and the corresponding ports must be of the same type).
When Dual Source is selected, the user can nominate which input group is the Master data set.
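
The idea of detecting a change in the group key can be illustrated with Python's
`itertools.groupby`, which also starts a new group whenever the key changes and therefore needs
the rows to be sorted by key first. The row layout (record id, group key), the function name,
and the sample data are hypothetical.

```python
from itertools import combinations, groupby
from operator import itemgetter


def generate_pairs(rows):
    """Generate pairs within each run of rows that share a group key.

    rows: (record_id, group_key) tuples, already sorted by group_key so that
    rows with the same key are contiguous.
    """
    pairs = []
    for _, group in groupby(rows, key=itemgetter(1)):
        ids = [record_id for record_id, _ in group]
        pairs.extend(combinations(ids, 2))  # pairs never cross a group boundary
    return pairs


# Hypothetical rows, sorted by a three-character group key.
rows = [(1, "ADA"), (4, "ADA"), (2, "SMI"), (3, "SMI"), (5, "SMI")]
print(generate_pairs(rows))  # [(1, 4), (2, 3), (2, 5), (3, 5)]
```

Five ungrouped records would produce 10 pairs; with these two groups only 4 pairs are generated.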


Match transformation - Strategies

The strategies commonly used in matching are:


• Edit Distance — calculates the minimum ‘cost’ of transforming one string into another by
inserting, deleting, and replacing characters. Typically used for matching columns containing
single words.
• Jaro Distance — focuses on differences between strings, with an emphasis on the initial
characters in the string.
• Bigram — determines the number of consecutive pairs of letters shared by two input strings.
Typically used for matching columns containing longer text strings.
• Hamming Distance — determines the number of places in which characters differ between two
input strings. Typically used for matching columns containing dates, codes or numeric values.


Match Transformation - Processing

Clusters
• Clusters of records with similarity scores above the threshold. If a record scores above the
threshold against records in more than one cluster, the overlapping clusters are merged together.

Matched Pairs
• Pairs of records with similarity scores above the threshold

Best Match
• Applicable to dual source matching only
• Master-Candidate pairs with the highest similarity score


Clustering
The Driver Score represents how the record in the cluster has scored against the driver record.
The Link Score represents how the record scored against the record that brought it into the
cluster. For performance reasons, unless you require the Driver Score, we suggest calculating and
storing only the Link Score.

Link Threshold = 0.8

Records   Pairs generated   Scores
A         A,B               A,B = 0.8
B         A,C               A,C = 0.6
C         B,C               B,C = 0.9

Cluster output:
A (A matched with B = 0.8)
B
C (C matched with B = 0.9)

Even though A and C didn’t match, C is brought into the cluster because it matched with B.

A’s Link record is B (and vice versa): Link Score = 0.8
C’s Link record is B (brought into the cluster by B): Link Score = 0.9

C is the Driver record. The Driver Score can be calculated, for example, by matching the other
records in the cluster against the driver record:
C,A = 0.6 (Driver Score)
C,B = 0.9 (Driver Score)
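
The Link and Driver scores in this example can be reproduced with the toy sketch below. The
function names and the rule that a record's link is the first qualifying pair encountered are
assumptions for illustration; the pair scores and the choice of C as driver come from the
example above.

```python
def link_scores(pair_scores, threshold):
    """For each record, the partner and score of the first pair that met the threshold."""
    links = {}
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            links.setdefault(a, (b, score))
            links.setdefault(b, (a, score))
    return links


def driver_scores(pair_scores, cluster, driver):
    """Score of every other record in the cluster against the driver record."""
    scores = {}
    for record in cluster:
        if record != driver:
            key = (record, driver) if (record, driver) in pair_scores else (driver, record)
            scores[record] = pair_scores[key]
    return scores


pair_scores = {("A", "B"): 0.8, ("A", "C"): 0.6, ("B", "C"): 0.9}

links = link_scores(pair_scores, threshold=0.8)
print(links)    # {'A': ('B', 0.8), 'B': ('A', 0.8), 'C': ('B', 0.9)}

drivers = driver_scores(pair_scores, cluster=["A", "B", "C"], driver="C")
print(drivers)  # {'A': 0.6, 'B': 0.9}
```

Note that A stays in the cluster even though its Driver Score of 0.6 is below the threshold,
because membership is decided by the Link Score.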


Driver ID

The driver record can be used to manipulate the master record when using the generated
sequence ID. It is the record with the lowest id (physically – also the one with the highest number)
in the cluster. This would be the first created record in the cluster.
Currently, during consolidation the last record in each cluster is set as the master record. The
driver score tells you how every other record in the cluster matches against this master record.
It is possible to manipulate the driver record by sorting the records ahead of consolidation.


Data Matching — Summary

The key objectives in data matching are to assess duplication, integrity, and accuracy in a data
source or between two data sources.
The data source(s) used can be files or database tables.
Key matching strategies include:
• Edit Distance – shorter names, i.e. one or two words
• Jaro Distance – text strings where matches between the initial characters are important
• Bigram – longer names, i.e. long strings
• Hamming Distance – postcodes, dates, codes or numbers

Module 10: Managing Exception and Duplicate Records


Exception Management Process

The high-level process flow is as follows:

Step 1 Source data is fed into IDQ.
Step 2 IDQ applies Data Quality rules. Data is split into exception records and
passed records.
Step 3 Passed records go straight to the target location.
Step 4 Exception records go to a staging area accessible from the Task Inbox.
This is where users manage bad records or consolidate suspect
duplicates.

Once users are satisfied with the cleansed and consolidated records, the records can be
pushed out to the target.

Data Routing Options:


• Good – score above the upper threshold, or records with no associated issues. These records
can be written directly to the target.
• Bad – score between the lower and upper thresholds, or records with associated issues. These
records are gated and sent for manual review.
• Rejected – score below the lower threshold. These records can also be gated and sent for
automatic processing or to a specific person. They may be too bad to be corrected.
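
As a rough sketch, score-based routing comes down to two comparisons. The function name and the example thresholds below are hypothetical, and the treatment of a score that is exactly equal to a threshold is an assumption of this sketch, since the notes above do not specify it. Issue-based routing is not shown.

```python
def route(score, lower, upper):
    """Return 'good', 'bad', or 'rejected' for one record score.

    Assumption: a score exactly equal to either threshold is treated as 'bad'.
    """
    if score < lower:
        return "rejected"   # too poor to review manually
    if score > upper:
        return "good"       # can be written straight to the target
    return "bad"            # gated and sent for manual review


for score in (0.95, 0.70, 0.20):
    print(score, route(score, lower=0.5, upper=0.9))
```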

Module 11: Managing and Deploying Workflows

What is a workflow?

A workflow is made up of workflow objects connected by sequence flows. A workflow object is an
event, a task, or a gateway:
• An event starts or ends the workflow.
• A task is an activity that runs a single unit of work in the workflow, such as running a mapping,
sending an email, or running a shell command. A task can require human input to complete.
• A gateway makes a decision to split and merge paths in the workflow.

A sequence flow connects workflow objects to specify the order in which the Data Integration Service
runs the objects. You can create a conditional sequence flow to determine whether the Data
Integration Service runs the next object.

You can define and use workflow variables and parameters to make workflows more flexible:
• A workflow variable represents a value that records run-time information and that can change
during a workflow run.
• A workflow parameter represents a constant value that you define before running a workflow.

You use workflow variables and parameters in conditional sequence flows and object fields. You also
use them to pass data between a task and the workflow.
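
To make the vocabulary concrete, the following Python sketch models sequence flows, a conditional sequence flow, a workflow variable, and a workflow parameter. It is a toy model, not the Data Integration Service: SequenceFlow, run_workflow, and the task and object names are all hypothetical. As a simplification, the sketch follows only the first outgoing flow that can be taken from each object.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SequenceFlow:
    source: str
    target: str
    # None means unconditional; otherwise a test over parameters and variables.
    condition: Optional[Callable[[dict], bool]] = None


def run_workflow(flows, tasks, parameters, start="Start", end="End"):
    """Walk from the start event to the end event and return the path taken."""
    variables = {}  # run-time values; tasks may change them during the run
    current, path = start, [start]
    while current != end:
        if current in tasks:
            tasks[current](variables, parameters)
        context = {**parameters, **variables}
        for flow in flows:
            if flow.source == current and (flow.condition is None or flow.condition(context)):
                current = flow.target
                break
        else:
            raise RuntimeError(f"no sequence flow can be followed from {current}")
        path.append(current)
    return path, variables


def profile(variables, parameters):
    variables["error_rows"] = 12  # pretend the task reported 12 bad rows


flows = [
    SequenceFlow("Start", "Profile"),
    SequenceFlow("Profile", "Notify", lambda c: c["error_rows"] > c["max_errors"]),
    SequenceFlow("Profile", "End"),
    SequenceFlow("Notify", "End"),
]
path, variables = run_workflow(flows, {"Profile": profile}, {"max_errors": 10})
print(path)
```

Here max_errors plays the role of a parameter (fixed before the run) and error_rows plays the role of a variable (set by a task during the run). The conditional sequence flow compares the two to decide whether the Notify object runs.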

Workflow Tasks/Events
Exclusive gateway – The Data Integration Service evaluates the outgoing conditional sequence flows
in the order that is set in the Exclusive gateway. Only one outgoing branch is taken.
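
The ordered, single-branch behavior can be sketched as follows. The function and branch names are hypothetical, and two details are assumptions of this sketch rather than statements from the notes: that evaluation stops at the first condition that is true, and that a default branch is used when none is true.

```python
def exclusive_gateway(branches, default, context):
    """Return the name of the single outgoing branch to follow.

    branches: ordered list of (name, condition) pairs.
    default:  branch used when no condition is true (an assumption of this sketch).
    """
    for name, condition in branches:
        if condition(context):
            return name  # first match wins; later conditions are not evaluated
    return default


branches = [
    ("review_exceptions", lambda c: c["exceptions"] > 0),
    ("load_target", lambda c: c["rows"] > 0),
]
print(exclusive_gateway(branches, "finish", {"exceptions": 3, "rows": 100}))
print(exclusive_gateway(branches, "finish", {"exceptions": 0, "rows": 100}))
```

Because the order matters, swapping the two entries in the list would change which branch is taken for a run that has both rows and exceptions.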

Workflow Tasks/Events

An Assignment task assigns a value to a user-defined workflow variable.


The assigned value can include literal values, workflow parameters, system-defined and user-defined
workflow variables, and any valid expression that uses functions and operators.

A Command task runs a single shell command. The Command task produces an exit code output
value that indicates whether the command ran successfully: a successful command returns 0, and an
unsuccessful command returns a non-zero value. The general task status output indicates whether
the Command task itself ran successfully. A Command task does not fail if the command is
unsuccessful.
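
The distinction between the command's exit code and the status of the step that ran it is the same one that a script makes when it runs a shell command. The Python sketch below is an analogy, not the Command task itself; run_command and the dictionary keys are hypothetical names. It uses the standard subprocess module, which does not raise an error when the command returns a non-zero exit code unless it is asked to.

```python
import subprocess


def run_command(command):
    """Run one shell command and report its exit code without raising."""
    completed = subprocess.run(command, shell=True)
    return {
        "exit_code": completed.returncode,
        "command_succeeded": completed.returncode == 0,
    }


print(run_command("exit 0"))
print(run_command("exit 3"))
```

In both calls the function itself completes normally. Only the reported exit code tells the caller whether the command succeeded, so any follow-up decision has to test that value explicitly.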

A Human task defines actions that one or more users perform on workflow data. A Mapping task
identifies records in a data set that contain unresolved data quality issues. Users who perform a
Human task use the Analyst tool to resolve the issues and update the data quality status of each
record.

Workflow Tasks/Events

A Mapping task runs a mapping during a workflow.


Outputs include the number of source rows and the number of target rows processed. They also include
the total number of rows that the mapping failed to read from the source or write to the target.
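
A toy version of these counters is sketched below. MappingOutputs, run_mapping, and the field names are hypothetical and do not correspond to the actual output names of a Mapping task. As a simplification, every row that fails is counted in a single error counter.

```python
from dataclasses import dataclass


@dataclass
class MappingOutputs:
    source_rows: int = 0  # rows taken from the source
    target_rows: int = 0  # rows written to the target
    error_rows: int = 0   # rows that could not be processed


def run_mapping(rows, transform):
    """Apply transform to each row and count successes and failures."""
    outputs, target = MappingOutputs(), []
    for row in rows:
        outputs.source_rows += 1
        try:
            target.append(transform(row))
            outputs.target_rows += 1
        except (KeyError, ValueError):
            outputs.error_rows += 1
    return target, outputs


rows = [{"amount": "10"}, {"amount": "x"}, {}]
target, outputs = run_mapping(rows, lambda row: int(row["amount"]))
print(outputs)
```

Counters like these are the kind of value that a later step in a workflow could test, for example to decide whether a review step is needed.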

A Notification task sends an email notification to specified recipients. Use the Administrator tool to
configure the email server properties for the Data Integration Service.
Recipients can be users and groups in the domain, or email addresses. Workflow parameters and
variables can be used to dynamically determine the recipients, email addresses, and email content.
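
The idea of filling in recipients and content from workflow values at run time can be sketched with the Python standard library. This is an illustration only: build_notification, the placeholder names, and the example addresses are hypothetical, the $name placeholder syntax belongs to Python's string.Template rather than to the Notification task, and the message is built but not sent.

```python
from email.message import EmailMessage
from string import Template


def build_notification(recipients, subject, body, values):
    """Fill the templates from workflow values and return an unsent message."""
    message = EmailMessage()
    message["To"] = ", ".join(Template(r).substitute(values) for r in recipients)
    message["Subject"] = Template(subject).substitute(values)
    message.set_content(Template(body).substitute(values))
    return message


# Hypothetical run-time values standing in for workflow variables.
values = {
    "reviewer": "reviewer@example.com",
    "task_name": "Review exceptions",
    "task_id": "42",
}
message = build_notification(
    ["$reviewer", "steward@example.com"],
    "Task $task_id created",
    "The task '$task_name' is waiting in your inbox.",
    values,
)
print(message["To"])
print(message["Subject"])
```

The first recipient is resolved from a run-time value while the second is a fixed address, which mirrors the mix of dynamic and static recipients described above.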

Building a Workflow

Sequence flows indicate the flow of control from one task to the next.

The flow can branch in one of two ways:
• Unconditionally (multiple links are followed from a single task)
• Using a gateway (only one branch is followed)

You can send an email notification from a Notification task. For example, when a task is created,
you can email the users who will perform the task to let them know that it has been
created, along with information such as the date, task name, and task ID.
When a task has been completed, you can send a notification to the Business Administrator to let
them know that it has been completed.

Module 12: Deployment: Executing Mappings Outside of Developer

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.2
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.3
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.4
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.5
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.6
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.7
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.8
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.9
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.10
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.11
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.12
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC
Unauthorized reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.
Module 12: Deployment: Executing Mappings
Outside of Developer
Unauthorized 12.13
reproduction or distribution prohibited. Copyright© 2017, Informatica and/or its affiliates.

Data Quality: Data Quality Management for Developers Copyright © 2017 Informatica LLC

Module 13: Importing and Exporting Objects

Object Import/Export Overview

Domain objects and design-time (Model Repository Service, MRS) objects can be imported and exported through the command line.
The command line utility supports selective export of objects from the MRS.

An object import control file gives users fine-grained conflict resolution, similar to that available in the Developer tool.

Domain objects (users, groups, roles, and connections) can now be selectively imported and exported, but through the command line only.

Advanced Import

Choosing to rename the imported object keeps the original target object unaltered.
Replacing the target destroys the existing target object and may affect other objects.

If multiple parent folders match, the full folder hierarchy of the source is compared against
the full folder hierarchy of the target.
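
The difference between renaming and replacing can be illustrated with a minimal Python sketch. It is not how the Developer tool implements import: the RepoObject class, the import_object function, the "_1" suffix scheme, and the object names are all hypothetical, chosen only to show that a rename leaves the target untouched while a replace overwrites it.

```python
from dataclasses import dataclass


@dataclass
class RepoObject:
    """Hypothetical stand-in for a repository object."""
    name: str
    definition: str


def import_object(target, incoming, resolution):
    """Import `incoming` into the `target` dict and return the stored name.

    `resolution` is "rename" or "replace" and only matters when an object
    with the same name already exists in the target.
    """
    if incoming.name not in target:
        target[incoming.name] = incoming
        return incoming.name

    if resolution == "replace":
        # The existing target object is overwritten. Anything that used
        # the old object now sees the new definition.
        target[incoming.name] = incoming
        return incoming.name

    if resolution == "rename":
        # The existing target object is left alone. The incoming object
        # is stored under a new, unused name (assumed suffix scheme).
        suffix = 1
        new_name = f"{incoming.name}_{suffix}"
        while new_name in target:
            suffix += 1
            new_name = f"{incoming.name}_{suffix}"
        target[new_name] = RepoObject(new_name, incoming.definition)
        return new_name

    raise ValueError(f"unknown resolution: {resolution}")


target_repo = {"rule_StdAddress": RepoObject("rule_StdAddress", "v1")}
renamed_as = import_object(
    target_repo, RepoObject("rule_StdAddress", "v2"), "rename"
)
print(renamed_as, target_repo["rule_StdAddress"].definition)
```

With "rename", the original rule_StdAddress keeps its definition and the imported copy sits beside it. Calling the same function with "replace" would overwrite the original, which is why replacing can affect other objects.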

Dependency Resolution

For complex imports, users may lose track of the dependencies of a given object, and of whether
the object is itself a dependency of another object.
To regain context, they can use the “Used in source by” node to identify all objects that depend on
that object, and the “Uses in source” node to identify all dependencies of that object.
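
The two views described above are the two directions of one dependency graph. The following Python sketch illustrates the idea under stated assumptions: the object names and the USES map are invented, and the functions follow links transitively, whereas the nodes in the import wizard may list direct relationships only.

```python
from collections import defaultdict

# Hypothetical dependency edges: each object maps to the objects it uses.
USES = {
    "m_CustomerCleanse": ["rule_StdAddress", "ref_Countries"],
    "rule_StdAddress": ["ref_Countries"],
    "ref_Countries": [],
}


def reachable(start, edges):
    """Return every object reachable from `start` by following `edges`."""
    seen = set()
    stack = list(edges.get(start, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, []))
    return seen


def invert(edges):
    """Flip the edges so each object maps to the objects that use it."""
    used_by = defaultdict(list)
    for parent, children in edges.items():
        for child in children:
            used_by[child].append(parent)
    return dict(used_by)


def uses_in_source(obj):
    """Analogue of "Uses in source": the dependencies of `obj`."""
    return reachable(obj, USES)


def used_in_source_by(obj):
    """Analogue of "Used in source by": the objects that depend on `obj`."""
    return reachable(obj, invert(USES))


print(sorted(uses_in_source("m_CustomerCleanse")))
print(sorted(used_in_source_by("ref_Countries")))
```

In this invented example the mapping uses both the rule and the reference table, and the reference table is used by both the rule and the mapping, so replacing the reference table on import could affect both of them.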

Import/Export

Import/Export can also be performed through the command line, but this is not covered in this
training.

Module 14: Troubleshooting

If you have access, log in to Informatica Administrator and verify whether the services are down. If they are,
close Developer, restart the services, and once they are up and running, try again.
Alternatively, close Developer, contact the Administrator, and ask them to review the services.
