User Guide
Table of Contents

Preface
    Informatica Resources
        Informatica My Support Portal
        Informatica Documentation
        Informatica Web Site
        Informatica How-To Library
        Informatica Knowledge Base
        Informatica Support YouTube Channel
        Informatica Marketplace
        Informatica Velocity
        Informatica Global Customer Support
Index
Preface
The Informatica Data Quality User Guide is written for Informatica users who create and run data quality processes in
the Informatica Developer and Informatica Analyst client applications. The Informatica Data Quality User Guide
contains information about profiles and other objects that you can use to analyze the content and structure of data and
to find and fix data quality issues.
Informatica Resources
Informatica My Support Portal
As an Informatica customer, you can access the Informatica My Support Portal at http://mysupport.informatica.com.
The site contains product information, user group information, newsletters, access to the Informatica customer
support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base,
Informatica Product Documentation, and access to the Informatica user community.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have
questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email
at infa_documentation@informatica.com. We will use your feedback to improve our documentation. Let us know if we
can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your product,
navigate to Product Documentation from http://mysupport.informatica.com.
Informatica Marketplace
The Informatica Marketplace is a forum where developers and partners can share solutions that augment, extend, or
enhance data integration implementations. By leveraging any of the hundreds of solutions available on the
Marketplace, you can improve your productivity and speed up time to implementation on your projects. You can
access Informatica Marketplace at http://www.informaticamarketplace.com.
Informatica Velocity
You can access Informatica Velocity at http://mysupport.informatica.com. Developed from the real-world experience
of hundreds of data management projects, Informatica Velocity represents the collective knowledge of our
consultants who have worked with organizations from around the world to plan, develop, deploy, and maintain
successful data management solutions. If you have questions, comments, or ideas about Informatica Velocity,
contact Informatica Professional Services at ips@informatica.com.
Informatica Global Customer Support
Use the following telephone numbers to contact Informatica Global Customer Support:
North America / South America: Toll Free
Asia / Australia: Toll Free
CHAPTER 1
Use Informatica Data Quality to perform the following tasks:
- Profile data. A profile analyzes the content and structure of data. Profile results identify strengths and weaknesses in data and help you define a project plan.
- Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.
- Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
- Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.
- Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.
- Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to analyze, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.
- Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You can run a mapping to capture any exception record that remains in a data set after you run other data quality processes. You review and edit exception records in the Analyst tool or in Informatica Data Director for Data Quality.
- Create reference data tables. Informatica provides reference data that can enhance several types of data quality process, including standardization and parsing. You can create reference tables using data from profile results.
- Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.
- Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.
- Export mappings to PowerCenter. You can export and run mappings in PowerCenter to reuse the metadata for physical data integration or to create web services.
CHAPTER 2
Reference Data
This chapter includes the following topics:
- Reference Data Overview
- User-Defined Reference Data
- Informatica Reference Data
- Reference Data and Transformations
- Reference Tables
- Content Sets
Note: Create a reference object with incorrect values when you want to search a data set for incorrect values.
Common examples of project data columns that can contain reference data include employee codes and customer names.
You can create reference data objects in the Developer tool and Analyst tool. For example, you can create a reference
table from column profile data. You can export reference tables to the file system.
The Data Quality Content Installer file set includes Informatica reference data objects that you can import.
Reference Tables
A reference table contains the standard versions of a set of data values and alternative versions of the values that
might occur in business data.
You add a reference table to a transformation in the Developer tool. You use the transformations to find reference data
values in input data and to write the alternative values as output data.
You create a reference table in the following ways:
Create an empty reference table and enter the data values.
Create a reference table from data in a flat file.
Create a reference table from data in another database table.
Create a reference table from column profile results.
You can create a reference table in the Developer tool or Analyst tool. You can edit reference table data in the
Developer tool. You can edit reference table data and metadata in the Analyst tool. When you create a reference table,
the Model repository stores the table metadata as a repository object.
Before you edit the table, verify that you have the required privileges on the following services:
Content Management Service. To edit reference table data, you need the Edit Reference Table Data privilege. To
edit reference table metadata, you need the Edit Reference Table Metadata privilege.
Model Repository Service. To view the project that contains the reference table, you need the Create Project
privilege.
Use the Security options in the Administrator tool to review or update the service privileges.
To edit data in an unmanaged reference table, also verify that you configured the reference table object to permit edits.
Note: If you edit the metadata for an unmanaged reference table in a database application, use the Analyst tool to
synchronize the Model repository with the database table. You must synchronize the Model repository and the
database table before you use the unmanaged reference table in the Developer tool.
Content Sets
A content set is a Model repository object that you use to store reusable content expressions. A content expression is
an expression that you can use in Labeler and Parser transformations to identify data.
You can create content sets to organize content expressions into logical groups. For example, if you create a number
of content expressions that identify Portuguese strings, you can create a content set that groups these content
expressions. Create content sets in the Developer tool.
Content expressions include character sets, pattern sets, regular expressions, and token sets. Content expressions can be system-defined or user-defined. System-defined content expressions cannot be added to content sets. User-defined content expressions can be reusable or non-reusable.
Character Sets
A character set contains expressions that identify specific characters and character ranges. You can use character
sets in Labeler transformations that use character labeling mode.
Character ranges specify a sequential range of character codes. For example, the character range "[A-C]" matches
the uppercase characters "A," "B," and "C." This character range does not match the lowercase characters "a," "b," or
"c."
Use character sets to identify a specific character or range of characters as part of labeling operations. For example,
you can label all numerals in a column that contains telephone numbers. After labeling the numbers, you can identify
patterns with a Parser transformation and write problematic patterns to separate output ports.
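The case-sensitive behavior of a character range can be illustrated with a standard regular expression outside the Developer tool. This is a hedged analogy, not the tool's own character-set syntax:

```python
import re

# Hypothetical illustration: the character range [A-C] matches the uppercase
# letters A, B, and C but not their lowercase counterparts.
matches = re.findall(r"[A-C]", "Aardvark Bee cat")

# Matching lowercase characters as well requires a wider range.
ci_matches = re.findall(r"[A-Ca-c]", "Aardvark Bee cat")
```

Here `matches` contains only the uppercase letters A and B, because "cat" contains no character in the range [A-C].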
A character set has the following properties:
- Label
- Standard Mode. Enables a simple editing view that includes fields for the start range and end range.
- Start Range
- End Range
- Advanced Mode
- Range Character
- Delimiter Character
Classifier Models
A classifier model analyzes input strings and determines the types of information that they contain. You use a
classifier model in a Classifier transformation.
Use a classifier model when input strings contain significant amounts of data. For example, you can use a classifier
model to identify the subject matter in a set of documents. You export the text from each document, and you store each
document as a separate field in a single data column. The Classifier transformation reads the data and classifies the
subject matter in each field according to the labels defined in the classifier model.
The classifier model contains the following columns:
Data column
A column that contains the words and phrases that are likely to exist in the input data. The transformation
compares the input data with the data in this column.
Label column
A column that contains descriptive labels that can define the information in the data. The transformation returns a
label from this column as output.
The classifier model also contains compilation data that the Classifier transformation uses to calculate the correct
information type for the input data.
You create a classifier model in the Developer tool. The Model repository stores the metadata for the classifier model object. The column data and compilation data are stored in a file in the Informatica directory structure.
Pattern Sets
A pattern set contains expressions that identify data patterns in the output of a token labeling operation. You can use
pattern sets to analyze the Tokenized Data output port and write matching strings to one or more output ports. Use
pattern sets in Parser transformations that use pattern parsing mode.
For example, you can configure a Parser transformation to use pattern sets that identify names and initials. This transformation uses the pattern sets to analyze the output of a Labeler transformation in token labeling mode. You can configure the Parser transformation to write names and initials in the output to separate ports.
Probabilistic Models
A probabilistic model identifies data values by the types of information that they represent and by the position of the
values in an input string.
You use probabilistic models with the Labeler and Parser transformations.
A probabilistic model contains the following columns:
An input column that represents the data on the input port. You populate the column with sample data from the
input port. The model uses the sample data as reference data in parsing and labeling operations.
One or more label columns that identify the types of information in each input string. You add the label columns to
the model, and you assign labels to the data values in each string. Use the label columns to indicate the correct
position of the data values in the string.
You create a probabilistic model in the Developer tool. The Model repository stores the metadata for the probabilistic model object. The column data and compilation data are stored in a file in the Informatica directory structure.
The probabilistic model also contains compilation data that the transformations can use to calculate the correct
information type for the input data. You update the model logic when you compile the model in the Developer tool.
Regular Expressions
In the context of content sets, a regular expression is an expression that you can use in parsing and labeling
operations. Use regular expressions to identify one or more strings in input data. You can use regular expressions in
Parser transformations that use token parsing mode. You can also use regular expressions in Labeler transformations
that use token labeling mode.
Parser transformations use regular expressions to match patterns in input data and parse all matching strings to one
or more outputs. For example, you can use a regular expression to identify all email addresses in input data and parse
each email address component to a different output.
Labeler transformations use regular expressions to match an input pattern and create a single label. Regular
expressions that have multiple outputs do not generate multiple labels.
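The email-parsing example above can be sketched with a standard regular expression. This is a minimal illustration of a two-output pattern, assuming the usual capture-group semantics; it is not the expression syntax shipped with the product:

```python
import re

# Hypothetical sketch: a regular expression with two outputs (capture groups)
# splits an email address into account-name and domain-name components.
EMAIL = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")

def parse_email(text):
    match = EMAIL.search(text)
    if match is None:
        return None
    return match.groups()  # (account name, domain name)

parsed = parse_email("Contact sales@example.com for details")
```

Here `parsed` is the pair `('sales', 'example.com')`, one value per output.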
A regular expression content expression has the following properties:
- Number of Outputs
- Regular Expression
- Test Expression
- Next Expression
- Previous Expression
Token Sets
A token set contains expressions that identify specific tokens. You can use token sets in Labeler transformations that
use token labeling mode. You can also use token sets in Parser transformations that use token parsing mode.
Use token sets to identify specific tokens as part of labeling and parsing operations. For example, you can use a token set to label all email addresses that use an "AccountName@DomainName" format. After labeling the tokens, you can use the Parser transformation to write email addresses to output ports that you specify.
A token set has the following properties: Name, Description, and Label. The Regular Expression, Test Expression, Next Expression, and Previous Expression properties take regular expression values.
A character set has the following properties: Label, Start Range and End Range (Standard Mode), and Range Character and Delimiter Character (Advanced Mode). Each range property takes a character value.
1. In the Object Explorer view, select the project or folder where you want to store the content set.
2.
3.
4. Optionally, select Browse to change the Model repository location for the content set.
5. Click Finish.

To add a content expression to a content set:
1. Open a content set in the editor and select the Content view.
2.
3. Click Add.
4.
5.
6. If you selected the Token Set expression view, select a token set mode.
7. Click Next.
8.
9. Click Finish.

Tip: You can create content expressions by copying them from another content set. Use the Copy To and Paste From options to create copies of existing content expressions. You can use the CTRL key to select multiple content expressions when using these options.
1. Open the mapping that contains the transformation you will connect to the Labeler or Parser.
2.
3. Under Column Profiling, select the column you want to add to the probabilistic model.
4.
5.
6. If you want to add a subset of column values to a probabilistic model, follow these steps:
   a. Use the Shift or Ctrl keys to select one or multiple values from the editor.
   b. Right-click the values and select Send to > Export Results to File.
   If you want to add all column values to a probabilistic model, click the option to Export Value Frequencies to File.
7. In the Export dialog box, enter a file name. You can save the file on the Informatica services machine or on the Developer client machine. If you save the file on the client machine, enter a path to the file.

You can use the file as a data source for the Label or Data column in the probabilistic model.
CHAPTER 3
Classifier Models
This chapter includes the following topics:
- Classifier Models Overview
- Classifier Model Structure
- Classifier Model Reference Data
- Classifier Model Label Data
- Classifier Scores
- Classifier Model Views
- Classifier Model Filters
- Creating a Classifier Model from a Data Object
- Copy and Paste Operations
A classifier model uses natural language processes to identify the types of information in the text. Natural language processes detect relevant words in the input string and disregard words that are not relevant.
The input data strings contain multiple values. For example, you can create a data column that contains the text of email messages.
2. You create a classifier model that contains sample text for each language.
   Note: You can use sample data from the email messages data as source data for the model. Copy the email message text to a file or database table, and create a data source from the file or table in the Model repository.
3.
4. You add the transformation to a mapping, and you connect the transformation ports to the data source and data targets. You create a data target for each language.

When you run the mapping, the Classifier transformation analyzes the email messages and writes the email text to the correct data target. You can share the data target with the team members in the appropriate support center.
To update the compilation data, open the model in the Developer tool and click Compile.
Before you create a model, verify that the reference data includes the types of text that you expect to find when you run the mapping.
You can use the mapping source data to create a classifier model. Select a sample of the source data and copy the
data sample to the model.
Consider the following rules and guidelines when you work with classifier model reference data:
A reference data field can be of any length. You can enter pages of text into each data field.
You import reference data from a data object.
You cannot edit reference data values. However, you can delete a data row.
2.
3.
4. Browse the Model repository and select the data object that you want to use. Click Next.
   Note: Do not select a social media data object as a data source.
5. Review the columns on the data object, and select a column to add as a data column or label column for the model. You can add a reference data column and a label column in the same operation.
   - To use a data source column as the reference data column in the model, select the column name and click Data. You can select multiple data columns. The classifier model merges the contents of the columns you select into a single column.
   - To use a data source column as the label column for the model, select the column name and click Label.
   Click Next.
6.
7. After you append the data, verify that the data rows you added include label values.
1. Open the classifier model in the Developer tool. To open the model, select the model name in the content set and click Edit.
2. Select the row that contains the data you want to delete. You can select a single row, multiple rows, or all rows.
3. Click Delete.
2.
3. Filter the data rows in the model to display the rows that do not have a label. To display the rows that do not have a label, clear all label names in the Labels panel.
4. Select one or more data rows. You can use the Select All option to select all the rows that appear. The model adds a check-mark to the rows that you select.
5. Browse the label values in the model, and select a label to apply to the data rows. The model assigns the label to the rows you selected.
6. Compile the model to add the label names to the classifier model logic.

If you assign a label that you cleared from the display of label names, the model hides the rows. Select the label name in the Labels panel to view the rows.
1.
2.
3. Click Properties.
4. In the Manage Labels dialog box, select one or more labels to delete. You can select multiple labels.
5. Click Delete.
6.
Classifier Scores
A Classifier transformation compares each row of input data with every row of reference data in a classifier model. The
transformation calculates a score for each comparison. The scores represent the degrees of similarity between the
input row and the reference data rows.
When you run a mapping that contains a Classifier transformation, the mapping returns the label that identifies the
reference data row with the highest score. The score range is 0 through 1. A high score indicates a strong match
between the input data and the model data.
Review the classifier scores to verify that the label output accurately describes each row of input data. You can also
review the scores to verify that the classifier model is appropriate to the input data. If the transformation output
contains a large percentage of low scores, the classifier model might be inappropriate. To improve the comparisons,
compile the model again. If the compiled model does not improve the scores, replace the model in the
transformation.
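The scoring behavior described above can be sketched in a few lines. The real scoring algorithm is internal to the Classifier transformation; this sketch substitutes a generic string-similarity ratio purely to show how the highest-scoring reference row selects the output label:

```python
from difflib import SequenceMatcher

# Hypothetical model: each reference data row carries a label.
model = [
    ("the quick brown fox jumps", "en"),
    ("le renard brun rapide saute", "fr"),
]

def classify(text):
    # Compare the input row with every reference row and keep the best score.
    # Scores fall in the range 0 through 1; a high score is a strong match.
    scores = [(SequenceMatcher(None, text, data).ratio(), label)
              for data, label in model]
    best_score, best_label = max(scores)
    return best_label, best_score

label, score = classify("the quick brown dog jumps")
```

With this toy model the English reference row scores highest, so `label` is `"en"`.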
The default view of a classifier model displays the reference data rows and label values. For example, a classifier model can contain data for language classification.
Use the classifier model views to perform the following tasks:
- Verify that the data rows use labels in the classifier model. If a data row does not use a label, add a label to the row.
- Find data values in reference data rows. Use the filter in the default view to find data values in the reference data. Verify that the reference data overlaps with the source data in a mapping.
- Find a data value within a reference data row. Use the filter in the detailed view when you need to verify that a reference data row contains a data value. A data row can contain a large quantity of data values.
2.
3.
4. To verify that all data strings use a label, clear all the label values. The model displays any string that does not use a label.
5.
2.
3.
4.
5.
6. Type the search value in the search field below the data row. The model highlights the first instance of the value in the row.
7. Click the Down arrow to find the next instance of the value in the row. Use the Up and Down arrows to move through the values in the data row.
2.
3.
4.
5. Browse the Model repository and select the data object that contains the reference data. Click Next.
6. Review the columns on the data object, and select a column to add as reference data values or label values for the model.
   - To add a data column as reference data, select the column name and click Data.
   - To use a data column as a source for label values, select the column name and click Label.
   Click Next.
7.
8.
To copy a classifier model to another content set:
2.
3.
4. Click OK.
The Developer tool copies the classifier model to the selected content set.

To paste a classifier model into a content set:
2.
3.
4. Click OK.
The Developer tool pastes the classifier model to the current content set.
CHAPTER 4
Probabilistic Models
This chapter includes the following topics:
- Probabilistic Models Overview
- Probabilistic Model Structure
- Probabilistic Model Reference Data
- Probabilistic Model Label Data
- Probabilistic Model Advanced Properties
- Creating an Empty Probabilistic Model
- Creating a Probabilistic Model from a Data Object
- Copy and Paste Operations
The Labeler transformation writes the labels to an output port in the same format as the input string.
Use a probabilistic model in a Parser transformation to write each value in an input string to a new port. The Parser
transformation creates an output port for each data category that you define in the probabilistic model.
Probabilistic models use natural language processes to identify the type of information in a string. Natural language
processes detect relevant terms in the input string and disregard terms that are not relevant.
You compile a probabilistic model in the Developer tool. When you compile a model, you create associations between similar data values in the model. The Labeler and Parser transformations use the compiled data to analyze the values in the input strings.
The following example shows input rows that contain identification numbers, person names, and organization names:

Field 1  | Field 2          | Field 3
19132954 | AIM SECURITIES   | PETRIE TAYBRO
10110169 | JASE TRAPANI     |
10111786 | JAN SEEDORF      |
10112299 | FELIX LEVENGER   | HARVARD MAGAZINE
10112036 | RICHARD TREMBLAY | BERGER ASSOCIATES
10111101 | DAREEN HULSMAN   |
19131385 | PATRICK MCKINNIE | LAKENYA PASKETT
15954710 |                  |
When you run the mapping, the Labeler transformation compares the input data with the probabilistic model reference
data. The Labeler transformation assigns a label to each input value. The transformation writes the labels to an output
port. Each output row contains a set of labels that defines the data structure on the corresponding input row.
The Labeler adds the following labels to the output port:
For example, the input row "Sunnydream Orange Juice Unsweetened 12 oz" can generate the output labels Product Type, Product Details, and Product Size.
When you use a probabilistic model in a Labeler transformation, the Labeler assigns a label value to each value in the
input row. For example, the transformation labels the string "Franklin Delano Roosevelt" as "FIRSTNAME
MIDDLENAME LASTNAME."
When you use a probabilistic model in a Parser transformation, the Parser writes each input value to an output port
based on the label that matches the value. For example, the Parser writes the string "Franklin Delano Roosevelt" to
FIRSTNAME, MIDDLENAME, and LASTNAME output ports.
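The parsing behavior above can be sketched as follows. This is a hedged illustration of the label-to-port mapping, not the transformation's internals; the label names come from the example in the text:

```python
# Labels defined in the hypothetical probabilistic model.
LABELS = ["FIRSTNAME", "MIDDLENAME", "LASTNAME"]

def to_output_ports(values, assigned_labels):
    # Each input value is written to the output port named by the label
    # that the model assigns to it.
    record = {label: None for label in LABELS}
    for value, label in zip(values, assigned_labels):
        record[label] = value
    return record

parsed = to_output_ports(["Franklin", "Delano", "Roosevelt"],
                         ["FIRSTNAME", "MIDDLENAME", "LASTNAME"])
```

Here each name part lands on its own port, mirroring the "Franklin Delano Roosevelt" example.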
To compile the model, open the model in the Developer tool and click Compile.
After you add a reference data string to a model, assign a label to each value in the string.
To copy reference data from a data application, open a data source in the application and select one or more cells in a data column.
If the same value has different meanings in two rows of reference data, you can assign different labels to the value in each row.
You can define the same combination of labels for multiple input strings. Multiple examples of a label increase the
likelihood that the probabilistic model assigns the correct label to an input data value.
Input String   | First Value | Second Value
Park Place     | Park        | Place
Park Avenue    | Park        | Avenue
Madison Avenue | Madison     | Avenue
Central Park   | Central     | Park
State Street   | State       | Street
The Labeler transformation can return any of the label combinations that you define in the model. Organize the label
columns from left to right in the order in which you want the labels to appear in the output data.
Note: If you add or remove a label in a probabilistic model after you add the model to a Parser transformation, you
invalidate the parsing operation that uses the model. You must delete and re-create the operation that uses the
probabilistic model.
If a probabilistic model contains a label value that does not identify a data value, you cannot compile the model.
Overflow Label
When a transformation cannot assign a label that you define to an input data value, the transformation assigns an
overflow label to the data.
The Labeler transformation assigns an overflow label to any data value that it cannot identify. The Parser
transformation creates an overflow column for unassigned data.
A transformation can fail to recognize an input value if the number of values in the input row exceeds the number of
labels in the probabilistic model. Before you use a model in a mapping, review the mapping source data and verify that
the model contains the correct number of label values.
The following table shows how a Parser transformation uses an overflow port to parse data that a probabilistic model
cannot recognize:
Input Data             | Street_Names port | Address_Suffixes port | Overflow port
Park Place             | Park              | Place                 |
Park Avenue            | Park              | Avenue                |
Madison Avenue         | Madison           | Avenue                |
Central Park           | Central           | Park                  |
Washington Square Park | Washington        | Square                | Park
Madison Square Garden  | Madison           | Square                | Garden
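The overflow behavior described above can be sketched in a few lines. This is a hedged illustration under the assumption that the model defines exactly two labels; it is not the Parser transformation's internal logic:

```python
# Hypothetical model with two labels; a third input value has no label
# and is therefore written to the overflow port.
LABELS = ["Street_Names", "Address_Suffixes"]

def parse_with_overflow(values):
    record = dict(zip(LABELS, values))
    record["Overflow"] = " ".join(values[len(LABELS):])
    return record

row = parse_with_overflow(["Washington", "Square", "Park"])
```

For the three-value input, "Park" exceeds the two defined labels and lands on the overflow port.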
2.
3. Verify that the model contains the reference data that you need.
4. Right-click an input data row and select New Label. Enter a column name in the New Label dialog box. The label appears in the model.
5. Right-click an input data row and select View tokens and labels as rows. The Labels panel displays under the input data column. The panel displays each reference data value as a data row.
6.
7.
8.

Note: A label is a structural element in a model. If you add or remove a label after you add the model to a transformation, you invalidate the operation that uses the model. Delete and re-create the transformation operation.
After you create the empty model, you must add input data.
2.
3.
4.
5.
6. Browse the Model repository and select the data object that contains the reference data. Click Next.
7. Review the data columns on the data object, and select a column to add as reference data values or label values for the model.
   - To add a data column as reference data, select the column name and click Data.
   - To use a data column as a source for label values, select the column name and click Label.
   Click Next.
8.
9. Set the delimiters to use for the reference data values. Specify a delimiter to identify multiple values that represent a single piece of information. The default delimiter is a character space.
10.
11.
To copy a probabilistic model to another content set:
2.
3.
4. Click OK.
The Developer tool copies the probabilistic model to the selected content set.

To paste a probabilistic model into a content set:
2.
3.
4. Click OK.
The Developer tool pastes the probabilistic model to the current content set.
CHAPTER 5
Rules
Create and apply rules within profiles. A rule is business logic that defines conditions applied to data when you run a
profile. Use rules to further validate the data in a profile and to measure data quality progress.
You can add a rule after you create a profile. You can reuse rules created in either the Analyst tool or the Developer tool in both tools. Add rules to a profile by selecting a reusable rule or by creating an expression rule. An expression rule uses
both expression functions and columns to define rule logic. After you create an expression rule, you can make the rule
reusable.
Create expression rules in the Analyst tool. In the Developer tool, you can create a mapplet and validate the mapplet
as a rule. You can run rules from both the Analyst tool and Developer tool.
Scorecards
A scorecard is the graphical representation of the valid values for a column or output of a rule in profile results. Use
scorecards to measure data quality progress. You can create a scorecard from a profile and monitor the progress of
data quality over time.
A scorecard has multiple components, such as metrics, metric groups, and thresholds. After you run a profile, you can
add source columns as metrics to a scorecard and configure the valid values for the metrics. Use a metric group to
categorize related metrics in a scorecard into a set. A threshold identifies the range, in percentage, of bad data that is
acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data.
When you run a scorecard, you can configure whether you want to drill down on the metrics for a score on the live data
or staged data. After you run a scorecard and view the scores, you can drill down on each metric to identify valid data
records and records that are not valid. You can also view scorecard lineage for each metric or metric group in a
scorecard. To track data quality effectively, you can use trend charts and monitor how the scores change over a period
of time.
The profiling warehouse stores the scorecard statistics and configuration information. You can configure a third-party
application to get the scorecard results and run reports. You can also display the scorecard results in a web
application, portal, or report such as a business intelligence report.
You can define a column profile for a data object in a mapping or mapplet or an object in the Model repository. The
object in the repository can be in a single data object profile, multiple data object profile, or profile model.
You can add rules to a column profile. Use rules to select a subset of source data for profiling. You can also change the
drilldown options for column profiles to determine whether the drilldown reads from staged data or live data.
Filtering Options
You can add filters to determine the rows that a column profile uses when performing profiling operations. The profile
does not process rows that do not meet the filter criteria.
3. Click Add.
5. Enter a name for the filter. Optionally, enter a text description of the filter.
6. Select Set as Active to apply the filter to the profile. Click Next.
8. Click Finish.
Sampling Properties
Configure the sampling properties to determine the number of rows that the profile reads during a profiling
operation.
The following table describes the sampling properties:

Property            Description
All Rows            Reads all rows in the data source.
First               Reads the number of rows that you specify, starting from the first row in the source.
Random Sample of    Reads a random sample from the number of rows that you specify.
1. In the Object Explorer view, select the data object you want to profile.
2. Click File > New > Profile to open the profile wizard.
4. Enter a name for the profile and verify the project location. If required, browse to a new location.
6. Verify that the name of the data object you selected appears in the Data Objects section.
7. Click Next.
8. Configure the profile operations that you want to perform. You can configure the following operations:
- Column profiling
- Primary key discovery
- Functional dependency discovery
- Data domain discovery
Note: To enable a profile operation, select Enabled as part of the "Run Profile" action for that operation. Column profiling is enabled by default.
10. Review the drilldown options, and edit them if necessary. By default, the Enable Row Drilldown option is selected. You can edit drilldown options for column profiles. The options also determine whether drilldown operations read from the data source or from staged data, and whether the profile stores result data from previous profile runs.
11. Click Finish.
CHAPTER 6
The following table describes the profile results for each profile type:

Profile Type      Profile Results
Column profile    Values: the values for each column, with the frequency in which each value appears, shown as a percentage and a chart.
                  Patterns: the value patterns for each column, with the frequency in which each pattern appears, shown as a percentage and a chart.
                  Statistics: column statistics such as Maximum Length, Minimum Length, Bottom values, and Top values.
Note: The profile also displays average and standard deviation statistics for columns of type Integer.
5. Under Details, select Values or select Patterns and click the Export button.
The Export data to a file dialog box opens.
6. Accept or change the file name. The default name is [Profile_name]_[column_name]_DVC for column value data and [Profile_name]_[column_name]_DI for pattern data.
7. Select the type of data to export. You can select either Values for the selected column or Patterns for the selected column.
9. Click Browse to select a location and save the file locally on your computer. By default, Informatica Developer writes the file to a location set in the Data Integration Service properties of Informatica Administrator.
10. If you do not want to export field names as the first row, clear the Export field names as first row check box.
11. Click OK.
CHAPTER 7
A rule cannot contain any other type of transformation. For example, a rule cannot contain a Match transformation, as it is an active transformation. A rule does not specify cardinality between input groups.
1. Browse the Object Explorer view and find the profile you need.
4. Click Add.
The Apply Rule dialog box opens.
6. Click the Value column under Input Values to select an input port for the rule.
7. Optionally, click the Value column under Output Values to edit the name of the rule output port.
The rule appears in the Definition tab.
CHAPTER 8
Scorecards in Informatica
Developer
This chapter includes the following topics:
Scorecards in Informatica Developer Overview, 43
Creating a Scorecard, 43
Exporting a Resource File for Scorecard Lineage, 44
Viewing Scorecard Lineage from Informatica Developer, 44
Creating a Scorecard
Create a scorecard and add columns from a profile to the scorecard. You must run a profile before you add columns to
the scorecard.
1. In the Object Explorer view, select the project or folder where you want to create the scorecard.
3. Click Add.
The Select Profile dialog box appears. Select the profile that contains the columns you want to add.
5. By default, the scorecard wizard selects the columns and rules defined in the profile. You cannot add columns that are not included in the profile.
6. Click Finish.
The Developer tool creates the scorecard.
7. Optionally, click Open with Informatica Analyst to connect to the Analyst tool and open the scorecard in the Analyst tool.
3. Click Next.
4. Click Browse to select a project that contains the scorecard objects and lineage that you need to export.
5. Click Next.
8. To view the dependent objects that the Export wizard exports with the objects that you selected, click Next.
The Export wizard displays the dependent objects.
9. Click Finish.
The Developer tool exports the objects to the XML file.
1. In the Object Explorer view, select the project or folder that contains the scorecard.
4. In the Scorecard view of the Analyst tool, select a metric or metric group.
CHAPTER 9
4. If the transformation has multiple output groups, select the output groups as necessary.
5. Click OK.
The profile results appear in the Results tab of the profile.
3. Press the CTRL key and click two objects in the editor.
5. Optionally, configure the profile comparison to match columns from one object to the other object.
6. Optionally, match columns by clicking a column in one object and dragging it onto a column in the other object.
7. Optionally, choose whether the profile analyzes all columns or matched columns only.
8. Click OK.
1. In the Object Explorer view, find the profile on which to create the mapping.
5. Confirm the profile definition that the Developer tool uses to create the mapping. To use another profile, click Select Profile.
6. Click Finish.
The mapping appears in the Object Explorer.
CHAPTER 10
Reference Tables
This chapter includes the following topics:
Reference Tables Overview, 47
Reference Table Data Properties, 47
Creating a Reference Table Object, 48
Creating a Reference Table from a Flat File, 49
Creating a Reference Table from a Relational Source, 50
Copying a Reference Table in the Model Repository, 51
Editing Reference Table Data, 52
Finding Data Values in a Reference Table, 52
Property           Description
Name
Description
Valid
Data Type
Precision
Scale
Default value      Default value for the fields in the column. You can optionally add a default value when you create the reference table.
Connection Name
1. Select File > New > Reference Table from the Developer tool menu.
5. Add two or more columns to the table. Click the New option to create a column.
Property     Default Value
Name         column
Data Type    string
Precision    10
Scale
Description
6. Select the column that contains the valid values. You can change the order of the columns that you create.
7.
Default Value
Cleared
Audit note
Empty
Default value
Empty
Click Finish.
The reference table opens in the Developer tool workspace.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Flat File.
3. Browse to the file you want to use as the data source for the table.
8. If the flat file contains column names, select the option to import column names from the first line of the file.
9.
Default Value
Text qualifier
No quotation marks
Line 1
Row Delimiter
\012 LF (\n)
Cleared
Escape character
Empty
Cleared
500
Click Next.
10.
11.
Default Value
Cleared
Audit note
Empty
Default value
Empty
500
Click Finish.
The reference table opens in the Developer tool workspace.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Relational Source. Click Next.
3.
- If the database connection you select does not specify the reference data warehouse, select Unmanaged table.
- If you want to perform edit operations on an unmanaged reference table, select the Editable option.
9.
Default Value
Cleared
Audit note
Empty
Default value
Empty
500
Click Finish.
1. Browse the Model repository and find the reference table you want to copy.
2. Right-click the reference table and select Copy from the context menu.
3. In the Model repository, find the project or folder where you want to store the copy of the table.
4. Click Paste.
1. In the Object Explorer, select the project or folder that contains the reference table.
4. Edit the data values. You can edit the data in the following ways:
- To add a data row, click New. The cursor moves to the final row of the table and adds a row. Enter values for each field in the row.
- To edit a data value, double-click the value in the reference table and update the value.
- To delete a data row, select the row and click Delete.
1. In the Object Explorer, select the project or folder that contains the reference table.
5. Search the columns you select for the data value in the Find field. Use the Up and Down options to find instances of the data value.
CHAPTER 11
2. Determine whether you want to create a profile with default options or change the default profile options.
8. Define a filter to determine the rows that the profile reads at run time.
Note: Consider the following rules and guidelines for column names and for profiling multilingual and Unicode data:
- You cannot add a column to a profile if both the column name and profile name match.
- You cannot add the same column twice to a profile.
- The Analyst tool changes the Datetime, Numeric, and Decimal datatypes based on the browser locale.
- Sorting on multilingual data. You can sort on multilingual data. The Analyst tool displays the sort order based on the browser locale.
Profile Options
Profile options include profile results option, data sampling options, and data drilldown options. You can configure
these options when you create a column profile for a data object.
You use the New Profile wizard to configure the profile options. You can choose to create a profile with the default
options for columns, sampling, and drilldown options. When you create a profile for multiple data sources, the Analyst
tool uses default column profiling options.
Sampling Options
Sampling options determine the number of rows that the Analyst tool chooses to profile. You can configure sampling
options when you go through the wizard or when you run a profile.
The following table describes the sampling options for a profile:

Option           Description
All Rows         Runs the profile on all rows in the data source.
First            The number of rows that you want to run the profile against. The Analyst tool chooses the rows from the first rows in the source.
Random sample    Runs the profile on a random sample of rows.
Drilldown Options
You can configure drilldown options when you go through the wizard or when you run a profile.
The following table describes the drilldown options for a profile:

Option           Description
Select Columns   Identifies columns for drilldown that you did not select for profiling.
1. In the Navigator, select the project that contains the data object that you want to create a custom profile for.
2. In the Contents panel, right-click the data object and select New > Profile.
The New Profile wizard appears. The Column profiling option is selected by default.
3. Click Next.
7. In the Folders panel, select the project or folder where you want to create the profile.
The Analyst tool displays the project that you selected and shared projects that contain folders where you can create the profile. The profile objects in the folder appear in the Profiles panel.
8. Click Next.
9. In the Columns panel, select the columns that you want to profile. The columns include any rules you applied to the profile. The Analyst tool lists the name, datatype, precision, and scale for each column. Optionally, select Name to select all columns.
13. Click Next.
15. Click Next to verify the row drilldown settings, including the preview columns for drilldown.
16. Click Save to create the profile, or click Save & Run to create the profile and then run the profile.
1. In the Navigator, select the project or folder that contains the profile that you want to edit.
4. Based on the changes you want to make, choose one of the following menu options:
- General. Change the basic properties such as name, description, and profile type.
- Data Source. Choose another matching data source.
- Column Profiling. Select the columns you want to run the profile on and configure the necessary sampling options.
5. Click Save to save the changes or click Save & Run to save the changes and then run the profile.
Running a Profile
Run a profile to analyze a data source for content and structure and select columns and rules for drill down. You can
drill down on live or staged data for columns and rules. You can run a profile on a column or rule without profiling all the
source columns again after you run the profile.
1. In the Navigator, select the project or folder that contains the profile you want to run.
Creating a Filter
You can create a filter so that you can make a subset of the original data source that meets the filter criteria. You can
then run a profile on this sample data.
1. Open a profile.
2. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.
The current filters appear in the Filters panel.
3. Click New.
5. Define the filter criteria. For an advanced filter, use function categories, such as Character, Consolidation, Conversion, Financial, and Numerical, to build an expression to generate the SQL filter. For example, to filter company records in the European region from a Company table with a Region column, enter Region = 'Europe' in the editor.
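The effect of such a filter can be illustrated with a short Python sketch. This is not Informatica code, and the Company rows below are hypothetical:

```python
# Illustrative sketch (not Informatica code): the effect of the SQL filter
# Region = 'Europe' on rows of a hypothetical Company table.
rows = [
    {"Company": "Acme GmbH", "Region": "Europe"},
    {"Company": "Acme Corp", "Region": "North America"},
]

# Keep only the rows that meet the filter criteria; the profile does not
# process rows that fail the filter.
filtered = [row for row in rows if row["Region"] == "Europe"]
print([row["Company"] for row in filtered])
```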
Managing Filters
You can create, edit, and delete filters.
1. In the Navigator, select the project or folder that contains the profile you want to filter.
3. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.
The current filters appear in the Filters panel.
3. Verify the flat file path in the Browse and Upload field.
4. Click Next.
A synchronization status message appears.
3. To complete the synchronization process, click OK. Click Cancel to cancel the process.
If you click OK, a synchronization status message appears.
CHAPTER 12
Note: You can select a value or pattern and view profiled rows that match the value or pattern on the Details panel.
In the Properties view, you can view profile properties on the Properties panel. You can view properties for columns
and rules on the Columns and Rules panel.
In the Data Preview view, you can preview the profile data. The Analyst tool includes all columns in the profile and
displays the first 100 rows of data.
Profile Summary
The summary for a profile run includes the number of unique and null values expressed as a number and a
percentage, inferred datatypes, and last run date and time. You can click each profile summary property to sort on
values of the property.
The following table describes the profile summary properties:
Property              Description
Name                  Name of the column.
Unique Values         Number of unique values for the column.
% Unique              Percentage of unique values for the column.
Null                  Number of null values for the column.
% Null                Percentage of null values for the column.
Datatype              Datatype derived from the values for the column. The Analyst tool can derive the following datatypes from the datatypes of values in columns: String, Varchar, Decimal, Integer, and "-" for nulls.
                      Note: The Analyst tool cannot derive the datatype from the values of a numeric column that has a precision greater than 38. The Analyst tool cannot derive the datatype from the values of a string column that has a precision greater than 255. If you create a column profile on a date column with a year value earlier than 1800, the inferred datatype may appear as a fixed-length string. Change the default value for the year-minimum parameter in InferDateTimeConfig.xml, as necessary.
% Inferred            Percentage of values that match the datatype inferred by the Analyst tool.
Documented Datatype   Datatype declared for the column in the data source.
Maximum Value         Maximum value for the column.
Minimum Value         Minimum value for the column.
Drilldown
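The unique-value and null-value statistics above can be illustrated with a minimal Python sketch. This is not Informatica code, and the sample values are hypothetical:

```python
# Illustrative sketch (not Informatica code): the kind of per-column summary
# a profile computes - unique and null counts as numbers and percentages.
def column_summary(values):
    total = len(values)
    non_null = [v for v in values if v is not None]
    unique = len(set(non_null))          # distinct non-null values
    null = total - len(non_null)         # null (None) values
    return {
        "Unique Values": unique,
        "% Unique": round(100.0 * unique / total, 2),
        "Null": null,
        "% Null": round(100.0 * null / total, 2),
    }

# Hypothetical column with four rows, one of them null.
print(column_summary(["NY", "CA", "NY", None]))
```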
Column Values
The column values include values for columns and the frequency in which the value appears for the column.
The following table describes the properties for the column values:
Property      Description
Value
Frequency     Number of times a value appears for a column, expressed as a number, a percentage, and a chart.
Percent
Chart
Drill down
Note: To sort the Value and Frequency columns, select the columns. When you sort the results of the Frequency
column, the Analyst tool sorts the results based on the datatype of the column.
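The frequency calculation described above can be sketched in Python. This is not Informatica code, and the sample column values are hypothetical:

```python
from collections import Counter

# Illustrative sketch (not Informatica code): value frequencies for a column,
# expressed as a count and a percentage, sorted by frequency.
def value_frequencies(values):
    counts = Counter(values)
    total = len(values)
    # Each entry: (value, count, percentage of all rows).
    return [(value, count, round(100.0 * count / total, 2))
            for value, count in counts.most_common()]

# Hypothetical column with four rows.
print(value_frequencies(["NY", "NY", "CA", "TX"]))
```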
Column Patterns
The column patterns include the value patterns for the columns and the frequency in which the pattern appears.
The profiling warehouse stores 16,000 unique highest frequency values including NULL values for profile results by
default. If there is at least one NULL value in the profile results, the Analyst tool can display NULL values as
patterns.
Note: The Analyst tool cannot derive the pattern for a numeric column that has a precision greater than 38. The
Analyst tool cannot derive the pattern for a string column that has a precision greater than 255.
The following table describes the properties for the column patterns:
Property
Description
Pattern
Frequency
Percent
Chart
Drill down
The following table describes the pattern characters and what they represent:

Character    Description
9            Represents any numeric character. Informatica Analyst displays up to three characters separately in the "9" format. The tool displays more than three characters as a value within parentheses. For example, the format "9(8)" represents a numeric value with 8 digits.
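The "9" pattern notation can be illustrated with a small Python sketch. This is not Informatica code; it only mimics the collapse rule described above, where runs longer than three digits become a count in parentheses:

```python
import re

# Illustrative sketch (not Informatica code): derive a value pattern in the
# "9" notation, collapsing runs of more than three digits to "9(n)".
def numeric_pattern(value: str) -> str:
    def collapse(match):
        n = len(match.group())
        # Up to three digits appear as "9" characters; longer runs collapse.
        return "9" * n if n <= 3 else "9(%d)" % n
    return re.sub(r"\d+", collapse, value)

print(numeric_pattern("12345678"))   # "9(8)"
print(numeric_pattern("123"))        # "999"
```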
Column Statistics
The column statistics include statistics about the column values, such as average, length, and top and bottom values.
The statistics that appear depend on the column type.
The following table describes the types of column statistics for each column type:

Statistic           Column Type
Average             Integer
Standard Deviation  Integer
Maximum Length      Integer, String
Minimum Length      Integer, String
Bottom              Integer, String
Top                 Integer, String
You can select columns for drilldown even if you did not choose those columns for profiling. You can choose to read
the current data in a data source for drilldown or read profile data staged in the profiling warehouse. After you perform
a drilldown on a column value, you can export drilldown data for the selected values or patterns to a CSV file at a
location you choose. Though Informatica Analyst displays the first 200 values for drilldown data, the tool exports all
values to the CSV file.
1. Run a profile.
The profile appears in a tab.
2. In the Summary view, select a column name to view the profile results for the column.
3. Select a column value on the Values tab or select a column pattern on the Patterns tab.
3. Right-click and select Drilldown Filter > Edit to open the DrillDown Filter dialog box.
5. To manage current drilldown filters, you can save, recall, or reset filters.
- To save a filter, select Drilldown Filter > Save.
- To go back to the last saved drilldown filter results, select Drilldown Filter > Recall.
- To reset the drilldown filter results, select Drilldown Filter > Reset.
The following table describes the profile results for a column profile:

View        Description
Values      Values for the columns and rules and the frequency in which the values appear for each column.
Patterns    Value patterns for the columns and rules you ran the profile on and the frequency in which the patterns appear.
Statistics
Properties
1. In the Navigator, select the project or folder that contains the profile.
3. In the Column Profiling view, select the column that you want to export.
5. Enter the file name. Optionally, use the default file name.
7. Enter a file format. The format is Excel for the All option and CSV for the rest of the options.
9. Click OK.
CHAPTER 13
You can add the following types of rules to a profile:
- Expression rules. Rules that an analyst creates in the Analyst tool. An analyst can create an expression rule and promote it to a reusable rule that other analysts can use in multiple profiles.
- Predefined rules. Includes reusable rules that a developer creates in the Developer tool. Rules that a developer creates in the Developer tool as mapplets can appear in the Analyst tool as reusable rules.
After you add a rule to a profile, you can run the profile again for the rule column. The Analyst tool displays profile
results for the rule column. You can modify the rule and run the profile again to view changes to the profile results. The
output of a rule can be one or more virtual columns. The virtual columns exist in the profile results. The Analyst tool
profiles the virtual columns. For example, you use a predefined rule that splits a column that contains first and last
names into FIRST_NAME and LAST_NAME virtual columns. The Analyst tool profiles the FIRST_NAME and
LAST_NAME columns.
Note: If you delete a rule object that other object types reference, the Analyst tool displays a message that lists those
object types. Determine the impact of deleting the rule before you delete it.
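The example above, in which a rule splits one column into FIRST_NAME and LAST_NAME virtual columns, can be sketched in Python. This is not Informatica code, and the simple space-based split is an assumption for illustration:

```python
# Illustrative sketch (not Informatica code): the effect of a rule that
# splits a full-name column into FIRST_NAME and LAST_NAME virtual columns,
# which the profile then analyzes like any other columns.
def split_name_rule(full_name: str) -> dict:
    # Simplifying assumption: the first space separates first and last name.
    first, _, last = full_name.partition(" ")
    return {"FIRST_NAME": first, "LAST_NAME": last}

print(split_name_rule("Ada Lovelace"))
```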
Predefined Rules
Predefined rules are rules created in the Developer tool or provided with the Developer tool and Analyst tool. Apply
predefined rules to the Analyst tool profiles to modify or validate source data.
Predefined rules use transformations to define rule logic. You can use predefined rules with multiple profiles. In the
Model repository, a predefined rule is a mapplet with an input group, an output group, and transformations that define
the rule logic.
1. Open a profile.
1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
6. Click Next.
7. In the Rules panel, select the rule that you want to apply.
The name, datatype, description, and precision columns appear for the Inputs and Outputs columns in the Rules Parameters panel.
8. Click Next.
9. In the Inputs section, select an input column. The input column is a column name in the profile.
10. Optionally, in the Outputs section, configure the label of the output columns.
11. Click Next.
12. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the profile. Optionally, select Name to include all columns.
The Analyst tool lists the name, datatype, precision, and scale for each column.
15. Click Save to apply the rule or click Save & Run to apply the rule and then run the profile.
Expression Rules
Expression rules use expression functions and columns to define rule logic. Create expression rules and add them to
a profile in the Analyst tool.
Use expression rules to change or validate values for columns in a profile. You can create one or more expression
rules to use in a profile. Expression functions are SQL-like functions used to transform source data. You can create
expression rule logic with the following types of functions:
Character
Conversion
Data Cleansing
Date
Encoding
Financial
Numeric
Scientific
Special
Test
1. Open a profile.
2. Configure the rule logic using expression functions and columns as parameters.
1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
4. Click New.
6. Click Next.
8. Optionally, choose to promote the rule as a reusable rule and configure the project and folder location.
If you promote a rule to a reusable rule, you or other users can use the rule in another profile as a predefined rule.
9. In the Functions tab, select a function and click the right arrow to enter the parameters for the function.
10. In the Columns tab, select an input column and click the right arrow to add the expression in the Expression editor. You can also add logical operators to the expression.
11. Click Validate. You can proceed to the next step if the expression is valid.
12. Optionally, click Edit to configure the return type, precision, and scale.
13. Click Next.
14. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the profile. Optionally, select Name to select all columns.
The Analyst tool lists the name, datatype, precision, and scale for each column.
17. Click Save to create the rule or click Save & Run to create the rule and then run the profile.
CHAPTER 14
You can perform the following tasks when you work with scorecards:
1. Create a scorecard in the Developer tool and add columns from a profile.
2. Optionally, connect to the Analyst tool and open the scorecard in the Analyst tool.
3. After you run a profile, add profile columns as metrics to the scorecard.
5. View the scorecard to see the scores for each column in a record.
7. Edit a scorecard.
11. View trend charts for each score to monitor how the score changes over time.
Metrics
A metric is a column of a data source or output of a rule that is part of a scorecard. When you create a scorecard, you
can assign a weight to each metric. Create a metric group to categorize related metrics in a scorecard into a set.
Metric Weights
When you create a scorecard, you can assign a weight to each metric. The default value for a weight is 1.
When you run a scorecard, the Analyst tool calculates the weighted average for each metric group based on the metric
score and weight you assign to each metric.
For example, you assign a weight of W1 to metric M1, and you assign a weight of W2 to metric M2. The Analyst tool
uses the following formula to calculate the weighted average:
(M1 X W1 + M2 X W2) / (W1 + W2)
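The formula can be checked with a short Python sketch; the metric scores and weights below are hypothetical:

```python
# Sketch of the weighted-average formula above (not Informatica code).
def weighted_average(scores, weights):
    """(M1 x W1 + M2 x W2 + ...) / (W1 + W2 + ...)"""
    return sum(m * w for m, w in zip(scores, weights)) / sum(weights)

# Hypothetical metrics: M1 scores 90% with weight 3, M2 scores 60% with weight 1.
print(weighted_average([90.0, 60.0], [3, 1]))  # 82.5
```

With equal weights of 1 for every metric, the result reduces to the plain average of the metric scores.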
1. In the Navigator, select the project or folder that contains the profile.
Note: Use the following rules and guidelines before you add columns to a scorecard:
You cannot add a column to a scorecard if both the column name and scorecard name match.
You cannot add a column twice to a scorecard even if you change the column name.
6. Click Next.
7. Select the scorecard that you want to add the columns to, and click Next.
8. Select the columns and rules that you want to add to the scorecard as metrics. Optionally, click the check box in the left column header to select all columns. Optionally, select Column Name to sort column names.
9. Select each metric in the Metrics panel and configure the valid values from the list of all values in the Score using: Values panel.
You can select multiple values in the Available Values panel and click the right arrow button to move them to the Selected Values panel. The total number of valid values for a metric appears at the top of the Available Values panel.
10. Select each metric in the Metrics panel and configure metric thresholds in the Metric Thresholds panel.
You can set thresholds for Good, Acceptable, and Unacceptable scores.
11. Click Next.
12. In the Score using: Values panel, set up the metric weight for each metric. You can double-click the default metric weight of 1 to change the value.
14. Click Save to save the scorecard or click Save & Run to save and run the scorecard.
Running a Scorecard
Run a scorecard to generate scores for columns.
1. In the Navigator, select the project or folder that contains the scorecard.
4. Select a score from the Metrics panel and select the columns from the Columns panel to drill down on.
5. In the Drilldown option, choose to drill down on live data or staged data.
For optimal performance, drill down on live data.
6. Click Run.
Viewing a Scorecard
Run a scorecard to see the scores for each metric. A scorecard displays the score as a percentage and bar. View data
that is valid or not valid. You can also view scorecard information, such as the metric weight, metric group score, score
trend, and name of the data object.
3. Click Actions > Drilldown to view the rows of valid data or rows of data that is not valid for the column.
The Analyst tool displays the rows of valid data by default in the Drilldown panel.
Editing a Scorecard
Edit valid values for metrics in a scorecard. You must run a scorecard before you can edit it.
1. In the Navigator, select the project or folder that contains the scorecard.
4. On the Metrics tab, select each score in the Metrics panel and configure the valid values from the list of all values in the Score using: Values panel.
5. Make changes to the score thresholds in the Metric Thresholds panel as necessary.
10. Click Save to save changes to the scorecard, or click Save & Run to save the changes and run the scorecard.
Defining Thresholds
You can set thresholds for each score in a scorecard. A threshold specifies the range in percentage of bad data that is
acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data. You
can define thresholds for each column when you add columns to a scorecard, or when you edit a scorecard.
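The threshold ranges can be illustrated with a minimal Python sketch. This is not Informatica code, and the bound values used here are hypothetical examples:

```python
# Illustrative sketch (not Informatica code): classify a metric score against
# the two threshold bounds described above - the upper bound of the
# Unacceptable range and the lower bound of the Good range.
def classify_score(score, unacceptable_upper=50.0, good_lower=90.0):
    if score < unacceptable_upper:
        return "Unacceptable"
    if score < good_lower:
        return "Acceptable"
    return "Good"

print(classify_score(95.0))  # Good
print(classify_score(70.0))  # Acceptable
print(classify_score(40.0))  # Unacceptable
```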
Complete the following prerequisite task before you define thresholds for columns in a scorecard: in the Navigator, select the project or folder that contains the profile, and add columns from the profile to the scorecard.
1. In the Add to Scorecard window, or the Edit Scorecard window, select each metric in the Metrics panel.
2. In the Metric Thresholds panel, enter the thresholds that represent the upper bound of the unacceptable range and the lower bound of the good range.
Metric Groups
Create a metric group to categorize related scores in a scorecard into a set. By default, the Analyst tool categorizes all
the scores in a default metric group.
After you create a metric group, you can move scores out of the default metric group to another metric group. You can
edit a metric group to change its name and description, including the default metric group. You can delete metric
groups that you no longer use. You cannot delete the default metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
7. Click OK.
1. In the Navigator, select the project or folder that contains the scorecard.
2.
3.
4.
5. Select a metric from the Metrics panel and click the Move Metrics icon.
The Move Metrics dialog box appears.
Note: To select multiple scores, hold the Shift key.
6.
7. Click OK.
1. In the Navigator, select the project or folder that contains the scorecard.
2.
3.
4.
5.
6.
7. Click OK.
1. In the Navigator, select the project or folder that contains the scorecard.
2.
3.
4.
5. Select a metric group in the Metric Groups panel, and click the Delete Group icon.
The Delete Groups dialog box appears.
6. Choose the option to delete the metrics in the metric group or the option to move the metrics to the default metric group before deleting the metric group.
7. Click OK.
2.
3. Click Actions > Drilldown to view the rows of valid or invalid data for the column.
4.
1. In the Navigator, select the project or folder that contains the scorecard.
2.
3.
4.
Scorecard Notifications
You can configure scorecard notification settings so that the Analyst tool sends emails when specific metric scores or
metric group scores move across thresholds or remain in specific score ranges, such as Unacceptable, Acceptable,
and Good.
You can configure email notifications for individual metric scores and metric groups. If you use the global settings, the
Analyst tool sends notification emails when the scores of selected metrics cross the threshold from the score ranges
Good to Acceptable and Acceptable to Unacceptable. You also get notification emails for each scorecard run if the score
remains in the Unacceptable score range across consecutive scorecard runs.
You can customize the notification settings so that scorecard users get email notifications when the scores move from
the Unacceptable to Acceptable and Acceptable to Good score ranges. You can also choose to send email
notifications if a score remains within specific score ranges for every scorecard run.
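The notification behavior described above can be sketched as follows. The function name, setting names, and range labels are illustrative assumptions; the Analyst tool applies equivalent logic internally.

```python
def should_notify(previous_range, current_range, watch_moves, watch_ranges):
    """Decide whether a scorecard run triggers a notification email.

    watch_moves:  set of (from_range, to_range) transitions that trigger
                  an email when the score crosses a threshold.
    watch_ranges: set of ranges that trigger an email on every run while
                  the score remains in them.
    """
    if (previous_range, current_range) in watch_moves:
        return True
    if previous_range == current_range and current_range in watch_ranges:
        return True
    return False

# Global-style settings: notify on downward moves, and on a score that
# remains Unacceptable across consecutive scorecard runs.
GLOBAL_MOVES = {("Good", "Acceptable"), ("Acceptable", "Unacceptable")}
GLOBAL_RANGES = {"Unacceptable"}
```

Custom settings would swap in different move and range sets, for example upward moves such as ("Unacceptable", "Acceptable").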
The following table describes the tags that you can use in the notification email:

Tag
Description

ScorecardName
Name of the scorecard.

ObjectURL
A hyperlink to the scorecard. You need to provide the username and password.

MetricGroupName
Name of the metric group.

CurrentWeightedAverage
Weighted average value for the metric group in the current scorecard run.

CurrentRange
The score range, such as Unacceptable, Acceptable, and Good, for the metric group in the current scorecard run.
PreviousWeightedAverage
Weighted average value for the metric group in the previous scorecard run.

PreviousRange
The score range, such as Unacceptable, Acceptable, and Good, for the metric group in the previous scorecard run.

ColumnName
Name of the column.

ColumnType
Type of the column.

RuleName
Name of the rule.

RuleType
Type of the rule.

DataObjectName
Name of the data object.
2.
3.
4.
5.
6. Click the Notifications check box to enable the global settings for the metric or metric group.
7. Select Use custom settings to change the settings for the metric or metric group.
You can choose to send a notification email when the score is in Unacceptable, Acceptable, and Good ranges and moves across thresholds.
8. To edit the global settings for scorecard notifications, click the Edit Global Settings icon.
The Edit Global Settings dialog box appears, where you can edit the settings, including the email template.
2. Click Actions > Edit to open the Edit Scorecard dialog box.
3.
4.
5.
6. Choose when you want to send email notifications using the Score in and Score moves check boxes.
7.
8.
9.
10. In the Body field, add the introductory and closing text of the email message.
11. To apply the global settings, select Apply settings to all metrics and metric groups.
12. Click OK.
The scorecard URL contains the following attributes:

Attribute
Description

HOST_NAME
Host name of the machine that runs the Analyst Service.

PORT
Port number of the Analyst Service.

MRS_PROJECT_ID
ID of the project that contains the scorecard.

SCORECARD_ID
ID of the scorecard.

MRS_PARENT_PATH
Path to the scorecard in the Model repository.

VIEW_MODE
View of the scorecard that opens in the external application.

CREDENTIAL
Credentials to log in to the Analyst tool.
The VIEW_MODE attribute in the scorecard URL determines whether you can integrate a read-only or editable view of
the scorecard with the external application:
view=objectonly
Displays a read-only view of the scorecard results.
view=objectrunonly
Displays scorecard results where you can run the scorecard and drill down on results.
view=full
Opens the scorecard results in the Analyst tool with full access.
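A sketch of how the attributes combine into a scorecard URL. The `/analyst/` path and the query parameter names below are assumptions for illustration, not the documented URL format; check the Analyst tool for the exact syntax.

```python
from urllib.parse import urlencode

def scorecard_url(host, port, project_id, scorecard_id, view_mode="objectonly"):
    """Build an illustrative scorecard URL from the attributes above.

    view_mode is one of objectonly, objectrunonly, or full.
    """
    params = urlencode({
        "projectId": project_id,
        "scorecardId": scorecard_id,
        "view": view_mode,  # controls read-only vs. full access
    })
    return f"http://{host}:{port}/analyst/?{params}"
```

For example, scorecard_url("analyst-host", 8085, "proj1", "sc1") yields a read-only view of the scorecard results.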
2.
3.
4. Add the URL to the source code of the external application or web portal.
Scorecard Lineage
Scorecard lineage shows the origin of the data, describes the path, and shows how the data flows for a metric or metric
group. You can use scorecard lineage to analyze the root cause of an unacceptable score variance in metrics or
metric groups. View the scorecard lineage in the Analyst tool.
Complete the following tasks to view scorecard lineage:
1. In Informatica Administrator, associate a Metadata Manager Service with the Analyst Service.
2. Select a project and export the scorecard objects in it to an XML file using the Export Resource File for Metadata Manager option in the Developer tool or the infacmd oie exportResources command.
3. In Metadata Manager, use the exported XML file to create a resource and load it.
Note: The name of the resource file that you create and load in Metadata Manager must use the following naming convention: <MRS name>_<project name>. For more information about how to create and load a resource file, see the Informatica PowerCenter Metadata Manager User Guide.
4. In the Analyst tool, open the scorecard and select a metric or metric group.
5.
1. In the Navigator, select the project or folder that contains the scorecard.
2.
3.
4.
CHAPTER 15
You can configure data quality transformations in a single mapping, or you can create mappings for different stages in
the process.
Use the Developer tool to perform the following tasks:
Create a mapping that generates score values for data quality issues
Use a Match transformation in cluster mode to generate score values for duplicate record exceptions.
Use a transformation that writes a business rule to generate score values for records that contain errors. For
example, you can define an IF/THEN rule in a Decision transformation. Use the rule to evaluate the output of
other data quality transformations.
Use an Exception transformation to analyze the record scores
Configure the Exception transformation to read the output of other transformations or to read a data object from
another mapping. Configure the transformation to write records to database tables based on score values in the
records.
Configure target data objects for good records or automatic consolidation records
Connect the Exception transformation output ports to the target data objects in the mapping.
Create the target data object for bad or duplicate records
Use the Generate bad records table or Generate duplicate record table option to create the database object
and add it to the mapping canvas. The Developer tool auto-connects the bad or duplicate record ports to the data
object.
Run the mapping
Run the mapping to process exceptions.
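The score-based routing that the Exception transformation performs can be sketched as follows. The thresholds and target names are illustrative assumptions, not the transformation's actual defaults.

```python
def route_record(record, lower=40, upper=90):
    """Route a record to a target by its quality score.

    Scores at or above the upper threshold go to the good records target,
    scores below the lower threshold go to the bad records table, and
    scores in between go to manual review as exceptions.
    """
    score = record["score"]
    if score >= upper:
        return "good_records"
    if score < lower:
        return "bad_records"
    return "manual_review"
```

In the mapping, the same decision is expressed by connecting the Exception transformation output ports to the corresponding target data objects.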
Use the Analyst tool or Informatica Data Director for Data Quality to perform the following tasks:
Review the exception table data
You can use the Analyst tool or Informatica Data Director for Data Quality to review the bad or duplicate record
tables.
Use the Analyst tool to import the exception records into a bad or duplicate record table. Open the imported
table from the Model repository and work on the exception data.
Use Informatica Data Director for Data Quality if you are assigned a task to review or correct exceptions as part of a workflow.
matchScore
any name beginning with DQA_
2. Select a project.
3.
4. Optionally, use the menus to filter the table records. You can filter records by value in the following columns: Priority, Quality Issue, Column, and Status.
5. Click Show to view the records that match the filter criteria.
6.
7.
Saving changes to a record is the first step in processing the record in the Analyst tool. After you save changes to a
record, you can update the record status to accept, reprocess, or reject the record.
Click Accept.
Indicates that the record is acceptable for use.
Click Reject.
Indicates that the record is not acceptable for use.
Click Reprocess.
Selects the record for reprocessing by a data quality mapping. Select this option when you are unsure if the record
is valid. Rerun the mapping with an updated business rule to recheck the record.
2. Select a project.
3.
4.
5.
2.
3. Click Show.
The following table describes record statuses for the audit trail.
Record Status
Description
Updated
Consolidated
Rejected
Accepted
Reprocess
Rematch
Extracted
CHAPTER 16
Reference Tables
This chapter includes the following topics:
Reference Tables Overview, 87
Reference Table Properties, 87
Create Reference Tables, 89
Create a Reference Table from Profile Data, 90
Create a Reference Table From a Flat File, 92
Create a Reference Table from a Database Table, 94
Copying a Reference Table in the Model Repository, 95
Reference Table Updates, 96
Audit Trail Events, 98
Rules and Guidelines for Reference Tables, 99
The following table describes the properties for a reference table:

Property
Description
Name
Description
Location
Valid Column
Created on
Created By
Last Modified
Last Modified By
Connection Name
Type
The following table describes the properties for reference table columns:

Property
Description
Name
Data Type
The datatype for the data in each column. You can select one of the following datatypes:
- bigint
- date/time
- decimal
- double
- integer
- string
Precision
Scale
Description
Nullable
1. In the Navigator, select the project or folder where you want to create the reference table.
2.
3.
4. Click Next.
5. Enter the table name, and optionally enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
6. For each column you want to include in the reference table, click the Add New Column icon and configure the column properties.
Note: You can reorder or delete columns.
7.
8. Click Finish.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4.
5.
6. Click Next.
7. The column name appears by default as the table name. Optionally enter another table name, a description, and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Click Next.
9. In the Column Attributes panel, configure the column properties for the column.
10. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
11.
12. Click Next.
13. The column name appears as the table name by default. Optionally, enter another table name and a description.
14. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
15.
16. Click Finish.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Values view, select the column values you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5.
6.
7. Click Next.
8. The column name appears by default as the table name. Optionally enter another table name, a description, and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
12.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a description.
15. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
16.
17. Click Finish.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Patterns view, select the column patterns you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5.
7. Click Next.
8. The column name appears by default as the table name. Optionally enter another table name, a description, and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
12.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a description.
15. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
16.
17. Click Finish.
Option
Description
Delimiters
Text Qualifier
Column Names
Imports column names from the first line. Select this option if column names appear in the first row. The wizard uses data in the first row in the preview for column names. Default is not enabled.
Values
1. In the Navigator, select the project or folder where you want to create the reference table.
2.
3.
4. Click Next.
5.
6. Click Upload to upload the file to a directory in the Informatica services installation directory that the Analyst tool can access.
7. Enter the table name. Optionally, enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Select a code page that matches the data in the flat file.
9.
10. Click Next.
11.
12.
13. Click Next.
14. On the Column Attributes panel, verify or edit the column properties for each column.
15. Optionally, create a description column for rows in the reference table. Enter the name and precision for the column.
16.
17. Click Finish.
1. In the Navigator, select the project or folder to store the reference table object.
2.
3.
4. Select Unmanaged Table to create a table that does not store data in the reference data warehouse.
To perform edit operations on an unmanaged reference table, select the Editable option.
5. Click Next.
6.
7.
8.
9. Click Next.
10.
11. On the Folders panel, verify the project or folder to store the reference table.
The Reference Tables panel lists the reference tables in the folder that you select.
12. Click Finish.
1. In the Navigator, select the project or folder to store the reference table object.
2.
3.
4. Select Unmanaged Table if you want to create a table that does not store data in the reference data warehouse.
If you want to perform edit operations on an unmanaged reference table, select the Editable option.
5. Click Next.
6.
7.
8. Click OK.
The database connection appears in the list of available connections.
1. Browse the Model repository, and find the reference table you want to copy.
2. Right-click the reference table, and select Duplicate from the context menu.
3. In the Duplicate dialog box, select a folder to store the copy of the reference table.
4. Optionally, enter a new name for the copy of the reference table.
5. Click OK.
Managing Columns
Use the Edit column properties dialog box to manage the columns in a reference table. You can also set table
properties in the Edit column properties dialog box.
1. In the Navigator, select the project or folder that contains the reference table that you want to edit.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Edit Table or click the Edit Table icon.
The Edit column properties dialog box appears. Use the dialog box options to perform the following operations:
- Change the valid column in the table.
- Delete a column from the table.
- Change a column name.
- Update the descriptive text for a column.
- Update the editable status of the reference table.
- Update the audit note for the table. The audit note appears in the audit log for any action that you perform in the reference table.
Managing Rows
You can add, edit, or delete rows in a reference table.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it. The table opens in the Reference Table tab.
3. Edit the data rows. You can edit the data rows in the following ways:
- To add a row, select Actions > Add Row. In the Add Row window, enter a value for each column. Optionally, enter an audit note. Click OK to apply the changes.
- To edit a data value, double-click the value in the reference table and update the value. After you edit the data, use the row-level options to accept or reject the edit.
- To edit multiple rows, select the rows to edit and select Actions > Edit. In the Edit Multiple Rows window, enter a value for each column in the row. Optionally, enter an audit note. Click OK to apply the changes.
- To delete rows, select the rows to delete and click Actions > Delete.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it. The table opens in the Reference Table tab.
3.
4.
5. Search the columns you select for the data value in the Find field.
Use the following options to replace values one by one or to replace all values:
- Use the Next and Previous options to find values one by one. To replace a value, select Replace.
- Use the Highlight All option to display all instances of the value. To replace all instances of the value, select Replace All.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it. The table opens in the Reference Table tab.
3.
4.
Option
Description

File Name
Name of the exported file.

File Format
Format of the exported file. You can select the following formats:
- csv. Comma-separated file.
- xls. Microsoft Excel file.
- dic. Informatica dictionary file.

Column Name Option
Select the option to indicate that the first row of the file contains the column names.

Code Page
Code page of the reference data. The default code page is UTF-8.
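The csv export option corresponds to writing the rows, optionally preceded by the column names, to a comma-separated file. A minimal sketch, assuming an illustrative function name and row layout:

```python
import csv

def export_reference_table(rows, columns, path, include_header=True,
                           encoding="utf-8"):
    """Write reference table rows to a csv file.

    include_header mirrors the column name option above; the default
    encoding mirrors the default UTF-8 code page.
    """
    with open(path, "w", newline="", encoding=encoding) as f:
        writer = csv.writer(f)
        if include_header:
            writer.writerow(columns)
        writer.writerows(rows)
```

The xls and dic formats require format-specific writers and are not covered by this sketch.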
1. In the Navigator, select the project or folder that contains the reference table.
2.
3. Click Actions > Edit Table or click the Edit Table icon.
The Edit column properties window appears.
4.
You can configure query options on the Audit Trail tab to filter the log events that you view. You can specify filters on
the date range, type, user name, and status. The following table describes the options you configure when you view
audit trail log events:
Option
Description

Date
Start and end dates for the log events to search for. Use the calendar to choose dates.

Type
Type of audit trail events. You can filter and view the following event types:
- Data. Events related to data in the reference table. Events include creating, editing, deleting, and replacing all rows.
- Metadata. Events related to reference table metadata. Events include creating reference tables, adding, deleting, and editing columns, and updating valid columns.

User
User who edited the reference table and entered the audit trail comment. The Analyst tool generates the list of users from the Analyst tool users configured in the Administrator tool.

Status
Status of the audit trail log events. Status corresponds to the action performed in the reference table editor.
Audit trail log events also include the audit trail comments and the column values that were inserted, updated, or
deleted.
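The query options act as a conjunction of filters over the log events. A sketch, assuming for illustration that each event is a dict with date, type, user, and status keys:

```python
from datetime import date

def filter_audit_events(events, start=None, end=None, event_type=None,
                        user=None, status=None):
    """Return the log events that match every specified query option."""
    matches = []
    for event in events:
        if start and event["date"] < start:
            continue  # before the start of the date range
        if end and event["date"] > end:
            continue  # after the end of the date range
        if event_type and event["type"] != event_type:
            continue  # Data vs. Metadata filter
        if user and event["user"] != user:
            continue
        if status and event["status"] != status:
            continue
        matches.append(event)
    return matches
```

Leaving an option unset, as in the Analyst tool, means that option does not restrict the results.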
1. In the Navigator, select the project or folder that contains the reference table that you want to view the audit trail for.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3.
4.
5. Click Show.
The log events for the specified query options appear.
Server database, the Analyst tool cannot display the preview if the table, view, schema, synonym, or column
names contain mixed case or lowercase characters.
To preview data in tables that reside in case-sensitive databases, set the Support Mixed Case Identifiers attribute
to true. Set the attribute to true in the connections for Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, and Microsoft
SQL Server databases in the Developer tool or Administrator tool.
When you create a reference table from inferred column patterns in one format, the Analyst tool can display the patterns in the reference table in a different format.
For example, when you create a reference table for the column pattern X(5), the Analyst tool displays the following
format for the column pattern in the reference table: XXXXX.
When you import an Oracle database table, verify the length of any VARCHAR2 column in the table. The Analyst
tool cannot import an Oracle database table that contains a VARCHAR2 column with a length greater than
1000.
To read a reference table, you need execute permissions on the connection to the database that stores the table
data values. For example, if the reference data warehouse stores the data values, you need execute permissions
on the connection to the reference data warehouse. You need execute permissions to access the reference table
in read or write mode. The database connection permissions apply to all reference data in the database.
INDEX
A
Analyst tool
find and replace reference data values 97
C
column profile
drilldown 64
Informatica Developer 36
options 35
overview 34
process 55
column profile results
Informatica Developer 38
column properties
reference tables in Analyst tool 87
reference tables in Developer tool 47
creating a custom profile
profiles 56
creating a reference table from column patterns
reference tables 91
creating a reference table from column values
reference tables 91
creating a reference table from profile columns
reference tables 90
creating a reference table manually
reference tables 89
creating an expression rule
rules 69
D
data object profiles
creating a single profile 37
Developer tool
find and replace reference data values 52
E
export
scorecard lineage to XML 44
exporting a reference table
reference tables 98
expression rules
process 69
F
flat file properties
reference tables in Analyst tool 87
reference tables in Developer tool 47
flat files
synchronizing a flat file data object 59
I
importing a reference table
reference tables 93
Informatica Analyst
column profile results 60
column profiles overview 54
rules 67
Informatica Data Quality
overview 2
Informatica Developer
rules 41
M
managing columns
reference tables 96
managing rows
reference tables 97
mapping object
running a profile 45
Mapplet and Mapping Profiling
Overview 45
P
predefined rules
process 68
profile results
column patterns 62
column statistics 63
column values 62
drilling down 64
Excel 65
exporting 64
exporting from Informatica Analyst 65
exporting in Informatica Developer 40
summary 61
profiles
creating a custom profile 56
running 58
R
reference tables
column properties in Analyst tool 87
column properties in Developer tool 47
creating a reference table from column patterns 91
creating a reference table from column values 91
S
scorecard
configuring global notification settings 78
configuring notifications 78
viewing in external applications 80
scorecard integration
Informatica Analyst 79
scorecard lineage
viewing from Informatica Developer 44
viewing in Informatica Analyst 80
scorecards
adding columns to a scorecard 72
creating a metric group 75
defining thresholds 74
deleting a metric group 76
drilling down 76
editing 74
editing a metric group 75
Informatica Analyst 71
Informatica Analyst process 71
Informatica Developer 43
metric groups 75
metric weights 72
metrics 72
moving scores 75
notifications 77
overview 35
running 73
viewing 73
T
tables
synchronizing a relational data object 59
trend charts
viewing 77
V
viewing audit table events
reference tables 99