How to Evaluate the Accuracy of Address Records

© 2011 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.

Abstract
The Mailability Score, Match Code, and Result Percentage ports on the Address Validator transformation provide you with general information about the deliverability and accuracy of address data. This article tells you how to use the ports to evaluate the data quality of address records. This article also shows you how to simplify the output codes from the ports so that they are easy to understand.

Supported Versions
Informatica Data Quality 9.1.0

Table of Contents
Overview
When to Use the Status Info Ports
Status Info Port Definitions
Using Mapplet Rules to Read Status Info Port Outputs
Installing the Core Accelerator
Rule Descriptions
How to Read Mailability Score Output Codes
How to Read Match Code Output Codes
Conclusion

Overview
The Address Validator transformation uses reference data to evaluate the accuracy and deliverability of postal addresses. The transformation can correct errors in an address, add data to an address, and provide status information on address data quality.

For example, the United States Postal Service (USPS) provides reference data that identifies every mailbox in the United States. When the Address Validator transformation reads a United States address record, it compares each address record with reference data that the USPS provides to Informatica.

The Address Validator transformation handles the input addresses in the following ways:
- If the transformation finds a perfect match between the input address and the USPS reference data, it writes the address information to the output ports with no change.
- If the transformation finds a partial match in the reference data, it selects the correct address elements from the reference data and writes the correct elements to the output ports.
- If the transformation cannot find a match in the reference data, it attempts to write the correct form of each input element to the output ports. The resulting record may not contain a deliverable address.

In each case, the transformation writes the results of the matching operation to status ports that indicate the data quality of the address.

The following illustration shows the Status Info port group on the Templates tab:

Note: The Status Info port group includes the Address Type port. This output port describes the type of mailbox in a United States or Canadian address. The Address Type port does not contain information about the deliverable status of the address.

When to Use the Status Info Ports


The Mailability Score, Match Code, and Result Percentage ports provide summary information about the data quality of each address in the data set. The Element Input Status, Element Relevance, and Element Result Status ports provide detailed information on each element in each address record. The Mailability Score, Match Code, and Result Percentage outputs are useful indicators of whether you need to define an address validation stage in a data project. If the Mailability Score, Match Code, and Result Percentage outputs indicate that all addresses meet your data quality standards, you do not need to perform additional address validation. If you find one or more addresses that are not valid, review the outputs on the Element Input Status, Element Relevance, and Element Result Status ports. Use the ports to identify the address elements that you need to fix.

Status Info Port Definitions


Each Status Info port provides a different type of information about an address record. The Mailability Score, Match Code, and Result Percentage ports perform the following types of analysis:

Mailability Score
This port describes the likely outcome of any attempt to deliver mail to the address. The Mailability Score output is a text description that summarizes the quality of the address in terms of the risk to mail delivery. Select this port when you need general information on the quality of the input data that you connect to the Address Validator transformation.

Match Code
This port describes the results of the validation operation that the Address Validator performed on the input address. The Match Code output is a two-character string that represents the success or failure of the operation. Select this port to identify output address records that are valid or not valid.

Result Percentage
This port indicates the degree of overall similarity between an input address and the address validation results. The Result Percentage output is a percentage value. Higher percentage values indicate greater similarity between the input and output address. Select this port to identify address records that changed during address validation and to review the extent of the changes.
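As a rough illustration of how Result Percentage values might be reviewed outside the Developer tool, the following Python sketch lists the records that changed during validation, with the most-changed records first. It assumes that the port values were exported as numeric strings and that a value of 100 means the output is identical to the input. The field names `id` and `result_percentage` and the function name are hypothetical.

```python
def changed_records(records, threshold=100.0):
    """Return records whose Result Percentage is below threshold.

    The list is sorted so that the records that changed most come first.
    """
    flagged = [r for r in records if float(r["result_percentage"]) < threshold]
    return sorted(flagged, key=lambda r: float(r["result_percentage"]))


# Hypothetical export of the Result Percentage port, one dict per record.
sample = [
    {"id": 1, "result_percentage": "100.00"},
    {"id": 2, "result_percentage": "82.50"},
    {"id": 3, "result_percentage": "95.00"},
]

for record in changed_records(sample):
    print(record["id"], record["result_percentage"])
```

Lowering the `threshold` argument narrows the review to the records with the largest changes.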

Using Mapplet Rules to Read Status Info Port Outputs


The Mailability Score and Match Code ports provide information about address quality in coded format. You can use Informatica mapplet rules to parse information from the output codes. The rules use reference tables to convert each code value into an English-language equivalent. The rules and reference tables are part of the Core Accelerator, which is available to Data Quality customers.

The Core Accelerator contains a rule for each port on the Status Info output group except the Result Percentage port. Result Percentage does not write coded output.

To use an accelerator rule in an address validation mapping, complete the following steps:
1. Download the accelerator, and import the accelerator objects to the Model repository.
2. Add a rule to an address validation mapping. The mapping must read address records from a data object, and it must contain an Address Validator transformation.
3. Connect a Status Info port on the Address Validator transformation to the rule you added to the mapping. The rule name contains the name of the port that you connect to.
4. Run the mapping, or run the Data Viewer on the Address Validator transformation. If you run the mapping, add a writable data object as a target and connect the rule output to the data object.
5. Read the rule output. If you ran a mapping, open the data object. If you ran the Data Viewer, resize the Developer tool so that the Data Viewer columns are visible.
6. Evaluate the rule output, and decide the next steps you need to take for the address data.

Installing the Core Accelerator


You download the Core Accelerator with the Data Quality Content Installer. The accelerator object XML file and the reference table ZIP file are in the Accelerator_Content directory of the Content Installer package. Use the Developer tool to import the accelerator rules to the Model repository. The rules and reference tables appear in the Model repository in the project you specified during import.

Find the rules you need for the status information ports in this repository location:
[Content_Project_Name]\[Rule_Folder_Name]\General_Data_Cleansing

The reference tables install to this location:
[Content_Project_Name]\[Rule_Folder_Name]\Dictionaries

You do not need to open or edit the reference tables.

Note: The reference tables used by the rules are different from the address reference data files used by the Address Validator transformation. You purchase the address reference data files from Informatica. You cannot open or edit the address reference data files.

Rule Descriptions
Each rule contains an input data object, a Parser transformation, and an output data object. The following rules parse the output codes on the Match Code and Mailability Score ports:

rule_Assign_DQ_90_Mailability_Score_Description
This rule writes a description of the output code values on the Mailability Score port. The Parser transformation in this rule reads the reference table DQ90_AV_MailabilityScores_infa.

rule_Assign_DQ_90_Match_Code_Descriptions
This rule writes a description of the output code values on the Match Code port. The Parser transformation in this rule reads the reference table DQ90_Match_code_desc_infa.

Note: The Core Accelerator does not contain a mapplet rule for the Result Percentage port. The Result Percentage port does not write coded output. The port writes a percentage value.

The following illustration shows rule_Assign_DQ_90_Mailability_Score_Description in the Developer tool:
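The conversion that these rules perform amounts to a table lookup. The Python sketch below imitates that lookup outside Informatica; it is not the Parser transformation itself. It assumes a reference table exported as two-column CSV text (code, description), which is a hypothetical layout, and the function names are illustrative. The sample rows use Match Code reference text quoted in this article.

```python
import csv
import io

# Hypothetical two-column export of a reference table: code, description.
REFERENCE_CSV = """\
S4,Parsed perfectly
R0,Country not recognized
I1,Data could not be corrected and is pretty unlikely to be delivered.
"""


def load_reference_table(text):
    """Read code/description pairs from CSV text into a dict."""
    return {row[0].strip(): row[1].strip()
            for row in csv.reader(io.StringIO(text)) if len(row) >= 2}


def describe(code, table):
    """Return the description for a code, or a marker for unknown codes."""
    return table.get(code.strip(), "unknown code")


table = load_reference_table(REFERENCE_CSV)
for code in ["S4", "R0", "ZZ"]:
    print(code, "->", describe(code, table))
```

Returning a marker for unknown codes, rather than raising an error, keeps a batch run going when a record carries an unexpected value.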

How to Read Mailability Score Output Codes


The Mailability Score port output is a single digit that indicates the likelihood of successful delivery to the output address. After you run the address validation mapping, review the output data from this port to determine if an address needs further validation. Connect the port to rule_Assign_DQ_90_Mailability_Score_Description to generate text descriptions of each output code.

The following table lists the possible values in the output code and the text that the transformation reads from the reference table DQ90_AV_MailabilityScores_infa:
Output Code: 5
Reference Table Text: completely confident
Description: All address data elements that are relevant to delivery are present and correct.

Output Code: 4
Reference Table Text: almost certain
Description: The address has a unique match in the address reference data and one of the following cases applies:
- Some data elements could not be checked by the address reference data.
- Some data elements were corrected with a very high degree of confidence.
The validation process returns this output code when the number of unmatched elements is very low.

Output Code: 3
Reference Table Text: should be fine
Description: Some data elements were corrected with a very high degree of confidence. The validation process returns this output code when the address has a unique match in the address reference data and the number of unmatched elements is acceptable.

Output Code: 2
Reference Table Text: fair chance
Description: Address data elements that are relevant to delivery are present, and one of the following scenarios also applies:
- The validation process did not find a strong match in the address reference data.
- The validation process found multiple matches and has similar levels of confidence in each match.

Output Code: 1
Reference Table Text: risky
Description: The validation process found a partial match between the input data and the address reference data. The output address is likely to be incomplete.

Output Code: 0
Reference Table Text: futile
Description: The input address is missing too many elements, or a majority of the elements generated no matches in the address reference data.
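To get a quick picture of a whole data set, the Mailability Score values can be tallied by description. The Python sketch below shows one way to do that. It assumes the codes run from 5 (completely confident) down to 0 (futile) in the order that the descriptions are listed above. The function names and the `minimum` threshold are hypothetical, and the threshold is a project decision rather than an Informatica setting.

```python
from collections import Counter

# Code-to-text pairs, assuming 5 is the best score and 0 is the worst.
MAILABILITY_TEXT = {
    "5": "completely confident",
    "4": "almost certain",
    "3": "should be fine",
    "2": "fair chance",
    "1": "risky",
    "0": "futile",
}


def summarize_mailability(scores):
    """Count how many records fall under each mailability description."""
    return Counter(MAILABILITY_TEXT.get(str(s).strip(), "unknown code")
                   for s in scores)


def needs_review(score, minimum=3):
    """Flag a record whose score is below a project-defined minimum."""
    return int(score) < minimum


# Hypothetical Mailability Score port output for six records.
sample_scores = ["5", "5", "4", "1", "0", "5"]
print(summarize_mailability(sample_scores))
print([s for s in sample_scores if needs_review(s)])
```

Records that `needs_review` flags are candidates for the element-level review on the Element Input Status, Element Relevance, and Element Result Status ports.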

How to Read Match Code Output Codes


The Match Code port output describes the results of the address validation operation performed on the input address. After you run the address validation mapping, review the output data from this port to establish the data quality of the input addresses. Connect the port to rule_Assign_DQ_90_Match_Code_Descriptions to generate text descriptions of each output code. The following table lists the possible values at each position in the output code and the text that the transformation reads from the reference table DQ90_Match_code_desc_infa:
Output Code: V4
Reference Table Text: Verified - Input data correct - all elements were checked and input matched perfectly
Description: The input address is a perfect match with a single address in the address data. The input and output addresses in the record use the same information.

Output Code: V3
Reference Table Text: Verified - Input data correct on input but some or all elements were standardised or input contains outdated names or exonyms
Description: The output address matches a single address in the address data. The Address Validator transformation edited one or more input data elements for one of the following reasons:
- An input element uses a name other than the local name.
- An input element uses a name that is out of date.

Output Code: V2
Reference Table Text: Verified - Input data correct but some elements could not be verified because of incomplete reference data
Description: The output address matches a single address in the address data, but the Address Validator transformation could not verify every input element because some address reference data files are not installed.

Output Code: V1
Reference Table Text: Verified - Input data correct but the user standardisation has deteriorated deliverability (wrong element user standardisation - for example postcode length chosen is too short). Not set by validation.
Description: The input address matches a single address in the address data, but the Address Validator transformation cannot write some output data because an output port has the wrong precision. The output address may be undeliverable.

Output Code: C4
Reference Table Text: Corrected - all elements have been checked
Description: The input address contains information that matches a single address in the address data, and the Address Validator transformation replaced one or more elements with new elements from the address reference data. All output elements are verified correct for the address.

Output Code: C3
Reference Table Text: Corrected - but some elements could not be checked
Description: The input address contains information that matches a single address in the address data, and the Address Validator transformation replaced one or more elements with new elements from the address reference data. All output elements are verified correct for the address. However, the transformation could not verify every input element.

Output Code: C2
Reference Table Text: Corrected - but delivery status unclear (lack of reference data)
Description: The Address Validator transformation replaced one or more elements with new elements from the address reference data. However, the transformation could not verify deliverability as some address reference data files are not installed.

Output Code: C1
Reference Table Text: Corrected - but delivery status unclear because user standardisation was wrong. Not set by validation.
Description: The Address Validator transformation replaced one or more elements with new elements from the address reference data. However, the transformation could not verify deliverability as some input elements cannot be corrected.

Output Code: I4
Reference Table Text: Data could not be corrected completely but is very likely to be deliverable - single match (e.g. HNO is wrong but only 1 HNO is found in reference data)
Description: The output address matches a single address in the address data, but the Address Validator transformation could not verify every input element. The input data is likely to contain an error.

Output Code: I3
Reference Table Text: Data could not be corrected completely but is very likely to be deliverable multiple matches (e.g. HNO is wrong but more than 1 HNO is found in reference data)
Description: The output address is very likely to be deliverable but the Address Validator transformation found multiple matches for one or more input elements in the address reference data. For example, a house number is incorrect but the number is in the correct range.

Output Code: I2
Reference Table Text: Data could not be corrected but there is a slim chance that the address is deliverable
Description: The Address Validator transformation cannot find matching address data in the address reference data. However, the input record contains data in an address format that may be deliverable.

Output Code: I1
Reference Table Text: Data could not be corrected and is pretty unlikely to be delivered.
Description: The Address Validator transformation cannot find matching address data in the address reference data. The input record does not contain data that is likely to be deliverable.

Output Code: Q3
Reference Table Text: FastCompletion Status - Suggestions are available - complete address
Description: The Address Validator transformation found multiple good matches for the input record data in the address reference data. The transformation returns this code in Suggestion List mode.

Output Code: Q2
Reference Table Text: FastCompletion Status - Suggested address is complete but combined with elements from the input (added or deleted)
Description: The Address Validator transformation found a partial match for the input record data in the address reference data. The transformation returned a complete address. The transformation returns this code in Suggestion List mode.

Output Code: Q1
Reference Table Text: FastCompletion Status - Suggested address is not complete (enter more information)
Description: The Address Validator transformation did not find a match for the input record data in the address reference data. The transformation returned a partial address. The transformation returns this code in Suggestion List mode.

Output Code: Q0
Reference Table Text: FastCompletion Status - Insufficient information provided to generate suggestions.
Description: The Address Validator transformation did not find a match for the input record data in the address reference data. The transformation did not return any data for the address. The transformation returns this code in Suggestion List mode.

Output Code: RA
Reference Table Text: Country recognized from ForceCountryISO3 Setting
Description: The Address Validator transformation used the Force Country setting to add country name data to the address.

Output Code: R9
Reference Table Text: Country recognized from DefaultCountryISO3 Setting
Description: The Address Validator transformation used the Default Country setting to add country name data to the address.

Output Code: R8
Reference Table Text: Country recognized from name without errors
Description: The Address Validator transformation identified a destination country from the input data.

Output Code: R7
Reference Table Text: Country recognized from name with errors
Description: The Address Validator transformation identified a destination country from the input data, but the input data contains inconsistent data for this country.

Output Code: R6
Reference Table Text: Country recognized from territory
Description: The Address Validator transformation identified a destination country from state or national territory information in the input data.

Output Code: R5
Reference Table Text: Country recognized from province
Description: The Address Validator transformation identified a destination country from province information in the input data.

Output Code: R4
Reference Table Text: Country recognized from major town
Description: The Address Validator transformation identified a destination country from city or town information in the input data.

Output Code: R3
Reference Table Text: Country recognized from format
Description: The Address Validator transformation identified a destination country from the structure of the address.

Output Code: R2
Reference Table Text: Country recognized from script
Description: The Address Validator transformation identified a destination country from data provided by a script.

Output Code: R1
Reference Table Text: Country not recognized - multiple matches
Description: The Address Validator transformation identified several possible destination countries. The transformation did not verify a country for the address.

Output Code: R0
Reference Table Text: Country not recognized
Description: The Address Validator transformation could not identify a destination country for the input address data.

Output Code: S4
Reference Table Text: Parsed perfectly
Description: The Address Validator transformation parsed all input elements successfully. The transformation returns this code in Parsing mode.

Output Code: S3
Reference Table Text: Parsed with multiple results
Description: The Address Validator transformation parsed all input elements, but some elements match multiple element types. The transformation returns this code in Parsing mode.

Output Code: S2
Reference Table Text: Parsed with Errors - Elements change position
Description: The Address Validator transformation parsed all input elements, but the transformation changed the element type in one or more cases. The transformation returns this code in Parsing mode.

Output Code: S1
Reference Table Text: Parse Error - Input Format Mismatch
Description: The Address Validator transformation could not parse input elements because the address structure did not match the address reference data structure. The transformation returns this code in Parsing mode.

Output Code: N6
Reference Table Text: Validation Error: No validation performed because input data was insufficient
Description: The Address Validator transformation could not validate the address because the transformation lacked usable input data.

Output Code: N5
Reference Table Text: Validation Error: No validation performed because reference database is too old please contact Address Doctor to obtain updated reference data
Description: The Address Validator transformation could not validate the address because the address reference data is out of date.

Output Code: N4
Reference Table Text: Validation Error: No validation performed because reference database is corrupt or in wrong format
Description: The Address Validator transformation could not validate the address because it could not read the address reference data.

Output Code: N3
Reference Table Text: Validation Error: No validation performed because country could not be unlocked
Description: The Address Validator transformation could not validate the address because it could not find an address reference data license.

Output Code: N2
Reference Table Text: Validation Error: No validation performed because required reference database is not available
Description: The Address Validator transformation could not validate the address because it could not find address reference data for the destination country.

Output Code: N1
Reference Table Text: Validation Error: No validation performed because country was not recognized
Description: The Address Validator transformation could not validate the address because it could not associate the input address with a country.
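Because every Match Code has two characters, the first character can be used to group records by result family before anyone reads the individual descriptions. The Python sketch below shows that grouping. The family labels paraphrase the reference table text above, and the function names are hypothetical. Treating all V and C codes as one group is a simplification, because V1, C1, and C2 report uncertain deliverability.

```python
# First character of a Match Code -> result family, per the codes above.
FAMILIES = {
    "V": "verified",
    "C": "corrected",
    "I": "could not be corrected",
    "Q": "FastCompletion (Suggestion List mode)",
    "R": "country recognition",
    "S": "parsing (Parsing mode)",
    "N": "validation error",
}


def split_match_code(code):
    """Split a two-character Match Code into (family, level)."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in FAMILIES:
        raise ValueError(f"not a recognized Match Code: {code!r}")
    return FAMILIES[code[0]], code[1]


def is_verified_or_corrected(code):
    """True for V and C codes, which report a verified or corrected address."""
    return code.strip().upper()[:1] in ("V", "C")


for sample_code in ["V4", "C2", "I1", "RA"]:
    print(sample_code, split_match_code(sample_code),
          is_verified_or_corrected(sample_code))
```

The level is kept as a character rather than converted to a number because the RA code uses a letter in the second position.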

Conclusion
The Status Info ports contain detailed information about address accuracy and deliverability. Select a Mailability Score, Match Code, or Result Percentage port when you need general information about the deliverability of an address. Use the Informatica accelerator rules to convert the port output codes into text descriptions that you can more quickly understand.

Author
David Handy
Principal Technical Writer
