Informatica Data Quality
The core Informatica Data Quality applications are Workbench, Server, and the Data Quality Integration.

Data Quality Workbench - Use to design, test, and deploy data quality processes, called plans. Workbench allows you to test and execute plans as needed, enabling rapid data investigation and testing of data quality methodologies. You can also deploy plans, along with associated data and reference files, to other Data Quality machines. Plans are stored in a Data Quality repository. Workbench provides access to fifty database-based, file-based, and algorithmic data quality transformations that you can use to build plans.

Data Quality Server - Use to enable plan and file sharing and to run plans in a networked environment. Data Quality Server supports networking through service domains and communicates with Workbench over TCP/IP. Data Quality Server allows multiple users to collaborate on data projects, speeding up the development and implementation of data quality solutions.

Data Quality Integration - A plug-in for Informatica PowerCenter. It allows PowerCenter users to connect to a Data Quality repository and to pass data quality plan instructions into a transformation. When PowerCenter runs a workflow containing the transformation, it sends the plan instructions to the Data Quality engine for execution and retrieves the data quality results back into the workflow.

The following frequently used transformations are available in Data Quality Workbench for building plans (transformations that already exist in PowerCenter are excluded here):

1. Address Validator Transformation
Address validation compares input address data with address reference data to determine the accuracy of input addresses and fix any errors they contain. The reference data is subscription based. The address reference datasets read by the Address Validator transformation do not install with Informatica applications.
You must download the address reference datasets separately and install them using the Data Quality Content Installer.

2. Case Converter Transformation
The Case Converter transformation creates data uniformity by standardizing the case of strings in input data.

3. Comparison Transformation
The Comparison transformation evaluates the similarity between pairs of input strings and calculates the degree of similarity for each pair as a numerical score. The transformation outputs match scores in a range from 0 to 1, where 1 indicates a perfect match. Different comparison strategies can be selected in this transformation according to the requirement.

4. Consolidation Transformation
Use the Consolidation transformation to create a single, consolidated record from records identified as duplicates by the Match transformation.
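To make the 0-to-1 match scores concrete, here is a minimal sketch of string-pair similarity scoring in the style of the Comparison transformation. This is an illustration only, not Informatica's implementation: it uses Python's standard-library SequenceMatcher ratio as a stand-in for whichever comparison strategy the plan designer selects, and the sample strings are invented.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a match score from 0 to 1, where 1 indicates a perfect match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Invented sample pairs: an exact match, a near-duplicate, and a non-match.
pairs = [
    ("INFORMATICA", "Informatica"),
    ("Jon Smith", "John Smyth"),
    ("Acme Corp", "Zenith Ltd"),
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

Case is folded before comparison, so the first pair scores a perfect 1.0 while the near-duplicate names score high but below 1.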
The Consolidation transformation reads the Association IDs created by the Association transformation to perform this task.

5. Key Generator Transformation
The Key Generator transformation enables faster duplicate analysis by identifying records that contain common information in a field that you select. The transformation creates a group key for each record based on the data in the selected field. Records with common group key values are processed together by the Match transformation. Group keys enable the Match transformation to process records as a series of groups rather than as a single dataset. This matters because the duplicate analyses performed by the Match transformation can consume significant amounts of computing resources, as the number of match-pair comparisons grows rapidly with the number of records compared.

6. Match Transformation
The Match transformation calculates the degrees of similarity between data records. The types of data project that can require duplicate analysis include CRM projects, mergers or acquisitions, regulatory compliance initiatives, etc.

7. Merge Transformation
The Merge transformation reads the data values from multiple input columns and creates a single output column.

8. Labeler Transformation
The Labeler transformation examines input fields and creates labels that describe the type of characters or strings in each field. A label is a character or string that represents the type of character or string that a field contains. Use the Labeler transformation to identify the types of information in your source data.

9. Association Transformation
The Association transformation processes output data from a Match transformation. It creates links between duplicate records that are assigned to different match clusters, so that these records can be associated together in data consolidation and master data management operations.

10. Standardizer Transformation
The Standardizer transformation replaces input data strings with standardized strings.

11. Parser Transformation
The Parser transformation reads data fields containing multiple information types and creates new fields for each information type.

12. Weighted Average Transformation
The Weighted Average transformation reads match scores from two or more matching operations and calculates an aggregate match score. A weight is a numerical value. You can edit the weight applied to each input score to increase or decrease its contribution to the aggregate score, so that the final score for a record reflects the relative importance of each data field in the duplicate analysis. The default weight is 0.5.
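The interaction between group keys, matching, and weighted score aggregation described above can be sketched as follows. This is an illustrative simplification, not Informatica's implementation: the group key is taken to be the first three characters of the zip code (a hypothetical choice), pairwise comparisons run only within each group, and the field weights and sample records are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Invented sample records for illustration.
records = [
    {"id": 1, "name": "John Smyth",  "zip": "10001"},
    {"id": 2, "name": "Jon Smith",   "zip": "10003"},
    {"id": 3, "name": "Ann O'Brien", "zip": "94105"},
]

def score(a: str, b: str) -> float:
    """Per-field match score from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Key Generator step: build a group key from the selected field.
groups = {}
for rec in records:
    groups.setdefault(rec["zip"][:3], []).append(rec)

# Match + Weighted Average steps: compare pairs only within a group,
# then combine per-field scores using user-assigned weights.
weights = {"name": 0.7, "zip": 0.3}   # hypothetical weights summing to 1
for key, group in groups.items():
    for a, b in combinations(group, 2):
        agg = sum(weights[f] * score(a[f], b[f]) for f in weights)
        print(f"group {key}: record {a['id']} vs {b['id']} -> {agg:.2f}")
```

Records 1 and 2 share the group key "100" and are compared; record 3 sits alone in group "941", so no comparison is wasted on it.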
Reference Data

Plans can make use of reference data to identify, repair, or remove inaccurate or duplicate data values. The types of reference information include telephone area codes, postcode formats, social security number formats, first names, occupations, and acronyms.

Informatica Data Quality plans can make use of three types of reference data:

- Standard dictionary files. These files are installed with Informatica Data Quality and can be used by several types of component in Workbench.
- Database dictionaries. Informatica Data Quality users with database expertise can create and specify dictionaries that are linked to database tables, and that thus can be updated dynamically when the underlying data is updated.
- Third-party reference data. These data files are provided by third parties and are available to Informatica customers as premium product options. The reference data provided by third-party vendors is typically in database format.

Some common reference data:

- Address reference data files, containing information on all valid addresses in a country. The Address Validator transformation reads this data. You purchase an annual subscription to address data for a country.
- Identity populations, containing information on types of personal, household, and corporate identities. The Match transformation and the Comparison transformation use this data to parse potential identities from input fields.
- Reference data tables containing information on common business terms from several countries.

The following transformations can read reference data:

- Address Validator - Reads address reference data to verify the accuracy of addresses.
- Case Converter - Reads reference data tables to identify strings that must change case.
- Comparison - Reads identity population data during duplicate analysis.
- Labeler - Reads reference data tables to identify and label strings.
- Match - Reads identity population data during duplicate analysis.
- Parser - Reads reference data tables to parse strings.
- Standardizer - Reads reference data tables to standardize strings to a common format.

NOTE: Some reference data dictionaries are installed with Data Quality, while subscriptions to others, such as the address and identity data, must be purchased explicitly.
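A standard dictionary file is, in essence, a lookup from variant strings to a standard form. The following sketch shows how a Standardizer-style component might apply such a dictionary; the dictionary entries and input string are invented for illustration and do not come from an Informatica dictionary file.

```python
# Hypothetical excerpt from a reference dictionary: variant -> standard form.
dictionary = {
    "st.": "Street", "st": "Street",
    "ave.": "Avenue", "ave": "Avenue",
    "inc.": "Incorporated", "inc": "Incorporated",
}

def standardize(text: str) -> str:
    """Replace each token found in the dictionary with its standard form."""
    return " ".join(dictionary.get(tok.lower(), tok) for tok in text.split())

print(standardize("42 Main St. Acme Inc"))   # -> 42 Main Street Acme Incorporated
```

Tokens not present in the dictionary pass through unchanged, so the lookup is safe to apply to whole fields.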
Sample Plan

Below is a demonstration of plan building in action, following a simple data quality project from start to finish. The project analyzes, cleanses, and standardizes a United States business-to-business dataset of approximately 1,200 records. The project operations take place in Data Quality Workbench. The dataset file is named IDQDemo.csv, and it comprises the following columns: Customer Number, Contact Name, Company Name, Address 1, Address 2, Address 3, Address 4, Zip Code, ISO Country Code, Currency, and Customer Turnover.

The transformations have been configured to assess data quality in the following ways:

- The Merge transformation has been configured to merge four fields from IDQDemo.csv (Address 1 through Address 4) into a single column named Merged Address. It also merges the ISO Country Code and Currency columns into a Merged Country Code and Currency column that will be analyzed by the Token Labeller.
- The Merged Address column is used as input by the Parser, which applies a reference dictionary of city and town names to the merged data and writes the output to a new column named CP City or Town. Any American city names found in Merged Address are written to this new column. An empty field in the CP City or Town column indicates that the underlying address data lacks recognizable city/town information.
- The Rule Based Analyzer has been configured to apply business rules to the Customer_Number and CP City or Town fields. The Test Completeness of Cust_No rule comprises a simple IF statement: if a value is present in a Customer_Number field, the rule writes Complete Customer_Number to a corresponding field in a new column named Customer Number Completeness; if not, the rule writes Incomplete Customer_Number in the relevant field.
- The Test Completeness of Town rule profiles the completeness of the CP City or Town column through a similar IF statement. Any name in a CP City or Town field has already been verified by the Context Parser (see above); the rule writes such names to a new column named City_or_Town Completeness.
- The Character Labeller analyzes the conformity of the Customer_Number field.
On the Filters tab of the transformation's configuration dialog box, filters have been defined to identify the account numbers that begin with 159, 191, and 101. All numbers so identified are written to a new column named Customer Number Conformity, as specified on the transformation's Outputs tab.

The Labeller applies reference data dictionaries to analyze conformity in the Contact Name, Company Name, Zip Code, ISO Country Code, Currency, and Merged Country Code and Currency columns. Contact and company name data are analyzed against dictionaries of name prefixes, first names, and surnames, and against dictionaries of US company names. Similarly, zip code and currency data are analyzed against dictionaries of valid zip codes and currency names respectively.

The Token Labeller also cross-checks that the currency applied to an account is compatible with the country in which the account holder is resident. It does so by comparing the merged country code and currency entries with a dictionary that contains valid country name and currency combinations.

NOTE: The reference dictionaries here refer to the Reference Data described earlier.
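The filter described above amounts to a prefix test on the account number. A minimal sketch (the sample account numbers are invented):

```python
# Prefixes defined on the Filters tab in the project description.
PREFIXES = ("159", "191", "101")

def customer_number_conformity(numbers):
    """Return the account numbers that begin with one of the filtered prefixes."""
    return [n for n in numbers if n.startswith(PREFIXES)]

sample = ["1590021", "2040013", "1910114", "1010007", "3330001"]
print(customer_number_conformity(sample))   # -> ['1590021', '1910114', '1010007']
```

Matching numbers would then populate the Customer Number Conformity output column.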