Error Handling Techniques - PowerCenter Mappings
Identifying and capturing data errors using a mapping approach, and making such errors available for further processing or correction.
Identifying errors and creating an error handling strategy is an essential part of a data integration project. In the production environment, data must be checked and validated prior to entry into the target system. One strategy for catching data errors is to use PowerCenter mappings and error logging capabilities to catch specific data validation errors and unexpected transformation or database constraint errors.
Data Validation Errors
The first step in using a mapping to trap data validation errors is to understand and identify the error handling requirements. Consider the following questions:

· What types of data errors are likely to be encountered?
· Of these errors, which ones should be captured?
· What process can capture the possible errors?
· Should errors be captured before they have a chance to be written to the target database?
· Will any of these errors need to be reloaded or corrected?
· How will the users know if errors are encountered?
· How will the errors be stored?
· Should descriptions be assigned for individual errors?
· Can a table be designed to store captured errors and the error descriptions?

Capturing data errors within a mapping and re-routing these errors to an error table facilitates analysis by end users and improves performance. One practical application of the mapping approach is to capture foreign key constraint errors (e.g., executing a lookup on a dimension table prior to loading a fact table). Referential integrity is assured by including this sort of functionality in a mapping. While the database still enforces the foreign key constraints, erroneous data is not written to the target table; constraint errors are captured within the mapping so that the PowerCenter server does not have to write them to the session log and the reject/bad file, thus improving performance. Data content errors can also be captured in a mapping. Mapping logic can identify content
© 2012 Informatica Corporation. All rights reserved. Phoca PDF
errors and attach descriptions to them. This approach can be effective for many types of data content error, including date conversion, null values intended for not null target fields, and incorrect data formats or data types.

Sample Mapping Approach for Data Validation Errors

In the following example, customer data is to be checked to ensure that invalid null values are intercepted before being written to not null columns in a target CUSTOMER table. Once a null value is identified, the row containing the error is to be separated from the data flow and logged in an error table. One solution is to implement a mapping similar to the one shown below.

An expression transformation can be employed to validate the source data, applying rules and flagging records with one or more errors. A router transformation can then separate valid rows from those containing the errors. It is good practice to append error rows with a unique key; for example, this can be a composite consisting of a MAPPING_ID and ROW_ID. The MAPPING_ID would refer to the mapping name, and the ROW_ID would be created by a sequence generator. The composite key is designed to allow developers to trace rows written to the error tables, which store information useful for error reporting and investigation. In this example, two error tables are suggested, namely CUSTOMER_ERR and ERR_DESC_TBL.
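Outside PowerCenter, the validate-and-route logic performed by the expression and router transformations can be sketched in Python. This is a minimal illustration only; the field names, rule set, and the DIM_LOAD mapping name are taken from the example, while the function names are hypothetical:

```python
# Illustrative sketch of the validation expression and router transformation.
# Each rule returns an error description for an invalid field; rows with any
# errors are routed to the reject path, tagged with ROW_ID and MAPPING_ID.

NOT_NULL_FIELDS = ["NAME", "DOB", "ADDRESS"]

def validate(row):
    """Return the list of error descriptions for one source row."""
    errors = []
    for field in NOT_NULL_FIELDS:
        if row.get(field) is None:
            errors.append(f"{field} is NULL")
    return errors

def route(rows):
    """Separate valid rows from error rows, like a router transformation.
    ROW_ID stands in for the value a sequence generator would assign."""
    valid, rejected = [], []
    for row_id, row in enumerate(rows, start=1):
        errors = validate(row)
        if errors:
            rejected.append({"row": row, "row_id": row_id,
                             "mapping_id": "DIM_LOAD", "errors": errors})
        else:
            valid.append(row)
    return valid, rejected

valid, rejected = route([
    {"NAME": "Smith", "DOB": "1970-01-01", "ADDRESS": "1 Main St"},
    {"NAME": None, "DOB": None, "ADDRESS": None},
])
# valid holds one row; rejected holds one row carrying three error descriptions
```

In a real mapping the rules live in expression transformation ports rather than a Python function, but the routing decision is the same: any non-empty error list sends the row to the error path.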
The CUSTOMER_ERR table can be an exact copy of the target CUSTOMER table, appended with two additional columns: ROW_ID and MAPPING_ID. These columns allow the two error tables to be joined. The CUSTOMER_ERR table stores the entire row that was rejected, enabling the user to trace the error rows back to the source and potentially build mappings to reprocess them.

The table ERR_DESC_TBL is designed to hold information about the error, such as the mapping name, the ROW_ID, and the error description. This table can be used to hold all data validation error descriptions for all mappings, giving a single point of reference for reporting.

The mapping logic must assign a unique description for each error in the rejected row. For example, any null value intended for a not null target field could generate an error message such as 'NAME is NULL' or 'DOB is NULL'. This step can be done in an expression transformation (e.g., EXP_VALIDATION in the sample mapping). After the field descriptions are assigned, the error row can be split into several rows, one for each possible error, using a normalizer transformation. After a single source row is normalized, the resulting rows can be filtered to leave only the errors that are actually present (i.e., each record can have zero to many errors). For example, if a row has three errors, three error rows would be generated with appropriate error descriptions (ERROR_DESC) in the table ERR_DESC_TBL.

The following tables show how the error data produced may look.

Table Name: CUSTOMER_ERR

NAME  DOB   ADDRESS  ROW_ID  MAPPING_ID
NULL  NULL  NULL     1       DIM_LOAD

Table Name: ERR_DESC_TBL

FOLDER_NAME  MAPPING_ID  ROW_ID  ERROR_DESC       LOAD_DATE   SOURCE       TARGET
Ideally. for example. reject data if constraints are violated. © 2012 Informatica Corporation. is its flexibility. Once an error type is identified. we would like to detect these database-level errors automatically and send them to the same error table used to store the soft errors caught by the mapping approach described above. The advantage of the mapping approach is that all errors are identified as either data errors or constraint errors and can be properly addressed. business organizations need to decide if the analysts should fix the data in the reject table or in the source systems. how can we handle unexpected errors that arise in the load? For example. A ‘hard’ error can be defined as one that would fail when being written to the database. A ‘soft’ error can be defined as a data content error. flagging data validation errors as ‘soft’ or ‘hard’. All rights reserved. For example. Ultimately. This improves productivity in implementing and managing the capture of data validation errors. In implementing the mapping approach described above to detect errors and log them to an error table. which can be shared by multiple mappings. An RDBMS may. By using the mapping approach to capture identified errors. The mapping approach also reports errors based on projects or categories by identifying the mappings that contain errors. Constraint and Transformation Errors Perfect data can never be guaranteed. The most important aspect of the mapping approach however. the operations team can effectively communicate data quality issues to the business users. Phoca PDF . the error handling logic can be placed anywhere within a mapping. Common logic should be placed in mapplets. while a record flagged as ‘soft’ can be written to both the target system and the error tables. 
however the relational database management system (RDBMS) may reject it for some unexpected reason.Informatica's Velocity Methodology CUST DIM_LO 1 AD Name is NULL 10/11/20CUSTO CUST 06 MER_FF OMER CUST DIM_LO 1 AD DOB is NULL 10/11/20CUSTO CUST 06 MER_FF OMER CUST DIM_LO 1 AD Address is 10/11/20CUSTO CUST NULL 06 MER_FF OMER The efficiency of a mapping approach can be increased by employing reusable objects. This gives business analysts an opportunity to evaluate and correct data imperfections while still allowing the records to be processed for end-user reporting. PowerCenter may apply the validated data to the database. A record flagged as ‘hard’ can be filtered from the target and written to the error tables. Data validation error handling can be extended by including mapping logic to grade error severity. such as a constraint error.
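The 'soft'/'hard' severity grading described above can be sketched as follows. This is a simplified illustration, assuming each validation rule is tagged with a severity; the rule set and field names are hypothetical:

```python
# Sketch of grading errors as 'soft' or 'hard' and routing rows accordingly.
# Rows with a 'hard' error are filtered from the target; rows with only
# 'soft' errors go to BOTH the target and the error tables, so analysts
# can correct them while reporting continues.

RULES = [
    # (error description, severity, predicate that detects the error)
    ("NAME is NULL", "hard", lambda r: r.get("NAME") is None),
    ("DOB is NULL",  "soft", lambda r: r.get("DOB") is None),
]

def grade(row):
    """Return the list of (error_desc, severity) pairs found in one row."""
    return [(desc, sev) for desc, sev, pred in RULES if pred(row)]

def route(rows):
    target, error_rows = [], []
    for row in rows:
        errors = grade(row)
        error_rows.extend({"row": row, "error_desc": d, "severity": s}
                          for d, s in errors)
        # only rows carrying at least one 'hard' error are kept out of the target
        if not any(s == "hard" for _, s in errors):
            target.append(row)
    return target, error_rows

target, error_rows = route([
    {"NAME": "Smith", "DOB": None},      # soft error: loaded AND logged
    {"NAME": None, "DOB": "1970-01-01"}, # hard error: logged, not loaded
])
```

In the mapping itself this grading would be extra output ports on the validation expression feeding the router's group conditions.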
By default.e.. and the entire source may need to be reloaded or recovered. an additional PowerCenter session) can be implemented to read the PMERR_MSG table. An alternative might be to have the load process continue in the event of records being rejected. or a primary key in the case of a relational data source. ERR_DESC_TBL will contain both ‘soft’ errors and ‘hard’ errors. A post-load session (i. This field contains the row number at the target when the error occurred. Reprocessing © 2012 Informatica Corporation. All rights reserved. the PMERR_MSG table stores the error messages that were encountered in a session. This can be achieved by configuring the ‘stop on errors’ property to 0 and switching on relational error logging for a session. the error-messages from the RDBMS and any un-caught transformation errors are sent to the session log. Phoca PDF . one source row can actually result in zero or more rows at the target). The source key stored in the translation table could be a row number in the case of a flat file.Informatica's Velocity Methodology In some cases. Joining this table with the Metadata Exchange (MX) View REP_LOAD_SESSIONS in the repository allows the MAPPING_ID to be retrieved. this is the name of the target transformation. When a RDBMS error occurs. When the post process ends. · TRANS_NAME: Name of the transformation where an error occurred. all RDBMS errors can be extracted and stored in an applicable error table. The PowerCenter Workflow Administration Guide contains detailed information on the structure of these tables. This can be difficult when the source and target rows are not directly related (i. the process will stop with a failure. One problem with capturing RDBMS errors in this way is mapping them to the relevant source key to provide lineage. However. the ‘stop on errors’ session property can be set to ‘1’ to stop source data for which unhandled errors were encountered from being loaded. 
Switching on relational error logging redirects these messages to a selected database in which four tables are automatically created: PMERR_MSG. In this case. The translation table can then be used by the post-load session to identify the source key by the target row number retrieved from the error log. the mapping that loads the source must write translation data to a staging table (including the source key and target row number).. In this case.e. · TRANS_ROW_ID: Specifies the row ID generated by the last active source. · ERROR_MSG: Error message generated by the RDBMS With this information. This is not always an acceptable approach. the data must be corrected. PMERR_TRANS and PMERR_SESS. and insert the error details into ERR_DESC_TBL. PMERR_DATA. and then reprocess only the records that were found to be in error. The following four columns of this table allow us to retrieve any RDBMS errors: · SESS_INST_ID: A unique identifier for the session. join it with the MX View REP_LOAD_SESSION in the repository.
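As a rough illustration of the post-load step, the sketch below joins simplified stand-ins for PMERR_MSG, the REP_LOAD_SESSIONS view, and the translation staging table to produce ERR_DESC_TBL-style rows. The dictionary shapes and sample values are assumptions for illustration, not the actual repository schema:

```python
# Simplified stand-ins for the error log, the repository view, and the
# translation staging table. The real PMERR_MSG and REP_LOAD_SESSIONS
# have many more columns; only the fields used by the join are modeled.

pmerr_msg = [  # one entry per RDBMS error captured for the session
    {"SESS_INST_ID": 101, "TRANS_NAME": "CUSTOMER",
     "TRANS_ROW_ID": 7, "ERROR_MSG": "unique constraint violated"},
]
rep_load_sessions = [  # maps a session instance to its mapping name
    {"SESS_INST_ID": 101, "MAPPING_NAME": "DIM_LOAD"},
]
translation = [  # written by the load mapping: source key per target row number
    {"TARGET_ROW": 7, "SOURCE_KEY": "CUST-0042"},
]

def build_err_desc_rows():
    """Join the three tables to produce ERR_DESC_TBL-style error rows."""
    mapping_by_sess = {r["SESS_INST_ID"]: r["MAPPING_NAME"]
                       for r in rep_load_sessions}
    source_by_row = {t["TARGET_ROW"]: t["SOURCE_KEY"] for t in translation}
    return [{"MAPPING_ID": mapping_by_sess[e["SESS_INST_ID"]],
             "ROW_ID": source_by_row.get(e["TRANS_ROW_ID"]),
             "ERROR_DESC": e["ERROR_MSG"],
             "TARGET": e["TRANS_NAME"]}
            for e in pmerr_msg]

err_rows = build_err_desc_rows()
```

In practice this join would be a PowerCenter session (or SQL) reading the error-log database, but the lineage idea is the same: the target row number from the error log is the key back into the translation table, which yields the original source key.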
Reprocessing

After the load and post-load sessions are complete, the error table (e.g., ERR_DESC_TBL) can be analyzed by members of the business or operational teams. The rows listed in this table have not been loaded into the target database. The operations team can, therefore, fix the data in the source that resulted in 'soft' errors, and may be able to explain and remediate the 'hard' errors. Once the errors have been fixed, the source data can be reloaded, validated, and loaded. Only the rows resulting in errors during the first run should be reprocessed in the reload.

Ideally, the same mapping can be used for initial and reprocess loads. This can be achieved by including a filter and a lookup in the original load mapping and using a parameter to configure the mapping for an initial load or for a reprocess load. If initial loading, all rows are passed through the filter. During a reprocess run, the lookup searches for each source row number in the error table, while the filter removes source rows for which the lookup has not found errors. On completion, the post-load process is executed to capture any new RDBMS errors. The records successfully loaded should then be deleted (or marked for deletion) from the error table, while any new errors encountered should be inserted as if this were an initial run. This ensures that reprocessing loads are repeatable and result in reducing numbers of records in the error table over time.
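The parameter-driven filter-and-lookup pattern for reprocessing can be sketched like this; the parameter values and the modeling of the error table as a set of failed row numbers are illustrative assumptions:

```python
# Sketch of one mapping serving both initial and reprocess loads.
# load_type stands in for a mapping parameter (e.g., a $$LOAD_TYPE value);
# the error table is modeled as the set of source row numbers that
# failed during the previous run.

def filter_rows(source_rows, load_type, error_row_ids):
    """Pass all rows on an initial load; on a reprocess load, pass only
    rows whose row number was logged in the error table (the lookup)."""
    if load_type == "INITIAL":
        return list(source_rows)
    # reprocess: look up each source row number in the error table and
    # drop rows for which no error was found
    return [(row_id, row) for row_id, row in source_rows
            if row_id in error_row_ids]

source = [(1, {"NAME": "Smith"}), (2, {"NAME": None}), (3, {"NAME": "Jones"})]
errors_from_first_run = {2}

initial = filter_rows(source, "INITIAL", set())
reprocess = filter_rows(source, "REPROCESS", errors_from_first_run)
# initial passes all three rows; reprocess passes only row 2
```

Because the reprocess run touches only the rows already logged as errors, each successful reprocess shrinks the error table, which is what makes the cycle repeatable.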