Error Handling Techniques - PowerCenter Mappings
Identifying and capturing data errors using a mapping approach, and making such errors available for further processing or correction.
Identifying errors and creating an error handling strategy is an essential part of a data integration project. In the production environment, data must be checked and validated prior to entry into the target system. One strategy for catching data errors is to use PowerCenter mappings and error logging capabilities to catch specific data validation errors and unexpected transformation or database constraint errors.
Data Validation Errors
The first step in using a mapping to trap data validation errors is to understand and identify the error handling requirements. Consider the following questions:

· What types of data errors are likely to be encountered? Of these errors, which ones should be captured?
· What process can capture the possible errors? Should errors be captured before they have a chance to be written to the target database?
· Will any of these errors need to be reloaded or corrected?
· How will the users know if errors are encountered? How will the errors be stored?
· Should descriptions be assigned for individual errors? Can a table be designed to store captured errors and the error descriptions?

Capturing data errors within a mapping and re-routing them to an error table facilitates analysis by end users and improves performance. One practical application of the mapping approach is to capture foreign key constraint errors (e.g., by executing a lookup on a dimension table prior to loading a fact table). Including this sort of functionality in a mapping assures referential integrity: while the database still enforces the foreign key constraints, erroneous data is never written to the target table, and because constraint errors are captured within the mapping, the PowerCenter server does not have to write them to the session log and the reject/bad file, thus improving performance.

© 2012 Informatica Corporation. All rights reserved.

Data content errors can also be captured in a mapping. Mapping logic can identify content errors and attach descriptions to them. This approach can be effective for many types of data content error, including date conversion errors, null values intended for not null target fields, and incorrect data formats or data types.

Sample Mapping Approach for Data Validation Errors

In the following example, customer data is checked to ensure that invalid null values are intercepted before being written to not null columns in a target CUSTOMER table. One solution is to implement a mapping along the following lines: an expression transformation validates the source data, applying rules and flagging records with one or more errors; a router transformation then separates valid rows from those containing errors. Once a null value is identified, the row containing the error is separated from the data flow and logged in an error table.

In this example, two error tables are suggested: CUSTOMER_ERR and ERR_DESC_TBL. It is good practice to append error rows with a unique key; this can be a composite consisting of a MAPPING_ID and a ROW_ID, where the MAPPING_ID refers to the mapping name and the ROW_ID is created by a sequence generator. The composite key allows developers to trace rows written to the error tables, which store information useful for error reporting and investigation.
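In outline, the validation and routing step might look like the following Python sketch. PowerCenter implements this graphically with transformations; the function and field names here (validate_customer, NOT_NULL_FIELDS) are illustrative assumptions, not PowerCenter objects:

```python
# Columns assumed NOT NULL in the target CUSTOMER table (illustrative).
NOT_NULL_FIELDS = ["NAME", "DOB", "ADDRESS"]

def validate_customer(row):
    # EXP_VALIDATION-style check: one error description per violated rule.
    return [f"{f} is NULL" for f in NOT_NULL_FIELDS if row.get(f) is None]

def route(rows):
    # Router-style split: valid rows go to the target,
    # rows with one or more errors go to error handling.
    valid, errors = [], []
    for row in rows:
        descs = validate_customer(row)
        if descs:
            errors.append((row, descs))
        else:
            valid.append(row)
    return valid, errors
```

A row with all three fields null would be routed to the error branch carrying three descriptions, one per violated rule.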
The mapping logic must assign a unique description for each error in the rejected row; for example, any null value intended for a not null target field could generate an error message such as ‘NAME is NULL’ or ‘DOB is NULL’. This step can be done in an expression transformation (e.g., EXP_VALIDATION in the sample mapping). After the field descriptions are assigned, the error row can be split into several rows, one for each possible error, using a normalizer transformation. After a single source row is normalized, the resulting rows can be filtered to leave only the errors that are present (i.e., each record can have zero to many errors). For example, if a row has three errors, three error rows are generated with appropriate error descriptions (ERROR_DESC) in the table ERR_DESC_TBL.

The CUSTOMER_ERR table can be an exact copy of the target CUSTOMER table, appended with two additional columns: ROW_ID and MAPPING_ID. These columns allow the two error tables to be joined. The CUSTOMER_ERR table stores the entire row that was rejected, enabling the user to trace the error rows back to the source and potentially build mappings to reprocess them. The table ERR_DESC_TBL is designed to hold information about the error, such as the mapping name, the ROW_ID, and the error description. This table can be used to hold all data validation error descriptions for all mappings, giving a single point of reference for reporting.

The following tables show how the error data produced may look.

Table: CUSTOMER_ERR

  NAME  DOB   ADDRESS  ROW_ID  MAPPING_ID
  NULL  NULL  NULL     1       DIM_LOAD

Table: ERR_DESC_TBL

  FOLDER_NAME  MAPPING_ID  ROW_ID  ERROR_DESC       LOAD_DATE   SOURCE       TARGET
  CUST         DIM_LOAD    1       Name is NULL     10/11/2006  CUSTOMER_FF  CUSTOMER
  CUST         DIM_LOAD    1       DOB is NULL      10/11/2006  CUSTOMER_FF  CUSTOMER
  CUST         DIM_LOAD    1       Address is NULL  10/11/2006  CUSTOMER_FF  CUSTOMER

The efficiency of a mapping approach can be increased by employing reusable objects. Common logic should be placed in mapplets, which can be shared by multiple mappings. This improves productivity in implementing and managing the capture of data validation errors. The most important aspect of the mapping approach, however, is its flexibility: the error handling logic can be placed anywhere within a mapping. The mapping approach also reports errors based on projects or categories by identifying the mappings that contain errors. By using the mapping approach to capture identified errors, the operations team can effectively communicate data quality issues to the business users. Ultimately, business organizations need to decide whether the analysts should fix the data in the reject table or in the source systems.

Data validation error handling can be extended by including mapping logic to grade error severity, flagging data validation errors as ‘soft’ or ‘hard’. A ‘hard’ error can be defined as one that would fail when being written to the database, such as a constraint error; a ‘soft’ error can be defined as a data content error. Once an error type is identified, a record flagged as ‘hard’ can be filtered from the target and written to the error tables, while a record flagged as ‘soft’ can be written to both the target system and the error tables. This gives business analysts an opportunity to evaluate and correct data imperfections while still allowing the records to be processed for end-user reporting. The advantage of the mapping approach is that all errors are identified as either data errors or constraint errors and can be properly addressed.

Constraint and Transformation Errors

Perfect data can never be guaranteed. In implementing the mapping approach described above to detect errors and log them to an error table, how can we handle unexpected errors that arise during the load? For example, PowerCenter may apply validated data to the database, yet the relational database management system (RDBMS) may reject it for some unexpected reason; an RDBMS may, for example, reject data if constraints are violated. Ideally, we would like to detect these database-level errors automatically and send them to the same error table used to store the ‘soft’ errors caught by the mapping approach described above.
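To make the severity grading and the ERR_DESC_TBL fan-out described above concrete, the following minimal sketch shows how a single rejected row with three errors yields three error-description rows. All names mirror the sample tables; none are PowerCenter APIs, and the severity handling is an illustrative assumption:

```python
from itertools import count

row_ids = count(1)  # stands in for the sequence generator

def fan_out(row, errors, mapping_id="DIM_LOAD"):
    # One rejected row becomes one ERR_DESC_TBL entry per error,
    # keyed by the composite (MAPPING_ID, ROW_ID).
    row_id = next(row_ids)
    customer_err = dict(row, ROW_ID=row_id, MAPPING_ID=mapping_id)
    err_desc_tbl = [
        {"FOLDER_NAME": "CUST", "MAPPING_ID": mapping_id, "ROW_ID": row_id,
         "ERROR_DESC": desc, "LOAD_DATE": "10/11/2006",
         "SOURCE": "CUSTOMER_FF", "TARGET": "CUSTOMER"}
        for desc, _severity in errors
    ]
    return customer_err, err_desc_tbl

def route_by_severity(row, errors):
    # 'hard' rows go only to the error tables; rows with only 'soft'
    # errors also reach the target.
    load_to_target = all(sev == "soft" for _desc, sev in errors)
    return load_to_target, fan_out(row, errors)
```

Every generated error row shares the same ROW_ID, so the CUSTOMER_ERR row and its descriptions join cleanly.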
By default, the error messages from the RDBMS and any uncaught transformation errors are sent to the session log. The ‘stop on errors’ session property can be set to ‘1’ to stop source data for which unhandled errors were encountered from being loaded. In this case, the process stops with a failure, the data must be corrected, and the entire source may need to be reloaded or recovered. This is not always an acceptable approach. An alternative might be to have the load process continue when records are rejected, and then reprocess only the records that were found to be in error. This can be achieved by configuring the ‘stop on errors’ property to 0 and switching on relational error logging for the session.

Switching on relational error logging redirects these messages to a selected database in which four tables are automatically created: PMERR_MSG, PMERR_DATA, PMERR_TRANS, and PMERR_SESS. The PowerCenter Workflow Administration Guide contains detailed information on the structure of these tables. The PMERR_MSG table stores the error messages that were encountered in a session. The following four columns of this table allow us to retrieve any RDBMS errors:

· SESS_INST_ID: A unique identifier for the session. Joining this table with the Metadata Exchange (MX) view REP_LOAD_SESSIONS in the repository allows the MAPPING_ID to be retrieved.
· TRANS_NAME: Name of the transformation where an error occurred. When an RDBMS error occurs, this is the name of the target transformation.
· TRANS_ROW_ID: Specifies the row ID generated by the last active source. This field contains the row number at the target when the error occurred.
· ERROR_MSG: Error message generated by the RDBMS.

With this information, all RDBMS errors can be extracted and stored in an applicable error table. A post-load session (i.e., an additional PowerCenter session) can be implemented to read the PMERR_MSG table, join it with the MX view REP_LOAD_SESSIONS in the repository, and insert the error details into ERR_DESC_TBL. When the post process ends, ERR_DESC_TBL will contain both ‘soft’ errors and ‘hard’ errors.

One problem with capturing RDBMS errors in this way is mapping them to the relevant source key to provide lineage. This can be difficult when the source and target rows are not directly related (i.e., one source row can actually result in zero or more rows at the target). In this case, the mapping that loads the source must write translation data to a staging table, including the source key and the target row number. The source key stored in the translation table could be a row number in the case of a flat file, or a primary key in the case of a relational data source. The translation table can then be used by the post-load session to identify the source key from the target row number retrieved from the error log.
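The core of the post-load session is a join from PMERR_MSG to the repository view on the session instance ID. The sketch below assumes both sources have already been fetched into lists of dicts; the MAPPING_NAME column name is an assumption for illustration, so check the MX view definition in your repository:

```python
def extract_rdbms_errors(pmerr_msg, rep_load_sessions):
    # Join PMERR_MSG to REP_LOAD_SESSIONS on SESS_INST_ID to recover
    # the mapping behind each failed session, then shape the result
    # into ERR_DESC_TBL-style rows.
    sessions = {s["SESS_INST_ID"]: s for s in rep_load_sessions}
    out = []
    for err in pmerr_msg:
        sess = sessions.get(err["SESS_INST_ID"])
        if sess is None:
            continue  # no repository metadata for this session instance
        out.append({
            "MAPPING_ID": sess["MAPPING_NAME"],   # assumed column name
            "ROW_ID": err["TRANS_ROW_ID"],        # target row number of the failure
            "ERROR_DESC": err["ERROR_MSG"],       # message generated by the RDBMS
        })
    return out
```

In practice this logic would itself be a PowerCenter mapping (or a SQL join) writing into ERR_DESC_TBL; the Python form only shows the shape of the join.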
Reprocessing

After the load and post-load sessions are complete, the error table (e.g., ERR_DESC_TBL) can be analyzed by members of the business or operational teams. The rows listed in this table have not been loaded into the target database. The operations team can, therefore, fix the source data that resulted in ‘soft’ errors, and may be able to explain and remediate the ‘hard’ errors. Once the errors have been fixed, the source data can be reloaded, validated, and loaded. Ideally, only the rows that resulted in errors during the first run should be reprocessed in the reload, and the same mapping should be usable for both initial and reprocess loads. This can be achieved by including a filter and a lookup in the original load mapping and using a parameter to configure the mapping for an initial load or for a reprocess load. For an initial load, all rows are passed through the filter. If the mapping is reprocessing, the lookup searches for each source row number in the error table, and the filter removes source rows for which the lookup has not found errors. During a reprocess run, the records successfully loaded should be deleted (or marked for deletion) from the error table, while any new errors encountered should be inserted as if it were an initial run. On completion, the post-load process is executed to capture any new RDBMS errors. This ensures that reprocessing loads are repeatable and reduce the number of records in the error table over time.
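The initial/reprocess switch described above can be sketched as a single parameterized function. The parameter and key names are illustrative assumptions standing in for the mapping parameter, lookup, and filter:

```python
def rows_to_load(source_rows, error_keys, reprocess=False, key="CUST_KEY"):
    # Initial load: the filter passes every source row through.
    if not reprocess:
        return list(source_rows)
    # Reprocess load: the lookup finds each source key in the error
    # table, and the filter keeps only rows previously logged as errors.
    return [r for r in source_rows if r[key] in error_keys]
```

Run with reprocess=True against a shrinking error table, each pass reloads only the previously rejected rows, which is what makes the reprocess cycle repeatable.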