Regular Expression Validation EP

Salim Achouche, Principal Software Engineer
Ashlee Bailey, Senior Technical Writer

Overview
If you are running PowerCenter 7.0, you can include Regular Expression Validation External Procedure (EP) transformations in a mapping to validate patterns of data in String format. This lets you validate data patterns such as IDs, telephone numbers, postal codes, and state names.

You can also include the name of a Regular Expression Validation EP transformation as part of an expression in an Expression transformation in a mapping. This is useful when you want to validate more than one data pattern in a single mapping.

You define the data patterns in the Regular Expression Validation EP transformation using Perl Compatible Regular Expressions (PCRE). PCRE is a powerful tool for matching data in String format that follows a pattern.

Installing the Regular Expression Validation EP Transformation
Before you can use the Regular Expression Validation EP transformation, you must download the RegExValidation70.zip file. The ZIP file includes the following components:

- sample1.xml and sample2.xml files. These files contain a Regular Expression Validation EP transformation and two mappings that use the transformation.
- Documentation
- PowerCenter Server files for the Regular Expression Validation EP transformation

Once you download the ZIP file, you configure the Regular Expression Validation EP files and import the transformation into your PowerCenter repository.

Note: You do not need a Perl installation to use the Regular Expression Validation EP transformation, because it has been compiled with the open source PCRE RegEx library.

To install and configure the Regular Expression Validation EP transformation:

1. Download the RegExValidation70.zip file to your local area network.
2. Unzip the RegExValidation70.zip file to a temporary directory. The unzip process extracts a folder called RegExValidation70.
3. Open the RegExValidation70 folder.
4. Copy the PowerCenter Server library files to the PowerCenter Server\bin directory based on the following table:

   Operating System   Filenames
   Windows            pmdpregexpr.dll, pmpcre.dll, pcre.dll, pmdpmetadata.dll
   Solaris            libpmdpregexpr.so.1, libpmpcre.so, libpcre.so.0, libpmdpmetadata.so
   Linux              libpmdpregexpr.so.1, libpmpcre.so, libpcre.so.0, libpmdpmetadata.so
   HP-UX              libpmdpregexpr.sl, libpmpcre.sl, libpcre.sl, libpmdpmetadata.so
   AIX                libpmdpregexpr.a, libpmpcre.a, libpcre.a, libpmdpmetadata.a

   Note: You must have execute permission to run the PowerCenter Server libraries.

5. From the PowerCenter Designer, import the sample1.xml and sample2.xml files.

Working with the Regular Expression Validation EP Transformation
Include one Regular Expression Validation EP transformation in a mapping for each expression you want to use to validate source data. For example, if you want to use a regular expression to validate telephone numbers and employee IDs, you must add a separate Regular Expression Validation EP transformation for each regular expression to your mapping.

The Regular Expression Validation EP transformation has one input and one output port. COLUMN_VALUE is the input port. IS_VALID is the output port. The ports are predefined and cannot be modified.

Table 1 describes the ports in the Regular Expression Validation EP transformation:

Table 1: Ports in a Regular Expression Validation EP Transformation

   Port Name      Description
   COLUMN_VALUE   Represents the input string that should be validated.
   IS_VALID       Set to 1 if the input string is valid or null. Otherwise, set to 0.

Figure 1 shows the Ports tab of the Regular Expression Validation EP transformation with the COLUMN_VALUE and IS_VALID ports:

Figure 1: Ports Tab of the Regular Expression Validation EP Transformation

You cannot use pass-through ports with the Regular Expression Validation EP transformation.

Tip: Use port concatenation as a workaround.

You do not have to configure any properties when you include the transformation in your mapping. Figure 2 shows the Properties tab of the Regular Expression Validation EP transformation:

Figure 2: Properties Tab of the Regular Expression Validation EP Transformation

Once you include the Regular Expression Validation EP transformation in a mapping, you can define the regular expression on the Initialization Properties tab of the transformation at the mapping level. The data you want to validate must be in String format.

Tip: To validate data that is not in String format, you can use an Expression transformation in the mapping to convert the data to String format.

Figure 3 shows the Initialization Properties tab of the Regular Expression Validation EP transformation with a regular expression:

Figure 3: Initialization Properties Tab of the Regular Expression Validation EP Transformation

The Value attribute represents the regular expression that the PowerCenter Server uses to validate the input port COLUMN_VALUE.

Working with Perl Compatible Regular Expressions in a Regular Expression Validation EP
When you use PCRE in a Regular Expression Validation EP, the following information applies:

- Regular expressions are case insensitive. For example, [a-z] and [A-Z] are equivalent.
- Regular expressions are automatically anchored. This means that matching starts at the beginning of the input string and ends at the end of the input string. For example, \d+ is equivalent to ^\d+$. ^ is the PCRE syntax for marking the beginning of a string. $ is the PCRE syntax for marking the end of a string. To turn off the default anchored behavior, you can add .* at the beginning and end of the regular expression.
- Spaces at the beginning and end of an input string are ignored.

Table 2 provides guidelines for entering a regular expression in the Regular Expression Validation EP transformation:

Table 2: PCRE Syntax

   Syntax         Description
   . (a period)   Matches any one character.
   [a-z]          Matches one instance of a letter. For example, [a-z][a-z] can match ab or CA.
   \d             Matches one instance of any digit from 0-9.
   ()             Groups an expression. For example, the parentheses in (\d\d-\d\d) group the expression \d\d-\d\d, which finds any two numbers followed by a hyphen and any two numbers, as in 12-34.
   {}             Matches the number of characters exactly. For example, \d{3} matches any three numbers, such as 650 or 510. Or, [a-z]{2} matches any two letters, such as CA or NY.
   ?              Matches the preceding character or group of characters zero or one time. For example, \d{3}(-\d{4})? matches any three numbers, which can be followed by a hyphen and any four numbers.
   *              Matches zero or more instances of the preceding character or group of characters. For example, .*0 matches any value that ends in 0.

For example, to create a regular expression for U.S. zip codes, you can enter the following:

\d{5}(-\d{4})?

This expression lets you validate a column that contains 5-digit U.S. zip codes, such as 93930, as well as 9-digit zip codes, such as 93930-5407. In this example, \d{5} refers to any five numbers, such as 93930. The parentheses surrounding -\d{4} group this segment of the expression. The hyphen represents the hyphen of a 9-digit zip code, as in 93930-5407. \d{4} refers to any four numbers, such as 5407. The question mark states that the hyphen and last four digits are optional or can appear one time.

Tips for Converting COBOL Syntax to PCRE format
If you are familiar with COBOL syntax, you can use the following information to help you write regular expressions. Table 3 shows examples of COBOL syntax and their PCRE equivalents:

Table 3: COBOL Syntax and PCRE Syntax Compared

   COBOL Syntax   PCRE Syntax          Description
   9              \d                   Matches one instance of any digit from 0-9.
   9999           \d\d\d\d or \d{4}    Matches any four digits from 0-9, as in 1234 or 5936.
   x              [a-z]                Matches one instance of a letter.
   9xx9           \d[a-z][a-z]\d       Matches any number followed by two letters and another number, as in 1ab2.

Tips for Converting SQL Syntax to PCRE format
If you are familiar with SQL syntax, you can use the following information to help you write regular expressions. Table 4 shows examples of SQL syntax and their PCRE equivalents:

Table 4: SQL Syntax and PCRE Syntax Compared

   SQL Syntax   PCRE Syntax    Description
   %            .*             Matches any string.
   A%           A.*            Matches the letter "A" followed by any string, as in Area.
   _            . (a period)   Matches any one character.
   A_           A.             Matches "A" followed by any one character, such as AZ.

Validating a Regular Expression
To validate a data pattern with a regular expression, include a Regular Expression Validation EP transformation in the mapping along with an Expression transformation. The Expression transformation creates the name for the column, which contains the data you want to validate. The Expression transformation also passes the data to the Regular Expression Validation EP transformation. The Regular Expression Validation EP transformation validates the data pattern.

For example, you want to verify that the telephone numbers in the TEL column in your source are valid North American telephone numbers. You want to write invalid telephone numbers to a target table that holds invalid employee attributes.

Use an Expression transformation to pass the data in the TEL column to a Regular Expression Validation EP transformation. Enter the following regular expression on the Initialization Properties tab in the EP transformation to determine if the data in the TEL column are valid North American telephone numbers:

\d\d\d-\d\d\d-\d\d\d\d

In this expression, \d\d\d finds any three numbers and \d\d\d\d finds any four numbers. Therefore, numbers such as 650-385-5000 are valid, while numbers such as 385-5000 are not.

Figure 4 shows an example of a Regular Expression Validation EP transformation with a regular expression to check valid North American telephone numbers:

Figure 4: Regular Expression for Validating North American Telephone Numbers

In the mapping, use a Filter transformation to filter out the valid telephone numbers and pass the invalid values to the target.
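The matching rules and the telephone example above can be approximated outside PowerCenter. The following is a minimal sketch in Python, not the EP's actual implementation. It assumes that Python's re module is an acceptable stand-in for PCRE for the simple syntax used in this note. The function name is_valid, the sample rows, and the use of a dictionary key named TEL are illustrative assumptions. The sketch mirrors the documented behavior: matching is case insensitive and anchored, spaces around the input are ignored, and a null input counts as valid.

```python
import re

def is_valid(column_value, pattern):
    """Approximate IS_VALID: 1 if the value is null or matches, otherwise 0."""
    if column_value is None:
        return 1  # documented behavior: a null input counts as valid
    # strip(" ") ignores leading and trailing spaces, fullmatch anchors the
    # pattern at both ends, and IGNORECASE mirrors case-insensitive matching.
    # re.ASCII keeps \d limited to the digits 0-9.
    flags = re.IGNORECASE | re.ASCII
    matched = re.fullmatch(pattern, column_value.strip(" "), flags)
    return 1 if matched else 0

TEL_PATTERN = r"\d\d\d-\d\d\d-\d\d\d\d"

# Hypothetical source rows; only the TEL column is checked.
rows = [
    {"TEL": "650-385-5000"},
    {"TEL": "385-5000"},
    {"TEL": " 510-555-0100 "},
    {"TEL": None},
]

# Like the Filter transformation: drop valid rows, keep invalid ones for the target.
invalid_rows = [row for row in rows if is_valid(row["TEL"], TEL_PATTERN) == 0]
print(invalid_rows)  # prints [{'TEL': '385-5000'}]
```

Only the seven-digit number is routed to the invalid list. The padded number passes because surrounding spaces are stripped before matching, and the null row passes because null is treated as valid.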

Figure 5 shows the mapping for writing invalid telephone numbers to a target:

Figure 5: Mapping with a Single Regular Expression Validation EP Transformation

Validating Multiple Regular Expressions
You can use regular expressions to validate multiple data patterns in a single mapping. In this case, add a Regular Expression Validation EP transformation for each data pattern you want to validate. Include the name of the Regular Expression Validation EP transformations as part of expressions in an Expression transformation. When you include Regular Expression Validation EP transformations in an expression, you must leave the transformations unconnected in the mapping.

For example, you want to validate North American telephone numbers and five-digit customer ID numbers. You want to write invalid values to target tables. Create two Regular Expression Validation EP transformations: one to validate North American telephone numbers and one to validate customer IDs. Include the transformation name as part of an expression in an Expression transformation. Use the following syntax in the expression:

:EXT.<transformation_name>(<column_name>)

For example:

:EXT.t_RegExValidate_CID(CID)

Figure 6 shows the Ports tab of an Expression transformation that includes a Regular Expression Validation EP transformation name in an expression:

Figure 6: Ports Tab of the Expression Transformation

In the mapping, use Filter transformations to filter the invalid data and pass it to target tables.

Figure 7 shows an example of a mapping with two Regular Expression Validation EP transformations and a single Expression transformation:

Figure 7: Mapping for Validating Multiple Regular Expressions
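The two-validator mapping described above can be sketched in the same spirit. The example below is an illustration in Python under stated assumptions, not PowerCenter code. Python's re module stands in for PCRE. The name t_RegExValidate_TEL is hypothetical and chosen to parallel t_RegExValidate_CID. The customer ID pattern \d{5} is inferred from the phrase "five-digit customer ID numbers", and the rows are made up. Each validator is a standalone function called by name from an expression step, much as unconnected transformations are called with :EXT, and two filter steps route invalid values to lists that stand in for target tables.

```python
import re

def make_validator(pattern):
    """Build a function that returns 1 for null or matching input, otherwise 0."""
    compiled = re.compile(pattern, re.IGNORECASE | re.ASCII)

    def validate(value):
        if value is None:
            return 1  # null input is treated as valid
        # fullmatch anchors the pattern; strip(" ") ignores surrounding spaces.
        return 1 if compiled.fullmatch(value.strip(" ")) else 0

    return validate

# One validator per data pattern (t_RegExValidate_TEL is a hypothetical name).
t_RegExValidate_TEL = make_validator(r"\d\d\d-\d\d\d-\d\d\d\d")
t_RegExValidate_CID = make_validator(r"\d{5}")

# Made-up source rows.
rows = [
    {"CID": "10001", "TEL": "650-385-5000"},
    {"CID": "1002", "TEL": "510-555-0100"},
    {"CID": "10003", "TEL": "385-5000"},
]

# Expression step: call each validator by name and keep the flags with the row.
flagged = [
    dict(
        row,
        TEL_OK=t_RegExValidate_TEL(row["TEL"]),
        CID_OK=t_RegExValidate_CID(row["CID"]),
    )
    for row in rows
]

# Filter steps: one per pattern, each passing only invalid values to its target.
invalid_tel_target = [r["TEL"] for r in flagged if r["TEL_OK"] == 0]
invalid_cid_target = [r["CID"] for r in flagged if r["CID_OK"] == 0]

print(invalid_tel_target)  # prints ['385-5000']
print(invalid_cid_target)  # prints ['1002']
```

Keeping one validator per pattern follows the rule that each regular expression needs its own transformation, and computing both flags in one step matches the single Expression transformation in the mapping.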

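As a companion to Tables 3 and 4, the COBOL and SQL correspondences can be applied mechanically. The helper functions below are hypothetical and are only a sketch in Python. They cover just the symbols listed in those tables (9 and x for COBOL, % and _ for SQL). Accepting an uppercase X and escaping every other character as a literal are assumptions of this sketch, not behavior described in this note.

```python
import re

def cobol_to_pcre(picture):
    """Translate a COBOL-style picture that uses only 9 and x (see Table 3)."""
    # Uppercase X is accepted as an assumption; matching is case insensitive anyway.
    mapping = {"9": r"\d", "x": "[a-z]", "X": "[a-z]"}
    return "".join(mapping.get(ch, re.escape(ch)) for ch in picture)

def sql_like_to_pcre(like_pattern):
    """Translate a SQL LIKE pattern that uses only % and _ (see Table 4)."""
    mapping = {"%": ".*", "_": "."}
    return "".join(mapping.get(ch, re.escape(ch)) for ch in like_pattern)

print(cobol_to_pcre("9xx9"))   # prints \d[a-z][a-z]\d
print(sql_like_to_pcre("A%"))  # prints A.*
```

The resulting strings can be entered as the regular expression value, since the transformation already anchors the pattern and ignores case.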