Create a new DataStage Parallel job with 3 stages linked together: a sequential file stage, xml input stage (located under the Real Time category), and a peek stage. The first trick is to load the entire xml file into a single row.
Copyright:
Attribution Non-Commercial (BY-NC)
Introduction

The intention of this tutorial is to give novice developers a quick start with loading XML data using a DataStage parallel job.

Steps

Step 1: Create a simple XML file named test.xml:

    <xml>
      <customer>Mike</customer>
      <customer>Anna</customer>
    </xml>

Step 2: Create a new DataStage parallel job with 3 stages linked together: a sequential file stage, an XML input stage (located under the Real Time category), and a peek stage.

Step 3: The first trick is to load the entire XML file into a single column of a single row. You do this by creating a column in the sequential file stage of type LongVarChar[Max=9999]. The max size in this example is arbitrary, as long as it is large enough to hold the whole file. Set the input file to test.xml. Next, remove all properties in the [Format] tab and add these two:

    Record level: Record type=implicit
    Field defaults: Delimiter=none

Step 4: Now that the XML is in a single column, set the XML input stage properties. In the [Transformation settings] tab under the [Stage] tab, check the [Repetition element required] box. In the [Input] tab, select the column that you defined in step 3 and check the [XML document] box. In the [Output] tab, define a column named [customer] of type varchar[max=255]. Set it as the key. In the description box, enter the XML path, in this case /xml/customer/text()

Tip: To reference XML attributes, use @. For example, /xml/customer/@id would equal 1 when using this XML: <xml><customer id="1">Mike</customer></xml>

Step 5: Compile and run. Peek will produce log records that list the customers from the XML file.

Conclusion

That's it. For more details on processing XML, read the XML Pack documentation that comes with DataStage.
Here is a more extensive XML tutorial for server jobs from IBM: Transform and integrate data using WebSphere DataStage XML and Web services packs. The biggest difference is that in parallel jobs you do not have a folder stage, so you need to use the sequential file stage with the settings mentioned above.
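
To check which values the paths /xml/customer/text() and /xml/customer/@id should return before building the job, the same extraction can be approximated outside DataStage. The following is only a minimal Python sketch using the standard library, not part of the DataStage job. The function name extract_customers is hypothetical, and the id attributes (including id="2" for Anna) are assumed here for illustration, since test.xml in the steps has no attributes. The whole document is held in one string, which loosely mirrors the single-column, single-row load in the sequential file stage.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on test.xml. The id attributes are an
# assumption added for illustration; the tutorial's file has none.
XML_DOCUMENT = """<xml>
  <customer id="1">Mike</customer>
  <customer id="2">Anna</customer>
</xml>"""


def extract_customers(document):
    """Return one (id, name) pair per <customer> element (hypothetical helper)."""
    root = ET.fromstring(document)  # root is the <xml> element
    # ElementTree paths are relative to the root, so "customer" stands in
    # for /xml/customer. ElementTree has no text() or @attr path steps, so
    # .text and .get("id") are used in place of text() and @id.
    return [(c.get("id"), c.text) for c in root.findall("customer")]


for customer_id, name in extract_customers(XML_DOCUMENT):
    print(customer_id, name)
```

Each pair corresponds to one output row of the XML input stage, with customer as the repetition element. If the id attribute is absent, as in the original test.xml, the first value of each pair is None.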