
Load XML files using a DataStage Parallel job

updated Mar 28, 2008 11:27 pm


Introduction
This tutorial gives novice developers a quick start on loading XML data with a
DataStage parallel job.
Steps
Step 1:
Create a simple XML file named test.xml
<xml> <customer>Mike</customer> <customer>Anna</customer> </xml>
Step 2:
Create a new DataStage parallel job with three stages linked together: a
Sequential File stage, an XML Input stage (located under the Real Time
category), and a Peek stage.
Step 3:
The first trick is to load the entire XML file into a single column of a single
row. To do this, create a column of type LongVarChar[Max=9999] in the Sequential
File stage. The maximum size in this example is arbitrary; it only needs to be
large enough to hold the whole file. Set the input file to test.xml. Next,
remove all properties in the [Format] tab and add these two:
In the Record level:
Record type=implicit
In the Field defaults:
Delimiter=none
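
As a rough illustration outside DataStage, the effect of these two settings can be pictured as reading the whole file into one string without splitting on newlines. The sketch below is only an analogy in Python; the function name, the "xml_doc" column name, and the choice to raise an error when the file exceeds the maximum are assumptions made for the example and do not describe how the Sequential File stage itself behaves.

```python
from pathlib import Path

MAX_LEN = 9999  # mirrors LongVarChar[Max=9999] from step 3


def read_as_single_row(path, max_len=MAX_LEN):
    """Return the file content as one row with one string column.

    No record or field delimiter is applied, so newlines inside the
    file stay part of the value instead of starting a new row.
    """
    text = Path(path).read_text(encoding="utf-8")
    if len(text) > max_len:
        # Assumption for this sketch only: fail loudly rather than truncate.
        raise ValueError(f"{path}: {len(text)} characters exceeds {max_len}")
    # "xml_doc" is a hypothetical column name.
    return [{"xml_doc": text}]
```

Calling read_as_single_row("test.xml") would return a list with exactly one row, whose single column holds the complete document.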
Step 4:
Now that the XML is in a single column, set the XML Input stage properties. In
the [Transformation settings] tab under the [Stage] tab, check the [Repetition
element required] box. In the [Input] tab, select the column that you defined
in step 3 and check the [XML document] box. In the [Output] tab, define a
column named [customer] of type VarChar[Max=255] and set it as the key. In the
description box, enter the XML path, in this case /xml/customer/text()
Tip: To reference XML attributes, use @. For example, /xml/customer/@id
returns 1 for this XML: <xml><customer id="1">Mike</customer></xml>
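
To see what these two paths select, here is a small sketch using Python's standard xml.etree.ElementTree module rather than DataStage. ElementTree supports only a subset of XPath, so text() and @id are emulated here with the .text attribute and the .get() method. The second customer's id value is an assumption added for the example.

```python
import xml.etree.ElementTree as ET

# Sample document; id="2" for Anna is an assumption made for this example.
doc = '<xml><customer id="1">Mike</customer><customer id="2">Anna</customer></xml>'

root = ET.fromstring(doc)  # root is the <xml> element

# Each <customer> directly under <xml> is one repeating element.
customers = root.findall("customer")

names = [c.text for c in customers]     # analogous to /xml/customer/text()
ids = [c.get("id") for c in customers]  # analogous to /xml/customer/@id

print(names)  # ['Mike', 'Anna']
print(ids)    # ['1', '2']
```

Note that attribute values come back as strings in this sketch; any conversion to a numeric type would depend on the column type you define.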
Step 5:
Compile and run the job. The Peek stage writes records to the job log that list
the customers from the XML file.
Conclusion
That's it. For more details on processing XML, read the XML Pack documentation
that comes with DataStage. Here is a more extensive XML tutorial for server
jobs from IBM:
Transform and integrate data using WebSphere DataStage XML and Web services
packs
The biggest difference is that parallel jobs have no Folder stage, so you need
to use the Sequential File stage with the settings mentioned above.
