
MANAPPS


ETL Benchmarks
Corrected version V 1.1

Comparing
DATASTAGE SERVER 7.5
DATASTAGE PX 7.5
TALEND OPEN STUDIO 2.4.1
INFORMATICA 8.1.1
PENTAHO DATA INTEGRATOR 3.0.0

info@manapps.tm.fr

V 1.1 2009/01


This document is published under the Creative Commons license: http://creativecommons.org/licenses/by/3.0/us/

You are free: to Share — to copy, distribute, display, and perform the work

to Remix — to make derivative works

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.

Any of the above conditions can be waived if you get permission from the copyright holder.

Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights.


Table of Contents
You are free:
Under the following conditions:
Table of Contents
General comments
Hardware Configuration
Test 1: File Input Delimited > File Output Delimited
    Scenario
    Test results
Test 2: File Input Delimited > Table MySQL Output
    Scenario
    Test results
Test 3: Table Oracle Input > File Output Delimited
    Scenario
    Test results
Test 4: File Input Delimited > Table Output Oracle BULK
    Scenario
    Test results
Test 5: File Input Delimited > Transform > File Output Delimited
    Scenario
    Test results
Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)
    Scenario
    Test results
Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)
    Scenario
    Test results
Test 8: File Input Delimited > Sort > File Output Delimited
    Scenario
    Test results
Test 9: File Input Delimited > Aggregate > File Output Delimited
    Scenario
    Test results
Test 10: File Input Delimited > Lookup > File Output Delimited
    Scenario
    Test results
Test 11: File Input Delimited > Lookup > File Output Delimited && rejects
    Scenario
    Test results

General comments

This document constitutes Version 1.1 of the ETL Benchmark. Version 1.0 showed inaccurate test results for the PowerCenter solution from Informatica, because our tests were carried out with inadequate settings for this product. An expert from Informatica suggested adapted settings, and the same tests were run again on the same environment, in order to preserve the benchmarking basis between all compared ETL tools. Use of the correct settings on the Informatica PowerCenter solution greatly improves the results obtained by this solution on the same ETL benchmark tests, as detailed in this corrected version of our benchmark. This Version 1.1 of the benchmark thus includes the updated results and the comparison between all tested tools, and Annex 1 details the changes in the use of the Informatica software.

We are open to comments from all tested vendors, but also from other publishers, and are ready to give access to our testing conditions in order to allow them to verify the results obtained by their products and to suggest applicable best practices.

For the tests with DataStage PX, we used 2 nodes to take advantage of the dual cores and of the parallelization feature of the tool.

Results: Even though it is difficult to give results for this kind of benchmark, and we think that each test is different, some people asked us to give a global synthesis of those tests.

Global performance: As requested by some people after the release of version 1.0 of this ETL Benchmark, we have assigned, for each test, a specific number of points to the tested solutions (5 points to the best, 4 to the second … 1 to the fifth). According to this scenario, results are as follows:
o First: Informatica 8.1.1 (353 points)
o Second: Talend Open Studio 2.4.1 (333 points)
o Third: IBM DataStage PX 7.5 (239 points)
o Fourth: IBM DataStage Server 7.5 (199 points)
o Fifth: Pentaho Data Integration 3.0.0 (148 points)

Below are the detailed results:

            TOS 2.4.1   PDI 3.0.0   IBM DS 7.5   IBM DS PX 7.5   INFA PWC 8.1.1
Test1       13          7           19           8               16
Test2       0           0           0            0               0
Test3       13          3           7            9               11
Test4       8           7           12           5               13
Test5       15          4           13           12              18
Test6       15          4           10           5               12
Test7       11          3           7            8               15
Test8       13          12          5            14              16
Test8.2     12          13          4            15              18
Test8.3     12          12          4            15              17
Test9       12          6           15           12              17
Test9.2     16          5           12           9               19
Test9.3     12          8           13           11              16
Test10      20          7           12           10              13
Test10.2    20          6           6            13              16
Test10.3    16          6           6            14              18
Test10.4    12          4           8            17              19
Test11      20          7           10           8               16
Test11.2    20          6           6            12              16
Test11.3    16          6           6            13              19
Test12      20          8           13           6               13
Test12.2    20          7           6            11              16
Test12.3    17          7           5            12              19
Total       333         148         199          239             353

Open Source ETL & Parallelization: Pentaho Data Integrator claims the first position here. We did however find some issues with the way the tool lets you parallelize all the components. It is easier to parallelize with PDI, but some results are inconsistent.

Hardware Configuration

OS: Windows XP Pro SP2
CPU: Intel Core2 Duo 2 GHz
JVM: 1.6.0_87
RAM: 4 GB

Test 1: File Input Delimited > File Output Delimited

Scenario: Reading X lines from a delimited input file and writing them to a delimited output file.

[Figure: extract of the delimited input file]
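As a rough illustration of what this scenario measures, here is a minimal Python sketch of a delimited-to-delimited copy. It is not taken from any of the tested tools. The semicolon delimiter, the header row and the column names are assumptions, since the benchmark does not state them, and `copy_delimited` is a hypothetical helper name.

```python
import csv
import io


def copy_delimited(src, dst, delimiter=";"):
    """Copy every row from a delimited source to a delimited target.

    The delimiter is an assumption; the benchmark does not specify it.
    Returns the number of rows written (header included).
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-ins for the input and output files (hypothetical columns).
source = io.StringIO("id;firstname;age\n1;Ann;31\n2;Bob;45\n")
target = io.StringIO()
rows_copied = copy_delimited(source, target)
```

With real files, `src` and `dst` would be opened with `open(path, newline="")`; the row-by-row loop keeps memory use flat regardless of the number of lines.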

TALEND OPEN STUDIO
Job name: file_input_delimited__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

PENTAHO DATA INTEGRATION
Job name: file_input_delimited__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

DATASTAGE SERVER
Job name: file_input_delimited__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

DATASTAGE PX
Job name: PX_file_input_delimited__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

INFORMATICA
Job name: file_input_delimited__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

1 PDI 3.40 2.1 2 0.99 DataStage 7.1.0.00 2.4.1 PDI 3.50 83.00 18.0 2 1.00 150.0 IBM DS 7.00 3.4 1.00 1 000 000 5 000 000 7.00 Statistics: Number of lines 100 000 1 000 000 TOS 2.00 74.4.1.5 2 0.00 12.1 V 1.5 DataStage PX 7.00 2.10 15.80 39.MANAPPS Pg 13 Test results: Test 1: File Input Delimited > File Output Delimited Lines TOS 2.1 100 000 1.54 Informatica 8.00 7.00 40.80 66.4.00 20 000 000 162.5 INFA PWC 8.51 3.1 2009/01 ETL Benchmarks .50 12.80 4.5 IBM DS PX 7.09 417.0.9 ratio compared with TOS 2.

Test 2: File Input Delimited > Table MySQL Output

Scenario: Reading X lines from a delimited input file and writing them into a MySQL output table.

Comments: DataStage 7.5, DataStage PX 7.5 and Informatica 8.1.1 were not tested for this use case. To begin, the test was run with default parameters. To optimize performance, the job was then parallelized. To parallelize with TOS 2.4.1, we just have to split the delimited input file (with the header and limit parameters) and run two sub-jobs in parallel. With PDI 3.0.0, we just have to increase the number of copies. Finally, the commit parameter was adjusted. TOS 2.4.1 also allows the use of the extended insert, which is a MySQL feature. This feature limits the number of database accesses and improves performance. With this feature, TOS 2.4.1 is 6 times faster.
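The extended insert mentioned above groups many rows into a single INSERT statement. The following Python sketch illustrates the idea only; it is not how Talend implements it. It uses the standard-library `sqlite3` module as a stand-in for MySQL (an assumption made so the example is self-contained), and the table name, column names and batch size are hypothetical.

```python
import sqlite3


def extended_insert(conn, table, columns, rows, batch_size=100):
    """Insert rows with multi-row INSERT statements, one statement per batch.

    Returns the number of statements sent to the database.
    """
    col_list = ", ".join(columns)
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One INSERT carrying len(batch) rows instead of len(batch) INSERTs.
        sql = f"INSERT INTO {table} ({col_list}) VALUES " + ", ".join([one_row] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(sql, params)
        statements += 1
    conn.commit()
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, firstname TEXT, age INTEGER)")
data = [(i, f"name{i}", 20 + i % 50) for i in range(250)]
statements_sent = extended_insert(conn, "customers", ["id", "firstname", "age"], data)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Here 250 rows reach the database in 3 statements rather than 250, which is the reduction in database accesses that the comment above refers to.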

TALEND OPEN STUDIO
Job name: file_input_delimited__table_output_mysql
[Figure: Job (Multi-Thread Execution checked on Job Settings)]
[Figure: Schema of file_input_delimited]

PENTAHO DATA INTEGRATION
Job name: file_input_delimited__table_output_mysql
[Figure: Job]
[Figure: Schema of file_input_delimited]

18 0.0 TOS 2.60 1 000 000 144.MANAPPS Pg 17 Test results: Test 2: File Input Delimited > Table MySQL Output Lines TOS 2.4.1 with Extended Insert 100 000 15.90 2.17 0.80 25.05 1.4.98 1.1 Extended Insert 0.0.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 TOS 2.1 PDI 3.1 2009/01 ETL Benchmarks .18 ratio compared with TOS 2.0 0.15 TOS 2.50 151.0.4.90 129.1 Test 3: Table Oracle Input > File Output Delimited Scenario: V 1.78 843.4.00 5 000 000 731.26 14.4.1 PDI 3.

Test 3: Table Oracle Input > File Output Delimited

Scenario: Reading X lines from an Oracle input table and writing them to a delimited output file.

TALEND OPEN STUDIO
Job name: table_input_oracle__file_output_delimited
[Figure: Job]
[Figure: Schema of table_input_oracle]

PENTAHO DATA INTEGRATION
Job name: table_input_oracle__file_output_delimited
[Figure: Job]
Schema of table_input_oracle: SCHEMA VIEWER NOT POSSIBLE

DATASTAGE SERVER
Job name: table_input_oracle__file_output_delimited
[Figure: Job]
[Figure: Schema of table_input_oracle]

DATASTAGE PX
Job name: PX_table_input_oracle__file_output_delimited
[Figure: Job]
[Figure: Schema of table_input_oracle]

INFORMATICA
Job name: table_input_oracle__file_output_delimited
[Figure: Job]
[Figure: Schema of table_input_oracle]

MANAPPS Pg 24 Test results: Test 3: Table Oracle Input > File Output Delimited Lines TOS 2.0 2.00 15.78 1.1.5 INFA PWC 8.5 IBM DS PX 7.5 1.33 1.0 IBM DS 7.28 1.39 2.5 DataStage PX 7.00 8.1 2009/01 ETL Benchmarks .40 19.4.25 4.20 11.1 2 0.00 9 Statistics: Number of lines 100 000 500 000 1 000 000 TOS 2.00 6 1 000 000 14.1 100 000 2.26 21.76 1.62 DataStage 7.00 5 500 000 6.4.1 V 1.78 1.12 3.63 ratio compared with TOS 2.0.1.4.25 37.00 4.1 PDI 3.95 0.0.1 PDI 3.05 Informatica 8.78 4.

Test 4: File Input Delimited > Table Output Oracle BULK

Scenario: Reading X lines from a delimited input file and writing them into an Oracle output table in BULK mode.

TALEND OPEN STUDIO
Job name: file_input_delimited__table_output_oracle_bulk
[Figure: Job]

PENTAHO DATA INTEGRATION
Job name: file_input_delimited__table_output_oracle_bulk
[Figure: Job]
[Figure: Schema of file_input_delimited]

DATASTAGE SERVER
Job name: file_input_delimited__table_output_oracle_bulk
[Figure: Job]
[Figure: Schema of file_input_delimited]

DATASTAGE PX
Job name: PX_file_input_delimited__table_output_oracle_bulk
[Figure: Job]
[Figure: Schema of file_input_delimited]

INFORMATICA
Job name: file_input_delimited__table_output_oracle_bulk
[Figure: Job]
[Figure: Schema of file_input_delimited]

Test results:

Test 4: File Input Delimited > Table Output Oracle BULK

Lines        TOS 2.4.1   PDI 3.0.0   IBM DS 7.5   IBM DS PX 7.5   INFA PWC 8.1.1
100 000      4.36        2.60        3.00         6.00            4
1 000 000    22.12       30.60       18.00        27.00           7
2 000 000    49.66       72.70       40.00        55.00           11

Statistics: ratio compared with TOS 2.4.1

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000           0.6         0.69            1.38               0.92
1 000 000         1.38        0.81            1.22               0.31
2 000 000         1.46        0.8             1.11               0.22
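The statistics above, and the per-test points used in the global ranking, can be recomputed from the Test 4 timings. The sketch below is a reading of the method under stated assumptions, not the authors' own script. It assumes each ratio is a tool's time divided by the TOS 2.4.1 time, and that points are awarded per data volume (5 to the fastest tool, 1 to the slowest) and then summed. Ties are not handled specially, since the benchmark does not say how they were treated.

```python
TOOLS = ["TOS 2.4.1", "PDI 3.0.0", "IBM DS 7.5", "IBM DS PX 7.5", "INFA PWC 8.1.1"]

# Test 4 timings, keyed by number of lines, in the column order of TOOLS.
TIMINGS = {
    100_000: [4.36, 2.60, 3.00, 6.00, 4],
    1_000_000: [22.12, 30.60, 18.00, 27.00, 7],
    2_000_000: [49.66, 72.70, 40.00, 55.00, 11],
}


def ratios_vs_first(row):
    """Ratio of each tool's time to the first column (TOS 2.4.1)."""
    base = row[0]
    return [round(t / base, 2) for t in row]


def points(row):
    """5 points to the fastest tool, 4 to the second, down to 1 for the slowest."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    scores = [0] * len(row)
    for rank, idx in enumerate(order):
        scores[idx] = len(row) - rank
    return scores


ratios = {n: ratios_vs_first(row) for n, row in TIMINGS.items()}
totals = [sum(col) for col in zip(*(points(row) for row in TIMINGS.values()))]
```

Under these assumptions the recomputed ratios agree with the published ones to within 0.01 (the published figures may have been derived from unrounded timings), and `totals` gives 8, 7, 12, 5 and 13 points, which matches the Test4 row of the detailed results.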


Test 5: File Input Delimited > Transform > File Output Delimited

Scenario: Reading X lines from a delimited input file and writing them to a delimited output file after some changes.

Changes list:
• The new field `name` is a concatenation (`firstname` + " " + `lastname`).
• The content of the field `address` is converted to uppercase.
• The content of the field `rate` is multiplied by 100.

Comments: Pentaho Data Integration does not have any graphical component to transform data. Thus, we have to use a custom code component. The language used is JavaScript. Talend Open Studio has custom code components too, named tJavaRow and tPerlRow. The other four ETL tools have a transformer component to do this.
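For reference, the three changes in the list above amount to the following per-row logic. This is a Python sketch for illustration, not the JavaScript or tMap code used in the benchmark. The sample values are invented, and keeping the original `firstname` and `lastname` fields in the output is an assumption.

```python
def transform(row):
    """Apply the three changes from the changes list to one input row (a dict)."""
    out = dict(row)
    # New field `name`: firstname, a space, then lastname.
    out["name"] = f"{row['firstname']} {row['lastname']}"
    # `address` converted to uppercase.
    out["address"] = row["address"].upper()
    # `rate` multiplied by 100.
    out["rate"] = row["rate"] * 100
    return out


sample = {"firstname": "Ann", "lastname": "Lee", "address": "12 rue de la Paix", "rate": 0.25}
result = transform(sample)
```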

TALEND OPEN STUDIO
Job name: file_input_delimited__transformation__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited]
[Figure: tMap]

PENTAHO DATA INTEGRATION
Job name: file_input_delimited__transformation__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited]
[Figure: JavaScript Custom Code]
[Figure: Select Values]
[Figure: Select Values]

DATASTAGE SERVER
Job name: file_input_delimited__transformation__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited]
[Figure: Transformer]

DATASTAGE PX
Job name: PX_file_input_delimited__transformation__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited]
[Figure: Transformer]

INFORMATICA
Job name: file_input_delimited__transformation__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited]
[Figure: Mapping]

07 6 6.16 DataStage 7.18 1.13 1126.1 PDI 3.1 2.65 1.00 4.5 1.0.1 100 000 1.02 6.97 3.75 3.5 INFA PWC 8.1 PDI 3.00 11.7 0.MANAPPS Pg 44 Tests result: Test 5: File Input Delimited > Transform > File Output Delimited Lines TOS 2.4.5 IBM DS PX 7.00 5 000 000 43.39 0.95 0.00 74.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.1 2009/01 ETL Benchmarks .1.30 2.4.33 6.3 0.0 IBM DS 7.0.4 ratio compared with TOS 2.0 4.30 5.10 259.1 V 1.1.33 0.3 0.00 155.50 51.00 41.00 1 000 000 8.00 10.5 DataStage PX 7.00 17.84 Informatica 8.10 178.40 56.54 1.00 20 000 000 183.4.

Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)

Scenario: Reading X lines from Oracle input tables and writing them into other Oracle output tables (ELT mode).

Comments: Only Talend Open Studio allows the use of an ELT mode. Informatica has the Push Down Optimization, but I did not find this feature in the tool.
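In ELT mode the aggregation is executed by the database itself instead of streaming rows through the ETL engine. The sketch below shows the principle with a single `INSERT ... SELECT` statement. It uses the standard-library `sqlite3` module as a stand-in for Oracle, and the table and column names are hypothetical; the group-by-age count follows the job names used for this test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_input (id INTEGER, age INTEGER)")
conn.execute("CREATE TABLE table_output (age INTEGER, nb INTEGER)")
conn.executemany(
    "INSERT INTO table_input VALUES (?, ?)",
    [(1, 30), (2, 30), (3, 41)],
)

# ELT: the aggregation runs inside the database; no row leaves it.
conn.execute(
    "INSERT INTO table_output (age, nb) "
    "SELECT age, COUNT(*) FROM table_input GROUP BY age"
)
conn.commit()

aggregated = conn.execute("SELECT age, nb FROM table_output ORDER BY age").fetchall()
```

A tool without an ELT mode has to read the rows, aggregate them in its own engine and write the result back, which adds two network transfers to the same work.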

TALEND OPEN STUDIO
Job name: ELT__table_input_oracle__aggregate_group_by_age_count__table_output_oracle
[Figure: Job (ELT)]
[Figure: Schema of table_input_oracle]

PENTAHO DATA INTEGRATION
Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle
[Figure: Job]
Schema of table_input_oracle: SCHEMA VIEWER NOT POSSIBLE

DATASTAGE SERVER
Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle
[Figure: Job]
[Figure: Schema of table_input_oracle]

DATASTAGE PX
Job name: PX_table_input_oracle__aggregate_group_by_age_count__table_output_oracle
[Figure: Job]
[Figure: Schema of table_input_oracle]

INFORMATICA
Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle
[Figure: Job]
[Figure: Schema of table_input_oracle]

4.00 3 1 000 000 1.69 47.36 ratio compared with TOS 2.1 PDI 3.MANAPPS Pg 51 Test results: Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT) Lines TOS 2.36 Informatica 8.1 2009/01 ETL Benchmarks .1 3.80 13.57 10.28 DataStage 7.4 22.1.4.5 INFA PWC 8.71 8.67 17.9 28.44 15.94 5.50 4 Statistics: Number of lines 100 000 500 000 1 000 000 TOS 2.45 8.22 2.1 100 000 1.26 8.5 DataStage PX 7.26 2.09 6.1.00 12.24 4.1 PDI 3.0 3.40 8.1 V 1.14 2.0.4.0 IBM DS 7.5 1.00 4 500 000 1.5 IBM DS PX 7.0.

Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)

Scenario: Reading X lines from Oracle input tables and writing them into other Oracle output tables (ELT mode) after some changes.

TALEND OPEN STUDIO
Job name: table_input_oracle__elt__table_output_oracle
[Figure: Job (ELT)]
[Figure: Schema of table_lookup_oracle]
[Figure: Schema of table_input_oracle]

PENTAHO DATA INTEGRATION
Job name: table_input_oracle__elt__table_output_oracle
[Figure: Job]
Schema of table_lookup_oracle: SCHEMA VIEWER NOT POSSIBLE
Schema of table_input_oracle: SCHEMA VIEWER NOT POSSIBLE

DATASTAGE SERVER
Job name: table_input_oracle__elt__table_output_oracle
[Figure: Job]
[Figure: Schema of table_lookup_oracle]
[Figure: Schema of table_input_oracle]

DATASTAGE PX
Job name: PX_table_input_oracle__elt__table_output_oracle
[Figure: Job]
[Figure: Schema of table_lookup_oracle]
[Figure: Schema of table_input_oracle]

INFORMATICA
Job name: table_input_oracle__elt__table_output_oracle
[Figure: Job]
[Figure: Schema of table_lookup_oracle]

4.1 PDI 3.1.67 7.00 5 500 000 23.4.31 0.1 V 1.0 6.1 0.5 1.50 9 1 000 000 52.70 15.99 38.0.5 IBM DS PX 7.26 DataStage 7.35 12.1 PDI 3.60 65.9 Informatica 8.12 2.72 382.60 116.00 47.5 INFA PWC 8.50 14 Statistics: Number of lines 100 000 500 000 1 000 000 TOS 2.2 2.27 ratio compared with TOS 2.83 0.26 201.1 2009/01 ETL Benchmarks .0.00 30.0 IBM DS 7.5 DataStage PX 7.79 2.1.5 2.MANAPPS Pg 58 Schema of table_input_oracle Test results: Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT) Lines TOS 2.39 0.4.4 8.1 100 000 5.

Test 8: File Input Delimited > Sort > File Output Delimited

Scenario: Reading X lines from a delimited input file and writing them, sorted, to a delimited output file.

Sorts list:
• Order by the integer field `age` ASC.
• Order by the string field `firstname` ASC.
• Order by the fields `age` and `firstname` ASC.

Comments: With the version used, I could not sort in memory with Pentaho Data Integrator, but the feature is present in the latest version. On Talend Open Studio, with a large volume (5 000 000 and 20 000 000 lines), we have to use the tExternalSort component, which uses GNU sort, an external sorting program.
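The three sort orders in the list above can be expressed as follows. This Python sketch sorts in memory with invented sample rows; it only illustrates the orderings and says nothing about how the tested tools sort. For the large volumes mentioned in the comments, an in-memory sort may not fit, which is why an external sort is used there.

```python
from operator import itemgetter

# Hypothetical sample rows; only the two sort fields are shown.
rows = [
    {"firstname": "Zoe", "age": 30},
    {"firstname": "Adam", "age": 41},
    {"firstname": "Bob", "age": 30},
]

# 1. Order by the integer field `age` ASC (stable: ties keep input order).
by_age = sorted(rows, key=itemgetter("age"))

# 2. Order by the string field `firstname` ASC.
by_firstname = sorted(rows, key=itemgetter("firstname"))

# 3. Order by `age`, then `firstname`, both ASC.
by_age_and_firstname = sorted(rows, key=itemgetter("age", "firstname"))
```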

TALEND OPEN STUDIO
Job names:
• file_input_delimited__sort_on_age__file_output_delimited
• file_input_delimited__sort_on_firstname__file_output_delimited
• file_input_delimited__sort_on_firstname_and_age__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

PENTAHO DATA INTEGRATION
Job names:
• file_input_delimited__sort_on_age__file_output_delimited
• file_input_delimited__sort_on_firstname__file_output_delimited
• file_input_delimited__sort_on_firstname_and_age__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

DATASTAGE SERVER
Job names:
• file_input_delimited__sort_on_age__file_output_delimited
• file_input_delimited__sort_on_firstname__file_output_delimited
• file_input_delimited__sort_on_firstname_and_age__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

DATASTAGE PX
Job names:
• PX_file_input_delimited__sort_on_age__file_output_delimited
• PX_file_input_delimited__sort_on_firstname__file_output_delimited
• PX_file_input_delimited__sort_on_firstname_and_age__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

INFORMATICA
Job names:
• file_input_delimited__sort_on_age__file_output_delimited
• file_input_delimited__sort_on_firstname__file_output_delimited
• file_input_delimited__sort_on_firstname_and_age__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]

85 60.95 267.82 0.0 IBM DS 7.09 0.83 DataStage 7.86 1.20 492.25 13.5 IBM DS PX 7.0 2.1 V 1.63 4.00 5 000 000 188.4.0.4.1.73 32.1.20 4.67 201.1 2009/01 ETL Benchmarks .MANAPPS Pg 65 Tests result: Test 8: File Input Delimited > Sort > File Output Delimited Sorted by Age Sorted by age Lines TOS 2.26 ratio compared with TOS 2.03 0.1 PDI 3.50 50.70 64.51 2.5 2.44 3.00 1 000 000 15.34 Informatica 8.78 1.00 5.5 INFA PWC 8.4.1 3.03 668.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 TOS 2.00 20 000 000 1016.5 DataStage PX 7.42 2.21 155.1 100 000 1.1 PDI 3.70 16.47 0.92 3.0.

MANAPPS Pg 66 0.20 58.46 157.1 2.20 739.00 51.1.0 IBM DS 7.1 V 1.1 100 000 1.1.48 0.5 IBM DS PX 7.00 5 000 000 168.00 1 000 000 18.1 PDI 3.5 INFA PWC 8.0.00 4.15 426.00 4.00 16.1 2009/01 ETL Benchmarks .89 Informatica 8.4.0 2.72 ratio compared with TOS 2.55 3.00 223.00 Statistics: Number of lines 100 000 1 000 000 TOS 2.40 6.4.37 0.21 2.2 20 000 000 0.66 +++ Test 8: File Input Delimited > Sort > File Output Delimited Sort By First Name Sorted by firstname Lines TOS 2.5 DataStage PX 7.20 624.00 20 000 000 1071.0.69 3.00 57.05 31.5 3.36 0.1 PDI 3.01 1.00 13.4.73 DataStage 7.

26 Informatica 8.00 49.1 PDI 3.1 V 1.50 5.1.1 100 000 1.75 0.40 29.1 2009/01 ETL Benchmarks . Name Sorted by age & firstname Lines TOS 2.00 1 000 000 17.0 IBM DS 7.74 0.1.03 159.22 ratio compared with TOS 2.21 5 000 000 20 000 000 Test 8: File Input Delimited > Sort > File Output Delimited Sort By First Age.42 1.68 0.38 0.1 PDI 3.27 60.45 1.5 IBM DS PX 7.00 842.71 DataStage 7.0 2.33 13.20 582.50 211.3 0.5 INFA PWC 8.4.6 3.0.4.00 20 000 000 1007.69 2.00 59.10 360.0.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 TOS 2.22 7.00 5 000 000 225.MANAPPS Pg 67 0.53 +++ 0.34 0.1 3.51 3.93 0.5 DataStage PX 7.58 0.00 16.33 4.5 5.33 3.94 0.4.

84 +++ V 1.58 0.1 2009/01 ETL Benchmarks .MANAPPS Pg 68 0.21 20 000 000 0.

Test 9: File Input Delimited > Aggregate > File Output Delimited

Scenario: Reading X lines from a delimited input file, performing an aggregation and writing the result of the operations to a delimited output file.

1 – Group by the field `age`. Operation: COUNT.
2 – Group by the field `age`. Operations: COUNT, SUM(rate), AVG(rate), MIN(rate), MAX(rate).
3 – Group by the field `firstname`. Operation: COUNT.

Comments: When the output flow is too big (aggregation by firstname with a large volume here), we have to use the tSortedAggregateRow component on Talend Open Studio. This component sorts rows before the aggregation. In this case, Pentaho Data Integrator failed.
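The three aggregations above reduce to a group-by with a handful of operations per group. The following Python sketch is an in-memory illustration with invented rows; `aggregate_by` is a hypothetical helper and does not reflect how any tested tool implements aggregation.

```python
from collections import defaultdict


def aggregate_by(rows, key):
    """Group rows on `key` and compute COUNT, SUM, AVG, MIN and MAX of `rate`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["rate"])
    return {
        value: {
            "count": len(rates),
            "sum": sum(rates),
            "avg": sum(rates) / len(rates),
            "min": min(rates),
            "max": max(rates),
        }
        for value, rates in groups.items()
    }


rows = [
    {"firstname": "Ann", "age": 30, "rate": 2.0},
    {"firstname": "Bob", "age": 30, "rate": 4.0},
    {"firstname": "Ann", "age": 41, "rate": 1.5},
]
by_age = aggregate_by(rows, "age")
by_firstname = aggregate_by(rows, "firstname")
```

This version keeps every group in memory at once. Grouping by a high-cardinality field such as `firstname` produces many groups, which is the situation where sorting the rows first, as described in the comments, lets each group be emitted and discarded as soon as its key changes.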

TALEND OPEN STUDIO
Job names:
• file_input_delimited__aggregate_group_by_age_count__file_output_delimited
• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited
• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited
[Figure: Job]
[Figure: Job using the tExternalSortRow component]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited (file_input_delimited__aggregate_group_by_age_count__file_output_delimited)]

PENTAHO DATA INTEGRATION
Job names:
• file_input_delimited__aggregate_group_by_age_count__file_output_delimited
• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited
• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited (file_input_delimited__aggregate_group_by_age_count__file_output_delimited)]

DATASTAGE SERVER
Job names:
• file_input_delimited__aggregate_group_by_age_count__file_output_delimited
• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited
• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited (file_input_delimited__aggregate_group_by_age_count__file_output_delimited)]

DATASTAGE PX
Job names:
• PX_file_input_delimited__aggregate_group_by_age_count__file_output_delimited
• PX_file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited
• PX_file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited (file_input_delimited__aggregate_group_by_age_count__file_output_delimited)]

INFORMATICA
Job names:
• file_input_delimited__aggregate_group_by_age_count__file_output_delimited
• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited
• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited
[Figure: Job]
[Figure: Schema of file_input_delimited]
[Figure: Schema of file_output_delimited (file_input_delimited__aggregate_group_by_age_count__file_output_delimited)]

50 5.00 1 000 000 6.16 466.86 6.1 PDI 3.1 V 1.1.1 100 000 0.00 4.35 3.00 Statistics: Number of lines 100 000 1 000 000 TOS 2.00 20 000 000 124.1 2009/01 ETL Benchmarks .5 3.5 DataStage PX 7.00 5 000 000 30.4.45 0.62 2.00 27.MANAPPS Pg 76 Tests result: Test 9: File Input Delimited > Aggregate > File Output Delimited Group by age (count) Group by Age (Count) Lines TOS 2.50 128.0.00 3.1.00 21.1 PDI 3.23 0.05 134.0.84 0.4.70 2.1 4.00 78.93 Informatica 8.99 26.30 21.53 6.5 IBM DS PX 7.72 ratio compared with TOS 2.4.5 INFA PWC 8.0 IBM DS 7.00 6.0 4.8 DataStage 7.33 8.

00 33. Min(Rate).4.0. Sum(Rate).00 184.1 100 000 0.89 Informatica 8.5 DataStage PX 7.22 5 000 000 20 000 000 Test 9: File Input Delimited > Aggregate > File Output Delimited Group by Age (Count.31 ratio compared with TOS 2. Min(Rate).00 Statistics: Number of lines 100 000 1 000 000 5 000 000 TOS 2.50 12.00 20 000 000 139.48 1.84 2.76 0.30 50.63 0. Avg(Rate).61 138.0.33 38.5 INFA PWC 8.1 2009/01 ETL Benchmarks .68 DataStage 7. Max(Rate)) Lines TOS 2.47 3.06 0.1 PDI 3.00 254. Avg(Rate).33 13.4.44 25.00 15.1 3.33 6.12 426.1 PDI 3.00 1 000 000 7.25 2.1.1 2.4.03 0.5 2.20 11. Max(Rate)) Group by Age (Count.0 3.8 0.38 0.00 11. Sum(Rate).60 2.MANAPPS Pg 77 4.1 V 1.1.0 IBM DS 7.27 0.5 IBM DS PX 7.71 0.00 5 000 000 37.38 1.39 2.7 1.39 3.

0 IBM DS 7.00 85 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.27 20 000 000 3.0.59 DataStage 7.5 INFA PWC 8.00 23 20 000 000 928.1 V 1.70 14.5 2.30 68.00 40.50 4 1 000 000 7.0 3.1 100 000 0.5 IBM DS PX 7.00 11.00 505.00 424.1 2009/01 ETL Benchmarks .34 0.4.4.70 2.1 PDI 3.1.33 1.1.79 162.092 ratio compared with TOS 2.32 Test 9: File Input Delimited > Aggregate > File Output Delimited Group by FirstName (Count) Group by FirstName (Count) Lines TOS 2.54 Informatica 8.89 29.MANAPPS Pg 78 1.86 2.91 0.23 1.77 0.0.4.2 0.14 012 0.08 544.5 DataStage PX 7.1 PDI 3.1 4.82 0.14 3.65 1.46 5.00 4.06 1.00 9 5 000 000 198.39 0.76 0.

Test 10: File Input Delimited > Lookup > File Output Delimited

Scenario: Reading X lines from a delimited input file and looking up 4 fields in another delimited input file, using the id_client column. Writing the join result into a delimited output file.
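A lookup of this kind is commonly done by loading the lookup file into an in-memory index keyed on the join column and probing it once per main row. The Python sketch below illustrates that approach under stated assumptions: only `id_client` comes from the scenario, the four lookup field names are invented, and dropping unmatched rows is an assumption rather than something this scenario specifies.

```python
def lookup_join(main_rows, lookup_rows, key="id_client"):
    """Join main rows with lookup rows on `key` using an in-memory index."""
    # Build the index once: one dict entry per lookup row.
    index = {row[key]: row for row in lookup_rows}
    joined = []
    for row in main_rows:
        match = index.get(row[key])
        if match is not None:
            # Merge the looked-up fields into the main row.
            joined.append({**row, **match})
    return joined


main_rows = [
    {"id_client": 1, "amount": 10},
    {"id_client": 2, "amount": 20},
    {"id_client": 9, "amount": 30},
]
# Four hypothetical lookup fields in addition to the join column.
lookup_rows = [
    {"id_client": 1, "city": "Paris", "zip": "75001", "country": "FR", "segment": "A"},
    {"id_client": 2, "city": "Lyon", "zip": "69001", "country": "FR", "segment": "B"},
]
joined = lookup_join(main_rows, lookup_rows)
```

Because the whole lookup file is held in memory, the size of the lookup (100 000 or 500 000 rows in the results) affects memory use far more than the number of main rows does.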

1 2009/01 ETL Benchmarks .MANAPPS Pg 80 TALEND OPEN STUDIO Job name: file_input_delimited__file_lookup_delimited__file_output_delimited Job Schema of file_input_delimited Schema of file_lookup_delimited V 1.

1 2009/01 ETL Benchmarks .MANAPPS Pg 81 Schema file_output_delimited tMap Component V 1.

1 2009/01 ETL Benchmarks .MANAPPS Pg 82 PENTAHO DATA INTEGRATION Job name: file_input_delimited__file_lookup_delimited__file_output_delimited Job Schema of file_input_delimited Schema of file_lookup_delimited V 1.

1 2009/01 ETL Benchmarks .MANAPPS Pg 83 Schema of file_output_delimited Mapping Component V 1.

DATASTAGE SERVER

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Transformer Component

DATASTAGE PX

Job name: PX_file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Transformer Component

INFORMATICA

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Transformer Component

4.0 IBM DS 7.20 11.45 4.14 5.1 3.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 TOS 2.66 1.00 40.0.00 1 000 000 6.4.1 PDI 3.72 1.60 12.MANAPPS Pg 91 Tests result: Test 10: File Input Delimited > Lookup > File Output Delimited Lookup 100 000 rows ~7MB Lookup 100 000 rows ~7MB Lines TOS 2.11 ratio compared with TOS 2.45 1.1.00 5.1 V 1.15 3.00 122.05 DataStage 7.90 139.1 2009/01 ETL Benchmarks .0 2.00 20 000 000 108.60 33.1 PDI 3.72 87.00 5 000 000 28.5 IBM DS PX 7.39 Informatica 8.1 100 000 1.86 3.00 32.45 1.37 288.4.91 1.44 1.00 116.35 3.0.39 21.5 3.40 10.00 5.1.5 INFA PWC 8.5 DataStage PX 7.

9 7.28 Test 10: File Input Delimited > Lookup > File Output Delimited Lookup 500 000 rows ~34MB Lookup 500 000 rows ~34MB Lines TOS 2.4.0 2.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.1 V 1.89 24.00 20 000 000 115.24 1.69 1.67 1.52 DataStage 7.07 20 000 000 2.67 291.10 195.03 2.00 4.24 1.50 33.1.00 122.36 97.1 PDI 3.4.90 28.00 1 000 000 8.1 100 000 3.1 2009/01 ETL Benchmarks .0.03 1.18 3.5 IBM DS PX 7.00 13.00 5 000 000 32.46 1.5 INFA PWC 8.05 Informatica 8.00 7.40 56.00 122.00 33.MANAPPS Pg 92 1.0.0 IBM DS 7.73 1.01 2.71 1.1 1.5 DataStage PX 7.1.79 1.4.76 3.13 1.00 40.02 1.5 7.00 11.05 ratio compared with TOS 2.1 PDI 3.

00 40.20 IBM DS 7.5 9.00 123.1 V 1.5 DataStage PX 7.00 20 000 000 121.6 116.00 12.5 6.91 1.05 1.67 0.00 IBM DS PX 7.93 5.16 ratio compared with TOS 2.4.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.1.02 4.25 203.61 2.00 142.51 0.1 9.4.86 14.1 PDI 3.64 1.01 Informatica 8.01 DataStage 7.26 3.0 14.04 1.44 487.00 5 000 000 38.50 32.4.84 0.26 PDI 3.30 80.94 1.1 2009/01 ETL Benchmarks .0 1.1.0.MANAPPS Pg 93 Test 10: File Input Delimited > Lookup > File Output Delimited Lookup 1 000 000 rows ~68MB Lookup 1 000 000 rows ~68MB Lines 100 000 1 000 000 TOS 2.47 2.5 68.1 5.60 102.0.00 INFA PWC 8.00 35.25 15.1 0.

00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.00 5 000 000 199.5 INFA PWC 8.28 0.4.4.1 100 000 56.26 496.00 1 000 000 69.21 0.00 14.1 407.1 973.42 0.00 11.5 DataStage PX 7.MANAPPS Pg 94 Test 10: File Input Delimited > Lookup > File Output Delimited Lookup 5 000 000 rows ~365MB Lookup 5 000 000 rows ~365MB Lines TOS 2.19 0.0 IBM DS 7.00 24.1.1.1 PDI 3.00 134.1 PDI 3.4.53 5.00 42.00 20 000 000 557.75 0.1 2009/01 ETL Benchmarks .24 Informatica 8.51 369.5 IBM DS PX 7.2 0.00 55.43 0.0 Failed Failed Failed Failed DataStage 7.00 30.1 0.89 2.1 V 1.25 ratio compared with TOS 2.0.00 141.49 1.5 6.0.

Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

Scenario: Reading X lines from a delimited input file, looking up 4 fields in another delimited input file using the id_client column. Writing the join result to a delimited output file and the rejected rows to other delimited output files.

1 – Filter rejects: `age` content < 18
2 – Filter rejects: `age` content < 18 and inner join reject

Comments: Talend Open Studio and DataStage Server are the most ergonomic tools for managing expression filter rejects and inner join rejects (with the Transformer component, tMap in Talend Open Studio). For DataStage PX, Pentaho Data Integrator and Informatica, we have to use filter components.

Talend Open Studio, Informatica and DataStage Server are the most ergonomic tools for managing expression filter rejects and inner join rejects. For DataStage PX and Pentaho Data Integrator, we have to use filter components.
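The two reject flows described in this scenario can be sketched as a single routing function: rows with `age` below 18 go to a filter-reject output, rows without a match in the lookup go to an inner-join-reject output, and the rest are joined. This is a minimal sketch, not the tMap or Transformer logic itself. The function name `route_rows`, the order of the two checks (filter first, then lookup) and the sample field `city` are assumptions for illustration.

```python
def route_rows(rows, lookup, key="id_client", min_age=18):
    """Split rows into joined rows, age-filter rejects and inner-join rejects."""
    accepted, age_rejects, join_rejects = [], [], []
    for row in rows:
        # Case 1 of the scenario: reject rows whose age is below the threshold.
        if int(row["age"]) < min_age:
            age_rejects.append(row)
            continue
        # Case 2 of the scenario: reject rows with no match in the lookup.
        match = lookup.get(row[key])
        if match is None:
            join_rejects.append(row)
            continue
        accepted.append({**row, **match})
    return accepted, age_rejects, join_rejects


# Hypothetical sample data; values are strings as they would be after
# reading a delimited file.
rows = [
    {"id_client": "1", "age": "25"},
    {"id_client": "2", "age": "16"},
    {"id_client": "9", "age": "40"},
]
lookup = {"1": {"city": "Paris"}}
accepted, age_rejects, join_rejects = route_rows(rows, lookup)
```

For the first variant of the test (filter rejects only), the third list would simply not be written out.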

TALEND OPEN STUDIO

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited (age>=18)

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

tMap Component

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Mapping Component

DATASTAGE SERVER

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component

DATASTAGE PX

Job name: PX_file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component

INFORMATICA

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component

39 1.MANAPPS Pg 108 Tests result: Test 11: File Input Delimited > Lookup > File Output Delimited && rejects Lookup 100 000 rows ~7MB + Filter 18 years Lookup 100 000 rows ~7MB Lines TOS 2.00 20 000 000 101.5 IBM DS PX 7.5 3.0 IBM DS 7.1 3.1 100 000 1.00 137.5 INFA PWC 8.30 6.56 1.1.00 10.65 3 DataStage 7.4.10 10.19 2.64 2.00 120.54 2.08 1.1 PDI 3.12 1.1 V 1.0.22 1.00 1 000 000 6.00 144.74 17.1 PDI 3.31 1.00 33.1.00 5 000 000 29.48 1.35 Informatica 8.00 41.42 4.18 ratio compared with TOS 2.1 2009/01 ETL Benchmarks .0.00 5.0 2.00 7.4.55 78.40 36.51 3.4.65 305.97 1.5 DataStage PX 7.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.50 14.

28 20.0 IBM DS 7.77 DataStage 7.0.08 1.17 1.1 2009/01 ETL Benchmarks .1.13 ratio compared with TOS 2.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.54 1.50 5.00 1 000 000 9.00 5 000 000 32.00 155.54 1.71 3.5 INFA PWC 8.76 1.MANAPPS Pg 109 Test 11: File Input Delimited > Lookup > File Output Delimited && rejects Lookup 500 000 rows ~34MB + Filter 18 years Lookup 500 000 rows ~34MB Lines TOS 2.5 DataStage PX 7.21 2.1 PDI 3.50 57.1 100 000 4.50 34.5 IBM DS PX 7.4.0.1.66 1.98 310.00 173.1 V 1.00 20 000 000 111.44 81.60 7.67 34.5 6.00 44.80 28.25 10.26 7.20 126.76 1.00 14.51 2.4.83 2.1 PDI 3.0 1.39 Informatica 8.05 1.1 1.38 1.4.

00 IBM DS PX 7.00 34.00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.4.7 1.35 IBM DS 7.63 319.5 66.MANAPPS Pg 111 Test 11: File Input Delimited > Lookup > File Output Delimited && rejects Lookup 1 000 000 rows ~68MB + Filter 18 years Lookup 1 000 000 rows ~68MB Lines 100 000 1 000 000 TOS 2.89 1.00 68.1.0 1.31 111.2 15.00 18.52 DataStage 7.00 5 000 000 38.0 14.5 9.74 0.47 1.35 95.13 2.10 32.1 2009/01 ETL Benchmarks .0.38 2.5 DataStage PX 7.00 14.4.03 ratio compared with TOS 2.18 1.00 51.0.1 V 1.5 6.33 1.1 PDI 3.33 130.1 0.1.88 1.1 10.05 220.59 0.1 6.00 INFA PWC 8.91 2.21 Informatica 8.00 20 000 000 126.00 153.47 4.92 0.4.22 PDI 3.

TALEND OPEN STUDIO

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited (age>=18)

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

tMap Component

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Mapping Component

DATASTAGE SERVER

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

DATASTAGE PX

Job name: PX_file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

INFORMATICA

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

00 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.4.22 1.83 2.MANAPPS Pg 126 Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited Lookup 100 000 rows ~7MB Lookup 100 000 rows ~7MB Lines TOS 2.00 47.63 59.00 121.80 30.1 V 1.1 100 000 1.92 1.77 1.00 10.1.60 6.4.1 2009/01 ETL Benchmarks .12 1.13 ratio compared with TOS 2.1 PDI 3.5 DataStage PX 7.42 2.22 1.65 13.82 2.07 DataStage 7.5 4.4.00 9.00 4.5 IBM DS PX 7.0 IBM DS 7.78 327.28 6.25 12.33 33.00 146.00 5 000 000 24.3 2.0 1.34 2.1 PDI 3.1 2.7 1.0.00 15.1.5 INFA PWC 8.0.00 20 000 000 106.00 1 000 000 5.64 1.37 Informatica 8.60 137.43 3.

08 1.65 DataStage 7.4.83 1.60 189.4.1 2009/01 ETL Benchmarks .00 5 1 000 000 8.26 28.1 1.06 2.75 2.1 PDI 3.00 11.50 16.21 2.50 150.73 1.1.5 6.0.24 Informatica 8.16 7.4.57 6.5 IBM DS PX 7.34 72.00 44.0.5 INFA PWC 8.30 35.0 1.25 63.1.2 1.05 ratio compared with TOS 2.1 100 000 4.00 33 20 000 000 120.38 2.26 1.MANAPPS Pg 127 Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited Lookup 500 000 rows ~34MB Lookup 500 000 rows ~34MB Lines TOS 2.00 127 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.74 19.0 IBM DS 7.53 319.73 4.1 PDI 3.1 V 1.09 1.00 11 5 000 000 30.5 DataStage PX 7.45 1.

57 413.0.27 DataStage 7.00 49.98 15.4.1 V 1.0.40 IBM DS PX 7.49 90.00 19.86 0.5 13.51 5.0 1.00 134.4.1.96 2.18 PDI 3.06 Informatica 8.4.49 79.1 0.1 10.55 0.5 38.MANAPPS Pg 128 Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited Lookup 1 000 000 rows ~68MB Lookup 1 000 000 rows ~68MB Lines 100 000 1 000 000 TOS 2.1.18 1.30 27.27 1.1 PDI 3.5 3.45 231.81 1.00 INFA PWC 8.1 6 13 5 000 000 38.83 1.00 37 20 000 000 126.00 108.1 2009/01 ETL Benchmarks .25 1.05 3.5 DataStage PX 7.0 13.04 ratio compared with TOS 2.00 131 Statistics: Number of lines 100 000 1 000 000 5 000 000 20 000 000 TOS 2.35 IBM DS 7.8 2.21 1.96 1.

Annex 1: Informatica settings and results

This annex presents the settings changes made by Informatica and the limitations they found.

Comments and amendments made to the basic PowerCenter 8.1.1 installation:

*** Since the 'benchmark' machine is a tiny laptop with limited resources (XP 32bit, Core2 Duo CPU and 3.43 GB of RAM), we made the following change:
- Metadata Manager and Reporting Service deactivation

*** Configuration amendments:
- Auto-Memory deactivation: MaxMem at 0 in the Default Session Config
- High Availability storage deactivation: EnableHAStorage at No for the 'Integration Service'
- Unix environment variable INFA_DEFAULT_DOMAIN added
- Custom variable FileRdrTreatNullCharAs added on the Integration Service (NULL characters are encountered in source data files)

*** Standard Oracle 10g (10.2.0.1.0) Database installation with:
sga_max_size=164MB
pga_aggregate_target=115MB

Comments and "best-practices" for the tests:

Test 1: File Input Delimited > File Output Delimited
- dynamic partitioning at 2 with more than 5 million rows
This is a disk-bound test

Test 2: File Input Delimited > Table MySQL Output
Not Applicable

Test 3: Table Oracle Input > File Output Delimited
- no partitioning as it's too small in volume and short in time

Test 4: File Input Delimited > Table Output Oracle BULK

function "CONCAT(CONCAT(firstname.no partitioning as it's too small in volume and short in time Oracle database is not 'tuned' for ELT mode Test 8: File Input Delimited > Sort > File Output Delimited .sorter memory adjustment This is a memory limited test at 20 millions rows (2 pass sort are required) and also disk limited sometime Test 9: File Input Delimited > Aggregate > File Output Delimited .dynamic partitioning at 2 with more than 5 millions rows This is a Disk Bounded test Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT) .lookup memory adjustment .1 2009/01 ETL Benchmarks .dynamic partitioning at 2 with more than 5 millions rows in source .lookup in the flow with hash partitioning point This is a CPU bounded test Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited .commit size at 50000 .lookup in the flow with hash partitioning point This is a CPU bounded test Test 11: File Input Delimited > Lookup > File Output Delimited && rejects .lookup memory adjustment .use of router in place of filters .' 
').lookup memory adjustment .dynamic partitioning at 2 with more than 5 millions rows in source .use of router in place of filters .dynamic partitioning at 2 with 2 millions rows This is a Disk Bounded test Test 5: File Input Delimited > Transform > File Output Delimited .commit size at 100000 .aggregator memory adjustment This is a CPU bounded test Test 10: File Input Delimited > Lookup > File Output Delimited .lastname)" is replaced by "firstname || ' ' || lastname" .lookup in the flow with hash partitioning point This is a CPU bounded test V 1.dynamic partitioning at 2 with more than 5 millions rows in source .dynamic partitioning at 2 with more than 5 millions rows in source or lookup .MANAPPS Pg 130 .no partitioning as it's too small in volume and short in time Oracle database is not 'tuned' for ELT mode Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT) .
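The annex notes for the lookup tests mention dynamic partitioning at 2 and a "lookup in the flow with hash partitioning point". The idea behind a hash partitioning point can be sketched as follows: both the source rows and the lookup rows are routed to a partition by a hash of the join key, so that each partition can perform its lookup independently. This is a conceptual sketch only and does not describe how PowerCenter implements it. The function names, the use of CRC32 as the hash and the sample field `city` are assumptions.

```python
import zlib


def partition_of(key, partitions=2):
    """Map a key to a partition index with a stable hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % partitions


def hash_partition(rows, key="id_client", partitions=2):
    """Distribute rows into buckets so that equal keys share a bucket."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        buckets[partition_of(row[key], partitions)].append(row)
    return buckets


def partitioned_lookup(source_rows, lookup_rows, key="id_client", partitions=2):
    """Join source and lookup rows partition by partition."""
    source_parts = hash_partition(source_rows, key, partitions)
    lookup_parts = hash_partition(lookup_rows, key, partitions)
    joined = []
    # Each (source, lookup) pair is independent; here the pairs are processed
    # one after another, whereas an engine could run them in parallel.
    for src, lkp in zip(source_parts, lookup_parts):
        index = {row[key]: row for row in lkp}
        for row in src:
            match = index.get(row[key])
            if match is not None:
                joined.append({**row, **match})
    return joined


# Hypothetical sample data.
source = [{"id_client": i, "age": 20 + i} for i in range(6)]
lookup = [{"id_client": i, "city": f"city_{i}"} for i in (0, 2, 4, 5)]
joined = partitioned_lookup(source, lookup)
```

Because a key always hashes to the same partition, each partition only needs its own share of the lookup data in memory.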