
Project: Camel | Talend Open Studio | Hadoop

Pros

Camel
- Flexibility; very lightweight.
- Potential to custom-fit some integration requirements.
- Lower cost - licensing and product development methodology.
- Open source.
- Heavily aligned with Java.
- Automation/continuous development.
- Integrated testing framework.
- Limited cross-project environment impact.

Talend Open Studio
- Native ETL tool with more than 900 components and built-in connectors that
  allow easy linking of a wide array of sources and targets.
- Shorter TTM (GUI-based): creates straightforward, simple code in the least
  amount of time.
- Built-in components for joins/lookups on medium volumes of data.
- Column-level data aggregation can be achieved easily.
- Built-in components for data sorting and for eliminating duplicates.
- Built-in CDC functionality.
- Modularization can be achieved through jobs.
- Reusability - efficient reuse of sharable objects.

Hadoop
- Open source technology.
- Highly scalable distributed system that uses commodity hardware.
- Self-managing and self-healing; built-in mechanism for machine outage
  recovery.
- Highly fault tolerant.
- Enterprise infrastructure can be leveraged.
- Efficient processing of exponentially growing data.
- Brings the computation physically close to the data for best bandwidth,
  instead of copying the data.
- Efficient handling of batch workloads.
- Built-in techniques for:
  - Large volumes and parallel data processing.
  - Large data sorting.
  - Complex calculations.
  - Multiple heterogeneous source systems.
  - Multiple lookups/joins.
- Ideal for both batch processing and data ...
- Can be leveraged for data migration, historical loads and historical
  storage.
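
The Hadoop points about parallel processing and bringing the computation close to the data can be pictured with a small single-process sketch: each partition is rolled up where it lives, and only the small partial results are merged. This is plain Python, not Hadoop API. The partition contents and the names `local_rollup`, `merge_rollups` and `partitions` are hypothetical and exist only for illustration.

```python
from collections import Counter


def local_rollup(records):
    """Aggregate one partition in place: sum the amount per store id."""
    totals = Counter()
    for store_id, amount in records:
        totals[store_id] += amount
    return totals


def merge_rollups(partials):
    """Combine the per-partition partial sums into one result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return merged


# Hypothetical data: two partitions that would sit on different nodes.
partitions = [
    [("store-1", 10), ("store-2", 5)],
    [("store-1", 7), ("store-3", 2)],
]

# Each partition is processed independently (this step could run in parallel);
# only the compact partial results travel to the merge step.
partials = [local_rollup(p) for p in partitions]
totals = merge_rollups(partials)
print(dict(totals))
```

The design point is that the raw records never leave their partition; what moves is one small summary per partition.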
Cons

Camel
- Unavailability of specialized components to achieve high-volume (> 0.5M)
  data lookups/dataset joins.
- Need to leverage the DB feature of joins by persisting data.
- Sorting and aggregation on the huge intermittent ...
- Removal of duplicates to retain the last record on a huge dataset.
- State maintenance is required to have restartability / job recoverability.
- Unavailability of CDC (Change Data Capture).
- Any batch use case (though it's a flow) dealing with large volumes of data
  on which mathematical calculations are performed collectively should be
  avoided.
- Pivoting is yet another scenario where stream and split processing is not
  possible, which is performed on entire ...

Talend Open Studio
- Talend does not follow Target's DevOps approach, such as building in
  reusable test cases within applications.
- Talend's Open Studio toolset doesn't incorporate code within a software
  repository (e.g. Git).
- Talend doesn't follow automated CI/CD pipeline best practice.
- Unavailability of component-level parallelism (sequential execution). This
  requires vertical scaling of the environment as the data grows.
- No component for parsing binary mainframe ...

Hadoop
- Not fit for small data.
- Compliance data needs to be encrypted before processing.
- Hadoop File System is immutable. Files cannot be modified.
- Multiple tools are involved in the ecosystem, each of which requires
  expertise.
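
The Camel item about removing duplicates while retaining the last record can be made concrete with a minimal sketch. It shows why this operation needs state over the whole dataset: a record can only be emitted as "last" once every record with the same key has been seen. The field names (`id`, `updated_at`, `hours`) and the function `keep_last` are hypothetical, and this is plain Python rather than a Camel route.

```python
def keep_last(records, key_field, order_field):
    """Return one record per key: the one with the highest order_field.

    The `latest` dict holds one entry per distinct key, so memory grows with
    the number of keys. That is the state a record-at-a-time flow would have
    to maintain somewhere.
    """
    latest = {}
    for record in records:
        key = record[key_field]
        current = latest.get(key)
        # On ties, the record seen later wins.
        if current is None or record[order_field] >= current[order_field]:
            latest[key] = record
    return list(latest.values())


# Hypothetical input, including a late-arriving older record for id 2.
rows = [
    {"id": 1, "updated_at": 1, "hours": 8},
    {"id": 2, "updated_at": 1, "hours": 6},
    {"id": 1, "updated_at": 3, "hours": 7},
    {"id": 2, "updated_at": 0, "hours": 4},
]

deduped = keep_last(rows, "id", "updated_at")
print(deduped)
```

For datasets too large for memory, the same logic is usually pushed down to a database or a distributed sort, which matches the list's note about leveraging DB features by persisting data.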
The FFA project involves a medium data volume (< 10 million) and heavy
transformations such as mathematical calculations, series of joins/lookups to
integrate data that is scattered across multiple systems, and data roll-ups.
Talend could be used for its conversion.

The WFM project involves a huge data volume (50-600 million) and transformations
such as mathematical calculations, series of joins/lookups to integrate data that
is scattered across multiple systems, and data roll-ups. Multiple historical loads
for drivers (1,800+ stores, 2 years of data) need to be migrated to Kronos.