
Top Talend Interview Questions In 2020

Q1] What is Talend?


Ans: Talend is an open-source data integration platform. It offers software and services for data
integration, data management, data quality, data preparation, Big Data, cloud storage, and enterprise
application integration. It is designed to combine, convert, and update data across business
applications.

Q2] What is the full name of Talend?


Ans: Talend Open Studio

Q3] What is Talend Open Studio?


Ans: Talend Open Studio is an open-source ETL tool used for data integration and Big Data. It is built
on the Eclipse development environment. Talend Open Studio acts as a code generator: it produces the
data transformation scripts and the underlying programs in Java.

Q4] Define component in Talend Open Studio?


Ans: A component is a functional unit that is used to perform a single operation in Talend. With the
help of drag and drop functions, we can use the components to perform operations. The component can
be a snippet of Java code that is generated as a part of a job.

Q5] When was Talend Open Studio launched?


Ans: Launched in October 2006

Q6] Talend Open Studio is written in which computer language?


Ans: Java

Q7] What are the programming languages that Talend supports?


Ans: The programming languages that are supported by Talend are as follows:

Java SE

XQuery, SQL, XPath

Scripting languages: Javascript, Ruby, PHP, ECMAScript, and Groovy

Q8] What is the most current version of Talend Open Studio?


Ans: Talend Open Studio 5.6.0

Q9] Why is Talend called a code generator?


Ans: Talend is called a code generator because it offers a GUI in which the user drags and drops
components to create a job. Talend then translates these jobs into Java code.

Q10] Define routines?


Ans: Routines are reusable pieces of code that optimize data processing through custom code. They also
help to improve job capacity and extend the features of Talend Studio. Routines are of two types:

System routine: Read-only code that can be called directly inside any job.
User routine: A custom routine created by the user, either by writing a new one or by adapting an
existing one.
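
As an illustration, a user routine is an ordinary Java class with static methods. The sketch below is
hypothetical: the class name MyStringRoutines and the method capitalize are invented for this example
and are not part of Talend's system routines.

```java
// Hypothetical user routine. In Talend Studio a routine class is normally public
// and lives in the "routines" package; both are left out here to keep the sketch standalone.
final class MyStringRoutines {

    private MyStringRoutines() {
        // Utility class: no instances needed.
    }

    /** Returns the input with its first character upper-cased; null and empty pass through unchanged. */
    static String capitalize(String value) {
        if (value == null || value.isEmpty()) {
            return value;
        }
        return Character.toUpperCase(value.charAt(0)) + value.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitalize("talend"));
    }
}
```

Inside a job, such a method could then be called from an expression, for example
MyStringRoutines.capitalize(row1.name), where row1.name stands for a placeholder input column.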

Q11] Define a project in Talend?


Ans: In Talend Studio, the highest physical structure used for storing several kinds of data integration
jobs, routines, metadata, etc., is known as Project.

Q12] Define tMap?


Ans: tMap is an advanced component that integrates itself as a plugin to Talend Studio. It is used for
mapping the data and also transforms and routes data from single or multiple sources to single or
multiple destinations.

Q13] List the functions of tMap?


Ans: The functions of tMap are as follows:

Apply transformation rules on any type of field

Multiplex and demultiplex of data

Filter input and output data using constraints

Add or remove columns

Reject data

Concatenate and interchange the data

Q14] How to access global and context variables?


Ans: To access the global and context variables, use the shortcut key Ctrl+Space.

Q15] Define Context variable in Talend?


Ans: Context variables are user-defined parameters whose values are passed into a job at runtime.
Their values can change as a job moves from the development environment to the test and production
environments.

Q16] What are the ways to define Context variables?


Ans: There are three ways to define Context variables. They are as follows:

Embedded Context variables

External Context variables

Repository Context variables

Q17] Explain Subjob?


Ans: A subjob is a single component or a number of components joined by a data flow. A component that
is not connected to any other component counts as a subjob on its own. A job can have one or more
subjobs.

Q18] How to schedule a job in Talend?


Ans: To schedule a job, first export it as a standalone program. Then schedule it with the operating
system's scheduling tools, such as cron on Linux or Task Scheduler on Windows.

Q19] What is the difference between the XMX and XMS parameters?


Ans: The -Xmx parameter (XMX) specifies the maximum heap size of the Java Virtual Machine, whereas the
-Xms parameter (XMS) specifies the initial heap size.
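
To see the effect of these two parameters, the heap limits can be read back from inside a running JVM.
The following is a minimal sketch using the standard java.lang.Runtime API; the class name HeapSettings
is made up for this example.

```java
// Hypothetical helper class; only java.lang.Runtime calls are used.
final class HeapSettings {

    /** Upper bound the JVM will try to use for the heap (controlled by -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently reserved by the JVM (it starts out near the -Xms value). */
    static long currentHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long megabyte = 1024L * 1024L;
        System.out.println("Max heap (MB): " + maxHeapBytes() / megabyte);
        System.out.println("Current heap (MB): " + currentHeapBytes() / megabyte);
    }
}
```

After compiling, running it as java -Xms256m -Xmx1024m HeapSettings (the sizes are only illustrative)
should report a maximum heap close to 1024 MB.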

Q20] What is the function of the tXMLMap component?


Ans: tXMLMap is a component used for routing and transforming XML data flows, mainly when numerous
XML data sources have to be joined, with or without flat data.

Q21] What is the function of tJavaFlex?


Ans: tJavaFlex allows the user to add personalized code to the Talend program. A tJavaFlex component
takes three parts of Java code (start, main, and end), which together make up a component that performs
the desired operation.
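
The role of the three parts can be sketched in plain Java. This is not the code Talend generates; it is
a hand-written imitation, and the class name JavaFlexSketch and the method summarize are assumptions
made for the example. The start part runs once before the rows, the main part once per row, and the end
part once after the last row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical imitation of the start / main / end structure of a tJavaFlex component.
final class JavaFlexSketch {

    static String summarize(List<String> rows) {
        // Start part: runs once, before the first row.
        int rowCount = 0;
        int totalLength = 0;

        for (String row : rows) {
            // Main part: runs once for every incoming row.
            rowCount++;
            totalLength += row.length();
        }

        // End part: runs once, after the last row.
        return "rows=" + rowCount + ", characters=" + totalLength;
    }

    public static void main(String[] args) {
        System.out.println(summarize(Arrays.asList("alpha", "beta", "gamma")));
    }
}
```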

Q22] What is the function of tJava?


Ans: tJava allows the user to enter personalized code to integrate into the Talend program. This code
is executed only once. It makes it possible to extend the functionality of a Talend job using custom
Java commands.

Q23] What is the use of tContextLoad?


Ans: tContextLoad is used to load a context from a flow. This component performs two checks: it warns
when a parameter defined in the incoming flow is not defined in the context, and it warns when a context
variable is not initialized in the incoming flow.
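
The two checks described above can be sketched as follows. This is only an illustration of the logic,
not Talend's implementation: the class ContextLoadSketch, the method load, and the warning texts are
all invented, and the context is modelled as a simple map of variable names to values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two checks; the context is a plain map here.
final class ContextLoadSketch {

    /** Applies incoming key/value pairs to the context and returns the warnings raised by both checks. */
    static List<String> load(Map<String, String> context, Map<String, String> incoming) {
        List<String> warnings = new ArrayList<>();

        // Check 1: a parameter arrives in the flow but is not defined in the context.
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            if (context.containsKey(entry.getKey())) {
                context.put(entry.getKey(), entry.getValue());
            } else {
                warnings.add("Parameter not defined in context: " + entry.getKey());
            }
        }

        // Check 2: a context variable exists but the flow did not give it a value.
        for (String key : context.keySet()) {
            if (!incoming.containsKey(key)) {
                warnings.add("Context variable not set by flow: " + key);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> context = new LinkedHashMap<>();
        context.put("host", "");
        context.put("port", "");

        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("host", "localhost");
        incoming.put("user", "etl");

        for (String warning : load(context, incoming)) {
            System.out.println(warning);
        }
    }
}
```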

Q24] What are the components used to close a hive connection automatically?
Ans: To close a Hive connection automatically, we can use tPostJob and tHiveClose components.

Q25] What is the language used for Pig scripting?


Ans: Pig is a platform using a scripting language to express data flows. It programs the operation to
transform data using Pig Latin, which is the language used for pig scripting.

Q26] What are the various features that are available in the main window of Talend Open Studio?
Ans: The features that are available in the main window of Talend Open Studio are as follows:

Menubar

Toolbar

Workspace

Palette

Tab panel

Configure tabs
Repository

Outline view and code view

Q27] Explain Palette panel in Talend Studio?


Ans: In Talend Studio, Palette is used to find the components required to create or design a job.

Q28] What is the use of Palette settings in Talend?



Ans: In the Palette settings view, we can set the preferences for the component searching from the
palette and from the component list that appears on the design workspace when adding a component
without using the palette.

Q29] What is the use of String handling routines in Talend?


Ans: The string handling routines allow the user to carry out different kinds of operations and tests
on alphanumeric expressions, based on Java methods.

Q30] What are the ways to improve the performance of a Job in Talend?
Ans: The following are the ways to improve the performance of a job in Talend:

Use of Talend ELT components when it is required

Remove unnecessary records using the tFilterRow component

Use of Select Query to retrieve data from the DB

Split Talend Job into smaller SubJobs

Remove unnecessary fields or columns using tFilterColumns component

Use of Database bulk components

Q31] What is the use of Outline view in Talend Open Studio?


Ans: The Outline view provides an easy way to check out where the design workspace is located. It
allows the user to check the return values available in a component.

Q32] Mention the configurations that are required to connect HDFS?


Ans: The following are the configurations that are required to connect HDFS:

NameNode URI

User name

Distribution

Q33] Mention the service that is required for coordinating the transactions between HBase and Talend
studio?
Ans: Zookeeper client port service is required for coordinating the transactions between HBase and
Talend Open Studio.

Q34] What is the use of the tLoqateAddressRow component in Talend?


Ans: This component is used to correct the mailing addresses associated with customer data, to ensure
a single customer view and better delivery of customer mailings.

Q35] Can you edit generated code directly?
Ans: No, this is not possible; you cannot directly edit the code generated for a Talend Job.

Q36] How can you include your own Java code in a Job?
Ans: Use one of these methods:

1. Use a tJava, tJavaRow, or tJavaFlex component.


2. Create a routine by right-clicking Routines under Code in the Repository and then clicking Create
routine.

Q37] Is it possible to use Binary or ASCII mode transfer in SFTP connection?


Ans: No, it is not possible to use Binary or ASCII transfer mode in an SFTP connection, because SFTP
is an extension to SSH and does not support any kind of transfer mode.

Q38] Which component is used to sort data?


Ans: tSortRow, tExternalSortRow

Q39]What is the default pattern of a Date column in Talend?


Ans: By default, the date pattern for a column of type Date in a schema is “dd-MM-yyyy”.
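
The pattern uses the same letters as Java's SimpleDateFormat, so its meaning can be checked with a few
lines of Java. The sketch below is a standalone illustration; the class DatePatternSketch and the method
reformat are hypothetical names, and UTC is chosen only to keep the example deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper showing how the "dd-MM-yyyy" pattern reads and writes dates.
final class DatePatternSketch {

    static final String DEFAULT_PATTERN = "dd-MM-yyyy";

    /** Parses a dd-MM-yyyy string and re-renders it with another pattern. */
    static String reformat(String value, String targetPattern) {
        SimpleDateFormat source = new SimpleDateFormat(DEFAULT_PATTERN);
        source.setTimeZone(TimeZone.getTimeZone("UTC"));
        source.setLenient(false); // reject impossible dates such as 31-02-2020

        SimpleDateFormat target = new SimpleDateFormat(targetPattern);
        target.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date parsed = source.parse(value);
            return target.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + DEFAULT_PATTERN + " date: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reformat("25-12-2020", "yyyy/MM/dd"));
    }
}
```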


Q40] Built-In vs. Repository: which is better?


Ans: It depends on how the information is used. Use Built-In for information that you use only once
or very rarely. Use the Repository for information that you want to use repeatedly in multiple
components or Jobs, such as a database connection.

Q41] What is the difference between OnSubjobOK and OnComponentOK?


Ans:

OnSubjobOk:
It belongs to Subjob Triggers.
It is used to trigger the next subjob on the condition that the subjob completes without any errors.
This link can be used only with the first component of the subjob.

OnComponentOk:
It belongs to Component Triggers.
It is used to trigger the target component after the execution of the source component completes
without any errors.
This link can be used with any component in a job.

Q42] How can you normalize delimited data in Talend Open Studio?
Ans: By using the tNormalize component
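
What normalizing a delimited column means can be shown in a few lines of Java. This is a hand-written
sketch of the idea, not the component's code; the class NormalizeSketch, the method normalize, and the
sample data are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: one row holding a delimited list becomes one row per item.
final class NormalizeSketch {

    /** Splits the delimited value and repeats the key on every resulting row. */
    static List<String[]> normalize(String key, String delimitedValue, String separator) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote makes the separator literal, so "|" or "." are safe to use.
        for (String item : delimitedValue.split(Pattern.quote(separator))) {
            rows.add(new String[] {key, item.trim()});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : normalize("order-1", "pen, ink, paper", ",")) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```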

Q43] What is tMap?


Ans: tMap is an advanced component that integrates itself as a plugin to Talend Studio. tMap
transforms and routes data from single or multiple sources to single or multiple destinations. It allows
you to define the tMap routing and transformation properties.

Q44] What types of joins are supported by the tMap component?


Ans: Inner, outer, unique, first, and all joins

Q45] What is the function of tDenormalizeSortedRow?


Ans: tDenormalizeSortedRow combines all sorted input rows into groups. Within each group, the distinct
values are joined with item separators. Because it works on an already sorted input flow, it saves
memory.
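
The memory saving comes from the input being sorted: only the current group has to be held at any time.
The sketch below illustrates that idea under simplifying assumptions (two-column rows of key and value,
output rendered as key=values); the class DenormalizeSketch and its method are hypothetical and do not
reproduce the component's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: rows are {key, value} pairs already sorted by key.
final class DenormalizeSketch {

    static List<String> denormalize(List<String[]> sortedRows, String separator) {
        List<String> output = new ArrayList<>();
        String currentKey = null;
        Set<String> values = new LinkedHashSet<>(); // keeps distinct values in arrival order

        for (String[] row : sortedRows) {
            // A new key means the previous group is complete and can be emitted.
            if (currentKey != null && !currentKey.equals(row[0])) {
                output.add(currentKey + "=" + String.join(separator, values));
                values.clear();
            }
            currentKey = row[0];
            values.add(row[1]);
        }
        if (currentKey != null) {
            output.add(currentKey + "=" + String.join(separator, values));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "x"},
                new String[] {"a", "y"},
                new String[] {"a", "x"},
                new String[] {"b", "z"});
        System.out.println(denormalize(rows, ";"));
    }
}
```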

Q46] Which Talend component is used for data transformation using built-in .NET classes?
Ans: tDotNETRow helps you facilitate data transform by utilizing custom or built-in .NET classes.


Q47] What is tJoin?


Ans: tJoin joins two tables by doing an exact match on several columns. It compares columns from the
main flow with reference columns from the lookup flow and outputs the main flow data and/or the
rejected data.
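
The matching logic can be sketched in Java. To keep it short, the sketch matches on a single key column
rather than several, and models the lookup flow as a map; the class JoinSketch and its fields are
invented for the example and are not Talend code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an exact-match join with a main output and a reject output.
final class JoinSketch {

    final List<String> matched = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    /** Main rows are {key, value}; the lookup maps a key to its reference value. */
    static JoinSketch join(List<String[]> mainRows, Map<String, String> lookup) {
        JoinSketch result = new JoinSketch();
        for (String[] row : mainRows) {
            String reference = lookup.get(row[0]); // exact match on the key column
            if (reference != null) {
                result.matched.add(row[0] + "," + row[1] + "," + reference);
            } else {
                result.rejected.add(row[0] + "," + row[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> countries = new HashMap<>();
        countries.put("FR", "France");

        List<String[]> customers = Arrays.asList(
                new String[] {"FR", "Alice"},
                new String[] {"XX", "Bob"});

        JoinSketch result = join(customers, countries);
        System.out.println("Matched: " + result.matched);
        System.out.println("Rejected: " + result.rejected);
    }
}
```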

Q48] What do you understand by MDM in Talend?


Ans: Master data management, through which an organization builds and manages a single, consistent,
accurate view of key enterprise data, has demonstrated substantial business value including
improvements to operational efficiency, marketing effectiveness, strategic planning, and regulatory
compliance. To date, however, MDM has been the privilege of a relatively small number of large,
resource-rich organizations. Thwarted by the prohibitive costs of proprietary MDM software and the
great difficulty of building and maintaining an in-house MDM solution, most organizations have had to
forego MDM despite its clear value.

Q49] What’s new in v5.6?


Ans: This technical note highlights the important new features and capabilities of version 5.6 of
Talend’s comprehensive suite of Platform, Enterprise, and Open Studio solutions.

Q50] What does Talend offer with version 5.6?


Ans:

Extends its big data leadership position enabling firms to move beyond batch processing and into real-
time big data by providing technical previews for the Apache Spark, Apache Spark Streaming and
Apache Storm frameworks.

Enhances its support for the Internet of Things (IoT) by introducing support for key IoT protocols
(MQTT, AMQP) to gather and collect information from machines, sensors, or other devices.

Improves Big Data performance: MapReduce executes on average 24% faster in v5.6 than in v5.5, and
53% faster than in v5.4.2, while Big Data profiling performance is typically 20 times faster in v5.6
compared to v5.5.

Enables faster updates to MDM data models and provides deeper control of data lineage, more
visibility, and control.

Offers further enterprise application connectivity and support by continuing to add to its extensive list
of over 800 connectors and components with enhanced support for enterprise applications such as SAP
BAPI and Tables, Oracle 12 GoldenGate CDC, Microsoft HDInsight, Marketo and Salesforce.com.

Advanced Interview Questions

Q51] Talend Vs Pentaho


Ans:

Comparison of Talend and Pentaho Kettle

Approach
Talend: Code generation
Pentaho Kettle: Metadata driven

Monitoring
Talend: Enough tools to monitor logs
Pentaho Kettle: Enough tools to monitor logs

Risk
Talend: As complex as Pentaho
Pentaho Kettle: As complex as Talend

Data Quality (DQ)
Talend: Graphical User Interface (GUI) with Data Quality features
Pentaho Kettle: GUI with DQ features along with additional options

Interface
Talend: Moderate interface
Pentaho Kettle: Moderate interface

Speed
Talend: Moderate performance
Pentaho Kettle: Better when compared to Talend

Community Support
Talend: Strong community support
Pentaho Kettle: Strong community support

Deployment
Talend: Works on any Java- or Perl-compatible machine
Pentaho Kettle: Runs on Java-compatible machines

Q52] What are the schemas that are supported by Talend?
Ans: The schemas that are supported by Talend are as follows:

Repository schema: The repository schema can be reused across multiple jobs. Any change made to the
schema is automatically reflected in all the jobs that use it.

Fixed schema: The fixed schema is a read-only schema that comes predefined with some components.

Generic schema: The generic schema is used as a sharable resource for multiple data sources. Use it
when you do not want the schema to be tied to a specific file type or database type.

Q53] Explain various connections that are available in Talend?


Ans: A connection defines whether data is processed, data is output, or the logical sequence of a job
is set. The kinds of connections available in Talend are as follows:

1. Row: The row connection handles the actual data. Depending on the nature of the flow, it can be one
of the following types:

Main
Multiple Input/Output
Filter
Lookup
ErrorRejects
Rejects
Output
Uniques/Duplicates

2. Iterate: The iterate connection is used to loop over the files contained in a directory, the rows
contained in a file, or the entries of a database.

3. Trigger: Trigger connections define the processing sequence, so no data is handled through these
connections. Trigger connections are of two types:

Subjob Trigger: On Subjob Ok, On Subjob Error, Run if
Component Trigger: On Component Ok, On Component Error, Run if

4. Link: The Link connection can be used with ELT components only. These links are used to transfer
the table schema information to the ELT component to be used in specific database query statements.

Q54] Is it possible to perform a Talend job partly?


Ans: Yes, it is possible to perform a Talend job partly using the command line. It is required to export
the job along with its dependencies. After that, you can access its instruction files from the terminal.

Q55] Explain Job design in Talend?


Ans: A Job design is a graphical design of one or more connected components that allows the user to
develop and run dataflow management processes. It translates business requirements into routines,
programs, and code that implement the data flow.

Q56] How to pass data from a parent job to a child job?


Ans: Create a Standard Job, for example one called Childjob, and define context variables in it, such
as name and scope. The parent Job then passes values to the child Job through these context variables.

Q57] Is it possible to exclude headers and footers from the input files before loading the data?
Ans: Yes, it is possible to exclude headers and footers easily before loading the data from the input
files.
Q58] Explain the use of Expression editor in Talend?
Ans: In Talend Open Studio, all expressions such as Input, Var, Output, and constraint statements can
be viewed and edited with the use of Expression editor. It provides visual comfort to write any function
in a dedicated view.

Q59] What is the difference between Built -In and Repository?


Ans:

Repository:
All the information is stored centrally in the repository.
It enables the user to import read-only information into the Job from the repository.
It can be used by any Job in the project.

Built-In:
All the information is stored locally in the Job. It allows the user to enter and edit all the
information.
It allows the user to enter all the information manually.
It can be used by the Job only.

Q60] Explain the error handling in Talend?


Ans: There are a few ways to handle errors in Talend:

Use of dedicated components provided by Talend

Use of links between two components in a job

Use of custom to design an appropriate job

Q61] How can we run multiple jobs in parallel within Talend?


Ans: In Talend, jobs and subjobs can be executed in multiple threads to reduce the runtime of a job.
There are three ways to achieve parallel execution in Talend:

Multithreading

tParallelize component

Automatic parallelization

Q62] What is the difference between “Insert or Update” and “Update or Insert”?
Ans:

Insert or Update: First tries to insert a record, but if a record with a matching primary key already
exists, updates that record instead.
Update or Insert: First tries to update a record with a matching primary key, but if none exists,
inserts the record instead.

From a results point of view, there are no differences between the two, nor are there significant
performance differences. In general, choose the action that matches what you expect to be more
common: Insert or Update if you think there are more inserts than updates, Update or Insert if you think
there are more updates than inserts.
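
The order of the two attempts can be illustrated with an in-memory map standing in for a table keyed by
primary key. This is only a sketch of the logic; the class UpsertSketch and its methods are invented
for the example, non-null values are assumed, and no database is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map plays the role of a table indexed by primary key.
final class UpsertSketch {

    /** "Insert or Update": try the insert first, fall back to an update on a key clash. */
    static void insertOrUpdate(Map<Integer, String> table, int key, String value) {
        if (table.putIfAbsent(key, value) != null) { // insert failed: key already present
            table.put(key, value);                   // so update the existing record
        }
    }

    /** "Update or Insert": try the update first, fall back to an insert when no row matched. */
    static void updateOrInsert(Map<Integer, String> table, int key, String value) {
        if (table.replace(key, value) == null) { // update matched no record
            table.put(key, value);               // so insert a new one
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> first = new HashMap<>();
        Map<Integer, String> second = new HashMap<>();

        insertOrUpdate(first, 1, "draft");
        insertOrUpdate(first, 1, "final");
        updateOrInsert(second, 1, "draft");
        updateOrInsert(second, 1, "final");

        System.out.println("Same result: " + first.equals(second));
    }
}
```

Both methods leave the table in the same state; they differ only in which attempt is made first, which
is why the expected mix of inserts and updates guides the choice.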

Q63] Is it possible to define a variable that can be accessed from multiple jobs?
Ans: Yes, you can declare a static variable in a routine, and add the setter/getter methods for this
variable in the routine. The variable is then accessible from different Jobs.
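
A minimal sketch of such a routine is shown below. The class name SharedVariables and the variable
batchId are hypothetical. Note that a static field lives in one JVM, so this assumes the Jobs that share
the value run in the same Java process.

```java
// Hypothetical routine holding a value that several Jobs can read and write.
// In Talend Studio the class would normally be public and sit in the "routines" package.
final class SharedVariables {

    private static String batchId;

    private SharedVariables() {
        // Static access only.
    }

    /** Getter: returns the shared value, or null if no Job has set it yet. */
    static synchronized String getBatchId() {
        return batchId;
    }

    /** Setter: stores the value for any Job that calls the getter later. */
    static synchronized void setBatchId(String value) {
        batchId = value;
    }

    public static void main(String[] args) {
        setBatchId("B-42");               // for example, called from one Job
        System.out.println(getBatchId()); // and read from another
    }
}
```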

Q64] How to change the background colour of job designer in Talend?


Ans: By selecting the preferences of the window menu, then by clicking on the colour menu, we can
change the background colour of the job designer.

Q65] Mention the configurations that are required to connect HDFS?


Ans: The following are the configurations that are required to connect HDFS:

NameNode URI

User name

Distribution

Q66] Explain the function of tPigLoad component?


Ans: tPigLoad sets up a connection to the data source for a current transaction. It helps to load original
input data to an output stream in a single transaction once the data has been validated.

Q67] Is it possible to define schema at runtime in Talend?


Ans: No, it is not possible to define a schema at runtime. The schema defines the movement of data, so
it should be defined at design time, while configuring the components.

Q68] Differentiate between tMap and tJoin.


Ans:

tJoin:
It can handle basic join cases only.
It supports only unique join.
It can accept only two input links: main and lookup.
It can accept only two output links: main and reject.
It cannot filter the data using filter expressions.
It supports only inner join.

tMap:
It is a powerful component that can handle complicated cases.
It supports multiple join models, such as first join, unique join, and all join.
It allows multiple input links, of which one is the main link and the others are lookups.
It allows more than one output link.
It can filter the data with the help of filter expressions.
It supports inner join and left outer join.

Q69] What is the difference between the ETL and ELT?


Ans:

Extract, Transform, and Load (ETL):
It extracts the data, transforms the data, and then loads the data into the database.
It is easy to implement.
It supports relational data.
It does not provide data lake support.
As the size of the data increases, processing slows down, because loading has to wait until the
transformation completes.
It is used to transfer data from the source database to the destination data warehouse.

Extract, Load, and Transform (ELT):
It extracts the data, loads it into the database, and then transforms the data.
It requires good knowledge of tools to implement.
It supports unstructured data.
It allows the use of a data lake with unstructured data.
The processing does not depend on the size of the data.
It is a data manipulation process that is used in data warehousing.


===================================================================
Part 1 – Talend Interview Questions (Basic)
This first part covers basic Interview Questions and Answers

Q1. Explain various connections available in Talend?


Answer:
A connection defines whether data is output, data is processed, or a logical sequence is set. The
various connections are:

Row-based: Types such as Lookup, Main, Filter, ErrorRejects, Rejects, Uniques/Duplicates, Output, and
Multiple Input/Output.
Iterate: This is used to perform a recurring loop on the files contained in a directory.
Trigger: This connection creates a dependency between subjobs or Jobs so that they are triggered in a
consecutive sequence. The two general categories are subjob triggers and component triggers.
Link: It is used to transfer the table schema into the ELT component.

Q2. How is Talend related to Code generator?
Answer:
This is a basic Talend interview question. Talend is called a code generator because it provides a
user-friendly graphical user interface in which components are simply dragged and dropped to design a
job. Once the job is submitted, Talend Studio automatically compiles it into a Java class, in which the
begin, main, and end parts of the components drive the control flow.

Q3. What schemas are supported by Talend?


Answer:
The following schemas are supported:

Generic schema: It is not tied to any particular source and also used as a sharable resource across
different data sources.
Fixed schema: Read-only schemas which come predefined with some components.
Repository Schema: Schema is reusable and any changes made in the schema will be reflected in all the
jobs.
Q4. What are routines?
Answer:
They are reusable pieces of code that can be used to optimize data processing through custom code.
They also help to enhance Talend Studio features and improve job capacity. There are two kinds of
routines: user routines and system routines.

System routine: Read-only code that can be called directly inside any Job.
User routine: A custom routine created by the user, either by writing a new one or by adapting an
existing one.

Q5. What is the difference between ETL and ELT?
Answer:
ETL or Extraction, Transformation, and Load is the age-old concept that involves the extraction of data
from external sources, transforming it to make it fit for use as per business and operational needs, then
loading it into the end target data warehouse or target database. This is a very valid approach as long as
there are multiple databases and source systems involved in the whole process. The data is transported
from one place to another, so it is often advisable to do all the transformation-related work in a separate
specialized engine.
ELT, on the other hand, is the process where the extracted data is primarily loaded into the end systems.
Thereafter, transformations are done on top of it. It is a better approach when your target system is
efficient and robust enough to handle all the transformations. Most of the analytical databases today
like Google BigQuery and Amazon Redshift often make use of ELT technology because their end
systems are efficient enough to process, tackle and handle all the transformed data.

Part 2 – Talend Interview Questions (Advanced)
Let us now have a look at the advanced Interview Questions.

Q6. What is a subjob? How is data sent from the parent job to the child job?
Answer:
A subjob is a single component or several components joined by a data flow. A job has at least one
subjob. To pass a value from the parent job to the child job, use context variables.

Q7. Explain tMap component and also list down the different functions which can be performed by
making its use?
Answer:
This is one of the most frequently asked Talend interview questions. tMap is one of the essential
components and forms a core part of the “processing” family. Its main use is to map the input data to
the output data. The main functions that tMap can perform include:

Applying transformation rules on any kind of field
Adding or removing columns
Rejecting data
Filtering input and output data using constraints
Concatenating and interchanging data
Multiplexing and demultiplexing data

Q8. Explain tDenormalizeSortedRow. Also, can we use Binary Transfer mode or an ASCII code in
creating an SFTP connection?
Answer:
tDenormalizeSortedRow is an integral component of the processing family. It synthesizes a sorted input
flow in order to save memory: all sorted input rows are combined into groups, and the distinct values
in each group are joined with item separators. No, the transfer modes cannot be used while creating an
SFTP connection. SFTP is just an extension to SSH and therefore does not support any kind of transfer
mode.
Q9. Explain error handling in Talend?
Answer:
The following are ways to handle errors:

Rely on exception throwing; the exception is shown as a red stack trace in the Run view.
Every component and subjob returns a code that can lead to additional processing. The OK/Error links
can be used to redirect an error to an error handling routine.
The best and most trusted way to handle an error is to define an error handling subjob that is called
whenever an error occurs.

Q10. What is the difference between the XMS and XMX parameters?


Answer:
The -Xms parameter (XMS) defines the initial heap size of the Java Virtual Machine, whereas the -Xmx
parameter (XMX) defines the maximum heap size.
