System routine: Read-only code that can be called directly from inside any Job.
User routine: A custom routine created by the user, either by writing a new one or by adapting an existing one.
Reject data
Q24] Which components are used to close a Hive connection automatically?
Ans: To close a Hive connection automatically, we can use the tPostJob and tHiveClose components.
Q26] What are the various features that are available in the main window of Talend Open Studio?
Ans: The features that are available in the main window of Talend Open Studio are as follows:
Menubar
Toolbar
Workspace
Palette
Tab panel
Configure tabs
Repository
Ans: In the Palette settings view, we can set the preferences for searching components, both from the
palette and from the component list that appears on the design workspace when a component is added
without using the palette.
Q30] What are the ways to improve the performance of a Job in Talend?
Ans: The following are the ways to improve the performance of a job in Talend:
NameNode URI
User name
Distribution
Q33] Which service is required for coordinating the transactions between HBase and Talend
Studio?
Ans: The ZooKeeper client port service is required for coordinating the transactions between HBase and
Talend Open Studio.
Q35] Can you edit generated code directly?
Ans: No, this is not possible; you cannot directly edit the code generated for a Talend Job.
Q36] Which methods can you use to include your own Java code in a Job?
Ans:
OnSubjobOk: It belongs to Subjob Triggers. It is used to trigger the next subjob on the condition that
the subjob is completed without any errors. This link can be used only with the first component of the
subjob.
OnComponentOk: It belongs to Component Triggers. It is used to trigger the target component after the
execution of the source component completes without any errors. This link can be used with any
component in a job.
Q42] How can you normalize delimited data in Talend Open Studio?
Ans: By using the tNormalize component
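As a rough illustration of what normalizing delimited data means, the Java sketch below turns one row whose field holds several delimited values into one row per value. It is not the code Talend generates for tNormalize; the class name NormalizeSketch, the method normalize, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of normalization; not code generated by Talend.
final class NormalizeSketch {

    // Splits one delimited field into several rows that each keep the
    // original id and carry a single value.
    static List<String[]> normalize(String id, String delimitedField, String separator) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote treats the separator literally, even if it is "|" or ".".
        for (String value : delimitedField.split(Pattern.quote(separator))) {
            rows.add(new String[] { id, value.trim() });
        }
        return rows;
    }

    public static void main(String[] args) {
        // One input row "1;red,green,blue" becomes three output rows.
        for (String[] row : normalize("1", "red,green,blue", ",")) {
            System.out.println(row[0] + ";" + row[1]);
        }
    }
}
```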
Q46] Which Talend component is used for data transformation using built-in .NET classes?
Ans: tDotNETRow helps you transform data by utilizing custom or built-in .NET classes.
Extends its big data leadership position enabling firms to move beyond batch processing and into real-
time big data by providing technical previews for the Apache Spark, Apache Spark Streaming and
Apache Storm frameworks.
Enhances its support for the Internet of Things (IoT) by introducing support for key IoT protocols
(MQTT, AMQP) to gather and collect information from machines, sensors, or other devices.
Improves Big Data performance: MapReduce executes on average 24% faster in v5.6 than in v5.5, and
53% faster than in v5.4.2, while Big Data profiling performance is typically 20 times faster in v5.6
compared to v5.5.
Enables faster updates to MDM data models and provides deeper control of data lineage, more
visibility, and control.
Offers further enterprise application connectivity and support by continuing to add to its extensive list
of over 800 connectors and components with enhanced support for enterprise applications such as SAP
BAPI and Tables, Oracle 12 GoldenGate CDC, Microsoft HDInsight, Marketo and Salesforce.com.
Repository schema: The repository schema can be reused across multiple jobs. If any changes are
made in the schema, they are reflected automatically in every job that uses it.
Fixed schema: The fixed schema is a read-only schema. For some components, the fixed schema comes
built into Talend.
Generic schema: The Generic schema is used as a sharable resource for multiple data sources. It is used
where you want to limit the use of schema tied to a specific file type or database type.
1. Row: The row connection handles the actual data. According to the nature of the flow process, it can
be in the following types of row connections:
Main
Multiple Input/Output
Filter
Lookup
ErrorRejects
Rejects
Output
Uniques/Duplicates
2. Iterate: The iterate connection is used to loop over the files contained in a directory, the
database entries, or the rows contained in a file.
3. Trigger: Trigger connections are used to define the processing sequence, so no data is handled
through these connections. The trigger connections are of two types:
Q57] Is it possible to exclude headers and footers from the input files before loading the data?
Ans: Yes, it is possible to exclude headers and footers easily before loading the data from the input
files.
Q58] Explain the use of Expression editor in Talend?
Ans: In Talend Open Studio, all expressions such as Input, Var, Output, and constraint statements can
be viewed and edited with the use of Expression editor. It provides visual comfort to write any function
in a dedicated view.
Built-in: All the information is stored locally in the Job. It allows the user to enter and edit all the information.
Repository: It enables the user to import read-only information into the Job from the repository.
Multithreading
tParallelize component
Automatic parallelization
Q62] What is the difference between “Insert or Update” and “Update or Insert”?
Ans:
Insert or Update: First tries to insert a record, but if a record with a matching primary key already
exists, updates that record instead.
Update or Insert: First tries to update a record with a matching primary key, but if none already exists,
inserts the record instead.
From a results point of view, there are no differences between the two, nor are there significant
performance differences. In general, choose the action that matches what you expect to be more
common: Insert or Update if you think there are more inserts than updates, Update or Insert if you think
there are more updates than inserts.
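The two actions can be pictured with a small Java sketch in which an in-memory map stands in for a table keyed by primary key. This is only an illustration under that assumption, not how Talend talks to a database; the class UpsertSketch and its attempts counter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table keyed by primary key.
final class UpsertSketch {
    final Map<Integer, String> table = new HashMap<>();
    int attempts = 0; // how many insert/update attempts were made in total

    // "Insert or Update": try the insert first, fall back to an update.
    void insertOrUpdate(int key, String value) {
        attempts++;
        if (table.putIfAbsent(key, value) != null) { // key already exists
            attempts++;
            table.put(key, value);                   // update instead
        }
    }

    // "Update or Insert": try the update first, fall back to an insert.
    void updateOrInsert(int key, String value) {
        attempts++;
        if (table.replace(key, value) == null) {     // no row matched the key
            attempts++;
            table.put(key, value);                   // insert instead
        }
    }

    public static void main(String[] args) {
        UpsertSketch a = new UpsertSketch();
        a.insertOrUpdate(1, "x");
        a.insertOrUpdate(1, "y");
        UpsertSketch b = new UpsertSketch();
        b.updateOrInsert(1, "x");
        b.updateOrInsert(1, "y");
        // Both orders end with the same table content.
        System.out.println(a.table.equals(b.table));
    }
}
```

In this sketch the fallback costs a second attempt, which is why picking the action that matches the more common case keeps the number of attempts down.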
Q63] Is it possible to define a variable that can be accessed from multiple jobs?
Ans: Yes, you can declare a static variable in a routine, and add the setter/getter methods for this
variable in the routine. The variable is then accessible from different Jobs.
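A minimal sketch of such a routine is shown below. The class name SharedState and the variable batchId are hypothetical, and the sketch assumes that the Jobs sharing the value run inside the same JVM, since a static field exists once per JVM rather than once per machine.

```java
// Hypothetical user routine. In Talend Studio, routine classes normally live
// in the "routines" package; the package line is omitted here so that the
// sketch compiles on its own.
final class SharedState {

    // Static field: a single copy shared by all code running in this JVM.
    private static String batchId;

    // Setter, synchronized so that parallel subjobs see a consistent value.
    static synchronized void setBatchId(String value) {
        batchId = value;
    }

    // Getter used by any Job that needs the shared value.
    static synchronized String getBatchId() {
        return batchId;
    }

    public static void main(String[] args) {
        setBatchId("B-42");
        System.out.println(getBatchId());
    }
}
```

One Job could then call SharedState.setBatchId(...) from a custom Java component, and a later Job in the same process could read the value back with SharedState.getBatchId().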
tJoin: It can accept only two input links, main and lookup. It can accept only two output links, main
and reject.
tMap: It can allow multiple input links, in which one link is main and the other links are lookups. It
supports multiple types of join models such as first join, unique join, and all join.
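The two-input, two-output shape described for tJoin can be sketched in plain Java as a keyed lookup. This is an illustration only, assuming an inner join in which unmatched main rows go to the reject output; the class JoinSketch and the row layout are hypothetical and are not Talend's generated code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a main/lookup join with main and reject outputs.
final class JoinSketch {
    final List<String> main = new ArrayList<>();   // rows that found a match
    final List<String> reject = new ArrayList<>(); // rows without a match

    // mainRows: lines of the form "customerId;amount".
    // lookup:   customerId -> customer name.
    void join(List<String> mainRows, Map<String, String> lookup) {
        for (String row : mainRows) {
            String key = row.split(";")[0];
            String name = lookup.get(key);
            if (name != null) {
                main.add(row + ";" + name); // enrich the row with the lookup column
            } else {
                reject.add(row);            // no lookup match: send to reject
            }
        }
    }
}
```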
The ETL process extracts the data, transforms it, and then loads it into the database.
The ELT process extracts the data, loads it into the database, and then transforms it.
It is easy to implement.
As the size of the data increases, processing slows down, because loading has to wait until the
transformation completes.
It is used to transfer data from the source database to the destination data warehouse.
Row-based: Types such as Lookup, main, filter, ErrorRejects, Rejects, uniques/duplicates, Output, and
Multiple Input/Output.
Iterate: This is used to perform a recurring loop on files which are contained in a directory.
Trigger: This connection is used to create a dependency between subjobs or Jobs which are triggered in
a consecutive sequence. The two general categories are Subjob triggers and Component triggers.
Link: It is used to transfer the table schema into the ELT component.
Q2. How is Talend related to Code generator?
Answer:
Talend is called a code generator because it provides a user-friendly graphical user interface in which
the components simply need to be dragged and dropped to design a job. Once the job is submitted,
Talend Studio automatically compiles it into a Java class, in which the begin, main, and end parts of
the inner components drive the control flow.
Generic schema: It is not tied to any particular source and is used as a sharable resource across
different data sources.
Fixed schema: Read-only schemas which come predefined with some components.
Repository schema: The schema is reusable, and any changes made in it are reflected in all the
jobs that use it.
Q4. What are routines?
Answer:
They are reusable pieces of code which can be used to optimize data processing by making use of custom
code. They also help in extending the Talend Studio features and in improving job capacity. There are
basically two kinds of routines: User routine and System routine.
System routine: Read-only code which can be called directly inside any Job.
User routine: A custom routine created by the user, either by writing a new one or by adapting an
existing one.
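As a hedged illustration, a user routine is essentially a Java class that exposes static methods. The class StringHelper and the method normalizeCode below are hypothetical examples, not routines that ship with Talend.

```java
import java.util.Locale;

// Hypothetical user routine. In Talend Studio, routine classes normally live
// in the "routines" package; the package line is omitted here so that the
// sketch compiles on its own.
final class StringHelper {

    // Returns the trimmed, upper-cased value, or an empty string for null,
    // so that downstream components never receive a null code.
    static String normalizeCode(String raw) {
        return raw == null ? "" : raw.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalizeCode("  ab1 "));
    }
}
```

Inside a Job, such a method could then be called from an expression, for example StringHelper.normalizeCode(row1.code), where row1.code is an assumed input column.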
Q5. What is the difference between ETL and ELT?
Answer:
ETL or Extraction, Transformation, and Load is the age-old concept that involves the extraction of data
from external sources, transforming it to make it fit for use as per business and operational needs, then
loading it into the end target data warehouse or target database. This is a very valid approach as long as
there are multiple databases and source systems involved in the whole process. The data is transported
from one place to another, so it is often advisable to do all the transformation-related work in a separate
specialized engine.
ELT, on the other hand, is the process where the extracted data is first loaded into the end system.
Thereafter, transformations are done on top of it. It is a better approach when your target system is
efficient and robust enough to handle all the transformations. Analytical databases such as
Google BigQuery and Amazon Redshift often make use of ELT because they are powerful enough to run
the transformations themselves.
Q6. What is a subjob? How is data sent from the parent job to the child job?
Answer:
A subjob is defined as a single component or more than one component joined by a data flow. A job
has at least one subjob. Context variables should be used to pass a value from the parent job to
the child job.
Q7. Explain the tMap component and list the different functions which can be performed by
making use of it.
Answer:
tMap is one of the essential components and forms a core part of the “processing” family. Its main
use is to map the input data to the output data. The main functions which can be performed by tMap
include:
The exception throwing process can be relied upon; the exception is shown in the Run view as a red stack
trace.
Every component and subjob returns a code, which can lead to additional processing. The
OK/Error links can be used to redirect the error towards an error handling routine.
The best and the most trusted way to handle an error is to define an error handling subjob which gets
called in case of an error.