
Experiment no. 4
(Part A)
Aim:- To perform a sorting operation on a CSV file using the Talend Open Source tool for
Data Integration
Student Name:- Khushi Jain
Roll No.:- A218
Performance Date:- 08/08/2020
Submission Date:- 8/8/2020
PART B
Objective:

• To perform a sorting operation on a CSV file using the Talend Open Source tool for
  Data Integration.
• To understand the use of the components involved in sorting.
• To implement the sorting operation on an Excel file.

Experiment Outcome: Successfully implemented the Sorting Operation on CSV file using
Talend tool.

Input:

Output: (Copy and paste output here)

Conclusion:-

• We learned how to sort CSV files using the Talend software.
• The output was generated using components such as tSortRow and tLogRow.

Questions:-
1. What is the significance of the tSortRow component? Explain the properties of
tSortRow.
The tSortRow component sorts the input data based on one or several columns,
by sort type and order. It also offers the advantage of the dynamic schema
feature, which allows you to retrieve unknown columns from source files
or to copy batches of columns from a source without mapping each
column individually.
tSortRow properties

Component family: Processing

Basic settings

Schema and Edit Schema: A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next component. The schema
is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the current schema is of
the Repository type, three options are available:

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to change the schema
  to Built-in for local changes.
• Update repository connection: choose this option to change the schema stored
  in the repository and decide whether to propagate the changes to all the Jobs
  upon completion. If you just want to propagate the changes to the current
  Job, you can select No upon completion and choose this schema metadata again
  in the [Repository Content] window.

Click Sync columns to retrieve the schema from the previous component connected
in the Job.

Built-In: You create and store the schema locally for this component only.
Related topic: see Talend Studio User Guide.

Repository: You have already created the schema and stored it in the Repository.
You can reuse it in various projects and Job designs. Related topic: see Talend
Studio User Guide.

Criteria: Click + to add as many lines as required for the sort to be complete.
By default the first column defined in your schema is selected.

Schema column: Select the column label from your schema, which the sort will be
based on. Note that the order is essential as it determines the sorting priority.

Sort type: Numerical and Alphabetical order are proposed. More sorting types to
come.

Order: Ascending or descending order.

Advanced settings

Sort on disk: Customize the memory used to temporarily store output data.

Temp data directory path: Set the location where the temporary files should be
stored.

Create temp data directory if not exists: Select this check box to create the
directory if it does not exist.

Buffer size of external sort: Type in the size of physical memory you want to
allocate to sort processing.

tStatCatcher Statistics: Select this check box to gather the Job processing
metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error
occurs. This is an After variable and it returns a string. This variable
functions only if the Die on error check box is cleared, if the component has
this check box.

A Flow variable functions during the execution of a component while an After
variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access
the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage: This component handles a flow of data, so it requires both an input and
an output. It is therefore defined as an intermediary step.

Limitation: n/a
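
The Criteria settings described above (column order sets the sorting priority, and each criterion has a sort type and an order) can be illustrated outside Talend with a short Java sketch. This is not the code Talend generates for tSortRow. The class name, the method name and the sample rows are assumptions made for illustration only. It sorts on the first column alphabetically in ascending order, then on the second column numerically in descending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a tSortRow Criteria table; not Talend-generated code.
class SortCriteriaSketch {

    // Criterion 1: column 0, alphabetical, ascending.
    // Criterion 2: column 1, numerical, descending.
    // The order of the criteria decides the sorting priority.
    static List<String[]> sortRows(List<String[]> rows) {
        Comparator<String[]> byCity =
                Comparator.comparing((String[] r) -> r[0]);
        Comparator<String[]> byAmountDesc =
                Comparator.comparingInt((String[] r) -> Integer.parseInt(r[1])).reversed();

        List<String[]> sorted = new ArrayList<>(rows); // leave the input untouched
        sorted.sort(byCity.thenComparing(byAmountDesc));
        return sorted;
    }

    public static void main(String[] args) {
        // Sample rows invented for this sketch (city, amount).
        List<String[]> rows = List.of(
                new String[] {"Pune", "9"},
                new String[] {"Mumbai", "100"},
                new String[] {"Pune", "25"});
        for (String[] row : sortRows(rows)) {
            System.out.println(String.join("|", row));
        }
    }
}
```

Running the sketch prints Mumbai|100, then Pune|25, then Pune|9. The sort type matters here: a numerical descending sort puts 25 before 9, whereas an alphabetical descending sort on the same column would put "9" before "25" because the values are compared character by character.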

2. What is the significance of the tLogRow component? Explain the properties of
tLogRow.
The tLogRow component is part of the Logs & Errors family of components.
tLogRow writes the data (rows) flowing through your Job to the console.

tLogRow properties
Component family: Logs & Errors

Basic settings

Schema and Edit schema: A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next component. The schema
is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the current schema is of
the Repository type, three options are available:

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to change the schema
  to Built-in for local changes.
• Update repository connection: choose this option to change the schema stored
  in the repository and decide whether to propagate the changes to all the Jobs
  upon completion. If you just want to propagate the changes to the current
  Job, you can select No upon completion and choose this schema metadata again
  in the [Repository Content] window.

Built-In: You create and store the schema locally for this component only.
Related topic: see Talend Studio User Guide.

Repository: You have already created the schema and stored it in the Repository.
You can reuse it in various projects and Job designs. Related topic: see Talend
Studio User Guide.

Sync columns: Click to synchronize the output file schema with the input file
schema. The Sync function is available only when the component is linked with
the preceding component using a Row connection.

Basic: Displays the output flow in basic mode.

Table: Displays the output flow in table cells.

Vertical: Displays each row of the output flow as a key-value list. With this
mode selected, you can choose to show either the unique name or the label of
the component, or both of them, for each output row.

Separator (for Basic mode only): Enter the separator which will delimit data on
the Log display.

Print header (for Basic mode only): Select this check box to include the header
of the input flow in the output display.

Print component unique name in front of each output row (for Basic mode only):
Select this check box to show the unique name of the component in front of each
output row, to differentiate outputs in case several tLogRow components are used.

Print schema column name in front of each value (for Basic mode only): Select
this check box to retrieve column labels from the output schema.

Use fixed length for values (for Basic mode only): Select this check box to set
a fixed width for the value display.

Global Variables

NB_LINE: the number of rows processed. This is an After variable and it returns
an integer.

ERROR_MESSAGE: the error message generated by the component when an error
occurs. This is an After variable and it returns a string. This variable
functions only if the Die on error check box is cleared, if the component has
this check box.

A Flow variable functions during the execution of a component while an After
variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access
the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage: This component can be used as an intermediate step in a data flow or as
an end object in the Job flowchart.

Log4j: If you are using a subscription-based version of the Studio, the activity
of this component can be logged using the log4j feature. For more information on
this feature, see Talend Studio User Guide.

Limitation: n/a
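
To make the Basic mode options above more concrete, the following Java sketch imitates what the Separator and Print header settings do to the console output. It is only an approximation written for this report, not the real tLogRow implementation. The class name, the method name and the sample data are hypothetical.

```java
import java.util.List;

// Hypothetical imitation of tLogRow's Basic mode; not the component's real code.
class LogRowSketch {

    // Builds the console text: an optional header line first, then one line
    // per row, with the field values joined by the chosen separator.
    static String format(String[] header, List<String[]> rows,
                         String separator, boolean printHeader) {
        StringBuilder out = new StringBuilder();
        if (printHeader) {
            out.append(String.join(separator, header)).append('\n');
        }
        for (String[] row : rows) {
            out.append(String.join(separator, row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample rows invented for this sketch.
        List<String[]> rows = List.of(
                new String[] {"Mumbai", "100"},
                new String[] {"Pune", "25"});
        System.out.print(format(new String[] {"city", "amount"}, rows, "|", true));
        // Similar in spirit to the NB_LINE After variable: the count is
        // only known once all rows have been processed.
        System.out.println("rows processed: " + rows.size());
    }
}
```

The row count is printed after the loop on purpose, which mirrors the idea that an After variable is available only once the component has finished executing.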

3. What do you mean by component?

A component is a functional piece that performs a single operation in Talend.
Everything you see on the Palette is a graphical representation of a component,
and you can add one to a Job with a simple drag and drop. At the backend, a
component is a snippet of Java code that is generated as part of a Job (which is
basically a Java class). This Java code is compiled automatically by Talend
when the Job is saved.

4. What do you understand by the term Routines?

Routines are reusable pieces of Java code. Using routines, you can write custom
Java code to optimize data processing, improve Job capacity, and extend Talend
Studio features.

Talend supports two types of routines:

• System routines: read-only code that you can call directly in any Job.
• User routines: routines that users create themselves, either by writing new
  ones or by adapting existing ones.
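
A user routine is essentially a Java class with public static methods. The sketch below shows what a small one might look like. The class name, the method name and the sample value are hypothetical, and the package declaration is left out so that the file compiles on its own (the assumption here is that Talend Studio keeps routines in a package named routines).

```java
// Sketch of a user routine: a class of reusable public static helpers.
// Names are hypothetical; the package line is omitted so this compiles alone.
class StringCleanupSketch {

    /**
     * Trims a value and collapses runs of whitespace into a single space.
     * Returns an empty string when the input is null.
     */
    public static String normalizeSpaces(String value) {
        if (value == null) {
            return "";
        }
        return value.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Sample value invented for this sketch.
        System.out.println("[" + normalizeSpaces("  New   Delhi ") + "]");
    }
}
```

In a Job, a method like this would typically be called from an expression field, for example StringCleanupSketch.normalizeSpaces(row1.city), where row1.city is an assumed column name.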
