Introduction to the ETL dataflow
• Extraction
• Transformation
• Load
ELT
Standards and Transformations
Lay out sequences from top to bottom, leaving space to the right for data
outputs and for connections to error-control or message components.
A Job with too many components can be difficult to understand and maintain.
It is preferable to create jobs that control the sequence of other jobs using the
tRunJob component.
Document each job with a label or comment that states, in general terms,
what it does and the objective of its transformations, ideally including its
business meaning.
Keep job names as short as possible while still indicating what the job
mainly does, either with a single representative word for the problem it
solves or with words from the process it belongs to, following a documented
job naming convention.
Standard transformation
Scalability and Performance
• TOS_DI-win32-x86.ini
• -Xms: JAVA_MIN_MEM
• Initial heap size, assigned by default when a Job starts
• -Xmx: JAVA_MAX_MEM
• Maximum heap size; raise it to avoid "Out of memory" exceptions
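The effect of the two settings can be checked from inside any running JVM. The following is a minimal sketch in plain Java, not Talend code: the class name HeapCheck and its helper methods are hypothetical names chosen for illustration, while the Runtime calls are part of the standard library.

```java
class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will attempt to use for the heap (governed by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently reserved by the JVM (starts near -Xms and can grow up to -Xmx).
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Committed heap (MiB): " + committedHeapMiB());
    }
}
```

Running it with different values, for example `java -Xms256m -Xmx1024m HeapCheck`, shows how the reported figures follow the flags.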
Work Environment
Workshop: explore the environment
Objective:
Component Types
• Custom Code: components for defining our own custom code and using it
alongside the rest of the Talend components. Custom code can be written
in Java or Perl, and these components can also load libraries or run custom
Groovy commands.
• Data Quality: components for data quality management, such as filtering,
CRC* calculations, fuzzy-logic searches, value replacement, validation of
schemas against metadata, removal of duplicates, etc.
• ELT: components for working with databases in ELT mode (with the typical
transformations and processes of this type of system).
* CRC: a cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data.
https://en.wikipedia.org/wiki/Cyclic_redundancy_check
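To make the CRC footnote concrete, here is a minimal sketch in plain Java using the standard java.util.zip.CRC32 class. It is not the code of any Talend component; the class name CrcDemo and the helper crc32Of are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CrcDemo {
    // Computes the CRC-32 checksum of a string's UTF-8 bytes.
    static long crc32Of(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        String original = "123456789";
        String altered = "123456780"; // last character changed

        // Standard CRC-32 check value for "123456789": cbf43926
        System.out.printf("%08x%n", crc32Of(original));

        // A change to the data produces a different checksum: prints true
        System.out.println(crc32Of(original) != crc32Of(altered));
    }
}
```

Comparing the checksum stored with a record against one recomputed later is a cheap way to detect that the data was accidentally altered in between.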
• File: components for managing files (existence checks, copying, deletion,
listing, properties), for reading files in different formats (text, Excel,
delimited, XML, mail, etc.), and for writing to them.
• Logs & Errors: components for error handling and logging in the process
definition.
Orchestration Components
Other components