
Metadata, Commit and Rollback in Connection to DB

Jenny Elizabeth Abella Sánchez


Computer Science Engineer
MBA - BI & Big Data
Class 4
Metadata Repository

• The metadata Repository centralizes the information of all projects and guarantees consistency across all integration processes.

• Metadata about the source and target systems of the integration processes is easily loaded into the Repository through tools that analyze databases or files, assisted by various wizards.

• The characteristics defined in the metadata are inherited by the processes that use them.
Centralize the connections

• Database Connection: Stores the type of database and its connection parameters, as well as the database objects we can work with.

• To create the connection, go to Metadata - DbConnections, right-click DbConnections, and choose Create Connections. A wizard appears and guides us through the whole process.

• Once the connection is created, a new element appears in our repository.
• On the newly created element, right-click and choose Retrieve Schema to retrieve all the objects (tables) we wish to work with.
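As a rough illustration of what Retrieve Schema does conceptually, the sketch below opens a connection from one centrally stored set of parameters and lists every table with its columns. It rests on several assumptions: Python's built-in sqlite3 module and an in-memory database stand in for MySQL, and the names CONNECTION_PARAMS, open_connection, retrieve_schema, dept_no, and dept_name are hypothetical.

```python
import sqlite3

# Hypothetical connection "metadata": stored once and reused by every job.
# An in-memory SQLite database stands in for a MySQL host, port, and schema.
CONNECTION_PARAMS = {"database": ":memory:"}


def open_connection(params):
    """Open a connection from the centrally stored parameters."""
    return sqlite3.connect(params["database"])


def retrieve_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for all tables."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in cols]
    return schema


conn = open_connection(CONNECTION_PARAMS)
conn.execute("CREATE TABLE departments (dept_no TEXT PRIMARY KEY, dept_name TEXT)")
schema = retrieve_schema(conn)
print(schema)
conn.close()
```

Keeping the parameters in one place mirrors the idea of the repository: a job asks for the connection by name instead of repeating host and port details.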
Centralize data flows and schemas
• Generic Schema: This element can be thought of as a schema template that any element using schemas can reuse.

• To create one, place the cursor on Generic Schema, right-click, and choose Create Generic Schema. A wizard appears in which we specify the fields our schema will have.
Access to relational databases

• Talend makes data processing tasks easy to perform. All components for database access are in the Databases group.

• The main component for reading information from a MySQL database in Talend is ‘tMysqlInput’, found in the Databases / Mysql component category.

• Next we will see the steps for creating a standard Job with access to MySQL databases:

• Add the tMysqlInput component.

Double-clicking the component shows its main configuration:

• Property type: Selects whether the component's properties are defined in the component itself or taken from metadata already defined before the job.

• DB Version: Version of the database we are using.

• Use an existing connection: The most common way to configure the database connection. The connection is defined once in another component (tMysqlConnection) and then reused here.

• Host: Name of the database host.

• Port: Connection port.

• Schema: Defines the schema in the component itself with the ‘Built-In’ option, or selects a schema previously defined in the Talend repository with the ‘Repository’ option.

• Edit schema: Defines the schema of the source information.

• Table name: Name of the table from which to extract the information.

• Query: Query generated by Talend from the defined schema. To update this query, use the ‘Guess Query’ button.
• This is the minimum configuration the component needs in order to operate. Example:

The departments table has the following definition in the database:

The departments table records are:


We create the schema in the Generic Schema section by defining the name and type of the table's columns, using the ‘Edit Schema’ option:

• Essentially, we enter the name we want to give the column in the ‘Column’ field, the actual name of the column in the database in the ‘Db Column’ field, and the type used in the output flow, which in this case is String for every column.

• You can also configure the column's type in the database (DB Type), whether the column accepts null values (Nullable), and several other column characteristics.
• After defining the schema and selecting it from the repository, we can click ‘Guess Query’ and see that the ‘Query’ field in the component settings has been filled in automatically with the query that will retrieve the flow of information from the database:
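The idea behind ‘Guess Query’ can be sketched as building a SELECT statement from the table name and the ‘Db Column’ entries of the schema. This is only an approximation: the exact text Talend generates may differ, and the function name guess_query and the column names dept_no and dept_name are hypothetical.

```python
def guess_query(table, db_columns):
    """Build a SELECT that lists every schema column explicitly."""
    # Backticks are MySQL's identifier quoting style.
    column_list = ", ".join(f"`{table}`.`{col}`" for col in db_columns)
    return f"SELECT {column_list} FROM `{table}`"


query = guess_query("departments", ["dept_no", "dept_name"])
print(query)
```

Listing the columns explicitly, rather than using SELECT *, keeps the query aligned with the schema even if the table later gains columns.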

• We use the logging component ‘tLogRow’ to inspect the output flow. We drag it from the component palette and connect the flow of the database input component to it.

• To do this, right-click the database component, choose ‘Row / Main’, and click the tLogRow component. The result is:

• To see the schema of the flow, click ‘Edit schema’ in the settings of the tLogRow component.

• We can see that, by default, its schema matches column by column the output flow coming from the tMysqlInput component.

• We run the job with the ‘Play’ button and check that the output flow is as expected:

• The work area shows the number of rows retrieved in the flow, the time taken, and the average number of rows per second:
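What this small job does (read rows, print them, report throughput) can be approximated in a few lines. The sketch assumes Python's built-in sqlite3 module as a stand-in for MySQL; the function name log_rows and the sample rows are hypothetical, and the output format is not the one tLogRow uses.

```python
import sqlite3
import time


def log_rows(conn, query):
    """Print each row of the query result and return the number of rows read."""
    start = time.perf_counter()
    count = 0
    for row in conn.execute(query):
        print("|".join(str(value) for value in row))
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    print(f"{count} rows in {elapsed:.4f}s ({rate:.0f} rows/s)")
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_no TEXT, dept_name TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [("d001", "Marketing"), ("d002", "Finance")])  # hypothetical data
row_count = log_rows(conn, "SELECT dept_no, dept_name FROM departments")
conn.close()
```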
• The tMysqlOutput component, as its name implies, writes a flow of information into a specific database table.

• We place this component in our work area and connect the output flow of the tLogRow component to it (right-click tLogRow, choose Row / Main, and click the tMysqlOutput component):
• We configure the component with the same connection as the tMysqlInput component (since the data is written to the same database), but set the 'Table' property to a different name:
• Other interesting properties of this component are:

• Action on table: Action to perform on the target table itself, for example creating the table if it does not exist or clearing it before inserting.

• Action on data: Action to take with the flow of information, for example inserting or updating the records.

• Die on error: Worth noting, because checking it means that if the job fails at any point it stops completely and does not continue writing the rest of the information.
Click 'Edit Schema' to check that the defined data flow is appropriate: by default, the departments_v2 table will be created with the same columns and records as departments, the table the data flow comes from:
• Next, we run the job and see in our database schema that the departments_v2 table has been created and filled with the data correctly:
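The effect of this job, copying departments into a new departments_v2 table, can be sketched as follows. The sketch assumes Python's built-in sqlite3 module as a stand-in for MySQL; copy_table, the column names, and the sample rows are hypothetical. The CREATE TABLE IF NOT EXISTS statement plays the role of an 'Action on table' setting that creates the table when missing, and the INSERT plays the role of an insert 'Action on data'.

```python
import sqlite3


def copy_table(conn, source, target, columns):
    """Copy every row of `source` into `target`, creating `target` if needed.

    Table and column names are interpolated directly, so they must be trusted.
    """
    col_list = ", ".join(columns)
    col_defs = ", ".join(f"{col} TEXT" for col in columns)  # all columns as strings
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {target} ({col_defs})")
    rows = conn.execute(f"SELECT {col_list} FROM {source}").fetchall()
    conn.executemany(
        f"INSERT INTO {target} ({col_list}) VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_no TEXT, dept_name TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [("d001", "Marketing"), ("d002", "Finance")])  # hypothetical data
conn.commit()
copied = copy_table(conn, "departments", "departments_v2", ["dept_no", "dept_name"])
v2_rows = conn.execute(
    "SELECT dept_no, dept_name FROM departments_v2 ORDER BY dept_no").fetchall()
print(copied, v2_rows)
conn.close()
```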

• The tMysqlConnection component lets us configure a database connection and reuse it in other components, such as tMysqlInput or tMysqlOutput.

• Its properties are practically the same as those of the MySQL input and output components.
• It adds the following properties:

• Additional JDBC Parameters: Lets us specify additional parameters for the database connection. In our example we add the parameter ‘maxRows = 0’ to verify that the settings work properly. This parameter indicates that all rows should be returned.

• Use or register a shared DB Connection: Defines a shared connection so that it can be used in other Jobs.

• Once this component is properly configured, we can use it in the rest of the components of our job, which avoids configuring the database connection in each of them.

• To do this, we open the configuration of the tMysqlInput and tMysqlOutput components and check the ‘Use an existing connection’ box:

• Two components are especially useful when working with database connections in Talend: tMysqlCommit and tMysqlRollback.

• As their names imply, these components perform the commit or rollback after a specific action on the database. A commit confirms the changes made to the database; a rollback undoes all the changes made to the database during the execution of a process.
• We place these components in our work area:
• Triggers let us divert, or not, the execution flow of our Job depending on whether an error occurred while the component on which the trigger is configured was performing its action.

• The component offers three types of conditional trigger:

• Run if: Lets the flow through if a condition is met.

• On component ok: Lets the flow through if everything went well.

• On component error: Lets the flow through if there was an error.

• In our example we configure the database output component ‘tMysqlOutput’ so that one action or another is performed depending on whether its execution succeeded.

• The ‘On component ok’ trigger diverts the flow to the commit, that is, to the tMysqlCommit component.

• The ‘On component error’ trigger diverts the flow to the tMysqlRollback component, which performs a rollback on the database, that is, undoes the changes made to it when an error occurs.
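The commit-on-success, rollback-on-error pattern described above can be sketched outside Talend as follows. This is an illustration under stated assumptions, not what Talend generates: Python's built-in sqlite3 module stands in for MySQL, and load_rows, the table layout, and the sample rows are hypothetical. The second load fails on a duplicate key after one row has already been inserted, and the rollback removes that partial insert.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert rows; commit if every insert succeeds, otherwise roll back."""
    try:
        conn.executemany(
            "INSERT INTO departments_v2 (dept_no, dept_name) VALUES (?, ?)", rows)
    except sqlite3.Error:
        conn.rollback()  # plays the role of the 'On component error' branch
        return False
    conn.commit()        # plays the role of the 'On component ok' branch
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments_v2 (dept_no TEXT PRIMARY KEY, dept_name TEXT)")

first_ok = load_rows(conn, [("d001", "Marketing"), ("d002", "Finance")])
# The second batch repeats key d001, so the whole batch is undone.
second_ok = load_rows(conn, [("d003", "Sales"), ("d001", "Duplicate")])

total = conn.execute("SELECT COUNT(*) FROM departments_v2").fetchone()[0]
print(first_ok, second_ok, total)
conn.close()
```

Because the failed batch is rolled back as a unit, the table never ends up holding only part of a load.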
• To do this, we right-click the tMysqlOutput component, select ‘Trigger / On component ok’, and click the tMysqlCommit component:

• We repeat the same action to connect the ‘On component error’ trigger to the tMysqlRollback component.

• Whenever we perform an action on a database, it is advisable to close the connection afterwards; otherwise connections remain open and increase the memory consumption of our database.

• The settings of the components responsible for the commit and the rollback include an option to close the database connection after they finish their work:

• Another way to do this is with the tMysqlClose component, whose purpose is to close a database connection. It is configured with the name of the component whose connection must be closed. In this example it is not needed.

• Even so, it is advisable that every component that performs an action on a database, and therefore opens or uses a connection to it, closes the database connection if an error occurs, for example by using 'On component error' triggers.
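In code, the advice to close the connection on both the success and the error path corresponds to a try/finally block. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL; run_and_close, make_conn, is_closed, and the sample rows are hypothetical names introduced for illustration.

```python
import sqlite3


def run_and_close(conn, rows):
    """Insert rows, commit or roll back, and always close the connection."""
    try:
        conn.executemany("INSERT INTO departments_v2 VALUES (?, ?)", rows)
        conn.commit()
        return "committed"
    except sqlite3.Error:
        conn.rollback()
        return "rolled back"
    finally:
        conn.close()  # runs on both the success path and the error path


def make_conn():
    """Create a fresh in-memory database with the target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE departments_v2 (dept_no TEXT PRIMARY KEY, dept_name TEXT)")
    return conn


def is_closed(conn):
    """Report whether the connection rejects further statements."""
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError:
        return True
    return False


conn_ok = make_conn()
outcome_ok = run_and_close(conn_ok, [("d001", "Marketing")])

conn_bad = make_conn()
outcome_bad = run_and_close(conn_bad, [("d001", "A"), ("d001", "B")])  # duplicate key

print(outcome_ok, is_closed(conn_ok), outcome_bad, is_closed(conn_bad))
```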

• The resulting job looks like this:

• We run our job and watch the flow follow the appropriate path, from the component that reads the source data to the component that commits and closes the connection to our database:
