CloverGUI

User's guide
Javlin, a.s.
Tomas Waller
This User's Guide covers the Release 2.0.x of CloverGUI.

Copyright © 2008 Javlin, a.s. All rights reserved.


Published 26-August-2008
Table of Contents
I. Installation Guide ...................................................................................................................... 1
1. CloverGUI Overview ......................................................................................................... 2
What Is CloverGUI? ..................................................................................................... 2
What Is CloverEngine? .................................................................................................. 2
What Is CloverServer? ................................................................................................... 2
Web Information .......................................................................................................... 2
2. Installation Instructions ...................................................................................................... 3
How to Download CloverGUI ............................................................................... 3
Downloading the Eclipse Platform ........................................................................... 3
Setting Workspace ................................................................................................ 4
Downloading the Eclipse GEF Plugin ....................................................................... 5
Downloading the CloverGUI Plugin ......................................................................... 8
Creating a New Project ................................................................................................ 12
Creating a New Graph ................................................................................................. 15
Running Graphs .......................................................................................................... 18
3. Import ........................................................................................................................... 24
Import Clover Projects ................................................................................................. 25
Import Graphs ............................................................................................................ 26
Import Metadata .......................................................................................................... 27
Metadata from XSD ............................................................................................ 27
Metadata from DDL ............................................................................................ 28
4. Export ........................................................................................................................... 29
Export Graphs ............................................................................................................ 29
Export Graphs to HTML .............................................................................................. 30
Export Metadata to XSD .............................................................................................. 31
Export Image .............................................................................................................. 31
A. Setting and Configuring Java Tools .................................................................................... 33
Setting Java Runtime Environment ................................................................................. 33
Installing Java Development Kit .................................................................................... 37
B. Import Other Examples .................................................................................................... 39
II. Objects, Structures and Tools .................................................................................................... 45
5. CloverGUI Structures ....................................................................................................... 46
CloverGUI Perspective ................................................................................................. 46
CloverGUI Panes ........................................................................................................ 46
Graph Editor with Palette of Components ................................................................ 47
Navigator Pane ................................................................................................... 51
Outline Pane ....................................................................................................... 51
Tabs Pane .......................................................................................................... 52
6. Building Transformation Graph .......................................................................................... 54
7. Edges ............................................................................................................................ 55
What Are the Edges? ................................................................................................... 55
Connecting Components by the Edges ............................................................................ 55
Assigning Metadata to the Edges ................................................................................... 55
Propagating Metadata through the Edges ......................................................................... 56
Debugging the Edges ................................................................................................... 56
Viewing the Data Flowing through the Edges ................................................................... 57
Types of Edges ........................................................................................................... 60
Colors of the Edges ..................................................................................................... 61
8. Metadata ........................................................................................................................ 62
Internal Metadata ........................................................................................................ 62
How You Can Create Internal Metadata .................................................................. 62
Externalizing Internal Metadata ............................................................................. 64
Exporting Internal Metadata .................................................................................. 64
External (Shared) Metadata ........................................................................................... 65
How You Can Create External (Shared) Metadata ..................................................... 65
Linking External (Shared) Metadata ....................................................................... 66
Internalizing External (Shared) Metadata ................................................................. 66
The Resources From Which You Can Extract Metadata ..................................................... 67
Extracting Metadata from a Flat File ...................................................................... 67
Extracting Metadata from an XLS File .................................................................... 71
Extracting Metadata from a Database ...................................................................... 72
Creating Metadata from a DBase File ..................................................................... 74
Creating Metadata by the User ............................................................... 78
Assigning Metadata to an Edge ..................................................................................... 78
Editing Metadata ......................................................................................................... 78
Creating a Database Table on the Basis of Metadata and a Database Connection ........................ 79
Metadata Editor .......................................................................................................... 79
Record Pane ....................................................................................................... 81
Field Pane .......................................................................................................... 83
Filter Textarea .................................................................................................... 84
Dynamic Metadata ...................................................................................................... 84
9. Database Connections ....................................................................................................... 85
Internal Database Connections ....................................................................................... 85
How You Can Create Internal Database Connections ................................................. 85
Externalizing Internal Database Connections ............................................................ 88
External (Shared) Database Connections .......................................................................... 89
How You Can Create External (Shared) Database Connections .................................... 89
Linking External (Shared) Database Connection ........................................................ 93
Internalizing External (Shared) Database Connections ................................................ 93
Browsing Database and Extracting Metadata from Database Tables ...................................... 93
Encrypting the Access Password .................................................................................... 94
10. Lookup Tables .............................................................................................................. 95
Creating Lookup Tables ............................................................................................... 95
Simple Lookup Table ........................................................................................... 96
Database Lookup Table ........................................................................................ 98
Range Lookup Table ........................................................................................... 99
11. Parameters .................................................................................................................. 101
Internal Parameters .................................................................................................... 101
How You Can Create Internal Parameters .............................................................. 101
Externalizing Internal Parameters ......................................................................... 101
External (Shared) Parameters ....................................................................................... 102
How You Can Create External (Shared) Parameters ................................................. 102
Linking External (Shared) Parameters ................................................................... 102
Internalizing External (Shared) Parameters ............................................................. 103
Parameters Wizard ..................................................................................................... 103
Using Parameters ....................................................................................................... 104
12. Sequences ................................................................................................................... 105
Creating a Sequence ................................................................................................... 105
Editing a Sequence .................................................................................................... 105
C. JMS Connections .......................................................................................................... 107
Internal JMS Connections ........................................................................................... 107
How You Can Create Internal JMS Connections ..................................................... 107
Externalizing Internal JMS Connections ................................................................. 107
External (Shared) JMS Connections .............................................................................. 107
How You Can Create External (Shared) JMS Connections ........................................ 107
Linking External (Shared) JMS Connection ............................................................ 107
Internalizing External (Shared) JMS Connections .................................................... 108
Edit JMS Connection Wizard ...................................................................................... 108
Encrypting the Authentication Password ........................................................................ 109
III. Components Guide ............................................................................................................... 110
13. Introduction to Components ........................................................................................... 111
Common Properties of Components .............................................................................. 111
Palette of Components ........................................................................................ 112
Giving a Name to a Component ........................................................................... 113
Phases ............................................................................................................. 114
Enabling vs. Disabling Components vs. PassThrough Status ...................................... 115
Data Policy ....................................................................................................... 117
Locating Files with URL File Dialog .................................................................... 117
Viewing Data in Readers and Writers .................................................................... 118
14. Defining the Transformations ......................................................................................... 121
Open Type Wizard .................................................................................................... 121
Edit Value Wizard ..................................................................................................... 122
Transform Editor ....................................................................................................... 123
15. Readers ...................................................................................................................... 132
File URL ................................................................................................................. 132
File Readers ............................................................................................................. 133
DataGenerator ................................................................................................... 133
Flat File Readers ............................................................................................... 135
UniversalDataReader .......................................................................................... 135
Other Type File Readers ..................................................................................... 136
CloverDataReader .............................................................................................. 136
XLSDataReader ................................................................................................. 137
DBFDataReader ................................................................................................ 139
Database Readers ...................................................................................................... 140
Using JDBC Drivers .......................................................................................... 140
DBInputTable ................................................................................................... 140
Advanced Readers ..................................................................................................... 141
XMLExtract ...................................................................................................... 141
XMLXPathReader ............................................................................................. 144
JMSReader ....................................................................................................... 147
LDAPReader .................................................................................................... 148
16. Writers ....................................................................................................................... 149
File URL ................................................................................................................. 149
File Writers .............................................................................................................. 149
Partitioning Data Flow into Different Output Files ................................................... 150
Trash ............................................................................................................... 151
Flat File Writers ................................................................................................ 151
UniversalDataWriter ........................................................................................... 151
Other Type File Writers ...................................................................................... 152
CloverDataWriter ............................................................................................... 152
XLSDataWriter ................................................................................................. 153
StructuredDataWriter .......................................................................................... 154
Database Writers ....................................................................................................... 155
Using JDBC Drivers .......................................................................................... 155
DBOutputTable ................................................................................................. 156
Using Database Bulk Loaders .............................................................................. 158
DB2DataWriter ................................................................................................. 158
InformixDataWriter ............................................................................................ 159
MSSQLDataWriter ............................................................................................. 160
MySQLDataWriter ............................................................................................. 161
OracleDataWriter ............................................................................................... 161
PostgreSQLDataWriter ....................................................................................... 162
Advanced Writers ...................................................................................................... 163
XMLWriter ...................................................................................................... 163
JMSWriter ........................................................................................................ 167
LDAPWriter ..................................................................................................... 168
17. Transformers ............................................................................................................... 169
Copying, Filtering and Sorting ..................................................................................... 169
SimpleCopy ...................................................................................................... 169
SpeedLimiter .................................................................................................... 169
ExtSort ............................................................................................................ 170
Dedup .............................................................................................................. 171
ExtFilter ........................................................................................................... 171
Concatenating, Gathering and Merging .......................................................................... 172
Concatenate ...................................................................................................... 172
SimpleGather .................................................................................................... 172
Merge .............................................................................................................. 173
Partitioning and Intersection ........................................................................................ 174
Partition ........................................................................................................... 174
DataIntersection ................................................................................................. 177
Pure Transformers ..................................................................................................... 179
KeyGenerator .................................................................................................... 179
Aggregate ......................................................................................................... 180
Reformat .......................................................................................................... 181
Denormalizer .................................................................................................... 183
Normalizer ....................................................................................................... 185
XSLTransformer ................................................................................................ 188
18. Joiners ........................................................................................................................ 189
Join Types ................................................................................................................ 189
Inner Join ......................................................................................................... 189
Left Outer Join .................................................................................................. 189
Full Outer Join .................................................................................................. 189
Joining Components ................................................................................................... 189
Transformations ................................................................................................. 190
ApproximativeJoin ............................................................................................. 191
ExtHashJoin ...................................................................................................... 194
ExtMergeJoin .................................................................................................... 197
LookupJoin ....................................................................................................... 199
DBJoin ............................................................................................................ 201
19. Other Components ........................................................................................................ 203
Executing Components ............................................................................................... 203
SystemExecute .................................................................................................. 203
JavaExecute ...................................................................................................... 203
DBExecute ....................................................................................................... 204
RunGraph ......................................................................................................... 205
Non-Executing Components ........................................................................................ 206
CheckForeignKey .............................................................................................. 206
LookupTableReaderWriter ................................................................................... 208
20. Deprecated .................................................................................................................. 210
Flat File Readers ....................................................................................................... 210
DelimitedDataReader .................................................................................................. 210
FixLenDataReader ..................................................................................................... 211
Flat File Writers ........................................................................................................ 212
DelimitedDataWriter .................................................................................................. 212
FixLenDataWriter ...................................................................................................... 213
D. Defining Transformations in Java ..................................................................................... 215
IV. Transformation Language ...................................................................................................... 216
21. Clover Transformation Language .................................................................................... 217
Program Structure ...................................................................................................... 217
Comments ................................................................................................................ 217
Import ..................................................................................................................... 217
Data Types ............................................................................................................... 218
Literals .................................................................................................................... 220
Variables .................................................................................................................. 222
Operators ................................................................................................................. 222
Arithmetic Operators .......................................................................................... 222
Relational Operators ........................................................................................... 224
Logical Operators .............................................................................................. 225
Simple Statement and Block of Statements ..................................................................... 226
Control Statements ..................................................................................................... 226
Selection Statements .......................................................................................... 226
Iteration Statements ............................................................................................ 227
Jump Statements ................................................................................................ 228
Functions ................................................................................................................. 228
Eval ........................................................................................................................ 229
Parameters ................................................................................................................ 229
Sequences ................................................................................................................ 229
Lookup Tables .......................................................................................................... 229
Data Flows ............................................................................................................... 230
Mapping .................................................................................................................. 230
E. Clover TL Functions ...................................................................................................... 231
Conversion Functions ................................................................................................. 231
Date Functions .......................................................................................................... 233
Mathematical Functions .............................................................................................. 234
String Functions ........................................................................................................ 235
Miscellaneous Functions ............................................................................................. 238
F. Clover Transformation Language Lite ............................................................................... 239

List of Figures
2.1. The Eclipse Logo ................................................................................................................... 3
2.2. You Are Asked To Select a Workspace ...................................................................................... 4
2.3. You Can Select the Following Workspace ................................................................................... 4
2.4. The Eclipse Platform Introductory Screen ................................................................................... 5
2.5. Downloading the Graphical Editing Framework ........................................................................... 5
2.6. Install/Update Wizard .............................................................................................................. 6
2.7. Searching for the Graphical Editing Framework ........................................................................... 6
2.8. List of Mirrors for Download ................................................................................................... 7
2.9. The Eclipse License Agreement ................................................................................................ 7
2.10. The About Eclipse SDK Window ............................................................................................ 8
2.11. List of Installed Plugins ......................................................................................................... 8
2.12. List of Update Sites ............................................................................................................... 9
2.13. Adding the Clover Update Site ................................................................................................ 9
2.14. Selecting the Sites that Should Be Updated .............................................................................. 10
2.15. CloverGUI Prompt .............................................................................................................. 10
2.16. Clover Products to Install ..................................................................................................... 11
2.17. Clover Has Been Installed ..................................................................................................... 11
2.18. Creating a New Project ........................................................................................................ 12
2.19. Selecting the New Project Wizard .......................................................................................... 12
2.20. Giving a Name to a New Project ............................................................................................ 13
2.21. CloverETL Examples Project ................................................................................................. 13
2.22. Opening the CloverETL Perspective ....................................................................................... 13
2.23. CloverETL Perspective ......................................................................................................... 14
2.24. CloverETL Perspective with Highlighted Navigator Pane and the Project Folder Structure ................. 14
2.25. Creating a New Graph ......................................................................................................... 15
2.26. Giving a Name to a New Graph ............................................................................................. 15
2.27. Selecting the Parent Folder for the Graph ................................................................................ 16
2.28. CloverETL Perspective with Highlighted Graph Editor ............................................................... 16
2.29. Graph Editor with a New Graph and the Palette of Components ................................................... 17
2.30. Opening the Workspace.prm File ........................................................................................... 17
2.31. The Parameters Contained in the Workspace.prm File ................................................................ 18
2.32. Running a Graph from the Main Menu .................................................................................... 19
2.33. Running a Graph from the Context Menu ................................................................................ 19
2.34. Running a Graph from the Upper Tool Bar .............................................................................. 20
2.35. Open Run Dialog ................................................................................................................ 20
2.36. Setting Up Memory Size ...................................................................................................... 21
2.37. Successful Data Parsing ........................................................................................................ 21
2.38. Console Tab with an Overview of the Graph Processing ............................................................. 22
2.39. Counting Parsed Data ........................................................................................................... 22
2.40. Enlarging the Font of Numbers .............................................................................................. 23
2.41. Setting the Font Size ............................................................................................................ 23
3.1. Import (Main Menu) .............................................................................................................. 24
3.2. Import (Context Menu) .......................................................................................................... 24
3.3. Import Options ..................................................................................................................... 25
3.4. Import Projects ..................................................................................................................... 25
3.5. Import Graphs ...................................................................................................................... 26
3.6. Import Metadata from XSD .................................................................................................... 27
3.7. Import Metadata from DDL .................................................................................................... 28
4.1. Export Options ..................................................................................................................... 29
4.2. Export Graphs ...................................................................................................................... 29
4.3. Export Graphs to HTML ........................................................................................................ 30
4.4. Export metadata to XSD ........................................................................................................ 31
4.5. Export Image ....................................................................................................................... 31
A.1. Setting Java Runtime Environment .......................................................................................... 33

A.2. Preferences Wizard ............................................................................................... 34
A.3. Installed JREs Wizard ........................................................................................... 34
A.4. Adding a New JRE ............................................................................................................... 35
A.5. Selecting New JRE Files ....................................................................................................... 35
A.6. Selecting a JRE .................................................................................................................... 36
A.7. Adding Java Development Kit ................................................................................................ 37
A.8. Searching for JDK Jars ......................................................................................................... 37
A.9. Selecting JDK Jars ............................................................................................................... 38
A.10. Adding JDK Jars ................................................................................................................ 38
B.1. Import Examples (Main Menu) ............................................................................................... 39
B.2. Import Examples (Context Menu) ............................................................................................ 40
B.3. Import External Clover Project ............................................................................................... 40
B.4. Examples Selected ................................................................................................................ 41
B.5. CloverETL Perspective with a Set of Projects ............................................................................ 41
B.6. Newer Examples Files and Folders in the Navigator Pane ............................................................ 42
B.7. Older Examples Files and Folders in the Navigator Pane ............................................................. 42
B.8. Setting the WORKSPACE Parameter ....................................................................................... 43
B.9. Example Graph .................................................................................................................... 44
5.1. CloverGUI Perspective ........................................................................................................... 46
5.2. Graph Editor with an Opened Palette of Components ................................................................... 47
5.3. Closing the Graphs ................................................................................................................ 48
5.4. Grid in the Graph Editor ........................................................................................................ 48
5.5. A Graph before Selecting Auto-Layout. .................................................................................... 49
5.6. A Graph after Selecting Auto-Layout. ....................................................................................... 49
5.7. Six New Buttons in the Tool Bar Appear Highlighted (Align Middle is shown) ................................. 50
5.8. Alignments from the Context Menu .......................................................................................... 50
5.9. Navigator Pane ..................................................................................................................... 51
5.10. Outline Pane ....................................................................................................................... 51
5.11. Another Representation of the Outline Pane ............................................................................. 52
5.12. Properties Tab ..................................................................................................................... 53
5.13. Console Tab ....................................................................................................................... 53
5.14. Problems Tab ..................................................................................................................... 53
5.15. Clover - Graph Tracking Tab ................................................................................................ 53
5.16. Clover - Log Tab ................................................................................................................ 53
7.1. Creating Metadata on the Empty Edge ...................................................................................... 56
7.2. Properties of an Edge ............................................................................................................ 57
7.3. Filter Editor Wizard .............................................................................................................. 57
7.4. View Data Dialog ................................................................................................................. 58
7.5. Viewing Data ....................................................................................................................... 58
7.6. Hide/Show Columns when Viewing Data .................................................................................. 58
7.7. View Record Dialog .............................................................................................................. 59
7.8. Find Dialog .......................................................................................................................... 59
7.9. Copy Dialog ......................................................................................................................... 60
7.10. Selecting the Edge Type ....................................................................................................... 60
7.11. Metadata in the Tooltip ........................................................................................................ 61
8.1. Creating Internal Metadata in the Outline Pane ........................................................................... 63
8.2. Creating Internal Metadata in the Graph Editor ........................................................................... 63
8.3. Externalizing and/or Exporting Internal Metadata ........................................................................ 64
8.4. Selecting a Location for a New Externalized and/or Exported Internal Metadata ................................ 65
8.5. Creating External (Shared) Metadata in the Main Menu and/or in the Navigator Pane ......................... 66
8.6. Internalizing External (Shared) Metadata ................................................................................... 67
8.7. Extracting Metadata from Delimited Flat File ............................................................................. 68
8.8. Extracting Metadata from Fixed Length Flat File ........................................................................ 68
8.9. Setting Up Delimited Metadata ................................................................................................ 69
8.10. Setting Up Fixed Length Metadata ......................................................................................... 70
8.11. Extracting Metadata from XLS File ........................................................................................ 71

8.12. Extracting Internal Metadata from a Database ........................................................................... 72
8.13. Database Connection Wizard ................................................................................................. 72
8.14. Selecting Columns for Metadata ............................................................................................. 73
8.15. Generating a Query .............................................................................................................. 73
8.16. Original Libraries Tab of Java Build Path ................................................................................ 74
8.17. Adding the Two Libraries for Extracting Metadata from DBASE File ........................................... 75
8.18. Creating Java Application for Extracting Metadata from DBASE File ............................................ 75
8.19. Selecting the Main Class ...................................................................................................... 76
8.20. Adding the Main Class ......................................................................................................... 76
8.21. Adding Arguments .............................................................................................................. 77
8.22. Configuration for Extracting Metadata from DBASE File Has Been Created ................................... 77
8.23. Assigning Metadata to an Edge .............................................................................................. 78
8.24. Creating Database Table on the Basis of Metadata and Database Connection .................................. 79
8.25. Metadata Editor for a Delimited File ....................................................................................... 81
8.26. Metadata Editor for a Fixed Length File .................................................................................. 81
9.1. Creating Internal Database Connection ...................................................................................... 85
9.2. Database Connection Wizard ................................................................................................... 86
9.3. Adding a new JDBC Driver into the List of Available Drivers ....................................................... 86
9.4. Defining Internal Database Connection ..................................................................................... 87
9.5. Externalizing Internal Database Connection ............................................................................... 88
9.6. Creating External (Shared) Database Connection ........................................................................ 89
9.7. Selecting Database Connection Item ......................................................................................... 90
9.8. Database Connection Wizard ................................................................................................... 90
9.9. Adding a new JDBC Driver into the List of Available Drivers ....................................................... 91
9.10. Defining External (Shared) Database Connection ...................................................................... 92
9.11. Selecting a Folder for External (Shared) Database Connection ..................................................... 92
9.12. Internalizing External (Shared) Database Connection ................................................................. 93
9.13. Running a Graph with the Password Encrypted ......................................................................... 94
10.1. Lookup Table Wizard .......................................................................................................... 95
10.2. Simple Lookup Table Wizard ................................................................................................ 96
10.3. Edit Key Wizard ................................................................................................................. 96
10.4. Simple Lookup Table Wizard with File URL ........................................................................... 97
10.5. Simple Lookup Table Wizard with Data .................................................................................. 97
10.6. Changing Data .................................................................................................................... 97
10.7. Database Lookup Table Wizard ............................................................................................. 98
10.8. Query Editor Wizard ............................................................................................................ 98
10.9. Appropriate Data for Range Lookup Table ............................................................................... 99
10.10. Range Lookup Table Wizard ............................................................................................... 99
10.11. Define Range Lookup Table Key Wizard ............................................................................. 100
10.12. Assigning End Fields to Start Fields .................................................................................... 100
11.1. Creating Internal Parameters ................................................................................................ 101
11.2. Externalizing Internal Parameters .......................................................................................... 102
11.3. Internalizing External (Shared) Parameter ............................................................................... 103
11.4. Example of a Parameter-Value Pair ....................................................................................... 104
12.1. Creating a Sequence ........................................................................................................... 105
12.2. Editing a Sequence ............................................................................................................ 106
12.3. A New Run of the Graph with the Previous Start Value of the Sequence ...................................... 106
C.1. Edit JMS Connection Wizard ................................................................................................ 108
13.1. Selecting Components ........................................................................................................ 112
13.2. Components in Palette ........................................................................................................ 112
13.3. Removing Components from the Palette ................................................................. 113
13.4. Simple Renaming Components ............................................................................................. 114
13.5. Running a Graph with Various Phases ................................................................................... 115
13.6. Running a Graph with Disabled Component ........................................................................... 116
13.7. Running a Graph with Component in PassThrough Status ......................................................... 116
13.8. URL File Dialog ................................................................................................................ 117

13.9. Viewing Data in Components .............................................................................................. 119
13.10. Viewing Data as Plain Text ............................................................................................... 119
13.11. Viewing Data as Grid ....................................................................................................... 119
13.12. Plain Text Data Viewing ................................................................................................... 120
13.13. Grid Data Viewing ........................................................................................................... 120
14.1. Open Type Wizard ............................................................................................................. 122
14.2. Edit Value Wizard ............................................................................................................. 122
14.3. Find Wizard ...................................................................................................................... 122
14.4. Go to Line Wizard ............................................................................................................. 123
14.5. Transformations Tab of the Transform Editor ......................................................................... 123
14.6. Copying the Input Field to the Output ................................................................................... 124
14.7. Transformation Definition in CTL (Transformations Tab) .......................................................... 125
14.8. Mapping of Inputs to Outputs (Connecting Lines) .................................................................... 125
14.9. Editor with Fields and Functions .......................................................................................... 126
14.10. Transformation Definition in CTL (Source Tab) .................................................................... 126
14.11. Confirmation Message ...................................................................................................... 127
14.12. Transformation Definition in CTL (Transform Tab of the Graph Editor) ..................................... 127
14.13. Outline Pane Displaying Variables and Functions ................................................................... 128
14.14. Content Assist (Record and Field Names) ............................................................................. 128
14.15. Content Assist (List of CTL Functions) ................................................................................ 129
14.16. Error in Transformation ..................................................................................................... 129
14.17. Converting Transformation to Java ...................................................................................... 130
14.18. Transformation Definition in Java ....................................................................................... 130
14.19. Older Transformation Definition in CTL Lite (Transformations Tab) ......................................... 131
14.20. Older Transformation Definition in CTL Lite (Source Tab) ...................................................... 131
15.1. Sequences Dialog .............................................................................................................. 134
15.2. A Sequence Assigned ......................................................................................................... 134
15.3. Edit Key Dialog ................................................................................................................ 135
15.4. XLS Mapping Dialog ......................................................................................................... 138
15.5. XLS Fields Mapped to Clover Fields .................................................................................... 138
16.1. Create Mask Wizard ........................................................................................................... 155
17.1. Defining Sort Key and Sort Order ........................................................................................ 170
17.2. Ranges Editor ................................................................................................................... 175
17.3. Source Tab of the Transform Editor in the Partition Component ................................................. 176
17.4. Source Tab of the Transform Editor in the DataIntersection Component ....................................... 178
17.5. Source Tab of the Transform Editor in the Reformat Component ................................................ 182
17.6. Source Tab of the Transform Editor in the Denormalizer Component ........................................... 184
17.7. Source Tab of the Transform Editor in the Normalizer Component .............................................. 186
17.8. XSLT Mapping ................................................................................................................. 188
17.9. An Example of Mapping ..................................................................................................... 188
18.1. Source Tab of the Transform Editor in Joiners ........................................................................ 190
18.2. Join Key Wizard (Master Key Tab) ...................................................................................... 191
18.3. Join Key Wizard (Slave Key Tab) ........................................................................................ 192
18.4. An Example of the Join Key Attribute in ApproximativeJoin Component ..................................... 193
18.5. Matching Key Wizard (Master Key Tab) ............................................................................... 193
18.6. Matching Key Wizard (Slave Key Tab) ................................................................................. 193
18.7. An Example of the Join Key Attribute in ExtHashJoin Component .............................................. 195
18.8. Hash Join Key Wizard ........................................................................................................ 196
18.9. Join Key Wizard (Master Key Tab) ...................................................................................... 198
18.10. Join Key Wizard (Slave Key Tab) ....................................................................................... 198
18.11. Edit Key Wizard .............................................................................................................. 200
19.1. Foreign Key Definition Wizard (Foreign Key Tab) .................................................................. 207
19.2. Foreign Key Definition Wizard (Primary Key Tab) .................................................................. 207
19.3. Foreign Key Definition Wizard (Foreign and Primary Keys Assigned) ......................................... 208

Part I. Installation Guide
Chapter 1. CloverGUI Overview
This chapter is an overview of the following three products of our CloverETL software: CloverGUI, CloverEngine and CloverServer.

What Is CloverGUI?
CloverGUI is a member of the family of CloverETL software products developed by Opensys and Javlin Companies. It is a powerful Java-based standalone application for data extraction, transformation and loading.

CloverGUI is provided as a plugin for the Eclipse Platform. Thus, to work with CloverGUI, you must first download the Eclipse Platform.

Working with CloverGUI is much simpler than writing your own code for data parsing. Its graphical user interface makes building and running graphs easier and more comfortable.

What Is CloverEngine?
CloverEngine is another member of the family of CloverETL software products developed by Opensys and Javlin Companies. CloverEngine is a Java-based application that you can integrate into other applications you are using.

What Is CloverServer?
CloverServer is the newest member of the family of CloverETL software products developed by Opensys and Javlin Companies. CloverServer is also based on Java. It provides the full functionality of a server.

Web Information
In addition to this User's Guide, you can find much useful information on the following sites:

• wiki.clovergui.net

• www.cloveretl.org

• www.opensys.eu

Chapter 2. Installation Instructions
This chapter explains how you should download CloverGUI, create a new project and a new graph, and how you
can run graphs.

The Way How You Should Download CloverGUI
Until now, it has been indispensable to download and install both the Java Runtime Environment and the Java Development Kit, but Clover tools now contain the Janino compiler, so you can use this compiler instead. JRE can be downloaded from the following site: http://java.sun.com/javase/downloads/index_jdk5.jsp. We suggest you use Java 1.5 because Clover is being developed on it.
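Before downloading the Eclipse Platform, you may want to check from a command line whether a suitable Java runtime is already installed. This is a small sketch, not part of the Clover tools themselves; it assumes that a successful JRE installation puts java on your PATH:

```shell
# A minimal check of the installed Java runtime, if any.
# Clover 2.0.x is developed against Java 1.5, so the reported version
# should be 1.5 or newer. Note that java prints its version to stderr.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "java not found - install a JRE first"
fi
```

If the command reports no java at all, install the JRE from the site above before continuing.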

Downloading the Eclipse Platform


So, once you have downloaded and installed Java Runtime Environment, you should download the Eclipse Platform. There are various Eclipse Platforms for different operating systems. The following is the Eclipse home site: www.eclipse.org.

The Eclipse Platform for both Windows and Linux can be downloaded from the following site: www.eclipse.org/
downloads.

Once you have downloaded the Eclipse Platform, you only need to unpack its .zip file (for Windows), or its
.tar.gz file (for Linux).

In Windows, the Eclipse folder contains an eclipse.exe file by which you can start up the Eclipse Platform.

In Linux, the folder contains an executable eclipse file.
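On Linux, unpacking and starting Eclipse can be sketched from the command line as follows. The archive name below is only an example; use the name of the file you actually downloaded:

```shell
# Unpack the downloaded Eclipse archive and start the IDE (Linux).
# The archive file name is an example; it depends on the release you chose.
tar -xzf eclipse-SDK-linux-gtk.tar.gz

# The unpacked eclipse folder contains the executable eclipse file.
cd eclipse
./eclipse &

# On Windows, unpack the .zip file instead (e.g. with an unzip tool,
# or via the Explorer), then double-click eclipse.exe.
```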

Figure 2.1. The Eclipse Logo


Setting Workspace
When you double-click the eclipse.exe file or the executable eclipse file, the Eclipse Platform starts up and you are prompted to select a location for the workspace folder. This is the place your projects will be stored in. In Windows, you can see the following prompt. Instead of the cloveruser folder there will be your username, e.g. johnsmith (C:\Users\johnsmith\workspace).

Figure 2.2. You Are Asked To Select a Workspace

You can accept the offered workspace location, but if you want, you can choose another one. Perhaps you want to have the Eclipse workspace inside the eclipse folder. In such a case, follow these instructions:

(You could set C:\Users\johnsmith\Desktop\eclipse\workspace, for example.)

Figure 2.3. You Can Select the Following Workspace

These screenshots are taken from the MS Windows operating system, but they are similar in Linux.

In Linux, you can choose /home/cloveruser/Desktop/eclipse/workspace. Again, with your username (e.g. johnsmith) instead of cloveruser.
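If you prefer not to answer the workspace prompt each time, the Eclipse Platform also accepts the workspace location on the command line through its standard -data launcher option. The path below is only an example; substitute your own username:

```shell
# Start Eclipse with an explicit workspace so the selection prompt is skipped.
# -data is a standard Eclipse launcher option; the path is an example only.
./eclipse -data /home/johnsmith/Desktop/eclipse/workspace
```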

After confirming the selected workspace you will see the following window:


Figure 2.4. The Eclipse Platform Introductory Screen

Downloading the Eclipse GEF Plugin


After downloading the Eclipse Platform and selecting the workspace, you must download the Graphical Editing
Framework (GEF) with the help of the Eclipse Update Mechanism.

After opening the Eclipse Platform, you must choose Help → Software Updates → Find and Install...

Figure 2.5. Downloading the Graphical Editing Framework


Then, after clicking Find and Install..., you can see a new window with two options. One of them allows you to update the currently installed features; the other allows you to install new features. You must select the option as follows:

Figure 2.6. Install/Update Wizard

After that, you can select some or all of the provided options. We have chosen the Europa Discovery Site here.

Figure 2.7. Searching for the Graphical Editing Framework

When you click the Finish button, you will be presented with some mirrors for download.


Figure 2.8. List of Mirrors for Download

Then, you must select which mirror should be used for searching and downloading new features. When all new
features are found, you must check the Graphical Editing Framework item.

Now you must accept the terms in the license agreements.

Figure 2.9. The Eclipse License Agreement

Click the Next button, then Finish; you will be asked to confirm the installation and to restart the Eclipse
Platform. After the restart, the Eclipse GEF is installed. To verify this, choose Help → About Eclipse SDK
and click the Plug-in Details button; the Graphical Editing Framework items appear in the list.


Figure 2.10. The About Eclipse SDK Window

Figure 2.11. List of Installed Plugins

Downloading the CloverGUI Plugin


Once you have downloaded the Eclipse GEF, you can download CloverGUI itself. The method varies depending
on your license.

First, you must register an account at the company site: www.cloveretl.org/user/register.

After that, you will be sent an e-mail with your login name and password. In that mail, you will be asked to confirm
the registration. Without confirming it, you will not be able to download the CloverGUI plugin.

When you have confirmed the registration, you can download CloverGUI itself. After choosing Help → Software
Updates → Find and Install..., you can see the following window:


Figure 2.12. List of Update Sites

Now click the New Remote Site... button and fill in the two fields of the new window. Use CloverGUI
as the name; for the URL, type: http://www.clovergui.net/eval-update.

Thus, the resulting window should be as follows:

Figure 2.13. Adding the Clover Update Site

After clicking the OK button, you can see the following window:


Figure 2.14. Selecting the Sites that Should Be Updated

Now, when you click the Finish button, you will be prompted to fill in your username and password that you have
received in the registration mail.

Figure 2.15. CloverGUI Prompt

When you type your username and password and click the OK button, CloverGUI will be found. After selecting
and expanding the CloverGUI item, the window should look like this:


Figure 2.16. Clover Products to Install

Now click the Next button, select I accept the terms in the license agreements, and click the Next and
Finish buttons. Again, you will be asked whether you want to install new features; click Install or Install
All. You will then be asked to restart the Eclipse Platform. After that, you should see the Clover logo when you
choose Help → About Eclipse SDK.

Figure 2.17. Clover Has Been Installed

This way, you have installed the Eclipse Platform along with the CloverGUI plugin.


Creating a New Project


To create a new project, choose File → New → Project.

Figure 2.18. Creating a New Project

Now, you should expand the CloverETL item in the presented list, select CloverETL Project or CloverETL
Examples Project and click the Next button.

Figure 2.19. Selecting the New Project Wizard

If you have selected the CloverETL Project item, you will be asked to give a name to your project. You can
name it Project_01, for example, and click Finish.


Figure 2.20. Giving a Name to a New Project

If you have selected the CloverETL Examples Project item, you will be presented with the following wizard:

Figure 2.21. CloverETL Examples Project

You can select any of the example projects by checking its checkbox. After clicking Finish, the selected projects
will appear in the Navigator pane.

After that, you will be asked to switch the Eclipse Platform perspective to the CloverGUI perspective.

Figure 2.22. Opening the CloverETL Perspective

Once you have confirmed this by clicking Yes (and provided you selected all of the example projects along with
a new project), you will see the following window:


Figure 2.23. CloverETL Perspective

On the left side, there is a Navigator pane. In this pane, you can expand the Project_01 folder, for example.
After expanding the project folder, you will see its folder structure. There are subfolders for data (data-in,
data-out, data-tmp), metadata (meta), connections (conn), lookup tables (lookup), sequences
(seq), transformations (trans) and graphs (graph). The project folder also contains a workspace.prm
file, in which some important project parameters are set.

Figure 2.24. CloverETL Perspective with Highlighted Navigator Pane and the Project
Folder Structure
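As an illustration, the contents of workspace.prm may look roughly like the sketch below. The PROJECT parameter is mentioned later in this guide; the remaining parameter names here are only hypothetical examples that mirror the folder structure, so check your own generated file for the actual names and values:

```
PROJECT = .
DATAIN_DIR = ${PROJECT}/data-in
DATAOUT_DIR = ${PROJECT}/data-out
DATATMP_DIR = ${PROJECT}/data-tmp
GRAPH_DIR = ${PROJECT}/graph
META_DIR = ${PROJECT}/meta
CONN_DIR = ${PROJECT}/conn
LOOKUP_DIR = ${PROJECT}/lookup
SEQ_DIR = ${PROJECT}/seq
TRANS_DIR = ${PROJECT}/trans
```

Each *_DIR-style parameter would then point at one of the subfolders described above, relative to the project root.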


Creating a New Graph


Now you can create a graph for the Project_01 by choosing File → New → ETL Graph.

(Once you have more projects in your workspace, it is better to right-click the desired project in the Navigator
pane and select New → ETL Graph from the context menu.)

Figure 2.25. Creating a New Graph

After clicking the item, you will be asked to give a name to the graph. For example, the name can be
Project_01 as well. However, in most cases your project will contain several graphs, so you may prefer names
such as Project_01_###, or any other names describing what the graphs do.

Figure 2.26. Giving a Name to a New Graph


Remember that you can decide which parameters file should be included in this project along with the graph. This
selection is made in the text area at the bottom of the window. You can locate another file by clicking the
Browse... button, or you can uncheck the checkbox to leave the graph without any parameters file included.

We decided to have the workspace.prm file included.

Finally, click the Next button. The .grf extension will be added to the selected name automatically.

Figure 2.27. Selecting the Parent Folder for the Graph

By clicking Finish, you save the graph in the graph subfolder. An item named Project_01_001.grf then
appears in the Navigator pane, and a tab of the same name appears in the window.

Figure 2.28. CloverETL Perspective with Highlighted Graph Editor


A palette of components is located on the right side of the graph; it opens when you click it.

Figure 2.29. Graph Editor with a New Graph and the Palette of Components

You can also inspect the workspace.prm file by right-clicking its item in the Navigator pane and choosing
Open With → Text Editor from the context menu.

Figure 2.30. Opening the Workspace.prm File

You can see the parameters of your new project. The parameters of imported projects may differ from those of
a new project.


Figure 2.31. The Parameters Contained in the Workspace.prm File

We suggest that you do not use backslashes in parameters; use single forward slashes instead (or, if necessary,
double backslashes). Both Linux and Windows accept forward slashes, whereas backslashes on Windows must
always be doubled. With single forward slashes you can be sure that everything will work as expected.
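The rule can be demonstrated with a few lines of plain Java, the platform CloverETL runs on (the file path used here is, of course, only an example):

```java
import java.io.File;

public class SlashDemo {
    public static void main(String[] args) {
        // Forward slashes are accepted by Java on Windows as well as on Linux.
        File portable = new File("C:/Users/johnsmith/workspace/data-in/input.txt");

        // In Java source (and similarly in parameter files), a single backslash
        // starts an escape sequence, so each backslash must be written twice.
        File windowsOnly = new File("C:\\Users\\johnsmith\\workspace\\data-in\\input.txt");

        System.out.println(portable.getPath());
        System.out.println(windowsOnly.getPath());

        // A simple normalization: turn every backslash into a forward slash.
        String normalized = windowsOnly.getPath().replace('\\', '/');
        System.out.println(normalized); // C:/Users/johnsmith/workspace/data-in/input.txt
    }
}
```

On Windows both spellings denote the same file; with forward slashes, the path works unchanged on Linux too.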

Running Graphs
When you have already created or imported graphs into your projects, you can run them.

There are three ways of running a graph:

• You can select Run → Run as → CloverETL graph from the main menu.

• Or you can right-click in the Graph editor, then select Run as in the context menu and click the CloverETL
graph item.

• Or you can click the green circle with white triangle in the tool bar located in the upper part of the window.


Figure 2.32. Running a Graph from the Main Menu

Figure 2.33. Running a Graph from the Context Menu


Figure 2.34. Running a Graph from the Upper Tool Bar

In each of these cases you can also open the Open Run Dialog, fill in the project name, the graph name and other
parameters and click the Run button.

Figure 2.35. Open Run Dialog

In this Open Run Dialog you can also set the Java memory size in megabytes. It is important to define the
memory size because the Java Virtual Machine needs this capacity to run the graphs. Define the maximum memory
size for the JVM by selecting the proper value:


Figure 2.36. Setting Up Memory Size
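The value chosen here becomes the JVM's maximum heap size (the standard -Xmx option). If you want to check what a running JVM actually received, you can query it from Java code; a small sketch:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Maximum amount of memory the JVM will attempt to use, in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        long maxMegabytes = maxBytes / (1024 * 1024);
        System.out.println("Maximum heap size: approximately " + maxMegabytes + " MB");
    }
}
```

For example, launching it with java -Xmx256m HeapCheck should report a value close to 256 MB.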

When using any of these three ways, the process of running the graph can be seen in the Console.

Figure 2.37. Successful Data Parsing


Figure 2.38. Console Tab with an Overview of the Graph Processing

Below the edges, the counts of parsed data should appear:

Figure 2.39. Counting Parsed Data

If you want, you can enlarge the font of these numbers. To do that, select Window → Preferences...


Figure 2.40. Enlarging the Font of Numbers

Then expand the CloverETL item, select Tracking and type the desired font size into the Record number font
size field. By default, it is set to 7.

Figure 2.41. Setting the Font Size

Chapter 3. Import
CloverGUI allows you to import Clover projects, graphs and/or metadata. If you want to import something, select
File → Import... from the main menu.

Figure 3.1. Import (Main Menu)

Or right-click in the Navigator pane and select Import... from the context menu.

Figure 3.2. Import (Context Menu)


After that, the following window opens. When you expand the Clover ETL category, the window will look like
this:

Figure 3.3. Import Options

Import Clover Projects


If you select the Import external Clover.ETL projects item, you can click the Next button and you will see
the following window:

Figure 3.4. Import Projects

You can locate a directory or a compressed archive file (select the right option by switching the radio buttons).
If you locate a directory, you can also decide whether the project should be copied or linked to your workspace.
If you want the project to be linked only, leave the Copy projects into workspace checkbox unchecked;
otherwise, it will be copied. Linked projects can be contained in more than one workspace. If you select an
archive file, the list of projects contained in the archive appears in the Projects area. You can select some or
all of them by checking the corresponding checkboxes.


Import Graphs
If you select the Import graphs - version conversion item, you can click the Next button and you will see the
following window:

Figure 3.5. Import Graphs

You must select the right graph(s) and specify from which directory into which folder the selected graph(s) should
be copied. By switching the radio buttons, you decide whether the complete folder structure or only the selected
folders should be created. You can also choose to overwrite existing sources without warning, and to convert the
graph(s) from the 1.x.x to the 2.x.x version of the GUI.


Import Metadata
You can also import metadata from XSD or DDL.

Metadata from XSD


If you select the Import metadata from XSD item, you can click the Next button and you will see the following
window:

Figure 3.6. Import Metadata from XSD

You must select the right metadata and specify from which directory into which folder the selected metadata
should be copied. By switching the radio buttons, you decide whether the complete folder structure or only the
selected folders should be created. You can also choose to overwrite existing sources without warning, and you
can specify the delimiters or the default field size.


Metadata from DDL


If you select the Import metadata - transform from DDL item, you can click the Next button and you will see
the following window:

Figure 3.7. Import Metadata from DDL

You must select the right metadata and specify from which directory into which folder the selected metadata
should be copied. By switching the radio buttons, you decide whether the complete folder structure or only the
selected folders should be created. You can also choose to overwrite existing sources without warning. You also
need to specify the delimiters.

Chapter 4. Export
CloverGUI allows you to export Clover graphs and/or metadata. If you want to export something, select File →
Export... from the main menu, or right-click in the Navigator pane and select Export... from the context menu.
After that, the following window opens. When you expand the Clover ETL category, the window will look like this:

Figure 4.1. Export Options

Export Graphs
If you select the Export graphs item, you can click the Next button and you will see the following window:

Figure 4.2. Export Graphs

You must check the graph(s) to be exported in the right pane, and you must also locate the output directory. In
addition, you can select whether external (shared) metadata, connections and parameters should be internalized
and inserted into the graph(s); this is done by checking the corresponding checkboxes. You can also remove gui
tags from the output file by checking the Strip gui tags checkbox.

Export Graphs to HTML


If you select the Export graphs to HTML item, you can click the Next button and you will see the following
window:

Figure 4.3. Export Graphs to HTML

You must select the graph(s) and specify the output directory to which the selected graph(s) should be exported.
You can also generate an index file of the exported pages and the corresponding graphs, and/or images of the
selected graphs. By switching the radio buttons, you select either the scale of the output images or their width
and height. You can also decide whether antialiasing should be used.


Export Metadata to XSD


If you select the Export metadata to XSD item, you can click the Next button and you will see the following
window:

Figure 4.4. Export metadata to XSD

You must select the metadata and specify to which output directory the selected metadata should be exported.

Export Image
If you select the Export image item, you can click the Next button and you will see the following window:

Figure 4.5. Export Image


This option allows you to export images of the selected graphs only. You must select the graph(s) and specify
the output directory to which the images should be exported. You can also specify the format of the output files:
bmp, jpeg or png. By switching the radio buttons, you select either the scale of the output images or their width
and height. You can also decide whether antialiasing should be used.

Appendix A. Setting and Configuring
Java Tools
This new release of CloverGUI contains the Janino compiler, so you can compile .java files located outside
a graph as well as Java source code embedded inside a graph. For this reason you no longer need to install the
Java Development Kit, nor do you need to set up a Java Runtime Environment: the Janino compiler can do the
same work as the Java Development Kit. However, should you want to set a JRE or add JDK libraries, you can
do so as shown in this Appendix A. Remember that you should use Java 1.5!

Setting Java Runtime Environment


To set the JRE, select Window → Preferences.

Figure A.1. Setting Java Runtime Environment

After clicking the item, you can see the following window:


Figure A.2. Preferences Wizard

Now you must expand the Java item and select the Installed JREs item as shown above. If you have installed
JRE 1.6, you can see the following window:

Figure A.3. Installed JREs Wizard

You should switch from Java 1.6 to 1.5. Select the right JRE version by clicking the Add button, after which
you will see the following window:


Figure A.4. Adding a New JRE

Once you have found the right folder with the JRE (version 1.5), the libraries with .jar files appear in the JRE
system libraries text area.

Figure A.5. Selecting New JRE Files

After clicking the OK button, you will have two JREs; select the right one by checking its checkbox:


Figure A.6. Selecting a JRE

After doing this and clicking the OK button, the right JRE is prepared for CloverGUI.


Installing Java Development Kit


As mentioned above, the new release of CloverGUI contains the Janino compiler. However, if you want, you can
install the JDK and add it to the project by right-clicking the project item and selecting Properties from the
context menu. Once more, we suggest you use JDK 1.5!

Figure A.7. Adding Java Development Kit

Then you can select the Java Build Path item and its Libraries tab. You must click the Add External JARs
button on it.

Figure A.8. Searching for JDK Jars


You can add all .jar files contained in the selected jdk folder into the Libraries tab. (The window below is
taken from Windows Vista; it may look different in your OS.)

Figure A.9. Selecting JDK Jars

After confirming this, the .jar files will be added to the project as shown below.

Figure A.10. Adding JDK Jars

Appendix B. Import Other Examples
In addition to the Clover examples project, you can also download and import other Clover examples.

You can find examples at the following site: www.cloveretl.org/download/examples.

The folder structure of some older examples may differ from that of new projects: older examples consist of one
project only, whereas newer examples comprise four projects.

Figure B.1. Import Examples (Main Menu)


Figure B.2. Import Examples (Context Menu)

You must expand the Clover ETL item and select Import external Clover.ETL projects.

Figure B.3. Import External Clover Project

You can choose Select root directory or Select archive file by switching the radio button. When you choose
Select archive file, you can locate the desired archive file. For example, you can download any of the .zip files
containing our examples that are provided at the following site: www.cloveretl.org/download/examples. When
you select one of these .zip files, a checked examples label appears in the window. You then only need to click
Finish.

(If you import a directory, you can also decide whether the examples should be copied or linked to the
workspace. If you check the Copy projects into workspace checkbox, they will be copied there; otherwise, the
examples will only be linked. Linked projects can be shared by more than one workspace.)


Figure B.4. Examples Selected

The examples folder (or a project of whatever other name you have imported) appears in the Navigator pane.

Figure B.5. CloverETL Perspective with a Set of Projects

After expanding the examples folder in the Navigator pane, you can see the tree of graphs and other folders.


Figure B.6. Newer Examples Files and Folders in the Navigator Pane

Older examples had a different folder structure as you can see in the following screenshot:

Figure B.7. Older Examples Files and Folders in the Navigator Pane

To work with any of the graphs (your own or imported ones), simply double-click the graph item in the Navigator
pane; the graph is then added to the Graph Editor.

As mentioned, the folder structure of some older projects may differ from that of new projects, and their
workspace.prm file sets different parameters. To work with the graphs, you must set the WORKSPACE parameter
in the following way:


Figure B.8. Setting the WORKSPACE Parameter

As you can see, the WORKSPACE parameter can be set using forward slashes in both Windows and Linux; in
Windows it can also be set using backslashes, but these must always be doubled. So, wherever a Windows path
contains single backslashes, change them either to double backslashes or to single forward slashes. In particular,
add another backslash to every single backslash contained in the PROJECT parameter in the workspace.prm
file, or anywhere else. However, we suggest using only forward slashes with Clover tools in both Windows and
Linux; it is the best way to avoid this discrepancy.
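For illustration, the same WORKSPACE value could be written in either of the two Windows spellings below (the path itself is only an example):

```
WORKSPACE = C:/Users/johnsmith/workspace
WORKSPACE = C:\\Users\\johnsmith\\workspace
```

The first spelling works in the same form for Linux paths as well (e.g. /home/johnsmith/workspace); the second, with every backslash doubled, is Windows-only.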

When you double-click any graph item in the Navigator pane, the graph opens in the Graph Editor. It is
displayed in two tabs: as a graph and as source code.


Figure B.9. Example Graph

Part II. Objects, Structures and Tools
Chapter 5. CloverGUI Structures
This chapter presents a description of the appearance and structures of CloverGUI.

CloverGUI Perspective
The CloverGUI perspective consists of four panes:

Figure 5.1. CloverGUI Perspective

• Graph Editor with Palette of Components is in the upper right part of the window.

In this pane you can build your graphs. The Palette of Components serves to select components, move them into
the Graph Editor and connect them by edges. This pane has two tabs.

• Navigator pane is in the upper left part of the window.

This pane contains the folders and files of your projects. You can expand or collapse them and open any graph
by double-clicking its item.

• Outline pane is in the lower left part of the window.

It shows all parts of the graph that is currently opened in the Graph Editor.

• Tabs pane is in the lower right part of the window.

You can see the data parsing process in these tabs.

CloverGUI Panes
Now we present a more detailed description of the panes.

First, a useful tip: if you want to enlarge any tab of a pane, you only need to double-click that tab. The pane
then expands to the size of the whole window; when you double-click it again, it returns to its original size.


Graph Editor with Palette of Components


The most important pane is the Graph Editor with Palette of Components.

To create a graph, you need to open the Palette tool by clicking the arrow located above the Palette label or
by holding the cursor over the Palette label. You can close the Palette again by clicking the same arrow, or
simply by moving the cursor outside the Palette tool. You can also change the shape of the Palette by shifting
its border in the Graph Editor, and/or move it to the left side of the Graph Editor by clicking its label and
dragging it there.

The name of the user that has created the graph and the name of its last modifier are saved to the Source tab
automatically.

It is from the Palette tool that you select a component and paste it into the Graph Editor. To paste a component,
click its label, move the cursor into the Graph Editor and click again; the component then appears in the Graph
Editor. You can do the same with the other components.

Once you have selected and pasted several components into the Graph Editor, you need to connect them by edges
taken from the same Palette tool. To connect two components by an edge, click the edge label in the Palette
tool, move the cursor to the first component, connect the edge to the component's output port by clicking, then
move the cursor to the input of another component and click again. This way the two components become
connected. Once you have finished your work with edges, you must click the Select item in the Palette window.

After creating or modifying a graph, you must save it by selecting the Save item from the context menu. The
graph becomes a part of the project in which it has been created. A new graph name appears in the Navigator
pane. All components and properties of the graph can be seen in the Outline pane when the graph is opened in
the Graph Editor.

Figure 5.2. Graph Editor with an Opened Palette of Components

If you want to close any of the graphs opened in the Graph Editor, you can click the cross at the right side of its
tab; to close several tabs at once, right-click any of the tabs and select the corresponding item from the context
menu. The items available there include Close, Close other, Close All and some other ones. See below:


Figure 5.3. Closing the Graphs

From the main menu, you can also select the CloverETL item (but only when the Graph Editor is highlighted)
and add a grid to the Graph Editor by selecting the Grid item.

Figure 5.4. Grid in the Graph Editor

By clicking the Graph auto-layout item, you can change the layout of the graph. You can see how the layout
changes by selecting Graph auto-layout with, for example, graphAggregateUnsorted.grf opened.
Before selecting this item, the graph looks like this:


Figure 5.5. A Graph before Selecting Auto-Layout.

Once you have selected the mentioned item, the graph could look like this:

Figure 5.6. A Graph after Selecting Auto-Layout.

The Graph Editor offers another possibility:

When you press and hold down the left mouse button somewhere inside the Graph Editor and drag the mouse
through the pane, a rectangle is created. If you draw this rectangle so that it surrounds some of the graph
components and then release the mouse button, you can see that these components have become highlighted.
(The first and second ones on the left in the graph below.) After that, six buttons (Align Left, Align Center,
Align Right, Align Top, Align Middle and Align Bottom) appear highlighted in the tool bar above the Graph
Editor and Navigator panes. (With their help, you can change the position of the selected components.) See below:

Figure 5.7. Six New Buttons in the Tool Bar Appear Highlighted (Align Middle is shown)

You can do the same by right-clicking inside the Graph Editor and selecting the Alignments item from the
context menu. Then, a submenu appears with the same items as mentioned above.

Figure 5.8. Alignments from the Context Menu

Remember that you can copy any highlighted part of a graph by pressing Ctrl+C and then Ctrl+V after
opening another graph.


Navigator Pane
In the Navigator pane, there is a list of your projects, their subfolders and files. You can expand or collapse
them, view them and open them.

All graphs of the project are situated in this pane. You can open any of them in the Graph Editor by double-click-
ing the graph item.

Figure 5.9. Navigator Pane

Outline Pane
The Outline pane shows all components of the selected graph. There you can create or edit all properties of
the graph components, edge metadata, database connections or JMS connections, lookups, parameters, sequences,
and notes. You can both create internal properties and link external (shared) ones. Internal properties are
contained in the graph and are visible there. You can externalize the internal properties and/or internalize the
external (shared) properties. You can also export the internal metadata. If you select any item in the Outline
pane (component, connection, metadata, etc.) and press Enter, its editor opens.

Figure 5.10. Outline Pane

Note that the two buttons in the upper right part of the Outline pane have the following property:

By default you can see the tree of components, metadata, connections, parameters, sequences, lookups and notes
in the Outline pane. But, when you click the button that is the second from the left in the upper right part of the
Outline pane, you will be switched to another representation of the pane. It will look like this:


Figure 5.11. Another Representation of the Outline Pane

You can see a part of one of the example graphs in the Graph Editor, and the same graph structure in the
Outline pane. In addition, there is a light-blue rectangle in the Outline pane: within it, you can see exactly the
same part of the graph as is visible in the Graph Editor. By moving this rectangle within the Outline pane, the
corresponding part of the graph moves in the Graph Editor along with it; the light-blue rectangle and the graph
in the Graph Editor always move together.

You can do the same with the help of the scroll bars on the right and bottom sides of the Graph Editor.

To switch to the tree representation of the Outline pane, you only need to click the button that is the first from
the left in the upper right part of the Outline pane.

Tabs Pane
In the lower right part of the window, there is a series of tabs.

• Properties tab

In this tab, you can view and/or edit the component properties. When you click a component, properties of the
selected component appear in this tab.

• Console tab

In this tab, process of reading, unloading, transforming, joining, writing, and loading data can be seen.

• Problems tab

In this tab, you can see error messages, warnings, etc. When you expand any of the items, you can see its
resource (the name of the graph), its path (the path to the graph) and its location (the name of the component).

• Clover - graph tracking tab

In this tab, you can see a brief description of a graph that has run successfully: the names of the components,
grouped by phases (with their usage time in seconds and usage capacity in percent), the status of all components,
the CPU time used by them (in seconds), the CPU capacity used (in percent), the average bytes processed (in
bytes per second), the average rows processed (in rows per second), the total bytes processed (in bytes, kilobytes,
etc.) and the total rows processed (in rows).

• Clover - Log tab

In this tab, you can see the entire log of the data parsing process that is created after running a graph. It can
contain a set of logs from multiple graph runs.

Figure 5.12. Properties Tab

Figure 5.13. Console Tab

Figure 5.14. Problems Tab

Figure 5.15. Clover - Graph Tracking Tab

Figure 5.16. Clover - Log Tab

Chapter 6. Building Transformation
Graph
To build a graph, you must select graph components, set up their properties and connect the components by
edges; select the data files and/or database tables that should be read or unloaded from, or written or loaded to;
create metadata describing the data and assign it to edges; create database connections or JMS connections; and
create lookup tables, parameters and/or sequences. Once all of this is done, you can run the graph.

A more detailed description of the individual parts of every graph is presented in the following chapters and
sections.

Chapter 7. Edges
This chapter presents an overview of the edges. It describes what they are, how they can be connected to the
components of a graph, how metadata can be assigned to them and propagated through them, how the edges can
be debugged and how the data flowing through the edges can be seen.

What Are the Edges?


The edges represent data flowing from one component to another.

Connecting Components by the Edges


When you have selected and pasted at least two components into the Graph Editor, you must connect them by
edges taken from the Palette tool. To connect two components by an edge, click the edge label in the Palette
tool, move the cursor onto one of the two components, connect the edge to its output port by clicking the left
mouse button on the component, then move the cursor to the input of another component and click again. This
way the two components become connected.

Some components only receive data through their input port(s) and write it to data resources (Writers, including
Trash); other components read data from data resources, or generate data, and send it through their output port(s)
(Readers, including DataGenerator); and others both receive data and send it on to other components
(Transformers and Joiners). The last group of components either need to be connected to some edges (non-
executing components such as CheckForeignKey and LookupTableReaderWriter) or not (Executing Compo-
nents). But almost all components must be connected by edges.

When attaching an edge to the graph, as described, it always binds to a component port. The number of ports of some components is fixed, while in others it is unlimited. If the number of ports is unlimited, a new port is created whenever a new edge is connected. Once you have finished working with edges, click the Select item in the Palette tool or press Esc on the keyboard.

If you have already connected two components by an edge, you can move that edge to another component. To do so, highlight the edge by clicking it, then move the cursor to the port to which the edge is connected (input or output) until the arrow cursor turns into a cross. Once the cross appears, you can drag the edge to any free port of any component. Remember that you can only replace an output port with another output port and an input port with another input port.

Assigning Metadata to the Edges


Metadata must be created and assigned to an edge. Until then, the edge has the form of a dashed line; only after metadata have been created and assigned to it does the line become solid.

You can create metadata as shown in the corresponding sections below. Alternatively, you can double-click the empty (dashed) edge and select Create metadata from the menu, or link an existing external metadata file by selecting Link shared metadata.


Figure 7.1. Creating Metadata on the Empty Edge

You can also assign metadata to an edge by right-clicking the edge, choosing the Select metadata item from the
context menu and selecting the desired metadata from the list.

Propagating Metadata through the Edges


Once you have assigned metadata to an edge, you may need to propagate them to other edges through a component.

To propagate metadata, open the context menu by right-clicking the edge and click the Propagate metadata item. The metadata will be propagated until they reach a component in which metadata can be changed (for example, a reformat or join component).

For the edges beyond such a component, you must define other metadata and propagate them again if necessary.

Debugging the Edges


If errors occur, or if you obtain incorrect or unexpected results when running one of your graphs, you should debug the graph.

To do that, first decide where the problem may come from. Then right-click the edges that are under suspicion and click the Enable debug item in the context menu. After that, a bug icon appears on the edge, indicating that debugging will be performed.

You can do the same by opening the Properties tab of the Tabs pane and setting Debug mode to true. It is false
by default.

Then you can set up some of the other properties of the edge. First, decide how many records you want to view. Note that all of the records will be parsed, but only some of them can be displayed for viewing. Type that number into the Debug max. records field.

Then decide whether you want to view the desired number of records taken from the start (beginning with the first record), or sampled evenly from across all of the records. In the latter case, set the Debug sample data item to true; the desired number of records will then be selected evenly from throughout all of the records. This property is false by default, meaning that records are selected from the start only, but sometimes it is better to set it to true.

Figure 7.2. Properties of an Edge

Finally, you can create some filter expression for debugging. After clicking the Debug filter expression item, you
can create or type the filter expression in the Filter editor wizard.

Figure 7.3. Filter Editor Wizard

This wizard consists of three panes. The left one displays the list of record fields with their names and data types. You can select any of them by double-clicking or by dragging and dropping; the field name then appears in the bottom area prefixed with a dollar sign. You can also use the functions listed in the right pane of the window; below this pane there are comparison operators and logical connectives. Select any of the names, functions, operators and connectives by double-clicking, and they appear in the bottom area, where you can edit them and complete the filter expression. You can validate the expression, cancel its creation by clicking Cancel, or confirm it by clicking OK. After running the graph, only the records that satisfy the filter expression can be viewed. Viewing them is described in the next section.

Viewing the Data Flowing through the Edges


To view the records that have flowed through the edge and satisfy the filter expression, open the context menu by right-clicking the edge and click the View data item. A View data dialog opens. Note that even here you can create a filter expression, in the same way as described above.

Select the number of records to display and confirm by clicking OK.


Figure 7.4. View Data Dialog

The records are shown in another View data dialog, in grid mode. You can sort the records by any column in ascending or descending order simply by clicking its header.

Figure 7.5. Viewing Data

Above the grid, there are three labels: Edit, View, Hide/Show columns.

By clicking the Hide/Show columns label, you can select which columns should be displayed: all, none, only
selected. You can select any option by clicking.

Figure 7.6. Hide/Show Columns when Viewing Data

By clicking the View label, you are presented with two options: you can decide whether or not to display unprintable characters, and whether to view a single record separately. Such a record appears in the View record dialog. At the bottom of this dialog you can see arrow buttons that allow you to browse the records and view them in sequence. Note that clicking the rightmost button shows the last of the displayed records, not the last record processed.

Figure 7.7. View Record Dialog

By clicking the Edit label, you are presented with four options.

• You can select the number of the record or line you want to see. The record is highlighted after you type its number and click OK.

• Another option opens the Find dialog. This wizard contains a text area into which you can type an expression. If you check the Match case checkbox, the search is case sensitive. If you check the Entire cells checkbox, only the cells that match the expression completely are highlighted. If you check the Regular expression checkbox, the typed expression is treated as a regular expression. You can also decide whether to search in the direction of rows or of columns, select which columns to search in (all, only visible, or one column from the list) and, as the last option, choose whether to find all cells that meet the criterion or only one of them.

Figure 7.8. Find Dialog

• As the last option, you can copy some of your records or parts of them. Select whether you want to copy the entire record as a string or as a record (in the latter case you can also select the delimiter), or only some of the record fields. After clicking the OK button, you only need to choose the location to copy to and paste the data there.


Figure 7.9. Copy Dialog

Types of Edges
Every edge has an internal buffer. You can select the edge type by clicking the Select edge item and then clicking one of the presented types.

Figure 7.10. Selecting the Edge Type

Types of edges can be set to one of the following:

• Direct edge. This type of edge has a buffer in memory, which makes faster data flow possible. This is the default edge type.

• Buffered edge. This type of edge also has a buffer in memory but, if necessary, it can store data on disk as well, so the buffer size is effectively unlimited. It has two buffers, one for reading and one for writing.

• Direct fast propagate edge. This is an alternative implementation of the Direct edge. It also makes fast data flow possible.

• Phase connection edge. This type cannot be selected; it is created automatically between two components with different phase numbers.

If you do not want to specify an explicit edge type, you can leave the default option, Detect default, selected. In that case, Clover itself decides which edge type should be used.
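The distinction between a direct edge and a buffered edge can be sketched conceptually (this is illustrative Python, not CloverETL code; class names and capacities are invented for this example): a direct edge is a fixed-size in-memory buffer, while a buffered edge spills overflow records to disk, so its capacity is effectively unlimited.

```python
import queue
import tempfile

class DirectEdge:
    """Sketch of a direct edge: a bounded in-memory buffer."""
    def __init__(self, capacity=4):
        self._q = queue.Queue(maxsize=capacity)  # fixed-size memory buffer

    def write(self, record):
        self._q.put(record)   # a producer would block here when full

    def read(self):
        return self._q.get()

class BufferedEdge:
    """Sketch of a buffered edge: overflow records spill to disk."""
    def __init__(self, capacity=4):
        self._mem = []
        self._capacity = capacity
        self._disk = tempfile.TemporaryFile(mode="w+")  # overflow storage

    def write(self, record):
        if len(self._mem) < self._capacity:
            self._mem.append(record)
        else:
            self._disk.write(record + "\n")  # spill to disk, no upper limit

    def read_all(self):
        self._disk.seek(0)
        return self._mem + [line.rstrip("\n") for line in self._disk]

edge = BufferedEdge(capacity=2)
for r in ["a", "b", "c", "d"]:
    edge.write(r)
print(edge.read_all())  # ['a', 'b', 'c', 'd'] -- all records survive
```

The trade-off is speed versus capacity: the direct edge never touches the disk, while the buffered edge never refuses a record.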


Colors of the Edges


• When you connect two components by an edge, it is gray and dashed.

• After assigning metadata to the edge, it becomes solid, but still remains gray.

• When you click any metadata item in the Outline pane, all edges with the selected metadata become blue.

• If you click an edge in the Graph Editor, the selected edge becomes black and all of the other edges with the
same metadata become blue. (In this case, metadata are shown in the edge tooltip as well.)

Figure 7.11. Metadata in the Tooltip

Chapter 8. Metadata
Every edge of any graph carries some metadata information, so each of your graphs that contains edges contains metadata as well. These can be either internal or external (shared).

If metadata are internal, they are part of the graph: they are contained in it and you can see them when you look at the Source tab in the Graph Editor.

If metadata are external (shared), they are located outside the graph in a metadata file (in the meta folder by default). If you look at the Source tab, you can only see a link to such an external file; the metadata themselves are described in that file.
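For illustration, the two cases might appear in the graph source roughly as follows (the record and field definitions here are invented for this example; the exact attributes depend on your CloverETL version). Internal metadata are written out in full:

```xml
<Metadata id="Metadata0">
  <Record name="customer" type="delimited">
    <Field name="id" type="integer" delimiter=";"/>
    <Field name="name" type="string" delimiter="\n"/>
  </Record>
</Metadata>
```

whereas external (shared) metadata appear only as a link to the .fmt file:

```xml
<Metadata fileURL="${META_DIR}/customer.fmt" id="Metadata0"/>
```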

Let's suppose that you have several graphs that use the same data files, the same database tables or any other shared data resource. All of these graphs can use the same metadata, and these metadata can either be kept in each graph separately or be shared by all of the graphs.

It is simpler and more convenient to keep metadata used by several graphs in one location, i.e. to have one external file (shared by all of these graphs) linked to each of the graphs that use the same data resource. If each graph kept its own copy, any change to the metadata would have to be repeated in every graph; it is much better to change the desired property in only one location - the metadata file.

On the other hand, if you want to give someone one of your graphs, you must hand over not only the graph but also its metadata. In that case, it is simpler to have the metadata contained in the graph.

CloverGUI lets you choose between internal and external metadata. If you have metadata in one or more files outside the graph, you can internalize them, that is, put them into your graph. After that, you no longer need to hand over the graph and its metadata separately: you can hand over just the graph with its internal metadata. The external (shared) metadata file continues to exist, but the metadata have also become part of the graph. The person who receives your graph can, in turn, externalize such metadata: create a new metadata file and link the resulting file to the graph. It is also possible to export metadata. By exporting any internal metadata you create an external (shared) metadata file, but the original internal metadata still remain in your graph.

The same holds for connections (both database connections and JMS connections) and parameters: they too can be internal or external (shared), and you can externalize internal connections and/or parameters and internalize external (shared) ones. They cannot be exported, however; if you want the same effect, externalize them and then internalize them again - the external connection or parameter file will still exist.

Internal Metadata
As mentioned above, internal metadata are part of a graph: they are contained in it and can be seen in its Source tab.

How You Can Create Internal Metadata


If you want to create internal metadata, you can do it in two ways:

• You can do it in the Outline pane.

In the Outline pane, select the Metadata item, open the context menu by right-clicking and select the New metadata item there.

• You can do it in the Graph Editor.


In the Graph Editor, you must open the context menu by right-clicking any of the edges. There you can see
the New metadata item.

Figure 8.1. Creating Internal Metadata in the Outline Pane

Figure 8.2. Creating Internal Metadata in the Graph Editor

In both cases, after selecting the New metadata item, a new submenu appears in which you can choose how the metadata should be defined.

You now have three possibilities in either of the cases mentioned above: if you want to define metadata yourself, select the User defined item; if you want to extract metadata from a file, select the Extract from flat file or Extract from xls file item; and if you want to extract metadata from a database, select the Extract from database item. In all of these ways you create internal metadata only.

Externalizing Internal Metadata


Once you have created internal metadata as part of a graph, you may want to convert them to external (shared) metadata. You would then be able to use the same metadata in several graphs (the graphs would share them).

You can externalize internal metadata by right-clicking one of the internal metadata items in the Outline pane, clicking Externalize metadata in the context menu, selecting the project you want to add the metadata to, expanding that project, selecting the meta folder, renaming the metadata file if necessary, and clicking Finish.

Then, the internal metadata item disappears from the metadata folder of the Outline pane and a newly created metadata file appears in its place.

The same metadata file appears in the meta subfolder in the Navigator pane.

Exporting Internal Metadata


This case is similar to externalizing metadata, but now you create a metadata file outside the graph that is not linked to the original graph. Only a metadata file is created. You can subsequently use this file for other graphs as an external (shared) metadata file, as mentioned in the previous sections.

You can export internal metadata by right-clicking one of the internal metadata items in the Outline pane, clicking Export metadata in the context menu, selecting the project you want to add the metadata to, expanding that project, selecting the meta folder, renaming the metadata file if necessary, and clicking Finish.

After that, the metadata folder in the Outline pane remains unchanged, but the newly created metadata file appears in the meta folder of the Navigator pane.

Figure 8.3. Externalizing and/or Exporting Internal Metadata


Figure 8.4. Selecting a Location for a New Externalized and/or Exported Internal
Metadata

External (Shared) Metadata


As mentioned above, external (shared) metadata are metadata that serve more than one graph. They are located outside the graphs, which is why several graphs can share them.

How You Can Create External (Shared) Metadata


If you want to create shared metadata, you can do it in two ways:

• You can do it by selecting File → New → Other in the main menu.

To create external (shared) metadata, after clicking the Other item, you must select the CloverETL item,
expand it and decide whether you want to define metadata yourself (Define by hand), extract them from a file
(Flat file or XLS file), or extract them from a database (Database).

• You can do it in the Navigator pane.

To create external (shared) metadata, open the context menu by right-clicking in the Navigator pane and select New → Other from it. After the list of wizards opens, select the CloverETL item, expand it and decide whether you want to define metadata yourself (Define by hand), extract them from a file (Flat file or XLS file), or extract them from a database (Database).


Figure 8.5. Creating External (Shared) Metadata in the Main Menu and/or in the
Navigator Pane

Linking External (Shared) Metadata


After their creation (see the previous sections), shared metadata must be linked to every graph in which they are to be used. Do this from the context menu by selecting New metadata → Link shared definition (for more information see Section "How You Can Create Internal Metadata"); after clicking it, you only need to select the metadata file in the File selection wizard.

Internalizing External (Shared) Metadata


Once you have created and linked external (shared) metadata, you may want to convert them to internal metadata, i.e. put them into the graph. You would then be able to see their structure in the graph itself.

You can internalize an external (shared) metadata file by right-clicking one of the external (shared) metadata items in the Outline pane and clicking Internalize metadata in the context menu.

Then, the external (shared) metadata item disappears from the metadata folder of the Outline pane and the newly created internal metadata item appears in its place.

However, the original external (shared) metadata file still exists in the meta subfolder of the Navigator pane.


Figure 8.6. Internalizing External (Shared) Metadata

The Resources From Which You Can Extract Metadata


As mentioned above, metadata describe the structure of data. Since the data themselves are contained in flat files, XLS files, DBF files, XML files or database tables, metadata must be extracted in a different way for each of these data resources. We will describe how to create or (for some file types) extract metadata from the files mentioned above, starting with a flat file.

The description is valid for both internal and external (shared) metadata.

Extracting Metadata from a Flat File


If you want to extract metadata from a flat file, click the corresponding item (Flat file for a shared definition or Extract from flat file for an internal definition). After that, a Flat file wizard opens.

In that wizard, you must type the file name or find it with the help of the Browse... button. Once you have selected
the file, you must specify the Encoding and Record type options as well.

If the fields of a record are separated from each other by some delimiters, you need to select Delimited as the
Record type option. If the fields are of some defined sizes, you need to select the Fixed Length option.

In the Input file pane below you can see the data from the file.


Figure 8.7. Extracting Metadata from Delimited Flat File

Figure 8.8. Extracting Metadata from Fixed Length Flat File

How to Extract Metadata from Delimited Files


After clicking the Next button, you can see more detailed information about the content of the input file and the
delimiters in the Metadata wizard. It consists of four panes: the first two are at the top of the window, the third in the middle, and the fourth at the bottom. Each pane can be expanded to the whole window by clicking the corresponding symbol in its upper right corner.

The first two panes at the top are the panes described in the Section "Metadata Editor". If you want to set up the
metadata, you can do it in the way explained in more detail in the mentioned section. You can click the symbol in the upper right corner of the pane, after which the two panes expand to the whole window. The two upper panes
are the Record pane and the Field pane. In the Record pane, there are also Delimiters (for delimited files) or
Sizes (for fixed length files) of the fields or both (for mixed files). In the Field pane, after clicking some of the
fields of the record, you can see the structure of the selected individual field: the Properties of the field along with
their values. Some properties have default values, whereas others do not. Here, too, you can change the Name, Type, Format, Nullable, Default, Delimiter, EOF as delimiter, Size, Autofilling, Locale and Shift properties. (For more details on changing the metadata structure see Section "Metadata Editor".)

In the middle there is the third pane. If you expand it to the whole wizard window, you will see the following:

Figure 8.9. Setting Up Delimited Metadata

In addition to the upper two panes, you can also change some metadata settings in the third pane in the middle. Here you can specify whether the first line of the file contains the names of the record fields; if so, check the Extract names checkbox. You can also click a column header and choose whether to change the name of the field (Rename) or its data type (Retype). If there are no field names in the file, CloverGUI uses Field# as the default field names. By default, the type of all record fields is set to string; you can change it to any other type by selecting the appropriate option from the presented list: boolean, byte, cbyte, date, decimal, integer, long, numeric, string. (For a more detailed description see Section "Data Types".) You must also specify what kind of delimiter is used in the file (Delimiter): a comma, colon, semicolon, space, tabulator, or any sequence of characters. Finally, you can click the Reparse button to see the result of parsing the file again in the pane.

At the bottom of the wizard, the fourth pane displays the data of the file.

If you are creating internal metadata, you only need to click the Finish button. If you are creating external (shared) metadata, click the Next button, then select the folder (meta) and a name for the metadata and click Finish. The .fmt extension will be added to the metadata file name automatically.
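For reference, an external metadata (.fmt) file for a simple delimited record might look like this (the record and field names, types and delimiters here are invented for this example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Record name="customer" type="delimited">
  <Field name="id" type="integer" delimiter=";"/>
  <Field name="name" type="string" delimiter=";"/>
  <Field name="registered" type="date" format="yyyy-MM-dd" delimiter="\n"/>
</Record>
```

Each Field element corresponds to one row of the Record pane described above, with the properties set in the Field pane stored as attributes.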

How to Extract Metadata from Fixed Length Files


After clicking the Next button, you can see more detailed information about the content of the input file and the
delimiters in the Metadata wizard. It consists of four panes: the first two are at the top of the window, the third in the middle, and the fourth at the bottom. Each pane can be expanded to the whole wizard window by clicking the corresponding symbol in its upper right corner.

The first two panes at the top are the panes described in the Section "Metadata Editor". If you want to set up the
metadata, you can do it in the way explained in more detail in the mentioned section. You can click the symbol
in the upper right corner of the pane after which the two panes expand to the whole window. The two upper panes
are the Record pane and the Field pane. In the Record pane, there are also Delimiters (for delimited files) or
Sizes (for fixed length files) of the fields or both (for mixed files). In the Field pane, after clicking some of the
fields of the record, you can see the structure of the selected individual field: the Properties of the field along with
their values. Some properties have default values, whereas others do not. Here, too, you can change the Name, Type, Format, Nullable, Default, Delimiter, EOF as delimiter, Size, Autofilling, Locale and Shift properties. (For more details on changing the metadata structure see Section "Metadata Editor".)

In the middle there is the third pane. If you expand it to the whole window, you will see the following:

Figure 8.10. Setting Up Fixed Length Metadata

You can also click a column header and choose one of the following: Rename, Resize, Retype. To change the name of a column, choose the Rename option. If there are no field names in the file, CloverGUI uses Field# as the default field names. The type of all record fields is set to string by default; to change a data type, choose Retype and select any other type from the presented list: boolean, byte, cbyte, date, decimal, integer, long, numeric, string. (For a more detailed description see Section "Data Types".) You must also change the default sizes of the individual fields (Resize). You may also want to split or merge columns, add one or more columns, or remove columns; and you can change the sizes by dragging the column borders.

At the bottom of the wizard, the fourth pane displays the data of the file.

If you are creating internal metadata, you only need to click the Finish button. If you are creating external (shared) metadata, click the Next button, then select the folder (meta) and a name for the metadata and click Finish. The .fmt extension will be added to the metadata file name automatically.


Extracting Metadata from an XLS File


When you want to extract metadata from an XLS file, you must select XLS file for external (shared) metadata or
Extract from xls file for internal metadata from the context menu.

In the Sheet properties wizard that appears after clicking either of the two mentioned items, you must browse
and locate the desired XLS file and click the Open button.

After that, some of the properties shown in the wizard are filled with values: Sheet name, Metadata row, Sample data row and Encoding. If they are not filled in, you can fill them in yourself. At the bottom you can also see a data preview from the selected XLS file.

You can select the Sheet name. You may want to change the encoding as well.

As regards Metadata row and Sample data row: Metadata row is set to 1 and Sample data row is set to 2 by default. (Sample data row is the row from which data types are extracted; Metadata row is the row that contains the names of the fields. Together they give rise to the metadata description of the file.)

If the XLS file does not contain any row with field names, you should set Metadata row to 0. In that case, the headers or codes of the columns (letters starting from A, etc.) will serve as the names of the fields.

In the case of XLS files, data types are set correctly thanks to the Sample data row, and the formats are likewise set to the right format types.

You can also select the Number of lines in preview. By default it is 100.

As the last step, click either the OK button (when creating internal metadata) or the Next button, then select the location (meta, by default) and choose a name (when creating an external (shared) metadata file). The .fmt extension will be added to the metadata file name automatically.

Figure 8.11. Extracting Metadata from XLS File


Extracting Metadata from a Database


If you want to extract metadata from a database (the Database item for an external (shared) definition or the Extract from database item for an internal definition), you must have a database connection defined before you extract the metadata.

In addition to this, if you want to extract internal metadata from a database, you can also right-click any connection
item in the Outline pane and select New metadata → Extract from database.

Figure 8.12. Extracting Internal Metadata from a Database

After each of these three options, a Database Connection wizard opens.

Figure 8.13. Database Connection Wizard


In order to extract metadata, you must first create a database connection as shown in the corresponding section. Once it has been created, the User, Password and URL fields in the Database Connection wizard are filled in.

Then you must click Next. After that, you can see a database schema.

Figure 8.14. Selecting Columns for Metadata

Now you have two possibilities:

Either you write a query directly, or you generate it by selecting individual columns of the database tables.

If you want to generate the query, hold Ctrl on the keyboard, highlight the individual columns from the individual tables by clicking them and, at the end, click the Generate button. The query will be generated automatically.
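The generated query is essentially a selection of the chosen columns; a query of this kind might look like the following (the table and column names are purely illustrative):

```sql
SELECT customer.id, customer.name, orders.order_date, orders.total
FROM customer, orders
```

You can edit the query text by hand before checking it with the Validate button.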

Figure 8.15. Generating a Query

Once you have written or generated the query, you can check its validity by clicking the Validate button.

Then you must click Next. After that, the Metadata Editor opens, in which you finish the extraction of metadata.

• By clicking the Finish button (in case of internal metadata), you will get internal metadata in the Outline pane.


• On the other hand, if you want to extract external (shared) metadata, you must click the Next button first, after which you will be prompted to decide which project and which subfolder should contain your future metadata file. After expanding the project, selecting the meta subfolder, specifying the name of the metadata file and clicking Finish, the metadata are saved into the selected location.

Creating Metadata from a DBase File


If you want to create metadata from a dBase file, you need two .jar files to do so. Both of them are provided with CloverGUI.

So, if you want to create a dBase file metadata in any of your projects, you must first right-click the project name
in the Navigator pane, click the Properties item from the context menu and click Java Build Path.

Figure 8.16. Original Libraries Tab of Java Build Path

There you must open the Libraries tab, click the Add External JARs... button and locate the two .jar files mentioned above.

The two .jar files are the following:

C:\Users\cloveruser\Desktop\eclipse\plugins\com.cloveretl.gui_2.0.0\lib\lib\cloveretl.engine.jar

C:\Users\cloveruser\Desktop\eclipse\plugins\com.cloveretl.gui_2.0.0\lib\lib\commons-logging.jar

In your case, you may need to change the path to your eclipse folder, which is C:\Users\cloveruser\Desktop in our case. You may also need to change the numbers indicating the CloverGUI release, which are 2.0.0 here.

When you have found these two .jar files, you must add them to the libraries by clicking Open and then OK.


Figure 8.17. Adding the Two Libraries for Extracting Metadata from DBASE File

Then right-click in the Graph Editor and select Run as → Open Run Dialog... from the context menu. In the dialog, collapse all of the graphs, then select and double-click the Java Application item. A new configuration appears. Type dbasefile_metadata as its name in the Main tab (or choose any other name for this Java application).

Figure 8.18. Creating Java Application for Extracting Metadata from DBASE File

Now you only need to click the Search... button, locate the main class and select its name from the list provided
by the Select Main Type wizard.


Figure 8.19. Selecting the Main Class

You must select the DBFAnalyzer name and double-click this item. Then the name
org.jetel.database.dbf.DBFAnalyzer appears in the Main class textarea of the Main tab.

Figure 8.20. Adding the Main Class

Now you must switch to the Arguments tab. There you must type data-in/DBASEFILENAME.DBF and (after
one white space) meta/metadatadbf.fmt. Or you can type any other names depending on what dbase file
you have in the data-in subfolder and what metadata file you want to create. For different dbase files you must
select different metadata files.


Figure 8.21. Adding Arguments

Now, when you click the Run button, you create the metadata file with the help of your dbasefile_metadata
configuration.

Figure 8.22. Configuration for Extracting Metadata from DBASE File Has Been Created

Then, you can do with this metadata file all that has been described above for the other external
(shared) metadata. You can link this external (shared) dBase metadata file to each graph that uses the mentioned
DBASEFILENAME.DBF. You can assign this metadata to edges, you can internalize the file and, if needed, you
can also externalize this metadata again.

This way, CloverGUI helps you create metadata from any dBase file.


Creating Metadata by User


If you want to create metadata yourself (Define by hand for external (shared) definition or User defined for
internal definition), you must do it in the following manner:

After opening the Metadata wizard, you must add a desired number of fields by clicking the plus sign, set up their
names, their data types, their delimiters, their sizes, formats and all that has been described above.

Once you have done all of that, you must click either OK for internal metadata, or Next for external (shared) metadata. In the latter case, you only need to select the location (meta, by default) and a name for the metadata file. When
you click OK, your metadata file will be saved and the extension .fmt will be added to the file automatically.
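For illustration, a hand-written delimited metadata file might look like the following sketch. The record and field names here are invented for the example; treat the exact attribute set as an approximation of what the wizard generates for you:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Record name="employee" type="delimited" fieldDelimiter=";" recordDelimiter="\r\n">
    <Field name="first_name" type="string"/>
    <Field name="last_name" type="string"/>
    <Field name="salary" type="decimal"/>
</Record>
```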

Assigning Metadata to an Edge


When you have created metadata, you must assign them to an edge. You need to right-click the edge, choose the
Select metadata item and select the right metadata item from the metadata list.

Figure 8.23. Assigning Metadata to an Edge

Editing Metadata
When you want to edit already defined metadata, you can do it with the help of the same wizards.

• You can edit both internal and external (shared) metadata by double-clicking an edge. A Metadata editor opens.

• You can edit both internal and external (shared) metadata by using the context menu that has been called out
in the Outline pane by right-clicking the Metadata item.

• You can edit both internal and external (shared) metadata from the context menu that has been called out in the
Graph Editor by right-clicking an edge of some graph.

In all cases, you must then select the Edit item, after which you will work with the same wizard as when extracting
metadata from a file.


• Internal and external (shared) metadata can also be edited as a part of the graph or as a separate XML file,
respectively, if you open them in the Graph Editor.

You must display the source code in the Source tab of the Graph Editor, or, in the Navigator pane, first select
the project folder, expand it, select the metadata folder, select the metadata definition file (with the .fmt
extension) and open it by choosing Open With → Text Editor. The metadata definition file then appears in the
Graph Editor, where you can change its content.

If you select any metadata item in the Outline pane and click Enter, the Metadata editor will open.

Creating Database Table on the Basis of Metadata and Database Connection


As the last option, you can also create a database table on the basis of metadata (both internal and external).

When you select the Create database table item from either of the two context menus (called out from the Outline
pane and/or Graph Editor), a wizard with a SQL query that can create database table opens.

Figure 8.24. Creating Database Table on the Basis of Metadata and Database Connection

You can edit the content of this window if you want.

When you select some connection to a database (for more details see Section "Database Connections"), such
database table will be created.

Metadata Editor
You can also open this editor when selecting some metadata item in the Outline pane and clicking Enter.

Here we will describe the appearance of this Metadata editor.

It consists of two panes - Record pane and Field pane.

In the Record pane, there are also Delimiters (for delimited files) or Sizes (for fixed length files) of the fields
or both (for mixed files).


Remember that you must define mixed metadata by hand, specifying delimiters for some fields and sizes for the others.

In the Field pane, after clicking one of the fields of the record, you can see the structure of the selected individual
field: the Properties of the field along with their values. Some Properties have default values, whereas others
have not. Here you can change the Name, Type, Format, Nullable property, Default, Delimiter, EOF as
delimiter, Size, Autofilling, Locale and Shift.

Note that field delimiters are mostly the same for all of the record fields. Thus, the delimiter located on the first
row is the default field delimiter that is used by default for all fields except the last one; the delimiter of the last
field is the record delimiter itself. The default field delimiters, as well as the record delimiter, are displayed in gray.

Nevertheless, you can set delimiters for all fields (including the record delimiter) to any other values.

Note that the delimiter on the first row is not the record delimiter; it is the default field delimiter. The record
delimiter can be seen on the last row and is displayed in gray as well. The record delimiter is set to \r\n by default,
but you can select any other record delimiter.

Now we will explain how the system of delimiters in the Metadata editor should be understood:

The numbers in the first column of the Metadata editor are the numbers of the individual record fields. The field
name that corresponds to each number is displayed to the right of it, and the delimiter that corresponds to each
field is displayed to the right of the field name. In other words, the delimiter shown on a row follows the named
field within the record structure in exactly the same way as it follows the field name on that row of the editor.

One more important point: the delimiter on the last row follows the last field within the record structure, and this
is the record delimiter. The record delimiter is the delimiter that follows the field on the last row of the Metadata
editor.

If you want to change the record delimiter to any other value, click the second column in the first row of the
Metadata editor. It contains the following label in bold: Record: recordname. When you click this label, it changes
to recordname on a blue background, and on the right side, in the Field pane, you can find the following
properties: Default delimiter, Record delimiter, Name, Preview Attachment Metadata Row, Preview At-
tachment Sample Data Row, Preview Charset, Preview attachment, Skip first line, Type (type of metadata
- delimited, fixed or mixed) and Locale. There you can change the Record delimiter property to any value.

Below you can see an example of delimited metadata and another one of fixed-length metadata. Mixed metadata
would be a combination of both cases: for some fields a delimiter would be defined and no size specified, whereas
for others a size would be defined and no delimiter specified. Such metadata must be created by hand.


Figure 8.25. Metadata Editor for a Delimited File

Figure 8.26. Metadata Editor for a Fixed Length File

Record Pane
On the left side of the wizard, to the left of the Record pane, there are six buttons (from top to bottom) for
adding or removing fields and for moving fields to top, up, down or bottom. Above these buttons, there are two
arrows (for undoing and redoing, from left to right).


In addition, each column of the Record pane can be sorted in ascending or descending order by simply clicking
its header.

Field Names
There you can type the names of the fields. Every name of the field is the same field name as in the Field pane.
By changing this name and clicking the Enter button, you are changing the same field name in the Field pane as
well. We suggest you only use the following characters for the field names: [a-zA-Z0-9_].

Data and Record Types


There you can select some type of data. Every type of data is the same data type as in the Field pane. By changing
its value and clicking the Enter button, you are changing the same value of the data type in the Field pane as well.

Data Types
• Boolean. This data type can have values either true or false.

• Byte. This data type is an array of bytes. Each byte has values from -128 to 127.

• CByte. This is a compressed array of bytes. Each byte has values from -128 to 127.

• Date. This data type serves to designate date. Its size is 8 bytes.

• Decimal. This data type is defined by precision and scale. Precision is the maximum total number of digits
contained in this numeric data type. Scale is the number of digits after the decimal point. Thus, data type
decimal(6,2) can have values from -9999.99 to 9999.99. Its size depends on its precision.

• Integer. This data type can have values from -2^31 to 2^31-1. Its size is 4 bytes.

• Long. This data type can have values from -2^63 to 2^63-1. Its size is 8 bytes.

• Numeric. This data type can have the following values: 0, negative values from -(2-2^-52)·2^1023 to -2^-1074
and positive values from 2^-1074 to (2-2^-52)·2^1023. Its size is 8 bytes.

• String. This data type is a sequence of characters. Every data type can be converted to a string in a simple way.
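The decimal and numeric ranges above can be checked programmatically. The following Python sketch (an illustration only, not CloverGUI code) verifies that decimal(6,2) holds values up to 9999.99 and that the numeric range quoted above is exactly the range of an 8-byte IEEE 754 double:

```python
import sys
from decimal import Decimal

def fits_decimal(value, precision, scale):
    """True if value fits a decimal(precision, scale) field:
    at most `precision` digits in total, at most `scale` after the point."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    return len(digits) <= precision and -exponent <= scale

# decimal(6,2) holds values from -9999.99 to 9999.99
assert fits_decimal("9999.99", 6, 2)
assert not fits_decimal("10000.00", 6, 2)   # 7 digits in total

# the numeric type's largest positive value is (2 - 2**-52) * 2**1023,
# which is exactly the maximum of an 8-byte IEEE 754 double
assert sys.float_info.max == (2 - 2**-52) * 2**1023
# the smallest positive (subnormal) double is 2**-1074
assert 5e-324 == 2**-1074
```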

Record Types
• Delimited. This is the format of records in which every two adjacent fields are separated from each other by
a delimiter.

• Fixed. This is the format of records in which every field has a defined size.

• Mixed. This is the format of records in which some fields are separated from each other by delimiters
whereas the other fields have defined sizes. It is a mixture of the two cases above.
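To make the difference between the record types concrete, the following Python sketch (an illustration, not CloverGUI code) parses the same three-field record from a delimited and from a fixed-length representation:

```python
def parse_delimited(record, delimiter=";"):
    """Delimited format: adjacent fields are separated by a delimiter."""
    return record.split(delimiter)

def parse_fixed(record, sizes):
    """Fixed format: every field has a defined size in characters."""
    fields, pos = [], 0
    for size in sizes:
        fields.append(record[pos:pos + size].rstrip())
        pos += size
    return fields

assert parse_delimited("John;Smith;1200") == ["John", "Smith", "1200"]
assert parse_fixed("John  Smith 1200", [6, 6, 4]) == ["John", "Smith", "1200"]
```

A mixed record would use a delimiter for some fields and a size for the others, which is why it has to be described field by field.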

Delimiters
There you can select delimiters separating two adjacent fields (pipe, comma, semicolon, colon, tabulator, line
feed, carriage return, carriage return plus line feed, etc.). For either field, it is the same delimiter as in the Field
pane. By changing its value and clicking the Enter button, you are changing the value of the delimiter in the
Field pane as well.

Sizes
There you can type the sizes of the fields in characters. These sizes are the same as in the Field pane. For either
field, it is the same size as in the Field pane. By changing its value and clicking the Enter button, you are changing
the value of the size in the Field pane as well.


Field Pane
On the right side of the wizard, there is a Field pane. In this pane, the following properties can be set up:

• Default. This is the default value of the field. It is used if you set the Autofilling property to default_value.

• Delimiter. This is the same field delimiter as in the Record pane. By changing its value and clicking the Enter
button, you are changing the same value of the delimiter in the Record pane as well.

• Name. This is the same field name as in the Record pane. By changing this name and clicking the Enter button,
you are changing the same field name in the Record pane as well.

• Nullable. This can be true or false. If it is set to true, the field value can be null. It is set to true by default,
i.e. fields are nullable by default.

• Size. This is the same size as in the Record pane. By changing its value and clicking the Enter button, you are
changing the same value of the size in the Record pane as well.

• Type. This is the same data type as in the Record pane. By changing its value and clicking the Enter button,
you are changing the same value of the data type in the Record pane as well.

• Autofilling. From this list, you can select one of the functions that will be used to fill the field by components
when they read data records. Autofilling cannot be used on the edges following the XMLExtract and
CloverDataReader components.

• default_value. This function fills the specified record fields of corresponding data type by the value
specified as the Default property.

• global_row_count. This function fills the specified record fields of any numeric data type in more edges
sequentially, in the order in which data records are sent out through the output ports. The numbering starts
from 0. However, if data records are read from more data sources, the numbering goes continuously through
all data sources. On the other hand, if some edge does not include such a field, corresponding numbers will
be omitted. The others will be written to the specified fields.

• source_row_count. This function fills the specified record fields of any numeric data type in more edges
sequentially, in the order in which data records are sent out through the output ports. If data records are read
from more data sources, the numbering starts from 0 for each data source. And, if some edge does not include
such a field, corresponding numbers will be omitted. The others will be written to the specified fields.

• metadata_row_count. This function fills the specified record fields of any numeric data type for one
metadata sequentially, in the order in which data records are sent out through the output ports. The numbering
starts from 0. However, if data records are read from more data sources, the numbering goes continuously
through all data sources.

• metadata_source_row_count. This function fills the specified record field of any numeric data type
for one metadata sequentially, in the order in which data records are sent out through the output ports. If data
records are read from more data sources, the numbering starts from 0 for each data source.

• source_name. This function fills the specified record fields of string data type by the name of data source
from which records are read.

• source_timestamp. This function fills the specified record fields of date data type by the timestamp
corresponding to the data source from which records are read.

• source_size. This function fills the specified record fields of any numeric data type by the size of data
source from which records are read.

• EOF as delimiter. This can be set to true or false according to whether the EOF character is used as a delimiter. It
can be useful when your file does not end with any other delimiter. If you did not set this property to true, running
the graph with such a data file would fail. By default it is false.


• Format. This is a description of the format. For example, the data type of date field can have the format dd/
MM/yyyy or dd.MM.yyyy. The integer numbers have format #, etc.

• Locale. This property can be set up according to the localization of your computer. It can be useful for date
formats or for decimal separator, for example.

• Shift. This is the gap between the end of one field and the start of the next one when the fields are part of fixed
or mixed record and their sizes are set to some value.
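The difference between the row-counting autofilling functions above can be sketched as follows. This is a Python illustration of the numbering only; the per_source flag mimics the contrast between global_row_count (continuous numbering through all data sources) and source_row_count (numbering restarts for each data source):

```python
def number_records(sources, per_source=False):
    """Assign row numbers to records read from several data sources.

    per_source=False mimics global_row_count: the numbering runs
    continuously through all sources, starting from 0.
    per_source=True mimics source_row_count: the numbering restarts
    from 0 for each data source."""
    numbered, counter = [], 0
    for source in sources:
        if per_source:
            counter = 0
        for record in source:
            numbered.append((counter, record))
            counter += 1
    return numbered

files = [["a", "b"], ["c", "d", "e"]]
assert number_records(files) == [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
assert number_records(files, per_source=True) == [(0, "a"), (1, "b"), (0, "c"), (1, "d"), (2, "e")]
```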

Filter Textarea
In the Filter textarea, you can type any expression you want to search for among the fields of the Record pane.
Note that the search is case sensitive.

Dynamic Metadata
In addition to all other metadata created or extracted using CloverGUI, you can also write a metadata definition
in the Source tab of the Graph Editor pane. Unlike metadata defined in the GUI, metadata written in the
Source tab cannot be edited in the GUI.

To define the metadata in the Source tab, open this tab and write there the following:

<Metadata id="YourMetadataId" connection="YourConnectionToDB" sqlQuery="YourQuery"/>

Select any expression for YourMetadataId, type your DB connection that should be used to connect to DB as
YourConnectionToDB and type the query that will be used to extract data from DB as YourQuery.

If you want to speed up the run of your graph, you can also append "where 1=0" to your query (or "and
1=0" if the query already ends with another "where ..." expression).

This way, only metadata will be extracted and no data will be read. Remember that such metadata cannot be created
in the GUI and will only be generated at runtime.
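The "1=0" trick works because the condition is always false: the database returns the structure of the result set (column names and types) without fetching any rows. A sketch of how such a query could be modified (a Python illustration with invented table names; the naive "where" detection shown here would not handle subqueries):

```python
def metadata_only(query):
    """Append an always-false condition so the query returns no rows."""
    if " where " in query.lower():
        return query + " AND 1=0"
    return query + " WHERE 1=0"

assert metadata_only("SELECT id, name FROM customers") == \
    "SELECT id, name FROM customers WHERE 1=0"
assert metadata_only("SELECT id FROM orders WHERE amount > 10") == \
    "SELECT id FROM orders WHERE amount > 10 AND 1=0"
```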

Chapter 9. Database Connections
If you want to parse data, you need to have some sources of data. Sometimes you get data from files, in other
cases from databases or other data resources.

Now we will describe how you can work with the resources that are not files. In order to work with them, you need
to make a connection to such data resources. For now, we will describe only how to work with databases; some of
the more advanced data resources using connections will be described later.

When you want to work with databases, you can do it in one of two ways: either you have a client on your
computer that connects to a database located on some server by means of a client utility, or you use a JDBC
driver. Here we will describe the database connections that use JDBC drivers. The other way (client-server
architecture) will be described later when we talk about components.

As in the case of metadata, database connections can be internal or external (shared). You can create them in
two ways.

Internal Database Connections


As mentioned above about metadata, internal database connections are also a part of a graph: they are contained in
it and can be seen in its source tab. This property is common to all internal structures.

How You Can Create Internal Database Connections


If you want to create an internal database connection, you must do it in the Outline pane by selecting the Con-
nections item, right-clicking this item, selecting Connections → Create internal.

Figure 9.1. Creating Internal Database Connection

A Database connection wizard opens. (You can also open this wizard when selecting some DB connection item
in the Outline pane and clicking Enter.)

In the Database connection wizard, you must specify the name of the connection, type your username, your
access password and URL of the database connection (hostname, database name or other properties). You also
decide whether you want to encrypt the access password by checking the checkbox. And you need to select the
JDBC specific property. You can also use the default one; however, it may not do all that you want.

Figure 9.2. Database Connection Wizard

To select a driver, you must click one of the available drivers in the list. If you do not yet have the
desired JDBC driver in the list, you must load it by clicking the Plus sign located on the right side of
the wizard (Load driver from JAR). The result can look as follows:

Figure 9.3. Adding a new JDBC Driver into the List of Available Drivers

If necessary, you can also add another JAR to the driver classpath (Add JAR to driver classpath). For example,
some databases may need their license file to be added.

You can also add some property (Add user-defined property).

Note that you can also remove some driver from the list (Remove selected) by clicking the Minus sign.


CloverGUI already provides two built-in JDBC drivers that are displayed in the list of available drivers. They are
the JDBC drivers for MySQL and PostgreSQL databases.

You can choose some JDBC driver from the list of available drivers. By clicking any of them, connection string
hint appears in the URL textarea. You only need to modify the connection.

Once you have selected the driver from the list, you only need to type your username and password for connecting
to the database. You also need to replace "hostname" with the correct host name and type the correct database
name instead of the word "database". Some other drivers provide different URLs that must be changed in a different
way. You can also load an existing connection from one of the existing configuration files. And you can set
up the JDBC specific property.

Figure 9.4. Defining Internal Database Connection

When all has been done, you can validate your connection by clicking the Validate connection button.

After clicking Finish, your internal database connection has been created.


Externalizing Internal Database Connections


Once you have created an internal database connection as a part of a graph, so that it is contained and visible in
the graph, you may want to convert it into an external (shared) database connection. Thus, you would be able to
use the same database connection for more graphs (more graphs would share the connection).

You can externalize internal database connection into external (shared) one by right-clicking some of the internal
database connection items in the Outline pane, clicking Externalize connection from the context menu, select-
ing the project you want to add the database connection into, expanding that project, selecting the conn folder,
renaming the configuration file, if necessary, and clicking Finish.

Figure 9.5. Externalizing Internal Database Connection

After that, the internal file disappears from the Outline pane connections folder, but, at the same location, a newly
created configuration file appears.

The same configuration file appears in the conn subfolder in the Navigator pane.


External (Shared) Database Connections


As mentioned above, external (shared) database connections are connections that serve more than one graph.
They are stored outside the graph, and that is why more graphs can share them.

How You Can Create External (Shared) Database Connections
If you want to create an external (shared) database connection, you must do it by selecting File → New → Other...

Figure 9.6. Creating External (Shared) Database Connection

Then you must expand the CloverETL item and either click the Database connection item and Next, or dou-
ble-click the Database Connection item.


Figure 9.7. Selecting Database Connection Item

After that, a Database connection wizard opens. (You can also open this wizard when selecting some DB con-
nection item in the Outline pane and clicking Enter.)

In the Database connection wizard, you must specify the name of the connection, type your username, your
access password and URL of the database connection (hostname, database name or other properties). You also
decide whether you want to encrypt the access password by checking the checkbox. And you need to select the
JDBC specific property.

Figure 9.8. Database Connection Wizard

To select a driver, you must click one of the available drivers in the list. If you do not yet have the
desired JDBC driver in the list, you must load it by clicking the Plus sign located on the right side of
the wizard (Load driver from JAR). The result can look as follows:


Figure 9.9. Adding a new JDBC Driver into the List of Available Drivers

If necessary, you can also add another JAR to the driver classpath (Add JAR to driver classpath). For example,
some databases may need their license file to be added.

You can also add some property (Add user-defined property).

Note that you can also remove some driver from the list (Remove selected) by clicking the Minus sign.

CloverGUI already provides two built-in JDBC drivers that are displayed in the list of available drivers. They are
the JDBC drivers for MySQL and PostgreSQL databases.

You can choose some JDBC driver from the list of available drivers. By clicking any of them, connection string
hint appears in the URL textarea. You only need to modify the connection.

Once you have selected the driver from the list, you only need to type your username and password for connecting
to the database. You also need to replace "hostname" with the correct host name and type the correct database
name instead of the word "database". Some other drivers provide different URLs that must be changed in a different
way. You can also load an existing connection from one of the existing configuration files. And you can set
up the JDBC specific property.


Figure 9.10. Defining External (Shared) Database Connection

When all has been done, you can validate your connection by clicking the Validate connection button.

Then you only need to click the Next button and select the folder for your connection configuration file.

Figure 9.11. Selecting a Folder for External (Shared) Database Connection

After clicking Finish, your external (shared) database connection has been created.


Linking External (Shared) Database Connection


If you want to link an already existing external (shared) database connection, you must do it in the Outline pane
by selecting the Connections item, right-clicking this item, selecting Connections → Link shared connection.
Then you must select some of the existing configuration files (extension .cfg).

Internalizing External (Shared) Database Connections


Once you have created and linked an external (shared) database connection, you may want to convert it into an
internal database connection in case you want to put this connection into the graph. Thus, you would be able to see
its structure in the graph itself.

You can internalize external (shared) configuration file into internal database connection by right-clicking some
of the external (shared) database connections items in the Outline pane and clicking Internalize connection from
the context menu.

Figure 9.12. Internalizing External (Shared) Database Connection

After that, the external (shared) database connection item disappears from the Outline pane connections folder,
but, at the same location, a newly created internal database connection item appears.

However, the original external (shared) configuration file still remains in the conn subfolder in the Navigator
pane.

Browsing Database and Extracting Metadata from Database Tables
As you could see above (in the sections "Externalizing Internal Database Connections" and "Internalizing External
(Shared) Database Connections"), in both of these cases the context menu contains two interesting items: the
Browse database and New metadata items. They give you the opportunity to browse a database (if your connection
is valid) and/or extract metadata from a selected database table. Such metadata will be internal only,
but you can externalize and/or export it.


Encrypting the Access Password


If you do not encrypt your access password, it remains stored and visible in the configuration file (shared
connection) or in the graph itself (internal connection). Thus, the access password can be seen in one of these two
locations.

This would not present any problem if you were the only one with access to your graph and your computer.
However, the password provides access to the whole database. So, if you want or need to give someone any of
your graphs, you must not also hand over the access password to the whole database. This is the reason why it is
important to encrypt your access password: without doing so, you would run a great risk of an intrusion into your
database, or of some other damage, from whoever obtained this access password.

With the access password encrypted, you can give the graph to another person without enabling them to change
your database without your permission.

In order to hide your access password, you must select Encrypt password by checking the checkbox in the
Database connection wizard, type a new (encrypting) password that encrypts the original access password, and
click the Finish button.

After that, you will no longer be able to run such a graph by choosing Run as → CloverETL graph. To run the
graph, you must use the Open Run Dialog wizard. There, in the Main tab, you must type or find by browsing
the name of the project, its graph name and parameter file and, most importantly, type the encrypting password
into the Password textarea. The access password cannot be read now; it has already been encrypted and can be
seen neither in the configuration file nor in the graph.

Figure 9.13. Running a Graph with the Password Encrypted

If you want to get back to your original access password, you can do so by typing the encrypting password into the
Database connection wizard and clicking Finish.

Chapter 10. Lookup Tables
When you are working with CloverGUI, you can also create and use Lookup Tables. These tables are data struc-
tures that allow fast access to stored data using known key or SQL query. This way you can reduce the need to
browse database or data files.

Creating Lookup Tables


If you want to create a lookup table, you must do it in the Outline pane by selecting the Lookups item, right-
clicking this item, selecting Lookup tables → Create lookup table. A Lookup table wizard opens. After selecting
the lookup table type and clicking Next, you can specify the properties of the selected lookup table.

(You can also open this wizard when selecting some lookup item in the Outline pane and clicking Enter.)

Figure 10.1. Lookup Table Wizard


Simple Lookup Table


In the Simple lookup table wizard, you must set up the demanded properties:

In the Table definition tab, you must give a Name to the lookup table, select the corresponding Metadata and
the Key that should be used to look up data records from the table. You can select Charset and the Initial size of
the lookup table (512 by default) and decide whether Byte mode should be used.

Figure 10.2. Simple Lookup Table Wizard

After clicking the button on the right side of the Key area, you will be presented with the Edit key wizard,
which helps you select the Key.

Figure 10.3. Edit Key Wizard

By highlighting some of the field names in the Field pane and clicking the Right arrow button you are moving
such a field name into the Key parts pane. You can move more fields into the Key parts pane. You can also
change the position of any of them in the list of the Key parts by clicking the Up or Down buttons. The key parts
that are higher in the list have higher priority. When you have finished, you only need to click OK. (You can also
remove any of them by highlighting it and clicking the Left arrow button.)
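Conceptually, a simple lookup table behaves like an in-memory map keyed by the selected key parts, in their priority order. With two key parts, the lookup can be sketched like this (a Python illustration with invented data, not CloverGUI's API):

```python
# key parts in priority order: (last_name, first_name)
table = {
    ("Smith", "John"): {"salary": 1200},
    ("Brown", "Anna"): {"salary": 1500},
}

def lookup(last_name, first_name):
    """Fast keyed access instead of scanning a file or a database."""
    return table.get((last_name, first_name))

assert lookup("Smith", "John") == {"salary": 1200}
assert lookup("Smith", "Jane") is None   # no such record
```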

In the Data source tab, you can either locate the file URL, or type or paste some data into the Data textarea.


Figure 10.4. Simple Lookup Table Wizard with File URL

Figure 10.5. Simple Lookup Table Wizard with Data

You can click the Edit data button to change the data.

Figure 10.6. Changing Data

When everything is done, click OK and then Finish.

Database Lookup Table


When creating or editing a Database lookup table, check the Database lookup radio button and click
Next. (See Figure "Lookup Table Wizard".)

Figure 10.7. Database Lookup Table Wizard

Then, in the Database lookup table wizard, you must give a Name to the lookup table and specify its
Metadata and DB connection.

You can also check the Store negative response option. (If a key value is not found in the table, this negative
result is stored and the value will not be searched for again.) You can also set a Max cache size value.

You must also type or edit the SQL query that is used to look up data records in the lookup table. (This query
corresponds to the key used in a Simple lookup table.) If you want to edit the query, click the Edit button; if
your database connection is valid and working, the Query editor wizard opens, where you can browse the
database, generate a query, validate it and view the resulting data.

Figure 10.8. Query Editor Wizard
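Such a lookup query typically selects a record by a key value supplied at lookup time; in JDBC-based tools the key is usually bound through a `?` placeholder. A purely illustrative sketch (the table and column names are invented, not taken from this guide):

```sql
-- Look up one employee record by its id; the ? placeholder is
-- filled with the key value at lookup time (illustrative example).
SELECT id, first_name, last_name, department
FROM employee
WHERE id = ?
```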

Now, you can click OK and then Finish.

Range Lookup Table


You can create a Range lookup table only if some fields of the records form ranges, that is, the fields come in
pairs of the same data type, one holding the start and the other the end of a range. You can see an example below:

Figure 10.9. Appropriate Data for Range Lookup Table
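The idea behind a range lookup can be sketched generically (this is a plain illustration, not CloverEngine code; the field names start, end and label are invented):

```python
def range_lookup(table, value):
    """Return the first record whose [start, end] range contains value.

    `table` is a list of dicts whose 'start' and 'end' fields have the
    same data type as `value` (hypothetical structure, illustration only).
    """
    for record in table:
        if record["start"] <= value <= record["end"]:
            return record
    return None

# Example: age bands forming non-overlapping ranges
bands = [
    {"start": 0,  "end": 17,  "label": "minor"},
    {"start": 18, "end": 64,  "label": "adult"},
    {"start": 65, "end": 120, "label": "senior"},
]
print(range_lookup(bands, 30)["label"])  # adult
```

Whether a boundary value such as 18 matches depends on whether the start and end fields are included in the range, which is exactly the choice the wizard described below lets you make.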

When you create a Range lookup table, check the Range lookup radio button and click Next. (See
Figure "Lookup Table Wizard".)

Figure 10.10. Range Lookup Table Wizard

Then, in the Range lookup table wizard, you must give a Name to the lookup table and specify its Metadata.

You can select Charset and decide whether Byte mode and/or Internationalization should be used.

When you click the Edit button, the following wizard opens:

Figure 10.11. Define Range Lookup Table Key Wizard

There you can see two panes, Fields on the left and Ranges definition on the right.

Now select the End fields and assign them to the Start fields by dragging the selected fields from the left pane
and dropping them onto the End fields column in the right pane.

You can also use the buttons on the right side of the wizard.

Figure 10.12. Assigning End Fields to Start Fields

You must also decide whether each start or end field should be included in the range or not. You can do this by
selecting the field in the corresponding column of the wizard and clicking it.

After that, you only need to click OK and then Finish.

Chapter 11. Parameters
When you work with your graphs, you sometimes need to create parameters. Like metadata and connections,
parameters can be either internal or external (shared). Parameters simplify your work with graphs: every value,
number, path, filename, attribute, etc. can be set or changed with their help. Parameters are similar to named
constants: they are stored in one place, and when the value of one of them changes, the new value is used
throughout the graph.

Internal Parameters
Internal parameters are stored in the graph itself and can be seen there. If you expect to change parameter values,
external (shared) parameters are more convenient; if you want to give someone a self-contained graph, internal
parameters are more convenient. The same applies to metadata and connections.

How You Can Create Internal Parameters


To create internal parameters, right-click the Parameters item in the Outline pane and select Parameters →
Create internal parameter. A Graph parameters wizard appears; set the names and values and click Finish.

Figure 11.1. Creating Internal Parameters

Externalizing Internal Parameters


Once you have created internal parameters as part of a graph, you may want to convert them into external
(shared) parameters. This way, multiple graphs can share the same parameters.

You can externalize a single internal parameter by right-clicking its item in the Outline pane, clicking
Externalize parameters in the context menu, selecting the project you want to add the parameter file to,
expanding that project and clicking OK.

Usually, however, a graph contains more than one internal parameter, and we suggest you externalize all of
them into a single external (shared) parameter file.

To do so, first select them all; you can do this by holding down the Shift key and pressing the Up arrow or
Down arrow key. Then open the context menu by right-clicking the selected items, click the Externalize
parameters item, select the project you want to add the parameter file to, expand the project and click OK.

Figure 11.2. Externalizing Internal Parameters

After that, the internal parameters disappear from the Parameters folder of the Outline pane and the newly
created parameter file appears in their place.

The same parameter file appears in the project folder in the Navigator pane. The extension .prm is added to the
parameter file automatically.

External (Shared) Parameters


External (shared) parameters are stored outside the graph, in a separate file within the project folder. If you
expect to change parameter values, external (shared) parameters are more convenient; but if you want to give
someone a self-contained graph, internal parameters are more convenient. The same applies to metadata and
connections.

How You Can Create External (Shared) Parameters


To create external (shared) parameters, select File → New → Other, then expand the CloverETL item and click
the Graph parameter file item. (See Section "How You Can Create External (Shared) Metadata".) Then click
Next; a Graph parameters wizard appears, in which you create the parameters.

Linking External (Shared) Parameters


To link an already existing external (shared) parameter file, right-click the Parameters item in the Outline
pane and select Parameters → Link parameter file. (See Section "How You Can Create Internal Parameters".)
Then select one of the existing parameter files (extension .prm).

Internalizing External (Shared) Parameters


Once you have created and linked external (shared) parameters, you may want to convert them into internal
parameters, putting them into the graph so that their structure is visible in the graph itself. Note that a parameter
file containing multiple parameters will create multiple internal parameters.

You can internalize an external (shared) parameter file into internal parameters by right-clicking its item in the
Outline pane and clicking Internalize parameters in the context menu.

After that, the external (shared) parameters item disappears from the Parameters folder of the Outline pane and
the newly created internal parameter items (usually more than one) appear in its place.

However, the original external (shared) parameter file still exists in the project folder in the Navigator pane.

Figure 11.3. Internalizing External (Shared) Parameter

Parameters Wizard
(You can also open this wizard by selecting a parameters item in the Outline pane and pressing Enter.)

Each click on the Plus button on the right side adds a new line with "name" and "value" labels to the wizard,
and you must set both the name and the value. To do so, highlight either of them by clicking it and change it to
whatever you need. When you have set all the names and values you want, click the Finish button (for internal
parameters), or click the Next button and type the name of the parameter file. The extension .prm will be added
to the file automatically.

You also need to select the location for the parameter file within the project folder. Then click the Finish
button and the file will be saved.

Figure 11.4. Example of a Parameter-Value Pair

Using Parameters
When you have defined, for example, a parameter named db_table whose value is employee (the name of
a database table, as above), you can simply use ${db_table} instead of employee wherever you refer to
this database table.
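For illustration, a parameter file is essentially a list of name=value pairs; the names and values below are hypothetical, not part of any shipped project:

```properties
# example.prm (illustrative example)
db_table=employee
DATA_DIR=${PROJECT}/data-in
```

A component attribute could then contain, for example, the query select * from ${db_table}, and changing the value of db_table in this one file would redirect every component that uses it.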

Chapter 12. Sequences
CloverGUI contains a tool designed to create sequences of numbers that can be used, for example, for numbering
records: a new field is created in the records and filled with numbers taken from the sequence.

Creating a Sequence
To create a sequence, right-click the Sequence item in the Outline pane and choose Sequence
→ Create sequence from the context menu. After that, a Sequence wizard appears. There you must type the
name of the sequence, select the value of its first number, the incrementing step (in other words, the difference
between every pair of adjacent numbers), the number of precomputed values that you want to be cached and,
finally, the name of the sequence file where the numbers should be stored. The name can be, for example,
${SEQ_DIR}/sequencefile.seq or ${SEQ_DIR}/anyothername. Note that we are using the
SEQ_DIR parameter defined in the workspace.prm file, whose value is ${PROJECT}/seq; PROJECT
is another parameter holding the path to your project in the workspace.
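To illustrate how the two parameters resolve (the project path below is hypothetical):

```properties
# workspace.prm (illustrative values)
PROJECT=/home/user/workspace/myproject
SEQ_DIR=${PROJECT}/seq

# ${SEQ_DIR}/sequencefile.seq therefore resolves to:
#   /home/user/workspace/myproject/seq/sequencefile.seq
```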

Figure 12.1. Creating a Sequence

Editing a Sequence
To edit an existing sequence, select the sequence name in the Outline pane, open the context menu by
right-clicking this name and select the Edit item. A Sequence wizard appears. (You can also open this wizard
by selecting a sequence item in the Outline pane and pressing Enter.)

This wizard differs from the one described above by a new textarea showing the current value of the sequence
number, taken from the sequence file. You can change any of the sequence properties, and you can reset the
current value to its original value by clicking the button.

Figure 12.2. Editing a Sequence

When the graph was run once again, the same sequence started from 1001:

Figure 12.3. A New Run of the Graph with the Previous Start Value of the Sequence

You can also see how the sequence numbers fill one of the record fields.

Appendix C. JMS Connections
For receiving JMS (Java Message Service) messages you need JMS connections. Like metadata, parameters and
database connections, these can be internal or external (shared).

Internal JMS Connections


As with the other tools mentioned above (metadata, database connections and parameters), internal JMS
connections are part of a graph: they are contained in it and can be seen in its Source tab. This property is
common to all internal structures.

How You Can Create Internal JMS Connections


To create an internal JMS connection, right-click the Connections item in the Outline pane and select
Connections → JMS internal connection. An Edit JMS connection wizard opens, in which you define the
JMS connection. Its appearance and the way you set up the connection are described below.

Externalizing Internal JMS Connections


Once you have created an internal JMS connection as part of a graph, you may want to convert it into an
external (shared) JMS connection. This way, multiple graphs can share the same JMS connection.

You can externalize an internal JMS connection by right-clicking its item in the Outline pane, clicking
Externalize connection in the context menu, selecting the project you want to add the JMS connection to,
expanding that project, selecting the conn folder, renaming the configuration file if necessary, and clicking
Finish.

After that, the internal item disappears from the Connections folder of the Outline pane and the newly created
configuration file appears in its place.

The same configuration file appears in the conn subfolder in the Navigator pane.

External (Shared) JMS Connections


As mentioned above, external (shared) JMS connections serve more graphs than one. They are stored outside
the graph, which is why multiple graphs can share them.

How You Can Create External (Shared) JMS Connections


To create an external (shared) JMS connection, select File → New → Other..., expand the CloverETL item
and either click the JMS connection item and then Next, or double-click the JMS connection item. (See
Section "How You Can Create External (Shared) Metadata".) An Edit JMS connection wizard opens.

Linking External (Shared) JMS Connection


To link an already existing external (shared) JMS connection, right-click the Connections item in the Outline
pane and select Connections → JMS shared connection. (See Section "How You Can Create Internal Database
Connection".) Then select one of the existing configuration files (extension .cfg).

Internalizing External (Shared) JMS Connections


Once you have created and linked an external (shared) JMS connection, you may want to convert it into an
internal JMS connection, putting it into the graph so that its structure is visible in the graph itself.

You can internalize an external (shared) configuration file into an internal JMS connection by right-clicking the
corresponding JMS connection item in the Outline pane and clicking Internalize connection in the context
menu.

After that, the external (shared) JMS connection item disappears from the Connections folder of the Outline
pane and a newly created internal JMS connection item appears in its place.

However, the original external (shared) configuration file still remains in the conn subfolder in the Navigator
pane.

Edit JMS Connection Wizard


As you can see, the Edit JMS connection wizard contains eight textareas that must be filled in: Name, Initial
context factory class, Libraries, URL, Connection factory JNDI name, Destination JNDI name, User and
Password (the password to receive and/or produce the messages).

(You can also open this wizard by selecting a JMS connection item in the Outline pane and pressing Enter.)

Figure C.1. Edit JMS Connection Wizard

In the Edit JMS connection wizard, you must specify the name of the connection (Name), the Initial context
factory class, the necessary libraries (you can add them by clicking the Plus button), the URL of the connection,
the Connection factory JNDI name, the Destination JNDI name, your authentication username (User) and
your authentication password (Password). You can also decide whether you want to encrypt this authentication
password, by checking the Encrypt password checkbox. If you are creating an external (shared) JMS
connection, you must also select a filename and location for it.
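For orientation, the wizard might be filled in as follows for an Apache ActiveMQ broker; all values are illustrative and depend entirely on your messaging provider:

```text
Name:                          myJmsConnection
Initial context factory class: org.apache.activemq.jndi.ActiveMQInitialContextFactory
URL:                           tcp://localhost:61616
Connection factory JNDI name:  ConnectionFactory
Destination JNDI name:         dynamicQueues/ordersQueue
User:                          jmsuser
Password:                      ********
```

With ActiveMQ, the dynamicQueues/ prefix asks the JNDI provider to create the queue lookup on the fly; other brokers use their own JNDI naming conventions.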

Encrypting the Authentication Password


If you do not encrypt your authentication password, it remains stored and visible in the configuration file
(shared connection) or in the graph itself (internal connection). Thus, the authentication password can be seen
in one of these two locations.

This would not be a problem if you were the only one with access to your graph, or if you had your computer
to yourself. Nor would it be a problem if the password did not grant the right to receive and/or send the
messages. But it does grant such a right.

So, if you need to give someone any of your graphs, you must not give them the authentication password. This
is why it is important to encrypt it: otherwise you would risk intrusion or other damage from anyone who
obtained the authentication password.

With the password encrypted, you can safely give the graph away: nobody will be able to receive and/or
produce the messages without your permission.

To hide your authentication password, check the Encrypt password checkbox in the Edit JMS connection
wizard, type a new (encrypting) password that encrypts the original authentication password, and click the
Finish button.

After that, you can no longer run the graph by choosing Run as → CloverETL graph. Instead, you must use
the Open Run Dialog wizard. There, in the Main tab, type or browse for the name of the project, the graph
name and the parameter file and, most importantly, type the encrypting password into the Password textarea.
The authentication password can no longer be read: it has been encrypted and can be seen neither in the
configuration file nor in the graph.

If you want to restore your authentication password, type the encrypting password into the Edit JMS
connection wizard and click Finish.

Part III. Components Guide
Chapter 13. Introduction to
Components
In the Palette of Components of the Graph Editor, all components are divided into 5 groups: Readers, Writers,
Transformers, Joiners and Others. We will describe each group step by step. One more category is now called
Deprecated; it should not be used any more and we do not describe its components.

So far we have talked about how to paste components into graphs. We will now discuss the properties of
components and the manner of configuring them. You can configure the properties of any graph component in
any of the following ways:

• You can simply double-click the component in the Graph Editor.

• You can do it by clicking the component and/or its item in the Outline pane and editing the items in the
Properties tab.

• You can select the component item in the Outline pane and press Enter.

• You can also open the context menu by right-clicking the component in the Graph Editor and/or in the Outline
pane. Then you can select the Edit item from the context menu and edit the items in the Edit component wizard.

Common Properties of Components


Some properties are common to all components: Component names, Phases, and Enabling vs. Disabling
components vs. PassThrough status.

Data policy is common to only some of them, but we describe it here as well. It is also important to remember
that in some of the components you must specify a URL.

This can be done with the help of the URL File Dialog.

In addition to these properties, you can also choose which components should be displayed in the Palette of
Components and which should be removed from it.

We also describe how you can view your input or output data.

Palette of Components
CloverGUI provides all components in the Palette of Components. However, you can choose which of them
should be included in the Palette and which should not. To choose only some components, select Window →
Preferences... from the main menu.

Figure 13.1. Selecting Components

After that, you must expand the CloverETL item and choose Components in Palette.

Figure 13.2. Components in Palette

In the window, you can see the categories of components. Expand the category you want and uncheck the
checkboxes of the components you want to remove from the palette.

Figure 13.3. Removing Components from the Palette

Then you only need to close and reopen the graph and the components will be removed from the Palette.

Giving a Name to a Component


Each component carries a label that can be changed. As you may have many components in your graph, each
with its specific function, you can name them according to what they do; otherwise you would have many
different components with identical names in your graph.

You can rename any component in any of the following four ways:

• You can rename the component in the Edit component dialog by specifying the Component name attribute.

• You can rename the component in the Properties tab by specifying the Component name attribute.

• You can rename the component by highlighting and clicking it.

If you highlight any component (by clicking the component itself or by clicking its item in the Outline pane),
a hint appears showing the name of the component. When you then click the highlighted component, a
rectangle appears below the component, showing the Component name on a blue background. You can change
the name shown in this rectangle and then press Enter. The Component name is changed and can be seen on
the component.

Figure 13.4. Simple Renaming Components

• You can right-click the component and select Rename from the context menu. After that, the same rectangle
as mentioned above appears below the component, and you can rename the component as described above.

Phases
Each graph can be divided into a number of phases by setting the phase numbers on components. You can see
this phase number in the upper left corner of every component.

Within one phase, the graph runs in parallel: each component and each edge with the same phase number run
simultaneously. If the process stops within some phase, higher phases do not start; only after all processes
within one phase have terminated successfully does the next phase start.

That is why, along the graph, phase numbers must remain the same or increase; they must not descend.

So, when you increase the phase number of any graph component, all components with the same original phase
number (unlike those with higher phase numbers) lying further along the graph change their phase to this new
value automatically.
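The phase behaviour described above can be modelled with a small sketch (the component names and run callbacks are hypothetical; this is not CloverEngine code):

```python
def run_phases(components):
    """Run components grouped by ascending phase number.

    Everything within one phase runs 'together'; if any component in a
    phase fails, higher phases never start (a simplified model of the
    behaviour described above).
    """
    executed = []
    for phase in sorted({c["phase"] for c in components}):
        batch = [c for c in components if c["phase"] == phase]
        if not all(c["run"]() for c in batch):
            return executed, False        # phase failed: stop here
        executed.extend(c["name"] for c in batch)
    return executed, True

graph = [
    {"name": "Reader",  "phase": 0, "run": lambda: True},
    {"name": "Sorter",  "phase": 0, "run": lambda: True},
    {"name": "Writer",  "phase": 1, "run": lambda: False},  # this phase fails
    {"name": "Cleanup", "phase": 2, "run": lambda: True},   # never starts
]
done, ok = run_phases(graph)
print(done, ok)  # ['Reader', 'Sorter'] False
```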

Figure 13.5. Running a Graph with Various Phases

Enabling vs. Disabling Components vs. PassThrough Status
By default, all components are enabled; once configured, they can parse data. However, any component of any
graph can be disabled. When you disable a component, it turns greyish and does not parse data when the process
starts. Moreover, the components lying further along the graph do not parse data either, unless some other
enabled component enters the branch further along the graph, in which case data can flow into the branch
through that enabled component. But if some component that sends data to, or receives data from, the disabled
component cannot parse data without it, the graph terminates with an error: data parsed by a component must
be sent on to other components, and if that is not possible, parsing is impossible as well. Disabling can be done
in the context menu or in the Properties tab. The following example shows a case where parsing is possible
even with a component disabled:

Figure 13.6. Running a Graph with Disabled Component

You can see that data records from the disabled component are not necessary for the Concatenate component,
and for this reason parsing is possible. Nevertheless, if you disabled the Concatenate component itself, the
readers before it would have no component to send their data records to, and the graph would terminate with
an error.

If you want to run the graph in this example even with the Concatenate component turned off, set the
component to the passThrough status: data records then pass through the component from input to output
ports, but the component does not change them. Unlike disabling, this can be done in the Properties tab only.

Figure 13.7. Running a Graph with Component in PassThrough Status

Data Policy
When you configure some of the components (some Readers), you must first decide what should be done when
incorrect or incomplete records are parsed. This can be specified with the help of the Data Policy attribute,
which offers three options:

• Strict. This data policy is set by default. It means that data parsing stops when the first error occurs. This
data policy does not create any error port.

• Controlled. This data policy means that every error is logged, but incorrect records are skipped and data
parsing continues. If you set the Data Policy attribute to this value, a new output port is created through
which the log can be sent to other components (in UniversalDataReader only); in the other components,
the log information is sent to stdout.

Thus, if you have set the Data policy attribute to controlled in UniversalDataReader, you need to select
the components that should process the log information, or perhaps you only want to write it. Select an edge
and connect the error port of the UniversalDataReader (in which the Data policy attribute is set to
controlled) with the input port of the selected writer (if you only want to write the log) or with the input
port of another processing component. Then you only need to assign metadata to this edge. The metadata
must be created by hand. They consist of 4 fields: number of incorrect record, number of
incorrect field, incorrect record, error message. The first two fields are of integer
data type, the other two are strings. (For more detailed information on how you can create metadata by hand,
see the corresponding section.)

• Lenient. This data policy means that incorrect records are set to their default values (if possible) and data
parsing continues.
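The hand-made error metadata for the controlled policy can be sketched as a delimited record with the four fields described above; the record name, field names and delimiters below are illustrative, not a prescribed format:

```xml
<!-- Illustrative error-port metadata: four fields as described above -->
<Record name="ErrorLog" type="delimited">
  <Field name="recordNo" type="integer" delimiter=";"/>
  <Field name="fieldNo"  type="integer" delimiter=";"/>
  <Field name="record"   type="string"  delimiter=";"/>
  <Field name="message"  type="string"  delimiter="\n"/>
</Record>
```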

Locating Files with URL File Dialog


In some components you must also specify the URLs of certain files: the files from which data should be read,
the files to which data should be written, the files used to transform data flowing through a component, and
other file URLs. To specify such a file URL, you can use the URL File Dialog.

When you open the URL File Dialog, you can see several textareas and panes. Upon opening this wizard, the
left pane shows your local file structure.

Figure 13.8. URL File Dialog

However, if you want to find a file that is not stored locally, you must first specify the connection to the server
in the upper textarea (Server). If the connection is valid, you can connect to the specified server.

To connect to the server, type the connection according to the following pattern:
protocol://username:password@hostname:portnumber. After typing the connection in the Server
textarea, click the Refresh button to display the file structure of the server in the left pane.

You can do the same by clicking the Connect to server button, the second button to the right of the Server
textarea, after which a Connection Settings wizard opens. In the Connection Settings wizard, you must specify
the Protocol that should be used to connect to the server. The possible protocols are the following: HTTP,
HTTPS, FTP, FTPS and SFTP. You also need to specify some of the following: the host name (Host), the port
number (Port), your user name (Username) and your password (Password). You can validate the connection
by clicking the Validate connection button. After that, you can add the connection settings to the Server
textarea by clicking the OK button. When you click the Refresh button, the file structure of the server appears
in the left pane.

(If you use the http protocol to connect to the server, you can set some parameters of this connection. The Http
parameters button located to the right of the Server textarea serves to specify the properties of the http
connection. When you click this button, a Property dialog opens, where you can define properties along with
their values. When you type both the Property name and the Value and click the Down arrow button, you add
the property to the list in the pane. At the end, confirm the list by clicking the OK button. You can edit any of
the listed properties and their values by clicking and changing the items; you can also remove any of them by
clicking the item and then the Minus button. If you click the Two crosses button, you remove all of the
properties and values.)

Regardless of whether you are selecting local or remote files, a file structure is shown in the left pane, from
which you select files. By double-clicking an item in the left pane, you add its path and file name to the Path
textarea and, at the same time, the same item appears in the right pane. You can also add wildcards to this
textarea by clicking either of the first two buttons to the right of the textarea (with the ? or * signs). The third
button serves to refresh the path and file name in the textarea.

If you click the Two right arrows button located between the two panes, you add all of the files from the left
pane to the File URLs pane on the right. If you click a file item in the left pane and then the Right arrow
button, you add this file URL to the right pane; double-clicking the item in the left pane does the same. If you
have a path and a file name in the Path textarea, you can add it to the right pane by clicking the Down arrow
button located above the right pane. In all of these cases, at least one file URL appears in the right pane. You
then add these file URLs to the Component editor wizard by clicking the OK button.

The resulting URL is of the following type:

protocol://username:password@hostname:portnumber/path/filename

Above the File URLs pane on the right, there is a button for creating a URL of the following type: port:
$0.FieldName[:processingType]. After clicking the button, you can select the FieldName and one of
the following processing types: source, discrete and stream.
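Putting the patterns together, file URLs produced by the dialog might look like the following (host, credentials, paths and the field name are illustrative):

```text
ftp://jsmith:secret@ftp.example.com:21/data-in/orders.txt
http://www.example.com/data-in/orders.txt
port:$0.fileContent:discrete
```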

Viewing Data in Readers and Writers


You can also view data in Readers and Writers using the context menu: right-click the desired component and
select View data from the context menu.

Figure 13.9. Viewing Data in Components

After that, you can choose whether you want to see the data as plain text or as a grid. If you select the Plain
text option, you can select a Charset, but you cannot set any filter expression.

Figure 13.10. Viewing Data as Plain Text

On the other hand, if you select the Grid option, you can set a Filter expression, but no Charset.

Figure 13.11. Viewing Data as Grid

The result can be as follows in the Plain text mode:

Figure 13.12. Plain Text Data Viewing

Or in the Grid mode, it can be like the following:

Figure 13.13. Grid Data Viewing

The same can be done in some Writers, but only after the output file has been created.

Chapter 14. Defining the
Transformations
This section describes how you can define the transformations in the following components:

• Partition, DataIntersection, Reformat, Denormalizer and Normalizer

Except for Partition, the other four components require that a transformation be defined. In the Partition
component, a transformation is required only if neither the Ranges nor the Partition key attribute is
defined.

You can define the transformation in Java, in Clover transformation language (CTL), or in Clover
transformation language Lite (CTL Lite).

However, we suggest that you do not use CTL Lite.

• ApproximativeJoin, ExtHashJoin, ExtMergeJoin, LookupJoin and DBJoin

Some transformation is required in these components.

You can define the transformation in Java, in Clover transformation language (CTL), or in Clover
transformation language Lite (CTL Lite).

However, we suggest that you do not use CTL Lite.

• JMSReader, JMSWriter and JavaExecute

A transformation is required only in JavaExecute, and it must be written in Java. In JMSReader and
JMSWriter, the transformation is optional, but it must also be written in Java.

Of course, in all of these components, you can use some compiled transformation class. To do that, use the Open
Type wizard. In this case, transformation is located outside the graph.

You can also use a transformation defined in some source file outside the graph. To locate the transformation
source file, use the URL File Dialog. Each of the mentioned components can use such a transformation definition.
The file must contain a definition of the transformation written in the same languages as an internal
transformation definition. In this case, the transformation is located outside the graph. (For more detailed
information see Section "Locating Files with URL File Dialog" above.)

To define the transformation in the graph itself, you must use the Transform editor (or the Edit value wizard
in case of the JMSReader, JMSWriter and JavaExecute components). In it, you can define a transformation
located and visible in the graph itself. The languages that serve for writing transformations have been mentioned
above.

More details about how you should define the transformations can be found in the sections concerning the
corresponding components.

Some details about writing transformations in Java can be found in corresponding Appendix.

Open Type Wizard


This wizard serves to select a class that defines the desired transformation. When you open it, you only need
to type part of a class name. As you type, the classes matching the typed letters appear in this wizard and
you can select the right one.


Figure 14.1. Open Type Wizard

Edit Value Wizard


The Edit Value wizard contains a simple textarea where you can write the transformation code in JMSReader,
JMSWriter and JavaExecute components.

Figure 14.2. Edit Value Wizard

When you click the Navigate button at the upper left corner, you will be presented with the list of possible options.
You can select either Find or Go to line.

Figure 14.3. Find Wizard

If you click the Find item, you will be presented with another wizard. In it you can type the expression you want
to find (Find textarea), decide whether you want to find the whole word only (Whole word), whether the case
should match or not (Match case), and the Direction in which the word will be searched: downwards (Forward)
or upwards (Backward). These options are selected by checking the presented checkboxes or radio buttons.

If you click the Go to line item, a new wizard opens in which you must type the number of the line you want
to go to.


Figure 14.4. Go to Line Wizard

Transform Editor
Some of the components provide the Transform editor in which you can define the transformation.

When you open the Transform editor, you can see the following tabs: the Transformations and Source tabs.

The Transformations tab can look like this:

Figure 14.5. Transformations Tab of the Transform Editor

In this Transformations tab, you can define the transformation using a simple mapping of inputs to outputs. First,
you must have both input and output metadata defined and assigned. Only after that, you can define the desired
mapping.

After opening the Transform editor, you can see some panes and tabs in it. You can see the input fields of all
input ports and their data types in the left pane and the output fields of all output ports and their data types in the
right pane. You can see the following three tabs in the middle bottom: Variables, Sequences, Parameters.

If you want to define the mapping, you must select some of the input fields, push down the left mouse button on
it, hold the button, drag to the Transformations pane in the middle and release the button. After that, the selected
field name appears in the Transformations pane.

The following will be the resulting form of the expression: $portnumber.fieldname.

After that, you can do the same with some of the other input fields. If you want to concatenate the values of various
fields (even from different input ports, in case of Joiners and the DataIntersection component), you can transfer
all of the selected fields to the same row in the Transformations pane, after which an expression like the
following appears: $portnumber1.fieldnameA+$portnumber2.fieldnameB.

The port numbers can be the same or different. The portnumber1 and portnumber2 can be 0 or 1 or any
other integer number. (In all components both input and output ports are numbered starting from 0.) This way you
have defined some part of the transformation. You only need to assign these expressions to the output fields.

In order to assign these expressions to the output, you must select any item in the Transformations pane in the
middle, push the left mouse button on it, hold the button, drag to the desired field in the right pane and release the
button. This output field in the right pane becomes bold.

In addition to what has been said so far, you can see empty little circles to the left of each of these expressions (still
in the Transformations pane). Whenever a mapping is made, the corresponding circle fills up with blue.
You must map the expressions in the Transformations pane to the output fields until all of the
circles become blue. At that moment, the transformation has been defined.

You can also copy any input field to the output by right-clicking the input item in the left pane and selecting Copy
fields to... and the name of the output metadata:

Figure 14.6. Copying the Input Field to the Output

Remember that if you have not defined the output metadata before defining the transformation, you can define
them even here, by copying and renaming the output fields using right-click; however, it is much simpler to
define new metadata prior to defining the transformation. If you defined the output metadata using this Transform
editor, you would be informed that output records are not known, and you would have to confirm the transformation
with this error and (after that) specify the delimiters in the metadata editor.

The resulting simple mapping can look like this:


Figure 14.7. Transformation Definition in CTL (Transformations Tab)

If you select any item in the left, middle or right pane, corresponding items will be connected by lines. See example
below:

Figure 14.8. Mapping of Inputs to Outputs (Connecting Lines)

By clicking the button that appears after selecting a row of the Transformations pane, you can also open the
editor for defining the transformation of each individual field. It contains a list of fields, functions and operators
and also provides hints. See below:


Figure 14.9. Editor with Fields and Functions

Some of your transformations may be too complicated to define in the Transformations tab.
You can use the Source tab instead.

(Source tabs for individual components are displayed in corresponding sections concerning these components.)

Below you can see the Source tab with the transformation defined above. It is written in Clover transformation
language.

Figure 14.10. Transformation Definition in CTL (Source Tab)


In the upper right corner of either tab, there are two buttons: for creating a new tab in Graph Editor (Open tab
button) and for converting the defined transformation to Java (Convert to Java button).

If you click the Open tab button, a new tab with the transformation will be opened in the Graph Editor. It will
be confirmed by the following message:

Figure 14.11. Confirmation Message

The tab can look like this:

Figure 14.12. Transformation Definition in CTL (Transform Tab of the Graph Editor)

If you switch to this tab, you can view the declared variables and functions in the Outline pane. (The tab can be
closed by clicking the red cross in the upper right corner of the tab.)

The Outline pane can look like this:


Figure 14.13. Outline Pane Displaying Variables and Functions

Note that you can also use content assist by pressing Ctrl+Space.

If you press these two keys inside any of the expressions, the help advises what should be written to define the
transformation.

Figure 14.14. Content Assist (Record and Field Names)

If you press these two keys outside any of the expressions, the help gives a list of functions that can be used to
define the transformation.


Figure 14.15. Content Assist (List of CTL Functions)

If there is an error in your definition, the line will be highlighted by a red circle with a white cross in it, and
more detailed information about the error will appear at the lower left corner.

Figure 14.16. Error in Transformation

If you want to convert the transformation code into Java, click the Convert to Java button and select
whether you want to use Clover preprocessor macros or not.


Figure 14.17. Converting Transformation to Java

After selecting an option and clicking OK, the transformation is converted into the following form:

Figure 14.18. Transformation Definition in Java

In older transformations, Clover transformation language Lite was used; however, we suggest that you no longer
use it.

For comparison, the same transformation could look like this in the Transformations tab (in CTL Lite):


Figure 14.19. Older Transformation Definition in CTL Lite (Transformations Tab)

And like this in the Source tab (in CTL Lite):

Figure 14.20. Older Transformation Definition in CTL Lite (Source Tab)

You should convert older transformations in CTL Lite to a new form in CTL.

In the upper right corner you can also see two buttons. This time, you can convert the transformation either to
Java (Convert to Java button) or to CTL (Convert to CTL button). Only a transformation in CTL can be displayed
as a new tab in the Graph Editor.

Chapter 15. Readers
Readers are mostly the initial components of graphs. They read data from data sources and send it to other
graph components. This is the reason why each Reader must have at least one output port through which the data
flows out. Readers can read data from files or databases located on disk. They can also receive data through a
connection using FTP, LDAP, or JMS. Some Readers can log information about errors. Among the Readers,
there is also the DataGenerator component that generates data according to a specified pattern. And some
Readers have an optional input port through which they can also receive data.

Remember that (in case of most Readers) you can see some part of the input data when you right-click a Reader and
select the View data option. After that, you will be presented with the same View data dialog as when debugging
the edges. (For more details see Section "Viewing the Data Flowing through the Edges".) This dialog allows you
to view the read data (it can be used even before the graph has been run).

File URL
In order to work with these components, you must set the File URL attribute in some of them.

These are some examples of the File URL attribute for reading data.

• /path/filename.txt

• /path/filename1.txt;/path/filename2.txt This way you can read two files that are located on
your disk.

• /path/filename?.txt This way you can read the files conforming to the mask that are located on your disk.

• /path/* This way you can read all of the files inside some folder.

• zip:/path/file.zip This way you can read the first file from the compressed file.

• zip:/path/file.zip#filename.txt This way you can read the specified file from the compressed
file.

• gzip:/path/file.gz Like above. Remember that no gzip file can be read by CloverDataReader.

• gzip:/path/file.gz#filename.txt Like above. Remember that no gzip file can be read by CloverDataReader.

• ftp://user:password@server/path/filename.txt Remember that ftp cannot be used by CloverDataReader.

• http://server/path/filename.txt Remember that http cannot be used by CloverDataReader.

• https://server/path/filename.txt Remember that https cannot be used by CloverDataReader.

• zip:(ftp://user:password@server/path/file.zip)#filename.txt Remember that ftp cannot be used by CloverDataReader.

• zip:(http://server/path/file.zip)#filename.txt Remember that http cannot be used by CloverDataReader.

• zip:(zip:(ftp://user:password@server/path/name.zip)#file.zip)#filename.txt Remember that ftp cannot be used by CloverDataReader.

• gzip:(http://server/path/file.gz) Remember that no gzip file can be read by CloverDataReader.

• port:$0.FieldName:source If this URL is used, input port must be connected. The specified field of
such data records that are received through this optional input port represents some URL from which data is read
and parsed. Input data type of this FieldName must be one of the following three: string, byte or cbyte.

• port:$0.FieldName:discrete If this URL is used, input port must be connected. The specified field of
data records that are received through this optional input port represents one particular data source. Input data
type of this FieldName must be one of the following three: string, byte or cbyte.

• port:$0.FieldName:stream If this URL is used, input port must be connected. The specified field values
of all data records that are received through input port are concatenated to represent one particular data source.
Input data type of this FieldName must be one of the following three: string, byte or cbyte.

• - This way you can specify that data should be read from stdin. Remember that stdin cannot be used by CloverDataReader.
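As a sketch of how a multi-file File URL with wildcards could be expanded (a simplified, hypothetical helper for local files only; remote, zip and port: URLs are not handled here, and `resolve_file_urls` is not a CloverGUI function):

```python
import glob

def resolve_file_urls(file_url):
    """Expand a semicolon-separated File URL, resolving * and ? wildcards
    against the local disk. Remote, zip and port: URLs are not handled."""
    files = []
    for part in file_url.split(";"):
        if "*" in part or "?" in part:
            files.extend(sorted(glob.glob(part)))
        else:
            files.append(part)
    return files
```

For example, `resolve_file_urls("/path/filename1.txt;/path/filename2.txt")` returns both file names, and a `?` or `*` pattern is expanded to every matching file.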

File Readers
These components read data from files. Only DataGenerator does not read data; it generates records according
to a specified pattern. One component reads data from flat files: UniversalDataReader. The others read data in
internal Clover format (CloverDataReader), Excel files (XLSDataReader) and dBase files (DBFDataReader).

Unlike CloverDataReader (and DataGenerator, of course), the other file readers can also receive data through their
optional input port.

DataGenerator

This component has at least one output port. Whenever you connect an edge to any output port, a new output
port is created.

This component generates data according to some pattern instead of reading data from some file, or database, or
any other data resource.

When you select this component, you must decide which fields should be generated at random (Random fields)
and which by sequence (Sequence fields). The other fields will be constant. You must create a pattern (Record
pattern) that looks like data from a delimited or fixed-length file. Record pattern is a string containing all fields
(except random and sequence fields) of the generated records in the form of a delimited record (with delimiters
defined in metadata on the output port) or a fixed-length record (with sizes defined in metadata on the output
port). All of the record fields will be constant, and/or random, and/or sequential values.
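The interplay of constant, sequence and random fields can be sketched as follows (a loose illustration with hypothetical names, assuming a semicolon-delimited record with some constant fields, one sequence field and one random integer field; this is not CloverGUI code):

```python
import random

def generate_records(count, constant_fields, random_range, seq_start=1):
    """Generate `count` semicolon-delimited records: the constant fields first,
    then a sequence field, then a random integer field within random_range."""
    low, high = random_range
    records = []
    for i in range(count):
        fields = constant_fields + [str(seq_start + i), str(random.randint(low, high))]
        records.append(";".join(fields))
    return records
```

Calling `generate_records(3, ["CONST"], (1, 5))` would produce three records such as `CONST;1;4`, with the sequence field increasing by one per record.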

You must first specify how many records should be generated (Number of records to generate). Then you
must select which fields should be generated by sequence and/or at random.

You can do it by choosing among the sequences after clicking the Sequence fields attribute. Then the Sequences
dialog opens.


Figure 15.1. Sequences Dialog

This dialog consists of two panes. There are all of the graph sequences on the left and all clover fields (names
of the fields in metadata) on the right. By choosing the desired sequence from the left pane, holding down the
left mouse button on the sequence, moving it to the desired clover field on the right and releasing the button, you
assign sequences to those clover fields that should be generated by sequence.

Figure 15.2. A Sequence Assigned

Remember that it is not necessary to assign the same sequence to different clover fields, but, of course, it is
possible; it depends only on your decision. This dialog contains two buttons on its right side: for canceling one
selected assigned mapping or all assigned mappings.

You must also specify the fields that should be generated at random. For each of them you can define its range
(its minimum and maximum values). These values are of the corresponding data types according to the metadata.
You can assign random fields in the Edit key dialog that opens after clicking the Random fields attribute.


Figure 15.3. Edit Key Dialog

There are the Fields pane on the left, the Random fields pane on the right and the Random ranges pane at the
bottom. In the last pane, you can specify the ranges of the selected random field by typing specific values. You
can move fields between the Fields and Random fields panes as described above: by clicking the Left arrow
and Right arrow buttons.

Flat File Readers


The UniversalDataReader component reads data from flat files regardless of whether they are delimited,
fixed-length or mixed. Flat files are simple text files with delimiters separating data records and fields, with
defined sizes of data records and fields, or with both delimiters and sizes. Delimiters and sizes are defined in
metadata.

This file reader can also receive data through its optional input port.

UniversalDataReader

This component has one optional input port and one or two output ports. The second output port is optional. You
can extract metadata from a flat file.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:
$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the
following: source, discrete and stream. If it is not set explicitly, it is set to discrete by default. (You
can see the meaning of these attribute values in the section "File URL" above.) Input data type of this FieldName
must be one of the following three: string, byte or cbyte.

This component reads data from flat files. It can read both delimited and fixed length data records depending on
metadata on its output port.

You must select which file should be read (File URL), which character set is used in these records (Charset),
whether the first line describes the names of the fields and must be skipped (Skip first line), and how many records
should be read from the file (Max number of records); otherwise the reader would read and send out all records.
You can also specify what to do in case of incorrect records (Data policy). If you switch to controlled data
policy, you can log information about errors and send it through the second (optional) output port into some other
component. Therefore this component can have one optional error port.

Thus, if you have set the Data policy attribute to controlled in UniversalDataReader, you need to select the
component that should process the error information, or a Writer if you only want to write it. Connect the
error port of the UniversalDataReader with the input port of the selected writer or other processing component,
and assign metadata to this edge. The metadata must consist of 4 fields: number of incorrect
record, number of incorrect field, incorrect record, error message. The first two fields
are of integer data type, the other two are strings. (For more detailed information on how you can create metadata
by hand, see the corresponding section.) The field names can be arbitrary.
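The four-field error record could be modeled like this (the field names below are arbitrary examples, as the text notes; this is an illustration, not CloverGUI code):

```python
from dataclasses import dataclass

@dataclass
class ReaderError:
    # the first two fields are integers, the other two are strings;
    # the names are arbitrary examples
    incorrectRecordNumber: int
    incorrectFieldNumber: int
    incorrectRecord: str
    errorMessage: str
```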

In addition to the attributes mentioned above, in this type of component, you may also define the following:

Sometimes a limited number of initial rows is only a header describing the data and not the data itself. In such a
case, you must set the Skip rows attribute to the number of rows that must be skipped.

You can also define whether leading white spaces in the fields should be skipped (Skip leading blanks).

You can also specify the Max error count attribute to limit the number of errors that are tolerated before data
parsing stops. The default value is 0.

The Quoted strings attribute can be set to true, allowing fields surrounded by single or double quotes to be
parsed. It is false by default.
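The effect of quoting can be illustrated with Python's csv module (an analogy only, not CloverGUI's parser):

```python
import csv, io

# with quoting honored, a delimiter inside quotes does not split the field
data = '"Smith; John";Prague\n'
rows = list(csv.reader(io.StringIO(data), delimiter=";", quotechar='"'))
# rows is [["Smith; John", "Prague"]]: the quoted semicolon stays in the field
```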

In this type of component, you can even treat multiple delimiters as a single one. You can do it by setting the
Treat multiple delimiters as one attribute to true. This is false by default. If you used multiple delimiters and
did not set this attribute to true, a null field would be interpreted between every pair of adjacent delimiters within
such a multiple delimiter.
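The difference can be sketched in Python (an analogy only, assuming a semicolon delimiter):

```python
import re

line = "a;;;b;c"
# default behaviour: every pair of adjacent delimiters yields a null field
assert line.split(";") == ["a", "", "", "b", "c"]
# Treat multiple delimiters as one = true: a run of delimiters splits only once
assert re.split(r";+", line) == ["a", "b", "c"]
```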

You may also want to Trim strings.

You can also set the Incremental file and Incremental key attributes. The Incremental key is a string to which
information about read records is written. This key is stored in the Incremental file. This way, the component
reads only new records on each run of the graph.
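The idea behind the Incremental file and Incremental key can be sketched like this (a hypothetical helper; CloverGUI's actual incremental file format is not specified here, so a small JSON file is assumed for illustration):

```python
import json, os

def read_new_lines(data_file, incremental_file, key):
    """Return only lines appended since the last run; the read position
    is remembered under `key` in a small JSON incremental file."""
    state = {}
    if os.path.exists(incremental_file):
        with open(incremental_file) as f:
            state = json.load(f)
    with open(data_file) as f:
        f.seek(state.get(key, 0))   # resume where the previous run stopped
        new_lines = f.readlines()
        state[key] = f.tell()       # remember the new position
    with open(incremental_file, "w") as f:
        json.dump(state, f)
    return new_lines
```

On the first run the helper returns every line; on later runs it returns only the lines added since, which mirrors how the component reads only new records on each run of the graph.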

It is also possible to set a phase of parsing data (Phase), set the visual name located on the component (Component
name) and enable/disable the component (Enable).

Other Type File Readers


The other three components (CloverDataReader, XLSDataReader and DBFDataReader) read data from other
file types: internal Clover format files (CloverDataReader), Excel files (XLSDataReader) and dBase files
(DBFDataReader). You can extract metadata from an Excel file and you can create metadata from a dBase file.
(See Sections "Extracting Metadata from an Excel File" and "Creating Metadata from a DBase File".)

The last two components (XMLExtract and XMLXPathReader) read data from XML files, but these are more
advanced topics and they are dealt with in the "Advanced Readers" Section below.

Unlike CloverDataReader and XMLExtract, the other file readers can also receive data through their
optional input port.

CloverDataReader

This component has at least one output port. Whenever you connect an edge to any output port, a new output
port is created.

You must create metadata by hand, select some prepared metadata, or use metadata stored in a metadata file
that was created when the data was written to the Clover file.


This component reads data that are stored in internal Clover binary data format. With this component, you can
read data in this internal format that allows faster access to stored data. When you read such a file, you can also
have an index file which allows you to select individual records from the data file. In addition to this, the file you
read can be compressed.

Remember that CloverDataReader cannot work with the ftp, http and https protocols or gzip files, nor
can it read data from stdin. It can only read data from common files or compressed zip files.

You must select which file should be read (File URL) and whether you want to read compressed data (Compressed
data); if the latter is not set explicitly, it depends on the .zip extension of the data file. If you do not want to
read all records, you can specify the Index file URL (if the index file is not stored in the same folder as the
clover data file or if it has a name other than datafilename.idx; the datafilename includes the extension
of the data file) and the Start record and Final record parameters. In such a case, records are read starting
from the Start record up to the Final record. If you do not set the Final record, CloverDataReader will
read and send out all records starting from the Start record.

If you read compressed Clover data, remember that all files (data file, index file and metadata file) are compressed
together within subfolders of the compressed file in the following way: DATA/datafilename,
INDEX/datafilename.idx and META/datafilename.fmt. Again, datafilename includes its
extension.
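A sketch of that internal layout, built in memory with Python's zipfile module ("orders.cdf" stands in for a hypothetical datafilename, including its extension, and the file contents are placeholders):

```python
import io, zipfile

# the compressed Clover file keeps data, index and metadata in subfolders
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("DATA/orders.cdf", b"binary records placeholder")
    z.writestr("INDEX/orders.cdf.idx", b"index placeholder")
    z.writestr("META/orders.cdf.fmt", b"metadata placeholder")
with zipfile.ZipFile(buf) as z:
    names = z.namelist()  # DATA/..., INDEX/..., META/...
```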

It is also possible to set a phase of parsing data (Phase), set the visual name located on the component (Component
name) and enable/disable the component (Enable).

XLSDataReader

This component has one optional input port and at least one output port. Whenever you connect an edge to any
output port, a new output port is created. You can extract metadata from an xls file.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:
$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the
following: source, discrete and stream. If it is not set explicitly, it is set to discrete by default. (You
can see the meaning of these attribute values in the section "File URL" above.) Input data type of this FieldName
must be one of the following three: string, byte or cbyte.

This component reads data stored in an Excel file. To read such data, you must first specify the sheet containing
the data you want to read. You can specify the Sheet number or the Sheet name. If you select or type both
attributes, only the Sheet number will be applied. Remember that (at least) one of them must be specified. (Note
that the first sheet number is 0.)

You can use wildcards (*, ?) when specifying sheet names. When specifying sheet numbers, you can use a mask:
a comma-separated list of items of the form number, minNumber-maxNumber, *-maxNumber or
minNumber-*, or a similar combination of numbers (e.g. 1,3,5-7,9-*).
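A mask like 1,3,5-7,9-* could be interpreted as sketched below (a hypothetical helper for illustration, not CloverGUI code):

```python
def sheet_matches(mask, number):
    """Return True if `number` matches a sheet-number mask such as '1,3,5-7,9-*'."""
    for item in mask.split(","):
        if "-" in item:
            low, high = item.split("-")
            # '*' means an open end of the range
            if (low == "*" or number >= int(low)) and (high == "*" or number <= int(high)):
                return True
        elif number == int(item):
            return True
    return False
```

With the mask `1,3,5-7,9-*`, sheets 1, 3, 5 through 7, and every sheet from 9 upwards would match.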

You can specify which row contains the names of columns (Metadata row). If there is no such row, codes of
columns will be used as the names of the fields.

And you can also specify the rows that should be read starting from the Start row to the Final row.

Max number of records defines how many records should be read from the file as a maximum.

You can also set the Max error count attribute to limit the number of errors that are tolerated before data parsing
stops. It is set to 0 by default.

You may also want to specify the Field mapping attribute. This attribute is not always required, but in some
cases a mapping must be defined. See below.


You can also set the Incremental file and Incremental key attributes. The Incremental key is a string to which
information about read records is written. This key is stored in the Incremental file. This way, the component
reads only new records on each run of the graph.

It is also possible to set a phase of parsing data (Phase), set the visual name located on the component (Component
name) and enable/disable the component (Enable).

Mapping and Metadata


If you want to specify some mapping (Field mapping), click the row of this attribute. After that, a button appears
there and when you click this button, the following dialog will open:

Figure 15.4. XLS Mapping Dialog

This dialog consists of two panes: Xls fields on the left and Mappings on the right. At the right side of this dialog,
there are three buttons: for automatic mapping, canceling one selected mapping and canceling all mappings. You
must select an xls field in the left pane, push the left mouse button, move to the right pane (to the Xls fields
column) and release the button. This way, the selected xls field is mapped to one of the output clover fields.
Repeat the same with the other xls fields too. (Or you can click the Auto mapping button.)

Figure 15.5. XLS Fields Mapped to Clover Fields

Note that xls fields are derived automatically from xls column names when extracting metadata from the XLS file.


When you confirm the mapping by clicking OK, the resulting Field mapping attribute will look like this (for
example): $OrderDate:=#D;$OrderID:=#A;

On the other hand, if you check the Return value with xls names checkbox on the XLS mapping dialog, the
same mapping will look like this: $OrderDate:=ORDERDATE,D;$OrderID:=ORDERID,N,20,5;

You can see that the Field mapping attribute is a sequence of single mappings, each of them followed by a
semicolon. The last semicolon is optional and can be omitted.

Each single mapping is an assignment of an xls field to a clover field name. The clover field is on the left side of
the assignment, preceded by a dollar sign; the xls field is on the right side and is either the code of the xls column
preceded by a hash, or the xls field as shown in the Xls fields pane.

A clover field and an xls field (or xls code) are put together using a colon and an equal sign (:=).
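To make the format concrete, a small sketch that splits such a mapping string into pairs (a hypothetical parser for illustration, not part of CloverGUI):

```python
def parse_field_mapping(mapping):
    """Split a Field mapping string such as '$OrderDate:=#D;$OrderID:=#A;'
    into (clover_field, xls_field) pairs."""
    pairs = []
    for item in mapping.rstrip(";").split(";"):
        clover, xls = item.split(":=")           # one assignment per item
        pairs.append((clover.lstrip("$"), xls))  # drop the leading dollar sign
    return pairs
```

For example, `parse_field_mapping("$OrderDate:=#D;$OrderID:=#A;")` yields `[("OrderDate", "#D"), ("OrderID", "#A")]`.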

Remember that you do not need to read and send out all xls columns; you can read and send out only some
of them.

Now we will describe how you can map selected or all xlsColumns to cloverFields:

• First, we suppose that you have specified Metadata row:

• If all clover fields are the same as the selected xls fields that should be mapped to these clover fields
(independently of their order), you do not need to define any mapping (Field mapping). In such a case,
these selected xls fields will be mapped to all clover fields according to their names.

• If all clover fields are the same as the selected xls fields that should be mapped to these clover fields
(independently of their order), but you want to preserve the original order of the selected xls fields or change
it to some order other than that defined by the clover fields, you must specify a mapping (Field mapping) as
shown above, using all clover fields and the selected xls fields.

• And, if only some clover fields are the same as some xls columns, you must also define a mapping (Field
mapping) as shown above. The reason is that, in this case, the xls columns and clover fields would not be mapped
to each other by their names, and the clover fields without a matching name would be empty.

• Second, we suppose that you have not specified any Metadata row:

• If you do not define any mapping (Field mapping), xls columns will be mapped to all clover fields in the
order of their appearance in XLS file starting from the first column of XLS file. In this case, you should skip
Metadata row (if there is any) by setting Start row to the first row that contains data.

• If you define some mapping (Field mapping) (this time with codes of xls columns preceded by hash only),
selected xls columns will be mapped to all clover fields according to the defined mapping (Field mapping).
In this case, you should skip Metadata row (if there is any) by setting Start row to the first row that contains
data.

DBFDataReader

This component has one optional input port and at least one output port. Whenever you connect an edge to any
output port, a new output port is created.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:
$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the
following: source, discrete and stream. If it is not set explicitly, it is set to discrete by default. (You
can see the meaning of these attribute values in the section "File URL" above.) Input data type of this FieldName
must be one of the following three: string, byte or cbyte.

This component reads data from dBase files (extension .DBF). It can read only fixed length data records.

When you select this component, you must specify which file should be read (File URL), which character set is
used in the records (Charset), and what to do with incorrect records (Data policy). If you switch to controlled
data policy, you can log information about errors; in this component, the log information is sent to stdout.

It was already mentioned how you can extract metadata from this type of file.

You can also set the Incremental file and Incremental key attributes. The Incremental key is a string to which
information about read records is written. This key is stored in the Incremental file. This way, the component
reads only new records on each run of the graph.

It is also possible to set a phase of parsing data (Phase), set the visual name located on the component (Component
name) and enable/disable the component (Enable).

Database Readers
So far we have talked about file readers, but often you want to read data from databases instead of files. In such
cases, you can read data using either a client that connects to the database or a JDBC driver. You can also extract
metadata from a database (see the corresponding section).

Using JDBC Drivers


Now we will describe the following component that uses JDBC drivers - DBInputTable.

DBInputTable

This component has at least one output port. Whenever you connect an edge to any output port, a new output
port is created.

This component reads data from databases and can be used with various database systems. You only need to define
a database connection. To do that, you must specify all of the following: the host name of the database server, the
database name, a user name, an access password and the JDBC driver that should be used to connect to such a
database. Sometimes you must also define the port number of the database connection.

When you select this component, you must define a query either by typing an SQL query in the graph (SQL query)
or by specifying the location of a file containing the query (Query URL). If you define both, Query URL takes
precedence. The database table can be specified within the query. You must also choose one of the available
database connections (DB connection) and decide what should be done with incorrect records (Data policy). If
you switch to controlled data policy, you can log information about errors; in this component, the log information
is sent to stdout.
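As a sketch, an SQL query attribute value might look like the following; the table and column names are made-up examples, not anything prescribed by the component:

```sql
-- Illustrative only: 'employee' and its columns are assumed names.
SELECT first_name, last_name, salary
FROM employee
WHERE salary > 1000
```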

You may also want to change the number of records that are unloaded from the database at a time (Fetch size).

Of course, you can also specify which character set (Charset) should be used when reading an external Query URL.

And finally, it is also possible to set a phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).


Advanced Readers
The components described above can read files of different types and databases. However, some data is not
contained in these two kinds of resources, or it is contained in files with a more complicated structure. CloverGUI
offers you four additional advanced readers: XMLExtract, XMLXPathReader, JMSReader and LDAPReader. The
first two components read XML files, JMSReader receives Java messages and LDAPReader gets information
from LDAP directories. XMLXPathReader can also receive data through its optional input port.

XMLExtract

This component has at least one output port. Whenever you connect an edge to any output port, a new output port
is created. You must create metadata on the output port(s) by hand or select prepared metadata.

This component reads data from an XML file or any other text file with an XML-like nested tree structure.

This component is faster than XMLXPathReader, which can read XML files too. The mapping can start from a
selected level and proceed into the depth of the tree. The component uses SAX technology.

When you select this component, you must specify which file should be read (File URL). Sometimes you may
want to skip a number of records; you can do so with the Skip mappings attribute, which is 0 by default.
You can also limit how many records are sent out from the file (Max number of mappings); otherwise,
XMLExtract reads and sends out all data records.

These two attributes limit the outgoing records on the outputs: a number of the outgoing records is skipped (Skip
mappings), and a number of the others is sent out through the output ports one by one (Max number of mappings).
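The skip-then-limit behavior of these two attributes can be sketched in a few lines; this is an illustration of the described semantics, not the component's actual code:

```python
# Sketch of Skip mappings + Max number of mappings: drop the first
# `skip` records, then emit at most `max_count` of the remainder.
def limit_records(records, skip=0, max_count=None):
    limited = records[skip:]
    if max_count is not None:
        limited = limited[:max_count]
    return limited
```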

Most importantly, you must define a mapping of the original file to the output ports. The number of output ports
is not fixed; it depends on the selected Mapping.

You can also set the Use nested nodes attribute to true. Below we will describe what this means.

It is also possible to set a phase of parsing data (Phase), set the visual name located on the component (Component
name) and enable/disable the component (Enable).

Mapping and Metadata


Because an original XML file has a nested tree structure of tag pairs surrounding either other tag pairs or text
representing data, the mapping of such a file to another data file, database table or any other data resource must
have a similar structure. Each mapping uses tags, but its nesting is expressed differently. Nested parts of the
original XML file (tag pairs surrounding a series of other tag pairs) are sent to different output ports unless the
Use nested nodes attribute is set to true; in that case, only some parts of the original file are sent to different
output ports. In either case, the structure of the mapping must closely follow the structure of the original file.

• If your XML file is located between a root start-tag <roottag> and the corresponding end-tag </roottag>,
its mapping must have a similar structure that starts with <Mappings> as its start-tag and terminates with
</Mappings> as its end-tag. All other mapping tags must be located between these two terminal tags, and
their structure must correspond to the structure of the original file. Tag pairs that are at the same level in the
original XML file must also be at the same level, between the corresponding surrounding tags, in the mapping;
they form a series of mapping tags of the same level. Tag pairs that are located deeper in the original file must
also be located deeper in the mapping, at the corresponding place between the corresponding mapping tags;
they form a series of mapping tags at different levels.

• If you want to assign (map) some of the tags (elements) to some of the output ports, you must do it in the
following way:

Between the pair of terminal tags mentioned above and destined for mapping (<Mappings> and
</Mappings>), there must be a series consisting of the following two kinds of expressions:

• A closed ("empty") tag like this:

<Mapping element="tagA" outPort="noA" xmlFields="eAA" cloverFields="eAB" parentKey="eAC" generatedKey="eAD"/>

• A nested structure like this:

<Mapping element="tagB" outPort="noB" xmlFields="eBA" cloverFields="eBB" parentKey="eBC" generatedKey="eBD">

A series of closed mapping tags as mentioned above, a series of nested structures like this one, or both types of structure can be here.

</Mapping>

In the latter case, the elements in the middle must be a combination of the two structure types mentioned above:
a series of closed ("empty") tags, a series of nested structures, or a mixture of the two. Remember that the levels
and nesting of the mapping must correspond to the levels and nesting of the original file. Remember also that the
output port numbers must differ in each of these mapping expressions: a different output port must be assigned
to each selected element.

Thus, if the element="sometag" expression corresponds to some pair of tags at some level
(<sometag>...</sometag>) of the original file, only the elements that lie at the same level or in greater
depth can be sent to the output ports and mapped to metadata on the output.

Note that the "noA", "noB", ..., "noD", etc. are the numbers of the output ports of the component surrounded
by double quotes through which data is sent out.

Now we must explain what expressions should be used to designate "eAA", "eAB", ..., "eDD",
etc. They are all sequences of tags (xmlFields and parentKey) or metadata fields (cloverFields
or generatedKey) separated by semicolons. These sequences are surrounded by double
quotes. (For example, you can have: xmlFields="firstname;lastname;salary;address"
cloverFields="fname;lname;slr;addr".)
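For illustration, the pairing of the two semicolon-separated lists above can be sketched as follows; the field names come from the example in the text, while the pairing logic itself is just a demonstration, not the component's code:

```python
# Split the semicolon-separated attribute values (example names from the
# text above) and pair them positionally, as the mapping does.
xml_fields = "firstname;lastname;salary;address".split(";")
clover_fields = "fname;lname;slr;addr".split(";")
mapping = dict(zip(xml_fields, clover_fields))
```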

• Some metadata fields on the output port belong to the next deeper level of the original XML file; in other
words, to its children. However, this concerns only those parts of the children level that can provide values
to fill the selected fields, i.e. those of the <sometag>somevalue</sometag> type.

These various sometag-s are the mentioned xmlFields. It is up to you how many tags from this part of
the children level you select as the xmlFields to be sent to the output.

In case that the Use nested nodes attribute is set to true, you can also select as xmlFields the tags that are
located deeper in the original file.

The selected fields can be renamed (mapped) by setting xmlFields="eCA" cloverFields="eCB".
This way, the xmlFields (names of tags) expressed by eCA are assigned to the cloverFields (names of
metadata fields) expressed by eCB.

Remember that if the xmlFields are the same as the cloverFields, you do not need any mapping between
them. In such a case, an expression of the following type is enough:


<Mapping element="tagC" outPort="noC" parentKey="eCC" generatedKey="eCD">

...

</Mapping>

• The other metadata fields on the output port belong to the next higher level of the original XML file; in other
words, to the parent level. However, this concerns only those parts of the parent level that provide values to
fill the selected fields, i.e. those of the <sometag>somevalue</sometag> type.

This is the mentioned parentKey. It is up to you how many tags from this parent level you select as the
parentKey to be sent to the output.

These fields can be renamed (mapped) by setting parentKey="eDC" generatedKey="eDD". This way,
the parentKey (a series of tag names) expressed by eDC is assigned to the generatedKey (a combination
of metadata field names) expressed by eDD.

Remember that if the next higher level in the original file does not contain any structure of the
<sometag>somevalue</sometag> type, you have neither a parentKey nor a generatedKey. Thus,
you have only an expression of the following type:

<Mapping element="tagD" outPort="noD" xmlFields="eDA" cloverFields="eDB">

...

</Mapping>

• Remember that the numbers of selected xmlFields and cloverFields contained in these expressions
must be equal. Most importantly: you do not need to map xmlFields to cloverFields if both the tags at
the selected level of the original XML file and the field names in the output metadata have identical names; in
such a case, they are mapped to each other automatically by name. Remember also that you do not need to use
all of the tags; you can limit yourself to some of them only.

• Now we must mention one more possibility when mapping an XML file to the output ports. If you set the Use
nested nodes attribute to true, tags at deeper levels that consist of a series of
<sometag>somevalue</sometag> expressions can be sent to the same port as the original level. Their
sometag-s become new xmlFields and can also be renamed (mapped) to cloverFields (metadata
fields) on the output.

• Thus, xmlFields are sequences of tag names at some levels of the original XML file, and so are the
parentKey sequences. The cloverFields, by contrast, can be any metadata field names. You can also
change the generatedKey names if you want. You can even concatenate a series of tags contained in
parentKey="eEC" (where eEC is a sequence of fields separated by semicolons) into one field for the eED,
which will be of the string data type and will look like the following: generatedKey="nameED".

• Also, if the mentioned keys do not give you a unique identification of the outgoing records, you
can use a pair of the following expressions: sequenceField="metadatafieldAofthesequence"
sequenceId="identificationofthesequence". For the next deeper level of the mapping, you can
then use this new artificial field name as its parentKey (values of this new, artificial sequenceField):
parentKey="metadatafieldAofthesequence"
generatedKey="metadatafieldBofthesequence".

Example
For example, suppose your original XML file has <employees> and </employees> as its root tags, that
between <employees> at the start and </employees> at the end there is a series of structures describing a
group of employees of some company (for example, 100 employees), and that the data concerning each employee
is located between a pair of <employee> and </employee> tags. You can then assign all employees to the
first port by the following expression:

<Mapping element="employee" outPort="0" and the rest of the expressions as shown above/>

(Remember that the ports are numbered starting from 0.)

You must remember that all the information about each employee in the series will be converted into one record,
consisting of a number of fields, that is sent out through the selected output port (in this case, the first port).
Thus, if there are 100 employees, 100 records will flow out through the first output port. The number of records
flowing through the selected port equals the number of individual employees delimited by pairs of <employee>
and </employee> tags. Thus, for a series of 100 pairs of <employee>...</employee> tags (between one
pair of <employees>...</employees> tags), 100 records describing employees will be created and sent
out through the selected output port.
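Putting the pieces together, a complete mapping for the employees example might look like the following sketch; the xmlFields and cloverFields names are assumed for illustration only:

```xml
<Mappings>
  <Mapping element="employee" outPort="0"
           xmlFields="firstname;lastname;salary"
           cloverFields="fname;lname;slr"/>
</Mappings>
```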

XMLXPathReader

This component has one optional input port and at least one output port. Whenever you connect an edge to any
output port, a new output port is created. You must create metadata on the output port(s) by hand or select
prepared metadata.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:
$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the
following: source, discrete and stream. If it is not set explicitly, it defaults to discrete. (You
can see the meaning of these attribute values in the section "File URL" above.) The input data type of this
FieldName must be one of the following three: string, byte or cbyte.

This component reads data from an XML file or any other text file with an XML-like nested tree structure.

It is not as fast as XMLExtract, but it can do more with the file nodes and allows a more flexible mapping of
the file structure. It uses DOM technology.

When you select this component, you must specify which file should be read (File URL). Sometimes you may
want to skip a number of records; you can do so with the Skip mappings attribute, which is 0 by default.
You can also limit how many records are sent out from the file (Max number of mappings); otherwise,
XMLXPathReader reads and sends out all data records.

These two attributes limit the outgoing records on the outputs: a number of the outgoing records is skipped (Skip
mappings), and a number of the others is sent out through the output ports one by one (Max number of mappings).

Most importantly, you must define a mapping of the original file to the output ports. The number of output ports
is not fixed; it depends on the selected Mapping.

You can also set the Data policy attribute to Strict, Controlled or Lenient. Strict is the default value. If you
switch to controlled data policy, you can send the log information about errors to stdout.

It is also possible to set a phase of parsing data (Phase), set the visual name located on the component (Component
name) and enable/disable the component (Enable).

Mapping and Metadata


Because an original XML file has a nested tree structure of tag pairs surrounding either other tag pairs or text
representing data, the mapping of such a file to another data file, database table or any other data resource must
again have a very similar structure: each mapping uses tags, but its nesting is expressed differently, and the
structure of the mapping must closely follow the structure of the original file. Unlike the XMLExtract component,
however, this mapping uses the XPath language, which simplifies locating any tag at any level of the original
XML file.
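To illustrate the idea (using Python's xml.etree.ElementTree rather than the component itself), a path expression selects nested elements without spelling out every surrounding sibling; the document below is a made-up example:

```python
import xml.etree.ElementTree as ET

# A made-up document with the nested structure discussed in the text.
doc = ET.fromstring(
    "<employees>"
    "<employee><name>John</name><salary>1000</salary></employee>"
    "<employee><name>Jane</name><salary>1200</salary></employee>"
    "</employees>"
)

# A relative path expression locates every matching element under the
# root, regardless of how many siblings surround it.
names = [e.text for e in doc.findall("employee/name")]
```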

• If your XML file is located between a root start-tag <roottag> and the corresponding end-tag
</roottag>, its mapping must have a similar structure that starts with <Context someexpressions>
as its start-tag and terminates with </Context> as its end-tag. All other mapping tags must be located
between these two terminal tags, and their structure must correspond to the structure of the original file. Tag
pairs that are at the same level in the original XML file must also be at the same level, between the
corresponding surrounding tags, in the mapping; they form a series of mapping tags of the same level. Tag pairs
that are located deeper in the original file must also be located deeper in the mapping, at the corresponding
place between the corresponding mapping tags; they form a series of mapping tags at different levels.

The start-tag of the mapping must look like this: <Context xpath="/roottag/tagA1/.../tagAj"
outPort="0">. The number of tags in the selected series of levels is up to you, but it also defines the level
where the mapping starts. Remember that you can map only the tags located at the j-th level or deeper, however
deep they may be; you cannot map any tag located higher.

• Now, once you have selected the j-th level and the port number in the expression above, you can select some
tags at the levels below the j-th and assign (map) them to some clover fields (names of the fields in metadata
on the selected output port).

These can be mapped in the following two ways:

Either the tags located directly below the j-th level surround values like this:
<sometag>somevalue</sometag>. In such a case, you can map these tags to clover fields in the following
way:

<Mapping nodeName="tagAj+1" cloverField="metadatafieldA"/>

<Mapping nodeName="tagBj+1" cloverField="metadatafieldB"/>

Or some other tags located deeper (at the j+k-th level) are of the <sometag>somevalue</sometag> type.
In such a case, you can map them to clover fields in the following way:

<Mapping xpath="tagCj+1/.../tagCj+k" cloverField="metadatafieldC"/>

<Mapping xpath="tagDj+1/.../tagDj+k" cloverField="metadatafieldD"/>

Here, metadatafieldD in cloverField="metadatafieldD" is the metadata field name to which
tagDj+k is mapped.

Remember that you must map such tags (nodeNames) to clover fields only if you want to rename them in
metadata. If you do not want to rename them and (at the same time) the metadata contain such a name, the two
will be mapped to each other automatically.

It is up to you how many tags from the original file are selected and sent to the output; you can limit yourself
to some of them only.

And note that these series of mappings for all selected tags of one level must be surrounded by some <Context
xpath="tagEA/tagEB/.../.../tagEG" outPort="noE"> and </Context> pair of tags.
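A minimal sketch of such a surrounded series, reusing the employees example; the element and clover field names are assumptions for illustration:

```xml
<Context xpath="/employees/employee" outPort="0">
  <Mapping nodeName="name" cloverField="fname"/>
  <Mapping xpath="address/city" cloverField="city"/>
</Context>
```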

• If you want to assign (map) some of the tags to some of the other output ports, you must do it in the following
way:

Between the pair of terminal tags mentioned above and destined for mapping (<Context xpath="/
roottag/tagA1/.../tagAj" outPort="0"> and </Context>), there must be a series consisting
of the following two kinds of expressions:


• A closed ("empty") tag like this:

<Context xpath="tagEj+1/.../tagEj+m" outPort="noE" parentKey="eEA" generatedKey="eEB"/>

• A nested structure like this:

<Context xpath="tagFj+1/.../tagFj+m" outPort="noF" parentKey="eFA" generatedKey="eFB">

A series of mappings as shown at the very beginning, a series of closed tags as mentioned above, a series of nested structures like this one, or any combination of these structures can be here.

</Context>

In the latter case, the elements in the middle must be a series of the structures mentioned above: a series of
closed ("empty") tags, a series of nested structures, or a mixture of the types. Remember that the levels and
nesting of the mapping must correspond to the levels and nesting of the original file. Remember also that the
output port numbers must differ in each of these mapping expressions: a different output port must be assigned
to each selected element. The parentKey is always taken from the level directly above the level containing the
<tagGj+m> tag. Remember that there can also be just the xpath="tagHj+1" expression; there is no need
to go deeper than necessary.

Note that the "noA", "noB", ..., "noF", etc. are the numbers of the output ports of the component surrounded
by double quotes through which data is sent out.

Now we must explain what expressions should be used to designate "eAA", "eAB",
..., "eFB", etc. They are all sequences of tags (parentKey) or metadata fields
(generatedKey) separated by semicolons. These sequences are surrounded by double quotes.
(For example, you can have: parentKey="firstname;lastname;salary;address"
generatedKey="fname;lname;slr;addr".)

Thus, if the xpath="tagEA/tagEB/.../tagEG" expression ends at the j+m-th level and corresponds
to some pair of tags (<sometag>...</sometag>) of the original file:

• Some metadata fields on the output port belong to the next higher level of the original XML file. However,
this concerns only those parts of the parent level that provide values to fill the selected fields, i.e. those of the
<sometag>somevalue</sometag> type.

This is the mentioned parentKey. It is up to you how many tags from this parent level you select as the
parentKey; you do not need to use all of the tags. You can limit yourself to some of them only.

These fields can be renamed (mapped) by setting parentKey="eGA" generatedKey="eGB".

This way, the parentKey (a sequence of tag names separated by semicolons) expressed by eGA is mapped to
the generatedKey (a sequence of metadata field names separated by semicolons) expressed by eGB.

• Remember that you do not need to map tags to cloverFields if both the tags at the selected level of the
original XML file and the field names in the output metadata have identical names; in such a case, they will
be mapped to each other automatically by name. Again, you do not need to use all of the tags; you can limit
yourself to some of them only.

• Also, if the mentioned keys do not give you a unique identification of the outgoing
records, you can use a pair of the following expressions in your mapping:
sequenceField="metadatafieldAofthesequence"
sequenceId="identificationofthesequence". For the deeper part of the original file and its
mapping, you can then use this new artificial field name as its parentKey (values of this new, artificial
sequenceField): parentKey="metadatafieldAofthesequence"
generatedKey="metadatafieldBofthesequence".

JMSReader

This component has at least one output port. Whenever any output port is connected, a new output port is created.

It receives Java messages and sends out data records. The component implements the JmsMsg2DataRecord
interface.

Once you have created the connection, you only need to specify it in the component (JMS connection).

To create such a connection, you must first specify all of the following: name of the connection, Initial context
factory class, available libraries, URL, Connection factory JNDI name, Destination JNDI, User and Password.
(For more information about how to create JMS connection, see corresponding section.)

Sometimes you also need to specify some JMS message selector.

You can also define the processing transformation by specifying one of the following three attributes: Processor
class, Processor code or Processor URL. (Processor class is a path and a file name of some class, jar or zip file
located outside the graph. Processor code is the transformation defined in the graph itself with the help of the
Java language. Processor URL is a path and a file name of some file written in Java language.)

If you want to define the Processor class attribute, you must click its item row, after which a button appears there,
and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See Section
"Open Type Wizard" for more information.)

If you want to define the Processor code attribute, you must click its item row, after which a button appears there,
and, when you click this button, an Edit value wizard opens. In this wizard you can define the transformation in
Java language. (See Section "Edit Value Wizard" for more information.)

If you want to define the Processor URL attribute, you must click its item row, after which a button appears there,
and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See Section
"Locating Files with URL File Dialog" for more information.)

It is also important to decide whether you want to limit the number of received messages and/or the processing
time. This can be done by specifying the maximum number of messages (Max msg count), the timeout (Timeout),
or both. Both attributes default to 0, which means that processing never stops: it is limited neither by the number
of messages nor by time. To limit either property, set it to a positive number. Processing is then limited by the
number of messages, by the processing time (the attribute is in milliseconds), or by both. When the specified
number of messages has been received, or when the process has run for the defined time, the process stops.
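The count-or-timeout stopping rule can be sketched conceptually as below; this is an illustration of the described semantics (0 meaning "no limit"), not JMSReader's actual code, and `fetch` is a hypothetical message source:

```python
import time

# Conceptual sketch: keep receiving until max_msg_count messages have
# arrived or timeout_ms milliseconds have elapsed; 0 disables a limit.
def receive_loop(fetch, max_msg_count=0, timeout_ms=0):
    received = []
    start = time.monotonic()
    while True:
        if max_msg_count and len(received) >= max_msg_count:
            break
        if timeout_ms and (time.monotonic() - start) * 1000 >= timeout_ms:
            break
        msg = fetch()
        if msg is None:  # source exhausted in this sketch
            break
        received.append(msg)
    return received
```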

Of course, you can also specify which character set (Charset) should be used when reading an external Processor
URL.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).


LDAPReader

This component has at least one output port. Whenever any output port is connected, a new output port is created.

The metadata on the output must precisely describe the structure of the read object.

This component reads information from an LDAP directory and converts it into Clover data records. It provides
the logic for extracting the results of a search and converting them into Clover data records. All results of the
search must have the same objectClass.

When you select this component, you must specify which directory should be read (Ldap URL). It has the
following form: ldap://hostname:portnumber. You must also define the base distinguished name (Base
DN), a sequence of attribute and value pairs separated by commas; for example:
dc=example,dc=com.

You also need to specify the filter used for the search (Filter). It is defined as a combination of attribute and
value pairs; for example, sn=*.

You can also decide whether only the object itself should be searched (object), the level directly below the
distinguished name (onelevel), or the whole subtree below the distinguished name (subtree). This is done by
setting the Scope attribute to one of these values.

Sometimes you may need to define your user name (User) and your password (Password). The user name can
look similar to the following: cn=john.smith,dc=example,dc=com.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

Chapter 16. Writers
Writers are the final components of the transformation graph. Each writer must have at least one input port through
which the data flows to this graph component from some of the others. The writers serve to write data to files or
database tables located on disk or to send data using some FTP, LDAP or JMS connection. Among the writers,
there is also the Trash component which discards all of the records it receives.

In all writers, it is important to decide whether you want to append data to the existing file, sheet or database
table (the Append attribute for files, for example), or to replace it with a new one. The Append attribute is set
to false by default, meaning "do not append data, replace it".

It is important to know that you can also write data to one file or one database table with multiple writers in the
same graph, but in such a case the writers should run in different phases.

Remember that (in the case of most writers) you can see part of the resulting data when you right-click a writer
and select the View data option. You will be prompted with the same View data dialog as when debugging edges
(for more details, see Section "Viewing the Data Flowing through the Edges"). This wizard allows you to view
the written data (it can only be used after the graph has been run).

File URL
In order to work with the components, you must set File URL in some of them.

These are some examples of the File URL attributes for writing data.

• /path/filename.out

• /path/filename1.out;/path/filename2.out This way you can write two files on your disk.

• /path/filename$.out This way you can write the files named according to the specified pattern on your
disk. The dollar sign means the numbering of the files from 0 up to 9.

• /path/filename$$.out This way you can write the files named according to the specified pattern on your
disk. The dollar signs mean the numbering of the files from 00 up to 99.

• zip:/path/file$.zip This way you can write data to a compressed zip file(s) whose name(s)
correspond(s) to the specified pattern.

• zip:/path/file$.zip#filename.out This way you can write specified file into the compressed zip
file(s) on your disk.

• gzip:/path/file$.gz This way you can write data to a compressed gzip file(s) whose name(s)
correspond(s) to the specified pattern. Remember that CloverDataWriter cannot write data to any gzip file.

• ftp://user:password@server/path/filename.out This way you can write the specified file(s)
to a remote server. Remember that CloverDataWriter cannot use ftp.

• port:$0.FieldName:discrete If this URL is used, an output port must be connected. The :discrete
expression is optional here (since discrete is the default) and can be omitted. The specified field of the data
records that are sent out through this optional output port represents one particular data source. The output data
type of this FieldName must be one of the following three: string, byte or cbyte.

• - This way you can write data to stdout. Remember that CloverDataWriter cannot write to stdout.
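The dollar-sign numbering described above can be sketched in Python. The helper below is purely illustrative (its
name is ours, and the real expansion is performed inside the engine's writers):

```python
import re

def expand_file_url(pattern, index):
    """Replace a run of '$' signs with a zero-padded file index.

    One '$' covers files numbered 0-9, two '$$' cover 00-99, and so on.
    Illustrative sketch only -- not part of the CloverGUI API.
    """
    match = re.search(r"\$+", pattern)
    if match is None:
        return pattern  # no placeholder: everything goes to a single file
    width = len(match.group(0))
    return pattern[:match.start()] + str(index).zfill(width) + pattern[match.end():]

print(expand_file_url("/path/filename$.out", 3))    # /path/filename3.out
print(expand_file_url("/path/filename$$.out", 3))   # /path/filename03.out
```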

File Writers
These components write data to files. Only Trash does not write data; it discards all incoming records. One
component writes data to flat files: UniversalDataWriter. The others write data into the internal Clover format
(CloverDataWriter), Excel files (XLSDataWriter) and dBase files (DBFDataWriter); StructuredDataWriter can
even write records with a defined structure.

Unlike CloverDataWriter (and Trash, of course), the other file writers can also send out data through their optional
output port.

Partitioning Data Flow into Different Output Files


Three components allow you to partition the incoming data flow and distribute the records among different output files.
These components are the following: UniversalDataWriter, XLSDataWriter and StructuredDataWriter.

If you want to partition the data flow and write the records into different output files depending on a key value, you
must specify the key for such partitioning (Partition key). It has the form of a sequence of record field names
separated by semicolons.

You must also decide whether the individual output files should be numbered or whether they should be named
according to some field values. This can be done by choosing either the Number file tag or Key file tag value
of the Partition file tag attribute.

The output files are numbered by default (Number file tag option is the default value of the Partition file tag
attribute.)

In both cases, the File URL value will only serve as the base name for the output file names.

• If you want to give numbers to the output files, you must set the Partition file tag attribute to Number file tag.

If the File URL is the following: path/filebasename, the output file names will be constructed according
to the following pattern: path/filebasename#.

The number of hashes depends on how many output files could be created. One hash corresponds to one digit.
The files could also be the following: path/filebasename## or path/filebasename###. The files
are numbered starting from 0.

Thus, the output files can be created according to the following pattern: path/filebasename0, path/
filebasename1, ..., path/filebasename892, for example.

• If you want to give some explicit names to the output files, you must set the Partition file tag attribute to the
Key file tag.

If the File URL is the following: path/filebasename, the output file names will be constructed according
to the following pattern: path/filebasenamedistinguishingnames.

If the Partition key attribute is of the following form: field1;field2;...;fieldN and the values
of these fields are the following: valueofthefield1, valueofthefield2, ..., valueofthefieldN,
all the values of the fields are converted to strings and concatenated. The resulting values are used as
distinguishingnames. They will be the following:
valueofthefield1valueofthefield2...valueofthefieldN.

Thus, the output files can be created according to the following pattern: path/
filebasenamevalueofthefield1valueofthefield2...valueofthefieldN.

For example, if you have the File URL attribute like the following: path/out, and if
firstname;lastname is the Partition key, you can have the output files as follows: path/outjohn-
smith, path/outmarksmith, path/outmichaelgordon, etc.

In addition to this, you can also select other names for the output files. This can be done by using a lookup table
(Partition lookup table) and specifying the Partition output fields attribute. This way, new names are given to
the same output files. Partition output fields is a sequence of fields taken from the Partition lookup table,
separated by semicolons. If some Partition key values are not contained in the Partition lookup table, such
records will be written to the unassignedfilebasename file.
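The two naming schemes can be illustrated with a short Python sketch; the function and argument names below
are ours, not attributes of any component:

```python
def partitioned_file_name(base_url, partition_file_tag, index=None,
                          record=None, partition_key=None):
    """Build one output file name for a partitioned writer.

    partition_file_tag is "Number file tag" (append the file number) or
    "Key file tag" (append the concatenated string values of the
    Partition key fields). Illustrative sketch only.
    """
    if partition_file_tag == "Number file tag":
        return base_url + str(index)
    # Key file tag: concatenate the string values of the key fields
    fields = partition_key.split(";")
    return base_url + "".join(str(record[f]) for f in fields)

# Number file tag: path/out0, path/out1, ...
print(partitioned_file_name("path/out", "Number file tag", index=0))
# Key file tag with Partition key "firstname;lastname": path/outjohnsmith
rec = {"firstname": "john", "lastname": "smith"}
print(partitioned_file_name("path/out", "Key file tag",
                            record=rec, partition_key="firstname;lastname"))
```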

Trash

This component has one input port.

This is the simplest component: it discards all of the records it receives. Nevertheless, it can still write incoming
data to a file (Debug print and Debug file URL attributes) or send it to stdout.

You may also want to set the character encoding that should be used for the data written to the output
file(s) or sent to stdout (Charset).

You must also decide whether the data should be appended to the debug file or whether the file should be replaced
(Debug append). The default is false, which means "do not append the data, replace the file".

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

Flat File Writers


One component writes data to flat files: UniversalDataWriter. It can also partition the incoming data flow and write
it into different output files.

This file writer can also send data out through its optional output port.

UniversalDataWriter

This component has one input port and one optional output port.

This component writes data to flat files. It can write delimited, fixed length and/or mixed data records depending
on metadata on its input port.

When you select this component, you must specify the file(s) to which the data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to:
discrete. (You can see the meaning of these attribute values in the section "File URL" above.) Output data
type of this FieldName must be one of the following three: string, byte or cbyte.

You may also want to set the character type that should be used for encoding data that will be written to the output
file(s) (Charset).

If you want, you can write the names of the fields (Write field names) to the first row of the output file(s). It
is set to false by default.

It is very important to decide whether the records should be appended to the existing file (Append) or whether
the file should be replaced. This attribute is set to false by default ("do not append, replace the file").

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped
records). It is 0 by default. You can also set a limit on the number of written records (Max number of records).
If you do not specify these attributes, UniversalDataWriter writes all incoming data records to the output file(s).

You can also limit the number of records that can be contained in one file as a maximum (Records per file) and/
or the file size in bytes (Bytes per file). In such a case, if you want to write the incoming data records to more
output files than one, you must use dollar signs in the output file base name (in File URL). This way, multiple
output files will be created and the data records will be distributed among them.

If you want to partition the data flow and distribute the incoming records among different output files, you must
define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key file
tag). The default value of this attribute is Number file tag. If you want to give other names to these output
files, you must specify Partition lookup table and Partition output fields.

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

Other Type File Writers


The other four components (CloverDataWriter, XLSDataWriter, DBFDataWriter and StructuredDataWriter)
write data to other file types: internal Clover format files (CloverDataWriter), Excel files (XLSDataWriter),
dBase files (DBFDataWriter) and files with a more complicated structure (StructuredDataWriter).

Unlike CloverDataWriter, the other file writers can also send out data through their optional output port.

The last component (XMLWriter) writes data to XML files, but this is a more advanced topic and it is dealt
with in the "Advanced Writers" section below.

XMLWriter can also send out data through its optional output port.

CloverDataWriter

This component has one input port.

This component writes data in the internal binary Clover data format, which allows faster access to the data.

When you select this component, you must specify the output file(s) to which the data should be written (File
URL).

Remember that CloverDataWriter cannot work with the ftp, http and https protocols or with gzip files, nor
can it send data to stdout. It can only write data to ordinary files or to compressed zip files.

When you write such a file, you can also create and save an index file which allows you to subsequently select
individual records from the data file. You can also create and save the metadata file. You can specify whether you
want to save the index file (Save index) and/or save the metadata file (Save metadata). Both attributes are set to
false by default. In addition to this, all file(s) can be compressed in one zip archive.

It is very important to decide whether the records should be appended to the existing file (Append) or whether
the file should be replaced. This attribute is set to false by default ("do not append, replace the file").

You can also limit the number of records that can be contained in one file as a maximum (Records per file). In
such a case, if you want to write the incoming data records to more output files than one, you must use dollar
signs in the output file base name (in File URL). This way, multiple output files will be created and the data
records will be distributed among them.

You can also decide whether you want to compress the output data (Compress data). (This attribute is not
required.)

• If you set this attribute to true, CloverDataWriter will compress the created file(s) into one output file, regardless
of whether the output file name (File URL) contains the zip extension or not.

• If you set this attribute to false, CloverDataWriter will not compress the created file(s), regardless of whether
the output file name (File URL) contains the zip extension or not. It will save all created file(s) separately.

• If you do not specify this attribute, CloverDataWriter will compress the created file(s) into one output file only
if the output file name (File URL) contains the zip extension. Otherwise, it will save all created file(s) separately.

If you do not compress the created file(s), the file(s) will be saved separately with the following name(s):
datafilename (the file with data), datafilename.idx (the file with the index) and datafilename.fmt
(the file with metadata). In all three names, datafilename includes its extension.

If you compress the created file(s), you can also set the compression level (Compress level). (The Compress
level attribute can be set to a number from 0 to 9, where 0 means "no compression".) A higher number means
better compression, but slower writing.

If the created data file has the following name: datafilename, the final output file will have the following
internal structure: DATA/datafilename, INDEX/datafilename.idx and META/datafilename.fmt.
Here, datafilename includes its extension in all of the three names. For example: DATA/employees.clv,
INDEX/employees.clv.idx, META/employees.clv.fmt.

If you set the Compress level attribute to 0, all created file(s) will be contained in the same output file with the
same internal structure (see above), but the created file(s) will not be compressed.

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped
records). It is 0 by default. You can also set a limit on the number of written records (Max number of records);
it is unlimited by default. Thus, if you do not specify these attributes, CloverDataWriter writes all incoming data
records to the output file(s).

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

XLSDataWriter

This component has one input port and one optional output port.

When you select this component, you must specify the file to which the data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to:
discrete. (You can see the meaning of these attribute values in the section "File URL" above.) Output data
type of this FieldName must be one of the following three: string, byte or cbyte.

This component writes data to an Excel file. You must first specify the sheet to which you want to write. You can
do it by specifying the Sheet number or the Sheet name. If you specify both attributes, only the Sheet name will
be applied: if such a sheet does not exist, it will be created with the given name, and the Sheet number attribute
will be ignored.

You can also use as the Sheet name a series of Clover fields, each preceded by a dollar sign and separated by
colons, semicolons or pipes. Thus, for each different combination of the field values, a new sheet will be created.

You can also specify the row to which you want to write the names of the columns (Metadata row). It is 0 by
default, which means that the names of the columns will not be written to the sheet. You can specify the row and column

from which you want to start writing. The records will be written starting from the Start row and from the Start
column. The defaults are 1 and A, respectively.

It is very important to decide whether the records should be appended to the existing sheet (Append to the sheet)
or whether the sheet should be replaced. This attribute is set to false by default ("do not append, replace the sheet").
Note that this attribute concerns only the sheet, not the whole file!

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped
records). It is 0 by default. You can also set a limit on the number of written records (Max number of records).
If you do not specify these attributes, XLSDataWriter writes all incoming data records to the output file(s).

You can also set a limit on the number of records that should be contained in one file as a maximum (Records
per file). If you want to write the incoming data records to more output files than one, you must use dollar
signs in the output file base name (in File URL). This way, multiple output files will be created and the data
records will be distributed among them.

If you want to partition the data flow and distribute the incoming records among different output files, you must
define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key file
tag). The default value of this attribute is Number file tag. If you want to give other names to these output
files, you must specify Partition lookup table and Partition output fields.

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

StructuredDataWriter

This component has one, two, or three input port(s) and one optional output port. The second and third input ports
are optional. These can serve to receive data for writing the header and/or the footer, respectively.

When you select this component, you must specify the file to which the data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to:
discrete. (You can see the meaning of these attribute values in the section "File URL" above.) Output data
type of this FieldName must be one of the following three: string, byte or cbyte.

You may also want to set the character type that should be used for encoding data that will be written to the output
file(s) (Charset).

This component writes data to files according to a pattern defined in its Body mask attribute. You must specify
this attribute, and you can also define some text that should be written at the beginning of the file (Header mask)
and/or at the end of the file (Footer mask). This component can have up to 3 input ports: the first is for the body
of the file, and the second and the third (if connected) are for the header and the footer, respectively. If either of
the last two ports is not connected, you can type any static header and/or static footer yourself. Even for the body
of the file you can write any structure you want, independently of the data connected to the first port. But you can
also define the structure of what should be written to the output file using the data incoming through the first
input port.

If you have connected all of the ports, or some of them, you can define a mask in the following way: when you
click the Body mask, the Header mask or the Footer mask attribute, a button appears on the right side
of the line, and after clicking this button, a Mask wizard opens. In this wizard, you can see the Metadata and
Mask panes. At the bottom, you can see the Auto XML button. If you click it, a simple XML structure appears
in the Mask pane.

Figure 16.1. Create Mask Wizard

You only need to remove the fields you do not want to save to the output file, and you can also rename the left
side of the mappings. These mappings have the following form: <sometag=$metadatafield/>. By
default, after clicking the Auto XML button, you will obtain an XML structure containing expressions like this:
<metadatafield=$metadatafield/>. The left side of these mappings can be replaced by anything else, but the
right side must remain the same. You must not change the field names preceded by a dollar sign on the right side
of the mappings; they are the names of the data fields.

Remember that you do not even need to use an XML structure as the mask; the mask structure you select can
be of any other type. But you must always use the metadata fields preceded by a dollar sign. They represent
the values of the corresponding data fields.
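For instance, after clicking Auto XML you might end up with a mask like the following (the field names
FirstName, LastName and City are hypothetical). The left sides of the first two mappings have already been
renamed by hand, while the dollar-signed right sides are kept unchanged:

```
<given_name=$FirstName/>
<surname=$LastName/>
<City=$City/>
```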

It is very important to decide whether the records should be appended to the existing file (Append) or whether
the file should be replaced. This attribute is set to false by default ("do not append, replace the file").

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped
records). It is 0 by default. You can also set a limit on the number of written records (Max number of records).
If you do not specify these attributes, StructuredDataWriter writes all incoming data records to the output file(s).

You can also limit the number of records that can be contained in one file as a maximum (Records per file) and/
or the file size in bytes (Bytes per file). If you want to write the incoming data records to more output files than
one, you must use dollar signs in the output file base name (in File URL). This way, multiple output files will be
created and the data records will be distributed among them.

If you want to partition the data flow and distribute the incoming records among different output files, you must
define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key file
tag). The default value of this attribute is Number file tag. If you want to give other names to these output
files, you must specify Partition lookup table and Partition output fields.

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

Database Writers
So far we have talked about file writers, but often you want to write data to databases instead of files. In such cases
you can write data using either a client or utility that connects to the database, or a JDBC driver.

Using JDBC Drivers


Now we will describe a component that uses JDBC drivers - DBOutputTable. When using this component, you
do not need any database client or other utility on your computer, but working with DBOutputTable is slower
than working with database bulk loaders.

DBOutputTable

This component has one input port and two optional output ports. These output ports can be used for records that
have been rejected by the database table (the first port) and/or for so-called auto-generated columns (the second
port; supported by some database systems only). Metadata on the first optional output port can be the same as on
the input port, or it can have an additional string field at the end containing an error message generated when
parsing the record. Metadata on the second optional output port must correspond to the auto-generated columns
of the selected database.

This component writes data to databases. Unlike database writers for different database systems based on using
client-server architecture or using special DB utilities, DBOutputTable can be used for various database systems
depending on selected JDBC driver.

First, you need to define a database connection. You must choose one of the available database connections (DB
connection). To create such a connection, you must specify all of the following: the host name of the database
server, the database name, the user name, the access password, the JDBC specific and the JDBC driver that should
be used to connect to the database. Sometimes you also need to define the port number.

You must also specify the database table and define a mapping of Clover fields (the names of the fields in
metadata) to database fields. You can do it in the following way:

Database table can be specified as one of the attributes of the component (DB table) or in a query. The query can
be defined in the graph (SQL query) or in some file outside the graph (Query URL). You should define only
one of these three attributes.

However, if you define not only DB table, but also SQL query or Query URL or both, the DB table attribute
will be ignored. And if you specify SQL query along with Query URL, only the SQL query will be applied.

• If you define a query (independently of whether it is an SQL query in the graph or a Query URL outside the
graph), you have two possibilities how to map Clover fields to DB fields.

• You can use selected Clover fields, each of them preceded by a dollar sign, in the query itself.

In such a case, you do not need to define any mapping: the Field mapping, Clover fields and DB fields
attributes will not be specified. Even if you specified any of them, it would be ignored.

• You can use question marks in the query as placeholders for the Clover fields that should be mapped to the
corresponding DB fields.

If you specify the Field mapping attribute, it will be used to map specified Clover fields to specified DB
fields. The resulting values of DB fields will be inserted to the DB table specified in the query.

If you want to map specified Clover fields to specified DB fields, but you do not use the Field mapping
attribute, Clover fields will be mapped to DB fields automatically, according to their mutual order in the
Clover fields and DB fields attributes. The resulting values of DB fields will be inserted to the DB table
specified by the query.

If you specify the Clover fields attribute alone, these fields will be mapped (in their order in the attribute)
to DB fields represented in the query by question marks.

Remember that if you specify both the Clover fields attribute and the DB fields attribute, these Clover fields
will not be mapped to the question marks directly, they will be mapped (in their mutual order) to these DB
fields in the DB fields attribute first and (after that) these values of DB fields will be inserted into the DB
table specified in the query into corresponding columns.

(Field mapping is a sequence of expressions of the form $cloverField:=dbField, each of them followed
by a semicolon. The last semicolon is optional and can be omitted.)

(The Clover fields and DB fields attributes are sequences of Clover field names and DB column names,
respectively, separated by semicolons. Even the last field can be followed by a semicolon, but this delimiter is
optional and can be omitted.)

If you specify neither Field mapping, nor Clover fields, nor DB fields, Clover fields will be mapped to DB
fields according to the order of the Clover fields in metadata. The number of Clover fields must equal the
number of DB fields.

Remember that if you specify Field mapping along with these other two attributes, only Field mapping
will be applied.

• If you define the DB table attribute:

• If you want to map Clover fields to DB fields and do not define any mapping, Clover fields will be mapped
to DB fields automatically, according to their order in metadata. The number of Clover fields must equal
the number of DB fields.

• If you want to map Clover fields in a different order, or if you want to map only some of them, you must define
a mapping.

To define such a mapping, you must specify either the Field mapping attribute alone, or both the Clover
fields and DB fields attributes (listing the Clover fields and DB fields that should be mapped to each other),
or the Clover fields attribute alone.

Remember that if you specify the Field mapping attribute, both the Clover fields and DB fields attributes
will be ignored.

Remember also that if you do not specify the Field mapping attribute, but you define the Clover fields and
DB fields instead, these fields will be mapped to each other by the order of their appearance in the mentioned
attributes. The resulting values of DB fields will be inserted to the DB table specified as the DB table attribute.

If you specify the Clover fields attribute alone, these fields will be mapped to DB columns in the order of
their appearance in this attribute.

Remember that if you specify Field mapping along with these other two attributes, only Field mapping
will be applied.
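To illustrate the two query styles described above, an SQL query attribute might look like either of the
following (the table, column and field names here are hypothetical):

```sql
-- Clover fields used directly in the query: no mapping attributes are needed
INSERT INTO employee (name, salary) VALUES ($FullName, $Salary)

-- Question marks as placeholders: the Clover fields attribute
-- (for example FullName;Salary) supplies the fields in matching order
INSERT INTO employee (name, salary) VALUES (?, ?)
```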

You can define the maximum number of errors (Max error count) after which the process stops. If you set this
attribute to -1, all errors will be ignored. The default value is 0.

You can also define what should be done if an error occurs. The Action on error attribute can be set to ROLLBACK
or COMMIT; the default is COMMIT. You can specify the number of records that should be committed at once
(Commit). If an error occurs, the last batch can be committed or rolled back.

If your database and/or JDBC driver supports the batch mode of sending statements to the database, you should set
the Batch mode attribute to true, since this can speed up data loading. You can also specify how many records
should be loaded in one batch update (Batch size). It is set to 25 by default.

This attribute is deprecated now, but if the SQL query attribute contains only one query, you can also set the Auto
generated columns attribute.

For Oracle and DB2 databases, they are the names of database columns that should be returned.

For Informix and MySQL, they are the field names of incoming records along with one additional field called
"AUTO_GENERATED" that should be returned.

Remember that Batch mode makes it impossible to generate a key.

You may also want to set the character encoding (Charset) that should be used when reading an external Query URL.

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

Using Database Bulk Loaders


Now we will describe the database components that do not use JDBC drivers. They are all faster than
DBOutputTable, but you need to have the specific database client or other utility installed on your computer in
order to connect to the corresponding database server.

DB2DataWriter

This component has one optional input port and one optional output port. It can receive data through the input
port or read it from a file. If the input port is not connected to any other component, the data must be contained
in a file specified in the component.

If you connect some other component to the optional output port, it can serve to log the information about errors.
Metadata on this error port must have three fields: the number of the incorrect record (integer), either the number
of the incorrect field (for delimited records) or the offset of the incorrect field (for fixed length records), and the
error message (string).

This component writes data to databases. It can only be used with the DB2 database system.

First, you need to install the DB2 database client on the computer with CloverGUI. Only then can you use this
component. You must specify all of the following: the database name (Database), the name of the database table
(Database table) you want to work with, your user name (User name), your access password (Password) and
the mode used to load data (Load mode). As the Load mode attribute you must select one of the following: insert,
replace, restart, terminate. The default value is insert.

If the component does not receive data through the input port, you must specify the file from which it should read
the desired data (Data file URL). When you read data from such an external file, you must define its metadata
(File metadata). All of the columns are separated from each other by a one-character delimiter. The last column
is delimited by a line feed character ( \n ). The delimiter of the columns is defined in the Column delimiter
attribute. You can also define this delimiter in the Parameters attribute as the coldel variable. If you define both,
the Column delimiter attribute is applied. Remember that the delimiter must not occur inside any field value.

This component allows you to map the original metadata fields (Clover fields) to the database fields. You must
define either the Field mapping attribute or both the Clover fields and DB fields attributes that should be mapped
to each other. In the latter case, the fields are mapped by their order of appearance in the two attributes, and their
number must be equal. Remember that if you have defined the Field mapping attribute, all fields listed in the
Clover fields and DB fields attributes will be ignored. Field mapping is a sequence of expressions of the form
$cloverField:=dbField, each of them followed by a semicolon; the last semicolon is optional and can be
omitted. Note that you can map Clover fields to database fields even without listing the database fields, but then
the number of Clover fields must equal the number of database fields.
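For example, the same hypothetical mapping can be expressed in the two ways just described (the field and
column names below are ours, for illustration only):

```
Field mapping:  $FirstName:=fname;$LastName:=lname;$Salary:=salary

Clover fields:  FirstName;LastName;Salary
DB fields:      fname;lname;salary
```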

If you read data from the input port, you can specify how many records should be skipped before writing to the
database (Number of skipped records). Remember that this is not valid for reading from the file specified in the
Loader input file attribute. You can also set a limit on the number of written records (Max number of records). If
you set the rowcount variable in the Parameters attribute, the value of this rowcount will be applied instead
of the Max number of records attribute.

You can also set the Max warning count. This attribute limits the number of error messages and/or warnings.

You can also set the Max error count.

If you want to save the rejected records, you must specify the Rejected records URL (on server) attribute. All
of the rejected records will be saved at this location on the database server.

The DB2 command interpreter attribute serves to define the interpreter that should execute the DB2 commands
(connect, load, disconnect). It has the following form: interpreter [parameters] ${} [parameters].
The name of the script file is substituted for the ${} expression.

Batch file URL defines the file where db2 commands should be stored. Remember that the path must not contain
white spaces.

If you are working on Linux, you may also set the Use pipe transfer attribute to send the data incoming through the input
port to a pipe instead of a temporary file. It is false by default.

You may also want to set a series of parameters that can be used when working with the DB2 database system
(Parameters). Each parameter has the form key=value, or key only (if its value is true).
Individual parameters must be separated by a colon, semicolon or pipe. Note that a colon, semicolon
or pipe can also be part of a parameter value, but in that case the value must be double quoted.
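The Parameters syntax above can be sketched in Python (a hypothetical helper, assuming the rules just stated: separators are colon, semicolon or pipe, and separator characters inside double-quoted values do not split):

```python
import re

def parse_parameters(params):
    """Parse 'key=value' pairs separated by :, ; or |.
    A key without '=value' means the value is true.
    Separators inside double-quoted values do not split."""
    # Split on : ; | only when an even number of quotes follows,
    # i.e. only when the separator is outside double quotes
    tokens = re.split(r'[:;|](?=(?:[^"]*"[^"]*")*[^"]*$)', params)
    result = {}
    for tok in tokens:
        if not tok:
            continue
        key, _, value = tok.partition("=")
        result[key] = value.strip('"') if value else "true"
    return result

# coldel's value is a semicolon, so it must be double quoted
print(parse_parameters('coldel=";"|rowcount=100|verbose'))
# → {'coldel': ';', 'rowcount': '100', 'verbose': 'true'}
```

The parameter names are only examples; coldel and rowcount are the variables mentioned in this section.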

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

InformixDataWriter

This component has one optional input port and one optional output port. It can read data through the input port
or from a file. If the input port is not connected to any other component, the data must be contained in a file
specified in the component. If you connect another component to the optional output port, it
can serve to log the rejected records and information about errors. Metadata on this error port must have the
same fields as the incoming or read records plus two additional fields at the end: number of row
(integer) and error message (string).

This component writes data to databases. It can only be used for Informix database system.

First, you need to install the Informix dbload database utility on the computer with CloverGUI. In addition,
it is very important that the server with the database runs on the same computer as both the dbload database
utility and CloverGUI, and that you are logged in as the root user. Only then can you use the dbload database utility. Otherwise,
in order to load data to the database, you must use the free load2 library instead of the dbload utility. The load2
library can even be used when the server is located on a remote computer.

You must specify the name of the database (Database), sometimes you need to specify the name of the database server
(Host), and you must select one of the following: either the database table (Database table) or a control script
(Control script). If you do not specify a control script, a default script is used. If you specify both a database
table and a control script, the control script is used. But remember that the control script is ignored if you use
the load2 library; in that case you must specify the database table.

You must also set up the Path to dbload utility attribute. It is the path to the dbload executable
(dbload.exe on Windows). If the path to the utility is in your PATH variable, you only need to set this attribute to
dbload.exe for Windows or dbload for Linux.

If you have not connected any component to the input port, you must specify the file that should be read (Loader
input file).

If you want to log the process of loading data to the database, you can select the name of the log file in the Error log
URL attribute. If you do not specify another name, the default log file name (error.log) is used.

When using the dbload utility, you can also specify the Ignore rows and Max error count attributes meaning
the number of rows that should be skipped and the number of errors after which the process stops, respectively.

The Max error count attribute is set to 10 by default. The Ignore rows attribute applies only when working with
the dbload utility.

You can set the Commit interval attribute, which is 100 rows by default. You can also set the Column delimiter,
which is a pipe by default; remember that it must not appear within the record field values. The Column
delimiter is used only when working with the dbload utility.

You may prefer loading data with the help of the free load2 library instead of the dbload utility. You can do so
by setting the Use load utility attribute to true. In that case, you need to specify your user name (User name) and your access
password (Password), and you can set the Ignore unique key violation attribute, which is false by default.
The Use insert cursor attribute is set to true by default; this doubles data transfer performance. It is used only
when working with the load2 library.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

MSSQLDataWriter

This component has one optional input port and one optional output port. It can read data through the input port or
from a file. If the input port is not connected to any other component, the data must be contained in a file that
is specified in the component. If you connect another component to the optional output port, it can serve
to log the rejected records and information about errors. Metadata on this error port must have the same
fields as the records plus three other fields at the end: number of incorrect row (integer), number of
incorrect column (integer), error message (string).

This component writes data to databases. It can only be used for MSSQL database system.

First, you need to install the MSSQL database client on the computer with CloverGUI. Only then can you use this
component. You must specify all of the following: the database name (Database), either the name of the database table
(Database table) or the name of the database view (Database view) you want to work with, your user name (User
name) and your access password (Password). If you are not the owner of the database table or database view, you must
also specify the name of the database owner (Database owner).

You must also set up the Path to bcp utility attribute. It is the path to the bcp executable
(bcp.exe on Windows). If the path to the utility is in your PATH variable, you only need to set this attribute to bcp.exe for
Windows or bcp for Linux.

If the component does not receive data through the input port, you must specify the file from which it should
get the desired data (Loader input file). You can select the Column delimiter; the default delimiter is the tab
character. Remember that it must not appear within the record field values.

You may also want to set a series of parameters that can be used when working with MSSQL (Parameters).
For example, you can set the port number, etc. Each parameter has the form key=value
or key only (if its value is true). Individual parameters must be separated by a colon, semicolon
or pipe. Note that a colon, semicolon or pipe can also be part of a parameter value, but in that case the value
must be double quoted.

Among the optional parameters, you can also set userName, password or fieldTerminator, corresponding to the User
name, Password or Column delimiter attributes, respectively. If any of these three attributes (User name, Password
or Column delimiter) is set, the corresponding parameter value is ignored.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

MySQLDataWriter

This component has one optional input port and one optional output port. It can read data through the input port or
from a file. If the input port is not connected to any other component, the data must be contained in a file that
is specified in the component. If you connect another component to the optional output port, it can serve
to log the rejected records and information about errors. Metadata of this error port must have three fields: number
of incorrect row (integer), number of incorrect column (integer), error message (string).

This component writes data to databases. It can only be used for MySQL database system.

First, you need to install the MySQL database client on the computer with CloverGUI. Only then can you use this
component. You must specify all of the following: the host name of the database server (Host), the database name (Database),
the name of the database table (Database table) you want to work with, your user name (User name) and your access
password (Password).

You must also set up the Path to mysql utility attribute. It is the path to the mysql executable
(mysql.exe on Windows). If the path to the utility is in your PATH variable, you only need to set this attribute to mysql.exe
for Windows or mysql for Linux.

If the component does not receive data through the input port, you must specify the file from which it should
get the desired data (Loader input file). You can select the Column delimiter; the default delimiter is the tab
character. Remember that it must not appear within the record field values. You can also specify how
many rows from the data file should be ignored when working with the database (Ignore rows), and you can specify
the Path to control script.

You may also want to set a series of parameters that can be used when working with MySQL (Parameters).
For example, you can set the port number, etc. Each parameter has the form key=value or
key only (if its value is true). Individual parameters must be separated by a colon, semicolon
or pipe. Note that a colon, semicolon or pipe can also be part of a parameter value, but in that case the value
must be double quoted.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

OracleDataWriter

This component has one input port.

This component writes data to databases. It can only be used for Oracle database system.

First, you need to install the Oracle sqlldr database utility on the computer with CloverGUI. Only then can you
use this component. You must specify all of the following: the name of the database table (Oracle table) you want to
work with, the Transparent Network Substrate name (TNS name) and your user name (User name). Optionally
you can specify the access password as well (Password).

Note that you can also specify the names of database table columns (DB column names).

You must also set up the Path to sqlldr utility attribute. It is the path to the sqlldr executable
(sqlldr.exe on Windows). If the path to the utility is in your PATH variable, you only need to set this attribute to
sqlldr.exe for Windows or sqlldr for Linux.

Of course, you must decide what should be done with the database table. You have four options: Insert, Append,
Replace, Truncate. This property is set by the Append attribute, whose default value is Append.

You can also specify the Path to control script attribute. If you do not specify your own control script,
the default is used.

You can also log the process of loading data to the database. This component has no output port for that purpose,
but you can specify the file to which the log should be written (Log file name). Its default name is loader.log.
You can also specify the file to which incorrect records should be written (Bad file name). Its default name is
loader.bad. And you can set the Discard file name attribute for writing the records that did not meet the
desired criteria. Its default name is loader.dis.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

PostgreSQLDataWriter

This component has one optional input port. It can read data through the input port or from a file. If the input
port is not connected to any other component, the data must be contained in a file specified in the
component.

This component writes data to databases. It can only be used for PostgreSQL database system.

First, you need to install the PostgreSQL database client on the computer with CloverGUI. Only then can you
use this component. You must specify all of the following: the host name of the database server (Host), the database name
(Database), the name of the database table (Database table) you want to work with and your user name (User name).

You must also set up the Path to psql utility attribute. It is the path to the psql executable
(psql.exe on Windows). If the path to the utility is in your PATH variable, you only need to set this attribute to psql.exe for
Windows or psql for Linux.

If the component does not receive data through the input port, you must specify the file from which it should
get the desired data (Loader input file). You can select the Column delimiter; the default delimiter is the tab
character. Remember that it must not appear within the record field values. You can also specify how
many rows from the data file should be ignored when working with the database (Ignore rows), and you can specify
the Path to control script.

You may also want to set a series of parameters that can be used when working with PostgreSQL (Parameters).
For example, you can set the port number, etc. Each parameter has the form key=value or
key only (if its value is true). Individual parameters must be separated by a colon, semicolon
or pipe. Note that a colon, semicolon or pipe can also be part of a parameter value, but in that case the value
must be double quoted.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

Advanced Writers
The components described above can write data to files of different types and to databases. But some information
does not fit these two kinds of data resources, or it resides in files with a more complicated structure. CloverGUI
offers you three additional advanced writers: XMLWriter, JMSWriter and LDAPWriter. The XMLWriter
creates XML files, JMSWriter sends Java messages and LDAPWriter puts information into LDAP directories.

XMLWriter

This component writes data to an XML file with a nested tree structure. It can have multiple input ports serving as sources
for the various levels of the resulting file. It uses SAX technology.

If you select this component, you must specify to which file the outgoing data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to:
discrete. (You can see the meaning of these attribute values in the section "File URL" above.) Output data
type of this FieldName must be one of the following three: string, byte or cbyte.

When specifying the output file name, you can use a pattern containing dollar signs, each standing for one digit from
0 to 9. Thus, if you want to split the output into a number of subfiles, you can do it with the help of dollar
signs. If you do not limit the output file size, this is not necessary. For example, if you have set the output
file name to filename_$$.dat, then filename_00.dat, filename_01.dat, etc.
will be created when the graph runs, as they become necessary.
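The dollar-sign numbering can be sketched as follows (an illustrative Python helper, not part of CloverGUI): the run of dollar signs in the pattern becomes the zero-padded file counter.

```python
def expand_pattern(pattern, index):
    """Replace the run of '$' signs with the zero-padded file index,
    e.g. 'filename_$$.dat' with index 1 -> 'filename_01.dat'."""
    width = pattern.count("$")
    number = str(index).zfill(width)
    # Replace the run of dollar signs with the padded number
    return pattern.replace("$" * width, number)

print(expand_pattern("filename_$$.dat", 0))  # filename_00.dat
print(expand_pattern("filename_$$.dat", 1))  # filename_01.dat
```

With two dollar signs the counter is two digits wide, which matches the filename_00.dat, filename_01.dat series described above.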

You can set the maximum number of mappings that one output file may contain (Mappings per file).
By default, all mappings are written to one file.

You can also set a limit to the total number of written mappings as well. This can be done by specifying Max
number of mappings. It is unlimited by default.

If you want to skip some mappings from the beginning, you can do it by setting the Number of skipped mappings
attribute. The default value of this attribute is 0.

You can set the Whole output to single line attribute to true. In such a case, the whole output XML file will be
written to a single line. By default, this attribute is set to false.

By default, any output file uses the following tag as the root element: <root>. If you want to use a different root
element, you can specify the Name of root element in output XML file attribute. The root element is used only
if more than one mapping is written to one output file.

If you do not want to use any root element, you can switch the Use root element attribute to false. Remember that
an output file without any root element is an invalid XML file.

If you use a root element, you can specify Default namespace for root element and some other namespaces:
Namespaces for root element. The Default namespace for root element is a URI of some namespace. The
other Namespaces for root element attribute has the following form:

prefix1="URI1";...;prefixN="URIN"

Only if you use a root element can you also specify a DTD. This can be done by setting the following two
attributes: DTD public Id and DTD system Id. The resulting output XML file will then contain the following
DTD:

<!DOCTYPE [rootElement] PUBLIC "[dtdPublicId]" "[dtdSystemId]">

If you use a root element, you may also want to define the XSD schema location. Remember that among the
namespaces there should be one with the xsi prefix. Example:
xsi="http://www.w3.org/2001/XMLSchema-instance"

It is important to define a mapping of the original files incoming through the input ports to the output XML
file(s). The number of input ports is not fixed; it depends on the selected Mapping of ports to XML
structure attribute. The tags in both the mapping and the output file(s) have the form <element> (in this case,
the Default namespace is used) or <someprefix:element> (if a Namespace corresponding to this someprefix
is specified).

You may also want to set the character type that should be used when writing data to the output file(s) (Charset).

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component
(Component name) and enable/disable the component (Enable).

Mapping and Metadata


Because nested-tree XML files consist of pairs of tags surrounding either other pairs of tags
or text representing data, you must define the mapping of the input files to the resulting output
XML file in a similarly nested way. Each mapping must use some tags.

• If you have a number of input files incoming through input ports, you must decide which of the files should
represent parents and which should represent children.

• Then you must decide whether some specific parent can have more children of some type or one child at most.
For example, one customer can make more orders, but one order can be made by one customer only.

• If you want to map the input files to the output XML file, you must pair the incoming files and interconnect
them in the following way:

• First you must decide whether a parent can have more children of some type or at most one.

• If the parent can have more children, you must define the following structure of mapping for the children:

<Mapping element="ctagA" inPort="noA" key="cAA" keyToParent="cAB"
         fieldsAs="coptA" fieldsAsExcept="cAD" fieldsIgnore="cAE">

  (Some other mapping can be here. Its structure must be similar to this one.)

</Mapping>

(Here, ctagA is some tag of a child, noA is the number of the input port, cAA is some key expression
(sequence of fields separated by semicolon), coptA is an option (either elements or attributes), cAD
and cAE are some expressions of the child (sequences of field names separated by semicolon).)

(For parent, there would be ptagB, pBA, poptB, etc. See below.)

In this case, one parent can have more children. This is the reason why the keyToParent expression can
be found here. This keyToParent is a key from the child. It must identify the children uniquely and
its values must equal the values of some other key in the parent. It is of no importance whether the names
of the keys in the parent and children are the same; it is their values that must be the same.

Thus, then you will continue to the nested tree structure of parent and children:

<Mapping element="ptagB" inPort="noB" key="pBA" theotherattributesB>

  <Mapping element="ctagC" inPort="noC" key="cCA" keyToParent="cCB(pBA)"
           theotherattributesC>

    (Deeper levels of the XML file are expressed in a similar way.)

  </Mapping>

</Mapping>

Note that the pBA in the key attribute of the parent must have the same values as the cCB(pBA) in the
keyToParent attribute which is the key in the children.

If you want to create two different children of one parent at the same level, you must do it by creating a series
of tag pairs like this:

<Mapping element="ctagD" inPort="noD" key="cDA" keyToParent="cDB(pBA)"
         theotherattributesD>

  (Deeper levels of the XML file are expressed in a similar way.)

</Mapping>

<Mapping element="ctagE" inPort="noE" key="cEA" keyToParent="cEB(pBA)"
         theotherattributesE>

  (Deeper levels of the XML file are expressed in a similar way.)

</Mapping>

Thus, you have found some common fields or groups of fields in both the parent and the children; they are named
pBA in the parent and cDB(pBA) or cEB(pBA) in the children, but they must have the same values.

The meaning of the keyToParent="cDB(pBA)" expression is the following: In the children level there
is a cDB key that has the same values as the pBA key in the parent. They do not need to have the same names
but they must have the same values.

Here, ptagB is a tag of the parent and ctagC, ctagD and ctagE are tags of the children.

• If a parent can have one child only, you must define the following structure of mapping for this child:

<Mapping element="ctagF" inPort="noF" key="cFA" keyFromParent="pBA(cFA)"
         fieldsAs="coptF" fieldsAsExcept="cFD" fieldsIgnore="cFE">

  (Some other mapping can be here. Its structure must be similar to this one.)

</Mapping>

In this case, one parent can have at most one child. This is the reason why the keyFromParent expression
can be found here.

The meaning of the keyFromParent="pBA(cFA)" expression is the following: In the parent level there
is a pBA key that has the same values as the cFA key in the child. They do not need to have the same names
but they must have the same values.

Thus, you will continue to the nested tree structure of parent and child:

<Mapping element="ptagB" inPort="noB" key="pBA" theotherattributesB>

  <Mapping element="ctagF" inPort="noF" key="cFA" keyFromParent="pBA(cFA)"
           theotherattributesF>

    (Deeper levels of the XML file are expressed in a similar way.)

  </Mapping>

</Mapping>

But, if you want to create two different unique children of one parent at the same level of the XML file, you must
do it by creating a series of tag pairs like this:

<Mapping element="ctagF" inPort="noF" key="cFA" keyFromParent="pBA(cFA)"
         theotherattributesF>
</Mapping>

<Mapping element="ctagG" inPort="noG" key="cGA" keyFromParent="pBA(cGA)"
         theotherattributesG>
</Mapping>

Note that the cGA in the key of the child must have the same values as the pBA(cGA) in the
keyFromParent (which is the key of the parent).

Finally, you can also combine unique children with different children at the same level of XML file.

<Mapping element="ptagB" inPort="noB" key="pBA" theotherattributesB>

  <Mapping element="ctagH" inPort="noH" key="cHA" keyFromParent="pBA(cHA)"
           theotherattributesH>
  </Mapping>

  <Mapping element="ctagJ" inPort="noJ" key="cJA" keyToParent="cJB(pBA)"
           theotherattributesJ>
  </Mapping>

</Mapping>

• This way you can define mapping for parent and its children or parent and its child.

Now we will describe what the fieldsAs, fieldsAsExcept and fieldsIgnore expressions mean.

When you select this component, you must also decide whether field values should be written as elements or as
attributes. If you want them to be attributes, you must set fieldsAs="attributes"; otherwise it is set
to elements by default. These options (attributes or elements) are the coptA and coptF mentioned in
fieldsAs="coptA" and fieldsAs="coptF" above. With fieldsAsExcept="cYZ"
you specify a list of fields that should be processed the other way: if fieldsAs is set to
elements, all of the fields listed in cYZ will be processed as attributes, and vice versa.

If there are fields that you do not want to write, whether they come from children or parents (for example, because
they have already been mentioned elsewhere, in the parent or in a child), you can make a list of them separated
by semicolons, quoted in double quotes, and use it as the value of the fieldsIgnore attribute. This way
you hide the listed fields of the selected element (parent or child/children) that you do not want to save.

Note also that if you have customers and orders among your input data, you can set
element="customers" and element="orders". Or you can use other names; these are only names and
you can change them to any other names. But, if you have selected the mentioned names, the output XML file
will contain the <customers>, </customers>, <orders> and </orders> tags.

Of course, ports are numbered starting from 0. Thus, you can start from inPort="0".
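Putting the pieces together, a concrete mapping for the customers/orders case mentioned above might look like the following sketch. The field names (customer_id, order_id) are purely illustrative assumptions; the point is the keyToParent link from the orders (port 1) back to the customers (port 0):

```xml
<!-- Hypothetical sketch: customers arrive on port 0, each possibly
     having many orders on port 1; field names are made up. -->
<Mapping element="customers" inPort="0" key="customer_id">
  <Mapping element="orders" inPort="1" key="order_id"
           keyToParent="customer_id" fieldsAs="attributes">
  </Mapping>
</Mapping>
```

Because one customer can make many orders, the child uses keyToParent rather than keyFromParent, and the order fields are written as attributes because of fieldsAs="attributes".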

• As you can see above, if you have the following two expressions:

key="pAB" ... (in the parent),

key="cCD" keyFromParent="pAB(cCD)" (in the child),

the mentioned keyFromParent contains a sequence of field names from the parent contained in the parent
key, and (at the same time) the values of these parent fields equal the values of the fields from the child
key. Thus, pAB and cCD are different sequences of field names (from the parent and child, respectively), but
their values are the same.

And, the parent can have only one child of this type (with the same element tag).

• As you can see above, if you have the following expressions:

key="pAB" ... (in the parent)

key="cCD" keyToParent="cCD(pAB)" (in the child)

The mentioned keyToParent contains a sequence of field names from the child key and (at the same time)
the values of these child fields equal the values of the fields from the parent key. Thus, pAB and cCD are
different sequences of field names (from the parent and child, respectively), but their values are the same.

And, the parent can have more children of this type (with the same ctagC).

JMSWriter

This component has one input port.

It receives data records and sends out Java messages. The component implements the DataRecord2JmsMsg
interface.

Once you have created the connection, you only need to specify it in the component (JMS connection).

To create such a connection, you must first specify all of the following: name of the connection, Initial context
factory class, available libraries, URL, Connection factory JNDI name, Destination JNDI, User and Password.
(For more information about how to create JMS connection, see corresponding section.)

You can also define the processing transformation by specifying one of the following three attributes: Processor
class, Processor code or Processor URL. (Processor class is a path and file name of a class, jar or zip file
located outside the graph. Processor code is the transformation defined in Java in the graph itself.
Processor URL is a path and file name of an external file written in Java.)

If you want to define the Processor class attribute, you must click its item row, after which a button appears there,
and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See Section
"Open Type Wizard" for more information.)

If you want to define the Processor code attribute, you must click its item row, after which a button appears, and,
when you click this button, an Edit value wizard opens. In this wizard you can define the transformation in Java
language. (See Section "Edit Value Wizard" for more information.)

If you want to define the Processor URL attribute, you must click its item row, after which a button appears,
and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See Section
"Locating Files with URL File Dialog" for more information.)

Of course, you can also specify what character type (Charset) should be used when reading from external sources.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

LDAPWriter

This component has one input port and one optional output port. If the optional output port is created, rejected
records are sent to it. Therefore, metadata on this optional output port must be the same as the metadata on the
input port. However, you cannot propagate metadata through the component; you must assign it separately.

The component writes information to LDAP directory. It provides the logic for updating the information on the
LDAP directory.

The metadata on the input must precisely match the LDAP object attribute names. The Distinguished Name
metadata attribute is required. As LDAP attributes are multivalued, their values can be separated by a pipe. For this
reason only strings can be handled correctly, so the only supported metadata type is string.
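Since the multiple values of one LDAP attribute travel in a single string field separated by pipes, the splitting can be sketched like this (illustrative Python only, with made-up example values):

```python
def split_multivalued(field_value):
    """Split a pipe-separated string field into the individual
    values of a multivalued LDAP attribute."""
    return field_value.split("|")

# A made-up objectClass field carrying three values in one string
print(split_multivalued("top|person|organizationalPerson"))
# → ['top', 'person', 'organizationalPerson']
```

This also shows why only string fields can be handled: the pipe convention only makes sense for textual values.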

When you select this component, you must specify to which directory the information should be written (Ldap
URL). Its form corresponds to the following pattern: ldap://hostname:portnumber.

You can also decide whether an entry should be added (Add entry) or removed (Remove entry) and whether
attributes should be replaced (Replace attributes) or removed (Remove attributes). This is done by
setting the value of the Action attribute.

Sometimes you may need to log in using your user name (User) and your password (Password). A
username can look like the following: cn=john.smith,dc=example,dc=com.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visual
name located on the component (Component name) and enable/disable the component (Enable).

Chapter 17. Transformers
These components have both input and output ports. They can merge multiple data flows with the same metadata,
split one data flow into several flows, intersect two data flows (even with different metadata on the inputs) and even
perform more complicated transformations of data flows.

Metadata can be propagated through some of these transformers, whereas this is not possible in components
that transform data flows in a more complicated manner. You must have the output metadata defined prior
to configuring those components.

Some of these transformers use transformations that have been described above.

Copying, Filtering and Sorting


These components have one input port and at least one output port. They send data to one or more output ports
(SimpleCopy, the only one of these components that can change metadata, although even there it
is not necessary), filter the data (Dedup or ExtFilter) according to some criterion, or sort it (ExtSort) according
to the selected key.

SimpleCopy

This component has one input port and at least one output port. Whenever you connect an edge to any output port,
a new output port is created. It is possible to propagate metadata through this component.

This component does not necessarily change metadata, but it can if you want. You can transform
fixed-length metadata to delimited metadata and vice versa, but the number of fields and their data types must
be preserved.

It copies all of the records that enter the component, sending them to all of the output ports. Thus, if you want to create multiple identical data flows, you can do it using the SimpleCopy component.

When you select this component, you can change the phase of parsing data (Phase), set the visual name located
on the component (Component name) and enable/disable the component (Enable).

SpeedLimiter

This component has one input port and at least one output port. Whenever you connect an edge to any output port,
a new output port is created.

(This component still belongs to the Others group in the palette of components, but it is similar to SimpleCopy
except for the Delay attribute. For this reason it is described in "Transformers" Section of this manual.)

You can propagate metadata through this component. It does not change metadata. However, the output metadata must have nearly the same structure as the input metadata (the same number of fields, data types, and sizes). The metadata name and even the field names may differ.

It can delay the data flow on its way through the component. It delays every record by the same value. Thus, the
total execution time depends on the number of records going through the SpeedLimiter.


When you select this component, you must specify this delay (Delay). The delay is expressed in milliseconds.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

ExtSort

This component has one input port and at least one output port. Whenever you connect an edge to any output port,
a new output port is created.

ExtSort does not change metadata. Metadata can be propagated through the component.

It sorts all of the records according to a selected key (a combination of field names). When you specify the key field names, the order of the selected names is of importance.

You can select the key field names using the Edit key dialog. Click the row of the Sort key attribute. After that, a
button appears. When you click this button, an Edit key dialog opens. There you can select the fields that should
create the Sort key attribute.

Select the fields you want and drag and drop all the selected fields to the pane on the right. (You can also use
the Arrow buttons.) The highest field name has the highest sorting priority. Then the sorting priority goes down
gradually towards the end of the list of the selected field names. The lowest field name has the lowest sorting
priority.

When you click the OK button, the selected field names turn into a sequence of the same field names separated by semicolons. This sequence can be seen in the Sort key attribute row. (The first field name in the sequence has the highest sorting priority; the priority decreases towards the end of the sequence.)

When you select this component, you must define both the key for sorting (Sort key) and the order for sorting (Sort order, either Ascending or Descending). Click the button that appears in the Sort key attribute row and define the Sort key by clicking the arrow buttons or by dragging and dropping. When you select any item in the Field pane on the left and move it to the Key column in the Key parts pane on the right, the default Sort order (Ascending) appears in the corresponding column.

Figure 17.1. Defining Sort Key and Sort Order

The resulting Sort key is a sequence of field names, each followed by an a or d letter in parentheses, separated by semicolons. It can look like this: FieldM(a);...FieldN(d).


You can specify the buffer capacity for sorting records in memory (Buffer capacity), the directory for temporary files that should be created (Temp directories URL), the number of temporary files (Number of tapes, an even number greater than two), and the initial capacity for sorting (Sorter initial capacity, in number of records). The Number of tapes attribute is set to 6 by default. The order for sorting is Ascending by default; it is sufficient to select the order by specifying its initial letter. The temporary directories are specified as a list of names separated by semicolons.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

Dedup

This component has one input port and one or two output ports. The optional second output port can be used for
rejected data if it is connected to some other component.

The component does not change metadata. Metadata can be propagated through the component.

It serves to remove duplicate records. You must select some key (a combination of field names) according to which the records are considered duplicate. It is very important that the input records be sorted according to the selected key. Otherwise, duplicates would be removed only within each adjacent group of records with the same key value, while other groups with the same key value would each be treated as distinct. Thus, it is necessary to sort the records according to the selected key before removing duplicates.

When you select this component, you must specify the key for deduplicating (Dedup key), a combination of field names separated by semicolons. You must also decide which duplicate records should remain on the output (Keep; the options are First, Last, and Unique), along with the Number of duplicates attribute defining the amount of records that should be sent out. You can also decide whether two or more records with some dedup fields being null should be considered equal (Equal NULL). This attribute is set to true by default.

If you set the Keep attribute to First and the Number of duplicates to 5, at most five records from the beginning of each group will be sent to the first output port. If you set the Keep attribute to Last and the Number of duplicates to 10, at most ten records from the end will be sent to the first output port. If you choose Unique, the Number of duplicates is ignored because only unique records are sent to the first output port. The rejected data will be sent to the optional second output port if there is a component connected to it. By default, the first records are kept. It is sufficient to specify any of the options by its first letter.
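For illustration, a hypothetical setting (the field names lastname and firstname are not taken from this manual) that keeps at most three records from the beginning of each group might look like this:

```
Dedup key            : lastname;firstname
Keep                 : First
Number of duplicates : 3
```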

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

ExtFilter

This component has one input port and one or two output ports. The optional second output port can be used for
rejected data if it is connected to some other component.

It filters records according to a logical expression. It sends all of the records corresponding to the filter expression
to the first output port and all of the rejected records to the second port if it is connected.

This component does not change metadata. Metadata can be propagated through the component.

When you select this component, you must specify the expression according to which the filtering should be performed (Filter expression). The filtering expression consists of a number of subexpressions connected with
logical operators (logical and and logical or) and parentheses for expressing precedence. For these subexpressions there exists a set of functions that can be used, as well as a set of comparison operators (greater than, greater than or equal to, less than, less than or equal to, equal to, not equal to). The latter can be selected in the Filter editor wizard as the mathematical comparison signs (>, >=, <, <=, ==, !=), or their textual abbreviations can be used (.gt., .ge., .lt., .le., .eq., .ne.). All record field values must be expressed by their names preceded by a dollar sign. For example, $employeeid.
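As an illustration, a filter expression built from the operators above might look as follows (the field names $salary, $age and $bonus are hypothetical, not taken from this manual):

```
($salary .gt. 1000 and $age .lt. 65) or $bonus .ge. 500
```

Records for which this expression is true are sent to the first output port; the rest go to the optional second output port.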

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

Concatenating, Gathering and Merging


These components put together the records incoming through various input ports with equivalent metadata, sending
the result to the output port while preserving metadata structure.

Concatenate

This component has at least one input port and a common output port. Whenever you connect any edge to any
input port, a new input port is created.

All input ports must receive data with the same metadata structure. You do not need to use a single metadata definition for all input ports; however, if you use different metadata definitions, they must have identical structure (the same number of fields, field names, data types, and sizes). Only the metadata name may differ. The component does not change the metadata structure.

It receives all of the records that enter the component, puts them together (if the component has multiple input ports), and sends the result to the common output port while preserving metadata on the output port.

First, the component receives all of the records incoming through the first input port and sends them to the common output port; subsequently, it appends all of the records incoming through the next input port. If the component has more than two input ports, the records are received and sent to the output according to the order of the input ports.

If an input port contains no records, it is skipped.

When the last input port is reached and all of its records have been received and sent to the output port, the process
terminates.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

SimpleGather

This component has at least one input port and a common output port. Whenever you connect an edge to any input
port, a new input port is created.

All input ports must receive data with the same metadata structure. You do not need to use a single metadata definition for all input ports; however, if you use different metadata definitions, they must have nearly identical structure (the same number of fields and data types). The metadata name and even the field names may differ. The component does not change the metadata structure.


It receives all of the records that enter the component, gathers them (if the component has multiple input ports), and sends the result to the common output port while preserving metadata on the output port.

First, the component receives one record incoming through the first input port and sends it to the common output port; subsequently, it adds one record incoming through the next input port. If the component has more than two input ports, records are received from the input ports cyclically, one by one, going through all of the input ports in the ascending order of their numbers, and are sent to the common output port.

When the last input port is reached, the process returns to the first input port or to the first input port through which
the component can still receive some records.

If an input port contains no more records, it is skipped.

When the component receives the last record and sends it to the common output port, the process terminates.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

Merge

This component has at least two input ports and a common output port.

All input ports must receive data with the same metadata structure. You do not need to use a single metadata definition for all input ports; however, if you use different metadata definitions, they must have identical structure (the same number of fields, field names, data types, and sizes). Only the metadata name may differ. The component does not change the metadata structure.

It receives all of the records that enter the component, merges them, and sends the result to the common output port while preserving metadata on the output port.

This component requires that the records entering its input ports be sorted according to some key (Merge key); it sends the result to the common output port sorted according to the same key. The order in which the key field names are selected is of importance.
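For illustration, a Merge key over two hypothetical fields (the field names are not taken from this manual) could be written as:

```
lastname;firstname
```

Here lastname takes precedence in the sort order because it is listed first.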

If an input port contains no records, it is skipped.

When the component reads the last record and sends it to the common output port, the process terminates.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).


Partitioning and Intersection


These components receive the data that enter through one or two input ports and distribute the flow among multiple outputs (the Partition or DataIntersection components).

Partition

This component has one input port and at least one output port. Whenever you connect an edge to any output port,
a new output port is created.

This component evaluates all incoming records according to some specified criterion, splits the data flow, and distributes different records among different output ports. To split the data flow, you can define data field ranges, exact values, etc.

For example, the records in which some defined date is before the specified day are sent to one output port, the
other records are sent to another output. You can also split the incoming data flow depending on the alphabetical
order of names, etc. You can specify any combination of field value ranges or other definitions expressed by
Partition key, Ranges or some partitioning defined by Partition class, Partition code or Partition URL.

Note that this component does not require any partitioning transformation if Ranges or Partition key is defined.

The component does not change metadata. It is possible to propagate metadata through this component.

When you select this component, you must define how the incoming data flow should be split and how the records should be distributed among the output ports.

You can do it in the following ways:

• If you define any of the three attributes that specify how the incoming data flow should be split and the records distributed among the output ports (the Partition class, Partition, or Partition URL attributes), that partitioning transformation will be used. In this case, you do not need to define the Partition key and/or Ranges attributes; if you define them, they will be ignored. (Partition class is a path and file name of a class, jar, or zip file located outside the graph. Partition is the transformation defined in the graph itself with the help of the Java language or the internal Clover transformation language. Partition URL is a path and file name of a file written in Java or in the internal Clover transformation language.)

• If you do not define any partitioning transformation but define both the Partition key (a sequence of fields separated by semicolons) and the Ranges (ranges of values of the key fields) attributes, the records will be distributed among the output ports depending on the values of the key fields. Records whose field values fall inside the same range will be sent to the same output port; the number of the output port corresponds to the order of the range among all of the defined ranges. The ranges are defined as pairs of values separated by a comma and surrounded by brackets: a round parenthesis means the boundary value is excluded from the range, while an angle bracket means it is included.

• If you do not define any partitioning transformation but only define the Partition key without the Ranges, a hash value will be calculated and used to split the data flow and distribute the records among the output ports.

• If you do not define any partitioning transformation but only define the Ranges without the Partition key, the RoundRobin algorithm will be used to split the data flow and distribute the records among the output ports.
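As a sketch of the notation described above (assuming a single numeric key field, and assuming that individual ranges are separated by semicolons as with other list attributes in this manual), a Ranges value splitting records among three output ports might look like this:

```
<0,10);<10,20);<20,30)
```

Each angle bracket includes its boundary value and each round parenthesis excludes it, so a record with the key value 10 would fall into the second range and be sent to output port 1.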


Figure 17.2. Ranges Editor

If you want to define the Partition class attribute, click its item row; a button appears there. When you click this button, an Open Type wizard opens, in which you must specify the desired type. (See Section "Open Type Wizard" for more information.)

If you want to define the Partition attribute, click its item row; a button appears there. When you click this button, a Transform editor opens, in which you can define the transformation by writing it in the Clover transformation language or in Java.

If you want to define the Partition URL attribute, click its item row; a button appears there. When you click this button, a URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

You may also want to set the character set (Charset) that should be used when reading the transformation definition from an external file (Partition URL).

You may also want to decide whether locale-specific rules should be used (the Use internationalization and Locale attributes). The first is set to false by default; the second is set to the system value by default.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).


Transformations
Here is an example of how the Source tab for defining the transformation looks.

Figure 17.3. Source Tab of the Transform Editor in the Partition Component

If you want to define a partitioning transformation using the Clover transformation language, regardless of whether it is contained in the graph itself (Partition) or in a file outside the graph (Partition URL), you must use the following function: getOutputPort(). This function is required and returns integers. It does not transform the records; it does not change or remove them. It only assigns a number to each record, denoting the output port to which that record should be sent.

The function can be defined in the Transform editor using if or switch statements or using any other way
of selecting output ports:

function getOutputPort() {
    if (condition0) return 0
    else if (condition1) return 1
    ...
    else if (conditionN) return N
}

The example above uses if statements.

You can also use the switch statement to decide and select the numbers of output ports.

function getOutputPort() {
    switch (expression) {
        case (expression0): return 0
        case (expression1): return 1
        ...
        case (expressionN): return N
        [default: return N+1]
    }
}


You must define the conditions or expressions mentioned above. They determine which output port is assigned to each individual record. Remember that the ports are numbered starting from 0.

For example, you can define the following partitioning transformation:

function getOutputPort() {
    if ($temperature > 0) return 0
    else return 1
}

In addition to this required function, you can define another function: init(). If you want to declare and initialize variables, or do anything else that should happen at the beginning of data processing by the component, you should do it within the init() function. This function is called only once, at the beginning. Unlike init(), the required getOutputPort() function is called many times, after init().

You can open the transformation definition as a third (or higher) tab of the graph (in addition to the Graph and Source tabs of the Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding button at the upper right corner of the tab.

DataIntersection

This component has two input ports and three output ports.

The metadata structure of incoming data records can differ between the two input ports. Data records incoming through different input ports can even have a different number of fields. However, some part of them (the Join key and Slave override key) must be comparable.

The component does not change the metadata structure on the first and third output ports, but it is not possible to propagate metadata through this component to these ports. Metadata on the second output port may be different. You must first create metadata on the second output port according to the desired result, or select an existing definition. Only then can you define the transformation.

When you select this component, you must specify the key according to which the records from both input ports
should be compared and intersected (Join key). It can be defined as a sequence of field names separated by
semicolon.

The records that enter only through the first input port will be sent to the first output port, the records that enter
only through the second input port will be sent to the third output port. Those records that enter through both
the first and the second input ports are sent to the second output port. Remember that the records are compared
according to the fields that are used to define Join key only. It is of no importance if the other fields are different.

You do not need to have the same key field names on both input ports; you may want to use different field names for the second input port (Slave override key). This can be done by clicking the Slave override key attribute row, after which an Override key wizard opens. In it, you can see two panes: the Slave fields pane on the left and the Master key pane on the right. You can select a field from the Master key pane by clicking it, holding down the left mouse button, dragging it to the Slave fields pane, and releasing the button. This way you can assign any of the fields from the Join key on the first input port to the corresponding fields on the second input port. You can also use the buttons on the wizard for Auto pairing or to reset some or all of the assignments you have made (the reset and reset all buttons).


You must also decide whether records with duplicate key values should be used (Allow key duplicates). This attribute is set to false by default, so duplicate records are not allowed; they are discarded and only the last of them is sent to the transform() function.

You must also specify some transformation by defining one of the following three attributes: Transform class,
Transform or Transform URL. (Transform class is a path and a file name of some class, jar or zip file located
outside the graph. Transform is the transformation defined in the graph itself with the help of the Java language
or the internal Clover transformation language. Transform URL is a path and a file name of some file written in
Java or in the internal Clover transformation language.)

If you want to define the Transform class attribute, click its item row; a button appears there. When you click this button, an Open Type wizard opens, in which you must specify the desired type. (See Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, click its item row; a button appears there. When you click this button, a Transform editor opens, in which you can define the transformation by defining a simple transformation mapping, or by writing the transformation in the Clover transformation language or in Java.

If you want to define the Transform URL attribute, click its item row; a button appears there. When you click this button, a URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" above for more information.)

You must also decide whether two or more records with some fields being null should be considered equal (Equal
NULL). This attribute is set to true by default.

You may also want to set the character set (Charset) that should be used when reading the transformation definition from an external file (Transform URL).

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

Transformations
Here is an example of how the Source tab for defining the transformation looks.

Figure 17.4. Source Tab of the Transform Editor in the DataIntersection Component


If you want to define an intersection using the Clover transformation language, regardless of whether it is contained in the graph itself (Transform) or in a file outside the graph (Transform URL), you must use the following function: transform(). This function is required. It maps intersected data records to the second output port.

Records incoming through the first input port are sent to the first output port if their Join key values are not contained in the Slave override key of the records incoming through the second input port. Records incoming through the second input port are sent to the third output port if their Slave override key values are not contained in the Join key of the records incoming through the first input port. For the second output port, however, you must create new records from the pairs of records with matching Join key and Slave override key values. In other words, you must define a mapping of the metadata on both input ports to the second output port. Remember that the ports are numbered starting from 0.

To define the transformation, you must have the output metadata defined.
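As a rough, hypothetical sketch in the Clover transformation language (the assignment syntax and the field names customer and amount are assumptions, not taken from this manual; consult the reference for the exact form in your release), a transform() mapping both inputs to the second output port might look like this:

```
function transform() {
    // left-hand $0.* refers to the output metadata of this mapping;
    // right-hand $0.* and $1.* refer to the first and second input ports
    $0.customer := $0.customer;
    $0.amount   := $1.amount;
}
```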

In addition to this required function, you can define two other functions: init() and finished(). If you want to declare and initialize variables, or do anything else that should happen at the beginning of data processing by the component, you should do it within the init() function. If you want to free memory or delete temporary files, you should do so within the finished() function. Each of these functions is called only once: init() at the beginning and finished() at the end. Unlike them, the required transform() function is called many times, after init() and before finished().

You can open the transformation definition as a third (or higher) tab of the graph (in addition to the Graph and Source tabs of the Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding button at the upper right corner of the tab.

Pure Transformers
These components transform data flowing through them and change metadata on the output ports.

KeyGenerator

This component has one input port and one output port.

It is used before the ApproximativeJoin component. The newly created key serves as the Matching key in the
ApproximativeJoin component.

It changes metadata by creating a new additional field named key and adding it to the end of the outgoing records. You must first create the metadata on the component output by hand according to the desired result. This metadata is the same as the input metadata with one added field of a defined data type at the end.

When you select this component, you must specify the field name(s) from which the key should be generated (Matching key), and, for each of the selected fields, decide which characters, how many of them, and in what way they should be used to generate the key. The Matching key attribute has the form of a sequence of expressions separated by semicolons. Each of these expressions has the following form: fieldname [number][parameters].

When you want to specify the properties of the key, click the Matching key attribute row; a button appears. Clicking this button opens an Edit key wizard with the following three panes: Fields, Key parts, and Matching key properties. First, select a field in the Fields pane by clicking its item, then click the Right arrow button; the selected field name transfers to the Key parts pane. This way you must transfer all the fields that should generate the key from the Fields pane to the Key parts
pane. When you click any of the fields in the Key parts pane, you can see the possible options in the Matching
key properties pane. There you must decide the following:

You must specify the number of characters from the selected field that should be used in the generated Matching key by typing the desired number in the number of letters to create key text area.

If you want to use only alphabetic or numeric characters in the generated Matching key, check the Alpha/numeric check box. After that, you can decide whether to use only alphabetic characters (marked as the letter a in the parameters sequence), only numeric characters (marked as n), or both (an) by checking the corresponding check boxes.

By checking the remove blank space and/or strip diacritic check boxes, you can decide whether white spaces (s) and diacritical marks (d), respectively, should be removed when generating the key. The result can then consist of standard Latin letters, etc.

If you want to change the case of alphabetic characters in the generated Matching key, check the Case check box and choose whether upper case (u) or lower case (l) characters should be used in the key by checking the corresponding radio button.

As the result of such a selection, you will obtain the sequence of expressions separated by semicolons as mentioned above. In it, the number denotes the amount of characters that should be used in the generated Matching key, and the letters mentioned above serve as the parameters.

For example, if you want to use only two characters from the customer field, both alphabetic and numeric, with blank spaces and diacritical marks removed and alphabetic characters converted to upper case, and, from the order field, only three alphabetic and numeric upper-case characters, you will obtain the following sequence of expressions: customer 2ansdu; order 3anu.

As in the case of the other components, you can change the phase of parsing data (Phase), set the visual name located on the component (Component name) and enable/disable the component (Enable).

Aggregate

This component has one input port and one output port.

It changes metadata by aggregating groups of records on the input port, applying some of the provided functions
to the whole group and creating new records on the output port.

You must first create metadata on the component output according to the desired result, or select an existing definition. Only then can you create the transformation.

You must define the aggregation key (Aggregate key). It is a sequence of field names separated by semicolons.
Records with the same key value form a group to which a function from the provided list is applied. It is not
necessary to sort the data before this component.

You may want to specify whether the data on the input is sorted or not (Sorted input). This attribute is set to
true by default.

You can also decide whether two or more records with null values in some fields should be considered equal (Equal
NULL). This attribute is set to false by default.

You must also specify either the Aggregation mapping or the Old aggregation mapping attribute.

The Old aggregation mapping attribute must be used with CloverEngine release 2.1 or older and must be defined
by hand. It still works with newer versions, but its use is now deprecated.

The Aggregation mapping attribute should be used with newer releases of CloverEngine.


When you click the Aggregation mapping attribute row, an Aggregation mapping wizard opens. In it, you must
define both the mapping and the aggregation. The wizard consists of two panes: the Input field pane on the left
and the Aggregation mapping pane on the right. You must select input fields in the left pane and map them to the
output field names in the right pane. To do so, click the selected item in the left pane, hold down the left mouse
button, drag to the Mapping column in the right pane at the row of the desired output field name and release the
button. The selected input field then appears in the Mapping column. In this way you can map all the desired input
fields to the output fields. In addition, you must click a row in the Function column and select a function from the
provided list. Repeat this until you have defined all the desired functions. These functions are applied to all records
of each group and the result is sent to the output.

You may also want to set the character encoding that should be used in the data flow (Charset).

As with other components, you can change the phase in which the component processes data (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Reformat

This component has one input port and at least one output port. Whenever you connect an edge to any output port,
a new output port is created.

In principle, the component preserves the number of records contained in the data flow on its way from the input
port to the output port(s). It can change the number of fields, change the format of date fields, concatenate fields,
reorder them, split fields, cut off parts of data, change letter case, convert values from one data type to another,
or replace field values with other identifiers. Using a defined transformation, this component can perform many
complicated operations.

It changes metadata. You must first create metadata for the component output(s) by hand according to the desired
transformation, or select prepared metadata. Only then can you create the transformation. Different outputs can even
have different metadata.

When you select this component, you must specify how the records should be reformatted on their way
through the component (Transform class, Transform or Transform URL attributes). (Transform class is the
path and file name of a class, jar or zip file located outside the graph. Transform is the transformation defined
in the graph itself with the help of the Java language or the internal Clover transformation language. Transform
URL is the path and file name of a file written in Java or in the internal Clover transformation language.)

The transformation must implement the RecordTransform interface or inherit from the DataRecordTransform
superclass. In the latter case, you only need to implement the transform() method.

If you want to define the Transform class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by defining
easy transformation mapping, or writing the transformation in Clover transformation language or Java language.

If you want to define the Transform URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, a URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)

You may also want to set the character encoding (Charset) that should be used when reading from an external
Transform URL.


As with other components, you can change the phase in which the component processes data (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Transformations
Here is an example of how the Source tab for defining the transformation looks.

Figure 17.5. Source Tab of the Transform Editor in the Reformat Component

If you want to define a transformation using Clover transformation language, regardless of whether it is
contained in the graph itself (Transform) or in a file outside the graph (Transform URL), you must imple-
ment the transform() function in the following way (for example):

function transform() {
    $0.Name := $0.fname + " " + $0.lname;
    $0.address := $0.address;
}

Above, input field values are assigned to output fields. This transformation changes the format of records.
The transform() function is required; it maps inputs to outputs.

To define the transformation, you must have the output metadata defined.

In addition to this required function, you can define two other functions: init() and finished(). If you
want to declare and initialize variables, or do anything else that should be done at the beginning of data processing
by the component, do it within the init() function. If you want to free memory or delete temporary files,
do it within the finished() function. Each of these functions is called only once: the init() function is
called at the beginning and the finished() function at the end. The required transform() function, by
contrast, is called many times, after init() and before finished().
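To illustrate (as a sketch only; the field names fname, lname and fullName and the counter variable are invented for this example), a transformation using all three functions might look like this in Clover transformation language:

```
// a variable shared by the functions below
int counter;

// called once, before any record is processed
function init() {
    counter = 0;
}

// called once per input record; maps input fields to output fields
function transform() {
    counter = counter + 1;
    $0.fullName := $0.fname + " " + $0.lname;
}

// called once, after the last record has been processed
function finished() {
    print_err("Processed records: " + counter);
}
```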

You can open the transformation definition as a third (or higher) tab of the graph (in addition to the Graph and
Source tabs of Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding
button at the upper right corner of the tab.


Denormalizer

This component has one input port and one output port.

It changes metadata by composing several records from the input port into one record on the output port. You must
first create metadata for the component output by hand according to the desired result, or select prepared metadata.
Only then can you create the transformation.

The component receives records whose metadata is not convenient for some purpose. The user may want to change
the metadata, combining several records of one data flow into new records whose metadata differs from that on
the input. Different fields can be transferred to different new records.

For example, suppose you have a data flow of records collected over a number of years, in which the air temperature
and pressure for each month are stored in two fields, with the year and month stored in two other fields. You can
combine these records into another data flow in which each record contains the temperature and pressure information
for a whole year. You will thus obtain records with 25 fields: the first field contains the year, and the other 24 contain
the air temperature and pressure for the 12 months. The information about each individual month is expressed only
by the position of the field within the record; the months can be ordered from January to December, from the second
field to the 25th field. The number of records is thus twelve times smaller, and the number of fields is 25 instead
of only 4. The counterpart of this process is normalization.

This component must receive data sorted according to a specified key. For this reason, when you select this
component, you must specify such a key (Key, a sequence of field names separated by semicolons) and the order
of the incoming data (Sort order).

You can create the Key with the help of the Edit key wizard.

The Sort order attribute can be set to Ascending, Descending, Auto or Ignore. You can select the desired value
by clicking the Sort order attribute row and choosing from the presented list. Since the Denormalizer treats
records as one group only if their Key values are equal and the records of each group are grouped together, it is
important that the incoming records be ordered according to such a Key. The sort order can be ascending or
descending (Ascending or Descending values). You can also let Clover detect the order of the incoming data
automatically; in this case, set the Sort order attribute to Auto. You can set this attribute to Ignore as well, but
remember that if the records on the input port are not ordered according to the Key, records with the same Key
value that are not grouped together are parsed as if they were different groups of records.

In addition to all this, you must specify the desired transformation by defining one of the following attributes:
Denormalize class, Denormalize or Denormalize URL. (Denormalize class is a path and a file name of some
class, jar or zip file located outside the graph. Denormalize is the transformation defined in the graph itself with
the help of the Java language or the internal Clover transformation language. Denormalize URL is a path and a
file name of some file written in Java or in the internal Clover transformation language.)

If you want to define the Denormalize class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Denormalize attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by writing
it in Clover transformation language or Java language.

If you want to define the Denormalize URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, a URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)


You may also want to set the character encoding (Charset) that should be used when reading from an external
Denormalize URL.

As with other components, you can change the phase in which the component processes data (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Transformations
Here is an example of how the Source tab for defining the transformation looks.

Figure 17.6. Source Tab of the Transform Editor in the Denormalizer Component

If you want to define a denormalization using Clover transformation language, regardless of whether it
is contained in the graph itself (Denormalize) or in a file outside the graph (Denormalize URL), you must
do it in the following way:

When you want to define some transformation from input to output, you must use the following two functions:
addInputRecord() and getOutputRecord(). These functions are required.

First of all, you must declare some variables. You can also initialize them.

Then you must assign the values of the fields of incoming records to these variables. This must be done within
the addInputRecord() function. The purpose of this function is to remember the group of input records. This
can be done using some variables as described above or in some other way.

Finally, you only need to assign the set of the defined variables to the output fields. This must be done within the
getOutputRecord() function.

You assign field values to variables using the equals sign (variable = $inputfield), whereas assigning
variables to output fields must be done using the colon and equals sign together ($outputfield :=
variable).

To define the denormalization, you must have the output metadata defined.

See the following example:

int yearA;
string monthB;
int temperaturejan;
int pressurejan;
int temperaturefeb;
int pressurefeb;

function addInputRecord() {
    yearA = $year;
    if ($month == "January") {
        temperaturejan = $temperature;
        pressurejan = $pressure;
    }
    if ($month == "February") {
        temperaturefeb = $temperature;
        pressurefeb = $pressure;
    }
}

function getOutputRecord() {
    $field1 := yearA;
    $field2 := temperaturejan;
    $field3 := pressurejan;
    $field4 := temperaturefeb;
    $field5 := pressurefeb;
}

In addition to these required functions, you can define three other functions: init(), finished() and
clean(). If you want to declare and initialize variables, or do anything else that should be done at the
beginning of data processing by the component, do it within the init() function. (Thus it would be better if
the variables above were declared and initialized within the init() function.) If you want to free memory or
delete temporary files, do it within the finished() function. Each of these functions is called only once:
the init() function is called at the beginning and the finished() function at the end. The required
addInputRecord() and getOutputRecord() functions, by contrast, are called many times, after
init() and before finished(). If you want to reset the values of some variables and/or delete temporary
files between parsing groups of records with different key values, do it within the clean() function. It is
called many times, once after each group of records has been parsed and the resulting record has been sent to
the output port.
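For instance, continuing the example above, a clean() function that resets the remembered values between groups with different key values might look like this (a sketch only):

```
// called after each group is finished; resets the variables
// so that values do not leak into the next group
function clean() {
    temperaturejan = 0;
    pressurejan = 0;
    temperaturefeb = 0;
    pressurefeb = 0;
}
```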

You can open the transformation definition as a third (or higher) tab of the graph (in addition to the Graph and
Source tabs of Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding
button at the upper right corner of the tab.

Normalizer

This component has one input port and one output port.

It changes metadata by decomposing each record on the input port into several records on the output port. You
must first create metadata for the component output by hand according to the desired result. Only then can you
define the transformation.

The component receives records whose metadata is not convenient for some purpose. The user may want to change
the metadata, splitting each record into several new records whose metadata differs from that on the input. Different
fields can be transferred to different new records.


For example, suppose you have a data flow of records collected over a number of years, in which the air temperature
and pressure for the twelve months of each year are stored together with the year in one record. You can split these
records into another data flow in which each record contains the information about temperature and pressure for
one month only, with the month stored in a new field. You will thus have records with 4 fields: two describe the
year and month, and the other two contain the air temperature and pressure. The number of records is thus twelve
times greater, and the number of fields is only 4 instead of 25. The counterpart of this process is denormalization.

When you select this component, you must specify the desired transformation by defining one of the following
three attributes: Normalize class, Normalize or Normalize URL. (Normalize class is a path and a file name of
some class, jar or zip file located outside the graph. Normalize is the transformation defined in the graph itself
with the help of the Java language or the internal Clover transformation language. Normalize URL is a path and
a file name of some file written in Java or in the internal Clover transformation language.)

If you want to define the Normalize class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Normalize attribute, you must click its item row, after which a button appears there, and,
when you click this button, a Transform editor opens. There you can define the transformation by writing it in
Clover transformation language or Java language.

If you want to define the Normalize URL attribute, you must click its item row, after which a button appears there,
and, when you click this button, a URL File Dialog opens. In it, you can locate the desired file. (See Section
"Locating Files with URL File Dialog" for more information.)

You may also want to set the character encoding (Charset) that should be used when reading from an external
Normalize URL.

As with other components, you can change the phase in which the component processes data (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Transformations
Here is an example of how the Source tab for defining the transformation looks.

Figure 17.7. Source Tab of the Transform Editor in the Normalizer Component


If you want to define a normalization using Clover transformation language, regardless of whether it is
contained in the graph itself (Normalize) or in a file outside the graph (Normalize URL), you must do it in
the following way:

When you want to define some transformation from input to output, you must use the following two functions:
count() and transform(idx). These functions are required.

The count() function is a simple function defined in the transformation code in the following way:

function count() {
    return N;
}

Here, N is the number of records into which each incoming record must be split. The function simply defines the
index that determines how many outgoing records should be created from one incoming record.

The transform(idx) function accepts all values of this index (idx) and defines the mapping from input to
output. The index takes N integer values, from 0 to N-1.

If you want to use variables in your code, you must declare them first. Any mapping you define must be placed
at the end of a function or at the end of the whole program; in principle, at the very end of a closed block.

Inside the transform(idx) function, you must define the transformation using index values. You can use if
or switch statements to select what should be done with all of the individual parts of the incoming records.

To define the normalization, you must have the output metadata defined.

See the following example:

function map1() {
    $year := $Field1;
    $month := "January";
    $temperature := $Field2;
    $pressure := $Field3;
}

function map2() {
    $year := $Field1;
    $month := "February";
    $temperature := $Field4;
    $pressure := $Field5;
}

function transform(idx) {
    switch (idx) {
        case 0: map1();
        case 1: map2();
    }
}

function count() {
    return 2;
}

In addition to these required functions (count() and transform()), you can define three other functions:
init(), finished() and clean(). If you want to declare and initialize variables, or do anything else that
should be done at the beginning of data processing by the component, do it within the init() function. (Thus
it would be better if the variables above were declared and initialized within the init() function.) If you want
to free memory or delete temporary files, do it within the finished() function. Each of these functions is
called only once: the init() function is called at the beginning and the finished() function at the end.
The required count() and transform() functions, by contrast, are called many times, after init() and
before finished(). If you want to reset the values of some variables and/or delete temporary files between
parsing individual incoming records, do it within the clean() function. It is called many times, once after each
incoming record has been parsed and the resulting group of outgoing records has been sent to the output port.

You can open the transformation definition as a third (or higher) tab of the graph (in addition to the Graph and
Source tabs of Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding
button at the upper right corner of the tab.

XSLTransformer

This component has one input port and one output port.

It can transform incoming data records based on the specified Xslt or Xslt file attribute. The former must
be edited in the Edit value wizard, the latter specified in the File URL dialog.

You can also define some Mapping using the following wizard.

Figure 17.8. XSLT Mapping

Assign the input fields from the Input fields pane on the left to the output fields by dragging and dropping them
into the Input field column of the right pane. Select which of them should be transformed by setting the Transform
data option to true. By default, fields are not transformed.

The resulting Mapping can look like this:

Figure 17.9. An Example of Mapping

You may also want to set the character encoding (Charset) that should be used when reading the Xslt file.

As with other components, you can change the phase in which the component processes data (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Chapter 18. Joiners
These components have both input and output ports. They serve to join records with different metadata
(including a different number of fields) according to a specified key and transformation. They can join records
incoming through multiple input ports, and they can also join records incoming through input ports with records
from a lookup table and/or database table. Metadata cannot be propagated through these components. You must
first select the right metadata or create them by hand according to the desired result. Only then can you define
the transformation. For some of the output edges you can also select the metadata of the input, but even these
metadata cannot be propagated through the component. These components use transformations that are described
in the section concerning Transformers.

Join Types
These components can work under the following three processing modes:

Inner Join

In this processing mode, only the driver records that correspond to some slave record(s) are processed.

Left Outer Join

In this processing mode, driver records with no corresponding slave are also processed.

Full Outer Join

In this processing mode, the transformation method is also called for slave records without a corresponding
driver.
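To illustrate the three modes (with invented key values): suppose the driver input contains records with keys 1 and 2, and the slave input contains records with keys 2 and 3. Then:

```
Inner Join:       key 2 only      (driver-slave pairs)
Left Outer Join:  keys 1 and 2    (all drivers, with or without a slave)
Full Outer Join:  keys 1, 2 and 3 (all drivers and all slaves)
```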

Joining Components
The following five components serve to join data flows with different metadata: ApproximativeJoin,
ExtHashJoin, ExtMergeJoin, LookupJoin and DBJoin.

In each component some transformation must be defined. You can use external .class or .java files or write
the transformations in the graph itself using the Transform editor.


Transformations
Here is an example of how the Source tab for defining the transformation looks.

Figure 18.1. Source Tab of the Transform Editor in Joiners

In all of these Joiners you can define some transformation. You can do it using easy transformation mapping,
Clover transformation language or Java language.

If you want to define a transformation using Clover transformation language, regardless of whether it is
contained in the graph itself (Transform) or in a file outside the graph (Transform URL), you must use the
transform() function. This function is required.

Within this function, you must define the selection of records based on driver and slave key values and create
the mapping of incoming records (and records from a lookup table or database) to the output port, to which the
records sharing the same key values are sent.
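As a sketch only (the field names are invented for illustration), such a transformation in Clover transformation language might map fields from the driver record ($0) and the slave record ($1) to the first output port:

```
// $0 on the right-hand side refers to the driver record,
// $1 to the matched slave record; $0 on the left-hand side
// refers to the first output port
function transform() {
    $0.customerName := $0.name;
    $0.orderAmount := $1.amount;
    $0.orderDate := $1.date;
}
```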

To define the transformation, you must have the output metadata defined.

In addition to this required function, you can define two other functions: init() and finished(). If you
want to declare and initialize variables, or do anything else that should be done at the beginning of data processing
by the component, do it within the init() function. If you want to free memory or delete temporary files,
do it within the finished() function. Each of these functions is called only once: the init() function is
called at the beginning and the finished() function at the end. The required transform() function, by
contrast, is called many times, after init() and before finished().

You can open the transformation definition as a third (or higher) tab of the graph (in addition to the Graph and
Source tabs of Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding
button at the upper right corner of the tab.


ApproximativeJoin

This component has two input ports and between two and four output ports. The third and fourth output ports are
optional. They do not need to be connected. If the third and the fourth output ports are connected, they serve to
send out the records incoming through the first and the second input ports, respectively. The first input port serves
as a driver, the second input port serves as a slave.

The component does not need to receive the same metadata on the two input ports; they do not even need to have
the same number of fields. However, if the third output port is connected, it has the same metadata as the first
input port, and similarly, the fourth output port has the same metadata as the second input port. Nevertheless,
neither of the input metadata can be propagated through the component to the third and fourth output edges. The
metadata on the first and second output ports differ. You must first create metadata for the first and second output
ports according to the desired result, or select prepared metadata. Only then can you define the transformation.

Metadata on the first and second output ports can contain two additional fields of numeric data type, named
"_total_conformity_" and "_keyName_conformity_". In the latter field name, keyName must be
replaced by the name of one of the fields of the Join key attribute. The computed conformity values are written
to these additional fields.

To the first and second output ports, the joined records with greater and smaller conformity, respectively, will
be sent.

The component receives the records incoming through the input ports, reads them and (for each driver record)
looks up the corresponding slave record. If such a slave record is not found, the driver record is sent out through
the third output port. After that, the component computes the conformity of each driver and slave pair. The pairs
whose conformity is greater than the specified limit are joined and sent out through the first output port. The pairs
whose conformity is smaller, but which have the same matching key value, are joined and sent out through the
second output port. Finally, the slave records without a driver are sent out through the fourth output port. (The
conformity is computed using the Levenshtein distance.)

You must define the key that should be used to join the records (Join key). You can define the key with the help of
the Join key wizard. When you open the Join key wizard, you can see two tabs: Master key tab and Slave key tab.

Figure 18.2. Join Key Wizard (Master Key Tab)


In the Master key tab, you must select the driver (master) fields in the Fields pane on the left and drag and drop
them to the Master key pane on the right. (You can also use the buttons.)

Figure 18.3. Join Key Wizard (Slave Key Tab)

In the Slave key tab, you can see the Fields pane (containing all slave fields) on the left and the Key mapping
pane on the right.

You must select some of these slave fields and drag and drop them to the Slave key field column, to the right of
the Master key field column (which contains the master fields selected in the Master key tab in the first step). In
addition to these two columns, there are six other columns that should be defined: Maximum changes, Weight and
the last four representing the strength of comparison.

The Maximum changes property contains an integer equal to the maximum number of letters that may be changed
when converting one data value to another. It serves to compute the conformity: the conformity between two strings
is 0 if more letters than this must be changed to convert one string into the other.

The Weight property defines the weight of the field when computing the similarity. The weight of each field
difference is computed as the quotient of the user-defined weight of that field and the sum of all user-defined weights.

The strength of comparison can be identical, tertiary, secondary or primary.

If it is identical, only identical letters are considered equal.

If it is tertiary, upper and lower case letters are considered equal.

If it is secondary, diacritic letters and their Latin equivalents are considered equal.

If it is primary, letters with additional features (such as a peduncle, ring or circle) and their Latin equivalents are
considered equal.

You can change any boolean value by simply clicking it; this switches true to false and vice versa. You can also
change any numeric value by clicking it and typing the desired value.

When you click OK, you will obtain a sequence of assignments of driver (master) fields and slave fields, each
preceded by a dollar sign and separated by semicolons. Each slave field is followed by parentheses containing the
six mentioned parameters separated by white spaces. The sequence will look like this:

$driver_field1=$slave_field1(parameters);...;$driver_fieldN=$slave_fieldN(parameters)


Figure 18.4. An Example of the Join Key Attribute in ApproximativeJoin Component

(When you create the Join key using the wizard, a semicolon is also added to the end of the sequence. However,
this last semicolon is optional and can be omitted.)

You must also define the matching key for comparing driver and slave records (Matching key).

The Matching key needs to be generated before the ApproximativeJoin component using the KeyGenerator
component.

You can define the Matching key using the Matching key wizard. You only need to select the desired master
(driver) field in the Fields pane on the left and drag and drop it to the Master key pane on the right in the
Master key tab. (You can also use the buttons.)

Figure 18.5. Matching Key Wizard (Master Key Tab)

In the Slave key tab, you must select one of the slave fields in the Fields pane on the left and drag and drop it
to the Slave key field column, located to the right of the Master key field column (containing the master field
selected in the Master key tab), in the Key mapping pane.

Figure 18.6. Matching Key Wizard (Slave Key Tab)

The result is a mapping expression of the following form: $driver_field=$slave_field. It can also be
followed by semicolon and hash, but these two signs are optional and can be omitted.

As the two input ports do not need to receive the same metadata, their fields may bear different names, and you
may also want to specify the join key fields for the slave records (Slave override key) and the matching field
names for these slaves (Slave override matching key). If you want to define these keys, click the corresponding
attribute row, after which a button appears. When you click this button, an Edit key wizard opens in which you
can define the corresponding keys. However, these attributes are now deprecated.

You must also define the limit of conformity (Conformity limit (0,1)). The defined value splits the joined
records into two groups according to whether their conformity is greater or smaller than this limit. You must
define a transformation for each of these two groups. The records with smaller conformity can be regarded as
"suspicious".
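The routing by the Conformity limit can be sketched like this (names are illustrative; whether a pair whose conformity exactly equals the limit counts as "greater" is an assumption here):

```python
def route_by_conformity(pairs, limit):
    # Split joined record pairs into two groups: pairs at or above
    # the limit are handled by the main transformation, the rest are
    # marked "suspicious" and handled by the suspicious transformation.
    confident, suspicious = [], []
    for record, conformity in pairs:
        (confident if conformity >= limit else suspicious).append(record)
    return confident, suspicious
```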

For the records with greater conformity you must specify some transformation by defining one of the following
three attributes: Transform class, Transform or Transform URL.

For the records with smaller conformity (suspicious) you must also specify some transformation by defining one
of the following three attributes: Transform class for suspicious, Transform for suspicious or Transform URL
for suspicious.

(Transform class is a path and a file name of some class, jar or zip file located outside the graph. Transform
is the transformation defined in the graph itself with the help of the Java language or the internal Clover transfor-
mation language. Transform URL is a path and a file name of some file written in Java or in the internal Clover
transformation language.)

If you want to define the Transform class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by defining
easy transformation mapping, or writing the transformation in Clover transformation language or Java language.

If you want to define the Transform URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)

You must do the same for the suspicious group.

You may also want to set the character set (Charset) that should be used when reading from the external
Transform URL.

Also in case of this component, you can change the phase of parsing data (Phase), set the visual name located on
the component (Component name) and enable/disable the component (Enable).

ExtHashJoin

This component has at least two input ports and only one output port. Whenever the second or higher order input
port is connected, a new input port is created. The first input port serves as a driver, the other input port(s) serve
as slave(s).

This component does not need to receive the same metadata on the driver and the slave input ports. They do
not even need to have the same number of fields. Nor do the slave ports need to receive the same metadata. No
input metadata can be propagated through the component to the output edge. You must first create the metadata
of the output edge by hand according to the desired result, or select some prepared metadata. Only then can you
define the transformation.

The component first receives the records incoming through the slave input ports, reads them and creates one
hash table per slave input port from these records. These hash tables must be sufficiently small. After that, for
each driver record incoming through the driver input port, the component looks up the corresponding records in
these hash tables. The records on the input ports do not need to be sorted. If such records are found, the tuple
of the driver record and the slave records from the hash tables is sent to the transformation class. The transform
method is called for each tuple of the driver record and its corresponding slave records.
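The build-and-probe flow just described can be sketched as follows (a simplified inner-join sketch; function names are illustrative, not Clover's API):

```python
def hash_join(drivers, slaves_per_port, key_of):
    # Build one in-memory hash table per slave input port; when slave
    # duplicates are not allowed, the last record with a key wins.
    tables = []
    for slaves in slaves_per_port:
        table = {}
        for record in slaves:
            table[key_of(record)] = record
        tables.append(table)
    # Probe every table with each driver record; with an inner join,
    # a tuple is emitted only when every table has a matching slave.
    joined = []
    for driver in drivers:
        matches = [t.get(key_of(driver)) for t in tables]
        if all(m is not None for m in matches):
            joined.append((driver, *matches))
    return joined
```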

You can select the join type (Join type attribute). You can choose one of the following three options: Inner join,
Left outer join, Full outer join. The default value is Inner join.

If this attribute is set to Inner join (the default processing mode), only the driver records for which all slave records
exist are processed. If it is set to Left outer join, also the driver records with no slave are processed. If the attribute
is set to Full outer join, the transformation method is called also for the slave records without a driver record.

You must also decide whether the slave records with duplicate key values should also be used to create the hash
table (Allow slave duplicates). This attribute is set to false by default: duplicate records are not allowed, so
they are discarded and only the last of them is used for the join.

You may also want to change the number of records that can be stored in one hash table (Hash table size
attribute). The default size is 512. If there are more than 512 records, they can still be processed; however, such
a table must be rehashed, which slows down the whole process.

The incoming records do not need to be sorted, but the initialization of the hash tables is time consuming, so it
may be useful to specify how many records can be stored in them. If you decide to specify this attribute, set it
to a value slightly greater than needed. Nevertheless, for small sets of records it is not necessary to change the
default value.

You must define the key that should be used to join the records (Join key). You can define the Join key by typing
or in the Hash Join key wizard.

The Join key attribute is a sequence of mapping expressions, one for each slave, each of them followed by a
hash. The last hash is optional and can be omitted. Each mapping expression is a sequence of pairs of driver and
slave field names (in this order) joined by an equal sign, each pair followed by a semicolon. The last semicolon
is optional and can be omitted.

Figure 18.7. An Example of the Join Key Attribute in ExtHashJoin Component

Order of these mappings must correspond to the order of the slave input ports. If some of these mappings is empty
or missing for some of the slave input ports, the mapping of the first slave input port is used instead.

Each of these mappings is a sequence of matchings separated by colon, semicolon or pipe. For example:
driver_field1=slave_field1|driver_field2=slave_field2|...|driver_fieldN=slave_fieldN.
If some slave_fieldj is missing (in other words, if the subexpression looks like this: driver_fieldj=),
it is supposed to be the same as the driver_fieldj. If some driver_fieldk is missing, driver_fieldk
from the first mapping is used instead. (You can use semicolons instead of the pipes shown above.)
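The defaulting rules above can be sketched with a small parser (a simplified sketch; it ignores the optional dollar-sign prefixes and the per-field driver fallback):

```python
import re

def parse_join_key(join_key):
    # Split the Join key into one mapping per slave port (hash-separated);
    # "driver=" means the slave field shares the driver field's name, and
    # an empty mapping for a port falls back to the first port's mapping.
    mappings = []
    for part in join_key.rstrip("#").split("#"):
        mapping = {}
        for matching in re.split("[:;|]", part):
            if not matching:
                continue
            driver, _, slave = matching.partition("=")
            mapping[driver] = slave or driver
        mappings.append(mapping or (mappings[0] if mappings else {}))
    return mappings
```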

Driver (Master) key can be different for different slaves.

You can also use the mentioned Hash Join key wizard. When you click the Join key attribute row, a button
appears in this row. By clicking this button you can open the mentioned wizard.

Figure 18.8. Hash Join Key Wizard

In it, you can see the tabs for all of the slave input ports. In each tab there are two panes: the Master fields pane
on the left and the Key mapping pane on the right. In the left pane you can see the list of driver field names. In
the right pane you can see two columns: Slave key field and Master key field mapped. The left column contains
the field names of the corresponding slave input port. If you want to map some driver field to some slave field,
select the driver field in the left pane by clicking its item, then drag it to the Master key field mapped column
in the right pane and release the mouse button to transfer the driver field to this column. The same must be done
for each slave. Note that you can also use the Auto mapping button or the other buttons in each tab.

(If you create the Join key using the Hash Join key wizard, the wizard also adds a semicolon and a hash to the
end of the mentioned mappings. See the example above. It also adds a dollar sign before each field name. Note
that the last semicolon and the last hash are optional and can be omitted.)

You must also specify some transformation by defining one of the following three attributes: Transform class,
Transform or Transform URL. (Transform class is a path and a file name of some class, jar or zip file located
outside the graph. Transform is the transformation defined in the graph itself with the help of the Java language
or the internal Clover transformation language. Transform URL is a path and a file name of some file written in
Java or in the internal Clover transformation language.)

If you want to define the Transform class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by defining
easy transformation mapping, or writing the transformation in Clover transformation language or Java language.

If you want to define the Transform URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)

Up to release 2.4 you had to be sure that your transformation could process even null records. From release 2.5
that is no longer necessary. Now each null record is substituted by a special null record for which all of the
getValue methods return null instead of throwing an exception. If you want to take some action on a null
record, you can compare it to NullRecord.NULL_RECORD.
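The null-record behaviour described above can be sketched in Python like this (a conceptual model of the Java NullRecord, not the real class):

```python
class _NullRecord:
    # Every field access yields None instead of raising, so an
    # outer-join transform can read fields unconditionally.
    def get_value(self, field):
        return None

NULL_RECORD = _NullRecord()

class DictRecord:
    # A tiny stand-in for an ordinary data record.
    def __init__(self, fields):
        self.fields = fields
    def get_value(self, field):
        return self.fields.get(field)

def describe_slave(slave):
    # A transform can still detect the missing-slave case explicitly.
    if slave is NULL_RECORD:
        return "<no slave>"
    return slave.get_value("name")
```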

You may also want to set the character set (Charset) that should be used when reading from the external
Transform URL.

Also in case of this component, you can change the phase of parsing data (Phase), set the visual name located on
the component (Component name) and enable/disable the component (Enable).

ExtMergeJoin

This component has at least two input ports and only one output port. Whenever the second or higher order input
port is connected, a new input port is created. The first input port serves as a driver, the other input port(s) serve
as slave(s).

The metadata on the driver and slave port(s) do not need to be the same. They do not even need to have the same
number of fields. But, this component must receive the same metadata on all of the slave input ports. No input
metadata can be propagated through the component to the output edge. You must first create metadata of the output
edge according to the desired result or select some prepared. Only then you can define the transformation.

The component receives the records incoming through the driver (master) and slave input ports and reads them.
(The incoming records must be sorted according to the specified key.) After that, for each driver record incoming
through the driver input port, the component looks up the corresponding slave records. If such records are found,
the driver record along with the slave record(s) is sent to the transformation class. The transform method is called
for each combination of the driver and the corresponding slave record(s). The component joins the records
according to the specified Join key, transforms them and sends them to the output port.
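With both inputs sorted by the key, the lookup can be sketched as a single forward pass (a simplified inner-join sketch with one slave port; names are illustrative, not Clover's API):

```python
def merge_join(drivers, slaves, key_of):
    # Both inputs must already be sorted by the join key. Advance a
    # cursor through the slaves; for each driver, emit one pair per
    # slave record sharing the driver's key.
    joined, i = [], 0
    for driver in drivers:
        # Skip slaves whose key sorts before the driver's key.
        while i < len(slaves) and key_of(slaves[i]) < key_of(driver):
            i += 1
        # Collect every slave with the same key (restart from i so
        # duplicate driver keys match the same group of slaves).
        j = i
        while j < len(slaves) and key_of(slaves[j]) == key_of(driver):
            joined.append((driver, slaves[j]))
            j += 1
    return joined
```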

You can also select the join type (Join type attribute). You can choose one of the following three options: Inner
join, Left outer join, Full outer join. The default value is Inner join.

If this attribute is set to Inner join (the default processing mode), only the driver records for which all slave records
exist are processed. If it is set to Left outer join, also the driver records with no slave are processed. If the attribute
is set to Full outer join, the transformation method is called also for the slave records without a driver record.

You must also decide whether the slave records with duplicate key values should also be used (Allow slave
duplicates). This attribute is set to false by default: duplicate records are not allowed, so they are discarded
and only the last of them is used for the join.

You must define the key that should be used to join the records (Join key). The records on the input ports must
be sorted according to the corresponding parts of the Join key attribute. You can define the Join key by typing
or in the Join key wizard.

The Join key attribute is a sequence of individual key expressions for the driver and all of the slaves followed by
hash. The last hash is optional, it can be omitted. Order of these expressions must correspond to the order of the
input ports. Driver (master) key is a sequence of driver (master) field names (each of them should be preceded by
dollar sign) separated by colon, semicolon or pipe. Each slave key is a sequence of slave field names (first of them
should be preceded by dollar sign) separated by colon, semicolon or pipe.

You can also use the mentioned Join key wizard. When you click the Join key attribute row, a button appears
there. By clicking this button you can open the mentioned wizard.

In it, you can see the tab for the driver (Master key tab) and the tabs for all of the slave input ports (Slave key tabs).

(If you create the Join key using the Join key wizard, this wizard also adds semicolon and hash to the end of the
mentioned sequences. The last semicolon and the last hash are optional, they can be omitted.)

Figure 18.9. Join Key Wizard (Master Key Tab)

In the driver tab there are two panes. The Fields pane on the left and the Master key pane on the right. You can
select the driver expression by selecting the fields in the Fields pane on the left and moving them to the Master
key pane on the right with the help of the Right arrow button.

Figure 18.10. Join Key Wizard (Slave Key Tab)

In each of the slave tab(s) there are two panes: the Fields pane on the left and the Key mapping pane on the
right. In the left pane you can see the list of the slave field names. In the right pane you can see two columns:
Master key field and Slave key field. The left column contains the selected field names of the driver input port.
If you want to map some driver field to some slave field, select the slave field in the left pane by clicking its
item, then drag it to the Slave key field column in the right pane and release the mouse button to transfer the
slave field to this column. The same must be done for each slave. Note that you can also use the Auto mapping
button or the other buttons in each tab.

Driver (Master) key must be unique for all slaves.

You must also specify some transformation by defining one of the following three attributes: Transform class,
Transform or Transform URL. (Transform class is a path and a file name of some class, jar or zip file located
outside the graph. Transform is the transformation defined in the graph itself with the help of the Java language
or the internal Clover transformation language. Transform URL is a path and a file name of some file written in
Java or in the internal Clover transformation language.)

If you want to define the Transform class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by defining
easy transformation mapping, or writing the transformation in Clover transformation language or Java language.

If you want to define the Transform URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)

Up to release 2.4 you had to be sure that your transformation could process even null records. From release 2.5
that is no longer necessary. Now each null record is substituted by a special null record for which all of the
getValue methods return null instead of throwing an exception. If you want to take some action on a null
record, you can compare it to NullRecord.NULL_RECORD.

You may also want to set the character set (Charset) that should be used when reading from the external
Transform URL.

Also in case of this component, you can change the phase of parsing data (Phase), set the visual name located on
the component (Component name) and enable/disable the component (Enable).

LookupJoin

This component has one input port and one or two output ports. The second output port is optional. It does not
need to be connected.

The metadata on the input port and that of the lookup table do not need to be the same. They do not even need
to have the same number of fields. Some of the records incoming through the input port can be sent out through
the second, optional output port if it is connected. Thus, the input port and the second output port have the same
metadata. Nevertheless, the metadata on the input port cannot be propagated through the component to this output
edge; you only need to select the metadata of the input edge for the second output edge as well. The metadata of
the first output edge must be created according to the desired result or you must select some prepared metadata.
Only then can you define the transformation.

The component receives the data through the input port (driver) and from the lookup table (slave). After that,
for each driver record incoming through the input port the component looks up the corresponding slave records
from the lookup table. If such record(s) is(are) found, the driver record along with the slave record(s) are sent
to transformation class. The transform method is called for each pair of the driver and the corresponding slave
record. The component joins the records according to the specified Join key, transforms them and sends them to
the first output port. Each driver record with no slave can be sent to the optional second output port if that port
is connected to some other component. If the component is switched to the left outer join mode, however, no
driver records are sent to the optional output port because they are all processed.
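The routing just described can be sketched like this (a simplified model with the lookup table as a plain dictionary; names are illustrative, not Clover's API):

```python
def lookup_join(drivers, lookup_table, key_of, left_outer=False):
    # Matched drivers are paired with their slave and go to output 0.
    # In inner-join mode, unmatched drivers go to the optional second
    # output; in left-outer mode every driver is processed (with None
    # as the missing slave) and the second output stays empty.
    joined, unmatched = [], []
    for driver in drivers:
        slave = lookup_table.get(key_of(driver))
        if slave is not None or left_outer:
            joined.append((driver, slave))
        else:
            unmatched.append(driver)
    return joined, unmatched
```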

When you select this component, you must first specify the lookup table that should be used as the resource of
slave records (Lookup table). You must also decide whether the data stored in memory should be lost after the
process finishes (Free lookup table after finishing). It is set to false by default.

You must define the key that should be used to join the records (Join key). It is a sequence of field names from
the input metadata separated by semicolon. You can define the key with the help of the Edit key wizard.

Figure 18.11. Edit Key Wizard

You must also specify some transformation by defining one of the following three attributes: Transform class,
Transform or Transform URL. (Transform class is a path and a file name of some class, jar or zip file located
outside the graph. Transform is the transformation defined in the graph itself with the help of the Java language
or the internal Clover transformation language. Transform URL is a path and a file name of some file written in
Java or in the internal Clover transformation language.)

When you define the transformation, data records that are read from the lookup table are considered as if they
were incoming through port 1 (which is virtual).

If you want to define the Transform class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by defining
easy transformation mapping, or writing the transformation in Clover transformation language or Java language.

If you want to define the Transform URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)

You can also change the join type (Left outer join attribute). You can select either left outer join (true) or inner
join (false). The default value of this attribute is false.

By default the component uses the inner join type. It joins the records incoming through the input port with the
records from the lookup table, but only in case they have the same key value. The records incoming through the
input port that have the key value different from the values contained in the lookup table are not joined. Such
incoming records can be sent to the second optional output port if it is connected. If the second optional port is
not connected, the component discards the driver records that have no corresponding slave.

If you switch to the left outer join, even the driver records with no slave record are processed and none of them
can be sent to the second optional output port.

You may also want to set the character set (Charset) that should be used when reading from the external
Transform URL.

Also in case of this component, you can change the phase of parsing data (Phase), set the visual name located on
the component (Component name) and enable/disable the component (Enable).

DBJoin

This component has one input port and one or two output ports. The second output port is optional. It does not
need to be connected.

The metadata on the input port and that of the database table do not need to be the same. They do not even need
to have the same number of fields. When the second output port is connected, it can receive some of the records
incoming through the input port. Thus, the input port and the second output port have the same metadata.
Nevertheless, the metadata on the input port cannot be propagated through the component to this output edge;
you only need to select the metadata of the input edge for the second output edge as well. The metadata of the
first output edge must be created by hand according to the desired result or you must select some prepared
metadata. Only then can you define the transformation.

The component receives the data through the input port (driver) and from the database (slave). After that, for
each driver record incoming through the input port the component looks up the corresponding slave records from
the database table. If such record(s) is(are) found, the driver record along with the slave record(s) are sent to
transformation class. The transform method is called for each pair of the driver and the corresponding slave record.
The component joins the records according to the specified Join key and sends them out through the first output
port. Each driver record with no slave can be sent to the optional second output port if an edge is connected to
this port. If the component is switched to the left outer join mode, however, no driver records are sent to the
optional output port because they are all processed.

When you select this component, you must first specify the database connection that should be used to connect
to the database (DB connection). The component uses a JDBC driver to connect to the database. You must also
define the query that should be sent to the database (SQL query). The database table serves as a dynamic DB
lookup table and as the resource of slave records. You may also want to specify the metadata of the database
table (DB Metadata). If you select no metadata, the component will obtain the metadata with the help of the query.

You must define the key that should be used to join the records (Join key). It is a sequence of field names from
the input metadata separated by semicolon. You can define the key with the help of the same Edit key wizard
like in the LookupJoin component (see above).

You must also specify some transformation by defining one of the following three attributes: Transform class,
Transform or Transform URL. (Transform class is a path and a file name of some class, jar or zip file located
outside the graph. Transform is the transformation defined in the graph itself with the help of the Java language
or the internal Clover transformation language. Transform URL is a path and a file name of some file written in
Java or in the internal Clover transformation language.)

When you define the transformation, data records that are loaded from the database are considered as if they
were incoming through port 1 (which is virtual).

If you want to define the Transform class attribute, you must click its item row, after which a button appears
there, and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See
Section "Open Type Wizard" for more information.)

If you want to define the Transform attribute, you must click its item row, after which a button appears there,
and, when you click this button, a Transform editor opens. There you can define the transformation by defining
easy transformation mapping, or writing the transformation in Clover transformation language or Java language.

If you want to define the Transform URL attribute, you must click its item row, after which a button appears
there, and, when you click this button, an URL File Dialog opens. In it, you can locate the desired file. (See
Section "Locating Files with URL File Dialog" for more information.)

If no transformation is defined, only the records from the database table (slaves) are sent to the output port, but
only those slaves for which some corresponding driver exists.

You can also change the join type (Left outer join attribute). You can select either left outer join (true) or inner
join (false). The default value of this attribute is false.

By default the component uses the inner join type. It joins the records incoming through the input port with the
records from the database table, but only in case they have the same key value. The records incoming through
the input port that have the key value that differs from the values contained in the database table are not joined.
Such incoming records can be sent to the second output port if it is connected. If the second optional port is not
connected, the component discards the driver records that have no corresponding slave.

If you switch to the left outer join, even the driver records with no slave record are processed and none of them
are sent to the second output port.

You may change the number of slave record sets with different key values that can be stored in memory (Cache
size). The default value is 100.
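The effect of Cache size can be sketched as a small least-recently-used cache in front of the database query (query_db and the LRU eviction policy are illustrative assumptions, not Clover's documented implementation):

```python
from collections import OrderedDict

def make_cached_lookup(query_db, cache_size=100):
    # Keep the slave record sets for up to cache_size distinct key
    # values in memory so repeated driver keys skip the database.
    cache = OrderedDict()
    def lookup(key):
        if key in cache:
            cache.move_to_end(key)         # recently used
        else:
            cache[key] = query_db(key)     # run the SQL query
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the oldest entry
        return cache[key]
    return lookup
```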

You may also want to set the character set (Charset) that should be used when reading from the external
Transform URL.

Also in case of this component, you can change the phase of parsing data (Phase), set the visual name located on
the component (Component name) and enable/disable the component (Enable).

Chapter 19. Other Components
These components serve to fulfil tasks that have not been covered so far. We will describe them now.

Executing Components
These components execute system, Java or database commands, or run Clover graphs.

SystemExecute

This component has one optional input port and one optional output port. The purpose of these ports is described
below in this section.

When you select this component, you can connect either of the two ports, but you do not need to connect either
of them. Each port can have different metadata. You must create both metadata by hand or select some prepared
metadata. Metadata cannot be propagated through the component.

In case you do not connect the output port and you want to get some output, you need to specify the file to which
data should be written (Output file URL). In such a case, you must decide whether the data should be appended
to the output file (Append) or whether the file should be replaced by a new one. The value is false by default
("do not append data, replace it").

You must specify some system commands that should be executed (System command). These can be defined
with the help of the Edit value wizard. If the Command interpreter attribute is defined, the commands are saved
to a temporary file and this file is executed by the interpreter as a script.

You can also set the number of error lines (Number of error lines) that should be printed to the output file if
some errors happen.

You may also want to define which command interpreter should be used (Command interpreter). This attribute
must have the following form: interpretername [parameters] ${} [parameters]. If the command
interpreter is defined, system commands are written to a temporary file and executed as a script by the interpreter.
In such a case, the component replaces this ${} expression by the name of this script file.
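The substitution described above amounts to a simple placeholder replacement (a sketch; the temporary script file name is hypothetical):

```python
def build_interpreter_command(interpreter_spec, script_file):
    # The system commands are written to script_file, and the ${}
    # placeholder in the interpreter specification is replaced by
    # that file's name to form the actual command line.
    return interpreter_spec.replace("${}", script_file)
```

For example, the specification "bash ${}" with a temporary script /tmp/clover_cmds.sh would yield the command "bash /tmp/clover_cmds.sh".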

If the command requires some data, it can be sent to the component through the optional input port from some
other component. In such a case, the input port must be connected.

Also in case of this component, you can change the phase of parsing data (Phase), set the visual name located on
the component (Component name) and enable/disable the component (Enable).

JavaExecute

This component has neither input port nor output port.

When you select this component, you must specify what should be executed.

You must define one of the following three attributes: Runnable class, Runnable or Runnable URL. (Runnable
class is a path and a file name of some class, jar or zip file located outside the graph. Runnable is the transformation
defined in the graph itself with the help of the Java language. Runnable URL is a path and a file name of some
file written in Java language.)

If you want to define the Runnable class attribute, you must click its item row, after which a button appears there,
and, when you click this button, an Open Type wizard opens. In it, you must specify the desired type. (See Section
"Open Type Wizard" for more information.)

If you want to define the Runnable attribute, click its attribute row; a button appears there, and clicking it opens
the Edit value wizard, in which you can define the transformation in the Java language. (See Section "Edit Value
Wizard" for more information.)

If you want to define the Runnable URL attribute, click its attribute row; a button appears there, and clicking
it opens the URL File Dialog, in which you can locate the desired file. (See Section "Locating Files with URL
File Dialog" for more information.)

You can also specify some properties that should be used when executing the Java command (Properties).

You may also want to set the character encoding (Charset) that should be used when reading from an external
Runnable URL.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

DBExecute

This component has one optional input port and one optional output port. The purpose of these ports is described
below in this section.

When you select this component, you can connect either of the two ports, but neither of them is required. The
two ports may have different metadata. You must create the metadata by hand or select prepared metadata;
metadata cannot be propagated through the component.

When you select this component, remember that it uses a JDBC driver to connect to the database. Thus, you must
create the database connection and specify the corresponding attribute: DB connection.

You must also specify an SQL query. You can do this in one of the following two ways: as the SQL query
attribute or as the Query URL attribute. In other words, the query can be contained either in the graph itself or
outside the graph in a file. If you define both attributes, only SQL query will be applied.

In both cases you can decide whether the SQL commands should be sent to stdout (Print statements). This
attribute is set to false by default.

If you set the SQL query attribute, you can also define the statement delimiter for the query (SQL statement
delimiter). The default delimiter is semicolon. A query may consist of statements separated from each other by
semicolon. These statements will be executed one by one.
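
For example, with the default semicolon delimiter, the SQL query attribute could hold two statements that are executed one after the other (the table names are illustrative):

```
DELETE FROM daily_orders;
INSERT INTO daily_orders SELECT * FROM staging_orders;
```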

You must decide whether the query should be executed in a transaction (Transaction set). This attribute has the
following three possible values: One statement, One set of statements and All statements. The default value is
One statement. (Releases of CloverEngine older than 2.4 have two possible boolean values for this attribute; the
default value in those releases is false.) Remember that some database systems do not support transactions.

• If the value of the attribute is One statement, commit is performed after each query execution.

• If the value of the attribute is One set of statements, all statements are executed for each input record and
commit is performed after each such set. For this reason, if an error occurs during the execution of any statement,
all statements are rolled back for that record.

• If the value of the attribute is All statements, commit is performed only after all statements. For this reason,
if an error occurs, all operations are rolled back.

You may want to specify whether the query should be treated as stored procedure/function calls that use the
JDBC CallableStatement (Call as stored procedure). This attribute is set to false by default. If you switch it
to true, you may have to define at least one of the following two series of parameters: Query input parameters
and Query output parameters.

• To call the stored procedure/function with input parameters, you must connect an edge to the input port, assign
it some metadata fields and define which fields should be used as such parameters. They must be expressed
with the help of the Query input parameters attribute.

This attribute must be of the form: 1:=$inField1;...;n:=$inFieldN since the parameters are numbered
starting from 1. This way the mentioned input field names are mapped to the input parameters of the query.

• To call the stored procedure/function with returned value and/or output parameters, you must connect an edge
to the output port, assign it the necessary metadata fields and define to what fields such value(s) or parameter(s)
should be mapped. They must be expressed with the help of the Query output parameters attribute.

This attribute must be of the form: 1:=$outField1;...;n:=$outFieldN since the parameters are numbered
starting from 1. This way the returned value and the mentioned output parameters of the query are mapped
to the output field names. The returned value is the first output parameter. If the command returns a result set,
you need to specify the sequence of output metadata field names separated from each other by semicolon
(Result set output fields).
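
As a hedged illustration (the field names are hypothetical), a stored procedure taking two input parameters and returning one value could be mapped like this:

```
Query input parameters:  1:=$customerId;2:=$orderDate
Query output parameters: 1:=$discountRate
```

The first mapping sends the customerId and orderDate input fields to parameters 1 and 2 of the call; the second maps the returned value (always the first output parameter) to the discountRate output field.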

You may also want to set the character encoding (Charset) that should be used when reading from an external
Query URL.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

RunGraph

This component has one optional input port and two optional output ports. When you select this component, you
can connect any of the ports, but none of them is required. There are two ways to connect the ports, depending on
how the component is configured. If both output ports are connected, they have the same metadata, whose
structure is described in this section. If the first input port and the first output port are connected, they have
different metadata.

The ports may have different metadata. You must create all metadata by hand or select prepared metadata.
Metadata cannot be propagated through the component, and the metadata on the ports must have the structure
described below in this section.

This component serves to execute any of the prepared Clover graphs.

When you select this component, you must define the graph that should be executed in one of the following two ways:

One way is to set the Graph URL attribute. In this case, you do not need to connect the input port, but both
output ports must be connected. The metadata on the output ports must have the structure described below. The
Graph URL attribute is a path and file name that can be defined with the help of the URL File Dialog.
Information on whether the execution of the specified graph was successful is sent to these ports: information
about a successful execution is sent to the first output port, whereas information about a failure is sent to the
second output port. The metadata of the edges connected to these output ports must have the following five
fields: graph, result, description, message, duration. The first four are of the string data type, the
last is of the decimal data type. (These metadata must be created by hand. See the corresponding section above.)

The graph field contains the path and the name of the graph that was executed. The result field contains one
of the following: Finished OK, Aborted or Error. The description field contains a detailed description
if the graph fails. The message field is a string value of org.jetel.graph.Result. The duration field
contains the time of the graph execution in milliseconds.
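
For instance, a record arriving on the first output port after a successful run might carry values like these (all values are purely illustrative, including the message string, which depends on how org.jetel.graph.Result is rendered):

```
graph:       graphs/loadCustomers.grf
result:      Finished OK
description:
message:     FINISHED_OK
duration:    1250
```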

The other way is to connect the input port, through which the component receives records whose first field is
of the string data type and contains the path and file name of the graph to be executed. The second input field
is optional; if it is defined and used, it contains Clover command-line arguments as a string. If the first input
port is connected, only the first output port needs to be connected as well. Information on whether the execution
of the specified graph was successful is sent to this output port. The metadata on the output port must have the
same structure as described above.

If any of the graphs specified on the input port fails, the Ignore graph fail attribute decides whether the execution
continues. This attribute is set to false by default, so the execution stops if any of the graphs fails.

When you select this component, you must also decide which JVM should execute the specified graph (The same
JVM). This attribute is set to true by default: the graph is executed in the same instance of the JVM. But if you
set this attribute to false, you can define the following attributes in addition to those mentioned already:

First, you must define the other JVM (Alternative JVM command line); it is java by default. You may also
want to define the name of the main class that will execute the specified graph (Graph execution class); the
default value of this attribute is org.jetel.main.runGraph. You can also specify arguments for the
command line (Command line arguments). You can also decide whether you want to log the result of the process
to a file (Log file URL). If you want to log the process, you must specify the path and file name of the log
file by using the URL File Dialog. In addition, you can specify that the log information should be appended
(Append to log file); this value is set to false by default ("do not append, replace the file"). The logged
information is only a string describing the execution of the graph and whether it was successful.
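
A hedged sketch of such a configuration (the memory option and the log path are illustrative; the class name is the documented default):

```
The same JVM:                  false
Alternative JVM command line:  java -Xmx512m
Graph execution class:         org.jetel.main.runGraph
Log file URL:                  log/rungraph.log
Append to log file:            false
```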

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Non-Executing Components
The following three components do not execute any tasks, but they do some other work.

CheckForeignKey

This component has two input ports and one or two output ports.

When you select this component, you must connect both input ports and at least one output port; the second
output port is optional.

The metadata on the two input ports can differ. The metadata on the output port(s) can be the same as on the
first input port; they must at least have the same structure (the number of fields, data types and sizes), although
field names may differ. Nevertheless, metadata cannot be propagated through this component.

The component receives two data flows (the primary and the foreign). The foreign data flow is connected to
the first input port and the primary data flow to the second input port. The keys of both flows are compared.
If a foreign key value is not found among the primary key values, the default value is substituted for the invalid
foreign key value. All foreign records are then sent to the first output port with the new foreign key values, and
the original foreign records with invalid foreign key values can be sent to the optional second output port if
it is connected.

When you select this component, you must specify the foreign key (Foreign key).

In older versions of Clover you had to specify both the primary and the foreign keys using the Primary key and
the Foreign key attributes, respectively. They had the form of a sequence of field names separated from each other
by semicolon. However, the use of Primary key is now deprecated.

The Foreign key is a sequence of individual assignments separated from each other by semicolon. Each
assignment looks like this: $foreignField=$primaryKey. The last assignment may additionally be followed
by a semicolon and a hash, but these terminal characters are optional and can be omitted.
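
For example (the field names are hypothetical), a foreign key mapping two fields could look like this:

```
$customerId=$id;$countryCode=$code
```

Each $foreignField on the left is checked against the corresponding $primaryKey field of the primary flow.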

To define Foreign key, you must select the desired fields in the Foreign key tab of the Foreign key definition
wizard. Select the fields from the Fields pane on the left and move them to the Foreign key pane on the right.

Figure 19.1. Foreign Key Definition Wizard (Foreign Key Tab)

When you switch to the Primary key tab, you will see that the selected foreign fields appeared in the Foreign
key column of the Foreign key definition pane.

Figure 19.2. Foreign Key Definition Wizard (Primary Key Tab)

You only need to select some primary fields from the left pane and move them to the Primary key column of
the Foreign key definition pane on the right.

Figure 19.3. Foreign Key Definition Wizard (Foreign and Primary Keys Assigned)

You must also define the default foreign key values (Default foreign key). This key is a sequence of values of
the corresponding data types, separated from each other by semicolon. The number and data types of the values
must correspond to the metadata of the foreign key.

If you want to define the default foreign key values, you must click the Default foreign key attribute row and
type the default values of all fields.

You may also want to set the Hash table size attribute to some value. By default it is 512. Remember that this
value should be greater than the number of unique primary key values.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

LookupTableReaderWriter

This component has one input port and at least one output port. Whenever you connect an edge to any output port,
a new output port is created.

You can connect the input port or the output port(s) alone or both the input and the output port(s) at the same time.
This component can be used as a writer, a reader or both reader and writer at the same time.

When it is used as a reader, it reads data from the lookup table and sends it to the connected output edge(s). The
input port is not connected.

When it is used as a writer, it reads data from the connected input edge and writes it to the lookup table. The
output port is not connected.

When it is used both as a reader and a writer, it receives data from the connected input edge, updates the lookup
table, reads all data from the lookup table and sends it out through the output port(s). Both the input port and the
output port(s) are connected.

Remember that the metadata of the lookup table can be the same as the metadata of the edge(s). Both the lookup
table and the edge(s) must at least have the same metadata structure (the number of fields, data types and sizes);
the metadata name and even the field names may differ.

When you select this component, you must specify the name of the lookup table (Lookup table). You must also
decide whether the data stored only in memory should be lost after the process finishes (Free lookup table after
finishing). It is set to false by default.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Chapter 20. Deprecated
This category includes some older components whose use is now deprecated. We suggest you do not use them.
However, four of these eight components were in use until recently, so we describe them in this chapter:
DelimitedDataReader, FixLenDataReader, DelimitedDataWriter and FixLenDataWriter.

The other four were moved to the Deprecated category longer ago and are not described here: Sort, Filter,
HashJoin and MergeJoin.

You should use UniversalDataReader instead of DelimitedDataReader and FixLenDataReader and also
UniversalDataWriter instead of DelimitedDataWriter and FixLenDataWriter.

Flat File Readers


The following two components (DelimitedDataReader and FixLenDataReader) read data only from flat files
with either delimited or fixed-length metadata. Delimiters and sizes are defined in the metadata.

These file readers can also receive data through their optional input port.

DelimitedDataReader

The use of this component is deprecated, we suggest you use UniversalDataReader instead.

This component has one optional input port and at least one output port. Whenever you connect an edge to any
output port, a new output port is created. You can extract metadata from a flat file.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can be set to one of
the following: source, discrete and stream. If it is not set explicitly, it defaults to discrete. (The
meaning of these values is described in the section "File URL" above.) The input data type of this FieldName
must be one of the following three: string, byte or cbyte.
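
For instance (the field name fileContent is hypothetical), the File URL attribute below reads whatever the first input port delivers in its fileContent field:

```
port:$0.fileContent:discrete
```

Because discrete is also the default, port:$0.fileContent would behave the same way.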

This component reads data from flat files in which both fields and records are separated from each other by
so-called delimiters (a character or a sequence of characters). For this reason, individual field values must not
contain the delimiter sequences themselves. If a delimiter appeared inside a field, the field would be split into
parts or truncated, because its inner part or its leading or trailing end would be interpreted as a delimiter.
When you are configuring this component, you must specify these delimiters of fields and records.

If you need to put a delimiter into a field, you can do so by surrounding the field value with single or double
quotes. This way the delimiter can appear inside the field value.

You must also decide which file should be read (File URL) and which character encoding is used in the records
(Charset). Some files do not contain the names of the fields, whereas other files have them on the first line. In
the latter case, you must set the Skip first line property to true; by default it is false. You can select how many
records should be read from the file (Max number of records); otherwise the reader reads and sends out all
records. You can also specify what to do with incorrect records (Data policy). If you switch to the controlled
data policy, information about errors is logged; in this component the log information is sent to stdout.

The first rows of the file may be only a header describing the data and not data itself. In such a case, you must
set the Skip rows attribute to the number of rows that must be skipped.

Sometimes there are white spaces between a field and a delimiter. In such a case, you may want to set the Trim
strings attribute to true so that the white spaces are removed.

You can also set the Incremental file and Incremental key attributes. The Incremental key is a string to which
information about read records is written. This key is stored in the Incremental file. This way, the component
reads only new records on each run of the graph.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

FixLenDataReader

The use of this component is deprecated, we suggest you use UniversalDataReader instead.

This component has one optional input port and at least one output port. Whenever you connect an edge to any
output port, a new output port is created. You can extract metadata from a flat file.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:
$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the
following: source, discrete and stream. If it is not set explicitly, it is set to discrete by default. (You
can see the meaning of these attribute values in the section "File URL" above.) Input data type of this FieldName
must be one of the following three: string, byte or cbyte.

This component reads data from flat files in which all fields have exactly defined sizes. When you are configuring
this component, you must specify how many characters belong to each individual field of the records.

You must also decide which file should be read (File URL) and which character encoding is used in the records
(Charset). If the file contains the names of the fields on the first line, you must skip this line by setting the Skip
first line property to true. You can select how many records should be read from the file (Max number of
records); otherwise the reader reads and sends out all records. You can also specify what to do with incorrect
and/or empty fields (Data policy). If you switch to the controlled data policy, information about errors is logged;
in this component the log information is sent to stdout.

In addition to the attributes mentioned above, in this type of component, you also need to define the following:

Sometimes the first rows are only a header describing the data and not data itself. In such a case, you must
set the Skip rows attribute to the number of rows that must be skipped.

The Byte mode attribute is set to false by default. You can switch to byte mode by selecting or typing true in
the component wizard; a byte buffer will then be used for data parsing instead of a char buffer. This is effective
only for the byte or cbyte data types.

You must also define whether white spaces in the leading and/or trailing ends of the fields should be skipped (Skip
leading blanks and/or Skip trailing blanks, respectively). Both of these attributes are true by default.

You also need to decide whether empty records should be skipped. You must set the Skip empty attribute to true
if you want. It is false by default.

Remember that if you select the Byte mode attribute to be true, the properties Skip leading blanks, Skip trailing
blanks and/or Skip empty have no effect on the process of data parsing.

Some records may be incomplete. In such a case, you must decide whether you want to have such records parsed
or not. If you want these records to be parsed, you must set the Enable incomplete attribute to true. By default,
it is true for char mode and false for byte mode.

You may also want to Trim strings.

Remember that a delimiter can be set in the metadata even here, but it is used only to delimit individual records.
Without it, you would have to specify how many fields are contained in one record instead of specifying the
record delimiter.

You can also set the Incremental file and Incremental key attributes. The Incremental key is a string to which
information about read records is written. This key is stored in the Incremental file. This way, the component
reads only new records on each run of the graph.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Flat File Writers


The following two components (DelimitedDataWriter and FixLenDataWriter) write data only to flat files with
either delimited or fixed-length metadata. Delimiters and sizes are defined in the metadata.

These file writers can also send data out through their optional output port.

DelimitedDataWriter

The use of this component is deprecated, we suggest you use UniversalDataWriter instead.

This component has one input port and one optional output port.

This component writes data to flat files in which both fields and records are separated from each other by
so-called delimiters (a character or a sequence of characters). For this reason, individual field values must not
contain the delimiter sequences themselves. If a delimiter were contained in a field, the field would subsequently
(when read back) be split into parts or truncated, because its inner part or its leading or trailing end would be
interpreted as a delimiter. When you are configuring this component, you must specify these delimiters of
fields and records.

When you select this component, you must specify the file(s) to which data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to
discrete. (The meaning of this value is described in the section "File URL" above.) The output data type of
this FieldName must be one of the following three: string, byte or cbyte.

You may also want to set the character encoding that should be used for the data written to the output file(s)
(Charset).

If you want, you can write the names of the fields (Write field names) to the first row of the output file(s). It
is set to false by default.

It is very important to decide whether the records should be appended to the existing file (Append) or whether
the file should be replaced. This attribute is set to false by default ("do not append, replace the file").

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped
records). It is 0 by default. You can set a limit to the Max number of records. If you did not specify these
attributes, DelimitedDataWriter would write all incoming data records to output file(s).

You can also limit the maximum number of records contained in one file (Records per file) and/or the file size
in bytes (Bytes per file). In such a case, if you want to write the incoming data records to several output files
instead of only one, you must use dollar signs in the output file base name (in File URL). This way, several
output files will be created and the data records will be distributed among them.
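
As an illustration (the base name is hypothetical, and the exact numbering scheme is an assumption that may differ between releases), a File URL such as the following lets the writer create a numbered series of output files:

```
data/output$$.txt
```

With Records per file or Bytes per file set, the writer would then produce files such as output00.txt, output01.txt and so on, the dollar signs standing for the digits of the file number.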

If you want to partition the data flow and distribute the incoming records among different output files, you must
define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key field
tag). The default value of this attribute is Number file tag. If you want to give other names to these output
files, you must specify Partition lookup table and Partition output fields.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

FixLenDataWriter

The use of this component is deprecated, we suggest you use UniversalDataWriter instead.

This component has one input port and one optional output port.

This component writes data to flat files in which all fields have exactly defined sizes. When you are configuring
this component, you must specify how many characters belong to each individual field of the records.

When you select this component, you must specify the file(s) to which the data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to
port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to
discrete. (The meaning of this value is described in the section "File URL" above.) The output data type of
this FieldName must be one of the following three: string, byte or cbyte.

You may also want to set the character encoding (Charset) that should be used for the data written to the
output file(s).

You can also specify what character should be used for padding fields (Field filler) and/or padding gaps between
fields in output records (Record filler). Default field filler is a space, default record filler is an equal sign.

If you specify some fillers, you can decide whether data should be aligned to the left or to the right. You can set
the Left align attribute to false if you want to align data to the right. By default this attribute is set to true (data
is aligned to the left).

If you want, you can write the names of the fields (Write field names) to the first row of the output file(s). It
is set to false by default.

It is very important to decide whether the records should be appended to the existing file (Append) or whether
the file should be replaced. This attribute is set to false by default ("do not append, replace the file").

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped
records). It is 0 by default. You can set a limit to the Max number of records. If you did not specify these
attributes, FixLenDataWriter would write all incoming data records to output file(s).

You can also limit the maximum number of records contained in one file (Records per file) and/or the file size
in bytes (Bytes per file). In such a case, if you want to write the incoming data records to several output files
instead of only one, you must use dollar signs in the output file base name (in File URL). This way, several
output files will be created and the data records will be distributed among them.

If you want to partition the data flow and distribute the incoming records among different output files, you must
define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key field
tag). The default value of this attribute is Number file tag. If you want to give other names to these output
files, you must specify Partition lookup table and Partition output fields.

As with the other components, you can change the phase in which the component runs (Phase), set the visual
name displayed on the component (Component name) and enable or disable the component (Enable).

Appendix D. Defining
Transformations in Java
In the same way as you can define transformations in the Clover transformation language (see Part IV), you can
also define them in Java. If you want to write transformations in Java, you must add some jar files to the build
path. You must at least add the same two jar files that were added for creating metadata from dBase files:
cloveretl.engine.jar and commons-logging.jar. These files are contained in the following folder:
pathtotheeclipsefolder/eclipse/plugins/com.cloveretl.gui_2.0.0/lib/lib.

If you need to use connections, sequences and/or other tools, you must also add other appropriate jar files. See
Section "Creating Metadata from a DBase File" for more detailed information on how to add the mentioned jars.

Part IV. Transformation Language
Chapter 21. Clover Transformation
Language
Clover transformation language (CTL) is used to define transformations in some components (all Joiners,
Partition, DataIntersection, Reformat, Denormalizer and Normalizer).

Program Structure
Each program written in CTL must have the following structure:

ImportStatements
VariableDeclarations
FunctionDeclarations
Statements
Mappings

Remember that the ImportStatements must be at the beginning of the program and the Mappings must be
at its end. Both ImportStatements and Mappings may consist of several individual statements or mappings,
each of which must be terminated by a semicolon. The middle part of the program can be interspersed: individual
declarations of variables and functions and individual statements do not need to appear in this order. However,
they must always use only variables and functions that have already been declared. Thus, you need to declare
a variable or function before you can use it in a statement or in another declaration.
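
A minimal sketch of this structure (the imported file, variable, function and field names are hypothetical, and the syntax details may vary between CTL releases):

```
import 'commonFunctions.ctl';    // imports come first

int total;                       // variable declaration

function addVat(amount) {        // function declaration
    return amount * 1.2;
}

total = addVat(100);             // statement using the declared function

$0.price := total;               // mappings come last
```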

Comments
Throughout the program you can use comments. These comments are not processed, they only serve to describe
what happens within the program.

The comments are of two types. They can be one-line comments or multiline comments. See the following two
options:

• //This is a one-line comment.

• /* This is a multiline comment. */

Import
First of all, at the beginning of a CTL program, you can import existing CTL programs. You do it in one of
the following ways:

• import 'fileURL';

• import "fileURL";

You must decide whether you want to use single or double quotes. Single quotes do not escape so-called escape
sequences. (For more details see Section "Literals" below.) For fileURL, you must type the URL of an existing
source code file.

But remember that you must import such files at the beginning, before any other declarations and/or statements.


Data Types
In any program, you can use some variables. Their data types can be the following:

• int

This data type serves to store integer numbers.

To store a value, 32 bits are used.

Its range is from Integer.MIN_VALUE to Integer.MAX_VALUE (according to the Java integer data type).
From -2147483648 to +2147483647.

Its declaration looks like this: int identifier;

The default value is 0.

If you add an l letter to the end of any integer number, you cast it to the long data type.

• long

This data type serves to store long numbers.

To store a value, 64 bits are used.

Its range is from Long.MIN_VALUE+1 to Long.MAX_VALUE (according to the Java long data type). From
-9223372036854775807 to +9223372036854775807.

Its declaration looks like this: long identifier;

The default value is 0.

Any integer number can be cast to this data type by adding an l letter to the end of the integer number.

• decimal

This data type serves to store decimal numbers with arbitrary precision.

Its declaration can look like this: decimal identifier;

or it can be: decimal (length,scale) identifier;

The default length and scale are 8 and 2, respectively.

The default values of DECIMAL_LENGTH and DECIMAL_SCALE are contained in the
org.jetel.data.defaultProperties file.

You can cast any float number to the decimal data type by adding the d letter to the end of the float number.

• number (double)

This data type serves to store double numbers.

To store a value, 64 bits are used.

The data type has the following three special values: NaN, Infinity, -Infinity.

Its declaration looks like this: number identifier;

The default value is 0.0.


• string

This data type serves to store sequences of characters.

To store a string, each character is stored in 16 bits.

The declaration can look like this: string identifier;

The default value is an empty string.

• date

This data type serves to store date and time.

Its declaration looks like this: date identifier;

The default value is the current date and time.

• boolean

This data type serves for values of logical expressions.

It can be either true or false.

Its declaration looks like this: boolean identifier;

The default value is false.

• bytearray

This data type is an array of bytes with a length of up to Integer.MAX_VALUE. It behaves similarly to the
list data type (see below).

Its declaration can look like this: bytearray identifier;

or it can be: bytearray (size) identifier;

The default bytearray is an empty array.

• list

This data type is a container of any data type.

The list data type is indexed by integers.

Its declaration looks like this: list identifier;

The default list is an empty list.

Example:

list list2; list2[5]=123;

Assignments:

• list1=list2;

It means that both lists reference the same elements.

• list1[ ]=list2;

It adds all elements of list2 to the end of list1.


• list1[ ]="abc";

It adds the "abc" string to the list1 as its new last element.

• list1[ ]=NULL;

It removes the last element of the list1.

• map

This data type is a container of any data type.

The map is indexed by strings.

Its declaration looks like this: map identifier;

The default map is an empty map.

Example: map map1; map1["abc"]=true;

The assignments are similar to those valid for a list.

• record

This data type is a set of fields of data.

The structure of record is based on metadata.

Its declaration looks like this: record (metadata) identifier;

Remember that the metadata id must be used in the record declaration. Do not use the metadata name here!

The variable has no default value.

It can be indexed by both integer numbers and strings.

Literals
Literals serve to write the data types mentioned above.

• Integer

These literals represent integer data type expressed in decimal form.

They can be marked using the following form: -[0-9]+ or [0-9]+. For example, -25487 or 25487.

• Octal integer

These literals represent integer data type expressed in octal form. In other words, in the base-8 numeral system.

They can be marked using the following form: 0[0-7]+. For example, 0644.

• Hexadecimal integer

These literals represent integer data type expressed in hexadecimal form. In other words, in the base-16 numeral
system.

They can be marked using the following form: 0x[0-9A-F]+. For example, 0x2AF3.


• Long integer

These literals represent long data type. In other words, they represent integer numbers greater than 2^32.

They can be marked using the following form: [0-9]+L. For example, 956230781312312331287L.

• Number (Double)

These literals represent floating point numbers in double precision format.

They are stored in 64 bits.

They can be marked using the following form: [0-9]+.[0-9]+.

For example, 452.126 is representation of a double.

• Decimal

These literals represent decimal numbers with fixed precision.

They can be marked using the following form: [0-9]+.[0-9]+D.

For example, 235.32D is representation of a decimal.

• Double quoted string

These literals can represent strings.

They are sequences of characters surrounded by double quotes.

To express unprintable characters, you can use so called escape sequences like the following pairs: \t (tabulator),
\n (line feed), \r (carriage return), etc. These pairs of characters are escaped to their corresponding
unprintable characters.

You must not use a double quote sign inside any double quoted literal. However, if you need to use a double
quote inside, you can do it by using a double quote sign preceded by a backslash: \". This pair is escaped to
a double quote character.

For example, "Hello\tworld!" is a double quoted representation of a string containing a tabulator.

• Single quoted string

These literals can represent strings.

They are sequences of characters surrounded by single quotes.

Unlike double quoted literals, they cannot express so called escape sequences.

Single quote alone must not be contained in a single quoted literal. However, it can be used if it is preceded by
a backslash. This way the pair of backslash and single quote is escaped to a single quote character.

For example, 'Hello\tworld!' is a single quoted representation of a string. But the backslash and
the t letter are not converted together to a tabulator unlike the same two characters in double quoted literals.
Pairs of backslash and a letter other than single quote remain pairs of characters.

• Date

These literals represent date.

They have the following form: yyyy-[M[M]]-[d[d]].


For example, 2008-06-12 is representation of a date.

• Datetime

These literals represent date and time.

They have the following form: yyyy-[M[M]]-[d[d]] [h[h]]:[m[m]]:[s[s]].

For example, 2008-06-12 17:21:15 is representation of a datetime.

• List of literals

These literals represent lists of other literals including lists, maps, records.

For example, ["Hello\tworld!", 9, 25.3, 2008-06-12, ['Hello\tworld!', 0x27,


09]] is representation of a list of literals.

Variables
If you define some variable, you must do it by typing data type of the variable, white space, the name of the
variable and semicolon.

Such a variable can be initialized later, but it can also be initialized in the declaration itself. Of course, the value of
the expression must be of the same data type as the variable.

Both cases of variable declaration and initialization are shown below:

• dataType variable;

...

variable=expression;

• dataType variable=expression;
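
For example (the variable names are arbitrary):

   int count;
   count = 100;

   string greeting = "Hello";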

Operators
The operators serve to create more complicated expressions within the program. They can be arithmetic, relational
and logical. The relational and logical operators serve to create expressions with resulting boolean value. The
arithmetic operators can be used in all expressions, not only the logical ones.

Arithmetic Operators
The following operators serve to put together values of different expressions (except those of boolean values).
These signs can be used several times in one expression. In such a case, you can express the priority of operations
by parentheses. The result depends on the order of the expressions.

• Addition

+

The operator above serves to sum the values of two expressions.

But the addition of two boolean values or two date data types is not possible. To create a new value from two
boolean values, you must use logical operators instead.

Nevertheless, if you want to add any data type to a string, the second data type is converted to a string
automatically and concatenated with the first (string) summand. But remember that the string must be in the first
place! Naturally, two strings can be summed in the same way. Note also that the concat() function is faster
and you should use it instead of adding summands to a string.

You can also add any numeric data type to a date. The result is a date in which the number of days is increased
by the whole part of the number. Again, it is necessary to have the date in the first place.

The sum of two numeric data types depends on the order of the data types. The resulting data type is the same
as that of the first summand. The second summand is converted to the first data type automatically.
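
The rules above can be sketched with the following made-up values:

   string s = "Count: " + 5;    // the number is converted to a string: "Count: 5"
   date d = 2008-06-12 + 10;    // a date whose number of days is increased by 10
   int i = 3 + 2.7;             // the result has the data type of the first summand (int)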

• Subtraction and Unary minus

-

The operator serves to subtract one numeric data type from another. Again, the resulting data type is the same
as that of the minuend. The subtrahend is converted to the minuend data type automatically.

But it can also serve to subtract numeric data type from a date data type. The result is a date in which the number
of days is reduced by the whole part of the subtrahend.

• Multiplication

*

The operator serves only to multiply two numeric data types.

• Division

/

The operator serves only to divide two numeric data types. Remember that you must not divide by zero. Dividing
by zero throws TransformLangExecutorRuntimeException or gives Infinity (in the case of a
number data type).

• Modulus

%

The operator can be used for both floating-point data types and integer data types. It returns the remainder of
division.

• Incrementing

++

The operator serves to increment numeric data type by one. The operator can be used for both floating-point
data types and integer data types.

If it is used as a prefix, the number is incremented first and then it is used in the expression.

If it is used as a postfix, first, the number is used in the expression and then it is incremented.

• Decrementing

--

The operator serves to decrement numeric data type by one. The operator can be used for both floating-point
data types and integer data types.

If it is used as a prefix, the number is decremented first and then it is used in the expression.

If it is used as a postfix, first, the number is used in the expression and then it is decremented.
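
The difference between the prefix and postfix forms can be sketched as follows (assuming declared integer variables):

   int i = 5;
   int a = ++i;    // i is incremented first: both a and i are now 6
   int j = 5;
   int b = j++;    // j is used first: b is 5, and j becomes 6 afterwards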


Relational Operators
The following operators serve to compare subexpressions when you want to obtain a boolean value result.
Either of the mentioned signs can be used. If you choose the .operator. signs, they must be surrounded by
white spaces. These signs can be used several times in one expression. In such a case, you can express the priority
of comparisons by parentheses.

• Greater than

Either of the two signs below can be used to compare expressions consisting of numeric, date and string data
type. Both data types in the expressions must be comparable. The result can depend on the order of the two
expressions if they are of different data type.

• >

• .gt.

• Greater than or equal to

Either of the three signs below can be used to compare expressions consisting of numeric, date and string data
type. Both data types in the expressions must be comparable. The result can depend on the order of the two
expressions if they are of different data type.

• >=

• =>

• .ge.

• Less than

Either of the two signs below can be used to compare expressions consisting of numeric, date and string data
type. Both data types in the expressions must be comparable. The result can depend on the order of the two
expressions if they are of different data type.

• <

• .lt.

• Less than or equal to

Either of the three signs below can be used to compare expressions consisting of numeric, date and string data
type. Both data types in the expressions must be comparable. The result can depend on the order of the two
expressions if they are of different data type.

• <=

• =<

• .le.

• Equal to

Either of the two signs below can be used to compare expressions of any data type. Both data types in the
expressions must be comparable. The result can depend on the order of the two expressions if they are of
different data type.

• ==

• .eq.


• Not equal to

Either of the three signs below can be used to compare expressions of any data type. Both data types in the
expressions must be comparable. The result can depend on the order of the two expressions if they are of
different data type.

• !=

• <>

• .ne.

• Matches regular expression

The operator serves to compare a string with a regular expression. The regular expression can look like this
(for example): "[^a-d].*". It means that any character (expressed by the dot) except a, b, c, d (the exception
is expressed by the ^ sign; a-d means the characters from a to d) can be contained zero or more times (expressed
by *). Or, '[p-s]{5}' means that the string consists of exactly five characters, each of which is p, q, r, or s.
For a more detailed explanation of how to use regular expressions, see java.util.regex.Pattern.

• ~=

• .regex.

• Contained in

This operator serves to specify whether some value is contained in the list or in the map of other values.

• .in.

Logical Operators
If the expression whose value must be of boolean data type is complicated, it can consist of subexpressions
(see above) that are put together by logical conjunctions (AND, OR, NOT, EQUAL TO, NOT EQUAL TO). If
you want to express priority in such an expression, you can use parentheses. From the conjunctions mentioned
below you can choose either form (for example, && or and, etc.).

Every sign of the form .operator. must be surrounded by white space.

• Logical AND

• &&

• and

• Logical OR

• ||

• or

• Logical NOT

• !

• not

• Logical EQUAL TO

• ==


• .eq.

• Logical NOT EQUAL TO

• !=

• <>

• .ne.

Simple Statement and Block of Statements


A simple statement is an expression terminated by a semicolon. A block of statements is a series of simple statements
(each terminated by a semicolon). The statements in a block can follow each other on one line or span
several lines. But remember that each of the statements in such a block must be terminated by a semicolon. Sometimes
this block of statements must be surrounded by curly braces (if it is part of some other statement and must be
executed as one statement). In this case, no semicolon is used after the closing curly brace.

Control Statements
Some statements serve to control the process of the program.

Selection Statements
These statements serve to branch out the process of the program.

If Statement
On the basis of the Condition value, this statement decides whether the Statement should be executed. If the
Condition is true, the Statement is executed. If it is false, the Statement is ignored and the process continues
after the If statement. Statement is either a simple statement or a block of statements.

• if (Condition) Statement

Unlike the previous version of the If statement (in which the Statement is executed only if the Condition
is true), other Statements that should be executed even if the Condition value is false can be added to
the If statement. Thus, if the Condition is true, Statement1 is executed; if it is false, Statement2
is executed. See below:

• if (Condition) Statement1 else Statement2

The Statement2 can even be another If statement and also with else branch:

• if (Condition1) Statement1
else if (Condition2) Statement3
else Statement4
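
For example (the variable names and values are made up):

   if (count > 100) message = "big";
   else if (count > 10) message = "medium";
   else message = "small";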

Switch Statement
Sometimes you would end up with a very complicated statement if you built it from many branched If
statements. In such a case, it is much better to use the Switch statement.

Now, the Expression is evaluated and, according to its value, you can branch out the
process. If the value of Expression equals the value of Expression1, Statement1 is executed.
The same is valid for the other Expression/Statement pairs. But if the value of Expression equals
none of Expression1,...,ExpressionN, nothing is done and the process jumps over the
Switch statement. And if the value of Expression equals the values of several ExpressionK, the
corresponding StatementK (for each such K) are executed.

• switch (Expression) {
case Expression1:Statement1
case Expression2:Statement2
...
case ExpressionN:StatementN
}

In the following case, if the value of Expression does not equal any of
Expression1,...,ExpressionN, StatementN+1 is executed.

• switch (Expression) {
case Expression1:Statement1
case Expression2:Statement2
...
case ExpressionN:StatementN
default:StatementN+1
}
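
For example, the following sketch branches on a made-up integer variable:

   switch (monthNumber) {
   case 0:season = "winter";
   case 6:season = "summer";
   default:season = "other";
   }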

Iteration Statements
These iteration statements repeat some processes during which some inner Statements are executed cyclically
until the Condition that limits the execution cycle becomes false.

For Loop
First, the Initialization is set up, after that, the Condition is evaluated and if its value is true, the Statement
is executed and finally the Iteration is made.

During the next cycle of the loop, the Condition is evaluated again and if it is true, Statement is executed
and Iteration is made. This way the process repeats until the Condition becomes false. Then the loop is
terminated and the process continues with the other part of the program.

If the Condition is false at the beginning, the process jumps over the Statement out of the loop.

• for (Initialization;Condition;Iteration) {
Statement
}
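
For example, the following made-up loop sums the integers from 1 to 10:

   int i;
   int total;
   total = 0;
   for (i = 1; i <= 10; i++) {
   total = total + i;
   }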

Do-While Loop
First, the Statement is executed, then the process depends on the value of Condition. If its value is true, the
Statement is executed again and then the Condition is evaluated again and the subprocess either continues
(if it is true again) or stops and jumps to the next or higher level subprocesses (if it is false). Since the Condition
is at the end of the loop, even if it is false at the beginning of the subprocess, the Statement is executed at
least once.

• do {
Statement
} while (Condition)

While Loop
This process depends on the value of Condition. If its value is true, the Statement is executed and then
the Condition is evaluated again, and the subprocess either continues (if it is true again) or stops and jumps to
the next or higher level subprocesses (if it is false). Since the Condition is at the start of the loop, if it is false
at the beginning of the subprocess, the Statement is not executed at all and the loop is jumped over.

• while (Condition) {
Statement
}
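
For example (assuming count and limit are declared and initialized integer variables):

   while (count < limit) {
   count++;
   }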

Jump Statements
Sometimes you need to control the process in a different way than by decision based on the Condition value.
To do that, you have the following options:

Break Statement
If you want to stop some subprocess, you can use the following word in the program:

• break

The subprocess breaks and the process jumps to the higher level or to the next Statements.

Continue Statement
If you want to stop some iteration subprocess, you can use the following word in the program:

• continue

The subprocess breaks and the process jumps to the next iteration step.

Return Statement
In functions you can use the return word either alone or along with some expression. (See the following
two options below.) The return statement must be at the end of the function. If it were not at the end, all of the
variableDeclarations, Statements and Mappings located after it would be ignored and skipped. A
function without the return word, or with the return word alone, returns null, whereas a function
with return expression returns the value of the expression. Remember that the data type of the
expression must be the same as that of the declared return value.

• return

• return expression

Functions
You can also define your own functions in the following way:

• function functionName (arg1,arg2,...) {
variableDeclarations
Statements
Mappings
[return [expression]]
}

You must put the return statement at the end (for more information about the return statement see Section "Return
Statement" above); right before it there can be some Mappings. The variableDeclarations and
Statements must be at the beginning; they can even be interspersed, but remember that undeclared and
uninitialized variables cannot be used. So we suggest that you first declare the variables and only then specify
the Statements.
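
A simple function might then be sketched like this (the name and arguments are made up):

   function add3(a, b, c) {
   int result;
   result = a + b + c;
   return result;
   }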


Eval
The following two functions allow you to parse, execute or insert a CTL expression into your CTL program.

The first function (eval(someExpression)) parses some expression and adds it to the place where the
eval(someExpression) is executed.

The second function (eval_exp(someExpression)) parses and executes some CTL expression and cleans
it up once it is executed. If you want to evaluate some mathematical expression or perform a simple task, it is good
to use this function.

• eval()

• eval_exp()

Parameters
The parameters can be used in Clover transformation language in the following way: ${nameOfTheParameter}.
If you want such a parameter to be considered a string data type, you must surround it by single or double
quotes like this: '${nameOfTheParameter}' or "${nameOfTheParameter}".

Sequences
In your graphs you are also using sequences. You can use them in CTL by specifying the name of the sequence
and placing it as an argument in the sequence() function.

You have three options depending on what you want to do with the sequence. You can get the current number
of the sequence, or get the next number of the sequence, or you may want to reset the sequence numbers to the
initial number value.

See the mentioned following three options:

• sequence(nameOfTheSequence).current

• sequence(nameOfTheSequence).next

• sequence(nameOfTheSequence).reset

Although these expressions return integer values, you may also want to get long or string values. This can be done
in one of the following ways:

• sequence(nameOfTheSequence,long).current

• sequence(nameOfTheSequence,long).next

• sequence(nameOfTheSequence,string).current

• sequence(nameOfTheSequence,string).next
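
For example, assuming your graph contains a sequence named MySequence, the following sketch gets its next value and the current value as a string:

   int id;
   id = sequence(MySequence).next;

   string idString;
   idString = sequence(MySequence,string).current;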

Lookup Tables
In your graphs you are also using lookup tables. You can use them in CTL by specifying the name of the
lookup table and placing it as an argument in the lookup(), lookup_next(), lookup_found() or
lookup_admin() functions.

You have five options depending on what you want to do with the lookup table. You can create the lookup table,
get the value of the specified field name from the lookup table associated with the specified key, get the next value
of the specified field name from the lookup table, (if the records are duplicated) count the number of records
with the same field name values, or destroy the lookup table.

Now, the key is a sequence of values of the field names separated by comma (not semicolon!). Thus, the key is
of the following form: keypart1,keypart2,...,keypartN.

See the mentioned following five options:

• lookup_admin(nameOfTheLookupTable).init

• lookup(nameOfTheLookupTable,key).fieldName

• lookup_next(nameOfTheLookupTable).fieldName

• lookup_found(nameOfTheLookupTable)

• lookup_admin(nameOfTheLookupTable).free
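
For example, assuming a lookup table named Customers with a field named City and a key taken from an input field (all names are made up), a sketch of the whole life cycle might look like this:

   lookup_admin(Customers).init;

   string city;
   city = lookup(Customers,0.CustomerID).City;

   lookup_admin(Customers).free;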

Data Flows
This section describes how record fields should be marked. As you know, each component has some
ports. Both input and output ports are numbered starting from 0, and they can also be marked by their names.
The names of data flows are the names of metadata, and the names of data fields are the names of metadata fields.
Thus, if you want to mark any field within any data flow, you must mark both the data flow and the
data field. They must be separated by a dot, with the data flow name on the left of the dot and the data field on the
right. Each of them can be marked by either number or name. Therefore, you have the following four
possibilities for marking record field(s):

• flowNumber.fieldNumber

• flowNumber.fieldName

• flowName.fieldNumber

• flowName.fieldName

Mapping
Just as existing files with code must be imported at the beginning of the program, mappings
must be at the end of the program.

When you want to do some mapping, you must do it in the following way:

Each mapping is an assignment of inputs to outputs. Since CTL is used in components that have output ports,
each mapping serves to assign (map) values to the output port(s).

The procedure is as follows: On the left side of any mapping, there is an output record field to which some value(s)
are assigned. On the right side, there are the value(s). The left side and the right side are put together by a colon
and an equal sign ( := ). On the right side, there is a sequence of expressions offering values, separated by one white
space, one colon and another white space. The number of these expressions is unlimited (but it can also be only
one expression terminated by a semicolon). The whole sequence is terminated by a semicolon as well. (For more
information about how to mark the record fields see Section "Data Flows".)

When the mapping is being done, the expressions are evaluated going from left to right, and when one of these
expressions is found to be successful, it is mapped to the left side of the mapping.

• recordField:=expression1 : expression2 : expression3 : ... : expressionN;
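
For example, the following made-up mapping tries three expressions from left to right and assigns the first successful one to the output field:

   0.FullName := firstName + " " + lastName : firstName : "unknown";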

Appendix E. Clover TL Functions
Clover transformation language has at its disposal a set of functions you can use. We describe them here.

Conversion Functions
Sometimes you need to convert values from one data type to another. This can be done by using the following
functions:

• bytearray base64byte(string arg);

The base64byte(string) function takes one string argument in base64 representation and converts it to
an array of bytes. Its counterpart is the byte2base64(bytearray) function.

• string bits2str(bytearray arg);

The bits2str(bytearray) function takes an array of bytes and converts it to a string. Its counterpart is
the str2bits(string) function.

• int bool2num(boolean arg);

The bool2num(boolean) function takes one boolean argument and converts it to either integer 1 (if
the argument is true) or integer 0 (if the argument is false). Its counterpart is the num2bool(numeric)
function.

• numerictype bool2num(boolean arg, typename numerictype);

The bool2num(boolean, typename) function accepts two arguments: the first is boolean and the other
is the name of any numeric data type. It takes them and converts the first argument to the corresponding 1 or 0
in the numeric representation specified by the second argument. The return type of the function is the same
as the second argument. Its counterpart is the num2bool(numeric) function.

• string byte2base64(bytearray arg);

The byte2base64(bytearray) function takes an array of bytes and converts it to a string in base64
representation. Its counterpart is the base64byte(string) function.

• string byte2hex(bytearray arg);

The byte2hex(bytearray) function takes an array of bytes and converts it to a string in hexadecimal
representation. Its counterpart is the hex2byte(string) function.

• long date2long(date arg);

The date2long(date) function takes one date argument and converts it to a long type. Its value equals to
the number of milliseconds elapsed from January 1, 1970, 00:00:00 GMT to the date specified as
the argument. Its counterpart is the long2date(long) function.

• int date2num(date arg, unit timeunit);

The date2num(date, unit) function accepts two arguments: the first is date and the other is any time unit.
The unit can be one of the following: year, month, day, hour, minute, second, millisecond. The
function takes these two arguments and converts them to an integer. If the time unit is contained in the date, it is
returned as an integer number. If it is not contained, the function returns 0. Remember that months are numbered
starting from 0. Thus, date2num(2008-06-12, month) returns 5. And date2num(2008-06-12,
hour) returns 0.


• string date2str(date arg, string pattern);

The date2str(date, string) function accepts two arguments: date and string. The function
takes them and converts the date according to the pattern specified as the second argument. Thus,
date2str(2008-06-12, "dd.MM.yyyy") returns the following string: "12.06.2008". Its counterpart
is the str2date(string, string) function.

• bytearray hex2byte(string arg);

The hex2byte(string) function takes one string argument in hexadecimal representation and converts it
to an array of bytes. Its counterpart is the byte2hex(bytearray) function.

• date long2date(long arg);

The long2date(long) function takes one long argument and converts it to a date. It adds the argument
number of milliseconds to January 1, 1970, 00:00:00 GMT and returns the result as a date. Its counterpart
is the date2long(date) function.

• boolean num2bool(numerictype arg);

The num2bool(numeric) function takes one argument of any numeric data type representing 1 or 0 and
returns boolean true or false, respectively.

• numerictype num2num(numerictype arg, typename numerictype);

The num2num(numerictype, typename) function accepts two arguments: the first is of any numeric
data type and the second is the name of any numeric data type. It takes them and converts the first argument value
to that of the numeric type specified as the second argument. The return type of the function is the same as the
second argument. The conversion is successful only if it is possible without any loss of information, otherwise
the function throws exception. Thus, num2num(25.4, int) throws exception, whereas num2num(25.0,
int) returns 25.

• string num2str(numerictype arg);

The num2str(numeric) function takes one argument of any numeric data type and converts it to its string
representation. Thus, num2str(20.52) returns "20.52" .

• string num2str(numerictype arg, int radix);

The num2str(numerictype, int) function accepts two arguments: the first is of any numeric data type
and the second is integer. It takes these two arguments and converts the first to its string representation in the
radix based numeric system. Thus, num2str(31, 16) returns "1F".

• bytearray str2bits(string arg);

The str2bits(string) function takes one string argument and converts it to an array of bytes. Its coun-
terpart is the bits2str(bytearray) function.

• boolean str2bool(string arg);

The str2bool(string) function takes one string argument and converts it to the corresponding boolean
value. The string can be one of the following four: "true", "1", "false", "0". The first two strings are
converted to boolean true, the other two are converted to boolean false.

• date str2date(string arg, string pattern);

The str2date(string, string) function accepts two string arguments. It takes them and converts
the first string to the date according to the pattern specified as the second argument. The pattern must
correspond to the structure of the first argument. Thus, str2date("12.6.2008", "dd.MM.yyyy")
returns the following date: 2008-06-12 .


• date str2date(string arg, string pattern, string locale, boolean lenient);

The str2date(string, string, string, boolean) function accepts three string arguments and
one boolean. It takes the arguments and converts the first string to the date according to the pattern spec-
ified as the second argument. The pattern must correspond to the structure of the first argument. Thus,
str2date("12.6.2008", "dd.MM.yyyy") returns the following date: 2008-06-12. The third
argument defines the locale for the date. The fourth argument specifies whether the date interpretation should be
lenient (true) or not (false). If it is true, the function tries to interpret the date even if it does not exactly match
the locale and/or the pattern. If the function is called with three arguments only, the third one is interpreted as
the locale (if it is a string) or as the lenient flag (if it is a boolean).

• numerictype str2num(string arg);

The str2num(string) function takes one string argument and converts it to the corresponding numeric
value. Thus, str2num("0.25") returns 0.25 if the function is declared with double return type, but the
same throws exception if it is declared with integer return type. The return type of the function can be any
numeric type.

• numerictype str2num(string arg, typename numerictype);

The str2num(string, typename) function accepts two arguments: the first is string and the second is the
name of any numeric data type. It takes the first argument and returns its corresponding value in the numeric
data type specified by the second argument. The return type of the function is the same as the second argument.

• numerictype str2num(string arg, typename numerictype, int radix);

The str2num(string, typename, int) function accepts three arguments: string, the name of any
numeric data type and integer. It takes the first argument as if it were expressed in the radix based numeric
system representation and returns its corresponding value in the numeric data type specified as the second
argument. The return type is the same as the second argument. The third argument can be 10 or 16 for double
type as the second argument, 10 for decimal type as the second argument and any integer number between
Character.MIN_RADIX and Character.MAX_RADIX for int and long types as the second argument.

• string to_string(anytype arg);

The to_string(anytype) function takes one argument of any data type and converts it to its string rep-
resentation.

• bool try_convert(anytype from, anytype to, string pattern);

The try_convert(anytype, anytype, string) function accepts three arguments: two are of any
data type and the third is string. The function tries to convert the first argument to the type of the second. If
the conversion is successful, the second argument receives the value of the first argument and the function
returns boolean true. If the conversion is not successful, the function returns boolean false and the first and
second arguments retain their original values. The third argument is optional and it is used only if any of the
first two arguments is string. For example, try_convert("27.5.1942", dateA, "dd.MM.yyyy")
returns true and dateA gets the value of 27 May 1942.
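Taken together, the conversion functions can be combined in a short transformation sketch. The variable names are illustrative and the statements are assumed to sit inside a transformation where CTL statements are allowed; the values in the comments follow the examples above:

```
// a minimal sketch of the conversion functions
int i;
string s;
date d;

i = num2num(25.0, int);                    // 25
s = num2str(31, 16);                       // "1F" (hexadecimal representation)
i = str2num("1F", int, 16);                // 31
d = str2date("12.6.2008", "dd.MM.yyyy");   // 2008-06-12
```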

Date Functions
When you work with date, you may use the following functions:

• date dateadd(date arg, numerictype amount, unit timeunit);

The dateadd(date, numerictype, unit) function accepts three arguments: the first is date, the
second is of any numeric data type and the last is any time unit. The unit can be one of the following: year,
month, day, hour, minute, second, millisecond. The function takes the first argument, adds the
amount of time units to it and returns the result as a date. The amount and time unit are specified as the
second and third arguments, respectively.


• int datediff(date later, date earlier, unit timeunit);

The datediff(date, date, unit) function accepts three arguments: two dates and one time unit. It
takes these arguments and subtracts the second argument from the first argument. Then the function returns the
resulting time difference expressed in time units specified as the third argument. Thus, the difference of two dates
is expressed in the defined time units. The result is expressed as an integer number. Thus, datediff(2008-06-18,
2001-02-03, year) returns 7. But, datediff(2001-02-03, 2008-06-18, year) returns -7!

• date today();

The today() function accepts no argument and returns current date and time.

• date trunc(date arg);

The trunc(date) function takes one date argument and returns the date with the same year, month and day,
but hour, minute, second and millisecond are set to 0.

• long trunc(numerictype arg);

The trunc(numerictype) function takes one argument of any numeric data type and returns its truncated
long value.

• null trunc(list arg);

The trunc(list) function takes one list argument, empties its values and returns null.

• null trunc(map arg);

The trunc(map) function takes one map argument, empties its values and returns null.
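As a sketch, the date functions can be chained like this (the dates are taken from the datediff example above; variable names are illustrative):

```
// a minimal sketch of the date functions
date d;
int years;

d = dateadd(str2date("3.2.2001", "dd.MM.yyyy"), 5, year);   // 2006-02-03
years = datediff(str2date("18.6.2008", "dd.MM.yyyy"),
                 str2date("3.2.2001", "dd.MM.yyyy"), year); // 7
d = trunc(today());   // today's date with the time part set to 0
```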

Mathematical Functions
You may also want to use some mathematical functions:

• numerictype abs(numerictype arg);

The abs(numerictype) function takes one argument of any numeric data type and returns its absolute value.

• number e();

The e() function accepts no argument and returns the Euler number.

• number exp(numerictype arg);

The exp(numeric) function takes one argument of any numeric data type and returns the result of the ex-
ponential function of this argument.

• number log(numerictype arg);

The log(numerictype) takes one argument of any numeric data type and returns the result of the natural
logarithm of this argument.

• number log10(numerictype arg);

The log10(numerictype) function takes one argument of any numeric data type and returns the result of
the logarithm of this argument to the base 10.

• number pi();

The pi() function accepts no argument and returns the pi number.


• number pow(numerictype base, numerictype exp);

The pow(numerictype, numerictype) function takes two arguments of any numeric data types (that
do not need to be the same) and returns the first argument (the base) raised to the power of the second
argument (the exponent).

• number random();

The random() function accepts no argument and returns a random double greater than or equal to 0.0
and less than 1.0.

• long round(numerictype arg);

The round(numerictype) function takes one argument of any numeric data type and returns the long that
is closest to this argument.

• number sqrt(numerictype arg);

The sqrt(numerictype) function takes one argument of any numeric data type and returns the square root
of this argument.
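A minimal sketch combining several of the mathematical functions (variable names are illustrative; the results in the comments follow from the descriptions above):

```
// a minimal sketch of the mathematical functions
number x;
long r;

x = pow(2, 10);        // 2 raised to the power of 10, i.e. 1024
x = abs(-3.5);         // 3.5
r = round(sqrt(2));    // the long closest to 1.4142..., i.e. 1
x = log(e());          // natural logarithm of the Euler number, i.e. 1
```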

String Functions
Some functions work with strings. Here are the functions:

• string char_at(string arg, numerictype index);

The char_at(string, numerictype) function accepts two arguments: the first is string and the other
is of any numeric data type. It takes the string and returns the character that is located at the position specified
by the index.

• string concat(anytype arg1, ... ..., anytype argN);

The concat(anytype, ..., anytype) function accepts an unlimited number of arguments of any data
type; they do not need to be the same. It takes these arguments and returns their concatenation. Arguments
that are not strings are converted to their string representation before the concatenation is done. You can also
concatenate these arguments using plus signs, but this function is faster for more than two arguments.

• string get_alphanumeric_chars(string arg);

The get_alphanumeric_chars(string) function takes one string argument and returns only letters
and digits contained in the string argument in the order of their appearance in the string. The other characters
are removed.

• string get_alphanumeric_chars(string arg, boolean takeAlpha, boolean takeNumeric);

The get_alphanumeric_chars(string, boolean, boolean) function accepts three arguments:
one string and two booleans. It takes them and returns letters and/or digits if the second and/or the third
arguments, respectively, are set to true.

• int index_of(string arg, string substring);

The index_of(string, string) function accepts two strings. It takes them and returns the index of the
first appearance of substring in the string specified as the first argument.


• int index_of(string arg, string substring, int fromIndex);

The index_of(string, string, int) function accepts three arguments: two strings and one integer.
It takes them and returns the index of the first appearance of substring counted from the character located
at the position specified by the third argument.

• boolean is_ascii(string arg);

The is_ascii(string) function takes one string argument and returns a boolean value depending on
whether the string can be encoded as an ASCII string (true) or not (false).

• boolean is_blank(string arg);

The is_blank(string) function takes one string argument and returns a boolean value depending on
whether the string contains only white space characters (true) or not (false).

• boolean is_date(string arg, string pattern);

The is_date(string, string) function accepts two string arguments. It takes them, compares the first
argument with pattern and returns a boolean value depending on whether the first argument can be converted
to date using this pattern (true) or not (false).

• boolean is_date(string arg, string pattern, string locale, boolean lenient);

The is_date(string, string, string, boolean) function accepts three string arguments and
one boolean. It compares the first argument with the second argument used as a pattern, applying the third
argument as the locale. If the first argument matches the pattern and can be converted to a date, the function
returns true regardless of the fourth argument. If it cannot, the function returns false, unless the fourth argument
is set to true, in which case the function is lenient and tries to interpret the string as a date anyway; if this
succeeds, true is returned as well.

• boolean is_integer(string arg);

The is_integer(string) function takes one string argument and returns a boolean value depending on
whether the string can be converted to an integer number (true) or not (false).

• boolean is_long(string arg);

The is_long(string) function takes one string argument and returns a boolean value depending on
whether the string can be converted to a long number (true) or not (false).

• boolean is_number(string arg);

The is_number(string) function takes one string argument and returns a boolean value depending on
whether the string can be converted to a double (true) or not (false).

• string join(string delimiter, anytype arg1, ... ..., anytype argN);

The join(string, anytype, ..., anytype) function accepts an unlimited number of arguments. The
first is string, the others are of any data type and do not need to be the same. The arguments that are not
strings are converted to their string representation and joined together with the first argument used as the delimiter.

• string left(string arg, numerictype length);

The left(string, numerictype) function accepts two arguments: the first is string and the other is of
any numeric data type. It takes them and returns the substring of the length specified as the second argument
counted from the start of the string specified as the first argument.

• string lowercase(string arg);

The lowercase(string) function takes one string argument and returns another string with all characters
converted to lower case.


• string remove_blank_space(string arg);

The remove_blank_space(string) function takes one string argument and returns another string with
white spaces removed.

• string remove_diacritic(string arg);

The remove_diacritic(string) function takes one string argument and returns another string with
diacritic signs removed.

• string replace(string arg, string regex, string replacement);

The replace(string, string, string) function accepts three string arguments. The first is the
original string, the second is a regular expression and the third is the replacement. Every match of the regular
expression found in the original string is replaced by the third argument. The resulting string is returned. Thus,
replace("Hello", "[Ll]", "t") returns "Hetto".

• string right(string arg, numerictype length);

The right(string, numerictype) function accepts two arguments: the first is string and the other is
of any numeric data type. It takes them and returns the substring of the length specified as the second argument
counted from the end of the string specified as the first argument.

• string soundex(string arg);

The soundex(string) function takes one string argument and converts the string to another. The resulting
string consists of the first letter of the string specified as the argument and three digits. The three digits are based
on the consonants contained in the string when similar numbers correspond to similarly sounding consonants.
Thus, soundex("word") returns "w600".

• list split(string arg, string regex);

The split(string, string) function accepts two string arguments. The second is a regular expression.
The first string is split at every match of this regular expression and the parts located between the matches
are returned as a list. Thus, split("abcdefg", "[ce]") returns ["ab", "d", "fg"].

• string substring(string arg, numerictype fromIndex, numerictype length);

The substring(string, numerictype, numerictype) function accepts three arguments: the first
is string and the other two are of any numeric data type. The two numeric types do not need to be the same.
The function takes the arguments and returns a substring of the defined length obtained from the original string
by getting the length number of characters starting from the position defined by the second argument. If
the second and third arguments are not integers, only the integer parts of them are used by the function. Thus,
substring("text", 1.3, 2.6) returns "ex".

• string translate(string arg, string searchingSet, string replaceSet);

The translate(string, string, string) function accepts three string arguments. The number of
characters must be equal in both the second and the third arguments. If some character from the string specified
as the second argument is found in the string specified as the first argument, it is replaced by a character taken
from the string specified as the third argument. The character from the third string must be at the same position
as the character in the second string. Thus, translate("hello", "leo", "pii") returns "hippi".

• string trim(string arg);

The trim(string) function takes one string argument and returns another string with leading and trailing
white spaces removed.


• string uppercase(string arg);

The uppercase(string) function takes one string argument and returns another string with all characters
converted to upper case.
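The string functions are typically combined; here is a small illustrative sketch (the input values are made up, the results in the comments follow from the examples above):

```
// a minimal sketch of the string functions
string s;

s = trim("  Clover  ");                  // "Clover"
s = uppercase(left(s, 3));               // "CLO"
s = replace("Hello", "[Ll]", "t");       // "Hetto"
s = substring("text", 1, 2);             // "ex"
```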

Miscellaneous Functions
The rest of the functions can be denominated as miscellaneous. These are the following:

• void breakpoint();

The breakpoint() function accepts no argument and prints out all global and local variables.

• anytype iif(boolean con, anytype iftruevalue, anytype iffalsevalue);

The iif(boolean, anytype, anytype) function accepts three arguments: one is boolean and two are
of any data type. Both argument data types and return type are the same.

The function takes the first argument and returns the second if the first is true or the third if the first is false.

• boolean isnull(anytype arg);

The isnull(anytype) function takes one argument and returns a boolean value depending on whether the
argument is null (true) or not (false). The argument may be of any data type.

• anytype nvl(anytype arg, anytype default);

The nvl(anytype, anytype) function accepts two arguments of any data type. Both arguments must be
of the same type. If the first argument is not null, the function returns its value. If it is null, the function returns
the default value specified as the second argument.

• void print_err(anytype message);

The print_err(anytype) function accepts one argument of any data type. It takes this argument and prints
out the message on the error port.

• void print_err(anytype message, boolean printLocation);

The print_err(type, boolean) function accepts two arguments: the first is of any data type and the
second is boolean. It takes them and prints out the message and the location of the error (if the second argument
is true).

• void print_log(level loglevel, anytype message);

The print_log(level, anytype) function accepts two arguments: the first is a log level of the mes-
sage specified as the second argument, which is of any data type. The first argument is one of the following:
debug, info, warn, error, fatal. The function takes the arguments and sends out the message to a
logger.

• void print_stack();

The print_stack() function accepts no argument and prints out all variables from the stack.

• void raise_error(string message);

The raise_error(string) function takes one string argument and throws an error with the message
specified as the argument.
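A short sketch of the miscellaneous functions; the variable maybeName and its handling are purely illustrative:

```
// a minimal sketch of the miscellaneous functions
string maybeName;    // may or may not be null at this point
string name;

name = nvl(maybeName, "unknown");                     // "unknown" if maybeName is null
name = iif(isnull(maybeName), "unknown", maybeName);  // the same idea written with iif
print_err("resolved name: " + name);                  // message sent to the error port
```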

Appendix F. Clover Transformation
Language Lite
Transformations can be defined in Java or in Clover transformation language, but you can use Clover transfor-
mation language Lite as well. Nevertheless, its use is deprecated now and we suggest you write transformations
in Java or CTL instead.

Here we will explain what CTL Lite looks like.

Input ports and output ports are represented by words in and out.

Ports are represented by numbers. They are numbered starting from 0.

Field names are used to represent fields of records.

You can get record values using the $ sign.

The resulting expressions look like the following two: ${in.numberofinputport.fieldname} for inputs
and ${out.numberofoutputport.fieldname} for outputs.

Input field values can be assigned to output fields with the help of equal sign.

Here is an example: ${out.0.name} = ${in.0.fname} + ${in.0.lname}

Parameters and sequences can also be used in CTL Lite. Their values must be expressed this way:
${par.parametername} and ${seq.sequencename}. The par and seq words mean that a parameter
and a sequence are used in these expressions; parametername and sequencename are the names of the
parameter and the sequence, respectively.
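For illustration, a CTL Lite mapping that uses input fields, a parameter and a sequence might look like this (the field, parameter and sequence names are made up):

```
${out.0.name}   = ${in.0.fname} + ${in.0.lname}
${out.0.source} = ${par.SourceName}
${out.0.id}     = ${seq.CustomerID}
```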

If you want to get the whole objects instead of field values, you can use the @ sign in-
stead of $. The structure is the same: @{in.numberofinputport.fieldname} for inputs and
@{out.numberofoutputport.fieldname} for outputs.

All of these expressions can be used in defining transformations in some components, but we once more suggest
you use Clover transformation language and/or Java instead of CTL Lite.
