A Technical Paper on ‘EAI – eTL implementation using JCAPS’

Author(s): Prasenjit Ghosh (Prasenjit.Ghosh@wipro.com), Sudarsana Raju Sangaraju (Sudarshan.Sangaraju@wipro.com)

Account / Business Group: Emerson / Manufacturing

Reviewed by

Emerson / Manufacturing

Data Transfer Using eTL
Wipro – Confidential

Abstract

Extract, Transform, and Load (ETL) is a process in data warehousing that involves the following activities:
- Extraction of data from outside sources
- Transforming it to fit business needs
- Loading the transformed data into the enterprise data warehouse

ETL refers to a process that loads any database. ETL can also be used for integration with legacy systems. An ETL process can be created using almost any programming language, but creating one from scratch is quite complex. Increasingly, companies are buying ETL tools to help in the creation of ETL processes. A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation and loading of data. Many ETL vendors now have data profiling, data quality and metadata capabilities.

Although ETL is a concept that has been adopted by almost all middleware tools currently on the market, this technical paper describes how a huge data load can be transferred between source and destination database systems using the ETL adapter in SeeBeyond Java CAPS.

Intended Audience

This document is intended to assist:
- Developers with working knowledge of JCAPS
- Tech Leads with an integration background

Table of Contents

1. Introduction
2. Solution Overview
   2.1. About the ETL
   2.2. Where it can be used (Business Scenarios)
   2.3. How ETL Works
   2.4. ETL Tool Scope and Constraints
   2.5. ETL Supported Datatypes
3. Sample Integration Solution – Transferring bulk data from Source to Target
   3.1. Use Case Scenario 1: connection between two different DB
   3.2. Use Case Scenario 2: connection between DB and BatchLocal or Batch FTP
        3.2.1. Business Process - bpJdeToFTP
        3.2.2. Connectivity Map - cmJdeToFTP
        3.2.3. Technical specifications
        3.2.4. Data Transfer Logic / Code
4 Comparison of ETL Transfers with eGate5.1 and monk transfer methods
   Observations
5 Known issues & work-around / Solution
   5.1 Data Truncation Problem
   5.2 Automap Inactive for Flat File OTD
   5.3 java.lang.OutOfMemoryError: Java heap space
   5.4 Runtime Output
References
Acronyms and Glossary

1. Introduction

In this paper an attempt has been made to explain the ETL process and the business scenarios requiring ETL. The paper also contains a case study of an ETL implementation using the Java CAPS ETL tool. It presents the advantages accrued by using Java CAPS ETL in comparison to conventional modes of data extraction, with results. Finally, the lessons learned, the do's and don'ts, and the workarounds implemented to overcome the existing ETL tool constraints are presented in detail.

2. Solution Overview

2.1. About the ETL

Extraction, Transform, and Load (ETL) is a data integration methodology that extracts data from data sources, transforms and cleanses the data, then loads the data in a uniform format into one or more target data sources. ETL Integrator provides high-volume extraction and loading of tabular data sets for Java CAPS projects. You can use ETL Integrator to acquire a temporary subset of data for reports or other purposes, or acquire a more permanent data set for the population of a data mart or data warehouse. You can also use ETL for data type conversions, or to migrate data from one database/platform to another.

2.2. Where it can be used (Business Scenarios)

The Java CAPS ETL tool can be used when bulk data has to be transferred from a source database to a destination database within a comparatively small amount of time.

High volume file transfer is a universal requirement for every organization, historically satisfied through utilities like FTP and home grown solutions which provided basic capabilities for sending and receiving files. Businesses today use file transfer in more sophisticated ways and to satisfy many more business requirements, including control, security, integration and regulatory compliance. Moreover, organizations are deploying file transfer strategically as the fundamental underpinning for the automation of key business processes.

Businesses in almost every industry, banking and healthcare among them, need to extract and transmit bulk volumes of data from legacy database systems. Typical examples are:
- Transfer of weekly or monthly financial data
- Transfer of weekly or monthly business information from source system to destination system
- Transfer of annual reports

2.3. How ETL Works

ETL can connect two or more databases. This requires creating OTDs from the source and target database table structures using the database OTD wizard, then mapping them using the ETL collaboration rule editor. In our current case study we have created OTDs using prepared statements. The Java CAPS ETL tool uses a connection pooling mechanism to optimize database connections. The advantages of using connection pooling and prepared statement OTDs are listed below.

Connection pooling
It is a technique used to avoid the overhead of making a new database connection every time an application or server object requires access to a database. Without pooling, the application needs to establish a new database connection for each request in order to fetch data from the database. The database access itself is not the bottleneck, but setting up a new connection for each request often is. A database connection pool avoids this bottleneck.

Prepared Statement OTD object
It is an object that represents a precompiled SQL statement. Sometimes it is more convenient to use a PreparedStatement object for sending SQL statements to the database. This special type of statement is derived from the more general class, Statement. A SQL statement is precompiled and stored in a PreparedStatement object. This object can then be used to efficiently execute this statement multiple times.

Since ETL implements connection pooling and prepared statements, it is much faster compared to conventional data fetching techniques.

2.4. ETL Tool Scope and Constraints

Scope
This section lists in detail what ETL can do:
- It supports bulk data transfer between multiple databases, for example Oracle, SQL Server, DB2/400, flat file DB, etc.
- Data transformation is supported through a set of built-in operators/methods.

Constraints
This section lists the existing constraints of the current ETL tool:
- It only partially supports data cleansing.
- ETL supports only 4 data types, namely varchar (default), numeric, time, and timestamp (for more information please refer to the ETL Supported Datatypes section). Irrespective of the source or target data type, we have to map through these data types.

2.5. ETL Supported Datatypes

ETL Projects can handle many data types; some data types can be transformed, others are merely passed through without transformation. The list below shows the supported data types for flat file Projects:
- varchar (default)
- numeric
- time
- timestamp

If a flat file is created using the time or timestamp data type, the data must follow one of these formats:
- yyyy-MM-dd HH:mm:ss.SSS
- yyyy-MM-dd HH:mm:ss
- yyyy-MM-dd
- MM-dd-yyyy
- HH:mm:ss
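As a rough illustration of the format rule above, the following sketch checks whether a flat-file value matches one of the listed time/timestamp patterns before it is handed to ETL. It relies only on the standard java.text.SimpleDateFormat class. The class name FlatFileDateCheck and the method matchingPattern are hypothetical helpers introduced here for illustration; they are not part of Java CAPS.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;

public class FlatFileDateCheck {
    // Patterns listed in the section above, most specific first.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd HH:mm:ss.SSS",
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd",
        "MM-dd-yyyy",
        "HH:mm:ss"
    };

    /** Returns the first listed pattern that the whole value matches, or null if none does. */
    public static String matchingPattern(String value) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false); // reject impossible values such as month 13
            ParsePosition position = new ParsePosition(0);
            boolean parsed = format.parse(value, position) != null;
            // Accept only if the pattern consumed the entire value.
            if (parsed && position.getIndex() == value.length()) {
                return pattern;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(matchingPattern("2007-07-07 23:12:25.123"));
        System.out.println(matchingPattern("07/07/2007")); // no listed pattern uses '/'
    }
}
```

A check of this kind could be run on the flat file before the ETL collaboration is triggered, so that rows in an unsupported format are caught early rather than at load time.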

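The connection pooling idea described in section 2.3 can be sketched in a few lines of plain Java. This is a minimal, generic pool written for illustration only; it is not the pooling implementation used inside Java CAPS ETL, and the class SimplePool and its methods are hypothetical names. In a JDBC setting the factory would be assumed to open a database connection, and each pooled connection could keep a PreparedStatement for repeated execution.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

public class SimplePool<T> {
    private final BlockingQueue<T> idle;   // resources waiting to be reused
    private final Supplier<T> factory;     // creates a resource (the expensive step)
    private final int capacity;
    private int created = 0;

    public SimplePool(int capacity, Supplier<T> factory) {
        this.capacity = capacity;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(capacity);
    }

    /** Reuses an idle resource if one exists, creates one while under capacity, otherwise waits. */
    public T borrow() {
        T resource = idle.poll();
        if (resource != null) {
            return resource;
        }
        synchronized (this) {
            if (created < capacity) {
                created++;
                return factory.get();
            }
        }
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    /** Hands the resource back so that the next caller can reuse it. */
    public void release(T resource) {
        idle.offer(resource);
    }

    public synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        // StringBuilder stands in for a database connection in this sketch.
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        for (int i = 0; i < 1000; i++) {
            StringBuilder resource = pool.borrow();
            pool.release(resource);
        }
        System.out.println("resources created: " + pool.createdCount());
    }
}
```

In the sequential loop above, a thousand requests are served by a single created resource, which is the saving that pooling is meant to deliver when the creation step is an expensive database connection.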
3. Sample Integration Solution – Transferring bulk data from Source to Target

3.1. Use Case Scenario 1: connection between two different DB

An interface that transfers data from one database to another (in this example from Oracle DB to JDE DB) is very easy and fast to build with ETL. Let us take an example where ETL transfers data from Oracle DB (table: TARGETEMPLOYEE) to JDE DB (table: F00XACT), with proper transformation of the data according to the requirement.

Steps to be followed:
Step 1: Build the source and destination system OTDs.
Step 2: Build the ETL collaboration with the source and destination OTDs created in step 1.
Step 3: Map the corresponding fields in the ETL editor. Data can be transformed while mapping to the destination system fields.
Step 4: Build the business process for the same.
Step 5: Build the connectivity map (CMAP) and deployment profile.
Step 6: Build and deploy.
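To make step 3 more concrete, the sketch below mimics in plain Java what the ETL editor does graphically: each target column is given a rule that reads one or more source columns and optionally transforms the value. Everything here is an assumption made for illustration. The column names (EMP_ID, EMP_NAME, ID, NAME) are hypothetical and are not the real columns of TARGETEMPLOYEE or F00XACT, and the class FieldMappingSketch is not a Java CAPS API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FieldMappingSketch {
    /** Applies one mapping rule per target column to every source row. */
    public static List<Map<String, Object>> transform(
            List<Map<String, Object>> sourceRows,
            Map<String, Function<Map<String, Object>, Object>> targetRules) {
        List<Map<String, Object>> targetRows = new ArrayList<>();
        for (Map<String, Object> source : sourceRows) {
            Map<String, Object> target = new LinkedHashMap<>();
            for (Map.Entry<String, Function<Map<String, Object>, Object>> rule : targetRules.entrySet()) {
                target.put(rule.getKey(), rule.getValue().apply(source));
            }
            targetRows.add(target);
        }
        return targetRows;
    }

    public static void main(String[] args) {
        // One extracted source row with hypothetical column names.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("EMP_ID", 101);
        row.put("EMP_NAME", "  smith ");
        List<Map<String, Object>> source = new ArrayList<>();
        source.add(row);

        // Mapping rules: a straight copy and a copy with transformation.
        Map<String, Function<Map<String, Object>, Object>> rules = new LinkedHashMap<>();
        rules.put("ID", r -> r.get("EMP_ID"));
        rules.put("NAME", r -> ((String) r.get("EMP_NAME")).trim().toUpperCase());

        System.out.println(transform(source, rules));
    }
}
```

Keeping the rules in a map from target column to function mirrors the one-link-per-target-field layout of a mapping editor, and makes it easy to see which target fields are plain copies and which carry a transformation.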

- The scheduler fires the trigger message, which is received by the business process bpOraToJde, which in turn invokes the ETL service.
- The ETL then connects to the outbound Oracle external system and fetches the records from Oracle (all of them, or a subset based on the condition specified in the ETL).
- The ETL then connects to the inbound JDE external system and loads all the records into JDE.

3.2. Use Case Scenario 2: connection between DB and BatchLocal or Batch FTP

As noted above, connecting two external databases through ETL is very easy, but if we want to transfer the data from a DB to a Batch Local location or a Batch FTP server, the process becomes more complex. Because ETL can only connect databases to one another, we have to choose a flat file DB as the intermediate location in order to transfer the data to the FTP location. Below is an example of how that can be implemented:

3.2.1. Business Process - bpJdeToFTP

3.2.2. Connectivity Map - cmJdeToFTP

- The scheduler fires the trigger message, which is received by the business process bpDelvToFlatDB, which in turn invokes the ETL eTLRecvFromJde.
- The ETL eTLRecvFromJde then connects to the outbound DB2 location and fetches the records from JDE (all of them, or a subset based on the condition specified in the ETL).
- The ETL then transforms the data accordingly and transfers it into the flat file DB.
- After the ETL completes its job, bpDelvToFlatDB sends a message to tpcDelvEvent, which invokes the service svcDelvFTP, which in turn invokes the Java collaboration jcdDelvFTP.
- The collaboration jcdDelvFTP then connects to the batch local file system and reads the .CSV file that was stored by the ETL eTLRecvFromJde.
- The collaboration then writes the file to the destination Batch FTP location.

3.2.3. Technical specifications

3.2.4. Data Transfer Logic / Code:

Collaboration Name: jcdDelvFTP

Web Service Operation: Seebeyond.eGate.JMS.receive

OTD Name                                   | OTD instance
-------------------------------------------|-------------------
Seebeyond.eWays.BatcheWay.BatchLocalFile   | instBatchLocalFile
Seebeyond.eWays.BatcheWay.BatchFTP         | instBatchFTP
Seebeyond.eGate.JMS                        | instJMS

Business Logic

Try {
    Subscribe to the incoming trigger message from the JMS topic tpcDelvEvent.
    Connect to the outbound Local File location.
    Get the target file name from the Batch Local location.
    Connect to the inbound FTP location.
    Set the target file name for the inbound Batch FTP location.
    Chop the file into different streams.
    Publish (append) the streams into the inbound Batch FTP location.
} catch (Exception) {
    Print the exception message.
}

Code

    // Set the name of the file to read from the Batch Local location.
    instBatchLocalFile.getConfiguration().setTargetFileName( "ABC.CSV" );
    // Set the same target file name for the inbound Batch FTP location.
    instBatchFTP.getConfiguration().setTargetFileName( instBatchLocalFile.getConfiguration().getTargetFileName() );
    // Set append to true so that data is added to an existing remote file.
    instBatchFTP.getConfiguration().setAppend( true );
    // Chop the file into streams: feed the local file's input stream to the FTP client.
    instBatchFTP.getClient().setInputStreamAdapter( instBatchLocalFile.getClient().getInputStreamAdapter() );
    // Place the streams into the inbound FTP location.
    instBatchFTP.getClient().put();
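The collaboration code above only runs inside Java CAPS with the Batch eWay OTDs available. For readers who want to experiment with the same idea outside the platform, the sketch below reproduces the "stream the local file and append it to the target" step with standard java.nio calls. It is an analogy under stated assumptions, not the Batch eWay implementation: the target is a local path rather than an FTP server, and the class StreamAppendSketch and method appendFile are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class StreamAppendSketch {
    /** Streams the source file into the target in chunks, appending if the target already exists. */
    public static long appendFile(Path source, Path target) throws IOException {
        long copied = 0;
        byte[] buffer = new byte[8192]; // the file is never held in memory as a whole
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                copied += read;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".csv");
        Path target = Files.createTempFile("target", ".csv");
        Files.write(source, "1,alpha\n".getBytes(StandardCharsets.UTF_8));

        appendFile(source, target);
        appendFile(source, target); // second run appends instead of overwriting

        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(source);
        Files.delete(target);
    }
}
```

Copying through a fixed-size buffer is the reason a stream-based transfer stays cheap for large files, which is the same motivation behind passing a stream adapter between the two Batch OTDs rather than a byte array.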

4 Comparison of ETL Transfers with eGate5.1 and monk transfer methods

The table below contains the data extract/load times for the following approaches and their comparison:
- ETL transfer
- eGate505 Java collaboration for extract and load
- eGate 4.5.2 monk transfer

Time taken using each approach:

S.No | Source System | # Columns | Extracted/Loaded (Records) | ETL       | eGate 505 (Jcd) | eGate 4.5.2 (Monk)
-----|---------------|-----------|----------------------------|-----------|-----------------|-------------------
1    | DB2/AS400     | 6         | 543849                     | 20.58 min | 2 hrs           | > 6 hrs
2    | DB2/AS400     | 51        | 111583                     | 22.29 min | 4.25 hrs        | > 6 hrs

Observations

1. In the ICAN eGate505 jcd approach, the project used XML (CME) to marshal and unmarshal data between the receive and delivery services, which increases the number of hops for the data transfers.
2. In ICAN eGate505 jcd, IQs were used for data persistence, which also adds a little overhead.
3. In eGate we use a DELETE prepared statement before loading. In ETL there is a configuration option to truncate data before loading. For deleting all records from a table, TRUNCATE is the better option from the performance standpoint.
4. Even though no intermediate XML (CME) is used in the eGate 4.5.2 monk approach, the data extraction and load times were much higher compared to the eGate505 jcd and ETL approaches.
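The figures in the table can be turned into throughput and speed-up numbers with simple arithmetic. The sketch below does this for the ETL and eGate 505 columns. It assumes that "20.58 min" and "22.29 min" are decimal minutes and converts 2 hrs and 4.25 hrs to 120 and 255 minutes; the monk column is left out because only a lower bound (> 6 hrs) is reported. The class TransferRates is a hypothetical helper.

```java
public class TransferRates {
    /** Average number of records moved per minute. */
    public static double recordsPerMinute(long records, double minutes) {
        return records / minutes;
    }

    /** How many times faster the ETL run was than the baseline run. */
    public static double speedup(double baselineMinutes, double etlMinutes) {
        return baselineMinutes / etlMinutes;
    }

    public static void main(String[] args) {
        // Row 1: 543849 records, ETL 20.58 min, eGate 505 jcd 2 hrs (120 min).
        System.out.printf("Row 1: %.0f records/min, %.1fx faster than jcd%n",
                recordsPerMinute(543849, 20.58), speedup(120.0, 20.58));
        // Row 2: 111583 records, ETL 22.29 min, eGate 505 jcd 4.25 hrs (255 min).
        System.out.printf("Row 2: %.0f records/min, %.1fx faster than jcd%n",
                recordsPerMinute(111583, 22.29), speedup(255.0, 22.29));
    }
}
```

Under these assumptions the ETL run is roughly six times faster than the jcd approach for the 6-column table and roughly eleven times faster for the 51-column table.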

5 Known issues & work-around / Solution:

5.1 Data Truncation Problem

When data is fetched from the source database in the format YYYY-MM-DD HH:mm:ss.SSSSSS, it gets truncated to YYYY-MM-DD HH:mm:ss.SSS. This happens even after the data type has been changed to varchar on the target side. For example, suppose the data in the source database for a particular field is “2007-07-07 23:12:25:012365”; the data will automatically be truncated and the data that will be sent to the target database is “2007-07-07 23:12:25:123”. But if the last 3 digits of the fractional seconds are zeros, as in “2007-07-07 23:12:25:123000”, the exact data will be transferred to the target database.

Work-around/Solution: No resolution was found for this issue, but the business will not be affected by it, as it can be ensured that no valid data will be lost.

5.2 Automap Inactive for Flat File OTD

While mapping the source fields to the target database fields, the auto map facility does not work, even if all the field names on the target and source sides match. This issue occurs while working with a flat file OTD.

Work-around/Solution: Sun SeeBeyond is aware of the problem and a patch is expected soon.

5.3 java.lang.OutOfMemoryError: Java heap space

ETL is used for transferring a huge data load between source and destination. For that, ETL transfers the records in batches (of 5000, 10000, etc.). In spite of that, the developer may get this OutOfMemory exception, and the domain will move into a hung state.

Work-around/Solution: Our administrator raised a ticket with the Sun SeeBeyond team for the same. After working on it, they sent a patch to our team; after the patch was applied to our environment, the problem was resolved. If the patch is not available, the user can follow the steps below in order to avoid this exception:

Log in to http://logicalhost:portnumber with uid "Administrator", go to JVM setting > JVM options, and request the SeeBeyond administrator to change two parameters:
- Xmx value to 1024m
- MaxPermSize to 512m
Then restart the domain and redeploy the project. If the exception is still experienced, the Xmx value needs to be increased further.
Note: Xmx cannot be increased beyond 2048.

5.4 Runtime Output

There is a problem while generating runtime output from ETL. The runtime output can generate 4 arguments:
I. Status
II. Count
III. Start time
IV. End time
Developers have no control over the start time and end time. These times are generated based on the source system time, and developers cannot change them.

Work-around/Solution: Raise a ticket with the Sun SeeBeyond team mentioning the problem.

Note: One more point should be mentioned in this context (although it is not a problem of ETL itself), namely transferring decimal data through ETL. In ETL the decimal data type can be mapped to the numeric data type (as there is no decimal data type; see the ETL Supported Datatypes section for more information), and developers must set the scale explicitly according to the requirements. For example, if the source DB holds the value 1258.2584, then the scale should be set to 4 while building the flat file DB for ETL.

References

- Wikipedia, the free encyclopedia.
- Sun SeeBeyond ETL(TM) Integrator User's Guide, available at http://docs.sun.com/app/docs/doc/819-6857

Acronyms and Glossary

JCAPS: Java Composite Application Platform Suite

Parallel processing: The simultaneous use of more than one CPU to execute a program. Ideally, parallel processing makes a program run faster because there are more engines (CPUs) running it. In practice, it is often difficult to divide a program in such a way that separate CPUs can execute different portions without interfering with each other. Parallel processing differs from multitasking, in which a single CPU executes several programs at once.

Data cleansing: Data cleansing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set.