Professional Documents
Culture Documents
What's New in DataStage 8 - FINAL
What's New in DataStage 8 - FINAL
x Summary
DataStage has seen major transformations in the past years from an extract-transform-load (ETL) tool running in what was called the UniVerse engine, to what is now a DataStage engine. With the need to adapt to demands of volume processing, Ascential acquired and integrated the parallel processing engine into DataStage. To provide data quality and data integration solutions, Ascential added more products that included MetaStage, QualityStage and ProfileStage. Enter IBM Information Server, the latest release from IBM after acquiring Ascential. IBM Information Server delivers improved functionalities that make job administration, development and maintenance easier.
Page 1 of 5
Confidential Material
ODBC Connector
ODBC Connectors are for accessing external data sources and can be used to read, write, look up and filter data or simply to test the database connectivity during job design. The Test Button on connectors allows developers to test database connections without having to view the data or to run the job. ODBC Connector is the best stage to use for your database because it gives the maximum parallel performance and offers more features compared to database enterprise stages and plug-in stages. ODBC Connector has DataStage schemas to data types, attributes specify, runtime errors a schema reconciliation function that automatically compares external-resource schemas such as a database. Schemas include and field lengths. Based on the reconciliation rules that you or extra transformation on mismatched schemas can be avoided.
ODBC Connecter has its own reject-handling function which eliminates the need to add a Modify or a Transformer stage for capturing SQL errors or for aborting jobs. A choice between number of rows or percentage or rows rejected can be specified for terminating the job run. ODBC Connector can transfer large objects (LOB) using inline or reference methods. However, a connector is the only stage that does reference methods so another connector is needed to transfer the LOB inline later in the job.
SQL Builder
The enterprise database stages in WebSphere DataStage as well as the ODBC Connector stage provide a graphical interface for building SQL that makes it easier for developers to view, create and edit SQL. While previous releases support auto-generated queries that synchronize the SQL select list with the DataStage column list, the SQL that can be created is very limited the table alias has to be manually appended to the columns and the where clause has to be manually keyed in. The SQL Builder provides a drag-anddrop approach for creating the column list and the where clause and supports most database functions. It is important to note that the SQL Builder is limited to one or two source tables and a simple SQL statement.
Range Look-up
Range Look-up is equivalent to the operator between. Lookup against a range of values was difficult to implement in previous DataStage versions. By having this functionality in
Page 2 of 5
Confidential Material
Advanced Find
The Quick Find and the Advanced Find are very useful tools for developers as it does a quick search within the project or within the repository for object usage or dependencies. The results are presented in a detailed view or graphical view. This facility can be used for impact analysis or when doing project-wide job changes.
Parameter Set
Parameter Set is a new repository object containing parameter names and values that can be shared across jobs. Parameter Sets provide an easier and faster method when adding parameters to a job, eliminating the need to add parameters invidiually to each job. Parameter values can be stored in a file, adding flexibility when changing parameter values at runtime or when changing environment.
Resource Estimation
Resource Estimation tool is designed for planning how much resources will have to be allocated to the DataStage jobs as volume of data grows. The Resource Estimation utility can project how much CPU, scratch and disk requirements per operator per partition will be utilized by the job given a sample size of data. The output is presented in a graphical form that can also be generated in a HTML report.
Performance Analysis
The Performance Analysis tool is useful for identifying bottlenecks in a job. The graphical report visualizes the start and end time for each stage in the job, the physical amount of time needed for each stage to complete in each partition, the total CPU and system time for each stage, the memory utilization for each stage over the course of the job, and much more.
Job Compare
While it is a good to keep copies of different job implementations, we frequently lose track of the job changes. In our attempt to improve job design and performance, our experiments often lead us back to the older copy of the job. This is where the Job Compare feature of WebSphere DataStage comes in handy. Job Compare also works on other objects like routines, shared containers, and table definitions; whether they are in the same or in different repositories.
Job Locking
In WebSphere DataStage, jobs can be opened in read-only mode. Developers need not guess who has the job locked as this information is provided when opening jobs that are in use. Sessions can also be disconnected from the web console if there is a need such as restarting the DataStage server while there are existing connections. Unlocking a job
Page 3 of 5
Confidential Material
Job Monitoring
Time-based and record-based monitoring of jobs is available, both of which can adjust to system loads, thus eliminating an unnecessary refresh when the DataStage server is at its peak performance.
Conclusion
The look and feel of DataStage and QualityStage canvas remains the same but the new functionalities are major enhancements over the previous version. Data Connection Object, Parameter Set, Range Look-up and Slowly Changing Dimension are all designed to simplify design, help cut implementation effort and reduce cost. Advance Find provides a good way to do impact analysis, an importanat step in project management. Resource Estimation is as important for project planning. Meanwhile, Perfornamce Analysis tool is another useful feature that can be used throughout the lifecycle of a job. By knowing what causes a performance bottleneck, production support groups can better cope with the ever-shrinking batch windows. While Advance Find will not perform a Replace function and SQL Builder will not let us build complex SQL, all the changes in version 8 have positive impact on job development, production support and project management. Combined with the features offered in Information Server, existing customers who are looking to upgrade or new DataStage clients will benefit from the new enhancements.
Page 4 of 5
Confidential Material
Author Luisa Garcia Senior Information Management Consultant, Alpine Consulting, Inc. For more information on DataStage, Information Server, or related products, please contact Alpine Consulting at 847.605.0788 or info@alpineinc.com.
Page 5 of 5
Confidential Material