IBM WebSphere DataStage : A Brief Overview

Rajeev Priyadarshi, President, PR3 Systems rpriyadarshi@pr3systems.com

Copyright PR3 Systems, 2005

Topics to be covered
• What is Data Warehousing, ETL and Business Intelligence?
• Product Overview of DataStage
• Types of DataStage Clients
• DataStage Administrator
• DataStage Manager
• DataStage Designer
• DataStage Director

What is Data Warehousing?
A data warehouse is a collection of data gathered and organized so that it can easily be analyzed, extracted, synthesized, and otherwise used to further understand the data. It may be contrasted with data that is gathered to meet immediate business objectives, such as order and payment transactions, although this data would usually also become part of a data warehouse.


What is ETL?
ETL is an abbreviation for "Extract, Transform and Load". It is a process of gathering, converting and storing data, often from many locations. The data is often converted from one format to another in the process.
Examples: IBM DataStage, Informatica
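The three ETL steps can be illustrated with a minimal Python sketch. It is not how DataStage implements them; the CSV content, the `orders` table and the function names are all assumptions made up for this illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an extract from an operational system.
SOURCE_CSV = """order_id,customer,amount
1,acme,100.50
2,globex,75.00
3,acme,20.25
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert text fields to typed values and upper-case the customer."""
    return [
        (int(r["order_id"]), r["customer"].upper(), float(r["amount"]))
        for r in rows
    ]


def load(rows, conn):
    """Load: write the converted rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform and load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(SOURCE_CSV)), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl()
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    print(conn.execute(query).fetchall())
```

Keeping each step in its own function mirrors the way an ETL tool separates source, transformation and target into distinct components.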


What is BI?
Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.
Examples: BusinessObjects: www.businessobjects.com

Careers in this Domain
• New technologies, hence requires new resources.
• Much easier to pick up than software languages.
• Less competitive than mainstream software development.
• Return on Investment high (Salary / Training Effort Ratio).

PR3 DataStage Training Course
We specialize in providing education and consulting services for IBM’s WebSphere DataStage Products. We have completed several successful DataStage projects for Fortune 500 companies. We have come up with a unique approach to ETL project development [PR3 RUSK Framework] that enhances the re-usability, scalability and high availability of the processing framework.

DataStage Architecture
[Diagram: one DataStage Server (Unix / Windows / Mainframe) connected to several DataStage Clients (Windows), each providing Administrator, Designer, Director and Manager.]

Product Overview
DataStage is a product from IBM being used as the strategic ETL tool within many organizations. It can be used for multiple purposes:
• Interfacing between multiple databases.
• Changing of data from one format to another, e.g. from database to flat files, XML files, etc.
• Fast access to data that doesn’t change often.
• Interacts with WebSphere MQ to provide real time processing capabilities triggered by external messages.

Usage of DataStage within organizations
DataStage has Windows Clients which connect to the Server on the Unix / Windows or Mainframe platform. The clients can be used to develop, deploy and run DataStage jobs. In a deployment environment, the jobs can be kicked off through scripts directly on Unix servers.
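A script that kicks off a job on the server could look like the sketch below. It assumes the `dsjob` command-line tool that ships with the DataStage server and its `-run`, `-jobstatus` and `-param` options; the slides do not name the tool, so check the options against the documentation for the installed version. The project name, job name and parameter are hypothetical.

```python
import subprocess


def build_dsjob_command(project, job, params=None, dsjob_path="dsjob"):
    """Build the argument list for starting a job with the dsjob command.

    Assumption: dsjob accepts -run, -jobstatus and repeated -param name=value
    options followed by the project and job names.
    """
    cmd = [dsjob_path, "-run", "-jobstatus"]
    for name, value in sorted((params or {}).items()):
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd


def run_job(project, job, params=None):
    """Start the job and return the exit code (needs dsjob on the PATH)."""
    return subprocess.run(build_dsjob_command(project, job, params)).returncode


if __name__ == "__main__":
    # Print the command rather than run it, since dsjob exists only where
    # DataStage is installed. All names below are hypothetical.
    command = build_dsjob_command(
        "dwh_project", "load_orders", {"RUN_DATE": "2005-01-31"}
    )
    print(" ".join(command))
```

Building the argument list separately from running it lets a scheduler wrapper log or test the exact command before anything is executed on the server.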

Types of DataStage clients
• DataStage Administrator
• DataStage Designer
• DataStage Manager
• DataStage Director

DataStage Administrator
Most DataStage configuration tasks are carried out using the DataStage Administrator, a client program provided with DataStage. To access the DataStage Administrator:
1. From the Ascential DataStage program folder, choose DataStage Administrator.
2. Log on to the server. If you do so as an Administrator (for Windows NT servers), or as dsadm (for UNIX servers), you have unlimited administrative rights; otherwise your rights are restricted as described in the previous section.
3. The DataStage Administration window appears.
The General page lets you set server-wide properties. It is enabled only when at least one project exists. The controls and buttons on this page are enabled only if you logged on as an administrator.

Administrator Interface


Project Properties Screen

DataStage Manager
• Primary interface to the DataStage Repository.
• Used to store and manage re-usable metadata for the jobs.
• Used to import and export components from file-system to DataStage projects.
• Custom routines and transforms can also be created in the Manager.

DataStage Routines (Manager window)

Importing / Exporting a Project

DataStage Designer
DataStage Designer is used to:
• Create DataStage Jobs that are compiled into executable programs.
• Design the jobs that extract, integrate, aggregate, load, and transform the data.
• Create and reuse metadata and job components.
It allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating and loading data.

DataStage Designer
Use Designer to:
• Specify how data is extracted.
• Specify data transformations.
• Decode data going into the target tables using reference lookups.
• Aggregate data.
• Split data into multiple outputs on the basis of defined constraints.
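The last three tasks (decoding through a reference lookup, splitting on constraints, and aggregating) can be sketched in plain Python. This only illustrates the ideas and is not DataStage code; the lookup table, the rows, the output names and the 100 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical reference data used to decode country codes.
COUNTRY_LOOKUP = {"US": "United States", "IN": "India"}

# Hypothetical input rows.
ROWS = [
    {"id": 1, "country": "US", "amount": 250.0},
    {"id": 2, "country": "IN", "amount": 40.0},
    {"id": 3, "country": "XX", "amount": 90.0},
    {"id": 4, "country": "US", "amount": 10.0},
]


def decode(row):
    """Reference lookup: add the decoded country name, or None if not found."""
    return {**row, "country_name": COUNTRY_LOOKUP.get(row["country"])}


def split(rows):
    """Send each row to exactly one output, testing constraints in order."""
    outputs = {"rejects": [], "large": [], "small": []}
    for row in rows:
        if row["country_name"] is None:
            outputs["rejects"].append(row)  # lookup failed
        elif row["amount"] >= 100:
            outputs["large"].append(row)
        else:
            outputs["small"].append(row)
    return outputs


def aggregate(rows):
    """Aggregate: total amount per decoded country name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country_name"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    outputs = split([decode(r) for r in ROWS])
    print({name: [r["id"] for r in rows] for name, rows in outputs.items()})
    print(aggregate(outputs["large"] + outputs["small"]))
```

Rows whose lookup fails go to a separate reject output instead of being dropped silently, which keeps bad reference data visible.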

DataStage Designer
The Designer graphical interface lets you select Stage icons, drop them onto the Designer work area, and add links. Then, still working in the Designer, you define the required actions and processes for each stage and link. A job created with the Designer is easily scalable. This means that you can easily create a simple job, get it working, then insert further processing, additional data sources, and so on.

DataStage Terms and Concepts
Aggregator stage: A stage type that computes totals or other functions of sets of data.
BCPLoad stage: A plug-in stage supplied with DataStage that bulk loads data into a Microsoft SQL Server or Sybase table.
CFD: COBOL File Description. A text file that describes the format of a file in COBOL terms.
Column definition: Defines the columns contained in a data table. Includes the column name and the type of data contained in the column.
Container stage: A built-in stage type that represents a group of stages and links in a job design.
Data Browser: A tool used from within the DataStage Manager or DataStage Designer to view the content of a table or file.

hashed file: A file that uses a hashing algorithm for distributing records in one or more groups on disk.
Hashed File stage: A stage that extracts data from or loads data into a database that contains hashed files.
job: A collection of linked stages, data elements, and transforms that define how to extract, cleanse, transform, integrate, and load data into a target database. See also mainframe job and server job.
Link Collector stage: A server job stage that collects previously partitioned data together.
Link Partitioner stage: A server job stage that allows you to partition data so that it can be processed in parallel on an SMP system.
meta data: Data about data. A table definition which describes the structure of the table is an example of meta data.

ODBC stage: A stage that extracts data from or loads data into a database that implements the industry standard Open Database Connectivity API. Used to represent a data source, an aggregation step, or a target data table.
plug-in stage: A stage that performs specific processing that is not supported by the Aggregator, Hashed File, ODBC, UniVerse, UniData, Sequential File, and Transformer stages.
Repository: A DataStage area where projects and jobs are stored as well as definitions for all standard and user-defined data elements, transforms, and stages.
Sequential File stage: A stage that extracts data from, or writes data to, a text file.
stage: A component that represents a data source, a processing step, or a data mart in a DataStage job.
table definition: A definition describing the data you want including information about the data table and the columns associated with it. Also referred to as meta data.
Transformer stage: A stage where data is transformed (converted) using transform functions.
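The "hashed file" entry in the glossary can be pictured with a toy in-memory model: a hash of the record key selects the group, so a read by key scans only one group. This is only a sketch of the general idea. The class name, the group count and the use of CRC32 are assumptions for illustration and say nothing about the actual on-disk format or hashing algorithm DataStage uses.

```python
import zlib


class HashedFile:
    """Toy model of a hashed file: records live in groups chosen by key hash."""

    def __init__(self, group_count=4):
        self.groups = [[] for _ in range(group_count)]

    def _group_for(self, key):
        # CRC32 is chosen only because it gives the same result on every run.
        return self.groups[zlib.crc32(key.encode("utf-8")) % len(self.groups)]

    def write(self, key, record):
        """Insert the record, replacing any existing record with the same key."""
        group = self._group_for(key)
        for i, (existing_key, _) in enumerate(group):
            if existing_key == key:
                group[i] = (key, record)
                return
        group.append((key, record))

    def read(self, key):
        """Return the record for key, scanning only the group the key hashes to."""
        for existing_key, record in self._group_for(key):
            if existing_key == key:
                return record
        return None


if __name__ == "__main__":
    customers = HashedFile()
    customers.write("C001", {"name": "Acme"})
    customers.write("C002", {"name": "Globex"})
    print(customers.read("C001"))
    print(customers.read("C999"))
```

Reading by key without scanning the whole file is what makes this kind of structure useful for reference lookups.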

DataStage Client Login
1. Enter the name of your host in the Host system field. This is the name of the system where the DataStage server components are installed.
2. Enter your user name in the User name field. This is your user name on the server system.
3. Enter your password in the Password field.
4. Choose the project to connect to from the Project list. This list box displays all the projects installed on your DataStage server. At this point, you may only have one project installed on your system and this is displayed by default.
5. Select the Save settings check box to save your logon settings.

DataStage Designer

Creating a New DataStage Job

The DataStage Designer Window
The DataStage Designer window consists of the following parts:
• One or more Job windows where you design your jobs.
• The Property Browser window where you view the properties of the selected job.
• The Repository window where you view components in a project.
• A Toolbar from where you select Designer functions.
• A Tool Palette from which you select job components.
• A Debug Toolbar from where you select debug functions.
• A Status Bar which displays one-line help for the window components, and information on the current state of job operations, for example, compilation.
For full information about the Designer window, including the functions of the pull-down and shortcut menus, refer to the DataStage Designer Guide.

New Job Window Screen

The Repository Window

Designer Toolbar

A Simple Job

The Designer Tool Palette
The tool palette contains buttons that represent the components you can add to your job designs. The palette has different groups to organize the tools available. Click the group title to open the group. The Favorites group allows you to drag frequently used tools there so you can access them quickly. You can also drag other items there from the Repository window, such as jobs and server shared containers.

DataStage Director
The DataStage Director is the client component that validates, runs, schedules, and monitors jobs run by the DataStage Server. It is the starting point for most of the tasks a DataStage operator needs to do in respect of DataStage jobs.

DataStage Director

DataStage Director [Contd.]

Display Area
The display area is the main part of the DataStage Director window. There are three views:
• Job Status. The default view, which appears in the right pane of the DataStage Director window. It displays the status of all jobs in the category currently selected in the job category tree. If you hide the job category pane, the Job Status view includes a Category column, and displays the status of all server jobs in the current project, regardless of their category.
• Job Log. Displays the log file for a job chosen from the Job Status view or the Job Schedule view.
• Job Schedule. Displays a summary of scheduled jobs and batches in the currently selected job category. If the job category pane is hidden, the display area shows all scheduled jobs and batches, regardless of their category.

Menu Bar
The menu bar has six pull-down menus that give access to all the functions of the Director:
• Project. Opens an alternative project and sets up printing.
• View. Displays or hides the toolbar, status bar, buttons, or job category pane, specifies the sorting order, changes the view, filters entries, shows further details of entries, and refreshes the screen.
• Search. Starts a text search dialog box.

Menu Bar [Contd.]
• Job. Validates, runs, schedules, stops, and resets jobs, purges old entries from the job log file, deletes unwanted jobs, cleans up job resources (if the administrator has enabled this option), and allows you to set default job parameter values.
• Tools. Monitors running jobs, manages job batches, and starts the DataStage Designer and DataStage Manager. It also starts MetaStage Explorer, Quality Manager, and custom software, if these components are installed on the system.
• Help. Invokes the Help system. You can also get help from any screen or dialog box in the DataStage Director.

DataStage Director [Contd.]

Job States within Director
Compiled: The job has been compiled but has not been validated or run since compilation.
Not compiled: The job is under development and has not been compiled successfully.
Running: The job is currently being run, reset, or validated.
Finished: The job has finished.
Finished (see log): The job has finished but warning messages were generated or rows were rejected. View the log file for more details.
Stopped: The job was stopped by the operator.
Aborted: The job finished prematurely.
Validated OK: The job has been validated with no errors.
Has been reset: The job has been reset with no errors.
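An operator script that reacts to these states could model them as an enumeration. The sketch below uses the state names listed above as values. The set of states treated as needing a log review is an assumed policy for illustration, not something the Director defines.

```python
from enum import Enum


class JobState(Enum):
    """Job states as listed for the Director's Job Status view."""

    COMPILED = "Compiled"
    NOT_COMPILED = "Not compiled"
    RUNNING = "Running"
    FINISHED = "Finished"
    FINISHED_SEE_LOG = "Finished (see log)"
    STOPPED = "Stopped"
    ABORTED = "Aborted"
    VALIDATED_OK = "Validated OK"
    HAS_BEEN_RESET = "Has been reset"


# Assumed policy: these states mean someone should open the job log.
NEEDS_LOG_REVIEW = {
    JobState.FINISHED_SEE_LOG,
    JobState.STOPPED,
    JobState.ABORTED,
}


def needs_log_review(status_text):
    """Return True if the given status text maps to a state needing review."""
    return JobState(status_text) in NEEDS_LOG_REVIEW


if __name__ == "__main__":
    for text in ("Finished", "Finished (see log)", "Aborted"):
        print(text, needs_log_review(text))
```

Looking a state up by its display text raises `ValueError` for an unknown status, so a misspelled or unexpected status fails loudly instead of being ignored.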

Conclusions
• DataStage has proved to be an excellent ETL tool within the industry.
• The need for Data Integration and Consolidation within organizations is fuelling the need for DataStage.
• The Data Transfer format landscape is gradually moving towards XML in every industry.
• PR3 Systems provides detailed training and consulting services for DataStage, Best Practices in DataStage Project Design, and helping organizations to consolidate their Data and Information Network.

Contact Us
For information about our training and consulting services, you can send an email to info@pr3systems.com or call 630-452-9883.
