This course will explain the concepts of DataStage, its architecture, and how to apply it to a 'real life' scenario in a business case-study in which you'll solve business problems. We will begin by looking at the big picture and discuss why businesses need ETL tools and where DataStage fits in the product set. Once we've talked about the very basic architecture of DataStage, we'll investigate a business case-study, learn about a company called Amalgamated Conglomerate Corporation (ACC) - a fictitious holding company - and its business and technical needs. We'll then go about solving this company's problems with DataStage in a Guided Tour Product Simulation.

In a practice environment, you'll become ACC's Senior DataStage Developer. In this capacity, you will assess an existing DataStage Job, build your own Job, modify a Job, and build a Sequence Job. Using the DataStage clients, you'll log onto the DataStage server and look at a Job that was built previously by the former DataStage Developer at ACC. You'll then build your own Job by importing metadata, building a Job design, compiling, running, troubleshooting, and then fixing this Job. You'll then modify a Job and finally build a special type of Job called a Sequence Job. Let's get started!

This section begins by talking about the DataStage tool, what it does, and some of the concepts that underpin its use. We'll discuss how it fits in the Information Server suite of products and how it can be purchased as a stand-alone product or in conjunction with other products in the suite. We'll briefly cover DataStage's architecture and its clients.
What Is DataStage?
InfoSphere DataStage is an ETL (Extract, Transform, and Load) tool that is part of the InfoSphere suite of products. It functions as a stand-alone product or in conjunction with other products in the suite. DataStage provides a visual UI, which you can use in a point-and-click fashion (a non-linear programming tool) to quickly build DataStage Jobs that will perform extractions, transformations, and loads of data for use in data warehousing, system migrations, data integration projects, and data marts.
Sometimes DataStage is purchased with QualityStage, which performs data cleansing (we'll touch on it in this training, but be sure to take the "Introduction to QualityStage" FlexLearning module as well). When implementing DataStage as a stand-alone product, you will still need to install InfoSphere Information Server components with it. Other components or products in the suite can be added at a later time. The other products in the suite are QualityStage, Information Analyzer, Business Glossary, and Metadata Workbench. InfoSphere Information Server is a suite of tools that is used to manage your information needs. Several components come with Information Server. One of the main components is DataStage, which provides the ETL capability. IBM InfoSphere Metadata Workbench provides end-to-end metadata management, depicting the relationships between sources and consumers.
IBM InfoSphere Change Data Capture: Real-time change data capture (CDC) and replication solution across heterogeneous environments.
IBM InfoSphere Change Data Capture for Oracle Replication: Real-time data distribution and high availability/disaster recovery solution for Oracle environments.
IBM InfoSphere Information Analyzer: Profiles and establishes an understanding of source systems and monitors data rules.
IBM InfoSphere Business Glossary: Creates, manages, and searches metadata definitions.
IBM InfoSphere QualityStage: Standardizes and matches information across heterogeneous sources.
IBM InfoSphere DataStage: Extracts, transforms, and loads data between multiple sources and targets.
IBM InfoSphere DataStage MVS Edition: Provides native data integration capabilities for the mainframe.
IBM InfoSphere Federation Server: Defines integrated views across diverse and distributed information sources, including cost-based query optimization and integrated caching.
IBM InfoSphere Information Services Director: Allows information access and integration processes to be published as reusable services in a service-oriented architecture.
IBM InfoSphere Information Server FastTrack: Simplifies and streamlines communication between the business analyst and the developer by capturing business requirements and automatically translating them into DataStage ETL jobs.
Connectivity Software: Provides efficient and cost-effective cross-platform, high-speed, real-time, batch, and change-only integration for your data sources.

This training will focus on DataStage, which gives you the ability to import, export, create, and manage metadata from a wide variety of sources for use within these DataStage ETL Jobs. After Jobs are created, they can be scheduled, run, and monitored, all within the DataStage environment.
DataStage provides a point-and-click user interface containing a Canvas. You drag and drop icons that represent objects (Stages and Links) from a Palette onto the Canvas to build Jobs. For instance, you might drag and drop icons for a source, a transformation, and a target onto the Canvas. The data might then flow from a Source Stage via a Link to a Transformation Stage, and through another Link to a Database Stage, for example. Stages and Links present a graphical environment that guides you, the developer, through the needed steps. You are presented with fill-in-the-blank-style boxes that enable you to quickly and easily configure data flows. When connecting to a database, for example, you drag a proprietary database Stage icon onto the Canvas; you don't have to worry about how that database's native code works in order to leverage its high performance. After configuring some particulars and doing a compile, the result is a working Job that is executable within the DataStage environment, as well as a Job design that provides a graphical view of how the data or information is flowing and being used.
Information Server Backbone
In order to understand the environment in which we will work as Developers, let's first very briefly look at the architecture and framework behind the InfoSphere Information Server and its various components (collectively known as InfoSphere). InfoSphere has various product modules within it, including the Information Services Director, the Business Glossary, QualityStage, DataStage, Information Analyzer, and Federation Server. These product modules are all supported by common metadata access services and metadata analysis services. They all sit on top of the repository. They all work with one another and can be used as an integrated product suite, or DataStage also comes as a stand-alone application. Whichever component or components you have, you will need to install the basics of the Information Server itself, including the repository and the application layer. From there, this 3-tiered architecture fulfills many information integration needs, allowing metadata and terminology collected with one product during one phase of a project to flow to other products throughout the enterprise, enabling common understandings of data and leveraging common discoveries.
Let's just review the products in the suite again.
• DataStage, by itself, can pull and push metadata, table definitions, and information from and to various targets and sources of data. Using DataStage, you will be able to build data warehouses and other new repositories for integrated forms and views of data for your entire organization. DataStage is also very useful in data migration activities. As mergers and acquisitions occur, the underlying IT systems can, using DataStage, be brought together into a single trusted view or common unified view with which to make better business decisions and implementations. DataStage has both real-time and batch capabilities. DataStage also integrates with various other tools in addition to databases, such as MQ Series (which provides reliable delivery of messaging traffic), EDI, and other means of data communication. DataStage can help with ordering and receiving activities as well by putting data to and/or pulling data from an EDI wire.
• The Information Analyzer allows you to analyze data in databases, files, and data stores and gives you the knowledge to determine what types of Jobs need to be developed.
• QualityStage: DataStage and QualityStage have been integrated into a single canvas so that all your activities within a Job can flow from a traditional DataStage Stage to one of the QualityStage Stages. DataStage and QualityStage allow you to manipulate the data and cleanse it in-line while you're working with it, thereby allowing you to, for example, cleanse your data in the process of doing your other ETL activities, effectively parse it, and move it into a data warehouse or to online operating systems so that you can have quicker access and more robust delivery of your business data.
• The Business Glossary maps your business terms to your technical terms so that all the members of your organization can have common nomenclature during all business activities and stages of technical development, thus reducing ambiguity among varying audiences.
• The Federation Server is a tool that lets you put together disparate and heterogeneous data stores through a single point of view for the organization's entire data world: databases still stored on mainframes, as well as on a variety of other computers, are made to look like one repository.
• The Information Services Director lets you turn your Jobs into real-time Web Services.
These product modules feed to DataStage, allowing you to create Jobs that will function in a highly efficient manner to move data and/or cleanse it as necessary. DataStage will provide the components needed to get data from one place to another and manage it on an ongoing basis. This tutorial will focus primarily on the ETL activities, with a little bit of knowledge of DataStage's QualityStage Stages that go along with it. Now that we've talked about the overall Information Server's architecture and components, let's take a more detailed look at DataStage's components.
DataStage Architecture
DataStage has a 3-tiered architecture consisting of clients, dedicated engines, and a shared repository. These are all components within the InfoSphere Information Server, using its shared and common repository. The clients include an Administrator client, a Designer client, and the Director client. The Administrator is used for setting up and managing individual DataStage Projects and each Project's related common project information. The Designer is used to develop and manage DataStage applications and all their related components, such as metadata. The Designer is where you will be doing most of your development. The Director allows you to actively monitor runtime activities, watch Jobs, schedule Jobs, and keep track of events once your Job has been developed, deployed, and is running.
Inside DataStage, there are two engines: the traditional Server engine (used for DataStage 'Server' Jobs) and the Parallel engine (used for DataStage Parallel Jobs). The 'Server' Jobs are typically single-threaded and have evolved over the years through various versions of DataStage since its inception. The Parallel engine allows you to create Jobs that can, at runtime, dynamically increase or decrease their efficiency and optimize the use of the resources at hand within your hardware. The Parallel engine can dynamically change the way in which it performs its parallelism. The Shared or 'Common' repository holds all of the information, such as Job designs, logs of the Job runs, and metadata, that you will need while developing Jobs. Now, let's take a look at the Administrator client's UI.
Administrator Client
We're looking at the Administrator's UI under the tab entitled 'General'. Here, you can enable job administration from within the Director client, enable runtime column propagation (RCP) for all of your Parallel Jobs, enable editing of the internal Job references, share metadata when importing from Connectors (the DataStage objects that automatically connect to various proprietary sources and databases with minimal configuration), and also set some auto-purge settings for your log. What is RCP and how does it work?
DataStage is a role-based tool. DataStage can assign functions to individuals who will be using the tool. There may be differing functions made available to different people. The person with the role of 'administrator' may have more options than does a 'developer'. Usually, when it is installed, the Administrator assigns roles such as Developer or Operator. The Administrator, developers, and operators may not be the same person. Thus, when you look at the actual product, your UI may be different than others'. This can all be designated for each separate DataStage Project.

Under the Permissions tab, you can determine which user gets what role for that particular Project. Traces are available should you wish to put tracing on the engine. The Schedule tab allows us to dictate which scheduler will be used; in the Windows environment, we would need to designate who will be the authorized user to run Jobs from the scheduler. The Parallel tab gives us options as to how the parallel operations will work with the parallel engine. Other tabs let us set the memory requirements on a per-Project basis. The Mainframe tab allows us to set needed options when working with the mainframe version of DataStage. This would allow us to create Jobs that could then be ported up to the mainframe and then executed in their native form with the appropriate JCL. Under the Sequence tab, the Sequencer options are set. And under the Remote tab of the Administrator client, we make settings for remote parallel Job deployment on the USS system or remote Job deployment on a Grid.
DataStage Designer
The second of the three DataStage clients is the Designer. As an ETL Developer, this is where you'll spend a good deal of your time. Now, let's examine the main areas of the UI. In the upper-left (shown highlighted) is the Repository Browser area. It shows us the metadata objects and Jobs found within DataStage. Toward the lower-left is the Palette. The Palette contains all of the Stages (drag-and-droppable icons). On the right is the Canvas. This is where you would drop the Stages that you have chosen to be part of your Job design. You would select a Database source icon, for instance, and drag it to the right. You can see that the Job in the graphic above is a Parallel Job and that it contains two Stages that have been joined together with a Link. In this particular example, data will flow from a Row Generator Stage to the Peek Stage. Links can be added from the Palette directly, or you can just right-click on a Stage, drag, and then drop on the next Stage to create the Link dynamically.
The Designer is where you will create the kinds of working code that you need in order to properly move your data from one source to another. There are over 100 Stages that come with the product out of the box (at installation time there are approximately 60 base Stages installed, an additional 40 that are optional, and 50-100 more based on service packs and add-ons, and then you can build your own custom Stages in addition).

DataStage Director
The third DataStage client is the Director. The Director gives us a runtime view of the Job and of the main activities of the program. Above, we can see an individual Job log. We can see the types of things that occur during a Job run, such as the Job starting, setting environment variables, and then eventually the successful completion of the Job. There are messages and other events shown in this Job run that are logged to this area and kept inside the Shared Repository. Should there be any warnings or runtime errors, they can be found and analyzed here to determine what went wrong in order to take corrective action. Color-coding shows that green messages are just informational; a yellow message is a warning and may not directly affect the running of the Job, but should probably be looked at. If you should ever see a red message in here, this is a fatal error, which must be resolved before the Job can run successfully.
DataStage Repository
In the Repository Browser, we have various folder icons that represent different categories within the DataStage Project. Some come automatically with the Project, and then you can add your own and name them. Organize them in a way suitable to your DataStage Project, providing for easy export and import. Standard folders include those for Jobs, Table Definitions, Rule Sets, and other DataStage objects. Tip: Create a folder named after your initiative and then order or move all the appropriate objects for just that initiative under that one common location.
Table Definitions are also known as metadata or schemas. These terms are often used interchangeably. Each typically contains a collection of information that defines the metadata about the individual record or row within a table or file. A piece of metadata can describe column names, column lengths, or columns' data types or it can describe fields within a file.
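To make this concrete, here is a small illustrative sketch, in plain Python rather than DataStage's own repository format, of the kind of information a Table Definition carries; the column names and types are invented for the example.

```python
# Illustrative sketch only: a Table Definition pictured as a simple structure.
# The fields mirror what the Columns grid shows (name, SQL type, length,
# nullability); this is not DataStage's actual storage format.
from dataclasses import dataclass

@dataclass
class ColumnDef:
    name: str        # column or field name
    sql_type: str    # e.g. "Integer", "Varchar", "Decimal"
    length: int      # declared length or precision
    nullable: bool   # whether the column may contain nulls

customer_table = [
    ColumnDef("CUST_ID", "Integer", 10, False),
    ColumnDef("CUST_NAME", "Varchar", 30, True),
    ColumnDef("BALANCE", "Decimal", 12, True),
]

for column in customer_table:
    print(column)
```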
Other folders include one for Routines (sub-routines able to be called from within a Job). Shared Containers are pieces of DataStage Jobs that can be pulled together, used, and reused as modular components. Stage Types (highlighted) is a master list of all the Stages that are available to you in this Project. Stages can also be added later through various
options depending on what product(s) have already been installed and depending on what your administrator has made available to you. The Standardization Rules folder contains out-of-the-box or default rules for QualityStage.
Transforms provide additional abilities from the standard Server engine, creating macro-like functions for use and re-use by their calling structure. WAVES Rules are also used by QualityStage, for address verification modules. Match Specifications are used by QualityStage's Match Stages. Machine Profiles are used with IBM mainframes. IMS 'view sets' are for working with legacy IMS types of Database Management Systems.
Steps to Create a DataStage Job
This section talks about the steps to building your first Job. First, you'll need to understand some basic concepts, then we'll talk about setting up an environment, next you'll connect to the sources, then we'll talk about how to import table definitions, and then we'll provide you with an understanding of the various types of Stages and how they are used. Then, we'll talk about working with RCP, creating Parameter Sets, and understanding how to use the CLI, and, in the next lesson, you will put all of this knowledge to use and begin building Jobs in a case-study scenario. This section covers the following:
1. Understand Some Underpinnings
   1. Types of Jobs
   2. Design Elements of Parallel Jobs
   3. Pipeline Parallelism
   4. Partition Parallelism
   5. Three-Node Partitioning
   6. Job Design Versus Execution
   7. Architectural Setup Options
2. Setting up a new DataStage environment
   1. Setting up the DataStage Engine for the First Time
   2. DataStage Administrator Projects Tab
   3. Environment Reporting Variables
   4. DataStage Administrator Permissions Tab
   5. Export Window
   6. Choose Stages; Passive Stages
   7. Connect to Databases
3. Import Table Definitions
4. Active Stages; The Basics
5. An Important Active Stage; The Transformer Stage
6. Advanced Concepts When Building Jobs; RCP and More
7. Pulling Jobs Together; Parameter Sets, CLI, etc.
Let's now shift our discussion to the process of developing Jobs in DataStage. What are the steps involved in developing Jobs? First, you define global and Project properties using the Administrator client that we mentioned earlier in the tutorial. This includes defining how you want the Job to be run. For instance, do you want Runtime Column Propagation enabled, or are there any other common variables in your environment that you might be using throughout the Project? We'll discuss this in greater depth later in this tutorial.
Next, you go into the Designer client and import metadata into the repository for use, later, in building your Jobs. Then, using the Designer tool, you actually build the Job and compile it. The next step is to use the Director tool to run and monitor the Job (Jobs can also be run from within the Designer but the Job log messages must be viewed from within the Director client). So, as you’re testing your Jobs it can be a good idea to have both tools open. This way, you can get very detailed information about your Job and all of the things that are happening in it.
Administrator Client
Let's say that we want to create a Job that extracts data from our primary operational system. We'll pull that data out and store it into either a local repository such as a flat file (by dragging a Sequential File Stage onto the Canvas and creating a link between the two Stages) or into one of DataStage's custom Data Sets (a proprietary data store which keeps the data in a partitioned, ready-to-use, high-speed format so that it doesn't need to be parsed for re-use as it would otherwise need to be from a flat file). And then we could, for instance, put it directly into another database (by dragging a Stage onto the Canvas at the end of that data flow).

We could go to the Administrator tool, set up our Jobs to be able to use RCP, and set the number of purges in the logs for however often we want to run purges. Still in the Administrator, we could then put in some common environmental variables, for instance the user ID, the password, and the name of the database that is being accessed. These can be set there and encrypted in a common location so that everyone can use them without necessarily exposing security in your system.

Designer Client
To then create this kind of extract Job, we would next want to go into our Designer tool to first import metadata from the database itself for the table or tables from which we'll be pulling. Also within the Designer, we want to use Stages that will connect us to the database. We do this by dragging and dropping the appropriate Stage onto the Canvas. The database Stages come pre-built to meet the needs of many proprietary databases and satisfy their native connectivity requirements, thus providing high-speed transfer capability while eliminating the need to do a lot of the configuration. Inside the Stage(s), we will use the variables (that we defined in our common environment using the Administrator tool) to define the Job as we see it needs to be done. We do this by double-clicking the Stage in question and filling out the particulars that it requests.
Types of Jobs
Let's talk about the three types of Jobs that we would most likely be developing. These fit into the categories of Parallel Jobs, Job Sequences (which control other Jobs), and Server Jobs (the legacy Jobs from earlier versions of DataStage).

Parallel Jobs are executed by DataStage using a parallel engine. They have built-in functionality for what is called pipeline and partition parallelism. These are means of efficiency that can be set; you can configure your parallelism dynamically. When Jobs are compiled, they are compiled into a DataStage-specific language called Orchestrate Scripting Language, or OSH. The OSH executes various Operators. Operators are pre-built functions relating to the Stages in our Job design. The executable that the OSH executes is accomplished with C++ class instances. You can also create custom Operators using a toolkit and the C++ language. If you have special data that needs to be written to a special device, such as one for scanning and tracking control on delivery trucks, then you could write a custom module to put the data directly into that format without having to go through any other intermediate tools. These Jobs are then monitored in the DataStage Director with the runtime Job monitoring process. We'll discuss these things in greater detail later in the tutorial.

Job Sequences, as we mentioned, are Jobs that control other Jobs. Master Sequencer Jobs can, for example, kick off other Jobs and other activities, including Command Line activities, other control logic, or looping that needs to go on in order to re-run a Job over and over.
When a local Job fails, or globally when anything happens and an exception is thrown, you can perform activities based on those failures and do that either specifically or globally. The Job Sequencer can control all your Jobs at once without having to schedule each individual Job with the scheduler. Although you still have the ability to call any/each Job individually, you can also group them together through the Job Sequencer and call them that way. In addition, a common Command Line API is provided; you can embed this into third-party schedulers and can then execute Jobs in whatever fashion you choose (a small sketch of this follows below). Runtime monitoring is built into the DataStage engine and can also be viewed from the Director client.

The third type of Job is the Server Job. Server Jobs are legacy DataStage Jobs that continue to exist and are supported as DataStage applications, but you may also find them useful for their string-parsing capabilities. These are executed by the DataStage Server engine and are compiled into a form of BASIC and then run against the Server engine. Legacy Server Jobs do not follow the same parallelization scheme that the Parallel Jobs do; they cannot be parallelized in the same way that the Parallel Jobs can be, and they cannot easily have their parallelism changed dynamically at runtime as the Parallel Jobs can. However, you can achieve parallel processing capabilities by sequencing these Jobs with the Job Sequencer to run at the same time, or by using other options to gain efficiencies with their performance.
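As an illustration of that Command Line API, here is a minimal sketch that drives Jobs from Python via the dsjob utility, the way a third-party scheduler might. The project name "dstage1" and Job name "ExtractCustomers" are made-up examples, and the flags available can vary by release, so verify the options against your own installation's dsjob documentation rather than treating this as a definitive recipe.

```python
# Hedged sketch: invoking DataStage Jobs from an external script via dsjob.
# "dstage1" and "ExtractCustomers" are hypothetical example names.
import subprocess

def run_job(project, job):
    # -run starts the Job; -jobstatus waits for completion and reflects
    # the Job's final status in the command's exit code.
    result = subprocess.run(["dsjob", "-run", "-jobstatus", project, job],
                            capture_output=True, text=True)
    print(result.stdout)
    return result.returncode

def show_log_summary(project, job):
    # -logsum prints a summary of the log entries for the latest run.
    subprocess.run(["dsjob", "-logsum", project, job])

if run_job("dstage1", "ExtractCustomers") != 0:
    show_log_summary("dstage1", "ExtractCustomers")
```

A scheduler can key off the exit code in the same way, re-running the Job or raising an alert when it is non-zero.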
Design Elements of Parallel Jobs
One of the nice things about having a visual medium like DataStage is that you can easily take the notions of Extract, Transform, and Load and represent them visually in a manner that is familiar to people. For example, you can drag an Extract Stage out onto the visual Canvas and drop it on the left of the screen, flow through transformations in the middle of the Canvas, and put outputs or Loading Stages on the right side. DataStage lets you put elements such as Stages and Links in any fashion that you wish, using an orderly, flow-chart-style, left-to-right, top-to-bottom visual design. This flow makes future interpretation of your Jobs simple. Those who are new to DataStage are able to look at, see, and understand the activities and flows that are going on within the DataStage Job. This also means that other Developers' Jobs will be intuitively understandable to you.

When we start designing a Parallel Job, we should think of the basic elements, which are Stages and Links. Stages, as we mentioned, get implemented 'under the covers' as OSH Operators (the pre-built components that you don't have to worry about and that will execute based on your Job design on the graphical Canvas). First, we have Passive Stages, which represent the 'E' and the 'L' of ETL (Extract and Load). These allow you to read data and write data. Passive Stages include things such as the Sequential File Stage, the Oracle Stage, and the Stage to read DB2. Other Passive Stages might include the MQ Series Stage, Peek Stages that allow you to debug and look at some of the Job's interim activity without having to land your data somewhere, and various other third-party component Stages such as SAP and Siebel Stages.
Processing or Active Stages are the 'T' of ETL (Transformation). They do the transformation and other 'heavy lifting' work. These include the Transformer Stage, which allows you to transform data. It provides a variety of functionality, including the ability to filter rows, manipulate column order, and perform a variety of other functions on your data to enrich it and potentially to validate and pre-cleanse it before applying some of the QualityStage Stages to it. It allows you to propagate the data that goes out in a number of ways and, in some cases, to individually modify each column of data that goes out. You will learn to rely heavily on the Transformer Stage. These are also assisted by specific Stages to do filtering, generating data (such as generating rows or columns), performing aggregation of the data (summations or counting of data, or finding first and last), splitting the data (using the ability to direct data down multiple paths simultaneously), merging the data (merging two streams of data together, or pieces from two streams, into one common stream), or pulling the data together, either by joining it into one common larger column or by funneling it into multiple rows within the same data.

Now that we've talked about some Stages, let's talk about the things that help the data flow from one Stage to the next. Links are the 'pipes' through which the data moves from Stage to Stage. Although the Stages are visually predominant on the Canvas, in actuality all the action is happening on the Links. DataStage is a link-based system. Your settings, including the metadata that you chose, are all kept with the Link. Therefore, when you modify Jobs, it's easy to move those pieces of data, because they (all the metadata and accompanying information) stay with the Links and move from one Stage type to another Stage very easily without losing data or having to re-enter it. When you open up a Stage by double-clicking it, you get various dialogue boxes that refer to input and output (input to that Stage and output from that Stage). These represent Links coming into and out of the Stage (when viewed from the Canvas). The settings that you define in these two areas within the Stage affect the Link coming in and the Link going out of the Stage.
Pipeline Parallelism
The purpose of parallelism is to create more efficiency and get more work out of your machine(s). It uses your processor more efficiently and thus maximizes the investment in your hardware. This is done in two ways, the first of which is Pipeline Parallelism. Here, each of the Operators (that correspond to the various Stages that we see on our Canvas) sets itself up within its own processing space and passes data from one to the other rapidly, without having to be re-instantiated continually and without having to do all the work in a given Stage before starting the next portion of the pipelining process. For example, this allows us to execute the transformation, cleansing, and loading processes simultaneously. You can think of it as a conveyor belt moving the rows from process to process. This reduces disk usage for the staging areas by not requiring you to land data as often. There are some limits on the scalability, depending on how much CPU processing horsepower is available and how much memory is available to store things dynamically.
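As a rough analogy only (plain Python, not DataStage or OSH code), the sketch below shows the conveyor-belt idea: each row moves through extract, transform, and load as soon as it is produced, instead of each step finishing over the whole dataset and landing it to disk first. The row fields used are invented for the example.

```python
# Conveyor-belt analogy for pipeline parallelism: rows flow stage-to-stage
# without the full dataset being staged between steps.

def extract(rows):
    for row in rows:                       # source stage emits rows one at a time
        yield row

def transform(rows):
    for row in rows:                       # processing stage works on rows as they arrive
        yield {**row, "total": row["qty"] * row["price"]}

def load(rows):
    for row in rows:                       # target stage consumes rows as they arrive
        print("writing", row)

source_rows = [{"qty": 2, "price": 5.0}, {"qty": 1, "price": 9.5}]
load(transform(extract(source_rows)))      # rows move down the pipeline like a conveyor belt
```

In DataStage the Operators genuinely run at the same time in separate processes; the generator chain above only mimics the row-at-a-time flow between them.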
Partition Parallelism
The second form of parallelism is called Partition Parallelism. This is more of a 'divide and conquer' approach in which we divide an incoming stream of data into subsets to be separately processed by an Operator. Each one of these subsets is called a partition. Each partition of data is processed by the same Operator. For instance, let's say that you want to do a filtering type of operation. Each partition will be filtered exactly the same way, but multiple filters will be set up in order to handle each of the partitions themselves. This facilitates near-linear scalability of your hardware. That means that we could run 8 times faster on 8 processors, 24 times faster on 24 processors, and so on. This assumes that the data is evenly distributed and that there are no other factors, such as network latency or other external issues, that would limit your data from getting out there.
Three-Node Partitioning
Data can be partitioned into multiple streams. The way in which we partition will depend upon how we need to process the data. The simplest of the methods is called Round Robin, in which we assign each incoming row to the next subsequent partition in order to create a balanced workload, like a card dealer in Vegas dealing out the cards in the deck to every player around the table, thus dealing all the cards evenly to each player. Other times, we need to segregate the data within the partitions based on some key values, whether they are numeric or alpha values. This is known as Hash partitioning (or, with numeric values, you can use Modulus partitioning), and in this way you keep the relevant data together by these common values. This is very important when we're doing things such as sorting the data or possibly filtering the data on these key values. Should the data become separated (the key value fields start going into different partitions), then, when the Operation occurs, it will not fully work correctly, as some of the data that should have been removed may be in another partition and thereby mistakenly 'kept alive' and moving down another stream in a parallel Operation. Then, at the other end of the partitioning process when all the data is collected, we still have multiple copies when there shouldn't be. If we had put them all correctly on one partition, then only one duplicate would survive, which was our desired outcome.
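The sketch below (again plain Python, purely illustrative) shows why this matters for a key-based operation such as removing duplicates: with Round Robin, rows sharing a key can land in different partitions and extra copies survive, while hash partitioning on that key keeps them together. The "cust" column is an invented example.

```python
# Illustrative comparison of Round Robin and Hash partitioning for a
# key-based operation (duplicate removal done independently per partition).

def round_robin(rows, nodes):
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):              # deal rows out like cards
        parts[i % nodes].append(row)
    return parts

def hash_partition(rows, nodes, key):
    parts = [[] for _ in range(nodes)]
    for row in rows:                            # same key value always lands in the same partition
        parts[hash(row[key]) % nodes].append(row)
    return parts

def dedupe_each_partition(parts, key):
    kept = []
    for part in parts:                          # each partition removes duplicates on its own
        seen = set()
        for row in part:
            if row[key] not in seen:
                seen.add(row[key])
                kept.append(row)
    return kept

rows = [{"cust": "A"}, {"cust": "A"}, {"cust": "B"}, {"cust": "A"}]
print(dedupe_each_partition(round_robin(rows, 3), "cust"))             # extra "A" rows can survive
print(dedupe_each_partition(hash_partition(rows, 3, "cust"), "cust"))  # only one "A" survives
```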
The kind of partitioning that you perform matters a lot! One of the reasons why Round Robin is the default is that if you can evenly distribute the data, it will be processed much more quickly. If you have to partition by range or by some type of hash partitioning scheme, the quantities of data in each partition may become imbalanced, and one or two of the partitions may finish long before the others and thus reduce performance.

Job Design Versus Execution
Let's look at an example that will give you an idea of what happens when we design the Job and then what happens when we run it. At the top of the graphic, you can see that the data is flowing from two Oracle Stages and is brought together inside of a Merge operation. From there, we take the merged data and we do an aggregation on it, and finally we send the data out to a DB2 database. In the background, let's say that we want to partition this data four ways (in DataStage terminology this would be known as 4 Nodes). Here's what we would see: for each of the Oracle Stages (toward the left of the data flow) and the other Stages in this design, we would create four instances of each to handle the four separate streams of data. In this kind of activity, when doing summation, it is extremely important that we partition or segregate our data and that we partition it in a way that will work with our design.
Here, we want to partition by the values by which we are going to do the summation. Let's say that we have 5 rows that need to come together for a sum. We want them all to be together in the same partition so that all the summation can be done together. This way, we don't accidentally sum on two different things and lose parts of our rows. Otherwise, we might get four values out of this phase (highlighted on the 4 instances of the Aggregation Stage), with only one of the set of partitions summing anything at all! Note that we will discuss these partitioning strategies in greater detail later in the tutorial.

Architectural Setup Options
Now that we've talked about the considerations that we must be aware of as we do our partitioning, both in the design of our Job and in the choice of partitioning strategy, let's talk about some architectural setup options. For now, though, let's continue our discussion in a different direction. This section of the tutorial will talk about some of the underpinnings of DataStage in general and specifically some of the architectural setups that we can use.
In one setup, we have deployed all of the Information Server DataStage components on a single machine. All of the components are hosted in one environment. This includes the WebSphere Domain, the Xmeta repository using DB2 (other databases can be used for the repository, such as Oracle or SQL Server), and the DataStage Server itself. The clients can be stored on the same machine as well if it is a Windows machine. What are some reasons for putting all of these components on a single machine? Perhaps that is all the resources that you have available, or perhaps you have very poor network performance and don't want the resulting latency going across the network wire to the repository. In other words, let's say that we are carrying a laptop around: it can contain all the co-located components (and it could still be connected to by a remote client).
Now let's talk about putting the components on two machines. A typical two-machine setup would consist of the Metadata repository and the Domain layer on Machine A and the DataStage Server on Machine B, with the ability to connect to Machine B remotely with the clients. One of the benefits of this separation (having two machines) is that the metadata repository on Machine A can be utilized (by a product such as Information Analyzer) concurrently without impacting the performance of DataStage as it runs on Machine B.
In a third scenario, major components are separated onto different machines, connected to by remote clients. The database repository is on Machine A (maybe because you are using a machine that already has your database farms installed on it). The Metadata Server backbone is installed on Machine B so that Information Analysis and Business Glossary activities don't risk negatively impacting any development on the DataStage Server: several DataStage development teams can keep working and utilizing their resources most effectively. And you have the DataStage Server on Machine C. Finally, in this last section we talked about some different options; it should be said that you must determine the optimal setup in your environment depending on your own conditions and requirements.

Setting up a new DataStage environment
In this next section of the tutorial, we are going to talk about DataStage's architecture 'under the cover' and some of the ways that we can set up the product. We'll also talk about setting up the DataStage engine itself for the first time using the Administrator client and see how to then set up DataStage Projects. Next we'll look at Jobs within Projects and talk about a number of important development objects such as Passive DataStage Stages. Within a Project, we'll see how to import and export table definitions into that Project.
For example, within InfoSphere, we'll talk about the Data Set Stage, the Sequential File Stage, and various relational data Stages. We'll also talk about some Active Stages, which are important when we are trying to combine and/or manipulate data. These include the Lookup Stage, the Join Stage, the Merge Stage, and the Transformer Stage (at the heart of many Jobs). Then we'll talk about some key concepts that you'll want to know about as you are developing Jobs, such as the use of Runtime Column Propagation (RCP). Then we'll talk about ways of implementing and deploying our DataStage application that can make things easier on you and the team. These might include creating Parameter Sets, creating Shared Containers for re-usability, putting together Sequencers to organize all of your Jobs, using various Run options, and using the Command Line Interface (CLI), through which Jobs can be called from a third-party scheduler. Finally, we'll discuss parallelism more in depth. Specifically, we'll talk in more detail about the two forms of parallelism (Pipeline parallelism and Partitioning parallelism). We'll see how these things work and then what happens when the parallel Job gets compiled. We'll talk about where we perform parallelism; this may be within Sequencers or at the Job level, and we will talk about strategies for different ways to apply parallelism. Of course, we'll discuss how to apply these concepts and scale our previously constructed DataStage application up or down depending on the resources that we have available to us.
Setting up the DataStage Engine for the First Time
When we're setting up a new DataStage engine, the first thing is to go into the Administrator client and make sure that some of the basics have been taken care of. When we log on, we are presented with the Attach to DataStage dialog box shown above. We can see that it shows the Domain to which we want to connect, with its corresponding port number for the application server; the user ID and password for the DataStage Administrator with which we will connect; and the DataStage Server name or its IP address. Often, you will find that the Server's name is the same as the beginning portion of the Domain name (but this does not have to be the case; it depends on the architecture and name of the computer and how it is set up in your company's Domain Name Server (DNS)).

DataStage Administrator Projects Tab
Within the Administrator client, we have a tab entitled Projects. We use this for each of the individual Projects that we want to set up here within DataStage. The Command button allows you to issue native DataStage engine commands when necessary, such as the list.readu command, which would show you any locks that are still active in DataStage (as well as any other commands associated with the
DataStage engine or with the Universe engine, which was the predecessor to the DataStage engine). Below that, you can see the Project pathname box. This shows us where the Project directory is located. This is extremely important, as you must know where the Project is located on the machine (it is located on the machine on which the DataStage Server has been installed). Let's take a look at the Properties button. The Properties button brings up the dialog box shown above. It shows us the settings for the selected Project (datastage1 shown in the graphic). Various tabs across the top each have their own options. One thing that you'll probably want to do is to check the first option. This will enable job administration in Director. Enabling job administration in Director will allow developers to do things such as stopping their jobs, clearing job information, re-setting their jobs, and so on. One of the times that you may NOT want to do this is in a production environment. Otherwise, you do typically want to give people the ability to administer Jobs in the Director client as they see fit.
Next is the option to enable RCP for parallel Jobs. We'll cover this more in depth a little later in the tutorial, but just be aware that this is where you enable it for the datastage1 (each) Project. The 'auto-purge of the job log' option is very important. You always want to set up default purge options so that the logs don't fill up and consume large disk space on your machine (the machine that the DataStage Server is installed on, not necessarily the machine that houses your clients like the Administrator client). You have your choice of purging them based on the number of runs or by the number of days old. Some strategies include setting it to the 5 previous runs within your development and test environment. In your production environment, which may go for long periods of time between each run of a particular Job, you may want to set it to 'every 30 days' where you plan on running your Jobs on a daily basis.

The 'Enable editing of internal references in jobs' option will help you when dealing with lineage and impact analysis (if you change Jobs or objects over time). The 'Share metadata when importing from Connectors' option, again, allows us to perform cross-application impact analysis. The 'Generate operational metadata' checkbox will allow the data, as it is being generated, in particular the row counts, to be captured for further analysis and understanding of data in the 'operational' form. Metadata is usually understood as descriptive (How big is a field? What type of data is it?), but operational metadata tells us things like "How frequently has it been run?" and "How much volume has gone through it?". This can be important later as you determine your scalability and future considerations such as capacity planning.

Next is a button that allows you to protect a Project. Since most people want to make changes regularly during development work, it's not very useful for development environments. It is useful in production, where you don't want people to make modifications to Jobs.

The Environment button allows you to set global environment variables for your entire Project. These will be Project-specific; Project-specific Environment variables are only for this one particular Project. These will be separate from your overall DataStage Environment variables, which are global to the entire Server.
When you click on the Environment button, it opens the Environment variables dialog box. Here, there are two panes in the window. On the left, we have our categories of environmental variables, such as General, which includes Parallel environmental variables; the other category is User Defined. Under each category area, the corresponding pane on the right displays the name of the environmental variable to be used, its prompt, and then the value that will be used for the entire DataStage Project.
Environment Reporting Variables
A standard category is for environmental variables for reporting, such as APT_DUMP_SCORE, APT_MSG_FILELINE, and APT_NO_JOBMON. By setting these here, you can have all the developers on the Project use them in a standardized way. However, each of these variables can also be set individually in each Job at runtime. That is to say, you can have just one Job contain a different value from the other uses of that same environmental variable throughout the rest of the Project.
Information Server 8.com .1 Introduction Business Intelli Solutions Inc www.businessintelli.
DataStage Administrator Permissions Tab
The Permissions tab is where we set up users and groups for a variety of authorizations. We would add a user and then assign a product role. Once we create a new user, we can see in the User Role drop-down menu that there are several roles available for us to assign to that user or group. These consist of Operator, Super Operator, Developer, and Production Manager. The "DataStage Operator to view the full log" checkbox is typically used unless you have a particular reason not to, such as sensitive data or Operators being overwhelmed by too much information. Typically, we allow Operators to view the full log so that if they encounter problems during a DataStage Job, they can easily communicate that information to their second line of support when escalating a problem. Let's continue our discussion by moving over to the Tracing tab. This allows us to perform engine traces. Normally, you will not use this unless you are working with IBM support while experiencing a difficult problem that cannot be solved any other way.
The Parallel tab contains some options. The checkbox at the top allows you to see, from within your Job properties box, the Orchestrate Shell (OSH) scripting language that has been generated. It creates a tab entitled OSH. Further down, there are some advanced options for doing things such as creating format defaults for various data types used in the parallel framework for this Project.
Under the tab entitled Sequence, we can add checkpoints so that the Job sequences are restartable upon any failure. We can automatically handle activities that fail, log warnings after activities that finish with a status other than ‘OK’, and log report messages after each Job run.
Work on Projects and Jobs
Once we get our engine configured and set our basic Project properties, the next thing that we will want to be able to do is to work on a new Project. If we generate all these DataStage Jobs from scratch, we won't be re-using anything from previous Projects. However, oftentimes it makes sense to leverage work that we have already done. We may want to import things from a previous Project. These include program objects such as Jobs, Sequences, and Shared Containers; metadata objects (for instance, Table Definitions); Data Connections; things used globally at runtime, such as Parameter Sets; and other objects, such as routines and transforms, that we want to bring along and re-use from a previous Project.
Exporting and Importing Objects
Export Window
In order to make this happen, the first thing that we need to do is to export it from one environment. Then we will be able to import it. Here in the Export window, we can highlight certain objects and then add them to our export set. One important thing to consider is the 'Job components to export' drop-down menu shown highlighted. This will allow you to export the job designs with executables, where applicable. Let's say that we are promoting all of these objects to a production environment that does not have its own compiler. You will then need to have these pre-compiled Jobs ready to run when they reach their target. You can opt to exclude 'read only' items. This is a typical default.
Next, you would choose which file you would like to export it into. This is a text file that will ‘live’ on your client machine. All of the export functionality is done through client components. Once we have identified all the objects that we want to export, we simply click the Export button to begin the export.
We export either into an XML file or the native DSX file. Once this is accomplished, then we can begin our import. For the import, we would use the DataStage Designer client, click its import option and select the objects to import from the flat file into our new environment. Let's take a look at that dialog box.
This file that we are importing may serve as a backup in that it allows us to import object-by-object. Our export is exactly like a database export insofar as it allows us to pull out one object from the export at a time. Should someone accidentally delete something (in the new environment), instead of losing the entire box, we only lose one object that can then be
restored without losing any of the other work that you've done in the meantime, by importing it from the export file. On the Import screen, you will need to identify the export file. You then have a choice to 'Import all' or 'Import selected'. In particular, you can choose to 'Overwrite without a query' if you know that everything that you want is the most current. You can also choose to perform an 'Impact analysis' to see what objects are related to each other.

Choose Stages
Now that we've talked about setting up our Project, let's now focus on building Jobs. Once you have set things up and are ready to begin developing Jobs, let's talk about the DataStage Stages that we see on our Canvas. These are some very important and key components that we will use during our development. There are two classes of Stages: Passive Stages and Active Stages. The Active Stages are continuous-flow Stages that go in the middle of the data flow. Active Stages are the things that actually perform on the data: as rows are coming in, they immediately go out. The Active Stages manipulate the data. For example, a Transformer Stage (which has many functions), the Filter Stage, and the Modify Stage pass data along as they receive it, unlike the Passive Stages, which only either input data or only output data. So the Passive Stages are beginning points or ending points in the data flow. Most notable of the Active type is the Transformer Stage. We'll get to this Stage a little later. First, let's cover the Passive Stages.

Passive Stages
Passive Stages are typically used for 'data in' and 'data out', such as working with sequential files, databases, and Data Sets (Data Sets are the DataStage proprietary high-speed data format).
Types of File Data
Let's describe the types of file data that we'll be dealing with as we begin to develop Jobs. Primarily this falls into three categories: sequential file data (of either fixed or variable length); Data Sets, which are the DataStage proprietary format for high-speed access; and complex flat file data, which is most notably used with older COBOL-type structures. Complex flat files allow us to look at complicated data structures embedded inside of files and are used in proprietary formats, much the way that COBOL does.

How Sequential Data is Handled
When we are using our Sequential File Stage, it is important to know that DataStage has an import/export operator working on the data (not to be confused with importing and exporting objects, components, and metadata). This Sequential File Stage's import and export is about taking the data from its native format, parsing it, and putting it into an internal structure (within DataStage) so that it can be used in the Job, and then, likewise, on the way out (when writing it), taking that internal structure and writing it back out into a flat file format.
After the Job has been run, we could look in the log and see messages relating to things such as 'How many records were imported successfully' and 'How many were rejected'. When records get rejected, it is because they cannot be converted correctly during the import or the export. Later, we'll see how we can push those rejected records out.
The Stage needs to 'know' how the file is divided into rows, and it needs to 'know' what the record format is (e.g. how the row is divided into columns). Typically, a field of data within a record is either delimited or in a fixed position. We can use record delimiters (such as the new-line character) and column delimiters (such as the comma). Normally, this Stage is going to execute in sequential mode, but you can use one of the features of the Sequential File Stage to read multiple files at the same time. These will execute in parallel when reading multiple files. If we set up multiple readers, we can read chunks of a single file in parallel.
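For instance (a purely illustrative record layout, not one taken from the course data), with a comma as the column delimiter and the new-line character as the record delimiter, each line of the file is one record and each comma marks a column boundary:

    1001,Kitchen Appliances,RETAIL
    1002,Garden Tools,WHOLESALE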
Job Design Using Sequential Stages
The graphic above shows how we can read data from the Selling_Group_Mapping file, send it into our Copy Stage, and load it into the target flat file. However, if anything is not read correctly by the Selling_Group_Mapping link, it will be sent down the dashed reject link entitled Source_Rejects and, in this case, read by a Peek Stage. Over on the right, if anything is not written correctly by the target, it will be sent out on the Target_File_Target_Rejects link to the TargetRejects Peek Stage. To configure the Sequential File Stage, we would double-click it.
Sequential Source Columns Tab
Then, within the Sequential File Stage, on the Output tab, under the Columns sub-tab, we would configure our input file column definitions. The column metadata looks like the metadata that we have elsewhere. In other words, all of the Column Names, Keys, SQL Types, Lengths, Scales, etc. under the Columns sub-tab look like all of the other metadata that we've seen for this file. Next, let's say that we clicked on the sub-tab entitled 'Properties'.
Under the Properties tab, we are presented with some options. These options help us define how we will read the file. For instance, there is the file's name, its read method, and other options such as whether or not the first line is a column header, whether or not we will reject unreadable rows, etc.
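As a rough illustration only (the file path is invented, the exact property names and defaults vary by release, and some of the format settings shown here actually live on the Format tab rather than the Properties tab), a source Sequential File Stage might end up configured along these lines:

    File                       = /data/acc/Selling_Group_Mapping.txt
    Read Method                = Specific File(s)
    First Line is Column Names = True
    Reject Mode                = Output      (send unreadable rows down a reject link)
    Field Delimiter            = comma
    Record Delimiter           = new-line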
Data Sets
The other primary file type that we use is the Data Set. This kind of file stores data in a binary format that is not readable without a DataStage viewer. Data Sets are very important to our parallel operations in that they work within the parallel framework and use all the same terminology, and they are therefore accessed more easily. A Data Set preserves the partitioning that we established during our Jobs. This way, the data contained in the Data Set is already partitioned and ready to go into the next parallel Job in the proper format.
We use a Data Set as an intermediate point to land data that does not need to be interchanged with any application outside of DataStage: it will only be used within DataStage Jobs, going from one Job to the next (or just within a Job). It is a proprietary DataStage file format that can be used within the InfoSphere world for intermediate staging of data where we do not want to use a database or a sequential file. It is not a useful form of data storage for sending to other people as an FTP file, or for EDI, or for any of the other normal forms of data interchange.
Data Set Management Utility
You may need to look at a Data Set to verify that it looks 'right' or to find specific values as a part of your testing. You can look at a Data Set using the Designer client: there is a tool option that allows you to do Data Set management. It lets you first seek out the file that you want. It then opens a screen that shows you how the file is organized: it shows the number of partitions and nodes, and it shows how the data is balanced between them. And, if you would like, it also includes a display option for your data. Let's look at this display.
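On systems where you have command-line access to the engine, similar information can be pulled with the orchadmin utility that ships with the parallel engine. Treat the following as a hedged sketch rather than a definitive reference: the Data Set name is invented and the exact options can vary by release.

    # show the schema plus the partition and node layout of the Data Set
    orchadmin describe -p /data/acc/selling_group.ds

    # print the first ten records in readable form
    orchadmin dump -n 10 /data/acc/selling_group.ds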
Data and Schema Displayed
A Data Set cannot be read like a typical sequential file by native tools such as the vi editor or the Notepad editor in Windows. When looking at the Data Viewer, we will see data in the normal tabular format, just as we view all other DataStage data using the View command.
Connect to Databases
Now that we've talked about some of the basic Passive Stages for accessing file data, we will take a look at some of the basic Database Stages that allow us to connect to a variety of relational databases. Before we can access our databases, we will need to talk a little bit about working with relational data. First, we need to know how to import relational data. To do this, we will use the Table Definition Import utility within the Designer client, or you can use orchdbutil. The orchdbutil utility is the preferred method for getting correct type conversions. However, in any situation, you need to verify that your type conversions have come through correctly and will suit the downstream needs (they must suit the needs of subsequent Stages and other activities within DataStage). You will also need to work with Data Connection objects. Data Connection objects store all of our database connection information in one single named object so that, when we go into our other Stages in Jobs, we won't need to include every piece of detailed information such as the user ID and password of the various databases that we're connecting to.
Next, we need to see what Stages are available to access the relational data. We just talked briefly about Connector Stages. They provide parallel support, are the most functional, and provide a consistent GUI and functionality across all the relational data types. Then, we have the Enterprise Stages. They provide parallel support as well with the parallel extender set. There are also things called Plug-in Stages: these are the oldest family of connectivity Stages for databases and other relational sources. They are the legacy Stages from previous versions of DataStage, and they were ported to the current version of Information Server in order to support DataStage Server Jobs and their functionality.

With DataStage, we have the ability to use SELECT statements as we would in any database; this gives you the ability to select the data you want. When you have one of these Stages, you get a feature called the SQL Builder utility that quickly builds up such statements, allowing you to get the data that you desire. When we're writing the data, DataStage has similar functionality that allows you to build INSERT, UPDATE, and DELETE statements, also using the SQL Builder.
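The statements that the SQL Builder produces are ordinary SQL. As a hedged example (the table and column names below are invented and are not taken from the course database), a read Stage might end up with a SELECT such as:

    SELECT SELLING_GROUP_CODE,
           SELLING_GROUP_DESC,
           DISTR_CHANNEL_CODE
    FROM   SELLING_GROUP_MAPPING
    WHERE  DISTR_CHANNEL_CODE = 'RETAIL';

Write Stages build INSERT, UPDATE, and DELETE statements of the same general shape, with the incoming link columns supplying the values.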
Information Server 8. From within the Designer client.1 Introduction Import Table Definitions The first thing that we want to do is to import our table definitions. Alternatively. To import these table definitions. Business Intelli Solutions Inc www.businessintelli. you could choose to import table definitions using the ODBC option. Orchestrate schema imports are better because the data types tend to be more accurate. which can then later be translated into pulling that same data out from corresponding databases. Likewise you can use the Plug-In table definitions or other sources from which you can import metadata including legacy information such as COBOL files. you simply click on Import>Table Definitions>Orchestrate Schema Definitions. we can use either ODBC or the Orchestrate schema definitions.com .
Orchestrate Schema Import
The Orchestrate Schema Import utility does require certain bits of information, including the database type, the server on which it is hosted, the name of the database from which you are pulling the data, the username, and the password.

ODBC Import
When we're using the ODBC Import, first we select the ODBC data source name (DSN). This will need to have been set up for you by your System Administrator before you use the screen shown above (otherwise, all the entries will be blank). Many DSNs will need a username and password, although some may come pre-configured.
Connector Stage Types
There are several Connector Stage types. The first is ODBC, which conforms to the ODBC 3.5 standard, is level 3 compliant, and is certified for use with Oracle, DB2 UDB, SQL Server, and various other databases. This also includes the suite of DataDirect drivers, which allow you to connect to these various data sources and give you unlimited connections. The next connector type is the DB2 UDB Stage, which is useful for DB2 versions 8.1 and 8.2. There is also a connector for WebSphere MQ to allow us to connect to MQ Series queues; it can be used with MQ 5.3 and 6.0 for client/server, and there is also WSMB 5.0. The last Connector Stage type is for Teradata, which gives us fast access to Teradata databases.
Connector Stages
Now that we've done some of the basic setup for our relational data, we want to be able to use it on our Canvas. You can see, in the graphic above, that an ODBC Connector has been dragged and dropped on the left. Next, we would want to configure it so that we can then pull our data from an ODBC source. Let's take a look inside of the Stage.
Stage Editor
Inside of the Connector Stage, you can see a Navigator panel, an area in which we can see the link properties, and then, below that, the various other properties of the connection itself. Once we have chosen the connector for our ODBC Connector Stage, we will then want to configure an SQL statement that will allow us to pull data. We can use the SQL Builder tool highlighted on the right, or we can manually enter our own SQL. Alternatively, we can use SQL that is file-based.
Connector Stage Properties
When configuring the Connector Stage, we should have our connection information and know what SQL we will be using. This may include any transaction and session management information that we entered (this would be our record count and commitment control, shown highlighted). Additionally, we can enter any Before SQL commands or After SQL commands that may need to occur, such as dropping indexes, re-creating indexes, removing constraints, or various other directives to the database.
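As a hedged illustration of Before/After SQL (the table and index names are invented), a bulk load might drop an index before the rows are written and rebuild it afterwards:

    -- Before SQL: drop the index so the load runs faster
    DROP INDEX SGM_CHANNEL_IDX;

    -- After SQL: rebuild the index once the load has committed
    CREATE INDEX SGM_CHANNEL_IDX
        ON SELLING_GROUP_MAPPING (DISTR_CHANNEL_CODE);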
Building a Query Using SQL Builder
When you build a query using the SQL Builder utility, there are certain things that you'll need to know: be sure that you are using the proper table definition, and be sure that the Locator tab information is specified fully and correctly. You can drag the table definition onto the SQL Builder Canvas, and you can drag on the columns that you want to select.
Data Connection
Data Connections are objects in DataStage that allow us to store all of the characteristics and important information that we need to connect to our various data sources. The Data Connection stores the database parameters and values as a named object in the DataStage repository. It is associated with a Stage type. The property values can be specified in a Job Stage of the given type by loading the Data Connection into that Stage.
Creating a New Data Connection Object
You can see the icon that represents the Data Connection highlighted above.
Select the Stage Type
Inside the Data Connection, we can select the Stage type that we're interested in using and with which we want to associate this object. Then we specify any parameters that are needed by that particular database. Obviously, different databases have different required parameters, so the selection of the Stage type is important.
Loading the Data Connection
Once we have built our Data Connection, we will want to load the Data Connection into the Stage that we have selected.

Active Stages: Lookup, Merge, and Join Stages
Aside from Passive Stages, our other main classification of Stages is the Active Stage. Some of the Active Stages that we'll be looking at provide functionality such as data combination, the transformation of data, modification of metadata, filtering of data, and a number of other things for which there are corresponding Active Stages.
Data Combination
First, let's talk about data combination. For this, the three Stages that we use are the Lookup Stage, the Merge Stage, and the Join Stage. These Stages combine two or more input links. They differ, mainly, in the way that they use memory. They also differ in how they treat rows of data when there are unmatched key values. Some of these Stages also have input requirements, such as needing to sort the data or de-duplicate it prior to its combination.

The Lookup Stage
Some of the features of the Lookup Stage include its requirement for one primary input link. You can have multiple reference links. It can have only one primary output link (and an optional reject link if that option is selected).
Lookup Failure Actions
The Lookup Stage is limited to one output link; however, with its lookup failure options, you can include a reject link. The lookup failure options include continuing with the row of data through the process (passing the data through with a null value for what should have been looked up), dropping the row, or simply failing the Job (which will abort the Job).
The Lookup Stage builds a hash table in memory from the Lookup file(s). This data is indexed by using a hash key, which gives it a high-speed lookup capability. You should make sure that the Lookup data is small enough to fit into physical memory, so you must be careful how you use it. The Lookup Stage can also return multiple matching rows.
Lookup Types
There are different types of Lookups, depending on whether we want to do an equality match, a caseless match, or a range lookup on the reference link.
Lookup Example
Let's see how a Lookup Stage might typically fit into a Job. We have a primary input coming in from the left, and the data flows into the Lookup Stage. Coming in from the top is our Reference data. Notice that the reference link is dashed (a string of broken lines). Then, coming from our Lookup Stage, is the output link going to the target. As you are dragging objects onto the Canvas and building the Job, always draw the Primary link before the Reference link. What would happen if we clicked on the Lookup Stage to see how it's configured? Let's take a look.
Lookup Stage With an Equality Match
Within the Lookup Stage, there is a multi-pane window. On the top of the screen is more of a graphical representation of each row. Everything on the left represents the data coming into the Lookup Stage, both the Primary link and the Reference link(s). Everything on the right is coming out of the Lookup Stage. Toward the bottom of the screen is a more metadata-oriented view in which you can see each of the links and their respective characteristics. Notice that the metadata references (shown highlighted at lower left) look like the table definitions that we used earlier to pull metadata into the DataStage Project. This same metaphor applies to most other Stages as well.
Highlighted toward the top is the place where you connect column(s) from the input area (highlight on the Item input link) by dragging them down, dropping them onto the corresponding Reference link, and connecting them to the equivalent fields there (highlight on Warehouse Item in the lower blue area). In this case, we are using a single column to define an equality: where a 'Warehouse Item' is equal to an 'Item'.
Next, we can move all the columns from the input area on the left to the output link area on the right, but replace the Item description (highlight on the right box) with the Warehouse Item description (the one that we found in the Reference link). So, we are taking all of our input columns and adding a new column based on what we looked up on the Warehouse Item ID.
When you click on the icon with the golden chains (shown highlighted), it gives you a new dialog screen.
Specifying Lookup Failure Actions
This dialog box shows us the constraints that we can use. Here we can determine what actions to take should our lookup fail. You can continue the row, drop the row, fail the row, or reject it. Also, if you use a condition during your lookup (highlight on Fail under 'Condition Not Met'), you can use these same options of continue, drop, fail, or reject for when your condition is not met.
Join Stage
The Join Stage involves the combination of data. As with the Lookup Stage, and much like a join in an SQL query, in the Join Stage we use columns to pull our data together. The Join Stage uses much less memory: it is known as a 'light-weight' Stage. The data doesn't need to be indexed; instead, it must come into the Stage in a sorted fashion. The input links must be sorted before they come into a Join Stage. We must identify a left and a right link as they come in, and this Stage supports additional 'intermediate' links. There are four types of joins: inner, left outer, right outer, and full outer.
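For readers who think in SQL terms, those four join types correspond to the familiar SQL forms. The tables and columns below are invented purely for the analogy; the Join Stage itself is configured through its Properties editor, not by typing SQL:

    -- inner join: only rows with matching keys on both links survive
    SELECT i.item_id, i.item_desc, w.warehouse_desc
    FROM   items i
    INNER JOIN warehouse_items w ON w.item_id = i.item_id;

    -- left outer join: every left-link row survives, with NULLs where no match exists
    SELECT i.item_id, i.item_desc, w.warehouse_desc
    FROM   items i
    LEFT OUTER JOIN warehouse_items w ON w.item_id = i.item_id;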
Job with Join Stage
Above is an example of a Job that uses the Join Stage. Notice the green Join Stage icon in the middle. Similar to the Lookup Stage, we have a primary input, a secondary input, and an output. Remember that the reference link that we saw earlier did have a dashed line. But, in this case, notice that the right (outer) link input is not a dashed line (highlight on the upper link).
Join Stage Editor
When we get inside the Join Stage, we will use the Properties editor. Similar editors are found in other Stages. Here, we will set the attributes necessary to create the join. We will specify the join type that we want and what kind of key we will be joining on.
The Merge Stage
The Merge Stage is quite similar to the Join Stage. Much like the Join Stage, its input links must be sorted. It is a 'light-weight' Stage in that it uses little memory (this is because we expect the data to already be sorted before coming into the Stage, and therefore there are no keys [indexes] held in memory). Instead of a left and a right link, we have a 'Master' link and one or more Secondary links. Unmatched Master rows can be kept or dropped; keeping them gives the effect of a left-outer join. Unmatched Secondary rows can be captured on a Reject link and dispositioned accordingly.
When the data is not 'pre-sorted', the sorting can be done by using an explicit Sort Stage, or you can use an on-link sort. If you have an on-link sort, then you will see its icon on the Link (the icon is shown in the lower-left of the graphic). The icon toward the right of each link above is the partitioning icon (partitioning the data out and then collecting the partitioned data back up); it looks like a fan.
Coming out of the Merge Stage, we have a solid line for the main output and a dashed line for the rejects. The latter will capture the incoming rows that were not matched in the merge.
Merge Stage Properties
Inside the Merge Stage, you must specify the attributes by which you will pull together the two sets of data. This is another way to combine data: it will create more columns within one row and, in some cases, produce multiple rows. Also, you must specify what to do with the unmatched master records, whether or not to warn on unmatched masters, and whether or not to warn on rejected updates.
The Funnel Stage
The Funnel Stage provides a way to bring many rows from different input links into one common output link. The important thing here is that all sources must have identical metadata! If your links do not have this, then they must first go through some kind of Transformer Stage or Modify Stage in order to make all of their metadata match before coming into the Funnel Stage.
The Funnel Stage works in three modes. The Continuous mode is where records are combined in no particular order: the first one coming into the Funnel is the first one put out. The Sort mode is where we combine the input records in an order defined by keys; this produces a sorted output if all input links are sorted by the same key. The Sequence mode outputs all of the records from the first input link, then all of the records from the second input link, and so on (based on the number of links that you have coming into your Funnel Stage).
Funnel Stage Example
Above, you can see an example of the Funnel Stage's use.

Let's continue our discussion of Active Stages. Here, we'll briefly talk about the Sort Stage and the Aggregator Stage. Obviously, the Sort Stage is used to sort data that requires sorting. As mentioned, sorts can also be done on the input link to a given Stage, as was the case with the Join Stage and the Merge Stage. This on-link sort is configured within a given Stage on its Input link's Partitioning tab: just set the partitioning to anything other than Auto.
Alternatively, you can have a separate Sort Stage. The advantage of on-link sorting (within the Stage) is that it is quick and easy and can often be done in conjunction with any partitioning that you'll be doing. One of the advantages of the Sort Stage is that it gives you more options for controlling memory usage during the sort.
Information Server 8. Below.1 Introduction Sorting Alternatives The Sort Stage is highlighted above. In this example.com . we have sequential data coming into a Sort Stage before it is moved to a Remove_Duplicates Stage and then sent out to a Data Set. we have a different example in which file that is being sorted directly into a Data Set but in this case the sort is happening directly on the link (on-link sort).businessintelli. Business Intelli Solutions Inc www.
Our next Active Stage is the Aggregator Stage. The purpose of this Stage is to perform data aggregations. When using it, you specify one or more key columns, which define the aggregation units (or groups). The columns to be aggregated must also be specified. Next, you specify the aggregation functions. Some examples of functions are counting values (such as nulls/non-nulls), summing values, or determining the maximum, the minimum, or ranges of values that you are looking for. The grouping method (hash table or pre-sort) is often a performance issue. If you don't want to take the time to sort your data prior to aggregation, using the default hash option will allow you to pull the data in without any sorting ahead of time. If your data is already sorted, then this is not necessary.
businessintelli. Data flows from a Data Set (at the top of the data flow) and is copied two ways.com .1 Introduction Job with Aggregator Stage In this Job we can see the Aggregator Stage’s icon. One way (to the right) is sent to a Join Stage.Information Server 8. The Aggregator Stage uses a Greek Sigma character as a symbol for summation. Business Intelli Solutions Inc www. Another way (straight down) is sent to an Aggregator Stage to aggregate the row counts and then (flowing from the Aggregate to the Join) is passed back into the same Join so that each row that came into this Job can also have. attached to it. the total number of rows that are being counted.
Remove Duplicates Stage
Our next Active Stage is the Remove Duplicates Stage. The process of removing duplicates can be accomplished using the Sort Stage with the Unique option. Use of the Unique option provides for a stable sort, which always retains the first row in the group, or a non-stable sort, in which it is indeterminate as to which row will be kept. However, this leaves us with no choice as to which duplicate to keep. The alternative for removing duplicates is to use the Remove Duplicates Stage. This tactic gives you more sophisticated ways to remove duplicates. In particular, it lets you choose whether you want to retain the first or the last of the duplicates in the group.
This sample Job shows us an example of the Remove Duplicates Stage, and we can see what the icon looks like. From the Sort Stage (toward the left of the graphic), the data comes into a Copy Stage toward the bottom, which then runs the sorted data up into the Remove Duplicates Stage. Then the data is sent out to a Data Set.

An Important Active Stage: The Transformer Stage
One of the most important Active Stages is the Transformer Stage. It provides for four things: Column Mappings, Derivations, Constraints, and Expressions to be referenced.
It is one of the most powerful Stages. With column mappings, we can change the metadata, its layout, and its content. Also within the Transformer Stage are derivations. There are a number of out-of-the-box derivations that you can use, allowing you to do things such as string manipulation, concatenation, character detection, and other activities. You can also write your own custom derivations and save them in your Project for use in many Jobs. These are written in BASIC-like code, and the final compiled code is C++ generated object code. Other features of the Transformer include Constraints. A Constraint allows you to filter data, much like the Filter Stage does. You can direct data down different Output links and process it differently, or process different forms of the data. The Transformer's use of expressions is for constraints and/or derivations to reference. We can use the input columns that are coming into the Transformer to determine how we want to either derive or constrain our data. We can use system variables, constants, Job parameters, and/or functions (whether they be the native ones provided with DataStage or ones that you have custom-created), as well as Stage variables (local in their scope to just the Stage, as opposed to being global for the whole Job), and we can also use external routines within our Transformer.
Job with a Transformer Stage
The Transformer Stage has an icon that looks like a T shape with an arrow. In the example Job above, we have the data coming into the Transformer. There is a reject link coming out of the Transformer (handled with the Otherwise option) and two other links coming out of the Transformer as well, to populate two different files.
Inside the Transformer Stage
Inside a Transformer Stage, we have a look-and-feel similar to that of the Lookup Stage that we talked about earlier, in that we have an Input link (highlighted on the left) and all of our Output links on the right. The top area (highlight on the upper-most blue box on the right) allows us to create Stage variables, which we can use to make local calculations in the process of doing all of our other derivations. The two output links (the two boxes on the right) give us a graphical view, specify the column names, and also give us places to enter any derivations. Again, as with the Lookup Stage, the bottom half of the screen resembles the table definitions that you will see inside of your repository (highlight on the center lower half). This gives you very detailed information about the metadata for each column.
(Yellow highlight on the Golden-Chain button in the toolbar at the top.) When you click the Golden-Chain button, it brings you to the section that allows you to define Constraints.
Defining a Constraint
You can type your constraint into the highlighted area, which also uses BASIC terminology. Here you can create an expression that will let you put certain data into one link, put it into a different link, or not put it into a link at all. When you right-click in any of the white space, DataStage brings up options for your Expression Builder (highlighted toward the left).
Alternatively, you can also use the ellipsis button to bring up the same menu. In the example above, the expression shows an UpCase function and a Job parameter named Channel_Description. At the beginning of the expression, we are using an input column; we compare our input column to an upper-cased version of one of our Job parameters.
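As a rough sketch of what such a constraint expression might look like (the link and column names are assumed for illustration and may not match the course Job exactly):

    Selling_Group_Mapping.DISTR_CHANNEL_DESC = UpCase(Channel_Description)

Rows for which the expression evaluates to true are sent down that output link; rows that satisfy none of the constraints can be caught by the Otherwise/reject link.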
Defining a Derivation
When we are defining or building a derivation, we also use the BASIC-like code to create expressions within a particular cell. These derivations can occur within the Stage Variable area or down within any one of the Links. As with the constraints, a right-click or a click on the ellipsis button brings up the context-sensitive code menu to help you build your expression.
If Then Else Derivation
Using our Transformer Stage, we can also build up 'if, then, else' derivations in which we can put conditional logic that will affect the data on our output link.

String Functions and Operators
Some things included in the Transformer Stage's functionality are string functions and operators. The Transformer can use a Substring operator, it can Upcase/Downcase, and/or it can find the length of strings. It provides a variety of other string functions as well, such as string substitution and finding positional information.
Checking for NULLs
Other out-of-the-box functionality of the Transformer includes checking for NULLs. DataStage can identify whether or not there are NULLs, and it can do various testing, setting, or replacing of NULLs as we see fit, assisting us in how we want to handle them.
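To make these ideas concrete, here are two hedged, illustrative derivations of the kind you might type into an output column cell. The link and column names are invented and are not taken from the course Job:

    If Len(Trim(lnk_in.PRODUCT_CODE)) = 0 Then 'UNKNOWN' Else UpCase(lnk_in.PRODUCT_CODE)

    If IsNull(lnk_in.DISTR_CHANNEL_DESC) Then 'NOT SUPPLIED' Else lnk_in.DISTR_CHANNEL_DESC

The first combines string functions (Trim, Len, UpCase) with if-then-else logic; the second tests for a NULL and substitutes a default value (NullToValue is another commonly used null-handling function).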
Transformer Functions
Other functions of the Transformer include those for Date and Time, Strings, Numbers, logic, NULL handling, and those for type conversion.

Transformer Execution Order
It is important to know that, when a Transformer is executed, there is a very specific order of execution:
1. Derivations in Stage variables (from top to bottom)
2. Constraints (from top to bottom)
3. Column derivations (from top to bottom)
The first things that are executed in the Transformer for each row are the derivations in the Stage Variables. The second things that are executed are the constraints for each link that is going out of the Transformer. Then the column derivations are executed, in the earlier links before the later links. Within each Link, the derivations in higher columns are executed before lower columns: everything has a top-down flow to it.
Let's look at a quick example of how this might work. At the beginning, we execute all of the Stage variables (from top to bottom) and then execute each constraint (from top to bottom). For each constraint that is satisfied, the Transformer will fire off the corresponding output Link, and within that Link, we do all the columns from top to bottom. This means that, if we do something in a column above, we can use its result in a column below. TIP: You must be very, very careful if you ever re-arrange your metadata, because of how the order of execution could affect your results within a Transformer.
Transformer Reject Links
You can also have Reject links coming out of your Transformer. Reject links differentiate themselves from other links by how they are designated inside the constraint. Let's quickly see how this is done.
Off to the right, we have the 'Otherwise/Log' option or heading. Checking the respective checkbox creates a Reject link as necessary.
businessintelli. • • Business Intelli Solutions Inc www. the input columns are mapped to unmapped columns by name. This is all done ‘behind the scenes’ for you. we need to cover a few other important concepts that will help us in our development of DataStage Jobs. When RCP is turned on.1 Introduction Our “Otherwise” link creates a straight line (such as the Missing Data Link shown above) whereas. the columns of data can flow through a Stage without being explicitly defined in the Stage. With RCP. the target columns in a Stage need not have any columns explicitly mapped to them: No column mapping is enforced at design-time.Information Server 8. • • • • Run Time Column Propagation Shared Containers Runtime Column Propagation (RCP) One of the most important concepts is that of Runtime Column Propagation (RCP). Advanced Concepts When Building Jobs We’ve discussed the key objects in DataStage and we have set up DataStage. With RCP enabled. a “Reject” link would have a dashed line (Such as the Link all the way to the left). Now. We’ve been able to use Passive Stages to pull data in from various files or relational database sources and we’ve talked about various Active Stages that let us manipulate our data.com . We’ll see some examples of this later.
Let's talk about how the implicit columns get into a Job. The implicit columns are read from sequential files associated with schemas, or from the tables when read from relational databases using a 'select', or they can be explicitly defined as an output column in a Stage that is earlier in the data flow of our Job.

The main benefits of RCP are twofold. First, it gives us greater Job flexibility: a Job can process input with different layouts, so we don't have to have a separate Job for multiple record types that are just slightly different. Second, RCP gives us the ability to create more re-usability within components, such as Shared Containers (there are many ways to create re-usable bits of DataStage ETL code). This way, you can create a component of logic and apply it to a single named column while all the other columns flow through untouched.

When we want to enable RCP, it must be enabled at the Project level. You must enable RCP for the entire Project if you intend to use it; if it's not enabled there, then you won't find the option in your Job. At the Job level, we can enable it for the entire Job or just for individual Stages, and, if you do, you can also choose to have each newly created Link use RCP by default or not. Within each Stage, we look at the Output Column tab in order to decide whether or not we want to use RCP.
Here's an example of where we set up and enable RCP just within a Stage (highlight on the checkbox of the Enabling RCP at Stage Level slide). When you see this checkmark, you know that columns coming out of this Stage will be RCP-enabled.

When RCP is Disabled
Let's say that we had four columns coming into this Stage on the input link shown above on the left, and four columns going out (on the right). If we only drew derivations to two of the columns (the upper two on the right), then the two lower columns would remain in a RED state (incomplete derivations) and this Job would not compile. Now, let's see what we can do with RCP.
When RCP is enabled, we can see that the output link's columns (lower right) do not have anything in the red state. By name matching, DataStage will automatically assign the two columns on the input link (on the left) to the two columns going out on the output link (lower right). Or, we could leave these two columns completely off (the ones in the lower right of the previous graphic) and they would be carried in an invisible fashion through subsequent Stages, thereby allowing you to run many Jobs with row types that have the upper two columns explicitly but also have the lower two (or more) columns implicitly. In other words, we can run this Job one time with a record that has four columns, another time with eight columns, and another time with six columns. The only thing that they all have to share is that the very first two columns (highlight on the top two columns) must be the same and must be explicit.
Runtime Column Propagation (RCP) In Detail
Because of the importance of concepts including Runtime Column Propagation (RCP), the use of schemas, the use of Shared Containers (the chunks of DataStage code that can be re-used), and other things that allow you greater flexibility and help create greater reusability within your DataStage Jobs, it is worth talking about them in greater detail. Understanding RCP can take time; even people who have been in the field for years can have difficulty grasping both its power and its difficulties, so let's review and continue our discussion about it.

As you'll recall, RCP, when it is turned on, will allow columns of data to flow through a Stage without being explicitly defined in the Stage. Target columns in the Stage do not need to have any columns explicitly mapped to them; no column mapping is enforced at design time when RCP is turned on. Input columns are mapped to unmapped columns by name. The main benefits of RCP are Job flexibility, so that we can process input with different or varying layouts, and re-usability. RCP also works very well within DataStage's Shared Containers (the user-defined chunks of DataStage ETL applications).

RCP is how implicit columns get into a Job. We can pull in our metadata implicitly when we read it from a database using our 'select' statement. We can also define it explicitly as an output column in an earlier Stage in our data flow (using a previous Stage in the Job design). Alternatively, a schema file is another way to define our metadata, much like a table definition. We can define the metadata by using a schema file in conjunction with a Sequential File Stage. Using this schema, we can work with a Sequential File Stage or the Modify Stage to take unknown data and give it an explicit metadata tag. Let's look at an example of RCP so that we can better understand it.
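To make the schema-file idea concrete, here is a small, hedged example in the Orchestrate schema format that a Sequential File Stage can reference. The column names and format options are invented for illustration; a real schema would describe your actual file:

    record
      {final_delim=end, delim=',', quote=double}
    (
      CUSTOMER_ID:    int32;
      SOURCE_SYSTEM:  string[3];
      CUSTOMER_NAME:  nullable string[max=50];
    )

With RCP enabled, a Job only has to name the columns it actually works on; any additional columns described by the schema simply flow through to the target.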
The graphic above summarizes the main points of RCP, but by revisiting the example that we looked at a few moments ago, we can gain a deeper understanding of RCP. So let's review our example in which RCP is NOT used.
We're looking within a Transformer Stage at a typical example of columns when RCP has not been enabled. Let's review how it functions without RCP. We can see four columns on the input link to the left. To the lower right, we can see four outgoing columns (on the output link). But the two lower columns, SPEC_HANDLING_CODE and DISTR_CHANNEL_DESC, aren't connected to the input columns and don't have any derivations. Therefore, these two columns are left in a red color, indicating that this is an incomplete derivation, and this Job won't compile.
When RCP is Enabled
Now, with the same Transformer, in the exact same situation but one in which RCP is enabled, the bottom two columns are no longer in a red state and, therefore, this Job will compile. When this Job runs, by name reference, these two columns on the bottom left will be assigned to the two columns on the bottom right. And if we had additional explicitly named input link columns beyond the four highlighted to the left, they would be carried invisibly along to the next Stage; they could be either used or not used as necessary.

This has mixed benefits. It gives us a lot of flexibility, but it can create headaches at runtime if you are not aware of data that is passing through your Job. It allows us to avoid any kind of individual column-checking at design time, but this can negatively impact downstream activities. Thus, you must be very careful when using RCP. Important things to know when using RCP are to understand what data is coming in, what kind of metadata needs to get processed there, and what data is going out of the Stage. This applies in terms of both the number of columns and the metadata type. If they
do not match, then what you are bringing in will, at some point in the processing, need to be modified accordingly to work with the final target metadata of the Job. For instance, let's say that you bring in one column of a numeric type of data but it needs to go out as character data; then you'll need to perform some type of conversion function on it. Likewise, if you have a longer column coming in, you'll need to make sure that it is truncated properly to fit into the outgoing column space so that it doesn't overflow and create a runtime failure. For example, if it comes in with one column length but needs to go out with a different column length, then you'll need to handle it from within either a Transformer Stage or a Modify Stage.

Business Use Case Scenario
Let's talk about a good use of RCP in business terms. Let's say that you have a large conglomerate company and five companies underneath it in a corporate hierarchy. The conglomerate wants to take data from all five of these companies' operational systems and populate a data warehouse with Customer information. Each of the sub-systems has a different layout for its Customer data, but they all contain the commonality of having their own Customer ID field. However, each of these Customer ID fields may overlap: the value of '123' at Company A may represent a totally different Customer than the value of '123' at Company B, even though they are the same value. For instance, one company might have 10 columns in their Customer table while another company has 20 columns in their Customer table. The conglomerate has a single Customer ID that can track each of them distinctly. But each of these sub-companies will not know what their corporate conglomerate Customer ID is until it comes time to pull all of this data into the warehouse. They don't know this corporate conglomerate Customer ID because their systems weren't built knowing about any corporate conglomerate Customer ID. However, the corporate conglomerate knows about it and can pull the customer data from all of the sub-companies.

So, if we were to process five Customer records from the five companies separately, we would read the first file coming in, add another column or piece of information to say from which company it came, and then process the data through a DataStage Job, which would relate the sub-system's ID to the corporate ID via a cross-reference that is stored in the corporate data warehouse. Two columns would be needed: the source system ID and the source system Customer ID. The Job could then use this information to do a Lookup against a reference from the conglomerate's warehouse that would in turn find the corresponding corporate conglomerate Customer ID. It would then bring this data together so that it could be written out with the new number attached to it. That data could then be processed subsequently, knowing to which corporate Customer ID this particular sub-system's row belongs.

If we do this in a file-to-file example, we are only going to specifically identify 2 of those columns coming in and only specify 3 of those columns coming out of our Stage. We will read the first file coming in, process the data, and then write it out… even though we only see, in our Job, the 2 or 3 columns that we
Thereby.businessintelli. one for each of specific data layouts coming from each of the sub-companies. when we write our output file. and then re-applied. stored.Information Server 8. Shared Containers Shared Containers are encapsulated pieces of DataStage Job designs. They are components of Jobs that are stored in a different container. we will only need one Job instead of having to write 5 separate Jobs. They can then be reused in various other Job designs.1 Introduction have explicitly identified. Business Intelli Solutions Inc www. We can apply “stored Transformer business logic” to a variety of different situations.com . This logic could be encapsulated. In the previous example we performed some logic on the grouping Customer Information from sub-companies with a corporate Customer ID. all of the columns that came from the input file will be written in addition to the new conglomerate Customer ID that we did the lookup on. This is the case with RCP on.
Creating a Shared Container
In the example above, the Developer has selected the Copy Stage and the Transformer Stage to become a Shared Container. Then clicking Edit from the menu bar, the Construct Container option, and Shared completes the task. This same Job would then have a slightly different look. Let's see what it would look like.
Using a Shared Container in a Job
Above, you can see the Shared Container icon where the selected Stages used to be. This same Shared Container can now be used in many other Jobs. Shared Containers can use RCP, although they don't have to do so. The combination of a Shared Container and RCP is a very powerful tool for re-use. Let's examine the concept of combining the two.

If we create a Shared Container in conjunction with RCP, then the metadata inside the Shared Container only needs to be what is necessary to conduct the functions that are going to occur in that Shared Container. Recall for a moment the example that we were talking about with the conglomerate and the five sub-companies. If that Shared Container is where we did our Lookup to find our corporate conglomerate number, all we really needed to do was to pass into the Shared Container the 2 columns we explicitly knew; any other columns (regardless of how many there are, since different sub-companies have different numbers of columns) are passed in there too. But on the way out of the Transformer (the one that adds the conglomerate Customer ID), all the columns would be passed out along with the newly added corporate identifier (from the Lookup that we did in our Shared Container).
Mapping Input/Output Links to the Container
When mapping the input and output links to the Shared Container, you need to select the Shared Container link to which the input link will be mapped. You will need to select the Container link to map the input link, as well as the appropriate columns that you'll need. When we come back out of the Shared Container, we can re-identify any of the columns as necessary, depending on what we want to do with them.

Shared Containers Continued
The main idea around Shared Containers is that they let us create re-usable code. A Shared Container can be used in a variety of Jobs, maybe even in Jobs that weren't intended to process just the Customer data. For instance, we might want to process Order data where we still need to look up the conglomerate's Customer ID just for this Order, and tag that into the Order. By simply passing in our whole Order record but only exposing the Customer number (the sub-system Customer identifier) and the source sub-system from which it came, the Shared Container can do the Lookup and produce the appropriate output, including the concatenated conglomerate Customer ID. Then, as we process this Order into our data warehouse, we will know not only the source Customer ID that appeared on the billing invoice and other information from that particular sub-company, but we will also be able to tie it to our conglomerate database so that we can distinguish it from other customers from other sub-companies. By using RCP, we only have to specify the few columns that really matter to us while we are inside the Shared Container.
Again, Shared Containers do not necessarily have to be RCP-enabled, but the combination of the two is a very powerful re-usability tool.

Job Control
Now that we've created the DataStage environment, have talked about how to pull in metadata, and have seen how to build Jobs, the next topic answers the question "How do we control all of these Jobs and process them in an orderly flow without having to do each one individually?" The main element that we use for this is called a Job Sequence. A Job Sequence is a special type of Job. It is a master controlling Job that controls the execution of a set of subordinate Jobs. The subordinate Jobs might even be of the Sequencer type, in which case they are sub-Sequences, or they can be Jobs themselves, whether Parallel or Server Jobs.
One of the things that a Job Sequence does is to pass values to the subordinate Jobs' parameters. It also controls the order of execution by using Links. These Links are different from our Job Links, though, in that there is no metadata being passed on them. The Sequence Link tells us what should be executed next and when it should be executed. The Sequence specifies the conditions under which the subordinate Jobs get executed using a term known as 'Triggers'. We can specify a complex flow of control using Loops and the All or Some options, and we can also do things such as 'Wait for File' before our Job starts. For example, let's say that we were waiting for a file to be FTP'ed to our system: we can then sit in a waiting mode until that file comes. When it does arrive, the rest of the activities within that Sequence will be kicked off.
When you create a Job Sequence, you open a new Job Sequence and specify whether or not it is re-startable. Then, on the Job Sequence's Canvas, you add Stages that will execute the Jobs, Stages that will execute system commands and other executables, and/or special-purpose Stages that will perform functions such as looping or setting up local variables. Next, you add Links between the Stages. This will specify the order in which the Jobs are executed. The order of sub-sequences, Jobs, system commands, e-mails, and so on is determined by the flow that you create on the Canvas.

Job Sequences allow us to do system activities. For example, you can use the e-mail option that ties into your native system and sends an e-mail. They also allow us to execute system commands and executables (that we might normally execute from the Command Line); any Command Line options can be executed from within the Sequencer. For example, you could send a message to your Tivoli Work Manager by writing a Command Line option that writes a message to the Tivoli Command Center so that operators can see what has happened. The Sequence can include restart checkpoints. Should there be some failure within your Sequence, you don't have to go back and re-run all the Jobs; you can just find the one that failed, skipping over all the ones that worked, and pick up and re-run the one that failed.
You can specify triggers on the Links. For instance, you might create a condition so that, when the flow is coming out of Activity A, one Link goes to Activity B if the Activity A Job finished successfully, but a different Link goes to Activity C if the Job errors out. The error Link might then go to an e-mail that sends an error message to someone's console and stops the Sequence at that point. Job Sequences also allow us to specify particular error handling, such as the global ability to trap errors, and you can enable and disable restart checkpoints within the Sequencer. Now let's see some of the Stages that can be used within the Sequencer.
Here is a graphic of some of the Stages that can be used within the Sequencer. These include the Job Activity Stage (used to execute a DataStage Job), the Routine Activity Stage (used to call DataStage Server routines), the UserVariables Activity Stage (used to create local variables), the Execute Command Stage (which executes native OS commands), the Sequencer Stage (which allows us to bring together and coordinate a rendezvous point for multiple Links once we have started many Jobs in parallel), the StartLoop Activity Stage, and the EndLoop Activity Stage. In addition, there are the Notification Activity Stage (typically tied into DataStage's e-mail function), the Wait For File Activity Stage, the Exception Handler Stage, the Nested Condition Stage, and the (forced) Terminator Activity Stage.
Sequencer Job Example
Here we can see an example of a Sequencer Job. Normally, if there were many Stages independent of one another, they would all start at the same time within a typical Job Sequence. In the top flow of the Job Sequence above, though, this Job Sequence starts with nothing coming into it. At the top left, the first Stage waits for a file; because nothing flows into it, this Stage starts executing immediately. (Normally the Stage at the bottom would start immediately as well, since nothing flows into it either; however, it is an Exception Handler Stage and will only kick off if there is an exception during our runtime.) When the file arrives, the flow kicks off Job 1. The green color of the Link indicates that there is a trigger on the Link (this trigger basically says 'if the Job was successful, go to Job 2'). So, if Job 1 finishes successfully, the flow follows the green Link to Job 2, and so on. If Job 3 completes successfully, the flow goes on to execute a command. The red Links indicate error activity. Should any of Jobs 1, 2, or 3 fail, the flow follows a red Link to a Sequencer Stage (which can be set to fire when all or any of its incoming Links complete); in this case, if any of the Links are followed to it, it sends a notification warning by e-mail that there has been a Job failure.
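Since the graphic cannot be reproduced here, the following is a rough textual sketch of the flow just described; the Stage names are illustrative:

Wait For File --OK--> Job 1 --OK--> Job 2 --OK--> Job 3 --OK--> Execute Command
Job 1 / Job 2 / Job 3 --Failed--> Sequencer (Any) --> Notification Activity (e-mail warning)
Exception Handler --> (runs only if a runtime exception occurs anywhere in the Sequence)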
Exception Handler Stage
There is one final note about our Sequencer Job: it is not a scheduler unto itself. It just sequences the activities. There must be some external scheduler, whether that is the DataStage scheduler or a third-party enterprise scheduler, to start up this whole Sequencer Job, and only from there will it call all the other Jobs for you. Think of it as an entry point into your application from whatever external source you intend to use, whether that source is just kicking it off from a Command Line or using something like a Tivoli Work Manager to execute the Job on a daily or monthly schedule, or on a conditional schedule should some other activity occur first and then, in turn, call this Job. In order to do that, we need to learn how to execute a Job from the Command Line. We will talk a little about how the Command Line can be used in this manner, as well as how it will help us deploy our applications effectively and efficiently.

Pulling Jobs Together
• Parameter Sets
• Running Jobs from the Command Line
• Other Director Functions

Parameter Sets
Let's start by talking about Parameter Sets. One of the areas that will help us pull our Jobs together more effectively is the use of Parameter Sets. Parameters are needed for various functions within our Jobs and are extremely useful in providing Jobs with flexibility and versatility. A Parameter Set allows us to store a number of parameters in a single named object. One or more value files can also be named and specified; a value file stores the values for each parameter within the Parameter Set, and these values can then be picked up at runtime.
Parameter Sets can be added to the Job's parameter list, which is on the Parameters tab within the Job Properties. An example of a Parameter Set might include all the settings that apply within a certain environment. We might set up a Parameter Set entitled "environment" and have it contain parameters such as the Host Name, the Host Type, the Database Type, the Database Name, the Usernames, and the Passwords (not to be confused with the DataStage Connection Objects). We could then have one set of values in a value file used specifically for development, another set for testing, and another set for production. We could then move our Job from one environment to the next and simply select the appropriate values for that environment from the Parameter Set. This is very convenient: as we develop our Jobs, we don't have to enter a long list of parameters for every single Job, which would risk mis-keying or omissions that prevent our parameters from passing successfully from the Sequencer Job into our lower-level Jobs.
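Conceptually, the "environment" Parameter Set described above might resolve to values like the following. The parameter names and values shown here are hypothetical examples only, and the actual on-disk format of value files is managed by DataStage itself:

Parameter       dev value file     prod value file
HostName        devhost01          prodhost01
DatabaseType    DB2                DB2
DatabaseName    ACC_DEV            ACC_PROD
Username        ds_dev             ds_prod
Password        (encrypted)        (encrypted)

At run time, choosing the 'dev' or the 'prod' value file supplies the whole group of values in a single selection instead of keying each parameter individually.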
Running Jobs from the Command Line
Now that we've learned the basics of how to sequence our Jobs and group our parameters, let's say that we've built, developed, and tested a Job and are now ready to put it into production: we have an application. The next question is, "How do we execute it?" Although it can be executed by hand using the DataStage Director client, this is not the typical way that applications are run. Instead, there is a series of Command Line options that allow us to start a Job from the native Command Line by typing in a command, or to have our scheduler call Jobs. The graphic above shows some of the particulars.
The InfoSphere suite provides a utility for DataStage called dsjob. dsjob is a multifunctional Application Programming Interface (API) on the Command Line that allows us to perform a number of functions; it is documented in the "Parallel Job Advanced Developer's Guide". By executing the dsjob command on your Command Line with various options after it, you will be able to accomplish a number of automated functions and drive DataStage from a whole host of other tools (since the Command Line is the most common method for launching other activities). One of the most common functions is to run a Job. The first example in the graphic above (highlighted at the top) runs a Job; in it, we are passing options that include the number of rows that we want to run, the name of the Project that we want to run from, and the name of the Job. Typically, when we use the dsjob command it will be encapsulated inside some kind of shell script (whether on a Windows or UNIX type of system, or whatever system DataStage is running on), and we will want to make sure that all the options are correct before issuing the command; a rough sketch of such a wrapper script appears at the end of this topic. dsjob also has other functions, such as giving us information about a Job's run status, a summary of its messages, and even Link information. The second example in the graphic displays a summary of all Job messages in the log. You can use the -logsum or -logdetail options of the dsjob command to write your output to a file, which can then be archived or even read by another DataStage Job and loaded into a database with all your log information (any database; it doesn't have to be part of the DataStage repository).

Other Director Functions
We mentioned that we can schedule Jobs using the DataStage Director, but more often than not, Jobs are scheduled by an external scheduler. As mentioned earlier, most schedulers use Command Line execution, which is why we run Jobs from the Command Line. With DataStage version 8.1, you are able to store the logged information directly in the Xmeta repository or keep it locally in the DataStage engine. Xmeta is the repository for all of DataStage. You may choose the latter for performance reasons, so that you don't tie up your Xmeta database with a lot of logging activity while the Job is running, which could affect your network should your Xmeta database be remotely located from your DataStage server. Keeping the log information in the local DataStage engine's repository and pulling it out only at the end, then processing it into your master repository (whether that is Xmeta or some other relational data source) for long-term usage, means you can do all of this without negatively impacting the performance of the application while it is running, since you will be doing it after the application has finished.
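To make the wrapper-script idea concrete, here is a minimal, hypothetical sketch. The project name, Job name, and parameter names are illustrative only, and the exact dsjob options (including any authentication options your installation requires) should be verified against the "Parallel Job Advanced Developer's Guide" for your release:

#!/bin/sh
# Hypothetical wrapper around dsjob -- names and values are examples only.
PROJECT=AmalCorp
JOB=CompanyA_FeedJob

# Run the Job, passing the source system number as a parameter and a row limit,
# and wait for completion so the scheduler receives a meaningful exit status.
dsjob -run -param SourceSystemID=1 -rows 10000 -jobstatus $PROJECT $JOB

# Afterwards, write a summary of the Job's log messages to a file for archiving
# or for loading into a database by another DataStage Job.
dsjob -logsum $PROJECT $JOB > ${JOB}_logsum.txt

A scheduler such as Tivoli can then simply invoke this script on the agreed schedule and act on its exit status.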
Product Simulation
Welcome to the Guided Tour Product Simulation. In this interactive section, you will 'use the product' to create DataStage Jobs, navigate around the UI, and 'learn by doing', without having to perform any installations. Earlier in the tutorial, we discussed many of the important concepts involved in DataStage. In this section, we will first introduce you to a case study of a business use-case scenario in which you will learn about a business entitled Amalgamated Conglomeration Corporation. In our scenario, Amalgamated Conglomeration Corporation (ACC) has hired you as their DataStage Developer! ACC is a large holding company that owns many subsidiary companies. Specifically, we're going to talk about what happens to ACC over the period of a year as they acquire new companies. In this section you will do the following:
1. Understand the Business Problem and Case Scenario
2. Assess the Existing Architecture
3. Look at a Feed Job
4. Create a Feed Job
5. Modify the Consolidation Job
6. Create a Sequence Job
Let's look at ACC's corporate structure as it stands at the beginning of the year, on January 1st. As you can see in the graphic above, ACC has four subsidiary companies: Acme Universal Manufacturing (Company A), Big Box Stores (Company B), Cracklin' Communications (Company C), and Eco Research and Design (Company E). All of these belong to ACC, and information from all of them will need to be pulled together from time to time. Now, some time will have gone by, and on March 3rd the state of ACC has changed: your employer has acquired a new company called Disks of the World (Company D). Disks of the World will now need to be able to feed its information into ACC's Data Warehouse. Let's go back to January 1 and see what the feeds looked like before Disks of the World (Company D) came on board.
Each of the individual subsidiaries (Companies A, B, C, and E) feeds its data into a common customer Data Mart. Let's quickly talk about how this is done from a more technical standpoint. Above, we can see four particular feeds, one for each company and that company's customers. Each FEED FILE is generated by an individual DataStage DS FEED JOB. The reason we have several different feeds running to files is that each subsidiary company puts out its data at a different interval; in other words, each puts out its own feed at a different frequency. Acme (Company A) puts out its data once per day. Big Box (Company B) has such a large feed that they put it out once an hour. Cracklin' Communications (Company C) has a large feed, though not quite as large as Company B's, so they do it four times per day. Eco Research (Company E) has such a small customer base that they only produce their feed once per week. ACC's business requirement, though, is for a daily consolidation of these feeds into the Common Customer Data Mart; in other words, we only want to consolidate in "one fell swoop" (ACC's business requirement is that we put all the feed files into the Data Mart at one time). So this is done in two steps: the feeds are first landed to files, and these files are then pulled together by a consolidation Job (a DataStage Job) and run into the COMMON CUSTOMER DATA MART. The Consolidation Job will therefore pull in many files from Companies B
and C, probably only one from Company A, and, once per week, there will also be one in the heap from Company E.

Now let's go ahead in time to March 3rd. You'll recall that ACC has now acquired another company called Disks of the World. The DataStage Developers for ACC (that's you) will now need to add a new DataStage feed for the customer data coming from Company D (Disks of the World) into the Common Customer Data Mart. You will build a DataStage Feed Job that accommodates their need to produce their data multiple times per day during their sales cycle. This feed will be landed to the same set of files (the Feed Files at the center of the graphic), which are then picked up by the DataStage Job entitled the Consolidation DS Job (which combines them into the mart).

Finally, you will need to build a special type of Job called a Sequencer Job. This type of Job is used to control the running of other DataStage Jobs, and we'll see how all of the Jobs (all the DS FEED JOBs and the CONSOLIDATION JOB) can be brought together by using it. The Sequencer will call all the Jobs in the correct order. This way, we don't have to schedule them individually in our corporate-wide scheduling software; we will just schedule the one Sequencer.
In sum, as the DataStage Developer for ACC, you will need to do the following:
1. View the existing Acme DS FEED JOB to see how the previous DataStage Developer at ACC built it.
2. Create a new DataStage Job, shown highlighted above (Company D's DS FEED JOB).
3. Modify ACC's DS CONSOLIDATION JOB to bring this new, extra feed (from Company D) into the Data Mart.
4. Finally, create a Job that controls all of these Jobs (the Sequencer Job).

There is also a technical problem that has been identified and reported to ACC in the past. You will need to understand the issue so that, later in this Guided Tour Product Simulation, as you develop the new DS FEED JOB, you can help solve it.

ACC's Business Requirements
ACC needs to recognize any particular customer as having come from a particular subsidiary. Since each of the subsidiary companies developed its own systems independently of ACC, the numbering systems that they use for their customers are different and may well overlap inappropriately. For example, Company A has a Customer 123 that represents Joe Smith, but over at Company B there is a Customer 123 that represents Emmanuel Jones Enterprises. Even though these two share the same number, they are not the same customer. If we put them into the Common Customer Data Mart 'as is', we would have ambiguity and not know which customer we were dealing with.

The Common Customer Data Mart therefore has its own numbering system for these customers. It differentiates them by using the Customer ID that comes from the company (Joe Smith = Customer 123) along with an additional identifier that indicates which company the data came from (Company A is known to the Data Mart as Source System ID = 1, for instance). Combined, these two fields form an alternate key that uniquely identifies Joe in the Common Customer Data Mart (for example, Customer_123SourceSysID_1 is added to an address update for Joe Smith), and this identifier can distinguish any customer from any subsidiary. Each Feed Job designates which source system (company) its data is coming from: "Source System 1" (coming from Company A) for Acme, "Source System 2" (coming from Company B) for Big Box Stores, and so forth. In other words, the source system number is passed into the Job as a parameter. This way, when it comes time to load the Common Customer Data Mart with the Consolidation DS Job, we won't have any ambiguity in our data; we'll be able to load it correctly and distinctly.

Let's talk about how the DataStage Jobs handle this. One thing all of these DS Feed Jobs have in common is that they need to find the common customer ID for the corporation. Each Feed Job must find out whether its customer is already known to the Common Customer Data Mart. To do that, the Job performs a lookup: it uses both the source system number (ID) and the Customer ID from the source system to look into the Data Mart's table and see whether this compound key already exists. If the customer is already known to the Mart, the Job finds the unique ID from the Common Customer Data Mart and adds it to the data; when the data is then processed, Joe's address is updated against the right customer. If the customer is not known, we hand it off to ACC's Data Mart team to add it in a separate process (a 'Black Box' process Job that we will not see in this tutorial), which inserts the new customer into the Data Mart for the first time.

The Common Customer ID is a unique key within the Common Customer Data Mart and might contain a value like "9788342". This key has no meaning to any individual source system, so the only way a source system can find it is through this feed process. This is called a "reverse lookup": the Job uses the two columns (the Source System Identifier and the source-specific Customer ID) to find the true key for the Common Customer Data Mart, which is then returned into the Job so that the true key can be carried along with all the other native source columns in the row as it passes through the Job. The feed process thereby enriches the data by looking into the Common Customer Data Mart and finding this Common Customer ID, which helps differentiate all of our different customers across all of our subsidiary companies; because this field is unique, no customer is ever confused with another. If the customer is new, it is sent to the separate insertion process mentioned above; that Insertion Job is not discussed in this tutorial and is considered a "Black Box" process used by Amalgamated Conglomerate Corporation to add new customers.

In addition, we need to code the Job so that it converts all of the source system-specific metadata from the source system's layout into the target metadata layout. Coming into the Job, the metadata appears as it would in the source system (Acme's layout of the customer, for example), but on the way out we should see the same type of data captured in columns and fields that conform to the Common Customer Data Mart's metadata structure.
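As a small worked illustration of this reverse lookup, using the customers from this scenario (the Common Customer ID values are examples only):

Incoming row from Acme (Source System 1):     Customer ID 123 = Joe Smith, with an address update
Lookup key into the Data Mart:                Source System ID 1 + Customer ID 123
Returned by the reverse lookup:               Common Customer ID 9788342, carried along with the row
Incoming row from Big Box (Source System 2):  Customer ID 123 = Emmanuel Jones Enterprises
Lookup key into the Data Mart:                Source System ID 2 + Customer ID 123
Returned by the reverse lookup:               a different Common Customer ID, so the two customers never collide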
Look At Acme DS Feed Job
Now that we know the business requirements, the architecture, and the development strategy, let's look at the DS Feed Job for the Acme subsidiary (Company A) that was built previously by another DataStage Developer at ACC (highlighted above). In order to look at the DataStage Feed Job for Acme Universal Manufacturing, we'll need to bring up the DataStage Designer client; the highlighted client is the one we use to look at, create, and modify DataStage Jobs. In this Job, we will see how it pulls the data from Acme's operational database, looks up the appropriate Common Customer ID from the Data Mart, and then puts the data into a file. After we've seen the Job and how it works, we'll look at the Consolidation Job. There, we will see how all of the individual Feed Files are brought together before being loaded into the Common Customer Data Mart. Then, after ACC makes another acquisition in March, we will move to a section of the tutorial in which you will build a similar DS Feed Job for Disks of the World (Company D).
Our login screen comes up. First, we'll need to enter username and password information. At the top of the screen, you can see the domain in which we are working. Here, we need to make sure that we are working with the correct DataStage Project; in this case it is the AmalCorp Project (the Amalgamated Conglomerate Corporation Project). With the password filled in for you (for the purposes of this tutorial), enter a username of student and click the OK button. (The NEXT button below has been disabled for the remainder of the Guided Tour Product Simulation.) Our Designer client then connects to the DataStage Server via authentication and authorization from the Server's security layer. A screen will then pop up within which we can work on our Job. At the bottom of the screen, we are asked whether we want to create a Job and, if so, what type. For right now, since we will be looking at an existing Job, just click the Cancel button. From here, please develop as per the spec…