Connecting Shared Tables to your DataStage/QualityStage Jobs
October 25, 2010 - dsrealtime

At this point you should be familiar with Shared Tables, and understand how data lineage works within, and among, multiple DataStage Jobs. Stage to Stage lineage is very useful, and may be all that you require. It's also powerful, though, to go beyond this and connect your Jobs to the tables that you have imported from various places.

Why? One of the key reasons for including Shared Tables in lineage is for "Business Lineage". This is a high level summary of lineage that doesn't illustrate the lower level transformation details... just the "key" sources and targets (files, tables, reports, etc.) along the way. Another is to connect your Jobs to "external, non Information Server assets." DataStage doesn't write directly to (for example) a Cognos report... it writes to a table somewhere, and that table is then read by the business intelligence tool. The connection through that table is critical for accurate data lineage reporting.

Here is how Metadata Workbench makes the connection between a DataStage Job and a table...

Shared Tables have a four (4) part name: Host/Database/Schema/Tablename (see other entries in the table of contents on Shared Tables for more details on the names of Shared Tables or alternatives for importing them). Relational Stages in DataStage typically use two (2) parts to identify a particular table: a "Server" or "Database" or "DSN" name, and a tablename. The Server/DSN/Database name is usually in a dedicated property within the Stage. The tablename might be in a dedicated property, or it could be embedded in User Defined SQL. Any of them might be hard coded or established via Job Parameters.

The first thing Metadata Workbench needs to do is to "map" the abstract "Server/Database/DSN" name to a particular "Host" and "Database" combination in the list of Shared Tables. Like any application, DataStage is just pointing to some abstract "string" when trying to find a database.
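To make the mapping step concrete, here is a minimal Python sketch of the idea. It is not how Metadata Workbench is implemented; it only models the lookup described above. The alias string, host, schema, and table names are illustrative values, and the function and variable names are my own.

```python
from typing import NamedTuple, Optional


class SharedTable(NamedTuple):
    """Four-part Shared Table name: Host/Database/Schema/Tablename."""
    host: str
    database: str
    schema: str
    table: str


# Hypothetical set of Shared Tables that were imported earlier.
SHARED_TABLES = {
    SharedTable("QR2H004", "HRMAIN", "HR", "EMPLOYEE"),
}

# Hypothetical alias assignments: abstract string -> (host, database).
DATABASE_ALIASES = {
    "myODBCdatabase": ("QR2H004", "HRMAIN"),
}


def match_shared_table(alias: str, qualified_table: str) -> Optional[SharedTable]:
    """Combine the alias mapping with schema.tablename to find a Shared Table."""
    host_db = DATABASE_ALIASES.get(alias)
    if host_db is None:
        return None  # no alias assigned yet, so nothing to link to
    schema, sep, table = qualified_table.partition(".")
    if not sep:
        return None  # the Stage must supply a fully qualified schema.tablename
    candidate = SharedTable(host_db[0], host_db[1], schema, table)
    return candidate if candidate in SHARED_TABLES else None


print(match_shared_table("myODBCdatabase", "HR.EMPLOYEE"))
```

The point of the sketch is that the Stage only supplies two pieces (the abstract string and the table name), so the alias assignment has to supply the other two (host and database) before a four-part match is possible.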
An ODBC DSN, for example, "might" be the name of the database, but it could also just be "myODBCdatabase", which really points (in the ODBC definition) to a DB2 database called HRMAIN. Even if we use the string HRMAIN in the "Server" property of the Stage, we still need a way to identify the particular host that we are considering for data lineage. This is done via the "Database Alias" link in the "Advanced" tab of the Metadata Workbench.

Go to the Advanced tab after signing into the Workbench. Perform Automated Services for one of your projects (this could take a long time if it's the first time you are doing it). When it finishes, click on "Database Alias". Look carefully at the values there. These are the "strings" that are used in your various Jobs to identify databases. Pick the string that is appropriate for the Stage Type that you are working with and slide your cursor over to the right. The "Add" button will allow you to select the desired host/database combination that this abstract string should connect to. In the example noted above, I might assign the string (alias) myODBCdatabase to the host/database combination of QR2H004/HRMAIN. Host QR2H004 with database HRMAIN must be something I have already imported, viewable in the left navigation pane of the Workbench or in the Repository Management tab of the Information Server Web Console. Save it (button on the lower right). The next time you perform Automated Services, whenever that Stage type with that particular string (myODBCdatabase) is found, Metadata Workbench will use QR2H004/HRMAIN, combined with the fully qualified tablename in the Stage, to match to a particular Shared Table that has been imported previously.

Are you using Job Parameters? For Design based lineage, Metadata Workbench is smart, and will use the "default values" of those Job Parameters when finding alias "strings" and also when obtaining the fully qualified (schema.tablename) tablenames to use in the linking. Are you using $PROJDEF?
Run the ProcessEnvVars shell or bat file inside of /IBM/InformationServer/ASBNode/bin to obtain the project definitions for use in this algorithm. Operational Metadata (OMD; a whole separate blog entry is needed to discuss it) is used to populate the Job Parameter values from "run time"... otherwise the same rules apply. If you are just starting, get to know lineage without worrying about OMD. That's an advanced topic.

Understand how database alias works by performing Automated Services, reviewing the Database Alias page in the Workbench, doing the assignment and running Automated Services again. Once it is complete, go to the actual table using the
navigator frame at the left (open the Host tree and find the table), then right-click and select "data lineage". If it is a source, select "where does this go to"... if it is a target, select "where did this come from"... and validate your results.

Ernie
Posted in Metadata Workbench, data lineage, datastage, meta data, metadata.

Linking DataStage Jobs Together
September 30, 2010 - dsrealtime

Once you have mastered the "navigation" and asset selection options of Data Lineage reporting, it's time to look at how DataStage Jobs are automatically linked together. By now you should be comfortable with thinking about your "starting position" for a data lineage report: your initial "perspective", if you will (what object are you standing on when you begin). You should also be comfortable with thinking about the "direction" for your Data Lineage investigation: are you looking "upstream" for "Where did this come from?" or downstream for "Where does this go to?" If you need a refresher on the basics, please see Getting Started with Data Lineage!.

A typical production site for DataStage/QualityStage has MANY Jobs: hundreds perhaps... even thousands. All integrated and working together to transform your data and move it from one place to another. Sometimes they are written by one very hardworking developer, who might have all the lineage in his or her head, but more often it is a larger scale endeavor, with lots of team members, often scattered around the globe, and with varied skill sets and possibly working on related albeit independent solutions. They may know each other, or may not.

How are the DataStage Jobs sequenced from a data flow perspective? How does data flow between a Job developed to process data received via FTP from the mainframe and then ultimately to a datamart that supports a reporting system? How does one Job connect to another? Sometimes it may be one giant Job, but it's not likely. Intermediate temporary tables are often created for everything from checkpoints to Operational Data Stores to "parking lots" where data can be restructured or delivered to another application along the way.

Workbench can sort this out and provide you with lineage through all these Jobs. You simply have to pay attention to a few "good sense" leading practices and understand the pattern.

Inter-Job Data Lineage (Data Lineage between Jobs) is largely automatic. Workbench combs through your Jobs, looking for similarities that will link Jobs together end-to-end. Here's what it looks for among Jobs:

a) Common or "like" Stages between the Target of one Job and the Source of another. Two ODBC Stages are in common, but so is ODBC and, say, Oracle OCI. Or DB2Load and DB2Connector. Or two Sequential Stages.
b) At least one column in common.
c) Same hard coded values (yuk... who does that?) OR... same "default" values for Job Parameters for the critical common properties. For Sequential type Stages, it's the filename. For RDBMS type Stages, it's ServerName, Schema, and Tablename. The Automated Services will put together multiple Job Parameter default values if needed.

If your team follows good practices of having parameter sets or common values for things like an ODBC DSN, a high degree of lineage will often occur immediately after the first Automated Services (informally known as "stitching") occurs. Note that this has "nothing" to do with Shared Tables or Table Definitions at all... it's entirely done by merely parsing through your Jobs [this is a key reason why you can get immediate insight on 7.x Jobs that are imported into the 8.x environment, even if you haven't compiled a single one or started your formal testing and QA process!].

Automated Services is the "parsing" step at the Advanced Tab... when you say "run" with your Project(s) checked. Expect that the "first" time you run Automated Services, it could take a long time and be very intense. Do it during off hours if you have hundreds or thousands of Jobs. After that first time it will recognize a delta and only parse through the Jobs that are new or have been changed.

Now when you do your lineage reporting, and you start while "standing" on the target Stage of an application down-stream Job... when you ask for "Where does this come from?", you can expect to see Stages through many Jobs back to the ultimate source. If it dead-ends surprisingly, it's probably because one of the three rules above didn't apply, or there's an odd Stage that isn't supported for lineage (Redbrick is one of the only ones I'm aware of at this point).
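The three matching rules, and the "Where did this come from?" walk back through linked Jobs, can be sketched in Python as follows. This is only a model of the pattern described in this post, not the Workbench's actual algorithm. The Stage "family" table, the `#Param#` substitution helper, the single `locator` string, and every Job, table, and column name are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: which Stage types count as "like" one another (rule a).
STAGE_FAMILY = {
    "ODBC": "rdbms", "OracleOCI": "rdbms",
    "DB2Load": "rdbms", "DB2Connector": "rdbms",
    "Sequential": "file",
}


@dataclass(frozen=True)
class Stage:
    job: str
    role: str           # "source" or "target"
    stage_type: str
    locator: str        # filename, or ServerName/Schema/Tablename
    columns: frozenset


def resolve(locator, defaults):
    """Substitute #Param# references with the Job's default values."""
    return re.sub(r"#([^#]+)#",
                  lambda m: defaults.get(m.group(1), m.group(0)), locator)


def is_linked(target, source, defaults_by_job):
    """True when a target Stage and a source Stage satisfy all three rules."""
    if target.role != "target" or source.role != "source":
        return False
    family = STAGE_FAMILY.get(target.stage_type)
    if family is None or family != STAGE_FAMILY.get(source.stage_type):
        return False                                  # rule a: like Stages
    if not (target.columns & source.columns):
        return False                                  # rule b: shared column
    return (resolve(target.locator, defaults_by_job.get(target.job, {}))
            == resolve(source.locator, defaults_by_job.get(source.job, {})))  # rule c


def upstream_jobs(job, stages, defaults_by_job):
    """Walk "where did this come from" from a Job back to its ultimate source."""
    found, frontier = [], [job]
    while frontier:
        current = frontier.pop()
        for src in (s for s in stages if s.job == current and s.role == "source"):
            for tgt in stages:
                if (tgt.job != job and tgt.job not in found
                        and is_linked(tgt, src, defaults_by_job)):
                    found.append(tgt.job)
                    frontier.append(tgt.job)
    return found


COLS = frozenset({"CUST_ID", "NAME"})
STAGES = [
    Stage("FtpToStaging", "source", "Sequential", "/landing/#FileName#", COLS),
    Stage("FtpToStaging", "target", "ODBC", "#DSN#/STG/CUSTOMER", COLS),
    Stage("StagingToMart", "source", "OracleOCI", "#Server#/STG/CUSTOMER", COLS),
    Stage("StagingToMart", "target", "DB2Connector", "MART/RPT/CUSTOMER_DIM", COLS),
    Stage("MartToReport", "source", "DB2Load", "MART/RPT/CUSTOMER_DIM", COLS),
    Stage("MartToReport", "target", "Sequential", "/reports/out.csv", COLS),
]
DEFAULTS = {
    "FtpToStaging": {"FileName": "customers.txt", "DSN": "WAREHOUSE"},
    "StagingToMart": {"Server": "WAREHOUSE"},
}

print(upstream_jobs("MartToReport", STAGES, DEFAULTS))
# -> ['StagingToMart', 'FtpToStaging']
```

Notice that the first two Jobs link only because their differently named Job Parameters (`DSN` and `Server` here) share the same default value. Change one default and the walk dead-ends at StagingToMart, which is the kind of surprise described above.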
If the automatic linking still fails, use the "Stage Binding" at the Advanced Services tab. This is one of the options for "manual" binding of metadata: a sort of "toolbox" of wrenches and bolts for when you need 'em. It is fairly easy to use... it prompts you first for the name of a Job, and then when it presents you with a list of Stages, slide your cursor over to the right so that you can "add" a Stage from another Job.

The Stage Binding is designed to be used, as noted, only when absolutely necessary, to "force" two Stages together for lineage purposes. I have used it effectively for unsupported Stages, but also when the rules above don't apply. In one case I was sending the output of an XML stage into a Sequential File... and in the next Job, I was reading that with an "External Source" Stage. There is nothing in common between those stage types, and the columns were entirely different (the Sequential File Stage contains a column called "myXML" and the External Source merely carries the output of a unix list command, a set of filenames). I was able to establish perfect lineage, however, by using a manual "Stage Binding", forcing the Sequential Stage of the first Job and the External Source Stage of the second Job to be "bolted" together.

Good reporting! Next topic will be "Connecting Database Tables to your Jobs."

Ernie
Posted in Metadata Workbench, data lineage, datastage, meta data, metadata. Tags: etl, data lineage, datastage, metadata.

Getting started with Data Lineage!
September 28, 2010 - dsrealtime

Last night I was reminded about a series of blog entries I've wanted to make concerning the InfoSphere Metadata Workbench and how to get the most out of its Data Lineage capabilities. I've had a variety of entries about Workbench in the past two years (see the table of contents link in the top right, and find the metadata section), but nothing on "getting started".

The Workbench is very powerful: it illustrates relationships between processes, business concepts, people, databases, columns, data files, and much, much more. Combined with Business Glossary, it gives you reporting capabilities for the casual business user as well as the (often) more technical dedicated metadata researcher. As Metadata Workbench starts to support more and more objects, knowing certain skills and techniques becomes that much more important. This is especially true when trying to gain the most from Metadata Workbench when it is being used to illustrate Business Terms, Stewards, FastTrack Mappings, DataStage Jobs, Tables and Files, External ETL Tools, scripts and processes, report or screen, operational metadata and a vast list of other data integration artifacts.

Many of you who start with Metadata Workbench begin with DataStage/QualityStage Jobs. So I will start there. If you are using Metadata Workbench and are not a DataStage user... stay tuned. Once you have mastered lineage with DataStage, and its combination with other objects, you can then easily move on to other concepts for non-DataStage metadata, which I will also cover in this series of blog entries. As we progress I will take a tour through Extensions, Extension Mappings, Extended Data Sources and all other such concepts.

Start with your favorite, reasonably complex DataStage Job. Maybe one with a lookup or a Join, a reasonable sequence of Stages (8 to 10 or so) and preferably a single "major" target. Since you are learning about the Workbench, you should be familiar, even intimate, with this Job. That will help as you learn the various ways to navigate through the user interface, because you will know what to expect at each particular dialog.

[This first "getting started" assumes that you have NEVER performed Automated Services against your DataStage Project. If you have, it's ok, but you might not get the same results as I am outlining below; you may get more metadata than I am describing in this initial learning step. And if you don't know what I'm talking about (yet), that's ok too.]

Log into the Metadata Workbench and notice the "Engine" pull down at the left. This is the list of your DataStage Servers and their Projects. Open up the project, its folders, and find your Job. Click directly on it. Scroll up and down in the detailed page that appears. There is the main page with the picture of the Job (click on it and you will get an expanded view in a new window of what the Job looks like). Note below you have many "expandable" sections for things like Job Operational metadata... investigate the options. Also there is a very important listing of the Stage types in the Job, along with their icon. The metadata you are viewing is up-to-date from the last moment you or a developer saved the Job in the DS Designer.
Now click on the "main" target Stage of this Job. This brings you to a similar looking detail page, this one for the "Stage." Look around, but don't click anything. When you are ready, select "Data Lineage" at the upper right. As you do so, consider "where you are standing" (you are on a "Stage") and what sort of lineage you would like to see. As you will discover, knowing "where you are" when you start your lineage is very important.

The default option at the next dialog is "Where did this come from". Ignore the three checked boxes for now and click "Create Report". This will comb through ALL the possible resources for "where" data for the "stage you started on" came from. When you get there, review the result. Look through the list. Most likely, it should just show the source stage for this Job, or maybe its ultimate source as well as a lookup source stage or a source for a Join. The "total" collection of lineage resources is in front of you right now; you will select which one you want for a detailed source-to-target report.

Look at the bottom of the page. Find the button labeled "Display Final Assets". Click it. The list of objects above should get much smaller. Note also the highlighted line. Move it up and down. This highlight bar lets you select EXACTLY which resource you'd like to see for your actual report. This is often a point of confusion because the highlight bar is not always obvious. Data lineage doesn't show you "ALL" the sources, just the path to/from the ones that you select [we'll contrast this in a later entry with Business Lineage, which DOES provide a summary of ALL sources or ALL targets from a particular resource].

Pick the primary source stage for the Job and then click "Show Textual" Report. Scroll up and down, and note what you see on the left, and the Job details you see on the right. Everything is hyperlinked. The textual report isn't as pretty, but it tends to be more scalable.

Now find the little triangle towards the top left of this center pane where your report is (it's called Report Selection or similar) and click on it. That should expose again the "assets" page. Now you can try "Show Graphical". Grab some white space around the diagram and move the whole thing around... try the zoom bar in the upper left, play with it. Click on the various icons in the lineage and then right-click on one of the stages and find "open details in new window". That will bring you back to a detailed viewing page and the process starts again.

What happens if you choose the target stage of your original Job (the first stage you selected earlier) and ask for "Data Lineage" and select "Where does this go to"? If you haven't done Automated Services as I've noted above, you should likely receive "No assets found" or "No data for the report". This is because it's the "final" target; there isn't anything else. "Where did this come from" will yield a similar result if you happen to be "sitting" on a source when you start your lineage exercise.

If you practice this, you should become very familiar with the lineage report user interface, and will have a strong base for moving forward with more complex, and deeper, scenarios.

Next entry: Linking Jobs together... (link to next post in this series: Linking Jobs)

Ernie
Posted in Metadata Workbench, data lineage, datastage, etl, meta data, metadata. Tags: etl, data lineage, metadata, metadata workbench.

What exactly is Data Lineage?
December 15, 2009 - dsrealtime

Metadata management is becoming a big issue (...again, finally, for the "nth" time over the years), thanks to attention to initiatives such as data governance, data quality, master data management, and others. This time it finally feels more serious. That's a good thing.

One conceptual issue that vendors and architects are pushing related to metadata management (whether home-grown or part of a packaged solution) is "data lineage." What does that mean? Let's imagine that a user is confused about a value in a report... perhaps it is a numeric result labeled as "Quarterly Profit Variance". What do they do today to gain further awareness of this amount and trust it for making a decision? In many large enterprises, they call someone at the "help desk". This leads to additional phone calls and emails, and then one or more analysts or developers sifting through various systems reviewing source code and tracking down "subject matter experts." One large bank I visited recently said that this can take _days_! ...and that's assuming they ever successfully find the answer. In too many other cases the executive cannot wait that long and makes a decision without knowing the background, with all the risks that entails.
Carefully managed metadata that supports data lineage can help. Using the example above, quick access to a corporate glossary of terminology will enable an executive to look up the business definition for "Quarterly Profit Variance." That may help them understand the business semantics, but may not be enough... They or their support team may need to drill deeper. "Where did the value come from?" "How is it calculated?" Data lineage can answer these questions, tracing the data path (its "lineage") upstream from the report. Many sources and expressions may have contributed to the final value. The lineage path may run through cubes and database views, ETL processes that load a warehouse or datamart, intermediate staging tables, shells and FTP scripts, and even legacy systems on the mainframe.

This lineage should be presented in a visual format, preferably with options for viewing at a summary level with an option to drill down for individual column and process details. Knowing the original source, and understanding "what happens" to the data as it flows to a report, helps boost confidence in the results and the overall business intelligence infrastructure. Ultimately this leads to better decision making.

Happy reporting!

Ernie Ostic
Posted in Business Glossary, data lineage, general, meta data, metadata. Tags: data lineage, datastage.

...another way to load Terms into InfoSphere Business Glossary
February 19, 2009 - dsrealtime

copy-of-createbgimportscsv
copyofcreatebusinesstermsandattributesxmldsx3

Here are a few other Jobs for loading new Terms and Categories into Business Glossary. Like the earlier post on Business Glossary, these DataStage Jobs read a potential source of Terms (just alter the source stage as needed) and then create a target csv file that is in the correct format for loading into Business Glossary using the new 8.1 csv import/export features available at the Information Server Web Console... Glossary tab. The Jobs are fairly well annotated and should be self explanatory. I haven't yet set them up for Custom Attributes. The code is an example for instructional purposes only; the Jobs have not been widely tested, but they are already being implemented at a variety of locations.

(I hope you can make use of these. Please let me know if you find them useful.)

Ernie

(The one with "XML" in the name is the same as the prior blog entry. Each is named .doc, but is actually a .dsx file.)

Posted in Business Glossary, datastage, general, meta data, metadata.

Using DataStage to load new Terms into Business Glossary
November 30, 2008 - dsrealtime

There are a variety of ways to import new Terms into the InfoSphere Business Glossary. One of these, for initial loads, is to use an XML import. The XML format is fairly easy to produce (a sample is provided with the Business Glossary and can be found at your Information Server Web Console).

The attached DataStage Server Job illustrates how to load new Terms and Attributes from some external structure. In this example I use a simple sequential file, but if you look at the Job you will see that it can easily be adapted to any source. The terms and attributes are very simplistic and use a hockey theme, just to keep things simple and allow for discussion. This was tested in 8.0.1, although it has since been modified for use with 8.1. I'm not sure how well it will import into 8.0.1.

Seems the blog has changed and won't allow me to upload .dsx and .txt files. I've tried putting all the content in "notes pages" of .ppt. You'll need to download and then open the .ppt [one of only a few file types allowed here] and then see if you can cut and paste the sample .txt file, or simply write another Job to go from your source to a target that reflects the sample terms I've provided below. Please let me know if you have any questions or run into problems.

sample xml
bg