with Shared Tables, and understand how data lineage works within, and among, multiple DataStage Jobs. Stage to Stage lineage is very useful, and may be all that you require. It's also powerful, though, to go beyond this and connect your Jobs to the tables that you have imported from various places.

Why? One of the key reasons for including Shared Tables in lineage is "Business Lineage". This is a high level summary of lineage that doesn't illustrate the lower level transformation details, just the "key" sources and targets (files, tables, reports, etc.) along the way. Another is to connect your Jobs to "external, non Information Server assets." DataStage doesn't write directly to (for example) a Cognos report; it writes to a table somewhere, and that table is then read by the business intelligence tool. The connection through that table is critical for accurate data lineage reporting.

Here is how Metadata Workbench makes the connection between a DataStage Job and a table.

Shared Tables have a four (4) part name: Host/Database/Schema/Tablename (see other entries in the table of contents on Shared Tables for more details on the names of Shared Tables or alternatives for importing them). Relational Stages in DataStage typically use two (2) parts to identify a particular table: a "Server" or "Database" or "DSN" name, and a tablename. The Server/DSN/Database name is usually in a dedicated property within the Stage. The Tablename might be in a dedicated property, or it could be embedded in User Defined SQL. Any of them might be hard coded or established via Job Parameters.

The first thing Metadata Workbench needs to do is to "map" the abstract "Server/Database/DSN" name to a particular "Host" and "Database" combination in the list of Shared Tables. Like any application, DataStage is just pointing to some abstract "string" when trying to find a database.
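To make the naming concrete, here is a minimal Python sketch of the idea: an alias string plus the Stage's schema.tablename is turned into a four-part Shared Table name. This is only an illustration of the concept, not how Metadata Workbench is implemented. The names `SharedTable`, `ALIASES` and `resolve_shared_table` are hypothetical, and so is the `HR.EMPLOYEE` table; the alias `myODBCdatabase` and the host/database pair `QR2H004/HRMAIN` are the example values used in this post.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class SharedTable(NamedTuple):
    """Four-part Shared Table name: Host/Database/Schema/Tablename."""
    host: str
    database: str
    schema: str
    table: str


# Hypothetical alias map: the abstract string found in a Stage property,
# mapped to the (host, database) pair assigned to it by hand.
ALIASES: Dict[str, Tuple[str, str]] = {
    "myODBCdatabase": ("QR2H004", "HRMAIN"),
}


def resolve_shared_table(
    alias: str,
    qualified_table: str,
    aliases: Dict[str, Tuple[str, str]] = ALIASES,
) -> Optional[SharedTable]:
    """Combine an alias with 'schema.table' into a four-part name.

    Returns None when the alias has no assignment or the table name
    is not schema-qualified, because no four-part name can be built.
    """
    target = aliases.get(alias)
    if target is None:
        return None
    schema, sep, table = qualified_table.partition(".")
    if not sep or not schema or not table:
        return None
    host, database = target
    return SharedTable(host, database, schema, table)


# Usage: HR.EMPLOYEE is a made-up table name for illustration.
match = resolve_shared_table("myODBCdatabase", "HR.EMPLOYEE")
print("/".join(match))
```

The sketch ignores Stage type. In practice the same string could mean different things for different Stage types, so a fuller version would probably key the map on both the Stage type and the string.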
An ODBC DSN, for example, "might" be the name of the database, but it could also just be "myODBCdatabase", which really points to (in the ODBC definition) a DB2 database called HRMAIN. Even if we use the string HRMAIN in the "Server" property of the Stage, we still need a way to identify the particular host that we are considering for data lineage. This is done via the "Database Alias" link in the "Advanced" tab of the Metadata Workbench.

Go to the Advanced tab after signing into the Workbench. Perform Automated Services for one of your projects (this could take a long time if it's the first time you are doing it). When it finishes, click on "Database Alias". Look carefully at the values there. These are the "strings" that are used in your various Jobs to identify databases. Pick the string that is appropriate for the Stage Type that you are working with and slide your cursor over to the right. The "Add" button will allow you to select the desired host/database combination that this abstract string should connect to. In the example noted above, I might assign the string (alias) myODBCdatabase to the host/database combination of QR2H004/HRMAIN. QR2H004 with Database HRMAIN must be something I have already imported and is viewable in the left navigation pane of the Workbench, or in the Repository Management tab of the Information Server Web Console. Save it (button on the lower right).

The next time you perform Automated Services, whenever that Stage type with that particular string (myODBCdatabase) is found, Metadata Workbench will use QR2H004/HRMAIN, combined with the fully qualified tablename in the Stage, to match to a particular Shared Table that has been imported previously.

Are you using Job Parameters? For Design based lineage, Metadata Workbench is smart, and will use the "default values" of those Job Parameters when finding alias "strings" and also when obtaining the fully qualified (schema.tablename) tablenames to use in the linking. Are you using $PROJDEF?
Run the ProcessEnvVars shell or bat file inside of /IBM/InformationServer/ASBNode/bin to obtain the project definitions for use in this algorithm. Operational Metadata (a whole separate blog entry is needed to discuss OMD) is used to populate the Job Parameter values from "run time"; otherwise the same rules apply.

If you are just starting, get to know lineage w/o worrying about OMD. That's an advanced topic. Understand how database alias works by performing Automated Services, reviewing the Database Alias page in the Workbench, doing the assignment and running Automated Services again. Once it is complete, go to the actual table using the
navigator frame at the left (open the Host tree and find the table), then right mouse and select "data lineage". If it is a source, select "where does this go to"; if it is a target, select "where did this come from", and validate your results.

Ernie

Linking DataStage Jobs Together
September 30, 2010 - dsrealtime

Once you have mastered the "navigation" and asset selection options of Data Lineage reporting, it's time to look at how DataStage Jobs are automatically linked together. If you need a refresher on the basics, please see Getting Started with Data Lineage!. By now you should be comfortable with thinking about your "starting position" for a data lineage report: your initial "perspective" if you will (what object are you standing on when you begin). You should also be comfortable with thinking about the "direction" for your Data Lineage investigation: are you looking "upstream" for "Where did this come from?" or downstream for "Where does this go to?"

A typical production site for DataStage/QualityStage has MANY Jobs: hundreds perhaps, even thousands. All integrated and working together to transform your data and move it from one place to another. Intermediate temporary tables are often created for everything from checkpoints to Operational Data Stores to "parking lots" where data can be restructured or delivered to another application along the way. How are the DataStage Jobs sequenced from a data flow perspective? How does data flow between a Job developed to process data received via FTP from the mainframe and then ultimately to a datamart that supports a reporting system? How does one Job connect to another? Sometimes it may be one giant Job, but it's not likely. Sometimes they are written by one very hardworking developer, who might have all the lineage in his or her head, but more often it is a larger scale endeavor, with lots of team members, often scattered around the globe, and with varied skill sets and possibly working on related albeit independent solutions. They may know each other, or may not.

Workbench can sort this out and provide you with lineage through all these Jobs. Inter-Job Data Lineage (Data Lineage between Jobs) is largely automatic. You simply have to pay attention to a few "good sense" leading practices and understand the pattern. Automated Services is the "parsing" step at the Advanced tab. When you say "run" with your Project(s) checked, Workbench combs through your Jobs, looking for similarities that will link Jobs together end-to-end. Here's what it looks for among Jobs:

a) Common or "like" Stages between the Target of one Job and the Source of another. Two ODBC Stages are in common, but so is ODBC and say, Oracle OCI. Or DB2Load and DB2Connector. Or two Sequential Stages.

b) At least one column in common.

c) Same hard coded values (yuk... who does that?) OR the same "default" values for Job Parameters for the critical common properties. For Sequential type Stages, it's the filename. For RDBMS type Stages, it's ServerName, Schema, and Tablename. The Automated Services will put together multiple Job Parameter default values if needed.

If your team follows good practices of having parameter sets or common values for things like an ODBC DSN, a high degree of lineage will often occur immediately after the first Automated Services (informally known as "stitching") occurs. Note that this has "nothing" to do with Shared Tables or Table Definitions at all; it's entirely done by merely parsing through your Jobs [this is a key reason why you can get immediate insight on 7.x Jobs that are imported into the 8.x environment, even if you haven't compiled a single one or started your formal testing and QA process!].

Expect that the "first" time you run Automated Services, it could take a long time and be very intense. Do it during off hours if you have hundreds or thousands of Jobs. After that first time it will recognize a delta and only parse through the Jobs that are new or have been changed.

Now when you do your lineage reporting, and you start while "standing" on the target Stage of an application down-stream Job, when you ask for "Where does this come from?", you can expect to see Stages through many Jobs back to the ultimate source.

If it dead-ends surprisingly, it's probably because one of the three rules above didn't apply, or there's an odd Stage that isn't supported for lineage (Redbrick is one of the only ones I'm aware of at this point).
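The three linking rules from this post (like Stages, at least one shared column, and matching values for the critical properties once Job Parameter defaults are filled in) can be sketched in Python as below. This is a rough model for thinking about why two Jobs do or don't stitch together, not the actual Automated Services algorithm. The `Stage` class, the `STAGE_FAMILIES` grouping, the property keys and the sample table and column names are all assumptions made for the example. Treating every relational Stage type as one "like" family is also an assumption. The `#NAME#` form is the usual way a Job Parameter is referenced inside a Stage property.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumed grouping of "like" Stage types, for illustration only.
STAGE_FAMILIES: Dict[str, str] = {
    "ODBC": "rdbms",
    "OracleOCI": "rdbms",
    "DB2Load": "rdbms",
    "DB2Connector": "rdbms",
    "Sequential": "file",
}

# Critical properties per family, as described in rule (c).
KEY_PROPERTIES = {
    "rdbms": ("server", "schema", "table"),
    "file": ("filename",),
}


@dataclass
class Stage:
    stage_type: str
    properties: Dict[str, str]
    columns: Set[str]
    param_defaults: Dict[str, str] = field(default_factory=dict)


def expand(value: str, defaults: Dict[str, str]) -> str:
    """Replace each #PARAM# reference with that parameter's default value.

    Unknown parameters are left as written, so they will not match.
    """
    return re.sub(r"#([^#]+)#", lambda m: defaults.get(m.group(1), m.group(0)), value)


def can_link(target: Stage, source: Stage) -> bool:
    """True when the target of one Job and the source of another satisfy all three rules."""
    family = STAGE_FAMILIES.get(target.stage_type)
    if family is None or family != STAGE_FAMILIES.get(source.stage_type):
        return False  # rule (a): Stages are not "like" each other
    if not (target.columns & source.columns):
        return False  # rule (b): no column in common
    for key in KEY_PROPERTIES[family]:
        t_val = expand(target.properties.get(key, ""), target.param_defaults)
        s_val = expand(source.properties.get(key, ""), source.param_defaults)
        if not t_val or t_val != s_val:
            return False  # rule (c): critical property values differ
    return True


# Usage: one Job writes through an ODBC Stage using a parameterized DSN,
# the next Job reads the same table with a hard coded server name.
writer = Stage(
    "ODBC",
    {"server": "#DSN#", "schema": "HR", "table": "EMP_STAGE"},
    {"EMP_ID", "NAME"},
    {"DSN": "myODBCdatabase"},
)
reader = Stage(
    "OracleOCI",
    {"server": "myODBCdatabase", "schema": "HR", "table": "EMP_STAGE"},
    {"EMP_ID"},
)
print(can_link(writer, reader))
```

A pair of Stages that fails any one of these checks is the kind of surprising dead end described above, which is where the manual options come in.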
If that fails, use the "Stage Binding" at the Advanced tab to "force" two Stages together for lineage purposes. It is fairly easy to use: it prompts you first for the name of a Job, and then when it presents you with a list of Stages, slide your cursor over to the right so that you can "add" a Stage from another Job. I have used it effectively for unsupported Stages, but also when the rules above don't apply. In one case I was sending the output of an XML stage into a Sequential File, and in the next Job, I was reading that with an "External Source" Stage. There is nothing in common between those stage types, and the columns were entirely different (the Sequential File Stage contains a column called "myXML" and the External Source merely carries the output of a unix list command, a set of filenames). I was able to establish perfect lineage however, by using a Manual "Stage Binding", forcing the Sequential Stage of the first Job and the External Source Stage of the second Job to be "bolted" together.

The Stage Binding is designed to be used, as noted, only when absolutely necessary. This is one of the options for "manual" binding of metadata: a sort of "toolbox" of wrenches and bolts for when you need 'em. As we progress I will take a tour through Extensions, Extension Mappings, Extended Data Sources and all other such concepts.

Good reporting! Next topic will be "Connecting Database Tables to your Jobs."

Ernie

Getting started with Data Lineage!
September 28, 2010 - dsrealtime

Last night I was reminded about a series of blog entries I've wanted to make concerning the InfoSphere Metadata Workbench and how to get the most out of its Data Lineage capabilities. I've had a variety of entries about Workbench in the past two years (see the table of contents link in the top right, and find the metadata section), but nothing on "getting started". So I will start there.

The Workbench is very powerful: it illustrates relationships between processes, business concepts, people, databases, columns, data files, and much, much more. Combined with Business Glossary, it gives you reporting capabilities for the casual business user as well as the (often) more technical dedicated metadata researcher. As Metadata Workbench starts to support more and more objects, knowing certain skills and techniques becomes that much more important. This is especially true when trying to gain the most from Metadata Workbench when it is being used to illustrate Business Terms, Stewards, FastTrack Mappings, DataStage Jobs, Tables and Files, External ETL Tools, scripts and processes, operational metadata and a vast list of other data integration artifacts.

Many of you who start with Metadata Workbench begin with DataStage/QualityStage Jobs. Once you have mastered lineage with DataStage, and its combination with other objects, you can then easily move on to other concepts for non-DataStage metadata, which I will also cover in this series of blog entries. If you are using Metadata Workbench and are not a DataStage user, stay tuned.

Start with your favorite, reasonably complex DataStage Job. Maybe one with a lookup or a Join, a reasonable sequence of Stages (8 to 10 or so) and preferably a single "major" target. Since you are learning about the Workbench, you should be familiar, even intimate, with this Job. That will help as you learn the various ways to navigate through the user interface, because you will know what to expect at each particular dialog, report or screen.

[This first "getting started" assumes that you have NEVER performed Automated Services against your DataStage Project, and if you don't know what I'm talking about (yet), that's ok too. If you have, it's ok, but you might not get the same results as I am outlining below: you may get more metadata than I am describing in this initial learning step.]

Log into the Metadata Workbench and notice the "Engine" pull down at the left. This is the list of your DataStage Servers and their Projects. Open up the project, its folders, and find your Job. Click directly on it. Scroll up and down in the detailed page that appears. The metadata you are viewing is up-to-date from the last moment you or a developer saved the Job in the DS Designer. There is the main page with the picture of the Job (click on it and you will get an expanded view in a new window of what the Job looks like). Note below you have many "expandable" sections for things like Job Operational metadata; investigate the options. Also there is a very important listing of the Stage types in the Job, along with their icon.
Now click on the "main" target Stage of this Job. This brings you to a similar looking detail page, this one for the "Stage." Look around, and note what you see on the left, and the Job details you see on the right, but don't click anything. When you are ready, select "Data Lineage" at the upper right. As you do so, consider "where you are standing" (you are on a "Stage") and what sort of lineage you would like to see. As you will discover, knowing "where you are" when you start your lineage is very important. The default option at the next dialog is "Where did this come from". Ignore the three checked boxes for now and click "Create Report".

This will comb through ALL the possible resources for "where" data for the "stage you started on" came from. Look through the list. The "total" collection of lineage resources is in front of you right now; you will select which one you want for a detailed source-to-target report. Look at the bottom of the page. Find the button labeled "Display Final Assets". Click it. The list of objects above should get much smaller. Most likely, it should just show the source stage for this Job, or maybe its ultimate source as well as a lookup source stage or a source for a Join. Note also the highlighted line. Move it up and down. This highlight bar lets you select EXACTLY which resource you'd like to see for your actual report. This is often a point of confusion because the highlight bar is not always obvious. Data lineage doesn't show you "ALL" the sources, just the path to/from the ones that you select [we'll contrast this in a later entry with Business Lineage, which DOES provide a summary of ALL sources or ALL targets from a particular resource].

Pick the primary source stage for the Job and then click "Show Textual" Report. Review the result. Scroll up and down. Everything is hyperlinked. The textual report isn't as pretty, but it tends to be more scalable. Now find the little triangle towards the top left of this center pane where your report is (it's called Report Selection or similar) and click on it. That should expose again the "assets" page. Now you can try "Show Graphical". When you get there, try the zoom bar in the upper left. Grab some white space around the diagram and move the whole thing around... play with it. Click on the various icons in the lineage and then right mouse on one of the stages and find "open details in new window". That will bring you back to a detailed viewing page and the process starts again.

What happens if you choose the target stage of your original Job (the first stage you selected earlier) and ask for "Data Lineage" and select "Where does this go to"? If you haven't done Automated Services as I've noted above, you should likely receive "No assets found" or "No data for the report". This is because it's the "final" target; there isn't anything else. "Where did this come from" will yield a similar result if you happen to be "sitting" on a source when you start your lineage exercise.

If you practice this, you should become very familiar with the lineage report user interface, and will have a strong base for moving forward with more complex, and deeper, scenarios.

Next entry: Linking Jobs together (link to next post in this series: Linking Jobs)

Ernie

What exactly is Data Lineage?
December 15, 2009 - dsrealtime

Metadata management is becoming a big issue (...again, finally, for the "nth" time over the years), thanks to attention to initiatives such as data governance, data quality, master data management, and others. This time it finally feels more serious. That's a good thing. One conceptual issue that vendors and architects are pushing related to metadata management (whether home-grown or part of a packaged solution) is "data lineage."

What does that mean? Let's imagine that a user is confused about a value in a report; perhaps it is a numeric result labeled as "Quarterly Profit Variance". What do they do today to gain further awareness of this amount and trust it for making a decision? In many large enterprises, they call someone at the "help desk". This leads to additional phone calls and emails, and then one or more analysts or developers sifting through various systems reviewing source code and tracking down "subject matter experts." One large bank I visited recently said that this can take _days_! ...and that's assuming they ever successfully find the answer. In too many other cases the executive cannot wait that long and makes a decision without knowing the background, with all the risks that entails.
Carefully managed metadata that supports data lineage can help. Using the example above, quick access to a corporate glossary of terminology will enable an executive to look up the business definition for "Quarterly Profit Variance." That may help them understand the business semantics, but may not be enough. They or their support team may need to drill deeper. "Where did the value come from?" "How is it calculated?" Data lineage can answer these questions, tracing the data path (its "lineage") upstream from the report. Many sources and expressions may have contributed to the final value. The lineage path may run through cubes and database views, ETL processes that load a warehouse or datamart, intermediate staging tables, shells and FTP scripts, and even legacy systems on the mainframe. This lineage should be presented in a visual format, preferably with options for viewing at a summary level with an option to drill down for individual column and process details.

Knowing the original source, and understanding "what happens" to the data as it flows to a report helps boost confidence in the results and the overall business intelligence infrastructure. Ultimately this leads to better decision making.

Happy reporting!

Ernie Ostic

...another way to load Terms into InfoSphere Business Glossary
February 19, 2009 - dsrealtime

Attachments: copy-of-createbgimportscsv, copyofcreatebusinesstermsandattributesxmldsx3

Here are a few other Jobs for loading new Terms and Categories into Business Glossary. Like the earlier post on Business Glossary, these DataStage Jobs read a potential source of Terms (just alter the source stage as needed) and then create a target csv file that is in the correct format for loading into Business Glossary using the new 8.1.1 csv import/export features available at the Information Server Web Console, Glossary tab. I haven't yet set them up for Custom Attributes, nor have they been widely tested, but they are already being implemented at a variety of locations. The Jobs are fairly well annotated and should be self explanatory. Please let me know if you find them useful.

Ernie

(I hope you can make use of these. Each is named .doc, but is actually a .dsx file. The one with "XML" in the name is the same as the prior blog entry, although has since been modified for use with 8.1.)

Using DataStage to load new Terms into Business Glossary
November 30, 2008 - dsrealtime

There are a variety of ways to import new Terms into the InfoSphere Business Glossary. One of these, for initial loads, is to use an XML import. The XML format is fairly easy to produce (a sample is provided with the Business Glossary and can be found at your Information Server Web Console).

The attached DataStage Server Job illustrates how to load new Terms and Attributes from some external structure. In this example I use a simple sequential file, but if you look at the Job you will see that it can easily be adapted to any source, or simply write another Job to go from your source to a target that reflects the sample terms I've provided below. The terms and attributes are very simplistic and use a hockey theme, just to keep things simple and allow for discussion. The code is an example for instructional purposes only.

Seems the blog has changed and won't allow me to upload .dsx and .txt files. I've tried putting all the content in "notes pages" of .ppt. You'll need to download and then open the .ppt [one of only a few file types allowed here] and then see if you can cut and paste the sample .txt file.

This was tested in 8.0.1. I'm not sure how well it will import into 8.1. Please let me know if you have any questions or run into problems.