Design InfoSphere DataStage jobs for optimum lineage
Contents
Design InfoSphere DataStage jobs for optimum lineage
Table 1. Actions to ensure complete job design metadata for data lineage

Action: Use Connector stages
Description: Connector stages provide the maximum amount of metadata about the job design. Therefore, use Connector stages instead of the equivalent generic stages. For example, use the ODBC Connector stage rather than the ODBC Enterprise stage.
Action: Use environment variables and job parameters
Description: You can define variables and parameters to reuse across all jobs of a project by using environment variables and job parameters. Wherever possible, use parameters and parameter sets as common references across all jobs in a project.
How this action affects lineage: The use of variables reduces errors and promotes reuse in job development.
Additional information: For more information about how to set up job parameters and parameter sets, see Making your jobs adaptable. For general information about setting environment variables, see Guide to setting environment variables. For general information about environment variables, see Environment variables.

Action: Import project-level environment variables
Description: Before you run lineage reports, you must import the project-level environment variables that you defined in InfoSphere DataStage into InfoSphere Metadata Workbench.
How this action affects lineage: InfoSphere Metadata Workbench uses the environment variables to reconcile and link the job with referenced sources.
Additional information: For information about how to import environment variables, see Import project-level environment variables.

Action: Check the project-level environment variables
Description: To list the environment variables that are defined for the project, use the dsadmin utility.
Additional information: For information about how to run this utility, see Listing environment variables.

Action: Load columns of database and file stages from shared metadata
Description: Table definitions carry information about your source and target data, such as the name and structure of the database tables or files that contain your data. Within a table definition are column definitions. Column definitions contain information about the column name, column length, data type, and other column properties, such as keys and null values.
How this action affects lineage: InfoSphere Metadata Workbench requires table and column definitions to match imported database assets to jobs and to other assets in the metadata repository.
Additional information: For more information about shared metadata in InfoSphere DataStage, see Shared metadata.

Action: When you import a data file, ensure that its name and directory path are defined in the same way that they are defined in the stage
Description: The name and directory path of the imported or shared data file must match the name and directory path in the stage. Use job parameters to define file names and directory paths.
How this action affects lineage: If the name or directory path is not the same as it is in the stage, the data file and stage cannot be linked correctly in the job data flow. As a result, the lineage is incorrect or incomplete.
Action: Use the default SQL statements rather than user-defined SQL
Description: In InfoSphere Metadata Workbench, the schema and database table name of the imported database must be the same as the schema and table name in the stage. You can generate default SQL statements to read from and write to data sources. Alternatively, you can define your own SQL statements that read from and write to data sources.
How this action affects lineage: The Manage Lineage utility parses all SQL statements to extract information about the schema, owner, database tables, and columns. The utility then maps this information to shared database tables that were previously imported. User-defined SQL that contains complex statements might not be parsed correctly. If statements are not parsed correctly, you must run the Manual Binding utility. With this utility, you manually set the relationships between stages and data sources and between stages and other stages.
Additional information: For information about user-defined SQL in InfoSphere DataStage, see User-defined SQL. For information about job design considerations and SQL, see Job design considerations.
Action: Set up a logging view and review the InfoSphere Metadata Workbench logs
Description: You can view the log information in the IBM InfoSphere Information Server Web console.
Additional information: For information about log views and their configuration in InfoSphere Metadata Workbench, see Log messages, Creating logging configurations, and Creating log views.

Action: Query InfoSphere DataStage jobs in InfoSphere Metadata Workbench
Description: On the Discover tab, you can run the Job Design Usage published query to see the links between jobs and their sources. You can also construct your own queries to see the stage types of a project.
Additional information: For general information about queries, see Queries. For information about creating queries, see Creating queries.
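Several rows of the table come down to one question: can the data source that a stage references be matched to an asset that was imported into the repository? The following Python sketch illustrates two such checks. It is a toy model under stated assumptions, not how InfoSphere Metadata Workbench or the Manage Lineage utility works internally. It assumes that stage file paths use the #Name# reference style for job parameters, #$NAME# for environment variables, and #SetName.Param# for parameter sets, and that an environment variable listing (for example, one produced with dsadmin) has one NAME=value pair per line; check the actual output on your installation. The SQL extractor is a deliberately naive regular expression. It is included to show how complex user-defined SQL can defeat a parser: it misses the second table of a comma join and any table whose schema is a parameter reference.

```python
import posixpath
import re
from typing import Dict, List, Optional, Tuple

# --- File stages: resolve parameter references in a stage file path ---

# Assumed reference styles: #Name#, #$ENV_VAR#, and #SetName.Param#.
PARAM_REF = re.compile(r"#(\$?[A-Za-z_][\w.]*)#")


def parse_env_listing(text: str) -> Dict[str, str]:
    """Parse NAME=value lines (an assumed listing format) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            env[name.strip()] = value.strip()
    return env


def resolve_path(stage_path: str, params: Dict[str, str], env: Dict[str, str]) -> str:
    """Substitute parameter references, then normalize the resulting path."""
    def lookup(match):
        name = match.group(1)
        # #$NAME# refers to an environment variable; anything else is looked
        # up as a job parameter or a SetName.Param parameter-set reference.
        value = env.get(name[1:]) if name.startswith("$") else params.get(name)
        if value is None:
            raise KeyError(f"unresolved reference {match.group(0)}")
        return value

    return posixpath.normpath(PARAM_REF.sub(lookup, stage_path))


def file_matches(stage_path, imported_path, params, env) -> bool:
    """True when the resolved stage path equals the imported data file path."""
    return resolve_path(stage_path, params, env) == posixpath.normpath(imported_path)


# --- Database stages: toy extraction of SCHEMA.TABLE references from SQL ---

# Naive pattern: a possibly schema-qualified name after FROM, JOIN, INTO, or
# UPDATE. Real SQL grammars are far richer than this.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+(?:([A-Za-z_]\w*)\.)?([A-Za-z_]\w*)",
    re.IGNORECASE,
)


def extract_tables(sql: str) -> List[Tuple[Optional[str], str]]:
    """Return (schema, table) pairs in order; schema is None when unqualified."""
    return [(m.group(1), m.group(2)) for m in TABLE_REF.finditer(sql)]
```

For example, with SRC_DIR=/data/landing and pFileName=customers.txt, the stage path #$SRC_DIR#/#pFileName# resolves to /data/landing/customers.txt, so it matches an imported file at that path but not one under /data/archive. Similarly, extract_tables("SELECT * FROM SALES.A, SALES.B") returns only ('SALES', 'A'): the kind of partial result that leaves a stage unlinked until you bind it manually.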
After you complete these actions, you are ready to set up InfoSphere Metadata Workbench to analyze metadata for lineage. Follow these steps:
1. Run the Manage Lineage utility. This utility automatically runs the Manual Binding and Map Database Alias utilities.
2. To identify schemas that are identical, run the Data Source Identity utility. If two schemas are identified as identical, the database tables and database columns contained by the schemas are also marked as identical when their names match. This might be necessary when the same data source is imported into the repository by different means, such as by a connector and a bridge.
3. Run the data lineage report. The data lineage report shows the movement of data within a job or through multiple jobs. The report can also show the order of activities in a run of a job.
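The interaction between the identity rule and the lineage report can be illustrated with a small Python model. This is a minimal sketch, not the implementation of the Data Source Identity utility or of the data lineage report: the schema keys (such as "connector:SALES" and "bridge:SALES"), the job names, and the graph representation are all hypothetical. It applies the rule as stated in the steps (tables of identical schemas are identical when their names match, and so are matching-name columns) and then walks from a source table through jobs to everything that the table feeds.

```python
from collections import defaultdict, deque

# Hypothetical repository: the same SALES schema imported twice, once by a
# connector and once by a bridge. Keys and names are illustrative only.
SCHEMAS = {
    "connector:SALES": {"CUSTOMERS": ["ID", "NAME"], "ORDERS": ["ID", "CUST_ID"]},
    "bridge:SALES": {"CUSTOMERS": ["ID", "NAME", "EMAIL"]},
}

# Hypothetical job links: (job, tables read, tables written).
JOBS = [
    ("ExtractCustomers", [("connector:SALES", "CUSTOMERS")], [("STAGING", "CUSTOMERS")]),
    ("LoadWarehouse", [("STAGING", "CUSTOMERS")], [("DW", "DIM_CUSTOMER")]),
    ("ProfileCustomers", [("bridge:SALES", "CUSTOMERS")], [("DW", "CUST_PROFILE")]),
]


def mark_identical(keep, other, schemas):
    """Map assets of schema `other` to identical assets of schema `keep`.

    With the two schemas treated as identical, tables whose names match are
    identical, and so are matching-name columns within those tables.
    """
    identity = {}
    keep_tables = schemas[keep]
    for table, columns in schemas[other].items():
        if table not in keep_tables:
            continue
        identity[(other, table)] = (keep, table)
        for column in columns:
            if column in keep_tables[table]:
                identity[(other, table, column)] = (keep, table, column)
    return identity


def downstream(source, jobs, identity):
    """Breadth-first walk from a table through jobs to everything it feeds."""
    def canon(asset):
        # Replace an asset with its identical counterpart, if one is recorded.
        return identity.get(asset, asset)

    graph = defaultdict(set)
    for job, reads, writes in jobs:
        for table in reads:
            graph[canon(table)].add(job)
        for table in writes:
            graph[job].add(canon(table))

    start = canon(source)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}
```

In this model, downstream(("connector:SALES", "CUSTOMERS"), JOBS, {}) misses the ProfileCustomers job, because that job reads the bridge copy of the table. After mark_identical("connector:SALES", "bridge:SALES", SCHEMAS), both copies resolve to the same node and the job, with its DW.CUST_PROFILE target, appears in the lineage.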